SWE Task and Trajectory Progress Dashboard

Last updated: 2026-05-13 16:52:02 BJT | Next refresh: 2026-05-13 17:52:02 BJT | Refresh interval: 3600 s

PRs collected (total)
425,966
1h +2,033 / 24h +2,234
Valid SWE tasks (total)
28,825
1h +32 / 24h +240
Overall processing success rate
47.5%
Valid SWE / processed directories (60,704)
Mean difficulty_score
5.77
median 5.7, count 28,778

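Progress by language

Language | PRs collected | PRs last 1h | PRs last 24h | Valid SWE | SWE last 1h | SWE last 24h | Processed directories | Success rate
C (c) | 28,232 | +30 | +30 | 7,734 | 0 | 0 | 10,431 | 74.1%
C++ (cpp) | 43,105 | +39 | +39 | 1,918 | +11 | +65 | 4,249 | 45.1%
Go (go) | 88,472 | +864 | +955 | 4,454 | +7 | +47 | 10,962 | 40.6%
Java (java) | 59,936 | +47 | +53 | 2,637 | +7 | +33 | 6,444 | 40.9%
JavaScript (js) | 30,941 | +155 | +155 | 3,668 | +2 | +2 | 7,877 | 46.6%
Python (py) | 68,268 | +185 | +236 | 2,518 | 0 | 0 | 7,964 | 31.6%
Rust (rust) | 50,823 | +387 | +437 | 2,505 | +1 | +36 | 5,426 | 46.2%
TypeScript (ts) | 56,189 | +326 | +329 | 3,391 | +4 | +57 | 7,351 | 46.1%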
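
The success-rate column can be reproduced from the two counts beside it. The following is a minimal sketch, assuming the rate is simply valid SWE tasks divided by processed directories, as the summary card states; the function name `success_rate` is illustrative and not taken from the dashboard code.

```python
def success_rate(valid_swe: int, processed_dirs: int) -> float:
    """Return valid SWE tasks as a percentage of processed directories."""
    if processed_dirs == 0:
        return 0.0  # avoid division by zero for a language with no processed work
    return 100.0 * valid_swe / processed_dirs


# (valid SWE, processed directories) pairs copied from the table above
rows = {"C": (7734, 10431), "Python": (2518, 7964)}
for language, (valid, processed) in rows.items():
    print(f"{language}: {success_rate(valid, processed):.1f}%")
```

The same function applied to the totals (28,825 valid tasks over 60,704 processed directories) gives the overall figure shown at the top of the page.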

Run parameters

Language | Evaluation model (OPENAI) | Fill model (ANTHROPIC) | Concurrency | min_source_files | max_source_files
C | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 15
C++ | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 15
Go | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 10
Java | claude-haiku-4-5-20251001 | claude-sonnet-4-6 | 16 | 2 | 10
JavaScript | glmmoedsa | glmmoedsa | 16 | 2 | 10
Python | MiniMax-M2.7 | MiniMax-M2.7 | 16 | 3 | 15
Rust | gpt-5.4 | claude-opus-4-6 | 16 | 2 | 10
TypeScript | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 10

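Failure reasons

Language | Processed | Succeeded | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other
C | 39,798 | 8,220 | 31,578 | 13,809 | 2,264 | 12,730 | 680 | 1,855 | 240
C++ | 23,492 | 2,108 | 21,384 | 5,056 | 572 | 14,881 | 221 | 381 | 273
Go | 30,157 | 6,534 | 23,623 | 10,533 | 2,196 | 6,583 | 1,583 | 1,178 | 1,550
Java | 14,757 | 2,991 | 11,766 | 4,728 | 1,593 | 3,213 | 261 | 357 | 1,614
JavaScript | 16,972 | 5,115 | 11,857 | 7,386 | 1,794 | 529 | 358 | 637 | 1,153
Python | 22,915 | 2,630 | 20,285 | 8,859 | 2,534 | 8,204 | 238 | 317 | 133
Rust | 17,537 | 2,945 | 14,592 | 4,405 | 1,150 | 5,542 | 641 | 1,620 | 1,234
TypeScript | 36,977 | 3,393 | 33,584 | 6,657 | 886 | 22,856 | 181 | 2,899 | 105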

fix.patch complexity

Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files
C | 7,734 | 281.22 | 15.40 | 4.93
C++ | 1,918 | 330.75 | 11.55 | 4.54
Go | 4,454 | 270.98 | 14.92 | 4.96
Java | 2,637 | 170.88 | 10.93 | 4.46
JavaScript | 3,668 | 73.28 | 6.21 | 2.76
Python | 2,518 | 135.78 | 9.99 | 3.44
Rust | 2,505 | 257.03 | 12.86 | 4.06
TypeScript | 3,391 | 159.62 | 9.09 | 4.09

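Statistical methodology

Difficulty scoring (difficulty_score)

For each valid task directory, solution/fix.patch, tests/, and instruction.md are read and scored statically by src/swegen/scoring.py, with no API calls.

The current formula uses continuous log-scale scoring so that medium-sized patches are not labeled hard too early. The weights are: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.

Label thresholds: easy if score <= 4.0, medium if 4.0 < score <= 7.0, hard if score > 7.0.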

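Tag generation and display

Tags are not computed by the dashboard at render time. An LLM generates them from the PR information when swegen builds a task, and they are written to [metadata].tags in task.toml.

The prompt asks for tags in three parts: programming language, project layer/domain, and framework/library name or specific topic. The dashboard only reads the existing task.toml files and counts each tag's occurrences and share per language.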

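fix.patch statistics

Patch statistics come from each valid task's solution/fix.patch, filtered to code files by language file extension, consistent with the code-only statistics in upload_march_swe_to_hf.py.

Avg fix.patch lines counts the added and deleted lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files touched.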
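
The three per-patch counts can be sketched as below. This assumes git-style diffs with `diff --git a/... b/...` headers; the function name `patch_stats` and the extension tuple are illustrative, and the real filter in upload_march_swe_to_hf.py may differ.

```python
def patch_stats(patch_text: str, extensions: tuple[str, ...]) -> dict[str, int]:
    """Count changed lines, hunks, and files for code files in a git-style diff."""
    files: set[str] = set()
    hunks = 0
    lines = 0
    in_code_file = False  # current file matches one of the extensions
    in_hunk = False       # past the first @@ header of the current file
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/path b/path
            path = line.rsplit(" b/", 1)[-1]
            in_code_file = path.endswith(extensions)
            in_hunk = False
            if in_code_file:
                files.add(path)
        elif not in_code_file:
            continue  # skip non-code files such as docs or lock files
        elif line.startswith("@@"):
            hunks += 1
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            lines += 1  # an added or deleted line inside a hunk
    return {"files": len(files), "hunks": hunks, "lines": lines}
```

Tracking `in_hunk` keeps the `--- a/...` and `+++ b/...` file headers out of the line count while still counting removed lines whose content happens to start with dashes. Averaging the three values over all valid tasks of a language would give the table columns.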

difficulty_label distribution

Language | easy | medium | hard
C | 735 | 5,189 | 1,802
C++ | 323 | 1,194 | 397
Go | 370 | 3,283 | 795
Java | 351 | 1,676 | 607
JavaScript | 593 | 2,726 | 348
Python | 211 | 1,724 | 560
Rust | 260 | 1,465 | 778
TypeScript | 346 | 2,570 | 475

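difficulty_score overview

Language | count | min | p25 | median | mean | p75 | max
C | 7,726 | 2.4 | 4.9 | 5.9 | 5.91 | 7.0 | 9.2
C++ | 1,914 | 2.5 | 4.4 | 5.6 | 5.63 | 6.8 | 9.0
Go | 4,448 | 2.6 | 4.9 | 5.8 | 5.81 | 6.7 | 9.1
Java | 2,634 | 2.8 | 4.7 | 5.9 | 5.81 | 6.9 | 9.2
JavaScript | 3,667 | 2.6 | 4.4 | 5.2 | 5.28 | 6.1 | 9.2
Python | 2,495 | 2.6 | 4.9 | 5.8 | 5.90 | 6.9 | 8.9
Rust | 2,503 | 2.7 | 4.9 | 6.2 | 6.11 | 7.4 | 9.0
TypeScript | 3,391 | 2.7 | 4.6 | 5.5 | 5.58 | 6.5 | 8.9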

Global top tags

library: 13,014 (45.2%)
backend: 8,878 (30.8%)
cli: 4,020 (14.0%)
frontend: 1,705 (5.9%)
testing: 1,240 (4.3%)
react: 917 (3.2%)
http: 907 (3.2%)
framework: 779 (2.7%)
embedded: 567 (2.0%)
cpp: 396 (1.4%)
networking: 361 (1.3%)
async: 328 (1.1%)
kubernetes: 265 (0.9%)
graphql: 231 (0.8%)
postgresql: 226 (0.8%)
eslint: 217 (0.8%)
parsing: 209 (0.7%)
aws: 182 (0.6%)
angular: 174 (0.6%)
kernel: 173 (0.6%)
compiler: 172 (0.6%)
firmware: 170 (0.6%)
quic: 166 (0.6%)
git: 165 (0.6%)
json: 162 (0.6%)
redis: 154 (0.5%)
aem: 147 (0.5%)
security: 142 (0.5%)
rust: 141 (0.5%)
tls: 138 (0.5%)

Tag distribution by language

C (c)

library: 4,080 (52.8%)
backend: 1,884 (24.4%)
cli: 938 (12.1%)
embedded: 562 (7.3%)
cpp: 395 (5.1%)
networking: 287 (3.7%)
testing: 213 (2.8%)
postgresql: 186 (2.4%)
kernel: 173 (2.2%)
framework: 172 (2.2%)
firmware: 170 (2.2%)
quic: 161 (2.1%)
http: 146 (1.9%)
rust: 131 (1.7%)
bluetooth: 121 (1.6%)
tls: 119 (1.5%)
ruby: 113 (1.5%)
scheduler: 110 (1.4%)
cryptography: 102 (1.3%)
python: 88 (1.1%)

C++ (cpp)

library: 1,413 (73.8%)
backend: 308 (16.1%)
testing: 245 (12.8%)
cli: 152 (7.9%)
boost: 92 (4.8%)
framework: 77 (4.0%)
http: 70 (3.7%)
async: 63 (3.3%)
compiler: 41 (2.1%)
parsing: 40 (2.1%)
serialization: 36 (1.9%)
templates: 29 (1.5%)
actor-framework: 28 (1.5%)
arrayfire: 28 (1.5%)
formatting: 27 (1.4%)
logging: 22 (1.1%)
redis: 22 (1.1%)
audio: 20 (1.0%)
sparql: 20 (1.0%)
iceberg: 19 (1.0%)

Go (go)

backend: 2,363 (53.1%)
cli: 1,197 (26.9%)
library: 992 (22.3%)
http: 337 (7.6%)
kubernetes: 233 (5.2%)
testing: 137 (3.1%)
docker: 98 (2.2%)
aws: 80 (1.8%)
framework: 71 (1.6%)
grpc: 70 (1.6%)
aws-sdk: 56 (1.3%)
dns: 45 (1.0%)
git: 45 (1.0%)
database: 44 (1.0%)
aws-lambda: 38 (0.9%)
security: 37 (0.8%)
blockchain: 36 (0.8%)
terraform: 36 (0.8%)
prometheus: 34 (0.8%)
templ: 34 (0.8%)

Java (java)

backend: 1,328 (50.4%)
library: 1,168 (44.3%)
aem: 147 (5.6%)
testing: 133 (5.0%)
spring: 106 (4.0%)
framework: 77 (2.9%)
http: 77 (2.9%)
android: 72 (2.7%)
json: 63 (2.4%)
sling: 57 (2.2%)
flink: 47 (1.8%)
cli: 40 (1.5%)
concurrency: 40 (1.5%)
grpc: 37 (1.4%)
nacos: 34 (1.3%)
mybatis: 33 (1.3%)
maven: 29 (1.1%)
parsing: 27 (1.0%)
sql-parser: 27 (1.0%)
websocket: 27 (1.0%)

JavaScript (js)

library: 1,797 (49.0%)
backend: 762 (20.8%)
frontend: 487 (13.3%)
cli: 436 (11.9%)
testing: 204 (5.6%)
react: 180 (4.9%)
eslint: 178 (4.9%)
framework: 176 (4.8%)
fastify: 118 (3.2%)
http: 105 (2.9%)
mongoose: 97 (2.6%)
webpack: 71 (1.9%)
typescript: 69 (1.9%)
express: 67 (1.8%)
lighthouse: 61 (1.7%)
svelte: 61 (1.7%)
aframe: 53 (1.4%)
async: 50 (1.4%)
apostrophecms: 48 (1.3%)
nodejs: 47 (1.3%)

Python (py)

backend: 1,095 (43.9%)
library: 916 (36.7%)
cli: 418 (16.7%)
ansible: 98 (3.9%)
fastapi: 78 (3.1%)
testing: 77 (3.1%)
framework: 66 (2.6%)
aiohttp: 62 (2.5%)
aws: 56 (2.2%)
django: 54 (2.2%)
http: 52 (2.1%)
click: 46 (1.8%)
dbt: 41 (1.6%)
black: 40 (1.6%)
async: 39 (1.6%)
openai: 39 (1.6%)
jinja2: 35 (1.4%)
pipx: 34 (1.4%)
litellm: 31 (1.2%)
pytorch: 31 (1.2%)

Rust (rust)

library: 1,409 (56.3%)
cli: 578 (23.1%)
backend: 458 (18.3%)
testing: 166 (6.6%)
http: 93 (3.7%)
git: 88 (3.5%)
async: 77 (3.1%)
compiler: 59 (2.4%)
graphql: 57 (2.3%)
parsing: 55 (2.2%)
datafusion: 53 (2.1%)
substrate: 48 (1.9%)
actix-web: 45 (1.8%)
macros: 45 (1.8%)
sql: 44 (1.8%)
framework: 41 (1.6%)
parquet: 39 (1.6%)
blockchain: 31 (1.2%)
clap: 28 (1.1%)
sqlparser: 28 (1.1%)

TypeScript (ts)

library: 1,239 (36.5%)
frontend: 1,081 (31.9%)
react: 733 (21.6%)
backend: 680 (20.1%)
cli: 261 (7.7%)
angular: 173 (5.1%)
graphql: 133 (3.9%)
framework: 99 (2.9%)
electron: 94 (2.8%)
fullstack: 86 (2.5%)
testing: 65 (1.9%)
github-actions: 59 (1.7%)
vue: 55 (1.6%)
express: 48 (1.4%)
javascript: 45 (1.3%)
nextjs: 42 (1.2%)
eslint: 39 (1.2%)
xstate: 37 (1.1%)
react-native: 36 (1.1%)
zod: 36 (1.1%)

Trajectory files
24
Total trajectories
65,965
Average message turns
61.5
Average composite_score
0.7449
Based on 29,596 scored trajectories

Trajectory file overview

Dataset | Scaffold | Model | Agent | Trajectories (main) | Trajectories (sub) | File size | Avg turns | Avg tokens | Avg tool calls | Avg score
swegen | Claude Code | glm5 | chaofan | 1,096 | 1,039 | 446 MB | 72.7 | 35,904 | 37.9 | 0.7218
swegen | OpenCode | glm5 | chaofan | 860 | 317 | 197 MB | 64.8 | 28,867 | 34.0 | 0.7337
swegen | OpenHands SDK | glm5 | chaofan | 1,140 | — | 280 MB | 119.6 | 44,172 | 59.8 | 0.7533
swegen | Terminus | glm5 | chaofan | 1,331 | — | 142 MB | 53.2 | 26,362 | 0.0 | 0.7048
swerebench_oraclesolved | Claude Code | glm5 | chaofan | 4,053 | — | 601 MB | 44.1 | 21,015 | 22.7 | —
swerebench_oraclesolved | OpenCode | glm5 | chaofan | 3,431 | — | 405 MB | 41.3 | 19,519 | 21.5 | —
swerebench_oraclesolved | OpenHands SDK | glm5 | chaofan | 2,808 | — | 638 MB | 135.1 | 40,631 | 67.2 | —
swerebench_oraclesolved | Terminus | glm5 | chaofan | 2,920 | — | 291 MB | 55.7 | 24,625 | 0.0 | —
swerebench_oraclesolved | Claude Code | glm5 | jierun | 5,013 | — | 779 MB | 44.7 | 22,630 | 23.3 | —
swerebench_oraclesolved | OpenCode | glm5 | jierun | 3,521 | — | 464 MB | 51.8 | 22,138 | 27.0 | —
swerebench_oraclesolved | OpenHands SDK | glm5 | jierun | 2,733 | — | 507 MB | 95.8 | 31,785 | 47.5 | —
swerebench_oraclesolved | Terminus | glm5 | jierun | 2,998 | — | 296 MB | 55.5 | 24,396 | 0.0 | —
swerebench_others | Claude Code | glm5 | jierun | 2,885 | — | 457 MB | 43.8 | 23,380 | 23.1 | 0.7178
swerebench_others | OpenCode | glm5 | jierun | 2,637 | — | 360 MB | 49.1 | 23,135 | 25.8 | 0.7210
swerebench_others | OpenHands SDK | glm5 | jierun | 1,843 | — | 340 MB | 90.2 | 32,078 | 44.8 | 0.7868
swerebench_others | Terminus | glm5 | jierun | 2,270 | — | 225 MB | 50.6 | 24,402 | 0.0 | 0.7844
swerebenchv2_python_oraclesolved | Claude Code | glm5 | jierun | 2,728 | — | 445 MB | 47.5 | 24,446 | 25.1 | —
swerebenchv2_python_oraclesolved | OpenCode | glm5 | jierun | 1,654 | — | 240 MB | 60.7 | 23,822 | 31.3 | —
swerebenchv2_python_oraclesolved | OpenHands SDK | glm5 | jierun | 1,521 | — | 313 MB | 103.0 | 35,858 | 51.0 | —
swerebenchv2_python_oraclesolved | Terminus | glm5 | jierun | 1,633 | — | 174 MB | 55.7 | 26,382 | 0.0 | —
v2nopy_full | Claude Code | glm5 | jierun | 5,128 | — | 818 MB | 42.7 | 24,097 | 22.8 | 0.7281
v2nopy_full | OpenCode | glm5 | jierun | 3,465 | — | 455 MB | 47.0 | 21,718 | 24.8 | 0.7353
v2nopy_full | OpenHands SDK | glm5 | jierun | 3,266 | — | 620 MB | 92.7 | 33,367 | 46.0 | 0.7580
v2nopy_full | Terminus | glm5 | jierun | 3,675 | — | 411 MB | 64.1 | 27,517 | 0.0 | 0.7799

Quality score statistics

Dataset | Scaffold | composite | efficiency | style | tool_mastery | completion | precision
swegen | Claude Code | 0.7218 | 0.9125 | 0.3414 | 0.8678 | 0.6213 | 0.7722
swegen | OpenCode | 0.7337 | 0.9322 | 0.3272 | 0.8886 | 0.6288 | 0.7923
swegen | OpenHands SDK | 0.7533 | 0.8975 | 0.3500 | 0.8537 | 0.7737 | 0.7632
swegen | Terminus | 0.7048 | 0.9241 | 0.4228 | 0.8983 | 0.3960 | 0.8866
swerebench_others | Claude Code | 0.7178 | 0.8864 | 0.3197 | 0.7942 | 0.6405 | 0.8924
swerebench_others | OpenCode | 0.7210 | 0.8952 | 0.3171 | 0.8093 | 0.6509 | 0.8621
swerebench_others | OpenHands SDK | 0.7868 | 0.8886 | 0.3693 | 0.7698 | 0.9333 | 0.8525
swerebench_others | Terminus | 0.7844 | 0.8539 | 0.4292 | 0.8172 | 0.8203 | 0.9323
v2nopy_full | Claude Code | 0.7281 | 0.9186 | 0.3213 | 0.8886 | 0.5565 | 0.8996
v2nopy_full | OpenCode | 0.7353 | 0.9278 | 0.3254 | 0.9008 | 0.5766 | 0.8772
v2nopy_full | OpenHands SDK | 0.7580 | 0.9002 | 0.3555 | 0.8440 | 0.7523 | 0.8370
v2nopy_full | Terminus | 0.7799 | 0.9222 | 0.4310 | 0.9016 | 0.6930 | 0.8808

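Comparison by scaffold

Scaffold | Total trajectories | Avg turns | Avg tokens | Avg tool calls | Avg score
Claude Code | 20,903 | 45.8 | 23,713 | 24.0 | 0.7241
OpenCode | 15,568 | 49.6 | 22,187 | 25.9 | 0.7297
OpenHands SDK | 13,311 | 105.4 | 35,606 | 52.4 | 0.7656
Terminus | 14,827 | 56.7 | 25,611 | 0.0 | 0.7675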

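Comparison by dataset

Dataset | Total trajectories | Avg turns | Avg tokens | Avg tool calls | Avg score
swegen | 4,427 | 77.4 | 33,797 | 31.4 | 0.7271
swerebench_oraclesolved | 27,477 | 61.8 | 25,095 | 25.3 | —
swerebench_others | 9,635 | 55.7 | 25,217 | 22.6 | 0.7475
swerebenchv2_python_oraclesolved | 7,536 | 63.4 | 27,032 | 26.2 | —
v2nopy_full | 15,534 | 59.2 | 26,324 | 22.7 | 0.7483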

Source data paths

Dataset | Scaffold | Agent | File path | Size | Records
swegen | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_cc_2135.jsonl | 446 MB | 2,135
swegen | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_oc_1177.jsonl | 197 MB | 1,177
swegen | OpenHands SDK | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_ohsdk_1140.jsonl | 280 MB | 1,140
swegen | Terminus | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_t2_1331.jsonl | 142 MB | 1,331
swerebench_oraclesolved | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_cc_4053.jsonl | 601 MB | 4,053
swerebench_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_cc_5013.jsonl | 779 MB | 5,013
swerebench_oraclesolved | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oc_3431.jsonl | 405 MB | 3,431
swerebench_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oc_3521.jsonl | 464 MB | 3,521
swerebench_oraclesolved | OpenHands SDK | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oh_2808.jsonl | 638 MB | 2,808
swerebench_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oh_sdk_2733.jsonl | 507 MB | 2,733
swerebench_oraclesolved | Terminus | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_t2_2920.jsonl | 291 MB | 2,920
swerebench_oraclesolved | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_t2_2998.jsonl | 296 MB | 2,998
swerebench_others | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_cc_2885.jsonl | 457 MB | 2,885
swerebench_others | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oc_2637.jsonl | 360 MB | 2,637
swerebench_others | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oh_sdk_1843.jsonl | 340 MB | 1,843
swerebench_others | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_t2_2270.jsonl | 225 MB | 2,270
swerebenchv2_python_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_cc_2728.jsonl | 445 MB | 2,728
swerebenchv2_python_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oc_1654.jsonl | 240 MB | 1,654
swerebenchv2_python_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oh_sdk_1521.jsonl | 313 MB | 1,521
swerebenchv2_python_oraclesolved | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_t2_1633.jsonl | 174 MB | 1,633
v2nopy_full | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_cc_5128.jsonl | 818 MB | 5,128
v2nopy_full | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oc_3465.jsonl | 455 MB | 3,465
v2nopy_full | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oh_sdk_3266.jsonl | 620 MB | 3,266
v2nopy_full | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_t2_3675.jsonl | 411 MB | 3,675

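Statistical methodology

Trajectory counts

Each line of a JSONL file is one trajectory. Files that have an _agent_type field are split into main and subagent trajectories; in files without that field, every line counts as main.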
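
The counting rule can be sketched as below. The values stored in `_agent_type` are not documented on this page, so the sketch assumes that `"main"` marks a main trajectory and any other value marks a subagent; the function name is illustrative.

```python
import json
from collections import Counter
from typing import Iterable


def count_trajectories(jsonl_lines: Iterable[str]) -> dict[str, int]:
    """Count main and subagent trajectories in the lines of one JSONL file."""
    counts = Counter(main=0, sub=0)
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(raw)
        agent_type = record.get("_agent_type")
        # Assumption: a missing field or the value "main" means a main trajectory.
        if agent_type is None or agent_type == "main":
            counts["main"] += 1
        else:
            counts["sub"] += 1
    return dict(counts)


lines = ['{"_agent_type": "main"}', '{"_agent_type": "subagent"}', '{"messages": []}']
print(count_trajectories(lines))
```

In practice the lines would come from iterating over an open file handle, which avoids loading a file of several hundred MB into memory at once.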

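Average turns / tokens / tool calls

Average turns: the mean length of each trajectory's messages array.

Average tokens (estimated): the mean, per trajectory, of the total character count of content + reasoning_content across all messages, divided by 4.

Average tool calls: the mean, per trajectory, of the summed lengths of the tool_calls arrays in assistant messages.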
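
The three definitions above can be sketched as follows. The field names `messages`, `content`, `reasoning_content`, `role`, and `tool_calls` come from this page; the function names are illustrative, and the sketch assumes `content` is a plain string (non-string content is ignored).

```python
def trajectory_metrics(record: dict) -> dict[str, float]:
    """Compute turns, estimated tokens, and tool calls for one trajectory."""
    messages = record.get("messages", [])
    chars = 0
    tool_calls = 0
    for message in messages:
        for key in ("content", "reasoning_content"):
            value = message.get(key)
            if isinstance(value, str):  # assumption: only plain strings are counted
                chars += len(value)
        if message.get("role") == "assistant":
            tool_calls += len(message.get("tool_calls") or [])
    # Token estimate: total characters divided by 4.
    return {"turns": len(messages), "tokens": chars / 4, "tool_calls": tool_calls}


def average_metrics(records: list[dict]) -> dict[str, float]:
    """Average the per-trajectory metrics over all records."""
    per_record = [trajectory_metrics(r) for r in records]
    if not per_record:
        return {"turns": 0.0, "tokens": 0.0, "tool_calls": 0.0}
    n = len(per_record)
    return {key: sum(m[key] for m in per_record) / n
            for key in ("turns", "tokens", "tool_calls")}
```

The divide-by-4 rule is a rough character-based estimate rather than a tokenizer count, so it is best read as a relative size indicator across scaffolds.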

Quality scores

composite_score (0-1) is a weighted combination of five dimensions: efficiency, style, tool_mastery, completion, and precision.

Only some files contain a _score field; files without scores show "—".
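
A minimal sketch of averaging only the scored trajectories follows. The layout of `_score` is not described on this page, so treating it as a mapping with a `composite_score` key is an assumption, as are the function names; the dimension weights are not given here and are not modeled.

```python
from typing import Optional


def mean_composite(records: list[dict], key: str = "composite_score") -> Optional[float]:
    """Average the composite score over records that carry a _score mapping.

    Assumption: `_score` is a dict holding the composite value under `key`.
    Returns None when no record has a score.
    """
    scores = [
        record["_score"][key]
        for record in records
        if isinstance(record.get("_score"), dict) and key in record["_score"]
    ]
    if not scores:
        return None
    return sum(scores) / len(scores)


def render_score(value: Optional[float]) -> str:
    """Format a mean score for display; unscored files show the placeholder."""
    return "—" if value is None else f"{value:.4f}"


records = [{"_score": {"composite_score": 0.8}}, {"messages": []}]
print(render_score(mean_composite(records)))
```

Skipping unscored records, rather than counting them as zero, keeps the average comparable across datasets where only part of the files were scored.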