SWE 任务和轨迹进度看板

最后更新时间:2026-06-15 18:11:28 BJT | 下次刷新:2026-06-15 19:11:29 BJT | 刷新间隔:3600 秒

收集 PR 总数
558,766
1h 0 / 24h +1,707
有效 SWE 总数
46,456
1h +31 / 24h +1,045
整体处理成功率
10.0%
Valid SWE / 已处理 462,633
difficulty_score 均值
5.88
median 5.8,count 46,402

语言进度

语言收集 PR过去 1h过去 24h有效 SWE过去 1h过去 24h已处理处理成功率
Cc29,5880+1199,1090+16229,516
30.9%
C++cpp45,9430+3313,736+7+9345,582
8.2%
Gogo126,439007,578+2+4673,381
10.3%
Javajava85,4060+5383,646+9+15060,927
6.0%
JavaScriptjs37,1280+7196,689+9+29348,432
13.8%
Pythonpy98,883004,5630+2279,890
5.7%
Rustrust68,650005,195+1+6968,604
7.6%
TypeScriptts66,729005,940+3+21056,301
10.6%

运行参数

语言评估模型 (OPENAI)填充模型 (ANTHROPIC)并发数min_source_filesmax_source_files
Cgpt-5.4claude-opus-4-712215
C++glm-5claude-sonnet-4-68215
GoQwen3.6-35B-A3BQwen3.6-35B-A3B12210
Javaclaude-haiku-4-5-20251001claude-opus-4-78210
JavaScriptQwen3.6-35B-A3BQwen3.6-35B-A3B12210
PythonQwen3.6-35B-A3BQwen3.6-35B-A3B12315
RustQwen3.6-35B-A3BQwen3.6-35B-A3B8210
TypeScriptQwen3.6-35B-A3BQwen3.6-35B-A3B12210

失败原因统计

语言已处理有效 SWE失败trivial_prvalidationinfra_errortimeoutworkflow_error其他
C29,5169,10920,40714,6409605,01514311
C++45,5823,73641,8468,8871,99631,931320579266
Go73,3817,57865,80320,8236,82235,1221,576780652
Java60,9273,64657,28115,7976,52131,7021,2627431,602
JavaScript48,4326,68941,74320,0981,73718,8797832460
Python79,8904,56375,32724,8045,72045,111894355120
Rust68,6045,19563,40918,9565,25836,4131,3118631,234
TypeScript56,3015,94050,36113,4053,24931,4841,40181111

trivial_pr:PR 被 LLM 评估为过于简单(如仅修改配置、文档、依赖版本等),不适合作为 SWE 任务。

validation:任务生成后验证失败(NOP agent 未返回 reward=0 或 ORACLE agent 未返回 reward=1)。

infra_error:基础设施错误(Docker 构建失败、网络超时、磁盘空间不足等)。

timeout:处理超时(单个 PR 总超时或 Claude Code session 超时)。

workflow_error:工作流程错误(PR 元数据获取失败、worktree 创建失败、patch 生成失败等)。

fix.patch 复杂度

语言Valid SWE CountAvg fix.patch linesAvg fix.patch hunksAvg fix.patch files
C9,109306.3916.675.47
C++3,736293.8513.585.09
Go7,578219.5412.964.42
Java3,646167.9810.774.34
JavaScript6,68976.726.262.78
Python4,563151.6610.983.81
Rust5,195228.1313.184.09
TypeScript5,940159.559.584.15

统计方法说明

难度打分 difficulty_score

读取每个有效任务目录的 solution/fix.patchtests/instruction.md,由 src/swegen/scoring.py 使用零 API 静态评分。

当前公式采用 log-scale 连续评分,避免中等规模 patch 过早变成 hard。权重为:patch_scope 38%logic_complexity 32%context_breadth 15%test_complexity 10%instruction_complexity 5%

label 阈值:easy <= 4.0medium <= 7.0hard > 7.0

Tags 生成与展示

tags 不是看板现场计算的,而是在 swegen 构建任务时由 LLM 根据 PR 信息生成,并写入 task.toml[metadata].tags

prompt 要求 tags 按三段式生成:编程语言、项目层级/领域、框架/库名或具体主题。看板只读取已有 task.toml 并统计每个语言的 tag 出现次数和占比。

fix.patch 统计

patch 统计来自每个有效任务的 solution/fix.patch,并按语言扩展名过滤代码文件,口径与 upload_march_swe_to_hf.py 的 code-only 统计保持一致。

Avg fix.patch lines 统计代码文件 diff 中新增/删除行数;Avg fix.patch hunks 统计 @@ hunk 数;Avg fix.patch files 统计涉及的代码文件数。

difficulty_label 分布

语言easy / medium / hardeasymediumhard
C
821 / 6042 / 2237
8216,0422,237
C++
418 / 2305 / 1007
4182,3051,007
Go
583 / 5496 / 1493
5835,4961,493
Java
410 / 2340 / 891
4102,340891
JavaScript
1001 / 4979 / 707
1,0014,979707
Python
253 / 2952 / 1335
2532,9521,335
Rust
370 / 3067 / 1756
3703,0671,756
TypeScript
527 / 4411 / 1001
5274,4111,001

difficulty_score 概览

语言countminp25medianmeanp75max
C9,1002.44.96.05.967.09.2
C++3,7302.54.86.05.977.19.1
Go7,5722.64.95.85.866.89.1
Java3,6412.84.85.95.907.09.2
JavaScript6,6872.64.45.25.356.29.2
Python4,5402.65.26.26.197.39.1
Rust5,1932.75.26.36.267.49.0
TypeScript5,9392.74.75.65.716.69.1

全局 Top Tags

library21,557 (46.4%)
backend14,248 (30.7%)
cli6,362 (13.7%)
frontend2,818 (6.1%)
testing1,975 (4.3%)
http1,454 (3.1%)
react1,311 (2.8%)
framework1,128 (2.4%)
kubernetes658 (1.4%)
embedded632 (1.4%)
async630 (1.4%)
networking555 (1.2%)
cpp522 (1.1%)
typescript336 (0.7%)
eslint327 (0.7%)
parsing326 (0.7%)
graphql324 (0.7%)
postgresql284 (0.6%)
git273 (0.6%)
compiler272 (0.6%)
security271 (0.6%)
aws254 (0.5%)
database252 (0.5%)
json238 (0.5%)
redis223 (0.5%)
fastapi206 (0.4%)
angular202 (0.4%)
cryptography201 (0.4%)
api194 (0.4%)
fullstack193 (0.4%)

每语言 Tags 分布

C c

library4,769 (52.4%)
backend2,312 (25.4%)
cli1,079 (11.8%)
embedded608 (6.7%)
cpp520 (5.7%)
networking353 (3.9%)
testing327 (3.6%)
framework252 (2.8%)
postgresql202 (2.2%)
http186 (2.0%)
firmware184 (2.0%)
kernel177 (1.9%)
ruby172 (1.9%)
quic169 (1.9%)
rust138 (1.5%)
tls128 (1.4%)
bluetooth126 (1.4%)
scheduler111 (1.2%)
python109 (1.2%)
cryptography107 (1.2%)

C++ cpp

library2,615 (70.1%)
backend652 (17.5%)
testing462 (12.4%)
cli301 (8.1%)
framework176 (4.7%)
http125 (3.3%)
boost113 (3.0%)
async89 (2.4%)
parsing66 (1.8%)
qt60 (1.6%)
serialization53 (1.4%)
compiler49 (1.3%)
geometry48 (1.3%)
networking48 (1.3%)
frontend37 (1.0%)
json37 (1.0%)
logging37 (1.0%)
ros237 (1.0%)
templates35 (0.9%)
formatting33 (0.9%)

Go go

backend3,988 (52.7%)
cli1,867 (24.7%)
library1,850 (24.4%)
kubernetes605 (8.0%)
http497 (6.6%)
testing209 (2.8%)
docker126 (1.7%)
aws116 (1.5%)
grpc112 (1.5%)
terraform108 (1.4%)
prometheus92 (1.2%)
networking89 (1.2%)
database84 (1.1%)
framework82 (1.1%)
api77 (1.0%)
git76 (1.0%)
security67 (0.9%)
aws-sdk65 (0.9%)
blockchain54 (0.7%)
configuration52 (0.7%)

Java java

backend1,783 (48.9%)
library1,641 (45.0%)
testing190 (5.2%)
aem147 (4.0%)
spring143 (3.9%)
framework138 (3.8%)
http125 (3.4%)
android107 (2.9%)
json66 (1.8%)
cli57 (1.6%)
sling57 (1.6%)
flink48 (1.3%)
maven45 (1.2%)
concurrency43 (1.2%)
grpc43 (1.2%)
kafka43 (1.2%)
mybatis37 (1.0%)
nacos34 (0.9%)
frontend32 (0.9%)
serialization32 (0.9%)

JavaScript js

library3,589 (53.7%)
backend1,163 (17.4%)
frontend941 (14.1%)
cli805 (12.0%)
react327 (4.9%)
testing281 (4.2%)
typescript275 (4.1%)
eslint251 (3.8%)
framework194 (2.9%)
http170 (2.5%)
fastify142 (2.1%)
webpack140 (2.1%)
mongoose101 (1.5%)
nodejs98 (1.5%)
svelte91 (1.4%)
express90 (1.3%)
lighthouse62 (0.9%)
vue61 (0.9%)
async60 (0.9%)
stylelint59 (0.9%)

Python py

backend1,914 (42.1%)
library1,889 (41.6%)
cli637 (14.0%)
fastapi185 (4.1%)
django122 (2.7%)
testing103 (2.3%)
ansible99 (2.2%)
pytorch95 (2.1%)
async90 (2.0%)
framework90 (2.0%)
http88 (1.9%)
aws70 (1.5%)
aiohttp65 (1.4%)
openai53 (1.2%)
click51 (1.1%)
pydantic51 (1.1%)
litellm49 (1.1%)
beets48 (1.1%)
flask46 (1.0%)
black42 (0.9%)

Rust rust

library2,913 (56.1%)
backend1,109 (21.4%)
cli1,074 (20.7%)
testing288 (5.5%)
async214 (4.1%)
http197 (3.8%)
compiler117 (2.3%)
git116 (2.2%)
parsing89 (1.7%)
macros88 (1.7%)
graphql75 (1.4%)
serde69 (1.3%)
substrate68 (1.3%)
blockchain66 (1.3%)
sql62 (1.2%)
framework59 (1.1%)
datafusion56 (1.1%)
cryptography55 (1.1%)
lsp53 (1.0%)
database52 (1.0%)

TypeScript ts

library2,291 (38.6%)
frontend1,635 (27.5%)
backend1,327 (22.3%)
react976 (16.4%)
cli542 (9.1%)
angular196 (3.3%)
graphql161 (2.7%)
framework137 (2.3%)
electron128 (2.2%)
javascript124 (2.1%)
fullstack122 (2.1%)
testing115 (1.9%)
vue90 (1.5%)
eslint76 (1.3%)
express76 (1.3%)
github-actions76 (1.3%)
nextjs75 (1.3%)
vscode68 (1.1%)
http66 (1.1%)
mcp63 (1.1%)
轨迹文件数
27
轨迹总条数
79,279
平均消息轮数
67.0
平均 composite_score
0.7449
基于 29,596 条有分数的轨迹

轨迹文件总览

数据集脚手架模型Owner轨迹数文件大小平均轮数平均 Token平均 Tool Calls平均 Score
swegenClaude Codeglm5chaofan1,096446 MB72.739,64837.90.7218
swegenOpenCodeglm5chaofan860197 MB64.830,57734.00.7337
swegenOpenHands SDKglm5chaofan1,140280 MB119.649,81159.80.7533
swegenTerminus-2glm5chaofan1,331142 MB53.226,96841.20.7048
swerebench_oraclesolvedClaude Codeglm5chaofan4,053601 MB44.122,11722.7
swerebench_oraclesolvedOpenCodeglm5chaofan3,431405 MB41.319,70421.5
swerebench_oraclesolvedOpenHands-AIglm5chaofan2,808638 MB135.136,43467.2
swerebench_oraclesolvedTerminus-2glm5chaofan2,920291 MB55.725,47336.9
swegen_selfmade_260301_260414OpenHands SDKglm5jierun4,555993 MB98.541,83949.1
swegen_selfmade_260301_260414Terminus-2glm5jierun4,805659 MB75.834,44854.4
swegen_selfmade_260415_260505OpenHands SDKglm5jierun3,954901 MB108.843,86754.2
swerebench_oraclesolvedClaude Codeglm5jierun5,013779 MB44.724,12323.3
swerebench_oraclesolvedOpenCodeglm5jierun3,521464 MB51.822,44727.0
swerebench_oraclesolvedOpenHands SDKglm5jierun2,733507 MB95.835,56747.5
swerebench_oraclesolvedTerminus-2glm5jierun2,998296 MB55.525,25737.0
swerebench_othersClaude Codeglm5jierun2,885457 MB43.825,19423.10.7178
swerebench_othersOpenCodeglm5jierun2,637360 MB49.123,63725.80.7210
swerebench_othersOpenHands SDKglm5jierun1,843340 MB90.235,60244.80.7868
swerebench_othersTerminus-2glm5jierun2,270225 MB50.624,99035.10.7844
swerebenchv2_python_oraclesolvedClaude Codeglm5jierun2,728445 MB47.526,06425.1
swerebenchv2_python_oraclesolvedOpenCodeglm5jierun1,654240 MB60.723,89831.3
swerebenchv2_python_oraclesolvedOpenHands SDKglm5jierun1,521313 MB103.040,23251.0
swerebenchv2_python_oraclesolvedTerminus-2glm5jierun1,633174 MB55.727,01337.6
v2nopy_fullClaude Codeglm5jierun5,128818 MB42.726,94122.80.7281
v2nopy_fullOpenCodeglm5jierun3,465455 MB47.023,17224.80.7353
v2nopy_fullOpenHands SDKglm5jierun3,266620 MB92.738,25246.00.7580
v2nopy_fullTerminus-2glm5jierun3,675411 MB64.128,36545.60.7799

质量评分统计

数据集脚手架compositeefficiencystyletool_masterycompletionprecision
swegenClaude Code0.72180.91250.34140.86780.62130.7722
swegenOpenCode0.73370.93220.32720.88860.62880.7923
swegenOpenHands SDK0.75330.89750.35000.85370.77370.7632
swegenTerminus-20.70480.92410.42280.89830.39600.8866
swerebench_othersClaude Code0.71780.88640.31970.79420.64050.8924
swerebench_othersOpenCode0.72100.89520.31710.80930.65090.8621
swerebench_othersOpenHands SDK0.78680.88860.36930.76980.93330.8525
swerebench_othersTerminus-20.78440.85390.42920.81720.82030.9323
v2nopy_fullClaude Code0.72810.91860.32130.88860.55650.8996
v2nopy_fullOpenCode0.73530.92780.32540.90080.57660.8772
v2nopy_fullOpenHands SDK0.75800.90020.35550.84400.75230.8370
v2nopy_fullTerminus-20.77990.92220.43100.90160.69300.8808

按数据集对比

数据集总轨迹数平均轮数平均 Token平均 Tool Calls平均 Score
swegen4,42777.436,69143.80.7271
swegen_selfmade_260301_2604149,36086.838,04451.8
swegen_selfmade_260415_2605053,954108.843,86754.2
swerebench_oraclesolved27,47761.825,72433.3
swerebench_others9,63555.726,71130.80.7475
swerebenchv2_python_oraclesolved7,53663.428,65434.4
v2nopy_full15,53459.228,81533.50.7483

源数据路径目录

数据集脚手架Owner文件路径大小条数
swegenClaude Codechaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_cc_2135.jsonl446 MB2,135
swegenOpenCodechaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_oc_1177.jsonl197 MB1,177
swegenOpenHands SDKchaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_ohsdk_1140.jsonl280 MB1,140
swegenTerminus-2chaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_t2_1331.jsonl142 MB1,331
swegen_selfmade_260301_260414OpenHands SDKjierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swegen_selfmade_260301_260414_oh_sdk_4555.jsonl993 MB4,555
swegen_selfmade_260301_260414Terminus-2jierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swegen_selfmade_260301_260414_t2_4805.jsonl659 MB4,805
swegen_selfmade_260415_260505OpenHands SDKjierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swegen_selfmade_260415_260505_oh_sdk_3954.jsonl901 MB3,954
swerebench_oraclesolvedClaude Codechaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_cc_4053.jsonl601 MB4,053
swerebench_oraclesolvedClaude Codejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_cc_5013.jsonl779 MB5,013
swerebench_oraclesolvedOpenCodechaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oc_3431.jsonl405 MB3,431
swerebench_oraclesolvedOpenCodejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oc_3521.jsonl464 MB3,521
swerebench_oraclesolvedOpenHands-AIchaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oh_2808.jsonl638 MB2,808
swerebench_oraclesolvedOpenHands SDKjierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oh_sdk_2733.jsonl507 MB2,733
swerebench_oraclesolvedTerminus-2chaofan/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_t2_2920.jsonl291 MB2,920
swerebench_oraclesolvedTerminus-2jierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_t2_2998.jsonl296 MB2,998
swerebench_othersClaude Codejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_cc_2885.jsonl457 MB2,885
swerebench_othersOpenCodejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oc_2637.jsonl360 MB2,637
swerebench_othersOpenHands SDKjierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oh_sdk_1843.jsonl340 MB1,843
swerebench_othersTerminus-2jierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_t2_2270.jsonl225 MB2,270
swerebenchv2_python_oraclesolvedClaude Codejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_cc_2728.jsonl445 MB2,728
swerebenchv2_python_oraclesolvedOpenCodejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oc_1654.jsonl240 MB1,654
swerebenchv2_python_oraclesolvedOpenHands SDKjierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oh_sdk_1521.jsonl313 MB1,521
swerebenchv2_python_oraclesolvedTerminus-2jierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_t2_1633.jsonl174 MB1,633
v2nopy_fullClaude Codejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_cc_5128.jsonl818 MB5,128
v2nopy_fullOpenCodejierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oc_3465.jsonl455 MB3,465
v2nopy_fullOpenHands SDKjierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oh_sdk_3266.jsonl620 MB3,266
v2nopy_fullTerminus-2jierun/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_t2_3675.jsonl411 MB3,675

统计方法说明

平均轮数 / Token / Tool Calls

平均轮数:每条轨迹的 messages 数组长度的平均值。

平均 Token:使用 tiktoken cl100k_base tokenizer 对所有 message 的 content + reasoning_content 精确编码计数的平均值。

平均 Tool Calls:assistant 消息中 tool_calls 数组长度之和的平均值。对 Terminus-2 脚手架,统计 assistant 消息 JSON content 中 commands 数组的长度。

质量评分

composite_score(0-1)由五个维度加权:efficiency(效率)、style(风格)、tool_mastery(工具掌握)、completion(完成度)、precision(精确度)。

仅部分文件包含 _score 字段,无分数的文件显示 "—"。