SWE Task and Trajectory Progress Dashboard

Last updated: 2026-07-04 07:47:40 BJT | Next refresh: 2026-07-04 08:47:40 BJT | Refresh interval: 3600 seconds

Total PRs collected: 557,067 (1h -41,710 / 24h -41,703)
Total valid SWE: 51,808 (1h +55 / 24h +900)
Overall processing success rate: 8.8% (valid SWE / processed; 588,824 processed)
Mean difficulty_score: 5.89 (median 5.9, count 51,758)

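Progress by language

Language | PRs collected | Past 1h | Past 24h | Valid SWE | Past 1h | Past 24h | Processed | Success rate
C (c) | 29,469 | -3,512 | -3,512 | 10,127 | +35 | +411 | 32,981 | 30.7%
C++ (cpp) | 45,620 | -3,543 | -3,543 | 4,062 | 0 | 0 | 49,019 | 8.3%
Go (go) | 126,439 | -6,586 | -6,583 | 8,386 | 0 | +6 | 123,518 | 6.8%
Java (java) | 84,868 | -5,968 | -5,967 | 4,260 | +1 | +102 | 90,730 | 4.7%
JavaScript (js) | 36,409 | -3,657 | -3,656 | 7,257 | 0 | 0 | 40,065 | 18.1%
Python (py) | 98,883 | -9,462 | -9,462 | 5,445 | +16 | +326 | 108,256 | 5.0%
Rust (rust) | 68,650 | -3,927 | -3,927 | 5,595 | 0 | 0 | 72,539 | 7.7%
TypeScript (ts) | 66,729 | -5,055 | -5,053 | 6,676 | +3 | +55 | 71,716 | 9.3%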
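
The success-rate column is consistent with valid SWE divided by processed, expressed as a percentage. The following is a minimal sketch of that arithmetic using figures from the table above; the function name `success_rate` is hypothetical and not taken from the dashboard's code.

```python
def success_rate(valid_swe: int, processed: int) -> float:
    """Return valid / processed as a percentage; 0.0 when nothing was processed."""
    if processed <= 0:
        return 0.0
    return 100.0 * valid_swe / processed


# (valid SWE, processed) pairs copied from the dashboard table.
rows = {
    "C": (10_127, 32_981),
    "Go": (8_386, 123_518),
    "Python": (5_445, 108_256),
}
rates = {lang: round(success_rate(v, p), 1) for lang, (v, p) in rows.items()}

# Overall rate from the summary figures: 51,808 valid out of 588,824 processed.
overall = round(success_rate(51_808, 588_824), 1)
```

Rounded to one decimal place, these reproduce the 30.7%, 6.8%, 5.0%, and 8.8% shown on the dashboard.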

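Run parameters

Language | Evaluation model (OPENAI) | Fill model (ANTHROPIC) | Concurrency | min_source_files | max_source_files
C | glm-5 | claude-opus-4-7 | 12 | 2 | 15
C++ | glm-5 | claude-opus-4-7 | 8 | 2 | 15
Go | glm-5 | claude-opus-4-7 | 12 | 2 | 10
Java | glm-5 | claude-opus-4-7 | 8 | 2 | 10
JavaScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10
Python | glm-5 | claude-opus-4-7 | 12 | 3 | 15
Rust | glm-5 | claude-opus-4-7 | 8 | 2 | 10
TypeScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10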

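Failure reason statistics

Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | other
C | 32,981 | 10,127 | 22,854 | 19,443 | 1,146 | 2,477 | [9312]
C++ | 49,019 | 4,062 | 44,957 | 8,778 | 6,809 | 30,529 | 159 | 662 | 266
Go | 123,518 | 8,386 | 115,132 | 38,741 | 24,048 | 49,742 | 1,461 | 1,033 | 108
Java | 90,730 | 4,260 | 86,470 | 26,982 | 11,012 | 41,218 | 1,086 | 5,009 | 1,599
JavaScript | 40,065 | 7,257 | 32,808 | 20,724 | 93 | 13,349 | [1150]
Python | 108,256 | 5,445 | 102,811 | 44,785 | 15,519 | 42,956 | 836 | 339 | 120
Rust | 72,539 | 5,595 | 66,944 | 30,149 | 10,205 | 24,100 | 1,077 | 846 | 1,233
TypeScript | 71,716 | 6,676 | 65,040 | 22,089 | 13,939 | 26,824 | 1,769 | [7118]

Bracketed values are the remaining columns of that row run together without separators; the individual counts cannot be recovered unambiguously.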

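trivial_pr: the PR was judged by the LLM to be too simple (for example, it only changes configuration, documentation, or dependency versions) and is unsuitable as an SWE task.

validation: validation failed after the task was generated (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1).

infra_error: infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).

timeout: processing timed out (either the overall per-PR timeout or the Claude Code session timeout).

workflow_error: workflow error (failure to fetch PR metadata, create the worktree, generate the patch, etc.).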
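
The validation rule above can be sketched as a single predicate. This is only an illustration: the function name `passes_validation` and the use of `None` for a missing reward are assumptions, not the pipeline's actual interface.

```python
from typing import Optional


def passes_validation(nop_reward: Optional[float],
                      oracle_reward: Optional[float]) -> bool:
    """A task passes only if the NOP agent scores 0 and the ORACLE agent scores 1.

    A missing reward (None) is treated as a failure in this sketch.
    """
    return nop_reward == 0 and oracle_reward == 1


# Example outcomes (hypothetical reward pairs).
ok = passes_validation(0, 1)            # both agents behave as required
nop_passed = passes_validation(1, 1)    # NOP agent unexpectedly got reward 1
oracle_failed = passes_validation(0, 0) # ORACLE agent did not reach reward 1
```

Requiring both conditions means the tests must fail when nothing is changed and pass with the reference outcome, which is what makes the task a usable signal.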

fix.patch complexity

Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files
C | 10,127 | 331.75 | 17.79 | 5.81
C++ | 4,062 | 286.04 | 13.74 | 5.09
Go | 8,386 | 212.01 | 12.56 | 4.34
Java | 4,260 | 166.72 | 10.70 | 4.27
JavaScript | 7,257 | 77.11 | 6.31 | 2.80
Python | 5,445 | 154.31 | 11.12 | 3.90
Rust | 5,595 | 226.35 | 13.19 | 4.11
TypeScript | 6,676 | 155.30 | 9.53 | 4.11

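Methodology notes

Difficulty scoring: difficulty_score

For each valid task directory, solution/fix.patch, tests/, and instruction.md are read and scored statically by src/swegen/scoring.py, with zero API calls.

The current formula uses log-scale continuous scoring so that medium-sized patches do not become hard too early. The weights are: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.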
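
As a rough illustration of how such a weighted, log-scaled score could be assembled, here is a minimal sketch. Only the five weights come from the text above. The `log_scale` normalization, the 0 to 10 sub-score range, and the `cap` parameter are assumptions made for the example; the actual formula in src/swegen/scoring.py is not shown on the dashboard and may differ.

```python
import math

# Weights as listed on the dashboard; they sum to 1.0.
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def log_scale(value: float, cap: float) -> float:
    """Map a raw count onto 0..10 logarithmically (hypothetical normalization).

    Values at or above `cap` saturate at 10, so growth slows as counts get large.
    """
    if value <= 0:
        return 0.0
    ratio = math.log1p(value) / math.log1p(cap)
    return 10.0 * min(1.0, ratio)


def difficulty_score(components: dict[str, float]) -> float:
    """Weighted sum of per-component sub-scores, each assumed to lie in 0..10."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Example: a 150-line patch on an assumed cap of 2000 lines, other parts mid-range.
example = {name: 5.0 for name in WEIGHTS}
example["patch_scope"] = log_scale(150, cap=2000)
example_score = difficulty_score(example)
```

With a logarithmic mapping, doubling the patch size adds a roughly constant increment to the sub-score rather than doubling it, which is one way to keep medium-sized patches out of the hard band.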

Label thresholds: easy: score <= 4.0; medium: 4.0 < score <= 7.0; hard: score > 7.0.
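
The thresholds translate directly into a small mapping function. The name `difficulty_label` below mirrors the dashboard's field name, but the function itself is a sketch rather than the code in scoring.py.

```python
def difficulty_label(score: float) -> str:
    """Map a difficulty_score to a label using the dashboard's thresholds."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"


# Boundary values fall into the lower bucket because both cut-offs are inclusive.
labels = [difficulty_label(s) for s in (2.4, 4.0, 5.89, 7.0, 7.1)]
```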

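Tag generation and display

Tags are not computed by the dashboard at display time. They are generated by an LLM from the PR information when swegen builds the task, and written to [metadata].tags in task.toml.

The prompt requires tags to follow a three-part scheme: programming language, project layer/domain, and framework/library name or specific topic. The dashboard only reads the existing task.toml files and reports, for each language, how often each tag occurs and its share.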

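fix.patch statistics

Patch statistics come from each valid task's solution/fix.patch, restricted to code files by language file extension. The counting rules match the code-only statistics in upload_march_swe_to_hf.py.

Avg fix.patch lines counts the added and deleted lines in the code-file diffs; Avg fix.patch hunks counts the @@ hunks; Avg fix.patch files counts the code files touched.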
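
A minimal sketch of these three counts for one git-style unified diff is shown below. It assumes every file section starts with a `diff --git` line and that paths contain no spaces. The `PatchStats` class, the `patch_stats` function, and the extension tuple are hypothetical and do not reproduce upload_march_swe_to_hf.py.

```python
from dataclasses import dataclass


@dataclass
class PatchStats:
    lines: int = 0  # added + deleted lines in matching files
    hunks: int = 0  # number of @@ hunk headers in matching files
    files: int = 0  # number of matching files touched


def patch_stats(patch_text: str, exts: tuple[str, ...]) -> PatchStats:
    """Count changed lines, hunks, and files, keeping only paths ending in `exts`."""
    stats = PatchStats()
    keep = False     # does the current file match the extension filter?
    in_hunk = False  # are we past the first @@ header of the current file?
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            path = line.rsplit(" b/", 1)[-1]
            keep = path.endswith(exts)
            in_hunk = False
            if keep:
                stats.files += 1
        elif line.startswith("@@"):
            in_hunk = True
            if keep:
                stats.hunks += 1
        elif in_hunk and keep and line.startswith(("+", "-")):
            # The ---/+++ file headers come before the first hunk, so they are skipped.
            stats.lines += 1
    return stats


SAMPLE_PATCH = """\
diff --git a/src/app.py b/src/app.py
--- a/src/app.py
+++ b/src/app.py
@@ -1,3 +1,4 @@
 import os
-x = 1
+x = 2
+y = 3
@@ -10,2 +11,2 @@
-old
+new
diff --git a/README.md b/README.md
--- a/README.md
+++ b/README.md
@@ -1 +1 @@
-a
+b
"""
py_stats = patch_stats(SAMPLE_PATCH, (".py",))
```

In this sample only src/app.py passes the filter, so the README.md change contributes nothing to any of the three counts.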

difficulty_label distribution

Language | easy | medium | hard
C | 887 | 6,677 | 2,555
C++ | 433 | 2,524 | 1,100
Go | 689 | 6,041 | 1,650
Java | 469 | 2,753 | 1,034
JavaScript | 1,096 | 5,376 | 784
Python | 277 | 3,477 | 1,667
Rust | 391 | 3,312 | 1,890
TypeScript | 615 | 4,925 | 1,136

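difficulty_score overview

Language | count | min | p25 | median | mean | p75 | max
C | 10,119 | 2.4 | 4.9 | 6.0 | 5.99 | 7.1 | 9.2
C++ | 4,057 | 2.5 | 4.9 | 6.0 | 5.99 | 7.2 | 9.1
Go | 8,380 | 2.6 | 4.9 | 5.8 | 5.85 | 6.8 | 9.1
Java | 4,256 | 2.8 | 4.8 | 5.9 | 5.90 | 7.0 | 9.2
JavaScript | 7,256 | 2.6 | 4.4 | 5.2 | 5.36 | 6.2 | 9.2
Python | 5,421 | 2.6 | 5.2 | 6.3 | 6.25 | 7.3 | 9.1
Rust | 5,593 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0
TypeScript | 6,676 | 2.7 | 4.7 | 5.6 | 5.72 | 6.6 | 9.2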

Global top tags

library 24,122 (46.6%)
backend 16,033 (31.0%)
cli 6,949 (13.4%)
missing-feature 3,642 (7.0%)
frontend 3,165 (6.1%)
testing 2,225 (4.3%)
http 1,595 (3.1%)
react 1,432 (2.8%)
incomplete-validation 1,360 (2.6%)
framework 1,249 (2.4%)
missing-implementation 1,194 (2.3%)
missing-metadata-propagation 945 (1.8%)
missing-fallback 862 (1.7%)
kubernetes 739 (1.4%)
async 683 (1.3%)
wrong-default 666 (1.3%)
embedded 642 (1.2%)
networking 629 (1.2%)
cpp 602 (1.2%)
missing-validation 571 (1.1%)
missing-functionality 453 (0.9%)
missing-configuration 424 (0.8%)
typescript 401 (0.8%)
type-handling-inconsistency 385 (0.7%)
parsing 363 (0.7%)
eslint 345 (0.7%)
graphql 340 (0.7%)
race-condition 336 (0.6%)
missing-configuration-option 306 (0.6%)
postgresql 300 (0.6%)

Tag distribution by language

C (c)

library 5,362 (52.9%)
backend 2,606 (25.7%)
cli 1,156 (11.4%)
missing-feature 692 (6.8%)
embedded 616 (6.1%)
cpp 599 (5.9%)
testing 417 (4.1%)
networking 409 (4.0%)
framework 294 (2.9%)
incomplete-validation 283 (2.8%)
missing-implementation 283 (2.8%)
http 221 (2.2%)
ruby 217 (2.1%)
postgresql 213 (2.1%)
firmware 184 (1.8%)
kernel 181 (1.8%)
quic 174 (1.7%)
missing-metadata-propagation 152 (1.5%)
rust 141 (1.4%)
tls 136 (1.3%)

C++ (cpp)

library 2,794 (68.8%)
backend 741 (18.2%)
testing 496 (12.2%)
cli 359 (8.8%)
missing-feature 268 (6.6%)
framework 186 (4.6%)
missing-implementation 144 (3.5%)
http 129 (3.2%)
incomplete-validation 119 (2.9%)
boost 114 (2.8%)
async 92 (2.3%)
parsing 76 (1.9%)
qt 65 (1.6%)
serialization 55 (1.4%)
compiler 54 (1.3%)
geometry 51 (1.3%)
missing-fallback 50 (1.2%)
networking 49 (1.2%)
missing-metadata-propagation 48 (1.2%)
formatting 45 (1.1%)

Go (go)

backend 4,423 (52.8%)
library 2,087 (24.9%)
cli 2,027 (24.2%)
missing-feature 739 (8.8%)
kubernetes 665 (7.9%)
http 536 (6.4%)
incomplete-validation 251 (3.0%)
testing 236 (2.8%)
missing-metadata-propagation 207 (2.5%)
missing-fallback 179 (2.1%)
missing-implementation 179 (2.1%)
wrong-default 134 (1.6%)
docker 134 (1.6%)
terraform 125 (1.5%)
grpc 122 (1.5%)
aws 121 (1.4%)
missing-validation 120 (1.4%)
prometheus 115 (1.4%)
networking 97 (1.2%)
framework 90 (1.1%)

Java (java)

backend 2,071 (48.6%)
library 1,924 (45.2%)
missing-feature 237 (5.6%)
testing 222 (5.2%)
spring 162 (3.8%)
framework 156 (3.7%)
aem 147 (3.5%)
http 146 (3.4%)
android 127 (3.0%)
incomplete-validation 100 (2.3%)
missing-metadata-propagation 95 (2.2%)
missing-implementation 83 (1.9%)
missing-configuration 72 (1.7%)
cli 70 (1.6%)
json 68 (1.6%)
missing-null-check 64 (1.5%)
wrong-default 63 (1.5%)
maven 61 (1.4%)
sling 57 (1.3%)
kafka 56 (1.3%)

JavaScript (js)

library 3,913 (53.9%)
backend 1,239 (17.1%)
frontend 1,043 (14.4%)
cli 859 (11.8%)
missing-feature 496 (6.8%)
react 354 (4.9%)
typescript 334 (4.6%)
testing 297 (4.1%)
eslint 263 (3.6%)
incomplete-validation 221 (3.0%)
framework 200 (2.8%)
http 183 (2.5%)
fastify 149 (2.1%)
webpack 148 (2.0%)
missing-fallback 124 (1.7%)
missing-metadata-propagation 113 (1.6%)
svelte 106 (1.5%)
nodejs 104 (1.4%)
mongoose 104 (1.4%)
missing-implementation 100 (1.4%)

Python (py)

library 2,324 (42.8%)
backend 2,241 (41.3%)
cli 737 (13.6%)
missing-feature 454 (8.4%)
fastapi 220 (4.1%)
django 157 (2.9%)
missing-implementation 146 (2.7%)
pytorch 119 (2.2%)
testing 116 (2.1%)
incomplete-validation 114 (2.1%)
missing-fallback 114 (2.1%)
missing-metadata-propagation 111 (2.0%)
async 102 (1.9%)
framework 102 (1.9%)
ansible 100 (1.8%)
http 95 (1.8%)
aws 76 (1.4%)
aiohttp 66 (1.2%)
pydantic 65 (1.2%)
missing-parameter 63 (1.2%)

Rust (rust)

library 3,132 (56.0%)
backend 1,224 (21.9%)
cli 1,138 (20.3%)
missing-feature 411 (7.3%)
testing 315 (5.6%)
async 225 (4.0%)
http 211 (3.8%)
missing-implementation 162 (2.9%)
incomplete-validation 133 (2.4%)
compiler 120 (2.1%)
git 119 (2.1%)
missing-metadata-propagation 97 (1.7%)
macros 93 (1.7%)
parsing 92 (1.6%)
graphql 81 (1.4%)
blockchain 80 (1.4%)
substrate 77 (1.4%)
serde 73 (1.3%)
missing-fallback 71 (1.3%)
sql 65 (1.2%)

TypeScript (ts)

library 2,586 (38.7%)
frontend 1,837 (27.5%)
backend 1,488 (22.3%)
react 1,069 (16.0%)
cli 603 (9.0%)
missing-feature 345 (5.2%)
angular 203 (3.0%)
graphql 164 (2.5%)
framework 159 (2.4%)
missing-fallback 148 (2.2%)
javascript 144 (2.2%)
electron 141 (2.1%)
incomplete-validation 139 (2.1%)
fullstack 131 (2.0%)
testing 126 (1.9%)
missing-metadata-propagation 122 (1.8%)
wrong-default 120 (1.8%)
vue 105 (1.6%)
missing-implementation 97 (1.5%)
nextjs 86 (1.3%)

No trajectory data