SWE Task and Trajectory Progress Dashboard

Last updated: 2026-06-22 17:03:00 BJT | Next refresh: 2026-06-22 18:03:00 BJT | Refresh interval: 3600 seconds

Total PRs collected: 557,067 (1h 0 / 24h 0)
Total valid SWE tasks: 49,647 (1h +62 / 24h +389)
Overall processing success rate: 10.5% (valid SWE / processed; processed = 471,079)
Mean difficulty_score: 5.89 (median 5.9, count 49,590)

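Progress by language

Language | Collected PRs | PRs past 1h | PRs past 24h | Valid SWE | Valid past 1h | Valid past 24h | Processed | Success rate
C (c) | 29,469 | 0 | 0 | 9,688 | +4 | +25 | 29,469 | 32.9%
C++ (cpp) | 45,620 | 0 | 0 | 4,032 | 0 | 0 | 19,571 | 20.6%
Go (go) | 126,439 | 0 | 0 | 8,007 | +14 | +123 | 87,081 | 9.2%
Java (java) | 84,868 | 0 | 0 | 4,012 | +1 | +8 | 77,669 | 5.2%
JavaScript (js) | 36,409 | 0 | 0 | 7,092 | +9 | +29 | 36,409 | 19.5%
Python (py) | 98,883 | 0 | 0 | 4,997 | +23 | +107 | 95,919 | 5.2%
Rust (rust) | 68,650 | 0 | 0 | 5,472 | +4 | +58 | 68,611 | 8.0%
TypeScript (ts) | 66,729 | 0 | 0 | 6,347 | +7 | +39 | 56,350 | 11.3%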
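
The success-rate figures follow from the definition given with the overall rate, valid SWE divided by processed. A minimal sketch that reproduces them from the per-language counts on this page; the names COUNTS and success_rate are illustrative and are not taken from swegen:

```python
# (valid SWE, processed) per language, copied from the dashboard counts.
COUNTS = {
    "C": (9_688, 29_469),
    "C++": (4_032, 19_571),
    "Go": (8_007, 87_081),
    "Java": (4_012, 77_669),
    "JavaScript": (7_092, 36_409),
    "Python": (4_997, 95_919),
    "Rust": (5_472, 68_611),
    "TypeScript": (6_347, 56_350),
}


def success_rate(valid: int, processed: int) -> float:
    """Return valid / processed as a percentage (0.0 if nothing was processed)."""
    return 100.0 * valid / processed if processed else 0.0


# Per-language rates, rounded to one decimal as on the dashboard.
rates = {lang: round(success_rate(v, p), 1) for lang, (v, p) in COUNTS.items()}

# The overall rate is computed from the summed counts, not by averaging rates.
total_valid = sum(v for v, _ in COUNTS.values())
total_processed = sum(p for _, p in COUNTS.values())
overall = round(success_rate(total_valid, total_processed), 1)

print(rates)
print(total_valid, total_processed, overall)
```

Summing the counts before dividing matters here: languages with many processed PRs, such as Python and Go, weigh more in the overall rate than an unweighted average of the eight percentages would suggest.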

Run parameters

Language | Evaluation model (OPENAI) | Fill model (ANTHROPIC) | Concurrency | min_source_files | max_source_files
C | gpt-5.4 | claude-sonnet-4-6 | 12 | 2 | 15
C++ | glm-5 | claude-sonnet-4-6 | 8 | 2 | 15
Go | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10
Java | claude-haiku-4-5-20251001 | claude-sonnet-4-6 | 8 | 2 | 10
JavaScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10
Python | glm-5 | claude-sonnet-4-6 | 12 | 3 | 15
Rust | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 8 | 2 | 10
TypeScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10

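Failure reasons

Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other
C | 29,469 | 9,688 | 19,781 | 14,632 | 364 | 5,071 | 26311
C++ | 19,571 | 4,032 | 15,539 | 2,060 | 7314,956 | 156 | 285 | 266
Go | 87,081 | 8,007 | 79,074 | 21,506 | 7,123 | 47,666 | 1,524 | 733 | 505
Java | 77,669 | 4,012 | 73,657 | 17,741 | 6,585 | 44,741 | 1,293 | 2,102 | 1,602
JavaScript | 36,409 | 7,092 | 29,317 | 15,024 | 822 | 14,403 | 5451450
Python | 95,919 | 4,997 | 90,922 | 26,224 | 5,791 | 59,198 | 945 | 358 | 120
Rust | 68,611 | 5,472 | 63,139 | 19,096 | 4,530 | 36,875 | 1,244 | 813 | 1,234
TypeScript | 56,350 | 6,347 | 50,003 | 13,505 | 3,085 | 31,284 | 1,728 | 75711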

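trivial_pr: The LLM judged the PR too simple to serve as an SWE task (for example, it only changes configuration, documentation, or dependency versions).

validation: Validation failed after the task was generated (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1).

infra_error: Infrastructure error (Docker build failure, network timeout, insufficient disk space, and so on).

timeout: Processing timed out (the overall per-PR timeout or the Claude Code session timeout).

workflow_error: Workflow error (failure to fetch PR metadata, create the worktree, generate the patch, and so on).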
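
The validation rule above is a two-sided check: a task passes only when the NOP agent returns reward=0 and the ORACLE agent returns reward=1. A minimal sketch of that condition, assuming rewards are numeric; AgentRun and passes_validation are hypothetical names, not swegen's API:

```python
from dataclasses import dataclass


@dataclass
class AgentRun:
    """Outcome of running one agent against a generated task (hypothetical shape)."""
    agent: str     # "nop" or "oracle"
    reward: float  # reward reported for the run


def passes_validation(nop: AgentRun, oracle: AgentRun) -> bool:
    """True only if NOP returned reward=0 and ORACLE returned reward=1."""
    return nop.reward == 0 and oracle.reward == 1


# A task whose checks already succeed without any change is rejected,
# and so is a task that the reference solution cannot satisfy.
print(passes_validation(AgentRun("nop", 0), AgentRun("oracle", 1)))  # accepted
print(passes_validation(AgentRun("nop", 1), AgentRun("oracle", 1)))  # rejected
print(passes_validation(AgentRun("nop", 0), AgentRun("oracle", 0)))  # rejected
```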

fix.patch complexity

Language | Valid SWE count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files
C | 9,688 | 334.23 | 17.92 | 5.84
C++ | 4,032 | 287.03 | 13.73 | 5.10
Go | 8,007 | 214.51 | 12.73 | 4.37
Java | 4,012 | 163.40 | 10.50 | 4.23
JavaScript | 7,092 | 77.26 | 6.29 | 2.79
Python | 4,997 | 151.93 | 10.99 | 3.83
Rust | 5,472 | 226.33 | 13.17 | 4.10
TypeScript | 6,347 | 158.66 | 9.60 | 4.14

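Methodology notes

Difficulty scoring (difficulty_score)

For each valid task directory, the scorer reads solution/fix.patch, tests/, and instruction.md; src/swegen/scoring.py then computes a static score with zero API calls.

The current formula uses continuous log-scale scoring so that medium-sized patches are not pushed into hard too early. The weights are: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.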
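
As an illustration of how the five weights could combine, here is a minimal sketch of a weighted sum over component scores. Only the weight values come from the text above. The 0 to 10 component range, the log_scale helper, and its saturation parameter are assumptions made for this example and may differ from what src/swegen/scoring.py actually does:

```python
import math

# Weights as listed on the dashboard; they sum to 1.0.
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def log_scale(value: float, saturation: float) -> float:
    """Map a raw count to 0..10 on a log scale (assumed normalization).

    `saturation` is the hypothetical count that earns the full 10 points.
    """
    if value <= 0:
        return 0.0
    return min(10.0, 10.0 * math.log1p(value) / math.log1p(saturation))


def difficulty_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to lie in 0..10."""
    return round(sum(WEIGHTS[name] * components[name] for name in WEIGHTS), 2)


# Hypothetical component scores for one task.
example = {
    "patch_scope": 6.0,
    "logic_complexity": 5.0,
    "context_breadth": 4.0,
    "test_complexity": 3.0,
    "instruction_complexity": 2.0,
}
print(difficulty_score(example))
```

With a logarithmic mapping, each doubling of a raw count adds roughly the same number of points, so growth in patch size is compressed at the upper end instead of scaling linearly.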

Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.
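
The thresholds translate directly into a small mapping function. A minimal sketch; the function name difficulty_label is chosen here to match the dashboard field and is not necessarily the name used in the code base:

```python
def difficulty_label(score: float) -> str:
    """Map a difficulty_score to its label using the dashboard thresholds."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"


# Boundary values belong to the lower bucket because both comparisons are <=.
for value in (4.0, 4.1, 7.0, 7.1):
    print(value, difficulty_label(value))
```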

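Tag generation and display

Tags are not computed by the dashboard at display time. They are generated by an LLM from the PR information when swegen builds a task, and written to [metadata].tags in task.toml.

The prompt asks for tags in three parts: programming language, project layer or domain, and framework/library name or specific topic. The dashboard only reads the existing task.toml files and, for each language, counts how often each tag occurs and what share it represents.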

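fix.patch statistics

Patch statistics come from each valid task's solution/fix.patch, with code files selected by the language's file extensions. The counting method matches the code-only statistics in upload_march_swe_to_hf.py.

Avg fix.patch lines counts the added and deleted lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files touched.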
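
The three per-patch counts described above can be sketched for a git-style unified diff as below. This is an illustrative parser, not the code in upload_march_swe_to_hf.py; the function name patch_stats and the extension set passed to it are assumptions:

```python
from pathlib import PurePosixPath


def patch_stats(patch_text: str, extensions: set[str]) -> dict[str, int]:
    """Count changed lines, hunks, and files for code files in a unified diff."""
    stats = {"lines": 0, "hunks": 0, "files": 0}
    is_code_file = False  # current file matches one of the extensions
    in_hunk = False       # past the first @@ header of the current file
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # "diff --git a/path b/path": use the post-change path.
            path = line.rsplit(" b/", 1)[-1]
            is_code_file = PurePosixPath(path).suffix in extensions
            in_hunk = False
            stats["files"] += int(is_code_file)
        elif line.startswith("@@"):
            in_hunk = True
            stats["hunks"] += int(is_code_file)
        elif in_hunk and is_code_file and line.startswith(("+", "-")):
            # Inside a hunk, "+" and "-" mark added and deleted lines.
            stats["lines"] += 1
    return stats


SAMPLE_PATCH = "\n".join([
    "diff --git a/src/app.py b/src/app.py",
    "--- a/src/app.py",
    "+++ b/src/app.py",
    "@@ -1,3 +1,4 @@",
    " import os",
    "-x = 1",
    "+x = 2",
    "+y = 3",
    "@@ -10,2 +11,2 @@",
    "-old()",
    "+new()",
    "diff --git a/README.md b/README.md",
    "--- a/README.md",
    "+++ b/README.md",
    "@@ -1 +1 @@",
    "-old doc",
    "+new doc",
])

print(patch_stats(SAMPLE_PATCH, {".py"}))
```

Tracking whether the parser is inside a hunk keeps the `---` and `+++` file headers out of the line count while still counting deleted lines whose content happens to begin with dashes. In the sample, the README.md change is ignored because `.md` is not in the extension set.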

difficulty_label distribution

Language | easy | medium | hard
C | 866 | 6,403 | 2,409
C++ | 431 | 2,501 | 1,092
Go | 622 | 5,787 | 1,592
Java | 443 | 2,600 | 964
JavaScript | 1,069 | 5,254 | 767
Python | 268 | 3,210 | 1,496
Rust | 380 | 3,243 | 1,847
TypeScript | 566 | 4,696 | 1,084

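difficulty_score overview

Language | count | min | p25 | median | mean | p75 | max
C | 9,678 | 2.4 | 4.9 | 6.0 | 5.97 | 7.0 | 9.2
C++ | 4,024 | 2.5 | 4.9 | 6.0 | 5.99 | 7.2 | 9.1
Go | 8,001 | 2.6 | 4.9 | 5.8 | 5.87 | 6.8 | 9.1
Java | 4,007 | 2.8 | 4.8 | 5.9 | 5.89 | 7.0 | 9.2
JavaScript | 7,090 | 2.6 | 4.4 | 5.2 | 5.36 | 6.2 | 9.2
Python | 4,974 | 2.6 | 5.2 | 6.2 | 6.22 | 7.3 | 9.1
Rust | 5,470 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0
TypeScript | 6,346 | 2.7 | 4.7 | 5.6 | 5.72 | 6.6 | 9.1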

Global top tags

library 23,136 (46.6%)
backend 15,262 (30.8%)
cli 6,713 (13.5%)
frontend 3,021 (6.1%)
testing 2,161 (4.4%)
http 1,529 (3.1%)
react 1,380 (2.8%)
framework 1,210 (2.4%)
kubernetes 700 (1.4%)
async 670 (1.4%)
embedded 640 (1.3%)
networking 614 (1.2%)
cpp 590 (1.2%)
typescript 376 (0.8%)
parsing 357 (0.7%)
eslint 339 (0.7%)
graphql 335 (0.7%)
postgresql 298 (0.6%)
security 286 (0.6%)
compiler 284 (0.6%)
git 282 (0.6%)
database 280 (0.6%)
aws 260 (0.5%)
json 247 (0.5%)
redis 236 (0.5%)
api 224 (0.5%)
fastapi 224 (0.5%)
ruby 221 (0.4%)
angular 207 (0.4%)
cryptography 205 (0.4%)

Tag distribution by language

C (c)

library 5,077 (52.4%)
backend 2,505 (25.9%)
cli 1,117 (11.5%)
embedded 614 (6.3%)
cpp 587 (6.1%)
testing 408 (4.2%)
networking 400 (4.1%)
framework 287 (3.0%)
postgresql 212 (2.2%)
ruby 210 (2.2%)
http 201 (2.1%)
firmware 185 (1.9%)
kernel 179 (1.8%)
quic 169 (1.7%)
rust 140 (1.4%)
bluetooth 129 (1.3%)
tls 128 (1.3%)
python 114 (1.2%)
scheduler 111 (1.1%)
cryptography 107 (1.1%)

C++ (cpp)

library 2,779 (69.0%)
backend 728 (18.1%)
testing 488 (12.1%)
cli 356 (8.8%)
framework 185 (4.6%)
http 129 (3.2%)
boost 114 (2.8%)
async 91 (2.3%)
parsing 76 (1.9%)
qt 63 (1.6%)
compiler 54 (1.3%)
serialization 54 (1.3%)
geometry 51 (1.3%)
networking 49 (1.2%)
formatting 45 (1.1%)
ros2 44 (1.1%)
frontend 40 (1.0%)
json 40 (1.0%)
logging 37 (0.9%)
templates 36 (0.9%)

Go (go)

backend 4,200 (52.5%)
library 2,009 (25.1%)
cli 1,935 (24.2%)
kubernetes 641 (8.0%)
http 517 (6.5%)
testing 222 (2.8%)
docker 133 (1.7%)
aws 118 (1.5%)
terraform 118 (1.5%)
grpc 117 (1.5%)
prometheus 100 (1.2%)
networking 97 (1.2%)
database 88 (1.1%)
framework 86 (1.1%)
api 81 (1.0%)
git 77 (1.0%)
security 69 (0.9%)
aws-sdk 65 (0.8%)
blockchain 56 (0.7%)
configuration 56 (0.7%)

Java (java)

backend 1,945 (48.5%)
library 1,812 (45.2%)
testing 210 (5.2%)
spring 159 (4.0%)
framework 152 (3.8%)
aem 147 (3.7%)
http 136 (3.4%)
android 118 (2.9%)
json 67 (1.7%)
cli 64 (1.6%)
maven 59 (1.5%)
sling 57 (1.4%)
kafka 53 (1.3%)
flink 48 (1.2%)
concurrency 47 (1.2%)
grpc 47 (1.2%)
frontend 46 (1.1%)
mybatis 37 (0.9%)
nacos 34 (0.8%)
security 34 (0.8%)

JavaScript (js)

library 3,820 (53.9%)
backend 1,227 (17.3%)
frontend 1,000 (14.1%)
cli 849 (12.0%)
react 344 (4.9%)
typescript 313 (4.4%)
testing 292 (4.1%)
eslint 260 (3.7%)
framework 196 (2.8%)
http 179 (2.5%)
fastify 146 (2.1%)
webpack 144 (2.0%)
mongoose 103 (1.5%)
nodejs 102 (1.4%)
svelte 99 (1.4%)
express 91 (1.3%)
stylelint 65 (0.9%)
vue 64 (0.9%)
lighthouse 63 (0.9%)
async 61 (0.9%)

Python (py)

library 2,110 (42.4%)
backend 2,074 (41.7%)
cli 684 (13.7%)
fastapi 202 (4.1%)
django 140 (2.8%)
testing 113 (2.3%)
pytorch 102 (2.0%)
ansible 100 (2.0%)
async 99 (2.0%)
framework 99 (2.0%)
http 92 (1.8%)
aws 72 (1.4%)
aiohttp 65 (1.3%)
pydantic 57 (1.1%)
openai 55 (1.1%)
click 51 (1.0%)
flask 51 (1.0%)
litellm 49 (1.0%)
beets 48 (1.0%)
frontend 44 (0.9%)

Rust (rust)

library 3,070 (56.1%)
backend 1,179 (21.6%)
cli 1,121 (20.5%)
testing 307 (5.6%)
async 220 (4.0%)
http 205 (3.7%)
compiler 119 (2.2%)
git 117 (2.1%)
macros 92 (1.7%)
parsing 91 (1.7%)
graphql 79 (1.4%)
blockchain 75 (1.4%)
substrate 75 (1.4%)
serde 71 (1.3%)
framework 62 (1.1%)
sql 62 (1.1%)
datafusion 57 (1.0%)
cryptography 56 (1.0%)
lsp 56 (1.0%)
database 55 (1.0%)

TypeScript (ts)

library 2,459 (38.7%)
frontend 1,750 (27.6%)
backend 1,404 (22.1%)
react 1,028 (16.2%)
cli 587 (9.2%)
angular 201 (3.2%)
graphql 162 (2.6%)
framework 143 (2.3%)
javascript 138 (2.2%)
electron 130 (2.0%)
fullstack 124 (2.0%)
testing 121 (1.9%)
vue 96 (1.5%)
github-actions 83 (1.3%)
nextjs 82 (1.3%)
eslint 79 (1.2%)
express 79 (1.2%)
vscode 73 (1.2%)
http 70 (1.1%)
mcp 69 (1.1%)

No trajectory data