SWE Task and Trajectory Progress Dashboard

Total PRs collected

425,966

1h +2,033 / 24h +2,234

Total valid SWE tasks

28,825

1h +32 / 24h +240

Overall processing success rate

47.5%

Valid SWE / processed directories (60,704)

Mean difficulty_score

5.77

median 5.7, count 28,778

Progress by language

Language	PRs collected	PRs +1h	PRs +24h	Valid SWE	Valid SWE +1h	Valid SWE +24h	Processed directories	Success rate
C (c)	28,232	+30	+30	7,734	0	0	10,431	74.1%
C++ (cpp)	43,105	+39	+39	1,918	+11	+65	4,249	45.1%
Go (go)	88,472	+864	+955	4,454	+7	+47	10,962	40.6%
Java (java)	59,936	+47	+53	2,637	+7	+33	6,444	40.9%
JavaScript (js)	30,941	+155	+155	3,668	+2	+2	7,877	46.6%
Python (py)	68,268	+185	+236	2,518	0	0	7,964	31.6%
Rust (rust)	50,823	+387	+437	2,505	+1	+36	5,426	46.2%
TypeScript (ts)	56,189	+326	+329	3,391	+4	+57	7,351	46.1%

Run parameters

Language	Evaluation model (OPENAI)	Fill model (ANTHROPIC)	Concurrency	min_source_files	max_source_files
C	`gpt-5.4`	`claude-sonnet-4-6`	16	2	15
C++	`gpt-5.4`	`claude-sonnet-4-6`	16	2	15
Go	`gpt-5.4`	`claude-sonnet-4-6`	16	2	10
Java	`claude-haiku-4-5-20251001`	`claude-sonnet-4-6`	16	2	10
JavaScript	`glmmoedsa`	`glmmoedsa`	16	2	10
Python	`MiniMax-M2.7`	`MiniMax-M2.7`	16	3	15
Rust	`gpt-5.4`	`claude-opus-4-6`	16	2	10
TypeScript	`gpt-5.4`	`claude-sonnet-4-6`	16	2	10

Failure reasons

Language	Processed	Succeeded	Failed	trivial_pr	validation	infra_error	timeout	workflow_error	Other
C	39,798	8,220	31,578	13,809	2,264	12,730	680	1,855	240
C++	23,492	2,108	21,384	5,056	572	14,881	221	381	273
Go	30,157	6,534	23,623	10,533	2,196	6,583	1,583	1,178	1,550
Java	14,757	2,991	11,766	4,728	1,593	3,213	261	357	1,614
JavaScript	16,972	5,115	11,857	7,386	1,794	529	358	637	1,153
Python	22,915	2,630	20,285	8,859	2,534	8,204	238	317	133
Rust	17,537	2,945	14,592	4,405	1,150	5,542	641	1,620	1,234
TypeScript	36,977	3,393	33,584	6,657	886	22,856	181	2,899	105

fix.patch complexity

语言	Valid SWE Count	Avg fix.patch lines	Avg fix.patch hunks	Avg fix.patch files
C	7,734	281.22	15.40	4.93
C++	1,918	330.75	11.55	4.54
Go	4,454	270.98	14.92	4.96
Java	2,637	170.88	10.93	4.46
JavaScript	3,668	73.28	6.21	2.76
Python	2,518	135.78	9.99	3.44
Rust	2,505	257.03	12.86	4.06
TypeScript	3,391	159.62	9.09	4.09

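Methodology notes

Difficulty scoring (difficulty_score)

For each valid task directory, src/swegen/scoring.py reads solution/fix.patch, tests/, and instruction.md and computes a static score with zero API calls.

The current formula uses a continuous log-scale score so that medium-sized patches are not classified as hard too early. The weights are: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.

Label thresholds: easy if score <= 4.0, medium if 4.0 < score <= 7.0, hard if score > 7.0.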
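The weighting and thresholds above can be sketched as follows. This is a minimal illustration, not the code in src/swegen/scoring.py: it assumes each component has already been mapped onto a 0-10 scale, and the `log_scale` helper with its `cap` parameter is a hypothetical example of a log-scale mapping, not the actual formula.

```python
import math

# Weights as listed in the methodology note; they sum to 1.0.
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def log_scale(value, cap, top=10.0):
    """Hypothetical helper: map a raw count onto [0, top] on a log scale.

    `cap` is the raw value that saturates at `top` (an assumed parameter).
    """
    if value <= 0:
        return 0.0
    return min(top, top * (math.log1p(value) / math.log1p(cap)))


def difficulty_score(components):
    """Weighted sum of component scores (each assumed to be in 0-10)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def difficulty_label(score):
    """Apply the documented thresholds: easy <= 4.0 < medium <= 7.0 < hard."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"
```

With a log-scale mapping, doubling the patch size adds a roughly constant increment to the component score, which is one way to keep mid-sized patches out of the hard bucket.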

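Tag generation and display

Tags are not computed by the dashboard at display time. An LLM generates them from the PR information when swegen builds the task and writes them to [metadata].tags in task.toml.

The prompt asks for tags in three parts: programming language, project layer/domain, and framework/library name or specific topic. The dashboard only reads the existing task.toml files and reports, for each language, how often each tag occurs and its share.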

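fix.patch statistics

Patch statistics come from each valid task's solution/fix.patch, restricted to code files by language file extension, consistent with the code-only statistics in upload_march_swe_to_hf.py.

Avg fix.patch lines counts the added and deleted lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files touched.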
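The three per-patch counts can be sketched as below. This is an illustrative parser for git-style unified diffs, not the code in upload_march_swe_to_hf.py; the function name `patch_stats` and the `code_exts` parameter are hypothetical, and the per-language extension lists are not given in this document.

```python
def patch_stats(patch_text, code_exts):
    """Count code files, @@ hunks, and added/deleted lines in a git diff.

    Only files whose path ends with one of `code_exts` are counted.
    """
    stats = {"files": 0, "hunks": 0, "lines": 0}
    in_code_file = False
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/<path> b/<path>
            path = line.rsplit(" b/", 1)[-1]
            in_code_file = path.endswith(tuple(code_exts))
            in_hunk = False
            if in_code_file:
                stats["files"] += 1
        elif not in_code_file:
            continue
        elif line.startswith("@@"):
            stats["hunks"] += 1
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            # The ---/+++ file headers precede the first hunk, so they
            # are never counted as changed lines.
            stats["lines"] += 1
    return stats
```

Averaging these dictionaries over all valid tasks of a language would give the three columns of the fix.patch complexity table.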

difficulty_label distribution

Language	easy	medium	hard
C	735	5,189	1,802
C++	323	1,194	397
Go	370	3,283	795
Java	351	1,676	607
JavaScript	593	2,726	348
Python	211	1,724	560
Rust	260	1,465	778
TypeScript	346	2,570	475

difficulty_score overview

语言	count	min	p25	median	mean	p75	max
C	7,726	2.4	4.9	5.9	5.91	7.0	9.2
C++	1,914	2.5	4.4	5.6	5.63	6.8	9.0
Go	4,448	2.6	4.9	5.8	5.81	6.7	9.1
Java	2,634	2.8	4.7	5.9	5.81	6.9	9.2
JavaScript	3,667	2.6	4.4	5.2	5.28	6.1	9.2
Python	2,495	2.6	4.9	5.8	5.90	6.9	8.9
Rust	2,503	2.7	4.9	6.2	6.11	7.4	9.0
TypeScript	3,391	2.7	4.6	5.5	5.58	6.5	8.9

Global top tags

library	13,014 (45.2%)
backend	8,878 (30.8%)
cli	4,020 (14.0%)
frontend	1,705 (5.9%)
testing	1,240 (4.3%)
react	917 (3.2%)
http	907 (3.2%)
framework	779 (2.7%)
embedded	567 (2.0%)
cpp	396 (1.4%)
networking	361 (1.3%)
async	328 (1.1%)
kubernetes	265 (0.9%)
graphql	231 (0.8%)
postgresql	226 (0.8%)
eslint	217 (0.8%)
parsing	209 (0.7%)
aws	182 (0.6%)
angular	174 (0.6%)
kernel	173 (0.6%)
compiler	172 (0.6%)
firmware	170 (0.6%)
quic	166 (0.6%)
git	165 (0.6%)
json	162 (0.6%)
redis	154 (0.5%)
aem	147 (0.5%)
security	142 (0.5%)
rust	141 (0.5%)
tls	138 (0.5%)

Tag distribution by language

C (c)

library	4,080 (52.8%)
backend	1,884 (24.4%)
cli	938 (12.1%)
embedded	562 (7.3%)
cpp	395 (5.1%)
networking	287 (3.7%)
testing	213 (2.8%)
postgresql	186 (2.4%)
kernel	173 (2.2%)
framework	172 (2.2%)
firmware	170 (2.2%)
quic	161 (2.1%)
http	146 (1.9%)
rust	131 (1.7%)
bluetooth	121 (1.6%)
tls	119 (1.5%)
ruby	113 (1.5%)
scheduler	110 (1.4%)
cryptography	102 (1.3%)
python	88 (1.1%)

C++ (cpp)

library	1,413 (73.8%)
backend	308 (16.1%)
testing	245 (12.8%)
cli	152 (7.9%)
boost	92 (4.8%)
framework	77 (4.0%)
http	70 (3.7%)
async	63 (3.3%)
compiler	41 (2.1%)
parsing	40 (2.1%)
serialization	36 (1.9%)
templates	29 (1.5%)
actor-framework	28 (1.5%)
arrayfire	28 (1.5%)
formatting	27 (1.4%)
logging	22 (1.1%)
redis	22 (1.1%)
audio	20 (1.0%)
sparql	20 (1.0%)
iceberg	19 (1.0%)

Go (go)

backend	2,363 (53.1%)
cli	1,197 (26.9%)
library	992 (22.3%)
http	337 (7.6%)
kubernetes	233 (5.2%)
testing	137 (3.1%)
docker	98 (2.2%)
aws	80 (1.8%)
framework	71 (1.6%)
grpc	70 (1.6%)
aws-sdk	56 (1.3%)
dns	45 (1.0%)
git	45 (1.0%)
database	44 (1.0%)
aws-lambda	38 (0.9%)
security	37 (0.8%)
blockchain	36 (0.8%)
terraform	36 (0.8%)
prometheus	34 (0.8%)
templ	34 (0.8%)

Java (java)

backend	1,328 (50.4%)
library	1,168 (44.3%)
aem	147 (5.6%)
testing	133 (5.0%)
spring	106 (4.0%)
framework	77 (2.9%)
http	77 (2.9%)
android	72 (2.7%)
json	63 (2.4%)
sling	57 (2.2%)
flink	47 (1.8%)
cli	40 (1.5%)
concurrency	40 (1.5%)
grpc	37 (1.4%)
nacos	34 (1.3%)
mybatis	33 (1.3%)
maven	29 (1.1%)
parsing	27 (1.0%)
sql-parser	27 (1.0%)
websocket	27 (1.0%)

JavaScript (js)

library	1,797 (49.0%)
backend	762 (20.8%)
frontend	487 (13.3%)
cli	436 (11.9%)
testing	204 (5.6%)
react	180 (4.9%)
eslint	178 (4.9%)
framework	176 (4.8%)
fastify	118 (3.2%)
http	105 (2.9%)
mongoose	97 (2.6%)
webpack	71 (1.9%)
typescript	69 (1.9%)
express	67 (1.8%)
lighthouse	61 (1.7%)
svelte	61 (1.7%)
aframe	53 (1.4%)
async	50 (1.4%)
apostrophecms	48 (1.3%)
nodejs	47 (1.3%)

Python (py)

backend	1,095 (43.9%)
library	916 (36.7%)
cli	418 (16.7%)
ansible	98 (3.9%)
fastapi	78 (3.1%)
testing	77 (3.1%)
framework	66 (2.6%)
aiohttp	62 (2.5%)
aws	56 (2.2%)
django	54 (2.2%)
http	52 (2.1%)
click	46 (1.8%)
dbt	41 (1.6%)
black	40 (1.6%)
async	39 (1.6%)
openai	39 (1.6%)
jinja2	35 (1.4%)
pipx	34 (1.4%)
litellm	31 (1.2%)
pytorch	31 (1.2%)

Rust (rust)

library	1,409 (56.3%)
cli	578 (23.1%)
backend	458 (18.3%)
testing	166 (6.6%)
http	93 (3.7%)
git	88 (3.5%)
async	77 (3.1%)
compiler	59 (2.4%)
graphql	57 (2.3%)
parsing	55 (2.2%)
datafusion	53 (2.1%)
substrate	48 (1.9%)
actix-web	45 (1.8%)
macros	45 (1.8%)
sql	44 (1.8%)
framework	41 (1.6%)
parquet	39 (1.6%)
blockchain	31 (1.2%)
clap	28 (1.1%)
sqlparser	28 (1.1%)

TypeScript (ts)

library	1,239 (36.5%)
frontend	1,081 (31.9%)
react	733 (21.6%)
backend	680 (20.1%)
cli	261 (7.7%)
angular	173 (5.1%)
graphql	133 (3.9%)
framework	99 (2.9%)
electron	94 (2.8%)
fullstack	86 (2.5%)
testing	65 (1.9%)
github-actions	59 (1.7%)
vue	55 (1.6%)
express	48 (1.4%)
javascript	45 (1.3%)
nextjs	42 (1.2%)
eslint	39 (1.2%)
xstate	37 (1.1%)
react-native	36 (1.1%)
zod	36 (1.1%)

Trajectory files

Total trajectories

65,965

Average message turns

61.5

Average composite_score

0.7449

Based on 29,596 scored trajectories

Trajectory file overview

Dataset	Scaffold	Model	Agent	Trajectories (main)	Trajectories (sub)	File size	Avg turns	Avg tokens	Avg tool calls	Avg score
swegen	Claude Code	`glm5`	chaofan	1,096	1,039	446 MB	72.7	35,904	37.9	0.7218
swegen	OpenCode	`glm5`	chaofan	860	317	197 MB	64.8	28,867	34.0	0.7337
swegen	OpenHands SDK	`glm5`	chaofan	1,140	—	280 MB	119.6	44,172	59.8	0.7533
swegen	Terminus	`glm5`	chaofan	1,331	—	142 MB	53.2	26,362	0.0	0.7048
swerebench_oraclesolved	Claude Code	`glm5`	chaofan	4,053	—	601 MB	44.1	21,015	22.7	—
swerebench_oraclesolved	OpenCode	`glm5`	chaofan	3,431	—	405 MB	41.3	19,519	21.5	—
swerebench_oraclesolved	OpenHands SDK	`glm5`	chaofan	2,808	—	638 MB	135.1	40,631	67.2	—
swerebench_oraclesolved	Terminus	`glm5`	chaofan	2,920	—	291 MB	55.7	24,625	0.0	—
swerebench_oraclesolved	Claude Code	`glm5`	jierun	5,013	—	779 MB	44.7	22,630	23.3	—
swerebench_oraclesolved	OpenCode	`glm5`	jierun	3,521	—	464 MB	51.8	22,138	27.0	—
swerebench_oraclesolved	OpenHands SDK	`glm5`	jierun	2,733	—	507 MB	95.8	31,785	47.5	—
swerebench_oraclesolved	Terminus	`glm5`	jierun	2,998	—	296 MB	55.5	24,396	0.0	—
swerebench_others	Claude Code	`glm5`	jierun	2,885	—	457 MB	43.8	23,380	23.1	0.7178
swerebench_others	OpenCode	`glm5`	jierun	2,637	—	360 MB	49.1	23,135	25.8	0.7210
swerebench_others	OpenHands SDK	`glm5`	jierun	1,843	—	340 MB	90.2	32,078	44.8	0.7868
swerebench_others	Terminus	`glm5`	jierun	2,270	—	225 MB	50.6	24,402	0.0	0.7844
swerebenchv2_python_oraclesolved	Claude Code	`glm5`	jierun	2,728	—	445 MB	47.5	24,446	25.1	—
swerebenchv2_python_oraclesolved	OpenCode	`glm5`	jierun	1,654	—	240 MB	60.7	23,822	31.3	—
swerebenchv2_python_oraclesolved	OpenHands SDK	`glm5`	jierun	1,521	—	313 MB	103.0	35,858	51.0	—
swerebenchv2_python_oraclesolved	Terminus	`glm5`	jierun	1,633	—	174 MB	55.7	26,382	0.0	—
v2nopy_full	Claude Code	`glm5`	jierun	5,128	—	818 MB	42.7	24,097	22.8	0.7281
v2nopy_full	OpenCode	`glm5`	jierun	3,465	—	455 MB	47.0	21,718	24.8	0.7353
v2nopy_full	OpenHands SDK	`glm5`	jierun	3,266	—	620 MB	92.7	33,367	46.0	0.7580
v2nopy_full	Terminus	`glm5`	jierun	3,675	—	411 MB	64.1	27,517	0.0	0.7799

Quality score statistics

Dataset	Scaffold	composite	efficiency	style	tool_mastery	completion	precision
swegen	Claude Code	0.7218	0.9125	0.3414	0.8678	0.6213	0.7722
swegen	OpenCode	0.7337	0.9322	0.3272	0.8886	0.6288	0.7923
swegen	OpenHands SDK	0.7533	0.8975	0.3500	0.8537	0.7737	0.7632
swegen	Terminus	0.7048	0.9241	0.4228	0.8983	0.3960	0.8866
swerebench_others	Claude Code	0.7178	0.8864	0.3197	0.7942	0.6405	0.8924
swerebench_others	OpenCode	0.7210	0.8952	0.3171	0.8093	0.6509	0.8621
swerebench_others	OpenHands SDK	0.7868	0.8886	0.3693	0.7698	0.9333	0.8525
swerebench_others	Terminus	0.7844	0.8539	0.4292	0.8172	0.8203	0.9323
v2nopy_full	Claude Code	0.7281	0.9186	0.3213	0.8886	0.5565	0.8996
v2nopy_full	OpenCode	0.7353	0.9278	0.3254	0.9008	0.5766	0.8772
v2nopy_full	OpenHands SDK	0.7580	0.9002	0.3555	0.8440	0.7523	0.8370
v2nopy_full	Terminus	0.7799	0.9222	0.4310	0.9016	0.6930	0.8808

Comparison by scaffold

Scaffold	Total trajectories	Avg turns	Avg tokens	Avg tool calls	Avg score
Claude Code	20,903	45.8	23,713	24.0	0.7241
OpenCode	15,568	49.6	22,187	25.9	0.7297
OpenHands SDK	13,311	105.4	35,606	52.4	0.7656
Terminus	14,827	56.7	25,611	0.0	0.7675

Comparison by dataset

Dataset	Total trajectories	Avg turns	Avg tokens	Avg tool calls	Avg score
swegen	4,427	77.4	33,797	31.4	0.7271
swerebench_oraclesolved	27,477	61.8	25,095	25.3	—
swerebench_others	9,635	55.7	25,217	22.6	0.7475
swerebenchv2_python_oraclesolved	7,536	63.4	27,032	26.2	—
v2nopy_full	15,534	59.2	26,324	22.7	0.7483

Source data paths

Dataset	Scaffold	Agent	File path	Size	Rows
swegen	Claude Code	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_cc_2135.jsonl`	446 MB	2,135
swegen	OpenCode	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_oc_1177.jsonl`	197 MB	1,177
swegen	OpenHands SDK	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_ohsdk_1140.jsonl`	280 MB	1,140
swegen	Terminus	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_t2_1331.jsonl`	142 MB	1,331
swerebench_oraclesolved	Claude Code	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_cc_4053.jsonl`	601 MB	4,053
swerebench_oraclesolved	Claude Code	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_cc_5013.jsonl`	779 MB	5,013
swerebench_oraclesolved	OpenCode	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oc_3431.jsonl`	405 MB	3,431
swerebench_oraclesolved	OpenCode	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oc_3521.jsonl`	464 MB	3,521
swerebench_oraclesolved	OpenHands SDK	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oh_2808.jsonl`	638 MB	2,808
swerebench_oraclesolved	OpenHands SDK	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oh_sdk_2733.jsonl`	507 MB	2,733
swerebench_oraclesolved	Terminus	chaofan	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_t2_2920.jsonl`	291 MB	2,920
swerebench_oraclesolved	Terminus	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_t2_2998.jsonl`	296 MB	2,998
swerebench_others	Claude Code	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_cc_2885.jsonl`	457 MB	2,885
swerebench_others	OpenCode	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oc_2637.jsonl`	360 MB	2,637
swerebench_others	OpenHands SDK	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oh_sdk_1843.jsonl`	340 MB	1,843
swerebench_others	Terminus	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_t2_2270.jsonl`	225 MB	2,270
swerebenchv2_python_oraclesolved	Claude Code	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_cc_2728.jsonl`	445 MB	2,728
swerebenchv2_python_oraclesolved	OpenCode	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oc_1654.jsonl`	240 MB	1,654
swerebenchv2_python_oraclesolved	OpenHands SDK	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oh_sdk_1521.jsonl`	313 MB	1,521
swerebenchv2_python_oraclesolved	Terminus	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_t2_1633.jsonl`	174 MB	1,633
v2nopy_full	Claude Code	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_cc_5128.jsonl`	818 MB	5,128
v2nopy_full	OpenCode	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oc_3465.jsonl`	455 MB	3,465
v2nopy_full	OpenHands SDK	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oh_sdk_3266.jsonl`	620 MB	3,266
v2nopy_full	Terminus	jierun	`/home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_t2_3675.jsonl`	411 MB	3,675

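Methodology notes

Trajectory counts

Each line of a JSONL file is one trajectory. Files with an _agent_type field are split into main/subagent; in files without that field, every line counts as main.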
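A minimal sketch of the counting rule. It assumes `_agent_type` is the string "main" for main trajectories and any other value for subagent trajectories; the exact values are not stated in this document. It also applies the default per record rather than per file, which gives the same result for files that lack the field entirely.

```python
import json
from collections import Counter


def count_trajectories(jsonl_lines):
    """Count main and sub trajectories from an iterable of JSONL lines."""
    counts = Counter(main=0, sub=0)
    for raw in jsonl_lines:
        if not raw.strip():
            continue  # tolerate blank lines
        record = json.loads(raw)
        # A missing _agent_type is treated as main.
        agent_type = record.get("_agent_type", "main")
        counts["main" if agent_type == "main" else "sub"] += 1
    return counts
```

Passing an open file object works directly, since iterating a text file yields one line at a time.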

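Average turns / tokens / tool calls

Average turns: the mean length of each trajectory's messages array.

Average tokens (estimated): for each trajectory, the total character count of content + reasoning_content across all messages, divided by 4; the reported value is the mean over trajectories.

Average tool calls: for each trajectory, the sum of the lengths of the tool_calls arrays in assistant messages; the reported value is the mean over trajectories.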
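The three averages can be sketched as follows. The field names (`messages`, `content`, `reasoning_content`, `tool_calls`, `role`) come from the notes above; the function names are hypothetical, and ignoring non-string `content` values is an assumption of this sketch.

```python
def trajectory_metrics(trajectory):
    """Return (turns, estimated_tokens, tool_calls) for one trajectory."""
    messages = trajectory.get("messages", [])
    chars = 0
    tool_calls = 0
    for message in messages:
        for key in ("content", "reasoning_content"):
            value = message.get(key)
            if isinstance(value, str):  # assumption: skip non-string content
                chars += len(value)
        if message.get("role") == "assistant":
            tool_calls += len(message.get("tool_calls") or [])
    # Token estimate: roughly 4 characters per token.
    return len(messages), chars / 4, tool_calls


def average_metrics(trajectories):
    """Mean of each metric over all trajectories, or None if there are none."""
    rows = [trajectory_metrics(t) for t in trajectories]
    if not rows:
        return None
    return tuple(sum(column) / len(rows) for column in zip(*rows))
```

Dividing characters by 4 is only a rough heuristic, so the token column is best read as a relative size indicator rather than a tokenizer-accurate count.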

Quality score

composite_score (0-1) is a weighted combination of five dimensions: efficiency, style, tool_mastery, completion, and precision.

Only some files contain a _score field; files without scores show "—".
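How a table cell for the average score might be produced, including the placeholder for unscored files, is sketched below. The layout of `_score` is not described in this document; the sketch assumes it holds the numeric composite score directly, and the function name `mean_score_cell` is hypothetical.

```python
def mean_score_cell(records, field="_score"):
    """Format the mean score of scored records, or a placeholder if none.

    Assumes record[field] is a plain number (an assumption of this sketch).
    """
    scores = [r[field] for r in records if isinstance(r.get(field), (int, float))]
    if not scores:
        return "—"  # same placeholder the tables use for unscored files
    return f"{sum(scores) / len(scores):.4f}"
```

Records without a score are left out of the mean instead of being counted as zero, in line with the overall average being reported over scored trajectories only.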