Total PRs collected
557,074
1h +1 / 24h -41,702
Total valid SWE tasks
52,439
1h +32 / 24h +1,106
Overall processing success rate
8.9%
Valid SWE / processed 588,932
Mean difficulty_score
5.90
median 5.9, count 52,389
Language progress
| Language | PRs collected | Last 1h | Last 24h | Valid SWE | Last 1h | Last 24h | Processed | Success rate |
| C (c) | 29,470 | +1 | -3,511 | 10,269 | 0 | +411 | 32,981 | |
| C++ (cpp) | 45,622 | 0 | -3,541 | 4,062 | 0 | 0 | 49,127 | |
| Go (go) | 126,442 | 0 | -6,582 | 8,386 | 0 | 0 | 123,518 | |
| Java (java) | 84,868 | 0 | -5,968 | 4,304 | +11 | +55 | 90,730 | |
| JavaScript (js) | 36,409 | 0 | -3,657 | 7,257 | 0 | 0 | 40,065 | |
| Python (py) | 98,883 | 0 | -9,462 | 5,828 | +17 | +549 | 108,256 | |
| Rust (rust) | 68,650 | 0 | -3,927 | 5,595 | 0 | 0 | 72,539 | |
| TypeScript (ts) | 66,730 | 0 | -5,054 | 6,738 | +4 | +91 | 71,716 | |
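The success-rate column is blank in this snapshot. Going by the overall card (Valid SWE / processed), a per-language figure can be derived from the two counts already in the table. This is a minimal sketch that assumes the same ratio applies per language; the function name `success_rate` is illustrative and not part of swegen.

```python
def success_rate(valid_swe: int, processed: int) -> float:
    """Return valid_swe / processed as a percentage (0.0 if nothing was processed)."""
    if processed <= 0:
        return 0.0
    return 100.0 * valid_swe / processed


# Counts copied from the language progress table: (valid SWE, processed).
rows = {
    "C": (10_269, 32_981),
    "JavaScript": (7_257, 40_065),
}

for language, (valid, processed) in rows.items():
    print(f"{language}: {success_rate(valid, processed):.1f}%")

# Overall figure from the summary cards.
print(f"overall: {success_rate(52_439, 588_932):.1f}%")
```

The overall call reproduces the 8.9% shown in the summary card.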
Run parameters
| Language | Evaluation model (OPENAI) | Fill model (ANTHROPIC) | Concurrency | min_source_files | max_source_files |
| C | glm-5 | claude-opus-4-7 | 12 | 2 | 15 |
| C++ | glm-5 | claude-opus-4-7 | 8 | 2 | 15 |
| Go | glm-5 | claude-opus-4-7 | 12 | 2 | 10 |
| Java | glm-5 | claude-opus-4-7 | 8 | 2 | 10 |
| JavaScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10 |
| Python | glm-5 | claude-opus-4-7 | 12 | 3 | 15 |
| Rust | glm-5 | claude-opus-4-7 | 8 | 2 | 10 |
| TypeScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10 |
Failure reason statistics
| Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other |
| C | 32,981 | 10,269 | 22,712 | 19,452 | 1,127 | 2,343 | 11 | 37 | 2 |
| C++ | 49,127 | 4,062 | 45,065 | 8,805 | 6,810 | 30,622 | 159 | 649 | 266 |
| Go | 123,518 | 8,386 | 115,132 | 38,741 | 24,048 | 49,742 | 1,461 | 1,033 | 108 |
| Java | 90,730 | 4,304 | 86,426 | 27,085 | 11,033 | 40,992 | 1,148 | 5,009 | 1,599 |
| JavaScript | 40,065 | 7,257 | 32,808 | 20,845 | 93 | 13,228 | 1 | 15 | 0 |
| Python | 108,256 | 5,828 | 102,428 | 45,400 | 15,545 | 41,907 | 877 | 339 | 120 |
| Rust | 72,539 | 5,595 | 66,944 | 30,149 | 10,205 | 24,100 | 1,077 | 846 | 1,233 |
| TypeScript | 71,716 | 6,738 | 64,978 | 22,100 | 13,943 | 26,760 | 1,775 | 692 | 8 |
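trivial_pr: the LLM judged the PR too simple (for example, it only changes configuration, documentation, or dependency versions) to serve as an SWE task.
validation: the generated task failed validation (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1).
infra_error: an infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).
timeout: processing timed out (either the overall per-PR timeout or the Claude Code session timeout).
workflow_error: a workflow error (failure to fetch PR metadata, create the worktree, generate the patch, etc.).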
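The validation rule above can be sketched as a small check. This is an illustration only: the function name `classify_validation` and its return strings are hypothetical, and the actual swegen implementation is not shown in this page.

```python
def classify_validation(nop_reward: float, oracle_reward: float) -> str:
    """Return "valid" only when the NOP agent scores 0 and the ORACLE agent scores 1."""
    if nop_reward != 0:
        # The do-nothing agent was rewarded, so the tests presumably
        # do not detect the missing fix.
        return "validation"
    if oracle_reward != 1:
        # The reference solution was not rewarded, so the task
        # cannot be confirmed as solvable.
        return "validation"
    return "valid"


print(classify_validation(nop_reward=0, oracle_reward=1))
print(classify_validation(nop_reward=1, oracle_reward=1))
```

Both conditions must hold, so a task fails validation if either agent returns an unexpected reward.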
fix.patch complexity
| Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files |
| C | 10,269 | 333.28 | 17.96 | 5.82 |
| C++ | 4,062 | 286.04 | 13.74 | 5.09 |
| Go | 8,386 | 212.01 | 12.56 | 4.34 |
| Java | 4,304 | 166.46 | 10.70 | 4.26 |
| JavaScript | 7,257 | 77.11 | 6.31 | 2.80 |
| Python | 5,828 | 156.38 | 11.19 | 3.94 |
| Rust | 5,595 | 226.35 | 13.19 | 4.11 |
| TypeScript | 6,738 | 155.55 | 9.53 | 4.12 |
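Notes on statistical methods
Difficulty scoring: difficulty_score
For each valid task directory, src/swegen/scoring.py reads solution/fix.patch, tests/, and instruction.md and computes a static score without any API calls.
The current formula uses continuous log-scale scoring so that medium-sized patches are not labeled hard too early. The weights are: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.
Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.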
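The weighting and thresholds can be sketched as follows. The weights and label cut-offs come from the notes above; everything else is an assumption for illustration, including the 0 to 10 range of each component, the `log_scaled` mapping and its `saturation` parameter, and all function names. The actual formula in src/swegen/scoring.py is not reproduced here.

```python
import math

# Weights as stated in the methodology notes (they sum to 1.0).
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def log_scaled(value: float, saturation: float) -> float:
    """Hypothetical mapping of a raw count onto 0..10 using a log scale."""
    if value <= 0:
        return 0.0
    return min(10.0, 10.0 * math.log1p(value) / math.log1p(saturation))


def difficulty_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to lie in 0..10."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def difficulty_label(score: float) -> str:
    """Apply the thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"


example = {name: 5.0 for name in WEIGHTS}
score = difficulty_score(example)
print(round(score, 2), difficulty_label(score))
```

A log scale grows slowly for large inputs, which matches the stated goal of keeping medium-sized patches out of the hard bucket.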
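Tag generation and display
Tags are not computed by the dashboard on the fly. An LLM generates them from the PR information when swegen builds the task, and they are written to [metadata].tags in task.toml.
The prompt asks for tags in three parts: programming language, project layer/domain, and framework/library name or specific topic. The dashboard only reads the existing task.toml files and, for each language, counts how often each tag occurs and its share.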
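fix.patch statistics
Patch statistics come from each valid task's solution/fix.patch, filtered to code files by language file extension, consistent with the code-only statistics in upload_march_swe_to_hf.py.
Avg fix.patch lines counts added and deleted lines in the diffs of code files; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files touched.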
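The three per-patch counts can be sketched with a small parser. It assumes git-style unified diffs with `diff --git a/<path> b/<path>` headers and a caller-supplied extension set. The `PatchStats` class and `patch_stats` function are illustrative and are not the code in upload_march_swe_to_hf.py.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class PatchStats:
    lines: int = 0  # added + deleted lines in kept files
    hunks: int = 0  # number of @@ hunk headers in kept files
    files: int = 0  # number of kept (code) files


def patch_stats(patch_text: str, extensions: set[str]) -> PatchStats:
    """Count changed lines, hunks, and files, keeping only the given extensions."""
    stats = PatchStats()
    counting = False  # True while inside a file whose extension is kept
    in_hunk = False   # True after the first @@ header of the current file
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            path = line.rsplit(" b/", 1)[-1]
            counting = PurePosixPath(path).suffix in extensions
            in_hunk = False
            if counting:
                stats.files += 1
        elif not counting:
            continue
        elif line.startswith("@@"):
            stats.hunks += 1
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            # The ---/+++ file headers come before the first hunk,
            # so they are never counted here.
            stats.lines += 1
    return stats
```

Calling `patch_stats(text, {".py"})` on a patch that touches both a `.py` file and a README counts only the `.py` file, which is the code-only behaviour described above. Averaging these per-task values over all valid tasks of a language would give the table columns.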
difficulty_label distribution
| Language | easy / medium / hard | easy | medium | hard |
| C | 897 / 6767 / 2597 | 897 | 6,767 | 2,597 |
| C++ | 433 / 2524 / 1100 | 433 | 2,524 | 1,100 |
| Go | 689 / 6041 / 1650 | 689 | 6,041 | 1,650 |
| Java | 470 / 2784 / 1046 | 470 | 2,784 | 1,046 |
| JavaScript | 1096 / 5376 / 784 | 1,096 | 5,376 | 784 |
| Python | 281 / 3687 / 1836 | 281 | 3,687 | 1,836 |
| Rust | 391 / 3312 / 1890 | 391 | 3,312 | 1,890 |
| TypeScript | 618 / 4975 / 1145 | 618 | 4,975 | 1,145 |
difficulty_score overview
| Language | count | min | p25 | median | mean | p75 | max |
| C | 10,261 | 2.4 | 4.9 | 6.0 | 5.99 | 7.1 | 9.2 |
| C++ | 4,057 | 2.5 | 4.9 | 6.0 | 5.99 | 7.2 | 9.1 |
| Go | 8,380 | 2.6 | 4.9 | 5.8 | 5.85 | 6.8 | 9.1 |
| Java | 4,300 | 2.8 | 4.8 | 5.9 | 5.91 | 7.0 | 9.2 |
| JavaScript | 7,256 | 2.6 | 4.4 | 5.2 | 5.36 | 6.2 | 9.2 |
| Python | 5,804 | 2.6 | 5.3 | 6.3 | 6.28 | 7.3 | 9.1 |
| Rust | 5,593 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0 |
| TypeScript | 6,738 | 2.7 | 4.7 | 5.6 | 5.72 | 6.6 | 9.2 |
Global Top Tags
library 24,455 (46.7%)
backend 16,227 (31.0%)
cli 7,016 (13.4%)
missing-feature 3,717 (7.1%)
frontend 3,189 (6.1%)
testing 2,229 (4.3%)
http 1,603 (3.1%)
react 1,440 (2.7%)
incomplete-validation 1,375 (2.6%)
framework 1,255 (2.4%)
missing-implementation 1,196 (2.3%)
missing-metadata-propagation 954 (1.8%)
missing-fallback 865 (1.7%)
kubernetes 743 (1.4%)
async 689 (1.3%)
wrong-default 666 (1.3%)
embedded 643 (1.2%)
networking 629 (1.2%)
cpp 606 (1.2%)
missing-validation 573 (1.1%)
missing-functionality 454 (0.9%)
missing-configuration 425 (0.8%)
typescript 402 (0.8%)
type-handling-inconsistency 389 (0.7%)
parsing 364 (0.7%)
eslint 348 (0.7%)
graphql 342 (0.7%)
race-condition 336 (0.6%)
missing-configuration-option 316 (0.6%)
postgresql 301 (0.6%)
Tag distribution by language
C (c)
library 5,450 (53.1%)
backend 2,632 (25.6%)
cli 1,174 (11.4%)
missing-feature 710 (6.9%)
embedded 617 (6.0%)
cpp 601 (5.9%)
testing 417 (4.1%)
networking 409 (4.0%)
framework 294 (2.9%)
incomplete-validation 288 (2.8%)
missing-implementation 284 (2.8%)
http 224 (2.2%)
ruby 221 (2.2%)
postgresql 213 (2.1%)
firmware 184 (1.8%)
kernel 182 (1.8%)
quic 180 (1.8%)
missing-metadata-propagation 153 (1.5%)
rust 141 (1.4%)
tls 137 (1.3%)
C++ (cpp)
library 2,794 (68.8%)
backend 741 (18.2%)
testing 496 (12.2%)
cli 359 (8.8%)
missing-feature 268 (6.6%)
framework 186 (4.6%)
missing-implementation 144 (3.5%)
http 129 (3.2%)
incomplete-validation 119 (2.9%)
boost 114 (2.8%)
async 92 (2.3%)
parsing 76 (1.9%)
qt 65 (1.6%)
serialization 55 (1.4%)
compiler 54 (1.3%)
geometry 51 (1.3%)
missing-fallback 50 (1.2%)
networking 49 (1.2%)
missing-metadata-propagation 48 (1.2%)
formatting 45 (1.1%)
Go (go)
backend 4,423 (52.8%)
library 2,087 (24.9%)
cli 2,027 (24.2%)
missing-feature 739 (8.8%)
kubernetes 665 (7.9%)
http 536 (6.4%)
incomplete-validation 251 (3.0%)
testing 236 (2.8%)
missing-metadata-propagation 207 (2.5%)
missing-fallback 179 (2.1%)
missing-implementation 179 (2.1%)
docker 134 (1.6%)
wrong-default 134 (1.6%)
terraform 125 (1.5%)
grpc 122 (1.5%)
aws 121 (1.4%)
missing-validation 120 (1.4%)
prometheus 115 (1.4%)
networking 97 (1.2%)
database 90 (1.1%)
Java (java)
backend 2,092 (48.6%)
library 1,945 (45.2%)
missing-feature 239 (5.6%)
testing 222 (5.2%)
spring 164 (3.8%)
framework 157 (3.7%)
aem 147 (3.4%)
http 147 (3.4%)
android 129 (3.0%)
incomplete-validation 101 (2.3%)
missing-metadata-propagation 96 (2.2%)
missing-implementation 83 (1.9%)
missing-configuration 72 (1.7%)
cli 71 (1.7%)
json 68 (1.6%)
missing-null-check 64 (1.5%)
wrong-default 63 (1.5%)
maven 61 (1.4%)
sling 57 (1.3%)
kafka 56 (1.3%)
JavaScript (js)
library 3,913 (53.9%)
backend 1,239 (17.1%)
frontend 1,043 (14.4%)
cli 859 (11.8%)
missing-feature 496 (6.8%)
react 354 (4.9%)
typescript 334 (4.6%)
testing 297 (4.1%)
eslint 263 (3.6%)
incomplete-validation 221 (3.0%)
framework 200 (2.8%)
http 183 (2.5%)
fastify 149 (2.1%)
webpack 148 (2.0%)
missing-fallback 124 (1.7%)
missing-metadata-propagation 113 (1.6%)
svelte 106 (1.5%)
mongoose 104 (1.4%)
nodejs 104 (1.4%)
missing-implementation 100 (1.4%)
Python (py)
library 2,517 (43.3%)
backend 2,377 (40.9%)
cli 781 (13.4%)
missing-feature 507 (8.7%)
fastapi 232 (4.0%)
django 170 (2.9%)
missing-implementation 147 (2.5%)
pytorch 138 (2.4%)
incomplete-validation 122 (2.1%)
testing 119 (2.0%)
missing-fallback 117 (2.0%)
missing-metadata-propagation 117 (2.0%)
async 108 (1.9%)
framework 104 (1.8%)
ansible 100 (1.7%)
http 99 (1.7%)
aws 80 (1.4%)
pydantic 71 (1.2%)
aiohttp 66 (1.1%)
flask 63 (1.1%)
Rust (rust)
library 3,132 (56.0%)
backend 1,224 (21.9%)
cli 1,138 (20.3%)
missing-feature 411 (7.3%)
testing 315 (5.6%)
async 225 (4.0%)
http 211 (3.8%)
missing-implementation 162 (2.9%)
incomplete-validation 133 (2.4%)
compiler 120 (2.1%)
git 119 (2.1%)
missing-metadata-propagation 97 (1.7%)
macros 93 (1.7%)
parsing 92 (1.6%)
graphql 81 (1.4%)
blockchain 80 (1.4%)
substrate 77 (1.4%)
serde 73 (1.3%)
missing-fallback 71 (1.3%)
sql 65 (1.2%)
TypeScript (ts)
library 2,617 (38.8%)
frontend 1,848 (27.4%)
backend 1,499 (22.2%)
react 1,076 (16.0%)
cli 607 (9.0%)
missing-feature 347 (5.1%)
angular 204 (3.0%)
graphql 164 (2.4%)
framework 162 (2.4%)
missing-fallback 148 (2.2%)
javascript 144 (2.1%)
electron 142 (2.1%)
incomplete-validation 140 (2.1%)
fullstack 133 (2.0%)
testing 127 (1.9%)
missing-metadata-propagation 123 (1.8%)
wrong-default 120 (1.8%)
vue 106 (1.6%)
missing-implementation 97 (1.4%)
nextjs 86 (1.3%)