Total PRs collected
557,067
1h -41,710 / 24h -41,703
Total valid SWE tasks
51,808
1h +55 / 24h +900
Overall processing success rate
8.8%
Valid SWE / processed 588,824
Mean difficulty_score
5.89
median 5.9, count 51,758
Progress by language
| Language | PRs collected | PRs 1h change | PRs 24h change | Valid SWE | Valid SWE 1h change | Valid SWE 24h change | Processed | Success rate |
| C | 29,469 | -3,512 | -3,512 | 10,127 | +35 | +411 | 32,981 | |
| C++ | 45,620 | -3,543 | -3,543 | 4,062 | 0 | 0 | 49,019 | |
| Go | 126,439 | -6,586 | -6,583 | 8,386 | 0 | +6 | 123,518 | |
| Java | 84,868 | -5,968 | -5,967 | 4,260 | +1 | +102 | 90,730 | |
| JavaScript | 36,409 | -3,657 | -3,656 | 7,257 | 0 | 0 | 40,065 | |
| Python | 98,883 | -9,462 | -9,462 | 5,445 | +16 | +326 | 108,256 | |
| Rust | 68,650 | -3,927 | -3,927 | 5,595 | 0 | 0 | 72,539 | |
| TypeScript | 66,729 | -5,055 | -5,053 | 6,676 | +3 | +55 | 71,716 | |
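The success-rate column is blank in this snapshot. Assuming it uses the same definition as the overall figure (valid SWE divided by processed), it can be recomputed from the rows above. The following is a minimal sketch; `PROGRESS` and `success_rate` are illustrative names, not part of the dashboard code.

```python
# (processed, valid SWE) per language, copied from the progress table.
PROGRESS = {
    "C": (32_981, 10_127),
    "C++": (49_019, 4_062),
    "Go": (123_518, 8_386),
    "Java": (90_730, 4_260),
    "JavaScript": (40_065, 7_257),
    "Python": (108_256, 5_445),
    "Rust": (72_539, 5_595),
    "TypeScript": (71_716, 6_676),
}


def success_rate(processed: int, valid: int) -> float:
    """Return valid / processed as a percentage rounded to one decimal place."""
    return round(100 * valid / processed, 1) if processed else 0.0


# Per-language rates under the assumed definition.
rates = {lang: success_rate(p, v) for lang, (p, v) in PROGRESS.items()}

# Totals across languages, which should match the summary figures.
total_processed = sum(p for p, _ in PROGRESS.values())
total_valid = sum(v for _, v in PROGRESS.values())
overall = success_rate(total_processed, total_valid)
```

The totals reproduce the summary figures (588,824 processed, 51,808 valid, 8.8% overall), which supports the assumed definition.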
Run parameters
| Language | Evaluation model (OPENAI) | Fill model (ANTHROPIC) | Concurrency | min_source_files | max_source_files |
| C | glm-5 | claude-opus-4-7 | 12 | 2 | 15 |
| C++ | glm-5 | claude-opus-4-7 | 8 | 2 | 15 |
| Go | glm-5 | claude-opus-4-7 | 12 | 2 | 10 |
| Java | glm-5 | claude-opus-4-7 | 8 | 2 | 10 |
| JavaScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10 |
| Python | glm-5 | claude-opus-4-7 | 12 | 3 | 15 |
| Rust | glm-5 | claude-opus-4-7 | 8 | 2 | 10 |
| TypeScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10 |
Failure reasons
| Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other |
| C | 32,981 | 10,127 | 22,854 | 19,443 | 1,146 | 2,477 | 9 | 31 | 2 |
| C++ | 49,019 | 4,062 | 44,957 | 8,778 | 6,809 | 30,529 | 159 | 662 | 266 |
| Go | 123,518 | 8,386 | 115,132 | 38,741 | 24,048 | 49,742 | 1,461 | 1,033 | 108 |
| Java | 90,730 | 4,260 | 86,470 | 26,982 | 11,012 | 41,218 | 1,086 | 5,009 | 1,599 |
| JavaScript | 40,065 | 7,257 | 32,808 | 20,724 | 93 | 13,349 | 1 | 15 | 0 |
| Python | 108,256 | 5,445 | 102,811 | 44,785 | 15,519 | 42,956 | 836 | 339 | 120 |
| Rust | 72,539 | 5,595 | 66,944 | 30,149 | 10,205 | 24,100 | 1,077 | 846 | 1,233 |
| TypeScript | 71,716 | 6,676 | 65,040 | 22,089 | 13,939 | 26,824 | 1,769 | 711 | 8 |
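trivial_pr: the LLM judged the PR too simple (for example, it only changes configuration, documentation, or dependency versions) to serve as an SWE task.
validation: validation failed after task generation (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1).
infra_error: infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).
timeout: processing timed out (the overall per-PR timeout or the Claude Code session timeout).
workflow_error: workflow error (failure to fetch PR metadata, create the worktree, generate the patch, etc.).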
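The validation rule above can be sketched as a single predicate. This is a minimal illustration, not the pipeline's code: `AgentRuns` and `passes_validation` are hypothetical names, and the comments describing what each agent does are assumptions inferred from the agent names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentRuns:
    # Reward from the NOP agent (assumed to leave the repository unchanged).
    nop_reward: Optional[float]
    # Reward from the ORACLE agent (assumed to apply the reference fix).
    oracle_reward: Optional[float]


def passes_validation(runs: AgentRuns) -> bool:
    """A task is valid only if NOP returns reward=0 and ORACLE returns reward=1."""
    # A missing reward (None) compares unequal to both 0 and 1, so it fails.
    return runs.nop_reward == 0 and runs.oracle_reward == 1
```

Under this reading, a task whose tests already pass without any change (NOP reward of 1) or whose reference fix does not pass (ORACLE reward of 0) is counted under `validation`.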
fix.patch complexity
| Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files |
| C | 10,127 | 331.75 | 17.79 | 5.81 |
| C++ | 4,062 | 286.04 | 13.74 | 5.09 |
| Go | 8,386 | 212.01 | 12.56 | 4.34 |
| Java | 4,260 | 166.72 | 10.70 | 4.27 |
| JavaScript | 7,257 | 77.11 | 6.31 | 2.80 |
| Python | 5,445 | 154.31 | 11.12 | 3.90 |
| Rust | 5,595 | 226.35 | 13.19 | 4.11 |
| TypeScript | 6,676 | 155.30 | 9.53 | 4.11 |
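Methodology notes
Difficulty scoring (difficulty_score)
For each valid task directory, src/swegen/scoring.py reads solution/fix.patch, tests/, and instruction.md and computes a static score with zero API calls.
The current formula scores on a continuous log scale so that medium-sized patches are not classified as hard too early. The weights are: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.
Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.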
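The weights and thresholds above can be sketched as follows. The source does not show the contents of src/swegen/scoring.py, so this is an illustration under assumptions: each component is taken to be a sub-score on a 0 to 10 scale, and `log_scaled` is one possible log-scale mapping, not the actual one. Only the weights and label thresholds come from the text.

```python
import math

# Component weights as listed in the methodology notes (they sum to 1.0).
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def log_scaled(value: float, cap: float) -> float:
    """Hypothetical log-scale mapping of a raw size onto 0-10, saturating at `cap`."""
    clipped = max(0.0, min(value, cap))
    return 10.0 * (math.log1p(clipped) / math.log1p(cap))


def difficulty_score(components: dict[str, float]) -> float:
    """Weighted sum of sub-scores (each assumed to already be on a 0-10 scale)."""
    return round(sum(WEIGHTS[name] * components[name] for name in WEIGHTS), 2)


def difficulty_label(score: float) -> str:
    """Map a score to a label: easy <= 4.0, medium <= 7.0, hard > 7.0."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"
```

With a logarithmic mapping, doubling the size of a patch adds a roughly constant increment to its sub-score rather than doubling it, which is the stated reason medium-sized patches stay below the hard threshold.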
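Tag generation and display
Tags are not computed by the dashboard on the fly. An LLM generates them from the PR information when swegen builds the task, and they are written to [metadata].tags in task.toml.
The prompt asks for tags in three parts: programming language, project layer/domain, and framework/library name or specific topic. The dashboard only reads existing task.toml files and counts each tag's occurrences and share per language.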
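The counting step can be sketched as below. `tag_shares` is a hypothetical helper, not the dashboard's code. It assumes each task.toml stores a list of strings under `[metadata].tags`, and that the share is the number of tasks carrying a tag divided by the number of tasks, which is consistent with the figures in this snapshot (for example, 24,122 / 51,808 ≈ 46.6% for `library`).

```python
from collections import Counter

try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:  # older interpreters: assumes the tomli backport is installed
    import tomli as tomllib


def tag_shares(task_tomls: list[str]) -> list[tuple[str, int, float]]:
    """Return (tag, task count, share in %) sorted by descending count.

    `task_tomls` holds the text of each task.toml for one language.
    """
    counts = Counter()
    for text in task_tomls:
        tags = tomllib.loads(text).get("metadata", {}).get("tags", [])
        counts.update(set(tags))  # count each tag at most once per task
    total = len(task_tomls)
    return [(tag, n, round(100 * n / total, 1)) for tag, n in counts.most_common()]
```

Because a task carries several tags, the shares for one language add up to more than 100%.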
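fix.patch statistics
Patch statistics come from each valid task's solution/fix.patch, filtered to code files by language file extension, so the counting basis matches the code-only statistics in upload_march_swe_to_hf.py.
Avg fix.patch lines counts added/deleted lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files touched.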
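The three per-patch metrics can be sketched as below. This is an illustration rather than the logic in upload_march_swe_to_hf.py, which the source does not show. `patch_stats` is a hypothetical helper, and it assumes git-style patches in which every file section starts with a `diff --git a/... b/...` line.

```python
def patch_stats(patch_text: str, code_exts: tuple[str, ...]) -> dict[str, int]:
    """Count changed lines, hunks, and files for code files in a git-style diff."""
    changed = hunks = 0
    files = set()
    keep = False     # True while inside a file whose extension is in code_exts
    in_hunk = False  # True after the first @@ header of the current file
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # Take the post-change path, i.e. the part after " b/".
            path = line.rsplit(" b/", 1)[-1]
            keep = path.endswith(code_exts)
            in_hunk = False
            if keep:
                files.add(path)
        elif not keep:
            continue  # skip non-code files such as docs or configs
        elif line.startswith("@@"):
            hunks += 1
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            # Inside a hunk, every +/- line is an added or deleted line;
            # the ---/+++ file headers appear before the first hunk.
            changed += 1
    return {"lines": changed, "hunks": hunks, "files": len(files)}
```

For a Python task one might call `patch_stats(text, (".py",))`. Averaging the three values over all valid tasks of a language would then give the columns of the fix.patch complexity table.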
difficulty_label distribution
| Language | easy | medium | hard |
| C | 887 | 6,677 | 2,555 |
| C++ | 433 | 2,524 | 1,100 |
| Go | 689 | 6,041 | 1,650 |
| Java | 469 | 2,753 | 1,034 |
| JavaScript | 1,096 | 5,376 | 784 |
| Python | 277 | 3,477 | 1,667 |
| Rust | 391 | 3,312 | 1,890 |
| TypeScript | 615 | 4,925 | 1,136 |
difficulty_score overview
| Language | count | min | p25 | median | mean | p75 | max |
| C | 10,119 | 2.4 | 4.9 | 6.0 | 5.99 | 7.1 | 9.2 |
| C++ | 4,057 | 2.5 | 4.9 | 6.0 | 5.99 | 7.2 | 9.1 |
| Go | 8,380 | 2.6 | 4.9 | 5.8 | 5.85 | 6.8 | 9.1 |
| Java | 4,256 | 2.8 | 4.8 | 5.9 | 5.90 | 7.0 | 9.2 |
| JavaScript | 7,256 | 2.6 | 4.4 | 5.2 | 5.36 | 6.2 | 9.2 |
| Python | 5,421 | 2.6 | 5.2 | 6.3 | 6.25 | 7.3 | 9.1 |
| Rust | 5,593 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0 |
| TypeScript | 6,676 | 2.7 | 4.7 | 5.6 | 5.72 | 6.6 | 9.2 |
Global top tags
library 24,122 (46.6%)
backend 16,033 (31.0%)
cli 6,949 (13.4%)
missing-feature 3,642 (7.0%)
frontend 3,165 (6.1%)
testing 2,225 (4.3%)
http 1,595 (3.1%)
react 1,432 (2.8%)
incomplete-validation 1,360 (2.6%)
framework 1,249 (2.4%)
missing-implementation 1,194 (2.3%)
missing-metadata-propagation 945 (1.8%)
missing-fallback 862 (1.7%)
kubernetes 739 (1.4%)
async 683 (1.3%)
wrong-default 666 (1.3%)
embedded 642 (1.2%)
networking 629 (1.2%)
cpp 602 (1.2%)
missing-validation 571 (1.1%)
missing-functionality 453 (0.9%)
missing-configuration 424 (0.8%)
typescript 401 (0.8%)
type-handling-inconsistency 385 (0.7%)
parsing 363 (0.7%)
eslint 345 (0.7%)
graphql 340 (0.7%)
race-condition 336 (0.6%)
missing-configuration-option 306 (0.6%)
postgresql 300 (0.6%)
Tag distribution by language
C
library 5,362 (52.9%)
backend 2,606 (25.7%)
cli 1,156 (11.4%)
missing-feature 692 (6.8%)
embedded 616 (6.1%)
cpp 599 (5.9%)
testing 417 (4.1%)
networking 409 (4.0%)
framework 294 (2.9%)
incomplete-validation 283 (2.8%)
missing-implementation 283 (2.8%)
http 221 (2.2%)
ruby 217 (2.1%)
postgresql 213 (2.1%)
firmware 184 (1.8%)
kernel 181 (1.8%)
quic 174 (1.7%)
missing-metadata-propagation 152 (1.5%)
rust 141 (1.4%)
tls 136 (1.3%)
C++
library 2,794 (68.8%)
backend 741 (18.2%)
testing 496 (12.2%)
cli 359 (8.8%)
missing-feature 268 (6.6%)
framework 186 (4.6%)
missing-implementation 144 (3.5%)
http 129 (3.2%)
incomplete-validation 119 (2.9%)
boost 114 (2.8%)
async 92 (2.3%)
parsing 76 (1.9%)
qt 65 (1.6%)
serialization 55 (1.4%)
compiler 54 (1.3%)
geometry 51 (1.3%)
missing-fallback 50 (1.2%)
networking 49 (1.2%)
missing-metadata-propagation 48 (1.2%)
formatting 45 (1.1%)
Go
backend 4,423 (52.8%)
library 2,087 (24.9%)
cli 2,027 (24.2%)
missing-feature 739 (8.8%)
kubernetes 665 (7.9%)
http 536 (6.4%)
incomplete-validation 251 (3.0%)
testing 236 (2.8%)
missing-metadata-propagation 207 (2.5%)
missing-fallback 179 (2.1%)
missing-implementation 179 (2.1%)
wrong-default 134 (1.6%)
docker 134 (1.6%)
terraform 125 (1.5%)
grpc 122 (1.5%)
aws 121 (1.4%)
missing-validation 120 (1.4%)
prometheus 115 (1.4%)
networking 97 (1.2%)
framework 90 (1.1%)
Java
backend 2,071 (48.6%)
library 1,924 (45.2%)
missing-feature 237 (5.6%)
testing 222 (5.2%)
spring 162 (3.8%)
framework 156 (3.7%)
aem 147 (3.5%)
http 146 (3.4%)
android 127 (3.0%)
incomplete-validation 100 (2.3%)
missing-metadata-propagation 95 (2.2%)
missing-implementation 83 (1.9%)
missing-configuration 72 (1.7%)
cli 70 (1.6%)
json 68 (1.6%)
missing-null-check 64 (1.5%)
wrong-default 63 (1.5%)
maven 61 (1.4%)
sling 57 (1.3%)
kafka 56 (1.3%)
JavaScript
library 3,913 (53.9%)
backend 1,239 (17.1%)
frontend 1,043 (14.4%)
cli 859 (11.8%)
missing-feature 496 (6.8%)
react 354 (4.9%)
typescript 334 (4.6%)
testing 297 (4.1%)
eslint 263 (3.6%)
incomplete-validation 221 (3.0%)
framework 200 (2.8%)
http 183 (2.5%)
fastify 149 (2.1%)
webpack 148 (2.0%)
missing-fallback 124 (1.7%)
missing-metadata-propagation 113 (1.6%)
svelte 106 (1.5%)
nodejs 104 (1.4%)
mongoose 104 (1.4%)
missing-implementation 100 (1.4%)
Python
library 2,324 (42.8%)
backend 2,241 (41.3%)
cli 737 (13.6%)
missing-feature 454 (8.4%)
fastapi 220 (4.1%)
django 157 (2.9%)
missing-implementation 146 (2.7%)
pytorch 119 (2.2%)
testing 116 (2.1%)
incomplete-validation 114 (2.1%)
missing-fallback 114 (2.1%)
missing-metadata-propagation 111 (2.0%)
async 102 (1.9%)
framework 102 (1.9%)
ansible 100 (1.8%)
http 95 (1.8%)
aws 76 (1.4%)
aiohttp 66 (1.2%)
pydantic 65 (1.2%)
missing-parameter 63 (1.2%)
Rust
library 3,132 (56.0%)
backend 1,224 (21.9%)
cli 1,138 (20.3%)
missing-feature 411 (7.3%)
testing 315 (5.6%)
async 225 (4.0%)
http 211 (3.8%)
missing-implementation 162 (2.9%)
incomplete-validation 133 (2.4%)
compiler 120 (2.1%)
git 119 (2.1%)
missing-metadata-propagation 97 (1.7%)
macros 93 (1.7%)
parsing 92 (1.6%)
graphql 81 (1.4%)
blockchain 80 (1.4%)
substrate 77 (1.4%)
serde 73 (1.3%)
missing-fallback 71 (1.3%)
sql 65 (1.2%)
TypeScript
library 2,586 (38.7%)
frontend 1,837 (27.5%)
backend 1,488 (22.3%)
react 1,069 (16.0%)
cli 603 (9.0%)
missing-feature 345 (5.2%)
angular 203 (3.0%)
graphql 164 (2.5%)
framework 159 (2.4%)
missing-fallback 148 (2.2%)
javascript 144 (2.2%)
electron 141 (2.1%)
incomplete-validation 139 (2.1%)
fullstack 131 (2.0%)
testing 126 (1.9%)
missing-metadata-propagation 122 (1.8%)
wrong-default 120 (1.8%)
vue 105 (1.6%)
missing-implementation 97 (1.5%)
nextjs 86 (1.3%)