Language Progress
| Language | Collected PRs | PRs +1h | PRs +24h | Valid SWE | SWE +1h | SWE +24h | Processed Dirs | Success Rate |
|---|---|---|---|---|---|---|---|---|
| C | 28,232 | +30 | +30 | 7,734 | 0 | 0 | 10,431 | |
| C++ | 43,105 | +39 | +39 | 1,918 | +11 | +65 | 4,249 | |
| Go | 88,472 | +864 | +955 | 4,454 | +7 | +47 | 10,962 | |
| Java | 59,936 | +47 | +53 | 2,637 | +7 | +33 | 6,444 | |
| JavaScript | 30,941 | +155 | +155 | 3,668 | +2 | +2 | 7,877 | |
| Python | 68,268 | +185 | +236 | 2,518 | 0 | 0 | 7,964 | |
| Rust | 50,823 | +387 | +437 | 2,505 | +1 | +36 | 5,426 | |
| TypeScript | 56,189 | +326 | +329 | 3,391 | +4 | +57 | 7,351 | |
Run Parameters
| Language | Eval Model (OPENAI) | Fill Model (ANTHROPIC) | Concurrency | min_source_files | max_source_files |
|---|---|---|---|---|---|
| C | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 15 |
| C++ | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 15 |
| Go | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 10 |
| Java | claude-haiku-4-5-20251001 | claude-sonnet-4-6 | 16 | 2 | 10 |
| JavaScript | glmmoedsa | glmmoedsa | 16 | 2 | 10 |
| Python | MiniMax-M2.7 | MiniMax-M2.7 | 16 | 3 | 15 |
| Rust | gpt-5.4 | claude-opus-4-6 | 16 | 2 | 10 |
| TypeScript | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 10 |
Failure Reason Breakdown
| Language | Processed | Succeeded | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other |
|---|---|---|---|---|---|---|---|---|---|
| C | 39,798 | 8,220 | 31,578 | 13,809 | 2,264 | 12,730 | 680 | 1,855 | 240 |
| C++ | 23,492 | 2,108 | 21,384 | 5,056 | 572 | 14,881 | 221 | 381 | 273 |
| Go | 30,157 | 6,534 | 23,623 | 10,533 | 2,196 | 6,583 | 1,583 | 1,178 | 1,550 |
| Java | 14,757 | 2,991 | 11,766 | 4,728 | 1,593 | 3,213 | 261 | 357 | 1,614 |
| JavaScript | 16,972 | 5,115 | 11,857 | 7,386 | 1,794 | 529 | 358 | 637 | 1,153 |
| Python | 22,915 | 2,630 | 20,285 | 8,859 | 2,534 | 8,204 | 238 | 317 | 133 |
| Rust | 17,537 | 2,945 | 14,592 | 4,405 | 1,150 | 5,542 | 641 | 1,620 | 1,234 |
| TypeScript | 36,977 | 3,393 | 33,584 | 6,657 | 886 | 22,856 | 181 | 2,899 | 105 |
fix.patch Complexity
| Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files |
|---|---|---|---|---|
| C | 7,734 | 281.22 | 15.40 | 4.93 |
| C++ | 1,918 | 330.75 | 11.55 | 4.54 |
| Go | 4,454 | 270.98 | 14.92 | 4.96 |
| Java | 2,637 | 170.88 | 10.93 | 4.46 |
| JavaScript | 3,668 | 73.28 | 6.21 | 2.76 |
| Python | 2,518 | 135.78 | 9.99 | 3.44 |
| Rust | 2,505 | 257.03 | 12.86 | 4.06 |
| TypeScript | 3,391 | 159.62 | 9.09 | 4.09 |
Methodology Notes
Difficulty scoring (difficulty_score)
For each valid task directory, solution/fix.patch, tests/, and instruction.md are read and scored statically (zero API calls) by src/swegen/scoring.py.
The current formula uses continuous log-scale scoring so that mid-sized patches do not tip into hard too early. Weights: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.
Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.
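The weighted log-scale scoring and label thresholds can be sketched as follows. The weights and thresholds come from the text above; the raw per-dimension inputs and the log1p squashing are illustrative assumptions (the real src/swegen/scoring.py may transform and normalize differently):

```python
import math

# Dimension weights as stated in the methodology (they sum to 1.0).
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}

def difficulty_score(raw: dict) -> float:
    """Combine raw per-dimension counts into a 0-10 score.

    Each raw count is squashed with log1p so mid-sized patches get
    diminishing returns instead of jumping straight to 'hard'
    (illustrative transform, not the exact scoring.py formula).
    """
    total = 0.0
    for dim, weight in WEIGHTS.items():
        squashed = math.log1p(raw.get(dim, 0.0))  # log-scale squashing
        total += weight * min(squashed, 10.0)
    return round(min(total, 10.0), 2)

def difficulty_label(score: float) -> str:
    """Thresholds from the text: easy <= 4.0, medium <= 7.0, hard > 7.0."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"
```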
Tag generation and display
Tags are not computed live by the dashboard; they are generated by an LLM from PR information when swegen builds each task, and written to [metadata].tags in task.toml.
The prompt asks for tags in three parts: programming language, project layer/domain, and framework/library or concrete topic. The dashboard only reads the existing task.toml files and counts each tag's occurrences and share per language.
fix.patch statistics
Patch statistics come from each valid task's solution/fix.patch, with code files filtered by the language's file extensions; the definition matches the code-only statistics in upload_march_swe_to_hf.py.
Avg fix.patch lines counts added/removed lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files touched.
difficulty_label Distribution
| Language | easy | medium | hard |
|---|---|---|---|
| C | 735 | 5,189 | 1,802 |
| C++ | 323 | 1,194 | 397 |
| Go | 370 | 3,283 | 795 |
| Java | 351 | 1,676 | 607 |
| JavaScript | 593 | 2,726 | 348 |
| Python | 211 | 1,724 | 560 |
| Rust | 260 | 1,465 | 778 |
| TypeScript | 346 | 2,570 | 475 |
difficulty_score Overview
| Language | count | min | p25 | median | mean | p75 | max |
|---|---|---|---|---|---|---|---|
| C | 7,726 | 2.4 | 4.9 | 5.9 | 5.91 | 7.0 | 9.2 |
| C++ | 1,914 | 2.5 | 4.4 | 5.6 | 5.63 | 6.8 | 9.0 |
| Go | 4,448 | 2.6 | 4.9 | 5.8 | 5.81 | 6.7 | 9.1 |
| Java | 2,634 | 2.8 | 4.7 | 5.9 | 5.81 | 6.9 | 9.2 |
| JavaScript | 3,667 | 2.6 | 4.4 | 5.2 | 5.28 | 6.1 | 9.2 |
| Python | 2,495 | 2.6 | 4.9 | 5.8 | 5.90 | 6.9 | 8.9 |
| Rust | 2,503 | 2.7 | 4.9 | 6.2 | 6.11 | 7.4 | 9.0 |
| TypeScript | 3,391 | 2.7 | 4.6 | 5.5 | 5.58 | 6.5 | 8.9 |
Global Top Tags
Per-Language Tag Distribution
Trajectory File Overview
| Dataset | Scaffold | Model | Agent | Trajectories (main) | Trajectories (sub) | File Size | Avg Turns | Avg Tokens | Avg Tool Calls | Avg Score |
|---|---|---|---|---|---|---|---|---|---|---|
| swegen | Claude Code | glm5 | chaofan | 1,096 | 1,039 | 446 MB | 72.7 | 35,904 | 37.9 | 0.7218 |
| swegen | OpenCode | glm5 | chaofan | 860 | 317 | 197 MB | 64.8 | 28,867 | 34.0 | 0.7337 |
| swegen | OpenHands SDK | glm5 | chaofan | 1,140 | — | 280 MB | 119.6 | 44,172 | 59.8 | 0.7533 |
| swegen | Terminus | glm5 | chaofan | 1,331 | — | 142 MB | 53.2 | 26,362 | 0.0 | 0.7048 |
| swerebench_oraclesolved | Claude Code | glm5 | chaofan | 4,053 | — | 601 MB | 44.1 | 21,015 | 22.7 | — |
| swerebench_oraclesolved | OpenCode | glm5 | chaofan | 3,431 | — | 405 MB | 41.3 | 19,519 | 21.5 | — |
| swerebench_oraclesolved | OpenHands SDK | glm5 | chaofan | 2,808 | — | 638 MB | 135.1 | 40,631 | 67.2 | — |
| swerebench_oraclesolved | Terminus | glm5 | chaofan | 2,920 | — | 291 MB | 55.7 | 24,625 | 0.0 | — |
| swerebench_oraclesolved | Claude Code | glm5 | jierun | 5,013 | — | 779 MB | 44.7 | 22,630 | 23.3 | — |
| swerebench_oraclesolved | OpenCode | glm5 | jierun | 3,521 | — | 464 MB | 51.8 | 22,138 | 27.0 | — |
| swerebench_oraclesolved | OpenHands SDK | glm5 | jierun | 2,733 | — | 507 MB | 95.8 | 31,785 | 47.5 | — |
| swerebench_oraclesolved | Terminus | glm5 | jierun | 2,998 | — | 296 MB | 55.5 | 24,396 | 0.0 | — |
| swerebench_others | Claude Code | glm5 | jierun | 2,885 | — | 457 MB | 43.8 | 23,380 | 23.1 | 0.7178 |
| swerebench_others | OpenCode | glm5 | jierun | 2,637 | — | 360 MB | 49.1 | 23,135 | 25.8 | 0.7210 |
| swerebench_others | OpenHands SDK | glm5 | jierun | 1,843 | — | 340 MB | 90.2 | 32,078 | 44.8 | 0.7868 |
| swerebench_others | Terminus | glm5 | jierun | 2,270 | — | 225 MB | 50.6 | 24,402 | 0.0 | 0.7844 |
| swerebenchv2_python_oraclesolved | Claude Code | glm5 | jierun | 2,728 | — | 445 MB | 47.5 | 24,446 | 25.1 | — |
| swerebenchv2_python_oraclesolved | OpenCode | glm5 | jierun | 1,654 | — | 240 MB | 60.7 | 23,822 | 31.3 | — |
| swerebenchv2_python_oraclesolved | OpenHands SDK | glm5 | jierun | 1,521 | — | 313 MB | 103.0 | 35,858 | 51.0 | — |
| swerebenchv2_python_oraclesolved | Terminus | glm5 | jierun | 1,633 | — | 174 MB | 55.7 | 26,382 | 0.0 | — |
| v2nopy_full | Claude Code | glm5 | jierun | 5,128 | — | 818 MB | 42.7 | 24,097 | 22.8 | 0.7281 |
| v2nopy_full | OpenCode | glm5 | jierun | 3,465 | — | 455 MB | 47.0 | 21,718 | 24.8 | 0.7353 |
| v2nopy_full | OpenHands SDK | glm5 | jierun | 3,266 | — | 620 MB | 92.7 | 33,367 | 46.0 | 0.7580 |
| v2nopy_full | Terminus | glm5 | jierun | 3,675 | — | 411 MB | 64.1 | 27,517 | 0.0 | 0.7799 |
Quality Score Statistics
| Dataset | Scaffold | composite | efficiency | style | tool_mastery | completion | precision |
|---|---|---|---|---|---|---|---|
| swegen | Claude Code | 0.7218 | 0.9125 | 0.3414 | 0.8678 | 0.6213 | 0.7722 |
| swegen | OpenCode | 0.7337 | 0.9322 | 0.3272 | 0.8886 | 0.6288 | 0.7923 |
| swegen | OpenHands SDK | 0.7533 | 0.8975 | 0.3500 | 0.8537 | 0.7737 | 0.7632 |
| swegen | Terminus | 0.7048 | 0.9241 | 0.4228 | 0.8983 | 0.3960 | 0.8866 |
| swerebench_others | Claude Code | 0.7178 | 0.8864 | 0.3197 | 0.7942 | 0.6405 | 0.8924 |
| swerebench_others | OpenCode | 0.7210 | 0.8952 | 0.3171 | 0.8093 | 0.6509 | 0.8621 |
| swerebench_others | OpenHands SDK | 0.7868 | 0.8886 | 0.3693 | 0.7698 | 0.9333 | 0.8525 |
| swerebench_others | Terminus | 0.7844 | 0.8539 | 0.4292 | 0.8172 | 0.8203 | 0.9323 |
| v2nopy_full | Claude Code | 0.7281 | 0.9186 | 0.3213 | 0.8886 | 0.5565 | 0.8996 |
| v2nopy_full | OpenCode | 0.7353 | 0.9278 | 0.3254 | 0.9008 | 0.5766 | 0.8772 |
| v2nopy_full | OpenHands SDK | 0.7580 | 0.9002 | 0.3555 | 0.8440 | 0.7523 | 0.8370 |
| v2nopy_full | Terminus | 0.7799 | 0.9222 | 0.4310 | 0.9016 | 0.6930 | 0.8808 |
Comparison by Scaffold
| Scaffold | Total Trajectories | Avg Turns | Avg Tokens | Avg Tool Calls | Avg Score |
|---|---|---|---|---|---|
| Claude Code | 20,903 | 45.8 | 23,713 | 24.0 | 0.7241 |
| OpenCode | 15,568 | 49.6 | 22,187 | 25.9 | 0.7297 |
| OpenHands SDK | 13,311 | 105.4 | 35,606 | 52.4 | 0.7656 |
| Terminus | 14,827 | 56.7 | 25,611 | 0.0 | 0.7675 |
Comparison by Dataset
| Dataset | Total Trajectories | Avg Turns | Avg Tokens | Avg Tool Calls | Avg Score |
|---|---|---|---|---|---|
| swegen | 4,427 | 77.4 | 33,797 | 31.4 | 0.7271 |
| swerebench_oraclesolved | 27,477 | 61.8 | 25,095 | 25.3 | — |
| swerebench_others | 9,635 | 55.7 | 25,217 | 22.6 | 0.7475 |
| swerebenchv2_python_oraclesolved | 7,536 | 63.4 | 27,032 | 26.2 | — |
| v2nopy_full | 15,534 | 59.2 | 26,324 | 22.7 | 0.7483 |
Source Data Path Index
| Dataset | Scaffold | Agent | File Path | Size | Rows |
|---|---|---|---|---|---|
| swegen | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_cc_2135.jsonl | 446 MB | 2,135 |
| swegen | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_oc_1177.jsonl | 197 MB | 1,177 |
| swegen | OpenHands SDK | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_ohsdk_1140.jsonl | 280 MB | 1,140 |
| swegen | Terminus | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_t2_1331.jsonl | 142 MB | 1,331 |
| swerebench_oraclesolved | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_cc_4053.jsonl | 601 MB | 4,053 |
| swerebench_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_cc_5013.jsonl | 779 MB | 5,013 |
| swerebench_oraclesolved | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oc_3431.jsonl | 405 MB | 3,431 |
| swerebench_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oc_3521.jsonl | 464 MB | 3,521 |
| swerebench_oraclesolved | OpenHands SDK | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oh_2808.jsonl | 638 MB | 2,808 |
| swerebench_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oh_sdk_2733.jsonl | 507 MB | 2,733 |
| swerebench_oraclesolved | Terminus | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_t2_2920.jsonl | 291 MB | 2,920 |
| swerebench_oraclesolved | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_t2_2998.jsonl | 296 MB | 2,998 |
| swerebench_others | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_cc_2885.jsonl | 457 MB | 2,885 |
| swerebench_others | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oc_2637.jsonl | 360 MB | 2,637 |
| swerebench_others | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oh_sdk_1843.jsonl | 340 MB | 1,843 |
| swerebench_others | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_t2_2270.jsonl | 225 MB | 2,270 |
| swerebenchv2_python_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_cc_2728.jsonl | 445 MB | 2,728 |
| swerebenchv2_python_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oc_1654.jsonl | 240 MB | 1,654 |
| swerebenchv2_python_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oh_sdk_1521.jsonl | 313 MB | 1,521 |
| swerebenchv2_python_oraclesolved | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_t2_1633.jsonl | 174 MB | 1,633 |
| v2nopy_full | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_cc_5128.jsonl | 818 MB | 5,128 |
| v2nopy_full | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oc_3465.jsonl | 455 MB | 3,465 |
| v2nopy_full | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oh_sdk_3266.jsonl | 620 MB | 3,266 |
| v2nopy_full | Terminus | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_t2_3675.jsonl | 411 MB | 3,675 |
Methodology Notes
Trajectory counts
Each line of a JSONL file is one trajectory. Files whose rows carry an _agent_type field are split into main/subagent; in files without that field, every line counts as main.
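The classification rule above can be sketched as follows; only the _agent_type field name is stated in the text, so treating the literal value "subagent" as the sub marker is an assumption:

```python
import json

def count_trajectories(jsonl_path: str) -> dict:
    """Count main vs. subagent trajectories in one JSONL file.

    Rows with _agent_type == "subagent" count as sub; rows with any
    other value, or with the field missing entirely, count as main.
    """
    counts = {"main": 0, "sub": 0}
    with open(jsonl_path) as f:
        for line in f:
            row = json.loads(line)
            agent_type = row.get("_agent_type", "main")
            counts["sub" if agent_type == "subagent" else "main"] += 1
    return counts
```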
Average Turns / Tokens / Tool Calls
Avg turns: mean length of each trajectory's messages array.
Avg tokens (estimated): mean over trajectories of the total character count of every message's content + reasoning_content, divided by 4.
Avg tool calls: mean over trajectories of the summed tool_calls array lengths across assistant messages.
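The three per-trajectory definitions above can be sketched as one function. The chars/4 token estimate is the rough heuristic stated in the text, not a real tokenizer; message content is assumed to be a string (or None):

```python
def trajectory_metrics(traj: dict) -> dict:
    """Per-trajectory metrics matching the definitions above:
    turns = len(messages); est_tokens = total characters of
    content + reasoning_content divided by 4; tool_calls = summed
    tool_calls lengths over assistant messages."""
    messages = traj.get("messages", [])
    chars = sum(
        len(m.get("content") or "") + len(m.get("reasoning_content") or "")
        for m in messages
    )
    tool_calls = sum(
        len(m.get("tool_calls") or [])
        for m in messages
        if m.get("role") == "assistant"
    )
    return {
        "turns": len(messages),
        "est_tokens": chars / 4,  # rough chars/4 heuristic from the text
        "tool_calls": tool_calls,
    }
```

Dataset-level averages are then the mean of each metric over all trajectories in a file.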
Quality score
composite_score (0-1) is a weighted combination of five dimensions: efficiency, style, tool_mastery, completion, and precision.
Only some files contain a _score field; files without scores show "—".
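The composite combination can be sketched as below. The five dimension names come from the text, but the actual weights are not stated in this report, so equal weights are a placeholder assumption:

```python
# Dimension weights are NOT given in this report; equal weights are a
# placeholder assumption for illustration only.
DIM_WEIGHTS = {
    "efficiency": 0.2,
    "style": 0.2,
    "tool_mastery": 0.2,
    "completion": 0.2,
    "precision": 0.2,
}

def composite_score(dims: dict) -> float:
    """Weighted sum of the five 0-1 quality dimensions, clamped to [0, 1].

    Missing dimensions default to 0.0.
    """
    total = sum(w * dims.get(d, 0.0) for d, w in DIM_WEIGHTS.items())
    return round(min(max(total, 0.0), 1.0), 4)
```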