Language progress
| Language | PRs collected | New PRs (1h) | New PRs (24h) | Valid SWE | New valid (1h) | New valid (24h) | Processed | Success rate |
|---|---|---|---|---|---|---|---|---|
| C | 29,588 | 0 | +119 | 9,109 | 0 | +201 | 29,516 | |
| C++ | 45,943 | 0 | +331 | 3,724 | +3 | +86 | 45,582 | |
| Go | 126,439 | 0 | 0 | 7,572 | +2 | +42 | 73,008 | |
| Java | 85,406 | 0 | +538 | 3,624 | +3 | +159 | 60,353 | |
| JavaScript | 37,128 | 0 | +719 | 6,670 | +12 | +300 | 48,432 | |
| Python | 98,883 | 0 | 0 | 4,563 | 0 | +26 | 79,890 | |
| Rust | 68,650 | 0 | 0 | 5,191 | 0 | +77 | 68,604 | |
| TypeScript | 66,729 | 0 | 0 | 5,928 | +6 | +212 | 56,260 | |
Run parameters
| Language | Evaluation model (OPENAI) | Fill model (ANTHROPIC) | Concurrency | min_source_files | max_source_files |
|---|---|---|---|---|---|
| C | gpt-5.4 | claude-opus-4-7 | 12 | 2 | 15 |
| C++ | glm-5 | claude-sonnet-4-6 | 8 | 2 | 15 |
| Go | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10 |
| Java | claude-haiku-4-5-20251001 | claude-opus-4-7 | 8 | 2 | 10 |
| JavaScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10 |
| Python | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 3 | 15 |
| Rust | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 8 | 2 | 10 |
| TypeScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10 |
Failure reason breakdown
| Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other |
|---|---|---|---|---|---|---|---|---|---|
| C | 29,516 | 9,109 | 20,407 | 14,640 | 960 | 5,015 | 14 | 31 | 1 |
| C++ | 45,582 | 3,724 | 41,858 | 8,882 | 2,016 | 31,914 | 322 | 583 | 266 |
| Go | 73,008 | 7,572 | 65,436 | 20,813 | 6,803 | 34,774 | 1,582 | 780 | 656 |
| Java | 60,353 | 3,624 | 56,729 | 15,769 | 6,516 | 31,183 | 1,262 | 743 | 1,602 |
| JavaScript | 48,432 | 6,670 | 41,762 | 20,096 | 1,736 | 18,907 | 777 | 246 | 0 |
| Python | 79,890 | 4,563 | 75,327 | 24,804 | 5,720 | 45,111 | 894 | 355 | 120 |
| Rust | 68,604 | 5,191 | 63,413 | 18,956 | 5,287 | 36,384 | 1,313 | 865 | 1,234 |
| TypeScript | 56,260 | 5,928 | 50,332 | 13,372 | 3,243 | 31,510 | 1,386 | 810 | 11 |
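trivial_pr: the LLM judged the PR too simple to serve as an SWE task (e.g., it only changes configuration, documentation, or dependency versions).
validation: the generated task failed validation (the NOP agent did not get reward=0, or the ORACLE agent did not get reward=1).
infra_error: infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).
timeout: processing timed out (either the overall per-PR timeout or a Claude Code session timeout).
workflow_error: pipeline error (failure to fetch PR metadata, create a worktree, generate the patch, etc.).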
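A minimal Python sketch of the validation rule and of bucketing failures into the columns above. `validation_passed` encodes only the stated reward condition; `failure_breakdown`, and the idea that every failure carries a single reason string, are hypothetical and not taken from the swegen pipeline.

```python
from collections import Counter
from typing import Iterable

# Bucket names follow the failure table; anything unrecognised lands in "other".
KNOWN_REASONS = ("trivial_pr", "validation", "infra_error", "timeout", "workflow_error")


def validation_passed(nop_reward: float, oracle_reward: float) -> bool:
    """A generated task is kept only if the NOP run scores 0 and the ORACLE run scores 1."""
    return nop_reward == 0 and oracle_reward == 1


def failure_breakdown(reasons: Iterable[str]) -> dict[str, int]:
    """Hypothetical helper: count per-PR failure reasons into the table's columns."""
    counts = Counter(r if r in KNOWN_REASONS else "other" for r in reasons)
    return {name: counts.get(name, 0) for name in (*KNOWN_REASONS, "other")}
```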
fix.patch complexity
| 语言 | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files |
|---|---|---|---|---|
| C | 9,109 | 306.39 | 16.67 | 5.47 |
| C++ | 3,724 | 294.35 | 13.58 | 5.09 |
| Go | 7,572 | 219.68 | 12.96 | 4.42 |
| Java | 3,624 | 167.98 | 10.78 | 4.35 |
| JavaScript | 6,670 | 76.61 | 6.25 | 2.78 |
| Python | 4,563 | 151.66 | 10.98 | 3.81 |
| Rust | 5,191 | 228.18 | 13.18 | 4.09 |
| TypeScript | 5,928 | 159.50 | 9.58 | 4.15 |
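Methodology
Difficulty scoring (difficulty_score)
For each valid task directory, src/swegen/scoring.py reads solution/fix.patch, tests/ and instruction.md and computes a static score with zero API calls.
The current formula uses continuous log-scale scoring so that medium-sized patches are not pushed into hard too early. Weights: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.
Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.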
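A minimal Python sketch of how the weights and label thresholds above combine. The weights and thresholds come from the text; the 0 to 10 sub-score range, the `log_scale` normalization and its `saturation` parameter are assumptions, not the actual code in src/swegen/scoring.py.

```python
import math

# Weights as documented above (they sum to 1.0).
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def log_scale(value: float, saturation: float) -> float:
    """Hypothetical normalisation: map a raw count onto 0-10 on a log scale,
    reaching 10 at `saturation` and clipping above it."""
    if value <= 0:
        return 0.0
    return min(10.0, 10.0 * math.log1p(value) / math.log1p(saturation))


def difficulty_score(components: dict[str, float]) -> float:
    """Weighted sum of per-dimension sub-scores, each assumed to lie in 0-10."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def difficulty_label(score: float) -> str:
    """Apply the documented thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"
```

For example, a task whose five sub-scores are all 5.0 gets a difficulty_score of 5.0 and the label medium.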
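Tag generation and display
Tags are not computed live by the dashboard. While building each task, swegen has an LLM generate them from the PR information and writes them to [metadata].tags in task.toml.
The prompt asks for tags in three tiers: programming language, project layer/domain, and framework/library name or specific topic. The dashboard only reads the existing task.toml files and counts, per language, how often each tag occurs and its share.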
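fix.patch statistics
Patch statistics come from each valid task's solution/fix.patch, filtered to code files by language-specific file extension, using the same code-only methodology as upload_march_swe_to_hf.py.
Avg fix.patch lines counts the added and removed lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files touched.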
difficulty_label distribution
| Language | easy / medium / hard | easy | medium | hard |
|---|---|---|---|---|
| C | 821 / 6042 / 2237 | 821 | 6,042 | 2,237 |
| C++ | 417 / 2298 / 1003 | 417 | 2,298 | 1,003 |
| Go | 583 / 5491 / 1492 | 583 | 5,491 | 1,492 |
| Java | 409 / 2322 / 888 | 409 | 2,322 | 888 |
| JavaScript | 1000 / 4966 / 702 | 1,000 | 4,966 | 702 |
| Python | 253 / 2952 / 1335 | 253 | 2,952 | 1,335 |
| Rust | 370 / 3065 / 1754 | 370 | 3,065 | 1,754 |
| TypeScript | 526 / 4404 / 997 | 526 | 4,404 | 997 |
difficulty_score overview
| Language | count | min | p25 | median | mean | p75 | max |
|---|---|---|---|---|---|---|---|
| C | 9,100 | 2.4 | 4.9 | 6.0 | 5.96 | 7.0 | 9.2 |
| C++ | 3,718 | 2.5 | 4.8 | 6.0 | 5.97 | 7.1 | 9.1 |
| Go | 7,566 | 2.6 | 4.9 | 5.8 | 5.86 | 6.8 | 9.1 |
| Java | 3,619 | 2.8 | 4.8 | 5.9 | 5.89 | 7.0 | 9.2 |
| JavaScript | 6,668 | 2.6 | 4.4 | 5.2 | 5.35 | 6.2 | 9.2 |
| Python | 4,540 | 2.6 | 5.2 | 6.2 | 6.19 | 7.3 | 9.1 |
| Rust | 5,189 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0 |
| TypeScript | 5,927 | 2.7 | 4.7 | 5.6 | 5.71 | 6.6 | 9.1 |
Global top tags
Tag distribution by language
Trajectory file overview
| Dataset | Scaffold | Model | Owner | Trajectories | File size | Avg turns | Avg tokens | Avg tool calls | Avg score |
|---|---|---|---|---|---|---|---|---|---|
| swegen | Claude Code | glm5 | chaofan | 1,096 | 446 MB | 72.7 | 39,648 | 37.9 | 0.7218 |
| swegen | OpenCode | glm5 | chaofan | 860 | 197 MB | 64.8 | 30,577 | 34.0 | 0.7337 |
| swegen | OpenHands SDK | glm5 | chaofan | 1,140 | 280 MB | 119.6 | 49,811 | 59.8 | 0.7533 |
| swegen | Terminus-2 | glm5 | chaofan | 1,331 | 142 MB | 53.2 | 26,968 | 41.2 | 0.7048 |
| swerebench_oraclesolved | Claude Code | glm5 | chaofan | 4,053 | 601 MB | 44.1 | 22,117 | 22.7 | — |
| swerebench_oraclesolved | OpenCode | glm5 | chaofan | 3,431 | 405 MB | 41.3 | 19,704 | 21.5 | — |
| swerebench_oraclesolved | OpenHands-AI | glm5 | chaofan | 2,808 | 638 MB | 135.1 | 36,434 | 67.2 | — |
| swerebench_oraclesolved | Terminus-2 | glm5 | chaofan | 2,920 | 291 MB | 55.7 | 25,473 | 36.9 | — |
| swegen_selfmade_260301_260414 | OpenHands SDK | glm5 | jierun | 4,555 | 993 MB | 98.5 | 41,839 | 49.1 | — |
| swegen_selfmade_260301_260414 | Terminus-2 | glm5 | jierun | 4,805 | 659 MB | 75.8 | 34,448 | 54.4 | — |
| swegen_selfmade_260415_260505 | OpenHands SDK | glm5 | jierun | 3,954 | 901 MB | 108.8 | 43,867 | 54.2 | — |
| swerebench_oraclesolved | Claude Code | glm5 | jierun | 5,013 | 779 MB | 44.7 | 24,123 | 23.3 | — |
| swerebench_oraclesolved | OpenCode | glm5 | jierun | 3,521 | 464 MB | 51.8 | 22,447 | 27.0 | — |
| swerebench_oraclesolved | OpenHands SDK | glm5 | jierun | 2,733 | 507 MB | 95.8 | 35,567 | 47.5 | — |
| swerebench_oraclesolved | Terminus-2 | glm5 | jierun | 2,998 | 296 MB | 55.5 | 25,257 | 37.0 | — |
| swerebench_others | Claude Code | glm5 | jierun | 2,885 | 457 MB | 43.8 | 25,194 | 23.1 | 0.7178 |
| swerebench_others | OpenCode | glm5 | jierun | 2,637 | 360 MB | 49.1 | 23,637 | 25.8 | 0.7210 |
| swerebench_others | OpenHands SDK | glm5 | jierun | 1,843 | 340 MB | 90.2 | 35,602 | 44.8 | 0.7868 |
| swerebench_others | Terminus-2 | glm5 | jierun | 2,270 | 225 MB | 50.6 | 24,990 | 35.1 | 0.7844 |
| swerebenchv2_python_oraclesolved | Claude Code | glm5 | jierun | 2,728 | 445 MB | 47.5 | 26,064 | 25.1 | — |
| swerebenchv2_python_oraclesolved | OpenCode | glm5 | jierun | 1,654 | 240 MB | 60.7 | 23,898 | 31.3 | — |
| swerebenchv2_python_oraclesolved | OpenHands SDK | glm5 | jierun | 1,521 | 313 MB | 103.0 | 40,232 | 51.0 | — |
| swerebenchv2_python_oraclesolved | Terminus-2 | glm5 | jierun | 1,633 | 174 MB | 55.7 | 27,013 | 37.6 | — |
| v2nopy_full | Claude Code | glm5 | jierun | 5,128 | 818 MB | 42.7 | 26,941 | 22.8 | 0.7281 |
| v2nopy_full | OpenCode | glm5 | jierun | 3,465 | 455 MB | 47.0 | 23,172 | 24.8 | 0.7353 |
| v2nopy_full | OpenHands SDK | glm5 | jierun | 3,266 | 620 MB | 92.7 | 38,252 | 46.0 | 0.7580 |
| v2nopy_full | Terminus-2 | glm5 | jierun | 3,675 | 411 MB | 64.1 | 28,365 | 45.6 | 0.7799 |
Quality score statistics
| Dataset | Scaffold | composite | efficiency | style | tool_mastery | completion | precision |
|---|---|---|---|---|---|---|---|
| swegen | Claude Code | 0.7218 | 0.9125 | 0.3414 | 0.8678 | 0.6213 | 0.7722 |
| swegen | OpenCode | 0.7337 | 0.9322 | 0.3272 | 0.8886 | 0.6288 | 0.7923 |
| swegen | OpenHands SDK | 0.7533 | 0.8975 | 0.3500 | 0.8537 | 0.7737 | 0.7632 |
| swegen | Terminus-2 | 0.7048 | 0.9241 | 0.4228 | 0.8983 | 0.3960 | 0.8866 |
| swerebench_others | Claude Code | 0.7178 | 0.8864 | 0.3197 | 0.7942 | 0.6405 | 0.8924 |
| swerebench_others | OpenCode | 0.7210 | 0.8952 | 0.3171 | 0.8093 | 0.6509 | 0.8621 |
| swerebench_others | OpenHands SDK | 0.7868 | 0.8886 | 0.3693 | 0.7698 | 0.9333 | 0.8525 |
| swerebench_others | Terminus-2 | 0.7844 | 0.8539 | 0.4292 | 0.8172 | 0.8203 | 0.9323 |
| v2nopy_full | Claude Code | 0.7281 | 0.9186 | 0.3213 | 0.8886 | 0.5565 | 0.8996 |
| v2nopy_full | OpenCode | 0.7353 | 0.9278 | 0.3254 | 0.9008 | 0.5766 | 0.8772 |
| v2nopy_full | OpenHands SDK | 0.7580 | 0.9002 | 0.3555 | 0.8440 | 0.7523 | 0.8370 |
| v2nopy_full | Terminus-2 | 0.7799 | 0.9222 | 0.4310 | 0.9016 | 0.6930 | 0.8808 |
Comparison by dataset
| Dataset | Total trajectories | Avg turns | Avg tokens | Avg tool calls | Avg score |
|---|---|---|---|---|---|
| swegen | 4,427 | 77.4 | 36,691 | 43.8 | 0.7271 |
| swegen_selfmade_260301_260414 | 9,360 | 86.8 | 38,044 | 51.8 | — |
| swegen_selfmade_260415_260505 | 3,954 | 108.8 | 43,867 | 54.2 | — |
| swerebench_oraclesolved | 27,477 | 61.8 | 25,724 | 33.3 | — |
| swerebench_others | 9,635 | 55.7 | 26,711 | 30.8 | 0.7475 |
| swerebenchv2_python_oraclesolved | 7,536 | 63.4 | 28,654 | 34.4 | — |
| v2nopy_full | 15,534 | 59.2 | 28,815 | 33.5 | 0.7483 |
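The per-dataset averages appear to be trajectory-count-weighted means of the per-scaffold rows in the trajectory overview, agreeing to within rounding: for swegen, weighting the four rows by 1,096 / 860 / 1,140 / 1,331 trajectories reproduces 77.4 turns and a score of 0.7271. A minimal Python sketch of that aggregation, assuming rows shown as "—" are simply skipped:

```python
from typing import Iterable, Optional


def weighted_mean(rows: Iterable[tuple[int, Optional[float]]]) -> Optional[float]:
    """Average per-scaffold means, weighting each by its trajectory count.

    Rows without a value (shown as "—" in the tables) are skipped;
    returns None when no row has a value.
    """
    total_n = 0
    total = 0.0
    for n, value in rows:
        if value is None:
            continue
        total_n += n
        total += n * value
    return total / total_n if total_n else None
```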
Source data paths
| Dataset | Scaffold | Owner | File path | Size | Records |
|---|---|---|---|---|---|
| swegen | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_cc_2135.jsonl | 446 MB | 2,135 |
| swegen | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_oc_1177.jsonl | 197 MB | 1,177 |
| swegen | OpenHands SDK | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_ohsdk_1140.jsonl | 280 MB | 1,140 |
| swegen | Terminus-2 | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_t2_1331.jsonl | 142 MB | 1,331 |
| swegen_selfmade_260301_260414 | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swegen_selfmade_260301_260414_oh_sdk_4555.jsonl | 993 MB | 4,555 |
| swegen_selfmade_260301_260414 | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swegen_selfmade_260301_260414_t2_4805.jsonl | 659 MB | 4,805 |
| swegen_selfmade_260415_260505 | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swegen_selfmade_260415_260505_oh_sdk_3954.jsonl | 901 MB | 3,954 |
| swerebench_oraclesolved | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_cc_4053.jsonl | 601 MB | 4,053 |
| swerebench_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_cc_5013.jsonl | 779 MB | 5,013 |
| swerebench_oraclesolved | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oc_3431.jsonl | 405 MB | 3,431 |
| swerebench_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oc_3521.jsonl | 464 MB | 3,521 |
| swerebench_oraclesolved | OpenHands-AI | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oh_2808.jsonl | 638 MB | 2,808 |
| swerebench_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oh_sdk_2733.jsonl | 507 MB | 2,733 |
| swerebench_oraclesolved | Terminus-2 | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_t2_2920.jsonl | 291 MB | 2,920 |
| swerebench_oraclesolved | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_t2_2998.jsonl | 296 MB | 2,998 |
| swerebench_others | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_cc_2885.jsonl | 457 MB | 2,885 |
| swerebench_others | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oc_2637.jsonl | 360 MB | 2,637 |
| swerebench_others | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oh_sdk_1843.jsonl | 340 MB | 1,843 |
| swerebench_others | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_t2_2270.jsonl | 225 MB | 2,270 |
| swerebenchv2_python_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_cc_2728.jsonl | 445 MB | 2,728 |
| swerebenchv2_python_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oc_1654.jsonl | 240 MB | 1,654 |
| swerebenchv2_python_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oh_sdk_1521.jsonl | 313 MB | 1,521 |
| swerebenchv2_python_oraclesolved | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_t2_1633.jsonl | 174 MB | 1,633 |
| v2nopy_full | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_cc_5128.jsonl | 818 MB | 5,128 |
| v2nopy_full | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oc_3465.jsonl | 455 MB | 3,465 |
| v2nopy_full | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oh_sdk_3266.jsonl | 620 MB | 3,266 |
| v2nopy_full | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_t2_3675.jsonl | 411 MB | 3,675 |
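Methodology
Avg turns / tokens / tool calls
Avg turns: the mean length of each trajectory's messages array.
Avg tokens: the mean per-trajectory token count, obtained by exactly encoding the content and reasoning_content of every message with the tiktoken cl100k_base tokenizer.
Avg tool calls: the mean, over trajectories, of the summed lengths of the tool_calls arrays in assistant messages. For the Terminus-2 scaffold, the length of the commands array in the assistant message's JSON content is counted instead.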
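A minimal Python sketch of the three per-trajectory metrics, assuming each JSONL record has a `messages` list of dicts with `role`, `content`, `reasoning_content` and `tool_calls` keys. Only string content is counted (multi-part content lists are ignored). The token counter is injected so tiktoken is imported only when `default_token_counter` is called; `disallowed_special=()` stops literal special-token text from raising.

```python
import json
from typing import Callable


def default_token_counter() -> Callable[[str], int]:
    """cl100k_base counter; tiktoken is imported only when this is called."""
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    return lambda text: len(enc.encode(text, disallowed_special=()))


def count_tool_calls(msg: dict, scaffold: str) -> int:
    if msg.get("role") != "assistant":
        return 0
    if scaffold == "Terminus-2":
        # Terminus-2 stores its actions as a JSON object in content.
        try:
            payload = json.loads(msg.get("content") or "")
        except json.JSONDecodeError:
            return 0
        cmds = payload.get("commands") if isinstance(payload, dict) else None
        return len(cmds) if isinstance(cmds, list) else 0
    return len(msg.get("tool_calls") or [])


def trajectory_stats(record: dict, scaffold: str,
                     count_tokens: Callable[[str], int]) -> dict:
    messages = record["messages"]
    tokens = 0
    for m in messages:
        for field in ("content", "reasoning_content"):
            value = m.get(field)
            if isinstance(value, str):
                tokens += count_tokens(value)
    return {
        "turns": len(messages),
        "tokens": tokens,
        "tool_calls": sum(count_tool_calls(m, scaffold) for m in messages),
    }


def file_averages(path: str, scaffold: str,
                  count_tokens: Callable[[str], int]) -> dict:
    """Mean turns / tokens / tool calls over one JSONL trajectory file."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rows.append(trajectory_stats(json.loads(line), scaffold, count_tokens))
    if not rows:
        return {}
    return {k: sum(r[k] for r in rows) / len(rows) for k in ("turns", "tokens", "tool_calls")}
```

Usage would look like `file_averages(path, "Terminus-2", default_token_counter())` for one of the source files listed above.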
Quality score
composite_score (0 to 1) is a weighted combination of five dimensions: efficiency, style, tool_mastery, completion, and precision.
Only some files contain the _score field; files without scores are shown as "—".
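A minimal Python sketch of a weighted composite and of the "—" fallback. The actual weights are not given here, so `weights` is a caller-supplied placeholder. Treating `_score` as a per-record number holding the composite is likewise an assumption about the file format.

```python
from typing import Iterable, Mapping

DIMENSIONS = ("efficiency", "style", "tool_mastery", "completion", "precision")


def composite_score(dims: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted mean of the five 0-1 sub-scores; `weights` are placeholders."""
    total_w = sum(weights[d] for d in DIMENSIONS)
    return sum(weights[d] * dims[d] for d in DIMENSIONS) / total_w


def average_score(records: Iterable[Mapping]) -> str:
    """Hypothetical: mean of per-record _score values, or "—" if none exist."""
    scores = [r["_score"] for r in records if r.get("_score") is not None]
    return f"{sum(scores) / len(scores):.4f}" if scores else "—"
```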