Language Progress
| Language | Collected PRs | Collected +1h | Collected +24h | Valid SWE | Valid +1h | Valid +24h | Processed | Success Rate |
|---|---|---|---|---|---|---|---|---|
| C | 28,232 | 0 | +30 | 7,734 | 0 | 0 | 26,979 | |
| C++ | 43,116 | 0 | +50 | 2,017 | +20 | +144 | 19,637 | |
| Go | 88,516 | 0 | +908 | 4,483 | 0 | +65 | 26,714 | |
| Java | 59,936 | 0 | +47 | 2,652 | 0 | +47 | 13,757 | |
| JavaScript | 30,941 | 0 | +155 | 3,674 | +6 | +8 | 14,021 | |
| Python | 68,318 | 0 | +235 | 2,518 | 0 | 0 | 19,567 | |
| Rust | 54,661 | 0 | +4,225 | 2,523 | +3 | +51 | 16,486 | |
| TypeScript | 56,189 | 0 | +326 | 3,423 | +2 | +58 | 32,938 | |
Run Parameters
| Language | Eval Model (OPENAI) | Fill Model (ANTHROPIC) | Concurrency | min_source_files | max_source_files |
|---|---|---|---|---|---|
| C | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 15 |
| C++ | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 15 |
| Go | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 10 |
| Java | claude-haiku-4-5-20251001 | claude-sonnet-4-6 | 16 | 2 | 10 |
| JavaScript | glmmoedsa | glmmoedsa | 16 | 2 | 10 |
| Python | MiniMax-M2.7 | MiniMax-M2.7 | 16 | 3 | 15 |
| Rust | gpt-5.4 | claude-opus-4-6 | 16 | 2 | 10 |
| TypeScript | gpt-5.4 | claude-sonnet-4-6 | 16 | 2 | 10 |
Failure Reason Statistics
| Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other |
|---|---|---|---|---|---|---|---|---|---|
| C | 26,979 | 7,734 | 19,245 | 12,238 | 2,237 | 2,815 | 297 | 1,655 | 4 |
| C++ | 19,637 | 2,017 | 17,620 | 3,503 | 396 | 14,009 | 227 | 385 | 267 |
| Go | 26,714 | 4,483 | 22,231 | 9,777 | 2,182 | 6,525 | 1,544 | 944 | 1,196 |
| Java | 13,757 | 2,652 | 11,105 | 4,450 | 1,564 | 3,201 | 283 | 254 | 1,601 |
| JavaScript | 14,021 | 3,674 | 10,347 | 6,574 | 1,749 | 521 | 246 | 432 | 826 |
| Python | 19,567 | 2,518 | 17,049 | 7,515 | 2,256 | 8,117 | 229 | 290 | 123 |
| Rust | 16,486 | 2,523 | 13,963 | 4,209 | 1,114 | 5,504 | 613 | 1,504 | 1,234 |
| TypeScript | 32,938 | 3,423 | 29,515 | 5,683 | 922 | 21,391 | 142 | 1,349 | 28 |
trivial_pr: the LLM judged the PR too trivial (e.g., it only touches configuration, documentation, or dependency versions) to serve as a SWE task.
validation: validation after task generation failed (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1); see the sketch after this list.
infra_error: infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).
timeout: processing timed out (the overall per-PR timeout or the Claude Code session timeout).
workflow_error: workflow error (PR metadata fetch failed, worktree creation failed, patch generation failed, etc.).
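A minimal sketch of the validation predicate described above; the function name and signature are illustrative, not the actual swegen harness:

```python
# Illustrative only: the real swegen validation harness is not shown in this
# report; only the pass condition below is documented.
def is_valid_task(nop_reward: float, oracle_reward: float) -> bool:
    """A generated task is kept only if the NOP agent (no changes applied)
    returns reward=0 and the ORACLE agent (gold fix.patch applied) returns
    reward=1; any other combination is counted as a `validation` failure."""
    return nop_reward == 0 and oracle_reward == 1
```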
fix.patch Complexity
| Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files |
|---|---|---|---|---|
| C | 7,734 | 281.22 | 15.40 | 4.93 |
| C++ | 2,017 | 323.95 | 11.58 | 4.59 |
| Go | 4,483 | 269.81 | 14.86 | 4.95 |
| Java | 2,652 | 176.40 | 11.26 | 4.55 |
| JavaScript | 3,674 | 73.35 | 6.22 | 2.76 |
| Python | 2,518 | 135.78 | 9.99 | 3.44 |
| Rust | 2,523 | 256.21 | 12.84 | 4.06 |
| TypeScript | 3,423 | 158.87 | 9.07 | 4.08 |
Methodology Notes
Difficulty scoring (difficulty_score)
Each valid task directory's solution/fix.patch, tests/, and instruction.md are read and scored statically by src/swegen/scoring.py, with zero API calls.
The current formula is a continuous log-scale score, which keeps mid-sized patches from tipping into hard too early. Weights: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.
Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.
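A minimal sketch of how the documented weights and thresholds could combine, assuming log1p compression and a per-component cap; the raw component extraction and exact scaling live in src/swegen/scoring.py and are assumptions here:

```python
import math

# Only the weights and label cutoffs come from this report; the raw component
# inputs, log1p compression, and CAP normalization are illustrative assumptions.
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}
CAP = 10.0  # assumed per-component ceiling, keeping scores on a 0-10 scale

def difficulty_score(raw: dict[str, float]) -> float:
    """Weighted sum of log-compressed components (roughly 0-10)."""
    score = 0.0
    for name, weight in WEIGHTS.items():
        # log1p compresses large raw values so mid-sized patches do not
        # saturate toward "hard" prematurely
        score += weight * min(math.log1p(raw.get(name, 0.0)), CAP)
    return score

def difficulty_label(score: float) -> str:
    """Thresholds as documented: easy <= 4.0, medium <= 7.0, hard > 7.0."""
    if score <= 4.0:
        return "easy"
    return "medium" if score <= 7.0 else "hard"
```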
Tag Generation and Display
Tags are not computed live by the dashboard: they are generated by an LLM from the PR information when swegen builds the task, and written to [metadata].tags in task.toml.
The prompt asks for tags in three segments: programming language, project layer/domain, and framework/library or concrete topic. The dashboard only reads the existing task.toml files and tallies each language's tag counts and shares, as in the sketch below.
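A minimal sketch of the dashboard-side aggregation. Only the [metadata].tags location is documented; the <tasks_root>/<language>/<task_id>/task.toml directory layout is an assumption:

```python
import tomllib  # stdlib since Python 3.11
from collections import Counter
from pathlib import Path

def tag_distribution(tasks_root: str) -> dict[str, Counter]:
    """Count tag occurrences per language by reading existing task.toml
    files; nothing is computed or generated at display time."""
    per_language: dict[str, Counter] = {}
    # assumed layout: <tasks_root>/<language>/<task_id>/task.toml
    for toml_path in Path(tasks_root).glob("*/*/task.toml"):
        language = toml_path.parts[-3]
        with toml_path.open("rb") as f:
            metadata = tomllib.load(f).get("metadata", {})
        per_language.setdefault(language, Counter()).update(metadata.get("tags", []))
    return per_language
```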
fix.patch Statistics
Patch statistics come from each valid task's solution/fix.patch, filtered to code files by language extension, matching the code-only accounting in upload_march_swe_to_hf.py.
Avg fix.patch lines counts added/removed lines in the code-file diffs; Avg fix.patch hunks counts @@ hunk headers; Avg fix.patch files counts the code files touched. A counting sketch follows.
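A minimal sketch of the per-patch counting, with a deliberately simplified unified-diff parse and a placeholder extension set; the real per-language filter mirrors upload_march_swe_to_hf.py, which is not reproduced here:

```python
from pathlib import Path

# Illustrative filter: the actual per-language extension sets follow
# upload_march_swe_to_hf.py.
CODE_EXTS = {".py"}

def patch_stats(patch_path: str) -> tuple[int, int, int]:
    """Return (changed lines, hunks, files) counted over code files only."""
    lines = hunks = 0
    files: set[str] = set()
    in_code_file = False
    for raw in Path(patch_path).read_text(errors="replace").splitlines():
        if raw.startswith("+++ "):  # target side of a file header
            target = raw[4:].split("\t")[0]
            in_code_file = target != "/dev/null" and Path(target).suffix in CODE_EXTS
            if in_code_file:
                files.add(target)
        elif in_code_file and raw.startswith("@@"):
            hunks += 1  # one "@@ -a,b +c,d @@" header per hunk
        elif in_code_file and raw.startswith(("+", "-")) and not raw.startswith("--- "):
            lines += 1  # an added or removed line inside a counted file
    return lines, hunks, len(files)
```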
difficulty_label Distribution
| Language | easy | medium | hard |
|---|---|---|---|
| C | 735 | 5,189 | 1,802 |
| C++ | 329 | 1,249 | 435 |
| Go | 372 | 3,305 | 800 |
| Java | 351 | 1,686 | 612 |
| JavaScript | 593 | 2,730 | 350 |
| Python | 211 | 1,724 | 560 |
| Rust | 260 | 1,477 | 784 |
| TypeScript | 347 | 2,596 | 480 |
difficulty_score Overview
| Language | count | min | p25 | median | mean | p75 | max |
|---|---|---|---|---|---|---|---|
| C | 7,726 | 2.4 | 4.9 | 5.9 | 5.91 | 7.0 | 9.2 |
| C++ | 2,013 | 2.5 | 4.4 | 5.6 | 5.67 | 6.9 | 9.0 |
| Go | 4,477 | 2.6 | 4.9 | 5.8 | 5.81 | 6.7 | 9.1 |
| Java | 2,649 | 2.8 | 4.7 | 5.9 | 5.81 | 6.9 | 9.2 |
| JavaScript | 3,673 | 2.6 | 4.4 | 5.2 | 5.29 | 6.1 | 9.2 |
| Python | 2,495 | 2.6 | 4.9 | 5.8 | 5.90 | 6.9 | 8.9 |
| Rust | 2,521 | 2.7 | 5.0 | 6.2 | 6.11 | 7.4 | 9.0 |
| TypeScript | 3,423 | 2.7 | 4.6 | 5.5 | 5.59 | 6.5 | 8.9 |
Global Top Tags
Per-language Tag Distribution
Trajectory File Overview
| Dataset | Scaffold | Model | Owner | Trajectories | File Size | Avg Turns | Avg Tokens | Avg Tool Calls | Avg Score |
|---|---|---|---|---|---|---|---|---|---|
| swegen | Claude Code | glm5 | chaofan | 1,096 | 446 MB | 72.7 | 39,648 | 37.9 | 0.7218 |
| swegen | OpenCode | glm5 | chaofan | 860 | 197 MB | 64.8 | 30,577 | 34.0 | 0.7337 |
| swegen | OpenHands SDK | glm5 | chaofan | 1,140 | 280 MB | 119.6 | 49,811 | 59.8 | 0.7533 |
| swegen | Terminus-2 | glm5 | chaofan | 1,331 | 142 MB | 53.2 | 26,968 | 41.2 | 0.7048 |
| swerebench_oraclesolved | Claude Code | glm5 | chaofan | 4,053 | 601 MB | 44.1 | 22,117 | 22.7 | — |
| swerebench_oraclesolved | OpenCode | glm5 | chaofan | 3,431 | 405 MB | 41.3 | 19,704 | 21.5 | — |
| swerebench_oraclesolved | OpenHands-AI | glm5 | chaofan | 2,808 | 638 MB | 135.1 | 36,434 | 67.2 | — |
| swerebench_oraclesolved | Terminus-2 | glm5 | chaofan | 2,920 | 291 MB | 55.7 | 25,473 | 36.9 | — |
| swerebench_oraclesolved | Claude Code | glm5 | jierun | 5,013 | 779 MB | 44.7 | 24,123 | 23.3 | — |
| swerebench_oraclesolved | OpenCode | glm5 | jierun | 3,521 | 464 MB | 51.8 | 22,447 | 27.0 | — |
| swerebench_oraclesolved | OpenHands SDK | glm5 | jierun | 2,733 | 507 MB | 95.8 | 35,567 | 47.5 | — |
| swerebench_oraclesolved | Terminus-2 | glm5 | jierun | 2,998 | 296 MB | 55.5 | 25,257 | 37.0 | — |
| swerebench_others | Claude Code | glm5 | jierun | 2,885 | 457 MB | 43.8 | 25,194 | 23.1 | 0.7178 |
| swerebench_others | OpenCode | glm5 | jierun | 2,637 | 360 MB | 49.1 | 23,637 | 25.8 | 0.7210 |
| swerebench_others | OpenHands SDK | glm5 | jierun | 1,843 | 340 MB | 90.2 | 35,602 | 44.8 | 0.7868 |
| swerebench_others | Terminus-2 | glm5 | jierun | 2,270 | 225 MB | 50.6 | 24,990 | 35.1 | 0.7844 |
| swerebenchv2_python_oraclesolved | Claude Code | glm5 | jierun | 2,728 | 445 MB | 47.5 | 26,064 | 25.1 | — |
| swerebenchv2_python_oraclesolved | OpenCode | glm5 | jierun | 1,654 | 240 MB | 60.7 | 23,898 | 31.3 | — |
| swerebenchv2_python_oraclesolved | OpenHands SDK | glm5 | jierun | 1,521 | 313 MB | 103.0 | 40,232 | 51.0 | — |
| swerebenchv2_python_oraclesolved | Terminus-2 | glm5 | jierun | 1,633 | 174 MB | 55.7 | 27,013 | 37.6 | — |
| v2nopy_full | Claude Code | glm5 | jierun | 5,128 | 818 MB | 42.7 | 26,941 | 22.8 | 0.7281 |
| v2nopy_full | OpenCode | glm5 | jierun | 3,465 | 455 MB | 47.0 | 23,172 | 24.8 | 0.7353 |
| v2nopy_full | OpenHands SDK | glm5 | jierun | 3,266 | 620 MB | 92.7 | 38,252 | 46.0 | 0.7580 |
| v2nopy_full | Terminus-2 | glm5 | jierun | 3,675 | 411 MB | 64.1 | 28,365 | 45.6 | 0.7799 |
Quality Score Statistics
| Dataset | Scaffold | composite | efficiency | style | tool_mastery | completion | precision |
|---|---|---|---|---|---|---|---|
| swegen | Claude Code | 0.7218 | 0.9125 | 0.3414 | 0.8678 | 0.6213 | 0.7722 |
| swegen | OpenCode | 0.7337 | 0.9322 | 0.3272 | 0.8886 | 0.6288 | 0.7923 |
| swegen | OpenHands SDK | 0.7533 | 0.8975 | 0.3500 | 0.8537 | 0.7737 | 0.7632 |
| swegen | Terminus-2 | 0.7048 | 0.9241 | 0.4228 | 0.8983 | 0.3960 | 0.8866 |
| swerebench_others | Claude Code | 0.7178 | 0.8864 | 0.3197 | 0.7942 | 0.6405 | 0.8924 |
| swerebench_others | OpenCode | 0.7210 | 0.8952 | 0.3171 | 0.8093 | 0.6509 | 0.8621 |
| swerebench_others | OpenHands SDK | 0.7868 | 0.8886 | 0.3693 | 0.7698 | 0.9333 | 0.8525 |
| swerebench_others | Terminus-2 | 0.7844 | 0.8539 | 0.4292 | 0.8172 | 0.8203 | 0.9323 |
| v2nopy_full | Claude Code | 0.7281 | 0.9186 | 0.3213 | 0.8886 | 0.5565 | 0.8996 |
| v2nopy_full | OpenCode | 0.7353 | 0.9278 | 0.3254 | 0.9008 | 0.5766 | 0.8772 |
| v2nopy_full | OpenHands SDK | 0.7580 | 0.9002 | 0.3555 | 0.8440 | 0.7523 | 0.8370 |
| v2nopy_full | Terminus-2 | 0.7799 | 0.9222 | 0.4310 | 0.9016 | 0.6930 | 0.8808 |
Comparison by Dataset
| Dataset | Total Trajectories | Avg Turns | Avg Tokens | Avg Tool Calls | Avg Score |
|---|---|---|---|---|---|
| swegen | 4,427 | 77.4 | 36,691 | 43.8 | 0.7271 |
| swerebench_oraclesolved | 27,477 | 61.8 | 25,724 | 33.3 | — |
| swerebench_others | 9,635 | 55.7 | 26,711 | 30.8 | 0.7475 |
| swerebenchv2_python_oraclesolved | 7,536 | 63.4 | 28,654 | 34.4 | — |
| v2nopy_full | 15,534 | 59.2 | 28,815 | 33.5 | 0.7483 |
Source Data Paths
| Dataset | Scaffold | Owner | File Path | Size | Records |
|---|---|---|---|---|---|
| swegen | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_cc_2135.jsonl | 446 MB | 2,135 |
| swegen | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_oc_1177.jsonl | 197 MB | 1,177 |
| swegen | OpenHands SDK | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_ohsdk_1140.jsonl | 280 MB | 1,140 |
| swegen | Terminus-2 | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swegen_t2_1331.jsonl | 142 MB | 1,331 |
| swerebench_oraclesolved | Claude Code | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_cc_4053.jsonl | 601 MB | 4,053 |
| swerebench_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_cc_5013.jsonl | 779 MB | 5,013 |
| swerebench_oraclesolved | OpenCode | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oc_3431.jsonl | 405 MB | 3,431 |
| swerebench_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oc_3521.jsonl | 464 MB | 3,521 |
| swerebench_oraclesolved | OpenHands-AI | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_oh_2808.jsonl | 638 MB | 2,808 |
| swerebench_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_oh_sdk_2733.jsonl | 507 MB | 2,733 |
| swerebench_oraclesolved | Terminus-2 | chaofan | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/chaofan_glm5_swerebench_oraclesolved_t2_2920.jsonl | 291 MB | 2,920 |
| swerebench_oraclesolved | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_oraclesolved_t2_2998.jsonl | 296 MB | 2,998 |
| swerebench_others | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_cc_2885.jsonl | 457 MB | 2,885 |
| swerebench_others | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oc_2637.jsonl | 360 MB | 2,637 |
| swerebench_others | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_oh_sdk_1843.jsonl | 340 MB | 1,843 |
| swerebench_others | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebench_others_t2_2270.jsonl | 225 MB | 2,270 |
| swerebenchv2_python_oraclesolved | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_cc_2728.jsonl | 445 MB | 2,728 |
| swerebenchv2_python_oraclesolved | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oc_1654.jsonl | 240 MB | 1,654 |
| swerebenchv2_python_oraclesolved | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_oh_sdk_1521.jsonl | 313 MB | 1,521 |
| swerebenchv2_python_oraclesolved | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_swerebenchv2_python_oraclesolved_t2_1633.jsonl | 174 MB | 1,633 |
| v2nopy_full | Claude Code | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_cc_5128.jsonl | 818 MB | 5,128 |
| v2nopy_full | OpenCode | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oc_3465.jsonl | 455 MB | 3,465 |
| v2nopy_full | OpenHands SDK | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_oh_sdk_3266.jsonl | 620 MB | 3,266 |
| v2nopy_full | Terminus-2 | jierun | /home/ywxzml3j/ywxzml3juser57/LLaMA-Factory/data/chaofan_jierun_traj/jierun_glm5_v2nopy_full_t2_3675.jsonl | 411 MB | 3,675 |
Methodology Notes
Avg Turns / Avg Tokens / Avg Tool Calls
Avg Turns: the mean length of each trajectory's messages array.
Avg Tokens: the mean exact token count over every message's content + reasoning_content, encoded with the tiktoken cl100k_base tokenizer.
Avg Tool Calls: the mean of the summed tool_calls array lengths across assistant messages. For the Terminus-2 scaffold, the length of the commands array inside the assistant message's JSON content is counted instead. A sketch follows.
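A minimal sketch of the per-record computation over one JSONL line, assuming the OpenAI-style message schema named above; the Terminus-2 commands variant is noted in a comment but not implemented:

```python
import json

import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def trajectory_stats(jsonl_line: str) -> tuple[int, int, int]:
    """Return (turns, tokens, tool_calls) for one trajectory record."""
    messages = json.loads(jsonl_line)["messages"]
    turns = len(messages)
    tokens = sum(
        len(ENC.encode(str(m.get("content") or "") + str(m.get("reasoning_content") or "")))
        for m in messages
    )
    # For Terminus-2 the dashboard instead counts the `commands` array parsed
    # out of the assistant message's JSON content (not shown here).
    tool_calls = sum(
        len(m.get("tool_calls") or [])
        for m in messages
        if m.get("role") == "assistant"
    )
    return turns, tokens, tool_calls
```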
Quality Scoring
composite_score (0-1) is a weighted aggregate of five dimensions: efficiency, style, tool_mastery, completion, and precision.
Only some files carry _score fields; files without scores are shown as "—".
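The per-dimension weights are not stated in this report, so the sketch below uses purely hypothetical equal weights to illustrate the aggregation only:

```python
DIMS = ("efficiency", "style", "tool_mastery", "completion", "precision")

def composite_score(scores: dict[str, float],
                    weights: dict[str, float] | None = None) -> float:
    """Weighted aggregate of the five 0-1 dimensions; the equal default
    weights below are hypothetical, not the dashboard's actual weighting."""
    weights = weights or {d: 1.0 / len(DIMS) for d in DIMS}
    return sum(weights[d] * scores[d] for d in DIMS)
```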