SWE-gen/instance
Updated 2026-07-04 02:38:43 BJT · next 2026-07-04 03:38:44 BJT
Overview

Live Dashboard of PR Collection and SWE Task Generation

Live state of GitHub PR collection and verifiable SWE-Bench task generation across 8 languages, with detailed per-language analysis in the sections below.

Total PRs collected
598,777
24,283 unique repos · 1h +1 / 24h +7
Total valid SWE
51,461
10,021 unique repos · 1h +74 / 24h +567
Overall success rate
8.7%
Valid SWE / processed 588,367
Mean difficulty_score
5.89
median 5.9, count 51,411
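The overall success rate is the ratio of valid SWE tasks to processed PRs. As a quick arithmetic check against the figures above, a minimal sketch (the function name `success_rate` is illustrative and not part of swegen):

```python
def success_rate(valid: int, processed: int) -> float:
    """Return valid / processed as a percentage; 0.0 when nothing was processed."""
    if processed == 0:
        return 0.0
    return 100.0 * valid / processed


# Overall figures taken from the summary cards.
overall = success_rate(51_461, 588_367)
print(f"{overall:.1f}%")  # 8.7%
```

The same ratio applied per language reproduces the Success rate column, for example 9,934 / 32,981 for C.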

Language Progress

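Language | PRs collected | PRs last 1h | PRs last 24h | Valid SWE | Valid SWE last 1h | Valid SWE last 24h | Processed | Success rate
C (c) | 32,981 | 0 | 0 | 9,934 | +39 | +218 | 32,981 | 30.1%
C++ (cpp) | 49,163 | 0 | 0 | 4,062 | 0 | 0 | 48,562 | 8.4%
Go (go) | 133,025 | +1 | +3 | 8,386 | 0 | +12 | 123,518 | 6.8%
Java (java) | 90,836 | 0 | +1 | 4,253 | +3 | +97 | 90,730 | 4.7%
JavaScript (js) | 40,066 | 0 | +1 | 7,257 | 0 | 0 | 40,065 | 18.1%
Python (py) | 108,345 | 0 | 0 | 5,321 | +28 | +202 | 108,256 | 4.9%
Rust (rust) | 72,577 | 0 | 0 | 5,595 | 0 | 0 | 72,539 | 7.7%
TypeScript (ts) | 71,784 | 0 | +2 | 6,653 | +4 | +38 | 71,716 | 9.3%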

Run Parameters

Language | Eval model (OPENAI) | Completion model (ANTHROPIC) | Concurrency | min_source_files | max_source_files
C | glm-5 | claude-opus-4-7 | 12 | 2 | 15
C++ | glm-5 | claude-opus-4-7 | 8 | 2 | 15
Go | glm-5 | claude-opus-4-7 | 12 | 2 | 10
Java | glm-5 | claude-opus-4-7 | 8 | 2 | 10
JavaScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10
Python | glm-5 | claude-opus-4-7 | 12 | 3 | 15
Rust | glm-5 | claude-opus-4-7 | 8 | 2 | 10
TypeScript | glm-5 | claude-opus-4-7 | 12 | 2 | 10

Failure Reason Breakdown

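Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout | workflow_error | Other
C | 32,981 | 9,934 | 23,047 | 19,438 | 1,134 | 2,694 | 8312 (timeout, workflow_error, and Other digits run together)
C++ | 48,562 | 4,062 | 44,500 | 8,767 | 6,809 | 30,083 | 159 | 662 | 266
Go | 123,518 | 8,386 | 115,132 | 38,741 | 24,048 | 49,742 | 1,461 | 1,033 | 108
Java | 90,730 | 4,253 | 86,477 | 26,947 | 11,002 | 41,286 | 1,069 | 5,009 | 1,599
JavaScript | 40,065 | 7,257 | 32,808 | 20,684 | 9313,3891150 (validation through Other digits run together)
Python | 108,256 | 5,321 | 102,935 | 44,550 | 15,513 | 43,324 | 828 | 339 | 120
Rust | 72,539 | 5,595 | 66,944 | 30,149 | 10,205 | 24,100 | 1,077 | 846 | 1,233
TypeScript | 71,716 | 6,653 | 65,063 | 22,086 | 13,938 | 26,845 | 1,768 | 7188 (workflow_error and Other digits run together)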

trivial_pr: the PR was judged by the LLM as too trivial (e.g. only config, docs, or dependency-version changes) and unsuitable as a SWE task.

validation: validation failed after task generation (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1).

infra_error: infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).

timeout: processing timed out (per-PR total timeout or Claude Code session timeout).

workflow_error: workflow error (PR metadata fetch failure, worktree creation failure, patch generation failure, etc.).
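The validation rule above can be read as a two-sided check: the task must fail without the fix and pass with it. A minimal sketch of that predicate follows; the function name and the returned reason strings are hypothetical and are not taken from the swegen code base.

```python
from typing import Optional


def validation_failure(nop_reward: float, oracle_reward: float) -> Optional[str]:
    """Return None when both checks pass, otherwise a short failure reason.

    Assumption: rewards are numeric, with 0 meaning the tests failed and
    1 meaning they passed, as the definition of `validation` suggests.
    """
    if nop_reward != 0:
        return "NOP agent did not return reward=0"
    if oracle_reward != 1:
        return "ORACLE agent did not return reward=1"
    return None


print(validation_failure(0, 1))  # None
print(validation_failure(1, 1))  # NOP agent did not return reward=0
```

Checking the NOP side first is an arbitrary choice in this sketch; a task whose tests already pass without any fix cannot distinguish a correct solution from no solution at all.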

fix.patch Complexity

Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files
C | 9,934 | 333.19 | 17.83 | 5.82
C++ | 4,062 | 285.93 | 13.74 | 5.09
Go | 8,386 | 212.01 | 12.56 | 4.34
Java | 4,253 | 166.33 | 10.66 | 4.26
JavaScript | 7,257 | 77.11 | 6.31 | 2.80
Python | 5,321 | 152.95 | 11.10 | 3.88
Rust | 5,595 | 226.35 | 13.19 | 4.11
TypeScript | 6,653 | 155.52 | 9.53 | 4.11

Metric Definitions

Difficulty score (difficulty_score)

The score is computed statically, with zero API calls, by src/swegen/scoring.py, which reads each valid task directory's solution/fix.patch, tests/, and instruction.md.

The current formula uses continuous log-scale scoring so that mid-sized patches are not rated hard too early. Weights: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.

Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.
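To make the weights and label thresholds concrete, here is a minimal sketch of the final combination step. It assumes each component has already been reduced to a score on the same 0 to 10 scale as difficulty_score; how src/swegen/scoring.py derives the per-component values (including the log-scale mapping) is not shown here, and the function names are illustrative.

```python
from typing import Mapping

# Component weights as listed above; they sum to 1.0.
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def difficulty_score(components: Mapping[str, float]) -> float:
    """Weighted sum of component scores (assumed to be on a 0-10 scale)."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


def difficulty_label(score: float) -> str:
    """Map a score to a label: easy <= 4.0, medium <= 7.0, hard > 7.0."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"


# Hypothetical component scores for one task.
example = {
    "patch_scope": 8.0,
    "logic_complexity": 7.0,
    "context_breadth": 6.0,
    "test_complexity": 5.0,
    "instruction_complexity": 4.0,
}
score = difficulty_score(example)
print(round(score, 2), difficulty_label(score))  # 6.88 medium
```

Because patch_scope and logic_complexity carry 70% of the weight between them, a task with a large, intricate patch lands in the upper range even when its tests and instructions are simple.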

Tag generation and display

Tags are not computed live by the dashboard; they are generated by the LLM from PR information when swegen builds the task, and written to [metadata].tags in task.toml.

The prompt asks for tags in four parts: programming language, project layer/domain, framework/library or specific topic, and a domain-independent bug class (e.g. missing-fallback, incomplete-validation). The dashboard reads existing task.toml files, counts each language's tag occurrences and share, and treats the 4th tag as the bug class for the Bug-Class panels below.

fix.patch statistics

Patch stats come from each valid task's solution/fix.patch, filtering code files by language extension, consistent with the code-only stats in upload_march_swe_to_hf.py.

Avg fix.patch lines counts added/removed lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files involved.
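The three patch statistics can be illustrated with a small parser over a git-style unified diff. This is a sketch under stated assumptions: the patch contains `diff --git` headers, code files are selected by file extension, and the function name and extension list are hypothetical. It is not the logic of upload_march_swe_to_hf.py.

```python
from typing import Dict, Sequence


def patch_stats(patch_text: str, code_extensions: Sequence[str]) -> Dict[str, int]:
    """Count code files, @@ hunks, and added/removed lines in a git diff."""
    files = hunks = changed = 0
    in_code_file = False
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # The new-side path follows the last " b/" in the header.
            path = line.rsplit(" b/", 1)[-1]
            in_code_file = path.endswith(tuple(code_extensions))
            in_hunk = False
            if in_code_file:
                files += 1
        elif not in_code_file:
            continue
        elif line.startswith("@@"):
            hunks += 1
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            # Only lines inside a hunk count, so the ---/+++ file headers
            # that precede the first hunk are skipped.
            changed += 1
    return {"files": files, "hunks": hunks, "lines": changed}


SAMPLE_PATCH = "\n".join([
    "diff --git a/src/app.py b/src/app.py",
    "--- a/src/app.py",
    "+++ b/src/app.py",
    "@@ -1,2 +1,3 @@",
    " import os",
    "-x = 1",
    "+x = 2",
    "+y = 3",
    "@@ -10 +11 @@",
    "-old()",
    "+new()",
    "diff --git a/README.md b/README.md",
    "--- a/README.md",
    "+++ b/README.md",
    "@@ -1 +1 @@",
    "-Old",
    "+New",
])
print(patch_stats(SAMPLE_PATCH, [".py"]))  # {'files': 1, 'hunks': 2, 'lines': 5}
```

The README.md change is ignored because its extension is not in the code-file list, which is the effect of the code-only filtering described above.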

difficulty_label Distribution

Language | easy | medium | hard
C | 879 | 6,559 | 2,488
C++ | 433 | 2,524 | 1,100
Go | 689 | 6,041 | 1,650
Java | 468 | 2,748 | 1,033
JavaScript | 1,096 | 5,376 | 784
Python | 276 | 3,406 | 1,615
Rust | 391 | 3,312 | 1,890
TypeScript | 613 | 4,906 | 1,134

difficulty_score Overview

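Language | count | min | p25 | median | mean | p75 | max
C | 9,926 | 2.4 | 4.9 | 6.0 | 5.98 | 7.1 | 9.2
C++ | 4,057 | 2.5 | 4.9 | 6.0 | 5.99 | 7.2 | 9.1
Go | 8,380 | 2.6 | 4.9 | 5.8 | 5.85 | 6.8 | 9.1
Java | 4,249 | 2.8 | 4.8 | 5.9 | 5.90 | 7.0 | 9.2
JavaScript | 7,256 | 2.6 | 4.4 | 5.2 | 5.36 | 6.2 | 9.2
Python | 5,297 | 2.6 | 5.2 | 6.2 | 6.24 | 7.3 | 9.1
Rust | 5,593 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0
TypeScript | 6,653 | 2.7 | 4.7 | 5.6 | 5.72 | 6.6 | 9.2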

Global Top Tags

library 23,915 (46.5%)
backend 15,930 (31.0%)
cli 6,922 (13.5%)
missing-feature 3,610 (7.0%)
frontend 3,159 (6.1%)
testing 2,225 (4.3%)
http 1,593 (3.1%)
react 1,429 (2.8%)
incomplete-validation 1,350 (2.6%)
framework 1,247 (2.4%)
missing-implementation 1,194 (2.3%)
missing-metadata-propagation 943 (1.8%)
missing-fallback 861 (1.7%)
kubernetes 735 (1.4%)
async 682 (1.3%)
wrong-default 665 (1.3%)
embedded 641 (1.2%)
networking 620 (1.2%)
cpp 600 (1.2%)
missing-validation 567 (1.1%)
missing-functionality 451 (0.9%)
missing-configuration 424 (0.8%)
typescript 399 (0.8%)
type-handling-inconsistency 382 (0.7%)
parsing 361 (0.7%)
eslint 345 (0.7%)
graphql 340 (0.7%)
race-condition 334 (0.6%)
missing-configuration-option 305 (0.6%)
postgresql 300 (0.6%)

Per-Language Tag Distribution

C (c)

library 5,226 (52.6%)
backend 2,564 (25.8%)
cli 1,143 (11.5%)
missing-feature 671 (6.8%)
embedded 615 (6.2%)
cpp 597 (6.0%)
testing 417 (4.2%)
networking 402 (4.0%)
framework 294 (3.0%)
missing-implementation 283 (2.8%)
incomplete-validation 275 (2.8%)
http 220 (2.2%)
ruby 214 (2.2%)
postgresql 213 (2.1%)
firmware 184 (1.9%)
kernel 179 (1.8%)
quic 169 (1.7%)
missing-metadata-propagation 151 (1.5%)
rust 141 (1.4%)
missing-functionality 134 (1.3%)

C++ (cpp)

library 2,793 (68.8%)
backend 741 (18.3%)
testing 496 (12.2%)
cli 359 (8.8%)
missing-feature 268 (6.6%)
framework 186 (4.6%)
missing-implementation 144 (3.5%)
http 129 (3.2%)
incomplete-validation 119 (2.9%)
boost 114 (2.8%)
async 92 (2.3%)
parsing 76 (1.9%)
qt 65 (1.6%)
serialization 55 (1.4%)
compiler 54 (1.3%)
geometry 51 (1.3%)
missing-fallback 50 (1.2%)
networking 49 (1.2%)
missing-metadata-propagation 48 (1.2%)
formatting 45 (1.1%)

Go (go)

backend 4,423 (52.8%)
library 2,087 (24.9%)
cli 2,027 (24.2%)
missing-feature 739 (8.8%)
kubernetes 665 (7.9%)
http 536 (6.4%)
incomplete-validation 251 (3.0%)
testing 236 (2.8%)
missing-metadata-propagation 207 (2.5%)
missing-fallback 179 (2.1%)
missing-implementation 179 (2.1%)
docker 134 (1.6%)
wrong-default 134 (1.6%)
terraform 125 (1.5%)
grpc 122 (1.5%)
aws 121 (1.4%)
missing-validation 120 (1.4%)
prometheus 115 (1.4%)
networking 97 (1.2%)
database 90 (1.1%)

Java (java)

backend 2,066 (48.6%)
library 1,923 (45.2%)
missing-feature 237 (5.6%)
testing 222 (5.2%)
spring 162 (3.8%)
framework 155 (3.6%)
aem 147 (3.5%)
http 146 (3.4%)
android 127 (3.0%)
incomplete-validation 100 (2.4%)
missing-metadata-propagation 95 (2.2%)
missing-implementation 83 (2.0%)
missing-configuration 72 (1.7%)
cli 70 (1.6%)
json 68 (1.6%)
missing-null-check 64 (1.5%)
wrong-default 63 (1.5%)
maven 61 (1.4%)
sling 57 (1.3%)
kafka 56 (1.3%)

JavaScript (js)

library 3,913 (53.9%)
backend 1,239 (17.1%)
frontend 1,043 (14.4%)
cli 859 (11.8%)
missing-feature 496 (6.8%)
react 354 (4.9%)
typescript 334 (4.6%)
testing 297 (4.1%)
eslint 263 (3.6%)
incomplete-validation 221 (3.0%)
framework 200 (2.8%)
http 183 (2.5%)
fastify 149 (2.1%)
webpack 148 (2.0%)
missing-fallback 124 (1.7%)
missing-metadata-propagation 113 (1.6%)
svelte 106 (1.5%)
mongoose 104 (1.4%)
nodejs 104 (1.4%)
missing-implementation 100 (1.4%)

Python (py)

library 2,266 (42.7%)
backend 2,191 (41.3%)
cli 724 (13.7%)
missing-feature 444 (8.4%)
fastapi 215 (4.1%)
django 150 (2.8%)
missing-implementation 146 (2.8%)
testing 116 (2.2%)
missing-fallback 114 (2.2%)
pytorch 114 (2.2%)
incomplete-validation 112 (2.1%)
missing-metadata-propagation 110 (2.1%)
framework 102 (1.9%)
async 101 (1.9%)
ansible 100 (1.9%)
http 95 (1.8%)
aws 76 (1.4%)
aiohttp 66 (1.2%)
pydantic 64 (1.2%)
missing-parameter 63 (1.2%)

Rust (rust)

library 3,132 (56.0%)
backend 1,224 (21.9%)
cli 1,138 (20.3%)
missing-feature 411 (7.3%)
testing 315 (5.6%)
async 225 (4.0%)
http 211 (3.8%)
missing-implementation 162 (2.9%)
incomplete-validation 133 (2.4%)
compiler 120 (2.1%)
git 119 (2.1%)
missing-metadata-propagation 97 (1.7%)
macros 93 (1.7%)
parsing 92 (1.6%)
graphql 81 (1.4%)
blockchain 80 (1.4%)
substrate 77 (1.4%)
serde 73 (1.3%)
missing-fallback 71 (1.3%)
sql 65 (1.2%)

TypeScript (ts)

library 2,575 (38.7%)
frontend 1,833 (27.6%)
backend 1,482 (22.3%)
react 1,066 (16.0%)
cli 602 (9.0%)
missing-feature 344 (5.2%)
angular 203 (3.1%)
graphql 164 (2.5%)
framework 158 (2.4%)
missing-fallback 147 (2.2%)
electron 141 (2.1%)
javascript 141 (2.1%)
incomplete-validation 139 (2.1%)
fullstack 131 (2.0%)
testing 126 (1.9%)
missing-metadata-propagation 122 (1.8%)
wrong-default 120 (1.8%)
vue 105 (1.6%)
missing-implementation 97 (1.5%)
nextjs 86 (1.3%)

Global Top Bug Classes

Bug class is the 4th tag in task.toml -> [metadata].tags: a domain-independent label describing the defect mechanism (e.g. missing-fallback, incomplete-validation, off-by-one-error). Generated by the LLM during swegen create and backfilled into legacy 3-tag tasks via swegen backfill-tags.

missing-feature 3,610 (7.0%)
incomplete-validation 1,350 (2.6%)
missing-implementation 1,194 (2.3%)
missing-metadata-propagation 943 (1.8%)
missing-fallback 861 (1.7%)
wrong-default 665 (1.3%)
missing-validation 567 (1.1%)
missing-functionality 451 (0.9%)
missing-configuration 424 (0.8%)
type-handling-inconsistency 382 (0.7%)
race-condition 334 (0.6%)
missing-configuration-option 305 (0.6%)
missing-method 285 (0.6%)
missing-api 238 (0.5%)
missing-null-check 227 (0.4%)
missing-type-support 208 (0.4%)
missing-parameter 183 (0.4%)
missing-error-handling 180 (0.4%)
missing-initialization 176 (0.3%)
missing-cleanup 168 (0.3%)
missing-field 137 (0.3%)
missing-error-propagation 132 (0.3%)
missing-context-propagation 126 (0.2%)
missing-bounds-check 125 (0.2%)
missing-format-support 111 (0.2%)
missing-function 108 (0.2%)
missing-input-validation 108 (0.2%)
off-by-one 107 (0.2%)
missing-cli-option 107 (0.2%)
missing-configuration-propagation 106 (0.2%)

Per-Language Bug-Class Distribution

Top bug classes per mainstream language. Counts are over tasks whose task.toml already carries a 4-tag entry; tasks still on the legacy 3-tag schema do not contribute until the backfill catches up.
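The effect of the legacy 3-tag schema on these counts can be sketched as follows: only tasks with at least four tags contribute a bug class, and shares are taken over the tagged tasks rather than over all valid tasks, which is consistent with the per-language figures shown (for C, 671 of 9,934 tagged tasks is 6.8%). The function names and the sample tags are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def bug_class(tags: List[str]) -> Optional[str]:
    """Return the 4th tag, or None for a legacy task with fewer than 4 tags."""
    return tags[3] if len(tags) >= 4 else None


def bug_class_distribution(
    task_tags: Iterable[List[str]],
) -> Tuple[int, List[Tuple[str, int, float]]]:
    """Return (tagged_task_count, [(bug_class, count, percent_of_tagged)])."""
    classes = [bc for bc in map(bug_class, task_tags) if bc is not None]
    tagged = len(classes)
    counts = Counter(classes)
    rows = [(name, n, 100.0 * n / tagged) for name, n in counts.most_common()]
    return tagged, rows


# Hypothetical tasks; the second one is still on the legacy 3-tag schema.
sample_tasks = [
    ["c", "library", "quic", "missing-fallback"],
    ["c", "cli", "parsing"],
    ["c", "backend", "http", "missing-fallback"],
    ["c", "library", "kernel", "race-condition"],
]
tagged, rows = bug_class_distribution(sample_tasks)
print(tagged)  # 3
print(rows[0][:2])  # ('missing-fallback', 2)
```

Once a legacy task is backfilled with a 4th tag it enters both the numerator and the denominator, so the per-class percentages can shift slightly as the backfill progresses.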

C (c), 9,934 tagged

missing-feature 671 (6.8%)
missing-implementation 283 (2.8%)
incomplete-validation 275 (2.8%)
missing-metadata-propagation 151 (1.5%)
missing-functionality 134 (1.3%)
missing-fallback 128 (1.3%)
missing-api 112 (1.1%)
missing-validation 111 (1.1%)
wrong-default 97 (1.0%)
missing-initialization 80 (0.8%)
race-condition 75 (0.8%)
missing-configuration 70 (0.7%)
missing-bounds-check 55 (0.6%)
memory-leak 52 (0.5%)
missing-null-check 52 (0.5%)

C++ (cpp), 4,060 tagged

missing-feature 268 (6.6%)
missing-implementation 144 (3.5%)
incomplete-validation 119 (2.9%)
missing-fallback 50 (1.2%)
missing-metadata-propagation 48 (1.2%)
type-handling-inconsistency 40 (1.0%)
missing-functionality 38 (0.9%)
missing-api 37 (0.9%)
wrong-default 36 (0.9%)
missing-validation 35 (0.9%)
missing-type-support 29 (0.7%)
race-condition 25 (0.6%)
missing-configuration 20 (0.5%)
missing-method 17 (0.4%)
missing-bounds-check 16 (0.4%)

Go (go), 8,380 tagged

missing-feature 739 (8.8%)
incomplete-validation 251 (3.0%)
missing-metadata-propagation 207 (2.5%)
missing-fallback 179 (2.1%)
missing-implementation 179 (2.1%)
wrong-default 134 (1.6%)
missing-validation 120 (1.4%)
race-condition 85 (1.0%)
missing-configuration 83 (1.0%)
missing-functionality 79 (0.9%)
missing-configuration-option 69 (0.8%)
missing-field 61 (0.7%)
type-handling-inconsistency 61 (0.7%)
missing-nil-check 56 (0.7%)
missing-method 43 (0.5%)

Java (java), 4,250 tagged

missing-feature 237 (5.6%)
incomplete-validation 100 (2.4%)
missing-metadata-propagation 95 (2.2%)
missing-implementation 83 (2.0%)
missing-configuration 72 (1.7%)
missing-null-check 64 (1.5%)
wrong-default 63 (1.5%)
missing-fallback 48 (1.1%)
type-handling-inconsistency 44 (1.0%)
missing-validation 43 (1.0%)
race-condition 39 (0.9%)
missing-method 33 (0.8%)
missing-configuration-option 24 (0.6%)
missing-functionality 21 (0.5%)
missing-type-support 19 (0.4%)

JavaScript (js), 7,257 tagged

missing-feature 496 (6.8%)
incomplete-validation 221 (3.0%)
missing-fallback 124 (1.7%)
missing-metadata-propagation 113 (1.6%)
missing-implementation 100 (1.4%)
wrong-default 93 (1.3%)
missing-validation 82 (1.1%)
missing-configuration-option 71 (1.0%)
missing-method 63 (0.9%)
type-handling-inconsistency 58 (0.8%)
missing-functionality 52 (0.7%)
missing-configuration 47 (0.6%)
missing-null-check 46 (0.6%)
missing-error-handling 40 (0.6%)
missing-option 40 (0.6%)

Python (py), 5,301 tagged

missing-feature 444 (8.4%)
missing-implementation 146 (2.8%)
missing-fallback 114 (2.2%)
incomplete-validation 112 (2.1%)
missing-metadata-propagation 110 (2.1%)
missing-parameter 63 (1.2%)
wrong-default 62 (1.2%)
missing-validation 58 (1.1%)
type-handling-inconsistency 54 (1.0%)
missing-functionality 50 (0.9%)
missing-configuration 45 (0.8%)
missing-error-handling 33 (0.6%)
missing-method 29 (0.5%)
missing-configuration-option 25 (0.5%)
missing-type-support 24 (0.5%)

Rust (rust), 5,593 tagged

missing-feature 411 (7.3%)
missing-implementation 162 (2.9%)
incomplete-validation 133 (2.4%)
missing-metadata-propagation 97 (1.7%)
missing-fallback 71 (1.3%)
wrong-default 60 (1.1%)
missing-validation 59 (1.1%)
missing-functionality 52 (0.9%)
missing-method 49 (0.9%)
missing-type-support 49 (0.9%)
missing-configuration 38 (0.7%)
missing-api 37 (0.7%)
missing-syntax-support 31 (0.6%)
type-handling-inconsistency 26 (0.5%)
missing-cli-option 24 (0.4%)

TypeScript (ts), 6,653 tagged

missing-feature 344 (5.2%)
missing-fallback 147 (2.2%)
incomplete-validation 139 (2.1%)
missing-metadata-propagation 122 (1.8%)
wrong-default 120 (1.8%)
missing-implementation 97 (1.5%)
missing-validation 59 (0.9%)
type-handling-inconsistency 58 (0.9%)
missing-configuration 49 (0.7%)
race-condition 49 (0.7%)
missing-configuration-option 42 (0.6%)
missing-method 39 (0.6%)
missing-null-check 27 (0.4%)
missing-prop 27 (0.4%)
missing-functionality 25 (0.4%)