SWE-gen/instance
Updated 2026-06-24 01:14:00 BJT · next update 2026-06-24 02:14:00 BJT
Overview

Live Dashboard of PR Collection and SWE Task Generation

Live state of GitHub PR collection and verifiable SWE-Bench task generation across 8 languages, followed by detailed per-language analysis.

Total PRs collected
589,505
23,810 unique repos · 1h +1 / 24h +20,594
Total valid SWE
49,988
9,927 unique repos · 1h +5 / 24h +97
Overall success rate
10.1%
Valid SWE / processed PRs (496,211 processed)
Mean difficulty_score
5.89
median 5.9, count 49,938

Language Progress

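Language | PRs collected | PRs last 1h | PRs last 24h | Valid SWE | Valid last 1h | Valid last 24h | Processed | Success rate
C (c) | 32,357 | 0 | +2,311 | 9,709 | 0 | +1 | 31,278 | 31.0%
C++ (cpp) | 48,173 | +1 | +1,713 | 4,041 | +1 | +4 | 21,201 | 19.1%
Go (go) | 132,103 | 0 | +4,142 | 8,101 | +2 | +29 | 90,385 | 9.0%
Java (java) | 89,029 | 0 | +1,711 | 4,020 | 0 | +3 | 84,155 | 4.8%
JavaScript (js) | 39,682 | 0 | +615 | 7,139 | 0 | +16 | 37,760 | 18.9%
Python (py) | 105,769 | 0 | +4,644 | 5,058 | 0 | +9 | 102,028 | 5.0%
Rust (rust) | 71,901 | 0 | +2,624 | 5,505 | 0 | +9 | 71,011 | 7.8%
TypeScript (ts) | 70,491 | 0 | +2,834 | 6,415 | +2 | +26 | 58,393 | 11.0%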
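As a quick check on the Success rate column, the sketch below recomputes it as Valid SWE / Processed from the Language Progress numbers. It is illustrative only; the dictionary layout and the function names are hypothetical and not taken from swegen.

```python
# Minimal sketch: Success rate = Valid SWE / Processed.
# Per-language numbers are copied from the Language Progress table.
PROGRESS = {
    # language: (valid_swe, processed)
    "C": (9_709, 31_278),
    "C++": (4_041, 21_201),
    "Go": (8_101, 90_385),
    "Java": (4_020, 84_155),
    "JavaScript": (7_139, 37_760),
    "Python": (5_058, 102_028),
    "Rust": (5_505, 71_011),
    "TypeScript": (6_415, 58_393),
}


def success_rate(valid: int, processed: int) -> float:
    """Return Valid SWE / Processed as a percentage (0.0 if nothing was processed)."""
    return 100.0 * valid / processed if processed else 0.0


def overall_rate(progress: dict[str, tuple[int, int]]) -> float:
    """Pool all languages: total valid tasks over total processed PRs."""
    total_valid = sum(valid for valid, _ in progress.values())
    total_processed = sum(processed for _, processed in progress.values())
    return success_rate(total_valid, total_processed)


if __name__ == "__main__":
    for lang, (valid, processed) in PROGRESS.items():
        print(f"{lang:<11} {success_rate(valid, processed):5.1f}%")
    print(f"{'Overall':<11} {overall_rate(PROGRESS):5.1f}%")
```

The overall 10.1% is a pooled ratio (49,988 / 496,211), not the average of the eight per-language rates, which would come out noticeably higher (about 13.3%).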

Run Parameters

Language | Eval model (OPENAI) | Completion model (ANTHROPIC) | Concurrency | min_source_files | max_source_files
C | gpt-5.4 | claude-sonnet-4-6 | 12 | 2 | 15
C++ | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 8 | 2 | 15
Go | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10
Java | claude-haiku-4-5-20251001 | claude-sonnet-4-6 | 8 | 2 | 10
JavaScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10
Python | glm-5 | claude-sonnet-4-6 | 12 | 3 | 15
Rust | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 8 | 2 | 10
TypeScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10

Failure Reason Breakdown

LanguageProcessedValid SWEFailedtrivial_prvalidationinfra_errortimeoutworkflow_errorOther
C31,2789,70921,56915,3821,6494,74537441
C++21,2014,04117,1602,79474415,105159337266
Go90,3858,10182,28422,9908,80047,6631,526824466
Java84,1554,02080,13520,5216,64544,9701,2985,4971,602
JavaScript37,7607,13930,62115,6921,49714,1015291840
Python102,0285,05896,97030,1637,97159,071985378120
Rust71,0115,50565,50620,5715,73936,4511,2419281,234
TypeScript58,3936,41551,97814,7764,64230,3291,76078711

trivial_pr: the PR was judged by the LLM as too trivial (e.g. only config, docs, or dependency-version changes) and unsuitable as a SWE task.

validation: validation failed after task generation (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1).

infra_error: infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).

timeout: processing timed out (per-PR total timeout or Claude Code session timeout).

workflow_error: workflow error (PR metadata fetch failure, worktree creation failure, patch generation failure, etc.).
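The validation rule reduces to a two-condition check. This is a minimal sketch that assumes rewards arrive as plain numbers and, as the agent names suggest, that NOP leaves the repository unchanged while ORACLE applies the reference fix; the function names are hypothetical.

```python
from typing import Optional


def passes_validation(nop_reward: float, oracle_reward: float) -> bool:
    """A generated task is kept only if NOP scores 0 and ORACLE scores 1.

    Assumption: NOP makes no changes, so the tests must fail (reward 0);
    ORACLE applies the reference solution, so the tests must pass (reward 1).
    """
    return nop_reward == 0 and oracle_reward == 1


def classify_validation(nop_reward: float, oracle_reward: float) -> Optional[str]:
    """Return None for a valid task, else the 'validation' failure reason."""
    return None if passes_validation(nop_reward, oracle_reward) else "validation"
```

A NOP reward of 1 means the tests already pass without the fix, so they cannot detect the bug; an ORACLE reward of 0 means the reference fix does not make them pass. Either way the task is rejected.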

fix.patch Complexity

Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files
C | 9,709 | 336.57 | 17.94 | 5.85
C++ | 4,041 | 286.77 | 13.76 | 5.10
Go | 8,101 | 213.15 | 12.66 | 4.36
Java | 4,020 | 163.30 | 10.50 | 4.23
JavaScript | 7,139 | 77.09 | 6.28 | 2.79
Python | 5,058 | 151.42 | 10.98 | 3.83
Rust | 5,505 | 225.88 | 13.17 | 4.10
TypeScript | 6,415 | 158.33 | 9.61 | 4.13

Metric Definitions


Difficulty score (difficulty_score)

For each valid task directory, src/swegen/scoring.py reads solution/fix.patch, tests/, and instruction.md and computes the score statically, with zero API calls.

The current formula uses log-scale continuous scoring so that mid-sized patches are not pushed into the hard band too early. Weights: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.
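To make the weighting concrete, here is a minimal sketch of a weighted log-scale score. Only the five weights come from the text; the log1p mapping, the hypothetical saturation parameter, and the assumption that each component is first scaled to 0..10 are illustrative guesses, not the actual contents of src/swegen/scoring.py.

```python
import math

# Component weights as listed above (they sum to 1.0).
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def log_scale(value: float, saturation: float) -> float:
    """Map a non-negative raw measure onto 0..10 continuously on a log scale.

    The result grows with log1p(value), so there are no step thresholds, and
    it is capped at 10 once value reaches `saturation` (a hypothetical knob).
    """
    if saturation <= 0:
        raise ValueError("saturation must be positive")
    if value <= 0:
        return 0.0
    return min(10.0, 10.0 * math.log1p(value) / math.log1p(saturation))


def difficulty_score(components: dict[str, float]) -> float:
    """Weighted sum of sub-scores that are each already on a 0..10 scale."""
    missing = WEIGHTS.keys() - components.keys()
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return round(sum(w * components[name] for name, w in WEIGHTS.items()), 2)


if __name__ == "__main__":
    # Hypothetical raw measures for one task; the saturation points are made up.
    components = {
        "patch_scope": log_scale(240, saturation=2000),             # changed lines
        "logic_complexity": log_scale(12, saturation=100),          # branches touched
        "context_breadth": log_scale(5, saturation=40),             # files touched
        "test_complexity": log_scale(3, saturation=30),             # test functions
        "instruction_complexity": log_scale(180, saturation=1500),  # instruction words
    }
    print(difficulty_score(components))
```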

Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.
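The thresholds translate directly into a labeling function. A minimal sketch, assuming scores are plain floats; note that each boundary belongs to the lower label, so 4.0 is easy and 7.0 is medium. The function names are hypothetical.

```python
from collections import Counter

EASY_MAX = 4.0    # easy   <= 4.0
MEDIUM_MAX = 7.0  # medium <= 7.0, hard > 7.0


def difficulty_label(score: float) -> str:
    """Map a difficulty_score to easy / medium / hard using the thresholds above."""
    if score <= EASY_MAX:
        return "easy"
    if score <= MEDIUM_MAX:
        return "medium"
    return "hard"


def label_distribution(scores: list[float]) -> Counter:
    """Tally labels, e.g. to build one easy / medium / hard row per language."""
    return Counter(difficulty_label(s) for s in scores)
```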

Tag generation and display

Tags are not computed live by the dashboard. The LLM generates them from the PR information when swegen builds the task, and they are written to [metadata].tags in task.toml.

The prompt asks for tags in four parts: programming language, project layer/domain, framework/library or specific topic, and a domain-independent bug class (e.g. missing-fallback, incomplete-validation). The dashboard reads existing task.toml files, counts each language's tag occurrences and share, and treats the 4th tag as the bug class for the Bug-Class panels below.

fix.patch statistics

Patch statistics come from each valid task's solution/fix.patch and count only code files (selected by language-specific file extension), consistent with the code-only statistics in upload_march_swe_to_hf.py.

Avg fix.patch lines is the mean number of added plus removed lines in code-file diffs; Avg fix.patch hunks is the mean number of @@ hunks; Avg fix.patch files is the mean number of code files touched.
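The three statistics can be reproduced from a unified diff with a small line scanner. This sketch is not the code in upload_march_swe_to_hf.py; it assumes git-style "diff --git" headers, and the extension set is passed in by the caller rather than taken from any real per-language table. The function names are hypothetical.

```python
import re
from pathlib import PurePosixPath

# "diff --git a/<old> b/<new>"; the greedy first group leaves the new path in group 2.
DIFF_HEADER = re.compile(r"^diff --git a/(.+) b/(.+)$")
STAT_KEYS = ("lines", "hunks", "files")


def patch_stats(patch_text: str, extensions: set[str]) -> dict[str, int]:
    """Count changed lines, @@ hunks, and files in one patch, code files only."""
    stats = dict.fromkeys(STAT_KEYS, 0)
    in_code_file = False  # current file's extension is in `extensions`
    in_hunk = False       # past the first @@ of the current file
    for line in patch_text.splitlines():
        header = DIFF_HEADER.match(line)
        if header:
            in_code_file = PurePosixPath(header.group(2)).suffix in extensions
            in_hunk = False
            if in_code_file:
                stats["files"] += 1
            continue
        if not in_code_file:
            continue
        if line.startswith("@@"):
            stats["hunks"] += 1
            in_hunk = True
        elif in_hunk and line.startswith(("+", "-")):
            # Inside a hunk, "+"/"-" lines are changes; the "---"/"+++"
            # file headers only appear before the first @@ of a file.
            stats["lines"] += 1
    return stats


def average_stats(patches: list[str], extensions: set[str]) -> dict[str, float]:
    """Per-task means, matching the shape of the Avg fix.patch columns."""
    if not patches:
        return dict.fromkeys(STAT_KEYS, 0.0)
    per_task = [patch_stats(p, extensions) for p in patches]
    return {
        key: round(sum(s[key] for s in per_task) / len(per_task), 2)
        for key in STAT_KEYS
    }
```

Tracking whether the scanner is inside a hunk avoids miscounting a removed line whose content itself begins with "--" as a file header.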

difficulty_label Distribution

Language | easy | medium | hard
C | 869 | 6,414 | 2,418
C++ | 430 | 2,509 | 1,097
Go | 630 | 5,860 | 1,605
Java | 443 | 2,605 | 968
JavaScript | 1,077 | 5,291 | 770
Python | 273 | 3,248 | 1,513
Rust | 382 | 3,260 | 1,861
TypeScript | 574 | 4,745 | 1,096

difficulty_score Overview

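Language | count | min | p25 | median | mean | p75 | max
C | 9,701 | 2.4 | 4.9 | 6.0 | 5.97 | 7.0 | 9.2
C++ | 4,036 | 2.5 | 4.9 | 6.0 | 5.99 | 7.2 | 9.1
Go | 8,095 | 2.6 | 4.9 | 5.8 | 5.87 | 6.8 | 9.1
Java | 4,016 | 2.8 | 4.8 | 5.9 | 5.90 | 7.0 | 9.2
JavaScript | 7,138 | 2.6 | 4.4 | 5.2 | 5.36 | 6.2 | 9.2
Python | 5,034 | 2.6 | 5.2 | 6.2 | 6.22 | 7.3 | 9.1
Rust | 5,503 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0
TypeScript | 6,415 | 2.7 | 4.7 | 5.6 | 5.72 | 6.6 | 9.1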

Global Top Tags

library 23,299 (46.6%)
backend 15,364 (30.8%)
cli 6,766 (13.5%)
missing-feature 3,495 (7.0%)
frontend 3,056 (6.1%)
testing 2,180 (4.4%)
http 1,536 (3.1%)
react 1,393 (2.8%)
incomplete-validation 1,305 (2.6%)
framework 1,222 (2.4%)
missing-implementation 1,184 (2.4%)
missing-metadata-propagation 918 (1.8%)
missing-fallback 842 (1.7%)
kubernetes 709 (1.4%)
async 674 (1.3%)
wrong-default 643 (1.3%)
embedded 637 (1.3%)
networking 615 (1.2%)
cpp 594 (1.2%)
missing-validation 550 (1.1%)
missing-functionality 443 (0.9%)
missing-configuration 408 (0.8%)
typescript 383 (0.8%)
type-handling-inconsistency 370 (0.7%)
parsing 358 (0.7%)
eslint 342 (0.7%)
graphql 336 (0.7%)
race-condition 328 (0.7%)
postgresql 298 (0.6%)
missing-configuration-option 297 (0.6%)

Per-Language Tag Distribution


C (c)

library 5,096 (52.5%)
backend 2,509 (25.8%)
cli 1,119 (11.5%)
missing-feature 647 (6.7%)
embedded 611 (6.3%)
cpp 591 (6.1%)
testing 414 (4.3%)
networking 400 (4.1%)
framework 293 (3.0%)
missing-implementation 283 (2.9%)
incomplete-validation 272 (2.8%)
ruby 213 (2.2%)
postgresql 212 (2.2%)
http 201 (2.1%)
firmware 183 (1.9%)
kernel 177 (1.8%)
quic 168 (1.7%)
missing-metadata-propagation 150 (1.5%)
rust 140 (1.4%)
missing-functionality 133 (1.4%)

C++ (cpp)

library 2,785 (69.0%)
backend 731 (18.1%)
testing 491 (12.2%)
cli 357 (8.8%)
missing-feature 268 (6.6%)
framework 186 (4.6%)
missing-implementation 144 (3.6%)
http 129 (3.2%)
incomplete-validation 118 (2.9%)
boost 114 (2.8%)
async 91 (2.3%)
parsing 76 (1.9%)
qt 63 (1.6%)
compiler 54 (1.3%)
serialization 54 (1.3%)
geometry 51 (1.3%)
missing-fallback 49 (1.2%)
networking 49 (1.2%)
missing-metadata-propagation 48 (1.2%)
formatting 45 (1.1%)

Go (go)

backend 4,246 (52.5%)
library 2,039 (25.2%)
cli 1,960 (24.2%)
missing-feature 727 (9.0%)
kubernetes 649 (8.0%)
http 520 (6.4%)
incomplete-validation 240 (3.0%)
testing 224 (2.8%)
missing-metadata-propagation 201 (2.5%)
missing-implementation 177 (2.2%)
missing-fallback 173 (2.1%)
docker 133 (1.6%)
wrong-default 128 (1.6%)
aws 119 (1.5%)
terraform 118 (1.5%)
grpc 117 (1.4%)
missing-validation 116 (1.4%)
prometheus 103 (1.3%)
networking 97 (1.2%)
database 90 (1.1%)

Java (java)

backend 1,950 (48.5%)
library 1,815 (45.2%)
missing-feature 220 (5.5%)
testing 211 (5.3%)
spring 159 (4.0%)
framework 152 (3.8%)
aem 147 (3.7%)
http 136 (3.4%)
android 118 (2.9%)
incomplete-validation 92 (2.3%)
missing-metadata-propagation 91 (2.3%)
missing-implementation 80 (2.0%)
json 67 (1.7%)
missing-configuration 67 (1.7%)
cli 65 (1.6%)
missing-null-check 64 (1.6%)
wrong-default 61 (1.5%)
maven 59 (1.5%)
sling 57 (1.4%)
kafka 54 (1.3%)

JavaScript (js)

library 3,854 (54.0%)
backend 1,229 (17.2%)
frontend 1,009 (14.1%)
cli 851 (11.9%)
missing-feature 486 (6.8%)
react 346 (4.8%)
typescript 320 (4.5%)
testing 295 (4.1%)
eslint 262 (3.7%)
incomplete-validation 217 (3.0%)
framework 197 (2.8%)
http 180 (2.5%)
fastify 147 (2.1%)
webpack 145 (2.0%)
missing-fallback 122 (1.7%)
missing-metadata-propagation 110 (1.5%)
mongoose 103 (1.4%)
nodejs 103 (1.4%)
svelte 101 (1.4%)
missing-implementation 100 (1.4%)

Python (py)

library 2,137 (42.4%)
backend 2,096 (41.6%)
cli 697 (13.8%)
missing-feature 406 (8.1%)
fastapi 205 (4.1%)
missing-implementation 144 (2.9%)
django 143 (2.8%)
testing 115 (2.3%)
missing-fallback 113 (2.2%)
missing-metadata-propagation 106 (2.1%)
pytorch 105 (2.1%)
incomplete-validation 101 (2.0%)
ansible 100 (2.0%)
async 99 (2.0%)
framework 99 (2.0%)
http 93 (1.8%)
aws 72 (1.4%)
aiohttp 65 (1.3%)
missing-parameter 63 (1.3%)
pydantic 60 (1.2%)

Rust (rust)

library 3,090 (56.2%)
backend 1,186 (21.6%)
cli 1,126 (20.5%)
missing-feature 405 (7.4%)
testing 308 (5.6%)
async 223 (4.1%)
http 207 (3.8%)
missing-implementation 159 (2.9%)
incomplete-validation 131 (2.4%)
compiler 120 (2.2%)
git 117 (2.1%)
missing-metadata-propagation 97 (1.8%)
macros 93 (1.7%)
parsing 91 (1.7%)
graphql 79 (1.4%)
substrate 76 (1.4%)
blockchain 75 (1.4%)
serde 71 (1.3%)
missing-fallback 70 (1.3%)
sql 64 (1.2%)

TypeScript (ts)

library 2,483 (38.7%)
frontend 1,774 (27.7%)
backend 1,417 (22.1%)
react 1,039 (16.2%)
cli 591 (9.2%)
missing-feature 336 (5.2%)
angular 203 (3.2%)
graphql 162 (2.5%)
framework 147 (2.3%)
missing-fallback 142 (2.2%)
javascript 138 (2.2%)
electron 135 (2.1%)
incomplete-validation 134 (2.1%)
fullstack 126 (2.0%)
testing 122 (1.9%)
missing-metadata-propagation 115 (1.8%)
wrong-default 115 (1.8%)
vue 98 (1.5%)
missing-implementation 97 (1.5%)
github-actions 84 (1.3%)

Global Top Bug Classes


Bug class is the 4th tag in task.toml -> [metadata].tags: a domain-independent label describing the defect mechanism (e.g. missing-fallback, incomplete-validation, off-by-one-error). Generated by the LLM during swegen create and backfilled into legacy 3-tag tasks via swegen backfill-tags.

missing-feature 3,495 (7.0%)
incomplete-validation 1,305 (2.6%)
missing-implementation 1,184 (2.4%)
missing-metadata-propagation 918 (1.8%)
missing-fallback 842 (1.7%)
wrong-default 643 (1.3%)
missing-validation 550 (1.1%)
missing-functionality 443 (0.9%)
missing-configuration 408 (0.8%)
type-handling-inconsistency 370 (0.7%)
race-condition 328 (0.7%)
missing-configuration-option 297 (0.6%)
missing-method 276 (0.6%)
missing-api 234 (0.5%)
missing-null-check 225 (0.5%)
missing-type-support 204 (0.4%)
missing-parameter 181 (0.4%)
missing-error-handling 174 (0.3%)
missing-initialization 172 (0.3%)
missing-cleanup 165 (0.3%)
missing-field 135 (0.3%)
missing-error-propagation 129 (0.3%)
missing-context-propagation 125 (0.3%)
missing-bounds-check 123 (0.2%)
missing-format-support 107 (0.2%)
missing-cli-option 107 (0.2%)
missing-function 106 (0.2%)
off-by-one 104 (0.2%)
missing-input-validation 103 (0.2%)
missing-configuration-propagation 102 (0.2%)

Per-Language Bug-Class Distribution


Top bug classes per mainstream language. Counts are over tasks whose task.toml already carries a 4-tag entry; tasks still on the legacy 3-tag schema do not contribute until the backfill catches up.
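A minimal sketch of that counting rule, with tags already parsed into lists (for example from [metadata].tags). The function names and the sample tags are illustrative only; the point is that a legacy 3-tag task is skipped rather than counted under a placeholder class.

```python
from collections import Counter
from typing import Optional


def bug_class(tags: list[str]) -> Optional[str]:
    """The bug class is the 4th tag; legacy 3-tag tasks have none yet."""
    return tags[3] if len(tags) >= 4 else None


def bug_class_distribution(tag_lists: list[list[str]]) -> tuple[Counter, int]:
    """Tally bug classes over 4-tag tasks and report how many legacy tasks were skipped."""
    counts: Counter = Counter()
    skipped_legacy = 0
    for tags in tag_lists:
        cls = bug_class(tags)
        if cls is None:
            skipped_legacy += 1  # contributes only after swegen backfill-tags runs
        else:
            counts[cls] += 1
    return counts, skipped_legacy
```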

C (c) · 9,709 tagged

missing-feature 647 (6.7%)
missing-implementation 283 (2.9%)
incomplete-validation 272 (2.8%)
missing-metadata-propagation 150 (1.5%)
missing-functionality 133 (1.4%)
missing-fallback 127 (1.3%)
missing-api 108 (1.1%)
missing-validation 108 (1.1%)
wrong-default 97 (1.0%)
missing-initialization 80 (0.8%)
race-condition 74 (0.8%)
missing-configuration 68 (0.7%)
missing-bounds-check 55 (0.6%)
missing-null-check 52 (0.5%)
memory-leak 49 (0.5%)

C++ (cpp) · 4,039 tagged

missing-feature 268 (6.6%)
missing-implementation 144 (3.6%)
incomplete-validation 118 (2.9%)
missing-fallback 49 (1.2%)
missing-metadata-propagation 48 (1.2%)
type-handling-inconsistency 39 (1.0%)
missing-functionality 38 (0.9%)
missing-api 37 (0.9%)
wrong-default 36 (0.9%)
missing-validation 34 (0.8%)
missing-type-support 29 (0.7%)
race-condition 25 (0.6%)
missing-configuration 20 (0.5%)
missing-method 17 (0.4%)
missing-bounds-check 16 (0.4%)

Go (go) · 8,095 tagged

missing-feature 727 (9.0%)
incomplete-validation 240 (3.0%)
missing-metadata-propagation 201 (2.5%)
missing-implementation 177 (2.2%)
missing-fallback 173 (2.1%)
wrong-default 128 (1.6%)
missing-validation 116 (1.4%)
race-condition 83 (1.0%)
missing-configuration 80 (1.0%)
missing-functionality 74 (0.9%)
missing-configuration-option 68 (0.8%)
missing-field 60 (0.7%)
type-handling-inconsistency 59 (0.7%)
missing-nil-check 53 (0.7%)
missing-method 40 (0.5%)

Java (java) · 4,017 tagged

missing-feature 220 (5.5%)
incomplete-validation 92 (2.3%)
missing-metadata-propagation 91 (2.3%)
missing-implementation 80 (2.0%)
missing-configuration 67 (1.7%)
missing-null-check 64 (1.6%)
wrong-default 61 (1.5%)
missing-fallback 46 (1.1%)
type-handling-inconsistency 42 (1.0%)
missing-validation 41 (1.0%)
race-condition 39 (1.0%)
missing-method 32 (0.8%)
missing-configuration-option 24 (0.6%)
missing-functionality 21 (0.5%)
missing-type-support 19 (0.5%)

JavaScript (js) · 7,139 tagged

missing-feature 486 (6.8%)
incomplete-validation 217 (3.0%)
missing-fallback 122 (1.7%)
missing-metadata-propagation 110 (1.5%)
missing-implementation 100 (1.4%)
wrong-default 91 (1.3%)
missing-validation 82 (1.1%)
missing-configuration-option 70 (1.0%)
missing-method 61 (0.9%)
type-handling-inconsistency 56 (0.8%)
missing-functionality 52 (0.7%)
missing-configuration 46 (0.6%)
missing-null-check 46 (0.6%)
missing-option 40 (0.6%)
missing-error-handling 38 (0.5%)

Python (py) · 5,038 tagged

missing-feature 406 (8.1%)
missing-implementation 144 (2.9%)
missing-fallback 113 (2.2%)
missing-metadata-propagation 106 (2.1%)
incomplete-validation 101 (2.0%)
missing-parameter 63 (1.3%)
wrong-default 59 (1.2%)
missing-validation 55 (1.1%)
type-handling-inconsistency 51 (1.0%)
missing-functionality 50 (1.0%)
missing-configuration 44 (0.9%)
missing-error-handling 31 (0.6%)
missing-method 29 (0.6%)
missing-configuration-option 23 (0.5%)
missing-type-support 23 (0.5%)

Rust (rust) · 5,503 tagged

missing-feature 405 (7.4%)
missing-implementation 159 (2.9%)
incomplete-validation 131 (2.4%)
missing-metadata-propagation 97 (1.8%)
missing-fallback 70 (1.3%)
missing-validation 59 (1.1%)
wrong-default 56 (1.0%)
missing-functionality 50 (0.9%)
missing-type-support 49 (0.9%)
missing-method 48 (0.9%)
missing-configuration 38 (0.7%)
missing-api 37 (0.7%)
missing-syntax-support 31 (0.6%)
type-handling-inconsistency 26 (0.5%)
missing-cli-option 24 (0.4%)

TypeScript (ts) · 6,415 tagged

missing-feature 336 (5.2%)
missing-fallback 142 (2.2%)
incomplete-validation 134 (2.1%)
missing-metadata-propagation 115 (1.8%)
wrong-default 115 (1.8%)
missing-implementation 97 (1.5%)
type-handling-inconsistency 56 (0.9%)
missing-validation 55 (0.9%)
race-condition 48 (0.7%)
missing-configuration 45 (0.7%)
missing-configuration-option 42 (0.7%)
missing-method 37 (0.6%)
missing-null-check 27 (0.4%)
missing-prop 27 (0.4%)
missing-functionality 25 (0.4%)