SWE-gen/instance
Updated 2026-06-27 02:32:11 BJT · next 2026-06-27 03:32:11 BJT
Overview

Live Dashboard of PR Collection and SWE Task Generation

Live state of GitHub PR collection and verifiable SWE-Bench task generation across 8 languages. Detailed per-language analysis follows in the sections below.

Total PRs collected
589,521
23,810 unique repos · 1h 0 / 24h +6
Total valid SWE
50,345
9,935 unique repos · 1h +7 / 24h +120
Overall success rate
9.5%
Valid SWE / processed 530,427
Mean difficulty_score
5.89
median 5.8, count 50,295

Language Progress

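| Language | PRs collected | PRs last 1h | PRs last 24h | Valid SWE | Valid last 1h | Valid last 24h | Processed | Success rate |
|---|---|---|---|---|---|---|---|---|
| C (c) | 32,357 | 0 | 0 | 9,715 | 0 | 0 | 32,357 | 30.0% |
| C++ (cpp) | 48,175 | 0 | +1 | 4,050 | 0 | +3 | 27,006 | 15.0% |
| Go (go) | 132,104 | 0 | +1 | 8,184 | +1 | +32 | 100,015 | 8.2% |
| Java (java) | 89,031 | 0 | 0 | 4,070 | +1 | +16 | 88,910 | 4.6% |
| JavaScript (js) | 39,682 | 0 | 0 | 7,214 | 0 | +19 | 39,682 | 18.2% |
| Python (py) | 105,776 | 0 | +2 | 5,080 | +1 | +11 | 105,670 | 4.8% |
| Rust (rust) | 71,903 | 0 | +1 | 5,535 | +1 | +13 | 71,862 | 7.7% |
| TypeScript (ts) | 70,493 | 0 | +1 | 6,497 | +3 | +26 | 64,925 | 10.0% |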

Run Parameters

| Language | Eval model (OPENAI) | Completion model (ANTHROPIC) | Concurrency | min_source_files | max_source_files |
|---|---|---|---|---|---|
| C | glm-5 | claude-sonnet-4-6 | 12 | 2 | 15 |
| C++ | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 8 | 2 | 15 |
| Go | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10 |
| Java | glm-5 | claude-sonnet-4-6 | 8 | 2 | 10 |
| JavaScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10 |
| Python | glm-5 | claude-sonnet-4-6 | 12 | 3 | 15 |
| Rust | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 8 | 2 | 10 |
| TypeScript | Qwen3.6-35B-A3B | Qwen3.6-35B-A3B | 12 | 2 | 10 |

Failure Reason Breakdown

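| Language | Processed | Valid SWE | Failed | trivial_pr | validation | infra_error | timeout / workflow_error / Other |
|---|---|---|---|---|---|---|---|
| C | 32,357 | 9,715 | 22,642 | 17,126 | 3,896 | 1,759 | 21432 (undelimited) |
| C++ | 27,006 | 4,050 | 22,956 | 5,532 | 3,382 | 15,345 | 159 / 521 / 266 |
| Go | 100,015 | 8,184 | 91,831 | 27,442 | 13,713 | 47,670 | 1,581 / 1,060 / 353 |
| Java | 88,910 | 4,070 | 84,840 | 23,840 | 8,342 | 42,449 | 1,204 / 7,811 / 1,601 |
| JavaScript | 39,682 | 7,214 | 32,468 | 16,957 | 3,194 | 13,045 | 3912630 (undelimited) |
| Python | 105,670 | 5,080 | 100,590 | 36,720 | 11,488 | 52,679 | 917 / 385 / 120 |
| Rust | 71,862 | 5,535 | 66,327 | 22,130 | 7,669 | 33,737 | 1,190 / 1,027 / 1,234 |
| TypeScript | 64,925 | 6,497 | 58,428 | 18,692 | 9,490 | 27,906 | 1,764 / 89310 (undelimited) |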

trivial_pr: the PR was judged by the LLM as too trivial (e.g. only config, docs, or dependency-version changes) and unsuitable as a SWE task.

validation: validation failed after task generation (the NOP agent did not return reward=0, or the ORACLE agent did not return reward=1).

infra_error: infrastructure error (Docker build failure, network timeout, insufficient disk space, etc.).

timeout: processing timed out (per-PR total timeout or Claude Code session timeout).

workflow_error: workflow error (PR metadata fetch failure, worktree creation failure, patch generation failure, etc.).
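
The validation rule defined above can be sketched as a single check. This is a minimal illustration, not the swegen implementation: the function name `passes_validation` is hypothetical, and it assumes that the NOP agent applies no change while the ORACLE agent applies the reference fix, each yielding a numeric reward.

```python
def passes_validation(nop_reward: float, oracle_reward: float) -> bool:
    """Return True when a generated task counts as verifiable.

    ASSUMPTION: the NOP agent leaves the repository untouched, so the
    task's tests should fail (reward 0); the ORACLE agent applies the
    reference fix, so the tests should pass (reward 1).
    """
    return nop_reward == 0 and oracle_reward == 1
```

A task whose tests already pass without any change (NOP reward 1), or whose reference fix does not make them pass (ORACLE reward 0), would be counted under validation failures.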

fix.patch Complexity

| Language | Valid SWE Count | Avg fix.patch lines | Avg fix.patch hunks | Avg fix.patch files |
|---|---|---|---|---|
| C | 9,715 | 336.48 | 17.94 | 5.85 |
| C++ | 4,050 | 286.33 | 13.75 | 5.09 |
| Go | 8,184 | 211.80 | 12.61 | 4.34 |
| Java | 4,070 | 163.00 | 10.51 | 4.23 |
| JavaScript | 7,214 | 77.13 | 6.30 | 2.80 |
| Python | 5,080 | 152.14 | 11.04 | 3.84 |
| Rust | 5,535 | 225.40 | 13.16 | 4.10 |
| TypeScript | 6,497 | 157.42 | 9.60 | 4.13 |

Metric Definitions

Difficulty score (difficulty_score)

src/swegen/scoring.py reads each valid task directory's solution/fix.patch, tests/, and instruction.md and scores them statically, with zero API calls.

The current formula uses continuous log-scale scoring so that mid-sized patches are not rated hard too early. Weights: patch_scope 38%, logic_complexity 32%, context_breadth 15%, test_complexity 10%, instruction_complexity 5%.

Label thresholds: easy <= 4.0, medium <= 7.0, hard > 7.0.
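
The weighting and label thresholds above can be sketched as follows. This is a minimal illustration rather than the contents of src/swegen/scoring.py: it assumes each component has already been scaled to a common 0-10 range (the log-scale step itself is not shown on this page), and the function names are hypothetical.

```python
# Component weights as listed on the dashboard (they sum to 1.0).
WEIGHTS = {
    "patch_scope": 0.38,
    "logic_complexity": 0.32,
    "context_breadth": 0.15,
    "test_complexity": 0.10,
    "instruction_complexity": 0.05,
}


def difficulty_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores.

    ASSUMPTION: every component is already on a 0-10 scale.
    """
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def difficulty_label(score: float) -> str:
    """Map a score to a label: easy <= 4.0, medium <= 7.0, hard > 7.0."""
    if score <= 4.0:
        return "easy"
    if score <= 7.0:
        return "medium"
    return "hard"
```

Because patch_scope and logic_complexity together carry 70% of the weight, a task with a large, intricate patch lands in the upper range even when its tests and instructions are simple.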

Tag generation and display

Tags are not computed live by the dashboard; they are generated by the LLM from PR information when swegen builds the task, and written to [metadata].tags in task.toml.

The prompt asks for tags in four parts: programming language, project layer/domain, framework/library or specific topic, and a domain-independent bug class (e.g. missing-fallback, incomplete-validation). The dashboard reads existing task.toml files, counts each language's tag occurrences and share, and treats the 4th tag as the bug class for the Bug-Class panels below.

fix.patch statistics

Patch stats come from each valid task's solution/fix.patch, filtering code files by language extension, consistent with the code-only stats in upload_march_swe_to_hf.py.

Avg fix.patch lines counts added/removed lines in code-file diffs; Avg fix.patch hunks counts @@ hunks; Avg fix.patch files counts the code files involved.
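
The three per-patch counts can be sketched as below. This is a minimal illustration, not the code in upload_march_swe_to_hf.py: the function name is hypothetical, and it assumes a git-format patch in which every file section starts with a `diff --git` line.

```python
def patch_stats(patch_text: str, extensions: tuple[str, ...]) -> dict[str, int]:
    """Count changed lines, @@ hunks, and files, restricted to code files.

    ASSUMPTION: git-format patch; `extensions` lists the code-file
    suffixes for one language, e.g. (".py",).
    """
    lines = 0
    hunks = 0
    files: set[str] = set()
    path = ""
    is_code = False
    in_hunk = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git "):
            # New file section: take the path after the last " b/".
            path = line.rsplit(" b/", 1)[-1]
            is_code = path.endswith(extensions)
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
            if is_code:
                hunks += 1
                files.add(path)
        elif in_hunk and is_code and line.startswith(("+", "-")):
            # Inside a hunk, +/- lines are additions and removals;
            # the ---/+++ file headers come before the first hunk.
            lines += 1
    return {"lines": lines, "hunks": hunks, "files": len(files)}
```

The dashboard's Avg columns would then be the mean of each count over a language's valid tasks.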

difficulty_label Distribution

| Language | easy | medium | hard |
|---|---|---|---|
| C | 869 | 6,417 | 2,421 |
| C++ | 431 | 2,517 | 1,097 |
| Go | 650 | 5,911 | 1,617 |
| Java | 452 | 2,633 | 981 |
| JavaScript | 1,093 | 5,342 | 778 |
| Python | 273 | 3,258 | 1,525 |
| Rust | 385 | 3,277 | 1,871 |
| TypeScript | 588 | 4,801 | 1,108 |

difficulty_score Overview

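| Language | count | min | p25 | median | mean | p75 | max |
|---|---|---|---|---|---|---|---|
| C | 9,707 | 2.4 | 4.9 | 6.0 | 5.97 | 7.0 | 9.2 |
| C++ | 4,045 | 2.5 | 4.9 | 6.0 | 5.99 | 7.2 | 9.1 |
| Go | 8,178 | 2.6 | 4.9 | 5.8 | 5.86 | 6.8 | 9.1 |
| Java | 4,066 | 2.8 | 4.8 | 5.9 | 5.90 | 7.0 | 9.2 |
| JavaScript | 7,213 | 2.6 | 4.4 | 5.2 | 5.36 | 6.2 | 9.2 |
| Python | 5,056 | 2.6 | 5.2 | 6.2 | 6.22 | 7.3 | 9.1 |
| Rust | 5,533 | 2.7 | 5.2 | 6.3 | 6.26 | 7.4 | 9.0 |
| TypeScript | 6,497 | 2.7 | 4.7 | 5.6 | 5.72 | 6.6 | 9.2 |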

Global Top Tags

library 23,433 (46.6%)
backend 15,498 (30.8%)
cli 6,806 (13.5%)
missing-feature 3,517 (7.0%)
frontend 3,100 (6.2%)
testing 2,193 (4.4%)
http 1,547 (3.1%)
react 1,406 (2.8%)
incomplete-validation 1,317 (2.6%)
framework 1,231 (2.4%)
missing-implementation 1,190 (2.4%)
missing-metadata-propagation 931 (1.9%)
missing-fallback 849 (1.7%)
kubernetes 718 (1.4%)
async 677 (1.3%)
wrong-default 650 (1.3%)
embedded 637 (1.3%)
networking 616 (1.2%)
cpp 595 (1.2%)
missing-validation 551 (1.1%)
missing-functionality 445 (0.9%)
missing-configuration 410 (0.8%)
typescript 394 (0.8%)
type-handling-inconsistency 373 (0.7%)
parsing 358 (0.7%)
eslint 344 (0.7%)
graphql 336 (0.7%)
race-condition 329 (0.7%)
missing-configuration-option 299 (0.6%)
postgresql 298 (0.6%)

Per-Language Tag Distribution

C (c)

library 5,097 (52.5%)
backend 2,512 (25.9%)
cli 1,120 (11.5%)
missing-feature 647 (6.7%)
embedded 611 (6.3%)
cpp 592 (6.1%)
testing 416 (4.3%)
networking 400 (4.1%)
framework 294 (3.0%)
missing-implementation 283 (2.9%)
incomplete-validation 272 (2.8%)
ruby 213 (2.2%)
postgresql 212 (2.2%)
http 202 (2.1%)
firmware 183 (1.9%)
kernel 177 (1.8%)
quic 168 (1.7%)
missing-metadata-propagation 150 (1.5%)
rust 140 (1.4%)
missing-functionality 133 (1.4%)

C++ (cpp)

library 2,786 (68.8%)
backend 738 (18.2%)
testing 493 (12.2%)
cli 357 (8.8%)
missing-feature 268 (6.6%)
framework 186 (4.6%)
missing-implementation 144 (3.6%)
http 129 (3.2%)
incomplete-validation 119 (2.9%)
boost 114 (2.8%)
async 91 (2.2%)
parsing 76 (1.9%)
qt 64 (1.6%)
serialization 55 (1.4%)
compiler 54 (1.3%)
geometry 51 (1.3%)
missing-fallback 50 (1.2%)
networking 49 (1.2%)
missing-metadata-propagation 48 (1.2%)
formatting 45 (1.1%)

Go (go)

backend 4,291 (52.5%)
library 2,056 (25.1%)
cli 1,984 (24.3%)
missing-feature 730 (8.9%)
kubernetes 654 (8.0%)
http 524 (6.4%)
incomplete-validation 243 (3.0%)
testing 226 (2.8%)
missing-metadata-propagation 204 (2.5%)
missing-implementation 178 (2.2%)
missing-fallback 174 (2.1%)
docker 134 (1.6%)
wrong-default 129 (1.6%)
aws 120 (1.5%)
grpc 120 (1.5%)
terraform 120 (1.5%)
missing-validation 116 (1.4%)
prometheus 106 (1.3%)
networking 97 (1.2%)
database 90 (1.1%)

Java (java)

backend 1,977 (48.6%)
library 1,836 (45.1%)
missing-feature 223 (5.5%)
testing 214 (5.3%)
spring 160 (3.9%)
framework 152 (3.7%)
aem 147 (3.6%)
http 139 (3.4%)
android 124 (3.0%)
incomplete-validation 94 (2.3%)
missing-metadata-propagation 93 (2.3%)
missing-implementation 82 (2.0%)
json 67 (1.6%)
missing-configuration 67 (1.6%)
cli 66 (1.6%)
missing-null-check 64 (1.6%)
wrong-default 61 (1.5%)
maven 60 (1.5%)
sling 57 (1.4%)
kafka 54 (1.3%)

JavaScript (js)

library 3,890 (53.9%)
backend 1,237 (17.1%)
frontend 1,031 (14.3%)
cli 857 (11.9%)
missing-feature 493 (6.8%)
react 353 (4.9%)
typescript 330 (4.6%)
testing 296 (4.1%)
eslint 263 (3.6%)
incomplete-validation 220 (3.0%)
framework 200 (2.8%)
http 182 (2.5%)
fastify 148 (2.1%)
webpack 147 (2.0%)
missing-fallback 123 (1.7%)
missing-metadata-propagation 113 (1.6%)
mongoose 104 (1.4%)
nodejs 104 (1.4%)
svelte 103 (1.4%)
missing-implementation 100 (1.4%)

Python (py)

library 2,147 (42.4%)
backend 2,106 (41.6%)
cli 698 (13.8%)
missing-feature 409 (8.1%)
fastapi 206 (4.1%)
missing-implementation 145 (2.9%)
django 143 (2.8%)
testing 115 (2.3%)
missing-fallback 113 (2.2%)
missing-metadata-propagation 106 (2.1%)
pytorch 106 (2.1%)
incomplete-validation 101 (2.0%)
ansible 100 (2.0%)
async 100 (2.0%)
framework 99 (2.0%)
http 93 (1.8%)
aws 74 (1.5%)
aiohttp 65 (1.3%)
missing-parameter 63 (1.2%)
pydantic 60 (1.2%)

Rust (rust)

library 3,104 (56.1%)
backend 1,199 (21.7%)
cli 1,130 (20.4%)
missing-feature 409 (7.4%)
testing 310 (5.6%)
async 224 (4.0%)
http 207 (3.7%)
missing-implementation 161 (2.9%)
incomplete-validation 131 (2.4%)
compiler 120 (2.2%)
git 117 (2.1%)
missing-metadata-propagation 97 (1.8%)
macros 93 (1.7%)
parsing 91 (1.6%)
graphql 79 (1.4%)
blockchain 77 (1.4%)
substrate 76 (1.4%)
missing-fallback 71 (1.3%)
serde 71 (1.3%)
sql 65 (1.2%)

TypeScript (ts)

library 2,517 (38.7%)
frontend 1,793 (27.6%)
backend 1,438 (22.1%)
react 1,045 (16.1%)
cli 594 (9.1%)
missing-feature 338 (5.2%)
angular 203 (3.1%)
graphql 162 (2.5%)
framework 152 (2.3%)
missing-fallback 144 (2.2%)
javascript 140 (2.2%)
electron 139 (2.1%)
incomplete-validation 137 (2.1%)
fullstack 128 (2.0%)
testing 123 (1.9%)
missing-metadata-propagation 120 (1.8%)
wrong-default 118 (1.8%)
vue 99 (1.5%)
missing-implementation 97 (1.5%)
nextjs 85 (1.3%)

Global Top Bug Classes

Bug class is the 4th tag in task.toml -> [metadata].tags: a domain-independent label describing the defect mechanism (e.g. missing-fallback, incomplete-validation, off-by-one-error). Generated by the LLM during swegen create and backfilled into legacy 3-tag tasks via swegen backfill-tags.

missing-feature 3,517 (7.0%)
incomplete-validation 1,317 (2.6%)
missing-implementation 1,190 (2.4%)
missing-metadata-propagation 931 (1.9%)
missing-fallback 849 (1.7%)
wrong-default 650 (1.3%)
missing-validation 551 (1.1%)
missing-functionality 445 (0.9%)
missing-configuration 410 (0.8%)
type-handling-inconsistency 373 (0.7%)
race-condition 329 (0.7%)
missing-configuration-option 299 (0.6%)
missing-method 279 (0.6%)
missing-api 234 (0.5%)
missing-null-check 226 (0.4%)
missing-type-support 205 (0.4%)
missing-parameter 183 (0.4%)
missing-error-handling 178 (0.4%)
missing-initialization 173 (0.3%)
missing-cleanup 165 (0.3%)
missing-field 135 (0.3%)
missing-error-propagation 130 (0.3%)
missing-context-propagation 125 (0.2%)
missing-bounds-check 123 (0.2%)
missing-function 107 (0.2%)
missing-format-support 107 (0.2%)
missing-cli-option 107 (0.2%)
missing-input-validation 105 (0.2%)
off-by-one 104 (0.2%)
missing-configuration-propagation 102 (0.2%)

Per-Language Bug-Class Distribution

Top bug classes per mainstream language. Counts are over tasks whose task.toml already carries a 4-tag entry; tasks still on the legacy 3-tag schema do not contribute until the backfill catches up.

C (c): 9,715 tagged

missing-feature 647 (6.7%)
missing-implementation 283 (2.9%)
incomplete-validation 272 (2.8%)
missing-metadata-propagation 150 (1.5%)
missing-functionality 133 (1.4%)
missing-fallback 127 (1.3%)
missing-api 108 (1.1%)
missing-validation 108 (1.1%)
wrong-default 97 (1.0%)
missing-initialization 80 (0.8%)
race-condition 74 (0.8%)
missing-configuration 68 (0.7%)
missing-bounds-check 55 (0.6%)
missing-null-check 52 (0.5%)
memory-leak 49 (0.5%)

C++ (cpp): 4,048 tagged

missing-feature 268 (6.6%)
missing-implementation 144 (3.6%)
incomplete-validation 119 (2.9%)
missing-fallback 50 (1.2%)
missing-metadata-propagation 48 (1.2%)
type-handling-inconsistency 40 (1.0%)
missing-functionality 38 (0.9%)
missing-api 37 (0.9%)
wrong-default 36 (0.9%)
missing-validation 35 (0.9%)
missing-type-support 29 (0.7%)
race-condition 25 (0.6%)
missing-configuration 20 (0.5%)
missing-method 17 (0.4%)
missing-bounds-check 16 (0.4%)

Go (go): 8,178 tagged

missing-feature 730 (8.9%)
incomplete-validation 243 (3.0%)
missing-metadata-propagation 204 (2.5%)
missing-implementation 178 (2.2%)
missing-fallback 174 (2.1%)
wrong-default 129 (1.6%)
missing-validation 116 (1.4%)
race-condition 83 (1.0%)
missing-configuration 80 (1.0%)
missing-functionality 76 (0.9%)
missing-configuration-option 69 (0.8%)
missing-field 60 (0.7%)
type-handling-inconsistency 60 (0.7%)
missing-nil-check 53 (0.6%)
missing-method 41 (0.5%)

Java (java): 4,067 tagged

missing-feature 223 (5.5%)
incomplete-validation 94 (2.3%)
missing-metadata-propagation 93 (2.3%)
missing-implementation 82 (2.0%)
missing-configuration 67 (1.6%)
missing-null-check 64 (1.6%)
wrong-default 61 (1.5%)
missing-fallback 47 (1.2%)
type-handling-inconsistency 42 (1.0%)
missing-validation 41 (1.0%)
race-condition 39 (1.0%)
missing-method 32 (0.8%)
missing-configuration-option 24 (0.6%)
missing-functionality 21 (0.5%)
missing-type-support 19 (0.5%)

JavaScript (js): 7,214 tagged

missing-feature 493 (6.8%)
incomplete-validation 220 (3.0%)
missing-fallback 123 (1.7%)
missing-metadata-propagation 113 (1.6%)
missing-implementation 100 (1.4%)
wrong-default 92 (1.3%)
missing-validation 82 (1.1%)
missing-configuration-option 71 (1.0%)
missing-method 61 (0.8%)
type-handling-inconsistency 56 (0.8%)
missing-functionality 52 (0.7%)
missing-configuration 47 (0.7%)
missing-null-check 46 (0.6%)
missing-error-handling 40 (0.6%)
missing-option 40 (0.6%)

Python (py): 5,060 tagged

missing-feature 409 (8.1%)
missing-implementation 145 (2.9%)
missing-fallback 113 (2.2%)
missing-metadata-propagation 106 (2.1%)
incomplete-validation 101 (2.0%)
missing-parameter 63 (1.2%)
wrong-default 60 (1.2%)
missing-validation 55 (1.1%)
type-handling-inconsistency 52 (1.0%)
missing-functionality 50 (1.0%)
missing-configuration 44 (0.9%)
missing-error-handling 31 (0.6%)
missing-method 29 (0.6%)
missing-configuration-option 23 (0.5%)
missing-type-support 23 (0.5%)

Rust (rust): 5,533 tagged

missing-feature 409 (7.4%)
missing-implementation 161 (2.9%)
incomplete-validation 131 (2.4%)
missing-metadata-propagation 97 (1.8%)
missing-fallback 71 (1.3%)
missing-validation 59 (1.1%)
wrong-default 57 (1.0%)
missing-functionality 50 (0.9%)
missing-method 49 (0.9%)
missing-type-support 49 (0.9%)
missing-configuration 38 (0.7%)
missing-api 37 (0.7%)
missing-syntax-support 31 (0.6%)
type-handling-inconsistency 26 (0.5%)
missing-cli-option 24 (0.4%)

TypeScript (ts): 6,497 tagged

missing-feature 338 (5.2%)
missing-fallback 144 (2.2%)
incomplete-validation 137 (2.1%)
missing-metadata-propagation 120 (1.8%)
wrong-default 118 (1.8%)
missing-implementation 97 (1.5%)
type-handling-inconsistency 56 (0.9%)
missing-validation 55 (0.8%)
race-condition 48 (0.7%)
missing-configuration 46 (0.7%)
missing-configuration-option 42 (0.6%)
missing-method 38 (0.6%)
missing-null-check 27 (0.4%)
missing-prop 27 (0.4%)
missing-functionality 25 (0.4%)