03

SIGINT

All-source analysis of agentic development research. What's signal. What's noise.

An LLM-assisted scan of 1,524 papers on agentic AI, scored on four dimensions: Rigor, Transparency, Claims, and Integrity. Each score is a pass rate of binary fact-checks extracted from the paper. No calibration weights, no reviewer judgment in the math.
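As a rough illustration of what "pass rate of binary fact-checks" could mean in practice, here is a minimal sketch. The `Check` type, the `pass_rates` helper, and the example questions are hypothetical and are not taken from the actual rubric; the only thing assumed from the description above is that each dimension's score is the unweighted share of checks that pass, on a 0 to 100 scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Check:
    """One binary fact-check extracted from a paper (hypothetical shape)."""
    dimension: str  # e.g. "rigor", "transparency", "claims", "integrity"
    question: str   # illustrative wording, not the real rubric
    passed: bool


def pass_rates(checks):
    """Return the unweighted pass rate (0-100) for each dimension."""
    totals = {}
    passes = {}
    for c in checks:
        totals[c.dimension] = totals.get(c.dimension, 0) + 1
        passes[c.dimension] = passes.get(c.dimension, 0) + int(c.passed)
    # No calibration weights: every check counts the same.
    return {d: round(100.0 * passes[d] / totals[d], 1) for d in totals}


# Illustrative checks for a single made-up paper.
example_checks = [
    Check("rigor", "Reports a baseline comparison", True),
    Check("rigor", "Reports variance across runs", False),
    Check("rigor", "States sample size", True),
    Check("transparency", "Releases code", False),
    Check("transparency", "Releases data", True),
]

scores = pass_rates(example_checks)
print(scores)
```

Because every check carries equal weight, a score can be traced back to the individual pass/fail answers with nothing else in the math.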

  • 4.1% meet full reproducibility standards (code, data, seeds, environment).
  • 66% show overclaiming: conclusions stronger than the evidence supports.
  • 33.1 / 100 median composite score across the corpus. That's the baseline the field is working from.

Papers tracked: 10,663
Scored: 1,524
Composite median: 33.1
Full reproducibility: 4.1%
Rigor median: 53.3
Transparency median: 41.7
Claims median: 63.3
Integrity median: 25.0
> Survey progress: 8,899 of 10,663 tracked papers not yet scanned
V5 Haiku · Legacy · Not scanned (8,899)

Of the papers scanned so far, 1,524 empirical papers feed the aggregates above. Non-empirical papers (surveys, position papers, frameworks) are scored on a reduced rubric and shown individually, but are not mixed into the aggregates.
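The aggregation rule above can be sketched as follows. The record layout, the `kind` field, and the scores are invented for illustration and are not drawn from the corpus; the sketch only shows non-empirical papers being left out of the median.

```python
from statistics import median

# Made-up records; "kind" and "composite" are hypothetical field names.
papers = [
    {"id": "p1", "kind": "empirical", "composite": 20.0},
    {"id": "p2", "kind": "empirical", "composite": 35.0},
    {"id": "p3", "kind": "survey", "composite": 80.0},
    {"id": "p4", "kind": "empirical", "composite": 50.0},
]


def aggregate_median(records, field="composite"):
    """Median of `field` over empirical papers only; None if there are none."""
    values = [r[field] for r in records if r["kind"] == "empirical"]
    return median(values) if values else None


corpus_median = aggregate_median(papers)
print(corpus_median)
```

The survey's score is still available on its own record. It simply never enters the corpus-level number.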

What is this?

Every major claim in agentic AI, checked against the primary source. Stop betting your architecture on a blog post someone wrote about a preprint they skimmed.

> Assess claims

Each paper is rated for methodology, sample size, and evidence quality.

> Track trends

What the field is converging on, where papers disagree, and what holds up under scrutiny.

> Source details

Drill into any paper's methodology, limitations, and how it connects to the broader body of work.