SIGINT
All-source analysis of agentic development research. What's signal. What's noise.
An LLM-assisted scan of 1,524 papers on agentic AI, scored on four dimensions: Rigor, Transparency, Claims, and Integrity. Each score is a pass rate of binary fact-checks extracted from the paper. No calibration weights, no reviewer judgment in the math.
- 4.1% meet full reproducibility standards (code, data, seeds, environment).
- 66% show overclaiming: conclusions stronger than the evidence supports.
- 33.1 / 100 median composite score across the corpus. That's the baseline the field is working from.
Of the papers scanned, 1,524 empirical papers feed the aggregates above; non-empirical papers (surveys, position papers, frameworks) are scored on a reduced rubric and shown individually but not mixed into the aggregates.
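As a rough illustration of the scoring described above, the Python sketch below computes a dimension score as the pass rate of its binary checks and takes a corpus median over empirical papers only. How the composite is formed is an assumption here (an unweighted mean of the four dimension scores on a 0-100 scale); the text only says that no calibration weights are used. The field names (`checks`, `empirical`) and the sample papers are hypothetical.

```python
from statistics import mean, median

DIMENSIONS = ("rigor", "transparency", "claims", "integrity")


def pass_rate(checks):
    """Return the share of binary checks that passed, on a 0-100 scale."""
    if not checks:
        raise ValueError("a dimension needs at least one check")
    return 100.0 * sum(checks) / len(checks)


def composite(paper):
    """ASSUMPTION: composite = unweighted mean of the four dimension scores."""
    return mean(pass_rate(paper["checks"][d]) for d in DIMENSIONS)


def corpus_median(papers):
    """Median composite over empirical papers; others are excluded."""
    return median(composite(p) for p in papers if p["empirical"])


# Hypothetical sample data, not taken from the actual corpus.
EXAMPLE_PAPERS = [
    {
        "empirical": True,
        "checks": {
            "rigor": [True, False],
            "transparency": [True, True, False, False],
            "claims": [False, False, False, True],
            "integrity": [True, True, True, True],
        },
    },
    {
        "empirical": True,
        "checks": {
            "rigor": [False, False, False, True],
            "transparency": [False, False, False, True],
            "claims": [False, False, False, True],
            "integrity": [False, False, False, True],
        },
    },
    {
        # Non-empirical paper: scored individually, kept out of the aggregate.
        "empirical": False,
        "checks": {d: [True] for d in DIMENSIONS},
    },
]

if __name__ == "__main__":
    for paper in EXAMPLE_PAPERS:
        print(round(composite(paper), 2))
    print("median (empirical only):", corpus_median(EXAMPLE_PAPERS))
```

Because every check is a plain pass or fail, a score can be traced back to the individual checks that produced it, which fits the "no reviewer judgment in the math" framing.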
What is this?
Every major claim in agentic AI, checked against the primary source. Stop betting your architecture on a blog post someone wrote about a preprint they skimmed.
> Assess claims
Each paper is rated for methodology, sample size, and evidence quality.
> Track trends
What the field is converging on, where papers disagree, and what holds up under scrutiny.
> Source details
Drill into any paper's methodology, limitations, and how it connects to the broader body of work.