SIGINT

Attention Is All You Need

2017 · Neural Information Processing Systems · Mixed · Composite: 36.3

Rigor 66 · Transparency 67 · Claims 25 · Integrity 25

benchmark-eval

Key findings

The paper introduces the Transformer architecture, which relies entirely on self-attention mechanisms and dispenses with recurrence and convolution. On WMT 2014 machine translation, it achieves state-of-the-art BLEU scores (28.4 EN-DE, 41.8 EN-FR) while requiring a fraction of the training compute of competitive models. Ablation studies demonstrate the importance of multi-head attention, model size, and regularization. The architecture also transfers to English constituency parsing with competitive results.

Claims (5)

strong: The Transformer (big) achieves 28.4 BLEU on WMT 2014 English-to-German, improving over existing best results including ensembles by over 2 BLEU.

moderate: The Transformer (big) achieves a new single-model state-of-the-art BLEU score of 41.8 on WMT 2014 English-to-French.

strong: The Transformer requires significantly less training cost than competitive models.

moderate: The Transformer generalizes well to English constituency parsing.

strong: Multi-head attention is beneficial; single-head attention is 0.9 BLEU worse.

Red flags (5)

No variance or multi-run results: All results appear to be single runs with no error bars, standard deviations, or confidence intervals. For a paper making state-of-the-art claims, this makes it impossible to assess whether observed differences are within noise.

Abstract/body BLEU inconsistency on EN-FR: The abstract claims 41.8 BLEU on EN-FR, but Section 6.1 states 'our big model achieves a BLEU score of 41.0' for the same task. This discrepancy is unexplained.

Overly broad title and generalization claims: The title 'Attention Is All You Need' and claims of generalization are based on only two translation language pairs and one parsing task. The scope of evidence is narrower than the scope of claims.

No limitations section: The paper contains no discussion of limitations, threats to validity, or scope boundaries despite making broad architectural claims.

Conflict of interest (Google evaluating a Google architecture): All authors are Google employees (with one University of Toronto affiliate working at Google). The paper proposes and evaluates a Google-developed architecture with no independent evaluation or conflict disclosure.

Games detected

Big Numbers No Error Bars · Overclaiming · Open Source Theater

Dimension scores

Rigor: 65.9

Transparency: 66.7

Claims: 25.0

Integrity: 25.0

Composite: 36.3 (harmonic mean)
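The composite can be checked against the four dimension scores listed above. The sketch below assumes an unweighted harmonic mean rounded to one decimal; the page states only "harmonic mean", so the weighting and rounding are assumptions, and the names `dimension_scores` and `composite` are illustrative.

```python
from statistics import harmonic_mean

# Dimension scores as listed on this page.
dimension_scores = {
    "Rigor": 65.9,
    "Transparency": 66.7,
    "Claims": 25.0,
    "Integrity": 25.0,
}


def composite(scores):
    """Unweighted harmonic mean of the scores, rounded to one decimal.

    Assumption: equal weights and one-decimal rounding; the page does not
    state either explicitly.
    """
    return round(harmonic_mean(list(scores.values())), 1)


print(composite(dimension_scores))  # prints 36.3
```

Under these assumptions the result matches the reported 36.3. The arithmetic mean of the same four scores is 45.65, so the harmonic mean visibly pulls the composite toward the two low dimensions (Claims and Integrity).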

Checklist (19/36 passed)

Category scores

artifacts

statistical methodology

evaluation design: 77.8

claims and evidence

setup transparency: 100

limitations and scope

data integrity: 100

conflicts of interest

cost and practicality
