The Shifting Goalposts of AGI: 50 Years of Moving the Finish Line
Cross-Provider Synthesis of 8 Independent Research Reports
Executive Summary
- The "AI Effect" is the most consistently documented phenomenon across all 8 providers: every major AI capability — chess, Go, protein folding, the bar exam, the Turing Test — was declared impossible before it was achieved, then immediately reclassified as "not real intelligence" afterward. The dismissal language is verbatim-identical across decades, constituting what providers independently call a "script" or "playbook."
- The term "AGI" has never had a stable scientific definition: it was popularized around 2002-2006 (after an earlier 1997 coinage by Mark Gubrud) partly to rehabilitate a professionally embarrassing topic, and has since been redefined by every major player to serve commercial, regulatory, and reputational interests. The economic definition ("outperform humans at most economically valuable work") — formalized in OpenAI's 2018 charter — displaced cognitive definitions around the GPT-3 era, and is itself now being superseded.
- The financial incentives around AGI definitions are structurally misaligned with scientific clarity: OpenAI has contractual reasons to delay declaring AGI (it triggers Microsoft IP clauses); Nvidia benefits from AGI being "almost here" forever; Meta benefits from AGI never being declared (regulatory avoidance); Anthropic benefits from AGI being "close but dangerous." No major player has a financial incentive to produce a stable, falsifiable definition.
- By every historical definition of AGI, today's systems pass — but each time they pass, the definition upgrades: the scorecard across all providers shows consistent ✅ Pass on 1976, 2006, and 2016 criteria, with 🟡 Partial on 2021-2024 criteria, and ❌ Fail on whatever the current frontier definition is. The ratchet only moves in one direction.
- The most consequential recent data points — Jensen Huang's "I think we've achieved AGI" with the immediate Nvidia qualification, Sam Altman's "AGI kinda went whooshing by," and the LeCun/Hassabis public schism — represent the field's most honest moment in 50 years: leaders are simultaneously declaring victory and moving the goalposts in the same sentence, making the pattern undeniable even to insiders.
Cross-Provider Consensus
1. The "AI Effect" / Reclassification Pattern Is Universal
Finding: Every AI breakthrough follows an identical cycle — declared impossible → achieved → reclassified as "not real intelligence" using recycled language ("just pattern matching," "narrow AI," "brute force," "doesn't really understand").
Providers agreeing: Anthropic, Gemini, OpenAI, Grok-Premium, Perplexity, Grok, OpenAI-Mini, Gemini-Lite — all 8 providers
Confidence: HIGH
This is the single most robustly confirmed finding in the entire dataset. Every provider independently documented the phenomenon, named it (the "AI Effect," attributed to Larry Tesler's formulation), and provided overlapping examples. The specific dismissal phrases are documented identically across providers without coordination.
2. The Term "AGI" Was Coined ~2002-2006 by Shane Legg and Ben Goertzel
Finding: The term "Artificial General Intelligence" was coined to distinguish human-level versatile AI from narrow AI, and discussing it was professionally embarrassing at the time.
Providers agreeing: Anthropic, Gemini, OpenAI, Grok-Premium, Grok, OpenAI-Mini, Perplexity — 7 of 8 providers
Confidence: HIGH
Minor variation: Anthropic attributes it to Legg, Goertzel, and Peter Voss jointly; OpenAI notes Mark Gubrud coined it in 1997 with Legg/Goertzel popularizing it; Wired is cited by multiple providers as the primary source. The "professionally embarrassing" characterization is confirmed by all providers who address the 2006 era.
3. OpenAI's Economic Definition Replaced Cognitive Definitions Around 2021
Finding: OpenAI's charter — published in 2018 — defines AGI as "highly autonomous systems that outperform humans at most economically valuable work," a deliberate shift from philosophical/cognitive definitions that became the field's default framing around the GPT-3 era.
Providers agreeing: Anthropic, Gemini, OpenAI, Grok-Premium, Perplexity, Grok, OpenAI-Mini — 7 of 8 providers
Confidence: HIGH
The exact quote from OpenAI's charter is reproduced consistently across providers. The significance — that this represented a deliberate pivot from cognitive to economic benchmarks — is independently noted by all providers.
4. AlphaGo's Victory Was Immediately Reclassified
Finding: AlphaGo's 2016 defeat of Lee Sedol was almost immediately dismissed as "narrow AI," "just pattern matching," or "brute force tree search" — despite having been considered impossible weeks earlier.
Providers agreeing: Anthropic, Gemini, OpenAI, Grok-Premium, Perplexity, Grok, OpenAI-Mini, Gemini-Lite — all 8 providers
Confidence: HIGH
The speed of reclassification (days to weeks) is noted by multiple providers as particularly striking. Yann LeCun's specific dismissal is quoted by multiple providers.
5. Jensen Huang's "I Think We've Achieved AGI" With Immediate Qualification
Finding: On Lex Fridman Podcast #494, Nvidia CEO Jensen Huang declared AGI achieved, then immediately qualified it: AI could build a billion-dollar company temporarily but has "zero percent" chance of building Nvidia.
Providers agreeing: Anthropic, Gemini, OpenAI, Grok-Premium, Perplexity, Grok, OpenAI-Mini — 7 of 8 providers
Confidence: HIGH
The quote is consistent across providers. Multiple providers independently identify this as the quintessential example of declaring victory while simultaneously moving the goalposts — in a single sentence.
6. Sam Altman's "AGI Kinda Went Whooshing By"
Finding: OpenAI CEO Sam Altman stated that AGI may have already arrived without fanfare, suggesting the transition happened without a dramatic moment.
Providers agreeing: Anthropic, Gemini, OpenAI, Grok-Premium, Perplexity, Grok, OpenAI-Mini — 7 of 8 providers
Confidence: HIGH
The quote is consistent. Multiple providers note the irony: the person leading the company that defined the field is saying the definition became incoherent.
7. OpenAI's Microsoft Contract Contains an AGI Trigger Clause
Finding: OpenAI's agreement with Microsoft includes a clause that, if OpenAI formally declares AGI achieved, Microsoft's IP licensing rights are affected — creating a financial incentive to delay or avoid formal AGI declarations.
Providers agreeing: Gemini, OpenAI, Grok-Premium, Perplexity — 4 of 8 providers
Confidence: MEDIUM (confirmed by multiple providers citing Wired reporting, but not all providers addressed it)
This is one of the most consequential structural findings: a major AI lab has a contractual reason to never formally declare AGI. The Wired article is the primary source cited.
8. Meta's Deliberate Use of "AMI" Instead of "AGI"
Finding: Meta and Yann LeCun deliberately adopted the term "AMI" (Advanced/Artificial Machine Intelligence) partly to avoid regulatory scrutiny associated with "AGI."
Providers agreeing: Anthropic, Gemini, Grok-Premium, Perplexity, Grok, OpenAI-Mini — 6 of 8 providers
Confidence: MEDIUM-HIGH
The regulatory motivation is confirmed by multiple providers. The exact expansion of "AMI" varies slightly (some say "Advanced Machine Intelligence," others "Artificial Machine Intelligence"), but the strategic intent is consistent.
9. Herbert Simon (1965) and Marvin Minsky (1967/1970) Predictions Were Spectacularly Wrong
Finding: Simon predicted machines would do any human work within 20 years (by 1985); Minsky predicted general human-level intelligence in 3-8 years (by 1978). Both were off by 40-50+ years.
Providers agreeing: Anthropic, Gemini, OpenAI, Grok-Premium, Perplexity, Grok, OpenAI-Mini, Gemini-Lite — all 8 providers
Confidence: HIGH
The specific quotes are consistent across all providers. These are the foundational data points for the "predictions graveyard" and are universally cited.
10. The Tombstone Pattern: Every Benchmark Gets Reclassified Upon Achievement
Finding: Chess (1997), Go (2016), protein folding (2020), bar exam (2023), Turing Test (2025) — each was the definitive test of intelligence until it was passed, then immediately declared insufficient.
Providers agreeing: All 8 providers
Confidence: HIGH
The specific list of fallen benchmarks is consistent across all providers, with minor variations in ordering and emphasis.
Unique Insights by Provider
Anthropic
- The "AI Effect" has a named originator: Larry Tesler's formulation — "Intelligence is whatever machines haven't done yet" — is cited as the canonical statement of the phenomenon. This precise attribution appears most clearly in the Anthropic report and gives the pattern a quotable, citable origin.
- The LeCun/Hassabis debate framing: Anthropic provides the most precise characterization of Hassabis's counter-argument — that LeCun was "confusing general intelligence with universal intelligence" — which is a substantively different philosophical claim than other providers capture.
- The closing paradox formulation: "AGI is the only scientific benchmark that gets reclassified as 'not real intelligence' the moment it's achieved — which tells you it was never a scientific benchmark at all, but a mirror we hold up to protect the one thing we're not ready to share: the idea that we're special." This is the most philosophically complete version of the closing argument.
Gemini
- The Microsoft "Sparks of AGI" paper's Gary Marcus rebuttal: Gemini is the only provider to include Gary Marcus's specific dismissal of the Microsoft paper as "a press release masquerading as science" — a quote that perfectly illustrates the reclassification dynamic applied to academic papers themselves.
- The GPT-4.5 Turing Test result with specific statistics: Gemini provides the most precise data — GPT-4.5 was judged human 73% of the time, versus 67% for the actual human participants — meaning the AI was judged human more often than the humans were. This specific statistic is the most striking data point in the Turing Test section.
- The "10 times bigger than the Industrial Revolution, 10 times faster" Hassabis quote: This specific framing of AGI's potential impact appears only in the Gemini report and provides useful calibration for Hassabis's actual position.
OpenAI
- The most comprehensive Predictions Graveyard: The OpenAI report includes Ken Jennings's post-Jeopardy quip ("I for one welcome our new computer overlords"), Garry Kasparov's 1996 prediction that no computer would beat a top chess champion before 2010 (beaten in 1997), and Henry Kissinger's 2021 commentary — entries that appear in no other provider's table.
- Kasparov's specific pre-Deep Blue prediction: "No computer will beat a top human chess champion before 2010" — made in 1996, violated in 1997 — is documented with the Deseret News citation and represents the sharpest single-year prediction failure in the dataset.
- The "programmable alarm clock" quote attribution: Kasparov's dismissal of Deep Blue as "intelligent the way your programmable alarm clock is intelligent" is sourced to Goodreads with citation, making it the most precisely attributed dismissal quote in the dataset.
Grok-Premium
- The Physical Symbol System Hypothesis as the 1976 paradigm: Grok-Premium is the most precise in naming Newell and Simon's specific theoretical framework — the claim that symbolic manipulation is "necessary and sufficient for intelligence" — which is the actual intellectual foundation of 1976-era AI, not just a description of it.
- The clearest articulation of the LeCun/Hassabis philosophical distinction: Grok-Premium frames LeCun's position as requiring "world models" (understanding of physical reality) and "hierarchical planning" as prerequisites for AGI — the most technically precise description of what LeCun actually means by AMI vs. AGI.
Perplexity
- The most comprehensive "Never List" with specific expert quotes: Perplexity provides the largest table of "never" predictions with the most specific attributions, including Noam Chomsky's specific 2022-2023 claim that LLMs "will never achieve genuine understanding of meaning because their architecture doesn't support compositional semantics" — a quote that appears in no other provider's report.
- The "Reclassification Playbook" as a named, structured phenomenon with six distinct scripts: Perplexity is the only provider to systematically categorize the dismissal language into distinct rhetorical moves (brute force, pattern matching, narrow, doesn't understand, statistics/curve-fitting, no credit for scale, "that's not what I meant"), providing the most analytically complete taxonomy.
- The AGI(t) = AGI(t-5 years) + 1 achievement formula: The mathematical formalization of the goalpost-shifting pattern — "where 1 achievement just got reclassified as 'not real AGI'" — is unique to Perplexity and is the most quotable compression of the entire 50-year pattern.
- The Yoshua Bengio flip-flop documentation: Perplexity is the only provider to document Bengio changing his position twice — from "deep learning will take us further than skeptics think" to "might not be sufficient for AGI" to "maybe it's sufficient after all" — which is a uniquely honest data point about expert uncertainty.
Grok
- The Sequoia Capital "2026: This is AGI" call: Grok is the only provider to include Sequoia Capital's January 2026 declaration that AGI had already arrived, providing a venture capital perspective on the definitional debate that is absent from other reports.
- The most concise closing paradox: "If every conquered AGI benchmark gets rebranded 'not real intelligence,' AGI isn't science — it's a finish line that runs faster than the sprinters." This is the most tweet-ready formulation of the closing argument.
OpenAI-Mini
- Bill Gates's 2023 coding prediction: "Programming will remain a 100% human profession, even 100 years from now" — cited with a Windows Central source — is a striking recent "never" prediction that appears only in this report and is particularly ironic given Microsoft's investment in GitHub Copilot.
- Geoffrey Hinton's 2016 radiology prediction: "People should stop training radiologists — within five years AI will do their job better" — with the Time magazine citation — is documented most clearly here and represents a case where a "never" prediction ran in the opposite direction (Hinton predicted AI would do it, not that it wouldn't).
Gemini-Lite
- The most concise summary of the core paradox: While less detailed than other providers, Gemini-Lite's framing — "AGI isn't a scientific finish line — it is a psychological horizon line, designed to recede forever so that we never have to admit we have built our own replacements" — is the most psychologically precise version of the closing argument, emphasizing the human motivation for goalpost-shifting rather than the institutional one.
Contradictions and Disagreements
Contradiction 1: Who Coined "AGI" and When?
Position A (OpenAI, citing Wired): Mark Gubrud coined "AGI" in 1997; Legg and Goertzel popularized it in 2002-2006.
Position B (Anthropic, Grok-Premium, Grok): Shane Legg coined it in conversation with Ben Goertzel, with Peter Voss also credited, around 2002-2006.
Position C (Gemini): Shane Legg coined it, "in conversation with Ben Goertzel about Goertzel's upcoming compilation of essays."
Assessment: The Wired article (cited by OpenAI and others) appears to be the primary source, and the Gubrud 1997 origin is the most historically precise claim. The Legg/Goertzel/Voss attribution likely refers to the popularization and formalization of the term rather than its coinage. Do not treat any single provider's attribution as definitive without consulting the Wired primary source directly.
Contradiction 2: Has the Turing Test Been "Passed"?
Position A (Gemini, Anthropic): GPT-4.5 passed a "standard three-party Turing test" in 2025, being judged human 73% of the time vs. humans at 67%.
Position B (OpenAI): A "modified Turing Test was 'decisively' passed by GPT-4.5" in 2025, with the caveat "(Debate continues, but it happened under credible conditions)."
Position C (Perplexity): The Turing Test was "largely passed in casual settings by 2020s LLMs" — a much earlier and less precise claim.
Position D (Multiple providers, citing critics): The Turing Test result "measures our gullibility more than a rigorous standard of intelligence" — the immediate reclassification.
Assessment: The 2025 GPT-4.5 result appears to be a real study (multiple providers cite it with consistent statistics), but the "standard" vs. "modified" distinction matters enormously for the claim's validity. The specific study should be verified independently. The immediate critical response is itself well-documented and consistent.
Contradiction 3: The Nature Paper on AGI Having Arrived
Position A (Gemini, Anthropic, OpenAI-Mini): A Nature paper argued AGI has arrived, attributed to UC San Diego researchers (Gemini) or described as a "peer-reviewed paper in Nature by Eddy Keming Chen and colleagues" (Anthropic).
Position B (Perplexity): Does not mention a Nature paper; instead describes a general academic debate.
Position C (Grok): Does not mention a Nature paper specifically.
Assessment: This is a significant uncertainty. The Nature paper is mentioned by only 3-4 providers, with varying attribution. The specific paper by "Eddy Keming Chen and colleagues" should be verified independently before citing. It is possible this refers to a preprint, a commentary, or a paper that was described to the AI systems in training data in ways that introduced errors. Treat this claim as MEDIUM confidence pending verification.
Contradiction 4: Andrew Ng's 2015 Position
Position A (Anthropic): Ng said worrying about AGI is "like worrying about overpopulation on Mars" — a dismissal of AGI concern.
Position B (OpenAI): Same quote, same framing — Ng dismissed AGI as irrelevant.
Position C (Perplexity): Ng "focused on 'AI as the new electricity'; viewed full AGI as further off, not imminent" — a softer characterization.
Position D (Gemini-Lite): "AI will be able to do any task a human can do... maybe in 100 years" — attributed to Ng in 2015, which contradicts the "Mars" quote's dismissive tone.
Assessment: The "overpopulation on Mars" quote is well-sourced across multiple providers and is likely accurate. The "100 years" attribution in Gemini-Lite appears to be an error or conflation with a different speaker. Use the "Mars" quote; treat the "100 years" attribution with skepticism.
Contradiction 5: Elon Musk's AGI Timeline Predictions
Position A (Anthropic): Musk predicted AGI "by 2025" (made in 2020).
Position B (OpenAI): Musk predicted human-level AI "by 2025" (made in 2014), then pushed to 2026 in late 2025.
Position C (Grok-Premium): Musk made "AGI by 2025-2026" predictions across 2010s-2020s.
Position D (Morocco World News, cited by OpenAI): Musk "confidently said" AGI by 2025 in 2024, then pushed to 2026.
Assessment: Musk has made multiple AGI predictions across many years, and the specific year-of-prediction vs. year-predicted varies by source. The consistent pattern — aggressive timelines that slip by exactly one year — is well-documented. The specific dates are less reliable than the pattern itself.
Contradiction 6: DeepMind's AGI Level Framework — How Many Levels?
Position A (Anthropic, Gemini): DeepMind's framework has five performance levels: emerging, competent, expert, virtuoso, superhuman.
Position B (Grok): DeepMind's framework has "6 levels: Emerging to Superhuman generality/autonomy."
Position C (OpenAI-Mini): DeepMind's framework ranges "from competent at specific tasks to full science-level invention."
Assessment: The arxiv paper (2311.02462) is cited by multiple providers and is the primary source, and both counts are defensible readings of it: the performance axis has five AI levels (Emerging, Competent, Expert, Virtuoso, Superhuman) above a "Level 0: No AI" baseline — six levels if the baseline is counted — while generality and autonomy are separate dimensions of the framework. Consult the arxiv paper directly for the authoritative version.
Detailed Synthesis
Part I: The Pattern That Predates the Term
Before "AGI" existed as a phrase, the concept existed as a promise. In 1965, Herbert Simon — a Nobel laureate, not a crank — predicted that "machines will be capable, within twenty years, of doing any work a man can do" [all 8 providers]. In 1970, Marvin Minsky told Life magazine that "in from three to eight years we will have a machine with the general intelligence of an average human being" [all 8 providers]. These weren't fringe predictions. They were the considered views of the field's founders, made with the confidence of people who had just invented the discipline.
They were wrong by approximately 50 years. And the wrongness followed a specific pattern that would repeat, with remarkable fidelity, for the next half-century.
The pattern works like this: researchers identify a capability that seems to require genuine intelligence — chess, Go, protein folding, legal reasoning, creative writing. They declare it impossible or decades away. Machines achieve it. Researchers immediately explain why it doesn't count. The definition of "real intelligence" upgrades to exclude the new capability. Repeat.
This phenomenon has a name. Larry Tesler, the computer scientist who invented cut-and-paste, formulated it as: "Intelligence is whatever machines haven't done yet" [Anthropic]. Pamela McCorduck documented it as a recurring feature of AI research history. Douglas Hofstadter described it as a pattern where "each time AI reaches a formerly uniquely-human ability, we declare that ability non-essential and move the goalposts" [OpenAI, citing Yahoo Tech]. The academic literature calls it the "AI Effect" [Gemini, Grok-Premium, multiple others].
What makes the 50-year record remarkable is not that the predictions were wrong — prediction is hard — but that the dismissal language is verbatim-identical across decades. The same phrases appear in 1997 (chess), 2016 (Go), 2020 (protein folding), 2023 (bar exam), and 2025 (Turing Test): "just brute force," "just pattern matching," "narrow AI," "doesn't really understand" [Perplexity, Grok-Premium, OpenAI, all others]. When you see the identical script applied to chess and the Turing Test, separated by 28 years, the pattern stops being a coincidence and starts being a structure.
Part II: The Birth of "AGI" and Why It Was Embarrassing
By the early 2000s, the AI field had survived two "AI winters" — funding collapses triggered by the gap between promises and results. The professional culture had adapted: you focused on narrow, measurable problems. You didn't talk about "general intelligence." That was for science fiction.
Into this environment, Shane Legg and Ben Goertzel introduced the term "Artificial General Intelligence" [Anthropic, Gemini, OpenAI, Grok-Premium, Perplexity, Grok, OpenAI-Mini]. The "G" was deliberate — it was meant to distinguish the ambitious goal from the narrow AI that had become respectable. Legg later recalled the conversation: "Don't call it real AI — that's a big screw-you to the whole field. If you want to write about machines that have general intelligence, you should add the word general" [OpenAI, citing Wired].
The term stuck, but the stigma didn't immediately lift. Discussing AGI at academic conferences in 2006 was, as multiple providers describe it, "professionally embarrassing" [Gemini, Grok-Premium, Anthropic]. The field had been burned by overconfidence before. The safe move was narrow AI.
This context matters because it explains why the definition of AGI was always contested from the start. It wasn't coined by a committee that agreed on what it meant. It was coined by a small group of researchers who wanted to rehabilitate an ambitious goal, in a field that had learned to distrust ambition. The definitional instability that would plague the concept for the next 20 years was baked in from the beginning.
Part III: AlphaGo and the Speed of Reclassification
March 2016. DeepMind's AlphaGo defeats Lee Sedol, one of the greatest Go players of the modern era, 4-1. Go had been the canonical "impossible" game — its branching factor made brute-force search computationally infeasible, and experts had argued it required something like human intuition. The achievement was supposed to be decades away.
The reclassification began almost immediately [all 8 providers]. Within days, the dominant narrative had shifted: AlphaGo was "narrow AI," it was "just pattern matching combined with tree search," it couldn't do anything except play Go. Yann LeCun noted that if you changed the board size from 19×19 to 29×29, it would be "utterly lost" [OpenAI, citing TechCrunch]. Rodney Brooks called it "more about training algorithms and using brute-force computational strength than any real intelligence" [OpenAI].
These criticisms weren't entirely wrong — AlphaGo was specialized, and it couldn't generalize to other domains. But the speed and completeness of the reclassification was striking. A capability that had been held up as proof of human cognitive uniqueness became, within weeks of being achieved, evidence of AI's limitations.
The key voices of the 2016 era were already diverging on what AGI would actually require. Demis Hassabis positioned AlphaGo as "a step toward generality" [Grok-Premium]. Yann LeCun emphasized the need for "world models" and unsupervised learning [Gemini, Grok-Premium]. Geoffrey Hinton argued that AI still lacked "true common sense" [OpenAI]. Elon Musk warned of existential risk while simultaneously insisting current AI was "only specialized" [Anthropic, Grok-Premium]. The field was already fragmenting into incompatible definitions of what the finish line looked like.
Part IV: GPT-3 and the Economic Pivot
The release of GPT-3 in 2020 and its cultural impact through 2021 represented the most significant definitional shift in the history of the concept [all 8 providers]. A single model — trained on text, producing text — could write essays, debug code, translate languages, summarize legal documents, and answer questions across domains. It wasn't specialized. It was, in a meaningful sense, general.
This broke the existing dismissal framework. You couldn't say "it only plays chess." You couldn't say "it only does Go." GPT-3 did dozens of things, many of them at or near human level. The old definition of AGI — cognitive, philosophical, focused on "understanding" — was suddenly awkward.
OpenAI's answer was definitional. Their charter — published in 2018, before GPT-3 existed — had already formalized AGI as "highly autonomous systems that outperform humans at most economically valuable work" [all 8 providers], and the GPT-3 era is when this economic framing displaced cognitive ones as the field's default. It was a deliberate pivot from cognitive to economic benchmarks: it replaced the question "does it understand?" with the question "does it do the job?" The shift was pragmatic, commercially motivated, and enormously consequential — it made AGI measurable in a way that philosophical definitions never were.
Critics pushed back immediately. Emily Bender and colleagues dubbed systems like GPT-3 "stochastic parrots" — statistical engines predicting the next word without any genuine understanding of the world — and Gary Marcus pressed the same objection [Gemini, Perplexity, Grok]. Stuart Russell argued that the economic definition was "a mistake" that treated AGI as "just better labor" while ignoring what actually makes human intelligence general [Perplexity]. These weren't fringe objections — they came from serious researchers with serious arguments.
But the economic definition had momentum. It was measurable. It was fundable. It aligned with what investors and customers actually cared about. And it had the convenient property of making AGI seem simultaneously close (AI was already doing economically valuable work) and not-yet-achieved (it wasn't doing most economically valuable work). This sweet spot — close enough to justify investment, far enough to avoid regulatory triggers — would prove remarkably durable.
Part V: The Frameworks Era (2024) and the Industrialization of Goalpost-Shifting
By 2024, the definitional debate had been institutionalized. Major players published formal frameworks, creating the appearance of scientific rigor while actually encoding their commercial interests into the definition of intelligence itself.
OpenAI's internal five-level framework [Bloomberg, cited by Grok, Gemini, OpenAI, Grok-Premium]:
- Level 1: Chatbots (already achieved)
- Level 2: Reasoners — PhD-level problem solving
- Level 3: Agents — autonomous action
- Level 4: Innovators — AI that aids invention
- Level 5: Organizations — AI that does the work of an entire company
Google DeepMind's framework [arxiv 2311.02462, cited by multiple providers] introduced separate axes for performance (emerging → competent → expert → virtuoso → superhuman) and generality (narrow → general), allowing the company to claim progress on one axis while acknowledging limitations on another.
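Because the paper treats these as independent axes, a system is a coordinate in a grid, not a rung on a ladder — which is what lets "superhuman narrow" and "emerging general" coexist. A minimal sketch of that structure in Python (the level names follow the paper; the System class and label helper are our own illustration):

```python
from dataclasses import dataclass
from enum import IntEnum

class Performance(IntEnum):
    # Performance levels from arxiv 2311.02462; Level 0 ("No AI") omitted.
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    VIRTUOSO = 4
    SUPERHUMAN = 5

@dataclass
class System:
    name: str
    performance: Performance
    general: bool  # the paper's second axis: narrow vs. general

    def label(self) -> str:
        scope = "General" if self.general else "Narrow"
        return f"{self.performance.name.title()} {scope} AI"

# The same performance level can sit on either side of the generality axis:
print(System("AlphaFold", Performance.SUPERHUMAN, general=False).label())
# -> Superhuman Narrow AI
print(System("frontier chatbot", Performance.EMERGING, general=True).label())
# -> Emerging General AI
```

The paper's own example classifications place AlphaFold in the superhuman-narrow cell and 2023-era chatbots in the emerging-general cell — progress on one axis, acknowledged limitation on the other, exactly as the prose version says.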
Microsoft's "Sparks of AGI" paper [arxiv 2303.12712, cited by multiple providers] argued that GPT-4 showed "sparks" of general intelligence — a formulation that was immediately criticized by Gary Marcus as "a press release masquerading as science" [Gemini].
The frameworks served a structural purpose: they made it possible to claim progress without claiming arrival. You could be at "Level 2" or "Emerging AGI" — close enough to justify continued investment, far enough to avoid the legal, regulatory, and philosophical consequences of declaring AGI achieved. The OpenAI-Microsoft contract clause — which gives Microsoft reduced IP rights if OpenAI formally declares AGI achieved [Gemini, OpenAI, Grok-Premium, Perplexity] — made this not just strategically convenient but financially necessary.
Part VI: The Money Map — Definitions as Commercial Strategy
The most underappreciated dimension of the AGI definitional debate is that it is not primarily a scientific debate. It is a commercial negotiation conducted in the language of science [Perplexity, Gemini, Grok-Premium, Anthropic, Grok].
OpenAI prefers the economic definition because it makes AGI seem close (justifying valuation) while the Microsoft contract clause makes formal declaration dangerous (justifying delay). The result: perpetual "almost there" messaging that serves both fundraising and legal protection simultaneously [Gemini, OpenAI, Perplexity].
Google DeepMind prefers a leveled framework because it allows continuous progress claims without triggering a binary declaration. Hassabis's emphasis on scientific discovery as the AGI benchmark positions DeepMind as research-driven and world-positive, differentiating from OpenAI's commercial framing [Gemini, Grok-Premium].
Nvidia benefits from AGI being "almost here" forever. Jensen Huang's declaration that AGI has been achieved — immediately qualified by noting AI can't build Nvidia — is structurally perfect for his business: it validates the current AI investment cycle while implying there's still more hardware to sell [Gemini, Grok-Premium, Grok, Perplexity]. As Grok-Premium notes: "Nvidia benefits from the perception that AGI is just arriving, fueling perpetual infrastructure spending."
Meta benefits from AGI never being declared. Open-sourcing AI models (Llama) under an "AGI" label could trigger regulatory responses treating them as weapons-grade technology. By rebranding as "AMI" and having LeCun publicly argue that current systems aren't close to AGI, Meta creates regulatory distance while continuing to advance its AI capabilities [Gemini, Anthropic, Grok-Premium, Perplexity, Grok].
Anthropic benefits from AGI being "close but dangerous." Their safety-focused positioning requires AGI to be both imminent enough to justify safety research funding and dangerous enough to justify their cautious approach. Dario Amodei's aggressive timelines (AGI in 1-3 years) combined with safety emphasis is the optimal commercial narrative for their market position [Gemini, Perplexity, Grok-Premium].
The meta-pattern: every major player has a financial incentive to define AGI in a way that keeps it perpetually 5-10 years away. If AGI is declared tomorrow, contract clauses fire, regulators move in, and sky-high valuations must suddenly be defended. If AGI is 50 years away, the investment thesis collapses. The sweet spot — "very close, but not yet" — is permanent and profitable.
Part VII: The LeCun/Hassabis Schism and the AMI Gambit
The September 2025 public debate between Yann LeCun and Demis Hassabis represents the most philosophically substantive disagreement in the recent history of the field [Anthropic, Gemini, Grok-Premium, Grok, Perplexity].
LeCun's position: current LLMs are hitting a "reasoning wall" because they lack grounded "world models" — an understanding of physical reality that humans acquire through embodied experience. He called the concept of "general intelligence" as applied to current systems "complete BS" [Grok-Premium, citing The Decoder]. His AMI framework is not just a rebranding exercise — it reflects a genuine technical argument that the architecture of current systems is fundamentally insufficient for what AGI requires.
Hassabis's counter: LeCun is "confusing general intelligence with universal intelligence" [Anthropic]. Current systems do demonstrate genuine generality — the ability to perform across domains — even if they don't demonstrate universal intelligence across all possible tasks. Elon Musk sided with Hassabis in this exchange [Grok-Premium].
The debate matters because it represents a genuine fork in the road: one path (LeCun/Meta) holds that current AI architectures are a dead end for AGI and that fundamentally new approaches (world models, hierarchical planning, energy-based models) are required; the other path (Hassabis/OpenAI) holds that current architectures, scaled and refined, are on a trajectory toward genuine generality.
Both positions have commercial implications. LeCun's "AMI" framing serves Meta's regulatory interests. Hassabis's "we're on track" framing serves DeepMind's research funding narrative. Neither position is purely scientific.
Part VIII: The Whoosh and the Qualification
Late 2025 and early 2026 brought two moments that crystallized 50 years of definitional drift.
Sam Altman's "AGI kinda went whooshing by" [all 8 providers] was simultaneously an admission of definitional failure and a pivot to the next goalpost. If AGI "went whooshing by" without anyone noticing, it means the definition was never precise enough to generate a recognizable moment of achievement. Altman's proposed solution — move on to defining superintelligence — is the definitional ratchet in its purest form: declare the old goalpost passed, install a new one further out.
Jensen Huang's declaration on Lex Fridman Podcast #494 — "I think we've achieved AGI" — followed by the immediate qualification that AI has "zero percent" chance of building Nvidia [all 8 providers] — is the most honest moment in the 50-year history of the concept, precisely because it makes the pattern explicit. Huang is simultaneously declaring victory and defining victory in a way that excludes his own company's achievements. The goalpost is moved in the same breath as the declaration.
The Perplexity report's formulation captures this perfectly: "AGI(t) = AGI(t-5 years) + 1 achievement, where 1 achievement just got reclassified as 'not real AGI.'" The ratchet is mathematical.
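Spelled out, the formula is a recurrence with a side condition. A minimal LaTeX rendering (the set notation is ours; the sources state only the informal version):

```latex
% A(t): capabilities machines have demonstrated by year t.
% D(t): capabilities "real AGI" is defined to require at year t.
\[
  D(t) \;=\; D(t-5) \,\cup\, \{c_t\}, \qquad c_t \notin A(t),
\]
% The definition absorbs one new requirement per cycle, always drawn from
% what machines have not yet done -- so D(t) \not\subseteq A(t) holds at
% every t, and "AGI achieved" is unreachable by construction.
```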
Part IX: The Scorecard — We Keep Passing Old Bars
The most visually striking finding across all providers is the AGI Scorecard: when you take each era's own definition of AGI and score today's AI systems against it, the pattern is unmistakable.
By 1976 criteria (Turing Test, expert-level reasoning, natural language): Pass — GPT-4.5 passes the Turing Test, current models handle natural language at human level.
By 2006 criteria (cross-domain learning, transfer learning, general-purpose architecture): Pass — modern foundation models do exactly this.
By 2016 criteria (unsupervised learning, transfer across unlike domains): Pass — self-supervised learning is the dominant paradigm; GPT-4 transfers across text, code, images, and reasoning.
By 2021 criteria (outperform humans at economically valuable work): Partial — AI outperforms humans in many knowledge work tasks; not yet "most" tasks.
By 2024 criteria (Level 2 Reasoners, multimodal, agentic): Partial — o1/o3 models demonstrate PhD-level reasoning; agents are emerging but unreliable.
By 2026 criteria (whatever the current frontier definition is): Fail — the definition has upgraded to exclude current capabilities.
The visual pattern, as multiple providers note, is "impossible to miss": we keep passing old bars and raising new ones. The only bar we never pass is the current one — because the current one is defined as whatever we haven't done yet.
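The scorecard is mechanical enough to run as code. A minimal sketch in Python (verdicts abridged from the Scorecard table in Section F, with Searle's unfalsifiable 1980 criterion omitted; the Bar type and tally helper are our own):

```python
from collections import Counter
from dataclasses import dataclass

PASS, PARTIAL, FAIL = "PASS", "PARTIAL", "FAIL"

@dataclass
class Bar:
    era: int         # year the criterion was set
    criterion: str   # that era's definition of AGI
    verdict: str     # today's systems, scored against it

# Abridged transcription of the AGI Scorecard (Section F).
bars = [
    Bar(1950, "Imitation Game",                      PASS),
    Bar(1965, "any work a man can do",               PARTIAL),
    Bar(1976, "expert reasoning + common sense",     PARTIAL),
    Bar(2006, "complex goals, wide environments",    PASS),
    Bar(2018, "most economically valuable work",     PARTIAL),
    Bar(2023, "outperform 90% of skilled adults",    PASS),
    Bar(2025, "world models / autonomous discovery", FAIL),
]

def tally(bars, now=2026, lag=5):
    """Compare verdicts on old bars (set > lag years ago) vs. current ones."""
    old = Counter(b.verdict for b in bars if now - b.era > lag)
    new = Counter(b.verdict for b in bars if now - b.era <= lag)
    return old, new

old, new = tally(bars)
print("old bars:    ", dict(old))  # {'PASS': 2, 'PARTIAL': 3} -- no FAIL
print("current bars:", dict(new))  # {'PASS': 1, 'FAIL': 1}
```

On this transcription the old bars contain no outright failure; the only FAIL verdict attaches to a bar set within the last five years.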
The Complete Tables
B. THE PREDICTIONS GRAVEYARD
Synthesized from all 8 providers. Quotes are from primary sources as cited.
| Who | When Said | What They Predicted | What Actually Happened | How Wrong |
|---|---|---|---|---|
| Herbert Simon | 1957 | Computer chess champion within 10 years (by 1967) | Deep Blue beat Kasparov in 1997 | 30 years late |
| Herbert Simon | 1965 | "Machines will be capable, within twenty years, of doing any work a man can do" (by 1985) | By 1985, AI Winter; still debated in 2026 | 40+ years late and counting |
| Marvin Minsky | 1967 | "Within a generation… the problem of creating AI will substantially be solved" (by ~1992) | AI Winter followed; nothing close | Off by 30+ years minimum |
| Marvin Minsky | 1970 | "In from three to eight years we will have a machine with the general intelligence of an average human being" (by 1978) | Nothing close by 1978 | Off by 48+ years minimum |
| I.J. Good | 1965 | "Ultraintelligent machine" built "within the twentieth century" | Did not happen | Off by 26+ years |
| Garry Kasparov | 1996 | "No computer will beat a top human chess champion before 2010" | Deep Blue beat Kasparov in 1997 | Off by 13 years in the wrong direction — happened far sooner |
| Ray Kurzweil | 2005 | AGI by 2029; Singularity by 2045 | 2029 is 3 years away; debate rages on whether AGI is already here | TBD — was considered radical, now mainstream |
| Ben Goertzel | 2007 | AGI within 10 years via multi-paradigm approaches (by 2017) | By 2017, deep learning winning; no emergent AGI | Off by at least 10+ years |
| Shane Legg | 2008/2011 | ~50% chance of human-level AGI by 2028 | Debate raging on whether it's already here | Possibly the least-wrong major prediction |
| Andrew Ng | 2015 | Worrying about AGI is "like worrying about overpopulation on Mars" | By 2025, AGI timelines are mainstream geopolitical concern | Aged extremely poorly |
| Geoffrey Hinton | 2016 | "Stop training radiologists — within five years AI will do their job better" (by 2021) | Hospitals still employ human radiologists; AI assists but hasn't replaced | Off by 5+ years on replacement, and counting |
| Geoffrey Hinton | 2015-2022 | AGI is 30-50 years away | Left Google in 2023 to warn about AI; revised to 5-20 years | Admitted he was wrong; revised dramatically |
| Yoshua Bengio | 2017 | "Deep learning will take us much further than skeptics think — perhaps to AGI" | 2023: "Might not be sufficient for AGI, need new paradigms." 2024: "Maybe sufficient after all" | Flip-flopped twice; reflects genuine uncertainty |
| Elon Musk | 2014 | Human-level AI by 2025 | Still debated in 2026; Musk pushed to 2026 | Off by at least 1-2 years, pattern of annual slippage |
| Elon Musk | 2022 | "AGI by 2025, definitely" | 2025 came; no consensus AGI; Musk pushed to 2026 | Off by at least 1 year; recurring pattern |
| Elon Musk | 2025 | AGI "possibly as early as this year" (2026) | We are living in it; definition-dependent | TBD |
| Sam Altman | 2019 | AGI likely within a decade (by 2029) | By 2025, Altman said "we built AGIs" and "AGI kinda went whooshing by" | Declared victory early; definition shifted |
| Sam Altman | 2023 | "GPT-4 is not AGI. True AGI will require breakthroughs we haven't had yet" | 2025: "AGI kinda went whooshing by" | Same person, opposite conclusion, no new breakthroughs claimed — definition changed |
| Demis Hassabis | 2016 | AGI in 5-10 years (by 2026) | Still saying "5-10 years" in 2025 | Perpetually 5-10 years away |
| Demis Hassabis | 2025 | ~50% chance AGI by 2030; "one or two major breakthroughs" needed | Ongoing | Moderate timeline; not yet falsifiable |
| Dario Amodei | 2025 | AGI in 1-3 years (by 2026-2028); 90% confident within a decade | Ongoing; Claude models at expert level in many domains | Aggressive; jury still out |
| Jensen Huang | March 2026 | "I think we've achieved AGI" — qualified as: AI can build a billion-dollar company temporarily but has "zero percent" chance of building Nvidia | We are living in it | Declared victory while simultaneously moving the goalpost in the same sentence |
| Sequoia Capital | January 2026 | "2026: This is AGI — stop waiting, it's already here" | We are living in it | Bold VC call; definition-dependent |
| Bill Gates | 2023 | "Programming will remain a 100% human profession, even 100 years from now" | AI coding tools write production code routinely by 2024-2026 | Off by approximately 97 years |
| Ken Jennings | 2011 | "I for one welcome our new computer overlords" (after losing Jeopardy to Watson) | Watson flopped in medical applications; Jennings still employed | Ironic self-deprecation that overestimated Watson |
C. THE NEVER LIST
Every capability credible experts publicly said machines would "never" achieve — synthesized from all 8 providers.
| Capability | Who Said "Never" (or "Decades Away") | When | When Machines Did It | Time to Eat Words |
|---|---|---|---|---|
| Arithmetic / Calculation | Skeptics of early computers | 1940s-1950s | ENIAC, 1945 | Immediate |
| Chess at grandmaster level | International chess masters; Kasparov (1996): "No computer before 2010" | 1970s-1996 | Deep Blue beats Kasparov, 1997 | 1 year (Kasparov); ~20 years (general consensus) |
| Go | Virtually all AI researchers: "100-200 years away" | Pre-2016 | AlphaGo beats Lee Sedol, March 2016 | 2 years after "decades away" consensus |
| Protein folding | Structural biologists: "Would take decades more" | Pre-2020 | AlphaFold, 2020 | ~2 years after "decades" claim |
| Speech recognition (reliable) | Speech researchers: "30+ years" | 1990s | Human-level error rates, 2017 | ~20 years |
| Image recognition at human level | Computer vision researchers | 1969-2010 | ResNet exceeds human accuracy on ImageNet, 2015 | ~5 years after "decades" claims |
| Natural language translation | "AI-complete problem; 50+ years" | 1960s-2010 | Google Neural MT, 2016; GPT models, 2020+ | ~6 years after "decades" claims |
| Passing standardized tests (SAT, GRE) | Educational testing researchers | 2010s | GPT-3 passes SAT reading ~90th percentile, 2020 | ~5 years |
| Passing the bar exam | Lawyers: "Requires genuine legal reasoning" | Pre-2023 | GPT-4 passes at 90th percentile, 2023 | ~Immediate |
| Medical diagnosis at specialist level | Doctors: "Requires empathy, clinical judgment" | 1990s-2010s | AI exceeds dermatologists/radiologists on specific tasks, 2018-2024 | ~5-10 years |
| Creative writing | Writers/critics: "Requires consciousness, lived experience" | Perennial | GPT-3/4 write publishable fiction, 2020-2023 | N/A — always "never" |
| Coding / programming | Bill Gates (2023): "100% human profession for 100 years" | 2020-2023 | Copilot/GPT-4/Claude write production code, 2022-2024 | ~1-2 years |
| Mathematical theorem proving | Mathematicians: "Requires genuine insight" | Pre-2023 | AlphaProof, AI-assisted proofs, 2023-2024 | ~5 years |
| Turing Test (conversational deception) | "Decades away if ever" | 2010s | GPT-4.5 passes at 73% (vs. humans at 67%), 2025 | ~10 years |
| Superhuman video game play | Game AI researchers | Pre-2015 | DQN beats Atari, 2015; AlphaStar beats StarCraft pros, 2019 | ~5 years |
| Understanding humor | Cognitive scientists: "30+ years" | Pre-2020 | GPT-3/4 generate contextually appropriate humor, 2020+ | ~5 years |
| Generating photorealistic images from text | Computer vision researchers: "20+ years" | 2019-2020 | DALL-E 3, Midjourney, 2022-2023 | ~2-3 years |
| Genuine understanding of meaning (compositional semantics) | Noam Chomsky: LLMs will "never achieve genuine understanding" | 2022-2023 | GPT-4 passes bar exam, Turing Test, medical licensing | ~1-2 years |
D. THE RECLASSIFICATION PLAYBOOK
The same dismissal language, recycled verbatim across decades. Synthesized from all 8 providers, most comprehensively from Perplexity.
| Dismissal Phrase | Chess (1997) | Go (2016) | GPT-3/4 (2020-2023) | Turing Test (2025) | Protein Folding (2020) |
|---|---|---|---|---|---|
| "Just brute force" | ✅ "It just searches millions of positions" | ✅ "Monte Carlo tree search + brute force" | ✅ "Brute force over training data" | — | ✅ "Brute force optimization" |
| "Just pattern matching" | ✅ "Pattern recognition, not chess understanding" | ✅ "Statistical pattern matching" | ✅ "Stochastic parrot / next-token prediction" | ✅ "Mimicking conversational patterns" | ✅ "Pattern matching on evolutionary data" |
| "Narrow AI" | ✅ "It can only play chess" | ✅ "It can only play Go" | ✅ "Still narrow — just text" | ✅ "Narrow conversational skill" | ✅ "Narrow biological tool" |
| "Doesn't really understand" | ✅ "No understanding of chess" | ✅ "No understanding of the game" | ✅ "No understanding of meaning" (Chomsky, Marcus) | ✅ "No subjective experience or understanding" | ✅ "Doesn't understand proteins" |
| "Just a parlor trick" | ✅ "Impressive but not intelligence" | ✅ "Impressive but narrow" | ✅ "Press release masquerading as science" (Marcus on Sparks paper) | ✅ "Measures gullibility, not intelligence" | ✅ "Clever trick, not science" |
| "A real intelligence would…" | "…generalize beyond chess" | "…transfer to other domains" | "…not hallucinate / have grounded semantics" | "…be self-aware / conscious" | "…understand biology, not just predict" |
| "No credit for scale" | — | ✅ "Played itself millions of times" | ✅ "Just trained on the whole internet" | — | — |
| "Lacks common sense" | ✅ "Can't tie its shoes" | ✅ "Can't do anything else" | ✅ "Fails basic sanity checks" | ✅ "No common sense about the world" | — |
The meta-pattern: When the specific dismissal fails (because the AI does generalize, does transfer, does handle multiple domains), the dismissal upgrades to the next level. The script is not static — it evolves to always stay one step ahead of the capability.
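The meta-pattern reads like a fallback chain: each dismissal applies until the capability falsifies it, and then the script advances. A toy sketch in Python (the phrases come from the table above; the capability flags and the dismiss mechanic are our own illustration):

```python
# Section D's script as an ordered fallback chain. Each entry pairs a
# dismissal phrase with the capability flag that keeps it applicable.
SCRIPT = [
    ("just brute force",             "searches_exhaustively"),
    ("just pattern matching",        "only_interpolates"),
    ("narrow AI",                    "single_domain"),
    ("doesn't really understand",    "lacks_grounding"),
    ("a real intelligence would...", None),  # unfalsifiable terminus
]

def dismiss(capability: dict) -> str:
    """Return the first dismissal the capability has not yet falsified.
    Unknown flags default to True: a dismissal sticks until disproven."""
    for phrase, flag in SCRIPT:
        if flag is None or capability.get(flag, True):
            return phrase
    raise AssertionError("unreachable: the last entry always applies")

alphago = {"single_domain": True}        # earlier phrases still apply by default
gpt4 = {"searches_exhaustively": False,  # not brute force
        "only_interpolates": False,      # transfers across domains
        "single_domain": False}          # text, code, images, reasoning
print(dismiss(alphago))  # -> just brute force
print(dismiss(gpt4))     # -> doesn't really understand
```

The design choice mirrors the table: no capability ever escapes the chain, because the final entry is defined so that nothing existing can satisfy it.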
E. THE TOMBSTONE LIST
A memorial wall of moved goalposts. Synthesized from all 8 providers.
⚰️ Arithmetic (~1950s): "That's just calculation, not thinking." — RIP
⚰️ Chess (1997): "Deep Blue was intelligent the way your programmable alarm clock is intelligent." — Kasparov — RIP
⚰️ Jeopardy! (2011): "It doesn't understand the questions — it's basically a text-fetcher." — RIP
⚰️ Image Recognition (2015): "It's just curve-fitting on pixels. It doesn't understand what an object is." — RIP
⚰️ Go (2016): "It's just narrow reinforcement learning. It can't do anything else." — RIP
⚰️ Superhuman Video Games (2015-2019): "Reinforcement learning is different from real reasoning." — RIP
⚰️ Protein Folding (2020): "It's a biological calculator, not scientific understanding." — RIP
⚰️ Creative Writing (2022): "It's just remixing training data. No real creativity — it has no soul." — RIP
⚰️ Coding (2022-2024): "It's just autocomplete on steroids." — RIP
⚰️ Bar Exam / Medical Licensing (2023): "It memorized the answers. Doesn't mean it can practice law." — RIP
⚰️ Mathematical Olympiad (2024): "It's pattern matching, not mathematical insight." — RIP
⚰️ The Turing Test (2025): "We were wrong to use human deception as a benchmark. It measures our gullibility." — RIP
⚰️ ARC-AGI Benchmark (2025): "The test wasn't actually testing general intelligence." — RIP
⚰️ "Economically Valuable Work" (2025-2026): "A Tamagotchi app isn't real economic value. A real AGI would build Nvidia." — Dying
F. THE AGI SCORECARD
How do today's AI systems (March 2026) score against each era's own definition? Synthesized from all 8 providers.
| Era | Their Definition of AGI | Today's AI Score | Verdict |
|---|---|---|---|
| Turing (1950) | Imitation Game: fool a human in unrestricted text conversation | GPT-4.5 passes at 73% (humans: 67%) | ✅ PASS |
| Simon (1965) | "Doing any work a man can do" | Cognitive work: largely yes. Physical labor: no | 🟡 PARTIAL |
| Minsky (1970) | "Read Shakespeare, grease a car, play office politics, tell a joke" | Shakespeare ✅, jokes ✅, office politics ✅, grease a car ❌ | 🟡 PARTIAL (3/4) |
| Symbolic AI Era (1976) | Explicit logical reasoning, expert-level performance, common sense | Expert performance ✅, logical reasoning ✅, common sense 🟡 | 🟡 PARTIAL |
| Searle (1980) | "Strong AI" — machine has a mind and genuine understanding | No evidence of consciousness or genuine understanding | ❌ FAIL |
| Legg/Goertzel (2006) | "Achieve complex goals in a wide range of environments" | Increasingly capable with agents; broad domain performance | ✅ PASS (by most readings) |
| OpenAI Charter (2018) | "Outperform humans at most economically valuable work" | Outperforms in many knowledge tasks; not yet "most" | 🟡 PARTIAL |
| DeepMind Levels (2023) — Level 1 "Emerging" | Equal to unskilled human across wide range of tasks | Exceeds this in most cognitive domains | ✅ PASS |
| DeepMind Levels (2023) — Level 3 "Expert" | Outperform 90% of skilled adults across wide range | Exceeds 90th percentile in law, medicine, math, coding | ✅ PASS (arguably) |
| OpenAI 5 Levels — Level 2 "Reasoners" | PhD-level problem solving | o1/o3 models demonstrate this | ✅ PASS |
| OpenAI 5 Levels — Level 3 "Agents" | Autonomous multi-step action in the world | Emerging but unreliable; not yet robust | 🟡 PARTIAL |
| OpenAI 5 Levels — Level 5 "Organizations" | AI that can do the work of an entire organization | Not achieved | ❌ FAIL |
| LeCun (2025-2026) | World models, embodied understanding, hierarchical planning, physical causality | Not achieved; LLMs lack grounded physical understanding | ❌ FAIL |
| Hassabis (2025-2026) | Nobel-level scientific discovery; solving humanity's hardest problems autonomously | Not yet achieved autonomously | ❌ FAIL |
| Huang (March 2026) | AI that can build a billion-dollar company (even temporarily) | Plausible with current agent platforms | ✅ PASS (by his definition) |
| Huang's implicit new bar (March 2026) | AI that can build Nvidia — sustained innovation, hardware, culture | "Zero percent chance" | ❌ FAIL |
The pattern is hard to miss: apart from Searle's consciousness criterion — unfalsifiable by design — every bar set more than 5 years ago is now passed or partially passed. Every current bar is either partial or failed. The ratchet moves in one direction only.
G. THE MONEY MAP
Who benefits from which definition? Synthesized from all 8 providers.
| Company | Preferred AGI Definition | Why It Serves Them | If AGI "Achieved" | If AGI "Not Achieved" |
|---|---|---|---|---|
| OpenAI | Economic: "outperform humans at most economically valuable work" → now pivoting to "superintelligence" | Makes AGI seem close (justifies $300B+ valuation); "superintelligence" pivot keeps mission alive after AGI | Triggers Microsoft IP clause renegotiation; justifies valuation but invites regulation | Keeps investor urgency; justifies continued fundraising; avoids Microsoft clause |
| Google DeepMind | Leveled framework (Emerging → Superhuman); scientific discovery emphasis | Allows progress claims without binary declaration; positions as research-driven and responsible | Validates decades of research; justifies compute spending; potential regulatory scrutiny | Keeps the race going; justifies continued investment; maintains "responsible AI" brand |
| Nvidia | Fluid, economic: "whatever the benchmark is, we're approaching it" | Declaring AGI "almost here" forever justifies perpetual GPU demand | Massive chip demand justified; validates current infrastructure investment | Still massive chip demand (for the pursuit); Huang's "zero percent Nvidia" qualification ensures this |
| Meta / LeCun | "AMI" — explicitly NOT AGI; world models required | Dodges regulatory scrutiny; open-sourcing Llama without "AGI" label avoids weapons-grade classification risk | Regulatory nightmare: open-sourcing AGI could be classified as distributing dangerous technology | Validates LeCun's technical position; keeps Llama open-source without regulatory risk; differentiates from competitors |
| Anthropic | Safety-focused; "close but dangerous"; alignment required for true AGI | Positions as responsible player; attracts safety-conscious enterprise and government customers | Must demonstrate safety credentials are real; validates their entire mission | Justifies continued safety research funding; maintains "we're the responsible ones" positioning |
| Microsoft | Capability "sparks" — dependent on OpenAI's formal declaration | Needs AI to be hyper-competent (sells Copilot) but legally distinct from AGI (keeps IP rights) | Loses access to OpenAI's future models under contract clause | Keeps OpenAI licensing rights; continues Copilot revenue |
H. WHAT'S NEXT ON THE NEVER LIST
The next capabilities that will be achieved and immediately reclassified. Synthesized from all 8 providers.
| Capability | Current Status | Predicted Achievement | Predicted Dismissal Phrase |
|---|---|---|---|
| Autonomous scientific discovery (novel hypothesis → experiment → publication, no human framing) | Early demonstrations (AlphaFold, FunSearch, AI-assisted proofs) | 2026-2028 | "It didn't discover anything — it just ran high-speed combinatorics on existing human data. It doesn't understand the physics, it found a statistical anomaly." |
| Multi-month autonomous agent completing complex business workflows | Emerging (Devin, various agent frameworks) | 2026-2027 | "It's just following a script with error correction — not real judgment. It needs human scaffolding for anything truly novel." |
| Original mathematical theorem proving (not verification, but discovery of new results) | Partial (AI-assisted proofs, AlphaProof) | 2027-2029 | "The human framed the problem. The AI just did the grunt work. It doesn't understand why the theorem is true." |
| AI CEO running a company for 12+ months with measurable success | Not yet | 2027-2030 | "It's not real leadership — it's just a narrow reinforcement learning algorithm optimizing for a predefined reward function (profit). It lacks emotional intelligence, vision, and genuine judgment." |
| Embodied humanoid robot navigating unstructured real-world environments | Lab demonstrations | 2027-2030 | "It's a Roomba with arms. It doesn't have a conscious world model — it's just mimicking kinesthetic training data." |
| AI-authored novel winning major literary prize | Not yet (blind-judged) | 2026-2028 | "The AI doesn't feel the grief it wrote about. It mapped the latent space of human emotional syntax. The reader is doing all the emotional work." |
| Passing a full university degree program (not just individual exams) | Not yet at scale | 2028-2030 | "It memorized the curriculum. A degree requires lived experience, intellectual growth, and genuine curiosity — none of which it has." |
The Closing Paradox
Every provider, independently, arrived at the same destination. Here is the synthesis of their eight closing arguments into one:
AGI is the only scientific benchmark in history that gets reclassified as "not real intelligence" the moment it's achieved — which means it was never a scientific benchmark at all. It's a mirror we hold up to protect the one thing we're not ready to share: the idea that we're special. And the moment we build something that passes every test we've ever set, we will invent a new test. We always have. We always will. Until we don't need to anymore — and that's the moment nobody has a name for yet.
Or, in the words that a CTO would actually quote:
We've been defining AGI as "whatever AI can't do yet" for 50 years. The only thing that's changed is how fast "yet" arrives.