Cross-Provider Synthesis: Quantum Error Correction on Superconducting Hardware, Active Inference, and CRQC Timelines
Analysis Date: April 10, 2026 | Sources: 146 | Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity
Executive Summary
-
Google Willow has definitively crossed the surface code threshold, demonstrating Λ = 2.14 ± 0.02 suppression per distance increment, with a d=7 LER of 0.143% ± 0.003% per cycle and a logical qubit lifetime 2.4× that of its best physical qubit [1]. The d=3 LER (~0.65% per cycle) is inferred, not directly published, introducing meaningful uncertainty into cross-platform comparisons.
-
IBM Heron r2 and Microsoft are not comparable benchmarks for surface-code LER at d=3: IBM's heavy-hex architecture yields ~3–4% LER per syndrome round at d=3 (improving to ~96% survival probability per round with optimizations) [2], while Microsoft has no published surface-code LER data whatsoever — its Majorana 1 effort remains at single-qubit characterization and faces serious scientific credibility questions [2].
-
The "13 physical qubits per d=3 patch" question has a legitimate answer: The [[13,1,3]] surface code variant is a rigorously defined, hardware-efficient construction that encodes one logical qubit with distance 3 using 13 physical qubits — fewer than the standard 17 (square lattice) or 19 (heavy-hex) — and has been validated in simulation and small-scale experiments on IBM hardware [2]. This is not fringe; it is a known, published code variant.
-
Active Inference / Free Energy Principle applied to real-time QEC on superconducting hardware does not exist as a demonstrated result: All five providers independently confirmed zero published or preprint results applying Friston's framework, Helmholtz machines, or geometric manifold methods as classical control layers for real hardware QEC [4]. The conceptual bridge exists in theory; the experimental bridge does not.
-
Google's March 2026 revised CRQC estimate (~500,000 physical qubits for ECDLP-256 in ~9 minutes) and 2029 Q-Day projection [3] represent a dramatic compression of timelines. A validated improvement in LER at d=3 via a novel classical decoder would directly reduce required code distance, yielding quadratic savings in physical qubit overhead — potentially accelerating CRQC feasibility further, or alternatively, enabling fault-tolerant computation at smaller scale than currently projected.
Cross-Provider Consensus
1. Google Willow's Core QEC Performance Metrics
Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity (all five)
All five providers independently confirmed from [1] (Nature, 2024):
- 105-qubit processor (Grok says 105; OpenAI says 72-qubit "Willow" — see Contradictions)
- d=7 LER: 0.143% ± 0.003% per cycle
- Suppression factor Λ = 2.14 ± 0.02
- Logical lifetime: 291 ± 6 μs
- Best physical qubit lifetime: ~119 μs; median: ~85 μs
- Logical lifetime exceeds best physical qubit by 2.4 ± 0.3×
- Real-time decoder latency: ~63 μs at d=5
- Cycle time: 1.1 μs
This is the most robustly confirmed finding in the dataset.
2. d=3 LER for Google Willow is Inferred, Not Directly Measured
Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity
All providers agree the d=3 figure (~0.57–0.70% per cycle) is derived by back-calculating from the Λ suppression factor, not from a directly reported d=3 experiment. Perplexity [2] is most explicit: "Willow's direct measurements extend to distance-5 and distance-7 surface codes, not distance-3." The range of inferred values (0.57–0.70%) reflects different extrapolation assumptions across providers.
3. Standard d=3 Surface Code Qubit Counts
Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity
Universal agreement on:
- Standard rotated planar surface code: 17 physical qubits (9 data + 8 ancilla) [2]- Heavy-hexagonal lattice (IBM): 19 physical qubits via formula n = (5d² − 2d − 1)/2 [10]
- [[13,1,3]] code: legitimate, published, hardware-validated variant [2]### 4. IBM Heron r2 d=3 LER Performance Confidence: MEDIUM-HIGH Providers: Grok-Premium, OpenAI, Anthropic, Perplexity
Convergent finding: IBM's d=3 heavy-hex demonstrations yield ~3–4% LER per syndrome round baseline, improving to ~96% survival probability per round (~4% LER) with optimizations including Pauli frame updates replacing reset operations [2]. IBM has not demonstrated below-threshold scaling (Λ > 1 going from d=3 to d=5) on superconducting hardware as of the analysis date. Anthropic notes IBM has not shown a logical qubit beating physical qubits [20].
5. Microsoft Has No Comparable Surface-Code LER Data
Confidence: HIGH Providers: Anthropic, Gemini, Perplexity
Microsoft's Majorana 1 effort is at single-qubit characterization stage [8]. The Nature editorial team concluded the manuscript "does not represent evidence for the presence of Majorana zero modes" [54]. Microsoft's significant QEC contributions are algorithmic (Floquet codes [53], 4D geometric codes [Gemini]). No surface-code LER data exists for comparison.
6. Active Inference / FEP Has Zero Hardware QEC Demonstrations
Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity
All five providers independently searched and found zero published or preprint results applying Active Inference, Helmholtz machines, or geometric manifold methods as real-time classical control layers for QEC on superconducting hardware [4]. This is a strong null result with high confidence.
7. Novel Classical Decoder = QEC Improvement, Not QEM
Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini
All four providers who addressed this question agree: applying a novel classical decoder to standard QEC hardware (with stabilizer encoding, syndrome extraction, and real-time correction) constitutes an improvement within QEC, not quantum error mitigation. QEM operates without logical encoding; QEC operates with it [3].
8. Google's 2029 Q-Day Projection and ~500,000 Qubit CRQC Estimate
Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity
All providers confirm Google's March 2026 whitepaper revised CRQC estimates downward to <500,000 physical qubits for ECDLP-256 [4]. Google's internal PQC migration deadline is 2029, two years ahead of NSA's 2031 target [2].
Unique Insights by Provider
Grok-Premium
- Detection probability as a d=3 metric: Grok uniquely reports that for d=3 on Willow, the detection probability is approximately 7.7% [1], providing a complementary metric to LER that is directly measured (not inferred). This matters because detection probability is a hardware-level diagnostic that doesn't depend on decoder choice, making it a cleaner cross-platform comparison metric.
- Logical error scaling formula: Grok explicitly states the scaling relation ε_L ~ (p_phys/p_thr)^{(d+1)/2} [16], which is the key formula linking physical error rates, threshold, and code distance — essential for quantifying the impact of any LER improvement on CRQC timelines.
- Correlated error floors: Grok notes that rare correlated errors set floors in repetition codes at ~10⁻¹⁰, occurring roughly once per hour [2] — a critical practical limitation for long-duration fault-tolerant computation that other providers underemphasized.
OpenAI
- USTC [[17,1,3]] explicit confirmation: OpenAI uniquely cites the Zuchongzhi 2.1 processor experiment [22] as an explicit implementation of the [[17,1,3]] surface code, providing a third-party (non-IBM, non-Google) confirmation of the 17-qubit standard. This matters for establishing the baseline as hardware-validated, not merely theoretical.
- IBM 2022 27-qubit heavy-hex experiment specifics: OpenAI provides the most granular account of the 2022 IBM d=3 experiment [18], noting that post-selection to discard leakage events was required to reach the low end of ~3% per round — a methodological caveat that affects how the IBM numbers should be interpreted.
- Anisotropic codes on IBM: OpenAI uniquely highlights IBM's exploration of d_x × d_z = 3×5 anisotropic distance codes [21], which represents a practical strategy for IBM's heavy-hex architecture that doesn't require the full overhead of isotropic distance scaling.
Anthropic
- IBM VP Jay Gambetta's "engineering pipe dream" quote: Anthropic uniquely surfaces Gambetta's statement that the surface code approach requiring ~1,000 physical qubits per logical qubit was an "engineering pipe dream" [38], signaling IBM's strategic pivot toward qLDPC codes. This is strategically significant for understanding IBM's actual roadmap.
- IBM Gross code efficiency: Anthropic uniquely quantifies IBM's qLDPC advantage: 12 logical qubits protected for ~1 million cycles using 288 physical qubits vs. ~3,000 for the equivalent surface code task [2]. This is the most concrete published comparison of qLDPC vs. surface code overhead.
- Nature editorial rejection of Majorana claims: Anthropic is the only provider to explicitly cite the Nature editorial team's conclusion that Microsoft's manuscript "does not represent evidence for the presence of Majorana zero modes" [54] — a critical credibility flag for Microsoft's entire topological qubit program.
- Sycamore baseline for comparison: Anthropic uniquely provides the Sycamore predecessor data — d=5 at ~2.9% per cycle, d=3 at ~3.0% per cycle [1] — establishing that Willow's ~20× improvement over Sycamore is the relevant historical benchmark, not just the absolute LER values.
Gemini
- Microsoft's 4D geometric codes: Gemini uniquely identifies Microsoft's algorithmic contribution of "4D geometric codes" as a significant QEC contribution distinct from hardware [3]. This matters for understanding Microsoft's actual technical contributions to the field.
- Quantum-Assisted Helmholtz Machine: Gemini uniquely notes that the Quantum-Assisted Helmholtz Machine has been proposed and simulated on gate-based quantum computers and annealers [3], using quantum sampling to reduce computational complexity. This is the closest thing to an Active Inference / Helmholtz machine connection to quantum computing — but it runs in the opposite direction (quantum assists classical ML, not classical ML assists QEC).
- Tensor network representation of Active Inference control flow: Gemini uniquely notes that quantum systems executing Active Inference can have their control flow mathematically represented as tensor networks [15], providing a theoretical bridge that could eventually be operationalized.
- Surface-13 syndrome qubit reuse: Gemini provides the most precise description of the [[13,1,3]] architecture: 9 data qubits + 4 syndrome qubits, where each syndrome qubit is used twice per QEC cycle [12]. This is a critical implementation detail that explains how 13 qubits achieves d=3 protection.
Perplexity
- FiLM-conditioned neural decoder: Perplexity uniquely identifies the FiLM-conditioned neural decoder framework [2] as achieving up to 11.1× reduction in logical error rate compared to conventional MWPM decoders on IBM hardware, while generalizing to unseen qubit chains without retraining. This is the most concrete published example of a novel classical decoder achieving meaningful LER improvement on real hardware — directly relevant to the question of what proof of a novel decoder improvement would look like.
- IBM measurement cycle time degradation: Perplexity uniquely notes that IBM's measurement cycle times increased from 768 ns to 3,000–4,000 ns on Heron [2] — a counterintuitive finding that partially explains why IBM's LER numbers lag behind Google despite competitive gate fidelities.
- Coherence time specifics for Heron r2: Perplexity provides the most specific Heron r2 coherence data: T1/T2 ≈ 213/120 μs on best devices [2], which is actually competitive with Willow's ~119 μs best physical qubit — suggesting IBM's LER gap is decoder/architecture-driven, not purely hardware-limited.
- QpiAI high-speed decoder: Perplexity uniquely identifies QpiAI's March 2026 high-speed QEC decoder platform for superconducting systems [95], representing the most recent (within weeks of analysis date) commercial decoder development.
Contradictions and Disagreements
Contradiction 1: Google Willow Qubit Count — 72 vs. 105
Severity: MEDIUM (affects interpretation of hardware scale)
- OpenAI [17]: States "a 72-qubit 'Willow' processor"
- Anthropic, Gemini, Perplexity [1]: State "105-qubit processor"
- Grok-Premium: States "101-qubit code" (referring to the d=7 patch specifically)
Analysis: The 105-qubit figure appears to be the total processor qubit count; the 101-qubit figure refers to the d=7 logical memory patch; the 72-qubit figure may be a confusion with an earlier Google device (Sycamore had 72 qubits) or a misattribution. The Nature paper [1] is the primary source and should be treated as authoritative. Do not resolve — flag for verification against [1] directly.
Contradiction 2: IBM d=3 LER — "3–4% per round" vs. "~10% per round"
Severity: HIGH (directly affects cross-platform comparison)
- Grok-Premium, OpenAI, Anthropic [3]: Report IBM d=3 LER as ~3–4% per syndrome round, improving to ~96% survival probability (~4% LER) with optimizations
- Perplexity [2]: States "approximately 90 percent logical fidelity per syndrome extraction round... equivalent to 10 percent logical error rate per round"
Analysis: These figures may refer to different experimental conditions, different IBM devices (27-qubit Eagle vs. Heron r2), different numbers of syndrome rounds, or different definitions of "per round." The 96% survival figure from Grok/Anthropic [5] and the 90% figure from Perplexity [2] could both be correct for different experiments. This is a critical unresolved discrepancy — the difference between 4% and 10% LER per round is enormous for any cross-platform comparison. Do not resolve — requires direct examination of [5] vs. [2].
Contradiction 3: IBM's QEC Status — "Not yet deployed" vs. "Demonstrated"
Severity: MEDIUM
- Anthropic [20]: "Error-correction is not yet deployed on Heron-class devices"; "IBM had not publicly shown a logical qubit beating physical qubits"
- Grok-Premium, OpenAI [2]: IBM has "demonstrated logical qubits and entangled logical qubits" and "below-threshold behavior is observed in stability experiments on Heron r2"
- OpenAI [18]: "As of 2025, IBM has not demonstrated a breakeven logical qubit with transmons"
Analysis: The contradiction likely reflects different definitions of "deployed" vs. "demonstrated in research context," and different thresholds for what constitutes "below-threshold behavior." IBM has demonstrated QEC in research settings [7] but has not achieved the breakeven metric (logical lifetime > physical lifetime) that Google achieved with Willow. Partially resolvable: IBM has demonstrated QEC research results but not breakeven.
Contradiction 4: The [[13,1,3]] Code — "Validated in simulation and small-scale experiments" vs. "No primary derivation in peer-reviewed literature"
Severity: MEDIUM
- OpenAI [2]: "The existence of 13-qubit distance-3 codes has been validated in simulation and small-scale experiments"
- Perplexity [3]: "No peer-reviewed paper in the search results provides a primary derivation of a 13-qubit distance-3 code construction"
Analysis: OpenAI cites [23] (Kim et al., 2024, "Magic State Injection on IBM Quantum Processors Above the Distillation Threshold") and [24] (ResearchGate diagram of [[13,1,3]] surface code in heavy-hex structure). Perplexity's claim that no primary derivation exists may reflect a search gap rather than a true absence. The ResearchGate diagram [24] and the IBM paper [23] together suggest the code is real and has been implemented. Do not fully resolve — the primary peer-reviewed derivation paper should be identified.
Contradiction 5: CRQC Physical Qubit Estimates — Wide Range
Severity: HIGH (directly affects policy and security implications)
- Grok-Premium [16]: "<500,000 physical qubits" for ECDLP-256
- Anthropic [76]: "fewer than 100,000 under newer architectures" (low confidence, 0.74)
- Anthropic [76]: "fewer than 500,000 for elliptic curve cryptography"
- Anthropic [76]: "1 million noisy qubits for RSA-2048 running for one week" (May 2025 estimate)
- Gemini: "under 500,000 physical qubits and 9 minutes of runtime" for ECDLP-256
Analysis: These figures refer to different targets (RSA-2048 vs. ECDLP-256/secp256k1), different time horizons, and different architectural assumptions. The range from <100,000 to 1,000,000 is not a contradiction per se but reflects genuinely different problem instances and assumptions. The Google March 2026 whitepaper [16] figure of <500,000 for ECDLP-256 in ~9 minutes appears to be the most recent and specific. Do not resolve — the specific paper [16] should be read directly.
Detailed Synthesis
Part I: The State of QEC on Real Superconducting Hardware (2025–2026)
Google Willow: The Benchmark
The most significant QEC result on real superconducting hardware as of April 2026 remains Google's Willow experiment, published in Nature in late 2024 [1]. All five providers independently confirmed the core metrics: a 105-qubit transmon processor running a ZXXZ variant of the surface code [Gemini], demonstrating below-threshold scaling with Λ = 2.14 ± 0.02 per distance increment of 2 [Grok, OpenAI, Anthropic, Gemini, Perplexity]. The d=7 logical memory achieves 0.143% ± 0.003% LER per cycle, with a logical lifetime of 291 ± 6 μs — exceeding the best physical qubit lifetime (~119 μs) by a factor of 2.4 ± 0.3 [1].
This "beyond breakeven" achievement is the first credible demonstration of the quantum threshold theorem in superconducting hardware at scale [4]. The exponential suppression was confirmed via linear regression of ln(ε_d) vs. d [1], and the system maintained performance up to 10⁶ cycles with real-time decoding at ~63 μs average latency [17]. Compared to the predecessor Sycamore processor — which achieved ~3.0% per cycle at d=3 and ~2.9% at d=5 [1] — Willow represents approximately a 20× improvement in logical error performance [Anthropic].
The d=3 LER question: Critically, no provider found a directly published d=3 LER for Willow. Grok notes that "the paper reports averages over nine different d=3 subgrids on the processor" with a detection probability of approximately 7.7% [1] — a directly measured hardware metric. The LER figure of ~0.65% per cycle (range: 0.57–0.70% across providers) is universally treated as inferred by back-calculating from Λ [2]. This distinction matters enormously for cross-platform comparison: the inferred d=3 figure carries uncertainty from the extrapolation, and the detection probability metric (7.7%) is arguably more reliable for hardware comparison purposes.
A practical limitation flagged by Grok [2]: rare correlated error bursts occurring roughly once per hour create an error floor at approximately 10⁻¹⁰ in repetition codes. This is not a decoder problem — it is a hardware problem that will require engineering solutions for any long-duration fault-tolerant computation.
IBM Heron r2: Competitive Hardware, Lagging QEC Demonstrations
IBM's Heron r2 presents a paradox: hardware specifications that are competitive with Willow (T1/T2 ≈ 213/120 μs on best devices [2]; single-qubit gate fidelity >99.9% [20]; two-qubit gate error ~0.35–0.5% [5]) but QEC demonstrations that lag significantly behind Google's results.
The heavy-hexagonal lattice architecture [10] — IBM's signature design choice — provides hardware advantages (reduced crosstalk, lower connectivity requirements) but imposes a structural overhead: the native d=3 code requires 19 physical qubits via n = (5d² − 2d − 1)/2 [10], compared to 17 for the standard rotated surface code. IBM's published d=3 demonstrations on Heron-class devices report LER in the range of 3–4% per syndrome round [2], with optimizations (replacing reset operations with Pauli frame updates [2]) improving survival probability to ~96% per round [5].
A critical finding from Perplexity [2]: IBM's measurement cycle times increased from 768 ns to 3,000–4,000 ns on Heron. This counterintuitive degradation in cycle speed — despite improved gate fidelities — partially explains the LER gap with Google. Grok confirms that "measurement/reset noise dominates failures on Heron r2" [5]. IBM has not demonstrated below-threshold scaling (Λ > 1 going from d=3 to d=5) on superconducting hardware, and has not achieved the breakeven metric [2].
IBM's strategic response is notable: rather than competing with Google on surface code metrics, IBM is pivoting to qLDPC codes. Anthropic uniquely surfaces IBM VP Jay Gambetta's characterization of the ~1,000 physical qubits per logical qubit surface code approach as an "engineering pipe dream" [2], with IBM's Gross code achieving 12 logical qubits for ~1 million cycles using 288 physical qubits vs. ~3,000 for the equivalent surface code task [38]. IBM's Relay BP decoder achieves 5–10× speedup over prior decoders [38].
Microsoft: A Different Game
Microsoft's position is categorically different from Google and IBM. Its Majorana 1 processor [8] is pursuing topological qubits based on Majorana zero modes — a fundamentally different physical platform with potentially superior error properties if realized. However, the Nature editorial team's conclusion that the manuscript "does not represent evidence for the presence of Majorana zero modes" [54] is a serious credibility challenge. Microsoft's approach remains at single-qubit characterization stage [8], with next steps involving a 4×2 tetron array [51].
Microsoft's genuine contributions to QEC are algorithmic: Floquet codes for topological qubits [53] and 4D geometric codes [3]. There is no surface-code LER data from Microsoft for any distance. The comparison requested in the query — IBM Heron r2 vs. Google Willow vs. Microsoft hardware at d=3 — is only partially answerable: Google provides inferred d=3 data, IBM provides direct d=3 data (with caveats), and Microsoft provides nothing comparable.
Part II: Physical Qubit Overhead — Standard and Non-Standard
The Standard Counts
The standard rotated planar surface code at d=3 requires 17 physical qubits: 9 data qubits and 8 ancilla/measurement qubits [2]. This is confirmed by all five providers and validated in hardware by USTC's Zuchongzhi 2.1 implementation of the [[17,1,3]] code [22]. The formula n = 2d² − 1 gives 17 for d=3.
IBM's heavy-hexagonal architecture cannot natively map this 17-qubit layout without SWAP gates [2], because heavy-hex qubits connect to only 2–3 neighbors rather than 4 [2]. The native heavy-hex d=3 code requires 19 physical qubits via n = (5d² − 2d − 1)/2 [10], confirmed by all providers.
The [[13,1,3]] Code: Legitimate and Published
The question of whether 13 physical qubits per d=3 patch is legitimate has a clear answer: yes. The [[13,1,3]] surface code variant is a rigorously defined construction that encodes one logical qubit with distance 3 using 13 physical qubits — approximately 25% fewer than the standard 17-qubit code [2].
Gemini provides the most precise architectural description [12]: the Surface-13 layout uses 9 data qubits and 4 syndrome qubits, where each syndrome qubit is used twice per QEC cycle. This syndrome qubit reuse is the key innovation enabling the qubit reduction. The code corrects any single X or Z error [24] and has been validated in simulation and small-scale experiments on IBM hardware [23].
The code appears in the context of IBM's heavy-hex architecture specifically [2], where the reduced connectivity of the heavy-hex lattice is actually exploited rather than worked around. Perplexity notes that no peer-reviewed paper in its search results provides a primary derivation [3] — a gap that OpenAI's citation of [23] (Kim et al., 2024) partially addresses but does not fully resolve.
The theoretical minimum for a single-error-correcting code is 5 physical qubits (the quantum Hamming bound [12]), but the [[5,1,3]] code has impractical syndrome extraction requirements for superconducting hardware. The [[13,1,3]] code represents a practical middle ground between the theoretical minimum and the standard 17-qubit implementation.
Part III: Active Inference, Free Energy Principle, and Quantum Computing
What Active Inference Is
Active Inference, developed by Karl Friston, is a Bayesian framework for adaptive systems grounded in the Free Energy Principle (FEP) [2]. Under the FEP, any system maintaining its boundary against an environment must minimize variational free energy — a quantity that upper-bounds surprise (negative log model evidence) [2]. Agents minimize free energy through two complementary processes: perception (updating internal beliefs to match observations) and action (changing the environment to match predictions) [13].
A Helmholtz machine is a related generative model architecture that uses wake-sleep algorithms for approximate inference [13] — essentially a bidirectional neural network with a generative model (top-down) and a recognition model (bottom-up). The connection to quantum computing is theoretical: Gemini notes that quantum systems executing Active Inference can have their control flow represented as tensor networks [15], and Gemini uniquely identifies the Quantum-Assisted Helmholtz Machine [3] — but this runs in the opposite direction (quantum hardware assists classical ML inference, not classical ML assists quantum error correction).
The Null Result: No Hardware Demonstrations
All five providers independently confirmed: there are zero published or preprint results demonstrating Active Inference, geometric manifold methods, or Helmholtz machine-based approaches as classical control layers for real-time QEC on superconducting hardware [5]. This is a strong null result.
The theoretical connections exist: [25] ("A free energy principle for generic quantum systems") and [65] ("Connecting the free energy principle with quantum cognition") establish conceptual bridges, and [68] ("Predictive Inference and the Origin of Quantum Phenomena") extends the framework toward quantum mechanics. But none of these papers involve real hardware, syndrome decoding, or error correction.
The closest existing work in the spirit of "novel classical control for QEC" involves:
- AlphaQubit (Google): Recurrent Transformer-based neural network decoder, outperforming MWPM on real Sycamore data at d=3 and d=5 [2]2. FiLM-conditioned neural decoder (Perplexity, [2]): Achieves up to 11.1× LER reduction vs. MWPM on IBM hardware, generalizes without retraining
- Relay BP (IBM): 5–10× speedup over prior decoders [38]
- QpiAI decoder (March 2026): High-speed QEC decoder for superconducting systems [95]
These represent the actual frontier of novel classical control for QEC — all are ML/neural network approaches, none are explicitly framed in Active Inference or FEP terms.
Part IV: QEC vs. QEM — The Definitional Question
The distinction between quantum error correction and quantum error mitigation is well-established and consistently reported across all providers [4]:
Quantum Error Correction (QEC):
- Encodes quantum information redundantly into logical qubits via stabilizer codes
- Measures syndromes in real time without disturbing logical information
- Applies corrections during computation to preserve the quantum state
- Scales exponentially: below threshold, larger codes give exponentially lower LER
- Requires physical error rates below a threshold (~1% for surface codes)
- Enables indefinitely long computations in principle
Quantum Error Mitigation (QEM):
- Operates on physical or shallow-encoded circuits without full FT encoding
- Uses extra circuit runs + classical post-processing to estimate ideal expectation values
- Does not correct the actual quantum state in real time
- Has exponential sampling overhead that does not scale to deep FT algorithms [86]
- No threshold requirement — works at any error rate, but with diminishing returns
The key definitional question: Does applying a novel classical decoder/controller to standard QEC hardware constitute a new QEC method or QEM?
All four providers who addressed this agree [Grok, OpenAI, Anthropic, Gemini]: it is a decoder improvement within an existing QEC framework — neither a new QEC code nor QEM. The logical encoding, stabilizer structure, and syndrome extraction remain unchanged; only the classical processing of syndrome data changes. This is analogous to AlphaQubit [2] or the FiLM decoder [2] — improvements to the QEC pipeline that reduce effective LER without changing the underlying code.
The practical difference matters enormously: a better decoder can improve LER and raise effective Λ without requiring new hardware, new code designs, or new physical qubit counts. This makes decoder innovation one of the highest-leverage interventions in the current QEC landscape.
Part V: What Peer-Reviewed Proof Would Look Like
Based on the convergent findings across providers [4], a credible peer-reviewed demonstration of improved LER at d=3 via a novel classical control approach would require:
Minimum experimental requirements:
- Head-to-head LER comparison: Novel decoder vs. MWPM baseline (the standard) and ideally vs. AlphaQubit/neural network baselines, on the same hardware and same experimental conditions
- Statistical power: 10⁵–10⁶ total error correction cycles [17] to accumulate sufficient statistics for LER at the ~0.1–1% per cycle level
- Real-time operation: Decoder must operate at or below the syndrome cycle time (~1.1 μs for Google, ~3–4 μs for IBM [2]) or demonstrate a pipelined architecture that avoids backlog [14]
- Hardware specification disclosure: Full characterization of physical error rates (T1, T2, gate fidelities, measurement fidelities) to enable reproducibility and comparison
- Multiple distance points: At minimum d=3 and d=5 to demonstrate that the improvement persists (or improves) with distance — i.e., that effective Λ increases
- Logical vs. physical comparison: Demonstration that the logical qubit lifetime exceeds physical qubit lifetime (breakeven metric)
What would make it compelling:
- Demonstration on multiple hardware platforms (not just one vendor's device)
- Comparison of effective Λ with and without the novel decoder
- Latency characterization showing real-time feasibility
- Open-source code release for reproducibility
The FiLM decoder precedent [2] — 11.1× LER reduction vs. MWPM on IBM hardware — provides the current benchmark for what a strong decoder paper looks like. A novel Active Inference or geometric manifold approach would need to meet or exceed this bar.
Part VI: CRQC Timelines and Implications
Google's March 2026 whitepaper [2] revised the CRQC resource estimate to <500,000 physical qubits for ECDLP-256 (secp256k1) in approximately 9 minutes [3]. This represents a dramatic compression from earlier estimates: Anthropic notes a May 2025 paper showed RSA-2048 factorization with <1 million noisy qubits [76], down from 20 million in 2019 estimates [2]. Google's internal PQC migration deadline of 2029 [74] — two years ahead of NSA's 2031 target [82] — reflects confidence in this accelerated timeline.
The connection to LER improvements is direct and quantifiable. Grok provides the key formula [16]: ε_L ~ (p_phys/p_thr)^{(d+1)/2}, with overhead per logical qubit scaling as O(d²). If a novel decoder raises effective Λ from 2.14 to, say, 4.0, the required code distance for a target logical error rate drops substantially — and since overhead scales as d², this yields quadratic savings in physical qubits per logical qubit [16]. Anthropic estimates [76] that a Λ improvement from 2 to 4 could reduce required distance by ~30% and roughly halve physical qubit overhead.
For the ~1,200–1,450 logical qubits required for ECDLP-256 [16], even a modest reduction in required code distance per logical qubit compounds dramatically across the full system. A decoder that achieves 11.1× LER reduction at d=3 [2] — if that improvement persists at higher distances — could potentially reduce the physical qubit requirement by a factor of 2–4×, bringing CRQC within reach of near-term hardware generations.
The implication is bidirectional: better decoders could either (a) accelerate CRQC timelines by enabling fault-tolerant computation at smaller physical qubit counts, or (b) enable more reliable fault-tolerant computation at the same qubit count, improving the practical feasibility of the ~500,000-qubit CRQC. Either way, decoder innovation is a critical variable in CRQC timeline projections that is currently underweighted in public threat assessments.