April 11, 2026·26 min read·5 views·5 providers

QEC Performance on Superconducting Hardware (2025–26)

Q: Direct measurement of d=3 LER on Google Willow — does the published Nature paper [src_1] contain explicit d=3 LER data, or only the inferred value from Λ back-calculation, and what is the detection probability metric's relationship to LER under different decoder assumptions?

All five providers treat the d=3 Willow LER as inferred (~0.65%), but Grok uniquely reports a directly measured detection probability of 7.7% at d=3. The relationship between detection probability and LER depends on the decoder, and the primary source [src_1] should be examined directly to determine whether a direct d=3 LER was published. This gap affects every cross-platform comparison in the analysis.

Q: Primary peer-reviewed derivation and hardware validation of the [[13,1,3]] surface code — specifically, does [src_23] (Kim et al., 2024) constitute the primary derivation, what are the measured LER results on IBM hardware, and how does the syndrome qubit reuse mechanism affect fault-tolerance under realistic noise models?

Providers disagree on whether a primary peer-reviewed derivation exists [OpenAI vs. Perplexity], and the architectural detail of syndrome qubit reuse (flagged by Gemini) has significant implications for fault-tolerance that no provider fully analyzed. If the [[13,1,3]] code achieves comparable LER to the [[17,1,3]] code with 25% fewer qubits, this is a significant practical result for near-term hardware.

Q: Quantitative impact of IBM Heron r2's increased measurement cycle time (768 ns → 3,000–4,000 ns) on d=3 LER, and whether decoder innovations (FiLM, Relay BP, AlphaQubit) can compensate for this architectural disadvantage — specifically, what LER is achievable on Heron r2 at d=3 with state-of-the-art neural decoders?

Perplexity's unique finding that IBM's measurement cycle time *increased* on Heron r2 [src_2][src_39] — combined with Grok's finding that measurement/reset noise dominates failures [src_5] — suggests IBM's LER gap vs. Google may be addressable through decoder innovation rather than hardware redesign. The FiLM decoder's 11.1× improvement [src_36][src_135] was demonstrated on IBM hardware, making this a tractable near-term research question with direct implications for the IBM vs. Google comparison.

Q: Formal feasibility analysis of Active Inference / Free Energy Principle as a classical decoder architecture for surface codes — specifically, can the variational free energy minimization framework be operationalized as a real-time syndrome decoder with latency ≤1.1 μs, and how does it compare to MWPM and AlphaQubit on standard benchmark datasets?

All five providers confirmed a complete absence of hardware demonstrations, but the theoretical bridges exist [src_25][src_64][src_65][src_132]. The question is whether the computational structure of Active Inference (belief propagation over a generative model) is compatible with the latency requirements of real-time QEC. A simulation study comparing Active Inference decoders to MWPM on standard syndrome datasets (e.g., Google's published Willow data) would determine feasibility before any hardware investment.

Q: Second-order CRQC timeline implications of decoder improvements — specifically, if a novel decoder raises effective Λ from 2.14 to 4.0 on existing hardware, what is the revised physical qubit estimate for ECDLP-256, and does this bring CRQC within reach of hardware generations expected before 2029?

Grok and Anthropic provide the formula and rough estimates [src_16][src_76], but no provider performed a rigorous calculation connecting decoder improvement (measured as Λ increase) to revised CRQC qubit counts using Google's March 2026 whitepaper [src_16] as the baseline. Given that Google's 2029 Q-Day projection [src_73][src_74] is already driving enterprise security decisions, a quantitative sensitivity analysis of CRQC timelines to decoder performance would be directly actionable for policy and security planning.

Cross-provider synthesis of QEC on superconducting hardware (2025–26): d=3 LERs for IBM/Google/Microsoft, 13‑qubit patches, Active Inference gaps, and CRQC

Key Finding

No published or preprint results demonstrate Active Inference, Helmholtz-machine-based methods, or closely related geometric manifold approaches being used as real-time classical control layers or decoders for superconducting quantum error correction/error mitigation on actual hardware.

high confidenceSupported by grok-premium, openai, anthropic, gemini, perplexity

Justin Furniss

@Parallect.ai and @SecureCoders. Founder. Hacker. Father. Seeker of all things AI

grok-premiumopenaianthropicgeminiperplexity

Contents

Cross-Provider Synthesis: Quantum Error Correction on Superconducting Hardware, Active Inference, and CRQC Timelines

Analysis Date: April 10, 2026 | Sources: 146 | Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity

Executive Summary

Google Willow has definitively crossed the surface code threshold, demonstrating Λ = 2.14 ± 0.02 suppression per distance increment, with a d=7 LER of 0.143% ± 0.003% per cycle and a logical qubit lifetime 2.4× that of its best physical qubit ^[1]. The d=3 LER (~0.65% per cycle) is inferred, not directly published, introducing meaningful uncertainty into cross-platform comparisons.
IBM Heron r2 and Microsoft are not comparable benchmarks for surface-code LER at d=3: IBM's heavy-hex architecture yields ~3–4% LER per syndrome round at d=3 (improving to ~96% survival probability per round with optimizations) ^[2], while Microsoft has no published surface-code LER data whatsoever — its Majorana 1 effort remains at single-qubit characterization and faces serious scientific credibility questions ^[2].
The "13 physical qubits per d=3 patch" question has a legitimate answer: The [[13,1,3]] surface code variant is a rigorously defined, hardware-efficient construction that encodes one logical qubit with distance 3 using 13 physical qubits — fewer than the standard 17 (square lattice) or 19 (heavy-hex) — and has been validated in simulation and small-scale experiments on IBM hardware ^[2]. This is not fringe; it is a known, published code variant.
Active Inference / Free Energy Principle applied to real-time QEC on superconducting hardware does not exist as a demonstrated result: All five providers independently confirmed zero published or preprint results applying Friston's framework, Helmholtz machines, or geometric manifold methods as classical control layers for real hardware QEC ^[4]. The conceptual bridge exists in theory; the experimental bridge does not.
Google's March 2026 revised CRQC estimate (~500,000 physical qubits for ECDLP-256 in ~9 minutes) and 2029 Q-Day projection ^[3] represent a dramatic compression of timelines. A validated improvement in LER at d=3 via a novel classical decoder would directly reduce required code distance, yielding quadratic savings in physical qubit overhead — potentially accelerating CRQC feasibility further, or alternatively, enabling fault-tolerant computation at smaller scale than currently projected.

Cross-Provider Consensus

1. Google Willow's Core QEC Performance Metrics

Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity (all five)

All five providers independently confirmed from ^[1] (Nature, 2024):

105-qubit processor (Grok says 105; OpenAI says 72-qubit "Willow" — see Contradictions)
d=7 LER: 0.143% ± 0.003% per cycle
Suppression factor Λ = 2.14 ± 0.02
Logical lifetime: 291 ± 6 μs
Best physical qubit lifetime: ~119 μs; median: ~85 μs
Logical lifetime exceeds best physical qubit by 2.4 ± 0.3×
Real-time decoder latency: ~63 μs at d=5
Cycle time: 1.1 μs

This is the most robustly confirmed finding in the dataset.

2. d=3 LER for Google Willow is Inferred, Not Directly Measured

Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity

All providers agree the d=3 figure (~0.57–0.70% per cycle) is derived by back-calculating from the Λ suppression factor, not from a directly reported d=3 experiment. Perplexity ^[2] is most explicit: "Willow's direct measurements extend to distance-5 and distance-7 surface codes, not distance-3." The range of inferred values (0.57–0.70%) reflects different extrapolation assumptions across providers.

3. Standard d=3 Surface Code Qubit Counts

Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity

Universal agreement on:

Standard rotated planar surface code: 17 physical qubits (9 data + 8 ancilla) ^[2]- Heavy-hexagonal lattice (IBM): 19 physical qubits via formula n = (5d² − 2d − 1)/2 ^[10]
[[13,1,3]] code: legitimate, published, hardware-validated variant ^[2]### 4. IBM Heron r2 d=3 LER Performance Confidence: MEDIUM-HIGH Providers: Grok-Premium, OpenAI, Anthropic, Perplexity

Convergent finding: IBM's d=3 heavy-hex demonstrations yield ~3–4% LER per syndrome round baseline, improving to ~96% survival probability per round (~4% LER) with optimizations including Pauli frame updates replacing reset operations ^[2]. IBM has not demonstrated below-threshold scaling (Λ > 1 going from d=3 to d=5) on superconducting hardware as of the analysis date. Anthropic notes IBM has not shown a logical qubit beating physical qubits ^[20].

5. Microsoft Has No Comparable Surface-Code LER Data

Confidence: HIGH Providers: Anthropic, Gemini, Perplexity

Microsoft's Majorana 1 effort is at single-qubit characterization stage ^[8]. The Nature editorial team concluded the manuscript "does not represent evidence for the presence of Majorana zero modes" ^[54]. Microsoft's significant QEC contributions are algorithmic (Floquet codes ^[53], 4D geometric codes [Gemini]). No surface-code LER data exists for comparison.

6. Active Inference / FEP Has Zero Hardware QEC Demonstrations

Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity

All five providers independently searched and found zero published or preprint results applying Active Inference, Helmholtz machines, or geometric manifold methods as real-time classical control layers for QEC on superconducting hardware ^[4]. This is a strong null result with high confidence.

7. Novel Classical Decoder = QEC Improvement, Not QEM

Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini

All four providers who addressed this question agree: applying a novel classical decoder to standard QEC hardware (with stabilizer encoding, syndrome extraction, and real-time correction) constitutes an improvement within QEC, not quantum error mitigation. QEM operates without logical encoding; QEC operates with it ^[3].

8. Google's 2029 Q-Day Projection and ~500,000 Qubit CRQC Estimate

Confidence: HIGH Providers: Grok-Premium, OpenAI, Anthropic, Gemini, Perplexity

All providers confirm Google's March 2026 whitepaper revised CRQC estimates downward to <500,000 physical qubits for ECDLP-256 ^[4]. Google's internal PQC migration deadline is 2029, two years ahead of NSA's 2031 target ^[2].

Unique Insights by Provider

Grok-Premium

Detection probability as a d=3 metric: Grok uniquely reports that for d=3 on Willow, the detection probability is approximately 7.7% ^[1], providing a complementary metric to LER that is directly measured (not inferred). This matters because detection probability is a hardware-level diagnostic that doesn't depend on decoder choice, making it a cleaner cross-platform comparison metric.
Logical error scaling formula: Grok explicitly states the scaling relation ε_L ~ (p_phys/p_thr)^{(d+1)/2} ^[16], which is the key formula linking physical error rates, threshold, and code distance — essential for quantifying the impact of any LER improvement on CRQC timelines.
Correlated error floors: Grok notes that rare correlated errors set floors in repetition codes at ~10⁻¹⁰, occurring roughly once per hour ^[2] — a critical practical limitation for long-duration fault-tolerant computation that other providers underemphasized.

OpenAI

USTC [[17,1,3]] explicit confirmation: OpenAI uniquely cites the Zuchongzhi 2.1 processor experiment ^[22] as an explicit implementation of the [[17,1,3]] surface code, providing a third-party (non-IBM, non-Google) confirmation of the 17-qubit standard. This matters for establishing the baseline as hardware-validated, not merely theoretical.
IBM 2022 27-qubit heavy-hex experiment specifics: OpenAI provides the most granular account of the 2022 IBM d=3 experiment ^[18], noting that post-selection to discard leakage events was required to reach the low end of ~3% per round — a methodological caveat that affects how the IBM numbers should be interpreted.
Anisotropic codes on IBM: OpenAI uniquely highlights IBM's exploration of d_x × d_z = 3×5 anisotropic distance codes ^[21], which represents a practical strategy for IBM's heavy-hex architecture that doesn't require the full overhead of isotropic distance scaling.

Anthropic

IBM VP Jay Gambetta's "engineering pipe dream" quote: Anthropic uniquely surfaces Gambetta's statement that the surface code approach requiring ~1,000 physical qubits per logical qubit was an "engineering pipe dream" ^[38], signaling IBM's strategic pivot toward qLDPC codes. This is strategically significant for understanding IBM's actual roadmap.
IBM Gross code efficiency: Anthropic uniquely quantifies IBM's qLDPC advantage: 12 logical qubits protected for ~1 million cycles using 288 physical qubits vs. ~3,000 for the equivalent surface code task ^[2]. This is the most concrete published comparison of qLDPC vs. surface code overhead.
Nature editorial rejection of Majorana claims: Anthropic is the only provider to explicitly cite the Nature editorial team's conclusion that Microsoft's manuscript "does not represent evidence for the presence of Majorana zero modes" ^[54] — a critical credibility flag for Microsoft's entire topological qubit program.
Sycamore baseline for comparison: Anthropic uniquely provides the Sycamore predecessor data — d=5 at ~2.9% per cycle, d=3 at ~3.0% per cycle ^[1] — establishing that Willow's ~20× improvement over Sycamore is the relevant historical benchmark, not just the absolute LER values.

Gemini

Microsoft's 4D geometric codes: Gemini uniquely identifies Microsoft's algorithmic contribution of "4D geometric codes" as a significant QEC contribution distinct from hardware ^[3]. This matters for understanding Microsoft's actual technical contributions to the field.
Quantum-Assisted Helmholtz Machine: Gemini uniquely notes that the Quantum-Assisted Helmholtz Machine has been proposed and simulated on gate-based quantum computers and annealers ^[3], using quantum sampling to reduce computational complexity. This is the closest thing to an Active Inference / Helmholtz machine connection to quantum computing — but it runs in the opposite direction (quantum assists classical ML, not classical ML assists QEC).
Tensor network representation of Active Inference control flow: Gemini uniquely notes that quantum systems executing Active Inference can have their control flow mathematically represented as tensor networks ^[15], providing a theoretical bridge that could eventually be operationalized.
Surface-13 syndrome qubit reuse: Gemini provides the most precise description of the [[13,1,3]] architecture: 9 data qubits + 4 syndrome qubits, where each syndrome qubit is used twice per QEC cycle ^[12]. This is a critical implementation detail that explains how 13 qubits achieves d=3 protection.

Perplexity

FiLM-conditioned neural decoder: Perplexity uniquely identifies the FiLM-conditioned neural decoder framework ^[2] as achieving up to 11.1× reduction in logical error rate compared to conventional MWPM decoders on IBM hardware, while generalizing to unseen qubit chains without retraining. This is the most concrete published example of a novel classical decoder achieving meaningful LER improvement on real hardware — directly relevant to the question of what proof of a novel decoder improvement would look like.
IBM measurement cycle time degradation: Perplexity uniquely notes that IBM's measurement cycle times increased from 768 ns to 3,000–4,000 ns on Heron ^[2] — a counterintuitive finding that partially explains why IBM's LER numbers lag behind Google despite competitive gate fidelities.
Coherence time specifics for Heron r2: Perplexity provides the most specific Heron r2 coherence data: T1/T2 ≈ 213/120 μs on best devices ^[2], which is actually competitive with Willow's ~119 μs best physical qubit — suggesting IBM's LER gap is decoder/architecture-driven, not purely hardware-limited.
QpiAI high-speed decoder: Perplexity uniquely identifies QpiAI's March 2026 high-speed QEC decoder platform for superconducting systems ^[95], representing the most recent (within weeks of analysis date) commercial decoder development.

Contradictions and Disagreements

Contradiction 1: Google Willow Qubit Count — 72 vs. 105

Severity: MEDIUM (affects interpretation of hardware scale)

OpenAI ^[17]: States "a 72-qubit 'Willow' processor"
Anthropic, Gemini, Perplexity ^[1]: State "105-qubit processor"
Grok-Premium: States "101-qubit code" (referring to the d=7 patch specifically)

Analysis: The 105-qubit figure appears to be the total processor qubit count; the 101-qubit figure refers to the d=7 logical memory patch; the 72-qubit figure may be a confusion with an earlier Google device (Sycamore had 72 qubits) or a misattribution. The Nature paper ^[1] is the primary source and should be treated as authoritative. Do not resolve — flag for verification against ^[1] directly.

Contradiction 2: IBM d=3 LER — "3–4% per round" vs. "~10% per round"

Severity: HIGH (directly affects cross-platform comparison)

Grok-Premium, OpenAI, Anthropic ^[3]: Report IBM d=3 LER as ~3–4% per syndrome round, improving to ~96% survival probability (~4% LER) with optimizations
Perplexity ^[2]: States "approximately 90 percent logical fidelity per syndrome extraction round... equivalent to 10 percent logical error rate per round"

Analysis: These figures may refer to different experimental conditions, different IBM devices (27-qubit Eagle vs. Heron r2), different numbers of syndrome rounds, or different definitions of "per round." The 96% survival figure from Grok/Anthropic ^[5] and the 90% figure from Perplexity ^[2] could both be correct for different experiments. This is a critical unresolved discrepancy — the difference between 4% and 10% LER per round is enormous for any cross-platform comparison. Do not resolve — requires direct examination of ^[5] vs. ^[2].

Contradiction 3: IBM's QEC Status — "Not yet deployed" vs. "Demonstrated"

Severity: MEDIUM

Anthropic ^[20]: "Error-correction is not yet deployed on Heron-class devices"; "IBM had not publicly shown a logical qubit beating physical qubits"
Grok-Premium, OpenAI ^[2]: IBM has "demonstrated logical qubits and entangled logical qubits" and "below-threshold behavior is observed in stability experiments on Heron r2"
OpenAI ^[18]: "As of 2025, IBM has not demonstrated a breakeven logical qubit with transmons"

Analysis: The contradiction likely reflects different definitions of "deployed" vs. "demonstrated in research context," and different thresholds for what constitutes "below-threshold behavior." IBM has demonstrated QEC in research settings ^[7] but has not achieved the breakeven metric (logical lifetime > physical lifetime) that Google achieved with Willow. Partially resolvable: IBM has demonstrated QEC research results but not breakeven.

Contradiction 4: The [[13,1,3]] Code — "Validated in simulation and small-scale experiments" vs. "No primary derivation in peer-reviewed literature"

Severity: MEDIUM

OpenAI ^[2]: "The existence of 13-qubit distance-3 codes has been validated in simulation and small-scale experiments"
Perplexity ^[3]: "No peer-reviewed paper in the search results provides a primary derivation of a 13-qubit distance-3 code construction"

Analysis: OpenAI cites ^[23] (Kim et al., 2024, "Magic State Injection on IBM Quantum Processors Above the Distillation Threshold") and ^[24] (ResearchGate diagram of [[13,1,3]] surface code in heavy-hex structure). Perplexity's claim that no primary derivation exists may reflect a search gap rather than a true absence. The ResearchGate diagram ^[24] and the IBM paper ^[23] together suggest the code is real and has been implemented. Do not fully resolve — the primary peer-reviewed derivation paper should be identified.

Contradiction 5: CRQC Physical Qubit Estimates — Wide Range

Severity: HIGH (directly affects policy and security implications)

Grok-Premium ^[16]: "<500,000 physical qubits" for ECDLP-256
Anthropic ^[76]: "fewer than 100,000 under newer architectures" (low confidence, 0.74)
Anthropic ^[76]: "fewer than 500,000 for elliptic curve cryptography"
Anthropic ^[76]: "1 million noisy qubits for RSA-2048 running for one week" (May 2025 estimate)
Gemini: "under 500,000 physical qubits and 9 minutes of runtime" for ECDLP-256

Analysis: These figures refer to different targets (RSA-2048 vs. ECDLP-256/secp256k1), different time horizons, and different architectural assumptions. The range from <100,000 to 1,000,000 is not a contradiction per se but reflects genuinely different problem instances and assumptions. The Google March 2026 whitepaper ^[16] figure of <500,000 for ECDLP-256 in ~9 minutes appears to be the most recent and specific. Do not resolve — the specific paper ^[16] should be read directly.

Detailed Synthesis

Part I: The State of QEC on Real Superconducting Hardware (2025–2026)

Google Willow: The Benchmark

The most significant QEC result on real superconducting hardware as of April 2026 remains Google's Willow experiment, published in Nature in late 2024 ^[1]. All five providers independently confirmed the core metrics: a 105-qubit transmon processor running a ZXXZ variant of the surface code [Gemini], demonstrating below-threshold scaling with Λ = 2.14 ± 0.02 per distance increment of 2 [Grok, OpenAI, Anthropic, Gemini, Perplexity]. The d=7 logical memory achieves 0.143% ± 0.003% LER per cycle, with a logical lifetime of 291 ± 6 μs — exceeding the best physical qubit lifetime (~119 μs) by a factor of 2.4 ± 0.3 ^[1].

This "beyond breakeven" achievement is the first credible demonstration of the quantum threshold theorem in superconducting hardware at scale ^[4]. The exponential suppression was confirmed via linear regression of ln(ε_d) vs. d ^[1], and the system maintained performance up to 10⁶ cycles with real-time decoding at ~63 μs average latency ^[17]. Compared to the predecessor Sycamore processor — which achieved ~3.0% per cycle at d=3 and ~2.9% at d=5 ^[1] — Willow represents approximately a 20× improvement in logical error performance [Anthropic].

The d=3 LER question: Critically, no provider found a directly published d=3 LER for Willow. Grok notes that "the paper reports averages over nine different d=3 subgrids on the processor" with a detection probability of approximately 7.7% ^[1] — a directly measured hardware metric. The LER figure of ~0.65% per cycle (range: 0.57–0.70% across providers) is universally treated as inferred by back-calculating from Λ ^[2]. This distinction matters enormously for cross-platform comparison: the inferred d=3 figure carries uncertainty from the extrapolation, and the detection probability metric (7.7%) is arguably more reliable for hardware comparison purposes.

A practical limitation flagged by Grok ^[2]: rare correlated error bursts occurring roughly once per hour create an error floor at approximately 10⁻¹⁰ in repetition codes. This is not a decoder problem — it is a hardware problem that will require engineering solutions for any long-duration fault-tolerant computation.

IBM Heron r2: Competitive Hardware, Lagging QEC Demonstrations

IBM's Heron r2 presents a paradox: hardware specifications that are competitive with Willow (T1/T2 ≈ 213/120 μs on best devices ^[2]; single-qubit gate fidelity >99.9% ^[20]; two-qubit gate error ~0.35–0.5% ^[5]) but QEC demonstrations that lag significantly behind Google's results.

The heavy-hexagonal lattice architecture ^[10] — IBM's signature design choice — provides hardware advantages (reduced crosstalk, lower connectivity requirements) but imposes a structural overhead: the native d=3 code requires 19 physical qubits via n = (5d² − 2d − 1)/2 ^[10], compared to 17 for the standard rotated surface code. IBM's published d=3 demonstrations on Heron-class devices report LER in the range of 3–4% per syndrome round ^[2], with optimizations (replacing reset operations with Pauli frame updates ^[2]) improving survival probability to ~96% per round ^[5].

A critical finding from Perplexity ^[2]: IBM's measurement cycle times increased from 768 ns to 3,000–4,000 ns on Heron. This counterintuitive degradation in cycle speed — despite improved gate fidelities — partially explains the LER gap with Google. Grok confirms that "measurement/reset noise dominates failures on Heron r2" ^[5]. IBM has not demonstrated below-threshold scaling (Λ > 1 going from d=3 to d=5) on superconducting hardware, and has not achieved the breakeven metric ^[2].

IBM's strategic response is notable: rather than competing with Google on surface code metrics, IBM is pivoting to qLDPC codes. Anthropic uniquely surfaces IBM VP Jay Gambetta's characterization of the ~1,000 physical qubits per logical qubit surface code approach as an "engineering pipe dream" ^[2], with IBM's Gross code achieving 12 logical qubits for ~1 million cycles using 288 physical qubits vs. ~3,000 for the equivalent surface code task ^[38]. IBM's Relay BP decoder achieves 5–10× speedup over prior decoders ^[38].

Microsoft: A Different Game

Microsoft's position is categorically different from Google and IBM. Its Majorana 1 processor ^[8] is pursuing topological qubits based on Majorana zero modes — a fundamentally different physical platform with potentially superior error properties if realized. However, the Nature editorial team's conclusion that the manuscript "does not represent evidence for the presence of Majorana zero modes" ^[54] is a serious credibility challenge. Microsoft's approach remains at single-qubit characterization stage ^[8], with next steps involving a 4×2 tetron array ^[51].

Microsoft's genuine contributions to QEC are algorithmic: Floquet codes for topological qubits ^[53] and 4D geometric codes ^[3]. There is no surface-code LER data from Microsoft for any distance. The comparison requested in the query — IBM Heron r2 vs. Google Willow vs. Microsoft hardware at d=3 — is only partially answerable: Google provides inferred d=3 data, IBM provides direct d=3 data (with caveats), and Microsoft provides nothing comparable.

Part II: Physical Qubit Overhead — Standard and Non-Standard

The Standard Counts

The standard rotated planar surface code at d=3 requires 17 physical qubits: 9 data qubits and 8 ancilla/measurement qubits ^[2]. This is confirmed by all five providers and validated in hardware by USTC's Zuchongzhi 2.1 implementation of the [[17,1,3]] code ^[22]. The formula n = 2d² − 1 gives 17 for d=3.

IBM's heavy-hexagonal architecture cannot natively map this 17-qubit layout without SWAP gates ^[2], because heavy-hex qubits connect to only 2–3 neighbors rather than 4 ^[2]. The native heavy-hex d=3 code requires 19 physical qubits via n = (5d² − 2d − 1)/2 ^[10], confirmed by all providers.

The [[13,1,3]] Code: Legitimate and Published

The question of whether 13 physical qubits per d=3 patch is legitimate has a clear answer: yes. The [[13,1,3]] surface code variant is a rigorously defined construction that encodes one logical qubit with distance 3 using 13 physical qubits — approximately 25% fewer than the standard 17-qubit code ^[2].

Gemini provides the most precise architectural description ^[12]: the Surface-13 layout uses 9 data qubits and 4 syndrome qubits, where each syndrome qubit is used twice per QEC cycle. This syndrome qubit reuse is the key innovation enabling the qubit reduction. The code corrects any single X or Z error ^[24] and has been validated in simulation and small-scale experiments on IBM hardware ^[23].

The code appears in the context of IBM's heavy-hex architecture specifically ^[2], where the reduced connectivity of the heavy-hex lattice is actually exploited rather than worked around. Perplexity notes that no peer-reviewed paper in its search results provides a primary derivation ^[3] — a gap that OpenAI's citation of ^[23] (Kim et al., 2024) partially addresses but does not fully resolve.

The theoretical minimum for a single-error-correcting code is 5 physical qubits (the quantum Hamming bound ^[12]), but the [[5,1,3]] code has impractical syndrome extraction requirements for superconducting hardware. The [[13,1,3]] code represents a practical middle ground between the theoretical minimum and the standard 17-qubit implementation.

Part III: Active Inference, Free Energy Principle, and Quantum Computing

What Active Inference Is

Active Inference, developed by Karl Friston, is a Bayesian framework for adaptive systems grounded in the Free Energy Principle (FEP) ^[2]. Under the FEP, any system maintaining its boundary against an environment must minimize variational free energy — a quantity that upper-bounds surprise (negative log model evidence) ^[2]. Agents minimize free energy through two complementary processes: perception (updating internal beliefs to match observations) and action (changing the environment to match predictions) ^[13].

A Helmholtz machine is a related generative model architecture that uses wake-sleep algorithms for approximate inference ^[13] — essentially a bidirectional neural network with a generative model (top-down) and a recognition model (bottom-up). The connection to quantum computing is theoretical: Gemini notes that quantum systems executing Active Inference can have their control flow represented as tensor networks ^[15], and Gemini uniquely identifies the Quantum-Assisted Helmholtz Machine ^[3] — but this runs in the opposite direction (quantum hardware assists classical ML inference, not classical ML assists quantum error correction).

The Null Result: No Hardware Demonstrations

All five providers independently confirmed: there are zero published or preprint results demonstrating Active Inference, geometric manifold methods, or Helmholtz machine-based approaches as classical control layers for real-time QEC on superconducting hardware ^[5]. This is a strong null result.

The theoretical connections exist: ^[25] ("A free energy principle for generic quantum systems") and ^[65] ("Connecting the free energy principle with quantum cognition") establish conceptual bridges, and ^[68] ("Predictive Inference and the Origin of Quantum Phenomena") extends the framework toward quantum mechanics. But none of these papers involve real hardware, syndrome decoding, or error correction.

The closest existing work in the spirit of "novel classical control for QEC" involves:

AlphaQubit (Google): Recurrent Transformer-based neural network decoder, outperforming MWPM on real Sycamore data at d=3 and d=5 ^[2]2. FiLM-conditioned neural decoder (Perplexity, ^[2]): Achieves up to 11.1× LER reduction vs. MWPM on IBM hardware, generalizes without retraining
Relay BP (IBM): 5–10× speedup over prior decoders ^[38]
QpiAI decoder (March 2026): High-speed QEC decoder for superconducting systems ^[95]

These represent the actual frontier of novel classical control for QEC — all are ML/neural network approaches, none are explicitly framed in Active Inference or FEP terms.

Part IV: QEC vs. QEM — The Definitional Question

The distinction between quantum error correction and quantum error mitigation is well-established and consistently reported across all providers ^[4]:

Quantum Error Correction (QEC):

Encodes quantum information redundantly into logical qubits via stabilizer codes
Measures syndromes in real time without disturbing logical information
Applies corrections during computation to preserve the quantum state
Scales exponentially: below threshold, larger codes give exponentially lower LER
Requires physical error rates below a threshold (~1% for surface codes)
Enables indefinitely long computations in principle

Quantum Error Mitigation (QEM):

Operates on physical or shallow-encoded circuits without full FT encoding
Uses extra circuit runs + classical post-processing to estimate ideal expectation values
Does not correct the actual quantum state in real time
Has exponential sampling overhead that does not scale to deep FT algorithms ^[86]
No threshold requirement — works at any error rate, but with diminishing returns

The key definitional question: Does applying a novel classical decoder/controller to standard QEC hardware constitute a new QEC method or QEM?

All four providers who addressed this agree [Grok, OpenAI, Anthropic, Gemini]: it is a decoder improvement within an existing QEC framework — neither a new QEC code nor QEM. The logical encoding, stabilizer structure, and syndrome extraction remain unchanged; only the classical processing of syndrome data changes. This is analogous to AlphaQubit ^[2] or the FiLM decoder ^[2] — improvements to the QEC pipeline that reduce effective LER without changing the underlying code.

The practical difference matters enormously: a better decoder can improve LER and raise effective Λ without requiring new hardware, new code designs, or new physical qubit counts. This makes decoder innovation one of the highest-leverage interventions in the current QEC landscape.

Part V: What Peer-Reviewed Proof Would Look Like

Based on the convergent findings across providers ^[4], a credible peer-reviewed demonstration of improved LER at d=3 via a novel classical control approach would require:

Minimum experimental requirements:

Head-to-head LER comparison: Novel decoder vs. MWPM baseline (the standard) and ideally vs. AlphaQubit/neural network baselines, on the same hardware and same experimental conditions
Statistical power: 10⁵–10⁶ total error correction cycles ^[17] to accumulate sufficient statistics for LER at the ~0.1–1% per cycle level
Real-time operation: Decoder must operate at or below the syndrome cycle time (~1.1 μs for Google, ~3–4 μs for IBM ^[2]) or demonstrate a pipelined architecture that avoids backlog ^[14]
Hardware specification disclosure: Full characterization of physical error rates (T1, T2, gate fidelities, measurement fidelities) to enable reproducibility and comparison
Multiple distance points: At minimum d=3 and d=5 to demonstrate that the improvement persists (or improves) with distance — i.e., that effective Λ increases
Logical vs. physical comparison: Demonstration that the logical qubit lifetime exceeds physical qubit lifetime (breakeven metric)

What would make it compelling:

Demonstration on multiple hardware platforms (not just one vendor's device)
Comparison of effective Λ with and without the novel decoder
Latency characterization showing real-time feasibility
Open-source code release for reproducibility

The FiLM decoder precedent ^[2] — 11.1× LER reduction vs. MWPM on IBM hardware — provides the current benchmark for what a strong decoder paper looks like. A novel Active Inference or geometric manifold approach would need to meet or exceed this bar.

Part VI: CRQC Timelines and Implications

Google's March 2026 whitepaper ^[2] revised the CRQC resource estimate to <500,000 physical qubits for ECDLP-256 (secp256k1) in approximately 9 minutes ^[3]. This represents a dramatic compression from earlier estimates: Anthropic notes a May 2025 paper showed RSA-2048 factorization with <1 million noisy qubits ^[76], down from 20 million in 2019 estimates ^[2]. Google's internal PQC migration deadline of 2029 ^[74] — two years ahead of NSA's 2031 target ^[82] — reflects confidence in this accelerated timeline.

The connection to LER improvements is direct and quantifiable. Grok provides the key formula ^[16]: ε_L ~ (p_phys/p_thr)^{(d+1)/2}, with overhead per logical qubit scaling as O(d²). If a novel decoder raises effective Λ from 2.14 to, say, 4.0, the required code distance for a target logical error rate drops substantially — and since overhead scales as d², this yields quadratic savings in physical qubits per logical qubit ^[16]. Anthropic estimates ^[76] that a Λ improvement from 2 to 4 could reduce required distance by ~30% and roughly halve physical qubit overhead.

For the ~1,200–1,450 logical qubits required for ECDLP-256 ^[16], even a modest reduction in required code distance per logical qubit compounds dramatically across the full system. A decoder that achieves 11.1× LER reduction at d=3 ^[2] — if that improvement persists at higher distances — could potentially reduce the physical qubit requirement by a factor of 2–4×, bringing CRQC within reach of near-term hardware generations.

The implication is bidirectional: better decoders could either (a) accelerate CRQC timelines by enabling fault-tolerant computation at smaller physical qubit counts, or (b) enable more reliable fault-tolerant computation at the same qubit count, improving the practical feasibility of the ~500,000-qubit CRQC. Either way, decoder innovation is a critical variable in CRQC timeline projections that is currently underweighted in public threat assessments.

Evidence Explorer

Select a citation or claim to explore evidence.

Go Deeper

Follow-up questions based on where providers disagreed or confidence was low.

Direct measurement of d=3 LER on Google Willow — does the published Nature paper [src_1] contain explicit d=3 LER data, or only the inferred value from Λ back-calculation, and what is the detection probability metric's relationship to LER under different decoder assumptions?

All five providers treat the d=3 Willow LER as inferred (~0.65%), but Grok uniquely reports a directly measured detection probability of 7.7% at d=3. The relationship between detection probability and LER depends on the decoder, and the primary source should be examined directly to determine whether a direct d=3 LER was published. This gap affects every cross-platform comparison in the analysis.

Low ConfidenceXS tier

Investigate this →

Primary peer-reviewed derivation and hardware validation of the [[13,1,3]] surface code — specifically, does [src_23] (Kim et al., 2024) constitute the primary derivation, what are the measured LER results on IBM hardware, and how does the syndrome qubit reuse mechanism affect fault-tolerance under realistic noise models?

Providers disagree on whether a primary peer-reviewed derivation exists [OpenAI vs. Perplexity], and the architectural detail of syndrome qubit reuse (flagged by Gemini) has significant implications for fault-tolerance that no provider fully analyzed. If the [[13,1,3]] code achieves comparable LER to the [[17,1,3]] code with 25% fewer qubits, this is a significant practical result for near-term hardware.

DisagreementS tier

Investigate this →

Quantitative impact of IBM Heron r2's increased measurement cycle time (768 ns → 3,000–4,000 ns) on d=3 LER, and whether decoder innovations (FiLM, Relay BP, AlphaQubit) can compensate for this architectural disadvantage — specifically, what LER is achievable on Heron r2 at d=3 with state-of-the-art neural decoders?

Perplexity's unique finding that IBM's measurement cycle time *increased* on Heron r2 — combined with Grok's finding that measurement/reset noise dominates failures — suggests IBM's LER gap vs. Google may be addressable through decoder innovation rather than hardware redesign. The FiLM decoder's 11.1× improvement was demonstrated on IBM hardware, making this a tractable near-term research question with direct implications for the IBM vs. Google comparison.

ImplicationM tier

Investigate this →

Formal feasibility analysis of Active Inference / Free Energy Principle as a classical decoder architecture for surface codes — specifically, can the variational free energy minimization framework be operationalized as a real-time syndrome decoder with latency ≤1.1 μs, and how does it compare to MWPM and AlphaQubit on standard benchmark datasets?

All five providers confirmed a complete absence of hardware demonstrations, but the theoretical bridges exist . The question is whether the computational structure of Active Inference (belief propagation over a generative model) is compatible with the latency requirements of real-time QEC. A simulation study comparing Active Inference decoders to MWPM on standard syndrome datasets (e.g., Google's published Willow data) would determine feasibility before any hardware investment.

Low ConfidenceM tier

Investigate this →

Second-order CRQC timeline implications of decoder improvements — specifically, if a novel decoder raises effective Λ from 2.14 to 4.0 on existing hardware, what is the revised physical qubit estimate for ECDLP-256, and does this bring CRQC within reach of hardware generations expected before 2029?

Grok and Anthropic provide the formula and rough estimates , but no provider performed a rigorous calculation connecting decoder improvement (measured as Λ increase) to revised CRQC qubit counts using Google's March 2026 whitepaper as the baseline. Given that Google's 2029 Q-Day projection is already driving enterprise security decisions, a quantitative sensitivity analysis of CRQC timelines to decoder performance would be directly actionable for policy and security planning.

ImplicationL tier

Investigate this →

Key Claims

Cross-provider analysis with confidence ratings and agreement tracking.

149 claims · sorted by confidence

Google Willow’s surface-code experiment achieved below-threshold logical error suppression, including a distance-7, 101-qubit logical memory with logical error rate 0.143% ± 0.003% per cycle and a suppression factor Λ = 2.14 ± 0.02 for each +2 increase in code distance.

high·grok-premium, openai, anthropic, gemini, perplexity·quantamagazine.org research.google nature.com+2·

Google’s Willow is a superconducting quantum processor; sources describe it as a 105-qubit tunable-coupler transmon processor, and one report says Google later demonstrated a distance-7 surface-code logical memory on a 72-qubit Willow device in late 2024.

high·grok-premium, anthropic, gemini, perplexity, openai·quantamagazine.org research.google nature.com+3·

Google Quantum AI’s March 2026 whitepaper estimated that breaking ECDLP-256 (secp256k1) would require under 500,000 physical superconducting qubits, about 1,200–1,450 logical qubits, and roughly 70–90 million Toffoli gates, with runtime on the order of minutes.

high·grok-premium, gemini, openai, anthropic, perplexity·physicsworld.com postquantum.com researchgate.net+5·

Google’s AlphaQubit is a recurrent/Transformer-based machine-learning neural network decoder for the surface code.

high·grok-premium, openai, anthropic, gemini, perplexity·arxiv.org itpro.com pubmed.ncbi.nlm.nih.gov+4·

high·grok-premium, openai, anthropic, gemini, perplexity·research.ibm.com pubmed.ncbi.nlm.nih.gov physicsworld.com+7·

At distance d=7 in the Willow experiment, the logical qubit lifetime was about 291 μs, while the best physical qubit lifetime was about 119 μs, so the logical lifetime was about 2.4× longer; the median physical qubit lifetime was about 85 μs.

high·grok-premium, openai, anthropic, gemini, perplexity·researchgate.net research.google nature.com+2·

Quantum error mitigation reduces the impact of noise in quantum computations, typically using post-processing or circuit-level techniques to estimate ideal expectation values without full logical/FT encoding or massive physical qubit redundancy.

high·grok-premium, openai, anthropic, gemini, perplexity·arxiv.org nature.com azure.microsoft.com+7·

IBM’s Heron R2-family two-qubit gate error is around 0.35–0.5% (with best-qubit values near 0.3%), while single-qubit gate error is around 0.02–0.03% (or >99.9% fidelity).

high·grok-premium, openai, anthropic, perplexity·arxiv.org linkedin.com postquantum.com+1·

Quantum error correction encodes information redundantly in multiple physical qubits and detects, measures, and corrects errors during computation using syndrome measurements and classical decoding.

high·grok-premium, openai, anthropic, perplexity·quantamagazine.org azure.microsoft.com techtarget.com+2·

Quantum error correction can reduce logical error rates exponentially as code distance increases when physical error rates are below threshold.

high·grok-premium, openai, anthropic, perplexity·quantamagazine.org azure.microsoft.com techtarget.com+2·

The rotated heavy-hex method reduces physical-qubit overhead from scaling like about 5d² to about 5d, with a d=3 instance around 15 qubits rather than 18.

high·grok-premium, openai, anthropic, perplexity·arxiv.org arxiv.org quantum-journal.org+2·

Google Willow’s real-time decoder had an average latency of 63 microseconds at d=5 on FPGA.

high·grok-premium, openai, perplexity·research.google nature.com arxiv.org·

Google announced or set a 2029 deadline for completing post-quantum cryptography (PQC) migration, which is earlier than the NSA’s 2030/2031 target.

high·grok-premium, anthropic, gemini·blog.google quantum.microsoft.com ibm.com+4·

Active Inference is a Bayesian framework in cognitive science/neuroscience in which agents or systems minimize variational free energy to reduce prediction error or surprise and thereby support adaptive behavior.

high·grok-premium, openai, gemini·physicsworld.com pmc.ncbi.nlm.nih.gov arxiv.org+1·

IBM uses a heavy-hexagonal lattice architecture for its Heron hardware.

high·openai, perplexity, gemini·research.ibm.com arxiv.org reddit.com+3·

Sources

112 unique sources cited across 149 claims.

Academic53 sources

nature.comvia grok-premium, openai, anthropic, gemini, perplexity

26 claims

Quantum error correction below the surface code threshold

arxiv.orgvia grok-premium, openai, anthropic, gemini, perplexity

25 claims

arxiv.orgvia grok-premium, openai, anthropic, perplexity, gemini

17 claims

Realizing Repeated Quantum Error Correction in a Distance-Three Surface Code

arxiv.orgvia grok-premium, openai, anthropic, gemini, perplexity

11 claims

A free energy principle for generic quantum systems

arxiv.orgvia grok-premium, openai, anthropic, gemini, perplexity

10 claims

link.aps.orgvia openai, anthropic, grok-premium, perplexity, gemini

10 claims

nature.comvia grok-premium, openai, anthropic, gemini, perplexity

9 claims

Physics - Cracking the Challenge of Quantum Error Correction

physics.aps.orgvia grok-premium, openai, anthropic, gemini, perplexity

8 claims

quantum-journal.orgvia grok-premium, openai, anthropic, perplexity, gemini

8 claims

Control ﬂow in active inference systems

arxiv.orgvia grok-premium, openai, anthropic, gemini, perplexity

7 claims

News & Media15 sources

quantamagazine.orgvia grok-premium, openai, anthropic, gemini, perplexity

14 claims

bengoertzel.substack.comvia grok-premium, gemini, openai, anthropic, perplexity

13 claims

Quantum processor enters unprecedented territory for error correction – Physics World

physicsworld.comvia grok-premium, gemini, openai, anthropic, perplexity

7 claims

reddit.comvia openai, perplexity, gemini, grok-premium

6 claims

What is quantum error correction? | TechTarget

techtarget.comvia grok-premium, openai, anthropic, perplexity

5 claims

Thoughts on the 2025 IBM Quantum Roadmap Update | by Jack Krupansky | Medium

jackkrupansky.medium.comvia anthropic, gemini

4 claims

Google's new quantum chip hits error correction target

phys.orgvia anthropic

3 claims

Why I’m Rapidly Losing Faith in the Prospects for Quantum Error Correction | by Jack Krupansky | Medium

jackkrupansky.medium.comvia anthropic

3 claims

Experts weigh in on Microsoft's topological qubit claim – Physics World

physicsworld.comvia anthropic

3 claims

QpiAI Achieves High-Speed Quantum Error Correction on Superconducting Systems with New Decoder Platform

thequantuminsider.comvia anthropic

3 claims

quantum error correctionsurface code d=3logical error rateIBM Heron r2Google Willow13-qubit surface codeActive Inference quantum controlCRQC timelines

Share this research

Read by 5 researchers

QEC Performance on Superconducting Hardware (2025–26)

Cross-Provider Synthesis: Quantum Error Correction on Superconducting Hardware, Active Inference, and CRQC Timelines

Executive Summary

Cross-Provider Consensus

1. Google Willow's Core QEC Performance Metrics

2. d=3 LER for Google Willow is Inferred, Not Directly Measured

3. Standard d=3 Surface Code Qubit Counts

5. Microsoft Has No Comparable Surface-Code LER Data

6. Active Inference / FEP Has Zero Hardware QEC Demonstrations

7. Novel Classical Decoder = QEC Improvement, Not QEM

8. Google's 2029 Q-Day Projection and ~500,000 Qubit CRQC Estimate

Unique Insights by Provider

Grok-Premium

OpenAI

Anthropic

Gemini

Perplexity

Contradictions and Disagreements

Contradiction 1: Google Willow Qubit Count — 72 vs. 105

Contradiction 2: IBM d=3 LER — "3–4% per round" vs. "~10% per round"

Contradiction 3: IBM's QEC Status — "Not yet deployed" vs. "Demonstrated"

Contradiction 4: The [[13,1,3]] Code — "Validated in simulation and small-scale experiments" vs. "No primary derivation in peer-reviewed literature"

Contradiction 5: CRQC Physical Qubit Estimates — Wide Range

Detailed Synthesis

Part I: The State of QEC on Real Superconducting Hardware (2025–2026)

Google Willow: The Benchmark

IBM Heron r2: Competitive Hardware, Lagging QEC Demonstrations

Microsoft: A Different Game

Part II: Physical Qubit Overhead — Standard and Non-Standard

The Standard Counts

The [[13,1,3]] Code: Legitimate and Published

Part III: Active Inference, Free Energy Principle, and Quantum Computing

What Active Inference Is

The Null Result: No Hardware Demonstrations

Part IV: QEC vs. QEM — The Definitional Question

Part V: What Peer-Reviewed Proof Would Look Like

Part VI: CRQC Timelines and Implications

Evidence Explorer

Synthesized from 5 providers on April 10, 2026 using methodical mode

Go Deeper

Direct measurement of d=3 LER on Google Willow — does the published Nature paper [src_1] contain explicit d=3 LER data, or only the inferred value from Λ back-calculation, and what is the detection probability metric's relationship to LER under different decoder assumptions?

Key Claims

Sources

Topics