Understanding Quantum Error Correction Thresholds: A Practical Guide for Experimentalists

Threshold is one of the most frequently cited concepts in quantum error correction, yet it means different things in different contexts. This article unpacks the various thresholds, explains how they are calculated, and provides practical guidance for interpreting them in the lab.

Introduction

In quantum error correction (QEC), the word threshold is ubiquitous. One hears about the surface code threshold, fault-tolerance thresholds, breakeven for logical qubits, and more. However, these notions are not all the same, and conflating them can obscure what has actually been demonstrated in an experiment and what remains to be done.

From the perspective of an experimentalist, thresholds are not just abstract asymptotic numbers; they are tools for:

  • diagnosing whether a device is in a regime where larger codes outperform smaller ones,
  • prioritizing hardware improvements (e.g., gate fidelity vs. readout vs. coherence), and
  • judging how far a given platform (superconducting, trapped ions, photonics) is from fault tolerance.

This article is organized around four questions:

  1. What does the threshold of a code mean in different contexts (code-family scaling thresholds, breakeven, thresholds for logical operations), and what are memory and stabilization experiments?
  2. How are these different thresholds calculated (theoretically and experimentally)?
  3. How should thresholds be interpreted, and how are they useful to experimentalists?
  4. Which hardware parameters are most important for thresholds in different qubit modalities (superconducting circuits, trapped ions, photonics)?

What Do We Mean by "Threshold"? Several Distinct Notions

Code Family Scaling Threshold

The most theoretically fundamental notion is the code family scaling threshold. Consider a family of quantum codes with increasing distance \(d\) (e.g., planar surface codes with lattice size \(d \times d\)). Under a physical noise model characterized by an error rate \(p\), one can define the logical error rate per cycle \(\varepsilon_d(p)\).

The threshold \(p_{\mathrm{th}}\) of the code family (with respect to a given noise model and decoder) is the critical physical error rate such that

\[ p < p_{\mathrm{th}} \quad \Rightarrow \quad \varepsilon_d(p) \to 0 \text{ as } d \to \infty, \]

typically in an exponential fashion. For instance, for the surface code under suitable noise models one often observes an approximate scaling of the form

\[ \varepsilon_d(p) \approx A \left( \frac{p}{p_{\mathrm{th}}} \right)^{(d+1)/2}, \]

where \(A\) is a weakly distance-dependent prefactor. When \(p < p_{\mathrm{th}}\), the logical error rate decreases exponentially with \(d\). When \(p> p_{\mathrm{th}}\), larger codes actually perform worse.
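To make the scaling concrete, here is a minimal sketch that evaluates this ansatz for a few distances. The prefactor \(A = 0.1\) and threshold \(p_{\mathrm{th}} = 10^{-2}\) are illustrative placeholders, not measured or simulated values.

```python
# Illustrative evaluation of eps_d(p) ~ A * (p / p_th)^((d+1)/2).
# A and p_th below are placeholder values, not measured numbers.

def logical_error_per_cycle(p, d, A=0.1, p_th=1e-2):
    """Scaling-ansatz estimate of the logical error rate per cycle."""
    return A * (p / p_th) ** ((d + 1) / 2)

for p in (2e-3, 5e-3, 2e-2):  # two points below and one above the toy threshold
    rates = ", ".join(f"d={d}: {logical_error_per_cycle(p, d):.1e}" for d in (3, 5, 7, 9))
    print(f"p = {p:.0e}: {rates}")
```

Below the toy threshold the rates fall steeply with \(d\); above it they grow with \(d\), reproducing the qualitative behavior described above.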

For the 2D surface code, one typically quotes thresholds of order \(\sim 10.9\%\) for independent bit/phase-flip noise with ideal syndrome measurements (the code-capacity setting), around \(\sim 3\%\) once measurement errors are included phenomenologically, and \(\sim 10^{-3}\)--\(10^{-2}\) for realistic circuit-level depolarizing noise, depending on the exact circuit and decoder.

In experiment, a very convenient diagnostic of being below threshold is the error suppression factor between two consecutive distances

\[ \Lambda_{d/(d+2)}(p) := \frac{\varepsilon_d(p)}{\varepsilon_{d+2}(p)}. \]

If \(\Lambda_{d/(d+2)} > 1\) and grows with distance, the device operates in the below-threshold regime for that noise level.
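As a quick illustration of how \(\Lambda\) might be computed from two measured per-cycle error rates, the sketch below propagates their statistical uncertainties to first order; the rates and error bars are hypothetical numbers, not data.

```python
import math

def suppression_factor(eps_d, sig_d, eps_d2, sig_d2):
    """Lambda = eps_d / eps_{d+2} with a first-order propagated standard error."""
    lam = eps_d / eps_d2
    sig = lam * math.sqrt((sig_d / eps_d) ** 2 + (sig_d2 / eps_d2) ** 2)
    return lam, sig

# Hypothetical per-cycle logical error rates for d = 3 and d = 5.
lam, sig = suppression_factor(3.0e-2, 0.2e-2, 1.4e-2, 0.1e-2)
print(f"Lambda_3/5 = {lam:.2f} +/- {sig:.2f}")  # > 1 suggests below-threshold operation
```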

Breakeven and Beyond-Breakeven

A different but related concept is the breakeven point for a logical qubit. Here, the comparison is not between different code distances, but between an encoded logical qubit and the best available physical qubit on the same device.

Define the lifetime (or characteristic error rate) of the best physical qubit as \(\tau_{\mathrm{phys}}\) (or \(p_{\mathrm{phys}}\)) and that of the encoded logical qubit as \(\tau_{\mathrm{log}}\) (or \(p_{\mathrm{log}}\)). Then:

  • Breakeven corresponds to \(\tau_{\mathrm{log}} = \tau_{\mathrm{phys}}\) (or \(p_{\mathrm{log}} = p_{\mathrm{phys}}\)).
  • Beyond-breakeven corresponds to \(\tau_{\mathrm{log}} > \tau_{\mathrm{phys}}\).

This is a highly practical milestone: it means that the logical qubit is already "paying for itself" compared to just using the single best physical qubit. However, it is logically distinct from crossing the asymptotic code family threshold. Finite-size effects can allow one to reach breakeven even when the device would not exhibit asymptotic exponential suppression at larger distances.

Fault-Tolerant (Computational) Threshold

The fault-tolerant or computational threshold is stricter. It refers to the maximum physical error rate below which arbitrarily long quantum computations can be performed reliably, given a specific fault-tolerance scheme (code, layout, gate set, decoder, and scheduling).

This threshold is typically expressed in terms of a circuit-level error rate, including:

  • state preparation errors,
  • single- and two-qubit gate errors,
  • idle errors,
  • measurement/readout errors,
  • and sometimes leakage and correlated error mechanisms.

For surface codes and related topological codes, the circuit-level threshold is generally quoted in the range \(10^{-3}\)--\(10^{-2}\) for simple noise models. More realistic noise models, architectural constraints, and non-ideal decoders often reduce this effective threshold.

Pseudo-Thresholds for Finite-Distance Codes

In practice, experiments and simulations are limited to modest code distances (e.g., \(d=3,5,7,9\)). The pseudo-threshold is the apparent threshold extracted from such finite-distance data:

\[ \varepsilon_d(p_{\mathrm{pseudo}}) = \varepsilon_{\mathrm{phys}}(p_{\mathrm{pseudo}}), \]

where \(\varepsilon_{\mathrm{phys}}\) is the error rate of an unencoded physical qubit under the same noise model and \(d\) is a given small distance. Alternatively, the pseudo-threshold is taken as the point where the curves for two small distances cross.

Pseudo-thresholds are often higher than the true asymptotic \(p_{\mathrm{th}}\) because finite-size effects artificially boost performance at small \(d\). Nevertheless, they are very relevant for near-term devices: they tell you in which parameter regime a given code and decoder are actually helping.
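As a sketch of how a pseudo-threshold can be located numerically, the snippet below bisects for the crossing between a toy \(\varepsilon_d(p)\) model and the unencoded error rate; the model and its parameters are stand-ins for simulation or experimental data.

```python
def logical_error(p, d, A=0.1, p_th=1e-2):
    """Toy stand-in for eps_d(p); in practice this comes from simulation or experiment."""
    return A * (p / p_th) ** ((d + 1) / 2)

def pseudo_threshold(d, lo=1e-4, hi=1e-1, iters=60):
    """Bisect for the p at which eps_d(p) equals the unencoded error rate p."""
    f = lambda p: logical_error(p, d) - p
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(f"d = 3 pseudo-threshold of the toy model: {pseudo_threshold(3):.2e}")
```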

Thresholds for Logical Operations

So far, we have mostly discussed memory thresholds (preserving a logical qubit). For full fault-tolerant quantum computation, one also needs thresholds for logical operations: logical Clifford gates, magic-state injection, lattice surgery, etc.

These thresholds can differ from the memory threshold, because the circuits used to implement logical operations typically involve more gates, ancillas, and possible failure modes. For example, a code might be below threshold for idling logical qubits but above threshold for a particular logical two-qubit gate circuit.

In practice, one aims to show that:

  • logical operations (e.g., \(\overline{\mathrm{CNOT}}\), \(\overline{H}\), \(\overline{S}\)) are less noisy than their physical counterparts, and
  • their error rates decrease as code distance increases.

This constitutes an operational notion of "threshold for logical operations".

Memory Experiments vs. Stabilization Experiments

To probe these thresholds experimentally, two related but distinct paradigms have emerged: memory experiments and stability/stabilization experiments.

Memory Experiments

A memory experiment tests the ability of a code to preserve a logical state over time, under repeated rounds of error detection and correction. A typical protocol is:

  1. Prepare a logical state, e.g., \(|0_L\rangle\) or \(|+_L\rangle\).
  2. For \(t\) cycles:
    • Measure all stabilizers (syndrome extraction).
    • Feed the syndrome into a decoder (real-time or offline).
    • Apply corrections (or track Pauli frame).
  3. Measure the logical qubit and compare with the initial state.

By repeating this for many runs and varying \(t\) (and the code distance \(d\)), one obtains estimates of the logical error rate per cycle (a short extraction sketch follows the list below). The key signatures of being below threshold are:

  • for fixed \(p\), \(\varepsilon_d(p)\) decreases with increasing \(d\);
  • the suppression factor \(\Lambda_{d/(d+2)}\) is greater than one and tends to increase with \(d\).
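As mentioned above, here is a minimal sketch of that extraction step, assuming independent errors per cycle so that the logical fidelity decays as \(F(t) = \tfrac{1}{2} + \tfrac{1}{2}(1 - 2\varepsilon_d)^t\); the fidelities below are made-up numbers for illustration.

```python
import numpy as np

# Hypothetical measured logical fidelities after t syndrome-extraction cycles.
t = np.array([2, 4, 8, 16, 32])
F = np.array([0.97, 0.945, 0.90, 0.82, 0.69])

# Linear fit of ln(2F - 1) versus t; the slope equals ln(1 - 2*eps).
slope, _ = np.polyfit(t, np.log(2 * F - 1), 1)
eps = (1 - np.exp(slope)) / 2
print(f"estimated logical error per cycle: {eps:.3f}")
```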

Stabilization (Stability) Experiments

Stabilization experiments are a closely related but more flexible paradigm, conceptually introduced as the "overlooked dual" of memory experiments. The idea is:

  1. Prepare a code state in which some global check has a known, deterministic value (e.g., the product of all stabilizers of one type over the patch).
  2. Repeatedly measure the local stabilizers for \(r\) rounds.
  3. Monitor whether the global stabilizer eigenvalue is preserved.

In such experiments, one effectively treats the number of rounds \(r\) as a temporal code distance. A key advantage is that a relatively small physical patch (e.g., a \(3 \times 3\) array) can emulate much larger effective distances by running more rounds of syndrome measurements.

This has several consequences:

  • One can demonstrate scaling behaviors (e.g., error suppression with "distance") without needing a physically large 2D array.
  • Stabilization experiments are experimentally easier to scale, since additional temporal distance costs extra rounds of syndrome measurement rather than extra qubits.
  • They are particularly well-suited to early-stage devices with limited qubit counts but reasonably reliable mid-circuit measurement and feedback.

How Are Thresholds Calculated?

Monte Carlo Simulations with Decoders

The most common approach is to perform Monte Carlo simulations of the full QEC cycle under a specified noise model: for each distance \(d\) and physical error rate \(p\), many noisy syndrome-extraction histories are sampled, decoded, and compared against the true logical outcome to estimate \(\varepsilon_d(p)\). The threshold \(p_{\mathrm{th}}\) is then estimated from the crossing point of the curves \(\varepsilon_d(p)\) for successive distances.
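A minimal sketch of this workflow is shown below, assuming the open-source packages stim (for sampling circuit-level noise) and pymatching (for minimum-weight perfect matching decoding) are available; the specific circuit family and noise strengths are illustrative choices, not a prescription.

```python
import numpy as np
import stim
import pymatching

def logical_error_rate(distance, p, shots=100_000):
    """Estimate the per-run logical error rate of a noisy surface-code memory."""
    circuit = stim.Circuit.generated(
        "surface_code:rotated_memory_z",
        distance=distance,
        rounds=distance,
        after_clifford_depolarization=p,
        before_measure_flip_probability=p,
        after_reset_flip_probability=p,
    )
    matcher = pymatching.Matching.from_detector_error_model(
        circuit.detector_error_model(decompose_errors=True))
    detectors, observables = circuit.compile_detector_sampler().sample(
        shots, separate_observables=True)
    predictions = matcher.decode_batch(detectors)
    return np.mean(np.any(predictions != observables, axis=1))

# Curves eps_d(p) for successive distances; their crossing locates the threshold.
for d in (3, 5, 7):
    print(d, [round(logical_error_rate(d, p), 4) for p in (0.004, 0.008, 0.012)])
```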

Mappings to Statistical Mechanics

For certain codes and noise models, the QEC problem maps to a classical statistical mechanics model, such as an Ising model with random couplings. In these cases, the threshold corresponds to a phase transition point (a critical temperature or disorder strength).

Coherent-Information-Based Methods

Recent work has proposed using the coherent information of the noisy encoded state as a diagnostic of the threshold. By computing coherent information as a function of \(p\) for small distances \(d\), one can obtain accurate estimates of the optimal threshold without explicitly simulating large-distance codes or running a decoder.

Experimental Threshold Extraction

Experimentally, thresholds are inferred from observed scaling behavior in memory or stabilization experiments. The onset of a regime where \(\Lambda_{d/(d+2)} > 1\) and remains \(>1\) as \(d\) increases is taken as experimental evidence of operating below threshold.
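When several distances are available, one can go beyond pairwise ratios and fit a single suppression factor across all of them; the sketch below assumes the same exponential ansatz as above, with hypothetical per-cycle rates.

```python
import numpy as np

d = np.array([3, 5, 7])
eps = np.array([3.0e-2, 1.4e-2, 0.65e-2])  # hypothetical per-cycle logical error rates

# Fit eps_d = C / Lambda^((d+1)/2), i.e. a straight line in ln(eps_d) versus (d+1)/2.
slope, _ = np.polyfit((d + 1) / 2, np.log(eps), 1)
Lambda = np.exp(-slope)
print(f"fitted Lambda = {Lambda:.2f}")  # Lambda > 1 indicates below-threshold scaling
```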

How to Interpret Thresholds in Practice

The Error Suppression Factor \(\Lambda\)

The ratio

\[ \Lambda_{d/(d+2)} = \frac{\varepsilon_d}{\varepsilon_{d+2}} \]

plays a central role in experimental analysis.

  • If \(\Lambda_{d/(d+2)} < 1\), increasing code distance makes things worse: you are above threshold.
  • If \(\Lambda_{d/(d+2)} \approx 1\), you are near the threshold.
  • If \(\Lambda_{d/(d+2)} > 1\), and especially if it grows with \(d\), you are below threshold.

Error Budgeting and Prioritization

Threshold analysis naturally leads to an error budget, where the contribution of different physical error sources to the logical error rate is quantified. By fitting experimental data to a noise model, one can identify which parameter improvement yields the greatest reduction in logical error. This guides where to invest engineering effort: faster/quieter gates, better resonators, improved amplifiers and readout chains, shielding against cross-talk, etc.
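The sketch below shows one way such a budget might be organized as a first-order (linearized) model; the channels, sensitivities, and achievable improvements are hypothetical placeholders that would be replaced by values fitted to device data.

```python
# Hypothetical linearized error budget: eps_L ~ sum_i sensitivity_i * p_i.
budget = {
    # channel: (current error rate, d(eps_L)/d(p_i), plausible fractional reduction)
    "two-qubit gate":   (5e-3, 4.0, 0.5),
    "readout":          (1e-2, 1.5, 0.3),
    "idle/decoherence": (2e-3, 2.0, 0.2),
}

# Rank channels by how much logical error a realistic improvement would buy.
for name, (p, sens, frac) in sorted(
        budget.items(), key=lambda kv: -kv[1][0] * kv[1][1] * kv[1][2]):
    print(f"{name:>18}: contributes ~{sens * p:.1e}; "
          f"a {frac:.0%} reduction buys ~{sens * p * frac:.1e}")
```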

Hardware Parameters and Thresholds in Different Qubit Modalities

Superconducting Qubits

Superconducting transmon qubits are one of the leading platforms for implementing 2D surface codes. Key parameters include:

  • Readout fidelity: A typical bottleneck. Readout errors at the percent level can contribute a large fraction of the logical error budget.
  • Two-qubit gate error: Target \(\lesssim 10^{-3}\). Often the main source of gate-induced errors.
  • Leakage: Population leaving the computational subspace can significantly degrade performance.

Trapped Ion Qubits

Trapped ions offer very high-fidelity operations and long coherence times. The main challenges include scaling to 2D layouts, managing motional mode structure and heating, and controlling cross-talk at scale. They are in a favorable regime for gate and measurement fidelities but must address architectural scaling.

Photonic Qubits

Photonic platforms differ fundamentally, with loss rather than depolarizing noise as the dominant error mechanism. Thresholds are typically expressed in terms of maximum tolerable loss per link or per time step. Key parameters include photon loss rate, detector efficiency, and source efficiency.

Conclusion

The notion of threshold in quantum error correction is multifaceted: there are scaling thresholds for code families, practical breakeven points for logical qubits, thresholds for logical operations, and strict computational thresholds for full fault tolerance. Memory and stabilization experiments provide two complementary ways of probing whether a device is operating in a below-threshold regime.

For experimentalists, thresholds are less about quoting a single number and more about using scaling behavior—captured succinctly by the error suppression factor \(\Lambda\)—to guide hardware improvements and architectural decisions. As devices continue to improve, understanding and correctly interpreting these thresholds is a crucial step on the path to practical fault-tolerant quantum computation.
