Table of Contents
- Introduction: Why Should You Care About Decoders?
- The Decoding Problem: Surface Codes as a Starting Point
- The Major Decoding Techniques
- Decoding Beyond the Surface Code: The qLDPC Frontier
- The Hardware Revolution: FPGAs, ASICs, and GPUs
- A Timeline of Decoder Progress
- Where Are We Heading?
- References
Introduction: Why Should You Care About Decoders?
Let me start with a confession: when I first got into quantum error correction (QEC), I thought the hard part was designing clever codes. You know, finding some beautiful algebraic structure that protects quantum information from the chaos of noise. And sure, code design is fascinating. But here is the thing I quickly learned: a code is only as good as its decoder.
Think of it this way. Imagine you have a fire alarm system installed throughout a building. The sensors (stabilizer measurements) go off telling you something is wrong, but then you need someone (the decoder) to figure out where exactly the fire is and what to do about it. If that someone is too slow, the building burns down. If that someone misreads the signals, you end up hosing the wrong room. That is essentially the decoder problem in QEC.
In this blog post, I want to walk you through the world of QEC decoders: what the decoding problem actually is, why it is so surprisingly hard, and how the community has made remarkable progress over the past several years. We will start with surface codes as our pedagogical workhorse, tour through the major decoding algorithms, step into the wilder territory of quantum LDPC (qLDPC) codes, and finally look at how classical hardware (FPGAs, ASICs, GPUs) is being leveraged to make real-time decoding a reality.
The Decoding Problem: Surface Codes as a Starting Point
What Is a Surface Code, Anyway?
The surface code is the "poster child" of quantum error correction, and for good reason. It is defined on a two-dimensional square lattice where \(n = L^2\) data qubits encode \(k = 1\) logical qubit with code distance \(d = L\) [1, 2]. In addition, you have \(n_A = L^2 - 1\) ancilla qubits that are used to measure the stabilizers, those parity checks that tell you whether an error has happened.
The beauty of the surface code is its locality: you only need nearest-neighbor interactions on a 2D grid. This is a dream for hardware platforms like superconducting circuits and neutral atoms, where qubit connectivity is inherently limited. The error threshold for the surface code sits around 0.75% to 1% per operation [3, 4], meaning if your physical error rate is below this value, you can in principle suppress logical errors exponentially by increasing the code distance.
So, What Exactly Does a Decoder Do?
Here is the setup. Your logical qubit is humming along, encoded across many physical qubits. Noise hits. Errors (Pauli \(X\), \(Y\), or \(Z\) flips) land on some subset of your data qubits. You do not know which ones. What you can do is measure the stabilizers to get the syndrome: a binary string that tells you which stabilizer checks returned \(-1\).
The decoder's job is to take this syndrome \(\mathbf{s}\) and figure out the most likely error \(\hat{E}\) that caused it. Formally, we want to find
\[
\hat{E} = \operatorname*{argmax}_{E \,:\, \operatorname{syn}(E) = \mathbf{s}} P(E),
\]
where \(\operatorname{syn}(E)\) denotes the syndrome produced by the error \(E\). (Strictly speaking, the optimal decoder maximizes the total probability of a logical equivalence class of errors rather than of a single error; more on degeneracy below.)
This is the maximum-likelihood decoding (MLD) problem. If we get the right equivalence class of error (i.e., our correction \(\hat{E}\) times the actual error \(E\) is a stabilizer element), the logical qubit is saved. If not, we have a logical error and the computation is corrupted.
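To make this concrete, here is a minimal toy example (my own illustration, using the 3-qubit repetition code rather than a surface code) of how a syndrome is computed from a parity-check matrix:

```python
import numpy as np

# Toy example (not a surface code): the 3-qubit repetition code, whose
# two parity checks play the role of stabilizer measurements.
H = np.array([[1, 1, 0],
              [0, 1, 1]])      # each row is one parity check
error = np.array([0, 1, 0])    # an unknown flip on the middle qubit

syndrome = H @ error % 2       # which checks "fire" (return -1)
print(syndrome)                # [1 1]: both checks adjacent to the flip
```

The decoder sees only `syndrome`, never `error`, and must infer the latter from the former.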
Why Is Decoding So Hard?
There are two layers of difficulty here. First, exact maximum-likelihood decoding is computationally intractable for generic stabilizer codes: it is #P-hard [7]. That means even for moderate code sizes, you cannot just brute-force your way through all possible errors.
Second, and this is the really nasty part: decoding has to happen in real time. Every microsecond you spend thinking about what error happened, new errors are piling up. If your decoder cannot keep up with the syndrome extraction rate (roughly every 1 microsecond for superconducting qubits), you develop a "backlog" of unprocessed syndromes that grows exponentially, eventually tanking your logical clock rate [8, 5]. This is sometimes called the backlog problem.
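To see the backlog effect in the simplest possible terms, here is a toy queue model (my own sketch, not the full argument of [8], which shows the slowdown compounds exponentially once logical operations must wait on decoding results):

```python
def outstanding_work(rounds, t_round=1.0, t_decode=1.3):
    """Toy queue model: syndrome rounds arrive every t_round microseconds,
    and the decoder needs t_decode microseconds of work per round."""
    backlog = 0.0                              # unprocessed work, in us
    for _ in range(rounds):
        backlog += t_decode                    # a new round's work arrives...
        backlog = max(0.0, backlog - t_round)  # ...as the decoder chips away
    return backlog

# A decoder 30% too slow falls behind without bound:
print(round(outstanding_work(1000)))          # 300 us behind after 1 ms
# A decoder faster than the arrival rate stays caught up:
print(outstanding_work(1000, t_decode=0.8))   # 0.0
```

The moral: average throughput must exceed the syndrome arrival rate, or no amount of buffering saves you.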
So the decoder must be simultaneously:
- Accurate: it should approximate maximum-likelihood decoding as closely as possible (high threshold, low logical error rate).
- Fast: it must keep pace with the syndrome extraction rate, ideally sub-microsecond per syndrome round.
- Scalable: it should not blow up in complexity as you increase the code distance.
This triple constraint is what makes decoder design one of the most important open challenges in fault-tolerant quantum computing.
What Does Performance Depend On?
The performance of a decoder depends on several factors:
- The noise model: Is the noise depolarizing? Biased? Correlated? Circuit-level (including measurement errors) or code-capacity (ideal measurements)?
- The code distance \(d\): Larger codes can correct more errors but demand more from the decoder.
- The code structure: Surface codes have a planar, local structure that matching-based decoders exploit beautifully. General qLDPC codes do not.
- Degeneracy: In quantum codes, many distinct physical errors map to the same syndrome. A good decoder must account for this degeneracy rather than just finding one specific error pattern.
The Major Decoding Techniques
Let me now walk you through the main families of decoders that have shaped the field. I will try to give you the intuition behind each one, so you can appreciate their strengths and weaknesses without wading through pages of formal proofs.
Minimum-Weight Perfect Matching (MWPM)
MWPM is the "classic" surface code decoder and probably the most widely used one to date [5, 6]. The idea is elegantly simple.
In the surface code, errors create pairs of syndrome defects (the stabilizers that flip to \(-1\)). An \(X\)-error on a data qubit, for instance, anticommutes with the two adjacent \(Z\)-stabilizers, lighting them up. A chain of errors creates defects at its two endpoints. The decoder's job is to pair up these defects in a way that minimizes the total weight of the error chains connecting them.
This is exactly the minimum-weight perfect matching problem on a graph, which was solved by Jack Edmonds in 1965 with his famous "blossom algorithm" [6]. For QEC, you construct a complete graph where nodes are syndrome defects, edges are weighted by the log-likelihood of the shortest error chain between pairs, and then you find the minimum-weight perfect matching.
The MWPM decoder has a code-capacity threshold around 10.3% for the surface code (close to the optimal ~10.9%), around 2.9% under phenomenological noise, and roughly 0.5% to 1% for full circuit-level noise. Its time complexity is \(\mathcal{O}(n^3)\) in the worst case for the standard blossom algorithm, but modern implementations have dramatically improved this.
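To make the pairing step concrete, here is a deliberately naive brute-force version (my own sketch: real decoders use the blossom algorithm instead of enumeration, and edge weights come from log-likelihoods rather than plain Manhattan distance):

```python
def all_pairings(defects):
    """Yield every way of pairing up an even-sized list of defects."""
    if not defects:
        yield []
        return
    first, rest = defects[0], defects[1:]
    for i, partner in enumerate(rest):
        for tail in all_pairings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail

def min_weight_pairing(defects):
    """Brute-force MWPM: minimize total Manhattan distance between pairs.
    Exponential in the number of defects -- illustration only; blossom
    solves the same problem in polynomial time."""
    def weight(pairing):
        return sum(abs(ax - bx) + abs(ay - by)
                   for (ax, ay), (bx, by) in pairing)
    return min(all_pairings(defects), key=weight)

# Four lit-up stabilizers: two short error chains are the likeliest cause.
defects = [(0, 0), (0, 1), (3, 3), (3, 4)]
print(min_weight_pairing(defects))
# [((0, 0), (0, 1)), ((3, 3), (3, 4))]
```

The matched pairs tell you where to apply corrections: along the shortest paths connecting each pair of defects.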
Sparse Blossom by Higgott and Gidney [9] is a beautiful rethinking of the blossom algorithm tailored specifically for QEC. Instead of constructing the full graph and running all-to-all Dijkstra searches, sparse blossom works directly on the sparse detection graph. It processes over a million errors per core-second for distance-17 surface codes under 0.1% circuit-level noise, fast enough to match the syndrome extraction rate of superconducting qubits on a single CPU core. It is open-source as PyMatching v2 [10].
Fusion Blossom by Wu and Zhong [11] takes a different but complementary approach: parallelism. It recursively divides the decoding problem into sub-problems that can be solved independently and then fuses their solutions. At 0.1% circuit-level noise, Fusion Blossom can decode a million measurement rounds per second up to code distance 33 on a 64-core CPU, and supports a stream decoding mode with 0.7 ms latency at distance 21.
Union-Find Decoder
The Union-Find (UF) decoder was introduced by Delfosse and Nickerson [12] as a faster alternative to MWPM, trading a small amount of accuracy for a significant speedup. The key idea is to grow clusters around syndrome defects and merge them using a disjoint-set (union-find) data structure.
Here is how it works intuitively. Each defect starts as its own little cluster. Clusters grow outward, one step at a time. When two clusters touch, they merge. Growth continues until every cluster has an even number of defects (so they can be paired internally). Then a peeling decoder efficiently finds a correction within each cluster. The amortized time complexity is nearly linear in the number of qubits, \(\mathcal{O}(n \alpha(n))\) where \(\alpha\) is the inverse Ackermann function.
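The disjoint-set structure at the heart of the decoder is simple enough to sketch. Below is a minimal illustration (my own, omitting the lattice growth and peeling stages) of how clusters merge and how their defect parity is tracked:

```python
class Clusters:
    """Disjoint-set bookkeeping for Union-Find decoding: track which
    nodes belong to which cluster and each cluster's defect count."""
    def __init__(self, n_nodes, defect_nodes):
        self.parent = list(range(n_nodes))
        self.defects = [1 if v in defect_nodes else 0 for v in range(n_nodes)]

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path compression
            v = self.parent[v]
        return v

    def merge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.defects[ra] += self.defects[rb]

    def is_even(self, v):
        """A cluster with an even defect count can be corrected internally."""
        return self.defects[self.find(v)] % 2 == 0

# Six lattice sites with defects at 1 and 4; grow and merge clusters.
c = Clusters(6, {1, 4})
c.merge(1, 2)
print(c.is_even(1))   # False: cluster {1,2} holds a single defect
c.merge(2, 3); c.merge(3, 4)
print(c.is_even(1))   # True: defects 1 and 4 now share a cluster
```

In the full decoder, growth stops exactly when every cluster reports `is_even`, at which point the peeling step takes over.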
The threshold of UF is slightly lower than MWPM (around 9.9% code-capacity for the surface code), but its speed advantage is substantial. What makes UF particularly exciting is its suitability for hardware implementation. Its local, cluster-growing behavior maps naturally onto parallel architectures.
A notable variant is Actis by Chan and Benjamin [13], a strictly local Union-Find decoder where all operations are nearest-neighbor. This has a worst-case runtime of \(\mathcal{O}(d^3)\) and is designed to be directly mapped onto a lattice of simple processors.
Belief Propagation (and Why It Struggles)
Belief propagation (BP) is the workhorse decoder of classical LDPC codes. It is an iterative message-passing algorithm that runs on the Tanner graph of the code, passing probability estimates ("beliefs") back and forth between variable nodes (qubits) and check nodes (stabilizers) [14].
For classical LDPC codes, BP works spectacularly well, operating near the Shannon limit in many cases. For quantum codes, however, it runs into serious trouble:
- Short cycles: The Tanner graphs of quantum stabilizer codes inherently contain many short cycles (length-4 cycles from the CSS construction), causing the BP messages to become correlated and degrade performance.
- Degeneracy: Quantum codes are highly degenerate, meaning many physically distinct errors are logically equivalent. Standard BP does not naturally account for this.
Because of these issues, raw BP has historically been considered "not effective for topological codes" [15]. However, researchers have found clever ways to augment BP, which brings us to the next set of decoders.
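For readers who have not met BP before, here is a minimal min-sum variant for a classical binary code (my own sketch; quantum syndrome decoding uses the same message-passing structure, but this classical channel-decoding version is easier to follow):

```python
import numpy as np

def min_sum_bp(H, llr, iters=20):
    """Minimal min-sum belief propagation for a classical binary code.
    H: parity-check matrix; llr: per-bit channel log-likelihood ratios
    (positive = probably 0). Returns the hard-decision estimate."""
    m, n = H.shape
    M = np.zeros((m, n))                        # check -> variable messages
    for _ in range(iters):
        # Variable -> check: total belief minus the incoming message.
        V = (llr + M.sum(axis=0) - M) * H
        new_M = np.zeros_like(M)
        for c in range(m):
            nbrs = np.flatnonzero(H[c])
            vals = V[c, nbrs]
            signs = np.where(vals >= 0, 1.0, -1.0)
            for t, j in enumerate(nbrs):
                others = np.delete(np.abs(vals), t)
                # Product of the *other* signs times the smallest other magnitude.
                new_M[c, j] = signs.prod() * signs[t] * others.min()
        M = new_M
        bits = ((llr + M.sum(axis=0)) < 0).astype(int)
        if not np.any(H @ bits % 2):            # all checks satisfied
            break
    return bits

# 3-bit repetition code; the channel suspects the middle bit was flipped.
H = np.array([[1, 1, 0], [0, 1, 1]])
print(min_sum_bp(H, np.array([2.0, -1.0, 2.0])))   # [0 0 0]
```

On a tree-like Tanner graph, messages like these converge to exact marginals; the short cycles of quantum codes are precisely what breaks that guarantee.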
BP+OSD: The Gold Standard for qLDPC Codes
Belief propagation with ordered statistics decoding (BP+OSD) was proposed by Roffe, White, Burton, and Campbell [16] as a general-purpose decoder for quantum LDPC codes. The strategy is a two-stage approach:
- Stage 1 (BP): Run belief propagation for a set number of iterations. If it converges to a valid solution, great, you are done.
- Stage 2 (OSD): If BP fails to converge (which happens frequently for quantum codes), invoke ordered statistics decoding as a post-processing step. OSD uses the soft reliability information from BP to order the bits by confidence, then performs Gaussian elimination on a reordered parity-check matrix to find a valid correction.
OSD comes in several flavors: OSD-0 (cheapest, just finds one solution), OSD-CS (combination sweep, tries multiple candidate solutions), and higher-order variants that explore a wider search space at higher computational cost. The combination sweep variant with order \(w\) is denoted OSD-CS-\(w\).
BP+OSD has become the "gold standard" for decoding general qLDPC codes [16, 17]. It achieves good performance across a wide landscape of codes, from hypergraph product codes to bivariate bicycle codes. The catch is its computational cost: the OSD step involves matrix inversion with worst-case cubic complexity \(\mathcal{O}(n^3)\), which can be slow for large codes.
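The heart of the OSD step is GF(2) Gaussian elimination over a reliability-sorted parity-check matrix. Here is a simplified OSD-0 sketch (my own illustration; `p_error` stands in for the per-qubit error probabilities that BP would supply):

```python
import numpy as np

def osd0(H, syndrome, p_error):
    """Simplified OSD-0: order columns so the qubits most likely to be in
    error come first, then solve H e = s by GF(2) Gaussian elimination,
    setting all non-pivot bits to zero."""
    m, n = H.shape
    order = np.argsort(-p_error)           # most suspect qubits first
    A = H[:, order].copy() % 2
    s = syndrome.copy() % 2
    pivots = []                            # (row, column) of each pivot
    r = 0
    for c in range(n):
        hit = np.flatnonzero(A[r:, c])     # find a 1 at or below row r
        if hit.size == 0:
            continue
        A[[r, r + hit[0]]] = A[[r + hit[0], r]]
        s[[r, r + hit[0]]] = s[[r + hit[0], r]]
        for i in range(m):                 # clear the column elsewhere
            if i != r and A[i, c]:
                A[i] ^= A[r]
                s[i] ^= s[r]
        pivots.append((r, c))
        r += 1
        if r == m:
            break
    e_sorted = np.zeros(n, dtype=int)
    for row, col in pivots:                # non-pivot bits stay 0
        e_sorted[col] = s[row]
    e = np.zeros(n, dtype=int)
    e[order] = e_sorted                    # undo the reliability ordering
    return e

H = np.array([[1, 1, 0], [0, 1, 1]])
e = osd0(H, np.array([1, 1]), p_error=np.array([0.05, 0.30, 0.05]))
print(e, (H @ e) % 2)   # [0 1 0] [1 1]: the correction reproduces the syndrome
```

Higher-order OSD variants then flip small subsets of the low-reliability (non-pivot) bits and keep whichever candidate solution is most probable; that search is where the extra cost of OSD-CS comes from.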
BP+LSD: Speeding Things Up with Locality
Belief propagation with localized statistics decoding (BP+LSD) was developed as a faster alternative to BP+OSD for large qLDPC codes [18]. The key insight is that when BP fails, the residual errors tend to cluster in localized regions of the code. Instead of inverting the entire parity-check matrix (as OSD does), LSD identifies these local error clusters and performs the matrix inversion independently on each cluster.
This localization dramatically reduces the computational overhead for large codes while maintaining decoding performance on par with BP+OSD. The developers demonstrated BP+LSD on surface codes, hypergraph product codes, and bivariate bicycle codes under circuit-level depolarizing noise, showing that it matches the logical error rates of BP+OSD-CS-7 at a fraction of the computational cost.
Other Notable Decoding Approaches
Beyond the "big four" (MWPM, UF, BP+OSD, BP+LSD), several other fascinating decoders deserve a mention:
- Tensor Network Decoders: These approximate the maximum-likelihood decoder by contracting a tensor network representation of the error probability distribution. They can be very accurate but tend to be computationally expensive, with complexity depending on the bond dimension \(\chi\) [19, 20].
- Cellular Automaton Decoders: Inspired by Conway's Game of Life, these use local update rules to propagate information across the lattice. They are inherently parallel and have provable thresholds for topological codes [21].
- Neural Network Decoders: Machine learning approaches to decoding have gained huge traction. From convolutional neural networks for small surface codes to recurrent transformer-based architectures, NNs can learn complex correlations in the noise that rule-based decoders miss. A recent standout is the Google/DeepMind transformer decoder that outperforms correlated MWPM on experimental data [22].
- Renormalization Group (RG) Decoders: Also known as hard-decision renormalization group (HDRG) decoders, these coarse-grain the syndrome at multiple scales, making local corrections at each level.
A Quick Comparison
Table 1 gives a rough comparison of the major decoder families for the surface code.
| Decoder | Threshold | Complexity | Hardware-Friendly? |
|---|---|---|---|
| ML (exact) | Optimal (~10.9%) | #P-hard | No |
| MWPM | ~10.3% | \(\mathcal{O}(n^3)\); near-linear (sparse) | Moderate |
| Union-Find | ~9.9% | Near-linear | Yes |
| BP+OSD | ~10%+ | \(\mathcal{O}(n^3)\) (OSD step) | No |
| Tensor Network | Near-optimal | \(\mathcal{O}(n\chi^3)\) | No |
| Neural Network | Varies | \(\mathcal{O}(1)\) inference | GPU-friendly |
Decoding Beyond the Surface Code: The qLDPC Frontier
Why Move Beyond Surface Codes?
The surface code is wonderful, but it has a fundamental limitation: its encoding rate is terrible. You need \(\mathcal{O}(d^2)\) physical qubits to encode just one logical qubit with distance \(d\). For a practical quantum computer running, say, Shor's algorithm to factor a 2048-bit number, you might need millions of physical qubits just for error correction overhead.
Quantum LDPC (qLDPC) codes promise much better scaling. These codes can encode \(k\) logical qubits into \(n\) physical qubits with \(k/n\) remaining constant (or even growing) as \(n\) increases, while still maintaining good distance. The catch? Their Tanner graphs are no longer planar, meaning the matching-based decoders that work so beautifully for surface codes do not directly apply.
Bivariate Bicycle Codes: The New Darling
In 2024, Bravyi, Cross, Gambetta, Maslov, Rall, and Yoder at IBM introduced bivariate bicycle (BB) codes, a family of qLDPC codes that caused a lot of excitement [23]. These codes are constructed from two circulant matrices over a bivariate polynomial ring \(\mathbb{F}_2[x,y]/(x^\ell - 1, y^m - 1)\).
The star of the show is the \(\llbracket 144, 12, 12 \rrbracket\) "gross" code, which encodes 12 logical qubits in 144 physical qubits with distance 12. Compare this to the surface code, where encoding 12 logical qubits at distance 12 would require \(12 \times 144 = 1728\) data qubits (plus a comparable number of ancillas). That is roughly an order of magnitude improvement in qubit overhead.
Other notable members of the family include:
- \(\llbracket 72, 12, 6 \rrbracket\): a smaller code for near-term experiments
- \(\llbracket 288, 12, 18 \rrbracket\): a larger "two-gross" code for higher distances
- \(\llbracket 90, 8, 10 \rrbracket\): another promising code with good parameters
The stabilizer weights are constant (6 or 8) regardless of code size, which is excellent for practical implementation, though the checks are non-geometrically-local.
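The construction is concrete enough to sketch in a few lines. The snippet below builds the gross code's check matrices from the monomials reported in [23] (\(A = x^3 + y + y^2\), \(B = y^3 + x + x^2\) with \(\ell = 12\), \(m = 6\)); the helper names are my own:

```python
import numpy as np

def bb_code(l, m, a_terms, b_terms):
    """Build the CSS check matrices of a bivariate bicycle code.
    a_terms/b_terms are lists of (i, j) exponent pairs: each pair
    contributes the monomial x^i y^j to the polynomial A or B."""
    x = np.kron(np.roll(np.eye(l, dtype=int), 1, axis=1), np.eye(m, dtype=int))
    y = np.kron(np.eye(l, dtype=int), np.roll(np.eye(m, dtype=int), 1, axis=1))
    def poly(terms):
        return sum(np.linalg.matrix_power(x, i) @ np.linalg.matrix_power(y, j)
                   for i, j in terms) % 2
    A, B = poly(a_terms), poly(b_terms)
    Hx = np.hstack([A, B])                  # X-checks
    Hz = np.hstack([B.T, A.T])              # Z-checks
    return Hx, Hz

def gf2_rank(M):
    """Rank over GF(2) by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for c in range(M.shape[1]):
        hit = np.flatnonzero(M[rank:, c])
        if hit.size == 0:
            continue
        M[[rank, rank + hit[0]]] = M[[rank + hit[0], rank]]
        for i in range(M.shape[0]):
            if i != rank and M[i, c]:
                M[i] ^= M[rank]
        rank += 1
        if rank == M.shape[0]:
            break
    return rank

# Gross code: l=12, m=6, A = x^3 + y + y^2, B = y^3 + x + x^2.
Hx, Hz = bb_code(12, 6, [(3, 0), (0, 1), (0, 2)], [(0, 3), (1, 0), (2, 0)])
n = Hx.shape[1]
k = n - gf2_rank(Hx) - gf2_rank(Hz)
print(n, k)                                # 144 12
assert not np.any(Hx @ Hz.T % 2)           # CSS commutation condition
```

Note that the commutation check holds for any choice of \(A\) and \(B\), because circulant shift matrices commute; it is the polynomial choice that determines \(k\) and \(d\).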
Hypergraph Product and Lifted Product Codes
Bivariate bicycle codes are not the only game in town. Two other important families of qLDPC codes are:
Hypergraph product (HGP) codes [24]: These take two classical LDPC codes and combine them using a product construction. They naturally give quantum codes with parameters \(\llbracket n, k, d \rrbracket\) where \(k\) and \(d\) both grow with \(n\). A nice feature is that their syndrome extraction circuits have low depth.
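A pleasant sanity check on the construction: the hypergraph product of two \([3,1]\) repetition codes is exactly the 13-qubit surface code. Here is a minimal sketch of the product (helper names are my own):

```python
import numpy as np

def hypergraph_product(H1, H2):
    """Hypergraph product of two classical parity-check matrices,
    following the Tillich-Zemor construction [24]."""
    m1, n1 = H1.shape
    m2, n2 = H2.shape
    Hx = np.hstack([np.kron(H1, np.eye(n2, dtype=int)),
                    np.kron(np.eye(m1, dtype=int), H2.T)])
    Hz = np.hstack([np.kron(np.eye(n1, dtype=int), H2),
                    np.kron(H1.T, np.eye(m2, dtype=int))])
    return Hx, Hz

def gf2_rank(M):
    """Rank over GF(2) by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for c in range(M.shape[1]):
        hit = np.flatnonzero(M[rank:, c])
        if hit.size == 0:
            continue
        M[[rank, rank + hit[0]]] = M[[rank + hit[0], rank]]
        for i in range(M.shape[0]):
            if i != rank and M[i, c]:
                M[i] ^= M[rank]
        rank += 1
        if rank == M.shape[0]:
            break
    return rank

# Two copies of the [3,1] repetition code -> the [[13,1,3]] surface code.
H_rep = np.array([[1, 1, 0], [0, 1, 1]])
Hx, Hz = hypergraph_product(H_rep, H_rep)
n = Hx.shape[1]
k = n - gf2_rank(Hx) - gf2_rank(Hz)
print(n, k)                                # 13 1
assert not np.any(Hx @ Hz.T % 2)           # CSS commutation condition
```

As with bivariate bicycle codes, commutation is automatic from the product structure; feeding in good classical LDPC codes instead of repetition codes is what yields growing \(k\) and \(d\).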
Lifted product (LP) codes [25]: These generalize hypergraph products by exploiting symmetry (specifically, group actions) to reduce qubit overhead while maintaining or improving the distance. The bivariate bicycle codes can actually be seen as a special case of lifted product codes. Recent work by Panteleev and Kalachev showed that LP codes can achieve asymptotically good parameters, matching the best known quantum code constructions [26].
Decoding these general qLDPC codes is significantly harder than decoding surface codes because:
- There is no simple "pair the defects" interpretation as in the surface code.
- The Tanner graphs have complex, non-local connectivity.
- Trapping sets (subgraphs that trap iterative decoders) are more prevalent [27].
Decoders for qLDPC Codes
The decoder landscape for qLDPC codes is evolving rapidly. BP+OSD remains the most broadly applicable decoder [16], but several newer approaches are closing in:
- BP+LSD: As discussed above, offers comparable accuracy to BP+OSD with much lower computational cost for large codes [18].
- Ambiguity Clustering (AC): Developed by Wolanski and Barber at Riverlane [28], AC takes a clever approach. After BP, it identifies "ambiguous" regions where the decoder is uncertain and clusters them. Each cluster is then decoded independently. On the \(\llbracket 144, 12, 12 \rrbracket\) gross code, AC is up to 27× faster than BP+OSD with matched accuracy, decoding in 135 μs per syndrome round on a single CPU core.
- Closed-Branch Decoder: Proposed by deMarti iOlius and Etxezarreta Martinez [29], this treats the error as a set of separate "closed branches" and decodes them independently, offering tunable accuracy-speed tradeoffs.
- HyperBlossom (Hyperion): A very recent framework that generalizes the blossom algorithm to hypergraphs, unifying graph-based decoders (MWPM, UF) under a single minimum-weight parity factor (MWPF) formulation [30]. On a distance-11 surface code, it achieves 4.8× lower logical error rate than standard MWPM, and on the \(\llbracket 90,8,10 \rrbracket\) BB code, 1.6× lower than tuned BP+OSD.
- Machine Learning Decoders: Transformer-based decoders are being applied to BB codes with promising results. A recurrent transformer by Blue et al. [31] achieved nearly 5× lower logical error rates than BP+OSD on the \(\llbracket 72,12,6 \rrbracket\) code, and the universal GraphQEC decoder [32] demonstrated 39.4% lower logical error rate on a distance-12 BB code compared to previous best decoders.
The Hardware Revolution: FPGAs, ASICs, and GPUs
Alright, let us talk about the elephant in the room. You can design the cleverest decoder algorithm in the world, but if it cannot run fast enough on real hardware to keep up with the quantum processor, it is useless. This section is about how the community has been tackling the decoder speed challenge with specialized classical hardware.
The Latency Budget
For a superconducting qubit processor running surface code QEC, a single cycle of syndrome extraction takes about 1 microsecond. That means you need to decode (or at least buffer) syndromes at a rate of roughly one million per second. If your decoder takes 10 milliseconds per round, you are falling behind by a factor of 10,000. Google's below-threshold demonstration in 2024 achieved an average decoder latency of 63 microseconds at distance 5, which was sufficient because they used a buffered approach with enough headroom [33].
But as we scale to larger codes (distance 15, 21, 27 and beyond), the latency demands only get tighter. This is where custom hardware comes in.
FPGA Decoders
Field-programmable gate arrays (FPGAs) have become the go-to platform for real-time QEC decoders because they offer low latency, parallelism, and reconfigurability.
Helios is an FPGA-based distributed Union-Find decoder developed by Liyanage, Wu, Deters, and Zhong [34]. Its key innovation is a hybrid tree-grid architecture that organizes parallel computing resources efficiently. The result is stunning: an average decoding time of just 11.5 nanoseconds per measurement round at distance 21 under 0.1% phenomenological noise. Even more remarkably, the decoding time per round actually decreases as distance increases, thanks to the parallelism scaling with the number of resources.
Collision Clustering by Riverlane [35] takes a different algorithmic approach. It grows clusters around defects (similar in spirit to Union-Find) and detects "collisions" between clusters to trigger merges. The FPGA implementation handles an 881-qubit surface code in 810 nanoseconds, while a custom ASIC design decodes a 1,057-qubit code in just 240 nanoseconds. The ASIC is tiny (0.06 mm²) and sips only 8 milliwatts of power, making it potentially suitable for cryogenic operation near the quantum processor itself.
For qLDPC codes, real-time FPGA decoding of the \(\llbracket 144, 12, 12 \rrbracket\) gross code has been demonstrated using the Relay-BP algorithm [36]. This prototype achieves a BP iteration time of 24 nanoseconds and an average per-cycle decoding time under 1 microsecond for circuit-level error probabilities below \(3 \times 10^{-3}\).
GPU-Accelerated Decoding
GPUs bring massive parallelism and floating-point throughput, making them attractive for decoders that involve heavy numerical computation (like BP+OSD).
At GTC 2025, NVIDIA announced a GPU-accelerated BP+OSD decoder within the CUDA-Q QEC framework [37]. Running on an NVIDIA Grace Hopper Superchip, this implementation achieves an order-of-magnitude speedup over CPU baselines for the \(\llbracket 144, 12, 12 \rrbracket\) bivariate bicycle code. The average syndrome decoding time drops to a few milliseconds, and with batched decoding (processing many syndromes simultaneously), they report an additional 40× speedup for high-throughput scenarios.
NVIDIA and QuEra have also co-developed a transformer-based AI decoder [38] that uses graph neural networks and attention mechanisms to decode error syndromes. For distance-3 magic state distillation circuits on QuEra's neutral atom processor, the AI decoder outperformed the exact maximum-likelihood decoder in both accuracy and speed. Training the distance-3 decoder required 42 NVIDIA H100 GPUs and completed in about one hour.
Google's tensor network decoder has also been GPU-accelerated through the CUDA-QX toolkit, achieving parity with Google's proprietary implementation in an open-source setting [39].
The Rise of AI-Powered Decoders
One of the most exciting developments in early 2026 is the emergence of companies dedicated specifically to AI-powered QEC decoding. EdenCode, a startup founded by researchers from Harvard and UC San Diego, emerged from stealth in January 2026 with a neural-network-based real-time decoder [40]. They claim sub-millisecond decoding latency, a 99.9% error detection rate, and up to 10× faster processing than conventional approaches. Their system is designed to be hardware-agnostic and continuously learns from observed error patterns.
The universal GraphQEC decoder [32] takes this further with a code-agnostic approach based on linear attention and graph neural networks. It works on any stabilizer code's graph structure without modification, achieving state-of-the-art results across surface codes, color codes, and qLDPC codes with linear time scaling. On a distance-12 BB code, it achieves an 18-fold improvement in logical error rate compared to previous specialized decoders while maintaining 157 microsecond per-cycle decoding speed.
A Timeline of Decoder Progress
It is helpful to see how rapidly this field has evolved. Here is a rough timeline of major decoder milestones:
| Year | Milestone |
|---|---|
| 1965 | Edmonds publishes the blossom algorithm for matching |
| 2002 | Dennis, Kitaev et al. apply MWPM to surface code decoding |
| 2014 | Bravyi, Suchara, Vargo introduce tensor network decoder |
| 2017 | First neural network decoders for topological codes |
| 2017 | Delfosse and Nickerson introduce the Union-Find decoder (published in Quantum, 2021) |
| 2020 | Roffe et al. propose BP+OSD for general qLDPC codes |
| 2022 | Kuo and Lai address BP degeneracy issue with MBP |
| 2023 | Higgott and Gidney release Sparse Blossom (PyMatching v2) |
| 2023 | Wu and Zhong release Fusion Blossom |
| 2023 | Helios FPGA decoder: 11.5 ns/round at \(d=21\) |
| 2023 | Riverlane's Collision Clustering on FPGA and ASIC |
| 2024 | Bravyi et al. introduce bivariate bicycle codes |
| 2024 | Google demonstrates below-threshold QEC (\(d=7\), 0.143%) |
| 2024 | Ambiguity Clustering: 27× faster than BP+OSD |
| 2025 | NVIDIA GPU-accelerated BP+OSD and AI decoders |
| 2025 | Relay-BP on FPGA for the gross code: 24 ns/iteration |
| 2025 | BP+LSD published for fast qLDPC decoding |
| 2025 | HyperBlossom unifies graph-based decoders |
| 2025 | GraphQEC: universal NN decoder with linear time scaling |
| 2026 | EdenCode launches AI decoder startup |
Where Are We Heading?
Looking at the trajectory of decoder development, a few clear trends emerge.
First, there is a convergence between classical and quantum computing hardware. Decoders are no longer just algorithms running on a distant classical computer. They are being baked into FPGAs, ASICs, and GPU clusters that sit right next to (or even inside) the cryostat. The vision of a "decoder-on-chip" that operates at cryogenic temperatures alongside the quantum processor is getting closer to reality.
Second, machine learning decoders are maturing fast. Early neural network decoders were cute proof-of-concept demonstrations on tiny codes. Now we have transformer architectures decoding distance-12 qLDPC codes in real time, universal decoders that work across code families, and companies building commercial products around AI-powered decoding.
Third, the shift from surface codes to qLDPC codes is creating a new decoder design challenge. The beautiful geometric structure of the surface code that matching-based decoders exploit is absent in general qLDPC codes. This means we need fundamentally new ideas, and combinations like BP+OSD, BP+LSD, ambiguity clustering, and graph neural networks are filling that gap.
Fourth, real-time decoding is no longer theoretical. Multiple groups have demonstrated decoding fast enough for superconducting qubit systems, and the demonstrated latencies (tens of nanoseconds for surface codes, sub-microsecond for qLDPC codes on FPGAs) are within the required budget. The next frontier is scaling these demonstrations to larger code distances and more complex noise models.
I believe we are entering a really exciting era where the decoder is becoming just as important as the code itself and the hardware it runs on. The interplay between code design, decoder algorithms, and classical acceleration hardware will define the path to practical fault-tolerant quantum computing. As someone working in this field, I find it remarkable how much progress has been made in just the past three to four years, and I cannot wait to see what comes next.
References
- A. Y. Kitaev, "Fault-tolerant quantum computation by anyons," Annals of Physics, 303, 2–30 (2003).
- A. G. Fowler, M. Mariantoni, J. M. Martinis, A. N. Cleland, "Surface codes: Towards practical large-scale quantum computation," Phys. Rev. A 86, 032324 (2012).
- R. Raussendorf, J. Harrington, "Fault-tolerant quantum computation with high threshold in two dimensions," Phys. Rev. Lett. 98, 190504 (2007).
- A. G. Fowler, A. M. Stephens, P. Groszkowski, "High-threshold universal quantum computation on the surface code," Phys. Rev. A 80, 052312 (2009).
- E. Dennis, A. Kitaev, A. Landahl, J. Preskill, "Topological quantum memory," J. Math. Phys. 43, 4452–4505 (2002).
- J. Edmonds, "Paths, trees, and flowers," Can. J. Math. 17, 449–467 (1965).
- P. Iyer, D. Poulin, "Hardness of decoding quantum stabilizer codes," IEEE Trans. Inf. Theory 61, 5209–5223 (2015).
- B. M. Terhal, "Quantum error correction for quantum memories," Rev. Mod. Phys. 87, 307–346 (2015).
- O. Higgott, C. Gidney, "Sparse blossom: correcting a million errors per core second with minimum-weight matching," Quantum 9, 1600 (2025).
- O. Higgott, "PyMatching: A Python package for decoding quantum codes with MWPM," ACM Trans. Quantum Comput. 3 (2022).
- Y. Wu, L. Zhong, "Fusion Blossom: Fast MWPM decoders for QEC," arXiv:2305.08307 (2023).
- N. Delfosse, N. H. Nickerson, "Almost-linear time decoding algorithm for topological codes," Quantum 5, 595 (2021).
- T. Chan, S. C. Benjamin, "Actis: A strictly local Union-Find decoder," Quantum 7, 1183 (2023).
- J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann (1988).
- K.-Y. Kuo, C.-Y. Lai, "Exploiting degeneracy in belief propagation decoding of quantum codes," npj Quantum Inf. 8, 111 (2022).
- J. Roffe et al., "Decoding across the quantum low-density parity-check code landscape," Phys. Rev. Res. 2, 043423 (2020).
- "Localized statistics decoding for quantum LDPC codes," Nature Communications (2025).
- J. Roffe et al., "BP+LSD: Localized statistics decoding for quantum LDPC codes," Nature Communications (2025).
- S. Bravyi, M. Suchara, A. Vargo, "Efficient algorithms for maximum likelihood decoding in the surface code," Phys. Rev. A 90, 032326 (2014).
- C. T. Chubb, "General tensor network decoding of 2D Pauli codes," arXiv:2101.04125 (2021).
- A. Kubica, J. Preskill, "Cellular-automaton decoders with provable thresholds for topological codes," Phys. Rev. Lett. 123, 020501 (2019).
- Google DeepMind, "Learning high-accuracy error decoding for quantum processors," Nature (2024).
- S. Bravyi et al., "High-threshold and low-overhead fault-tolerant quantum memory," Nature 627, 778–782 (2024).
- J.-P. Tillich, G. Zémor, "Quantum LDPC codes with minimum distance proportional to \(n^{1/2}\)," IEEE ISIT (2009).
- P. Panteleev, G. Kalachev, "Degenerate quantum LDPC codes with good finite length performance," Quantum 5, 585 (2021).
- P. Panteleev, G. Kalachev, "Asymptotically good quantum and locally testable classical LDPC codes," STOC 2022 (2022).
- A. K. Pradhan et al., "Linear time iterative decoders for hypergraph-product and lifted-product codes," arXiv:2504.01728 (2025).
- S. Wolanski, B. Barber, "Ambiguity Clustering: an accurate and efficient decoder for qLDPC codes," arXiv:2406.14527 (2024).
- A. deMarti iOlius, J. Etxezarreta Martinez, "The closed-branch decoder for quantum LDPC codes," arXiv:2402.01532 (2024).
- "Minimum-weight parity factor decoder for quantum error correction," arXiv:2508.04969 (2025).
- J. Blue et al., "Machine learning decoding of circuit-level noise for bivariate bicycle codes," arXiv:2504.13043 (2025).
- G. Hu et al., "Efficient and universal neural-network decoder for stabilizer-based quantum error correction," arXiv:2502.19971 (2025).
- R. Acharya et al. (Google Quantum AI), "Quantum error correction below the surface code threshold," Nature (2024).
- N. Liyanage et al., "Scalable quantum error correction for surface codes using FPGA," arXiv:2301.08419 (2023).
- B. Barber et al., "A real-time, scalable, fast and highly resource efficient decoder for a quantum computer," Nature Electronics (2025).
- "Real-time decoding of the gross code memory with FPGAs," arXiv:2510.21600 (2025).
- NVIDIA, "Accelerating quantum error correction research with NVIDIA Quantum," GTC 2025.
- NVIDIA and QuEra, "Advancing AI-driven quantum error decoding for scalable fault-tolerant quantum computing," (2025).
- NVIDIA, "Streamlining quantum error correction and application development with CUDA-QX 0.4," (2025).
- EdenCode Inc., "Real-time AI decoder for quantum error correction," (Jan 2026).
- A. deMarti iOlius et al., "Decoding algorithms for surface codes," Quantum 8, 1498 (2024).