# Linear codes

A linear code is one of the most fundamental tools in coding theory. It allows us to add redundancy to a message in a structured way, so that errors introduced during transmission can be detected and even corrected.

## Definition

Let $\mathbb{F}_p$ be a finite field with $p$ elements. An $[n, k, \ell]_p$ linear code $\mathcal{C}$ is a linear subspace of $\mathbb{F}_p^n$ with dimension $k$, such that every nonzero codeword has Hamming weight at least $\ell$:

\begin{equation} \mathcal{C} \subseteq \mathbb{F}_p^n \quad \text{is a $k$-dimensional subspace}, \quad \text{with } |u|_0 \geq \ell \text{ for all } 0 \ne u \in \mathcal{C} \end{equation}

Since $\mathcal{C}$ is a $k$-dimensional subspace over $\mathbb{F}_p$, it contains exactly $|\mathcal{C}| = p^k$ codewords.

:::info
### Example:

Let’s build a linear code step by step to illustrate the definition. Let $p = 2$, so we are working over the binary field $\mathbb{F}_2 = \{0, 1\}$. We define a code $\mathcal{C}$ as the row-span of the generator matrix:

\begin{equation} G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \quad \text{(a $2 \times 3$ matrix)} \end{equation}

This means:
- $\mathcal{C}$ is a **linear subspace** of $\mathbb{F}_2^3$
- The dimension is $k = 2$, since there are 2 linearly independent rows
- The ambient space has length $n = 3$

We now compute all linear combinations of the rows of $G$ using coefficients from $\mathbb{F}_2$. Each combination corresponds to a vector $a = (a_1, a_2) \in \mathbb{F}_2^2$, and we compute:

\begin{equation} a \cdot G = a_1 \cdot \text{row}_1 + a_2 \cdot \text{row}_2 \end{equation}

Let’s compute them one by one:
- For $a = (0, 0)$: \begin{equation} 0 \cdot (1, 0, 1) + 0 \cdot (0, 1, 1) = (0, 0, 0) \end{equation}
- For $a = (1, 0)$: \begin{equation} 1 \cdot (1, 0, 1) + 0 \cdot (0, 1, 1) = (1, 0, 1) \end{equation}
- For $a = (0, 1)$: \begin{equation} 0 \cdot (1, 0, 1) + 1 \cdot (0, 1, 1) = (0, 1, 1) \end{equation}
- For $a = (1, 1)$: \begin{equation} 1 \cdot (1, 0, 1) + 1 \cdot (0, 1, 1) = (1 + 0, 0 + 1, 1 + 1) = (1, 1, 0) \end{equation}

(Note: all operations are modulo 2)

So the code $\mathcal{C}$ contains 4 codewords:

\begin{equation} \mathcal{C} = \left\{ \underbrace{(0, 0, 0)}_{\text{zero vector}}, \underbrace{(1, 0, 1)}_{\text{weight 2}}, \underbrace{(0, 1, 1)}_{\text{weight 2}}, \underbrace{(1, 1, 0)}_{\text{weight 2}} \right\} \end{equation}

Let’s verify the definition:
- $\mathcal{C}$ is a linear subspace of $\mathbb{F}_2^3$ ✅
- It has dimension $k = 2$ ✅
- All nonzero codewords have Hamming weight at least $\ell = 2$ ✅

\begin{equation} \forall u \in \mathcal{C},\ u \ne 0 \quad \Rightarrow \quad |u|_0 \geq 2 \end{equation}

So this is a valid $[3, 2, 2]_2$ linear code. Moreover:
- There are $|\mathcal{C}| = 2^2 = 4$ codewords, consistent with $|\mathcal{C}| = p^k$
- The minimum weight of nonzero codewords is $\ell = 2$
- The minimum distance between any two codewords is also 2 (since $\mathcal{C}$ is linear)

This code can detect any single-bit error but cannot correct it.
:::

## Basic notions

Before going further, let’s recall a few essential definitions used when analyzing codes.

### Hamming weight

The Hamming weight $|u|_0$ of a vector $u = (u_0, u_1, \dots, u_{n-1})$ in $\mathbb{F}_p^n$ is the number of non-zero coordinates:

\begin{equation} |u|_0 := \sum_{i=0}^{n-1} (u_i)^0 \end{equation}

Note: we define $0^0 := 0$ to make this sum count only the nonzero entries.

:::info
### Example:

Let $u = (0, 1, 0, 1, 1) \in \mathbb{F}_2^5$.
To compute the Hamming weight $|u|_0$, we count the number of nonzero entries:

\begin{equation} |u|_0 = (0)^0 + (1)^0 + (0)^0 + (1)^0 + (1)^0 = 0 + 1 + 0 + 1 + 1 = 3 \end{equation}

So the vector $u$ has Hamming weight 3.
:::

### Relative Hamming distance

The **relative Hamming distance** between two vectors $u, v \in \mathbb{F}_p^n$ is defined as:

\begin{equation} \Delta(u, v) := \frac{1}{n} |u - v|_0 \end{equation}

This is a normalized measure (always in $[0, 1]$) of how many positions differ between $u$ and $v$.

:::info
### Example:

Let us consider two vectors over $\mathbb{F}_{11}^5$ (so $p = 11$, and every entry below is a field element):

$$ u = (1, 5, 9, 4, 1), \quad v = (1, 2, 9, 7, 4) $$

Then the number of differing positions is 3 (the second, fourth, and fifth entries differ), so:

\begin{equation} \Delta(u, v) = \frac{3}{5} \end{equation}
:::

### Relative minimum weight

Just as we defined the Hamming weight of a codeword, we can define a related global parameter for the whole code. The **relative minimum weight** of a code $\mathcal{C}$ is the minimum Hamming weight of a nonzero codeword, normalized by the block length $n$:

\begin{equation} \mu = \mu(\mathcal{C}) := \frac{\ell}{n} = \frac{1}{n} \cdot \min_{0 \ne u \in \mathcal{C}} |u|_0 \end{equation}

This quantity $\mu$ lies in the interval $[0, 1]$, and captures how "dense" the codewords are in terms of their nonzero entries. The larger $\mu$ is, the more resilient the code is to noise: it means that every nonzero codeword differs from zero in many positions, which helps with error detection and correction.

:::info
### Example: Computing the Relative Minimum Weight

Let’s rigorously compute the relative minimum weight of a binary linear code $\mathcal{C} \subseteq \mathbb{F}_2^3$ defined by the generator matrix:

\begin{equation} G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \end{equation}

This is a $2 \times 3$ matrix, so the code has:
- Dimension $k = 2$
- Block length $n = 3$
- Field $\mathbb{F}_2$
- Total number of codewords: $2^2 = 4$

We compute each codeword as a linear combination $a \cdot G$, where $a = (a_1, a_2) \in \mathbb{F}_2^2$:

| $a$ | Codeword $u = a \cdot G$ | Hamming weight $\|u\|_0$ | Relative weight $\dfrac{\|u\|_0}{n}$ |
|-----------|---------------------------------------------------------|-------------------------|-------------------------------------|
| $(0, 0)$ | $(0, 0, 0)$ | $0$ | $0$ |
| $(1, 0)$ | $1 \cdot (1, 0, 1) = (1, 0, 1)$ | $2$ | $\dfrac{2}{3}$ |
| $(0, 1)$ | $1 \cdot (0, 1, 1) = (0, 1, 1)$ | $2$ | $\dfrac{2}{3}$ |
| $(1, 1)$ | $(1, 0, 1) + (0, 1, 1) = (1, 1, 0)$ | $2$ | $\dfrac{2}{3}$ |

Now compute:

\begin{equation} \mu(\mathcal{C}) = \frac{1}{n} \cdot \min_{0 \ne u \in \mathcal{C}} |u|_0 = \frac{1}{3} \cdot 2 = \frac{2}{3} \end{equation}

So this code has **relative minimum weight** $\mu(\mathcal{C}) = \frac{2}{3}$. This means:
- All nonzero codewords have at least 2 nonzero entries
- Any codeword differs from 0 in at least $\frac{2}{3}$ of the positions
- The **unique decoding radius** (defined below) is $\frac{1}{3}$

This high relative weight reflects strong error detection properties.
:::

### Basic facts about linear codes

Let $\mathcal{C} \subseteq \mathbb{F}_p^n$ be an $[n, k, \ell]_p$ linear code. We now introduce two fundamental facts and one important definition about such codes.
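Before stating them, here is a minimal Python sketch that recomputes the running example end to end — the codewords of $G$, their Hamming weights, and the relative minimum weight $\mu(\mathcal{C})$. It is only an illustration (not part of the standard presentation), and the helper names `encode` and `hamming_weight` are ours.

```python
# Illustrative sketch: enumerate the code generated by G over F_2 and compute
# Hamming weights and the relative minimum weight mu(C).
from itertools import product

p = 2                        # field size
G = [(1, 0, 1),              # generator matrix rows = basis vectors c_1, c_2
     (0, 1, 1)]
k, n = len(G), len(G[0])     # dimension and block length

def encode(m):
    """Encode a message m in F_p^k as the codeword m * G in F_p^n."""
    return tuple(sum(m[i] * G[i][j] for i in range(k)) % p for j in range(n))

def hamming_weight(u):
    """Hamming weight |u|_0: the number of nonzero coordinates."""
    return sum(1 for x in u if x != 0)

messages = list(product(range(p), repeat=k))
codewords = [encode(m) for m in messages]
assert len(set(codewords)) == p ** k          # |C| = p^k distinct codewords

ell = min(hamming_weight(u) for u in codewords if any(u))   # minimum nonzero weight
mu = ell / n                                                # relative minimum weight

for m, u in zip(messages, codewords):
    print(m, "->", u, " weight", hamming_weight(u))
print("ell =", ell, " mu =", mu)              # expected: ell = 2, mu = 2/3
```

Running it reproduces the table above: four codewords, minimum nonzero weight $\ell = 2$, and $\mu(\mathcal{C}) = 2/3$.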
### Fact 1: Codewords are well-spaced For all distinct codewords $u, v \in \mathcal{C}$, the relative Hamming distance between them is at least the **relative minimum weight** $\mu(\mathcal{C})$: \begin{equation} \Delta(u, v) \geq \mu(\mathcal{C}) = \frac{\ell}{n} \end{equation} This is because the difference $u - v$ is also a codeword (by linearity), and if $u \ne v$, then $u - v \ne 0$. So: $$ |u - v|_0 \geq \ell \quad \Rightarrow \quad \Delta(u, v) = \frac{1}{n} |u - v|_0 \geq \frac{\ell}{n} $$ This means all distinct codewords are separated by at least $\ell$ positions. Such a spacing is critical for detecting and correcting errors. :::info ### Example: Consider the binary linear code $\mathcal{C} \subseteq \mathbb{F}_2^3$ with generator matrix: \begin{equation} G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \end{equation} From this, we compute the codewords: - $(0, 0) \cdot G = (0, 0, 0)$ - $(1, 0) \cdot G = (1, 0, 1)$ - $(0, 1) \cdot G = (0, 1, 1)$ - $(1, 1) \cdot G = (1, 1, 0)$ Now take two distinct codewords, say: \begin{equation} u = (1, 0, 1), \quad v = (0, 1, 1) \end{equation} Their difference is: \begin{equation} u - v = (1 - 0, 0 - 1, 1 - 1) = (1, 1, 0) \end{equation} This vector is itself a codeword in $\mathcal{C}$ (because the code is linear), and has Hamming weight: \begin{equation} |u - v|_0 = 2 \end{equation} So their relative Hamming distance is: \begin{equation} \Delta(u, v) = \frac{1}{3} \cdot |u - v|_0 = \frac{2}{3} \end{equation} Since the minimum weight of any nonzero codeword is $\ell = 2$, and $n = 3$, we have: \begin{equation} \mu(\mathcal{C}) = \frac{2}{3}, \quad \text{and} \quad \Delta(u, v) \geq \mu(\mathcal{C}) \end{equation} This confirms the fact: all distinct codewords are separated by at least the relative minimum weight. ::: ### Fact 2: The singleton bound Every $[n, k, \ell]_p$ linear code satisfies the following inequality: \begin{equation} k \leq n - \ell + 1 \end{equation} Equivalently, since $|\mathcal{C}| = p^k$, this means: \begin{equation} |\mathcal{C}| \leq p^{n - \ell + 1} \end{equation} This is known as the **Singleton bound**. It sets a hard limit on the trade-off between redundancy (measured by $n - k$) and error resilience (measured by $\ell$). :::info ### Example: Let us consider again the binary linear code $\mathcal{C} \subseteq \mathbb{F}_2^3$ with generator matrix: \begin{equation} G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \end{equation} This is a $2 \times 3$ matrix, so: - The dimension is $k = 2$ - The block length is $n = 3$ We previously computed all codewords and observed that the minimum Hamming weight of any nonzero codeword is $\ell = 2$. Let us now check whether this code satisfies the Singleton bound: \begin{equation} k \leq n - \ell + 1 \end{equation} Substituting the values: \begin{equation} 2 \leq 3 - 2 + 1 = 2 \end{equation} The inequality holds with equality. This tells us that this code achieves the Singleton bound, and therefore qualifies as an **MDS code** (Maximum Distance Separable). This means: - It carries the **maximum possible amount of information** (given its error tolerance), - And it is **optimally efficient** in the sense of the Singleton trade-off. ::: ### Definition: Maximum Distance Separable (MDS) codes If a code achieves the Singleton bound with equality: \begin{equation} k = n - \ell + 1 \end{equation} then $\mathcal{C}$ is called a **Maximum Distance Separable (MDS) code**. 
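As a quick sanity check, the short sketch below — again only illustrative, with the codeword list copied from the example above — verifies Fact 1 (every pair of distinct codewords is at relative distance at least $\mu(\mathcal{C})$) and Fact 2 (the Singleton bound), and tests whether the running $[3, 2, 2]_2$ code meets the bound with equality.

```python
# Illustrative check of Fact 1 and the Singleton bound for the [3, 2, 2]_2 example.
from itertools import combinations

n, k, ell = 3, 2, 2
mu = ell / n
codewords = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0)]

def relative_distance(u, v):
    """Relative Hamming distance Delta(u, v) = (# differing positions) / n."""
    return sum(1 for a, b in zip(u, v) if a != b) / len(u)

# Fact 1: all pairs of distinct codewords are at relative distance >= mu(C).
assert all(relative_distance(u, v) >= mu for u, v in combinations(codewords, 2))

# Fact 2 (Singleton bound): k <= n - ell + 1; equality means the code is MDS.
assert k <= n - ell + 1
print("MDS" if k == n - ell + 1 else "not MDS")   # prints: MDS
```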
These codes are optimal in the sense that they offer the **largest possible dimension** (i.e., carry the most information) for a given block length $n$ and minimum distance $\ell$. ## Encoding a message as a codeword Let $\mathcal{C} \subseteq \mathbb{F}_p^n$ be an $[n, k, \ell]_p$ linear code. Since $\mathcal{C}$ is a $k$-dimensional linear subspace, it has a basis consisting of $k$ vectors: $$ \{c_1, c_2, \dots, c_k\} \subseteq \mathbb{F}_p^n $$ Each codeword in $\mathcal{C}$ is a linear combination of these basis vectors with coefficients in $\mathbb{F}_p$. ### Encoding process Given a message $m = (m_1, m_2, \dots, m_k) \in \mathbb{F}_p^k$, we encode it by computing the corresponding linear combination: \begin{equation} \text{encode}(m) = m_1 c_1 + m_2 c_2 + \cdots + m_k c_k \in \mathbb{F}_p^n \end{equation} So, the encoding is simply a map from message space to codeword space: \begin{equation} \mathcal{C} : \mathbb{F}_p^k \to \mathbb{F}_p^n \end{equation} This map is **linear**. It can be implemented by matrix multiplication using a generator matrix $G \in \mathbb{F}_p^{k \times n}$ whose rows are the basis vectors $c_i$: \begin{equation} \text{encode}(m) = m \cdot G \end{equation} ![image](https://hackmd.io/_uploads/Sk52P53Zxx.png) :::info ### Example: Encoding with a Generator Matrix Let us work over the binary field $\mathbb{F}_2$ and consider the following generator matrix: \begin{equation} G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \end{equation} This matrix defines a linear code $\mathcal{C} \subseteq \mathbb{F}_2^3$ of dimension $k = 2$ and length $n = 3$. Each row of $G$ is a basis vector of the code: - $c_1 = (1, 0, 1)$ - $c_2 = (0, 1, 1)$ Let’s encode the message $m = (1, 1) \in \mathbb{F}_2^2$ using the formula: \begin{equation} \text{encode}(m) = m_1 c_1 + m_2 c_2 \end{equation} Substituting in the values: \begin{equation} \text{encode}(m) = 1 \cdot (1, 0, 1) + 1 \cdot (0, 1, 1) = (1 + 0, 0 + 1, 1 + 1) = (1, 1, 0) \end{equation} Alternatively, using matrix multiplication: \begin{aligned} \text{encode}(m) &= (1, 1) \cdot \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \\ &= (1 \cdot 1 + 1 \cdot 0,\ 1 \cdot 0 + 1 \cdot 1,\ 1 \cdot 1 + 1 \cdot 1) \\ &= (1, 1, 0) \end{aligned} So the encoded codeword is: \begin{equation} \boxed{(1, 1, 0) \in \mathbb{F}_2^3} \end{equation} This codeword belongs to $\mathcal{C}$ and represents the encoded version of the message $(1, 1)$. ::: ### Code rate The **rate** of a linear code is a measure of how much information is being transmitted per symbol: \begin{equation} \rho := \frac{k}{n} \in [0, 1] \end{equation} A higher rate means less redundancy, and a lower rate means more error protection. In practice, we often want $\rho$ to be as large as possible (to maximize information throughput), subject to the constraint that the code still corrects the expected number of errors. For example, if $\rho = 0.5$, then $n = 2k$ — each message of $k$ symbols is expanded to a codeword of $2k$ symbols. :::info ### Example: Computing the Code Rate Suppose we have a linear code $\mathcal{C}$ of length $n = 6$ and dimension $k = 3$. That means we are encoding 3-symbol messages as 6-symbol codewords. The rate of the code is: \begin{equation} \rho = \frac{k}{n} = \frac{3}{6} = 0.5 \end{equation} This tells us that only half of each codeword actually carries information, while the other half provides redundancy for error correction. 
Now consider a $[7, 4, 3]_2$ Hamming code: \begin{equation} \rho = \frac{4}{7} \approx 0.571 \end{equation} This code has a higher rate than the previous one and transmits more information per symbol, but typically offers slightly less protection against errors. So, the rate $\rho$ helps us understand the trade-off between **efficiency** and **reliability** in a code. ::: ## Unique Decoding Distance Let $\mathcal{C} \subseteq \mathbb{F}_p^n$ be an $[n, k, \ell]_p$ linear code, with relative minimum weight $\mu(\mathcal{C}) = \ell/n$. One of the most important questions in error correction is the following: > Given a received word $w \in \mathbb{F}_p^n$, can we uniquely recover the original message that was sent? This leads us to the concept of **unique decoding**. ### Fact 3: Unique Decoding via Minimum Distance Let $w \in \mathbb{F}_p^n$ be a received word. Then: \begin{equation} \text{There is at most one codeword } u \in \mathcal{C} \text{ such that } \Delta(u, w) < \frac{\mu(\mathcal{C})}{2} \end{equation} In words: **if the received word is close enough to a codeword — closer than half the minimum distance — then that codeword is unique.** This fact follows from the **triangle inequality**. If two different codewords $u$ and $v$ were both within distance less than $\mu(\mathcal{C})/2$ of $w$, then: $$ \Delta(u, v) \leq \Delta(u, w) + \Delta(v, w) < \frac{\mu(\mathcal{C})}{2} + \frac{\mu(\mathcal{C})}{2} = \mu(\mathcal{C}) $$ But this contradicts the fact that all codewords differ by at least $\mu(\mathcal{C})$. So decoding within a **radius less than $\mu(\mathcal{C})/2$** guarantees that any valid codeword is **unique**. ![image](https://hackmd.io/_uploads/Sy_n592bel.png) We can think of each codeword as the center of a "safe zone" or a bubble. The radius of this bubble is $\mu(\mathcal{C})/2$. The properties of the code guarantee that none of these bubbles overlap. - **If the received word $w$ lands inside one of these bubbles:** We know with 100% certainty which codeword it belongs to. The codeword at the center of that bubble is the unique candidate for the original message. There's no ambiguity. - **If your received word w lands outside all bubbles:** It might be equally close to two or more different codewords. In this case, you can't be sure which one was sent, and the decoding is ambiguous (not unique). ### Definition: Unique Decoding Radius The quantity \begin{equation} \frac{\mu(\mathcal{C})}{2} \in [0, 0.5] \end{equation} is called the **unique decoding radius** of the code $\mathcal{C}$. It tells us how much noise we can correct with certainty. :::info ### Example: Unique Decoding with Radius < $\mu(\mathcal{C})/2$ Let’s revisit the binary linear code $\mathcal{C} \subseteq \mathbb{F}_2^3$ with generator matrix: \begin{equation} G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \end{equation} This defines a $[3, 2, 2]_2$ code. So: - Block length: $n = 3$ - Dimension: $k = 2$ - Minimum Hamming weight: $\ell = 2$ - Relative minimum weight: $\mu(\mathcal{C}) = \frac{2}{3}$ - Unique decoding radius: $\frac{\mu(\mathcal{C})}{2} = \frac{1}{3}$ We previously computed the codewords of $\mathcal{C}$: - $u_1 = (0, 0, 0)$ - $u_2 = (1, 0, 1)$ - $u_3 = (0, 1, 1)$ - $u_4 = (1, 1, 0)$ Now consider the received word: \begin{equation} w = (1, 0, 0) \end{equation} We want to know if $w$ can be uniquely decoded. 
Compute the relative Hamming distances between $w$ and each codeword:

- $\Delta(w, u_1) = \frac{1}{3} \cdot |(1, 0, 0) - (0, 0, 0)|_0 = \frac{1}{3} \cdot 1 = \frac{1}{3}$
- $\Delta(w, u_2) = \frac{1}{3} \cdot |(1, 0, 0) - (1, 0, 1)|_0 = \frac{1}{3} \cdot 1 = \frac{1}{3}$
- $\Delta(w, u_3) = \frac{1}{3} \cdot |(1, 0, 0) - (0, 1, 1)|_0 = \frac{1}{3} \cdot 3 = 1$
- $\Delta(w, u_4) = \frac{1}{3} \cdot |(1, 0, 0) - (1, 1, 0)|_0 = \frac{1}{3} \cdot 1 = \frac{1}{3}$

We see that three different codewords ($u_1$, $u_2$, $u_4$) are at distance exactly $\frac{1}{3}$ from $w$. So $w$ lies at the decoding radius $\frac{1}{3}$, not strictly inside it — and indeed there is no unique closest codeword, so decoding $w$ is ambiguous.

But now try:

\begin{equation} w' = (1, 0, 1) \end{equation}

Then:

- $\Delta(w', u_2) = \frac{1}{3} \cdot |(1, 0, 1) - (1, 0, 1)|_0 = 0$

So $w'$ coincides with a codeword. Because $w'$ lies at distance 0 from $u_2$, and every other codeword is at distance at least $\mu(\mathcal{C}) = 2/3$ from it, this is a **unique decoding**.

So we observe:

- If $\Delta(w, u) < \mu(\mathcal{C})/2 = \frac{1}{3}$ → **unique decoding guaranteed**
- If $\Delta(w, u) = \frac{1}{3}$ → **no guarantee**, decoding may be ambiguous

This illustrates the **tightness** of the decoding radius.
:::

### Most words are not uniquely decodable

Although unique decoding is guaranteed **within** the radius $\mu(\mathcal{C})/2$, it only applies to a small fraction of all possible words in $\mathbb{F}_p^n$. Most words are **too far** from any codeword to be uniquely decodable.

To see why, we estimate the total number of words in $\mathbb{F}_p^n$ that lie within a Hamming ball of radius $\ell/2$ around some codeword.

Let $B_0(u, \ell/2)$ be the **Hamming ball** of radius $\ell/2$ centered at a codeword $u \in \mathcal{C}$. This is the set of words $w \in \mathbb{F}_p^n$ such that:

\begin{equation} \Delta(u, w) \leq \frac{\ell}{2n} \quad \Leftrightarrow \quad |u - w|_0 \leq \frac{\ell}{2} \end{equation}

That is, $w$ differs from $u$ in at most $\ell/2$ coordinates.

### Counting the number of such words

For a fixed codeword $u$, we are interested in $B_0(u, \ell/2)$, which is the set of all words $w \in \mathbb{F}_p^n$ that differ from $u$ in at most $\ell/2$ coordinate positions. The number of such words $w$, denoted as $|B_0(u, \ell/2)|$, is calculated as follows:

\begin{equation} |B_0(u, \ell/2)| = \sum_{i = 0}^{\lfloor \ell/2 \rfloor} \binom{n}{i} (p - 1)^i \end{equation}

Let's break down what a single term $\binom{n}{i} (p - 1)^i$ in this sum represents. This term counts the number of words $w$ that differ from $u$ in *exactly* $i$ positions:

- $\binom{n}{i}$: This is the binomial coefficient representing the number of ways to **choose which $i$ coordinate positions** (out of the $n$ total positions) will have different values in $w$ compared to $u$.
- $(p - 1)^i$: For each of these $i$ chosen positions where $w$ and $u$ differ, the symbol in $w$ must be different from the symbol in $u$. Since elements are from the field $\mathbb{F}_p$ (which has $p$ distinct symbols), there are $(p-1)$ other possible symbols that can be used. As there are $i$ such differing positions, there are $(p-1) \times (p-1) \times \dots \times (p-1)$ (i.e., $i$ times), or $(p-1)^i$, ways to assign these new, different values.

The summation $\sum_{i = 0}^{\lfloor \ell/2 \rfloor}$ then adds these counts for all possible numbers of differing positions $i$ — starting from $i=0$ (where $w$ is identical to $u$) up to a maximum of $i = \lfloor \ell/2 \rfloor$ differences.
This sum gives the total number of words $w$ contained within the Hamming ball of radius $\ell/2$ centered at $u$. :::warning ### Reminder on binomial formula The notation $\binom{n}{k}$, often read as "$n$ choose $k$", is called a **binomial coefficient**. It represents the number of distinct ways you can choose $k$ items from a larger set of $n$ distinct items, where the order in which you choose the items does not matter. Let's break down $\binom{4}{i}$: * **$n = 4$**: This means you have a set of 4 distinct items to choose from. * **$k = i$**: This means you want to find out how many ways you can choose $i$ items from those 4 items. The value of $i$ can range from 0 (choosing no items) up to 4 (choosing all items). **Formula:** The binomial coefficient is calculated using the following formula: $$\binom{n}{k} = \frac{n!}{k!(n-k)!}$$ Where "$!$" denotes the factorial operation (e.g., $4! = 4 \times 3 \times 2 \times 1 = 24$). So, for $\binom{4}{i}$, the formula is: $$\binom{4}{i} = \frac{4!}{i!(4-i)!}$$ **Generic Example: Choosing Letters** Let's imagine you have a set of 4 distinct letters: **{A, B, C, D}**. We want to see how many different combinations of letters we can form if we choose $i$ letters from this set. Let's go through the possible values of $i$: 1. **Case $i = 0$: $\binom{4}{0}$ (Choosing 0 letters)** * How many ways can you choose 0 letters from {A, B, C, D}? There's only one way: choose nothing (the empty set). * Using the formula: $$\binom{4}{0} = \frac{4!}{0!(4-0)!} = \frac{4!}{0!4!} = \frac{24}{1 \cdot 24} = 1$$ (Note: By definition, $0! = 1$) 2. **Case $i = 1$: $\binom{4}{1}$ (Choosing 1 letter)** * How many ways can you choose 1 letter from {A, B, C, D}? * {A} * {B} * {C} * {D} There are 4 ways. * Using the formula: $$\binom{4}{1} = \frac{4!}{1!(4-1)!} = \frac{4!}{1!3!} = \frac{4 \times 3 \times 2 \times 1}{(1) \times (3 \times 2 \times 1)} = \frac{24}{6} = 4$$ 3. **Case $i = 2$: $\binom{4}{2}$ (Choosing 2 letters)** * How many ways can you choose 2 letters from {A, B, C, D}? Remember, order doesn't matter (so {A, B} is the same as {B, A}). * {A, B} * {A, C} * {A, D} * {B, C} * {B, D} * {C, D} There are 6 ways. * Using the formula: $$\binom{4}{2} = \frac{4!}{2!(4-2)!} = \frac{4!}{2!2!} = \frac{4 \times 3 \times 2 \times 1}{(2 \times 1) \times (2 \times 1)} = \frac{24}{2 \cdot 2} = \frac{24}{4} = 6$$ 4. **Case $i = 3$: $\binom{4}{3}$ (Choosing 3 letters)** * How many ways can you choose 3 letters from {A, B, C, D}? * {A, B, C} * {A, B, D} * {A, C, D} * {B, C, D} There are 4 ways. * Using the formula: $$\binom{4}{3} = \frac{4!}{3!(4-3)!} = \frac{4!}{3!1!} = \frac{4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1) \times (1)} = \frac{24}{6} = 4$$ (Notice that choosing 3 items is the same as choosing 1 item to *leave out*, so $\binom{4}{3} = \binom{4}{1}$) 5. **Case $i = 4$: $\binom{4}{4}$ (Choosing 4 letters)** * How many ways can you choose 4 letters from {A, B, C, D}? There's only one way: choose all of them. * {A, B, C, D} * Using the formula: $$\binom{4}{4} = \frac{4!}{4!(4-4)!} = \frac{4!}{4!0!} = \frac{24}{24 \cdot 1} = 1$$ **In summary for $\binom{4}{i}$:** * $\binom{4}{0} = 1$ * $\binom{4}{1} = 4$ * $\binom{4}{2} = 6$ * $\binom{4}{3} = 4$ * $\binom{4}{4} = 1$ These values are also the coefficients in the expansion of a binomial expression like $(x+y)^4$. 
This is known as the **Binomial Theorem**:

$$(x+y)^4 = \binom{4}{0}x^4y^0 + \binom{4}{1}x^3y^1 + \binom{4}{2}x^2y^2 + \binom{4}{3}x^1y^3 + \binom{4}{4}x^0y^4$$

$$(x+y)^4 = 1x^4 + 4x^3y + 6x^2y^2 + 4xy^3 + 1y^4$$

So, $\binom{4}{i}$ tells you "how many ways can you select $i$ items from a set of 4 distinct items." The generic example with letters helps visualize these combinations.
:::

### Approximation by the largest term

To simplify the estimate, we approximate the sum by its largest term, which occurs at $i = \ell/2$ (the terms grow with $i$ in this range, and keeping all $\lfloor \ell/2 \rfloor + 1 \leq n$ of them would only multiply the bound by a small factor that does not matter below):

\begin{equation} |B_0(u, \ell/2)| \lesssim \underbrace{\binom{n}{\ell/2}}_{\text{choose error positions}} \cdot \underbrace{(p - 1)^{\ell/2}}_{\text{choose new values}} \end{equation}

We also apply the inequalities:

\begin{equation} \binom{n}{\ell/2} \leq n^{\ell/2}, \qquad (p - 1)^{\ell/2} \leq p^{\ell/2} \end{equation}

So the size of a single decoding ball is, up to that small factor, at most:

\begin{equation} |B_0(u, \ell/2)| \lesssim n^{\ell/2} \cdot p^{\ell/2} \end{equation}

### Summing over all codewords

Now we count the total number of words in $\mathbb{F}_p^n$ that can be uniquely decoded — that is, that lie within distance $\ell/2$ of **some** codeword $u \in \mathcal{C}$. There are $|\mathcal{C}| = p^k$ codewords, so the total number of uniquely decodable words is at most (again up to the same small factor):

\begin{equation} \sum_{u \in \mathcal{C}} |B_0(u, \ell/2)| \lesssim p^k \cdot n^{\ell/2} \cdot p^{\ell/2} = p^{k + \ell/2} \cdot n^{\ell/2} \end{equation}

### Applying the Singleton bound

From the Singleton bound, we know:

\begin{equation} k \leq n - \ell + 1 \end{equation}

Therefore:

\begin{equation} k + \frac{\ell}{2} \leq n - \frac{\ell}{2} + 1 \end{equation}

Substituting into the earlier expression:

\begin{equation} p^{k + \ell/2} \cdot n^{\ell/2} \leq p^{n - \ell/2 + 1} \cdot n^{\ell/2} \end{equation}

### Final comparison

Now compare this to the total number of words in the ambient space $\mathbb{F}_p^n$, which is:

\begin{equation} |\mathbb{F}_p^n| = p^n \end{equation}

Writing our bound as a fraction of the whole space,

\begin{equation} \frac{p^{n - \ell/2 + 1} \cdot n^{\ell/2}}{p^n} = p \cdot \left( \frac{n}{p} \right)^{\ell/2}, \end{equation}

we see that as soon as the field is large compared to the block length (say $p \geq 2n$), this ratio is at most $p \cdot 2^{-\ell/2}$, which is negligible once $\ell$ exceeds roughly $2 \log_2 p$; in that regime:

\begin{equation} p^{n - \ell/2 + 1} \cdot n^{\ell/2} \ll p^n \end{equation}

This means that the **proportion** of words that are uniquely decodable is **exponentially small**. So:

\begin{equation} \text{Most words in } \mathbb{F}_p^n \text{ are not uniquely decodable.} \end{equation}

This insight is critical: although unique decoding is **guaranteed locally**, it does **not** apply to most of the ambient space.

### Summary

* Unique decoding is **guaranteed** within a radius of $\mu(\mathcal{C})/2$
* Beyond that, multiple codewords may be equally close → decoding becomes ambiguous
* The fraction of words that are uniquely decodable is exponentially small in $n$ unless the rate is very low

:::info
### Example: Counting Words in a Hamming Ball

Let's use a concrete example to see how this counting works. We'll consider a code over the field $\mathbb{F}_3 = \{0, 1, 2\}$ (so the number of possible symbols is $p=3$). Suppose:

- The **block length** of our codewords is $n=4$.
- We are interested in a **Hamming radius** of $r = 1$. (This corresponds to $\ell/2 = 1$ in the previous notation, meaning we're looking at words differing in at most 1 position).

Our goal is to calculate the number of words $w \in \mathbb{F}_3^4$ that are in the **Hamming ball of radius 1** centered at a specific codeword $u$. That is, we want to find $|B_0(u, 1)|$.

#### Step-by-step calculation:

We are counting all words $w$ such that their Hamming distance from $u$, denoted $|u - w|_0$, is less than or equal to 1. This means $w$ can differ from $u$ in $0$ positions (i.e., $w=u$) or in exactly $1$ position.

Using the formula for the size of a Hamming ball:

\begin{equation} |B_0(u, r)| = \sum_{i = 0}^{r} \binom{n}{i} (p - 1)^i \end{equation}

For our specific case ($n=4, r=1, p=3$):

\begin{aligned} |B_0(u, 1)| &= \sum_{i = 0}^{1}\binom{4}{i} (3 - 1)^i \\ &= \binom{4}{0}(2)^0 + \binom{4}{1}(2)^1 \\ &= 1 + 4 \cdot 2 \\ &= 9 \end{aligned}

This means the Hamming ball of radius 1 around $u$ contains:

- The original codeword $u$ itself (1 word).
- 8 other words, each differing from $u$ in exactly one coordinate position.

#### Comparison to Total Words:

The total number of possible words of length $n=4$ over $\mathbb{F}_3$ is $p^n = 3^4 = 81$. So, the Hamming ball $B_0(u, 1)$ contains 9 words. This is only $9/81 \approx 0.111$, or about 11% of all possible words in $\mathbb{F}_3^4$.

This example concretely demonstrates how the formula

\begin{equation} \sum_{i = 0}^{\lfloor \ell/2 \rfloor} \binom{n}{i} (p - 1)^i \end{equation}

calculates the exact number of words $w$ within a given Hamming radius of a codeword $u$. It also highlights that this set of "decodable" words (for a unique decoding scheme with this radius) is often a small fraction of the entire space of possible received words.
:::

## List Decoding and the Johnson Bound

In real-world settings, decoding a message correctly is often challenging when there are too many errors. Unique decoding guarantees work only when the received word is *very close* to the original codeword — within the decoding radius $\mu(\mathcal{C})/2$. But what if more errors occur?

### Motivation: Beyond Unique Decoding

Instead of asking “what is the *unique* closest codeword to $w$?”, we can ask:

> What are **all** codewords in $\mathcal{C}$ that are *close enough* to $w$?

This leads to the notion of **list decoding**.

### List decoding definition

For a linear code $\mathcal{C} \subseteq \mathbb{F}_p^n$ and a received word $w \in \mathbb{F}_p^n$, define:

\begin{equation} \text{List}[w, \mathcal{C}, \delta] := \left\{ u \in \mathcal{C} \ \middle| \ \Delta(u, w) \leq \delta \right\} \end{equation}

In words: this is the set of codewords that are within relative distance $\delta$ of $w$.

### How large can the list be?

When $\delta < \mu(\mathcal{C})/2$, the list has size at most 1 — we are in the **unique decoding** regime. When $\delta$ is larger, the list may contain multiple codewords. But how many? That’s where the **Johnson bound** comes in.

## The Johnson Bound

The Johnson bound gives a universal upper bound on the number of codewords that can lie close to any word $w$ — even when the number of errors exceeds the unique decoding radius.

Let $\mathcal{C} \subseteq \mathbb{F}_p^n$ be a linear code with **relative minimum distance** $\mu := \ell/n$. Then for any $w \in \mathbb{F}_p^n$ and any $\delta$ in the interval:

\begin{equation} 0 < \delta < 1 - \sqrt{1 - \mu} \end{equation}

we have the following bound on the list size:

\begin{equation} |\text{List}[w, \mathcal{C}, \delta]| \leq \frac{1}{\varepsilon_\delta} \end{equation}

where:

\begin{equation} \varepsilon_\delta := 2 \sqrt{1 - \mu} \left(1 - \sqrt{1 - \mu} - \delta \right) \end{equation}

This result holds for **any** code $\mathcal{C}$ — it's a worst-case upper bound on how many codewords can fall within a radius $\delta$ of any received word.

### Understanding the function $\varepsilon_\delta$

- As $\delta \to 0$, we are back in the unique decoding regime: $\varepsilon_\delta$ is at its largest, so the bound $1/\varepsilon_\delta$ on the list size is at its smallest (and indeed, for $\delta < \mu/2$ the list contains at most one codeword).
- As $\delta \to 1 - \sqrt{1 - \mu}$, the bound **blows up**, meaning that the list may contain many codewords. - Beyond that point, **no general list size bound exists** — it becomes code-dependent. ### Summary of decoding regimes We can summarize how the list size behaves as a function of the decoding radius $\delta$: | $\delta$ | Max list size | Regime | |------------------------|----------------------------------|----------------------------| | $0 \leq \delta < \mu/2$ | $\leq 1$ | Unique decoding | | $\mu/2 \leq \delta < 1 - \sqrt{1 - \mu}$ | $\leq \frac{1}{\varepsilon_\delta}$ | Johnson bound (list decoding) | | $\delta \geq 1 - \sqrt{1 - \mu}$ | No general bound (depends on $\mathcal{C}$) | Beyond Johnson bound | You can also visualize this with the decoding landscape: ![image](https://hackmd.io/_uploads/ryetii3bge.png) ### Why is this important? The Johnson bound is crucial in the design of modern decoding algorithms, which can efficiently recover all nearby codewords even when unique decoding is impossible. It tells us that controlled list decoding is possible beyond the unique decoding radius, and that the list size remains bounded and computationally manageable — up to a certain threshold. ## Convenient Terminology: $\delta$-close and $\delta$-far To make reasoning about list decoding cleaner, we introduce two useful terms that describe how a received word relates to the code: **$\delta$-close** and **$\delta$-far**. These definitions are based on the **relative Hamming distance** $\Delta(w, c)$ between a received word $w$ and a codeword $c$. ### Definition: $\delta$-close We say that a word $w \in \mathbb{F}_p^n$ is **$\delta$-close** to the code $\mathcal{C} \subseteq \mathbb{F}_p^n$ if there exists *at least one* codeword $c \in \mathcal{C}$ such that the relative distance is at most $\delta$: \begin{equation} \exists c \in \mathcal{C} \quad \text{such that} \quad \Delta(w, c) \leq \delta \end{equation} This is equivalent to saying: \begin{equation} |\text{List}[w, \mathcal{C}, \delta]| \geq 1 \end{equation} We denote this situation compactly as: \begin{equation} \Delta(w, \mathcal{C}) \leq \delta \end{equation} That is, the **minimum** distance from $w$ to any codeword in $\mathcal{C}$ is at most $\delta$. ### Definition: $\delta$-far On the other hand, we say that $w$ is **$\delta$-far** from the code $\mathcal{C}$ if *every* codeword $c \in \mathcal{C}$ is at distance **strictly greater** than $\delta$: \begin{equation} \forall c \in \mathcal{C}, \quad \Delta(w, c) > \delta \end{equation} This means the list of nearby codewords is empty: \begin{equation} |\text{List}[w, \mathcal{C}, \delta]| = 0 \end{equation} And we write: \begin{equation} \Delta(w, \mathcal{C}) > \delta \end{equation} ### Why are these terms useful? They help express decoding questions more naturally: - Is a received word close enough to decode? - Should we search the list or reject the input? Instead of always writing the full list decoding expression, we can now talk about whether $w$ is $\delta$-close or $\delta$-far — a cleaner abstraction that still captures the geometric intuition of decoding in Hamming space.
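To close, here is a small illustrative sketch (ours; `rel_dist` and `decoding_list` are made-up helper names) that applies this vocabulary to the running $[3, 2, 2]_2$ code: it computes $\text{List}[w, \mathcal{C}, \delta]$ and $\Delta(w, \mathcal{C})$, and reports whether $w$ is $\delta$-close or $\delta$-far.

```python
# Illustrative sketch: list decoding and the delta-close / delta-far test
# for the running [3, 2, 2]_2 code.
from fractions import Fraction

codewords = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (1, 1, 0)]   # the code C

def rel_dist(u, v):
    """Relative Hamming distance Delta(u, v)."""
    return Fraction(sum(1 for a, b in zip(u, v) if a != b), len(u))

def decoding_list(w, code, delta):
    """List[w, C, delta]: all codewords within relative distance delta of w."""
    return [u for u in code if rel_dist(u, w) <= delta]

w = (1, 0, 0)
delta = Fraction(1, 3)

candidates = decoding_list(w, codewords, delta)
dist_to_code = min(rel_dist(u, w) for u in codewords)       # Delta(w, C)

print("List[w, C, delta] =", candidates)                    # three codewords at distance 1/3
print("Delta(w, C) =", dist_to_code)                        # 1/3
print("w is delta-close" if dist_to_code <= delta else "w is delta-far")
```

With $\delta = \frac{1}{3}$ the list already contains three codewords — the same three that were tied at distance $\frac{1}{3}$ from $w = (1, 0, 0)$ in the unique-decoding example — so $w$ is $\delta$-close to $\mathcal{C}$, yet not uniquely decodable.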