In this note, I will give a high-level, intuitive explanation of how FRI works and why it is secure.
Let me begin by explaining what FRI is.
FRI is the acronym for Fast Reed-Solomon IOP of Proximity.
Well, that doesn't look helpful. Let me explain each part of the name one by one.
So, first, FRI is an IOP (Interactive Oracle Proof). An IOP is a kind of interactive protocol where the prover can send large strings to the verifier, but the verifier does not need to read the complete strings. Instead, the verifier can select just a few places in the string to read. We say the verifier has **oracle access** to the string, or that the prover sends a **string oracle** to the verifier. One important thing to note is that once the string is sent to the verifier, the prover can no longer modify it. In practice, an IOP is instantiated with Merkle trees. However, I will stick with the IOP convention, which is more convenient: e.g., I will say the prover sends a vector to the verifier via a string oracle, and the verifier reads some entries of this vector from the string oracle.
Secondly, FRI is used for checking whether a given vector (by "given", I mean the verifier already has oracle access to it) is a member of the **Reed-Solomon code**. If you are not familiar with this, here is an explanation starting from codes in general. A code $C$ with codeword length $N$ is a set of vectors of length $N$, where the vectors, called codewords, are over a finite field $\mathbb{F}$. The full vector space of length $N$ contains $|\mathbb{F}|^N$ vectors and has dimension $N$, but the code $C$ contains far fewer vectors than this. The Reed-Solomon code is a linear code, i.e., it is a linear subspace of $\mathbb{F}^N$ of dimension $k<N$. Being a linear space means that for any codewords $a$ and $b$ and scalars $\alpha$ and $\beta$, the vector $\alpha\cdot a+\beta\cdot b$ is also a codeword. In particular, the all-zero vector is a codeword of every linear code. Finally, the Reed-Solomon code is **Maximum-Distance Separable (MDS)**, which is a nice property and one of the primary reasons why we are interested in it. MDS means any two different codewords differ in at least $d=N-k+1$ positions, which is the largest distance any code of dimension $k$ can achieve. Since we are talking about linear codes, this is equivalent to saying that any non-zero codeword contains at least $N-k+1$ non-zero entries, or at most $k-1$ zeros.
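If you want to see the MDS property in action, here is a tiny brute-force check in Python. The parameters are hypothetical (GF(5), $N=4$, $k=2$), and it uses the polynomial-evaluation view of Reed-Solomon codes that I will only properly introduce later, so treat it as a sketch:

```python
from itertools import product

# Brute-force sanity check of the MDS property on a tiny Reed-Solomon code.
# Hypothetical parameters: field GF(5), codeword length N = 4, dimension k = 2.
# A codeword is the evaluation of a degree-<k polynomial at 4 distinct points.
p, N, k = 5, 4, 2
points = [1, 2, 3, 4]

def encode(coeffs):
    # Evaluate the polynomial with the given coefficients at every point.
    return tuple(sum(c * x**i for i, c in enumerate(coeffs)) % p for x in points)

codewords = [encode(cs) for cs in product(range(p), repeat=k)]
weights = [sum(1 for e in w if e != 0) for w in codewords if any(w)]
print(min(weights))  # prints 3, i.e. d = N - k + 1, the MDS bound
```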
That's all you need to know about Reed-Solomon code so far. I will gradually introduce more about this code when I explain the mechanisms of FRI.
In FRI, the verifier is given oracle access to a vector of length $N$ and wants to check whether this vector is a Reed-Solomon codeword by reading just a few positions of it. However, this is not possible if the verifier reads fewer than $k$ positions: no matter which positions the verifier selects and what values it reads there, there is always some codeword that matches these values at these positions. That's why we need a more sophisticated protocol, and that protocol is FRI.
Thirdly, FRI is fast, which means the prover won't compute or send too many vectors, either as string oracles or as direct messages, to the verifier, and the verifier won't read too many positions in those vectors. More precisely, the total number of field elements sent through the string oracles is linear in $N$, the amount of direct communication is at most logarithmic, and the verifier reads at most a logarithmic number of positions across all the vectors.
Finally, FRI is a proximity test protocol, i.e., it cannot convince the verifier that the given vector is exactly a codeword, but only that the given vector is within a certain (Hamming) distance of some codeword. This is due to the following unfortunate fact: no matter how cleverly your protocol is designed, as long as the verifier reads just a small fraction of the given vector, say $q$ positions out of $N$, it cannot tell apart a codeword $w$ and a non-codeword vector $w'$ that differs from $w$ in just one position with probability better than $q/N$. As long as that one differing position is not read by the verifier, the prover can simply pretend that $w'$ is $w$, and the protocol executes exactly the same regardless of whether $w$ is replaced with $w'$. Therefore, FRI can only catch the bad vector with high probability when the vector is far from every valid codeword by a certain threshold distance, say relative distance $\delta$. Note that even if $\delta$ is as small as $1/10$, the situation is much better: just reading 10 positions of the vector gives the verifier a decent probability of hitting one of the "bad" positions, potentially allowing the verifier to tell $w$ apart from $w'$.
Note that a proximity test with tolerance $\delta$ does not mean that the above-mentioned $w'$, which differs from $w$ in just one position, is classified as "valid". No, it's still considered invalid, and if the verifier finds out, no matter how slim the chance is, that the given vector is $w'$ instead of $w$, the verifier rejects. In other words, the "proximity" only admits failure to catch some really "bad but hard to catch" vectors; it does not admit them as "good" vectors.
OK. That's the explanation of the name of FRI and its purpose. So, how does FRI work?
In FRI, $N$ is usually selected to be a power of $2$, say $N=2^n$ for some integer $n$, typically somewhere between $5$ and $20$, and for simplicity, let's just set the dimension of the Reed-Solomon code to $k=N/2$. Therefore, the distance between codewords is at least $N/2+1$, i.e., every non-zero codeword has at least $N/2+1$ non-zero entries.
To help understand the protocol, imagine the following scenario where Alice and Bob are playing a game. Bob has written down a sequence of numbers on the table and covered each number with a plate. Alice is trying to tell whether the sequence of numbers is a valid Reed-Solomon codeword. Alice can uncover some of the numbers, but wants to uncover as few as possible.
As a simple example, suppose $N=8$, and the sequence of numbers, denoted by the vector $v$, has the content shown in the following figure. The codeword closest to $v$ is $(1,2,3,4,5,6,7,8)$. The three positions where $v$ differs from this closest codeword are marked in red in the first row. Note that the red color is there only for you, the reader of this article. To Alice, all numbers look black, even if she removes all the plates, as shown in the second row.

At the beginning of the game, Alice sees $N$ plates and nothing else. She has no idea about the content of $v$, and no idea at which positions $v$ differs from the closest codeword. What could Alice do?
The naive method, of course, is to remove all the plates. This method is absolutely correct and succeeds with probability 1. Unfortunately, it requires uncovering $N$ plates, which couldn't be worse.
If Alice accepts some probability of failure, she can be more efficient: she doesn't have to uncover all the numbers, $N/2+1$ positions are enough. In the above simple example, Alice could remove five plates selected at random. As long as Alice catches one of the red positions (which happens with probability $55/56$), it's very unlikely that the five values in these positions are part of some valid codeword.
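If you want to double-check the $55/56$: Alice misses all three red positions only if all five uncovered plates come from the five black positions. A one-liner confirms this:

```python
from math import comb

# Probability that 5 uniformly chosen plates (out of N = 8) miss all 3 red
# positions: all 5 must come from the 5 black positions.
miss = comb(5, 5) / comb(8, 5)   # 1/56
print(1 - miss)                  # 55/56, about 0.982
```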
However, $N/2+1$ is still too much for Alice. So can she be more efficient than this?
Let's consider another approach: Alice asks Bob to tell her the entire vector $v$, checks that $v$ is a valid codeword, and then randomly uncovers some numbers to verify that Bob told her the correct $v$.
You may think this doesn't make any sense. What's the difference between this and removing all the plates on the table?
The difference is as follows. In our imagined scenario, the rule of the game is to "uncover as few numbers on the table as possible", so hearing a number directly from Bob is considered "cheaper" than removing a plate. If that's not convincing, because the game rule was established by me arbitrarily, let's go back to the IOP model. In the IOP model, the number of reads from the string oracles is a separate cost metric from the amount of direct communication. This is because the actual cost of oracle access depends on how the IOP is instantiated into a standard protocol, which is undecided when we are designing the IOP. For example, when the IOP is instantiated with a Merkle tree, reading a number from the string oracle corresponds to the prover sending an entire Merkle path. If we use other vector commitment schemes instead of Merkle trees, the cost is different. Of course, the cost of opening a vector commitment is always larger than that of directly sending the value, so in the IOP model we consider direct communication cheaper than reading the oracle.
Let's go back to the game. Now we know that Bob directly telling Alice the content of $v$ is cheaper. Of course, it is cheaper for a reason: it's unreliable. Bob could cheat by telling lies, but the numbers written on the table cannot lie. So Alice can't just believe Bob; she randomly uncovers some numbers on the table to make sure that Bob isn't lying.
Now, what's the probability for Alice to catch a cheating Bob? If Bob tells Alice a non-codeword, he fails right away, so the best strategy for Bob is to tell Alice a valid codeword that's different from $v$. In the above simple example, the best choice for Bob is to tell Alice $(1,2,3,4,5,6,7,8)$, which differs from $v$ in only three positions. For Alice, removing one plate gives her a $3/8$ probability of winning, and removing two makes the probability a bit more than $1-(5/8)^2$, which is quite good for such a small price.
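These numbers are easy to verify (a quick sanity check; the values are specific to this toy example):

```python
from math import comb

# Catching at least one of the 3 red positions out of N = 8.
one_plate = 3 / 8
two_plates = 1 - comb(5, 2) / comb(8, 2)    # without replacement: 9/14
print(one_plate, two_plates, 1 - (5/8)**2)  # 0.375  0.642...  0.609...
```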
However, in practice, asking the prover to send the entire $v$ is unacceptable even if direct messages are considered cheaper than oracle reads. So let's count that as a penalty for Alice in the game, too: we also want to minimize the amount of direct communication with Bob.
Could Alice do better than the above strategy?
Here comes the hard part of FRI, for which we need to introduce one more property of the Reed-Solomon code.
For some particularly selected parameters, the Reed-Solomon code is **foldable**, which means there exist vectors $t_+,t_-$ of size $N/2$ such that if we take any codeword $w$ from the Reed-Solomon code $C$ and split it into two halves $w_{left}$ and $w_{right}$, then $(w_{left}+w_{right})\circ t_+$ and $(w_{left}-w_{right})\circ t_-$ (where $\circ$ means entry-wise multiplication) are also codewords, not in the original code $C$, of course, but in another Reed-Solomon code $C'$ that has codeword length $N/2$. We can continue folding codewords in $C'$ using another two vectors $t_+',t_-'$ into codewords in $C''$, and so on. Note that it's not technically correct to say that $C$ alone is foldable. Instead we should say the sequence of codes $C,C',C'',\cdots$ forms a family of foldable codes. We call the vectors $t_+,t_-,t_+',t_-',\cdots$ the twist factors.
This is enough to understand FRI. However, if you are interested, here is a more detailed definition of the family of Reed-Solomon codes used in FRI. Let $N$ be the codeword size with $N=2^n$, and let $f$ be any polynomial of degree less than $\rho N$, where the rate $\rho$ is a power of $1/2$. For simplicity, let's just assume $\rho=1/2$, so $f$ is a polynomial of degree less than $N/2$. Let $\omega$ be a primitive $N$-th root of unity in $\mathbb{F}$. Then $(f(1),f(\omega),\cdots,f(\omega^{N-1}))$ is a codeword. Conversely, a vector of length $N$ is a codeword if and only if there exists such a polynomial $f$. In summary, the Reed-Solomon code has a one-to-one correspondence with the polynomials of degree less than $N/2$. That's why the term Reed-Solomon codeword is often used interchangeably with polynomial. Now, let's fold the codewords. Consider a polynomial $f=f_0+f_1X+\cdots+f_{N/2-1}X^{N/2-1}$. We define another two polynomials by $f_e(X^2):=\frac{f(X)+f(-X)}{2}$ and $f_o(X^2):=\frac{f(X)-f(-X)}{2X}$. The two polynomials are well defined because the right-hand sides of the two definitions consist only of terms of the form $X^{2i}$ ($f_e$ collects the even-degree coefficients of $f$, and $f_o$ the odd-degree ones). Also note that $f_e$ and $f_o$ have degree less than $N/4$. Now we evaluate $f_e$ over $(1,\omega^2,\omega^4,\cdots,\omega^{N-2})$. By the definition of $f_e$, and by the fact that $\omega^{N/2}=-1$, we get $\left(\frac{f(1)+f(\omega^{N/2})}{2},\frac{f(\omega)+f(\omega^{N/2+1})}{2},\cdots,\frac{f(\omega^{N/2-1})+f(\omega^{N-1})}{2}\right)$, which is a codeword in the next code $C'$. This codeword is obtained exactly by taking the left half and the right half of the codeword of $f$, adding them together, and dividing by $2$. We can omit the constant factor $2$ since the code is linear, so the twist factor $t_+$ is simply the all-one vector. Similarly, $f_o$ corresponds to the codeword $\left(\frac{f(1)-f(\omega^{N/2})}{2},\frac{f(\omega)-f(\omega^{N/2+1})}{2\omega},\cdots,\frac{f(\omega^{N/2-1})-f(\omega^{N-1})}{2\omega^{N/2-1}}\right)$, so the twist factor $t_-$ is $(1,\omega^{-1},\cdots,\omega^{-(N/2-1)})$.
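If you want to see these identities with concrete numbers, here is a short Python check. The field and parameters are hypothetical toy choices: GF(17), $N=8$, and $\omega=2$, which has order $8$ modulo $17$:

```python
# A numeric check of the folding identities above, over the hypothetical
# field GF(17) with N = 8 and omega = 2 (an element of order 8 mod 17).
p, N = 17, 8
omega = 2
inv2 = pow(2, -1, p)   # include the 1/2 so the fold lands exactly on f_e, f_o

def ev(coeffs, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p

f = [3, 1, 4, 1]             # arbitrary polynomial of degree < N/2 = 4
fe, fo = f[0::2], f[1::2]    # f_e(Y) = f0 + f2*Y,  f_o(Y) = f1 + f3*Y

w = [ev(f, pow(omega, i, p)) for i in range(N)]   # codeword in C
left, right = w[:4], w[4:]

# Fold with t_+ = all-ones and t_- = (1, omega^{-1}, omega^{-2}, omega^{-3}).
t_minus = [pow(omega, -i, p) for i in range(4)]
a = [(l + r) * inv2 % p for l, r in zip(left, right)]
b = [(l - r) * inv2 * t % p for l, r, t in zip(left, right, t_minus)]

# They must equal f_e and f_o evaluated on the squared domain (codewords in C').
dom2 = [pow(omega, 2 * i, p) for i in range(4)]
assert a == [ev(fe, x) for x in dom2]
assert b == [ev(fo, x) for x in dom2]
print("fold matches f_e and f_o")
```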
Note that for proper twist factors (which is the case for Reed-Solomon codes), the map from a vector in $\mathbb{F}^N$ to the two smaller vectors is invertible. Therefore, given any vector $v$, if $(v_{left}+v_{right})\circ t_+$ and $(v_{left}-v_{right})\circ t_-$ are both codewords in $C'$, then $v$ is a codeword in $C$.
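Here is a small self-contained sketch of this inversion, reusing the hypothetical GF(17) parameters from above (any entry-wise invertible twist factors work the same way):

```python
# Inverting the fold, over the hypothetical GF(17) setup from above.
p = 17
t_plus = [1, 1, 1, 1]
t_minus = [pow(2, -i, p) for i in range(4)]   # twist factors for omega = 2

v = [5, 0, 11, 3, 7, 16, 2, 9]                # any vector, codeword or not
vl, vr = v[:4], v[4:]
a = [(x + y) * tp % p for x, y, tp in zip(vl, vr, t_plus)]
b = [(x - y) * tm % p for x, y, tm in zip(vl, vr, t_minus)]

# Invert entry-wise: v_left = (a/t_+ + b/t_-)/2, v_right = (a/t_+ - b/t_-)/2.
inv2 = pow(2, -1, p)
rec_l = [(x * pow(tp, -1, p) + y * pow(tm, -1, p)) * inv2 % p
         for x, y, tp, tm in zip(a, b, t_plus, t_minus)]
rec_r = [(x * pow(tp, -1, p) - y * pow(tm, -1, p)) * inv2 % p
         for x, y, tp, tm in zip(a, b, t_plus, t_minus)]
assert rec_l == vl and rec_r == vr
print("v recovered from the two folded vectors")
```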
Let's go back to the game between Alice and Bob.
With the knowledge of foldable codes in mind, and assuming that both Alice and Bob know the twist factors $t_+,t_-$ (which are very simple for Reed-Solomon codes), Alice could ask Bob to split $v$ into two halves $v_{left}$ and $v_{right}$ and compute $a=(v_{left}+v_{right})\circ t_+$ and $b=(v_{left}-v_{right})\circ t_-$, which are both codewords in $C'$ if $v$ is a codeword in $C$. Then, given the above fact, it suffices for Alice to check that $a$ and $b$ are valid codewords.
As illustrated in the following figure, where we ignore the twist factors (or assume they are just all-one vectors), $a$ should have been $(6,8,10,12)$ and $b$ should have been $(-4,-4,-4,-4)$; the positions where the actual values differ from the expected values are marked in red.

Alice could ask Bob to tell her $a$ and $b$ instead of $v$. However, this does not improve the situation, since $a$ and $b$ together still contain $N$ numbers.
Instead, Alice samples a random number $\alpha$, tells Bob $\alpha$, and asks Bob to tell her $a+\alpha\cdot b$ instead of $v$. Intuitively, if either $a$ or $b$ is far from all codewords, then unless Bob is very lucky, the combined vector is also far from any valid codeword. This intuition is formalized as the proximity gap of Reed-Solomon codes, which is the core theorem in the security proof of FRI. I will introduce it in another article.
Here are some tricky facts about this strategy.
Note that Bob is free to choose closest codewords for $a$ and $b$ other than the ones computed from the closest codeword of $v$. In the above example, the codeword closest to $b$ is $(-3,-4,-5,-6)$ rather than $(-4,-4,-4,-4)$, so we should have marked the vectors as follows.

The distances of $a$ and $b$ from their closest codewords are both just $1$, smaller than the original $3$. Even if we measure the relative distance, they are $1/4$, smaller than the original $3/8$. Has Bob increased his chance of winning?
Don't worry: the two red numbers are in different positions in $a$ and $b$, so after they are combined using a random $\alpha$, the result is a vector with relative distance $1/2$ with overwhelming probability. So the relative distance increases rather than decreases, and Alice is even more likely to win under this strategy.
However, could Bob be more clever? Could Bob pick a vector $v$ in a particular way such that even if we merge the red positions of $a$ and $b$, the relative number of red positions still decreases compared to $v$? In fact, in the above example, Bob is already playing it clever. Note that adding the fourth position and the eighth position produces $3+9=12$, which is a correct value, i.e., two red values add up to a black value, because the differences cancel out. Bob is able to achieve this because $a$ and $b$ are computed from $v$ deterministically, so Bob could select $v$ with a carefully designed strategy to ensure the cancellation always happens. The question is, could Bob find an even better $v$ than this? For example, one that decreases the total number of red positions in $a$ and $b$ together to only one, or makes the two red numbers appear in the same position of $a$ and $b$?
The answer is no, and if you are interested, here is the proof (which you can skip without affecting your understanding of this article). Suppose Bob successfully finds some $v$ such that the number of red positions in $a=(v_{left}+v_{right})\circ t_+$ and $b=(v_{left}-v_{right})\circ t_-$, after merging the red positions of the two vectors, is at most $\delta N/2$. In other words, Bob finds codewords $\tilde{a}$ and $\tilde{b}$ such that the number of positions where $a$ differs from $\tilde{a}$ or $b$ differs from $\tilde{b}$ is at most $\delta N/2$. By invertibility, we can find $\tilde{v}$ such that $\tilde{a}=(\tilde{v}_{left}+\tilde{v}_{right})\circ t_+$ and $\tilde{b}=(\tilde{v}_{left}-\tilde{v}_{right})\circ t_-$. Now take the difference of $v$ and $\tilde{v}$: the difference vector $\Delta v$ satisfies $\Delta a=a-\tilde{a}=(\Delta v_{left}+\Delta v_{right})\circ t_+$ and $\Delta b=b-\tilde{b}=(\Delta v_{left}-\Delta v_{right})\circ t_-$. Since we assumed that Bob is clever enough to make $a$ and $b$ close to $\tilde{a}$ and $\tilde{b}$, the number of positions where at least one of $\Delta a$ and $\Delta b$ is non-zero is at most $\delta N/2$. That means for the other $(1-\delta)N/2$ positions, both $\Delta a$ and $\Delta b$ are zero, and hence (again by invertibility, applied entry-wise) both $\Delta v_{left}$ and $\Delta v_{right}$ are zero at those positions. As a result, $\Delta v$ has at least $(1-\delta)N$ zeros, which means $v$ differs from the codeword $\tilde{v}$ in at most $\delta N$ positions. Note that $\tilde{v}$ is not necessarily the codeword closest to $v$, so the closest codeword could be even closer than $\tilde{v}$. In summary, Bob cannot make the number of positions where $a$ or $b$ differs from their closest codewords fewer than $\delta N/2$, unless the distance of $v$ from $C$ is already smaller than $\delta N$ in the first place.
Let's come back to the game. Now Alice asks Bob to tell her $v'=a+\alpha\cdot b$. As we discussed before, if $v$ is a codeword in $C$, then $v'$ is a codeword in $C'$. And if $v$ has distance at least $\delta N$ from every codeword in $C$, then with very high probability, $v'$ has distance at least $\delta N/2$ from every codeword in $C'$.
Suppose $\alpha=2$; then $v'$ takes the following values.

In this simple example, $a=v_{left}+v_{right}$ and $b=v_{left}-v_{right}$, and $v'=a+2b$, so $v'$ can be written as a linear combination of the two halves of $v$: $v'=3v_{left}-v_{right}$. In a real execution of FRI, the relation between $v'$ and $v$ isn't that simple. It's simple (and technically incorrect) here because we have omitted the twist factors. With the twist factors, the relation takes the form $v'=c\circ v_{left}+d\circ v_{right}$ for some vectors $c,d$ that depend on $\alpha$, i.e., $c$ is of the form $c=c_1+\alpha c_2$ and $d$ is of the form $d=d_1+\alpha d_2$. However, let's pretend it's as simple as $v'=3v_{left}-v_{right}$ for ease of explanation.
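To see where $c$ and $d$ come from, just expand the fold using its definition:

$$v'=a+\alpha\cdot b=(v_{left}+v_{right})\circ t_++\alpha\cdot(v_{left}-v_{right})\circ t_-=v_{left}\circ(t_++\alpha\cdot t_-)+v_{right}\circ(t_+-\alpha\cdot t_-),$$

so $c=t_++\alpha\cdot t_-$ and $d=t_+-\alpha\cdot t_-$.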

By the above discussion, it suffices for Alice to check that $v'$ is a valid codeword. However, as before, Bob could lie about $v'$ by telling Alice a valid codeword that differs from the true $v'$. To prevent this, note that although $v'$ isn't directly written down on the table, every entry of $v'$ can be deterministically computed from just two entries of $v$. In this example, if Alice wants to check whether $v_3'$ is indeed the value Bob told her, she uncovers $v_3$ and $v_7$ and compares $v_3'$ against $3v_3-v_7$.
What's the best strategy for Bob? Since the $v'$ honestly computed from $v$ differs from any valid codeword in at least two positions, the best strategy is to find such a closest codeword, tell Alice this codeword instead of $v'$, and hope that Alice doesn't check the wrong positions. If Alice checks one entry of $v'$, Bob wins with probability at most $1/2$, and Alice needs to uncover two positions of $v$. Alice's winning probability is a bit lower than before, but the communication cost has decreased from $N$ to $N/2$. Not bad.
Let's continue this. Alice could further ask Bob to fold $v'$ into $v''$ using another random $\beta$. However, to check each entry of $v''$, Alice would now have to uncover four positions in $v$. This cost doubles with each fold, and would soon make the entire strategy as bad as the naive method of directly uncovering $N/2$ positions of $v$.
To save the cost, Alice could ask Bob to write down $v'$ on the table and cover the numbers with plates, just like $v$. Bob could write down the correct $v'$, or replace it with a valid codeword, or modify only one position of $v'$, or take any other strategy that maximizes his chance of winning. Then Alice and Bob apply the above strategy to $v'$ to check whether $v'$ is a codeword. That is, Bob tells Alice the entire $v''$, and Alice randomly checks the correctness of $v''$ against the values of $v'$ written on the table. Assume that in the simple example, $\beta=3$ and Bob writes down the correct $v'$; the result is illustrated in the following figure.

In this example, Alice has a $1/2$ probability of winning, because Bob must lie with either $v''=(-10,-10)$ or $v''=(0,0)$, and in either case, Bob is caught cheating with probability $1/2$. However, Bob could have written down a codeword instead of $v'$, as in the following example.

In this case, Alice could be cheated into believing that $v'$ is a valid codeword (in fact, it indeed is). However, two entries of $v'$ are inconsistent with $v$: in this example, $v_1'$ and $v_3'$. So Alice could randomly pick one of them, say $v_1'$, continue to uncover $v_1$ and $v_5$, and catch Bob cheating by checking whether $v_1'=3v_1-v_5$. The probability that Alice wins in this case is still $1/2$. Now, Alice needs to uncover three numbers written on the table, but the communication cost is only two numbers.
You have probably figured out how things continue and how FRI works. Yes, Alice could keep folding until the communication cost becomes a constant, while the number of plates to remove from the table is only logarithmic in $N$. This is basically the idea of FRI. The missing part is an intuitive explanation of why this is secure. We could continue to argue that Alice wins with a certain probability against this or that strategy of Bob, but enumerating all possible strategies Bob could take is obviously not the right way to prove security. That would be too exhausting even for this absurdly simple example. How do we argue that Alice always has a good chance of winning regardless of Bob's strategy?
To accomplish this, let me introduce a property of a vector that measures Bob's chance of winning. I will illustrate it with another game, played by Bob alone. To differentiate the two games, we call the original one game A and this new one game B. In game B, we still start from the vector $v$. Then we include another vector $s$, also of size $N$, which we call Bob's score vector. Initially, $s$ is the vector of $N$ ones, i.e., $s=(1,1,1,\cdots,1)$.

In this game, Bob is allowed to change any position of $v$. The ultimate goal is to gradually change $v$ into a codeword. However, whenever Bob makes a change, the corresponding position of $s$ is set to zero. More formally, for any codeword $w$, define the function $s(v,w):=\sum_{i:v_i=w_i}s_i$, which we call the **agreement** of $v$ and $w$ weighted by $s$.
Obviously, at the beginning, since $s$ is all-one, $s(v,w)$ is just $N$ minus the distance between $v$ and $w$. We now argue that, for any codeword $w$, no matter how Bob modifies $v$, he can never increase $s(v,w)$. To see this, note that if Bob modifies some position $i$ where $v_i$ is already equal to $w_i$, he is definitely making things worse. And if Bob modifies some position $i$ where $v_i\neq w_i$, then $s_i$ becomes zero, and $s(v,w)$ stays the same regardless of the value $v_i$ is changed to.
In the above example, if Bob changes $v_4$ from $3$ to $4$, although $v$ becomes closer to $w$, the score $s_4$ becomes zero, so the weighted agreement does not change, i.e., it is still $5$.

Therefore, the best score Bob could hope for is $\max_{w\in C} s(v,w)$, and Bob has a way to achieve it: find the $w$ maximizing $s(v,w)$ and gradually modify $v$ into $w$.
Obviously, the maximal score Bob could get in game B measures Bob's best chance of winning game A when Alice uniformly randomly picks one position $i$ to check whether $v_i$ agrees with $w_i$. More precisely, Bob's winning probability is $s(v,w)/N$, where $w$ is picked by Bob to maximize the agreement. Let $s(v,C):=\max_{w\in C} s(v,w)$; we call it the agreement of $v$ with the code $C$ weighted by $s$. Then the best score Bob could get is $s(v,C)$, and the best probability for Bob to win game A is $s(v,C)/N$.
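To make the definition concrete, here is a brute-force computation of $s(v,C)$ on a toy code. The setup is a hypothetical stand-in for $C'$ matching the twist-factor-free examples above: "arithmetic-progression" codewords, i.e., degree-$<2$ polynomials evaluated at $0,1,2,3$ over GF(17):

```python
from itertools import product

# Computing the weighted agreement s(v, C) by brute force on a toy code.
p = 17
dom = [0, 1, 2, 3]
codewords = [tuple((c0 + c1 * x) % p for x in dom)
             for c0, c1 in product(range(p), repeat=2)]

def agreement(v, s, w):
    # Sum of the scores s_i over the positions where v agrees with w.
    return sum(si for vi, si, wi in zip(v, s, w) if vi == wi)

v = (15, 0, 1, 0)        # v' = (-2, 0, 1, 0) represented mod 17
s = (1, 1, 1, 1)         # all-one score vector
best = max(codewords, key=lambda w: agreement(v, s, w))
print(best, agreement(v, s, best))   # best agreement is 2, e.g. w = (0,0,0,0)
```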
So far, the weighted agreement between $v$ and $C$ measures the probability that Bob wins when Alice takes the initial strategy in game A. What about the next strategy, where Alice asks Bob to tell her $v'$ instead of $v$? How do we modify game B to measure Bob's probability of winning?
In game B, we also sample $\alpha$, let $v'$ be the correct vector computed from $v$ and $\alpha$, and let $s'$ be the score vector of length $N/2$, also initialized to all ones. Everything works the same as before: the maximal score Bob could get is $s'(v',C')$, and the maximal probability that Bob wins game A is $s'(v',C')/(N/2)$.

Next, we consider the final version of game A, where Bob first decides the value of $v'$, then receives $\beta$, and finally decides the value of $v''$. Correspondingly, we divide game B into two stages.
In the first stage, let $s'$ be initialized to the all-one vector of size $N/2$, and let $v'$ be initialized to the correct vector computed from $v$, which, with high probability, has relative distance at least $\delta$ from $C'$.
As before, no matter what strategy Bob takes, he can never increase $s'(v',C')$. Therefore, whatever the final values of $v'$ and $s'$ are when Bob stops changing $v'$ in the first stage, $s'(v',C')$ is at most $(1-\delta)\cdot N/2$.
For example, if Bob chooses to change $v_1'$ to zero, although $v'$ becomes closer to a codeword, the weighted agreement is still just $2$.

Next, we proceed to the second stage. Sample $\beta$ and compute $v''$ as in game A, i.e., from the final content of $v'$ in the first stage using $\beta$. Then we initialize $s''$ by splitting $s'$ into left and right halves and computing $s''=(s_{left}'+s_{right}')/2$. That's the start of the second stage, illustrated in the following figure.

As before, in the second stage Bob may modify $v''$, which zeroes out the corresponding positions of $s''$, and Bob should change $v''$ into a codeword in the end.
We first argue that for every $i$ from $1$ to $N/4$, $s_i''$ measures the conditional probability that Bob wins game A given that Alice picks $v_i''$ to check. To see this, note that if Bob ever changes $v_i''$, then $s_i''$ is zero according to the rules of game B, and Bob's winning probability in game A is also zero if Alice happens to check $v_i''$. Otherwise, if Bob never changed $v_i''$, then $s_i''=(s_{left,i}'+s_{right,i}')/2=(s_i'+s_{i+N/4}')/2$. By similar arguments, $s_i'$ and $s_{i+N/4}'$ measure the conditional probabilities that Bob wins given that Alice continues by checking position $i$ or position $i+N/4$ of $v'$, respectively. Since Alice picks between these two positions of $v'$ uniformly at random, the probability that Bob wins is exactly $(s_i'+s_{i+N/4}')/2$, which equals $s_i''$.
In the above example, which is the starting state of the second stage, Bob hasn't changed $v''$ yet. Let's temporarily ignore the fact that $v''$ is not a codeword yet. If Alice chooses to check $v_1''$, she uncovers $v_1'$ and $v_3'$, which are respectively $0$ and $1$. Note that $v_1'$ has been modified, so $s_1'=0$, and if Alice continues by checking $v_1'$, Bob fails, i.e., Bob wins with probability $s_1'=0$. Otherwise, if Alice checks $v_3'$, which has not been modified, Bob wins with probability $s_3'=1$. Therefore, on average, the probability that Bob wins is $1/2$. Similarly, if Alice checks $v_2''$, Bob wins with probability $1$.
Next, suppose Bob chooses to change $v_2''$ to $-2$ to make $v''$ a codeword. Then $s''$ becomes $(1/2,0)$. It's easy to check that $s''$ still measures Bob's conditional winning probabilities.
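As a sanity check, we can simulate game A and compare Bob's empirical winning rate with the prediction $(s_1''+s_2'')/2=1/4$. The concrete vector $v$ below is hypothetical: the article's figures are not reproduced here, so I picked values consistent with the clues in the running example (e.g., $v_4=3$, $v_8=9$, and $v_i'=3v_i-v_{i+4}$):

```python
import random

# Monte Carlo check (hypothetical values) that s'' predicts Bob's chance of
# winning game A. The table holds v and the v' Bob wrote down; Bob has
# claimed the codeword v'' = (-2, -2) after changing v''_2.
p = 17
v       = [1, 2, 3, 3, 5, 6, 8, 9]   # hypothetical, consistent with the text
v_table = [0, 0, 1, 0]               # Bob changed v'_1 of the honest (-2, 0, 1, 0)
v2      = [-2, -2]                   # Bob's claimed v''
# alpha = 2, beta = 3, so v' = 3*v_l - v_r and v'' = 4*v'_l - 2*v'_r.

wins, trials = 0, 100_000
for _ in range(trials):
    i = random.randrange(2)          # Alice checks v''_{i+1} against the table
    j = random.choice([i, i + 2])    # then one of the two v' positions against v
    ok = (v2[i] - (4 * v_table[i] - 2 * v_table[i + 2])) % p == 0
    ok &= (v_table[j] - (3 * v[j] - v[j + 4])) % p == 0
    wins += ok
print(wins / trials)   # about 0.25, the average of s'' = (1/2, 0)
```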

Therefore, after entering the second stage, the best strategy for Bob is to find the codeword closest to $v''$, where closeness is weighted by the score vector $s''$. In the above example, although both codewords $(-2,-2)$ and $(0,0)$ are closest to $(-2,0)$ in Hamming distance, under the weight $(1/2,1)$ the best codeword is $(0,0)$, which has weighted agreement $1$, while $(-2,-2)$ only has weighted agreement $1/2$. So $s''(v'',C'')=1$. As before, no matter how Bob changes $v''$, he can never increase $s''(v'',C'')$.
In either stage, Bob cannot increase $s'(v',C')$ or $s''(v'',C'')$. The only missing piece is to show that during the transition from the first stage to the second, the initial value of $s''(v'',C'')$ in the second stage is not larger than $s'(v',C')/2$, except with small probability over the choice of $\beta$. This statement is a variant of the proximity gap theorem for Reed-Solomon codes that involves the weights. As said before, I will leave it to another article.
I hope the above example has outlined how FRI works and how its security is proved. I conclude this article with a complete description of FRI in the IOP model.
In the beginning, the verifier has a string oracle that represents a vector $v$ of size $N=2^n$. Let $v^{(0)}=v$ and let $r$ be a small constant.
1. **Commit Phase**. For round $i$ from $1$ to $n-r$, the prover and the verifier execute the following steps.
    1. The verifier samples $\alpha_i$ and sends it to the prover.
    2. The prover splits $v^{(i-1)}$ into halves $(v_{left}^{(i-1)},v_{right}^{(i-1)})$ and computes $v^{(i)}=(v_{left}^{(i-1)}+v_{right}^{(i-1)})\circ t_+^{(i)}+\alpha_i\cdot(v_{left}^{(i-1)}-v_{right}^{(i-1)})\circ t_-^{(i)}$.
    3. If $i<n-r$, the prover sends the string oracle of $v^{(i)}$ to the verifier. Otherwise, the prover directly sends the vector $v^{(n-r)}$ to the verifier.
2. The verifier checks that $v^{(n-r)}$ is a codeword.
3. **Query Phase**. Repeat the following steps $Q$ times.
    1. The verifier samples $\ell^{(n-r)}$ uniformly from $1$ to $2^{r}$ and reads $v_{\ell^{(n-r)}}^{(n-r)}$ directly from the prover's message.
    2. For round $i$ from $n-r-1$ down to $0$, execute the following steps.
        1. The verifier reads both $v_{\ell^{(i+1)}}^{(i)}$ and $v_{\ell^{(i+1)}+2^{n-i-1}}^{(i)}$ from the string oracle.
        2. The verifier checks that $v_{\ell^{(i+1)}}^{(i+1)}=(v_{\ell^{(i+1)}}^{(i)}+v_{\ell^{(i+1)}+2^{n-i-1}}^{(i)})\cdot t_{+,\ell^{(i+1)}}^{(i+1)}+\alpha_{i+1}\cdot(v_{\ell^{(i+1)}}^{(i)}-v_{\ell^{(i+1)}+2^{n-i-1}}^{(i)})\cdot t_{-,\ell^{(i+1)}}^{(i+1)}$.
        3. The verifier sets $\ell^{(i)}$ uniformly at random to one of $\ell^{(i+1)}$ and $\ell^{(i+1)}+2^{n-i-1}$.
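To tie the description together, here is a toy end-to-end run of this protocol in Python, with an honest prover and the hypothetical parameters used throughout this article: GF(17), $N=8$ (so $n=3$), $\rho=1/2$, $r=1$, $\omega=2$, $t_+$ the all-one vector, and $t_-$ the inverse powers of the current domain generator. In a real instantiation the oracle lists would be Merkle commitments; this sketch only checks the arithmetic:

```python
import random

# Toy FRI run over GF(17): N = 8, n = 3, rho = 1/2, r = 1, omega = 2.
p, n, r = 17, 3, 1
N = 2 ** n
omega = 2                       # an element of order 8 mod 17

def ev(coeffs, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p

def fold(v, alpha, w):
    # One fold: t_+ = all-ones, t_- = (1, w^-1, w^-2, ...) for the current
    # domain generator w; returns the length-halved vector.
    h = len(v) // 2
    left, right = v[:h], v[h:]
    return [((left[j] + right[j])
             + alpha * (left[j] - right[j]) * pow(w, -j, p)) % p
            for j in range(h)]

# --- Commit phase (honest prover) ---------------------------------------
f = [random.randrange(p) for _ in range(N // 2)]         # degree < N/2
oracles = [[ev(f, pow(omega, i, p)) for i in range(N)]]  # v^(0)
alphas, w = [], omega
for _ in range(n - r):
    alpha = random.randrange(p)                          # verifier's challenge
    alphas.append(alpha)
    oracles.append(fold(oracles[-1], alpha, w))
    w = w * w % p                                        # next domain generator

# --- Verifier's checks ---------------------------------------------------
v_last = oracles[-1]                                     # sent in the clear
assert len(set(v_last)) == 1   # dimension rho * 2^r = 1: constant vectors only
for _ in range(10):                                      # Q = 10 queries
    ell = random.randrange(2 ** r)                       # 0-indexed here
    for i in range(n - r - 1, -1, -1):                   # walk back to v^(0)
        h = 2 ** (n - i - 1)                             # half-length of v^(i)
        lo, hi = oracles[i][ell], oracles[i][ell + h]
        wi = pow(omega, 2 ** i, p)                       # generator of domain i
        expect = ((lo + hi) + alphas[i] * (lo - hi) * pow(wi, -ell, p)) % p
        assert oracles[i + 1][ell] == expect
        ell = random.choice([ell, ell + h])
print("all checks passed")
```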
This is the simplest version of FRI, and it can be improved in several respects. For example, in the last round of the commit phase, the prover could send only $\rho\cdot 2^r$ entries of $v^{(n-r)}$ to the verifier, and the verifier computes the rest of $v^{(n-r)}$ locally. In fact, the verifier doesn't even need to compute the entire $v^{(n-r)}$, only the values selected in the query phase. In the query phase, the verifier can instead sample $\ell^{(1)}$ from $1$ to $N/2$ and compute $\ell^{(i)}=((\ell^{(i-1)}-1)\bmod 2^{n-i})+1$, i.e., walk the same random path from the bottom up. Instead of reading all of $v_{\ell^{(i+1)}}^{(i)}$, $v_{\ell^{(i+1)}+2^{n-i-1}}^{(i)}$, and $v_{\ell^{(i+1)}}^{(i+1)}$ from the oracles, the verifier reads only the sibling entry it hasn't computed and locally derives $v_{\ell^{(i+1)}}^{(i+1)}=(v_{\ell^{(i+1)}}^{(i)}+v_{\ell^{(i+1)}+2^{n-i-1}}^{(i)})\cdot t_{+,\ell^{(i+1)}}^{(i+1)}+\alpha_{i+1}\cdot(v_{\ell^{(i+1)}}^{(i)}-v_{\ell^{(i+1)}+2^{n-i-1}}^{(i)})\cdot t_{-,\ell^{(i+1)}}^{(i+1)}$. Therefore, for each repetition of the query phase, the verifier reads only one entry of each string oracle, except that it still reads two positions of $v^{(0)}$.
That's all for this article. In the next article, I will fill in the missing piece in the security proof of FRI, and talk about the proximity gap.