# 2024 Summer Research Internship: Before Finance
This is the record of each meeting of the NYCU Applied Mathematics Summer Research Internship under the guidance of Prof. Yuki Chino. It covers some topics in random walk theory, including the gambler's ruin problem and the first hitting time of $1$ from $0$, and also offers a brief introduction to martingales and related theorems in the setting of Lebesgue integration theory.
## Summary
We first work on random walk theory. We start from the famous gambler's ruin problem, in which a gambler starts to bet with $z$ dollars and hopes to build wealth up to $A$ dollars ($z<A$) without going broke. On each bet, there is probability $p$ of gaining one dollar and probability $q=1-p$ of losing one dollar. In the long run, we conclude that the gambler must either go broke or attain wealth $A$. We also calculate the expected length of the game $$T_A(z)=\left\{\begin{array}{ll}\dfrac{A}{p-q}\left(\dfrac{1}{1-\left(\dfrac{q}{p}\right)^A}-\dfrac{\left(\dfrac{q}{p}\right)^z}{1-\left(\dfrac{q}{p}\right)^A}\right)-\dfrac{z}{p-q}&\quad\text{if }p\ne q;\\ Az-z^2&\quad\text{if }p=q,\end{array}\right.$$ and the probability that the wealth reaches one end before the other $$P_A(z)=\left\{\begin{array}{ll}\dfrac{\left(\frac{q}{p}\right)^z-1}{\left(\frac{q}{p}\right)^A-1}&\quad\text{if }p\ne q;\\\dfrac{z}{A}&\quad\text{if }p=q.\end{array}\right.$$ We further investigate the distribution of the first hitting time $\tau_1$ of $1$ from $0$ for the simple random walk. At each time, there is probability $p$ of a $+1$ step and probability $q=1-p$ of a $-1$ step. The explicit formulae are $$f_{\tau_1}(2n-1)=\dfrac{(2q)^{n-1}p^n}{n!}\prod_{k=2}^n(2k-3),\quad{n\geq 2},$$ $$f_{\tau_1}(1)=p,$$ and $$f_{\tau_1}(2n)=0,\quad n\geq 0.$$
After the random walk theory, we take a brief look at martingales and related topics in the setting of Lebesgue integration theory. We first introduce the definition of a martingale and explain every detail.
:::danger
**Definition 13.16** (Martingale).
A family $(X_t, \mathcal F_t)_{t\in T}$ is called a <u>martingale</u> if
1. the process $X_t$ is adapted to the filtration $\mathcal F_t$,
2. $X_t\in L^1(\Omega, \mathcal F, P)$ for all $t\in T$, and
3. $X_s=E(X_t\mid\mathcal F_s)$ for all $s\leq t$.
If the equal sign is replaced by $\leq$ or $\geq$, then $(X_t, \mathcal F_t)_{t\in T}$ is called a <u>submartingale</u> or a <u>supermartingale</u>.
:::
Then, we state and prove some famous theorems, including the Doob decomposition, which can be used to determine the largest optimal exercise time of an American option,
:::info
**Theorem 13.17** (Doob Decomposition).
If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale, then there exist two random processes, $M_n$ and $A_n$ such that
1. $X_n=M_n+A_n$ for $n\in\mathbb N$,
2. $(M_n, \mathcal F_n)_{n\in\mathbb N}$ is a martingale,
3. $A_1=0$ and $A_n$ is $\mathcal F_{n-1}$-measurable for $n\geq 2$, and
4. $A_n$ is non-decreasing (increasing), i.e., $$A_n(\omega)\leq A_{n+1}(\omega)$$ almost surely for $n\in\mathbb N$.
If there is another pair of processes $\tilde{M}_n$ and $\tilde{A}_n$ with the same properties, then $M_n=\tilde{M}_n$ and $A_n=\tilde{A}_n$ almost surely.
:::
the optional sampling theorem,
:::info
**Theorem 13.18** (Optional Sampling Theorem).
If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale and $\sigma$ and $\tau$ are two stopping times such that $\sigma\leq\tau\leq k$ for some $k\in\mathbb N$, then $$X_{\sigma}\leq E(X_{\tau}\mid\mathcal F_{\sigma}).$$ If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a martingale or a supermartingale, then the same statement holds with $\leq$ replaced by $=$ or $\geq$ respectively.
:::
Doob's first martingale convergence theorem,
:::info
**Theorem 13.33** (Doob's First Martingale Convergence Theorem).
Let $(X_n, \mathcal F_n)_{n\in\mathbb N}$ be a right-closable martingale. Then, $$\lim_{n\to\infty}X_n=X_{\infty}$$ almost surely and in $L^1(\Omega, \mathcal F, P)$.
:::
and Doob's inequality.
:::info
**Theorem 13.22** (Doob's inequality).
If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale, then $$\lambda\cdot P(A(\lambda, n))\leq\int_{A(\lambda, n)}X_n\text{ d}P\leq E(\max(X_n, 0))$$ for any $n\in\mathbb N$ and for any $\lambda>0$, where $\displaystyle A(\lambda, n)=\{\omega\in\Omega\mid \max_{1\leq i\leq n}X_i(\omega)\geq \lambda\}$.
:::
## Gambler's Ruin Problem and the Random Walk Theory
### Gambler's Ruin Problem
> I use Koralov and Sinai's book Theory of Probability and Random Processes (Chapter 6) as a main reference.
We can imagine that there is a gambler, whose initial fortune is $z$ dollars, placing one-dollar bets at every integer moment of time, with probability $p$ of gaining one dollar and probability $1-p$ of losing one dollar. The fortune of the gambler after $n$ time steps can be represented as the position, after $n$ steps, of a random walk starting at $z$. The game stops when the gambler's fortune becomes equal to his desired wealth $A$ or zero, whichever happens first. We are interested in the distribution of the length of the game and in the probabilities of the gambler either losing the entire fortune or reaching the goal of accumulating a fortune equal to $A$.

Graph 1.1: An example of this kind of bet with $p=0.3$, starting at $z=20$.
In more precise language, we consider a *one-dimensional random walk of infinite length* with transition probabilities $P(x, x+1)=p$, $P(x, x-1)=1-p=q$, and $P(x, y)=0$ if $|x-y|\ne1$. This means that the probability of making one step to the right is $p$, the probability of making one step to the left is $q$, and all the other probabilities are zero. Given a pair of positive integers $(z, A)$ with $A>z$, we shall study the distribution of the number of steps needed for the random walk starting at $z$ to reach one of the end-points of the interval $[0, A]$. We shall also be interested in finding the probability of the random walk reaching one end-point of the interval before reaching the other.
To sum up, we have a one-dimensional random walk of infinite length, and we would like to find
1. the distribution of the length of the game, and
2. the probability that the wealth reaches one end before the other.
#### One-Dimensional Random Walk of Infinite Length
Let $\{X_k\}$ be an infinite sequence of random variables with value $1$ or $-1$, i.e., $X_k=\pm1$. The sequence $\{X_k\}$ represents the steps. In addition, we call $\{S_k\}$ the sequence of positions of a "particle" performing a random walk. It follows that $$S_k=S_0+\sum_{\ell=1}^k X_\ell,$$ where $S_0$ is the starting point.
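As a quick illustration, the walk is easy to simulate; the following Python sketch (the function and parameter names are our own, not from the reference) draws one trajectory:

```python
import random

def random_walk(z, p, n_steps, seed=None):
    """Simulate n_steps of a simple random walk started at z.

    Each step is +1 with probability p and -1 with probability 1 - p.
    Returns the list of positions S_0, S_1, ..., S_{n_steps}.
    """
    rng = random.Random(seed)
    positions = [z]
    for _ in range(n_steps):
        step = 1 if rng.random() < p else -1
        positions.append(positions[-1] + step)
    return positions

# One short trajectory with the parameters of Graph 1.1.
path = random_walk(z=20, p=0.3, n_steps=10, seed=0)
print(path)
```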
### One Special Relation in Gambler's Ruin Problem
Let $P_n(z)$ be the probability that a trajectory of the random walk starting at $z$ does not reach the end-points of the interval during the first $n$ time steps. It is clear that $P_0(z)=1$ for $z\in(0, A)$. For the singular cases, set $P_n(0)=0$ and $P_n(A)=0$, which agrees with the fact that a game that starts at an end-point lasts zero steps. We have a special relation. Let $z\in(0, A)$ and let $n$ be a positive integer. Let $\{S_k\}_{k=0}^\infty$ be a random walk with $S_0=z$. Then,
\begin{align*}
P_n(z)&=P(\{S_k\in(0, A)\mid k=1,2,\dots,n\})\\
&=p\cdot P_{n-1}(z+1)+q\cdot P_{n-1}(z-1).
\end{align*}
We thus obtain a partial difference equation for $P_n(z)$, $$\tag{1.1}P_n(z)=p\cdot P_{n-1}(z+1)+q\cdot P_{n-1}(z-1).$$
In general, the equation can be with any initial and boundary conditions $$P_0(z)=\varphi(z)\text{ for }z\in(0, A),$$ $$P_n(0)=\psi_n(0), \quad P_n(A)=\psi_n(A)\text{ for }n\geq 0.$$
There are a few properties of (1.1).
1. Regardless of its initial and boundary conditions, it has a unique solution, since it can be solved recursively.
2. If the boundary conditions are $\psi_n(0)=\psi_n(A)=0$, then the solution depends monotonically on the initial conditions. Namely, if the initial condition solution pairs $(\varphi^1, P^1)$ and $(\varphi^2, P^2)$ have the relation $\varphi^1(z)\geq \varphi^2(z)$ for all $z\in(0, A)$, then ${P^1}_n(z)\geq {P^2}_n(z)$ for all $z\in(0, A)$ and for all $n\geq 0$.
**Proof**. It is clear that the base case ($n=0$) holds. Suppose ${P^1}_n(z)\geq {P^2}_n(z)$ for all $z\in(0, A)$. It follows that ${P^1}_n(z)\geq {P^2}_n(z)$ for all $z\in[0, A]$ by the boundary condition. We want to show that ${P^1}_{n+1}(z)\geq {P^2}_{n+1}(z)$ for all $z\in(0, A)$. Since $${P^1}_{n+1}(z)=p\cdot {P^1}_n(z+1)+q\cdot {P^1}_n(z-1),$$ $${P^2}_{n+1}(z)=p\cdot {P^2}_n(z+1)+q\cdot {P^2}_n(z-1),$$ and we have ${P^1}_n(z)\geq {P^2}_n(z)$ for all $z\in[0, A]$, we obtain the result \begin{align*}
{P^1}_{n+1}(z)&=p\cdot {P^1}_n(z+1)+q\cdot {P^1}_n(z-1)\\
&\geq p\cdot {P^2}_n(z+1)+q\cdot {P^2}_n(z-1)\\
&={P^2}_{n+1}(z)
\end{align*} for all $z\in(0, A)$, as desired. $\square$
3. If the boundary conditions are $\psi_n(0)=\psi_n(A)=0$, then the solution depends linearly on the initial conditions. Namely, if $(\varphi^1, P^1)$ and $(\varphi^2, P^2)$ are initial condition solution pairs, then $(c_1\varphi^1+c_2\varphi^2, c_1P^1+c_2P^2)$ is another initial condition solution pair.
4. If the boundary conditions are $\psi_n(0)=\psi_n(A)=0$, then $$\max_{z\in[0, A]}P_n(z)\leq\max_{z\in[0, A]}P_{n-1}(z)$$ for $n>0$.
**Proof**. By (1.1), \begin{align*}
P_n(z)&=p\cdot P_{n-1}(z+1)+q\cdot P_{n-1}(z-1)\\
&\leq p\cdot\max_{z\in[0, A]}P_{n-1}(z)+q\cdot\max_{z\in[0, A]}P_{n-1}(z)\\
&=\max_{z\in[0, A]}P_{n-1}(z)
\end{align*} for all $z\in(0, A)$. Hence, by the boundary conditions, $$\max_{z\in[0, A]}P_n(z)\leq\max_{z\in[0, A]}P_{n-1}(z)$$ for $n>0$. $\square$
Let us work through an example of the solution to (1.1). We set up a simple instance of the partial difference equation $$P_n(z)=p\cdot P_{n-1}(z+1)+q\cdot P_{n-1}(z-1)$$ with $p=0.5$, $q=0.5$, $z\in\{0,1,2,3\}$, $P_0(z)=1$ for $z\in(0, 3)$, $P_n(0)=0$, and $P_n(3)=0$. Hence, the equation becomes $$P_n(z)=0.5\cdot P_{n-1}(z+1)+0.5\cdot P_{n-1}(z-1).$$ We first calculate $P_1(z)$. By the recursive relation, \begin{align*}
P_1(1)&=0.5\cdot P_0(0)+0.5\cdot P_0(2)\\
&=0.5,
\end{align*}
and
\begin{align*}
P_1(2)&=0.5\cdot P_0(1)+0.5\cdot P_0(3)\\
&=0.5.
\end{align*} By the boundary conditions, $$P_1(0)=P_1(3)=0.$$ We now calculate $P_2(z)$. By the recursive relation, \begin{align*}
P_2(1)&=0.5\cdot P_1(0)+0.5\cdot P_1(2)\\
&=0.25,
\end{align*}
and
\begin{align*}
P_2(2)&=0.5\cdot P_1(1)+0.5\cdot P_1(3)\\
&=0.25.
\end{align*} By the boundary conditions, $$P_2(0)=P_2(3)=0.$$ Continuing in this fashion, we can see that $$P_n(z)=\left\{\begin{matrix}
0, &\text{ if }z=0,\\
\left(\dfrac{1}{2}\right)^n, &\text{ if }z=1,\\
\left(\dfrac{1}{2}\right)^n, &\text{ if }z=2,\\
0, &\text{ if }z=3.
\end{matrix}\right.$$
We can see that this sequence $\{P_n\}$ converges to the zero map uniformly. In fact, for general $p$, the sequence of probabilities converges to the zero map as well. The intuitive idea is that (1.1) is an averaging. Since the boundary conditions are all zeros, the averaging keeps pulling weight toward the zeros, so the values in the middle gradually get smaller and smaller, tending to zero. Hence, in the long run, the gambler must attain wealth $A$ or go broke.
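The iteration above can also be carried out numerically; the following sketch (our own code, using the same toy parameters $p=q=0.5$ and $A=3$) reproduces the closed form $P_n(1)=P_n(2)=\left(\frac{1}{2}\right)^n$:

```python
def survival_probabilities(p, A, n_max):
    """Iterate P_n(z) = p*P_{n-1}(z+1) + q*P_{n-1}(z-1) with
    P_0(z) = 1 on (0, A) and boundary values P_n(0) = P_n(A) = 0.

    Returns a list of rows; row n is [P_n(0), ..., P_n(A)].
    """
    q = 1 - p
    row = [0.0] + [1.0] * (A - 1) + [0.0]   # P_0 with boundary zeros
    rows = [row]
    for _ in range(n_max):
        new = [0.0] * (A + 1)                # boundaries stay zero
        for z in range(1, A):
            new[z] = p * row[z + 1] + q * row[z - 1]
        rows.append(new)
        row = new
    return rows

rows = survival_probabilities(p=0.5, A=3, n_max=5)
# The worked example gives P_n(1) = P_n(2) = (1/2)^n for A = 3, p = 1/2.
for n, r in enumerate(rows):
    assert abs(r[1] - 0.5 ** n) < 1e-12
    assert abs(r[2] - 0.5 ** n) < 1e-12
```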
### The Original Two Aims
We first focus on the second aim. We want to find the probability that the gambler will win the fortune $A$ before going broke (the random walk reaching $A$ before reaching $0$).
Let $z, A\in\mathbb{N}$ with $z\in(0,A)$. Let $P_A(z)$ be the probability that the gambler, starting from $z$, reaches $A$ before being ruined. Then, we have \begin{align*}
P_A(z)&=p\cdot P_A(z+1)+q\cdot P_A(z-1)
\end{align*} with $P_A(0)=0$ and $P_A(A)=1$. This is a second-order linear difference equation. Suppose that there is a solution $P_A(z)=\alpha^z$, where $\alpha$ is a constant. Then, we have $$\tag{1.2}\alpha^z=p\alpha^{z+1}+q\alpha^{z-1}.$$ This implies $\alpha=\dfrac{1\pm\sqrt{1-4pq}}{2p}$. By the relation $p+q=1$, we have \begin{align*}
\alpha&=\dfrac{1\pm\sqrt{1-4pq}}{2p}\\
&=\dfrac{1\pm\sqrt{(p+q)^2-4pq}}{2p}\\
&=\dfrac{1\pm|p-q|}{2p}\\
&=\dfrac{p+q\pm|p-q|}{2p}.
\end{align*} If $p>q$, then the solutions are $\alpha=1, \dfrac{q}{p}$. If $p<q$, then the solutions are $\alpha=\dfrac{q}{p}, 1$. Hence, if $p\ne q$, we have the following general solution $$P_A(z)=C_1+C_2\left(\dfrac{q}{p}\right)^z.$$ By the boundary conditions, we have $$C_1+C_2=0$$ and $$C_1+C_2\left(\dfrac{q}{p}\right)^A=1.$$ Hence, after solving the system, we have $(C_1, C_2)=\left(\dfrac{-1}{\left(\frac{q}{p}\right)^A-1}, \dfrac{1}{\left(\frac{q}{p}\right)^A-1}\right)$. Therefore, the solution if $p\ne q$ is $$P_A(z)=\dfrac{\left(\frac{q}{p}\right)^z-1}{\left(\frac{q}{p}\right)^A-1}.$$ Now, if $p=q=\dfrac{1}{2}$, then we have only one root $\alpha=1$. By some observation, we can see that $P_A(z)=z$ is also a particular solution. Hence, when $p=q$, the solution must be in form of $D_1z+D_2$. By the boundary conditions, we have $D_2=0$ and $D_1=\dfrac{1}{A}$. Therefore, when $p=q$, the solution is $$P_A(z)=\dfrac{z}{A}.$$
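As a sanity check (our own code, not part of the reference), the closed-form $P_A(z)$ can be verified directly against the difference equation and the boundary conditions:

```python
def ruin_probability(z, A, p):
    """Closed-form probability that the walk started at z hits A before 0."""
    q = 1 - p
    if abs(p - q) < 1e-12:          # symmetric case p = q = 1/2
        return z / A
    r = q / p
    return (r ** z - 1) / (r ** A - 1)

# Check P_A(z) = p*P_A(z+1) + q*P_A(z-1) with P_A(0) = 0 and P_A(A) = 1.
A = 10
for p in (0.3, 0.5, 0.7):
    q = 1 - p
    assert ruin_probability(0, A, p) == 0
    assert abs(ruin_probability(A, A, p) - 1) < 1e-12
    for z in range(1, A):
        lhs = ruin_probability(z, A, p)
        rhs = p * ruin_probability(z + 1, A, p) + q * ruin_probability(z - 1, A, p)
        assert abs(lhs - rhs) < 1e-12
```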
We now go back to our first aim. We want to find the expected length of the game. Let $T_A(z)$ denote the expected time for the walk starting at $z$ to reach $0$ or $A$. We have $$T_A(z)=p\cdot T_A(z+1)+q\cdot T_A(z-1)+1$$ with $T_A(0)=T_A(A)=0$. It is a non-homogeneous equation. Hence, the solution will be the general solution of the homogeneous equation plus a particular solution. If $p\ne q$, we observe that $$T_A(z)=\dfrac{-z}{p-q}$$ is a particular solution. Hence, set $$T_A(z)=E_1+E_2\left(\dfrac{q}{p}\right)^z-\dfrac{z}{p-q}.$$ By the boundary conditions, we have $$E_2=\dfrac{-\frac{A}{p-q}}{1-\left(\frac{q}{p}\right)^A}$$ and $E_1=-E_2$. Therefore, $$T_A(z)=\dfrac{A}{p-q}\left(\dfrac{1}{1-\left(\dfrac{q}{p}\right)^A}-\dfrac{\left(\dfrac{q}{p}\right)^z}{1-\left(\dfrac{q}{p}\right)^A}\right)-\dfrac{z}{p-q}.$$ If $p=q=\dfrac{1}{2}$, choose the particular solution $T_A(z)=-z^2$. Hence, set $$T_A(z)=F_1z+F_2-z^2.$$ By the boundary conditions, we have $$T_A(z)=Az-z^2.$$
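Similarly, one can verify numerically (again a sketch of our own) that the closed-form expected duration satisfies $T_A(z)=p\cdot T_A(z+1)+q\cdot T_A(z-1)+1$ with $T_A(0)=T_A(A)=0$:

```python
def expected_duration(z, A, p):
    """Closed-form expected number of steps until the walk started at z
    hits 0 or A."""
    q = 1 - p
    if abs(p - q) < 1e-12:          # symmetric case p = q = 1/2
        return A * z - z ** 2
    r = q / p
    return (A / (p - q)) * (1 - r ** z) / (1 - r ** A) - z / (p - q)

# Check the non-homogeneous difference equation and boundary conditions.
A = 8
for p in (0.4, 0.5, 0.6):
    q = 1 - p
    assert abs(expected_duration(0, A, p)) < 1e-9
    assert abs(expected_duration(A, A, p)) < 1e-9
    for z in range(1, A):
        lhs = expected_duration(z, A, p)
        rhs = p * expected_duration(z + 1, A, p) \
            + q * expected_duration(z - 1, A, p) + 1
        assert abs(lhs - rhs) < 1e-9
```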
### Random Walk Theory
> Reference: [Lecture Notes](https://web.ma.utexas.edu/users/gordanz/notes/advanced_random_walks_color.pdf) by Gordan Žitković.
Let $\{S_n\}$ be a simple random walk starting from $0$ with probability $p$ of going up. That is, $X_i=\pm 1$ and $$S_n=\sum_{i=1}^n X_i$$ with $P(X_i=1)=p$ and $P(X_i=-1)=q=1-p$. Let $\tau_1=\min\{n\in\mathbb N\mid S_n=1\}$ be the first hitting time of $1$. We would like to investigate the probability mass function $f_{\tau_1}(n)=P(\tau_1=n)$. We will use the (ordinary) generating function method to find $f_{\tau_1}(n)$.
Since we start from $0$, the particle cannot reach $1$ in an even number of steps. Hence, $f_{\tau_1}(2k)=0$ for all $k\in\mathbb N$. In addition, we simply have $f_{\tau_1}(1)=p$, since the particle goes up on the first step. Now, we discuss the cases with $n>1$.
In order to go from $0$ to $1$ in $n$ steps, i.e., for the $n$-th step to be the first time the particle reaches $1$, the first step must be down to $-1$. There are $(n-1)$ steps left, and the particle needs to get to $1$ from $-1$. Suppose there are $i$ steps from $-1$ to $0$ and $n-1-i$ steps from $0$ to $1$, where $i=1,2,\dots, n-2$. In fact, the starting point does not matter at all; what really matters is the increment. Both the passage from $-1$ to $0$ and that from $0$ to $1$ have increment $+1$. Hence, we can use the same notation and obtain the recursion
\begin{equation}\tag{2.1}
f_{\tau_1}(n)=q\sum_{i=1}^{n-2}f_{\tau_1}(i)f_{\tau_1}(n-i-1),
\end{equation}
where $n\geq 2$, $f_{\tau_1}(0)=0$, and $f_{\tau_1}(1)=p$.
Let $\displaystyle G_{\tau_1}(x)=\sum_{i=0}^\infty f_{\tau_1}(i) x^i$ be the probability generating function (a power series representation of the probability mass function of the random variable). We first square the probability generating function to obtain something more like the convolution in (2.1),
\begin{equation}\tag{2.2}
(G_{\tau_1}(x))^2=\sum_{i=0}^\infty\left(\sum_{j=0}^i f_{\tau_1}(j) f_{\tau_1}(i-j)\right) x^i.
\end{equation}
This is where the magic happens. The coefficient can be simplified as
\begin{align*}
\sum_{j=0}^i f_{\tau_1}(j) f_{\tau_1}(i-j)&=f_{\tau_1}(0)f_{\tau_1}(i)+\sum_{j=1}^{i-1} f_{\tau_1}(j) f_{\tau_1}(i-j)+f_{\tau_1}(i)f_{\tau_1}(0)\\
&=\sum_{j=1}^{i-1} f_{\tau_1}(j) f_{\tau_1}(i-j)\\
&=\sum_{j=1}^{(i+1)-2} f_{\tau_1}(j) f_{\tau_1}((i+1)-j-1)\\
&=\dfrac{f_{\tau_1}(i+1)}{q}
\end{align*}
for $i\geq 2$. Hence, we can rewrite (2.2) to be
\begin{align*}
(G_{\tau_1}(x))^2&=\sum_{i=0}^\infty\left(\sum_{j=0}^i f_{\tau_1}(j) f_{\tau_1}(i-j)\right) x^i\\
&=f_{\tau_1}(0) f_{\tau_1}(0) x^0+\left(f_{\tau_1}(0) f_{\tau_1}(1)+f_{\tau_1}(1) f_{\tau_1}(0)\right) x^1+\sum_{i=2}^\infty \dfrac{f_{\tau_1}(i+1)}{q}\cdot x^i\\
&=\sum_{i=2}^\infty \dfrac{f_{\tau_1}(i+1)}{q}\cdot x^i\\
&=\dfrac{1}{q}\cdot\sum_{i=3}^\infty f_{\tau_1}(i) x^{i-1}\\
&=\dfrac{1}{qx}\cdot\sum_{i=3}^\infty f_{\tau_1}(i) x^{i}\\
&=\dfrac{1}{qx}\cdot\left(\sum_{i=0}^\infty f_{\tau_1}(i) x^{i}-f_{\tau_1}(1)x\right)\\
&=\dfrac{1}{qx}\cdot\left(G_{\tau_1}(x)-px\right),
\end{align*}
which is a quadratic equation for $G_{\tau_1}(x)$. By the quadratic formula, we have $$G_{\tau_1}(x)=\dfrac{1\pm\sqrt{1-4pqx^2}}{2qx}.$$ Since the root with the positive sign blows up as $x\to 0$, while a probability generating function must satisfy $G_{\tau_1}(0)=f_{\tau_1}(0)=0$, the generating function can only be $$G_{\tau_1}(x)=\dfrac{1-\sqrt{1-4pqx^2}}{2qx}.$$ Using Taylor's theorem, we have the explicit formulae $$f_{\tau_1}(2n-1)=\dfrac{(2q)^{n-1}p^n}{n!}\prod_{k=2}^n(2k-3),\quad{n\geq 2},$$ $$f_{\tau_1}(1)=p,$$ and $$f_{\tau_1}(2n)=0,\quad n\geq 0.$$
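The recursion (2.1) and the closed-form expression can be checked against each other numerically; the following sketch (our own code and function names) computes both for small $n$:

```python
from math import factorial, prod

def pmf_recursive(p, n_max):
    """f_{tau_1}(n) for n = 0..n_max via the convolution recursion (2.1)."""
    q = 1 - p
    f = [0.0, p] + [0.0] * (n_max - 1)
    for n in range(2, n_max + 1):
        f[n] = q * sum(f[i] * f[n - i - 1] for i in range(1, n - 1))
    return f

def pmf_closed_form(p, n):
    """f_{tau_1}(n) via the closed-form expression from Taylor's theorem."""
    q = 1 - p
    if n % 2 == 0:
        return 0.0
    if n == 1:
        return p
    m = (n + 1) // 2                     # write n = 2m - 1
    return ((2 * q) ** (m - 1) * p ** m / factorial(m)
            * prod(2 * k - 3 for k in range(2, m + 1)))

p = 0.6
f = pmf_recursive(p, 15)
for n in range(1, 16):
    assert abs(f[n] - pmf_closed_form(p, n)) < 1e-12
```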
## Martingale and Some Related Theorems
> I use Koralov and Sinai's book Theory of Probability and Random Processes (Chapter 13) as a main reference.
### Definition of Martingale and More Explanations
:::danger
**Definition 13.16** (Martingale).
A family $(X_t, \mathcal F_t)_{t\in T}$ is called a <u>martingale</u> if
1. the process $X_t$ is adapted to the filtration $\mathcal F_t$,
2. $X_t\in L^1(\Omega, \mathcal F, P)$ for all $t\in T$, and
3. $X_s=E(X_t\mid\mathcal F_s)$ for all $s\leq t$.
If the equal sign is replaced by $\leq$ or $\geq$, then $(X_t, \mathcal F_t)_{t\in T}$ is called a <u>submartingale</u> or a <u>supermartingale</u>.
:::
If one thinks of $X_t$ as the fortune of a gambler at time $t$, then a martingale is a model of a fair game (any information available by time $s$ does not affect the fact that the expected increment in the fortune over the time period from $s$ to $t$ is equal to zero). More precisely, $$E(X_t-X_s\mid\mathcal F_s)=0.$$
It can be seen that there are plenty of things to explain. We first construct the background, the notation for measurable space and its meaning.
#### Related to Measurable Space (Reference: Chapter 1.1)
:::warning
**Notation**. Let $(\Omega, \mathcal F)$ be a measurable space and let $T$ be a subset of $\mathbb R$, unless otherwise specified.
:::
:::danger
**Definition 1.7** (Measurable Space).
A <u>measurable space</u> is a pair $(\Omega, \mathcal F)$, where $\Omega$ is the space of elementary outcomes and $\mathcal F$ is a $\sigma$-algebra of subsets of $\Omega$.
:::
:::success
**Remark**. For intuition, one may think of $\mathcal F$ as the power set of $\Omega$.
:::
:::danger
**Definition 1.6** ($\sigma$-Algebra).
A collection $\mathcal F$ of subsets of $\Omega$ is called a <u>$\sigma$-algebra</u> if $\mathcal F$ is an algebra and is closed under countable unions. That is, $$\bigcup_{i=1}^\infty C_i\in\mathcal F$$ for $C_i\in\mathcal F$.
:::
:::danger
**Definition 1.1** (Algebra).
A collection $\mathcal F$ of subsets of $\Omega$ is called an <u>algebra</u> if
1. $\Omega\in\mathcal F$,
2. $C\in\mathcal F$ implies $\Omega\setminus C\in\mathcal F$, and
3. $C_i\in\mathcal F$, $i=1, 2, \dots, n$, implies $\bigcup_{i=1}^n C_i\in\mathcal F$.
:::
#### Related to Filtration
:::danger
**Definition 13.8** (Filtration).
A collection of $\sigma$-subalgebras $\mathcal F_t\subseteq \mathcal F$, $t\in T$, is called a <u>filtration</u> if $\mathcal F_s\subseteq \mathcal F_t$ for all $s\leq t$.
:::
:::danger
**Definition** ($\sigma$-Subalgebra).
Let $\mathcal F$ be a $\sigma$-algebra. If $\mathcal G\subseteq\mathcal F$ and $\mathcal G$ is a $\sigma$-algebra, we say $\mathcal G$ is a <u>$\sigma$-subalgebra</u> of $\mathcal F$.
:::
:::danger
**Definition 13.14** (Adapted).
A random process $X_t$ is called <u>adapted</u> to filtration $\mathcal F_t$ if $X_t$ is $\mathcal F_t$-measurable for each $t\in T$.
:::
:::danger
**Definition** (Random Process).
Consider a family of random variables $X_t$ defined on $(\Omega, \mathcal F, P)$ and indexed by a parameter $t\in T$. We refer to the parameter $t$ as time, and to $X_t$ as a <u>random process</u>.
:::
#### Related to $L^1$ Space (Reference: Chapter 3.7)
:::danger
**Definition 1.17** (Probability Space).
A <u>probability space</u> is a triplet $(\Omega, \mathcal F, P)$, where $(\Omega, \mathcal F)$ is a measurable space and $P$ is a probability measure. If $C\in\mathcal F$, then the number $P(C)$ is called the probability of $C$.
:::
:::danger
**Definition 1.16** (Probability Measure).
A measure $P$ on a measurable space $(\Omega, \mathcal F)$ is called a <u>probability measure</u> or a <u>probability distribution</u> if $P(\Omega)=1$.
:::
:::danger
**Definition 1.12** (Measure).
Let $(\Omega, \mathcal F)$ be a measurable space. A function $\mu:\mathcal F\to[0, \infty)$ is called a <u>finite non-negative measure</u> if $$\mu\left(\bigcup_{i=1}^\infty C_i\right)=\sum_{i=1}^\infty \mu(C_i)$$ for $C_i\in\mathcal F$ and $C_i\cap C_j=\emptyset$ for $i\ne j$. We often omit "finite non-negative" and simply refer to $\mu$ as a measure.
:::
:::danger
**Definition** ($L^p$ Norm).
Let $p\geq 1$. We write $\|f\|_p$ for the norm of $f$ with exponent $p$, defined via the Lebesgue integral. To be more precise, $$\|f\|_p=\left(\int_\Omega |f|^p\text{ d}\mu\right)^{\frac{1}{p}}.$$
:::
:::danger
**Definition** ($L^p$ Space).
Let $p\geq 1$. The set of functions for which $\|f\|_p$ is finite is denoted by $L^p(\Omega, \mathcal F, \mu)$ or simply $L^p$. To be more precise, $$L^p(\Omega, \mathcal F, \mu)=\{\text{$f$ is measurable}\mid \|f\|_p<\infty\}.$$
:::
:::danger
**Definition 1.10** (Measurable Function).
Let $(\Omega, \mathcal F)$ be a measurable space. A function $\xi:\Omega\to\mathbb R$ is said to be $\mathcal F$-measurable (simply <u>measurable</u>) if $\{\omega\in\Omega\mid a\leq\xi(\omega)<b\}=\xi^{-1}([a,b))\in\mathcal F$ for each $a, b\in\mathbb R$.
:::
#### Related to Conditional Expectation
:::danger
**Definition** (Conditional Expectation).
Let $(\Omega, \mathcal F, P)$ be a probability space. Let $f$ be a random variable. Let $B\in\mathcal F$. We define the <u>conditional expectation</u> of $f$ given $B$ as $$E(f\mid B)=\dfrac{1}{P(B)}\cdot\int_B f(\omega)\text{ d}P(\omega)$$ provided that $\displaystyle\int_B f(\omega)\text{ d}P(\omega)<\infty$ and $P(B)\neq0$.
:::
We may define the conditional expectation more generally.
:::danger
**Definition 13.1** (Conditional Expectation).
Let $(\Omega, \mathcal F, P)$ be a probability space, let $\mathcal G$ be a $\sigma$-subalgebra of $\mathcal F$, and let $f\in L^1(\Omega, \mathcal F, P)$. The <u>conditional expectation</u> of $f$ given $\mathcal G$, denoted by $E(f\mid\mathcal G)$, is the $\mathcal G$-measurable random variable $g\in L^1(\Omega, \mathcal G, P)$ such that for any $A\in\mathcal G$, $$\int_A f\text{ d}P=\int_A g\text{ d}P.$$
:::
:::info
**Corollary**.
If $f$ is measurable with respect to a $\sigma$-subalgebra $\mathcal G\subseteq\mathcal F$, then $E(f\mid\mathcal G)=f$.
:::
The conditional expectation is reduced to ordinary expectation if $f$ is independent of $\mathcal G$. This is the case when $\mathcal G$ is the trivial $\sigma$-algebra, $\mathcal G=\{\emptyset, \Omega\}$.
We can also have the conditional expectation with respect to a random variable.
:::danger
**Definition** (Conditional Expectation).
Let $f, g\in L^1(\Omega, \mathcal F, P)$. The conditional expectation of $f$ with respect to $g$ is $$E(f\mid g)=E(f\mid \sigma(g)),$$ where $\sigma(g)=\{g^{-1}(B)\mid B\in\mathcal B(\mathbb R)\}$ is the $\sigma$-algebra generated by $g$ and $\mathcal B(\mathbb R)$ denotes the Borel $\sigma$-algebra on $\mathbb R$.
:::
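On a finite probability space, $E(f\mid g)$ reduces to averaging $f$ over the level sets of $g$; here is a small illustrative sketch (the space, the functions, and all names are our own toy choices, not from the reference):

```python
def conditional_expectation(omega, prob, f, g):
    """E(f | g) on a finite probability space.

    omega: list of outcomes; prob: dict outcome -> probability;
    f, g: dicts outcome -> value. Returns a dict outcome -> E(f|g)(outcome),
    which is constant on each level set {g = c}.
    """
    result = {}
    for c in set(g.values()):
        level = [w for w in omega if g[w] == c]
        mass = sum(prob[w] for w in level)
        avg = sum(prob[w] * f[w] for w in level) / mass
        for w in level:
            result[w] = avg
    return result

# Toy example: two fair coin flips; f = number of heads, g = first flip.
omega = ["HH", "HT", "TH", "TT"]
prob = {w: 0.25 for w in omega}
f = {w: w.count("H") for w in omega}
g = {w: w[0] for w in omega}
cond = conditional_expectation(omega, prob, f, g)
# Given the first flip is heads, the expected number of heads is 1.5.
assert cond["HH"] == cond["HT"] == 1.5
assert cond["TH"] == cond["TT"] == 0.5
```

Note that averaging $E(f\mid g)$ against $P$ recovers $E(f)$, in line with the law of total expectation stated later.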
#### Stopping Time
:::danger
**Definition 13.9** (Stopping Time).
A random variable $\tau:\Omega\to T$ is a <u>stopping time</u> of the filtration $\mathcal F_t$ if $\{\tau\leq t\}=\tau^{-1}((-\infty, t])\in\mathcal F_t$ for each $t\in T$.
:::
:::danger
**Definition 1.17** (Random Variable).
A measurable function defined on a probability space is called a <u>random variable</u>.
:::
We have now finished setting up the background.
### Some Properties of Submartingale and the Doob Decomposition
:::danger
**Definition 13.16** (Martingale).
A family $(X_t, \mathcal F_t)_{t\in T}$ is called a <u>martingale</u> if
1. the process $X_t$ is adapted to the filtration $\mathcal F_t$,
2. $X_t\in L^1(\Omega, \mathcal F, P)$ for all $t\in T$, and
3. $X_s=E(X_t\mid\mathcal F_s)$ for all $s\leq t$.
If the equal sign is replaced by $\leq$ or $\geq$, then $(X_t, \mathcal F_t)_{t\in T}$ is called a <u>submartingale</u> or a <u>supermartingale</u>.
:::
:::info
**Theorem** (Conditional Jensen's Inequality).
Let $g$ be a convex function on $\mathbb R^d$ and $f$ a random variable with values in $\mathbb R^d$ such that $$E(|f|)<\infty\quad\text{and}\quad E(|g(f)|)<\infty.$$ Let $\mathcal G$ be a $\sigma$-subalgebra of $\mathcal F$. Then, $$g(E(f\mid\mathcal G))\leq E(g(f)\mid \mathcal G)$$ almost surely.
:::
**Proof**. This can be roughly seen through the properties of convex functions. For details, see "Real Analysis and Probability" by R. M. Dudley. $\square$
:::info
**Theorem**.
If $(X_t, \mathcal F_t)_{t\in T}$ is a martingale and $f$ is a convex function such that $f(X_t)$ is integrable for all $t$, then $(f(X_t), \mathcal F_t)_{t\in T}$ is a submartingale.
:::
**Proof**. By the conditional Jensen's inequality,
\begin{align*}
f(X_s)&=f(E(X_t\mid\mathcal F_s))\\
&\leq E(f(X_t)\mid\mathcal F_s).
\end{align*} Hence, $(f(X_t), \mathcal F_t)_{t\in T}$ is a submartingale. $\square$
We now focus on the case $T=\mathbb N$. It is intuitive that we can decompose a submartingale into two random processes, one of which is a martingale.
:::info
**Theorem 13.17** (Doob Decomposition).
If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale, then there exist two random processes, $M_n$ and $A_n$ such that
1. $X_n=M_n+A_n$ for $n\in\mathbb N$,
2. $(M_n, \mathcal F_n)_{n\in\mathbb N}$ is a martingale,
3. $A_1=0$ and $A_n$ is $\mathcal F_{n-1}$-measurable for $n\geq 2$, and
4. $A_n$ is non-decreasing (increasing), i.e., $$A_n(\omega)\leq A_{n+1}(\omega)$$ almost surely for $n\in\mathbb N$.
If there is another pair of processes $\tilde{M}_n$ and $\tilde{A}_n$ with the same properties, then $M_n=\tilde{M}_n$ and $A_n=\tilde{A}_n$ almost surely.
:::
**Proof**. Let $(X_n, \mathcal F_n)_{n\in\mathbb N}$ be a submartingale. We can use the relations $$A_1=0,\quad M_1=X_1,$$ $$A_n=E(X_n\mid\mathcal F_{n-1})-X_{n-1}+A_{n-1},\quad M_n=X_n-A_n\quad\text{for }n\geq 2,$$ to define inductively the random processes $A_n$ and $M_n$. Clearly, properties 1, 3, and 4 are satisfied. To verify property 2, we have
\begin{align*}
E(M_n\mid \mathcal F_{n-1})&=E(X_n-A_n\mid\mathcal F_{n-1})\\
&=E(X_n\mid\mathcal F_{n-1})-E(A_n\mid\mathcal F_{n-1})\\
&=E(X_n\mid\mathcal F_{n-1})-A_n\\
&=X_{n-1}-A_{n-1}\\
&=M_{n-1}
\end{align*} for $n\geq 2$. Since this is a discrete case, we can show the property for all $s<n-1$ once we have the property for $s=n-1$ by the law of iterated expectations. $\square$
:::info
**Lemma** ([Law of Iterated Expectations](https://en.wikipedia.org/wiki/Law_of_total_expectation#Proof_in_the_general_case)/ Law of Total Expectation).
Let $X$ be an integrable random variable and let $\mathcal{G}$ be a $\sigma$-algebra. Let $\mathcal{F}$ be a $\sigma$-subalgebra of $\mathcal{G}$. Then, $$E(E(X\mid\mathcal{G})\mid\mathcal{F})=E(X\mid\mathcal{F}).$$
:::
**Proof**. Let $A\in\mathcal F\subseteq\mathcal G$. Then,
\begin{align*}
\int_A E(E(X\mid\mathcal{G})\mid\mathcal{F})\text{ d}P&=\int_A E(X\mid\mathcal{G})\text{ d}P\\
&=\int_A X\text{ d} P.
\end{align*} The first equality holds by the defining property of $E(\cdot\mid\mathcal F)$ applied to $E(X\mid\mathcal G)$, which is valid since $A\in\mathcal F$; the second holds by the defining property of $E(\cdot\mid\mathcal G)$, since $A\in\mathcal G$. Hence, $E(E(X\mid\mathcal{G})\mid\mathcal{F})$ is an $\mathcal F$-measurable random variable whose integral over every $A\in\mathcal F$ agrees with that of $X$, so by the definition of conditional expectation it equals $E(X\mid\mathcal F)$ almost surely. $\square$
:::success
**Remark**. The Doob decomposition theorem can be used to determine the largest optimal exercise time of an American option.
:::
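As a concrete illustration of the construction in the proof (our own example, not from the book): for the symmetric simple random walk $S_n$, the process $X_n=S_n^2$ is a submartingale with $E(X_n\mid\mathcal F_{n-1})=X_{n-1}+1$, so the recursion yields the deterministic compensator $A_n=n-1$ and the martingale $M_n=S_n^2-(n-1)$:

```python
import random

def doob_decomposition_of_squared_walk(n_steps, seed=0):
    """Doob decomposition of X_n = S_n^2 for a symmetric simple random walk.

    Uses E(X_n | F_{n-1}) = X_{n-1} + 1, so the recursion from the proof
    gives A_n = n - 1 (deterministic) and M_n = X_n - A_n.
    Returns the lists X, M, A indexed from n = 1.
    """
    rng = random.Random(seed)
    s = rng.choice([-1, 1])               # S_1
    X, A, M = [s * s], [0], [s * s]       # n = 1: A_1 = 0, M_1 = X_1
    for n in range(2, n_steps + 1):
        s += rng.choice([-1, 1])
        X.append(s * s)
        A.append(A[-1] + 1)               # A_n = E(X_n|F_{n-1}) - X_{n-1} + A_{n-1}
        M.append(X[-1] - A[-1])
    return X, M, A

X, M, A = doob_decomposition_of_squared_walk(1000)
assert all(x == m + a for x, m, a in zip(X, M, A))   # X_n = M_n + A_n
assert A == list(range(1000))                         # A_n = n - 1
```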
### Optional Sampling Theorem
:::info
**Theorem 13.18** (Optional Sampling Theorem).
If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale and $\sigma$ and $\tau$ are two stopping times such that $\sigma\leq\tau\leq k$ for some $k\in\mathbb N$, then $$X_{\sigma}\leq E(X_{\tau}\mid\mathcal F_{\sigma}).$$ If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a martingale or a supermartingale, then the same statement holds with $\leq$ replaced by $=$ or $\geq$ respectively.
:::
**Proof**. The case where $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a supermartingale reduces to that of the submartingale $(-X_n, \mathcal F_n)_{n\in\mathbb N}$, and the martingale case follows by combining the two inequalities. Thus, without loss of generality, suppose $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale. Let $A\in\mathcal F_{\sigma}$. Our final goal is to obtain $$\int_A X_\sigma\text{ d}P\leq \int_A X_\tau\text{ d}P.$$ For $1\leq m\leq n$, we define $$A_m=A\cap\{\sigma=m\},$$ $$B^0_n=A_m\cap\{\tau=n\},$$ $$B^1_n=A_m\cap\{\tau>n\},$$ and $$B^2_n=A_m\cap\{\tau\geq n\}.$$ Since $B^2_n=B^0_n\cup B^1_n$ and $B^0_n\cap B^1_n=\emptyset$, $$\int_{B^2_n} X_n\text{ d}P=\int_{B^0_n} X_n\text{ d}P+\int_{B^1_n} X_n\text{ d}P.$$ Since $B^1_n\in\mathcal F_n$, the definition of a submartingale gives $$\int_{B^1_n} X_n\text{ d}P\leq \int_{B^1_n} X_{n+1}\text{ d}P,$$ and hence $$\int_{B^2_n} X_n\text{ d}P\leq\int_{B^0_n} X_n\text{ d}P+\int_{B^1_n} X_{n+1}\text{ d}P.$$ In addition, since $B^1_n=B^2_{n+1}$, $$\int_{B^2_n} X_n\text{ d}P-\int_{B^2_{n+1}} X_{n+1}\text{ d}P\leq\int_{B^0_n} X_n\text{ d}P.$$ Summing the inequality from $n=m$ to $k$ and noting that $X_n=X_\tau$ on $B^0_n$, we have
\begin{align*}
\sum_{n=m}^k \left(\int_{B^2_n} X_n\text{ d}P-\int_{B^2_{n+1}} X_{n+1}\text{ d}P\right)&\leq\sum_{n=m}^k \int_{B^0_n} X_n\text{ d}P\\
&\leq\sum_{n=m}^k \int_{B^0_n} X_\tau\text{ d}P\\
\int_{B^2_m} X_m\text{ d}P-\int_{B^2_{k+1}} X_{k+1}\text{ d}P&\leq \int_{B^2_m} X_\tau\text{ d}P.
\end{align*}
The last inequality holds since $$\bigcup_{n=m}^k B^0_n=B^2_m.$$ Since $B^2_{k+1}=\emptyset$ ($\tau\leq k$ by assumption), we simply have $$\int_{B^2_m} X_m\text{ d}P\leq \int_{B^2_m} X_\tau\text{ d}P.$$ Moreover, since $\sigma\leq\tau$, $$A_m=A\cap\{\sigma=m\}=B^2_m=A_m\cap\{\tau\geq m\}.$$ Taking summation from $m=1$ to $k$, we have $$\sum_{m=1}^k \int_{A_m} X_m\text{ d}P\leq\sum_{m=1}^k \int_{A_m} X_\tau\text{ d}P.$$ For the right-hand side, we have $$\sum_{m=1}^k \int_{A_m} X_\tau\text{ d}P=\int_{A\cap\{\sigma\leq k\}}X_\tau\text{ d}P=\int_A X_\tau\text{ d}P.$$ For the left-hand side, since $A_m=A\cap\{\sigma=m\}$, $$\sum_{m=1}^k \int_{A_m} X_m\text{ d}P=\sum_{m=1}^k \int_{A_m} X_\sigma\text{ d}P=\int_A X_\sigma\text{ d}P.$$ This completes the proof. $\square$
### Convergence of Martingales
Does a martingale have a "last" element? Is there an analogue, for random processes, of the convergence of numerical sequences? Here, a martingale $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is called <u>right-closable</u> if there exists $X_\infty\in L^1(\Omega, \mathcal F, P)$ such that $X_n=E(X_\infty\mid\mathcal F_n)$ for all $n\in\mathbb N$; we write $\mathcal F_\infty$ for the smallest $\sigma$-algebra containing all $\mathcal F_n$.
:::info
**Theorem 13.33** (Doob's First Martingale Convergence Theorem).
Let $(X_n, \mathcal F_n)_{n\in\mathbb N}$ be a right-closable martingale. Then, $$\lim_{n\to\infty}X_n=X_{\infty}$$ almost surely and in $L^1(\Omega, \mathcal F, P)$.
:::
**Proof**. Let $$F=\bigcup_{n\in\mathbb N}\{f\in L^1(\Omega, \mathcal F, P)\mid f \text{ is }\mathcal F_n\text{-measurable}\}.$$ The idea is to approximate $X_\infty$ by elements of $F$. We first show that $F$ is dense in $L^1(\Omega,\mathcal F_{\infty}, P)$: any indicator function of a set in $\mathcal F_{\infty}$ can be approximated by elements of $F$, so the same is true for finite linear combinations of indicator functions, which are dense in $L^1(\Omega,\mathcal F_{\infty}, P)$. Let $\varepsilon>0$. Since $X_\infty$ is $\mathcal F_{\infty}$-measurable, there exists $Y_\infty\in F$ such that $E(|X_\infty-Y_\infty|)<\varepsilon^2$ by the density of $F$. Define $Y_n=E(Y_\infty\mid\mathcal F_n)$. Then,
\begin{align*}
E(X_n-Y_n\mid\mathcal F_{n-1})&=E(X_{n}\mid\mathcal F_{n-1})-E(E(Y_\infty\mid\mathcal F_{n})\mid\mathcal F_{n-1})\\
&=X_{n-1}-E(Y_\infty\mid\mathcal F_{n-1})\\
&=X_{n-1}-Y_{n-1},
\end{align*} which implies that $(X_n-Y_n, \mathcal F_n)_{n\in\mathbb N}$ is a martingale. Hence, $(|X_n-Y_n|, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale. Moreover, since $|X_n-Y_n|=|E(X_\infty-Y_\infty\mid\mathcal F_n)|\leq E(|X_\infty-Y_\infty|\mid\mathcal F_n)$, taking expectations yields $$E(|X_n-Y_n|)\leq E(|X_\infty-Y_\infty|).$$
> **Theorem 13.22** (Doob's inequality)
>
> If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale, then $$\lambda\cdot P(A(\lambda, n))\leq\int_{A(\lambda, n)}X_n\text{ d}P\leq E(\max(X_n, 0))$$ for any $n\in\mathbb N$ and for any $\lambda>0$, where $\displaystyle A(\lambda, n)=\{\omega\in\Omega\mid \max_{1\leq i\leq n}X_i(\omega)\geq \lambda\}$.
By Doob's inequality applied to the submartingale $(|X_n-Y_n|, \mathcal F_n)_{n\in\mathbb N}$, $$\varepsilon\cdot P(\sup_{n\in\mathbb N}|X_n-Y_n|>\varepsilon)\leq\sup_{n\in\mathbb N}E(|X_n-Y_n|)\leq E(|X_\infty-Y_\infty|)\leq\varepsilon^2.$$ Since $Y_\infty\in F$ is $\mathcal F_N$-measurable for some finite $N$, we have $Y_n=E(Y_\infty\mid\mathcal F_n)=Y_\infty$ for all $n\geq N$. Therefore, $$P(\limsup_{n\to\infty} X_n - Y_\infty>\varepsilon)\leq\varepsilon\quad\text{and}\quad P(\liminf_{n\to\infty} X_n - Y_\infty<-\varepsilon)\leq\varepsilon.$$
> **Theorem** (Markov's Inequality).
>
> Let $X$ be a nonnegative random variable and let $a>0$. Then, $$P(X\geq a)\leq\dfrac{E(X)}{a}.$$
In addition, by Markov's inequality, $$P(|X_\infty - Y_\infty|\geq\varepsilon)\leq \dfrac{E(|X_\infty-Y_\infty|)}{\varepsilon}\leq\varepsilon.$$ Combining the two bounds, $$P(\limsup_{n\to\infty} X_n - X_\infty>2\varepsilon)\leq2\varepsilon\quad\text{and}\quad P(\liminf_{n\to\infty} X_n - X_\infty<-2\varepsilon)\leq2\varepsilon.$$ Since $\varepsilon>0$ is arbitrary, $$\lim_{n\to\infty}X_n=X_{\infty}$$ almost surely. Moreover, since $E(|X_\infty-Y_\infty|)\leq\varepsilon^2$, $E(|X_n-Y_n|)\leq E(|X_\infty-Y_\infty|)$, and $Y_n=Y_\infty$ for all $n\geq N$, we obtain $E(|X_n-X_\infty|)\leq 2\varepsilon^2$ for all $n\geq N$, so $\{X_n\}$ converges to $X_\infty$ in $L^1(\Omega, \mathcal F, P)$ as well. $\square$
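The theorem can be illustrated numerically with a Doob-type martingale, which is right-closable by construction. The sketch below is purely an illustration with finitely many steps, not a proof; the Bernoulli setup and all parameters are my own choices. It takes $X_\infty=\frac1N\sum_{i=1}^N\xi_i$ for i.i.d. Bernoulli random variables $\xi_i$, sets $X_n=E(X_\infty\mid\mathcal F_n)$ with $\mathcal F_n=\sigma(\xi_1,\dots,\xi_n)$, and compares $|X_n-X_\infty|$ for small and large $n$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, p, trials = 2_000, 0.5, 1_000

# xi_1, ..., xi_N i.i.d. Bernoulli(p); F_n = sigma(xi_1, ..., xi_n).
xi = (rng.random((trials, N)) < p).astype(float)
X_inf = xi.mean(axis=1)                    # X_infty = (xi_1 + ... + xi_N)/N

# Right-closable martingale X_n = E(X_infty | F_n)
#                               = (xi_1 + ... + xi_n + (N - n) p)/N.
partial = np.cumsum(xi, axis=1)
n = np.arange(1, N + 1)
X = (partial + (N - n) * p) / N            # X[:, n-1] = X_n

err = np.abs(X - X_inf[:, None])           # |X_n - X_infty| along each path
print(err[:, 9].mean(), err[:, -1].mean()) # mean error at n = 10 vs n = N
```

Along each simulated path the error shrinks as $n$ grows and vanishes at $n=N$, matching the almost sure and $L^1$ convergence asserted by the theorem.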
:::danger
**Definition 13.31** (Right-Closable).
A martingale $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is said to be <u>right-closable</u> if there exists a random variable $X_{\infty}\in L^1(\Omega, \mathcal F, P)$ such that $E(X_{\infty}\mid\mathcal F_n)=X_n$ for all $n\in\mathbb N$.
We define $\mathcal F_{\infty}$ as the minimal $\sigma$-algebra containing $\mathcal F_n$ for all $n$, that is, $$\mathcal F_{\infty}=\sigma\left(\bigcup_{n\in\mathbb N}\mathcal F_n\right),$$ which we may regard as $\lim_{n\to\infty}\mathcal F_n$ since $\{\mathcal F_n\}_{n\in\mathbb N}$ is a filtration. It exists since each $\mathcal F_n\subseteq\mathcal F$.
:::
:::info
**Theorem 13.22** (Doob's inequality)
If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale, then $$\lambda\cdot P(A(\lambda, n))\leq\int_{A(\lambda, n)}X_n\text{ d}P\leq E(\max(X_n, 0))$$ for any $n\in\mathbb N$ and for any $\lambda>0$, where $\displaystyle A(\lambda, n)=\{\omega\in\Omega\mid \max_{1\leq i\leq n}X_i(\omega)\geq \lambda\}$.
:::
**Proof**. Let $n\in\mathbb N$ and let $\lambda>0$. We first define a stopping time $\sigma$ by cases in order to apply the optional sampling theorem. If $\displaystyle\max_{1\leq i\leq n}X_i\geq\lambda$, let $\sigma$ be the first time $i\leq n$ at which $X_i\geq\lambda$; otherwise, put $\sigma = n$. Define another stopping time $\tau = n$. Since $\sigma\leq\tau\leq n$, the optional sampling theorem applies.
> **Theorem 13.18** (Optional Sampling Theorem).
>
> If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a submartingale and $\sigma$ and $\tau$ are two stopping times such that $\sigma\leq\tau\leq k$ for some $k\in\mathbb N$, then $$X_{\sigma}\leq E(X_{\tau}\mid\mathcal F_{\sigma}).$$ If $(X_n, \mathcal F_n)_{n\in\mathbb N}$ is a martingale or a supermartingale, then the same statement holds with $\leq$ replaced by $=$ or $\geq$ respectively.
Since $X_{\sigma}\geq\lambda$ on $A(\lambda, n)$, we have $$\int_{A(\lambda, n)}\lambda\text{ d}P\leq\int_{A(\lambda, n)}X_{\sigma}\text{ d}P.$$ Note that $A(\lambda, n)\in\mathcal F_{\sigma}$ *(Why?)* since
\begin{align*}
A(\lambda, n)\cap\{\sigma\leq m\}&=\{\omega\in\Omega\mid \max_{1\leq i\leq n}X_i(\omega)\geq \lambda\}\cap\{\omega\in\Omega\mid\sigma(\omega)\leq m\}\\
&=\{\omega\in\Omega\mid\max_{1\leq i\leq \min(m, n)}X_i(\omega)\geq\lambda\}\\
&\in\mathcal F_{\min(m, n)}\subseteq\mathcal F_m
\end{align*} for any $m\in\mathbb N$. Hence, by the optional sampling theorem, $$X_{\sigma}\leq E(X_{\tau}\mid\mathcal F_{\sigma}).$$ Since $A(\lambda, n)\in\mathcal F_{\sigma}$, integrating both sides over $A(\lambda, n)$ gives $$\int_{A(\lambda, n)}X_{\sigma}\text{ d}P\leq\int_{A(\lambda, n)}E(X_{\tau}\mid\mathcal F_{\sigma})\text{ d}P=\int_{A(\lambda, n)}X_{\tau}\text{ d}P.$$ Hence, $$\int_{A(\lambda, n)}\lambda\text{ d}P\leq\int_{A(\lambda, n)}X_{\sigma}\text{ d}P\leq\int_{A(\lambda, n)}X_{\tau}\text{ d}P=\int_{A(\lambda, n)}X_{n}\text{ d}P,$$ where the last equality holds since $\tau=n$. For the second inequality, since $X_n\leq\max(X_n, 0)$ and $\max(X_n, 0)\geq 0$, $$\int_{A(\lambda, n)}X_{n}\text{ d}P\leq\int_{A(\lambda, n)}\max(X_n, 0)\text{ d}P\leq E(\max(X_n, 0)).$$ $\square$
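As with the other results, Doob's inequality can be checked by simulation. The Python sketch below is illustrative only; the choice of submartingale $X_i=S_i^2$ for a simple symmetric random walk $S_i$, and the values of $n$ and $\lambda$, are my own. It estimates all three quantities in the chain of inequalities:

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam, trials = 30, 25.0, 200_000

# Submartingale X_i = S_i^2 for a simple symmetric random walk S_i,
# since E(S_i^2 | F_{i-1}) = S_{i-1}^2 + 1 >= S_{i-1}^2.
steps = rng.choice([-1, 1], size=(trials, n))
S = np.cumsum(steps, axis=1)
X = S**2

# A(lambda, n) = {max_{1<=i<=n} X_i >= lambda}
A = X.max(axis=1) >= lam
lhs = lam * A.mean()                  # lambda * P(A(lambda, n))
mid = (X[:, -1] * A).mean()           # integral of X_n over A(lambda, n)
rhs = np.maximum(X[:, -1], 0).mean()  # E(max(X_n, 0)) = E(S_n^2) = n
print(lhs, mid, rhs)
```

The Monte Carlo estimates should satisfy `lhs <= mid <= rhs`, with `rhs` close to $E(S_n^2)=n$.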