[Crude Draft] $EE629$: Probability Models and Application Course @ IIT Dharwad

B. N. Bharath
Assistant professor,
Department of Electrical, Electronics and Communication Engineering (EECE),
IIT Dharwad

Commenting

Please write your name and roll number followed by your comments. This helps me know who has commented. Also, please do not hesitate to comment. All the comments will be taken seriously, and no comment is a stupid comment :grinning:

Logistics

The following are the TAs for this course (will update soon!):

Shruti; unofficial TA (PhD student)

Evaluations: Quiz ($25\%$), one midsem ($35\%$) and one final exam ($40\%$). The quiz/midsem/final exam may contain some assignments (given during the regular classes).

Quiz was conducted on 18th September, 2024 at 2 PM. The following topic will be covered in the quiz
- Linear algebra-vector space definition, subspace, span, bases, matrices, PSD matrix, eigen value decomposition,
- Real analysis: problems related to convergence of sequences, $\sup$, $\inf$, $\limsup$, $\liminf$ etc.
- Probability: definition of probability measure, $\sigma$-algebra, algebra, semialgebra etc.
Midsem exam syllabus: Everything except almost sure convergence and related results. Convergence in probability and mse are included.
Quiz 2 Syllabus: Almost sure convergence, expectation of rvs, conditional expectation/probability and its properties, Guassian vectors, MMSE estimation, use of inequalities (Markov, Chebyshev and Chernoff), Markov chain definition, TPM and its properties.
Final and midsem exams will be as per the academic calendar shared by the academic office.
The repeat exam or evaluation will not be encouraged. If someone misses the quiz for a genuine reason, th en I may may use the final exam marks to grade with a possible penalty. The penalty will be decided based on the reason for absence.

References

"Introduction to Probability, statistics and Random processes," click here
Nice problems can be found here

Announcements

All the announcements will be made here.

Introduction

This course covers a bit of meaure theoretic probability, and hence requires some background in analysis and linear algebra. This will be briefly dealt with in the following.

Preliminary

First, let us look at convergence in $\mathbb R$. Let us define it using the notion of "$\epsilon$-room". Take a point $a \in \mathbb R$. An $\epsilon$-room around $a$ is simply an interval around $a$ whose size is $\epsilon$. More precisely, $\mathcal E_a := \{y: |a-y| < \epsilon/2\}$. Note that $|a-y|$ denote the distance between points $a$ and $y$. Now, we are ready to look at a sequence of numbers, say $x_n \in \mathbb R$ for any natural number $n$. Example includes $1, 1/2, 1/3, \ldots$, where $x_1 = 1$, $x_2 = 1/2$ and so on. Next, we define the notion of a sequence $x_n$ entering an $\epsilon$-room around $a$, and never leaving it. Intuitively, it means that the sequence should enter the region for say $n=1000$, and should not after that, i.e., for all $n > 1000$. A two dimensional picture of $\epsilon$-room is shown below:

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

More precisely, we have the following definition.

Definition (Entering $\epsilon$-room): we say that $x_n$ is locked in the $\epsilon$-room around $a$ if $x_n \in \mathcal E_a$ for all $n > N$ for some natural number $N$.

If a sequence converges to $a$, then we expect it be locked in the $\epsilon$-room around $a$ for $\epsilon > 0$. Intuitively, this means that the sequence enters a room, however small around $a$. Smaller the room, it may take longer for the sequence to enter, i.e., larger $N$. In otherwords, $N$ may depend on $\epsilon$, and hence we use $N_{\epsilon}$. A more precise statement of convergence is given below.

Definition (convergence): We say that a sequence $x_n$ conveges to $a$ if it is locked in the $\epsilon$-room around $a$ for every $\epsilon > 0$.

Notation: We use $\lim_{n \rightarrow \infty} x_n = a$ or $x_n \rightarrow a$ to say that $x_n$ converges to $a$, and a precise definition is given above. "$\lim$" is merely a notation.

Let us look at a few examples.

Example: Consider the sequence $x_n = 1/n$. To prove that it converges to $0$, we need to show that it enters $\epsilon$-room for any size. So, pick a room size $\epsilon > 0$. For this room, the sequence will enter it if $1/n < \epsilon$. In other words if $n > 1/\epsilon$, then $x_n$ is locked in the $\epsilon$-room around $0$. Here, $N_{\epsilon} = \lceil\frac{1}{\epsilon} \rceil$.

Problem: Prove that the sequences (i) $1/n^2$, (ii) $\log n/n$, and (iii) $\sin n/n$ converge to zero.

An interesting fact about convergence is that you can do some algebra on the sequence, and convergence of the resulting sequence is well behaved. Let me make it more clear.

If $x_n \rightarrow a$ and $y_n \rightarrow b$, then $x_n + y_n \rightarrow a+b$.
If $x_n \rightarrow a$ and $\alpha x_n \rightarrow \alpha a$ for any $\alpha \in \mathbb R$.
If $x_n \rightarrow a$ and $y_n \rightarrow b$, then $x_n / y_n \rightarrow a/b$ as long as $y_n, b \neq 0$.
If $x_n \rightarrow a$ and $y_n \rightarrow b$, then $x_n y_n \rightarrow ab$.

Problem: Prove the above. Also, prove that $x_n^2 \rightarrow a^2$, and use it to prove the last problem above.

If the sequence is bounded, i.e., there exists $M > 0$ such that $|x_n| < M$ for all $n$ and monotone, then intuitively, we expect it to converge. This is true, and is the essence of the following argument.

The idea is to come up with a guess for the limit, and then prove that it is the limit. The guess works as follows. Without loss of generality, assume that the sequence is increasing (otherwise, consider $-x_n$). Let $S:=\{x_n: n \geq 1\}$ be all the numbers in the sequence. Intuitively, the supremum of the sequence $S_1$ should be the limit, i.e., $L:= \sup_n S$. Wait a minute, what is supremum?

Consider the sets $(0,1)$, $[-1,5)$, $A:=\{1-1/n: n \geq 1\}$. What are the maximas of this set? Definitely $1$ and $5$ are not maximas of this set as it is not there in the set! Really? What about the last set? Does $1 \in A$? Think. Although the maxima does not belong to set, it would be good (for all practical purposes) to consider the respective "boundary" points as a proxy for the maxima. Then we need to define it mathematically. Lets do it by first defining the upper bound.

We say that $M$ is an upper bound on the set $A \subseteq \mathbb R$ if $x \leq M$ for all $x \in A$.
For any set $A\subseteq \mathbb R$ consider the set of all upper bounds. Cleary, this upper bound is non-empty. The minimum value of this set is contained in the set (why? verify with $(0,1)$). Thus, define the minimum of this set as supremum of $A$. Similarly, define the infimum. In summary, supremum is the least upper bound and infimum is the greatest lower bound.

Now, we need to prove that $L$ is the supremum. Lets invoke our locked room concept. Take any $\epsilon > 0$. Consider an $\epsilon$ room around $L$. Since the sequence is increasing, and $L$ is the upper bound, we have that the sequence enters the "left half" room, i.e., $(-\epsilon/2,L)$. More specifically, there exists $N_{\epsilon}$ such that $|x_n - L| < \epsilon/2$ for all $n > N_{\epsilon}$. This proves that the sequence can be locked! But, the choice of $\epsilon$ is arbitrary, and hence this is true for all $\epsilon$. This results in the following theorem.

Theorem: Bounded monotone sequence converges.

Aside: For series, see lecture $18$ onwards here.
Now, can we have the notion of convergence of sequence of vectors? To do so, we need to study vectors, i.e., linear algebra, which will be briefly reviewed below.

Quick Introduction to Linear Algebra
Linear algebra is a study of vectors and its abstraction. For more details click here and refer to chapter $1$ and $3$.

Intorduction to Probability Measure

The subject starts by considering a sample space $\Omega$; the set of all outcomes. Obviously, $\Omega$ depends on the problem. For example, in the tossing of a coin example, $\Omega = \{H,T\}$. However, if you consider rotation of a disk example, then $\Omega = [0,2 \pi]$. Thus, $\Omega$ can be countable or uncountable. The task is to attach a number called probability to every subset of $\Omega$. When we attach probability, it should satisfy the following properties:

$\Pr\{\Omega\} = 1$
For every disjoint sets $A_i$, $i=1,2,\ldots,n$, $\Pr\{\bigcup_{i=1}^n A_i\} = \sum_{i=1}^n \Pr\{A_i\}$.

Now we ask the following important question.

Question: Given any subset of $\Omega$, is it possible to attach probability that satisfies the above property?

Before attempting this question, we need to know how to start? Towards this, we shall first use $\Omega$ that is discrete.

Example A: Let $\Omega = \{H,T\}$. Suppose I attach probability this way: $\Pr\{H\} = p$ and $\Pr\{T\} = 1-p$ for any $0 \leq p \leq 1$. Then everything is fine, i.e., given any subset, i.e., $\{H\}, \{T\}, \{H,T\}, \Phi$, the probabilities are $\{p, 1-p, 1, 0\}$, which obeys all the properties of being a probability.

Example B: Let $\Omega = \{1,2,3,4,5,6\}$. Suppose I attach probability this way: $\Pr\{i\} = 1/6$ for $i=1,2,3,4,5,6$. Then everything is fine, i.e., given any subset, i.e., $\texttt{fill the subset yourself:}$, the probability of a subset $A \subseteq \Omega$ will be $\frac{\# \text{ of elements in } A}{6}$. This obeys all the properties of being a probability.

From the above two examples, it seem like there is no problem in assigning probabilities to every subset of $\Omega$! Indeed the following exercise confirms this.

Exercise: Show that if $\Omega$ is finite, then assigning probabilies to each elements in $\Omega$ such that it sums to one is a valid probability.

Well now whats the problem? Let us consider $\Omega = [0,1]$. Can we assign probabilities to every subset of $\Omega$ here? Before answering this, we need to know where to start?

Humble begining
We shall begin with some simple sets of the form
\begin{equation} \mathcal{S}:=\{(a,b),[a,b],(a,b],[a,b): a \leq b\} \end{equation}

the set of all intervals. In the rotation of a disc example, we can assign probabilities to each of the interval in a reasonable way by conducting experiments. For example, rotate the disc $n$ times and count how many times you observed an angle between $[\theta_1,\theta_2]$ will give $\Pr\{[\theta_1,\theta_2]\} = \frac{\text{# of times angle is in } [\theta_1,\theta_2]}{n}$. Thus, it is safe to assume that we have assigned probabilities to every peace in $\mathcal{S}$ in a reasonable way. This is the starting point. Let us investigate the properties of $\mathcal{S}$.

Properites of $\mathcal{S}$:

$\Phi$ and $\Omega$ is in $\mathcal{S}$
If $A, B \in \mathcal{S}$ implies $A \bigcap B \in \mathcal{S}$; easy to verify
If $A \in \mathcal{S}$, then $A^c$ can be written as a finite union of sets in $\mathcal S$, i.e., there exists $C_1,C_2,\ldots, C_n \in \mathcal S$ such that $A^c = \bigcup_{i=1}^n C_i$. Again easy to verfiy!

Let us give a name to such a system; semi-algebra also called semi-field.

Interesting example: Let me give an interesting example. Suppose you toss a coin infinitely many times, then $\Omega := \{(\omega_1,\omega_2,\ldots,): \omega_i \in \{H,T\}\}$. Consider a collection of finite subset of $\Omega$ as follows: $\mathcal C := \{(\omega_j,\omega_{j+1},\ldots,\omega_{j+n}: \omega_k \in \{H,T\}, k= j,j+1,\ldots,j+n, j,n \in \mathbb N\}$. This is analogous to intervals. Show that $\mathcal C$ is a semi-algebra.

So far we have a probability defined on a semialgebra denoted $\mathbb P_S$. Note that the semi-algebra is not a rich enough set. For example, it does not include the set of all rationals in $[0,1]$. How do we make it rich enough? The first step in this endeveour is to include all the intervals and finite union of intervals. For example, $[0,0.1] \bigcup (0.2,0.3] \bigcup [0.25,0.5)$ should belong to the set. Note that this is not necessarily disjoint but can be written as the disjoint union. Can you think how? Lets make this more formal. Consider the following set:
\begin{equation} \mathcal{A} := \left\{\bigcup_{i=1}^n I_i: I_i \in \mathcal{S}, \text{for any finite } n\right\}. \end{equation}

Properties of $\mathcal{A}$:

$\Omega, \Phi \in \mathcal{A}$.
If $A, B \in \mathcal A$, then $A \bigcap B^c \in \mathcal A$.
Closed under finite union.

Comments: Note that the second condition above implies closed under complement and intersection. Prove this as a homework problem.

Definition (algebra): Consider any set $\Omega$. We say that any set of collections of subsets of $\Omega$ denoted $\mathcal A$ is an algebra if it satisfies the above property.

Exercise: Extend the set $\mathcal C$ in the infinite coin toss example to form an algebra.

How do we attach probability to the set in algebra in a reasonable way? Easy! Take any element $A$ in $\mathcal A$. Then it can be written as $A = \bigcup_{i=1}^n I_i$ for some non-disjoint $I_i \in \mathcal S$ and $n \in \mathbb N$. But we can write this as a dijoint union (why?), say $A = \bigcup_{i=1}^N J_i$, where $J_i$'s are disjoint and $N$ is some natural number. Then, $\mathbb P_A: \mathcal A \rightarrow [0,1]$ is defined as

\begin{equation} \mathbb P_A \{A\} = \sum_{i=1}^N \mathbb{P}_S(J_i)! \end{equation}

There is one problem. Is the expression $A = \bigcup_{i=1}^N J_i$ unique? No! Then, if there are many ways of decomposing the set $A$, then each will induce a probability as above. Luckily, it turns out that all the decompositions result in the same probability for a set $A \in \mathcal A$:

FACT $1$: The probability defined as $\mathbb P_A \{A\} = \sum_{i=1}^N \mathbb{P}_S(J_i)$ is consistentent, i.e., the value is the same regardless of the decomposition.

Note that the above fact requires a proof. I will skip the proof, and leave it as a homework problem.

Question: Is $\mathbb P_A()$ a valid probability?

We need to check if it satisfies the property of being a probability:

Consider $\Omega$. Since $\Omega \in \mathcal S$, we have $\mathbb P_S(\Omega) = 1$. Letting $J_1 = \Omega$, and $J_i = \Phi$ for all $i$, we get $\mathbb P_A (\Omega) = \mathbb P_S(\Omega) = 1$.
Take disjoint sets in $\mathcal A$ say $A_1,A_2,\ldots, A_n$, then each $A_i = \cup_{j=1}^{n_i} J^{(i)}_{j}$, $i=1,2,\ldots, n$ where $J^{(i)}_{j} \in \mathcal S$. It is possible to choose $J^{(i)}_{j}$ such that $\bigcup_{j=1}^{n_i} J^{(i)}_{j}$ is disjoint for each $i$ since $A_i$'s are disjoint. Now, consider

\begin{eqnarray} \mathbb P_A(\cup_{j=1}^{n} A_{i}) &=& \sum_{i=1}^n \sum_{j=1}^{n_i} \mathbb P_S(J^{(i)}_{j}) \\ &=& \sum_{i=1}^n \mathbb P_S(\cup_{j=1}^{n_i} J^{(i)}_{j})\\ &=& \sum_{i=1}^n \mathbb P_A(A_{i}). \end{eqnarray}

Reason out why each equality above is true. The probability defined above is consistent in the sense that for every $S \in \mathcal S$, $\mathbb P_{A}(S) = \mathbb P_{S}(A)$ (requires a proof but easy). This consistency is very crucial throughout.

Now, we have a reasonable way to measure probability for each set in the algebra. What next? Ideally, one should take every subset of $\Omega$ and see if we can assign probability. It turns out that this is not possible in general due to a result famously known as the Banach Tarxsi paradox. Hence we will abondon this idea. The next best thing is to look for a richer set than $\mathcal A$ but not as big as the power set $2^{\Omega}$. This requires us to look at a way of approximately measuring any subset. This leads us to the idea of outer and inner measures (analogous to how you measure weight of an object by putting or removing weights from the other balance).

Take any set $A \subseteq \Omega$. Cover (see this for the definition of cover; especially the picture) the set by a sequence of sets from $\mathcal A$. Say the cover is $A_1,A_2,\ldots,$. Then, $\sum_{j=1}^\infty \mathbb P_A(A_j)$ is an upper bound on the true measure (if at all it exists). However, we can refine the cover to get better estimates. The best among the class is the outer measure defined below.

Outer Measure: For any set $A \subseteq \Omega$, the outer measure is defined as
\begin{equation} \mathbb P^{*}(A):=\inf \left\{\sum_{j=1}^\infty \mathbb P_A(A_j): A \subseteq \bigcup_{j=1}^{\infty} A_j, A_j \in \mathcal A \right\}. \end{equation}

Inner measure is a bit tricky. Take a set $A \subseteq \Omega$ for which we need to define inner measure. Let $E \in \mathcal A$ such that $A \subseteq E$. There always exists such a set $E$-one such set is $E = \Omega$. Take the complement of $A$ in $E$, i.e., $A^c \bigcap E$. Find the outer measure of this set. Then the inner measure is simply the measure of $E$ minus the outer measure of $A^c \bigcap E$.

Inner Measure: For any set $A \subseteq \Omega$, the outer measure relative to the set $E \in \mathcal A$ is defined as
\begin{equation} \mathbb P_{*,E}(A):= \mathbb P_{\mathcal A}(E) - \mathbb P^{*}(A^c \bigcap E). \end{equation}

Note: In the class, we were interested in only $\Omega = [0,1]$ in which case we had used $E = \Omega$; this results in the inner measure $\mathbb P_{*}(A):= 1 - \mathbb P^{*}(A^c)$. The above definition of inner measure is very general.

We say that the set is measurable if the outer measure is equal to the inner measure. More specifically,
\begin{equation} \mathbb P^{*}(A):= \mathbb P_{\mathcal A}(E) - \mathbb P^{*}(A^c \bigcap E). \end{equation}
But wait, this depends on $E$! So, a natural resolution is the following.

Measurable set: We say that any set $A \subseteq \Omega$ is measurable if for every $E \subseteq \Omega$ such that $A \subseteq E$, we have
\begin{eqnarray} \mathbb P_{\mathcal A}(E) &=& \mathbb P^{*}(A) + \mathbb P^{*}(A^c \cap E)\\ &=& \mathbb P^{*}(A \cap E) + \mathbb P^{*}(A^c \cap E). \end{eqnarray}

Note: The above definition does not work for all condition. For example, if $\Omega$ is an abstract set, and $A$ is any subset. In such cases, we are forced to choose $E= \Omega$. However, it is impossible to define the outer measure of the complement as the complement is a null in this case. Note that this was not an issue in the case of $\Omega = [0,1]$. See class notes.

Motivated by the above, it is not necessary to have the condition $A \subseteq E$, which can be removed from the above definition, and still have a valid theory. Thus, we use the following definiton due to mathematician Caretheodary.
[Bad image]

Caratheodary's Measurable set: We say that a set $A \subseteq \Omega$ is measurable if for every $E \subseteq \Omega$, we have

\begin{eqnarray} \mathbb P_{\mathcal A}(E) = \mathbb P^{*}(E) &=& \mathbb P^{*}(A \cap E) + \mathbb P^{*}(A^c \cap E). \end{eqnarray}

In this case, the measure is defined to be $\mathbb P^{*}(A)$.

Obviously, there are bad sets. But how big is the collection of good sets? More specifically, what can we say about the set of good sets:
\begin{equation} \mathcal{F}:=\{A \subseteq \Omega: A \text{ is measurable}\}. \end{equation}

Properties of $\mathcal F$:

$\Omega \in \mathcal F$.
If $A \in \mathcal F$, then $A^c \in \mathcal F$.
Closed under countable union.

Reading Assignment: Prove the above stated properties of $\mathcal F$.

There is a name for such a system. It is called $\sigma$-algebra or the $\sigma$-field. Thus, $\mathcal F$ contains set of measurable sets also called events in the probability literature.

Reading Assignment: Read about the smallest $\sigma$-algebra, $\sigma$-algebra generated by a set, Borel $\sigma$-algebra and Lebesgue measure.

Now we are ready to start the theory of probability.

Definition: Probability space is a tuple $(\Omega, \mathcal F, \mathbb P)$, where $\Omega$ is the sample space, $\mathcal F$ is the $\sigma$-algebra and $\mathbb P$ is the probability measure defined on $\mathcal F$.

With this powerful setup, we can deduce many useful conclusions.

Lemma: If $A \subseteq B$ measurable, then $\mathbb P(A) \leq \mathbb{P}(B)$.
Proof: The set $B = A \cap (A^c \cup B)$; disjoint sets. Thus, $\mathbb P(B) = \mathbb P(A) + \mathbb P(A^c \cup B) \geq \mathbb P(A)$. Easy :grinning:

Exercise: Show that $\mathbb P(A \cap B) = \mathbb P(A) + \mathbb P(B) - \mathbb P(A \cap B)$.

Defintion: We say that the events $A, B \in \mathcal F$ are independent if $\mathbb P(A \cap B) = \mathbb P(A) \mathbb P(B)$.

Example: Consider $\Omega = \mathbb R$, and the probability on the interval $[a,b]$ (can consider open or half open/closed intervals as well) is given by
\begin{equation} \mathbb P([a,b]) = \frac{1}{\sqrt{2 \pi} }\int_{a}^b e^{-x^2/2}dx. \end{equation}
One can show that this is a valid probability on intervals, i.e., $\mathcal S$ the semi-algebra. One can carry out the method above to show that the probability can be extended first to an algebra, and then to a sigma-algebra, in this case, also known as Borel sets $\textbf B(\mathbb R)$. This measure is called the Gaussian measure which you are familiar with from your previous courses.

Example: Consider $\Omega = \mathbb R^{+}$, and the probability on the interval $[a,b]$ (can consider open or half open/closed intervals as well) is given by
\begin{equation} \mathbb P([a,b]) = \lambda \int_{a}^b e^{-\lambda x}dx. \end{equation}
One can show that this is a valid probability on $\mathbb R^{+}$. This measure is called the exponential measure with rate $\lambda > 0$.

Exercise: Provide more examples of measure like the one above based on your previous courses. Also, use the Guassian measure and construct a measure on every Borel sets.

Now, we are ready to use some of the concepts above to some interesting problems. Not really practical problems-we are not yet there. Let us look at the following example, where we toss the coin infinitely many times. The sample space is

\begin{equation} \Omega = \{(\omega_1,\omega_2,\ldots,): \omega_i \in \{H,T\}\}. \end{equation}

The twist is that each time I toss, I change the coin. More precisely, I use a coin with bias $1/n$ for the $n$-th toss. As you toss more and more, the chances of Head to occur becomes lesser and lesser. Now, we ask the following question:

Question: What is the probability that we observe infinitely many Heads?

One expects (intuitively) that is zero or very less to say the least. Spoiler alert: The answer is that with probability one we observe infinitely many Heads!! We model the situation as follows and generalize it further.

Let $A_n$ denote the event that the $n$-th toss is a Head. If we observe infinitely many Heads, then all of the following should happen:

we should observe at least one Head after the first toss (i.e, the event $B_1 := \cup_{i=1}^{\infty} A_i$ should occur)
we should observe at least one Head after the first toss (i.e, the event $B_2 := \cup_{i=2}^{\infty} A_i$ should occur)

It means that the following event should happen $\cap_{j=1}^{\infty} B_j = \cap_{j=1}^{\infty}\cup_{i=j}^{\infty} A_i$. It is a if and only if condition. We use the following short hand to denote it: $A_n \text{ i.o}$.

We state our first result. The result is general, where events are not necessarily related to the coin toss example above.

Borel-Cantilli Lemma $1$: Suppose $A_1,A_2,\ldots, \in \mathcal F$ are such that $\sum_{j=1}^{\infty} \mathbb P(A_j) < \infty$, then $\mathbb P(A_n \text{ i.o.}) = 0$.

Note that the coin toss example don't satisfy the condition of the Lemma: $\sum_{j=1}^{\infty} \mathbb P(A_j) = \sum_{j=1}^{\infty} \frac{1}{n} = \infty$. On the other hand, if the bias of the coin is replaced with $1/n^2$, then the condition holds good, and we can use the result to conclude that we do not observe infinitely many Heads! Before proving the above lemma, I will state another related result.

Borel-Cantilli Lemma $2$: Suppose $A_1,A_2,\ldots, \in \mathcal F$ are independent events, and satisfy $\sum_{j=1}^{\infty} \mathbb P(A_j) = \infty$, then $\mathbb P(A_n \text{ i.o.}) = 1$.

Proof of Borel-Cantilli Lemma $1$

Consider the following

\begin{eqnarray} \Pr\{A_n \text{ i.o. }\} &=& \Pr\{\cap_{j=1}^{\infty}\cup_{i=j}^{\infty} A_i\}\\ &\stackrel{\text{why?}}{=}& \Pr\{\cup_{i=j}^{\infty} A_i\}\\ &\leq& \sum_{i=j}^{\infty} \Pr\{A_j\} \text{ for all $j$}\\ &\stackrel{\text{why?}}{\rightarrow}& 0 \end{eqnarray}

as $j \rightarrow \infty$. It was an easy proof.

Proof of Borel-Cantilli Lemma $2$

Consider the following (note the usage of the assumption in the Lemma below)

\begin{eqnarray} \Pr\{A_n \text{ i.o. }\} &=& 1 - \Pr\{\cup_{j=1}^{\infty}\cap_{i=j}^{\infty} A_i^c\}\\ &\stackrel{\text{why?}}{\geq}& 1 - \sum_{j=1}^{\infty}\Pr\{\cap_{i=j}^{\infty} A_i^c\}\\ &\stackrel{\text{why?}}{=}& 1 - \sum_{j=1}^{\infty}\Pi_{i=j}^{\infty}\Pr\{A_i^c\}\\ &{=}& 1 - \sum_{j=1}^{\infty} \exp\left\{-\sum_{i=j}^{\infty}\Pr\{A_i\}\right\}, \end{eqnarray}
where the last inequality follows from the fact that $1-x \leq e^{-x}$. Now, from the fact that $\sum_{i=1}^{\infty} \Pr\{A_i\} = \infty$, we have $\sum_{i=j}^{\infty} \Pr\{A_i\} = \infty$ for all $j$. Hence, the summand above is zero for all $j$. Hence, we have $\Pr\{A_n \text{ i.o. }\} \geq 1$; this completes the proof.

Applying the Borel-Cantelli Lemmas $1$ and $2$ to the problem of tossing, we get: $\Pr\{A_n \text{ i.o. }\} =0$ for $p=1/n^2$ and $\Pr\{A_n \text{ i.o. }\} = 1$ for $p=1/n$.

Notation: Often times, the notation $\lim \sup A_n$ (resp. $\lim \inf$) is used in place of $A_n \text{ i.o.}$ (resp. $A_n^C \text{ i.o}$).

Application in percolation theory: Need to fill in.

Problems

Let $A_1, \ldots, A_n$ be not necessarily disjoint events, and let $X$ be the number of events that occur. This is a random variable with $X(s) = \left| \{ i : s \in A_i \} \right| = \sum_{i=1}^{n} \mathbf{1}_{A_i}(s)$. Since expectation is a linear operator, we have $\mathbb{E}[X] = P(A_1) + P(A_2) + \cdots + P(A_n)$. Use this fact to answer the following questions:

Question 1: There are $m$ urns and $n$ balls. Each ball is dropped into an urn at random, independently of the other balls. (An urn may contain more than one ball. What is the expected number of urns that hold at least one ball?

Question 2: A seven-card stud poker hand is dealt from a well-shuffled standard deck. Let $X$ be the random variable that tells how many Clubs are in the hand. What is $\mathbb{E}[X]$?

Select $n$ points on a circle (not a disc) independently according to a uniform distribution. What is the probability that there is a semicircle containing all of them?

Let $S_1, S_2,\ldots$ be a sequence of collections of subsets of $\Omega$, such that $S_n\subset S_{n+1}$ for all n.
(a) Suppose that each $S_i$ is an algebra. Prove that $\cup_{i=1}^{\infty}S_i$ is also an algebra.
(b) Suppose that each $S_i$ is a $\sigma$-algebra. Show (by counterexample) that $\cup_{i=1}^{\infty}S_i$ might not be a σalgebra.

Expected Value

Let us turn our attention to finding the averages. Suppose you have a probability space $(\Omega,\mathcal F, \mathbb P)$, then often times the sample space contains non-numerical values such as Head and tail, for example. It is often times convenient to use numbers in place of these non-numerical outcomes. A natural way of doing it is to have a map $X: \Omega \rightarrow \mathbb R$. This was called a random variable in your undergraduate level courses. However, to be called a random variable, you should be able to find averages. What do you mean by averages? Towards answering this, let us start with a simple set of random variables of the form

\begin{equation} X(\omega) = \sum_{j} a_i \mathbb 1\{\omega \in A_i\}, \end{equation}

where $A_i \in \mathcal F$ form a partition of $\Omega$. The set of such functions are called simple functions. Examples include $X(\omega) = 1 \times \mathbb 1\{\omega = H\} + 0 \times \mathbb 1\{\omega = T\}$ in case of tossing a coin experiment.

**Definition (Expected value of simple functions): The expected value of simple random variable $X(\omega) = \sum_{j} a_i \mathbb 1\{\omega \in A_i\}$ is defined as

\begin{equation} \mathbb E X(\omega) = \sum_{i} a_i \mathbb \Pr\{\omega \in A_i\}. \end{equation}

In the above, $A_i \in \mathcal F$.

The above does not cover all possible random variables. However, it serves as a possible way to approximate the expected value of any random variable. Let us consider the set of all simple functions/random variables by

\begin{equation} F_\texttt{Sim} := \left\{X: X(\omega) = \sum_{i} a_i \mathbb{1}\{\omega \in A_i\} \text{ for some partition } A_i \in \mathcal F\right\} \end{equation}

First, let us state a few results.

Exercise: If $\mathbb E(X+Y) = \mathbb E X + \mathbb E Y$ for two simple random variables $X$ and $Y$.

Now, let us see how we can use simple function to approximate all functions (all? :grimacing:).

Definition: Let $X$ be any random varialbe. Then, its upper expected value is defined as

\begin{equation} \mathbb E(X_U) := \inf \left\{\mathbb E Y: X \leq Y, Y \in F_{\texttt{Sim}}\right\}. \end{equation}

Note that the above exists for all $X$. Similarly, let us define the lower expected value.

Definition: Let $X$ be any random varialbe. Then, its lower expected value is defined as

\begin{equation} \mathbb E({X_L}) := \sup \left\{\mathbb E Y: X \geq Y, Y \in F_{\texttt{Sim}}\right\}. \end{equation}

Definition: We say that the expected value of the random variable $X$ denoted $\mathbb E X$ exists if $\mathbb E X_U = \mathbb E X_L$, and is equal to one of them.

Often times, the expected value is written in the following form (meaning is same as above definition)

\begin{equation} \mathbb E X = \int_{\Omega} X ~dP. \end{equation}

Although the above is a definition of the expected value, the existence depends on the following definition of a random variable. In the class, a different motivation for the following definition was given. In fact, we used the partition of the $Y$-axis, and looked at the inverse map to define the integral (Lebesgue), where the following condition was natural. Here we state it without any further motivation. But, the following is an important!

Definition (random variable or measurable function): We say that $X:\Omega \rightarrow \mathbb R$ is a random variable if
\begin{equation} \{\omega \in \Omega: X(\omega) \leq a\} \in \mathcal F \text{ for all $a \in \mathbb R$}. \end{equation}

Exercise: Show that if $X$ and $Y$ are random variables, then the following are also random variables, a) $X+Y$, b) $\alpha X$, c) $X^2$, and (d) $X/Y$.

Now, consider a random variable $X$ for which the expected value exists. A curcial property $X$ satisfies is that for any $\epsilon > 0$, there exists simple functions say $X_u \geq X$ and $X_l \leq X$ such that
\begin{equation} \mathbb E X_u - \epsilon/2 \leq \mathbb E(X) \leq \mathbb E(X_l) + \epsilon/2. \end{equation}
It shows that the expected value of any random variable can be very well approximated by the expected value of a simple function.

Exercise: Show that the above inequality is true. Show that there exists a sequence of simple functions $X_n \in F_{sim}$ such that $X_n \rightarrow X$. Here the convergence is pointwise.

Suppose we have two random variable $X$ and $Y$ for which the respective expectation exists. Also, assume that $\mathbb E(X+Y)$ also exists. Then, we have
\begin{eqnarray} \mathbb E(X+Y) &=& \inf\left\{\mathbb E Z: X + Y \leq Z, Z \in F_{\texttt{simple}}\right\} \\ &\leq& \inf\left\{\mathbb E (Z_x + Z_y): X \leq Z_x, Y \leq Z_y, Z_x,Z_y \in F_{\texttt{simple}}\right\} \\ &= & \inf\left\{\mathbb E (Z_x) + \mathbb E (Z_y): X \leq Z_x, Y \leq Z_y, Z_x,Z_y \in F_{\texttt{simple}}\right\} \\ &\leq& \inf\left\{\mathbb E Z_x: X \leq Z_x, Z_x \in F_{\texttt{simple}}\right\} + \inf\left\{\mathbb E Z_y: Y \leq Z_y, Z_y \in F_{\texttt{simple}}\right\}\\ &=& \mathbb E X + \mathbb E Y. \end{eqnarray}

Exercise: Reason out why the above (in)equalities hold good.

Result $1$: The above implies that $\mathbb E(X+Y) \leq \mathbb E X + \mathbb E Y$.

Since $\mathbb E(X+Y)$ exists, we also have
\begin{eqnarray} \mathbb E(X+Y) &=& \sup\left\{\mathbb E Z: X + Y \geq Z, Z \in F_{\texttt{simple}}\right\} \\ &\geq& \sup\left\{\mathbb E (Z_x + Z_y): X \geq Z_x, Y \geq Z_y, Z_x,Z_y \in F_{\texttt{simple}}\right\} \\ &= & \sup\left\{\mathbb E (Z_x) + \mathbb E (Z_y): X \geq Z_x, Y \geq Z_y, Z_x,Z_y \in F_{\texttt{simple}}\right\} \\ &\geq& \sup\left\{\mathbb E Z_x: X \leq Z_x, Z_x \in F_{\texttt{simple}}\right\} + \inf\left\{\mathbb E Z_y: Y \leq Z_y, Z_y \in F_{\texttt{simple}}\right\}\\ &=& \mathbb E X + \mathbb E Y. \end{eqnarray}

Result $2$: The above implies that $\mathbb E(X+Y) \geq \mathbb E X + \mathbb E Y$.

Combining the above two results, we have $\mathbb E (X+Y) = \mathbb E X + \mathbb E Y$.

Exercise: Show that $\mathbb E (\alpha X) = \alpha \mathbb E X$.

Now, let us see how to compute expected value of various random variables. Suppose $\Omega = \{H,T\}$ with probabilities $p$ and $1-p$. Let the corresponding random varalge be $X(\{H\}) = 1$ and $X(\{T\}) = 0$. This random variable is called a Bernoulli random variable. Then
\begin{eqnarray} \mathbb E(X) &=& \int_{\Omega} X dP\\ &=& \int_{\omega \in \{H,T\}} X(\omega) dP\\ &=& X(\{H\}) \Pr\{H\} + X(\{T\}) \Pr\{T\}\\ &=& 1 \times p + 0 \times (1- p)\\ &=& p \end{eqnarray}

Suppose you map a random variable $X$ to say $g(X)$ for some measuable (why?) $g()$. Then, what is $\mathbb E (g(X))$? It turns out that $\mathbb E(g(X)) = \int_{\Omega} g(X) dP$. The way to show this is though the route of using simple functions.

There are a few interesting averages such as variance in particular, and moments in general. We call the following quantity the $k$-th central moment
\begin{equation} m_k := \mathbb E(X-\mathbb EX)^k \end{equation}
and $c_k := \mathbb E X^k$ simply the $k$-th moment.

Sequence of Random variables

Often times in applications, we observe a seqeunce of random variables $X_n$, and one would like to understand the limit properties of these quantities. In particular, we would be interested in understanding the meaning of $X_n \rightarrow X$ or $\mathbb E X_n \rightarrow \mathbb E X$. To set the premise, assume that $X_n \rightarrow X$ pointwise-see below for the preccise definition.

Definition (pointwise convergence): We say that $X_n$ converges pointwise $X$ if $X_n(\omega) \rightarrow X(\omega)$ for all $\omega \in \Omega$. The convergence is in the usual limit sense.

Suppose $X_n \rightarrow X$, then we ask the following question

Question: When $\mathbb E X_n \rightarrow \mathbb E X$, i.e., when can we interchange limit and expected value?

First, we prove the following simple one.

Theorem (Monotone Convergence Theorem (MCT)): Suppose $X_1 \leq X_2 \leq \ldots$ be a sequence of monotone random variables, and suppose $X_n \rightarrow X$, then $\lim_{n \rightarrow \infty} \mathbb E X_n = \mathbb E X$.
Proof: See here.

Now, we state another important result, which along with MCT lead to the answer to the question above.

Theorem (Fautou Lemma): Let $X_n$ be a sequence of random varibles. Then, $\mathbb E [\liminf X_n] \leq \liminf \mathbb E X_n$.
Proof: See here.

This leads to the Lebesgue dominated convergence theorem, whose proof can be found here.

Theorem (DCT): Let $X_n \rightarrow X$ and $X_n \leq Y$ for some integrable $Y$ for all $n$. Then, $\mathbb E X_n \rightarrow \mathbb E X$.

So far, we have been concentrating on the convergence of averages (after taking expectation). It is very demanding to assume that $X_n$ converges pointwise. We need to relax this. One way to relax this is to assume that the outcomes where the convergence happens pointwise has probability one; this leads to the following notion.

Defintion: We say that $X_n$ converges almost surely (a.s.) to $X$ denoted $X_n \stackrel{a.s}{\rightarrow} X$ if $\Pr\{X_n \rightarrow X\} = 1$.

Note: See here for all the proofs, exercise problems for the following results.

Lemma: If $X_n \stackrel{a.s}{\rightarrow} X$ and $Y_n \stackrel{a.s}{\rightarrow} Y$, then $X_n + Y_n \stackrel{a.s}{\rightarrow} X + Y$.

Exercise: Show that if $X_n \stackrel{a.s}{\rightarrow} X$ then $\alpha X_n \stackrel{a.s}{\rightarrow} \alpha X$ for any $\alpha$.

Although almost sure is a relaxation of pointwise, it is still demanding. One way to relax this further is to look at the following notion.

Defintion: We say that $X_n$ converges in probability to $X$ denoted $X_n \stackrel{p}{\rightarrow} X$ if $\Pr\{|X_n - X| > \epsilon \} \rightarrow 0$ for every $\epsilon > 0$.

Lemma: If $X_n \stackrel{p}{\rightarrow} X$ and $Y_n \stackrel{p}{\rightarrow} Y$, then $X_n + Y_n \stackrel{p}{\rightarrow} X + Y$.

Exercise: Show that if $X_n \stackrel{p}{\rightarrow} X$ then $\alpha X_n \stackrel{p}{\rightarrow} \alpha X$ for any $\alpha$.

There is one more notion of convergence.

Defintion: We say that $X_n$ converges in the mean square error (mse) to $X$ denoted $X_n \stackrel{mse}{\rightarrow} X$ if $\mathbb E|X_n - X|^2 \rightarrow 0$.

Exercise: Show that $X_n \stackrel{mse}{\rightarrow} X$ and $Y_n \stackrel{mse}{\rightarrow} Y$, then $X_n + Y_n \stackrel{mse}{\rightarrow} X + Y$.

Theorem: If $X_n \stackrel{a.s}{\rightarrow} X$ implies $X_n \stackrel{p}{\rightarrow} X$. Further, $X_n \stackrel{mse}{\rightarrow} X$ implies $X_n \stackrel{p}{\rightarrow} X$.

Problems

Problem-01

Let $A_1, \ldots, A_n$ be not necessarily disjoint events, and let $X$ be the number of events that occur. This is a random variable with $X(s) = \left| \{ i : s \in A_i \} \right| = \sum_{i=1}^{n} \mathbf{1}_{A_i}(s)$. Since expectation is a linear operator, we have $\mathbb{E}[X] = P(A_1) + P(A_2) + \cdots + P(A_n)$. Use this fact to answer the following questions:

Question 2: A seven-card stud poker hand is dealt from a well-shuffled standard deck. Let $X$ be the random variable that tells how many Clubs are in the hand. What is $\mathbb{E}[X]$?

Problem-02

Select $n$ points on a circle (not a disc) independently according to a uniform distribution. What is the probability that there is a semicircle containing all of them?

Problem-03

Let $S_1, S_2,\ldots$ be a sequence of collections of subsets of $\Omega$, such that $S_n\subset S_{n+1}$ for all n.
(a) Suppose that each $S_i$ is an algebra. Prove that $\cup_{i=1}^{\infty}S_i$ is also an algebra.
(b) Suppose that each $S_i$ is a $\sigma$-algebra. Show (by counterexample) that $\cup_{i=1}^{\infty}S_i$ might not be a σalgebra.

Problem-04

Question-01: Prove that if $\mathcal{F}_1$ and $\mathcal{F}_2$ are fields of subsets of $\Omega$, then $\mathcal{F}_1\cap\mathcal{F}_2$ is also field.

Question-02: Find two fields such that their union is not a field.

Question-03: Is $\mathcal{F}=\left\{A\subset\Omega:A \text{is a finite set} \right\}$ always a field?

Atom: The smallest (with respect to inclusion) non-empty events belonging to $\mathcal{F}$ are called atom.

Question-04: What are the atoms in the fields
$\mathcal{F}_1=\left\{\phi,\{1\},\{2\},\{1,2\},\{3,4\},\{2,3,4\},\{1,3,4\},\{1,2,3,4\}\right\}$,

$\mathcal{F}_2=\left\{\phi,(0,1),(0,\frac{2}{3}),[\frac{2}{3},1)\right\}$?

Question-05: Show that if $\mathcal{F}$ is a finite field, then every non-empty event $A\in \mathcal{F}$ contains an atom.

Question-06: Which of the following families of sets is a field:
$\mathcal{F}_1=\left\{\phi,\{a\},\{b\},\{a,c\},\{a,b,c\}\right\}$,
$\mathcal{F}_1=\left\{\phi,[1,4], (\frac{1}{4},1],[0,\frac{1}{4}],(0,\frac{1}{4}],(0,\frac{1}{4}],\{\frac{1}{4}\},[\frac{1}{4},1],[0,1]- \{\frac{1}{4}\}\right\}$,
$\mathcal{F}_1=\left\{\phi,\{1\},\{2\},\{3\},\{1,2\},\{3,4\},\{2,3,4\},\{1,3,4\}\right\}?$

Problem-05

Question-01: Find $\alpha$ such that $P$ is a finitely additive probability measure, where
$\Omega=\{1,2,3,4\}$
$\mathcal{F}=\left\{\phi,\Omega,\{1\},\{2,3\},\{4\},\{1,2,3\},\{2,3,4\}\{1,4\}\right\}$,
$P(\{1\})=1/4, P(\{2,3\})=1/2, P(\{4\})=\alpha.$
Compute $P(\{1,2,3\}), P(\{2,3,4\})$ and $P(\{1,4\})$.

Question-02: Prove that, if $A\subset B$ then $P(A)\leq P(B).$

Question-03: Show that \[P\left(\bigcup_{i=1}^nA_i\right)\leq\sum_{i=1}^nP(A_i).\]

Question-04: Suppose that $P(A)=1/2$ and $P(B)=2/3$. Show that $\frac{1}{6}\leq P(A\cap B)\leq \frac{1}{2}$. Give examples to show that the extreme values $\frac{1}{6}$ and $\frac{1}{2}$ can be attained.

Problem-06

Question-01: Let $\mathcal{F}$ be a $\sigma-$field. Demonstrate that if $A_1,A_2,\ldots \in \mathcal{F}$, then \[\bigcap_{i=1}^{\infty}\in\mathcal{F}.\]

Question-02: Let $\mathcal{F}$ be a $\sigma-$field on $\Omega=[0,1]$ such that $\left[\frac{1}{n+1},\frac{1}{n}\right]\in\mathcal{F}$ for $n=1,2,3\ldots$. Show that

(1). $\{0\}\in\mathcal{F}$,
(2). $\{\frac{1}{n}:n=2,3,4,\ldots\}\in\mathcal{F}$;
(3). $(\frac{1}{n},1]\in\mathcal{F}$ for all $n$,
(4). $(0,\frac{1}{n}]\in \mathcal{F}$ for all $n$.

Question-03: Let $\Omega$ and $\tilde{\Omega}$ be an arbitrary sets and let $X:\tilde{\Omega}\rightarrow\Omega$ be any function. Show that if $\mathcal{F}$ is a $\sigma-$field on $\Omega$, then $\tilde{\mathcal{F}}=\{X^{-1}(A):A\in\mathcal{F}\}$ is a $\sigma-$field of $\tilde{\Omega}.$

Question-04: Find two $\sigma-$fields such that their union is not a $\sigma-$field.

Question-05: Show that $\mathbf E f(X) \geq f(\mathbb E X)$, where $f()$ is a convex function.

Question-06: Show that if $\Pr\{|X_n - X| > \epsilon \text{ i.o.}\} = 0$ implies that $X_n$ converges to $X$ almost surely.

Question-07: Show that $\mathbb E(XY) \leq \sqrt{\mathbb E X^2 \times \mathbb E Y^2}$.

Problem-07

Question-01: If $X_1$ and $X_2$ are independent binomial random variables with respective parameters $(n_1, p)$ and $(n_2, p)$, calculate the conditional probability mass function of $X_1$ given that $X_1 + X_2 = m$.

Question-02: There are $n$ components. On a rainy day, component i will function with probability $p_i$ ; on a nonrainy day, component i will function with probability $q_i$ , for $i= 1, . . . ,n$. It will rain tomorrow with probability $\alpha$. Calculate the conditional expected number of components that function tomorrow, given that it rains.

Question-03: Suppose that the expected number of accidents per week at an industrial plant is four. Suppose also that the numbers of workers injured in each accident are independent random variables with a common mean of $2$. Assume also that the number of workers injured in each accident is independent of the number of accidents that occur. What is the expected number of injuries during a week?

Problem-08

Question-01: Let $X_1,X_2,\ldots$ be a sequence of IID Uniform(0,1) random variables. Define the sequence $Y_n$ as \[Y_n=\min(X_1,X_2,\ldots,X_n).\] Prove the following convergence results independently:
a) $Y_n\rightarrow 0$ in distribution.
b) $Y_n\rightarrow 0$ in probability.
c) $Y_n\rightarrow 0$ in r-th mean.
d) $Y_n\rightarrow 0$ almost surely.

Question-02: Let $X_n\sim N(0,1/n)$. Show that $X_n\rightarrow 0$ almost surely.

Question-03: Let $\{X_n, n=1,2,\ldots\}$ and $\{Y_n, n=1,2,\ldots\}$ be two sequences of ramdom variables, defined on the sample space $S$.
a) Suppose that we know $X_n\rightarrow X$ and $Y_n\rightarrow Y$ almost surely. Prove that $X_n+Y_n\rightarrow X+Y$ almost surely.
b) Suppose that we know $X_n\rightarrow X$ and $Y_n\rightarrow Y$ in probability. Prove that $X_n+Y_n\rightarrow X+Y$ in probability.

Question-04: Let $X_1,X_2,\ldots$ be independent random variables, where $X_n\sim Ber(\frac{n}{n+1})$ for $n=1,2,\ldots$. We define the sequence $\{X_n,n=2,3,4,\ldots\}$ as \[X_{n+1}=Y_1Y_2Y_3\ldots Y_n,\] for $n=1,2,3,\ldots$. Show that $X_n\rightarrow 0$ almost surely.

Question-05: Let $X_1,X_2,\ldots$ be a sequence of IID random variables with mean $0$. Show that the sequence of random variables defined by \[Y_n=\frac{X_1X_2+X_2X_3+X_3X_4+\ldots+X_{n-1}X_n}{n}\rightarrow 0\] almost surely.

Problem-09

Question-01: Consider the Markov chain shown in the figure below:

a) Find $P(X_4=3|X_3=2).$
b) Find $P(X_3=1|X_2=1).$
c) If we know $P(X_0=1)=1/3$, find $P(X_0=1,X_1=2).$
d) If we know $P(X_0=1)=1/3$, find $P(X_0=1,X_1=2,X_2=3).$

Question-02: Show that in a finite Markov chain, there is at least one recurrent class.

Question-03: Consider the Markov chain in the figure below:

a) Is $Class 1=\{state 1, state 2\}$ aperiodic?
b) Is $Class 2=\{state 3, state 4\}$ aperiodic?
c) Is $Class 3=\{state 6, state 7, state 8\}$ aperiodic?

[Crude Draft] \(EE629\): Probability Models and Application Course @ IIT Dharwad

Commenting

Logistics

References

Announcements

Introduction

Preliminary

Intorduction to Probability Measure

Proof of Borel-Cantilli Lemma \(1\)

Proof of Borel-Cantilli Lemma \(2\)

Problems

Expected Value

Sequence of Random variables

Problems

Problem-01

Problem-02

Problem-03

Problem-04

Problem-05

Problem-06

Problem-07

Problem-08

Problem-09