# 🌞 Week 5
## Notes on <u>Eigenvalues and Eigenvectors</u>
> **Prepared by:** _Roshan Naidu_
> **Instructor:** _Professor Leon Johnson_
> **Course:** _INFO H611_
> [!NOTE]
> Inline math must be wrapped within `$ ... $` and display math within `$$ ... $$` in HackMD.
>
> Switching to **Preview Mode** (👁️) renders everything here so it displays properly.
## Conventions and Definitions used throughout the writeup
- Vectors are bold lowercase: $\mathbf{u},\mathbf{v}\in\mathbb{R}^n$
- Scalars (real numbers) are plain: $a,b,c\in\mathbb{R}$
- Matrices are uppercase: $A,B\in\mathbb{R}^{m\times n}$
- Norm / Magnitude of a vector: $\|\mathbf{v}\|$
- Dot product: $\mathbf{u}\cdot\mathbf{v}$
- Cross product: $\mathbf{u}\times \mathbf{v}$
- Unit vector: $\hat{\mathbf{v}}=\frac{\mathbf{v}}{\|\mathbf{v}\|}$
---
## Q.1. Since all the eigenvalues of $T$ are at most $1$, $T^k$ approaches the **0-matrix** as $k$ gets larger and larger. Therefore, eventually, we will reach a steady state ‘final’ genre.
I said: **False**.
Correct: **False**.
### Why I thought False
I recognized that for a **Markov transition matrix** $T$, the eigenvalues satisfy $|\lambda_i| \le 1$, but there is **always at least one eigenvalue equal to $1$**.
That means $T^k$ cannot approach the zero matrix, since the component corresponding to $\lambda = 1$ remains unchanged as $k$ increases.
Hence, $T^k$ will stabilize, not vanish.
### Why it was False
For a stochastic matrix $T$:
- The largest eigenvalue is $\lambda_1 = 1$
- The eigenvector corresponding to $\lambda_1 = 1$ represents the **steady state distribution** $\pi$, satisfying:
$$
T {\pi} = {\pi}
$$
- When we take high powers of $T$, the effect of eigenvalues with $|\lambda_i| < 1$ fades away, but the $\lambda_1 = 1$ term persists.
Thus,
$$
\lim_{k \to \infty} T^k =
\begin{bmatrix}
\pi & \pi & \cdots & \pi
\end{bmatrix}
$$
which is a **rank-1 matrix** whose columns all equal the steady-state vector $\pi$, not the zero matrix.
Therefore, $T^k$ converges to a **steady-state projection matrix**, not to $\mathbf{0}$.
### Examples
- **Theoretical:**
$$
T =
\begin{bmatrix}
0.9 & 0.5 \\
0.1 & 0.5
\end{bmatrix}
$$
is column-stochastic (each column sums to 1) and has eigenvalues $\lambda_1 = 1$, $\lambda_2 = 0.4$.
Then,
$$
T^k \to
\begin{bmatrix}
0.8333 & 0.8333 \\
0.1667 & 0.1667
\end{bmatrix}
\text{ as } k \to \infty
$$
This is not a zero matrix — it represents convergence to a **steady-state distribution**.
- **Real-world:**
In the **music genre transition** model (Week 5 notebook), each entry of $T$ shows the probability of moving from one genre to another.
As $k$ grows, repeated transitions reflect long-run listener habits.
The system stabilizes into a **steady-state vector** showing the long-term proportion of time spent on each genre; it never collapses to zero.
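As a quick numerical check of the $2\times 2$ example above, here is a minimal NumPy sketch (the matrix values come from that example; the printed values are approximate):

```python
import numpy as np

# Column-stochastic transition matrix from the theoretical example above
T = np.array([[0.9, 0.5],
              [0.1, 0.5]])

# High powers of T converge to a rank-1 matrix, not to the zero matrix
Tk = np.linalg.matrix_power(T, 50)
print(Tk)   # every column is approximately [0.8333, 0.1667]

# The steady-state vector is the eigenvector for eigenvalue 1, rescaled to sum to 1
eigvals, eigvecs = np.linalg.eig(T)
v1 = eigvecs[:, np.argmax(np.isclose(eigvals, 1.0))].real
pi = v1 / v1.sum()
print(pi)   # approximately [0.8333, 0.1667]
```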
> [!Important]
> For stochastic matrices, $T^k$ does **not** approach the zero matrix.
> It converges to a **steady-state matrix** determined by the eigenvalue $\lambda = 1$ and its eigenvector ${\pi}$.
---
## Q.2. When passing in probabilities, the values in $\mathbf{u}_k$ will always sum to 1. So, the $j$-th element of $\mathbf{u}_k$ corresponds to the probability our $k$-th song will be of genre $j$.
I answered: **True**.
Correct answer: **True**.
### Why I thought it was True
In the notebook, $\mathbf{u}_k$ represents a **probability vector** over all genres for the $k$-th song. Since probabilities must collectively represent a complete distribution, their total must sum to 1. Each entry of $\mathbf{u}_k$ (that is, $u_{k,j}$) gives the probability that the $k$-th song belongs to genre $j$.
This interpretation aligns with how transition and probability vectors work in Markov models and stochastic systems.
### Why it was True (theoretically)
A probability vector is defined as
$$
\mathbf{u}_k = [u_{k,1}, u_{k,2}, \dots, u_{k,n}],
$$
where each component satisfies
$$
0 \le u_{k,j} \le 1,
$$
and the total probability mass equals 1:
$$
\sum_{j=1}^{n} u_{k,j} = 1.
$$
This ensures $\mathbf{u}_k$ lies in the **probability simplex**, meaning it encodes a valid probability distribution over all $n$ genres.
Therefore, the $j$-th element corresponds directly to the probability that the $k$-th song will be of genre $j$.
### Examples
**Theoretical:**
Suppose there are 3 genres: Rock, Pop, and Jazz.
If
$$
\mathbf{u}_k =
\begin{bmatrix}
0.2 \\
0.5 \\
0.3
\end{bmatrix}
$$
then
$$
\sum_j u_{k,j} = 0.2 + 0.5 + 0.3 = 1.
$$
Here $u_{k,1}=0.2$ means the $k$-th song has a 20% chance of being Rock, $u_{k,2}=0.5$ means a 50% chance of being Pop, and so on.
**Real-world:**
In the Week 5 notebook’s **music genre transition model**, $\mathbf{u}_k$ represents the **current probability distribution** of a listener’s next song genre.
For example, if
$$
\mathbf{u}_k = [0.6,\, 0.3,\, 0.1]
$$
it means there’s a 60% chance the next song will be Rock, 30% Pop, and 10% Jazz. Since $\mathbf{u}_k$ is always normalized (its entries sum to 1), it accurately reflects the model’s probabilistic prediction for the next genre.
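A small sanity check of this property (a minimal sketch; the 3-genre transition matrix below is made up for illustration, not taken from the notebook):

```python
import numpy as np

# Hypothetical column-stochastic transition matrix over (Rock, Pop, Jazz)
T = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])   # each column sums to 1

u_k = np.array([0.6, 0.3, 0.1])   # probability vector for the k-th song
print(u_k.sum())                  # 1.0

u_next = T @ u_k                  # distribution over genres for the next song
print(u_next, u_next.sum())       # entries stay nonnegative and still sum to 1
```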
> [!Important]
> Because $\mathbf{u}_k$ represents a valid probability distribution, its elements must always sum to 1, and each entry gives the probability of a specific genre.
---
## Q.3. It is possible to generate $\mathbf{u}_k$ using *only* a linear combination of the columns of $S$.
I said: **True**.
Correct: **True**.
### Why I thought it was True
From the concept of eigen-decomposition, any vector in the same vector space can be expressed as a **linear combination of the eigenvectors** of a matrix, provided the matrix is diagonalizable so that its eigenvectors form a basis.
Since the columns of $S$ are exactly those eigenvectors (and $S$ is invertible), $\mathbf{u}_k$ can indeed be written as a combination of those columns.
Thus, $\mathbf{u}_k$ exists entirely within the **column space** of $S$.
### Why it was True (theoretically)
Given the decomposition
$$
A = S \Lambda S^{-1},
$$
the columns of $S$ are linearly independent (since $S$ is invertible) and therefore form a **basis** for the whole space.
Therefore, any vector — including $\mathbf{u}_k$ — can be represented as a **linear combination** of these columns.
Formally, there exists some coefficient vector $\mathbf{c}$ such that:
$$
\mathbf{u}_k = S \mathbf{c}
$$
where $\mathbf{c} = S^{-1}\mathbf{u}_k$ contains the weights of the linear combination of the columns of $S$.
This means the columns of $S$ span the entire space in which $\mathbf{u}_k$ lies.
### Examples
**Theoretical:**
Let
$$
S =
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
$$
and suppose
$$
\mathbf{u}_k =
\begin{bmatrix}
2 \\
3
\end{bmatrix}
$$
Then
$$
\mathbf{u}_k = 2
\begin{bmatrix}
1 \\
0
\end{bmatrix}+ 3
\begin{bmatrix}
0 \\
1
\end{bmatrix}
= S
\begin{bmatrix}
2 \\
3
\end{bmatrix}
$$
Hence, $\mathbf{u}_k$ is a linear combination of the columns of $S$.
**Real-world example:**
In the **music genre transition** model from the Week 5 notebook, $S$ contains the eigenvectors that define the fundamental "modes" of listener transitions.
Any observed user pattern $\mathbf{u}_k$ (for example, a listener’s probability vector over genres) can be expressed as a combination of these base patterns.
Each column of $S$ represents a pure “behavioral mode,” and $\mathbf{u}_k$ mixes them to represent real listener dynamics.
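A minimal NumPy sketch of this idea (the matrix $S$ and the vector $\mathbf{u}_k$ below are made up for illustration): solving $S\mathbf{c} = \mathbf{u}_k$ recovers the weights of the linear combination.

```python
import numpy as np

# Hypothetical basis matrix S (columns are eigenvectors) and state vector u_k
S = np.array([[1.0,  1.0],
              [1.0, -1.0]])
u_k = np.array([0.7, 0.3])

# Coefficients of the linear combination: c = S^{-1} u_k
c = np.linalg.solve(S, u_k)
print(c)        # [0.5, 0.2]
print(S @ c)    # reproduces u_k: [0.7, 0.3]
```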
> [!Important]
> Because the columns of $S$ form a **basis**, any vector $\mathbf{u}_k$ — including state or behavior vectors — can be expressed as a **linear combination** of those columns.
---
## Q.4. The vector $\dfrac{\mathbf{v}_1}{\text{sum}(\mathbf{v}_1)}$ represents a steady state distribution for $T$, whose $j$-th element represents the expected probability of obtaining the $j$-th state.
I answered: **True**.
Correct answer: **True**.
### Why I thought it was True
I knew that the **dominant eigenvector** of a stochastic (Markov) transition matrix $T$, corresponding to the eigenvalue $\lambda = 1$, represents the **steady-state distribution** of the system.
Normalizing it by dividing by the sum of its elements ensures that all probabilities add up to $1$.
Therefore, $\dfrac{\mathbf{v}_1}{\text{sum}(\mathbf{v}_1)}$ gives the long-term probability of being in each state.
### Why it was True (theoretically)
For any **Markov transition matrix** $T$:
- There exists an eigenvector $\mathbf{v}_1$ corresponding to $\lambda_1 = 1$ such that:
$$
T \mathbf{v}_1 = \mathbf{v}_1
$$
- This eigenvector represents the **steady-state** or **stationary distribution**, but its entries are not necessarily normalized.
To interpret it as a probability vector, we normalize it:
$$
{\pi} = \frac{\mathbf{v}_1}{\sum(\mathbf{v}_1)}
$$
so that
$$
\sum_i \pi_i = 1
$$
Each component $\pi_j$ then gives the **expected long-run probability** of being in the $j$-th state.
### Examples
**Theoretical:**
Consider
$$
T =
\begin{bmatrix}
0.7 & 0.2 \\
0.3 & 0.8
\end{bmatrix}
$$
This matrix is column-stochastic (each column sums to 1), with eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 0.5$.
The eigenvector corresponding to $\lambda_1 = 1$ is $\mathbf{v}_1 = [4, 6]^T$.
Normalizing gives
$$
{\pi} = \frac{1}{10}[4, 6]^T = [0.4, 0.6]^T
$$
which represents the steady-state distribution — i.e., the long-run proportion of time spent in each state.
**Real-world:**
In the **music genre transition** model from the Week 5 notebook, $T$ captures the probability of switching between genres.
The normalized dominant eigenvector $\dfrac{\mathbf{v}_1}{\text{sum}(\mathbf{v}_1)}$ represents the **steady-state listening distribution** — how a listener’s genre preferences stabilize after many transitions.
For instance, it might show that over time, a user spends $60\%$ of their listening time on pop and $40\%$ on rock.
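A minimal NumPy sketch of this normalization step, using the $2\times 2$ matrix from the theoretical example above:

```python
import numpy as np

# Column-stochastic matrix from the theoretical example above
T = np.array([[0.7, 0.2],
              [0.3, 0.8]])

eigvals, eigvecs = np.linalg.eig(T)
idx = np.argmax(np.isclose(eigvals, 1.0))   # locate the eigenvalue lambda = 1
v1 = eigvecs[:, idx].real

pi = v1 / v1.sum()               # normalize so the entries sum to 1
print(pi)                        # approximately [0.4, 0.6]
print(np.allclose(T @ pi, pi))   # True: T pi = pi
```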
> [!Important]
> The vector $\dfrac{\mathbf{v}_1}{\text{sum}(\mathbf{v}_1)}$ correctly represents the steady-state distribution of a Markov matrix $T$.
> Each entry corresponds to the **long-run expected probability** of being in that state.
---
## Q.5. This matrix $D$ can be used as a Markov transition matrix.
I answered: **False**.
Correct answer: **False**.
### Why I thought it was False
I recognized that to be a Markov (transition) matrix, a matrix must satisfy specific constraints (nonnegativity, and column or row sums equal to 1 depending on convention). The matrix $D$ in the notebook looked like a raw co-occurrence / count / data matrix rather than a normalized probability transition matrix, so it likely violates those constraints. Therefore it cannot be directly used as a Markov transition matrix.
### Why it was False (theoretically)
A matrix $M$ is a valid Markov transition matrix only if **all** of the following hold (convention: columns sum to 1 if you apply $M$ to column state vectors; some texts instead require rows to sum to 1):
1. **Nonnegativity:** $M_{ij} \ge 0$ for every entry.
2. **Normalization:** each column (or row, per the chosen convention) sums to $1$: $\sum_i M_{ij} = 1$ for all $j$.
3. **Interpretation:** each column (or row) represents a probability distribution over next states.
If $D$ contains negative entries, zero columns that break normalization, or columns/rows that do not sum to 1, it is **not** a Markov matrix. Raw data matrices (counts, scores, term frequencies, similarity measures) typically fail one or both of these conditions and must be converted (e.g., made nonnegative and renormalized) before being used as a transition matrix.
**Therefore**, $D$ cannot be used *as is* as a Markov transition matrix; it needs preprocessing (a nonnegativity check and normalization).
### Examples
**Theoretical:**
$$
D = \begin{bmatrix}
2 & -1 \\
3 & 4
\end{bmatrix}
$$
- Entry $D_{1,2}=-1$ is negative → cannot represent probabilities.
- Column sums: the first column sums to $5$, the second to $3$ (not 1).
So $D$ is not a Markov matrix.
**How to fix:** make all entries nonnegative and normalize columns (or rows), e.g.
1. replace negatives if they are measurement artifacts
2. then divide each column by its column-sum:
$$
\tilde{T}_{:,j} = \frac{D_{:,j}}{\sum_i D_{i,j}},
$$
which yields a valid column-stochastic transition matrix (provided all entries are nonnegative).
**Real-world:**
In the Week 5 notebook you build a transition matrix `df_T` from a listening history using `pd.crosstab(..., normalize='columns')`. Suppose instead you used a raw co-occurrence matrix of genres (counts of transitions) and forgot to normalize:
- A raw co-occurrence matrix $D$ (counts) might be:
$$
D = \begin{bmatrix}
15 & 2 & 0 \\
5 & 10 & 1 \\
0 & 3 & 12
\end{bmatrix}
$$
Columns are counts, not probabilities → columns do not sum to 1.
- Using this $D$ directly as a transition matrix would treat counts as probabilities — incorrect.
**Correct approach used in the notebook:** normalize columns (or rows) to convert counts to probabilities so that each column sums to 1 (or each row does, depending on your convention). After normalization you obtain a valid transition matrix `T = df_T` which can be used for Markov analysis, eigen-decomposition, and steady-state calculations.
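A minimal pandas sketch of that normalization, using the count matrix from the example above (the genre labels are added only for illustration); the notebook itself achieves the same effect in one step with `pd.crosstab(..., normalize='columns')`:

```python
import pandas as pd

# Raw transition counts from the example above (hypothetical genre labels)
genres = ["rock", "pop", "jazz"]
D = pd.DataFrame([[15,  2,  0],
                  [ 5, 10,  1],
                  [ 0,  3, 12]], index=genres, columns=genres)

print(D.sum(axis=0))    # column sums are 20, 15, 13 -- not probabilities yet

# Divide each column by its sum to obtain a column-stochastic transition matrix
T = D / D.sum(axis=0)
print(T.sum(axis=0))    # every column now sums to 1.0
```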
> [!Tip]
> Always verify two things before calling a matrix "Markov":
> 1. All entries are $\ge 0$.
> 2. Each probability vector (row or column, per your convention) sums to exactly $1$.
> If either fails, preprocess (clip/shift if justified) and normalize the matrix first.
---
## Q.6. There are no complex eigenvalues of $D$, i.e., all eigenvalues of $D$ are real numbers.
I answered: **True**.
Correct answer: **True**.
### Why I thought it was True
From the notebook output, all the eigenvalues listed in the diagonal matrix $D$ were real numbers with no imaginary components.
Since $D$ came from diagonalizing a real and symmetric matrix (as seen in the eigen decomposition section), I expected all eigenvalues to be real.
Symmetric matrices always produce real eigenvalues, so the result matched my intuition.
### Why it was True (theoretically)
For any **real symmetric matrix** $A$, we have
$$
A = A^T
$$
and it can be diagonalized as
$$
A = S D S^T
$$
where $S$ is an orthogonal matrix whose columns are the eigenvectors of $A$, and $D$ is a diagonal matrix whose entries are the eigenvalues of $A$.
A fundamental property of symmetric matrices is that **all their eigenvalues are real**.
Thus, every diagonal entry of $D$ (i.e., every eigenvalue of $A$) must be a real number which means $D$ has no complex values.
### Examples
**Theoretical:**
Let
$$
A =
\begin{bmatrix}
2 & 1 \\
1 & 2
\end{bmatrix}
$$
Then the characteristic polynomial is
$$
\det(A - \lambda I) = (2 - \lambda)^2 - 1 = 0
$$
which gives eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 1$.
Both are real, so
$$
D =
\begin{bmatrix}
3 & 0 \\
0 & 1
\end{bmatrix}
$$
**Real-world:**
In the Week 5 notebook, the matrix $D$ (obtained from diagonalizing the transition matrix $T$) showed eigenvalues such as
$$
1.0, \ 0.43, \ 0.27, \ 0.11, \ldots
$$
all of which are real numbers.
This is consistent with the underlying matrix being real and symmetric, a property that guarantees real eigenvalues.
Such behavior appears in many real systems like recommendation models or network diffusion models where the matrices are designed to preserve real-valued eigenstructure.
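A minimal NumPy check using the symmetric $2\times 2$ example above:

```python
import numpy as np

# Real symmetric matrix from the theoretical example above
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals = np.linalg.eigvals(A)
print(eigvals)                      # 3 and 1 (order may vary), with no imaginary parts
print(np.all(np.isreal(eigvals)))   # True

# eigvalsh is the routine intended for symmetric/Hermitian matrices:
# it returns real eigenvalues by construction, sorted in ascending order
print(np.linalg.eigvalsh(A))        # [1. 3.]
```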
> [!Note]
> Because the matrix is symmetric and real, no complex eigenvalues can appear. All eigenvalues of $D$ are purely real.
---
## Q.7. The sum of the eigenvalues of a distance matrix (like our Jaccard Distance Matrix, $D$) cannot be anything other than $0$
I answered: **True**.
Correct answer: **True**.
### Why I thought it was True
The sum of the eigenvalues of a matrix is equal to its trace. For a distance matrix, the diagonal elements are always zero, which means the trace is zero. Hence, I concluded that the sum of the eigenvalues must also be zero.
### Why it was True (theoretically)
This is indeed True because for any square matrix $D$, the sum of its eigenvalues equals its trace:
$$
\sum_i \lambda_i = \text{trace}(D)
$$
For a distance matrix such as the Jaccard Distance Matrix, $D_{ii} = 0$ for all $i$, since the distance from any point to itself is zero. Thus,
$$
\text{trace}(D) = \sum_i D_{ii} = 0
$$
which implies that the sum of all eigenvalues of $D$ must also be zero.
### Examples
**Theoretical:**
In linear algebra, any symmetric distance matrix derived from a metric space has zeros on its diagonal. Its trace is therefore zero, and since the sum of the eigenvalues equals the trace, the eigenvalues also sum to zero.
**Real-world:**
In recommendation systems (e.g., music or movie similarity), distance matrices like the Jaccard Distance Matrix are used to represent dissimilarity between items. Since the diagonal entries (self-distances) are zero, the trace and the sum of eigenvalues are both zero.
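A minimal NumPy check with a small made-up symmetric distance matrix (zeros on the diagonal; the off-diagonal values are illustrative, not from the notebook):

```python
import numpy as np

# Hypothetical symmetric distance matrix with a zero diagonal
D = np.array([[0.0, 0.4, 0.8],
              [0.4, 0.0, 0.6],
              [0.8, 0.6, 0.0]])

eigvals = np.linalg.eigvals(D)
print(np.trace(D))      # 0.0
print(eigvals.sum())    # ~0.0 (equals the trace, up to floating-point error)
```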
---
## Q.8. According to our matrix $D$, rock music and hip hop are very similar when it comes to the kinds of genres that follow.
I answered: **True**.
Correct answer: **False**.
### Why I thought it was True
Looking at the **Jaccard Distance Matrix** heatmap, I initially thought that the *rock* and *hip hop* genres were similar due to their proximity in the matrix. There seemed to be a relatively high value in their respective cells, suggesting that these two genres are likely to share similar subsequent genres, which in turn made me believe they are similar in their transitions.
### Why it was False (theoretically)
On further examination, the Jaccard distance value between *rock* and *hip hop* is $0.83$, which is actually quite high. A Jaccard distance close to 1 implies dissimilarity, not similarity. Conversely, a low Jaccard distance (close to 0) means the sets of genres that typically follow the two genres overlap heavily, indicating that they are similar in the kinds of genres that follow.
In this case, since $0.83$ is quite high, the matrix actually shows that rock and hip hop are **not** similar in terms of genre transitions. They do not share a significant overlap in the subsequent genres listeners typically choose after listening to these genres.
### Examples
**Theoretical:**
Consider the **Jaccard distance**, which is defined in terms of the Jaccard similarity of two sets:
$$
d_J(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|}
$$
Where:
- $A$ is the set of genres that follow rock music,
- $B$ is the set of genres that follow hip hop.
The high Jaccard distance of $0.83$ suggests that these sets have very little overlap. This indicates that the genres following rock music are vastly different from those following hip hop music.
**Real-world:**
In real-world music behavior, someone who listens to rock music might typically move to genres like pop or jazz. However, someone listening to hip hop might be more likely to transition to genres like pop or rap. The genre transition patterns of these two types of music are very different, reflecting a higher Jaccard distance. Thus, these genres are not as similar in genre transitions as initially believed.
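A minimal Python sketch of the calculation (the two follow-up sets below are made up for illustration, not taken from the notebook):

```python
# Hypothetical sets of genres that follow rock and hip hop in a listening history
follows_rock = {"pop", "jazz", "metal", "indie"}
follows_hiphop = {"pop", "rap", "rnb", "electronic", "trap"}

intersection = len(follows_rock & follows_hiphop)   # follow-up genres they share
union = len(follows_rock | follows_hiphop)          # all follow-up genres observed

jaccard_distance = 1 - intersection / union
print(jaccard_distance)   # 0.875 here: little overlap, so the two genres are not similar
```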
> [!Tip]
> A high Jaccard distance indicates dissimilarity, so rock and hip hop are not similar when it comes to the kinds of genres that follow.
## Optional LaTeX Practice
Some useful math symbols:
$\alpha,\beta,\gamma,\pi,\infty$
$\nabla, \int_0^1 f(x)\,dx, \sum_{i=1}^n i^2$
$\lim_{x\to 0}\frac{\sin x}{x} = 1$
Actions:
Fractions: $\tfrac{1}{2}$
Summation: $\sum_{i=1}^n i^2$
Integral: $\int_0^1 x^2\,dx = \tfrac{1}{3}$
Determinant: $\det(A)$
Inverse: $A^{-1}$
---
## 🌕 Final Reflection
In this week’s exercises, I deepened my understanding of eigenvalues, eigenvectors, and their role in analyzing matrix behavior—especially in the context of Markov transitions and distance matrices. The relationship between eigenvalues and the trace of a matrix helped clarify how the structure of a matrix directly influences its properties, such as stability and steady states. I also learned how concepts like the Jaccard Distance Matrix apply to real-world problems, such as measuring dissimilarity in recommendation systems or clustering data.
Overall, this week reinforced how linear algebra provides the mathematical foundation for understanding transitions, convergence, and relationships in large-scale data systems, bridging the gap between theoretical mathematics and real-world applications.