# PageRank
### Big Idea
The *PageRank vector* is the dominant eigenvector (the steady state) of the *Google matrix*, a specially constructed stochastic matrix derived from a directed graph of webpages. The components of this eigenvector provide a quantitative measure of the *importance* of each corresponding vertex (webpage) in the graph.
## Directed Graph and Stochastic Matrix
A collection of webpages and the links between them can be modeled as a *directed graph (DG)* with $N$ vertices (webpages) and edges (links).
* **Adjacency Matrix $A$:** An $N \times N$ matrix where $a_{i,j}=1$ if there is a link from webpage $j$ to webpage $i$, and $a_{i,j}=0$ otherwise.
* **Stochastic Matrix $P$:** Represents the probability of a random click moving from one page to another. It is constructed from the adjacency matrix by normalizing each column: $$p_{i,j} = \frac{a_{i,j}}{\text{total # of links from webpage } j}$$ The entry $p_{i,j}$ is the probability of clicking to webpage $i$ from webpage $j$.
:::warning
**Example**
Consider the directed graph on four vertices with edges $1\to 2$, $2\to 1$, $2\to 4$, $3\to 1$, $3\to 2$, $3\to 4$, and $4\to 2$.

For the given graph, the corresponding adjacency matrix $A$ and the stochastic matrix $P$ are:
$$
A = \begin{bmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \end{bmatrix}
\hspace{10mm}
P = \begin{bmatrix} 0 & 1/2 & 1/3 & 0 \\ 1 & 0 & 1/3 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 1/2 & 1/3 & 0 \end{bmatrix}
$$
:::
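As a sanity check, here is a minimal NumPy sketch that rebuilds $A$ and $P$ for this example from its edge list (read off the adjacency matrix above; the variable names are illustrative):
```python
import numpy as np

# Edge list read off the adjacency matrix above: (j, i) means webpage j links to webpage i.
edges = [(1, 2), (2, 1), (2, 4), (3, 1), (3, 2), (3, 4), (4, 2)]

N = 4
A = np.zeros((N, N))
for j, i in edges:
    A[i - 1, j - 1] = 1              # a_{i,j} = 1 when j links to i

# Divide each column by the number of outgoing links from webpage j
# (this assumes every page has at least one outgoing link, as in this example).
P = A / A.sum(axis=0)

print(A)
print(P)
print(P.sum(axis=0))                 # each column sums to 1
```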
### Steady State
The fundamental question is whether the iterative process of random clicks converges to a stable probability distribution, $\mathbf{x} = \lim_{n \to \infty} P^n \mathbf{x}_0$, where $\mathbf{x}_0$ is any initial probability distribution over the webpages.
* If the limit $\mathbf{x}$ exists, it satisfies the equation $P\mathbf{x} = \mathbf{x}$.
* This means $\mathbf{x}$ is an eigenvector of $P$ corresponding to the eigenvalue $\lambda=1$, called the **steady state** vector.
#### Properties of Stochastic Matrices
1. **Eigenvalue $\lambda=1$:** Every stochastic matrix $P$ has the eigenvalue $\lambda=1$: since each column of $P$ sums to $1$, its transpose satisfies $P^T\mathbf{1} = \mathbf{1}$ (where $\mathbf{1}$ is the vector of all ones), so $\lambda=1$ is an eigenvalue of $P^T$, and $P$ and $P^T$ have the same eigenvalues.
2. **Bound on Eigenvalues:** All other eigenvalues $\lambda$ of $P$ satisfy $|\lambda| \leq 1$.
3. **Dominant Eigenvalue:** Convergence to a unique steady state requires $\lambda=1$ to be the dominant eigenvalue ($|\lambda_1| > |\lambda_j|$ for all $j \neq 1$). The *Perron–Frobenius Theorem* guarantees that if $P^k$ has all positive entries for some $k \ge 1$, then $\lambda_1 = 1$ is the single dominant eigenvalue, ensuring a unique, positive steady state.
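These properties are straightforward to verify numerically. A minimal sketch, using a randomly generated column-stochastic matrix rather than one from the text:
```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random column-stochastic matrix: nonnegative entries, columns summing to 1.
M = rng.random((5, 5))
P = M / M.sum(axis=0)

# Property 1: P^T has the all-ones vector as an eigenvector with eigenvalue 1.
ones = np.ones(5)
print(np.allclose(P.T @ ones, ones))            # True

# Properties 2 and 3: every eigenvalue satisfies |lambda| <= 1, and lambda = 1 appears.
eigvals = np.linalg.eigvals(P)
print(np.max(np.abs(eigvals)) <= 1 + 1e-12)     # True
print(np.any(np.isclose(eigvals, 1)))           # True
```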
:::warning
**Example (Convergence to Steady State)**
1. **Non-Dominant Eigenvalue Example (Failure to Converge)** Consider the stochastic matrix $P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
* The characteristic polynomial is $C_P(\lambda) = \lambda^2 - 1$, yielding eigenvalues $\lambda_1 = 1$ and $\lambda_2 = -1$.
* Since $|\lambda_1| = |\lambda_2|$, there is *no dominant eigenvalue*.
* Applying the power iteration shows non-convergence to a single vector:$$P^n \begin{bmatrix} a \\ b \end{bmatrix} = \begin{cases} \begin{bmatrix} a \\ b \end{bmatrix}, & n \text{ even} \\ \begin{bmatrix} b \\ a \end{bmatrix}, & n \text{ odd} \end{cases}$$
* The limit $\displaystyle \lim_{n \to \infty} P^n \mathbf{x}_0$ does not exist unless the starting vector is already the steady state ($a=b$).
2. **Dominant Eigenvalue Example (Guaranteed Convergence)** Consider the stochastic matrix $P = \begin{bmatrix} \frac{3}{4} & 0 & 0 \\ \frac{1}{8} & \frac{3}{4} & \frac{1}{4} \\ \frac{1}{8} & \frac{1}{4} & \frac{3}{4} \end{bmatrix}$.
* The eigenvalues are $\lambda_1 = 1$, $\lambda_2 = \frac{3}{4}$, and $\lambda_3 = \frac{1}{2}$.
* Since $|\lambda_1| = 1$ and all other eigenvalues satisfy $|\lambda_j| < 1$, $\lambda_1$ is the *dominant eigenvalue*.
* The corresponding eigenvector for $\lambda_1=1$ is $\mathbf{v}_1 = \begin{bmatrix} 0 \\ 1/2 \\ 1/2 \end{bmatrix}$.
* Any initial probability vector $\mathbf{x}_0$ can be written as a linear combination of eigenvectors, $\mathbf{x}_0 = \mathbf{v}_1 + c_2 \mathbf{v}_2 + c_3 \mathbf{v}_3$; the coefficient of $\mathbf{v}_1$ is $1$ because the entries of $\mathbf{v}_2$ and $\mathbf{v}_3$ sum to zero, while the entries of $\mathbf{x}_0$ and $\mathbf{v}_1$ both sum to one.
* Applying the power method, the terms associated with the non-dominant eigenvalues decay to zero:$$\lim_{n \to \infty} P^n \mathbf{x}_0 = \lim_{n \to \infty} P^n (\mathbf{v}_1 + c_2 \mathbf{v}_2 + c_3 \mathbf{v}_3) = \mathbf{v}_1$$
* The limit is the unique steady state $\mathbf{v}_1$.
:::
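A short power-iteration sketch for the second matrix confirms this convergence numerically (the starting vector below is an arbitrary choice):
```python
import numpy as np

# Column-stochastic matrix from the convergent example above.
P = np.array([[3/4, 0,   0  ],
              [1/8, 3/4, 1/4],
              [1/8, 1/4, 3/4]])

x = np.array([1.0, 0.0, 0.0])        # arbitrary initial probability vector x_0
for _ in range(100):                 # power method: x_{n+1} = P x_n
    x = P @ x

print(x)                             # approaches the steady state [0, 1/2, 1/2]
```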
## PageRank Algorithm
The stochastic matrix $P$ for a real-world web graph may fail to have a unique steady state: pages with no outgoing links (dead ends) produce columns of zeros, and disconnected or cyclic link structures can leave $\lambda = 1$ non-dominant. Google's solution is **teleportation**.
### The Google Matrix
The Google matrix $G$ is constructed by blending the original stochastic transition matrix $P$ with a teleportation matrix:
$$
G = \alpha P + (1 - \alpha) \mathbf{v} \mathbf{e}^T
$$
* **Stochastic Matrix $P$:** Represents clicking a random link (with probability $\alpha$).
* **Teleportation Parameter $\alpha$:** A value between $0$ and $1$ (usually $\alpha = 0.85$).
* **Teleportation Matrix $\mathbf{v}\mathbf{e}^T$:** Represents jumping to a random page (with probability $1-\alpha$) according to the distribution $\mathbf{v}$ (often uniform, $\mathbf{v} = \frac{1}{N}\mathbf{e}$).
**Process:** The Google matrix models the user process: with probability $\alpha$, they follow a link; with probability $1-\alpha$, they jump to a random page.
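As a sketch, the blend can be formed directly in NumPy; here it uses the four-page transition matrix from the earlier example with $\alpha = 0.85$ and a uniform $\mathbf{v}$ (the variable names are illustrative):
```python
import numpy as np

# Transition matrix P for the four-page example and the usual damping alpha = 0.85.
P = np.array([[0, 1/2, 1/3, 0],
              [1, 0,   1/3, 1],
              [0, 0,   0,   0],
              [0, 1/2, 1/3, 0]])
N = P.shape[0]
alpha = 0.85

v = np.full(N, 1 / N)                # uniform teleportation distribution
e = np.ones(N)

G = alpha * P + (1 - alpha) * np.outer(v, e)   # G = alpha P + (1 - alpha) v e^T

print(G.sum(axis=0))                 # every column sums to 1
print((G > 0).all())                 # all entries positive, so Perron-Frobenius applies
```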
### The PageRank Vector
:::danger
**Theorem** Since the Google matrix $G$ has all positive entries, the Perron–Frobenius theorem guarantees that $\lambda=1$ is its unique dominant eigenvalue.
This ensures the existence of a unique steady state vector $\mathbf{x}$, known as the **PageRank vector**: $$G \mathbf{x} = \mathbf{x}$$ The entry $x_i$ is the *PageRank* of the webpage at vertex $i$. The PageRank vector can be computed efficiently using the *power method* precisely because $\lambda=1$ is guaranteed to be dominant.
:::
:::warning
**Example**
For the graph in the previous example, using $\alpha=0.85$ and a uniform teleportation vector, the PageRank vector is approximated as: $$\mathbf{x} \approx \begin{bmatrix} 0.2472 \\ 0.4681 \\ 0.0375 \\ 0.2472 \end{bmatrix}$$Webpage 2 has the highest PageRank, indicating it is the most important.
:::
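The vector above can be reproduced approximately with a power-method sketch on the same Google matrix; 50 iterations is more than enough here:
```python
import numpy as np

# Google matrix for the four-page example (alpha = 0.85, uniform teleportation),
# built as in the previous sketch.
P = np.array([[0, 1/2, 1/3, 0],
              [1, 0,   1/3, 1],
              [0, 0,   0,   0],
              [0, 1/2, 1/3, 0]])
alpha, N = 0.85, 4
G = alpha * P + (1 - alpha) / N * np.ones((N, N))

x = np.full(N, 1 / N)                # start from the uniform distribution
for _ in range(50):                  # power method: x_{n+1} = G x_n
    x = G @ x

print(np.round(x, 4))                # should be close to [0.2472, 0.4681, 0.0375, 0.2472]
```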
:::success
**Exercise**
1. Determine whether the statement is *True* or *False*: It is necessary to compute all the eigenvectors of the Google matrix to find the PageRank vector of a directed graph.
2. Find the Google matrix $\alpha P + (1 - \alpha)\mathbf{v}\mathbf{e}^T$ for the directed graph using teleportation parameter $\alpha = 0.5$ and uniform distribution vector $\mathbf{v}$. Let $\mathbf{x}_0 = [1 \; 0 \; 0 \; 0]^T$ and use Python to compute 50 iterations of the power method to approximate the PageRank vector.

3. Consider the same directed graph as in the example in this section on PageRank. As $\alpha \to 1$, describe what happens to the PageRank $x_3$ of vertex 3.

4. Let $G$ be the complete directed graph with $N$ vertices. In other words, there is an edge from each vertex to every other vertex in $G$ (excluding edges from a vertex to itself). Describe the Google matrix and the PageRank vector for the complete directed graph.
:::