---
tags: algorithm analysis
---
# Page Rank
(this is only a bare-bones outline of my lecture ... for more, see the references below)
## Basic Definitions
A [graph](https://en.wikipedia.org/wiki/Graph_theory) is a pair $G=(V,E)$.
The elements of $V$ are called vertices, nodes, pages, etc.
The elements of $E$ are called edges, arrows, links, etc. We write an edge from node $i$ to node $j$ as $i\to j$.
The set of **in-links** (aka backlinks) of node $j$ is $\{\,i\to j \mid i\to j\in E\,\}$. Each such $i$ is also called a backpage of $j$.
The set of **out-links** of node $i$ is $\{\,i\to j \mid i\to j\in E\,\}$.
The **degree** $d_i$ of node $i$ is the cardinality (size) of its set of out-links, i.e., its out-degree.
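As a concrete illustration (a minimal sketch using a hypothetical four-node example graph, not one from the lecture), in-links, out-links, and degrees can be read off an edge list:

```python
# Hypothetical example graph, given as a list of edges (i, j) meaning i -> j.
edges = [(1, 2), (1, 3), (2, 3), (3, 1), (4, 3)]

def in_links(j):
    """All edges i -> j, i.e., the in-links (backlinks) of node j."""
    return [(i, k) for (i, k) in edges if k == j]

def out_links(i):
    """All edges i -> j, i.e., the out-links of node i."""
    return [(k, j) for (k, j) in edges if k == i]

def degree(i):
    """The degree d_i: the number of out-links of node i."""
    return len(out_links(i))

print(in_links(3))  # node 3 has three backlinks: from 1, 2, and 4
print(degree(1))    # node 1 has two out-links: to 2 and to 3
```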
## Page Rank Definition
Every page $i\in V$ has a rank (importance, vote) $r_i$, subject to the following equations:[^pagerank]
$$r_j = \sum_{i\to j} \frac {r_i} {d_i}$$
$$\sum_{i} r_i = 1$$
**Exercise/Activity:** Explain this formula using the concepts of in-link and out-link.
## Page Rank as Eigenvector
We define the "column stochastic adjacency" matrix
$$
M_{ij} = \begin{cases}
\frac 1 {d_j} & \text{if } j\to i \\
0 & \text{otherwise}
\end{cases}
$$
**Exercise/Activity:** Explain why we modified the concept of the adjacency matrix of a graph (which we studied previously).
**Exercise/Activity:** Explain why the two equations defining $r_i$ above are equivalent to the one equation
$$r=M\cdot r$$
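To make the definition of $M$ concrete, here is a minimal sketch (using a hypothetical three-node graph, assumed for illustration only) that builds $M$ from an edge list and checks that every column sums to 1:

```python
# Hypothetical example graph: edges (j, i) meaning j -> i.
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]
nodes = sorted({v for e in edges for v in e})
n = len(nodes)
idx = {v: k for k, v in enumerate(nodes)}  # node label -> matrix index

# Out-degrees d_j.
d = {v: sum(1 for (s, _) in edges if s == v) for v in nodes}

# M[i][j] = 1/d_j if j -> i, else 0.
M = [[0.0] * n for _ in range(n)]
for (j, i) in edges:
    M[idx[i]][idx[j]] = 1.0 / d[j]

# Each column sums to 1 ("column stochastic"), provided every
# node has at least one out-link.
for col in range(n):
    assert abs(sum(M[row][col] for row in range(n)) - 1.0) < 1e-12
```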
## Random Walk
What is the probability $p_i(t)$ of a random surfer being on page $i$ at time $t$?
The vector $p(t+1)$ of these probabilities is computed from $p(t)$ via
$$p(t+1) = M\cdot p(t)$$
**Exercise/Activity:** Show that a solution of the equation $r=M\cdot r$ is exactly a stationary distribution of the random walk.
Moreover, if we choose a random starting distribution $p(0)$, we can use the formula $p(t+1) = M\cdot p(t)$ to approximate a good solution iteratively.
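This iteration (often called power iteration) can be sketched as follows. The matrix below encodes a hypothetical three-node graph with edges $1\to 2$, $1\to 3$, $2\to 3$, $3\to 1$, chosen only for illustration; the sketch ignores the practical complications mentioned in the PS below.

```python
def pagerank(M, iters=100):
    """Approximate r with r = M r by repeatedly applying M
    to a starting distribution p(0)."""
    n = len(M)
    r = [1.0 / n] * n  # p(0): uniform distribution
    for _ in range(iters):
        # p(t+1) = M p(t)
        r = [sum(M[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r

# Column-stochastic matrix of the hypothetical graph above.
M = [[0.0, 0.0, 1.0],
     [0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0]]
r = pagerank(M)
# r is (approximately) a fixed point: r = M r, with sum(r) = 1
print(r)
```

For this particular graph the iteration converges to $r = (0.4, 0.2, 0.4)$, which one can verify satisfies both defining equations.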
**PS:** To actually implement this there are a few more things to consider; see the references below for more. There are also interesting mathematical questions about the existence and uniqueness of solutions to the equation $r=M\cdot r$.
**PPS:** In class we had an interesting discussion of how the concept of a random walk allows us to predict how much time people spend on each page (on average) without making any actual observations.
[^pagerank]: Here is the quote from the paper by Brin and Page:
![image](https://hackmd.io/_uploads/ryYFOlOb0.png)
## References
An excellent video:
- Stanford CS224W: Machine Learning with Graphs, 2021. [Lecture 4.1 - 4.2](https://www.youtube.com/watch?v=TU0ankRcHmo&list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&index=10).
A classic textbook:
- Easley, Kleinberg: [Networks, Crowds, and Markets: Reasoning About a Highly Connected World](https://www.cs.cornell.edu/home/kleinber/networks-book/), 2010. Chapter 14.3.
Further reading:
- Sergey Brin, Lawrence Page: [The Anatomy of a Large-Scale Hypertextual Web Search Engine](https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/), Computer Networks, vol. 30 (1998), pp. 107-117
- Andrei Broder et al.: [Graph structure in the web](https://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/broder.pdf), 2000.
- Langville and Meyer: [Google's PageRank and Beyond: The Science of Search Engine Rankings](https://gi.cebitec.uni-bielefeld.de/_media/teaching/2019winter/alggr/langville_meyer_2006.pdf), 2006.