---
tags: algorithm analysis
---

# Page Rank

(this is only a bare-bones outline of my lecture ... for more see references below)

## Basic Definitions

A [graph](https://en.wikipedia.org/wiki/Graph_theory) is a pair $G=(V,E)$. The elements of $V$ are called vertices, nodes, pages, etc. The elements of $E$ are called edges, arrows, links, etc. We write an edge from node $i$ to node $j$ as $i\to j$.

The set of **in-links** (aka backlinks) of node $j$ is $\{i\to j \mid i\in V\}$; $i$ is also called a backpage of $j$.

The set of **out-links** of node $j$ is $\{j\to i \mid i\in V\}$.

The **degree** $d_i$ of node $i$ is the cardinality (size) of its set of out-links.

## Page Rank Definition

Every page $i\in V$ has a rank (importance, vote) $r_i$, subject to the following equations:[^pagerank]

$$r_j = \sum_{i\to j} \frac{r_i}{d_i}$$

$$\sum_{i} r_i = 1$$

**Exercise/Activity:** Explain this formula using the concepts of in-link and out-link. (A small worked example appears in the appendix at the end of this page.)

## Page Rank as Eigenvector

We define the "column stochastic adjacency" matrix

$$
M_{ij} = \begin{cases} \frac{1}{d_j} & \text{if } j\to i \\ 0 & \text{otherwise} \end{cases}
$$

**Exercise/Activity:** Explain why we modified the concept of the adjacency matrix of a graph (which we studied previously).

**Exercise/Activity:** Explain why the two equations defining $r_i$ above are equivalent to the one equation

$$r=M\cdot r$$

## Random Walk

What is the probability $p_i(t)$ of a random surfer being on page $i$ at time $t$? Writing $p(t)$ for the vector of these probabilities, $p(t+1)$ is computed from $p(t)$ via

$$p(t+1) = M\cdot p(t)$$

**Exercise/Activity:** Explain why the solutions of the equation $r=M\cdot r$ are exactly the stationary distributions of the random walker. Moreover, if we choose a random starting point $p(0)$, we can use the formula $p(t+1) = M\cdot p(t)$ to approximate a good solution iteratively (see the code sketch in the appendix at the end of this page).

**PS:** To actually implement this there are a few more things to consider; see the references below for more. And there are interesting mathematical questions about the existence and uniqueness of solutions to the equation $r=M\cdot r$.

**PPS:** In class we had an interesting discussion of how the concept of a random walk allows us to predict how much time people spend on each page (on average) without making any actual observations.

[^pagerank]: Here is the quote from the paper by Brin and Page: ![image](https://hackmd.io/_uploads/ryYFOlOb0.png)

## References

An excellent video:

- Stanford CS224W: Machine Learning with Graphs, 2021. [Lecture 4.1 - 4.2](https://www.youtube.com/watch?v=TU0ankRcHmo&list=PLoROMvodv4rPLKxIpqhjhPgdQy7imNkDn&index=10).

A classic textbook:

- Easley, Kleinberg: [Networks, Crowds, and Markets: Reasoning About a Highly Connected World](https://www.cs.cornell.edu/home/kleinber/networks-book/), 2010. Chapter 14.3.

Further reading:

- Sergey Brin, Lawrence Page: [The Anatomy of a Large-Scale Hypertextual Web Search Engine](https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/), Computer Networks, vol. 30 (1998), pp. 107-117.
- Andrei Broder et al.: [Graph structure in the web](https://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/broder.pdf), 2000.
- Langville and Meyer: [Google's PageRank and Beyond: The Science of Search Engine Rankings](https://gi.cebitec.uni-bielefeld.de/_media/teaching/2019winter/alggr/langville_meyer_2006.pdf), 2006.
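
## Appendix: Worked Example and Code Sketch

The following example and code are not part of the lecture outline itself; they are a minimal illustration on a made-up three-page graph. Take pages $V=\{0,1,2\}$ with links $0\to 1$, $0\to 2$, $1\to 2$, $2\to 0$, so that $d_0=2$ and $d_1=d_2=1$. The defining equations from the Page Rank Definition section become

$$r_0 = \frac{r_2}{1}, \qquad r_1 = \frac{r_0}{2}, \qquad r_2 = \frac{r_0}{2} + \frac{r_1}{1}, \qquad r_0+r_1+r_2 = 1,$$

whose unique solution is $r = (r_0, r_1, r_2) = \left(\tfrac{2}{5}, \tfrac{1}{5}, \tfrac{2}{5}\right)$.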
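
Here is a minimal NumPy sketch that builds the column-stochastic matrix $M$ for the same made-up graph; the edge list, variable names, and the use of NumPy are my own choices, not something prescribed by the lecture.

```python
import numpy as np

# Hypothetical example graph on pages 0, 1, 2, given as a list of edges (src, dst),
# i.e. src -> dst. Same graph as in the worked example above.
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
n = 3

# Out-degree d_i = number of out-links of page i.
d = np.zeros(n)
for src, dst in edges:
    d[src] += 1

# Column-stochastic matrix: M[i, j] = 1/d_j if there is an edge j -> i, else 0.
M = np.zeros((n, n))
for src, dst in edges:
    M[dst, src] = 1.0 / d[src]

print(M)
print(M.sum(axis=0))   # every column sums to 1 (assuming every page has an out-link)

# The rank vector from the worked example satisfies r = M @ r.
r = np.array([2/5, 1/5, 2/5])
print(np.allclose(M @ r, r))   # True
```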
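
Finally, a sketch of the random-walk iteration $p(t+1)=M\cdot p(t)$, again only as an illustration: it ignores the practical issues (dead ends, damping/teleportation) that the PS above defers to the references, and it happens to converge on this particular example.

```python
def power_iteration(M, tol=1e-10, max_iter=1000):
    """Approximate a solution of r = M @ r by iterating p(t+1) = M @ p(t)."""
    n = M.shape[0]
    p = np.full(n, 1.0 / n)        # start from the uniform distribution p(0)
    for _ in range(max_iter):
        p_next = M @ p
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p_next

r_approx = power_iteration(M)
print(r_approx)                  # approximately (0.4, 0.2, 0.4)
print(M @ r_approx - r_approx)   # residual is close to the zero vector
```

Starting from the uniform distribution keeps the entries of $p(t)$ summing to 1 throughout, since $M$ is column stochastic.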