# Chapter 18. Power Law and Rich-Get-Risher Phenomena
[TOC]
## 18.1 Popularity as a Network Phenomenon
**Popularity** is a phenomenon characterized by extreme **imbalances**
Let's focus on the Web as our example
:::info
**Definition**
- In-links: the full set of links pointing to a given Web page
- $f(k)$: fraction of pages on the Web which have $k$ in-links
:::
A simple hypothesis:
- Assume that each page $j$ decides independently at random whether to link to any other given page $i$
- denoted by a random variable $X_{ij}\in\{0,1\}$
- The number of in-links to a given page is the sum of many independent random variables
- $X_i = \sum_j{X_{ij}}$
- By the **central limit theorem**, $X_i$ can be approximated by a **normal distribution**
- If we believe this model, then $f(k)$ should decrease **exponentially** as $k$ grows large
## 18.2 Power Laws
- In real measurements, the recurring finding is that $f(k)$ is approximately proportional to $1/k^2$
- A function that decreases as $k$ to some fixed power is called a **power law**
- There’s a simple method to test whether a dataset exhibits a power-law distribution
:::info
_We want to know whether the equation $f(k)=a/k^c$ holds for some constants $a$ and $c$_
$$
\begin{aligned}
\ & f(k) = ak^{-c} \\
\Rightarrow\ & log\ f(k) = log\ a + log\ k^{-c} \\
\Rightarrow\ & log\ f(k) = log\ a - c\ log\ k
\end{aligned}
$$
When we plot $log\ f(k)$ as a function of $log\ k$, we should see a **straight line**
- $-c$ is the slope
- $log\ a$ is the y-intercept
:::
:::danger
The central limit theorem leads to normal distribution, but **what property leads to power-law distribution?**
:::
## 18.3 Rich-Get-Richer Models
### Model description
1. Pages are created in order, and named $1, 2, 3, ..., N$
2. When page $j$ is created, it chooses a page $i$ uniformly at random from $1, 2, ..., j-1$
- With probability $p$, page $j$ creates a link to this page $i$ **(LINK TO)**
- With probability $1−p$, page $j$ instead creates a link to the page that $i$ points to **(COPY FROM)**
### Properties of the model
- If we run it for many pages, $f(k)$ will be distributed approximately according to a power law $1/k^c$, where the value of $c$ depends on the choice of $p$
- As $p$ gets smaller, **copying** becomes more frequent, making one more likely to see extremely popular pages, so $c$ gets smaller as well
- The copying mechanism is an implementation of **rich-get-richer** dynamics
- The probability that page $\ell$ experiences an increase in popularity is directly proportional to $\ell$’s current popularity
## 18.4 The Unpredictability of Rich-Get-Richer Effects
Random effects early in the process should play a role in the dynamics of popularity
### Relationships between Power Laws and Information Cascades
- Similarity to information cascades:
- People are aware of earlier decisions made
- Differences with information cascades:
1. Provide many possible options (e.g. all possible Web pages) rather than just two (accepting an idea or rejecting it)
2. Involve a set of people who engage in very **limited observation** of the population
- e.g. consult the decision of just one other randomly selected person
3. The model doesn’t derive this imitation from a more fundamental model of **rational decision-making**. (simply imitate)
## 18.5 The Long Tail
Let’s imagine a media company with a large inventory. Consider this question:
> _Are most sales being generated by **hits** or by a multitude of **niche products**?_
- **Hits**: a small number of blockbusters that create huge revenues
- **Niche products**: items that are each individually less popular
According to Chris Anderson:
With the rise of the Internet, the latter alternative would be dominant, with a **long tail** of obscure products driving the bulk of audience interest
### Visualizing the long tail
- We order the books by **sales rank**, and then we look at the popularity of books as we move out to larger and larger sales ranks — into the niche products

- A concrete version of the hits-vs.-niche question, is whether there is significantly **more area** under the left part of this curve _(**hits**)_ or the right _(**niche products**)_

## 18.6 The Effect of Search and Recommandation Systems
### Examples of accentuating rich-get-richer
- Some search engines like Google use popularity measures to rank Web pages and assign their availability to users, so even the choice of what to copy from becomes **highly skewed**
### Examples of counteracting rich-get-richer
- **Recommendation systems** of companies like Amazon and Netflix are designed to expose people to items that may not be generally popular, but which match user interests as inferred from their history of past purchases
## 18.7 Advanced Material: Analysis of Rich-Get-Richer Processes
### Rich-Get-Richer Model
- pages are indexed $1,2,\ldots,N$
- for $t$ from $1$ to $N$
- with probability $0<p<1$
- choose page $i$ from $1,2,\ldots,t-1$ uniformly
- with probability $1-p$
- choose page $i$ from $1,2,\ldots,t-1$ with probability proportional to $i$'s number of in-links
- create link $t\rightarrow i$
### Deterministic Approximation
Let $X_j(t)$ be a random variable of the number of in-links of page $j$ at time $t\geq j$, then
- $X_j(j)=0$
- $X_j(t+1)-X_j(t)=\displaystyle\frac{p}{t}+\frac{(1-p)X_j(t)}{t}$
Extend the range of $t$ from $\mathbb N$ to $\mathbb R$, and assume $X_j(t)$ change continuously. That is
$$
\frac{dX_j(t)}{dt}=\displaystyle\frac{p}{t}+\frac{(1-p)X_j(t)}{t}
$$
:::info
***Lemma.*** Consider the differential equation
$$
\begin{cases}
\displaystyle\frac{du(x)}{dx}=\frac{p+qu(x)}{x}\\
u(x_0)=0
\end{cases}
$$
Then the solusion is
$$
u(x)=\frac{p}{q}\left(\left(\frac{x}{x_0}\right)^q-1\right)
$$
***Proof.***
Rewrite
$$
\frac{du}{p+qu}=\frac{dx}{x}
$$
Take integral on both side
$$
\frac{1}{q}\ln(p+qu)=\ln(x)+c
$$
and hence
$$
p+qu=Ae^{q\ln(x)},\quad A=e^{qc}
$$
finally
$$
u=Ax^q-\frac{p}{q}
$$
To find $A$, evaluate $u(x_0)=0$, we have
$$
0=u(x_0)=Ax_0^q-\frac{p}{q},\quad A=x_0^{-q}\frac{p}{q}
$$
thus
$$
u(x)=x_0^{-q}\frac{p}{q}x^q-\frac{p}{q}=\frac{p}{q}\left(\left(\frac{x}{x_0}\right)^q-1\right)
$$
:::
By lemma,
$$
X_j(t)=\frac{p}{q}\left(\left(\frac{x}{j}\right)^q-1\right),\quad q=1-p
$$
### Power Law
Given $k,t$ what fraction of pages satisfies $X_j(t)\geq k$?
$$
X_j(t)=\frac{p}{q}\left(\left(\frac{t}{j}\right)^q-1\right)\geq k,\quad j\leq t\left(\frac{q}{p}\cdot k+1\right)^{-1/q}
$$
Thus the fraction of pages has at least $k$ in-links at time $t$ is
$$
\frac{1}{t}\cdot t\left(\frac{q}{p}\cdot k+1\right)^{-1/q}=\left(\frac{q}{p}\cdot k+1\right)^{-1/q}=F(k)
$$
- independet of $t$
- $F(k)=O(k^{-1/q})$
$F(k)$ is the fraction of pages with at least $k$ in-links, thus $f(k)=-dF/dk$ is the fraction of pages with exact $k$ in-links
$$
f(k)=-\frac{dF(k)}{dk}=-\frac{-1}{q}\frac{q}{p}\left(\frac{q}{p}\cdot k+1\right)^{-1/q-1}=O(k^{-(1+1/q)})
$$
- the original model shows the same trend
- when $q\to 1$, the exponent decreases to $2$
- many real networks e.g. Web Page are with power-law exponent slightly above $2$