# Chapter 20. The Small-World Phenomenon
[ToC]
## 20.1 Six Degrees of Separation
* The first empirical study of the small-world phenomenon
* By Stanley Milgram
* Randomly ask individuals to try fowarding a letter to a person in Boston.
* Only advance letter by fowarding it to a acquaintance.
* Roughly a third of the letters eventually arrive in a median of six steps.
## 20.2 Structure and Randomness
:::success
*Recall.*
**Homophily**: The principle that nodes connect to others who are like themselves
**Weak ties**: Nodes connect to nodes that far away
:::
### The Watts-Strogatz model
A **random** graph model with **small-world properties**
1. Suppose everyone lives on a two-dimensional grid
2. Grid step: Two nodes are one *grid step* apart if they are horizontally or vertically adjacant
3. Network:
* Given two constant $r, k$
* **Homophily**: Each node forms a edges to all other nodes lie within a radius of $r$ grid steps.
We call these edge **local contacts**
* **Weak ties**: Each node forms a edges to $k$ other nodes selected uniformly at random.
We call these edge **long-range contacts**
4. Small-world properties: **Homophily** and **weak ties** make a node be able to reach any other node in a few steps.

> Introducing a tiny amount of randomness is enough to make the world "small"
## 20.3 Decentralized Search
Reference: [Complex networks and decentralized search algorithms](https://www.cs.cornell.edu/home/kleinber/icm06-swn.pdf)
Decentralized: You don't know the global information, only have the information of current node.
Decentralized algorithm: How you choose the next node.
### A model for decentralized search
* Continue with the grid-based model of Watts and Strogatz
* Consider decentralized algorithm that pass a message from starting node $s$ to target node $t$
* The current message-holder $v$ only knows
1. The location of the target $t$
2. Its own local/long-range contacts.
* $v$ must choose one of its neighbors $w$ to pass the message by the algorithm
* Evaluate decentralized algorithm according to their *delivery time*
* Delivery time: The expected value of steps required to reach the target, over randomly chosen starting and target nodes
Our goal is to find algorithms with delivery times that are in $O(\log^k n)$
i.e., find algorithms that take short path on average
But ... since long-range contacts are selected **uniformly at random**
the delivery time might be very large.

## 20.4 Modeling the Process of Decentralized Search
### Generalizing the Network model
Earlier, the probability of generating a **long-range contact** from $v$ to $w$ is a constant.
Now we generalize the model by changing the way generating a **long-range contact**
Generalized model
* For two nodes $v$ and $w$,
let $d(v,w)$ denotes the number of grid steps between them
* The probability of generating a random edge from $v$ to $w$
is proportional to $d(v,w)^{-q}$, where $q$ is called *clustering exponent*
### Theorems
1. The delivery time of any decentralized algorithm in the grid-based model is $\Omega(n^{2 / 3})$
2. For $0 \leq q < 2$,
the delivery time of any decentralized algorithm in the grid-based model is $\Omega(n^{(2-q)/3})$
3. For $q > 2$,
the delivery time of any decentralized algorithm in the grid-based model is $\Omega(n^{(q-2)/(q-1)})$
4. For $q = 2$, there is a decentralized algorithm with delivery time $O(\log^2 n)$
So, polylogarithmic delivery time is acheivable when $q=2$, and in fact is the most effective one.
## 20.5 Empirical Analysis and Generalized Models
### Geographic Data on Friendship
Liben-Nowell use blogging site LiveJournal to collect users address and their friends on the site.

### Rank-Base Friendship
Determine the link probability not by physical distance but by **rank**

The results.

$x$-axis: rank
$y$-axis: the probability of having a link with rank($x$)
We can find out that the probability is apporximately proportional to rank$(x)^{-1}$
and thus is also proportional to $d^{-2}$, it matches the model of $q=2$
So there is a decentralized algorithm with delivery time in $O(\log^2 n)$
## 20.6 Core-Periphery Structures and Difficulties in Decentralized Search
_Judith Kleinfeld's critique of the Milgram experiment:_
> The success rate at finding targets in recreations of the experiment has often been **much lower** than it was in the original work.
### Difficulties
- **Lack of participation**
- An ideal situation of the experiment is when the target person is **socially high-status**
### Core-Periphery Structures
- Large social networks tend to be organized in what is called a **core-periphery structure**
- It indicates some reasons why it is harder for Milgram-style decentralized search to find low-status targets than high-status targets
- Properties
- The **high-status** people are linked in a densely-connected **core**
- The **low-status** people are atomized around the **periphery**

## 20.7 Advanced Material: Analysis of Decentralized Search
- The best exponent for decentralized search is equal to the dimension
- _For example, we have seen two-dimensional network in Section 20.4, where the search is the most efficient when $q=2$_
### One-Dimension Model

- Each node connected by directed edges to the two others adjacent to it
- The edges are called **local contacts**
- Each node $v$ also has a single directed edge to some other node $w$ with the probability proportional to $d(v,w)^{-1}$
- The edges are called **long-range contacts**
:::info
**Myopic Search**
- When a node is holding the message, it passes it to the contact that lies **the closest** to the target node
- It is similar to the concept of **greedy algorithm**, so the myopic path may not be the shortest one

:::
### Analyzing Myopic Search
- Let $X$ be a random variable representing the number of steps required from start node $s$ to end node $t$
- We are interested in showing that $E[X]$ is relatively small
:::success
**Step 1: Split the path into phases**
- As the message moves from $s$ to $t$, we’ll say that it’s **in phase $j$** of the search if its distance from the target is between $2^j$ and $2^{j+1}$
- The number of different phases is at most $\log_2 n$, where $n$ is the number of nodes in the ring

$X$ becomes:
$$
X = X_1 + X_2 + \cdots + X_{\log_2 n} \\
$$
By **linearity of expectation**,
$$
E[X] = E[X_1] + E[X_2] + \cdots + E[X_{\log_2 n}]
$$
:::
:::info
**Step 2: Find the normalizing constant of probability distribution**
- Let $Z$ be the sum of $d(v,w)^{-1}$ over all nodes $w \neq v$, then the probability of $v$ linking to $w$ is equal to $\frac{1}{Z}d(v,w)^{-1}$
There are two nodes at each distance $d$ from $v$ for $d=1, 2, \ldots , \frac{n}{2}$, so
$$
Z = 2\sum_{d=1}^\frac{n}{2}{\frac{1}{d}}
$$
The **harmonic series** can be bounded by the area under $y=\frac{1}{x}$:

$$
\sum_{i=1}^k{\frac{1}{i}} \leq 1 + \int_1^k{\frac{1}{x}dx} = 1 + \ln k \\
\Rightarrow Z \leq 2(1 + \ln \frac{n}{2})
$$
By the observation that $\ln x \leq \log_2 x$,
$$
\begin{aligned}
Z & \leq 2(1 + \log_2 \frac{n}{2}) \\
& = 2(1 + \log_2 n - \log_2 2) \\
& = 2\log_2 n
\end{aligned}
$$
Thus, the probability $v$ links to $w$ is
$$
\frac{1}{Z}d(v,w)^{-1} \geq \frac{1}{2\log_2 n}d(v,w)^{-1}
$$
:::
:::success
**Step 3: Analyzing the time spent in one phase**

- Let $I$ be the set of nodes at distance $\leq \frac{d}{2}$ from t, and there are $d+1$ nodes in $I$
- If $v$ is the last node in phase $j$, then $v$'s long-range contact $w \in I$
$$
d(v, w) \leq \frac{3d}{2} \\
\Rightarrow \frac{1}{2\log_2 n}d(v,w)^{-1} \geq \frac{1}{2\log_2 n}\cdot\frac{2}{3d} = \frac{1}{3d\log_2 n}
$$
The probability that **one of nodes in $I$** is the long-range contact of $v$ is
$$
d\cdot\frac{1}{2\log_2 n}d(v,w)^{-1} \geq \frac{1}{3\log_2 n}
$$
It means phase $j$ has a probability of at least $\frac{1}{3\log_2 n}$ of coming to an end, so the probability that phase $j$ runs for at least $i$ steps is
$$
Pr[X_j\geq i] \leq (1 - \frac{1}{3\log_2 n})^{i-1}
$$
Finally,
$$
\begin{aligned}
E[X_j] & = \sum_{i\in\Bbb N}{i\cdot Pr[X_j=i]} \\
& = 1\cdot Pr[X_j=1] + 2\cdot Pr[X_j=2] + 3\cdot Pr[X_j=3] + \cdots \\
& = Pr[X_j\geq 1] + Pr[X_j\geq 2] + Pr[X_j\geq 3] + \cdots \\
& = \sum_{i\in\Bbb N} Pr[X_j\geq i] \\
& \leq \sum_{i\in\Bbb N} (1 - \frac{1}{3\log_2 n})^{i-1} \\
& = \frac{1}{1-(1 - \frac{1}{3\log_2 n})} \\
& = 3\log_2 n
\end{aligned}
$$
:::
**Conclusion**
$$
\begin{aligned}
E[X] & = E[X_1] + E[X_2] + \cdots + E[X_{\log_2 n}] \\
& \leq (3\log_2 n) \cdot (\log_2 n) \\
& = 3(\log_2 n)^2 \\ \\
\Rightarrow E[X] & = O((\log n)^2)
\end{aligned}
$$
### The analysis in higher dimensions
In the one-dimensional version, there is a nice property:
> The probability that **one of nodes in $I$** is the long-range contact of $v$ is
$$
d\cdot\frac{1}{2\log_2 n}d(v,w)^{-1} \geq \frac{1}{3\log_2 n}
$$
That makes the probability of halving the distance to the target in any given step is **independent of $d$**
- In $q$ dimensions, the number of nodes within distance $d/2$ of the target is proportional to $d^q$
- To get the **cancellation property**, we should have $v$ link to each node $w$ with probability proportional to $d(v, w)^{−q}$
### How about other exponents
Here we just focus on **one-dimensional** cases
- If $q=0$
- Let $K$ be the set of nodes at distance $< \sqrt{n}$ from $t$
- The probability that any one node has a long-range contact inside $K$:
$$
\frac{2\sqrt{n}}{n} = \frac{2}{\sqrt{n}}
$$
- At least $\frac{\sqrt{n}}{2}$ steps are needed in expectation to find a node with long-range contacts in $K$
- **Conclusion:** $E[X] = \Omega(\sqrt{n})$
- If $0<q<1$
- The arguments are similar to $q=0$, but with the width of $K$ depending on $q$
- If $q>1$
- Since even the long-range links are relatively short, it takes a long time to find links that span sufficiently long distances
:::info
Overall, for every exponent $q \neq 1$, $E[X] = \Omega(n^c)$ for some $c$ depending on $q$
:::