Chapter 20. The Small-World Phenomenon

# Chapter 20. The Small-World Phenomenon [ToC] ## 20.1 Six Degrees of Separation * The first empirical study of the small-world phenomenon * By Stanley Milgram * Randomly ask individuals to try fowarding a letter to a person in Boston. * Only advance letter by fowarding it to a acquaintance. * Roughly a third of the letters eventually arrive in a median of six steps. ## 20.2 Structure and Randomness :::success *Recall.* **Homophily**: The principle that nodes connect to others who are like themselves **Weak ties**: Nodes connect to nodes that far away ::: ### The Watts-Strogatz model A **random** graph model with **small-world properties** 1. Suppose everyone lives on a two-dimensional grid 2. Grid step: Two nodes are one *grid step* apart if they are horizontally or vertically adjacant 3. Network: * Given two constant $r, k$ * **Homophily**: Each node forms a edges to all other nodes lie within a radius of $r$ grid steps. We call these edge **local contacts** * **Weak ties**: Each node forms a edges to $k$ other nodes selected uniformly at random. We call these edge **long-range contacts** 4. Small-world properties: **Homophily** and **weak ties** make a node be able to reach any other node in a few steps. ![](https://i.imgur.com/jwljvlH.png) > Introducing a tiny amount of randomness is enough to make the world "small" ## 20.3 Decentralized Search Reference: [Complex networks and decentralized search algorithms](https://www.cs.cornell.edu/home/kleinber/icm06-swn.pdf) Decentralized: You don't know the global information, only have the information of current node. Decentralized algorithm: How you choose the next node. ### A model for decentralized search * Continue with the grid-based model of Watts and Strogatz * Consider decentralized algorithm that pass a message from starting node $s$ to target node $t$ * The current message-holder $v$ only knows 1. The location of the target $t$ 2. Its own local/long-range contacts. * $v$ must choose one of its neighbors $w$ to pass the message by the algorithm * Evaluate decentralized algorithm according to their *delivery time* * Delivery time: The expected value of steps required to reach the target, over randomly chosen starting and target nodes Our goal is to find algorithms with delivery times that are in $O(\log^k n)$ i.e., find algorithms that take short path on average But ... since long-range contacts are selected **uniformly at random** the delivery time might be very large. ![](https://i.imgur.com/JPQyEtj.png) ## 20.4 Modeling the Process of Decentralized Search ### Generalizing the Network model Earlier, the probability of generating a **long-range contact** from $v$ to $w$ is a constant. Now we generalize the model by changing the way generating a **long-range contact** Generalized model * For two nodes $v$ and $w$, let $d(v,w)$ denotes the number of grid steps between them * The probability of generating a random edge from $v$ to $w$ is proportional to $d(v,w)^{-q}$, where $q$ is called *clustering exponent* ### Theorems 1. The delivery time of any decentralized algorithm in the grid-based model is $\Omega(n^{2 / 3})$ 2. For $0 \leq q < 2$, the delivery time of any decentralized algorithm in the grid-based model is $\Omega(n^{(2-q)/3})$ 3. For $q > 2$, the delivery time of any decentralized algorithm in the grid-based model is $\Omega(n^{(q-2)/(q-1)})$ 4. For $q = 2$, there is a decentralized algorithm with delivery time $O(\log^2 n)$ So, polylogarithmic delivery time is acheivable when $q=2$, and in fact is the most effective one. ## 20.5 Empirical Analysis and Generalized Models ### Geographic Data on Friendship Liben-Nowell use blogging site LiveJournal to collect users address and their friends on the site. ![](https://i.imgur.com/2TLUjnJ.png) ### Rank-Base Friendship Determine the link probability not by physical distance but by **rank** ![](https://i.imgur.com/x7Gk6za.png) The results. ![](https://i.imgur.com/z2iKyfv.png) $x$-axis: rank $y$-axis: the probability of having a link with rank($x$) We can find out that the probability is apporximately proportional to rank$(x)^{-1}$ and thus is also proportional to $d^{-2}$, it matches the model of $q=2$ So there is a decentralized algorithm with delivery time in $O(\log^2 n)$ ## 20.6 Core-Periphery Structures and Difficulties in Decentralized Search _Judith Kleinfeld's critique of the Milgram experiment:_ > The success rate at finding targets in recreations of the experiment has often been **much lower** than it was in the original work. ### Difficulties - **Lack of participation** - An ideal situation of the experiment is when the target person is **socially high-status** ### Core-Periphery Structures - Large social networks tend to be organized in what is called a **core-periphery structure** - It indicates some reasons why it is harder for Milgram-style decentralized search to find low-status targets than high-status targets - Properties - The **high-status** people are linked in a densely-connected **core** - The **low-status** people are atomized around the **periphery** ![](https://i.imgur.com/fkBTVQt.jpg) ## 20.7 Advanced Material: Analysis of Decentralized Search - The best exponent for decentralized search is equal to the dimension - _For example, we have seen two-dimensional network in Section 20.4, where the search is the most efficient when $q=2$_ ### One-Dimension Model ![](https://i.imgur.com/kPS3m6U.jpg) - Each node connected by directed edges to the two others adjacent to it - The edges are called **local contacts** - Each node $v$ also has a single directed edge to some other node $w$ with the probability proportional to $d(v,w)^{-1}$ - The edges are called **long-range contacts** :::info **Myopic Search** - When a node is holding the message, it passes it to the contact that lies **the closest** to the target node - It is similar to the concept of **greedy algorithm**, so the myopic path may not be the shortest one ![](https://i.imgur.com/EW4GMNt.jpg) ::: ### Analyzing Myopic Search - Let $X$ be a random variable representing the number of steps required from start node $s$ to end node $t$ - We are interested in showing that $E[X]$ is relatively small :::success **Step 1: Split the path into phases** - As the message moves from $s$ to $t$, we’ll say that it’s **in phase $j$** of the search if its distance from the target is between $2^j$ and $2^{j+1}$ - The number of different phases is at most $\log_2 n$, where $n$ is the number of nodes in the ring ![](https://i.imgur.com/G1kofaO.jpg) $X$ becomes: $$ X = X_1 + X_2 + \cdots + X_{\log_2 n} \\ $$ By **linearity of expectation**, $$ E[X] = E[X_1] + E[X_2] + \cdots + E[X_{\log_2 n}] $$ ::: :::info **Step 2: Find the normalizing constant of probability distribution** - Let $Z$ be the sum of $d(v,w)^{-1}$ over all nodes $w \neq v$, then the probability of $v$ linking to $w$ is equal to $\frac{1}{Z}d(v,w)^{-1}$ There are two nodes at each distance $d$ from $v$ for $d=1, 2, \ldots , \frac{n}{2}$, so $$ Z = 2\sum_{d=1}^\frac{n}{2}{\frac{1}{d}} $$ The **harmonic series** can be bounded by the area under $y=\frac{1}{x}$: ![](https://i.imgur.com/FUZXlF2.jpg) $$ \sum_{i=1}^k{\frac{1}{i}} \leq 1 + \int_1^k{\frac{1}{x}dx} = 1 + \ln k \\ \Rightarrow Z \leq 2(1 + \ln \frac{n}{2}) $$ By the observation that $\ln x \leq \log_2 x$, $$ \begin{aligned} Z & \leq 2(1 + \log_2 \frac{n}{2}) \\ & = 2(1 + \log_2 n - \log_2 2) \\ & = 2\log_2 n \end{aligned} $$ Thus, the probability $v$ links to $w$ is $$ \frac{1}{Z}d(v,w)^{-1} \geq \frac{1}{2\log_2 n}d(v,w)^{-1} $$ ::: :::success **Step 3: Analyzing the time spent in one phase** ![](https://i.imgur.com/txIdyQc.jpg) - Let $I$ be the set of nodes at distance $\leq \frac{d}{2}$ from t, and there are $d+1$ nodes in $I$ - If $v$ is the last node in phase $j$, then $v$'s long-range contact $w \in I$ $$ d(v, w) \leq \frac{3d}{2} \\ \Rightarrow \frac{1}{2\log_2 n}d(v,w)^{-1} \geq \frac{1}{2\log_2 n}\cdot\frac{2}{3d} = \frac{1}{3d\log_2 n} $$ The probability that **one of nodes in $I$** is the long-range contact of $v$ is $$ d\cdot\frac{1}{2\log_2 n}d(v,w)^{-1} \geq \frac{1}{3\log_2 n} $$ It means phase $j$ has a probability of at least $\frac{1}{3\log_2 n}$ of coming to an end, so the probability that phase $j$ runs for at least $i$ steps is $$ Pr[X_j\geq i] \leq (1 - \frac{1}{3\log_2 n})^{i-1} $$ Finally, $$ \begin{aligned} E[X_j] & = \sum_{i\in\Bbb N}{i\cdot Pr[X_j=i]} \\ & = 1\cdot Pr[X_j=1] + 2\cdot Pr[X_j=2] + 3\cdot Pr[X_j=3] + \cdots \\ & = Pr[X_j\geq 1] + Pr[X_j\geq 2] + Pr[X_j\geq 3] + \cdots \\ & = \sum_{i\in\Bbb N} Pr[X_j\geq i] \\ & \leq \sum_{i\in\Bbb N} (1 - \frac{1}{3\log_2 n})^{i-1} \\ & = \frac{1}{1-(1 - \frac{1}{3\log_2 n})} \\ & = 3\log_2 n \end{aligned} $$ ::: **Conclusion** $$ \begin{aligned} E[X] & = E[X_1] + E[X_2] + \cdots + E[X_{\log_2 n}] \\ & \leq (3\log_2 n) \cdot (\log_2 n) \\ & = 3(\log_2 n)^2 \\ \\ \Rightarrow E[X] & = O((\log n)^2) \end{aligned} $$ ### The analysis in higher dimensions In the one-dimensional version, there is a nice property: > The probability that **one of nodes in $I$** is the long-range contact of $v$ is $$ d\cdot\frac{1}{2\log_2 n}d(v,w)^{-1} \geq \frac{1}{3\log_2 n} $$ That makes the probability of halving the distance to the target in any given step is **independent of $d$** - In $q$ dimensions, the number of nodes within distance $d/2$ of the target is proportional to $d^q$ - To get the **cancellation property**, we should have $v$ link to each node $w$ with probability proportional to $d(v, w)^{−q}$ ### How about other exponents Here we just focus on **one-dimensional** cases - If $q=0$ - Let $K$ be the set of nodes at distance $< \sqrt{n}$ from $t$ - The probability that any one node has a long-range contact inside $K$: $$ \frac{2\sqrt{n}}{n} = \frac{2}{\sqrt{n}} $$ - At least $\frac{\sqrt{n}}{2}$ steps are needed in expectation to find a node with long-range contacts in $K$ - **Conclusion:** $E[X] = \Omega(\sqrt{n})$ - If $0<q<1$ - The arguments are similar to $q=0$, but with the width of $K$ depending on $q$ - If $q>1$ - Since even the long-range links are relatively short, it takes a long time to find links that span sufficiently long distances :::info Overall, for every exponent $q \neq 1$, $E[X] = \Omega(n^c)$ for some $c$ depending on $q$ :::