# Homework 3: CSCI 347: Data Mining
Names: Sam Behrens and Karl Molina
Show your work. Include any code snippets you used to generate an answer, using comments in the code to clearly indicate which problem corresponds to which code.
Consider the following graph:

1. [3 points] Find the betweenness centrality of vertices 3 and 12. You may use networkx or other graph analysis packages.
```python=
import networkx as nx
>>> graph = nx.Graph()
>>> graph.add_nodes_from(range(1, 13))
>>> graph.add_edges_from([
(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (3 ,12),
(6, 12), (7, 12), (8, 12), (9, 12), (10, 12), (11, 12)
])
>>> nx.betweenness_centrality(graph)
{1: 0.0, 2: 0.0, 3: 0.5818181818181818, 4: 0.0, 5: 0.0, 6: 0.0,
7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0, 11: 0.0, 12: 0.8181818181818181}
```
The betweenness centrality of vertex 3 is **0.582** or $\frac{32}{55}$ and the betweenness centrality of vertex 12 is **0.818** or $\frac{9}{11}$.
2. [3 points] Without using networkx or other graph analysis packages (though you may use them to check your answer), find the closeness centrality of vertices 3 and 12.
Closeness centrality of vertex 3:
$$
\frac{1}{d(1, 3) + d(2, 3) + d(4, 3) + ... + d(12, 3)}
$$
$$
\frac{1}{1 + 1 + 1 + 1 + 1 + 2 + 2 + 2 + 2 + 2 + 2}
$$
$$
=\frac{1}{17}
$$
Closeness centrality of vertex 12:
$$
\frac{1}{d(1, 12) + d(2, 12) + d(3, 12) + ... + d(11, 12)}
$$
$$
\frac{1}{1 + 1 + 1 + 1 + 1 + 1 + 1 + 2 + 2 + 2 + 2}
$$
$$
=\frac{1}{15}
$$
3. [3 points] Without using networkx or other graph analysis packages (though you may use them to check your answer), find the eccentricity of vertices 3 and 12.
Eccentricity of vertex 3:
$$
d(v_3, v_6)=2
$$
Eccentricity of vertex 12:
$$
d(v_{12}, v_1)=2
$$
4. [3 points] Using networkx, find the prestige/eigenvector centrality of vertices 3 and 12. Include the code used to generate the answer.
```python=
>>> nx.eigenvector_centrality(graph)[3]
0.52222943921471
>>> nx.eigenvector_centrality(graph)[12]
0.5222380174120582
```
The eigenvector centrality of vertex 3 is **0.522** and the eigenvector centrality of vertex 12 is **0.522** as well.
5. [3 points] Find $\mu_L$, the average length of the shortest path between two vertices in this graph.
```python=
>>> nx.average_shortest_path_length(graph)
2.1666666666666665
```
6. [3 points] Use Python to create a plot of the degree distribution of this graph. Include the code and the plot.
```python=
>>> degrees = dict(nx.degree(graph))
>>> plt.hist(degrees.values(), bins=100)
>>> plt.show()
```

7. [3 points] Without using networkx or other graph analysis packages (though you may use them to check your answer), find the clustering coefficient of vertex 3.
Vertex 3 has five neighbors. They are vertices 1, 2, 4, 5, and 12. Between them, 1 and 2 are connected, and 4 and 5 are connected. There are ten possible edges between them. Then the clustering coefficient is:
$$
\frac{2}{10}\space\text{or}\space\frac{1}{5}
$$
8. [3 points] Without using networkx or other graph analysis packages (though you may use them to check your answer), find the clustering coefficient of the graph.
$$
\begin{aligned}
cc(1) &= 1 \\
cc(2) &= 1 \\
cc(3) &= \frac{1}{5} \\
cc(4) &= 1 \\
cc(5) &= 1 \\
cc(6) &= 0 \\
cc(7) &= 0 \\
cc(8) &= 0 \\
cc(9) &= 0 \\
cc(10) &= 0 \\
cc(11) &= 0 \\
cc(12) &= 0 \\
\mu &= \frac{4.2}{12} \\
&= 0.35
\end{aligned}
$$
The average clustering coefficient is **0.35**.
Check the answer:
```python=
>>> nx.clustering(graph)
{1: 1.0, 2: 1.0, 3: 0.2, 4: 1.0, 5: 1.0, 6: 0,
7: 0, 8: 0, 9: 0, 10: 0, 11: 0, 12: 0}
>>> nx.average_clustering(graph)
0.35000000000000003
```
9. [6 points] Use networkx to create two random graphs, each having 200 nodes. The first graph should be an (undirected) Erdos-Renyi random graph with parameters n=200 and p=0.1. The second graph should be a (undirected) Barabási–Albert (preferential attachment) graph with parameters n=200 and m=3. Create visualizations of both graphs, with vertex sizes dependent on betweenness centrality (the higher the betweenness centrality, the greater the size) and node color dependent on degree. Include both the code to generate the plots, as well as the plots themselves, in your submission.
```python=
>>> g1 = nx.random_graphs.erdos_renyi_graph(200, 0.1)
>>> pos1 = nx.spring_layout(g1)
>>> bc1 = nx.betweenness_centrality(g1)
>>> node_color_1 = [2000.0*g1.degree(v) for v in g1]
>>> node_size_1 = [v*10000 for v in bc1.values()]
>>> nx.draw_networkx(g1, pos=pos1, with_labels=False,
node_color=node_color_1, node_size=node_size_1)
```

```python=
>>> g2 = nx.random_graphs.barabasi_albert_graph(200, 3)
>>> pos2 = nx.spring_layout(g2)
>>> bc2 = nx.betweenness_centrality(g2)
>>> node_color_2 = [2000.0*g2.degree(v) for v in g2]
>>> node_size_2 = [v for v in bc2.values()]
>>> nx.draw_networkx(g2, pos=pos2, with_labels=False,
node_color=node_color_2, node_size=node_size_2)
```
