Centrality is a term used to describe the importance of nodes in a network: the most important nodes are those towards the center, through which the peripheral nodes interact. In a quantitative sense, however, this is not well-defined. There are many methods for calculating centrality, each relevant in different situations.
Examples:
Side note: I came across this in the context of game theory, as a way of ranking the power that different coalitions wield in a vote-based system: the Shapley-Shubik power index, https://en.wikipedia.org/wiki/Shapley–Shubik_power_index
Possibly the simplest measure of centrality, degree centrality is simply defined as the degree of a node. I.e. the centrality of node $i$ is $C_D(i) = \deg(i)$, the number of edges attached to it.
Another relatively straightforward algorithm, this simply looks at each pair of vertices in the network and determines the shortest path between said vertices (where shortest means fewest edges). The centrality of a node $v$ is the fraction of these shortest paths that pass through it:

$$C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}},$$

where $\sigma_{st}$ is the number of shortest paths between $s$ and $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.
Closeness is determined from the average distance of node $i$ to all other nodes; the most central nodes are those with the smallest average distance:

$$C_C(i) = \frac{n - 1}{\sum_{j \neq i} d(i, j)},$$

where $d(i, j)$ is the shortest-path distance between nodes $i$ and $j$, and $n$ is the number of nodes.
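To make these three definitions concrete, here is a minimal sketch using networkx (the kite graph is just a standard toy example, not a graph from this discussion):

```python
# Degree, betweenness, and closeness centrality on a toy graph.
# Note: networkx normalizes degree centrality by (n - 1).
import networkx as nx

G = nx.krackhardt_kite_graph()  # standard 10-node test graph

degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
closeness = nx.closeness_centrality(G)

for v in G.nodes:
    print(f"{v}: degree={degree[v]:.2f}  "
          f"betweenness={betweenness[v]:.2f}  closeness={closeness[v]:.2f}")
```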
The current-flow centrality is a variant of betweenness that, instead of counting only the shortest path between two nodes, calculates the effective current through each node when a unit current source and sink are placed on nodes $s$ and $t$. This turns out to be equivalent to a random-walk betweenness, which counts how often a random walk from $s$ to $t$ passes through a node.
As these are equivalent, we discuss the more straightforward current calculation here. Consider the network in question as a resistor network, where nodes are junctions and each edge is a resistor of unit resistance.
The current flows can be calculated using Kirchhoff's law of current conservation, which states that the current flowing into a node is equal to the current flowing out. In matrix form this gives a linear system $L V = b$ for the node potentials $V$, where $L$ is the graph Laplacian and $b$ holds the external currents ($+1$ at the source $s$, $-1$ at the sink $t$).
For calculating current betweenness, we are interested in the current flowing through each node except the source and target themselves. For node $i$, this is half the sum of the absolute currents on its incident edges:

$$I_i^{(st)} = \frac{1}{2} \sum_j A_{ij} \, |V_i - V_j|.$$
The current-flow betweenness follows as the average of this current over all source/target pairs:

$$C_{CF}(i) = \frac{\sum_{s < t} I_i^{(st)}}{\frac{1}{2} n (n - 1)}.$$
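Here is a minimal sketch of that calculation, solving Kirchhoff's equations via the Laplacian pseudoinverse (the toy graph is arbitrary; for real use, networkx ships nx.current_flow_betweenness_centrality):

```python
# Current-flow betweenness from first principles: solve L V = b for the
# node potentials, then average the current through each node over all
# source/target pairs not involving the node itself (a common convention).
import networkx as nx
import numpy as np

G = nx.krackhardt_kite_graph()
n = G.number_of_nodes()
A = nx.to_numpy_array(G)          # unit-resistance edges
L = np.diag(A.sum(axis=1)) - A    # graph Laplacian
Linv = np.linalg.pinv(L)          # pseudoinverse (L itself is singular)

def current_through(i, s, t):
    """Current through node i for a unit source at s and sink at t."""
    b = np.zeros(n)
    b[s], b[t] = 1.0, -1.0
    V = Linv @ b                  # node potentials
    return 0.5 * np.sum(A[i] * np.abs(V[i] - V))

def current_flow_betweenness(i):
    pairs = [(s, t) for s in range(n) for t in range(s + 1, n)
             if i not in (s, t)]
    return sum(current_through(i, s, t) for s, t in pairs) / len(pairs)

print({v: round(current_flow_betweenness(v), 3) for v in range(n)})
```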
While the metrics thus far are fairly intuitive in terms of what they say about a node, the eigenvector centrality is best understood in terms of the network as a whole. Let us define the adjacency matrix $A$, with entries $A_{ij} = 1$ if nodes $i$ and $j$ are connected and $A_{ij} = 0$ otherwise. Each node's centrality $x_i$ is then defined self-consistently as proportional to the sum of its neighbours' centralities:

$$x_i = \frac{1}{\lambda} \sum_j A_{ij} x_j, \quad \text{i.e.} \quad A x = \lambda x,$$

where $\lambda$ is the largest eigenvalue of $A$ and $x$ is the corresponding eigenvector. Intuitively, an eigenvector of $A$ assigns each node a score that is a fixed multiple of the sum of its neighbours' scores; choosing the leading eigenvector guarantees (by the Perron-Frobenius theorem) that all scores are non-negative, so the components of $x$ can be read directly as centralities.
This is an important metric upon which many famous algorithms, including Google's PageRank, are based.
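A minimal power-iteration sketch (the example graph is again arbitrary): repeatedly multiplying a positive vector by $A$ converges to the leading eigenvector, which is exactly the self-consistent score described above.

```python
# Eigenvector centrality by power iteration on the adjacency matrix.
import networkx as nx
import numpy as np

G = nx.krackhardt_kite_graph()
A = nx.to_numpy_array(G)

x = np.ones(A.shape[0])
for _ in range(200):
    x = A @ x                  # each score <- sum of neighbours' scores
    x /= np.linalg.norm(x)     # rescale so the iteration stays bounded

# Should agree (up to normalization) with nx.eigenvector_centrality_numpy(G)
print(dict(zip(G.nodes, x.round(3))))
```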
Like eigenvector centrality, PageRank highlights nodes that are connected to other high-ranking nodes. In the context of websites, this means it measures links to a page (node), weighted by the ranking of the page that links to it. While PageRank is designed for directed graphs, we can handle undirected graphs by treating each edge as a bidirectional link.
Formally, consider a network with $N$ nodes. In the basic algorithm, each node's rank is divided evenly among the nodes it links to:

$$PR(i) = \sum_{j \in B_i} \frac{PR(j)}{L_j},$$

where $B_i$ is the set of nodes that link to node $i$ and $L_j$ is the number of outgoing links from node $j$.
However, there is an issue with this definition. Consider two nodes that link only to each other, but are linked to by an outside source. Each iteration, some rank gets transferred from the outside to these two nodes, but nothing is transferred out. This is a rank sink that will, every iteration, drain rank from the rest of the system. Any such closed loop will cause issues with this definition of PageRank, leading to a meaningless ranking.
The issue above is solved by the introduction of a damping factor $d$, described below.
Now clearly, the basic algorithm captures an important aspect of this: highly linked sites are ranked higher, with links from highly ranked sites weighted more than those from lower-ranked sites. However, there are some mathematical idiosyncrasies that don't necessarily align with the actual goal of the metric in practice. Rank sinks (and sources), for example, can lead to non-meaningful ranks due to the directed nature of the graph.
This is where the damping factor $d$ comes in:

$$PR(i) = \frac{1 - d}{N} + d \sum_{j \in B_i} \frac{PR(j)}{L_j}.$$
This gives each node a baseline rank based on the total number of nodes, which is added to the value defined in the basic algorithm section above (scaled by the damping factor, so the total of all ranks still sums to 1). The damping factor is usually set to $d = 0.85$.
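A minimal sketch of the damped iteration on a hypothetical four-page link graph (the links here are made up for illustration):

```python
# Damped PageRank by power iteration. Every node here has at least one
# outgoing link, so no dangling-node correction is needed.
import numpy as np

N = 4
links = {0: [1], 1: [2], 2: [0, 1], 3: [2]}  # j -> pages that j links to
d = 0.85

pr = np.full(N, 1.0 / N)
for _ in range(100):
    new = np.full(N, (1 - d) / N)        # baseline rank for every node
    for j, targets in links.items():
        for i in targets:
            new[i] += d * pr[j] / len(targets)
    pr = new

print(pr, pr.sum())  # the ranks still sum to 1
```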
All of the above measures quantify a single node's contribution to the overall network. Recently, there has been a realization that this is sometimes insufficient. Nodes identified by the traditional metrics have a relatively large impact on their own, but these metrics fail to capture impact in groups. For example, consider the difference between a power network that loses a single node and one that undergoes simultaneous failure at multiple nodes. Traditional metrics are designed for the first case and can completely miss that a group of nodes is absolutely critical if the network can still function without any one member of the group.
An approach to tackle this problem is by adapting Shapley values to networks. The premise is that each node's value is determined by its marginal contributions to all possible coalitions of nodes. To compute Shapley values in network systems, we need to first define some sort of coalitional game. This means defining a value function $\nu(C)$ that assigns a worth to every coalition (subset) of nodes $C \subseteq V$. The Shapley value of node $i$ is then its marginal contribution, averaged over all orderings in which the full coalition can be built up:

$$\phi_i = \frac{1}{|V|!} \sum_{\pi} \left[ \nu(C_\pi(i) \cup \{i\}) - \nu(C_\pi(i)) \right],$$

where $C_\pi(i)$ is the set of nodes preceding $i$ in the ordering $\pi$.
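As a reference point, here is a brute-force sketch of that definition: average each node's marginal contribution over all orderings. It is exponential in the number of nodes, so it is only useful for checking the closed-form results below on tiny graphs.

```python
# Exact Shapley values for an arbitrary coalitional game nu, by averaging
# marginal contributions over all orderings. Only feasible for small n.
from itertools import permutations

def shapley(nodes, nu):
    phi = {v: 0.0 for v in nodes}
    orderings = list(permutations(nodes))
    for order in orderings:
        coalition = set()
        for v in order:
            phi[v] += nu(coalition | {v}) - nu(coalition)
            coalition.add(v)
    return {v: phi[v] / len(orderings) for v in phi}
```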
This is analogous to degree centrality, and was originally proposed by Suri and Narahari in the context of determining the top-k nodes in social networks. For a graph $G(V, E)$, define the fringe of a coalition $C$ as the set of all nodes in $C$ or directly connected to at least one member of $C$.
The game is characterized by the value function $\nu_1(C) = |\text{fringe}(C)|$, the number of nodes in the fringe.
There is a simple, exact formula for computing this Shapley value without having to iterate through all possible coalitions (see Michalak et al. for the proof):

$$\phi_i = \sum_{v \in \{i\} \cup N(i)} \frac{1}{1 + \deg(v)},$$

where $N(i)$ is the set of neighbours of node $i$ and $\deg(v)$ is the degree of node $v$.
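A sketch of this closed form, checked against the brute-force shapley helper above on a small path graph (the graph choice is arbitrary):

```python
# Game 1: nu_1(C) = |fringe(C)|, and its closed-form Shapley value.
import networkx as nx

def game1_value(G):
    def nu(C):
        fringe = set(C)
        for v in C:
            fringe.update(G.neighbors(v))
        return len(fringe)
    return nu

def game1_shapley(G):
    return {v: sum(1.0 / (1 + G.degree(u)) for u in [v, *G.neighbors(v)])
            for v in G.nodes}

G = nx.path_graph(5)
print(game1_shapley(G))                        # closed form
print(shapley(list(G.nodes), game1_value(G)))  # brute force; should match
```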
This is a generalized version of the last game. Instead of the fringe containing all nodes with a connection to $C$, a node now only joins the fringe if it is in $C$ or has at least $k$ neighbours in $C$.
Given the extra condition, this calculation seems a bit more daunting: there are many conditionals based on the degree of a node and its existing connections. However, it can still be reduced to a simple algorithm.
Algorithm for Game 2
Input: Unweighted graph G(V, E), positive integer k
Output: Shapley value of all nodes in V for game 2
for each node v in V:
    SV[v] = min(1, k / (1 + deg(v)))
    for each neighbour u of v:
        SV[v] += max(0, (deg(u) - k + 1) / (deg(u) * (1 + deg(u))))
return SV
A quick test for this is to verify that, with $k = 1$, the algorithm reproduces the game 1 values (both the min and max terms reduce to $1 / (1 + \deg)$).
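A sketch of the algorithm in code, including the $k = 1$ check (graph choice again arbitrary):

```python
# Game 2 closed form: a node outside C joins the fringe only if it has
# at least k neighbours inside C.
import networkx as nx

def game2_shapley(G, k):
    sv = {}
    for v in G.nodes:
        total = min(1.0, k / (1 + G.degree(v)))
        for u in G.neighbors(v):
            du = G.degree(u)
            total += max(0.0, (du - k + 1) / (du * (1 + du)))
        sv[v] = total
    return sv

G = nx.path_graph(5)
print(game2_shapley(G, k=2))
print(game2_shapley(G, k=1))  # should reproduce the game 1 values
```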
The third approach is one that works with weighted networks. Instead of simply looking at direct connections, we introduce a cutoff distance $d_{\text{cutoff}}$: every node within this (weighted, shortest-path) distance of a coalition counts towards its value. The extended degree is then

$$\deg(v_i, d_{\text{cutoff}}) = |\{v_j : \text{dist}(v_i, v_j) \le d_{\text{cutoff}}\}|,$$

the number of nodes within the cutoff distance of $v_i$, and the game 1 formula carries over with neighbourhoods replaced by these extended neighbourhoods. This allows for a "smarter" filtering of connections based on some weighted distance from a coalition, rather than requiring a direct link.
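A sketch under the assumptions above: extended neighbourhoods come from Dijkstra distances with a cutoff, and the game 1 formula is applied to them. The edge weights here are invented for illustration.

```python
# Game 3 sketch: "neighbours" are all nodes within shortest-path distance
# d_cutoff on a weighted graph; the game-1 closed form then carries over.
import networkx as nx

def extended_neighbours(G, v, d_cutoff):
    dist = nx.single_source_dijkstra_path_length(G, v, cutoff=d_cutoff)
    return {u for u in dist if u != v}

def game3_shapley(G, d_cutoff):
    ext = {v: extended_neighbours(G, v, d_cutoff) for v in G.nodes}
    return {v: sum(1.0 / (1 + len(ext[u])) for u in {v} | ext[v])
            for v in G.nodes}

G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 0.5), (1, 2, 0.4), (2, 3, 1.2), (0, 3, 2.0)])
print(game3_shapley(G, d_cutoff=1.0))
```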