[Paper] Graphchi: Large-scale graph computation on just a PC
PPT slides
Google Slides
Graph processing problems
- Social networks
- Web graphs
- Protein interaction graphs
Why graph processing is hard ?
- Hard to divide and conquer this problem
- Partitioning the graph across cluster nodes
- Finding efficient graph cuts that minimize communication between nodes, and are also balanced, is hard [27]
- [27] Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters
- Indeed, many real-world graphs have a substantial amount of inherent locality. For example, webpages are clustered under domains, and people have more connections in social networks inside their geographical region than outside it.
- Unfortunately, the locality of real-world graphs is limited, because the number of edges crossing local clusters is also large.
- Graph may be dynamic evolving.
- CANNOT be readily decomposed into smaller parts that can be processed in parallel
- MapReduce is inefficient for graph processing [13,30,31]
Vertex-centric model definition
- Directed sparse graph
- Associate a value with each vertex , and each edge
- Given an edge , is 's out-edge, 's in-edge
-
- Example: PageRank Algorithm
Bulk-Synchronous Parallel (BSP) model
- From "A Bridging Model for Parallel Computation" Communication of ACM, 1990
- BSP is a bridge between parallel computation and hardware implementation
- Goal: Mapping high-level programs to actual machines in a great variety of contexts, little efficiency is lost if we utilize this single model.
- Detail:
- A computation consists of a sequence of supersteps. In each superstep, each component is allocated a task consisting of some combination of local computation steps, and message passing steps.
- After each period of L time units, a global check is made to determine whether the superstep has been completed by all the components.
- If yes, the machine proceeds, if no, the next L units is allocated to the unfinished superstep.
Sparse graph encoding
Example graph
Adjacency matrix
Transpose of adjacency matrix
CSR (Compressed Sparse Row)
- Encoding (assume zero-index array)
- W = {10, 20, 30, 40, 50, 60, 70, 80}
- COL_INDEX = {0, 1, 1, 3, 2, 3, 4, 5}
- ROW_INDEX = {0, 2, 4, 7, 8}
- Find the out-edge of vertex (row 1 of matrix M)
- ROW_INDEX[1:2] = {2, 4}
- W[2:4] = {30, 40}
- COL_INDEX[2:4] = {1, 3}
- Find the in-edge of vertex (col 1 of adj matrix)
- Fast loading of out-edges of a vertex from the disk
CSC (Compressed Sparse Column)
- Encoding (assume zero-index array)
- W = {10, 20, 30, 40, 50, 60, 70, 80}
- ROW_INDEX = {0, 1, 1, 3, 2, 3, 4, 5}
- COL_INDEX = {0, 2, 4, 7, 8}
- Find the in-edge of vertex (col 1 of matrix )
- COL_INDEX[1:2] = {2, 4}
- W[2:4] = {30, 40}
- ROW_INDEX[2:4] = {1, 3}
- Find the out-edge of vertex (row 1 of matrix )
- Fast loading of in-edges of a vertex from the disk
Comparison
- CSR format
- Adv: row access, out-edge of vertex
- CSC format
- Adv: col access, in-edge of vertex
Disadvantage for sparse encoding
- Random memory access is bad for disk
- Store graph in BOTH CSR and CSC format for bidirectional access
Graph encoding in GraphChi
- The following graph can be encoded in two different format in GraphChi
-
The adjacency shard stores, implicitly, an edge array for each vertex, in order.
- Edge array of a vertex starts with a variable-sized length word, followed by the list of neighbors. If a vertex has no edges in this shard, zero length byte is followed by the number of subsequent vertices with no edges in this shard.
-
-
The edge data shard is a flat array of edge values, in user-defined type. Values must be of constant size.
PSW (Parallel Sliding Windows)
- PSW is an algorithm proposed in this paper to execute any vertex-centric graph processing algorithm
- PSW can process a graph with mutable edge values efficiently from disk, with only a small number of non-sequential disk accesses, while supporting the asynchronous model of computation
What is "intervals" ?
- Vertices are split into disjoint intervals
- Each vertex in graph can ONLY belongs to one interval.
- Intervals are chosen to balance the number of edges in each shard.
- Policy: The number of intervals is chosen so that any one shard can be loaded completely into memory
What is "shards" ?
- For each interval, we associate a shard. Shard stores ALL the edges that have destination in that interval
- Edges are stored in the order of their source
- That's the reason why memory access pattern "slides" instead of random access
- Memory shard (Brown) is fully loaded into memory
- Sliding shard (Boxes) is a set of edges whose source vertices belongs to the interval

Parallel Updates
- Run user-defined update-function(v) for each vertex in parallel
- External determinism to gurantee that each execution produces the same result
- Vertices that have edges with both end-points in the same interval are flagged as critical. Will be updated sequentially
- Non-critical vertices can be safely updated in parallel
- Each update function will compute results based on the previous results. Even for cirtical edges. This adheres to the asynchronous model
Example
Loading the graph
- Graph is divided into two intervals (disjoint)
- Interval 1: vertex 1, 2
- Interval 2: vertex 3, 4
- Interval 3: vertex 5, 6
- Only one interval will be fully loaded into memory
Observation
- Shard is grouped by destination vertex
- Each shard is sorted by source vertex
Parallel update
- Critical vertices
- Processed in sequential
- Vertex 1, 2, 3, 4, 5, 6
- Non-critical vertices
- This section teach you how to run "pagerank" example app in GraphChi
Env setup
- Download GraphChi
git clone https://github.com/GraphChi/graphchi-cpp.git
- Build GraphChi
cd graphchi-cpp
make -j 12
where 12 is the number of cores on your CPU
- Download Google web graph from here
wget http://snap.stanford.edu/data/web-Google.txt.gz
- The downloaded file
web-Google.txt
contains the following entries
-
Run GraphChi on Google web graph
- Run the page rank algorithm for 100 iterations
./bin/example_apps/pagerank file web-Google.txt niters 100 filetype edgelist
- Notice the
web-Google.txt
file stores a graph in the form of edgelist
Results
- The most popular vertices determined by the page rank algorithm will be shown
Other
value += vertex.inedge(i).get_data();
SEL w from T where dst == i
- Normal DRAM: