# Improving performance of Merkle Sum Tree

Benchmark before improving, using a 16-level Merkle Sum Tree:

- CSV parsing
- Leaf hashing
- Level hashing

![](https://hackmd.io/_uploads/H1E0U2sE2.png)

The area to improve is definitely the hashing. It can be sped up by running the hashing within each level of the tree in parallel.

For example, let's consider a Merkle Sum Tree with 8 entries. The current program performs the hashing of the entries to build the leaves sequentially. There is actually no reason for it to run sequentially: each hash is independent of the previous one and the following one. Therefore we can gain efficiency by parallelizing the hashing.

# Improving MST - Leaf Hashing

**current implementation**

```rust
// Compute the leaves
for (i, entry) in entries.iter().enumerate() {
    tree[0][i] = entry.compute_leaf();
}
```

**efficient implementation**

In order to parallelize the hashing of the entries we can use the [`std::thread` module](https://doc.rust-lang.org/std/thread/). It's important to remember that the number of threads a machine can run effectively depends on its number of cores.

"Spawning a thread" is the process of instructing the operating system to create a new thread of execution. Each new thread executes independently and concurrently with any other spawned thread.

![](https://hackmd.io/_uploads/BJt33ToV2.png)

```rust
let mut handles = vec![]; // vector that will contain the `JoinHandle` objects for each spawned thread
let chunk_size = (entries.len() + num_cpus::get() - 1) / num_cpus::get(); // size of each chunk of `entries`, based on the number of threads that can run (ceiling division)

for chunk in entries.chunks(chunk_size) {
    // `chunks` returns an iterator that yields slices of the original vector, each of size `chunk_size`
    let chunk = chunk.to_vec();
    // spawn a new thread that will perform all the hashing for one chunk
    handles.push(thread::spawn(move || {
        // HERE IS WHERE THE PARALLELIZATION HAPPENS:
        // each spawned thread hashes its own chunk of entries
        chunk
            .into_iter()
            .map(|entry| entry.compute_leaf())
            .collect::<Vec<_>>()
    }));
}

// Now all the threads have been spawned.
// We iterate over the computed hashes (leaves) and store them in the tree vector
let mut index = 0;
for handle in handles {
    let result = handle.join().unwrap(); // `join` waits for the thread to finish and returns its result
    for leaf in result {
        tree[0][index] = leaf;
        index += 1;
    }
}
```

# Improving MST - Middle Nodes Hashing

**current implementation**

```rust
// Compute the inner nodes
for level in 1..=depth {
    let nodes_in_level = (n + (1 << level) - 1) / (1 << level);
    for i in 0..nodes_in_level {
        tree[level][i] =
            create_middle_node(&tree[level - 1][2 * i], &tree[level - 1][2 * i + 1]);
    }
}
```

Similarly as before, we can perform the computation of a level of middle nodes leveraging the parallelization given by `thread` in this way:

```rust
// Compute the inner nodes
for level in 1..=depth {
    let nodes_in_level = (n + (1 << level) - 1) / (1 << level); // number of nodes at this level
    let mut handles = vec![];
    let chunk_size = (nodes_in_level + num_cpus::get() - 1) / num_cpus::get(); // divide the number of nodes by the number of CPU cores to get the number of nodes each thread should process

    // Iterate over chunks of twice the chunk size (since each middle node depends on
    // two nodes from the previous level) and spawn a new thread for each chunk
    for chunk in tree[level - 1].chunks(chunk_size * 2) {
        let chunk = chunk.to_vec();
        handles.push(thread::spawn(move || {
            // Each thread iterates over pairs of nodes in its chunk and creates a middle node for each pair
            chunk
                .chunks(2)
                .map(|pair| create_middle_node(&pair[0], &pair[1]))
                .collect::<Vec<_>>()
        }));
    }

    // Iterate over the thread handles, `join` each one to get the computed middle nodes, and store them in the tree
    let mut index = 0;
    for handle in handles {
        let result = handle.join().unwrap();
        for node in result {
            tree[level][index] = node;
            index += 1;
        }
    }
}
```
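To see the leaf-hashing pattern end to end, here is a minimal, self-contained sketch of the chunk/spawn/join approach. The `compute_leaf` function below is a stand-in for the real leaf hash (the actual `Entry::compute_leaf` is not shown in this post), and the thread count is passed as a parameter instead of calling `num_cpus::get()`:

```rust
use std::thread;

// Stand-in for `entry.compute_leaf()`: any pure, per-element function works here.
fn compute_leaf(entry: u64) -> u64 {
    entry.wrapping_mul(0x9E3779B97F4A7C15) // cheap hash-like mixing, for illustration only
}

fn parallel_leaves(entries: &[u64], num_threads: usize) -> Vec<u64> {
    // ceiling division, so every entry is covered even when the entry count
    // is not an exact multiple of the thread count
    let chunk_size = (entries.len() + num_threads - 1) / num_threads;
    let mut handles = vec![];
    for chunk in entries.chunks(chunk_size) {
        let chunk = chunk.to_vec(); // clone so the thread owns its data (`'static` requirement of `thread::spawn`)
        handles.push(thread::spawn(move || {
            chunk.into_iter().map(compute_leaf).collect::<Vec<_>>()
        }));
    }
    // join in spawn order, so the leaves come back in their original order
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}

fn main() {
    let entries: Vec<u64> = (0..8).collect();
    let parallel = parallel_leaves(&entries, 4);
    let sequential: Vec<u64> = entries.iter().map(|&e| compute_leaf(e)).collect();
    assert_eq!(parallel, sequential);
    println!("parallel result matches sequential result");
}
```

Because the handles are joined in the same order the chunks were spawned, the output order matches the sequential version exactly, which is what lets the caller write the leaves back into `tree[0]` by a running index.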
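One cost of `thread::spawn` is the `chunk.to_vec()` clone: spawned threads require `'static` data, so each chunk must be copied. Since Rust 1.63, `std::thread::scope` lets threads borrow the slice directly, avoiding the clones. This is an alternative sketch, not part of the original implementation, again with a stand-in `compute_leaf`:

```rust
use std::thread;

// Stand-in hash, for illustration only
fn compute_leaf(entry: u64) -> u64 {
    entry.wrapping_mul(0x9E3779B97F4A7C15)
}

fn parallel_leaves_scoped(entries: &[u64], num_threads: usize) -> Vec<u64> {
    let chunk_size = (entries.len() + num_threads - 1) / num_threads;
    thread::scope(|s| {
        // scoped threads may borrow `entries`, so no `to_vec` clone is needed
        let handles: Vec<_> = entries
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|&e| compute_leaf(e)).collect::<Vec<_>>())
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().unwrap())
            .collect()
    })
}

fn main() {
    let entries: Vec<u64> = (0..10).collect();
    let expected: Vec<u64> = entries.iter().map(|&e| compute_leaf(e)).collect();
    assert_eq!(parallel_leaves_scoped(&entries, 4), expected);
    println!("scoped result matches sequential result");
}
```

All scoped threads are guaranteed to finish before `thread::scope` returns, so the borrow of `entries` never outlives the data.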
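The middle-node step can be isolated the same way. This sketch computes one level from the previous one, using a stand-in `create_middle_node` combiner (the real one is not shown in the post) and assuming the previous level has an even number of nodes, as in a tree padded to a power of two; otherwise the last `pair` would have only one element and `pair[1]` would panic:

```rust
use std::thread;

// Stand-in for `create_middle_node`: combines two child values, for illustration only
fn create_middle_node(left: &u64, right: &u64) -> u64 {
    left.rotate_left(17) ^ right.wrapping_mul(0x100000001B3)
}

fn parallel_level(prev_level: &[u64], num_threads: usize) -> Vec<u64> {
    let nodes_in_level = prev_level.len() / 2;
    let chunk_size = (nodes_in_level + num_threads - 1) / num_threads;
    let mut handles = vec![];
    // each chunk holds `chunk_size * 2` children, i.e. `chunk_size` parent nodes
    for chunk in prev_level.chunks(chunk_size * 2) {
        let chunk = chunk.to_vec();
        handles.push(thread::spawn(move || {
            chunk
                .chunks(2)
                .map(|pair| create_middle_node(&pair[0], &pair[1]))
                .collect::<Vec<_>>()
        }));
    }
    handles
        .into_iter()
        .flat_map(|h| h.join().unwrap())
        .collect()
}

fn main() {
    // 16 leaves -> 8 middle nodes (a power-of-two level, as in a padded Merkle tree)
    let leaves: Vec<u64> = (0..16).collect();
    let sequential: Vec<u64> = leaves
        .chunks(2)
        .map(|pair| create_middle_node(&pair[0], &pair[1]))
        .collect();
    assert_eq!(parallel_level(&leaves, 4), sequential);
    println!("parallel level matches sequential level");
}
```

Because each chunk boundary falls on an even child index, no parent node ever straddles two threads, which is why the per-thread results can simply be concatenated in spawn order.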