---
tags: C++, Python
---

# Multithreaded parallel computation illustrated using C++ and Python

author: [`ngokchaoho`](https://github.com/ngokchaoho)

TL;DR: **I illustrate that multiple threads can actually complete CPU-intensive jobs in parallel.**

My understanding of parallelism is as below.

> Quoting Sun's Multithreaded Programming Guide:
>
> Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.
>
> Parallelism: A condition that arises when at least two threads are executing simultaneously.

I started parallel/concurrent computing in Python, and I had always been under the impression that [threading](https://docs.python.org/3/library/threading.html) is for I/O-intensive jobs and [multiprocessing](https://docs.python.org/3/library/multiprocessing.html#:~:text=multiprocessing%20is%20a%20package%20that,using%20subprocesses%20instead%20of%20threads.) is for CPU-intensive jobs. One source of this impression is the article [Speed Up Your Python Program With Concurrency](https://realpython.com/python-concurrency/), which introduced `threading` and `asyncio` to me for I/O tasks; and since Python has the [GIL](https://en.wikipedia.org/wiki/Global_interpreter_lock), it looks as if threads can only run concurrently, not in parallel.

But after reading this article (which is in Chinese), [Process, thread and file descriptor](https://mp.weixin.qq.com/s/USb5e2Zoc0LRgRShRpTYfg), I was enlightened: threads and processes in Linux are almost the same thing, except that threads share the same **virtual memory** while processes do not. Both are represented by a [task_struct](https://docs.huihoo.com/doxygen/linux/kernel/3.7/structtask__struct.html) in kernel space. Hence, if that is the case, we should be able to compute simultaneously using threads when there is no GIL: **we can execute as many jobs at the same time as the number of cores we have!** (With hyper-threading we should be able to execute more than one job per core at the same time, but I haven't explored that part yet.)

Now, with the following C++ code on a **6-core CPU**, we can see that when sorting n vectors, n=1 and n=6 cost almost the same time, while n=7 takes additional time on the scale of one job (n=1). This is evidence that multiple threads can be executed at the same time too.
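As a quick sanity check of the core count the argument relies on, here is a minimal sketch (my addition, not part of the timed programs below) that asks the C++ standard library how many hardware threads it sees, assuming C++11 or later:

```c++=
#include <iostream>
#include <thread>

int main() {
    // hardware_concurrency() returns a hint of how many hardware threads
    // are available (it may return 0 if the value cannot be determined).
    std::cout << "hardware threads: "
              << std::thread::hardware_concurrency() << std::endl;
    return 0;
}
```

On the 6-core machine used for the measurements below, this should report at least 6 (more if hyper-threading exposes additional hardware threads).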
Below I used a real sorting task instead of `sleep`, because the modern OS scheduler seems to move a sleeping task to another queue and not schedule any running time for it until the sleep time has passed; this is different from a real CPU-intensive task, for which the scheduler does have to schedule running time.

# C++ multithreads

```c++=
#include <chrono>
#include <thread>
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>   // rand
#include <sched.h>   // sched_getcpu

// Sort one of the vectors (each thread gets its own copy of `data`, since it
// is passed by value) and report which core the thread ran on.
void hello(std::vector<std::vector<int>> data, int i) {
    std::sort(data[i].begin(), data[i].end(), std::greater<int>());
    std::cout << "Hello from thread " << std::this_thread::get_id()
              << "using core" << sched_getcpu() << std::endl;
}

int main() {
    int n = 7;                 // number of vectors / threads
    int default_value = 1;
    int length = 10000000;
    std::vector<int> v(length, default_value);
    std::vector<std::vector<int>> data(n, v);

    // Append `length` random integers to every vector.
    for (int i = 0; i < n; i++)
        for (int x = 0; x < length; x++) {
            data[i].push_back(rand());
        }

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.push_back(std::thread(hello, data, i));
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return 0;
}
```

## Result:

```
n=1
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_parallel_test
Hello from thread 140446305122048using core2

real    0m7.121s
user    0m7.069s
sys     0m0.050s
```

```
n=6
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_parallel_test
Hello from thread 140494024288000using core1
Hello from thread 140493535876864using core3
Hello from thread 140493047465728using core2
Hello from thread 140492559054592using core5
Hello from thread 140492070643456using core4
Hello from thread 140491582232320using core0

real    0m9.997s
user    0m46.054s
sys     0m1.148s
```

```
n=7
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_parallel_test
Hello from thread 139841105495808using core3
Hello from thread 139841097103104using core2
Hello from thread 139841113888512using core5
Hello from thread 139837953767168using core4
Hello from thread 139841088710400using core1
Hello from thread 139837385352960using core0
Hello from thread 139840353859328using core5

real    0m19.097s
user    0m53.380s
sys     0m2.939s
```

# Python multithread

Whereas, due to the GIL, Python threads really execute one by one:

```python=
#! /usr/bin/python3

# imports
import threading
import time
import random

n = 6
length = 10000000

# Prepare n lists of random integers.
data = [[] for i in range(n)]
for j in range(n):
    data[j] = [random.randint(0, 1000) for i in range(length)]

def worker(i):
    print("worker ", i)
    data[i].sort()

threads = []
# Note: only 5 of the n lists are sorted here, matching the output below.
for i in range(5):
    thread = threading.Thread(target=worker, args=(i,))
    threads.append(thread)
    thread.start()

for thread in threads:  # iterate over the threads
    thread.join()       # wait until the thread has finished its work
```

## Result:

```
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_test.py
worker 0
worker 1
worker 2
worker 3
worker 4

real    0m44.225s
user    0m43.447s
sys     0m0.690s
```
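As a closing aside on the earlier remark about why `sleep` was not used as the workload: here is a minimal C++ sketch (my addition, not one of the measured programs) showing that sleeping threads all finish after roughly the sleep duration regardless of how many are launched, because a blocked task is given no CPU time at all until its timer expires, so `sleep` cannot distinguish real parallelism from mere concurrency.

```c++=
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

// Each "job" just sleeps; it consumes no CPU while blocked.
void sleepy_job(int seconds) {
    std::this_thread::sleep_for(std::chrono::seconds(seconds));
}

int main() {
    int n = 7;  // even with more threads than cores...
    std::vector<std::thread> threads;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i) {
        threads.push_back(std::thread(sleepy_job, 5));
    }
    for (auto& t : threads) {
        t.join();
    }
    auto end = std::chrono::steady_clock::now();
    // ...wall time stays close to 5 s, and `time` would report near-zero user time.
    std::cout << "wall time: "
              << std::chrono::duration_cast<std::chrono::seconds>(end - start).count()
              << " s" << std::endl;
    return 0;
}
```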