---
tags: C++, Python
---
# Multithreaded parallel computation illustrated using C++ and Python
author: [`ngokchaoho`](https://github.com/ngokchaoho)
TL;DR: **I illustrate that multiple threads can actually complete CPU-intensive jobs in parallel**.
My understanding of parallelism is as follows.
>Quoting Sun's Multithreaded Programming Guide:
>
>Concurrency: A condition that exists when at least two threads are making progress. A more generalized form of parallelism that can include time-slicing as a form of virtual parallelism.
>
>Parallelism: A condition that arises when at least two threads are executing simultaneously.
I started out doing parallel/concurrent computing in Python, and I had always been under the impression that [threading](https://docs.python.org/3/library/threading.html) is for I/O-intensive jobs and [multiprocessing](https://docs.python.org/3/library/multiprocessing.html#:~:text=multiprocessing%20is%20a%20package%20that,using%20subprocesses%20instead%20of%20threads.) is for CPU-intensive jobs.
One source of this impression is the article [Speed Up Your Python Program With Concurrency](https://realpython.com/python-concurrency/), which introduced `threading` and `asyncio` to me for I/O tasks; and since Python has the [GIL](https://en.wikipedia.org/wiki/Global_interpreter_lock), it looks as if threads can only run concurrently, not in parallel.
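As a side note, the reason threading still helps for I/O is that a thread releases the GIL while it is blocked waiting. Here is a tiny sketch of my own (not from that article; `fake_io_task` and the thread count of 4 are just placeholders I picked), using `time.sleep` as a stand-in for a blocking I/O call: the four one-second waits overlap and finish in roughly one second total.
```python=
import threading
import time

def fake_io_task(i):
    # time.sleep stands in for a blocking I/O call (e.g. a network request);
    # blocking calls release the GIL, so the waits overlap across threads.
    time.sleep(1)
    print("task", i, "done")

start = time.time()
threads = [threading.Thread(target=fake_io_task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("4 one-second waits finished in about %.1f s" % (time.time() - start))
```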
But after reading this article (in Chinese), [Process, thread and file descriptor](https://mp.weixin.qq.com/s/USb5e2Zoc0LRgRShRpTYfg), I learned that threads and processes in Linux are almost the same thing, except that threads share the same **virtual memory** while processes do not.
Both are represented by a [task_struct](https://docs.huihoo.com/doxygen/linux/kernel/3.7/structtask__struct.html) in kernel space.
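The shared-virtual-memory part is easy to observe from user space. Here is a small sketch I wrote for myself (not from that article): a thread mutates a module-level list and the change is visible to the main thread afterwards, while a child process only modifies its own copy of the address space, so the parent never sees the change.
```python=
import threading
import multiprocessing

shared = [0]

def bump():
    shared[0] += 1  # mutate the module-level list in place

if __name__ == "__main__":
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    print("after thread: ", shared[0])   # 1: the thread wrote into the same virtual memory

    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print("after process:", shared[0])   # still 1: the child only changed its own copy
```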
Hence, I think that if this is the case, we should be able to compute simultaneously using threads as long as there is no GIL: **we can execute as many jobs as the number of cores we have, at the same time!** (With hyperthreading we should be able to execute even more jobs at the same time per core, but I haven't explored that part yet.)
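As a quick sanity check of how many that is on a given machine, the snippet below is just a convenience and not part of the benchmark; note that these counts are logical CPUs, so hyperthreading inflates them relative to physical cores, `os.sched_getaffinity` is Linux-only, and C++ offers `std::thread::hardware_concurrency()` for the same number.
```python=
import os

print("logical CPUs visible to the OS:", os.cpu_count())
# On Linux, the set of CPUs this process is actually allowed to run on:
print("CPUs usable by this process:", len(os.sched_getaffinity(0)))
```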
Now, with the following C++ code on a **6-core CPU**, we can see that when sorting n vectors, n=1 and n=6 cost almost the same time, while n=7 takes additional time on the scale of one job (n=1). This is evidence that multiple threads can execute at the same time too.
Below I used a real sorting task instead of `sleep`, because the modern OS scheduler seems to move a sleeping task to another queue and not schedule any running time for it until the sleep time has passed; that makes it different from a real CPU-intensive task, for which the scheduler does have to allocate running time.
# C++ multithreading
```c++=
#include <chrono>
#include <thread>
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>   // rand()
#include <sched.h>   // sched_getcpu() (Linux-specific)

// Sort the i-th vector; `data` is passed by value, so each thread sorts its own copy.
void hello(std::vector<std::vector<int>> data, int i)
{
    std::sort(data[i].begin(), data[i].end(), std::greater<int>());
    std::cout << "Hello from thread " << std::this_thread::get_id()
              << "using core" << sched_getcpu() << std::endl;
}

int main()
{
    int n = 7;              // number of vectors / threads
    int default_value = 1;
    int length = 10000000;

    std::vector<int> v(length, default_value);
    std::vector<std::vector<int>> data(n, v);   // n vectors, each pre-filled with `length` ones
    for (int i = 0; i < n; i++)
        for (int x = 0; x < length; x++)
        {
            data[i].push_back(rand());          // append `length` random values on top
        }

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
    {
        threads.push_back(std::thread(hello, data, i));
    }
    for (auto& thread : threads)
    {
        thread.join();
    }
    return 0;
}
```
## Result:
```
n=1
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_parallel_test
Hello from thread 140446305122048using core2
real 0m7.121s
user 0m7.069s
sys 0m0.050s
```
```
n=6
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_parallel_test
Hello from thread 140494024288000using core1
Hello from thread 140493535876864using core3
Hello from thread 140493047465728using core2
Hello from thread 140492559054592using core5
Hello from thread 140492070643456using core4
Hello from thread 140491582232320using core0
real 0m9.997s
user 0m46.054s
sys 0m1.148s
```
```
n=7
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_parallel_test
Hello from thread 139841105495808using core3
Hello from thread 139841097103104using core2
Hello from thread 139841113888512using core5
Hello from thread 139837953767168using core4
Hello from thread 139841088710400using core1
Hello from thread 139837385352960using core0
Hello from thread 139840353859328using core5
real 0m19.097s
user 0m53.380s
sys 0m2.939s
```
# Python multithreading
Meanwhile, due to the GIL, Python threads really do execute one at a time:
```python=
#! /usr/bin/python3
# imports
import threading
import time
import random

n = 6
length = 10000000

# Generate n lists of `length` random integers each.
data = [[] for i in range(n)]
for j in range(n):
    data[j] = [random.randint(0, 1000) for i in range(length)]

def worker(i):
    print("worker ", i)
    data[i].sort()   # CPU-bound work; the GIL keeps the threads from running in parallel

# Note: as written this starts 5 threads, so only data[0]..data[4] get sorted.
threads = []
for i in range(5):
    thread = threading.Thread(target=worker, args=(i,))
    threads.append(thread)
    thread.start()
for thread in threads:  # iterates over the threads
    thread.join()       # waits until the thread has finished work
```
## Result:
```
ngokchaoho@DESKTOP-S73LMN4:/sys/kernel/debug$ time /home/ngokchaoho/c++_thread_test/thread_test.py
worker 0
worker 1
worker 2
worker 3
worker 4
real 0m44.225s
user 0m43.447s
sys 0m0.690s
```
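For completeness, and to tie back to the usual "multiprocessing is for CPU-intensive jobs" advice at the top of this post, here is a sketch of mine (not benchmarked on the machine above) of the same workload using `multiprocessing.Pool`: each worker process has its own interpreter and its own GIL, so the sorts should run in parallel across cores, at the cost of pickling each list into the worker.
```python=
#! /usr/bin/python3
# Sketch: the same CPU-bound workload with processes instead of threads.
import multiprocessing
import random

n = 6
length = 10000000

def worker(chunk):
    chunk.sort()      # sorts the copy that was sent to this process
    return chunk[0]   # return something small instead of the whole sorted list

if __name__ == "__main__":
    data = [[random.randint(0, 1000) for _ in range(length)] for _ in range(n)]
    with multiprocessing.Pool(processes=n) as pool:
        pool.map(worker, data)   # one list per worker process
```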