# Python GIL
* Concurrency: multiple workers can exist at the same time, but only one of them is running at any given moment
* Parallelism: multiple workers exist at the same time and execute simultaneously

Python's threading and AsyncIO embody the idea of concurrency.
Python's multiprocessing embodies the idea of parallelism.
But before introducing these terms, we first need to cover what the GIL is.
## Global Interpreter Lock
The original definition:
In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)
First, a quick look at how Python is executed: your Python source code is compiled into an intermediate format called bytecode, and a function in the Python interpreter named PyEval_EvalFrameEx() then reads in and executes the bytecode instructions one by one.
As the quote says, the CPython interpreter added the GIL to guarantee that only one thread executes bytecode at any one time.
### GIL in I/O and connection tasks
A network I/O task spends long stretches waiting and runs no Python code during that time, so the thread releases the GIL, letting other threads acquire it and get work done.
Suppose two threads each want to connect a socket:
```
import socket
import threading

def do_connect():
    s = socket.socket()
    s.connect(('python.org', 80))  # drop the GIL

for i in range(2):
    t = threading.Thread(target=do_connect)
    t.start()
```
When a thread's socket starts connecting, the thread releases the GIL so that other threads can run.
This is how CPython's socketmodule.c does it:
```
/* s.connect((host, port)) method */
static PyObject *
sock_connect(PySocketSockObject *s, PyObject *addro)
{
    sock_addr_t addrbuf;
    int addrlen;
    int res;

    /* convert (host, port) tuple to C address */
    getsockaddrarg(s, addro, SAS2SA(&addrbuf), &addrlen);

    Py_BEGIN_ALLOW_THREADS
    res = connect(s->sock_fd, SAS2SA(&addrbuf), addrlen);
    Py_END_ALLOW_THREADS

    /* error handling and so on .... */
}
```
Py_BEGIN_ALLOW_THREADS is what releases the GIL, and Py_END_ALLOW_THREADS re-acquires it.
### GIL preemption mechanism
Besides releasing the GIL around I/O, the interpreter also releases it periodically while executing your bytecode, without consulting the currently running thread, so that other threads get a chance to run (presumably to avoid situations where one thread holds the GIL indefinitely):
```
for (;;) {
    if (--ticker < 0) {
        ticker = check_interval;
        /* Give another thread a chance */
        PyThread_release_lock(interpreter_lock);
        /* Other threads may run now */
        PyThread_acquire_lock(interpreter_lock, 1);
    }
    bytecode = *next_instr++;
    switch (bytecode) {
        /* execute the next instruction ... */
    }
}
```
By default check_interval is 1000 bytecodes. All threads run this same loop and have the GIL taken away from them in the same way.
In Python 3 the GIL implementation is more elaborate, and the check interval is no longer a fixed number of bytecodes but a time slice (15 ms in the article quoted above).
For your own code, though, this change makes little practical difference.
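In Python 3 this switch interval is a runtime setting rather than a compile-time constant; it can be inspected and tuned through `sys` (note that in current CPython the default is 0.005 s):

```
import sys

# Current GIL switch interval in seconds (0.005 s by default in CPython 3.x)
print(sys.getswitchinterval())

# Ask the interpreter to hold the GIL longer between forced switches
sys.setswitchinterval(0.01)
print(sys.getswitchinterval())

# Restore the default
sys.setswitchinterval(0.005)
```

Raising the interval reduces switching overhead at the cost of responsiveness for other threads.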
In other words, although the GIL guarantees that only one thread runs bytecode at a time, execution can switch to another thread between any two bytecode instructions, so non-atomic operations on global state have thread-safety problems (the GIL does not guarantee thread safety).
### non-atomic operation
```
import dis

n = 0

def foo():
    global n
    n += 1

dis.dis(foo)
```
bytecodes:
```
LOAD_GLOBAL 0 (n)
LOAD_CONST 1 (1)
INPLACE_ADD
STORE_GLOBAL 0 (n)
```
This shows that if thread A is forced to release the GIL right after its LOAD_GLOBAL, and thread B then runs its INPLACE_ADD and STORE_GLOBAL first, then when execution switches back to thread A the counter ends up incremented only once overall (because A loaded n before B stored it, so A overwrites B's update).
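The interleaving just described can be replayed deterministically by writing out the two threads' LOAD/STORE steps by hand (a simulation of the schedule, not real threads):

```
n = 0

# Thread A: LOAD_GLOBAL reads n == 0, then loses the GIL
a_value = n

# Thread B: runs its whole increment (LOAD, INPLACE_ADD, STORE_GLOBAL)
b_value = n
n = b_value + 1   # n == 1

# Thread A resumes with its stale value and stores it back
n = a_value + 1   # n == 1 again: B's increment is lost

print(n)  # 1, even though two increments ran
```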
### atomic operation
```
import dis

lst = [4, 1, 3, 2]

def foo():
    lst.sort()

dis.dis(foo)
```
bytecodes:
```
LOAD_GLOBAL 0 (lst)
LOAD_ATTR 1 (sort)
CALL_FUNCTION 0
```
Some operations look complex, but at the bytecode level they are a single instruction: the entire sort happens inside the one CALL_FUNCTION, so no other thread can interleave with it.
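Because such single-call operations run entirely inside C code while holding the GIL, they are atomic in CPython. For example, `list.append` never loses elements even when two threads hammer the same list without any lock:

```
import threading

lst = []

def worker():
    for i in range(100_000):
        lst.append(i)  # one CALL into C code: atomic under the GIL

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(lst))  # 200000: no appends were lost
```

Contrast this with the `n += 1` example, which spans four bytecodes and does lose updates.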
## Concurrency
Let's look at the difference between threading and Asyncio:

threading | Asyncio
-----|-----
one task per thread | one thread handles many tasks (cooperative multitasking)
In theory Asyncio beats threading, since it saves the time spent releasing and re-acquiring the GIL; moreover:
Threading in Python allows asynchronicity, but our program could theoretically skip around different threads that may not yet be ready, wasting time if there are threads ready to continue running.
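The cooperative hand-off can be seen without any networking: `await asyncio.sleep()` yields control back to the event loop, so two waits on the same thread overlap (a minimal sketch with made-up task names):

```
import asyncio
import time

async def task(name, delay):
    await asyncio.sleep(delay)  # yields to the event loop while waiting
    return name

async def main():
    # Both tasks wait concurrently on a single thread
    return await asyncio.gather(task("a", 0.2), task("b", 0.2))

t0 = time.time()
result = asyncio.run(main())
elapsed = time.time() - t0
print(result, round(elapsed, 1))  # the two 0.2 s waits overlap instead of adding up
```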
Now let's look at some examples.
### threading
```
import threading
import requests
import time

urls = ["https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg",
        "https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png",
        "https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg",
        "https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png"]
task_list = []

def worker():
    while True:
        try:
            url = urls.pop()
        except IndexError:
            break  # Done.
        requests.get(url)
        print(str(url) + " ....ok")

t0 = time.time()
for _ in range(2):
    t = threading.Thread(target=worker)
    task_list.append(t)
    t.start()
for t in task_list:
    t.join()
print(time.time() - t0)
```
<!--
```
https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png ....ok
https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg ....ok
https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png ....ok
https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg ....ok
0.8333423137664795
``` -->
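The hand-rolled worker loop above is what `concurrent.futures.ThreadPoolExecutor` packages up for you. Here is the same pattern with the download replaced by a `time.sleep` stand-in, so the overlap is visible without touching the network:

```
from concurrent.futures import ThreadPoolExecutor
import time

def fake_download(i):
    time.sleep(0.2)  # stand-in for a blocking requests.get(...)
    return i

t0 = time.time()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_download, range(4)))
elapsed = time.time() - t0

print(results, round(elapsed, 1))  # four 0.2 s "downloads" overlap
```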
### Asyncio
Note that `requests` is a blocking library: calling it inside a coroutine stalls the whole event loop, and a single coroutine popping URLs one after another is not concurrent at all. The version below instead uses the async HTTP client `aiohttp` and runs two workers with `asyncio.gather`:
```
import asyncio
import time

import aiohttp

urls = ["https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg",
        "https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png",
        "https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg",
        "https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png"]

async def worker(session):
    while True:
        try:
            url = urls.pop()
        except IndexError:
            break  # Done.
        async with session.get(url) as resp:
            await resp.read()
        print(str(url) + " ....ok")

async def main():
    async with aiohttp.ClientSession() as session:
        # Two workers share one thread, yielding to each other at every await
        await asyncio.gather(worker(session), worker(session))

t0 = time.time()
asyncio.run(main())
print(time.time() - t0)
```
<!--
```
https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png ....ok
https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg ....ok
https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png ....ok
https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg ....ok
0.7851433753967285
```
-->
## Parallelism
Both threading and Asyncio use only a single CPU core. To actually run one task per core, use multiprocessing.
Since each process gets its own copy of the entire urls list, Process 1 and Process 2 would each pop from their own copy and download everything twice. To work around this we split the list into urls_p1 and urls_p2 and hand one to each process.
### multiprocessing
```
from multiprocessing import Process
import requests
import time

urls_p1 = ["https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg",
           "https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png"]
urls_p2 = ["https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png",
           "https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg"]
urls_all = [urls_p1, urls_p2]
task_list = []

def worker(i):
    urls = urls_all[i]
    while True:
        try:
            url = urls.pop()
        except IndexError:
            break  # Done.
        requests.get(url)
        print(str(url) + " ....ok")

if __name__ == "__main__":  # needed when the start method is spawn
    t0 = time.time()
    for i in range(2):
        p = Process(target=worker, args=(i,))
        task_list.append(p)
        p.start()
    for p in task_list:
        p.join()
    print(time.time() - t0)
```
<!--
```
https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png ....ok
https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg ....ok
https://blog.finxter.com/wp-content/uploads/2020/05/speed-Kopie.jpg ....ok
https://blog-cdn.feedspot.com/wp-content/uploads/2016/12/Python-25-transparent_216px.png ....ok
0.3069419860839844
``` -->
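The manual list-splitting above is what `multiprocessing.Pool` automates: `pool.map` divides the input across worker processes for you (sketched here with a trivial CPU-bound function instead of downloads):

```
from multiprocessing import Pool

def square(x):
    return x * x  # runs in a worker process

if __name__ == "__main__":  # needed when the start method is spawn
    with Pool(processes=2) as pool:
        # The input list is split across the two worker processes
        result = pool.map(square, [1, 2, 3, 4])
    print(result)  # [1, 4, 9, 16]
```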
## reference
* https://blog.louie.lu/2017/05/19/%E6%B7%B1%E5%85%A5-gil-%E5%A6%82%E4%BD%95%E5%AF%AB%E5%87%BA%E5%BF%AB%E9%80%9F%E4%B8%94-thread-safe-%E7%9A%84-python-grok-the-gil-how-to-write-fast-and-thread-safe-python/
* https://realpython.com/
* https://testdriven.io/blog/concurrency-parallelism-asyncio/