---
title: "Python FFI 的陰暗角落 - scc"
tags: PyConTW2025, 2025-organize, 2025-共筆
---
# Python FFI 的陰暗角落 - scc
{%hackmd L_RLmFdeSD--CldirtUhCw %}
<iframe src=https://app.sli.do/event/tpEoj76rR1dNJ79KxGpe2s height=450 width=100%></iframe>
:::success
本演講提供 AI 翻譯字幕及摘要,請點選這裡前往 >> [PyCon Taiwan AI Notebook](https://pycontw.connyaku.app/?room=8TV2RAg1bJP3qXDUH7bO)
AI translation subtitles and summaries are available for this talk. Click here to access >> [PyCon Taiwan AI Notebook](https://pycontw.connyaku.app/?room=8TV2RAg1bJP3qXDUH7bO)
:::
> Collaborative notes start below
> 從這裡開始共筆
## What is FFI
- Foreign Function Interface
- Python World --> C++ World
- Frequent function calls across the boundary
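
As a concrete picture of one boundary crossing, here is a minimal sketch using `ctypes`. It is not from the talk: it calls the C library's `strlen` rather than C++ code, assumes a POSIX system where `ctypes.CDLL(None)` opens the symbols already loaded into the process, and the wrapper name `c_strlen` is made up for illustration.

```python
import ctypes

# Assumption: POSIX only. CDLL(None) exposes symbols already loaded into the
# current process, which includes the C library.
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments and results correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Hypothetical wrapper: cross from Python into C and back."""
    return libc.strlen(text.encode("utf-8"))
```

Every call to `c_strlen` pays for argument conversion, the libffi call, and result conversion, which is why call frequency matters.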
## The hidden corners
### Benchmark ctypes, cffi, pybind11, PyO3: why free-threaded makes them slower
- `PyMutex_LockFast` and `PyMutex_Unlock` are called frequently
- The free-threaded build has to guard against race conditions
- Recap
    - Python 3.13 NoGIL shows higher overhead than 3.13 GIL when crossing FFI boundaries
    - Python 3.14 NoGIL shows higher overhead than 3.14 GIL for FFI calls
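
A rough way to reproduce this kind of comparison is to run the same micro-benchmark under a GIL build and a free-threaded build. The sketch below is not the speaker's benchmark: it assumes a POSIX system, uses libc `abs` as a stand-in foreign function, and the helper names `build_info` and `ns_per_call` are made up for illustration.

```python
import ctypes
import sys
import sysconfig
import timeit


def build_info() -> dict:
    """Report whether this interpreter is a free-threaded build."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, 0 or None otherwise.
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13; older versions always have a GIL.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    return {
        "version": sys.version.split()[0],
        "free_threaded_build": free_threaded,
        "gil_enabled": gil_enabled,
    }


# Assumption: POSIX only; libc abs() stands in for a cheap foreign function.
libc = ctypes.CDLL(None)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def ns_per_call(func, arg, number=200_000, repeat=5) -> float:
    """Best-of-N time per call in nanoseconds (includes the lambda's overhead)."""
    best = min(timeit.repeat(lambda: func(arg), number=number, repeat=repeat))
    return best / number * 1e9
```

Running `print(build_info(), ns_per_call(libc.abs, -3))` with each interpreter gives numbers that can be compared side by side; absolute values depend on the machine.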
### Why is PyO3 fast but ctypes slow?
- `take_gil`
- `CDLL` releases the GIL around each foreign call; `PyDLL` does not release it
- In the free-threaded build, there is no GIL for `PyDLL` to hold.
- Recap
    - With the GIL, `PyDLL` is more than 20% faster than `CDLL`.
    - PyO3 includes deep optimizations for vectorcall without the GIL.
    - The libffi trampoline emerges as the next bottleneck.
- Interesting finding: `Python --> pybind11 --> C++` is slower than `Python --> PyO3 --> Rust --> C++`
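
The `CDLL` versus `PyDLL` difference can be observed by binding the same C function through both loaders. This is a sketch under assumptions, not the speaker's code: it is POSIX only, uses libc `abs` as the foreign function, and `bind_abs` and `best_seconds` are hypothetical helper names.

```python
import ctypes
import timeit


def bind_abs(loader):
    """Bind libc abs() through the given loader class (ctypes.CDLL or ctypes.PyDLL)."""
    lib = loader(None)  # assumption: POSIX, opens symbols of the current process
    fn = lib.abs
    fn.argtypes = [ctypes.c_int]
    fn.restype = ctypes.c_int
    return fn


abs_cdll = bind_abs(ctypes.CDLL)    # releases the GIL around each call
abs_pydll = bind_abs(ctypes.PyDLL)  # keeps the GIL held during the call


def best_seconds(fn, number=200_000, repeat=5) -> float:
    """Best-of-N wall time for `number` calls of fn(-7)."""
    return min(timeit.repeat(lambda: fn(-7), number=number, repeat=repeat))
```

Comparing `best_seconds(abs_cdll)` with `best_seconds(abs_pydll)` on a GIL build shows what releasing and re-acquiring the GIL costs per call. Note that `PyDLL` is only appropriate when the foreign function is short or needs the GIL, since other threads cannot run while it holds it.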
### Races on global state after No-GIL
- The GIL protects foreign functions, but what happens if we discard it?

| |GIL|No-GIL|
|:--:|:--:|:--:|
|Thread-safe FF|Safe|Safe|
|Non-thread-safe FF|Uncertain|Race|
#### Suggestions
- Suggestion for migrating to the free-threaded version: foreign functions should be thread-safe
- Otherwise, don't call foreign functions from multiple Python threads.
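
When a foreign function cannot be made thread-safe, a middle ground is to serialize calls to it on the Python side. The sketch below is an illustration and was not shown in the talk: it assumes a POSIX system and uses libc `srand`/`rand`, which share hidden global state, as the non-thread-safe example. The names `seeded_rand` and `run_concurrently` are made up.

```python
import ctypes
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumption: POSIX only. srand()/rand() share global state inside libc.
libc = ctypes.CDLL(None)
libc.srand.argtypes = [ctypes.c_uint]
libc.srand.restype = None
libc.rand.argtypes = []
libc.rand.restype = ctypes.c_int

# One lock guards every call that touches the shared C state.
_rand_lock = threading.Lock()


def seeded_rand(seed: int) -> int:
    """Seed and draw as one atomic step so threads cannot interleave the two calls."""
    with _rand_lock:
        libc.srand(seed)
        return libc.rand()


def run_concurrently(seed: int = 42, workers: int = 8, calls: int = 1000) -> list:
    """Call seeded_rand from many threads; with the lock, every result is identical."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(seeded_rand, [seed] * calls))
```

Without the lock, another thread could reseed between `srand` and `rand`, with or without the GIL, because `CDLL` releases the GIL around each call. The lock restores correctness at the cost of parallelism for that function.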
## Takeaway
- Python 3.14t is impressive: performance is almost as good as the GIL version
- Canary or Shadow Deployment
- Make sure all components are thread-safe
## QA
- Why is PyO3 + Rust faster? ==PyO3 ~ 75 ns per function call, pybind11 ~ 150 ns, Rust --> C++ about 10 ns, so the PyO3 route totals roughly 85 ns==
- Which stack profiling tool was used? ==perf; some kernel options need to be enabled first==
## Links
- https://pycontw2025.scc.tw
- scc@scc.tw
Below are the parts of the talk/tutorial that the speaker updated or corrected after the session
講者於演講後有更新或勘誤投影片的部份