# Nuitka: replacing the shared library path
After reading several articles, this was the only relevant one I found:
https://www.fournoas.com/posts/nuitka-inject-custom-c-code-at-compile-time/
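That article intervenes in the Nuitka build by pausing it. The approach cannot change what we need to change, but it does give a rough idea of how the Nuitka build process works.
I then tried to replace the shared libraries with `LD_PRELOAD`, but the compiled binary references very few shared libraries directly; most of them are loaded through `dlopen`. So if this is going to ship as a real product and we need to change the path from which Nuitka loads dynamic shared libraries, what can we do? Let's investigate.
Assume the source code you want to compile is: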
# source code
## test.py
```python=
import matplotlib
import mymodule
myclass = mymodule.MyClass1()
mymodule.function2()
import math
print(math.nan == math.nan)
print(float('nan') == float('nan'))
print(math.isnan(math.nan))
print(math.isnan(float('nan')))
```
## nuitka command
```
nuitka --standalone --onefile --show-memory --show-progress --static-libpython=yes --nofollow-imports --output-dir=out --mingw64 ./test.py
```
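After compiling, Nuitka produces `out/test.build` and `out/test.dist` under the output directory.
Running `ldd` on the binary in `out/test.dist` shows that its dependencies do not correspond to the many `.so` files in the compiled directory: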
```
ldd testcase
linux-vdso.so.1 (0x00007ffdf4de7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbf6c226000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbf6c0d7000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbf6c0b4000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fbf6c0af000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbf6c0a5000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbf6beb3000)
/lib64/ld-linux-x86-64.so.2 (0x00007fbf6c23f000)
```
So we can infer that the shared libraries are loaded dynamically through `dlopen`.
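One way to see the gap between `ldd` and what is really loaded is to look at the running process instead of the file on disk. The sketch below is an illustration only, not part of Nuitka: the helper name `shared_objects_in_maps` is hypothetical, and it assumes a Linux system where `/proc/<pid>/maps` is available. It extracts the shared objects that are mapped into a process, which includes anything brought in by `dlopen`.

```python
import os


def shared_objects_in_maps(maps_text):
    """Return the sorted, de-duplicated shared-object paths found in maps text."""
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # A maps line has five fixed fields; a sixth one is the backing path.
        if len(fields) == 6:
            path = fields[5].strip()
            name = os.path.basename(path)
            if name.endswith(".so") or ".so." in name:
                paths.add(path)
    return sorted(paths)


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # Linux only: list what this interpreter process has mapped right now.
    with open("/proc/self/maps") as handle:
        for path in shared_objects_in_maps(handle.read()):
            print(path)
```

Pointing the same helper at `/proc/<pid>/maps` of a running `test.bin` should list the extension modules that `ldd` does not show.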
Let's first deal with the shared libraries needed at build (link) time.
We already know the build always starts from scons, which leads to `Backend.scons`:
```
/usr/local/lib/python3.9/site-packages/nuitka/build/Backend.scons
```
```
# Set load libpython from binary directory default
if env.gcc_mode and not isMacOS() and not os.name == "nt" and not module_mode:
    if env.standalone_mode:
        rpath = "$$ORIGIN"
    else:
        rpath = python_lib_path
    local_lib_path = os.path.join(os.getcwd(), 'lib')
    print(local_lib_path)
    env.Append(LINKFLAGS=["-Wl,-R,'%s'" % rpath, "-Wl,-L'%s'" % local_lib_path])
    # The rpath is no longer used unless we do this on modern Linux. The
    # option name is not very revealing, but basically without this, the
    # rpath in the binary will be ignored by the loader.
    if "linux" in sys.platform:
        env.Append(LINKFLAGS=["-Wl,--disable-new-dtags","-Wl,-R,'%s'" % rpath, "-Wl,-L'%s'" % local_lib_path])
```
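This only covers the shared libraries that have to be linked in at build time.
With those handled, the next step is to find the code that loads shared libraries at runtime through `dlopen`.
So go to the Nuitka source and search it with `grep -r "dlopen" /usr/local/lib/python3.9/site-packages/nuitka`: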
```
[root@328103204ad4 out]# grep -r "dlopen" /usr/local/lib/python3.9/site-packages/nuitka
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: // spell-checker: ignore getdlopenflags,dlopenflags
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: static PyObject *dlopenflags_object = NULL;
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: if (dlopenflags_object == NULL) {
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: dlopenflags_object = CALL_FUNCTION_NO_ARGS(tstate, Nuitka_SysGetObject("getdlopenflags"));
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: int dlopenflags = PyInt_AsLong(dlopenflags_object);
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: PySys_WriteStderr("import %s # dlopen(\"%s\", %x);\n", full_name, filename, dlopenflags);
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: void *handle = dlopen(filename, dlopenflags);
/usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c: error = "unknown dlopen() error";
```
```
vim /usr/local/lib/python3.9/site-packages/nuitka/build/static_src/MetaPathBasedLoader.c
```
Opening the file at this location, the `.so` is most likely loaded by `void *handle = dlopen(filename, dlopenflags);`, so trace upward from there:
```
// This code would work for all versions, we are avoiding access to interpreter
// structure internals of 3.8 or higher.
// spell-checker: ignore getdlopenflags,dlopenflags
static PyObject *dlopenflags_object = NULL;
if (dlopenflags_object == NULL) {
dlopenflags_object = CALL_FUNCTION_NO_ARGS(tstate, Nuitka_SysGetObject("getdlopenflags"));
}
int dlopenflags = PyInt_AsLong(dlopenflags_object);
if (isVerbose()) {
PySys_WriteStderr("import %s # dlopen(\"%s\", %x);\n", full_name, filename, dlopenflags);
}
void *handle = dlopen(filename, dlopenflags);
```
Tracing upward, `callIntoExtensionModule` is the C function responsible for loading a single shared library:
```
#ifdef _WIN32
static PyObject *callIntoExtensionModule(PyThreadState *tstate, char const *full_name, const wchar_t *filename) {
#else
static PyObject *callIntoExtensionModule(PyThreadState *tstate, char const *full_name, const char *filename) {
#endif
```
After a round of `printf` debugging, `filename` turns out to be the location of the library we want to load:
```
// Pointers to bytecode data.
static char **_bytecode_data = NULL;
static PyObject *loadModule(PyThreadState *tstate, PyObject *module, PyObject *module_name,
struct Nuitka_MetaPathBasedLoaderEntry const *entry) {
#ifdef _NUITKA_STANDALONE
if ((entry->flags & NUITKA_EXTENSION_MODULE_FLAG) != 0) {
// Append the the entry name from full path module name with dots,
// and translate these into directory separators.
#ifdef _WIN32
wchar_t filename[MAXPATHLEN + 1] = {0};
appendWStringSafeW(filename, getBinaryDirectoryWideChars(true), sizeof(filename) / sizeof(wchar_t));
appendCharSafeW(filename, SEP, sizeof(filename) / sizeof(wchar_t));
appendModuleNameAsPathW(filename, entry->name, sizeof(filename) / sizeof(wchar_t));
appendStringSafeW(filename, ".pyd", sizeof(filename) / sizeof(wchar_t));
#else
char filename[MAXPATHLEN + 1] = {0};
appendStringSafe(filename, getBinaryDirectoryHostEncoded(true), sizeof(filename));
appendCharSafe(filename, SEP, sizeof(filename));
appendModuleNameAsPath(filename, entry->name, sizeof(filename));
appendStringSafe(filename, ".so", sizeof(filename));
printf("%s",filename);
#endif
// Set "__spec__" and "__file__", some modules expect it early.
setModuleFileValue(tstate, module, filename);
#if PYTHON_VERSION >= 0x350
PyObject *spec_value =
createModuleSpec(tstate, module_name, LOOKUP_ATTRIBUTE(tstate, module, const_str_plain___file__), false);
SET_ATTRIBUTE(tstate, module, const_str_plain___spec__, spec_value);
#endif
callIntoExtensionModule(tstate, entry->name, filename);
} else
#endif
if ((entry->flags & NUITKA_BYTECODE_FLAG) != 0) {
// TODO: Do node use marshal, but our own stuff, once we
// can do code objects too.
PyCodeObject *code_object =
(PyCodeObject *)PyMarshal_ReadObjectFromString(_bytecode_data[entry->bytecode_index], entry->bytecode_size);
// TODO: Probably a bit harsh reaction.
if (unlikely(code_object == NULL)) {
PyErr_Print();
abort();
}
return loadModuleFromCodeObject(module, code_object, entry->name, (entry->flags & NUITKA_PACKAGE_FLAG) != 0);
} else {
assert((entry->flags & NUITKA_EXTENSION_MODULE_FLAG) == 0);
assert(entry->python_initfunc);
{
NUITKA_MAY_BE_UNUSED bool res = Nuitka_SetModule(module_name, module);
assert(res != false);
}
// Run the compiled module code, we get the module returned.
#if PYTHON_VERSION < 0x300
NUITKA_MAY_BE_UNUSED
#endif
PyObject *result = entry->python_initfunc(tstate, module, entry);
CHECK_OBJECT_X(result);
#if PYTHON_VERSION >= 0x300
if (likely(result != NULL)) {
_fixupSpecAttribute(tstate, result);
}
#endif
}
if (unlikely(HAS_ERROR_OCCURRED(tstate))) {
return NULL;
}
if (isVerbose()) {
PySys_WriteStderr("Loaded %s\n", entry->name);
}
return Nuitka_GetModule(tstate, module_name);
}
```
So this is where the `.so` path is assembled:
```
#else
char filename[MAXPATHLEN + 1] = {0};
appendStringSafe(filename, getBinaryDirectoryHostEncoded(true), sizeof(filename));
appendCharSafe(filename, SEP, sizeof(filename));
appendModuleNameAsPath(filename, entry->name, sizeof(filename));
appendStringSafe(filename, ".so", sizeof(filename));
printf("%s",filename);
#endif
```
Now patch it so that a `lib` directory is inserted into the path:
```
#ifdef _WIN32
wchar_t filename[MAXPATHLEN + 1] = {0};
appendWStringSafeW(filename, getBinaryDirectoryWideChars(true), sizeof(filename) / sizeof(wchar_t));
appendCharSafeW(filename, SEP, sizeof(filename) / sizeof(wchar_t));
appendModuleNameAsPathW(filename, entry->name, sizeof(filename) / sizeof(wchar_t));
appendStringSafeW(filename, ".pyd", sizeof(filename) / sizeof(wchar_t));
#else
char filename[MAXPATHLEN + 1] = {0};
appendStringSafe(filename, getBinaryDirectoryHostEncoded(true), sizeof(filename));
// The length check must cover all four characters of "/lib".
if (strlen(filename) + strlen("/lib") < sizeof(filename)) {
strcat( filename, "/lib");
} else {
printf("Not enough space in 'filename' to append '/lib'\n");
}
appendCharSafe(filename, SEP, sizeof(filename));
appendModuleNameAsPath(filename, entry->name, sizeof(filename));
appendStringSafe(filename, ".so", sizeof(filename));
#endif
printf("%s\n",filename);
```
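To make the effect of the patch easier to see, the non-Windows path assembly can be mirrored in a few lines of Python. This is only a sketch of the logic in the C snippets above, not Nuitka code: the function name `extension_module_path` and its `subdir` parameter are hypothetical, and it assumes that dots in the module name become directory separators, as the comment in `loadModule` describes.

```python
import posixpath


def extension_module_path(binary_dir, module_name, subdir=""):
    """Build <binary_dir>[/<subdir>]/<pkg>/<mod>.so from a dotted module name."""
    base = posixpath.join(binary_dir, subdir) if subdir else binary_dir
    # Dots in the dotted module name become directory separators.
    return posixpath.join(base, *module_name.split(".")) + ".so"


# Unpatched behaviour: the .so sits next to the binary.
print(extension_module_path("/app/test.dist", "math"))
# Patched behaviour: an extra "lib" level is inserted.
print(extension_module_path("/app/test.dist", "math", "lib"))
print(extension_module_path("/app/test.dist", "PIL._imaging", "lib"))
```

With `subdir="lib"` the result has the same shape as the `.../test.dist/lib/math.so` path that the patched binary prints.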
By default, the build produces `test.bin`.
Nuitka build command:
```
nuitka --standalone --onefile --show-memory --show-progress --static-libpython=yes --nofollow-imports --output-dir=out --mingw64 ./test.py
```
Looking at `test.dist` under the `out` directory, there should be a `test.bin` inside:
```
test.dist
```
First, look at the directory layout:
```
out/test.dist/
_codecs_cn.so _codecs_iso2022.so _codecs_kr.so _datetime.so _pickle.so _sha512.so binascii.so test.bin zlib.so
_codecs_hk.so _codecs_jp.so _codecs_tw.so _multibytecodec.so _random.so _struct.so math.so unicodedata.so
```
Run `out/test.dist/test.bin` directly:
```
qweeeeeeeeeeeeeeeeeeeee/shared_data/out/test.dist/test.bin
qweeeeeeeeeeeeeeeeeeeee/shared_data/out/test.dist/test.bin
inspect
ast
contextlib
_collections_abc
collections
heapq
keyword
operator
reprlib
functools
enum
dis
opcode
collections.abc
importlib.machinery
linecache
os
stat
posixpath
genericpath
tokenize
re
sre_compile
sre_parse
sre_constants
copyreg
token
__main__
math
/shared_data/Tuning_automation/out/test.dist/lib/math.so
/shared_data/Tuning_automation/out/test.dist/lib/math.so
Traceback (most recent call last):
File "/shared_data/Tuning_automation/out/test.dist/test.py", line 1, in <module>
ImportError: /shared_data/Tuning_automation/out/test.dist/lib/math.so: cannot open shared object file: No such file or directory
```
```
/shared_data/Tuning_automation/out/test.dist/lib/math.so: cannot open shared object file: No such file or directory
```
The path now contains an extra `lib` level, so the replacement worked.
The next step is to move the `.so` files in dist one level down into `lib`:
```
[root@328103204ad4 test.dist]# ls -all
total 6488
drwxr-xr-x 3 root root 4096 May 11 06:13 .
drwxr-xr-x 9 root root 4096 May 11 05:49 ..
drwxr-xr-x 2 root root 4096 May 11 06:13 lib
-rwxr-xr-x 1 root root 6630352 May 11 05:48 test.bin
```
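The move itself can be scripted. The following is a minimal sketch, assuming that every file whose name ends in `.so` or contains `.so.` should go under `lib` with its relative path preserved. The function name `move_shared_objects` is hypothetical and this is not something Nuitka provides.

```python
import shutil
from pathlib import Path


def move_shared_objects(dist_dir, lib_name="lib"):
    """Move shared objects under dist_dir into dist_dir/lib, keeping relative paths."""
    dist = Path(dist_dir)
    lib = dist / lib_name
    moved = []
    # sorted() materializes the walk first, so moving files cannot disturb it.
    for path in sorted(dist.rglob("*")):
        if not path.is_file() or lib in path.parents:
            continue  # skip directories and anything already under lib
        if path.name.endswith(".so") or ".so." in path.name:
            target = lib / path.relative_to(dist)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))
            moved.append(target.relative_to(dist).as_posix())
    return moved
```

For example, `move_shared_objects("out/test.dist")` leaves `test.bin` where it is and places `math.so` at `out/test.dist/lib/math.so`.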
That completes the change, and the directory is now quite clean.
![image](https://hackmd.io/_uploads/SkTu6KnzR.png)
As a bonus: if the Python code needs to load modules from packages, the directory layout ends up like this:
```
PIL test.bin contourpy kiwisolver lib markupsafe matplotlib numpy pandas pytz
```
# source code
## test.py
```python=
import sys
import pytz
from datetime import datetime
# Create a time zone-aware datetime object in UTC
utc_now = datetime.utcnow().replace(tzinfo=pytz.utc)
# Convert to a specific time zone (e.g., New York)
local_timezone = pytz.timezone('America/New_York')
local_time = utc_now.astimezone(local_timezone)
print(f'UTC Time: {utc_now}')
print(f'Local Time (New York): {local_time}')
```
## nuitka command
```
nuitka --standalone --onefile --show-memory --show-progress --static-libpython=yes --nofollow-imports --output-dir=out --mingw64 ./test.py
```
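In our case we only touched the final link step and the C code, so only the paths of the dynamically and statically linked shared libraries changed. I am not clear on the parts that belong to Nuitka's architecture, for example how an `import` in Python, once translated to C, maps to the `dlopen` call that loads the library, so those parts cannot be changed here. In other words, the `.so` files being loaded have all moved into `lib`, but the top level still has to keep some of the package folder structure. That structure comes from Nuitka's original design, so the Python package directory layout probably has to stay, although the `.so` files inside it can all be deleted.
So both `lib` and the root directory contain folders named after the Python packages; the difference is that the ones under `./lib` contain the `.so` files.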
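A quick consistency check for this layout can be sketched as follows. It is an illustration under the assumptions just described: shared objects should live only under `lib`, and each package folder under `lib` should also exist at the root. The function name `check_dist_layout` and the keys of the returned dictionary are hypothetical.

```python
from pathlib import Path


def _is_shared_object(path):
    return path.is_file() and (path.name.endswith(".so") or ".so." in path.name)


def check_dist_layout(dist_dir, lib_name="lib"):
    """Report shared objects left outside lib and lib packages missing at the root."""
    dist = Path(dist_dir)
    lib = dist / lib_name
    # Shared objects that still sit outside the lib directory.
    stray = sorted(
        p.relative_to(dist).as_posix()
        for p in dist.rglob("*")
        if _is_shared_object(p) and lib not in p.parents
    )
    # Package folders present under lib but absent from the root directory.
    missing = []
    if lib.is_dir():
        missing = sorted(
            d.name for d in lib.iterdir()
            if d.is_dir() and not (dist / d.name).is_dir()
        )
    return {"stray_shared_objects": stray, "packages_missing_at_root": missing}
```

An empty result for both keys would mean the dist directory matches the layout described above.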
# root list
For example, `matplotlib` here only contains the package's default module configuration files.
```
PIL test.bin contourpy kiwisolver lib markupsafe matplotlib numpy pandas pytz
```
# lib list
For example, `matplotlib` here contains the `.so` files.
```
PIL contourpy kiwisolver markupsafe matplotlib numpy pandas pytz
```
# total list
The `tree` output from the actual development environment shows the difference under the `matplotlib` directories:
```
tree -L 3
.
├── PIL
├── test.bin
├── contourpy
├── kiwisolver
├── lib
│ ├── PIL
│ │ ├── _imaging.so
│ │ ├── _imagingcms.so
│ │ ├── _imagingmath.so
│ │ └── _webp.so
│ ├── _asyncio.so
│ ├── _blake2.so
│ ├── _bz2.so
│ ├── _codecs_cn.so
│ ├── _codecs_hk.so
│ ├── _codecs_iso2022.so
│ ├── _codecs_jp.so
│ ├── _codecs_kr.so
│ ├── _codecs_tw.so
│ ├── _contextvars.so
│ ├── _csv.so
│ ├── _ctypes.so
│ ├── _datetime.so
│ ├── _decimal.so
│ ├── _elementtree.so
│ ├── _hashlib.so
│ ├── _heapq.so
│ ├── _md5.so
│ ├── _multibytecodec.so
│ ├── _multiprocessing.so
│ ├── _opcode.so
│ ├── _pickle.so
│ ├── _posixshmem.so
│ ├── _posixsubprocess.so
│ ├── _queue.so
│ ├── _random.so
│ ├── _sha1.so
│ ├── _sha256.so
│ ├── _sha3.so
│ ├── _sha512.so
│ ├── _socket.so
│ ├── _ssl.so
│ ├── _statistics.so
│ ├── _struct.so
│ ├── array.so
│ ├── binascii.so
│ ├── contourpy
│ │ └── _contourpy.so
│ ├── grp.so
│ ├── kiwisolver
│ │ ├── _cext.cpython-39-x86_64-linux-gnu.so
│ │ └── _cext.so
│ ├── libXau-00ec42fe.so.6.0.0
│ ├── libbz2.so.1
│ ├── libcrypto.so.1.1
│ ├── libffi.so.6
│ ├── libgfortran-040039e1.so.5.0.0
│ ├── libjpeg-cec335f2.so.62.4.0
│ ├── liblcms2-8d000061.so.2.0.16
│ ├── liblzma-d1e41b3a.so.5.4.5
│ ├── libopenblas64_p-r0-0cf96a72.3.23.dev.so
│ ├── libopenjp2-98a646ca.so.2.5.0
│ ├── libquadmath-96973f99.so.0.0.0
│ ├── libsharpyuv-652b6057.so.0.0.1
│ ├── libssl.so.1.1
│ ├── libtiff-f683b479.so.6.0.2
│ ├── libwebp-8a0843dd.so.7.1.8
│ ├── libwebpdemux-f9b98349.so.2.0.14
│ ├── libwebpmux-b067bc14.so.3.0.13
│ ├── libxcb-ac5351d8.so.1.1.0
│ ├── markupsafe
│ │ └── _speedups.so
│ ├── math.so
│ ├── matplotlib
│ │ ├── _c_internal_utils.so
│ │ ├── _image.so
│ │ ├── _path.so
│ │ ├── _qhull.so
│ │ ├── _tri.so
│ │ ├── backends
│ │ ├── ft2font.so
│ │ └── mpl-data
│ ├── mmap.so
│ ├── numpy
│ │ ├── core
│ │ ├── fft
│ │ ├── linalg
│ │ └── random
│ ├── pandas
│ │ ├── _libs
│ │ └── io
│ ├── pyexpat.so
│ ├── pytz
│ │ └── zoneinfo
│ ├── select.so
│ ├── termios.so
│ ├── unicodedata.so
│ └── zlib.so
├── markupsafe
├── matplotlib
│ ├── backends
│ └── mpl-data
│ ├── fonts
│ ├── images
│ ├── kpsewhich.lua
│ ├── matplotlibrc
│ ├── plot_directive
│ └── stylelib
├── numpy
│ ├── core
│ ├── fft
│ ├── linalg
│ └── random
├── pandas
│ ├── _libs
│ │ ├── tslibs
│ │ └── window
│ └── io
│ └── formats
└── pytz
└── zoneinfo
├── Africa
├── America
├── Antarctica
├── Arctic
├── Asia
├── Atlantic
├── Australia
├── Brazil
├── CET
├── CST6CDT
├── Canada
├── Chile
├── Cuba
├── EET
├── EST
├── EST5EDT
├── Egypt
├── Eire
├── Etc
├── Europe
├── Factory
├── GB
├── GB-Eire
├── GMT
├── GMT+0
├── GMT-0
├── GMT0
├── Greenwich
├── HST
├── Hongkong
├── Iceland
├── Indian
├── Iran
├── Israel
├── Jamaica
├── Japan
├── Kwajalein
├── Libya
├── MET
├── MST
├── MST7MDT
├── Mexico
├── NZ
├── NZ-CHAT
├── Navajo
├── PRC
├── PST8PDT
├── Pacific
├── Poland
├── Portugal
├── ROC
├── ROK
├── Singapore
├── Turkey
├── UCT
├── US
├── UTC
├── Universal
├── W-SU
├── WET
├── Zulu
├── iso3166.tab
├── leapseconds
├── tzdata.zi
├── zone.tab
├── zone1970.tab
└── zonenow.tab
```
In the end, the top-level directory is much less cluttered:
```
tree -L 1
.
├── PIL
├── pytz
├── contourpy
├── kiwisolver
├── lib
├── markupsafe
├── matplotlib
├── numpy
├── pandas
└── test.bin
```
# nuitka add option "--onefile"
There is an official option for this:
result=subprocess.run([f"{args.pp}", "-m", "nuitka", "--standalone","--onefile", "--show-memory","--show-progress","--static-libpython=yes", "--nofollow-imports", "--output-dir=out","--mingw64","./custom.py"])
With `--onefile`, Nuitka first packs the shared libraries into a single binary, then unpacks them into a temp folder at runtime.
Example Python code:
```
import sys
import os
current_directory = os.path.dirname(os.path.abspath(__file__))
lib_directory = os.path.join(current_directory, 'lib')
sys.path = [lib_directory]
print("sys.path:", sys.path)
try:
    import pytz
    print("Custom pytz module loaded from:", pytz.__file__)
except ImportError as e:
    print("Failed to import pytz:", e)
from datetime import datetime
# Create a time zone-aware datetime object in UTC
utc_now = datetime.utcnow().replace(tzinfo=pytz.utc)
# Convert to a specific time zone (e.g., New York)
local_timezone = pytz.timezone('America/New_York')
local_time = utc_now.astimezone(local_timezone)
print(f'UTC Time: {utc_now}')
print(f'Local Time (New York): {local_time}')
```
On Linux:
```
sys.path: ['/tmp/onefile_61131_1715670463_304893']
Custom pytz module loaded from: /tmp/onefile_61131_1715670463_304893/pytz/__init__.py
UTC Time: 2024-05-14 07:07:43.516114+00:00
Local Time (New York): 2024-05-14 03:07:43.516114-04:00
```
and on Windows:
```
sys.path: ['C:\\Users\\rex603\\AppData\\Local\\Temp\\ONEFIL~1']
Custom pytz module loaded from: C:\Users\rex603\AppData\Local\Temp\ONEFIL~1\lib\pytz\__init__.py
UTC Time: 2024-05-14 06:29:22.284119+00:00
Local Time (New York): 2024-05-14 02:29:22.284119-04:00
```
Searching the source code for `ONEFIL` leads to
/usr/local/lib/python3.9/site-packages/nuitka/Options.py
C:\Users\rex603\AppData\Local\Programs\Python\Python39\Lib\site-packages\nuitka\Options.py
where the extraction path can be changed:
```
def getOnefileTempDirSpec():
    """*str* = ``--onefile-tempdir-spec``"""
    result = (
        options.onefile_tempdir_spec or "{TEMP}" + os.path.sep + "onefile_{PID}_{TIME}"
    )
    result = "./lib"
    # This changes the '/' to '\' on Windows at least.
    return os.path.normpath(result)
```
```
[root@5a1c68e4daa2 lib]# ls
_blake2.so _codecs_jp.so _ctypes.so _md5.so _random.so _socket.so custom_genetic_algorithm.bin libssl.so.1.1 zlib.so
_bz2.so _codecs_kr.so _datetime.so _multibytecodec.so _sha1.so _statistics.so grp.so math.so
_codecs_cn.so _codecs_tw.so _decimal.so _opcode.so _sha256.so _struct.so libbz2.so.1 pytz
_codecs_hk.so _contextvars.so _hashlib.so _pickle.so _sha3.so array.so libcrypto.so.1.1 select.so
_codecs_iso2022.so _csv.so _heapq.so _posixsubprocess.so _sha512.so binascii.so libffi.so.6 unicodedata.so
```
```
[root@5a1c68e4daa2 dist]# ls
test lib
```
This way the packaged binary unpacks into a `lib` directory at the user's root location; any other directory can be specified in the same way.
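The docstring of `getOnefileTempDirSpec` names a command line option, `--onefile-tempdir-spec`, so the same result may be reachable without editing `Options.py`. The sketch below mirrors the unpatched default logic and builds a command list in the style of the `subprocess.run` call earlier. The helper names `onefile_tempdir_spec` and `nuitka_onefile_command` are hypothetical, and whether a relative spec such as `./lib` is accepted by a given Nuitka version is an assumption that has not been verified here.

```python
import os
import sys


def onefile_tempdir_spec(user_spec=None):
    """Mirror the unpatched logic: use the user's spec, else the default template."""
    result = user_spec or "{TEMP}" + os.path.sep + "onefile_{PID}_{TIME}"
    return os.path.normpath(result)


def nuitka_onefile_command(script, tempdir_spec, python=sys.executable):
    """Build (but do not run) a Nuitka onefile command with an explicit tempdir spec."""
    return [
        python, "-m", "nuitka",
        "--standalone", "--onefile",
        "--onefile-tempdir-spec=" + tempdir_spec,
        "--output-dir=out",
        script,
    ]


print(onefile_tempdir_spec())
print(onefile_tempdir_spec("./lib"))
print(nuitka_onefile_command("./custom.py", "./lib"))
```

Passing the returned list to `subprocess.run` would start the build; keeping the spec on the command line avoids carrying a patched copy of Nuitka.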