
Deep Learning Python Package Installation (with Ubuntu 22.04)

tags: Tutorial, Ubuntu, python



Pytorch

Installation

pip install torch torchvision torchaudio

CUDA check

import torch

torch.cuda.is_available()      # True if a CUDA device is usable
torch.cuda.device_count()      # number of visible GPUs
torch.cuda.current_device()    # index of the currently selected device
torch.cuda.device(0)           # context-manager handle for device 0
torch.cuda.get_device_name(0)  # name of GPU 0
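
If the checks above pass, here is a minimal sketch to confirm that computation actually runs on the GPU:

import torch

# allocate a tensor directly on the GPU and run a trivial matrix multiply
x = torch.rand(3, 3, device="cuda")
y = x @ x
print(y.device)  # expected: cuda:0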

Jax

Installation

  • CPU only (Linux/MacOS/Windows)
pip install -U jax
  • GPU (NVIDIA with CUDA 12, Linux)
pip install -U "jax[cuda12]"

(2025.02.18)
This pip-installed version does not support cuDNN 9 (XlaRuntimeError: INTERNAL: the library was not initialized).
To use the latest cuDNN, try a local installation instead.
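
A quick way to confirm which backend JAX is actually using (it only lists GPU devices when the CUDA backend loaded correctly):

import jax

print(jax.devices())          # e.g. [CudaDevice(id=0)] when the GPU backend works
print(jax.default_backend())  # "gpu" or "cpu"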

Problems

  • JAX (<=0.5.0) was built against NumPy 1.25.0 and does not support NumPy >= 2.0, so NumPy needs to be downgraded to a compatible version (see the version check below).
pip install --upgrade numpy==1.25.0
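
After downgrading, a quick sanity check of the installed versions (a minimal sketch; the exact versions depend on your environment):

import jax
import numpy

print("jax:", jax.__version__)      # expect <= 0.5.0 here
print("numpy:", numpy.__version__)  # expect 1.25.0 after the downgrade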

TensorFlow

Installation

The latest stable TensorFlow 2 release supports both CPU and GPU (on Ubuntu and Windows), so no extra variant needs to be specified.

pip install --upgrade tensorflow

However, the current version (2.11) may be incompatible with TensorRT 8.x, so the installation steps below are recommended.

Problems

Could not load dynamic library 'libnvinfer.so.7'

First, install the newer TensorRT (8.x):

pip install --upgrade setuptools pip
pip install nvidia-pyindex
pip install nvidia-tensorrt
# verify installation of tensorrt
python3 -c "import tensorrt; print(tensorrt.__version__); assert tensorrt.Builder(tensorrt.Logger())"

Create symbolic links so that the libnvinfer version 8 libraries also resolve under the version 7 names:

# go to the tensorrt install location (differs depending on how it was installed)
cd /env/lib/python3.10/site-packages/tensorrt
# create symbolic links
ln -s libnvinfer_plugin.so.8 libnvinfer_plugin.so.7
ln -s libnvinfer.so.8 libnvinfer.so.7
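
If you are not sure where the tensorrt package was installed, a minimal way to print its directory (assumes the nvidia-tensorrt wheel from the previous step is importable):

import os
import tensorrt

# directory containing the pip-installed libnvinfer*.so.8 libraries
print(os.path.dirname(tensorrt.__file__))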

Edit .bashrc or venv/activate to add the tensorrt directory to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/env/lib/python3.10/site-packages/tensorrt/
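
After opening a new shell (or re-sourcing the file), the following sketch uses ctypes to check that the version 7 names now resolve; it raises OSError if the symlinks or the LD_LIBRARY_PATH entry are missing:

import ctypes

# both calls raise OSError if the library cannot be found or loaded
ctypes.CDLL("libnvinfer.so.7")
ctypes.CDLL("libnvinfer_plugin.so.7")
print("libnvinfer 7 symlinks load correctly")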

Install TensorFlow and verify that the GPU is detected:

pip install --upgrade tensorflow
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Fixing "successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node"

  1. Check Nodes
lspci | grep -i nvidia

The output looks like this:

01:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX {xxxx}] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)

This shows that the VGA compatible device (NVIDIA GeForce) is at 01:00; if your output differs, adjust the commands below accordingly.

  2. Check if it is connected.
cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
  • 0 means it is connected; -1 means it is not, and the next step is required.
  3. Fix it with the command below.
sudo echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node

Run step 2 again to confirm that it is now connected.

Memory limit setting

This is only recommended as a fallback; it is better to get TensorRT installed properly, otherwise many bugs remain.

If TensorRT fails to install or is not picked up, you may hit out-of-GPU-memory errors when using Jupyter Notebook. This can be worked around by setting a memory limit for each notebook (memory_limit is in MB, so 4096 below is about 4 GB).

import tensorflow as tf

# limit the first GPU to a 4096 MB virtual device for this notebook
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0], [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=4096)])
  except RuntimeError as e:
    # virtual devices must be configured before the GPU is initialized
    print(e)

Pyspark

Install Pyspark on Ubuntu

Use Pyspark on Jupyter Notebook and Virtualenv

Inside the virtual environment, install findspark:

pip install findspark

At the start of the notebook or Python script, run:

import findspark
findspark.init()
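
After findspark.init(), PySpark can be imported normally. Here is a minimal sketch of starting a local session (the master setting and app name are arbitrary examples):

import findspark
findspark.init()

from pyspark.sql import SparkSession

# local[*] uses all local cores; "demo" is just an example app name
spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
print(spark.version)
spark.stop()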

Kaggle API

Inside the virtual environment, install kaggle:

pip install kaggle --upgrade

API credentials

To download competition or other data through the API, you first need to set up your API token.

  1. Register an account on the Kaggle website, then choose 'Create API Token' on the account page; this downloads a kaggle.json file containing your API token.
  2. Set up the API token in either of the following ways:
  • Place the token at $HOME/.kaggle/kaggle.json and run the command below so that other users of the machine cannot read your API key:
chmod 600 ~/.kaggle/kaggle.json
  • Export the username and key (see kaggle.json) in the virtual environment:
export KAGGLE_USERNAME={user_name}
export KAGGLE_KEY={xxxxxxxxxxxxxx}
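
With the credentials in place, data can be downloaded from the command line (kaggle competitions download -c <competition>) or from Python. Below is a minimal sketch of the Python route; the "titanic" competition and the data/ path are example values:

from kaggle.api.kaggle_api_extended import KaggleApi

# uses the credentials configured above (kaggle.json or environment variables)
api = KaggleApi()
api.authenticate()

# download all files of a competition into ./data ("titanic" is just an example)
api.competition_download_files("titanic", path="data")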