[Google Coral USB Accelerator and RPI](https://chtseng.wordpress.com/2019/06/29/google-coral-usb-accelerator%E9%96%8B%E7%AE%B1/)

[Converting Nvidia models to TensorRT](https://d246810g2000.medium.com/nvidia-jetson-nano-for-jetpack-4-4-03-%E8%BD%89%E6%8F%9B%E5%90%84%E7%A8%AE%E6%A8%A1%E5%9E%8B%E6%A1%86%E6%9E%B6%E5%88%B0-onnx-%E6%A8%A1%E5%9E%8B-17adcece9c34)

## Package installation notes

### ai-edge-torch

[GitHub repo](https://github.com/google-ai-edge/ai-edge-torch)

Requirements and dependencies:
- Python versions: 3.9, 3.10, 3.11
- Operating system: Linux
- PyTorch: torch
- TensorFlow: tf-nightly

```
Installing tf-nightly takes a long time and the connection seems unstable, so remember to set a larger default timeout:

pip --default-timeout=1000 install tf-nightly

torch, torchaudio, and torchvision each pin specific matching versions of one another; if the versions do not correspond, installation fails.
```

### 7/8

```
So far, installation succeeds on Google Colab.

Installing ai-edge-torch locally ran into many version problems: tried Python 3.8–3.11, and tried running on both Windows and Mac, all of which failed. Now building a Linux environment in a VM instead.

Hit the tf-nightly timeout problem again, so remember to set a larger default timeout:

pip --default-timeout=1000 install xxxx

Installation takes about 30 minutes. Successfully installed on Ubuntu 22.04 (progress as of Monday, before noon).
```

```
Afternoon experiments

1. Test 1: PyTorch pretrained model to TFLite.
   Used ai-edge-torch to export a PyTorch model to .tflite. Still at the experimental stage, but the test succeeded.
2. Test 2: tensorflow.keras to TensorFlow Lite. Test succeeded.
3. Test 3: PyTorch to TensorFlow Lite.
   [The examples in the official repo suggest this is supported](https://github.com/google-ai-edge/ai-edge-torch/tree/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples). Not tested yet.
```

- [huggingface to tflite](https://huggingface.co/docs/transformers/tflite)
  According to the document above, we can export pre-trained models hosted on Hugging Face. The same page also notes that local models can be exported, so if we can load a pre-trained model and fine-tune it, we can obtain a customized model. When exporting a local model, the weight files and tokenizer files must be placed in the same folder.

Tuesday progress

The Jetson Inference environment is a demo kit, apparently a Docker container or image, with .sh scripts provided for quick startup. It ships three common kinds of pre-trained CV models: imagenet for image recognition, detectNet for object detection, and segNet for semantic segmentation.

![image](https://hackmd.io/_uploads/H1jdY4KDA.png)

- If the model does not need to be too complex, models from this direction are worth trying.

## How to use Coral

![image](https://hackmd.io/_uploads/S1Ja3zcwR.png)

The official workflow diagram. Current progress: we can convert a tensorflow.keras model into the .tflite format; the exact TF-to-TFLite conversion details may still need investigation.

### Coral restrictions

If you want to build a TensorFlow model that takes full advantage of the Edge TPU for accelerated inferencing, the model must meet these basic requirements:

- Tensor parameters are quantized (8-bit fixed-point numbers; int8 or uint8).
- Tensor sizes are constant at compile-time (no dynamic sizes).
- Model parameters (such as bias tensors) are constant at compile-time.
- Tensors are either 1-, 2-, or 3-dimensional. If a tensor has more than 3 dimensions, then only the 3 innermost dimensions may have a size greater than 1.
- The model uses only the operations supported by the Edge TPU (see Table 1 in the official docs, linked below).

- [Official docs (Table 1 is further down the page)](https://coral.ai/docs/edgetpu/models-intro/#quantization)

### Quantization

Quantizing your model means converting all the 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers.
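To make that mapping concrete, here is a minimal NumPy sketch of an affine int8 quantization scheme. This is an illustration of the idea only, not the TFLite converter's exact algorithm; `quantize_int8` and `dequantize` are illustrative names, not library functions.

```python
import numpy as np

def quantize_int8(x):
    """Map a float32 tensor onto the int8 range [-128, 127] (affine scheme)."""
    scale = (x.max() - x.min()) / 255.0          # one quantization step in float units
    zero_point = int(np.round(-128 - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 4)).astype(np.float32)  # stand-in for a weight tensor

q, scale, zp = quantize_int8(weights)
recovered = dequantize(q, scale, zp)

# Round-trip error is at most a quantization step or two (on the order of `scale`).
print(np.abs(weights - recovered).max())
```

The real converter additionally picks per-tensor (or per-channel) scales from the representative dataset, but the float-to-int8 arithmetic is of this form.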
This makes the model smaller and faster. And although these 8-bit representations can be less precise, the inference accuracy of the neural network is not significantly affected.

For compatibility with the Edge TPU, you must use either quantization-aware training (recommended) or full integer post-training quantization.

[Quantization tooling](https://www.tensorflow.org/model_optimization?hl=zh-tw)

- Current conclusion: converting to tflite alone is not enough; the model must also be quantized to an int8 model.

#### Post-training quantization

- [Reference repo](https://github.com/lain-m21/pytorch-to-tflite-example/tree/master)
- That repo converts a PyTorch model to tflite and quantizes it, but the code shows it uses hybrid quantization, mixing int and float, which may not run correctly on the edge device. According to that year's developer-conference talk and the [official docs](https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter), full-integer conversion requires supplying part of the training dataset during conversion (a few hundred samples, per the docs).

![截圖 2024-07-09 上午11.19.30](https://hackmd.io/_uploads/B1bD3QqP0.png)

- Conclusion: whether the model comes from keras, TF, or PyTorch, it must be quantized in int8 format before it can run on Coral. Both during-training and already-trained models are supported.

#### Quantization-aware training (calibrated during training)

![截圖 2024-07-09 上午11.25.52](https://hackmd.io/_uploads/r10TTQqDC.png)

#### [TF World '19](https://www.youtube.com/watch?v=3JWRVx1OKQQ)

- The talk also mentions that pruning can be applied at the same time to increase the model's sparsity, making the overall model more efficient.

#### Experiment: model compression

Compare the sizes of models trained with and without L1 regularization.
Result: the two trained models differ in size by less than 1%.
However, applying sparsity from tensorflow_model_optimization compresses the model noticeably; in this test the compression ratio was about 1/10.

```python=
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def create_model_with_l1():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1),
                            kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu',
                            kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu',
                            kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu',
                           kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.Dense(10))
    return model

# Load the dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

# Train the L1-regularized model
model_l1 = create_model_with_l1()
model_l1.compile(optimizer='adam',
                 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 metrics=['accuracy'])
model_l1.fit(train_images, train_labels, epochs=5,
             validation_data=(test_images, test_labels))

# Save the model trained with L1 regularization
# (use the .keras extension so the size comparison below finds the file)
model_l1.save('model_with_l1.keras')

def create_model():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10))
    return model

# Train the baseline model
model_no_l1 = create_model()
model_no_l1.compile(optimizer='adam',
                    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                    metrics=['accuracy'])
model_no_l1.fit(train_images, train_labels, epochs=5,
                validation_data=(test_images, test_labels))

# Save the model trained without L1 regularization
model_no_l1.save('model_without_l1.keras')

import os
from tensorflow_model_optimization.python.core.keras.compat import keras
import tensorflow_model_optimization as tfmot

def create_small_model():
    # NOTE: a smaller architecture than the two models above
    model = keras.Sequential([
        keras.layers.InputLayer(input_shape=(28, 28)),
        keras.layers.Reshape(target_shape=(28, 28, 1)),
        keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
        keras.layers.MaxPooling2D(pool_size=(2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(10)
    ])
    return model

# Load the MNIST dataset and normalize pixels to [0, 1]
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture and the pruning schedule
model_sparse = create_small_model()
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.5,
    begin_step=2000,
    end_step=4000
)

# Apply pruning
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(
    model_sparse, pruning_schedule=pruning_schedule)
model_for_pruning.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

# Train the digit-classification model with the pruning callbacks
callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    tfmot.sparsity.keras.PruningSummaries(log_dir="test"),
]
model_for_pruning.fit(
    train_images, train_labels,
    epochs=5,
    validation_data=(test_images, test_labels),
    callbacks=callbacks
)

# Strip the pruning wrappers before saving. prune_low_magnitude clones the
# model, so the original model_sparse is never trained; save the pruned copy.
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
model_for_export.save('model_sparse.keras')

# Compare the saved model sizes
def get_model_size(model_path):
    return os.path.getsize(model_path)

size_with_l1 = get_model_size('model_with_l1.keras')
size_without_l1 = get_model_size('model_without_l1.keras')
size_sparse = get_model_size('model_sparse.keras')

print(f'Model size with sparse: {size_sparse} bytes')
print(f'Model size with L1 regularization: {size_with_l1} bytes')
print(f'Model size without L1 regularization: {size_without_l1} bytes')
```

- This is a promising approach to model compression; the detailed parameter settings still need further study.

![截圖 2024-07-09 下午3.03.02](https://hackmd.io/_uploads/HkealP5P0.png)

- Finally, the model needs to be compiled
- [Official compiler docs](https://coral.ai/docs/edgetpu/compiler/)
- The official docs describe two ways to compile:
1. A web-based compiler is available ([Colab link](https://colab.research.google.com/github/google-coral/tutorials/blob/master/compile_for_edgetpu.ipynb))
2.
Compile locally; the compiler must run on a Linux system with an x86-64 architecture.

- The content below will be filled in tomorrow
- [TensorFlow official YouTube channel](https://www.youtube.com/watch?v=Ka_qRt8_Glw)
  - The talk mentions that we can train on-device on top of TensorFlow Lite; how this works in practice still needs to be followed up.

Wednesday progress: look into how to use Hailo.

### Jetson Orin Nano installation steps

```
The board has two M.2 slots.
There is no HDMI port; it uses DisplayPort instead.
```

- The SBC currently offers two ways to set up the operating system:
  1. Flash the official image onto a microSD card, then insert it into the Jetson.
  2. Install the system on an SSD and boot from it. To install onto the SSD you must use NVIDIA SDK Manager from a computer running Ubuntu.
     - Remember to put the board into recovery mode first.
     - Use NVIDIA's official tool (NVIDIA SDK Manager) to flash the operating system onto the SSD.
     - Once flashed, the board can boot.
- [Related article](https://blog.cavedu.com/2023/05/09/jetson-orin-nano-boot/)
- [Tutorial video](https://www.youtube.com/watch?v=FX2exKW_20E)

### This week's progress

1. Continue the model training and conversion experiments
2. Survey what can currently be tested on each platform
3. Identify and resolve version problems

```python
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# 1. Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[:6000].reshape(-1, 28, 28, 1).astype('float32') / 255  # Use 1/10 of the training data
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train = to_categorical(y_train[:6000], 10)  # Use 1/10 of the training data
y_test = to_categorical(y_test, 10)

# 2. Build a smaller CNN model
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# 3. Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# 4. Train the model
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5, batch_size=32)

# 5. Save the trained model
model.save('small_cnn_mnist_model.h5')
```

```python
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import load_model

# 1. Load the pre-trained model
model = load_model('small_cnn_mnist_model.h5')

# 2. Create a representative dataset for calibration
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255

def representative_data_gen():
    for input_value in x_train[:100]:  # Use the first 100 samples for calibration
        yield [np.expand_dims(input_value, axis=0)]

# 3. Define the quantization parameters
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# 4. Convert the model to TFLite INT8 format
tflite_model = converter.convert()

# 5. Save the quantized TFLite model
with open('small_cnn_mnist_model_int8.tflite', 'wb') as f:
    f.write(tflite_model)
```

```python
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.datasets import mnist

# 1. Load the MNIST dataset
(_, _), (x_test, y_test) = mnist.load_data()
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_test = tf.keras.utils.to_categorical(y_test, 10)

# 2. Load the saved Keras model
model = load_model('small_cnn_mnist_model.h5')

# 3. Evaluate the Keras model
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Keras Model Accuracy: {accuracy * 100:.2f}%")

# 4. Load the TFLite model
interpreter = tf.lite.Interpreter(model_path='small_cnn_mnist_model_int8.tflite')
interpreter.allocate_tensors()

# 5. Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# The converted model expects int8 input, so float images must be quantized
# with the input tensor's scale and zero point before being fed in.
input_scale, input_zero_point = input_details[0]['quantization']

# 6. Evaluate the TFLite model
correct_predictions = 0
for i in range(len(x_test)):
    input_data = np.expand_dims(x_test[i], axis=0) / input_scale + input_zero_point
    input_data = np.clip(np.round(input_data), -128, 127).astype(np.int8)
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    # argmax over the int8 logits matches argmax over the dequantized logits
    if np.argmax(output_data) == np.argmax(y_test[i]):
        correct_predictions += 1

tflite_accuracy = correct_predictions / len(x_test)
print(f"TFLite INT8 Model Accuracy: {tflite_accuracy * 100:.2f}%")
```

We still need to run INT8 experiments here to make sure training vision models this way will not cause problems.
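One device-free sanity check worth starting with: because the affine int8 map is monotonic, quantizing a classifier's outputs should rarely change the argmax (the predicted class), which is why INT8 accuracy usually stays close to float accuracy. A small NumPy sketch with simulated logits (not a real model; the quantization here mirrors the affine scheme, not the converter's exact calibration):

```python
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 10)).astype(np.float32)  # simulated classifier outputs

# Affine int8 quantization over the batch's value range.
scale = (logits.max() - logits.min()) / 255.0
zero_point = int(np.round(-128 - logits.min() / scale))
q = np.clip(np.round(logits / scale) + zero_point, -128, 127).astype(np.int8)

# The affine map is monotonic, so argmax can only change when the top two
# logits sit within roughly one quantization step of each other.
agreement = float(np.mean(q.argmax(axis=1) == logits.argmax(axis=1)))
print(f"argmax agreement: {agreement:.4f}")
```

For a real vision model, the same comparison can be run between the float Keras predictions and the int8 interpreter's predictions on a held-out batch.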