[Google Coral USB Accelerator and RPI](https://chtseng.wordpress.com/2019/06/29/google-coral-usb-accelerator%E9%96%8B%E7%AE%B1/)
[Converting Nvidia models to TensorRT](https://d246810g2000.medium.com/nvidia-jetson-nano-for-jetpack-4-4-03-%E8%BD%89%E6%8F%9B%E5%90%84%E7%A8%AE%E6%A8%A1%E5%9E%8B%E6%A1%86%E6%9E%B6%E5%88%B0-onnx-%E6%A8%A1%E5%9E%8B-17adcece9c34)
## Package installation notes
### ai-edge-torch
[github repo](https://github.com/google-ai-edge/ai-edge-torch)
Requirements and dependencies:
- Python versions: 3.9, 3.10, 3.11
- Operating system: Linux
- PyTorch: torch
- TensorFlow: tf-nightly
```
Installing tf-nightly takes a long time and the connection seems unstable,
so remember to set a larger default timeout:
pip --default-timeout=1000 install tf-nightly

torchaudio, torchvision, and torch each have pinned matching versions;
if the versions do not line up, the install fails.
```
### 7/8
```
ai-edge-torch has been installed successfully on Google Colab.
Local installation ran into many version problems: Python 3.8–3.11 were all tried, and installs on both Windows and Mac failed, so a Linux environment was built in a VM instead.
The tf-nightly timeout problem came up again, so remember to raise the default timeout:
pip --default-timeout=1000 install xxxx
Installation takes about 30 minutes; it now succeeds on Ubuntu 22.04. (Progress as of Monday noon.)
```
```
Afternoon experiments
1. Experiment 1: PyTorch pretrained model to tflite
   Used ai-edge-torch to export a PyTorch model as .tflite.
   Tested successfully at this experimental stage.
2. Experiment 2: tensorflow.keras to TensorFlow Lite
   Tested successfully.
3. Experiment 3: PyTorch to TensorFlow Lite
   - [the examples in the official docs suggest this is supported](https://github.com/google-ai-edge/ai-edge-torch/tree/853301630f2b2455bd2e2f73d8a47e1a1534c91c/ai_edge_torch/generative/examples)
   Not tested yet.
```
- [huggingface to tflite](https://huggingface.co/docs/transformers/tflite)
According to the document above, we can save pre-trained models from Hugging Face. The document also notes that local models can be saved, so as long as we can load a pre-trained model and fine-tune it, we get a customized model.
When saving a local model, the weights and the tokenizer files must all be placed in the same folder.
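To make the same-folder requirement concrete, here is a small hypothetical helper (not part of any Hugging Face API); the file names follow the usual Hugging Face layout of a `config.json`, one weights file, and tokenizer files, though the exact set can vary by model:

```python
import os

# Typical file names in a Hugging Face model directory (illustrative set).
WEIGHT_FILES = {"pytorch_model.bin", "model.safetensors", "tf_model.h5"}
TOKENIZER_FILES = {"tokenizer.json", "tokenizer_config.json", "vocab.txt"}

def is_complete_model_dir(path):
    """Check that a local directory holds both the model weights and the
    tokenizer files side by side, as from_pretrained() expects."""
    files = set(os.listdir(path))
    has_config = "config.json" in files
    has_weights = bool(files & WEIGHT_FILES)
    has_tokenizer = bool(files & TOKENIZER_FILES)
    return has_config and has_weights and has_tokenizer
```

Running such a check before `from_pretrained(local_dir)` gives a clearer error than letting the loader fail partway.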
Tuesday progress
The Jetson Inference environment is a demo suite, apparently a Docker container or image, with .sh scripts provided for quick launch.
It ships three common pre-trained CV models: imagenet for Image Recognition, detectNet for Object Detection, and segNet for semantic segmentation.

- If the model does not need to be very complex, these models are worth trying.
## How to use Coral

Per the official workflow diagram, current progress can convert a tensorflow.keras model into the .tflite format; the plain tf-to-tflite path still needs to be looked into.
### Coral constraints
If you want to build a TensorFlow model that takes full advantage of the Edge TPU for accelerated inferencing, the model must meet these basic requirements:
- Tensor parameters are quantized (8-bit fixed-point numbers; int8 or uint8).
- Tensor sizes are constant at compile-time (no dynamic sizes).
- Model parameters (such as bias tensors) are constant at compile-time.
- Tensors are either 1-, 2-, or 3-dimensional. If a tensor has more than 3 dimensions, then only the 3 innermost dimensions may have a size greater than 1.
- The model uses only the operations supported by the Edge TPU (see table 1 below).
- [Official docs; Table 1 appears below the quantization section](https://coral.ai/docs/edgetpu/models-intro/#quantization)
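The dimensionality rule can be made concrete with a tiny illustrative check (the function name is ours, not part of any Coral API):

```python
def edge_tpu_shape_ok(shape):
    """True if a tensor shape obeys the Edge TPU rule: with more than
    3 dimensions, only the 3 innermost ones may be larger than 1."""
    return all(dim == 1 for dim in shape[:-3])

print(edge_tpu_shape_ok((1, 224, 224, 3)))   # True: outer dim of size 1 is fine
print(edge_tpu_shape_ok((2, 3, 8, 8, 16)))   # False: two outer dims > 1
```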
### Quantization
Quantizing your model means converting all the 32-bit floating-point numbers (such as weights and activation outputs) to the nearest 8-bit fixed-point numbers. This makes the model smaller and faster. And although these 8-bit representations can be less precise, the inference accuracy of the neural network is not significantly affected.
For compatibility with the Edge TPU, you must use either quantization-aware training (recommended) or full integer post-training quantization.
[Quantization tools](https://www.tensorflow.org/model_optimization?hl=zh-tw)
- Current conclusion: converting to tflite alone is not enough; the model also has to be quantized to an int8 model.
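As a numeric sketch of what int8 quantization does (the affine scheme TFLite uses, `real = scale * (q - zero_point)`; the weight values here are made up):

```python
import numpy as np

# Affine int8 quantization: real_value = scale * (quantized_value - zero_point)
weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)

# Map the observed float range onto the 256 int8 levels.
scale = float(weights.max() - weights.min()) / 255.0
zero_point = int(round(-128 - float(weights.min()) / scale))

q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)
dq = scale * (q.astype(np.float32) - zero_point)  # dequantized approximation

# The round trip loses at most about one quantization step.
print(np.max(np.abs(dq - weights)) <= scale)  # True
```

This per-tensor error of at most one step is why the docs can claim accuracy is "not significantly affected" for most models.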
#### Post-training quantization
- [Reference repo](https://github.com/lain-m21/pytorch-to-tflite-example/tree/master)
- That repo converts a PyTorch model to tflite and quantizes it, but its code shows hybrid quantization, mixing int and float, which may fail to run on the edge device. According to that year's developer-conference talk and the [official docs](https://www.tensorflow.org/api_docs/python/tf/lite/TFLiteConverter), the conversion process must be given a small part of the training dataset (the docs say a few hundred samples is enough).

- Conclusion: whether keras, tf, or pytorch is used, running on Coral requires quantizing the model to the int8 format; both in-training and post-training quantization of a trained model are supported.
#### Quantization-aware training (calibrated during training)

#### [TF World '19](https://www.youtube.com/watch?v=3JWRVx1OKQQ)
- The talk also mentions that pruning can be used to increase the model's sparsity, making the overall model more efficient.
#### Experiment: model compression
Compare the sizes of models trained with and without L1 regularization.
Result: the two trained models differ in size by less than 1%.
After applying sparsity via tensorflow_model_optimization, however, the model shrinks noticeably; the compression ratio measured here is about 1/10.
```python=
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def create_model_with_l1():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1),
                            kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu',
                            kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu',
                            kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l1(0.01)))
    model.add(layers.Dense(10))
    return model

# Load the dataset
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255

# Train the model
model_l1 = create_model_with_l1()
model_l1.compile(optimizer='adam',
                 loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 metrics=['accuracy'])
model_l1.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

# Save the L1-regularized model (.keras extension, so the size check below finds it)
model_l1.save('model_with_l1.keras')

def create_model():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10))
    return model

# Train the model without L1 regularization
model_no_l1 = create_model()
model_no_l1.compile(optimizer='adam',
                    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                    metrics=['accuracy'])
model_no_l1.fit(train_images, train_labels, epochs=5, validation_data=(test_images, test_labels))

# Save the model without L1 regularization
model_no_l1.save('model_without_l1.keras')

import tensorflow_model_optimization as tfmot
from tensorflow_model_optimization.python.core.keras.compat import keras

def create_small_model():
    model = keras.Sequential([
        keras.layers.InputLayer(input_shape=(28, 28)),
        keras.layers.Reshape(target_shape=(28, 28, 1)),
        keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
        keras.layers.MaxPooling2D(pool_size=(2, 2)),
        keras.layers.Flatten(),
        keras.layers.Dense(10)
    ])
    return model

# Load the MNIST dataset and normalize pixel values to [0, 1]
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture and the pruning schedule.
model_sparse = create_small_model()
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,
    final_sparsity=0.5,
    begin_step=2000,
    end_step=4000
)

# Apply pruning
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model_sparse,
                                                             pruning_schedule=pruning_schedule)
model_for_pruning.compile(optimizer='adam',
                          loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                          metrics=['accuracy'])

# Train the digit classification model with the pruning callbacks
callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    tfmot.sparsity.keras.PruningSummaries(log_dir="test"),
]
model_for_pruning.fit(
    train_images,
    train_labels,
    epochs=5,
    validation_data=(test_images, test_labels),
    callbacks=callbacks
)

# Strip the pruning wrappers before saving; saving model_sparse directly
# would save the untrained original instead of the pruned model.
model_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)
model_export.save('model_sparse.keras')

import os

# File size of a saved model on disk
def get_model_size(model_path):
    return os.path.getsize(model_path)

size_with_l1 = get_model_size('model_with_l1.keras')
size_without_l1 = get_model_size('model_without_l1.keras')
size_sparse = get_model_size('model_sparse.keras')
print(f'Model size with sparse: {size_sparse} bytes')
print(f'Model size with L1 regularization: {size_with_l1} bytes')
print(f'Model size without L1 regularization: {size_without_l1} bytes')
```
- This is a promising model-compression approach; the detailed parameter settings still need more study.

- Finally, the model must be compiled.
- [Official docs: Compiler](https://coral.ai/docs/edgetpu/compiler/)
- The official docs describe two ways to compile a model:
1. A web-based flow: [Colab link](https://colab.research.google.com/github/google-coral/tutorials/blob/master/compile_for_edgetpu.ipynb)
2. Compile locally; the compiler must run on a Linux system with an x86-64 architecture.
- The remaining content will be filled in tomorrow.
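For the local route, compilation is a single command; the sketch below only shows the naming convention, with the actual compiler call commented out since it needs the `edgetpu-compiler` package on x86-64 Linux (the input file name is assumed to be the one produced by the quantization step):

```shell
# Input: an int8-quantized .tflite model (name assumed from the earlier step).
INPUT=small_cnn_mnist_model_int8.tflite

# Actual compilation (requires the edgetpu-compiler package, x86-64 Linux only):
# edgetpu_compiler "$INPUT"

# The compiler writes its output next to the input, appending _edgetpu:
OUTPUT="${INPUT%.tflite}_edgetpu.tflite"
echo "$OUTPUT"   # small_cnn_mnist_model_int8_edgetpu.tflite
```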
- [Official TensorFlow YouTube channel](https://www.youtube.com/watch?v=Ka_qRt8_Glw)
- The talk mentions that training on-device on top of TensorFlow Lite is possible; how to actually use it still needs follow-up.
Wednesday progress
Investigate how to use Hailo
### Jetson Orin Nano installation steps
```
Two M.2 slots on the board
No HDMI output; it uses DisplayPort instead
```
- The SBC offers two ways to set up its operating system:
1. Flash the official image onto a microSD card and insert it into the Jetson Orin Nano.
2. Install and boot the system from an SSD. To do this, the OS must be installed with NVIDIA SDK Manager from a computer running Ubuntu.
- Remember to put the board into recovery mode first.
- Use NVIDIA's official tool (NVIDIA SDK Manager) to flash the operating system onto the SSD.
- Once flashed, the board can boot.
- [Related article](https://blog.cavedu.com/2023/05/09/jetson-orin-nano-boot/)
- [Tutorial video](https://www.youtube.com/watch?v=FX2exKW_20E)
### This week's progress
1. Continue the model training and conversion experiments
2. Survey what can currently be tested on each platform
3. Identify and resolve version problems
```
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# 1. Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train[:6000].reshape(-1, 28, 28, 1).astype('float32') / 255  # Use 1/10 of the training data
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train = to_categorical(y_train[:6000], 10)  # Use 1/10 of the training data
y_test = to_categorical(y_test, 10)

# 2. Build a smaller CNN model
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# 3. Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# 4. Train the model
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=5, batch_size=32)

# 5. Save the trained model
model.save('small_cnn_mnist_model.h5')
```
```
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import load_model

# 1. Load the pre-trained model
model = load_model('small_cnn_mnist_model.h5')

# 2. Create a representative dataset for calibration
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255

def representative_data_gen():
    for input_value in x_train[:100]:  # Use the first 100 samples for calibration
        yield [np.expand_dims(input_value, axis=0)]

# 3. Define the quantization parameters
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# 4. Convert the model to TFLite INT8 format
tflite_model = converter.convert()

# 5. Save the quantized TFLite model
with open('small_cnn_mnist_model_int8.tflite', 'wb') as f:
    f.write(tflite_model)
```
```
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.datasets import mnist

# 1. Load the MNIST dataset
(_, _), (x_test, y_test) = mnist.load_data()
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_test = tf.keras.utils.to_categorical(y_test, 10)

# 2. Load the saved Keras model
model = load_model('small_cnn_mnist_model.h5')

# 3. Evaluate the Keras model
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Keras Model Accuracy: {accuracy * 100:.2f}%")

# 4. Load the TFLite model
interpreter = tf.lite.Interpreter(model_path='small_cnn_mnist_model_int8.tflite')
interpreter.allocate_tensors()

# 5. Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# 6. Evaluate the TFLite model. The converter set the input type to int8,
#    so each float input must be quantized with the input scale/zero-point
#    before being fed to the interpreter.
input_scale, input_zero_point = input_details[0]['quantization']
correct_predictions = 0
for i in range(len(x_test)):
    input_data = np.expand_dims(x_test[i], axis=0)
    input_data = np.round(input_data / input_scale + input_zero_point).astype(np.int8)
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    if np.argmax(output_data) == np.argmax(y_test[i]):
        correct_predictions += 1

tflite_accuracy = correct_predictions / len(x_test)
print(f"TFLite INT8 Model Accuracy: {tflite_accuracy * 100:.2f}%")
```
Further int8 experiments are still needed here to make sure training vision models will not run into problems.