# MCU Basic & Advanced Study Note
> Why argue? Just mix AI and MCUs together and do tinyML!
> https://ithelp.ithome.com.tw/users/20141396/ironman/4855
> AI Embedded System Algorithm optimization and Implementation
> https://www.tenlong.com.tw/products/9787111693253
:::info
:information_source: **Creating a Project with CMSIS-NN Support in STM32CubeIDE**
Porting the CMSIS-NN neural network library in STM32CubeIDE
https://www.bilibili.com/video/BV16J411w731/
I have created a project that already includes CMSIS-DSP and CMSIS-NN. Please refer to the GitHub repository.
*TODO : Upload to GitHub*
:::
## Platform Used
***Used in CMSIS related projects***
* Seeed Studio XIAO nRF52840
https://www.seeedstudio.com/Seeed-XIAO-BLE-nRF52840-p-5201.html
* STM32F767ZI
https://www.st.com/en/evaluation-tools/nucleo-f767zi.html
:::info
:bulb: CMSIS Reference
*Common Microcontroller Software Interface Standard*
https://www.keil.com/pack/doc/CMSIS/General/html/index.html
*\[Day 14\] tinyML Development Frameworks (2): An Introduction to Arm CMSIS*
https://ithelp.ithome.com.tw/articles/10273236
:::
***Used in TinyML related projects***
* Arduino Nano 33 BLE Sense
https://docs.arduino.cc/hardware/nano-33-ble-sense/
## MCU Communication Interfaces
Reference \:
> * \[Maker Advanced\] Getting to Know the UART, I2C, and SPI Interfaces
> https://makerpro.cc/2016/07/learning-interfaces-about-uart-i2c-spi/
### SPI \(Serial Peripheral Interface\)
> Reference \:
> * \[STM32\] 18-SPI
> https://medium.com/%E9%96%B1%E7%9B%8A%E5%A6%82%E7%BE%8E/stm32-18-spi-679573f11c31
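A point the SPI references above spend time on is clock polarity and phase; the four SPI "modes" are simply the two CPOL/CPHA bits combined into one number. A minimal host-side sketch (plain Python, independent of any MCU) of the mapping:

```python
# SPI mode number = (CPOL << 1) | CPHA
# CPOL: idle level of SCK (0 = idle low, 1 = idle high)
# CPHA: 0 = sample on the first clock edge, 1 = sample on the second
def spi_mode(cpol: int, cpha: int) -> int:
    return (cpol << 1) | cpha

for cpol in (0, 1):
    for cpha in (0, 1):
        print(f"Mode {spi_mode(cpol, cpha)}: CPOL={cpol}, CPHA={cpha}")
```

Master and slave must agree on the mode, or every transfer is sampled on the wrong edge and reads back garbage.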
### UART
> Reference \:
> * \[STM32\] 19-UART
> https://medium.com/%E9%96%B1%E7%9B%8A%E5%A6%82%E7%BE%8E/stm32-19-uart-8f104abc0798
A good tutorial on how to debug UART and what can go wrong:
**\[A Brief Look at the UART Protocol\] Testing and analyzing several common causes of UART errors. \[UART Serial Port\]\[CH340\]\[MaxBuadRate\]\[UAR_ErrorRate\]\[UART_Recovery\] by [阿吉米德](https://www.youtube.com/@modernagmid)**
https://www.youtube.com/watch?v=ZJaQryS0lHk
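One error source the video's title points at is baud-rate mismatch. A rough host-side sketch of why it happens (plain Python; the integer-divisor, 16x-oversampling baud generator modeled here is an assumption that matches many classic UARTs, e.g. AVR ones, but not every peripheral):

```python
def uart_baud_error(f_clk: float, target_baud: int, oversample: int = 16):
    """Return (actual_baud, relative_error) for an integer-divisor baud generator."""
    divisor = round(f_clk / (oversample * target_baud))
    actual = f_clk / (oversample * divisor)
    return actual, (actual - target_baud) / target_baud

# Classic problem case: 115200 baud from a 16 MHz clock
actual, err = uart_baud_error(16e6, 115200)
print(f"actual = {actual:.0f} baud, error = {err * 100:+.2f}%")
```

With a 16 MHz clock the nearest divisor gives roughly -3.5% error at 115200 baud, while 9600 baud lands well under 1%. A common rule of thumb is to keep each side's error below about 2-3%, since the drift accumulates across the 10-bit frame (start + 8 data + stop) and eventually samples the wrong bit.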
### I2C
> Reference \:
> * \[STM32\] 20-I2C
> https://medium.com/%E9%96%B1%E7%9B%8A%E5%A6%82%E7%BE%8E/stm32-20-i2c-c4c9ce6d1c3a
> * I2C For Hackers: The Basics
> https://hackaday.com/2024/08/07/i2c-for-hackers-the-basics/
> * I2C For Hackers: Digging Deeper
> https://hackaday.com/2024/09/05/i2c-for-hackers-digging-deeper/
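A classic pitfall discussed in the Hackaday articles above is 7-bit vs 8-bit address confusion: the byte actually seen on the wire is the 7-bit address shifted left with the R/W bit appended. A small host-side sketch (plain Python; the 0x3C example address is just an illustration):

```python
def i2c_address_byte(addr7: int, read: bool) -> int:
    """First byte on the wire: 7-bit address << 1, LSB = R/W (1 = read)."""
    assert 0 <= addr7 <= 0x7F, "I2C addresses are 7 bits"
    return (addr7 << 1) | int(read)

addr = 0x3C  # e.g. a common SSD1306 OLED display address
print(f"write byte: {i2c_address_byte(addr, False):#04x}")  # 0x78
print(f"read  byte: {i2c_address_byte(addr, True):#04x}")   # 0x79
```

Some datasheets quote the shifted 8-bit value (0x78) instead of the 7-bit address (0x3C), which is why the same device appears under two "addresses" in different docs.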
## Cortex Microcontroller Software Interface Standard \(CMSIS\) Framework
> Reference \:
> https://www.keil.com/pack/doc/CMSIS/General/html/index.html
## CMSIS-DSP
> Reference \:
> https://www.keil.com/pack/doc/CMSIS/DSP/html/index.html
* Basic math functions
* Fast math functions
* Complex math functions
* Filtering functions
* Matrix functions
* Transform functions
* Motor control functions
* Statistical functions
* Support functions
* Interpolation functions
* Support Vector Machine functions (SVM)
* Bayes classifier functions
* Distance functions
* Quaternion functions
### Matrix Calculation
In the following code, the same matrix multiplication is demonstrated on a float32 matrix and on a q31 one. Notice that there is no difference in their outputs, but extra conversions to and from float are needed for the quantized version.
:::info
:bulb: ***Quantized Types***
Refer to other post.
> AI Embedded Systems Algorithm Optimization and Implementation
> https://hackmd.io/@Erebustsai/Sk39malXa
:::
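As a quick sanity check of what `arm_float_to_q31` produces, q31 maps [-1, 1) onto a signed 32-bit integer by scaling with 2^31; the values also pass through float32 precision first, since that is what the MCU stores. A host-side sketch in Python/NumPy (the exact rounding of the CMSIS implementation may differ in the last bit):

```python
import numpy as np

def float_to_q31(x: float) -> int:
    """Scale a value in [-1, 1) by 2**31, rounding through float32 first."""
    return int(round(float(np.float32(x)) * 2**31))

print(float_to_q31(0.1))  # 214748368, matching the q31 values in the output below
print(float_to_q31(0.5))  # 1073741824 (exactly 2**30, since 0.5 is exact in float32)
```

Note that 0.1 * 2^31 is 214748364.8 in exact arithmetic; the printed 214748368 reflects float32's rounding of 0.1, not an error.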
***Example Full Code***
```cpp
/*
 * @author ErebusTsai
 * @note In most of my work, I use colSize, rowSize instead of nrow, ncol respectively.
 */
#include "arm_math.h"
#include "Adafruit_TinyUSB.h"

const float32_t A_f32[4]{ 0.1, 0.2, 0.3, 0.4 };
q31_t A_q31[4];
const float32_t B_f32[16]{
  0.1, 0.2, 0.3, 0.4,
  0.5, 0.6, 0.7, 0.8,
  0.9, 0.1, 0.11, 0.12,
  0.13, 0.14, 0.15, 0.16
};
q31_t B_q31[16];

float32_t Y_f32[4];   // f32 result
q31_t Y_q31[4];       // q31 result
float32_t Y_f32q[4];  // q31 result converted back to f32

arm_status status;  // CMSIS-DSP calculation result status
arm_matrix_instance_f32 A;
arm_matrix_instance_f32 B;
arm_matrix_instance_f32 Y;
arm_matrix_instance_q31 Aq;
arm_matrix_instance_q31 Bq;
arm_matrix_instance_q31 Yq;
uint32_t srcRows, srcColumns;

void setup() {
  Serial.begin(9600);
  while (!Serial)
    ;
  Serial.print("XIAO-nRF52840 Start\n");

  // Initialize the f32 matrices
  Serial.print("Start Initial f32 Matrix\n");
  srcRows = 1;
  srcColumns = 4;
  arm_mat_init_f32(&A, srcRows, srcColumns, (float32_t*)A_f32);
  srcRows = 4;
  srcColumns = 4;
  arm_mat_init_f32(&B, srcRows, srcColumns, (float32_t*)B_f32);
  srcRows = 1;
  srcColumns = 4;
  arm_mat_init_f32(&Y, srcRows, srcColumns, (float32_t*)Y_f32);

  // Initialize the q31 matrices (convert the f32 data first)
  Serial.print("Start Initial q31 Matrix\n");
  arm_float_to_q31(A_f32, A_q31, 4);
  arm_float_to_q31(B_f32, B_q31, 16);
  srcRows = 1;
  srcColumns = 4;
  for (uint32_t i = 0; i < srcRows; ++i) {
    for (uint32_t j = 0; j < srcColumns; ++j) {
      Serial.print(A_q31[i * srcColumns + j]);  // row-major: row * nColumns + col
      Serial.print(" ");
    }
    Serial.print("\n");
  }
  srcRows = 4;
  srcColumns = 4;
  for (uint32_t i = 0; i < srcRows; ++i) {
    for (uint32_t j = 0; j < srcColumns; ++j) {
      Serial.print(B_q31[i * srcColumns + j]);
      Serial.print(" ");
    }
    Serial.print("\n");
  }
  srcRows = 1;
  srcColumns = 4;
  arm_mat_init_q31(&Aq, srcRows, srcColumns, (q31_t*)A_q31);
  srcRows = 4;
  srcColumns = 4;
  arm_mat_init_q31(&Bq, srcRows, srcColumns, (q31_t*)B_q31);
  srcRows = 1;
  srcColumns = 4;
  arm_mat_init_q31(&Yq, srcRows, srcColumns, (q31_t*)Y_q31);
}

void loop() {
  Serial.print("Start f32 matrix multiplication\n");
  status = arm_mat_mult_f32(&A, &B, &Y);
  if (status != ARM_MATH_SUCCESS) {
    Serial.print("FAILURE\n");
    while (true)
      ;
  }
  Serial.print("Start q31 matrix multiplication\n");
  status = arm_mat_mult_q31(&Aq, &Bq, &Yq);
  if (status != ARM_MATH_SUCCESS) {
    Serial.print("FAILURE\n");
    while (true)
      ;
  }
  Serial.print("Convert q31 output to f32\n");
  arm_q31_to_float(Yq.pData, Y_f32q, 4);
  // Y is 1x4: srcRows = 1 and srcColumns = 4 still hold from setup()
  for (uint32_t i = 0; i < srcRows; ++i) {
    for (uint32_t j = 0; j < srcColumns; ++j) {
      Serial.print(Y.pData[i * srcColumns + j]);
      Serial.print(" ");
    }
    Serial.print("\n");
  }
  for (uint32_t i = 0; i < srcRows; ++i) {
    for (uint32_t j = 0; j < srcColumns; ++j) {
      Serial.print(Y_f32q[i * srcColumns + j]);
      Serial.print(" ");
    }
    Serial.print("\n");
  }
}
```
***Output***
```bash
04:48:30.686 -> XIAO-nRF52840 Start
04:48:30.686 -> Start Initial f32 Matrix
04:48:30.686 -> Start Initial q31 Matrix
04:48:30.686 -> 214748368 429496736 644245120 858993472
04:48:30.686 -> 214748368 429496736 644245120 858993472
04:48:30.686 -> 1073741824 1288490240 1503238528 1717986944
04:48:30.686 -> 1932735232 214748368 236223200 257698032
04:48:30.686 -> 279172864 300647712 322122560 343597376
04:48:30.686 -> Start f32 matrix multiplication
04:48:30.686 -> Start q31 matrix multiplication
04:48:30.686 -> Convert q31 output to f32
04:48:30.686 -> 0.43 0.23 0.26 0.30
04:48:30.686 -> 0.43 0.23 0.26 0.30
```
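The printed result can be cross-checked on the host with NumPy (same A and B values as the sketch above):

```python
import numpy as np

A = np.array([[0.1, 0.2, 0.3, 0.4]], dtype=np.float32)
B = np.array([[0.1,  0.2,  0.3,  0.4],
              [0.5,  0.6,  0.7,  0.8],
              [0.9,  0.1,  0.11, 0.12],
              [0.13, 0.14, 0.15, 0.16]], dtype=np.float32)
Y = A @ B  # 1x4 row vector times 4x4 matrix
print(np.round(Y, 2))  # approximately [[0.43 0.23 0.26 0.3]]
```

This matches the two serial lines above: `Serial.print` shows floats with two decimals, so 0.432, 0.226, 0.263, 0.300 print as 0.43 0.23 0.26 0.30.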
## Intro to TinyML with TFLM \(TensorFlow Lite for Microcontrollers\)
:::info
:bulb: **Hardware Used**
* *GPU info from Tensorflow*
```
2024-04-09 12:14:13.664609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:81:00.0 name: Tesla P40 computeCapability: 6.1
coreClock: 1.531GHz coreCount: 30 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 323.21GiB/s
```
* *Tensorflow Version*
```
>>> tf.version.VERSION
'2.4.0'
```
:::
### Application of TinyML
* Predictive Maintenance \: monitoring vibration, torque, and other signals to check whether a machine is functioning correctly.
* Healthcare
* Agriculture
* Voice-assisted devices
* Ocean-life conservation
:::info
:bulb: **FlatBuffer**
https://zhuanlan.zhihu.com/p/391109273
:::
## Hello World Project Tensorflow Lite
Reference \: https://www.researchgate.net/figure/Tensorflow-Lite-Workflow-adapted-from-Cavagnis-2023_fig1_376715530
### Convert Tensorflow Model to TFLite Model
In the following code snippet, the tensorflow model in `h5` format is converted and saved as a TFLite model.
```python
import pathlib
import tensorflow as tf
from tensorflow.keras.models import load_model

baseline_model = load_model('baseline_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(baseline_model)
tflite_model = converter.convert()
tflite_models_dir = pathlib.Path("./")
tflite_models_file = tflite_models_dir/'model.tflite'
tflite_models_file.write_bytes(tflite_model)
```
Another way to convert a model to a TFLite model is to first create a graph representation of the model, also known as a *concrete function*.
```python
# export model as a concrete function
func = tf.function(baseline_model).get_concrete_function(
tf.TensorSpec(baseline_model.inputs[0].shape, baseline_model.inputs[0].dtype)
)
# serialized graph representation of the concrete function
func.graph.as_graph_def()
# converting the concrete function to TfLite
converter = tf.lite.TFLiteConverter.from_concrete_functions([func])
tflite_model = converter.convert()
tflite_models_dir = pathlib.Path("./")
tflite_models_file = tflite_models_dir/'model_concrete.tflite'
tflite_models_file.write_bytes(tflite_model)
```
:::info
:bulb: **TensorFlow 2.x: An Introduction to tf.function and AutoGraph**
https://blog.csdn.net/qq_40913465/article/details/104604979
> Reference \: Book P117
> We can further convert the baseline model to a TensorFlow graph using tf.function, which contains all the computational operations, variables, and weights.
:::
### Use TFLite Interpreter
:::info
:bulb: A Good Article Series about TFLite
https://hackmd.io/@yillkid/ByQ7ySDT8/https%3A%2F%2Fhackmd.io%2F%40yillkid%2FrkUlAjkGF
:::
```python
tflite_model_file = 'model_concrete.tflite'
interpreter = tf.lite.Interpreter(model_path=tflite_model_file)
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]['index']
output_index = interpreter.get_output_details()[0]['index']
pred_list = []
for images in X_test:
input_data = np.array(images, dtype=np.float32)
input_data = input_data.reshape(1, input_data.shape[0], input_data.shape[1], 1)
interpreter.set_tensor(input_index, input_data)
interpreter.invoke()
prediction = interpreter.get_tensor(output_index)
prediction = np.argmax(prediction)
pred_list.append(prediction)
```
### Quantization
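This section is still a stub. As a starting point, post-training quantization maps float tensors to int8 with an affine transform, q = round(x / scale) + zero_point. The sketch below shows the arithmetic in plain NumPy; it follows the common asymmetric int8 scheme, not the internals of any particular converter:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Asymmetric affine quantization of a float tensor to int8."""
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)  # range must include 0
    scale = (x_max - x_min) / 255.0  # 256 int8 levels span the float range
    zero_point = int(round(-128 - x_min / scale))  # int8 code representing 0.0
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, s, z = quantize_int8(x)
print(q, s, z)
print(dequantize(q, s, z))  # close to x, within one quantization step
```

In TFLite itself this is driven by the converter (setting `converter.optimizations` and, for full-integer quantization, a representative dataset), but the per-tensor scale/zero-point math is the same idea.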
## ESP-IDF vs Arduino Framework Performance
The following video compares the performance of binaries produced by the ESP-IDF and Arduino frameworks.
https://www.youtube.com/watch?v=O-7rPkya4Yw