Project build (face recognition)

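# Camera learning

## Using the OV2640

Because I ran into problems with the SCCB protocol, I simply bought an OV2640 instead. The board vendor already provides the driver path from the OV2640 to the RGB LCD, so I only needed to write the application. I thought this part would be easy. It turned out to be a huge trap.

### Hardware architecture

First, the hardware. On the left is the LCD: 800×480, **in ARGB8888 format**. On the right is the OV2640. The LCD was already set up, so I only studied how to use the OV2640.

![15451](https://hackmd.io/_uploads/Sy6o2YYh1x.jpg)

The development board configures the OV2640's parameters over I2C and captures the image signals through the CSI interface:

![image](https://hackmd.io/_uploads/Syj_pFt2Jx.png)

The CPU pins used are as follows:

![image](https://hackmd.io/_uploads/BknPCKt3ke.png)

### Communication protocol

Here is a short introduction to the OV2640's interface. It is very similar to the OV7670's, and I suspect the whole OV series uses this architecture.

Internal logic of the OV2640:

![image](https://hackmd.io/_uploads/ByXtz5Knyl.png)

Once you understand this diagram, the use of each pin becomes clear.

- **XVCLK:** The OV2640 needs an input clock. My OV2640 module already has an on-board 12 MHz oscillator, so I do not need to supply one. There is also a PLL that can double the clock to raise the OV2640's frame rate.
- **SIO_C and SIO_D:** These can simply be treated as an I2C bus. The driver also handles them through the I2C2 bus driver, and this is how the OV2640's parameters are configured.
- **RESETB and PWDN:** These are used for reset and power-down (low-power) mode. After rewriting parameters over I2C, a reset restarts the device so that the new settings take effect.
- **PCLK, HREF and VSYNC:** These signals accompany the image data output and act like an output clock.

Like the OV7670, the OV2640 outputs data on 8 data pins, so 8 bits are transferred at a time. The output signals look like this:

![image](https://hackmd.io/_uploads/BkA1HqFn1g.png)

Each VSYNC pulse marks the start of a new frame, that is, a new image. Each period during which HREF is high corresponds to one row of the image. Each PCLK edge means that one byte (8 bits) of data is ready to be read. Whether data is sampled on the rising or the falling edge of PCLK is configurable.

Take RGB565 (5-bit red, 6-bit green, 5-bit blue, 16 bits per pixel) at 640×480 as an example. After VSYNC, the first row has 640 pixels, and HREF stays high while they are output. Each pixel is 16 bits, so it takes two 8-bit transfers, which means two PCLK cycles per pixel (1280 per row). Between rows, HREF falls and then rises again to mark the next row. This repeats until VSYNC rises and falls again for the next frame.

## Software architecture

The software is divided into the application and the driver.

### OV2640 driver

I used the vendor-provided driver. I originally did not want to study it, but I hit a bug and ended up reading almost the whole driver. The main files are ov2640.c and mx6s_capture.c (see GitHub for the details). Both are written on top of the V4L2 framework. The overall structure is as follows:

#### mx6s_capture.c

mx6s_capture.c is the CSI driver. It handles the received data and can be shared by various camera drivers. It reads the data through DMA. It also defines the operations that the V4L2 ioctls map to.

![image](https://hackmd.io/_uploads/Skj9A3MpJg.png)

A queue can be set up here to hold the data read by DMA.

#### ov2640.c

Next is ov2640.c, which mainly configures the sensor over I2C (underneath, it is just an I2C driver). Its most important function is ov2640_set_params(). Based on the parameters passed in through ioctl, it writes the corresponding registers over I2C.

![image](https://hackmd.io/_uploads/SJVtwTMpyl.png)

To finish the I2C configuration, we only need to set the values we want in the relevant registers. In each entry, the first value is the register to write and the second is the value to write. Adjust them according to the datasheet.

![image](https://hackmd.io/_uploads/H1RADaGakx.png)

In my opinion, the most important OV2640 register is 0xDA. It is IMAGE_MODE in the DSP register bank and selects the output format:

![image](https://hackmd.io/_uploads/By4Ud6Ma1l.png)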
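As a rough illustration of what register 0xDA controls, the sketch below builds its value from bit fields.

- The bit layout (output format in bits [3:2], byte swap in bit 0) follows the register definitions in the mainline Linux ov2640 driver, not the vendor driver used here. Treat it as an assumption and check it against the datasheet.
- `dvp_format` and `ov2640_image_mode` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* OV2640 register 0xDA (IMAGE_MODE, DSP bank). Bit values follow the
 * mainline Linux ov2640 driver; assumed here, verify against the datasheet. */
#define OV2640_IMAGE_MODE       0xDA
#define IMAGE_MODE_YUV422       0x00  /* DVP output format, bits [3:2] = 00 */
#define IMAGE_MODE_RAW10        0x04  /* bits [3:2] = 01 */
#define IMAGE_MODE_RGB565       0x08  /* bits [3:2] = 10 */
#define IMAGE_MODE_LBYTE_FIRST  0x01  /* byte swap: low byte first (e.g. UYVY instead of YUYV) */

/* Hypothetical format selector used only in this sketch */
enum dvp_format { DVP_YUV422, DVP_RAW10, DVP_RGB565 };

/* Compose the value to write into 0xDA for a given format and byte order */
static uint8_t ov2640_image_mode(enum dvp_format fmt, int low_byte_first)
{
    uint8_t val;

    assert(fmt == DVP_YUV422 || fmt == DVP_RAW10 || fmt == DVP_RGB565);
    switch (fmt) {
    case DVP_RAW10:
        val = IMAGE_MODE_RAW10;
        break;
    case DVP_RGB565:
        val = IMAGE_MODE_RGB565;
        break;
    default:
        val = IMAGE_MODE_YUV422;
        break;
    }
    if (low_byte_first)
        val |= IMAGE_MODE_LBYTE_FIRST;  /* swap the two bytes of each 16-bit unit */
    return val;
}
```

For example, `{ OV2640_IMAGE_MODE, ov2640_image_mode(DVP_RGB565, 0) }` gives the {register, value} pair 0xDA = 0x08. This is the same shape as the entries in the driver's register tables.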
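No manual configuration is needed here. The register is set according to the format passed in through ioctl, and the image size works the same way.

![image](https://hackmd.io/_uploads/B1a5u6G6yx.png)

The supported formats include YUYV, UYVY, RGB565 and others.

![image](https://hackmd.io/_uploads/ryo6uafp1l.png)

#### V4L2 ioctl flow

This section shows how the V4L2 driver dispatches an ioctl. It starts with the ioctl call in the application:

![image](https://hackmd.io/_uploads/HJelBaf61l.png)

In the kernel, V4L2 ioctls are handled in v4l2-ioctl.c, which contains the table below. The first argument of the macro is the ioctl command. When that ioctl is issued, the second argument, the corresponding handler function, is called.

![image](https://hackmd.io/_uploads/ry4YBTMTJl.png)

![image](https://hackmd.io/_uploads/HysmbTGT1x.png)

Inside the handler, a different case runs depending on the `type` field of the structure passed in. `ret` is then set by calling the corresponding driver function through a macro. I have forgotten exactly where that macro is defined, but that is the principle.

![image](https://hackmd.io/_uploads/r1WfGTfTke.png)

Finally, this function runs and sets up the initial CSI parameters:

![image](https://hackmd.io/_uploads/HyNaXpMTJx.png)

Likewise, the I2C configuration in ov2640.c is also triggered through ioctl.

### OV2640 application

The application essentially uses ioctl to control the device parameters, and the rest of the flow is fairly standard. For example, setting the RGB format:

![image](https://hackmd.io/_uploads/Hy5ZsaGTyl.png)

V4L2 buffers are used here to store the images captured from the OV2640. The buffers mean that the driver and the application do not have to wait for each other. Because the image width and height were set above, the buffer size is known. Drivers really are convenient.

![image](https://hackmd.io/_uploads/rJsb2pM61x.png)

The RGB LCD is already configured in the board's device tree, and I did not look into how it works. Writing the captured data into the LCD's mmap'ed framebuffer memory is enough to display it. Mapping the address:

![image](https://hackmd.io/_uploads/ryEya6fayx.png)

Code that writes to the mapped memory:

```c
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* v4l2_fd, screen_base, buf_infos, width/height (LCD) and
 * frm_width/frm_height (camera) are globals set up elsewhere. */
static void v4l2_read_data(void)
{
    struct v4l2_buffer buf = {0};
    unsigned int *base;     /* ARGB8888: 4 bytes per pixel */
    unsigned short *start;  /* RGB565: 2 bytes per pixel */
    int min_w, min_h;
    int j, i;

    /* Only copy the region that fits both the LCD and the camera frame */
    if (width > frm_width)
        min_w = frm_width;
    else
        min_w = width;
    if (height > frm_height)
        min_h = frm_height;
    else
        min_h = height;

    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;

    for ( ; ; ) {
        for (buf.index = 0; buf.index < FRAMEBUFFER_COUNT; buf.index++) {
            ioctl(v4l2_fd, VIDIOC_DQBUF, &buf);  /* dequeue a filled buffer */

            for (j = 0, base = (unsigned int *)screen_base,
                 start = buf_infos[buf.index].start; j < min_h; j++) {
                for (i = 0; i < min_w; i++) {
                    unsigned short pixel = start[i];  /* one RGB565 pixel */
                    unsigned int r, g, b, argb;

                    /* Expand each channel to 8 bits by replicating its top bits */
                    r = ((pixel & 0xF800) >> 8) | ((pixel & 0xF800) >> 13);  /* 5-bit -> 8-bit */
                    g = ((pixel & 0x07E0) >> 3) | ((pixel & 0x07E0) >> 9);   /* 6-bit -> 8-bit */
                    b = ((pixel & 0x001F) << 3) | ((pixel & 0x001F) >> 2);   /* 5-bit -> 8-bit */

                    /* ARGB8888: A(8) | R(8) | G(8) | B(8); alpha 0xFF = opaque */
                    argb = (0xFFu << 24) | (r << 16) | (g << 8) | b;
                    base[i] = argb;  /* write into the LCD framebuffer */
                }
                base += width;       /* next LCD line (in ARGB8888 pixels) */
                start += frm_width;  /* next camera line (in RGB565 pixels) */
            }

            ioctl(v4l2_fd, VIDIOC_QBUF, &buf);  /* re-queue the buffer */
        }
    }
}
```

#### The bug I ran into

Next is the bug that had me stuck for more than two weeks. I used someone else's application code directly, because their board and display size were exactly the same as mine. Only their camera was different (an OV5640). Unexpectedly, when I ran it, the image came out like this:

![15448](https://hackmd.io/_uploads/SJ1uCpGakx.jpg)

The original application code was:

```c
static void v4l2_read_data(void)
{
    struct v4l2_buffer buf = {0};
    unsigned short *base;
    unsigned short *start;
    int min_w, min_h;
    int j;

    if (width > frm_width)
        min_w = frm_width;
    else
        min_w = width;
    if (height > frm_height)
        min_h = frm_height;
    else
        min_h = height;

    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;

    for ( ; ; ) {
        for (buf.index = 0; buf.index < FRAMEBUFFER_COUNT; buf.index++) {
            ioctl(v4l2_fd, VIDIOC_DQBUF, &buf);  // dequeue
            for (j = 0, base = screen_base, start = buf_infos[buf.index].start; j < min_h; j++) {
                // Copies 2 bytes per pixel (RGB565), but this LCD expects 4 (ARGB8888)
                memcpy(base, start, min_w * 2);
                base += width;       // meant as "next LCD line", but only advances half a line here
                start += frm_width;  // next line of camera data
            }
            // Re-queue the buffer once it has been processed, and repeat
            ioctl(v4l2_fd, VIDIOC_QBUF, &buf);
        }
    }
}
```

Searching online for similar problems, everyone said it was a format error, such as YUYV being read as UYVY. So I kept making changes in that direction and went through the driver to check whether something was misconfigured. Whatever I tried, the result was about the same:

![15447](https://hackmd.io/_uploads/ByaEJRf6Jx.jpg)

Eventually I decided the driver was very unlikely to be wrong. The code was cleanly written and dated from around 2009, so any bug would have been fixed long ago. I could also read the sensor ID over I2C, so I was sure the I2C side was fine and that the sensor had been set to RGB565.

Then I started asking why the image was only a quarter of its size, and repeated, when I had set the size to 640×480. It suddenly hit me that the problem was the screen. The captured data is RGB565, but the RGB LCD is ARGB8888, so the data has to be converted:

- RGB565 uses 2 bytes per pixel and ARGB8888 uses 4, so each `memcpy` row fills only half as many screen pixels.
- Because `base` is an `unsigned short *`, `base += width` advances only half an LCD line. Two camera rows therefore land on each LCD line, with even rows on the left and odd rows on the right.
- As a result, the image shrinks to a quarter and appears twice side by side.

Only then did I understand what firmware people mean by "channeling the spirits": the answer really does come to you out of nowhere. It also taught me how much memory layout matters in firmware. Both sides must use the same format. The result after the fix:

![15446](https://hackmd.io/_uploads/r1OSRpz61e.jpg)

And after adjusting the focus:

![15445](https://hackmd.io/_uploads/SkHICaM6Jl.jpg)

## Camera learning: using the OV7670 and writing a driver (in progress)

The planned hardware wiring and control scheme is as follows:

![image](https://hackmd.io/_uploads/BydP88Vikg.png)

1. RESET and 3.3V connect to the board's 3.3 V supply.
2. GND and PWDN (leaving PWDN unconnected is also fine) connect to the board's ground.
3. XCLK is supplied at 20 MHz by the PWM driver the board vendor already provides. 12-48 MHz is recommended, ideally a multiple of 4. I confirmed that the PWM works by blinking an LED at 10 Hz, but checking it at high frequency needs an oscilloscope, which I could not afford, so I gave up on that.
4.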
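   XCLK and D7-D0 use GPIO to read the timing and image data (not yet written).

I am currently stuck on a big problem with the SCCB protocol. My plan is to drive it with I2C, but for some reason I cannot read the device ID. My code is based on someone else's and is not very different from it, so I suspect the problem is the XCLK supply, which I generate as a 20 MHz PWM signal. Some articles say pull-up resistors are needed while others say they are not. I am waiting for the resistors I ordered from Shopee before testing again. There is also an OV7670 variant with a FIFO, but after the OV2640 I have had enough for now, so I will look into it later.

If this works out, I can learn how to write a V4L2 driver. That would be more convenient than brute-force reading the sensor with I2C + GPIO.

# Learning to deploy AI on embedded systems

## Model selection

I chose YOLOv8 because my thesis was based on it and I am familiar with it. The latest version is already YOLOv12. AI really moves fast: when I graduated it had only reached v9, and in about half a year it has advanced three versions. I wanted to challenge myself to run YOLO on the i.MX6ULL without Python. I already used Python for my master's thesis, so doing it that way again would not be very interesting.

## Why TFLite?

The board has only a CPU, so the model needs to be quantized to reduce computation and speed up inference. My original plan was:

1. Quantize the model into the TFLite (TensorFlow Lite) format.
2. Copy the shared library provided by TensorFlow into the i.MX6ULL's lib directory.
3. Run it with code someone else had already written.

However, Docker's networking kept failing when I used the Docker image provided by TensorFlow (at least I learned how to use Docker). So I gave up and switched to ONNX.

ONNX differs from TFLite in inference speed, but TFLite was simply too hard to use, so I dropped it. The idea behind ONNX is to convert YOLO's original .pt format into .onnx, which is better suited to deployment on different platforms. A .pt file is a PyTorch format and can only run in PyTorch, whereas ONNX works across the three major frameworks. The downside is that the parameters cannot be reduced the way TFLite's can.

I ran ONNX from C++, since someone had already written ready-made code. It turned out, however, that ONNX Runtime does not support 32-bit ARM processors like this one. That lack of support makes sense, since even TFLite runs extremely slowly here.

## Preparation

Copy the OpenCV shared libraries onto the board. TFLite is built only as a static library, so it is linked in at compile time and no shared library needs to be installed.

## Model usage

All you need to do is feed the trained model and the image through the TFLite API, and the model outputs its predictions as a tensor of shape [1, 5, 8400]. I worked out the tensor shapes by printing them step by step in Python. Doing machine learning in C++ is painful: every tensor has to be handled by hand, and ChatGPT is not reliable.

The dimensions of the output are as follows:

- **1** is the batch dimension (one image per inference).
- **5** holds x, y, width, height and confidence: four box values plus a single class score, since I only detect faces. The box coordinates are normalized to [0, 1], which is why the drawing code multiplies them by the image width and height.
- **8400** is the number of candidate predictions.

The data is stored channel-major: all 8400 x values come first, then all y values, and so on. With these five values per candidate, OpenCV's `cv::dnn::NMSBoxes` and `cv::rectangle` are enough to draw the bounding boxes. Code that processes the tensor with OpenCV and draws on the image:

```cpp
#include <opencv2/dnn.hpp>
#include <opencv2/imgproc.hpp>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

using namespace cv;
using namespace cv::dnn;
using namespace std;

// boxes holds (xCenter, yCenter, w, h) in coordinates normalized to [0, 1]
pair<vector<vector<float>>, vector<float>> applyNMS(const vector<vector<float>>& boxes,
                                                    const vector<float>& confidences,
                                                    float iouThreshold)
{
    // Rect2d keeps the normalized float coordinates; truncating them to int
    // (as Rect does) would collapse every box into a zero-size rectangle
    vector<Rect2d> nmsBoxes;
    for (const auto& box : boxes) {
        double xCenter = box[0];
        double yCenter = box[1];
        double w = box[2];
        double h = box[3];
        nmsBoxes.emplace_back(xCenter - w / 2, yCenter - h / 2, w, h);
    }

    vector<int> indices;
    NMSBoxes(nmsBoxes, confidences, 0.5f, iouThreshold, indices);

    vector<vector<float>> filteredBoxes;
    vector<float> filteredConfidences;
    for (int index : indices) {  // each element is already the index of a kept box
        filteredBoxes.push_back(boxes[index]);  // keep the original float coordinates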
        filteredConfidences.push_back(confidences[index]);
    }
    return make_pair(filteredBoxes, filteredConfidences);
}

// outputData is the [1, 5, numDetections] output tensor in channel-major order:
// row 0 = x, 1 = y, 2 = w, 3 = h, 4 = confidence (coordinates normalized to [0, 1])
void draw_predictions(Mat& img, float* outputData, int numDetections, int imgWidth, int imgHeight)
{
    vector<vector<float>> boxes;
    vector<float> confidences;

    for (int i = 0; i < numDetections; i++) {
        float x = outputData[(numDetections * 0) + i];
        float y = outputData[(numDetections * 1) + i];
        float w = outputData[(numDetections * 2) + i];
        float h = outputData[(numDetections * 3) + i];
        float conf = outputData[(numDetections * 4) + i];
        if (conf > 0.5f) {
            printf("x=%f,y=%f,w=%f,h=%f\r\n", x, y, w, h);
            boxes.push_back({x, y, w, h});  // relative coordinates
            confidences.push_back(conf);
        }
    }

    // Apply NMS
    auto [nmsBoxes, nmsConfidences] = applyNMS(boxes, confidences, 0.4f);

    // Draw the boxes, scaling the normalized coordinates to pixels
    for (size_t i = 0; i < nmsBoxes.size(); i++) {
        float xCenter = nmsBoxes[i][0];
        float yCenter = nmsBoxes[i][1];
        float w = nmsBoxes[i][2];
        float h = nmsBoxes[i][3];
        float conf = nmsConfidences[i];

        int xMin = (int)((xCenter - w / 2.0) * imgWidth);
        int yMin = (int)((yCenter - h / 2.0) * imgHeight);
        int xMax = (int)((xCenter + w / 2.0) * imgWidth);
        int yMax = (int)((yCenter + h / 2.0) * imgHeight);
        printf("xm=%d,ym=%d,xmax=%d,ymax=%d\r\n", xMin, yMin, xMax, yMax);

        rectangle(img, Point(xMin, yMin), Point(xMax, yMax), Scalar(0, 255, 0), 4);
        // Single-class model: the output has no class-id column, so the class is always 0
        string label = "Class 0 " + to_string(conf);
        putText(img, label, Point(xMin, yMin - 10), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 255, 0), 1);
    }
}
```

Finally, to practice threading, I split the program into threads. However, there is no GPU, so the task is CPU-bound and I/O is only a small part of it. As a result, the speedup is not noticeable.

The threaded part of the program (three threads: capture, inference and LCD output):

```cpp
// Excerpt: headers, globals (v4l2_fd, buf_infos, min_w/min_h, frm_width, the
// mutexes, condition variables and flags) and load_model() are defined elsewhere.

// Capture thread: dequeue a frame and hand a copy to the inference thread
static void v4l2_read(unsigned short *save)
{
    struct v4l2_buffer buf = {0};
    buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    buf.memory = V4L2_MEMORY_MMAP;

    while (true) {
        for (buf.index = 0; buf.index < FRAMEBUFFER_COUNT; buf.index++) {
            ioctl(v4l2_fd, VIDIOC_DQBUF, &buf);  // dequeue
            {
                // Wait until the inference thread has consumed the previous frame
                unique_lock<mutex> lock(v42tfl);
                v4cv.wait(lock, [] { return !buffer; });
                memcpy(save, buf_infos[buf.index].start,
                       min_w * min_h * 2);  // RGB565: 2 bytes per pixel
                buffer = true;
            }
            tfcv.notify_one();                  // wake the inference thread
            ioctl(v4l2_fd, VIDIOC_QBUF, &buf);  // re-queue
        }
    }
}

/////////////////////////////////////////////////////////////////////////
// OpenCV image processing and TFLite model initialization
static void tflite_predict(const string &model_path, unsigned short *save, uint8_t *bgr_buffer)
{
    auto model = load_model(model_path);
    if (!model) {
        cerr << "Failed to load model: " << model_path << endl;
        exit(1);
    }

    cv::Mat image(min_h, min_w, CV_8UC3);
    uint8_t *bgr_save = (uint8_t *)malloc(sizeof(uint8_t) * 3 * min_w * min_h);

    ops::builtin::BuiltinOpResolver resolver;
    InterpreterBuilder builder(*model, resolver);
    unique_ptr<Interpreter> interpreter;
    builder(&interpreter);
    if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) {
        cerr << "Failed to initialize the interpreter" << endl;
        exit(1);
    }

    int input_idx = interpreter->inputs()[0];
    TfLiteTensor *input_tensor = interpreter->tensor(input_idx);
    // Image inputs are NHWC: dims = [1, height, width, channels]
    int input_height = input_tensor->dims->data[1];
    int input_width = input_tensor->dims->data[2];
    int input_channels = input_tensor->dims->data[3];
    cv::Mat rgb;
    cv::Mat reimage(input_height, input_width, CV_8UC3);

    while (true) {
        {
            unique_lock<mutex> lock(v42tfl);
            tfcv.wait(lock, [] { return buffer; });  // wait for a new frame

            // RGB565 -> 8 bits per channel, stored in OpenCV's BGR byte order
            uint8_t *base = bgr_save;
            unsigned short *start = save;
            for (int j = 0; j < min_h; j++) {
                for (int i = 0; i < min_w; i++) {
                    unsigned short pixel = start[i];
                    base[3 * i + 2] = ((pixel & 0xF800) >> 8) | ((pixel & 0xF800) >> 13);  // R
                    base[3 * i + 1] = ((pixel & 0x07E0) >> 3) | ((pixel & 0x07E0) >> 9);   // G
                    base[3 * i]     = ((pixel & 0x001F) << 3) | ((pixel & 0x001F) >> 2);   // B
                }
                base += 3 * min_w;   // bgr_save rows are min_w pixels wide
                start += frm_width;  // camera rows are frm_width pixels wide
            }
            buffer = false;
        }
        v4cv.notify_one();  // wake the V4L2 thread

        memcpy(image.data, bgr_save, 3 * min_w * min_h);
        // The model takes RGB scaled to [0, 1]; image stays BGR for drawing and bgr_buffer
        cvtColor(image, rgb, COLOR_BGR2RGB);
        cv::resize(rgb, reimage, cv::Size(input_width, input_height));
        reimage.convertTo(reimage, CV_32F, 1.0 / 255);
        if (image.channels() != input_channels) {
            cerr << "Image channel count does not match the model" << endl;
            exit(1);
        }
        memcpy(interpreter->typed_input_tensor<float>(0), reimage.data,
               reimage.total() * reimage.elemSize());
        if (interpreter->Invoke() != kTfLiteOk) {
            cerr << "Inference failed" << endl;
            exit(1);
        }

        int output_idx = interpreter->outputs()[0];
        TfLiteTensor *output_tensor = interpreter->tensor(output_idx);
        float *output_data = interpreter->typed_output_tensor<float>(0);
        int num_detections = output_tensor->dims->data[2];  // output shape [1, 5, N]
        printf("num_detections = %d\r\n", num_detections);
        draw_predictions(image, output_data, num_detections, image.cols, image.rows);

        {
            // Wait until the LCD thread has taken the previous result
            unique_lock<mutex> lock1(tf2lcl);
            lcv.wait(lock1, [] { return !op; });
            memcpy(bgr_buffer, image.data, 3 * min_w * min_h);
            op = true;
        }
        lcv.notify_one();  // wake the LCD thread
    }
    free(bgr_save);
}

static int fb_dev_init(void)
{
    struct fb_var_screeninfo fb_var = {0};
    struct fb_fix_screeninfo fb_fix = {0};
    unsigned long screen_size;

    /* Open the framebuffer device */
    fb_fd = open(FB_DEV, O_RDWR);
    if (0 > fb_fd) {
        fprintf(stderr, "open error: %s: %s\n", FB_DEV, strerror(errno));
        return -1;
    }

    /* Get the framebuffer's variable and fixed screen information */
    ioctl(fb_fd, FBIOGET_VSCREENINFO, &fb_var);
    ioctl(fb_fd, FBIOGET_FSCREENINFO, &fb_fix);

    screen_size = fb_fix.line_length * fb_var.yres;
    width = fb_var.xres;
    height = fb_var.yres;

    /* Map the framebuffer into memory */
    screen_base = (unsigned long *) mmap(NULL, screen_size, PROT_READ | PROT_WRITE, MAP_SHARED, fb_fd, 0);
    if (MAP_FAILED == (void *)screen_base) {
        perror("mmap error");
        close(fb_fd);
        return -1;
    }

    /* Fill the LCD background with white */
    memset(screen_base, 0xFF, screen_size);
    return 0;
}
```
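The capture and inference threads above hand frames over through a single-slot buffer guarded by one mutex and two condition variables:

- `buffer` records whether the slot is full.
- `v4cv` wakes the producer.
- `tfcv` wakes the consumer.

Below is a minimal, self-contained sketch of that same pattern using only the C++ standard library. The class `FrameSlot`, the function `run_pipeline` and the integer "frames" are hypothetical stand-ins for the real image buffers. This is a sketch of the pattern, not the program's actual code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Single-slot handoff: the producer blocks while the slot is full and the
// consumer blocks while it is empty (the roles of buffer / v4cv / tfcv above).
class FrameSlot {
public:
    void put(const std::vector<int>& frame) {
        std::unique_lock<std::mutex> lock(mtx_);
        not_full_.wait(lock, [this] { return !full_; });  // like v4cv.wait(... !buffer)
        data_ = frame;                                     // like memcpy(save, ...)
        full_ = true;
        lock.unlock();
        not_empty_.notify_one();                           // like tfcv.notify_one()
    }

    std::vector<int> take() {
        std::unique_lock<std::mutex> lock(mtx_);
        not_empty_.wait(lock, [this] { return full_; });  // like tfcv.wait(... buffer)
        std::vector<int> frame = data_;                    // copy out while holding the lock
        full_ = false;
        lock.unlock();
        not_full_.notify_one();                            // like v4cv.notify_one()
        return frame;
    }

private:
    std::mutex mtx_;
    std::condition_variable not_full_, not_empty_;
    std::vector<int> data_;
    bool full_ = false;
};

// Runs one producer and one consumer over `count` fake frames and returns
// the first value of every frame the consumer received, in arrival order.
std::vector<int> run_pipeline(int count) {
    assert(count >= 0);
    FrameSlot slot;
    std::vector<int> received;

    std::thread producer([&] {
        for (int i = 0; i < count; i++)
            slot.put(std::vector<int>(4, i));  // "frame" i filled with the value i
    });
    std::thread consumer([&] {
        for (int i = 0; i < count; i++)
            received.push_back(slot.take()[0]);  // stand-in for inference on frame i
    });

    producer.join();
    consumer.join();
    return received;
}
```

The slot holds only one frame, so the capture thread blocks whenever inference falls behind, and no frame is lost or reordered. The i.MX6ULL has a single Cortex-A7 core, so threads like these can overlap waiting on I/O but cannot run the inference math in parallel. That is consistent with the small speedup noted above.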
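For reference, the decoding and NMS steps can also be sketched without OpenCV. The sketch assumes the channel-major layout that `draw_predictions` reads (`outputData[numDetections * k + i]`, that is, shape [1, 5, N]) with normalized center-format boxes.

- `Det`, `iou` and `decode_and_nms` are hypothetical names.
- The greedy IoU-based NMS is a simplified stand-in for `cv::dnn::NMSBoxes`, not its exact implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Det { float x, y, w, h, conf; };  // center-format box, normalized coordinates

// Intersection over union of two center-format boxes
float iou(const Det& a, const Det& b) {
    float ax0 = a.x - a.w / 2, ay0 = a.y - a.h / 2, ax1 = a.x + a.w / 2, ay1 = a.y + a.h / 2;
    float bx0 = b.x - b.w / 2, by0 = b.y - b.h / 2, bx1 = b.x + b.w / 2, by1 = b.y + b.h / 2;
    float iw = std::max(0.0f, std::min(ax1, bx1) - std::max(ax0, bx0));
    float ih = std::max(0.0f, std::min(ay1, by1) - std::max(ay0, by0));
    float inter = iw * ih;
    float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0 ? inter / uni : 0.0f;
}

// out points at a [1, 5, n] tensor stored channel-major:
// out[k * n + i] is value k (x, y, w, h, conf) of candidate i.
std::vector<Det> decode_and_nms(const float* out, int n, float conf_thr, float iou_thr) {
    std::vector<Det> cands;
    for (int i = 0; i < n; i++) {
        Det d{out[0 * n + i], out[1 * n + i], out[2 * n + i], out[3 * n + i], out[4 * n + i]};
        if (d.conf > conf_thr) cands.push_back(d);  // confidence filter
    }
    // Greedy NMS: visit boxes by descending confidence and drop any box that
    // overlaps an already kept box by more than iou_thr
    std::sort(cands.begin(), cands.end(),
              [](const Det& a, const Det& b) { return a.conf > b.conf; });
    std::vector<Det> kept;
    for (const Det& d : cands) {
        bool suppressed = false;
        for (const Det& k : kept)
            if (iou(d, k) > iou_thr) { suppressed = true; break; }
        if (!suppressed) kept.push_back(d);
    }
    return kept;
}
```

As in `draw_predictions`, the kept boxes are still normalized. Multiply x and w by the image width, and y and h by the image height, to get pixel coordinates.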