2020q3 Image Beautifier

--- title: '2020q3 Image Beautifier' tags: linux2020 --- # 2020q3 Image Beautifier contributed by < `ChongMingWei` > --- github: [skin-deep](https://github.com/ChongMingWei/skin-deep) 相關討論:[Ugly](https://www.facebook.com/groups/system.software2020/permalink/449520442648827/), [champ1](https://www.facebook.com/champ.yen/posts/10220212674120306), [champ2](https://www.facebook.com/champ.yen/posts/10220219876340357), [champ3](https://www.facebook.com/champ.yen/posts/10220233197793385) ## Outline [TOC] ## Development Environment :::spoiler {state="open"} ``` cpp $ uname -a Linux cmw-System-Product-Name 5.4.0-47-generic #51~18.04.1-Ubuntu SMP Sat Sep 5 14:35:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux $ gcc --version gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 ``` ::: ## Functions ### int main(int argc, char *argv[]) >補充沒看過的 api #### int getopt(int argc, char * const argv[], const char *optstring); > [Parsing program options using getopt](https://www.gnu.org/software/libc/manual/html_node/Getopt.html) Used to parse the options in command lines. Normally, getopt is called in a loop. When getopt returns -1, indicating no more options are present, the loop terminates. - Varibles 1. int **opterr**: Set to 0 to show error message to the standard error stream if it encounters an unknown option character or an option with a missing required argument. ``` Ex. cmd> ./main -i mark.jpg -l ./main: option requires an argument -- 'l' ifn:mark.jpg ofn:out.jpg level:10 detect - 5843 us denoise - 32325 us ``` :::spoiler 圖示 ![](https://i.imgur.com/lN02VFl.png) ::: 2. int **optopt**: When getopt encounters an unknown option character or an option with a missing required argument, it stores that option character in this variable. 3. int **optind**: This variable is set by getopt to the index of the next element of the argv array to be processed. Once getopt has found all of the option arguments, you can use this variable to determine where the remaining non-option arguments begin. The initial value of this variable is 1. 4. char **\*optarg**: This variable is set by getopt to **point at the value of the option argument**, for those options that accept arguments. - Parameters 1. `argc` and `argv` are the argument count and array as passed to the main() function on program invocation. 2. `optstring` is a string that specifies the option characters that are valid for this program. An option character in this string can be **followed by a colon (‘:’) to indicate that it takes a required argument.** If an option character is **followed by two colons (‘::’), its argument is optional**; this is a GNU extension. - Return value The getopt function returns the option character for the next command line option. When no more option arguments are available, it returns -1. #### unsigned char *data = stbi_load(filename, &x, &y, &n, 0); >[stb_image.h](https://github.com/nothings/stb/blob/master/stb_image.h) Used to load image. - Parameters 1. **filename**: char* type, the filename of image. 2. **x**: int* type, to be assigned the width of the image. 3. **y**: int* type, to be assigned the height of the image. 4. **n**: int* type, to be assigned the channel number of the image. 5. **0**: int type, [0,4], if non-zero, # of channels requested in result. Otherwise # of channels of the image - Return value The pixel data consists of \*y scanlines of \*x pixels, with each pixel consisting of N interleaved 8-bit components; the first pixel pointed to is top-left-most in the image. | N | Components | | -------- | -------- | | 0 | depends on the number of channles | | 1 | grey | | 2 | grey, alpha | | 3 | red, green, blue | | 4 | red, green, blue, alpha | #### int stbi_write_jpg(char const *filename, int w, int h, int comp, const void *data, int quality); > [stb_image_write.h](https://github.com/nothings/stb/blob/master/stb_image_write.h) Used to save .jpg image. - Parameters 1. char const **\*filename**: the filename of output image. 2. int **w**: width of output image. 3. int **h**: height of output image. 4. int **comp**: the channel number of output image. | N | Components | | -------- | -------- | | 1 | grey | | 2 | grey, alpha | | 3 | red, green, blue | | 4 | red, green, blue, alpha | 6. const void **\*data**: the pixel data. 7. int **quality**: quality level of jpg format [0, 100] - Return value: 1 for success and 0 for failure. ### unsigned int detect(uint8_t *pixel, uint8_t **plane, int width, int height, int channels) 針對每一個 pixel ，分別計算在以自己為起點的 2 \* 2 範圍中的 R~avg~, G~avg~, B~avg~ 計算符合以下條件的 pixel 數： 1. (R~avg~, G~avg~, B~avg~) 分別大於 (60, 40, 20) 2. R~avg~ - G~avg~ >= 10 3. max(R~avg~, G~avg~, B~avg~) - min(R~avg~, G~avg~, B~avg~) >= 10 ### void compute_offset(int \*out, int len, int left, int right, int step) 建立row 和 column 的 LUT(Look Up Table)，用來便於查詢每一個 pixel 的 window 範圍內 row 和 column 的所有 index 位置。要注意的是，因為原圖像沒有做 padding 所以 window 超出原圖範圍的部份，會以重複圖像內部的 pixel 來遞補。 - left: window 的左半部長度 - right: window 的右半部長度 - step: In the same row, step of different columns is **the number of channels**. In the same column, step of different rows is **(width \* number of channels)** ### void denoise(uint8_t \*out, uint8_t \*in, int \*smooth_table, int width, int height, int channels, int radius) 以每一個 pixel 為中心，以 window 範圍內的所有 pixel 灰階值來計算新的結果。 - smooth_table: 根據輸入的 `smoothing_level` 計算得來 - radius: 形成(2\*radius + 1)為邊長的 window ### void denoise2(uint8_t \*out, uint8_t \*\*planes, int \*smooth_table, int width, int height, int channels, int ch_idx, int radius) denoise 的平行化版本，將 RGB 3個 channels 的灰階值做平行處理 ## Process ### Argument parsing 在 `main` 裡面解析 command line 的參數，取得: 1. input image name 2. smooth level 3. output image name :::spoiler ```c=310 while ((opt = getopt (argc, argv, "i:l:o:")) != -1){ switch(opt){ case 'i': { ifn = optarg; break; } case 'l': { smoothing_level = atoi(optarg); smoothing_level = (smoothing_level < 1 ? 1 : (smoothing_level > 20 ? 20 : smoothing_level)); break; } case 'o': { ofn = optarg; break; } } } ``` ::: ### Load image 在 `main` 裡面使用 `stb` library 讀取圖片的資料 :::spoiler 程式碼 ```c=333 uint8_t *in = stbi_load(ifn, &width, &height, &channels, 0); ``` ::: 其中 `in` 為指向第一個 pixel R channel 灰階值的指標，在 `stbi_load` 中會完成記憶體的分配。 $\begin{matrix} width*height*channels \\ \overbrace{ RGBRGB \cdots RGB } \end{matrix}$ $\ \ \ \ \ \ \uparrow$ $\ \ \ \ \ \ \color{Red}{in}$ ### Detect skin area 呼叫 `detect` 函式，計算皮膚部分佔全體 pixels 的百分比，目的是用於決定後面 `denoise` 的 window 大小 :::spoiler 程式碼 ```c=350 float rate = detect(in, in_planes, width, height, channels) / (float) dimension * 100; ``` ::: 其中 `detect` 函式內部會篩選出屬於皮膚的顏色，下圖繪出符合資格的顏色，整體偏粉色系。 :::spoiler 繪圖程式碼和結果圖 ```python= import matplotlib.pyplot as plt import numpy as np %matplotlib inline # Get pixels qualified pixels = [] for r in range(60, 256): for g in range(40, 256): if r > 20: for b in range(20, r): if r - g >= 10: pixels.append([r,g,b]) # Draw color blocks nppixels= np.asarray(pixels) nppixels= np.reshape(nppixels,(1680,2114,3)) plt.figure(figsize = (120,151)) plt.imshow(nppixels, interpolation='nearest', aspect='auto') ``` ![](https://i.imgur.com/3FsRq9F.png) ::: ### Compute smooth table 針對 0~255 的灰階值 i ，建立 LUT ，公式如下: $smooth_table[i] = max(1,\ exp^{\tfrac{-i}{255\times smooth\_level}\ +\ smooth\_level\times(i+1)+1})$ :::spoiler 程式碼 ```c=358 int smooth_table[256] = {0}; float ii = 0.f; for (int i = 0; i <= 255; i++, ii -= 1.) { smooth_table[i] = ( expf(ii * (1.0f / (smoothing_level * 255.0f))) + (smoothing_level * (i + 1)) + 1 ) / 2; smooth_table[i] = max(smooth_table[i], 1); } ``` ::: ### Denoise 包含兩種實作的方法， `denoise` 為原始未做平行化處理的版本， `denoise2` 為 RGB 三通道平行化的版本 :::spoiler 程式碼 ```c=367 #if OPT_PLANE #pragma omp parallel for for(int i = 0; i < channels; i++) denoise2(out, in_planes, smooth_table, width, height, channels, i, min(width, height)/rate + 1); #else denoise(out, in, smooth_table, width, height, channels, min(width, height) / rate + 1); #endif ``` ::: 兩種實作方式都會先使用 `compute_offset` 函式來計算影像中，每一個 pixel 的 window 範圍為何，並且分別建立 row 和 column 的 LUT :::spoiler 程式碼 ```c=72 void compute_offset(int *out, int len, int left, int right, int step) { assert(out); assert((len >= 0) && (left >= 0) && (right >= 0)); for (int x = -left; x < len + right; x++) { int pos = x; int len2 = 2 * len; if (pos < 0) { do { pos += len2; } while (pos < 0); } else if (pos >= len2) { do { pos -= len2; } while (pos >= len2); } if (pos >= len) pos = len2 - 1 - pos; out[x + left] = pos * step; } } ``` ::: 在 `denoise` 中， `col_pow` 和 `col_val` 兩個變數分別是用來記錄一個 row 之中的所有 pixel，其 window 範圍內，固定 column 後所有 row 之 pixel 灰階值的平方和以及和。(**不同通道分開計算**) ```c=108 int *col_pow = malloc(width * channels * sizeof(int)); int *col_val = malloc(width * channels * sizeof(int)); ``` 不論是 row 或者是 column 的 LUT ，大小皆為 (2\*radius (*window's*) + width/height (*image's*)) \* channels ```c=110 int *row_pos = malloc((width + 2 * radius) * channels * sizeof(int)); int *col_pos = malloc((height + 2 * radius) * channels * sizeof(int)); ``` 在完成上述準備後，便可以開始計算以每一個 pixel 為中心， window 範圍內的所有 pixel 之灰階值平方和以及和: 計算方法為計算目前 pixel 為中心，在同一個 row 和 column 兩邊延伸 `radius` 長度， ![](https://i.imgur.com/JUWr6fB.png) :::info 若是 window 的範圍超出原本的影像範圍，則會重複計算，例如: OpenCV 中的 padding 策略 [BORDER_REFLECT_101](https://docs.opencv.org/master/d2/de8/group__core__array.html#gga209f2f4869e304c82d07739337eae7c5afe14c13a4ea8b8e3b3ef399013dbae01) `gfedcb|abcdefgh|gfedcba` 如下圖 ::: ![](https://i.imgur.com/mfDK8fU.png) 在同一個 row 中， `radius` 範圍內的所有 pixel 個別加總該和 pixel 同一個 column 且在 `radius` 範圍內的灰階值平方和以及和 ![](https://i.imgur.com/ekV67co.png) :::info window 的範圍超出原本的影像範圍的情形，綠色和橘色有部分重複計算的情形，如下圖 ::: ![](https://i.imgur.com/PLORZf6.png) 在一個 row 的所有 pixel 完成計算後，只要把在 `radius` 範圍內的所有 `col_pow` 和 `col_val` 加起來就能得到 window 內所有 pixel 灰階值的平方和以及和了。在遍歷完第一個 row 的所有 pixel 後，接下來每一個 row 的 `col_pow` 和 `col_val` 只需要把每個 pixel 所對應的舊 window 最上方 pixel 的灰階值移除掉，接著加上新 window 最下方 pixel 的灰階值即可。得到 window 範圍內的所有 pixel 灰階值之平方和以及和，就可以接著做 denoise 的計算: > $\vec{q}\ is\ pixel\ inside\ window\\ > \vec{p}\ is\ current\ pixel\\ > I\ is\ gray\ intensit\ value\\ > CLAMP2BYTE(v):\ 將v控制在[0,255]的區間$ $mean = \dfrac{\sum_{\vec{q} \in window}I_{\vec{q}}}{window\ size}\\ diff = \dfrac{\sum_{\vec{q} \in window}I_{\vec{q}}}{window\ size} - I_{\vec{p}}\\ edge = CLAMP2BYTE(diff)\\ masked\_edge = \dfrac{edge\times I_{\vec{p}} + (256-edge)\times mean}{256}\\ var=\dfrac{\sum_{\vec{q} \in window}I_{\vec{q}}^2-mean\times \sum_{\vec{q} \in window}I_{\vec{q}}}{window\ size}\\ out=masked\_edge-\dfrac{diff\times var}{var+smooth_table[I_{\vec{p}}]}$ :::spoiler 程式碼 ```c=118 int *row_off = row_pos + radius; int *col_off = col_pos + radius; for (int y = 0; y < height; y++) { uint8_t *scan_in_line = in + y * stride; uint8_t *scan_out_line = out + y * stride; if (y == 0) { for (int x = 0; x < stride; x += channels) { int col_sum[3] = {0}; int col_sum_pow[3] = {0}; for (int z = -radius; z <= radius; z++) { uint8_t *sample = in + col_off[z] + x; for (int c = 0; c < channels; ++c) { col_sum[c] += sample[c]; col_sum_pow[c] += sample[c] * sample[c]; } } for (int c = 0; c < channels; ++c) { col_val[x + c] = col_sum[c]; col_pow[x + c] = col_sum_pow[c]; } } } else { uint8_t *last_col = in + col_off[y - radius - 1]; uint8_t *next_col = in + col_off[y + radius]; for (int x = 0; x < stride; x += channels) { for (int c = 0; c < channels; ++c) { col_val[x + c] -= last_col[x + c] - next_col[x + c]; col_pow[x + c] -= last_col[x + c] * last_col[x + c] - next_col[x + c] * next_col[x + c]; } } } int prev_sum[3] = {0}, prev_sum_pow[3] = {0}; for (int z = -radius; z <= radius; z++) { int index = row_off[z]; for (int c = 0; c < channels; ++c) { prev_sum[c] += col_val[index + c]; prev_sum_pow[c] += col_pow[index + c]; } } for (int c = 0; c < channels; ++c) { int mean = prev_sum[c] / window_size; int diff = mean - scan_in_line[c]; int edge = CLAMP2BYTE(diff); int masked_edge = (edge * scan_in_line[c] + (256 - edge) * mean) >> 8; int var = (prev_sum_pow[c] - mean * prev_sum[c]) / window_size; int out = masked_edge - diff * var / (var + smooth_table[scan_in_line[c]]); scan_out_line[c] = CLAMP2BYTE(out); } scan_in_line += channels, scan_out_line += channels; for (int x = 1; x < width; x++) { int last_row = row_off[x - radius - 1]; int next_row = row_off[x + radius]; for (int c = 0; c < channels; ++c) { prev_sum[c] -= col_val[last_row + c] - col_val[next_row + c]; prev_sum_pow[c] = prev_sum_pow[c] - col_pow[last_row + c] + col_pow[next_row + c]; int mean = prev_sum[c] / window_size; int diff = mean - scan_in_line[c]; int edge = CLAMP2BYTE(diff); int masked_edge = (edge * scan_in_line[c] + (256 - edge) * mean) >> 8; int var = (prev_sum_pow[c] - mean * prev_sum[c]) / window_size; int out = masked_edge - diff * var / (var + smooth_table[scan_in_line[c]]); scan_out_line[c] = CLAMP2BYTE(out); } scan_in_line += channels, scan_out_line += channels; } } ``` ::: ## Analysis Test image info ``` cmd> jpeginfo jserv.jpg jserv.jpg 515 x 391 24bit Exif N 79551 cmd> jpeginfo girl.jpg jserv.jpg 758 x 408 24bit JFIF P 112971 ``` :::spoiler {state="open"} 圖示 ![](https://i.imgur.com/bIlKbpN.png) ::: ### Original program Using perf to analyze the bottleneck of this program. > [Perf -- Linux下的系统性能调优工具，第 1 部分](https://www.ibm.com/developerworks/cn/linux/l-cn-perf1/) - 使用 `perf stat` 指令分析程式 `main` 的執行狀況 ``` cmd> sudo perf stat ./main -i jserv.jpg -o out.jpg -l 10 ifn:jserv.jpg ofn:out.jpg level:10 detect - 981 us opt not used denoise - 9999 us Performance counter stats for './main -i jserv.jpg -o out.jpg -l 10': | 29.04 | task-clock | 0.994 CPUs utilized | | 0 | context-switches | 0.000 K/sec | | 0 | cpu-migrations | 0.000 K/sec | | 456 | page-faults | 0.016 M/sec | | 107,702,529 | cycles | 3.709 GHz | | 176,301,009 | instructions | 1.64 insn per cycle | | 20,828,946 | branches | 717.224 M/sec | | 1,009,097 | branch-misses | 4.84% of all branches | 0.029206255 seconds time elapsed 0.029257000 seconds user 0.000000000 seconds sys ``` :::spoiler 圖示 ![](https://i.imgur.com/Th1ETbN.png) ::: ``` task-clock 部份可以看到 CPU 的使用率接近1，代表此程式為 `CPU bound 型` ，運算時間多花在 CPU 運算上 ``` - 使用 `perf record` 紀錄程式執行的 cycles 分佈，再用 `perf report` 檢視 ``` cmd> sudo perf record ./main -i jserv.jpg -o out.jpg -l 10 cmd> sudo perf report Samples: 191 of event 'cycles', Event count (approx.): 176917325 |Children| Self | Command| Shared Object | Symbol | | 36.09% |35.57%| main | main |[.]denoise | | 26.52% |26.52%| main | main |[.]stbiw__jpg_processDU | | 8.84% |8.84% | main | main |[.]stbiw__jpg_writeBits.isra.15| | 5.72% |0.00% | main | [unknown] |[.]0x3dc3ef153db890d2 | | 5.52% |0.00% | main | [unknown] |[k]0000000000000000 | | 4.16% |4.16% | main | main |[.]stbiw__jpg_DCT | | 3.93% |0.00% | main | [unknown] |[.]0x0000000000000003 | | 3.64% |3.64% | main | libc-2.27.so |[.]_IO_fwrite | | 3.34% |2.79% | main | main |[.]detect | | 3.24% |3.24% | main | main |[.]stbi__load_main | | 2.60% |2.60% | main | libc-2.27.so |[.]_IO_file_xsputn@@GLIBC_2.2.5| | 2.43% |1.81% | main | main |[.]stbi__YCbCr_to_RGB_simd | | 2.21% |0.00% | main |[kernel.kallsyms]|[.]page_fault | | 2.21% |0.00% | main |[kernel.kallsyms]|[.]do_page_fault | | 2.21% |0.00% | main |[kernel.kallsyms]|[.]__do_page_fault | 0.028042523 seconds time elapsed 0.154073000 seconds user 0.000000000 seconds sys ``` :::spoiler {state="open"} 圖示 ![](https://i.imgur.com/trL2m3v.png) ::: - 其中，花費最多 cycle 的是 `denoise` 函式，`Children` 相較於 `Self` ，多出來的 overhead 測量是來自於自身 callee ([perf wiki](https://perf.wiki.kernel.org/index.php/Tutorial#Overhead_calculation))，花費第二、三多的則是做圖片 I/O 的函式。 - 檢視 `denoise` 函式可以發現最花時間的指令是 `idiv` ，佔據了高達 41.21%的時間。 ``` 1.47| mov %r9d,%eax | mov %r9d,0x0(%r13,%r8,4) 2.93| cltd 5.87| idivl 0xc(%rsp) 1.47| mov %eax,%edi | sub %ecx,%edi | cmp $0xfe,%edi | ↓ja 5d0 | mov $0x100,%r11d | sub %edi,%r11d | imul %edi,%ecx | imul %eax,%r11d |4dd:add %r11d,%ecx | sar $0x8,%ecx | imul %r9d,%eax | sub %eax,%esi | mov %esi,%eax | cltd 4.41| idivl 0xc(%rsp) 1.47| mov %eax,%esi | mov 0x28(%rsp),%rdi | imul %esi,%eax | add (%rdi,%r10,4),%esi | mov 0x18(%rsp),%rdi | cltd 41.21| idiv %esi 5.87| sub %eax,%ecx 7.36| mov %ecx,%edx | mov %ecx,%eax 1.47| not %edx | sar $0x1f,%edx | cmp $0xfe,%ecx 2.94| cmova %edx,%eax 1.47| mov %al,(%rdi,%r8,1) | add $0x1,%r8 | cmp $r8d,0x10(%rsp) | ↑jg 482 |52f:mov 0x30(%rsp),%rax 1.47| addl $0x1,0x38(%rsp) | add %rax,0x18(%rsp) | add %rax,%r14 | mov 0x38(%rsp),%eax | cmp %eax,0x3c(%rsp) | ↑jne 420 | mov 0x10(%rsp),%r14d |554:addl $0x1,0x58(%rsp) | mov 0x88(%rsp),%rdi | mov 0x58(%rsp),%eax | add %rdi,0x68(%rsp) | add %rdi,0x60(%rsp) | cmp %eax,0x78(%rsp) ``` :::spoiler 圖示 {state="open"} ![](https://i.imgur.com/GIheZ3E.png) ::: ### Parallelize version > 原始 `denoise` 函式中會一次處理 RGB 3個通道的灰階值，但彼此不互相影響，因此可以平行化處理， `denoise2` 函式一次只處理1個通道 To enable openmp, need to modify the `Makefile` to assign 1 to variable `ENABLE_OPENMP` and make macro `OPT_PLANE` effective. :::spoiler Makefile ```cpp CC ?= gcc CFLAGS = -Wall -O2 LDFLAGS = -lm ENABLE_OPENMP = 1 ifeq ("$(ENABLE_OPENMP)","1") CFLAGS += -DOPT_PLANE -march=native -fopenmp LDFLAGS += -fopenmp -lpthread endif ``` ::: - 使用 `perf stat` 指令分析程式 `main` 的執行狀況 ``` cmd> sudo perf stat ./main -i jserv.jpg -o out.jpg -l 10 ifn:jserv.jpg ofn:out.jpg level:10 detect - 1268 us opt not used denoise - 6456 us Performance counter stats for './main -i jserv.jpg -o out.jpg -l 10': | 153.55 | task-clock | 5.476 CPUs utilized | | 58 | context-switches | 0.378 K/sec | | 6 | cpu-migrations | 0.039 K/sec | | 836 | page-faults | 0.005 M/sec | | 569,175,959 | cycles | 3.707 GHz | | 188,975,348 | instructions | 0.33 insn per cycle | | 27,413,482 | branches | 178.533 M/sec | | 1,086,819 | branch-misses | 3.96% of all branches | 0.028042523 seconds time elapsed 0.154073000 seconds user 0.000000000 seconds sys ``` :::spoiler 圖示 {state="open"} ![](https://i.imgur.com/HGh1l9M.png) ::: - 相較於原始版本， performance 提昇了54% ## Discussion ### Size of window 在進行平滑化之前，通常要決定 kernel window 的大小，在此處我們不使用固定大小的 window size，而是去偵測人臉占整張圖的比例，並根據此來得到 window size。 ### Denoise algorithm >[Bilateral Filters](https://www.csie.ntu.edu.tw/~cyy/courses/vfx/10spring/lectures/handouts/lec14_bilateral_4up.pdf) > #### Filter basis 在影像上做 denoise 的方法有很多種，例如. mean filter, median filter, gaussian filter 和 bilateral等等。 bilateral filter ，翻譯為**雙邊濾波器**，上述其他的 filter 在對影像做平滑化時，會針對一個**給定大小**的 window ，針對其範圍內的所有 pixel 來做加權平均，下圖為針對 `jserv.jpg` 分別使用 mean filter, median filter 和 gaussian filter的結果 | Mean filter | Median filter | Gaussian filter | | -------- | -------- | -------- | | ![](https://i.imgur.com/DT1JeiM.png)| ![](https://i.imgur.com/Dy56Oa2.png)| ![](https://i.imgur.com/kFKTz9m.png) 由於這三種 filter 都只考慮到了 window 內 pixel 和 pixel **空間上的關聯性**來給予權重，在邊緣的部份很容易出現模糊的狀況，而雙邊濾波器為了解決邊緣的模糊問題，在原有的空間關聯以外，還加入了 pixel 和 pixel 之間的**色彩、光度差異**來協助保留邊緣，雙邊濾波器的計算公式和示意圖如下： ![](https://i.imgur.com/P8tyLnr.png) $\dfrac{1}{W_p}\sum_{q\in S}\overbrace {G_{\sigma_s}(||\vec{p}-\vec q||)}^{spatial\ kernel} \overbrace {G_{\sigma_r}(|I_{\vec p}-I_{\vec q}|)}^{range\ kernel} \overbrace{I_{\vec q}}^{grayvalue\ at\ location\ \vec q}$ - spatial kernel: 負責計算空間關聯性的 kernel，此處使用的是 gaussian distribution ，以 $G_{\sigma_s}$ 表示。 - range kernel: 負責計算色彩關聯性的 kernel，此處使用的是 gaussian distribution ，以 $G_{\sigma_r}$ 表示，其中 I 表示原輸入影像的灰階值， $\vec p$ 和 $\vec q$ 分別為當前目標 pixel 以及其他在 window 範圍內 pixel 的位置，以向量表示。 - grayvalue at location $\vec q$: 在 window 範圍內位置為 $\vec q$ 的 pixel，他的灰階值 - $W_p$: 放在分母做 normalization ，其數學式為: $\sum_{q\in S}{G_{\sigma_s}(||\vec{p}-\vec q||)}{G_{\sigma_r}(|I_{\vec p}-I_{\vec q}|)}$ 基本的雙邊濾波器如上圖所示，但不像其他的 filter ，(例如 gaussian filter 有明確指定使用 gaussian distribution 來作為 kernel)，雙邊濾波器比較像是一個概念，只要同時考慮到空間和色彩關聯性，其 kernel 使用的函式可以因應不同的實作來決定。 #### Back to our denoise function 在我們的 denoise 實作中，對應到的程式碼為: ```c= // Original denoise process int mean = prev_sum[c] / window_size; int diff = mean - scan_in_line[c]; int edge = CLAMP2BYTE(diff); int masked_edge = (edge * scan_in_line[c] + (256 - edge) * mean) >> 8; int var = (prev_sum_pow[c] - mean * prev_sum[c]) / window_size; int out = masked_edge - diff * var / (var + smooth_table[scan_in_line[c]]); scan_out_line[c] = CLAMP2BYTE(out); ``` 以下一一解說其平滑化的過程(提到的 pixel 灰階值都是只單一通道， RGB 三通道都是相同作法): - Line1: `mean` 做的是**空間關聯性**的平滑化，為 mean filter 所使用的概念，將 window 範圍內的所有值做平均，得到的會是模糊的影像 ![](https://i.imgur.com/m9R5PTT.jpg) - Line2-4: - `diff` 計算 window 平均值和當前 pixel 的差 - 在邊緣處， diff 值越大; 非邊緣處則越小 - 若相對於周遭較亮 (灰階值較高)，則 diff 為負; 反之則為正 - `edge` 將 `diff` 限制在 [0, 255] 的範圍內 - `masked_edge` 是做**色彩關聯性**的平滑化， `edge` 作為當前 pixel 的權重， `255-edge` 則是作為 window 平均值的權重，在邊緣處會加強當前 pixel 的色彩，非邊緣處則是保留空間平滑化的結果。下圖將 masked_edge 的結果取出來看並和上圖比較可以發現邊緣的部份被突顯出來 ![](https://i.imgur.com/kij0XzF.jpg) - Line6-8: - `var` 的值和是否為邊緣處有所關聯 - 若為邊緣處， `var` 值較大，反之則較小 - `out` 主要將平滑過後的值 (也就是 `masked_edge`) 扣除掉 `diff`，這邊的 `diff` 還要乘上一個接近 `1` 的比值 $\dfrac{var}{var+smooth\_table[scan\_in\_line[c]]}$ - smooth_table 中 index 越大則值越大，也就是說當前 pixel 灰階值越高，則比值越小 - 若在邊緣處，由於 `var` 值大，會減少當前 pixel 灰階值的影響 - 也就是說在原始影像的邊緣處，且比較暗的一側，會傾向於減去 `diff` ，又 `diff` 在偏暗一側為正值，使得結果變小，凸顯出暗色。 - 在比較亮的這一側則是減去 `diff` 的一部分，又 `diff` 在偏亮一側為負值，使得結果變大，凸顯出亮色。 #### Modification 1: Light edge 原始的程式碼中在偵測邊緣時， `edge` 的計算會將負值直接改成 `0` ，但亮度高的邊緣也應該被保留，做修改後如下 ```c= // Modified denoise process - 1 int mean = prev_sum[c] / window_size; int diff = mean - scan_in_line[c]; // 在進行範圍限制之前，先對 diff 取絕對值，使得亮度高的邊緣也被保留 int edge = CLAMP2BYTE(myabs(diff); int masked_edge = (edge * scan_in_line[c] + (256 - edge) * mean) >> 8; int var = (prev_sum_pow[c] - mean * prev_sum[c]) / window_size; int out = masked_edge - diff * var / (var + smooth_table[scan_in_line[c]]); scan_out_line[c] = CLAMP2BYTE(out); ``` 使用 `jserv.jpg` 來做觀察 ``` cmd>./main -i jserv.jpg -o out.jpg -l 10 ``` - 最終結果 | Before modification | After modification | | -------- | -------- | | ![](https://i.imgur.com/Esz44nH.jpg) | ![](https://i.imgur.com/Ywq6X3K.jpg)| - 使用 mean filter 後第一次凸顯邊緣的結果 `masked_edge` 觀察，可以發現色彩較亮的區域比較明顯 |Before modification | After modification | | -------- | -------- | |![](https://i.imgur.com/Ky6Kjwk.jpg)|![](https://i.imgur.com/tgAvS9W.jpg)| 使用 `girl.jpg` 來做觀察 ``` cmd>./main -i girl.jpg -o out.jpg -l 10 ``` - 最終結果 | Before modification | After modification | | -------- | -------- | | ![](https://i.imgur.com/39b4a2c.jpg) | ![](https://i.imgur.com/Dg0m4sn.jpg) | - 使用 mean filter 後第一次凸顯邊緣的結果 `masked_edge` 觀察，髮絲的部分明顯較清楚 |Before modification | After modification | | -------- | -------- | |![](https://i.imgur.com/uJZNPKo.jpg)|![](https://i.imgur.com/31xVDbW.jpg)| #### Modification 2: Computation of `var` 將原始程式碼計算 `var` 重複計算的部份做修改 ```c= // Modified denoise process - 2 int mean = prev_sum[c] / window_size; int diff = mean - scan_in_line[c]; // 在進行範圍限制之前，先對 diff 取絕對值，使得亮度高的邊緣也被保留 int edge = CLAMP2BYTE(myabs(diff); int masked_edge = (edge * scan_in_line[c] + (256 - edge) * mean) >> 8; // 重複利用前面已經計算過的 mean 做平方以取代 mean * prev_sum[c]/window_size int var = prev_sum_pow[c] / window_size - mean * mean; int out = masked_edge - diff * var / (var + smooth_table[scan_in_line[c]]); scan_out_line[c] = CLAMP2BYTE(out); ``` - 在未平行化的版本使用 `jserv.jpg` 測試， performance 可獲得 10%的提昇 | Before modification | After modification | | -------- | -------- | | detect - 862 us <br>denoise - 9987 us | detect - 862 us <br>denoise - 9016 us | - 在平行化的版本使用 `jserv.jpg` 測試， performance 差異不大 | Before modification | After modification | | -------- | -------- | | detect - 1197 us <br>denoise - 4216 us | detect - 1200 us <br>denoise - 3991 us |