# 2020 CV (Notes for Mid)
==[Final exam shared notes](https://hackmd.io/qwoDMQFSTF6sdb150AThrA?both)==
[TOC]
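[Last year's shared notes](https://hackmd.io/UhyNov3ASWm6DdsFvIUcwQ?fbclid=IwAR0_HqdLbGov94zQG7v6MllD4TLmC3DnzbVmPUAZtS_Hr5iBLNjKlkJ_hL4)
> Looking for past exams other than 108 and 106 (PTT). If posting a link is inconvenient, you can paste the exam content directly into these notes; other resources are welcome here too. Thanks!
## Notes
> Permissions are set so that anyone can edit without logging in. If something gets deleted by accident, I have a backup. [name=v][time=Sun, Nov 8, 2020 6:46 AM][color=#ea00da]
> ==0. I also went through the questions shared in last year's notes (there is a Google Drive link inside that notes link) and added them here. If you see a past-exam term or an unfamiliar term, add it too, because there really are too many terms; my feeling is that these notes currently cover only about 70% of the term definitions. By the way, the past-exam answers at the bottom should already include some terms, such as preserve order; use Ctrl+F.==
1. ==Many terms (including past-exam terms) are still blank or not added yet, and some only state the concept without a full description. I cannot finish this alone in a short time; the point of shared notes is to pool everyone's effort, and they can also serve as a resource for your midterm preparation!==
2. I have no time for formatting; if some kind soul is willing to do it, everyone and I will be grateful.
3. The TA said exam answers may be in Chinese or English, so I used quite a lot of Chinese. Apart from chapter 6, the Chinese descriptions came straight from Google Translate; add English yourself if you want it.
4. Some definitions were found in the textbook.
5. Some terms were hard to find or jump across chapters; for now I mostly put them in chapter 1 or a related chapter.
6. "()" is my personal habit and can be replaced.
7. HackMD does not seem to keep backups unless it is linked to GitHub, so for now I back up manually at regular intervals.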
## ch1
- computer vision
to emulate human vision with computers, dual process of computer graphics: 2D to 3D (graphics: 3D model -> render to 2D image)
- pixel
picture element: has properties of position and value
- intensity image
For **optic or photographic sensors**, $I(r,c)$ is proportional to radiant energy; the image is then called an intensity image
> From Google: An intensity image is a data matrix, $I$, whose values represent intensities within some range
- range image
For a range sensor, $I(r,c)$ describes the line-of-sight distance; the image is then called a range image. In short, it records the depth of each pixel in world coordinates, and a 3D model can be reconstructed from several range images.
- symbolic image
$I(r,c)$ is an index or symbol associated with a category; the image is then called a symbolic image.
- image formation
e.g. perspective projection or orthographic projection
- image sequence
Usually refers to multiple images acquired over a period of time (frames)
- Stereo
3-D reconstruction (from ch1 slide)
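Stereo vision is based on the principle of human binocular parallax: under natural lighting, two or more camera modules capture the same object from different angles, and computations such as triangulation then recover the distance to the object.
- position
The location of a pixel, written (r,c) with row r and column c
- orientation
The direction of rotation, which can be specified by $\theta$
- proximity
In papers this usually means actual distance; a proximity sensor is a sensor that detects nearby objects.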
- genus
genus $g(I)$ is the number of connected components of $I$ minus the number of holes of $I$. (from textbook)
- hole
a hole is a connected component of binary-0 pixels that does not connect with the border frame of the image. (from textbook)
- corner
A corner can be defined as the intersection of two edges.
The corner points of an image
- edge
Edges represent boundaries between objects and background (or two image regions).
Contours
- shape
Prime carrier of information in machine vision
A way of describing a 2D shape or 3D object in an image or region; e.g. a 512x512 image has rows = cols = 512
- feature
A feature is typically defined as an "interesting" part of an image.
- atomic image features:
The smallest, indivisible feature units (e.g. an edge is not split further into sub-edges), such as edge, corner, hole, ridge, valley, peak, pit
- composite features
e.g. arcs, regions
atomic features merged
- arcs
edge or ridge pixels linked together
- region
connected sets of pixels with similar properties
- conditioning: remove uninformative variation
Conditioning is based on a model that suggests that the observed image is composed of an informative pattern modified by uninteresting variations that typically add to or multiply the informative pattern. e.g. noise suppression, background normalization
- labeling: assign labels to connected pixels
Labeling is based on a model that suggests that the informative pattern has structure as a spatial arrangement of events, each spatial event being a set of connected pixels. e.g. thresholding, edge detection
- grouping: collect the regions whose labels are the same or similar
The grouping operation identifies the events by collecting together or identifying maximal connected sets of pixels participating in the same kind of event. e.g. segmentation, edge linking
- extracting: compute properties that characterize each group
The extracting operation computes for each group of pixels a list of its properties. Example properties: centroid, orientation, area, spatial moments
- matching: compare against known objects
The matching operation determines the interpretation of some related set of image events, associating these events with some given three-dimensional object or two-dimensional shape. e.g. template matching
- gray level
The pixel value of an intensity image. 8-bit integers range from 0 to 255.
- segmentation (from ch.10)
partition of image into sets of non-overlapping regions
- VR, AR
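Virtual Reality (VR)
Uses computer simulation to generate a three-dimensional virtual world and gives the user simulated visual and other sensory input.
Augmented Reality (AR)
A technique that precisely computes the position and angle of the camera image and adds image analysis, so that the virtual world on screen can be combined with, and interact with, real-world scenes.
- pattern recognition (explained in detail in ch4)
Studying the automatic processing and interpretation of patterns by computer using mathematical techniques.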
- alignment
Image alignment is the technique of warping one image (or sometimes both images) so that the **features in the two images line up perfectly**.
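> I think this should be changed to "warping images so that..." to be safer, because alignment is normally done on multiple images. [name=v]
> [From Google] Image alignment means finding the transformation between two images, such as translation, rotation, and scaling, so that after the transformation the matching parts of the two images overlap. The transformation can be written as a matrix; the more unknowns the matrix has, the more complex the relation between the two images and the harder they are to align.
> In the slides, alignment is used to automate PCB (printed circuit board) drilling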
- shape from texture
a computer vision technique where a 3D object is reconstructed from a 2D image
- shape from shading
computing the 3D shape of a surface from one image of that surface
- peripheral
<>
> Of the circumference, surrounding (??)
peripheral vision: vision of the area surrounding the point of focus
## ch2 - Thresholding and Segmentation
binary value 1: considered part of object
binary value 0: background pixel
binary machine vision: generation and analysis of binary image
* histogram
A visual summary: the X axis is the gray-level intensity and the Y axis is the number of pixels in the region whose intensity is X
$h(m)=\#\{(r,c)~|~I(r,c)=m\}$
* minimize within-group variance
minimize weighted (weight by probability) sum of group variances, where the two groups are:
$\text{pixels} \le t,~\text{pixels} \gt t$
$\sigma_w^2 = q_1(t)\sigma_1^2(t) + q_2(t)\sigma_2^2(t)$
* mixture of two gaussians
two gaussians: $f(i)=q_1N(\mu_1, \sigma_1) + q_2N(\mu_2, \sigma_2)$
* standard deviation
<>
> from wiki:
> a measure of the amount of variation or dispersion of a set of values
==**Two Thresholding Algorithms**==
* minimize Within-Group Variance (Otsu)
* minimize KL-div (Kittler-Illingworth)
> Both look for a threshold. The homework simply fixes it at 127 or 128, but finding one or more suitable thresholds is a subject in itself. Both methods above search for the threshold value: one minimizes the within-group variance, the other minimizes the KL divergence.
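
As a rough illustration of the within-group variance criterion, here is a minimal pure-Python sketch. It builds $h(m)$ and then tries every threshold $t$, keeping the one with the smallest $q_1(t)\sigma_1^2(t)+q_2(t)\sigma_2^2(t)$. The function names `histogram` and `otsu_threshold` and the brute-force search are assumptions made for this example, not the course's reference implementation.

```python
def histogram(image, levels):
    """h(m) = number of pixels (r, c) with I(r, c) == m."""
    h = [0] * levels
    for row in image:
        for value in row:
            h[value] += 1
    return h


def otsu_threshold(hist):
    """Return t minimizing q1*var1 + q2*var2 for groups <= t and > t."""
    total = sum(hist)
    best_t, best_within = None, float("inf")
    for t in range(len(hist) - 1):
        low = list(enumerate(hist))[: t + 1]   # (gray level, count) pairs <= t
        high = list(enumerate(hist))[t + 1:]   # (gray level, count) pairs > t
        n1 = sum(count for _, count in low)
        n2 = sum(count for _, count in high)
        if n1 == 0 or n2 == 0:
            continue  # one group is empty, so its variance is undefined
        mu1 = sum(g * count for g, count in low) / n1
        mu2 = sum(g * count for g, count in high) / n2
        var1 = sum(count * (g - mu1) ** 2 for g, count in low) / n1
        var2 = sum(count * (g - mu2) ** 2 for g, count in high) / n2
        within = (n1 / total) * var1 + (n2 / total) * var2
        if within < best_within:
            best_t, best_within = t, within
    return best_t
```

For a clearly bimodal image the returned threshold falls in the gap between the two modes; ties are resolved in favor of the smallest such $t$.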
- connected component analysis
thresholding (binary image)
-> connected component labeling
-> region properties measurement
-> statistical pattern recognition (make decisions based on projections)
- label
A unique name or index that serves as the identifier of a potential target region
- pixel property
position, gray level or brightness level
- region property
shape, bounding box, position, intensity statistics
- Connected-Component Labeling
All pixels that have value binary-1 and are connected to each other by a path of pixels all with value binary-1 are given the same identifying label. It is a grouping operation.
> from wiki:
> the creation of a labeled image in which the positions associated with the same connected component of the binary input image have a unique label
> In plain words: give each connected component its own unique label[color=#3ef265]
==**Two Connected-Component Labeling Algorithms**==
* Connected-Component Labeling - An Iterative Algorithm
1. Initialize every 1-pixel with a unique label
2. Run a top-down pass (upper-left to lower-right) that replaces each pixel's label with the smallest label among its neighbors, then a bottom-up pass (lower-right to upper-left) that does the same; repeat the two passes until nothing changes
* Connected-Component Labeling - The Classical Algorithm
two pass + global table
1. In the first top-down pass, if all neighbors of the current 1-pixel are 0-pixels, create a new label; if the neighboring 1-pixels share one label, take that label; if the neighboring 1-pixels carry two different labels, take the smaller one and record the two labels as equivalent in the global table
2. The second top-down pass merges labels according to the global table to get the result
-> Problem: the global table may become too large
()Connected-Component Labeling - A Space-Efficient Two-Pass Algorithm That Uses a Local Equivalence Table
Uses a smaller table that stores only the equivalences of the current row, so the table is at most as large as the image width
When an equivalence is detected it is recorded in the table; right after the first pass over a row, a second pass relabels that row
> This method only improves space complexity
()Connected-Component Labeling - An Efficient Run-Length Implementation of the Local Table Method
Converts each row of 1s and 0s directly into a table of run indices
All later processing works on the table only, so it is efficient in both space and time
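
The classical two-pass algorithm described above can be sketched as follows. This is a minimal version under stated assumptions: 4-connectivity, only the upper and left neighbors are inspected in the first pass, and the global equivalence table is stored as a simple parent list (a choice made here for brevity, not something the notes prescribe). `label_components` is a hypothetical name.

```python
def label_components(image):
    """Classical two-pass labeling of binary-1 pixels (4-connectivity)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # global equivalence table: parent[k] points toward k's representative

    def find(k):
        while parent[k] != k:
            k = parent[k]
        return k

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            neighbors = [k for k in (up, left) if k > 0]
            if not neighbors:
                parent.append(len(parent))      # create a new label
                labels[r][c] = len(parent) - 1
            else:
                smallest = min(neighbors)
                labels[r][c] = smallest
                for k in neighbors:             # mark differing labels as equivalent
                    a, b = find(k), find(smallest)
                    parent[max(a, b)] = min(a, b)

    # Second pass: replace every label with its representative.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels
```

The final labels are unique per component but not necessarily consecutive; a renumbering step could be added if consecutive labels are needed.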
()signature
*histogram* of the nonzero pixels of the resulting masked image/projection
()signature analysis
thresholding (binary image)
-> signature segmentation (projection)
-> region properties measurement
-> statistical pattern recognition (make decisions based on projections)
==()**Two Signature Segmentation**==
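()signature segmentation (multiple projections)
consists of taking one or more projections of a binary image or a subimage, and taking property measurements of each projection segment.
> Project the nonzero pixels (vertical, horizontal, or diagonal projections all work) to get the pixel count in each segment, which can then be used for image analysis.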
()signature segmentation - projections
Projections can be vertical, horizontal, diagonal ... etc
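
To make the projection idea concrete, here is a small sketch that computes the horizontal and vertical projections of a binary image and splits a projection into its nonzero segments. The helper names (`horizontal_projection`, `vertical_projection`, `segments`) are made up for this example.

```python
def horizontal_projection(image):
    """Count the nonzero pixels in each row."""
    return [sum(1 for v in row if v) for row in image]


def vertical_projection(image):
    """Count the nonzero pixels in each column."""
    return [sum(1 for row in image if row[c]) for c in range(len(image[0]))]


def segments(projection):
    """Split a projection into runs of nonzero entries, as (start, end) inclusive."""
    runs, start = [], None
    for i, count in enumerate(projection):
        if count and start is None:
            start = i
        elif not count and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(projection) - 1))
    return runs
```

Each returned segment can then be measured on its own (width, pixel count, and so on), which is the "property measurement" step of signature analysis.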
## ch3 - Region Analysis
#### region properties - basic
#### Basic terms
* regions:
The result of connected components labeling (the ch1 definition also applies: connected sets of pixels with similar properties)
* region properties
measurement vector input to classifier
The features extracted from a region for use in a classification task, such as position, extent, shape, and gray level properties.
* region intensity histogram

* Bounding Rectangle:
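The smallest rectangle that circumscribes (contains) the selected region
* major axis
$\max(M_1, M_2, M_3, M_4)$
For the definition of $M_i$ see [item 2 here](#region-properties---Extremal-points-and-spatial-moments)
* rotation matrix
$\begin{bmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{bmatrix}$
Rotates counterclockwise by $\theta$.
* center of circle
<>
* border pixel
A pixel of the region is a border pixel if its 4- or 8-neighborhood contains a pixel that does not belong to the same region
has some neighboring pixel outside the region
* Perimeter:
a sequence of interior border pixels
The set of all border pixels of the region
* microtexture properties
A function of the co-occurrence matrix, written $P(g1, g2)$ in the lecture notes; its output is the frequency with which the gray level values occur together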
#### Basic Formula
* Area $A=\sum_{(r,c)\in R}1$, $R$ means region.
* Centroid $r_{mean}=\frac{1}{A}\sum_{(r,c)\in R}r$, $R$ means region.
* Centroid $c_{mean}=\frac{1}{A}\sum_{(r,c)\in R}c$, $R$ means region.
* Perimeter - $P_4$ and $P_8$
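$P_4=\left\{(r,c)\in R \mid N_8(r,c) - R \neq \emptyset\right\}$: the border test uses the 8-neighborhood; the resulting perimeter is 4-connected
$P_8=\left\{(r,c)\in R \mid N_4(r,c) - R \neq \emptyset\right\}$: the border test uses the 4-neighborhood; the resulting perimeter is 8-connected
* Average gray level (intensity)(from previous slide)
$\mu=\frac{1}{A}\sum_{(r,c)\in R}I(r,c)$
* Gray level (intensity) variance(from previous slide)
$\sigma^2=\frac{1}{A}\sum_{(r,c)\in R}(I(r,c)-\mu)^2$
* Gray-Level Co-occurrence Matrix(GLCM) ($S$ must be defined first)
$P(g1,g2)=\frac{\#\{ [(r1,c1),(r2,c2)] \in S \mid I(r1,c1)=g1,\ I(r2,c2)=g2\}}{\#S}$
> $P(1,0)$ counts all pixel pairs that (1) have gray levels (1,0) and (2) belong to the set of pixel pairs $S$. In the lecture notes $\#S=1$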
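
A small sketch of the GLCM formula may help. Here $S$ is assumed to be the set of all pixel pairs separated by a fixed offset (`dr`, `dc`), defaulting to "one pixel to the right"; that choice of $S$, and the function name `glcm`, are assumptions for illustration only.

```python
def glcm(image, levels, dr=0, dc=1):
    """P[g1][g2] = fraction of pairs in S with I(r1,c1)=g1 and I(r2,c2)=g2.

    S is assumed to be all pairs ((r, c), (r + dr, c + dc)) inside the image.
    """
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0  # this is #S
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[n / total for n in row] for row in counts]
```

Microtexture properties are then functions of the returned matrix; changing the offset changes which spatial relation the matrix captures.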
#### region properties - Extremal points and spatial moments
> How can we use extremal points?
> 1. Line’s length/orientation
> 2. Triangle’s base/height
> 3. Rectangle’s orientation[color=#3259e5]
1. Definition: eight extremal points (clockwise)
2. Respective Axes ($M_1, M_2, M_3, M_4$)
Axes formed by pairs of opposite points, e.g. $M_1$: topmost-left ($r_1, c_1$) and bottommost-right ($r_5, c_5$)
length of $M_i=\sqrt{(r_i-r_{i+4})^2+(c_i-c_{i+4})^2} + Q(\theta),\ i=1\sim4$
$Q(\theta)\approx 1.12$
3. major axis and minor axis
major axis = $\max(M_1,M_2, M_3, M_4)$
minor axis = mate of major axis
4. Orientation of Respective Axes
$\phi_i=\tan^{-1}\frac{r_i-r_{i+4}}{-(c_i-c_{i+4})}$
5. spatial moments formula
* Second-order row moment $\mu_{rr}=\frac{1}{A}\sum_{(r, c)\in R}(r-\bar{r})^2$
* Second-order mixed moment $\mu_{rc}=\frac{1}{A}\sum_{(r, c)\in R}(r-\bar{r})(c-\bar{c})$
* Second-order column moment $\mu_{cc}=\frac{1}{A}\sum_{(r, c)\in R}(c-\bar{c})^2$
* Second-order mixed gray level moment $\mu_{rg}=\frac{1}{A}\sum_{(r, c)\in R}(r-\bar{r})(I(r, c)-\mu)$
$\mu_{cg}=\frac{1}{A}\sum_{(r, c)\in R}(c-\bar{c})(I(r, c)-\mu)$
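
The area, centroid, and second-order spatial moment formulas translate almost line by line into code. The sketch below assumes a region is given as a collection of `(r, c)` pixel coordinates; `region_moments` is a hypothetical helper name.

```python
def region_moments(region):
    """Area, centroid, and second-order spatial moments of a set of (r, c) pixels."""
    pts = list(region)
    area = len(pts)                                   # A = sum of 1 over R
    r_bar = sum(r for r, _ in pts) / area             # centroid row
    c_bar = sum(c for _, c in pts) / area             # centroid column
    mu_rr = sum((r - r_bar) ** 2 for r, _ in pts) / area
    mu_rc = sum((r - r_bar) * (c - c_bar) for r, c in pts) / area
    mu_cc = sum((c - c_bar) ** 2 for _, c in pts) / area
    return {"area": area, "centroid": (r_bar, c_bar),
            "mu_rr": mu_rr, "mu_rc": mu_rc, "mu_cc": mu_cc}
```

A 2x2 square gives a zero mixed moment, while a diagonal line gives a positive one, which is why $\mu_{rc}$ is useful for estimating orientation.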
#### signature properties
* $P_H(r)=\#\{c|(r,c)\in R\}$
* $P_V(c)=\#\{r|(r,c)\in R\}$
* $A=\sum_{(r,c)\in R}1=\sum_{r}P_H(r)=\sum_{c}P_V(c)$
* Derivation
* $\mu_{rr}=\frac{1}{A}\sum_r(r-\bar{r})^2P_H(r)$
* relationship
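* e runs from upper-left to lower-right; d runs from upper-right to lower-left
* $\mu_{dd}=\mu_{rr}+2\mu_{rc}+\mu_{cc}$ Mnemonic: the first-quadrant direction -> everything is positive, so all terms are added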
* $\mu_{ee}=\mu_{rr}-2\mu_{rc}+\mu_{cc}$
* $\mu_{rc}=\frac{\mu_{dd}-\mu_{ee}}{4}$
* surface mount device (SMD) placement: position and orientation of parts
#### signature analysis (rectangle)
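1. First express the center ($\Delta x, \Delta y$) in terms of the upper-left corner ($a_1, b_1$)
2. Change variables to u, v (using g)
3. Horizontal projection: use A+B and E+F to compute u
4. Vertical projection: use AEC and BFD to compute v
5. Substitute back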
#### signature analysis (circle)


It is enough to be able to compute $d$ and $A=\frac{r^2}{2}(2\theta - \sin 2\theta)$
> The areas $A, B$ can be obtained directly; then use $\frac{2\pi A}{A+B}=2\theta-\sin 2\theta$ with a lookup table to get $\theta$, and finally find $d$ from $d = r\cos\theta$
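
Instead of a lookup table, $\theta$ can also be found numerically, because $2\theta-\sin 2\theta$ increases monotonically on $[0,\pi]$. The sketch below uses bisection; the function names and the tolerance are assumptions for this example, and $A$ is taken to be the area of the circular segment cut off by the chord, with $B$ the rest of the circle.

```python
import math


def chord_half_angle(area_a, area_b, tol=1e-12):
    """Solve 2*theta - sin(2*theta) = 2*pi*A / (A + B) for theta in [0, pi]."""
    target = 2 * math.pi * area_a / (area_a + area_b)
    lo, hi = 0.0, math.pi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 2 * mid - math.sin(2 * mid) < target:
            lo = mid      # left side is too small, theta must be larger
        else:
            hi = mid
    return (lo + hi) / 2


def chord_distance(radius, area_a, area_b):
    """Distance d = r * cos(theta) from the circle center to the chord."""
    return radius * math.cos(chord_half_angle(area_a, area_b))
```

When the chord passes through the center, $A=B$, $\theta=\pi/2$, and $d=0$, which is a quick sanity check.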
#### Histogram Equalization
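1. Use the histogram to compute the probability density function (PDF) over the gray levels
2. Accumulate the PDF to obtain the CDF
3. Round the CDF results to build a lookup table
4. Look up the table just built to get the value each gray level takes after the transformation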
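
These steps can be sketched as follows. Two details are assumptions of this example rather than something stated in the notes: the CDF is scaled by `levels - 1` before rounding so the output spans the full gray-level range, and rounding is "round half up".

```python
def equalize(image, levels=256):
    """Histogram equalization via a CDF-based lookup table."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)

    lut, cumulative = [], 0
    for count in hist:
        cumulative += count                     # running sum = CDF * total
        scaled = (levels - 1) * cumulative / total
        lut.append(int(scaled + 0.5))           # round half up
    return [[lut[v] for v in row] for row in image]
```

Because the lookup table is monotone, the ordering of gray levels is preserved; only their spacing changes.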
## ch4 Statistical Pattern Recognition

- Units
Image regions and projected segments, each unit has an associated measurement vector
- category assignment
category assignment is made that names or classifies the unit as a type of object
- pattern recognition
classify each unit based on measurement vector using decision rule
Also called pattern identification
Process:
A unit is observed or measured
A category assignment is made that names or classifies the unit as a type of object
The category assignment is made only on observed measurement (pattern)
- economic gain
Making a category assignment has consequences in economic or utility terms
> the act of making category assignments carries consequences (t, a, d) economically or in terms of utility
- identity gain matrix(diagonal matrix)
\begin{equation}
e(t, a) =
\begin{cases}
1 & \text{if}\ \ t = a \\
0 & \text{else}
\end{cases}
\end{equation}
- measurement vector (not in the textbook, but it was on the exam)
The observed values associated with a unit
- Statistical Pattern Recognition
This is just the ch4 introduction, since SPR is the title of ch4

- prior probability
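$P(t)$: the probability of a category before any measurement is observed
- posterior probability
$P(t|d)$: the probability of a category after the measurement $d$ has been observed

- conditional probability
Formula: $P(A|B)=\frac{P(A \cap B)}{P(B)}$
- Bayes decision rule
Choose the category with the largest posterior probability
- maximin decision rule
$f$ is a maximin decision rule iff its expected economic gain under the worst-case prior (the prior giving the minimum gain) is not less than that of any other decision rule $g$
- Reserve Judgement
Reserving judgment gives the decision rule one more option: not every measurement d has to receive an assignment
- Nearest Neighbor Rule
Assign pattern x to the category of the nearest vector in the training set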
- Chief difficulty: brute-force nearest neighbor algorithm computational complexity proportional to number of patterns in training set
- Binary Decision Tree Classifier
- Decision at non-terminal nodes
- Thresholding the measurement component
- Fisher’s linear decision rule
- Bayes quadratic decision rule
- Bayes linear decision rule
- Linear decision rule from the first principal component
- Error Estimation
Used to express the performance of the decision rule
#### Formula
- Expected profit
g: good, b: bad
1. $E=P(g,g)e(g,g)+P(g,b)e(g,b)+P(b,g)e(b,g)+P(b,b)e(b,b)$
- $P(b|g)$: false-alarm rate: the unit is actually good but is detected as bad
$P(b|g) = \frac{P(g, b)}{P(g, b)+ P(g, g)}$
$P(g|b)$: misdetection rate: the unit is actually bad but is taken as good (the defect is missed)
$P(g|b) = \frac{P(b, g)}{P(b,g ) + P(b, b)}$
- fair game assumption
Only the measurement data is known
$P(a|t,d) = P(a|d)$ (the true label is unknown)
leads to
$P(t,a|d) = P(a|d) P(t|d)$
conditioned on measurement d, the true category and the assigned category are independent.
- $f(a|d)$: decision rule
- Expected Economic Gain
$E[e;f]=\sum_{d\in D} \{\sum_{a \in C}f(a|d) [\sum_{t \in C}e(t,a)P(t,d)] \}$
$=\sum_{a \in C} f(a|d^1)D(a,d^1) + \sum_{a \in C} f(a|d^2)D(a,d^2)+...+\sum_{a \in C} f(a|d^k)D(a,d^k)$
* $D(a, d^i)=\sum_{t \in C}e(t,a)P(t,d^i)$: the expected economic gain of assigning $a$ when $d^i$ is observed
#### Calculation (deriving these yourself is probably more useful)
1. Expected profit per object (direct form and conditional form)
2. Maximizing Expected Economic Gain
- Rule: $f(a|d^k)=1$ if $D(a, d^k)$ is the maximum over $a$, else 0
- $f(a|d^k)$ is a decision rule, $\sum_{a \in C}f(a|d^k)=1$
3. Maximin Decision Rule (Bayes gain and Maximin gain plot)
-
4. Bayes/ Maximum likelihood decision rule
- $f(a|d) = 1~~\text{if}~~P(d|t=a) = \max_t P(d|t)$ (maximize likelihood -> maximize posterior probability)
5. Decision Rule Error
- Misidentification error: P(unit not assigned to $c_k$ | true category is $c_k$)
- False identification error: P(unit assigned to $c_k$ | true category is NOT $c_k$)
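
The rule "pick the assignment with the largest $D(a,d)$ for each measurement" can be sketched in a few lines. The dictionary layout (`joint[t][d]` for $P(t,d)$, `gain[t][a]` for $e(t,a)$) and the name `bayes_rule` are assumptions chosen for this example.

```python
def bayes_rule(joint, gain):
    """Deterministic rule maximizing expected economic gain.

    joint[t][d] = P(t, d); gain[t][a] = e(t, a).
    Returns (rule, expected_gain) where rule[d] is the assigned category.
    """
    categories = list(gain)
    measurements = list(next(iter(joint.values())))
    rule, expected = {}, 0.0
    for d in measurements:
        # D(a, d) = sum over true categories t of e(t, a) * P(t, d)
        scores = {a: sum(gain[t][a] * joint[t][d] for t in categories)
                  for a in categories}
        best = max(scores, key=scores.get)
        rule[d] = best
        expected += scores[best]
    return rule, expected
```

With the identity gain matrix, the expected gain equals the probability of a correct assignment, and the rule reduces to choosing the category with the largest $P(t,d)$ for each $d$.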
## ch5
- mathematical morphology works on shape
- morphological operations:
- simplify image data
- preserve essential shape characteristics
- eliminate irrelevancies
- shape: correlates directly with decomposition of object, object features, object surface defects, assembly defects
- set theory
language of binary mathematical morphology.
- extensive operator
operators whose output contains input
- antiextensive
output contained in the original set
- dilation
combine two sets by vector addition of set elements.
$A\oplus B=\{ c\in E^N | \exists a\in A, b\in B, c=a+b \}$, $B$ is kernel
* associative: $(A \oplus B) \oplus C = A \oplus (B \oplus C)$
* dilation of translated kernel=translation of dilation: $A_t\oplus B=(A\oplus B)_t$
- translation
$A_t=\{c\in E^N | c=a+t \text{ for some } a \in A\}$
- erosion
- $A\ominus B=\{ c\in E^N \mid c + b \in A\ \ \forall b \in B \}$, $B$ is kernel
- morphological dual of dilation.
- $A_t\ominus B=(A\ominus B)_{t}$; translating the kernel instead gives $A\ominus B_t=(A\ominus B)_{-t}$
- erosion dilation duality
- $(A\ominus B)^c = A^c \oplus \hat B$
- $\hat B$ is the kernel reflected with respect to the origin
- hit and miss
intersection of erosions.
$(A \ominus J) \cap (A^c \ominus K)$
- usage:
- selects corner/isolated/border points,
- performs template matching,
- thinning,
- thickening,
- centering
- **genus**
Number of connected components minus number of holes of I, 4-connected for object, 8-connected for background
- opening
First apply erosion then apply dilation.
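$B\circ K=(B\ominus K)\oplus K$ rounds off sharp protruding corners of the shape and also removes isolated islands.
- anti extensive
- invariant to kernel translation
- preserve order
- closing
First apply dilation then apply erosion.
$B\bullet K=(B\oplus K)\ominus K$ rounds off sharp inward corners of the shape and also fills holes.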
- extensive
- invariant to kernel translation
- preserve order
- opening and closing **idempotent**
$(A \circ K) \circ K=A \circ K$
$(B \bullet K)\bullet K = B \bullet K$
- extensive (closing's property)
operators whose output contains input
e.g. $A \subseteq A \bullet K$. Likewise, if B is the dilation of A by a kernel that contains the origin, then B must contain A, so that dilation is extensive
- antiextensive (opening's property)
operated set contained in the original set.
e.g. $A \circ K \subseteq A$
- preserve order (increasing)
"Order" means set inclusion: the operation keeps the original inclusion relation. Dilation, erosion, opening, and closing all have this property.
e.g. $A \subseteq B \Rightarrow (A \oplus K) \subseteq (B \oplus K)$
- Conditional Dilation
$J_n = (J_{n-1} \oplus D) \cap I$, where $D$ is the kernel and $I$ is the original image
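- generalized opening: any increasing, antiextensive, idempotent operation
- generalized closing: any increasing, extensive, idempotent operation
- duality
Two operators are dual when applying one to a set gives the complement of applying the other to the complemented set, as with erosion and dilation (using the reflected kernel)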
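- top
$T[A](x)=\max\{y \mid (x,y)\in A\}$, the top surface of the set $A$
- umbra
$U[f]=\{(x,y) \mid y \le f(x)\}$, everything on or below the surface $f$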
- gray scale dilation $f \oplus k = T\{U [f] \oplus U [k]\}$

- gray scale erosion $f \ominus k = T\{U [f] \ominus U [k]\}$

- median root image
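An image that a median filter leaves unchanged
- distance transform (the same as the distance transform in ch6)
The intensity values after the distance transform represent the distance from each point to the closest boundary. ([ref](https://homepages.inf.ed.ac.uk/rbf/HIPR2/distance.htm))
- morphological skeleton (can be thought of as a compression algorithm)
> Following the formula yields the skeleton. In the resulting skeleton, the value of each pixel is the number of dilations needed to restore the shape; a 0 is filled in as 1.[color=#3259e5]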
> from wiki:
> morphological skeleton is a skeleton (or medial axis) representation of a shape or binary image, computed by means of morphological operators
- medial axis transform
medial axis with distance function
- Morphological Noise Cleaning and Connectivity
Images perturbed by noise can be morphologically filtered to remove some noise
- Morphological Sampling Theorem
Before sets are sampled for morphological processing, they must be morphologically simplified by an opening or a closing.
Such sampled sets can be reconstructed in two ways: by either a closing or a dilation. (from slide)
- morphological operations, definition:
simplify image data, preserve essential shape characteristics, eliminate irrelevancies
- morphological operations, uses:
shape extraction, noise cleaning, thickening, thinning, skeletonizing
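
The set definitions of dilation, erosion, opening, and closing can be tried out directly with Python sets of `(row, col)` tuples. This is only a sketch for small examples: it assumes a non-empty kernel and makes no attempt at efficiency, and the function names are chosen for this example.

```python
def dilate(a, b):
    """A dilated by B: every vector sum a + b."""
    return {(ar + br, ac + bc) for (ar, ac) in a for (br, bc) in b}


def erode(a, b):
    """A eroded by B: every c with c + b in A for all b in B (B non-empty)."""
    b = list(b)
    b0r, b0c = b[0]
    # Any valid c must satisfy c + b0 in A, so candidates are a - b0.
    candidates = {(ar - b0r, ac - b0c) for (ar, ac) in a}
    return {(r, c) for (r, c) in candidates
            if all((r + br, c + bc) in a for (br, bc) in b)}


def opening(a, k):
    """Erosion followed by dilation."""
    return dilate(erode(a, k), k)


def closing(a, k):
    """Dilation followed by erosion."""
    return erode(dilate(a, k), k)
```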
---
## Ch6 Neighborhood Operators
### Key points hinted by the TA in class
1. ==**Computation and effect of each Symbolic Neighborhood Operator (no need to memorize the algebra)**==
2. ==**6-1-1 Introduction: term definitions**==
3. ==**How the Non-Rec. Neighborhood Operators are produced**==
4. ==**6.4.0 Linear Shift-Invariant Neighborhood Operators**==
5. ==**Convolution & correlation**==
6. ==**How the 3 noise cleaning kernels are generated**==
### Introduction (neighborhood operators can be classified as follows)
1. domain type: symbolic and numeric
- numeric: +, -, max, min
- symbolic : AND, OR, NOT, Table look up
2. neighborhood type : 4 connected and 8 connected
3. Recursive type: Rec.(Sequential) / Non-Rec. (parallel)
Whether the output depends on previously generated outputs; from a memory point of view: sequential vs parallel
> If it is recursive, how is the order in which positions are output decided?
### Other Terms
- low level vision
- grouping, Color, Spatial freq., Local Motion(from slide)
- weight mask
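- A mask (kernel) usually carries weights. Applying a weight mask to an image as a neighborhood operation aggregates the properties of each pixel's neighborhood into a single pixel value (I am not sure how best to phrase this)
- noise cleaning
- Using neighborhood operators to remove noise
- neighborhood operator
- Can be classified by domain type, neighborhood type, and recursive type; the defined operator computes on the pixels of a neighborhood to obtain different characteristics
- recursive neighborhood operator
- A neighborhood operator whose output depends on previously generated outputs, e.g. the connected shrink operator.
- separability
Decomposing a 2D mask into two 1D masks reduces (2M+1)x(2N+1) multiplications and additions to (2M+1)+(2N+1) multiplications and additions
- monotonic path
The directions of travel must be perpendicular: once you choose to go left at the start you can no longer go right, so the path advances in at most two directions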
### Non-Rec. Neighborhood Operator for noise cleaning
for N x N mask
* 1. (N-2)\*(N-2) box filter
* 2. ($2\left(\frac{N-1}{2}\right)^2-(r^2+c^2)$)
* 3. $C^{N-1}_{r+((N-1)/2)} \times C^{N-1}_{c+((N-1)/2)}$, where N is usually an odd number. (from MIT CV course slide)
* Cross-correlation($\otimes$) vs convolution($*$)
* convolution slides a flipped kernel, while cross-correlation doesn't
* Equal when the kernel is point symmetric
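
As an illustration of the third kernel, the binomial-coefficient formula can be evaluated directly. The sketch below assumes an odd mask size `n` with `r` and `c` running from `-(n-1)/2` to `(n-1)/2`; `binomial_kernel` is a made-up name, and the weights are left unnormalized.

```python
from math import comb


def binomial_kernel(n):
    """Weights C(n-1, r + h) * C(n-1, c + h) for r, c in [-h, h], h = (n-1)//2."""
    half = (n - 1) // 2
    return [[comb(n - 1, r + half) * comb(n - 1, c + half)
             for c in range(-half, half + 1)]
            for r in range(-half, half + 1)]
```

For `n = 3` this gives rows `1 2 1`, `2 4 2`, `1 2 1`. Each entry is a product of a row term and a column term, so the mask is separable into two 1D masks, and the weights sum to $4^{n-1}$, which is the normalization factor if an averaging filter is wanted.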
### Symbolic Neighborhood Operators
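> Numbered here because some of these are related
1. Region growing (Symbolic/Non-Recur.)
Computation: change the current pixel to the first label value encountered. (Note: pad with background)
Effect: background values can become label values -> the labeled area expands
- Only the first label encountered counts
2. Nearest Neighbor Sets and Influence Zones (Symbolic/Recursive)
Computation: apply region growing recursively
Effect: draws a ring around the outside of each label
- In short, it frames the label
3. Region-Shrinking Operator (Symbolic/Non-Recur.)
Computation: if the kernel window is a "hit", change the current pixel to background (Note: pad with background)
Effect: changes border pixels to background, changes connectivity, and shrinks the labeled region
- When the kernel is placed, if any pixel inside it is g, the center is changed to g
4. Approach Euclidean-distance
Effect: discrete pixels approximate a continuous circle
- The number in each cell is how many steps it takes to reach that cell from the center; the difference between the 4- and 8-neighborhood is whether diagonal moves are allowed.
- To approach the Euclidean distance, the two are mixed (4-8-4-8 ..., iteratively)
- 8-neighborhood dist ≤ Euclidean-distance ≤ 4-neighborhood dist (so the 4-neighborhood ball is the smallest and the 8-neighborhood ball is the largest)
5. Mark-Interior/Border-Pixel Operator
Computation: a pixel gets the interior label only if all its neighbors equal x0; otherwise it is a border pixel.
Effect: marks interior and border pixels, which other analyses such as the Thinning Operator can use.
- In practice, place the kernel and check whether everything equals the center; if so, the pixel is interior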
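6. Connectivity Number Operator (HW)
Common purpose (effect): classify how a pixel is connected to its neighbors
- Yokoi Connectivity Number
- Computation: the label is the number of pieces the neighbors split into when the pixel is removed. 0 means isolated (no neighbors), 5 means interior pixel. The values 1~4 give the number of pieces the surrounding neighbors split into when the pixel is removed.
- Rutovitz Connectivity Number
- Computation: the label is the number of label transitions met while going around the 8-neighborhood. So 0 is an interior pixel.
7. Connected Shrink Operator (Recursive)(part of HW)
Computation: remove the border pixels of a connected region whose deletion does not disconnect the region.
Effect: shrinking never splits a region in two; the region stays connected.
8. Pair Relationship Operator
Computation: if the pixel itself is a border pixel and at least one neighbor is an interior pixel, mark it p; otherwise mark it q (interior pixels included).
Effect: the label p marks the points of interest: a pixel with label l that has enough pixels with label m nearby. (from textbook)
9. Thinning Operator (HW)
Computation: take the original symbolic image and the image from (5. + 8.) and apply the connected shrinking operator using both; a pixel is shrunk only if it can be shrunk in both
> Each iteration peels one layer of skin off the top, bottom, left, and right (by the professor)
Effect: the result preserves the geometry and topology of the original region; a related application is skeletonization.
10. Distance Transformation Operator
Computation: region growing starting from the background.
Effect: produces a numeric image in which each pixel is labeled with its distance to the nearest border pixel.
11. Radius of Fusion
Computation: merge all connected components that are within distance $\rho$ of each other
Effect: rounds off protruding corners
12. Number of Shortest Paths
Computation: the number of distinct shortest paths from a 0-pixel to a 1-pixel (if the pixel itself is empty, sum its neighbors; if it already has a value, output the value from the previous iteration)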
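
To illustrate item 10, the sketch below grows outward from the background one ring at a time using a queue, which has the same effect as repeated region growing. It assumes the 4-neighborhood (city-block) distance and measures the distance to the nearest background pixel, so border pixels receive 1; the function name and these conventions are assumptions for this example.

```python
from collections import deque


def distance_transform(image):
    """4-neighborhood distance from each 1-pixel to the nearest 0-pixel."""
    rows, cols = len(image), len(image[0])
    dist = [[0 if image[r][c] == 0 else None for c in range(cols)]
            for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if image[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1   # one step farther from background
                queue.append((nr, nc))
    return dist
```

If the image contains no background pixel at all, the entries stay `None`, since no distance is defined in that case.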
### ==**Linear Shift-Invariant Neighborhood Operators**==
Types: 1. convolution 2. correlation
- Linear: every pixel is replaced by a linear combination of its neighboring pixels.
- Shift-Invariant: the same operation is performed at every point of the image
- ==**Convolution**==
Kernel shape: square, octagon, disk, diamond
- ==**Relation between Convolution and Correlation**==
- Key difference : **Convolution is associative**.
- Identical Condition : they are the same when the kernel is point symmetric (symmetric about the origin).
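
The difference between the two operators is only the sign of the offsets, which the following sketch makes explicit. It evaluates a single output position and assumes that position is far enough from the image border for the kernel to fit; the indexing convention `F(i, j) = kernel[i + n][j + n]`, `I(x, y) = image[x][y]` is a choice made for this example.

```python
def correlate_at(kernel, image, x, y):
    """Cross-correlation: sum of F(i, j) * I(x + i, y + j)."""
    n = len(kernel) // 2
    return sum(kernel[i + n][j + n] * image[x + i][y + j]
               for i in range(-n, n + 1) for j in range(-n, n + 1))


def convolve_at(kernel, image, x, y):
    """Convolution: sum of F(i, j) * I(x - i, y - j), i.e. a flipped kernel."""
    n = len(kernel) // 2
    return sum(kernel[i + n][j + n] * image[x - i][y - j]
               for i in range(-n, n + 1) for j in range(-n, n + 1))
```

With a point-symmetric kernel both functions return the same value; with a kernel that has a single 1 in its top-left corner, correlation picks the pixel up-left of the center while convolution picks the pixel down-right of it.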
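## Possible 109 exam topics & Bonus
### Term definitions
109
ch2. The TA's highlighted exam questions (on the last page of the slides)
1. Name two connected component labeling algorithms, with their propagation direction, number of passes, and characteristics.
    1. Iterative (n-pass forward/backward)
    2. Classical (2-pass forward + global table)
2. What kinds of cuts can signature segmentation use? Name two (ch 2-4)
    1. Vertical
    2. Horizontal
    3. Diagonal (2 kinds)
3. Name two thresholding algorithms you know
    1. Minimize within-group (foreground/background) variance (Otsu's method)
    2. Assume the histogram is a Gaussian mixture and minimize the KL divergence between the mixture model and the histogram.

ch6. (listed above)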
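### TA research (5*6 = 30 points)
For each presenter's research: method (2%), steps (2%), result (2%)
#### 1. **Autonomous Probe Card Analysis**
**Motivation**: Probe card maintenance relies on people inspecting with an optical microscope, which is time-consuming and not objective
**Goal (result?)**: Automate probe card inspection through scanning and software
**Method**: A full-spectrum confocal line scanner produces geometric information (coordinates, height, light intensity); using a reference file and tolerance values, the program computes whether the scanned object meets the requirements. The Mayavi package is used for visualization. (from the TA's slides)
**Steps**:
1. Probe card scanning (laser scan) yields two kinds of data
    1. Geometric Data
    2. Scanning setting data
2. Load the data into the computer
3. Geometric Data processing
    1. Store the data in a numpy array
    2. Visualize with Mayavi
    3. Obtain a 3D image of the probes
4. Scanning Setting Data processing
    1. Store the data in a numpy array
    2. Process the data
    3. Obtain the maximum value, minimum value, average value, and NG needles

**Results:**
1. Probe card inspection is automated through scanning and software
2. The Python code was ported to C++ to improve performance
3. Future work: processing other scanning data (MEMS probe card)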
#### 2. **Blind Monaural Source Separation on Heart and Lung Sounds Based on Periodic-Coded Deep Autoencoder**
Demo:https://weichian0920.github.io/
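Short version:
* Method : Unsupervised learning + Autoencoder
* Steps:
* Result:
separated lung sound and separated heart sound.

Detailed version:
**Motivation**: Auscultation is the most effective way to diagnose cardiovascular and respiratory diseases. For an accurate diagnosis, a device must be able to recognize heart and lung sounds from various clinical situations. However, recorded chest sounds are a mixture of heart and lung sounds.
**Goal**: Separate heart and lung sounds effectively in the preprocessing stage
**Method**: A novel Periodic-Coded Deep Autoencoder (PC-DAE) is proposed; by assuming that heart rate and respiration rate have different periodicities, it separates mixed heart-lung sounds in an unsupervised manner. PC-DAE benefits from deep-learning-based models by extracting representative features, and it takes the periodicity of heart and lung sounds into account to perform the separation.
**Steps**:
1. Feed the phonocardiogram into the network
2. Apply the STFT to get a spectrogram
3. Encode the spectrogram into a feature vector z (heart and lung sounds mixed)
4. z must be reconstructable back into the spectrogram by the decoder
5. Run modulation frequency analysis on the z_mix feature, then Sparse NMF Clustering to obtain the z of the lung sound and the z of the heart sound
6. Use the decoder to rebuild the separated spectrograms from the separated z
7. Apply the ISTFT to the rebuilt separated spectrograms to obtain the separated phonocardiograms

**Result**: PC-DAE was evaluated on two datasets. The first contains sounds from the Student Auscultation Manikin (SAM); the second was prepared by recording chest sounds in real-world conditions. Experimental results show that PC-DAE outperforms several well-known separation methods, and that using the proposed PC-DAE as a preprocessing stage can notably improve heart sound recognition accuracy.
(This entry was adapted from the paper abstract)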
#### 3. **Percutaneous Pedicle Screw Placement and Vertebra Pedicle Awl Tip Extension**
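What is YOLO used for???
- Object detection.
(last figure in the Chp4 PPT)
**Method**: Use CV and deep learning techniques for pedicle detection and awl injection point detection.
**Steps**:
1. Convert the spine image to black and white
2. Run connected component labeling to find the pedicle
3. Use the Canny edge detector to find the pedicle contour
4. Use the Hough transform to find the pedicle center line
5. Find the awl tip at the point where the image gradient turns most sharply
6. Derive the injection point from the line found by the Hough transform and the awl tip

**Result**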
- The computer vision technique has potential to improve the accuracy of pedicle screw instrumentation.
- Combining computer vision and deep learning technique has potential to establish the Computer-Assisted Navigation Systems.
#### 4. **Watercolor**
**Purpose**: Render an image in watercolor style
**Method**: Apply a filter together with a heat-conduction method to diffuse the image
(A heat transfer equation is used as the image diffusion method, and a statistic mask filter is applied to the photographic image afterwards. Experimental results show that better color effect with natural appearance is achieved.)
**Steps**:
1. Input an image of size MxN pixels
2. Split the image into 3 color channels and perform color diffusion with the heat equation on each channel
    1. Set heat equation stop time
    2. Calculate image gradient with Sobel filter
    3. Determine edge to stop diffusion
    4. Repeat 1-3 until stop time is reached
3. Get heat transfer image
4. Get intensity level of each pixel in each statistical mask
5. Average all the pixel values that correspond to the highest repetition level, and this value will be the representative pixel value in the mask
6. Replace pixels in mask with the above average value

**Result**: A watercolor rendering of the photo
#### 5. **Image Segmentation for Solder Defect Inspection**
Method: Implement a U-Net-based semantic segmentation model in TensorFlow for solder defect detection, and upload the model to OpenVINO.
Steps:
Result: The image is segmented into several class regions
The solder image is segmented into
- 0: voids in microphone
- 1: cracks in microphone
- 2: normal solder pad
- 3: voids in solder pad
- 4: non-wetting
- 255: background

### Bonus
Is text or joke more important?
ans: The joke is more important than the text!
> Apparently a bonus that shows up every year: https://www.pttweb.cc/bbs/NTU-Exam/M.1578382932.A.CA6
Please write the Chinese name of Professor Chiou-Shann Fuh.
ans: 傅楸善
猴子在開飛機 (a monkey is flying an airplane)
What's Professor Chiou-Shann Fuh's pet phrase? (a) 酷斃了Cool (b) 帥呆人Handsome (c) 好極了Good (d)棒透了Awesome
(c)
Please translate "To err is human, to forgive divine." into Chinese.
人非聖賢,孰能無過
Please translate "塞翁失馬,焉知非福" into English.
Sometimes misfortune is a blessing in disguise.
## Past-exam term definitions (from PTT)
108
(1)shape from texture
(2)shape from shading
(3)alignment
(4)measurement vector
(5)pattern recognition
(6)virtual reality
(7)augmented reality
(8)stereo vision
(9)segmentation
(10)intensity histogram
(11)Gray-Level Co-occurrence Matrix(GLCM)
(12)region
(13)classifier
(14)bounding rectangle
(15)area
(16)centroid
(17)Statistical Pattern Recognition
(18)maximin decision rule
(19)Bayesian decision rule
(20)dilation
(21)opening
(22)recursive neighborhood operator
(23)symbolic domain
(24)linear shift invariant operator
(25)correlation
---
108 ans
(1)shape from texture
a computer vision technique where a 3D object is reconstructed from a 2D image using texture as a cue
(2)shape from shading
computing the three- dimensional shape of a surface from one image of that surface using shading(brightness of a black & white image) as a cue
(3)alignment
the technique of warping one image ( or sometimes both images ) so that the features in the two images line up perfectly.
(4)measurement vector
measurement vector input to classifier
(5)pattern recognition
Studying the automatic processing and interpretation of patterns by computer using mathematical techniques.
(6)virtual reality
Virtual Reality (VR)
the use of computer technology to create a simulated environment.
(7)augmented reality
Augmented Reality (AR)
A technique that precisely computes the position and angle of the camera image and adds image analysis, so that the virtual world on screen can be combined with, and interact with, real-world scenes.
(8)stereo vision
3-D reconstruction
(9)segmentation
partition of image into set of non-overlapping regions
(10)intensity histogram
graphical representation of the intensity distribution in a digital image.
(11)Gray-Level Co-occurrence Matrix(GLCM)
$P(g1,g2)=\frac{\#\{ [(r1,c1),(r2,c2)] \in S \mid I(r1,c1)=g1,\ I(r2,c2)=g2\}}{\#S}$
A common way to describe texture by studying the spatial correlation of gray levels.
(12)region
connected sets of pixels with similar properties
(13)classifier
Classifies each unit based on its measurement vector using a decision rule
(14)bounding rectangle
The smallest rectangle that circumscribes the region
(15)area
$A=\sum_{(r,c)\in R}1$, $R$ means region.
(16)centroid
$r_{mean}=\frac{1}{A}\sum_{(r,c)\in R}r$, $R$ means region.
$c_{mean}=\frac{1}{A}\sum_{(r,c)\in R}c$, $R$ means region.
(17)Statistical Pattern Recognition
the use of statistics to learn from examples
(18)maximin decision rule
$f$ is a maximin decision rule iff the expected economic gain of $f$ is not less than that of any other decision rule $g$ when the prior is the worst case (the one giving the minimum economic gain).
(19)Bayesian decision rule
Choose the category with the largest posterior probability
(20)dilation
combine two sets by vector addition of set elements.
$A\oplus B=\{ c\in E^N | \exists a\in A, b\in B, c=a+b \}$, $B$ is kernel
(21)opening
First apply erosion then apply dilation.
$B\circ K=(B\ominus K)\oplus K$
(22)recursive neighborhood operator
A neighborhood operator whose output depends on previously generated outputs
(23)symbolic domain
AND, OR, NOT, Table look up
(24)linear shift invariant operator
operators that replace every pixel with a linear combination of its neighbors, performing the same operation at every point of the image.
(25)correlation
$F\circ I(x, y)=\sum_{j=-N}^{N}\sum_{i=-N}^{N}F(i, j)I(x+i, y+j)$
---
106
(1) grouping
(2) labeling
(3) shape
(4) feature
(5) preserve order
(6) hexagonal grid
(7) corner
(8) edge
(9) linear shift-invariant operator
(10) mathematical morphology
(11) conditioning
(12) convolution
(13) cross correlation
(14) weight mask
(15) noise cleaning
---
106 ans
(1) grouping
The grouping operation identifies the events by collecting together or identifying maximal connected sets of pixels participating in the same kind of event. e.g. segmentation, edge linking
(2) labeling
Labeling is based on a model that suggests that the informative pattern has structure as a spatial arrangement of events, each spatial event being a set of connected pixels. e.g. thresholding, edge detection
(3) shape
Prime carrier of information in machine vision
A way of describing a 2D shape or 3D object in an image or region; e.g. a 512x512 image has rows = cols = 512
(4) feature
A feature is typically defined as an "interesting" part of an image.
(5) preserve order
(6) hexagonal grid
A grid whose cells are hexagons
(7) corner
A corner can be defined as the intersection of two edges.
(8) edge
Edges represent boundaries between objects and background (or two image regions).
(9) linear shift-invariant operator
operators that replace every pixel with a linear combination of its neighbors, performing the same operation at every point of the image.
(10) mathematical morphology
a theory and technique for the analysis and processing of geometrical structures, based on set theory, lattice theory, topology, and random functions.
(11) conditioning
Conditioning is based on a model that suggests that the observed image is composed of an informative pattern modified by uninteresting variations that typically add to or multiply the informative pattern. e.g. noise suppression, background normalization
(12) convolution
$F* I(x, y)=\sum_{j=-N}^{N}\sum_{i=-N}^{N}F(i, j)I(x-i, y-j)$
(13) cross correlation
$F\circ I(x, y)=\sum_{j=-N}^{N}\sum_{i=-N}^{N}F(i, j)I(x+i, y+j)$
(14) weight mask
A mask (kernel) usually carries weights. Applying a weight mask to an image as a neighborhood operation aggregates the properties of each pixel's neighborhood into a single pixel value (I am not sure how best to phrase this)
(15) noise cleaning
Using neighborhood operators to remove noise