Paper Notes

Emotion Recognition for Cognitive Edge Computing Using Deep Learning

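When large volumes of data are sent from sensors / IoT devices to a processing center, there are three main challenges: latency, scalability, and security.

latency : sending video data from the sensor to the cloud, running inference in the cloud, and returning the result introduces considerable delay. Data scaling is an effective way to use resources in this situation.

Edge computing : sending data from sensors / IoT devices up to the processing center requires a lot of bandwidth and enough time. In a continuous data stream, the data should therefore be preprocessed to lighten the transfer load, which gives users lower latency and a real-time experience.

Edge devices : one edge server, one mobile base station, and several end devices (the data sources)

Placing the edge server near the sensor / IoT gateways reduces latency.

When deep learning is run with edge computing, deciding which of the edge devices handles which part of the deep learning processing is an open issue, as is the coordination and distribution of work among the devices (this part is handled by a scheduling algorithm).

This paper proposes a system that recognizes facial emotions (using deep learning and edge computing).

The end device captures a face image, applies some preprocessing, and sends it to the edge server over a 5G network.

The edge server feeds the received image into a pretrained model.

The cloud server trains the model and stores the face data.

The cloud server communicates with the edge server only during the off-period.

Although many emotion recognition systems exist, few of them use edge computing, and those that do fail to fully exploit the computing power of the edge.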

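Contributions of this paper:

Introduces a facial emotion recognition system built on edge computing
Preprocessing on the end device, testing on the edge server, training on the cloud server
Evaluates the system on multiple datasets

Emotion classification mainly covers 7 basic emotions. The paper uses the JAFFE and CK+ datasets.

Fig. 1 shows the architecture design from the paper.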


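The edge server carries a GPU, which gives it good computing power.

off-time : download the model from the cloud
non off-time : receive images from the end devices, run inference, and return the results

The core cloud holds a global model for emotion recognition and a global storage (which stores the face images collected by each end device and also serves as the training data for the global model).

The time until the end device receives the inference result splits into 2 main parts:

Preprocessing time on the end device (face detection / cropping / contrast enhancement / resizing, etc.)
offloading
1. Transmission time (time for the end device to send data to the edge server + time for the edge server to return the result)
2. Inference time on the edge server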
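The breakdown above can be written down as a small data structure. This is only a sketch of the bookkeeping: the class name, field names, and the example timings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class LatencyBreakdown:
    """Timing terms from the notes; all values are in seconds."""
    preprocess_s: float  # face detection, cropping, resizing on the end device
    uplink_s: float      # end device -> edge server
    inference_s: float   # CNN inference on the edge server
    downlink_s: float    # edge server -> end device

    @property
    def offloading_s(self) -> float:
        # Offloading = transmission in both directions + edge inference.
        return self.uplink_s + self.downlink_s + self.inference_s

    @property
    def total_s(self) -> float:
        return self.preprocess_s + self.offloading_s


# Hypothetical timings, for illustration only (not measurements from the paper).
example = LatencyBreakdown(preprocess_s=0.020, uplink_s=0.010,
                           inference_s=0.005, downlink_s=0.002)
print(f"offloading: {example.offloading_s:.3f} s, total: {example.total_s:.3f} s")
```

Keeping the four terms separate makes it easy to see which one dominates when comparing a cloud-only setup with the edge setup.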

Fig. 2: time required from the edge server to the end device


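System components and their corresponding tasks

On the end device, the detected face image is cropped to 227x227 to match the model input. The image contains only the face and is small in size, so sending it to the edge server takes little bandwidth or time.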
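A minimal sketch of the crop-and-resize step, assuming a face detector has already produced a bounding box. The function name, the `(top, left, height, width)` box layout, and nearest-neighbour resizing are assumptions made for illustration; the notes do not say which detector or interpolation the paper uses.

```python
import numpy as np

TARGET = 227  # input side length stated in the notes


def crop_and_resize(gray, box, size=TARGET):
    """Crop box=(top, left, height, width) from a 2-D image and resize it
    to size x size with nearest-neighbour sampling."""
    top, left, h, w = box
    face = gray[top:top + h, left:left + w]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return face[rows[:, None], cols[None, :]]


# Synthetic 480x640 frame and a hypothetical detector box.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)
face = crop_and_resize(frame, box=(100, 200, 300, 300))
print(face.shape, face.nbytes, "bytes vs", frame.nbytes, "bytes for the full frame")
```

The size comparison in the last line is the point made in the notes: only the small face patch has to cross the network.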

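Communication component

After preprocessing, the image is sent from the end device to the edge server
The emotion class determined by the edge server is returned to the end device

In the edge server

Once an image arrives, run the CNN and infer the result
Provide a visualization of the model (optional)
After enough images have been collected, update the model during off-time

Cloud communication component

Download the global DL model from the cloud server to the edge server
Upload the updated DL model from the edge server to the cloud server

The cloud server uses data from the different edge servers to update the global model.

Many CNN models already perform very well. However, most of them are trained on millions of samples and have many parameters to learn. They are accurate, but they are a poor fit for devices with limited computing power (which the edge server is), so the paper develops a new lightweight CNN for emotion recognition.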


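The error cost function is class cross entropy, with batch size = 5

Face images are scaled by a factor of 0.8 to 1.2 and rotated by -25° to 25°

The model has about 190,000 learnable parameters, which is very few compared with other existing models. The only existing model with fewer parameters than the one in the paper is SqueezeNet (which, however, performed poorly in the experiments)

For small and medium-sized datasets, a compact model with few degrees of freedom does the job well and is also suitable for running on the edge server

The experiments use 2 public datasets

JAFFE (213 images, 7 classes, size 256x256)
CK+ (facial emotion video samples from 123 subjects; frame rates range from 10 to 60 frames per second, and the image size is 640 × 490 or 640 × 480)

prototype

End device component: implemented on Android 10
Edge server component: implemented with an NVIDIA GeForce RTX 2070 8-GB GPU with CUDA 10.0 support, cuDNN v7.6 for the deep learning model, and TensorFlow 2.0
Communication component
- Runs on the smartphone and uses Apache HttpClient to communicate with the server
- The server runs on Django

The system is validated with five-fold cross-validation: all samples are randomly divided into five equal folds, and in each repetition four folds are used for training and the remaining one for testing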
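The five-fold procedure can be sketched with the standard library alone. The function name and the seed are hypothetical, and 213 is used only because it is the JAFFE image count given above; when the sample count is not divisible by five, the folds differ in size by at most one.

```python
import random


def five_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(five_fold_splits(213))
for train, test in splits:
    print(len(train), "training samples,", len(test), "test samples")
```

Every sample appears in exactly one test fold, so the reported accuracy is an average over predictions for the whole dataset.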

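Training is done on the cloud server, and the trained model is then downloaded to the edge server. Test images are preprocessed on the smartphone, and the trained CNN model is evaluated on the edge server

Results:

CK+ overall accuracy: 96.6%
JAFFE overall accuracy: 93.5%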


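For the same emotion class the activated regions may differ, but they mainly concentrate around the eyes, cheeks, or mouth. Future work may need to focus on these attention regions to improve performance

Energy consumption comparison across four different systems

classical system (hand-crafted features)
classical system (CNN features)
edge system (uses the CNN model proposed in the paper; both preprocessing and the model run on the edge server)
the paper's system (preprocessing on the phone, CNN model on the edge server)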


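Apart from the system proposed in the paper, all the other state-of-the-art systems have more than 10 M parameters (which is a serious problem for our approach). The proposed system has far fewer parameters yet reaches accuracy comparable to the other systems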


Hardware-Oriented Memory-Limited Online Artifact Subspace Reconstruction (HMO-ASR) Algorithm

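Artifact Subspace Reconstruction (ASR) is a machine learning technique widely used to remove non-brain signals (artifacts), such as eye movements and muscle activity, from electroencephalogram (EEG) recordings

ASR can improve signal quality. However, the ASR algorithm needs a considerable amount of memory, which makes it unsuitable for online artifact removal on portable devices, application-specific integrated circuits (ASICs), or field-programmable gate arrays (FPGAs). The paper therefore proposes the hardware-oriented and memory-constrained online ASR (HMO-ASR) algorithm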

HMO-ASR

two-level window-based preprocessing
rejection threshold calibration processing
reconstruction module

Figure: flowchart of the HMO-ASR algorithm


two-level window-based preprocessing
- PCA-based preprocessing sliding window (linear dimensionality reduction on the sliding window)
- z-score based preprocessing (converts data of different magnitudes into z-scores on a common scale for comparison)
rejection threshold calibration processing
- iterative updating
- early eigenvector matrix determination module to update the corresponding rejection thresholds
reconstruction module
- removes the principal components (PC) with values greater than the rejection threshold
- reconstructs the data using the remaining PCs
- non-overlapped reconstructed samples are outputted
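The reconstruction module can be illustrated with a deliberately simplified sketch: drop the principal components whose variance exceeds a threshold and rebuild the window from the rest. The single scalar `threshold`, the function name, and the synthetic data are assumptions. The actual algorithm compares each component against thresholds projected into the current eigenbasis, which is not reproduced here.

```python
import numpy as np


def reconstruct_window(X, threshold):
    """X: (channels, samples) window. Drops principal components whose
    variance exceeds `threshold` and rebuilds the window from the rest."""
    C = X @ X.T / X.shape[1]            # channel covariance of the window
    eigvals, E = np.linalg.eigh(C)      # columns of E are eigenvectors
    keep = eigvals <= threshold         # components treated as artifact-free
    E_keep = E[:, keep]
    return E_keep @ (E_keep.T @ X)      # project onto the kept subspace


# Synthetic example: unit-variance background plus a strong artifact on channel 0.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 500))
X[0] += 20.0 * rng.standard_normal(500)
X_rec = reconstruct_window(X, threshold=10.0)
print(np.linalg.eigvalsh(X @ X.T / 500).max(),
      np.linalg.eigvalsh(X_rec @ X_rec.T / 500).max())
```

After reconstruction, no direction in channel space carries more variance than the threshold, which is the property the module is meant to enforce.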

two-level window-based preprocessing

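For comparison, the original ASR :

First runs PCA on the whole recording
Uses a non-overlapping sliding window to compute the channel-wise root mean square (RMS)
Converts the RMS values to z-scores
Windows whose z-score lies in the range -3 to 5 are defined as clean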
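The clean-window selection above can be sketched as follows. The PCA step is skipped, a plain mean and standard deviation are used for the z-score, and a window counts as clean only when every channel is in range. All three are simplifying assumptions, as are the function name and the synthetic data.

```python
import numpy as np


def clean_window_mask(X, win, z_lo=-3.0, z_hi=5.0):
    """X: (channels, samples). Returns one boolean per non-overlapping
    window: True when every channel's RMS z-score lies in [z_lo, z_hi]."""
    n_ch, n = X.shape
    n_win = n // win
    windows = X[:, :n_win * win].reshape(n_ch, n_win, win)
    rms = np.sqrt((windows ** 2).mean(axis=2))  # shape (channels, windows)
    z = (rms - rms.mean(axis=1, keepdims=True)) / rms.std(axis=1, keepdims=True)
    return ((z >= z_lo) & (z <= z_hi)).all(axis=0)


# 100 windows of 50 samples; window 10 carries a large artifact.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 100 * 50))
X[:, 10 * 50:11 * 50] *= 30.0
mask = clean_window_mask(X, win=50)
print("clean windows:", int(mask.sum()), "of", mask.size)
```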

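Because unclean data greatly slows the convergence of the rejection threshold, HMO-ASR first runs PCA on the sliding window (the eigenvalues are converted to z-scores, and the eigenvectors whose z-score > 1.5 are removed)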


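In the z-score of the eigenvalues, presumably the standard form $z_k = (\lambda_k - m_\lambda) / \sigma_\lambda$, λ_k is the k-th sorted eigenvalue, and m_λ and σ_λ are the mean and standard deviation of the eigenvalues

The retained eigenvectors are projected back to the channels. If the z-score of the window passed to z-score based preprocessing lies in the range -3 to 5, the window is sent to rejection threshold calibration processing to update the artifact removal threshold. Otherwise it is treated as unclean and sent to the reconstruction module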
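A sketch of the PCA-based first level, under the assumption that the eigenvalue z-score is the usual (value − mean) / standard deviation. The function name and the synthetic 8-channel window are hypothetical; only the 1.5 cutoff comes from the notes.

```python
import numpy as np


def prune_window(X, z_max=1.5):
    """X: (channels, samples) sliding window. Removes eigenvectors whose
    eigenvalue z-score exceeds z_max and projects the rest back to
    channel space. Returns the pruned window and the keep mask."""
    C = X @ X.T / X.shape[1]
    eigvals, E = np.linalg.eigh(C)
    std = eigvals.std()
    # Guard against a degenerate window where all eigenvalues are equal.
    z = (eigvals - eigvals.mean()) / std if std > 0 else np.zeros_like(eigvals)
    keep = z <= z_max
    E_keep = E[:, keep]
    return E_keep @ (E_keep.T @ X), keep


# Synthetic window: 8 channels, one dominated by a high-variance artifact.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 500))
X[0] += 20.0 * rng.standard_normal(500)
X_pruned, keep = prune_window(X)
print("components kept:", int(keep.sum()), "of", keep.size)
```

With few channels, a single outlier eigenvalue can reach a z-score of at most about the square root of (channels − 1), so a cutoff of 1.5 only triggers when one component clearly dominates the window.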

rejection threshold calibration processing

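iterative updating, in short : updating the parameters of the current iteration uses only quantities from the previous iteration (probably a result from another paper)

Each iteration computes the square root of the covariance matrix (sqrtm(C_i)), which is computationally expensive. If the eigenvectors of the incoming data are similar to those of the previous data, computing a new sqrtm(C_i) adds little. A threshold is therefore defined to decide whether an eigenvalue decomposition is needed to update sqrtm(C_i)

C_i : updated covariance matrix
E_{i-1} : previous eigenvector matrix

Because C_i is a real symmetric matrix (hence diagonalizable), $C_i = E_i D_i E_i^T$, so $E_{i-1}^T C_i E_{i-1} = E_{i-1}^T E_i D_i E_i^T E_{i-1}$
(for a real symmetric matrix, $E_i^{-1} C_i E_i = D_i$, where $E_i^{-1} = E_i^T$)

If E_i is close to E_{i-1}, then $E_i^T E_{i-1} \approx I$, and the right-hand side of $E_{i-1}^T C_i E_{i-1} = E_{i-1}^T E_i D_i E_i^T E_{i-1}$ can be viewed as $I D_i I = D_i$

In other words, when E_i is close to E_{i-1}, the off-diagonal elements of the left-hand side $E_{i-1}^T C_i E_{i-1}$ are close to 0

The threshold is therefore defined on this property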


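If the incoming data satisfies the threshold condition, E_{i-1} is kept as an approximation of E_i, the eigenvector matrix of sqrtm(C_i). Otherwise, an eigenvalue decomposition is required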
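The idea of testing whether the previous eigenvectors still nearly diagonalize the new covariance matrix can be sketched like this. The off-diagonal energy ratio and the tolerance `tol` are assumptions chosen for illustration; the paper's exact threshold formula is not reproduced in these notes.

```python
import numpy as np


def can_reuse_eigenvectors(C_new, E_prev, tol=0.05):
    """True when E_prev nearly diagonalizes C_new, i.e. the off-diagonal
    energy of E_prev^T C_new E_prev is small relative to the total."""
    M = E_prev.T @ C_new @ E_prev
    off = M - np.diag(np.diag(M))
    ratio = np.linalg.norm(off) / np.linalg.norm(M)  # Frobenius norms
    return bool(ratio < tol)


# Case 1: the eigenvectors did not change, so no decomposition is needed.
D = np.diag([4.0, 3.0, 2.0, 1.0])
E_prev = np.eye(4)
same = can_reuse_eigenvectors(D, E_prev)

# Case 2: the first two eigenvectors rotated by 45 degrees.
c = s = np.sqrt(0.5)
G = np.eye(4)
G[:2, :2] = [[c, -s], [s, c]]
rotated = can_reuse_eigenvectors(G @ D @ G.T, E_prev)
print(same, rotated)
```

The check costs two matrix products, which is much cheaper than a full eigenvalue decomposition and is the reason the shortcut pays off on hardware.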

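Threshold update mechanism

For comparison, the original ASR :

Filters the clean part of the data with an IIR filter to obtain X_c
Computes E_c, the eigenvector matrix of the square root of the covariance matrix of X_c
Projects X_c onto the principal component space ( Y_c = E_c^T X_c )
Splits Y_c into several windows and computes the RMS value of each principal component within each window
Computes the mean μ_c and standard deviation σ_c of the RMS values over all windows
Defines the fixed rejection threshold c as c = μ_c + f · σ_c, where f is the adjustable cutoff parameter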
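The calibration steps listed above can be sketched in a few lines. The IIR filtering step is omitted, and the function name, window length, and cutoff value in the example are assumptions. The eigenvectors of sqrtm(C) are the same as those of C for a symmetric positive semi-definite C, so the sketch decomposes C directly.

```python
import numpy as np


def fixed_rejection_threshold(Xc, win, f):
    """Xc: (channels, samples) clean calibration data.
    Returns (E_c, c) with one threshold per principal component."""
    C = Xc @ Xc.T / Xc.shape[1]
    _, Ec = np.linalg.eigh(C)            # same eigenvectors as sqrtm(C)
    Yc = Ec.T @ Xc                       # project onto principal components
    n_win = Yc.shape[1] // win
    windows = Yc[:, :n_win * win].reshape(Yc.shape[0], n_win, win)
    rms = np.sqrt((windows ** 2).mean(axis=2))   # RMS per component per window
    mu, sigma = rms.mean(axis=1), rms.std(axis=1)
    return Ec, mu + f * sigma


# Synthetic clean data; win=100 and f=5.0 are hypothetical choices.
rng = np.random.default_rng(0)
Xc = rng.standard_normal((4, 2000))
Ec, c = fixed_rejection_threshold(Xc, win=100, f=5.0)
print(c)
```

A larger f makes the threshold more permissive, so fewer components are rejected later.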

In the HMO-ASR algorithm, as soon as the incoming data X∗ satisfies the z-score range, HMO-ASR updates the rejection thresholds

μ_i and σ_i are updated by the iterative updating scheme

The adaptive threshold is defined as $\Gamma_i = \mu_i + f \cdot \sigma_i$
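One way to keep μ_i and σ_i up to date without storing past windows is a running mean and variance. The sketch below uses Welford's online update as a stand-in. It is not claimed to be the paper's iterative updating scheme, and the class name and the sample RMS values are hypothetical. It tracks a single principal component; one instance per component would be needed in practice.

```python
import math


class RunningThreshold:
    """Tracks mean and standard deviation of RMS values one at a time
    and exposes the adaptive threshold mean + f * sigma."""

    def __init__(self, f):
        self.f = f
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, rms):
        # Welford's update: only the previous state and the new value are used.
        self.n += 1
        delta = rms - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (rms - self.mean)

    @property
    def sigma(self):
        return math.sqrt(self._m2 / self.n) if self.n else 0.0

    @property
    def threshold(self):
        return self.mean + self.f * self.sigma


tracker = RunningThreshold(f=2.0)
for rms_value in [1.0, 2.0, 3.0, 4.0]:   # hypothetical RMS values of clean windows
    tracker.update(rms_value)
print(tracker.mean, tracker.sigma, tracker.threshold)
```

Only three numbers per component are stored, which fits the memory-limited setting the algorithm targets.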

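After the rejection threshold is updated, the contaminated data X_j enters the reconstruction module

First, PCA is applied to each X_k to obtain $C_k = E_k D_k E_k^T$

Next, $\lambda_{k,l}$, the l-th element of the diagonal matrix D_k, is compared through an inequality against the rejection thresholds $\Gamma_i$ projected from E_i onto E_k

If the inequality holds, the corresponding l-th eigenvector in E_k is replaced with the zero vector

Reconstruction:

Conceptually, the inverse of the truncated term cancels against sqrtm and E_k^T, which yields the reconstructed result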

Cost-Effective and Variable-Channel FastICA Hardware Architecture and Implementation for EEG Signal Processing

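The independent component analysis (ICA) algorithm is considered a useful way to study brain activity through EEG signals

ICA is designed to solve the blind source separation (BSS) problem: it separates mixed signals and reveals information about brain activity (but designing and implementing ICA in hardware is challenging)

Gram-Schmidt-based whitening can be applied to the BSS problem, with Gram-Schmidt used for dimensionality reduction in PCA. To improve flexibility, the design also aims to support variable channel selection and to provide re-reference, synchronized average, and moving average functions for EEG signal processing

The architecture proposes two reused processing units (PUs) to achieve a low-cost, variable-channel FastICA hardware implementation

The main contributions are summarized as follows:

A cost-effective 2-to-16-channel floating-point FastICA architecture with two new reused PUs, using Gram-Schmidt-based whitening
An application-specific integrated circuit (ASIC) implementation of the FastICA hardware architecture that supports variable-channel FastICA and the re-reference, synchronized average, and moving average functions.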

FastICA

X : n by m mixed-signal matrix
S : n by m blind-source-signal matrix

S = W^TX

basic idea of ICA is to maximize the non-Gaussianity of W^TX

FastICA converges quickly and performs well when the signal-to-noise ratio is low

In the FastICA algorithm, the mixed signals must be preprocessed by centering and whitening

center

whiten
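Centering and whitening can be sketched as below. Note that this uses eigendecomposition-based whitening for brevity, not the Gram-Schmidt-based whitening the paper implements in hardware. The function name, the mixing matrix `A`, and the synthetic sources are assumptions for illustration.

```python
import numpy as np


def center_and_whiten(X):
    """X: (n, m) mixed signals, n channels by m samples. Returns Z with
    zero-mean rows and identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)        # center each channel
    C = Xc @ Xc.T / Xc.shape[1]                   # channel covariance
    d, E = np.linalg.eigh(C)
    V = E @ np.diag(1.0 / np.sqrt(d)) @ E.T       # whitening matrix C^(-1/2)
    return V @ Xc


# Three synthetic uniform sources, mixed by a hypothetical matrix A.
rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(3, 1000))
A = np.array([[1.0, 0.5, 0.2],
              [0.3, 1.0, 0.4],
              [0.1, 0.6, 1.0]])
X = A @ S + 2.0                                   # offset shows the effect of centering
Z = center_and_whiten(X)
print(np.round(Z @ Z.T / Z.shape[1], 3))
```

After whitening, the remaining unmixing matrix W is orthogonal, which is what lets FastICA search only over rotations.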
