# Towards Multilingual Sign Language Recognition
[Toc]
---
* Information is conveyed through multiple visual channels, e.g., manual gestures (hand shape, position and movement), facial expressions, body posture and lip movements [1]. In the sign language recognition literature, the focus has mainly been on extracting the multichannel information related to manual gestures (hand shape and hand movement) from the visual signal and on modeling this information to recognize signs.
* discrete unit representation of hand movements obtained using HMMs

* In KL-HMM [16,17], the feature observations are probabilistic (**posterior distributions**).
* In Bayesian statistics, the posterior probability of a random or uncertain event is the **conditional probability** obtained once the relevant evidence or data has been taken into account. Likewise, a posterior distribution is the probability distribution of an unknown quantity (treated as a random variable) obtained after experiments and observations.
https://www.ycc.idv.tw/deep-dl_3.html
* y = the categorical distribution over states; z = the stack of posterior features, i.e., the conditional probability distribution once y is given
* KL-divergence, often called the KL distance, is commonly used to measure the distance between two probability distributions (relative entropy).

https://www.ycc.idv.tw/deep-dl_2.html
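As a minimal illustration of the two notes above (posterior features as observations, and the KL divergence as the local score of a KL-HMM state), here is a small NumPy sketch. The vectors `z_t` and `y_d` and their values are made up for the example, and the direction of the divergence (forward, reverse or symmetric) depends on the formulation actually used.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), the relative entropy between two discrete distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

# Hypothetical example: z_t is a frame-level posterior feature vector
# (e.g., an MLP output over K subunit classes) and y_d is the categorical
# distribution attached to one KL-HMM state. The local score of state d
# for frame t is then a KL divergence between the two distributions.
z_t = np.array([0.70, 0.20, 0.05, 0.05])  # posterior features (observation)
y_d = np.array([0.60, 0.30, 0.05, 0.05])  # state-level categorical distribution

print(f"KL(y_d || z_t) = {kl_divergence(y_d, z_t):.4f}")
```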
* In speech recognition, it has been found that resource constraints can be effectively addressed by using auxiliary or non-target language resources.
* First, using **HamNoSys annotations** of signs [21–24]. Second, through unsupervised segmentation and clustering [8, 25–30]
https://www.sign-lang.uni-hamburg.de/dgs-korpus/files/inhalt_pdf/HamNoSys_06en.pdf
* An HMM-based approach [15] in which signer-independent hand movement subunits are derived with light (weak) supervision
https://www.mdpi.com/2078-2489/10/10/298
* “An HMM-based approach to modeling the multichannel information in sign language, inspired by articulatory-feature-based speech processing” [5]
http://publications.idiap.ch/downloads/papers/2020/Tornay_LREC_2020.pdf
* Articulatory-feature-based lexical modeling, building on articulatory-feature-based continuous speech recognition [14]
* Swiss German Sign Language (DSGS) from the SMILE database, Turkish Sign Language (TSL) from the HospiSign database, and German Sign Language (DGS) from the DGS database
### Swiss German Sign Language (DSGS)
It has **100 isolated signs** of a DSGS vocabulary production test
30 adult signers performed each item three times and the second pass was **manually annotated**.
We use the body pose information provided in the database, extracted with the deep-learning-based keypoint detection library OpenPose, as the basis for our feature extraction.
### Turkish Sign Language (TSL)
The HospiSign subset includes 6 adult signers, with each sign being repeated approximately 6 times by each signer.
We have used the **skeletal joint** coordinates that are provided in the database as the basis for our feature extraction.
### German Sign Language (DGS)
3D skeleton position and velocity of both hands
The 3D coordinates of the human skeleton were tracked using the **OpenNI** framework
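Across the three databases, the hand-movement observations boil down to 3D positions (and velocities) of the tracked hands. Below is a rough, hypothetical sketch of that kind of feature extraction from skeleton keypoints; the array layout, joint indices and frame rate are assumptions, not the actual OpenPose/OpenNI output formats.

```python
import numpy as np

def hand_movement_features(joints, left_idx, right_idx, fps=30.0):
    """joints: (T, J, 3) array of 3D skeleton coordinates over T frames.
    Returns per-frame 3D position and velocity of both hands, shape (T, 12)."""
    hands = joints[:, [left_idx, right_idx], :]  # (T, 2, 3)
    pos = hands.reshape(len(joints), -1)         # (T, 6) stacked hand positions
    vel = np.gradient(pos, 1.0 / fps, axis=0)    # finite-difference velocities
    return np.concatenate([pos, vel], axis=1)    # (T, 12)

# Hypothetical usage with random data standing in for tracked keypoints.
T, J = 100, 25
joints = np.random.randn(T, J, 3)
feats = hand_movement_features(joints, left_idx=7, right_idx=4)
print(feats.shape)  # (100, 12)
```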


"An HMM Approach with Inherent Model Selection for Sign Language and Gesture Recognition,"
Sandrine Tornay, Oya Aran, Mathew Magimai Doss
http://publications.idiap.ch/downloads/papers/2020/Tornay_LREC_2020.pdf
## Hand Movement Subunit Extraction
1. Left-to-right HMMs with a single Gaussian per state and **diagonal covariance** were trained for each sign (sign-based HMM/GMM). The HMM states were then clustered by pairwise comparison of their **Gaussian distributions** using the **Bhattacharyya distance**, yielding a set of clustered subunit states (see the sketch after this list).
* The Bhattacharyya distance is used to measure the distance between two probability distributions
https://www.itsfun.com.tw/%E5%B7%B4%E6%B0%8F%E8%B7%9D%E9%9B%A2/wiki-6854927-6761707
https://medium.com/ai-academy-taiwan/clustering-%E5%88%86%E7%BE%A4%E6%87%B6%E4%BA%BA%E5%8C%85-9c0bb861a3ba
2. To build the sign-based and subunit-based MLPs, we first obtained HMM-state-level alignments using the sign-level or clustered-subunit HMM/GMM systems.
* MLP: multilayer perceptron
https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E7%A5%9E%E7%B6%93%E7%B6%B2%E8%B7%AF-%E5%A4%9A%E5%B1%A4%E6%84%9F%E7%9F%A5%E6%A9%9F-multilayer-perceptron-mlp-%E9%81%8B%E4%BD%9C%E6%96%B9%E5%BC%8F-f0e108e8b9af
3. MLPs were trained to classify HMM states with a **softmax output non-linearity** and the **minimum cross-entropy** error criterion. MLPs with different numbers of hidden units (600, 800, 1000) and hidden layers (0, 1, 2, 3) were trained (see the second sketch after this list).
4. z_t denotes the posterior features of the hand movements derived for the given sign language, and s_l denotes the corresponding corpus (language).
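To make step 1 concrete, here is a minimal sketch of the **Bhattacharyya distance** between two diagonal-covariance Gaussians, with a simple agglomerative clustering over the pairwise distances used as a stand-in for the pairwise state comparison; the state parameters, clustering method and distance threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def bhattacharyya_diag(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    var = 0.5 * (var1 + var2)
    mean_term = 0.125 * np.sum((mu1 - mu2) ** 2 / var)
    cov_term = 0.5 * np.sum(np.log(var) - 0.5 * (np.log(var1) + np.log(var2)))
    return mean_term + cov_term

# Hypothetical state parameters: one (mean, variance) pair per sign-HMM state.
rng = np.random.default_rng(0)
means = rng.normal(size=(10, 12))                 # 10 states, 12-dim features
variances = rng.uniform(0.5, 1.5, size=(10, 12))

# Pairwise distance matrix over states.
n = len(means)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = bhattacharyya_diag(
            means[i], variances[i], means[j], variances[j])

# Cluster the states; each cluster id corresponds to one subunit state.
labels = fcluster(linkage(squareform(dist), method="average"),
                  t=1.0, criterion="distance")
print(labels)
```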
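And a compact sketch of steps 3–4: an MLP classifying HMM states with a softmax output trained under cross-entropy, whose per-frame class posteriors then serve as the features z_t. scikit-learn's `MLPClassifier` is used purely for brevity (softmax output and log-loss are its defaults for multi-class problems); the data, number of states and hidden-layer size are made up.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical frame-level data: X holds hand-movement feature vectors and
# y_states holds HMM-state labels obtained from forced alignment (step 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 12))
y_states = rng.integers(0, 40, size=2000)   # e.g., 40 clustered subunit states

# One hidden layer of 800 units, mirroring the grid mentioned in the notes.
mlp = MLPClassifier(hidden_layer_sizes=(800,), max_iter=200)
mlp.fit(X, y_states)

z = mlp.predict_proba(X[:5])   # per-frame posterior features z_t
print(z.shape, z.sum(axis=1))  # each row is a distribution summing to 1
```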
## Hand Shape Subunit Extraction
1. Used the **DeepHand** network, trained on the one-million-hands dataset [13] (Danish Sign Language, New Zealand Sign Language and German Sign Language data), for hand shape posterior estimation.
2. hand shape class-conditional posterior probabilities
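A sketch of what such class-conditional posterior probabilities look like in practice, using simulated logits in place of an actual DeepHand forward pass (its real interface is not reproduced here); the number of hand-shape classes and the left/right stacking are assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Turn per-frame logits into class-conditional posterior probabilities."""
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Simulated logits over 60 hypothetical hand-shape classes for T frames,
# standing in for the output of a pretrained hand-shape network.
T, C = 100, 60
logits_right = np.random.randn(T, C)
logits_left = np.random.randn(T, C)

# Per-frame hand-shape posteriors for each hand, stacked as the observation
# features of the hand-shape channel.
post = np.concatenate([softmax(logits_right), softmax(logits_left)], axis=1)
print(post.shape)  # (100, 120)
```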
## Results

RA = recognition accuracy
* Possible reasons for the lower accuracy: (a) the signers in the SMILE database are seated, whereas the signers in the DGS and HospiSign databases are standing; (b) the vocabulary of each database is limited, so there is no guarantee that every hand movement is covered by the derived subunits.
* The HospiSign database consists of phrases, whereas the other two databases consist of isolated signs, which affects the nature of the samples. This may explain why adding TSL subunits does not noticeably help in recognizing DGS or DSGS.
* When comparing subunit-based MLP and sign-based MLP KL-HMM systems, it can be observed that the performances are **comparable**, despite the fact that subunit extraction leads to state reduction.

* Average RA for reference monolingual HMM/GMM system and cross-/multi-lingual KL-HMM systems using hand movement and hand shape subunits.