---
# System prepended metadata

title: Towards Multilingual Sign Language Recognition

---

# Towards Multilingual Sign Language Recognition
[Toc]

---

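* Sign language conveys information through multiple visual channels, such as manual gestures (hand shape, position, and movement), facial expressions, body posture, and lip movements [1]. In the sign language recognition literature, the focus has mainly been on two steps. The first is extracting multichannel information related to manual gestures (hand shape and hand movement) from the visual signal. The second is modeling this information to recognize signs.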
*  discrete unit representation of hand movements obtained using HMMs
![](https://i.imgur.com/sDGrFeF.png)

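*  KL-HMM [16, 17]: the feature observations are probabilistic (**posterior distributions**)
*  In Bayesian statistics, the posterior probability of a random or uncertain event is the **conditional probability** assigned after the relevant evidence or data are taken into account. Likewise, the posterior probability distribution is the probability distribution of an unknown quantity (treated as a random variable), conditioned on the evidence obtained from experiments or observations.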
https://www.ycc.idv.tw/deep-dl_3.html
*  ![](https://i.imgur.com/NB5rI9H.png)
y = the categorical distribution associated with an HMM state
z = the stack of posterior features, i.e., the conditional probability distribution given y
https://www.ycc.idv.tw/deep-dl_3.html
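
The KL-HMM idea above can be sketched in a few lines of pure Python. This is a minimal illustration, not the paper's implementation. It makes four assumptions:

* The local score is KL(y_i || z_t) between a state's categorical distribution y_i and a frame's posterior vector z_t, which is one of the KL-HMM variants.
* The topology is left-to-right, so each frame either stays in the current state or advances by one.
* Transition probabilities are ignored.
* `kl_hmm_cost`, `EPS`, and the toy models are hypothetical.

```python
import math
from typing import List, Sequence

EPS = 1e-10  # floor on z to avoid division by zero (assumption of this sketch)


def kl(y: Sequence[float], z: Sequence[float]) -> float:
    """Local score KL(y || z) = sum_k y_k * log(y_k / z_k); terms with y_k = 0 add nothing."""
    return sum(yk * math.log(yk / max(zk, EPS)) for yk, zk in zip(y, z) if yk > 0)


def kl_hmm_cost(states: List[Sequence[float]], frames: List[Sequence[float]]) -> float:
    """Minimum accumulated KL cost of aligning the posterior frames z_1..z_T
    to a left-to-right KL-HMM with state distributions y_1..y_N.
    The alignment starts in the first state and must end in the last one."""
    if not frames or not states:
        return math.inf
    n = len(states)
    cost = [math.inf] * n  # cost[i]: best path ending in state i at the current frame
    cost[0] = kl(states[0], frames[0])
    for z in frames[1:]:
        new_cost = [math.inf] * n
        for i in range(n):
            # Either stay in state i or advance from state i - 1.
            prev = cost[i] if i == 0 else min(cost[i], cost[i - 1])
            if prev < math.inf:
                new_cost[i] = prev + kl(states[i], z)
        cost = new_cost
    return cost[-1]


# Toy example: two hypothetical 2-state sign models over 3 posterior classes.
models = {
    "A": [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    "B": [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],
}
frames = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
best = min(models, key=lambda name: kl_hmm_cost(models[name], frames))
print(best)  # → A
```

Recognition picks the sign model with the lowest accumulated KL cost. In a trained KL-HMM, the y_i would be estimated from training posteriors rather than set by hand.
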

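* KL divergence, commonly called the KL distance (also known as relative entropy), is often used to measure how far one probability distribution is from another. It is not a true distance metric because it is asymmetric: in general, D_KL(P‖Q) ≠ D_KL(Q‖P).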
![](https://i.imgur.com/r7vEK1g.png)

https://www.ycc.idv.tw/deep-dl_2.html
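
A quick numerical check of the properties mentioned above: KL divergence is non-negative, it is zero when the two distributions coincide, and it is asymmetric. This is why the "KL distance" is not a true metric. The distributions `p` and `q` below are arbitrary examples. Results are in nats because the natural log is used.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats.
    Terms with p_i == 0 contribute 0; q_i must be > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
print(round(kl_divergence(p, q), 4))  # → 0.5108
print(round(kl_divergence(q, p), 4))  # → 0.3681
print(kl_divergence(p, p))            # → 0.0
```
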
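* In speech recognition, resource constraints have been shown to be effectively addressed by using auxiliary or non-target-language resources.
* Sign language subunits have mainly been derived in two ways: first, using **HamNoSys annotations** of signs [21–24]; second, through unsupervised segmentation and clustering [8, 25–30].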
https://www.sign-lang.uni-hamburg.de/dgs-korpus/files/inhalt_pdf/HamNoSys_06en.pdf
* An HMM-based approach [15] in which signer-independent hand movement subunits are derived with light supervision
https://www.mdpi.com/2078-2489/10/10/298
* “An HMM-based approach to modeling multichannel information in sign language, inspired by articulatory-feature-based speech processing” [5]
http://publications.idiap.ch/downloads/papers/2020/Tornay_LREC_2020.pdf
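* Articulatory-feature-based lexical modeling for continuous speech recognition [14]
* Data: Swiss German Sign Language (DSGS) from the SMILE database, Turkish Sign Language (TSL) from the HospiSign database, and German Sign Language (DGS) from the DGS database
### Swiss German Sign Language (DSGS)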
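It contains **100 isolated signs** from a DSGS vocabulary production test.
30 adult signers performed each item three times, and the second pass was **manually annotated**.
We use the body pose information provided in the database as the basis for our feature extraction. This information was extracted with OpenPose, a deep-learning-based keypoint detection library.
### Turkish Sign Language (TSL)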
The HospiSign subset includes 6 adult signers, with each sign being repeated approximately 6 times by each signer.
We have used the **skeletal joint** coordinates that are provided in the database as the basis for our feature extraction.
### German Sign Language (DGS)
Features: 3D skeleton positions and velocities of both hands.
The 3D coordinates of the human skeleton were tracked using the **OpenNI** framework.
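
The DGS features are described only as 3D positions and velocities of both hands. Below is a minimal sketch of one way to build such a per-frame vector. It makes four assumptions:

* Each hand is given as a list of (x, y, z) positions.
* Velocity is a backward finite difference scaled by the frame rate.
* The first frame gets zero velocity.
* `hand_features`, `both_hands_features`, and the `fps` default of 30 are hypothetical choices, not taken from the paper.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def hand_features(track: List[Point3], fps: float = 30.0) -> List[Tuple[float, ...]]:
    """Per-frame (x, y, z, vx, vy, vz) for one hand.
    Velocity is the backward difference (p_t - p_{t-1}) * fps; frame 0 gets zero velocity."""
    feats = []
    prev = track[0] if track else None
    for p in track:
        v = tuple((c - c_prev) * fps for c, c_prev in zip(p, prev))
        feats.append(tuple(p) + v)
        prev = p
    return feats


def both_hands_features(
    left: List[Point3], right: List[Point3], fps: float = 30.0
) -> List[Tuple[float, ...]]:
    """Concatenate left- and right-hand features frame by frame (12 values per frame)."""
    return [lf + rf for lf, rf in zip(hand_features(left, fps), hand_features(right, fps))]


left = [(0.0, 0.0, 1.0), (0.25, 0.0, 1.0), (0.75, 0.0, 1.0)]
right = [(0.5, 1.0, 1.0)] * 3  # right hand held still
feats = both_hands_features(left, right, fps=10.0)
print(len(feats[0]))  # → 12
print(feats[2][:6])   # → (0.75, 0.0, 1.0, 5.0, 0.0, 0.0)
```
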
![](https://i.imgur.com/Jb8RRa4.png)
![](https://i.imgur.com/lOk3uQB.png)
"An HMM Approach with Inherent Model Selection for Sign Language and Gesture Recognition,"
Sandrine Tornay, Oya Aran, Mathew Magimai Doss
http://publications.idiap.ch/downloads/papers/2020/Tornay_LREC_2020.pdf
## Hand Movement Subunit Extraction

1.  Left-to-right HMMs with one **mixture Gaussian** per state and **diagonal covariance** were first trained for each sign (sign-based HMM/GMM). The HMM states were then clustered by pairwise comparison of their **Gaussian distributions** using the **Bhattacharyya distance**, yielding clustered subunit states.
    * The Bhattacharyya distance measures how dissimilar two probability distributions are, based on their overlap. It applies to both discrete and continuous distributions; here it compares the states' Gaussians.
https://www.itsfun.com.tw/%E5%B7%B4%E6%B0%8F%E8%B7%9D%E9%9B%A2/wiki-6854927-6761707
https://medium.com/ai-academy-taiwan/clustering-%E5%88%86%E7%BE%A4%E6%87%B6%E4%BA%BA%E5%8C%85-9c0bb861a3ba
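2.  To build the sign-based and subunit-based MLPs, we first obtained HMM-state-level alignments using either the sign-level HMM/GMM system or the cluster-based subunit HMM/GMM system.
    * MLP: Multilayer Perceptron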
https://chih-sheng-huang821.medium.com/%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92-%E7%A5%9E%E7%B6%93%E7%B6%B2%E8%B7%AF-%E5%A4%9A%E5%B1%A4%E6%84%9F%E7%9F%A5%E6%A9%9F-multilayer-perceptron-mlp-%E9%81%8B%E4%BD%9C%E6%96%B9%E5%BC%8F-f0e108e8b9af
3.  MLPs classifying HMM states were trained with a **softmax output non-linearity** and the **minimum cross-entropy** error criterion. We trained MLPs with different numbers of hidden units (600, 800, 1000) and hidden layers (0, 1, 2, 3).
4.  ![](https://i.imgur.com/NP4LHSK.png)
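z_t: feature probabilities (posteriors) of the hand movement subunits derived for that sign language
sl: corresponds to the different corpora (languages)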
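
Step 1 (clustering HMM states with the Bhattacharyya distance) can be sketched as follows. For two Gaussians with means μ1, μ2 and covariances Σ1, Σ2, the distance is

D_B = 1/8 (μ1 − μ2)ᵀ Σ⁻¹ (μ1 − μ2) + 1/2 ln(det Σ / √(det Σ1 · det Σ2)), with Σ = (Σ1 + Σ2)/2.

With diagonal covariances this reduces to a sum over dimensions. This note does not describe how the paper turns pairwise distances into clusters. The single-linkage grouping under a fixed `threshold` below is therefore an assumption. `bhattacharyya_diag`, `cluster_states`, and the toy states are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Gaussian = Tuple[Sequence[float], Sequence[float]]  # (mean, diagonal variances)


def bhattacharyya_diag(g1: Gaussian, g2: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (m1, v1), (m2, v2) = g1, g2
    d = 0.0
    for a, b, va, vb in zip(m1, m2, v1, v2):
        v = 0.5 * (va + vb)                           # averaged variance
        d += 0.125 * (a - b) ** 2 / v                 # mean-difference term
        d += 0.5 * math.log(v / math.sqrt(va * vb))   # covariance-mismatch term
    return d


def cluster_states(states: List[Gaussian], threshold: float) -> List[int]:
    """Single-linkage clustering: states linked by a chain of pairwise
    distances below `threshold` share one subunit label."""
    parent = list(range(len(states)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(states)):
        for j in range(i + 1, len(states)):
            if bhattacharyya_diag(states[i], states[j]) < threshold:
                parent[find(i)] = find(j)
    # Relabel cluster roots as consecutive subunit ids 0, 1, 2, ...
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(states))]


states = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.1, 0.0], [1.0, 1.0]),
    ([5.0, 5.0], [1.0, 1.0]),
]
print(cluster_states(states, threshold=0.5))  # → [0, 0, 1]
```

Each resulting label would be one subunit state shared across signs. A larger threshold merges more states and yields fewer subunits.
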
## Hand Shape Subunit Extraction
1. We used the **DeepHand** network, trained on the one-million-hands dataset [13] of Danish, New Zealand, and German sign language, for hand shape posterior estimation.
2. Its outputs are hand shape class-conditional posterior probabilities, which serve as the hand shape features.
## Results
![](https://i.imgur.com/Uya2EfF.png)
RA (recognition accuracy)
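* Reasons for the lower accuracy: (a) signers in the SMILE database are seated, whereas signers in the DGS and HospiSign databases are standing; (b) the vocabulary of each database is limited, so it is hard to guarantee that every hand movement is covered by the derived subunits.
* The HospiSign database consists of phrases, whereas the other two databases consist of isolated signs. This difference affects the nature of the samples. It could explain why adding TSL subunits does not noticeably help in recognizing DGS or DSGS.
* When the subunit-based MLP and sign-based MLP KL-HMM systems are compared, their performances are **comparable**, even though subunit extraction reduces the number of states.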

![](https://i.imgur.com/NLFb576.png)
* Average RA for reference monolingual HMM/GMM system and cross-/multi-lingual KL-HMM systems using hand movement and hand shape subunits.