# Speech Processing
[toc]
---
## 1. General information and organization issues
:::info
- Voice-based human-machine communication via an HMI (Human-Machine Interface)
![image](https://hackmd.io/_uploads/Bk9kuCQ2T.png)
- Dialogue systems such as chatbots
![image](https://hackmd.io/_uploads/BJfYKCm3T.png)
  - the middle part of the pipeline is Natural Language Processing (NLP)
  - TTS: Text-to-Speech
  - ASR: Automatic Speech Recognition
- speech transcription (ASR): conversion of speech from acoustic to text form
![image](https://hackmd.io/_uploads/S1jm6CX3T.png)
  - replacement of the keyboard by voice
  - automated transcription of audio recordings/streams
- speaker recognition
  - biometric identification systems for authorization purposes; forensic applications
- human articulatory system
![image](https://hackmd.io/_uploads/r1n2JyE3a.png)
- speech production model
![image](https://hackmd.io/_uploads/rJZ7GyEn6.png)
- samples of different parts of the speech signal
![image](https://hackmd.io/_uploads/Hk3RV1436.png)
  - plosive sounds are hard to handle because of their short duration
- general description of the speech signal
  - acoustic level: analysis of the waveform (the signal itself) without considering its content
  - phonetic level: analysis of the information content; the signal is segmented into sub-word units
- signal sampling and quantization: Pulse Code Modulation (PCM)
![image](https://hackmd.io/_uploads/ByWfOJVna.png)
- speech sampling
![image](https://hackmd.io/_uploads/BkTbFyVn6.png)
- linear quantization of the speech signal
![image](https://hackmd.io/_uploads/SJsYj1N2T.png)
- frequency perception
![image](https://hackmd.io/_uploads/By-Xn1436.png)
- intensity
![image](https://hackmd.io/_uploads/S1TZay4n6.png)
- loudness
![image](https://hackmd.io/_uploads/SJSy0kVn6.png)
> Human hearing is most sensitive to sounds with frequencies between roughly 200 Hz and 5000 Hz.
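The perceptual quantities above are usually expressed on logarithmic or warped scales. The slide formulas are not reproduced in these notes, so the sketch below assumes three standard textbook relations: sound intensity level $L_I = 10\log_{10}(I/I_0)$ with $I_0 = 10^{-12}\,\mathrm{W/m^2}$, one common mel-scale formula $m = 2595\log_{10}(1 + f/700)$ for frequency perception, and the phon-to-sone rule (loudness doubles every 10 phon above 40 phon). The function names are illustrative and not taken from the course material.

```python
import math

# Assumed reference intensity (threshold of hearing at 1 kHz), in W/m^2
I0 = 1e-12


def intensity_level_db(intensity_w_m2: float) -> float:
    """Sound intensity level L_I = 10 * log10(I / I0), in dB."""
    return 10.0 * math.log10(intensity_w_m2 / I0)


def hz_to_mel(f_hz: float) -> float:
    """One common mel-scale variant: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def phon_to_sone(phon: float) -> float:
    """Loudness in sone: 1 sone at 40 phon, doubling every 10 phon (meaningful roughly above 40 phon)."""
    return 2.0 ** ((phon - 40.0) / 10.0)


if __name__ == "__main__":
    print(intensity_level_db(1e-12))            # 0.0 dB: threshold of hearing
    print(round(intensity_level_db(1.0), 1))    # 120.0 dB: around the threshold of pain
    print(round(hz_to_mel(1000.0)))             # 1000: 1 kHz maps to about 1000 mel
    print(phon_to_sone(50.0))                   # 2.0: +10 phon doubles perceived loudness
```

The mel curve is close to linear below about 1 kHz and logarithmic above it, which matches the idea that frequency resolution of hearing degrades at higher frequencies.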
:::
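The PCM chain from section 1 (sampling, then linear quantization) can be sketched in pure Python. This is a minimal illustration under assumed parameters: a 440 Hz test tone sampled at 8 kHz and a mid-rise uniform quantizer over $[-1, 1]$; the helper names are hypothetical. For a full-scale sine, the measured signal-to-quantization-noise ratio (SQNR) should land near the well-known rule of thumb $6.02N + 1.76$ dB for $N$ bits.

```python
import math


def sample_sine(freq_hz, fs_hz, duration_s, amplitude=1.0):
    """Sample x(t) = A * sin(2*pi*f*t) at the instants t = n / fs."""
    n_samples = int(round(duration_s * fs_hz))
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]


def quantize_uniform(x, n_bits, x_max=1.0):
    """Mid-rise uniform (linear) quantizer with 2**n_bits levels over [-x_max, x_max].

    Returns the integer codes (the PCM words) and the reconstructed sample values.
    """
    levels = 2 ** n_bits
    step = 2 * x_max / levels  # quantization step (Delta)
    codes, recon = [], []
    for v in x:
        k = math.floor(v / step)                         # interval that contains v
        k = max(-levels // 2, min(levels // 2 - 1, k))   # clip to the available codes
        codes.append(k)
        recon.append((k + 0.5) * step)                   # reconstruct at the interval midpoint
    return codes, recon


def sqnr_db(x, x_hat):
    """Signal-to-quantization-noise ratio in dB."""
    p_signal = sum(v * v for v in x)
    p_noise = sum((v - w) ** 2 for v, w in zip(x, x_hat))
    return 10 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    x = sample_sine(440.0, 8000.0, 0.1)  # 800 samples at an assumed fs = 8 kHz
    for bits in (8, 12, 16):
        _, x_hat = quantize_uniform(x, bits)
        print(bits, "bits:", round(sqnr_db(x, x_hat), 1), "dB measured,",
              round(6.02 * bits + 1.76, 1), "dB rule of thumb")
```

Each extra bit halves the step size $\Delta$, which quarters the quantization noise power $\Delta^2/12$; $10\log_{10} 4 \approx 6.02$ dB is where the "6 dB per bit" comes from.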
---
## 2. Basic time-domain and spectral characteristics of speech signal
:::info
- Most of this topic consists of mathematical equations; see the lecture slides.
:::
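The equations themselves are in the slides. As an illustration only, assuming the course covers the two classic frame-based time-domain features, short-time energy $E = \sum_n x[n]^2$ and the zero-crossing rate (ZCR), here is a pure-Python sketch. The frame length of 200 samples (25 ms at 8 kHz), the hop of 80 samples and the synthetic test signals are assumptions chosen for the demo, not values from the slides.

```python
import math


def frames(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, advancing by hop samples."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """E = sum of x[n]^2 over the frame (rectangular window)."""
    return sum(v * v for v in frame)


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ (frame needs >= 2 samples)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    # Stand-ins: a strong 100 Hz tone for voiced speech, a weak 3 kHz tone for unvoiced speech
    voiced = [math.sin(2 * math.pi * 100 * n / fs) for n in range(800)]
    unvoiced = [0.1 * math.sin(2 * math.pi * 3000 * n / fs) for n in range(800)]
    for name, sig in (("voiced-like", voiced), ("unvoiced-like", unvoiced)):
        f = frames(sig, 200, 80)[0]
        print(name, "energy:", round(short_time_energy(f), 2),
              "ZCR:", round(zero_crossing_rate(f), 3))
```

The voiced-like tone gives high energy and a low ZCR, while the weak high-frequency tone gives the opposite. This contrast is why the two features are often combined for voiced/unvoiced/silence decisions.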