# Speech Processing

[toc]

---

## 1. General information and organization issues

:::info
- Voice-based human-machine communication via HMI
    - HMI: Human-Machine Interface
    ![image](https://hackmd.io/_uploads/Bk9kuCQ2T.png)
- Dialogue systems such as chatbots
![image](https://hackmd.io/_uploads/BJfYKCm3T.png)
    - the middle part is called Natural Language Processing (NLP)
    - TTS: Text-to-Speech
    - ASR: Automatic Speech Recognition
        - speech transcription from acoustic to text form
        ![image](https://hackmd.io/_uploads/S1jm6CX3T.png)
        - replacement of the keyboard by voice
        - automated transcription of audio records/streams
- speaker recognition
    - biometric identification system for authorization purposes, forensic applications
- human articulatory system
![image](https://hackmd.io/_uploads/r1n2JyE3a.png)
- speech production model
![image](https://hackmd.io/_uploads/rJZ7GyEn6.png)
- samples of different parts of the speech signal
![image](https://hackmd.io/_uploads/Hk3RV1436.png)
    - plosive sounds are hard to deal with because of their short duration
- general description of the speech signal
    - acoustic level: analyze purely the waveform and the signal itself, without considering the content
    - phonetic level: information content, try to separate it into sub-word units
- signal sampling and quantization, Pulse Code Modulation (PCM)
![image](https://hackmd.io/_uploads/ByWfOJVna.png)
- speech sampling
![image](https://hackmd.io/_uploads/BkTbFyVn6.png)
- linear quantization of the speech signal
![image](https://hackmd.io/_uploads/SJsYj1N2T.png)
- freq. perception
![image](https://hackmd.io/_uploads/By-Xn1436.png)
- intensity
![image](https://hackmd.io/_uploads/S1TZay4n6.png)
- loudness
![image](https://hackmd.io/_uploads/SJSy0kVn6.png)
> Hearing is most sensitive to sounds with freq. between 200 and 5000 Hz
:::

---

## 2. Basic time-domain and spectral characteristics of speech signal

:::info
- Most of them are mathematical eq., check slide.
:::
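
As a rough illustration of the sampling and linear quantization (PCM) steps listed in Section 1, here is a minimal Python sketch. It is not taken from the slides: the function names, the 440 Hz test tone, the 8 kHz sampling rate, and the 8/16-bit depths are all assumptions chosen for the example.

```python
import math


def sample_sine(freq_hz, fs_hz, duration_s, amplitude=1.0):
    """Sample a sine tone: x[n] = A * sin(2*pi*f*n / fs)."""
    n_samples = int(round(duration_s * fs_hz))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / fs_hz)
        for n in range(n_samples)
    ]


def quantize_linear(samples, n_bits, full_scale=1.0):
    """Uniform (linear) quantization to signed integer codes of n_bits."""
    max_code = 2 ** (n_bits - 1) - 1
    min_code = -(2 ** (n_bits - 1))
    codes = []
    for x in samples:
        code = int(round(x / full_scale * max_code))
        # Clip values outside the full-scale range instead of wrapping around.
        codes.append(max(min_code, min(max_code, code)))
    return codes


def dequantize(codes, n_bits, full_scale=1.0):
    """Map integer codes back to amplitudes in [-full_scale, full_scale]."""
    max_code = 2 ** (n_bits - 1) - 1
    return [c * full_scale / max_code for c in codes]


def snr_db(original, reconstructed):
    """Signal-to-quantization-noise ratio in dB."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise_power == 0:
        return float("inf")
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # Assumed example values: 440 Hz tone, 8 kHz sampling, 10 ms duration.
    x = sample_sine(freq_hz=440, fs_hz=8000, duration_s=0.01)
    for bits in (8, 16):
        x_hat = dequantize(quantize_linear(x, bits), bits)
        print(f"{bits}-bit PCM: SNR = {snr_db(x, x_hat):.1f} dB")
```

Running the script prints the quantization SNR for both bit depths; more bits give a finer quantization step, so the 16-bit version should come out with a clearly higher SNR than the 8-bit one.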