# Speech Recognition
## Input and Output of a Speech Recognition System

## Feature Extraction before the Neural Network

## Acoustic features used in papers

## How much data is needed

## Well-Known Architectures

## Audio signal NN pooling


## Listen, Attend and Spell (LAS)

Inference: Beam Search
- Greedy Search and its limitations

- Beam Search

Training: Teacher forcing

Attention


## Connectionist Temporal Classification (CTC)

* Input: $T$ acoustic features; output: $T$ tokens (ignoring downsampling)
* Output tokens include the blank symbol $\phi$; the final transcript is obtained by merging consecutive duplicate tokens, then removing $\phi$
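
The merge-then-remove step above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `ctc_collapse` and the use of the string `"φ"` as the blank symbol are assumptions made for this example.

```python
from itertools import groupby

BLANK = "φ"  # assumed stand-in for the CTC blank symbol


def ctc_collapse(tokens, blank=BLANK):
    """Collapse a frame-level CTC output into the final token sequence.

    Step 1: merge runs of consecutive identical tokens into one token.
    Step 2: remove every blank symbol.
    """
    # groupby yields one key per run of equal consecutive items
    merged = [tok for tok, _ in groupby(tokens)]
    return [tok for tok in merged if tok != blank]


if __name__ == "__main__":
    frames = ["c", "c", "φ", "a", "φ", "a", "t"]
    print(ctc_collapse(frames))  # ['c', 'a', 'a', 't']
```

The order of the two steps matters: because duplicates are merged before blanks are removed, a blank between two identical tokens (`a φ a`) keeps both of them, which is how CTC can emit a repeated token.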


## Model summary
