# 李弘毅 (Hung-yi Lee)
## Foundation Models for Speech
- pretrain -> downstream tasks
- tasks:
- speech recognition
- speaker recognition
- [SSL representation approaches](https://aclanthology.org/2022.naacl-tutorials.2/)
## SQA (Spoken Question Answering)
- No transcription needed
- Input: spoken question + spoken document
- Output: answer, given as a time span in the document
- NMSQA
- [SLUE-SQA-5](https://arxiv.org/pdf/2212.10525.pdf)
- Direct training:
	- captures content OK, but bad at semantics
- Improvement
	- add attention layers? ✗ (did not help)
	- cross-discipline capability
		- e.g., using English words to represent ATCG in DNA sequences
	- HuBERT (speech foundation model) + Longformer (text foundation model)
- 0.06 -> 0.5 F1 score
- https://github.com/ga642381/SpeechPrompt
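The HuBERT + Longformer combination above is commonly realized by quantizing speech into discrete units and feeding those unit IDs to the text model as pseudo-tokens. A minimal sketch of that quantize-then-deduplicate step, with random vectors standing in for real HuBERT frame features and an offline-learned k-means codebook (all sizes and names here are illustrative, not the actual system):

```python
import numpy as np

def quantize(features, codebook):
    """Assign each frame feature to its nearest codebook centroid (k-means style)."""
    # squared distances, shape (frames, clusters)
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

def deduplicate(units):
    """Collapse runs of repeated units, e.g. [5, 5, 5, 9, 9] -> [5, 9]."""
    out = [units[0]]
    for u in units[1:]:
        if u != out[-1]:
            out.append(u)
    return out

rng = np.random.default_rng(0)
# Stand-ins: 50 frames of 768-dim "HuBERT" features, 100-entry codebook.
features = rng.normal(size=(50, 768))
codebook = rng.normal(size=(100, 768))

units = quantize(features, codebook)   # frame-level unit IDs
tokens = deduplicate(units.tolist())   # pseudo-text sequence for the text model
print(len(units), len(tokens))
```

The deduplicated unit sequence is what plays the role of "text" for Longformer; in the real pipeline the codebook comes from k-means over HuBERT features on a large corpus.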
## Q
- how to map unit IDs to the text model's token embeddings?
	- random mapping works
	- avoid special tokens
	- mapping to more frequent tokens -> not much improvement
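The mapping question above can be pictured as choosing rows for the speech units in the text model's embedding table. A minimal sketch (all sizes and the "frequency-sorted vocab, special tokens at IDs 0..3" layout are assumptions for illustration) comparing random initialization against reusing frequent-token embeddings, which per the notes did not help much:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, n_units, dim = 30_000, 100, 768
# Stand-in for a pretrained text model's token embedding table.
text_emb = rng.normal(size=(vocab_size, dim)).astype(np.float32)

# Option 1: random init -- fresh, independent rows for the speech units.
unit_emb_random = rng.normal(scale=0.02, size=(n_units, dim)).astype(np.float32)

# Option 2: reuse embeddings of frequent text tokens (assuming token IDs
# are frequency-sorted), skipping the special tokens assumed at IDs 0..3.
frequent_ids = np.arange(4, 4 + n_units)
unit_emb_frequent = text_emb[frequent_ids].copy()

# Either table turns a discrete unit sequence into text-model inputs.
units = rng.integers(0, n_units, size=12)
inputs = unit_emb_random[units]
print(inputs.shape)
```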