# 李弘毅

## Foundation Models for Speech

- Model: pretrain -> downstream
- Tasks:
  - speech recognition
  - speaker recognition
- [Self-supervised speech representation approaches](https://aclanthology.org/2022.naacl-tutorials.2/)

## SQA (Spoken Question Answering)

- No transcription needed
- Input: spoken question + spoken document; Output: answer, given as a time span
- Datasets:
  - NMSQA
  - [SLUE-SQA-5](https://arxiv.org/pdf/2212.10525.pdf)
- Direct training: content is OK, but semantics are weak
- Improvements:
  - adding attention layers did not help
  - cross-discipline capability, e.g. using English words to represent ATCG in DNA
  - HuBERT (speech foundation model) + Longformer (text foundation model): F1 score improved from 0.06 to 0.5
- https://github.com/ga642381/SpeechPrompt

## Q

- How to map unit IDs to vowel embeddings?
  - randomly initialised
  - not special tokens
  - using more frequent tokens did not improve much
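Since SQA answers above are time spans rather than text, evaluation compares the overlap between predicted and gold spans. A minimal sketch of an overlap-based span F1 (the exact metric used by NMSQA/SLUE-SQA-5 may differ; this formula is my assumption for illustration):

```python
def span_f1(pred_start, pred_end, gold_start, gold_end):
    """Overlap-based F1 between predicted and gold answer time spans (in seconds).

    Precision = overlap / predicted length, Recall = overlap / gold length.
    This is an illustrative formula, not necessarily the benchmark's exact metric.
    """
    overlap = max(0.0, min(pred_end, gold_end) - max(pred_start, gold_start))
    if overlap == 0.0:
        return 0.0
    precision = overlap / (pred_end - pred_start)
    recall = overlap / (gold_end - gold_start)
    return 2 * precision * recall / (precision + recall)

# A prediction covering half of the gold span (and half outside it)
# gets precision 0.5 and recall 0.5, hence F1 = 0.5.
print(span_f1(0.0, 2.0, 1.0, 3.0))
```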
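The Q section notes that discrete unit IDs are mapped to embeddings by random initialisation, as ordinary (non-special) tokens. A minimal stdlib-only sketch of such a randomly initialised embedding table (the table size, dimension, and init scale are my assumptions):

```python
import random

def make_unit_embeddings(num_units, dim, seed=0):
    """Build a randomly initialised embedding table for discrete speech units.

    Each unit ID gets an ordinary dense vector (no special-token treatment),
    drawn from a small Gaussian -- a common, assumed init, learned end-to-end
    in practice.
    """
    rng = random.Random(seed)
    return {
        unit: [rng.gauss(0.0, 0.02) for _ in range(dim)]
        for unit in range(num_units)
    }

# Hypothetical sizes: 100 discrete units, 8-dimensional embeddings.
table = make_unit_embeddings(num_units=100, dim=8)
```

In a real model this table would be a trainable `nn.Embedding`; the point here is only that the ID-to-vector mapping starts random rather than from any special vocabulary slot.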
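The cross-discipline idea above (representing DNA nucleotides with English words so a text foundation model can process them) can be shown with a toy mapping. The specific words chosen are arbitrary, my own illustration; any fixed one-to-one mapping into the model's existing vocabulary works:

```python
# Toy illustration (assumed mapping, not from the lecture):
# reuse a text model's vocabulary for DNA by assigning each
# nucleotide a fixed English word.
BASE_TO_WORD = {"A": "apple", "T": "tree", "C": "cat", "G": "gold"}

def dna_to_words(seq):
    """Rewrite a DNA sequence as a list of English tokens a text model can consume."""
    return [BASE_TO_WORD[base] for base in seq]

print(dna_to_words("ATCG"))  # → ['apple', 'tree', 'cat', 'gold']
```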