# Deep Learning 李宏毅
> [name=周惶振]
> [Website](https://speech.ee.ntu.edu.tw/~hylee/ml/2021-spring.html)

![](https://i.imgur.com/677CFqq.jpg)

# HW5: Transformer
* Attention should follow the input order
    * Monotonic Attention
    * Location-aware attention
* Beam Search (a minimal decoding sketch appears near the end of these notes)
    * Not suitable for tasks with ambiguity, where several different outputs are acceptable
    * Pure Sampling
    * For speech synthesis, the decoder needs added noise (it needs some randomness)
* When the evaluation metric cannot be optimized directly, use RL
    * Treat the evaluation metric itself as the reward
    * [Sequence Level Training with Recurrent Neural Networks](https://arxiv.org/abs/1511.06732)
* Scheduled Sampling (see the sketch near the end of these notes)
    * [Parallel Scheduled Sampling](https://arxiv.org/pdf/1906.04331.pdf)

# GAN
* [Generative Adversarial Nets](https://proceedings.neurips.cc/paper/2014/file/5ca3e9b122f61f8f06494c97b1afccf3-Paper.pdf)
* [f-GAN](https://dl.acm.org/doi/pdf/10.5555/3157096.3157127)
* Tips:
    * JS divergence is not suitable (it stays at $\log 2$ when the two distributions barely overlap)
    * Use the Wasserstein distance instead (a training sketch appears at the end of these notes)
    * The discriminator $D$ must be a 1-Lipschitz function

# Self-supervised Learning
* BART for Summarization (a usage sketch follows this list)
    * Paper
        * [MASS: Masked Sequence to Sequence Pre-training for Language Generation](https://arxiv.org/pdf/1905.02450.pdf)
        * [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension](https://arxiv.org/pdf/1910.13461.pdf)
    * Code
        * [Huggingface Toolkit](https://huggingface.co/transformers/model_doc/bart.html)
        * [Facebook Original](https://github.com/facebookresearch/GENRE)
        * [Pre-Trained Model](https://github.com/HHousen/TransformerSum)
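
For BART summarization with the Huggingface toolkit linked above, here is a minimal usage sketch. It assumes the `transformers` package is installed and uses the `facebook/bart-large-cnn` checkpoint (BART fine-tuned for summarization); the generation parameters are illustrative, not tuned values.

```python
# Minimal summarization sketch with the Huggingface transformers library.
# Assumes `pip install transformers torch` and the facebook/bart-large-cnn checkpoint.
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

article = "..."  # the source document to summarize

inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")
# Beam-search decoding; beam size and length limits are illustrative.
summary_ids = model.generate(
    inputs["input_ids"],
    num_beams=4,
    max_length=142,
    min_length=56,
    early_stopping=True,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```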
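
For the Beam Search point in the HW5 section, a minimal, framework-agnostic sketch of the algorithm follows. `step_fn`, `start_token`, and `eos_token` are hypothetical names for the decoder's next-token log-probability function and its special token ids.

```python
import torch

def beam_search(step_fn, start_token, eos_token, beam_size=5, max_len=50):
    """step_fn(prefix) is assumed to return a 1-D tensor of log-probabilities
    over the vocabulary for the token that follows `prefix`."""
    beams = [([start_token], 0.0)]   # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            log_probs = step_fn(seq)
            top = torch.topk(log_probs, beam_size)
            for lp, tok in zip(top.values.tolist(), top.indices.tolist()):
                candidates.append((seq + [tok], score + lp))
        # Keep only the best `beam_size` partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos_token else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])
```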
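
For Scheduled Sampling, the idea is that during training the decoder sometimes consumes its own previous prediction instead of the ground-truth token, so training better matches inference. A minimal sketch, assuming batched token-id tensors `gold` and `prev_preds` (hypothetical names) of shape (batch, length):

```python
import torch

def scheduled_sampling_inputs(gold, prev_preds, sampling_prob):
    """Build decoder inputs by mixing gold tokens with the model's own predictions.
    With probability `sampling_prob`, position t is fed the model's prediction
    for step t-1 instead of the gold token; `sampling_prob` is typically
    increased over the course of training."""
    mixed = gold.clone()
    use_pred = torch.rand(gold.shape, device=gold.device) < sampling_prob
    use_pred[:, 0] = False                       # always keep the start token
    shifted_preds = torch.empty_like(gold)
    shifted_preds[:, 1:] = prev_preds[:, :-1]    # prediction for the previous step
    shifted_preds[:, 0] = gold[:, 0]
    mixed[use_pred] = shifted_preds[use_pred]
    return mixed
```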
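
For the 1-Lipschitz constraint in the GAN section, one common way to softly enforce it (a gradient penalty in the style of WGAN-GP, not something spelled out in these notes) is to push the critic's gradient norm toward 1 at points interpolated between real and generated samples. A minimal PyTorch sketch; `D`, `real`, and `fake` are hypothetical placeholders for the critic and a batch of real/generated samples.

```python
import torch

def critic_loss(D, real, fake, lambda_gp=10.0):
    """Negative Wasserstein estimate plus gradient penalty (WGAN-GP style)."""
    fake = fake.detach()  # critic step: do not backprop into the generator
    # Wasserstein critic objective: maximize E[D(real)] - E[D(fake)].
    w_estimate = D(real).mean() - D(fake).mean()

    # Gradient penalty: sample points on lines between real and fake data and
    # push the critic's gradient norm there toward 1 (soft 1-Lipschitz constraint).
    eps_shape = [real.size(0)] + [1] * (real.dim() - 1)
    eps = torch.rand(eps_shape, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    grad_norm = grads.flatten(1).norm(2, dim=1)
    penalty = ((grad_norm - 1.0) ** 2).mean()

    return -w_estimate + lambda_gp * penalty
```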