# [PAPER] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

![](https://hackmd.io/_uploads/HkOVfqcFh.png)

:::info
**Author** : Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)
**Paper Link** : https://arxiv.org/abs/1910.01108
**Code** : https://github.com/askaydevs/distillbert-qa
:::

## Contributions

* DistilBERT model : a smaller, faster, and lighter version of BERT obtained by applying knowledge distillation to the pre-trained BERT model.
* Knowledge distillation : the technique of transferring knowledge from a teacher (a large pre-trained model) to a student (a lightweight compressed model); see the loss sketch after the figure below.

![](https://i.imgur.com/sMo1Kud.png)
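
Below is a minimal sketch of the soft-target distillation term (temperature-scaled KL divergence between teacher and student distributions) that underlies this teacher-student setup. The temperature value and the combination weights are illustrative assumptions, not the exact values used by the authors.

```python
# Sketch of the temperature-scaled soft-target distillation loss used in
# teacher-student training. Temperature and mixing weights are assumptions.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Multiply by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

# Illustrative usage: combine with the usual masked-LM loss on hard labels.
# student_logits, teacher_logits: (batch * seq_len, vocab_size); labels: (batch * seq_len,)
# mlm_loss = F.cross_entropy(student_logits, labels, ignore_index=-100)
# loss = 0.5 * distillation_loss(student_logits, teacher_logits) + 0.5 * mlm_loss
```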