# [PAPER] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

:::info
**Author** : Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)
**Paper Link** : https://arxiv.org/abs/1910.01108
**Code** : https://github.com/askaydevs/distillbert-qa
:::
## Contributions
* DistilBERT model : a smaller, faster, and cheaper version of BERT, obtained by applying knowledge distillation to the existing BERT model during pre-training.
* Knowledge distillation : a compression technique that transfers knowledge from a teacher (a large pre-trained model) to a student (a smaller, lighter model), so that the student learns to reproduce the teacher's behavior (see the sketch after this list).
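
The core idea is to train the student on the teacher's soft output probabilities rather than only on hard labels, using a temperature-scaled softmax. A minimal PyTorch sketch of this soft-target loss is shown below; the temperature value and tensor shapes are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation loss: KL divergence between the
    temperature-softened teacher and student distributions.
    (temperature=2.0 is an illustrative choice, not from the paper.)"""
    # Soften both distributions with the same temperature T.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 so gradient magnitudes stay comparable.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Usage: logits over the vocabulary for a batch of masked positions.
teacher_logits = torch.randn(8, 30522)                        # e.g. from the BERT teacher
student_logits = torch.randn(8, 30522, requires_grad=True)    # from the DistilBERT student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```

In the paper, this distillation loss is combined with the masked language modeling loss and a cosine embedding loss over the hidden states to form the final training objective.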
