---
# System prepended metadata

title: '[PAPER] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter'

---

# [PAPER] DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
![](https://hackmd.io/_uploads/HkOVfqcFh.png)


:::info
**Author** : Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf (Hugging Face)

**Paper Link** : https://arxiv.org/abs/1910.01108

**Code** : https://github.com/askaydevs/distillbert-qa
:::

## Contributions
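* DistilBERT model : Applies knowledge distillation to the existing BERT model to obtain a smaller and faster model. The paper reports a 40% reduction in size and 60% faster inference while retaining 97% of BERT's language understanding capabilities.
* Knowledge distillation : A compression technique that transfers knowledge from the teacher, a large-scale pre-trained model, to the student, a lightweight compressed model, by training the student to reproduce the teacher's behavior.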
![](https://i.imgur.com/sMo1Kud.png)

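The teacher-to-student transfer described above can be illustrated with a soft-target loss. The following is a minimal sketch, not the authors' training code: it assumes the common formulation in which both models' logits are softened with a temperature and the student is penalized with a cross-entropy against the teacher's distribution. The function names, the example logits, and the temperature value are hypothetical and chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the softened teacher and student distributions.

    Assumption: the same temperature is applied to both models during training.
    """
    teacher_probs = softmax_with_temperature(teacher_logits, temperature)
    student_probs = softmax_with_temperature(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


# Hypothetical logits over three classes (or vocabulary tokens)
teacher_logits = [4.0, 1.0, 0.5]
student_logits = [2.5, 1.5, 0.2]

loss = distillation_loss(teacher_logits, student_logits, temperature=2.0)
print(f"distillation loss: {loss:.4f}")
```

The loss is smallest when the student's softened distribution matches the teacher's, so minimizing it pushes the student toward the teacher's full output distribution rather than only its top prediction. In practice this term is usually combined with other training objectives.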