###### tags: Paper Reading

# Downstream Model Design of Pre-trained Language Model for Relation Extraction Task

## Outline

This paper proposes a new downstream network architecture for the relation extraction task.

## Introduction/Motivation

In previous works, relation extraction can mainly be summarized by the 3 steps below, but these methods do not perform well on datasets that contain complicated relations.

1. Build an encoder and obtain word embeddings.
2. Build a certain network structure to extract information from the embeddings.
3. Feed the encoded information into a classifier, such as a softmax classifier.

Recently, many works make use of **pre-trained language models (PLMs)**. This paper argues that they do not fully exploit the PLM, so it proposes a new downstream model architecture.

## Model

### architecture

1. Put text T into a transformer (BERT) and **take the penultimate layer output as $E_w$**.
2. **Take the last layer output as $E_p$**, and **add the BERT [CLS] embedding to $E_w$ to get $E_b$**.
3. Calculate the similarity between $E_b$ and $E_p$ for each relation $i$ by $S_i = E_bW_{hi}\cdot(E_pW_{ti})^T$.
4. Use a sigmoid function to normalize $S_i$.

A minimal code sketch of this scoring head is given at the end of this note.

### loss function

Please go to section 3.3.

![](https://i.imgur.com/8wTun5O.png)

## Experiment

Please go to section 4.

## Conclusion

This paper proposes a new architecture that makes good use of PLMs. In addition, it introduces a corresponding loss function. We can take this kind of downstream model architecture as an idea for our own work. Excellent work.
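## Appendix: code sketch of the scoring head

To make the 4 architecture steps concrete, here is a minimal PyTorch sketch of the per-relation scoring head. The projection size `proj_size`, the Xavier initialization, and the way the [CLS] embedding is broadcast onto every token of $E_w$ are my assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn as nn

class RelationScoringHead(nn.Module):
    """Per-relation similarity scoring from the architecture section.

    w_h / w_t correspond to W_{hi} / W_{ti}: one head projection and one
    tail projection per relation i. Shapes and init are assumptions.
    """
    def __init__(self, hidden_size: int, num_relations: int, proj_size: int = 64):
        super().__init__()
        self.w_h = nn.Parameter(torch.empty(num_relations, hidden_size, proj_size))
        self.w_t = nn.Parameter(torch.empty(num_relations, hidden_size, proj_size))
        nn.init.xavier_uniform_(self.w_h)
        nn.init.xavier_uniform_(self.w_t)

    def forward(self, e_w, e_p, cls_emb):
        # e_w: penultimate-layer embeddings, shape (batch, seq, hidden)
        # e_p: last-layer embeddings,        shape (batch, seq, hidden)
        # cls_emb: last-layer [CLS] vector,  shape (batch, hidden)
        # Step 2: E_b = E_w + [CLS], broadcast over the sequence dimension.
        e_b = e_w + cls_emb.unsqueeze(1)
        # Step 3: S_i = (E_b W_{hi}) (E_p W_{ti})^T for every relation i.
        head = torch.einsum("bsh,rhp->brsp", e_b, self.w_h)   # (batch, rel, seq, proj)
        tail = torch.einsum("bsh,rhp->brsp", e_p, self.w_t)   # (batch, rel, seq, proj)
        scores = torch.einsum("brsp,brtp->brst", head, tail)  # (batch, rel, seq, seq)
        # Step 4: sigmoid normalization of the scores.
        return torch.sigmoid(scores)
```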
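And a hypothetical usage example wiring `bert-base-uncased` outputs into the head. The example sentence, the number of relations, and the variable names are made up for illustration:

```python
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

enc = tok("Marie Curie was born in Warsaw.", return_tensors="pt")
out = bert(**enc)

e_p = out.last_hidden_state            # last layer            -> E_p
e_w = out.hidden_states[-2]            # penultimate layer     -> E_w
cls_emb = out.last_hidden_state[:, 0]  # [CLS] token embedding

head = RelationScoringHead(bert.config.hidden_size, num_relations=96)
scores = head(e_w, e_p, cls_emb)       # (batch, num_relations, seq, seq)
```

Note that the per-relation sigmoid (rather than a softmax over relations) treats extraction as multi-label scoring, which matches step 4 and the paper's motivation of handling complicated, overlapping relations.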