<style>
img {
display: block;
margin-left: auto;
margin-right: auto;
}
</style>
> [Paper link](https://aclanthology.org/P19-1534.pdf) | [Note link](https://zhuanlan.zhihu.com/p/444030341) | [Code link](https://github.com/facebookresearch/EmpatheticDialogues) | ACL 2019
:::success
**Thoughts**
- EmpatheticDialogues is a new dataset that can be used for both retrieval-based and generative tasks.
- For both retrieval and generative systems, they compute **BLEU scores** for the model response, comparing against the gold label (the actual response).
- For the generative systems, they additionally report **perplexity** of the actual gold response.
:::
## Abstract
This work proposes a new benchmark for empathetic dialogue generation, together with EmpatheticDialogues, a novel dataset of 25k conversations grounded in emotional situations.
Their experiments indicate that dialogue models that use their dataset are perceived to be more empathetic by human evaluators, compared to models merely trained on large-scale Internet conversation data.
## Introduction

Empathetic responding is clearly relevant to dialogue systems that are geared towards general conversation or chit-chat.
Indeed, ordinary communication is frequently prompted by people sharing their feelings or circumstances.
This work aims to facilitate evaluating models’ ability to produce empathetic responses.

Their experiments show that large-capacity conversation models trained on spontaneous internet conversation data are not rated as very empathetic.
They propose two simple ways to leverage their dataset to improve those models:
- use utterances from their training data as candidate responses in a retrieval model at inference time (see the sketch after this list)
- fine-tune the model on their task
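
A minimal sketch of the first idea: pre-encode Listener utterances from the ED training set and, at inference time, return the candidate whose embedding best matches the dialogue context. The `encode_context` / `encode_candidate` functions are hypothetical stand-ins for the model's encoders, not the repository's actual API.
```python
# Sketch: augmenting a retrieval model's candidate pool with ED training
# utterances at inference time. Encoders are hypothetical placeholders.
import numpy as np

def build_candidate_pool(ed_training_utterances, encode_candidate):
    """Pre-encode every Listener utterance from the ED training set."""
    vectors = np.stack([encode_candidate(u) for u in ed_training_utterances])
    return ed_training_utterances, vectors

def retrieve_response(context, candidates, candidate_vecs, encode_context):
    """Return the candidate whose embedding has the highest dot product
    with the encoded dialogue context."""
    ctx_vec = encode_context(context)
    scores = candidate_vecs @ ctx_vec        # one score per candidate
    return candidates[int(np.argmax(scores))]
```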
## Related Work
**Emotion data**
This paper focuses on personal conversations rather than social media data, which is closer to the context of a one-on-one conversation.
**Controllable language generation**
This paper focuses on empathetic responses that are appropriate to signals inferred purely from text rather than conveying a pre-specified emotion.
**Related chit-chat data**
- DailyDialog Dataset
## Talking about Personal Situations
**Emotional situation grounding**
This paper considers 32 emotion labels.

**Speaker and listener**
The person who wrote the situation description (*Speaker*) initiates a conversation to talk about it.
The other conversation participant (*Listener*) becomes aware of the underlying situation through what the Speaker says and responds.
Speaker and Listener then exchange up to 6 more turns.
**The models discussed below are tested in the role of Listener responding to the Speaker.**
**EmpatheticDialogues dataset statistics**
The resulting dataset comprises 24,850 conversations, each grounded in a situation description.
The final train/val/test split was 19,533 / 2,770 / 2,547 conversations, respectively.
## Empathetic Response Generation
To emulate a normal conversation, the model has access to previous utterances in the dialogue, but not to the emotion word prompt (e.g., “proud”), nor to the situation description generated by the Speaker.
Given a dialogue context $x$ of $n$ previous conversation utterances concatenated and tokenized as $x_1, \cdots , x_m$, followed by a target response $\bar{y}$, their models are trained to maximize the likelihood $p(\bar{y} \mid x)$ of producing the target response.
Here, they investigate both **generative** and **retrieval-based** settings.
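
As a rough illustration of the generative setting, the sketch below computes the negative log-likelihood of the gold response given the concatenated context, i.e., training to maximize $p(\bar{y} \mid x)$ with teacher forcing. The `model` interface and batching here are assumptions for illustration, not the paper's exact implementation.
```python
# Sketch of the generative training objective: maximize p(y_bar | x) for the
# gold response given the tokenized, concatenated dialogue context.
import torch
import torch.nn.functional as F

def generative_loss(model, context_ids, target_ids):
    """Negative log-likelihood of the gold response tokens.

    context_ids: (batch, m) tokenized context x_1..x_m
    target_ids:  (batch, T) tokenized gold response y_bar
    """
    logits = model(context_ids, target_ids[:, :-1])   # teacher forcing
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),          # (batch*(T-1), vocab)
        target_ids[:, 1:].reshape(-1),                # shifted targets
    )
```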

### Base Architecture
Their base model is a Transformer network, used in two settings: a retrieval setting, where two Transformer encoders score candidate responses by the dot product of the context and candidate embeddings, and a generative setting, where a full Transformer encoder-decoder produces the response.
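
A hedged sketch of the retrieval objective, assuming (consistent with the retrieval setup described above) that candidates are scored by the dot product of context and candidate encodings and that the other responses in the batch serve as negatives; the encoder modules are placeholders.
```python
# Minimal sketch of a bi-encoder retrieval objective with in-batch negatives.
import torch
import torch.nn.functional as F

def retrieval_loss(context_encoder, candidate_encoder, context_ids, response_ids):
    """In-batch negative log-likelihood of selecting the gold response."""
    h_x = context_encoder(context_ids)      # (batch, d)
    h_y = candidate_encoder(response_ids)   # (batch, d)
    scores = h_x @ h_y.t()                  # (batch, batch) similarity matrix
    labels = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, labels)  # diagonal entries are the gold pairs
```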
### Leveraging the Training Data from ED
### Adding Information from External Predictors

## Experimental Evaluation
**Automated metrics**
For both retrieval and generative systems, they compute **BLEU scores** for the model response, comparing against the gold label (the actual response).
For the generative systems, they additionally report **perplexity** of the actual gold response.
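
For concreteness, a small sketch of how these metrics could be computed: corpus BLEU against the gold responses, and perplexity as the exponential of the mean per-token negative log-likelihood. The tokenization and BLEU smoothing choices here are assumptions, not necessarily those used in the paper.
```python
# Sketch of the automated metrics: corpus BLEU and perplexity.
import math
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu(references, hypotheses):
    """references/hypotheses: lists of whitespace-tokenized response strings."""
    refs = [[r.split()] for r in references]   # one reference per hypothesis
    hyps = [h.split() for h in hypotheses]
    return corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)

def perplexity(total_neg_log_likelihood, total_target_tokens):
    """Perplexity of the gold responses under a generative model."""
    return math.exp(total_neg_log_likelihood / total_target_tokens)
```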

**Human ratings**
- **Empathy/Sympathy**: did the responses show understanding of the feelings of the person talking about their experience?
- **Relevance**: did the responses seem appropriate to the conversation? Were they on-topic?
- **Fluency**: could you understand the responses? Did the language seem accurate?

### Results


## Conclusion
Future work will investigate how to integrate empathetic responding into more general dialogue.