# Tasks
| Task | Description |
| ---- | ----------- |
| REW (Dialogue Rewrite) | Predict the rewritten question of the current turn based on the history. |
| NLG (Natural Language Generation) | Given a slot-value table, predict the corresponding utterance. |
| SUM (Dialogue Summary) | Predict the summary based on the dialogue history. |
| FILL (Slot Filling) | Predict the slot-value table based on the given text. |
| INTENT (Intent Detection) | Predict the intent of the input text. |
| DST (Dialogue State Tracking) | Predict the belief states based on the dialogue history. |
| COMM (Commonsense QA) | Predict the answer based on the passage. |
| EMO (Emotion Detection) | Predict the sentiment of each turn. |
| DOCQA (Document QA) | Predict the answer based on the passage of each turn, in an extractive or generative manner. |
| DIALQA (Dialog QA) | Predict the answer based on the passage of each turn, in an extractive or generative manner. |
| CHAT (Chitchat) | Predict responses based on the history and, where available, a persona. |
| KGDIAL (Knowledge-graph Dialogue) | Predict the response based on the history and the KG. |
| TXT2SQL (Text-to-SQL) | Predict the SQL for the question based on the history and the database schema. |
| SIM (User Simulator) | Predict the response based on the history and predefined knowledge. |
| TOD (Task-oriented Dialogue) | Predict the response based on the history and predefined knowledge. |
# Format
**We adopt the [jsonl](https://jsonlines.org/) file format.** Each line represents a dialogue in json format. The json content is specified below:
**NOTE: Omit a field if no value is specified for it.**
```python
{
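# Allowed values: `single` or `multi`; indicates whether the dialogue is single-turn or multi-turn
"turn": str,
# Domains the dialogue involves (a list, because some dialogues span several domains)
"domain": [],
# Language of the dialogue, using the label from the original dataset (en, fr, ...)
"locale": str,
# The dialogue: a list in which each element is a dict representing one turn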
"dialog": [
{
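# Roles involved in this turn; a list, because some datasets have several roles per turn
# For datasets without role annotations:
# * single-turn data: use `ROLE`
# * multi-turn data: use `ROLE1`, `ROLE2`, ...
"roles": [str, ...],
# Text of this turn
"utterance": str,
# Used for the answer in QA
"start": int,
"end": int,
"dialog_turn": int,
# Rewritten version of this turn's text
"rewritten": str,
# Dialogue state: a list whose elements each contain
#   domain: in some datasets the annotation restricts which slot-value pairs a domain may have
#   intent: in some datasets the annotation restricts which slot-value pairs an intent may have
#   slot-value pairs: a list; each element holds a slot and its values
#     slot name: a string
#     values: a list, because one slot may have several values
#       each element has four parts: the value, the canonical value, and the character index positions in this turn's text (utterance);
#       the canonical value is annotated only in some datasets
#     relation: some slots equal a value, others are greater than a value; defaults to equality when omitted
#   requested slots: a list of slot names that still need to be queried but are not yet filled in the current state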
"belief_state": [
{
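# intent
"intent": str,
# slot-value pairs
"informed_slot_value_table": [
{
# slot name
"slot": str,
# values
"values": [{
# actual value
"value": str,
# canonical value
"cononical_value": str
}, ...],
# slot-value relation
"relation": str,
},
...
],
# requested slots
"requested_slots": [],
# domain
"domain": str,
}, ...
],
# Dialogue acts: a list describing the acts of the current turn; each element contains
#   domain: in some datasets the annotation restricts which slot-value pairs a domain may have
#   act: the act involved in the current turn
#   slot-value pairs: same as in the dialogue state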
"dialog_acts": [
{
# act
"act": str,
# slot-value pairs
"slot_value_table": [
{
# slot name
"slot": str,
# slot-value relation
"relation": str,
# values
"values": [
{
# actual value
"value": str,
# canonical value
"cononical_value": str,
# start position
"start": int,
# end position
"end": int,
},...
]
},
...
],
# domain
"domain": str,
},
...
],
# slot filling
"slots_to_fill": {
"intent": str,
"slot_value_table": [
{
"slot": str,
"values": [
{
"value": str,
"start": int,
"end": int
}
],
"relation": str, # '=', '<=' and so on
}
]
},
# Named entity recognition
"named_entity_recognition": [
{
"type": str,
"values": [
{
"value": str,
"start": int,
"end": int
}, ...
]
}, ...
],
"characters": [
{
"value": str,
"start": int,
"end": int
}
],
# Intent detection
"active_intents": [str],
# query
"query": {
...
},
# query result
"querying_result": {
...
},
# recorded satisfied main items
"main_items": [],
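# Aspect Sentiment Triplet Extraction task: a list whose elements each contain three parts
#   the target entity
#   the associated sentiment
#   the opinion word that expresses the sentiment
"aspects": [
{
# target entity
"target": {
# entity value
"value": str,
# start position in this turn's text
"start": int,
# end position in this turn's text
"end": int
},
# category of the target entity
"category": str,
# opinion word that expresses the sentiment
"opinion": {
# opinion word
"value": str,
# start position in this turn's text
"start": int,
# end position in this turn's text
"end": int
},
# associated sentiment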
"sentiment": str
}
],
"emotions": [
{
"emotion": str,
# one of "positive", "negative" or "ambiguous"
"sentiment": str,
"evidences": [
{
"turn": int,
"span": str,
"start": int,
"end": int
}
],
"evidence_types": [str]
}
],
"kg_label": str,
# Each turn may use different knowledge; this field selects the knowledge for the turn
"knowledge_to_select": str,
# sql
"sql": str,
# rewritten
"rewritten": str,
"roles_to_select": [str],
},
],
# Summary derived from the whole dialogue
"summary": str,
# Entity relations determined from the whole dialogue
"instance_relations": [
{
"instance1": str,
"instance2": str,
"relations": [
{
"relation": str,
"trigger": str
}, ...
]
}, ...
],
# Role relations determined from the whole dialogue
"role_relations": [
{
"turn": int,
"relation": str
}
],
# Used by FriendsPersona: the persona of a character, determined from the whole dialogue
"role_personas": [
{
"name": str,
"personas": [
{
"persona": str,
"sentiment": int
}, ...
]
}
],
# External knowledge the dialogue depends on
"knowledge": {
# `text`, `persona`, `kg`, `schema`, `dialogue`, `wiki` or `sql`
"type": str,
# for `text`
"value": str,
# for `persona`, persona of all roles, for personachat...
"value": [
{
# role name, the same as in the dialog turns
"role": str,
# persona description, may be several sentences
"description": []
},
...
],
# for `kg`
"value": {
# `directed` or `undirected`
"direction": str,
# graph
"graph": [
{
# source node
"source": str,
# target node
"target": str,
# relation
"relation": str
},
...
]
},
# for `schema`
"value": {
...
},
# for `dialogue`
"value": {
"dialog": [],
"relations": []
},
# for `wiki`
"value": {
...
},
# for `sql`
"value": [
{
"turn": int,
"sql": str,
"result": ...
}, ...
],
# Some dialogues are based on a passage of an article; this field gives the titles of the article and the section
"value": {
"article title": str,
"section title": str
},
}
}
```
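As a minimal sketch of the format above, the following builds one single-turn record and round-trips it through a JSONL line. The helper names `dump_jsonl_line` and `load_jsonl`, the example utterance, and the domain and intent labels are illustrative assumptions, not part of the specification. Fields without a value are simply left out, which is one reading of the note above.

```python
import io
import json

# A hypothetical single-turn record; only fields that have values are set.
record = {
    "turn": "single",
    "domain": ["banking"],
    "locale": "en",
    "dialog": [
        {
            "roles": ["ROLE"],
            "utterance": "I lost my card.",
            "active_intents": ["lost_card"],
        }
    ],
}


def dump_jsonl_line(rec):
    """Serialize one dialogue as a single JSON line (no embedded newlines)."""
    return json.dumps(rec, ensure_ascii=False)


def load_jsonl(stream):
    """Yield one dialogue dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Round trip through an in-memory file to keep the sketch self-contained.
buffer = io.StringIO(dump_jsonl_line(record) + "\n")
dialogues = list(load_jsonl(buffer))
```

Passing `ensure_ascii=False` keeps non-English utterances readable in the output file; it is a convenience choice here, not a requirement stated by the format.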
# Update Log
* [md, 2022.07.20] => remove the `target` section and specify these info in `summary`, `intent`, `answer`, etc.
* [xrk, 2022.07.27] => put `act` into `slot_value_table`, add `goal` from a turn.
* [hyh, 2022.08.02] => put `locale`, `scenario` into `dialog`, modify the definition of `intent`.
* [xrk, 2022.08.03] => put `topic` into `dialog`; modify the definition of `summary` from "str" to "str or list"; add `QA`; modify the definition of `answer`.
* [xrk, 2022.08.10] => modify the description of `summary` for ASTE-Data-V2; add `opinions` for sentihood.
* [md, 2022.08.18] => extend `slot_value_table` to `belief_state` to support multi-intent belief states (DST).
* [md, 2022.08.18] => introduce `querying_result` for the querying result of some turns (e.g. SYSTEM).
* [hyh, 2022.08.18] => remove the `scenario` and `intent` outside the `belief_state`.
* [xrk, 2022.08.19] => remove the `QA` and `goal` outside the `belief_state`, introduce `act`.
* [md, 2022.09.14] => add `hyp` for AlphaNLI.
* [md, 2022.09.15] => add `start` and `end`, expressing the character-level start and end index of each slot value.
* [md, 2022.09.16] => add `aspects` section.
* [md, 2022.09.18] => refine `belief_state` and add `dialog_act` and `active_intent`.
* [md, 2022.09.19] => add `title` section.
# TODO
* query: API, Constraints format
* querying result: format
# Datasets
## AlphaNLI
```
obs1: str
obs2: str
hyp: str
label: bool
```
## ASTE (v2)
```
utterance: str
tags: [{
target: {
value: str,
start: int,
end: int
},
opinion: {
value: str,
start: int,
end: int
},
sentiment: str
}, ...
]
```
## CamRest676
```
User:
{
transcript: str,
slu: [
{
act: str,
slots: [
str, (slot)
str, (values)
...
]
}
]
}
Sys:
{
"sent": str,
"DA": [
str (slot), ...
]
}
```
## CANARD
```
{
passage: str, (looked up in QuAC by dialogue id)
Question: str,
Rewrite: str
}
```
## CLINC150
```
{
utterance: str,
intent: str,
domain: str
}
```
## Commonsense-Dialogues
```
{
context: str,
utterance: str
}
```
## CommonsenseQA
```
{
stem: str, (question)
choices: [
{
label: str,
text: str
}, ...
]
}
```
## CosmosQA
```
{
context: str,
answers: [
str, ...
],
label: int
}
```
## CommonsenseQA 2.0
```
{
question: str,
answer: yes / no
}
```
## SAMSum Corpus
```
{
dialogue: str,
summary: str
}
```
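A hedged sketch of how such a record might be mapped onto the unified format: the `dialogue` string is split into turns and `summary` is copied to the dialogue-level field. The function name `samsum_to_unified` is hypothetical, and the code assumes each line of `dialogue` has the form `Speaker: text`, which the field list above does not state.

```python
def samsum_to_unified(example):
    """Convert one {dialogue, summary} record into a unified dialogue dict."""
    turns = []
    for line in example["dialogue"].splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the speaker name precedes the first ": " on each line.
        speaker, sep, text = line.partition(": ")
        if not sep:
            # No speaker prefix found; fall back to a placeholder role.
            speaker, text = "ROLE1", line
        turns.append({"roles": [speaker], "utterance": text})
    return {
        "turn": "multi",
        "dialog": turns,
        "summary": example["summary"],
    }


# Illustrative input, not taken from the dataset.
example = {
    "dialogue": "Amanda: I baked cookies. Do you want some?\r\nJerry: Sure!",
    "summary": "Amanda baked cookies and offers Jerry some.",
}
unified = samsum_to_unified(example)
```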
## DailyDialog
```
{
topic: str,
utterance: str,
da: str,
emotion: str,
}
```
## DDRel
```
{
utterance: str,
label: [13-class, 6-class, 4-class]
}
```
## DialogRE
```
{
utterance: str,
relations: [
{
x: str,
y: str,
r: str,
t: str,
x_type: str,
y_type: str
}, ...
]
}
```
## DialogSum
```
{
utterance: str,
summary: str,
topic: str
}
```
## DREAM
```
{
dialogue: str,
question: str,
choice: [
str, ...
],
answer: str
}
```
## E2E
```
{
ref: str,
slot value table.
}
```
## Identification-Character-EmoryNLP
```
{
utterance: str,
speakers: [str, ],
character_entities: [
[start, end, value]
]
}
```
## Emotion-Detection-EmoryNLP
```
{
utterance: str,
speakers: [str, ],
emotion: str
}
```
## FriendsPersona
```
{
roles: [str],
utterance: str,
persona: [str, ]
}
```
## EmpatheticDialogues
```
{
context: str,
prompt: str,
utterance: str,
}
```
## FriendsQA
```
{
context: [
multi-turn dialogue
],
question: str,
answers: [
{
text: str,
utterance_id: int,
start: int,
end: int,
is_speaker: bool
}
]
}
```
## GoEmotions
```
{
utterance: str,
emotions: [str, ...],
sentiment_dict
}
```
## Google Simulated Dialogue
```
{
dialogue_state: [
{
slot: str,
value: str
}, ...
],
user_acts: [
{
slot: str,
type: str,
value: str,
}
],
user_utterance: {
slots: [{
slot: str,
start: int,
exclusive_end: int
}],
text: str,
tokens: List
},
user_intents: [],
system_acts: [
{
slot: str,
type: str,
value: str
}
],
system_utterance: ...
}
```
## Banking77
```
{
text: str,
intent: str
}
```
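For an intent-classification record like this one, a conversion to the unified format could look like the sketch below. The function name `intent_example_to_unified` and its `intent_key` parameter are hypothetical; the single-turn `ROLE` convention and the `active_intents` field come from the Format section.

```python
def intent_example_to_unified(example, intent_key="intent"):
    """Wrap a {text, <intent_key>} record as a single-turn unified dialogue."""
    return {
        "turn": "single",
        "dialog": [
            {
                # Single-turn data without role annotations uses `ROLE`.
                "roles": ["ROLE"],
                "utterance": example["text"],
                "active_intents": [example[intent_key]],
            }
        ],
    }


# Illustrative input, not taken from the dataset.
unified = intent_example_to_unified(
    {"text": "My card has not arrived yet.", "intent": "card_arrival"}
)
```

The same helper would presumably cover HWU64 by passing `intent_key="category"`, since that dataset stores its label under `category`.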
## HWU64
```
{
text: str,
category: str
}
```
## MELD
```
{
Utterance: str,
Speaker: str,
Emotion: str,
Sentiment: positive, negative, neutral
}
```
## Molweni
```
{
context: [
multi turn dialogue
],
question: str,
answers: [
{
text: str,
start: int
}
]
}
```
## MultiWOZ 2.2 (identical to SGD)
```
{
services: [str],
frames: [
{
actions: [],
service: str,
slots: [
],
state: {
...
}
}
]
}
```
## Reading-comprehension
```
{
context: [
multi turn dialog
],
query: str,
answer: value
}
```
## RECCON
```
{
speaker: str,
utterance: str,
emotion: str,
evidence: [turn id],
span: [utterance span str,],
type: [
str,
]
}
```
## Restaurant8k
```
{
UserInput: str,
labels: [
{
slot: str,
value: str,
start: int,
end: int
}
]
}
```
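The labels above map naturally onto `slots_to_fill` in the unified format. The following is a sketch under assumptions: the function name `restaurant8k_to_unified` is hypothetical, `start` and `end` are copied through unchanged because the source does not say whether `end` is inclusive or exclusive, and `intent` and `relation` are omitted since the record carries neither (an omitted relation defaults to equality).

```python
def restaurant8k_to_unified(example):
    """Convert a {UserInput, labels} record into a single-turn unified dialogue."""
    # Group values by slot name, since one slot may have several values.
    by_slot = {}
    for label in example.get("labels", []):
        by_slot.setdefault(label["slot"], []).append(
            {
                "value": label["value"],
                "start": label["start"],
                "end": label["end"],
            }
        )
    table = [{"slot": slot, "values": values} for slot, values in by_slot.items()]
    return {
        "turn": "single",
        "dialog": [
            {
                "roles": ["ROLE"],
                "utterance": example["UserInput"],
                "slots_to_fill": {"slot_value_table": table},
            }
        ],
    }


# Illustrative input; the indices treat `end` as exclusive purely for this example.
example = {
    "UserInput": "a table for 4 people at 7pm",
    "labels": [
        {"slot": "people", "value": "4", "start": 12, "end": 13},
        {"slot": "time", "value": "7pm", "start": 24, "end": 27},
    ],
}
unified = restaurant8k_to_unified(example)
```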
## DSTC8 (identical to Restaurant8k)
## RNNLG
```
{
da+svt: str,
ref1: str,
ref2: str
}
```
## SentiHood
```
{
text: str,
options: [
{
sentiment: str,
aspect: str,
target_entity: str
}
]
}
```
## SNIPS
```
{
text: str,
svt
}
```
## SocialIQA
```
{
context: str,
answers: [
str, ...
]
}
```
## MuDoCo
```
{
utterance: str,
rewritten_utterance: str
}
```
## MAMS
```
ATSA:
{
text: str,
from: int,
to: int,
polarity: str,
term
}
ACSA:
{
text: str,
polarity: str,
category: str
}
```
## SGD
```
{
services
}
```
## TaskMaster
## MultiDoGo
## DSTC2
## DSTC3
## CoQA
## DoQA
## NarrativeQA
## QuAC
## RACE
## SQuAD
## MuTual
## E2E_Dialogue
## Spider
## SParC
## Soccer
## NLU++
## MASSIVE
## BiToD
## COD
## GlobalWoZ
## DSTC6
## XPersona
## MINDS-14
## MKQA
## MTOP
## Multi2WOZ
## Multilingual TOP
## Multilingual WOZ