# Tasks

| Task | Description |
| ---- | ----------- |
| REW (Dialogue Rewrite) | Predict the rewritten question of the current turn based on the history. |
| NLG (Natural Language Generation) | Given a slot-value table, predict its corresponding utterance. |
| SUM (Dialogue Summary) | Predict the summary based on the dialogue history. |
| FILL (Slot Filling) | Predict the slot-value table based on the given text. |
| INTENT (Intent Detection) | Predict the intent of the input text. |
| DST (Dialogue State Tracking) | Predict the belief states based on the dialogue history. |
| COMM (Commonsense QA) | Predict the answer based on the passage. |
| EMO (Emotion Detection) | Predict the sentiment of each turn. |
| DOCQA (Document QA) | Predict the answer based on the passage of each turn, in an extractive or generative manner. |
| DIALQA (Dialog QA) | Predict the answer based on the passage of each turn, in an extractive or generative manner. |
| CHAT (Chitchat) | Predict responses based on the history and possible persona. |
| KGDIAL (Knowledge-graph Dialogue) | Predict the response based on the history and the KG. |
| TXT2SQL (Text-to-SQL) | Predict the SQL of the question based on the history and the database schema. |
| SIM (User Simulator) | Predict the response based on the history and predefined knowledge. |
| TOD (Task-oriented dialogue) | Predict the response based on the history and predefined knowledge. |

# Format

**We adopt the [jsonl](https://jsonlines.org/) file format.** Each line represents one dialogue as a JSON object. The JSON content is specified below:

**NOTE: Do not set a value if it is not specified.**

```python
{
    # Possible values: `single` or `multi`; indicates a single-turn or a multi-turn dialogue
    "turn": str,
    # Domains covered by this dialogue (a list, since some dialogues span multiple domains)
    "domain": [],
    # Language of the dialogue, taken from the original dataset's own annotation (en, fr, ...)
    "locale": str,
    # The dialogue: a list in which each element is a dict representing one turn
    "dialog": [
        {
            # Roles involved in this turn (a list, since in some datasets one turn involves several roles)
            # For datasets without role annotations:
            #   * single-turn data: use `ROLE`
            #   * multi-turn data: use `ROLE1`, `ROLE2`, ...
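            "roles": [str, ...],
            # Text of this turn
            "utterance": str,
            # Used for the answer in QA
            "start": int,
            "end": int,
            "dialog_turn": int,
            # Rewritten text corresponding to this turn's text
            "rewritten": str,
            # Dialogue state: a list in which each element contains
            #   domain: some datasets' annotations restrict which slot-value pairs belong to a given domain
            #   intent: some datasets' annotations restrict which slot-value pairs belong to a given intent
            #   slot-value pairs: a list, each element containing a slot and its corresponding values
            #     slot name: a string
            #     values: a list, since a slot may have multiple values;
            #       each element has four parts: the value, the normalized value, and the start and end
            #       character indices in this turn's text (utterance). The normalized value is only
            #       annotated in some datasets.
            #   relation: some slots equal a value, others are greater than a value; defaults to equality if not set
            #   requested slots: a list of slot names that still need to be queried in the current state but are not yet filled
            "belief_state": [
                {
                    # Intent
                    "intent": str,
                    # Slot-value pairs
                    "informed_slot_value_table": [
                        {
                            # Slot name
                            "slot": str,
                            # Values
                            "values": [{
                                # Actual value
                                "value": str,
                                # Normalized value
                                "cononical_value": str
                            }, ...],
                            # Slot-value relation
                            "relation": str,
                        },
                        ...
                    ],
                    # Requested slots
                    "requested_slots": [],
                    # Domain
                    "domain": str,
                },
                ...
            ],
            # Dialogue acts: a list of the dialogue acts in the current turn; each element contains
            #   domain: some datasets' annotations restrict which slot-value pairs belong to a given domain
            #   act: the act performed in the current turn
            #   slot-value pairs: same as in the dialogue state
            "dialog_acts": [
                {
                    # Act
                    "act": str,
                    # Slot-value pairs
                    "slot_value_table": [
                        {
                            # Slot name
                            "slot": str,
                            # Slot-value relation
                            "relation": str,
                            # Values
                            "values": [
                                {
                                    # Actual value
                                    "value": str,
                                    # Normalized value
                                    "cononical_value": str,
                                    # Start position
                                    "start": int,
                                    # End position
                                    "end": int,
                                },
                                ...
                            ]
                        },
                        ...
                    ],
                    # Domain
                    "domain": str,
                },
                ...
            ],
            # Slot filling
            "slots_to_fill": {
                "intent": str,
                "slot_value_table": [
                    {
                        "slot": str,
                        "values": [
                            {
                                "value": str,
                                "start": int,
                                "end": int
                            }
                        ],
                        "relation": str,  # '=', '<=' and so on
                    }
                ]
            },
            # Named entity recognition
            "named_entity_recognition": [
                {
                    "type": str,
                    "values": [
                        {
                            "value": str,
                            "start": int,
                            "end": int
                        },
                        ...
                    ]
                },
                ...
            ],
            "characters": [
                {
                    "value": str,
                    "start": int,
                    "end": int
                }
            ],
            # Intent detection
            "active_intents": [str],
            # Query
            "query": {
                ...
            },
            # Query result
            "querying_result": {
                ...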
            },
            # Recorded satisfied main items
            "main_items": [],
            # Aspect Sentiment Triplet Extraction task: a list in which each element contains three parts:
            #   the target entity
            #   the associated sentiment
            #   the opinion word that reflects the sentiment
            "aspects": [
                {
                    # Target entity
                    "target": {
                        # Entity value
                        "value": str,
                        # Start position in this turn's text
                        "start": int,
                        # End position in this turn's text
                        "end": int
                    },
                    # Category of the target entity
                    "category": str,
                    # Opinion word that reflects the sentiment
                    "opinion": {
                        # Opinion word
                        "value": str,
                        # Start position in this turn's text
                        "start": int,
                        # End position in this turn's text
                        "end": int
                    },
                    # Associated sentiment
                    "sentiment": str
                }
            ],
            "emotions": [
                {
                    "emotion": str,
                    "sentiment": str,  # "positive", "negative" or "ambiguous"
                    "evidences": [
                        {
                            "turn": int,
                            "span": str,
                            "start": int,
                            "end": int
                        }
                    ],
                    "evidence_types": [str]
                }
            ],
            "kg_label": str,
            # The knowledge used may differ from turn to turn; select knowledge according to this field
            "knowledge_to_select": str,
            # SQL
            "sql": str,
            "roles_to_select": [str],
        },
    ],
    # Summary derived from the whole dialogue
    "summary": str,
    # Entity relations determined from the whole dialogue
    "instance_relations": [
        {
            "instance1": str,
            "instance2": str,
            "relations": [
                {
                    "relation": str,
                    "trigger": str
                },
                ...
            ]
        },
        ...
    ],
    # Role relations determined from the whole dialogue
    "role_relations": [
        {
            "turn": int,
            "relation": str
        }
    ],
    # Used by FriendsPersona: the persona of a character, determined from the whole dialogue
    "role_personas": [
        {
            "name": str,
            "personas": [
                {
                    "persona": str,
                    "sentiment": int
                },
                ...
            ]
        }
    ],
    # External knowledge the dialogue depends on
    "knowledge": {
        # `text`, `persona`, `kg`, `schema`, `dialogue`, `wiki` or `sql`
        "type": str,
        # for `text`
        "value": str,
        # for `persona`: the personas of all roles (e.g. for PersonaChat)
        "value": [
            {
                # Role name, the same as in the dialog turn
                "role": str,
                # Persona description; may be several sentences
                "description": []
            },
            ...
        ],
        # for `kg`
        "value": {
            # `directed` or `undirected`
            "direction": str,
            # Graph
            "graph": [
                {
                    # Source node
                    "source": str,
                    # Target node
                    "target": str,
                    # Relation
                    "relation": str
                },
                ...
            ]
        },
        # for `schema`
        "value": {
            ...
        },
        # for `dialogue`
        "value": {
            "dialog": [],
            "relations": []
        },
        # for `wiki`
        "value": {
            ...
        },
        # for `sql`
        "value": [
            {
                "turn": int,
                "sql": str,
                "result": ...
            },
            ...
        ],
        # Some dialogues are based on a section of an article; this field gives the article and section titles
        "value": {
            "article title": str,
            "section title": str
        },
    }
}
```

# Update Log

* [md, 2022.07.20] => remove the `target` section and specify this information in `summary`, `intent`, `answer`, etc.
* [xrk, 2022.07.27] => put `act` into `slot_value_table`, add `goal` from a turn.
* [hyh, 2022.08.02] => put `locale`, `scenario` into `dialog`, modify the definition of `intent`.
* [xrk, 2022.08.03] => put `topic` into `dialog`; change the definition of `summary` from "str" to "str or list"; add `QA`; modify the definition of `answer`.
* [xrk, 2022.08.10] => modify the description of `summary` for ASTE-Data-V2; add `opinions` for SentiHood.
* [md, 2022.08.18] => extend `slot_value_table` to `belief_state` to support multi-intent belief states (DST).
* [md, 2022.08.18] => introduce `querying_result` for the querying result of some turns (e.g. SYSTEM).
* [hyh, 2022.08.18] => remove the `scenario` and `intent` outside the `belief_state`.
* [xrk, 2022.08.19] => remove the `QA` and `goal` outside the `belief_state`, introduce `act`.
* [md, 2022.09.14] => add `hyp` for AlphaNLI.
* [md, 2022.09.15] => add `start` and `end`, expressing the start and end index of each slot value at the character level.
* [md, 2022.09.16] => add the `aspects` section.
* [md, 2022.09.18] => refine `belief_state` and add `dialog_act` and `active_intent`.
* [md, 2022.09.19] => add the `title` section.

# TODO

* query: API, constraints format
* querying result: format

# Datasets

## AlphaNLI

```
obs1: str
obs2: str
hyp: str
label: bool
```

## ASTE (v2)

```
utterance: str
tags: [
    {
        target: {
            value: str,
            start: int,
            end: int
        },
        opinion: {
            value: str,
            start: int,
            end: int
        },
        sentiment: str
    },
    ...
]
```

## CamRest676

```
User: {
    transcript: str,
    slu: [
        {
            act: str,
            slots: [
                str, (slot)
                str, (values)
                ...
            ]
        }
    ]
}
Sys: {
    "sent": str,
    "DA": [
        str (slot),
        ...
    ]
}
```

## CANARD

```
{
    passage: str, (search on QuAC by dial id)
    Question: str,
    Rewrite: str
}
```

## CLINC150

```
{
    utterance: str,
    intent: str,
    domain: str
}
```

## Commonsense-Dialogues

```
{
    context: str,
    utterance: str
}
```

## CommonsenseQA

```
{
    stem: str, (question)
    choices: [
        {
            label: str,
            text: str
        },
        ...
    ]
}
```

## CosmosQA

```
{
    context: str,
    answers: [
        str,
        ...
    ],
    label: int
}
```

## CommonsenseQA 2.0

```
{
    question: str,
    answer: yes / no
}
```

## SAMSum Corpus

```
{
    dialogue: str,
    summary: str
}
```

## DailyDialog

```
{
    topic: str,
    utterance: str,
    da: str,
    emotion: str,
}
```

## DDRel

```
{
    utterance: str,
    label: [13-class, 6-class, 4-class]
}
```

## DialogRE

```
{
    utterance: str,
    relations: [
        x: str,
        y: str,
        r: str,
        t: str,
        x_type: str,
        y_type: str,
    ]
}
```

## DialogSum

```
{
    utterance: str,
    summary: str,
    topic: str
}
```

## DREAM

```
{
    dialogue: str,
    question: str,
    choice: [
        str,
        ...
    ],
    answer: str
}
```

## E2E

```
{
    ref: str,
    slot value table
}
```

## Identification-Character-EmoryNLP

```
{
    utterance: str,
    speakers: [str, ],
    character_entities: [
        [start, end, value]
    ]
}
```

## Emotion-Detection-EmoryNLP

```
{
    utterance: str,
    speakers: [str, ],
    emotion: str
}
```

## FriendsPersona

```
{
    roles: [str],
    utterance: str,
    persona: [str, ]
}
```

## EmpatheticDialogues

```
{
    context: str,
    prompt: str,
    utterance: str,
}
```

## FriendsQA

```
{
    context: [
        dialogue multi turns
    ],
    question: str,
    answers: [
        {
            text: str,
            utterance_id: int,
            start: int,
            end: int,
            is_speaker: bool
        }
    ]
}
```

## GoEmotions

```
{
    utterance: str,
    emotions: [str, ...],
    sentiment_dict
}
```

## Google Simulated Dialogue

```
{
    dialogue_state: [
        {
            slot: str,
            value: str
        },
        ...
    ],
    user_acts: [
        {
            slot: str,
            type: str,
            value: str,
        }
    ],
    user_utterance: {
        slots: [{
            slot: str,
            start: int,
            exclusive_end: int
        }],
        text: str,
        tokens: List
    },
    user_intents: [],
    system_acts: [
        {
            slot: str,
            type: str,
            value: str
        }
    ],
    system_utterance: ...
}
```

## Banking77

```
{
    text: str,
    intent: str
}
```

## HWU64

```
{
    text: str,
    category: str
}
```

## MELD

```
{
    Utterance: str,
    Speaker: str,
    Emotion: str,
    Sentiment: positive, negative, neutral
}
```

## Molweni

```
{
    context: [
        multi turn dialogue
    ],
    question: str,
    answers: [
        {
            text: str,
            start: int
        }
    ]
}
```

## MultiWOZ 2.2 (identical to SGD)

```
{
    services: [str],
    frames: [
        {
            actions: [],
            service: str,
            slots: [ ],
            state: {
                ...
            }
        }
    ]
}
```

## Reading-comprehension

```
{
    context: [
        multi turn dialog
    ],
    query: str,
    answer: value
}
```

## RECCON

```
{
    speaker: str,
    utterance: str,
    emotion: str,
    evidence: [turn id],
    span: [utterance span str, ],
    type: [
        str,
    ]
}
```

## Restaurant8k

```
{
    UserInput: str,
    labels: [
        {
            slot: str,
            value: str,
            start: int,
            end: int
        }
    ]
}
```

## DSTC8 (identical to Restaurant8k)

## RNNLG

```
{
    da+svt: str,
    ref1: str,
    ref2: str
}
```

## SentiHood

```
{
    text: str,
    opinions: [
        {
            sentiment: str,
            aspect: str,
            target_entity: str
        }
    ]
}
```

## SNIPS

```
{
    text: str,
    svt
}
```

## SocialIQA

```
{
    context: str,
    answers: [
        str,
        ...
    ]
}
```

## MuDoCo

```
{
    utterance: str,
    rewritten_utterance: str
}
```

## MAMS

```
ATSA: {
    text: str,
    from: int,
    to: int,
    polarity: str,
    term
}
ACSA: {
    text: str,
    polarity: str,
    category: str
}
```

## SGD

```
{
    services
}
```

## TaskMaster

## MultiDoGo

## DSTC2

## DSTC3

## CoQA

## DoQA

## NarrativeQA

## QuAC

## RACE

## SQuAD

## MuTual

## E2E_Dialogue

## Spider

## SParC

## Soccer

## NLU++

## MASSIVE

## BiToD

## COD

## GlobalWoZ

## DSTC6

## XPersona

## MINDS-14

## MKQA

## MTOP

## Multi2WOZ

## Multilingual TOP

## Multilingual WOZ
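The Format section defines one JSON object per jsonl line, and the Datasets section lists each source's raw fields. To illustrate how the two connect, the following is a minimal sketch, not this project's actual conversion tooling, that maps a raw CLINC150 example (`utterance`, `intent`, `domain`, as listed in its section) into a single-turn record and serializes it as jsonl. It follows three rules stated in the Format section: unannotated single-turn data uses the role `ROLE`, intent labels go in `active_intents`, and unspecified fields such as `locale` are left out. The function names `clinc150_to_unified`, `check_record` and `to_jsonl`, and the rule that a `single` dialogue holds exactly one turn, are assumptions made for this example.

```python
import json
from typing import Any, Dict, Iterable, Optional


def clinc150_to_unified(raw: Dict[str, str], locale: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical converter: one raw CLINC150-style {utterance, intent, domain} -> one record."""
    record: Dict[str, Any] = {
        "turn": "single",
        "domain": [raw["domain"]],
        "dialog": [
            {
                # No role annotation, so a single-turn example uses `ROLE`.
                "roles": ["ROLE"],
                "utterance": raw["utterance"],
                # Intent detection labels go into `active_intents`.
                "active_intents": [raw["intent"]],
            }
        ],
    }
    # Only set `locale` when it is known; unspecified fields are omitted entirely.
    if locale is not None:
        record["locale"] = locale
    return record


def check_record(record: Dict[str, Any]) -> None:
    """Hypothetical checker: raise ValueError on a few basic violations (not exhaustive)."""
    if record.get("turn") not in ("single", "multi"):
        raise ValueError(f"`turn` must be 'single' or 'multi', got {record.get('turn')!r}")
    if not isinstance(record.get("domain", []), list):
        raise ValueError("`domain` must be a list")
    dialog = record.get("dialog")
    if not isinstance(dialog, list) or not dialog:
        raise ValueError("`dialog` must be a non-empty list of turns")
    for turn in dialog:
        if not isinstance(turn.get("roles"), list) or not isinstance(turn.get("utterance"), str):
            raise ValueError("each turn needs `roles` (list) and `utterance` (str)")
    # Assumption: a `single` dialogue holds exactly one turn.
    if record["turn"] == "single" and len(dialog) != 1:
        raise ValueError("a `single` dialogue should contain exactly one turn")


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as jsonl: one JSON object (one dialogue) per line."""
    lines = []
    for record in records:
        check_record(record)
        # ensure_ascii=False keeps non-English utterances human-readable.
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    raw = {"utterance": "what is my balance", "intent": "balance", "domain": "banking"}
    print(to_jsonl([clinc150_to_unified(raw, locale="en")]), end="")
```

Keeping `check_record` separate from the converter means the same checks could run on records produced by any dataset's converter before they are written out.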