DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset
We develop a high-quality multi-turn dialog dataset, DailyDialog, which is intriguing in several aspects. The language is human-written and less noisy. The dialogues in the dataset reflect our daily communication way and cover various topics about our daily life. We also manually label the developed dataset with communication intention and emotion information. Then, we evaluate existing approaches on DailyDialog dataset and hope it benefit the research field of dialog systems.
The dataset file (.zip) has been updated on Nov 1, 2017 to fix a segmentation bug in train.zip. Thank Sanghoon from KAIST to report this bug!
[New!] (Poria et al., 2020) has created extra annotations on DailyDialog, please check here for detailed information.
[New!] The license of DailyDialog has been updated!
License
DailyDialog dataset is licensed under CC BY-NC-SA 4.0.
If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original. Any third party annotation is welcome. Note the dataset may not be adopted for commercial use.
Basic Statistics
Th developed DailyDialog dataset contains 13,118 multi-turn dialogues. We also count the average speaker turns and tokens to give a brief view of the dataset. The resulting statistics are given in the table below. From the statistics we can see, the speaker turns are roughly 8, and the average tokens per utterance is about 15.
Total Dialogues | 13,118 |
Average Speaker Turns Per Dialogue | 7.9 |
Average Tokens Per Dialogue | 114.7 |
Average Tokens Per Utterance | 14.6 |
Example
The words in Italic are speaker B’s own ideas that are new for the other speaker A. The underlined words in purple explicitly indicate the emotions. In the fourth speaker turn, speaker B first expresses his/her feeling on what he/she has heared from speaker A, which reveals his/her understanding. Then, speaker B suggests by saying “Just breathe deeply when you feel yourself getting upset”. Following the direct response towards A, B’s suggestion is original yet context-dependent. It shows that B builds up a connection link by responding to forgoing context and proposing new suggestions.
Basic Features
Download
-
For downloading, please follow the copyright declaration and click: DailyDialog.zip
-
Latest update on Nov 1, 2017.
Usage and Parser
- Thank Sanghoon Kang from KAIST who provides a code to parse our dataset, and shares it with us: https://github.com/Sanghoon94/DailyDialogue-Parser
Citation
-
We appreciate your citation if you find our dataset is beneficial.
-
Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset. IJCNLP 2017. [pdf] [arXiv] [dataset]
@InProceedings{li2017dailydialog,
author = {Li, Yanran and Su, Hui and Shen, Xiaoyu and Li, Wenjie and Cao, Ziqiang and Niu, Shuzi},
title = {DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset},
booktitle = {Proceedings of The 8th International Joint Conference on Natural Language Processing (IJCNLP 2017)},
year = {2017}
}
Copyright
-
The original copyright of all the conversations belongs to the source owner.
-
The copyright of annotation belongs to our group, and they are free to the public.
-
The dataset is only for research purposes. Without permission, it may not be used for any commercial purposes.