QuAC Dataset

QuAC Conversational Question Answering Dataset

QuAC contains 14,000 multi-turn conversational question answering dialogues grounded in Wikipedia, simulating information-seeking conversations between a student and a teacher. It is a widely used benchmark for conversational reading comprehension.

14,000 Dialogues · 98,000 QA Pairs · CC BY-SA 4.0 License · Choi et al. (2018)
💬 14,000 multi-turn dialogues
❓ 98,000 total QA pairs
📄 8,600+ Wikipedia paragraphs
📜 CC BY-SA 4.0 open license

Dataset Highlights

A benchmark for conversational reading comprehension, built from realistic multi-turn, information-seeking interactions

🎓

Student-Teacher Dialogue Mode

Simulates real information-seeking scenarios: a student, who sees only the topic (the article background and section title), asks questions to explore it, while a teacher, who can see the hidden Wikipedia text, answers with short excerpts from it. The result is natural, fluent conversation.

🔄

Multi-turn Context Dependency

Each dialogue contains about 7 QA turns on average. Later questions often depend on earlier turns (a question such as "Who directed that cartoon?" only makes sense given the previous turn), so models must track context and resolve coreferences.
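A simple way to give a model this context is to prepend the previous few turns to the current question. The sketch below assumes turns are dicts with `question` and `answer` keys, as in the Data Preview record; the helper `build_input` and its `max_history` window are hypothetical illustrations, not part of QuAC or any official baseline.

```python
from typing import Dict, List


def build_input(turns: List[Dict[str, str]], current: int, max_history: int = 2) -> str:
    """Prepend up to `max_history` previous QA turns to the current question.

    Assumes each turn is a dict with "question" and "answer" keys
    (hypothetical helper, modeled on the Data Preview format).
    """
    start = max(0, current - max_history)
    parts = []
    # Earlier turns supply the context needed to resolve "that cartoon", "it", etc.
    for turn in turns[start:current]:
        parts.append(f"Q: {turn['question']} A: {turn['answer']}")
    # The current question comes last, without an answer.
    parts.append(f"Q: {turns[current]['question']}")
    return " ".join(parts)


turns = [
    {"question": "When did Daffy Duck first appear?", "answer": "April 17, 1937"},
    {"question": "Who directed that cartoon?", "answer": "Tex Avery"},
    {"question": "Who animated it?", "answer": "Bob Clampett"},
]
print(build_input(turns, 2, max_history=1))
# Q: Who directed that cartoon? A: Tex Avery Q: Who animated it?
```

A larger window gives the model more context at the cost of longer inputs; the window size is a tuning choice.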

📖

Wikipedia Knowledge Base

All dialogues are based on over 8,600 Wikipedia paragraphs covering diverse domains such as people, history, and science, providing broad knowledge coverage.

🎯

Extractive Answer Spans

Each answer is annotated as an exact text span from the original passage, so predictions can be scored with standard extractive reading comprehension metrics such as word-level F1.
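Word-level F1 compares the bag of normalized tokens in a predicted span with a gold span. The sketch below follows the common SQuAD-style normalization (lowercasing, removing punctuation and the articles a/an/the). It is an illustration, not the official QuAC scorer, which adds further details such as multiple reference answers and handling of unanswerable questions.

```python
import re
import string
from collections import Counter
from typing import List

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between two spans."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, golds: List[str]) -> float:
    """Score against several reference spans and keep the best match."""
    return max(f1_score(prediction, gold) for gold in golds)


print(round(f1_score("Tex Avery", "directed by Tex Avery"), 3))  # 0.667
```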

🏷️

Dialogue Behavior Annotations

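Each turn carries dialogue-act annotations, such as follow-up flags indicating whether the student should continue along the same line of questioning, and unanswerable questions are explicitly marked, providing rich metadata for research on dialogue strategies.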
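As a sketch, this turn-level metadata can be tallied per dialogue. The code assumes the `turn_id`, `answer`, and `follow_up` keys from the Data Preview record, and that unanswerable turns use the answer text `CANNOTANSWER`, as in the official QuAC release. Field names in the downloaded file may differ, and `summarize_turns` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List

# Answer text used for unanswerable questions in the official QuAC release.
UNANSWERABLE = "CANNOTANSWER"


def summarize_turns(turns: List[Dict[str, object]]) -> Dict[str, object]:
    """Count follow-up flag values and list the ids of unanswerable turns."""
    # Turns without a follow_up field are counted under "?".
    follow_up_counts = Counter(turn.get("follow_up", "?") for turn in turns)
    unanswerable = [turn["turn_id"] for turn in turns if turn.get("answer") == UNANSWERABLE]
    return {"follow_up": dict(follow_up_counts), "unanswerable_turns": unanswerable}


preview_turns = [
    {"turn_id": 0, "answer": "April 17, 1937", "follow_up": "y"},
    {"turn_id": 1, "answer": "Tex Avery", "follow_up": "y"},
    {"turn_id": 2, "answer": "Bob Clampett", "follow_up": "y"},
    {"turn_id": 3, "answer": "Daffy's name was given to him by Mel Blanc", "follow_up": "n"},
]
print(summarize_turns(preview_turns))
# {'follow_up': {'y': 3, 'n': 1}, 'unanswerable_turns': []}
```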

πŸ›οΈ

Authoritative Academic Source

Released at EMNLP 2018 by researchers from the University of Washington and the Allen Institute for AI (AI2), among other institutions; it is widely cited and used as a benchmark in academia and industry.

Applicable Scenarios

From academic research to industrial applications, covering various conversational understanding needs

💬

Conversational Question Answering

Train and evaluate QA systems capable of understanding context and tracking topics across multiple dialogue turns

🤖

Dialogue Systems

Build conversational agents that can seek out and deliver information, enabling deeper multi-turn chatbot interactions

🧠

Contextual Understanding

Research challenges in dialogue context modeling such as coreference resolution, ellipsis recovery, and topic shifts

🔍

Multi-turn Reasoning

Evaluate models’ abilities in cross-turn reasoning, information aggregation, and progressive understanding
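For dialogue-level evaluation, the QuAC paper (Choi et al., 2018) reports human equivalence scores alongside F1: HEQ-Q is the percentage of questions where the system's F1 is at least the human F1, and HEQ-D is the percentage of dialogues where this holds for every question. The sketch below assumes per-question F1 values for the system and for a human reference have already been computed; the official script derives human F1 from multiple reference answers, which is omitted here, and `heq_scores` is a hypothetical helper.

```python
from typing import List, Tuple


def heq_scores(system_f1: List[List[float]], human_f1: List[List[float]]) -> Tuple[float, float]:
    """Return (HEQ-Q, HEQ-D) as percentages.

    Both arguments are indexed [dialogue][turn] and hold per-question F1 scores.
    """
    total_q = passed_q = passed_d = 0
    for sys_turns, hum_turns in zip(system_f1, human_f1):
        # A question "passes" when the system matches or beats the human score.
        hits = [s >= h for s, h in zip(sys_turns, hum_turns)]
        total_q += len(hits)
        passed_q += sum(hits)
        # A dialogue passes only if every one of its questions passes.
        passed_d += all(hits)
    heq_q = 100.0 * passed_q / total_q
    heq_d = 100.0 * passed_d / len(system_f1)
    return heq_q, heq_d


print(heq_scores([[1.0, 0.5], [0.8]], [[0.9, 0.6], [0.8]]))
```

HEQ-D is much stricter than HEQ-Q, since a single weak turn fails the whole dialogue; that is what makes it a useful signal for cross-turn consistency.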

Conversational QA · Multi-turn Dialogue · Contextual Understanding · Information Retrieval · Reading Comprehension

Data Preview

Below is a simplified example of a multi-turn conversational QA dialogue, showing how a student and a teacher interact around a Wikipedia passage.

JSON
{
  "dialog_id": "C_6c5f277c0eef4b6e9e24b5e2b063673a_1",
  "wikipedia_page_title": "Daffy Duck",
  "background": "Daffy Duck is an animated cartoon character...",
  "section_title": "Early years",
  "context": "The earliest version of Daffy Duck appeared in the cartoon Porky's Duck Hunt, released on April 17, 1937. The cartoon was directed by Tex Avery and animated by Bob Clampett. Daffy's name was given to him by Mel Blanc, who provided his original voice...",
  "turns": [
    {
      "turn_id": 0,
      "question": "When did Daffy Duck first appear?",
      "answer": "April 17, 1937",
      "follow_up": "y"
    },
    {
      "turn_id": 1,
      "question": "Who directed that cartoon?",
      "answer": "Tex Avery",
      "follow_up": "y"
    },
    {
      "turn_id": 2,
      "question": "Who animated it?",
      "answer": "Bob Clampett",
      "follow_up": "y"
    },
    {
      "turn_id": 3,
      "question": "How did he get his name?",
      "answer": "Daffy's name was given to him by Mel Blanc",
      "follow_up": "n"
    }
  ]
}

3-Step Quick Start

From browsing to analysis, start your conversational understanding research in minutes

01

Browse the Dataset

View dataset details on the Ace Data Cloud platform, including dialogue structure, field descriptions, and license information.

02

Download Data

Download the JSON file containing the 14,000 multi-turn dialogues. The data structure is clear and ready to use without additional preprocessing.

03

Load and Analyze

Use json.load() to load the data, iterate over the QA pairs turn by turn, and start building your conversational understanding model.
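A minimal loading sketch using Python's standard `json` module. It assumes the downloaded file holds a list of dialogue records shaped like the Data Preview example (`dialog_id`, `turns`, `turn_id`, `question`, `answer`); check the field descriptions on the platform, since the actual layout may differ. The helper names and the file name in the usage comment are hypothetical.

```python
import json
from typing import IO, Iterator, List, Tuple


def load_dialogues(fp: IO[str]) -> List[dict]:
    """Parse dialogues from an open JSON file; accepts one record or a list of records."""
    data = json.load(fp)
    return data if isinstance(data, list) else [data]


def iter_qa_pairs(dialogues: List[dict]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (dialog_id, turn_id, question, answer) in dialogue and turn order."""
    for dialogue in dialogues:
        # Sort by turn_id so context-dependent questions follow their antecedents.
        for turn in sorted(dialogue["turns"], key=lambda t: t["turn_id"]):
            yield dialogue["dialog_id"], turn["turn_id"], turn["question"], turn["answer"]


# Usage (file name is hypothetical):
# with open("quac_dialogues.json", encoding="utf-8") as f:
#     for dialog_id, turn_id, question, answer in iter_qa_pairs(load_dialogues(f)):
#         print(dialog_id, turn_id, question, "->", answer)
```

Accepting an open file object rather than a path keeps the loader easy to test with in-memory data.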

Start Exploring Conversational QA Data

An authoritative benchmark dataset with an open license, available for immediate download. Whether you are an NLP researcher or a dialogue system developer, QuAC is a valuable resource for conversational QA research.