CoQA Conversational Question Answering Dataset
Stanford CoQA contains 127K question-answer pairs drawn from about 8,000 multi-turn conversations across 7 domains. Answers are free-form text, and each one is backed by a rationale span from the source passage.
Dataset Highlights
One of the first large-scale, multi-domain conversational question answering datasets, and a long-standing benchmark in natural language understanding research
7 Diverse Domains
Covers children's stories, literature, middle and high school English exams, news, Wikipedia, Reddit, and science articles. Reddit and science appear only in the test set, so they can be used to measure out-of-domain generalization.
Real Multi-turn Conversations
Each conversation consists of several connected Q&A turns. Later questions often depend on earlier ones through coreference and omitted context, which closely reflects real dialogue.
Free-form Answers with Rationale Spans
Answers are written as free-form text rather than copied spans, and each answer is paired with a rationale span from the source text that supports it, which helps with both model training and evaluation.
Large-scale High Quality
Over 127,000 crowdsourced Q&A pairs collected with strict quality control; conversations average about 15 Q&A turns each.
Significant Academic Impact
Published in TACL in 2019 by Reddy, Chen, and Manning of the Stanford NLP group; widely cited and a core benchmark in conversational QA research.
Rich Annotation Information
Each sample includes the source passage (story), the question sequence, free-form answers, rationale spans, and a domain label.
Applicable Scenarios
From academic research to industrial applications, covering core conversational AI scenarios
Conversational Question Answering
Train and evaluate QA models capable of multi-turn dialogue understanding, handling coreference resolution and contextual dependencies
Multi-domain Understanding
Test model transfer and generalization capabilities across domains such as children's stories, news, and science
Generative Answers
Train models to generate natural and fluent free-form answers, beyond simply extracting text spans from the source
Dialogue System Development
Provide high-quality training and evaluation data for dialogue systems such as intelligent customer service, educational tutoring, and reading assistants
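For the evaluation scenarios above, answers are usually scored by word overlap with a reference answer rather than by exact match. The following is a minimal sketch of a token-level F1 score under that assumption. The helper names `normalize` and `token_f1` are illustrative, and the official CoQA evaluation script may apply different normalization and compare against several reference answers.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A model's score on a conversation could then be taken as the mean of `token_f1` over its turns. Yes/no answers simply score 1.0 or 0.0 under this sketch.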
Data Preview
The following is an example of a multi-turn conversation from the children's stories domain
{
  "source": "mctest",
  "domain": "children_stories",
  "story": "Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton. Cotton lived high up in a nice warm place above the barn where all of the hay was stored...",
  "questions": [
    {"turn_id": 1, "input_text": "What was the kitten's name?"},
    {"turn_id": 2, "input_text": "Where did it live?"},
    {"turn_id": 3, "input_text": "Was it alone?"},
    {"turn_id": 4, "input_text": "Who were its friends?"}
  ],
  "answers": [
    {
      "turn_id": 1,
      "input_text": "Cotton",
      "span_text": "a little white kitten named Cotton"
    },
    {
      "turn_id": 2,
      "input_text": "In a barn",
      "span_text": "in a barn near a farm house"
    },
    {
      "turn_id": 3,
      "input_text": "No",
      "span_text": "Cotton had two friends"
    },
    {
      "turn_id": 4,
      "input_text": "A hen and a dog",
      "span_text": "a chicken named Marge and a dog named Lulu"
    }
  ]
}
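Because a question such as "Where did it live?" only makes sense alongside the earlier turns, a model usually needs the conversation history as part of its input. The sketch below shows one way to turn a record shaped like the preview above into per-turn examples. The function name `build_turn_examples` and the `history_size` parameter are illustrative assumptions and are not part of the dataset.

```python
from typing import Dict, List


def build_turn_examples(conversation: Dict, history_size: int = 2) -> List[Dict]:
    """Pair each question with its answer, rationale, and preceding turns."""
    # Index answers by turn_id so the pairing does not rely on list order.
    answers_by_turn = {a["turn_id"]: a for a in conversation["answers"]}
    examples: List[Dict] = []
    history: List[str] = []  # alternating question and answer texts
    for question in sorted(conversation["questions"], key=lambda q: q["turn_id"]):
        answer = answers_by_turn[question["turn_id"]]
        # Keep only the last `history_size` Q&A pairs (two strings per pair).
        recent = history[-2 * history_size:] if history_size > 0 else []
        examples.append({
            "turn_id": question["turn_id"],
            "history": list(recent),
            "question": question["input_text"],
            "answer": answer["input_text"],
            "rationale": answer["span_text"],
        })
        history.extend([question["input_text"], answer["input_text"]])
    return examples


# Two turns copied from the preview above, used here as sample input.
conversation = {
    "questions": [
        {"turn_id": 1, "input_text": "What was the kitten's name?"},
        {"turn_id": 2, "input_text": "Where did it live?"},
    ],
    "answers": [
        {"turn_id": 1, "input_text": "Cotton",
         "span_text": "a little white kitten named Cotton"},
        {"turn_id": 2, "input_text": "In a barn",
         "span_text": "in a barn near a farm house"},
    ],
}
examples = build_turn_examples(conversation, history_size=1)
```

With `history_size=1`, the second example carries the first question and its answer as history, which gives a model enough context to work out what "it" refers to.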
3 Steps to Get Started Quickly
From browsing to research, start your conversational QA project in minutes
Browse the Dataset
View dataset details on the Ace Data Cloud platform, including the domain distribution, annotation format, and summary statistics.
Download Data
Obtain CoQA training and validation JSON files containing complete multi-turn conversations, answers, and rationale spans.
Load and Train
Parse the files with json.load(), then build a conversational QA model or fine-tune and evaluate an existing one.
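The loading step can be sketched as follows. This assumes the downloaded file keeps its conversations in a list under a top-level `"data"` key, with each record carrying `source` and `questions` fields as in the preview. The file layout should be checked against the actual download, and the helper names `load_conversations` and `summarize` are illustrative.

```python
import json
from collections import Counter
from typing import Dict, List


def load_conversations(path: str) -> List[Dict]:
    """Read a CoQA-style JSON file and return its list of conversations."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    # Assumption: conversations sit under "data"; a bare list is also accepted.
    return payload["data"] if isinstance(payload, dict) else payload


def summarize(conversations: List[Dict]) -> Dict:
    """Count conversations, questions, and average turns, grouped by source."""
    turns = [len(c["questions"]) for c in conversations]
    by_source = Counter(c.get("source", "unknown") for c in conversations)
    return {
        "conversations": len(conversations),
        "questions": sum(turns),
        "avg_turns": sum(turns) / len(turns) if turns else 0.0,
        "by_source": dict(by_source),
    }
```

Calling `summarize(load_conversations(path))` on the validation file is a quick sanity check before training: the average number of turns and the per-source counts should be in line with the statistics shown on the dataset page.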
Start Exploring Conversational QA Data
The Stanford CoQA dataset: 127K Q&A pairs, 7 domains, multi-turn conversations. Whether you are an NLP researcher or a dialogue system developer, it is an indispensable benchmark resource.
