SNLI Natural Language Inference
Dataset
The first large-scale natural language inference dataset, containing about 570,000 human-written sentence pairs labeled as entailment, contradiction, or neutral. A foundational benchmark in the NLI field.
Dataset Highlights
A foundational benchmark dataset in NLI, driving progress in natural language understanding research
Large-Scale Manual Annotation
Every hypothesis in the roughly 570,000 sentence pairs was written by a crowdworker rather than generated automatically. A validation subset of about 10% of the pairs received four additional judgments, giving five labels per pair and a gold label chosen by majority vote.
Three Semantic Relations
Covers the three core inference relations: entailment, contradiction, and neutral, encompassing fundamental dimensions of natural language understanding.
Foundational Benchmark
As the first large-scale NLI dataset, SNLI is widely used to evaluate and compare natural language understanding models, from LSTM sentence encoders to pretrained transformers such as BERT and GPT.
Vision-Based Text
Premise sentences are drawn from Flickr30k image captions, so they describe concrete visual scenes, which makes the dataset a useful bridge to multimodal research.
Annotator Consistency
Pairs in the validation subset retain the individual labels from all five annotators, supporting inter-annotator agreement analysis and research on label disagreement and data quality.
Academic Authority
Released by the Stanford NLP group, the paper has been cited over ten thousand times. It is one of the most influential datasets in NLP, widely adopted in academia and industry.
Applicable Scenarios
From fundamental research to industrial applications, covering the full natural language understanding pipeline
Natural Language Inference
Train and evaluate NLI models to determine entailment, contradiction, or neutral relations between two sentences
Sentence Embeddings
Use sentence pair relations to train high-quality sentence vector representations, improving semantic similarity and retrieval performance
Transfer Learning
Fine-tune pretrained models like BERT and RoBERTa on SNLI as an intermediate task to improve performance on downstream NLU tasks
Textual Entailment Detection
Build core reasoning modules for applications such as fact verification, question answering, and text consistency checking
Data Preview
Below is a JSON-format sample from the SNLI dataset, showing the premise, hypothesis, label, and annotator_labels fields
[
  {
    "premise": "A person on a horse jumps over a broken down airplane.",
    "hypothesis": "A person is training his horse for a competition.",
    "label": "neutral",
    "annotator_labels": ["neutral", "entailment", "neutral", "neutral", "neutral"]
  },
  {
    "premise": "A person on a horse jumps over a broken down airplane.",
    "hypothesis": "A person is at a diner, ordering an omelette.",
    "label": "contradiction",
    "annotator_labels": ["contradiction", "contradiction", "contradiction", "contradiction", "contradiction"]
  },
  {
    "premise": "A person on a horse jumps over a broken down airplane.",
    "hypothesis": "A person is outdoors, on a horse.",
    "label": "entailment",
    "annotator_labels": ["entailment", "entailment", "entailment", "entailment", "entailment"]
  },
  {
    "premise": "Children smiling and waving at camera.",
    "hypothesis": "They are smiling at their parents.",
    "label": "neutral",
    "annotator_labels": ["neutral", "neutral", "neutral", "neutral", "entailment"]
  },
  {
    "premise": "Children smiling and waving at camera.",
    "hypothesis": "The kids are frowning.",
    "label": "contradiction",
    "annotator_labels": ["contradiction", "contradiction", "contradiction", "contradiction", "contradiction"]
  }
]
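As a rough illustration of how records in this shape might be consumed, the Python sketch below parses two of the preview entries, tallies the gold labels, and recomputes a majority label from annotator_labels. The helper names majority_label and agreement are hypothetical, and the threshold of three votes is an assumption modeled on the usual three-of-five majority rule; adjust both to match your copy of the data.

```python
import json
from collections import Counter

# Two records copied from the preview above, embedded so the sketch is self-contained.
PREVIEW = """
[
  {"premise": "A person on a horse jumps over a broken down airplane.",
   "hypothesis": "A person is training his horse for a competition.",
   "label": "neutral",
   "annotator_labels": ["neutral", "entailment", "neutral", "neutral", "neutral"]},
  {"premise": "Children smiling and waving at camera.",
   "hypothesis": "The kids are frowning.",
   "label": "contradiction",
   "annotator_labels": ["contradiction", "contradiction", "contradiction",
                        "contradiction", "contradiction"]}
]
"""


def majority_label(votes, min_votes=3):
    """Return the most common vote, or None if it has fewer than min_votes (assumed rule)."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_votes else None


def agreement(votes):
    """Fraction of annotators who chose the most common label."""
    if not votes:
        return 0.0
    return Counter(votes).most_common(1)[0][1] / len(votes)


records = json.loads(PREVIEW)

# Distribution of gold labels across the parsed records.
label_counts = Counter(r["label"] for r in records)
print(dict(label_counts))

# Compare each stored gold label with the recomputed majority vote.
for r in records:
    votes = r["annotator_labels"]
    print(r["label"], majority_label(votes), f"{agreement(votes):.2f}")
```

The same two helpers can be applied to a full split once it is loaded, for example to list pairs where annotators disagreed.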
3 Steps to Get Started Quickly
From browsing to research, start your NLI experiments in minutes
Browse the Dataset
View dataset details on the Ace Data Cloud platform, including field descriptions, label distributions, and license information.
Download Data
Obtain the SNLI dataset's train/dev/test splits, containing about 570,000 sentence pairs in JSON format, ready to use out of the box.
Load and Train
Use datasets.load_dataset("snli") from the Hugging Face datasets library, or load the JSON directly, to start training and evaluating NLI models.
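For the direct-JSON route, a minimal sketch of turning records into (premise, hypothesis, label id) triples might look like the following. The integer mapping follows the order commonly seen in the Hugging Face copy of SNLI (0 entailment, 1 neutral, 2 contradiction), but treat it as an assumption and check it against your data. prepare_examples and load_json_records are hypothetical helpers, and the file name in the usage note is a placeholder.

```python
import json

# Assumed label-to-id mapping; verify it against the copy of the data you use.
LABEL_TO_ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def load_json_records(path):
    """Read a JSON file that holds a list of records shaped like the data preview."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def prepare_examples(records):
    """Convert records to (premise, hypothesis, label_id) triples.

    Records whose label is missing or not one of the three known classes
    are skipped instead of raising an error.
    """
    examples = []
    for record in records:
        label_id = LABEL_TO_ID.get(record.get("label"))
        if label_id is None:
            continue
        examples.append((record["premise"], record["hypothesis"], label_id))
    return examples


# Small in-memory sample so the sketch runs without any downloaded file.
sample = [
    {"premise": "Children smiling and waving at camera.",
     "hypothesis": "The kids are frowning.",
     "label": "contradiction"},
    {"premise": "Children smiling and waving at camera.",
     "hypothesis": "They are smiling at their parents.",
     "label": "neutral"},
    {"premise": "A person on a horse jumps over a broken down airplane.",
     "hypothesis": "A person is outdoors, on a horse.",
     "label": "-"},  # hypothetical record without a usable gold label
]

examples = prepare_examples(sample)
print(examples)
```

With a downloaded split, prepare_examples(load_json_records("snli_train.json")) would produce the same kind of triples, which can then be passed to whatever tokenizer and model you train.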
Start Exploring Natural Language Inference Data
A foundational benchmark in NLI, openly licensed and available for immediate download. Whether you are an NLP researcher or a deep learning engineer, SNLI is a dependable starting point for NLI experiments.
