Cross-Lingual Natural Language Inference
Dataset
XNLI (Cross-Lingual Natural Language Inference) is a benchmark dataset covering 15 languages, used to evaluate the cross-language understanding and zero-shot transfer capabilities of multilingual models.
Dataset Highlights
A standard benchmark for cross-lingual natural language inference, widely used to evaluate multilingual models
Coverage of 15 Languages
Includes English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili, and Urdu, covering multiple language families.
High-Quality Annotations
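The development and test sets were translated from English into the other 14 languages by professional translators, with labels carried over from the English originals. This keeps translation quality and labels consistent and avoids the noise and bias that machine translation would introduce into the evaluation data.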
Zero-Shot Transfer Evaluation
Designed specifically to evaluate cross-lingual zero-shot transfer. A model trained only on English data can be tested directly on the other 14 languages, which measures how well it generalizes across languages.
Standardized Format
Each sample includes three fields: premise, hypothesis, and label. Clear, uniform structure facilitates direct use for model training and evaluation.
Academic and Authoritative Source
Published by Conneau et al. from Facebook AI Research in 2018, extended from the MultiNLI dataset, making it one of the most authoritative benchmarks in cross-lingual NLU.
Widely Cited
Used as a core evaluation benchmark for milestone models such as mBERT, XLM, and XLM-R, and a fixture of multilingual pretraining research.
Application Scenarios
From cross-lingual research to multilingual products, covering multiple key applications
Cross-Lingual Transfer
Assess a model's inference ability when it is trained on English and transferred zero-shot to other languages, measuring cross-lingual generalization
Multilingual NLU
Test the natural language understanding capabilities of multilingual pretrained models (such as mBERT and XLM-R) and compare the performance of different architectures
Model Benchmarking
Serve as a standard evaluation benchmark for multilingual models, used for paper experiments, leaderboard rankings, and performance tracking
Language Model Evaluation
Evaluate the reasoning and semantic understanding capabilities of large language models across languages, identifying performance gaps in low-resource languages
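The per-language comparison described in the scenarios above can be sketched in a few lines of Python. This is a minimal illustration, not an official evaluation script: the helper names accuracy_by_language and transfer_gap are hypothetical, the records and predictions are made-up toy values, and English ("en") is assumed to be the training language.

```python
from collections import defaultdict


def accuracy_by_language(records, predictions):
    """Return {language: accuracy} for parallel lists of records and predicted labels."""
    correct, total = defaultdict(int), defaultdict(int)
    for record, predicted in zip(records, predictions):
        lang = record["language"]
        total[lang] += 1
        correct[lang] += int(predicted == record["label"])
    return {lang: correct[lang] / total[lang] for lang in total}


def transfer_gap(accuracy, source="en"):
    """Accuracy drop of each target language relative to the source language."""
    return {
        lang: accuracy[source] - acc
        for lang, acc in accuracy.items()
        if lang != source
    }


# Toy gold labels and made-up predictions, for illustration only.
records = [
    {"language": "en", "label": "entailment"},
    {"language": "en", "label": "contradiction"},
    {"language": "sw", "label": "entailment"},
    {"language": "sw", "label": "contradiction"},
]
predictions = ["entailment", "contradiction", "entailment", "neutral"]

accuracy = accuracy_by_language(records, predictions)
gaps = transfer_gap(accuracy)
print(accuracy, gaps)
```

A large positive gap for a language suggests that what the model learned from English transfers poorly to it, which is the kind of signal the low-resource comparison looks for.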
Data Preview
Below are sample records in the XNLI format, showing premise-hypothesis pairs and their inference labels
[
  {
    "premise": "And he said, Mama, I'm home.",
    "hypothesis": "He called out to his mother.",
    "label": "entailment",
    "language": "en"
  },
  {
    "premise": "And he said, Mama, I'm home.",
    "hypothesis": "He didn't say a word.",
    "label": "contradiction",
    "language": "en"
  },
  {
    "premise": "Et il a dit, Maman, je suis rentré.",
    "hypothesis": "Il a appelé sa mère.",
    "label": "entailment",
    "language": "fr"
  },
  {
    "premise": "他说,妈妈,我回来了。",
    "hypothesis": "他一句话也没说。",
    "label": "contradiction",
    "language": "zh"
  },
  {
    "premise": "And he said, Mama, I'm home.",
    "hypothesis": "He is outside the house.",
    "label": "neutral",
    "language": "en"
  }
]
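Records in this shape can be checked before use. The sketch below parses two of the preview records and validates the fields and labels. The function validate_record and the constants REQUIRED_FIELDS and VALID_LABELS are hypothetical names introduced for illustration, and the check assumes labels are stored as the three strings shown in the preview.

```python
import json
from collections import Counter

REQUIRED_FIELDS = ("premise", "hypothesis", "label")
VALID_LABELS = {"entailment", "neutral", "contradiction"}


def validate_record(record):
    """Raise ValueError if a record lacks a required field or has an unknown label."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["label"] not in VALID_LABELS:
        raise ValueError(f"unknown label: {record['label']!r}")
    return record


# Two records copied from the preview, compacted to one line each.
preview = json.loads("""
[
  {"premise": "And he said, Mama, I'm home.", "hypothesis": "He called out to his mother.", "label": "entailment", "language": "en"},
  {"premise": "Et il a dit, Maman, je suis rentré.", "hypothesis": "Il a appelé sa mère.", "label": "entailment", "language": "fr"}
]
""")

records = [validate_record(r) for r in preview]
# Count (language, label) pairs to see how labels are spread across languages.
labels_per_language = Counter((r["language"], r["label"]) for r in records)
print(labels_per_language)
```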
3-Step Quick Start
From browsing to evaluation, get started with your cross-lingual NLU research in minutes
Browse Dataset
View dataset details on the Ace Data Cloud platform, including sample distribution across the 15 languages, label definitions, and license information.
Download Data
Download the XNLI data files, which contain 2,490 validation samples and 5,010 test samples per language (7,500 in total per language). Ready to use out of the box.
Load and Evaluate
Use datasets.load_dataset("xnli", "en") to load the data for a given language (a language configuration such as "en" or "fr" is required), then run inference evaluation experiments on multilingual models.
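As a rough sketch of the loading step, the snippet below wraps the Hugging Face datasets call in a small helper and maps integer label ids back to names. It assumes the datasets package is installed, that the dataset id "xnli" from the step above still resolves on the Hub, and that label ids follow the order entailment, neutral, contradiction. The helper names load_xnli_split and label_distribution are hypothetical.

```python
from collections import Counter

# Assumption: integer label ids follow the Hugging Face XNLI convention.
LABEL_NAMES = ["entailment", "neutral", "contradiction"]


def label_distribution(examples):
    """Count examples per label name, given items with an integer "label" field."""
    return dict(Counter(LABEL_NAMES[example["label"]] for example in examples))


def load_xnli_split(language="fr", split="test"):
    """Load one language and split of XNLI (needs network access on first use)."""
    # Imported lazily so label_distribution works without the library installed.
    from datasets import load_dataset

    return load_dataset("xnli", language, split=split)


# Usage (requires `pip install datasets`):
#   test_fr = load_xnli_split("fr", "test")
#   print(len(test_fr), label_distribution(test_fr))
```

Loading one language at a time keeps the zero-shot setup explicit: train on the English training split, then call the helper once per target language for evaluation.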
Start Exploring Cross-Lingual Inference Data
A widely used benchmark dataset covering 15 languages. Whether you are a multilingual NLP researcher or a language model developer, XNLI is a strong choice for evaluating cross-lingual capabilities.
