<style>
.obelics-page * { box-sizing: border-box; }
.obelics-page h1, .obelics-page h2, .obelics-page h3, .obelics-page h4, .obelics-page h5, .obelics-page h6, .obelics-page p, .obelics-page ul, .obelics-page ol, .obelics-page li, .obelics-page pre, .obelics-page blockquote, .obelics-page table, .obelics-page td, .obelics-page th { margin: 0; padding: 0; }
.obelics-page {
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
color: var(--el-text-color-primary);
background: var(--el-bg-color);
line-height: 1.6;
}
.obelics-page a { text-decoration: none; color: inherit; }
.obelics-page a:hover { text-decoration: none; }
.obelics-page ul { list-style: none; }
.markdown-body .obelics-page a { color: inherit !important; text-decoration: none !important; }
.markdown-body .obelics-page a:hover { text-decoration: none !important; }
.markdown-body .obelics-page a.s-btn-primary,
.markdown-body .obelics-page a.btn-cta-light { color: #ffffff !important; }
.markdown-body .obelics-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
.markdown-body .obelics-page a.btn-cta-ghost { color: #94a3b8 !important; }
.markdown-body .obelics-page a.btn-cta-ghost:hover { color: #e2e8f0 !important; }
.markdown-body .obelics-page h1, .markdown-body .obelics-page h2 { border-bottom: none !important; padding-bottom: 0 !important; }
.obelics-page .s-container { max-width: 1200px; margin: 0 auto; padding: 0 24px; }
.obelics-page .s-container-narrow { max-width: 800px; margin: 0 auto; padding: 0 24px; }
.obelics-page .s-container-wide { max-width: 1100px; margin: 0 auto; padding: 0 32px; }
.obelics-page .s-section { padding: 80px 0; }
.obelics-page .s-section-lg { padding: 100px 0; }
.obelics-page .s-section-sm { padding: 48px 0; }
.obelics-page .s-bg-white { background: var(--el-bg-color); }
.obelics-page .s-bg-gray { background: var(--el-bg-color-page); }
.obelics-page .s-bg-dark { background: #0f172a; color: #f8fafc; }
.obelics-page .s-header { text-align: center; margin-bottom: 64px; }
.obelics-page .s-header h2 {
font-size: clamp(28px, 4vw, 40px);
font-weight: 700;
color: var(--el-text-color-primary);
letter-spacing: normal;
margin-bottom: 20px;
line-height: 1.15;
}
.obelics-page .s-header p {
font-size: clamp(16px, 2vw, 18px);
color: var(--el-text-color-regular);
max-width: 640px;
margin: 0 auto;
line-height: 1.6;
}
.obelics-page .s-bg-dark .s-header h2 { color: #f8fafc; }
.obelics-page .s-bg-dark .s-header p { color: var(--el-text-color-secondary); }
.obelics-page .s-btn-primary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: #ec4899; color: #ffffff !important;
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: background 0.2s, transform 0.15s;
border: none; cursor: pointer;
text-decoration: none !important;
}
.obelics-page .s-btn-primary:hover { background: #db2777; transform: translateY(-1px); text-decoration: none !important; }
.obelics-page .s-btn-secondary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: var(--el-bg-color); color: var(--el-text-color-primary) !important;
border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: border-color 0.2s, background 0.2s;
cursor: pointer;
text-decoration: none !important;
}
.obelics-page .s-btn-secondary:hover { background: var(--el-bg-color-page); text-decoration: none !important; }
.obelics-hero {
padding: 100px 0 80px;
text-align: center;
background: var(--el-bg-color);
position: relative;
overflow: hidden;
}
.obelics-hero::before {
content: '';
position: absolute;
top: -200px; left: 50%;
transform: translateX(-50%);
width: 900px; height: 500px;
background: radial-gradient(ellipse, rgba(236, 72, 153, 0.06) 0%, transparent 70%);
pointer-events: none;
}
.obelics-page .hero-badge {
display: inline-flex; align-items: center; gap: 8px;
padding: 6px 16px;
background: var(--el-bg-color-page); border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 13px; font-weight: 600; color: var(--el-text-color-regular);
margin-bottom: 28px;
}
.obelics-page .hero-badge .badge-dot {
width: 6px; height: 6px; background: #10b981; border-radius: 50%;
display: inline-block;
}
.obelics-hero h1 {
font-size: clamp(36px, 5vw, 60px);
font-weight: 700; line-height: 1.05;
letter-spacing: normal; color: var(--el-text-color-primary);
margin-bottom: 20px;
position: relative;
}
.obelics-hero h1 span { color: #ec4899; }
.obelics-page .hero-subtitle {
font-size: clamp(16px, 2vw, 20px);
color: var(--el-text-color-regular); line-height: 1.6;
max-width: 620px; margin: 0 auto 56px;
position: relative;
}
.obelics-page .hero-actions {
display: flex; gap: 12px; justify-content: center;
flex-wrap: wrap; margin-bottom: 56px; position: relative;
}
.obelics-page .hero-highlights {
display: flex; align-items: center; justify-content: center;
gap: 16px; flex-wrap: wrap; position: relative;
}
.obelics-page .hero-highlights .h-item { font-size: 14px; color: var(--el-text-color-regular); font-weight: 500; }
.obelics-page .hero-highlights .h-div { width: 1px; height: 16px; background: var(--el-border-color-light); }
@media (max-width: 640px)
{ .obelics-page .hero-highlights .h-div { display: none; } .obelics-page .hero-highlights { gap: 8px 16px; } .obelics-page .hero-actions { flex-direction: column; align-items: center; } .obelics-page .hero-actions a { width: 100%; max-width: 280px; justify-content: center; } } .obelics-page .hero-cover { max-width: 720px; margin: 48px auto 0; border-radius: 16px; overflow: hidden; box-shadow: 0 8px 32px rgba(0,0,0,0.10); } .obelics-page .hero-cover img { width: 100%; height: auto; display: block; } .obelics-stats { padding: 48px 0; background: var(--el-bg-color-page); border-top: 1px solid var(--el-border-color-lighter); border-bottom: 1px solid var(--el-border-color-lighter); } .obelics-page .stats-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 32px; text-align: center; } .obelics-page .stat-icon { font-size: 28px; margin-bottom: 12px; } .obelics-page .stat-val { font-size: clamp(28px, 4vw, 40px); font-weight: 700; color: var(--el-text-color-primary); letter-spacing: normal; margin-bottom: 4px; } .obelics-page .stat-lbl { font-size: 14px; color: var(--el-text-color-secondary); font-weight: 500; } @media (max-width: 768px) { .obelics-page .stats-grid { grid-template-columns: repeat(2, 1fr); gap: 24px; } } @media (max-width: 480px) { .obelics-page .stats-grid { grid-template-columns: 1fr; gap: 20px; } } .obelics-page .features-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px; } .obelics-page .feat-card { padding: 32px 28px; border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); background: var(--el-bg-color); transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .obelics-page .feat-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .obelics-page .feat-icon { font-size: 32px; margin-bottom: 16px; } .obelics-page .feat-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } .obelics-page .feat-card p { 
font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .obelics-page .features-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 640px) { .obelics-page .features-grid { grid-template-columns: 1fr; } } .obelics-page .usecases-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 20px; } .obelics-page .uc-card { padding: 28px 24px; background: var(--el-bg-color); border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); text-align: center; transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .obelics-page .uc-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .obelics-page .uc-icon { font-size: 36px; margin-bottom: 16px; } .obelics-page .uc-card h3 { font-size: 17px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } .obelics-page .uc-card p { font-size: 14px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .obelics-page .usecases-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 480px) { .obelics-page .usecases-grid { grid-template-columns: 1fr; } } .obelics-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; max-width: 860px; margin: 0 auto; } .markdown-body .obelics-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; } .obelics-page .code-bar { display: flex !important; align-items: center !important; justify-content: space-between !important; padding: 12px 20px !important; background: #1e293b !important; border-bottom: 1px solid #334155 !important; } .obelics-page .code-dots { display: flex; gap: 6px; } .obelics-page .code-dots i { width: 10px; height: 10px; border-radius: 50%; display: inline-block; } .obelics-page .code-dots .r { background: 
#ef4444; } .obelics-page .code-dots .y { background: #f59e0b; } .obelics-page .code-dots .g { background: #10b981; } .obelics-page .code-lang { font-size: 12px; color: var(--el-text-color-secondary); font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; } .obelics-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .markdown-body .obelics-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .obelics-page .steps-row { display: flex; align-items: flex-start; justify-content: center; margin-bottom: 48px; } .obelics-page .stp-card { flex: 1; max-width: 320px; text-align: center; padding: 0 24px; } .obelics-page .stp-num { font-size: clamp(48px, 6vw, 72px); font-weight: 700; color: #e2e8f0; letter-spacing: -0.04em; line-height: 1; margin-bottom: 20px; } .obelics-page .stp-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 10px; } .obelics-page .stp-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } .obelics-page .stp-conn { width: 60px; height: 2px; background: var(--el-border-color-light); margin-top: 36px; flex-shrink: 0; } .obelics-page .steps-cta { text-align: center; } @media (max-width: 768px) { .obelics-page .steps-row { flex-direction: column; align-items: center; gap: 32px; } .obelics-page .stp-conn { width: 2px; height: 32px; 
margin: 0; } .obelics-page .stp-card { max-width: 100%; } } .obelics-cta { padding: 100px 0; background: #0f172a; text-align: center; position: relative; overflow: hidden; } .obelics-cta::before { content: ''; position: absolute; top: -100px; left: 50%; transform: translateX(-50%); width: 700px; height: 400px; background: radial-gradient(ellipse, rgba(236, 72, 153, 0.12) 0%, transparent 70%); pointer-events: none; } .obelics-cta h2 { font-size: clamp(28px, 4vw, 44px); font-weight: 700; color: #f8fafc; letter-spacing: normal; margin-bottom: 28px; position: relative; } .obelics-cta > div > p { font-size: clamp(16px, 2vw, 18px); color: var(--el-text-color-secondary); max-width: 520px; margin: 0 auto 56px; line-height: 1.6; position: relative; } .obelics-page .cta-actions { display: flex; gap: 12px; justify-content: center; flex-wrap: wrap; position: relative; } .obelics-page .btn-cta-light { display: inline-flex; align-items: center; gap: 6px; padding: 14px 32px; background: #ec4899; color: #ffffff !important; border-radius: 9999px; font-size: 15px; font-weight: 700; transition: background 0.2s, transform 0.15s; text-decoration: none !important; } .obelics-page .btn-cta-light:hover { background: #db2777; transform: translateY(-1px); text-decoration: none !important; } .obelics-page .btn-cta-ghost { display: inline-flex; align-items: center; padding: 14px 32px; background: transparent; color: #94a3b8 !important; border: 1px solid #334155; border-radius: 9999px; font-size: 15px; font-weight: 600; transition: border-color 0.2s, color 0.2s; text-decoration: none !important; } .obelics-page .btn-cta-ghost:hover { border-color: var(--el-text-color-regular); color: #e2e8f0 !important; text-decoration: none !important; } .obelics-page code { background: #fdf2f8 !important; padding: 2px 8px !important; border-radius: 5px !important; font-size: 13px !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; color: #db2777 !important; border: 1px 
solid #fbcfe8 !important; } .obelics-page .s-text-dark { color: var(--el-text-color-primary); } .obelics-page .s-text-brand { color: #ec4899; } .obelics-page .s-section-body { font-size: 16px; color: var(--el-text-color-regular); line-height: 1.8; text-align: center; max-width: 680px; margin: 0 auto; } .obelics-page .s-section-body p + p { margin-top: 16px; } .obelics-page .tag-row { display: flex; gap: 8px; flex-wrap: wrap; justify-content: center; margin-top: 16px; } .obelics-page .tag-item
{
padding: 4px 12px; background: var(--el-bg-color-page);
border: 1px solid var(--el-border-color-light); border-radius: 9999px;
font-size: 12px; font-weight: 600; color: var(--el-text-color-regular);
}
html.dark .obelics-page { background: var(--el-bg-color); color: var(--el-text-color-primary); }
html.dark .obelics-page a { color: inherit; }
html.dark .markdown-body .obelics-page a { color: inherit !important; }
html.dark .markdown-body .obelics-page a.s-btn-primary,
html.dark .markdown-body .obelics-page a.btn-cta-light { color: #ffffff !important; }
html.dark .markdown-body .obelics-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
html.dark .markdown-body .obelics-page a.btn-cta-ghost { color: #94a3b8 !important; }
html.dark .markdown-body .obelics-page a.btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
html.dark .obelics-page .s-bg-white { background: var(--el-bg-color); }
html.dark .obelics-page .s-bg-gray { background: var(--el-bg-color-page); }
html.dark .obelics-page .s-bg-dark { background: var(--el-bg-color); }
html.dark .obelics-page .s-header h2 { color: var(--el-text-color-primary); }
html.dark .obelics-page .s-header p { color: var(--el-text-color-secondary); }
html.dark .obelics-page .s-btn-primary { background: #ec4899; color: #ffffff !important; }
html.dark .obelics-page .s-btn-primary:hover { background: #db2777; }
html.dark .obelics-page .s-btn-secondary {
background: #1e293b; color: var(--el-text-color-primary) !important;
border-color: #475569;
}
html.dark .obelics-page .s-btn-secondary:hover { background: var(--el-border-color); border-color: var(--el-text-color-regular); }
html.dark .obelics-hero { background: var(--el-bg-color); }
html.dark .obelics-hero::before {
background: radial-gradient(ellipse, rgba(236, 72, 153, 0.15) 0%, transparent 70%);
}
html.dark .obelics-page .hero-badge { background: var(--el-bg-color-page); border-color: var(--el-border-color); color: var(--el-text-color-secondary); }
html.dark .obelics-hero h1 { color: var(--el-text-color-primary); }
html.dark .obelics-hero h1 span { color: #f472b6; }
html.dark .obelics-page .hero-subtitle { color: var(--el-text-color-secondary); }
html.dark .obelics-page .hero-highlights .h-item { color: var(--el-text-color-secondary); }
html.dark .obelics-page .hero-highlights .h-div { background: var(--el-border-color); }
html.dark .obelics-stats { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .obelics-page .stat-val { color: var(--el-text-color-primary); }
html.dark .obelics-page .stat-lbl { color: var(--el-text-color-regular); }
html.dark .obelics-page .feat-card {
background: var(--el-bg-color-page); border-color: var(--el-border-color);
}
html.dark .obelics-page .feat-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .obelics-page .feat-card h3 { color: var(--el-text-color-primary); }
html.dark .obelics-page .feat-card p { color: var(--el-text-color-secondary); }
html.dark .obelics-page .uc-card { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .obelics-page .uc-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .obelics-page .uc-card h3 { color: var(--el-text-color-primary); }
html.dark .obelics-page .uc-card p { color: var(--el-text-color-secondary); }
html.dark .obelics-page .stp-num { color: #334155; }
html.dark .obelics-page .stp-card h3 { color: var(--el-text-color-primary); }
html.dark .obelics-page .stp-card p { color: var(--el-text-color-secondary); }
html.dark .obelics-page .stp-conn { background: var(--el-border-color); }
html.dark .obelics-page code {
background: #500724 !important; color: #f9a8d4 !important; border-color: #ec4899 !important;
}
html.dark .obelics-page .s-text-dark { color: var(--el-text-color-primary); }
html.dark .obelics-page .s-text-brand { color: #f472b6; }
html.dark .obelics-page .s-section-body { color: var(--el-text-color-secondary); }
html.dark .obelics-page .tag-item { background: var(--el-border-color); border-color: var(--el-text-color-regular); color: var(--el-text-color-secondary); }
html.dark .obelics-cta { background: #020617; }
html.dark .obelics-cta::before {
background: radial-gradient(ellipse, rgba(236, 72, 153, 0.2) 0%, transparent 70%);
}
html.dark .obelics-page .btn-cta-light { color: #ffffff !important; }
html.dark .obelics-page .btn-cta-ghost { color: #94a3b8 !important; }
html.dark .obelics-page .btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
</style>
<div class="obelics-page">
<section class="obelics-hero">
<div class="s-container-narrow">
<div class="hero-badge">
<span class="badge-dot"></span>
OBELICS Dataset
</div>
<h1>
OBELICS<br/><span>Multimodal Dataset</span>
</h1>
<p class="hero-subtitle">
OBELICS (Open Bimodal Examples from Large fIltered Commoncrawl Snapshots) is a large-scale multimodal web document dataset released by Hugging Face. It contains 141 million web documents comprising 353 million images and 115 billion text tokens. The data is extracted from Common Crawl with the natural interleaving of text and images preserved, and it was used to train multimodal large language models such as IDEFICS.
Dataset Highlights
A large-scale interleaved image-text dataset that provides a training foundation for the next generation of multimodal large language models
Interleaved Image-Text Format
Preserves the natural order of images and text within each web document, so models see content as a reader would rather than as isolated image-text pairs.
Ultra Large Scale
Contains 141 million web documents, 353 million images, and 115 billion text tokens, making it one of the largest open interleaved multimodal datasets available.
Web-Native Structure
Data is extracted directly from real web pages and keeps the original document structure and context instead of relying on artificially constructed image-text pairs, so it stays closer to how content naturally appears.
Powering IDEFICS
Serves as core training data for IDEFICS, Hugging Face's open-source multimodal large language model, so it has been exercised in large-scale model training.
Quality Filtered
Passes through a multi-stage filtering pipeline, including sensitive-content filtering, document quality assessment, and removal of duplicate documents, to keep the data clean and usable.
Open and Reproducible
Released under the CC BY 4.0 license and integrated with the Hugging Face Datasets library, supporting direct streaming or bulk download via the API.
Use Cases
From multimodal model training to document understanding, spanning research and engineering work
Multimodal Large Model Training
Train multimodal large language models that understand both images and text by learning from interleaved documents
Document Understanding
Learn the layout and structure of real web pages to improve a model's ability to parse complex documents and extract information
Visual Question Answering
Build visual question answering models that reason across images and text to handle questions requiring multimodal understanding
Few-Shot Multimodal Learning
Use the interleaved format for in-context learning, so a model can pick up new tasks from a small number of examples
Data Preview
A structural example of a single web document in the OBELICS dataset, showing how images and text are interleaved
{
  "document_url": "https://example.com/article",
  "content": [
    { "type": "text", "value": "Exploring the latest advancements of deep learning in natural language processing..." },
    { "type": "image", "url": "https://example.com/img1.jpg", "alt": "Transformer architecture" },
    { "type": "text", "value": "As shown in the above image, the Transformer architecture consists of encoders and decoders..." },
    { "type": "image", "url": "https://example.com/img2.jpg", "alt": "Attention mechanism" },
    { "type": "text", "value": "The attention mechanism allows the model to focus on different parts of the input sequence..." }
  ]
}
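As a rough illustration of how the preview structure above might be consumed, the following Python sketch walks the `content` list in order and produces a text sequence with image placeholders plus the list of image URLs. The function name `flatten_document` and the `<image>` placeholder are hypothetical choices made for this example, not part of OBELICS or any library.

```python
import json
from typing import List, Tuple

# Hypothetical placeholder marking where an image appears in the text sequence.
IMAGE_TOKEN = "<image>"


def flatten_document(doc: dict) -> Tuple[str, List[str]]:
    """Return (text with image placeholders, image URLs) in document order."""
    parts: List[str] = []
    image_urls: List[str] = []
    for item in doc["content"]:
        if item["type"] == "text":
            parts.append(item["value"])
        elif item["type"] == "image":
            # Keep the image position in the text and collect its URL separately.
            parts.append(IMAGE_TOKEN)
            image_urls.append(item["url"])
    return "\n".join(parts), image_urls


# A shortened copy of the preview document shown above.
sample = json.loads("""
{
  "document_url": "https://example.com/article",
  "content": [
    {"type": "text", "value": "Exploring the latest advancements..."},
    {"type": "image", "url": "https://example.com/img1.jpg", "alt": "Transformer architecture"},
    {"type": "text", "value": "As shown in the above image..."},
    {"type": "image", "url": "https://example.com/img2.jpg", "alt": "Attention mechanism"},
    {"type": "text", "value": "The attention mechanism allows..."}
  ]
}
""")

text, urls = flatten_document(sample)
print(text)
print(urls)
```

Keeping the placeholders in sequence is what distinguishes an interleaved document from a set of independent image-text pairs: the position of each image relative to the surrounding text is retained.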
Get Started in 3 Steps
From browsing to training, integrate OBELICS into your multimodal research workflow
Browse the Dataset
Browse the OBELICS dataset on the Ace Data Cloud platform to review its document structure, metadata, and licensing terms.
Stream or Download
Stream or bulk download the data through the Hugging Face Datasets library, choosing the loading method that fits your compute and storage resources.
Integrate into Training Workflow
Load the interleaved image-text documents into your multimodal training pipeline and start training models that understand the relationships between images and text.
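The steps above could look roughly like the following Python sketch, which streams a few documents instead of downloading everything. It assumes the Hub dataset ID `HuggingFaceM4/OBELICS` and that each record stores parallel `texts` and `images` lists in which one of the two entries is `None` at each position. Both assumptions should be checked against the dataset card. The helper names `interleave` and `stream_documents` are hypothetical.

```python
from typing import Iterator, List, Optional, Tuple


def interleave(
    texts: List[Optional[str]], images: List[Optional[str]]
) -> List[Tuple[str, str]]:
    """Merge parallel lists into ordered ("text" | "image", value) items."""
    items: List[Tuple[str, str]] = []
    for text, image in zip(texts, images):
        if text is not None:
            items.append(("text", text))
        elif image is not None:
            items.append(("image", image))  # value is assumed to be an image URL
    return items


def stream_documents(limit: int = 3) -> Iterator[List[Tuple[str, str]]]:
    """Yield the first `limit` documents without downloading the full dataset."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset

    # Dataset ID and field names are assumptions; see the lead-in.
    dataset = load_dataset("HuggingFaceM4/OBELICS", split="train", streaming=True)
    for record in dataset.take(limit):
        yield interleave(record["texts"], record["images"])


# Offline check of the merging logic on a hand-written record.
merged = interleave(["Intro text", None, "Caption text"], [None, "https://example.com/a.jpg", None])
print(merged)
```

Streaming is usually the more practical starting point at this scale, since it lets you inspect records and prototype a pipeline before committing storage to a bulk download.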
Start Exploring OBELICS Data
141 million web documents and 353 million images of multimodal training data with interleaved images and text. Whether you are researching multimodal large models or exploring document understanding, OBELICS offers a solid data foundation.
