<style>
.favdbench-page * { box-sizing: border-box; }
.favdbench-page h1, .favdbench-page h2, .favdbench-page h3, .favdbench-page h4, .favdbench-page h5, .favdbench-page h6, .favdbench-page p, .favdbench-page ul, .favdbench-page ol, .favdbench-page li, .favdbench-page pre, .favdbench-page blockquote, .favdbench-page table, .favdbench-page td, .favdbench-page th { margin: 0; padding: 0; }
.favdbench-page {
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
color: var(--el-text-color-primary);
background: var(--el-bg-color);
line-height: 1.6;
}
.favdbench-page a { text-decoration: none; color: inherit; }
.favdbench-page a:hover { text-decoration: none; }
.favdbench-page ul { list-style: none; }
.markdown-body .favdbench-page a { color: inherit !important; text-decoration: none !important; }
.markdown-body .favdbench-page a:hover { text-decoration: none !important; }
.markdown-body .favdbench-page a.s-btn-primary,
.markdown-body .favdbench-page a.btn-cta-light { color: #ffffff !important; }
.markdown-body .favdbench-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
.markdown-body .favdbench-page a.btn-cta-ghost { color: #94a3b8 !important; }
.markdown-body .favdbench-page a.btn-cta-ghost:hover { color: #e2e8f0 !important; }
.markdown-body .favdbench-page h1, .markdown-body .favdbench-page h2 { border-bottom: none !important; padding-bottom: 0 !important; }
.favdbench-page .s-container { max-width: 1200px; margin: 0 auto; padding: 0 24px; }
.favdbench-page .s-container-narrow { max-width: 800px; margin: 0 auto; padding: 0 24px; }
.favdbench-page .s-container-wide { max-width: 1100px; margin: 0 auto; padding: 0 32px; }
.favdbench-page .s-section { padding: 80px 0; }
.favdbench-page .s-section-lg { padding: 100px 0; }
.favdbench-page .s-section-sm { padding: 48px 0; }
.favdbench-page .s-bg-white { background: var(--el-bg-color); }
.favdbench-page .s-bg-gray { background: var(--el-bg-color-page); }
.favdbench-page .s-bg-dark { background: #0f172a; color: #f8fafc; }
.favdbench-page .s-header { text-align: center; margin-bottom: 64px; }
.favdbench-page .s-header h2 {
font-size: clamp(28px, 4vw, 40px);
font-weight: 700;
color: var(--el-text-color-primary);
letter-spacing: normal;
margin-bottom: 20px;
line-height: 1.15;
}
.favdbench-page .s-header p {
font-size: clamp(16px, 2vw, 18px);
color: var(--el-text-color-regular);
max-width: 640px;
margin: 0 auto;
line-height: 1.6;
}
.favdbench-page .s-bg-dark .s-header h2 { color: #f8fafc; }
.favdbench-page .s-bg-dark .s-header p { color: var(--el-text-color-secondary); }
.favdbench-page .s-btn-primary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: #dc2626; color: #ffffff !important;
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: background 0.2s, transform 0.15s;
border: none; cursor: pointer;
text-decoration: none !important;
}
.favdbench-page .s-btn-primary:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; }
.favdbench-page .s-btn-secondary {
display: inline-flex; align-items: center; gap: 6px;
padding: 14px 28px;
background: var(--el-bg-color); color: var(--el-text-color-primary) !important;
border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 15px; font-weight: 600;
transition: border-color 0.2s, background 0.2s;
cursor: pointer;
text-decoration: none !important;
}
.favdbench-page .s-btn-secondary:hover { background: var(--el-bg-color-page); text-decoration: none !important; }
.favdbench-hero {
padding: 100px 0 80px;
text-align: center;
background: var(--el-bg-color);
position: relative;
overflow: hidden;
}
.favdbench-hero::before {
content: '';
position: absolute;
top: -200px; left: 50%;
transform: translateX(-50%);
width: 900px; height: 500px;
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.06) 0%, transparent 70%);
pointer-events: none;
}
.favdbench-page .hero-badge {
display: inline-flex; align-items: center; gap: 8px;
padding: 6px 16px;
background: var(--el-bg-color-page); border: 1px solid var(--el-border-color-light);
border-radius: 9999px; font-size: 13px; font-weight: 600; color: var(--el-text-color-regular);
margin-bottom: 28px;
}
.favdbench-page .hero-badge .badge-dot {
width: 6px; height: 6px; background: #10b981; border-radius: 50%;
display: inline-block;
}
.favdbench-hero h1 {
font-size: clamp(36px, 5vw, 60px);
font-weight: 700; line-height: 1.05;
letter-spacing: normal; color: var(--el-text-color-primary);
margin-bottom: 20px;
position: relative;
}
.favdbench-hero h1 span { color: #dc2626; }
.favdbench-page .hero-subtitle {
font-size: clamp(16px, 2vw, 20px);
color: var(--el-text-color-regular); line-height: 1.6;
max-width: 620px; margin: 0 auto 56px;
position: relative;
}
.favdbench-page .hero-actions {
display: flex; gap: 12px; justify-content: center;
flex-wrap: wrap; margin-bottom: 56px; position: relative;
}
.favdbench-page .hero-highlights {
display: flex; align-items: center; justify-content: center;
gap: 16px; flex-wrap: wrap; position: relative;
}
.favdbench-page .hero-highlights .h-item { font-size: 14px; color: var(--el-text-color-regular); font-weight: 500; }
.favdbench-page .hero-highlights .h-div { width: 1px; height: 16px; background: var(--el-border-color-light); }
@media (max-width: 640px) 

{ .favdbench-page .hero-highlights .h-div { display: none; } .favdbench-page .hero-highlights { gap: 8px 16px; } .favdbench-page .hero-actions { flex-direction: column; align-items: center; } .favdbench-page .hero-actions a { width: 100%; max-width: 280px; justify-content: center; } } .favdbench-page .hero-cover { max-width: 720px; margin: 48px auto 0; border-radius: 16px; overflow: hidden; box-shadow: 0 8px 32px rgba(0,0,0,0.10); } .favdbench-page .hero-cover img { width: 100%; height: auto; display: block; } .favdbench-stats { padding: 48px 0; background: var(--el-bg-color-page); border-top: 1px solid var(--el-border-color-lighter); border-bottom: 1px solid var(--el-border-color-lighter); } .favdbench-page .stats-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 32px; text-align: center; } .favdbench-page .stat-icon { font-size: 28px; margin-bottom: 12px; } .favdbench-page .stat-val { font-size: clamp(28px, 4vw, 40px); font-weight: 700; color: var(--el-text-color-primary); letter-spacing: normal; margin-bottom: 4px; } .favdbench-page .stat-lbl { font-size: 14px; color: var(--el-text-color-secondary); font-weight: 500; } @media (max-width: 768px) { .favdbench-page .stats-grid { grid-template-columns: repeat(2, 1fr); gap: 24px; } } @media (max-width: 480px) { .favdbench-page .stats-grid { grid-template-columns: 1fr; gap: 20px; } } .favdbench-page .features-grid { display: grid; grid-template-columns: repeat(3, 1fr); gap: 24px; } .favdbench-page .feat-card { padding: 32px 28px; border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); background: var(--el-bg-color); transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .favdbench-page .feat-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .favdbench-page .feat-icon { font-size: 32px; margin-bottom: 16px; } .favdbench-page .feat-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } 
.favdbench-page .feat-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .favdbench-page .features-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 640px) { .favdbench-page .features-grid { grid-template-columns: 1fr; } } .favdbench-page .usecases-grid { display: grid; grid-template-columns: repeat(4, 1fr); gap: 20px; } .favdbench-page .uc-card { padding: 28px 24px; background: var(--el-bg-color); border: none; border-radius: 20px; box-shadow: 0 2px 12px 0 rgba(0,0,0,0.08); text-align: center; transition: border-color 0.2s, box-shadow 0.2s, transform 0.15s; } .favdbench-page .uc-card:hover { box-shadow: 0 8px 24px 0 rgba(0,0,0,0.12); transform: translateY(-2px); } .favdbench-page .uc-icon { font-size: 36px; margin-bottom: 16px; } .favdbench-page .uc-card h3 { font-size: 17px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 8px; } .favdbench-page .uc-card p { font-size: 14px; color: var(--el-text-color-regular); line-height: 1.6; } @media (max-width: 1024px) { .favdbench-page .usecases-grid { grid-template-columns: repeat(2, 1fr); } } @media (max-width: 480px) { .favdbench-page .usecases-grid { grid-template-columns: 1fr; } } .favdbench-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; max-width: 860px; margin: 0 auto; } .markdown-body .favdbench-page .code-wrap { border-radius: 16px !important; overflow: hidden !important; border: 1px solid #334155 !important; background: #0f172a !important; } .favdbench-page .code-bar { display: flex !important; align-items: center !important; justify-content: space-between !important; padding: 12px 20px !important; background: #1e293b !important; border-bottom: 1px solid #334155 !important; } .favdbench-page .code-dots { display: flex; gap: 6px; } .favdbench-page .code-dots i { width: 10px; height: 10px; border-radius: 50%; 
display: inline-block; } .favdbench-page .code-dots .r { background: #ef4444; } .favdbench-page .code-dots .y { background: #f59e0b; } .favdbench-page .code-dots .g { background: #10b981; } .favdbench-page .code-lang { font-size: 12px; color: var(--el-text-color-secondary); font-weight: 600; text-transform: uppercase; letter-spacing: 0.05em; } .favdbench-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .markdown-body .favdbench-page .code-block { padding: 24px !important; margin: 0 !important; overflow-x: auto !important; font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; font-size: 13.5px !important; line-height: 1.7 !important; color: #e2e8f0 !important; white-space: pre !important; background: transparent !important; border: none !important; border-radius: 0 !important; } .favdbench-page .steps-row { display: flex; align-items: flex-start; justify-content: center; margin-bottom: 48px; } .favdbench-page .stp-card { flex: 1; max-width: 320px; text-align: center; padding: 0 24px; } .favdbench-page .stp-num { font-size: clamp(48px, 6vw, 72px); font-weight: 700; color: #e2e8f0; letter-spacing: -0.04em; line-height: 1; margin-bottom: 20px; } .favdbench-page .stp-card h3 { font-size: 18px; font-weight: 700; color: var(--el-text-color-primary); margin-bottom: 10px; } .favdbench-page .stp-card p { font-size: 15px; color: var(--el-text-color-regular); line-height: 1.6; } .favdbench-page .stp-conn { width: 60px; height: 2px; background: var(--el-border-color-light); margin-top: 36px; flex-shrink: 0; } .favdbench-page .steps-cta { text-align: center; } @media (max-width: 768px) { .favdbench-page .steps-row { flex-direction: 
column; align-items: center; gap: 32px; } .favdbench-page .stp-conn { width: 2px; height: 32px; margin: 0; } .favdbench-page .stp-card { max-width: 100%; } } .favdbench-cta { padding: 100px 0; background: #0f172a; text-align: center; position: relative; overflow: hidden; } .favdbench-cta::before { content: ''; position: absolute; top: -100px; left: 50%; transform: translateX(-50%); width: 700px; height: 400px; background: radial-gradient(ellipse, rgba(220, 38, 38, 0.12) 0%, transparent 70%); pointer-events: none; } .favdbench-cta h2 { font-size: clamp(28px, 4vw, 44px); font-weight: 700; color: #f8fafc; letter-spacing: normal; margin-bottom: 28px; position: relative; } .favdbench-cta > div > p { font-size: clamp(16px, 2vw, 18px); color: var(--el-text-color-secondary); max-width: 520px; margin: 0 auto 56px; line-height: 1.6; position: relative; } .favdbench-page .cta-actions { display: flex; gap: 12px; justify-content: center; flex-wrap: wrap; position: relative; } .favdbench-page .btn-cta-light { display: inline-flex; align-items: center; gap: 6px; padding: 14px 32px; background: #dc2626; color: #ffffff !important; border-radius: 9999px; font-size: 15px; font-weight: 700; transition: background 0.2s, transform 0.15s; text-decoration: none !important; } .favdbench-page .btn-cta-light:hover { background: #b91c1c; transform: translateY(-1px); text-decoration: none !important; } .favdbench-page .btn-cta-ghost { display: inline-flex; align-items: center; padding: 14px 32px; background: transparent; color: #94a3b8 !important; border: 1px solid #334155; border-radius: 9999px; font-size: 15px; font-weight: 600; transition: border-color 0.2s, color 0.2s; text-decoration: none !important; } .favdbench-page .btn-cta-ghost:hover { border-color: var(--el-text-color-regular); color: #e2e8f0 !important; text-decoration: none !important; } .favdbench-page code { background: #fee2e2 !important; padding: 2px 8px !important; border-radius: 5px !important; font-size: 13px !important; 
font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', monospace !important; color: #b91c1c !important; border: 1px solid #fecaca !important; } .favdbench-page .s-text-dark { color: var(--el-text-color-primary); } .favdbench-page .s-text-brand { color: #dc2626; } .favdbench-page .s-section-body { font-size: 16px; color: var(--el-text-color-regular); line-height: 1.8; text-align: center; max-width: 680px; margin: 0 auto; } .favdbench-page .s-section-body p + p { margin-top: 16px; } .favdbench-page .tag-row { display: flex; gap: 8px; flex-wrap: wrap; justify-content: center; margin-top: 16px; } .favdbench-page .tag-item

{
padding: 4px 12px; background: var(--el-bg-color-page);
border: 1px solid var(--el-border-color-light); border-radius: 9999px;
font-size: 12px; font-weight: 600; color: var(--el-text-color-regular);
}
html.dark .favdbench-page { background: var(--el-bg-color); color: var(--el-text-color-primary); }
html.dark .favdbench-page a { color: inherit; }
html.dark .markdown-body .favdbench-page a { color: inherit !important; }
html.dark .markdown-body .favdbench-page a.s-btn-primary,
html.dark .markdown-body .favdbench-page a.btn-cta-light { color: #ffffff !important; }
html.dark .markdown-body .favdbench-page a.s-btn-secondary { color: var(--el-text-color-primary) !important; }
html.dark .markdown-body .favdbench-page a.btn-cta-ghost { color: #94a3b8 !important; }
html.dark .markdown-body .favdbench-page a.btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
html.dark .favdbench-page .s-bg-white { background: var(--el-bg-color); }
html.dark .favdbench-page .s-bg-gray { background: var(--el-bg-color-page); }
html.dark .favdbench-page .s-bg-dark { background: var(--el-bg-color); }
html.dark .favdbench-page .s-header h2 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .s-header p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .s-btn-primary { background: #dc2626; color: #ffffff !important; }
html.dark .favdbench-page .s-btn-primary:hover { background: #b91c1c; }
html.dark .favdbench-page .s-btn-secondary {
background: #1e293b; color: var(--el-text-color-primary) !important;
border-color: #475569;
}
html.dark .favdbench-page .s-btn-secondary:hover { background: var(--el-border-color); border-color: var(--el-text-color-regular); }
html.dark .favdbench-hero { background: var(--el-bg-color); }
html.dark .favdbench-hero::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.15) 0%, transparent 70%);
}
html.dark .favdbench-page .hero-badge { background: var(--el-bg-color-page); border-color: var(--el-border-color); color: var(--el-text-color-secondary); }
html.dark .favdbench-hero h1 { color: var(--el-text-color-primary); }
html.dark .favdbench-hero h1 span { color: #f87171; }
html.dark .favdbench-page .hero-subtitle { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .hero-highlights .h-item { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .hero-highlights .h-div { background: var(--el-border-color); }
html.dark .favdbench-stats { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .favdbench-page .stat-val { color: var(--el-text-color-primary); }
html.dark .favdbench-page .stat-lbl { color: var(--el-text-color-regular); }
html.dark .favdbench-page .feat-card {
background: var(--el-bg-color-page); border-color: var(--el-border-color);
}
html.dark .favdbench-page .feat-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .favdbench-page .feat-card h3 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .feat-card p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .uc-card { background: var(--el-bg-color-page); border-color: var(--el-border-color); }
html.dark .favdbench-page .uc-card:hover { border-color: var(--el-text-color-regular); box-shadow: 0 4px 16px rgba(0,0,0,0.3); }
html.dark .favdbench-page .uc-card h3 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .uc-card p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .stp-num { color: #334155; }
html.dark .favdbench-page .stp-card h3 { color: var(--el-text-color-primary); }
html.dark .favdbench-page .stp-card p { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .stp-conn { background: var(--el-border-color); }
html.dark .favdbench-page code {
background: #7f1d1d !important; color: #fca5a5 !important; border-color: #dc2626 !important;
}
html.dark .favdbench-page .s-text-dark { color: var(--el-text-color-primary); }
html.dark .favdbench-page .s-text-brand { color: #f87171; }
html.dark .favdbench-page .s-section-body { color: var(--el-text-color-secondary); }
html.dark .favdbench-page .tag-item { background: var(--el-border-color); border-color: var(--el-text-color-regular); color: var(--el-text-color-secondary); }
html.dark .favdbench-cta { background: #020617; }
html.dark .favdbench-cta::before {
background: radial-gradient(ellipse, rgba(220, 38, 38, 0.2) 0%, transparent 70%);
}
html.dark .favdbench-page .btn-cta-light { color: #ffffff !important; }
html.dark .favdbench-page .btn-cta-ghost { color: #94a3b8 !important; }
html.dark .favdbench-page .btn-cta-ghost:hover { color: var(--el-text-color-primary) !important; }
</style>
<div class="favdbench-page">
<section class="favdbench-hero">
<div class="s-container-narrow">
<div class="hero-badge">
<span class="badge-dot"></span>
FAVDBench Dataset
</div>
<h1>
FAVDBench<br/><span>Dataset</span>
</h1>
<p class="hero-subtitle">
FAVDBench (Fine-grained Audible Video Description Benchmark) is a benchmark dataset for fine-grained audio-visual description, introduced at CVPR 2023. It provides detailed textual descriptions of audible videos, covering object appearance, spatial location, actions, and sounds.

11,000+ videos | 5 types of descriptions | CVPR 2023 | OpenNLPLab
🎬
11,000+
Number of videos
📝
5
Types of descriptions
🏆
CVPR 2023
Top conference
📜
Open
License

Dataset Highlights

A fine-grained description benchmark for audio-visual understanding, built to advance multimodal research

🔊

Audio-Visual Fusion

Covers both visual and auditory information. It is one of the few video description benchmarks that include audio descriptions, which makes it suitable for cross-modal research.

🎯

Fine-Grained Descriptions

Provides fine-grained textual annotations across five dimensions: Appearance, Spatial, Temporal, Action, and Audio.

🏷️

Multimodal Annotations

Each video carries human-written annotations across multiple dimensions, making the dataset suitable for training and evaluating multimodal generation models.

📊

Academic Benchmark

Introduced in a CVPR 2023 paper, it serves as an evaluation benchmark for audio-visual description tasks.

🌐

Diverse Content

The videos span varied scenes and themes, including natural scenes, human activities, and animal behaviors, which supports a broader assessment of model generalization.

🔧

Openly Available

The dataset is released under an open license, allowing researchers to freely download and use it, lowering the barriers for academic research and industrial applications.

Applicable Scenarios

From academic research to industrial applications, empowering audio-visual understanding technologies

📹

Video Description Generation

Train and evaluate models that automatically generate multidimensional natural-language descriptions of videos.

🔊

Audio-Visual Understanding

Study the joint understanding of visual and auditory information, and explore methods for cross-modal semantic alignment and fusion.

🧠

Multimodal Research

Provide high-quality training and evaluation data for tri-modal (vision, language, audio) pre-trained models.

💬

Video Subtitle Generation

Develop automatic video subtitling systems that make video content more accessible and searchable.

Video Description | Audio-Visual Understanding | Multimodal | CVPR | Benchmark Testing

Data Preview

Below is an annotation example from the FAVDBench dataset, with fine-grained descriptions across five dimensions.

JSON
{
  "video_id": "video_00123",
  "descriptions": {
    "appearance": "A brown dog with floppy ears and a red collar stands on green grass.",
    "spatial": "The dog is positioned in the center of the frame with trees in the background.",
    "temporal": "The video starts with the dog sitting, then it stands up and begins to walk.",
    "action": "The dog wags its tail, barks twice, and runs toward the camera.",
    "audio": "Birds chirping in the background, followed by two loud barks and rustling grass."
  },
  "duration": 8.5,
  "split": "train"
}
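
A quick sanity check on a record like the one above can be sketched in Python. This is a minimal sketch, not an official loader: the helper name `missing_dimensions` is hypothetical, and treating the five keys from the preview as the complete required set is an assumption.

```python
import json

# Keys copied from the preview record; assuming they are the full required set.
DIMENSIONS = ("appearance", "spatial", "temporal", "action", "audio")


def missing_dimensions(record: dict) -> list:
    """Return the description dimensions that are absent or blank in a record."""
    descriptions = record.get("descriptions", {})
    missing = []
    for name in DIMENSIONS:
        value = descriptions.get(name)
        # A dimension counts as present only if it is a non-blank string.
        if not isinstance(value, str) or not value.strip():
            missing.append(name)
    return missing


# The preview record, parsed from its JSON text.
sample = json.loads("""
{
  "video_id": "video_00123",
  "descriptions": {
    "appearance": "A brown dog with floppy ears and a red collar stands on green grass.",
    "spatial": "The dog is positioned in the center of the frame with trees in the background.",
    "temporal": "The video starts with the dog sitting, then it stands up and begins to walk.",
    "action": "The dog wags its tail, barks twice, and runs toward the camera.",
    "audio": "Birds chirping in the background, followed by two loud barks and rustling grass."
  },
  "duration": 8.5,
  "split": "train"
}
""")

print(missing_dimensions(sample))  # prints []
```

Running a check like this before training helps surface incomplete records early instead of failing later on a missing key.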

Get Started in 3 Steps

From browsing to usage, you can start your multimodal research in just a few minutes

01

Browse the Dataset

View the dataset details on the Ace Data Cloud platform to check metadata such as the annotation format, data scale, and license terms.

02

Download Data

Download the video files and JSON annotation data. The dataset provides standard training, validation, and test splits.

03

Load and Use

Use json.load() to load the annotation data, and start training and evaluating multimodal models with a video processing library.
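
The loading step might look like the following Python sketch. It assumes the annotations are stored as a single JSON array of records shaped like the preview; the file name `annotations.json`, the helper `load_by_split`, the split label `val`, and the second demo record are all made up for illustration, so adjust them to the files you actually download.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def load_by_split(path):
    """Load a JSON array of records and group them by their "split" field."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    by_split = defaultdict(list)
    for record in records:
        by_split[record["split"]].append(record)
    return dict(by_split)


# Stand-in data so the sketch runs without the real dataset.
# The second record and the "val" label are invented for this demo.
demo = [
    {"video_id": "video_00123", "descriptions": {"audio": "Two loud barks."},
     "duration": 8.5, "split": "train"},
    {"video_id": "video_00124", "descriptions": {"audio": "Rain on a window."},
     "duration": 6.0, "split": "val"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "annotations.json"  # hypothetical file name
    path.write_text(json.dumps(demo), encoding="utf-8")
    splits = load_by_split(path)

print({name: len(items) for name, items in splits.items()})
```

Grouping by split up front keeps training and evaluation records separate, which makes it harder to evaluate on training data by accident.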

Start Exploring the FAVDBench Dataset

FAVDBench is the CVPR 2023 Fine-grained Audible Video Description Benchmark, released under an open license and available for immediate download. Whether you are a multimodal researcher or a video understanding engineer, this dataset is worth a try.