COCO 2017
Dataset
COCO (Common Objects in Context) is one of the most widely used benchmark datasets in computer vision. It contains 330,000 images, 1.5 million object instances, and 80 object categories, and supports tasks such as object detection, instance segmentation, keypoint detection, and image captioning.
Dataset Highlights
The most influential benchmark dataset in the field of computer vision, covering various core visual tasks
Object Detection
Provides bounding box annotations for 80 object categories, covering common objects in daily life, serving as the standard training and evaluation dataset for detection models like YOLO, Faster R-CNN, etc.
Instance Segmentation
Each object instance comes with pixel-level segmentation masks, supporting fine-grained instance segmentation tasks, widely used for training and evaluation of models like Mask R-CNN.
Keypoint Detection
Includes over 250,000 annotated human instances with 17 keypoints, serving as a core data source for human pose estimation and action recognition research.
Image Captioning
Each captioned image is accompanied by at least five human-written natural language descriptions, supporting image captioning and related vision-language tasks such as visual question answering (VQA).
Stuff Segmentation
In addition to countable "thing" categories, 91 "stuff" categories (such as sky, grass, road) are also annotated, supporting panoptic segmentation.
Academic Standard
Created under the leadership of Microsoft Research, it is among the most frequently cited vision datasets in papers at top conferences such as CVPR, ECCV, and ICCV, and is treated as a standard benchmark in both academia and industry.
Applicable Scenarios
From academic research to industrial deployment, covering core tasks in computer vision
Object Detection
Training and evaluation of object detection models, such as YOLO, SSD, Faster R-CNN, DETR, and other classic and cutting-edge architectures
Image Segmentation
A standard benchmark for instance segmentation, semantic segmentation, and panoptic segmentation, used to train and evaluate models such as Mask R-CNN and SAM.
Pose Estimation
Pose estimation research based on 17 annotated human keypoints, suitable for training models like OpenPose, HRNet, etc.
Image Captioning
Training image description generation models using image-text paired data, supporting multimodal understanding and visual language pre-training
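For the pose estimation scenario, it helps to know how the 17 keypoints are stored. In COCO person annotations, the `keypoints` field is a flat list of 51 numbers: an (x, y, visibility) triple for each keypoint in a fixed order. The following is a minimal sketch of unpacking that list; the `Keypoint` type and the helper names are illustrative, not part of any official API, and the sample values in the usage note are made up.

```python
from typing import List, NamedTuple

# Keypoint order used by COCO person annotations.
COCO_KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


class Keypoint(NamedTuple):  # hypothetical helper type for this sketch
    name: str
    x: float
    y: float
    visibility: int  # 0 = not labeled, 1 = labeled but not visible, 2 = labeled and visible


def parse_keypoints(flat: List[float]) -> List[Keypoint]:
    """Split the flat [x1, y1, v1, x2, y2, v2, ...] list into named triples."""
    expected = 3 * len(COCO_KEYPOINT_NAMES)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    return [
        Keypoint(name, flat[3 * i], flat[3 * i + 1], int(flat[3 * i + 2]))
        for i, name in enumerate(COCO_KEYPOINT_NAMES)
    ]


def count_labeled(keypoints: List[Keypoint]) -> int:
    """Count keypoints that were annotated at all (visibility > 0)."""
    return sum(1 for kp in keypoints if kp.visibility > 0)
```

For example, `parse_keypoints([120.0, 80.0, 2] + [0, 0, 0] * 16)` yields 17 entries in which only `nose` is labeled. Unlabeled keypoints carry x = y = 0, so they should be filtered by visibility rather than by coordinates.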
Data Preview
The following is a trimmed JSON example of a COCO annotation file, showing its top-level structure (info, images, annotations, categories) with a single object instance
{
  "info": {
    "description": "COCO 2017 Dataset",
    "version": "1.0",
    "year": 2017,
    "contributor": "COCO Consortium",
    "url": "http://cocodataset.org"
  },
  "images": [
    {
      "id": 397133,
      "file_name": "000000397133.jpg",
      "width": 640,
      "height": 480
    }
  ],
  "annotations": [
    {
      "id": 1768,
      "image_id": 397133,
      "category_id": 18,
      "bbox": [217.62, 240.54, 38.99, 57.75],
      "area": 2254.6,
      "segmentation": [[...]],
      "iscrowd": 0
    }
  ],
  "categories": [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 18, "name": "dog", "supercategory": "animal"}
  ]
}
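The three lists in the file are linked by ids: each annotation points to an image through `image_id` and to a category through `category_id`. The `bbox` field is stored as [x_min, y_min, width, height], whereas many detection libraries expect corner coordinates. The sketch below joins the lists and converts the box, using only the standard library. `SAMPLE` is a trimmed copy of the preview (the elided `segmentation` polygon is omitted), and the function names are illustrative.

```python
import json
from collections import defaultdict

# Trimmed copy of the preview; the elided "segmentation" field is left out.
SAMPLE = json.loads("""
{
  "images": [
    {"id": 397133, "file_name": "000000397133.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1768, "image_id": 397133, "category_id": 18,
     "bbox": [217.62, 240.54, 38.99, 57.75], "area": 2254.6, "iscrowd": 0}
  ],
  "categories": [
    {"id": 1, "name": "person", "supercategory": "person"},
    {"id": 18, "name": "dog", "supercategory": "animal"}
  ]
}
""")


def xywh_to_xyxy(bbox):
    """Convert a COCO [x_min, y_min, width, height] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def index_annotations(dataset):
    """Group annotations by image id, resolving category ids to names."""
    names = {cat["id"]: cat["name"] for cat in dataset["categories"]}
    by_image = defaultdict(list)
    for ann in dataset["annotations"]:
        by_image[ann["image_id"]].append({
            "category": names[ann["category_id"]],
            "box_xyxy": xywh_to_xyxy(ann["bbox"]),
            "iscrowd": bool(ann["iscrowd"]),
        })
    return dict(by_image)


if __name__ == "__main__":
    for image_id, objects in index_annotations(SAMPLE).items():
        print(image_id, objects)
```

Category ids are not contiguous (the 80 categories use ids between 1 and 90), so a dictionary lookup is safer than treating `category_id` as a list index.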
3 Steps to Get Started Quickly
From browsing to loading, you can start your computer vision project in just a few minutes
Browse the Dataset
View dataset details on the Ace Data Cloud platform to understand metadata such as category distribution, annotation format, and licensing agreements.
Download Data
Download the training set (118K images, 18 GB), validation set (5K images, 1 GB), and annotation files, selecting the required subsets as needed.
Load and Use
Use pycocotools to load the annotation data, then proceed to data visualization, model training, and evaluation.
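The loading step can be sketched as follows, assuming pycocotools is installed (`pip install pycocotools`) and the annotation archive has been extracted locally. The function name `boxes_for_category` and the path in the usage note are assumptions for illustration; the `COCO`, `getCatIds`, `getAnnIds`, and `loadAnns` calls are part of the pycocotools API.

```python
def boxes_for_category(annotation_path, category_name):
    """Return (image_id, bbox) pairs for all non-crowd instances of one category.

    bbox is in COCO format: [x_min, y_min, width, height].
    """
    # Imported here so this module still loads where pycocotools is absent.
    from pycocotools.coco import COCO

    coco = COCO(annotation_path)  # parses the JSON and builds lookup indexes

    cat_ids = coco.getCatIds(catNms=[category_name])
    if not cat_ids:
        # An empty catIds filter would match every annotation, so stop early.
        return []

    ann_ids = coco.getAnnIds(catIds=cat_ids, iscrowd=False)
    return [(ann["image_id"], ann["bbox"]) for ann in coco.loadAnns(ann_ids)]
```

A call such as `boxes_for_category("annotations/instances_val2017.json", "dog")` would list every dog box in the validation set, provided the file sits at that path. From there, `coco.loadImgs` gives the file name for each image id, and `coco.annToMask` turns an annotation's segmentation into a binary mask.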
Start Exploring the COCO 2017 Dataset
A long-standing benchmark dataset in computer vision, openly licensed and available for immediate download. Whether you are a deep learning researcher or an industrial application developer, COCO is a valuable data resource.
