Mushroom Classification Dataset

The classic mushroom dataset from the UCI Machine Learning Repository contains 8,124 samples described by 22 categorical features. The task is to determine whether a mushroom is edible or poisonous, and the dataset is a standard introduction to classification.

8,124 samples · 23 columns (22 features + class label) · CC BY 4.0 license · UCI ML Repository
📊
8,124
Total samples
🔬
23
Columns (22 features plus the class label)
🍄
2
Classes (edible or poisonous)
📜
CC BY 4.0
Open license

Dataset Highlights

A purely categorical dataset, well suited to learning decision trees and rule mining

🍄

Intuitive classification task

Determine whether a mushroom is edible or poisonous, with results that are intuitive and meaningful.

🔀

Purely categorical features

All 22 features are categorical variables (cap shape, color, odor, etc.), which makes the dataset suitable for practicing one-hot encoding and label encoding.

🌲

Decision tree friendly

Its categorical features make it an ideal dataset for learning decision trees, random forests, and rule learning algorithms.

📊

Ample samples

The 8,124 samples provide ample data, and the two classes are fairly balanced (about 52% edible and 48% poisonous).

🔍

Feature analysis

A single feature such as odor can classify the samples almost perfectly on its own, which makes the dataset well suited to exploring feature importance.

🏛️

UCI authoritative source

The dataset comes from the UCI Machine Learning Repository and is a classic binary classification dataset widely cited in academia.
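
The feature analysis highlight above can be made concrete with information gain. The following is a minimal sketch, not a definitive analysis: it uses only the class, cap_shape, and odor values from the five preview rows shown later on this page, and the helper names `entropy` and `information_gain` are illustrative choices rather than part of any library.

```python
import math
from collections import Counter, defaultdict

# (class, cap_shape, odor) copied from the five preview rows of the dataset.
rows = [
    ("p", "x", "p"),
    ("e", "x", "a"),
    ("e", "b", "l"),
    ("p", "x", "p"),
    ("e", "x", "n"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, feature_values):
    """Label entropy minus the weighted label entropy after splitting on a feature."""
    groups = defaultdict(list)
    for label, value in zip(labels, feature_values):
        groups[value].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = [r[0] for r in rows]
gain_cap_shape = information_gain(labels, [r[1] for r in rows])
gain_odor = information_gain(labels, [r[2] for r in rows])

print(f"cap_shape: {gain_cap_shape:.3f} bits")
print(f"odor:      {gain_odor:.3f} bits")
```

On these five rows, odor recovers the full class entropy (about 0.971 bits), while cap_shape gains only about 0.171 bits. Five rows are far too few to draw conclusions, so the same functions should be run on the full 8,124 samples before ranking features.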

Applicable Scenarios

Useful from first lessons through advanced feature analysis

🌲

Decision tree learning

Purely categorical features are well suited to learning decision trees (such as CART) and rule learning algorithms.

🏷️

Binary classification modeling

Classify mushrooms as edible or poisonous with algorithms such as Naive Bayes, logistic regression, or SVM.

🔍

Feature selection

Discover the strong predictive power of key features such as odor, and practice information gain and chi-squared tests.

🔧

Data encoding

Practice one-hot encoding, label encoding, and target encoding techniques for categorical variables.
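
To show what the three encodings named under data encoding actually produce, here is a small sketch in plain Python. It uses only the odor and class values from the five preview rows. The variable names are illustrative, and the target encoding is deliberately naive: it is computed on the same rows it encodes, which would leak label information in a real model.

```python
from collections import defaultdict

# odor and class values copied from the five preview rows.
odor = ["p", "a", "l", "p", "n"]
target = ["p", "e", "e", "p", "e"]

# Label encoding: map each category to an integer (alphabetical order here).
categories = sorted(set(odor))
label_map = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_map[v] for v in odor]

# One-hot encoding: one 0/1 indicator column per category.
one_hot = [[int(v == cat) for cat in categories] for v in odor]

# Target encoding: replace each category with the share of poisonous samples in it.
# Illustrative only: in practice this is computed on training folds to avoid leakage.
counts = defaultdict(lambda: [0, 0])  # category -> [poisonous count, total count]
for v, y in zip(odor, target):
    counts[v][0] += (y == "p")
    counts[v][1] += 1
target_encoded = [counts[v][0] / counts[v][1] for v in odor]

print(label_encoded)
print(one_hot)
print(target_encoded)
```

Label encoding imposes an arbitrary order on the categories, which tree models tolerate but linear models may misread. One-hot encoding avoids that at the cost of more columns.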

Binary classification · Categorical features · Decision trees · Beginner dataset · Rule learning

Data Preview

Below are the first few rows of the mushroom dataset (all features are single-letter encoded)

CSV
class,cap_shape,cap_surface,cap_color,bruises,odor,gill_attachment,...,habitat
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g

3 Steps to Get Started Quickly

From browsing to analysis, you can start your data science project in minutes

01

Browse the dataset

View dataset details on the Ace Data Cloud platform, including field descriptions, sample size, and license information.

02

Download the data

Download the CSV file (374 KB). The data is ready to use without additional cleaning.

03

Load and analyze

Use pandas.read_csv() to load the data, then pd.get_dummies() to one-hot encode the categorical features.
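
The loading step might look like the sketch below. To stay self-contained it reads the five preview rows from an in-memory string instead of the downloaded file. Because the preview header on this page is abbreviated, the column names here ("class" plus `feature_1` to `feature_22`) are placeholders, not the dataset's real field names.

```python
import io

import pandas as pd

# The five data rows from the preview, standing in for the downloaded CSV file.
preview = """\
p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
"""

# Placeholder column names (hypothetical). A downloaded file that carries its own
# header row would be read with pd.read_csv(path) and no `names` argument.
columns = ["class"] + [f"feature_{i}" for i in range(1, 23)]
df = pd.read_csv(io.StringIO(preview), header=None, names=columns, dtype=str)

# Separate the label from the features, then one-hot encode every feature column.
y = (df["class"] == "p").astype(int)  # 1 = poisonous, 0 = edible
X = pd.get_dummies(df.drop(columns="class"))

print(df.shape)  # rows, columns in the raw table
print(X.shape)   # rows, indicator columns after encoding
print(y.tolist())
```

Reading with `dtype=str` keeps every single-letter code as text, so no value is silently converted to a number or boolean. `pd.get_dummies()` creates one indicator column per distinct value, so the encoded width grows with the number of categories seen in the data.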

Start exploring mushroom classification data

A classic classification dataset with an open license, available for immediate download. Its purely categorical feature design makes it a strong introductory dataset for decision trees and rule learning.