Datasets
General Datasets
1. Titanic Dataset
• Data Cleaning:
• Preprocessing:
o Convert categorical variables like Sex and Embarked to numerical values using
one-hot encoding or label encoding.
• Visualization:
https://www.kaggle.com/datasets/brendan45774/test-file
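The encoding step above can be sketched as follows. This is a minimal illustration on a hand-made four-row sample, not the real file; the integer mapping chosen for Sex and the Embarked_ prefix are assumptions made for the example.

```python
import pandas as pd

# Tiny hand-made sample standing in for the Titanic table (not real rows).
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# Label encoding: map each category of a two-valued column to an integer.
# The 0/1 assignment is arbitrary and chosen here only for illustration.
df["Sex_encoded"] = df["Sex"].map({"male": 0, "female": 1})

# One-hot encoding: one indicator column per category of Embarked.
embarked_dummies = pd.get_dummies(df["Embarked"], prefix="Embarked", dtype=int)
encoded = pd.concat([df.drop(columns=["Embarked"]), embarked_dummies], axis=1)

print(encoded)
```

Label encoding suits two-valued columns like Sex; one-hot encoding avoids implying an order among the Embarked ports.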
2. House Prices
• Data Cleaning:
• Preprocessing:
• Visualization:
o Visualize correlations between house prices and features like lot size or
neighborhood.
https://www.kaggle.com/datasets/lespin/house-prices-dataset
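Before plotting, it can help to compute the relationships numerically. The sketch below assumes columns named LotArea, Neighborhood, and SalePrice and uses made-up values; check the actual column names in the downloaded file.

```python
import pandas as pd

# Made-up rows; the column names are assumptions about the file's schema.
df = pd.DataFrame({
    "LotArea": [8000, 9500, 11000, 7000, 12000, 10000],
    "Neighborhood": ["A", "A", "B", "C", "B", "C"],
    "SalePrice": [180000, 200000, 260000, 150000, 280000, 210000],
})

# Numeric feature: Pearson correlation with the sale price.
lot_corr = df["LotArea"].corr(df["SalePrice"])

# Categorical feature: compare the median price per neighborhood instead.
median_by_neighborhood = (
    df.groupby("Neighborhood")["SalePrice"].median().sort_values(ascending=False)
)

print(round(lot_corr, 3))
print(median_by_neighborhood)
```

If matplotlib is installed, df.plot.scatter(x="LotArea", y="SalePrice") draws the numeric relationship, and median_by_neighborhood.plot.bar() draws the per-neighborhood comparison.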
3. Amazon Product Reviews
• Data Cleaning:
• Preprocessing:
• Visualization:
https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews
4. Sample Sales Data
• Preprocessing:
• Visualization:
https://www.kaggle.com/datasets/kyanyoga/sample-sales-data
5. Students Performance in Exams
• Visualization:
o Create scatter plots showing correlations between test preparation and scores.
https://www.kaggle.com/datasets/spscientist/students-performance-in-exams
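Test preparation is usually stored as a category, so it has to be encoded before it can go on a plot axis or into a correlation. The sketch below assumes columns named "test preparation course" (with values "none" and "completed") and "math score"; the rows are made up.

```python
import pandas as pd

# Made-up rows; column names and category values are assumptions.
df = pd.DataFrame({
    "test preparation course": ["none", "completed", "none",
                                "completed", "none", "completed"],
    "math score": [60, 75, 55, 80, 65, 70],
})

# Encode the course flag as 0/1 so it can be plotted or correlated.
df["prep_completed"] = (df["test preparation course"] == "completed").astype(int)

# Average score per group, and the correlation of the flag with the score.
mean_by_prep = df.groupby("test preparation course")["math score"].mean()
prep_corr = df["prep_completed"].corr(df["math score"])

print(mean_by_prep)
print(round(prep_corr, 3))
```

With matplotlib installed, df.plot.scatter(x="prep_completed", y="math score") gives the scatter plot; with only two x values, a box plot per group may be easier to read.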
Real-World Applications
6. Supermarket Sales
• Data Cleaning:
• Preprocessing:
• Visualization:
https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales
7. World Happiness Report
• Data Cleaning:
o Handle missing socioeconomic indicators.
o Remove duplicates or invalid country records.
• Preprocessing:
o Scale happiness scores and other indicators.
o Create regional aggregates (e.g., average happiness by continent).
• Visualization:
o Visualize happiness scores on a world map.
o Create scatter plots comparing happiness to GDP or freedom scores.
https://www.kaggle.com/datasets/unsdsn/world-happiness
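The cleaning and preprocessing steps above can be sketched in a few lines. The column names (Country, Continent, Score, GDP) are simplified placeholders, not the file's real headers. Filling gaps with the continent median and using min-max scaling are just one possible choice for each step.

```python
import pandas as pd

# Made-up rows with placeholder column names (assumptions, not the real schema).
df = pd.DataFrame({
    "Country": ["A", "B", "B", "C", "D"],
    "Continent": ["North", "North", "North", "South", "South"],
    "Score": [7.0, 6.0, 6.0, 4.0, 5.0],
    "GDP": [1.4, None, None, 0.6, 0.8],
})

# Remove duplicate country records, keeping the first occurrence.
df = df.drop_duplicates(subset="Country").reset_index(drop=True)

# Fill a missing indicator with the median of its continent (one possible policy).
df["GDP"] = df["GDP"].fillna(df.groupby("Continent")["GDP"].transform("median"))

# Min-max scale the happiness score to the range [0, 1].
score_min, score_max = df["Score"].min(), df["Score"].max()
df["Score_scaled"] = (df["Score"] - score_min) / (score_max - score_min)

# Regional aggregate: average happiness by continent.
continent_mean = df.groupby("Continent")["Score"].mean()

print(df)
print(continent_mean)
```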
8. Diabetes Dataset
• Data Cleaning:
• Preprocessing:
• Visualization:
https://www.kaggle.com/datasets/mathchi/diabetes-data-set
9. Netflix Movies and TV Shows
• Data Cleaning:
• Preprocessing:
o Create new features like the release decade or duration category (e.g., short,
medium, long).
• Visualization:
https://www.kaggle.com/datasets/shivamb/netflix-shows
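A possible way to derive the two features named above is shown below. It assumes a numeric release_year column and a duration column holding strings such as "90 min" or "2 Seasons". The 90- and 120-minute cut-offs for short, medium, and long are arbitrary illustrative thresholds.

```python
import pandas as pd

# Made-up rows; column names and string formats are assumptions.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "release_year": [1994, 2008, 2019, 2020],
    "duration": ["85 min", "102 min", "150 min", "2 Seasons"],
})

# Release decade: floor the year to a multiple of ten.
df["release_decade"] = (df["release_year"] // 10) * 10

# Pull out the minutes; rows measured in seasons do not match and become NaN.
minutes = pd.to_numeric(
    df["duration"].str.extract(r"(\d+)\s*min")[0], errors="coerce"
)

# Bucket the runtime; the bin edges are illustrative, not a standard.
df["duration_category"] = pd.cut(
    minutes,
    bins=[0, 90, 120, float("inf")],
    labels=["short", "medium", "long"],
)

print(df)
```

Titles measured in seasons are left without a category here; they could be handled separately, for example by bucketing on the season count.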
10. Spotify Tracks Dataset
• Data Cleaning:
• Preprocessing:
• Visualization:
https://www.kaggle.com/datasets/zaheenhamidani/ultimate-spotify-tracks-db
11. Climate Change: Earth Surface Temperature Data
• Data Cleaning:
o Handle missing temperature records.
o Ensure proper datetime formatting.
• Preprocessing:
o Aggregate data by year or decade for trend analysis.
• Visualization:
o Plot temperature trends over time.
o Use choropleth maps to show regional temperature changes.
https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
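The date parsing and aggregation steps above might look like the sketch below. It assumes a date column named dt and a temperature column named AverageTemperature, with made-up values. Dropping incomplete rows is only one way to handle missing records; interpolation is a common alternative.

```python
import pandas as pd

# Made-up rows; column names are assumptions about the file's schema.
df = pd.DataFrame({
    "dt": ["1990-01-01", "1990-07-01", "1995-01-01",
           "2001-01-01", "2001-07-01", "bad-date"],
    "AverageTemperature": [2.0, 18.0, None, 3.0, 19.0, 10.0],
})

# Parse dates; values that cannot be parsed become NaT instead of raising.
df["dt"] = pd.to_datetime(df["dt"], errors="coerce")

# Drop rows with an unparseable date or a missing temperature.
df = df.dropna(subset=["dt", "AverageTemperature"]).copy()

# Derive the grouping keys.
df["year"] = df["dt"].dt.year
df["decade"] = (df["year"] // 10) * 10

# Aggregate by year and by decade for trend analysis.
yearly_mean = df.groupby("year")["AverageTemperature"].mean()
decade_mean = df.groupby("decade")["AverageTemperature"].mean()

print(yearly_mean)
print(decade_mean)
```

With matplotlib installed, yearly_mean.plot() gives a basic trend line over time.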