
Datasets

Uploaded by

adham0n10ashraf
Copyright
© All Rights Reserved
Datasets

General Datasets
1. Titanic Dataset
• Data Cleaning:

o Handle missing values (e.g., Age, Cabin).

o Remove duplicates if any.

• Preprocessing:

o Convert categorical variables like Sex and Embarked to numerical values using
one-hot encoding or label encoding.

• Visualization:

o Plot survival rates by gender, class, or embarkation point.

o Visualize age distribution using histograms.

https://www.kaggle.com/datasets/brendan45774/test-file
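The Titanic steps above can be sketched with pandas roughly as follows. This is a minimal sketch on a tiny made-up sample, not the real file; the Survived column and the choice of median imputation are assumptions made for illustration.

```python
import pandas as pd

# Tiny made-up sample; the real file is assumed to have similar columns.
titanic = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 1],
    "Sex": ["male", "female", "female", "male", "female", "female"],
    "Age": [22.0, 38.0, None, 35.0, 26.0, 26.0],
    "Cabin": [None, "C85", None, None, "E46", "E46"],
    "Embarked": ["S", "C", "S", "S", "Q", "Q"],
})

# Cleaning: drop exact duplicate rows, then fill missing values.
titanic = titanic.drop_duplicates().reset_index(drop=True)
titanic["Age"] = titanic["Age"].fillna(titanic["Age"].median())
titanic["Cabin"] = titanic["Cabin"].fillna("Unknown")

# Preprocessing: label-encode Sex, one-hot encode Embarked.
titanic["Sex"] = titanic["Sex"].map({"male": 0, "female": 1})
titanic = pd.get_dummies(titanic, columns=["Embarked"], prefix="Embarked")

# Data behind a "survival rate by gender" bar chart.
survival_by_sex = titanic.groupby("Sex")["Survived"].mean()
print(survival_by_sex)
```

The `survival_by_sex` Series can be handed to any plotting library for the bar chart; the Age column can be passed to a histogram in the same way.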

2. House Prices
• Data Cleaning:

o Handle missing values in attributes like LotFrontage or GarageYrBlt.

o Standardize and correct inconsistencies in categorical features like Neighborhood.

• Preprocessing:

o Encode categorical variables like Condition or Style.

o Create new features like price per square foot.

• Visualization:

o Plot price distributions using histograms.

o Visualize correlations between house prices and features like lot size or
neighborhood.

https://www.kaggle.com/datasets/lespin/house-prices-dataset
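One possible way to carry out the House Prices steps, shown on made-up rows. The SalePrice and GrLivArea column names are assumptions, and the derived PricePerSqFt and NeighborhoodCode columns are hypothetical names introduced here.

```python
import pandas as pd

# Made-up rows; column names other than LotFrontage and Neighborhood are assumed.
houses = pd.DataFrame({
    "Neighborhood": ["NAmes", " names", "OldTown", "oldtown "],
    "LotFrontage": [65.0, None, 80.0, 60.0],
    "GrLivArea": [1000, 1500, 2000, 1200],
    "SalePrice": [150000, 180000, 260000, 120000],
})

# Cleaning: normalize category spelling, fill missing numeric values.
houses["Neighborhood"] = houses["Neighborhood"].str.strip().str.lower()
houses["LotFrontage"] = houses["LotFrontage"].fillna(houses["LotFrontage"].median())

# Preprocessing: a derived feature and integer codes for the category.
houses["PricePerSqFt"] = houses["SalePrice"] / houses["GrLivArea"]
houses["NeighborhoodCode"] = houses["Neighborhood"].astype("category").cat.codes

# Correlation between price and a numeric feature (input for a scatter plot).
price_area_corr = houses["SalePrice"].corr(houses["GrLivArea"])
print(houses)
print(price_area_corr)
```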
3. Amazon Product Reviews
• Data Cleaning:

o Handle missing reviews or ratings.

o Remove duplicates and irrelevant reviews.

• Preprocessing:

o Perform sentiment analysis on review text.

o Group data by product categories or brands.

• Visualization:

o Plot distributions of ratings using histograms.

o Create word clouds for commonly used terms in reviews.

https://www.kaggle.com/datasets/snap/amazon-fine-food-reviews
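A rough sketch of the review steps on made-up rows. The Score and Text column names are assumptions, and the sentiment scorer is a toy word-list count with hypothetical word lists, standing in for a real sentiment model.

```python
import pandas as pd
from collections import Counter

# Made-up reviews; Score and Text are assumed column names.
reviews = pd.DataFrame({
    "Score": [5, 1, 5, None, 4],
    "Text": ["Great taste, great price", "Bad smell",
             "Great taste, great price", "Okay", None],
})

# Cleaning: drop rows missing a rating or text, then drop duplicates.
reviews = (reviews.dropna(subset=["Score", "Text"])
                  .drop_duplicates()
                  .reset_index(drop=True))

# Toy sentiment: positive-word count minus negative-word count.
POSITIVE = {"great", "good", "love"}   # hypothetical word list
NEGATIVE = {"bad", "awful", "stale"}   # hypothetical word list

def toy_sentiment(text: str) -> int:
    words = text.lower().replace(",", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews["Sentiment"] = reviews["Text"].apply(toy_sentiment)

# Word frequencies (what a word cloud is drawn from) and rating counts.
word_counts = Counter(" ".join(reviews["Text"]).lower().replace(",", " ").split())
rating_counts = reviews["Score"].value_counts().sort_index()
print(reviews)
print(word_counts.most_common(3))
```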

4. Retail Store Sales


• Data Cleaning:

o Handle missing sales or customer data.

o Correct inconsistencies in product categories.

• Preprocessing:

o Aggregate sales by region, category, or time period.

o Create new metrics like average sales per customer.

• Visualization:

o Plot sales trends over time using line charts.

o Use pie charts to show sales contributions by product category.

o Create bar charts for top-performing regions.

https://www.kaggle.com/datasets/kyanyoga/sample-sales-data
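The retail steps could look something like this in pandas. The rows are made up and all column names (OrderDate, Region, Category, Customer, Sales) are assumptions chosen for readability.

```python
import pandas as pd

# Made-up orders; every column name here is an assumption.
sales = pd.DataFrame({
    "OrderDate": ["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"],
    "Region": ["EMEA", "EMEA", "APAC", "EMEA"],
    "Category": ["Cars", "cars ", "Planes", "Cars"],
    "Customer": ["A", "B", "A", "C"],
    "Sales": [100.0, 200.0, 400.0, None],
})

# Cleaning: fix inconsistent category labels, parse dates, drop rows without sales.
sales["Category"] = sales["Category"].str.strip().str.title()
sales["OrderDate"] = pd.to_datetime(sales["OrderDate"])
sales = sales.dropna(subset=["Sales"])

# Preprocessing: aggregates that feed the line, pie, and bar charts.
monthly_sales = sales.groupby(sales["OrderDate"].dt.to_period("M"))["Sales"].sum()
category_share = sales.groupby("Category")["Sales"].sum() / sales["Sales"].sum()
top_regions = sales.groupby("Region")["Sales"].sum().sort_values(ascending=False)

# New metric: average sales per distinct customer.
avg_sales_per_customer = sales["Sales"].sum() / sales["Customer"].nunique()
print(monthly_sales, category_share, top_regions, avg_sales_per_customer, sep="\n")
```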

5. Student Performance Dataset


• Data Cleaning:

o Handle missing entries for gender, parental education, or test scores.

o Standardize categories like test preparation status.


• Preprocessing:

o Create new metrics like average test score.

o Group data by gender or parental education level.

• Visualization:

o Plot test score distributions using histograms.

o Use bar charts to compare performance across genders or parental education
levels.

o Create scatter plots showing correlations between test preparation and scores.

https://www.kaggle.com/datasets/spscientist/students-performance-in-exams
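A short sketch of the student-performance steps, assuming simplified column names (gender, test_prep, math, reading, writing) and made-up scores.

```python
import pandas as pd

# Made-up students; the short column names are assumptions.
students = pd.DataFrame({
    "gender": ["female", "male", "female", "male", None],
    "test_prep": ["Completed", "none", "completed ", "None", "none"],
    "math": [70, 60, 90, 50, 10],
    "reading": [80, 60, 90, 40, 10],
    "writing": [90, 60, 90, 60, 10],
})

# Cleaning: drop rows with no gender, standardize the test-prep labels.
students = students.dropna(subset=["gender"]).reset_index(drop=True)
students["test_prep"] = students["test_prep"].str.strip().str.lower()

# Preprocessing: average score per student, then group-level means.
students["average"] = students[["math", "reading", "writing"]].mean(axis=1)
avg_by_gender = students.groupby("gender")["average"].mean()
avg_by_prep = students.groupby("test_prep")["average"].mean()
print(avg_by_gender)
print(avg_by_prep)
```

The two grouped Series are the inputs for the bar charts; the `average` column is what a score histogram would use.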

Real-World Applications
6. Supermarket Sales
• Data Cleaning:

o Check for duplicates and remove them.

o Handle inconsistent entries in Customer Type or City.

• Preprocessing:

o Convert Date column to datetime format.

o Aggregate sales by month or category.

• Visualization:

o Visualize sales trends over time using line charts.

o Create a pie chart showing sales distribution by payment type.

https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales
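The supermarket steps might be sketched as below. The rows are made up; the Payment and Total column names and the month/day/year date format are assumptions about the file.

```python
import pandas as pd

# Made-up rows; Payment, Total, and the date format are assumptions.
supermarket = pd.DataFrame({
    "Date": ["1/5/2019", "1/5/2019", "2/14/2019", "3/3/2019"],
    "City": ["Yangon", "Yangon", "yangon", "Mandalay"],
    "Payment": ["Cash", "Cash", "Ewallet", "Cash"],
    "Total": [50.0, 50.0, 100.0, 50.0],
})

# Cleaning: remove exact duplicates and unify city capitalization.
supermarket = supermarket.drop_duplicates().reset_index(drop=True)
supermarket["City"] = supermarket["City"].str.title()

# Preprocessing: parse the Date column, then aggregate by month.
supermarket["Date"] = pd.to_datetime(supermarket["Date"], format="%m/%d/%Y")
monthly_total = supermarket.groupby(supermarket["Date"].dt.month)["Total"].sum()

# Share of transactions per payment type (input for a pie chart).
payment_share = supermarket["Payment"].value_counts(normalize=True)
print(monthly_total)
print(payment_share)
```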

Health and Social Sciences

7. World Happiness Report

• Data Cleaning:
o Handle missing socioeconomic indicators.
o Remove duplicates or invalid country records.
• Preprocessing:
o Scale happiness scores and other indicators.
o Create regional aggregates (e.g., average happiness by continent).
• Visualization:
o Visualize happiness scores on a world map.
o Create scatter plots comparing happiness to GDP or freedom scores.

https://www.kaggle.com/datasets/unsdsn/world-happiness
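As an illustration of the scaling and regional aggregation steps, here is a minimal sketch on placeholder data. Country and region labels, the column names, and the choice of min-max scaling are all assumptions.

```python
import pandas as pd

# Placeholder countries and values; column names are assumptions.
happiness = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Region": ["North", "North", "South", "South"],
    "Score": [7.0, 5.0, 4.0, 6.0],
    "GDP": [1.4, 1.0, None, 1.2],
})

# Cleaning: fill a missing indicator with the column mean.
happiness["GDP"] = happiness["GDP"].fillna(happiness["GDP"].mean())

# Preprocessing: min-max scale the score to the range [0, 1].
score = happiness["Score"]
happiness["ScoreScaled"] = (score - score.min()) / (score.max() - score.min())

# Regional aggregate: average happiness score per region.
regional_mean = happiness.groupby("Region")["Score"].mean()
print(happiness)
print(regional_mean)
```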

8. Diabetes Dataset
• Data Cleaning:

o Replace zeros in columns like BloodPressure or BMI, where a zero stands in for a
missing measurement, with mean/median values.

• Preprocessing:

o Normalize health metrics for better analysis.

• Visualization:

o Visualize distributions of features like BMI and glucose levels.

o Use heatmaps to show correlations between features.

https://www.kaggle.com/datasets/mathchi/diabetes-data-set
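The zero-replacement and normalization steps could be sketched like this, on made-up measurements. Median imputation and z-score normalization are one choice among several; the Glucose column name is an assumption.

```python
import numpy as np
import pandas as pd

# Made-up measurements; zeros in BloodPressure and BMI mark missing values.
diabetes = pd.DataFrame({
    "Glucose": [100, 150, 120, 130],
    "BloodPressure": [70.0, 0.0, 80.0, 90.0],
    "BMI": [25.0, 30.0, 0.0, 35.0],
})

# Cleaning: turn zeros into NaN, then fill with each column's median.
cols = ["BloodPressure", "BMI"]
diabetes[cols] = diabetes[cols].replace(0, np.nan)
diabetes[cols] = diabetes[cols].fillna(diabetes[cols].median())

# Preprocessing: z-score normalization (zero mean, unit standard deviation).
normalized = (diabetes - diabetes.mean()) / diabetes.std()

# Correlation matrix, the table a heatmap would display.
corr_matrix = diabetes.corr()
print(diabetes)
print(corr_matrix.round(2))
```

Converting zeros to NaN first matters: computing the median while the zeros are still present would pull the fill value down.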

Entertainment and Media


9. Netflix Movies and TV Shows

• Data Cleaning:

o Handle missing values in attributes like Director or Cast.

o Remove duplicate records for movies or shows.

• Preprocessing:

o Encode categorical variables like Genre and Country.

o Create new features like the release decade or duration category (e.g., short,
medium, long).

• Visualization:

o Plot counts of content types (movies vs. TV shows).

o Visualize the distribution of genres using bar charts.

o Show trends in content release over years using line plots.

https://www.kaggle.com/datasets/shivamb/netflix-shows
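A possible sketch of the Netflix feature-engineering steps on placeholder titles. The column names, the numeric duration_min column, and the 90 and 120 minute cut-offs for short/medium/long are all assumptions.

```python
import pandas as pd

# Placeholder titles; column names and duration_min are assumptions.
netflix = pd.DataFrame({
    "title": ["A", "B", "C", "C"],
    "type": ["Movie", "Movie", "TV Show", "TV Show"],
    "director": ["X", None, None, None],
    "release_year": [1999, 2015, 2021, 2021],
    "duration_min": [80.0, 150.0, None, None],
})

# Cleaning: drop duplicate records, label missing directors.
netflix = netflix.drop_duplicates().reset_index(drop=True)
netflix["director"] = netflix["director"].fillna("Unknown")

# Preprocessing: release decade and a duration category (hypothetical cut-offs).
netflix["decade"] = netflix["release_year"] // 10 * 10
netflix["duration_category"] = pd.cut(
    netflix["duration_min"],
    bins=[0, 90, 120, float("inf")],
    labels=["short", "medium", "long"],
)

# Counts behind the "movies vs. TV shows" and "releases per year" plots.
type_counts = netflix["type"].value_counts()
releases_per_year = netflix["release_year"].value_counts().sort_index()
print(netflix)
print(type_counts)
```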
10. Spotify Tracks Dataset
• Data Cleaning:

o Remove duplicate tracks.

o Handle missing genres or artist data.

• Preprocessing:

o Scale numerical columns like Popularity or Duration_ms.

o Group data by artists or genres for aggregation.

• Visualization:

o Plot top genres using bar charts.

o Visualize trends in track popularity over time.

https://www.kaggle.com/datasets/zaheenhamidani/ultimate-spotify-tracks-db
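The Spotify steps might look like this on made-up tracks. The track_id and genre column names are assumptions, and z-score standardization is just one way to scale Popularity and Duration_ms.

```python
import pandas as pd

# Made-up tracks; track_id and genre are assumed column names.
tracks = pd.DataFrame({
    "track_id": ["t1", "t2", "t2", "t3"],
    "genre": ["Pop", "Rock", "Rock", None],
    "Popularity": [80, 40, 40, 60],
    "Duration_ms": [200000, 180000, 180000, 240000],
})

# Cleaning: one row per track, label missing genres.
tracks = tracks.drop_duplicates(subset="track_id").reset_index(drop=True)
tracks["genre"] = tracks["genre"].fillna("Unknown")

# Preprocessing: standardize numeric columns into new *_z columns.
for col in ["Popularity", "Duration_ms"]:
    tracks[col + "_z"] = (tracks[col] - tracks[col].mean()) / tracks[col].std()

# Aggregation: mean popularity per genre, sorted for a "top genres" bar chart.
genre_popularity = (tracks.groupby("genre")["Popularity"]
                          .mean()
                          .sort_values(ascending=False))
print(tracks)
print(genre_popularity)
```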

Geographical and Environmental

11. Global Temperature Data

• Data Cleaning:
o Handle missing temperature records.
o Ensure proper datetime formatting.
• Preprocessing:
o Aggregate data by year or decade for trend analysis.
• Visualization:
o Plot temperature trends over time.
o Use choropleth maps to show regional temperature changes.

https://www.kaggle.com/datasets/berkeleyearth/climate-change-earth-surface-temperature-data
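A minimal sketch of the temperature steps, using made-up readings. The dt and AverageTemperature column names are assumptions, and dropping missing records (rather than interpolating them) is a simplification.

```python
import pandas as pd

# Made-up readings; dt and AverageTemperature are assumed column names.
temps = pd.DataFrame({
    "dt": ["1990-01-01", "1990-07-01", "1995-01-01", "2001-01-01", "2001-07-01"],
    "AverageTemperature": [2.0, 18.0, None, 4.0, 20.0],
})

# Cleaning: parse dates (invalid ones become NaT), drop incomplete records.
temps["dt"] = pd.to_datetime(temps["dt"], errors="coerce")
temps = temps.dropna(subset=["dt", "AverageTemperature"])

# Preprocessing: mean temperature per year and per decade.
yearly = temps.groupby(temps["dt"].dt.year)["AverageTemperature"].mean()
decadal = temps.groupby(temps["dt"].dt.year // 10 * 10)["AverageTemperature"].mean()
print(yearly)
print(decadal)
```

Either Series can be drawn as a line plot to show the trend over time.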
