Mokhless Hajji Project

The Food and Nutrition dataset consists of 1698 entries and 19 columns, including demographic and nutritional information. Data exploration revealed no missing values or duplicates, and a BMI feature was added during transformation. The cleaned dataset has been exported for further analysis.

Uploaded by

rayen benassi

Data Science Report: Food and Nutrition Dataset

1. Data Exploration
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1698 entries, 0 to 1697
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Ages 1698 non-null int64
1 Gender 1698 non-null object
2 Height 1698 non-null int64
3 Weight 1698 non-null int64
4 Activity Level 1698 non-null object
5 Dietary Preference 1698 non-null object
6 Daily Calorie Target 1698 non-null int64
7 Protein 1698 non-null int64
8 Sugar 1698 non-null float64
9 Sodium 1698 non-null float64
10 Calories 1698 non-null int64
11 Carbohydrates 1698 non-null int64
12 Fiber 1698 non-null float64
13 Fat 1698 non-null int64
14 Breakfast Suggestion 1698 non-null object
15 Lunch Suggestion 1698 non-null object
16 Dinner Suggestion 1698 non-null object
17 Snack Suggestion 1698 non-null object
18 Disease 1698 non-null object
dtypes: float64(3), int64(8), object(8)
memory usage: 252.2+ KB

Dataset Description:

Ages Height Weight ... Carbohydrates Fiber Fat
count 1698.000000 1698.000000 1698.000000 ... 1698.000000 1698.000000 1698.000000
mean 43.961720 174.130153 78.064193 ... 252.385159 30.286219 69.700824
std 15.915002 13.420936 16.949264 ... 69.877804 8.385337 21.430707
min 18.000000 150.000000 48.000000 ... 120.000000 14.400000 30.000000
25% 30.000000 163.250000 64.000000 ... 200.000000 24.000000 52.000000
50% 42.000000 174.000000 78.000000 ... 248.000000 29.760000 69.000000
75% 57.000000 185.000000 91.000000 ... 300.000000 36.000000 85.000000
max 79.000000 200.000000 119.000000 ... 436.000000 52.320000 145.000000

[8 rows x 11 columns]

Missing Values per Column:

Ages 0
Gender 0
Height 0
Weight 0
Activity Level 0
Dietary Preference 0
Daily Calorie Target 0
Protein 0
Sugar 0
Sodium 0
Calories 0
Carbohydrates 0
Fiber 0
Fat 0
Breakfast Suggestion 0
Lunch Suggestion 0
Dinner Suggestion 0
Snack Suggestion 0
Disease 0
dtype: int64
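
The outputs above look like the results of pandas' `info()`, `describe()` and `isnull().sum()` calls. The report does not include the code itself, so the following is only a minimal sketch of how such a summary could be produced. The small `sample` frame is hypothetical and stands in for the real 1698-row dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (assumption: a few of its columns).
sample = pd.DataFrame(
    {
        "Ages": [25, 40, 57],
        "Gender": ["Male", "Female", "Female"],
        "Height": [170, 163, 185],
        "Weight": [65, 58, 91],
    }
)

# Column names, non-null counts and dtypes, as in the first listing.
sample.info()

# Summary statistics for the numeric columns only (count, mean, std, quartiles).
summary = sample.describe()
print(summary)

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

By default `describe()` skips the object columns, which is consistent with the summary table covering only 11 of the 19 columns.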

Histograms of Numeric Columns:


2. Data Cleaning
Number of duplicate records removed: 0
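
The report gives only the count of removed duplicates. One possible way to obtain it is sketched below, assuming duplicates are defined as fully identical rows. The helper name `remove_duplicates` and the sample data are hypothetical.

```python
import pandas as pd


def remove_duplicates(df):
    """Drop fully identical rows and report how many were removed."""
    before = len(df)
    cleaned = df.drop_duplicates().reset_index(drop=True)
    return cleaned, before - len(cleaned)


# Hypothetical sample with one repeated row, for illustration only.
sample = pd.DataFrame(
    {"Ages": [25, 25, 40], "Height": [170, 170, 180], "Weight": [65, 65, 80]}
)
cleaned, removed = remove_duplicates(sample)
print(f"Number of duplicate records removed: {removed}")
```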

3. Data Transformation
Added BMI feature: BMI = Weight / (Height in meters)^2
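
A minimal sketch of the BMI transformation follows. It assumes that Height is recorded in centimeters (the observed range is 150 to 200) and Weight in kilograms; the report does not state the units explicitly. The helper name `add_bmi` is hypothetical.

```python
import pandas as pd


def add_bmi(df):
    """Return a copy of df with a BMI column added."""
    out = df.copy()
    # Assumption: Height is in centimeters, so convert to meters first.
    height_m = out["Height"] / 100.0
    out["BMI"] = out["Weight"] / height_m ** 2
    return out


# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({"Height": [200, 174], "Weight": [100, 78]})
result = add_bmi(sample)
print(result)
```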

Scatter Plot: Height vs Weight


Correlation Heatmap:

Explained Variance Ratio by PCA: [0.50762786 0.1589109 0.12249427]
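
The three values mean that the first three principal components capture about 50.8%, 15.9% and 12.2% of the variance, roughly 78.9% in total. The report does not say which features were used, whether they were standardized, or which library computed the decomposition. The sketch below is therefore only one way to obtain such ratios, using NumPy's SVD on standardized synthetic data rather than the author's actual pipeline.

```python
import numpy as np


def explained_variance_ratio(X, n_components):
    """Share of total variance captured by the leading principal components."""
    X = np.asarray(X, dtype=float)
    # Assumption: features are standardized before PCA.
    # Columns with zero variance would need to be dropped first.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Singular values of the standardized data, sorted in descending order.
    _, s, _ = np.linalg.svd(Z, full_matrices=False)
    variances = s ** 2 / (len(Z) - 1)
    return (variances / variances.sum())[:n_components]


# Synthetic data with one shared factor, for illustration only.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
noise = rng.normal(size=(200, 5))
X = base * np.array([1.0, 0.8, 0.6, 0.0, 0.0]) + 0.5 * noise

ratios = explained_variance_ratio(X, n_components=3)
print("Explained Variance Ratio by PCA:", ratios)
```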


6. Validation
Missing Values After Cleaning and Transformation: 0

Duplicates Remaining: 0

Cleaned dataset exported to Cleaned_Food_and_Nutrition.csv
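
The validation and export step could look roughly like the sketch below. The file name comes from the report; the helper name `validate_and_export`, the sample data, the temporary output directory and the choice of `index=False` are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def validate_and_export(df, path):
    """Count remaining missing values and duplicate rows, then write a CSV."""
    missing = int(df.isnull().sum().sum())
    duplicates = int(df.duplicated().sum())
    # Assumption: the row index is not written to the file.
    df.to_csv(path, index=False)
    return missing, duplicates


# Hypothetical cleaned sample, for illustration only.
sample = pd.DataFrame(
    {"Height": [170, 180], "Weight": [65, 80], "BMI": [22.49, 24.69]}
)
out_path = os.path.join(tempfile.mkdtemp(), "Cleaned_Food_and_Nutrition.csv")
missing, duplicates = validate_and_export(sample, out_path)
print(f"Missing Values After Cleaning and Transformation: {missing}")
print(f"Duplicates Remaining: {duplicates}")
```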
