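import pandas as pd
import matplotlib.pyplot as plt  # used by the plotting cells below
import seaborn as sns            # used for the correlation heatmap
%matplotlib inline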
In [5]: import os
os.getcwd()
Out[5]: 'C:\\Users\\10000981'
In [47]: df.head().T
Out[47]:
                                 0   1   2   3   4
Saturated Fat (% Daily Value)   25  15  42  52  42
Cholesterol (% Daily Value)     87   8  15  95  16
Sodium (% Daily Value)          31  32  33  36  37
Carbohydrates                   31  30  29  30  30
Carbohydrates (% Daily Value)   10  10  10  10  10
Dietary Fiber                    4   4   4   4   4
Dietary Fiber (% Daily Value)   17  17  17  17  17
Sugars                           3   3   2   2   2
Protein                         17  18  14  21  21
Vitamin A (% Daily Value)       10   6   8  15   6
Vitamin C (% Daily Value)        0   0   0   0   0
Calcium (% Daily Value)         25  25  25  30  25
In [69]: df.describe()
Out[69]:
       Calories  Calories from Fat  Total Fat  Total Fat (% Daily Value)  Saturated Fat  Saturated Fat (% Daily Value)  Trans Fat  Cholesterol  Cholesterol (% Daily Value)
count    260.00             260.00     260.00                     260.00         260.00                         260.00     260.00       260.00                       260.00
mean     368.27             127.10      14.17                      21.82           6.01                          29.97       0.20        54.94                        18.39
std      240.27             127.88      14.21                      21.89           5.32                          26.64       0.43        87.27                        29.09
min        0.00               0.00       0.00                       0.00           0.00                           0.00       0.00         0.00                         0.00
25%      210.00              20.00       2.38                       3.75           1.00                           4.75       0.00         5.00                         2.00
50%      340.00             100.00      11.00                      17.00           5.00                          24.00       0.00        35.00                        11.00
75%      500.00             200.00      22.25                      35.00          10.00                          48.00       0.00        65.00                        21.25
max     1880.00            1060.00     118.00                     182.00          20.00                         102.00       2.50       575.00                       192.00
8 rows × 21 columns
The table shows the count, mean, standard deviation, minimum, 25th, 50th (median) and 75th percentiles, and maximum of each numeric column.
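To make each row of the `describe()` output concrete, the sketch below recomputes the same eight statistics one at a time for a single column. The toy values are made up and only stand in for a numeric column such as `Calories`; `std` is the sample standard deviation (ddof=1) and the percentiles use linear interpolation, which are the pandas defaults.

```python
import pandas as pd

# Toy values standing in for one numeric column such as 'Calories' (made up)
calories = pd.Series([0, 210, 340, 500, 1880], name="Calories")

# What df.describe() reports for a single numeric column
summary = calories.describe()

# The same eight statistics computed one at a time
manual = pd.Series({
    "count": calories.count(),
    "mean": calories.mean(),
    "std": calories.std(),            # sample standard deviation (ddof=1)
    "min": calories.min(),
    "25%": calories.quantile(0.25),   # linear interpolation between data points
    "50%": calories.quantile(0.50),   # the median
    "75%": calories.quantile(0.75),
    "max": calories.max(),
})

print(summary)
print(manual)
```

Both printouts should agree row for row; `describe()` simply bundles these calls for every numeric column at once.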
In [49]: df.shape
In [51]: df.info()
<class 'pandas.core.frame.DataFrame'>
In [70]: df['Category'].value_counts()
Breakfast 42
Beverages 27
Desserts 7
Salads 6
In [72]: df['Category'].nunique()
Out[72]: 9
In [73]: df['Category'].unique()
There are 9 food categories on the McDonald's menu:
1. Breakfast
2. Beef & Pork
3. Chicken & Fish
4. Salads
5. Snacks & Sides
6. Desserts
7. Beverages
8. Coffee & Tea
9. Smoothies & Shakes
1. Plot which food categories have the highest and lowest variety of items.
In [132]: plt.figure(figsize=(16,8))
category_count = df.groupby('Category')['Item'].count()  # number of items per category
category_count.plot(kind='bar', color='pink')
plt.xlabel('Category')
plt.ylabel('Number of items')
plt.show()
The graph shows that the "Coffee & Tea" category has the most items and the "Salads" category has the fewest.
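Reading the tallest and shortest bars off the chart works, but the same answer can be pulled out numerically with `idxmax()` and `idxmin()` on the per-category counts. A minimal sketch on a hypothetical miniature menu (the item names and counts are made up):

```python
import pandas as pd

# Hypothetical miniature menu; the notebook uses the full df instead
df = pd.DataFrame({
    "Category": ["Coffee & Tea", "Coffee & Tea", "Coffee & Tea",
                 "Breakfast", "Breakfast", "Salads"],
    "Item": ["Latte", "Mocha", "Iced Coffee",
             "Hotcakes", "Egg Sandwich", "Side Salad"],
})

# Number of items per category, as in the bar chart
item_counts = df.groupby("Category")["Item"].count()

most = item_counts.idxmax()    # category label with the largest count
fewest = item_counts.idxmin()  # category label with the smallest count (first one if tied)

print(f"Most items: {most} ({item_counts[most]})")
print(f"Fewest items: {fewest} ({item_counts[fewest]})")
```

Calling `item_counts.sort_values()` before `.plot(kind='bar')` also orders the bars, which makes the extremes easier to spot on the chart.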
In [116]: plt.figure(figsize=(15,5))
plt.subplot(1,5,1)
df.boxplot(column='Calories')
plt.subplot(1,5,2)
df.boxplot(column='Total Fat')
plt.subplot(1,5,3)
df.boxplot(column='Saturated Fat')
plt.subplot(1,5,4)
df.boxplot(column='Cholesterol')
plt.subplot(1,5,5)
df.boxplot(column='Sodium')
Out[116]: <AxesSubplot:>
The data columns that have outliers are:
1. Calories
2. Total Fat
3. Sodium
4. Cholesterol
The data column that does not have outliers is:
1. Saturated Fat
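The box plots flag as outliers the points lying more than 1.5 × IQR beyond the quartiles (matplotlib's default `whis=1.5` whisker rule). The `describe()` table lets us check two of these claims by hand: Calories has Q1 = 210 and Q3 = 500, so its upper fence is 500 + 1.5 × 290 = 935, well below its maximum of 1880; Saturated Fat has Q1 = 1 and Q3 = 10, so its upper fence is 10 + 1.5 × 9 = 23.5, above its maximum of 20, which is why it shows no outliers. A minimal sketch of the same rule in pandas; the helper name `iqr_outliers` and the toy data are hypothetical:

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s[(s < lower) | (s > upper)]

# Hypothetical toy data, not the real menu values
df = pd.DataFrame({
    "Calories": [210, 250, 300, 340, 400, 450, 500, 1880],
    "Saturated Fat": [0, 1, 4, 5, 8, 10, 12, 20],
})

# Count and list the flagged values per column
for col in df.columns:
    out = iqr_outliers(df[col])
    print(f"{col}: {len(out)} outlier(s) {out.tolist()}")
```

On the notebook's data, `iqr_outliers(df['Calories'])` would list the flagged rows directly instead of reading them off the plot.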
In [124]: plt.figure(figsize=(15,5))
plt.subplot(1,5,1)
df.boxplot(column='Carbohydrates')
plt.subplot(1,5,2)
df.boxplot(column='Dietary Fiber')
plt.subplot(1,5,3)
df.boxplot(column='Sugars')
plt.subplot(1,5,4)
df.boxplot(column='Protein')
plt.subplot(1,5,5)
df.boxplot(column='Saturated Fat (% Daily Value)')
Out[124]: <AxesSubplot:>
The data columns that have outliers are:
1. Carbohydrates
2. Sugars
3. Protein
The data columns that do not have outliers are:
1. Dietary Fiber
2. Saturated Fat (% Daily Value)
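The repeated `plt.subplot(1,5,i)` / `df.boxplot(...)` pairs in these cells can also be written as a loop over column names. This is a sketch under assumptions: the toy DataFrame is made up, and `plt.subplots` together with the `ax=` argument of `DataFrame.boxplot` replace the individual calls, giving one box per panel.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data standing in for the menu DataFrame
df = pd.DataFrame({
    "Carbohydrates": [10, 30, 31, 29, 80],
    "Dietary Fiber": [0, 4, 4, 3, 5],
    "Sugars": [2, 3, 3, 50, 5],
    "Protein": [5, 17, 18, 14, 60],
    "Saturated Fat (% Daily Value)": [5, 25, 15, 42, 52],
})

columns = ["Carbohydrates", "Dietary Fiber", "Sugars", "Protein",
           "Saturated Fat (% Daily Value)"]

# One row of panels, one panel per column
fig, axes = plt.subplots(1, len(columns), figsize=(15, 5))

# Draw each column's box plot on its own axes
for ax, col in zip(axes, columns):
    df.boxplot(column=col, ax=ax)

fig.tight_layout()
```

Changing the `columns` list is then the only edit needed to plot a different group of variables.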
In [121]: plt.figure(figsize=(15,5))
plt.subplot(1,5,1)
plt.subplot(1,5,2)
plt.subplot(1,5,3)
plt.subplot(1,5,4)
plt.subplot(1,5,5)
Out[121]: <AxesSubplot:>
In [122]: plt.figure(figsize=(15,5))
plt.subplot(1,5,1)
plt.subplot(1,5,2)
plt.subplot(1,5,3)
plt.subplot(1,5,4)
Out[122]: <AxesSubplot:>
Out[127]: Category
Breakfast 50.95
Salads 17.33
Desserts 4.86
Beverages 0.19
INFERENCE:
On average, Breakfast items have the highest Cholesterol (% Daily Value), followed by Beef & Pork and then Chicken & Fish.
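The values in Out[127] are not whole numbers (for example 50.95 for Breakfast), which suggests they are per-category means rather than totals; the cell that produced them is not shown. A minimal sketch of that kind of ranking, assuming a mean per category and using hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows; only the column names follow the real data
df = pd.DataFrame({
    "Category": ["Breakfast", "Breakfast", "Beef & Pork", "Chicken & Fish", "Beverages"],
    "Cholesterol (% Daily Value)": [87, 15, 30, 25, 0],
})

# Average % Daily Value per category, highest first
chol_by_category = (
    df.groupby("Category")["Cholesterol (% Daily Value)"]
      .mean()
      .sort_values(ascending=False)
)

print(chol_by_category)
print("Top three:", chol_by_category.head(3).index.tolist())
```

Using `.sum()` instead of `.mean()` would rank categories by their total % Daily Value, which favours categories with many items (such as Coffee & Tea) rather than those whose typical item is highest.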
3. Which variables have the highest correlation? Plot them and find the correlation values.
In [135]: corr = df.select_dtypes('number').corr()  # Pearson correlation of the numeric columns
corr
Out[135]:
                               Calories  Calories from Fat  Total Fat  Total Fat (% Daily Value)  Saturated Fat  Saturated Fat (% Daily Value)  Trans Fat  Cholesterol
Calories from Fat                  0.90               1.00       1.00                       1.00           0.85                           0.85       0.43         0.68
Total Fat                          0.90               1.00       1.00                       1.00           0.85                           0.85       0.43         0.68
Total Fat (% Daily Value)          0.90               1.00       1.00                       1.00           0.85                           0.85       0.43         0.68
Saturated Fat                      0.85               0.85       0.85                       0.85           1.00                           1.00       0.62         0.63
Saturated Fat (% Daily Value)      0.85               0.85       0.85                       0.85           1.00                           1.00       0.62         0.63
Trans Fat                          0.52               0.43       0.43                       0.43           0.62                           0.62       1.00         0.25
Cholesterol (% Daily Value)        0.60               0.68       0.68                       0.68           0.63                           0.63       0.25         1.00
Sodium (% Daily Value)             0.71               0.85       0.85                       0.85           0.59                           0.59       0.19         0.62
Carbohydrates (% Daily Value)      0.78               0.46       0.46                       0.46           0.59                           0.59       0.46         0.27
Dietary Fiber                      0.54               0.58       0.58                       0.58           0.35                           0.36       0.05         0.44
Dietary Fiber (% Daily Value)      0.54               0.58       0.58                       0.58           0.35                           0.35       0.06         0.44
Vitamin A (% Daily Value)          0.11               0.06       0.05                       0.05           0.06                           0.07       0.08         0.08
Vitamin C (% Daily Value)         -0.07              -0.09      -0.09                      -0.09          -0.18                          -0.18      -0.08        -0.08
Calcium (% Daily Value)            0.43               0.16       0.16                       0.16           0.40                           0.40       0.39         0.13
Iron (% Daily Value)               0.64               0.74       0.73                       0.74           0.58                           0.58       0.33         0.65
21 rows × 21 columns
In [138]: plt.figure(figsize=(20,15))
sns.heatmap(corr,annot=True)
Out[138]: <AxesSubplot:>
Correlation of Calories:
Calories and Saturated Fat: 0.85
Calories and Cholesterol: 0.60
Calories and Sodium: 0.71
Calories and Protein: 0.79
Correlation of Fat:
Fat and Protein: 0.8
Fat and Sodium: 0.85
Fat and Iron: 0.73
Fat and Cholesterol: 0.68
Fat and Dietary Fiber: 0.58
Correlation of Cholesterol:
Cholesterol and Calories: 0.60
Cholesterol and Protein: 0.5
Cholesterol and Sodium: 0.62
Cholesterol and Iron: 0.65
Correlation of Sodium:
Sodium and Iron: 0.8
Sodium and Protein: 0.8
Sodium and Dietary Fiber: 0.6
Sodium and Fat: 0.85
Sodium and Calories: 0.71
Correlation of Protein:
Protein and Iron: 0.7
Protein and Dietary Fiber: 0.6
Protein and Sodium: 0.8
Protein and Fat: 0.8
Protein and Calories: 0.79
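The heatmap and the lists above are read by eye. A minimal sketch that ranks every pair of variables numerically instead, on hypothetical toy data: it uses `DataFrame.corr()` (Pearson by default), visits each unordered pair once, and sorts by absolute value so strong negative correlations also surface. On the real data, pairs such as Total Fat and Total Fat (% Daily Value) correlate at 1.00 in the matrix above because one column is a rescaled copy of the other, so they would top the ranking; dropping the % Daily Value columns first gives a more informative list.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy nutrition data (not real menu values)
df = pd.DataFrame({
    "Calories": [300, 450, 520, 250, 700, 380],
    "Total Fat": [10, 20, 25, 8, 35, 15],
    "Sodium": [500, 900, 1100, 400, 1500, 700],
    "Vitamin C": [20, 0, 5, 30, 0, 10],
})

corr = df.corr()  # Pearson correlation, the pandas default

# Each unordered pair exactly once: the matrix is symmetric and its diagonal is all 1.0
pairs = pd.Series(
    {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
)

# Rank by strength regardless of sign
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked.head(3))
```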
In [129]: df.groupby('Item')['Sodium'].mean().sort_values(ascending=False).head(20)
Out[129]: Item
Big Breakfast with Hotcakes and Egg Whites (Large Biscuit) 2290
Big Breakfast with Hotcakes and Egg Whites (Regular Biscuit) 2170
INFERENCE:
The item that contributes the most sodium is Chicken McNuggets (40 piece).
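Grouping by `Item` and taking the mean only changes anything when an item name repeats. Assuming each item appears once (not verified here), the ranking can be read straight from the rows with `DataFrame.nlargest`, and `Series.idxmax` returns the single top row. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows (not real menu values); assumes each Item appears once
df = pd.DataFrame({
    "Item": ["Item A", "Item B", "Item C", "Item D"],
    "Sodium": [1500, 3000, 800, 2200],
})

# Rows with the largest Sodium values, highest first
top_sodium = df.nlargest(2, "Sodium")[["Item", "Sodium"]]

# Name of the single item with the most sodium
top_item = df.loc[df["Sodium"].idxmax(), "Item"]

print(top_sodium)
print(top_item)
```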
6. Which four food items contain the most saturated fat?
Out[99]: Item
Big Breakfast with Hotcakes and Egg Whites (Regular Biscuit) 16.00
Big Breakfast with Hotcakes and Egg Whites (Large Biscuit) 16.00
INFERENCE:
The four items that contain the most saturated fat are listed below:
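Out[99] shows two items tied at 16.00, so a "top four" list can depend on how ties at the cut-off are handled. `DataFrame.nlargest` makes that choice explicit through its `keep` parameter. A sketch on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows with a three-way tie at the cut-off
df = pd.DataFrame({
    "Item": ["W", "X", "Y", "Z", "V", "U"],
    "Saturated Fat": [20, 18, 16, 16, 16, 5],
})

# keep="first" (the default): exactly 4 rows, ties broken by row order
top4 = df.nlargest(4, "Saturated Fat")

# keep="all": every row tied with the 4th value is kept, so this can exceed 4 rows
top4_with_ties = df.nlargest(4, "Saturated Fat", keep="all")

print(top4)
print(top4_with_ties)
```

Here `top4` stops at four rows and silently drops item V, while `top4_with_ties` returns five rows, which shows that the fourth place is shared.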