Data Analysis On Nutrition Facts For McDonalds Menu Data Set Using Python
Abstract:
Python is nowadays an easy-to-use programming language that is popular because of its many features and applications. It has become the language of choice for most data scientists for working with data: visualization, analysis, manipulation, retrieval, cleaning and machine learning. It is open source and offers libraries such as NumPy, SciPy, matplotlib, pandas and scikit-learn. This paper presents a data analysis of the 'Nutrition Facts for McDonald's Menu' dataset using Python. The Indian food industry has emerged as a high-growth, high-profit sector because of its large potential for value addition, especially within the food processing industry. The dataset is used to analyze nutritious and non-nutritious food items on the menu, and various Python libraries are used to analyze it and present the data as different charts.
1. INTRODUCTION

The Python programming language is very popular today because of its features and uses. In this paper, a data analysis of the 'Nutrition Facts for McDonald's Menu' dataset is carried out in Python. The paper has 9 sections:
- section 2 introduces Python;
- section 3 explains why Python is used for data analysis;
- section 4 describes applications of Python;
- section 5 introduces the dataset;
- section 6 describes the analysis performed on the dataset;
- section 7 presents the results as different charts;
- section 8 gives the conclusion;
- section 9 lists the references.

2. INTRODUCTION TO PYTHON

The Python programming language was conceived in the late 1980s. Guido van Rossum started its implementation in December 1989 at CWI in the Netherlands, as a successor to the ABC programming language that was capable of exception handling and interfacing with the Amoeba operating system [1].

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development. Python supports modules and packages, which encourage program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form, without charge, for all major platforms, and can be freely distributed [2].

Python has several distinctive features that allow it to be used in many applications. Some of them are as follows:
- It uses an elegant syntax, which makes the programs you write easier to read.
- It is an easy-to-use language that makes it simple to get your program working. This makes Python ideal for prototype development and other ad hoc programming tasks, without compromising maintainability.
- It comes with a large standard library that supports many common programming tasks, such as connecting to web servers, searching text with regular expressions, and reading and modifying files.
- Python's interactive mode makes it easy to test short snippets of code. There is also a bundled development environment called IDLE.
- It is easily extended by adding new modules implemented in a compiled language such as C or C++. It can also be embedded into an application to provide a programmable interface.
- It runs anywhere, including Mac OS X, Windows, Linux and Unix.
- It is free software in two senses. It costs nothing to download or use Python, or to include it in your application. Python can also be freely modified and redistributed because, while the language is copyrighted, it is available under an open source license [3].

3. WHY PYTHON IS USED FOR DATA ANALYSIS

The scripting language Python is currently available in two different versions: Python 3.4.3, released in February 2015, and the other version, released in December 2014. Many data analysts use Python to analyze datasets, because Python has certain features that make it suitable for data analysis:
1. Purpose - Python focuses on productivity and code readability.
2. Used by - It is used by programmers who want to dive into data analysis or apply statistical/mathematical techniques, and by developers who turn to data science.
3. Usability - Coding and debugging are easier in Python because of its simple syntax and terminology. Indentation is part of the syntax: it affects the meaning of the code.
4. Flexibility - It is flexible enough for doing things that have never been done before. Developers can use Python for scripting a website or other applications.
International Journal of Engineering Science and Computing, June 2017 13679 http://ijesc.org/
5. Ease of learning - Python has a relatively low and gradual learning curve, so it is good for beginning programmers.
6. Set of libraries - Python has many libraries that can be used, as needed, to extract analysis from datasets. Some of the most commonly used are NumPy (Numerical Python), SciPy (Scientific Python), Matplotlib, Pandas, Scikit-learn, Scrapy, Bokeh and Pygal.
7. Python IDEs - There are many Python IDEs; the most popular are Spyder and IPython Notebook.
8. Python testing framework - Python's testing frameworks help ensure that code is reusable and dependable.
9. Open source - Python is free for everyone to download, which makes it attractive for developers, programmers and data analysts [4].

Some cross-platform toolkits are available separately:
wxWidgets
Kivy, for writing multitouch applications
Qt, via PyQt or PySide
Platform-specific toolkits are also available:
GTK+
Microsoft Foundation Classes, through the win32 extensions

4.4.1. Image Processing and Graphic Design Applications:
Python has been used to build 2D imaging software such as Inkscape, GIMP, Paint Shop Pro and Scribus. Further, 3D animation packages such as Blender, 3ds Max, Cinema 4D, Houdini, LightWave and Maya also use Python to varying extents.
So the dataset contains a lot of information about each menu item, namely:
- Category, Item and Serving Size;
- Calories and Calories from Fat;
- Total Fat, Total Fat (% Daily Value), Saturated Fat, Saturated Fat (% Daily Value) and Trans Fat;
- Cholesterol, Cholesterol (% Daily Value), Sodium and Sodium (% Daily Value);
- Carbohydrates, Carbohydrates (% Daily Value), Dietary Fiber, Dietary Fiber (% Daily Value), Sugars and Protein;
- Vitamin A (% Daily Value), Vitamin C (% Daily Value), Calcium (% Daily Value) and Iron (% Daily Value).

6. ANALYSIS PERFORMED ON DATA-SET

- Import the CSV file in Python
In Python:
>>> import csv
>>> with open('C:\\Users\\Bappa\\Pictures\\menu.csv', encoding='utf-8', newline='') as f:
...     reader = csv.reader(f)
...     for row in reader:
...         print(', '.join(row))  # use print(row) instead to see each row as a list
...
Result: This reads the menu.csv file of the dataset and prints every row.

- To get the first 10 rows of the dataset with specific columns
In Python:
>>> import csv, itertools
>>> with open('C:\\Users\\Bappa\\Pictures\\menu.csv', encoding='utf-8', newline='') as csvfile:
...     for row in itertools.islice(csv.DictReader(csvfile), 10):
...         print(row['Category'], row['Item'], row['Serving Size'])
...
Result: The function islice() creates an iterator from the iterable object passed to it. It stops after the limit given as the second argument, so only the first 10 rows are printed.

- Import all necessary libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# IPython/Jupyter magic: draw plots inside the notebook
%matplotlib inline
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
import warnings
warnings.filterwarnings('ignore')
# load the dataset; the following steps refer to it as menu
menu = pd.read_csv('C:\\Users\\Bappa\\Pictures\\menu.csv')

- Sugar content in menu items
Create a new DataFrame with the columns Item and Sugars. Sort it by sugar content in ascending order and show the first 10 items, i.e. those with the least sugar.
In Python:
# new DataFrame with only the Item and Sugars columns
df_sugars = menu[['Item', 'Sugars']].copy()
print("Let's sort them by the amount of sugar they have in ascending order:")
df_sugars = df_sugars.sort_values('Sugars', ascending=True)
print(df_sugars.head(10))
Result:
Let's sort them by the amount of sugar they have in ascending order:
     Item                          Sugars
145  Coffee (Small)                     0
99   Kids French Fries                  0
96   Small French Fries                 0
81   Chicken McNuggets (20 piece)       0
114  Diet Coke (Small)                  0
115  Diet Coke (Medium)                 0
116  Diet Coke (Large)                  0
117  Diet Coke (Child)                  0
122  Diet Dr Pepper (Small)             0
123  Diet Dr Pepper (Medium)            0

- Check for items that contain no sugar
In Python:
print("Number of items in the menu: " + str(len(menu.index)))
print("Number of items without sugar in the menu: " + str(len(df_sugars.loc[df_sugars['Sugars'] == 0])))
print(df_sugars.loc[df_sugars['Sugars'] == 0])
Result:
Number of items in the menu: 260
Number of items without sugar in the menu: 25
     Item                          Sugars
145  Coffee (Small)                     0
99   Kids French Fries                  0
96   Small French Fries                 0
81   Chicken McNuggets (20 piece)       0
114  Diet Coke (Small)                  0
115  Diet Coke (Medium)                 0
116  Diet Coke (Large)                  0
117  Diet Coke (Child)                  0
122  Diet Dr Pepper (Small)             0
123  Diet Dr Pepper (Medium)            0
124  Diet Dr Pepper (Large)             0
98   Large French Fries                 0
80   Chicken McNuggets (10 piece)       0
79   Chicken McNuggets (6 piece)        0
136  Dasani Water Bottle                0
137  Iced Tea (Small)                   0
138  Iced Tea (Medium)                  0
139  Iced Tea (Large)                   0
140  Iced Tea (Child)                   0
78   Chicken McNuggets (4 piece)        0
146  Coffee (Medium)                    0
38   Hash Brown                         0
147  Coffee (Large)                     0
125  Diet Dr Pepper (Child)             0
97   Medium French Fries                0
So only 25 of the 260 items on the McDonald's menu contain no sugar at all. That is 25/260, or about 9.6%.

7. RESULT USING DIFFERENT CHART DIAGRAMS

It is important to show the results as charts so that they are easy to interpret. Many kinds of charts can be drawn with Python libraries [7]. In this paper, bar diagrams, pie charts, a scatter diagram and heatmaps are shown together with their results and analysis.
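The sugar results of Section 6 can be cross-checked using only the csv module from the start of that section. The sketch below computes the zero-sugar count and its share; it is an illustration, not the author's code. It assumes the CSV has the Item and Sugars columns listed in Section 5. The function name sugar_summary and the three sample rows are hypothetical and stand in for the real menu.csv.

```python
import csv
import io


def sugar_summary(csv_file):
    """Count menu items with zero sugar (hypothetical helper).

    Returns (number of items, names of zero-sugar items, zero-sugar share in %).
    Assumes 'Item' and 'Sugars' columns, as listed for menu.csv in Section 5.
    """
    rows = list(csv.DictReader(csv_file))
    # csv yields strings, so Sugars is converted to a number before comparing
    no_sugar = [row['Item'] for row in rows if float(row['Sugars']) == 0]
    share = 100 * len(no_sugar) / len(rows)
    return len(rows), no_sugar, share


# Hypothetical sample rows standing in for menu.csv
sample = io.StringIO(
    "Item,Sugars\n"
    "Hash Brown,0\n"
    "Egg McMuffin,3\n"
    "Diet Coke (Small),0\n"
)
total, no_sugar, share = sugar_summary(sample)
print(total, no_sugar, f"{share:.1f}%")
```

To run it on the real data, pass the file object opened by the `with open(...)` statement of Section 6. Because 100 × 25 / 260 ≈ 9.62, the 25 zero-sugar items reported there correspond to roughly 9.6% of the menu.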
7.1. Bar Diagram of Calories in Different Category of Menu Data-set
In R:
mc_menu = read.csv("../input/menu.csv", header = TRUE, sep = ",")
# PIVOT TABLE OF CATEGORY AND SUM OF CALORIES
aggregate(mc_menu$Calories, by = list(mc_menu$Category), sum)
calories_cat = as.data.frame(aggregate(mc_menu$Calories, by = list(mc_menu$Category), sum))
library(ggplot2)
# fill is set outside aes() so that the rainbow colours are used literally (one per category)
ggplot(calories_cat) + geom_col(aes(Group.1, x), fill = rainbow(9)) +
  geom_text(aes(x = Group.1, y = x, label = x)) +
  labs(title = "Each Category Containing Number of Calories", x = "Categories", y = "Calories") +
  theme(
    plot.background = element_rect(fill = "#F0F3F4"),
    panel.grid.major = element_line(colour = "#37474F"),
    panel.background = element_rect(fill = "#F0F3F4"),
    axis.title.y = element_text(colour = "#3E2723", angle = 90),
    axis.title.x = element_text(colour = "#3E2723", angle = 0),
    axis.text = element_text(colour = "#3E2723"),
    legend.position = "none")
Result:

# In older matplotlib versions a pie chart can be drawn as an oval; this keeps it circular
plt.axis("equal")
# x_list holds the calorie totals and label_list the category names (defined earlier, not shown)
plt.pie(x_list, labels=label_list, autopct="%1.1f%%")
plt.title("Pie-chart of Menu Category with Calories")
plt.show()
Result:
Figure 2. Pie Chart 1
Analysis:
Fig. 2 (Pie Chart 1) shows each category of the McDonald's menu dataset with its share of calories in percent. The highest share belongs to the Beef & Pork category at 21.8%. It is followed by:
- Chicken & Fish: 21%
- Snacks & Sides: 14%
- Breakfast: 12.3%
- Desserts: 10.3%
- Smoothies & Shakes: 9.05%
- Salads: 5.76%
- Beverages: 5.76%
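The bar chart in Section 7.1 is produced with R and ggplot2. A roughly equivalent pandas sketch is shown here; it is an illustration, not the author's code. It sums Calories per Category, the same pivot as aggregate(..., sum). It then turns the totals into the percentage shares that a calorie pie chart such as Pie Chart 1 displays. The function names are hypothetical, and the four sample rows are made up; column names follow Section 5.

```python
import pandas as pd


def calories_by_category(menu: pd.DataFrame) -> pd.DataFrame:
    """Sum Calories per Category and add each category's share of the total."""
    totals = menu.groupby('Category')['Calories'].sum().sort_values(ascending=False)
    shares = 100 * totals / totals.sum()
    return pd.DataFrame({'Calories': totals, 'Share (%)': shares.round(2)})


def plot_calories(table: pd.DataFrame) -> None:
    # matplotlib is imported here so the aggregation can be used without it
    import matplotlib.pyplot as plt
    ax = table['Calories'].plot(kind='bar', title='Calories per Category', figsize=(10, 6))
    ax.set_xlabel('Category')
    ax.set_ylabel('Calories')
    plt.tight_layout()
    plt.show()


# Hypothetical rows; with the real data use menu = pd.read_csv('menu.csv')
menu = pd.DataFrame({
    'Category': ['Breakfast', 'Breakfast', 'Beef & Pork', 'Salads'],
    'Item': ['Hash Brown', 'Egg McMuffin', 'Big Mac', 'Side Salad'],
    'Calories': [150, 300, 530, 20],
})
print(calories_by_category(menu))
```

Calling `plot_calories(calories_by_category(menu))` draws the bar chart. The Share (%) column is the quantity a pie chart shows, which is why the shares add up to 100.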
Analysis:
Fig. 3 (Pie Chart 2) shows each category of the McDonald's menu dataset with its share of cholesterol in percent. The highest share, 57.2%, is in the Breakfast category.

7.4. Bar Chart for Category Highest Range
In Python:
import matplotlib.pyplot as plt
# Assumption: each bar shows the number of menu items in a category
ax = menu['Category'].value_counts().plot(kind='bar', title="Categories in Menu Data-set of McDonald's Menu", figsize=(15, 10), fontsize=12)
ax.set_xlabel("Different Categories in Menu", fontsize=12)
ax.set_ylabel("Number of items", fontsize=12)
plt.show()
Result:

7.5. Bar Chart for Vitamin A Range in % for Different Category
In Python:
import matplotlib.pyplot as plt
# Assumption: each bar shows the highest Vitamin A value found in that category
ax = menu.groupby('Category')['Vitamin A (% Daily Value)'].max().plot(kind='bar', title="Vitamin A Range for Different Category", figsize=(15, 10), fontsize=12)
ax.set_xlabel("Category", fontsize=12)
ax.set_ylabel("Vitamin A (% Daily Value)", fontsize=12)
plt.show()
Result:

Analysis:
From Fig. 5 (Bar Diagram 3), Vitamin A is highest (almost 170%) in the Salads category of McDonald's menu dataset. So, in terms of Vitamin A, salads are the most nutritious choice on this menu.

7.6. Bar Chart for Vitamin C Range in % for Different Category
In Python:
import matplotlib.pyplot as plt
# Assumption: each bar shows the highest Vitamin C value found in that category
ax = menu.groupby('Category')['Vitamin C (% Daily Value)'].max().plot(kind='bar', title="Vitamin C Range for Different Category", figsize=(15, 10), fontsize=12)
ax.set_xlabel("Category", fontsize=12)
ax.set_ylabel("Vitamin C (% Daily Value)", fontsize=12)
plt.show()
Result:

7.7. Bar Chart for Sodium Range in % for Different Category
In Python:
import matplotlib.pyplot as plt
# Assumption: each bar shows the highest Sodium value found in that category
ax = menu.groupby('Category')['Sodium (% Daily Value)'].max().plot(kind='bar', title="Sodium Range for Different Category", figsize=(15, 10), fontsize=12)
ax.set_xlabel("Category", fontsize=12)
ax.set_ylabel("Sodium (% Daily Value)", fontsize=12)
plt.show()
Result:
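The per-nutrient bar charts in this section differ only in the column that is plotted, so a small helper can remove the repetition. This is a sketch under one assumption: each bar is the highest % Daily Value in its category. That matches how the analyses read off values such as "almost 170%" for Salads, but the original aggregation is not shown. The function names category_peaks and plot_category_peaks are hypothetical, and the sample rows are made up.

```python
import pandas as pd


def category_peaks(menu: pd.DataFrame, column: str) -> pd.Series:
    """Highest value of `column` in each Category, largest first (assumed aggregation)."""
    return menu.groupby('Category')[column].max().sort_values(ascending=False)


def plot_category_peaks(menu: pd.DataFrame, column: str) -> None:
    # Imported lazily so category_peaks can be used without matplotlib installed
    import matplotlib.pyplot as plt
    peaks = category_peaks(menu, column)
    ax = peaks.plot(kind='bar', title=f'{column} by Category', figsize=(15, 10), fontsize=12)
    ax.set_xlabel('Category', fontsize=12)
    ax.set_ylabel(column, fontsize=12)
    plt.show()


# Hypothetical sample; the real columns come from menu.csv
menu = pd.DataFrame({
    'Category': ['Salads', 'Salads', 'Chicken & Fish', 'Breakfast'],
    'Vitamin A (% Daily Value)': [120, 80, 0, 15],
    'Sodium (% Daily Value)': [40, 35, 90, 60],
})

for nutrient in ['Vitamin A (% Daily Value)', 'Sodium (% Daily Value)']:
    # category with the highest peak for this nutrient
    print(category_peaks(menu, nutrient).head(1))
```

For example, `plot_category_peaks(menu, 'Vitamin C (% Daily Value)')` would draw the Vitamin C chart under the same assumption. Replacing `.max()` with `.mean()` would show typical rather than extreme items.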
Analysis:
From Fig. 7 (Bar Diagram 5), sodium is highest (almost 150%) in the Chicken & Fish category of McDonald's menu dataset. So Chicken & Fish items are the richest source of sodium on this menu.

7.8. Bar Chart for Dietary Fiber Range in % for Different Category
In Python:
import matplotlib.pyplot as plt
# Assumption: each bar shows the highest Dietary Fiber value found in that category
ax = menu.groupby('Category')['Dietary Fiber (% Daily Value)'].max().plot(kind='bar', title="Dietary Fiber Range for Different Category", figsize=(15, 10), fontsize=12)
ax.set_xlabel("Category", fontsize=12)
ax.set_ylabel("Dietary Fiber (% Daily Value)", fontsize=12)
plt.show()

Figure 9. Bar Diagram 7

Analysis:
From Fig. 9 (Bar Diagram 7), sugar is highest (almost 140%) in the Smoothies & Shakes category of McDonald's menu dataset. Sugar is also present in most of the other menu categories.

7.10. Scatter Chart for Protein Range in % for Different Category
In R (plotly):
library(plotly)
plot_ly(mc_menu, x = ~Protein, y = ~Carbohydrates, color = ~Dietary.Fiber,
        size = ~Total.Fat, text = ~paste("Item:", Item),
        type = "scatter", mode = "markers") %>%
  layout(title = "Proteins, Dietary Fibre, Carbs and Total Fat (Size represents Total Fat)")

Input:
# select the Breakfast category to show nutritious food
# data is the menu DataFrame; neturicious is a list of nutrient column names (defined earlier, not shown)
df = data[data['Category'] == 'Breakfast']
df = df.groupby(['Item'])[neturicious].sum().sort_values(by=neturicious, ascending=False).head(10)
fig, ax = plt.subplots(figsize=(7, 5))
df = df[neturicious]
sns.heatmap(df, ax=ax, annot=True)
ax.set_title('What to choose for Breakfast')
plt.xticks(rotation=45)
Result (return value of plt.xticks):
(array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5]), <a list of 6 Text xticklabel objects>)
Figure 11. Heatmap Diagram 1

Input:
# select the Breakfast category to show non-nutritious food
# nonneturicious is a list of nutrient column names (defined earlier, not shown)
df = data[data['Category'] == 'Breakfast']
df = df.groupby(['Item'])[nonneturicious].sum().sort_values(by=nonneturicious, ascending=False).head(10)
fig, ax = plt.subplots(figsize=(9, 5))
df = df[nonneturicious]
sns.heatmap(df, ax=ax, annot=True)
ax.set_title('What not to choose for Breakfast')
plt.xticks(rotation=45)
Result (return value of plt.xticks):
(array([ 0.5, 1.5, 2.5, 3.5, 4.5]), <a list of 5 Text xticklabel objects>)

8. CONCLUSION

and the range values in percentage were obtained. This is useful for showing the ranges of food nutrients such as vitamin A, vitamin C, sugar, dietary fiber, fats, carbohydrates, cholesterol, iron, sodium and protein, to guide the proper consumption of menu items.

9. REFERENCES

[1] https://en.wikipedia.org/wiki/History_of_Python
[2] https://www.python.org/doc/essays/blurb/
[4] http://blog.datacamp.com/wp-content/uploads/2015/05/R-vs-Python-216-2.png
[5] https://www.invensis.net/blog/it/applications-of-python-in-real-world/
[6] https://www.kaggle.com/mcdonalds/nutrition-facts
[7] http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/