
BDA Lab 4: Python Data Visualization: Your Name: Mohamad Salehuddin Bin Zulkefli Matric No: 17005054

The document provides instructions for a Python data visualization lab asking students to import a CSV dataset, plot various visualizations including a line graph, histogram, pie charts and box plot using Matplotlib and Pandas. Students are to code the visualizations, take screenshots of the output, and submit a PDF lab report with their code, outputs and answers to questions about the visualizations and insights gained.


By Dr Norshakirah Ab Aziz

BDA Lab 4: Python Data Visualization


*Using the provided ShampooSales2020.csv file

Make sure Python, pandas, and the matplotlib module are installed. Jupyter Notebook is recommended
for writing the code. Include screenshots of the code you wrote and the output it generated in your
lab sheet.

References:
1. Matplotlib - Pyplot API - Tutorialspoint
2. How to Plot a DataFrame using Pandas - Data to Fish
3. Pandas Dataframe: Plot Examples with Matplotlib and Pyplot

Your Name: Mohamad Salehuddin bin Zulkefli


Matric No: 17005054

Short Notes:

Q1. The figure below shows the 6 steps of the CRISP-DM methodology. State all the phases in which data
visualization is used and explain why it is critical to implement it as part of the data mining process. [2M]

Your answer:
The CRISP-DM methodology consists of these 6 steps:
1. Business understanding – What does the business need?
• Focuses on understanding the objectives and requirements of the project.
2. Data understanding – What data do we have / need? Is it clean?
• Drives the focus to identify, collect, and analyze the data sets.
3. Data preparation – How do we organize the data for modeling?
• Prepares the final data set(s) for modeling.
4. Modeling – What modeling techniques should we apply?
• Builds and assesses various models based on several different modeling techniques.
5. Evaluation – Which model best meets the business objectives?
• Looks more broadly at which model best meets the business objectives
and what to do next.
6. Deployment – How do stakeholders access the results?
• A model is not particularly useful unless the customer can access its results.

Q2. Import the Shampoo Sales 2020 dataset file as a dataframe and plot a line graph. [1M]

• Make sure the DataFrame created from Shampoo Sales is named after you.
• Change the colour of the line graph to your favourite colour (do not use the default).
• Customize the marker using a marker of your choice (do not use the default).

Paste your FINAL python code:

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset into a DataFrame named after the student
Mohamad_Salehuddin = pd.read_csv('MohamadSalehuddin_ShampooSales.csv')
print(Mohamad_Salehuddin)
print('------------------------------')

x = 'Month'
y = 'Sales'

# Line graph with a non-default colour (cyan) and marker ('x')
Mohamad_Salehuddin.plot(x=x, y=y, kind='line', color='c', marker='x')

plt.title('Sales Vs Month line Graph')
plt.ylabel('Sales')
plt.xlabel('Month')
plt.legend()
plt.grid(True, color='k')
plt.show()
FINAL output print screen:

Is there any anomaly/outlier in the dataset? If yes, describe:


The results obtained are not linear: the final part of the data shows some fluctuation.
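
One way to make an observation like the one above more concrete is to fit a straight line to the sales series and check which months sit far from it. The sketch below does this with an ordinary least-squares fit in plain Python. The `sales` values are made-up placeholders, not the contents of the lab's CSV file, and the function names and the threshold of 2 standard deviations are arbitrary choices for illustration.

```python
from statistics import mean, pstdev


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def flag_deviations(ys, k=2.0):
    """Indices whose residual from the fitted line exceeds k residual std devs."""
    slope, intercept = fit_line(ys)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(ys)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if spread and abs(r) > k * spread]


# Hypothetical monthly sales, NOT the lab dataset.
sales = [100, 110, 120, 130, 140, 150, 160, 170, 180, 400, 200, 210]
print(flag_deviations(sales))  # prints [9]
```

A month flagged this way is only a candidate anomaly; whether it is a real outlier still depends on what is known about the data.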

Q3. Find a dataset from https://www.dosm.gov.my and create a pie chart using the data that you
have selected. Provide insight from the created visualization and present your answer below: [3M]

Paste your FINAL python code:

import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

# Keep the header row and skip the first 306 data rows, leaving the 2018 records
df = pd.read_excel('Live_births_by_state_and_sex_2009-2018.xlsx', skiprows=range(1, 307))
print(df)
print('---------------------------')

plt.figure(figsize=(12,8))

# Total live births per state (both sexes combined)
sums = df.groupby('State')['Number of Live births'].sum()

plt.axis('equal')
# explode needs one offset per pie slice (one per state)
plt.pie(sums, labels=sums.index,
        explode=(0.2,0,0,0,0,0,0,0,0,0,0,0,0,0.2,0.4,0.6),
        pctdistance=0.7, autopct='%.2f %%')
plt.show()
FINAL output print screen:

Insight (what can you describe from the visualization?):


The data, taken from the Department of Statistics Malaysia, is the file
'Live_births_by_state_and_sex_2009-2018.xlsx'. It has 338 rows and 4 columns (parameters): “Year”,
“State”, “Sex” and “Number of Live births”. The pie chart uses only the rows for Year = 2018 and
shows the total number of live births by state.

From the pie chart we can conclude that Selangor has the highest number of live births and W.P.
Labuan has the lowest.
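
The reasoning behind this insight (sum both sexes per state, then compare the totals) can be sketched without any plotting. The example below uses plain Python and a handful of invented rows. The state names, the figures, and the helper names `totals_by_state` and `shares` are all hypothetical and are not taken from the DOSM file.

```python
from collections import defaultdict

# Hypothetical (state, sex, births) rows for a single year, NOT the DOSM figures.
rows = [
    ("State A", "Male", 300), ("State A", "Female", 280),
    ("State B", "Male", 120), ("State B", "Female", 100),
    ("State C", "Male", 40), ("State C", "Female", 60),
]


def totals_by_state(records):
    """Sum births over both sexes for each state."""
    totals = defaultdict(int)
    for state, _sex, births in records:
        totals[state] += births
    return dict(totals)


def shares(totals):
    """Percentage of the grand total contributed by each state (the pie slice sizes)."""
    grand_total = sum(totals.values())
    return {state: 100 * n / grand_total for state, n in totals.items()}


totals = totals_by_state(rows)
pct = shares(totals)
highest = max(totals, key=totals.get)
lowest = min(totals, key=totals.get)
print(highest, lowest)  # prints: State A State C
```

Reading the highest and lowest states from the totals rather than from the chart avoids misjudging slices that are close in size.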

*For the next questions, refer to the dataset at https://github.com/KeithGalli/matplotlib_tutorial and follow
the tutorial at https://www.youtube.com/watch?v=0P7QnIQDBJY

Q4. Histogram Example (FIFA Overall Skill Distribution). [1M]

Paste your FINAL python code:

import matplotlib.pyplot as plt
import pandas as pd

fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))

print('---------------------------------------')

bins = [40,50,60,70,80,90,100]

plt.figure(figsize=(8,5))

plt.hist(fifa.Overall, bins=bins, color='#abcdef')

plt.xticks(bins)

plt.ylabel('Number of Players')
plt.xlabel('Skill Level')
plt.title('Distribution of Player Skills in FIFA 2018')

plt.savefig('histogram.png', dpi=300)

plt.show()
FINAL output print screen:
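
The histogram code above passes explicit bin edges, so it helps to know how a value that falls exactly on an edge is counted. Matplotlib's `hist` follows NumPy's convention: every bin includes its left edge and excludes its right edge, except the last bin, which includes both. The sketch below reproduces that rule in plain Python. The `ratings` list is hypothetical and `bin_counts` is an illustrative helper, not a library function.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # values outside the binned range are ignored
        # bisect_right finds the bin; min() folds the top edge into the last bin
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


edges = [40, 50, 60, 70, 80, 90, 100]  # same edges as the bins list in the lab code
ratings = [45, 50, 59, 60, 71, 88, 94, 100, 39]  # hypothetical ratings
print(bin_counts(ratings, edges))  # prints [1, 2, 1, 1, 1, 2]
```

In this sketch a rating of 50 is counted in the 50-60 bin, a rating of 100 is counted in the 90-100 bin, and 39 is dropped because it lies below the first edge.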

Q5. Pie Chart #1 (Counting data in CSV) - Visualizing Soccer Foot Preferences. [1M]

Paste your FINAL python code:


import matplotlib.pyplot as plt
import pandas as pd

fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))
print('---------------------------------------')

# Count players by preferred foot; rows with a missing value match neither label
left = (fifa['Preferred Foot'] == 'Left').sum()
right = (fifa['Preferred Foot'] == 'Right').sum()

plt.figure(figsize=(8,5))

labels = ['Left', 'Right']
colors = ['#abcdef', '#aabbcc']

plt.pie([left, right], labels=labels, colors=colors, autopct='%.2f %%')
plt.title('Foot Preference of FIFA Players')
plt.show()
FINAL output print screen:
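
The counting step behind this pie chart can also be sketched with only the standard library, which makes the handling of blank cells explicit. The miniature CSV below is invented for illustration and is not taken from fifa_data.csv; `count_column` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature CSV, NOT rows from fifa_data.csv.
sample = """Name,Preferred Foot
Player A,Left
Player B,Right
Player C,Right
Player D,
Player E,Right
"""


def count_column(csv_text, column):
    """Count the non-empty values found in one column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader if row[column])


feet = count_column(sample, "Preferred Foot")
print(feet["Left"], feet["Right"])  # prints: 1 3
```

Player D has no recorded foot and is left out of both counts, so the two slices are percentages of the players with a known preference rather than of all rows.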

Q6. Pie Chart #2 (More advance Pandas Example) - Weight Distribution of FIFA Players. [1M]

Paste your FINAL python code:

import matplotlib.pyplot as plt
import pandas as pd

fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))
print('---------------------------------------')

plt.figure(figsize=(8,5), dpi=100)
plt.style.use('ggplot')

# Strip the 'lbs' suffix and convert to int; non-string (missing) values are left unchanged
fifa.Weight = [int(x.strip('lbs')) if type(x)==str else x for x in fifa.Weight]

# Count players in each weight class (lbs); missing weights fall into no class
light = (fifa.Weight < 125).sum()
light_medium = ((fifa.Weight >= 125) & (fifa.Weight < 150)).sum()
medium = ((fifa.Weight >= 150) & (fifa.Weight < 175)).sum()
medium_heavy = ((fifa.Weight >= 175) & (fifa.Weight < 200)).sum()
heavy = (fifa.Weight >= 200).sum()

weights = [light, light_medium, medium, medium_heavy, heavy]
label = ['under 125', '125-150', '150-175', '175-200', 'over 200']
explode = (.4,.2,0,0,.4)

plt.title('Weight of Professional Soccer Players (lbs)')
plt.pie(weights, labels=label, explode=explode, pctdistance=0.8, autopct='%.2f %%')
plt.show()

FINAL output print screen:
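
The two data-preparation steps in the code above, turning weight strings into numbers and assigning each number to a class, can be isolated as small functions so that the boundary cases are easy to check. This is a minimal sketch in plain Python. The raw values are hypothetical, and `parse_weight` and `weight_class` are illustrative names rather than part of the tutorial.

```python
def parse_weight(raw):
    """Convert a string such as '159lbs' to the integer 159; return None otherwise."""
    if not isinstance(raw, str):
        return None  # e.g. a missing value that was read as float NaN
    text = raw.strip().lower()
    if text.endswith("lbs"):
        text = text[:-3].strip()
    return int(text) if text.isdigit() else None


def weight_class(lbs):
    """Map a weight in pounds to the class labels used for the pie chart."""
    if lbs < 125:
        return "under 125"
    if lbs < 150:
        return "125-150"
    if lbs < 175:
        return "150-175"
    if lbs < 200:
        return "175-200"
    return "over 200"


# Hypothetical raw values, NOT rows from fifa_data.csv.
raw_weights = ["159lbs", "183lbs", float("nan"), "121lbs", "200lbs"]
parsed = [w for w in map(parse_weight, raw_weights) if w is not None]
print([weight_class(w) for w in parsed])
# prints ['150-175', '175-200', 'under 125', 'over 200']
```

A weight of exactly 200 lands in the 'over 200' class, matching the `>= 200` comparison in the lab code, so the label is slightly looser than the rule it describes.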



Q7. Box & Whisker Plot (Comparing FIFA teams to one another). [1M]

Paste your FINAL python code:

import matplotlib.pyplot as plt
import pandas as pd

fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))
print('---------------------------------------')

plt.figure(figsize=(8,11), dpi=100)
plt.style.use('default')

# Overall ratings of the players in each club
man_united = fifa.loc[fifa.Club == "Manchester United"]['Overall']
man_city = fifa.loc[fifa.Club == "Manchester City"]['Overall']
liverpool = fifa.loc[fifa.Club == "Liverpool"]['Overall']

bp = plt.boxplot([man_united, man_city, liverpool],
                 labels=['Manchester United', 'Manchester City', 'Liverpool'],
                 patch_artist=True, medianprops={'linewidth': 2})

plt.title('Professional Soccer Team Comparison')
plt.ylabel('FIFA Overall Rating')

for box in bp['boxes']:
    # change outline color
    box.set(color='#4286f4', linewidth=2)
    # change fill color
    box.set(facecolor='#e0e0e0')

plt.show()
FINAL output print screen:
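
To read a box and whisker plot it helps to know what each element stands for. By default Matplotlib draws the box from the first to the third quartile, marks the median inside it, extends each whisker to the most extreme data point within 1.5 times the interquartile range of the box, and plots anything beyond that as an individual point. The sketch below computes those quantities in plain Python for a hypothetical list of ratings. `box_stats` is an illustrative helper, and quartile conventions differ slightly between libraries, so the numbers may not match a plotting library exactly on every dataset.

```python
from statistics import quantiles


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outlying points for one box in a box plot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # lowest point still within the fence
        "whisker_high": inside[-1],  # highest point still within the fence
        "fliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical overall ratings for one club, NOT taken from fifa_data.csv.
ratings = [60, 70, 72, 74, 75, 76, 78, 80, 95]
stats = box_stats(ratings)
print(stats["whisker_low"], stats["whisker_high"], stats["fliers"])
# prints: 70 80 [60, 95]
```

Comparing these summaries across clubs gives the same information as the plot: a higher median means a stronger typical player, and a taller box means a wider spread of ratings within the squad.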

***You are required to submit:

Lab report in PDF format with the screenshot of code and output of your lab task.

Submission: INDIVIDUAL SUBMISSION


File type: PDF file (.pdf)
File name: matricNumber-Lab4.pdf example: 19002199-Lab4.pdf
Deadline: Before next week, submit according to GA instruction.

*Copying is prohibited. Late submission without acceptable reasons will not be considered.
