BDA Lab 4: Python Data Visualization: Your Name: Mohamad Salehuddin Bin Zulkefli Matric No: 17005054
Make sure you have installed Python and pandas. You are recommended to use Jupyter Notebook to
write the code. Include screenshots of the code you wrote and the output it generated in your lab
sheet. You also need to have the matplotlib module installed for this.
References:
1. Matplotlib - Pyplot API - Tutorialspoint
2. How to Plot a DataFrame using Pandas - Data to Fish
3. Pandas Dataframe: Plot Examples with Matplotlib and Pyplot
Short Notes:
By Dr Norshakirah Ab Aziz
Q1. The figure below shows the 6 steps of the CRISP-DM methodology. State all the phases in which data
visualization is used and explain why it is critical to implement it as part of the data mining process. [2M]
Your answer:
The CRISP-DM methodology consists of these 6 steps:
1. Business understanding – What does the business need?
• Focuses on understanding the objectives and requirements of the project.
2. Data understanding – What data do we have / need? Is it clean?
• Drives the focus to identify, collect, and analyze the data sets.
3. Data preparation – How do we organize the data for modeling?
• Prepares the final data set(s) for modeling.
4. Modeling – What modeling techniques should we apply?
• Builds and assesses various models based on several different modeling techniques.
5. Evaluation – Which model best meets the business objectives?
• Looks more broadly at which model best meets the business needs and what to do next.
6. Deployment – How do stakeholders access the results?
• A model is not particularly useful unless the customer can access its results.
Q2. Import the Shampoo Sales 2020 dataset file as a dataframe and plot a line graph. [1M]
• Make sure the DataFrame from Shampoo Sales is named as your name.
• Change the colour of the line graph to your favourite colour (do not use the default).
• Customize the marker code using the marker of your choice (do not use default)
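import pandas as pd
import matplotlib.pyplot as plt

# Load the shampoo sales data into a DataFrame named after the student
Mohamad_Salehuddin = pd.read_csv('MohamadSalehuddin_ShampooSales.csv')
print(Mohamad_Salehuddin)
print('------------------------------')
x = 'Month'
y = 'Sales'
# Colour and marker are example choices (assumption); substitute your own
plt.plot(Mohamad_Salehuddin[x], Mohamad_Salehuddin[y], color='purple', marker='D', label=y)
plt.ylabel('Sales')
plt.xlabel('Month')
plt.legend()
plt.grid(True, color='k')
plt.show()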
FINAL output print screen:
Q3. Find a dataset from https://www.dosm.gov.my and create a pie chart using the data that you
have selected. Provide insight from the created visualization and present answer below: [3M]
print('---------------------------')
plt.figure(figsize=(12,8))
From the pie chart, we can conclude that Selangor has the highest number of live births and W.P.
Labuan has the lowest.
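A minimal sketch of how a pie chart like this could be produced with pandas and matplotlib. The births DataFrame, its column names, and every value in it are made-up placeholders for illustration, not DOSM figures; with a real download the DataFrame would come from pd.read_csv instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside Jupyter Notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data: made-up values, NOT real DOSM statistics
births = pd.DataFrame({
    "State": ["Selangor", "Johor", "Sabah", "W.P. Labuan"],
    "Live births": [400, 250, 200, 50],
})

# Sort so the largest slice is drawn first
births = births.sort_values("Live births", ascending=False)

fig, ax = plt.subplots(figsize=(12, 8))
ax.pie(
    births["Live births"].to_numpy(),
    labels=births["State"].tolist(),
    autopct="%.1f%%",   # print each slice's share of the total
    startangle=90,
)
ax.set_title("Number of live births by state (placeholder data)")
fig.savefig("live_births_pie.png", dpi=100)

# The insight can also be read straight from the DataFrame
highest = births.loc[births["Live births"].idxmax(), "State"]
lowest = births.loc[births["Live births"].idxmin(), "State"]
print("Highest:", highest, "| Lowest:", lowest)
```

Reading the highest and lowest categories from the DataFrame, rather than from the chart alone, is a quick way to check that the written insight matches the data.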
Q4. Histogram - Distribution of Player Skills in FIFA 2018.
fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))
print('---------------------------------------')
bins = [40,50,60,70,80,90,100]
plt.figure(figsize=(8,5))
# Assumes the skill rating is in the 'Overall' column of fifa_data.csv
plt.hist(fifa['Overall'], bins=bins)
plt.xticks(bins)
plt.ylabel('Number of Players')
plt.xlabel('Skill Level')
plt.title('Distribution of Player Skills in FIFA 2018')
plt.savefig('histogram.png', dpi=300)
plt.show()
FINAL output print screen:
Q5. Pie Chart #1 (Counting data in CSV) - Visualizing Soccer Foot Preferences. [1M]
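fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))
print('---------------------------------------')
# Assumes foot preference is in the 'Preferred Foot' column of fifa_data.csv
foot_counts = fifa['Preferred Foot'].value_counts()
plt.figure(figsize=(8,5))
plt.pie(foot_counts, labels=foot_counts.index, autopct='%.2f %%')
plt.show()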
FINAL output print screen:
Q6. Pie Chart #2 (More advanced Pandas Example) - Weight Distribution of FIFA Players. [1M]
fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))
print('---------------------------------------')
plt.figure(figsize=(8,5), dpi=100)
plt.style.use('ggplot')
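One possible way to build the weight-distribution pie chart is sketched below. It assumes weights are stored as strings such as '159lbs' in a Weight column; that format, the bin edges, and the in-memory fifa_sample rows are all illustrative assumptions standing in for fifa_data.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside Jupyter Notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows standing in for fifa_data.csv (values are made up)
fifa_sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Weight": ["120lbs", "140lbs", "159lbs", "165lbs", "183lbs", "210lbs"],
})

# Strip the assumed 'lbs' suffix and convert to numbers
fifa_sample["Weight_lbs"] = (
    fifa_sample["Weight"].str.replace("lbs", "", regex=False).astype(float)
)

# Illustrative bin edges; right=False makes each bin [lower, upper)
edges = [0, 125, 150, 175, 200, float("inf")]
labels = ["Under 125", "125-150", "150-175", "175-200", "Over 200"]
groups = pd.cut(fifa_sample["Weight_lbs"], bins=edges, labels=labels, right=False)

# Count players per bin, keeping the bins in ascending order
weight_counts = groups.value_counts().sort_index()

plt.style.use("ggplot")
fig, ax = plt.subplots(figsize=(8, 5), dpi=100)
ax.pie(weight_counts.to_numpy(), labels=labels, autopct="%.2f %%", pctdistance=0.8)
ax.set_title("Weight of FIFA players in lbs (placeholder data)")
fig.savefig("weight_pie.png")
```

Using pd.cut keeps the binning in one place, so changing the edges list is enough to regroup the players.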
Q7. Box & Whisker Plot (Comparing FIFA teams to one another). [1M]
fifa = pd.read_csv('fifa_data.csv')
print(fifa.head(10))
print('---------------------------------------')
plt.figure(figsize=(8,11), dpi=100)
plt.style.use('default')
plt.show()
FINAL output print screen:
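A box-and-whisker comparison of teams could be sketched as follows. The Club and Overall column names are assumptions about fifa_data.csv, and the club names and ratings in fifa_sample are hypothetical placeholders, not values from the dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside Jupyter Notebook
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows standing in for fifa_data.csv (clubs and ratings are made up)
fifa_sample = pd.DataFrame({
    "Club": ["Team A"] * 5 + ["Team B"] * 5 + ["Team C"] * 5,
    "Overall": [88, 90, 85, 91, 86, 78, 80, 75, 82, 79, 70, 72, 68, 74, 71],
})

# One array of ratings per club, in the order the boxes should appear
clubs = ["Team A", "Team B", "Team C"]
ratings = [
    fifa_sample.loc[fifa_sample["Club"] == club, "Overall"].to_numpy()
    for club in clubs
]

plt.style.use("default")
fig, ax = plt.subplots(figsize=(8, 11), dpi=100)
bp = ax.boxplot(ratings, patch_artist=True, medianprops={"linewidth": 2})

# Label each box with its club name (boxes sit at x = 1, 2, 3, ...)
ax.set_xticks(range(1, len(clubs) + 1))
ax.set_xticklabels(clubs)
ax.set_ylabel("FIFA Overall Rating")
ax.set_title("Team comparison (placeholder data)")

# Style the boxes so they stand apart from the whiskers
for box in bp["boxes"]:
    box.set(facecolor="#e0e0e0", edgecolor="#4286f4", linewidth=2)

fig.savefig("team_boxplot.png")

# Median rating per club, matching the line drawn inside each box
medians = fifa_sample.groupby("Club")["Overall"].median()
print(medians)
```

Printing the medians alongside the plot gives a numeric check on what the boxes show.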
Lab report in PDF format with the screenshot of code and output of your lab task.
*Copying is prohibited. Late submission without acceptable reasons will not be considered.