Week 15

LAB MODULE

Coding & Big Data

Week 15
Using pandas and matplotlib as analysis
and visualization tools
Outline
What is pandas?
Initialize a dataset
View a dataset
Practice I
What is matplotlib?
Try to visualize
Use your data for visualization
Tips n Trick
Practice II
Submission
Before we start, here is some
information about datasets
What is a dataset? A dataset is a collection of data that
we import.

In Python, a dataset is usually stored as a .csv file
and loaded with the pandas library.
Preview your data in Excel
When you open the .csv file in Excel, you may see that the data is not in table format: all the values appear in column A.
To fix this, follow these steps:

• Select the whole of column A, which contains the data
• On the Data tab, click Text to Columns
• Ensure Delimited is selected and click Next
• Ensure Comma is selected and click Finish
What is pandas?
Data analysis and manipulation tool.
pandas is a fast, powerful, flexible and easy
to use open source data analysis and
manipulation tool, built on top of the Python
programming language. You can use it by
importing the library into your Python code.
pandas is usually paired with NumPy (a
scientific computing library for Python).
Note
From this module onwards, you can use the Google Colaboratory service to do the practices. It is similar to
Jupyter Notebook, and it is easy to set up. If you are not familiar with this service, please learn the
basics of using it first so that you have no problems following the module.
Initialize a dataset
There are two ways to initialize a dataset
You can either create your own dataset from scratch using pandas or import an existing dataset into your
code. To initialize a dataset from scratch, you can use the DataFrame class provided by pandas to hold your
dataset. We are going to focus on importing an existing dataset in this module. An imported dataset is
stored as a DataFrame as well.

DataFrame is the data structure used to store datasets in pandas.
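Although this module focuses on importing, creating a DataFrame from scratch can be sketched in a few lines. The column names and values below are made-up placeholders, not part of the module's data.

```python
import pandas as pd

# Placeholder data: each dictionary key becomes a column name,
# and each list holds that column's values, one per row.
data = {
    "item": ["pen", "book", "bag"],
    "price": [2, 10, 25],
}

# Build the DataFrame; pandas numbers the rows 0, 1, 2 automatically.
dataset = pd.DataFrame(data)
print(dataset)
```

Passing a dictionary of equal-length lists is only one of several ways to build a DataFrame; it is used here because it reads like a small table.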


Importing a CSV dataset
Import pandas and NumPy first, then import the dataset using the read_csv function from pandas. The
imported csv file will be automatically converted to a DataFrame object by pandas.
Importing a CSV dataset example
For example, this code imports a csv file named california_housing_train.csv and stores the contents in a
DataFrame assigned to the variable dataset.
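A minimal sketch of such an import is shown here. So that it runs without the file, it reads two placeholder rows from a string; the column names are assumed to match the copy of this file bundled with Colab, and the values are invented for illustration.

```python
import io

import numpy as np  # imported alongside pandas, as the module recommends
import pandas as pd

# Stand-in for the file on disk: a header line plus two placeholder rows.
csv_text = (
    "longitude,latitude,housing_median_age,total_rooms,total_bedrooms,"
    "population,households,median_income,median_house_value\n"
    "-120.0,35.0,20.0,1000.0,200.0,500.0,180.0,3.5,150000.0\n"
    "-118.0,34.0,30.0,2000.0,400.0,900.0,350.0,4.2,210000.0\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
dataset = pd.read_csv(io.StringIO(csv_text))
print(type(dataset))
```

With the real file, the call would take the file's path instead, for example pd.read_csv("california_housing_train.csv") if the file sits next to the notebook (the location is an assumption).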
Importing a CSV dataset preview
This is the preview of the dataset opened using notepad
View a dataset
Functions to view the dataset
dataset.shape → shows the dimensions of the DataFrame (number of rows, number of columns)
dataset.head() → shows the first five rows of the DataFrame
dataset.tail() → shows the last five rows of the DataFrame
dataset.columns → shows the column names of the DataFrame
dataset.describe() → shows a statistical summary of the DataFrame
Viewing a dataset example
This code displays the shape and the first 5 rows of the previously imported dataset using the .shape attribute and
the .head() function.
Viewing a dataset example continued
The output shows that the dataset has 17,000 rows and 9 columns, followed by the first 5 rows.
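The view functions can be tried on any DataFrame. The sketch below uses a small placeholder table (invented values) rather than the housing data, so the shapes it prints differ from the 17,000 rows mentioned above.

```python
import pandas as pd

# Placeholder data standing in for an imported CSV (8 rows, 2 columns).
dataset = pd.DataFrame({
    "rooms": [3, 5, 2, 4, 6, 1, 7, 8],
    "price": [120, 200, 90, 150, 260, 60, 310, 350],
})

print(dataset.shape)       # (rows, columns); an attribute, so no parentheses
print(dataset.head())      # first five rows
print(dataset.tail())      # last five rows
print(dataset.columns)     # column labels; also an attribute
print(dataset.describe())  # count, mean, std, min, quartiles and max per numeric column
```

Both head() and tail() accept a number, so dataset.tail(10) would return the last ten rows instead of five.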
Practice I
Import and view the dataset
ODD STUDENT ID
Import the dataset from the provided csv file named “SalesJan2009.csv” and match the output in Figure
1. Implement view functions to display the data.

EVEN STUDENT ID
Import the dataset from the provided csv file named “SacramentocrimeJanuary2006.csv” and match the
output in Figure 2. Implement view functions to display the data.
Figure 1
Figure 2
What is matplotlib?
Data visualization library for Python.
matplotlib is a comprehensive Python
library that is useful for creating static and
interactive data visualizations in 2D and 3D.
What's inside matplotlib?
You can try each of these functions (assuming matplotlib.pyplot is imported as plt):

plt.plot(): produces a line plot

plt.bar(): produces a bar chart

plt.hist(): produces a histogram

plt.pie(): produces a pie chart

plt.scatter(): produces a scatter plot

dropna(): a pandas method (not part of matplotlib) that removes missing values (NaN); zero values are kept

For more reference about this, click here.
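As a rough illustration of three of these functions together with dropna(), the sketch below draws a histogram, a pie chart, and a scatter plot side by side. All numbers and labels are invented placeholders.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Placeholder values: one entry is missing (NaN) and one is a genuine zero.
values = pd.Series([4.0, 7.0, np.nan, 0.0, 5.0, 9.0])

# dropna() belongs to pandas; it drops the NaN entry but keeps the zero.
clean = values.dropna()

plt.figure()

plt.subplot(1, 3, 1)                           # left panel
plt.hist(clean, bins=3)                        # histogram of the cleaned values

plt.subplot(1, 3, 2)                           # middle panel
plt.pie([50, 30, 20], labels=["A", "B", "C"])  # shares of a whole

plt.subplot(1, 3, 3)                           # right panel
plt.scatter([1, 2, 3, 4], [2, 4, 1, 3])        # one marker per (x, y) pair

# In Colab or Jupyter the figure is drawn when the cell finishes;
# in a plain Python script, call plt.show() at the end.
```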


Try to Visualize
Example
Code: Result:
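The original code and result are not reproduced here. A minimal line plot, assuming placeholder x and y values, could look like this:

```python
import matplotlib.pyplot as plt

# Placeholder data: x positions and the value measured at each one.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.figure()
plt.plot(x, y)  # draws a line through the (x, y) points

# In Colab or Jupyter the figure is drawn when the cell finishes;
# in a plain Python script, call plt.show() at the end.
```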
How to resize the chart?

Resize the figure.
Add this code before your plotting calls:
plt.figure(figsize=(W,H))
W = width in inches
H = height in inches

Example:
plt.figure(figsize=(16,10))
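Put together with a plot call, the resize step might be used as follows; the plotted numbers are placeholders.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; create the figure before plotting on it.
plt.figure(figsize=(16, 10))
plt.plot([1, 2, 3], [3, 1, 2])  # placeholder data drawn on the resized figure

# Read the size back to confirm it was applied.
width, height = plt.gcf().get_size_inches()
print(width, height)
```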
Use your data for visualization
Import your libraries and CSV dataset.
Information
Pay attention to your column names
when you reference them in your code.
(They are case-sensitive.)

Preview Your Data


This is the preview of the dataset opened using Excel
Example
With Variable: Without Variable:
Result
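The original screenshots are not reproduced here. One plausible reading of "with variable" versus "without variable" is sketched below: either store the columns in variables first, or index the DataFrame directly inside the plot call. The dataset and its column names are hypothetical.

```python
import io

import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for an imported CSV; the column names and values are hypothetical.
csv_text = "Country,TotalCases\nA,120\nB,80\nC,45\n"
dataset = pd.read_csv(io.StringIO(csv_text))

# With variables: store the columns first, then pass them to the plot call.
# Column names are case-sensitive: dataset["country"] would raise a KeyError.
countries = dataset["Country"]
cases = dataset["TotalCases"]
plt.figure()
plt.bar(countries, cases)

# Without variables: index the DataFrame directly inside the call.
plt.figure()
plt.bar(dataset["Country"], dataset["TotalCases"])
```

Both versions draw the same chart; variables mainly help when the same column is reused several times.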
Bonus
Broken Barh
Result

For more reference on Broken Barh, click here or here.
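A small sketch of a broken horizontal bar chart, using placeholder ranges and hypothetical row labels, is given here as one way to try plt.broken_barh.

```python
import matplotlib.pyplot as plt

plt.figure()

# Each (start, width) pair is one horizontal segment; gaps between segments stay empty.
# The second argument is the vertical band of the row: (bottom edge, height).
plt.broken_barh([(0, 3), (5, 2), (9, 4)], (10, 4))

# A second row, placed higher on the vertical axis.
plt.broken_barh([(1, 5), (8, 3)], (20, 4))

# Label each row at its vertical centre (names are hypothetical).
plt.yticks([12, 22], ["Task A", "Task B"])
```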
Stacked Bar Graph
Result

For more reference on stacked bar charts, click here.
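One common way to stack bars in matplotlib is the bottom parameter of plt.bar, sketched here with placeholder groups and values.

```python
import matplotlib.pyplot as plt

# Placeholder data: two series measured for the same three groups.
groups = ["G1", "G2", "G3"]
first = [5, 7, 3]
second = [2, 4, 6]

plt.figure()
plt.bar(groups, first, label="first")

# bottom= starts each bar of the second series where the first one ends.
plt.bar(groups, second, bottom=first, label="second")

plt.legend()  # shows which colour belongs to which series
```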
Tips n Trick
(Additional)
Add labels to your chart
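The original example is not reproduced here. A minimal sketch of labelling a chart with a title and axis labels, using placeholder data and wording, could be:

```python
import matplotlib.pyplot as plt

plt.figure()
plt.bar(["A", "B", "C"], [3, 7, 5])  # placeholder data

plt.title("Items per category")      # text above the chart
plt.xlabel("Category")               # text under the horizontal axis
plt.ylabel("Number of items")        # text beside the vertical axis
```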
Practice II
ODD STUDENT ID
Import the dataset from the provided csv file named “worldometer_data.csv”, then take the last 10 rows of the
dataset and visualize the total cases of COVID-19 for each country using a
bar chart.

EVEN STUDENT ID
Import the dataset from the provided csv file named “worldometer_data.csv”, then take the last 10 rows of the
dataset and visualize the total recovered cases of COVID-19 for each country
using a bar chart.
Submission
Screenshot & Submit your file
Screenshot your code and the output, then put them into a PDF file. After that,
upload the file to eCampus or the submission channel you were given. Follow your assistant's guidance
on the submission rules.
Questions?
References
California Housing Data Description

Machine Learning Terminologies

pandas User Guide

Sample Data

Covid-19 Cases Dataset from Kaggle

Ngodingdata.com
