
Intro To Py and ML - Part 2

This document outlines the topics and objectives of a session on data analytics in Python. The session aims to teach participants how to:
1) Design Python scripts to solve data analytics problems and visualize results
2) Solve data management problems in Python
3) Import, export, manage, and visualize data using Python libraries like Pandas

The document provides code examples for common data analytics tasks like importing CSV data, managing empty values, selecting data by conditions, calculating statistics, and visualizing data using scatter plots, bar graphs, boxplots, and geospatial maps.

Uploaded by

KAORU Amane

Dr Mohd Hilmi Hasan

DATA ANALYTICS
OAU5362/DAM5362

May 2021
OUTCOMES & OUTLINE

OUTCOMES

At the end of this session, you will be able to:
• Design Python script to solve data analytics problems and visualize the results.
• Solve data management problems in Python.

OUTLINE

• Managing Empty Cells
• Importing and Exporting Data
• Managing Data
• Visualization

MANAGING EMPTY CELLS

• Create a data frame as follows:

import pandas as pd
import numpy as np
data = {'Name':['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
'Marks':[70, 65,np.nan, 82, 78, 75]}
score = pd.DataFrame(data)
print(score)

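• Type and run (note: the mean function must first be imported with: from statistics import mean):
▪ print(sum(score['Marks']))
▪ print(mean(score['Marks']))

• Both calls return nan because the Marks column contains an empty cell. To resolve, drop the rows with empty cells: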
score2 = score.dropna()
print(sum(score2['Marks']))
print(mean(score2['Marks']))
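
As an alternative to dropping rows, pandas' own Series methods skip empty cells by default, and fillna can replace them. The sketch below reuses the score data frame from this slide; filling the gap with the column mean is only an illustrative assumption, not a method taken from the slides.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
        'Marks': [70, 65, np.nan, 82, 78, 75]}
score = pd.DataFrame(data)

# Count the empty cells in each column
print(score.isna().sum())

# Series.sum() and Series.mean() skip NaN by default (skipna=True)
total = score['Marks'].sum()
average = score['Marks'].mean()
print(total, average)

# Assumption for illustration: replace the empty cell with the column mean
score_filled = score.fillna({'Marks': average})
print(score_filled)
```

Whether to drop or fill empty cells depends on the analysis; dropping removes George's row entirely, while filling keeps all six rows.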
IMPORTING & EXPORTING DATA

• The most common way of getting data for analysis is to import a CSV or Excel dataset.

• Python provides this capability through pandas functions.

• Copy covid_my.csv into the working folder (working path), and code the following:
my = pd.read_csv("covid_my.csv")

• To display the data, call the variable (data frame) name: my

• The data import can be verified with the head function, which displays only the first n observations (5 by default); this is useful when the data set is large:
my.head()

• Summary of data containing some statistical measurements can be retrieved by using describe function:
my.describe()

• To save a csv file use to_csv function:
my.to_csv("covid_my2.csv")

• read_excel() and to_excel() functions are used for xlsx files.

• Path is used for files that are not in the working directory:
pd.read_csv("C:/Users/mhilmi_hasan/OneDrive/cerdas/Py/covid_my.csv")
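
By default, to_csv also writes the row index as an unnamed first column, which shows up as an extra column when the file is read back. The sketch below shows a round trip with index=False; the small sample data frame and the file name sample.csv are hypothetical stand-ins for covid_my.csv.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for covid_my.csv (values are made up for illustration)
sample = pd.DataFrame({'State': ['A', 'B', 'C'],
                       'Confirmed': [120, 45, 300]})

# Write inside a temporary folder that is deleted when the block ends
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'sample.csv')

    # index=False keeps the row index out of the file
    sample.to_csv(path, index=False)

    # Read the file back to check the round trip
    restored = pd.read_csv(path)

print(restored.head())
print(list(restored.columns))
```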
MANAGING DATA
• Type and run:
▪ my['Confirmed']
▪ my[7:16]
▪ max(my['Deaths'])
▪ min(my['Deaths'])
▪ mean(my['Confirmed'])

• Selecting data by condition:


▪ my[my['State']=='Perak']
▪ my[my['Confirmed'] > 1000]
▪ my.loc[my['Confirmed']>1000, 'State']

▪ max_deaths = max(my['Deaths'])
my.loc[my['Deaths']==max_deaths,'State']

▪ my2 = my.copy() # copy, so that the new columns are not added to my as well
my2['Perc_Confirmed'] = round((my2['Confirmed']/my2['Population'])*100,2)
my2['Perc_Deaths'] = (my2['Deaths']/my2['Population'])*100
my2
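
The selections above can be extended in a few ways. The following is a minimal sketch on a made-up data frame: only the column names (State, Confirmed, Deaths, Population) follow the slides, while the values and the combined condition are assumptions for illustration.

```python
import pandas as pd

# Made-up values; only the column names follow the slides
my = pd.DataFrame({'State': ['A', 'B', 'C', 'D'],
                   'Confirmed': [1500, 800, 2400, 300],
                   'Deaths': [12, 3, 30, 1],
                   'Population': [100000, 50000, 200000, 40000]})

# Combine two conditions with & (each condition in its own parentheses)
busy = my[(my['Confirmed'] > 1000) & (my['Deaths'] > 10)]
print(busy)

# idxmax returns the row label of the largest value
worst_state = my.loc[my['Deaths'].idxmax(), 'State']
print(worst_state)

# Work on a copy so the original data frame is left unchanged
my2 = my.copy()
my2['Perc_Confirmed'] = (my2['Confirmed'] / my2['Population'] * 100).round(2)
print(my2)
```

Note that idxmax returns a single row even when several rows share the maximum, whereas comparing against max_deaths returns all of them.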
VISUALIZATION – SCATTER PLOT

• Scatter plot:

import matplotlib.pyplot as plt


x = my['State']
y = my['Confirmed']
plt.scatter(x, y)
plt.title('Covid-19 Cases in MY by State')
plt.show()

• To fix the overlapping labels on the x-axis (add before the show function):

plt.xticks(rotation=90)
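
For a script that runs outside a notebook, the same plot can be written to an image file instead of shown on screen. This is a sketch only: the state names, case counts, axis labels, and the file name cases_scatter.png are all assumptions, and the Agg backend is chosen so that no display is needed.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt

states = ['A', 'B', 'C', 'D']        # hypothetical state names
confirmed = [1500, 800, 2400, 300]   # made-up case counts

fig, ax = plt.subplots()
ax.scatter(states, confirmed)
ax.set_title('Covid-19 Cases in MY by State')
ax.set_xlabel('State')
ax.set_ylabel('Confirmed cases')
ax.tick_params(axis='x', labelrotation=90)  # rotate the x-axis labels
fig.tight_layout()                          # keep rotated labels inside the figure
fig.savefig('cases_scatter.png')
```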

VISUALIZATION – BAR GRAPH & BOXPLOT

• Bar graph:

x = my['State']
y = my['Confirmed']
plt.bar(x,y)
plt.xticks(rotation=90)
plt.title('Covid-19 Cases in MY by State')
plt.show()

• Boxplot:
plt.boxplot(my['Confirmed'])
plt.show()

VISUALIZATION – BAR GRAPH & BOXPLOT

• Create dataframe:

import pandas as pd
import numpy as np
data = {'Name':['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
'Marks1':[70, 65,77, 82, 78, 75],
'Marks2':[80, 81,77, 82, 10, 85],
'Marks3':[70, 65,77, 10, 82, 75]}
score = pd.DataFrame(data)
print(score)

• Multiple boxplot:
score.boxplot(column=['Marks1','Marks2','Marks3'])

[Figure: boxplots of Marks1, Marks2 and Marks3, with the outliers marked]
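
The points drawn as outliers can also be found numerically. The sketch below applies the 1.5 × IQR rule, which is matplotlib's default whisker setting, to the same score data; the loop and the variable names are illustrative and not part of the slides.

```python
import pandas as pd

data = {'Name': ['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
        'Marks1': [70, 65, 77, 82, 78, 75],
        'Marks2': [80, 81, 77, 82, 10, 85],
        'Marks3': [70, 65, 77, 10, 82, 75]}
score = pd.DataFrame(data)

outliers = {}
for column in ['Marks1', 'Marks2', 'Marks3']:
    q1 = score[column].quantile(0.25)
    q3 = score[column].quantile(0.75)
    iqr = q3 - q1                      # interquartile range
    lower = q1 - 1.5 * iqr             # lower fence
    upper = q3 + 1.5 * iqr             # upper fence
    mask = (score[column] < lower) | (score[column] > upper)
    outliers[column] = score.loc[mask, 'Name'].tolist()

print(outliers)
# {'Marks1': [], 'Marks2': ['Chan'], 'Marks3': ['Mike']}
```

Both flagged values are the marks of 10, which lie far below the lower fences of 71.75 (Marks2) and 50.875 (Marks3).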

VISUALIZATION – GEO & MAP

CASE STUDY

• Type and run the following code:

pip install folium

import pandas as pd
import matplotlib.pyplot as plt # importing plotting library
import folium # importing geospatial visualization library

# creating the map
my_map = folium.Map(location=[1.559580, 103.637489], zoom_start=6)
df = pd.read_csv("covid_my.csv")

# adding one marker per row, with the cases and deaths in its popup
df.apply(lambda cvd: folium.Marker(location=[cvd["Lat"], cvd["Long"]],
    popup=['Cases='+str(cvd["Confirmed"]), 'Deaths='+str(cvd["Deaths"])])
    .add_to(my_map), axis=1)

# display the map
my_map