Intro To Py and ML - Part 2
DATA ANALYTICS
OAU5362/DAM5362
May 2021
OUTCOMES & OUTLINE
OUTCOMES
At the end of this session, you will be able to:
• Design Python scripts to solve data analytics problems and visualize the results.
• Solve data management problems in Python.
OUTLINE
• Managing Empty Cells
• Importing and Exporting Data
• Managing Data
• Visualization
MANAGING EMPTY CELLS
import pandas as pd
import numpy as np
data = {'Name':['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
'Marks':[70, 65,np.nan, 82, 78, 75]}
score = pd.DataFrame(data)
print(score)
• Type and run (note: to use the mean function, import it first with from statistics import mean):
▪ print(sum(score['Marks']))
▪ print(mean(score['Marks']))
• Both calls return nan because of the empty cell. To resolve, drop the row that contains it:
score2 = score.dropna()
print(sum(score2['Marks']))
print(mean(score2['Marks']))
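As a sketch of two alternatives to dropna() (the names builtin_total, total, average and filled are illustrative and not from the slides): the pandas methods Series.sum() and Series.mean() skip NaN by default, and fillna() replaces the empty cell instead of dropping the whole row.

```python
import math

import numpy as np
import pandas as pd

score = pd.DataFrame({
    'Name': ['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
    'Marks': [70, 65, np.nan, 82, 78, 75],
})

# The built-in sum() propagates NaN, so the total is not usable.
builtin_total = sum(score['Marks'])
print(math.isnan(builtin_total))   # True

# pandas methods skip NaN by default (skipna=True).
total = score['Marks'].sum()
average = score['Marks'].mean()
print(total, average)

# Alternative to dropping the row: fill the gap with the column mean.
filled = score.fillna({'Marks': average})
print(filled)
```

Whether to drop or fill depends on the analysis; filling with the mean keeps all six students in the table but invents a mark for George, so it should be a deliberate choice.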
IMPORTING & EXPORTING DATA
• The most common way of getting data for analysis is to import a CSV or Excel dataset.
• Copy covid_my.csv into the working folder (working path), and code the following:
my = pd.read_csv("covid_my.csv")
• The import can be verified with the head function, which displays only the first n rows (5 by default) and is useful when the dataset is large:
my.head()
• Summary of data containing some statistical measurements can be retrieved by using describe function:
my.describe()
• To save a csv file use to_csv function:
my.to_csv("covid_my2.csv")
• read_excel() and to_excel() functions are used for xlsx files.
• Path is used for files that are not in the working directory:
pd.read_csv("C:/Users/mhilmi_hasan/OneDrive/cerdas/Py/covid_my.csv")
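The export and import steps can be checked together with a round trip. The following is a minimal sketch, assuming a small made-up table in place of covid_my.csv (the state names and counts are hypothetical) and a temporary folder as the output location; out_path and my_back are illustrative names.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for covid_my.csv; the values are made up.
my = pd.DataFrame({
    'State': ['A', 'B', 'C'],
    'Confirmed': [100, 250, 40],
    'Deaths': [1, 5, 0],
})

out_path = os.path.join(tempfile.mkdtemp(), 'covid_my2.csv')

# index=False keeps the row index out of the file, so reading it back
# does not produce an extra 'Unnamed: 0' column.
my.to_csv(out_path, index=False)

# Read the file back and inspect the first two rows.
my_back = pd.read_csv(out_path)
print(my_back.head(2))
print(list(my_back.columns))
```

Without index=False, to_csv also writes the row index as the first column, which is usually not wanted when the file is going to be re-imported.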
MANAGING DATA
• Type and run:
▪ my['Confirmed']
▪ my[7:16]
▪ max(my['Deaths'])
▪ min(my['Deaths'])
▪ mean(my['Confirmed'])
▪ max_deaths = max(my['Deaths'])
my.loc[my['Deaths']==max_deaths,'State']
▪ my2 = my.copy()  # copy, so that adding columns to my2 does not modify my
my2['Perc_Confirmed'] = round((my2['Confirmed']/my2['Population'])*100,2)
my2['Perc_Deaths'] = (my2['Deaths']/my2['Population'])*100
my2
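Plain assignment (my2 = my) does not copy a DataFrame; both names then refer to the same table. The sketch below, which uses a small made-up table in place of covid_my.csv (all values are hypothetical), shows the effect of .copy() and the pandas equivalents of the max, mean and look-up steps above. The name worst is illustrative.

```python
import pandas as pd

# Hypothetical stand-in for covid_my.csv; the values are made up.
my = pd.DataFrame({
    'State': ['A', 'B', 'C'],
    'Confirmed': [100, 250, 40],
    'Deaths': [1, 5, 0],
    'Population': [10000, 50000, 8000],
})

# .copy() gives an independent table, so 'my' keeps its original columns.
my2 = my.copy()
my2['Perc_Confirmed'] = (my2['Confirmed'] / my2['Population'] * 100).round(2)

print('Perc_Confirmed' in my.columns)    # False
print('Perc_Confirmed' in my2.columns)   # True

# pandas equivalents of the built-in max() and statistics.mean() calls.
print(my['Deaths'].max())
print(my['Confirmed'].mean())

# idxmax() returns the row label of the largest value; use it to find the state.
worst = my.loc[my['Deaths'].idxmax(), 'State']
print(worst)   # B
```

Note that idxmax() returns only the first row when several rows share the maximum, whereas the boolean filter my['Deaths']==max_deaths returns all of them.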
VISUALIZATION – SCATTER PLOT
• Scatter plot:
plt.xticks(rotation=90)
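A minimal sketch of a scatter plot with rotated tick labels, assuming matplotlib is imported as plt and using a small made-up table in place of covid_my.csv (the values are hypothetical). Plotting State on the x-axis against Confirmed on the y-axis is an assumption made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for covid_my.csv; the values are made up.
my = pd.DataFrame({
    'State': ['A', 'B', 'C'],
    'Confirmed': [100, 250, 40],
})

x = my['State']       # assumed x-axis column
y = my['Confirmed']   # assumed y-axis column

# One point per state; string values on x become a categorical axis.
points = plt.scatter(x, y)
plt.xticks(rotation=90)   # rotate labels so long state names do not overlap
plt.xlabel('State')
plt.ylabel('Confirmed')
plt.title('Confirmed Cases by State (illustrative data)')
```

Calling plt.show() afterwards displays the figure, as in the bar graph and boxplot examples.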
VISUALIZATION – BAR GRAPH & BOXPLOT
• Bar graph:
import matplotlib.pyplot as plt
x = my['State']
y = my['Confirmed']
plt.bar(x,y)
plt.xticks(rotation=90)
plt.title('Covid-19 Cases in MY by State')
plt.show()
• Boxplot:
plt.boxplot(my['Confirmed'])
plt.show()
VISUALIZATION – BAR GRAPH & BOXPLOT
• Create dataframe:
import pandas as pd
import numpy as np
data = {'Name':['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
'Marks1':[70, 65,77, 82, 78, 75],
'Marks2':[80, 81,77, 82, 10, 85],
'Marks3':[70, 65,77, 10, 82, 75]}
score = pd.DataFrame(data)
print(score)
• Multiple Boxplot:
score.boxplot(column=['Marks1','Marks2','Marks3'])
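• Outliers (here, the mark of 10 in Marks2 and in Marks3) are drawn as individual points beyond the whiskers.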
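By default, a matplotlib boxplot extends each whisker to the last data point within 1.5 × IQR of the quartiles and marks anything further out as an outlier. The same rule can be sketched numerically; iqr_outliers is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

score = pd.DataFrame({
    'Name': ['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
    'Marks1': [70, 65, 77, 82, 78, 75],
    'Marks2': [80, 81, 77, 82, 10, 85],
    'Marks3': [70, 65, 77, 10, 82, 75],
})


def iqr_outliers(series):
    """Return the values lying more than 1.5 * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    spread = 1.5 * (q3 - q1)
    return series[(series < q1 - spread) | (series > q3 + spread)]


for column in ['Marks1', 'Marks2', 'Marks3']:
    print(column, iqr_outliers(score[column]).tolist())
# Marks1 []
# Marks2 [10]
# Marks3 [10]
```

The flagged values match the points that appear beyond the whiskers in the multiple boxplot.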
VISUALIZATION – GEO & MAP
CASE STUDY