Unit-4 PSC
Outline
Looping (Pandas)
Series
Data Frames
Accessing text, CSV, Excel files using pandas
Accessing SQL Database
Missing Data
Group By
Merging, Joining & Concatenating
Operations
Series
A Series is a one-dimensional array with axis labels.
It supports both integer and label-based indexing, but the index labels must be of a hashable type.
If we do not specify an index, pandas assigns a zero-based integer index.
Syntax
import pandas as pd
s = pd.Series(data,index,dtype,copy=False)
Parameters
data = array-like or iterable
index = array-like index
dtype = data type
copy = bool, default is False
pandasSeries.py
import pandas as pd
s = pd.Series([1, 3, 5, 7, 9, 11])
print(s)
Output
0     1
1     3
2     5
3     7
4     9
5    11
dtype: int64
Series (Cont.)
We can then access the elements of a Series just like an array, using square-bracket notation.
pdSeriesEle.py
import pandas as pd
s = pd.Series([1, 3, 5, 7, 9, 11])
print("S[0] = ", s[0])
b = s[0] + s[1]
print("Sum = ", b)
Output
S[0] =  1
Sum =  4
We can specify the data type of a Series using the dtype parameter. With dtype='str' the elements are stored as strings, so + concatenates them instead of adding them.
pdSeriesdtype.py
import pandas as pd
s = pd.Series([1, 3, 5, 7, 9, 11], dtype='str')
print("S[0] = ", s[0])
b = s[0] + s[1]
print("Sum = ", b)
Output
S[0] =  1
Sum =  13
Series (Cont.)
We can specify an index for a Series with the help of the index parameter.
pdSeriesdtype.py
import numpy as np
import pandas as pd
i = ['name','address','phone','email','website']
d = ['darshan','rj','123','[email protected]','darshan.ac.in']
s = pd.Series(data=d,index=i)
print(s)
Output
name             darshan
address               rj
phone                123
email      [email protected]
website    darshan.ac.in
dtype: object
Creating Time Series
We can use some of the built-in pandas date functions to create a time series.
pdSeriesEle.py
import numpy as np
import pandas as pd
dates = pd.to_datetime("27th of July, 2020")
i = dates + pd.to_timedelta(np.arange(5), unit='D')
d = [50,53,25,70,60]
time_series = pd.Series(data=d,index=i)
print(time_series)
Output
2020-07-27    50
2020-07-28    53
2020-07-29    25
2020-07-30    70
2020-07-31    60
dtype: int64
Data Frames
A data frame is a two-dimensional data structure, i.e. data is arranged in a tabular format of rows and columns.
A data frame also has labelled axes for its rows and columns.
Features of Data Frame :
It is size-mutable
Has labelled axes
Columns can be of different data types
We can perform arithmetic operations on rows and columns.
Structure :
       PDS  Algo  SE  INS
101
102
103
….
160
Data Frames (Cont.)
Syntax :
import pandas as pd
df = pd.DataFrame(data,index,columns,dtype,copy=False)
Parameters
data = array-like or iterable
index = array-like row index
columns = array-like column index
dtype = data type
copy = bool, default is False
Example :
pdDataFrame.py
import numpy as np
import pandas as pd
randArr = np.random.randint(0,100,20).reshape(5,4)
df = pd.DataFrame(randArr,np.arange(101,106,1),['PDS','Algo','SE','INS'])
print(df)
Output
     PDS  Algo  SE  INS
101    0    23  93   46
102   85    47  31   12
103   35    34   6   89
104   66    83  70   50
105   65    88  87   87
Data Frames (Cont.)
• Grabbing a column
dfGrabCol.py
import numpy as np
import pandas as pd
randArr = np.random.randint(0,100,20).reshape(5,4)
df = pd.DataFrame(randArr,np.arange(101,106,1),['PDS','Algo','SE','INS'])
print(df['PDS'])
Output
101     0
102    85
103    35
104    66
105    65
Name: PDS, dtype: int32
• Grabbing multiple columns
dfGrabMulCol.py
print(df[['PDS', 'SE']]) # pass a list of column names inside the brackets
Output
     PDS  SE
101    0  93
102   85  31
103   35   6
104   66  70
105   65  87
Data Frames (Cont.)
Grabbing a row
dfGrabRow.py
print(df.loc[101]) # using labels
#OR
print(df.iloc[0]) # using zero based index
Output
PDS      0
Algo    23
SE      93
INS     46
Name: 101, dtype: int32
Grabbing a single value
dfGrabSingle.py
print(df.loc[101, 'PDS']) # using labels
Output
0
Deleting a row
dfDelCol.py
df.drop(103,inplace=True) # row labels are integers here, so pass 103 rather than '103'
print(df)
Output
     PDS  Algo  SE  INS
101    0    23  93   46
102   85    47  31   12
104   66    83  70   50
105   65    88  87   87
Data Frames (Cont.)
Creating a new column
dfCreateCol.py
df['total'] = df['PDS'] + df['Algo'] + df['SE'] + df['INS']
print(df)
Output
     PDS  Algo  SE  INS  total
101    0    23  93   46    162
102   85    47  31   12    175
103   35    34   6   89    164
104   66    83  70   50    269
105   65    88  87   87    327
Deleting a column
dfDelCol.py
df.drop('total',axis=1,inplace=True)
print(df)
Output
     PDS  Algo  SE  INS
101    0    23  93   46
102   85    47  31   12
103   35    34   6   89
104   66    83  70   50
105   65    88  87   87
Data Frames (Cont.)
Getting a subset of a Data Frame
dfGrabSubSet.py
print(df.loc[[101,104], ['PDS','INS']]) # list of row labels, then list of column labels
Output
     PDS  INS
101    0   46
104   66   50
Conditional Selection
Similar to NumPy, we can do conditional selection in pandas.
dfCondSel.py
import numpy as np
import pandas as pd
np.random.seed(121)
randArr = np.random.randint(0,100,20).reshape(5,4)
df = pd.DataFrame(randArr,np.arange(101,106,1),['PDS','Algo','SE','INS'])
print(df)
print(df>50)
Output
     PDS  Algo  SE  INS
101   66    85   8   95
102   65    52  83   96
103   46    34  52   60
104   54     3  94   52
105   57    75  88   39
       PDS   Algo     SE    INS
101   True   True  False   True
102   True   True   True   True
103  False  False   True   True
104   True  False   True   True
105   True   True   True  False
Note : we call np.random.seed() with a seed of 121 so that the random numbers you generate match the ones shown here.
Conditional Selection (Cont.)
We can then use this boolean DataFrame to get the associated values.
dfCondSel.py
dfBool = df > 50
print(df[dfBool])
Output
      PDS  Algo    SE   INS
101  66.0  85.0   NaN  95.0
102  65.0  52.0  83.0  96.0
103   NaN   NaN  52.0  60.0
104  54.0   NaN  94.0  52.0
105  57.0  75.0  88.0   NaN
Note : pandas sets NaN (Not a Number) wherever the condition is False. Because NaN is a floating-point value, the affected columns are converted to float.
Setting/Resetting index
In our previous example the index does not have a name. If we want to give the index a name, we can assign it to the DataFrame.index.name property.
dfCondSel.py
df.index.name = 'RollNo' # name is a property, so assign to it rather than calling it
print(df)
Output
        PDS  Algo  SE  INS
RollNo
101      66    85   8   95
102      65    52  83   96
103      46    34  52   60
104      54     3  94   52
105      57    75  88   39
Note: Our index now has a name.
Setting/Resetting index (Cont.)
set_index(new_index)
dfCondSel.py
df.set_index('PDS') # pass inplace=True to modify df itself
Output
     Algo  SE  INS
PDS
66     85   8   95
65     52  83   96
46     34  52   60
54      3  94   52
57     75  88   39
Note: We have PDS as our index now.
reset_index()
dfCondSel.py
df.reset_index()
Output
   RollNo  PDS  Algo  SE  INS
0     101   66    85   8   95
1     102   65    52  83   96
2     103   46    34  52   60
3     104   54     3  94   52
4     105   57    75  88   39
Note: Our RollNo index becomes a new column, and we now have a zero-based numeric index.
Multi-Index DataFrame
Hierarchical indexes (also known as multi-indexes) help us organize, find, and aggregate information faster at almost no cost.
Example where we need hierarchical indexes:
Numeric Index / Single Index
   Col      Dep  Sem   RN  S1  S2  S3
0  ABC      CE     5  101  50  60  70
1  ABC      CE     5  102  48  70  25
2  ABC      CE     7  101  58  59  51
3  ABC      ME     5  101  30  35  39
4  ABC      ME     5  102  50  90  48
5  Darshan  CE     5  101  88  99  77
6  Darshan  CE     5  102  99  84  76
7  Darshan  CE     7  101  88  77  99
8  Darshan  ME     5  101  44  88  99
Multi Index
                  RN  S1  S2  S3
Col     Dep Sem
ABC     CE  5    101  50  60  70
            5    102  48  70  25
            7    101  58  59  51
        ME  5    101  30  35  39
            5    102  50  90  48
Darshan CE  5    101  88  99  77
            5    102  99  84  76
            7    101  88  77  99
        ME  5    101  44  88  99
Multi-Index DataFrame (Cont.)
Creating a multi-index is as simple as creating a single index with the set_index method. The only difference is that for a multi-index we provide a list of column names instead of a single string. Let's see an example:
dfMultiIndex.py
dfMulti = pd.read_csv('MultiIndexDemo.csv')
dfMulti.set_index(['Col','Dep','Sem'],inplace=True)
print(dfMulti)
Output
                  RN  S1  S2  S3
Col     Dep Sem
ABC     CE  5    101  50  60  70
            5    102  48  70  25
            7    101  58  59  51
        ME  5    101  30  35  39
            5    102  50  90  48
Darshan CE  5    101  88  99  77
            5    102  99  84  76
            7    101  88  77  99
        ME  5    101  44  88  99
Multi-Index DataFrame (Cont.)
Now we have a multi-indexed DataFrame from which we can access data using multiple index levels.
For example, the sub DataFrame for all the students of Darshan:
dfGrabDarshanStu.py
print(dfMulti.loc['Darshan'])
Output (Darshan)
          RN  S1  S2  S3
Dep Sem
CE  5    101  88  99  77
    5    102  99  84  76
    7    101  88  77  99
ME  5    101  44  88  99
The sub DataFrame for Computer Engineering students from Darshan:
dfGrabDarshanCEStu.py
print(dfMulti.loc['Darshan','CE'])
Output (Darshan->CE)
      RN  S1  S2  S3
Sem
5    101  88  99  77
5    102  99  84  76
7    101  88  77  99
Reading in Multiindexed DataFrame directly from CSV
The read_csv function of pandas provides an easy way to create a multi-indexed DataFrame directly while reading the CSV file.
dfMultiIndex.py
dfMultiCSV = pd.read_csv('MultiIndexDemo.csv',index_col=[0,1,2])
# for a multi-index in columns we can use the header parameter
print(dfMultiCSV)
Output
                  RN  S1  S2  S3
Col     Dep Sem
ABC     CE  5    101  50  60  70
            5    102  48  70  25
            7    101  58  59  51
        ME  5    101  30  35  39
            5    102  50  90  48
Darshan CE  5    101  88  99  77
            5    102  99  84  76
            7    101  88  77  99
        ME  5    101  44  88  99
Cross Sections in DataFrame
The xs() function is used to get a cross-section from a Series/DataFrame.
This method takes a key argument to select data at a particular level of a MultiIndex.
Syntax :
DataFrame.xs(key, axis=0, level=None, drop_level=True)
Parameters
key : label
axis : axis to retrieve the cross-section on
level : level of key
drop_level : False if you want to preserve the level
Example :
dfMultiIndex.py
dfMultiCSV = pd.read_csv('MultiIndexDemo.csv', index_col=[0,1,2])
print(dfMultiCSV)
print(dfMultiCSV.xs('CE',axis=0,level='Dep'))
Output
                  RN  S1  S2  S3
Col     Dep Sem
ABC     CE  5    101  50  60  70
            5    102  48  70  25
            7    101  58  59  51
        ME  5    101  30  35  39
            5    102  50  90  48
Darshan CE  5    101  88  99  77
            5    102  99  84  76
            7    101  88  77  99
        ME  5    101  44  88  99
              RN  S1  S2  S3
Col     Sem
ABC     5    101  50  60  70
        5    102  48  70  25
        7    101  58  59  51
Darshan 5    101  88  99  77
        5    102  99  84  76
        7    101  88  77  99
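The effect of drop_level can be tried without the CSV file. The sketch below builds a small frame inline (four rows copied from the table above; the variable names are illustrative, not from the slides) and takes the same 'CE' cross-section both ways.

```python
import pandas as pd

# Four rows copied from the slides' table, built inline so that
# no MultiIndexDemo.csv file is needed to try this out.
rows = [
    ("ABC", "CE", 5, 101, 50, 60, 70),
    ("ABC", "ME", 5, 101, 30, 35, 39),
    ("Darshan", "CE", 5, 101, 88, 99, 77),
    ("Darshan", "ME", 5, 101, 44, 88, 99),
]
df = pd.DataFrame(rows, columns=["Col", "Dep", "Sem", "RN", "S1", "S2", "S3"])
df_multi = df.set_index(["Col", "Dep", "Sem"])

# Cross-section on the inner 'Dep' level: the CE rows of every college.
# With the default drop_level=True the 'Dep' level is removed from the index.
ce_rows = df_multi.xs("CE", level="Dep")
print(ce_rows)

# drop_level=False keeps 'Dep' in the index of the result.
ce_rows_full = df_multi.xs("CE", level="Dep", drop_level=False)
print(ce_rows_full.index.names)
```

Selecting on an inner level like this is what loc cannot do directly, since loc works from the outermost level inwards.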
Dealing with Missing Data
There are many ways to deal with missing data; some of the most common are listed below.
dropna: drops (deletes) the rows or columns that contain missing data.
fillna: fills a specified value in place of missing data.
interpolate: estimates each missing value from its neighbouring values and fills in the estimate.
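A minimal sketch of the three methods on a small frame with two gaps. The marks and roll numbers are made up for illustration; only dropna, fillna and interpolate come from the slide.

```python
import numpy as np
import pandas as pd

# Hypothetical marks with one missing value in each column.
df = pd.DataFrame(
    {"PDS": [50.0, np.nan, 70.0, 80.0], "Algo": [60.0, 65.0, np.nan, 75.0]},
    index=[101, 102, 103, 104],
)

dropped = df.dropna()      # keep only the rows that have no missing value
filled = df.fillna(0)      # replace every missing value with 0
interp = df.interpolate()  # linear interpolation down each column

print(dropped)
print(filled)
print(interp)
```

Here dropna keeps only rows 101 and 104, while interpolate fills PDS for 102 with 60 (halfway between 50 and 70) and Algo for 103 with 70. Use dropna(axis=1) to drop columns instead of rows.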
Groupby in Pandas
Any groupby operation involves one of the following operations on the original object:
Splitting the object
Applying a function
Combining the results
In many situations, we split the data into sets and apply some functionality to each subset.
We can perform the following operations:
Aggregation − computing a summary statistic
Transformation − performing some group-specific operation
Filtration − discarding data based on some condition
Example : mean CPI per college
College  Enno  CPI
Darshan   123  8.9
Darshan   124  9.2
Darshan   125  7.8
Darshan   128  8.7
ABC       211  5.6
ABC       212  6.2
ABC       215  3.2
ABC       218  4.2
XYZ       312  5.2
XYZ       315  6.5
XYZ       315  5.8
Result
College  Mean CPI
Darshan  8.65
ABC      4.8
XYZ      5.83
Basic ways to use the groupby method:
df.groupby('key')
df.groupby(['key1','key2'])
df.groupby(key,axis=1) (the axis argument is deprecated in pandas 2.1 and later)
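The College/CPI table above can be reproduced in a few lines. This is a sketch with the data typed in inline; the variable names and the CPI threshold of 5 used for filtration are illustrative assumptions.

```python
import pandas as pd

# The College / Enno / CPI table from the slide, typed in inline.
df = pd.DataFrame({
    "College": ["Darshan"] * 4 + ["ABC"] * 4 + ["XYZ"] * 3,
    "Enno": [123, 124, 125, 128, 211, 212, 215, 218, 312, 315, 315],
    "CPI": [8.9, 9.2, 7.8, 8.7, 5.6, 6.2, 3.2, 4.2, 5.2, 6.5, 5.8],
})

# Aggregation: one summary value (the mean CPI) per college.
mean_cpi = df.groupby("College")["CPI"].mean()
print(mean_cpi.round(2))

# Transformation: a value per original row, here each student's
# difference from the mean CPI of their own college.
df["diff_from_mean"] = df["CPI"] - df.groupby("College")["CPI"].transform("mean")

# Filtration: keep only the colleges whose mean CPI is above 5
# (the threshold is an arbitrary choice for this sketch).
above_5 = df.groupby("College").filter(lambda g: g["CPI"].mean() > 5)
print(above_5["College"].unique())
```

Aggregation returns one row per group, transformation returns one row per original row, and filtration returns a subset of the original rows.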
Groupby in Pandas (Cont.)
Example : Listing all the groups
dfGroup.py
dfIPL = pd.read_csv('IPLDataSet.csv')
print(dfIPL.groupby('Year').groups)
Output
{2014: Int64Index([0, 2, 4, 9], dtype='int64'),
 2015: Int64Index([1, 3, 5, 10], dtype='int64'),
 2016: Int64Index([6, 8], dtype='int64'),
 2017: Int64Index([7, 11], dtype='int64')}
Groupby in Pandas (Cont.)
Example : Group by multiple columns
dfGroupMul.py
dfIPL = pd.read_csv('IPLDataSet.csv')
print(dfIPL.groupby(['Year','Team']).groups)
Output
{(2014, 'Devils'): Int64Index([2], dtype='int64'),
 (2014, 'Kings'): Int64Index([4], dtype='int64'),
 (2014, 'Riders'): Int64Index([0], dtype='int64'),
 ………
 ………
 (2016, 'Riders'): Int64Index([8], dtype='int64'),
 (2017, 'Kings'): Int64Index([7], dtype='int64'),
 (2017, 'Riders'): Int64Index([11], dtype='int64')}
Concatenation in Pandas
Concatenation basically glues DataFrames together.
Keep in mind that dimensions should match along the axis you are concatenating on.
You can use pd.concat and pass in a list of DataFrames to concatenate together:
dfConcat.py
dfCX = pd.read_csv('CX_Marks.csv',index_col=0)
dfCY = pd.read_csv('CY_Marks.csv',index_col=0)
dfCZ = pd.read_csv('CZ_Marks.csv',index_col=0)
dfAllStudent = pd.concat([dfCX,dfCY,dfCZ])
print(dfAllStudent)
Output
     PDS  Algo  SE
101   50    55  60
102   70    80  61
103   55    89  70
104   58    96  85
201   77    96  63
202   44    78  32
203   55    85  21
204   69    66  54
301   11    75  88
302   22    48  77
303   33    59  68
304   44    55  62
Note : We can use the axis=1 parameter to concatenate columns.
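To see the difference between the default row-wise concatenation and axis=1, here is a small sketch that uses inline frames instead of the CSV files. The frame names and the reduced data are assumptions for illustration.

```python
import pandas as pd

# Two small class frames with the same columns (values taken from the slide).
df_cx = pd.DataFrame({"PDS": [50, 70], "Algo": [55, 80]}, index=[101, 102])
df_cy = pd.DataFrame({"PDS": [77, 44], "Algo": [96, 78]}, index=[201, 202])

# Default axis=0: rows are stacked, columns are matched by name.
all_rows = pd.concat([df_cx, df_cy])
print(all_rows)

# axis=1: columns are placed side by side, rows are matched by index label.
df_se = pd.DataFrame({"SE": [60, 61]}, index=[101, 102])
with_se = pd.concat([df_cx, df_se], axis=1)
print(with_se)
```

With axis=1, index labels that exist in only one frame produce NaN in the other frame's columns, which is why the dimensions should match along the concatenation axis.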
Join in Pandas
The df.join() method efficiently joins multiple DataFrame objects by index (or by a specified column).
Some of the important parameters :
dfOther : the right DataFrame
on (not recommended) : the column on which we want to join (default is the index)
how : how to handle the operation of the two objects.
left: use the calling frame’s index (default).
right: use dfOther’s index.
outer: form the union of the calling frame’s index (or column, if on is specified) with dfOther’s index, and sort it lexicographically.
inner: form the intersection of the calling frame’s index (or column, if on is specified) with dfOther’s index, preserving the order of the calling frame’s index.
Join in Pandas (Example)
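A minimal sketch of how the how parameter changes the result. The two frames and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical frames that share only roll number 101 in their index.
left = pd.DataFrame({"Name": ["Abc", "Xyz", "Def"]}, index=[101, 102, 103])
right = pd.DataFrame({"PDS": [50, 60]}, index=[101, 104])

left_join = left.join(right)                # how='left' is the default
inner_join = left.join(right, how="inner")  # only labels present in both
outer_join = left.join(right, how="outer")  # union of labels, sorted

print(left_join)
print(inner_join)
print(outer_join)
```

The left join keeps 101, 102 and 103 and fills PDS with NaN where right has no matching label; the inner join keeps only 101; the outer join keeps 101 to 104.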
Merge in Pandas (Example)
dfMerge.py
m1 = pd.read_csv('Merge1.csv')
print(m1)
m2 = pd.read_csv('Merge2.csv')
print(m2)
m3 = m1.merge(m2,on='EnNo')
print(m3)
Output
   RollNo      EnNo Name
0     101  11112222  Abc
1     102  11113333  Xyz
2     103  22224444  Def
       EnNo  PDS  INS
0  11112222   50   60
1  11113333   60   70
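The merged frame itself can be checked with inline copies of the two tables shown in the output, which avoids needing Merge1.csv and Merge2.csv. The m3_left variant is an extra illustration that is not in the slides.

```python
import pandas as pd

# Inline copies of the two tables printed above.
m1 = pd.DataFrame({
    "RollNo": [101, 102, 103],
    "EnNo": [11112222, 11113333, 22224444],
    "Name": ["Abc", "Xyz", "Def"],
})
m2 = pd.DataFrame({
    "EnNo": [11112222, 11113333],
    "PDS": [50, 60],
    "INS": [60, 70],
})

# merge() does an inner join by default: only EnNo values found in
# both frames survive, so roll number 103 is dropped.
m3 = m1.merge(m2, on="EnNo")
print(m3)

# how='left' keeps every row of m1 and fills the missing marks with NaN.
m3_left = m1.merge(m2, on="EnNo", how="left")
print(m3_left)
```

Unlike join, merge matches on column values rather than on the index, which is why the on parameter is the normal way to use it.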
Read CSV in Pandas
read_csv() is used to read a Comma Separated Values (CSV) file into a pandas DataFrame.
Some of the important parameters :
filePath : str, path object, or file-like object
sep : separator (default is comma)
header : row number(s) to use as the column names.
index_col : index column(s) of the data frame.
readCSV.py
dfINS = pd.read_csv('Marks.csv',index_col=0,header=0)
print(dfINS)
Output
     PDS  Algo  SE   INS
101   50    55  60  55.0
102   70    80  61  66.0
103   55    89  70  77.0
104   58    96  85  88.0
201   77    96  63  66.0
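Because read_csv accepts any file-like object, the parameters can be tried on an in-memory string. The sketch below assumes a semicolon-separated file; the content and column names are made up.

```python
import io

import pandas as pd

# Hypothetical file content that uses ';' instead of ',' as the separator.
csv_text = "RollNo;PDS;Algo\n101;50;55\n102;70;80\n"

# io.StringIO wraps the string in a file-like object.
df = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0, header=0)
print(df)
```

Here header=0 takes the column names from the first row, and index_col=0 turns the RollNo column into the index.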
Read Excel in Pandas
read_excel() reads an Excel file into a pandas DataFrame.
It supports the xls, xlsx, xlsm, xlsb, odf, ods and odt file extensions, read from a local filesystem or a URL. It can read a single sheet or a list of sheets.
Some of the important parameters :
excelFile : str, bytes, ExcelFile, xlrd.Book, path object, or file-like object
sheet_name : sheet number (integer) or sheet name; can also be a list of sheets.
index_col : index column of the data frame.
Read from MySQL Database
We need two libraries for this:
conda install sqlalchemy
conda install pymysql
After installing both libraries, import create_engine from sqlalchemy and import pymysql.
importsForDB.py
from sqlalchemy import create_engine
import pymysql
Then create a database connection string and create an engine from it.
createEngine.py
db_connection_str = 'mysql+pymysql://username:password@host/dbname'
db_connection = create_engine(db_connection_str)
Read from MySQL Database (Cont.)
After getting the engine, we can run any SQL query using the pd.read_sql method.
read_sql is a generic method which can be used to read from any SQL database (MySQL, MSSQL, Oracle, etc.).
readSQLDemo.py
df = pd.read_sql('SELECT * FROM cities', con=db_connection)
print(df)
Output
   CityID   CityName             CityDescription CityCode
0       1     Rajkot     Rajkot Description here      RJT
1       2  Ahemdabad  Ahemdabad Description here      ADI
2       3      Surat      Surat Description here      SRT
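To try read_sql without a MySQL server, the same call can be pointed at an in-memory SQLite database from Python's standard library. This sketch swaps SQLite in for MySQL; the reduced cities table is an assumption for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the MySQL server of the slides.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cities (CityID INTEGER, CityName TEXT, CityCode TEXT)")
con.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [(1, "Rajkot", "RJT"), (3, "Surat", "SRT")],
)
con.commit()

# The query result comes back as a DataFrame, one column per SQL column.
df = pd.read_sql("SELECT * FROM cities", con=con)
con.close()
print(df)
```

Only the connection object changes between databases; the read_sql call and the resulting DataFrame work the same way.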
Web Scraping using Beautiful Soup
Beautiful Soup is a library that makes it easy to scrape information from web pages.
It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.
webScrap.py
import requests
import bs4
req = requests.get('https://www.indusuni.ac.in')
soup = bs4.BeautifulSoup(req.text,'lxml')
allFaculty = soup.select('body > main > section:nth-child(5) > div > div > div.col-lg-8.col-xl-9 > div > div')
for fac in allFaculty :
    allSpans = fac.select('h2>a')
    print(allSpans[0].text.strip())
Output
Dr. Gopi Sanghani
Dr. Nilesh Gambhava
Dr. Pradyumansinh Jadeja
Prof. Hardik Doshi
Prof. Maulik Trivedi
Prof. Dixita Kagathara
Prof. Firoz Sherasiya
Prof. Rupesh Vaishnav
Prof. Swati Sharma
Prof. Arjun Bala
Prof. Mayur Padia
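The select-then-strip pattern can be tried offline on a small HTML string. The markup, class name and faculty names below are made up for illustration, and the built-in html.parser is used here instead of lxml so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; a real page would come from requests.get(...).text
html = """
<main>
  <div class="faculty"><h2><a href="#"> Prof. A </a></h2></div>
  <div class="faculty"><h2><a href="#">Prof. B</a></h2></div>
</main>
"""

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector and returns every matching tag;
# .text gives the tag's text and strip() removes surrounding whitespace.
names = [a.text.strip() for a in soup.select("div.faculty h2 > a")]
print(names)
```

CSS selectors tied to the exact page layout, like the long one in webScrap.py, stop matching when the site changes, so shorter class-based selectors tend to be more robust.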
Looping
Introduction to MatPlotLib
Graph
Plot
Drawing Multiple Lines and Plots
Export graphs/plots to Image/PDF/SVG
Axis, Ticks and Grids
Line Appearance
Labels, Annotation, Legends
Types of Graphs
Pie Chart
Bar Chart
Histograms
Boxplots
Scatterplots
Time Series
Plotting Geographical data
Introduction to MatPlotLib
Most people understand information better when they see it in graphic rather than textual format.
Graphics help people see relationships and make comparisons with greater ease.
Fortunately, Python makes the task of converting textual data into graphics relatively easy using libraries; one of the most commonly used libraries for this is MatPlotLib.
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Graph
A graph or chart is simply a visual representation of numeric data.
MatPlotLib can draw a large number of graph and chart types.
We can choose any of the common graphs, such as line charts, histograms, scatter plots, etc.
Examples: Line Chart, Histogram, Scatter Plot, 3D Plot, Images, Bar Chart, Pie Chart, etc.
Plot
To define a plot, we need some values, the matplotlib.pyplot module, and an idea of what we want to display.
plotDemo1.py
import matplotlib.pyplot as plt
%matplotlib inline
values = [5,8,9,4,1,6,7,2,3,8]
plt.plot(range(1,11),values)
plt.show()
In this case, the code tells the plt.plot() function to create a plot whose x values run from 1 to 10 (range(1,11)) and whose y values come from the values list.
Note : %matplotlib inline is a Jupyter/IPython magic command; omit it when running the code as a plain Python script.
Plot – Drawing multiple lines
We can draw multiple lines in a plot by making multiple plt.plot() calls.
plotDemo1.py
import matplotlib.pyplot as plt
%matplotlib inline
values1 = [5,8,9,4,1,6,7,2,3,8]
values2 = [8,3,2,7,6,1,4,9,8,5]
plt.plot(range(1,11),values1)
plt.plot(range(1,11),values2)
plt.show()
Plot – Export graphs/plots
We can export/save our plots to a drive using the savefig() method.
plotDemo1.py
import matplotlib.pyplot as plt
%matplotlib inline
values1 = [5,8,9,4,1,6,7,2,3,8]
values2 = [8,3,2,7,6,1,4,9,8,5]
plt.plot(range(1,11),values1)
plt.plot(range(1,11),values2)
#plt.show()
plt.savefig('SaveToPath.png',format='png')
This writes the plot to the file SaveToPath.png.
Possible values for the format parameter are
png
svg
pdf
Etc...
Plot – Axis, Ticks and Grid
We can access and format the axes, ticks and grid of the plot through the Axes object returned by the axes() function of matplotlib.pyplot.
plotDemo1.py
import matplotlib.pyplot as plt
%matplotlib notebook
values = [5,8,9,4,1,6,7,2,3,8]
ax = plt.axes()
ax.set_xlim([0,50])
ax.set_ylim([-10,10])
ax.set_xticks([0,5,10,15,20,25,30,35,40,45,50])
ax.set_yticks([-10,-8,-6,-4,-2,0,2,4,6,8,10])
ax.grid()
plt.plot(range(1,11),values)
Plot – Line Appearance
When a plot has multiple lines, we need different line styles to tell them apart. We can achieve this using many parameters, some of which are listed below.
Line style (linestyle or ls)
Line width (linewidth or lw)
Line color (color or c)
Markers (marker)
plotDemo1.py
import matplotlib.pyplot as plt
%matplotlib inline
values1 = [5,8,9,4,1,6,7,2,3,8]
values2 = [8,3,2,7,6,1,4,9,8,5]
plt.plot(range(1,11),values1,c='r',lw=1,ls='--',marker='>')
plt.plot(range(1,11),values2,c='b',lw=2,ls=':',marker='o')
plt.show()
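Plot – Line Appearance (Cont.)
Commonly used values for each parameter include:
linestyle : '-' (solid), '--' (dashed), '-.' (dash-dot), ':' (dotted)
color : 'b', 'g', 'r', 'c', 'm', 'y', 'k', 'w'
marker : 'o', '.', '^', 'v', '<', '>', 's', '*', '+', 'x'
linewidth : any positive number (width in points)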
Plot – Labels, Annotation and Legends
To fully document our graph, we have to resort to labels, annotations and legends.
Each of these elements has a different purpose, as follows:
Label : provides identification of a particular data element or grouping; it makes it easy for the viewer to know the name or kind of data illustrated.
Annotation : augments the information the viewer can immediately see about the data with notes, sources or other useful information.
Legend : presents a listing of the data groups within the graph and often provides cues (such as line type or color) to identify the line with the data.
[Figure: sample plot marking the X label, Y label, annotation and legend]
Plot – Labels, Annotation and Legends (Example)
plotDemo1.py
import matplotlib.pyplot as plt
%matplotlib inline
values1 = [5,8,9,4,1,6,7,2,3,8]
values2 = [8,3,2,7,6,1,4,9,8,5]
plt.plot(range(1,11),values1)
plt.plot(range(1,11),values2)
plt.xlabel('Roll No')
plt.ylabel('CPI')
plt.annotate('Lowest CPI',xy=(5,1)) # text is passed first; the old s= keyword no longer exists in current Matplotlib
plt.legend(['CX','CY'],loc=4)
plt.show()
Choosing the Right Graph
The kind of graph we choose determines how people view the associated data, so choosing the right graph from the outset is important.
For example,
If we want to show how various data elements contribute towards a whole, we should use a pie chart.
If we want to compare data elements, we should use a bar chart.
If we want to show the distribution of elements, we should use histograms.
If we want to depict groups in elements, we should use boxplots.
If we want to find patterns in data, we should use scatterplots.
If we want to display trends over time, we should use a line chart.
If we want to display geographical data, we should use basemap.
If we want to display a network, we should use networkx.
All the above graphs are in our syllabus and we are going to cover all of them in this Unit.
Pie Chart
A pie chart focuses on showing parts of a whole: the entire pie is 100 percent, and the question is how much of that percentage each value occupies.
pieChartDemo.py
import matplotlib.pyplot as plt
%matplotlib notebook
values = [305,201,805,35,436]
l = ['Food','Travel','Accommodation','Misc','Shopping']
c = ['b','g','r','c','m']
e = [0,0.2,0,0,0]
plt.pie(values,colors=c,labels=l,explode=e)
plt.show()
Pie Chart (Cont.)
There are lots of other options available for the pie chart; this slide covers two important parameters, shadow and autopct.
pieChartDemo.py
import matplotlib.pyplot as plt
%matplotlib notebook
values = [305,201,805,35,436]
l = ['Food','Travel','Accommodation','Misc','Shopping']
c = ['b','g','r','c','m']
plt.pie(values,colors=c,labels=l,shadow=True,
        autopct='%1.1f%%')
plt.show()
Bar charts
Bar charts make comparing values easy; wide bars and segregated measurements emphasize the difference between values, rather than the flow of one value to another as a line graph does.
barChartDemo.py
import matplotlib.pyplot as plt
%matplotlib notebook
x = [1,2,3,4,5]
y = [5.9,6.2,3.2,8.9,9.7]
l = ['1st','2nd','3rd','4th','5th']
c = ['b','g','r','c','m']
w = [0.5,0.6,0.3,0.8,0.9]
plt.title('Sem wise SPI')
plt.bar(x,y,color=c,tick_label=l,width=w) # tick_label shows each name under its bar
plt.show()
Histograms
Histograms categorize data by breaking it into bins, where each bin contains a subset of the data range.
A histogram then displays the number of items in each bin so that you can see the distribution of data and the progression of data from bin to bin.
histDemo.py
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook
cpis = np.random.randint(0,10,100)
plt.hist(cpis,bins=10,
         histtype='stepfilled',align='mid',label='CPI Hist')
plt.legend()
plt.show()
Boxplots
Boxplots provide a means of depicting groups of numbers through their quartiles.
Quartiles are the three points that divide a group into four equal parts.
In a boxplot, the data is divided into 4 parts using these 3 points (25th percentile, median, 75th percentile).
[Figure: boxplot anatomy drawn on an axis from -5 to 5]
Q1 : 25th percentile
Q2 : 50th percentile (median)
Q3 : 75th percentile
Interquartile Range (IQR) : Q3 – Q1
"Minimum" : Q1 – 1.5 * IQR
"Maximum" : Q3 + 1.5 * IQR
Boxplot (Cont.)
A boxplot is basically used to detect outliers in the data. Let's see an example where we need a boxplot.
We have a dataset of the time taken to check a paper, and we want to find the faculty members who take either much more or much less time than the others.
boxDemo.py
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
timetaken = pd.Series([50,45,52,63,70,21,56,68,54,57,35,
                       62,65,92,32])
plt.boxplot(timetaken)
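The outlier limits that the boxplot draws can also be computed directly. The sketch below applies the Q1 – 1.5 * IQR and Q3 + 1.5 * IQR rule to the same timetaken values with NumPy; the variable names are illustrative.

```python
import numpy as np

# Same values as the timetaken Series in boxDemo.py.
timetaken = np.array([50, 45, 52, 63, 70, 21, 56, 68, 54, 57, 35, 62, 65, 92, 32])

# Quartiles: the 25th, 50th and 75th percentiles.
q1, q2, q3 = np.percentile(timetaken, [25, 50, 75])
iqr = q3 - q1

# Anything outside these two limits is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = timetaken[(timetaken < lower) | (timetaken > upper)]

print("Q1 =", q1, "Median =", q2, "Q3 =", q3, "IQR =", iqr)
print("Limits =", lower, upper)
print("Outliers =", outliers)
```

For this data Q1 is 47.5, the median is 56 and Q3 is 64, so the limits are 22.75 and 88.75, and the values 21 and 92 fall outside them.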
Scatter Plot (Cont.)
To find a specific pattern in the data, we can further divide the data and draw a scatter plot for each part.
We can do this with the help of the groupby method of DataFrame, using tuple unpacking while looping over the groups.
histDemo.py
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
df = pd.read_csv('insurance.csv')
grouped = df.groupby('smoker') # a single string key keeps each group key a plain string
for key, group in grouped:
    plt.scatter(group['bmi'],
                group['charges'],
                label='Smoke = '+key)
plt.legend()
plt.show()
Note : we can specify the marker, color, and size of the marker with the help of the marker, color and s parameters respectively.
Time Series
Observations over time can be considered as a Time Series.
Visualization plays an important role in time series analysis and forecasting.
Time Series plots can provide valuable diagnostics to identify temporal structures like trends, cycles, and seasonality.
In order to create a Time Series we first need a date range, which can be created with the help of the datetime and pandas libraries.
timeDemo.py
import pandas as pd
import datetime as dt
start_date = dt.datetime(2020,8,28)
end_date = dt.datetime(2020,9,5) # write 5, not 05: a leading zero is a syntax error in Python 3
daterange = pd.date_range(start_date,end_date)
print(daterange)
OUTPUT
DatetimeIndex(['2020-08-28', '2020-08-29', '2020-08-30', '2020-08-31',
               '2020-09-01', '2020-09-02', '2020-09-03', '2020-09-04',
               '2020-09-05'],
              dtype='datetime64[ns]', freq='D')
Time Series (Cont.)
We can use some more parameters of the date_range() function, such as
freq, to specify the frequency at which we want the date range (default is ‘D’ for days)
periods, the number of periods to generate between start and end, or from start at the given freq.
We can also create a date range from a start date, periods and freq, for example
timeDemo.py
import pandas as pd
import datetime as dt
start_date = dt.datetime(2020,8,25)
daterange = pd.date_range(start_date,freq='D',periods=10)
print(daterange)
OUTPUT
DatetimeIndex(['2020-08-25', '2020-08-26', '2020-08-27', '2020-08-28',
               '2020-08-29', '2020-08-30', '2020-08-31', '2020-09-01',
               '2020-09-02', '2020-09-03'],
              dtype='datetime64[ns]', freq='D')
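Other values of freq work the same way. As a small sketch, '2D' steps two days at a time and 'B' generates business days only; the start dates are chosen here purely for illustration.

```python
import pandas as pd

# Every second day, four periods starting on 25 August 2020.
every_other_day = pd.date_range("2020-08-25", periods=4, freq="2D")
print(every_other_day)

# Business days only: 28 August 2020 is a Friday, so the weekend is skipped.
business_days = pd.date_range("2020-08-28", periods=3, freq="B")
print(business_days)
```

The first range contains 25, 27, 29 and 31 August; the second contains Friday 28 August, Monday 31 August and Tuesday 1 September.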
NetworkX
We can use the networkx library to work with any kind of network, including social networks, railway networks, road connectivity, etc.
Install (using either pip or conda)
pip install networkx
conda install networkx
Types of network graph
Undirected
Directed
Weighted graph
NetworkX (example)
networkxDemo.py
import networkx as nx
g = nx.Graph() # undirected graph
g.add_edge('rajkot','junagadh')
g.add_edge('junagadh','porbandar')
g.add_edge('rajkot','jamnagar')
g.add_edge('jamnagar','bhanvad')
g.add_edge('bhanvad','porbandar')
nx.draw(g,with_labels=True)
networkxDemo.py
import networkx as nx
gD = nx.DiGraph() # directed graph
gD.add_edge('Modi','Arjun')
gD.add_edge('Modi','GambhavaSir')
gD.add_edge('GambhavaSir','Modi')
nx.draw(gD, with_labels=True)
NetworkX (cont.)
We can use many analysis functions available in the NetworkX library; some of them are listed below.
nx.shortest_path(g,'rajkot','porbandar')
Will return ['rajkot', 'junagadh', 'porbandar']
nx.clustering(g)
Will return the clustering value for each node.
nx.degree_centrality(g)
Will return the degree centrality for each node; we can find the most popular/influential node using this method.
nx.density(g)
Will return the density of the graph.
The density is 0 for a graph without edges and 1 for a complete graph.
nx.info(g)
Returns a summary of information for the graph g.
The summary includes the number of nodes and edges, and their average degree.
Note : nx.info() was removed in NetworkX 3.0; in those versions print(g) reports the number of nodes and edges.
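These functions can be run on the road network from the earlier example. The sketch below rebuilds that graph with add_edges_from and calls three of the functions listed; no drawing is needed.

```python
import networkx as nx

# The same undirected road network as in networkxDemo.py.
g = nx.Graph()
g.add_edges_from([
    ("rajkot", "junagadh"),
    ("junagadh", "porbandar"),
    ("rajkot", "jamnagar"),
    ("jamnagar", "bhanvad"),
    ("bhanvad", "porbandar"),
])

# Route with the fewest edges between two cities.
path = nx.shortest_path(g, "rajkot", "porbandar")
print(path)

# Density: actual edges divided by the maximum possible number of edges.
density = nx.density(g)
print(density)

# Degree centrality: each node's degree divided by (number of nodes - 1).
centrality = nx.degree_centrality(g)
print(centrality)
```

The five cities form a ring, so every node has degree 2 and the same degree centrality of 0.5, and the density is 5 edges out of a possible 10.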
18. Python - GUI Programming (Tkinter)
import tkinter # the module is named tkinter in Python 3 (Tkinter in Python 2)
top = tkinter.Tk()
# Code to add widgets will go here...
top.mainloop()
Python - Tkinter Button
The Button widget is used to add buttons in a Python application. These
buttons can display text or images that convey the purpose of the buttons.
You can attach a function or a method to a button, which is called
automatically when you click the button.
Syntax:
w = Button ( master, option=value, ... )
Parameters:
• master: This represents the parent window.
• options: Here is the list of most commonly used options for this widget.
These options can be used as key-value pairs separated by commas.
Example:
import tkinter
from tkinter import messagebox # tkMessageBox in Python 2
top = tkinter.Tk()
def helloCallBack():
    messagebox.showinfo("Hello Python", "Hello World")
B = tkinter.Button(top, text="Hello", command=helloCallBack)
B.pack()
top.mainloop()
Python - Tkinter Canvas
The Canvas is a rectangular area intended for drawing pictures or other
complex layouts. You can place graphics, text, widgets, or frames on a
Canvas.
Syntax:
w = Canvas ( master, option=value, ... )
• Parameters:
• master: This represents the parent window.
• options: Here is the list of most commonly used options for this widget.
These options can be used as key-value pairs separated by commas.
The Canvas widget can support the following standard items:
• arc . Creates an arc item.
coord = 10, 50, 240, 210
arc = canvas.create_arc(coord, start=0, extent=150, fill="blue")
• image . Creates an image item, which can be an instance of either the BitmapImage or the PhotoImage classes.
filename = PhotoImage(file = "sunshine.gif")
image = canvas.create_image(50, 50, anchor=NE, image=filename)
• line . Creates a line item.
line = canvas.create_line(x0, y0, x1, y1, ..., xn, yn, options)
• oval . Creates a circle or an ellipse at the given coordinates.
oval = canvas.create_oval(x0, y0, x1, y1, options)
• polygon . Creates a polygon item that must have at least three vertices.
polygon = canvas.create_polygon(x0, y0, x1, y1,...xn, yn, options)
Example:
import tkinter
top = tkinter.Tk()
C = tkinter.Canvas(top, bg="blue", height=250, width=300)
coord = 10, 50, 240, 210
arc = C.create_arc(coord, start=0, extent=150, fill="red")
C.pack()
top.mainloop()
Python - Tkinter Checkbutton
The Checkbutton widget is used to display a number of options to a user
as toggle buttons. The user can then select one or more options by clicking
the button corresponding to each option.
You can also display images in place of text.
Syntax:
w = Checkbutton ( master, option, ... )
• Parameters:
• master: This represents the parent window.
• options: Here is the list of most commonly used options for this widget.
These options can be used as key-value pairs separated by commas.
Example:
from tkinter import *
top = Tk()
CheckVar1 = IntVar()
CheckVar2 = IntVar()
C1 = Checkbutton(top, text="Music", variable=CheckVar1,
                 onvalue=1, offvalue=0, height=5, width=20)
C2 = Checkbutton(top, text="Video", variable=CheckVar2,
                 onvalue=1, offvalue=0, height=5, width=20)
C1.pack()
C2.pack()
top.mainloop()
Python - Tkinter Entry:
• The Entry widget is used to accept single-line text strings from a user.
• If you want to display multiple lines of text that can be edited, then you should use the Text widget.
• If you want to display one or more lines of text that cannot be modified by the user, then you should use the Label widget.
Syntax:
Here is the simple syntax to create this widget:
w = Entry( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: Here is the list of most commonly used options for this widget.
These options can be used as key-value pairs separated by commas.
Example:
from tkinter import *
top = Tk()
L1 = Label(top, text="User Name")
L1.pack(side=LEFT)
E1 = Entry(top, bd=5)
E1.pack(side=RIGHT)
top.mainloop()
Python - Tkinter Frame
• The Frame widget is very important for the process of grouping and
organizing other widgets in a somehow friendly way. It works like a
container, which is responsible for arranging the position of other widgets.
• It uses rectangular areas in the screen to organize the layout and to
provide padding of these widgets. A frame can also be used as a
foundation class to implement complex widgets.
Syntax:
Here is the simple syntax to create this widget:
w = Frame ( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
root = Tk()
frame = Frame(root)
frame.pack()
bottomframe = Frame(root)
bottomframe.pack(side=BOTTOM)
# The first three buttons share the top frame, side by side.
redbutton = Button(frame, text="Red", fg="red")
redbutton.pack(side=LEFT)
brownbutton = Button(frame, text="Brown", fg="brown")
brownbutton.pack(side=LEFT)
bluebutton = Button(frame, text="Blue", fg="blue")
bluebutton.pack(side=LEFT)
# The last button sits alone in the bottom frame.
blackbutton = Button(bottomframe, text="Black", fg="black")
blackbutton.pack(side=BOTTOM)
root.mainloop()
Python - Tkinter Label
• This widget implements a display box where you can place text or images.
The text displayed by this widget can be updated at any time you want.
• It is also possible to underline part of the text (for example, to identify
a keyboard shortcut) and to span the text across multiple lines.
Syntax:
Here is the simple syntax to create this widget:
w = Label ( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
root = Tk()
var = StringVar()
label = Label(root, textvariable=var, relief=RAISED)
var.set("Hello")  # illustrative text (assumed)
label.pack()
root.mainloop()
Python - Tkinter Listbox
Example:
from tkinter import *
top = Tk()
Lb1 = Listbox(top)
Lb1.insert(1, "Python")
Lb1.insert(2, "Perl")
Lb1.insert(3, "C")
Lb1.insert(4, "PHP")
Lb1.insert(5, "JSP")
Lb1.insert(6, "Ruby")
Lb1.pack()
top.mainloop()
Python - Tkinter Menubutton
• A menubutton is the part of a drop-down menu that stays on the screen all
the time. Every menubutton is associated with a Menu widget that can
display the choices for that menubutton when the user clicks on it.
Syntax:
Here is the simple syntax to create this widget:
w = Menubutton ( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
top = Tk()
mb = Menubutton(top, text="condiments", relief=RAISED)
mb.menu = Menu(mb, tearoff=0)
mb["menu"] = mb.menu  # attach the drop-down menu to the button
mayoVar = IntVar()
ketchVar = IntVar()
mb.menu.add_checkbutton(label="mayo", variable=mayoVar)
mb.menu.add_checkbutton(label="ketchup", variable=ketchVar)
mb.pack()  # use a single geometry manager; grid() and pack() must not be mixed in one parent
top.mainloop()
Python - Tkinter Message
• This widget provides a multiline, non-editable object that displays text,
automatically breaking lines and justifying their contents.
• Its functionality is very similar to the one provided by the Label widget,
except that it can also automatically wrap the text, maintaining a given
width or aspect ratio.
Syntax:
Here is the simple syntax to create this widget:
w = Message ( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
root = Tk()
var = StringVar()
label = Message(root, textvariable=var, relief=RAISED)
var.set("Hello")  # illustrative text (assumed)
label.pack()
root.mainloop()
Python - Tkinter Scale
Example:
from tkinter import *
root = Tk()
var = DoubleVar()
scale = Scale(root, variable=var)
scale.pack(anchor=CENTER)
label = Label(root)
label.pack()
root.mainloop()
Python - Tkinter Scrollbar
• This widget provides a slide controller that is used to implement
vertically scrolled widgets, such as Listbox, Text, and Canvas. Note that
you can also create horizontal scrollbars on Entry widgets.
Syntax:
Here is the simple syntax to create this widget:
w = Scrollbar ( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
root = Tk()
scrollbar = Scrollbar(root)
scrollbar.pack(side=RIGHT, fill=Y)
root.mainloop()
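A scrollbar does nothing until it is connected to the widget it scrolls. The link runs in both directions: the scrolled widget reports its position through yscrollcommand, and the scrollbar drives the widget through its command option. The following is a minimal sketch, assuming Python 3 (module name tkinter). The helper make_lines, the main function and the 100 sample items are illustrative additions.

```python
def make_lines(count):
    """Build the sample items shown in the Listbox."""
    return ["Line {}".format(i) for i in range(1, count + 1)]


def main():
    # Imported here so make_lines() can be used without a display.
    import tkinter as tk

    root = tk.Tk()
    scrollbar = tk.Scrollbar(root)
    scrollbar.pack(side=tk.RIGHT, fill=tk.Y)

    # The Listbox reports its visible range to the scrollbar.
    listbox = tk.Listbox(root, yscrollcommand=scrollbar.set)
    for line in make_lines(100):
        listbox.insert(tk.END, line)
    listbox.pack(side=tk.LEFT, fill=tk.BOTH)

    # The scrollbar moves the Listbox view when it is dragged.
    scrollbar.config(command=listbox.yview)
    root.mainloop()
```

Calling main() opens the window. With more items than fit on screen, dragging the scrollbar moves the list, and scrolling the list moves the scrollbar.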
Python - Tkinter Text
• Text widgets provide advanced capabilities that allow you to edit multiline
text and format the way it is displayed, such as changing its color and
font.
• You can also use elegant structures like tabs and marks to locate specific
sections of the text, and apply changes to those areas. Moreover, you can
embed windows and images in the text because this widget was designed
to handle both plain and formatted text.
Syntax:
Here is the simple syntax to create this widget:
w = Text ( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
root = Tk()
text = Text(root)
text.insert(INSERT, "Hello.....")
text.insert(END, "Bye Bye.....")
text.pack()
root.mainloop()
Python - Tkinter Toplevel
Example:
from tkinter import *
root = Tk()
top = Toplevel()  # a second window, separate from the root window
top.mainloop()
Python - Tkinter Spinbox
• The Spinbox widget is a variant of the standard Tkinter Entry widget, which
can be used to select from a fixed number of values.
Syntax:
Here is the simple syntax to create this widget:
w = Spinbox( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
master = Tk()
w = Spinbox(master, from_=0, to=10)  # illustrative range (assumed)
w.pack()
mainloop()
Python - Tkinter PanedWindow
• A PanedWindow is a container widget that may contain any number of
panes, arranged horizontally or vertically.
• Each pane contains one widget, and each pair of panes is separated by a
moveable (via mouse movements) sash. Moving a sash causes the
widgets on either side of the sash to be resized.
Syntax:
Here is the simple syntax to create this widget:
w = PanedWindow( master, option, ... )
Parameters:
• master: This represents the parent window.
• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
# Outer paned window: panes are arranged horizontally by default.
m1 = PanedWindow()
m1.pack(fill=BOTH, expand=1)
left = Label(m1, text="left pane")
m1.add(left)
# Inner paned window: its two panes are stacked vertically.
m2 = PanedWindow(m1, orient=VERTICAL)
m1.add(m2)
top = Label(m2, text="top pane")
m2.add(top)
bottom = Label(m2, text="bottom pane")
m2.add(bottom)
mainloop()
Python - Tkinter LabelFrame
• A labelframe is a simple container widget. Its primary purpose is to act as a
spacer or container for complex window layouts.
• This widget has the features of a frame plus the ability to display a label.
Syntax:
Here is the simple syntax to create this widget:
w = LabelFrame( master, option, ... )
Parameters:
• master: This represents the parent window.
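• options: The most commonly used options for this widget can be passed as
key-value pairs separated by commas.
Example:
from tkinter import *  # Python 3 module name (Python 2 used "Tkinter")
root = Tk()
labelframe = LabelFrame(root, text="This is a LabelFrame")  # illustrative title (assumed)
labelframe.pack(fill="both", expand="yes")
Label(labelframe, text="Inside the LabelFrame").pack()  # illustrative child widget
root.mainloop()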