CH 02 - Data Handling Using Pandas Leip102 EDITED Smaller 01 Codes Only

Uploaded by

deopadevyansh88
Chapter 2: Data Handling Using Pandas – I

Installing NumPy and Pandas library pip install numpy
packages through cmd pip install pandas
Importing NumPy and Pandas library import numpy as np
packages into any Python program import pandas as pd
Creating an empty series s0 = pd.Series()
Creating a series from a list of scalar values s1 = pd.Series([10,20,30])
s2 = pd.Series(['a','b','c'])
s3 = pd.Series([1,2.0,np.nan,'a'])
Creating a series from a sequence of s4 = pd.Series(np.arange(1,4,1))
numbers
Creating a series from 1D numpy array arr1 = np.array([1,2,3,4])
s5 = pd.Series(arr1)
Creating a series from a dict dict1 = {'India':'NewDelhi','UK':'London','Japan':'Tokyo'}
Keys become index of series and dictionary s6 = pd.Series(dict1)
values form the series
Assigning user-defined numeric and non- s7=pd.Series(["Kavi","Shyam","Ravi"],index=[3,5,1])
numeric index labels to the elements of a s8=pd.Series(["Kavi","Shyam","Ravi"],index=['k','s','r'])
s9=pd.Series([2,3,4], index=["Feb","Mar","Apr"])
series
While passing index labels, the length of arr=[1,2,3,4]
index and length of array/list must be same myindex=["Jan","Feb","Mar","Apr"]
else it will give ValueError s10=pd.Series(arr,index=myindex)
Altering index of a series list1 = ['Kavi','Shyam','Ravi']
Assigning new index values to a series myindex = [3,5,1]
s11 = pd.Series(list1,index=myindex)
s11.index=[10,20,30]
s11.index=['x','y','z']
Accessing a series & its elements
Printing the entire series print(s)
Output appears in two columns – index in
the left and data values in the right column.
Printing the series in reverse order print(s[::-1])
Accessing elements of a series
 by using indexing
o by using integer positional
index
o by using index label
 by using slicing
Accessing elements of a series by using print(s[0]) #first element
integer positional index (position of an print(s[n-1]) #n th element
element in the series starting from 0)
Accessing elements of a series by using print(s['k'])
index label (user-defined or default index
label)
Accessing multiple discrete elements of a print(s8[[0,2]])
series (list of index positions or labels) print(s8[['k','r']])
Accessing elements of the part of a series print(s8[1:3])
by using slicing print(s8['s':'r'])
Extracting a part of the series by specifying
the start and end parameters for the slice as
[start:end].
The 'end' integer position index is excluded
in the output, however, the 'end' index label
is included in the output.
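The two slicing rules above can be checked with a quick sketch (the s8 series from earlier, redefined here so the snippet is self-contained):

```python
import pandas as pd

s8 = pd.Series(["Kavi", "Shyam", "Ravi"], index=['k', 's', 'r'])

pos = s8[0:2]      # positional slice: end position 2 is excluded
lab = s8['k':'r']  # label slice: end label 'r' is included

print(list(pos))   # ['Kavi', 'Shyam']
print(list(lab))   # ['Kavi', 'Shyam', 'Ravi']
```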
Updating the values of a series s8[1]='Suvi'
s8['s']='Purvi'
s8[1:3]='x'
s8['s':'r']='y'
Attributes of a series object
accessing properties of a series object
name – Assigning a name to the series object s.name = 'MySeries'
Printing the name of the series object print(s.name)
index.name – Assigning a name to the index of s.index.name = 'MyIndex'
the series
values – Printing the values of the series as an array print(s.values)

770945137.docx 1
size – Printing the number of values in the series print(s.size)
empty – Checking if series is empty or not print(s.empty)
if s.empty == True:
Methods of a series object
head(n) – accessing first n (by default 5) values print(s.head())
of the series print(s.head(2))
tail(n) – accessing last n (by default 5) values of print(s.tail())
the series print(s.tail(2))
count() – returning the number of non-NaN print(s.count())
values in the Series
len(series) – returning the number of values in print(len(s))
the Series including NaN values
len() is a Python function and not a Series
method
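A minimal sketch contrasting count() and len() on a series containing NaN:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2.0, np.nan, 4])

print(s.count())  # 3 -> NaN values are excluded
print(len(s))     # 4 -> NaN values are included
```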
Mathematical operations on two or more
series
Addition of two series objects: sA = pd.Series([1,2,3,np.nan],index=['a','b','c','d'])
by using + operator to add the sB = pd.Series([4,5,6,7],index=['a','x','c','d'])
corresponding elements of two series print(sA+sB)
Addition of two series objects by using print(sA.add(sB,fill_value=0))
series method add() with parameter
fill_value to replace the missing values with
a specified value while adding the
corresponding elements of two series so as
to avoid NaN values in the output.
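A self-contained sketch of the + operator versus add() with fill_value, using the sA and sB series above:

```python
import numpy as np
import pandas as pd

sA = pd.Series([1, 2, 3, np.nan], index=['a', 'b', 'c', 'd'])
sB = pd.Series([4, 5, 6, 7], index=['a', 'x', 'c', 'd'])

plain = sA + sB                    # NaN wherever either side is missing
filled = sA.add(sB, fill_value=0)  # a missing side is treated as 0

# 'b' exists only in sA, 'x' only in sB, and sA['d'] is NaN
print(plain['b'], plain['x'], plain['d'])    # nan nan nan
print(filled['b'], filled['x'], filled['d']) # 2.0 5.0 7.0
```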
Other mathematical operations on two
series
Subtraction of Series print(sA-sB)
print(sA.sub(sB,fill_value=0))
Multiplication of Series print(sA*sB)
print(sA.mul(sB,fill_value=0))
Division of Series print(sA/sB)
Gives "inf" if denominator is zero print(sA.div(sB,fill_value=0))

DataFrame (Two Dimensional or 2D structure)

Creating an empty dataframe df = pd.DataFrame()


Creating dataframe from list of arrays a1 = np.array([1,2,3])
No. of rows = No. of arrays a2 = np.array(['a','b','c','d'])
No. of cols = No. of vals in lengthiest array a3 = np.array([1.0,2.0,3.0])
NaN represents missing/mismatching vals #without index and column labels
df = pd.DataFrame([a1,a3,a2])
#with index labels
df = pd.DataFrame([a1,a3,a2], index=['r1','r2','r3'])
#with column labels
df = pd.DataFrame([a1,a3,a2], columns=['c1','c2','c3','c4'])
#with index and column labels
df = pd.DataFrame([a1,a3,a2], index=['r1','r2','r3'],
columns=['c1','c2','c3','c4'])
Creating a dataframe from a list of dicts ld = [{'a':10,'b':20},{'a':5,'b':10,'c':20}]
No. of rows = No. of dicts df = pd.DataFrame(ld)
No. of cols = No. of unique keys in all dicts
Creating dataframe from a dict of lists dl = {'State':['Assam','Delhi','Kerala'],
No. of rows = No. of vals in lengthiest list 'GArea':[78438,1483,38852],'VDF':[2797,6.72,1663]}
No. of cols = No. of keys in the dict df = pd.DataFrame(dl)
#with sequence of columns changed in dataframe
df = pd.DataFrame(dl, columns=['State','VDF','GArea'])
Creating dataframe from a single series s = pd.Series([1,2,3,4],index=['a','b','c','d'])
No. of rows = No. of vals in series df = pd.DataFrame(s)
No. of cols = 1
Series becomes a col in dataframe.
Creating dataframe from a list of series sA = pd.Series([1,2,3,4],index=['a','b','c','d'])
No. of rows = No. of series sB = pd.Series(['v1','v2','v3'],index=['a','b','c'])
No. of cols = No. of unique index in all sC = pd.Series([1.0,2.0,3.0],index=['z','b','c'])
series df = pd.DataFrame([sA,sB,sC])
Creating dataframe from a dict of series ds = { 'c1': pd.Series([1,2,3],index=['r1','r2','r3']),
No. of rows = No. of unique index in all 'c2': pd.Series(['v1','v2'],index=['r1','r3']),
series 'c3': pd.Series([1.0,2.0,3.0],index=['r3','r1','r4'])}
No. of cols = No. of dict keys df = pd.DataFrame(ds)
Every column in the dataframe is a series. type(df.c1)
<class 'pandas.core.series.Series'>
If dataframe is created without custom index, pos index of rows are used as index labels.
If dataframe is created without custom columns, pos index of cols are used as column
labels.
Operations on rows and columns of a Selection, Addition, Deletion, Renaming of rows & columns of a dataframe
dataframe To refer to a row in a dataframe, always use the df.loc[ ] indexer.
Adding a new column to a dataframe #assigning values to a column that does not exist creates a new column at the end
Specify a list of values #otherwise values of the existing column get updated
Use np.nan for missing values. #to avoid 'ValueError' ensure that length of list=length of index OR No. of vals=No. of rows
df['c5']=[60,70,80]
Adding a New Row to a DataFrame #to avoid 'ValueError' ensure that
#length of list=length of columns OR No. of vals=No. of cols
df.loc['r4']=[11,22,33,44,55]
Changing values of a column df['c5']=[61,71,81] #list length=index length
Changing values of an entire column to a df['c5']=99
particular value
Changing values of a row df.loc['r4']=[11,22,33,44,55] #list length=columns length
Change values of an entire row to a df.loc['r4']=0
particular value
Setting all the values of a dataframe to a df[:]='x'
particular value
Deleting rows of a dataframe df=df.drop('r5', axis=0)
axis=0 or axis='index' means rows df=df.drop(['r4','r5'], axis=0)
Deleting columns of a dataframe df=df.drop('c4', axis=1)
axis=1 or axis='columns' means columns df=df.drop(['c3','c4'], axis=1)
Altering/Renaming row and column labels Specify labels as {'oldlabel':'newlabel'}. Labels not specified, remain intact.
of a dataframe If any specified label does not exist in dataframe, it is ignored without reporting any error
df=df.rename({'r2':'r21','r4':'r41'}, axis='index')
df=df.rename({'c2':'c21','c4':'c41'}, axis='columns')
Reordering rows in a dataframe df = df.loc[['r3','r1','r2']]
specify a list of rows in desired order
Reordering columns in a dataframe df = df[['c4','c1','c2','c3']]
specify a list of columns in desired order

Accessing dataframe elements by using #use .loc indexer to specify row and col index labels
label based indexing #General syntax: DataFrame.loc[ rows , columns ]
#specify labels e.g. as 'r1', ['r1','r3'], 'r1':'r5'
#for single row, discrete multiple rows and slice or range of rows
#in a slice, both start and stop index labels are included
#single row label returns the row as a series
#single column name returns the column as a series

#returns a single value
df.loc['r1' , 'c1']

#returns a series
df.loc['r1']
df.loc['r1' , : ]
df.loc[ : , 'c1']
df.loc['r1' , ['c1','c3']]
df.loc['r1' , 'c1':'c3']
df.loc[['r1','r3'] , 'c1']

#returns a dataframe
df.loc[['r1','r3'] , ['c1','c3']]
df.loc['r1':'r3' , 'c1':'c3']
df.loc[ : , 'c1':'c3']
df.loc['r1':'r3' , : ]
df.loc['r1':'r3' , : 'c1']

#columns only can be specified without using .loc indexer


df['c1']
df[['c1','c3']]
df['c1':'c3'] #does not work, so use .loc indexer for slice
Accessing dataframe elements by using #use .iloc indexer to specify row and col pos index
integer position indexing #General syntax: DataFrame.iloc[ rows , columns ]
#specify positions e.g. as 0, [0,2], 1:5
#for single row, discrete multiple rows and slice or range of rows
#in a slice, start pos index is included however, the stop pos index is excluded
#single row pos index returns the row as a series
#specify columns without using .iloc indexer
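A minimal self-contained sketch of these .iloc rules (the small dataframe here is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'c1': [10, 20, 30], 'c2': [40, 50, 60]},
                  index=['r1', 'r2', 'r3'])

print(df.iloc[0, 0])             # 10 -> single value
print(list(df.iloc[0]))          # [10, 40] -> single row position gives a series
print(list(df.iloc[[0, 2], 0]))  # [10, 30] -> discrete row positions
print(df.iloc[0:2, 0:1].shape)   # (2, 1) -> stop position excluded on both axes
```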
Boolean Indexing #selecting subsets of data based on the actual values in the dataframe rather than
their row and column labels, by writing conditions that return True or False for
each row or column, which in turn can be used to filter rows and columns

df.loc['r1',:] > 90
df.loc[:,'c1'] > 90

Filtering rows and columns of a #filtering rows and columns without writing any explicit condition
dataframe by using Boolean indexing #but by giving a list of Boolean values for each row and column
#use a Boolean list specifying 'True' for the rows and columns to be displayed and 'False'
for the rows and columns to be omitted
#the list must contain one Boolean value per row (or per column), else an IndexError is raised

ResultDF.loc[[True, False, True]] #keeps the 1st and 3rd of the three rows


ResultDF.loc[:,[True, False, True]]

#filtering rows and columns by writing an explicit condition


#which returns a list of Boolean values for each row and column
#that in turn filters the rows and columns
df[df['c1'] > 90]
df[df.loc[:,'c1'] > 90]
df.loc[df['c1'] > 90]
df.loc[:,df.loc['r1'] > 90]

Example:
To increment the marks by 5 if marks are less than 33
df.loc[df['marks']<33, 'marks'] = df['marks']+5
If MRP is >70 then store 10% of MRP into a new column called 'Discount'
df['Discount'] = df[df['MRP']>70]['MRP']*10/100
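The two example statements above can be sketched end-to-end on a small illustrative dataframe:

```python
import pandas as pd

df = pd.DataFrame({'marks': [30, 80, 25], 'MRP': [50, 90, 100]})

# increment marks by 5 only where marks < 33
df.loc[df['marks'] < 33, 'marks'] = df['marks'] + 5

# store 10% of MRP as Discount where MRP > 70; other rows get NaN
df['Discount'] = df[df['MRP'] > 70]['MRP'] * 10 / 100

print(df['marks'].tolist())     # [35, 80, 30]
print(df['Discount'].tolist())  # [nan, 9.0, 10.0]
```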

Descriptive Statistics with Pandas Aggregating data to get summary statistics of the data so as to analyse it
Applying statistical or aggregation effectively
functions on dataframes Applying a library function such as max(), min(), count(), sum(), mean()... or a
custom function on df or df.loc or df.iloc or df.colLabel or df['colLabel'] or
df['colLabel']['rowLabel']
While using Pandas' statistical functions with dataframe:
axis=0 (default) means for each column and axis=1 means for each row
NOTE!
In operations that select or drop rows/columns, such as drop() and dropna(),
axis=0 means rows and axis=1 means columns.
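A quick sketch of the axis convention for aggregation:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2], 'c2': [10, 20]}, index=['r1', 'r2'])

colsum = df.sum(axis=0)  # one result per column (the default)
rowsum = df.sum(axis=1)  # one result per row

print(list(colsum))  # [3, 30]  -> c1 = 1+2, c2 = 10+20
print(list(rowsum))  # [11, 22] -> r1 = 1+10, r2 = 2+20
```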
Applying functions on all columns of the df.functionName()
dataframe
(certain functions work on numeric
columns only)
Applying functions on a column of the df['colLabel'].functionName()
dataframe
Applying functions on multiple columns df[['colLabel1','colLabel2',...]].functionName()
of the dataframe
Applying functions on a row of the df.loc['rowLabel',:].functionName()
dataframe
Applying functions on a range of rows of df.loc['startRowLabel':'endRowLabel',:].functionName()
the dataframe
Applying functions on a slice or a subset df.loc['startRowLabel':'endRowLabel',
of the dataframe 'startColLabel':'endColLabel'].functionName()
Count of values of each row or column in df.count(axis=0|1|None, skipna=True|False|None,
the dataframe numeric_only=True|False|None, min_count=0)
Sum of all values of each row or column df.sum(axis=0|1|None, skipna=None,
in the dataframe numeric_only=None, min_count=0)
Max (maximum) value of each row or df.max(axis=0|1|None, skipna=True|False|None,
column in the dataframe numeric_only=True|False|None)
Min (minimum) value of each row or df.min(axis=0|1|None, skipna=True|False|None,
column in the dataframe numeric_only=True|False|None)
Mean (computed mean or average of the df.mean(axis=0|1|None, skipna=True|False|None,
dataset) of each row or column in the numeric_only=True|False|None).round(2)
dataframe
Mode (value that appears most often in a df.mode(axis=0|1|None, numeric_only=True|False|None)
dataset) of each row or column in the df.loc['s01':'s02','marks1':'marks2'].mode(axis=1)
dataframe
Median (middle value of a dataset, value df.median(axis=0|1|None, skipna=True|False|None,
which divides dataset into two equal parts) numeric_only=True|False|None)
of each row or column in the dataframe
MAD (mean absolute deviation) – average df.mad(axis=0|1|None, skipna=True|False)
distance between each data value and the #mad() was removed in pandas 2.0; use
mean of a dataset #(df - df.mean()).abs().mean() instead
Var (unbiased variance) over the df.var(axis=0|1|None, skipna=True|False|None,
requested axis. Average of the squared numeric_only=True|False|None)
differences from the mean. Variance is the
expectation of the squared deviation of a
random variable from its mean. The var()
function calculates the variance of a given
set of numbers, variance of the dataframe,
variance of one or more columns or
variance of rows.
Std (standard deviation) – Square root of df.std(axis=0|1|None, skipna=True|False)
the variance.
Quantile is the point in a distribution that df.quantile(q=f, axis=0|1|None, numeric_only=True|False|None)
relates to the rank order of a value in that where, q is a float or array-like sequence of values between
0 and 1.0
distribution.
To produce quartiles or 4-quantile use a list
q=[0.25,0.5,0.75,1.0]

Attributes of DataFrames Access properties/attributes of a DataFrame by using the property name with the
DataFrame name.
DataFrame.index displays row labels ForestAreaDF.index
Index(['GeoArea', 'VeryDense', 'ModeratelyDense',
'OpenForest'], dtype='object')
DataFrame.columns displays column
labels
DataFrame.dtypes displays data type of ForestAreaDF.dtypes
each column in the DataFrame Assam int64
Kerala int64
Delhi float64
dtype: object
DataFrame.values displays a NumPy ForestAreaDF.values
array([ [7.8438e+04, 3.8852e+04, 1.4830e+03],
ndarray having all the values in the [2.7970e+03, 1.6630e+03, 6.7200e+00],
DataFrame, without the axes labels. [1.0192e+04, 9.4070e+03, 5.6240e+01],
[1.5116e+04, 9.2510e+03, 1.2945e+02]])
DataFrame.shape displays a tuple ForestAreaDF.shape
representing the dimensionality of the (4, 3) #means ForestAreaDF has 4 rows and 3 columns
DataFrame.
DataFrame.size displays the number of ForestAreaDF.size
values (elements) in the 12 #means ForestAreaDF has 12 values in it
DataFrame.
DataFrame.T transposes the DataFrame – ForestAreaDF.T
row index and column labels of the GeoArea VeryDense ModeratelyDense OpenForest
DataFrame are interchanged. Assam 78438.0 2797.00 10192.00 15116.00
Equivalent of writing Kerala 38852.0 1663.00 9407.00 9251.00
DataFrame.transpose() Delhi 1483.0 6.72 56.24 129.45
DataFrame.head(n) displays first n rows ForestAreaDF.head(2)
Assam Kerala Delhi
of the DataFrame. If parameter n is not GeoArea 78438 38852 1483.00
specified, then by default, it returns first 5 VeryDense 2797 1663 6.72
rows of the DataFrame.
DataFrame.tail(n) displays last n rows of ForestAreaDF.tail(2)
Assam Kerala Delhi
the DataFrame. If parameter n is not ModeratelyDense 10192 9407 56.24
specified then by default, it gives last 5 OpenForest 15116 9251 129.45
rows of the DataFrame.
DataFrame.empty returns value True if ForestAreaDF.empty
False
DataFrame is empty and False otherwise.
df=pd.DataFrame() #Create an empty dataFrame
df.empty
True
Sorting Dataframes df.sort_values( by=colLabels|rowLabels, axis=0|1,
Arranging rows and columns in ascending ascending=True|False, inplace=True|False,
kind='quicksort', na_position='first|last')
or descending order on the basis of the
Arranging rows by the values of a column
values of one or more specified rows and df.sort_values(by='marks1')
columns. Arranging rows by the values of multiple columns
df.sort_values(by=['class','marks1'])
Arranging rows by the values of multiple columns in different orders
df.sort_values(by=['class','marks1'],ascending=[False,True])

Checking/detecting missing values in a df.isnull() returns a Boolean same-sized DataFrame indicating if values are missing
dataframe df.notnull() returns a Boolean same-sized DataFrame which is just opposite of
isnull()
df.isnull().sum() returns a Series containing the number of missing values for
each column
df.isnull().sum().sum() returns the total number of missing values in the entire
dataframe
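A minimal sketch of these missing-value checks on an illustrative dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': [1, np.nan, 3], 'c2': [np.nan, np.nan, 6]})

print(df.isnull().sum().tolist())  # [1, 2] -> missing values per column
print(df.isnull().sum().sum())     # 3      -> total missing values
print(df.notnull().sum().sum())    # 3      -> total non-missing values
```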
Dropping the missing values in a df.dropna(axis=0|1, how='any'|'all', thresh=None|num,
dataframe subset=None|[cols], inplace=True|False)
To drop row if any NaN values are present:
df.dropna(axis=0)
To drop column if any NaN values are present:
df.dropna(axis=1)
To drop the rows if the number of non-NaN is less than 6.
df.dropna(axis=0,thresh=6)

Replacing the missing values in a df.fillna(value=None|val, method=None, axis=None|0|1,
dataframe inplace=True|False)
Replace all NaN values in the dataframe with a scalar value
df.fillna(value=10)
Replace NaN values with the values of the previous row
df.fillna(axis=0, method='ffill')
Replace NaN values with the values of the previous column
df.fillna(axis=1, method='ffill')
Replace NaN with the values of the next row
df.fillna(axis=0, method='bfill')
Replace NaN with the values of the next column
df.fillna(axis=1, method='bfill')
Replace NaN values in column B with the mean value of that column
df['B'].fillna(value=df['B'].mean(), inplace=True)

Importing and exporting data between
csv file and dataframe
Importing csv file to a dataframe Reading a csv file with pandas and into a dataframe
Using pd.read_csv() function to read a CSV file
Parameters:
filepath URL or path of the CSV file
sep the separator. By default it is comma ',' as in csv (comma
separated values)
index_col uses the passed column as the row index, instead of the
default 0, 1, 2, 3… index
header uses passed row (int) or rows (int list) as header
usecols uses passed cols (string list) to make dataframe
squeeze if true and only one column is passed, returns pandas series
skiprows skips passed rows in new data frame
df = pd.read_csv("nba.csv")
#or df = pd.read_csv("https://media.geeksforgeeks.org/nba.csv")

#filepath=r'd:\myfolder\nba.csv' #'r' preceding a string denotes a raw un-escaped string
#df=pd.read_csv(filepath)
To load the data from result.csv file into a dataframe marks.
df=pd.read_csv('c:/myfolder/result.csv',sep=',',header=0)
By default, header=0 infers column names from first line of csv file.
The names=[] specifies columns.
marks1=pd.read_csv('c:/myfolder/result.csv',sep=',',
names=['RNo','SName','M1','M2'])
Exporting dataframe to a text or csv file Saving a dataframe as a CSV file using the df.to_csv() method
df.to_csv('file1.csv') #saving dataframe as csv file in working directory
#with header and index
#Saving CSV to a specified location and without headers and index
#df.to_csv(r'D:\path\file3.csv', header=False, index=False)
To save resultdf dataframe as resultout.csv file in the folder c:/myfolder with
index, columns and data values separated by comma.
resultdf.to_csv('c:/myfolder/resultout.csv',sep=',')
To save resultdf dataframe as resultout.csv file in the folder c:/myfolder with data
values separated by comma but without index and columns.
resultdf.to_csv('c:/myfolder/resultout.csv',sep=',',
index=False,header=False)
To save resultdf dataframe as resultout.csv file in the folder c:/myfolder with data
values separated by '@' but without index and columns.
resultdf.to_csv('c:/myfolder/resultout.csv',sep='@',
index=False,header=False)
PANDAS dataframe & MYSQL Using pymysql or mysql.connector libraries to create connection and cursor
database objects and perform database operations and handle errors, if any.

Database connectivity code is written within a try-except section.


Using sqlalchemy for database from sqlalchemy import create_engine
connectivity engine=create_engine(
'mysql+pymysql://root:@localhost/mydb')
conn=engine.connect()
df=pd.read_sql_query(sql,conn)

#pip install sqlalchemy

from sqlalchemy.types import VARCHAR
from sqlalchemy import create_engine
#engine = create_engine('mysql+pymysql://root:@127.0.0.1/test')
engine=create_engine('mysql+pymysql://{user}:{pwd}@localhost/{db}'
.format(user=username, pwd=password, db=database))
connection=engine.connect()
df.to_sql(tablename, connection, if_exists='replace',
index=False, chunksize=1000)
#if_exists='replace|append|fail'

Importing database connectivity libraries import mysql.connector as myconnector
for MySQL connection OR
import pymysql
Creating database connection object to #mysql-connector
connect to the mysql database conn = myconnector.connect( host='localhost',
user='root', password='', database='mydb')

#pymysql
conn = pymysql.connect( host='localhost',
user='root', port=3306, password='', db=db,
cursorclass=pymysql.cursors.DictCursor)
Creating cursor object to execute SQL #mysql-connector
queries cursor = conn.cursor(dictionary=True)

#pymysql
cursor=conn.cursor()
Executing SQL queries sql='select * from t1'
cursor.execute(sql)
OR
df=pd.read_sql(sql,con=conn)
Fetching MySQL table rows in the cursor rows=cursor.fetchall()
by using a loop to access one row at a time df=pd.DataFrame(rows)
Committing DML SQL queries, if any sql='delete from t1'
cursor.execute(sql)
conn.commit()
Handling errors/exceptions try:
...
except myconnector.Error as e:
print("ERROR%d:%s"%(e.args[0],e.args[1]))
Closing connections and destroying try:
objects before terminating the application ...
finally:
cursor.close()
conn.close()
raise SystemExit #terminate app
Change datatype of index of the dataframe df.index=df.index.astype('str')
e.g. from int to str after importing data
into dataframe from mysql table

MATPLOTLIB Example (plots ver01.py) – Line, Bar & Histogram
#pip install matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#====plot data
year = [2015,2016,2017,2018]
percentMale = [70,82,63,90]
percentFemale = [80,52,73,43]
#ticks
xticks=np.arange(2015,2020)
yticks=np.arange(0,101,10)
'''
#PLOT – LINE CHART
plt.plot(year,percentMale,color='b')
plt.plot(year,percentFemale,color='r')
'''
'''
#BAR
x=np.arange(year[0],year[len(year)-1]+1)
plt.bar(x+0.00, percentMale, color='b', width=0.25)
plt.bar(x+0.25, percentFemale, color='r', width=0.25)
'''
'''
#BARH - Grouped Horizontal Bar (not stacked)
y=np.arange(year[0],year[len(year)-1]+1)
width = 0.3
fig, ax = plt.subplots()
ax.barh(y+(0.5*width), percentMale, width, color='red', label='Male')
ax.barh(y+(0.5*width)+width, percentFemale, width, color='green', label='Female')
ax.set(yticks=y+width, yticklabels=year) #do not set plt.xticks() and plt.yticks()
xticks = np.arange(0,101,10)
yticks=y+width
'''
'''
#HISTOGRAM
#plt.hist(values, bins, range)
ages=[2,5,70,40,30,45,50,45,43,40,44,60,7,13,57,18,90,77,32,21,20,40] #data values
myrange = (0, 100) #set ranges and no. of intervals
bins = 10
plt.hist(ages,bins,myrange,color='green',histtype='bar',rwidth=0.8) # plotting a histogram
xticks=np.arange(0,100,10)
yticks=np.arange(0,11)
'''
#====common plot settings
plt.title('MYPLOT')
plt.xticks(xticks) #plt.xticks(xticks,labels,rotation='vertical')
plt.yticks(yticks)
plt.xlabel('YEAR') #plt.xlabel('X Labels', fontsize=6)
plt.ylabel('PERCENTAGE') #plt.ylabel('Y Labels', fontsize=10)
plt.legend(['Male','Female']) #legends used in the plot
#plt.grid(True)
plt.show()
#plt.savefig("temp.png") #save the plot as an image

Python MySQL Database Connectivity Example (PythonMySQLdbCRUD.py)
#Python MySQL Database Connectivity
import pandas as pd
import numpy as np
import pymysql

conn = pymysql.connect( host='localhost', user='root', port=3306, password='', db='ip2022',
cursorclass=pymysql.cursors.DictCursor)
cursor=conn.cursor()
def cls():
print("\n" * 20)
def intro():
print("Student Management System (SMS)\n")
def menu():
cls()
intro()
print(" Select a CRUD operation...")
print(" 1. Create")
print(" 2. Insert")
print(" 3. Select All Records")
print(" 4. Select Records Conditionally")
print(" 5. Update")
print(" 6. Delete")
print(" Q. Quit")
def main():
ch = 0
menu()
ch = input("Press 1 to 6 for CRUD or Q to Quit...")
if ch == '1':
create()
elif ch == '2':
insert()
elif ch == '3':
selectall()
elif ch == '4':
selectcondition()
elif ch == '5':
update()
elif ch == '6':
delete()
elif ch == 'Q' or ch == 'q' :
print("Thanks for using SMS!")
else:
print("Invalid choice! Enter a valid option.")
main()
if ch == 'Q' or ch == 'q':
cursor.close()
conn.close()
raise SystemExit #terminate the app
else:
replayMenu()
def replayMenu():
startover = ""
startover = input('...continue (y/n)? ')
if startover.lower() != 'y':
print("Thank you for using SMS.")
else:
main()
def create():
sql = '''CREATE TABLE IF NOT EXISTS student(roll INT PRIMARY KEY, name VARCHAR(30),
std VARCHAR(30), dob DATE, marks decimal(4,1))'''
try:
cursor.execute(sql)
conn.commit()
print("New table created.")
except conn.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
def insert():
print('Enter the student details: ')
roll = input('Roll: ')

name = input('Name: ')
std = input('Class & Section (e.g. 12A): ')
dob = input('DOB (YYYYMMDD): ')
marks = input('Marks %: ')
sql = '''INSERT INTO student(roll,name,std,dob,marks)
VALUES (%s,%s,%s,%s,%s)'''
try:
cursor.execute(sql, (roll,name,std,dob,marks))
conn.commit()
print("New record added successfully.")
except conn.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
'''--------------------------------
#to insert multiple rows all at once
sql = "INSERT INTO student(roll,name) VALUES(%s, %s)"
val = [('123','Abhay',...),('456','Vijay',...),...]
cursor.executemany(sql,val)
--------------------------------'''
def selectall():
sql = "SELECT * FROM student"

try:
cursor.execute(sql)
#number_of_rows = cursor.execute(sql)
#print(number_of_rows)
result = cursor.fetchall()
print('''ROLL \t NAME \t\t CLASS \t DOB \t\t MARKS''')
print("----------------------------------------------------------------------")
for row in result:
#to use index rather than column name to access cursor data values
#disable 'cursorclass=pymysql.cursors.DictCursor' in conn declaration
#print(row[0],"\t",row[1],"\t",row[2],"\t",row[3],"\t",row[4]),"\t",row[5]
#0:>7 for right-align; #0:<7 for left-align; #0:7 for exact length
print("{0:<9}{1:<16}{2:<8}{3:<16}{4:<5}".format(row['roll'],row['name'],row['std'],str(row['dob']),str(row['marks'])))
#print(row['roll'],row['name'],row['std'],row['dob'],row['marks'])
except conn.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))

def selectcondition():
column = input("Enter column name for record search...")
value = input("Enter value for the column...")
sql = "SELECT * FROM student WHERE {} = %s".format(column)
try:
cursor.execute(sql, (value,))
result = cursor.fetchall()
print("ROLL \t NAME \t STD \t DOB \t MARKS")
print("----------------------------------------------------------------------")
for row in result: print(row['roll'],"\t",row['name'],"\t",row['std'],"\t",row["dob"],"\t",row["marks"])
except conn.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
def update():
roll = input("Enter roll of record to be updated...")
column = input("Enter the column to be updated...")
value = input("Enter the new value of the column...")
sql = "UPDATE student SET {}=%s WHERE roll = %s".format(column)
try:
cursor.execute(sql, (value, roll))
conn.commit()
print("\nSuccessfully Updated...\n")
except conn.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
def delete():
roll = input("Enter roll of record to be deleted...")
sql = "DELETE FROM student WHERE roll = %s"
try:
cursor.execute(sql, (roll,))
conn.commit()
print("\nSuccessfully Deleted...\n")
except conn.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
#=====call for Starting function=====
main()

DataFrame Plot – Line, Bar & Histogram

import matplotlib.pyplot as plt


import numpy as np
import pandas as pd

#----------------------------
#DATAFRAME plots - line & bar
#DATA
year = [2015,2016,2017,2018] #index for x-axis
percentMale = [70,82,63,90] #numeric column for y-axis
percentFemale = [80,52,73,43] #another numeric column for y-axis
#DATAFRAME
#By default - 'index' is laid out on X-axis and all numeric columns on Y-axis
df = pd.DataFrame({'Male': percentMale, 'Female': percentFemale}, index=year)
#PLOT
#ax = df['columnname'].plot(kind='bar',rot=0)
#to plot specified column(y-axis) against index(x-axis)
ax = df.plot(kind='bar',rot=0) #kind='bar|barh|line'
#PLOT settings
#ticks and legends are automatically set
plt.title('MYPLOT')
plt.xlabel('YEAR')
plt.ylabel('PERCENTAGE')
plt.show()
#----------------------------

#----------------------------
#DATAFRAME plots - histogram
#frequencies
ages=[2,5,70,40,30,45,50,45,43,40,44,60,7,13,57,18,90,77,32,21,20,40]
# setting the ranges and no. of intervals
myrange = (0, 100)
bins = 10
#bins=[0,20,40,60,80,100]
# plotting a histogram
df = pd.DataFrame({'Age': ages})
ax = df.plot(kind='hist',range=myrange,bins=bins,rwidth=0.8,alpha=0.5)
plt.title('MYPLOT')
plt.xlabel('AGE')
plt.ylabel('FREQUENCY')
plt.show()
#----------------------------

Joining, Merging and Concatenating
DataFrames
Appending DataFrames The DataFrame.append() method merges two DataFrames. It appends rows of the second
dataframe to the end of the first DataFrame and may end up having duplicate index labels.
Columns not present in the first dataframe are added as new columns in the resultant
dataframe. Columns remain unique. (append() was removed in pandas 2.0; use pd.concat() instead.)
df1=pd.DataFrame([[1,2,3],[4,5],[6]],
columns=['C1','C2','C3'], index=['R1','R2','R3'])
df2=pd.DataFrame([[10,20],[30],[40,50]],
columns=['C2','C5'], index=['R4','R2','R5'])
df=df1.append(df2)
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0
Set parameter sort=True to get the column labels appear in sorted order.
df=df1.append(df2,sort=True)
Set verify_integrity=True parameter to raise an error if row labels are duplicated. By
default, verify_integrity=False, which accepts duplicate row labels while appending the
DataFrames.
Set ignore_index=True parameter if row index labels are to be ignored in the resultant
dataframe. By default, ignore_index=False retains the index labels.
df1 = df1.append(df2, ignore_index=True)
C1 C2 C3 C5
0 1.0 2.0 3.0 NaN
1 4.0 5.0 NaN NaN
2 6.0 NaN NaN NaN
3 NaN 10.0 NaN 20.0
4 NaN 30.0 NaN NaN
5 NaN 40.0 NaN 50.0
The append() method can also be used to append a series or a dictionary to a DataFrame.
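In pandas 2.0 and later, where append() is no longer available, the same results can be sketched with pd.concat() (the dataframes here are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['C1', 'C2'], index=['R1', 'R2'])
df2 = pd.DataFrame([[10, 20]], columns=['C2', 'C5'], index=['R4'])

df = pd.concat([df1, df2])        # like df1.append(df2): union of columns, NaN fill
print(list(df.columns))           # ['C1', 'C2', 'C5']
print(df.shape)                   # (3, 3)

df_ri = pd.concat([df1, df2], ignore_index=True)  # like ignore_index=True
print(list(df_ri.index))          # [0, 1, 2]
```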
Concatenating DataFrames The concat() function concatenates dataframes along an axis while performing optional set
logic (union or intersection) applied on index or other axes i.e. column. It joins multiple
dataframes vertically (row-wise, one after another).
df1 = pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},
index=[0,1])
df2 = pd.DataFrame({'A':['A2','A3'],'B':['B2','B3']},
index=[2,3])
df3 = pd.DataFrame({'A':['A4','A5'],'B':['B4','B5']},
index=[4,5])
pd.concat([df1,df2,df3])
A B
0 A0 B0
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
Merging DataFrames merge() function joins dataframes using a key column. Similar to SQL join using a
common column. Similar to df.join(), but uses a common key column to join on rather than
the common index. It joins multiple dataframes horizontally (column-wise, one after
another).
df1 = pd.DataFrame({'Key':['K0','K1'],
'A':['A0','A1'],'B':['B0','B1']})
df2 = pd.DataFrame({'Key':['K0','K1'],
'C':['C0','C1'],'D':['D0','D1']})
pd.merge(df1,df2,how='inner',on='Key')
A B Key C D
0 A0 B0 K0 C0 D0
1 A1 B1 K1 C1 D1
Joining DataFrames The join() function joins dataframes using common index. Similar to SQL join using a
common column. Similar to pd.merge(), but uses common index to join on rather than a
common key column. It joins multiple dataframes horizontally (column-wise, one after
another).
df1 = pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},
index=['K0','K1'])
df2 = pd.DataFrame({'C':['C0','C1'],'D':['D0','D1']},
index=['K0','K1'])
df1.join(df2)
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1

