CH 02 - Data Handling Using Pandas Leip102 EDITED Smaller 01 Codes Only
770945137.docx 1
size – Printing the number of values in the series print(s.size)
empty – Checking if the series is empty print(s.empty)
if s.empty:
Methods of a series object
head(n) – accessing first n (by default 5) values print(s.head())
of the series print(s.head(2))
tail(n) – accessing last n (by default 5) values of print(s.tail())
the series print(s.tail(2))
count() – returning the number of non-NaN print(s.count())
values in the Series
len(series) – returning the number of values in print(len(s))
the Series including NaN values
len() is a Python function and not a Series
method
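A minimal runnable sketch of the methods above, using made-up data with one NaN to show the difference between count() (excludes NaN) and len() (includes NaN):

```python
import pandas as pd
import numpy as np

# a small series with one missing value (illustrative data)
s = pd.Series([10, 20, np.nan, 40], index=['a', 'b', 'c', 'd'])

print(s.head(2))   # first two values ('a' and 'b')
print(s.tail(2))   # last two values ('c' and 'd')
print(s.count())   # 3 -> NaN is excluded
print(len(s))      # 4 -> NaN is included
```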
Mathematical operations on two or more
series
Addition of two series objects: sA = pd.Series([1,2,3,np.nan],index=['a','b','c','d'])
by using + operator to add the sB = pd.Series([4,5,6,7],index=['a','x','c','d'])
corresponding elements of two series print(sA+sB)
Addition of two series objects by using print(sA.add(sB,fill_value=0))
series method add() with parameter
fill_value to replace the missing values with
a specified value while adding the
corresponding elements of two series so as
to avoid NaN values in the output.
Other mathematical operations on two
series
Subtraction of Series print(sA-sB)
print(sA.sub(sB,fill_value=0))
Multiplication of Series print(sA*sB)
print(sA.mul(sB,fill_value=0))
Division of Series print(sA/sB)
Gives "inf" if denominator is zero print(sA.div(sB,fill_value=0))
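A runnable sketch of the series arithmetic above, using the same sA and sB as in the examples; it contrasts plain + (NaN wherever either operand is missing) with add(fill_value=0):

```python
import pandas as pd
import numpy as np

sA = pd.Series([1, 2, 3, np.nan], index=['a', 'b', 'c', 'd'])
sB = pd.Series([4, 5, 6, 7], index=['a', 'x', 'c', 'd'])

plain = sA + sB              # 'b', 'd', 'x' become NaN (missing on one side)
filled = sA.add(sB, fill_value=0)  # missing side treated as 0

print(plain)
print(filled)   # a=5.0, b=2.0, c=9.0, d=7.0, x=5.0
```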
DataFrame (Two Dimensional or 2D structure)
Accessing dataframe elements by using #use .loc indexer to specify row and col index labels
label based indexing #General syntax: DataFrame.loc[ rows , columns ]
#specify labels e.g. as 'r1', ['r1','r3'], 'r1':'r5'
#for single row, discrete multiple rows and slice or range of rows
#in a slice, both start and stop index labels are included
#single row label returns the row as a series
#single column name returns the column as a series
#returns a series
df.loc['r1']
df.loc['r1' , : ]
df.loc[ : , 'c1']
#returns a dataframe (both axes given as lists or slices)
df.loc[['r1','r3'] , ['c1','c3']]
df.loc['r1':'r3' , 'c1':'c3']
df.loc[ : , 'c1':'c3']
df.loc['r1':'r3' , : ]
df.loc['r1':'r3' , : 'c1']
#returns a series (exactly one axis given as a single label)
df.loc['r1' , ['c1','c3']]
df.loc[['r1','r3'] , 'c1']
df.loc[ 'r1' , 'c1':'c3']
df.loc['r1',:] > 90
df.loc[:,'c1'] > 90
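A runnable sketch of label-based indexing with .loc, on a small made-up dataframe (the labels 'r1'..'c3' follow the examples above); note that label slices include both endpoints:

```python
import pandas as pd

# a small labelled dataframe with made-up marks
df = pd.DataFrame([[95, 80, 70], [60, 85, 90], [75, 65, 88]],
                  index=['r1', 'r2', 'r3'],
                  columns=['c1', 'c2', 'c3'])

row = df.loc['r1']                  # single row label -> Series
col = df.loc[:, 'c1']               # single column label -> Series
sub = df.loc['r1':'r2', 'c1':'c2']  # slices on both axes -> DataFrame
# both 'r2' and 'c2' are included: label slices include the stop label
mask = df.loc[:, 'c1'] > 90         # Boolean Series, True only for 'r1'
```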
Filtering rows and columns of a #filtering rows and columns without writing any explicit condition
dataframe by using Boolean indexing #but by giving a list of Boolean values for each row and column
#use a Boolean list specifying True for the rows and columns to be displayed and False
for the rows and columns to be omitted
#the Boolean list must contain one value per row (or per column); a list of the wrong length raises an error
Example:
To increment the marks by 5 if marks are less than 33
df.loc[df['marks']<33, 'marks'] = df['marks']+5
If MRP is >70 then store 10% of MRP into a new column called 'Discount'
df['Discount'] = df[df['MRP']>70]['MRP']*10/100
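The two conditional-update examples above as a runnable sketch, with made-up marks and MRP values; note that rows failing the MRP condition get NaN in the new 'Discount' column:

```python
import pandas as pd

df = pd.DataFrame({'marks': [30, 50], 'MRP': [100, 60]})

# increment marks by 5 only where marks < 33
df.loc[df['marks'] < 33, 'marks'] = df['marks'] + 5

# 10% of MRP only where MRP > 70; other rows become NaN
df['Discount'] = df[df['MRP'] > 70]['MRP'] * 10 / 100

print(df)   # marks become [35, 50]; Discount is [10.0, NaN]
```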
Descriptive Statistics with Pandas
Applying statistical or aggregation functions on dataframes
Aggregating data to get summary statistics of the data so as to analyse it effectively.
Apply a library function such as max(), min(), count(), sum(), mean()... or a
custom function on df, df.loc[...], df.iloc[...], df.colLabel or df['colLabel']
While using Pandas' statistical functions with a dataframe:
axis=0 (default) means for each column and axis=1 means for each row
NOTE!
This matches the NumPy convention: axis=0 collapses the rows (one result per
column) and axis=1 collapses the columns (one result per row).
Applying functions on all columns of the df.functionName()
dataframe
(certain functions work on numeric
columns only)
Applying functions on a column of the df['colLabel'].functionName()
dataframe
Applying functions on multiple columns df[['colLabel1','colLabel2',...]].functionName()
of the dataframe
Applying functions on a row of the df.loc['rowLabel',:].functionName()
dataframe
Applying functions on a range of rows of df.loc['startRowLabel':'endRowLabel',:].functionName()
the dataframe
Applying functions on a slice or a subset df.loc['startRowLabel':'endRowLabel',
of the dataframe 'startColLabel':'endColLabel'].functionName()
Count of values of each row or column in df.count(axis=0|1|None, skipna=True|False|None,
the dataframe numeric_only=True|False|None, min_count=0)
Sum of all values of each row or column df.sum(axis=0|1|None, skipna=None,
in the dataframe numeric_only=None, min_count=0)
Max (maximum) value of each row or df.max(axis=0|1|None, skipna=True|False|None,
column in the dataframe numeric_only=True|False|None)
Min (minimum) value of each row or df.min(axis=0|1|None, skipna=True|False|None,
column in the dataframe numeric_only=True|False|None)
Mean (computed mean or average of the df.mean(axis=0|1|None, skipna=True|False|None,
dataset) of each row or column in the numeric_only=True|False|None).round(2)
dataframe
Mode (value that appears most often in a df.mode(axis=0|1|None, numeric_only=True|False|None)
dataset) of each row or column in the df.loc['s01':'s02','marks1':'marks2'].mode(axis=1)
dataframe
Median (middle value of a dataset, value df.median(axis=0|1|None, skipna=True|False|None,
which divides dataset into two equal parts) numeric_only=True|False|None)
of each row or column in the dataframe
MAD (mean absolute deviation) – average df.mad(axis=0|1|None, skipna=True|False)
distance between each data value and the #df.mad() was removed in pandas 2.0;
mean of a dataset #(df - df.mean()).abs().mean() gives the same column-wise result
Var (unbiased variance) over the df.var(axis=0|1|None, skipna=True|False|None,
requested axis. Average of the squared numeric_only=True|False|None)
differences from the mean. Variance is the
expectation of the squared deviation of a
random variable from its mean. The var()
function calculates the variance of a given
set of numbers, variance of the dataframe,
variance of one or more columns or
variance of rows.
Std (standard deviation) – Square root of df.std(axis=0|1|None, skipna=True|False)
the variance.
Quantile is the point in a distribution that df.quantile(q=f, axis=0|1|None, numeric_only=True|False|None)
relates to the rank order of a value in that where q is a float or an array-like sequence of values between
distribution. 0 and 1.0
To produce quartiles (4-quantiles), use the list
q=[0.25,0.5,0.75,1.0]
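A runnable sketch of the aggregation functions above, using made-up marks for three students; it shows the axis=0 (per-column) default against axis=1 (per-row):

```python
import pandas as pd

df = pd.DataFrame({'marks1': [40, 60, 80], 'marks2': [50, 70, 90]},
                  index=['s01', 's02', 's03'])

colsum = df.sum()               # axis=0 (default): one total per column
rowsum = df.sum(axis=1)         # axis=1: one total per row
means = df.mean().round(2)      # column means
quarts = df.quantile(q=[0.25, 0.5, 0.75])  # quartiles per column

print(colsum)   # marks1 -> 180, marks2 -> 210
print(rowsum)   # s01 -> 90, s02 -> 130, s03 -> 170
```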
Attributes of DataFrames Access properties/attributes of a DataFrame by using the property name with the
DataFrame name.
DataFrame.index displays row labels ForestAreaDF.index
Index(['GeoArea', 'VeryDense', 'ModeratelyDense',
'OpenForest'], dtype='object')
DataFrame.columns displays column
labels
DataFrame.dtypes displays data type of ForestAreaDF.dtypes
each column in the DataFrame Assam int64
Kerala int64
Delhi float64
dtype: object
DataFrame.values displays a NumPy ForestAreaDF.values
array([ [7.8438e+04, 3.8852e+04, 1.4830e+03],
ndarray having all the values in the [2.7970e+03, 1.6630e+03, 6.7200e+00],
DataFrame, without the axes labels. [1.0192e+04, 9.4070e+03, 5.6240e+01],
[1.5116e+04, 9.2510e+03, 1.2945e+02]])
DataFrame.shape displays a tuple ForestAreaDF.shape
representing the dimensionality of the (4, 3) #means ForestAreaDF has 4 rows and 3 columns
DataFrame.
DataFrame.size displays the number of ForestAreaDF.size
values (elements) in the DataFrame, i.e. 12 #means ForestAreaDF has 12 values in it
rows × columns.
DataFrame.T transposes the DataFrame – ForestAreaDF.T
GeoArea VeryDense ModeratelyDense OpenForest
row index and column labels of the Assam 78438.0 2797.00 10192.00 15116.00
DataFrame are interchanged. Kerala 38852.0 1663.00 9407.00 9251.00
Equivalent of writing Delhi 1483.0 6.72 56.24 129.45
DataFrame.transpose()
DataFrame.head(n) displays first n rows ForestAreaDF.head(2)
Assam Kerala Delhi
of the DataFrame. If parameter n is not GeoArea 78438 38852 1483.00
specified, then by default, it returns first 5 VeryDense 2797 1663 6.72
rows of the DataFrame.
DataFrame.tail(n) displays last n rows of ForestAreaDF.tail(2)
Assam Kerala Delhi
the DataFrame. If parameter n is not ModeratelyDense 10192 9407 56.24
specified then by default, it gives last 5 OpenForest 15116 9251 129.45
rows of the DataFrame.
DataFrame.empty returns value True if ForestAreaDF.empty
False
DataFrame is empty and False otherwise.
df=pd.DataFrame() #Create an empty dataFrame
df.empty
True
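A runnable sketch of the attributes above, on a two-column subset of the forest-area figures; note that shape is a tuple while size is a single number:

```python
import pandas as pd

# a subset of the chapter's forest-area figures, for illustration
ForestAreaDF = pd.DataFrame({'Assam': [78438, 2797], 'Kerala': [38852, 1663]},
                            index=['GeoArea', 'VeryDense'])

print(ForestAreaDF.shape)    # (2, 2) -> tuple (rows, columns)
print(ForestAreaDF.size)     # 4 -> single number, rows x columns
print(ForestAreaDF.T.shape)  # (2, 2) -> rows and columns interchanged
print(ForestAreaDF.empty)    # False
```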
Sorting Dataframes df.sort_values( by=colLabels|rowLabels, axis=0|1,
Arranging rows and columns in ascending ascending=True|False, inplace=True|False,
kind='quicksort', na_position='first|last')
or descending order on the basis of the
Arranging rows by the values of a column
values of one or more specified rows and df.sort_values(by='marks1')
columns. Arranging rows by the values of multiple columns
df.sort_values(by=['class','marks1'])
Arranging rows by the values of multiple columns in different orders
df.sort_values(by=['class','marks1'],ascending=[False,True])
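The sorting calls above as a runnable sketch, with made-up class/marks data; the second sort orders class descending and marks1 ascending within each class:

```python
import pandas as pd

df = pd.DataFrame({'class': ['12A', '12B', '12A'],
                   'marks1': [70, 50, 60]})

by_marks = df.sort_values(by='marks1')
by_both = df.sort_values(by=['class', 'marks1'], ascending=[False, True])

print(by_marks)   # rows ordered 50, 60, 70
print(by_both)    # 12B first, then 12A rows by ascending marks1
```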
Checking/detecting missing values in a df.isnull() returns a Boolean same-sized DataFrame indicating if values are missing
dataframe df.notnull() returns a Boolean same-sized DataFrame which is just opposite of
isnull()
df.isnull().sum() returns a Series containing the number of missing values for
each column
df.isnull().sum().sum() returns the total number of missing values in the entire
dataframe
Dropping the missing values in a df.dropna(axis=0|1, how='any'|'all', thresh=None|num,
dataframe subset=None|[cols], inplace=True|False)
To drop row if any NaN values are present:
df.dropna(axis=0)
To drop column if any NaN values are present:
df.dropna(axis=1)
To drop the rows having fewer than 6 non-NaN values:
df.dropna(axis=0,thresh=6)
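A runnable sketch of missing-value detection and dropping, on a made-up frame with two NaN values:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})

percol = df.isnull().sum()        # missing values per column
total = df.isnull().sum().sum()   # 2 -> total missing values
clean = df.dropna(axis=0)         # only row 0 survives (no NaN in it)

print(percol)
print(total)
print(clean)
```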
Importing and exporting data between
csv file and dataframe
Importing csv file to a dataframe Reading a csv file with pandas into a dataframe
Using the pd.read_csv() function to read a CSV file
Parameters:
filepath_or_buffer URL or path of the CSV file
sep the separator. By default it is comma ',' as in csv (comma
separated values)
index_col uses the passed column as the row labels instead of the default
index 0, 1, 2, 3…
header uses passed row (int) or rows (int list) as header
usecols uses passed cols (string list) to make the dataframe
squeeze if True and only one column is passed, returns a pandas series
(removed in pandas 2.0; use df.squeeze('columns') instead)
skiprows skips passed rows in the new data frame
df = pd.read_csv("nba.csv")
#or df = pd.read_csv("https://media.geeksforgeeks.org/nba.csv")
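A runnable sketch of read_csv; an in-memory buffer stands in for a file such as "nba.csv" (the column names here are made up) so the example runs without any file on disk:

```python
import io
import pandas as pd

# io.StringIO lets read_csv consume a string exactly as it would a file path
csvtext = "roll,name,marks\n1,Abhay,75\n2,Vijay,82\n"
df = pd.read_csv(io.StringIO(csvtext), sep=',', index_col='roll')

print(df)   # two rows, indexed by the 'roll' column
```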
Importing database connectivity libraries import mysql.connector as myconnector
for MySQL connection OR
import pymysql
Creating database connection object to #mysql-connector
connect to the mysql database conn = myconnector.connect( host='localhost',
user='root', password='', database='mydb')
#pymysql
conn = pymysql.connect( host='localhost',
user='root', port=3306, password='', db='mydb',
cursorclass=pymysql.cursors.DictCursor)
Creating cursor object to execute SQL #mysql-connector
queries cursor = conn.cursor(dictionary=True)
#pymysql
cursor = conn.cursor()
Executing SQL queries sql='select * from t1'
cursor.execute(sql)
OR
df=pd.read_sql(sql,con=conn)
Fetching MySQL table rows in the cursor rows=cursor.fetchall()
by using a loop to access one row at a time df=pd.DataFrame(rows)
Committing DML SQL queries, if any sql='delete from t1'
cursor.execute(sql)
conn.commit()
Handling errors/exceptions try:
...
except myconnector.Error as e:
print("ERROR%d:%s"%(e.args[0],e.args[1]))
Closing connections and destroying try:
objects before terminating the application ...
finally:
cursor.close()
conn.close()
raise SystemExit #terminate app
Change datatype of index of the dataframe df.index=df.index.astype('str')
e.g. from int to str after importing data
into dataframe from mysql table
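The whole connect-query-fetch flow above, as a runnable sketch; sqlite3 (standard library) stands in for a MySQL server so no database setup is needed, but the cursor/commit/read_sql pattern is the same one used with pymysql:

```python
import sqlite3
import pandas as pd

# in-memory database stands in for the MySQL server
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE t1 (roll INTEGER, name TEXT)')
cursor.execute("INSERT INTO t1 VALUES (1, 'Abhay'), (2, 'Vijay')")
conn.commit()

# same call shape as pd.read_sql(sql, con=conn) with a pymysql connection
df = pd.read_sql('SELECT * FROM t1', con=conn)
df.index = df.index.astype('str')   # change index dtype after import
conn.close()

print(df)
```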
MATPLOTLIB Example (plots ver01.py) – Line, Bar & Histogram
#pip install matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#====plot data
year = [2015,2016,2017,2018]
percentMale = [70,82,63,90]
percentFemale = [80,52,73,43]
#ticks
xticks=np.arange(2015,2020)
yticks=np.arange(0,101,10)
'''
#PLOT – LINE CHART
plt.plot(year,percentMale,color='b')
plt.plot(year,percentFemale,color='r')
'''
'''
#BAR
x=np.arange(year[0],year[len(year)-1]+1)
plt.bar(x+0.00, percentMale, color='b', width=0.25)
plt.bar(x+0.25, percentFemale, color='r', width=0.25)
'''
'''
#BARH - Grouped Horizontal Bar (not stacked)
y=np.arange(year[0],year[len(year)-1]+1)
width = 0.3
fig, ax = plt.subplots()
ax.barh(y+(0.5*width), percentMale, width, color='red', label='Male')
ax.barh(y+(0.5*width)+width, percentFemale, width, color='green', label='Female')
ax.set(yticks=y+width, yticklabels=year) #do not set plt.xticks() and plt.yticks()
xticks = np.arange(0,101,10)
yticks=y+width
'''
'''
#HISTOGRAM
#plt.hist(values, bins, range, ...)
ages=[2,5,70,40,30,45,50,45,43,40,44,60,7,13,57,18,90,77,32,21,20,40] #data values
myrange = (0, 100) #set the range and no. of intervals
bins = 10
plt.hist(ages,bins,myrange,color='green',histtype='bar',rwidth=0.8) # plotting a histogram
xticks=np.arange(0,100,10)
yticks=np.arange(0,11)
'''
#====common plot settings
plt.title('MYPLOT')
plt.xticks(xticks) #plt.xticks(xticks,labels,rotation='vertical')
plt.yticks(yticks)
plt.xlabel('YEAR') #plt.xlabel('X Labels', fontsize=6)
plt.ylabel('PERCENTAGE') #plt.ylabel('Y Labels', fontsize=10)
plt.legend(['Male','Female']) #legends used in the plot
#plt.grid(True)
plt.show()
#plt.savefig("temp.png") #save the plot as an image
Python MySQL Database Connectivity Example (PythonMySQLdbCRUD.py)
#Python MySQL Database Connectivity
import pandas as pd
import numpy as np
import pymysql
roll = input('Roll: ')
name = input('Name: ')
std = input('Class & Section (e.g. 12A): ')
dob = input('DOB (YYYYMMDD): ')
marks = input('Marks %: ')
sql = '''INSERT INTO student(roll,name,std,dob,marks)
VALUES (%s,%s,%s,%s,%s)'''
try:
cursor.execute(sql, (roll,name,std,dob,marks))
conn.commit()
print("New record added successfully.")
except pymysql.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
'''--------------------------------
#to insert multiple rows all at once
sql = "INSERT INTO student(roll,name) VALUES(%s, %s)"
val = [('123','Abhay',...),('456','Vijay',...),...]
cursor.executemany(sql,val)
--------------------------------'''
def selectall():
sql = "SELECT * FROM student"
try:
cursor.execute(sql)
#number_of_rows = cursor.execute(sql)
#print(number_of_rows)
result = cursor.fetchall()
print('''ROLL \t NAME \t\t CLASS \t DOB \t\t MARKS''')
print("----------------------------------------------------------------------")
for row in result:
#to use index rather than column name to access cursor data values
#disable 'cursorclass=pymysql.cursors.DictCursor' in conn declaration
#print(row[0],"\t",row[1],"\t",row[2],"\t",row[3],"\t",row[4])
#0:>7 for right-align; #0:<7 for left-align; #0:7 for exact length
print("{0:<9}{1:<16}{2:<8}{3:<16}{4:<5}".format(row['roll'],row['name'],row['std'],str(row['dob']),str(row['marks'])))
#print(row['roll'],row['name'],row['std'],row['dob'],row['marks'])
except pymysql.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
def selectcondition():
column = input("Enter column name for record search...")
value = input("Enter value for the column...")
sql = "SELECT * FROM student WHERE {} = %s".format(column)
try:
cursor.execute(sql, (value,))
result = cursor.fetchall()
print("ROLL \t NAME \t STD \t DOB \t MARKS")
print("----------------------------------------------------------------------")
for row in result:
    print(row['roll'],"\t",row['name'],"\t",row['std'],"\t",row["dob"],"\t",row["marks"])
except pymysql.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
def update():
roll = input("Enter roll of record to be updated...")
column = input("Enter the column to be updated...")
value = input("Enter the new value of the column...")
sql = "UPDATE student SET {}=%s WHERE roll = %s".format(column)
try:
cursor.execute(sql, (value, roll))
conn.commit()
print("\nSuccessfully Updated...\n")
except pymysql.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
def delete():
roll = input("Enter roll of record to be deleted...")
sql = "DELETE FROM student WHERE roll = %s"
try:
cursor.execute(sql, (roll,))
conn.commit()
print("\nSuccessfully Deleted...\n")
except pymysql.Error as e:
print("ERROR %d: %s" %(e.args[0], e.args[1]))
#=====call for Starting function=====
main()
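The parameterised INSERT / executemany() pattern from the listing above, as a runnable sketch; sqlite3 (standard library) stands in for the MySQL server, and its ? placeholder replaces pymysql's %s, but the execute/executemany/commit flow is identical:

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # in-memory stand-in for MySQL
cursor = conn.cursor()
cursor.execute('CREATE TABLE student (roll TEXT, name TEXT)')

# single parameterised insert (never build SQL by string concatenation)
cursor.execute("INSERT INTO student(roll,name) VALUES(?, ?)", ('123', 'Abhay'))

# multiple rows all at once with executemany()
val = [('456', 'Vijay'), ('789', 'Ajay')]
cursor.executemany("INSERT INTO student(roll,name) VALUES(?, ?)", val)
conn.commit()
```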
DataFrame Plot – Line, Bar & Histogram
#----------------------------
#DATAFRAME plots - line & bar
#DATA
year = [2015,2016,2017,2018] #index for x-axis
percentMale = [70,82,63,90] #numeric column for y-axis
percentFemale = [80,52,73,43] #another numeric column for y-axis
#DATAFRAME
#By default - 'index' is laid out on X-axis and all numeric columns on Y-axis
df = pd.DataFrame({'Male': percentMale, 'Female': percentFemale}, index=year)
#PLOT
#ax = df['columnname'].plot(kind='bar',rot=0)
#to plot specified column(y-axis) against index(x-axis)
ax = df.plot(kind='bar',rot=0) #kind='bar|barh|line'
#PLOT settings
#ticks and legends are automatically set
plt.title('MYPLOT')
plt.xlabel('YEAR')
plt.ylabel('PERCENTAGE')
plt.show()
#----------------------------
#----------------------------
#DATAFRAME plots - histogram
#frequencies
ages=[2,5,70,40,30,45,50,45,43,40,44,60,7,13,57,18,90,77,32,21,20,40]
# setting the ranges and no. of intervals
myrange = (0, 100)
bins = 10
#bins=[0,20,40,60,80,100]
# plotting a histogram
df = pd.DataFrame({'Age': ages})
ax = df.plot(kind='hist',range=myrange,bins=bins,rwidth=0.8,alpha=0.5)
plt.title('MYPLOT')
plt.xlabel('AGE')
plt.ylabel('FREQUENCY')
plt.show()
#----------------------------
Joining, Merging and Concatenating
DataFrames
Appending DataFrames The DataFrame.append() method merges two DataFrames. It appends the rows of the second
dataframe to the end of the first DataFrame and may end up with duplicate index labels.
Columns not present in the first dataframe are added as new columns in the resultant
dataframe. Columns remain unique. (Note: DataFrame.append() was removed in pandas 2.0;
pd.concat([df1, df2]) is the replacement.)
df1=pd.DataFrame([[1,2,3],[4,5],[6]],
columns=['C1','C2','C3'], index=['R1','R2','R3'])
df2=pd.DataFrame([[10,20],[30],[40,50]],
columns=['C2','C5'], index=['R4','R2','R5'])
df=df1.append(df2)
C1 C2 C3 C5
R1 1.0 2.0 3.0 NaN
R2 4.0 5.0 NaN NaN
R3 6.0 NaN NaN NaN
R4 NaN 10.0 NaN 20.0
R2 NaN 30.0 NaN NaN
R5 NaN 40.0 NaN 50.0
Set parameter sort=True to get the column labels to appear in sorted order.
df=df1.append(df2,sort=True)
Set the verify_integrity=True parameter to raise an error if row labels are duplicated. By
default, verify_integrity=False, which accepts duplicate row labels while appending the
DataFrames.
Set the ignore_index=True parameter if row index labels are to be ignored in the resultant
dataframe. By default, ignore_index=False retains the index labels.
dFrame1 = dFrame1.append(dFrame2, ignore_index=True)
C1 C2 C3 C5
0 1.0 2.0 3.0 NaN
1 4.0 5.0 NaN NaN
2 6.0 NaN NaN NaN
3 NaN 10.0 NaN 20.0
4 NaN 30.0 NaN NaN
5 NaN 40.0 NaN 50.0
The append() method can also be used to append a series or a dictionary to a DataFrame.
Concatenating DataFrames The concat() function concatenates dataframes along an axis while performing optional set
logic (union or intersection) applied on the index or the other axis i.e. columns. By default it
joins multiple dataframes vertically (row-wise, one after another).
df1 = pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},
index=[0,1])
df2 = pd.DataFrame({'A':['A2','A3'],'B':['B2','B3']},
index=[2,3])
df3 = pd.DataFrame({'A':['A4','A5'],'B':['B4','B5']},
index=[4,5])
pd.concat([df1,df2,df3])
A B
0 A0 B0
1 A1 B1
2 A2 B2
3 A3 B3
4 A4 B4
5 A5 B5
Merging DataFrames merge() function joins dataframes using a key column. Similar to SQL join using a
common column. Similar to df.join(), but uses a common key column to join on rather than
the common index. It joins multiple dataframes horizontally (column-wise, one after
another).
df1 = pd.DataFrame({'Key':['K0','K1'],
'A':['A0','A1'],'B':['B0','B1']})
df2 = pd.DataFrame({'Key':['K0','K1'],
'C':['C0','C1'],'D':['D0','D1']})
pd.merge(df1,df2,how='inner',on='Key')
A B Key C D
0 A0 B0 K0 C0 D0
1 A1 B1 K1 C1 D1
Joining DataFrames The join() function joins dataframes using common index. Similar to SQL join using a
common column. Similar to pd.merge(), but uses common index to join on rather than a
common key column. It joins multiple dataframes horizontally (column-wise, one after
another).
df1 = pd.DataFrame({'A':['A0','A1'],'B':['B0','B1']},
index=['K0','K1'])
df2 = pd.DataFrame({'C':['C0','C1'],'D':['D0','D1']},
index=['K0','K1'])
df1.join(df2)
A B C D
K0 A0 B0 C0 D0
K1 A1 B1 C1 D1
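A runnable sketch contrasting merge() and join(), using the same made-up 'Key' data as above; moving 'Key' into the index with set_index() makes join() pair the rows exactly as merge() does on the column:

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['K0', 'K1'], 'A': ['A0', 'A1']})
df2 = pd.DataFrame({'Key': ['K0', 'K1'], 'C': ['C0', 'C1']})

# merge() joins on the common 'Key' column
m = pd.merge(df1, df2, how='inner', on='Key')

# join() joins on the index, so set 'Key' as the index first
j = df1.set_index('Key').join(df2.set_index('Key'))

print(m)
print(j)
```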