Pandas Class XII (2021-22)
Pandas
• It is a Python library for data analysis
• Author is Wes McKinney
• ["IP","maths","hindi"]
Accessing elements from array
• Array_name[index]
• books=["IP","maths","hindi"]
• a=np.array(books)
• a[1]
• 'maths'
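The indexing steps above can be tried as a small runnable sketch. The negative index is an extra illustration that is not on the slide.

```python
import numpy as np

books = ["IP", "maths", "hindi"]
a = np.array(books)

second = a[1]   # index 1 is the second element, because counting starts at 0
last = a[-1]    # a negative index counts from the end of the array

print(second)   # maths
print(last)     # hindi
```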
Creating multidimensional array
• Also called a matrix; it has multiple rows and columns
• import numpy as np
• Subject=[[1,2,3,4,5],["hin","soc","bio","phy","mat"]]
• x=np.array(Subject)
• print(x)
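A minimal sketch of the same two-row array follows. One point worth noticing: because the rows mix numbers and text, NumPy stores every element as text so that the whole array has one data type.

```python
import numpy as np

subject = [[1, 2, 3, 4, 5], ["hin", "soc", "bio", "phy", "mat"]]
x = np.array(subject)

print(x)
print(x[1, 2])    # row 1, column 2 -> bio
print(x[0, 0])    # the number 1 is now stored as the text '1'
```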
Anatomy of array
• Number of rows and number of columns
Shape attribute
• prize=[[1,2,3,4,5,6],[10,20,30,40,50,60],[100,200,300,400,500,600]]
• p=np.array(prize)
• p.shape
• (3, 6)
• The shape attribute returns a single value for a one-dimensional array, and both rows and columns for a multidimensional array
dtype attribute
• Used to find the data type for an array
• array.dtype
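The shape and dtype attributes described above can be checked with a short sketch. The one-dimensional array `one_d` is a made-up example added for comparison.

```python
import numpy as np

prize = [[1, 2, 3, 4, 5, 6],
         [10, 20, 30, 40, 50, 60],
         [100, 200, 300, 400, 500, 600]]
p = np.array(prize)
one_d = np.array([1.5, 2.5, 3.5, 4.5])   # hypothetical 1-D array

print(p.shape)       # (3, 6): 3 rows and 6 columns
print(one_d.shape)   # (4,): a single value for a 1-D array
print(one_d.dtype)   # float64
```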
Difference
• Difference between numpy array and lists
• Both hold values or elements in the same way and we can access them in the same manner, but
• once a NumPy array is created its size cannot be changed; its elements can be overwritten
• In a NumPy array all elements must have the same data type
• NumPy arrays support vectorized operations: a function you apply is performed on every item, element by element; a list cannot do this
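The differences listed above can be seen side by side in a small sketch; the variable names are illustrative only.

```python
import numpy as np

marks_list = [10, 20, 30]
marks_array = np.array(marks_list)

doubled_array = marks_array * 2                # vectorized: every element is doubled
repeated_list = marks_list * 2                 # a list is repeated, not doubled
doubled_list = [m * 2 for m in marks_list]     # a list needs an explicit loop

marks_array[0] = 99                            # overwriting an element is allowed

print(doubled_array)
print(repeated_list)
print(doubled_list)
```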
Numpy data types
• Int
• Float
• Complex
• String
• UNICODE
Ways to create numpy array
• 1. empty array using empty()
• empty() creates an array filled with arbitrary garbage values
• Example
• 2. creating an array filled with zeros using zeros()
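A short sketch of both functions, with sizes chosen only for illustration. The values printed for `empty()` differ from run to run, so only its shape is meaningful.

```python
import numpy as np

e = np.empty(3)         # 3 uninitialised values: whatever was in memory
z = np.zeros((2, 3))    # 2 rows and 3 columns, all 0.0

print(e.shape)
print(z)
```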
Number of elements to be generated
Default value is 50
Example 1
np.linspace(2,4,4)
first = 2, last = 4, n = 4
space = (last - first)/(n - 1)
space = (4 - 2)/(4 - 1) = 2/3 = 0.66666667
2 + 0.66666667 = 2.66666667
2.66666667 + 0.66666667 = 3.33333333
3.33333333 + 0.66666667 = 4
Result: 2, 2.66666667, 3.33333333, 4
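The hand calculation above can be compared with NumPy's own result. This is a sketch of the spacing rule for the default case, where the last value is included.

```python
import numpy as np

first, last, n = 2, 4, 4
space = (last - first) / (n - 1)                 # note the brackets around n - 1
manual = [first + i * space for i in range(n)]   # keep adding the spacing

result = np.linspace(first, last, n)
default_count = len(np.linspace(0, 1))           # n left out, so 50 values are generated

print(manual)
print(result)
print(default_count)
```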
Quiz-1
• Do this on paper and send a screenshot
• np.linspace(3,9,6)
• Formula
• space=(last-first)/(n-1)
• last = last value in linspace
• first = first value in linspace
• n = total number of elements to be generated
• Then add space to the first value, then add space to the second value, and so on.
• Now you are having sufficient knowledge for
learning Pandas
Pandas data structure
• Data structure:
• A particular way of storing and organizing data in a computer for easy access
• Pandas uses two basic data structures:
• Series and DataFrame
• To start pandas we have to import pandas and
numpy both
• import pandas as pd
• import numpy as np
Series data structure
• It is one of the most important data structures
• It represents a one-dimensional array of indexed data.
• It has two components: 1. an array of actual data
• 2. an associated array of indexes or data labels
• Both components are one-dimensional arrays of the same length
Creating series objects
• There are many ways to create a Series
• 1. empty Series using Series() with no parameter
• This creates an empty Series with no values and default data type float64 (object in pandas 2.0 and later)
• Let's see
• 2. non-empty Series
• Series object=pd.Series(data,index=idx)
• Let's see
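Both ways of creating a Series can be sketched as follows. The marks and labels are made-up sample data, and the dtype of the empty Series is passed explicitly so the result does not depend on the pandas version.

```python
import pandas as pd

# 1. Empty Series (dtype given explicitly)
empty = pd.Series(dtype="float64")

# 2. Non-empty Series: data plus an index of labels of the same length
marks = pd.Series([45, 67, 89], index=["IP", "maths", "hindi"])

print(empty)
print(marks)
print(marks["maths"])   # 67
```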
Quiz
• Series obj=pd.Series(data=None,index=None,dtype=None)
• If all are None, default values are used
Quiz
• Create an array with the values 12, 13, 14, 15
• Use the same array to create a pandas Series with index None and data type float64.
• If possible, show the result
• 4. using a mathematical function to create a Series
• Series object=pd.Series(data=function,index=None)
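One possible reading of "data=function" is an expression applied to a NumPy array. The sketch below assumes that reading and uses squares as the example; the names are illustrative.

```python
import numpy as np
import pandas as pd

base = np.arange(1, 5)                           # 1, 2, 3, 4
squares = pd.Series(data=base ** 2, index=base)  # data computed from the array

print(squares)
print(squares.loc[3])   # value stored under the label 3 -> 9
```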
Series object attributes
• Series attributes are used in the following format in pandas
• Series object.attribute name
• Example:
• Object.index
• Let's see
• 1. Series.index
• Returns the index of the Series
• As you have already seen in the previous example
• 2. Series.values
• Returns the Series data as an array
• 3. Series.dtype returns the data type of the Series
• 4. Series.shape returns a tuple with the number of elements, for example (5,); a Series has rows only
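The four attributes above can be inspected on a small sample Series (the data is made up for illustration).

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.index)    # the labels
print(s.values)   # the data as a NumPy array
print(s.dtype)    # the data type of the values
print(s.shape)    # (3,): three elements, one dimension
```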
Dimension, size and bytes of series
ndim
nbytes
Number of elements * 8
3*8=24
• If float64, multiply by 8
• If float32, multiply by 4
• If int16, multiply by 2 to get nbytes
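The byte counts above can be verified with a three-element Series in each data type; the values themselves are arbitrary.

```python
import pandas as pd

s64 = pd.Series([1.5, 2.5, 3.5])             # float64 by default
s32 = s64.astype("float32")
s16 = pd.Series([1, 2, 3], dtype="int16")

print(s64.ndim, s64.size)   # 1 dimension, 3 elements
print(s64.nbytes)           # 3 * 8 = 24
print(s32.nbytes)           # 3 * 4 = 12
print(s16.nbytes)           # 3 * 2 = 6
```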
Quiz
• obj2=pd.Series(data=[12,13,14,15,12.5])
[Table: INDEX column with 0, 1, 2, 3, 4; DATA column]
Operation on series object
• 1. modifying elements of a Series object
• seriesobject[index]=new value
• It replaces only the value at that index with the new value
• If the index does not exist in the Series, a new index with the new value is added
Modify all values
• Seriesobject [start:stop]=new data value
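A sketch of the three kinds of modification described above, using a made-up Series with letter labels.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

s["b"] = 25     # existing label: the value is replaced
s["e"] = 50     # label not present: a new entry is added
s[0:2] = 0      # slice by position: the first two values are replaced

print(s)
```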
Quiz
• Find the error in the below code
• a=pd.Series(range(1,15,3),index=list('abc'))
• 2. The head() and tail() functions
• The head() function fetches the first rows of a pandas object and the tail() function fetches the last rows
• Pandasobject.head(n)
• Pandasobject.tail(n)
• Let's see
• If no value is passed to head() or tail(), the first five or last five rows are returned.
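A quick sketch of head() and tail() on a ten-element Series.

```python
import pandas as pd

s = pd.Series(range(1, 11))   # values 1 to 10

first_five = s.head()         # no argument: first five rows
first_two = s.head(2)
last_three = s.tail(3)

print(first_five.tolist())
print(first_two.tolist())
print(last_three.tolist())
```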
Vector operation on series object
• If you apply an expression or function to a Series object, it is applied to each item individually.
• Let's see
• We can apply the following legal operations
• obj+2
• obj-2
• obj*2
• obj/2
• obj>250
Filtering entries or values
• Object[object expression]
• obj[obj>250]
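The vector operations and the filter above can be tried on a sample Series; the four values are made up.

```python
import pandas as pd

obj = pd.Series([100, 200, 300, 400])

plus = obj + 2            # 2 is added to every item
half = obj / 2            # every item is divided by 2
mask = obj > 250          # True or False for every item
filtered = obj[obj > 250] # keeps only the items where the mask is True

print(plus.tolist())
print(mask.tolist())
print(filtered)
```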
quiz
• Difference between numpy array and series
object
Data frame and other operations
• The DataFrame object of pandas can store two-dimensional data
• The Panel object could store three-dimensional data (Panel has been removed from recent pandas versions)
DataFrame
• It is another way to store and represent pandas data, in two dimensions.
• It is like a spreadsheet or Excel file
• It is a 2D labelled array, or an ordered collection of columns.
• Columns can have different data types
• The index can be numbers or characters
• Values are mutable
• Size is mutable
Creating DataFrame
• As before, we have to import
• import pandas as pd
• import numpy as np
• Then
• DataFrame object name=pd.DataFrame(data structure)
DataFrame using dictionary
QUIZ
• Index specified by us
• The index sequence and the specified index sequence must match
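A minimal sketch of a DataFrame built from a dictionary with an index specified by us. The names, marks and row labels are hypothetical sample data; the number of index labels has to equal the number of rows.

```python
import pandas as pd

data = {"name": ["Asha", "Ravi", "Meena"],   # each key becomes a column
        "marks": [78, 85, 91]}

df = pd.DataFrame(data, index=["r1", "r2", "r3"])   # one label per row

print(df)
```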
DataFrame with 2D ndarray
index Quiz
Column name
• Specifying our own column names or index names
• using the columns keyword and the index keyword.
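A sketch of a DataFrame built from a 2D ndarray with the columns and index keywords; the array and labels are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

df = pd.DataFrame(arr, columns=["a", "b", "c"], index=["x", "y"])

print(df)
print(df.loc["y", "b"])   # row 'y', column 'b' -> 5
```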
Quiz
Index
Quiz
Displaying DataFrame
• Using variable name
• Using print command
DataFrame attributes
• We can use attribute in the following format
• DataFrame object.attribute name
• 1. index
• 2. columns
• 3. axes
• 4. dtypes
• 5. size
• 6. shape
• 7. values
• 8. empty
• 9. ndim
• 10. T
index
columns
axes
• Returns a list representing both axes (axis 0 for the index and axis 1 for the columns)
• dtypes returns the data type of each column in the DataFrame
• size returns the number of elements in the DataFrame
shape
• Returns the number of tuples and attributes (rows and columns) in the DataFrame
• Result format: (rows, columns)
• values returns the values of the DataFrame as an array
• empty indicates whether the DataFrame is empty or not
ndim
• Return number of dimensions
T
• Transpose index and columns
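The ten attributes listed above can be printed for a small sample DataFrame (the data is made up).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.axes)      # [row labels, column labels]
print(df.dtypes)    # data type of each column
print(df.size)      # 3 rows * 2 columns = 6
print(df.shape)     # (3, 2)
print(df.values)    # the data as a 2D array
print(df.empty)     # False
print(df.ndim)      # 2
print(df.T)         # rows and columns swapped
```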
Selecting and accessing data
• 1. accessing a column
• DataFrame object name[column name]
NaN values with size
NaN values with count()
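A sketch of how size and count() treat NaN in a column, assuming the slides compared the two on a column with a missing value. The column name and data are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"marks": [10.0, np.nan, 30.0]})
col = df["marks"]          # accessing one column gives a Series

print(col.size)            # 3: NaN is included
print(col.count())         # 2: NaN is left out
```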
Accessing multiple columns
• DataFrame object[[column1,column2......]]
• To select multiple columns, list the column names inside square brackets within the DataFrame's own square brackets
• Example
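A minimal sketch with three made-up columns, selecting two of them.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"],
                   "maths": [78, 85],
                   "IP": [90, 88]})

subset = df[["name", "IP"]]   # inner brackets: the list of column names

print(subset)
```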
Accessing subset
• We can access not only columns but also rows
• by using iloc and loc
• iloc is used to slice the DataFrame based on integer position
• loc is used to access single and multiple rows based on label
loc
• DataFrame object.loc[startrow:endrow]
Index column
iloc
[Figure: iloc examples selecting a row, and rows with columns]
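The difference between loc and iloc can be sketched on a DataFrame with letter row labels (sample data). Note that a loc slice includes the end label, while an iloc slice stops before the end position.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]},
                  index=["w", "x", "y", "z"])

by_label = df.loc["x":"y"]     # labels 'x' to 'y', both included
by_position = df.iloc[1:3]     # positions 1 and 2; position 3 is excluded
block = df.iloc[0:2, 0:1]      # first two rows, first column

print(by_label)
print(by_position)
print(block)
```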
Selecting individual values
• With the name of the row or its index number
• Dobj.column[row name or index number]
Modifying values
Adding and deleting columns
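The three slides above (selecting individual values, modifying values, adding and deleting columns) can be sketched together. The column names and the use of loc and del are one common way to do this, not necessarily the one shown in the original screenshots.

```python
import pandas as pd

df = pd.DataFrame({"maths": [70, 80], "IP": [90, 95]}, index=["r1", "r2"])

value = df.maths["r2"]                 # select one value: column, then row name
df.loc["r2", "maths"] = 85             # modify one value
df["total"] = df["maths"] + df["IP"]   # add a new column
del df["IP"]                           # delete a column

print(value)
print(df)
```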
Binary operation in DataFrame
• It requires two values
• The two values are picked element-wise
• Data from the two DataFrames is aligned
• on row and column labels: where they match, the operation is performed; where they do not, NaN is stored in the result
• We can add using + or add(): df1.add(df2)
• We can subtract using - or sub(): df1.sub(df2)
• For multiplication use * or mul()
• For division use / or div()
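A sketch of label alignment in a binary operation. The two DataFrames are made up so that only column "a" exists in both.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [10, 20], "c": [30, 40]})

total = df1 + df2          # same result as df1.add(df2)
same = df1.add(df2)

print(total)   # column 'a' is added; 'b' and 'c' have no partner, so they hold NaN
```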
df1.add(df2)
Axis 1
Axis 0
firstdf.drop('b',axis=0)
• This raises an error when 'b' is not a row label (axis 0 is the index):
• "['b'] not found in axis"
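The error above can be reproduced with a sketch. The contents of `firstdf` are not shown in the source, so the frame below is a hypothetical stand-in where 'b' is a column, not a row label.

```python
import pandas as pd

firstdf = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

without_b = firstdf.drop("b", axis=1)   # axis 1: 'b' is found among the columns

try:
    firstdf.drop("b", axis=0)           # axis 0: 'b' is not a row label
except KeyError as err:
    message = str(err)

print(without_b)
print(message)
```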
Data transfer between files, SQL database and DataFrame
• Data can be transferred from a DataFrame or Series into the CSV file format.
• CSV = comma separated values
• It is a simple way to store data
• It is a common format for data exchange
• It can be opened in Excel or Calc
• It is easy to import from or export to CSV
What Is a CSV File?
• A CSV file (Comma Separated Values file) is a
type of plain text file that uses specific
structuring to arrange tabular data. Because
it’s a plain text file, it can contain only actual
text data—in other words, printable ASCII or
Unicode characters.
• Each piece of data is separated by a comma.
• In general, the separator character is called a
delimiter, and the comma is not the only one
used. Other popular delimiters include the tab
(\t), colon (:) and semi-colon (;) characters.
Creating csv file
• A spreadsheet or text file can be saved as a CSV file by choosing the CSV format (.csv extension) while saving.
• Suppose the file name is data.xlsx
• Then you can save it as data.csv
• Your CSV file is created and ready to use.
Loading data from CSV to dataframe
• The file is stored in the Data folder on the c:\ drive
• The file name is sample.csv
• read_csv() is the function to read a CSV file
• Its argument is the file location in the computer
• The first row is treated as column headings
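A sketch of read_csv() that runs without needing a file on disk. The CSV text is made-up sample data held in memory; with a real file, the path to sample.csv would be passed instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first row holds the column headings
csv_text = "name,marks\nAsha,78\nRavi,85\nMeena,91\n"

df = pd.read_csv(io.StringIO(csv_text))

print(df)
```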
Specifying own column name
Reading specified number of rows
Reading csv with different separator
semicolon
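The three options named in the slide titles above (own column names, a limited number of rows, a different separator) can be combined in one sketch. The data is made up and has no heading row, which is why names is supplied.

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data without a heading row
text = "Asha;78\nRavi;85\nMeena;91\n"

df = pd.read_csv(io.StringIO(text),
                 sep=";",                   # separator is a semicolon
                 names=["name", "marks"],   # our own column names
                 nrows=2)                   # read only the first two rows

print(df)
```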
Summary of read_csv
Storing data frame data to csv
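A sketch of writing a DataFrame out as CSV with to_csv(). When no file name is given, the CSV text is returned as a string, which keeps the example self-contained; passing a path such as "data.csv" would write a file instead. The data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Asha", "Ravi"], "marks": [78, 85]})

csv_text = df.to_csv(index=False)   # index=False leaves out the row labels
lines = csv_text.splitlines()

print(lines)
```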