
Pandas Class XII (2021-22)

The document introduces pandas, a Python library for data analysis. Pandas allows importing and analyzing data in Python using data structures like Series and DataFrames. It is open source and offers high performance data tools. NumPy is also introduced as it is used to create arrays for use in pandas.

Uploaded by

Kishan Kikkeri

Introducing Python Pandas
• It is a Python library for data analysis
• Its author is Wes McKinney
• The name is derived from "panel data"
• It is open source
• It offers high-performance, easy-to-use data structures
• It provides data analysis tools
• To use pandas, we have to import the pandas library
Starting pandas
• 1. Using Jupyter Notebook
• We can import pandas by typing the following command in a Jupyter notebook
• import pandas as pd
• We can run a Jupyter cell by pressing the Shift and Enter keys together
Numeric Python
• Called NumPy
• NumPy arrays are named groups of elements of the same type
• It is also an open-source module of Python
• We can start using NumPy by typing the following command in the same Jupyter notebook
• import numpy as np
• Creating an array using NumPy
• Example 2
• Students=[35,36,40,39,26]
• Array with students' marks scored in 4 unit tests
• NumPy arrays are quite similar to Python lists but differ in functionality
Types of NumPy array
• There are two types of NumPy array
• 1. 1-D array or one-dimensional array
• 2. Multidimensional array
1-D array
• Also called vectors
• They have a single row or column only
Creating array 1-D
• import numpy as np
• books=["IP","maths","hindi"]
• a=np.array(books) # array created with name a
• print(a)

• ['IP' 'maths' 'hindi']
Accessing elements from array
• Array_name[index]

• books=["IP","maths","hindi"]
• a=np.array(books)
• a[1]
• 'maths'
Creating multidimensional array
• Also called a matrix; it has multiple rows and columns

• import numpy as np
• Subject=[[1,2,3,4,5],["hin","soc","bio","phy","mat"]]
• x=np.array(Subject)
• print(x)
Anatomy of array
• Number of rows and number of columns

Shape attribute
• prize=[[1,2,3,4,5,6],[10,20,30,40,50,60],[100,200,300,400,500,600]]
• p=np.array(prize)

• p.shape
• (3, 6)
• For a one-dimensional array, shape returns a single value (the number of elements); for a multidimensional array, it returns both the number of rows and the number of columns
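The behaviour of shape described above can be checked with a short sketch. The variable name marks is only an illustration; prize reuses the values from the slide.

```python
import numpy as np

# 1-D array: shape holds a single entry, the number of elements
marks = np.array([35, 36, 40, 39, 26])
print(marks.shape)   # (5,)

# 2-D array: shape is (rows, columns)
prize = np.array([[1, 2, 3, 4, 5, 6],
                  [10, 20, 30, 40, 50, 60],
                  [100, 200, 300, 400, 500, 600]])
print(prize.shape)   # (3, 6)
```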
dtype attribute
• Used to find the data type for an array
• array.dtype
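As a small illustration of dtype, the sketch below builds three arrays with made-up values and prints their data types. The exact integer type can depend on the platform, so no output is claimed for it.

```python
import numpy as np

marks = np.array([35, 36, 40])              # whole numbers
heights = np.array([1.5, 1.6])              # decimal numbers
books = np.array(["IP", "maths", "hindi"])  # text

print(marks.dtype)    # an integer type (platform dependent)
print(heights.dtype)  # float64
print(books.dtype)    # a Unicode string type
```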
Difference
• Difference between NumPy arrays and lists
• Both hold values or elements in the same way, and we can access them in the same manner, but
• Once a NumPy array is created, its size cannot be changed; its elements can still be overwritten
• In a NumPy array, all elements must be of the same data type
• NumPy arrays support vectorized operations: if you apply a function, it is performed on every item, element by element; lists do not support this
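The differences listed above can be seen in a short sketch. The names marks_list, marks_arr and mixed are illustrative only.

```python
import numpy as np

marks_list = [35, 36, 40, 39, 26]
marks_arr = np.array(marks_list)

# On a list, * repeats the list; on an array, it multiplies every element
doubled_list = marks_list * 2   # 10 items: the list repeated twice
doubled_arr = marks_arr * 2     # 70, 72, 80, 78, 52

# Elements can be overwritten in place; the size stays the same
marks_arr[0] = 38

# Mixed values are converted to one common type (here, strings)
mixed = np.array([1, "two", 3.0])
print(mixed.dtype.kind)         # U (Unicode string)
```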
Numpy data types
• Int
• Float
• Complex
• String
• UNICODE
Ways to create numpy array
• 1. Empty array using empty()
• empty() creates an array of the given size without initialising it, so it holds arbitrary garbage values
• Example
• 2. Creating an array filled with zeros using zeros()
• Creates an array of the specified size and data type filled with zeros only
• 3. Creating an array filled with ones using ones()
• Creates an array of the specified size and data type filled with ones only
• 4. Creating an array with a numerical range using the arange() function
• 5. Creating an array with a numerical range using linspace()
• It takes the start value, the end value and the number of elements to be generated
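The five ways listed above can be tried side by side. This is a minimal sketch; the sizes and ranges are arbitrary choices for illustration.

```python
import numpy as np

e = np.empty(3)                  # uninitialised: contents are arbitrary
z = np.zeros((2, 3), dtype=int)  # 2 rows, 3 columns, all 0
o = np.ones(4)                   # four 1.0 values (float64 by default)
r = np.arange(1, 10, 2)          # 1, 3, 5, 7, 9 (stop value excluded)
ls = np.linspace(0, 1, 5)        # 0, 0.25, 0.5, 0.75, 1 (end value included)

for name, arr in [("empty", e), ("zeros", z), ("ones", o),
                  ("arange", r), ("linspace", ls)]:
    print(name, arr.shape, arr.dtype)
```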
np.linspace()
np.linspace(start, end, num)
• start, end: both are included in the result
• num: number of elements to be generated; the default value is 50
Example 1

np.linspace(2,4,4)

first=2, last=4, n=4
space=(last-first)/(n-1)
space=(4-2)/(4-1)=2/3=0.66666667
Result: 2, 2.66666667, 3.33333333, 4

The space between consecutive values is 0.66666667

First value+space=Second value

2+0.66666667=2.66666667
2.66666667+0.66666667≈3.33333333
3.33333333+0.66666667=4
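The hand calculation above can be compared with NumPy's own result. This is a sketch; by_hand is a helper list introduced here only for the comparison, and retstep=True asks linspace to return the spacing it used.

```python
import numpy as np

first, last, n = 2, 4, 4

# Spacing from the formula: (last - first) / (n - 1)
space = (last - first) / (n - 1)

# Build the values by hand: first, first + space, first + 2*space, ...
by_hand = [first + i * space for i in range(n)]

# Let NumPy do the same; retstep=True also returns the spacing
values, step = np.linspace(first, last, n, retstep=True)

print(values)
print(step)
```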
Quiz-1
• Do this on paper and send a screenshot
• np.linspace(3,9,6)

• Formula
• space=(last-first)/(n-1)
• last = last value in linspace
• first = first value in linspace
• n = total number of elements to be generated
• Add space to the first value to get the second value, then add space to the second value, and so on
• You now have sufficient knowledge to start learning pandas
Pandas data structure
• Data structure:
• A particular way of storing and organizing data in a computer for easy access
• Pandas uses two basic data structures
• Series and DataFrame
• To start with pandas, we import both pandas and numpy
• import pandas as pd
• import numpy as np
Series data structure
• It is one of the most important data structures
• It represents a one-dimensional array of indexed data
• It has two components
• 1. An array of actual data
• 2. An associated array of indexes or data labels
• Both components are one-dimensional arrays of the same length
Creating series objects
• There are many ways to create a series
• 1. Empty series using Series() with no parameter
• This creates an empty series with no values; the default data type is float64 in older pandas versions and object from pandas 2.0 onwards
• Let's see
• 2. Non-empty series
• Series object=pd.Series(data,index=idx)
• Let's see
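A minimal sketch of both cases follows. The subject names and marks are invented for illustration, and the dtype is passed explicitly for the empty series so that the result does not depend on the pandas version.

```python
import pandas as pd

# Empty series; dtype given explicitly to avoid version-dependent defaults
empty = pd.Series(dtype="float64")
print(len(empty))   # 0

# Non-empty series: data plus an index of labels
marks = pd.Series([78, 85, 91], index=["IP", "maths", "hindi"])
print(marks)
```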
Quiz
• Create a series of marks out of 100 for five subjects.
Quiz
• Create a series of five students' heights and print it.
Using list
Using dictionary
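As a sketch of the two approaches named above: a list gives a series with the default index 0, 1, 2, ..., while a dictionary uses its keys as the index. The prices and item names are made up for illustration.

```python
import pandas as pd

# From a list: the index defaults to 0, 1, 2
from_list = pd.Series([20, 35, 15])

# From a dictionary: keys become the index, values become the data
from_dict = pd.Series({"pen": 20, "notebook": 35, "eraser": 15})

print(from_list)
print(from_dict)
```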
Creating a series from Scalar value:
• In order to create a series from scalar value, an
index must be provided. The scalar value will be
repeated to match the length of index.
• import pandas as pd
• import numpy as np

• # giving a scalar value with index
• ser = pd.Series(10, index =[0, 1, 2, 3, 4, 5])

• print(ser)
Creating a series using NumPy functions:
• To create a series using a NumPy function, we can use different functions of NumPy, like
• numpy.linspace(),
• numpy.random.randn().
numpy linspace( )
• # import pandas and numpy
• import pandas as pd
• import numpy as np

• # series with numpy linspace()
• ser1 = pd.Series(np.linspace(3, 33, 3))
• print(ser1)
Example-2
• # series with numpy linspace()
• ser2 = pd.Series(np.linspace(1, 100, 10))
• print("\n", ser2)
numpy.random.randn() in Python
• The numpy.random.randn() function creates an array of the specified shape and fills it with random values drawn from the standard normal distribution.
• # Python program illustrating
• # the numpy.random.randn() method

• import numpy as kk
• import pandas as pd
• # 1D series built from 5 random values
• array = pd.Series(kk.random.randn(5))
• print("1D Array filled with random values : \n", array)
Creating series objects
• 1. Adding NaN values to a series object
• When we don't have complete data for a series, we can fill the missing data with NaN
• NaN = Not a Number
• It is available as np.nan in the numpy module (the alias np.NaN was removed in NumPy 2.0)
• 2. Specifying index as well as data with a series
• Series object=pandas.Series(data=None,index=None)
Create series for the following output
• 101 arvind
• 102 mohan
• 103 mukesh
• 104 rajesh
• 105 prem
• 106 NaN
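One possible way to produce the output above is sketched below: the names go in as data, the roll numbers as the index, and np.nan marks the missing entry. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

names = ["arvind", "mohan", "mukesh", "rajesh", "prem", np.nan]
roll_numbers = [101, 102, 103, 104, 105, 106]

# data and index must have the same length
ser = pd.Series(data=names, index=roll_numbers)
print(ser)
```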
Create a series
• 1. Create a series object using three different words: "you", "are", "snoring".
• 2. Create a series using an ndarray that has 5 elements in the range of 24 to 64.
• 3. Specifying data type along with data and index

• Series obj=pd.Series(data=None,index=None,dtype=None)
• If all are None, default values are used
Quiz
• Create an array with the values 12, 13, 14, 15
• Use the same array to create a pandas series with index None and data type float64.
• If possible, show the result
• 4. Using a mathematical function to create a series
• Series object=pd.Series(data=function,index=None)
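A minimal sketch of the fourth approach: the data is computed from a NumPy array by a mathematical expression. The choice of squares is only an example.

```python
import numpy as np
import pandas as pd

base = np.arange(1, 6)   # 1, 2, 3, 4, 5

# The data is an expression applied to every element of base
squares = pd.Series(data=base ** 2, index=base)
print(squares)
```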
Series object attributes
• A series attribute can be used in the following format in pandas
• Series object.attribute name

• Example:
• Object.index
• Let's see
• 1. Series.index
• Returns the index of the series
• As you have already seen in the previous example
• 2. Series.values
• Returns the series data as an array
• 3. Series.dtype returns the data type of the series
• 4. Series.shape returns a tuple holding the number of elements; a series has rows only, no columns
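The four attributes can be inspected on a small series. The data and labels below are invented for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.index)    # the labels a, b, c
print(s.values)   # the data as a NumPy array
print(s.dtype)    # int64
print(s.shape)    # (3,)
```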
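Dimension, size and bytes of series
• ndim returns the number of dimensions
• size returns the number of elements
• nbytes returns the number of bytes used by the data

• nbytes = number of elements * bytes per element
• For 3 elements of float64: 3*8=24
• If float64, multiply by 8
• If float32, multiply by 4
• If int16, multiply by 2 to get nbytes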
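The byte arithmetic above can be confirmed with a short sketch. The values are arbitrary; only the element count and data type matter.

```python
import pandas as pd

s64 = pd.Series([1.5, 2.5, 3.5])            # float64: 8 bytes per element
s16 = pd.Series([1, 2, 3], dtype="int16")   # int16: 2 bytes per element

print(s64.ndim)     # 1
print(s64.size)     # 3
print(s64.nbytes)   # 24 (3 * 8)
print(s16.nbytes)   # 6  (3 * 2)
```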
Quiz
• obj2=pd.Series(data=[12,13,14,15,12.5])

• Find the nbytes for the above statement


Accessing series data
• Data can be accessed as single elements or as slices
• A single element is accessed using its index
• Series object name[valid index]
Slices
• Slicing takes place position-wise, not index-wise, in series objects
• The individual elements are stored at the following positions
• To extract a slice, specify it as [start:end:step]
[Table: POSITION (0 to 4) alongside the INDEX and DATA of a series]
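A sketch of single-element access and position-wise slicing follows, using a series with string labels so that labels and positions are easy to tell apart. The data is made up.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

single = s["c"]      # access by label: 30
middle = s[1:4]      # positions 1, 2, 3 (end position excluded)
alternate = s[::2]   # every second element, starting at position 0

print(single)
print(middle)
print(alternate)
```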
Operation on series object
• 1. Modifying elements of a series object
• seriesobject[index]=new value
• It replaces only the value at the given index with the new value
• If the index does not exist in the series, a new index is added with the new value
Modify all values
• seriesobject[start:stop]=new data value
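The three kinds of assignment described above, sketched on an invented series:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

s["b"] = 25    # existing label: the value is replaced
s["d"] = 40    # new label: a new entry is added
s[0:2] = 0     # slice: positions 0 and 1 are both set to 0

print(s)
```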
Quiz
• Find the error in the code below
• a=pd.Series(range(1,15,3),index=list('abc'))
• 2. The head() and tail() functions
• head() fetches the first rows of a pandas object and tail() fetches the last rows
• Pandasobject.head(n)
• Pandasobject.tail(n)
• Let's see
• If no value is passed to head() or tail(), the first five or last five rows are returned.
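A short sketch of head() and tail() on a ten-element series (values 1 to 10, chosen for illustration):

```python
import pandas as pd

s = pd.Series(range(1, 11))   # 1 to 10

first_five = s.head()      # no argument: first five rows
last_three = s.tail(3)     # last three rows

print(first_five)
print(last_three)
```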
Vector operation on series object
• If you apply an expression or function to a series object, it is applied to each item individually.
• Let's see
• The following operations are legal
• obj+2
• obj-2
• obj*2
• obj/2
• obj>250
Filtering entries or values
• Object[object expression]
• obj[obj>250]
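The vector operations and the filter can be sketched together. The series obj below holds made-up values; the comparison obj>250 produces a series of True/False values, which is then used to select entries.

```python
import pandas as pd

obj = pd.Series([100, 200, 300, 400])

plus_two = obj + 2        # 102, 202, 302, 402
halved = obj / 2          # 50.0, 100.0, 150.0, 200.0
mask = obj > 250          # False, False, True, True
filtered = obj[mask]      # only the entries where mask is True

print(filtered)
```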
Quiz
• Difference between a NumPy array and a series object
Data frame and other operations
• The DataFrame object of pandas stores two-dimensional data
• The Panel object stored three-dimensional data in older pandas versions; it has been removed from current pandas
DataFrame
• It is another way to store and represent pandas data, in two dimensions.
• It is like a spreadsheet or Excel file
• It is a 2-D labelled array or an ordered collection of columns.
• Columns can have different data types
• The index can be numbers or characters
• Values are mutable
• Size is mutable
Creating DataFrame
• As before, we first import
• import pandas as pd
• import numpy as np
• Then
• DataFrame object name=pd.DataFrame(data structure)
DataFrame using dictionary
QUIZ
Index specified by us
• The index sequence of the data and the index sequence we specify must match
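A minimal sketch of building a DataFrame from a dictionary, first with the default index and then with an index supplied by us. The names, marks and roll numbers are invented; the supplied index must have as many entries as there are rows.

```python
import pandas as pd

data = {"name": ["arvind", "mohan", "mukesh"],
        "marks": [78, 85, 91]}

# Dictionary keys become column names; index defaults to 0, 1, 2
df = pd.DataFrame(data)

# Same data with our own index (one label per row)
df_indexed = pd.DataFrame(data, index=[101, 102, 103])

print(df)
print(df_indexed)
```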
DataFrame with 2D ndarray
index Quiz

Column name
• Specifying own column names or index name
• Using columns keyword and index keyword.
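The columns and index keywords can be sketched with a 2D ndarray. The labels r1, r2 and a, b, c are hypothetical.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# One column name per array column, one index label per array row
df = pd.DataFrame(arr, columns=["a", "b", "c"], index=["r1", "r2"])
print(df)
```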
Quiz
Index
Quiz
Displaying DataFrame
• Using variable name
• Using print command
DataFrame attributes
• We can use attribute in the following format
• DataFrame object.attribute name
• 1. index
• 2. columns
• 3. axes
• 4. dtypes
• 5. size
• 6. shape
• 7. values
• 8. empty
• 9. ndim
• 10. T
• index returns the row labels
• columns returns the column labels
• axes returns a list representing both the axes (axis 0 for index and axis 1 for columns)
• dtypes returns the data type of each column of the DataFrame
• size returns the number of elements in the DataFrame
• shape returns the number of tuples and attributes (rows and columns) in the DataFrame
• values returns the values of the DataFrame
• empty indicates whether the DataFrame is empty or not
• ndim returns the number of dimensions
• T transposes index and columns
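The attributes can be tried on a small DataFrame. The data below is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["arvind", "mohan", "mukesh"],
                   "marks": [78, 85, 91]})

print(df.index)     # row labels
print(df.columns)   # column labels
print(df.axes)      # list of [row labels, column labels]
print(df.dtypes)    # data type of each column
print(df.size)      # 6 (3 rows * 2 columns)
print(df.shape)     # (3, 2)
print(df.empty)     # False
print(df.ndim)      # 2
print(df.T)         # rows and columns swapped
```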
Selecting and accessing data
• 1. accessing a column
• DataFrame object name[column name]
NaN values with size
NaN values with count()
Accessing multiple columns
• DataFrame object[[column1,column2......]]
• To select multiple columns, list the column names inside double square brackets after the DataFrame name
• Example
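A sketch of single-column and multiple-column access, with one missing mark to show how size and count() treat NaN: size counts every entry, while count() skips NaN. The data is invented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["arvind", "mohan", "prem"],
                   "marks": [78, np.nan, 91],
                   "age": [17, 18, 17]})

one_column = df["marks"]             # a Series
two_columns = df[["name", "marks"]]  # a DataFrame

print(one_column.size)      # 3: NaN is included
print(one_column.count())   # 2: NaN is excluded
```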
Accessing subset
• We can access not only columns but also rows
• By using iloc and loc
• iloc is used to slice the DataFrame based on integer position
• loc is used to access single or multiple rows based on label
loc
• DataFrame object.loc[startrow:endrow]
[Figure: loc example with index and column labelled]
iloc
[Figure: iloc example with rows and columns labelled]
Selecting individual values
• With the name of the row or the index number
• Dobj.column[row name or index number]
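The difference between loc and iloc, and the selection of a single value, can be sketched as follows. The row labels and data are hypothetical. Note that a loc slice includes the end label, while an iloc slice excludes the end position.

```python
import pandas as pd

df = pd.DataFrame({"marks": [78, 85, 91, 66],
                   "age": [17, 18, 17, 16]},
                  index=["arvind", "mohan", "mukesh", "rajesh"])

by_label = df.loc["mohan":"mukesh"]   # label based, end label included
by_position = df.iloc[0:2]            # position based, rows 0 and 1
block = df.iloc[1:3, 0:1]             # rows 1-2, first column only

value = df.marks["mohan"]             # column, then row label
same_value = df.loc["mohan", "marks"] # row label and column in one step

print(by_label)
print(by_position)
print(value)
```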
Modifying values
Adding and deleting columns
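A minimal sketch of modifying a value, adding a column and deleting a column, using invented data. Using loc for the modification and drop() or del for the deletion are common choices here, not necessarily the ones shown on the original slides.

```python
import pandas as pd

df = pd.DataFrame({"marks": [78, 85]}, index=["arvind", "mohan"])

# Modify a single value: row label, then column name
df.loc["arvind", "marks"] = 80

# Add a column by assigning a list with one value per row
df["grade"] = ["B", "A"]

# drop() returns a new DataFrame without the column; df is unchanged
without_grade = df.drop("grade", axis=1)

# del removes the column from df itself
del df["grade"]

print(df)
```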
Binary operation in DataFrame
• It requires two values
• The two values are picked element-wise
• Data from the two data frames is aligned
• on row and column labels: where they match, the operation is performed; where they do not, NaN is stored in the result
• Addition: use + or add(), e.g. df1.add(df2)
• Subtraction: use - or sub(), e.g. df1.sub(df2)
• Multiplication: use * or mul()
• Division: use / or div()
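The alignment rule can be seen by adding two small frames that share only one column. The frames are invented; fill_value is an optional parameter of add() that treats a value missing from one frame as the given number instead of producing NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [10, 20], "c": [30, 40]})

# Column a exists in both frames; b and c exist in only one, so they become NaN
total = df1.add(df2)          # same result as df1 + df2

# With fill_value=0, a value missing on one side is treated as 0
filled = df1.add(df2, fill_value=0)

print(total)
print(filled)
```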
[Figure: result of df1.add(df2), with axis 0 and axis 1 labelled]
firstdf.drop('b',axis=0)
• Raises an error if the label 'b' does not exist in the index
• "['b'] not found in axis"
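The error can be reproduced with a sketch in which firstdf is a hypothetical frame whose index has no label 'b'. The errors="ignore" option of drop() is shown as one way to skip missing labels silently.

```python
import pandas as pd

# Hypothetical frame: the index contains a, c, d but not b
firstdf = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "c", "d"])

caught = False
try:
    firstdf.drop("b", axis=0)          # label b is not in the index
except KeyError as err:
    caught = True
    print("KeyError:", err)

dropped = firstdf.drop("a", axis=0)    # label a exists, so this works
ignored = firstdf.drop("b", axis=0, errors="ignore")  # missing label skipped
```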
Data transfer between files, SQL databases and data frames
• Data can be transferred from a data frame or series into the CSV file format.
• CSV = comma-separated values
• It is a simple way to store data
• It is a common format for data exchange
• It can be opened in Excel or Calc
• It is easy to import from or export to CSV
What Is a CSV File?
• A CSV file (Comma Separated Values file) is a
type of plain text file that uses specific
structuring to arrange tabular data. Because
it’s a plain text file, it can contain only actual
text data—in other words, printable ASCII or
Unicode characters.
• Each piece of data is separated by a comma
• In general, the separator character is called a
delimiter, and the comma is not the only one
used. Other popular delimiters include the tab
(\t), colon (:) and semi-colon (;) characters.
Creating csv file
• A text file or Excel file can be saved as a CSV file by choosing the CSV format (.csv extension) while saving.
• Suppose the file name is data.xlsx
• Then you can save it as data.csv
• Your CSV file is created and ready to use.
Loading data from CSV to dataframe
• The file is stored in the Data folder on the c:\ drive
• The file name is sample.csv
• read_csv() is the function used to read a CSV file
• Its argument is the file location on the computer
• The first row is treated as column headings
Specifying own column name
Reading specified number of rows
Reading csv with different separator (semicolon)
Summary of read_csv
Storing data frame data to csv