
Unit-1 Python Pandas (1)

The document provides an overview of data handling using the Pandas library in Python, focusing on data manipulation and analysis. It covers key features of Pandas, including data structures like Series, methods for creating Series, and various operations such as indexing, slicing, and mathematical functions. Additionally, it highlights attributes of Series objects and techniques for sorting and filtering data.

Uploaded by

poojasaini989700

Class 12

INFORMATICS PRACTICES

Unit -1
Data Handling using Pandas
and
Data Visualization

This PDF contains 4 chapters:

Chapters 1, 2, 3 and 4

Board Examination = 25 Marks


CHAPTER- 1 Python Pandas-1
PANDAS:
• It is a software library written for the Python programming language for data manipulation
and analysis.
• It is a package for data science that offers powerful and flexible data structures which make
data analysis and manipulation easy.

Features of Pandas:
• It can read and write data in many different file formats (CSV, Excel, SQL, JSON etc.) and store values of different data types (int, float, string etc.).
• Columns in a Pandas data structure can be deleted or inserted.
• It supports group-by operations for data aggregation and transformation.
• It allows merging and joining of data.
• It can easily find and fill missing data.
• It supports reshaping of data into different formats.

Data Structures in Pandas:


Pandas provides two main data structures, Series and DataFrame (older versions also had a third, Panel, which has since been removed):

1. Series:
• It is a one-dimensional structure storing homogeneous data (same data type) whose values
can be modified (mutable).
• It contains a sequence of values and an associated index.
• It can be created using the Series() method.
Example1 :
INDEX DATA
0 22
1 -14
2 32
3 100

Example:
INDEX DATA
Sun 1
Mon 2
Tue 3
wed 3

Note: The values of a Series are mutable, but its size is immutable (once created, the size cannot be changed in place).
Creation of Series:
A Series can be created from:
• List or range
• Dict (dictionary)
• Scalar value or constant

Creating an empty series using Series( ) method:

import pandas as pd
S1 = pd.Series()
print(S1)

Creating a series using Series( ) method with arguments:

Creating a series using List

import pandas as pd
S = pd.Series([10, 20, 30, 40])
print(S)

Output:
0 10
1 20
2 30
3 40

Creating a series using range( ) method

import pandas as pd
S = pd.Series(range(5))
print(S)

Output:
Index Data
0 0
1 1
2 2
3 3
4 4

Creating a series with pre-defined index:

import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S)
Output:
a 10
b 20
c 30
d 40
e 50

To access single value based on index:

import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S['b'])

Output:
20

To access multiple values based on index:

import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S[['a', 'c', 'd']])    # a list of labels selects several values

Output:
a 10
c 30
d 40

Accessing data from series with position:

import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S.iloc[0])    # value at position 0
print(S[:3])        # first 3 values
print(S[-3:])       # last 3 values

OUTPUT:

10

10
20
30

30
40
50
iloc and loc

iloc:
It is used for indexing and selecting based on position, i.e. row number and column number. It is also
called position-based indexing. The stop position of a slice is excluded.

Ex:
import pandas as pd
S1 = pd.Series([21, 32, 43, 14, 55], index=['a', 'b', 'c', 'd', 'e'])
print(S1.iloc[1:4])

Output:
Index Data
b 32
c 43
d 14
loc:
It is used for indexing and selecting based on name, i.e. row name and column name. It is also
called name-based (label-based) indexing. The stop label of a slice is included.

Ex:
import pandas as pd
S1 = pd.Series([21, 32, 43, 14, 55], index=['a', 'b', 'c', 'd', 'e'])
print(S1.loc['b':'e'])

Output:
Index Data
b 32
c 43
d 14
e 55
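
The difference between the two slices above can be checked with a short sketch. It reuses the same Series; the variable names by_position and by_label are only for illustration.

```python
import pandas as pd

s = pd.Series([21, 32, 43, 14, 55], index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[1:4]   # positions 1, 2, 3 (stop position 4 is excluded)
by_label = s.loc['b':'e']   # labels 'b' through 'e' (stop label 'e' is included)

print(by_position.tolist())  # [32, 43, 14]
print(by_label.tolist())     # [32, 43, 14, 55]
```

So iloc behaves like ordinary Python slicing, while loc includes both end labels.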

creating a series from Scalar or Constant value:

Ex-1
import pandas as pd
S = pd.Series(55, index=['a', 'b', 'c', 'd'])
print(S)

Output:
Index Data
a 55
b 55
c 55
d 55
Ex-2
import pandas as pd
S = pd.Series(55, index=[0, 1, 2, 3])
print(S)

Output:
Index Data
0 55
1 55
2 55
3 55

Ex-3 using range function

import pandas as pd
S = pd.Series(55, index=range(0, 3))
print(S)

Output:
Index Data
0 55
1 55
2 55

Ex-4 using range function with a step

import pandas as pd
S = pd.Series(55, index=range(1, 6, 2))
print(S)

Output:
Index Data
1 55
3 55
5 55

Ex: To create a series with a string value and string index labels

import pandas as pd
S = pd.Series("WELCOME", index=['arun', 'nitin', 'vikas'])
print(S)

Output:
Index Data
arun WELCOME
nitin WELCOME
vikas WELCOME
Ex: To create a series using range() and a for loop

import pandas as pd
S = pd.Series(range(1, 15, 3), index=[x for x in 'abcde'])
print(S)

Output:

Index Data
a 1
b 4
c 7
d 10
e 13

Ex: Creating a series using 2 different lists:

import pandas as pd
Months = ['jan', 'feb', 'mar', 'apr', 'may']
Days = [31, 28, 31, 30, 31]
S = pd.Series(Days, index=Months)
print(S)

Output:

Index Data
jan 31
feb 28
mar 31
apr 30
may 31

Ex: Creating a series with missing values (NaN):

import pandas as pd
import numpy as np
S = pd.Series([7.5, 5.4, np.nan, 3.5])
print(S)

OUTPUT:

Index Data
0 7.5
1 5.4
2 NaN
3 3.5
Creating a series from dictionary:

import pandas as pd
S = pd.Series({'jan': 31, 'feb': 28, 'mar': 31, 'apr': 30})
print(S)

Output:
Index Data
jan 31
feb 28
mar 31
apr 30

Creating a series using a mathematical expression/function:

import pandas as pd
import numpy as np
S1 = np.arange(1, 5)
print(S1)
S2 = pd.Series(index=S1, data=S1 * 4)
print(S2)

Output:

Index Data
1 4
2 8
3 12
4 16

Creating a series using an exponent/power:

import pandas as pd
import numpy as np
S1 = np.arange(1, 5)
print(S1)
S2 = pd.Series(index=S1, data=S1 ** 2)    # each value raised to the power 2
print(S2)

Output:
Index Data
1 1
2 4
3 9
4 16
Series using head() and tail() functions:

import pandas as pd
S = pd.Series([10, 20, 30, 40, 50, 60], index=['a', 'b', 'c', 'd', 'e', 'f'])

print(S.head())    # by default head() returns the first 5 rows

Output:
Index data
a 10
b 20
c 30
d 40
e 50

print(S.head(3))    # head(3) returns the first 3 rows

Output:
Index data
a 10
b 20
c 30

print(S.tail())    # by default tail() returns the last 5 rows

Output:
Index data
b 20
c 30
d 40
e 50
f 60

print(S.tail(3))    # tail(3) returns the last 3 rows

Output:
Index data
d 40
e 50
f 60
Performing addition, subtraction, multiplication and division on Series:

import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([1, 2, 3, 4])
Add = S1 + S2
Sub = S1 - S2
Mul = S1 * S2
Div = S1 / S2
print(Add)
print(Sub)
print(Mul)
print(Div)

OUTPUT:
Index  Addition  Subtraction  Multiplication  Division
0      11        9            10              10.0
1      22        18           40              10.0
2      33        27           90              10.0
3      44        36           160             10.0

Vector Operations on Series:

import pandas as pd
S = pd.Series([10, 20, 30])

print(S + 2)
Output:
12
22
32

print(S * 2)
Output:
20
40
60

print(S ** 2)
Output:
100
400
900

print(S > 10)
Output:
False
True
True
Deleting elements from series:

import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S.drop('c'))    # drop() returns a new Series; S itself is unchanged

Output:
a 10
b 20
d 40
e 50
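
A small sketch of how drop() behaves, assuming the same Series as above. The name trimmed is only for illustration.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

trimmed = s.drop('c')        # new Series without label 'c'
print(list(trimmed.index))   # ['a', 'b', 'd', 'e']
print(len(s))                # 5, the original is unchanged

s = s.drop(['a', 'e'])       # reassign to keep the result; a list drops several labels
print(s.tolist())            # [20, 30, 40]
```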

Series object attributes:

Attribute        Description
Series.index     Returns the index of the Series
Series.dtype     Returns the data type of the underlying data
Series.shape     Returns a tuple giving the shape of the underlying data
Series.size      Returns the number of elements
Series.nbytes    Returns the number of bytes in the underlying data
Series.ndim      Returns the number of dimensions
Series.itemsize  Returns the size in bytes of one item (removed in pandas 1.0; use Series.dtype.itemsize)
Series.values    Returns the values of the Series as an array
Series.hasnans   Returns True if there are any NaN values
Series.empty     Returns True if the Series object is empty

Example:
import pandas as pd
S = pd.Series(range(1, 15, 3), index=[x for x in 'abcde'])

a b c d e
1 4 7 10 13

print(S.index)
Output: a b c d e

print(S.values)
Output: 1 4 7 10 13

print(S.size)
Output: 5

print(S.dtype.itemsize)    # S.itemsize in old pandas versions
Output: 8

print(S.nbytes)
Output: 40

print(S.ndim)
Output: 1
Sorting series values in ascending and descending order:

Ascending order:

import pandas as pd
S = pd.Series([40, 20, 30, 50])
S = S.sort_values()    # sort_values() returns a new sorted Series
print(S)

Output:
20
30
40
50

Descending order:

import pandas as pd
S = pd.Series([40, 20, 30, 50])
S = S.sort_values(ascending=False)
print(S)

Output:
50
40
30
20
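
The following sketch shows two points about sorting that are easy to miss: sort_values() does not change the original Series, and the index labels move together with their values.

```python
import pandas as pd

s = pd.Series([40, 20, 30, 50])

ascending = s.sort_values()
descending = s.sort_values(ascending=False)

print(ascending.tolist())      # [20, 30, 40, 50]
print(descending.tolist())     # [50, 40, 30, 20]
print(list(ascending.index))   # [1, 2, 0, 3], the labels travel with the values
print(s.tolist())              # [40, 20, 30, 50], s itself is not modified
```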

Print data based on condition:

import pandas as pd
S = pd.Series([40, 20, 30, 50])
print(S[S > 35])    # keeps only the values for which the condition is True
Output:
40
50

Print the Boolean result of a condition:

import pandas as pd
S = pd.Series([40, 20, 30, 50])
print(S > 35)

Output:
True
False
False
True
Tiling a series:

import pandas as pd
import numpy as np
S = pd.Series(np.tile([3, 5], 2))    # repeats the list [3, 5] two times
print(S)

Output:
0 3
1 5
2 3
3 5

Slicing a Series into subsets:

import pandas as pd
num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
series = pd.Series(num, index=labels)

Position (forward):    0    1    2    3    4    5    6    7    8    9
Index label:           A    B    C    D    E    F    G    H    I    J
Value:                 0    100  200  300  400  500  600  700  800  900
Position (backward):  -10  -9   -8   -7   -6   -5   -4   -3   -2   -1

print(series[2:4])

OUTPUT:
C 200
D 300
print(series[ : 6 ] )

OUTPUT:
A 0
B 100
C 200
D 300
E 400
F 500

print(series[ 1 : 6 : 2 ] )

OUTPUT:
B 100
D 300
F 500
print(series[ 4 : ] )

OUTPUT:
E 400
F 500
G 600
H 700
I 800
J 900

print(series[ : 4 : 2 ] )

OUTPUT:
A 0
C 200

print(series[ 4 : :2])

OUTPUT:
E 400
G 600
I 800

print(series[ : : -1 ] )
OUTPUT:
J 900
I 800
H 700
G 600
F 500
E 400
D 300
C 200
B 100
A 0
Chapter-2 Python Pandas-2
DataFrame:
It is a Pandas data structure which stores data in a two-dimensional, tabular form.
A DataFrame can be created using the DataFrame() method.

Features of dataframe:
• Size of a dataframe is mutable (can be changed).
• Values of a dataframe are also mutable (can be changed anytime).
• Arithmetic operations can be performed on rows and columns.

A B C D
0
1
2
3

• Creation of Dataframe:
import pandas as pd
df1 = pd.DataFrame()
print(df1)
output: an empty dataframe is created

• Creating dataframe from lists:

Example 1:
import pandas as pd
List1 = [10, 20, 30, 40, 50]
df1 = pd.DataFrame(List1)
print(df1)
Output:
0

0 10
1 20
2 30
3 40
4 50

Example 2:
import pandas as pd
data1 = [['Shreya', 20], ['Rakshit', 22], ['Madhav', 26]]
df1 = pd.DataFrame(data1, columns=['Name', 'Age'])
print(df1)
Output:

      Name  Age
0   Shreya   20
1  Rakshit   22
2   Madhav   26
Creating dataframe from 2D dictionary having values as lists:

import pandas as pd
Dict1 = {'Student': ['Ruchika', 'Neha', 'Raj', 'Tanu'],
         'Sports': ['Football', 'Chess', 'Ludo', 'Hockey'],
         'Marks': [200, 300, 100, 400]}
df1 = pd.DataFrame(Dict1, index=['I', 'II', 'III', 'IV'])
print(df1)

Output:

     Student    Sports  Marks
I    Ruchika  Football    200
II      Neha     Chess    300
III      Raj      Ludo    100
IV      Tanu    Hockey    400

OR

import pandas as pd
List1 = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
df1 = pd.DataFrame(List1)
print(df1)

Output:

0 1 2
0 10 20 30
1 40 50 60
2 70 80 90

• Creating dataframe from 2D dictionary having values as dictionary objects:

import pandas as pd
Dict1 = {'Sales': {'Name': 'Ruchika', 'Age': '24', 'Gender': 'Female'},
         'Marketing': {'Name': 'Vikas', 'Age': '21', 'Gender': 'Male'}}
df1 = pd.DataFrame(Dict1)
print(df1)

Output:

          Sales Marketing
Name    Ruchika     Vikas
Age          24        21
Gender   Female      Male
Q: Write a program to create a dataframe from a list containing dictionaries of the
performance of 3 topper students. Topper names should be the row labels.

import pandas as pd
TopperA = {'Rollno': 10, 'Name': 'Arun', 'Marks': 99}
TopperB = {'Rollno': 12, 'Name': 'Rishi', 'Marks': 98}
TopperC = {'Rollno': 13, 'Name': 'Preet', 'Marks': 96}

Toppers = [TopperA, TopperB, TopperC]

df1 = pd.DataFrame(Toppers, index=['TopperA', 'TopperB', 'TopperC'])
print(df1)

Output:
         Rollno   Name  Marks
TopperA      10   Arun     99
TopperB      12  Rishi     98
TopperC      13  Preet     96

====================================================================

Ques: Suppose you have a table Student , you want to change index from 0-3 to name.

Rollno Name Marks


0 10 Arun 99
1 12 Rishi 98
2 13 Preet 96
3 14 Nishi 99

(i) To change the index column:

Ans:
import pandas as pd
df.set_index('Name', inplace=True)
print(df)

Output:

Rollno Marks
Arun 10 99
Rishi 12 98
Preet 13 96
Nishi 14 99
(ii) To reset the index column:

Ans:
import pandas as pd
df.reset_index(inplace=True)    # the old index (Name) becomes the first column again
print(df)
Output:
    Name  Rollno  Marks
0   Arun      10     99
1  Rishi      12     98
2  Preet      13     96
3  Nishi      14     99

(iii) Adding new columns to a dataframe (here df starts as an empty dataframe):

import pandas as pd
df = pd.DataFrame()
df['Age'] = [20, 30, 25, 26, 15]
df['Age2'] = 45
df['Age3'] = pd.Series([42, 44, 50, 60, 45], index=[0, 1, 2, 3, 4])
df['Total'] = df['Age'] + df['Age2'] + df['Age3']
print(df)

Output:
Age Age2 Age3 Total
0 20 45 42 107
1 30 45 44 119
2 25 45 50 120
3 26 45 60 131
4 15 45 45 105

(iv) Selecting columns using the iloc method:


Suppose you have a dataframe df shown below:

Age Age2 Age3 Total


20 45 42 107
30 45 44 119
25 45 50 120
26 45 60 131
15 45 45 105

(a) df.iloc[:, [0, 3]]    # all rows, columns at positions 0 and 3

output:
Age Total
0 20 107
1 30 119
2 25 120
3 26 131
4 15 105
(b) df.iloc[:, 0:3]    # all rows, columns at positions 0 to 2

output:
Age Age2 Age3
0 20 45 42
1 30 45 44
2 25 45 50
3 26 45 60
4 15 45 45

(c) df.iloc[:, 0:4]    # all rows, columns at positions 0 to 3

output:
Age Age2 Age3 Total
0 20 45 42 107
1 30 45 44 119
2 25 45 50 120
3 26 45 60 131
4 15 45 45 105

====================================================================
• Delete column from dataframe:

We can delete columns using the drop() method. It returns a new dataframe; reassign the result (or pass inplace=True) to keep the change.

df.drop('Total', axis=1)    # axis=1 refers to columns; by default axis is 0, which refers to rows

output:
Age Age2 Age3
0 20 45 42
1 30 45 44
2 25 45 50
3 26 45 60
4 15 45 45

 Renaming Rows/Columns in dataframe

Suppose we have a dataframe df

Name Rollno Marks


Sec A Anu 10 99
Sec B Dev 12 98
Sec C Ram 13 96

(a) To change the row names/labels to A, B, C

df.rename(index={'Sec A': 'A', 'Sec B': 'B', 'Sec C': 'C'}, inplace=True)
OUTPUT:

Name Rollno Marks


A Anu 10 99
B Dev 12 98
C Ram 13 96

(b) To change the column name Rollno to Rno

df.rename(columns={'Rollno': 'Rno'})

OUTPUT:

Name Rno Marks


Sec A Anu 10 99
Sec B Dev 12 98
Sec C Ram 13 96

==============================================================
 Modify rows and columns:

Suppose you have a dataframe df

school hospital cinema


Delhi 50 10 90
Pune 40 20 80
Noida 30 30 60
Nagpur 60 40 30

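Modifying or adding columns: assigning to an existing column name replaces its values; assigning to a new name adds a column.

(a) df['cinema'] = 20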

output:

school hospital cinema


Delhi 50 10 20
Pune 40 20 20
Noida 30 30 20
Nagpur 60 40 20
(b) df['cinema'] = [30, 40, 50, 60]

output:

school hospital cinema


Delhi 50 10 30
Pune 40 20 40
Noida 30 30 50
Nagpur 60 40 60

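Adding rows: use 'loc' to add or modify rows. If the row label already exists its values are replaced, otherwise a new row is added. ('at' accesses a single cell only.)

(a) df.loc['Noida', :] = 200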

output:

school hospital cinema


Delhi 50 10 30
Pune 40 20 40
Noida 200 200 200
Nagpur 60 40 60

(b) df.loc['Noida', :] = [300, 200, 100]

output:

school hospital cinema


Delhi 50 10 30
Pune 40 20 40
Noida 300 200 100
Nagpur 60 40 60

Modify single cell:

df.at['Delhi', 'school'] = 0

output:

school hospital cinema


Delhi 0 10 30
Pune 40 20 40
Noida 300 200 100
Nagpur 60 40 60

===============================================================
Boolean indexing:
It means we have Boolean values (True or False), or 1 and 0, as index labels of a dataframe.

Creating dataframe with Boolean index:


Example:
import pandas as pd
Days = ['mon', 'tue', 'wed', 'thu', 'fri']
Classes = [5, 0, 3, 0, 8]
dc = {'Days': Days, 'No of classes': Classes}
df = pd.DataFrame(dc, index=[True, False, True, False, True])
print(df)
output:
Days no of classes
True mon 5
False tue 0
True wed 3
False thu 0
True fri 8

OR
If we use 1 and 0 as index labels in the program:
df = pd.DataFrame(dc, index=[1, 0, 1, 0, 1])

output:

Days no of classes
1 mon 5
0 tue 0
1 wed 3
0 thu 0
1 fri 8

Accessing rows from dataframes with Boolean indexes:

Q1: Suppose we have a dataframe df


Days no of classes
True mon 5
False tue 0
True wed 3
False thu 0
True fri 8

df.loc[True]    # selects all rows whose index label is True
output:
Days no of classes
True mon 5
True wed 3
True fri 8

(a) df.loc[False]    # selects all rows whose index label is False

output:

Days no of classes
False tue 0
False thu 0

===============================================================
Q2: Suppose we have a dataframe df with 1 and 0 as index labels

  Days no of classes
1 mon 5
0 tue 0
1 wed 3
0 thu 0
1 fri 8

df.loc[1]    # selects all rows whose index label is 1

output:

Days no of classes
1 mon 5
1 wed 3
1 fri 8
-----------------------------------------------------------------------------------------------------------------------

Q : Answer the following questions

Rollno Name Marks


0 10 Arun 99
1 12 Rishi 98
2 13 Preet -
3 14 Nishi 99

• df.index
output:
0, 1, 2, 3

• df.columns
output:
Rollno, Name, Marks
• df.axes
output:
index: 0, 1, 2, 3
columns: Rollno, Name, Marks

• df.dtypes
output:
Rollno: int64
Name: object (text)
Marks: float64 (the missing value is stored as NaN, which makes the column float)

• len(df)
Output: 4

• df.count()
output:
Rollno: 4
Name: 4
Marks: 3
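
The answers above can be verified with a sketch like the one below. It assumes the missing marks of Preet (shown as "-" in the table) are stored as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Rollno': [10, 12, 13, 14],
    'Name': ['Arun', 'Rishi', 'Preet', 'Nishi'],
    'Marks': [99, 98, np.nan, 99],   # assumption: the "-" entry is a NaN
})

print(list(df.index))       # [0, 1, 2, 3]
print(list(df.columns))     # ['Rollno', 'Name', 'Marks']
print(df.shape)             # (4, 3)
print(len(df))              # 4
print(df.count().tolist())  # [4, 4, 3], count() skips NaN
print(df.dtypes)            # Marks is float64 because of the NaN
```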

=======================================================

Transpose a dataframe:

We can transpose dataframe by swapping its indexes and columns.

We have a dataframe df
Rollno Name Marks
0 10 Arun 99
1 12 Rishi 98
2 13 Preet 76

df.transpose()    # or df.T

After transpose:

Output:

0 1 2
Rollno 10 12 13
Name Arun Rishi Preet
Marks 99 98 76
Selecting and accessing dataframe:

Suppose we have a dataframe df

Rollno Name Marks


0 10 Arun 99
1 12 Rishi 88
2 13 Preet 75
3 14 Nishi 93

Accessing a single column:

df['Name'] or df.Name

output:

Name
0 Arun
1 Rishi
2 Preet
3 Nishi

Accessing multiple columns

df[['Rollno', 'Name']]    # a list of column names inside the square brackets

output:

Rollno Name
0 10 Arun
1 12 Rishi
2 13 Preet
3 14 Nishi

Accessing a subset from dataframe

Ex-1: df.loc[1, :]    # the index labels are integers, so 1 is written without quotes

output:

Rollno Name Marks


1 12 Rishi 88
Ex-2: df.loc[[1, 2], :]

output:

Rollno Name Marks


1 12 Rishi 88
2 13 Preet 75

Ex-3: df.loc[:, 'Rollno':'Marks']

output:

Rollno Name Marks


0 10 Arun 99
1 12 Rishi 88
2 13 Preet 75
3 14 Nishi 93

Ex-4: df.loc[:, ['Rollno', 'Marks']]

output:

Rollno Marks
0 10 99
1 12 88
2 13 75
3 14 93

Ex-5: df[0:2]    # slicing with positions selects rows

output:

Rollno Name Marks


0 10 Arun 99
1 12 Rishi 88
Ex-6: df.iloc[0:3, 0:2]    # rows at positions 0 to 2, columns at positions 0 to 1

output:

Rollno Name
0 10 Arun
1 12 Rishi
2 13 Preet
Q: Write a program to create a dataframe from the 2D array shown below:

10 20 30
40 50 60
70 80 90

Ans:
import pandas as pd
import numpy as np
Arr2 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
df1 = pd.DataFrame(Arr2)
print(df1)

======================================================

Q: Read the following code carefully:

import pandas as pd
Yr1 = {'Qtr1': 4400, 'Qtr2': 6600, 'Q3': 2200}
Yr2 = {'A': 5400, 'B': 3200, 'Qtr3': 1100}
Sales = {1: Yr1, 2: Yr2}
df1 = pd.DataFrame(Sales)

• List the index labels of the dataframe df1

Ans: Qtr1, Qtr2, Q3, A, B, Qtr3

• List the column names of the dataframe df1

Ans: 1, 2 (the keys of the outer dictionary Sales)
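
The answers can be checked by running the code. The sketch below uses lowercase variable names; sorted() is used only so that the printed order of the labels does not depend on the pandas version.

```python
import pandas as pd

yr1 = {'Qtr1': 4400, 'Qtr2': 6600, 'Q3': 2200}
yr2 = {'A': 5400, 'B': 3200, 'Qtr3': 1100}
sales = {1: yr1, 2: yr2}
df1 = pd.DataFrame(sales)

print(sorted(df1.index))            # ['A', 'B', 'Q3', 'Qtr1', 'Qtr2', 'Qtr3']
print(list(df1.columns))            # [1, 2]
print(df1.shape)                    # (6, 2)
print(int(df1.isna().sum().sum()))  # 6, labels missing in a year are filled with NaN
```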
Iterating over a Dataframe

There are 2 methods:

(a) iterrows(): It views a dataframe in the form of horizontal subsets, i.e. row-wise.

Example:
import pandas as pd
Sales = {'yr1': {'Qtr1': 100, 'Qtr2': 200, 'Qtr3': 300},
         'yr2': {'Qtr1': 400, 'Qtr2': 500, 'Qtr3': 600},
         'yr3': {'Qtr1': 700, 'Qtr2': 800, 'Qtr3': 900}}
df1 = pd.DataFrame(Sales)

for (row, rowSeries) in df1.iterrows():
    print("row index:", row)
    print(rowSeries)

output:

Row index: Qtr1    Row index: Qtr2    Row index: Qtr3
yr1: 100           yr1: 200           yr1: 300
yr2: 400           yr2: 500           yr2: 600
yr3: 700           yr3: 800           yr3: 900

(b) items(): It views a dataframe in the form of vertical subsets, i.e. column-wise. (Older pandas versions called this method iteritems(); it was removed in pandas 2.0.)

Example:
import pandas as pd
Sales = {'yr1': {'Qtr1': 100, 'Qtr2': 200, 'Qtr3': 300},
         'yr2': {'Qtr1': 400, 'Qtr2': 500, 'Qtr3': 600},
         'yr3': {'Qtr1': 700, 'Qtr2': 800, 'Qtr3': 900}}
df1 = pd.DataFrame(Sales)

for (col, colSeries) in df1.items():
    print("column index:", col)
    print(colSeries)

output:

Column index: yr1    Column index: yr2    Column index: yr3
Qtr1: 100            Qtr1: 400            Qtr1: 700
Qtr2: 200            Qtr2: 500            Qtr2: 800
Qtr3: 300            Qtr3: 600            Qtr3: 900
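
Both loops can be used for simple calculations. The sketch below adds up every row and every column of the same data; the dictionaries row_totals and col_totals are only for illustration.

```python
import pandas as pd

sales = {
    'yr1': {'Qtr1': 100, 'Qtr2': 200, 'Qtr3': 300},
    'yr2': {'Qtr1': 400, 'Qtr2': 500, 'Qtr3': 600},
    'yr3': {'Qtr1': 700, 'Qtr2': 800, 'Qtr3': 900},
}
df1 = pd.DataFrame(sales)

row_totals = {}
for row_label, row_series in df1.iterrows():    # one Series per row
    row_totals[row_label] = int(row_series.sum())

col_totals = {}
for col_label, col_series in df1.items():       # one Series per column
    col_totals[col_label] = int(col_series.sum())

print(row_totals)  # {'Qtr1': 1200, 'Qtr2': 1500, 'Qtr3': 1800}
print(col_totals)  # {'yr1': 600, 'yr2': 1500, 'yr3': 2400}
```

In practice df1.sum(axis=1) and df1.sum() give the same totals without a loop.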
Binary operations in Dataframe:
Binary operations are operations that require two values; in dataframes these values are picked
element-wise, matching row and column labels.

1. Addition: It has two functions, add() and radd().


Both functions perform addition, but radd() performs it in reverse order
(df1.radd(df2) computes df2 + df1). Since addition is commutative, the result of both functions is the same.

Example 1: Example : 2
Suppose we have 2 dataframes df1 and df2. Suppose we have 2 dataframes df1 and df2.

df1 . add (df2) OR df1 + df2 df1 + df2

output: output:

2. Subtraction: It has two functions, sub() and rsub().


Both functions perform subtraction, but rsub() performs it in reverse order
(df1.rsub(df2) computes df2 - df1), so the results of the two functions are generally different.

Example 1: Example : 2

Suppose we have 2 dataframes df1 and df2. Suppose we have 2 dataframes df1 and df2.

df1 . sub(df2) OR df1 - df2 df1 - df2

output: output:
3. Multiplication: It has two functions, mul() and rmul().
Both functions perform multiplication, but rmul() performs it in reverse order
(df1.rmul(df2) computes df2 * df1). Since multiplication is commutative, the result of both functions is the same.

Example 1: Example 2:
Suppose we have 2 dataframes df1 and df2. Suppose we have 2 dataframes df1 and df2.

df1.mul(df2) OR df1 * df2    df1 * df2

output: output:

4. Division: It has two functions, div() and rdiv().


Both functions perform division, but rdiv() performs it in reverse order
(df1.rdiv(df2) computes df2 / df1), so the results of the two functions are generally different.

Example 1: Example : 2

Suppose we have 2 dataframes df1 and df2. Suppose we have 2 dataframes df1 and df2.

df1 . div (df2) OR df1 / df2 df1 / df2

output: output:
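
A minimal sketch of the four pairs of functions. The two small dataframes are made-up values chosen only so that the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical data, chosen so every quotient is 10 or 0.1
df1 = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

print(df1.add(df2).equals(df1 + df2))    # True
print(df1.radd(df2).equals(df2 + df1))   # True
print(df1.sub(df2).values.tolist())      # [[9, 27], [18, 36]]
print(df1.rsub(df2).values.tolist())     # [[-9, -27], [-18, -36]]
print(df1.mul(df2).values.tolist())      # [[10, 90], [40, 160]]
print(df1.div(df2).values.tolist())      # [[10.0, 10.0], [10.0, 10.0]]
print(df1.rdiv(df2).values.tolist())     # [[0.1, 0.1], [0.1, 0.1]]
```

Reversing the order changes the result only for subtraction and division.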
Ques: Given 2 dataframes storing points of a 2-player teams in four rounds, write a
program to calculate average points obtained by each player in each round.
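
One possible solution is sketched below. The points, the player labels P1 and P2 and the round labels R1 to R4 are all made up for illustration; the question is read as asking for the element-wise average of the two dataframes.

```python
import pandas as pd

rounds = ['R1', 'R2', 'R3', 'R4']

# Hypothetical points of the two players, one dataframe per team
team1 = pd.DataFrame({'P1': [10, 20, 30, 40], 'P2': [12, 18, 24, 36]}, index=rounds)
team2 = pd.DataFrame({'P1': [20, 10, 50, 20], 'P2': [8, 22, 26, 14]}, index=rounds)

# Add the dataframes element-wise, then divide every cell by 2
average = (team1 + team2) / 2
print(average)
```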

Descriptive Statistics with Pandas:

Ques: Read the code carefully and answer the given questions
import pandas as pd
Prod = {'Grapes': {'Assam': 800, 'Kerala': 200, 'Jammu': 600},
        'Orange': {'Assam': 400, 'Kerala': 600, 'Jammu': 300},
        'Apple': {'Assam': 600, 'Kerala': 700, 'Jammu': 900}}

df1 = pd.DataFrame(Prod)

OR
Ques: Suppose you have a dataframe prod

Grapes Orange Apple


Assam 800 400 600
Kerala 200 600 700
Jammu 600 300 900

• Prod.min(): It returns the minimum production of each item (column) in the dataframe.

Output:
Grapes: 200
Orange: 300
Apple: 600

• Prod.min(axis=1): It returns the minimum production among the items for every state (row).

Output:
Assam: 400
Kerala: 200
Jammu: 300
• Prod.max(): It returns the maximum production of each item (column) in the dataframe.
Output:
Grapes: 800
Orange: 600
Apple: 900

• Prod.max(axis=1): It returns the maximum production among the items for every state (row).
Output:
Assam: 800
Kerala: 700
Jammu: 900

Note: The idxmax() and idxmin() functions return the index labels of the maximum and minimum values of
the dataframe. They return only the index, not the data values.
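
A short sketch of idxmax() and idxmin() on the same production data. Here the dataframe is named prod and built directly from lists.

```python
import pandas as pd

prod = pd.DataFrame(
    {'Grapes': [800, 200, 600], 'Orange': [400, 600, 300], 'Apple': [600, 700, 900]},
    index=['Assam', 'Kerala', 'Jammu'],
)

print(prod.idxmax().to_dict())        # {'Grapes': 'Assam', 'Orange': 'Kerala', 'Apple': 'Jammu'}
print(prod.idxmin().to_dict())        # {'Grapes': 'Kerala', 'Orange': 'Jammu', 'Apple': 'Assam'}
print(prod.idxmax(axis=1).to_dict())  # {'Assam': 'Grapes', 'Kerala': 'Apple', 'Jammu': 'Apple'}
```

The first two calls answer "which state produces the most/least of each item", the third "which item is produced most in each state".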

Mode, Mean, Median, Count and Sum


• mode(): It returns the value appearing most often.
• mean(): It returns the average value.
• median(): It returns the middle value.
• count(): It returns the number of non-NaN values in each column or row.
• sum(): It returns the sum of each column or row.

Note: axis=0 computes the result for each column (down the rows). It is the default, so it need not be written in the program.
axis=1 computes the result for each row (across the columns); it must be mentioned in the program.

Ques1: Suppose you have a dataframe df . Find the mean , mode, median,
count and sum.
      A    B    C    D
Hin   50   60   30   60
Eco   40   50   40   90
Eng   30   80   20   30
IP    50   70   20   20
Math  90   NaN  60   90

df.mode()                    df.mode(axis=1)
Output:                      Output:
A : 50                       Hin : 60
B : 50, 60, 70, 80           Eco : 40
C : 20                       Eng : 30
D : 90                       IP : 20
                             Math : 90

(In column B every value occurs once, so all four values are modes; NaN is ignored.)

df.median()                  df.median(axis=1)
Output:                      Output:
A : 50                       Hin : 55
B : 65                       Eco : 45
C : 30                       Eng : 30
D : 60                       IP : 35
                             Math : 90

df.mean()                    df.mean(axis=1)
Output:                      Output:
A : 52                       Hin : 50
B : 65                       Eco : 55
C : 34                       Eng : 40
D : 58                       IP : 40
                             Math : 80

df.count()                   df.count(axis=1)
Output:                      Output:
A : 5                        Hin : 4
B : 4                        Eco : 4
C : 5                        Eng : 4
D : 5                        IP : 4
                             Math : 3

df.sum()                     df.sum(axis=1)
Output:                      Output:
A : 260                      Hin : 200
B : 260                      Eco : 220
C : 170                      Eng : 160
D : 290                      IP : 160
                             Math : 240
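
The values in the tables above can be reproduced with the sketch below, which builds the same dataframe and calls the functions with and without axis=1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [50, 40, 30, 50, 90],
     'B': [60, 50, 80, 70, np.nan],
     'C': [30, 40, 20, 20, 60],
     'D': [60, 90, 30, 20, 90]},
    index=['Hin', 'Eco', 'Eng', 'IP', 'Math'],
)

print(df.mean().tolist())          # [52.0, 65.0, 34.0, 58.0]
print(df.median(axis=1).tolist())  # [55.0, 45.0, 30.0, 35.0, 90.0]
print(df.count().tolist())         # [5, 4, 5, 5]
print(df.sum(axis=1).tolist())     # [200.0, 220.0, 160.0, 160.0, 240.0]
print(df.mode())                   # one row per mode; shorter columns are padded with NaN
```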
------------------------------------------------------------------------------------------------------------------------
Ques: Suppose you have a dataframe mks1.

Carefully observe the C column. Which value is repeated most often? Ans:
Head() and Tail() in dataframe:

Suppose you have a dataframe df

ID Name Age Contact


11 Arun 21 34534
22 Akash 22 37556
33 Naman 34 74646
44 Deepak 18 75645
55 vaibhav 24 77754
66 shivam 21 43244

• df.head()    # by default returns the first 5 records

output:
ID Name Age Contact
11 Arun 21 34534
22 Akash 22 37556
33 Naman 34 74646
44 Deepak 18 75645
55 vaibhav 24 77754

• df.tail()    # by default returns the last 5 records

output:

ID Name Age Contact
22 Akash 22 37556
33 Naman 34 74646
44 Deepak 18 75645
55 vaibhav 24 77754
66 shivam 21 43244

• df.head(3)    # returns the first 3 records
output:

ID Name Age Contact


11 Arun 21 34534
22 Akash 22 37556
33 Naman 34 74646

• df.tail(2)    # returns the last 2 records


output:

ID Name Age Contact


55 vaibhav 24 77754
66 shivam 21 43244
---------------------------------------------------------------------------------------------------------------
Chapter-3 CSV ( comma separated values)
CSV:
• It is a simple file format used to store tabular data, such as a spreadsheet or database.
• A CSV file stores tabular data in plain text.
• Each line of the file is a data record. Each record consists of one or more fields separated by commas.

Tabular Data            After conversion to CSV file

ID  Name   Marks        ID, Name, Marks
11  Arun   55           11, arun, 55
22  Vikas  78           22, vikas, 78

Advantages of CSV:
• A common format for data interchange.
• All spreadsheets and databases support import and export to CSV format.
• Simple and compact storage.
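
The link between tabular data and CSV text can be tried without creating a file. The sketch below is only an illustration: it keeps the CSV text in a string and uses io.StringIO so that read_csv() can read it like a file.

```python
import io
import pandas as pd

csv_text = "ID,Name,Marks\n11,Arun,55\n22,Vikas,78\n"

df = pd.read_csv(io.StringIO(csv_text))   # CSV text -> dataframe
print(df.shape)                           # (2, 3)
print(list(df.columns))                   # ['ID', 'Name', 'Marks']

out = df.to_csv(index=False)              # dataframe -> CSV text (no file path given)
print(out.splitlines())                   # ['ID,Name,Marks', '11,Arun,55', '22,Vikas,78']
```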

Reading from a CSV file to Dataframe:

import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV")
print(df)
output:
ID Name Marks
0 11 Arun 55
1 22 Vikas 78

df.shape
Output: (2, 3)    # 2 rows, 3 columns

Reading CSV with specific/selected columns:

import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV", usecols=['ID', 'Name'])
print(df)

output:
ID Name
0 11 Arun
1 22 Vikas
Reading CSV without header (the file has no header row, so the columns are numbered 0, 1, 2):
import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV", header=None)
print(df)

output:
0 1 2
0 11 Arun 55
1 22 Vikas 78

Reading CSV without the default index (the first column of the file is used as the index):

import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV", index_col=0)
print(df)
output:

    Name   Marks
ID
11  Arun   55
22  Vikas  78

Reading CSV with new column names (skiprows=1 skips the old header row):

import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV", skiprows=1, names=['E_id', 'E_name', 'E_marks'])
print(df)
output:
E_id E_name E_marks
0 11 Arun 55
1 22 Vikas 78

Reading CSV file having a different separator (here the fields are separated by a tab character):

import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV", sep='\t')
print(df)
output:
ID Name Marks
0 11 Arun 55
1 22 Vikas 78

Reading specified number of rows from CSV file:


Suppose we have a CSV file Employee shown below

11, Arun, 55
22, Vikas, 78
33, Sachin, 34

import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV", names=["ID", "Name", "Marks"], nrows=2)
print(df)

output:
ID Name Marks
0 11 Arun 55
1 22 Vikas 78

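Storing Dataframe data to CSV file:

The to_csv() method can be used.
Ex:
import pandas as pd
df.to_csv("E:\\myfolder\\Employee.CSV")
Output (contents of the file; the first column is the dataframe index, pass index=False to leave it out):

,ID,Name,Marks
0,11,Arun,55
1,22,Vikas,78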

Handling NaN values with to_csv(): suppose we have a dataframe df read from the CSV file shown below:

ID, Name, Marks
11, Arun, 55
22, Vikas, 78
33, Tanu, 46

import numpy as np
df.loc[0, "Name"] = np.nan
df.loc[2, "Marks"] = np.nan

output:
ID, Name, Marks
11, NaN, 55
22, Vikas, 78
33, Tanu, NaN
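
When such a dataframe is written with to_csv(), a NaN is stored as an empty field by default. The na_rep parameter lets us choose another text for it. A small sketch with a two-row dataframe follows; the text 'NULL' is just an example value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [11, 22], 'Name': ['Arun', 'Vikas'], 'Marks': [55, 78]})
df.loc[0, 'Name'] = np.nan

default_text = df.to_csv(index=False)                # NaN written as an empty field
marked_text = df.to_csv(index=False, na_rep='NULL')  # NaN written as NULL

print(default_text.splitlines()[1])  # 11,,55
print(marked_text.splitlines()[1])   # 11,NULL,55
```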

Transferring Data between Dataframes and MySQL


Step 1: Start Python and import the required packages
Step 2: Open a connection to the database
Step 3: Execute an SQL command and fetch the rows into a dataframe
Step 4: Process the data as required
Step 5: Close the connection
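
The five steps can be tried even without a MySQL server. The sketch below is NOT the MySQL program: it swaps in Python's built-in sqlite3 module with an in-memory database, and creates a small student table itself, purely so that the steps can be run anywhere. With MySQL only the connection line changes.

```python
# Step 1: import the packages
import sqlite3
import pandas as pd

# Step 2: open a connection (in-memory SQLite stands in for the MySQL server)
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE student (ID INTEGER, Name TEXT, Marks INTEGER, Age INTEGER)')
con.executemany(
    'INSERT INTO student VALUES (?, ?, ?, ?)',
    [(11, 'Arun', 55, 21), (22, 'Vikas', 78, 24), (33, 'Tanu', 46, 27), (44, 'Neha', 64, 24)],
)
con.commit()

# Step 3: execute an SQL command and fetch the rows into a dataframe
df = pd.read_sql('SELECT * FROM student WHERE Marks > 60', con)

# Step 4: process the data as required
names = df['Name'].tolist()
print(names)   # ['Vikas', 'Neha']

# Step 5: close the connection
con.close()
```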

Example 1: Suppose the table student in the MySQL database test holds the data shown below (for example, loaded from a test.CSV file), and we want to access all the data using SQL.

ID, Name, Marks, Age
11, arun, 55, 21
22, vikas, 78, 24
33, Tanu, 46, 27
44, Neha, 64, 24

Solution
import pandas as pd
import mysql.connector as sqltor

mycon = sqltor.connect(host="localhost", user="root", passwd="Mypass", database="test")
if mycon.is_connected():
    df = pd.read_sql("select * from student;", mycon)
    print(df)
else:
    print("Connection failed")
output:
ID Name Marks Age
11 Arun 55 21
22 Vikas 78 24
33 Tanu 46 27
44 Neha 64 24

Example 2: Using the same student table, we want to access data based on a condition
(records of those students whose marks are greater than 60) using SQL.

ID, Name, Marks, Age
11, arun, 55, 21
22, vikas, 78, 24
33, Tanu, 46, 27
44, Neha, 64, 24

Solution

import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host="localhost", user="root", passwd="Mypass", database="test")
if mycon.is_connected():
    df = pd.read_sql("select * from student where Marks > 60;", mycon)
    print(df)
else:
    print("Connection failed")

output:
ID Name Marks Age
22 Vikas 78 24
44 Neha 64 24
Example 3: Using the same student table, we want to access the records of all students
whose name begins with a specific letter (here V).

ID, Name, Marks, Age
11, arun, 55, 21
22, vikas, 78, 24
33, Tanu, 46, 27
44, Neha, 64, 24

Solution

import pandas as pd
import mysql.connector as sqltor

mycon = sqltor.connect(host="localhost", user="root", passwd="Mypass", database="test")

if mycon.is_connected():
    qry = "select * from student where name like '%s';" % ('V%',)    # pattern V% means: starts with V
    df = pd.read_sql(qry, mycon)
    print(df)
else:
    print("Connection failed")
output:
ID Name Marks Age
22 Vikas 78 24

Writing data into an SQL table using dataframe:

import pandas as pd
import sqlalchemy as sql
con = sql.create_engine("mysql+mysqlconnector://root:@localhost/school")    # user root, empty password

d = {'name': pd.Series(['arun', 'dev', 'vani']), 'age': pd.Series([25, 26, 29])}

df = pd.DataFrame(d)
print(df)
df.to_sql("Cricket team", con)
Output:
name age
0 arun 25
1 dev 26
2 vani 29

Note:

• read_csv(): this function is used to read data from a CSV file into a dataframe.
• to_csv(): this function saves the data of a dataframe to a CSV file.
• read_sql(): this function is used to read the contents of an SQL database into a dataframe.
• to_sql(): this function is used to write the contents of a dataframe into an SQL database table.

----------------------------------------------------------------------------------------------------------------

Chapter- 4 Data Visualization

Data visualization
• Data visualization refers to the graphical or visual representation of information and data using
visual elements like charts, graphs, maps etc.
• Data visualization helps us to easily understand complex problems and see patterns in data.
Matplotlib: It is a 2D plotting library that helps in visualizing figures.
It is used in Python for data visualization.

Pyplot: It is a module of the matplotlib library containing a collection of methods which allow users to create 2D
plots and graphs easily and interactively.

Plot: A plot is a graphical representation technique for a dataset, usually a graph
showing the relationship between two or more variables.
NumPy: It stands for Numerical Python. It is a library for scientific computing (multidimensional arrays) in
Python.
_______________________________________________________

Types of chart:

1. Line Graph
• A line chart or line graph is a type of chart which displays information as a series of data points
called 'markers' connected by straight line segments.
• This type of plot is often used to visualize a trend in data over intervals of time.
• plot() is used to create a line chart.
Example: Write a program in Python to plot a line chart to depict the changing weekly
onion prices for 4 weeks.
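One possible solution is sketched below. The four prices are made-up sample values (they do not come from these notes), and the names week and price are only illustrative.

```python
import matplotlib.pyplot as plt

# Assumed sample data: onion price (Rs. per kg) in each of 4 weeks
week = [1, 2, 3, 4]
price = [40, 80, 100, 50]

# plot() joins the (week, price) points with straight line segments
lines = plt.plot(week, price)
plt.title("Weekly Onion Prices")
plt.xlabel("Week")
plt.ylabel("Price (Rs. per kg)")
plt.show()
```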

Apply various features in Line chart:


Plot size and Grid
plt.figure(figsize=(15,7))   # the figure is 15 inches wide (x direction) and
                             # 7 inches tall (y direction).
plt.grid(True)               # shows a grid on the plot.

Changing line color and style


Here x and y stand for the data being plotted.
plt.plot(x, y, color='r')      # changes the color of the line
plt.plot(x, y, linewidth=2)    # sets the line thickness
plt.plot(x, y, linewidth=2, linestyle='dashed')   # sets the line style (other styles are
                               # 'dashdot', 'dotted' and 'solid')

Example : Create line chart using different properties.
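A short sketch that combines these properties is given below. The data values are assumed sample values, not taken from these notes.

```python
import matplotlib.pyplot as plt

# Assumed sample data
week = [1, 2, 3, 4]
price = [40, 80, 100, 50]

plt.figure(figsize=(10, 5))      # figure 10 inches wide and 5 inches tall
# red dashed line of thickness 2
lines = plt.plot(week, price, color='r', linewidth=2, linestyle='dashed')
plt.grid(True)                   # show the grid
plt.title("Weekly Onion Prices")
plt.xlabel("Week")
plt.ylabel("Price")
plt.show()
```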



Changing marker type size and color:

plt.plot(x, y, marker='X', markersize=4, markeredgecolor='red')   # x and y are the data lists


Example: Creating the line chart using list values with marker properties.
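The following sketch shows the marker properties on a line chart built from two lists. The list values are assumed sample values.

```python
import matplotlib.pyplot as plt

# Assumed sample list values
week = [1, 2, 3, 4]
price = [40, 80, 100, 50]

# marker is the symbol drawn at each data point,
# markersize is its size and markeredgecolor is its outline color
lines = plt.plot(week, price, marker='X', markersize=8, markeredgecolor='red')
plt.xlabel("Week")
plt.ylabel("Price")
plt.show()
```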

2. Scatter Chart:

• A scatter chart uses dots to represent values for two different numeric variables.
• Scatter plots are used to observe relationships between variables.
• scatter() is used to create a scatter chart.
Example: Write a program in Python to plot a scatter chart to depict the changing weekly onion
prices for 4 weeks.
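A possible solution is sketched below, using made-up sample prices (an assumption, not data from these notes).

```python
import matplotlib.pyplot as plt

# Assumed sample data: onion price (Rs. per kg) in each of 4 weeks
week = [1, 2, 3, 4]
price = [40, 80, 100, 50]

# scatter() draws one dot for every (week, price) pair
points = plt.scatter(week, price)
plt.title("Weekly Onion Prices")
plt.xlabel("Week")
plt.ylabel("Price (Rs. per kg)")
plt.show()
```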

Changing marker type size and color:

plt.scatter(x, y, s=8, c='r', marker='x')   # size is 8, color is red and marker shape is x (x and y are the data lists).

Example: Creating the scatter chart using list values name and price with marker properties.
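A sketch of this example follows. The item names and prices are hypothetical sample values chosen only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical sample lists
name = ['Onion', 'Potato', 'Tomato', 'Garlic']
price = [40, 25, 30, 120]

# s = marker size, c = marker color, marker = marker shape
points = plt.scatter(name, price, s=80, c='r', marker='x')
plt.xlabel("Name")
plt.ylabel("Price")
plt.show()
```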
Example: Write a program to plot a scatter graph of two random distributions, x
and y, each made up of randomly generated integers.
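One way to write such a program is sketched below. The number of points (50) and the range of the integers (1 to 99) are assumptions made for the illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# 50 random integers from 1 to 99 for each axis (100 is excluded)
x = np.random.randint(1, 100, size=50)
y = np.random.randint(1, 100, size=50)

plt.scatter(x, y, s=20, c='b', marker='o')
plt.xlabel("x values")
plt.ylabel("y values")
plt.show()
```

Because the values are random, the picture changes on every run.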

3. Bar Chart:
• A bar chart is a chart or graph that presents categorical data with rectangular bars whose heights (or lengths) are proportional to the values they represent.
• The bars can be plotted vertically or horizontally.
• bar() is used to create a vertical bar chart.
• barh() is used to create a horizontal bar chart.

Example: Creating the bar chart using name and prices.
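A minimal sketch is given below. The names and prices are hypothetical sample values.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data
name = ['Onion', 'Potato', 'Tomato', 'Garlic', 'Ginger']
price = [40, 25, 30, 120, 90]

bars = plt.bar(name, price)      # one vertical bar per name
plt.xlabel("Name")
plt.ylabel("Price")
plt.show()
```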


Changing width and colors of the bars in bar chart:

• plt.bar(x, y, width=[0.6,0.5,0.4,0.3,0.2])
it changes the widths of the bars (one width for each bar).

• plt.bar(x, y, color=['red','gold','silver','blue','green'])
it changes the colors of the bars (one color for each bar).

Example: Creating the bar chart using name and prices with different colors and different sizes.
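The sketch below applies both properties to one chart. The names and prices are hypothetical sample values.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data
name = ['Onion', 'Potato', 'Tomato', 'Garlic', 'Ginger']
price = [40, 25, 30, 120, 90]

# one width and one color for each of the five bars
bars = plt.bar(name, price,
               width=[0.6, 0.5, 0.4, 0.3, 0.2],
               color=['red', 'gold', 'silver', 'blue', 'green'])
plt.xlabel("Name")
plt.ylabel("Price")
plt.show()
```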
Example: Creating the bar chart using name and prices horizontally.
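For a horizontal chart, only the function name and the axis labels change. The data are again hypothetical sample values.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data
name = ['Onion', 'Potato', 'Tomato', 'Garlic', 'Ginger']
price = [40, 25, 30, 120, 90]

bars = plt.barh(name, price)     # barh() draws the bars horizontally
plt.xlabel("Price")              # values are now on the x-axis
plt.ylabel("Name")
plt.show()
```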

Example: Write a program to plot a bar chart for the persons A, B, C with their unit test
marks.
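A possible program is sketched below. The marks are assumed sample values.

```python
import matplotlib.pyplot as plt

person = ['A', 'B', 'C']
marks = [18, 22, 15]             # assumed unit test marks

bars = plt.bar(person, marks)
plt.title("Unit Test Marks")
plt.xlabel("Person")
plt.ylabel("Marks")
plt.show()
```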
4. Pie Chart:
• A type of graph in which a circle is divided into sectors that each represent a proportion of the whole.
• pie() is used to create a pie chart.
Example: Create a pie chart using name and marks with label and title
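A minimal sketch follows. The names and marks are sample values chosen for illustration.

```python
import matplotlib.pyplot as plt

# Sample data chosen for illustration
name = ['Arun', 'Vikas', 'Tanu', 'Neha']
marks = [55, 78, 46, 64]

# labels puts a name next to each sector
wedges, texts = plt.pie(marks, labels=name)
plt.title("Marks of Students")
plt.show()
```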

Example: Create a pie chart using name and sizes with colors and explode properties.
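The sketch below adds the colors and explode properties. All values are assumed sample values; autopct is an extra pie() option that prints the percentage on each sector.

```python
import matplotlib.pyplot as plt

# Assumed sample data
name = ['Hindi', 'English', 'Science', 'SST']
sizes = [30, 25, 25, 20]
colors = ['gold', 'lightgreen', 'lightcoral', 'lightskyblue']
explode = (0.1, 0, 0, 0)         # pull only the first sector out of the circle

wedges, texts, autotexts = plt.pie(sizes, labels=name, colors=colors,
                                   explode=explode, autopct='%1.1f%%')
plt.title("Subject Choice")
plt.show()
```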
Add legends in charts:

Example: create bar chart on common plot where three data ranges are plotted on same
chart.
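One way to do this is sketched below. The subjects and the three data ranges are assumed sample values. Each call to bar() gets a label, and legend() collects those labels into the legend box.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed sample data: three data ranges for the same subjects
subjects = ['Hindi', 'English', 'Science']
test1 = [15, 18, 20]
test2 = [17, 16, 22]
test3 = [19, 20, 18]

x = np.arange(len(subjects))     # bar positions 0, 1, 2
w = 0.25                         # width of one bar

plt.bar(x - w, test1, width=w, label='Test 1')
plt.bar(x, test2, width=w, label='Test 2')
plt.bar(x + w, test3, width=w, label='Test 3')
plt.xticks(x, subjects)          # show subject names under each group
legend = plt.legend(loc='upper left')
plt.show()
```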
5. Histogram:
• A histogram is a summarization tool for discrete or continuous data. It groups the values into ranges (bins) and shows how many values fall in each bin.
• hist() is used to create a histogram.

Example: Create a histogram of the age values of 100 people.
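A sketch of this example is given below. The ages are generated randomly, which is an assumption; real data could be used in the same way.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: 100 random ages from 1 to 90
ages = np.random.randint(1, 91, size=100)

# bins=9 groups the ages into 9 equal ranges
counts, edges, patches = plt.hist(ages, bins=9)
plt.xlabel("Age")
plt.ylabel("Number of people")
plt.show()
```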


Example: Create a histogram of the age values of 100 people using the histtype feature.
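The sketch below uses histtype='step', which draws only the outline of the histogram. The ages are again random sample values (an assumption).

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumption: 100 random ages from 1 to 90
ages = np.random.randint(1, 91, size=100)

# histtype can be 'bar' (default), 'barstacked', 'step' or 'stepfilled'
counts, edges, patches = plt.hist(ages, bins=9, histtype='step')
plt.xlabel("Age")
plt.ylabel("Number of people")
plt.show()
```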
Important Board Questions: 2020

1. Find the output of following program.

import numpy as np
d=np.array([10,20,30,40,50,60,70])
print(d[-4:])

output: [40 50 60 70]

2. Fill in the blank with appropriate numpy method to calculate and print the variance of an array.
import numpy as np
data=np.array([1,2,3,4,5,6])
print(np.________(data, ddof=0))

Output: np.var

3. Mr. Sanjay wants to plot a bar graph for the given set of values of subject on x-axis and
number of students who opted for that subject on y-axis.

Complete the code to perform the following :


(i) To plot the bar graph in statement 1
(ii) To display the graph in statement 2

import matplotlib.pyplot as plt


x=['Hindi', 'English', 'Science', 'SST']
y=[10,20,30,40]
Statement 1
Statement 2

Output: (i) plt.bar(x,y)


(ii) plt.show()

4. Mr. Harry wants to draw a line chart using a list of elements named LIST.

Complete the code to perform the following operations:


(i) To plot a line chart using the given LIST,
(ii) To give a y-axis label to the line chart named “Sample Numbers”.

import matplotlib.pyplot as PLINE


LIST=[10,20,30,40,50,60]
Statement 1
Statement 2
PLINE.show()

Output:
(i) PLINE.plot(LIST)
(ii) PLINE.ylabel("Sample Numbers")

5. Write the output of the following code :


import numpy as np
array1=np.array([10,12,14,16,18,20,22])
array2=np.array([10,12,15,16,12,20,12])
a=(np.where(array1==array2))
print(array1[a])
output:
[10 12 16 20]

6.

Output:

import matplotlib.pyplot as plt


import numpy as np
x = np.arange(1, 5)
plt.plot(x, x*1.5, label='Normal')
plt.plot(x, x*3.0, label='Fast')
plt.plot(x, x/3.0, label='Slow')
plt.legend()
plt.show()

7. ________ method in Pandas can be used to change the index of rows and columns of a
Series or Dataframe.
Ans: reindex()

8. Write a small Python code to drop a row labeled as 0 from a dataframe.

Output: df = df.drop(0)
print(df )

9. Write a python code to create a dataframe with appropriate headings from the list given below

['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104', 'Cathy', 75], ['S105', 'Gundaho', 82]

output:
import pandas as pd
data = [['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104','Cathy', 75], ['S105', 'Gundaho', 82]]
df = pd.DataFrame(data, columns = ['ID', 'Name', 'Marks'])
print(df )
10. Write a small Python code to create a dataframe with headings (a and b) from the list given
below:
[ [1,2],[3,4],[5,6],[7,8] ]

Output:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# DataFrame.append() was removed in pandas 2.0; pd.concat() gives the same result
df = pd.concat([df, df2])

11.

Output:
(i) print(df.mean(axis = 1, skipna = True))
print(df.mean(axis = 0, skipna = True))

(ii) print(df.sum(axis = 1, skipna = True))

(iii) print(df.median())

12.

Output:

(i) df1.sum()
(ii) df1['Rainfall'].mean()
(iii) df1.loc[:11, 'maxtemp':'Rainfall'].mean()
13. Find the output of the following code:
import pandas as pd
data = [{'a': 10, 'b': 20}, {'a': 6, 'b': 32, 'c': 22}]

#with two column indices, values same as dictionary keys


df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'c'])

print(df1)
print(df2)

Output:

         a   b
first   10  20
second   6  32

         a     c
first   10   NaN
second   6  22.0

14. Write the code in pandas to create the following dataframes:

df1                      df2
   mark1  mark2             mark1  mark2
0     10     15          0     30     20
1     40     45          1     20     25
2     15     30          2     20     30
3     40     70          3     50     30

Write the commands to do the following operations on the dataframes given above:
(i) To add dataframes df1 and df2.
(ii) To subtract df2 from df1.
(iii) To rename column mark1 as marks1 in both the dataframes df1 and df2.
(iv) To change index label of df1 from 0 to zero and from 1 to one.

Output:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'mark1':[10,40,15,40], 'mark2':[15,45,30,70]})
df2 = pd.DataFrame({'mark1':[30,20,20,50], 'mark2':[20,25,30,30]})
print(df1)
print(df2)

(i) print(df1.add(df2))
(ii) print(df1.subtract(df2))
(iii) df1.rename(columns={'mark1':'marks1'}, inplace=True)
df2.rename(columns={'mark1':'marks1'}, inplace=True)
print(df1)
print(df2)

(iv) df1.rename(index = {0: "zero", 1: "one"}, inplace = True)
print(df1)

=================================================

All the best Students…!!


Arun Kumar
Golden Heart Academy
