Unit-1 Python Pandas (1)
INFORMATICS PRACTICES
Unit -1
Data Handling using Pandas
and
Data Visualization
Features of Pandas:
It can read and write data in many different file formats (CSV, Excel, SQL databases, etc.) and handles many data types (int, float, string, etc.).
Columns in a pandas data structure can be inserted or deleted.
It supports group-by operations for data aggregation and transformation.
It allows merging and joining of data.
It can easily find and fill missing data.
It supports reshaping of data into different formats.
1. Series:
It is a one-dimensional structure storing homogeneous data (all values of the same data type) which
can be modified (mutable).
It contains a sequence of values and an associated index (labels).
It can be created using the pd.Series() constructor.
Example 1:
INDEX DATA
0 22
1 -14
2 32
3 100
Example 2:
INDEX DATA
Sun 1
Mon 2
Tue 3
Wed 3
Note: Series data is mutable, but the size of a Series is immutable (once created, its size cannot be changed).
Creation of series:
A Series can be created from several kinds of input, for example:
A list or other sequence (including range())
A dictionary
A scalar value (constant)
A NumPy array
import pandas as pd
S1 = pd.Series()    # an empty Series
print(S1)
import pandas as pd
S = pd.Series([10, 20, 30, 40])
print(S)
Output:
0 10
1 20
2 30
3 40
import pandas as pd
S = pd.Series(range(5))
print(S)
Output:
Index data
0 0
1 1
2 2
3 3
4 4
import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S)
Output:
a 10
b 20
c 30
d 40
e 50
import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S['b'])    # one value, selected by its label
Output:
20
import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S[['a', 'c', 'd']])    # several values: pass a list of labels
Output:
a 10
c 30
d 40
import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S.iloc[0])    # value at position 0
print(S[:3])        # first 3 values
print(S[-3:])       # last 3 values
OUTPUT:
10
a 10
b 20
c 30
c 30
d 40
e 50
iloc and loc
iloc:
It is used for indexing and selecting by position, i.e. row number and column number. It is also
called position-based indexing. When slicing, the end position is excluded.
Output:
Index Data
b 32
c 43
d 14
loc:
It is used for indexing and selecting by label, i.e. row name and column name. It is also
called label-based indexing. When slicing, the end label is included.
Output:
Index Data
b 32
c 43
d 14
e 55
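The code behind the two outputs above did not survive. As a hedged sketch, the Series below reuses the values for b, c, d and e from those outputs; the value for 'a' is made up. It shows that iloc stops before the end position, while loc includes the end label.

```python
import pandas as pd

# Hypothetical data: the value for 'a' is assumed, b-e follow the outputs above
S = pd.Series([25, 32, 43, 14, 55], index=['a', 'b', 'c', 'd', 'e'])

# iloc: position based; end position 4 is excluded, so this gives b, c, d
print(S.iloc[1:4])

# loc: label based; end label 'e' is included, so this gives b, c, d, e
print(S.loc['b':'e'])
```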
Output:
Index Data
a 55
b 55
c 55
d 55
Ex-2:
import pandas as pd
S = pd.Series(55, index=[0, 1, 2, 3])    # the scalar 55 is repeated for every index label
print(S)
Output:
Index Data
0 55
1 55
2 55
3 55
Output:
Index Data
0 55
1 55
2 55
Output:
Index Data
1 55
3 55
5 55
import pandas as pd
S = pd.Series("WELCOME", index=['arun', 'nitin', 'vikas'])
print(S)
Output:
Index Data
arun WELCOME
nitin WELCOME
vikas WELCOME
Ex: To create a series using range() and a for loop (list comprehension)
import pandas as pd
S = pd.Series(range(1, 15, 3), index=[x for x in 'abcde'])
print(S)
Output:
Index Data
a 1
b 4
c 7
d 10
e 13
import pandas as pd
Months = ['jan', 'feb', 'mar', 'apr', 'may']
Days = [31, 28, 31, 30, 31]
S = pd.Series(Days, index=Months)
print(S)
Output:
Index Data
jan 31
feb 28
mar 31
apr 30
may 31
import pandas as pd
import numpy as np
S = pd.Series([7.5, 5.4, np.nan, 3.5])    # np.nan marks a missing value
print(S)
OUTPUT:
Index Data
0 7.5
1 5.4
2 NaN
3 3.5
Creating a series from dictionary:
import pandas as pd
S = pd.Series({'jan': 31, 'feb': 28, 'mar': 31, 'apr': 30, 'may': 31})    # keys become the index
print(S)
Output:
Index Data
jan 31
feb 28
mar 31
apr 30
may 31
import pandas as pd
import numpy as np
S1 = np.arange(1, 5)    # array [1 2 3 4]
print(S1)
S2 = pd.Series(index=S1, data=S1 * 4)
print(S2)
Output:
Index Data
1 4
2 8
3 12
4 16
import pandas as pd
import numpy as np
S1 = np.arange(1, 5)
print(S1)
S2 = pd.Series(index=S1, data=S1 ** 2)    # square of each element
print(S2)
Output:
Index Data
1 1
2 4
3 9
4 16
Series using head() and tail() functions:
import pandas as pd
S = pd.Series([10, 20, 30, 40, 50, 60], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(S.head())     # first 5 values by default
Output:
Index data
a 10
b 20
c 30
d 40
e 50
print(S.head(3))    # first 3 values
Output:
Index data
a 10
b 20
c 30
print(S.tail())     # last 5 values by default
Output:
Index data
b 20
c 30
d 40
e 50
f 60
print(S.tail(3))    # last 3 values
Output:
Index data
d 40
e 50
f 60
Arithmetic operations on series: addition, subtraction, multiplication and division
import pandas as pd
S1 = pd.Series([10, 20, 30, 40])
S2 = pd.Series([1, 2, 3, 4])
Add = S1 + S2    # values with matching index labels are combined
Sub = S1 - S2
Mul = S1 * S2
Div = S1 / S2    # division always gives float values
print(Add)
print(Sub)
print(Mul)
print(Div)
OUTPUT:
Addition Subtraction Multiplication Division
11 9 10 10.0
22 18 40 10.0
33 27 90 10.0
44 36 160 10.0
import pandas as pd
S = pd.Series([10, 20, 30])
print(S + 2)
Output:
12
22
32
print(S * 2)
Output:
20
40
60
print(S ** 2)
Output:
100
400
900
print(S > 10)
Output:
False
True
True
Deleting elements from series :
import pandas as pd
S = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
print(S.drop('c'))    # returns a new Series without 'c'; S itself is unchanged
Output:
a 10
b 20
d 40
e 50
Attribute              Description
Series.index           Returns the index (labels) of the series
Series.dtype           Returns the data type of the underlying data
Series.shape           Returns a tuple giving the shape of the underlying data
Series.size            Returns the number of elements
Series.nbytes          Returns the number of bytes used by the underlying data
Series.ndim            Returns the number of dimensions (always 1 for a Series)
Series.dtype.itemsize  Returns the size in bytes of one element (the older Series.itemsize was removed in pandas 1.0)
Series.values          Returns the values of the series
Series.hasnans         Returns True if there are any NaN values
Series.empty           Returns True if the series object is empty
Example:
import pandas as pd
S = pd.Series(range(1, 15, 3), index=[x for x in 'abcde'])
Contents of S (index and values):
a b c d e
1 4 7 10 13
print(S.index)
Output: a b c d e
print(S.values)
Output: 1 4 7 10 13
print(S.size)
Output: 5
print(S.dtype.itemsize)
Output: 8
print(S.nbytes)
Output: 40
print(S.ndim)
Output: 1
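The table also lists shape, dtype, hasnans and empty, which the example above does not use. A small sketch, using two extra hypothetical Series (T with a missing value, E with no values), shows what they return:

```python
import numpy as np
import pandas as pd

S = pd.Series(range(1, 15, 3), index=[x for x in 'abcde'])
T = pd.Series([7.5, np.nan, 3.5])   # hypothetical Series with a missing value
E = pd.Series(dtype=float)          # hypothetical empty Series

print(S.shape)      # (5,): one dimension with 5 elements
print(S.dtype)      # integer data type of the values
print(T.hasnans)    # True, because T contains NaN
print(S.hasnans)    # False
print(E.empty)      # True, because E has no elements
print(S.empty)      # False
```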
Sorting series values in ascending and descending order:
Ascending order:
import pandas as pd
S = pd.Series([40, 20, 30, 50])
S = S.sort_values()    # sort_values() returns a new, sorted Series
print(S)
Output:
1 20
2 30
0 40
3 50
Descending order:
import pandas as pd
S = pd.Series([40, 20, 30, 50])
S = S.sort_values(ascending=False)
print(S)
Output:
3 50
0 40
2 30
1 20
import pandas as pd
S = pd.Series([40, 20, 30, 50])
print(S[S > 35])    # keep only the values greater than 35
Output:
0 40
3 50
import pandas as pd
S = pd.Series([40, 20, 30, 50])
print(S > 35)       # element-wise comparison gives a Boolean Series
Output:
0 True
1 False
2 False
3 True
Tiling a series:
import numpy as np
import pandas as pd
S = pd.Series(np.tile([3, 5], 2))    # repeat [3, 5] two times
print(S)
Output:
0 3
1 5
2 3
3 5
import pandas as pd
num = [0, 100, 200, 300, 400, 500, 600, 700, 800, 900]
labels = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
series = pd.Series(num, index=labels)
Positive position:   0    1    2    3    4    5    6    7    8    9
Label:               A    B    C    D    E    F    G    H    I    J
Value:               0  100  200  300  400  500  600  700  800  900
Negative position: -10   -9   -8   -7   -6   -5   -4   -3   -2   -1
print(series[2:4])
OUTPUT:
C 200
D 300
print(series[ : 6 ] )
OUTPUT:
A 0
B 100
C 200
D 300
E 400
F 500
print(series[ 1 : 6 : 2 ] )
OUTPUT:
B 100
D 300
F 500
print(series[ 4 : ] )
OUTPUT:
E 400
F 500
G 600
H 700
I 800
J 900
print(series[ : 4 : 2 ] )
OUTPUT:
A 0
C 200
print(series[ 4 : :2])
OUTPUT:
E 400
G 600
I 800
print(series[ : : -1 ] )
OUTPUT:
J 900
I 800
H 700
G 600
F 500
E 400
D 300
C 200
B 100
A 0
Chapter-2 Pandas Python -2
Dataframe:
It is a pandas data structure which stores data in a two-dimensional (tabular) form of rows and columns.
A dataframe can be created using the pd.DataFrame() constructor.
Features of dataframe:
The size of a dataframe is mutable (rows and columns can be added or deleted).
The values of a dataframe are also mutable (they can be changed at any time).
Arithmetic operations can be performed on rows and columns.
(Figure: a dataframe with column labels A, B, C, D and row labels 0, 1, 2, 3)
Creation of Dataframe:
import pandas as pd
df1 = pd.DataFrame()
print(df1)
output: an empty dataframe is created
Empty DataFrame
Columns: []
Index: []
0 10
1 20
2 30
3 40
4 50
Age Name
0 20 Shreya
1 22 Rakshit
2 26 Madhav
Creating dataframe from 2D dictionary having values as List:
import pandas as pd
Dict1 = {'Student': ['Ruchika', 'Neha', 'Raj', 'Tanu'],
         'Sports': ['Football', 'Chess', 'Ludo', 'Hockey'],
         'Marks': [200, 300, 100, 400]}
df1 = pd.DataFrame(Dict1, index=['I', 'II', 'III', 'IV'])
print(df1)
Output:
     Student    Sports  Marks
I    Ruchika  Football    200
II      Neha     Chess    300
III      Raj      Ludo    100
IV      Tanu    Hockey    400
Creating dataframe from a 2D list (list of lists):
import pandas as pd
List1 = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
df1 = pd.DataFrame(List1)
print(df1)
Output:
0 1 2
0 10 20 30
1 40 50 60
2 70 80 90
import pandas as pd
Dict1 = {'Marketing': {'Age': 21, 'Gender': 'Male', 'Name': 'Vikas'},
         'Sales': {'Age': 24, 'Gender': 'Female', 'Name': 'Ruchika'}}
df1 = pd.DataFrame(Dict1)    # outer keys become columns, inner keys become row labels
print(df1)
Output:
Marketing Sales
Age 21 24
Gender Male Female
Name Vikas Ruchika
Q: Write a program to create a dataframe from a list containing dictionaries of the
performance of 3 topper students. Topper grade should be row labels.
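import pandas as pd
TopperA = {'Rollno': 10, 'Name': 'Arun', 'Marks': 99}
TopperB = {'Rollno': 12, 'Name': 'Rishi', 'Marks': 98}
TopperC = {'Rollno': 13, 'Name': 'Preet', 'Marks': 96}
Toppers = [TopperA, TopperB, TopperC]    # a list containing dictionaries
df = pd.DataFrame(Toppers, index=['TopperA', 'TopperB', 'TopperC'])
print(df)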
Output:
Rollno Name Marks
TopperA 10 Arun 99
TopperB 12 Rishi 98
TopperC 13 Preet 96
====================================================================
Ques: Suppose you have a table Student and you want to change its index from 0-3 to the Name column.
Output:
Rollno Marks
Arun 10 99
Rishi 12 98
Preet 13 96
Nishi 14 99
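The code for this question is missing. A minimal sketch, assuming the Student table has the columns Rollno, Name and Marks seen in the output (the variable names `student`, `by_name` and `restored` are hypothetical), uses set_index() to make Name the row index and reset_index() to turn it back into an ordinary column:

```python
import pandas as pd

# Hypothetical Student table, built to match the output above
student = pd.DataFrame({'Rollno': [10, 12, 13, 14],
                        'Name': ['Arun', 'Rishi', 'Preet', 'Nishi'],
                        'Marks': [99, 98, 96, 99]})

# (i) Use the Name column as the row index instead of 0-3
by_name = student.set_index('Name')
print(by_name)

# (ii) Reset the index: Name becomes a normal column again and 0-3 returns
restored = by_name.reset_index()
print(restored)
```

Both methods return a new DataFrame; pass inplace=True if you want to change the original object instead.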
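(ii) To reset the index column (Name becomes an ordinary column again): df = df.reset_index()
Adding columns to a dataframe:
import pandas as pd
df = pd.DataFrame()
df['Age'] = [20, 30, 25, 26, 15]
df['Age2'] = 45    # a scalar is repeated in every row
df['Age3'] = pd.Series([42, 44, 50, 60, 45], index=[0, 1, 2, 3, 4])
df['Total'] = df['Age'] + df['Age2'] + df['Age3']
print(df)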
Output:
Age Age2 Age3 Total
0 20 45 42 107
1 30 45 44 119
2 25 45 50 120
3 26 45 60 131
4 15 45 45 105
(a) df.iloc[:, [0, 3]]    # columns at positions 0 and 3
output:
Age Total
0 20 107
1 30 119
2 25 120
3 26 131
4 15 105
(b) df.iloc[:, 0:3]    # columns at positions 0, 1, 2 (end position 3 excluded)
output:
Age Age2 Age3
0 20 45 42
1 30 45 44
2 25 45 50
3 26 45 60
4 15 45 45
(c) df.iloc[:, 0:4]    # columns at positions 0 to 3
output:
Age Age2 Age3 Total
0 20 45 42 107
1 30 45 44 119
2 25 45 50 120
3 26 45 60 131
4 15 45 45 105
====================================================================
Delete column from dataframe:
print(df.drop('Total', axis=1))    # axis=1 means columns; the default axis=0 means rows
output:
Age Age2 Age3
0 20 45 42
1 30 45 44
2 25 45 50
3 26 45 60
4 15 45 45
df.rename(index={'Sec A': 'A', 'Sec B': 'B', 'Sec C': 'C'}, inplace=True)    # renames row labels in df itself
OUTPUT:
OUTPUT:
==============================================================
Modify rows and columns:
Adding Columns:
df.loc['Delhi', 'school'] = 0    # set the value in row 'Delhi', column 'school'
output:
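The original examples for adding columns and rows were lost. A hedged sketch on a small hypothetical DataFrame (city row labels with a `school` column, echoing the statement above; `college` and `hospital` are made-up columns) shows the three common cases:

```python
import pandas as pd

# Hypothetical data: row labels are cities, columns are counts
df = pd.DataFrame({'school': [12, 9], 'college': [4, 3]},
                  index=['Delhi', 'Mumbai'])

# Add a new column: assign a list with one value per row
df['hospital'] = [20, 15]

# Add a new row: assign to a row label that does not exist yet
df.loc['Chennai'] = [7, 2, 11]

# Modify one value: row 'Delhi', column 'school'
df.loc['Delhi', 'school'] = 0
print(df)
```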
===============================================================
Boolean indexing:
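Boolean indexing means that the row index of a dataframe consists of Boolean values, True/False (or 1/0).
import pandas as pd
dc = {'Days': ['mon', 'tue', 'wed', 'thu', 'fri'], 'no of classes': [5, 0, 3, 0, 8]}
df = pd.DataFrame(dc, index=[True, False, True, False, True])
OR, using 1 and 0 as the Boolean index:
df = pd.DataFrame(dc, index=[1, 0, 1, 0, 1])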
output:
Days no of classes
1 mon 5
0 tue 0
1 wed 3
0 thu 0
1 fri 8
df.loc[True]    # prints all rows whose index is True (use df.loc[1] with the 1/0 index)
output:
Days no of classes
True mon 5
True wed 3
True fri 8
df.loc[False]    # prints all rows whose index is False
output:
Days no of classes
False tue 0
False thu 0
===============================================================
Q2: Suppose we have a dataframe df
Days no of classes
mon 5
tue 0
wed 3
thu 1
fri 8
output:
Days no of classes
1 mon 5
1 wed 3
1 fri 8
-----------------------------------------------------------------------------------------------------------------------
df.index:
output:
0 , 1, 2 , 3
df.columns:
output:
Rollno, Name, Marks
df.axes
output:
index: 0, 1, 2, 3
columns: Rollno, Name, Marks
df.dtypes
output:
Rollno: int64
Name: object (text)
Marks: float64 (a column containing NaN is stored as float)
len(df)
Output: 4
df.count()    # counts the non-missing (non-NaN) values in each column
output:
Rollno: 4
Name: 4
Marks: 3
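A runnable sketch of these attributes, assuming a hypothetical four-row table in which one Marks value is missing (which is what the count() output suggests):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Marks value is missing
df = pd.DataFrame({'Rollno': [10, 12, 13, 14],
                   'Name': ['Arun', 'Rishi', 'Preet', 'Nishi'],
                   'Marks': [99, 98, np.nan, 93]})

print(df.index)      # row labels
print(df.columns)    # column labels
print(df.axes)       # [row index, column index]
print(df.dtypes)     # data type of each column
print(len(df))       # number of rows
print(df.count())    # non-missing values in each column
```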
=======================================================
Transpose a dataframe:
We have a dataframe df
Rollno Name Marks
0 10 Arun 99
1 12 Rishi 98
2 13 Preet 76
df.transpose()    # df.T gives the same result
After transpose:
Output:
0 1 2
Rollno 10 12 13
Name Arun Rishi Preet
Marks 99 98 76
Selecting and accessing dataframe:
output:
Name
0 Arun
1 Rishi
2 Preet
3 Nishi
df[['Rollno', 'Name']]    # several columns: pass a list of column names
output:
Rollno Name
0 10 Arun
1 12 Rishi
2 13 Preet
3 14 Nishi
output:
Rollno Marks
0 10 99
1 12 88
2 13 75
3 14 93
Ex-3 df[0:2]    # rows at positions 0 and 1 (end position 2 excluded)
output:
output:
Rollno Name
0 10 Arun
1 12 Rishi
2 13 Preet
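Several of the selection statements above lost their code. The sketch below rebuilds the table from the values shown (Marks taken from the Rollno/Marks output) and collects the common patterns. Note that a positional slice such as df[0:2] stops before position 2, while loc includes the end label, which is one way to produce the three-row Rollno/Name output.

```python
import pandas as pd

df = pd.DataFrame({'Rollno': [10, 12, 13, 14],
                   'Name': ['Arun', 'Rishi', 'Preet', 'Nishi'],
                   'Marks': [99, 88, 75, 93]})

print(df['Name'])                        # one column (a Series)
print(df[['Rollno', 'Name']])            # several columns: pass a list of names
print(df[0:2])                           # rows at positions 0 and 1
print(df.loc[0:2, ['Rollno', 'Name']])   # loc includes the end label: rows 0, 1, 2
```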
Q: Write a program to create dataframe from 2D array as shown below:
10 20 30
40 50 60
70 80 90
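One possible answer, as a sketch: build the 2D array with NumPy and pass it to pd.DataFrame(). With no index or columns arguments, pandas labels both axes 0, 1, 2.

```python
import numpy as np
import pandas as pd

# 3 x 3 two-dimensional NumPy array
arr = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])

df = pd.DataFrame(arr)   # default row and column labels 0, 1, 2
print(df)
```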
======================================================
import pandas as pd
Yr1 = {'Qtr1': 4400, 'Qtr2': 6600, 'Q3': 2200}
Yr2 = {'A': 5400, 'B': 3200, 'Qtr3': 1100}
Sales = {1: Yr1, 2: Yr2}
df1 = pd.DataFrame(Sales)    # keys missing from an inner dictionary become NaN
print(df1)
It contains two methods:
Example:
import pandas as pd
Sales = {'yr1': {'Qtr1': 100, 'Qtr2': 200, 'Qtr3': 300},
         'yr2': {'Qtr1': 400, 'Qtr2': 500, 'Qtr3': 600},
         'yr3': {'Qtr1': 700, 'Qtr2': 800, 'Qtr3': 900}}
df1 = pd.DataFrame(Sales)
output:
Example:
import pandas as pd
Sales = {'yr1': {'Qtr1': 100, 'Qtr2': 200, 'Qtr3': 300},
         'yr2': {'Qtr1': 400, 'Qtr2': 500, 'Qtr3': 600},
         'yr3': {'Qtr1': 700, 'Qtr2': 800, 'Qtr3': 900}}
df1 = pd.DataFrame(Sales)
output:
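The two methods and their outputs are missing here. Assuming they are the row-wise and column-wise iteration methods iterrows() and items() (items() is the current name for iteritems(), which was removed in pandas 2.0), a minimal sketch over the same Sales data:

```python
import pandas as pd

Sales = {'yr1': {'Qtr1': 100, 'Qtr2': 200, 'Qtr3': 300},
         'yr2': {'Qtr1': 400, 'Qtr2': 500, 'Qtr3': 600},
         'yr3': {'Qtr1': 700, 'Qtr2': 800, 'Qtr3': 900}}
df1 = pd.DataFrame(Sales)   # columns yr1-yr3, rows Qtr1-Qtr3

# iterrows(): one (row label, row as a Series) pair per row
for label, row in df1.iterrows():
    print(label, row.tolist())

# items(): one (column label, column as a Series) pair per column
for label, column in df1.items():
    print(label, column.tolist())
```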
Example 1: Suppose we have 2 dataframes df1 and df2.
output:
Example 2: Suppose we have 2 dataframes df1 and df2.
output:
Example 1: Suppose we have 2 dataframes df1 and df2.
output:
Example 2: Suppose we have 2 dataframes df1 and df2.
output:
3. Multiplication: It has two functions, mul() and rmul().
Both functions perform multiplication, but rmul() performs it in reverse order (df1.rmul(df2) means df2 * df1).
Because multiplication is commutative, both functions give the same result:
Example 1: Suppose we have 2 dataframes df1 and df2.
output:
Example 2: Suppose we have 2 dataframes df1 and df2.
output:
Example 1: Suppose we have 2 dataframes df1 and df2.
output:
Example 2: Suppose we have 2 dataframes df1 and df2.
output:
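The example DataFrames and outputs for these operations did not survive. A hedged sketch with two made-up DataFrames puts each function next to its reverse (r-) version: for add() and mul() the reverse gives the same result, while rsub() and rdiv() swap the operands.

```python
import pandas as pd

# Two small hypothetical DataFrames with the same labels
df1 = pd.DataFrame({'A': [10, 20], 'B': [30, 40]})
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

print(df1.add(df2))    # df1 + df2
print(df1.radd(df2))   # df2 + df1 (same result)
print(df1.sub(df2))    # df1 - df2
print(df1.rsub(df2))   # df2 - df1 (signs flipped)
print(df1.mul(df2))    # df1 * df2
print(df1.rmul(df2))   # df2 * df1 (same result)
print(df1.div(df2))    # df1 / df2
print(df1.rdiv(df2))   # df2 / df1
```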
Ques: Given 2 dataframes storing the points of a 2-player team in four rounds, write a
program to calculate the average points obtained by each player in each round.
Ques: Read the code carefully and answer the given questions
import pandas as pd
Prod = {'Grapes': {'Assam': 800, 'Kerala': 200, 'Jammu': 600},
        'Orange': {'Assam': 400, 'Kerala': 600, 'Jammu': 300},
        'Apple': {'Assam': 600, 'Kerala': 700, 'Jammu': 900}}
df1 = pd.DataFrame(Prod)    # rows: states, columns: items
OR
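Ques: Suppose you have a dataframe Prod (rows: states, columns: items).
Prod.min(): It returns the minimum production of each item from the dataframe.
Output: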
Grapes: 200
Orange: 300
Apple: 600
Prod.min(axis=1): It returns the minimum production among the items for every state.
Output:
Assam: 400
Kerala: 200
Jammu: 300
Prod.max(): It returns the maximum production of each item from the dataframe.
Output:
Grapes: 800
Orange: 600
Apple: 900
Prod.max(axis=1): It returns the maximum production among the items for every state.
Output:
Assam: 800
Kerala: 700
Jammu: 900
Note: the idxmax() and idxmin() functions return the index labels of the maximum and minimum values
of the dataframe. They do not return the values themselves, only the labels.
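A short sketch on the Prod data above contrasts max(), which returns values, with idxmax() and idxmin(), which return labels:

```python
import pandas as pd

Prod = {'Grapes': {'Assam': 800, 'Kerala': 200, 'Jammu': 600},
        'Orange': {'Assam': 400, 'Kerala': 600, 'Jammu': 300},
        'Apple': {'Assam': 600, 'Kerala': 700, 'Jammu': 900}}
prod = pd.DataFrame(Prod)   # rows: states, columns: items

print(prod.max())           # highest value in each column (item)
print(prod.idxmax())        # state (row label) where each item is highest
print(prod.idxmin(axis=1))  # item (column label) with the lowest value in each state
```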
Ques1: Suppose you have a dataframe df. Find the mean, mode, median,
count and sum.
      A    B    C    D
Hin   50   60   30   60
Eco   40   50   40   90
Eng   30   80   20   30
IP    50   70   20   20
Math  90   NaN  60   90
df . mode( ) df . mode(axis=1 )
Output: Output:
A : 50 Hin: 60
B : 50, 60, 70, 80                    Eco : 40
C : 20 Eng : 30
D : 90 IP : 20
Math: 90
df . median( ) df . median(axis=1 )
Output: Output:
A : 50 Hin: 55
B : 65 Eco : 45
C : 30 Eng : 30
D : 60 IP : 35
Math: 90
df . mean( ) df . mean(axis=1 )
Output: Output:
A : 52 Hin: 50
B : 65 Eco : 55
C : 34 Eng : 40
D : 58 IP : 40
Math: 80
df . count( ) df . count(axis=1 )
Output: Output:
A :5 Hin: 4
B :4 Eco : 4
C :5 Eng : 4
D :5 IP : 4
Math: 3
df . sum( ) df . sum(axis=1 )
Output: Output:
A : 260 Hin: 200
B : 260 Eco : 220
C : 170 Eng : 160
D : 290 IP : 160
Math: 240
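The answers above can be checked with a short script that rebuilds df (the Math value of column B is missing, so NaN is skipped by default):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [50, 40, 30, 50, 90],
                   'B': [60, 50, 80, 70, np.nan],
                   'C': [30, 40, 20, 20, 60],
                   'D': [60, 90, 30, 20, 90]},
                  index=['Hin', 'Eco', 'Eng', 'IP', 'Math'])

# NaN values are skipped by default (skipna=True)
print(df.mean())          # column-wise mean
print(df.mean(axis=1))    # row-wise mean
print(df.median(axis=1))
print(df.count())
print(df.sum(axis=1))
print(df.mode())          # all modes per column; shorter columns are padded with NaN
```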
------------------------------------------------------------------------------------------------------------------------
Ques: Suppose you have a dataframe mks1
output:
ID Name Age Contact
11 Arun 21 34534
22 Akash 22 37556
33 Naman 34 74646
44 Deepak 18 75645
55 vaibhav 24 77754
Advantages of CSV:
It is a common format for data interchange.
Almost all spreadsheets and databases support importing and exporting data in CSV format.
It provides simple and compact storage.
df.shape
Output: (3, 2)
output:
ID Name
0 11 Arun
1 22 Vikas
Reading CSV without header:
import pandas as pd
df = pd.read_csv("E:\\myfolder\\Employee.CSV", header=None)    # the file has no header row
print(df)
output:
   0      1   2
0  11   Arun  55
1  22  Vikas  78
11 Arun 55
22 Vikas 78
Reading CSV file having separator:
ID, Name, Marks
11, arun, 55
22, vikas, 78
import pandas as pd
11, Arun, 55
22, Vikas, 78
33, Sachin, 34
import pandas as pd
print(df)
ID, Name, Marks
11, arun, 55
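The read_csv() calls for these examples are missing. A hedged sketch: io.StringIO stands in for a real file such as Employee.CSV, and the semicolon-separated contents are hypothetical. The sep argument names the separator, and nrows limits how many data rows are read.

```python
import io
import pandas as pd

# Hypothetical file contents that use ';' instead of ',' as the separator
text = "ID;Name;Marks\n11;Arun;55\n22;Vikas;78\n33;Sachin;34\n"

# sep= tells read_csv() which character separates the fields
df = pd.read_csv(io.StringIO(text), sep=';')
print(df)

# nrows= reads only the first n data rows
first_two = pd.read_csv(io.StringIO(text), sep=';', nrows=2)
print(first_two)
```

With a real file, pass the path string instead of the StringIO object.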
Handling NaN values with to_csv(): suppose we have a CSV file as shown below:
ID, Name, Marks
11, arun, 55
22, vikas, 78
df.loc[0, "Name"] = np.nan
df.loc[2, "Marks"] = np.nan
output:
ID, Name, Marks
11, NaN, 55
22, Vikas, 78
33, Tanu, NaN
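The to_csv() call that produced this output is missing. By default to_csv() writes missing values as empty fields; the na_rep parameter chooses the text to write instead. A sketch with hypothetical data, using io.StringIO in place of a real file:

```python
import io
import numpy as np
import pandas as pd

# Hypothetical data with two missing values, as in the output above
df = pd.DataFrame({'ID': [11, 22, 33],
                   'Name': [np.nan, 'Vikas', 'Tanu'],
                   'Marks': [55, 78, np.nan]})   # NaN makes Marks a float column

buffer = io.StringIO()        # stands in for a real .csv file path
df.to_csv(buffer, index=False, na_rep='NaN')
print(buffer.getvalue())

# read_csv() treats the text "NaN" as a missing value by default
back = pd.read_csv(io.StringIO(buffer.getvalue()))
print(back.isna().sum())      # number of missing values per column
```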
Solution
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host="localhost", user="root", password="Mypass", database="test")
if mycon.is_connected():
    df = pd.read_sql("select * from student;", mycon)
    print(df)
else:
    print("Connection failed")
output:
ID Name Marks Age
11 Arun 55 21
22 Vikas 78 24
33 Tanu 46 27
44 Neha 64 24
Example 2: Suppose we have a MySQL database test containing the table student, and we want to access
only the records that satisfy a condition (students whose marks are greater
than 60) using SQL.
Solution
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host="localhost", user="root", password="Mypass", database="test")
if mycon.is_connected():
    df = pd.read_sql("select * from student where Marks > 60;", mycon)
    print(df)
else:
    print("Connection failed")
output:
ID Name Marks Age
22 Vikas 78 24
44 Neha 64 24
Example: Suppose we have the same test database and we want to access the records of all students
whose name begins with a specific letter.
Solution
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host="localhost", user="root", password="Mypass", database="test")
if mycon.is_connected():
    df = pd.read_sql("select * from student where Name like 'V%';", mycon)    # names starting with V
    print(df)
else:
    print("Connection failed")
output:
ID Name Marks Age
22 Vikas 78 24
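import pandas as pd
d = {'name': ['arun', 'dev', 'vani'], 'age': [25, 26, 29]}
df = pd.DataFrame(d)
print(df)
df.to_sql("Cricket team", con)    # con: an SQLAlchemy engine/connection or a sqlite3 connection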
Output:
name age
0 arun 25
1 dev 26
2 vani 29
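pandas' to_sql() needs an SQLAlchemy engine/connection or a sqlite3 connection. As a self-contained sketch, an in-memory SQLite database stands in for a real server, and the table name `cricket_team` is a hypothetical replacement for "Cricket team" (no space, so no quoting is needed):

```python
import sqlite3
import pandas as pd

d = {'name': ['arun', 'dev', 'vani'], 'age': [25, 26, 29]}
df = pd.DataFrame(d)

con = sqlite3.connect(':memory:')             # temporary in-memory database
df.to_sql('cricket_team', con, index=False)   # write the DataFrame as a table

# Read it back with read_sql() to check what was stored
back = pd.read_sql('SELECT * FROM cricket_team', con)
print(back)
con.close()
```

index=False keeps the 0, 1, 2 row labels out of the table; without it they are stored as an extra column named index.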
Note:
read_csv(): this function is used to read data from a CSV file into a dataframe.
to_csv(): this function saves the data of a dataframe to a CSV file.
read_sql(): this function is used to read the result of an SQL query (or a table) from a database into a dataframe.
to_sql(): this function is used to write the contents of a dataframe into a table of an SQL database.
----------------------------------------------------------------------------------------------------------------
Data visualization
Data visualization refers to the graphical or visual representation of information and data using
visual elements like charts, graphs, maps etc.
Data visualization helps us to easily understand complex problems and see patterns in data.
Matplotlib: It is a 2D plotting library for Python that is used for data visualization.
Pyplot: It is a module of the matplotlib library containing a collection of functions which allow users to create 2D
plots and graphs easily and interactively.
Plot: A plot is a graphical technique for representing a dataset, usually as a graph
showing the relationship between two or more variables.
Numpy: It stands for Numerical Python. It is a library for scientific computing with multidimensional arrays in
Python.
_______________________________________________________
Types of chart:
1. Line Graph
A line chart or line graph is a type of chart which displays information as a series of data points
called ‘markers’ connected by straight line segments.
This type of plot is often used to visualize a trend in data over intervals of time.
plot() is used to create a line chart.
Example: write a program in Python to plot a line chart depicting the changing weekly
onion prices for 4 weeks.
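The program for this example is missing. A sketch with hypothetical prices follows. The `matplotlib.use('Agg')` line and the saved file name are assumptions that only let the script run without a display; drop them and call `plt.show()` to see the chart in a window.

```python
import matplotlib
matplotlib.use('Agg')            # draw without opening a window
import matplotlib.pyplot as plt

weeks = [1, 2, 3, 4]
prices = [40, 80, 100, 50]       # hypothetical onion prices (Rs per kg)

plt.figure()
plt.plot(weeks, prices, marker='o')   # markers joined by straight line segments
plt.xlabel('Week')
plt.ylabel('Onion price (Rs/kg)')
plt.title('Weekly onion prices')
plt.savefig('onion_line.png')
```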
2. Scatter Chart:
A scatter chart uses dots to represent values for two different numeric variables.
Scatter plots are used to observe relationships between variables.
scatter() is used to create a scatter chart.
Example: write a program in Python to plot a scatter chart depicting the changing weekly onion
prices for 4 weeks.
plt.scatter(x, y, s=8, c='r', marker='x')    # marker size 8 (in points squared), colour red, marker style x
Example: Creating the scatter chart using list values name and price with marker properties.
Example: write a program to plot a scatter graph taking two random distributions in x
and y having randomly generated integers.
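A sketch for the last example: two hypothetical random integer distributions drawn with NumPy's random generator (seeded so the run is repeatable) and plotted with scatter(). As before, the Agg backend line and the file name are only there so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=1)    # seeded random generator
x = rng.integers(1, 100, size=20)      # 20 random integers from 1 to 99
y = rng.integers(1, 100, size=20)

plt.figure()
plt.scatter(x, y, s=40, c='r', marker='x')   # size, colour and marker style
plt.xlabel('x')
plt.ylabel('y')
plt.title('Two random distributions')
plt.savefig('random_scatter.png')
```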
3. Bar Chart:
A bar chart is a chart or graph that presents categorical data with rectangular bars whose heights (or lengths) are proportional to the values they represent.
The bars can be plotted vertically or horizontally.
bar() is used to create a vertical bar chart.
barh() is used to create a horizontal bar chart.
plt.bar(x, height, width=[0.6, 0.5, 0.4, 0.3, 0.2])    # x: bar positions, height: bar heights
it gives each of the five bars its own width.
plt.bar(x, height, color=['red', 'gold', 'silver', 'blue', 'green'])
it gives each bar its own colour.
Example: Creating the bar chart using name and prices with different colors and different sizes.
Example: Creating the bar chart using name and prices horizontally.
Example: write a program to plot a bar chart for the persons A, B, C with their unit test
marks.
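A sketch for this example with hypothetical marks. It combines the colour and width options described above; the Agg backend line and file name are only for running without a display.

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

persons = ['A', 'B', 'C']
marks = [72, 85, 64]            # hypothetical unit test marks

plt.figure()
plt.bar(persons, marks, color=['red', 'gold', 'green'], width=0.5)
plt.xlabel('Person')
plt.ylabel('Unit test marks')
plt.title('Unit test marks')
plt.savefig('marks_bar.png')
# plt.barh(persons, marks) would draw the same data as horizontal bars
```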
4. Pie Chart:
A pie chart is a type of graph in which a circle is divided into sectors that each represent a proportion of the whole.
pie() is used to create a pie chart.
Example: Create a pie chart using name and marks with label and title
Example: Create a pie chart using name and sizes with colors and explode properties.
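A sketch of a pie chart with labels, a title, colours and an exploded slice; the names and marks are hypothetical.

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

names = ['Amit', 'Bina', 'Chetan', 'Divya']   # hypothetical data
marks = [30, 25, 25, 20]

plt.figure()
plt.pie(marks, labels=names, colors=['red', 'gold', 'silver', 'blue'],
        explode=[0.1, 0, 0, 0],     # pull the first slice out slightly
        autopct='%1.1f%%')          # print each slice's percentage
plt.title('Share of marks')
plt.savefig('marks_pie.png')
```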
Add legends in charts:
Example: create a bar chart on a common plot where three data ranges are plotted on the same
chart.
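A sketch for this example with three hypothetical data ranges. Each bar() call gets a label= string, and legend() builds the key from those labels. The bars are shifted by one bar width so they sit side by side.

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(4)                 # four groups on the x-axis
a = [10, 20, 30, 40]             # three hypothetical data ranges
b = [15, 25, 20, 35]
c = [5, 30, 25, 20]

plt.figure()
plt.bar(x - 0.25, a, width=0.25, label='Range A')
plt.bar(x, b, width=0.25, label='Range B')
plt.bar(x + 0.25, c, width=0.25, label='Range C')
plt.legend(loc='upper left')     # legend entries come from the label= strings
plt.savefig('three_ranges.png')
```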
Histogram:
A histogram is a summarization tool for discrete or continuous data: it groups the values into ranges (bins) and shows how many values fall in each bin.
hist() is used to create a histogram.
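A sketch with hypothetical ages. The bins list gives the bin edges; every bin includes its left edge, and only the last bin also includes its right edge.

```python
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

ages = [12, 15, 17, 21, 22, 23, 25, 31, 34, 38, 41, 45]   # hypothetical data

plt.figure()
# Bins: [10, 20), [20, 30), [30, 40), [40, 50]
counts, edges, patches = plt.hist(ages, bins=[10, 20, 30, 40, 50], edgecolor='black')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.savefig('ages_hist.png')
print(counts)   # number of values in each bin
```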
import numpy as np
d = np.array([10, 20, 30, 40, 50, 60, 70])
print(d[-4:])    # last four elements
Output: [40 50 60 70]
2. Fill in the blank with the appropriate numpy method to calculate and print the variance of an array.
import numpy as np
data = np.array([1, 2, 3, 4, 5, 6])
print(np.______(data, ddof=0))
Output: np.var
3. Mr. Sanjay wants to plot a bar graph for the given set of values of subject on x-axis and
number of students who opted for that subject on y-axis.
4. Mr. Harry wants to draw a line chart using a list of elements named LIST.
Output:
(i) PLINE.plot(LIST)
(ii) PLINE.ylabel("Sample Numbers")
6.
Output:
7. ______ method in Pandas can be used to change the index of rows and columns of a
Series or Dataframe:
Ans: reindex()
Output: df = df.drop(0)
print(df)
9. Write a python code to create a dataframe with appropriate headings from the list given below
['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104', 'Cathy', 75], ['S105', 'Gundaho', 82]
output:
import pandas as pd
data = [['S101', 'Amy', 70], ['S102', 'Bandhi', 69], ['S104','Cathy', 75], ['S105', 'Gundaho', 82]]
df = pd.DataFrame(data, columns = ['ID', 'Name', 'Marks'])
print(df)
10. Write a small python code to create a dataframe with headings (a and b) from the list given
below:
[[1, 2], [3, 4], [5, 6], [7, 8]]
Output:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
df = pd.concat([df, df2])    # DataFrame.append() was removed in pandas 2.0
11.
Output:
(i) print(df.mean(axis=1, skipna=True))
(ii) print(df.mean(axis=0, skipna=True))
(iii) print(df.median())
12.
Output:
(i) df1.sum()
(ii) df1['Rainfall'].mean()
(iii) df1.loc[:11, 'maxtemp':'Rainfall'].mean()
13. Find the output of the following code:
import pandas as pd
data = [{'a': 10, 'b': 20}, {'a': 6, 'b': 32, 'c': 22}]
# With two column indices, values same as the dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
# With two column indices, one of them a key found only in the second dictionary
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'c'])
print(df1)
print(df2)
OutPut:
         a   b
first   10  20
second   6  32
         a     c
first   10   NaN
second   6  22.0
Output:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'mark1': [30, 40, 15, 40], 'mark2': [20, 45, 30, 70]})
df2 = pd.DataFrame({'mark1': [10, 20, 20, 50], 'mark2': [15, 25, 30, 30]})
print(df1)
print(df2)
(i) print(df1.add(df2))
(ii) print(df1.subtract(df2))
(iii) df1.rename(columns={'mark1': 'marks1'}, inplace=True)
print(df1)
=================================================