Python Pandas-DataFrames Complete - Jupyter Notebook

Uploaded by

santro9776
Copyright
© © All Rights Reserved

PANDAS DATAFRAMES

A pandas DataFrame is a two-dimensional, size-mutable, heterogeneous tabular data structure.

A tabular data structure has rows and columns, so a DataFrame is similar to an Excel sheet.
A DataFrame can be created from a dictionary or a list.
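
The terms "heterogeneous" and "size-mutable" can be checked directly. The following is a minimal sketch on hypothetical sample data (not taken from the notebook): each column keeps its own dtype, and a column can be added after the DataFrame exists. The column name Passed is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Riya", "Isha"], "Age": [19, 20]})

# Heterogeneous: each column keeps its own dtype.
age_is_integer = pd.api.types.is_integer_dtype(df["Age"])
name_is_integer = pd.api.types.is_integer_dtype(df["Name"])

# Size-mutable: a column can be added after creation.
df["Passed"] = [True, False]

print(df.dtypes)
print(df.shape)
```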

In [1]: # Create a DataFrame from a list
        import pandas as pd
        games_list = ['Cricket', 'Volleyball', 'Judo', 'Hockey']
        df = pd.DataFrame(games_list, index=['G1', 'G2', 'G3', 'G4'])
        df

Out[1]: 0

G1 Cricket

G2 Volleyball

G3 Judo

G4 Hockey

In [1]: # Create a DataFrame from a dictionary with the default index
        import pandas as pd
        dict1 = {"Name": ["Riya", "Rishab", "Isha", "Rahul"],
                 "Age": [19, 23, 20, 18],
                 "Class": [12, 11, 12, 12]}
        df = pd.DataFrame(dict1)
        df

Out[1]: Name Age Class

0 Riya 19 12

1 Rishab 23 11

2 Isha 20 12

3 Rahul 18 12
In [7]: # Create a DataFrame from a dictionary with a custom index
        import pandas as pd
        dict1 = {"Name": ["Riya", "Rishab", "Isha", "Rahul"], "Age": [19, 23, 20, 18]}
        df = pd.DataFrame(dict1, index=["P1", "P2", "P3", "P4"])
        df

Out[7]: Name Age

P1 Riya 19

P2 Rishab 23

P3 Isha 20

P4 Rahul 18

In [26]: import pandas as pd
         dic = {'Rollno': [1, 2, 3, 4, 5, 6],
                'Name': ["Prerna Singh", "Manish Arora", "Tanish Goel", "Falguni Jain",
                         "Kanika Bhatnagar", "Ramandeep Kaur"],
                'UT1': [24, 18, 20, 22, 15, 20],
                'UT2': [24, 17, 22, 20, 20, 15],
                'UT3': [20, 19, 18, 24, 18, 22],
                'UT4': [22, 22, 24, 20, 22, 24]
                }
         df = pd.DataFrame(dic, index=["P1", "P2", "P3", "P4", "P5", "P6"])
         df

Out[26]: Rollno Name UT1 UT2 UT3 UT4

P1 1 Prerna Singh 24 24 20 22

P2 2 Manish Arora 18 17 19 22

P3 3 Tanish Goel 20 22 18 24

P4 4 Falguni Jain 22 20 24 20

P5 5 Kanika Bhatnagar 15 20 18 22

P6 6 Ramandeep Kaur 20 15 22 24

In [32]: df.index

Out[32]: Index(['P1', 'P2', 'P3', 'P4', 'P5', 'P6'], dtype='object')

In [31]: df.info   # without parentheses this displays the bound method; df.info() prints the summary

Out[31]: <bound method DataFrame.info of     Rollno              Name  UT1  UT2  UT3  UT4
         P1       1      Prerna Singh   24   24   20   22
         P2       2      Manish Arora   18   17   19   22
         P3       3       Tanish Goel   20   22   18   24
         P4       4      Falguni Jain   22   20   24   20
         P5       5  Kanika Bhatnagar   15   20   18   22
         P6       6    Ramandeep Kaur   20   15   22   24>
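
The output above is the representation of the method object, because `df.info` was written without parentheses. A minimal sketch of the difference, on hypothetical sample data: calling `df.info()` runs the method, and its `buf` parameter is used here only to capture the text in a string instead of printing it directly.

```python
import io
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Name": ["Riya", "Rishab"], "Age": [19, 23]},
                  index=["P1", "P2"])

# Without parentheses, df.info is only a reference to the method.
method_only = df.info

# With parentheses the method runs; buf captures the text it would print.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```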
In [30]: df.count()

Out[30]: Rollno 6
Name 6
UT1 6
UT2 6
UT3 6
UT4 6
dtype: int64

In [27]: df.ndim

Out[27]: 2

In [28]: df.shape

Out[28]: (6, 6)

In [29]: df.columns

Out[29]: Index(['Rollno', 'Name', 'UT1', 'UT2', 'UT3', 'UT4'], dtype='object')

In [11]: # Create a DataFrame from a dictionary with a custom index
         import pandas as pd
         dict1 = {"Name": ["Riya", "Rishab", "Isha", "Rahul"], "Age": [19, 23, 20, 18]}
         df = pd.DataFrame(dict1, index=["P1", "P2", "P3", "P4"])
         df

Out[11]: Name Age

P1 Riya 19

P2 Rishab 23

P3 Isha 20

P4 Rahul 18

In [14]: # Attribute access
         df.Name

Out[14]: P1      Riya
         P2    Rishab
         P3      Isha
         P4     Rahul
         Name: Name, dtype: object
In [15]: df['Name']

Out[15]: P1 Riya
P2 Rishab
P3 Isha
P4 Rahul
Name: Name, dtype: object

In [16]: df.Age

Out[16]: P1 19
P2 23
P3 20
P4 18
Name: Age, dtype: int64

In [17]: df['Age']

Out[17]: P1 19
P2 23
P3 20
P4 18
Name: Age, dtype: int64
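
Attribute access (`df.Age`) and bracket access (`df['Age']`) return the same Series here, but they are not always interchangeable. The sketch below uses hypothetical column names chosen to show two cases where only brackets work: a column whose name matches a DataFrame method, and a column name containing a space.

```python
import pandas as pd

# Hypothetical columns chosen to show where attribute access breaks down.
df = pd.DataFrame({"Age": [19, 23], "count": [1, 2], "Total Marks": [80, 90]})

same_column = df.Age.equals(df["Age"])      # both forms return the same Series
count_is_method = callable(df.count)        # df.count is the DataFrame method
count_column = df["count"].tolist()         # brackets reach the column
spaced_column = df["Total Marks"].tolist()  # a name with a space needs brackets
```

Bracket access is therefore the safer default, especially when assigning new columns.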

In [18]: df.index

Out[18]: Index(['P1', 'P2', 'P3', 'P4'], dtype='object')

In [19]: df.info   # without parentheses this displays the bound method; df.info() prints the summary

Out[19]: <bound method DataFrame.info of       Name  Age
         P1    Riya   19
         P2  Rishab   23
         P3    Isha   20
         P4   Rahul   18>

In [20]: df.shape

Out[20]: (4, 2)

In [21]: df.columns

Out[21]: Index(['Name', 'Age'], dtype='object')

In [22]: df.ndim

Out[22]: 2
In [23]: df

Out[23]: Name Age

P1 Riya 19

P2 Rishab 23

P3 Isha 20

P4 Rahul 18

In [24]: print(df)

Name Age
P1 Riya 19
P2 Rishab 23
P3 Isha 20
P4 Rahul 18

In [25]: df.count()

Out[25]: Name 4
Age 4
dtype: int64

In [33]: import pandas as pd
         import numpy as np
         df = pd.DataFrame({"Person": ["Jhonny", "Mira", "Tom", "Jhonny", "Mira"],
                            "Age": [26., np.nan, 24., 35, 36],
                            "Single": [False, True, True, True, False]})
         df

Out[33]: Person Age Single

0 Jhonny 26.0 False

1 Mira NaN True

2 Tom 24.0 True

3 Jhonny 35.0 True

4 Mira 36.0 False

Head & Tail function in dataframe


In [2]: import pandas as pd
        import numpy as np
        name_dict = {'Name': ["Anita", "Sajal", "Ayaan", "Abhey", "Rahul", "Isha"],
                     'Age': [14, 32, 3, 6, 10, 13]}
        df = pd.DataFrame(name_dict)
        print("-----First Five Rows-----")
        print(df.head())  # Displays the first five rows

-----First Five Rows-----


Name Age
0 Anita 14
1 Sajal 32
2 Ayaan 3
3 Abhey 6
4 Rahul 10

In [3]: print("-----First Two Rows-----")
        print(df.head(2))  # Displays the first 2 rows

-----First Two Rows-----


Name Age
0 Anita 14
1 Sajal 32

In [4]: print("-----Last Five Rows-----")
        print(df.tail())   # Displays the last five rows
        print("-----Last Two Rows-----")
        print(df.tail(2))  # Displays the last 2 rows

-----Last Five Rows-----


Name Age
1 Sajal 32
2 Ayaan 3
3 Abhey 6
4 Rahul 10
5 Isha 13
-----Last Two Rows-----
Name Age
4 Rahul 10
5 Isha 13

In [8]: # Display all rows except the last 2 rows
        df.head(-2)

Out[8]: Name Age

0 Anita 14

1 Sajal 32

2 Ayaan 3

3 Abhey 6
In [9]: # Display all rows except the first row
        df.tail(-1)

Out[9]: Name Age

1 Sajal 32

2 Ayaan 3

3 Abhey 6

4 Rahul 10

5 Isha 13
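
A negative argument to `head` or `tail` means "everything except that many rows". As a sketch using the same names and ages as above, the two calls can be compared with the equivalent positional slices:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Anita", "Sajal", "Ayaan", "Abhey", "Rahul", "Isha"],
                   "Age": [14, 32, 3, 6, 10, 13]})

all_but_last_two = df.head(-2)   # same rows as df.iloc[:-2]
all_but_first_one = df.tail(-1)  # same rows as df.iloc[1:]

print(all_but_last_two)
print(all_but_first_one)
```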

In [2]: import pandas as pd

        # dictionary of lists
        dict1 = {'Name': ["aparna", "pankaj", "sudhir", "Geeku"],
                 'Degree': ["BCA", "BCA", "M.Tech", "BCA"],
                 'Score': [90, 40, 80, 98]}

        # creating a DataFrame
        df = pd.DataFrame(dict1)
        df

Out[2]: Name Degree Score

0 aparna BCA 90

1 pankaj BCA 40

2 sudhir M.Tech 80

3 Geeku BCA 98

In [19]: df[df['Score']<=40]

Out[19]: Name Degree Score

1 pankaj BCA 40
In [15]: print(df['Score'])
         print(df['Degree'])
         print(df['Name'])

0 90
1 40
2 80
3 98
Name: Score, dtype: int64
0 BCA
1 BCA
2 M.Tech
3 BCA
Name: Degree, dtype: object
0 aparna
1 pankaj
2 sudhir
3 Geeku
Name: Name, dtype: object

In [1]: import pandas as pd
        # named class_dict rather than dict, so the built-in dict type is not shadowed
        class_dict = {'Rollno': [1, 2, 3, 4],
                      'Name': ["Aman", "Preeti", "Kartik", "Lakshay"],
                      'Class': ['IX', 'X', 'IX', 'X'],
                      'Section': ['E', 'F', 'D', 'A'],
                      'CGPA': [8.7, 8.9, 9.2, 9.4],
                      'Stream': ["Science", "Arts", "Science", "Commerce"]
                      }
        classframe = pd.DataFrame(class_dict, index=["ST1", "ST2", "ST3", "ST4"])
        print(classframe)

Rollno Name Class Section CGPA Stream


ST1 1 Aman IX E 8.7 Science
ST2 2 Preeti X F 8.9 Arts
ST3 3 Kartik IX D 9.2 Science
ST4 4 Lakshay X A 9.4 Commerce
In [2]: import pandas as pd
        dict1 = {"Rollno": [1, 2, 3, 4],
                 "Name": ["Aman", "Preeti", "Kartik", "Lakshay"],
                 "Class": ["IX", "X", "IX", "X"],
                 "Section": ["E", "F", "D", "A"],
                 "CGPA": [8.7, 8.9, 9.2, 9.4],
                 "Stream": ["Science", "Arts", "Science", "Commerce"]
                 }
        df = pd.DataFrame(dict1, index=["ST1", "ST2", "ST3", "ST4"])
        df

Out[2]: Rollno Name Class Section CGPA Stream

ST1 1 Aman IX E 8.7 Science

ST2 2 Preeti X F 8.9 Arts

ST3 3 Kartik IX D 9.2 Science

ST4 4 Lakshay X A 9.4 Commerce

In [4]: df[df['CGPA']>9]

Out[4]: Rollno Name Class Section CGPA Stream

ST3 3 Kartik IX D 9.2 Science

ST4 4 Lakshay X A 9.4 Commerce
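
A single comparison such as `df['CGPA']>9` can be combined with others. The sketch below reuses some of the columns above and is one way to do it, using the element-wise operators `&` (and), `|` (or) and `~` (not). The result names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Aman", "Preeti", "Kartik", "Lakshay"],
                   "Class": ["IX", "X", "IX", "X"],
                   "CGPA": [8.7, 8.9, 9.2, 9.4]},
                  index=["ST1", "ST2", "ST3", "ST4"])

# Each comparison needs its own parentheses because & and | bind tighter than > and ==.
high_ix = df[(df["CGPA"] > 9) & (df["Class"] == "IX")]
ix_or_high = df[(df["Class"] == "IX") | (df["CGPA"] > 9)]
not_ix = df[~(df["Class"] == "IX")]

print(high_ix)
```

Python's `and` / `or` keywords do not work on a Series; they raise a ValueError because the truth value of a whole Series is ambiguous.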

In [3]: import pandas as pd
        emp = {'TNAME': ['AMIT', 'RAJESH', 'BINNY', 'CHARU', 'MEENAKSHI'],
               'TANO': ['T01', 'TO2', 'T03', 'T04', 'TO5'],
               'TNADD': ['123 PASCHIM VIHAR', '6/11 RAMESH NAGAR', '5 WEST PUNJABHI BAG H',
                         '23 MALVIYA NAGAR', '19 MEERA BAGH'],
               'SALARY': [23000, 34000, 12000, 45000, 34000]}
        df = pd.DataFrame(emp)
        df

Out[3]: TNAME TANO TNADD SALARY

0 AMIT T01 123 PASCHIM VIHAR 23000

1 RAJESH TO2 6/11 RAMESH NAGAR 34000

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

3 CHARU T04 23 MALVIYA NAGAR 45000

4 MEENAKSHI TO5 19 MEERA BAGH 34000


In [5]: df.SALARY

Out[5]: 0 23000
1 34000
2 12000
3 45000
4 34000
Name: SALARY, dtype: int64

In [6]: df['SALARY']

Out[6]: 0 23000
1 34000
2 12000
3 45000
4 34000
Name: SALARY, dtype: int64

In [7]: df['SALARY']>16000

Out[7]: 0 True
1 True
2 False
3 True
4 True
Name: SALARY, dtype: bool

In [8]: df=df.set_index('TNAME')

In [9]: df

Out[9]:            TANO                  TNADD  SALARY
        TNAME
        AMIT        T01      123 PASCHIM VIHAR   23000
        RAJESH      TO2      6/11 RAMESH NAGAR   34000
        BINNY       T03  5 WEST PUNJABHI BAG H   12000
        CHARU       T04       23 MALVIYA NAGAR   45000
        MEENAKSHI   TO5          19 MEERA BAGH   34000

In [10]: df[df['SALARY']>16000]

Out[10]:            TANO              TNADD  SALARY
         TNAME
         AMIT        T01  123 PASCHIM VIHAR   23000
         RAJESH      TO2  6/11 RAMESH NAGAR   34000
         CHARU       T04   23 MALVIYA NAGAR   45000
         MEENAKSHI   TO5      19 MEERA BAGH   34000

In [27]: import pandas as pd
         import numpy as np
         dic = {'rollno': [1, 2, 3, 4, 5, 6],
                'name': ["Prerna Singh", "Manish Arora", "Tanish Goel", "Falguni Jain",
                         "Kanika Bhatnagar", "Ramandeep Kaur"],
                'UT1': [24, 18, 20, 22, 15, 20],
                'UT2': [24, 17, 22, 20, np.nan, 15],
                'UT3': [20, 19, 18, 24, 18, 22],
                'UT4': [22, 22, 24, 20, 22, 24]
                }
         df = pd.DataFrame(dic)
         df

Out[27]: rollno name UT1 UT2 UT3 UT4

0 1 Prerna Singh 24 24.0 20 22

1 2 Manish Arora 18 17.0 19 22

2 3 Tanish Goel 20 22.0 18 24

3 4 Falguni Jain 22 20.0 24 20

4 5 Kanika Bhatnagar 15 NaN 18 22

5 6 Ramandeep Kaur 20 15.0 22 24

In [25]: df[df['rollno']==4]

Out[25]:    rollno          name  UT1   UT2  UT3  UT4
         3       4  Falguni Jain   22  20.0   24   20
In [28]: print(df.count())

rollno 6
name 6
UT1 6
UT2 5
UT3 6
UT4 6
dtype: int64
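
UT2 is reported as 5 because `count()` counts only non-missing values, and one UT2 entry is NaN. A minimal sketch on a smaller, hypothetical frame shows the three related quantities side by side:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing UT2 value.
df = pd.DataFrame({"UT1": [24, 18, 15], "UT2": [24.0, np.nan, 15.0]})

non_missing = df.count()    # NaN is not counted
missing = df.isna().sum()   # number of NaN values per column
total_rows = len(df)        # rows, regardless of NaN

print(non_missing)
print(missing)
```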

In [29]: print(df.columns)

Index(['rollno', 'name', 'UT1', 'UT2', 'UT3', 'UT4'], dtype='object')

Using loc & iloc


In [8]: import pandas as pd
        emp = {'TNAME': ['AMIT', 'RAJESH', 'BINNY', 'CHARU', 'MEENAKSHI'],
               'TANO': ['T01', 'TO2', 'T03', 'T04', 'TO5'],
               'TNADD': ['123 PASCHIM VIHAR', '6/11 RAMESH NAGAR', '5 WEST PUNJABHI BAG H',
                         '23 MALVIYA NAGAR', '19 MEERA BAGH'],
               'SALARY': [23000, 34000, 12000, 45000, 34000]}
        df = pd.DataFrame(emp)
        df

Out[8]: TNAME TANO TNADD SALARY

0 AMIT T01 123 PASCHIM VIHAR 23000

1 RAJESH TO2 6/11 RAMESH NAGAR 34000

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

3 CHARU T04 23 MALVIYA NAGAR 45000

4 MEENAKSHI TO5 19 MEERA BAGH 34000

loc is label-based and iloc is integer-position-based; both are used to retrieve rows from a DataFrame.

In [10]: df.iloc[1:4]

Out[10]: TNAME TANO TNADD SALARY

1 RAJESH TO2 6/11 RAMESH NAGAR 34000

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

3 CHARU T04 23 MALVIYA NAGAR 45000


In [11]: df.iloc[2:3]

Out[11]: TNAME TANO TNADD SALARY

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

In [10]: df.loc[2:4]

Out[10]: TNAME TANO TNADD SALARY

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

3 CHARU T04 23 MALVIYA NAGAR 45000

4 MEENAKSHI TO5 19 MEERA BAGH 34000
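
The outputs above differ in how the end of a slice is treated: `iloc[2:3]` returned one row, while `loc[2:4]` returned three. A minimal sketch, reusing two of the columns above, makes the rule explicit. The variable names are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"TNAME": ["AMIT", "RAJESH", "BINNY", "CHARU", "MEENAKSHI"],
                   "SALARY": [23000, 34000, 12000, 45000, 34000]})

by_position = df.iloc[2:4]   # positions 2 and 3; the stop position is excluded
by_label = df.loc[2:4]       # labels 2, 3 and 4; the stop label is included

# With a non-default index the two are clearly different things.
named = df.set_index("TNAME")
label_slice = named.loc["RAJESH":"CHARU"]   # RAJESH, BINNY, CHARU
position_slice = named.iloc[1:3]            # RAJESH, BINNY
```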

In [15]: import pandas as pd
         emp = {'TNAME': ['AMIT', 'RAJESH', 'BINNY', 'CHARU', 'MEENAKSHI'],
                'TANO': ['T01', 'TO2', 'T03', 'T04', 'TO5'],
                'TNADD': ['123 PASCHIM VIHAR', '6/11 RAMESH NAGAR', '5 WEST PUNJABHI BAG H',
                          '23 MALVIYA NAGAR', '19 MEERA BAGH'],
                'SALARY': [23000, 34000, 12000, 45000, 34000]}
         df = pd.DataFrame(emp)
         df

Out[15]: TNAME TANO TNADD SALARY

0 AMIT T01 123 PASCHIM VIHAR 23000

1 RAJESH TO2 6/11 RAMESH NAGAR 34000

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

3 CHARU T04 23 MALVIYA NAGAR 45000

4 MEENAKSHI TO5 19 MEERA BAGH 34000

Adding a column
In [17]: df['Grade'] = ['A', 'B', 'A', 'A', 'B']
         df

Out[17]: TNAME TANO TNADD SALARY Grade

0 AMIT T01 123 PASCHIM VIHAR 23000 A

1 RAJESH TO2 6/11 RAMESH NAGAR 34000 B

2 BINNY T03 5 WEST PUNJABHI BAG H 12000 A

3 CHARU T04 23 MALVIYA NAGAR 45000 A

4 MEENAKSHI TO5 19 MEERA BAGH 34000 B


In [1]: import pandas as pd
        emp = {'TNAME': ['AMIT', 'RAJESH', 'BINNY', 'CHARU', 'MEENAKSHI'],
               'TANO': ['T01', 'TO2', 'T03', 'T04', 'TO5'],
               'TNADD': ['123 PASCHIM VIHAR', '6/11 RAMESH NAGAR', '5 WEST PUNJABHI BAG H',
                         '23 MALVIYA NAGAR', '19 MEERA BAGH'],
               'SALARY': [23000, 34000, 12000, 45000, 34000]}
        df = pd.DataFrame(emp)
        df

Out[1]: TNAME TANO TNADD SALARY

0 AMIT T01 123 PASCHIM VIHAR 23000

1 RAJESH TO2 6/11 RAMESH NAGAR 34000

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

3 CHARU T04 23 MALVIYA NAGAR 45000

4 MEENAKSHI TO5 19 MEERA BAGH 34000

In [2]: df=df.set_index('TNAME')

In [3]: df

Out[3]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

RAJESH TO2 6/11 RAMESH NAGAR 34000

BINNY T03 5 WEST PUNJABHI BAG H 12000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI TO5 19 MEERA BAGH 34000

In [4]: # the empty string for MEENAKSHI makes this a mixed-type (object) column
        df['Allowance'] = [4000, 6000, 8000, 10000, '']
        df

Out[4]: TANO TNADD SALARY Allowance

TNAME

AMIT T01 123 PASCHIM VIHAR 23000 4000

RAJESH TO2 6/11 RAMESH NAGAR 34000 6000

BINNY T03 5 WEST PUNJABHI BAG H 12000 8000

CHARU T04 23 MALVIYA NAGAR 45000 10000

MEENAKSHI TO5 19 MEERA BAGH 34000


In [5]: df['Desig'] = ['Manager', 'Clerk', 'Manager', 'HR', 'Manager']
        df

Out[5]: TANO TNADD SALARY Allowance Desig

TNAME

AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager

RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk

BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager

CHARU T04 23 MALVIYA NAGAR 45000 10000 HR

MEENAKSHI TO5 19 MEERA BAGH 34000 Manager

In [6]: # Add a column using the assign function
        df = df.assign(Tax=[500, 100, 300, 200, 150])

In [7]: df

Out[7]: TANO TNADD SALARY Allowance Desig Tax

TNAME

AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500

RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100

BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300

CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200

MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150
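
Unlike `df['col'] = ...`, `assign` does not modify the DataFrame it is called on, which is why the cell above reassigns the result to `df`. A minimal sketch with two of the rows above; the column name Net is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [23000, 34000]}, index=["AMIT", "RAJESH"])

# assign() leaves df untouched and returns a new DataFrame.
with_tax = df.assign(Tax=[500, 100])

# A callable receives the DataFrame, so a new column can build on existing ones.
# The column name Net is hypothetical.
with_net = with_tax.assign(Net=lambda d: d["SALARY"] - d["Tax"])

print(with_net)
```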

In [8]: df.loc['AMIT']

Out[8]: TANO T01


TNADD 123 PASCHIM VIHAR
SALARY 23000
Allowance 4000
Desig Manager
Tax 500
Name: AMIT, dtype: object

In [9]: df.loc[['AMIT','BINNY']]

Out[9]: TANO TNADD SALARY Allowance Desig Tax

TNAME

AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500

BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300


In [10]: df.loc[['AMIT','BINNY'],['TANO','SALARY']]

Out[10]: TANO SALARY

TNAME

AMIT T01 23000

BINNY T03 12000

In [11]: df.loc[['AMIT','BINNY'],'SALARY']

Out[11]: TNAME
AMIT 23000
BINNY 12000
Name: SALARY, dtype: int64

In [12]: df.loc['AMIT':'BINNY','SALARY']

Out[12]: TNAME
AMIT 23000
RAJESH 34000
BINNY 12000
Name: SALARY, dtype: int64

In [13]: # Adding a column with loc
         df.loc[:, 'HRA'] = [3000, 4000, 5000, 3000, 6000]

In [14]: df

Out[14]: TANO TNADD SALARY Allowance Desig Tax HRA

TNAME

AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500 3000

RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100 4000

BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300 5000

CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200 3000

MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150 6000


In [15]: df.loc[:,'HRA']

Out[15]: TNAME
AMIT 3000
RAJESH 4000
BINNY 5000
CHARU 3000
MEENAKSHI 6000
Name: HRA, dtype: int64

In [16]: df.HRA

Out[16]: TNAME
AMIT 3000
RAJESH 4000
BINNY 5000
CHARU 3000
MEENAKSHI 6000
Name: HRA, dtype: int64

In [17]: df["HRA"]

Out[17]: TNAME
AMIT 3000
RAJESH 4000
BINNY 5000
CHARU 3000
MEENAKSHI 6000
Name: HRA, dtype: int64

In [18]: df.loc[["AMIT","CHARU"],"HRA"]

Out[18]: TNAME
AMIT 3000
CHARU 3000
Name: HRA, dtype: int64

In [19]: df

Out[19]: TANO TNADD SALARY Allowance Desig Tax HRA

TNAME

AMIT T01 123 PASCHIM VIHAR 23000 4000 Manager 500 3000

RAJESH TO2 6/11 RAMESH NAGAR 34000 6000 Clerk 100 4000

BINNY T03 5 WEST PUNJABHI BAG H 12000 8000 Manager 300 5000

CHARU T04 23 MALVIYA NAGAR 45000 10000 HR 200 3000

MEENAKSHI TO5 19 MEERA BAGH 34000 Manager 150 6000


In [20]: df['Total_Salary'] = df['SALARY'] + df['HRA'] - df['Tax']

In [21]: df

Out[21]:            TANO                  TNADD  SALARY Allowance    Desig  Tax   HRA  Total_Salary
         TNAME
         AMIT        T01      123 PASCHIM VIHAR   23000      4000  Manager  500  3000         25500
         RAJESH      TO2      6/11 RAMESH NAGAR   34000      6000    Clerk  100  4000         37900
         BINNY       T03  5 WEST PUNJABHI BAG H   12000      8000  Manager  300  5000         16700
         CHARU       T04       23 MALVIYA NAGAR   45000     10000       HR  200  3000         47800
         MEENAKSHI   TO5          19 MEERA BAGH   34000            Manager  150  6000         39850

In [22]: # Sorting the DataFrame
         dfsort = df.sort_values('Total_Salary')
         dfsort

Out[22]:            TANO                  TNADD  SALARY Allowance    Desig  Tax   HRA  Total_Salary
         TNAME
         BINNY       T03  5 WEST PUNJABHI BAG H   12000      8000  Manager  300  5000         16700
         AMIT        T01      123 PASCHIM VIHAR   23000      4000  Manager  500  3000         25500
         RAJESH      TO2      6/11 RAMESH NAGAR   34000      6000    Clerk  100  4000         37900
         MEENAKSHI   TO5          19 MEERA BAGH   34000            Manager  150  6000         39850
         CHARU       T04       23 MALVIYA NAGAR   45000     10000       HR  200  3000         47800

In [23]: dfsort = df.sort_values('Total_Salary', ascending=False)
         dfsort

Out[23]:            TANO                  TNADD  SALARY Allowance    Desig  Tax   HRA  Total_Salary
         TNAME
         CHARU       T04       23 MALVIYA NAGAR   45000     10000       HR  200  3000         47800
         MEENAKSHI   TO5          19 MEERA BAGH   34000            Manager  150  6000         39850
         RAJESH      TO2      6/11 RAMESH NAGAR   34000      6000    Clerk  100  4000         37900
         AMIT        T01      123 PASCHIM VIHAR   23000      4000  Manager  500  3000         25500
         BINNY       T03  5 WEST PUNJABHI BAG H   12000      8000  Manager  300  5000         16700
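
`sort_values` also accepts a list of columns, which matters when values tie (RAJESH and MEENAKSHI have the same SALARY). The following is a sketch under the assumption that ties should be broken alphabetically by name; the notebook itself sorts on a single column only.

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [23000, 34000, 12000, 45000, 34000]},
                  index=["AMIT", "RAJESH", "BINNY", "CHARU", "MEENAKSHI"])
df.index.name = "TNAME"

# sort_values returns a new, sorted DataFrame; df keeps its original order.
# Highest salary first; equal salaries are ordered by name.
ranked = df.reset_index().sort_values(["SALARY", "TNAME"], ascending=[False, True])

print(ranked)
```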

In [24]: # Increase SALARY by 10%
         df['SALARY'] = df['SALARY'] + df['SALARY']*10/100
         df

Out[24]:            TANO                  TNADD   SALARY Allowance    Desig  Tax   HRA  Total_Salary
         TNAME
         AMIT        T01      123 PASCHIM VIHAR  25300.0      4000  Manager  500  3000         25500
         RAJESH      TO2      6/11 RAMESH NAGAR  37400.0      6000    Clerk  100  4000         37900
         BINNY       T03  5 WEST PUNJABHI BAG H  13200.0      8000  Manager  300  5000         16700
         CHARU       T04       23 MALVIYA NAGAR  49500.0     10000       HR  200  3000         47800
         MEENAKSHI   TO5          19 MEERA BAGH  37400.0            Manager  150  6000         39850
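
Note that Total_Salary in the output above still holds the values computed before the raise: a derived column is a snapshot, not a formula. A minimal sketch with two of the rows shows the stale values and an explicit recomputation.

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [23000, 34000], "HRA": [3000, 4000], "Tax": [500, 100]},
                  index=["AMIT", "RAJESH"])
df["Total_Salary"] = df["SALARY"] + df["HRA"] - df["Tax"]

# Raise SALARY by 10%; Total_Salary still holds the old values.
df["SALARY"] = df["SALARY"] + df["SALARY"] * 10 / 100
stale = df["Total_Salary"].tolist()

# Recompute explicitly to bring the derived column up to date.
df["Total_Salary"] = df["SALARY"] + df["HRA"] - df["Tax"]

print(df)
```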

In [25]: # Deleting a column using del
         del df['Total_Salary']

In [26]: df

Out[26]: TANO TNADD SALARY Allowance Desig Tax HRA

TNAME

AMIT T01 123 PASCHIM VIHAR 25300.0 4000 Manager 500 3000

RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 Clerk 100 4000

BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 Manager 300 5000

CHARU T04 23 MALVIYA NAGAR 49500.0 10000 HR 200 3000

MEENAKSHI TO5 19 MEERA BAGH 37400.0 Manager 150 6000


In [27]: # Deleting a column using pop()
         df.pop("Desig")

Out[27]: TNAME
AMIT Manager
RAJESH Clerk
BINNY Manager
CHARU HR
MEENAKSHI Manager
Name: Desig, dtype: object

In [28]: df

Out[28]: TANO TNADD SALARY Allowance Tax HRA

TNAME

AMIT T01 123 PASCHIM VIHAR 25300.0 4000 500 3000

RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 100 4000

BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 300 5000

CHARU T04 23 MALVIYA NAGAR 49500.0 10000 200 3000

MEENAKSHI TO5 19 MEERA BAGH 37400.0 150 6000

In [29]: # Without inplace=True, drop returns a new DataFrame and leaves df unchanged
         df.drop(labels='Allowance', axis=1)

Out[29]: TANO TNADD SALARY Tax HRA

TNAME

AMIT T01 123 PASCHIM VIHAR 25300.0 500 3000

RAJESH TO2 6/11 RAMESH NAGAR 37400.0 100 4000

BINNY T03 5 WEST PUNJABHI BAG H 13200.0 300 5000

CHARU T04 23 MALVIYA NAGAR 49500.0 200 3000

MEENAKSHI TO5 19 MEERA BAGH 37400.0 150 6000

In [30]: df.drop(labels='Tax', axis=1, inplace=True)

In [31]: df

Out[31]: TANO TNADD SALARY Allowance HRA

TNAME

AMIT T01 123 PASCHIM VIHAR 25300.0 4000 3000

RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 4000

BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 5000

CHARU T04 23 MALVIYA NAGAR 49500.0 10000 3000

MEENAKSHI TO5 19 MEERA BAGH 37400.0 6000

In [32]: df

Out[32]: TANO TNADD SALARY Allowance HRA

TNAME

AMIT T01 123 PASCHIM VIHAR 25300.0 4000 3000

RAJESH TO2 6/11 RAMESH NAGAR 37400.0 6000 4000

BINNY T03 5 WEST PUNJABHI BAG H 13200.0 8000 5000

CHARU T04 23 MALVIYA NAGAR 49500.0 10000 3000

MEENAKSHI TO5 19 MEERA BAGH 37400.0 6000

In [33]: df.drop(labels=['Allowance','HRA'], axis=1, inplace=True)

In [34]: df

Out[34]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 25300.0

RAJESH TO2 6/11 RAMESH NAGAR 37400.0

BINNY T03 5 WEST PUNJABHI BAG H 13200.0

CHARU T04 23 MALVIYA NAGAR 49500.0

MEENAKSHI TO5 19 MEERA BAGH 37400.0
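
The three ways of deleting a column used above differ in what they return and whether they change the DataFrame. A minimal side-by-side sketch on a one-row frame built from the first row above:

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [23000], "Tax": [500], "HRA": [3000], "Desig": ["Manager"]})

del df["Tax"]                         # removes in place, returns nothing
desig = df.pop("Desig")               # removes in place, returns the column
without_hra = df.drop(columns="HRA")  # returns a copy; df still has HRA

print(list(df.columns))
print(list(without_hra.columns))
```

`drop(columns=...)` is equivalent to `drop(labels=..., axis=1)` as used in the cells above.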

INSERTING ROWS & DELETING ROWS


In [1]: import pandas as pd
        emp = {'TNAME': ['AMIT', 'RAJESH', 'BINNY', 'CHARU', 'MEENAKSHI'],
               'TANO': ['T01', 'TO2', 'T03', 'T04', 'TO5'],
               'TNADD': ['123 PASCHIM VIHAR', '6/11 RAMESH NAGAR', '5 WEST PUNJABHI BAG H',
                         '23 MALVIYA NAGAR', '19 MEERA BAGH'],
               'SALARY': [23000, 34000, 12000, 45000, 34000]}
        df = pd.DataFrame(emp)
        df

Out[1]: TNAME TANO TNADD SALARY

0 AMIT T01 123 PASCHIM VIHAR 23000

1 RAJESH TO2 6/11 RAMESH NAGAR 34000

2 BINNY T03 5 WEST PUNJABHI BAG H 12000

3 CHARU T04 23 MALVIYA NAGAR 45000

4 MEENAKSHI TO5 19 MEERA BAGH 34000

In [2]: df = df.set_index('TNAME')
        df

Out[2]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

RAJESH TO2 6/11 RAMESH NAGAR 34000

BINNY T03 5 WEST PUNJABHI BAG H 12000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI TO5 19 MEERA BAGH 34000

In [3]: # Insert a new row with values ["ISHA","T06","23 MODEL TOWN",35000] using loc
        df.loc["ISHA"] = ["T06", "23 MODEL TOWN", 35000]
        df

Out[3]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

RAJESH TO2 6/11 RAMESH NAGAR 34000

BINNY T03 5 WEST PUNJABHI BAG H 12000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI TO5 19 MEERA BAGH 34000

ISHA T06 23 MODEL TOWN 35000
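
Assigning to a label that does not exist yet appends one row. For several rows at once, one common alternative is to build a second DataFrame and join the two with `pd.concat`. The sketch below shows both; the extra rows (RAVI, NEHA) are hypothetical and not part of the notebook's data.

```python
import pandas as pd

df = pd.DataFrame({"TANO": ["T01"], "SALARY": [23000]}, index=["AMIT"])

# Setting a label that does not exist yet appends a row.
df.loc["ISHA"] = ["T06", 35000]

# Several rows at once: build a second DataFrame and concatenate.
# The rows below are hypothetical.
extra = pd.DataFrame({"TANO": ["T07", "T08"], "SALARY": [28000, 31000]},
                     index=["RAVI", "NEHA"])
combined = pd.concat([df, extra])

print(combined)
```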


In [4]: # Changing the contents of an existing row
        df.loc["BINNY"] = ["T03", "5 WEST PUNJABI BAGH", 20000]
        df

Out[4]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

RAJESH TO2 6/11 RAMESH NAGAR 34000

BINNY T03 5 WEST PUNJABI BAGH 20000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI TO5 19 MEERA BAGH 34000

ISHA T06 23 MODEL TOWN 35000

In [5]: # Edit the contents using iloc (position 4 is MEENAKSHI)
        df.iloc[4] = ["T05", "10 MEERA BAGH", 25000]
        df

Out[5]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

RAJESH TO2 6/11 RAMESH NAGAR 34000

BINNY T03 5 WEST PUNJABI BAGH 20000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI T05 10 MEERA BAGH 25000

ISHA T06 23 MODEL TOWN 35000

DELETING ROW
In [6]: # Without inplace=True, drop returns a new DataFrame and leaves df unchanged
        df.drop("RAJESH", axis=0)

Out[6]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

BINNY T03 5 WEST PUNJABI BAGH 20000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI T05 10 MEERA BAGH 25000

ISHA T06 23 MODEL TOWN 35000


In [7]: df

Out[7]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

RAJESH TO2 6/11 RAMESH NAGAR 34000

BINNY T03 5 WEST PUNJABI BAGH 20000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI T05 10 MEERA BAGH 25000

ISHA T06 23 MODEL TOWN 35000

In [8]: df.drop("RAJESH", axis=0, inplace=True)

In [9]: df

Out[9]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

BINNY T03 5 WEST PUNJABI BAGH 20000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI T05 10 MEERA BAGH 25000

ISHA T06 23 MODEL TOWN 35000

In [10]: df

Out[10]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

BINNY T03 5 WEST PUNJABI BAGH 20000

CHARU T04 23 MALVIYA NAGAR 45000

MEENAKSHI T05 10 MEERA BAGH 25000

ISHA T06 23 MODEL TOWN 35000

In [12]: # axis must be passed by keyword; a positional axis is not accepted by recent pandas versions
         df.drop(labels=["ISHA", "CHARU"], axis=0, inplace=True)

In [13]: df

Out[13]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

BINNY T03 5 WEST PUNJABI BAGH 20000

MEENAKSHI T05 10 MEERA BAGH 25000

In [14]: # Drop the row at position 1 (returns a new DataFrame; df is unchanged)
         df.drop(df.index[1])

Out[14]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

MEENAKSHI T05 10 MEERA BAGH 25000

In [15]: df

Out[15]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

BINNY T03 5 WEST PUNJABI BAGH 20000

MEENAKSHI T05 10 MEERA BAGH 25000

In [16]: df.drop(df.index[1], inplace=True)

In [17]: df

Out[17]: TANO TNADD SALARY

TNAME

AMIT T01 123 PASCHIM VIHAR 23000

MEENAKSHI T05 10 MEERA BAGH 25000

In [18]: df.drop(df.index[[0,1]], inplace=True)

In [19]: df

Out[19]: TANO TNADD SALARY

TNAME
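
The row-deletion cells above can be summarised in a short sketch: `drop` returns a new DataFrame unless `inplace=True` is given, rows can be chosen by position through `df.index[...]`, and the `errors="ignore"` option skips labels that are not present. The data is a three-row subset of the table above.

```python
import pandas as pd

df = pd.DataFrame({"SALARY": [23000, 20000, 25000]},
                  index=["AMIT", "BINNY", "MEENAKSHI"])

smaller = df.drop("BINNY")                      # new DataFrame; df unchanged
by_position = df.drop(df.index[[0, 1]])         # drop the first two rows by position
tolerant = df.drop("RAJESH", errors="ignore")   # missing label is skipped, not an error

print(smaller)
print(by_position)
```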
BOOLEAN INDEXING

In [1]: import pandas as pd
        dict1 = {'Names': ['Sush', 'Adarsh', 'Ravi', 'Manu', 'Sushma'],
                 'Clas': [11, 12, 11, 12, 12],
                 'Sec': ['A', 'A', 'C', 'A', 'B'],
                 'Phy': [34, 40, 56, 67, 50],
                 'Chem': [78, 90, 50, 65, 90],
                 'Eng': [50, 55, 67, 68, 69],
                 'Proj_rem': ['Avg', 'Good', 'Good', 'Fair', 'Avg']
                 }
        student = pd.DataFrame(dict1, index=[100, 101, 102, 103, 104])
        student

Out[1]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

101 Adarsh 12 A 40 90 55 Good

102 Ravi 11 C 56 50 67 Good

103 Manu 12 A 67 65 68 Fair

104 Sushma 12 B 50 90 69 Avg

In [6]: # Marks >= 70 in Chemistry
        student[student.Chem>=70]

Out[6]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

101 Adarsh 12 A 40 90 55 Good

104 Sushma 12 B 50 90 69 Avg

In [5]: # Students whose section is A
        student[student.Sec=="A"]

Out[5]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

101 Adarsh 12 A 40 90 55 Good

103 Manu 12 A 67 65 68 Fair

WAC to display the details of class 11 students


In [4]: student[student.Clas==11]

Out[4]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

102 Ravi 11 C 56 50 67 Good

In [8]: student.loc[student.Clas==11]

Out[8]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

102 Ravi 11 C 56 50 67 Good

WAC to display the project remarks of all students.

In [10]: student["Proj_rem"]

Out[10]: 100 Avg


101 Good
102 Good
103 Fair
104 Avg
Name: Proj_rem, dtype: object

In [11]: student.Proj_rem

Out[11]: 100 Avg


101 Good
102 Good
103 Fair
104 Avg
Name: Proj_rem, dtype: object

WAC to display all subject marks for class 12 students.

In [13]: student

Out[13]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

101 Adarsh 12 A 40 90 55 Good

102 Ravi 11 C 56 50 67 Good

103 Manu 12 A 67 65 68 Fair

104 Sushma 12 B 50 90 69 Avg


In [15]: student[student.Clas==12][["Phy","Chem","Eng"]]

Out[15]: Phy Chem Eng

101 40 90 55

103 67 65 68

104 50 90 69

In [2]: student.loc[student.Clas==12,["Phy","Chem","Eng"]]

Out[2]: Phy Chem Eng

101 40 90 55

103 67 65 68

104 50 90 69

WAC to view the Project remark for those who have got more than 80 in chemistry.

In [3]: student.loc[student.Chem>80,"Proj_rem"]

Out[3]: 101 Good


104 Avg
Name: Proj_rem, dtype: object

Display the details of students who have got Good in their Project remarks.

In [4]: student.loc[student.Proj_rem=="Good"]

Out[4]: Names Clas Sec Phy Chem Eng Proj_rem

101 Adarsh 12 A 40 90 55 Good

102 Ravi 11 C 56 50 67 Good

In [5]: student[student.Proj_rem=="Good"]

Out[5]: Names Clas Sec Phy Chem Eng Proj_rem

101 Adarsh 12 A 40 90 55 Good

102 Ravi 11 C 56 50 67 Good

Change the physics marks of Adarsh to 50.


In [7]: student.loc[student.Names=="Adarsh","Phy"]=50

In [8]: student

Out[8]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

101 Adarsh 12 A 50 90 55 Good

102 Ravi 11 C 56 50 67 Good

103 Manu 12 A 67 65 68 Fair

104 Sushma 12 B 50 90 69 Avg

WAC to change the name "Sushma" to "Sushmita".

In [9]: student.loc[student.Names=="Sushma","Names"]="Sushmita"

In [10]: student

Out[10]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sush 11 A 34 78 50 Avg

101 Adarsh 12 A 50 90 55 Good

102 Ravi 11 C 56 50 67 Good

103 Manu 12 A 67 65 68 Fair

104 Sushmita 12 B 50 90 69 Avg

In [13]: # Caution: no column is named here, so every column of the matching row is overwritten
         student.loc[student.Names=="Sush"]="Sushmita"

In [15]: student

Out[15]: Names Clas Sec Phy Chem Eng Proj_rem

100 Sushmita Sushmita Sushmita Sushmita Sushmita Sushmita Sushmita

101 Adarsh 12 A 50 90 55 Good

102 Ravi 11 C 56 50 67 Good

103 Manu 12 A 67 65 68 Fair

104 Sushmita 12 B 50 90 69 Avg
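
Row 100 above ended up with "Sushmita" in every column because the assignment named only a row selector. A minimal sketch of the intended update, on a hypothetical two-row subset, passes the column as the second argument to `loc` so that a single cell changes. Depending on the pandas version, the row-wide form may also warn about or reject writing a string into numeric columns, so only the targeted form is run here.

```python
import pandas as pd

# Hypothetical two-row subset of the student data.
student = pd.DataFrame({"Names": ["Sush", "Adarsh"], "Phy": [34, 40]},
                       index=[100, 101])

# Row selector and column selector: only one cell changes.
fixed = student.copy()
fixed.loc[fixed.Names == "Sush", "Names"] = "Sushmita"

print(fixed)
```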

WAC to change the Project remark to “Excellent” for those who have got more than 80 in
chemistry.

In [18]: # Remove row 100, which was overwritten with "Sushmita" in every column above
         student.drop(100, axis=0, inplace=True)

In [20]: student.loc[student.Chem>80,"Proj_rem"]="Excellent"

In [21]: student

Out[21]:         Names  Clas Sec  Phy  Chem  Eng   Proj_rem
         101    Adarsh    12   A   50    90   55  Excellent
         102      Ravi    11   C   56    50   67       Good
         103      Manu    12   A   67    65   68       Fair
         104  Sushmita    12   B   50    90   69  Excellent

In [3]: # The same update on a freshly re-created student DataFrame (the original data)
        student.loc[student.Chem>80,"Proj_rem"]="Excellent"

In [5]: student

Out[5]:       Names  Clas Sec  Phy  Chem  Eng   Proj_rem
        100    Sush    11   A   34    78   50        Avg
        101  Adarsh    12   A   40    90   55  Excellent
        102    Ravi    11   C   56    50   67       Good
        103    Manu    12   A   67    65   68       Fair
        104  Sushma    12   B   50    90   69  Excellent

In [1]: D1 = {'Riya': 19, 'Isha': 20}
        D2 = {'Isha': 20, 'Riya': 19}
        D1 == D2

Out[1]: True
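
Dictionary equality ignores the order of keys, which is why the comparison above is True. A pandas Series built from each dictionary keeps the insertion order, so comparing them with `equals` behaves differently. The sketch below illustrates this; the variable names are assumptions made for illustration.

```python
import pandas as pd

D1 = {"Riya": 19, "Isha": 20}
D2 = {"Isha": 20, "Riya": 19}

s1 = pd.Series(D1)
s2 = pd.Series(D2)

dicts_equal = D1 == D2                                   # key order is ignored
series_equal = s1.equals(s2)                             # element order matters
aligned_equal = s1.sort_index().equals(s2.sort_index())  # same order after sorting

print(dicts_equal, series_equal, aligned_equal)
```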

In [4]: import pandas as pd
        import numpy as np
        a1 = np.array([2, 3, 4, 5, 6])
        s1 = pd.Series(a1, index=list("ABCDE"))
        print(s1.ndim)

In [2]: import pandas as pd
        dict1 = {"Name": ["Riya", "Rishab", "Isha", "Rahul"], "Age": [19, 23, 20, 18]}
        df = pd.DataFrame(dict1, index=["P1", "P2", "P3", "P4"])
        df

Out[2]: Name Age

P1 Riya 19

P2 Rishab 23

P3 Isha 20

P4 Rahul 18

In [4]: df.shape

Out[4]: (4, 2)

In [5]: df.count()

Out[5]: Name 4
Age 4
dtype: int64

In [1]: import pandas as pd
        dic = {'Rollno': [1, 2, 3, 4, 5, 6],
               'Name': ["Prerna Singh", "Manish Arora", "Tanish Goel", "Falguni Jain",
                        "Kanika Bhatnagar", "Ramandeep Kaur"],
               'UT1': [24, 18, 20, 22, 15, 20],
               'UT2': [24, 17, 22, 20, 20, 15],
               'UT3': [20, 19, 18, 24, 18, 22],
               'UT4': [22, 22, 24, 20, 22, 24]
               }
        df = pd.DataFrame(dic, index=["P1", "P2", "P3", "P4", "P5", "P6"])
        print(df.index)
        print(df.info)     # prints the bound method; df.info() prints the summary
        print(df.columns)
        print(df)

Index(['P1', 'P2', 'P3', 'P4', 'P5', 'P6'], dtype='object')


<bound method DataFrame.info of Rollno Name UT1 UT2 UT3
UT4
P1 1 Prerna Singh 24 24 20 22
P2 2 Manish Arora 18 17 19 22
P3 3 Tanish Goel 20 22 18 24
P4 4 Falguni Jain 22 20 24 20
P5 5 Kanika Bhatnagar 15 20 18 22
P6 6 Ramandeep Kaur 20 15 22 24>
Index(['Rollno', 'Name', 'UT1', 'UT2', 'UT3', 'UT4'], dtype='object')
Rollno Name UT1 UT2 UT3 UT4
P1 1 Prerna Singh 24 24 20 22
P2 2 Manish Arora 18 17 19 22
P3 3 Tanish Goel 20 22 18 24
P4 4 Falguni Jain 22 20 24 20
P5 5 Kanika Bhatnagar 15 20 18 22
P6 6 Ramandeep Kaur 20 15 22 24

In [8]: import pandas as pd
        df = pd.read_csv("grocery.csv")
        print(df)

Sno Product Category Price Quantity


0 1 Chips Food 10 15
1 2 Milk Food 60 5
2 3 Maggi Food 20 5
3 4 Juice Food 100 4
4 5 Bread Food 20 2
5 6 Biscuit Food 20 2
6 7 Tea Food 120 1
7 8 Bourn-Vita Food 70 1
8 9 Bottle Household 80 2
9 10 Tiffin Box Household 75 2
10 11 Bucket Household 200 1
11 12 Detergent Household 80 1
12 13 Tissues Hygiene 30 5
13 14 Soap Hygiene 40 4
14 15 Brush Hygiene 30 2
15 16 Perfume Hygiene 150 1
16 17 Hair-Oil Hygiene 100 1
17 18 Pen Stationery 5 10
18 19 Pencil Stationery 2 10

In [4]: df1 = df[["Product", "Price", "Quantity"]]

In [9]: df2 = df.loc[df.Price>100, ["Product", "Price", "Quantity"]]
        df2

Out[9]: Product Price Quantity

6 Tea 120 1

10 Bucket 200 1

15 Perfume 150 1

In [5]: df1

Out[5]: Product Price Quantity

0 Chips 10 15

1 Milk 60 5

2 Maggi 20 5

3 Juice 100 4

4 Bread 20 2

5 Biscuit 20 2

6 Tea 120 1

7 Bourn-Vita 70 1

8 Bottle 80 2

9 Tiffin Box 75 2

10 Bucket 200 1

11 Detergent 80 1

12 Tissues 30 5

13 Soap 40 4

14 Brush 30 2

15 Perfume 150 1

16 Hair-Oil 100 1

17 Pen 5 10

18 Pencil 2 10
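
The grocery.csv file itself is not included, so the following sketch inlines a few of its rows as text, assuming a comma-separated layout with the column names shown above. It repeats the filter-and-select step and adds a derived column; the name Amount is hypothetical.

```python
import io
import pandas as pd

# A few rows in the same layout as grocery.csv, inlined so the sketch is self-contained.
# The comma delimiter is an assumption about the original file.
csv_text = """Sno,Product,Category,Price,Quantity
1,Chips,Food,10,15
7,Tea,Food,120,1
11,Bucket,Household,200,1
18,Pen,Stationery,5,10
"""
df = pd.read_csv(io.StringIO(csv_text))

costly = df.loc[df.Price > 100, ["Product", "Price", "Quantity"]]

# Amount is a hypothetical derived column: price times quantity per row.
df["Amount"] = df["Price"] * df["Quantity"]

print(costly)
print(df)
```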
