
Dataframes - Jupyter Notebook

DATAFRAMES TOPIC CODE

Uploaded by

Arundhathi
Copyright: © All Rights Reserved

8/9/24, 9:58 AM Dataframes - Jupyter Notebook

Objective: learn to create dataframes and apply join operations between them:
1) Concatenating
2) Append
3) Merge

In [2]:
    # dataframe 1
    import pandas as pd

    # assign data of lists.
    data = {'city': ['delhi', 'mumbai', 'agra', 'goa'],
            'positive': [20, 21, 19, 18], 'neagtive': [120, 121, 119, 18]}

    # Create DataFrame
    df1 = pd.DataFrame(data)

In [3]: df1

Out[3]:
         city  positive  neagtive
    0   delhi        20       120
    1  mumbai        21       121
    2    agra        19       119
    3     goa        18        18

In [4]:
    # dataframe 2
    data = {'city': ['delhi', 'mumbai', 'agra', 'chennai'],
            'positive': [10, 21, 39, 18], 'neagtive': [12, 101, 129, 118]}

    # Create DataFrame
    df2 = pd.DataFrame(data)
    df2

Out[4]:
          city  positive  neagtive
    0    delhi        10        12
    1   mumbai        21       101
    2     agra        39       129
    3  chennai        18       118

In [99]:
    # concatenate: stack the two dataframes one below the other
    df3 = pd.concat([df1, df2])

localhost:8888/notebooks/Dataframes.ipynb 1/9

In [100]: df3

Out[100]:
          city  positive  neagtive
    0    delhi        20       120
    1   mumbai        21       121
    2     agra        19       119
    3      goa        18        18
    0    delhi        10        12
    1   mumbai        21       101
    2     agra        39       129
    3  chennai        18       118

In the result above the index is not continuous (0, 1, 2, 3, 0, 1, 2, 3), because each dataframe keeps its own original index. To get a continuous index (0, 1, 2, ..., 7), pass ignore_index=True:

In [101]: df3 = pd.concat([df1, df2], ignore_index=True)

In [102]: df3

Out[102]:
          city  positive  neagtive
    0    delhi        20       120
    1   mumbai        21       121
    2     agra        19       119
    3      goa        18        18
    4    delhi        10        12
    5   mumbai        21       101
    6     agra        39       129
    7  chennai        18       118
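Without ignore_index=True the repeated labels are more than cosmetic: selecting by label can return several rows. The sketch below uses two small illustrative dataframes, not the notebook's data. It also shows verify_integrity=True, a pd.concat option that raises a ValueError instead of silently creating duplicate labels.

```python
import pandas as pd

# two small illustrative dataframes, each with the default index 0, 1
left = pd.DataFrame({'city': ['delhi', 'mumbai'], 'positive': [20, 21]})
right = pd.DataFrame({'city': ['agra', 'goa'], 'positive': [19, 18]})

# plain concat keeps the original labels, so the index becomes 0, 1, 0, 1
stacked = pd.concat([left, right])
rows_labelled_0 = stacked.loc[0]  # a 2-row DataFrame, not a single row
print(rows_labelled_0)

# verify_integrity=True refuses to create duplicate labels
try:
    pd.concat([left, right], verify_integrity=True)
    duplicates_rejected = False
except ValueError:
    duplicates_rejected = True
print(duplicates_rejected)  # True

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([left, right], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```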

Assigning keys to the dataframes df1 and df2 (each key becomes the outer level of a hierarchical index):

In [103]: df3 = pd.concat([df1, df2], keys=['first', 'second'])


In [104]: df3

Out[104]:
                 city  positive  neagtive
    first  0    delhi        20       120
           1   mumbai        21       121
           2     agra        19       119
           3      goa        18        18
    second 0    delhi        10        12
           1   mumbai        21       101
           2     agra        39       129
           3  chennai        18       118

In [105]: df3.loc['first']

Out[105]:
         city  positive  neagtive
    0   delhi        20       120
    1  mumbai        21       121
    2    agra        19       119
    3     goa        18        18

In [106]: df3.loc['first', 0]

Out[106]:
    city        delhi
    positive       20
    neagtive      120
    Name: (first, 0), dtype: object

In [107]: df3.loc['second']

Out[107]:
          city  positive  neagtive
    0    delhi        10        12
    1   mumbai        21       101
    2     agra        39       129
    3  chennai        18       118
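Two related options make the keyed result easier to work with. A minimal sketch with illustrative data: names= (a pd.concat parameter) labels the index levels, xs() selects one key by level name, and reset_index() turns the levels back into ordinary columns. The level names 'source' and 'row' are arbitrary choices for this example.

```python
import pandas as pd

df_a = pd.DataFrame({'city': ['delhi', 'goa'], 'positive': [20, 18]})
df_b = pd.DataFrame({'city': ['delhi', 'chennai'], 'positive': [10, 18]})

# keys= builds the outer level of a MultiIndex; names= labels both levels
combined = pd.concat([df_a, df_b], keys=['first', 'second'],
                     names=['source', 'row'])

# xs selects by level name: every row that came from df_b
from_second = combined.xs('second', level='source')
print(from_second)

# reset_index turns the MultiIndex levels back into ordinary columns
flat = combined.reset_index()
print(flat.columns.tolist())  # ['source', 'row', 'city', 'positive']
```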

In [108]:
    # to combine the two dataframes horizontally, i.e. side by side, use axis=1
    df3 = pd.concat([df1, df2], axis=1)
    df3

Out[108]:
         city  positive  neagtive     city  positive  neagtive
    0   delhi        20       120    delhi        10        12
    1  mumbai        21       121   mumbai        21       101
    2    agra        19       119     agra        39       129
    3     goa        18        18  chennai        18       118

Another example: create two dataframes and concatenate them horizontally (axis=1).


In [5]:
    # dataframe 1
    import pandas as pd

    # assign data of lists.
    data = {'city': ['delhi', 'mumbai', 'agra', 'goa'],
            'temperature': [20, 21, 19, 18]}

    # Create DataFrame
    df1 = pd.DataFrame(data)

In [110]: df1

Out[110]:
         city  temperature
    0   delhi           20
    1  mumbai           21
    2    agra           19
    3     goa           18

In [6]:
    # dataframe 2
    import pandas as pd

    # assign data of lists.
    data = {'city': ['agra', 'mumbai', 'goa', 'delhi'],
            'windspeed': [2, 2, 1, 1]}

    # Create DataFrame
    df2 = pd.DataFrame(data)

In [112]: df2

Out[112]:
         city  windspeed
    0    agra          2
    1  mumbai          2
    2     goa          1
    3   delhi          1

In [113]:
    df3 = pd.concat([df1, df2], axis=1)
    df3

Out[113]:
         city  temperature    city  windspeed
    0   delhi           20    agra          2
    1  mumbai           21  mumbai          2
    2    agra           19     goa          1
    3     goa           18   delhi          1

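In the output above, the rows do not pair up records of the same city: with axis=1, concat aligns rows by their index labels, and both dataframes have the default index 0, 1, 2, 3 regardless of city. To fix this, we can pass an index that gives each city the same label in both dataframes: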


In [7]:
    df1 = pd.DataFrame({'city': ['delhi', 'mumbai', 'agra', 'goa'],
                        'temperature': [20, 21, 19, 18]}, index=[0, 1, 2, 3])
    # 0, 1, 2, 3 are the indexes given to 'delhi', 'mumbai', 'agra', 'goa'
    df2 = pd.DataFrame({'city': ['agra', 'mumbai', 'goa', 'delhi'],
                        'windspeed': [2, 2, 1, 1]}, index=[2, 1, 3, 0])
    # index=[2, 1, 3, 0] are the indexes for 'agra', 'mumbai', 'goa', 'delhi'

In [115]:
    df3 = pd.concat([df1, df2], axis=1)
    df3

Out[115]:
         city  temperature    city  windspeed
    0   delhi           20   delhi          1
    1  mumbai           21  mumbai          2
    2    agra           19    agra          2
    3     goa           18     goa          1
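Assigning matching index values by hand works, but the index has to be kept in sync with the data. An alternative, sketched here with the notebook's values, is to make 'city' itself the index of each dataframe with set_index, so that concat with axis=1 lines rows up by city name. The checks below look values up by label rather than by position, since the row order of the aligned result is not something to rely on.

```python
import pandas as pd

temps = pd.DataFrame({'city': ['delhi', 'mumbai', 'agra', 'goa'],
                      'temperature': [20, 21, 19, 18]})
winds = pd.DataFrame({'city': ['agra', 'mumbai', 'goa', 'delhi'],
                      'windspeed': [2, 2, 1, 1]})

# with 'city' as the index, axis=1 aligns rows on the city name
side_by_side = pd.concat([temps.set_index('city'), winds.set_index('city')],
                         axis=1)
print(side_by_side)

# look up one city by label: temperature 19 and windspeed 2 for agra
print(side_by_side.loc['agra'])
```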

Exercise: check what happens with axis=0, which concatenates along rows instead of columns.
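As a hint for this exercise: with axis=0 the rows are stacked, and since the two dataframes have different value columns, the result has the union of the columns, with NaN wherever a dataframe had no such column. The sketch uses a shortened version of the notebook's data; join='inner' is a pd.concat option that keeps only the shared columns instead.

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['delhi', 'mumbai'], 'temperature': [20, 21]})
df2 = pd.DataFrame({'city': ['agra', 'goa'], 'windspeed': [2, 1]})

# axis=0 stacks rows; columns missing from one input are filled with NaN
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)
print(stacked)

# join='inner' keeps only the columns that every input has
common_only = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)
print(common_only.columns.tolist())  # ['city']
```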

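Append:
The concat function can combine dataframes along either rows or columns, while append only combines them along rows. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so in current pandas the same result is produced with pd.concat.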

In [8]:
    # dataframe 1
    import pandas as pd

    # assign data of lists.
    data = {'city': ['delhi', 'mumbai', 'agra'],
            'positive': [20, 21, 19], 'neagtive': [120, 121, 119]}

    # Create DataFrame
    df1 = pd.DataFrame(data)
    df1

Out[8]:
         city  positive  neagtive
    0   delhi        20       120
    1  mumbai        21       121
    2    agra        19       119


In [117]:
    # dataframe 2
    import pandas as pd

    # assign data of lists.
    data = {'city': ['delhi', 'mumbai', 'agra'],
            'positive': [210, 211, 19], 'neagtive': [12, 121, 109]}

    # Create DataFrame
    df2 = pd.DataFrame(data)
    df2

Out[117]:
         city  positive  neagtive
    0   delhi       210        12
    1  mumbai       211       121
    2    agra        19       109

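In [118]:
    # DataFrame.append was removed in pandas 2.0 (_append is a private method);
    # pd.concat stacks the rows of df2 below df1 in the same way
    df3 = pd.concat([df1, df2])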

In [119]: df3

Out[119]:
         city  positive  neagtive
    0   delhi        20       120
    1  mumbai        21       121
    2    agra        19       119
    0   delhi       210        12
    1  mumbai       211       121
    2    agra        19       109
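Because DataFrame.append is gone in pandas 2.x, a common pattern for adding a single row is to wrap the row in a one-row DataFrame and pass it to pd.concat. A minimal sketch with illustrative values:

```python
import pandas as pd

df = pd.DataFrame({'city': ['delhi', 'mumbai'], 'positive': [20, 21]})

# the new row, wrapped in a one-row DataFrame with the same columns
new_row = pd.DataFrame([{'city': 'agra', 'positive': 19}])

# concatenating along rows (the default axis=0) replaces df.append(new_row);
# ignore_index=True renumbers the result 0, 1, 2
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```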

Merge dataframes: merging combines two dataframes into a single dataframe, and you decide which columns serve as the common key.

merge combines rows by matching the values of a key column, which we specify with on=; that column must be present in both dataframes.
In [120]:
    df3 = df1.merge(df2, on='city')
    df3

Out[120]:
         city  positive_x  neagtive_x  positive_y  neagtive_y
    0   delhi          20         120         210          12
    1  mumbai          21         121         211         121
    2    agra          19         119          19         109


In [121]:
    # positive_x and neagtive_x belong to the first dataframe (df1),
    # positive_y and neagtive_y to the second dataframe (df2)
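The _x and _y endings come from merge's suffixes parameter, whose default is ('_x', '_y'). Passing your own suffixes makes the origin of each column explicit; the names '_first' and '_second' below are arbitrary choices for illustration, and the data is a shortened version of the notebook's.

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['delhi', 'mumbai', 'agra'], 'positive': [20, 21, 19]})
df2 = pd.DataFrame({'city': ['delhi', 'mumbai', 'agra'], 'positive': [210, 211, 19]})

# suffixes replaces the default ('_x', '_y') for overlapping column names
merged = df1.merge(df2, on='city', suffixes=('_first', '_second'))
print(merged.columns.tolist())  # ['city', 'positive_first', 'positive_second']
```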

We can join the dataframes in different ways:
1) Inner join: only the rows whose key appears in both dataframes.
2) Left join: all rows of the left dataframe, plus the matching data from the right dataframe.
3) Right join: all rows of the right dataframe, plus the matching data from the left dataframe.
4) Full outer join: all rows from both dataframes.
In the left, right and outer joins, cells that have no matching row on the other side are filled with NaN.
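To compare the four join types on the same input, the sketch below merges two small illustrative dataframes that share only some cities: goa exists only on the left and chennai only on the right. The city lists are sorted before printing because row order can differ between join types and pandas versions.

```python
import pandas as pd

# illustrative data: goa only on the left, chennai only on the right
df1 = pd.DataFrame({'city': ['delhi', 'mumbai', 'goa'], 'positive': [20, 21, 88]})
df2 = pd.DataFrame({'city': ['delhi', 'mumbai', 'chennai'], 'positive': [210, 211, 18]})

results = {how: df1.merge(df2, on='city', how=how)
           for how in ['inner', 'left', 'right', 'outer']}

for how, result in results.items():
    print(how, sorted(result['city']))
# inner ['delhi', 'mumbai']
# left ['delhi', 'goa', 'mumbai']
# right ['chennai', 'delhi', 'mumbai']
# outer ['chennai', 'delhi', 'goa', 'mumbai']
```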

In [122]:
    # inner join
    df3 = df1.merge(df2, on='city', how='inner')
    df3

Out[122]:
         city  positive_x  neagtive_x  positive_y  neagtive_y
    0   delhi          20         120         210          12
    1  mumbai          21         121         211         121
    2    agra          19         119          19         109

In [123]:
    # no change is visible above because every city appears in both dataframes;
    # how='inner' is also merge's default, which is why Out[120] looks the same

In [124]:
    # dataframe 1
    import pandas as pd

    # assign data of lists.
    data = {'city': ['delhi', 'mumbai', 'agra', 'goa'],
            'positive': [20, 21, 19, 88], 'neagtive': [120, 121, 119, 133]}

    # Create DataFrame
    df1 = pd.DataFrame(data)
    df1

Out[124]:
         city  positive  neagtive
    0   delhi        20       120
    1  mumbai        21       121
    2    agra        19       119
    3     goa        88       133


In [9]:
    # dataframe 2
    import pandas as pd

    # assign data of lists.
    data = {'city': ['delhi', 'mumbai', 'agra'],
            'positive': [210, 211, 19], 'neagtive': [12, 121, 109]}

    # Create DataFrame
    df2 = pd.DataFrame(data)
    df2

Out[9]:
         city  positive  neagtive
    0   delhi       210        12
    1  mumbai       211       121
    2    agra        19       109

In [126]:
    df3 = df1.merge(df2, on='city', how='inner')
    df3

Out[126]:
         city  positive_x  neagtive_x  positive_y  neagtive_y
    0   delhi          20         120         210          12
    1  mumbai          21         121         211         121
    2    agra          19         119          19         109

In [127]: # the record for goa is missing because goa is not present in both dataframes

In [128]:
    # left join
    df3 = df1.merge(df2, on='city', how='left')
    df3

Out[128]:
         city  positive_x  neagtive_x  positive_y  neagtive_y
    0   delhi          20         120       210.0        12.0
    1  mumbai          21         121       211.0       121.0
    2    agra          19         119        19.0       109.0
    3     goa          88         133         NaN         NaN
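In this left join, positive_y and neagtive_y turned into floats (210.0, 12.0, ...) because NaN is a floating-point value, so an integer column that receives a missing value is upcast to float64. If the values should stay integers, one option is pandas' nullable Int64 dtype, which represents missing values as <NA>. A sketch with shortened, illustrative data:

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['delhi', 'goa'], 'positive': [20, 88]})
df2 = pd.DataFrame({'city': ['delhi'], 'positive': [210]})

# default dtypes: the unmatched goa row forces positive_y to float64
plain = df1.merge(df2, on='city', how='left')
print(plain['positive_y'].dtype)  # float64

# convert the right-hand value column to the nullable integer dtype first
df2_nullable = df2.astype({'positive': 'Int64'})
nullable = df1.merge(df2_nullable, on='city', how='left')
print(nullable['positive_y'].dtype)  # Int64
print(nullable)
```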

In [129]:
    # right join
    df3 = df1.merge(df2, on='city', how='right')
    df3

Out[129]:
         city  positive_x  neagtive_x  positive_y  neagtive_y
    0   delhi          20         120         210          12
    1  mumbai          21         121         211         121
    2    agra          19         119          19         109


In [130]:
    # outer join
    df3 = df1.merge(df2, on='city', how='outer')
    df3

Out[130]:
         city  positive_x  neagtive_x  positive_y  neagtive_y
    0   delhi          20         120       210.0        12.0
    1  mumbai          21         121       211.0       121.0
    2    agra          19         119        19.0       109.0
    3     goa          88         133         NaN         NaN
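With this notebook's data every city in df2 also appears in df1, so the outer join returns the same rows as the left join (their order may differ between pandas versions), and the right join returns the same rows as the inner join. The indicator=True option of merge adds a _merge column recording where each row came from, which makes this visible. The sketch rebuilds the notebook's two dataframes, keeping the 'neagtive' column spelling as-is.

```python
import pandas as pd

df1 = pd.DataFrame({'city': ['delhi', 'mumbai', 'agra', 'goa'],
                    'positive': [20, 21, 19, 88], 'neagtive': [120, 121, 119, 133]})
df2 = pd.DataFrame({'city': ['delhi', 'mumbai', 'agra'],
                    'positive': [210, 211, 19], 'neagtive': [12, 121, 109]})

# indicator=True adds a categorical '_merge' column: 'both', 'left_only' or 'right_only'
outer = df1.merge(df2, on='city', how='outer', indicator=True)

# map each city to its origin; no row is 'right_only', so outer == left here
origin = dict(zip(outer['city'], outer['_merge'].astype(str)))
print(origin)
```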

https://github.com/codebasics/py/blob/master/pandas/9_merge/pandas_merge.ipynb
