Chapter 2

Uploaded by

LucíaLópez

Left join

JOINING DATA WITH PANDAS

Aaren Stubberfield
Instructor
Quick review

Left join

A left join returns every row of the left table and only those rows of the right table whose key columns match. Left-table rows with no match get NaN in the columns that come from the right table.

New dataset

Movies table
import pandas as pd

movies = pd.read_csv('tmdb_movies.csv')
print(movies.head())
print(movies.shape)

id original_title popularity release_date

0 257 Oliver Twist 20.415572 2005-09-23
1 14290 Better Luck ... 3.877036 2002-01-12
2 38365 Grown Ups 38.864027 2010-06-24
3 9672 Infamous 3.6808959999... 2006-11-16
4 12819 Alpha and Omega 12.300789 2010-09-17
(4803, 4)

Tagline table
taglines = pd.read_csv('tmdb_taglines.csv')
print(taglines.head())
print(taglines.shape)

id tagline
0 19995 Enter the World of Pandora.
1 285 At the end of the world, the adventure begins.
2 206647 A Plan No One Escapes
3 49026 The Legend Ends
4 49529 Lost in our world, found in another.
(3955, 2)

Merge with left join
movies_taglines = movies.merge(taglines, on='id', how='left')
print(movies_taglines.head())

id original_title popularity release_date tagline

0 257 Oliver Twist 20.415572 2005-09-23 NaN
1 14290 Better Luck ... 3.877036 2002-01-12 Never undere...
2 38365 Grown Ups 38.864027 2010-06-24 Boys will be...
3 9672 Infamous 3.6808959999... 2006-11-16 There's more...
4 12819 Alpha and Omega 12.300789 2010-09-17 A Pawsome 3D...
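To see which rows of a left join actually found a match, merge can also add an indicator column. A minimal sketch with small hypothetical tables (the ids, titles, and taglines are made up; only the id key column mirrors the slides):

```python
import pandas as pd

# Hypothetical toy tables; only the 'id' key column mirrors the slides
movies = pd.DataFrame({'id': [1, 2, 3],
                       'title': ['Movie A', 'Movie B', 'Movie C']})
taglines = pd.DataFrame({'id': [2, 3, 4],
                         'tagline': ['Tagline B', 'Tagline C', 'Tagline D']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = movies.merge(taglines, on='id', how='left', indicator=True)
print(merged)

# Left rows without a match carry NaN in the right table's columns
no_tagline = merged[merged['_merge'] == 'left_only']
print(no_tagline['title'].tolist())  # ['Movie A']
```

The right table's id 4 has no partner in movies, so a left join simply drops it.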

Number of rows returned
print(movies_taglines.shape)

(4805, 5)
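The merged table has 4,805 rows, two more than the 4,803 rows in movies. A left join emits each left row once per matching right row, so extra rows appear when an id occurs more than once in taglines. The sketch below reproduces the effect on hypothetical toy data and uses the validate argument of merge to flag it:

```python
import pandas as pd

# Hypothetical toy data: id 2 appears twice in the right table
left = pd.DataFrame({'id': [1, 2, 3], 'title': ['A', 'B', 'C']})
right = pd.DataFrame({'id': [2, 2, 3], 'tagline': ['B1', 'B2', 'C1']})

result = left.merge(right, on='id', how='left')
print(left.shape, result.shape)  # (3, 2) (4, 3)

# Right-table keys that occur more than once explain the extra rows
dupes = right[right.duplicated('id', keep=False)]
print(dupes)

# validate= raises MergeError when the key relationship is not as expected
try:
    left.merge(right, on='id', how='left', validate='one_to_one')
except pd.errors.MergeError as err:
    print('MergeError:', err)
```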

Let's practice!

Other joins

Right join

A right join mirrors a left join: it returns every row of the right table and only the matching rows of the left table.

Looking at data
movie_to_genres = pd.read_csv('tmdb_movie_to_genres.csv')
tv_genre = movie_to_genres[movie_to_genres['genre'] == 'TV Movie']
print(tv_genre)

movie_id genre
4998 10947 TV Movie
5994 13187 TV Movie
7443 22488 TV Movie
10061 78814 TV Movie
10790 153397 TV Movie
10835 158150 TV Movie
11096 205321 TV Movie
11282 231617 TV Movie

Filtering the data
m = movie_to_genres['genre'] == 'TV Movie'
tv_genre = movie_to_genres[m]
print(tv_genre)

movie_id genre
4998 10947 TV Movie
5994 13187 TV Movie
7443 22488 TV Movie
10061 78814 TV Movie
10790 153397 TV Movie
10835 158150 TV Movie
11096 205321 TV Movie
11282 231617 TV Movie
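The mask m is a boolean Series with one True/False value per row, so indexing with it keeps only the rows marked True. A minimal sketch on a hypothetical four-row slice of movie_to_genres, showing that plain indexing, .loc, and DataFrame.query select the same rows:

```python
import pandas as pd

# Hypothetical four-row slice of movie_to_genres
movie_to_genres = pd.DataFrame({'movie_id': [5, 5, 11, 10947],
                                'genre': ['Crime', 'Comedy', 'Action', 'TV Movie']})

m = movie_to_genres['genre'] == 'TV Movie'  # one True/False per row
via_mask = movie_to_genres[m]
via_loc = movie_to_genres.loc[m]
via_query = movie_to_genres.query("genre == 'TV Movie'")

print(via_mask)
print(via_mask.equals(via_loc), via_mask.equals(via_query))  # True True
```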

Data to merge
id title popularity release_date
0 257 Oliver Twist 20.415572 2005-09-23
1 14290 Better Luck ... 3.877036 2002-01-12
2 38365 Grown Ups 38.864027 2010-06-24
3 9672 Infamous 3.6808959999... 2006-11-16
4 12819 Alpha and Omega 12.300789 2010-09-17

movie_id genre
4998 10947 TV Movie
5994 13187 TV Movie
7443 22488 TV Movie
10061 78814 TV Movie
10790 153397 TV Movie

Merge with right join
tv_movies = movies.merge(tv_genre, how='right',
left_on='id', right_on='movie_id')
print(tv_movies.head())

id title popularity release_date movie_id genre

0 153397 Restless 0.812776 2012-12-07 153397 TV Movie
1 10947 High School ... 16.536374 2006-01-20 10947 TV Movie
2 231617 Signed, Seal... 1.444476 2013-10-13 231617 TV Movie
3 78814 We Have Your... 0.102003 2011-11-12 78814 TV Movie
4 158150 How to Fall ... 1.923514 2012-07-21 158150 TV Movie
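A right join produces the same rows as a left join with the two tables swapped; only the column order differs. A sketch on hypothetical toy tables (the names mirror the slides, the values are made up):

```python
import pandas as pd

# Hypothetical toy tables; movie_id 9 has no partner in movies
movies = pd.DataFrame({'id': [1, 2, 3], 'title': ['A', 'B', 'C']})
tv_genre = pd.DataFrame({'movie_id': [2, 3, 9], 'genre': ['TV Movie'] * 3})

right = movies.merge(tv_genre, how='right', left_on='id', right_on='movie_id')
# The same rows, produced as a left join with the tables swapped
left = tv_genre.merge(movies, how='left', left_on='movie_id', right_on='id')

cols = ['id', 'title', 'movie_id', 'genre']
print(right[cols])
print(right[cols].equals(left[cols]))
```

Which form to use is mostly a matter of which table reads naturally as the one whose rows must all be kept.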

Outer join

An outer join returns all rows from both tables, matching them where the key columns agree. Rows without a match in the other table get NaN in that table's columns.

Datasets for outer join
m = movie_to_genres['genre'] == 'Family'
family = movie_to_genres[m].head(3)
m = movie_to_genres['genre'] == 'Comedy'
comedy = movie_to_genres[m].head(3)

family:
   movie_id   genre
0        12  Family
1        35  Family
2       105  Family

comedy:
   movie_id   genre
0         5  Comedy
1        13  Comedy
2        35  Comedy

Merge with outer join
family_comedy = family.merge(comedy, on='movie_id', how='outer',
suffixes=('_fam', '_com'))
print(family_comedy)

movie_id genre_fam genre_com

0 12 Family NaN
1 35 Family Comedy
2 105 Family NaN
3 5 NaN Comedy
4 13 NaN Comedy
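The same outer join can be rebuilt from the three-row family and comedy tables shown above, here with indicator=True so each row records whether its movie_id came from one table or both. Newer pandas versions sort the keys of an outer join, so the printed row order may differ from the slide:

```python
import pandas as pd

# The three-row family and comedy tables shown above
family = pd.DataFrame({'movie_id': [12, 35, 105], 'genre': ['Family'] * 3})
comedy = pd.DataFrame({'movie_id': [5, 13, 35], 'genre': ['Comedy'] * 3})

family_comedy = family.merge(comedy, on='movie_id', how='outer',
                             suffixes=('_fam', '_com'), indicator=True)
print(family_comedy)

# Only movies present in both tables have both genre columns filled
both = family_comedy.loc[family_comedy['_merge'] == 'both', 'movie_id']
print(both.tolist())  # [35]
```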

Let's practice!

Merging a table to itself

Sequel movie data
print(sequels.head())

id title sequel
0 19995 Avatar NaN
1 862 Toy Story 863
2 863 Toy Story 2 10193
3 597 Titanic NaN
4 24428 The Avengers NaN

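Merging a table to itself

Each row's sequel column holds the id of another movie in the same table, so merging sequels with itself on left_on='sequel' and right_on='id' lines each movie up with its sequel. The suffixes tell the two copies of each column apart.
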
original_sequels = sequels.merge(sequels, left_on='sequel', right_on='id',
suffixes=('_org','_seq'))
print(original_sequels.head())

id_org title_org sequel_org id_seq title_seq sequel_seq

0 862 Toy Story 863 863 Toy Story 2 10193
1 863 Toy Story 2 10193 10193 Toy Story 3 NaN
2 675 Harry Potter... 767 767 Harry Potter... NaN
3 121 The Lord of ... 122 122 The Lord of ... NaN
4 120 The Lord of ... 121 121 The Lord of ... 122

Continue formatting the results
print(original_sequels[['title_org','title_seq']].head())

title_org title_seq
0 Toy Story Toy Story 2
1 Toy Story 2 Toy Story 3
2 Harry Potter... Harry Potter...
3 The Lord of ... The Lord of ...
4 The Lord of ... The Lord of ...
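A self-contained sketch of the same inner self-merge on a hypothetical four-row sequels table built from rows shown on the slides (sequel is NaN when a movie has no sequel):

```python
import pandas as pd

# Hypothetical four-row sequels table built from rows on the slides;
# 'sequel' holds the id of the next movie, or NaN when there is none
sequels = pd.DataFrame({
    'id': [862, 863, 10193, 597],
    'title': ['Toy Story', 'Toy Story 2', 'Toy Story 3', 'Titanic'],
    'sequel': [863, 10193, None, None],
})

# Inner self-merge: match each movie's 'sequel' against the 'id' column
original_sequels = sequels.merge(sequels, left_on='sequel', right_on='id',
                                 suffixes=('_org', '_seq'))
print(original_sequels[['title_org', 'title_seq']])
```

Because the default is an inner join, Toy Story 3 and Titanic drop out: their sequel values match no id.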

Merging a table to itself with left join
original_sequels = sequels.merge(sequels, left_on='sequel', right_on='id',
how='left', suffixes=('_org','_seq'))
print(original_sequels.head())

id_org title_org sequel_org id_seq title_seq sequel_seq

0 19995 Avatar NaN NaN NaN NaN
1 862 Toy Story 863 863 Toy Story 2 10193
2 863 Toy Story 2 10193 10193 Toy Story 3 NaN
3 597 Titanic NaN NaN NaN NaN
4 24428 The Avengers NaN NaN NaN NaN
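With how='left', every movie is kept, and movies whose sequel is NaN get NaN in all of the _seq columns. A sketch on a hypothetical four-row sequels table built from rows on the slides, counting rows and listing the movies without a sequel:

```python
import pandas as pd

# Hypothetical four-row sequels table built from rows on the slides
sequels = pd.DataFrame({
    'id': [862, 863, 10193, 597],
    'title': ['Toy Story', 'Toy Story 2', 'Toy Story 3', 'Titanic'],
    'sequel': [863, 10193, None, None],
})

original_sequels = sequels.merge(sequels, how='left', left_on='sequel',
                                 right_on='id', suffixes=('_org', '_seq'))
print(len(sequels), len(original_sequels))  # 4 4

# Movies without a sequel have NaN in every *_seq column
no_sequel = original_sequels[original_sequels['id_seq'].isna()]
print(no_sequel['title_org'].tolist())  # ['Toy Story 3', 'Titanic']
```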

When to merge a table to itself
Common situations:
- Hierarchical relationships
- Sequential relationships
- Graph data
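As an illustration of a hierarchical relationship, a hypothetical staff table whose manager_id column points at another row's emp_id can be merged with itself to put each employee next to their manager. The table and column names are invented for this sketch:

```python
import pandas as pd

# Hypothetical staff table: manager_id refers to another row's emp_id
staff = pd.DataFrame({
    'emp_id': [1, 2, 3, 4],
    'name': ['Ana', 'Ben', 'Cleo', 'Dev'],
    'manager_id': [None, 1, 1, 2],
})

# Left self-merge keeps the top of the hierarchy (no manager) as well;
# the '' suffix leaves the employee's own columns unrenamed
with_manager = staff.merge(staff, how='left', left_on='manager_id',
                           right_on='emp_id', suffixes=('', '_mgr'))
print(with_manager[['name', 'name_mgr']])
```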

Let's practice!

Merging on indexes

Table with an index
id title popularity release_date
0 257 Oliver Twist 20.415572 2005-09-23
1 14290 Better Luck ... 3.877036 2002-01-12
2 38365 Grown Ups 38.864027 2010-06-24
3 9672 Infamous 3.680896 2006-11-16
4 12819 Alpha and Omega 12.300789 2010-09-17

title popularity release_date

id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
12819 Alpha and Omega 12.300789 2010-09-17

Setting an index
movies = pd.read_csv('tmdb_movies.csv', index_col=['id'])
print(movies.head())

title popularity release_date

id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16
12819 Alpha and Omega 12.300789 2010-09-17
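Passing index_col to read_csv is one way to get an indexed table; calling set_index on an already-loaded DataFrame gives the same result. A sketch using a small in-memory CSV with rows copied from the slide (io.StringIO stands in for the real file):

```python
import io

import pandas as pd

# Small in-memory CSV standing in for tmdb_movies.csv (rows from the slide)
csv_text = """id,title,popularity
257,Oliver Twist,20.415572
38365,Grown Ups,38.864027
"""

a = pd.read_csv(io.StringIO(csv_text), index_col=['id'])
b = pd.read_csv(io.StringIO(csv_text)).set_index('id')
print(a)
print(a.equals(b))  # True
```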

Merge index datasets
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16

tagline
id
19995 Enter the Wo...
285 At the end o...
206647 A Plan No On...
49026 The Legend Ends

Merging on index
movies_taglines = movies.merge(taglines, on='id', how='left')
print(movies_taglines.head())

title popularity release_date tagline

id
257 Oliver Twist 20.415572 2005-09-23 NaN
14290 Better Luck ... 3.877036 2002-01-12 Never undere...
38365 Grown Ups 38.864027 2010-06-24 Boys will be...
9672 Infamous 3.680896 2006-11-16 There's more...
12819 Alpha and Omega 12.300789 2010-09-17 A Pawsome 3D...
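Because both tables are indexed by id, on='id' refers to an index level rather than a column. DataFrame.join, which aligns on the index by default, is a common alternative. A sketch on hypothetical toy tables:

```python
import pandas as pd

# Hypothetical toy tables, both indexed by 'id'
movies = pd.DataFrame({'title': ['A', 'B', 'C']},
                      index=pd.Index([1, 2, 3], name='id'))
taglines = pd.DataFrame({'tagline': ['tag B', 'tag C', 'tag D']},
                        index=pd.Index([2, 3, 4], name='id'))

via_merge = movies.merge(taglines, on='id', how='left')  # 'id' is an index level
via_join = movies.join(taglines, how='left')             # aligns on the index
print(via_merge)
print(via_join)
```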

MultiIndex datasets
samuel = pd.read_csv('samuel.csv',
                     index_col=['movie_id', 'cast_id'])
print(samuel.head())

casts = pd.read_csv('casts.csv',
                    index_col=['movie_id', 'cast_id'])
print(casts.head())

                               name
movie_id cast_id
184      3        Samuel L. Jackson
319      13       Samuel L. Jackson
326      2        Samuel L. Jackson
329      138      Samuel L. Jackson
393      21       Samuel L. Jackson

                 character
movie_id cast_id
5        22        Jezebel
         23          Diana
         24         Athena
         25        Elspeth
         26            Eva

MultiIndex merge
samuel_casts = samuel.merge(casts, on=['movie_id','cast_id'])
print(samuel_casts.head())
print(samuel_casts.shape)

name character
movie_id cast_id
184 3 Samuel L. Jackson Ordell Robbie
319 13 Samuel L. Jackson Big Don
326 2 Samuel L. Jackson Neville Flynn
329 138 Samuel L. Jackson Arnold
393 21 Samuel L. Jackson Rufus
(67, 2)
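A minimal sketch of the same MultiIndex merge on a few rows copied from the slides, building the two-level index with set_index instead of reading the CSV files:

```python
import pandas as pd

keys = ['movie_id', 'cast_id']

# A few rows copied from the slides, indexed by (movie_id, cast_id)
samuel = pd.DataFrame({'movie_id': [184, 319], 'cast_id': [3, 13],
                       'name': ['Samuel L. Jackson'] * 2}).set_index(keys)
casts = pd.DataFrame({'movie_id': [184, 319, 5], 'cast_id': [3, 13, 22],
                      'character': ['Ordell Robbie', 'Big Don', 'Jezebel']}).set_index(keys)

# Both index levels must match for a row to survive the inner merge
samuel_casts = samuel.merge(casts, on=keys)
print(samuel_casts)
print(samuel_casts.shape)
```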

Index merge with left_on and right_on
title popularity release_date
id
257 Oliver Twist 20.415572 2005-09-23
14290 Better Luck ... 3.877036 2002-01-12
38365 Grown Ups 38.864027 2010-06-24
9672 Infamous 3.680896 2006-11-16

genre
movie_id
5 Crime
5 Comedy
11 Science Fiction
11 Action

Index merge with left_on and right_on
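# left_on/right_on accept index level names; newer pandas raises MergeError
# if left_on is combined with left_index, so only the level names are passed
movies_genres = movies.merge(movie_to_genres, left_on='id',
                             right_on='movie_id')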
print(movies_genres.head())

id title popularity release_date genre

5 5 Four Rooms 22.876230 1995-12-09 Crime
5 5 Four Rooms 22.876230 1995-12-09 Comedy
11 11 Star Wars 126.393695 1977-05-25 Science Fiction
11 11 Star Wars 126.393695 1977-05-25 Action
11 11 Star Wars 126.393695 1977-05-25 Adventure
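When only one side is indexed, left_index=True can be paired with right_on to match the left index against an ordinary column on the right. A sketch on hypothetical toy tables using rows from the slide:

```python
import pandas as pd

# Hypothetical: movies indexed by id, genres keep movie_id as a plain column
movies = pd.DataFrame({'title': ['Four Rooms', 'Star Wars']},
                      index=pd.Index([5, 11], name='id'))
genres = pd.DataFrame({'movie_id': [5, 5, 11],
                       'genre': ['Crime', 'Comedy', 'Science Fiction']})

# left_index=True uses the left index as the key; right_on names the right column
movies_genres = movies.merge(genres, left_index=True, right_on='movie_id')
print(movies_genres[['movie_id', 'title', 'genre']])
```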

Let's practice!