0% found this document useful (0 votes)

40 views17 pages

JOIN Operations

The document discusses JOIN operations in databases. JOINs combine data from two or more tables based on related columns. The main types of JOINs are INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, and CROSS JOIN. Examples are provided to illustrate each JOIN type and how to write JOIN queries. Practical examples using SQL queries on tables are also included to demonstrate real-world JOIN operations.

Uploaded by

famasya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

40 views17 pages

JOIN Operations

Uploaded by

famasya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 17

JOIN Operations

References

1. https://courses.cs.washington.edu/courses/cse154/18au/lectures/lec26-sql-joins/?print-pdf#/

undefined. https://www.enterprisedb.com/postgres-tutorials/overview-postgresql-indexes

undefined. https://github.com/morenoh149/postgresDBSamples/tree/master

Introduction

Join operations are essential for working with data that is spread across multiple tables. Join operations allow
us to combine information from multiple tables into a single result set, giving us a more complete picture of the
data we're working with.

Whether you're a database administrator, a software developer, or just someone curious about how databases
work, understanding join operations is crucial for working with data effectively. In this lecture, we will cover the
basics of join operations, including different types of joins, how to write join queries, and common use cases for
join operations. In the end of this lecture, we will also dive in into some best practices and pitfalls to avoid in
writing JOIN operations.

What is JOIN operations

In simple terms, a JOIN operation combines data from two or more tables in a database based on a related
column between them. It's like a powerful tool that lets you merge information from different sources, creating a
unified picture of your data.

Source: XKCD 1810

For example, let's say you have one table with your friends' names and contact information, and another table
with their favorite hobbies. You could use a JOIN operation to combine the two tables based on the common
column of their names, creating a new table that includes both their contact info and hobbies. Now you can plan
the perfect party that caters to everyone's interests!

JOIN operations are crucial for complex data analysis, where data is spread across multiple tables with different
attributes. They allow you to extract valuable insights by connecting the dots between different data sources.
Here's a conceptual example

Terms:

Primary Key : a column guaranteed to be unique for each record (e.g. Alice's ID 1)
Foreign Key : a column in Table A storing a primary key from table B

In such database structures, there are primary keys and foreign keys which are used to interlinked data between
tables. The customer_id in Orders table resolves into column id in Customer table.

There are at least two standards to write JOIN query: using WHERE or JOIN keyword. We'll take an example
how to connect teachers with its courses.

SELECT c.name as customer_name, t.product_name

FROM Orders o, Customers c
WHERE o.customer_id = c.id

Which is equivalent with

SELECT c.name as customer_name, t.product_name

FROM Orders o
JOIN Customers c ON o.customer_id = c.id

Questions: Should referenced key always be primary key?

Types of JOIN

INNER JOIN

This type of JOIN returns only the rows that have matching values in both tables based on a specified join
condition. In other words, it returns the intersection of the two tables. INNER JOIN is the most commonly used
JOIN operation in databases.

SELECT Customers.customer_id, Customers.name, Orders.order_id, Orders.product_name

FROM Customers
INNER JOIN Orders
ON Customers.customer_id = Orders.customer_id;

This query will return the following result:

LEFT JOIN

This type of JOIN returns all the rows from the left table and the matching rows from the right table based on a
specified join condition. If there are no matching rows in the right table, the result will still include the rows from
the left table with NULL values in the columns from the right table.

SELECT Customers.customer_id, Customers.name, Orders.order_id, Orders.product_name

FROM Customers
LEFT JOIN Orders
ON Customers.customer_id = Orders.customer_id;

This query will return the following result:

RIGHT JOIN

This type of JOIN is similar to LEFT JOIN but returns all the rows from the right table and the matching rows
from the left table based on a specified join condition. If there are no matching rows in the left table, the result
will still include the rows from the right table with NULL values in the columns from the left table.

SELECT Customers.customer_id, Customers.name, Orders.order_id, Orders.product_name

FROM Customers
RIGHT JOIN Orders
ON Customers.customer_id = Orders.customer_id;

This query will return the following result:

FULL OUTER JOIN

This type of JOIN returns all the rows from both tables and NULL values in the columns that do not have
matching values in the other table based on a specified join condition. In other words, it returns the union of the
two tables.

SELECT Customers.customer_id, Customers.name, Orders.order_id, Orders.product_name

FROM Customers
FULL OUTER JOIN Orders
ON Customers.customer_id = Orders.customer_id;

This query will return the following result:

CROSS JOIN

This type of JOIN returns the Cartesian product of the two tables, where each row from the first table is
combined with each row from the second table. CROSS JOIN does not require a join condition and can result in
a large number of rows.

SELECT Customers.customer_id, Customers.name, Orders.order_id, Orders.product_name

FROM Customers
CROSS JOIN Orders;

This query will return the following result:

Practical example

In [ ]:
df_3 = _deepnote_execute_sql('SELECT "Customer"."CustomerId", "Customer"."FirstName"||\'
\' ||"Customer"."LastName" as CustomerName \n FROM "Invoice" \n JOIN "Customer" ON "Invoice"
."CustomerId" = "Customer"."CustomerId"', 'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_3
Out[ ]:

CustomerId customername

0 2 Leonie Köhler

1 4 Bjørn Hansen

2 8 Daan Peeters

3 14 Mark Philips

4 23 John Gordon

... ... ...

407 25 Victor Stevens

408 29 Robert Brown

409 35 Madalena Sampaio

410 44 Terhi Hämäläinen

411 58 Manoj Pareek

412 rows × 2 columns

In [ ]:
df_6 = _deepnote_execute_sql('SELECT "Customer"."CustomerId", "Customer"."FirstName"||\'
\' ||"Customer"."LastName" as CustomerName \n FROM "Invoice" \n JOIN "Customer" ON "Invoice"
."CustomerId" = "Customer"."CustomerId"\n GROUP BY "Customer"."CustomerId"', 'SQL_5C2E9A9B
_B591_4FC4_ADD5_829EB3188444')
df_6
Out[ ]:

CustomerId customername

0 29 Robert Brown

1 54 Steve Murray

2 4 Bjørn Hansen

3 34 João Fernandes

4 51 Joakim Johansson

5 52 Emma Jones

6 10 Eduardo Martins

7 35 Madalena Sampaio

8 45 Ladislav Kovács

9 6 Helena Holý

10 39 Camille Bernard

11 36 Hannah Schneider

12 31 Martha Silk

13 50 Enrique Muñoz

14 14 Mark Philips

15 22 Heather Leacock

16 59 Puja Srivastava

17 13 Fernanda Ramos

18 2 Leonie Köhler

19 16 Frank Harris
20 CustomerId
11 customername
Alexandre Rocha

21 44 Terhi Hämäläinen

22 42 Wyatt Girard

23 41 Marc Dubois

24 46 Hugh O'Reilly

25 40 Dominique Lefebvre

26 43 Isabelle Mercier

27 32 Aaron Mitchell

28 53 Phil Hughes

29 7 Astrid Gruber

30 9 Kara Nielsen

31 38 Niklas Schröder

32 15 Jennifer Peterson

33 26 Richard Cunningham

34 12 Roberto Almeida

35 48 Johannes Van der Berg

36 24 Frank Ralston

37 57 Luis Rojas

38 19 Tim Goyer

39 25 Victor Stevens

40 30 Edward Francis

41 21 Kathy Chase

42 49 Stanislaw Wójcik

43 47 Lucas Mancini

44 3 François Tremblay

45 17 Jack Smith

46 20 Dan Miller

47 28 Julia Barnett

48 37 Fynn Zimmermann

49 33 Ellie Sullivan

50 1 Luís Gonçalves

51 5 Frantiek Wichterlová

52 18 Michelle Brooks

53 55 Mark Taylor

54 27 Patrick Gray

55 23 John Gordon

56 56 Diego Gutiérrez

57 58 Manoj Pareek

58 8 Daan Peeters

In [ ]:
df_5 = _deepnote_execute_sql('SELECT "Customer"."CustomerId", "Customer"."FirstName"||\'
\' ||"Customer"."LastName" as CustomerName, "Track"."Name" \n FROM "Invoice" \n JOIN "Custom
er" ON "Invoice"."CustomerId" = "Customer"."CustomerId"\n JOIN "InvoiceLine" ON "Invoice".
"InvoiceId" = "InvoiceLine"."InvoiceId"\n JOIN "Track" ON "Track"."TrackId" = "InvoiceLine
"."TrackId"', 'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_5
Out[ ]:

CustomerId customername Name

0 2 Leonie Köhler Balls to the Wall

1 2 Leonie Köhler Restless and Wild

2 4 Bjørn Hansen Put The Finger On You

3 4 Bjørn Hansen Inject The Venom

4 4 Bjørn Hansen Evil Walks

... ... ... ...

2235 44 Terhi Hämäläinen Looking For Love

2236 44 Terhi Hämäläinen Sweet Lady Luck

2237 44 Terhi Hämäläinen Feirinha da Pavuna/Luz do Repente/Bagaço da La...

2238 44 Terhi Hämäläinen Samba pras moças

2239 58 Manoj Pareek Hot Girl

2240 rows × 3 columns

In [ ]:
df_4 = _deepnote_execute_sql('SELECT "Customer"."CustomerId", "Customer"."FirstName"||\'
\' ||"Customer"."LastName" as CustomerName, string_agg("Track"."Name", \' ,\' ) \n FROM "Invo
ice" \n JOIN "Customer" ON "Invoice"."CustomerId" = "Customer"."CustomerId"\n JOIN "Invoice
Line" ON "Invoice"."InvoiceId" = "InvoiceLine"."InvoiceId"\n JOIN "Track" ON "Track"."Trac
kId" = "InvoiceLine"."TrackId"\n GROUP BY "Customer"."CustomerId"', 'SQL_5C2E9A9B_B591_4FC
4_ADD5_829EB3188444')
df_4
Out[ ]:

CustomerId customername string_agg

0 29 Robert Brown Your Time Is Gonna Come,Mensagen De Amor (2000...

1 54 Steve Murray Someday Never Comes,Give Me Novacaine,Extraord...

2 4 Bjørn Hansen Put The Finger On You,Inject The Venom,Evil Wa...

3 34 João Fernandes Helpless,Mouth To Mouth,Thank You,Living Lovin...

4 51 Joakim Johansson Invaders,Run to the Hills,I Don't Know,Flying ...

5 52 Emma Jones Samba Makossa,Lixo Do Mangue,Firmamento,Já Foi...

6 10 Eduardo Martins Admirável Gado Novo,Mis Penas Lloraba Yo (Ao V...

7 35 Madalena Sampaio Wave (Os Cariocas),Garota de Ipanema (Dick Far...

8 45 Ladislav Kovács O Pulso,Nem 5 Minutos Guardados,Dirty Little T...

9 6 Helena Holý When You Gonna Learn (Digeridoo),Whatever It I...

10 39 Camille Bernard Prometheus Overture, Op. 43,Sonata for Solo Vi...

11 36 Hannah Schneider She Loves Me Not,Paths Of Glory,Fear Is The Ke...

12 31 Martha Silk Diga Lá, Coração,Comportamento Geral,Podres Po...

13 50 Enrique Muñoz Hallowed Be Thy Name,Phantom Lord,Seek & Destr...

14 14 Mark Philips Right Through You,Not The Doctor,Bleed The Fre...

15 22 Heather Leacock When Love Comes To Town,Angel Of Harlem,Sozinh...

16 59 Puja Srivastava Cotton Fields,Don't Look Now,Before You Accuse...

17 13 Fernanda Ramos Dust N' Bones,Live and Let Die,The Memory Rema...

18 2 Leonie Köhler Balls to the Wall,Restless and Wild,Lavadeira,...

19 16 Frank Harris Valentino's,Promises,Signe,Ghost Of The Naviga...

20 11 Alexandre Rocha Leper Messiah,Damage Inc.,Green Disease,Why Go...

20 11 Alexandre Rocha Leper Messiah,Damage Inc.,Green Disease,Why Go...
CustomerId customername string_agg
21 44 Terhi Hämäläinen Dazed And Confused,L'Avventura,Soul Parsifal,Q...

22 42 Wyatt Girard Com Açúcar E Com Afeto,Meu Caro Amigo,Trocando...

23 41 Marc Dubois Suite No. 3 in D, BWV 1068: III. Gavotte I & I...

24 46 Hugh O'Reilly Etnia,Samba Do Lado,Sobremesa,Sangue De Bairro...

25 40 Dominique Lefebvre Morena De Angola,A Banda,União Da Ilha,Put You...

26 43 Isabelle Mercier Prá Dizer Adeus,Família,Act IV, Symphony,Music...

27 32 Aaron Mitchell How Many More Times,What Is And What Should Ne...

28 53 Phil Hughes The Prisoner,Lord Of The Flies,Condição,Aquilo...

29 7 Astrid Gruber Soldier Side - Intro,Revenga,Solitary,Fire + W...

30 9 Kara Nielsen The Thing That Should Not Be,Welcome Home (San...

31 38 Niklas Schröder Atras Da Porta,Tatuagem,Pristina,Caffeine,RV,E...

32 15 Jennifer Peterson Perfect Crime,Bad Obsession,Stone Free,Satch B...

33 26 Richard Cunningham Radio Free Aurope,Perfect Circle,Drowning Man,...

34 12 Roberto Almeida Right Next Door to Hell,In The Evening,Fool In...

35 48 Johannes Van der Berg Underwater Love,Falamansa Song,Avisa,Desaforo,...

36 24 Frank Ralston Sunday Bloody Sunday,New Year's Day,Meet Kevin...

37 57 Luis Rojas Good Golly Miss Molly,Wrote A Song For Everyon...

38 19 Tim Goyer J Squared,Maria,Hey Cisco,Fortuneteller,High B...

39 25 Victor Stevens Stuck With Me,Nice Guys Finish Last,Macy's Day...

40 30 Edward Francis Black Mountain Side,Communication Breakdown,Ca...

41 21 Kathy Chase Longview,Basket Case,She,Geek Stink Breath,Yes...

42 49 Stanislaw Wójcik Romance Ideal,SKA,Finding My Way,Evil Ways,It'...

43 47 Lucas Mancini Me Liga,Quase Um Segundo,Palavras,A Melhor For...

44 3 François Tremblay Pilot,Through the Looking Glass, Pt. 1,Canta, ...

45 17 Jack Smith Believe,As We Sleep,Double Talkin' Jive,The Ga...

46 20 Dan Miller Bem Devagar,Saudosismo,Posso Perder Minha Mulh...

47 28 Julia Barnett So Central Rain,Pretty Persuasion,Can't Stand ...

48 37 Fynn Zimmermann Bye, Bye Brasil,Susie Q,Proud Mary,Over And Ou...

49 33 Ellie Sullivan Naked In Front Of The Computer,Moonchild,Can I...

50 1 Luís Gonçalves Experiment In Terra,Take the Celestra,Shout It...

51 5 Frantiek Wichterlová Wet My Bed,Crackerman,#9 Dream,Give Peace a Ch...

52 18 Michelle Brooks Terra,Eclipse Oculto,Hey Hey,Lonely Stranger,L...

53 55 Mark Taylor Walking On The Water,Suzie-Q, Pt. 2,Fortunes O...

54 27 Patrick Gray These Colours Don't Run,For the Greater Good o...

55 23 John Gordon Your Time Has Come,Dandelion,Rock 'N' Roll Mus...

Love Gun,Deuce,Wake Me Up When September

56 56 Diego Gutiérrez
Ends,...

57 58 Manoj Pareek Shock Me,She,Black Night,Pictures Of Home,Casc...

58 8 Daan Peeters Dog Eat Dog,Overdose,Love In An Elevator,Janie...

In [ ]:
df_1 = _deepnote_execute_sql('SELECT "Customer"."FirstName", SUM("Total") as TotalBought
\n FROM "Invoice" \n JOIN "Customer" ON "Invoice"."CustomerId" = "Customer"."CustomerId"\n G
ROUP BY "Customer"."CustomerId" \n ORDER BY TotalBought DESC', 'SQL_5C2E9A9B_B591_4FC4_ADD
5_829EB3188444')
df_1
Out[ ]:
Out[ ]:

FirstName totalbought

0 Helena 49.62

1 Richard 47.62

2 Luis 46.62

3 Ladislav 45.62

4 Hugh 45.62

5 Julia 43.62

6 Fynn 43.62

7 Frank 43.62

8 Astrid 42.62

9 Victor 42.62

10 Terhi 41.62

11 Johannes 40.62

12 Frantiek 40.62

13 Isabelle 40.62

14 François 39.62

15 Bjørn 39.62

16 João 39.62

17 Heather 39.62

18 Wyatt 39.62

19 Jack 39.62

20 Dan 39.62

21 Luís 39.62

22 Joakim 38.62

23 Tim 38.62

24 Dominique 38.62

25 Jennifer 38.62

26 Manoj 38.62

27 Camille 38.62

28 Kara 37.62

29 Niklas 37.62

30 Martha 37.62

31 Roberto 37.62

32 Hannah 37.62

33 Madalena 37.62

34 Eduardo 37.62

35 Edward 37.62

36 Kathy 37.62

37 Stanislaw 37.62

38 Lucas 37.62

39 Robert 37.62

40 Diego 37.62

41 Emma 37.62

42 Fernanda 37.62
43 Marc totalbought
FirstName 37.62

44 Mark 37.62

45 Aaron 37.62

46 Phil 37.62

47 Enrique 37.62

48 John 37.62

49 Ellie 37.62

50 Daan 37.62

51 Steve 37.62

52 Michelle 37.62

53 Mark 37.62

54 Patrick 37.62

55 Leonie 37.62

56 Frank 37.62

57 Alexandre 37.62

58 Puja 36.64

In [ ]:
df_2 = _deepnote_execute_sql('SELECT "CustomerId", SUM("Total") as TotalBought \n FROM "In
voice" \n GROUP BY "CustomerId" \n ORDER BY TotalBought DESC', 'SQL_5C2E9A9B_B591_4FC4_ADD5
_829EB3188444')
df_2
Out[ ]:

CustomerId totalbought

0 6 49.62

1 26 47.62

2 57 46.62

3 45 45.62

4 46 45.62

5 28 43.62

6 37 43.62

7 24 43.62

8 7 42.62

9 25 42.62

10 44 41.62

11 48 40.62

12 5 40.62

13 43 40.62

14 3 39.62

15 4 39.62

16 34 39.62

17 22 39.62

18 42 39.62

19 17 39.62

20 20 39.62

21 1 39.62
21 1 39.62
CustomerId totalbought
22 51 38.62

23 19 38.62

24 40 38.62

25 15 38.62

26 58 38.62

27 39 38.62

28 9 37.62

29 38 37.62

30 31 37.62

31 12 37.62

32 36 37.62

33 35 37.62

34 10 37.62

35 30 37.62

36 21 37.62

37 49 37.62

38 47 37.62

39 29 37.62

40 56 37.62

41 52 37.62

42 13 37.62

43 41 37.62

44 14 37.62

45 32 37.62

46 53 37.62

47 50 37.62

48 23 37.62

49 33 37.62

50 8 37.62

51 54 37.62

52 18 37.62

53 55 37.62

54 27 37.62

55 2 37.62

56 16 37.62

57 11 37.62

58 59 36.64

In [ ]:
df_7 = _deepnote_execute_sql('SELECT "InvoiceDate", COUNT(1) as TotalSold\n FROM "Invoice"
\n GROUP BY "InvoiceDate"\n ORDER BY TotalSold DESC', 'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB318
8444')
df_7

Out[ ]:

InvoiceDate totalsold
0 InvoiceDate
2012-12-28 totalsold
2

1 2013-07-02 2

2 2009-04-04 2

3 2011-04-18 2

4 2012-01-22 2

... ... ...

349 2010-05-14 1

350 2010-01-13 1

351 2010-11-24 1

352 2010-06-13 1

353 2010-06-30 1

354 rows × 2 columns

Performance optimization

EXPLAIN SELECT orders.order_id, customers.customer_name

FROM orders
INNER JOIN customers
ON orders.customer_id = customers.customer_id;

The EXPLAIN results in PostgreSQL will show the query execution plan, which is a step-by-step breakdown of
how the PostgreSQL query planner will execute the query.

Here's an example of what the results might look like:

QUERY PLAN
--------------------------------------------------------------------------------
-
Hash Join (cost=31.05..41.43 rows=1000 width=50)
Hash Cond: (orders.customer_id = customers.customer_id)
-> Seq Scan on orders (cost=0.00..8.00 rows=1000 width=26)
-> Hash (cost=20.00..20.00 rows=1000 width=28)
-> Seq Scan on customers (cost=0.00..20.00 rows=1000 width=28)

The query plan shows that PostgreSQL will use a hash join to combine the two tables.

The first step is a sequential scan of the orders table, with an estimated cost of 0.00..8.00. PostgreSQL estimates
that there are 1000 rows in the table, and it will need to read all of them to perform the join.

The second step is also a sequential scan, but this time of the customers table. This table is also estimated to
have 1000 rows, and will also need to be fully scanned to perform the join.

Finally, the query plan shows that PostgreSQL will use a hash join operation to combine the two tables, based on
the join condition orders.customer_id = customers.customer_id. The hash join has an estimated cost of
31.05..41.43, which includes the cost of reading the tables and performing the join operation.

By analyzing the query plan, we can see that the query will perform a full table scan of both the orders and
customers tables, and that a hash join is used to combine the two tables. This query plan provides useful
information for optimizing the query and improving its performance if needed.

Deep Dive

In [ ]:

df_10 = _deepnote_execute_sql('SELECT * FROM orders WHERE customer_id = 733219;', 'SQL_5

C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_10
Out[ ]:

order_id customer_id order_date total_amount

0 1000061 733219 2022-12-16 207.69

1 1021143 733219 2022-10-13 357.43

2 1316755 733219 2022-02-06 889.33

In [ ]:
df_8 = _deepnote_execute_sql('-- NORMAL QUERY\n EXPLAIN ANALYZE SELECT * FROM "orders_no_
index" WHERE "customer_id" = 733219;', 'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_8

Out[ ]:

QUERY PLAN

0 Gather (cost=1000.00..14723.14 rows=2 width=1...

1 Workers Planned: 1

2 Workers Launched: 0

3 -> Parallel Seq Scan on orders_no_index (c...

4 Filter: (customer_id = 733219)

5 Rows Removed by Filter: 999997

6 Planning Time: 1.797 ms

7 Execution Time: 384.361 ms

In this example, we can see that the query is doing a parallel sequential scan on the table "orders_no_index",
filtering rows where the "customer_id" column equals '733219'. The estimated cost of this operation is 13722.94,
and the actual execution time is 72.526 ms.

To identify which column should be indexed, we can look at the Filter line in the output. In this case, we can see
that the query is filtering rows based on the "customer_id" column. Since this is a simple equality check, we can
create an index on this column to speed up the query.

Now let's optimize with index

CREATE INDEX orders_customer_id_idx ON orders (customer_id);

In [ ]:
df_9 = _deepnote_execute_sql('EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 73
3219;', 'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_9
Out[ ]:

QUERY PLAN

Index Scan using orders_customer_id_idx on

0
ord...

1 Index Cond: (customer_id = 733219)

2 Planning Time: 0.065 ms

3 Execution Time: 0.049 ms

As you can see, the EXPLAIN ANALYZE output now shows an "Index Scan" operation on the
orders_customer_id_idx index, with a cost of 3.76. The estimated number of rows returned by the query is 2,
which didn't match the actual number of rows returned. The actual time it took to execute the query is now
much faster, at 0.057 milliseconds, compared to 72.526 milliseconds without the index.

In summary, adding an index on the customer_id column of the orders table significantly improved the
performance of the query by allowing PostgreSQL to scan the index instead of the entire table. This also will
improve JOIN operation.

QUERY

In [ ]:
df_11 = _deepnote_execute_sql('SELECT\n customers.customer_name,\n orders.order_date,\n
orders.total_amount\n FROM\n orders\n JOIN customers ON orders.customer_id = customers.cus
tomer_id\n WHERE customers.customer_id = 733219;', 'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB31884
44')
df_11

Out[ ]:

customer_name order_date total_amount

0 Customer 733219 2022-12-16 207.69

1 Customer 733219 2022-10-13 357.43

2 Customer 733219 2022-02-06 889.33

WITHOUT INDEX

In [ ]:

df_12 = _deepnote_execute_sql('EXPLAIN ANALYZE\n SELECT\n customers.customer_name,\n order

s_no_index.order_date,\n orders_no_index.total_amount\n FROM\n orders_no_index\n JOIN cust
omers ON orders_no_index.customer_id = customers.customer_id\n WHERE customers.customer_id
= 733219;', 'SQL_5C2E9A9B_B591_4FC4_ADD5_829EB3188444')
df_12

Out[ ]:

QUERY PLAN

0 Gather (cost=1000.42..14725.79 rows=2 width=2...

1 Workers Planned: 1

2 Workers Launched: 0

3 -> Nested Loop (cost=0.42..13725.59 rows=1...

4 -> Parallel Seq Scan on orders_no_ind...

5 Filter: (customer_id = 733219)

6 Rows Removed by Filter: 999997

7 -> Index Scan using customers_pkey on...

8 Index Cond: (customer_id = 733219)

9 Planning Time: 0.096 ms

10 Execution Time: 73.538 ms

WITH INDEX

In [ ]:

df_13 = _deepnote_execute_sql('EXPLAIN ANALYZE\n SELECT\n customers.customer_name,\n order

s.order_date,\n orders.total_amount\n FROM\n orders\n JOIN customers ON orders.customer_id
= customers.customer_id\n WHERE customers.customer_id = 733219;', 'SQL_5C2E9A9B_B591_4FC4_
ADD5_829EB3188444')
df_13

Out[ ]:
Out[ ]:

QUERY PLAN

0 Nested Loop (cost=0.85..6.42 rows=2 width=25)...

1 -> Index Scan using customers_pkey on custo...

2 Index Cond: (customer_id = 733219)

3 -> Index Scan using orders_customer_id_idx ...

4 Index Cond: (customer_id = 733219)

5 Planning Time: 0.096 ms

6 Execution Time: 0.074 ms

Exploration

Install Postgres & load Chinook dataset, then try to answer these questions:

Which are the top 5 most popular genres in the Chinook database? How many tracks are there for each
genre?

Who are the top 10 best-selling artists in the Chinook store? How many tracks have they sold?

What is the average purchase price per invoice for each country in the Chinook database? Which countries
have the highest and lowest average purchase prices?

What is the correlation between the length of a track and its price in the Chinook store? Are longer tracks
generally more expensive or less expensive?

How has the Chinook store's sales performance changed over time? Is it trending upwards, downwards, or
staying the same? Can you identify any patterns in the data?

Supplementary material

FInd top 90 percentile

In [ ]:

df_14 = _deepnote_execute_sql('SELECT c."CustomerId", c."FirstName", c."LastName", \n

SUM(i."Total") AS "TotalSpent",\n COUNT(DISTINCT i."InvoiceId") AS "NumPurchases",\
n COUNT(DISTINCT il."TrackId") AS "NumTracksPurchased"\n FROM "Customer" c\n JOIN "In
voice" i ON i."CustomerId" = c."CustomerId"\n JOIN "InvoiceLine" il ON il."InvoiceId" = i.
"InvoiceId"\n GROUP BY c."CustomerId"\n ORDER BY "TotalSpent" DESC\n ', 'SQL_5C2E9A9B_B591_4
FC4_ADD5_829EB3188444')
df_14
Out[ ]:

CustomerId FirstName LastName TotalSpent NumPurchases NumTracksPurchased

0 6 Helena Holý 502.62 7 38

1 26 Richard Cunningham 474.62 7 38

2 45 Ladislav Kovács 446.62 7 38

3 46 Hugh O'Reilly 446.62 7 38

4 57 Luis Rojas 415.62 7 38

5 25 Victor Stevens 404.62 7 38

6 CustomerId
7 Astrid
FirstName Gruber TotalSpent
LastName 404.62 NumPurchases
7 NumTracksPurchased
38

7 37 Fynn Zimmermann 388.62 7 38

8 24 Frank Ralston 378.62 7 38

9 5 Frantiek Wichterlová 376.62 7 38

10 43 Isabelle Mercier 376.62 7 38

11 28 Julia Barnett 370.62 7 38

12 4 Bjørn Hansen 362.62 7 38

13 17 Jack Smith 352.62 7 38

14 34 João Fernandes 352.62 7 38

15 48 Johannes Van der Berg 352.62 7 38

16 44 Terhi Hämäläinen 350.62 7 38

17 15 Jennifer Peterson 343.62 7 38

18 51 Joakim Johansson 340.62 7 38

19 42 Wyatt Girard 338.62 7 38

20 3 François Tremblay 338.62 7 38

21 1 Luís Gonçalves 338.62 7 38

22 22 Heather Leacock 338.62 7 38

23 20 Dan Miller 338.62 7 38

24 40 Dominique Lefebvre 336.62 7 38

25 58 Manoj Pareek 335.62 7 38

26 39 Camille Bernard 335.62 7 38

27 19 Tim Goyer 335.62 7 38

28 55 Mark Taylor 334.62 7 38

29 56 Diego Gutiérrez 334.62 7 38

30 52 Emma Jones 334.62 7 38

31 2 Leonie Köhler 334.62 7 38

32 8 Daan Peeters 334.62 7 38

33 9 Kara Nielsen 334.62 7 38

34 10 Eduardo Martins 334.62 7 38

35 11 Alexandre Rocha 334.62 7 38

36 12 Roberto Almeida 334.62 7 38

37 13 Fernanda Ramos 334.62 7 38

38 14 Mark Philips 334.62 7 38

39 16 Frank Harris 334.62 7 38

40 18 Michelle Brooks 334.62 7 38

41 21 Kathy Chase 334.62 7 38

42 23 John Gordon 334.62 7 38

43 27 Patrick Gray 334.62 7 38

44 29 Robert Brown 334.62 7 38

45 30 Edward Francis 334.62 7 38

46 31 Martha Silk 334.62 7 38

47 32 Aaron Mitchell 334.62 7 38

48 33 Ellie Sullivan 334.62 7 38

49 35 Madalena Sampaio 334.62 7 38

50 36 Hannah Schneider 334.62 7 38

51 CustomerId
38 Niklas
FirstName Schröder TotalSpent
LastName 334.62 NumPurchases
7 NumTracksPurchased
38

52 41 Marc Dubois 334.62 7 38

53 47 Lucas Mancini 334.62 7 38

54 49 Stanislaw Wójcik 334.62 7 38

55 50 Enrique Muñoz 334.62 7 38

56 53 Phil Hughes 334.62 7 38

57 54 Steve Murray 334.62 7 38

58 59 Puja Srivastava 331.66 6 36

In [ ]:

import pandas as pd
import plotly.express as px

# create a histogram of CLV values

fig = px.histogram(df_14, x='TotalSpent', nbins=20)
fig.show()

In [ ]:
# calculate the 90th percentile of CLV values
pct_90 = df_14['TotalSpent'].quantile(0.9)

# count the number of high-value customers

num_high_value_customers = (df_14['TotalSpent'] >= pct_90).sum()

print(f"There are {num_high_value_customers} high-value customers in the Chinook store (i

.e. the top 10%).")

There are 7 high-value customers in the Chinook store (i.e. the top 10%).

In [ ]:

df_14.loc[df_14['TotalSpent'] >= pct_90]

Out[ ]:
Out[ ]:

CustomerId FirstName LastName TotalSpent NumPurchases NumTracksPurchased

0 6 Helena Holý 502.62 7 38

1 26 Richard Cunningham 474.62 7 38

2 45 Ladislav Kovács 446.62 7 38

3 46 Hugh O'Reilly 446.62 7 38

4 57 Luis Rojas 415.62 7 38

5 25 Victor Stevens 404.62 7 38

6 7 Astrid Gruber 404.62 7 38

In [ ]:

Created in Deepnote

Understanding Joins in Database Management Systems
No ratings yet
Understanding Joins in Database Management Systems
7 pages
Joins SQL
No ratings yet
Joins SQL
16 pages
Joins DBMS
No ratings yet
Joins DBMS
21 pages
Orderid Customerid Employeeid Orderdate: Joins
No ratings yet
Orderid Customerid Employeeid Orderdate: Joins
10 pages
Writeups - Assignment 4
No ratings yet
Writeups - Assignment 4
32 pages
DB Presentation
No ratings yet
DB Presentation
42 pages
Idbslab 10
No ratings yet
Idbslab 10
10 pages
DBMS Experiment - Lab 5
No ratings yet
DBMS Experiment - Lab 5
26 pages
DBMS Lab 5
No ratings yet
DBMS Lab 5
8 pages
CS DBMS 5
No ratings yet
CS DBMS 5
5 pages
DBMS Exp 3
No ratings yet
DBMS Exp 3
8 pages
Mysql Joins: by Diksha Nagpal
No ratings yet
Mysql Joins: by Diksha Nagpal
32 pages
CHAPTER 8. Display Data From Multiple Tables
No ratings yet
CHAPTER 8. Display Data From Multiple Tables
6 pages
SQL Joins 1713036947
No ratings yet
SQL Joins 1713036947
29 pages
Lecture 06 Joins
No ratings yet
Lecture 06 Joins
44 pages
Lab07 Mysql
No ratings yet
Lab07 Mysql
12 pages
Comparative Analysis of Join Types Join: ID Name
No ratings yet
Comparative Analysis of Join Types Join: ID Name
8 pages
12.SQL Queiresjoins
No ratings yet
12.SQL Queiresjoins
36 pages
Snowflake
No ratings yet
Snowflake
12 pages
ASSIGNMENT Database
No ratings yet
ASSIGNMENT Database
6 pages
Database - Lec4 - Joins and Subqueries in SQL
No ratings yet
Database - Lec4 - Joins and Subqueries in SQL
13 pages
SQL Joins
No ratings yet
SQL Joins
6 pages
Chat GPT
No ratings yet
Chat GPT
3 pages
SQL Joins - Odp
No ratings yet
SQL Joins - Odp
33 pages
SQL Join
No ratings yet
SQL Join
11 pages
Lec08 - SQL JOIN
No ratings yet
Lec08 - SQL JOIN
30 pages
Join Operations
No ratings yet
Join Operations
8 pages
Cartesian Product - EQUI - JOIN
No ratings yet
Cartesian Product - EQUI - JOIN
5 pages
SQL Session 5v3
No ratings yet
SQL Session 5v3
25 pages
SQL Join
No ratings yet
SQL Join
5 pages
SQL Interview Questions
No ratings yet
SQL Interview Questions
13 pages
Introduction To Join Types
No ratings yet
Introduction To Join Types
9 pages
Joins Subquery
No ratings yet
Joins Subquery
17 pages
13.1 Joins in MySQL
No ratings yet
13.1 Joins in MySQL
32 pages
Joins
No ratings yet
Joins
6 pages
Week 4 V 3
No ratings yet
Week 4 V 3
12 pages
Joins in Database
No ratings yet
Joins in Database
6 pages
SQL - Joins
No ratings yet
SQL - Joins
5 pages
Week 3SQL
No ratings yet
Week 3SQL
6 pages
SQL Joins
No ratings yet
SQL Joins
6 pages
Week 4
No ratings yet
Week 4
12 pages
SQL Joins Tutorial: Cross Join, Full Outer Join, Inner Join, Left Join, and Right Join
No ratings yet
SQL Joins Tutorial: Cross Join, Full Outer Join, Inner Join, Left Join, and Right Join
22 pages
SQL Topic 7 - LECTURE 3
No ratings yet
SQL Topic 7 - LECTURE 3
25 pages
SQL Joins
50% (2)
SQL Joins
24 pages
Types of SQL JOins
No ratings yet
Types of SQL JOins
3 pages
08 Joins
No ratings yet
08 Joins
7 pages
Transcript: SQL Joins What Is SQL Join?
No ratings yet
Transcript: SQL Joins What Is SQL Join?
6 pages
SQL Joins
No ratings yet
SQL Joins
4 pages
SQL - Using Joins SQL - Using Joins: Table 1 CUSTOMERS Table
No ratings yet
SQL - Using Joins SQL - Using Joins: Table 1 CUSTOMERS Table
2 pages
6.SQL JOINS Operations
No ratings yet
6.SQL JOINS Operations
2 pages
SQL Join Queries
No ratings yet
SQL Join Queries
3 pages
Joins
No ratings yet
Joins
16 pages
SQL Using Joins PDF
No ratings yet
SQL Using Joins PDF
2 pages
Joins
No ratings yet
Joins
6 pages
Joins
No ratings yet
Joins
9 pages
SQL JOIN Types Explained
No ratings yet
SQL JOIN Types Explained
8 pages
Joins
No ratings yet
Joins
8 pages
SQL Lab Assignment
No ratings yet
SQL Lab Assignment
6 pages
Exp 7
No ratings yet
Exp 7
6 pages
Alteon ADC Professional Lab Manual 2018-Dec5
No ratings yet
Alteon ADC Professional Lab Manual 2018-Dec5
62 pages
Tuning PostgreSQL With Pgbench
No ratings yet
Tuning PostgreSQL With Pgbench
11 pages
Web Services Material - Sriman Latest PDF
No ratings yet
Web Services Material - Sriman Latest PDF
300 pages
Ibm Hardware Innovation: Gse Zexpertenforum April 2-3, 2019
No ratings yet
Ibm Hardware Innovation: Gse Zexpertenforum April 2-3, 2019
27 pages
Pci Dss Self Assessment Certificate
No ratings yet
Pci Dss Self Assessment Certificate
1 page
ITS OD 102 Network Security
0% (1)
ITS OD 102 Network Security
2 pages
Front-End Developer Handbook 2019 - Compressed - Removed-1
No ratings yet
Front-End Developer Handbook 2019 - Compressed - Removed-1
49 pages
Oracle Golden Gate (GG) vs. Oracle Stream: Sanjay Naik
No ratings yet
Oracle Golden Gate (GG) vs. Oracle Stream: Sanjay Naik
4 pages
Reader Software Specification-2022-2023
No ratings yet
Reader Software Specification-2022-2023
191 pages
Unit 4 - Memory Organization
No ratings yet
Unit 4 - Memory Organization
127 pages
ISACA
No ratings yet
ISACA
32 pages
Ex - No: 1 Study of Basic Network Commands
No ratings yet
Ex - No: 1 Study of Basic Network Commands
6 pages
Ccna Semester 1 V 3.0
No ratings yet
Ccna Semester 1 V 3.0
69 pages
Chapter-10 Working With Multiple Tables
No ratings yet
Chapter-10 Working With Multiple Tables
4 pages
Sales and Inventory MGT Sys
No ratings yet
Sales and Inventory MGT Sys
86 pages
Image Classification With Convolutional Neural Networks: Plotting
No ratings yet
Image Classification With Convolutional Neural Networks: Plotting
16 pages
Chapter 2
No ratings yet
Chapter 2
29 pages
Pig Operations Load Store Dump Describe
No ratings yet
Pig Operations Load Store Dump Describe
8 pages
Rules
No ratings yet
Rules
84 pages
Adbms PDF
No ratings yet
Adbms PDF
199 pages
L6. System Software
No ratings yet
L6. System Software
9 pages
Advanced Database Operation
No ratings yet
Advanced Database Operation
8 pages
Introducing Smart View For Google Workspace For Cloud EPM
No ratings yet
Introducing Smart View For Google Workspace For Cloud EPM
22 pages
822pm - 26.EPRA JOURNALS-5445
No ratings yet
822pm - 26.EPRA JOURNALS-5445
5 pages
Hackathon Project Report
No ratings yet
Hackathon Project Report
2 pages
DBT Materialization1
No ratings yet
DBT Materialization1
3 pages
Past Paper ITE1712-19S1 - Attempt Review
No ratings yet
Past Paper ITE1712-19S1 - Attempt Review
30 pages
How To Make A Simple Blog or Eportfolio in Google Blogger
No ratings yet
How To Make A Simple Blog or Eportfolio in Google Blogger
6 pages
CS231n Convolutional Neural Networks For Visual Recognition 2
No ratings yet
CS231n Convolutional Neural Networks For Visual Recognition 2
12 pages
Using TKProf To Compare Actual and Predicted Row Counts
No ratings yet
Using TKProf To Compare Actual and Predicted Row Counts
2 pages
IOT and ERP SYSTEM
No ratings yet
IOT and ERP SYSTEM
10 pages
Peningkatan Kualitas Pengelolaan Jurnal Ilmiah: Model Dashboard Information System Untuk
No ratings yet
Peningkatan Kualitas Pengelolaan Jurnal Ilmiah: Model Dashboard Information System Untuk
8 pages
sqqs2013 Group Assignment
No ratings yet
sqqs2013 Group Assignment
6 pages
Hospital Management Software Development: Olawale Ayotunde Sobogungod
No ratings yet
Hospital Management Software Development: Olawale Ayotunde Sobogungod
3 pages