Project Introduction: Chinook Database


Project: SQL Project

Concepts

1. Project Introduction
2. Set Up the Database
3. Checking Setup
4. Schema Match
5. SQL: Question Set 1
6. SQL: Question Set 2
7. (Advanced) SQL: Question Set 3
8. Different Date Functions for the Project
9. Project Submission - IMPORTANT
10. Common Mistakes
11. Project: SQL Project

Project Introduction

Chinook Database

Introduction
In this project, you will query the Chinook Database, which holds information about a music
store. You will be assisting the Chinook team with understanding the media in their store,
their customers and employees, and their invoice information. To assist you in the queries
ahead, the schema for the Chinook Database is provided below. The arrows show the columns
that link tables together.
Each of the instructions below is discussed in detail later in this lesson. This page serves
as a quick reference for what you will be doing as you work through the project.

Instructions
You will need to follow the instructions in the next three concepts to get the Chinook
database up and running on your own machine, and to check that it is set up correctly.
There are two parts to this project.

1. The first part is a series of questions that will ensure you have mastered the
main concepts taught throughout the SQL lessons. Though these questions will not be
"graded" by a reviewer, they will help you self-assess.

2. The second part is a presentation. Similar to the first project, there isn't a
'right answer' for the second portion of the project. You are free to be creative in the
questions you ask. You will then write a SQL query to pull the data needed to answer each
question, and use that data to build a visual (bar chart, histogram, or another plot) that
answers it. The essentials of your project submission are discussed in the next sections.
So that your presentation can be reviewed, you will need to save your slides as a PDF.

Project Walkthrough
This video shows one of our instructors giving some great hints and tips for how to get
started with your project: Walkthrough Video

Common Questions and Answers:


https://knowledge.udacity.com/?nanodegree=5ce69a5c-496f-11e8-b509-932e3a4da789&page=1&project=5541b6b0-496f-11e8-b323-17cf4b56eb6a


Set Up the Database



Let's set up your local environment


It's totally fine to get started learning SQL here in the classroom, but the way to really
master your skills is to get a local setup and learn to work within your own environment.

The next few problems are going to help make sure you are comfortable working locally.
Once you're set up you'll be able to use this workspace not only for this project...but
BEYOND!

The environment we'll use is pretty quick to set up and hopefully you'll be up and running in
no time :)

All you'll need to do here is...

- Download your new database
- Download DB Browser for SQLite

As soon as you have DB Browser for SQLite connected to your new database, you're ready
for the next page!

Download DB Browser for SQLite


There are many different database browsers that work with different types of databases. For
this course, we'll be using the DB Browser for SQLite. The other browsers you may use will
likely be very similar.

DB Browser for SQLite can be downloaded here: http://sqlitebrowser.org/


Download Database
You can download the database we will be using for this project from the bottom of the
page.

Connect the Browser to the Database


Here are the steps:

- Open up DB Browser for SQLite
- Click on Open Database
- Navigate to the Chinook.db file (probably in your downloads folder)
- Click on the Execute SQL tab
- Start querying your data

Start Querying Your Data
The database Entity Relationship Diagram was provided in the previous concept, but you
can also find it on the Chinook database homepage.
Once it looks like you have it all set up, you can start querying your database! First, we
could have a look at all the data from the Invoice table:
SELECT * FROM Invoice;
Your first query, AWESOME!

Now check out what's in the Employee table.


SELECT * FROM Employee;
Looks like you are ready to take on this PROJECT and the WORLD! Everything you
have been studying is going to come in handy now!
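
If you would rather confirm the setup from a script than from the DB Browser window, one option is to ask SQLite which tables it can see. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, an in-memory stand-in database with two placeholder tables (so it runs without the download), and a helper name, list_tables, that is made up for this example. To check your real file, you would connect to the path of your Chinook.db instead.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the connected database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in database (assumption): replace ":memory:" with the path to your
# downloaded Chinook.db file to inspect the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, Total REAL);")
conn.execute("CREATE TABLE Employee (EmployeeId INTEGER PRIMARY KEY, LastName TEXT);")

tables = list_tables(conn)
print(tables)
```

If the Invoice and Employee tables show up in the list for your own file, the two queries above should run as well.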
Supporting Materials
 chinook-db


Checking Setup

Schema Match
QUIZ QUESTION

Which of these columns are in the Invoice table?

- CustomerID
- InvoiceID
- UnitPrice
- BillingAddress
- Name
- BillingPostalCode

SQL: Question Set 1



Question 1: Which countries have the most Invoices?


Use the Invoice table to determine the countries that have the most invoices. Provide a
table of BillingCountry and Invoices ordered by the number of invoices for each country.
The country with the most invoices should appear first.

Check Your Solution


Your solution should have 2 columns and 24 rows. The below image shows the first rows of
your final table. The Invoices column is a count of the number of invoices for each
country. It should be sorted from most to least.

Question 2: Which city has the best customers?
We would like to throw a promotional Music Festival in the city where we made the most money.
Write a query that returns the 1 city that has the highest sum of invoice totals. Return both
the city name and the sum of all invoice totals.
Check Your Solution
The top city for Invoice dollars was Prague with an amount of 90.24.

Question 3: Who is the best customer?


The customer who has spent the most money will be declared the best customer. Build a
query that returns the person who has spent the most money. I found the solution by linking
the following three: Invoice, InvoiceLine, and Customer tables to retrieve this information,
but you can probably do it with fewer!

Check Your Solution


The customer who spent the most according to invoices was Customer 6 with 49.62 in
purchases.

SQL: Question Set 2



Question 1
Use your query to return the email, first name, last name, and Genre of all Rock Music
listeners. Return your list ordered alphabetically by email address, starting with A.
I chose to link information from the Customer, Invoice, InvoiceLine, Track,
and Genre tables, but you may be able to find another way to get at the information.

Check Your Solution


From my query, I found that all of the customers have a connection to Rock music (you
can see this by comparing your result with the number of rows in the Customer table). Your
final table should have 59 rows and 4 columns (if you want to check the connection to
'Rock' music). The header of this table is provided below.

Question 2: Who is writing the rock music?
Now that we know that our customers love rock music, we can decide which musicians to
invite to play at the concert.

Let's invite the artists who have written the most rock music in our dataset. Write a query
that returns the Artist name and total track count of the top 10 rock bands.
You will need to use the Genre, Track , Album, and Artist tables.
Check Your Solution
The top 10 bands are shown below along with the number of songs each band has on
record.

Question 3
First, find which artist has earned the most according to the InvoiceLines.
Now use this artist to find which customer spent the most on this artist.

For this query, you will need to use the Invoice, InvoiceLine, Track, Customer, Album,
and Artist tables.
Note that this one is tricky: the Total in the Invoice table might cover more than one
product, so you need to use the InvoiceLine table to find out how many of each product
were purchased, and then multiply this by the unit price to get the amount spent on each
artist.

Check Your Solution


The top artists according to invoice amounts are shown in the table below, with Iron
Maiden at the very top.

Solution Continued with top Purchaser

Then, the top purchasers are shown in the table below. The customer with the highest total
invoice amount is customer 55, Mark Taylor.

(Advanced) SQL: Question Set 3



Advanced SQL
To solve the questions here, you will need to write queries that extend beyond the content
covered in these lessons. These questions are simply here to show you that there are
extensions of the material we have already covered, and you definitely have the building
blocks to tackle these tougher topics! They are given as additional material to challenge
you. Each of the questions below requires the tools you are already familiar with, but
they also use a new method known as a SUBQUERY.
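
To make the idea concrete without giving away any of the questions, here is a minimal sketch of a subquery: a SELECT nested inside another query. It assumes Python's built-in sqlite3 module and a tiny, made-up Invoice table with invented totals rather than the real Chinook data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data (assumption): four invoices with invented totals.
conn.execute("CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, Total REAL);")
conn.executemany(
    "INSERT INTO Invoice (InvoiceId, Total) VALUES (?, ?);",
    [(1, 2.0), (2, 4.0), (3, 9.0), (4, 13.0)],
)

# The inner SELECT (the subquery) computes the average once; the outer query
# compares every row against it, so no number is hard coded.
above_average = conn.execute(
    """
    SELECT InvoiceId, Total
    FROM Invoice
    WHERE Total > (SELECT AVG(Total) FROM Invoice)
    ORDER BY Total DESC;
    """
).fetchall()
print(above_average)
```

Because the average is recomputed each time the query runs, the result stays correct as new rows are added.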

Question 1
We want to find out the most popular music Genre for each country. We determine the most
popular genre as the genre with the highest number of purchases. Write a query that returns
each country along with the top Genre. For countries where the maximum number of
purchases is shared, return all Genres.

For this query, you will need to use the Invoice, InvoiceLine, Customer, Track,
and Genre tables.

Check Your Solution


Though there are only 24 countries, your query should return 25 rows. The first 11 rows are
shown in the image below. Notice that Argentina has 2 Genres that share the maximum.

Question 2
Return all the track names that have a song length longer than the average song length.
Though you could perform this with two queries, imagine you wanted your query to update
whenever new data is put in the database. Therefore, you do not want to hard code
the average into your query. You only need the Track table to complete this query.
Return the Name and Milliseconds for each track. Order by the song length with the
longest songs listed first.

Check Your Solution


Below is an image of what the top ten rows of your table should look like. There should only
be 494 of the 3503 tracks in your table.

Question 3
Write a query that determines the customer that has spent the most on music for each
country. Write a query that returns the country along with the top customer and how much
they spent. For countries where the top amount spent is shared, provide all customers who
spent this amount.

You should only need to use the Customer and Invoice tables.

Check Your Solution


Though there are only 24 countries, your query should return 25 rows. The last 11 rows are
shown in the image below. Notice that the United Kingdom has 2 customers that share the
maximum.

Different Date Functions for the Project



Dates in SQLite (the project) Differ From Postgres (the classroom)
In the project you are working with slightly different SQL syntax than in the classroom.
Though most of the commands and logic will carry over directly, there are some differences
between SQLite (used for this project) and PostgreSQL (used in the classroom).
The differences most likely to affect you relate to date functionality.

Postgres SQL DATE_TRUNC

SELECT DATE_TRUNC('month', o.occurred_at) ord_date
FROM orders o
This truncates each date to the start of its month, so only the year and month of the date
field remain meaningful in the query results.

SQLite version of DATE_TRUNC is STRFTIME

SELECT STRFTIME('%Y-%m', o.occurred_at) ord_date
FROM orders o
This returns only the year and month of the date field in the query results. In SQLite
we have to describe the date format more precisely, since STRFTIME returns only the
pieces we specify. We specify them by putting the parts of the date we want in our final
table within single quotes.

For this query we wanted only the year and month: %Y stands for year and %m stands for
month. The full list of format codes is below.

%d - day of month: 01-31
%f - fractional seconds: SS.SSS
%H - hour: 00-24
%j - day of year: 001-366
%J - Julian day number
%m - month: 01-12
%M - minute: 00-59
%s - seconds since 1970-01-01
%S - seconds: 00-59
%w - day of week 0-6 with Sunday==0
%W - week of year: 00-53
%Y - year: 0000-9999

Postgres SQL DATE_PART

SELECT DATE_PART('month', occurred_at) ord_month
FROM orders
This returns only the month of the date field in the query results.

SQLite version of DATE_PART is STRFTIME

SELECT STRFTIME('%m', o.occurred_at) ord_month
FROM orders o
Since we only want to pull the month out, we can use the same STRFTIME function in SQLite.
We just have to use the %letter notation to specify which part we want, so here we use
'%m' instead of 'month'. Note that STRFTIME returns text (for example '03'), whereas
DATE_PART returns a number.
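
As a rough illustration of both uses of STRFTIME, the sketch below groups a few invented invoices by month and also pulls out the month on its own. It assumes Python's built-in sqlite3 module and a miniature Invoice table with an InvoiceDate column stored as text; the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample rows (assumption): dates stored as 'YYYY-MM-DD HH:MM:SS' text.
conn.execute(
    "CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, InvoiceDate TEXT, Total REAL);"
)
conn.executemany(
    "INSERT INTO Invoice VALUES (?, ?, ?);",
    [
        (1, "2009-01-01 00:00:00", 2.0),
        (2, "2009-01-19 00:00:00", 4.0),
        (3, "2009-02-01 00:00:00", 1.0),
    ],
)

# '%Y-%m' plays the role of DATE_TRUNC('month', ...): one group per year and month.
monthly = conn.execute(
    """
    SELECT STRFTIME('%Y-%m', InvoiceDate) AS invoice_month,
           SUM(Total) AS revenue
    FROM Invoice
    GROUP BY invoice_month
    ORDER BY invoice_month;
    """
).fetchall()

# '%m' plays the role of DATE_PART('month', ...): the month alone, as text.
month_only = conn.execute(
    "SELECT DISTINCT STRFTIME('%m', InvoiceDate) FROM Invoice ORDER BY 1;"
).fetchall()

print(monthly)
print(month_only)
```

The month comes back as a two-character string, so cast it (for example with CAST(... AS INTEGER)) if you need to do arithmetic on it.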

Here are some helpful links to assist with working with dates in SQLite.

- https://www.techonthenet.com/sqlite/functions/strftime.php
- https://sqlite.org/lang_datefunc.html

Should you have to work in another SQL environment in the future, such as Microsoft SQL
Server, Oracle, or MySQL, there are likely to again be subtle differences. With your
current skills, a quick Google search will likely help you transfer what you know to any
of these environments very quickly.

If you are still having difficulty using date functions in SQLite and want to use them in
your project, please ask a question in Study Groups.


Project Submission - IMPORTANT



Project submission

Presentations
You are now on the portion of the project you will need to submit to a reviewer. To pass this
project follow the below instructions to create a presentation.

Your presentation should include:

- Four slides
- One visualization per slide
- A 1-2 sentence explanation of each slide
- The SQL query used to create the data used in the visualization

Note: you may choose to use queries that were motivated by the questions on the
previous concepts, or you may choose four entirely new questions. However, if you
use any of the previous queries, they must be those that had a JOIN as stated in
the Rubric.
The submission template is a Google Slides file. Make a copy of the submission template to
complete your project. We suggest you use the layout provided, though it is not a
requirement.

Queries
Please include a plain text file (created with Notepad, Notepad++, or Atom, for example)
that contains each of the queries used to create the visualizations. Format your queries
for readability; this tool can help: http://www.sql-format.com/
Put your text file and presentation in a folder and zip it. Then submit the zipped folder
for your project. A slide template is provided here:
SUBMISSION TEMPLATE

If you cannot access Google Slides, please scroll to the bottom to download the PowerPoint
file of the template.

Visualizations
We suggest you use a spreadsheet application, such as Excel or Google Sheets, to create
your visualizations. However, you’re welcome to use whatever tool you’d like. Your
visualizations could be any that you learned about in the previous lesson. Below is one
example, and a link has been provided to an example slide.
You should have four slides that are similar to the below submission, but the questions you
ask are up to you, and all four of your final submitted queries should contain
a JOIN and an aggregation. Look at the Rubric to verify you have met all of the
requirements for this submission.
SUBMISSION SLIDE EXAMPLE

How to Get Data Into Excel


In order to create visualizations like those shown in the link above, you will need to
move your data out of SQL and into Excel (or another spreadsheet application).

To export the results of your queries from DB Browser, use the button below and to the right
of the results window. Select Export to CSV, and then select the settings that match the
ones below. Make sure the New line characters setting is set to Unix: LF (\n).
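
If you prefer to script the export rather than click through DB Browser, the same result can be approximated in code. This is only a sketch under assumptions: it uses Python's built-in sqlite3 and csv modules, a made-up miniature Track table, and an in-memory text buffer in place of a real file. The lineterminator argument mirrors the Unix: LF (\n) setting described above.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data (assumption): three tracks spread over two genre ids.
conn.execute("CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER);")
conn.executemany("INSERT INTO Track VALUES (?, ?);", [(1, 1), (2, 1), (3, 2)])

cursor = conn.execute(
    """
    SELECT GenreId, COUNT(*) AS Tracks
    FROM Track
    GROUP BY GenreId
    ORDER BY GenreId;
    """
)
# cursor.description holds one entry per result column; item 0 is its name.
header = [column[0] for column in cursor.description]

# An in-memory buffer stands in for a file here. For a real file, you would use
# open("results.csv", "w", newline="") so Python does not alter the line endings.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")  # Unix-style LF line endings
writer.writerow(header)
writer.writerows(cursor)

csv_text = buffer.getvalue()
print(csv_text)
```

The resulting text opens directly in Excel or Google Sheets, with the column names in the first row.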

Additional Guidelines
- There shouldn’t be any additional data prep (sorting, filtering, renaming, etc.)
between the query output and the visualization.
- All four of your queries must include at least one join and an aggregation.
- Review your project against the project Rubric. Reviewers will use this to evaluate
your work.
- The first part of this project is aimed at helping you understand the database, so you
can ask interesting questions in the second part. Feel free to use and expand upon the
queries you wrote in the first part.
- Once you've finished your project, submit the presentation as a PDF and the queries
as a .txt file.
- Don't be afraid to challenge yourself! Try to combine the SQL concepts you know!

So that your presentation can be reviewed, save your slides as a PDF. You
can do this from within Google Slides by selecting File > Download as > PDF Document.
Supporting Materials
 SQL Project Submission Template


Common Mistakes

Invoice.Total vs InvoiceLine.UnitPrice and InvoiceLine.Quantity

Invoice.Total
This contains the total amount a customer paid in the transaction. A customer may have
bought several different genres, albums, and songs in one transaction.

InvoiceLine.UnitPrice and InvoiceLine.Quantity

The InvoiceLine is connected to the Invoice by joining on Invoice.InvoiceId =
InvoiceLine.InvoiceId. The InvoiceLine.UnitPrice is the amount that a particular song costs
on that invoice, and InvoiceLine.Quantity is how many of that track were purchased on that
invoice. There can be multiple InvoiceLines in each Invoice.

So if you are looking for the amount of genre sales, you will need to use InvoiceLine to
calculate it, since a single Invoice.Total might cover multiple genres.
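
The difference is easiest to see on a tiny example. The sketch below assumes Python's built-in sqlite3 module and a made-up miniature version of the Genre, Track, Invoice, and InvoiceLine tables (with the quantity column named Quantity, as in the standard Chinook schema). It computes genre sales once from the invoice lines and once, misleadingly, from Invoice.Total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data (assumption): invoice 1 mixes a Rock and a Jazz track.
conn.executescript(
    """
    CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, GenreId INTEGER);
    CREATE TABLE Invoice (InvoiceId INTEGER PRIMARY KEY, Total REAL);
    CREATE TABLE InvoiceLine (
        InvoiceLineId INTEGER PRIMARY KEY,
        InvoiceId INTEGER,
        TrackId INTEGER,
        UnitPrice REAL,
        Quantity INTEGER
    );
    INSERT INTO Genre VALUES (1, 'Rock'), (2, 'Jazz');
    INSERT INTO Track VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO Invoice VALUES (1, 3.0), (2, 2.0);
    INSERT INTO InvoiceLine VALUES
        (1, 1, 1, 1.0, 1),
        (2, 1, 2, 1.0, 2),
        (3, 2, 3, 2.0, 1);
    """
)

# Per-line amounts: each line contributes only its own UnitPrice * Quantity.
genre_sales = conn.execute(
    """
    SELECT Genre.Name, SUM(InvoiceLine.UnitPrice * InvoiceLine.Quantity) AS Sales
    FROM InvoiceLine
    JOIN Track ON Track.TrackId = InvoiceLine.TrackId
    JOIN Genre ON Genre.GenreId = Track.GenreId
    GROUP BY Genre.Name
    ORDER BY Sales DESC;
    """
).fetchall()

# Misleading version: the whole invoice total is credited to every genre on it.
misleading = conn.execute(
    """
    SELECT Genre.Name, SUM(Invoice.Total) AS Sales
    FROM Invoice
    JOIN InvoiceLine ON InvoiceLine.InvoiceId = Invoice.InvoiceId
    JOIN Track ON Track.TrackId = InvoiceLine.TrackId
    JOIN Genre ON Genre.GenreId = Track.GenreId
    GROUP BY Genre.Name
    ORDER BY Sales DESC;
    """
).fetchall()

print(genre_sales)
print(misleading)
```

In this sample the two invoices add up to 5.0, which matches the per-line figures; the Invoice.Total version reports 8.0 in total because the mixed invoice is counted once for each genre on it.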

Aggregations
Be careful with Aggregations! You need to include all of the columns you are returning other
than the aggregation in your group by statement.

Right:

Select Album.title, Count(*)
From Album
Group by Album.title
- This returns 347 rows, with the title of the album and the number of albums with that
name

Wrong:

Select Album.title, Count(*)
From Album
- This returns 1 row: the count of all albums, paired with a single album title that SQLite
picks arbitrarily.
If a column in the select statement is not in the Group By statement, your results will be
something you are not expecting. Please be careful of this!
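
To see the two behaviours side by side, here is a small sketch assuming Python's built-in sqlite3 module and a made-up Album table with three rows, two of which share a title. It runs the grouped and the ungrouped query against the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data (assumption): two albums share the title 'Greatest Hits'.
conn.execute("CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT);")
conn.executemany(
    "INSERT INTO Album VALUES (?, ?);",
    [(1, "Greatest Hits"), (2, "Live"), (3, "Greatest Hits")],
)

# Right: every non-aggregated column in SELECT also appears in GROUP BY.
grouped = conn.execute(
    """
    SELECT Album.Title, COUNT(*)
    FROM Album
    GROUP BY Album.Title
    ORDER BY Album.Title;
    """
).fetchall()

# Wrong: no GROUP BY, so SQLite collapses the table into a single row and
# pairs the overall count with a title taken from an arbitrary row.
ungrouped = conn.execute("SELECT Album.Title, COUNT(*) FROM Album;").fetchall()

print(grouped)
print(ungrouped)
```

Many other databases reject the second query outright; SQLite runs it, which is why the mistake is easy to miss.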

SubQueries
Subqueries are awesome, but you should not use them if you do not need them to answer the
question you asked. Often the first question you think of does not require one.
You may need to think of a few more to find a complex question that necessitates a
subquery. Though writing a subquery is not required, you are encouraged to challenge
yourself by submitting one if you would like!

Joins
Joins in general should be from a Primary key to its corresponding Foreign key.

Right:  ON Track.TrackId = InvoiceLine.TrackId
- Track PrimaryKey = InvoiceLine ForeignKey for Track

Wrong:  ON Track.TrackId = InvoiceLine.InvoiceLineId
- Track PrimaryKey does not equal InvoiceLine PrimaryKey

Understanding the data


All of these purchases are of songs; no purchases are of entire albums. So if you are trying
to show which album has the most sales, you would need to clarify this so that it is not
misleading. You would have to show which album has the most songs sold off of it.
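
One way to count songs sold per album, joining each primary key to its matching foreign key, might look like the sketch below. It assumes Python's built-in sqlite3 module and a made-up miniature version of the Album, Track, and InvoiceLine tables; the album titles and rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data (assumption): Album A has two tracks, Album B has one.
conn.executescript(
    """
    CREATE TABLE Album (AlbumId INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Track (TrackId INTEGER PRIMARY KEY, Name TEXT, AlbumId INTEGER);
    CREATE TABLE InvoiceLine (
        InvoiceLineId INTEGER PRIMARY KEY,
        TrackId INTEGER,
        Quantity INTEGER
    );
    INSERT INTO Album VALUES (1, 'Album A'), (2, 'Album B');
    INSERT INTO Track VALUES (1, 'Song 1', 1), (2, 'Song 2', 1), (3, 'Song 3', 2);
    INSERT INTO InvoiceLine VALUES (1, 1, 1), (2, 2, 1), (3, 3, 1), (4, 1, 1);
    """
)

# Each join goes from a primary key to the foreign key that references it:
# Album.AlbumId -> Track.AlbumId and Track.TrackId -> InvoiceLine.TrackId.
songs_sold = conn.execute(
    """
    SELECT Album.Title, SUM(InvoiceLine.Quantity) AS SongsSold
    FROM Album
    JOIN Track ON Track.AlbumId = Album.AlbumId
    JOIN InvoiceLine ON InvoiceLine.TrackId = Track.TrackId
    GROUP BY Album.AlbumId, Album.Title
    ORDER BY SongsSold DESC;
    """
).fetchall()
print(songs_sold)
```

Labeling the result column SongsSold (rather than something like AlbumSales) keeps the chart honest about what was actually purchased.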

Unrecognized Token
You may get an error that says "unrecognized token". This usually comes from mismatched or
unbalanced single and double quotes around words in your query.

In SQLite, single quotes ('') are used to denote strings and double quotes ("") are used to
denote identifiers such as column names. Quoting a column name is rarely needed, but
sometimes people will have column names with spaces in them, and using double quotes
allows you to still reference that column.

Correct  SELECT * FROM Genre WHERE Genre.name = 'Rock'

Avoid  SELECT * FROM Genre WHERE Genre.name = "Rock"

SQLite may still accept the second form for backward compatibility, but it is non-standard
and will quietly do something else if a column named Rock exists.
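
For reference, standard SQL (which SQLite follows here) treats single quotes as string literals and double quotes as identifiers such as column names. The sketch below shows what that means in practice. It assumes Python's built-in sqlite3 module and a made-up Genre table; the "Display Name" column is invented purely to show an identifier containing a space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample table (assumption); "Display Name" is a hypothetical column.
conn.execute(
    'CREATE TABLE Genre (GenreId INTEGER PRIMARY KEY, Name TEXT, "Display Name" TEXT);'
)
conn.executemany(
    "INSERT INTO Genre VALUES (?, ?, ?);",
    [(1, "Rock", "Rock and Roll"), (2, "Jazz", "Jazz")],
)

# Single quotes: 'Rock' is a string literal, so exactly one row matches.
rock = conn.execute("SELECT Name FROM Genre WHERE Name = 'Rock';").fetchall()

# Double quotes: "Name" is read as the column Name, so every row equals itself.
pitfall = conn.execute(
    'SELECT Name FROM Genre WHERE Name = "Name" ORDER BY GenreId;'
).fetchall()

# Double quotes are the right tool for a column name that contains a space.
display = conn.execute(
    """SELECT "Display Name" FROM Genre WHERE Name = 'Rock';"""
).fetchall()

# An unbalanced quote cannot be tokenized, which triggers the error message.
try:
    conn.execute("SELECT * FROM Genre WHERE Name = 'Rock")
    error_message = ""
except sqlite3.Error as exc:
    error_message = str(exc)

print(rock, pitfall, display)
print(error_message)
```

If you see the error in your own work, check first that every opening quote has a matching closing quote of the same kind.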


SQL Project
Project Submission
To submit your project, please do the following:

- Review your project against the project Rubric. Reviewers will use this to evaluate
your work.
- Create your slides with whatever presentation software you'd like (e.g. Google
Slides, PowerPoint, Keynote, etc.). So that your presentation can be reviewed, save your
slides as a PDF. You can do this from within Google Slides by selecting
File > Download as > PDF Document.
- Create a separate text file with each of the SQL queries used to create the
visualizations.
- Save the presentation as a PDF and the SQL queries in a text file (.txt) in the same
folder.
- Zip (compress) the folder and submit this zipped folder with both files in it.
