Assignment 2

Uploaded by

bachtyar
Copyright
© All Rights Reserved

Assignment 2 « SQL »

Brief
Due date: Two weeks after the lecture

For reference, this is a very useful inventory of SQL commands and their usage: https://www.w3schools.com/sql/

Data: https://www.kaggle.com/tmdb/tmdb-movie-metadata

This is an individual assignment. Part 1 is compulsory, whereas Parts 2 and 3 are bonus.

File to submit: SQL or text files. Please do not hand in the data files!

Please submit through the assignment submission link in e-learning.

SQLite
You can download SQLite from the internet. It can be accessed from the command line using sqlite3. Running sqlite3
somedb.db from your terminal will launch an environment that allows you to type your SQL queries directly into the
terminal. You can exit this environment by pressing Ctrl+D or by typing .exit and pressing Enter.

As a more explicit example, to open a SQL environment where you can query the movies.db database, you can type:

$ sqlite3 movies.db

To execute a SQL statement that you have saved in a solution file, you can run the following command:

$ sqlite3 movies.db < sql_solutions.sql
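
If the sqlite3 command-line tool is not at hand, a similar workflow can be approximated with Python's built-in sqlite3 module. The sketch below is only an illustration: the table demo and the file names demo.db and demo_solution.sql are made up for this example and are not part of the assignment.

```python
import os
import sqlite3
import tempfile

# Hypothetical file names, created in a temporary directory for this demo.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "demo.db")
sql_path = os.path.join(workdir, "demo_solution.sql")

# Build a tiny database so that the saved query has something to read.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE demo(name TEXT)")
conn.executemany("INSERT INTO demo VALUES (?)", [("Mulan",), ("Ariel",)])
conn.commit()

# Save a single query to a file, as you would for a solution file.
with open(sql_path, "w") as f:
    f.write("SELECT name FROM demo ORDER BY name;")

# Read the file back and run the one statement it contains.
with open(sql_path) as f:
    rows = conn.execute(f.read()).fetchall()

print(rows)
conn.close()
```

Note that conn.execute runs exactly one statement, so this approach assumes one query per solution file.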

For more information on using SQLite from the command line, see http://www.sqlite.org/sqlite.html

Part 1
The Disney princesses want to visit all the Budiluhur students and alumni they know, but before they do so they want to make
sure that they get everybody's details right! Please create a database called people.db in order to answer certain questions
about the students and alumni they will be visiting!

We have provided a database named people.db with the name, age, ID, and occupation of some Budiluhur students and
alumni, containing at least 10 rows of data. Here is the schema:

people_main(ID INTEGER, name TEXT, occupation TEXT, age INTEGER)
people_likes(ID1 INTEGER, ID2 INTEGER)
people_friends(ID1 INTEGER, ID2 INTEGER)

In the people_main table, ID is a unique identifier for a particular student or alumnus. name, occupation and age
correspond to the person's first name, occupation and age.

In the people_friends table, each (ID1, ID2) pair indicates that the particular person with ID1 is friends with the person with
ID2 (and vice versa). The friendship is mutual, and if (ID1, ID2) is in the table, it is guaranteed that (ID2, ID1) exists in the
table.

In the people_likes table, each (ID1, ID2) pair indicates that the student or alumnus with ID1 likes the person with ID2.
A (ID1, ID2) pair in the table does not guarantee that the (ID2, ID1) pair also exists in the table.
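
To make the difference between the two pair tables concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The schema is the one given above, but the sample rows and the helper unmatched_pairs are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people_main(ID INTEGER, name TEXT, occupation TEXT, age INTEGER);
CREATE TABLE people_likes(ID1 INTEGER, ID2 INTEGER);
CREATE TABLE people_friends(ID1 INTEGER, ID2 INTEGER);
""")

# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO people_main VALUES (?, ?, ?, ?)", [
    (1, "Ani", "Engineer", 24),
    (2, "Budi", "Doctor", 31),
    (3, "Citra", "Teacher", 27),
])
# Friendship is stored in both directions.
conn.executemany("INSERT INTO people_friends VALUES (?, ?)", [(1, 2), (2, 1)])
# Likes are one-directional: 3 likes 1, but 1 does not like 3.
conn.executemany("INSERT INTO people_likes VALUES (?, ?)", [(3, 1)])

def unmatched_pairs(table):
    """Return (ID1, ID2) rows of `table` whose reverse (ID2, ID1) is missing."""
    query = f"""
        SELECT a.ID1, a.ID2
        FROM {table} AS a
        WHERE NOT EXISTS (
            SELECT 1 FROM {table} AS b
            WHERE b.ID1 = a.ID2 AND b.ID2 = a.ID1
        )
    """
    return conn.execute(query).fetchall()

print(unmatched_pairs("people_friends"))  # no unmatched rows: friendship is mutual
print(unmatched_pairs("people_likes"))    # the one-way like shows up here
```

A check like this can also serve as a sanity test on your own people.db before you start writing queries.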

Your job is to write SQL queries for the data being requested:

1. Write a SQL statement that returns the name of people, ordered by name (A-Z). Save the query to
part1_problem1.sql
Make sure to add the ; symbol at the end!

2. Write a SQL statement that returns the name and age of people who are engineers or doctors. Results should
be ordered by name (A-Z). Save the query to part1_problem2.sql
Hint: Use the WHERE clause.

3. Write a SQL statement that returns the name of people that are liked by at least one other person. Results should
be ordered by name (A-Z). Save the query to part1_problem3.sql
Careful! Some people are liked multiple times, but their name should only appear once. Take a look at the DISTINCT
keyword and the IN operator.

4. Write a SQL statement that returns the name of people who aren’t liked by anyone else. Results should be ordered
by name (A-Z). Save the query to part1_problem4.sql
Hint: This should be very similar to the previous question!

5. Write a SQL statement that returns each occupation and the number of people with that occupation. Results
should be ordered by number of people within that occupation (descending), and then by occupation (A-Z). Save
the query to part1_problem5.sql
Hint: Use GROUP BY and COUNT.

6. Write a SQL statement that returns each person's name and the number of people who like that person. Results should
be ordered by count (descending), and then by name (A-Z). Save the query to part1_problem6.sql
Hint: Use a LEFT JOIN. The following website is quite useful:
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/

7. Write a SQL statement that returns the two occupations that have the fewest people who are liked, along with
that number of people. Results should be ordered by occupation (A-Z). Save the query to
part1_problem7.sql
Hint: The LIMIT clause will come in handy!

8. Write a SQL statement that returns the name and occupation of all people who have more than 3 friends. Results
should be ordered by name (A-Z). Save the query to part1_problem8.sql
Hint: You'll need to take a look at the HAVING clause.

9. Write a SQL statement that returns the distinct name and age of all people who are liked by anyone younger
than them. Results should be ordered by name (A-Z). Save the query to part1_problem9.sql
Hint: You should be using more than one JOIN!

10. Write a SQL statement to find pairs (A, B) such that person A likes person B, but A is not friends with B. The query
should return 4 columns: ID of person 1, name of person 1, ID of person 2 and name of person 2. Results should be
ordered by ID1 (ascending), then ID2 (ascending). Save the query to part1_problem10.sql
Time to join stuff!
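
Several of the hints above (GROUP BY, COUNT, LEFT JOIN, HAVING) are easier to grasp on a toy example. The sketch below deliberately uses hypothetical tables, authors and books, that have nothing to do with people.db. It shows why a LEFT JOIN keeps rows with a count of zero while an inner JOIN drops them, and how HAVING filters on an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, used only to illustrate the techniques.
conn.executescript("""
CREATE TABLE authors(id INTEGER, name TEXT);
CREATE TABLE books(author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'Dewi'), (2, 'Eko'), (3, 'Fajar');
INSERT INTO books VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")

# Inner join: authors without any book disappear from the result.
inner_rows = conn.execute("""
    SELECT a.name, COUNT(b.title) AS n_books
    FROM authors AS a
    JOIN books AS b ON b.author_id = a.id
    GROUP BY a.id, a.name
    ORDER BY n_books DESC, a.name
""").fetchall()

# Left join: every author is kept; COUNT(b.title) ignores the NULLs
# produced for authors without books, so they get a count of 0.
left_rows = conn.execute("""
    SELECT a.name, COUNT(b.title) AS n_books
    FROM authors AS a
    LEFT JOIN books AS b ON b.author_id = a.id
    GROUP BY a.id, a.name
    ORDER BY n_books DESC, a.name
""").fetchall()

# HAVING filters groups after aggregation (WHERE filters rows before it).
having_rows = conn.execute("""
    SELECT a.name, COUNT(b.title) AS n_books
    FROM authors AS a
    LEFT JOIN books AS b ON b.author_id = a.id
    GROUP BY a.id, a.name
    HAVING COUNT(b.title) > 1
""").fetchall()

print(inner_rows)
print(left_rows)
print(having_rows)
```

Counting a column from the right-hand table, rather than using COUNT(*), is what makes the zero counts come out correctly in the LEFT JOIN version.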

Part 2
After a long day of visiting all their Budiluhur student and alumni friends, the princesses wanted to wind down and watch
some movies. However, they don't know which ones to watch! Can you help them figure it out?

For this part of the assignment, you have to use data from the TMDB Movie Dataset, which should be exported to the
movies.db database. The database schema is as follows:

movies(budget INTEGER, homepage TEXT, id INTEGER, original_language TEXT, original_title TEXT, overview TEXT,
popularity REAL, release_date TEXT, revenue REAL, runtime INTEGER, status TEXT, tagline TEXT, title TEXT,
vote_average REAL, vote_count INTEGER)
scores(review TEXT, min_score INTEGER, max_score INTEGER)

We encourage you to use the WITH clause, which lets you split a query into named subqueries. As an example, we can
define a subquery and use it in another query as follows:

WITH subquery AS (
    SELECT
        original_title, vote_average
    FROM
        movies
)
SELECT
    original_title
FROM
    subquery
;
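
As a hedged illustration of how several WITH subqueries can be chained, the sketch below runs a variation of the query above from Python's sqlite3 module. The movies table here is a reduced, hypothetical two-column version with invented rows, and the name well_rated and the threshold of 6 are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the movies table (two columns only).
conn.execute("CREATE TABLE movies(original_title TEXT, vote_average REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Movie A", 6.1), ("Movie B", 7.4), ("Movie C", 5.0)],
)

query = """
WITH subquery AS (
    SELECT original_title, vote_average
    FROM movies
),
well_rated AS (
    -- A later subquery may refer to an earlier one by name.
    SELECT original_title
    FROM subquery
    WHERE vote_average >= 6
)
SELECT original_title
FROM well_rated
ORDER BY original_title;
"""

titles = [row[0] for row in conn.execute(query)]
print(titles)
```

Each named subquery is separated by a comma, and only the final SELECT produces the rows that are returned.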

11. Write a SQL query to find the original_title, budget and release_date of the movie 'John Carter', and append to
that the original_title, budget and release_date of the movie that was released 9 days after 'John Carter'
(this movie should also be a Disney movie!).
You can add days to a particular date by using the date function. For example, in order to add 3 days to '2012-07-16',
you can use date('2012-07-16', '+3 days')
Hint: The UNION operator should come in handy.

12. Write a SQL query to count the number of movies whose titles start with "The", end with a "2" or contain the word
"shark". Your query should be case insensitive and return a single value (one column with one entry).
Hint: You may want to look into CASE expressions and the LIKE operator.

13. Write a SQL query to select the original_title of all movies and a column where there is a 1 if there exists another
movie that has the same vote average and the same runtime as that movie, and a 0 otherwise. Results should be
ordered by original_title (A-Z).
Hint: You may want to look into the EXISTS operator. Additionally, think about possible edge cases.

14. Write a SQL query that returns the original_title, vote_average and review of every movie. The review
depends on the vote_average as described in the scores table. For example, movies with a vote average between
2 and 3.9 (inclusive) are reviewed as 'poor', whereas movies with a vote average between 9 and 10 (inclusive) are
reviewed as 'excellent'. However, the princesses do not even want to consider any bad movies. Thus, if a movie is
reviewed as 'awful' or 'poor' then original_title should read 'do not watch'. Results should be ordered by id
(ascending). For example, the output should have the following format:

'Snow White' | 8.7 | 'great'
'Toy Story' | 9.3 | 'must see'
'do not watch' | 2.3 | 'poor'

Hint: Look into the BETWEEN operator and how it can be used in a join.

15. Write a SQL query that finds the original_title, release_date and revenue of all the movies whose revenue
exceeded the average revenue of all the movies released on the same day (including itself). Results should be
ordered by release_date (ascending), and then revenue (descending).
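
The hints in this part mention the date function, the LIKE operator and BETWEEN inside a join. The sketch below tries each of them in isolation with Python's sqlite3 module. The tables readings and bands and all of their rows are hypothetical and unrelated to movies.db; only the date example is taken from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# date() with a modifier, using the example from the text.
shifted = conn.execute("SELECT date('2012-07-16', '+3 days')").fetchone()[0]

# In SQLite, LIKE ignores case for ASCII letters by default.
like_match = conn.execute("SELECT 'THE LION KING' LIKE 'the%'").fetchone()[0]

# BETWEEN used as a join condition, on hypothetical tables.
conn.executescript("""
CREATE TABLE readings(label TEXT, value REAL);
CREATE TABLE bands(band TEXT, low REAL, high REAL);
INSERT INTO readings VALUES ('r1', 2.5), ('r2', 7.0);
INSERT INTO bands VALUES ('low', 0, 4.9), ('high', 5, 10);
""")
banded = conn.execute("""
    SELECT r.label, b.band
    FROM readings AS r
    JOIN bands AS b ON r.value BETWEEN b.low AND b.high
    ORDER BY r.label
""").fetchall()

print(shifted)     # 2012-07-19
print(like_match)  # 1
print(banded)
```

BETWEEN is inclusive at both ends, so a value that falls in a gap between two ranges (for example 4.95 here) would match no band and be dropped by an inner join.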

Part 3: Optimization
After having watched a movie, the Disney princesses wanted to catch a college sports game before they left! However, they
only want to see the games that have the highest performing players! Help them optimize their SQL query so they can get to
the game on time!

We have provided you with the athletes.db database, although querying it is not necessary at all. The schema is as
follows:

school_athletes(ID INTEGER, name TEXT, school TEXT, performance_score INTEGER, team TEXT)

The SQL query to optimize is as follows:

SELECT ID, name
FROM school_athletes AS athletes
WHERE school = 'Budiluhur'
  AND performance_score > (
      SELECT AVG(performance_score)
      FROM school_athletes
      WHERE team = athletes.team
  );

16. Write down your optimized version of the query above in part3.txt, along with what type of query the original is and
what steps you took to optimize it.
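
One way to reason about optimization steps in SQLite is to compare query plans with EXPLAIN QUERY PLAN. The sketch below is not a solution to the question above. It uses a hypothetical orders table and a hypothetical index idx_orders_customer purely to show how a plan can be inspected before and after a change. The exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, unrelated to the assignment data.
conn.execute("CREATE TABLE orders(id INTEGER, customer TEXT, total REAL)")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the last column of every plan row.
    return [row[-1] for row in rows]

query = "SELECT total FROM orders WHERE customer = 'Ani'"

# Without an index, SQLite has to scan the whole table.
plan_before = plan(query)

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

The same helper can be pointed at any SELECT statement, including two alternative formulations of one query, to see how SQLite intends to execute each of them.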

Credits
Movie Database from: https://www.kaggle.com/tmdb/tmdb-movie-metadata
