Big Assignment 2024 Questions
To complete this assignment, upload the “Yelp data.sql” database from Mycourse
[“Exercises and database files” section] into HeidiSQL. The database has 16 tables
containing information on reviews Yelp users have left for different businesses they
have visited.
Please read each question carefully and consider which information (attributes) you
need from each table to arrive at the solution. Provide the answer in a designated
space under each question and the MySQL code that generated the result by copying
the code from HeidiSQL and pasting it into this document. If asked in the question,
also attach screenshots of the results.
Please check the number of rows imported for each table against the expected counts
(see Yelp Dataset Description.docx) and make sure the import is correct: the data must
not be imported twice, and no rows may be missing.
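One possible way to run this check in HeidiSQL is sketched below. It assumes the Yelp database is the currently selected database; the table name users is taken from the questions in this document, and the same query can be repeated for every other table. Treat it as a starting point rather than a required method.

```sql
-- Exact row count for a single table; repeat for each of the 16 tables.
SELECT COUNT(*) AS row_count
FROM users;

-- Quick overview of all tables in the currently selected database.
-- Note: table_rows is only an estimate for InnoDB tables, so use
-- COUNT(*) above when comparing against the expected numbers.
SELECT table_name, table_rows
FROM information_schema.tables
WHERE table_schema = DATABASE();
```

If a count is exactly double the expected value, the file was most likely imported twice; a lower count suggests the import was interrupted.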
If you see an error message like “the server has gone away,” please contact the teacher
to update the server settings. Aalto IT might update the server settings without
informing the teacher, preventing you from importing large files.
You are free to create additional columns or tables to support your analysis.
Please submit the assignment as a Word document, not a PDF. A Word document
allows us to copy and check your code easily.
Grading method: If your answer is wrong but the code is correct, you will still lose a
substantial number of points. In real-life business, a wrong answer produced with
correct code is still considered a failure.
Good luck with the assignment! :)
Question 2 (6 points)
In the table friends, we can see the Yelp users who are friends. Find the user [‘name’ from
table users] with the most friends, where one friendship is considered only once.
Number of friends: _________ (3 points)
“Name” [not user_id] of the user: _________ (3 points)
MySQL code that generates the result:
b) In the city “Phoenix”, which business category received the highest number of
reviews? Please provide the name of the business category and the number of reviews
given to the category.
Business Category: _______ (2 points)
Number of reviews received: _______ (3 points)
MySQL code that generates the result:
Question 5 (8 points)
a) Table “users” contains information on reviews users have left for businesses. From
those users who started Yelping between September 2010 and May 2011
[yelping_since_year] [yelping_since_month], what is the number of users who have
written reviews but received no votes in funny [votes_funny], useful [votes_useful],
and cool [votes_cool] categories in the “reviews” table?
Number of users: _______(4 points)
MySQL code that generates the result:
b) Among the users who have reviewed businesses located in the city “Phoenix”, please
identify the user with the highest total number of reviews written [“review_acount” in
the “users” table], and give the name of the user [name] and the total number of
reviews written [“review_acount” in the “users” table].
The name of the user: _______(2 points)
The number of reviews: _______(2 points)
MySQL code that generates the result:
Question 6 (8 points)
Explore “business”, “business_attribute_goodfor” and “business_hours” tables, and answer
the following question:
What are the business name [business_name] and the opening time [opening_time] on
Sundays of a business that is currently active, has been reviewed as good for “breakfast” in
the table “business_attribute_goodfor”, and has received a star rating of 5 among other similar
businesses?
Note: you can find the “active” and “stars” attributes of a business in the “business”
table; “opening_time_hours” in the “business_hours” table; and “subattribute” and the true or false
“value” in the “business_attributes_goodfor” table.
Question 7 (8 points)
Use the tables “reviews” and “business” to answer this question. Please find the business
name [business_name] that has received the highest proportion of star ratings [stars] above 4
out of all their reviews. Consider only those businesses which have over 800 reviews in total.
The name of the business [NOT business_id]: _______ (4 points)
The ratio of the ratings for the business: _______ (4 points)
MySQL code that generates the result:
Question 8 (8 points)
The table “checkin” represents a simplified report on the number of customers checking in
during each hour from Monday to Sunday. Which are the top 3 busiest hotels in terms of
check-ins on Mondays that are currently active? Consider only those businesses that have
“Hotels” as their only business category [see “business_categories” table; whether a business is of
“Hotels & Travel” category or other similar is not relevant here] and count check-ins
only during the evening hours between 18:00 and 23:00 [time_18] – [time_23]. Provide the
name of the hotel, the star rating, and the total number of customers checking in for the business
that has the highest star rating among the three busiest hotels on Monday evenings.
The name of the business [NOT business_id]: _______ (2 points)
Star rating: _______ (3 points)
The total number of customers checking in (18:00-23:00): ________ (3 points)
MySQL code that generates the result:
To obtain consistent results, we do not consider removing any further punctuation marks. Also,
please remove empty spaces from both sides of the title.
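As a small illustration of removing empty spaces from both sides of a text value, MySQL provides the TRIM() function. The sketch below uses a made-up string literal rather than a column from the Yelp database, so adapt it to the relevant column yourself.

```sql
-- TRIM() removes leading and trailing spaces; spaces inside the text are kept.
-- The literal below is a made-up example value, not data from the database.
SELECT TRIM('  Best Pizza in Town  ') AS trimmed_title;
```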
MySQL code that generates the result: