Big Assignment 2024 Questions

The document outlines the requirements for a Big Assignment in 2024 involving the analysis of a Yelp database using MySQL. It includes specific questions related to user reviews, business ratings, and user interactions, requiring the submission of MySQL code and results in a Word document. The grading criteria emphasize the importance of correct answers and code accuracy, with various tasks focusing on different aspects of the Yelp dataset.

Uploaded by

Seira Rehtona

Big Assignment 2024

• To complete this assignment, load the “Yelp data.sql” database from Mycourse
[“Exercises and database files” section] into HeidiSQL. The database has 16 tables
containing information on reviews that Yelp users have left for the businesses they
have visited.
• Please read each question carefully and consider which information (attributes) you
need from each table to arrive at the solution. Provide the answer in the designated
space under each question, together with the MySQL code that generated the result,
by copying the code from HeidiSQL and pasting it into this document. If the question
asks for it, also attach screenshots of the results.
• Please check the number of rows imported for each table (see Yelp Dataset
Description.docx) and ensure you have imported the data correctly, that is, you have
neither imported the data twice nor imported only part of it.
• If you see an error message like “the server has gone away,” please contact the teacher
to update the server settings. Aalto IT might update the server settings without
informing the teacher, preventing you from importing large files.
• You are free to create additional columns or tables to support your analysis.
• Please submit the assignment as a Word document, not a PDF. A Word document
allows us to copy and check your code easily.
• Grading method: If your answer is wrong but the code is correct, you will still receive
a substantial point deduction. A wrong answer with the correct code is still considered
a failure in real-life business.
• Good luck with the assignment! :)

Question 1 (4 points, each answer for 1 point)


In the business table, assume that the businesses with a rating below 2 belong to the very poor
rating category; the businesses with ratings [“stars” variable] from 2 to below 3 are
considered poor ratings; the businesses with ratings from 3 to below 4.5 are considered good
ratings, and the businesses that have ratings 4.5 or above are considered excellent ratings.
Only the businesses that are still active (“true” in the “active” column) are considered.
Please report the average review count for the businesses in each rating level.
Rating level          Average review count
Excellent_rating      _______
Good_rating           _______
Poor_rating           _______
Very_poor_rating      _______
MySQL code that generates the result:
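
The rating levels in Question 1 are half-open intervals, so a CASE expression that tests the thresholds in ascending order is one way to bucket them. The following is only a sketch of that technique on invented toy rows, run through Python's sqlite3 module so it is self-contained. The column names (stars, active, review_count) and the text value 'true' are assumptions taken from the question wording; the SQL itself should also run in MySQL, but the numbers here say nothing about the real Yelp data.

```python
import sqlite3

# Toy stand-in for the "business" table. Column names and the 'true' text
# value are assumptions based on the question, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business (business_id TEXT, stars REAL, active TEXT, review_count INTEGER)"
)
conn.executemany(
    "INSERT INTO business VALUES (?, ?, ?, ?)",
    [
        ("b1", 1.5, "true", 10),
        ("b2", 2.0, "true", 20),   # exactly 2 falls into Poor_rating
        ("b3", 2.5, "true", 40),
        ("b4", 3.0, "true", 50),   # exactly 3 falls into Good_rating
        ("b5", 4.0, "true", 70),
        ("b6", 4.5, "true", 100),  # exactly 4.5 falls into Excellent_rating
        ("b7", 5.0, "false", 999), # inactive, filtered out by the WHERE clause
    ],
)

# The first matching WHEN wins, so each branch only needs an upper bound.
RATING_LEVEL_SQL = """
SELECT CASE
         WHEN stars < 2   THEN 'Very_poor_rating'
         WHEN stars < 3   THEN 'Poor_rating'
         WHEN stars < 4.5 THEN 'Good_rating'
         ELSE 'Excellent_rating'
       END AS rating_level,
       AVG(review_count) AS avg_review_count
FROM business
WHERE active = 'true'
GROUP BY rating_level
ORDER BY rating_level
"""

rating_levels = dict(conn.execute(RATING_LEVEL_SQL).fetchall())
print(rating_levels)
```

Checking the boundary values (2, 3, and 4.5) on a few hand-made rows like this is a cheap way to confirm the buckets before running the query on the full table.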

Question 2 (6 points)
The table “friends” lists the Yelp users who are friends with each other. Find the user
[‘name’ from table users] with the most friends, counting each friendship only once.
Number of friends: _________ (3 points)
“Name” [not user_id] of the user: _________ (3 points)
MySQL code that generates the result:
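
One possible reading of “each friendship only once” is that a pair stored in both directions (A, B) and (B, A) should count as a single friendship. The sketch below illustrates that reading on toy data by ordering each pair and de-duplicating it before counting. The friends columns (user_id, friend_id) are assumptions, the names are invented, and the WITH clause needs MySQL 8.0 or later if you port it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id TEXT, name TEXT);
CREATE TABLE friends (user_id TEXT, friend_id TEXT);
INSERT INTO users VALUES ('u1', 'Ann'), ('u2', 'Bob'), ('u3', 'Cy');
-- u1-u2 is stored in both directions, u1-u3 only once
INSERT INTO friends VALUES ('u1', 'u2'), ('u2', 'u1'), ('u1', 'u3');
""")

MOST_FRIENDS_SQL = """
WITH pairs AS (
  -- Put the smaller id first so (u1,u2) and (u2,u1) collapse into one row
  SELECT DISTINCT
         CASE WHEN user_id < friend_id THEN user_id ELSE friend_id END AS a,
         CASE WHEN user_id < friend_id THEN friend_id ELSE user_id END AS b
  FROM friends
),
ends AS (
  -- Each unique friendship contributes one friend to both of its members
  SELECT a AS user_id FROM pairs
  UNION ALL
  SELECT b AS user_id FROM pairs
)
SELECT u.name, COUNT(*) AS friend_count
FROM ends e
JOIN users u ON u.user_id = e.user_id
GROUP BY u.user_id, u.name
ORDER BY friend_count DESC
LIMIT 1
"""

most_friends = conn.execute(MOST_FRIENDS_SQL).fetchone()
print(most_friends)
```

Without the de-duplication step, the toy user Ann would be credited with three friends instead of two, which is the kind of double counting the question warns about.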

Question 3 (10 points)


a) The table “business” contains different attributes for each business reviewed by
Yelp users. Consider the [city] column. Among the cities whose businesses received
fewer than 100,000 reviews in total, find the city that received the highest number
of reviews.
The city with the highest number of reviews: ________ (5 points)
MySQL code that generates the result:

b) In the city “Phoenix”, which business category received the highest number of
reviews? Please provide the name of the business category and the number of reviews
given to the category.
Business Category: _______ (2 points)
Number of reviews received: _______ (3 points)
MySQL code that generates the result:

Question 4 (10 points)


In the table “users”, please find the name [name] of the user who ranks 1st among all
users in the number of fans. Also, find this user’s rank in giving funny votes
[votes_funny].
User name: _______ (5 points)
Ranking in votes_funny: _______ (5 points)
MySQL code that generates the result:
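
A rank can be computed without window functions by counting how many rows have a strictly larger value and adding one. The sketch below shows this on three invented users; the column names (name, fans, votes_funny) follow the question text, and tied values share the same rank under this definition, which is an assumption about how ties should be handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, fans INTEGER, votes_funny INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("Ann", 50, 10), ("Bob", 20, 99), ("Cy", 5, 30)],
)

# Rank = 1 + number of users with strictly more funny votes.
TOP_FAN_RANK_SQL = """
SELECT u.name,
       (SELECT COUNT(*) FROM users v WHERE v.votes_funny > u.votes_funny) + 1
         AS funny_rank
FROM users u
ORDER BY u.fans DESC
LIMIT 1
"""

top_user = conn.execute(TOP_FAN_RANK_SQL).fetchone()
print(top_user)
```

On a large table the correlated subquery can be slow; in MySQL 8.0 or later, RANK() OVER (ORDER BY votes_funny DESC) is an alternative that gives the same numbering for this definition of rank.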

Question 5 (8 points)
a) The table “users” contains information on the users who have left reviews for
businesses. Among the users who started Yelping between September 2010 and May 2011
[yelping_since_year] [yelping_since_month], how many have written reviews but
received no votes in the funny [votes_funny], useful [votes_useful], and cool
[votes_cool] categories in the “reviews” table?
Number of users: _______(4 points)
MySQL code that generates the result:

b) Among the users who have reviewed businesses located in the city “Phoenix”, please
identify the user with the highest total number of reviews written [“review_acount” in
the “users” table], and give the name of the user [name] and the total number of
reviews written [“review_acount” in the “users” table].
The name of the user: _______(2 points)
The number of reviews: _______(2 points)
MySQL code that generates the result:

Question 6 (8 points)
Explore “business”, “business_attribute_goodfor” and “business_hours” tables, and answer
the following question:
What are the business name [business_name] and the opening time [opening_time] on
Sundays of the business that is currently active, has been reviewed as good for “breakfast”
in the table “business_attribute_goodfor”, and has received a star rating of 5 among other
similar businesses?
Note: you can find the “active” and “stars” data of a business in the “business”
table; “opening_time_hours” in the “business_hours” table; and “subattribute” and the true
or false “value” in the “business_attribute_goodfor” table.

The name of the business [NOT business_id]: _______(4 points)


Opening time: _______(4 points)
MySQL code that generates the result:

Question 7 (8 points)
Use the tables “reviews” and “business” to answer this question. Please find the business
name [business_name] that has received the highest proportion of star ratings [stars] above 4
out of all their reviews. Consider only those businesses which have over 800 reviews in total.
The name of the business [NOT business_id]: _______ (4 points)
The ratio of the ratings for the business: _______ (4 points)
MySQL code that generates the result:
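
A proportion per group can be written as the average of a 0/1 indicator, with HAVING applying the minimum-review filter after grouping. The sketch below demonstrates this on toy rows; the threshold is a parameter so the tiny example can exercise it (the question uses 800), “above 4” is read as stars > 4, and all names and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (business_id TEXT, business_name TEXT);
CREATE TABLE reviews (business_id TEXT, stars INTEGER);
INSERT INTO business VALUES ('b1', 'Cafe'), ('b2', 'Diner'), ('b3', 'Tiny');
INSERT INTO reviews VALUES
  ('b1', 5), ('b1', 5), ('b1', 4), ('b1', 3),  -- 2 of 4 reviews above 4
  ('b2', 5), ('b2', 5), ('b2', 5),             -- 3 of 3 reviews above 4
  ('b3', 5);                                   -- too few reviews to qualify
""")

# AVG over a 1.0/0.0 indicator yields the share of reviews with stars > 4.
SHARE_SQL = """
SELECT b.business_name,
       AVG(CASE WHEN r.stars > 4 THEN 1.0 ELSE 0.0 END) AS share_above_4
FROM reviews r
JOIN business b ON b.business_id = r.business_id
GROUP BY b.business_id, b.business_name
HAVING COUNT(*) > ?
ORDER BY share_above_4 DESC
LIMIT 1
"""

def best_business(min_reviews):
    """Return (name, share) for the top business with more than min_reviews reviews."""
    return conn.execute(SHARE_SQL, (min_reviews,)).fetchone()

print(best_business(2))
print(best_business(3))
```

Raising the threshold from 2 to 3 changes the winner in this toy data, which shows why the review-count filter has to be applied before picking the maximum.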

Question 8 (8 points)
The table “checkin” represents a simplified report on the number of customers checking in
during each hour from Monday to Sunday. Which are the top 3 busiest hotels in terms of
check-ins on Mondays that are currently active? Consider only businesses listed under the
“Hotels” business category [see the “business_categories” table; the “Hotels & Travel”
category and other similar categories are not relevant here], and count only the check-ins
during the evening hours between 18:00 and 23:00 [time_18] – [time_23]. For the business
with the highest star rating among the three busiest hotels on Monday evenings, provide
the name of the hotel, its star rating, and the total number of customers checking in.
The name of the business [NOT business_id]: _______ (2 points)
Star rating: _______ (3 points)
The total number of customers checking in (18:00–23:00): ________ (3 points)
MySQL code that generates the result:

Question 9 (8 points, each answer for 2 points)


Assume that when a hair salon specializes only in certain areas rather than providing all
different kinds of services, its quality tends to be better. Your task is to determine whether
the star ratings for businesses in the table “business_attribute_hairtypesspecializedin” tend
to be better or worse as the number of their areas of specialization increases or decreases.
To answer, please create four specialization categories based on the number of true values
each business_id has in the subattribute categories of the
“business_attribute_hairtypesspecializedin” table. For the categorization, the businesses that
specialize in 8 or 7 categories belong to the full_specialities category; the businesses with 6
or 5 categories belong to the multiple_specialities category; the businesses with 4 or 3
categories belong to the some_specialities category; and the businesses with 2 or 1 categories
belong to the few_specialities category. You can obtain the “stars” values from the “business”
table. Please report the average stars for the businesses in each speciality category.
Speciality category               Average stars
Full_specialities (8 – 7)         _______
Multiple_specialities (6 – 5)     _______
Some_specialities (4 – 3)         _______
Few_specialities (2 – 1)          _______

MySQL code that generates the result:

Question 10 (10 points)


In the “tip” table in the Yelp database, consider only the rows whose “tip_text” column
contains at least two words:
a) Is the word “food” more frequent as the first word or as the second word?
Please ignore upper and lower case.
Please give the frequencies:
The word “Food” as 1st word: _______(2 points)
The word “Food” as 2nd word: _______(3 points)
b) What is the most popular first word when “food” is the second word?
The most popular first word with the word “food” as second word: _______(2 points)
Frequency of the word: _______(3 points)
Please remember to drop the following seven punctuation marks from the tip text:
“ ” " - , ! `

To keep the results consistent, do not remove any other punctuation marks. Also,
please remove empty spaces from both sides of the tip text.
MySQL code that generates the result:
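
The cleaning rules above (drop exactly seven marks, trim, ignore case, then look at the first two words) can be checked on a handful of invented tips before writing the MySQL version. The sketch below is plain Python rather than SQL and rests on two assumptions: words are separated by whitespace, and the “at least two words” test is applied after cleaning. In MySQL, nested REPLACE calls, TRIM, LOWER, and SUBSTRING_INDEX are the functions one would likely reach for.

```python
from collections import Counter

# The seven marks listed in the question: curly quotes, straight quote,
# hyphen, comma, exclamation mark, backtick. Nothing else is removed.
PUNCTUATION = ["\u201c", "\u201d", '"', "-", ",", "!", "`"]

def first_two_words(tip_text):
    """Return (first, second) in lower case, or None if fewer than two words remain."""
    cleaned = tip_text
    for mark in PUNCTUATION:
        cleaned = cleaned.replace(mark, "")
    words = cleaned.strip().lower().split()
    if len(words) < 2:
        return None
    return words[0], words[1]

# Invented sample tips, only for exercising the logic.
tips = [
    "Great food!",
    'Food is "good"',
    "good food, fast",
    "Yum",
    " - Food trucks",
    "Great FOOD",
]

pairs = [p for p in map(first_two_words, tips) if p is not None]
food_first = sum(1 for first, _ in pairs if first == "food")
food_second = sum(1 for _, second in pairs if second == "food")
before_food = Counter(first for first, second in pairs if second == "food")

print(food_first, food_second, before_food.most_common(1))
```

Note that removing the hyphen joins hyphenated words (for example, “take-out” becomes “takeout”), which follows from dropping the mark rather than replacing it with a space.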

Question 11 (10 points)


Based on the “reviews” table, please compare the total number of reviews each business
received in 2012 and in 2013, and report the name of the business [business_name] and the
difference in reviews for the business that had the largest decrease in the number of
reviews received from 2012 to 2013.
Name of the business [NOT business_id]: _______ (3 points)
The decrease in reviews from 2012 to 2013: ______ (7 points)
MySQL code that generates the result:
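
Counting two years side by side in one pass is a typical use of conditional aggregation: one SUM(CASE ...) per year, grouped by business. The sketch below shows the idea on toy rows. It assumes review_date is stored as 'YYYY-MM-DD' text and extracts the year with SUBSTR so the same statement runs in SQLite; in MySQL, YEAR(review_date) is the more usual choice. It also treats a business with no 2013 reviews as having dropped to zero, which is one interpretation of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE business (business_id TEXT, business_name TEXT);
CREATE TABLE reviews (business_id TEXT, review_date TEXT);
INSERT INTO business VALUES ('b1', 'Cafe'), ('b2', 'Diner'), ('b3', 'Bar');
INSERT INTO reviews VALUES
  ('b1', '2012-01-10'), ('b1', '2012-05-02'), ('b1', '2012-11-30'),
  ('b1', '2013-03-15'),                       -- Cafe: 3 in 2012, 1 in 2013
  ('b2', '2012-07-07'), ('b2', '2013-01-01'),
  ('b2', '2013-09-09'),                       -- Diner: 1 in 2012, 2 in 2013
  ('b3', '2012-02-02'), ('b3', '2011-06-06'); -- Bar: 1 in 2012, 0 in 2013
""")

# decrease = reviews in 2012 minus reviews in 2013, per business.
DECREASE_SQL = """
SELECT b.business_name,
       SUM(CASE WHEN r.yr = '2012' THEN 1 ELSE 0 END)
     - SUM(CASE WHEN r.yr = '2013' THEN 1 ELSE 0 END) AS decrease
FROM (SELECT business_id, SUBSTR(review_date, 1, 4) AS yr FROM reviews) r
JOIN business b ON b.business_id = r.business_id
WHERE r.yr IN ('2012', '2013')
GROUP BY b.business_id, b.business_name
ORDER BY decrease DESC
LIMIT 1
"""

largest_decrease = conn.execute(DECREASE_SQL).fetchone()
print(largest_decrease)
```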

Question 12 (10 points)


Analyze revisiting customers’ satisfaction on their next visit to a business branch. You
will only need the table “reviews” for this task.
Users [user_id] can leave a lower star rating [stars] on their first visit [review_date] to a
business [business_id] but give a higher star rating on a later visit to the same business,
indicating that the customer revisited the business and was more satisfied on the later visit.
Please count the number of users who rated a business higher on their second [or third, etc.]
visit than on their first visit.
Please consider only the reviews from 2012. It is therefore recommended to first filter the
rows with reviews written in 2012 to reduce the run time of the SQL commands.
Note 1: If one user visits different businesses, all the visits count, and the same user could be
counted several times in the analysis.
Note 2: If a user wrote two or more reviews for a business within the same day, this user’s
reviews for the specific business are considered untrustworthy. Thus, we drop this user’s
reviews for this specific business. However, the user’s reviews of other businesses will be
retained if no more untrustworthy reviews are found.
Number of users who rated a business higher on their second visit: _______
MySQL code that generates the result:
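
The steps in Question 12 can be chained as common table expressions: filter to 2012, find the untrustworthy user and business pairs, drop them, locate each pair's first review, and then test whether any later review has more stars. The sketch below follows that chain on invented rows. It assumes review_date is 'YYYY-MM-DD' text without a time part, counts user and business pairs (so one user can be counted more than once, as Note 1 allows), and reads “higher on a later visit” as any later review exceeding the first one. The WITH clause needs MySQL 8.0 or later if ported.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (user_id TEXT, business_id TEXT, review_date TEXT, stars INTEGER)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?, ?)",
    [
        ("u1", "b1", "2012-01-05", 2), ("u1", "b1", "2012-03-01", 4),  # improved
        ("u1", "b2", "2012-02-01", 5), ("u1", "b2", "2012-04-01", 3),  # got worse
        ("u2", "b1", "2012-05-01", 1), ("u2", "b1", "2012-05-01", 5),  # same day
        ("u2", "b1", "2012-06-01", 5),                                 # pair dropped
        ("u2", "b2", "2012-01-01", 3), ("u2", "b2", "2012-02-01", 3),
        ("u2", "b2", "2012-07-01", 4),                                 # improved on 3rd visit
        ("u3", "b1", "2011-12-31", 1), ("u3", "b1", "2012-08-01", 5),  # only one 2012 review
        ("u3", "b2", "2012-09-01", 2), ("u3", "b2", "2013-01-01", 5),  # only one 2012 review
    ],
)

REVISIT_SQL = """
WITH r2012 AS (
  SELECT user_id, business_id, review_date, stars
  FROM reviews
  WHERE review_date >= '2012-01-01' AND review_date < '2013-01-01'
),
untrusted AS (
  -- pairs with two or more reviews on the same day
  SELECT user_id, business_id
  FROM r2012
  GROUP BY user_id, business_id, review_date
  HAVING COUNT(*) > 1
),
trusted AS (
  SELECT r.user_id, r.business_id, r.review_date, r.stars
  FROM r2012 r
  WHERE NOT EXISTS (SELECT 1 FROM untrusted u
                    WHERE u.user_id = r.user_id
                      AND u.business_id = r.business_id)
),
first_visit AS (
  SELECT user_id, business_id, MIN(review_date) AS first_date
  FROM trusted
  GROUP BY user_id, business_id
),
first_rating AS (
  -- at most one trusted review per day, so this is one row per pair
  SELECT t.user_id, t.business_id, f.first_date, t.stars AS first_stars
  FROM trusted t
  JOIN first_visit f ON f.user_id = t.user_id
                    AND f.business_id = t.business_id
                    AND f.first_date = t.review_date
)
SELECT COUNT(*)
FROM first_rating fr
WHERE EXISTS (SELECT 1 FROM trusted t
              WHERE t.user_id = fr.user_id
                AND t.business_id = fr.business_id
                AND t.review_date > fr.first_date
                AND t.stars > fr.first_stars)
"""

improved_pairs = conn.execute(REVISIT_SQL).fetchone()[0]
print(improved_pairs)
```

Each toy pair exercises one rule from the question (year filter, same-day exclusion, third-visit improvement), so the same rows can serve as a quick regression check when the query is rewritten for HeidiSQL.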
