Practice Assignment SQL

This document describes a two-part assignment for a data science course involving profiling and analyzing the Yelp dataset. The first part involves profiling the data by finding statistics like record counts, distinct primary keys, and minimum/maximum/average values for certain fields. The second part tasks the student with analyzing the data to answer a self-posed research question, including preparing the data for the chosen analysis. Code formatting and readability are emphasized.

Uploaded by

Iggzy Thenoob

Data Scientist Role Play: Profiling and Analyzing the Yelp Dataset

Coursera Worksheet

This is a 2-part assignment. In the first part, you are asked a series of
questions that will help you profile and understand the data just like a
data scientist would. For this first part of the assignment, you will be
assessed on both the correctness of your findings and the code you used
to arrive at your answer. You will be graded on how easy your code is to
read, so remember to use proper formatting and comments where necessary.

In the second part of the assignment, you are asked to come up with your
own inferences and analysis of the data for a particular research
question you want to answer. You will be required to prepare the dataset
for the analysis you choose to do. As with the first part, you will be
graded, in part, on how easy your code is to read, so use proper
formatting and comments to illustrate and communicate your intent as
required.

For both parts of this assignment, use this "worksheet." It provides all
the questions you are being asked, and your job will be to transfer your
answers and SQL coding where indicated into this worksheet so that your
peers can review your work. You should be able to use any text editor
(Windows Notepad, Apple TextEdit, Notepad++, Sublime Text, etc.) to copy
and paste your answers. If you are going to use Word or some other page
layout application, just be careful to make sure your answers and code
are lined up appropriately. In this case, you may want to save as a PDF
to ensure your formatting remains intact for your reviewer.

Part 1: Yelp Dataset Profiling and Understanding

1. Profile the data by finding the total number of records for each of
the tables below:

i. Attribute table = 10000
ii. Business table = 10000
iii. Category table = 10000
iv. Checkin table = 10000
v. elite_years table = 10000
vi. friend table = 10000
vii. hours table = 10000
viii. photo table = 10000
ix. review table = 10000
x. tip table = 10000
xi. user table = 10000
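
Counts like the ones above can be produced with one COUNT(*) per table. The following is a minimal sketch, assuming the data lives in a SQLite database; the two tiny in-memory tables and their rows are hypothetical stand-ins, not the real Yelp data.

```python
import sqlite3

# Hypothetical stand-in for the Yelp database: two tiny in-memory tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE hours (hours TEXT, business_id TEXT);
    INSERT INTO business VALUES ('b1', 'Cafe'), ('b2', 'Diner');
    INSERT INTO hours VALUES ('Monday|9:00-17:00', 'b1'),
                             ('Tuesday|9:00-17:00', 'b1'),
                             ('Monday|8:00-20:00', 'b2');
""")


def count_records(conn, table):
    """Return the total number of rows in `table`."""
    # Table names cannot be bound as parameters, so pass only trusted names.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


record_counts = {t: count_records(conn, t) for t in ("business", "hours")}
print(record_counts)
```

Against the real database, the same function would be called once for each of the eleven table names listed above.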

2. Find the total number of distinct records for the primary keys in each
of the tables listed below:

i. Business = 10000
ii. Hours = 1562
iii. Category = 2643
iv. Attribute = 1115
v. Review = 10000
vi. Checkin = 493
vii. Photo = 10000
viii. Tip = 537
ix. User = 10000
x. Friend = 11
xi. Elite_years = 2780

Note: Primary Keys are denoted in the ER-Diagram with a yellow key icon.
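
A distinct-key count can be lower than the row count when the same key appears on several rows, which is one plausible reason Hours shows 1562 keys for 10000 records. A minimal sketch, assuming the hours table is keyed by business_id; the sample rows are hypothetical:

```python
import sqlite3

# Hypothetical sample: business 'b1' has two rows, 'b2' has one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hours (hours TEXT, business_id TEXT);
    INSERT INTO hours VALUES ('Monday|9:00-17:00', 'b1'),
                             ('Tuesday|9:00-17:00', 'b1'),
                             ('Monday|8:00-20:00', 'b2');
""")

# COUNT(*) counts rows; COUNT(DISTINCT col) counts unique non-NULL values.
total_rows, distinct_keys = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT business_id) FROM hours"
).fetchone()
print(total_rows, distinct_keys)
```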

3. Are there any columns with null values in the Users table? Indicate
"yes," or "no."

Answer: no.

SQL code used to arrive at answer:

-- Return every user row that has a NULL in at least one column.
SELECT *
FROM user
WHERE id IS NULL
   OR name IS NULL
   OR review_count IS NULL
   OR yelping_since IS NULL
   OR useful IS NULL
   OR funny IS NULL
   OR cool IS NULL
   OR fans IS NULL
   OR average_stars IS NULL
   OR compliment_hot IS NULL OR compliment_more IS NULL
   OR compliment_profile IS NULL OR compliment_cute IS NULL
   OR compliment_list IS NULL OR compliment_note IS NULL
   OR compliment_plain IS NULL OR compliment_cool IS NULL
   OR compliment_funny IS NULL OR compliment_writer IS NULL
   OR compliment_photos IS NULL
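
An alternative to returning whole rows is to count the NULLs in each column, which also shows where any missing values are. This is a sketch under the assumption of a SQLite database; the four-column user table and its rows are hypothetical and much smaller than the real table.

```python
import sqlite3

# Hypothetical miniature of the user table, with a few NULLs planted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id TEXT, name TEXT, review_count INTEGER, fans INTEGER);
    INSERT INTO user VALUES ('u1', 'Ann', 12, 3),
                            ('u2', NULL, 4, 0),
                            ('u3', 'Cy', NULL, NULL);
""")

columns = ["id", "name", "review_count", "fans"]
# `col IS NULL` evaluates to 1 or 0 in SQLite, so SUM counts the NULLs.
select_list = ", ".join(f'SUM("{c}" IS NULL)' for c in columns)
row = conn.execute(f"SELECT {select_list} FROM user").fetchone()
null_counts = dict(zip(columns, row))
print(null_counts)
```

If every count is zero, the answer to the question is "no".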

4. For each table and column listed below, display the smallest
(minimum), largest (maximum), and average (mean) value for the following
fields:

i. Table: Review, Column: Stars

min: 1    max: 5    avg: 3.7082

ii. Table: Business, Column: Stars

min: 1.0    max: 5.0    avg: 3.6549

iii. Table: Tip, Column: Likes

min: 0    max: 2    avg: 0.0144

iv. Table: Checkin, Column: Count

min: 1    max: 53    avg: 1.9414

v. Table: User, Column: Review_count

min: 0    max: 2000    avg: 24.2995
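
Each of the five answers above can come from a single query using the MIN, MAX, and AVG aggregate functions. A minimal sketch for the Review/Stars case, assuming SQLite; the four sample ratings are hypothetical:

```python
import sqlite3

# Hypothetical sample of the review table with four star ratings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review (id TEXT PRIMARY KEY, stars INTEGER);
    INSERT INTO review VALUES ('r1', 1), ('r2', 4), ('r3', 5), ('r4', 4);
""")

# One pass over the table computes all three statistics.
min_stars, max_stars, avg_stars = conn.execute(
    "SELECT MIN(stars), MAX(stars), AVG(stars) FROM review"
).fetchone()
print(min_stars, max_stars, avg_stars)
```

The other four cases only change the table and column names.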

5. List the cities with the most reviews in descending order:

SQL code used to arrive at answer:

SELECT city, review_count
FROM business
ORDER BY review_count DESC

Copy and Paste the Result Below:

+------------+--------------+
| city       | review_count |
+------------+--------------+
| Las Vegas  | 3873         |
| Montréal   | 1757         |
| Gilbert    | 1549         |
| Las Vegas  | 1410         |
| Las Vegas  | 1389         |
| Las Vegas  | 1252         |
| Las Vegas  | 1116         |
| Las Vegas  | 1084         |
| Las Vegas  | 961          |
| Gilbert    | 902          |
| Las Vegas  | 864          |
| Scottsdale | 823          |
| Las Vegas  | 821          |
| Las Vegas  | 786          |
| Henderson  | 785          |
| Toronto    | 778          |
| Las Vegas  | 768          |
| Las Vegas  | 758          |
| Scottsdale | 726          |
| Cleveland  | 723          |
| Las Vegas  | 720          |
| Charlotte  | 715          |
| Phoenix    | 711          |
| Las Vegas  | 706          |
| Phoenix    | 700          |
+------------+--------------+
(Output limit exceeded, 25 of 10000 total rows shown)
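
The query above returns one row per business, so a city such as Las Vegas appears many times. If the intent is one row per city, the review counts can be summed with GROUP BY. This is a sketch of that variation, not the query that produced the result above; it assumes SQLite, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample of the business table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (id TEXT PRIMARY KEY, city TEXT, review_count INTEGER);
    INSERT INTO business VALUES ('b1', 'Las Vegas', 30),
                                ('b2', 'Toronto', 25),
                                ('b3', 'Las Vegas', 10),
                                ('b4', 'Phoenix', 5);
""")

# Sum the per-business review counts within each city, largest first.
city_totals = conn.execute("""
    SELECT city, SUM(review_count) AS total_reviews
    FROM business
    GROUP BY city
    ORDER BY total_reviews DESC
""").fetchall()
print(city_totals)
```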

6. Find the distribution of star ratings to the business in the following
cities:

i. Avon

SQL code used to arrive at answer:

ii. Beachwood

SQL code used to arrive at answer:

SELECT stars, review_count
FROM business
WHERE city = 'Beachwood'
ORDER BY stars

Copy and Paste the Resulting Table Below (2 columns – star rating and
count):

+-------+--------------+
| stars | review_count |
+-------+--------------+
| 2.0   | 8            |
| 2.5   | 3            |
| 3.0   | 8            |
| 3.0   | 3            |
| 3.5   | 3            |
| 3.5   | 3            |
| 4.0   | 69           |
| 4.5   | 14           |
| 4.5   | 3            |
| 5.0   | 6            |
| 5.0   | 4            |
| 5.0   | 6            |
| 5.0   | 3            |
| 5.0   | 4            |
+-------+--------------+
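
The result above has one row per business, so several star ratings repeat. A distribution with one row per rating can be built with GROUP BY. The sketch below assumes SQLite and uses hypothetical sample rows; it reports both the number of businesses and the summed review count per rating, since the worksheet's "count" could be read either way.

```python
import sqlite3

# Hypothetical sample of the business table for two cities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (id TEXT PRIMARY KEY, city TEXT,
                           stars REAL, review_count INTEGER);
    INSERT INTO business VALUES ('b1', 'Beachwood', 4.5, 14),
                                ('b2', 'Beachwood', 4.5, 3),
                                ('b3', 'Beachwood', 5.0, 6),
                                ('b4', 'Avon', 3.5, 10);
""")


def star_distribution(conn, city):
    """Return (stars, businesses, reviews) rows for one city."""
    return conn.execute("""
        SELECT stars,
               COUNT(*) AS businesses,
               SUM(review_count) AS reviews
        FROM business
        WHERE city = ?
        GROUP BY stars
        ORDER BY stars
    """, (city,)).fetchall()


beachwood = star_distribution(conn, "Beachwood")
print(beachwood)
```

Passing 'Avon' instead of 'Beachwood' gives the distribution for the other city.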

7. Find the top 3 users based on their total number of reviews:

SQL code used to arrive at answer:

SELECT name, review_count
FROM user
ORDER BY review_count DESC
LIMIT 3

Copy and Paste the Result Below:

+--------+--------------+
| name   | review_count |
+--------+--------------+
| Gerald | 2000         |
| Sara   | 1629         |
| Yuri   | 1339         |
+--------+--------------+

8. Does posting more reviews correlate with more fans?

Please explain your findings and interpretation of the results:

No.
The users with the most reviews do not have the most fans.
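
One way to back an answer like this with a number is to compute the Pearson correlation between review_count and fans. The sketch below assumes SQLite and does the arithmetic in Python; the four sample users are hypothetical, so the coefficient it prints says nothing about the real dataset.

```python
import math
import sqlite3

# Hypothetical sample of the user table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id TEXT PRIMARY KEY, review_count INTEGER, fans INTEGER);
    INSERT INTO user VALUES ('u1', 10, 5), ('u2', 20, 1),
                            ('u3', 30, 6), ('u4', 40, 2);
""")

pairs = conn.execute("SELECT review_count, fans FROM user").fetchall()


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


r = pearson(pairs)
print(round(r, 3))
```

A value near 0 suggests little linear relationship, while values near 1 or -1 suggest a strong one.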

9. Are there more reviews with the word "love" or with the word "hate" in
them?

Answer:
"love"

SQL code used to arrive at answer:

-- Count the reviews that contain each word, returned as a single row.
SELECT (SELECT COUNT(*) FROM review WHERE text LIKE '%love%') AS love,
       (SELECT COUNT(*) FROM review WHERE text LIKE '%hate%') AS hate
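
The same comparison can be made in one scan of the table by summing the LIKE results. This is a sketch assuming SQLite, where LIKE is case-insensitive for ASCII letters by default; the four sample reviews are hypothetical.

```python
import sqlite3

# Hypothetical sample of the review table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE review (id TEXT PRIMARY KEY, text TEXT);
    INSERT INTO review VALUES ('r1', 'I love it'),
                              ('r2', 'Loved the food'),
                              ('r3', 'I hate waiting'),
                              ('r4', 'It was fine');
""")

# A LIKE comparison yields 1 or 0, so SUM counts the matching rows.
love_count, hate_count = conn.execute("""
    SELECT SUM(text LIKE '%love%'), SUM(text LIKE '%hate%')
    FROM review
""").fetchone()
print(love_count, hate_count)
```

Note that a '%word%' pattern also matches inside longer words, for example "glove" for love or "whatever" for hate, so the counts are an approximation of how often each word is actually used.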

10. Find the top 10 users with the most fans:

SQL code used to arrive at answer:

SELECT name, fans
FROM user
ORDER BY fans DESC
LIMIT 10

Copy and Paste the Result Below:

+-----------+------+
| name      | fans |
+-----------+------+
| Amy       | 503  |
| Mimi      | 497  |
| Harald    | 311  |
| Gerald    | 253  |
| Christine | 173  |
| Lisa      | 159  |
| Cat       | 133  |
| William   | 126  |
| Fran      | 124  |
| Lissa     | 120  |
+-----------+------+

11. Is there a strong relationship (or correlation) between having a high
number of fans and being listed as "useful" or "funny?" Out of the top 10
users with the highest number of fans, what percent are also listed as
“useful” or “funny”?

Key:
0% - 25% - Low relationship
26% - 75% - Medium relationship
76% - 100% - Strong relationship

SQL code used to arrive at answer:

SELECT name, fans, useful, funny, review_count
FROM user
ORDER BY fans DESC
LIMIT 10

Copy and Paste the Result Below:

+-----------+------+--------+--------+--------------+
| name      | fans | useful | funny  | review_count |
+-----------+------+--------+--------+--------------+
| Amy       | 503  | 3226   | 2554   | 609          |
| Mimi      | 497  | 257    | 138    | 968          |
| Harald    | 311  | 122921 | 122419 | 1153         |
| Gerald    | 253  | 17524  | 2324   | 2000         |
| Christine | 173  | 4834   | 6646   | 930          |
| Lisa      | 159  | 48     | 13     | 813          |
| Cat       | 133  | 1062   | 672    | 377          |
| William   | 126  | 9363   | 9361   | 1215         |
| Fran      | 124  | 9851   | 7606   | 862          |
| Lissa     | 120  | 455    | 150    | 834          |
+-----------+------+--------+--------+--------------+

Please explain your findings and interpretation of the results:

Medium relationship for both "useful" and "funny". "useful" seems to
matter slightly more, by a few percent.
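
The percentage asked for in this question can be computed as the overlap between two top-N lists. The sketch below reads "also listed as useful" as "also in the top N by useful votes", which is an assumption about the question's intent; it assumes SQLite, and the five sample users are hypothetical.

```python
import sqlite3

# Hypothetical sample of the user table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id TEXT PRIMARY KEY, fans INTEGER, useful INTEGER);
    INSERT INTO user VALUES ('u1', 50, 900), ('u2', 40, 10), ('u3', 30, 800),
                            ('u4', 5, 700), ('u5', 1, 2);
""")


def top_ids(conn, column, n):
    """Return the ids of the n users with the largest value in `column`."""
    # `column` is interpolated into the SQL, so pass only trusted names.
    rows = conn.execute(
        f'SELECT id FROM user ORDER BY "{column}" DESC LIMIT ?', (n,)
    ).fetchall()
    return {row[0] for row in rows}


N = 3  # the worksheet uses the top 10; 3 keeps the toy example small
top_fans = top_ids(conn, "fans", N)
top_useful = top_ids(conn, "useful", N)
overlap_pct = 100.0 * len(top_fans & top_useful) / N
print(f"{overlap_pct:.1f}% of the top {N} by fans are also top {N} by useful")
```

The resulting percentage can then be read against the key above (low, medium, or strong relationship), and the same steps apply to the funny column.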

Part 2: Inferences and Analysis

1. Pick one city and category of your choice and group the businesses in
that city or category by their overall star rating. Compare the
businesses with 2-3 stars to the businesses with 4-5 stars and answer the
following questions. Include your code.

i. Do the two groups you chose to analyze have a different distribution
of hours?
Yes.

ii. Do the two groups you chose to analyze have a different number of
reviews?
Yes.

iii. Are you able to infer anything from the location data provided
between these two groups? Explain.
No. Explanation: bottom line, there is insufficient data.

SQL code used for analysis:

2. Group businesses based on the ones that are open and the ones that are
closed. What differences can you find between the ones that are still
open and the ones that are closed? List at least two differences and the
SQL code you used to arrive at your answer.

i. Difference 1:
Average review count

ii. Difference 2:
Count of open and closed businesses
Count of check-ins

SQL code used for analysis:

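-- Compare open and closed businesses. The join yields one row per
-- check-in row, so the averages and COUNT(b.id) are over joined rows.
SELECT AVG(b.stars),
       AVG(b.review_count),
       b.is_open,
       COUNT(b.id),
       COUNT(c.count) AS checkins
FROM business b
LEFT JOIN checkin c
  ON b.id = c.business_id
GROUP BY b.is_open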
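
Because a business can have several check-in rows, joining checkin directly repeats that business in the result and weights the averages toward businesses with many check-ins. One way around this is to aggregate checkin to one row per business before joining. The sketch below shows that variation under the assumption of a SQLite database; it is not the query used for the answer above, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample: b1 has three check-in rows, b2 none, b3 one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (id TEXT PRIMARY KEY, stars REAL,
                           review_count INTEGER, is_open INTEGER);
    CREATE TABLE checkin (business_id TEXT, date TEXT, "count" INTEGER);
    INSERT INTO business VALUES ('b1', 4.0, 10, 1),
                                ('b2', 3.0, 20, 1),
                                ('b3', 2.0, 6, 0);
    INSERT INTO checkin VALUES ('b1', 'Monday-9:00', 1),
                               ('b1', 'Monday-10:00', 2),
                               ('b1', 'Friday-18:00', 1),
                               ('b3', 'Sunday-12:00', 1);
""")

# Collapse checkin to one row per business first, so each business
# contributes exactly once to the averages and the business count.
open_vs_closed = conn.execute("""
    SELECT b.is_open,
           COUNT(*) AS businesses,
           AVG(b.stars) AS avg_stars,
           AVG(b.review_count) AS avg_reviews,
           SUM(COALESCE(c.checkin_rows, 0)) AS checkin_rows
    FROM business b
    LEFT JOIN (SELECT business_id, COUNT(*) AS checkin_rows
               FROM checkin
               GROUP BY business_id) c
      ON b.id = c.business_id
    GROUP BY b.is_open
    ORDER BY b.is_open
""").fetchall()
print(open_vs_closed)
```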

3. For this last part of your analysis, you are going to choose the type
of analysis you want to conduct on the Yelp dataset and are going to
prepare the data for analysis.

Ideas for analysis include: Parsing out keywords and business attributes
for sentiment analysis, clustering businesses to find commonalities or
anomalies between them, predicting the overall star rating for a
business, predicting the number of fans a user will have, and so on.
These are just a few examples to get you started, so feel free to be
creative and come up with your own problem you want to solve. Provide
answers, in-line, to all of the following:

i. Indicate the type of analysis you chose to do:

Find the most popular business categories and which types of business
have the most users 'yelping'.

ii. Write 1-2 brief paragraphs on the type of data you will need for your
analysis and why you chose that data:
Number of businesses grouped by category. Number of users who reviewed
businesses in each category.

iii. Output of your finished dataset:

Most categorized businesses are related to food, and most users review
food establishments.

+---------------+------------------------+------+
| numberOfUnits | category               | u_id |
+---------------+------------------------+------+
| 9940          | None                   | 622  |
| 75            | Restaurants            | 9    |
| 26            | Food                   | 6    |
| 22            | Nightlife              | 4    |
| 13            | American (Traditional) | 4    |
| 19            | Bars                   | 3    |
| 4             | Barbeque               | 3    |
| 3             | Smokehouse             | 3    |
| 31            | Shopping               | 2    |
| 6             | Specialty Food         | 2    |
| 5             | Breakfast & Brunch     | 2    |
| 5             | Chinese                | 2    |
| 3             | Asian Fusion           | 2    |
| 3             | Ethnic Food            | 2    |
| 3             | Noodles                | 2    |
| 3             | Soup                   | 2    |
| 2             | Farmers Market         | 2    |
| 2             | Fruits & Veggies       | 2    |
| 2             | Malaysian              | 2    |
| 2             | Market Stalls          | 2    |
| 2             | Meat Shops             | 2    |
| 2             | Public Markets         | 2    |
| 2             | Seafood Markets        | 2    |
| 2             | Taiwanese              | 2    |
| 10            | Active Life            | 1    |
+---------------+------------------------+------+
(Output limit exceeded, 25 of 258 total rows shown)

iv. Provide the SQL code you used to create your final dataset:

-- Per category: number of joined business rows and number of reviews.
SELECT COUNT(b.id) AS numberOfUnits,
       category,
       COUNT(user_id) AS u_id
FROM business b
LEFT JOIN category c
  ON b.id = c.business_id
LEFT JOIN review r
  ON b.id = r.business_id
GROUP BY category
ORDER BY u_id DESC, numberOfUnits DESC
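
Joining both category and review to business produces one row per category and review combination, so plain COUNT counts those combinations. If the goal is the number of distinct businesses and distinct reviewers per category, COUNT(DISTINCT ...) is one option. The sketch below illustrates that variation, assuming SQLite; it is not the query that produced the table above, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample: b1 has two categories and two reviews,
# b2 has one of each, b3 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business (id TEXT PRIMARY KEY);
    CREATE TABLE category (business_id TEXT, category TEXT);
    CREATE TABLE review (id TEXT PRIMARY KEY, business_id TEXT, user_id TEXT);
    INSERT INTO business VALUES ('b1'), ('b2'), ('b3');
    INSERT INTO category VALUES ('b1', 'Restaurants'), ('b2', 'Restaurants'),
                                ('b1', 'Food');
    INSERT INTO review VALUES ('r1', 'b1', 'u1'), ('r2', 'b1', 'u2'),
                              ('r3', 'b2', 'u1');
""")

# DISTINCT keeps the join fan-out from inflating either count.
category_stats = conn.execute("""
    SELECT c.category,
           COUNT(DISTINCT b.id) AS businesses,
           COUNT(DISTINCT r.user_id) AS reviewers
    FROM business b
    LEFT JOIN category c
      ON b.id = c.business_id
    LEFT JOIN review r
      ON b.id = r.business_id
    GROUP BY c.category
    ORDER BY reviewers DESC, businesses DESC
""").fetchall()
print(category_stats)
```

Businesses without a category end up in a group whose category is NULL, which corresponds to the "None" row in the output above.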
