CS 102: Data Preparations

Data Science Laboratory Worksheet: Web Scraping and Data Wrangling from TripAdvisor
Objective:
Learn to scrape data from TripAdvisor and apply data wrangling techniques using Python.

PART A: Web Scraping TripAdvisor Data Using Instant Data Scraper

Prerequisites:
1. Install the Instant Data Scraper extension in Google Chrome or Microsoft Edge.
2. Access to Google Colab.

Instructions:
Step 1: Select Your Target Page on TripAdvisor

1. Choose a Category:
o Popular Destinations: Go to the TripAdvisor destinations section
https://www.tripadvisor.com/TravelersChoice-Destinations

o Hotels: Visit a location’s hotels page
https://www.tripadvisor.com/Hotels

o Restaurants: Visit a restaurant list in any city
https://www.tripadvisor.com/Restaurants

2. Enable Instant Data Scraper:


o Open the page you want to scrape.

o Click the Instant Data Scraper extension in your browser.

o Let it automatically detect data tables. Make sure it captures essential columns like
Name, Rating, Location, Review Count, etc.

Step 2: Extract and Save Data


1. Click Start Crawling if you want to load more items (the extension will scroll through pages
to capture more data).
2. After crawling, save the data as a CSV file.

3. Name your CSV file based on the category you scraped (e.g.,
tripadvisor_hotels.csv, tripadvisor_restaurants.csv).

Example Data Points to Scrape:

• For Popular Destinations: Location, Rank, Country, Popularity Score.

• For Hotels: Name, Location, Star Rating, Price, Total Reviews.

• For Restaurants: Name, Cuisine Type, Rating, Price Range, Total Reviews.

PART B: Data Wrangling with Python


After saving your CSV file, follow these tasks to clean and analyze the data using Python.

Instructions:

1. Load the Data
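
The loading step might look like the sketch below. It assumes pandas, which the worksheet does not name but which comes preinstalled in Google Colab, and the file name tripadvisor_restaurants.csv suggested in Part A. So that the cell runs before the scraped file is uploaded, it falls back to a few made-up rows. The column names are hypothetical: Instant Data Scraper often produces auto-generated headers, so expect to rename yours.

```python
import io
import os

import pandas as pd

CSV_PATH = "tripadvisor_restaurants.csv"  # file name suggested in Part A

# Made-up rows, used only when the scraped file is not present.
# Column names are hypothetical; rename them to match your own CSV.
SAMPLE_CSV = """Name,Location,Rating,Price Range,Review Count
Cafe Uno,Manila,4.5,$20-$50,120
cafe uno,Manila,4.5,$20-$50,120
Bistro Dos,Cebu,,$10-$30,45
Grill Tres,cebu,3.5,$15-$40,80
"""

if os.path.exists(CSV_PATH):
    df = pd.read_csv(CSV_PATH)
else:
    df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)   # (number of rows, number of columns)
print(df.head())  # first few records
df.info()         # column types and non-null counts
```

Checking the shape, the first rows, and the column types right after loading shows which columns need cleaning in the next step.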


2. Data Cleaning Tasks
Perform the following cleaning tasks:

o Remove Duplicate Entries:

o Handle Missing Values:

▪ Fill missing ratings with the mean or median rating.

▪ Drop rows with missing values if critical (like Name or Location).

o Standardize Text:

▪ Ensure consistent casing for text columns (e.g., restaurant names, locations).
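
The cleaning tasks above can be sketched as follows, assuming pandas. The sample rows and the column names (Name, Location, Rating) are made up for illustration; substitute the columns from your own scraped file.

```python
import pandas as pd

# Made-up sample standing in for the scraped CSV (hypothetical column names).
df = pd.DataFrame({
    "Name": ["Cafe Uno", "Cafe Uno", "bistro dos", "Grill Tres", None],
    "Location": ["Manila", "Manila", "cebu ", "CEBU", "Davao"],
    "Rating": [4.5, 4.5, None, 3.5, 4.0],
})

# Remove exact duplicate rows.
df = df.drop_duplicates()

# Drop rows that are missing a critical field.
df = df.dropna(subset=["Name", "Location"])

# Fill missing ratings with the median (use .mean() instead if preferred).
df["Rating"] = df["Rating"].fillna(df["Rating"].median())

# Standardize text: trim stray whitespace and use title case.
for col in ["Name", "Location"]:
    df[col] = df[col].str.strip().str.title()

df = df.reset_index(drop=True)
print(df)
```

Note that drop_duplicates only removes rows that match exactly. Running the text standardization first would also catch duplicates that differ only in casing or spacing.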

3. Data Transformation Tasks


Implement these transformations:
o Convert Ratings to Numeric (if they appear as strings):

o Split Columns:

▪ For columns like Price Range (e.g., "$20-$50"), split it into Min Price and Max Price.

4. Exploratory Data Analysis (EDA) Tasks
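
For the transformation tasks in step 3 above (converting ratings to numbers and splitting the price range), one possible approach with pandas is sketched here. The sample values and column names are made up, and the regular expressions are an assumption about how the scraped strings look.

```python
import pandas as pd

# Made-up sample (hypothetical column names and string formats).
df = pd.DataFrame({
    "Name": ["Cafe Uno", "Bistro Dos", "Grill Tres"],
    "Rating": ["4.5", "4.0 of 5 bubbles", "N/A"],
    "Price Range": ["$20-$50", "$10-$30", None],
})

# Pull the first number out of each rating string, then convert it to float.
# Strings that contain no number (such as "N/A") become NaN.
rating_text = df["Rating"].astype(str).str.extract(r"(\d+(?:\.\d+)?)")[0]
df["Rating"] = pd.to_numeric(rating_text, errors="coerce")

# Capture the two amounts in strings shaped like "$20-$50".
prices = df["Price Range"].str.extract(
    r"\$?(\d+(?:\.\d+)?)\s*-\s*\$?(\d+(?:\.\d+)?)"
)
df["Min Price"] = pd.to_numeric(prices[0], errors="coerce")
df["Max Price"] = pd.to_numeric(prices[1], errors="coerce")

print(df)
```

If the scraped column holds symbols such as "$$ - $$$" rather than amounts, the pattern will not match and both new columns will be NaN, so inspect a few raw values before choosing the pattern.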

o Compute Basic Statistics:

▪ Get average, median, and mode of numeric columns like Rating and Review Count.

o Group Data:

▪ For example, group hotels or restaurants by Location and calculate the average
rating for each.
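
The EDA tasks above could be written roughly like this, assuming pandas and a cleaned table with numeric Rating and Review Count columns. The rows are made up for illustration.

```python
import pandas as pd

# Made-up cleaned sample (hypothetical column names).
df = pd.DataFrame({
    "Name": ["Cafe Uno", "Bistro Dos", "Grill Tres", "Diner Quatro"],
    "Location": ["Manila", "Cebu", "Cebu", "Manila"],
    "Rating": [4.5, 4.0, 3.0, 4.5],
    "Review Count": [120, 45, 80, 45],
})

numeric_cols = ["Rating", "Review Count"]

# Mean and median, one row per statistic.
summary = df[numeric_cols].agg(["mean", "median"])
print(summary)

# mode() can return several rows per column when values tie.
modes = df[numeric_cols].mode()
print(modes)

# Average rating per location, highest first.
avg_by_location = (
    df.groupby("Location")["Rating"].mean().sort_values(ascending=False)
)
print(avg_by_location)
```

df.describe() gives a quicker overview (count, mean, quartiles), but it does not report the mode, which is why mode() is called separately here.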
5. Visualization Tasks
Create simple visualizations using matplotlib or seaborn.

o Bar Chart: Show the average rating per location.



o Histogram: Plot the distribution of ratings.
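
A minimal sketch of both charts using matplotlib and pandas is given below. The data, column names, and half-point bin edges are assumptions for illustration; seaborn would work equally well.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up cleaned sample (hypothetical column names).
df = pd.DataFrame({
    "Location": ["Manila", "Cebu", "Cebu", "Manila", "Davao"],
    "Rating": [4.5, 4.0, 3.0, 4.5, 5.0],
})

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: average rating per location.
avg_by_location = df.groupby("Location")["Rating"].mean().sort_values()
ax_bar.bar(avg_by_location.index, avg_by_location.values)
ax_bar.set_title("Average Rating per Location")
ax_bar.set_xlabel("Location")
ax_bar.set_ylabel("Average rating")

# Histogram: distribution of ratings in half-point bins from 1 to 5.
bins = [1 + 0.5 * i for i in range(9)]  # 1.0, 1.5, ..., 5.0
ax_hist.hist(df["Rating"].dropna(), bins=bins, edgecolor="black")
ax_hist.set_title("Distribution of Ratings")
ax_hist.set_xlabel("Rating")
ax_hist.set_ylabel("Number of listings")

fig.tight_layout()
plt.show()
```

With many locations, the bar chart becomes crowded; plotting only the top few locations or rotating the tick labels keeps it readable.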

6. Export Cleaned Data


After wrangling the data, save it to a new CSV file:
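
A short sketch of the export step, assuming pandas. The data frame contents and the output file name are hypothetical; use a name that matches the category you scraped.

```python
import pandas as pd

# Made-up cleaned data standing in for the result of the earlier steps.
df_clean = pd.DataFrame({
    "Name": ["Cafe Uno", "Bistro Dos"],
    "Location": ["Manila", "Cebu"],
    "Rating": [4.5, 4.0],
})

OUTPUT_PATH = "tripadvisor_restaurants_cleaned.csv"  # hypothetical file name

# index=False keeps the pandas row index out of the file.
df_clean.to_csv(OUTPUT_PATH, index=False)

# Read the file back as a quick check that the export worked.
check = pd.read_csv(OUTPUT_PATH)
print(check.shape)
```

In Google Colab, the saved file appears in the Files panel on the left, where it can be downloaded for submission.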

Submission

1. Submit the following:


o Original scraped CSV file.

o Python code or notebook file (.ipynb) with completed wrangling tasks.

o Cleaned CSV file.

2. Reflective Questions (submit your answers as a private comment in Google Classroom after submitting your files)

Example:

Write your answers here


Questions:
o What challenges did you face in scraping data from TripAdvisor?

o Describe two insights you gained from the data wrangling and analysis process.

o What additional data could enhance this analysis?

Prepared by:
CHRISTIAN LESTER D. GIMENO
