Final Report PDF
I hereby declare that the work which is being presented in the B.Tech. Project
“HOTEL REVIEW SENTIMENT ANALYSIS,” in partial fulfillment of the
requirements for the award of the Bachelor of Technology in Computer Science
and Engineering and submitted to the Department of Computer Engineering and
Applications of GLA University, Mathura, is an authentic record of my own
work carried out under the supervision of Mr. Vibhoo Sharma.
The contents of this project report, in full or in parts, have not been submitted to
any other institute or University for the award of any degree.
Sign Sign
Name of Student: Arya Siddarath Rao Name of student: Geeta Singh
Sign
Name of Student: Pratham Kumar
University RollNo.:2115000754
CERTIFICATE
This is to certify that the above statements made by the candidate are correct to the
best of my/our knowledge and belief.
Supervisor
(Mr. Vibhoo Sharma)
Assistant Professor
Dept. of Computer Engg, & App.
Date:
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the Project report undertaken during B.
Tech. This project is an acknowledgment of the inspiration, drive and technical
assistance contributed to it by many individuals. This project would never have seen the
light of the day without the help and guidance that we have received.
Our heartiest thanks to Dr. Sandeep Kumar Rathor, Head of Dept., Department of CEA
for providing us with an encouraging platform to develop this project, which thus helped
us in shaping our abilities towards a constructive goal.
We would also like to take this opportunity to acknowledge the contribution
of all faculty members of the department for their kind guidance and cooperation
during the development of our project. Last but not least, we acknowledge
our friends for their contribution to the completion of the project.
Sign Sign
Name of Student: Arya Siddarath Rao Name of Student: Geeta Singh
University Roll No.:21150000210 University Roll No.: 2115000754
Sign
Name of Student: Pratham Kumar
University Roll No.:2115000754
ABSTRACT
The hospitality industry, being highly service-oriented and competitive, increasingly depends on
customer feedback and online reviews to maintain service quality and enhance guest satisfaction. With
the growing volume and complexity of reviews, manual processing has become inefficient,
necessitating an automated solution like the Intelligent Hotel Review System (IHRS). This system is
designed to streamline the collection, analysis, and presentation of hotel reviews, converting
unstructured textual feedback into actionable insights. IHRS integrates advanced Natural Language
Processing (NLP) and sentiment analysis to accurately detect and classify sentiments—positive,
negative, or neutral—associated with various aspects of the hotel experience such as room cleanliness,
staff interaction, food quality, and amenities. Using machine learning models that adapt over time, the
system continuously improves its understanding of language nuances, making it more reliable than
conventional methods. Beyond sentiment classification, IHRS identifies recurring themes and patterns
in guest reviews, offering hotel managers a comprehensive understanding of their strengths and areas
needing improvement. A key feature of the system is its interactive visualization dashboard, which uses
dynamic charts, graphs, heatmaps, and filters to present insights in a clear and actionable format. This
enables managers to monitor key performance indicators, evaluate trends, and make data-informed
decisions. Additionally, the system provides department-specific recommendations—such as service
enhancements for housekeeping, dining, or front-desk operations—helping to optimize internal
processes and elevate the overall guest experience. For prospective customers, IHRS delivers
summarized review highlights and sentiment summaries, aiding them in making well-informed booking
decisions based on real user experiences. In essence, IHRS represents a transformative approach to
hotel management by automating review analysis and delivering strategic insights that foster
continuous improvement, operational efficiency, and stronger guest relationships in an increasingly
data-driven hospitality landscape.
LIST OF FIGURES
LIST OF TABLES
CONTENTS
DECLARATION II
CERTIFICATE III
ACKNOWLEDGEMENT IV
ABSTRACT V
LIST OF FIGURES VI
LIST OF TABLES VII
CHAPTER 1 Introduction 1
1.1. Overview And Motivation 1
1.2. Objective 2
1.3. Key Challenges and Potential Solutions 3
1.3.1. Data Sparsity 3
1.3.2. Cold Start Problem 3
1.3.3. Diversity and Novelty 3
1.3.4. Overfitting 4
1.3.5. Interpretability 4
1.4. Contribution 4
1.5. Scope Definition 5
1.6. Organization Of the Project Report 5
CHAPTER 2 Tools and Technologies 6
2.1. Python 6
2.1.1. Basic Usage 6
2.1.2. Notable Features 7
2.2. Filtering 13
2.3. Item Based Filtering 14
2.4. Anaconda 18
CHAPTER 3 Literature Reviews 20-21
4.1. Introduction 22
4.1.1. Purpose 22
4.1.2. Document Convention: 22
4.1.3. Intended Audience: 23
4.2. Perspective: 23
4.3. Product Functions: 23
4.4. User Classes and Characteristics: 23
4.5. Operating Environment: 24
4.6. Performance Requirements: 24
4.6.1. Hardware Requirements: 24
4.6.2. Software Requirements: 24
CHAPTER 7 Conclusion 42
CHAPTER 8 Summary 43
CHAPTER 8 REFERENCE 45
Chapter 1 INTRODUCTION
The Hotel Review System serves as a strategic tool for hotels to harness the power
of guest feedback, driving continuous improvement, and ensuring that the evolving
needs and preferences of guests are met with agility and precision.
1.2. Objective
The objectives of a Hotel Review System are multifaceted and aim to address guest
satisfaction, operational efficiency, and overall improvement in the hospitality
industry. The key objectives are:
• Provide a platform for guests to share their experiences, opinions, and feedback.
• Identify positive aspects of guest experiences to reinforce and replicate.
By aligning the Hotel Review System with these objectives, hotels can create a
dynamic feedback loop that not only addresses current guest experiences but also
contributes to the continuous improvement of services and operations.
1.3. Key Challenges and Potential Solutions
1.3.1. Data Sparsity
Data sparsity in a Hotel Review System (HRS) refers to a situation where the
available data is insufficient or incomplete, making it difficult to draw meaningful
insights or make accurate predictions. For example, a hotel with only a handful of
reviews gives the system little evidence to work with. Addressing data sparsity is
crucial for ensuring the reliability and relevance of the insights derived from guest
feedback. Encouraging more reviews and diversifying data sources can help mitigate
the challenges associated with sparse data.
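To make the notion of sparsity concrete, the following minimal sketch computes the share of empty cells in a small hotel-by-guest rating table. The data and the names `ratings` and `sparsity` are illustrative assumptions, not part of the project's dataset.

```python
# Hypothetical ratings: rows are hotels, columns are guests, None = no review.
ratings = [
    [9.0, None, None, 7.5],
    [None, None, 6.0, None],
    [8.0, None, None, None],
]


def sparsity(matrix):
    """Return the fraction of cells that contain no rating."""
    total = sum(len(row) for row in matrix)
    missing = sum(1 for row in matrix for cell in row if cell is None)
    return missing / total


print(f"Sparsity: {sparsity(ratings):.2f}")
```

A value close to 1 means that most hotel-guest pairs have no review, which is the situation described above.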
1.3.2. Cold Start Problem
The "Cold Start Problem" in an HRS refers to the challenges faced when dealing
with new hotels that have limited or no historical data, or when introducing new
features or aspects for which there is insufficient data. Without this information,
the system cannot provide accurate recommendations, predictions, or insights.
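One simple way to soften a cold start, sketched below, is to fall back to the average over all hotels whenever a hotel has too few reviews of its own. This is an illustrative assumption rather than the approach taken in this project; the hotel names, the scores, and the helper `estimated_score` are hypothetical.

```python
from statistics import mean

# Hypothetical review scores per hotel; "New Inn" has no reviews yet.
scores_by_hotel = {
    "Hotel A": [8.0, 9.0, 7.0],
    "Hotel B": [6.0, 5.0],
    "New Inn": [],
}


def estimated_score(hotel, scores, min_reviews=1):
    """Use the hotel's own mean score, or the global mean when data is missing."""
    own = scores.get(hotel, [])
    if len(own) >= min_reviews:
        return mean(own)
    all_scores = [s for values in scores.values() for s in values]
    return mean(all_scores)


print(estimated_score("Hotel A", scores_by_hotel))
print(estimated_score("New Inn", scores_by_hotel))
```

Raising `min_reviews` makes the system rely on the global average for longer, trading personalization for stability.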
1.3.3. Diversity and Novelty
In an HRS, diversity and novelty are essential considerations for providing a rich
and engaging user experience. Ensuring diversity in the recommendations and
presenting novel options to users can enhance their satisfaction and encourage
exploration.
1.3.4. Overfitting
In an HRS, overfitting refers to a situation where the system learns the training
data too well, capturing noise or random fluctuations rather than the underlying
patterns or relationships. Overfitting leads to poor generalization: the system may
perform well on the training data but poorly on new or unseen data. Common
mitigation strategies include holding out a test set, cross-validation, and
regularization. By applying such strategies, an HRS can reduce the risk of
overfitting and ensure that its recommendations generalize well to new data.
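The gap between training and test performance can be illustrated with a deliberately extreme toy model. The sketch below is an assumption made for illustration only: a classifier that memorizes its training reviews verbatim scores perfectly on them but falls back to a majority guess on anything new. The reviews and the class `MemorizingClassifier` are hypothetical.

```python
from collections import Counter

# Hypothetical labelled reviews (1 = positive, 0 = negative).
train = [("great staff", 1), ("dirty room", 0), ("lovely view", 1),
         ("rude staff", 0), ("clean room", 1)]
test = [("great view", 1), ("dirty bathroom", 0),
        ("friendly staff", 1), ("rude manager", 0)]


class MemorizingClassifier:
    """Memorizes training reviews verbatim: an extreme case of overfitting."""

    def fit(self, data):
        self.lookup = dict(data)
        self.default = Counter(label for _, label in data).most_common(1)[0][0]
        return self

    def predict(self, text):
        # Unseen text falls back to the majority training label.
        return self.lookup.get(text, self.default)


def accuracy(model, data):
    return sum(model.predict(text) == label for text, label in data) / len(data)


model = MemorizingClassifier().fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```

A large gap between the two numbers is the usual warning sign; evaluating on held-out data is what makes it visible.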
1.3.5. Interpretability
Interpretability in an HRS refers to the system's ability to provide clear,
understandable, and meaningful explanations for its recommendations or
predictions. An interpretable HRS is valuable for both users and stakeholders,
as it fosters trust, aids decision-making, and allows users to comprehend the
rationale behind the system's suggestions.
1.4. Contribution
o Enhanced Guest Experience: HRS allows hotels to gather feedback from
guests and identify areas for improvement. By addressing issues highlighted in
reviews, hotels can enhance the overall guest experience.
o Reputation Management: HRS helps hotels manage their online reputation
by monitoring and responding to guest reviews. Positive reviews can attract new
guests, while addressing negative reviews demonstrates responsiveness.
Organizing a project report for a Hotel Review System (HRS) involves structuring
the content in a logical and coherent manner. A well-organized report enhances
readability and understanding. Below is a suggested organization for the project
report:
2.1. Python
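import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def get_top_recommendations(user_id, num_recommendations):
    # Assumes train_data is an item-by-user ratings DataFrame: rows indexed by
    # item_id, one column per user_id, and 0 for unrated items.
    item_similarity_matrix = cosine_similarity(train_data)
    user_ratings = train_data[user_id].values
    weighted_sum = item_similarity_matrix.dot(user_ratings)
    # Divide by each item's total similarity; the small constant avoids 0/0.
    weighted_avg = weighted_sum / (item_similarity_matrix.sum(axis=1) + 1e-9)
    recommendations = pd.DataFrame({'item_id': train_data.index,
                                    'weighted_avg': weighted_avg})
    return recommendations.sort_values(by='weighted_avg',
                                       ascending=False).head(num_recommendations)

top_recommendations = get_top_recommendations(user_id, num_recommendations)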
Python code is executed using an interpreter. You can use the Python interpreter to
run code interactively or to execute Python scripts. Here is an example of
running some Python code in the interactive interpreter:
>>> print("Hello, World!")
Hello, World!
Python comes with a large and comprehensive standard library that provides many
useful modules and functions for tasks like working with files, networking, and data
manipulation. Here is an example of using the os module from the standard library to
list files in a directory:
import os
dir_path = "/path/to/directory"
for file_name in os.listdir(dir_path):
    print(file_name)
Third-Party Libraries and Frameworks: Python has a large and active community
that has developed many third-party libraries and frameworks for various
purposes. These libraries and frameworks make it easy to extend the functionality
of Python and build complex applications and systems. Here is an example of
using the popular NumPy library to create a 2D array:
import numpy as np
my_array = np.array([[1, 2, 3], [4, 5, 6]])  # illustrative values
print(my_array)
Chapter 2 Tools And Technologies
Python provides a variety of control flow structures, such as loops and conditional
statements, that allow developers to control the execution of their code.
my_list = [1, 2, 3, 4, 5]
for item in my_list:
    print(item)
Python supports object-oriented programming, which allows developers to define
classes and objects that encapsulate data and behavior. Here's an example of
defining a class that represents a person:
class Person:
    """Represents a person."""
Filtering Techniques
A group of procedures known as filtering techniques are used in recommendation
systems to go through vast volumes of data and produce user-specific suggestions.
Filtering techniques are an important part of recommendation systems as they
allow for personalized recommendations to be generated for each user based on
their preferences and behavior. However, these techniques are not without their
limitations, and they can struggle with certain types of items or users, such as those
with limited interaction data. Therefore, it is important to carefully choose and
tune the filtering technique used in a recommendation system to ensure that it
provides accurate and useful recommendations for users.
Sentiment analysis examines user feedback to gain insights into the sentiment and
opinions of users towards products or services.
By analyzing the sentiment of user feedback, recommendation systems can identify
areas where users are experiencing issues or dissatisfaction and suggest changes
that improve the overall user experience. This can help to improve customer
satisfaction, loyalty, and retention.
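As a rough illustration of how sentiment can be read from feedback, the sketch below labels a review by counting words from two small word lists. The lexicon approach, the word lists, and the function `sentiment` are assumptions made for this example; they are not the models used later in this report.

```python
import re

# Tiny illustrative lexicon; a real system would use a much larger one.
POSITIVE = {"clean", "friendly", "comfortable", "great", "helpful"}
NEGATIVE = {"dirty", "rude", "noisy", "overpriced", "awful"}


def sentiment(review):
    """Label a review by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The staff were friendly and the room was clean."))
print(sentiment("Dirty bathroom and rude reception."))
```

A lexicon of this kind is easy to explain but misses negation and context, which is one reason trained classifiers are usually preferred.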
Data Merging
Data merging is the process of combining different datasets that contain related
information to create a unified dataset. In recommendation systems, data merging
is often used to combine user and item data to create a user-item matrix that can
be used to generate personalized recommendations.
As an illustration, consider an e-commerce site that sells books. The website
has two datasets: one for users, which includes information such as name, age,
and location, and one for books, which includes information such as author,
genre, and publisher.
The two datasets may be combined through a shared variable, such as the book ID,
to produce a user-item matrix. The result is a single dataset containing data on
the users, the books, and their interactions (such as purchases, ratings, or
reviews).
Once the user-item matrix is created, it can be used to generate personalized
recommendations for each user based on their past interactions with the website.
For example, if a user has purchased several science fiction books in the past, the
recommendation system can use this information to recommend other science
fiction books that are likely to be of interest to the user.
In general, data merging is a crucial stage in developing recommendation
systems since it enables the system to draw on a variety of data sources to provide
individualized suggestions that are pertinent to and helpful to the user.
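The merge described in this section can be sketched with plain dictionaries. The users, books, and interaction records below are hypothetical, and the sketch assumes that a separate interactions table carries both the user ID and the book ID.

```python
# Hypothetical datasets keyed by ID, plus the interactions that link them.
users = {
    1: {"name": "Asha", "location": "Delhi"},
    2: {"name": "Ravi", "location": "Agra"},
}
books = {
    10: {"title": "Dune", "genre": "science fiction"},
    11: {"title": "Emma", "genre": "romance"},
}
interactions = [
    {"user_id": 1, "book_id": 10, "rating": 5},
    {"user_id": 1, "book_id": 11, "rating": 2},
    {"user_id": 2, "book_id": 10, "rating": 4},
]

# Merge: attach user and book attributes to every interaction row.
merged = [
    {**row, **users[row["user_id"]], **books[row["book_id"]]}
    for row in interactions
]

# Pivot the merged rows into a user-item matrix (a nested dict).
user_item = {}
for row in merged:
    user_item.setdefault(row["name"], {})[row["title"]] = row["rating"]

print(user_item)
```

With pandas, the same two steps would typically be written with `DataFrame.merge` followed by `DataFrame.pivot_table`.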
Python Books
A number of books can be used to learn the language and
its various features.
Python Cookbook: This book is a collection of practical recipes and tips for
solving common programming problems in Python. It covers a wide range of
topics, including data structures, algorithms, file I/O, and network programming,
and provides useful code snippets and examples to help readers improve their
Python skills.
Python for Data Analysis: This book is a practical guide to using Python for data
analysis and manipulation. It covers a wide range of topics, including data
2.2. Filtering
Content Based Filtering
The content-based approach uses additional information about users and/or items.
This filtering method uses item features to recommend other items similar to what
the user likes, based on their previous actions or explicit feedback. In a movie
recommender system, for example, the additional information can be the age, sex,
job, or other personal information of users, as well as the category, main actors,
duration, or other characteristics of the movies (i.e., the items).
The main idea of content-based methods is to build a model, based on the
available “features,” that explains the observed user-item interactions. Still
considering users and movies, we can build the model so that it also offers insight
into why a user prefers certain movies. Such a model makes new predictions
straightforward: we look at the user's profile and, based on its information,
determine relevant movies to suggest.
We can make use of a Utility Matrix for content-based methods. A Utility Matrix
signifies each user's preference for certain items. With the data gathered
from the user, it can be used to find relations between the items the user likes
and those the user dislikes. We assign a value to each user-item pair, known
as the degree of preference, and arrange these values in a matrix of users against
items to identify their preference relationships.
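A minimal sketch of these ideas follows. It builds a small utility matrix, derives a user profile from the features of liked and disliked items, and ranks unseen items against that profile. The movies, the feature names, the like threshold of 3, and the helper functions are all assumptions chosen for illustration.

```python
# Hypothetical utility matrix: degree of preference (1-5) per user-item pair.
utility = {
    "user_1": {"Movie A": 5, "Movie B": 1},
    "user_2": {"Movie B": 4, "Movie C": 5},
}
# Hypothetical item features used by the content-based model.
features = {
    "Movie A": {"action": 1, "comedy": 0},
    "Movie B": {"action": 0, "comedy": 1},
    "Movie C": {"action": 1, "comedy": 1},
}


def user_profile(user, threshold=3):
    """Add the features of liked items and subtract those of disliked ones."""
    profile = {}
    for item, rating in utility[user].items():
        sign = 1 if rating >= threshold else -1
        for name, value in features[item].items():
            profile[name] = profile.get(name, 0) + sign * value
    return profile


def recommend(user):
    """Rank unseen items by the dot product of profile and item features."""
    profile = user_profile(user)
    unseen = [item for item in features if item not in utility[user]]

    def score(item):
        return sum(profile[k] * v for k, v in features[item].items())

    return sorted(unseen, key=score, reverse=True)


print(user_profile("user_1"))
print(recommend("user_2"))
```

Because the profile is expressed in the same feature names as the items, it also explains why an item was suggested.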
• Define a function to recommend items based on the user profile and the
item list
2.3. Item Based Filtering
The main idea behind item-based filtering is to recommend items that are similar to
the items a user has already liked or rated highly, to increase the probability that
the user will also like the recommended items.
To achieve this, item-based filtering first builds a similarity matrix that measures
the similarity between each pair of items, based on their attributes or content. The
similarity matrix is then used to identify the top N most similar items to each item
in the user's history, and the ratings or reviews associated with these items are used
to predict the user's rating or preference for each recommended item.
The algorithm would then compute the similarity between movies using a similarity
metric such as cosine similarity. The similarity between two movies A and B would
be computed as the dot product of their ratings vectors divided by the product of their
norms.
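The cosine similarity described above can be written directly from its definition. The sketch below uses only the standard library; the two rating vectors are hypothetical, and returning 0.0 for an all-zero vector is a convention assumed here.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for an all-zero vector (an assumption)
    return dot / (norm_a * norm_b)


# Hypothetical ratings given to two movies by the same four users.
movie_a = [5, 3, 0, 4]
movie_b = [4, 0, 0, 5]
print(round(cosine_similarity(movie_a, movie_b), 3))
```

Values near 1 indicate that the two movies are rated alike by the same users, and values near 0 indicate little overlap.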
6. compute weighted average of item similarities for each item the user has
interacted with
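The weighted-average step can be sketched as follows: the predicted rating for a target item is the user's past ratings averaged with item-item similarities as weights. The similarity values, the ratings, and the function name `predict_rating` are hypothetical.

```python
def predict_rating(target, user_ratings, similarity):
    """Similarity-weighted average of the user's ratings on other items."""
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        weight = similarity[target][item]
        numerator += weight * rating
        denominator += abs(weight)
    # No similar items means no prediction can be made.
    return numerator / denominator if denominator else None


# Hypothetical item-item similarities and one user's past ratings.
similarity = {"Movie C": {"Movie A": 0.8, "Movie B": 0.2}}
user_ratings = {"Movie A": 5, "Movie B": 1}
print(predict_rating("Movie C", user_ratings, similarity))
```

Here the prediction sits close to the rating of Movie A because Movie A is the more similar item.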
The system might analyze the movie titles, genres, actors, directors, and other
features to identify similarities between movies. If a user has watched and enjoyed
The Dark Knight, the recommendation system might identify other action movies
with similar features, such as The Avengers or Mission Impossible, as potential
recommendations.
The ability to offer accurate and comprehensible suggestions is one of the benefits
of item-based filtering. Item-based filtering, in contrast to other
recommendation
These frameworks can offer many functions for creating recommendation systems,
ranging from straightforward collaborative filtering models to more intricate deep
learning models. The project's unique objectives and restrictions, like the size of
the dataset, the difficulty of the model, and the available computing resources, all
influence the framework that is chosen.
2.4. Anaconda
Use the `conda create` command to create a virtual environment for your Hotel Review
System project. This helps to isolate your project dependencies.
Eg :
conda create --name hotel_management_system python
Activate the environment before installing packages.
Eg :
conda activate hotel_management_system
Install the necessary Python packages for your hotel management system using
`pip` or `conda` commands. For example:
conda install numpy pandas scikit-learn
Anaconda has a large and active community of users and contributors. This
community support is valuable for getting help, sharing knowledge, and staying
updated on best practices in the data science and Python ecosystem.
Chapter 3 Literature Review
3 | Ha Thi Thu Nguyen & Trung Xuan Nguyen | July, 2022 | The study uses Python and the Vader library to analyze 20,551 hotel reviews from major Vietnamese cities for developing satisfaction measurement formulas. | NLP-based sentiment analysis effectively measures customer satisfaction, aiding Vietnam's hotel industry in data-driven service improvement.
4 | Nicolau, J. L., Xiang, Z., & Wang, D. | May, 2022 | The study uses time-based sentiment analysis of daily hotel reviews to examine their impact on OR, ADR, and RevPAR. | Daily sentiment influences key hotel performance metrics, supporting its use in real-time pricing and reputation strategies.
5 | Dominic Gabbard, York University | Accepted December, 2023 | The study employs desktop research to analyze existing literature and data on the impact of online reviews on hotel performance metrics across global regions. | Online reviews significantly influence hotel performance, supporting the need for active reputation management and strategic engagement based on review sentiment and credibility.
Chapter 4 Software Requirement Analysis
4.1. Introduction
Recommendation systems have become an integral part of modern
software applications. They help users discover new products or services,
enhance the user experience, and increase customer satisfaction. Software
developers can use various techniques and algorithms to build
recommendation systems, depending on the application's requirements
and the available data.
4.1.1. Purpose
This document uses bold small-2 font for primary titles, small-3 font for
secondary titles, and size-4 font for body text. The name of the
application, Cin Suggest (Hotel Review System), is set in italics wherever it
is mentioned.
4.2. Perspective:
A Web Recommendation System using Content Based Filtering.
The hardware requirements for the web application are as follows:
To access the web portal of this application, a user needs only a PC, laptop, or
mobile device with an up-to-date web browser.
The data can be gathered from a variety of sources, such as hotel databases of
user ratings and reviews, or Kaggle datasets.
Kaggle hosts datasets covering a diverse array of topics, including but not limited to
machine learning, natural language processing, image recognition, finance,
healthcare, sports, and social sciences.
Many datasets on Kaggle are associated with data science competitions. These
competitions often have specific tasks and challenges that participants aim to solve
using machine learning and data analysis.
To explore datasets on Kaggle, you can visit the Kaggle Datasets page
(https://www.kaggle.com/datasets).
Overview
In this project, we create a model that predicts whether a hotel review is negative or
positive so that hotels can classify their reviews correctly. We analyze a specific hotel in
London and compare it to other London hotels. We walk through multiple Natural Language
Processing techniques to understand how machines can read reviews and extract insights
from them. Baseline models include Logistic Regression, Random Forest, Naive Bayes, and
Support Vector Machine (SVM). Ensemble models include Voting, Bagging, AdaBoost, and
Gradient Boosting, and Grid Search was used for hyperparameter tuning. The final model was
a Grid Search SVM with an accuracy of 0.8268 and an F1 score of 0.8247.
This project walks through exploratory data analysis, data cleaning, sentiment analysis, data
preprocessing, vanilla models, and ensemble models.
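The two metrics quoted in this overview can be computed as in the sketch below. The labels are hypothetical, and the F1 score here is computed for the positive class only; the report does not state which averaging was used, so that choice is an assumption.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and the F1 score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={accuracy:.4f} f1={f1:.4f}")
```

Accuracy counts all correct predictions, while F1 balances precision and recall and is therefore less flattering when one class is rarely predicted correctly.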
Business Problem
One of the biggest problems that many companies have been trying to overcome is how to
take advantage of all the data collected from guests. The amount of data has challenged the
travel industry. One type of data is reviews left by guests on websites such as Booking.com,
TripAdvisor, and Yelp.
Hotels have been trying to find ways to analyze the reviews and get insights out of them.
However, some hotels receive thousands of guests every week and hundreds of reviews.
It becomes expensive and nearly impossible for hotels to keep track of the reviews. Thus,
many hotels might ignore this valuable data due to the cost and effort that need to be
allocated. The other problem is that websites such as Booking.com do not allow users to
choose their score.
Our actual client is a hotel in London called Britannia International Hotel Canary Wharf.
They have thousands of reviews and a 6.7 overall score on Booking.com. They think this is
a low score compared to other London hotels, and they want to understand what is causing
this low score. Due to COVID-19, they do not have the resources to read all the reviews
and make sense of them. Thus, they want to find a way to get quick insights without
having to read every review. They have a few business questions:
• Can we create a model that correctly identifies the most important features when
predicting whether a review is positive or negative, using all the reviews we have
available? What are these features?
• What are the most mentioned words in negative and positive reviews? What insights
could they get from them? What would a word cloud for negative and positive
reviews look like for their hotel, and in comparison to other hotels?
• How does the client's score perform compared to other hotels in the city?
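The second question above, about the most mentioned words, reduces to counting word frequencies per class, which is also the input a word cloud needs. The sketch below shows one way to do this; the reviews, the stop-word list, and the function `top_words` are illustrative assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not the project's list).
STOP_WORDS = {"the", "was", "and", "a", "were", "is", "very"}


def top_words(reviews, n=3):
    """Return the n most frequent non-stop-words across the reviews."""
    counts = Counter()
    for review in reviews:
        words = re.findall(r"[a-z']+", review.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


negative_reviews = ["The room was dirty", "Dirty bathroom and rude staff",
                    "Staff were rude"]
positive_reviews = ["Great location and friendly staff", "The location is great"]
print(top_words(negative_reviews))
print(top_words(positive_reviews, n=2))
```

Comparing the two lists side by side gives a quick, readable summary before any model is trained.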
While doing the Exploratory Data Analysis on the dataset downloaded from Kaggle, I
noticed that Britannia International Hotel Canary Wharf was the hotel with the highest
number of reviews. The average score is 6.7, which means that there is probably room for
improvement. With a score in this range, the word clouds for negative and positive
reviews are more likely to differ.
Vanilla Model
For the modeling process, I chose multiple models, testing them with different vectorizers in
different stages of data cleaning. For the baseline models, I ran Logistic Regression, Random
Forest, Naive Bayes, and Support Vector Machine.
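To show what one of these baselines does internally, the sketch below implements a small multinomial Naive Bayes classifier with add-one smoothing using only the standard library. It is a simplified illustration, not the implementation used in the project; the training reviews and the class `NaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training reviews.
texts = ["clean room friendly staff", "great location",
         "dirty room rude staff", "awful dirty bathroom"]
labels = ["positive", "positive", "negative", "negative"]
model = NaiveBayes().fit(texts, labels)
print(model.predict("friendly staff great room"))
print(model.predict("rude staff dirty bathroom"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.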
I ran the models with the Count Vectorizer and TF-IDF vectorizers to compare which one
would perform best. I also tried these models with and without lemmatization.
I did not include other features such as the name of the hotel or location because the main
objective is to train a model using the reviews only.
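The difference between the two vectorizers can be seen on a toy corpus. The sketch below uses the textbook TF-IDF formula (term frequency times the logarithm of inverse document frequency); scikit-learn's `TfidfVectorizer` applies a smoothed variant with normalization by default, so its numbers would differ. The tokenized reviews and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tokenized reviews.
docs = [["dirty", "room", "dirty", "staff"], ["friendly", "staff"],
        ["clean", "room"]]


def count_vector(doc):
    """Raw term counts, as a count vectorizer would produce."""
    return dict(Counter(doc))


def tfidf_vector(doc, corpus):
    """Term frequency weighted by inverse document frequency (unsmoothed)."""
    n_docs = len(corpus)
    vector = {}
    for term, count in Counter(doc).items():
        doc_freq = sum(term in d for d in corpus)
        vector[term] = (count / len(doc)) * math.log(n_docs / doc_freq)
    return vector


print(count_vector(docs[0]))
print(tfidf_vector(docs[0], docs))
```

Counts treat every word alike, whereas TF-IDF lowers the weight of words such as "staff" that appear in many reviews.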
The vanilla models performed well from the beginning, with an accuracy of 0.7981. The
time I spent cleaning the data paid off. The best-performing model was an SVM with the
RBF kernel, with an accuracy of 0.8233 and an F1 score of 0.8205. However, SVM models
using the RBF kernel do not allow feature importances to be retrieved. I tried an SVM
with a linear kernel, but its performance was poor compared to the RBF kernel.
Thus, Random Forest with lemmatized words was the winner among the vanilla models.
You can see all the models I ran in the vanilla models notebook.
Above we can see the top 25 most important features for each class. The green section
contains the features with the highest weight for the positive class and the reddish
section contains the features with the highest weight for the negative class. For the
negative class, the lower the weight of a feature, the higher its importance; for the
positive class, the higher the weight, the higher the importance. Let's analyze the results.
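Reading off the strongest features for each class amounts to sorting the signed weights. The sketch below shows this with a handful of words mentioned in this report; the numeric weights are invented for illustration and are not the project's results.

```python
# Hypothetical signed weights; the values are invented for illustration.
weights = {"dirty": -2.1, "rude": -1.8, "old": -1.0, "overpriced": -1.2,
           "staff": 0.9, "comfortable": 1.5, "location": 1.7}


def top_features(weights, n=3):
    """Return the n strongest negative and n strongest positive features."""
    ranked = sorted(weights.items(), key=lambda item: item[1])
    negative = ranked[:n]        # lowest weights first
    positive = ranked[::-1][:n]  # highest weights first
    return negative, positive


negative, positive = top_features(weights)
print("negative:", [word for word, _ in negative])
print("positive:", [word for word, _ in positive])
```

The same sorting applies to any model that exposes one signed weight per word.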
When looking at the top features for the negative class predictor, we can find words that
everyone could expect from negative reviews such as awful, horrible, and bad. However,
with the negative class feature predictor, we can have more insights and areas that every
hotel should consider as critical. It's interesting to see that dirty has a higher weight than
overpriced and dated. Let's take a look at a few features:
• Dirty: It's the most crucial feature when predicting negative reviews. This means that
dirty is a giant red flag.
• Rude: Probably talking about the staff, which means that the hotel needs improvement
in training.
• Old: It's probably related to the hotel being outdated.
• Overpriced: This is obvious to me. If people pay more money than they think it's worth,
they will complain.
Final Recommendations
All models were trained on the reviews of over 1,400 hotels. Thus, the model can be used
for any hotel because it was trained on over 515k reviews. Britannia International Hotel
Canary Wharf can use our model to classify reviews at any point. However, the most
important takeaway here is the feature importance. Since guests tend to expect the same
things in every hotel, we learned that words such as staff, location, comfortable, and
dirty carry a higher weight in their reviews. These words also match the word cloud
created for Britannia International Hotel Canary Wharf, which suggests that the hotel,
like others, needs to focus on those areas.
I recommend the hotel start using word clouds to get quick insights from negative and
positive reviews. The negative reviews can be used to improve the business and should be
acted on as soon as possible if the hotel wants to increase its overall score. The
positive reviews can be used for advertisement, for example.
Conclusion
Machine learning can be used to identify positive and negative reviews correctly.
However, identifying them with 100% confidence is difficult. My final model can be used
for any hotel to find feature importance. Word clouds can be used to understand which
words appear most in negative and positive reviews, so management can quickly take a
look and get insights out of them. The Britannia International Hotel Canary Wharf performs
poorly in reviews compared to other hotels in London. There is a lot of room for improvement.
The system's successful deployment underscores its pivotal role in enhancing operational
efficiency, guest experiences, and revenue generation within the hotel industry. Through
automation and streamlining of processes, the system minimizes manual errors, improves
workflow efficiencies, and empowers staff to deliver superior and personalized guest
services. This not only elevates guest satisfaction but also contributes to fostering long-
term guest loyalty and positive reviews.
While this project has achieved significant milestones in creating an efficient hotel
management system, there's a recognition of ongoing opportunities for enhancement
and adaptation. Future iterations could explore integrating advanced analytics,
incorporating additional features for greater personalization, or expanding the system's
scalability to accommodate emerging industry trends.
o https://www.kaggle.com/datasets