Final Report PDF

The document is a project report on 'Hotel Review Sentiment Analysis' submitted for a Bachelor of Technology in Computer Science and Engineering. It outlines the development of an Intelligent Hotel Review System (IHRS) that utilizes Natural Language Processing and sentiment analysis to automate the collection and analysis of hotel reviews, providing actionable insights for hotel management. The report includes acknowledgments, objectives, challenges, and the technological framework for the project, emphasizing the importance of customer feedback in enhancing guest experiences and operational efficiency in the hospitality industry.

Uploaded by

anujjkumar.work
Copyright
© © All Rights Reserved

HOTEL REVIEW SENTIMENT ANALYSIS

A Project Report submitted in partial fulfillment of the requirements for


the award of the degree of

Bachelor of Technology

in

Computer Science and Engineering


By

Arya Siddarath Rao (2115000210)


Geeta Singh (2115000412)
Pratham Kumar (2115000754)

Under the Guidance of

MR. VIBHOO SHARMA


Department of Computer Engineering & Applications

Institute of Engineering & Technology


DECLARATION

I hereby declare that the work which is being presented in the B.Tech. Project
“HOTEL REVIEW SENTIMENT ANALYSIS,” in partial fulfillment of the
requirements for the award of the Bachelor of Technology in Computer Science
and Engineering and submitted to the Department of Computer Engineering and
Applications of GLA University, Mathura, is an authentic record of my own
work carried out under the supervision of Mr. Vibhoo Sharma.

The contents of this project report, in full or in parts, have not been submitted to
any other institute or University for the award of any degree.

Sign
Name of Student: Arya Siddarath Rao
University Roll No.: 2115000210

Sign
Name of Student: Geeta Singh
University Roll No.: 2115000412

Sign
Name of Student: Pratham Kumar
University Roll No.: 2115000754

CERTIFICATE

This is to certify that the above statements made by the candidate are correct to the
best of my/our knowledge and belief.

Supervisor
(Mr. Vibhoo Sharma)
Assistant Professor
Dept. of Computer Engg, & App.

Project Co-ordinator
(Dr. Mayank Srivastava)
Associate Professor
Dept. of Computer Engg, & App.

Program Co-ordinator
(Dr. Nikhil Govil)
Associate Professor
Dept. of Computer Engg, & App.

Date:

ACKNOWLEDGEMENT

It gives us a great sense of pleasure to present the Project report undertaken during B.
Tech. This project is an acknowledgment of the inspiration, drive and technical
assistance contributed to it by many individuals. This project would never have seen the
light of the day without the help and guidance that we have received.

Our heartiest thanks to Dr. Sandeep Kumar Rathor, Head of Dept., Department of CEA
for providing us with an encouraging platform to develop this project, which thus helped
us in shaping our abilities towards a constructive goal.

We owe a special debt of gratitude to Mr. Vibhoo Sharma, Assistant Professor,
Institute of Engineering & Technology, for his constant support and guidance
throughout the course of our work. His sincerity, thoroughness, and
perseverance have been a constant source of inspiration for us. He has shared
his extensive experience, ideas, and insightful comments at virtually
all stages of the project and has also taught us about the latest industry-oriented
technologies.

We also would not like to miss the opportunity to acknowledge the contribution
of all faculty members of the department for their kind guidance and cooperation
during the development of our project. Last but not the least, we acknowledge
our friends for their contribution to the completion of the project.

Sign
Name of Student: Arya Siddarath Rao
University Roll No.: 2115000210

Sign
Name of Student: Geeta Singh
University Roll No.: 2115000412

Sign
Name of Student: Pratham Kumar
University Roll No.: 2115000754

ABSTRACT

The hospitality industry, being highly service-oriented and competitive, increasingly depends on
customer feedback and online reviews to maintain service quality and enhance guest satisfaction. With
the growing volume and complexity of reviews, manual processing has become inefficient,
necessitating an automated solution like the Intelligent Hotel Review System (IHRS). This system is
designed to streamline the collection, analysis, and presentation of hotel reviews, converting
unstructured textual feedback into actionable insights. IHRS integrates advanced Natural Language
Processing (NLP) and sentiment analysis to accurately detect and classify sentiments—positive,
negative, or neutral—associated with various aspects of the hotel experience such as room cleanliness,
staff interaction, food quality, and amenities. Using machine learning models that adapt over time, the
system continuously improves its understanding of language nuances, making it more reliable than
conventional methods. Beyond sentiment classification, IHRS identifies recurring themes and patterns
in guest reviews, offering hotel managers a comprehensive understanding of their strengths and areas
needing improvement. A key feature of the system is its interactive visualization dashboard, which uses
dynamic charts, graphs, heatmaps, and filters to present insights in a clear and actionable format. This
enables managers to monitor key performance indicators, evaluate trends, and make data-informed
decisions. Additionally, the system provides department-specific recommendations—such as service
enhancements for housekeeping, dining, or front-desk operations—helping to optimize internal
processes and elevate the overall guest experience. For prospective customers, IHRS delivers
summarized review highlights and sentiment summaries, aiding them in making well-informed booking
decisions based on real user experiences. In essence, IHRS represents a transformative approach to
hotel management by automating review analysis and delivering strategic insights that foster
continuous improvement, operational efficiency, and stronger guest relationships in an increasingly
data-driven hospitality landscape.

LIST OF FIGURES

4.1 DFD LEVEL 0
4.2 DFD LEVEL 1
4.3 DFD LEVEL 2
4.4 USE CASE DIAGRAM
4.5 SEQUENCE DIAGRAM
4.6 CLASS DIAGRAM

LIST OF TABLES

4.5 Data Set

CONTENTS

DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES

CHAPTER 1 Introduction
1.1. Overview And Motivation
1.2. Objective
1.3. Key Challenges and Potential Solutions
1.3.1. Data Sparsity
1.3.2. Cold Start Problem
1.3.3. Diversity and Novelty
1.3.4. Overfitting
1.3.5. Interpretability
1.4. Contribution
1.5. Scope Definition
1.6. Organization Of the Project Report

CHAPTER 2 Tools and Technologies
2.1. Python
2.1.1. Basic Usage
2.1.2. Notable Features
2.2. Filtering
2.3. Item Based Filtering
2.4. Anaconda

CHAPTER 3 Literature Review

CHAPTER 4 System Requirement Analysis
4.1. Introduction
4.1.1. Purpose
4.1.2. Document Convention
4.1.3. Intended Audience
4.2. Perspective
4.3. Product Functions
4.4. User Classes and Characteristics
4.5. Operating Environment
4.6. Performance Requirements
4.6.1. Hardware Requirements
4.6.2. Software Requirements

CHAPTER 5 Software Design
5.1. Dataflow Diagram
5.1.1. DFD Level 0
5.1.2. DFD Level 1
5.1.3. DFD Level 2
5.2. Use Case Diagram
5.3. Sequence Diagram
5.4. Class Diagram

CHAPTER 6 Implementation & Result

CHAPTER 7 Conclusion

CHAPTER 8 Summary

CHAPTER 8 REFERENCE

Chapter 1 INTRODUCTION

1.1. Overview And Motivation

In the contemporary hospitality industry, customer reviews play a pivotal role in
influencing the choices of potential guests and shaping the reputation of hotels.
Recognizing the significance of guest feedback, the development of a Hotel Review
System becomes crucial to efficiently manage, analyze, and derive actionable
insights from the wealth of information provided by guests.

The Hotel Review System is a comprehensive platform designed to streamline the
process of collecting, processing, and utilizing guest reviews to enhance the overall
guest experience and improve hotel operations.

It integrates advanced technologies such as Natural Language Processing (NLP),
sentiment analysis, and data analytics to automate the extraction of valuable
information from textual reviews. This system goes beyond merely presenting
reviews; it aims to provide in-depth insights that can guide decision-making at
various levels within the hotel management hierarchy.

The Hotel Review System serves as a strategic tool for hotels to harness the power
of guest feedback, driving continuous improvement, and ensuring that the evolving
needs and preferences of guests are met with agility and precision.

Dept. of CEA, GLAU, Mathura 1



1.2. Objective

The objectives of a Hotel Review System are multifaceted and aim to address various
aspects of guest satisfaction, operational efficiency, and overall improvement in the
hospitality industry. The key objectives are:

• Provide a platform for guests to share their experiences, opinions, and feedback.
Identify positive aspects of guest experiences to reinforce and replicate.

• Manage and maintain a positive online reputation by addressing guest concerns.
Promote positive reviews to bolster the hotel's image and attract new guests.
Minimize the impact of negative reviews by actively addressing and resolving issues.

By aligning the Hotel Review System with these objectives, hotels can create a
dynamic feedback loop that not only addresses current guest experiences but also
contributes to the continuous improvement of services and operations.


1.3. Key Challenges and Potential Solutions

1.3.1. Data Sparsity

Data sparsity in the context of a Hotel Review System (HRS) refers to a situation
where the available data for analysis is insufficient or incomplete, making it
challenging to draw meaningful insights or make accurate predictions. In a hotel
review system it can appear, for example, as hotels with very few reviews or as
aspects of a stay that few guests comment on. Addressing data
sparsity in a Hotel Review System is crucial for ensuring the reliability and relevance
of the insights derived from guest feedback. Employing a combination of strategies
to encourage more reviews and diversify data sources can help mitigate the
challenges associated with sparse data.
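As a rough illustration of how sparsity might be quantified, the sketch below counts the share of empty (hotel, aspect) cells in a small table of review counts. The hotel names, aspect list, and counts are invented for the example and are not taken from the project's dataset.

```python
# Hypothetical review counts: hotel -> {aspect: number of reviews mentioning it}
review_counts = {
    "hotel_a": {"cleanliness": 12, "staff": 8, "food": 5},
    "hotel_b": {"cleanliness": 1},
    "hotel_c": {},
}
# Assumed list of aspects the system tracks
ASPECTS = ["cleanliness", "staff", "food", "amenities"]


def sparsity(counts, aspects):
    """Fraction of (hotel, aspect) cells that have no review at all."""
    total_cells = len(counts) * len(aspects)
    filled = sum(
        1
        for per_hotel in counts.values()
        for aspect in aspects
        if per_hotel.get(aspect, 0) > 0
    )
    return 1 - filled / total_cells


print(f"sparsity = {sparsity(review_counts, ASPECTS):.2f}")
```

A value close to 1 means most cells are empty, which suggests that per-aspect insights for many hotels would rest on very little evidence.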

1.3.2. Cold Start Problem

The "Cold Start Problem" in the context of a Hotel Review System (HRS) refers to
the challenges and limitations faced when dealing with new hotels that have limited
or no historical data, or when introducing new features or aspects for which there is
insufficient data. This problem can hinder the system's ability to provide accurate
recommendations, predictions, or insights, as it lacks the necessary information to
make informed decisions.
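One commonly used way to soften the cold start problem is to shrink a new hotel's score toward a global average until enough reviews arrive. The sketch below shows this idea under stated assumptions: the function name, the 0 to 1 sentiment scale, and the `prior_weight` value are all hypothetical and are not described in the report.

```python
def smoothed_score(hotel_scores, global_mean, prior_weight=5):
    """Shrink a hotel's mean sentiment toward the global mean.

    prior_weight acts like that many imaginary reviews at the global mean,
    so a hotel with no reviews simply receives the global mean.
    """
    n = len(hotel_scores)
    return (prior_weight * global_mean + sum(hotel_scores)) / (prior_weight + n)


global_mean = 0.6  # assumed average sentiment across all hotels (0..1)
print(smoothed_score([], global_mean))          # new hotel, no reviews
print(smoothed_score([1.0], global_mean))       # one glowing review
print(smoothed_score([1.0] * 50, global_mean))  # many consistent reviews
```

With few reviews the score stays near the global mean, and it moves toward the hotel's own average as evidence accumulates.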

1.3.3. Diversity And Novelty

In the context of a Hotel Review System (HRS), diversity and novelty are essential
considerations for providing a rich and engaging user experience. Ensuring diversity
in the recommendations and presenting novel options to users can enhance their
satisfaction and encourage exploration.


1.3.4. Overfitting
In the context of a Hotel Review System (HRS), overfitting refers to a situation
where the system learns the training data too well, capturing noise or random
fluctuations in the data rather than the underlying patterns or relationships. Overfitting
can lead to poor generalization performance, meaning that the system may perform
well on the training data but poorly on new or unseen data. Common mitigation
strategies include evaluating on held-out data, cross-validation, and regularization.
By applying such strategies, an
HRS can reduce the risk of overfitting and ensure that its recommendations
generalize well to new data, providing accurate and robust suggestions to users.
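A simple way to notice overfitting is to compare accuracy on the training data with accuracy on reviews the model has never seen. The sketch below uses a deliberately overfit toy model that memorizes its training reviews. The class, the reviews, and the labels are invented for illustration and do not represent the model used in this project.

```python
from collections import Counter


class MemorizingClassifier:
    """Deliberately overfit model: it memorizes training reviews verbatim."""

    def fit(self, texts, labels):
        self.lookup = dict(zip(texts, labels))
        # Fall back to the most common training label for unseen text.
        self.default = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.lookup.get(text, self.default)


def accuracy(model, texts, labels):
    """Share of reviews whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(texts)


# Hypothetical labelled reviews, split into training and held-out sets.
train_texts = ["clean room", "rude staff", "great food", "lovely pool"]
train_labels = ["positive", "negative", "positive", "positive"]
val_texts = ["dirty room", "friendly staff", "cold food", "noisy pool"]
val_labels = ["negative", "positive", "negative", "negative"]

model = MemorizingClassifier().fit(train_texts, train_labels)
train_acc = accuracy(model, train_texts, train_labels)
val_acc = accuracy(model, val_texts, val_labels)
print(f"train accuracy = {train_acc:.2f}, validation accuracy = {val_acc:.2f}")
```

A large gap between the two numbers is the usual warning sign: the model reproduces what it has seen but has not learned anything that transfers to new reviews.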

1.3.5. Interpretability
Interpretability in the context of a Hotel Review System (HRS) refers to the system's
ability to provide clear, understandable, and meaningful explanations for its
recommendations or predictions. An interpretable HRS is valuable for both users and
stakeholders, as it fosters trust, aids in decision-making, and allows users to
comprehend the rationale behind the system's suggestions.

1.4. Contribution

o Enhanced Guest Experience: HRS allows hotels to gather feedback from
guests and identify areas for improvement. By addressing issues highlighted in
reviews, hotels can enhance the overall guest experience.

o Reputation Management: HRS helps hotels manage their online reputation
by monitoring and responding to guest reviews. Positive reviews can attract new
guests, while addressing negative reviews demonstrates responsiveness.


1.5. Scope Definition


The scope definition for a Hotel Review System (HRS) outlines the boundaries,
objectives, and functionalities of the system. It helps establish a clear
understanding of what the system aims to achieve and what features it will
include. It also provides a foundation for the development and
implementation of the system, covering its objectives,
functionalities, limitations, and considerations for future growth and
enhancements.

1.6. Organization Of the Project Report

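The project report is structured in a logical and coherent manner to enhance
readability and understanding. It is organized as follows:

Chapter 1 introduces the project, its objectives, key challenges, contribution, and scope.
Chapter 2 describes the tools and technologies used.
Chapter 3 reviews the related literature.
Chapter 4 presents the system requirement analysis.
Chapter 5 covers the software design.
Chapter 6 describes the implementation and results.
Chapter 7 gives the conclusion, followed by the summary and references.

Organizing the project report in this manner ensures a structured and
comprehensive presentation of the Hotel Review System development, making it
accessible and informative for readers and stakeholders.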



Chapter 2 Tools and Technologies

2.1. Python

Python is an easy-to-learn programming language that is widely used in data science
and machine learning. Its readable syntax makes writing and maintaining code
straightforward, even for sophisticated systems such as recommendation engines.
Python libraries for recommendation systems include Surprise, LightFM, Turi Create,
and Apache Spark (through its Python API). These libraries provide pre-built algorithms
and tools for building recommendation systems, making it easier to get started. Python
also handles data at varying scales, so it may be applied to the
development of recommendation engines that can work with both small and big
datasets.
Python is also well suited to quick prototyping. It enables programmers to
easily test out various techniques and algorithms when creating recommendation
systems.

2.1.1. Basic Usage

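The following is a rudimentary example of Python usage with recommendation
system libraries. It assumes that train_data is an item-by-user rating DataFrame
indexed by item ID, and that user_ratings holds one rating per item in the same
row order:

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

def get_top_recommendations(train_data, user_ratings, num_recommendations):
    # Pairwise cosine similarity between items (the rows of train_data)
    item_similarity_matrix = cosine_similarity(train_data)
    # Similarity-weighted sum of the user's ratings for every item
    weighted_sum = item_similarity_matrix.dot(user_ratings['rating'].values)
    # Normalize by total similarity to obtain a weighted average
    weighted_avg = weighted_sum / np.abs(item_similarity_matrix).sum(axis=1)
    recommendations = pd.DataFrame({'item_id': train_data.index,
                                    'weighted_avg': weighted_avg})
    return recommendations.sort_values(by='weighted_avg',
                                       ascending=False).head(num_recommendations)

top_recommendations = get_top_recommendations(train_data, user_ratings,
                                              num_recommendations)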

2.1.2. Notable Features

Interpreted and Dynamically Typed

Python is an interpreted language, so there is no separate compilation step
before running code. This makes it simple to quickly create and test
code. It is also a dynamically typed language, meaning that variable types are determined at
runtime rather than at compile time. This makes it more flexible and forgiving than
statically typed languages. Python also has a large and active community that has developed
many third-party libraries and frameworks for various purposes.

Components

Python code is executed using an interpreter. You can use the Python interpreter to
run code interactively or to execute Python scripts. Here is an example of
running some Python code in the interactive interpreter:

>>> print("Hello, World!")
Hello, World!

Python comes with a large and comprehensive standard library that provides many
useful modules and functions for tasks like working with files, networking, and data
manipulation. Here is an example of using the os module from the standard library to
list files in a directory:

import os

dir_path = "/path/to/directory"

for filename in os.listdir(dir_path):
    print(filename)

Third-Party Libraries and Frameworks: Python has a large and active community
that has developed many third-party libraries and frameworks for various
purposes. These libraries and frameworks make it easy to extend the functionality
of Python and build complex applications and systems. Here is an example of
using the popular NumPy library to create a 2D array:

import numpy as np

my_array = np.array([[1, 2], [3, 4]])

print(my_array)


Control Flow Structures

Python provides a variety of control flow structures, such as loops and conditional
statements, that allow developers to control the execution of their code. For example,
a for loop that prints each element of a list:

my_list = [1, 2, 3, 4, 5]

for item in my_list:
    print(item)

Python supports object-oriented programming, which allows developers to define
classes and objects that encapsulate data and behavior. Here's an example of
defining a class that represents a person:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def greet(self):
        print(f"Hello, my name is {self.name} and I am {self.age} years old.")

p = Person("Alice", 30)
p.greet()


Filtering Techniques
A group of procedures known as filtering techniques are used in recommendation
systems to go through vast volumes of data and produce user-specific suggestions.
Filtering techniques are an important part of recommendation systems as they
allow for personalized recommendations to be generated for each user based on
their preferences and behavior. However, these techniques are not without their
limitations, and they can struggle with certain types of items or users, such as those
with limited interaction data. Therefore, it is important to carefully choose and
tune the filtering technique used in a recommendation system to ensure that it
provides accurate and useful recommendations for users.

Natural Language Processing in Recommendation System

The field of research known as "Natural Language Processing" (NLP) focuses
on how human language and computers interact.
NLP can be used to analyze the text in product reviews, social media posts, or
other user-generated content, to understand the sentiment, opinions, and
preferences of users. This information can then be used to recommend products or
services that are likely to be relevant to the user's interests.
NLP can also be used to understand the meaning and intent behind user queries and
provide personalized search results based on the user's preferences and search history.
This can help to improve the relevance and accuracy of search results and provide a
more satisfying user experience.
Sentiment analysis is an NLP technique that involves
analyzing text to determine the sentiment or opinion expressed in it. Sentiment
analysis can be used in recommendation systems to analyze user reviews and
feedback to gain insights into the sentiment and opinions of users towards products
or services.
By analyzing the sentiment of user feedback, recommendation systems can identify
areas where users are experiencing issues or dissatisfaction, and suggest changes
that can improve the overall user experience. This can help to improve customer
satisfaction, loyalty, and retention.
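To make the idea of sentiment analysis concrete, here is a minimal lexicon-based sketch that labels a review as positive, negative, or neutral by counting sentiment words. The word lists and the `classify` function are illustrative assumptions. A production system would more likely rely on a trained model or a published lexicon, and this toy version ignores negation such as "not clean".

```python
import re

# Tiny hand-made lexicon; these word lists are illustrative assumptions.
POSITIVE = {"clean", "friendly", "great", "comfortable", "helpful"}
NEGATIVE = {"dirty", "rude", "noisy", "cold", "slow"}


def classify(review):
    """Return (label, score) using positive-minus-negative word counts."""
    words = re.findall(r"[a-z']+", review.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


for text in [
    "The room was clean and the staff were friendly.",
    "Rude reception and a noisy, dirty room.",
    "We stayed for two nights.",
]:
    print(classify(text), text)
```

Because the score is just a count of matched words, the result is easy to explain to a hotel manager, at the cost of missing context that a learned model could capture.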

Data Merging

Data merging is the process of combining different datasets that contain related
information to create a unified dataset. In recommendation systems, data merging
is often used to combine user and item data to create a user-item matrix that can
be used to generate personalized recommendations.
Consider an e-commerce site that sells books as an illustration. The website
has two datasets: one for users, which includes information such as
name, age, and location; and one for books, which includes information such as
author, genre, and publisher.
The two datasets can be linked through records of interactions (such as purchases,
ratings, or reviews), each of which carries shared keys such as the user ID and the
book ID. Joining on these keys produces a single dataset containing data on the users, the
books, and their interactions, from which a user-item matrix can be built.
Once the user-item matrix is created, it can be used to generate personalized
recommendations for each user based on their past interactions with the website.
For example, if a user has purchased several science fiction books in the past, the
recommendation system can use this information to recommend other science
fiction books that are likely to be of interest to the user.
In general, data merging is a crucial stage in developing recommendation
systems since it enables the system to draw on a variety of data sources to provide
individualized suggestions that are pertinent to and helpful to the user.
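The book-store illustration can be sketched in a few lines of plain Python. All records below are invented, and the field names (`user_id`, `book_id`, `rating`) are assumptions made for the example. In practice a library such as pandas would typically perform the join.

```python
# Hypothetical datasets keyed by their IDs.
users = {1: {"name": "Asha", "location": "Mathura"},
         2: {"name": "Ravi", "location": "Delhi"}}
books = {10: {"title": "Dune", "genre": "science fiction"},
         11: {"title": "Emma", "genre": "romance"}}
# Interaction records carry both keys, which is what makes the join possible.
ratings = [
    {"user_id": 1, "book_id": 10, "rating": 5},
    {"user_id": 1, "book_id": 11, "rating": 2},
    {"user_id": 2, "book_id": 10, "rating": 4},
]

# Join each interaction with its user and book attributes.
merged = [
    {**r, **users[r["user_id"]], **books[r["book_id"]]}
    for r in ratings
]

# Pivot into a user-item matrix: rows are users, columns are books.
# None marks a book the user has not rated.
user_item = {uid: {bid: None for bid in books} for uid in users}
for r in ratings:
    user_item[r["user_id"]][r["book_id"]] = r["rating"]

print(merged[0])
print(user_item)
```

The merged records keep every attribute side by side, while the user-item matrix keeps only the interaction values that a recommender would consume.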


Python books

A number of well-known books can be used to learn the language and
its various features.

Python Cookbook: This book is a collection of practical recipes and tips for
solving common programming problems in Python. It covers a wide range of
topics, including data structures, algorithms, file I/O, and network programming,
and provides useful code snippets and examples to help readers improve their
Python skills.

Fluent Python: This book is an advanced guide to using Python for
experienced programmers. It covers advanced topics such as decorators,
metaclasses, concurrency, and parallelism, and provides a deep dive into Python's
object-oriented programming features.

The Python Standard Library: This book serves as a comprehensive
reference guide for the modules and functions that come with Python. It is an
indispensable tool for every Python coder since it gives a thorough overview of
the Python language's fundamental concepts and built-in libraries.

Python for Data Analysis: This book is a practical guide to using Python for data
analysis and manipulation. It covers a wide range of topics, including data
cleaning, visualization, and statistical analysis, and provides real-world examples
and case studies to help readers apply their Python skills to real-world problems.

2.2. Filtering
Content Based Filtering

The content-based approach uses additional information about users and/or items.
This filtering method uses item features to recommend other items similar to what
the user likes, based on their previous actions or explicit feedback. In a
movie recommender system, for example, the additional
information can be the age, sex, job, or any other personal information for
users, as well as the category, main actors, duration, or other characteristics
of the movies (the items).

The main idea of content-based methods is to build a model, based on the
available “features,” that explains the observed user-item interactions. Still
considering users and movies, such a model can also offer insight into why a
user prefers certain movies. It lets us
make new predictions for a user easily: we look at the profile of this user
and, based on that information, determine relevant movies to suggest.

We can make use of a utility matrix for content-based methods. A utility matrix
records each user's preference for certain items. We assign a value to each
user-item pair, known as the degree of preference, and arrange these values in a
matrix of users against items. With the data gathered
from the user, the matrix can then be used to find relations between the items
the user likes and those the user dislikes.
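As a small sketch of how a utility matrix might be put to use, the code below stores degrees of preference for one user and derives which genres that user tends to like or dislike. The movie ratings, the 1 to 5 scale, and the neutral point of 3 are assumptions chosen for the example.

```python
# Hypothetical utility matrix: user -> {movie: degree of preference, 1..5}.
utility = {
    "user_1": {"Sleepless in Seattle": 5, "You've Got Mail": 4, "Alien": 1},
}
# Hypothetical item features (one genre per movie for simplicity).
genres = {
    "Sleepless in Seattle": "romantic comedy",
    "You've Got Mail": "romantic comedy",
    "Alien": "horror",
}


def genre_profile(ratings, genres, neutral=3):
    """Average (rating - neutral) per genre: positive means liked."""
    totals, counts = {}, {}
    for movie, rating in ratings.items():
        g = genres[movie]
        totals[g] = totals.get(g, 0) + (rating - neutral)
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


print(genre_profile(utility["user_1"], genres))
```

Positive values mark genres the user rates above the neutral point and negative values mark disliked genres, which is the kind of preference relation the utility matrix is meant to expose.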


Consider a movie streaming service that suggests movies to
users using content-based filtering. To find similarities between films, the service
may examine movie titles, genres, actors, directors, and other elements. If a user
has already seen and appreciated films featuring Tom Hanks and Meg Ryan, the
recommendation engine may suggest further films with those stars
or films in a related genre, such as romantic comedies.

Pseudo code for content-based filtering

• Define a user profile with preferences for certain features or characteristics
of items.

• Define a list of items with attributes or content.

• Define a function to compute the similarity between two items based on
their attributes.

• Define a function to recommend items based on the user profile and the
item list.

• Call the recommend-items function to generate personalized
recommendations.
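The steps above can be sketched as follows. This is one possible reading of the pseudo code, not the project's implementation: the profile and each item are represented as sets of attributes, Jaccard similarity is swapped in as the similarity function, and all names and data are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two attribute sets: shared / combined attributes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(profile, items, top_n=2):
    """Rank items by similarity between their attributes and the profile."""
    scored = [(jaccard(profile, attrs), name) for name, attrs in items.items()]
    # Highest similarity first; ties broken alphabetically by item name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical user profile and item catalogue.
profile = {"romance", "comedy", "tom hanks"}
items = {
    "movie_a": {"romance", "comedy", "meg ryan"},
    "movie_b": {"action", "thriller"},
    "movie_c": {"comedy", "tom hanks"},
}
print(recommend(profile, items))
```

Here the same similarity function compares the user profile with each item, so items sharing more attributes with the profile rank higher and items with no overlap are dropped.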

2.3. Item based Filtering


Item-based filtering is a strategy for making recommendations that compares
objects based on their characteristics or content and suggests items that are
comparable to those that a user has previously expressed interest in.

The main idea behind item-based filtering is to recommend items that are like the
items a user has already liked or rated highly, to increase the probability that the
user will also like the recommended items.


To achieve this, item-based filtering first builds a similarity matrix that measures
the similarity between each pair of items, based on their attributes or content. The
similarity matrix is then used to identify the top N most similar items to each item
in the user's history, and the ratings or reviews associated with these items are used
to predict the user's rating or preference for each recommended item.

For example, consider a movie recommendation system that uses item-based
filtering. The system first collects the ratings that users have given to each movie.
The algorithm then computes the similarity between movies using a similarity
metric such as cosine similarity. The similarity between two movies A and B is
computed as the dot product of their ratings vectors divided by the product of their
norms.
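Written out, with r_A and r_B denoting the ratings vectors of movies A and B and the index u running over users (notation introduced here for illustration), the cosine similarity described above is:

```latex
\mathrm{sim}(A, B)
  = \frac{\mathbf{r}_A \cdot \mathbf{r}_B}
         {\lVert \mathbf{r}_A \rVert \, \lVert \mathbf{r}_B \rVert}
  = \frac{\sum_{u} r_{u,A} \, r_{u,B}}
         {\sqrt{\sum_{u} r_{u,A}^{2}} \; \sqrt{\sum_{u} r_{u,B}^{2}}}
```

As a worked example with made-up ratings, if r_A = (5, 3, 0) and r_B = (4, 0, 0), the dot product is 20, the norms are sqrt(34) ≈ 5.83 and 4, and the similarity is 20 / (5.83 × 4) ≈ 0.86.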

In conclusion, item-based filtering is a popular approach for building
recommendation systems that generate recommendations based on similarities
between items. It is computationally efficient, resilient to new items being added to
the system, and produces accurate recommendations even for users with limited
interaction history. The algorithm analyzes the interactions of users with items,
computes the similarity between items using a similarity metric, and identifies the
k most similar items to a given item.


Pseudo code for item-based filtering

1. Create a user-item matrix.

2. Transpose the user-item matrix so that each row represents an item.

3. Compute the item-item similarity matrix.

4. To generate recommendations for a user:

   a. Get the items the user has interacted with.

   b. For each candidate item, compute the average of the user's ratings
   weighted by item similarity.

   c. Sort the candidate items by weighted average and return the top k items.
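The pseudo code can be turned into a short runnable sketch. The version below is an illustrative interpretation using only the standard library: the ratings are invented, unrated entries are treated as absent, and each candidate item is scored by the similarity-weighted average of the user's own ratings.

```python
from math import sqrt

# Hypothetical user-item ratings; missing entries mean "not rated".
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 1},
    "u3": {"B": 2, "C": 5},
}


def item_vectors(ratings):
    """Transpose user -> item ratings into item -> user ratings."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(v, w):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    dot = sum(v[u] * w[u] for u in v if u in w)
    norm = sqrt(sum(x * x for x in v.values())) * sqrt(sum(x * x for x in w.values()))
    return dot / norm if norm else 0.0


def recommend(user, ratings, k=1):
    """Return the top-k unseen items for the given user."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for cand in items:
        if cand in seen:
            continue
        sims = {i: cosine(items[cand], items[i]) for i in seen}
        total = sum(sims.values())
        # Similarity-weighted average of the user's own ratings.
        scores[cand] = sum(sims[i] * seen[i] for i in seen) / total if total else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(recommend("u1", ratings))
```

In a larger system the item-item similarities would normally be computed once and cached rather than recomputed per request, which is part of why the approach is considered efficient.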

The system might analyze the movie titles, genres, actors, directors, and other
features to identify similarities between movies. If a user has watched and enjoyed
The Dark Knight, the recommendation system might identify other action movies
with similar features, such as The Avengers or Mission Impossible, as potential
recommendations.

The ability to offer accurate and comprehensible suggestions is one of the benefits
of item-based filtering. Item-based filtering, in contrast to
recommendation approaches like user-based collaborative filtering, provides
recommendations based on the similarity between items, without depending on
user-to-user similarities. The justifications for the suggested items are therefore
simpler to understand and articulate.

Several frameworks offer functions for creating recommendation systems,
ranging from straightforward collaborative filtering models to more intricate deep
learning models. The project's objectives and constraints, like the size of
the dataset, the complexity of the model, and the available computing resources, all
influence the framework that is chosen.

Scikit-learn is a popular Python library for machine learning. It offers a range
of algorithms and tools that can serve as building blocks for recommendation
systems, such as pairwise similarity metrics, nearest-neighbor search, and
matrix factorization.

Surprise is a Python package created primarily for building collaborative
filtering-based recommendation systems. It provides a variety of pre-built
algorithms and tools for creating such models, including collaborative filtering
based on users and items.

Apache Mahout is an open-source framework that may be used to create scalable
machine learning models, such as recommendation engines. It is compatible with
distributed computing platforms like Apache Hadoop and Apache Spark and
provides a variety of techniques and tools for creating these models, including
matrix factorization and collaborative filtering.


2.4. Anaconda

Anaconda is a popular open-source distribution of the Python and R programming
languages for scientific computing, data science, and machine learning. It comes with
a collection of pre-installed packages and tools that are commonly used in these
fields, making it easier for users to set up and manage their development
environments.

Download and install Anaconda from the official website:
https://www.anaconda.com/products/distribution

Use the `conda create` command to create a virtual environment for your Hotel
Review System project. This helps to isolate your project dependencies. Replace
3.x with the specific Python version you intend to use.

Eg : bash

conda create --name hotel_management_system python=3.x

Activate the virtual environment

Eg :

conda activate hotel_management_system

Install the necessary Python packages for your Hotel Review System using
`pip` or `conda` commands. For example:

conda install pandas numpy django


Anaconda has a large and active community of users and contributors. This
community support is valuable for getting help, sharing knowledge, and staying
updated on best practices in the data science and Python ecosystem.

Overall, Anaconda is a powerful tool for data scientists, researchers, and
developers working on projects that involve scientific computing, data analysis,
and machine learning. Its ease of use, package management capabilities, and
pre-installed libraries make it a popular choice for individuals and organizations in
these fields.

Chapter 3 Literature Review

| Sr No | Paper Name / Author | Publication Year | Methodology | Conclusion |
|---|---|---|---|---|
| 1 | Mohammad Mashrekul Kabir, Zulaiha Ali Othman, and Mohd Ridzwan Yaakub | November 2023 | The study employs a hybrid aspect-based sentiment analysis approach using domain lexicons, SentiWordNet, and a Naïve Bayes classifier to extract and classify sentiments from hotel reviews. | The proposed model enhances sentiment accuracy and aspect extraction, offering valuable insights through improved precision and visual trend analysis for hotel management and potential guests. |
| 2 | Muhammad Sanwal, Muhammad Mamoon Mazhar | August 2023 | The study compares traditional ML and deep learning models (BERT, LSTM) for hotel review sentiment analysis using performance metrics on a labeled dataset. | BERT outperforms other models, highlighting the importance of contextual understanding in sentiment analysis. |
| 3 | Ha Thi Thu Nguyen and Trung Xuan Nguyen | July 2022 | The study uses Python and the VADER library to analyze 20,551 hotel reviews from major Vietnamese cities for developing satisfaction measurement formulas. | NLP-based sentiment analysis effectively measures customer satisfaction, aiding Vietnam's hotel industry in data-driven service improvement. |
| 4 | Nicolau, J. L., Xiang, Z., & Wang, D. | May 2022 | The study uses time-based sentiment analysis of daily hotel reviews to examine their impact on OR, ADR, and RevPAR. | Daily sentiment influences key hotel performance metrics, supporting its use in real-time pricing and reputation strategies. |
| 5 | Dominic Gabbard, York University | Accepted December 2023 | The study employs desktop research to analyze existing literature and data on the impact of online reviews on hotel performance metrics across global regions. | Online reviews significantly influence hotel performance, supporting the need for active reputation management and strategic engagement based on review sentiment and credibility. |

Chapter 4 Software Requirement Analysis

4.1. Introduction
Recommendation systems have become an integral part of modern
software applications. They help users discover new products or services,
enhance the user experience, and increase customer satisfaction. Software
developers can use various techniques and algorithms to build
recommendation systems, depending on the application's requirements
and the available data.

4.1.1. Purpose

The purpose of software analysis is to understand the problem at hand,
identify the requirements, and determine the best approach to solving the
problem. It involves gathering and analyzing information about the
problem domain, the users, the available data, and the constraints and
limitations of the system. Software analysis helps to define the scope
of the project, identify the risks and challenges, and determine the
feasibility and cost-effectiveness of different solutions. The goal of
software analysis is to ensure that the software solution meets the user's
needs, is scalable and maintainable, and delivers value to the stakeholders.

4.1.2. Document Convention:

This document uses font size small 2 with overstriking for primary titles, font
size small 3 for secondary titles, and font size 4 for the body text. Italics are
used whenever the name of the application, Cin Suggest (Hotel Review System), is
mentioned.

4.1.3. Intended Audience:

Dept. of CEA, GLAU, Mathura 22


In general, the primary audience for a recommendation system includes
end-users who are looking for personalized and relevant recommendations
based on their preferences, history, and behavior.


4.2. Perspective:
A Web Recommendation System using Content Based Filtering.

4.3. Product Functions:

The web application has a simple interface with:

A. A network of professionals at your disposal.

B. Promote a corporate culture based on mental wellbeing.

C. Increase the productivity of your business and decrease absenteeism.

D. Break the mold and drive change.

E. Improve your company's attraction and retention.

4.4. User Classes and Characteristics:


Different user classes have distinctive interests, needs, and
behaviors that must be considered when making recommendations. For
instance, power users may need more specialized and individualized
advice, whereas novice users may need more training and help.



4.5. Operating Environment:
Our software is a multi-functional software system based on the Windows platform.
It is compatible with, and can run on, 64-bit laptops and ordinary desktops as well as
smartphones.

4.6. Performance Requirements:


Since the system uses a client-server architecture, performance requirements are
critical for ensuring that the recommendation system meets the user's needs and
expectations.

The choice of performance measure depends on the specific goals and
requirements of the recommendation system, and different measures may be
more appropriate for different scenarios. For example, Normalized Discounted
Cumulative Gain (NDCG) measures the effectiveness of the recommendation system in
terms of the ranking quality of its recommendations.
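
As an illustration of the NDCG measure mentioned above, the following is a minimal sketch using only the Python standard library. It assumes the common definition in which the relevance at rank i is discounted by log2(i + 1); the relevance grades in `ranked` are hypothetical and are not taken from this project.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: relevance at rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades, listed in the order shown to the user.
ranked = [3, 2, 3, 0, 1]
print(round(ndcg(ranked), 4))
```

A value of 1.0 means the items were shown in the best possible order; lower values mean that relevant items were ranked too far down.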

4.6.1. Hardware Requirements:

The hardware requirements for the web application are as follows:

- 64-bit laptop or desktop.

- Processor with 1.7-2.4 GHz speed.

- Minimum of 8 GB RAM.

4.6.2. Software Requirements:

To access the web portal of this application, the user only needs a PC, laptop, or
mobile device with an up-to-date web browser.

Desktop browsers: Safari, Chrome, Firefox, Opera.

Mobile browsers: Android, Chrome Mobile.

On the server side, a PC or web server that meets these specifications:

- Windows operating system

- At least 8 GB RAM and 150 GB free space

- Anaconda distribution



Chapter 5 Software Design

5.1. Dataflow Diagram

5.1.1. DFD Level 0

Fig 5.1 DFD Level 0

5.1.2. DFD Level 1

Fig 5.2 DFD Level 1

5.1.3 DFD Level 2



Fig 5.3 DFD Level 2

5.2. Use Case Diagram

Fig 5.4 Use Case Diagram

5.3. Sequence Diagram

5.4. Class Diagram

Fig 5.6 Class Diagram

5.5. Data Set (after processing the Kaggle data)


Table 4.5 Data Set

This information can be gathered from a variety of sources, such as hotel databases of
user ratings and reviews, or Kaggle data.

Kaggle hosts datasets covering a diverse array of topics, including but not limited to
machine learning, natural language processing, image recognition, finance,
healthcare, sports, and social sciences.

Many datasets on Kaggle are associated with data science competitions. These
competitions often have specific tasks and challenges that participants aim to solve
using machine learning and data analysis.

To explore datasets on Kaggle, you can visit the Kaggle Datasets page
(https://www.kaggle.com/datasets).


Chapter 6 Implementation & Result

Overview
In this project, we will create a model that can predict whether a hotel review is negative or
positive so that hotels can use it to classify their reviews correctly. We will analyze a
specific hotel in London and compare it to other hotels in London as well. We will walk
through several Natural Language Processing techniques to understand how we can use
machines to read reviews and get insights out of them. Baseline models include Logistic
Regression, Random Forest, Naive Bayes, and Support Vector Machine (SVM). Ensemble and
tuning approaches include Voting, Bagging, Grid Search, AdaBoost, and Gradient Boosting.
The final model was a Grid Search SVM with an accuracy of 0.8268 and an F1 score of 0.8247.
This project walks through exploratory data analysis, data cleaning, sentiment analysis, data
preprocessing, vanilla models, and ensemble models.

Business Problem
One of the biggest problems that many companies have been trying to overcome is how to
take advantage of all the data collected from guests. The amount of data has challenged the
travel industry. One type of data is reviews left by guests on websites such as Booking.com,
TripAdvisor, and Yelp.
Hotels have been trying to find ways to analyze the reviews and get insights out of them.
However, some hotels receive thousands of guests every week and hundreds of reviews.
It becomes expensive, and nearly impossible, for hotels to keep track of the reviews. Thus,
many hotels might ignore this valuable data because of the cost and effort that would need
to be allocated. The other problem is that websites such as Booking.com do not allow users
to choose their score.
The score is determined by questions asked to the user, from which the review score is
calculated. This is problematic because guests could have had a bad experience, and the
hotel would still get a score of 7 or 8, which gives the illusion that the guest didn't have any
problems. This project will build a model that can correctly predict whether a hotel review is
negative or positive so that hotels can input their reviews and get an unbiased score.

Setting the hypothetical scenario

Our client is a hotel in London called Britannia International Hotel Canary Wharf.
They have thousands of reviews and a 6.7 overall score on Booking.com. They think this is
a low score compared to other London hotels, and they want to understand what is causing
it. Due to COVID-19, they do not have the resources to read all the reviews
and make sense of them. Thus, they want to find a way to get quick insights without
having to read every review. They have a few business questions:
• Can we create a model that can correctly identify the most important features when
predicting whether a review is positive or negative, using all the reviews we have
available? What are these features?

• What are the most mentioned words in negative and positive reviews? What insights
could they get from them? What would a word cloud for negative and positive
reviews look like for their hotel, and in comparison to other hotels?
• How does the client's score compare to other hotels in the city?

Why Britannia International Hotel Canary Wharf?

While doing the Exploratory Data Analysis on the dataset downloaded from Kaggle, I
noticed that Britannia International Hotel Canary Wharf was the hotel with the highest
number of reviews. The average score is 6.7, which means that there is probably room for
improvement. It is also more likely that the word clouds for its negative and positive
reviews will differ.


We can see that there is a big class imbalance. Since our dataset is large, we can fix this using
the pandas sample function. Thus, I will only use 12% of the positive reviews for this stage,
which brings their number closer to the number of negative reviews.
The reason why I reduced the data manually at this point in the preprocessing is to make the
spell checker faster, since it takes more than 24 hours to check 100k reviews. As a next step,
I will keep the class imbalance for the modeling stage.
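
The downsampling step described above can be sketched with the pandas `sample` function as follows. This is a minimal illustration on made-up data: the column names `review` and `label`, the toy row counts, and the `random_state` value are assumptions, not the project's actual code.

```python
import pandas as pd

# Toy stand-in for the review data: 100 positive and 12 negative reviews.
# The column names "review" and "label" are assumptions for illustration.
reviews = pd.DataFrame({
    "review": [f"review {i}" for i in range(112)],
    "label": ["positive"] * 100 + ["negative"] * 12,
})

positive = reviews[reviews["label"] == "positive"]
negative = reviews[reviews["label"] == "negative"]

# Keep 12% of the positive reviews; random_state makes the draw repeatable.
positive_sample = positive.sample(frac=0.12, random_state=42)

# Recombine the two classes and shuffle the rows.
balanced = (
    pd.concat([positive_sample, negative])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced["label"].value_counts())
```

Sampling a fraction instead of a fixed number of rows keeps the step independent of the exact dataset size.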


Vanilla Model
For the modeling process, I chose multiple models, testing them with different vectorizers at
different stages of data cleaning. For the baseline models, I ran Logistic Regression, Random
Forest, Naive Bayes, and Support Vector Machine.
I ran the models with the Count Vectorizer and TF-IDF vectorizers to compare which one
would have the best performance. I also tried these models with and without lemmatization.
I did not include other features such as the name of the hotel or location because the main
objective is to train a model using the reviews only.
The vanilla models performed well from the beginning, with an accuracy of 0.7981. The
time I spent cleaning the data paid off. The best-performing model was an SVM using the
RBF kernel, with an accuracy of 0.8233 and an F1 score of 0.8205. However,
SVM models using the RBF kernel do not allow feature importances to be retrieved. I tried
running an SVM with a linear kernel, but its performance was poor compared to the RBF
kernel. Thus, Random Forest with lemmatized words was the winner among the vanilla models.
You can see all the models I ran in the vanilla models notebook.
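
The comparison described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's code: the ten toy reviews and their labels are invented, scores are computed on the training data only to keep the example short, and the last lines show why an RBF-kernel SVM cannot be used for per-word feature importance (scikit-learn exposes `coef_` only for a linear kernel).

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented toy reviews; 1 = positive, 0 = negative.
texts = [
    "excellent location and friendly staff",
    "great room very comfortable bed",
    "lovely hotel and helpful staff",
    "amazing breakfast and spacious room",
    "perfect stay quiet and clean",
    "dirty room and rude staff",
    "terrible service and old furniture",
    "overpriced and uncomfortable bed",
    "awful smell and broken shower",
    "poor breakfast and tiny room",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Same classifier, two different vectorizers.
results = {}
for name, vectorizer in [("count", CountVectorizer()), ("tfidf", TfidfVectorizer())]:
    model = make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    # Training accuracy only; a real comparison needs a held-out test set.
    results[name] = model.score(texts, labels)
print(results)

# Only the linear kernel exposes per-feature weights through coef_.
rbf_svm = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf")).fit(texts, labels)
linear_svm = make_pipeline(TfidfVectorizer(), SVC(kernel="linear")).fit(texts, labels)
print(hasattr(rbf_svm[-1], "coef_"), hasattr(linear_svm[-1], "coef_"))
```

Wrapping the vectorizer and classifier in one pipeline ensures that the vocabulary is learned only from the data passed to `fit`.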

Best Vanilla Model using Count Vectorizer

As I mentioned, I tried different vectorizers to check which one would have the best
performance. Although Random Forest had a better performance than Logistic Regression,
my focus at this stage is to find the most important features. Thus, the best-performing
model was Logistic Regression, which was also the first model I tried. Let's see the results.


The results are quite good for a first model. I used cross-validation to confirm that there
is no underfitting or overfitting on the train set, and the results were quite similar to those
on the test set. Looking at the confusion matrix, we can see that the model correctly classifies
75% of both Negative and Positive reviews. A Random Forest model had a better accuracy than
Logistic Regression. However, it had a lower F1 score, which is the metric that I will be
focusing on further in the project. For this reason, I chose Logistic Regression as my best
model. You can see below the performance of every model.
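
To make the difference between accuracy and F1 score concrete, the sketch below computes both from confusion-matrix counts. The counts for the two models are hypothetical and chosen only to show that a model can have a higher accuracy and still a lower F1 score; they are not the project's results.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts for two models on 200 reviews.
model_a = {"tp": 75, "fp": 25, "fn": 25, "tn": 75}
model_b = {"tp": 60, "fp": 5, "fn": 40, "tn": 95}

for name, m in [("A", model_a), ("B", model_b)]:
    acc = accuracy(m["tp"], m["fp"], m["fn"], m["tn"])
    score = f1(m["tp"], m["fp"], m["fn"])
    print(name, round(acc, 4), round(score, 4))
```

In this made-up case model B misses many positive reviews, so its recall, and therefore its F1 score, drops even though its overall accuracy is higher.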



Looking at the confusion matrix, we can see that the model needs improvement in classifying
Positive reviews, since it has a higher number of False Negatives than False
Positives. Overall, the model performs well, but it has plenty of room for improvement in the
next steps of this project. Now, let's take a look at the 50 most important features for each
class using ELI5.

Above we can see the top 25 most important features for each class. The green section
contains the features with the highest weight for the positive class and the reddish section
contains the features with the highest weight for the negative class. For the negative class,
the lower the weight of a feature, the higher its importance; for the positive class, the
higher the weight, the higher its importance. Let's analyze the results.
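
The way such weights drive a prediction can be sketched as follows. This is a simplified illustration of a linear model rather than the ELI5 output itself: the weights are a few values copied from the tables in this chapter, the minus sign on the negative-class words is an assumption (the tables list magnitudes only), the bias is set to zero, and each word is counted once per review.

```python
import math

# A few weights copied from the report's tables. Negative-class weights are
# given a minus sign here (assumption: the tables list only their magnitudes).
weights = {
    "excellent": 6.092, "comfortable": 5.01, "friendly": 3.596,
    "dirty": -5.308, "rude": -4.277, "overpriced": -2.631,
}


def positive_probability(review, weights, bias=0.0):
    """Add up the weights of the known words present, then apply a sigmoid."""
    words = set(review.lower().split())
    score = bias + sum(w for word, w in weights.items() if word in words)
    return 1 / (1 + math.exp(-score))


def top_features(weights, k, positive=True):
    """Highest weights for the positive class, lowest for the negative class."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=positive)
    return [word for word, _ in ranked[:k]]


print(top_features(weights, 2))                  # strongest positive words
print(top_features(weights, 2, positive=False))  # strongest negative words
print(round(positive_probability("friendly staff but dirty room", weights), 3))
```

A review that mentions both a positive and a negative word is pulled towards the class whose word carries the larger absolute weight.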



Top Features - Positive Class
positive_features.transpose()
   Excellent  Great  Amazing  Comfortable  Lovely    Bit  Perfect  Spacious
0      6.092   5.27    5.082         5.01   4.544  4.222     4.15     3.823

   Loved  Friendly  Wonderful  Fantastic  Quiet  Brilliant  Superb   Nice
0  3.818     3.596      3.407      3.221  2.963      2.935   2.896  2.685

   Negative  Come  Beautiful  Modern  Helpful  Little  Large  Fabulous
0     2.674  2.61       2.56   2.482    2.424   2.266  2.235     2.234

The top features for predicting positive reviews weren't a surprise. Most of the features are
adjectives that we can relate to positive reviews, such as excellent, great, and amazing.
Although these words might not give much insight, some others are closely related to hotels
and can give us insights. Hotels should make sure that they are delivering these aspects to
their guests. Let's check a few that I believe can carry insights:
• Comfortable: The most important aspect of a hotel is comfort, so the guest can rest.
• Bit: It doesn't carry much meaning.
• Spacious: It's good when a hotel has a spacious room.
• Friendly: It could be talking about how friendly the staff is, a very important aspect.
• Quiet: It seems like quiet places are something that guests are looking for.
• Negative: Although it is a negative word, I assume that the guests are saying that there
isn't anything negative about the hotel.
• Modern: Modern hotels seem to be noticed in the reviews.
• Helpful: It's probably talking about the staff.

Top Features - Negative Class


negative_features.sort_values(by=0, ascending=False)
0
Dirty 5.308
Rude 4.277
Terrible 3.543
Poor 3.451
Worst 3.248
Tired 3.230
Old 3.208
Bad 3.105
Basic 2.865
Uncomfortable 2.843
Star 2.830
Horrible 2.801
Money 2.776
Paid 2.699
Overpriced 2.631
Awful 2.556
Unfriendly 2.428
Charged 2.252
Broken 2.234
Management 2.211
Dated 2.133
Tiny 2.118
Work 2.077
Run 2.066
Attitude 2.064
Dusty 2.040

When looking at the top features for the negative class predictor, we can find words that
everyone would expect from negative reviews, such as awful, horrible, and bad. However,
the negative class features give us more insights and point to areas that every
hotel should consider critical. It's interesting to see that dirty has a higher weight than
overpriced and dated. Let's take a look at a few features:
• Dirty: It's the most crucial feature when predicting negative reviews. This means that
dirty is a giant red flag.
• Rude: Probably talking about the staff, which means that the hotel needs to improve
its training.
• Old: It's probably related to the hotel being outdated.
• Overpriced: This is obvious to me. If people pay more money than they think it's worth,
they will complain.



Final Model Evaluation
Logistic Regression was the final model because of its high accuracy and because it shows the
feature importance for each class. The accuracy was 0.8183, which means that the model
correctly classifies the target variable 81.83% of the time. Looking at cross-validation, we can
see that the model performed similarly on the train set. I used 5 folds, and the difference
between the highest and lowest fold accuracy was very small.
Looking at the confusion matrix, we can see that the overall performance is acceptable.
However, the model tends to predict more False Positives than False Negatives. There is
definitely room for improvement as next steps.
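
The 5-fold check described above can be sketched with scikit-learn's `cross_val_score`. The toy reviews, the labels, and the choice of Count Vectorizer here are assumptions for illustration; with such a tiny and repetitive dataset the scores themselves are not meaningful, only the procedure is.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Invented toy reviews, repeated so that every fold contains both classes.
positive_texts = [
    "excellent location and friendly staff",
    "great room very comfortable bed",
    "lovely hotel and helpful staff",
    "amazing breakfast and spacious room",
    "perfect stay quiet and clean",
]
negative_texts = [
    "dirty room and rude staff",
    "terrible service and old furniture",
    "overpriced and uncomfortable bed",
    "awful smell and broken shower",
    "poor breakfast and tiny room",
]
texts = (positive_texts + negative_texts) * 2
labels = ([1] * 5 + [0] * 5) * 2

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))

# One accuracy value per fold; cv=5 gives five train/validation splits.
scores = cross_val_score(model, texts, labels, cv=5, scoring="accuracy")

# A small spread between the best and worst fold suggests stable performance.
spread = scores.max() - scores.min()
print(scores, round(float(spread), 4))
```

Because the vectorizer is inside the pipeline, it is refitted on each training split, so no vocabulary leaks from the validation fold.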

Final Recommendations
All models were trained with the reviews of over 1,400 hotels. Thus, the model can be used
for any hotel because it was trained on over 515k reviews. Britannia International Hotel
Canary Wharf can use our model to classify reviews at any point. However, the most important
takeaway here is the feature importance. Since guests tend to expect the same things in
every hotel, we learned that words such as staff, location, comfortable, and dirty carry
a higher weight in their reviews. These words also match the word cloud created for
Britannia International Hotel Canary Wharf, which suggests that the hotel, like the others,
needs to focus on those words.
I recommend that the hotel start using word clouds to get quick insights from negative and
positive reviews. The negative reviews can be used to improve the business and should be
acted on as soon as possible if the hotel wants to increase its overall score. The hotel
can also use positive reviews for advertisement, for example.
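
A word cloud is essentially a picture of word frequencies, so the same quick insight can be sketched with a simple count. The example below uses only the Python standard library; the sample reviews and the tiny stop-word list are made up for illustration, and no word cloud library is used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"the", "was", "and", "a", "is", "were", "very", "but", "in", "to"}


def top_words(reviews, n=3):
    """Return the n most frequent non-stop-words across the given reviews."""
    counts = Counter()
    for review in reviews:
        tokens = re.findall(r"[a-z']+", review.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)


# Made-up negative reviews for illustration.
negative_reviews = [
    "The room was dirty and the staff were rude",
    "Dirty bathroom and rude reception",
    "Dirty carpet",
]
print(top_words(negative_reviews, n=2))
```

Running the same function separately on negative and positive reviews gives the two word lists that the word clouds would display.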
Conclusion
• Machine learning can be used to identify positive and negative reviews correctly.
However, identifying them with 100% confidence is difficult.
• My final model can be used for any hotel to find feature importance.
• Word clouds can be used to understand which words appear the most in negative and
positive reviews. The management can quickly take a look and get insights out of them.
• The Britannia International Hotel Canary Wharf performs poorly in the reviews compared
to other hotels in London. There is a lot of room for improvement.



Chapter 7 Conclusion

In conclusion, the development and implementation of the hotel management system
represent a significant stride in revolutionizing the operational landscape of hospitality
establishments.
Throughout this project, our primary aim was to create an integrated and efficient system
that optimizes various facets of hotel operations, from reservations and guest profiles to
billing, inventory, and staff administration.

The system's successful deployment underscores its pivotal role in enhancing operational
efficiency, guest experiences, and revenue generation within the hotel industry. Through
automation and streamlining of processes, the system minimizes manual errors, improves
workflow efficiencies, and empowers staff to deliver superior and personalized guest
services. This not only elevates guest satisfaction but also contributes to fostering long-
term guest loyalty and positive reviews.

Furthermore, the system's data-centric approach enables hotel management to make
informed decisions backed by comprehensive analytics and insights. Centralized data on
occupancy rates, revenue trends, and guest preferences equips management with the
tools to strategize effectively, optimize resources, and tailor services to meet evolving
guest demands, ultimately bolstering the hotel's competitiveness in the market.

While this project has achieved significant milestones in creating an efficient hotel
management system, there's a recognition of ongoing opportunities for enhancement
and adaptation. Future iterations could explore integrating advanced analytics,
incorporating additional features for greater personalization, or expanding the system's
scalability to accommodate emerging industry trends.

In essence, the hotel management system stands as a testament to the convergence of
technology and hospitality, facilitating a more seamless, personalized, and efficient
guest experience while empowering hotel management with data-driven insights to
thrive in an increasingly competitive industry landscape.


Chapter 8 Summary
The hotel management system project epitomizes a concerted effort to
revolutionize and streamline the intricate operations within the hospitality
industry. With a focus on efficiency, guest satisfaction, and data-driven decision-
making, this project aimed to create a comprehensive system encompassing
various modules to optimize hotel operations.

The development journey began with a meticulous understanding of the industry's
needs, leading to the design and implementation of a robust system architecture.
This architecture, comprising modules for reservation management, guest profiles,
room allocation, billing, inventory control, and staff administration, laid the
foundation for a seamless and integrated system.

Throughout the project, a key emphasis was placed on enhancing operational
efficiency. The system's automation of routine tasks, such as reservations and
billing, significantly reduced manual errors and improved staff productivity. This
efficiency translated into smoother operations, quicker guest services, and a more
personalized experience for visitors.

Moreover, the system's data-centric approach provided invaluable insights. By
centralizing data and offering comprehensive analytics, it empowered hotel
management to make informed decisions. This data-driven decision-making not
only optimized resource allocation but also facilitated the tailoring of services to
meet guests' ever-evolving preferences, ultimately bolstering the hotel's competitive
edge.

While the project achieved significant milestones in creating an efficient hotel
management system, it also highlighted avenues for future growth. Potential
enhancements could include the incorporation of advanced analytics for predictive
insights or expanding the system's scalability to accommodate emerging industry
trends.

In summary, this hotel management system project epitomizes the fusion of
technology and hospitality, aiming to deliver superior guest experiences while
enabling hotel management to navigate the industry's challenges through informed
decision-making and operational excellence.

By incorporating these theoretical aspects into the design and implementation of a
hotel management system, hoteliers can create a robust and efficient system that
enhances the overall guest experience while optimizing operational processes.



