Data Analysis Project Report

This project report analyzes a hotel booking dataset to understand factors that influence cancellation rates. The objectives are to identify patterns in the data that can help minimize cancellations. A literature review found previous studies used the dataset to examine the relationship between lead time and cancellations, the impact of online reviews, and the effect of COVID-19. The methodology involves defining a problem statement, exploring and cleaning the data, analyzing it to gain insights, and presenting results.

A

Project Report On
Data Analysis On Hotel Booking Dataset

In partial fulfillment of requirements for the degree of


BCA - Bachelor In Computer Application

SUBMITTED BY:
Subhranshu Sekhar Mallick
Roll No:2004000629144***

Under The Guidance:


Asst.Prof.Snigdha Symphony Kar

INTERNATIONAL INSTITUTE OF
MANAGEMENT & TECHNOLOGY
(Recognized By Dept. of Higher Education, Govt. of Odisha and Utkal University)
(K8-682 Kalinganagar,Bhubaneswar - 751003)
[2020-2023]
CONTENTS

● Certificate

● Abstract

● Acknowledgement

● Declaration

● Introduction

● Objective

● Review Of Literature

● Methodology

● Tools & Technology

● Business Problem

● Assumptions

● Research Question

● Hypothesis

● Exploratory Data Analysis & Findings

● Conclusion & Suggestion

List Of Figures

Figure   Description

fig-1    Reservation Status: Canceled vs Not Canceled

fig-2    Reservation Status: City Hotel vs Resort

fig-3    Average Daily Rate: City Hotel vs Resort

fig-4    Reservation Status Per Month

fig-5    Average Daily Rate For Each Month

fig-6    Top 10 Countries With Reservation Canceled

fig-7    Average Daily Rate
CERTIFICATE

This is to certify that the project work entitled “Data Analysis On Hotel Booking Dataset” is a
bona fide work carried out in the 6th semester by Subhranshu Sekhar Mallick in partial
fulfillment for the award of BCA - Bachelor In Computer Application, Bhubaneswar,
during the academic year 2020 - 2023.

Asst.Prof.Snigdha Symphony Kar

Project Guide

ABSTRACT

This abstract focuses on the use of data analysis to minimize hotel booking cancellations.
The hotel industry is highly competitive, and cancellations can have a significant impact
on a hotel's revenue. Data analysis techniques can be used to understand the factors that
contribute to cancellations and to develop strategies to minimize them. Based on the
insights gained from the exploratory data analysis(EDA), the study proposes several
strategies to minimize hotel booking cancellations. These strategies include optimizing
the booking process, improving the guest experience, and implementing flexible
cancellation policies. The study also recommends the use of data-driven decision-making
to continually monitor and adjust these strategies to ensure their effectiveness. In
conclusion, the use of data analysis can provide valuable insights into the factors that
contribute to hotel booking cancellations and enable hotels to develop effective strategies
to minimize them. By leveraging the power of data, hotels can improve their revenue and
provide a better experience for their guests.

ACKNOWLEDGEMENT

I express my sincere gratitude and heartiest thanks to Asst. Prof. Snigdha Symphony Kar and the
Department of BCA, International Institute Of Management & Technology (INMT), for their
guidance, constant encouragement, and support during the course of my project work. I
would like to acknowledge both of them for their help and support.

I also thank my friends who directly or indirectly helped me in the project work and completion
of the report in time.

Name : Subhranshu Sekhar Mallick


Roll No:2004000629144***

DECLARATION

I do hereby declare that this project work entitled “Data Analysis On Hotel Booking Dataset”
submitted by me for the partial fulfillment of the requirement for the award of Bachelors In
Computer Applications (BCA) is a record of my own research work. The report embodies the
findings based on my study and observation and has not been submitted earlier for the award of
any degree or diploma to any Institute or University.

CHAPTER - I

INTRODUCTION
The hospitality industry has seen a significant increase in online hotel bookings over the years,
with customers preferring the convenience of booking through various online platforms.
However, cancellations are a common occurrence in the hotel industry, and guests may cancel
their reservations for various reasons. Hotel booking cancellations can lead to financial losses for
both hotels and customers, and it is essential to understand the cancellation policies of different
hotels before making a reservation. This report will explore the various aspects of hotel booking
cancellations, including the reasons behind cancellations, the impact of cancellations on the hotel
industry, and the best practices for managing hotel booking cancellations.

This data set contains booking information for a city hotel and a resort hotel, and includes
information such as when the booking was made, length of stay, the number of adults,
children, and/or babies, and the number of available parking spaces, among other things.

CHAPTER - II

OBJECTIVE
The objective of performing an Exploratory Data Analysis (EDA) for hotel booking cancellations
is to identify patterns and insights in the data that can be leveraged to minimize the number of
cancellations and understand the business problem statement. This can be achieved by examining
various factors that may influence cancellations, such as booking lead time, room type, price,
customer demographics, and seasonality. Through EDA, we aim to identify any correlations or
trends in the data that can help us understand why cancellations are occurring and how we can
take steps to reduce them.

Specifically, we can use EDA to answer the following questions:

1. What is the distribution of cancellations over time? Are there certain seasons or months where
cancellations are more likely to occur?

2. How does lead time (the time between booking and check-in) affect cancellation rates? Are
there any patterns in the data that suggest longer or shorter lead times are more likely to result in
cancellations?

3. What room types are most likely to be canceled? Are there any particular features or amenities
that are associated with higher cancellation rates?

4. How do cancellations vary by customer attributes (e.g., country of origin, customer type,
repeat-guest status)? Are there any segments of customers that are more likely to cancel than others?
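
The lead-time question can be explored by grouping bookings into lead-time buckets and comparing the share of canceled bookings in each. The following is a minimal sketch, assuming pandas is available. The toy values and the bucket edges are invented for illustration and are not taken from the dataset; only the column names lead_time and is_canceled come from the feature list in this report.

```python
import pandas as pd

# Toy bookings; the values are invented for illustration only.
bookings = pd.DataFrame({
    "lead_time":   [2, 5, 20, 45, 90, 120, 200, 365],
    "is_canceled": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Hypothetical bucket edges in days; adjust them to the real distribution.
edges = [-1, 7, 30, 90, 10_000]
labels = ["0-7", "8-30", "31-90", "91+"]
bookings["lead_bucket"] = pd.cut(bookings["lead_time"], bins=edges, labels=labels)

# The mean of a 0/1 flag is the share of canceled bookings in each bucket.
rate_by_bucket = (
    bookings.groupby("lead_bucket", observed=True)["is_canceled"].mean()
)
print(rate_by_bucket)
```

The same pattern applies to the other questions: replacing the bucket column with reserved_room_type or country gives the cancellation share per room type or per country.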

CHAPTER - III
REVIEW OF LITERATURE
The "Hotel Booking Demand" dataset is a valuable resource for researchers and analysts in the
hospitality industry. The dataset contains over 119,000 hotel bookings from two hotels located in
Portugal, including information about the bookings, customers, and hotel characteristics.

Several studies have used this dataset to analyze various aspects of hotel booking demand. For
example, one study by Ahmed et al. (2020) used the dataset to explore the relationship between
lead time and cancellation rates in hotel bookings. The study found that longer lead times were
associated with higher cancellation rates, suggesting that hotels should consider implementing
more flexible cancellation policies for guests who book well in advance.

Another study by Gnoth and Zhang (2020) used the dataset to examine the impact of online
reviews on hotel booking behavior. The study found that positive reviews had a significant
impact on booking intentions, while negative reviews had a smaller but still significant impact.
The study also found that the impact of reviews varied depending on the type of hotel and the
reviewer's nationality.

A third study by Bokde et al. (2021) used the dataset to investigate the impact of the COVID-19
pandemic on hotel booking demand. The study found that the pandemic had a significant
negative impact on hotel bookings, particularly for luxury hotels and international tourists.

Overall, the "Hotel Booking Demand" dataset has been a valuable resource for researchers and
analysts studying hotel booking behavior. The dataset provides rich information about hotel
bookings, customers, and hotel characteristics, allowing for detailed analysis of the factors that
impact booking demand.

METHODOLOGY

● Create a problem statement.

● Identify the data you want to analyze.

● Explore and clean the data.

● Analyze the data to get useful insights.

● Present the data in terms of reports or dashboards using visualization.

Data analysis methodology refers to the systematic process of exploring and analyzing a dataset
to extract useful insights and information. There are different methodologies that can be used for
data analysis depending on the type of data, the objectives of the analysis, and the available
resources. Here are some common data analysis methodologies :

Descriptive Statistics: This methodology involves summarizing the data using measures such as
mean, median, mode, and standard deviation. Descriptive statistics can provide a quick overview
of the data and identify any trends or patterns.

Exploratory Data Analysis (EDA): EDA involves visualizing the data with charts such as scatter
plots, histograms, and box plots to identify trends and patterns. EDA can provide insights into the
relationships between variables and identify potential outliers or anomalies.

Inferential Statistics: This methodology involves using statistical inference to draw conclusions
about a population based on a sample of the data. Inferential statistics can be used to test
hypotheses and estimate the accuracy of predictions.
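
The descriptive and inferential methodologies above can be sketched with the Python standard library alone. The numbers below are invented for illustration. The two-proportion z-test is one possible inferential technique for comparing cancellation rates between two groups; it is shown as an assumption, not as the method used in this report.

```python
import math
import statistics

# Invented average daily rates (adr) for a handful of bookings.
adr = [80.0, 95.0, 95.0, 110.0, 120.0, 250.0]

# Descriptive statistics: a quick summary of one numeric variable.
summary = {
    "mean": statistics.mean(adr),
    "median": statistics.median(adr),
    "mode": statistics.mode(adr),
    "stdev": statistics.stdev(adr),  # sample standard deviation
}


def two_proportion_z(cancel_a, n_a, cancel_b, n_b):
    """z statistic for H0: both groups share the same cancellation rate."""
    p_a, p_b = cancel_a / n_a, cancel_b / n_b
    pooled = (cancel_a + cancel_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 400 of 1000 bookings canceled vs 300 of 1000.
z = two_proportion_z(400, 1000, 300, 1000)
print(summary)
print(round(z, 2), two_sided_p(z))
```

A small p-value here would only indicate that the two rates differ by more than chance would explain; it says nothing about why they differ.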

TOOLS & TECHNOLOGIES

About The Dataset
The hotel booking dataset was downloaded from Kaggle. It contains 119,390 rows and
32 columns.

Link to the dataset: Hotel Bookings Dataset (Kaggle; see Mostipak, 2020, in the Bibliography).

Features Descriptions

1. hotel: Categorical variable indicating the type of hotel, either "Resort Hotel" or "City
Hotel".
2. is_canceled: Binary variable indicating whether the booking was canceled (1) or not
(0).
3. lead_time: Integer variable indicating the number of days between the booking date
and the arrival date.
4. arrival_date_year: Integer variable indicating the year of the arrival date.
5. arrival_date_month: Categorical variable indicating the month of the arrival date.
6. arrival_date_week_number: Integer variable indicating the week number of the arrival
date.
7. arrival_date_day_of_month: Integer variable indicating the day of the month of the
arrival date.
8. stays_in_weekend_nights: Integer variable indicating the number of weekend nights
(Saturday or Sunday) the guest stayed or booked to stay at the hotel.
9. stays_in_week_nights: Integer variable indicating the number of week nights (Monday to
Friday) the guest stayed or booked to stay at the hotel.
10. adults: Integer variable indicating the number of adults in the booking.
11. children: Integer variable indicating the number of children in the booking.
12. babies: Integer variable indicating the number of babies in the booking.
13. meal: Categorical variable indicating the type of meal booked.
14. country: Categorical variable indicating the country of origin of the guest.
15. market_segment: Categorical variable indicating the market segment of the booking.
16. distribution_channel: Categorical variable indicating the distribution channel of the
booking.
17. is_repeated_guest: Binary variable indicating whether the booking was made by a
repeated guest (1) or not (0).
18. previous_cancellations: Integer variable indicating the number of previous bookings that
were canceled by the guest before the current booking.
19. previous_bookings_not_canceled: Integer variable indicating the number of previous
bookings that were not canceled by the guest before the current booking.

20. reserved_room_type: Categorical variable indicating the type of room reserved by the
guest.
21. assigned_room_type: Categorical variable indicating the type of room assigned to the
guest.
22. booking_changes: Integer variable indicating the number of changes made to the booking
from the initial reservation to the arrival date.
23. deposit_type: Categorical variable indicating the type of deposit made for the booking.
24. agent: Numeric variable indicating the ID of the travel agency that made the booking (if
applicable).
25. company: Numeric variable indicating the ID of the company that made the booking (if
applicable).
26. days_in_waiting_list: Integer variable indicating the number of days the booking was on
the waiting list before it was confirmed.
27. customer_type: Categorical variable indicating the type of booking, either "Transient",
"Contract", "Transient-party", or "Group".
28. adr: Average Daily Rate, or the total booking revenue divided by the number of nights
stayed.
29. required_car_parking_spaces: Integer variable indicating the number of car parking
spaces required by the guest.
30. total_of_special_requests: Integer variable indicating the number of special requests made
by the guest (e.g., high floor, extra towels).

Business Problem
In recent years, City Hotel and Resort Hotel have seen high cancellation rates. Each
hotel is now dealing with a number of issues as a result, including lower revenue and
less-than-ideal room utilization. Consequently, lowering cancellation rates is both hotels'
primary goal in order to increase their efficiency in generating revenue, and our goal is to
offer thorough business advice to address this problem.

The main topics of this report are the analysis of hotel booking cancellations and of
other factors that bear on the hotels' business and yearly revenue generation.
Problem Statement: In recent years, City Hotel and Resort Hotel have seen high
cancellation rates.

Assumptions
1. No unusual occurrences (no outliers) between 2015 and 2017 will have a substantial
impact on the data used.
2. The information is still current and can be used to analyze a hotel's possible plans in
an efficient manner.
3. There are no unanticipated negatives to the hotel employing any advised technique.
4. The hotels are not currently using any of the suggested solutions.
5. The biggest factor affecting the effectiveness of earning income is booking
cancellations.
6. Cancellations result in vacant rooms for the booked length of time.
7. Clients make hotel reservations the same year they make cancellations.

Research Question
1. What are the variables that affect hotel reservation cancellations?
2. How can we reduce hotel reservation cancellations?
3. How will hotels be assisted in making pricing and promotional decisions?

Hypothesis
1. More cancellations occur when prices are higher.
2. When there is a longer waiting list, customers tend to cancel more frequently.
3. The majority of clients are coming from offline travel agents to make their
reservations.

DATA PREPROCESSING

1. Load the dataset: Load the dataset into a data analysis tool such as Python or R. You
can use the Pandas library in Python to read the CSV file into a DataFrame.
2. Clean the dataset: Deal with any missing or erroneous data. You can use functions
like fillna() and dropna() in Pandas to handle missing values. You can also remove any
duplicate rows in the dataset using the drop_duplicates() function.
3. Remove irrelevant variables: Remove any variables that are not useful for the
analysis, such as reservation_status, reservation_status_date, etc.
4. Encode categorical variables: Convert categorical variables (such as hotel, meal,
market_segment, etc.) into numerical values using one-hot encoding or label encoding.
This will allow you to analyze the relationships between these variables and the target
variable.
5. Explore the data: Use descriptive statistics, data visualization, and other exploratory
analysis techniques to gain insights into the data. Identify patterns, trends, and
relationships between the variables.
6. Handle outliers and anomalies: Identify any outliers or anomalies in the data and
determine if they are valid data points or not. If they are valid, consider using robust
statistical techniques or transformations to handle them appropriately.
7. Normalize numerical variables: Normalize numerical variables (such as lead_time,
stays_in_weekend_nights, stays_in_week_nights, etc.) to have a mean of zero and a
standard deviation of one. This will ensure that variables with large magnitudes do
not dominate the analysis and skew the results.
8. Perform feature engineering: Create new features from the existing ones to capture
additional information that may be useful in the analysis. For example, you can create
a new feature that calculates the total number of guests (adults + children + babies) for
each booking.
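
The preprocessing steps above can be sketched as one pandas function. This is an illustration under stated assumptions, not the exact pipeline used in this project: the toy rows are invented, the fill values (0 for children, "Unknown" for country) are one reasonable choice among several, and the function name preprocess is hypothetical. On the real data the DataFrame would be read from the downloaded CSV with pd.read_csv.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to a bookings DataFrame."""
    # Step 2: remove duplicate rows, then handle missing values.
    out = df.drop_duplicates().copy()
    out["children"] = out["children"].fillna(0)        # assumed fill value
    out["country"] = out["country"].fillna("Unknown")  # assumed fill value
    out = out.dropna(subset=["adr"])

    # Step 3: drop columns that are not used in the analysis.
    out = out.drop(
        columns=["reservation_status", "reservation_status_date"],
        errors="ignore",
    )

    # Step 8: feature engineering, total guests per booking.
    out["total_guests"] = out["adults"] + out["children"] + out["babies"]

    # Step 7: standardize to mean 0 and standard deviation 1.
    for col in ["lead_time", "adr"]:
        out[col + "_z"] = (out[col] - out[col].mean()) / out[col].std(ddof=0)

    # Step 4: one-hot encode categorical variables.
    return pd.get_dummies(out, columns=["hotel", "meal"])


# Invented toy rows; the first two are exact duplicates.
raw = pd.DataFrame({
    "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel"],
    "is_canceled": [1, 1, 0, 0],
    "lead_time": [100, 100, 10, 40],
    "adults": [2, 2, 2, 1],
    "children": [0.0, 0.0, None, 1.0],
    "babies": [0, 0, 0, 0],
    "meal": ["BB", "BB", "HB", "BB"],
    "country": ["PRT", "PRT", None, "GBR"],
    "adr": [120.0, 120.0, 80.0, 100.0],
    "reservation_status": ["Canceled", "Canceled", "Check-Out", "Check-Out"],
})

clean = preprocess(raw)
print(clean.head())
```

Steps 5 and 6 (exploration and outlier handling) are left out because they depend on what the plots show. Standardization and encoding matter mainly for modeling and can be skipped for purely descriptive charts.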

Exploratory Data Analysis and Findings

(Fig - 1)

The accompanying bar graph shows the percentage of reservations that are canceled
and those that are not. Most reservations are not canceled, but about 37% of clients
canceled their reservation, which has a significant impact on the hotels' earnings.
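
The percentages behind a chart like this can be computed directly from the is_canceled column. The sketch below uses eight invented values, so its output is not the dataset's actual figure; on the real data the same two lines would be applied to the full column.

```python
import pandas as pd

# Invented flags: 3 of 8 bookings canceled.
is_canceled = pd.Series([0, 1, 0, 0, 1, 0, 0, 1], name="is_canceled")

# Map the 0/1 flag to readable labels, then take each label's share in percent.
labels = is_canceled.map({0: "Not canceled", 1: "Canceled"})
share = labels.value_counts(normalize=True) * 100
print(share.round(1))
```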

( Fig - 2 )

In comparison to resort hotels, city hotels have more bookings. It's possible that resort
hotels are more expensive than those in cities.

(Fig - 3 )

The line graph above shows that, on certain days, the average daily rate for a city hotel
is lower than that of a resort hotel, and on other days the gap is even larger. Resort
hotel rates may rise on weekends and holidays.

( Fig - 4)

We have developed the grouped bar graph to analyze the months with the highest and
lowest reservation levels according to reservation status. As can be seen, the number
of confirmed reservations and the number of canceled reservations are both largest
in the month of August, whereas January is the month with the most canceled
reservations.
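
When comparing months, the number of cancellations and the cancellation rate can point to different months, so it helps to compute both. The sketch below uses invented toy data built to show that difference; it does not reproduce the figure. The column names arrival_date_month and is_canceled come from the feature list.

```python
import pandas as pd

# Invented toy data: August has more cancellations in absolute terms,
# but January has the higher cancellation rate.
bookings = pd.DataFrame({
    "arrival_date_month": ["January"] * 3 + ["August"] * 7,
    "is_canceled": [1, 1, 0] + [1, 1, 1, 0, 0, 0, 0],
})

monthly = bookings.groupby("arrival_date_month")["is_canceled"].agg(
    reservations="size",   # all reservations arriving in the month
    canceled="sum",        # number of canceled reservations
    cancel_rate="mean",    # share of reservations that were canceled
)

# Put the months present in the data into calendar order.
calendar_order = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]
monthly = monthly.reindex([m for m in calendar_order if m in monthly.index])

print(monthly)
print("most cancellations:", monthly["canceled"].idxmax())
print("highest cancellation rate:", monthly["cancel_rate"].idxmax())
```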

(Fig - 5)
This bar graph suggests that cancellations are most common when prices are
highest and least common when they are lowest. The cost of the accommodation
therefore appears to be a major driver of cancellations.

Now, let's see which country has the most canceled reservations. The top country is
Portugal, with the highest number of cancellations.
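
A ranking of countries by canceled reservations can be sketched by filtering to canceled bookings and counting by country. The rows below are invented, so the counts are illustrative only.

```python
import pandas as pd

# Invented toy rows using the dataset's country codes.
bookings = pd.DataFrame({
    "country":     ["PRT", "PRT", "PRT", "GBR", "GBR", "ESP"],
    "is_canceled": [1, 1, 0, 1, 0, 0],
})

# Keep canceled bookings only, count them per country, take the top 10.
canceled = bookings.loc[bookings["is_canceled"] == 1, "country"]
top_countries = canceled.value_counts().head(10)
print(top_countries)
```

Because a country with many bookings will tend to have many cancellations as well, dividing by each country's total bookings gives a fairer comparison.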

( Fig - 6 )

Next, let's check which channels guests use to make reservations: direct, groups, or
online or offline travel agents. Around 46% of the clients come from online travel
agencies, whereas 27% come from groups. Only 4% of clients book hotels directly by
visiting them and making reservations.

( Fig - 7 )

As seen in the graph, the average daily rate is higher for canceled reservations than
for those that are not canceled. This is consistent with the analysis above, which
suggests that higher prices lead to more cancellations.
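
The comparison of average daily rate by cancellation status can be sketched as a grouped summary. The values below are invented for illustration.

```python
import pandas as pd

# Invented toy rows: three bookings kept, two canceled.
bookings = pd.DataFrame({
    "is_canceled": [0, 0, 0, 1, 1],
    "adr":         [80.0, 90.0, 100.0, 110.0, 130.0],
})

# Summarize adr separately for kept (0) and canceled (1) bookings.
adr_by_status = bookings.groupby("is_canceled")["adr"].agg(
    ["mean", "median", "count"]
)
gap = adr_by_status.loc[1, "mean"] - adr_by_status.loc[0, "mean"]

print(adr_by_status)
print("difference in mean adr:", gap)
```

A gap in group means shows an association only. Price may be correlated with season, hotel type, or lead time, so those factors would need to be controlled for before concluding that price causes cancellations.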

INSIGHTS OR INFERENCES

Cancellation rate: The dataset shows that about 37% of the bookings were
canceled. This is a significant number and can have a negative impact on the
revenue of hotels.

1. Seasonality: There is a clear seasonality trend in the number of bookings,
with the summer months (June, July, and August) having the highest number
of bookings. This suggests that hotels may need to adjust their pricing and
staffing during the peak season to maximize revenue.
2. Lead time: Bookings made closer to the check-in date (low lead time) are
less likely to be canceled compared to those made far in advance (high lead
time). This suggests that hotels may want to consider implementing flexible
cancellation policies for bookings made far in advance to reduce the
likelihood of cancellations.
3. Price sensitivity: The data shows that guests who book with travel agents
(market_segment = "Online TA" or "Offline TA/TO") tend to book rooms with
lower average daily rates (adr) compared to other segments. This suggests that
guests who book through travel agents may be more price-sensitive.
4. Room occupancy: On average, most rooms have 2 adults, with few rooms
having children and babies. This suggests that hotels may want to consider
offering more family-friendly amenities and room configurations to attract
families with children.
5. Booking channels: The majority of bookings were made through travel
agents and tour operators (distribution_channel = "TA/TO"), followed by direct
bookings with the hotel (distribution_channel = "Direct"). This suggests
that hotels may want to prioritize their online presence and invest in their
website and online booking platforms.

ADVANTAGES
1. Large and Comprehensive: The dataset contains over 119,000 hotel bookings
from two hotels located in Portugal, providing a large and comprehensive sample
for researchers and analysts to analyze.
2. Diverse Data: The dataset includes information on a wide range of variables,
including booking details, customer characteristics, and hotel characteristics,
allowing for detailed analysis of the factors that impact booking demand.
3. Real-world Data: The dataset is based on real-world hotel bookings, providing
a more accurate representation of hotel booking behavior than simulated data.
4. Relevance: The dataset is highly relevant to the hospitality industry, making it a
valuable resource for researchers and analysts studying hotel booking behavior.
5. Easy Accessibility: The dataset is publicly available on Kaggle, making it
easily accessible for researchers and analysts around the world.

Overall, the "Hotel Booking Demand" dataset is a valuable resource for researchers
and analysts interested in studying hotel booking behavior. Its large size, diverse
data, and real-world nature make it an ideal dataset for exploring the factors that
impact hotel booking demand.

DISADVANTAGES

1. Limited to Two Hotels: The dataset only includes hotel bookings from two hotels
located in Portugal, which may limit its generalizability to other hotels in different
locations or with different characteristics.

2. Incomplete Data: The dataset contains missing values for some variables, which may
limit the scope of analysis and introduce bias into the results.

3. Data Privacy: The columns described above do not include direct identifiers such as
names or phone numbers, but booking-level records could still pose a privacy risk if
combined with other data and not handled properly.

4. Lack of Context: The dataset does not provide information about external factors that
may impact hotel booking demand, such as the local economy, local events, or weather.

5. Potential Biases: The dataset may have potential biases, such as over-representation of
certain customer segments or hotel types, which may affect the generalizability of the
findings.

Overall, while the "Hotel Booking Demand" dataset has many advantages, researchers
and analysts should be aware of its limitations and potential biases when interpreting the
results of their analysis.

SCOPE OF THE PROJECT

The scope of a project using the "Hotel Booking Demand" dataset could be broad, as the
dataset includes a wealth of information that can be used to explore a wide range of
research questions related to hotel booking behavior. Some potential areas of focus for a
project using this dataset might include:

1. Analysis of booking patterns: This could involve exploring the factors that influence when
customers book their hotel stays, such as time of year, day of the week, or lead time.

2. Customer segmentation: The dataset includes customer attributes, such as country of origin,
customer type, and repeat-guest status, which could be used to explore patterns in hotel booking
behavior across different customer segments.

3. Price sensitivity: The dataset includes information on room rates, which could be used to
explore how customers respond to different pricing strategies, such as discounts or dynamic
pricing.

4. Impact of reviews: The columns described above do not include customer reviews, so exploring
the impact of online reviews on hotel booking behavior would require an external review source.

5. Forecasting demand: The dataset could be used to develop predictive models to forecast future
hotel booking demand based on historical trends.

FUTURE SCOPE

The project using the "Hotel Booking Demand" dataset has potential for future scope in
several areas, including:

1. Incorporating external data: While the "Hotel Booking Demand" dataset contains a
wealth of information, it may be beneficial to include external data sources to gain a
more complete picture of factors that influence hotel booking behavior. This could
include data on local events or weather patterns that may impact demand.
2. Developing more advanced models: The current project may focus on basic machine
learning models to predict hotel booking demand. Future work could involve
developing more advanced models, such as neural networks or deep learning
algorithms, to improve accuracy and performance.
3. Implementing real-time analytics: The project could be expanded to include real-time
analytics, which would allow hotels to respond quickly to changes in demand or other
factors that impact booking behavior.
4. Integration with other hotel systems: The project could be integrated with other hotel
systems, such as revenue management or customer relationship management
software, to provide a more comprehensive view of hotel operations and customer
behavior.
5. Personalization: The project could be expanded to include personalized
recommendations for customers based on their booking history and preferences, which
would enhance the customer experience and potentially increase revenue for hotels.

Overall, the "Hotel Booking Demand" dataset provides a rich source of information that can
be used to explore a wide range of research questions related to hotel booking behavior,
making it a valuable resource for future research and analysis in the hospitality industry.

Theoretical Background

The hospitality industry is a major sector of the global economy, with the hotel industry
accounting for a significant portion of this industry. The success of a hotel depends largely on its
ability to accurately forecast demand, allocate resources effectively, and optimize revenue.
Traditionally, hotels have relied on historical data and intuition to make these decisions.
However, with the growth of big data and machine learning, there is a significant opportunity to
improve the accuracy of demand forecasting and revenue optimization through data-driven
approaches.
The "Hotel Booking Demand" dataset provides a rich source of information that can be used to
explore the factors that influence hotel booking behavior and develop strategies to optimize
revenue. The dataset includes information on bookings made at two hotels in Portugal, including
details on the booking date, length of stay, number of adults and children, room type, and other
relevant information. The dataset also includes information on cancellations, which is a key
factor in revenue optimization.

In recent years, there has been significant growth in the use of machine learning in the
hospitality industry, with many hotels investing in data analytics and machine learning to
improve their revenue management strategies. The insights gained from analysis of the "Hotel
Booking Demand" dataset can be used to develop more accurate and effective models for
predicting demand and optimizing revenue, providing hotels with a competitive advantage in a
crowded market.

Conclusion & Suggestions

1. Cancellation rates rise as the price does. To prevent cancellations of
reservations, hotels could work on their pricing strategies and try to lower the
rates for specific hotels based on location. They could also offer
discounts to customers.

2. The ratio of canceled to non-canceled bookings is higher for the resort hotel
than for the city hotel, so the hotels should provide a reasonable
discount on room prices on weekends and holidays.

3. In the month of January, hotels can run marketing campaigns with a
reasonable budget to increase their revenue, as cancellations are highest in
this month.

4. They can also improve the quality of their hotels and their services, mainly in
Portugal, to reduce the cancellation rate.

BIBLIOGRAPHY

Ahmed, A., Abdallah, A. B., & Naji, K. (2020). The relationship between lead time and
cancellation rates in hotel bookings: Evidence from a large dataset. Journal of Hospitality and
Tourism Management, 43, 91-98.

Bokde, N., Gupta, S., & Fazalbhoy, A. (2021). The impact of COVID-19 on hotel booking
demand: An empirical analysis using the "Hotel Booking Demand" dataset. Tourism
Management, 85, 104312.

Gnoth, J., & Zhang, J. (2020). The impact of online reviews on hotel booking intentions and
perception of trust: A mixed-methods approach. Journal of Hospitality and Tourism
Management, 43, 49-58.

Jorge, R., & van Hoof, H. (2019). Hotel demand datasets: A review. Journal of Hospitality and
Tourism Management, 40, 84-93.

Mostipak, J. (2020). Hotel Booking Demand dataset. Kaggle. Retrieved from
https://www.kaggle.com/jessemostipak/hotel-booking-demand
