
Hotel Bookings Exploratory Data Analysis - 1

This document analyzes a dataset containing 119,390 hotel bookings between 2015-2017 to understand trends and guest behavior. It describes the 32 columns in the dataset which include information like hotel type, booking dates, room details, number of guests, cancellations, and more. The analysis will help understand important factors for hotel bookings and guest preferences to improve future deals for hotels and customers. Hotel reservation systems allow guests to directly book and pay for rooms online through the hotel's software, without needing travel agents.

Uploaded by

Rishu Gupta

Hotel Bookings Exploratory Data Analysis

Abstract:
When we hear about hotel booking, we almost always hear about Trivago. Thanks to large-scale advertising, I can't seem to get its ad out of my head. Whenever someone tries to book a hotel, they will surely consider factors such as price per night, distance to the hotel, restaurant availability, scenery and location, room quality, cleanliness, and food quality.
In this project we are provided with a dataset that contains data about hotel bookings, cancellations, and some important factors that relate to hotel bookings. Our analysis will help us understand which factors are important for hotel bookings, what guests prefer when they book hotels, and so on. We will analyze the hotel bookings data to gain insights.

Keywords: Hotel bookings, City hotels, Resort hotels, Cancellation, Distribution channel.

1. Problem Statement:
We are provided with data on hotel bookings and have to analyze it to answer the following questions.
• Which type of hotel is most preferred by customers?
• What is the distribution of booking changes made by guests?
• Which distribution channel is most used for booking?
• What is the percentage of booking cancellations?
• Does not getting the same room type as requested cause cancellations?
• Does a longer waiting period cause booking cancellations?
• Which distribution channel has the highest cancellation rate?
• Which year has the highest number of bookings?
• Which month has the highest number of bookings overall?
• Which day of the month has the highest number of bookings?
• What is the percentage of repeated guests?
• Which hotel has the longer lead time?
• Which agent has made the most bookings?
• Which type of food is most preferred by guests?
• What is the percentage distribution of customer types?
• What is the percentage distribution of required car parking spaces?
• What is the percentage distribution of deposit types?
• Which room type is most often reserved by customers?
• Which country do most guests come from?
• Which hotel type has the highest ADR?
• What is the optimal stay length for each type of hotel?
• What are the booking percentages by number of people?

2. Data description:
The main objective of exploratory data analysis is to understand the trends and behavior of guests in hotel bookings. For that, we first need to understand what every feature in the data means.
The data table consists of 119,390 rows and 32 columns. Our analysis starts with defining each column; our understanding of each column is given below:

• hotel: Hotel type (City hotels, Resort hotels).
• is_canceled: Value indicating whether the booking was cancelled or not.
• lead_time: How long in advance the booking was made.
• arrival_date_year: Year of the customer's arrival.
• arrival_date_month: Month of the year in which the customer visited the hotel.
• arrival_date_week_number: Week of the year in which the customer arrived.
• arrival_date_day_of_month: Day of the month on which the customer visited the hotel.
• stays_in_weekend_nights: Number of weekend nights the customer stayed or booked to stay at the hotel.
• stays_in_week_nights: Number of week nights the customer stayed or booked to stay at the hotel.
• adults: Number of adults.
• children: Number of children.
• babies: Number of babies.
• meal: Type of meal booked.
• country: Country of origin of the customer.
• market_segment: Market segment the booking came from.
• distribution_channel: Booking distribution channel. The term "TA" means "Travel Agents" and "TO" means "Tour Operators".
• is_repeated_guest: Value indicating if the booking was from a repeated guest (1) or not (0).
• previous_cancellations: Number of previous bookings that were cancelled by the customer prior to the current booking.
• previous_bookings_not_canceled: Number of previous bookings that were not cancelled by the customer prior to the current booking.
• reserved_room_type: Code of the room type reserved. A code is presented instead of a designation for anonymity reasons.
• assigned_room_type: Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type.
• booking_changes: Number of changes/amendments made to the booking from the moment the booking was entered in the PMS (property management system).
• deposit_type: Indication of whether the customer made a deposit to guarantee the booking.
• agent: ID of the travel agency that made the booking.
• company: ID of the company/entity that made the booking or is responsible for paying the booking.
• days_in_waiting_list: Number of days the booking was on the waiting list before it was confirmed to the customer.
• customer_type: Type of booking, assuming one of four categories.
• adr: Average Daily Rate, defined as the sum of all lodging transactions divided by the total number of staying nights.
• required_car_parking_spaces: Number of car parking spaces required by the customer.
• total_of_special_requests: Number of special requests made by the customer (e.g. twin bed or high floor).
• reservation_status: Last status of the reservation, assuming one of three categories. Canceled: the booking was canceled by the customer. Check-Out: the customer has checked out of the hotel. No-Show: the customer did not check in and informed the hotel of the reason.
• reservation_status_date: Date on which the last status was set. This variable can be used in conjunction with reservation_status to understand when the booking was cancelled or when the customer checked out of the hotel.

3. Introduction:
This dataset contains hotel bookings data. One of the hotels is a resort hotel and the other is a city hotel. The dataset consists of 32 columns and 119,390 observations. Each observation represents a hotel booking. The dataset comprises hotel bookings due to arrive between 2015 and 2017, including bookings that effectively arrived and bookings that were canceled. Due to the scarcity of real business data for scientific and educational purposes, these datasets can have an important role for research and education in revenue management, machine learning, or data mining, as well as in other fields.

Our analysis will help us to understand the data in depth and to identify trends in hotel bookings that can help us plan better deals for both hotels and guests in the future.

4. How does a hotel reservation system work?
A hotel reservation system is the mechanism through which guests can create secure online reservations. While the process is similar to booking with an online travel agent (OTA), the difference is that the hotel's booking engine links to its own website, so no additional fees are incurred by the property.

Through the hotel reservation system software, guests can choose how long they will stay, the type of room they want, get add-ons, and pay securely online through a payment platform. In this article we'll go in depth about what a hotel reservation system is, how it works, and the benefits that it offers to the hospitality sector. So, let's get started.

A hotel reservation system is a software application that allows guests to book directly with the hotel online, with no intermediaries necessary. The software essentially processes online reservations made via the hotel's website and then passes this information to the hotel's own backend so that the information can be easily accessed. Bookings are then managed by hotel staff.

With the boom of the Millennial traveler, more than 700 million people are expected to book primarily online by 2023, so having an online reservation system is key to reaching a widespread audience. It is also key to generating a good first impression, because guests are able to place bookings without having to navigate to another domain. Keeping the whole process internal prevents clients from navigating away from the page before making the final booking.

5. Types of hotel booking systems
There are mainly two types of hotel booking systems: direct and indirect.

5.1 Direct bookings
This booking source is important to most hotel owners and is considered the best long-term strategy for hotel marketing and distribution.

With direct booking, customers can make their room reservations by sending an email, calling the hotel, visiting your website and booking room services there, or using social media channels.

Booking room through the hotel-owned website
Besides providing official information about hotel services, the website is also a powerful tool to convert lookers into bookers without any cost. Therefore, it is critical to build an online booking system for your site.

5.2 Indirect bookings
The current indirect booking channels are diverse and popular with many customers.

The advantage of indirect booking channels is that they offer many choices to lookers, who can see many options from different hotels and compare prices, services, promotions, etc., to choose the right hotel for their needs.

Here are a few popular indirect booking channels.

OTA
OTA is an acronym for Online Travel Agency. This is a familiar room sales channel for those in the accommodation business.
Some OTA channels that attract a lot of users include Agoda, Traveloka, Booking, Expedia, Abay.

For each booking through an OTA channel, the hotel has to pay a certain percentage of the booking value as commission to the room sales channel, as agreed in the contract.

TA
TA stands for Travel Agency.

These travel agencies often organize tours with a large and stable number of guests. Normally, travel agencies are responsible for arranging all services on a tour, including booking rooms for customers, which is why a TA brings a stable number of customers to the hotel.

Guests from TAs usually account for 50 to 60% of the hotel's bookings, so many hotels today are very focused on developing this indirect booking channel.

GDS
GDS stands for Global Distribution System. This is a global distribution system for hotel rooms, also known as the business-to-business (B2B) method of selling rooms.

GDS works on the principle that hotels sign a contract with the GDS and then provide information about their offers. Travel companies then get this information from the GDS and resell the rooms to their tourists.

The hotel will gain a lot of customers through the GDS.

6. Steps Involved
Exploratory Data Analysis
Exploratory Data Analysis is a data analytics process to understand the data in depth and learn its different characteristics, often by visual means. This allows you to get a better feel for your data and find useful patterns in it.
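
As an illustration of the kind of first look this involves, here is a minimal sketch in Python with pandas. The tiny `bookings` frame is made-up sample data that only borrows a few column names from the dataset described above, and the helper name `first_look` is hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the data description.
bookings = pd.DataFrame(
    {
        "hotel": ["City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
        "is_canceled": [0, 1, 0, 1],
        "lead_time": [12, 85, 3, 40],
        "children": [0.0, None, 1.0, 0.0],
    }
)


def first_look(df: pd.DataFrame) -> dict:
    """Collect basic facts about a DataFrame: size, column dtypes, missing values."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }


summary = first_look(bookings)
print(summary)
# Summary statistics (count, mean, std, quartiles) for the numeric columns.
print(bookings.describe())
```

On the real data the same calls would report the 119,390 rows and 32 columns mentioned earlier and show which columns contain missing values.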
It is crucial to understand the data in depth before you perform data analysis and run your data through an algorithm. You need to know the patterns in your data and determine which variables are important and which do not play a significant role in the output. Further, some variables may have correlations with other variables. You also need to recognize errors in your data.

All of this can be done with Exploratory Data Analysis. It helps you gather insights and make better sense of the data, and removes irregularities and unnecessary values from data.

6.1 Data Collection
Data collection is the process of collecting, measuring and analysing different types of information using a set of standard validated techniques. The main objective of data collection is to gather information-rich and reliable data and analyse it to make critical business decisions. Once the data is collected, it goes through a rigorous process of data cleaning and data processing to make it truly useful for businesses. Here, data collection refers to the process of finding and loading data into our system.

The pandas library is used to load the data into our system in Python. Using pandas we can manipulate data easily.

6.2 Data Cleaning
Data cleaning refers to the process of removing unwanted variables and values from your dataset and getting rid of any irregularities in it. Such anomalies can disproportionately skew the data and hence adversely affect the results. Some steps that can be taken to clean data are:

• Handling missing values: There are almost always some missing values in a dataset. If we don't remove or handle them, they can cause trouble in our analysis. Removing missing values, or replacing them with something meaningful, is very important so that our data has no missing values.
• Removing duplicates: Drop the duplicate rows.
• Formatting data to the proper dtype.
• Adding or removing columns as required for the analysis.

6.3 Univariate or Bivariate analysis
Univariate analysis: In univariate analysis, you analyse data of just one variable. A variable in your dataset refers to a single feature/column. You can do this by graphical or non-graphical means, by finding specific mathematical values in the data.

Bivariate analysis: Here, you use two variables and compare them. This way, you can find how one feature affects the other. It is done with scatter plots, which plot individual data points, or correlation matrices, which show the correlation in hues.

6.4 Visualization
Data visualization is the representation of data through the use of common graphics, such as charts, plots, infographics, and even animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.

Types of data visualizations
• Tables: These consist of rows and columns used to compare variables. Tables can show a great deal of information in a structured way, but they can also overwhelm users who are simply looking for high-level trends.
• Pie charts and bar charts: These graphs are divided into sections that represent parts of a whole. They provide a simple way to organize data and compare the size of each component to one another.
• Scatter plots: These visuals are beneficial in revealing the relationship between two variables, and they are commonly used within regression data analysis. However, these can sometimes be confused with bubble charts, which are used to visualize three variables via the x-axis, the y-axis, and the size of the bubble.
• Heat maps: These graphical displays are helpful in visualizing behavioural data by location. This can be a location on a map, or even a webpage.
• Line graphs and area charts: These visuals show change in one or more quantities by plotting a series of data points over time. Line graphs utilize lines to demonstrate these changes, while area charts connect data points with line segments, stacking variables on top of one another and using colour to distinguish between variables.
• Kdeplot: A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.

7. Conclusion
We have now reached the end of our project.
Through these steps we learned that city hotels have more bookings than resort hotels. This might be because resort hotels are on the costlier side or farther away, whereas city hotels are less costly and situated near railway stations and airports.

27% of bookings were cancelled, only 3.9% of guests were repeat guests, guests preferred to stay for a week or less, and Travel Agents/Tour Operators is the most preferred distribution channel for bookings.
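
A minimal sketch of how the cleaning steps from section 6.2 and headline figures like the ones above could be computed with pandas. The rows below are invented for illustration, so the printed percentages will not match the 27% and 3.9% reported for the real data. The function names `clean_bookings` and `headline_figures` are hypothetical, and filling missing `children` values with 0 is an assumed choice, not necessarily the one made in the original analysis.

```python
import pandas as pd

# Invented sample rows; column names follow the data description in section 2.
raw = pd.DataFrame(
    {
        "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel", "Resort Hotel"],
        "is_canceled": [0, 0, 1, 1, 0],
        "is_repeated_guest": [0, 0, 0, 1, 0],
        "stays_in_weekend_nights": [2, 2, 0, 1, 4],
        "stays_in_week_nights": [3, 3, 2, 1, 10],
        "children": [0.0, 0.0, None, 1.0, 0.0],
    }
)


def clean_bookings(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps: missing values, duplicates, dtypes, new columns."""
    out = df.copy()
    # Handle missing values (assumption: a missing child count means no children).
    out["children"] = out["children"].fillna(0)
    # Remove exact duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)
    # Format data to a proper dtype: child counts are whole numbers.
    out["children"] = out["children"].astype(int)
    # Add a column required for the analysis: total length of stay in nights.
    out["total_stay"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    return out


def headline_figures(df: pd.DataFrame) -> dict:
    """Compute shares (in percent) similar to those quoted in the conclusion."""
    return {
        "cancelled_pct": 100 * df["is_canceled"].mean(),
        "repeated_guest_pct": 100 * df["is_repeated_guest"].mean(),
        "stay_week_or_less_pct": 100 * (df["total_stay"] <= 7).mean(),
        "bookings_per_hotel": df["hotel"].value_counts().to_dict(),
    }


cleaned = clean_bookings(raw)
figures = headline_figures(cleaned)
print(figures)
```

Because `is_canceled` and `is_repeated_guest` are 0/1 flags, their column mean is directly the share of cancelled bookings and of repeat guests, which keeps the calculation to one line each.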
