This document analyzes a dataset containing 119,390 hotel bookings between 2015-2017 to understand trends and guest behavior. It describes the 32 columns in the dataset which include information like hotel type, booking dates, room details, number of guests, cancellations, and more. The analysis will help understand important factors for hotel bookings and guest preferences to improve future deals for hotels and customers. Hotel reservation systems allow guests to directly book and pay for rooms online through the hotel's software, without needing travel agents.
Hotel Bookings Exploratory Data Analysis - 1
Hotel Bookings Exploratory Data Analysis
Abstract:
When we hear about hotel booking, we inevitably hear about Trivago. Thanks to large-scale advertising, I can't seem to get the ad out of my head. Whenever someone tries to book a hotel, they will surely consider factors such as price per night, distance to the hotel, restaurant availability, scenic location, room quality, cleanliness, and food quality.
In this project we are provided with a dataset that contains data about hotel bookings, cancellations, and some important factors related to hotel bookings. Our analysis will help us understand which factors are important for hotel bookings, what guests prefer when they book hotels, and so on. We will analyze the hotel bookings data to gain insights.

Keywords: Hotel bookings, City hotels, Resort hotels, Cancellation, Distribution channel.

1. Problem Statement:
We are provided with hotel bookings data, which we analyze to answer the following questions.
• Which type of hotel is most preferred by customers?
• What is the distribution of booking changes made by guests?
• Which distribution channel is most used for booking?
• What is the percentage of booking cancellations?
• Does not getting the same room type as requested cause cancellation?
• Does a longer waiting period cause booking cancellation?
• Which distribution channel has the highest cancellation rate?
• Which year has the highest bookings?
• Which month has the highest bookings overall?
• Which day of the month has the highest bookings?
• What is the percentage of repeated guests?
• Which hotel has the longer lead time?
• Which agent has made the most bookings?
• Which type of meal is most preferred by guests?
• What is the percentage distribution of customer type?
• What is the percentage distribution of car parking spaces required?
• What is the percentage distribution of deposit type?
• Which is the most reserved room type?
• Which country do most guests come from?
• Which hotel type has the highest ADR?
• What is the optimal stay length in both types of hotel?
• What is the booking percentage according to number of people?

2. Data description:
The main objective of exploratory data analysis is to understand the trends and behavior of guests in hotel bookings. For that, we first need to understand what every feature in the data means.
The data table consists of 119,390 rows and 32 columns. Our analysis starts with defining each column, and our understanding of each column is given below:
• hotel: Hotel type (City hotels, Resort hotels).
• is_canceled: Value indicating if the booking was cancelled (1) or not (0).
• lead_time: How long in advance the booking was made.
• arrival_date_year: Customer arrival year.
• arrival_date_month: Month of the year in which the customer visited the hotel.
• arrival_date_week_number: Week of the year in which the customer arrived.
• arrival_date_day_of_month: Day of the month on which the customer visited the hotel.
• stays_in_weekend_nights: Number of weekend nights the customer stayed or booked to stay at the hotel.
• stays_in_week_nights: Number of week nights the customer stayed or booked to stay at the hotel.
• adults: Number of adults.
• children: Number of children.
• babies: Number of babies.
• meal: Type of meal booked.
• country: Country of origin of the customer.
• market_segment: Market segment the booking came from.
• distribution_channel: Booking distribution channel. The term "TA" means "Travel Agents" and "TO" means "Tour Operators".
• is_repeated_guest: Value indicating if the booking was from a repeated guest (1) or not (0).
• previous_cancellations: Number of previous bookings that were cancelled by the customer prior to the current booking.
• previous_bookings_not_canceled: Number of previous bookings that were not cancelled by the customer prior to the current booking.
• reserved_room_type: Code of the room type reserved. A code is presented instead of a designation for anonymity reasons.
• assigned_room_type: Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type.
• booking_changes: Number of changes/amendments made to the booking from the moment the booking was entered on the PMS.
• deposit_type: Indication of whether the customer made a deposit to guarantee the booking.
• agent: ID of the travel agency that made the booking.
• company: ID of the company/entity that made the booking or is responsible for paying for the booking.
• days_in_waiting_list: Number of days the booking was on the waiting list before it was confirmed to the customer.
• customer_type: Type of booking, assuming one of four categories.
• adr: Average Daily Rate, defined by dividing the sum of all lodging transactions by the total number of staying nights.
• required_car_parking_spaces: Number of car parking spaces required by the customer.
• total_of_special_requests: Number of special requests made by the customer (e.g. twin bed or high floor).
• reservation_status: Last status of the reservation, assuming one of three categories: Canceled: the booking was cancelled by the customer; Check-Out: the customer checked out of the hotel; No-Show: the customer did not check in and informed the hotel of the reason.
• reservation_status_date: Date at which the last status was set. This variable can be used in conjunction with reservation_status to understand when the booking was cancelled or when the customer checked out of the hotel.

3. Introduction:
This dataset contains hotel bookings data for two hotels: one is a resort hotel and the other is a city hotel. The dataset consists of 32 columns and 119,390 observations. Each observation represents a hotel booking. The dataset covers bookings due to arrive between 2015 and 2017, including bookings that effectively arrived and bookings that were cancelled.
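The figures quoted above (number of rows and columns, hotel types, arrival years) can be checked directly once the data is loaded. The following is a minimal sketch using pandas; the helper name `summarize_bookings`, the file name `hotel_bookings.csv`, and the three-row sample are assumptions made for illustration and are not taken from the original analysis.

```python
import pandas as pd


def summarize_bookings(df: pd.DataFrame) -> dict:
    """Return basic facts about a bookings table: size, hotel types, arrival years."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "hotel_types": sorted(df["hotel"].unique()),
        "first_year": int(df["arrival_date_year"].min()),
        "last_year": int(df["arrival_date_year"].max()),
    }


# Tiny made-up sample that mimics a few of the 32 columns (not real data).
sample = pd.DataFrame(
    {
        "hotel": ["City Hotel", "Resort Hotel", "City Hotel"],
        "is_canceled": [0, 1, 0],
        "arrival_date_year": [2015, 2016, 2017],
    }
)
summary = summarize_bookings(sample)
print(summary)

# With the real file (the file name is an assumption):
# df = pd.read_csv("hotel_bookings.csv")
# print(summarize_bookings(df))
```

On the full dataset, the same helper would be expected to report the 119,390 rows and 32 columns described above.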
Due to the scarcity of real business data for scientific and educational purposes, these datasets can play an important role in research and education in revenue management, machine learning, data mining, and other fields.
Our analysis will help us understand the data in depth and identify trends in hotel bookings, which can help us plan better deals for both hotels and guests in the future.

4. How does a hotel reservation system work?
A hotel reservation system is the mechanism through which guests can create secure online reservations. While the process is similar to booking with an online travel agent (OTA), the difference is that the hotel's booking engine links up to its own website, so no additional fees are incurred by the property. It is therefore critical to build an online booking system for your site.
Through the hotel reservation system software, guests can choose how long they will stay and the type of room they want, get add-ons, and pay securely online through a payment platform. This section describes what a hotel reservation system is, how it works, and the benefits it offers to the hospitality sector.
A hotel reservation system is a software application that allows guests to book directly with the hotel online, with no intermediaries necessary. The software processes online reservations made via the hotel's website and then passes this information to the hotel's own backend so that it can be easily accessed. Bookings are then managed by hotel staff.
With the boom of the Millennial traveler, more than 700 million people are expected to book primarily online by 2023, so having an online reservation system is key to reaching a widespread audience. It is also key to making a good first impression, because guests are able to place bookings without having to navigate to another domain. Keeping the whole process internal prevents clients from navigating away from the page before making the final booking.

5. Types of hotel booking systems
There are mainly two types of hotel booking systems: Direct and Indirect.

5.1 Direct bookings
This booking source is important to most hotel owners and is considered the best long-term approach for your hotel marketing and distribution strategy.
With direct booking, customers can make their room reservations by sending an email, calling the hotel, booking room services on your website, or using social media channels.
Booking a room through the hotel-owned website
Besides providing official information about hotel services, the website is also a powerful tool for converting lookers into bookers without any cost.

5.2 Indirect bookings
The current indirect booking channels are diverse and popular with many customers.
The advantage of indirect booking channels is that they offer many choices for lookers. They can see many options from different hotels and compare prices, services, promotions, etc., to choose the right hotel for their needs.
Here are a few popular indirect booking channels.
OTA
OTA is an acronym for Online Travel Agency. This is a familiar room sales channel for those who are in the accommodation business.
Some OTA channels attracting a lot of users include Agoda, Traveloka, Booking, Expedia, and Abay.
For each booking through an OTA channel, the hotel has to pay a certain percentage of the booking value as commission to the room sales channel, as agreed in the contract.
TA
TA stands for Travel Agency.
These travel agencies often organize tours with a large and stable number of guests. Normally, travel agencies are responsible for arranging all services on a tour, including booking rooms for customers, which is why a TA brings a stable number of customers to the hotel.
Guests from TAs usually account for 50-60% of a hotel's bookings, so many hotels today are very focused on developing this indirect booking channel.
GDS
GDS stands for Global Distribution System. This is a global distribution system for hotel rooms, also known as the Business to business (B2B) method of selling rooms.
GDS works on the principle that hotels sign a contract with the GDS and then provide information about their offers. Travel companies then get this information from the GDS and resell it to their tourists.
The hotel can gain a lot of customers through the GDS.

6. Steps Involved
Exploratory Data Analysis
Exploratory Data Analysis is a data analytics process used to understand the data in depth and learn its different characteristics, often by visual means. This allows you to get a better feel for your data and find useful patterns in it.
It is crucial to understand the data in depth before you perform data analysis and run your data through an algorithm. You need to know the patterns in your data and determine which variables are important and which do not play a significant role in the output. Further, some variables may be correlated with other variables. You also need to recognize errors in your data.
All of this can be done with Exploratory Data Analysis. It helps you gather insights, make better sense of the data, and remove irregularities and unnecessary values from the data.

6.1 Data Collection
Data collection is the process of collecting, measuring and analysing different types of information using a set of standard validated techniques. The main objective of data collection is to gather information-rich and reliable data and analyse it to make critical business decisions. Once the data is collected, it goes through a rigorous process of data cleaning and data processing to make it truly useful for businesses. Here, it refers to the process of finding and loading data into our system.
The pandas library is used to load the data into our system in Python. Using pandas we can manipulate data easily.

6.2 Data Cleaning
Data cleaning refers to the process of removing unwanted variables and values from your dataset and getting rid of any irregularities in it. Such anomalies can disproportionately skew the data and hence adversely affect the results. Some steps that can be taken to clean data are:
• Handling missing values: There are almost always some missing values in a dataset. If we don't remove or handle them, they can cause trouble in our analysis. Removing those missing values or replacing them with something meaningful is very important, so that our data has no missing values.
• Removing duplicates: Drop the duplicate rows.
• Formatting data to the proper dtype.
• Adding or removing columns as required for analysis.

6.3 Univariate or Bivariate analysis
Univariate analysis: In univariate analysis, you analyse data of just one variable. A variable in your dataset refers to a single feature/column. You can do this by graphical or non-graphical means, by finding specific mathematical values in the data.
Bivariate analysis: Here, you take two variables and compare them. This way, you can find how one feature affects the other. It is done with scatter plots, which plot individual data points, or correlation matrices, which show the correlation in hues.

6.4 Visualization
Data visualization is the representation of data through the use of common graphics, such as charts, plots, infographics, and even animations. These visual displays of information communicate complex data relationships and data-driven insights in a way that is easy to understand.
Types of data visualizations
• Tables: These consist of rows and columns used to compare variables. Tables can show a great deal of information in a structured way, but they can also overwhelm users who are simply looking for high-level trends.
• Pie charts and bar charts: These graphs are divided into sections that represent parts of a whole. They provide a simple way to organize data and compare the size of each component to one another.
• Scatter plots: These visuals are beneficial in revealing the relationship between two variables, and they are commonly used within regression data analysis. However, they can sometimes be confused with bubble charts, which are used to visualize three variables via the x-axis, the y-axis, and the size of the bubble.
• Line graphs and area charts: These visuals show change in one or more quantities by plotting a series of data points over time. Line graphs use lines to demonstrate these changes, while area charts connect data points with line segments, stacking variables on top of one another and using colour to distinguish between variables.
• Heat maps: These graphical displays are helpful in visualizing behavioural data by location. This can be a location on a map, or even on a webpage.
• Kdeplot: A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.

7. Conclusion
We have now reached the end of our project. Through these steps we learned that city hotels have more bookings than resort hotels, possibly because resort hotels are costlier or farther away, whereas city hotels are less costly and located near railway stations and airports.
27% of bookings were cancelled, only 3.9% of guests were repeated guests, guests preferred to stay for a week or less, and travel agents/tour operators (TA/TO) are the most preferred distribution channel for bookings.
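
The four cleaning steps from section 6.2 and the headline figures in the conclusion can be sketched with pandas as follows. This is an illustration under stated assumptions, not the code used in the original analysis: the helper names `clean_bookings` and `headline_figures`, the derived `total_stay` column, the choice of fill values, and the five-row sample are all hypothetical. Only the column names come from the data description.

```python
import pandas as pd


def clean_bookings(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps from section 6.2 to a bookings table."""
    out = df.copy()
    # 1. Handle missing values (the fill values are assumptions for this sketch).
    out["children"] = out["children"].fillna(0)
    out["country"] = out["country"].fillna("Unknown")
    # 2. Remove duplicate rows.
    out = out.drop_duplicates().reset_index(drop=True)
    # 3. Format data to the proper dtype.
    out["children"] = out["children"].astype(int)
    # 4. Add a column needed for the analysis (hypothetical name): total nights.
    out["total_stay"] = out["stays_in_weekend_nights"] + out["stays_in_week_nights"]
    return out


def headline_figures(df: pd.DataFrame) -> dict:
    """Compute the kind of percentages quoted in the conclusion."""
    return {
        "cancelled_pct": float(round(df["is_canceled"].mean() * 100, 1)),
        "repeated_guest_pct": float(round(df["is_repeated_guest"].mean() * 100, 1)),
        "top_channel": df["distribution_channel"].value_counts().idxmax(),
        "week_or_less_pct": float(round((df["total_stay"] <= 7).mean() * 100, 1)),
    }


# Made-up sample (not real data); the first two rows are exact duplicates.
sample = pd.DataFrame(
    {
        "hotel": ["City Hotel", "City Hotel", "Resort Hotel", "City Hotel", "City Hotel"],
        "is_canceled": [0, 0, 1, 0, 1],
        "is_repeated_guest": [0, 0, 0, 1, 0],
        "distribution_channel": ["TA/TO", "TA/TO", "Direct", "TA/TO", "TA/TO"],
        "children": [None, None, 1.0, 0.0, 2.0],
        "country": ["PRT", "PRT", None, "GBR", "FRA"],
        "stays_in_weekend_nights": [1, 1, 2, 0, 2],
        "stays_in_week_nights": [2, 2, 8, 3, 5],
    }
)

cleaned = clean_bookings(sample)
figures = headline_figures(cleaned)
print(figures)
```

Filling missing values before dropping duplicates is a deliberate choice in this sketch, so that rows differing only in how a gap was recorded are treated as the same booking. Whether that ordering suits the real dataset is a judgment call.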