Python Hospitality Project
AtliQ Grand
Content
Problem statement
About the dataset
Data cleaning
Data transformation
Analysis Insights
Problem statement
AtliQ Grand owns multiple five-star hotels across India and has been in
the hospitality industry for the past 20 years. Due to strategic moves by
competitors and ineffective decision-making by management, AtliQ Grand is
losing market share and revenue in the luxury/business hotels category.
As a strategic move, the managing director of AtliQ Grand wants to
regain market share and revenue. The revenue management team has
decided to hire a third-party service provider to deliver insights from
their historical data.
You are a data analyst who has been provided with sample data to deliver
revenue insights to the team.
Project by Pratham Gupta [01]
About the Dataset
The team provided three months of booking data for AtliQ Grand, around
1.4 lakh (140,000) records.
The dataset contains 3 dimension tables and 2 fact tables.
Midway through the project, we were also given the data for August to
append to the existing data.
Importing datasets
Dimension tables: dim_date, dim_hotels, dim_rooms
Fact tables: fact_aggregated_bookings, fact_bookings
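A minimal sketch of the import step, assuming each table is stored as a CSV file named after the table. The `.csv` extension, the folder layout, and the helper name `load_tables` are assumptions for illustration, not details from the project.

```python
from pathlib import Path

import pandas as pd

# Table names as listed above; the ".csv" extension is an assumption.
TABLE_NAMES = [
    "dim_date",
    "dim_hotels",
    "dim_rooms",
    "fact_aggregated_bookings",
    "fact_bookings",
]


def load_tables(data_dir: Path) -> dict[str, pd.DataFrame]:
    """Read every listed table found in data_dir into a DataFrame keyed by name."""
    tables = {}
    for name in TABLE_NAMES:
        csv_path = data_dir / f"{name}.csv"
        if csv_path.exists():  # skip tables that are not present
            tables[name] = pd.read_csv(csv_path)
    return tables
```

With the files in a hypothetical `datasets` folder, `load_tables(Path("datasets"))["fact_bookings"].head()` would show the first few bookings.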
Difference between revenue_generated and revenue_realized:
Suppose someone books a room for 9100 and later cancels. They are charged a
cancellation fee, say 3600, which goes to the hotel. Revenue generated is then
9100 and revenue realized is 3600. In other words, the hotel still earns some
money when a booking is cancelled.
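The cancellation example can be written out as a tiny dataframe. This is only an illustration: the `booking_id` and `booking_status` values are made up, and only the amounts 9100 and 3600 come from the example above.

```python
import pandas as pd

# Two hypothetical bookings of the same room: one completed, one cancelled.
bookings = pd.DataFrame(
    {
        "booking_id": ["B1", "B2"],
        "booking_status": ["Checked Out", "Cancelled"],
        "revenue_generated": [9100, 9100],
        "revenue_realized": [9100, 3600],  # cancelled booking keeps only the fee
    }
)

# The gap between the two columns is the revenue lost to the cancellation.
bookings["revenue_lost"] = bookings["revenue_generated"] - bookings["revenue_realized"]
print(bookings)
```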
Data Cleaning
1. Clean invalid guests
Looking closely at the data, the minimum number of guests is negative, which cannot be true.
Since the rows with a negative number of guests are few compared to the
total number of rows, we can drop them for insight generation and keep
only the rows where the number of guests is positive.
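A minimal sketch of this filter on made-up data. The column name `no_guests` and the dataframe name `df_bookings` are assumptions, since the slide does not name them.

```python
import pandas as pd

# Hypothetical sample; the real table has many more rows and columns.
df_bookings = pd.DataFrame(
    {
        "booking_id": ["B1", "B2", "B3", "B4", "B5"],
        "no_guests": [2, -3, 4, -1, 1],
    }
)

# describe() exposes the suspicious negative minimum.
print(df_bookings["no_guests"].describe())

# Inspect the invalid rows first, then keep only the valid ones.
invalid = df_bookings[df_bookings["no_guests"] <= 0]
df_bookings = df_bookings[df_bookings["no_guests"] > 0]
```

Checking `len(invalid)` against the total row count before dropping confirms that the removed share is small.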
2. Outlier removal in revenue_generated
There are no negative values in the revenue_generated column.
Since there are only 5 outliers, we can drop them as we did with the invalid guest rows.
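One common way to flag such outliers is an upper limit of mean plus three standard deviations. The slide does not state which rule was used, so the three-sigma rule below is an assumption, and the data is made up.

```python
import pandas as pd

# Hypothetical sample: 20 ordinary bookings and one absurdly large value.
revenue = [9000 + 100 * i for i in range(20)] + [2_000_000]
df_bookings = pd.DataFrame({"revenue_generated": revenue})

# Sanity check for negative values.
negative_count = (df_bookings["revenue_generated"] < 0).sum()

# Three-sigma upper limit (assumed rule, not confirmed by the source).
mean = df_bookings["revenue_generated"].mean()
std = df_bookings["revenue_generated"].std()
upper_limit = mean + 3 * std

outliers = df_bookings[df_bookings["revenue_generated"] > upper_limit]
df_bookings = df_bookings[df_bookings["revenue_generated"] <= upper_limit]
```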
3. Outlier removal in revenue_realized
One observation from the dataframe above is that all of these rows are of room type RT4, which
means Presidential Suite. Since RT4 is a luxury room, its rent is likely to be higher. To
make a fair comparison, we need to run the outlier check only on the RT4 room type.
Here the upper limit comes to 50583, while the maximum value of revenue_realized
in the dataframe above is 45220. Hence we can conclude that there are no outliers
and no cleaning is needed on this column.
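The room-type-specific check can be sketched as follows. The column name `room_category`, the three-sigma rule, and all sample values except the maximum of 45220 are assumptions, so the limit computed here will not match the 50583 obtained on the real data.

```python
import pandas as pd

# Hypothetical sample mixing a cheaper room type with RT4.
df_bookings = pd.DataFrame(
    {
        "room_category": ["RT1"] * 3 + ["RT4"] * 4,
        "revenue_realized": [9000, 10000, 11000, 30000, 35000, 40000, 45220],
    }
)

# Restrict the outlier check to RT4 so cheaper rooms do not pull the limit down.
rt4 = df_bookings.loc[df_bookings["room_category"] == "RT4", "revenue_realized"]
upper_limit = rt4.mean() + 3 * rt4.std()

# If the maximum stays below the limit, nothing needs to be removed.
has_outliers = rt4.max() > upper_limit
print(upper_limit, rt4.max(), has_outliers)
```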
4. Null values in ratings_given column
Since it is natural that not every customer provides a rating, the ratings_given
column can have null values.
5. Null values in df_agg_bookings
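A hedged sketch of how nulls in df_agg_bookings might be inspected and handled. The slide does not say which column contains nulls or how they were treated, so the `capacity` column and the median fill below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing capacity value.
df_agg_bookings = pd.DataFrame(
    {
        "property_id": [1, 2, 3, 4],
        "successful_bookings": [20, 25, 18, 30],
        "capacity": [30.0, np.nan, 25.0, 40.0],
    }
)

# Count nulls per column to see where the gaps are.
null_counts = df_agg_bookings.isnull().sum()
print(null_counts)

# Assumed treatment: fill the gaps with the column median.
median_capacity = df_agg_bookings["capacity"].median()
df_agg_bookings["capacity"] = df_agg_bookings["capacity"].fillna(median_capacity)
```

The median is less sensitive to extreme values than the mean, which is why it is a common default for this kind of fill.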
Data Transformation
Data transformation means changing the data into a state on which we can
perform analysis more effectively.
There are various types of data transformation that you may have to perform,
depending on the need. A few examples:
1. Creating new columns
2. Normalization
3. Merging data, using the merge function
4. Aggregation: sum, mean, etc.
In this project we applied two transformations:
1. Create an occupancy percentage column
2. Cast dates to datetime format
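The occupancy percentage column and the datetime cast can be sketched like this. The column names `successful_bookings`, `capacity`, `check_in_date`, and `occ_pct` are assumptions, and the rows are made up.

```python
import pandas as pd

# Hypothetical sample of the aggregated bookings table.
df_agg_bookings = pd.DataFrame(
    {
        "check_in_date": ["2022-05-01", "2022-05-02"],
        "successful_bookings": [25, 28],
        "capacity": [30, 40],
    }
)

# New column: share of available rooms that were booked, in percent.
df_agg_bookings["occ_pct"] = (
    df_agg_bookings["successful_bookings"] / df_agg_bookings["capacity"] * 100
).round(2)

# Cast the date strings to datetime so they can be compared, sorted, and merged.
df_agg_bookings["check_in_date"] = pd.to_datetime(df_agg_bookings["check_in_date"])
print(df_agg_bookings.dtypes)
```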
Analysis Insights
1. What is the average occupancy rate in each of the room categories?
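One way to answer this is to merge the aggregated bookings with the rooms dimension table and group by room class. The column names and the class name "Standard" are assumptions; only the RT4 to Presidential Suite mapping is taken from the slides, and the figures are made up.

```python
import pandas as pd

# Hypothetical aggregated bookings with an occupancy percentage per row.
df_agg_bookings = pd.DataFrame(
    {
        "room_category": ["RT1", "RT1", "RT4", "RT4"],
        "occ_pct": [60.0, 70.0, 50.0, 54.0],
    }
)

# Hypothetical rooms dimension table mapping room ids to readable classes.
df_rooms = pd.DataFrame(
    {"room_id": ["RT1", "RT4"], "room_class": ["Standard", "Presidential Suite"]}
)

# Merge to attach the class name, then average occupancy per class.
merged = df_agg_bookings.merge(df_rooms, left_on="room_category", right_on="room_id")
avg_occ_by_class = merged.groupby("room_class")["occ_pct"].mean().round(2)
print(avg_occ_by_class)
```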
2. What is the average occupancy rate per city?
3. When was the occupancy better: weekday or weekend?
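Both of these questions follow the same merge-then-group pattern. The sketch below assumes that dim_hotels carries a `city` column and dim_date carries a `day_type` column; these names, the cities, and all values are assumptions.

```python
import pandas as pd

# Hypothetical aggregated bookings.
df_agg_bookings = pd.DataFrame(
    {
        "property_id": [1, 1, 2, 2],
        "check_in_date": pd.to_datetime(
            ["2022-05-04", "2022-05-07", "2022-05-04", "2022-05-07"]
        ),
        "occ_pct": [50.0, 70.0, 40.0, 90.0],
    }
)

# Hypothetical dimension tables.
df_hotels = pd.DataFrame({"property_id": [1, 2], "city": ["Delhi", "Mumbai"]})
df_date = pd.DataFrame(
    {
        "date": pd.to_datetime(["2022-05-04", "2022-05-07"]),
        "day_type": ["weekday", "weekend"],
    }
)

# Attach city and day type to every aggregated row.
occ = df_agg_bookings.merge(df_hotels, on="property_id").merge(
    df_date, left_on="check_in_date", right_on="date"
)

avg_occ_by_city = occ.groupby("city")["occ_pct"].mean()
avg_occ_by_day_type = occ.groupby("day_type")["occ_pct"].mean()
print(avg_occ_by_city, avg_occ_by_day_type, sep="\n")
```

Note that the date merge only works if both date columns share the same dtype, which is why the datetime cast matters.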
4. What is the revenue_realized per city?
5. What is the revenue_realized per hotel type?
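The revenue questions per city and per hotel type can both be sketched with one merge against the hotels table. The columns `city` and `category` (taken here to hold the hotel type) are assumptions, as are all values.

```python
import pandas as pd

# Hypothetical bookings with realized revenue.
df_bookings = pd.DataFrame(
    {
        "property_id": [1, 1, 2, 3],
        "revenue_realized": [10000, 5000, 8000, 12000],
    }
)

# Hypothetical hotels dimension table.
df_hotels = pd.DataFrame(
    {
        "property_id": [1, 2, 3],
        "city": ["Delhi", "Mumbai", "Mumbai"],
        "category": ["Luxury", "Business", "Luxury"],
    }
)

bookings_with_hotels = df_bookings.merge(df_hotels, on="property_id")

# Total realized revenue for each grouping.
revenue_by_city = bookings_with_hotels.groupby("city")["revenue_realized"].sum()
revenue_by_type = bookings_with_hotels.groupby("category")["revenue_realized"].sum()
print(revenue_by_city, revenue_by_type, sep="\n")
```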
6. Plot a pie chart of revenue_realized per booking platform
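A minimal sketch of the pie chart, assuming a `booking_platform` column and using matplotlib. The platform names, the amounts, and the helper `plot_revenue_pie` are hypothetical.

```python
import pandas as pd

# Hypothetical bookings by platform.
df_bookings = pd.DataFrame(
    {
        "booking_platform": ["direct online", "others", "others", "direct offline"],
        "revenue_realized": [2000, 5000, 1000, 2000],
    }
)

# Aggregate first; the chart only visualizes these totals.
revenue_by_platform = df_bookings.groupby("booking_platform")["revenue_realized"].sum()


def plot_revenue_pie(revenue: pd.Series, out_path: str) -> None:
    """Save a pie chart of the given revenue totals to out_path."""
    # Imported here so the aggregation above works without a plotting backend.
    import matplotlib

    matplotlib.use("Agg")  # non-interactive backend, writes to file only
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.pie(revenue.values, labels=revenue.index, autopct="%1.1f%%")
    ax.set_title("revenue_realized per booking platform")
    fig.savefig(out_path)
    plt.close(fig)
```

Calling `plot_revenue_pie(revenue_by_platform, "revenue_pie.png")` would write the chart to a file; in a notebook, `revenue_by_platform.plot(kind="pie")` is a shorter alternative.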
We received new data for the month of August; append it to the existing data.
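Appending the new month can be sketched with `pd.concat`. The dataframe names, columns, and rows below are made up; the sketch assumes the August data has the same columns as the existing table.

```python
import pandas as pd

# Hypothetical existing data and the new August rows.
df_agg_bookings = pd.DataFrame(
    {
        "check_in_date": ["2022-07-30", "2022-07-31"],
        "successful_bookings": [25, 28],
        "capacity": [30, 40],
    }
)
df_august = pd.DataFrame(
    {
        "check_in_date": ["2022-08-01"],
        "successful_bookings": [20],
        "capacity": [30],
    }
)

# Check that the schemas line up before appending.
same_columns = list(df_august.columns) == list(df_agg_bookings.columns)

# ignore_index=True rebuilds a clean 0..n-1 index for the combined table.
df_latest = pd.concat([df_agg_bookings, df_august], ignore_index=True)
print(df_latest.shape)
```

Any derived columns, such as the occupancy percentage, need to be computed for the new rows as well, either before or after the append.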