Airbnb Data Analysis Report
PANAM Consulting
Max Pedersen – z5164270
Prerita Mehta – z5162933
Anderson Wong – z5076423
Noumik Thadani – z5246273
Andrew Rong – z5059252
Table of Contents
1. Background and Objectives
1.1 Key Objectives
1.2 Data Science Profiles
1.3 Mitigating Risks
2. Analysis
2.1 Framework Overview
2.2 Current Market Positioning & Expansion Strategy
2.3 Incorporating Hosts into Possible Expansion Strategy
3. Evaluations and Recommendations
3.1 Summary of Objectives and Insights
3.2 Exploring Dataset (Open/Structured/Social Media)
4. References
5. Appendix
Appendix A: Data Science Profiles
Appendix B: Gantt Chart
Appendix C: Risk Mitigation Matrix
Appendix D: Assumptions
Appendix E: Calculations and Logic
Appendix F: Risks & Limitations
Appendix G:
Appendix H:
Appendix I:
Appendix J:
Appendix K:
Appendix L:
Appendix M:
Appendix N:
1. Background and Objectives
1.1 Key Objectives
The primary purpose of this report is to determine how Airbnb can expand its
market share, given its current position in the Washington D.C. market. Research into
Airbnb’s positioning within the rental apartment space revealed that leveraging the
insights produced from its datasets could improve its overall position within
Washington D.C. This principal goal also furthers Airbnb’s vision of providing a
“people-to-people” platform for travel bookings and experiences that benefits all
stakeholders, including hosts, guests, and communities. Two underlying objectives stem
from this primary purpose:
1. Where to play?
Focusing on the customer environment, the main aim is to determine how guests
interact with current geographic segments, and to identify an optimal method of
expansion.
2. How to play in the most attractive segments?
Focusing on the host environment, the main aim is to determine the elasticity of
hosts to support an expansion, by conducting a host segmentation analysis.
1.2 Data Science Profiles
The team leveraged each individual’s core skills and defined roles and
responsibilities so that exploration and analysis could be conducted as efficiently and
effectively as possible. Noumik capitalized on his strong problem-solving skills by
extracting, cleansing and manipulating data into a usable format. This allowed Max to
derive meaningful insights from the data by paying close attention to detail, particularly
the underlying assumptions and limitations in the data, to predict and optimize
decision-making. Anderson then utilized various software and his proficiency in technical
languages to maximize the accuracy of the findings, while Andrew creatively
conceptualized the data using graphical tools to create a cohesive story. Prerita drew
upon her deep understanding of the business context and the implications of the findings
to communicate data complexities and technical detail in an intelligible manner.
1.3 Mitigating Risks
In terms of data analysis, clearly listing out all assumptions is crucial to alleviating the
risk of making generalizations based on a relatively small sample of data. Similarly,
conducting further research and appropriately extending the given data set is vital in
providing informed recommendations for how Airbnb can expand optimally.
2. Analysis
2.1 Overview
Washington D.C. receives more than 20 million overseas visitors annually, making it the
8th most popular destination within the US, at 5.56% of market share. As the US capital,
it is both a political stronghold and prime tourist destination, having a strong inflow of
government and business-related travel. Analyzing key trends across the elements that
make up the Airbnb environment, including properties, hosts, guests, and sentiment,
makes it apparent that Washington D.C. is an emerging and lucrative market in which
Airbnb can expand further.
Geographical Clusters Analysis
Figure 3 indicates that Airbnb’s listings are currently concentrated primarily within
Wards 1, 2, and 6. Wards 4, 7, and 8 have a lower concentration of listings, which could
make them attractive targets for Airbnb to expand market share.
Guest Analysis
Market conditions vary significantly across the 125 neighborhoods. Airbnb’s markets
have therefore been segmented into established and emerging regions based on
maturity and revenue, to identify where the most lucrative markets lie. Established
markets are neighborhoods that have both a total inferred revenue of greater than
$100,000 and 10+ listings (Appendix E). All other geographic segments are defined as
emerging markets. Figures 4-7 depict inferred total revenue and number of listings for
emerging and established markets.
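The segmentation rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's actual code: the `Neighborhood` type, the function name and the sample figures are hypothetical, and only the two thresholds come from the report.

```python
from dataclasses import dataclass

# Thresholds taken from the report's definition of an established market.
REVENUE_THRESHOLD = 100_000  # total inferred revenue in USD (must be exceeded)
MIN_LISTINGS = 10            # "10+ listings"


@dataclass
class Neighborhood:
    """Hypothetical container for one neighborhood's aggregated figures."""
    name: str
    inferred_revenue: float
    listings: int


def classify_market(n: Neighborhood) -> str:
    """Return 'established' only when both criteria hold, otherwise 'emerging'."""
    if n.inferred_revenue > REVENUE_THRESHOLD and n.listings >= MIN_LISTINGS:
        return "established"
    return "emerging"


# Illustrative values only; these are not figures from the dataset.
sample = [
    Neighborhood("Hypothetical A", 250_000.0, 40),  # meets both criteria
    Neighborhood("Hypothetical B", 150_000.0, 6),   # too few listings
    Neighborhood("Hypothetical C", 20_000.0, 15),   # revenue too low
]
labels = {n.name: classify_market(n) for n in sample}
print(labels)
```

Because both conditions must hold, a neighborhood with high revenue from very few listings still counts as emerging under this rule.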
Figure 4:
Figure 5:
Figure 6:
Figure 7:
Figure 8 reveals the geographic concentration of both emerging (orange) and
established (blue) markets in Washington DC. Wards 1, 2, 5 and 6 are more heavily
concentrated with established markets, with Wards 7 & 8 having a larger proportion of
emerging markets.
Figure 8:
Guest Review Analysis
The markets were segmented into ‘emerging’ and ‘established’ to enable analysis of the
established market segment and evaluation of property attributes based on guest
reviews. Appendix E contains further explanation of the calculations and logic; the key
points are summarized here.
The analysis focuses on established markets because they show wider spreads in price
and contain significantly more properties and bookings. Customer feedback data is
therefore likely to be more accurate, and inferences about customers’ willingness to pay
certain prices can be made more easily.
The guest review analysis evaluates Airbnb’s price and value levels in the established
markets, and determines whether current levels are suitable for market expansion.
Average PPG Analysis
Figure 9:
As seen above, Airbnb’s most attractive guests from a revenue perspective lie within
Bellevue, Palisades, Woodley Park, North Cleveland Park, and Logan Circle. These
neighborhoods have the highest PPGs, with a combined mean nightly PPG of $85.91.
Airbnb’s least attractive guests from a revenue perspective lie within Colonial Village,
Central Business District, Woodridge, Carver Langston, and Fort Davis. These have the
lowest PPGs, with a combined mean nightly PPG of $23.22. The mean PPG of the 5 most
expensive neighborhoods is therefore roughly 3.7 times (370% of) that of the 5 least
expensive neighborhoods, a substantial gap.
Figure 10:
Figure 10 indicates that the average value review for the top 5 suburbs is 9.63, while the
average value review for the bottom 5 suburbs is 9.33. There is therefore only a 0.30
difference in the mean value review between the 5 most expensive suburbs and the 5
least expensive suburbs. With a
minimal variation in mean value scores between high-priced and low-priced segments,
and an overall mean value review score higher than 9, properties appear to be priced
appropriately across geographical segments. Despite the significant difference in mean
PPG demonstrated previously, there is no indication of significant overpricing, with
guests booking in specific neighborhoods based on their propensity to pay.
Disaggregating to the individual listing level, there is virtually no correlation
between price and value rating (Pearson coefficient of -0.05) (Appendix H).
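For readers unfamiliar with the statistic, the Pearson coefficient quoted above can be computed as in the following minimal sketch. It uses only the Python standard library; the function name and the sample prices and ratings are illustrative assumptions and do not reproduce the listing-level data behind the reported -0.05.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical listing-level data: nightly price and value review score.
prices = [45.0, 80.0, 120.0, 200.0, 95.0]
value_scores = [9.0, 10.0, 9.0, 10.0, 9.0]
r = pearson(prices, value_scores)
print(round(r, 3))
```

A coefficient near zero, as reported, indicates that higher-priced listings are not systematically rated better or worse on value.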
These findings suggest that the current range of pricing across geographical
segments in Washington DC accommodates differing guest propensities to spend,
and that guests’ value expectations of Airbnb properties at specific price points are being
satisfied. As Airbnb considers market expansion options, pricing currently appears to be
customer-driven at an optimal level (Collins & Parsa 2006), so pricing adjustments to
listings are unlikely to be a key component of an expansion strategy.
To further evaluate the influence of exact location, as opposed to other factors such as
personal safety, on location ratings, Google Maps’ Distance Matrix API was queried to
determine the distance of each suburb’s midpoint from Capitol Hill (Appendix H). Capitol
Hill was chosen because it is the largest established market by inferred revenue
(Figure 5), which is hypothesized to indicate strong demand for its location (Statistical
Atlas 2019). Figures 11 & 12 show the lack of an identifiable relationship between
distance from Capitol Hill and average location review score, for established and
emerging markets respectively.
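The distances in the report came from Google Maps. As a rough offline stand-in, a straight-line (great-circle) distance between two midpoints can be sketched with the haversine formula. This is a different technique from the API query described above and would generally give shorter figures than travel distances. The coordinates below are placeholders, not values from the report.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


# Placeholder coordinates (assumptions for illustration only).
reference_point = (38.89, -77.00)      # stands in for the Capitol Hill midpoint
neighborhood_midpoint = (38.93, -77.05)
distance = haversine_miles(*reference_point, *neighborhood_midpoint)
print(f"{distance:.2f} miles")
```

The resulting distances could then be paired with average location review scores to check for a relationship, as Figures 11 and 12 do.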
Figure 11: Average Review Score by Distance, with # of listings (Established markets)
Figure 12: Average Review Score by Distance, with # of listings (Emerging markets)
A correlation coefficient of 0.124887 between average location review and distance from
Capitol Hill indicates a very weak linear relationship (Appendix H), implying that location
review scores are not strongly linked to absolute location.
Given this, and given that value reviews were overwhelmingly positive across price points,
expanding geographically into emerging markets is viable. Guest engagement, feedback
and spending in these emerging markets would not depend fully on their price level
or absolute distance from the most established markets. Offering types that are lacking
in established markets could then be identified and supplied by emerging markets.
2.3 Incorporating Hosts into Possible Expansion Strategy
A frequent criticism of Airbnb is that, whilst it operates under the guise of local hosts
earning supplemental income, professional property management companies or business
operators are often key users. “Professional” Airbnb hosts, whose activities feature
characteristics of a “business”, are assumed to offer more than a single accommodation
listing on Airbnb, and are more likely to be in violation of most short-term rental laws
protecting residential housing (Inside Airbnb, 2017).
Each host was categorized as professional or personal based on their listing count,
an approach that proved successful in previous studies (DC Working Families, 2017).
Across both emerging and established markets, a total of 18.7% of hosts are classified as
professional, with the remaining 81.3% labelled personal hosts. A breakdown of host
makeup for the top 50 neighborhoods is provided below in Figure 13.
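The listing-count rule can be sketched as follows. The cut-off of two or more listings follows the report's statement that professional hosts are assumed to offer more than a single listing. The function names and host IDs are hypothetical, and the sample does not reproduce the 18.7% figure.

```python
from collections import Counter

# Assumption: "more than a single listing" is read as two or more listings.
PROFESSIONAL_MIN_LISTINGS = 2


def classify_hosts(listing_host_ids, min_listings=PROFESSIONAL_MIN_LISTINGS):
    """Map each host ID to 'professional' or 'personal' by listing count."""
    counts = Counter(listing_host_ids)
    return {
        host: "professional" if n >= min_listings else "personal"
        for host, n in counts.items()
    }


def professional_share(categories):
    """Fraction of hosts classified as professional."""
    if not categories:
        raise ValueError("no hosts to summarize")
    professionals = sum(1 for c in categories.values() if c == "professional")
    return professionals / len(categories)


# One entry per listing, holding the (hypothetical) ID of the host who owns it.
listing_host_ids = ["h1", "h1", "h2", "h3", "h3", "h3", "h4"]
categories = classify_hosts(listing_host_ids)
share = professional_share(categories)
print(categories, share)
```

Note that the share is measured over hosts, not listings, which matches how the 18.7% and 81.3% split is described.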
There is currently no neighborhood where professional hosts outnumber personal hosts,
and the top 5 neighborhoods contribute a significant proportion of the 18.7% of
professionals. The higher proportion of professional hosts in these top 5 may have
contributed significantly to the success of these neighborhoods in revenue generation, so
incentivizing professional hosts to move away from these areas could cause significant
financial issues. Whilst further data is needed to draw firmer conclusions, it is highly
probable that Airbnb would need to attract new hosts to the platform to
support an expansion into the emerging neighborhoods.
3. Evaluations and Recommendations
3.1 Summary of Objectives and Insights
The analysis has identified that, despite strong PPG variation, value reviews from guests
are overwhelmingly positive and have low correlation with PPG. Location reviews
continue the positive trend, and were determined to have a very weak relationship with
distance. These two observations justified a focus on emerging markets for expansion,
with product differentiation to avoid cannibalization.
The next stage of the analysis evaluated the feasibility of such expansion, given that
optimal differentiations had been established. It analyzed trends for the two categories of
hosts, ‘personal’ and ‘professional’, and found it likely that there are not currently enough
professional hosts to support expansion. Thus, Airbnb would need to attract new hosts
to the platform when supporting an expansion strategy into the emerging markets.
3.2 Exploring Dataset (Open/Structured/Social Media)
Whilst substantial insights have been derived above, the current dataset is incomplete
and can be improved through joining it with open data.
Structured and unstructured data can also be used to enable further understanding of
Airbnb’s positioning within the Washington DC market as a whole.
The original source of the dataset is Inside Airbnb, which scrapes property listings from
Airbnb itself to provide open data. By using Inside Airbnb’s original datasets, the initial
dataset can be joined with an extended range of attributes, improving the quality and
information range of the dataset. An example of a joined dataset in Appendix H shows
additional features such as the host’s profile description, level of verification, and
expectations, from which a more accurate segmentation of hosts into professional or
personal can be made.
Social media data, a key type of unstructured data, enables implicit real-time
inference of consumer opinions, trends, and behaviors to gain a qualitative understanding
of consumer feedback (Xu et al. 2016). Airbnb can utilize Twitter through the
scraping framework provided (Appendix H) to gather tweets relating to itself and the
hotel industry in Washington DC. Tweets can be filtered by keyword, posting location
and date, after which sentiment analysis can be applied. This would enable
benchmarking and identification of guest sentiment trends over time.
The limitation of such Twitter data is that it only captures user-generated content. To
understand user actions, a combination of Google Trends & Words Everywhere has
been used in Appendix H, to determine absolute search count from Google Trends for
Airbnbs and hotels in Washington DC. Figure 11 shows their tracking in the year prior to
the capture of the dataset, from which Airbnb can further understand its positioning
relative to the traditional hotel industry.
4. References
Airbnb. (2019). How do star ratings work?. [online] Available at:
https://www.Airbnb.com.au/help/article/1257/how-do-star-ratings-work [Accessed
1 Apr. 2019].
Airdna. Shares of Full Time Airbnb Operators and their Revenue. [online] Available at:
https://i.pinimg.com/originals/83/c2/72/83c272260603e8bead24cb83b550714e.png
[Accessed 31 Mar. 2019].
Areavibes. (2019). Anacostia, DC Crime Rates & Crime Map. [online] Available at:
https://www.areavibes.com/washington-dc/anacostia/crime/ [Accessed 29 Mar.
2019].
CBRE. (2017). Hosts with Multiple Units – A Key Driver of Airbnb Growth. [ebook]
Available at:
https://www.ahla.com/sites/default/files/CBRE_AirbnbStudy_2017.pdf [Accessed
1 Apr. 2019].
Collins, M. & Parsa, H.G. (2006). Pricing strategies to maximize revenues in the lodging
industry. [ebook] Hospitality Management. Available at:
https://pdfs.semanticscholar.org/b084/5675c228988d647976da5ae7b81df63d8d17.pdf
[Accessed 29 Mar. 2019].
Fradkin, A., Grewal, E. and Holtz, D. (2018). The Determinants of Online Review
Informativeness: Evidence from Field Experiments on Airbnb. [ebook] MIT Sloan
School of Management. Available at:
https://andreyfradkin.com/assets/reviews_paper.pdf [Accessed 28 Mar. 2019].
Jet, J. (2017). Are Business Travelers Using Airbnb. [online] Available at:
https://www.forbes.com/sites/johnnyjet/2017/08/22/are-business-travelers-using-Airbnb/#615975a44ddf
[Accessed 31 Mar. 2019].
Kwok, L. & Xie, K. (2018). Pricing strategies on Airbnb: Are multi-unit host revenue
pros?. [online] International Journal of Hospitality Management. Available at:
https://www.researchgate.net/publication/327954155_Pricing_strategies_on_Airbnb_Are_multi-unit_host_revenue_pros
[Accessed 29 Mar. 2019].
Learn Airbnb. (2016). The State of Airbnb Hosting. [ebook] Available at:
https://learnAirbnb.com/wp-content/uploads/2017/08/LearnAirbnb.com-Airbnb-Home-Sharing-Report-v1.4.pdf
[Accessed 1 Apr. 2019].
Priceonomics. (2017). The Rise of the Professional Airbnb Investor. [online] Available
at: https://priceonomics.com/will-real-estate-investors-take-over-Airbnb/
[Accessed 27 Mar. 2019].
Samaan, R. (2015). Airbnb, rising rent, and the housing crisis in Los Angeles. [ebook]
Available at:
https://www.ftc.gov/system/files/documents/public_comments/2015/05/01166-96023.pdf
[Accessed 29 Mar. 2019].
Statistical Atlas. (2019). [online] Available at:
https://statisticalatlas.com/neighborhood/District-of-Columbia/Washington/Anacostia/Household-Income
[Accessed 1 Apr. 2019].
Working Families. (2017). Selling the District Short. [ebook] D.C. Working Families.
Available at:
http://dcsharebetter.org/wp-content/uploads/2017/03/D.C.-Housing-Report_Web.pdf
[Accessed 2 Apr. 2019].
Xu, Y., Zhou, D. and Lawless, S. (2016). Inferring Your Expertise from Twitter:
Integrating Sentiment and Topic Relatedness. [ebook] Available at:
https://www.scss.tcd.ie/seamus.lawless/papers/WI-2016.pdf [Accessed 29 Mar.
2019].
5. Appendix
Appendix A: Data Science Profiles
Andrew:
Anderson:
Max:
Prerita:
Noumik:
Appendix C: Risk Mitigation Matrix
Appendix D: Assumptions
The wards will remain consistent: Wards 1, 2 and 6 will remain the most
concentrated over the horizon of our implementation plan (one year)
Consumer trends remain stable: guests still prefer Airbnb (for privacy) over
hotels, and we do not expect that to change
The aim of growth is still specific to Washington DC, and we do not expect other
cities to overtake DC in terms of rental focus
Hosts are still available next year (they do not delist)
Hosts’ willingness to move is accurately measured
Cities are not expected to expand
This is a limited dataset, and more data is required in order to gain a holistic
understanding of the Airbnb environment, which would in turn inform more
accurate recommendations
The original dataset was scraped on 12th October 2018; the analytical findings are
based on this snapshot and may not accurately reflect real-time
conditions
The review scores are ratings and are subject to noise and bias
Appendix E: Calculations and Logic
PPG (Price per Guest) = price of listing / accommodates (number of guests). This
scales price-based metrics, which may otherwise be misleading if there is a skew of
properties accommodating only 1 person.
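A minimal sketch of the PPG calculation, assuming each listing provides a nightly price and an `accommodates` count as described above. The function name and sample listings are illustrative and are not taken from the dataset.

```python
from statistics import mean


def price_per_guest(price, accommodates):
    """PPG: nightly listing price divided by the number of guests accommodated."""
    if accommodates <= 0:
        raise ValueError("accommodates must be a positive number of guests")
    return price / accommodates


# Hypothetical listings as (nightly price in USD, accommodates).
listings = [(120.0, 4), (90.0, 2), (60.0, 1)]
ppgs = [price_per_guest(price, guests) for price, guests in listings]
mean_ppg = mean(ppgs)
print(ppgs, mean_ppg)
```

In this toy sample the most expensive listing has the lowest PPG, which is the kind of distortion the per-guest scaling is meant to remove.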
Inferred revenue is based on a review rate of 72% of guests leaving reviews (Fradkin,
Grewal and Holtz 2018). The average Airbnb stay has been evaluated to be roughly 3
nights (Learn Airbnb 2016). A number of assumptions underlie this analysis,
namely that the price does not typically change, which is unlikely to hold in real life.
However, the metric aims to be directional and assumes that price changes will occur
across groupings of properties.
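The exact revenue formula is not reproduced in this appendix, so the sketch below is an assumption about how the two quoted figures might combine: review count is scaled up by the 72% review rate to estimate bookings, bookings are multiplied by the 3-night average stay, and nights are multiplied by the listing's nightly price. The function and parameter names are hypothetical.

```python
REVIEW_RATE = 0.72        # share of guests who leave a review (from the report)
AVERAGE_STAY_NIGHTS = 3   # approximate average stay in nights (from the report)


def inferred_revenue(nightly_price, review_count,
                     review_rate=REVIEW_RATE,
                     average_stay_nights=AVERAGE_STAY_NIGHTS):
    """Estimate a listing's revenue from its review count (assumed formula)."""
    if not 0 < review_rate <= 1:
        raise ValueError("review_rate must be in (0, 1]")
    estimated_bookings = review_count / review_rate
    estimated_nights = estimated_bookings * average_stay_nights
    # Assumes the nightly price stayed constant, as the appendix notes.
    return estimated_nights * nightly_price


# Hypothetical listing: $100 per night with 72 reviews.
estimate = inferred_revenue(100.0, 72)
print(round(estimate, 2))
```

Summing such estimates over all listings in a neighborhood would give a total that can be compared against the $100,000 threshold for established markets.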
Appendix F: Risks & Limitations
recommendations are informed by vast amounts of data analysis, actually implementing
these suggestions must be informed by both data as well as intuition and experience.
Appendix G:
Example of Cannibalization (Section 2.2 – Risk to Analysis)
Capitol Hill has 534 listings and generates the most revenue of any established
market. Lincoln Park, by comparison, has just 1 listing in the dataset and is 0.6 miles
away. Given that apartments, houses and townhouses make up 82.3% of all listings
within Capitol Hill, if Lincoln Park’s makeup were focused largely on serviced
apartments, B&Bs and condominiums, potential guests indifferent between the two
locations could book their desired listing types (B&Bs and condos) within Lincoln Park.
Appendix H:
https://github.com/max-pedersen/infs3603project/ contains the relevant files relating to
the initial data approach that was taken (see Dataset exploration & workings.ipynb). It
also contains folders with samples of the open, structured and unstructured data that
was mentioned.
https://github.com/max-pedersen/infs3603project/tree/master/extendingdata - Contains
hotel data, and Google Trends & Words Everywhere data, to infer search
counts.
https://github.com/max-pedersen/infs3603project/tree/master/GetOldTweets-python -
Scraping module for Twitter, and examples of scraped Airbnb/hotel tweets
https://github.com/max-pedersen/infs3603project/tree/master/open-data-usage-inside-airbnb -
Contains example of joined data with the open data source from Inside Airbnb
Appendix I:
Appendix J:
Appendix K:
Appendix L:
Appendix M:
Appendix N: