Capstone Project 1
Data Report
a) Understanding how data was collected in terms of time, frequency and methodology
b) Visual inspection of data (rows, columns, descriptive details)
c) Understanding of attributes (Variable info, renaming if required)
Problem - 1
1. Problem understanding
Given that purchasing propensity varies across different login devices, the company
aims to develop separate predictive models for users accessing the platform via
laptops and mobile devices. The challenge is to accurately identify and target
potential customers based on their digital behaviors, ultimately increasing the
effectiveness of digital advertising campaigns and driving ticket sales.
Tourism businesses now understand the importance of online reviews, as they sell
intangible experiences. Knowing what information people look for online and
how they use it helps tailor marketing strategies.
By digging into data and understanding online behaviors, businesses can create
strategies that match what customers want, driving success in the travel industry.
c) Understanding business/social opportunity
The data provided highlights the significant impact of digital media on the tourism
industry, emphasizing its role in transforming how travel experiences are planned,
consumed, evaluated, and marketed. Websites and social media platforms serve as
rich sources of information, offering insights such as destination reviews, user
interactions, and travel check-ins. This digital landscape presents ample
opportunities for businesses to leverage social media marketing, which can enhance
exposure, increase traffic, and drive sales at reduced costs.
The project outlined aims to delve deeper into understanding the digital behaviors
of customers, particularly in the context of the aviation industry. By studying
existing data and employing predictive analytics, the objective is to discern trends
and anticipate future consumer preferences. This endeavor not only provides
insights into current market dynamics but also informs strategic decision-making for
future initiatives.
Overall, the convergence of business and social opportunities within the digital
realm presents a promising landscape for leveraging big data analytics. By
harnessing the power of social media, entertainment, and website browsing data,
businesses can gain a comprehensive understanding of consumer behaviors, unlock
new business opportunities, and drive sustainable growth in the ever-evolving
tourism sector.
2. Data Report
The data provided for analysis is past customer data from the aviation tourism
sector: customers' preferences in selecting destinations, the number of members in
the family, their social media activity around trip planning, and their interest in
travelling.
Data types: the dataset contains 3 float columns, 7 integer columns and 7
categorical columns.
Description of data: the summary returns the count, mean, min and max values for
all the numeric columns in the dataset.
Replacing values in the variables below:
Values that are similar in nature were consolidated in Preferred device, Preferred
location type and Member in family.
3. Exploratory data analysis
Distribution plot/Histogram for all the numeric variables, before treating the
null values and before treating outliers.
Distribution plot/Histogram for all the numeric variables, after treating the null
values and before treating outliers.
Count Plot of Preferred device: Below is the number of users who are
buying the ticket next month.
Count Plot of Preferred device:
Count Plot of Preferred destinations of Customers: We can see that most of
the customers are interested in visiting Beaches.
Users who follow the company page have fewer yearly average views on the travel
page, while users who do not follow the company page have higher average views on
the travel page.
Users who checked in at an outstation location last week have a high probability
of buying the ticket.
c) Removal of unwanted variables (if applicable)
Seven variables in the dataset have null values, which must be imputed so that the
analysis is not distorted by missing data.
Null values can be imputed using statistical methods (mean, median or mode) or by
using the K-Nearest Neighbors (KNN) algorithm, where missing data is estimated
from the values of neighboring data points.
Since null values make up less than 5% of the data overall, statistical methods
are used to impute the null values in the respective variables.
Data column with null values               Statistic used for imputation
Yearly_avg_view_on_travel_page             Median
total_likes_on_outstation_checkin_given    Median
Yearly_avg_comment_on_travel_page          Median
following_company_page                     Mode
yearly_avg_Outstation_checkins             Mode
preferred_device                           Mode
preferred_location_type                    Mode
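The imputation described above can be sketched in plain Python as follows. This is a minimal, dependency-free illustration and not the report's actual code: the sample rows, the `null_share` and `impute` helpers and the `STRATEGY` mapping are hypothetical, and only two of the seven columns are shown.

```python
from statistics import median, mode

# Hypothetical sample rows; None marks a missing value.
rows = [
    {"Yearly_avg_view_on_travel_page": 300.0, "preferred_device": "Mobile"},
    {"Yearly_avg_view_on_travel_page": None, "preferred_device": "Laptop"},
    {"Yearly_avg_view_on_travel_page": 250.0, "preferred_device": None},
    {"Yearly_avg_view_on_travel_page": 280.0, "preferred_device": "Mobile"},
]

# Column -> statistic, following the table above (two columns only).
STRATEGY = {
    "Yearly_avg_view_on_travel_page": median,
    "preferred_device": mode,
}


def null_share(rows, column):
    """Fraction of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def impute(rows, strategy):
    """Return a copy of `rows` with missing values filled per column."""
    filled = [dict(r) for r in rows]
    for column, stat in strategy.items():
        # The fill value is computed from the observed values only.
        observed = [r[column] for r in rows if r[column] is not None]
        fill_value = stat(observed)
        for r in filled:
            if r[column] is None:
                r[column] = fill_value
    return filled


clean = impute(rows, STRATEGY)
```

Checking `null_share` per column before imputing is one way to confirm that the share of missing values is small enough for a simple median or mode fill to be reasonable.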
Checking for Skewness in the data
Skewness observations:
1. If the skewness is between -0.5 and 0.5, the data is fairly symmetrical.
2. If the skewness is between -1 and -0.5 or between 0.5 and 1, the data is
moderately skewed.
3. If the skewness is less than -1 or greater than 1, the data is said to be highly
skewed.
Skewness Conclusion:
From the above, all skewness values lie between -0.5 and 1.5, so no variable
shows notable negative skew. By the thresholds listed, any variable with a value
greater than 1 still counts as highly (right) skewed.
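The skewness thresholds above can be expressed as a small helper. The sketch below assumes the population (Fisher-Pearson) skewness coefficient, m3 / m2^1.5; the report does not state which estimator was used, and library implementations may apply a sample-size correction that gives slightly different values. The function names are hypothetical.

```python
def skewness(values):
    """Population skewness m3 / m2**1.5 (assumed estimator, see lead-in).

    Raises ZeroDivisionError when all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def skew_label(skew):
    """Map a skewness value to the three bands listed in the report."""
    if -0.5 <= skew <= 0.5:
        return "fairly symmetrical"
    if -1 <= skew <= 1:
        return "moderately skewed"
    return "highly skewed"


symmetric = skewness([1, 2, 3, 4, 5])     # evenly spread values
right_tail = skewness([1, 1, 1, 2, 10])   # one large value on the right
```

With these bands, a value such as 1.5 is labelled "highly skewed", while 0.8 is labelled "moderately skewed".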
Removing the outliers present in the above variables using the interquartile range
(IQR) method.
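A minimal sketch of IQR-based outlier removal is given below. It assumes the conventional fences of 1.5 × IQR below Q1 and above Q3, which the report does not state explicitly, and it drops values outside the fences; capping them at the fences would be an alternative reading of "removing". The function names and the sample values are hypothetical.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" treats the data as the full population,
    # so the minimum and maximum are the 0th and 100th percentiles.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [x for x in values if low <= x <= high]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # 100 is an obvious outlier
bounds = iqr_bounds(sample)
trimmed = remove_outliers(sample)
```

For the sample above, Q1 is 3 and Q3 is 7, so the fences are -3 and 13 and only the value 100 is dropped.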
f) Variable transformation
Customers buy tickets using different devices, so instead of keeping the various
device types, we can club all the values except Laptop into “Mobile”, either with
Groupby functionality or by replacing every value other than Laptop with Mobile.
Now, we have 2 sources as below:
We can see that most of the users prefer mobile phones for booking tickets.
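The device grouping described above amounts to a two-way mapping: keep Laptop and label everything else Mobile. The sketch below illustrates this in plain Python; the raw device names other than Laptop and Mobile are hypothetical placeholders, since the report does not list the original values, and it assumes missing values have already been imputed.

```python
from collections import Counter


def to_device_group(device):
    """Keep "Laptop" as is; every other device value becomes "Mobile"."""
    return "Laptop" if device == "Laptop" else "Mobile"


# Hypothetical raw values for preferred_device.
devices = ["Laptop", "Tab", "iOS", "Android", "Laptop", "Mobile"]

grouped = [to_device_group(d) for d in devices]
counts = Counter(grouped)   # number of users per device group
```

The resulting counts per group are what a count plot of the two sources would display.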
The “following_company_page” column has values “Yes”, “No”, “1”, “0” and blanks.
All the blank/null values have been replaced with the mode value.
The 1 and 0 values are replaced with Yes and No respectively.
We can see that most of the users do not follow the company page.
The “preferred_location_type” has 2 similar values with slightly different names as:
- Tour Travel
- Tour and Travel
“Tour Travel” has been replaced with “Tour and Travel”.
The “member_in_family” column has a value “Three” instead of the number “3”, which
has been replaced.
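The three value replacements above can be sketched as lookup tables applied row by row. This is an illustrative sketch only: the mapping names and `clean_row` are hypothetical, the raw values are assumed to be stored as strings, and converting member_in_family to an integer is an assumption about the intended result.

```python
# Replacement tables taken from the description above.
FOLLOWING_MAP = {"1": "Yes", "0": "No"}
LOCATION_MAP = {"Tour Travel": "Tour and Travel"}
FAMILY_MAP = {"Three": "3"}


def clean_row(row):
    """Return a copy of `row` with the inconsistent labels harmonised."""
    cleaned = dict(row)
    following = row["following_company_page"]
    location = row["preferred_location_type"]
    family = row["member_in_family"]
    # dict.get(key, key) leaves values without a mapping unchanged.
    cleaned["following_company_page"] = FOLLOWING_MAP.get(following, following)
    cleaned["preferred_location_type"] = LOCATION_MAP.get(location, location)
    # Assumption: the column should end up numeric after the fix.
    cleaned["member_in_family"] = int(FAMILY_MAP.get(family, family))
    return cleaned


raw = {
    "following_company_page": "1",
    "preferred_location_type": "Tour Travel",
    "member_in_family": "Three",
}
fixed = clean_row(raw)
```

Values that already use the target labels pass through unchanged, so the function can be applied to every row.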
g) Addition of new variables
No new variables are added for our analysis.
Beach Destination Discounts: Launch promotional offers and discounts
targeting beach destinations, which are the most visited locations. Consider
partnering with beach resorts or tour operators to provide bundled deals,
such as discounted flight and hotel packages, to attract travelers seeking
beach vacations.
Social Media Campaigns for User Engagement: Create engaging social media
campaigns to increase user interaction and brand engagement. Target users
who are not following your company page but have high average views on
your travel page with personalized advertisements, exclusive offers, or
interactive content to capture their attention and encourage them to follow
your page for future updates.
*****