
Capstone Project 1

The document discusses a project using social media data to help an aviation company improve their digital advertising. It describes collecting data on customers' social media behaviors and developing models to predict purchase propensity and target potential customers. The data is explored through univariate and bivariate analysis to gain business insights for strategic decision making.

Uploaded by

pranavi p

CAPSTONE PROJECT (SOCIAL MEDIA TOURISM)

MARCH 10, 2024


Contents
Problem 1:
1. Problem Understanding
a) Defining problem statement
b) Need of the study/project
c) Understanding business/social opportunity

2. Data Report
a) Understanding how data was collected in terms of time, frequency and methodology
b) Visual inspection of data (rows, columns, descriptive details)
c) Understanding of attributes (variable info, renaming if required)

3. Exploratory Data Analysis
a) Univariate analysis
b) Bivariate analysis
c) Removal of unwanted variables
d) Missing value treatment
e) Outlier treatment
f) Variable transformation
g) Addition of new variables

4. Business insights from EDA
a) Is the data balanced? If so, what can be done? Explanation in the context of the business
b) Any business insights using clustering
c) Any other business insights
Problem - 1
1. Problem understanding

a) Defining the Problem statement

An aviation company specializing in both domestic and international travel seeks to
transition from traditional tele-calling methods to targeted digital advertising. To
achieve this, they have partnered with a social networking platform to gain insights
into the digital and social behaviors of their customers. The objective is to deliver
digital advertisements directly to the user pages of targeted customers who exhibit
a high likelihood of purchasing tickets.

Given that purchasing propensity varies across different login devices, the company
aims to develop separate predictive models for users accessing the platform via
laptops and mobile devices. The challenge is to accurately identify and target
potential customers based on their digital behaviors, ultimately increasing the
effectiveness of digital advertising campaigns and driving ticket sales.

b) Need of the study/Project

• Digital media has a major influence on tourism. It has changed how people
plan trips, share experiences, and choose destinations. Websites and social
media are rich in information such as reviews, comments, and check-ins.
Social media marketing is highly effective and costs less.

• Tourism businesses now understand the importance of online reviews, as they sell
experiences that cannot be touched. Knowing what information people look for online
and how they use it helps tailor marketing strategies.

• For aviation companies, understanding how customers behave online is key.
Analyzing existing data helps predict future trends and identify which customers
are likely to be interested in specific products.

• By analyzing data and understanding online behaviors, businesses can create
strategies that match what customers want, driving success in the travel industry.

c) Understanding business/social opportunity

The data provided highlights the significant impact of digital media on the tourism
industry, emphasizing its role in transforming how travel experiences are planned,
consumed, evaluated, and marketed. Websites and social media platforms serve as
rich sources of information, offering insights such as destination reviews, user
interactions, and travel check-ins. This digital landscape presents ample
opportunities for businesses to leverage social media marketing, which can enhance
exposure, increase traffic, and drive sales at reduced costs.

The project outlined aims to delve deeper into understanding the digital behaviors
of customers, particularly in the context of the aviation industry. By studying
existing data and employing predictive analytics, the objective is to discern trends
and anticipate future consumer preferences. This endeavor not only provides
insights into current market dynamics but also informs strategic decision-making for
future initiatives.

Overall, the convergence of business and social opportunities within the digital
realm presents a promising landscape for leveraging big data analytics. By
harnessing the power of social media, entertainment, and website browsing data,
businesses can gain a comprehensive understanding of consumer behaviors, unlock
new business opportunities, and drive sustainable growth in the ever-evolving
tourism sector.

2. Data Report

a) Understanding how data was collected in terms of time, frequency
and methodology

The data provided for analysis is past customer data from the aviation tourism
sector. It covers customers' preferences in selecting destinations, the number of
members in the family, and their social media activity around trip planning and
interest in travelling.

b) Visual inspection of data (rows, columns, descriptive details)

Below is the data provided for analysis:

The data has 11760 Rows and 17 Columns.

c) Understanding of attributes (variable info, renaming if required)

The type of data: There are 3 float columns, 7 other numeric columns and 7 categorical
columns present in the given dataset.

Description of data: The descriptive summary returns the count, mean, min and max
values for all the numeric columns present in the dataset.
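
The inspection steps above can be sketched in pandas as follows. This is a minimal illustration on a toy DataFrame, not the project data: the three columns and their values are made up for the example, and the raw device labels are hypothetical.

```python
import pandas as pd

# Toy stand-in for the project data (the real dataset has 11760 rows and 17 columns).
# Values and raw device labels are illustrative assumptions.
df = pd.DataFrame({
    "Yearly_avg_view_on_travel_page": [307.0, 367.0, 277.0, 247.0],
    "member_in_family": ["2", "1", "Three", "4"],
    "preferred_device": ["iOS and Android", "iOS", "Laptop", "Tab"],
})

n_rows, n_cols = df.shape                  # number of rows and columns
dtype_counts = df.dtypes.value_counts()    # how many columns there are of each data type
summary = df.describe()                    # count, mean, std, min, quartiles, max (numeric columns)

print(n_rows, n_cols)
print(dtype_counts)
print(summary)
```

On the real data, `df.shape` would be expected to report the 11760 rows and 17 columns stated above.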

• Replacing the values in the variables below:

Values that are similar in nature are consolidated in Preferred device, Preferred
location type and member in family.

3. Exploratory data analysis

a) Univariate analysis (distribution and spread for every continuous
attribute, distribution of data in categories for categorical ones)

• Distribution plot/Histogram for all the numeric variables, before treating the
null values and before treating outliers.

• Distribution plot/Histogram for all the numeric variables, after treating the null
values and before treating outliers.

• Count Plot of Preferred device: Below is the number of users who are
buying the ticket next month.

• Count Plot of Preferred device:

• Count Plot of Number of members in the family: Most customers' families
consist of 3 members, followed by 4 members.

• Count Plot of Preferred destinations of Customers: We can see that most of
the customers are interested in visiting beaches.
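
The counts behind such count plots can be tabulated directly. The sketch below uses a small made-up sample, so the numbers are illustrative only and do not reproduce the project's results.

```python
import pandas as pd

# Made-up sample rows; only the column names follow the report.
df = pd.DataFrame({
    "member_in_family": [3, 4, 3, 2, 3, 4],
    "preferred_location_type": [
        "Beach", "Beach", "Tour and Travel", "Beach", "Tour and Travel", "Beach",
    ],
})

# value_counts() returns category frequencies sorted from most to least common,
# which is the same information a count plot displays as bars.
family_counts = df["member_in_family"].value_counts()
location_counts = df["preferred_location_type"].value_counts()

print(family_counts)
print(location_counts)
```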

b) Bivariate analysis (relationship between different variables, correlations)

Users who follow the company page have fewer yearly average views on the travel
page, while users who do not follow the company page have more average views on
the travel page.

Users who checked in at an outstation location last week have a high probability
of buying a ticket.
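
A comparison of this kind can be computed with a group-wise mean and a normalized cross-tabulation. The sketch below is illustrative only: the rows are made up, and `Taken_product` is a hypothetical name for the target column, which the report does not name.

```python
import pandas as pd

# Made-up rows; "Taken_product" is a hypothetical target column name.
df = pd.DataFrame({
    "following_company_page": ["Yes", "No", "No", "Yes", "No"],
    "Yearly_avg_view_on_travel_page": [200.0, 320.0, 300.0, 220.0, 310.0],
    "Taken_product": ["No", "Yes", "No", "No", "Yes"],
})

# Average yearly travel-page views for followers vs non-followers.
avg_views = df.groupby("following_company_page")["Yearly_avg_view_on_travel_page"].mean()

# Share of buyers within each group (each row sums to 1).
buy_rate = pd.crosstab(
    df["following_company_page"], df["Taken_product"], normalize="index"
)

print(avg_views)
print(buy_rate)
```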

c) Removal of unwanted variables (if applicable)

In “yearly_avg_Outstation_checkins” there is an unwanted value “*” in the data.
We have replaced it, along with the missing/null values, with the mode value.
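
One way to carry out this replacement is sketched below, assuming the column is stored as text because of the stray “*”. The sample values are made up for illustration.

```python
import pandas as pd

# Made-up sample; the column is assumed to hold strings because of the "*" entry.
s = pd.Series(["1", "10", "*", "1", None, "5"], name="yearly_avg_Outstation_checkins")

cleaned = s.mask(s == "*")              # turn the stray "*" into a missing value
mode_value = cleaned.mode().iloc[0]     # most frequent valid value (nulls are ignored)
cleaned = pd.to_numeric(cleaned.fillna(mode_value))  # fill gaps, then convert to numbers

print(cleaned.tolist())
```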

d) Missing Value Treatment

• Checking for missing/null values:

Seven variables in the dataset have null values, which need to be imputed so that
missing entries do not distort the analysis.

Null values can be imputed using statistical methods (such as the mean, median or
mode) or by using the K-Nearest Neighbors (KNN) algorithm, where missing data is
estimated based on the values of neighboring data points.

Checking the % of null values in the whole dataset:

Since the overall percentage of null values is less than 5%, we can use statistical
methods to impute the null values in the respective variables.
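
The null-value check can be sketched as follows. The toy DataFrame is an assumption made for illustration (40 cells with one missing entry), so the resulting percentage is not the project's figure.

```python
import numpy as np
import pandas as pd

# Toy data: 20 rows x 2 columns = 40 cells, with one value removed on purpose.
df = pd.DataFrame({
    "Yearly_avg_view_on_travel_page": [float(v) for v in range(200, 220)],
    "preferred_device": ["Mobile", "Laptop"] * 10,
})
df.loc[3, "Yearly_avg_view_on_travel_page"] = np.nan

nulls_per_column = df.isna().sum()                        # null count per column
overall_null_pct = df.isna().sum().sum() / df.size * 100  # nulls as % of all cells

# Decision rule from the text: below 5% overall, simple statistical imputation is used.
use_statistical_imputation = overall_null_pct < 5

print(nulls_per_column)
print(round(overall_null_pct, 2), use_statistical_imputation)
```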
Data column in which null values are present    Statistical tool used for imputing null values
Yearly_avg_view_on_travel_page                  Median
total_likes_on_outstation_checkin_given         Median
Yearly_avg_comment_on_travel_page               Median
following_company_page                          Mode
yearly_avg_Outstation_checkins                  Mode
preferred_device                                Mode
preferred_location_type                         Mode
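
The imputation plan in the table can be sketched as a loop over two column lists. Only two of the seven columns are shown, and the sample values are made up; on the real data the lists would hold all columns from the table.

```python
import numpy as np
import pandas as pd

# Made-up sample with one missing value per column.
df = pd.DataFrame({
    "Yearly_avg_view_on_travel_page": [300.0, np.nan, 280.0, 1000.0],
    "preferred_device": ["Mobile", None, "Mobile", "Laptop"],
})

median_cols = ["Yearly_avg_view_on_travel_page"]  # numeric columns: impute with the median
mode_cols = ["preferred_device"]                  # categorical columns: impute with the mode

for col in median_cols:
    df[col] = df[col].fillna(df[col].median())
for col in mode_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```

The median is less sensitive to extreme values than the mean: in this sample the value 1000.0 does not pull the imputed value away from 300.0.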

e) Outlier treatment (if required)

Checking for the presence of outliers:

• Checking for skewness in the data

Skewness observations:
1. If the skewness is between -0.5 and 0.5, the data is fairly symmetrical.
2. If the skewness is between -1 and -0.5 or between 0.5 and 1, the data is moderately skewed.
3. If the skewness is less than -1 or greater than 1, the data is said to be highly
skewed.
Skewness conclusion:
From the above, all the skewness values lie between -0.5 and 1.5, so there is not
much skewness in the data overall. Note that, by rule 3, any variable with a
skewness above 1 still counts as highly skewed.
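
The three rules above can be applied programmatically. The sketch below is an illustration on two made-up columns; how the boundary values (exactly 0.5 or 1) are classified is an assumption, since the rules leave it open.

```python
import pandas as pd

def skew_label(value: float) -> str:
    """Classify a skewness value using the thresholds listed above."""
    if abs(value) <= 0.5:
        return "fairly symmetrical"
    if abs(value) <= 1:
        return "moderately skewed"
    return "highly skewed"

# Made-up columns: one symmetric, one with a long right tail.
df = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],
    "right_tailed": [1, 1, 1, 2, 10],
})

skewness = df.skew()                  # sample skewness per numeric column
labels = skewness.apply(skew_label)   # map each value to its category

print(skewness)
print(labels)
```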

Removing the outliers present in the above variables using the Interquartile Range
(IQR) method.
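
A minimal sketch of IQR-based outlier removal is given below. It assumes the conventional multiplier of 1.5, which the report does not state, and uses made-up values.

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional choice and an assumption here.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up values with one obvious outlier (900.0).
col = "Yearly_avg_view_on_travel_page"
df = pd.DataFrame({col: [250.0, 260.0, 270.0, 280.0, 290.0, 900.0]})

low, high = iqr_bounds(df[col])
filtered = df[df[col].between(low, high)]   # keep only rows inside the fences

print(low, high)
print(filtered)
```

An alternative to dropping rows is capping values at the fences, for example with `df[col].clip(low, high)`, which keeps the row count unchanged.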

f) Variable transformation

Customers buy tickets using different devices, so instead of keeping the various
sources, we can club all the values except Laptop into “Mobile”, either with Groupby
functionality or by replacing every value other than Laptop with Mobile.
Now, we have 2 sources as below:

We can see that most of the users prefer mobile phones for booking tickets.
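
The replacement route can be sketched with `Series.where`, which keeps values where the condition holds and substitutes the given value elsewhere. The raw device labels below are hypothetical, since the report does not list them.

```python
import pandas as pd

# Hypothetical raw device labels; only "Laptop" and "Mobile" are named in the text.
devices = pd.Series(
    ["iOS and Android", "Laptop", "Tab", "iOS", "Laptop", "Android"],
    name="preferred_device",
)

# Keep "Laptop" as it is and relabel every other value as "Mobile".
# Nulls would also become "Mobile", so missing values should be imputed first.
grouped = devices.where(devices == "Laptop", "Mobile")
counts = grouped.value_counts()

print(counts)
```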

The “following_company_page” column has the values “Yes”, “No”, “1”, “0” and blanks.
All the blanks/null values have been replaced with the mode value.
The 1 and 0 values are replaced with Yes and No respectively.
We can see that most of the users do not follow the company page.

The “preferred_location_type” has 2 similar values with slightly different names:
- Tour Travel
- Tour and Travel
“Tour Travel” has been replaced with “Tour and Travel”.

Before replacement: After replacement:

The “member_in_family” has a value “Three” instead of number “3”, which has
been replaced.
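
The three label clean-ups described above can be sketched with dictionary-based replacement. The sample rows are made up, and all values are assumed to be stored as strings before cleaning.

```python
import pandas as pd

# Made-up rows; values are assumed to be strings in the raw data.
df = pd.DataFrame({
    "following_company_page": ["Yes", "No", "1", "0", "No"],
    "preferred_location_type": ["Beach", "Tour Travel", "Tour and Travel", "Beach", "Tour Travel"],
    "member_in_family": ["3", "Three", "4", "2", "Three"],
})

# Map the numeric flags onto the Yes/No labels.
df["following_company_page"] = df["following_company_page"].replace({"1": "Yes", "0": "No"})

# Merge the two spellings of the same location type.
df["preferred_location_type"] = df["preferred_location_type"].replace(
    {"Tour Travel": "Tour and Travel"}
)

# Replace the spelled-out number, then convert the column to integers.
df["member_in_family"] = df["member_in_family"].replace({"Three": "3"}).astype(int)

print(df)
```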
g) Addition of new variables
No new variables are added for our analysis.

4. Business insights from EDA

a) Is the data balanced? If so, what can be done? Explanation in the
context of the business
No, the data is not balanced. To determine if the data is balanced, we need to
assess whether the distribution of classes or categories within the dataset is
approximately equal.
Balancing data is critical for aviation companies as it ensures that machine
learning models provide accurate insights for all scenarios. For instance, in
flight delay prediction, having balanced data helps in accurately identifying
factors contributing to delays across various flight categories. Failing to
address data imbalance could lead to biased models, impacting the reliability
of predictions and potentially affecting operational decisions, such as
scheduling and resource allocation.
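
Whether the classes are approximately equal can be checked from the class shares of the target. In the sketch below, `Taken_product` is a hypothetical name for the target column and the 80/20 split is made up; it is not the project's actual class distribution.

```python
import pandas as pd

# Hypothetical target column with a made-up 80/20 split.
target = pd.Series(["No"] * 80 + ["Yes"] * 20, name="Taken_product")

class_share = target.value_counts(normalize=True)          # proportion of each class
imbalance_ratio = class_share.max() / class_share.min()    # majority share / minority share

print(class_share)
print(imbalance_ratio)
```

A ratio close to 1 indicates balanced classes; the further it rises above 1, the more the majority class dominates.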

b) Any Business insights using clustering


Clustering, an unsupervised machine learning approach, entails identifying
and grouping similar data points within large datasets without targeting
specific outcomes. Also known as cluster analysis, this method helps organize
data into more comprehensible structures. However, as our business
objective centers on supervised prediction of ticket purchases rather than on
grouping customers into segments, clustering is not essential for our current goals.

c) Any other business insights


Here are some ideas tailored for an aviation company based on the given
scenarios:

Family Package Offers: Develop exclusive family travel packages catering to
groups of 3 to 4 members. Offer discounts on bundled tickets,
accommodations, and additional perks like priority boarding or
complimentary meals to incentivize families to choose your airline for their
travel needs.

Beach Destination Discounts: Launch promotional offers and discounts
targeting beach destinations, which are the most visited locations. Consider
partnering with beach resorts or tour operators to provide bundled deals,
such as discounted flight and hotel packages, to attract travelers seeking
beach vacations.

SMOTE for Imbalanced Data: Implement the Synthetic Minority Over-sampling
Technique (SMOTE) to address data imbalance, particularly in customer
segmentation or predictive modeling tasks. By generating synthetic samples
for underrepresented groups, ensure that your predictive models are trained
on a balanced dataset, leading to more accurate insights and predictions.
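
The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched in NumPy as below. This is a simplified illustration and not the implementation found in any particular library; the function name `smote_like`, its parameters and the sample points are all assumptions.

```python
import numpy as np

def smote_like(X_min, n_new, k=3, seed=0):
    """Simplified SMOTE-style oversampling (illustrative sketch only).

    Each synthetic row lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)

    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]   # indices of the k nearest neighbours

    base = rng.integers(0, len(X_min), size=n_new)             # starting points
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour for each
    gap = rng.random((n_new, 1))                                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Made-up minority-class feature rows.
X_minority = np.array([[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]])
synthetic = smote_like(X_minority, n_new=6)
print(synthetic)
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class distribution.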

Social Media Campaigns for User Engagement: Create engaging social media
campaigns to increase user interaction and brand engagement. Target users
who are not following your company page but have high average views on
your travel page with personalized advertisements, exclusive offers, or
interactive content to capture their attention and encourage them to follow
your page for future updates.

Advance Booking Promotions: Launch marketing campaigns promoting
advance ticket bookings for the upcoming month. Offer early bird discounts,
bonus miles, or flexible booking options to incentivize customers to plan their
travel in advance and secure their tickets with your airline.

Frequent Flyer Rewards: Enhance your frequent flyer program by offering
additional rewards, upgrades, or exclusive perks for loyal customers.
Encourage repeat business and brand loyalty by rewarding customers for
their continued support and patronage.

By implementing these tailored strategies, the aviation company can
effectively retain customers, attract new travelers, and boost overall business
performance.

*****
