Interim Report - Final
REPORT
USN 23VMBR01961
OBJECTIVES OF THE STUDY
This project targets the crucial problem of high-value customer churn in subscription-based
businesses. Leveraging predictive analytics and machine learning methods, the research intends
to detect early indicators of customer disengagement and create data-informed interventions for
improved retention. The interim report summarizes the objectives, methodology, and preliminary
findings, and includes insights into patterns of customer behavior and predictive model
effectiveness in churn forecasting.
2. Leverage of Structured and Unstructured Data: The study leverages both structured data
(e.g., transaction records, CRM data) and unstructured data (e.g., customer feedback,
support tickets) to gain a comprehensive understanding of customer behavior and churn
indicators. This holistic approach supports a nuanced analysis of the factors influencing
customer retention.
METHODOLOGY
This research will utilize both data-driven methods and machine learning algorithms to forecast
and prevent high-value customer churn. The approach consists of the following elements:
Type of Research: The study is empirical and descriptive, with historical customer information
used to recognize patterns of churn and test predictive models.
Data Collection Methods:
1. Primary Data: Surveys, customer feedback, and interaction logs.
2. Secondary Data: Historical transaction records, CRM databases, and industry reports.
Evaluation:
1. Comparison of pre- and post-deployment retention rates.
2. Quantifying intervention effects using A/B testing.
3. Regular review to optimize retention strategies.
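The A/B testing step above can be sketched as a two-proportion z-test comparing retention rates in a control and a treatment group. The counts below are hypothetical placeholders, not figures from this study.

```python
# Hypothetical two-proportion z-test for retention rates (pure stdlib).
from math import sqrt, erf

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) comparing two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)          # pooled rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # normal tail
    return z, p_value

# Treatment retained 870 of 1000 customers; control retained 800 of 1000.
z, p = two_proportion_ztest(870, 1000, 800, 1000)
```

A small p-value would indicate the intervention's retention lift is unlikely to be chance.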
Expected Outcomes:
1. Discovery of critical churn indicators and risk factors for high-value customers.
2. Creation of a predictive model to segment at-risk customers.
3. Enhanced customer retention strategies resulting in revenue increase.
4. A model for companies to maximize customer engagement and loyalty.
5. A dynamic, real-time customer retention solution that adjusts to changing customer
behaviors and market trends.
RESEARCH DESIGN
The research design used in this study is a quantitative, exploratory, and predictive design with a
combination of statistical analysis and machine learning for the purpose of understanding customer
behavior and churn trends.
1. Research Type: Quantitative, exploratory, and predictive, combining statistical
analysis with machine learning.
2. Data Source: Secondary data supplied by LoyaltyVision Analytics.
3. Sampling Design:
Census-based: the whole dataset was employed without sampling in order to maintain
representativeness and provide complete insights.
All customer segments were included in the analysis, including high-value,
regular, and churned customers.
6. Ethical Considerations:
All data used is anonymized and applied for academic and analytic purposes
only, in line with data privacy principles.
The data used in this study was collected through secondary sources. It was supplied by
LoyaltyVision Analytics and comprises a rich set of customer-level data relevant to
understanding behavior, engagement, and churn.
Data Source:
Internal Organizational Dataset provided by LoyaltyVision Analytics.
Data is representative of customer activity and behavior within a specified time period
and is presumed to be anonymized for research purposes.
Data Integrity:
The data were checked for missing values, invalid data types, and outliers.
Exploratory tests verified that the data was detailed and diverse enough to support the
research goals.
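The integrity checks described above might look as follows in pandas; the frame is a tiny synthetic stand-in for the LoyaltyVision Analytics dataset, with column names borrowed from later sections and values invented.

```python
# Synthetic stand-in frame; real column names are assumed, values invented.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Tenure": [1, 5, 12, np.nan, 30],            # one missing value
    "rev_per_month": [100, 120, 95, 110, 102],
    "Service_Score": ["3", "4", "2", "5", "3"],  # numbers stored as text
})

missing = df.isna().sum()                        # missing values per column
bad_types = df.select_dtypes(include="object")   # columns with invalid types
# Coerce the text column to numeric; unparseable entries become NaN.
df["Service_Score"] = pd.to_numeric(df["Service_Score"], errors="coerce")
```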
Relevance to Study:
The data is directly relevant to the study's objective of comprehending customer churn,
thus ideal for exploratory and predictive analysis.
SAMPLING METHOD
The research adopts a census-oriented strategy over the traditional sampling framework since
the entire dataset supplied by LoyaltyVision Analytics was made available for analysis.
1. Sampling Design:
Population:
a) All customers that are represented within the LoyaltyVision Analytics dataset —
comprising a total of 11,260 entries — from different customer segments, revenue
categories, and service levels.
2. Target Group:
Segments such as "Super," "Super Plus," and "Regular Plus" were examined in more detail to
understand churn behavior and retention opportunities.
To extract useful insights from the gathered data and to address the research goals effectively, a
mix of analytical libraries and statistical techniques has been utilized. These libraries assist in
data cleaning, data exploration, visualization, and predictive modelling.
1. Programming Language:
Python: The central programming language adopted for data cleansing, analysis,
visualization, and modeling.
2. Python Libraries:
Pandas – used for data preprocessing and manipulation.
NumPy – used for numeric computations and manipulating arrays.
Matplotlib & Seaborn – used for plotting data through histograms, boxplots, countplots,
and heatmaps.
Scikit-learn (sklearn) – for machine learning operations such as model training,
evaluation, and data splitting.
3. Environment:
Jupyter Notebook – an interactive coding environment to write, run, and document the
analysis process.
4. Analytical Techniques:
Descriptive statistics and summary tables.
Correlation analysis and visual heatmaps.
Outlier detection via boxplots.
Feature engineering through derived variables.
Logistic Regression model as a baseline predictive classifier.
Performance evaluation using classification report (precision, recall, F1-score).
Through the use of such data analysis tools, the research guarantees a detailed and meaningful
exploration of high-value customer behavior to allow for the creation of effective strategies for
improving retention.
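A minimal sketch of the baseline workflow these tools support, using simulated features and a synthetic churn label in place of the real dataset:

```python
# Baseline classifier sketch on simulated data (features and churn label
# are synthetic, not the study's dataset).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))                    # four numeric features
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)  # churn flag

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
report = classification_report(y_test, model.predict(X_test))
print(report)  # precision, recall, and F1-score per class
```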
This section presents a comprehensive exploratory data analysis (EDA) of the dataset provided
for the research on improving high-value customer retention using predictive analytics. The
purpose of EDA is to understand the shape of the dataset, clean and prepare it for subsequent
steps, detect patterns, and derive insights that support modeling and decision-making.
We begin by examining the shape and structure of the dataset. It has 11,260 rows and 19
columns, corresponding to different customer attributes such as demographics, engagement
metrics, and revenue information.
Summary statistics for all columns were generated with df.describe(include='all').
Univariate analysis is concerned with the investigation of each single numerical variable alone to
see its distribution, central tendency, and dispersion. Histograms were graphed with df.hist() to
plot the frequency distribution of major numeric features like rev_per_month, Tenure, cashback,
and Service_Score.
This is the important step to detect skewness, identify potential outliers, and determine whether
or not data transformation like normalization or log scaling is required prior to proceeding to
predictive modeling.
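A sketch of this univariate step on synthetic data shaped like the columns named above (a non-interactive backend is set so the plots render off-screen):

```python
# Univariate histograms over synthetic columns shaped like the real ones.
import matplotlib
matplotlib.use("Agg")  # render plots off-screen
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "rev_per_month": rng.lognormal(mean=3.0, sigma=1.0, size=1000),  # right-skewed
    "Tenure": rng.integers(0, 61, size=1000).astype(float),
    "cashback": rng.lognormal(mean=2.0, sigma=0.8, size=1000),
})

axes = df.hist(bins=30, figsize=(10, 6))   # one histogram per numeric column
skew = df["rev_per_month"].skew()          # > 0 confirms right-skewness
```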
Key Takeaways:
rev_per_month is right-skewed, which means a few customers account for most revenue.
cashback and Service_Score are similarly right-skewed.
Tenure has a fairly even distribution with a couple of peaks, which indicates different
stages of customer lifecycle.
Insights derived here will inform feature engineering and preprocessing decisions such as
normalization and binning.
Key Takeaways:
"Super" and "Regular Plus" are the most populous segments, meaning they play a
significant part in the customer base.
"Super +" is a niche segment with a small population, perhaps representing premium or
elite-class customers.
Comparison of distributions like these assists in prioritizing segment-wise retention
activity and bespoke intervention strategies.
Key Takeaways:
There is a high positive correlation between Tenure and rev_per_month, indicating that long-
term customers spend more.
Service_Score is negatively correlated with Churn, which means that better serviced customers
are less likely to churn.
cashback and rev_growth_yoy are also moderately correlated, suggesting that reward policies
could impact revenue growth.
This step assists in determining which features can be given priority or observed more
intensively in predictive modeling.
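The correlation step can be sketched as follows; the synthetic columns only mimic the relationships described above (e.g., Tenure driving rev_per_month):

```python
# Correlation matrix and heatmap over synthetic columns mimicking the
# relationships described in the takeaways.
import matplotlib
matplotlib.use("Agg")  # render plots off-screen
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
tenure = rng.integers(1, 61, size=500).astype(float)
df = pd.DataFrame({
    "Tenure": tenure,
    "rev_per_month": 2.0 * tenure + rng.normal(scale=5.0, size=500),
    "Service_Score": rng.integers(1, 6, size=500).astype(float),
})

corr = df.corr()                                  # pairwise correlations
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```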
Step 5: Outlier Detection
Detection of outliers is critical to determine extreme values that may skew
statistical summaries and affect model performance. For customer data, outliers
tend to be either data quality problems or truly high-value customers.
Boxplots were created for quantitative columns like rev_per_month and cashback
to see the spread and pinpoint values lying well beyond the interquartile range
(IQR). Such plots aid in determining whether to keep, truncate, or convert outlier
values.
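A sketch of the IQR rule a boxplot visualizes, applied to an illustrative rev_per_month series:

```python
# Interquartile-range (IQR) rule on an illustrative revenue series.
import pandas as pd

rev = pd.Series([90, 100, 105, 110, 115, 120, 125, 130, 900])  # 900 is extreme

q1, q3 = rev.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr     # boxplot whisker bounds
outliers = rev[(rev < lower) | (rev > upper)]      # values a boxplot flags
```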
Key Takeaways:
A large number of outliers in rev_per_month were observed; these correspond to
high-spending customers.
Likewise, cashback values had a broad range, with some very extreme cases.
These were kept because they provide useful information on the behavior of the high-
value customer segments that are central to this retention-focused study.
Key Takeaways:
Most customers fall into the Medium spender category according to revenue ranges.
The feature CC_Contact_Recency indicates that recently engaged customers (those who
contacted customer care within the last 1–3 months) have lower churn, which underscores
the importance of prompt support and communication.
All these engineered features provide greater granularity in the customer profile for more
effective targeting in retention practices.
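The engineered features discussed above can be sketched as follows; the bin edges and the 3-month recency threshold are illustrative assumptions:

```python
# Derived features with illustrative thresholds (bin edges and the
# 3-month recency cut-off are assumptions, not from the study).
import pandas as pd

df = pd.DataFrame({
    "rev_per_month": [40, 120, 600, 95, 310],
    "CC_Contact_Recency": [1, 8, 2, 12, 3],  # months since last contact
})

df["spender_category"] = pd.cut(
    df["rev_per_month"],
    bins=[0, 100, 400, float("inf")],
    labels=["Low", "Medium", "High"],
)
df["recent_contact"] = (df["CC_Contact_Recency"] <= 3).astype(int)
```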
Key Takeaways:
The total churn rate in the data is roughly 16.83%, implying that although there is a
general retention of most customers, there is a considerable percentage at risk.
This class imbalance implies a requirement for such methods as class weighting or
SMOTE (Synthetic Minority Over-sampling Technique) to preserve balanced model
training.
The churn rate can inform business priorities: focusing on this roughly 17% and engaging
them with personalized strategies may greatly improve retention.
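One of the remedies named above, class weighting, can be sketched directly in the baseline classifier (SMOTE, available via the imbalanced-learn package, is the resampling alternative); the data here is synthetic with roughly the observed 17% churn rate:

```python
# Class weighting in the baseline classifier; data is synthetic with a
# churn rate near the observed ~17%.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 3))
y = (rng.random(1000) < 0.17).astype(int)   # roughly 17% churners

churn_rate = y.mean()
# class_weight="balanced" reweights classes inversely to their frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```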
CONCLUSION
The exploratory data analysis process provided a thorough understanding of the customer
dataset and yielded valuable insights into customer behavior, segment distribution, churn
patterns, and possible predictive indicators. Important findings such as high revenue
variability, segment dominance, the presence of outliers, and the relationship between
service scores and churn are key inputs for future modeling.
The structured analysis not only revealed patterns but also informed the creation of new
features that will improve prediction accuracy in subsequent phases. This foundation allows
the following steps, predictive modeling and strategy optimization, to be built on data-
driven knowledge.