A REPORT on

Iranian Churn Prediction

Submitted to

KIIT Deemed to be University

In Partial Fulfilment of the Requirement for the Award of

MASTER’S DEGREE IN
COMPUTER APPLICATION

Submitted

By
Lakshya Namdeo 23700220
Ishani Banerjee 2370187
Dibya Prakash Dash 2370152

SCHOOL OF COMPUTER APPLICATION

KALINGA INSTITUTE OF INDUSTRIAL TECHNOLOGY

BHUBANESWAR, ODISHA -751024
November 2024
INDEX

SNO. TOPIC

2 Dataset Description

3 Problem Statement

4 Methodology

5 Data Splitting

6 Classification Method

7 Coding Process

8 Performance Analysis

9 Results and Discussion

10 Conclusion

11 Source Code

12 Confusion Matrix

13 AUC-ROC Curve

14 References

2. Dataset Description

Overview

The dataset used in this project is the Iranian telecom churn dataset. It includes
various features that capture customer behaviour and demographics, which can help
predict whether a customer is likely to leave the service (churn) or remain
subscribed. The main aim of the dataset is to identify patterns and correlations that
can be utilised to develop a predictive model.

● Features:
○ Call Failure: Number of call failures experienced by the customer.
○ Complaints: Number of complaints raised by the customer.
○ Subscription Length: Duration of the customer’s subscription in
months.
○ Charge Amount: The total amount charged to the customer.
○ Frequency of Use: Frequency with which the customer uses the
service.
○ Other behavioural and usage metrics.
● Target Variable:
○ Churn: A binary indicator where 0 represents a non-churned customer
and 1 represents a churned customer.
● Class Distribution:
○ Non-Churned (0): 525 instances
○ Churned (1): 105 instances
○ Class Imbalance: There is a significant imbalance in the data, with
more non-churned instances than churned. This imbalance can affect
the model’s performance if not handled properly.
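To make the imbalance concrete, the short sketch below works through the arithmetic using the class counts listed above. It illustrates why accuracy alone can mislead: a hypothetical model that always predicts "non-churned" would already be right about 83% of the time while finding no churners at all. The variable names are illustrative and are not taken from the project's code.

```python
# Class counts as reported in the Class Distribution list above.
non_churned = 525
churned = 105
total = non_churned + churned

churn_rate = churned / total
imbalance_ratio = non_churned / churned

# Accuracy of a trivial baseline that always predicts class 0 (non-churned).
majority_baseline_accuracy = non_churned / total
# Recall of that baseline on the churn class: it never predicts churn.
majority_baseline_recall = 0 / churned

print(f"churn rate: {churn_rate:.3f}")
print(f"imbalance ratio (non-churned : churned): {imbalance_ratio:.1f} : 1")
print(f"always-predict-0 accuracy: {majority_baseline_accuracy:.3f}")
print(f"always-predict-0 churn recall: {majority_baseline_recall:.3f}")
```

This is the reason precision, recall, and F1 are reported alongside accuracy later in the report.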

Summary Statistics and Preprocessing

Each feature underwent preprocessing steps to prepare it for model training:

1. Handling Missing Values: Missing values were either imputed with
mean/median values or removed based on the extent of missing data.
2. Outlier Detection and Treatment: Outliers were detected using statistical
methods such as the Interquartile Range (IQR) and treated to prevent them
from skewing model predictions.
3. Feature Scaling: StandardScaler was used to standardise the features
(zero mean, unit variance) so that features measured on larger scales do not
dominate model training, especially for distance-based algorithms.
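The three steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's code: the helper names (impute_median, cap_outliers_iqr, standardise) and the sample values are hypothetical, median imputation and capping are just one of the options the report mentions, and the 1.5 × IQR fence is the conventional choice rather than one the report confirms. The standardise helper uses the population standard deviation, which is the same formula StandardScaler applies.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def cap_outliers_iqr(values, k=1.5):
    """Clip values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]


def standardise(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


# Made-up "Call Failure" column with one missing value and one extreme value.
raw = [0, 1, 2, None, 3, 3, 4, 5, 40]

imputed = impute_median(raw)
capped = cap_outliers_iqr(imputed)
scaled = standardise(capped)

print("imputed:", imputed)
print("capped: ", capped)
print("scaled: ", [round(v, 2) for v in scaled])
```

In a train/test workflow, the imputation value, the IQR fences, and the scaling statistics would normally be computed on the training split only and then reused on the test split, to avoid leaking test information into training.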

3. Problem Statement

The primary goal of this project is to predict customer churn for a telecom company
in Iran. Churn prediction models help businesses identify customers likely to leave,
enabling them to develop strategies to retain these customers. By accurately
predicting churn, the company can improve customer retention rates, reduce losses,
and increase profitability. Specifically, this project aims to develop a model that can
effectively classify whether a customer will churn based on their usage patterns,
complaints, and demographic data.

4. Methodology

Data Preprocessing

A series of preprocessing steps were carried out to clean and prepare the data:

1. Data Cleaning:
○ Missing values in the dataset were addressed using imputation
(mean/median) for numerical columns and mode for categorical
columns.
○ Categorical features were transformed into numerical values via
one-hot encoding where necessary.
2. Handling Outliers:
○ Outliers were detected through statistical methods (e.g., Z-score, IQR).
○ Outliers that could affect model performance were capped or removed
to ensure they didn’t distort the learning process.
3. Feature Scaling:
○ Standardisation using StandardScaler was applied to scale the
features, especially for models sensitive to feature scales (e.g., logistic
regression).

Model Evaluation Metrics

The following metrics were calculated to evaluate model performance:

● Accuracy: Measures the overall proportion of correct predictions.
● Precision: Indicates how many of the predicted churned customers were
actually churned.
● Recall: Reflects how well the model identifies actual churned customers.
● F1 Score: Balances precision and recall, useful in scenarios with class
imbalance.
● AUC-ROC: Measures the model's ability to distinguish between the churn and
non-churn classes.
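As a worked illustration of how the first four metrics relate to one another, the sketch below computes them from the four confusion-matrix counts. The function name and the counts are hypothetical and chosen only for the arithmetic; they are not results from this project.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for illustration only (churn is the positive class).
metrics = classification_metrics(tp=90, tn=510, fp=15, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts the accuracy is noticeably higher than the precision and recall, which is typical when the negative class dominates.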

5. Data Splitting

The dataset was split into training and testing sets to evaluate the model’s
performance:

● Split Ratio: 80% for training, 20% for testing.
● Random Seed: random_state=12 was set to ensure the split is
reproducible.
● Data Shapes:
○ Training set: (xtrain, ytrain) where xtrain includes features,
and ytrain includes churn labels.
○ Testing set: (xtest, ytest) for model evaluation.

This split allowed the model to learn from the majority of the data while providing a
separate set for unbiased performance evaluation.
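The idea of a seeded 80/20 split can be sketched as follows. The random_state parameter and the xtrain/xtest naming suggest that the project used scikit-learn's train_test_split, but that is an assumption; the sketch below uses only the standard library, so the helper name split_80_20 is hypothetical and the exact rows it selects will differ from what scikit-learn would pick with the same seed. What it shares with the report's setup is the ratio and the reproducibility.

```python
import random


def split_80_20(features, labels, seed=12, test_fraction=0.2):
    """Shuffle row indices with a fixed seed, then hold out a fraction for testing."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    xtrain = [features[i] for i in train_idx]
    xtest = [features[i] for i in test_idx]
    ytrain = [labels[i] for i in train_idx]
    ytest = [labels[i] for i in test_idx]
    return xtrain, xtest, ytrain, ytest


# Made-up data: 50 rows with two features each, roughly 20% positive labels.
features = [[i, i * 2] for i in range(50)]
labels = [1 if i % 5 == 0 else 0 for i in range(50)]

xtrain, xtest, ytrain, ytest = split_80_20(features, labels)
print(len(xtrain), len(xtest))

# Re-running with the same seed reproduces the same partition.
again = split_80_20(features, labels)
print(again == (xtrain, xtest, ytrain, ytest))
```

With an imbalanced target, a stratified split (keeping the churn proportion equal in both sets) is often preferred; the report does not state whether stratification was used.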

6. Classification Method

To tackle the problem, several machine learning algorithms were used:

1. Logistic Regression: A baseline linear model for binary classification. It
assumes a linear relationship between the features and the log odds of the
target class.
2. Decision Tree Classifier: Captures non-linear relationships by recursively
splitting the data into homogenous sets based on feature values.
3. Random Forest Classifier: An ensemble of decision trees that reduces
overfitting and improves generalisation by averaging multiple trees'
predictions.
4. XGBoost Classifier: A powerful gradient-boosted tree algorithm known for
high accuracy, especially in classification problems with structured data.
5. Gradient Boosting Classifier: A boosting algorithm that sequentially builds
weak learners to minimise prediction errors.
6. AdaBoost Classifier: A boosting technique that reweights training instances
after each round so that later learners focus on previously misclassified
cases, which can help with hard-to-classify churn instances.

Each model was evaluated using the model_metrics function, which computed
performance metrics on both training and test sets.
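The "linear in the log odds" assumption behind the logistic regression baseline can be illustrated with a few lines of arithmetic. The coefficients, bias, and feature values below are hypothetical and are not fitted to the project's data; the function name is likewise only illustrative.

```python
import math


def predict_churn_probability(features, weights, bias):
    """Logistic model: the log odds are a weighted sum of the features."""
    log_odds = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients for two standardised features
# (for example, complaints and call failures).
weights = [0.8, 1.5]
bias = -2.0

p_low = predict_churn_probability([0.0, 0.0], weights, bias)
p_high = predict_churn_probability([1.0, 2.0], weights, bias)

print(f"average customer:   p(churn) = {p_low:.3f}")
print(f"high-risk customer: p(churn) = {p_high:.3f}")

# Converting the probability back recovers the linear log odds.
print(f"log odds of high-risk customer: {math.log(p_high / (1 - p_high)):.2f}")
```

The tree-based models in the list do not make this linearity assumption, which is one reason they can capture patterns the baseline misses.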

7. Coding Process

Libraries Used

● Data Handling: pandas, numpy
● Modelling and Metrics: sklearn, xgboost
● Visualisation: matplotlib, seaborn for plotting metrics and performance
graphs

Key Functions and Process

1. Data Loading and Preprocessing: Loaded the dataset and performed initial
data cleaning.
2. Model Training: Each model was trained on the training data (xtrain,
ytrain).
3. Evaluation: The model_metrics function iteratively fitted each model and
computed accuracy, precision, recall, and F1 scores, storing the results in a
DataFrame.
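The report's own model_metrics implementation is not reproduced in the text, so the following is only a sketch of how such a function could be structured. The signature, the score_predictions helper, and the two toy classifiers are assumptions made for illustration; the toy classifiers merely stand in for the scikit-learn and XGBoost estimators, which expose the same fit/predict interface.

```python
class MajorityClassifier:
    """Toy stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdClassifier:
    """Toy stand-in model: predicts churn when the first feature exceeds a cut-off."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def fit(self, X, y):
        return self  # nothing to learn in this toy example

    def predict(self, X):
        return [1 if row[0] > self.cutoff else 0 for row in X]


def score_predictions(y_true, y_pred):
    """Accuracy, precision, recall and F1 with churn (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}


def model_metrics(models, xtrain, ytrain, xtest, ytest):
    """Fit each model, then score it on both the training and the test split."""
    rows = []
    for name, model in models.items():
        model.fit(xtrain, ytrain)
        for split, X, y in (("train", xtrain, ytrain), ("test", xtest, ytest)):
            rows.append({"model": name, "split": split,
                         **score_predictions(y, model.predict(X))})
    return rows


# Made-up data: one feature per customer, churn label 0/1.
xtrain, ytrain = [[0], [1], [2], [5], [6]], [0, 0, 0, 1, 1]
xtest, ytest = [[2], [7]], [0, 1]

results = model_metrics(
    {"majority": MajorityClassifier(), "threshold": ThresholdClassifier(cutoff=3)},
    xtrain, ytrain, xtest, ytest,
)
for row in results:
    print(row)
```

The returned list of dictionaries can be passed directly to pandas.DataFrame to obtain the tabular summary the report describes. Comparing the train and test rows for each model is a quick check for overfitting.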

8. Performance Analysis

Confusion Matrix

The confusion matrix summarises the performance for each model:

● True Positives (TP): Correctly predicted churned customers.
● True Negatives (TN): Correctly predicted non-churned customers.
● False Positives (FP): Incorrectly predicted churned customers.
● False Negatives (FN): Incorrectly predicted non-churned customers.
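The four cells defined above can be counted directly from the true and predicted labels. The sketch below uses made-up labels and a hypothetical helper name; it is meant only to show how each prediction falls into exactly one cell.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN, treating `positive` (churn) as the positive class."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted churn, customer did churn
            else:
                fp += 1  # predicted churn, customer stayed
        else:
            if actual == positive:
                fn += 1  # predicted stay, customer churned
            else:
                tn += 1  # predicted stay, customer stayed
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels for eight customers (1 = churned, 0 = non-churned).
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]

counts = confusion_counts(y_true, y_pred)
print(counts)
```

For churn prediction, the FN cell is usually the costliest one, because it represents customers who leave without the business having been warned.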

For instance, in a sample confusion matrix, RandomForestClassifier achieved the
following:

[Figure: sample confusion matrix for RandomForestClassifier]

AUC-ROC Curve

The ROC curve, shown in the provided image, illustrates the true positive rate
(sensitivity) against the false positive rate for XGBoost (XGB) and Random Forest
Classifier (RFC). Both models achieved an AUC score of 0.983, indicating
excellent discrimination between churn and non-churn classes.
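One way to read an AUC value is as the probability that a randomly chosen churned customer receives a higher predicted score than a randomly chosen non-churned customer. The sketch below computes AUC from that pairwise definition, which is equivalent to the area under the ROC curve. The function name, labels, and scores are made up for illustration and are unrelated to the 0.983 reported above.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (churned, non-churned) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted churn probabilities for four customers.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = auc_from_scores(y_true, scores)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of rows, so it is suitable for illustration rather than for large datasets.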

Performance Summary

9. Results and Discussion

Key Observations and Insights

1. High AUC and ROC Performance:
○ Both the Random Forest and XGBoost models achieved an AUC of
0.983, indicating a strong ability to distinguish between churned and
non-churned customers. Because AUC summarises performance across
all classification thresholds, a value this high suggests that a
threshold can be chosen that keeps both false positives and false
negatives low, making these models reliable for churn prediction.
2. Accuracy and Recall Balance:
○ While accuracy is an important metric, recall is critical in churn
prediction to ensure we capture as many actual churn cases as
possible. The ensemble models (Random Forest and XGBoost)
achieved a good balance between accuracy and recall, which is
beneficial for business applications where failing to identify churned
customers can lead to revenue loss.
3. Impact of Ensemble Techniques:
○ Random Forest and XGBoost both utilise ensemble techniques,
which combine multiple decision trees to improve robustness and
reduce overfitting. The superior performance of these models over
single estimators like Decision Trees indicates that ensemble methods
are particularly effective for this dataset, which may contain complex
patterns that are better captured through an ensemble approach.
4. Model Stability:
○ The ensemble models also showed greater stability across multiple
runs, with consistent results in terms of performance metrics (accuracy,
precision, recall). This stability is essential for real-world applications
where the model might be deployed and updated periodically. Stability
reduces the need for constant retraining and fine-tuning, lowering
maintenance costs.
5. Performance of Boosting Algorithms:
○ Gradient Boosting and AdaBoost performed well but were slightly
less effective than XGBoost. XGBoost is a regularised and heavily
optimised implementation of gradient boosting, which may explain its
higher scores. This indicates that while boosting methods in general
are suitable, XGBoost may be preferable when working with complex
datasets.
6. Effect of Class Imbalance:
○ Although the dataset was imbalanced, the models still achieved high
recall rates for the minority class (churned customers). In a real-world
setting, additional techniques such as SMOTE (Synthetic Minority
Over-sampling Technique) could be used to further balance the
dataset, possibly enhancing recall without compromising precision.
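The core idea behind SMOTE, mentioned in the last observation, is to create synthetic minority-class points by interpolating between an existing minority point and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea using only the standard library; it is not the implementation found in any particular package, and the function name, parameters, and data points are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest other minority points, by Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up churned customers described by two standardised features.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]

new_points = smote_like_oversample(minority, n_new=6)
for point in new_points:
    print([round(v, 3) for v in point])
```

Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the real class distribution.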

10. Conclusion

Summary

The Iranian Churn Prediction Project achieved its objective of developing a robust
model to predict customer churn with high accuracy and recall. The analysis shows
that ensemble methods, particularly Random Forest and XGBoost, are highly
effective for this dataset, likely due to their ability to capture complex patterns and
interactions among features. These models are not only accurate but also stable and
reliable, making them well-suited for deployment in a real-world telecom business
context.

In conclusion, the churn prediction model developed in this project has significant
potential to assist the telecom company in minimising customer attrition. The model’s
high performance, particularly in AUC and recall, demonstrates its ability to serve as
a reliable tool for churn prediction. By acting on these predictions, the company can
implement targeted retention strategies, enhancing customer satisfaction and
ultimately supporting sustainable business growth.

SOURCE CODE

1) IMPORTING DATASETS AND LIBRARIES

2) EXTRACTING DATASET FEATURES

3) MODEL BUILDING

4) GETTING METRICS FOR THE MODEL

5) CALCULATING ACCURACY, PRECISION, RECALL

6) BUILDING CONFUSION MATRIX

7) DETERMINING CHURN VALUE

8) PLOTTING THE ROC-AUC CURVE

REFERENCES

1) UCI machine learning dataset repository

2) Dataset Source: Iranian Telecom Dataset
3) Documentation and tutorials from the Scikit-Learn library and XGBoost library
for model implementation.
