Phase 3

The project focuses on predicting customer churn in the telecom industry using machine learning techniques. It employs a structured dataset to build models like Logistic Regression, Random Forest, and XGBoost, with XGBoost yielding the highest accuracy of 86%. The project aims to provide actionable insights for customer retention through data analysis and model interpretability using SHAP values.

Uploaded by

mdnafeed29


Github Link: https://github.com/nandhu345-coder/phase_3.git

Project Title: Predicting Customer Churn Using Machine Learning to Uncover Hidden Patterns

PHASE-3

1. Problem Statement

Customer churn occurs when clients stop doing business with a company. In highly
competitive industries, understanding why customers churn is crucial for retaining them. This
project aims to build a machine learning model that accurately classifies whether a customer is
likely to churn, using behavioral and demographic data from a structured dataset. Accurate
churn prediction lets businesses act proactively to retain customers and reduce
revenue loss.

2. Abstract

This project applies machine learning to the problem of customer churn prediction using real-
world telecom data. The dataset includes customer demographics, subscription details, billing
patterns, and service usage. After rigorous preprocessing and analysis, we trained three models—
Logistic Regression, Random Forest, and XGBoost—with XGBoost achieving the highest
accuracy (86%) and F1-score (0.82). The model's predictions were interpreted using SHAP
values for transparency. This system enables telecom companies to identify and retain at-risk
customers effectively, resulting in better business performance.

3. System Requirements
● Hardware:
  ○ Minimum 4 GB RAM (8 GB recommended)
  ○ Standard processor (Intel i3/i5 or AMD equivalent)
● Software:
  ○ Python 3.10+
  ○ Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn, xgboost, shap, plotly
  ○ IDE: Google Colab / Jupyter Notebook

4. Objectives

● Build a robust classification model for predicting customer churn.

● Identify the most important features contributing to churn.

● Provide actionable insights using visualizations and SHAP values.

● Achieve high model performance using advanced ensemble techniques.

● Make the system interpretable and usable for non-technical business teams.

5. Flowchart of the Project Workflow

1. Data Collection (Kaggle/IBM dataset)
2. Data Preprocessing (Cleaning, Encoding, Scaling)
3. EDA (Exploring patterns and key drivers of churn)
4. Feature Engineering (new features + selection + PCA)
5. Model Building (Logistic Regression, Random Forest, XGBoost)
6. Model Evaluation (Confusion Matrix, ROC, F1-Score)
7. Interpretation (SHAP Values)
8. Reporting & Visualization

6. Dataset Description

● Source: Kaggle / IBM Sample Dataset
● Type: Structured CSV
● Records: 7,000+ rows
● Features: Customer demographics, services, billing details
● Target Variable: Churn (Yes/No)
● Nature: Structured tabular data

7. Data Preprocessing
● Missing values handled via imputation
● Duplicate entries removed
● Outliers capped using the IQR technique
● Label Encoding and One-Hot Encoding for categorical features
● MinMaxScaler for normalizing numeric data (the sample code in Section 13 uses StandardScaler instead)
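
The preprocessing steps listed above can be sketched on a tiny, made-up table. The column values below are illustrative assumptions, not the project's data, and the 1.5×IQR fence is the conventional choice rather than one the report states.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical mini-dataset; values are invented for illustration only.
df = pd.DataFrame({
    "tenure": [1, 12, 24, 36, 48, 60],
    "MonthlyCharges": [20.0, 35.0, 50.0, 65.0, 80.0, 500.0],  # 500 is an outlier
    "Contract": ["Month-to-month", "One year", "Two year",
                 "Month-to-month", "One year", "Two year"],
})


def cap_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Cap the outlier instead of dropping the row.
df["MonthlyCharges"] = cap_iqr(df["MonthlyCharges"])

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["Contract"], dtype=int)

# Rescale the numeric columns to the [0, 1] range.
numeric_cols = ["tenure", "MonthlyCharges"]
encoded[numeric_cols] = MinMaxScaler().fit_transform(encoded[numeric_cols])

print(encoded.round(3))
```

Capping keeps every customer record while limiting the influence of extreme bills. In a real workflow the scaler would be fitted on the training split only.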

8. Exploratory Data Analysis (EDA)

● Contract type, tenure, and monthly charges had strong correlations with churn
● Visualizations: Histograms, Boxplots, Correlation Heatmaps
● Insights: Customers with short contracts and high bills churn more; fiber internet users
show higher churn probability
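
One simple way to check an insight such as "short contracts churn more" is to compute the churn rate per contract type. The sketch below uses a handful of invented records, so the rates it prints are not the project's findings.

```python
import pandas as pd

# Hypothetical records; invented so the arithmetic is easy to follow.
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 3 + ["Two year"] * 3,
    "Churn": ["Yes", "Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "No"],
})

# Share of churned customers within each contract type, highest first.
churn_rate = (
    df.assign(churned=df["Churn"].eq("Yes"))
      .groupby("Contract")["churned"]
      .mean()
      .sort_values(ascending=False)
)
print(churn_rate)
```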

9. Feature Engineering
- Created new features: Total Services Used, Engagement Level
- Interaction terms: e.g., contract type × charges
- Feature selection via SelectKBest
- PCA for dimensionality reduction (applied with care, since principal components are harder to interpret than the original features)
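
The interaction-term and SelectKBest steps above might look like the following sketch. The data is synthetic, the column names `is_monthly`, `noise`, and `monthly_x_charges` are hypothetical, and the choice of `f_classif` and `k=3` is an assumption; the report does not state which scoring function or `k` was used.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(42)
n = 200

# Synthetic features; "noise" is deliberately unrelated to the target.
df = pd.DataFrame({
    "tenure": rng.integers(1, 73, n),
    "MonthlyCharges": rng.uniform(20, 120, n).round(2),
    "is_monthly": rng.integers(0, 2, n),  # 1 = month-to-month contract
    "noise": rng.normal(size=n),
})

# Toy target: short-tenure customers are labelled as churners.
y = (df["tenure"] < 24).astype(int)

# Interaction term: contract type x charges.
df["monthly_x_charges"] = df["is_monthly"] * df["MonthlyCharges"]

# Keep the k features with the highest ANOVA F-score against the target.
selector = SelectKBest(score_func=f_classif, k=3).fit(df, y)
selected = df.columns[selector.get_support()].tolist()
print("Selected features:", selected)
```

Because the toy target is derived from `tenure`, that column is always among the selected features; on real data the ranking would have to be checked against domain knowledge.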

10. Model Building
- Models: Logistic Regression, Random Forest, XGBoost
- Train-test split: 80-20, using train_test_split(random_state=42)
- Best model: XGBoost
- Accuracy: 86%
- F1-Score: 0.82
- AUC: 0.88
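
A comparison of several classifiers on the same 80-20 split could be sketched as below. This is not the project's notebook: the data comes from `make_classification` with an arbitrary class imbalance, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs no extra dependency. The scores it prints say nothing about the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary classification data (assumption, not telecom data).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.73, 0.27], random_state=42,
)

# 80-20 split, stratified so both splits keep the same churn ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=42),
    # Stand-in for XGBoost; swap in xgboost.XGBClassifier if it is installed.
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_proba),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```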

11. Model Evaluation

- Confusion Matrix: XGBoost showed improved precision and recall over the other models
- ROC Curve: XGBoost had the best AUC
- SHAP Analysis: The top features influencing churn were Contract Type, Tenure, and Monthly Charges
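
For readers less familiar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 follow from the four cells of a confusion matrix. The counts are hypothetical and chosen for easy arithmetic; they are not the project's results.

```python
# Hypothetical confusion-matrix counts (not the project's results).
tp, fp, fn, tn = 40, 10, 20, 130

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # predicted churners who actually churned
recall = tp / (tp + fn)                     # actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.850 precision=0.800 recall=0.667 f1=0.727
```

With imbalanced churn data, accuracy alone can look good while recall stays low, which is why F1 and AUC are reported alongside it.
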
12. Deployment
- Model is ready for deployment via Flask/Streamlit (pending UI integration)
- SHAP plots embedded for model interpretability
- Notebook available on GitHub with end-to-end code and visualizations

13. Source Code

# Phase-2: Predicting Customer Churn using Machine Learning
# Author: Mohammed Aasif

# Step 1: Import Libraries
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

warnings.filterwarnings('ignore')

# Step 2: Create a Larger Sample Dataset with equal-length columns
np.random.seed(42)  # make the random columns reproducible
n_samples = 20      # must be even: several columns repeat a two-value pattern

# Cycle contract values safely
contract_values = ['Month-to-month', 'One year', 'Two year']
contract_column = [contract_values[i % 3] for i in range(n_samples)]

# Note: Churn alternates in lockstep with gender, Partner, etc., so this toy
# data is trivially separable and the scores below are not meaningful.
data = {
    'customerID': [f'{i:03}' for i in range(1, n_samples + 1)],
    'gender': ['Female', 'Male'] * (n_samples // 2),
    'SeniorCitizen': [0, 1] * (n_samples // 2),
    'Partner': ['Yes', 'No'] * (n_samples // 2),
    'Dependents': ['No', 'Yes'] * (n_samples // 2),
    'tenure': np.random.randint(1, 72, n_samples),
    'PhoneService': ['Yes', 'No'] * (n_samples // 2),
    'InternetService': ['DSL', 'Fiber optic'] * (n_samples // 2),
    'Contract': contract_column,
    'MonthlyCharges': np.round(np.random.uniform(20, 120, n_samples), 2),
    'TotalCharges': np.round(np.random.uniform(100, 5000, n_samples), 2),
    'Churn': ['No', 'Yes'] * (n_samples // 2)
}
df = pd.DataFrame(data)

# Step 3: Preprocessing (encode each categorical column as integers)
label_cols = ['gender', 'Partner', 'Dependents', 'PhoneService',
              'InternetService', 'Contract', 'Churn']
for col in label_cols:
    df[col] = LabelEncoder().fit_transform(df[col])

# Step 4: Feature Engineering
df['TotalServicesUsed'] = df['PhoneService'] + df['InternetService']
df['EngagementScore'] = df['Contract'] * df['tenure']

# Step 5: Feature Selection
X = df.drop(['customerID', 'Churn'], axis=1)
y = df['Churn']

# Step 6: Train-Test Split (stratified so both classes appear in each split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Step 7: Scaling (fit on the training split only to avoid leaking test data)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step 8: Model Training
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_proba = model.predict_proba(X_test)[:, 1]  # probability of the churn class

# Step 9: Evaluation
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Confusion Matrix
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# ROC Curve
fpr, tpr, _ = roc_curve(y_test, y_proba)
plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, y_proba):.2f}")
plt.plot([0, 1], [0, 1], 'k--')  # diagonal = random-guess baseline
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.grid()
plt.show()

14. Future Scope

- Integrate model into a customer relationship management (CRM) dashboard
- Expand dataset across multiple telecom operators for generalizability
- Real-time prediction system with alerts for retention teams
- Deploy as a chatbot-based churn predictor for customer support agents

15. Team Members and Roles

NAME | ROLE | RESPONSIBILITY
PRIYADHARSHINI R | Lead | Oversee project development, coordinate team activities, ensure timely delivery of milestones, and contribute to documentation and final ...
NANDHITHA M | Data Engineer | Collect data from APIs (e.g., Twitter), manage dataset storage, clean and preprocess text data, and ensure quality of input data
Varshini.S, Vaishnavi.A | NLP Specialist / Data ... | Build sentiment and emotion classification models, perform feature engineering, and evaluate model performance using suitable metrics
Sonika.R | Data Analyst / Visualization | Conduct exploratory data analysis (EDA), generate insights, and develop visualizations such as word clouds, emotion trends, and sentiment ...
