
Phase-2 Ibrahim

The document outlines a project focused on predicting customer churn in the telecommunications industry using machine learning techniques. It details the problem statement, project objectives, data description, preprocessing steps, exploratory data analysis, model building, and team contributions. The project aims to develop a classification model to identify churn likelihood and improve customer retention strategies.

Uploaded by

azhanmohammed04


Phase-2 Submission Template

Student Name: MOHAMMED IBRAHIM

Register Number: 510623104058

Institution: C ABDUL HAKEEM COLLEGE OF ENGINEERING & TECHNOLOGY

Department: COMPUTER SCIENCE & ENGINEERING

Date of Submission: 08-05-2025

GitHub Repository Link: https://github.com/ibrahim5826/customer-support-chatbot.git

1. Problem Statement

Customer churn is a critical metric for businesses, especially in highly competitive
industries like telecommunications. Churn refers to the rate at which customers stop
doing business with a company. Our project aims to predict whether a customer is
likely to churn based on their service usage patterns, contract details, and
demographics.
• Refinement from Phase-1:
Initially, the problem was understood in broad terms. After analyzing the
dataset, we realized that several categorical variables (e.g., Contract,
PaymentMethod, InternetService) significantly influence churn behavior.
Therefore, we narrowed our focus to predicting churn using a classification
model.
• Problem Type: Binary Classification. The target variable (Churn) has two
possible outcomes: "Yes" or "No".
• Why it matters:
Accurately predicting churn can help businesses reduce customer loss by
targeting retention strategies, leading to improved customer satisfaction and
revenue stability.

2. Project Objectives
• Build machine learning models that can classify whether a customer will
churn.

• Achieve high model accuracy and balanced precision-recall to handle class
imbalance.

• Identify the most influential features contributing to churn.

• Make model outputs interpretable for business stakeholders.

Updated Objective: After EDA, we included feature importance analysis and
customer profiling to better understand churn causes.

3. Flowchart of the Project Workflow

Data Collection

Data Cleaning and Processing

Exploratory Data Analysis(EDA)

Feature Engineering

Model Building and Evaluation

Result Interpretation & Visualization

Documentation & Deployment Plan

4. Data Description

• Dataset Name: Telco Customer Churn Dataset

• Source: Kaggle - IBM Sample Dataset

• Type of Data: Structured tabular data

• Number of Records: 7,043 rows

• Number of Features: 21 columns (including customerID and the target Churn)

• Static/Dynamic: Static snapshot

• Target Variable: Churn (Yes/No)

5. Data Preprocessing

• Missing Values:

o Column TotalCharges had 11 missing values due to blank entries. These were
imputed using the median value of the column.

• Duplicate Records:

o Checked using df.duplicated().sum() → Result: 0 duplicates.

• Outliers:

o Outliers in MonthlyCharges and TotalCharges were identified using boxplots.
Handled using winsorization for extreme cases.

• Data Type Conversion:

o TotalCharges was originally an object type. Converted to float using
pd.to_numeric().

• Categorical Encoding:

o Binary columns (e.g., gender, Partner) were label encoded.

o Multi-category columns (e.g., PaymentMethod, InternetService) were one-hot
encoded.

• Feature Scaling:

o Numerical features (tenure, MonthlyCharges, TotalCharges) were standardized
using StandardScaler.
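
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the six-row DataFrame is made-up stand-in data, the 1st/99th percentile limits used for winsorization are an assumption (the report does not state the limits), and only a few of the dataset's columns are shown.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny synthetic stand-in for the Telco data (all values are made up).
df = pd.DataFrame({
    "gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Partner": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "PaymentMethod": ["Electronic check", "Mailed check", "Credit card",
                      "Electronic check", "Bank transfer", "Mailed check"],
    "tenure": [1, 34, 2, 45, 0, 72],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 52.55, 118.75],
    "TotalCharges": ["29.85", "1889.5", "108.15", "1840.75", " ", "8684.8"],
})

# Data type conversion: blank strings become NaN when coerced to float.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
n_missing = int(df["TotalCharges"].isna().sum())

# Missing values: impute with the column median.
df["TotalCharges"] = df["TotalCharges"].fillna(df["TotalCharges"].median())

# Duplicate records: count fully duplicated rows.
n_duplicates = int(df.duplicated().sum())

# Outliers: winsorize by clipping to the 1st/99th percentiles (assumed limits).
for col in ["MonthlyCharges", "TotalCharges"]:
    low, high = df[col].quantile([0.01, 0.99])
    df[col] = df[col].clip(lower=low, upper=high)

# Categorical encoding: 0/1 labels for binary columns, one-hot for the rest.
df["gender"] = (df["gender"] == "Male").astype(int)
df["Partner"] = (df["Partner"] == "Yes").astype(int)
df = pd.get_dummies(df, columns=["PaymentMethod"], dtype=int)

# Feature scaling: standardize numeric columns to zero mean, unit variance.
num_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
scaled = StandardScaler().fit_transform(df[num_cols])
for i, col in enumerate(num_cols):
    df[col] = scaled[:, i]

print(n_missing, n_duplicates)
print(df.head())
```

In a full pipeline the scaler would usually be fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into training.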
6. Exploratory Data Analysis (EDA)

• Univariate Analysis:

o Churn: 26.5% of customers churned (imbalanced target)

o Contract: Most churn occurs in month-to-month contracts

o Visuals used: Histograms, boxplots, countplots

• Bivariate/Multivariate Analysis:

o Correlation matrix: Showed strong correlation of both tenure and
MonthlyCharges with churn

o Pairplots and groupby plots:

▪ Customers with fiber optic internet churn more often

▪ Customers using electronic checks are more likely to churn

• Insights Summary:

o tenure is inversely related to churn

o Longer contract types (1-year or 2-year) have lower churn rates

o Services like tech support and online backup seem to retain customers
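
A group-wise churn rate of the kind described above can be computed with a pandas groupby. The sketch below uses a small made-up DataFrame, so the printed rates are illustrative only and are not the project's findings.

```python
import pandas as pd

# Made-up rows standing in for the Contract and Churn columns.
df = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 2 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No"],
})

# Overall churn rate: share of rows where Churn is "Yes".
overall_rate = (df["Churn"] == "Yes").mean()

# Churn rate per contract type, highest first.
churn_by_contract = (
    df.assign(churned=df["Churn"].eq("Yes"))
      .groupby("Contract")["churned"]
      .mean()
      .sort_values(ascending=False)
)

print(f"Overall churn rate: {overall_rate:.1%}")
print(churn_by_contract)
```

The same pattern applies to other categorical columns such as InternetService or PaymentMethod by changing the groupby key.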

7. Feature Engineering
New Features Created:
• TenureGroup: Categorized tenure into "0–12", "12–24", etc.

• HasMultipleServices: Combined multiple service features to count total
services per customer

Transformed Features:

• Created interaction terms like MonthlyCharges * tenure
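
The engineered features listed above might be built along these lines. This is a sketch on made-up rows: the bin edges beyond "0-12" and "12-24", the three service columns chosen, and the column name MonthlyCharges_x_tenure are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; only a few of the dataset's service columns are shown.
df = pd.DataFrame({
    "tenure": [1, 12, 13, 30, 72],
    "MonthlyCharges": [30.0, 50.0, 80.0, 20.0, 100.0],
    "OnlineBackup": ["Yes", "No", "No internet service", "Yes", "Yes"],
    "TechSupport": ["No", "No", "No internet service", "Yes", "Yes"],
    "StreamingTV": ["Yes", "No", "No internet service", "No", "Yes"],
})

# TenureGroup: bucket tenure (months) into labelled ranges (assumed edges).
bins = [0, 12, 24, 48, 72]
labels = ["0-12", "12-24", "24-48", "48-72"]
df["TenureGroup"] = pd.cut(df["tenure"], bins=bins, labels=labels,
                           include_lowest=True)

# HasMultipleServices: the report describes this as a count of services,
# so it is computed here as the number of "Yes" values per customer.
service_cols = ["OnlineBackup", "TechSupport", "StreamingTV"]
df["HasMultipleServices"] = (df[service_cols] == "Yes").sum(axis=1)

# Interaction term between monthly charges and tenure.
df["MonthlyCharges_x_tenure"] = df["MonthlyCharges"] * df["tenure"]

print(df[["tenure", "TenureGroup", "HasMultipleServices",
          "MonthlyCharges_x_tenure"]])
```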

Dimensionality Reduction:

• PCA was attempted but didn't improve model performance significantly, so
not retained in final model.

Feature Selection:

• Used Recursive Feature Elimination (RFE) to choose top 10 important
features
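
Selecting 10 features with RFE could look like the sketch below. The report does not say which estimator was wrapped by RFE, so logistic regression is an assumption here, and the data is synthetic (generated with make_classification) rather than the Telco dataset.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 25 candidate features, 8 of them informative.
X, y = make_classification(n_samples=500, n_features=25, n_informative=8,
                           random_state=42)

# RFE repeatedly fits the estimator and drops the weakest feature
# until only n_features_to_select remain.
selector = RFE(estimator=LogisticRegression(max_iter=1000),
               n_features_to_select=10)
selector.fit(X, y)

# Indices of the retained features (support_ is a boolean mask).
selected = [i for i, keep in enumerate(selector.support_) if keep]
print("Selected feature indices:", selected)
```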

8. Model Building

• Train/Test Split:
o 80% training, 20% testing; stratified on target to maintain class
balance

• Models Implemented:
o Logistic Regression: Baseline model
o Random Forest Classifier: Non-linear model for improved
performance
• Evaluation Metrics:
o Accuracy, Precision, Recall, F1-Score, ROC AUC
Model                 Accuracy   Precision   Recall   F1-Score   ROC AUC

Logistic Regression   0.80       0.71        0.67     0.69       0.83

Random Forest         0.86       0.78        0.76     0.77       0.89
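
The split, the two models, and the five metrics above can be reproduced in outline as follows. This is a sketch on synthetic data with roughly the same class imbalance (about 26.5% positives); the hyperparameters and random seeds are assumptions, and the printed scores will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the prepared features.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.735, 0.265], random_state=42)

# 80/20 split, stratified so both splits keep the same churn proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "roc_auc": roc_auc_score(y_test, y_prob),
    }
    print(name, {k: round(v, 2) for k, v in results[name].items()})
```

Note that ROC AUC is computed from predicted probabilities, while the other four metrics use the hard class predictions at the default 0.5 threshold.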

9. Visualization of Results & Model Insights

• Confusion Matrix: Showed reduction in false positives for Random Forest

• ROC Curve: Random Forest had an AUC of 0.89 indicating strong
performance

• Feature Importance Plot: Contract, tenure, and PaymentMethod were most
influential

• Churn Profile Visuals: Created churn heatmaps by contract and tenure
group
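
The numbers behind a confusion matrix and a feature importance plot can be obtained as sketched below. The data is synthetic and the feature names (feature_0, feature_1, ...) are placeholders, so the ranking shown says nothing about Contract, tenure, or PaymentMethod.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data with placeholder feature names.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=3,
                           weights=[0.735, 0.265], random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Confusion matrix counts: rows are true classes, columns are predictions.
tn, fp, fn, tp = confusion_matrix(y_test, clf.predict(X_test),
                                  labels=[0, 1]).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Impurity-based importances, sorted from most to least influential.
importances = sorted(zip(feature_names, clf.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

With one-hot encoded inputs, importances are reported per dummy column, so the scores of a categorical feature's dummy columns would need to be summed to rank the original feature.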

10. Tools and Technologies Used

Programming Language: Python

Notebook Environment: Google Colab

Libraries Used:

• pandas, numpy – Data processing

• matplotlib, seaborn, plotly – Visualization

• scikit-learn – Modeling and evaluation

• xgboost (optional experiment)

11. Team Members and Contributions

Mohammed Musaddiq. M [510623104059] - Project Lead & Problem Definition

Responsible for defining the problem statement, coordinating tasks, and
ensuring the project follows the timeline. Oversees final documentation and
submission.

Mohammed Ammar Saqib [510623104055] - Data Collection & Cleaning

Gathers relevant datasets from public sources or generates synthetic data.
Handles data preprocessing (cleaning, formatting, normalization).

Ghani Adnan Faiz [510623104005] - Exploratory Data Analysis (EDA)

Analyzes data to uncover patterns and insights. Creates visualizations using
matplotlib/seaborn/plotly.

Fateh Mohammed [510623104024] - Feature Engineering & Model Building

Designs features, selects and trains models (e.g., intent classifiers, response
generators). Chooses appropriate NLP techniques.

Abraar. A [510623104003] - Model Evaluation & Interpretation

Evaluates model performance using metrics (accuracy, F1 score, etc.).
Prepares interpretation reports and validation results.

Mohammed Ibrahim [510623104058] - Deployment & Frontend Integration

Builds and deploys the chatbot using Streamlit/Gradio/Flask. Handles web
interface design and chatbot testing.
