Phase 3
Prepared by: Shouq Alrahma 202102114, Mariam Alhammadi 202118366, Maitha Alateeqi 202117547
The first few rows of the telecom data, shown with `head`, hint at
several factors that may be related to customer churn. The customers
with a short tenure (1-2 months) have a churn status of "Yes", while a
customer who has been with the company for 45 months has not churned.
The month-to-month customers in these rows also churn more often than
the customers on one-year contracts. Payment method may matter as well:
three customers who paid by Electronic check have left, which suggests a
possible link between this payment method and churn. Customers without
extra services such as Technical Support, Device Protection, and
Streaming TV also appear more often among those who cancel, so bundling
services might improve retention. A handful of rows cannot establish
these patterns on its own, but it points to tenure, contract type,
payment method, and extra services as the variables to examine when
modeling customer churn.
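The patterns read from the first rows can be checked against every row with simple cross-tabulations. The sketch below is only illustrative: the small `telecom` data frame is a made-up stand-in (the real file and its loading step are not shown in this report), and the column names `tenure`, `Contract`, and `Churn` are assumed.

```r
# Hypothetical stand-in for the telecom data (values invented for illustration).
telecom <- data.frame(
  tenure   = c(1, 2, 45, 2, 8, 34, 60, 1),
  Contract = c("Month-to-month", "Month-to-month", "One year", "Month-to-month",
               "Month-to-month", "One year", "Two year", "Month-to-month"),
  Churn    = c("Yes", "Yes", "No", "Yes", "No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# First rows, as inspected in the report
head(telecom)

# Share of "No"/"Yes" within each contract type (each row sums to 1)
contract_rates <- prop.table(table(telecom$Contract, telecom$Churn), margin = 1)
print(round(contract_rates, 2))

# Median tenure for churned vs. retained customers
tenure_by_churn <- tapply(telecom$tenure, telecom$Churn, median)
print(tenure_by_churn)
```

Run on the full data set, the same two summaries would show whether the differences seen in the first rows hold in general.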
In this step, we prepared the data for analysis by converting the target
variable Churn into a factor, which is essential for classification tasks
in R. The Churn column initially held the strings "Yes" and "No"; as a
factor, it is treated as a categorical variable with two levels, so the
classification model can distinguish customers who have left ("Yes")
from those who have stayed ("No"). Setting Churn as a factor ensures
that the model handles it as a binary outcome rather than as free text,
which keeps the predictive analysis correct and interpretable.
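A minimal sketch of this conversion, assuming the data frame is called `telecom` (a hypothetical name) and using a few made-up values:

```r
# Hypothetical miniature of the data: Churn stored as character strings
telecom <- data.frame(
  Churn = c("Yes", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)
was_character <- is.character(telecom$Churn)

# Convert to a factor; listing the levels explicitly fixes their order,
# so "No" is the first (reference) level and "Yes" the second.
telecom$Churn <- factor(telecom$Churn, levels = c("No", "Yes"))

str(telecom$Churn)     # structure of the converted column
table(telecom$Churn)   # counts per class
```

Stating the levels explicitly is a design choice rather than a requirement; without it, R orders the levels alphabetically, which happens to give the same result here.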
In this phase, we proceed with Model Planning and Building by first
dividing our data into training and testing sets, which is essential for
evaluating the model's performance. We set a random seed
(set.seed(123)) to ensure reproducibility, so that the split is the same
each time the code runs. Using a 70-30 split ratio, we assign 70% of the
data to the training set (used to build the model) and the remaining 30%
to the testing set (used to evaluate the model's accuracy). Because the
model never sees the testing set during training, its accuracy there
gives a more reliable measure of how it will perform on unseen data.
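The split can be sketched as follows. The report states only the seed and the 70-30 ratio, so the use of `sample()` on row indices is an assumption, as are the names `telecom`, `train_data`, and `test_data` and the made-up 1,000-row data frame.

```r
# Hypothetical stand-in data: 1,000 customers with an id and a churn label
n <- 1000
telecom <- data.frame(
  id    = seq_len(n),
  Churn = factor(rep(c("No", "No", "No", "Yes"), length.out = n),
                 levels = c("No", "Yes"))
)

# Fix the seed so the same rows are drawn on every run
set.seed(123)

# Draw 70% of the row indices at random, without replacement
train_idx <- sample(seq_len(nrow(telecom)), size = floor(0.7 * nrow(telecom)))

train_data <- telecom[train_idx, ]    # used to build the model
test_data  <- telecom[-train_idx, ]   # held out for evaluation

nrow(train_data)
nrow(test_data)
```

A plain random draw like this does not guarantee identical churn rates in the two sets; a stratified split would be one alternative if that matters.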
In this step, we constructed a Decision Tree Model for classification to
predict customer churn based on various factors. Using
the rpart function, the model relates the target
variable (Churn) to 19 independent variables such
as gender, tenure, contract type, payment method, and additional
services like Tech Support and Streaming TV. The output shows the tree
structure starting with 4,930 training observations at the root node,
of which 1,308 customers have a churn status of "Yes" and the rest
"No." The model splits on key factors like contract type, tenure,
and Internet Service, progressively narrowing down groups of customers
to make more accurate predictions. For example, customers with
a month-to-month contract have a higher churn probability, while those
with longer contracts (e.g., one-year or two-year agreements) are less
likely to churn. Similarly, customers using Fiber optic services or
lacking Tech Support are identified as high-risk groups.
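The model-building call can be sketched as below. This is not the report's actual code: the formula `Churn ~ .`, the `method = "class"` argument, and the synthetic `train_data` (three predictors and an invented churn rule instead of the 19 real variables) are all assumptions made so the example runs on its own.

```r
library(rpart)

# Synthetic training data; the churn rule below is invented for illustration.
set.seed(123)
n <- 600
train_data <- data.frame(
  tenure      = sample(1:72, n, replace = TRUE),
  Contract    = factor(sample(c("Month-to-month", "One year", "Two year"),
                              n, replace = TRUE)),
  TechSupport = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
train_data$Churn <- factor(
  ifelse(train_data$Contract == "Month-to-month" & train_data$tenure <= 48,
         "Yes", "No"),
  levels = c("No", "Yes")
)

# Classification tree: Churn explained by all other columns
fit <- rpart(Churn ~ ., data = train_data, method = "class")

# Text view of the tree: node number, split, n, loss, predicted class, class shares
print(fit)

# Root node: n is the number of training rows, dev is the loss, i.e. the
# number of rows that do not belong to the class predicted at that node
root <- fit$frame[1, c("n", "dev", "yval")]
print(root)

# Predicted class for each training row
pred <- predict(fit, newdata = train_data, type = "class")
```

In the printed output, the `loss` column at the root is the count of the minority class, which is how the number of churners among the training observations can be read off.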
In this step, we visualize our decision tree model for predicting customer
churn by plotting it. Each internal node in the tree represents a decision
point based on a feature such as "Contract," "InternetService," or
"Tenure," where customers are split by that characteristic, and each
node is labeled with the predicted churn status ("Yes" or "No").
For example, the top node (root) splits on the "Contract" feature:
customers with month-to-month contracts are more likely to churn than
those with one-year or two-year contracts. Specifically, out of 2,706
customers with month-to-month contracts, 1,539 have a churn status of
"No," while 1,167 have a churn status of "Yes," a churn rate of about
43% compared with about 27% (1,308 of 4,930) across the whole training
set. The numbers within each node display the customer count and the
distribution of churn outcomes. Blue nodes generally indicate a
prediction of "No" (not churning), while green nodes indicate "Yes"
(churning).
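A plot of this kind can be sketched as follows. The data and the fitted tree are synthetic stand-ins. The colored nodes described above suggest the rpart.plot package, but that is an assumption, so the sketch draws the tree with rpart's own `plot()` and `text()` methods and uses rpart.plot only if it is installed.

```r
library(rpart)

# Synthetic data and tree, standing in for the report's fitted model
set.seed(123)
n <- 600
train_data <- data.frame(
  tenure   = sample(1:72, n, replace = TRUE),
  Contract = factor(sample(c("Month-to-month", "One year", "Two year"),
                           n, replace = TRUE))
)
train_data$Churn <- factor(
  ifelse(train_data$Contract == "Month-to-month" & train_data$tenure <= 48,
         "Yes", "No"),
  levels = c("No", "Yes")
)
fit <- rpart(Churn ~ ., data = train_data, method = "class")

# Base drawing: branches first, then labels.
# use.n = TRUE adds the per-class counts under each node.
plot(fit, uniform = TRUE, margin = 0.1)
text(fit, use.n = TRUE, all = TRUE, cex = 0.8)

# Optional colored version; extra = 101 shows per-class counts and the
# percentage of observations in each node.
if (requireNamespace("rpart.plot", quietly = TRUE)) {
  rpart.plot::rpart.plot(fit, type = 2, extra = 101)
}
```

The per-class counts printed in each node are what allow a statement such as "of the customers in this node, so many are "No" and so many are "Yes"" to be read directly from the plot.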
Improve Service for Fiber Optic Customers: If the analysis shows fiber
optic users are more likely to churn, investigate service quality or
pricing issues and consider offering tailored support or premium features
to improve satisfaction among these customers.