Project 3
Problem Statement:
You are a data scientist at a telecom company named “Neo” whose customers
are churning to its competitors. Analyze the company's data to find insights
that can help stop customers from leaving for other telecom companies.
Customer_churn Dataset:
The details of the ‘customer_churn’ dataset are given in the data
dictionary.
Lab Environment: Anaconda
Domain: Telecom
Tasks To Be Performed:
1. Data Manipulation:
● Extract the 5th column and store it in ‘customer_5’
● Extract the 15th column and store it in ‘customer_15’
● Extract all the male senior citizens whose payment method is electronic
check and store the result in ‘senior_male_electronic’
● Extract all those customers whose tenure is greater than 70 months or
whose monthly charges are more than $100 and store the result in
‘customer_total_tenure’
● Extract all the customers whose contract is of two years, payment method
is mailed check and the value of churn is ‘Yes’ and store the result in
‘two_mail_yes’
● Extract 333 random records from the customer_churn DataFrame and store
the result in ‘customer_333’
● Get the count of different levels from the ‘Churn’ column
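The data manipulation steps above can be sketched with pandas roughly as follows. This is only an illustration under stated assumptions: the toy frame stands in for the real data, the file name `customer_churn.csv` is hypothetical, and the column names `gender`, `SeniorCitizen`, `Contract` and `PaymentMethod`, along with value labels such as `Electronic check`, `Mailed check` and `Two year`, are assumed and should be checked against the data dictionary.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data. With the actual file you would load it
# instead, e.g. pd.read_csv("customer_churn.csv") (file name is assumed).
rng = np.random.default_rng(0)
n = 500
customer_churn = pd.DataFrame({
    "gender": rng.choice(["Male", "Female"], n),
    "SeniorCitizen": rng.integers(0, 2, n),  # assumed 0/1 coding
    "tenure": rng.integers(0, 73, n),        # months
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
    "PaymentMethod": rng.choice(
        ["Electronic check", "Mailed check", "Credit card"], n
    ),
    "MonthlyCharges": rng.uniform(20, 120, n).round(2),
    "Churn": rng.choice(["Yes", "No"], n),
})
# Pad with filler columns so the toy frame actually has a 15th column.
for i in range(8, 17):
    customer_churn[f"filler_{i}"] = rng.integers(0, 2, n)

# Positional selection: the 5th and 15th columns are at 0-based index 4 and 14.
customer_5 = customer_churn.iloc[:, 4]
customer_15 = customer_churn.iloc[:, 14]

# Male senior citizens paying by electronic check (all conditions must hold).
senior_male_electronic = customer_churn[
    (customer_churn["gender"] == "Male")
    & (customer_churn["SeniorCitizen"] == 1)
    & (customer_churn["PaymentMethod"] == "Electronic check")
]

# Tenure above 70 months OR monthly charges above $100.
customer_total_tenure = customer_churn[
    (customer_churn["tenure"] > 70) | (customer_churn["MonthlyCharges"] > 100)
]

# Two-year contract, mailed check, and churned.
two_mail_yes = customer_churn[
    (customer_churn["Contract"] == "Two year")
    & (customer_churn["PaymentMethod"] == "Mailed check")
    & (customer_churn["Churn"] == "Yes")
]

# 333 random records; random_state only makes the draw repeatable.
customer_333 = customer_churn.sample(n=333, random_state=42)

# Count of each level in the 'Churn' column.
churn_counts = customer_churn["Churn"].value_counts()
print(churn_counts)
```

Each comparison is wrapped in parentheses because `&` and `|` bind more tightly than `==` and `>` in Python.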
2. Data Visualization:
● Build a bar-plot for the ‘InternetService’ column:
a. Set x-axis label to ‘Categories of Internet Service’
b. Set y-axis label to ‘Count of Categories’
c. Set the title of plot to be ‘Distribution of Internet Service’
d. Set the color of the bars to be ‘orange’
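One possible way to build this bar-plot uses matplotlib, as sketched below. The short `internet_service` series is made-up sample data standing in for the real ‘InternetService’ column, and its category labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample standing in for customer_churn["InternetService"].
internet_service = pd.Series(
    ["DSL", "Fiber optic", "No", "DSL", "Fiber optic", "Fiber optic"],
    name="InternetService",
)

# Count how often each category occurs; these counts become the bar heights.
counts = internet_service.value_counts()

fig, ax = plt.subplots()
ax.bar(counts.index.tolist(), counts.values, color="orange")
ax.set_xlabel("Categories of Internet Service")
ax.set_ylabel("Count of Categories")
ax.set_title("Distribution of Internet Service")
fig.tight_layout()
# In an interactive session, plt.show() would display the figure.
```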
3. Linear Regression:
● Build a simple linear model where dependent variable is ‘MonthlyCharges’
and independent variable is ‘tenure’:
a. Divide the dataset into train and test sets in 70:30 ratio.
b. Build the model on train set and predict the values on test set
c. After predicting the values, find the root mean square error
d. Find out the error in prediction & store the result in ‘error’
e. Find the root mean square error
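A minimal sketch of the linear regression steps, assuming scikit-learn is available in the Anaconda environment. The synthetic ‘tenure’ and ‘MonthlyCharges’ values are invented for illustration and say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the two columns used in this task.
rng = np.random.default_rng(0)
tenure = rng.integers(0, 73, 400)
monthly = 50 + 0.3 * tenure + rng.normal(0, 20, 400)
customer_churn = pd.DataFrame({"tenure": tenure, "MonthlyCharges": monthly})

X = customer_churn[["tenure"]]       # independent variable (2-D, as sklearn expects)
y = customer_churn["MonthlyCharges"]  # dependent variable

# 70:30 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Fit on the train set, predict on the test set.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Error in prediction, then root mean square error.
error = y_test - y_pred
rmse = np.sqrt(np.mean(error ** 2))
print(f"RMSE: {rmse:.2f}")
```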
4. Logistic Regression:
● Build a simple logistic regression model where dependent variable is
‘Churn’ and independent variable is ‘MonthlyCharges’:
a. Divide the dataset in 65:35 ratio
b. Build the model on train set and predict the values on test set
c. Build the confusion matrix and get the accuracy score
d. Build a multiple logistic regression model where dependent variable
is ‘Churn’ and independent variables are ‘tenure’ and
‘MonthlyCharges’
e. Divide the dataset in 80:20 ratio
f. Build the model on train set and predict the values on test set
g. Build the confusion matrix and get the accuracy score
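Both logistic regression models could be fitted along the following lines with scikit-learn. The helper `fit_and_score` is a hypothetical name introduced only for this sketch, and the synthetic data (including the ‘Yes’/‘No’ labels for ‘Churn’) is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in: churn is made more likely for high charges and short tenure.
rng = np.random.default_rng(0)
n = 600
tenure = rng.integers(0, 73, n)
monthly = rng.uniform(20, 120, n)
p_churn = 1 / (1 + np.exp(-(0.03 * (monthly - 70) - 0.05 * (tenure - 30))))
customer_churn = pd.DataFrame({
    "tenure": tenure,
    "MonthlyCharges": monthly,
    "Churn": np.where(rng.random(n) < p_churn, "Yes", "No"),
})


def fit_and_score(features, test_size):
    """Split, fit a logistic regression, and return (confusion matrix, accuracy)."""
    X = customer_churn[features]
    y = customer_churn["Churn"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred, labels=["No", "Yes"])
    return cm, accuracy_score(y_test, y_pred)


# Simple model: MonthlyCharges only, 65:35 split.
cm_simple, acc_simple = fit_and_score(["MonthlyCharges"], test_size=0.35)

# Multiple model: tenure and MonthlyCharges, 80:20 split.
cm_multi, acc_multi = fit_and_score(["tenure", "MonthlyCharges"], test_size=0.20)

print(cm_simple, acc_simple)
print(cm_multi, acc_multi)
```

Rows of each confusion matrix are the actual classes and columns the predicted classes, in the order given by `labels`.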
5. Decision Tree:
● Build a decision tree model where dependent variable is ‘Churn’ and
independent variable is ‘tenure’:
a. Divide the dataset in 80:20 ratio
b. Build the model on train set and predict the values on test set
c. Build the confusion matrix and calculate the accuracy
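The decision tree steps might look like the sketch below, again using scikit-learn. The data is synthetic and the hyperparameters are library defaults, so the numbers it prints are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: shorter tenure is made more likely to churn.
rng = np.random.default_rng(0)
n = 600
tenure = rng.integers(0, 73, n)
p_churn = 1 / (1 + np.exp(0.06 * (tenure - 30)))
customer_churn = pd.DataFrame({
    "tenure": tenure,
    "Churn": np.where(rng.random(n) < p_churn, "Yes", "No"),
})

X = customer_churn[["tenure"]]  # independent variable
y = customer_churn["Churn"]     # dependent variable

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

# Fit on the train set, predict on the test set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = tree.predict(X_test)

# Confusion matrix (rows: actual, columns: predicted).
cm = confusion_matrix(y_test, y_pred, labels=["No", "Yes"])

# Accuracy calculated from the matrix: correct predictions over all predictions.
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"Accuracy: {accuracy:.3f}")
```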
6. Random Forest:
● Build a Random Forest model where dependent variable is ‘Churn’ and