EST_Problem Statement-3
AIML MODULE PROJECT
©Great Learning. Proprietary content. All Rights Reserved. Unauthorised use or distribution prohibited
• CONTEXT: A telecom company wants to use its historical customer data and machine learning to predict customer behaviour so that it can
retain customers. The end goal is to develop focused customer retention programs.
• DATA DESCRIPTION: Each row represents a customer, and each column contains a customer attribute described in the column metadata. The
• PROJECT OBJECTIVE: The objective, as a data scientist hired by the telecom company, is to build a model that helps identify the
customers who have a higher probability of churning. This will help the company understand the pain points and patterns of customer
C. Merge both DataFrames on the key ‘customerID’ to form a single DataFrame. [2 Marks]
D. Verify that all the columns are incorporated in the merged DataFrame by using a simple comparison operator in Python. [1 Mark]
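Tasks C and D can be sketched as follows. This is only an illustration under stated assumptions: the two small DataFrames and every column name other than ‘customerID’ are hypothetical stand-ins, since the actual source files are not named here. The check compares the merged column set against the union of the two input column sets with `==`.

```python
import pandas as pd

# Hypothetical stand-ins for the two customer tables (only 'customerID' comes from the brief).
df1 = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "gender": ["Female", "Male", "Female"],
    "tenure": [1, 34, 2],
})
df2 = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "Churn": ["No", "No", "Yes"],
})

# C. Merge on the shared key. An inner join keeps customers present in both tables.
merged = pd.merge(df1, df2, on="customerID", how="inner")

# D. Every column from either input should appear in the merged result.
expected_columns = set(df1.columns) | set(df2.columns)
all_columns_present = set(merged.columns) == expected_columns
print("All columns incorporated:", all_columns_present)
```

An inner join is one reasonable choice; if some customers could be missing from one table, `how="outer"` with `indicator=True` would make the mismatches visible.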
B. Make sure all the variables with continuous values are of ‘Float’ type. [2 Marks]
C. Create a function that will accept a DataFrame as input and return pie-charts for all the appropriate Categorical features. Clearly show percentage
E. Encode all the appropriate Categorical features with the most suitable approach. [2 Marks]
F. Split the data into 80% train and 20% test. [1 Mark]
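The preparation tasks above (float conversion, pie charts, encoding, and the 80/20 split) could look roughly like the sketch below. The toy data, the column names, the choice to drop rows whose values cannot be parsed, and the use of one-hot encoding are all assumptions made for illustration; the brief leaves the "most suitable" encoding to the analyst.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical sample; column names are illustrative, not taken from the brief.
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 28, 62, 13, 5],
    "TotalCharges": ["29.85", "1889.5", "108.15", "1840.75", " ", "1949.4",
                     "301.9", "3046.05", "3487.95", "587.45", "151.65"],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Two year", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "Month-to-month"],
    "Churn": ["No", "No", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "No", "Yes"],
})

# B. Coerce continuous columns to float; unparsable entries (e.g. blanks) become NaN.
continuous_cols = ["tenure", "TotalCharges"]
for col in continuous_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce").astype(float)
df = df.dropna(subset=continuous_cols)  # assumption: drop rows that failed to parse


# C. One pie chart per low-cardinality, non-numeric column, with percentages shown.
def plot_categorical_pies(frame, max_categories=10):
    cat_cols = [c for c in frame.columns
                if not pd.api.types.is_numeric_dtype(frame[c])
                and frame[c].nunique() <= max_categories]
    if not cat_cols:
        return None, cat_cols
    fig, axes = plt.subplots(1, len(cat_cols), figsize=(4 * len(cat_cols), 4), squeeze=False)
    for ax, col in zip(axes[0], cat_cols):
        counts = frame[col].value_counts()
        ax.pie(counts.values, labels=counts.index.tolist(), autopct="%1.1f%%")
        ax.set_title(col)
    return fig, cat_cols


fig, pie_columns = plot_categorical_pies(df)

# E. Binary target mapped to 0/1; remaining categoricals one-hot encoded.
df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})
X = pd.get_dummies(df.drop(columns="Churn"), drop_first=True)
y = df["Churn"]

# F. 80/20 split, stratified so both sets keep a similar churn ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```

Stratifying on the target is optional, but it tends to matter for churn data, where the positive class is usually the minority.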
A. Train a model using a Decision Tree and check the performance of the model on train and test data. [4 Marks]
B. Use grid search to improve the performance of the Decision Tree model, check the performance of the model on train and test data, provide the
C. Train a model using a Random Forest and check the performance of the model on train and test data. [4 Marks]
D. Use grid search to improve the performance of the Random Forest model, check the performance of the model on train and test data, provide the
E. Train a model using AdaBoost and check the performance of the model on train and test data. [4 Marks]
F. Use grid search to improve the performance of the AdaBoost model, check the performance of the model on train and test data, provide the
G. Train a model using GradientBoost and check the performance of the model on train and test data. [4 Marks]
H. Use grid search to improve the performance of the GradientBoost model, check the performance of the model on train and test data, provide
(1) Compare the performance of each model in the train stage and the test stage.
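The modelling tasks A to H and the comparison in (1) follow one repeated pattern: fit a baseline model, tune it with grid search, and score both on train and test data. A minimal sketch of that pattern is shown below. It uses synthetic data from `make_classification` in place of the telecom dataset, deliberately small hyperparameter grids, and accuracy as the metric; all three are assumptions for illustration, and for an imbalanced churn target a metric such as recall or F1 may be more informative.

```python
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared churn features and target.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Each entry: (baseline estimator, illustrative grid to search).
candidates = {
    "Decision Tree": (DecisionTreeClassifier(random_state=42),
                      {"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5]}),
    "Random Forest": (RandomForestClassifier(random_state=42),
                      {"n_estimators": [25, 50], "max_depth": [3, 6]}),
    "AdaBoost": (AdaBoostClassifier(random_state=42),
                 {"n_estimators": [25, 50], "learning_rate": [0.5, 1.0]}),
    "GradientBoost": (GradientBoostingClassifier(random_state=42),
                      {"n_estimators": [25, 50], "max_depth": [2, 3]}),
}

rows = []
for name, (estimator, grid) in candidates.items():
    # Baseline: default hyperparameters, fitted on the training set only.
    base = clone(estimator).fit(X_train, y_train)
    # Tuned: 3-fold cross-validated grid search, also on the training set only.
    search = GridSearchCV(clone(estimator), grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    for stage, model in (("base", base), ("tuned", search.best_estimator_)):
        rows.append({
            "model": name,
            "stage": stage,
            "train_accuracy": accuracy_score(y_train, model.predict(X_train)),
            "test_accuracy": accuracy_score(y_test, model.predict(X_test)),
        })

# (1) Side-by-side comparison of train and test performance.
comparison = pd.DataFrame(rows)
print(comparison)
```

A large gap between `train_accuracy` and `test_accuracy` for a baseline model is a sign of overfitting; the tuned row shows whether the grid search narrowed it.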