ML 2 - Problem Statements and Rubrics
Context
Employee promotion means the ascension of an employee to a higher rank, and it is the aspect of a
job that drives employees the most.
It is the ultimate reward for dedication and loyalty towards an organization, and the HR team plays
an important role in handling promotion decisions based on ratings and other available
attributes.
The HR team at JMD company stored data from last year's promotion cycle. The data consists of
details of all employees who worked at the company last year and whether or not they were
promoted. Every cycle, the promotion process is delayed because so many details are available
for each employee that it becomes difficult to compare candidates and decide.
This time, the HR team wants to use the stored data to build a model that predicts whether a
person is eligible for promotion.
Objective
As a data scientist at JMD company, you need to build a model that will help the HR team
predict whether a person is eligible for promotion.
2. Build a classification model to predict whether the employee has a higher probability of
getting a promotion
4. Generate a set of insights and recommendations that will help the company
Data Dictionary
no_of_trainings: number of other trainings completed in the previous year on soft skills,
technical skills, etc.
awards_won: 1 if awards were won during the previous year, else 0
Criteria: Exploratory Data Analysis
- Problem definition, questions to be answered
- Data background and contents
- Univariate analysis
- Bivariate analysis
- Key meaningful observations on individual variables and the relationship between variables
Points: 6
Criteria: Data Preprocessing
- Prepare the data for analysis
- Feature engineering
- Missing value treatment
- Ensure no data leakage among train, test, and validation sets
Points: 3
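One way to meet the "no data leakage" requirement above is to split the data first and fit any missing value treatment on the training set only. The sketch below is a minimal illustration, not the required solution: the synthetic arrays, the 60/20/20 split, and the median imputation strategy are all assumptions standing in for the real HR data and for choices left to the analyst.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR data (assumption: the real columns differ).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
X[rng.random(X.shape) < 0.05] = np.nan        # inject roughly 5% missing values
y = (rng.random(1000) < 0.1).astype(int)      # roughly 10% positive class

# Split BEFORE any fitting: 60% train, 20% validation, 20% test, stratified on y.
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=1
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=1
)

# Learn the imputation values from the training set only, then reuse them.
imputer = SimpleImputer(strategy="median")
X_train = imputer.fit_transform(X_train)
X_val = imputer.transform(X_val)
X_test = imputer.transform(X_test)

print(X_train.shape, X_val.shape, X_test.shape)
```

Calling `fit_transform` on the full data set before splitting would let validation and test rows influence the imputed values, which is the kind of leakage the criterion warns about.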
Criteria: Model Building - Original Data
- Choose the appropriate metric for model evaluation
- Build 5 models (from decision trees, bagging and boosting methods)
- Comment on the model performance
* You can choose NOT to build XGBoost if you are facing issues with the installation
Points: 6
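The model building criterion above can be sketched as a loop over candidate estimators scored with one metric. This is only an illustration under assumptions: the data is synthetic, recall is used as an example metric (the brief leaves the choice of metric to the analyst), and the five scikit-learn estimators and their settings are one possible selection from decision trees, bagging, and boosting methods.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data (assumption: stands in for the training split).
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=1
)

# One possible set of five models; names and settings are illustrative.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
    "Random forest": RandomForestClassifier(n_estimators=50, random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient boosting": GradientBoostingClassifier(random_state=1),
}

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

scores = {}
for name, model in models.items():
    # Mean cross-validated recall on the training data.
    scores[name] = cross_val_score(model, X, y, scoring="recall", cv=cv).mean()
    print(f"{name}: mean CV recall = {scores[name]:.3f}")
```

Swapping `scoring="recall"` for another scorer name such as `"precision"` or `"f1"` changes the metric without touching the rest of the loop.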
Criteria: Model Building - Oversampled Data
- Oversample the train data
- Build 5 models (from decision trees, bagging and boosting methods)
- Comment on the model performance
* You can choose NOT to build XGBoost if you are facing issues with the installation
Points: 6
Criteria: Model Building - Undersampled Data
- Undersample the train data
- Build 5 models (from decision trees, bagging and boosting methods)
- Comment on the model performance
* You can choose NOT to build XGBoost if you are facing issues with the installation
Points: 6
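The oversampling and undersampling steps in the two criteria above apply to the training data only. The brief does not name a specific resampling method, so the sketch below uses plain random resampling with `sklearn.utils.resample` as the simplest option; the synthetic arrays and the 50/450 class counts are assumptions. Synthetic techniques such as SMOTE from the imbalanced-learn package are a common alternative.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic training split (assumption): 50 promoted (1), 450 not promoted (0).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(500, 3))
y_train = np.array([1] * 50 + [0] * 450)

is_minority = y_train == 1
X_min, X_maj = X_train[is_minority], X_train[~is_minority]

# Oversampling: draw minority rows WITH replacement up to the majority count.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=1)
X_over = np.vstack([X_maj, X_min_up])
y_over = np.array([0] * len(X_maj) + [1] * len(X_min_up))

# Undersampling: draw majority rows WITHOUT replacement down to the minority count.
X_maj_down = resample(X_maj, replace=False, n_samples=len(X_min), random_state=1)
X_under = np.vstack([X_maj_down, X_min])
y_under = np.array([0] * len(X_maj_down) + [1] * len(X_min))

print("oversampled:", X_over.shape, "undersampled:", X_under.shape)
```

The validation and test sets stay at their original class ratio, so the reported scores reflect the distribution the model would see in practice.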
Criteria: Model Performance Improvement using Hyperparameter Tuning
- Choose 3 models (at least) that might perform better after tuning, with proper reasoning
- Tune the 3 models (at least) obtained above using randomized search and the metric of interest
- Comment on the performance of the 3 tuned models
* You can choose NOT to tune XGBoost if you experience long runtimes
Points: 10
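The randomized search step above can be sketched with scikit-learn's `RandomizedSearchCV`. This is a hedged example for a single model: the random forest, the parameter ranges, the number of iterations, and recall as the metric of interest are all assumptions chosen to keep the run short, and the data is synthetic.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data (assumption: stands in for the training split).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=1
)

# Illustrative search space; sensible ranges depend on the real data.
param_distributions = {
    "n_estimators": randint(50, 150),
    "max_depth": randint(2, 10),
    "min_samples_leaf": randint(1, 10),
    "class_weight": [None, "balanced"],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=1),
    param_distributions=param_distributions,
    n_iter=10,            # number of sampled parameter combinations
    scoring="recall",     # metric of interest (assumed here)
    cv=5,
    random_state=1,
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best mean CV recall:", round(search.best_score_, 3))
```

Randomized search samples a fixed number of combinations instead of trying every one, which is why the rubric suggests it when runtimes are a concern. The same pattern can be repeated for each of the three chosen models.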
Criteria: Model Performance Comparison and Final Model Selection
- Compare the performance of tuned models
- Choose the best model
- Comment on the performance of the best model on the test set
Points: 2
Criteria: Actionable Insights & Recommendations
- Write down insights from the analysis conducted
- Provide actionable business recommendations
Points: 6