ML 2 - Problem Statements and Rubrics

The document outlines the objective for a data scientist at JMD company to develop a model that predicts employee promotion eligibility based on historical data. It includes steps for data exploration, model building, optimization, and generating insights. The document also provides a detailed data dictionary and evaluation criteria for the project.

ML - 2

Context

Employee promotion means the ascension of an employee to a higher rank, and this aspect of the job is
what drives employees the most.
It is the ultimate reward for dedication and loyalty towards an organization, and the HR team plays
an important role in handling promotion tasks based on ratings and other available attributes.

The HR team at JMD company stored data from last year's promotion cycle, which contains details
of all the employees who worked at the company last year and whether or not they were promoted.
Every cycle, however, the process gets delayed because so many details are available for each
employee that it becomes difficult to compare and decide.

So this time the HR team wants to use the stored data to build a model that will predict whether a
person is eligible for promotion.

Objective

You, as a data scientist at JMD company, need to come up with a model that will help the HR team
predict whether a person is eligible for promotion.

1. Explore and visualize the dataset.

2. Build a classification model to predict whether the employee has a higher probability of getting a
promotion

3. Optimize the model using appropriate techniques

4. Generate a set of insights and recommendations that will help the company

Data Dictionary

• employee_id: Unique ID for the employee

• department: Department of the employee

• region: Region of employment (unordered)

• education: Education level

• gender: Gender of the employee

• recruitment_channel: Channel of recruitment for the employee

• no_of_trainings: Number of other trainings completed in the previous year on soft skills,
technical skills, etc.

• age: Age of the employee

• previous_year_rating: Employee rating for the previous year

• length_of_service: Length of service in years

• awards_won: 1 if awards were won during the previous year, else 0

• avg_training_score: Average score in current training evaluations

• is_promoted: (Target) Recommended for promotion


Rubric

Criteria: Exploratory Data Analysis (Points: 6)
- Problem definition, questions to be answered
- Data background and contents
- Univariate analysis
- Bivariate analysis
- Key meaningful observations on individual variables and the relationship between variables
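As an illustration of the univariate and bivariate checks listed above, here is a minimal pandas sketch. The six-row frame is made up purely for demonstration; only the column names come from the data dictionary, and the helper names `univariate_summary` and `promotion_rate_by` are hypothetical.

```python
import pandas as pd

# Tiny made-up sample; it borrows column names from the data dictionary
# but none of the values come from the real HR dataset.
df = pd.DataFrame({
    "department": ["Sales", "Sales", "HR", "HR", "Tech", "Tech"],
    "previous_year_rating": [5.0, 3.0, None, 4.0, 5.0, 2.0],
    "avg_training_score": [50, 48, 52, 55, 85, 80],
    "is_promoted": [1, 0, 0, 0, 1, 0],
})

def univariate_summary(frame):
    """Return the share of missing values per column and the target class balance."""
    missing_share = frame.isna().mean()
    class_balance = frame["is_promoted"].value_counts(normalize=True)
    return missing_share, class_balance

def promotion_rate_by(frame, column):
    """Bivariate view: share of promoted employees for each level of `column`."""
    return frame.groupby(column)["is_promoted"].mean().sort_values(ascending=False)

missing_share, class_balance = univariate_summary(df)
rate_by_department = promotion_rate_by(df, "department")
```

The class balance is worth checking early, since a strongly imbalanced target affects both the choice of metric and the need for resampling later in the rubric.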

Criteria: Data Preprocessing (Points: 3)
- Prepare the data for analysis
- Feature Engineering
- Missing value treatment
- Ensure no data leakage among train-test and validation sets
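One common way to avoid leakage is to split first and fit any imputation on the training set only. The sketch below assumes scikit-learn and uses synthetic data; the 60/20/20 split and the median strategy are illustrative choices, not requirements of the brief.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR data (values are made up for illustration).
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.integers(20, 60, size=n).astype(float),
    "previous_year_rating": rng.integers(1, 6, size=n).astype(float),
})
X.loc[rng.choice(n, size=20, replace=False), "previous_year_rating"] = np.nan
y = pd.Series([1] * 20 + [0] * 180, name="is_promoted")

# Split BEFORE any fitting: 60% train, 20% validation, 20% test, stratified on the target.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=1
)

# Learn the medians from the training set only, then reuse them on the other sets.
imputer = SimpleImputer(strategy="median")
X_train_imp = pd.DataFrame(
    imputer.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
X_val_imp = pd.DataFrame(
    imputer.transform(X_val), columns=X_val.columns, index=X_val.index
)
X_test_imp = pd.DataFrame(
    imputer.transform(X_test), columns=X_test.columns, index=X_test.index
)
```

The same pattern applies to encoders and scalers: call `fit` on the training data and only `transform` on the validation and test data.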

Criteria: Model Building - Original Data (Points: 6)
- Choose the appropriate metric for model evaluation
- Build 5 models (from decision trees, bagging and boosting methods)
- Comment on the model performance
* You can choose NOT to build XGBoost if you are facing issues with the installation
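A minimal sketch of this step with scikit-learn is shown below. It runs on synthetic imbalanced data and uses recall as the metric, which is only an assumption here; precision or F1 may fit better depending on which type of error the HR team considers more costly. The five estimators are one possible selection from the decision tree, bagging, and boosting families.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the prepared HR features.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Five models with default settings; no tuning at this stage.
models = {
    "Decision tree": DecisionTreeClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
    "Random forest": RandomForestClassifier(random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient boosting": GradientBoostingClassifier(random_state=1),
}

# Fit on the training set and score each model on the validation set.
val_recall = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_recall[name] = recall_score(y_val, model.predict(X_val))
```

Comparing training and validation scores for each model gives a first indication of overfitting, which is useful when commenting on performance.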

Criteria: Model Building - Oversampled Data (Points: 6)
- Oversample the train data
- Build 5 models (from decision trees, bagging and boosting methods)
- Comment on the model performance
* You can choose NOT to build XGBoost if you are facing issues with the installation
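The sketch below shows plain random oversampling with NumPy, applied to the training set only. It is one possible technique and an assumption on my part; the brief does not name a method, and a synthetic approach such as SMOTE could be swapped in. The function name `random_oversample` and the toy arrays are hypothetical.

```python
import numpy as np

def random_oversample(X, y, random_state=0):
    """Duplicate rows of the smaller classes at random until every class
    has as many rows as the largest class. X and y must be NumPy arrays."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        if count < target:
            # Sample with replacement to fill the gap to the majority size.
            extra = rng.choice(idx, size=target - count, replace=True)
            idx = np.concatenate([idx, extra])
        keep.append(idx)
    order = rng.permutation(np.concatenate(keep))
    return X[order], y[order]

# Toy training set: 8 non-promoted rows and 2 promoted rows.
X_train = np.arange(20).reshape(10, 2)
y_train = np.array([0] * 8 + [1] * 2)

X_over, y_over = random_oversample(X_train, y_train)
```

Only the training data is resampled; the validation and test sets keep their original class balance so that the reported scores reflect real-world proportions.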

Criteria: Model Building - Undersampled Data (Points: 6)
- Undersample the train data
- Build 5 models (from decision trees, bagging and boosting methods)
- Comment on the model performance
* You can choose NOT to build XGBoost if you are facing issues with the installation
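Random undersampling can be sketched in the same spirit: rows of the larger class are dropped at random until the classes match. This is an assumed technique, since the brief does not prescribe one, and the function name `random_undersample` and the toy arrays are hypothetical.

```python
import numpy as np

def random_undersample(X, y, random_state=0):
    """Keep a random subset of each class so that every class has as many
    rows as the smallest class. X and y must be NumPy arrays."""
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    keep = [
        # Sample without replacement so no row is kept twice.
        rng.choice(np.flatnonzero(y == cls), size=target, replace=False)
        for cls in classes
    ]
    order = rng.permutation(np.concatenate(keep))
    return X[order], y[order]

# Toy training set: 8 non-promoted rows and 2 promoted rows.
X_train = np.arange(20).reshape(10, 2)
y_train = np.array([0] * 8 + [1] * 2)

X_under, y_under = random_undersample(X_train, y_train)
```

Undersampling discards information from the majority class, so it is worth commenting on whether the models trained this way generalize as well as those trained on the original or oversampled data.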

Criteria: Model Performance Improvement using Hyperparameter Tuning (Points: 10)
- Choose 3 models (at least) that might perform better after tuning with proper reasoning
- Tune the 3 models (at least) obtained above using randomized search and metric of interest
- Comment on the performance of 3 tuned models
* You can choose NOT to tune XGBoost if you experience long runtimes
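A randomized search for one model might look like the sketch below, using scikit-learn's `RandomizedSearchCV` on synthetic data. The random forest, the parameter ranges, `n_iter=5`, and recall as the metric of interest are all illustrative assumptions; the same pattern would be repeated for each model chosen for tuning.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

# Synthetic imbalanced training data standing in for the prepared HR features.
X_train, y_train = make_classification(
    n_samples=400, n_features=8, weights=[0.85, 0.15], random_state=1
)

# Distributions (or lists) to sample hyperparameters from; ranges are illustrative.
param_distributions = {
    "n_estimators": randint(50, 151),
    "max_depth": [3, 5, 7, None],
    "min_samples_leaf": randint(1, 11),
    "class_weight": [None, "balanced"],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=1),
    param_distributions=param_distributions,
    n_iter=5,                      # number of random parameter settings to try
    scoring="recall",              # assumed metric of interest
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=1),
    random_state=1,
    n_jobs=1,
)
search.fit(X_train, y_train)

best_params = search.best_params_
best_cv_recall = search.best_score_
```

Because the search cross-validates inside the training data, the validation and test sets remain untouched for the later comparison.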

Criteria: Model Performance Comparison and Final Model Selection (Points: 2)
- Compare the performance of tuned models
- Choose the best model
- Comment on the performance of the best model on the test set
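The comparison can be sketched as follows: candidates are ranked on the validation set, and the test set is scored once, for the selected model only. The three estimators below are untuned stand-ins for the tuned models, the data is synthetic, and recall is again an assumed metric.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data split into train, validation, and test sets.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=1
)
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=1
)

# Stand-ins for the tuned models (the settings here are placeholders).
candidates = {
    "Tuned decision tree": DecisionTreeClassifier(max_depth=5, random_state=1),
    "Tuned random forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "Tuned gradient boosting": GradientBoostingClassifier(random_state=1),
}

# Collect train and validation recall for each candidate.
comparison = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    comparison[name] = {
        "train_recall": recall_score(y_train, model.predict(X_train)),
        "val_recall": recall_score(y_val, model.predict(X_val)),
    }

# Select on validation performance, then touch the test set exactly once.
best_name = max(comparison, key=lambda name: comparison[name]["val_recall"])
test_recall = recall_score(y_test, candidates[best_name].predict(X_test))
```

A large gap between `train_recall` and `val_recall` for a candidate suggests overfitting, which may be a reason to prefer a slightly lower-scoring but more stable model.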

Criteria: Actionable Insights & Recommendations (Points: 6)
- Write down insights from the analysis conducted
- Provide actionable business recommendations

Criteria: Business Report Quality (Points: 6)
- Adhere to the business report checklist

Criteria: Guided Project Deduction (Points: 9)
