Capstone Interim Report - HR CTC Prediction

The document is an interim report for an HR data capstone project. It aims to develop a tool that predicts employee salaries, to reduce effort for HR and avoid discrimination. The report details data collection and the preprocessing of 25,000 applicant records with 29 parameters. Exploratory data analysis identified fresher records as outliers and examined relationships between variables. Three regression models were tested on the preprocessed data, with boosted decision trees performing best, giving the lowest error rates. Recommendations include further outlier removal, parameter selection, and model tuning to improve accuracy.

Uploaded by

chinudash

Interim Report of

HR Data Capstone Project

Submitted By

Chinmaynanda Dash
Seshavataram Peesapati
Yogesh S

Under the guidance of

Prerna Bhardwaj

1. Introduction
The HR team plays a crucial role in determining the salary of each employee in an organization. If a judgement or consideration goes wrong, performance suffers through employee dissatisfaction, which may lead to disengagement. Meanwhile, the HR team also needs to retain talent in the organization.

In the present situation, people move out frequently, while on the other hand the organization needs more people, both as replacements and for new project requirements. The HR team has to carry out recruitment drives throughout the year, and freshers need to be hired each year.

To overcome this cumbersome and judgement-driven process, can we have a prediction tool that predicts the salary of each employee recruited by the firm? Such a tool would reduce the effort the HR team spends on negotiating salaries and help avoid discrimination in the organization.

2. Problem Statement, Scope and Objective

The problem statement concerns an organization, Delta Ltd. The HR team of Delta wants a system that predicts employee salaries from past data. The system should avoid discrimination, support employee satisfaction, be easy to use, remove manual judgement and work effectively with minimal involvement.

Our scope is to develop a tool that solves this issue and reduces the HR team's effort in salary calculation. It will be easy to use and avoid manual working out.

Our objective is to collect the past data of all employees of Delta Ltd that HR presently uses to estimate an employee's annual salary. We then understand and analyse the data and prepare a model that predicts the salary of a new employee with a similar profile, avoiding manual judgement.
We test the model by comparing its predictions with the existing data as confirmation.

3. Data Description
We collected a substantial amount of data (25,000 applicants) from the HR team of Delta Ltd. It contains 29 parameters on which the salary judgement (Expected CTC) is based. The data contains both numerical and categorical parameters.
Numerical data – There are 12 parameters: Index, Application ID, total experience, experience in field, passing years of graduation, PG and PhD, Current CTC, number of companies worked, number of publications, certifications and Expected CTC.

Categorical data – The remaining 17 of the 29 parameters are categorical.
Ordinal categorical data are Education, Appraisal Rating and Designation.

We have missing values in the Department, Role, Designation, Education and education-related columns.
Most of the missing values arise from freshers and undergraduates.
The freshers are outliers.

4. Data Pre-processing
We observed that the fresher ("0 experience") category is an outlier, so we remove those rows from the data and carry out the model evaluation on the remaining rows in a second phase.
Higher-education details are kept as null where they do not apply to a lower education level in the hierarchy. For example, the graduate, postgraduate and PhD parameters are not applicable to an undergraduate.

For industry-related parameters such as role, position, industry and department, we replaced null values with "Others" for experienced candidates. For freshers (0 experience candidates) the value is "NA" for industry, organization, department, role and designation.
Our dependent variable is the expected salary (Expected CTC). We use the median expected salary as the dependent variable and the other 28 parameters as independent variables.
We evaluate the relationship between the dependent and independent variables through EDA.
We evaluate the model with all data, then check how much the error reduces after eliminating the outliers and after model tuning.
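The null-handling and outlier rules described in this section can be sketched in a few lines of Python. This is a minimal illustration only: the column names (`total_experience`, `department`, and so on) are assumptions, since the report does not list the exact headers, and the rows are made-up sample data.

```python
# Hypothetical column names; the report does not give the exact headers.
INDUSTRY_COLUMNS = ["industry", "organization", "department", "role", "designation"]


def is_fresher(row: dict) -> bool:
    """A fresher is a candidate with zero total experience."""
    return row["total_experience"] == 0


def fill_industry_nulls(row: dict) -> dict:
    """Fill missing industry-related values: "NA" for freshers, "Others" otherwise."""
    filler = "NA" if is_fresher(row) else "Others"
    cleaned = dict(row)  # copy so the input row is left unchanged
    for column in INDUSTRY_COLUMNS:
        if cleaned.get(column) is None:
            cleaned[column] = filler
    return cleaned


def preprocess(rows: list, drop_freshers: bool = True) -> list:
    """Fill industry-related nulls, then optionally drop fresher rows as outliers."""
    cleaned = [fill_industry_nulls(row) for row in rows]
    if drop_freshers:
        cleaned = [row for row in cleaned if not is_fresher(row)]
    return cleaned


# Made-up sample rows for illustration.
sample = [
    {"total_experience": 0, "department": None, "role": None, "expected_ctc": 300000},
    {"total_experience": 5, "department": None, "role": "Analyst", "expected_ctc": 900000},
]
print(preprocess(sample, drop_freshers=False))
print(len(preprocess(sample)))
```

Keeping `drop_freshers` as a switch makes it easy to run the model once on all data and once on the reduced data.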

5. Exploratory Data Analysis

1. We initially carried out EDA-01 with all 26 independent parameters, replacing null
values of role, department, industry and designation with "Others".
2. Higher-education details are kept as null where they do not apply to a lower education
level in the hierarchy. For example, the graduate, postgraduate and PhD parameters are
not applicable to an undergraduate.
3. The graphs below show department and organization as independent variables against
expected CTC.
4. We considered the median of expected CTC to identify its correlation with the
independent variables.

Other EDA graphs are covered in Appendix EDA-01.

5. Our major observation was that freshers (zero experience) are outliers.

6. We removed all 908 fresher rows to carry out further EDA, and used the new data to check
the correlation of the dependent variable with all 26 independent variables.
7. The inferences of EDA-02 are listed in the table below.

The EDA graphs for the remaining variables are available in Appendix EDA-02.
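The median-based comparison used in the EDA can be illustrated with a short sketch. It groups records by one categorical variable and takes the median expected CTC per group. The column names and the sample values are assumptions made for illustration, not figures from the Delta data.

```python
from collections import defaultdict
from statistics import median


def median_ctc_by(rows, column, target="expected_ctc"):
    """Group rows by a categorical column and return the median target per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[target])
    return {category: median(values) for category, values in groups.items()}


# Made-up sample rows; "department" and "expected_ctc" are hypothetical names.
rows = [
    {"department": "Sales", "expected_ctc": 500000},
    {"department": "Sales", "expected_ctc": 700000},
    {"department": "IT", "expected_ctc": 900000},
    {"department": "IT", "expected_ctc": 1100000},
    {"department": "IT", "expected_ctc": 1600000},
]
result = median_ctc_by(rows, "department")
print(result)
```

The median is less sensitive than the mean to a few very high or very low salaries, which is one reason it suits a comparison across categories.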

6. Modelling Approach

We used Azure ML Studio on the initial data, without eliminating outliers, with three different
regression models.
We considered three metrics to evaluate which model is best suited to our project.
1. Mean absolute error.
2. Root mean square error.
3. Coefficient of determination.
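For reference, the three metrics can be written out as below. These are the standard textbook definitions, given as a sketch; we assume, but have not verified, that Azure ML Studio computes them the same way. The sample values are made up.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def coefficient_of_determination(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
mae = mean_absolute_error(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
cod = coefficient_of_determination(actual, predicted)
print(mae, rmse, cod)
```

RMSE weights large errors more heavily than MAE and is never smaller than MAE, so a wide gap between the two suggests that a few predictions are far off.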

We split the data in a 70:30 ratio into train and test sets.

MAE = Mean Absolute Error, RMSE = Root Mean Square Error, COD = Coefficient of Determination

Sl.no.  Model                              MAE        RMSE       COD
1       Boosted decision tree regression   17744.97   31778.9    0.9992
2       Linear regression                  53880.17   80657.2    0.9953
3       Decision forest regression         41877.84   63639.72   0.997

We observed that the boosted decision tree model gives the best results, with the lowest MAE
and RMSE and the highest COD. We will therefore use the boosted decision tree for model tuning.

Results after eliminating freshers (zero experience) as outliers:

MAE = Mean Absolute Error, RMSE = Root Mean Square Error, COD = Coefficient of Determination

Sl.no.  Model                              MAE        RMSE       COD
1       Boosted decision tree regression   13403.08   17277.75   0.9997
2       Linear regression                  48251.56   65183.91   0.9968
3       Decision forest regression         39203.29   57430.4    0.9974
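To see how much the removal of freshers helped each model, the relative drop in MAE and RMSE can be computed from the two result tables. The sketch below uses only the figures reported in those tables; the helper name `percent_reduction` is our own.

```python
# (MAE, RMSE) per model, copied from the two result tables in this report.
before = {
    "Boosted decision tree": (17744.97, 31778.9),
    "Linear regression": (53880.17, 80657.2),
    "Decision forest": (41877.84, 63639.72),
}
after = {
    "Boosted decision tree": (13403.08, 17277.75),
    "Linear regression": (48251.56, 65183.91),
    "Decision forest": (39203.29, 57430.4),
}


def percent_reduction(old, new):
    """Relative drop from old to new, in percent."""
    return 100 * (old - new) / old


for model in before:
    mae_drop = percent_reduction(before[model][0], after[model][0])
    rmse_drop = percent_reduction(before[model][1], after[model][1])
    print(f"{model}: MAE down {mae_drop:.1f}%, RMSE down {rmse_drop:.1f}%")
```

For the boosted decision tree this works out to roughly a quarter less MAE and a little under half the RMSE, so the gap between the two metrics also narrows.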

7. Actionable insights and recommendations to the stakeholder

1. We need to identify a few insights from the EDA and the reasons behind the observed patterns.
2. We need to further reduce the MAE and RMSE values and narrow the gap between them.
3. That can be done by identifying further outliers, by eliminating parameters that have
minimal relationship with the dependent variable, and by model tuning.
4. We will split the data in a 70:25:5 ratio to train, test and verify the model. The 5% portion
is held back as external data to validate model accuracy as a user would experience it.
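The 70:25:5 split in the last recommendation could look like the sketch below. It is an assumption-laden illustration rather than the planned Azure ML Studio setup: the function name `split_by_ratio`, the fixed seed and the 1,000 stand-in rows are all hypothetical.

```python
import random


def split_by_ratio(rows, ratios=(70, 25, 5), seed=42):
    """Shuffle rows and cut them into consecutive parts proportional to `ratios`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    total = sum(ratios)
    parts, start, cumulative = [], 0, 0
    for ratio in ratios[:-1]:
        cumulative += ratio
        end = len(shuffled) * cumulative // total
        parts.append(shuffled[start:end])
        start = end
    parts.append(shuffled[start:])  # the last part takes whatever remains
    return parts


# 1,000 stand-in row ids; the real data would be the cleaned applicant records.
train, test, verify = split_by_ratio(range(1000))
print(len(train), len(test), len(verify))
```

The verify portion is never seen during training or testing, so it can stand in for new applicants when checking accuracy.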

8. References and Bibliography

1. Tableau dashboard
2. Great Learning lecture videos
3. https://www.ijitee.org/wp-content/uploads/papers/v9i6/F4545049620.pdf
4. https://machinelearningmastery.com/difference-test-validation-datasets/
5. https://www.datascience2000.in/2021/05/employee-salary-prediction-in-machine.html
6. https://towardsdatascience.com/will-your-employee-leave-a-machine-learning-model-8484c2a6663e
7. https://medium.com/analytics-vidhya/machine-learning-project-3-predict-salary-using-polynomial-regression-7024c7bace4f
8. https://www.atlantis-press.com/journals/ijcis/25899235/view
9. https://www.hindawi.com/journals/sp/2021/8387277/

9. Appendix

1. EDA-01
2. EDA-02
3. MODEL-01
4. MODEL-02

