HR Analyst (Data Analyst)

HR Manager

Uploaded by

dharavathpavan935


Project Title: Human Resources Analyst

Tools: Machine Learning

Technologies: Data Analyst

Project Difficulty Level: Intermediate

Dataset: The dataset is available at the link below. You can download it at your convenience.

Click here to download data set

About Dataset
Updated 30 January 2023

Version 14 of Dataset

License Update:
There has been some confusion around licensing for this data set. Dr. Carla Patalano and Dr. Rich Huebner are the
original authors of this dataset.

We provide a license to anyone who wishes to use this dataset for learning or teaching. For the purposes of sharing,
please follow this license:

CC-BY-NC-ND
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
License.

Codebook
https://rpubs.com/rhuebner/hrd_cb_v14

PLEASE NOTE -- I recently updated the codebook - please use the above link. A few minor discrepancies were
identified between the codebook and the dataset. Please feel free to contact me through LinkedIn
(www.linkedin.com/in/RichHuebner) to report discrepancies and make requests.

Context
HR data can be hard to come by, and HR professionals generally lag behind with respect to analytics and data
visualization competency. Thus, Dr. Carla Patalano and I set out to create our own HR-related dataset, which is
used in one of our graduate MSHRM courses called HR Metrics and Analytics, at New England College of Business.
We created this data set ourselves. We use the data set to teach HR students how to use and analyze the data in
Tableau Desktop - a data visualization tool that's easy to learn.

This version provides a variety of features that are useful for both data visualization AND creating machine learning /
predictive analytics models. We are working on expanding the data set even further by generating even more
records and a few additional features. We will be keeping this as one file/one data set for now. There is a possibility
of creating a second file perhaps down the road where you can join the files together to practice SQL/joins, etc.

Note that this dataset isn't perfect. By design, there are some issues that are present. It is primarily designed as a
teaching data set - to teach human resources professionals how to work with data and analytics.

Content
We have reduced the complexity of the dataset down to a single data file (v14). The CSV revolves around a fictitious
company and the core data set contains names, DOBs, age, gender, marital status, date of hire, reasons for
termination, department, whether they are active or terminated, position title, pay rate, manager name, and
performance score.

Recent additions to the data include:

● Absences
● Most Recent Performance Review Date
● Employee Engagement Score

Acknowledgements
Dr. Carla Patalano provided the baseline idea for creating this synthetic data set, which has been used now by over
200 Human Resource Management students at the college. Students in the course learn data visualization
techniques with Tableau Desktop and use this data set to complete a series of assignments.
Inspiration
We've included some open-ended questions that you can explore and try to address through creating Tableau
visualizations, or R or Python analyses. Good luck and enjoy the learning!

● Is there any relationship between who a person works for and their performance score?
● What is the overall diversity profile of the organization?
● What are our best recruiting sources if we want to ensure a diverse organization?
● Can we predict who is going to terminate and who isn't? What level of accuracy can we achieve on this?
● Are there areas of the company where pay is not equitable?
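
As one way to start on the pay-equity question above, the sketch below compares median salary by department and sex. The column names (`Department`, `Sex`, `Salary`) and every value in the toy frame are assumptions made for illustration; check them against the codebook before running this on the real file.

```python
import pandas as pd

# Hypothetical toy data; the real dataset's columns and values may differ.
hr = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "Sales", "IT", "IT", "IT", "IT"],
    "Sex": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "Salary": [60000, 70000, 62000, 72000, 90000, 91000, 88000, 89000],
})

# Median salary per department, one column per sex.
median_pay = hr.groupby(["Department", "Sex"])["Salary"].median().unstack("Sex")

# Relative gap, as a percentage of the male median.
median_pay["gap_pct"] = (median_pay["M"] - median_pay["F"]) / median_pay["M"] * 100

print(median_pay.round(2))
```

A raw median gap like this is only a starting point: it does not control for position, tenure, or performance, so a large value flags an area to investigate rather than proving inequity.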

There are so many other interesting questions that could be addressed through this interesting data set. Dr.
Patalano and I look forward to seeing what we can come up with.

If you have any questions or comments about the dataset, please do not hesitate to reach out to me on LinkedIn:
http://www.linkedin.com/in/RichHuebner

You can also reach me via email at: [email protected]

HOW TO BUILD IT: PROJECT GUIDELINE USING ML

Below is a guide for a Human Resources machine learning project. The project
builds a model in Python to predict employee turnover, also known as employee
attrition. The code uses Pandas for data processing, Scikit-Learn for model
building, and Matplotlib and Seaborn for visualization.

Human Resources Machine Learning Project: Predicting Employee Turnover

Objective:

To predict whether an employee will leave the company (attrition) based on various
features such as age, job satisfaction, salary, etc.

Step-by-Step Guide

1. Data Collection and Preparation

For this project, we'll use a sample dataset. You can use an HR analytics dataset from
sources like Kaggle or any other dataset you have.

Sample Data:

EmployeeID,Age,Gender,Department,Position,YearsAtCompany,JobSatisfaction,Salary,Attrition

1,30,Male,Sales,Manager,5,4,75000,No
2,28,Female,Marketing,Executive,3,3,65000,Yes
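
To try the later steps without a file on disk, the sample rows can be parsed from an in-memory string. This is a minimal sketch: the `SAMPLE_CSV` and `expected` names are introduced here for illustration, and with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# The two sample rows from the guide, held in memory instead of a file.
SAMPLE_CSV = """EmployeeID,Age,Gender,Department,Position,YearsAtCompany,JobSatisfaction,Salary,Attrition
1,30,Male,Sales,Manager,5,4,75000,No
2,28,Female,Marketing,Executive,3,3,65000,Yes
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Confirm that the columns the later steps rely on are present.
expected = {
    "EmployeeID", "Age", "Gender", "Department", "Position",
    "YearsAtCompany", "JobSatisfaction", "Salary", "Attrition",
}
missing = expected - set(df.columns)
print("Missing columns:", missing or "none")
print(df.dtypes)
```

Two rows are enough to check the schema but far too few to train on; a real run needs the full dataset.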

2. Load and Explore the Data

# Importing necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset

df = pd.read_csv('employees.csv')

# Display the first few rows of the dataset

print(df.head())

# Check for missing values

print(df.isnull().sum())

# Summary statistics
print(df.describe())
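
Before modelling, it usually helps to look at how balanced the target is, since attrition data tends to have far fewer leavers than stayers. The sketch below uses a small made-up frame in place of the loaded dataset; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame standing in for the loaded dataset.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "Marketing", "Marketing", "IT", "IT"],
    "Attrition": ["No", "Yes", "No", "No", "No", "Yes"],
})

# Share of each class in the target column.
class_share = df["Attrition"].value_counts(normalize=True)
print(class_share)

# Attrition rate per department (fraction of rows marked "Yes").
rate_by_dept = df["Attrition"].eq("Yes").groupby(df["Department"]).mean()
print(rate_by_dept)
```

If one class dominates, plain accuracy can look good even for a model that never predicts attrition, which is worth keeping in mind at the evaluation step.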

3. Data Preprocessing
# Encode categorical variables
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})
df['Attrition'] = df['Attrition'].map({'No': 0, 'Yes': 1})

# One-hot encoding for department and position

df = pd.get_dummies(df, columns=['Department', 'Position'], drop_first=True)

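# Drop identifier columns that carry no predictive signal.
# The sample data has no 'Name' column, so errors='ignore' skips it when absent.

df = df.drop(columns=['EmployeeID', 'Name'], errors='ignore')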

# Separate features and target variable

X = df.drop('Attrition', axis=1)
y = df['Attrition']
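
The preprocessing steps above can be gathered into one function, sketched below. The `preprocess` name and the validation check are additions for illustration, not part of the original guide: `.map()` silently turns any label outside its dictionary into NaN, so the sketch raises an error instead of passing NaN to the model. The third row of `raw` is made up.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Encode the sample schema and return (X, y)."""
    df = df.copy()
    df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
    df["Attrition"] = df["Attrition"].map({"No": 0, "Yes": 1})

    # Missing or unmapped labels become NaN after .map(); fail loudly.
    unmapped = df[["Gender", "Attrition"]].isnull().any()
    if unmapped.any():
        raise ValueError(f"Missing or unmapped labels in: {list(unmapped[unmapped].index)}")

    df = pd.get_dummies(df, columns=["Department", "Position"], drop_first=True)
    df = df.drop(columns=["EmployeeID", "Name"], errors="ignore")
    return df.drop(columns="Attrition"), df["Attrition"]


# Sample rows from the guide plus one hypothetical row.
raw = pd.DataFrame({
    "EmployeeID": [1, 2, 3],
    "Age": [30, 28, 41],
    "Gender": ["Male", "Female", "Male"],
    "Department": ["Sales", "Marketing", "Sales"],
    "Position": ["Manager", "Executive", "Executive"],
    "YearsAtCompany": [5, 3, 10],
    "JobSatisfaction": [4, 3, 2],
    "Salary": [75000, 65000, 82000],
    "Attrition": ["No", "Yes", "No"],
})

X, y = preprocess(raw)
print(X.columns.tolist())
print(y.tolist())
```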

4. Split the Data into Training and Testing Sets

from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
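
When leavers are a small minority, a purely random split can leave the test set with very few of them. One option, sketched here on synthetic data, is to pass `stratify=y` so both splits keep the same class ratio. The data below is generated for illustration only and the 20% attrition rate is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 20% leavers (made-up proportions).
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "Age": rng.integers(22, 60, size=100),
    "Salary": rng.integers(40_000, 120_000, size=100),
})
y = pd.Series([1] * 20 + [0] * 80, name="Attrition")

# stratify=y keeps the share of leavers equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Train attrition rate:", y_train.mean())
print("Test attrition rate:", y_test.mean())
```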

5. Build and Train the Model

We'll use a Random Forest classifier for this project.

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Initialize the model

model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model

model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

print('Classification Report:')
print(classification_report(y_test, y_pred))

print('Confusion Matrix:')
print(confusion_matrix(y_test, y_pred))
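
A single accuracy number is hard to interpret on its own. The sketch below compares the forest against a majority-class baseline using 5-fold cross-validation, and also reports F1 for the attrition class. The data and the rule that generates the labels are invented purely to make the example runnable.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data with a made-up rule: short tenure plus low satisfaction => leaves.
rng = np.random.default_rng(0)
n = 300
years = rng.integers(0, 15, size=n)
satisfaction = rng.integers(1, 6, size=n)
X = np.column_stack([years, satisfaction])
y = ((satisfaction <= 2) & (years < 5)).astype(int)

baseline = DummyClassifier(strategy="most_frequent")
forest = RandomForestClassifier(n_estimators=50, random_state=42)

# Mean score across 5 folds for each model and metric.
baseline_acc = cross_val_score(baseline, X, y, cv=5, scoring="accuracy").mean()
forest_acc = cross_val_score(forest, X, y, cv=5, scoring="accuracy").mean()
forest_f1 = cross_val_score(forest, X, y, cv=5, scoring="f1").mean()

print(f"Baseline accuracy: {baseline_acc:.2f}")
print(f"Forest accuracy:   {forest_acc:.2f}")
print(f"Forest F1:         {forest_f1:.2f}")
```

If the forest barely beats the baseline on real data, the features probably carry little signal about attrition, whatever the headline accuracy says.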

6. Feature Importance

# Get feature importances

importances = model.feature_importances_
feature_names = X.columns
feature_importances = pd.DataFrame({'Feature': feature_names, 'Importance': importances})
feature_importances = feature_importances.sort_values(by='Importance', ascending=False)

# Plot feature importances

plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=feature_importances)
plt.title('Feature Importances')
plt.show()
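
The impurity-based importances above are computed on the training data and can favor features with many distinct values. As a cross-check, and as a different technique from the one in the guide, scikit-learn's `permutation_importance` measures how much the test score drops when one feature is shuffled. The sketch uses synthetic data with one informative column and one pure-noise column; both are made up.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: the label depends only on JobSatisfaction; Noise is random.
rng = np.random.default_rng(1)
n = 400
X = pd.DataFrame({
    "JobSatisfaction": rng.integers(1, 6, size=n),
    "Noise": rng.normal(size=n),
})
y = (X["JobSatisfaction"] <= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_train, y_train)

# Shuffle each feature 5 times on the held-out set and record the score drop.
result = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=42)
perm = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(perm)
```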

7. Model Deployment

In a real-world scenario, you would save the model and deploy it using a framework like Flask or
Django for making predictions on new data.

import joblib
# Save the model
joblib.dump(model, 'employee_attrition_model.pkl')

# Load the model

loaded_model = joblib.load('employee_attrition_model.pkl')

# Make predictions with the loaded model

new_predictions = loaded_model.predict(X_test)
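
Predicting on genuinely new records needs the same columns, in the same order, that the model saw during training. One-hot encoding a single new row produces only the categories present in that row, so the sketch below reindexes it to the training columns. The training rows and the new employee are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data, encoded the same way as in the guide.
train = pd.DataFrame({
    "Age": [30, 28, 45, 35, 23, 52],
    "Department": ["Sales", "Marketing", "Sales", "IT", "IT", "Marketing"],
    "Attrition": [0, 1, 0, 0, 1, 0],
})
X_train = pd.get_dummies(
    train.drop(columns="Attrition"), columns=["Department"], drop_first=True, dtype=int
)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X_train, train["Attrition"])

# A new record yields only the dummy columns for its own categories.
new_employee = pd.DataFrame({"Age": [29], "Department": ["Sales"]})
X_new = pd.get_dummies(new_employee, columns=["Department"], dtype=int)

# Align to the training layout: absent dummies become 0, extras are dropped.
X_new = X_new.reindex(columns=X_train.columns, fill_value=0)

prediction = model.predict(X_new)
print(X_new)
print("Predicted attrition:", prediction[0])
```

In practice it is convenient to save the training column list alongside the model file so the serving code can perform the same alignment.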

Summary

In this project, we built a machine learning model to predict employee attrition using a Random
Forest classifier. We started by loading and exploring the dataset, followed by data
preprocessing, model building, training, and evaluation. Finally, we analyzed feature importances
and discussed the steps for model deployment.

This project can be further enhanced by:

● Tuning hyperparameters using GridSearchCV.

● Trying different machine learning algorithms.
● Incorporating additional features.
● Building a more sophisticated model evaluation process.
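
As an illustration of the first suggestion, here is a minimal `GridSearchCV` sketch. The grid values, the F1 scoring choice, and the synthetic data are all assumptions picked to keep the example small; a real search would use the project's own features and a wider grid.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data: column 0 is age, column 1 is job satisfaction (made up).
rng = np.random.default_rng(7)
n = 200
X = np.column_stack([
    rng.integers(22, 60, size=n),
    rng.integers(1, 6, size=n),
])
y = (X[:, 1] <= 2).astype(int)  # invented rule: low satisfaction => leaves

# Small illustrative grid; every combination is scored with 3-fold CV.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [2, None],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="f1",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated F1: {search.best_score_:.2f}")
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the winning parameters, so it can be used in place of `model` in the earlier steps.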

Feel free to expand on this foundation based on your specific requirements and data availability.

