Salary Data Analysis - Phase 1

This document outlines a project focused on analyzing a salary dataset to predict employee salaries based on various factors such as age, gender, education level, job title, and years of experience. The project aims to enhance analytical skills and provide insights for organizations to optimize their human resource practices and address potential inequities in compensation. Key components include data preparation, model building, and evaluation, with hypotheses formulated to guide the analysis.

Uploaded by

Adeel Manaf

Data Analytics

CIT-466-502

Student Name – ID

Data Understanding Phase 1: Salary Prediction Dataset

Table of Contents
1. Framing the Problem
1.1 Introduction
1.2 Understanding the Business Domain
1.3 Purpose
1.4 Project Resources
1.5 Formulating Hypothesis
Salary Data Analysis
In this project, we will apply analytical techniques to a salary dataset. We aim to clean and
explore the dataset, create visualizations to generate insights, build a predictive model, and
evaluate it by the end of the project. This work will strengthen our analytical, visualization, and
practical skills in addressing real-world problems. The sections below frame the problem in
detail.

1. Framing the Problem

1.1 Introduction
In this project, we will explore a salary dataset to understand the factors that have a significant
impact on employee salaries, and then build a regression model to predict the salary of
employees. Businesses today increasingly focus on strategic workforce planning and equitable
compensation, so the analysis of salary data has become more important for optimizing human
resource practices. We will use statistical analysis along with a regression model to
analyze, visualize, and model the factors associated with employee salaries. By exploring the
relationship between salary and factors such as age, gender, education level, job
title, and years of experience, we want to generate useful insights that can help companies with
employee hiring, retention, and compensation strategies.

We will follow a systematic approach to complete this project. This approach includes data
preparation, model planning, model building, and evaluation of the model's performance.
The insights generated from this project can help organizations not only offer
market-competitive salaries but also identify potential inequities or gender disparities. Addressing
such inequities helps organizations improve employee satisfaction and retention and can
also enhance the company’s reputation.

1.2 Understanding the Business Domain

After reviewing the data to understand the business domain, we determined that the
dataset describes the salaries of employees at a company. The dataset's dimensions are (375, 7),
which means there are 375 rows and 7 variables. Each row represents a different employee, and
the columns include information such as age, gender, education level, job title, years of
experience, and salary.

Age: This column represents the age of each employee in years. The values in this column are
numeric.
Gender: This column contains the gender of each employee, which can be either male or female.
The values in this column are categorical.

Education Level: This column contains the educational level of each employee, which can be
high school, bachelor's degree, master's degree, or PhD. The values in this column are
categorical.

Job Title: This column contains the job title of each employee. The job titles can vary depending
on the company and may include positions such as manager, analyst, engineer, or administrator.
The values in this column are categorical.

Years of Experience: This column represents the number of years of work experience of each
employee. The values in this column are numeric.

Salary: This column represents the annual salary of each employee in US dollars. The values in
this column are numeric and can vary depending on factors such as job title, years of experience,
and education level.
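
The column descriptions above can be turned into a simple type check before any analysis begins. The following is a minimal sketch in Python, used here only for illustration (the project itself names RStudio as its tool). The column labels, the `EXPECTED_TYPES` mapping, the helper functions, and the two sample rows are all assumptions made for this example and are not taken from the actual dataset.

```python
# Illustrative sketch: column names and sample rows are assumptions,
# not values from the real salary dataset.
EXPECTED_TYPES = {
    "Age": "numeric",
    "Gender": "categorical",
    "Education Level": "categorical",
    "Job Title": "categorical",
    "Years of Experience": "numeric",
    "Salary": "numeric",
}


def is_numeric(value):
    """Return True if the value can be parsed as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def check_row(row):
    """Return a list of problems found in one record (empty list means OK)."""
    problems = []
    for column, kind in EXPECTED_TYPES.items():
        value = row.get(column)
        if value is None or str(value).strip() == "":
            problems.append(f"{column}: missing")
        elif kind == "numeric" and not is_numeric(value):
            problems.append(f"{column}: expected a number, got {value!r}")
    return problems


# Two made-up records: the first is clean, the second has a bad age
# and a missing salary.
sample_rows = [
    {"Age": "32", "Gender": "Male", "Education Level": "Bachelor's",
     "Job Title": "Analyst", "Years of Experience": "5", "Salary": "90000"},
    {"Age": "n/a", "Gender": "Female", "Education Level": "PhD",
     "Job Title": "Manager", "Years of Experience": "12", "Salary": ""},
]

for index, row in enumerate(sample_rows):
    print(index, check_row(row))
```

A check of this kind would flag missing or non-numeric entries early, so that they can be handled during data preparation rather than surfacing later during modeling.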

From a business point of view, the dataset consists of variables that can provide
valuable insights for strategic workforce planning and human resources
optimization. We can analyze variables such as age, gender, education level, job title, and years of
experience alongside salary to identify key trends that can inform hiring,
training, and employee retention strategies. Understanding the
relationships between salary and the other variables can help businesses assess whether their
current salary structure aligns with industry standards and employee expectations, and
ensure that compensation practices are both competitive and fair.

We can also analyze salary, job title, and experience by gender to reveal
any gaps or biases in compensation and promotion practices, which will enable
companies to create targeted initiatives and equal opportunities for male and female employees.
By identifying discrepancies in salary by gender or by experience level, the organization can take
proactive steps to address potential inequities, which will help strengthen the company’s
reputation as an inclusive and fair employer.

1.3 Purpose
The purpose of this project is to predict the salary of an employee based on the different
factors provided in the dataset. Since the output variable, salary, is numeric and continuous,
regression is a suitable technique for this problem.
For this project, building a model that predicts employee salaries accurately
is crucial. Organizations can use such a model to support rapid decision-making, operational
efficiency, and retention efforts, to identify potential salary disparities,
and to make sure that compensation practices promote both fairness and competitiveness. The
insights generated from this dataset can help organizations attract and retain top talent, offer
competitive salary packages, and enhance employee satisfaction.
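
To illustrate what a regression on a continuous target involves, the sketch below fits a one-predictor least-squares line (salary against years of experience) and reports the coefficient of determination. It is written in Python purely for illustration, whereas the project names RStudio as its tool; the function names and the five data points are made up for this example and are not results from the dataset. The actual model may use several predictors.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up illustrative data, not taken from the salary dataset.
years_of_experience = [1, 3, 5, 8, 12]
salary = [42000, 48000, 61000, 76000, 93000]

intercept, slope = fit_simple_ols(years_of_experience, salary)
print("intercept:", round(intercept, 2))
print("slope per year of experience:", round(slope, 2))
print("R^2:", round(r_squared(years_of_experience, salary, intercept, slope), 4))
```

The slope can be read as the estimated change in salary per additional year of experience, and R^2 gives a first indication of how well the line fits.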

1.4 Project Resources

Project resources are an important component in managing and executing projects effectively.
Therefore, it’s essential to consider important resources such as data, technical, and human
resources.

Data Resources:
● The historical employee salary data that will be used for analysis and building a
regression model.
Technical Resources:
● Data Storage
● Computation Resources and Computer
● RStudio, which will be used for data cleaning, data analysis, data visualization, and
regression modeling
Human Resources:
● Data Engineer: Responsible for cleaning and transforming data to make it ready and
suitable for further analysis.
● Data Scientist: Responsible for building regression models to predict the salary of
employees.
● Project Manager: To make sure everything progresses according to a plan and to ensure
that the milestones and objectives are met in a timely manner and with high quality.

1.5 Formulating Hypothesis

Based on the dataset, the following hypotheses can be formulated:

1. Hypothesis 1: Higher education levels are associated with higher average salaries.

2. Hypothesis 2: Employees with more years of experience tend to have higher salaries.

3. Hypothesis 3: Salaries may not be evenly distributed across genders for senior-level
positions, which may suggest a potential gender disparity in leadership roles.

4. Hypothesis 4: Younger employees with advanced degrees have similar or higher salaries
compared to older employees with the same job titles and fewer formal qualifications.
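
As a first look at Hypotheses 1 and 2, average salary can be compared across education levels, and the linear association between experience and salary can be measured with a Pearson correlation. The sketch below shows both calculations in Python, used only for illustration (the project names RStudio as its tool). The field names, helper functions, and records are hypothetical and made up for the example. These are descriptive checks and not formal statistical tests.

```python
from collections import defaultdict
from math import sqrt


def mean_by_group(records, group_key, value_key):
    """Average of value_key for each distinct value of group_key."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for record in records:
        entry = totals[record[group_key]]
        entry[0] += record[value_key]
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative records, not taken from the salary dataset.
records = [
    {"education": "Bachelor's", "experience": 3, "salary": 50000},
    {"education": "Bachelor's", "experience": 6, "salary": 64000},
    {"education": "Master's", "experience": 5, "salary": 72000},
    {"education": "Master's", "experience": 9, "salary": 88000},
    {"education": "PhD", "experience": 10, "salary": 110000},
]

# Hypothesis 1: compare average salary across education levels.
print(mean_by_group(records, "education", "salary"))

# Hypothesis 2: strength of the linear link between experience and salary.
experience = [r["experience"] for r in records]
salaries = [r["salary"] for r in records]
print(round(pearson_r(experience, salaries), 3))
```

Hypotheses 3 and 4 would follow the same pattern by grouping on more than one attribute, for example gender within senior job titles.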
