Project Report

Indian Institute of Information Technology Raichur

Salary Prediction

Independent Project
(Course Code: ID151)
By

Nikhil Nagar
Roll No : AD23B1035
TABLE OF CONTENTS
1) Introduction
2) Objectives
3) Tools and Methods Used
4) Literature Review
5) Features and Functionality
6) Future Enhancements
7) Data Visualization
8) Conclusion

1. Introduction
The Salary Prediction Project aims to leverage the power of
machine learning to provide reliable estimates of salaries based
on a comprehensive set of factors. By incorporating variables
such as skills, country of employment, experience level, and
educational background, this project endeavors to offer valuable
insights into the intricate dynamics influencing salary
determinations across various industries and geographic regions.

For job seekers, having a clear understanding of their expected
salary enables informed negotiations, career planning, and overall
financial stability. On the other hand, employers benefit from
accurate salary predictions by ensuring fair compensation
practices, attracting top talent, and optimizing budget allocations
for human resources.

2. Objectives

The primary objective of this project is to develop a robust machine learning
model capable of accurately predicting salaries based on multiple factors. By
analyzing a diverse range of features including skills, country, experience,
and education, the model aims to provide actionable insights into salary
trends and patterns within specific job markets.

Job Seekers: Job seekers can benefit from the insights generated by the
salary prediction model to make informed decisions about their career paths.
By having access to accurate salary estimates based on factors such as
skills, experience, and education, job seekers can negotiate better
compensation packages and plan their career progression more effectively.

Employers: Employers can use the predictive model to ensure fair and
competitive compensation practices within their organizations. By
understanding the factors that influence salary outcomes, employers can
optimize salary structures, attract top talent, and retain valuable employees.

HR Professionals: Human resources professionals can leverage the
predictive model to streamline recruitment and hiring processes. By
accurately predicting salaries for different job roles, HR professionals can set
realistic salary expectations.

3. Tools and Methods Used

Programming Language: Python
Libraries:
pandas: Data manipulation and analysis (used for loading CSV
data, cleaning, and creating DataFrames).
numpy: Numerical computations (used for mathematical
operations and array manipulations).
scikit-learn: Machine learning algorithms and utilities, including:
  LabelEncoder: Converts categorical values into integer labels
  for machine learning algorithms.
  train_test_split: Splits the data into training and testing sets for
  model training and evaluation.
  RandomForestRegressor: Ensemble learning method that
  averages predictions from multiple decision trees for improved
  accuracy and robustness.
  DecisionTreeRegressor: Tree-based model that makes predictions
  by following a series of decision rules based on feature values.
  LinearRegression: Fits a linear relationship between the features
  and the target variable (salary) for prediction.
  GridSearchCV: Performs an exhaustive, cross-validated grid
  search over a specified parameter space to find the best
  hyperparameters for the chosen model.
  mean_squared_error: Calculates the average squared difference
  between predicted and actual values, used to evaluate model
  performance.
xgboost: Gradient boosting library, separate from scikit-learn.
  XGBRegressor: Gradient boosting algorithm that combines
  multiple weak decision trees into a strong learner for improved
  prediction performance.
matplotlib: Data visualization library for creating plots and charts.
pickle: Python standard-library module for saving and loading the
trained model and encoders for future use.

4. Literature Review
Salary prediction is a well-established field within machine
learning and human resources. Numerous studies have explored
various algorithms and feature sets to achieve accurate salary
estimations. Common approaches include:
Linear Regression: This is a simple and interpretable model that
establishes a linear relationship between features (e.g.,
experience, education) and salary. However, it may not capture
complex non-linear relationships present in real-world data.
Decision Trees and Random Forests: These algorithms build tree-
like structures where each node represents a decision rule based
on a specific feature. Random forests combine predictions from
multiple decision trees, leading to improved accuracy and
reduced overfitting.
Gradient Boosting Techniques (XGBoost): These algorithms
iteratively build an ensemble of models, where each model learns
to improve upon the errors of the previous one. XGBoost is a
popular choice for salary prediction due to its ability to handle
complex relationships and high performance.
The choice of algorithm depends on the specific dataset, desired
model interpretability, and computational resources available.

5. Features and Functionality

Data Preprocessing:
Data Loading: The code uses the pandas.read_csv function to
load the salary dataset from a CSV file.
Data Cleaning: This step can involve removing irrelevant
columns or rows with missing values and fixing inconsistencies in
formatting (e.g., removing currency symbols from salary entries
using regular expressions, as done in the project code).
Depending on the data quality, imputation can be considered as
an alternative to removing rows with too many missing values.
Label Encoding: Categorical features like job title, skills,
country, and education are converted into integer labels using
LabelEncoder from scikit-learn so that the machine learning
algorithms can consume them. The integers carry no real
ordering, which tree-based models tolerate but which a linear
model may misread as a ranking.
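
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the inline CSV sample, the column names (job_title, country, education, experience, salary), and the choice to drop incomplete rows are all assumptions made for the example.

```python
import io

import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample standing in for the salary CSV file;
# the column names and values are assumptions for illustration.
csv_text = """job_title,country,education,experience,salary
Data Scientist,India,Master,3,"$12,000"
Web Developer,Germany,Bachelor,5,"$55,000"
Data Scientist,Germany,Master,,"$70,000"
Web Developer,India,Bachelor,2,"$9,500"
"""

# Data loading: read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Data cleaning: keep only digits and the decimal point, then convert to float.
df["salary"] = df["salary"].str.replace(r"[^\d.]", "", regex=True).astype(float)

# Drop rows with missing values (imputation would be the alternative).
df = df.dropna().reset_index(drop=True)

# Label encoding: fit one encoder per categorical column and keep each
# encoder so the same mapping can be reused at prediction time.
encoders = {}
for col in ["job_title", "country", "education"]:
    enc = LabelEncoder()
    df[col] = enc.fit_transform(df[col])
    encoders[col] = enc

print(df)
```

Keeping the fitted encoders in a dictionary is one simple way to make sure new inputs are mapped to the same integers that the model saw during training.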
Model Training and Evaluation:
1) Training-Testing Split: The dataset is divided into two sets using
train_test_split. The training set (typically 70-80% of the data) is used to train
the model, and the testing set (the remaining 20-30%) is used to evaluate its
performance on unseen data.

2) Model Selection and Hyperparameter Tuning: Multiple
machine learning algorithms (Random Forest, Decision Tree, XGBoost, Linear
Regression) are trained on the training set. GridSearchCV can
be used to explore different hyperparameter combinations for each
algorithm, scoring every combination by cross-validation, to find the
best-performing configuration. Metrics like RMSE (Root
Mean Squared Error) are used to compare model performance. The model
with the lowest RMSE on the testing set is chosen as the final prediction
model.

3) Model Training: The chosen model (e.g., XGBoost) is trained on
the entire training set using the optimized hyperparameters.

4) Model Evaluation:
The trained model is evaluated on the testing set.
The code calculates RMSE as the square root of the value returned by
mean_squared_error from scikit-learn to
assess the difference between predicted and actual salaries.
Additionally, data visualization techniques using matplotlib can be employed
to create scatter plots comparing predicted vs. actual salaries. This helps
identify potential biases or outliers in the model's predictions.
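
The four steps above might look roughly like the sketch below. It is illustrative only: the data is synthetic (generated with make_regression as a stand-in for the encoded salary features), the hyperparameter grid is an arbitrary small example, and XGBoost is left out so that the snippet depends on scikit-learn and numpy alone.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded salary dataset (an assumption).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# 1) Training-testing split: 80% for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)


def rmse(model):
    """Root mean squared error of a fitted model on the held-out test set."""
    predictions = model.predict(X_test)
    return float(np.sqrt(mean_squared_error(y_test, predictions)))


# 2) Model selection: fit each candidate on the training set only.
candidates = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = rmse(model)

# Hyperparameter tuning: 3-fold cross-validated grid search on the training
# set. With the default refit=True, step 3 (retraining the best configuration
# on the entire training set) happens automatically.
grid = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_depth": [None, 5]},
    scoring="neg_root_mean_squared_error",
    cv=3,
)
grid.fit(X_train, y_train)

# 4) Evaluation: compare every model by its RMSE on the test set.
scores["forest_tuned"] = rmse(grid.best_estimator_)
best_name = min(scores, key=scores.get)
print(scores)
print("lowest test RMSE:", best_name)
```

Because the final model is picked by its test-set RMSE, that same number is a slightly optimistic estimate of future error; holding out a separate validation set is a common way to avoid this.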

Prediction:
• Saving the Model and Encoders: The trained model and label encoders are saved using
pickle for future use. This avoids retraining the model on the entire dataset every
time a new prediction is needed.
• Loading Saved Model and Encoders: When a new salary prediction is required, the saved
model and encoders are loaded using pickle.
• Preprocessing New Data: New data points with features like job title, skills, experience,
education, and country are prepared by applying the same preprocessing steps as during
training (e.g., encoding categorical features using the loaded encoders).
• Making Predictions: The preprocessed new data point is fed to the loaded model, and the
model predicts the corresponding salary.
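
A minimal sketch of this save, load, and predict cycle is shown below. The toy data, the file name salary_model.pkl, the helper predict_salary, and the use of a decision tree are all assumptions chosen to keep the example small; they are not taken from the project code.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor

# Toy training data (hypothetical): country, years of experience, salary.
countries = np.array(["India", "Germany", "India", "Germany"])
experience = np.array([1, 2, 5, 6])
salary = np.array([10000.0, 40000.0, 20000.0, 60000.0])

country_enc = LabelEncoder()
X = np.column_stack([country_enc.fit_transform(countries), experience])
model = DecisionTreeRegressor(random_state=0).fit(X, salary)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "salary_model.pkl")  # hypothetical file name

    # Saving: store the model and its encoders together in one file.
    with open(path, "wb") as f:
        pickle.dump({"model": model, "encoders": {"country": country_enc}}, f)

    # Loading: restore both objects without retraining.
    with open(path, "rb") as f:
        saved = pickle.load(f)

loaded_model = saved["model"]
loaded_enc = saved["encoders"]["country"]


def predict_salary(country, years):
    """Encode a new data point with the loaded encoder and predict its salary."""
    code = loaded_enc.transform([country])[0]
    return float(loaded_model.predict(np.array([[code, years]]))[0])


prediction = predict_salary("Germany", 6)
print(prediction)
```

Two practical caveats apply: LabelEncoder raises a ValueError for a category it did not see during training, so new inputs need validation, and pickle files should only be loaded from trusted sources because unpickling can execute arbitrary code.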
6. Future Enhancements

Feature Engineering Exploration: Explore more advanced feature
engineering techniques, such as one-hot encoding for categorical features or
feature scaling for numerical features, to potentially improve model
performance.

Data Augmentation: If the dataset is limited, consider data augmentation
techniques (e.g., generating synthetic data points) to increase the amount of
training data and potentially improve model generalizability.

Hyperparameter Tuning Optimization: Experiment with different
hyperparameter tuning techniques beyond GridSearchCV, such as
RandomizedSearchCV or Bayesian optimization, to potentially find even
better hyperparameter configurations.

Additional Features: Consider incorporating additional features that
capture more comprehensive information about job roles (e.g., company
size, industry, required certifications) if relevant data is available.
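
Two of the enhancements listed above, one-hot encoding with feature scaling and randomized hyperparameter search, could be combined along the following lines. This is only a sketch under stated assumptions: the data is randomly generated, the column names and parameter ranges are invented for the example, and a random forest stands in for whichever model is finally chosen.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data (an assumption): two categorical and one numeric feature.
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "country": rng.choice(["India", "Germany", "USA"], size=n),
    "education": rng.choice(["Bachelor", "Master"], size=n),
    "experience": rng.integers(0, 15, size=n),
})
y = 20000 + 3000 * df["experience"] + rng.normal(0, 1000, size=n)

# One-hot encode the categorical columns and scale the numeric one.
# handle_unknown="ignore" encodes unseen categories as all zeros
# instead of raising an error.
preprocess = ColumnTransformer([
    ("onehot", OneHotEncoder(handle_unknown="ignore"), ["country", "education"]),
    ("scale", StandardScaler(), ["experience"]),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", RandomForestRegressor(random_state=0)),
])

# Randomized search samples n_iter combinations rather than trying them all.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={
        "model__n_estimators": [20, 40, 60],
        "model__max_depth": [None, 3, 6],
    },
    n_iter=4,
    cv=3,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(df, y)
print(search.best_params_)
```

Putting the preprocessing inside the pipeline means the encoder and scaler are refit on each cross-validation fold, which keeps information from the validation fold out of the training step.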

7. Data Visualization

8. Conclusion
The developed salary prediction model demonstrates the power
of machine learning in estimating salaries based on job-related
information. While the model has limitations (e.g., may not
capture all factors influencing salary), it can be a valuable tool for
both individuals and organizations. By incorporating future
enhancements and data visualization, the model's accuracy and
usefulness can be further improved.
