Salary Estimation Using K-Nearest Neighbour

The document outlines a process for predicting whether a job applicant's salary is above or below 50K based on various features such as age, education, capital gain, and hours worked per week. It details steps including data collection, dataset loading, feature mapping, dataset segregation, and scaling to ensure equal contribution of features. The K-Nearest Neighbor algorithm is employed for classification, with emphasis on finding the optimal number of neighbors and validating the model's accuracy.

1. Finding the Problem - Application: Company HR

To Predict: whether a job applicant gets a salary above 50K or not, from previous records.
Input: Age, Education Number, Capital Gain & Hours/week
Output: Salary above/below 50K

2. Collecting Dataset

Based on Age, Education Number, Capital Gain and Hours per week, estimate whether the salary is above 50K or below 50K.

3. Load & Summarize Dataset

Load the CSV-format dataset from the directory with pandas and summarize details such as the number of rows and columns and the content.

import pandas
dataset = pandas.read_csv('dataset.csv')   # load CSV format dataset
dataset.shape                              # no. of rows & columns
dataset.head(5)                            # display first 5 rows of the dataset

4. Mapping Data from Text to Binary Numbers

Function: .map
Since the salary data is text of the kind <=50K or >50K, we need to map <=50K as 0 and >50K as 1, as sketched below.
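
A minimal sketch of this mapping step, assuming the label column is named 'income' (the actual column name depends on the CSV):

dataset['income'] = dataset['income'].map({'<=50K': 0, '>50K': 1})   # text labels -> 0/1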

5. Segregating Dataset into X & Y

iloc - helps us select values that belong to particular rows or columns.
SYNTAX: dataset.iloc[:, start_col:end_col]

X = dataset.iloc[:, :-1].values   # all columns except the last (features)
Y = dataset.iloc[:, -1].values    # the last column (label)

6. Splitting Dataset into Train & Test

Useful for validation.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

7. Feature Scaling

PROBLEM: Since the features have different scales, there is a chance that higher weightage is given to features with higher magnitude. This will impact the performance of the machine learning algorithm, and obviously we do not want our algorithm to be biased towards one feature.

SOLUTION: We scale our data to make all the features contribute equally to the result.

Types:

Standardization - the values are centered around the mean with a unit standard deviation. This means that the mean of the attribute becomes zero and the resultant distribution has a unit standard deviation.

Normalization - a scaling technique in which values are shifted and rescaled so that they end up ranging between 0 and 1. It is also known as Min-Max scaling.
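
A minimal standardization sketch with scikit-learn's StandardScaler, assuming the train/test split from step 6 (the fitted scaler sc is reused for new data in step 12):

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)   # fit on training data, then scale it
X_test = sc.transform(X_test)         # scale test data with the same mean/std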

8. Algorithm: K-Nearest Neighbor

Based on the Minkowski distance metric, we classify the data points:
p = 1 , Manhattan Distance
p = 2 , Euclidean Distance

Euclidean Distance - calculated as the square root of the sum of the squared differences between a new point (x) and an existing point (y).

Manhattan Distance - the distance between real vectors, calculated as the sum of their absolute differences.

Hamming Distance - used for categorical variables. If the value (x) and the value (y) are the same, the distance D is equal to 0; otherwise D = 1.
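
A sketch of the classifier setup matching these notes; metric = 'minkowski' with p = 2 gives Euclidean distance, and the n_neighbors value here is only a placeholder until step 9 finds the best K:

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)   # placeholder K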

9. Finding the Best Number of Neighbors (K-Value)

Choose the K value where we get the least mean error. From the error plot (figure not reproduced here), we can observe that for K values in the range 15 to 35 the mean error is low.
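
A sketch of the usual K-search loop, assuming the scaled split from the earlier steps; it records the mean error on the test set for each candidate K:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
error = []
for k in range(1, 40):
    knn = KNeighborsClassifier(n_neighbors = k)
    knn.fit(X_train, y_train)
    pred = knn.predict(X_test)
    error.append(np.mean(pred != y_test))   # fraction of misclassified test points
best_k = error.index(min(error)) + 1        # K with the least mean error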

10. Training

Training our model on the pre-processed dataset.

model.fit(X_train, y_train)

11. Validation

Obtaining the accuracy of the model with a confusion matrix.
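
A minimal validation sketch using scikit-learn's metrics module:

from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = model.predict(X_test)
print(confusion_matrix(y_test, y_pred))   # rows = actual class, columns = predicted class
print(accuracy_score(y_test, y_pred))     # fraction of correct predictions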

12. Prediction

Observing how our model classifies our new data.

result = model.predict(sc.transform(newEmp))
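
For example, with a hypothetical new applicant (the values and the feature order Age, Education Number, Capital Gain, Hours/week are illustrative only):

newEmp = [[40, 10, 0, 45]]   # hypothetical applicant features
result = model.predict(sc.transform(newEmp))
print("Salary above 50K" if result[0] == 1 else "Salary 50K or below")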
