
PACE INSTITUTE OF TECHNOLOGY AND SCIENCES

Machine Learning

Dr. G. Ganesh Naidu, M.E., Ph.D., PG (AI & ML) - IIT KGP


Professor
Department of Civil Engineering
[email protected]
9581456545
UNIT-2

K Nearest Neighbor Classification

Contents: Nearest Neighbor-Based Models: Introduction to Proximity Measures, Distance Measures, Non-Metric Similarity Functions, Proximity Between Binary Patterns, Different Classification Algorithms Based on the Distance Measures, K-Nearest Neighbor Classifier, Radius Distance Nearest Neighbor Algorithm, KNN Regression, Performance of Classifiers, Performance of Regression Algorithms.


Introduction to KNN

• Nearest Neighbor-based models are a class of machine learning algorithms that classify data points based on their similarity (or distance) to nearby data points.
• These models are widely used in classification, regression, and recommendation tasks.
• The core idea is simple: for a given test point, the model finds the most similar data points in the training set and uses their labels or values to make a prediction.
Nearest Neighbor Classifiers

• Basic idea:
  – If it walks like a duck and quacks like a duck, then it's probably a duck.

[Diagram: for a test record, compute its distance to the training records, then choose the k "nearest" records.]
Introduction to KNN
● The KNN classifier is a non-parametric, instance-based learning algorithm.
  ○ Non-parametric means it makes no assumptions about the distribution of the data, and so avoids the risk of misjudging the underlying distribution.
  ○ Instance-based learning means that the algorithm does not explicitly learn any parameters.
● For classification, the algorithm takes a majority vote among the K most similar training instances to a given "unseen" observation. K is a count.
● KNN is not suitable when the data are noisy and the target classes are not clearly demarcated in terms of attribute values.
● The closest class is identified using a distance measure such as Euclidean distance.
Introduction to Proximity Measures

• Proximity measures (or distance measures) are mathematical tools used to quantify the similarity or dissimilarity between two data points.
• These measures are critical in machine learning tasks such as clustering, classification, regression, anomaly detection, and recommendation systems.
• Different proximity measures are suited to different types of data and tasks. The most common proximity measures are discussed below with their mathematical definitions.
Distance measures

● Euclidean distance between any two points x and y in d dimensions:
  d(x, y) = [(x1 − y1)² + (x2 − y2)² + … + (xd − yd)²]^½

● Manhattan distance:
  d(x, y) = |x1 − y1| + |x2 − y2| + … + |xd − yd|
  (see the sketch below)
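A minimal sketch of these two measures with NumPy, using the points A = [3, 4] and B = [6, 8] from question 2 of the question bank:

```python
import numpy as np

x = np.array([3.0, 4.0])   # point A
y = np.array([6.0, 8.0])   # point B

euclidean = np.sqrt(np.sum((x - y) ** 2))   # square root of the sum of squared differences
manhattan = np.sum(np.abs(x - y))           # sum of absolute differences

print(euclidean)  # 5.0
print(manhattan)  # 7.0
```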
Distance Metrics
Non-Metric Similarity Functions

• Non-metric similarity functions quantify the closeness or resemblance between two objects without adhering to the mathematical properties of a metric, such as symmetry or the triangle inequality.
• These functions are often used in domains where metric-based distances are unsuitable, such as text, categorical data, or high-dimensional spaces.
Common Non-Metric Similarity Functions
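One commonly cited similarity function that is not a distance metric is cosine similarity, which measures the angle between two vectors rather than their magnitude. A minimal sketch with illustrative vectors (not taken from the slides):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (1 means identical direction)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])   # illustrative vectors
b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))  # 1.0, since b is a scaled copy of a
```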
Data standardization
● The distance formula is highly dependent on how the features / attributes / dimensions are measured.
● Dimensions with a larger possible range of values will dominate the result of a distance calculation using the Euclidean formula.
● To ensure all dimensions have a similar scale, we normalize the data on all dimensions / attributes.
● There are multiple ways of normalizing the data. We can use Z-score standardization (see the sketch below):
  z = (x − x̄) / s
  where x̄ is the mean of the variable and s is its standard deviation.
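A minimal sketch of Z-score standardization applied column-wise with NumPy (the small feature matrix is illustrative, and the sample standard deviation ddof=1 is assumed for s):

```python
import numpy as np

X = np.array([[58.0, 19.0],     # illustrative feature matrix (humidity, temperature)
              [62.0, 26.0],
              [87.0, 19.0]])

x_bar = X.mean(axis=0)          # mean of each variable
s = X.std(axis=0, ddof=1)       # sample standard deviation of each variable

Z = (X - x_bar) / s             # z = (x - x_bar) / s for every value
print(Z)
```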
KNN methodology

● Suppose we have a new instance called x.
● The algorithm calculates the distance between x and every instance in the training set.
● These distances are arranged in increasing order.
● The k nearest neighbors are found. If k = 3, the three nearest instances are selected based on the similarity measure.
● The k neighbors determine the class of x using majority voting among the closest instances (see the sketch below).
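A minimal NumPy sketch of the steps just listed (distance computation, sorting, taking the k nearest, majority vote); the function and variable names are illustrative:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Distance between x_new and every training instance (Euclidean)
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # 2. Sort the distances and take the indices of the k nearest neighbors
    nearest = np.argsort(distances)[:k]
    # 3. Majority vote among the labels of the k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Example: three training points, two classes
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]])
y = np.array([0, 0, 1])
print(knn_predict(X, y, np.array([1.1, 1.0]), k=3))  # 0
```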
KNN Example

● Humidity (independent variable): the percentage of humidity in the atmosphere.
● Temperature (independent variable): the temperature during precipitation.
● Rain (target variable): indicates whether it rained or not; takes value 1 if it rained and 0 otherwise.

Observation   Humidity   Temperature   Rain
1             58         19            0
2             62         26            0
3             40         30            0
4             36         35            0
5             87         19            1
6             93         18            1
7             79         16            1
8             69         17            1
9             62         33            0
10            71         15            1
11            55         33            0
12            78         19            1
KNN Example

● Let us choose K = 5 and use the Euclidean distance.
● Compute the Euclidean distance between the new observation and each instance in the training data.

New observation:   Humidity = 84,   Temperature = 37,   Rain = ?

● For example, for the first observation (Humidity = 58, Temperature = 19), the Euclidean distance is
  [(58 − 84)² + (19 − 37)²]^½ = 31.62
KNN Example

● The Euclidean distance is computed between the new observation and each instance, and the instances are sorted in ascending order of distance.

Observation   Euclidean distance (sorted)   Class label (Rain)
5             18.25                         1
12            18.97                         1
6             21.02                         1
7             21.59                         1
9             22.36                         0
2             24.60                         0
8             25.00                         1
10            25.55                         1
11            29.27                         0
1             31.62                         0
3             44.55                         0
4             48.04                         0

● Since K = 5, consider the class labels of the first five observations.
● 1 appears 4 times and 0 appears 1 time.
● Thus, using majority voting, the class label for the new instance is 1.
● This implies that for Humidity = 84 and Temperature = 37, it will rain.
Implementation of KNN
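As a minimal scikit-learn sketch of KNN on the rain example above (scikit-learn availability assumed; K = 5, and Euclidean distance is the library's default):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Training data from the rain example: [humidity, temperature] -> rain (0/1)
X = np.array([[58, 19], [62, 26], [40, 30], [36, 35], [87, 19], [93, 18],
              [79, 16], [69, 17], [62, 33], [71, 15], [55, 33], [78, 19]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1])

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5, Euclidean distance by default
knn.fit(X, y)
print(knn.predict([[84, 37]]))              # [1] -> it will rain
```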
K as a hyperparameter
● K can range from 1 to n, the number of training data points.
● The value of K can affect the performance of the classifier.
● K in KNN is a hyperparameter: it has to be discovered through iterations (see the sketch below).
● We can imagine K as a way of influencing the shape of the boundary between classes.
● A simple approach is to use K = √n, but it depends on the individual case; it is good to run through various values of K.
● Odd values of K help to avoid ties between predicted classes.

Small K:
● Noise will have a higher influence on the result, i.e., the probability of overfitting is very high.

Large K:
● Increases confidence in the prediction.
● But if K is too large, the decision may be skewed.
● Computationally expensive.
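A minimal sketch of tuning K by iterating over several values with cross-validation (scikit-learn assumed; the synthetic dataset is used only so the example runs end to end):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data, purely for illustration
X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)

# Try several (odd) values of K and keep the one with the best cross-validated accuracy
scores = {}
for k in range(1, 21, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```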
Let's answer some questions

1. In which of the following images will KNN fail to segregate the two classes denoted by red and green colors?
2. What happens if we build a KNN model with K = 1?
Problem

► For X1 = 3 and X2 = 7, what will be the classification? (A worked sketch follows below.)

X1 = Durability   X2 = Strength   Y = Classification
7                 7               Bad
7                 4               Bad
3                 4               Good
1                 4               Good
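A worked sketch for this problem using Euclidean distance; K = 3 is an assumption, since the slide does not state a value of K:

```python
import numpy as np
from collections import Counter

# Training data: (durability X1, strength X2) -> classification Y
X = np.array([[7, 7], [7, 4], [3, 4], [1, 4]])
y = np.array(["Bad", "Bad", "Good", "Good"])

x_new = np.array([3, 7])                        # X1 = 3, X2 = 7
d = np.sqrt(((X - x_new) ** 2).sum(axis=1))     # distances: 4.00, 5.00, 3.00, 3.61
nearest = np.argsort(d)[:3]                     # assumed K = 3 -> Good, Good, Bad
print(Counter(y[nearest]).most_common(1)[0][0]) # Good
```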
Principal Component Analysis

Numerical Example: calculation of principal components.

STEP 1: SAMPLE DATA SET

Let us consider X, a 3-variate dataset with 10 observations. Each observation consists of 3 measurements on a wafer: Thickness (x, 1st column), Horizontal displacement (y, 2nd column), and Vertical displacement (z, 3rd column).
Principal Component Analysis
STEP 2: COMPUTE THE CORRELATION MATRIX

First compute the correlation matrix.

      x           y           z
x     CC(x,x)     CC(x,y)     CC(x,z)
y     CC(y,x)     CC(y,y)     CC(y,z)
z     CC(z,x)     CC(z,y)     CC(z,z)

      x           y           z
x     1.00        0.668657    -0.10131
y     0.668657    1.00        -0.28793
z     -0.10131    -0.28793    1.00

• The correlation of a variable with itself is 1; for that reason, all the diagonal values are 1.00.
• Here, CC(x,y) means the correlation coefficient between x and y.
Principal Component Analysis
STEP 2: COMPUTE THE CORRELATION MATRIX

Let this correlation matrix be R:

      x           y           z
x     1.00        0.668657    -0.10131
y     0.668657    1.00        -0.28793
z     -0.10131    -0.28793    1.00
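A minimal sketch of this step with NumPy (the random matrix below only stands in for the 10 × 3 wafer data from STEP 1, so its correlations will not match R):

```python
import numpy as np

# Stand-in for the 10 x 3 wafer data (columns: thickness x, horizontal
# displacement y, vertical displacement z); random values are used here
# only so the sketch runs end to end.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))

R = np.corrcoef(X, rowvar=False)   # 3 x 3 correlation matrix; diagonal entries are 1.00
print(np.round(R, 6))
```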
Principal Component Analysis
► Example:

CC(x,y) = [N·Σxy − (Σx)(Σy)] / √{[N·Σx² − (Σx)²][N·Σy² − (Σy)²]}

where N is the number of data points.

x    y    xy    x²    y²
7    4    28    49    16
4    1     4    16     1
6    3    18    36     9
8    6    48    64    36
8    5    40    64    25
7    2    14    49     4
5    3    15    25     9
9    5    45    81    25
7    4    28    49    16
8    2    16    64     4

Similarly, the CC of the other combinations can be seen in matrix R.
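A minimal check of this correlation coefficient with NumPy, using the x and y columns from the table above:

```python
import numpy as np

x = np.array([7, 4, 6, 8, 8, 7, 5, 9, 7, 8], dtype=float)
y = np.array([4, 1, 3, 6, 5, 2, 3, 5, 4, 2], dtype=float)
N = len(x)

# Computational formula from the slide
cc = (N * (x * y).sum() - x.sum() * y.sum()) / np.sqrt(
    (N * (x ** 2).sum() - x.sum() ** 2) * (N * (y ** 2).sum() - y.sum() ** 2))

print(round(cc, 6))                       # 0.668665 ~ CC(x,y) in R
print(round(np.corrcoef(x, y)[0, 1], 6))  # same value via NumPy's built-in
```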
Principal Component Analysis

STEP 3: SOLVE FOR THE ROOTS OF R

Next solve for the roots (eigenvalues) of the correlation matrix R:

λ1 = 1.769,  λ2 = 0.927,  λ3 = 0.304


Principal Component Analysis
Note:

• Each eigenvalue satisfies |R − λI| = 0.
• The sum of the eigenvalues = 3 = p, which is equal to the trace of R (i.e., the sum of the main diagonal elements).
• The determinant of R is the product of the eigenvalues: λ1 × λ2 × λ3 = 0.499.
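A minimal NumPy check of these properties on R (the ordering of eigenvalues returned by NumPy may differ from the slide's):

```python
import numpy as np

R = np.array([[ 1.0,       0.668657, -0.10131],
              [ 0.668657,  1.0,      -0.28793],
              [-0.10131,  -0.28793,   1.0     ]])

eigvals = np.linalg.eigvalsh(R)[::-1]   # R is symmetric; reorder to descending
print(np.round(eigvals, 3))             # ~[1.769 0.927 0.304]
print(round(eigvals.sum(), 3))          # 3.0   = trace of R
print(round(eigvals.prod(), 3))         # 0.499 = determinant of R
```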


Principal Component Analysis

► STEP 4: COMPUTE THE FIRST COLUMN OF THE V MATRIX

Substituting the first eigenvalue, 1.769, and R into the eigenvector equation (R − λ1·I)·v1 = 0, we obtain a matrix expression for three homogeneous equations with three unknowns. Solving it yields the first column of V:

   0.64
   0.69
  -0.34
Principal Component Analysis
► STEP 5: COMPUTE THE REMAINING COLUMNS OF THE V MATRIX

The remaining columns of V are obtained from the other eigenvalues in the same way. Notice that if you multiply V by its transpose, the result is the identity matrix: V′V = I.
PCA loadings and the loading matrix
Principal Component Analysis
► STEP 6: COMPUTE THE E^½ MATRIX

Now form the matrix E^½, a diagonal matrix whose elements are the square roots of the eigenvalues of R. Then obtain S, the factor structure (loading) matrix, using S = V·E^½.

So, for example, 0.91 is the correlation between the second variable and the first principal component.
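A sketch of forming E^½ and the first column of S from the quantities computed so far (only the first column of V is given on the slides, so only the first column of S is reproduced here; small rounding differences are expected):

```python
import numpy as np

eigvals = np.array([1.769, 0.927, 0.304])   # from STEP 3
v1 = np.array([0.64, 0.69, -0.34])          # first column of V, from STEP 4

E_half = np.diag(np.sqrt(eigvals))          # E^(1/2): diagonal matrix of sqrt(eigenvalues)

s1 = v1 * np.sqrt(eigvals[0])               # first column of S = V . E^(1/2)
print(np.round(s1, 2))                      # [ 0.85  0.92 -0.45]; ~0.91 after rounding is the
                                            # correlation between variable 2 and the first PC
```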
Principal Component Analysis
► STEP 7: COMPUTE THE COMMUNALITY

Next compute the communality, using the first two eigenvalues only.
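In symbols, writing s_ij for the entries of the structure matrix S, the communality of variable i over the first two principal components is (standard definition, stated here for completeness):

  h_i² = s_i1² + s_i2²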
Principal Component Analysis

► STEP 8: THE DIAGONAL ELEMENTS REPORT HOW MUCH OF THE VARIABILITY IS EXPLAINED

The communality consists of the diagonal elements:

var
1    0.8662
2    0.8420
3    0.9876

This means that the first two principal components "explain" 86.62% of the first variable, 84.20% of the second variable, and 98.76% of the third.
Principal Component Analysis
► STEP 9: COMPUTE THE COEFFICIENT MATRIX

The coefficient matrix, B, is formed using the reciprocals of the diagonals of E^½.
Principal Component Analysis

► STEP 10: COMPUTE THE PRINCIPAL FACTORS

Finally, we can compute the factor scores from ZB, where Z is X converted to standard-score form. These columns are the principal factors (see the sketch below).
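A sketch of STEPS 9 and 10 with NumPy, assuming (as the text describes) that B uses the reciprocals of the diagonal of E^½, i.e. B = V·E^(−½), and that Z is the column-standardized data; the inputs are taken from the earlier steps:

```python
import numpy as np

def principal_factors(X, V, eigvals):
    """X: n x p raw data, V: eigenvector matrix of R, eigvals: eigenvalues of R."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # X in standard-score form
    B = V @ np.diag(1.0 / np.sqrt(eigvals))            # reciprocals of the diagonal of E^(1/2)
    return Z @ B                                       # factor scores: one column per principal factor
```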
Principal Component Analysis

► PRINCIPAL FACTORS CONTROL CHART

These factors can be plotted against their indices, which could be times. If time is used, the resulting plot is an example of a principal factors control chart.
QUESTION BANK
(Marks, BL and CO are shown in brackets.)

1. a) Explain the concept of proximity measures and their importance in machine learning. [7M, BL 2, CO 2]
   b) Differentiate between metric and non-metric similarity functions with suitable examples. [7M, BL 2, CO 2]
2. Solve a problem to calculate the Euclidean and Manhattan distances for given data points. Example: data points A = [3, 4], B = [6, 8]; calculate both distances. [14M, BL 3, CO 2]
3. a) Describe different distance measures such as Euclidean, Manhattan, and Minkowski with examples. [7M, BL 2, CO 2]
   b) Problem: For two binary patterns A = [1, 0, 1, 1, 0] and B = [1, 1, 1, 0, 0], calculate the Jaccard similarity. [7M, BL 3, CO 2]
4. a) Explain the K-Nearest Neighbor (KNN) algorithm and describe its working with an example. [7M, BL 2, CO 2]
   b) Problem: Given a dataset [(2, 3, "A"), (4, 5, "B"), (6, 7, "A")], classify the test point (3, 4) for K = 3. [7M, BL 3, CO 2]
5. Describe and solve a Radius Distance Nearest Neighbor (RNN) classification problem: use a sample dataset and classify points within a radius of 2 units. [14M, BL 3, CO 2]
QUESTION BANK

6. a) Differentiate between the KNN classifier and KNN regression with examples. [7M, BL 2, CO 2]
   b) Problem: Perform KNN regression on the given data points [(1, 2), (2, 3), (3, 4)]; predict for x = 2.5 using K = 2. [7M, BL 3, CO 2]
7. Analyze the performance of classifiers using a confusion matrix and compute evaluation metrics. Example: confusion matrix with True Positives = 50, False Positives = 10, False Negatives = 5, True Negatives = 35. [14M, BL 4, CO 2]
8. a) Discuss the factors affecting the performance of regression algorithms in KNN. [7M, BL 2, CO 2]
   b) Problem: Calculate the mean squared error (MSE) and mean absolute error (MAE) for a KNN regression problem: predicted values [2.5, 3.7, 4.2], actual values [3, 4, 4.5]. [7M, BL 4, CO 2]
Thank You!
