Module 5 Notes

Fisher Discriminant Analysis (FDA) is a dimensionality reduction and classification technique that maximizes class separability by finding linear combinations of features. It is applied in various fields such as face recognition, speech recognition, bioinformatics, remote sensing, and quality control. The Parzen Window method is a non-parametric technique for estimating probability density functions, while K-Nearest Neighbors (K-NN) is a supervised learning algorithm that classifies data based on similarity to existing data points.


Fisher Discriminant Analysis (FDA)

Fisher Discriminant Analysis (FDA), also known as Linear Discriminant Analysis (LDA), is a
dimensionality reduction and classification technique used in pattern recognition and machine learning. It
was developed by Ronald A. Fisher in the 1930s and is based on the concept of finding linear
combinations of features that best separate different classes of data.
Here's how Fisher Discriminant Analysis works:
1. Data Preprocessing: FDA assumes that the input data is numerical and continuous. Categorical
variables may need to be converted into numerical form, and data normalization may be performed to
ensure that each feature contributes equally to the analysis.
2. Class Separability: FDA aims to find linear combinations of features that maximize the
separability between different classes in the data. It does this by maximizing the between-class scatter
while minimizing the within-class scatter.
3. Between-Class Scatter: The between-class scatter measures the spread between the means of
different classes. It quantifies how well the classes are separated from each other in the feature space.
4. Within-Class Scatter: The within-class scatter measures the spread within each class. It quantifies
the variability of data points within the same class.
5. Fisher Criterion: The Fisher criterion is defined as the ratio of between-class scatter to within-
class scatter. The goal of FDA is to find linear combinations of features that maximize this criterion.
6. Eigenvalue Decomposition: FDA typically involves eigenvalue decomposition of the scatter
matrices to find the directions (linear discriminants) in which the data is most discriminative. These linear
discriminants are the optimal projection axes for separating the classes.
7. Dimensionality Reduction: Once the linear discriminants are obtained, FDA can be used for
dimensionality reduction by projecting the data onto the subspace spanned by the discriminant vectors.
This reduces the dimensionality of the feature space while preserving the class-discriminatory information
as much as possible.
8. Classification: FDA can also be used for classification by applying a decision rule to the
projected data. For example, a simple decision rule might involve assigning a new data point to the class
with the nearest mean in the projected space (see the sketch after this list).
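To make steps 2 to 8 concrete, here is a minimal two-class sketch in Python with NumPy. It is only an illustration under assumed toy data: the random samples, the class means, and the variable names (X1, X2, S_W, w) are not part of the original notes, and the two-class shortcut (solving S_W w = m1 − m2) stands in for the full eigendecomposition used in the general multi-class case.

import numpy as np

# Toy two-class data: rows are samples, columns are features (illustrative only)
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))   # class 1
X2 = rng.normal(loc=[2.0, 1.5], scale=0.5, size=(50, 2))   # class 2

# Steps 3-4: class means and within-class scatter S_W
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Steps 5-6: the Fisher criterion J(w) = (w' S_B w) / (w' S_W w) is maximized,
# in the two-class case, by w proportional to S_W^{-1} (m1 - m2)
w = np.linalg.solve(S_W, m1 - m2)
w /= np.linalg.norm(w)

# Step 7: dimensionality reduction by projecting the data onto the discriminant direction
z1, z2 = X1 @ w, X2 @ w

# Step 8: classify a new point by the nearest projected class mean
def classify(x):
    z = x @ w
    return 1 if abs(z - z1.mean()) < abs(z - z2.mean()) else 2

print(classify(np.array([0.2, -0.1])))   # expected: class 1
print(classify(np.array([2.1, 1.4])))    # expected: class 2

For more than two classes, a library implementation such as scikit-learn's LinearDiscriminantAnalysis carries out the general eigendecomposition described in step 6.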

Applications
1. Face Recognition: In facial recognition systems, FDA can be used to extract discriminative
features from facial images and classify them into different individuals. By analyzing the patterns of pixel
intensities in facial images, FDA can find linear combinations of features that maximize the differences
between individuals while minimizing the variations within the same individual.
2. Speech Recognition: FDA can also be applied in speech recognition to classify spoken words or
phonemes. By extracting relevant acoustic features from speech signals, such as Mel-frequency cepstral
coefficients (MFCCs) or spectral features, FDA can help differentiate between different phonetic classes
or spoken words.
3. Bioinformatics: In bioinformatics, FDA can be used for tasks such as classifying gene expression
data into different disease subtypes or identifying biomarkers for disease diagnosis. By analyzing the
expression levels of genes across different samples, FDA can help identify gene sets that are most
discriminative for differentiating between healthy and diseased individuals.
4. Remote Sensing: In remote sensing applications, FDA can be used to classify land cover types or
detect environmental changes based on satellite imagery. By extracting spectral features from satellite
images, FDA can help classify different land cover classes such as vegetation, water bodies, and urban
areas, enabling applications in agriculture, environmental monitoring, and urban planning.
5. Quality Control: In manufacturing and industrial applications, FDA can be used for quality
control and defect detection. By analyzing sensor data or measurements from production processes, FDA
can help classify products into different quality categories or detect anomalies that indicate manufacturing
defects or deviations from desired specifications.

Parzen Window method


The Parzen Window method, also known as the kernel density estimation (KDE) method, is a non-
parametric technique used for estimating the probability density function (PDF) of a random variable. It's
particularly useful when the underlying distribution of the data is unknown or difficult to model
parametrically.
Here's how the Parzen Window method works:
1. Data Representation: Assume we have a dataset {𝑥1,𝑥2,...,𝑥𝑛} consisting of 𝑛 observations of a
random variable 𝑋. Each 𝑥𝑖 represents a data point in the dataset.
2. Window Function: The Parzen Window method uses a window function, often a kernel function
𝐾(⋅), to estimate the PDF. The window function defines the shape and size of the "window" or "kernel"
around each data point.
3. Density Estimation: For a given data point 𝑥, the Parzen Window estimate of the PDF at 𝑥 is
computed by averaging the contributions of all data points within the window centered at 𝑥. This is done
by applying the window function to each data point and summing up the results (see the sketch after this list):

f̂(x) = (1 / (n·h)) · Σᵢ₌₁ⁿ K((x − xᵢ) / h)

Where:
• 𝑓̂(𝑥) is the estimated PDF at 𝑥.
• 𝐾(⋅) is the kernel function.
• ℎ is the bandwidth or width of the window, which controls the smoothness of the
estimated PDF. It determines the size of the neighborhood around each data point that contributes to the
density estimate.
4. Choice of Kernel Function: Commonly used kernel functions include the Gaussian
(normal) kernel, the Epanechnikov kernel, and the uniform kernel. The choice of kernel function affects
the smoothness and shape of the estimated PDF.
5. Bandwidth Selection: The bandwidth ℎ is a crucial parameter in the Parzen Window
method. It controls the trade-off between bias and variance in the density estimate. Choosing an
appropriate bandwidth is important for obtaining an accurate estimate of the underlying PDF. Common
methods for bandwidth selection include cross-validation and Silverman's rule of thumb.
6. Normalization: To ensure that the estimated PDF integrates to 1 over the entire domain,
the summed kernel contributions are divided by the total number of data points and the volume of the
window (ℎ in one dimension), as in the formula above.
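A minimal one-dimensional sketch of the estimator above, in Python with NumPy, may help. It uses a Gaussian kernel and Silverman's rule-of-thumb bandwidth; the mixture data, the evaluation grid, and the function name parzen_window_estimate are illustrative assumptions, not from the notes.

import numpy as np

def parzen_window_estimate(x, data, h):
    """Parzen Window / KDE estimate of the PDF at the points x, using a Gaussian kernel:
    f_hat(x) = (1 / (n * h)) * sum_i K((x - x_i) / h)."""
    x = np.atleast_1d(x)
    u = (x[:, None] - data[None, :]) / h               # scaled distances to every data point
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)     # Gaussian kernel values
    return K.sum(axis=1) / (len(data) * h)             # average and normalize by n * h

# Illustrative data: a mixture of two normals (an assumption, not specified in the notes)
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 0.5, 200), rng.normal(1.0, 1.0, 300)])

# Silverman's rule of thumb for a Gaussian kernel in one dimension
h = 1.06 * data.std() * len(data) ** (-1 / 5)

grid = np.linspace(-5.0, 5.0, 11)
print(np.round(parzen_window_estimate(grid, data, h), 3))

Smaller values of h give a spikier estimate (low bias, high variance), while larger values oversmooth it; this is the trade-off mentioned in step 5.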

K-Nearest Neighbors (K-NN)

K-Nearest Neighbors (K-NN) is a straightforward yet effective Machine Learning algorithm that falls
under the Supervised Learning category. It operates on the principle of similarity, assigning new data
points to categories based on their resemblance to existing data. By storing all available data and gauging
similarity, K-NN facilitates easy classification of new data into relevant categories. It is versatile,
applicable to both Regression and Classification tasks, although it's predominantly used for the latter.
Notably, K-NN is non-parametric, making no assumptions about the data's underlying distribution. Due to
its deferred learning approach, where it stores the dataset during training and classifies new data when
prompted, K-NN is often referred to as a "lazy learner" algorithm.
Example:
Consider a scenario where you have a dataset of various fruits such as apples, oranges, and bananas. Each
fruit is characterized by its weight and color. Now, if you were to use the K-Nearest Neighbors (K-NN)
algorithm to classify a new fruit, say a small, red fruit, the algorithm would examine the characteristics of
nearby fruits in the dataset. If the majority of nearby fruits are apples, the algorithm would classify the
new fruit as an apple. However, if the majority are oranges, it would classify the new fruit as an orange.
The algorithm's decision depends on the similarity between the features of the new fruit and those of
existing fruits in the dataset.
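A short sketch of this fruit example in Python with scikit-learn is given below. The weights, the numeric colour encoding, and K = 3 are assumptions made up for illustration, since the notes do not give actual values.

from sklearn.neighbors import KNeighborsClassifier

# Illustrative fruit data: [weight in grams, colour code] with 0 = red, 1 = orange, 2 = yellow
X = [[150, 0], [170, 0], [140, 0],      # apples
     [160, 1], [180, 1], [155, 1],      # oranges
     [120, 2], [130, 2], [115, 2]]      # bananas
y = ["apple", "apple", "apple", "orange", "orange", "orange", "banana", "banana", "banana"]

knn = KNeighborsClassifier(n_neighbors=3)   # K = 3 nearest neighbours
knn.fit(X, y)

# A small, red fruit: its nearest neighbours in feature space decide the label
print(knn.predict([[145, 0]]))              # expected: ['apple']

In practice the features would be scaled first, otherwise the weight (in grams) dominates the distance calculation and the colour code barely matters.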
Steps:
1. Load the Data: Start by loading the dataset that contains the features (attributes) of the objects
you want to classify and their corresponding labels (class).
2. Choose the Value of K: Determine the value of K, which represents the number of nearest
neighbors to consider when classifying a new data point. This value is typically chosen empirically or
through techniques like cross-validation.
3. Calculate Distance: Compute the distance between the new data point and all other data points in
the dataset. The distance metric used (e.g., Euclidean distance, Manhattan distance) depends on the nature
of the data and the problem.
4. Find K Nearest Neighbors: Identify the K data points with the shortest distances to the new data
point. These data points are the "nearest neighbors."
5. Majority Vote: Determine the majority class among the K nearest neighbors. This is typically
done by counting the occurrences of each class label among the neighbors.
6. Assign Class: Assign the majority class as the predicted class for the new data point. If there is a
tie, additional rules (such as considering distances) may be used to break it (see the sketch after this list).
7. Evaluate Model: After classifying all data points, evaluate the performance of the model using
metrics such as accuracy, precision, recall, or F1 score. This step helps assess how well the model
generalizes to unseen data.
8. Adjust K: If necessary, iterate over different values of K and evaluate the model's performance to
find the optimal value.
9. Predict New Data: Once the model is trained and evaluated, you can use it to predict the classes
of new, unseen data points by repeating steps 3 to 6 for each new data point.
10. Finalize Model: Finally, once satisfied with the model's performance, finalize it for deployment
and use it to classify new data in real-world applications.
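Steps 3 to 6 can also be sketched from scratch in a few lines of Python; the Euclidean metric, the tie-handling rule, and the made-up fruit data below are illustrative choices, not requirements of the algorithm.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the class of x_new by majority vote among its k nearest neighbours."""
    # Step 3: Euclidean distance from the new point to every training point
    distances = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(x_new, dtype=float), axis=1)
    # Step 4: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Steps 5-6: majority vote; because the neighbours are listed in distance order,
    # Counter.most_common breaks ties in favour of the class of the closer neighbour
    votes = Counter(np.asarray(y_train)[nearest])
    return votes.most_common(1)[0][0]

# Usage with made-up fruit data ([weight in grams, colour code])
X_train = [[150, 0], [160, 1], [120, 2], [170, 0], [180, 1]]
y_train = ["apple", "orange", "banana", "apple", "orange"]
print(knn_predict(X_train, y_train, [145, 0], k=3))   # expected: apple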
