Machine Learning
Assignment:
• EM Algorithm
• K-Nearest Neighbors (KNN)
1. Problem Setting:
Incomplete Data: We have a dataset with missing or hidden variables. The goal is to estimate
the parameters of a statistical model that best describes the observed data.
Model Assumption: We assume a probabilistic model that describes how the observed data is
generated from the hidden variables and model parameters.
4. Iterative Process:
Iterate: Alternate between the E-step and M-step until the algorithm converges, i.e., until the
parameter estimates stabilize and no longer change significantly between iterations.
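As a concrete illustration, the following is a minimal sketch of this E-step/M-step loop for a two-component Gaussian mixture in one dimension, written in Python with NumPy; the initialization, the tolerance, and the iteration limit are illustrative assumptions rather than part of the algorithm itself.

import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Density of a univariate normal distribution, evaluated elementwise
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_gmm_1d(x, n_iter=100, tol=1e-6):
    # Illustrative initial parameters for a two-component mixture
    pi, mu1, mu2, s1, s2 = 0.5, x.min(), x.max(), x.std(), x.std()
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior responsibility of component 1 for each point
        p1 = pi * gaussian_pdf(x, mu1, s1)
        p2 = (1 - pi) * gaussian_pdf(x, mu2, s2)
        r = p1 / (p1 + p2)
        # M-step: re-estimate parameters from the responsibilities
        pi = r.mean()
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
        s1 = np.sqrt(np.sum(r * (x - mu1) ** 2) / np.sum(r))
        s2 = np.sqrt(np.sum((1 - r) * (x - mu2) ** 2) / np.sum(1 - r))
        # Stop when the log-likelihood no longer changes significantly
        ll = np.sum(np.log(p1 + p2))
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu1, mu2, s1, s2, ll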
6. Considerations:
Initialization: The performance of the EM algorithm can be sensitive to the initial parameter
values, and it is often beneficial to use multiple initializations or more sophisticated techniques
like EM with random restarts to avoid local optima.
Convergence: While the EM algorithm is guaranteed to converge to a local maximum of the
likelihood function, it may not always find the global maximum, so careful initialization and
multiple runs may be required to achieve good results.
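A minimal sketch of such random restarts is shown below; run_em is a hypothetical stand-in for any single EM fit (for example, a seeded variant of the sketch above) that returns the fitted parameters together with the final log-likelihood.

import numpy as np

def em_with_restarts(x, run_em, n_restarts=10, seed=0):
    # run_em is a hypothetical callable: run_em(x, rng) -> (params, log_likelihood)
    rng = np.random.default_rng(seed)
    best_params, best_ll = None, -np.inf
    for _ in range(n_restarts):
        params, ll = run_em(x, rng)
        # Keep the fit with the highest log-likelihood across restarts
        if ll > best_ll:
            best_params, best_ll = params, ll
    return best_params, best_ll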
Overall, the EM algorithm provides a powerful framework for estimating the parameters of
complex probabilistic models in machine learning, particularly in situations involving missing or
hidden variables, and it is widely used in various applications like clustering, latent variable
modeling, and missing data imputation.
K-Nearest Neighbors
The K-Nearest Neighbors (KNN) algorithm is a robust and intuitive machine learning method
employed to tackle classification and regression problems. By capitalizing on the concept of
similarity, KNN predicts the label or value of a new data point by considering its K closest
neighbours in the training dataset. In this article, we will look at this supervised learning
algorithm, the k-Nearest Neighbours, and highlight its user-friendly nature.
K-Nearest Neighbours is one of the most basic yet essential classification algorithms in
Machine Learning. It belongs to the supervised learning domain and is widely applied in
pattern recognition, data mining, and intrusion detection.
The K-NN algorithm is a versatile and widely used machine learning method, valued primarily for
its simplicity and ease of implementation. It does not require any assumptions about the
underlying data distribution and can handle both numerical and categorical features, making it a
flexible choice for many kinds of classification and regression datasets. It is a non-parametric
method that makes predictions based on the similarity of data points in a given dataset. With a
suitably large K, it can also be fairly robust to noisy training points, since each prediction is
based on several neighbours rather than a single one.
The K-NN algorithm works by finding the K nearest neighbors to a given data point based on a
distance metric, such as Euclidean distance. The class or value of the data point is then
determined by the majority vote or average of the K neighbors. This approach allows the
algorithm to adapt to different patterns and make predictions based on the local structure of the
data.
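The following is a minimal sketch of this lookup for the classification case, using Euclidean distance and a majority vote; the function name, the choice of k, and the NumPy-based implementation are illustrative assumptions.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the query point to every training point
    dists = np.sqrt(np.sum((X_train - x_new) ** 2, axis=1))
    # Indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbours' labels (classification case)
    return Counter(y_train[nearest]).most_common(1)[0][0]

For example, with training points (0, 0) and (1, 1) labeled 0 and (5, 5) labeled 1, the query (0.5, 0.5) with k = 2 is assigned label 0.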
As we know, the KNN algorithm helps us identify the nearest points or groups for a query point.
But to determine the closest groups or nearest points, we need some distance metric. For this
purpose, we use the metrics below; a short code sketch of both follows their definitions:
1. Euclidean Distance:
This is simply the Cartesian distance between two points in the plane or hyperplane. Euclidean
distance can also be visualized as the length of the straight line joining the two points under
consideration. This metric corresponds to the net displacement between two states of an object.
2. Manhattan Distance:
The Manhattan distance metric is generally used when we are interested in the total distance
traveled by the object rather than its displacement. It is calculated by summing the absolute
differences between the coordinates of the points in n dimensions.
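A small sketch of both metrics as Python functions over n-dimensional points; the example coordinates are illustrative.

import numpy as np

def euclidean_distance(p, q):
    # Straight-line (L2) distance between two points in n dimensions
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

def manhattan_distance(p, q):
    # Sum of absolute coordinate differences (L1 distance)
    return np.sum(np.abs(np.asarray(p) - np.asarray(q)))

# Illustrative example with points (1, 2) and (4, 6):
# euclidean_distance((1, 2), (4, 6)) -> 5.0
# manhattan_distance((1, 2), (4, 6)) -> 7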
3. Workings of the KNN Algorithm:
The K-Nearest Neighbors (KNN) algorithm operates on the principle of similarity: it predicts the
label or value of a new data point by considering the labels or values of its K nearest
neighbors in the training dataset.
To make predictions, the algorithm calculates the distance between each new data point in the
test dataset and all the data points in the training dataset. The Euclidean distance is a commonly
used distance metric in K-NN, but other distance metrics, such as Manhattan distance or
Minkowski distance, can also be used depending on the problem and the data. Once the distances
between the new data point and all the data points in the training dataset are calculated, the
algorithm proceeds to find the K nearest neighbors based on these distances. The specific
method for selecting the nearest neighbors can vary, but a common approach is to sort the
distances in ascending order and choose the K data points with the shortest distances.
After identifying the K nearest neighbors, the algorithm makes predictions based on the labels
or values associated with these neighbors. For classification tasks, the majority class among the
K neighbors is assigned as the predicted label for the new data point. For regression tasks, the
average or weighted average of the values of the K neighbors is assigned as the predicted value.
Let X be the training dataset with n data points, where each data point is represented by a
d-dimensional feature vector X_i, and let Y be the corresponding labels or values for each data
point in X. Given a new data point x, the algorithm calculates the distance between x and each
data point X_i in X.
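Mirroring this notation, the sketch below treats X as an n-by-d NumPy array, Y as the vector of target values, and x as a d-dimensional query, and returns the average of the K nearest values (the regression case); the function name and the default k are illustrative.

import numpy as np

def knn_regress(X, Y, x, k=5):
    # Distance from the query x to every row X_i of the training matrix X
    dists = np.linalg.norm(X - x, axis=1)
    # Indices of the K nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Regression: predicted value is the mean of the neighbours' values
    return Y[nearest].mean()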
Pros: Simple, easy to understand, and doesn't make strong assumptions about the underlying
data distribution.
Cons: Computationally expensive for large datasets, sensitive to irrelevant features, and may
struggle with high-dimensional data.
Normalization:
Scaling features can be important for KNN since it's distance-based. Features with larger scales
might dominate the distance calculations.
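A brief sketch of feature scaling before KNN, assuming scikit-learn is available; the toy dataset and the choice of k are illustrative.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Illustrative training data: one feature in metres, one in thousands of dollars
X_train = np.array([[1.6, 30.0], [1.8, 90.0], [1.7, 60.0]])
y_train = np.array([0, 1, 0])

# Standardize each feature to zero mean and unit variance so that
# the large-scale feature does not dominate the distance calculation
scaler = StandardScaler().fit(X_train)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X_train), y_train)

X_new = np.array([[1.75, 70.0]])
prediction = knn.predict(scaler.transform(X_new))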
KNN is often used in scenarios where the decision boundaries are complex and not easily
defined by a mathematical formula. It's a non-parametric and instance-based learning
algorithm, meaning it doesn't explicitly learn a model during training; instead, it memorizes the
training data and makes predictions based on similarity in the input space.