
Module-4

Pattern Analysis And Motion Analysis
K-Means Algorithm
• The K-means clustering algorithm computes cluster centroids and iterates until optimal centroids are found.
• Assumption: the number of clusters is already known.
• It is also called a flat clustering algorithm.
• The number of clusters identified from the data is represented by 'K' in K-means.
• In this algorithm, data points are assigned to clusters in such a manner that the sum of the squared distances between the data points and their cluster centroid is minimum.
• Step 1 − First, specify the number of clusters, K, that the algorithm should generate.
• Step 2 − Next, randomly select K data points as initial centroids and assign each data point to the nearest one.
• Step 3 − Now compute the cluster centroids.
• Step 4 − Next, keep iterating the following until the centroids are optimal, i.e. the assignment of data points to clusters no longer changes:
• 4.1 − First, compute the sum of squared distances between the data points and the centroids.
• 4.2 − Now, assign each data point to the cluster whose centroid is closest.
• 4.3 − At last, recompute the centroid of each cluster by taking the average of all data points in that cluster.
• K-means follows an Expectation-Maximization approach to solve the problem.
• The Expectation step assigns the data points to the closest cluster, and the Maximization step computes the centroid of each cluster, as in the sketch below.
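The following is a minimal NumPy sketch of these steps (the function name kmeans, the toy data, and the empty-cluster guard are illustrative assumptions, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly pick k data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # E-step (4.1, 4.2): assign each point to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # M-step (4.3): recompute each centroid as the mean of its cluster
        # (keeping the old centroid if a cluster ends up empty -- an added guard).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids (and hence assignments) stabilise.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
print(kmeans(X, k=2))
```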
• Advantages
• The following are some advantages of the K-means clustering algorithm:
• It is very easy to understand and implement.
• With a large number of variables, K-means is faster than hierarchical clustering.
• When centroids are recomputed, an instance can change its cluster.
• K-means forms tighter clusters than hierarchical clustering.
• Disadvantages
• The following are some disadvantages of the K-means clustering algorithm:
• It is difficult to predict the number of clusters, i.e. the value of K.
• The output is strongly affected by the initial inputs, such as the number of clusters (the value of K) and the initial centroids.
• The order of the data can have a strong impact on the final output.
• It is very sensitive to rescaling: if we rescale the data by normalization or standardization, the output can change completely.
• It does a poor clustering job when the clusters have complicated geometric shapes.
k-medoids Clustering

• k-medoids is another type of clustering algorithm that can be used to find natural groupings in a dataset.
• k-medoids clustering is very similar to k-means clustering, except for a few differences.
• The k-medoids clustering algorithm has a slightly different optimization function than k-means: each cluster center must be an actual data point (a medoid).
• k-medoids clustering gives almost identical results to k-means clustering. But when a dataset contains outliers, k-medoids clustering is preferred, as it is more robust to outliers.
Another example (points taken from the slide's scatter-plot figure): the total cost is the sum of each point's distance to its medoid:
Total cost = 5 + 0 + 0 + 4.242641 + 1.414 + 2 + 0 + 4.472136 = 17.129
• Step 1 − Choose k data points from the scatter plot as starting cluster centers (medoids).
• Step 2 − Calculate their distance from all the points in the scatter plot.
• Step 3 − Classify each point into the cluster whose center it is closest to.
• Step 4 − Select a new point in each cluster that minimizes the sum of distances of all points in that cluster from itself.
• Step 5 − Repeat from Step 2 until the centers stop changing.
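A minimal NumPy sketch of these steps (names and data handling are illustrative assumptions; production code would use an optimised PAM implementation):

```python
import numpy as np

def kmedoids(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k data points as the starting medoids.
    medoids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Steps 2-3: assign each point to the closest medoid.
        dists = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: within each cluster, pick the member that minimises
        # the sum of distances to the other members.
        new_medoids = medoids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                pairwise = np.linalg.norm(
                    members[:, None, :] - members[None, :, :], axis=2)
                new_medoids[j] = members[pairwise.sum(axis=1).argmin()]
        # Step 5: stop once the medoids no longer change.
        if np.allclose(new_medoids, medoids):
            break
        medoids = new_medoids
    # Total cost: the sum of each point's distance to its medoid.
    dists = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cost = dists[np.arange(len(X)), labels].sum()
    return labels, medoids, cost
```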
Gaussian Mixture Model
• Gaussian Mixture Models (GMMs) assume that the data is generated by a certain number of Gaussian distributions, each of which represents a cluster.
• Hence, a Gaussian Mixture Model tends to group together the data points belonging to a single distribution.
• Let's say we have three Gaussian distributions – GD1, GD2, and GD3.
• These have certain mean (μ1, μ2, μ3) and variance (σ1², σ2², σ3²) values respectively.
• For a given set of data points, our GMM would identify the probability of each data point belonging to each of these distributions.
• Gaussian Mixture Models are probabilistic models and use the soft clustering approach for distributing the points across clusters.
• Here, we have three clusters denoted by three colors – blue, green, and cyan.
• Let's take the data point highlighted in red (in the slide's figure).
• The probability of this point being a part of the blue cluster is 1, while the probability of it being a part of the green or cyan clusters is 0.
• Now, consider another point, somewhere in between the blue and cyan clusters.
• The probability that this point is a part of the green cluster is 0, and the probability that it belongs to blue and cyan is 0.2 and 0.8 respectively.
Gaussian Distribution
• It has a bell-shaped curve, with the data points symmetrically distributed around the mean value.
• The image below shows a few Gaussian distributions differing in mean (μ) and variance (σ²).
Probability Density Function

• For a single variable, the Gaussian probability density function is

  f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

• But this would only be true for a single variable. In the case of two variables, instead of a 2D bell-shaped curve we get a 3D bell-shaped surface, with density

  f(x \mid \mu, \Sigma) = \frac{1}{2\pi\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right)

  where x is the input vector, μ is the 2D mean vector, and Σ is the 2×2 covariance matrix.
• The covariance now defines the shape of this curve.
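As a quick check of the bivariate density above, SciPy's multivariate_normal can evaluate it directly (the mean vector and covariance matrix here are arbitrary example values):

```python
from scipy.stats import multivariate_normal

mu = [0.0, 0.0]                    # 2D mean vector
Sigma = [[1.0, 0.5],               # 2x2 covariance matrix: it shapes
         [0.5, 2.0]]               # the 3D bell surface
density = multivariate_normal(mean=mu, cov=Sigma)
print(density.pdf([0.5, -0.5]))    # f(x) at the point x = (0.5, -0.5)
```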


EM
• Expectation-Maximization (EM) is a statistical algorithm for finding the right model parameters. We typically use EM when the data has missing values, or in other words, when the data is incomplete.
• Expectation-Maximization tries to use the existing data to determine the optimum values for the missing (latent) variables and then finds the model parameters.
• Broadly, the Expectation-Maximization algorithm has two steps:
• E-step: In this step, the available data is used to estimate (guess) the values of the missing variables.
• M-step: Based on the estimated values generated in the E-step, the complete data is used to update the parameters.
• Let's say we need to assign k clusters.
• This means that there are k Gaussian distributions, with mean and covariance values μ1, μ2, …, μk and Σ1, Σ2, …, Σk.
• Additionally, there is another parameter for each distribution that defines the number of points in it; in other words, the density (mixing coefficient) of distribution i is represented by Πi.
• Now, we need to find the values of these parameters to define the Gaussian distributions.
• We already decided the number of clusters, and randomly assigned values for the mean, covariance, and density.
• Next, we'll perform the E-step and the M-step!
• k-means only considers the mean to update the centroid, while a GMM takes into account the mean as well as the variance of the data, as in the sketch below.
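A soft-clustering sketch using scikit-learn's GaussianMixture, which fits the μk, Σk, and Πk parameters with EM (the two-blob toy data is an illustrative assumption):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two illustrative Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(5.0, 1.0, size=(50, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)  # EM under the hood
print(gmm.means_)                 # learned mean vectors (mu_k)
print(gmm.weights_)               # mixing coefficients (Pi_k)
print(gmm.predict_proba(X[:3]))   # soft assignments: P(cluster | point)
```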
Machine Learning
Supervised Learning
• A supervised learning algorithm learns from labeled training data, which helps you predict outcomes for unseen data.
Unsupervised Learning
• Unsupervised learning is a machine learning technique in which you do not need to supervise the model.
• Instead, you allow the model to work on its own to discover information. It mainly deals with unlabelled data.
Semi supervised Learning
• Semi-supervised learning is a learning problem that involves a small number of labeled examples and a large number of unlabeled examples.
• Learning problems of this type are challenging, as neither supervised nor unsupervised learning algorithms can make effective use of the mixture of labeled and unlabeled data.
Classification

• Supervised machine learning algorithms can be broadly classified into Regression and Classification algorithms.
• Regression algorithms predict the output for continuous values, but to predict categorical values, we need Classification algorithms.
• Classification Algorithm
• The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data.
• In a classification algorithm, a discrete output variable (y) is mapped to the input variable (x):
• y = f(x), where y = categorical output
Types of classification
• The algorithm which implements the classification on a dataset is known as a classifier.
• Binary Classifier: If the classification problem has only two possible outcomes, it is called a Binary Classifier.
Examples: YES or NO, MALE or FEMALE, SPAM or NOT SPAM, CAT or DOG, etc.
• Multi-class Classifier: If a classification problem has more than two outcomes, it is called a Multi-class Classifier.
Examples: classification of types of crops, classification of types of music.
Learners
• In classification problems, there are two types of learners:
• Lazy Learners: A lazy learner first stores the training dataset and waits until it receives the test dataset.
• In the lazy learner case, classification is done on the basis of the most related data stored in the training dataset.
• It takes less time in training but more time for predictions.
• Examples: K-NN algorithm, case-based reasoning.
• Eager Learners: Eager learners develop a classification model based on a training dataset before receiving a test dataset.
• Opposite to lazy learners, eager learners take more time in training and less time in prediction.
• Examples: Decision Trees, Naïve Bayes, ANN.
Types of Classification Algorithms:

• Classification algorithms can be broadly divided into two main categories:
• Linear Models
• Logistic Regression
• Support Vector Machines
• Non-linear Models
• K-Nearest Neighbours
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
K-Nearest Neighbour
• K-NN is based on the Supervised Learning technique.
• It assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity.
• This means that when new data appears, it can be easily classified into a well-suited category using the K-NN algorithm.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at classification time.
• At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data.
KNN Classifier
How does K-NN work?
• Step-1: Select the number K of neighbors.
• Step-2: Calculate the Euclidean distance from the new point to the training points.
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these K neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
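These steps in practice, using scikit-learn's KNeighborsClassifier (the toy data and the choice K=5 are assumptions for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1, 1], [1, 2], [2, 1],    # category 0
                    [6, 5], [7, 7], [6, 6]])   # category 1
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=5)  # Step 1: select K
knn.fit(X_train, y_train)                  # lazy learner: just stores the data
print(knn.predict([[2, 2]]))               # Steps 2-5: distances, vote, assign
```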
Select K value
• There is no particular way to determine the best value for K, so we need to try several values to find the best among them.
• A commonly preferred value for K is 5.
• A very low value for K, such as K=1 or K=2, can be noisy and expose the model to the effects of outliers.
• Large values for K smooth out noise, but can make it difficult to separate nearby categories.
Advantages of KNN Algorithm:
• It is simple to implement.
• It is robust to noisy training data.
• It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:
• The value of K always needs to be determined, which may be complex at times.
• The computation cost is high, because the distance to every training sample must be calculated for each prediction.
Naïve Bayes Classifier
• The Naïve Bayes algorithm is a supervised learning algorithm based on Bayes' theorem and used for solving classification problems.
• It is mainly used in text classification with high-dimensional training datasets.
• The Naïve Bayes classifier helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• The name Naïve Bayes is made up of two words, Naïve and Bayes, which can be described as:
• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
• Hence each feature contributes to the classification individually, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes' Theorem.
• Bayes' theorem is used to determine the probability of a hypothesis with prior knowledge. It depends on the conditional probability:

  P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}

• P(A|B) is the Posterior probability: the probability of hypothesis A given the observed event B.
• P(B|A) is the Likelihood: the probability of the evidence given that the hypothesis is true.
• P(A) is the Prior probability: the probability of the hypothesis before observing the evidence.
• P(B) is the Marginal probability: the probability of the evidence.
• Suppose we have a dataset of weather conditions and a corresponding target variable "Play".
• Using this dataset, we need to decide whether we should play or not on a particular day, according to the weather conditions.
• Convert the given dataset into frequency tables.
• Generate the likelihood table by finding the probabilities of the given features.
• Now, use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Applying Bayes' theorem:

P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
• P(Sunny|Yes) = 3/10 = 0.3
• P(Sunny) = 0.35
• P(Yes) = 0.71
• So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
• P(Sunny|No) = 2/4 = 0.5
• P(No) = 0.29
• P(Sunny) = 0.35
• So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
• As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
• Hence, on a sunny day, the player can play the game.
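The same calculation reproduced in Python, using the probability values read off the slide's frequency and likelihood tables:

```python
# Values taken directly from the worked example above.
p_sunny_yes, p_yes = 0.3, 0.71
p_sunny_no,  p_no  = 0.5, 0.29
p_sunny = 0.35

p_yes_sunny = p_sunny_yes * p_yes / p_sunny   # ~0.60
p_no_sunny  = p_sunny_no  * p_no  / p_sunny   # ~0.41
print("Play" if p_yes_sunny > p_no_sunny else "Don't play")  # -> Play
```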
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
• It can be used for binary as well as multi-class classification.
• It performs well in multi-class predictions compared to other algorithms.
• It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
• Naïve Bayes assumes that all features are independent or unrelated, so it cannot learn relationships between features.
Artificial neural Network model
• Multilayer Perceptron – a feedforward artificial neural network model. It maps sets of input data onto a set of appropriate outputs.
• Radial Basis Function Network – an artificial neural network that uses radial basis functions as activation functions.
• Both of the above are supervised learning networks used with one or more dependent variables at the output.
Multilayer Perceptron
• A multilayer perceptron is a feedforward artificial neural network model.
• It maps sets of input data onto a set of appropriate outputs.
• In feed-forward neural networks, the movement is only possible in the forward direction.
• An MLP consists of many layers of nodes in a directed graph, with each layer connected to the next one.
• Each neuron computes a linear equation, as in linear regression, followed by an activation, as shown in the following equation:

  z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b, \qquad y = \varphi(z)
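A minimal forward pass for one hidden layer of an MLP, matching the neuron equation above (the weights, biases, and the sigmoid activation are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])        # input vector
W = np.array([[0.1, 0.4, -0.2],       # one row of weights per hidden neuron
              [-0.3, 0.2, 0.5]])
b = np.array([0.0, 0.1])              # one bias per hidden neuron

z = W @ x + b    # each neuron: a linear equation, z_i = w_i . x + b_i
a = sigmoid(z)   # activation; the result feeds forward to the next layer
print(a)
```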
Radial Basis Function Network
• A Radial Basis Function (RBF) network is a supervised learning network.
• An RBF network works with only one hidden layer.
• It computes the value of each unit in the hidden layer for an observation from the distance in space between that observation and the center of the unit, instead of from the sum of the weighted values of the units of the preceding level.
• The centers of the hidden layer of an RBF network are not adjusted at each iteration during learning.
• In an RBF network, hidden neurons share the space and are virtually independent of each other.
• This makes for faster convergence of RBF networks in the learning phase.
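A sketch of the RBF hidden layer described above: each unit's output depends only on the distance between the observation and that unit's center (the centers and the Gaussian width below are illustrative assumptions):

```python
import numpy as np

centers = np.array([[0.0, 0.0],    # hidden-unit centers: fixed, not
                    [5.0, 5.0]])   # re-adjusted at every iteration
sigma = 1.0                        # width of each basis function

def rbf_hidden(x):
    # Squared distance from the observation to each center, passed
    # through a Gaussian radial basis function.
    d2 = np.sum((centers - x) ** 2, axis=1)
    return np.exp(-d2 / (2 * sigma ** 2))

print(rbf_hidden(np.array([1.0, 0.5])))
```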
