
Introduction to Machine Learning, part III


August 2022
References
• Slides closely follow Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron.
• Another great reference is Machine Learning with PyTorch and Scikit-Learn by Sebastian Raschka.
• Official documentation and the user guide for Scikit-Learn are
also fantastic.

Prof. David R. Pugh


Dimensionality
Reduction



Dimensionality Reduction
Curse of Dimensionality: 3D intuition fails when applied to higher dimensions

• High-dimensional datasets are often very sparse: most training instances are likely to be far away from each other.
• A new instance will likely be far away from any training instance => harder to generalize well.
• High-dimensional datasets are prone to overfitting.
• More data? The data required to achieve a given density of coverage grows exponentially with the number of dimensions.
Projection
3D dataset lying "close" to 2D subspace Project down from 3D to 2D



Manifold Learning
Classic Swiss Roll Dataset Projection (left) vs Unrolling (right)



Manifold Learning
• With manifold learning you try to learn the lower dimensional
space that "best" represents the higher dimensional data.
• Relies on manifold hypothesis: most real-world high-
dimensional datasets lie close to a much lower-dimensional
manifold. Assumption is often observed empirically.
• Another implicit assumption: classification or regression task
will be "easier" if expressed in the lower-dimensional space of
the manifold. Does not always hold in practice.



Principal Components Analysis
(PCA)
What is PCA? Selecting the subspace to project on

• Find the lower-dimensional hyperplane "closest" to the higher-dimensional data.
• Project the data onto the lower-dimensional hyperplane.
• "Closest" means the projection preserves the most variation in the training data.
Principal Components Analysis
(PCA)
Principal components and choosing the "right" number of dimensions

• PCA identifies the axis that accounts for the largest amount of variance in the training set.
• PCA also finds a second axis, orthogonal to the first one, that accounts for the largest amount of remaining variance.
• For an n-dimensional training set, PCA will find n principal components.
• Choosing d < n principal components allows you to project the n-dimensional training set down to a d-dimensional training set (see the sketch below).
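A minimal scikit-learn sketch of this workflow (the digits dataset and the 95% variance target are illustrative choices, not from the slides): passing a float to n_components asks PCA to keep just enough components to preserve that fraction of the variance.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    X = load_digits().data  # 1,797 instances, 64 dimensions

    # Keep enough principal components to preserve ~95% of the variance
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape[1])                    # number of components kept (d < 64)
    print(pca.explained_variance_ratio_.sum())   # close to 0.95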



Kernel PCA (kPCA)
What is kPCA? Reducing Swiss Roll to 2D with various kernels

• Can combine the "kernel trick"


with PCA to perform complex
non-linear projections.
• Good at preserving clusters
of instances after projection, or
sometimes even unrolling
datasets that lie close to a
twisted manifold.
• Compute and
memory intensive; doesn't scale
to large datasets. Prof. David R. Pugh
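A minimal sketch of kPCA on the Swiss roll with scikit-learn (the RBF kernel and the gamma value are illustrative, not tuned):

    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import KernelPCA

    X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

    # Non-linear projection down to 2D via the RBF kernel
    rbf_pca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
    X_reduced = rbf_pca.fit_transform(X)
    print(X_reduced.shape)  # (1000, 2)

Swapping in kernel="linear", "sigmoid", or "poly" reproduces the kind of comparison shown in the figure.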
Kernel PCA (kPCA)
Selecting kernels and tuning hyperparameters; kPCA and reconstruction pre-image error

• If using kPCA as a preprocessing step in a classification/regression pipeline, then choose the kernel and tune the kPCA hyperparameters to maximize the performance of the whole pipeline (see the sketch below).
• Alternatively, choose the kernel and tune the kPCA hyperparameters to minimize the reconstruction error.
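A sketch of the first approach, assuming a downstream logistic regression classifier and a binary target derived from the roll position purely for illustration (both are assumptions, not from the slides):

    from sklearn.datasets import make_swiss_roll
    from sklearn.decomposition import KernelPCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline

    X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)
    y = (t > 6.9).astype(int)  # illustrative binary labels

    clf = Pipeline([
        ("kpca", KernelPCA(n_components=2)),
        ("log_reg", LogisticRegression(max_iter=1000)),
    ])
    param_grid = {
        "kpca__kernel": ["rbf", "sigmoid"],
        "kpca__gamma": [0.01, 0.03, 0.05],
    }
    # Tune the kernel and gamma to maximize the whole pipeline's CV accuracy
    search = GridSearchCV(clf, param_grid, cv=3).fit(X, y)
    print(search.best_params_)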
Locally-Linear Embedding (LLE)
What is LLE? Unrolling the Swiss Roll with LLE

• Non-linear dimensionality
reduction technique.
• Unlike previous algorithms, LLE
doesn't rely on projections.
• Measures how each training
instance linearly relates to its
nearest neighbors, then looks for a
low-dimensional representation of
the training set that preserves
those local relationships.
• Scales poorly to large datasets.
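A minimal LLE sketch on the Swiss roll (the neighbor count of 10 is an illustrative default, not a tuned value):

    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import LocallyLinearEmbedding

    X, t = make_swiss_roll(n_samples=1000, noise=0.2, random_state=42)

    # Preserve local linear relationships among each instance's 10 nearest neighbors
    lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
    X_unrolled = lle.fit_transform(X)
    print(X_unrolled.shape)  # (1000, 2)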
Unsupervised
Learning



Clustering
• The task of identifying similar instances and assigning them
to clusters, or groups of similar instances.
• Just like in classification, each instance gets assigned to a
group. However, unlike classification, clustering is an
unsupervised learning task.



Clustering



Clustering
• Wide variety of use cases for clustering algorithms.
• Customer segmentation
• Basic data analysis
• Dimensionality reduction
• Feature engineering
• Anomaly detection
• Semi-supervised learning
• Image segmentation



Hard vs Soft Clustering
• Hard clustering: assigning each instance to a single cluster.
• Soft clustering: give each instance a score for each cluster.
• The score can be the distance between the instance and the centroid, or alternatively a similarity (affinity) score.



K-Means
Origins of K-Means algorithm Unlabeled dataset with 5 "clusters"
• The K-Means algorithm is a simple
algorithm capable of clustering
this kind of dataset very
efficiently.
• It was proposed by Stuart Lloyd at
Bell Labs in 1957.
• In 1965, Edward W. Forgy
published virtually the same
algorithm, so K-Means is
sometimes referred to as Lloyd–
Forgy.
K-Means
Deriving the K-Means algorithm A few iterations of the K-Means algorithm

• Suppose you were given the centroids. How would you label the instances?
• Suppose you were given all the instance labels. How would you compute the centroids?
• But you are given neither the labels nor the centroids. How can you proceed?
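The answer is to alternate between the two steps: assign each instance to its closest centroid, then recompute each centroid as the mean of its assigned instances, and repeat until the assignments stop changing. A minimal scikit-learn sketch (the blob dataset and k=5 are illustrative):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

    # n_init=10 runs the algorithm from 10 initializations and keeps the best solution
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
    labels = kmeans.fit_predict(X)   # hard cluster assignment for each instance

    print(kmeans.cluster_centers_)   # one centroid per cluster
    print(labels[:10])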
K-Means
Limitations of K-Means algorithm K-Means decision boundaries

• K-Means algorithm does not behave well when blobs have very different diameters. Why?
• K-Means only cares about the distance to the centroid when assigning an instance to a cluster.



K-Means
• Algorithm is guaranteed to converge, but it may not converge to
the "best" solution. Compare solutions by comparing inertia.
• Inertia of a solution is the sum of squared distances between
the training instances and their closest centroid.
• The solution with lower inertia is better; the "best" solution will
minimize inertia.
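A quick check that the reported inertia matches its definition, using an illustrative blob dataset:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X)

    # Inertia as reported by scikit-learn ...
    print(kmeans.inertia_)
    # ... equals the sum of squared distances to each instance's closest centroid
    dists = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)
    print((dists ** 2).sum())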



K-Means
Centroid initialization methods Example: "unlucky" centroid initialization

• Randomly initialize centroids, run the algorithm n_init number of times and keep the "best" solution.
• K-Means++ algorithm initializes centroids that are distant from one another. Smarter initialization reduces computation.
K-Means
For large data use mini-batch K-Means Mini-batch: higher inertia but much faster!

• Instead of using the full dataset at each iteration, use mini-batches.
• Speeds up the algorithm by a factor of three to four.
• Possible to cluster huge datasets that do not fit in memory.
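A minimal MiniBatchKMeans sketch (the dataset size and batch size are illustrative):

    from sklearn.cluster import MiniBatchKMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=100_000, centers=5, random_state=42)

    mbk = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=10, random_state=42)
    mbk.fit(X)
    print(mbk.inertia_)  # typically a bit higher than full K-Means, but much faster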



K-Means
Finding optimal number of clusters Poor choice of k: too low (L) vs too high (R)

• Typically, you will not know k beforehand.
• Poor choice of k will lead to low-quality solutions.
• Inertia is not a good performance metric when trying to choose k. Why?



K-Means
Choose k to "minimize" inertia? Can use an "elbow" plot to choose k
• Increasing k will always
decrease inertia. Why?
• Can plot inertia as function of
k and look for "elbow".
• This method is crude but
doesn't require too much
computation.
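A sketch of building an elbow plot (the dataset and the range of k values are illustrative):

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

    ks = range(1, 10)
    inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
                for k in ks]

    plt.plot(ks, inertias, "o-")
    plt.xlabel("k")
    plt.ylabel("inertia")
    plt.show()  # look for the "elbow" where the curve stops dropping sharply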



K-Means
What is the silhouette coefficient? Choose k to maximize silhouette score
• Silhouette coefficient can vary
between –1 and +1.
• Coefficient close to +1 means that the
instance is well inside its own cluster
and far from other clusters.
• Coefficient close to 0 means that
instance is close to a cluster
boundary.
• Coefficient close to –1 means that the
instance may have been assigned to
the wrong cluster.
• Average silhouette coefficient across
all instances is the silhouette score.
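A sketch of choosing k via the silhouette score (the dataset and the k range are illustrative):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import silhouette_score

    X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)

    for k in range(2, 9):
        labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
        print(k, silhouette_score(X, labels))  # pick the k with the highest score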
K-Means
Analyze silhouette diagram to choose k Helps identify "balanced" clusters
• Silhouette diagram: plot
of silhouette coefficients sorted
by the cluster they are assigned
to and by the value of the
coefficient.
• Vertical dashed lines represent the mean silhouette score for each number of clusters.



K-Means
Limitations of K-Means K-Means fails with "non-circular" clusters
• Necessary to run the algorithm
several times to avoid suboptimal
solutions.
• Need to specify the number of
clusters.
• K-Means does not behave well
when the clusters have varying
sizes, different densities, or non-
spherical shapes.
• Important to scale the input
features before you run K-Means!
Clustering for Image Segmentation
• Image segmentation is the task of partitioning an image into
multiple segments.
• color segmentation: pixels with a similar color get assigned to
the same segment.
• semantic segmentation: all pixels that are part of the same
object type get assigned to the same segment.
• instance segmentation: all pixels that are part of the same
individual object are assigned to the same segment.



Clustering for Image Segmentation
Pixels get the mean color of their clusters K-means with various color "clusters"
• K-Means algorithm with 8 color clusters outputs the image shown in the upper right.
• With fewer than eight clusters, the ladybug's flashy red color fails to get a cluster of its own.
• K-Means prefers clusters of similar sizes. The ladybug is small, so even though its color is flashy, K-Means fails to give it a separate cluster.
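A minimal color-segmentation sketch: every pixel becomes a 3D (RGB) point, the pixels are clustered, and each pixel is replaced by the mean color of its cluster. A random array stands in for a real image so the sketch stays self-contained:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    image = rng.random((64, 64, 3))      # placeholder for an (H, W, 3) RGB image

    pixels = image.reshape(-1, 3)        # one row per pixel
    kmeans = KMeans(n_clusters=8, n_init=10, random_state=42).fit(pixels)

    # Replace each pixel with the mean color of its cluster
    segmented = kmeans.cluster_centers_[kmeans.labels_]
    segmented_image = segmented.reshape(image.shape)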
Clustering for Semi-Supervised
Learning
Clustering can help label unlabeled data 50 representative digits (one per cluster)
• Suppose that we have plenty of
unlabeled instances but very few
labeled instances.
• Use clustering algorithm to
identify representative instances
and label these manually.
• Propagate manual labels to all
instances in the same cluster.
• Re-train your classification
algorithms on this larger data set.
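A sketch of this workflow on the digits dataset (k=50 representatives and logistic regression as the classifier are illustrative choices):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression

    X, y = load_digits(return_X_y=True)

    k = 50
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    dist = kmeans.fit_transform(X)                 # distance of each instance to each centroid
    representative_idx = np.argmin(dist, axis=0)   # the instance closest to each centroid

    # Pretend only these 50 representative digits get labeled manually ...
    y_representative = y[representative_idx]
    # ... then propagate each manual label to every instance in the same cluster
    y_propagated = y_representative[kmeans.labels_]

    clf = LogisticRegression(max_iter=5000).fit(X, y_propagated)
    print(clf.score(X, y))  # trained on propagated labels, scored against the true ones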
Active Learning
• Active learning: a human expert interacts with the learning algorithm, providing labels for specific instances when the algorithm requests them.
• Many different strategies for active learning, but one of the
most common ones is called uncertainty sampling.



Uncertainty Sampling
1. Model is trained on the labeled instances gathered so far;
model is used to make predictions on all the unlabeled
instances.
2. The instances for which the model is most uncertain are given
to the expert for labeling.
3. Iterate until the performance improvement stops being worth
the labeling effort.
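A sketch of one round of uncertainty sampling (the digits dataset, the initial 100 labels, and the batch of 10 queries are illustrative):

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression

    X, y = load_digits(return_X_y=True)
    labeled = np.zeros(len(X), dtype=bool)
    labeled[:100] = True                  # pretend only 100 instances start out labeled

    # Step 1: train on the labeled instances, predict on the unlabeled ones
    clf = LogisticRegression(max_iter=5000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[~labeled])

    # Step 2: the lower the top-class probability, the more uncertain the model is
    uncertainty = 1 - proba.max(axis=1)
    query_idx = np.flatnonzero(~labeled)[np.argsort(uncertainty)[-10:]]
    print(query_idx)  # send these to the expert, add their labels, then repeat (step 3)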



DBSCAN
• Defines clusters as continuous regions of high density. Here is how it
works:
• For each instance, count how many instances are located within a
small distance ε from it. This region is called the instance’s ε-
neighborhood.
• If instance has at least min_samples instances in its ε-neighborhood,
then it is considered a core instance.
• All instances in the ε-neighborhood of a core instance belong to the
same cluster.
• Any instance that is not a core instance and does not have a core instance in its ε-neighborhood is considered an anomaly.
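A minimal DBSCAN sketch on the two-moons dataset (the eps and min_samples values are illustrative):

    from sklearn.cluster import DBSCAN
    from sklearn.datasets import make_moons

    X, _ = make_moons(n_samples=1000, noise=0.05, random_state=42)

    dbscan = DBSCAN(eps=0.2, min_samples=5).fit(X)
    print(set(dbscan.labels_))               # cluster ids; label -1 marks anomalies
    print(len(dbscan.core_sample_indices_))  # number of core instances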



DBSCAN



DBSCAN
Pros:
• Capable of identifying any number of clusters of any shape.
• Robust to outliers.
• Only two hyperparameters.

Cons:
• If the density varies significantly across the clusters, or if there's no sufficiently low-density region around some clusters, DBSCAN can struggle to capture all the clusters.
• Algorithm does not scale well to large datasets.
Other clustering algorithms
• Agglomerative clustering
• BIRCH
• Mean-Shift
• Affinity propagation
• Spectral clustering



Gaussian Mixture Model (GMM)
• Probabilistic model that assumes that instances were
generated from a mixture of several Gaussian distributions
whose parameters are unknown.
• Many variants; simplest variant discussed here requires that
you know the number of clusters, k.



Gaussian Mixture Model (GMM)
• Dataset is assumed to be generated as follows.
• For each instance, a cluster is picked randomly from among the k clusters. The probability of choosing cluster j is the cluster weight ϕ(j). The index of the cluster chosen for instance i is z(i).
• If instance i was assigned to the cluster j (i.e., z(i) = j), then
location x(i) of this instance is sampled randomly from Gaussian
distribution with mean μ(j) and covariance matrix Σ(j).



Expectation Maximization (EM)
• Expectation-Maximization (EM) algorithm has many similarities
with the K-Means algorithm:
1. Initializes the cluster parameters randomly.
2. Expectation step: assign instances to clusters.
3. Maximization step: update the clusters.
4. Repeat steps 2 and 3 until convergence.



Expectation Maximization (EM)
• Expectation-Maximization (EM) can be thought of as a
generalization of the K-Means algorithm:
• Like K-Means algorithm, EM algorithm finds the cluster
centers.
• EM algorithm also finds cluster size, shape, and orientation (as
well as their relative weights).
• EM algorithm uses soft cluster assignments, unlike the K-Means algorithm, which uses hard assignments.



Gaussian Mixture Model (GMM)
GMM is a soft clustering model Clusters, decision boundaries, contours
• Expectation step: algorithm
estimates probability that each
instance belongs to each cluster.
• Maximization step: each cluster is
updated using all instances; each
instance is weighted by the
estimated probability that it
belongs to that cluster.
• Each cluster’s update will mostly
be impacted by the instances that
have high probability of being in
the cluster.
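A minimal GMM sketch showing both the hard and the soft assignments (the blob dataset and k=3 are illustrative):

    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=1000, centers=3, random_state=42)

    gm = GaussianMixture(n_components=3, n_init=10, random_state=42).fit(X)
    print(gm.weights_)               # cluster weights phi(j)
    print(gm.means_)                 # cluster means mu(j)
    print(gm.predict(X)[:5])         # hard assignments
    print(gm.predict_proba(X)[:5])   # soft assignments from the expectation step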
Gaussian Mixture Model (GMM)
How to restrict shape and orientation of clusters? Clusters using different covariance types

• When there are many dimensions, or many clusters, or few instances, EM can struggle to converge to the optimal solution.
• Can reduce problem difficulty by limiting the range of shapes and orientations that the clusters can have.
• Impose constraints on the covariance matrices. Options are full, spherical, diagonal, tied.
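In scikit-learn these constraints are exposed through the covariance_type parameter (note the spelling "diag"); a small sketch on an illustrative dataset:

    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=1000, centers=3, random_state=42)

    for cov_type in ("full", "tied", "diag", "spherical"):
        gm = GaussianMixture(n_components=3, covariance_type=cov_type,
                             n_init=10, random_state=42).fit(X)
        print(cov_type, gm.covariances_.shape)  # fewer free parameters as constraints tighten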
Anomaly Detection Using Gaussian
Mixtures
GMMs make anomaly detection simple! Anomalies are represented as stars
• Any instance located in a low-
density region can be
considered an anomaly.
• Must define what density
threshold to use.
• Too many false positives,
decrease the threshold; too
many false negatives, increase
the threshold.
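A sketch of GMM-based anomaly detection (the 2% density threshold is an illustrative choice):

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=1000, centers=3, random_state=42)
    gm = GaussianMixture(n_components=3, n_init=10, random_state=42).fit(X)

    densities = gm.score_samples(X)           # log-density at each instance
    threshold = np.percentile(densities, 2)   # flag the lowest-density 2% as anomalies
    anomalies = X[densities < threshold]
    print(len(anomalies))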
Selecting the Number of Clusters
How to choose number of clusters? Use an "elbow" plot to select k
• Metrics like inertia or silhouette
score are not reliable when
clusters are not spherical or
have different sizes.
• Choose k to minimize some
information criterion, such as
Bayesian information criterion
(BIC) or Akaike information
criterion (AIC).
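A sketch of choosing the number of mixture components via BIC/AIC (the dataset and k range are illustrative):

    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=1000, centers=3, random_state=42)

    for k in range(1, 7):
        gm = GaussianMixture(n_components=k, n_init=10, random_state=42).fit(X)
        print(k, gm.bic(X), gm.aic(X))  # choose the k that minimizes BIC (or AIC)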
Bayesian GMM
• Rather than manually searching for the optimal number of
clusters, you can use Bayesian GMM.
• Capable of giving weights equal (or close) to zero to
unnecessary clusters.
• Set the number of clusters to a value that you have good
reason to believe is greater than the optimal number of
clusters.
• Algorithm will eliminate the unnecessary clusters automatically.



GMM Limitations



Algorithms for anomaly detection
• Fast-MCD
• Isolation Forest
• Local Outlier Factor (LOF)
• One-class SVM
• Invertible dimensionality reduction algorithms such as PCA



Introduction to
Neural Networks



Introduction to Neural Networks
• Artificial neural networks (ANNs).
• ML model inspired by the networks of biological neurons found
in our brains.
• ANNs have gradually become quite different from their
biological cousins.
• ANNs are at the very core of Deep Learning.



Introduction to Neural Networks
• Ideal for tackling large and highly complex tasks.
• Classifying billions of images (Google Images).
• Speech recognition services (Apple’s Siri).
• Recommending the best videos to watch (YouTube).
• Super-human gaming (DeepMind’s AlphaGo).



A Brief History of ANNs
• Interest in ANNs has come in waves.
• First wave of interest kicked off by McCulloch and Pitts (1943).
• Computational model of how biological neurons could
perform complex computations using propositional calculus.
• Early success led to lots of hype! But by the 1960s the hype had died down and ANNs fell into disuse as ANN research entered its first "winter" period.



A Brief History of ANNs
• A second wave of interest in ANNs kicked off in the early 1980s.
• In the 1980s new ANN architectures were invented and better training techniques were developed. Lots of hype!
• In 1990s other powerful ML techniques were invented, such as
SVMs, which seemed to offer better results than ANNs.
• Hype died down and ANN research entered its second "winter"
period.



A Brief History of ANNs
• Now experiencing a third wave of interest in ANNs. Will we see a
repeat of the past? Or is this time different?
• Huge quantities of data now available to train ANNs.
• Better training algorithms have been developed.
• Theoretical issues of ANNs have turned out to be benign in
practice.
• ANNs often outperform other ML techniques on large and
complex problems.
• Increase in computing power (GPUs, TPUs, etc.) makes it
possible to train large ANNs efficiently.



Biological Neurons
A single biological neuron Layers of biological neurons in a brain



Logical Computations with ANNs
• McCulloch and Pitts (1943) developed a simple model of a
biological neuron.
• Artificial neuron has one or more binary (on/off) inputs and one
binary output.
• Artificial neuron activates when more than certain number of
inputs are active.
• They proved that one can build ANNs to compute any logical proposition.



Logical Computations with Artificial
Neurons
ANNs performing logical computations ANNs can act as logical operators

• The first network is the identity function.
• The second network performs a logical AND.
• The third network performs a logical OR.
• The fourth network computes a slightly more complex logical proposition.
Threshold Logical Unit (TLU)
One of the simplest ANN architectures Linear transformation + step function

• Artificial neuron
called a threshold logic
unit (TLU).
• Inputs and output are numbers
(not binary on/off values).
• Each input connection is
associated with a weight.



TLUs as simple binary classifiers
• TLUs can perform linear binary classification (like logistic
regression, linear SVMs, etc.).
• TLU first computes a linear function of its inputs and then
applies a step function to the result.
• Most common step function used in TLU is the Heaviside step
function.
• If result exceeds a threshold, then output positive class (else
output the negative class).
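A minimal NumPy sketch of a TLU: a linear function of the inputs followed by a Heaviside step (the weights, bias, and inputs are made-up values):

    import numpy as np

    def tlu(x, w, b, threshold=0.0):
        # Weighted sum of the inputs, then the Heaviside step function
        z = x @ w + b
        return (z >= threshold).astype(int)

    X = np.array([[1.5, -0.5],
                  [0.2,  0.3],
                  [-2.0, 0.1]])
    w = np.array([1.0, 2.0])
    b = -0.5
    print(tlu(X, w, b))  # 1 = positive class, 0 = negative class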



The Perceptron
Perceptron is a single layer of TLUs Perceptron with 2 inputs and 3 outputs

• Composed of one or more TLUs organized in a single layer: every TLU is connected to every input.
• Such a layer is called a fully connected or dense layer.
• Inputs form an "input layer"; since the layer of TLUs also produces the final output, it is also the output layer.
Training a Perceptron
• Perceptron training rule reinforces weights that help reduce the
prediction error.
• “Cells that fire together, wire together”; connection weight
between two neurons tends to increase when they fire
simultaneously.
• Perceptron is trained using a variant of this rule that considers
the error made by the network when making a prediction.
• If classes are linearly separable, then the training rule
converges to a solution.



The Perceptron
Adding layers helps address limitations: a two-layer perceptron can perform XOR

• A single perceptron can't solve the simple XOR classification problem.
• However, a two-layer perceptron can solve the XOR classification problem (see the sketch below).
• Other limitations of the perceptron can be addressed by adding multiple layers.
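A quick check with scikit-learn's MLPClassifier: one small hidden layer is enough for XOR (the hidden layer size, activation, and solver are illustrative choices, and a different random seed may be needed for convergence):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 0])  # XOR labels

    mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                        solver="lbfgs", max_iter=5000, random_state=1)
    mlp.fit(X, y)
    print(mlp.predict(X))  # expected: [0 1 1 0]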
The Multi-layer Perceptron (MLP)
Key neural network terms MLP with one hidden layer
• An input layer, one or more layers
of TLUs, called hidden layers, one
final layer of TLUs called
the output layer.
• When an ANN contains a deep
stack of hidden layers, it is called
a deep neural network (DNN).
• A feedforward neural network (FNN) is one in which the signal flows in one direction, from the inputs to the outputs.
Training an MLP
• In 1970, master's student Seppo Linnainmaa developed an efficient algorithm called reverse-mode automatic differentiation.
• Computes gradients of the neural network’s error for every
single model parameter with just two passes through the
network!
• Combination of reverse-mode automatic differentiation and
stochastic gradient descent is called backpropagation (or
backprop).



Backpropagation
• Backpropagation algorithm in brief...

• Randomly initialize all the model parameters.


• Forward pass: make predictions for a mini-batch of training
samples.
• Measure the error between the current predictions and the targets using some loss function.
• Backward pass: go through each layer in reverse to measure the error contribution of each parameter.
• Gradient Descent: update model parameters to reduce the error.
• Repeat for some, typically large, number of mini-batches.
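A bare-bones NumPy sketch of these steps for a tiny one-hidden-layer network trained on XOR with a mean-squared-error loss (the architecture, learning rate, and loss are illustrative simplifications, not the slides' exact setup):

    import numpy as np

    rng = np.random.default_rng(42)
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    y = np.array([[0.], [1.], [1.], [0.]])

    # Randomly initialize the parameters of a 2-16-1 network
    W1, b1 = rng.normal(0, 1, (2, 16)), np.zeros(16)
    W2, b2 = rng.normal(0, 1, (16, 1)), np.zeros(1)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for step in range(5000):
        # Forward pass: predictions for the (tiny) batch
        h = sigmoid(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        # Backward pass: error contribution of each parameter (MSE loss)
        dp = 2 * (p - y) / len(X)
        dz2 = dp * p * (1 - p)
        dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
        dz1 = (dz2 @ W2.T) * h * (1 - h)
        dW1, db1 = X.T @ dz1, dz1.sum(axis=0)
        # Gradient descent: update the parameters to reduce the error
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(p.round(2))  # should approach [[0], [1], [1], [0]]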



Activation Functions
Backprop requires non-linear activations Activation functions (L) vs derivatives (R)

• Without non-linear activation functions DNNs would just be linear functions! Why?
• Popular activation functions are sigmoid, ReLU, and hyperbolic tangent.
• A large enough DNN with non-linear activations can approximate any continuous function.
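The three activations named above, written out in NumPy for reference:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):
        return np.maximum(0.0, z)

    def tanh(z):
        return np.tanh(z)

    z = np.linspace(-3, 3, 7)
    print(sigmoid(z))
    print(relu(z))
    print(tanh(z))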
MLPs for Regression
• If you want to predict a single value, then you just need a single
output neuron: its output is the predicted value.
• To predict multiple values at once, you need one output
neuron per output dimension.
• Often does not use any activation function for the output layer,
so it’s free to output any value it wants.
• Typically use root mean squared error (RMSE) loss but other
loss functions are possible as well.



MLPs for Classification
• For a binary classification you need a single output neuron and
the sigmoid activation function.
• For multi-label binary classification tasks, you need one output
neuron per output dimension.
• For multi-class classification tasks, you need one output neuron
per class and the SoftMax activation function.



MLPs for Classification
Multi-class classification using MLPs SoftMax activation + Cross-entropy loss
• One output neuron per class;
use the SoftMax activation
function.
• SoftMax function ensures that
estimated probabilities are
between 0 and 1 (and that they
add up to 1 since the classes are
exclusive).
• Typically combine SoftMax
activation with cross-entropy
loss function.
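A small NumPy sketch of the SoftMax function and the cross-entropy loss it is usually paired with (the logits and labels are made-up values):

    import numpy as np

    def softmax(logits):
        # Subtract the row max for numerical stability; each row sums to 1
        z = logits - logits.max(axis=1, keepdims=True)
        exp = np.exp(z)
        return exp / exp.sum(axis=1, keepdims=True)

    def cross_entropy(probs, y_true):
        # Mean negative log-probability assigned to the true class
        n = len(y_true)
        return -np.log(probs[np.arange(n), y_true] + 1e-12).mean()

    logits = np.array([[2.0, 0.5, -1.0],
                       [0.1, 0.2,  3.0]])
    probs = softmax(logits)
    print(probs.sum(axis=1))                      # [1. 1.]
    print(cross_entropy(probs, np.array([0, 2])))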
Building Intuition for MLPs



Check out TensorFlow Playground!

Challenges:
• Solve each of the classification tasks.
• Can you find a single neural network that will perform well on
all the classification tasks?
• Can you solve all the classification tasks without using any
hidden layers?



Where to go from here?



Additional Training
• After completing Introduction to Machine Learning you should
check out the following training courses.
• Introduction to Deep Learning
• Accelerated Machine Learning

