
Variance: Variance refers to the changes in the model when different portions of the training data set are used. It is the variability in the model's predictions: how much the learned function can adjust depending on the given data set.

Covariance: Covariance measures the relationship between a pair of random variables: how a change in one variable is associated with a change in the other. A positive covariance means the variables tend to move together; a negative covariance means they tend to move in opposite directions.

Correlation: Correlation is a statistical measure that indicates how strongly two variables are related; it is a normalized form of covariance and always lies between -1 and +1.
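To make these three measures concrete, here is a minimal NumPy sketch; the data values are made up purely for illustration.

```python
# Variance, covariance, and correlation side by side (illustrative data).
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 9.0])

var_x = np.var(x, ddof=1)             # sample variance of x
cov_xy = np.cov(x, y, ddof=1)[0, 1]   # covariance between x and y
corr_xy = np.corrcoef(x, y)[0, 1]     # correlation, always in [-1, 1]

print(f"variance(x) = {var_x:.3f}")
print(f"covariance  = {cov_xy:.3f}")
print(f"correlation = {corr_xy:.3f}")
```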

Eigenvalues and Eigenvectors: Eigenvalues and eigenvectors are used to represent data, to perform operations on data, and to train machine learning models. In artificial intelligence, they are used in algorithms for tasks such as image recognition, natural language processing, and robotics.
1. Eigenvalue (λ): An eigenvalue of a square matrix A is a scalar (a single number) λ such that there exists a non-zero vector v (the eigenvector) for which the following equation holds:
Av = λv

In other words, when you multiply the matrix A by the eigenvector v, you get a new vector
that is just a scaled version of v (scaled by the eigenvalue λ).

2. Eigenvector: The vector v mentioned above is called an eigenvector corresponding to the eigenvalue λ. Eigenvectors only change in scale (magnitude) when multiplied by the matrix A; their direction remains the same.

To find eigenvalues and eigenvectors, you typically solve the following equation for λ and v:
(A - λI)v = 0
Where:
1. A is the square matrix for which you want to find eigenvalues and eigenvectors.
2. λ is the eigenvalue you're trying to find.
3. I is the identity matrix (a diagonal matrix with 1s on the diagonal and 0s elsewhere).
4. v is the eigenvector you're trying to find.
In practice, the eigenvalues are found first by solving the characteristic equation det(A - λI) = 0; each eigenvalue is then substituted back into (A - λI)v = 0 to find its eigenvector.
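As a quick check of the definition, the following NumPy sketch verifies Av = λv for a small arbitrary matrix (the matrix values are made up).

```python
# Verify Av = λv for each eigenpair of a small example matrix.
import numpy as np

A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # columns are eigenvectors

for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    # Both sides should match up to floating-point error.
    print(f"λ = {lam:.3f}, Av = {A @ v}, λv = {lam * v}")
```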

Use of Eigenvalues and Eigenvectors in Machine Learning and AI:

1. Dimensionality Reduction (PCA): In Principal Component Analysis (PCA), you calculate the eigenvectors and eigenvalues of the covariance matrix of your data. The eigenvectors (principal components) with the largest eigenvalues capture the most variance in the data and can be used to reduce the dimensionality of the dataset while preserving important information (see the sketch after this list).

2. Image Compression: Eigenvectors and eigenvalues are used in techniques like Singular Value Decomposition (SVD) for image compression.

3. Support Vector Machines: Support vector machines (SVMs) are a type of machine learning algorithm that can be used for classification and regression tasks. SVMs work by finding a hyperplane that separates the data into two classes. The eigenvalues and eigenvectors of the SVM's kernel matrix can be used to improve the performance of the algorithm.
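Below is a minimal PCA sketch in NumPy, assuming a small synthetic data matrix X (rows are samples, columns are features); it follows the recipe above: center, compute the covariance matrix, eigendecompose, and project onto the top components.

```python
# PCA via eigendecomposition of the covariance matrix (synthetic data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # 100 samples, 3 features (made up)

X_centered = X - X.mean(axis=0)         # 1. center each feature
cov = np.cov(X_centered, rowvar=False)  # 2. covariance matrix (3x3)
eigvals, eigvecs = np.linalg.eigh(cov)  # 3. eigendecomposition (symmetric)

# 4. sort eigenvectors by descending eigenvalue and keep the top 2
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]

X_reduced = X_centered @ components     # 5. project down to 2 dimensions
print(X_reduced.shape)                  # (100, 2)
```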

What is dimensionality reduction?


Dimensionality reduction is the process of reducing the number of input features (dimensions) in a dataset while retaining as much meaningful information as possible. It aids in data compression, allowing the data to take up less storage space as well as reducing computation times. The technique is commonly used in machine learning (ML).

Different techniques, such as feature selection and feature extraction, are used to perform dimensionality reduction.
Why is dimensionality reduction important for machine learning?

Machine learning models typically require large data sets to train properly. Dimensionality reduction is a particularly useful way to prevent overfitting and to make classification and regression problems more tractable.

Feature Selection Techniques in Machine Learning


Feature selection:
Feature selection is the process of choosing a subset of the original features so that the feature space is optimally reduced according to a certain criterion (a minimal sketch follows).
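As an illustration only, here is a filter-style feature selection sketch using scikit-learn's SelectKBest on the bundled Iris dataset (chosen purely for demonstration; the scoring function and k are assumptions).

```python
# Keep the k features that score highest under a univariate test.
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)       # 4 original features

selector = SelectKBest(score_func=f_classif, k=2)  # keep the 2 best
X_selected = selector.fit_transform(X, y)

print(X.shape, "->", X_selected.shape)  # (150, 4) -> (150, 2)
print("kept feature indices:", selector.get_support(indices=True))
```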

What is Unsupervised Learning?


Unsupervised learning is a type of machine learning in which models are trained on an unlabeled dataset and are allowed to act on that data without any supervision.

Why use Unsupervised Learning?


o Unsupervised learning is helpful for finding useful insights in data.
o Unsupervised learning is similar to how a human learns to think through their own experiences, which makes it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which makes it all the more important.
o In the real world, we do not always have input data with corresponding outputs, so to solve such cases we need unsupervised learning.

Types of Unsupervised Learning Algorithm:


Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with objects in other groups.

Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the sets of items that occur together in the dataset (a minimal sketch follows).
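As a toy illustration, the following pure-Python sketch computes support and confidence for one candidate rule on made-up transactions.

```python
# Support and confidence for the candidate rule {bread} -> {butter}.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

n = len(transactions)
support_bread = sum("bread" in t for t in transactions) / n
support_both = sum({"bread", "butter"} <= t for t in transactions) / n
confidence = support_both / support_bread  # estimate of P(butter | bread)

print(f"support({{bread, butter}}) = {support_both:.2f}")  # 0.50
print(f"confidence(bread -> butter) = {confidence:.2f}")   # 0.67
```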

Advantages of Unsupervised Learning


Unsupervised learning is used for more complex tasks than supervised learning because, in unsupervised learning, we don't have labeled input data.
Unsupervised learning is often preferable because unlabeled data is easier to obtain than labeled data.

Disadvantages of Unsupervised Learning


Unsupervised learning is intrinsically more difficult than supervised learning because there is no corresponding output to learn from.
The result of an unsupervised learning algorithm may be less accurate, since the input data is not labeled and the algorithm does not know the exact output in advance.

Applications of Clustering
In Identification of Cancer Cells: Clustering algorithms are widely used for the identification of cancerous cells. They divide cancerous and non-cancerous data points into different groups.
In Search Engines: Search engines also work on the clustering technique. Search results appear based on the objects closest to the search query: similar data objects are grouped together, far from dissimilar objects. The accuracy of a query's results depends on the quality of the clustering algorithm used.
Customer Segmentation: Clustering is used in market research to segment customers based on their choices and preferences.
In Biology: It is used in biology to classify different species of plants and animals using image recognition techniques.

In Land Use: The clustering technique is used to identify areas of similar land use in a GIS database. This can be very useful for determining the purpose for which a particular area of land is most suitable.

What Is Hierarchical Clustering?


Hierarchical agglomerative clustering is one of the main types of clustering algorithms and works using the following steps (a code sketch follows the list):
1. Each item within a dataset starts as an individual cluster (AKA: a singleton).
2. The algorithm computes the proximity between each pair of clusters.
3. It then merges the pair of closest clusters and recomputes the proximities among the remaining clusters.
4. This process is repeated until only one cluster is left.
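Here is a brief sketch using scikit-learn's AgglomerativeClustering on made-up 2-D points; the linkage method and cluster count are illustrative assumptions.

```python
# Agglomerative clustering: merge closest clusters until 2 remain.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1.0, 1.0], [1.5, 1.0], [5.0, 5.0], [5.5, 5.2], [9.0, 9.0]])

model = AgglomerativeClustering(n_clusters=2, linkage="average")
labels = model.fit_predict(X)

print(labels)  # e.g. [0 0 1 1 1] -- one cluster label per point
```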

What Is Partitional Clustering?


On the other hand, partitional clustering is another primary type of clustering technique. The K-Means
algorithm is one of the most popular partitional clustering methods.

K-Means clustering requires the analyst to define the number of clusters K, or the number of iterations, before running the algorithm. As such, it relies heavily on the analyst's knowledge to classify the clusters in a meaningful way.

The K-Means clustering works using the following steps:

1. The analyst selects the number of clusters K (which sets the number of initial centroids) or the number of iterations for the algorithm to run (AKA: the stopping criterion).

2. Objects closest to each centroid are grouped, forming K clusters.

3. With every iteration, the centroid of each cluster shifts and is updated.

4. The process is repeated until there is no more change in the centroids or until the number of iterations is fulfilled.

Difference between Hierarchical and Partitional Clustering

In conclusion, the main difference is that in Hierarchical clustering each object starts as an individual cluster (a singleton), and with every iteration the closest clusters are merged; this process repeats until one single cluster remains. Partitional clustering, by contrast, divides the data directly into a fixed number of non-overlapping clusters (such as the K clusters in K-Means) without building a nested hierarchy.

Complete versus Partial

A complete clustering allocates each object to a cluster, whereas a partial clustering does not. The motivation for partial clustering is that some objects in a data set may not belong to any well-defined group; they may instead be outliers, noise, or "uninteresting background." For example, some news stories may share a common subject, such as "Industrial production shrinks globally by 1.1 percent," while other stories are more generic or one-of-a-kind. Consequently, to locate the significant topics in last month's stories, we might want to search only for clusters of documents that are strongly related by a common subject. In other cases, a complete clustering of objects is desired. For example, an application that uses clustering to organize documents for browsing needs to ensure that all documents can be browsed.

Types of Clustering in Machine Learning


Clustering broadly divides into two subgroups:
Hard Clustering: Each input data point either fully belongs to a cluster or it does not. For instance, in a customer-segmentation example with ten clusters, every customer is assigned to exactly one of the ten groups.
Soft Clustering: Rather than assigning each input data point to a single cluster, it assigns a probability or likelihood of the data point belonging to each cluster. In the same scenario, each customer would receive a probability of being in each of the ten retail store clusters. A short sketch contrasting the two follows.
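The following sketch contrasts the two kinds of assignment on the same synthetic data, using scikit-learn's KMeans for hard labels and GaussianMixture for soft (probabilistic) ones.

```python
# Hard vs. soft cluster assignments on synthetic 2-D data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
soft = GaussianMixture(n_components=2, random_state=0).fit(X).predict_proba(X)

print(hard[:3])           # one cluster label per point, e.g. [1 1 1]
print(soft[:3].round(2))  # a probability per cluster for each point
```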

What is K-Means Algorithm?

K-Means Clustering is an unsupervised learning algorithm which groups an unlabeled dataset into different clusters. Here K defines the number of pre-defined clusters that need to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.

It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group of similar properties.

It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between each data point and the centroid of its cluster.

The algorithm takes the unlabeled dataset as input, divides the dataset into K clusters, and repeats the process until it finds the best clusters. The value of K should be predetermined in this algorithm.

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the steps below:

Step-1: Select the number K to decide the number of clusters.

Step-2: Select K random points as centroids (they can be points other than those in the input dataset).

Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.

Step-4: Calculate a new centroid for each cluster (in practice, the mean of the points assigned to that cluster, which minimizes the within-cluster variance).

Step-5: Repeat Step-3, i.e., reassign each data point to the new closest centroid.

Step-6: If any reassignment occurred, go to Step-4; otherwise, FINISH.

Step-7: The model is ready. A NumPy sketch of these steps follows.
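As a sketch only, the following compact NumPy implementation mirrors Steps 1-7; the data and the value of K are made-up assumptions.

```python
# K-Means from scratch, following the steps above (synthetic data).
import numpy as np

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
K = 2

# Steps 1-2: choose K and pick random data points as initial centroids.
centroids = X[rng.choice(len(X), size=K, replace=False)]

for _ in range(100):
    # Step 3: assign each point to its closest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 4: recompute each centroid as the mean of its cluster.
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    # Step 6: stop when the centroids no longer move.
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print("final centroids:\n", centroids)
```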

Significance of support vector machine

1. **Effective in High-Dimensional Spaces**: SVMs perform well even in high-dimensional spaces, making them suitable for tasks where the number of features exceeds the number of samples. This makes them particularly useful in text classification and image recognition.

2. **Robustness to Overfitting**: SVMs are less prone to overfitting compared to other algorithms like decision trees. This is because they maximize the margin between classes, which helps them generalize well to unseen data.

3. **Kernel Trick for Non-Linear Decision Boundaries**: SVMs can efficiently handle non-linear decision boundaries through the kernel trick. By transforming the input space into a higher-dimensional space, SVMs can find linear decision boundaries in this transformed space, effectively capturing non-linear relationships in the original space.

4. **Global Optimization Objective**: The objective function of SVMs involves finding the hyperplane that maximizes the margin between classes, which is a convex optimization problem. This means that SVMs converge to the global minimum, providing a unique solution and making them more stable compared to algorithms with non-convex objectives.

5. **Regularization Parameter**: SVMs have a regularization parameter (C) that helps control the trade-off between maximizing the margin and minimizing the classification error. This parameter allows users to adjust the flexibility of the model, making it more adaptable to different types of datasets.
6. **Memory Efficient**: Once the SVM model is trained, only a subset of training
data points, the support vectors, are used to define the decision boundary. This makes
SVMs memory efficient, especially when dealing with large datasets.

7. **Well-Studied and Established**: SVMs are a well-studied algorithm with strong theoretical foundations in machine learning. There is a wealth of literature and research on SVMs, making them a trusted choice for many classification and regression tasks.

Overall, the significance of SVMs lies in their ability to handle high-dimensional data,
their robustness to overfitting, their capability to capture non-linear relationships, and
their solid theoretical foundations, making them a valuable tool in the machine
learning toolkit.
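To tie these points together, here is an illustrative scikit-learn sketch: an RBF-kernel SVM (the kernel trick) with regularization parameter C, trained on the bundled Iris data; the parameter values are assumptions, not recommendations.

```python
# RBF-kernel SVM: C trades margin width against training error.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
# Only the support vectors define the decision boundary (memory efficiency).
print("support vectors:", clf.support_vectors_.shape[0],
      "of", X_train.shape[0], "training points")
```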
