
Week-1: Dimensionality Reduction with PCA

Sherry Thomas
21f3001449

Contents
Introduction to Machine Learning
    Broad Paradigms of Machine Learning
Representation Learning
    Potential Algorithm
Principal Component Analysis
    Approximate Representation
    P.C.A. Algorithm
Acknowledgments

Abstract
This week provides an introduction to Machine Learning and subsequently delves into the syllabus with a focus on unsupervised learning. The two primary areas of study covered are representation learning and Principal Component Analysis (PCA).

Introduction to Machine Learning


Machine Learning is a sub-field of artificial intelligence concerned with the design of algorithms and statistical models that allow computers to learn from and make predictions or decisions based on data, without being explicitly programmed. It utilizes mathematical optimization, algorithms, and computational models to analyze and understand patterns in data and make predictions about future outcomes.
It can be further explained as follows:
• Why: Machine Learning is used to automate tasks that would otherwise require human intelligence, to process vast amounts of data, and to make predictions or decisions with greater accuracy than traditional approaches. Its popularity has also surged in recent years.
• Where: Machine Learning is applied in various fields such as computer vision, natural language processing, finance, and healthcare, among others.

• What: Machine Learning departs from traditional procedural approaches; instead, it is driven by data analysis. Rather than memorizing specific examples, it seeks to generalize patterns in the data. Machine Learning is not based on magic; rather, it relies on mathematical principles and algorithms.

Broad Paradigms of Machine Learning


1. Supervised Learning: Supervised Machine Learning is a type of machine
learning where the algorithm is trained on a labeled dataset, meaning that
the data includes both inputs and their corresponding outputs. The goal
of supervised learning is to build a model that can accurately predict the
output for new, unseen input data. A few examples:
• Linear regression for predicting a continuous output
• Logistic regression for binary classification problems
• Decision trees for non-linear classification and regression problems
• Support Vector Machines for binary and multi-class classification problems
• Neural Networks for complex non-linear problems in various domains such
as computer vision, natural language processing, and speech recognition
2. Unsupervised Learning: Unsupervised Machine Learning is a type of
machine learning where the algorithm is trained on an unlabeled dataset,
meaning that only the inputs are provided and no corresponding outputs.
The goal of unsupervised learning is to uncover patterns or relationships
within the data without any prior knowledge or guidance. A few examples:
• Clustering algorithms such as K-means, hierarchical clustering, and
density-based clustering, used to group similar data points together into
clusters
• Dimensionality reduction techniques such as Principal Component Anal-
ysis (PCA), used to reduce the number of features in a dataset while
preserving the maximum amount of information
• Anomaly detection algorithms used to identify unusual data points that
deviate from the normal patterns in the data
3. Sequential Learning: Sequential Machine Learning (also known as time-
series prediction) is a type of machine learning that is focused on making
predictions based on sequences of data. It involves training the model on
a sequence of inputs, such that the predictions for each time step depend
on the previous time steps. A few examples:
• Time series forecasting, used to predict future values based on past trends
and patterns in data such as stock prices, weather patterns, and energy
consumption
• Speech recognition, used to transcribe speech into text by recognizing
patterns in audio signals
• Natural language processing, used to analyze and make predictions about
sequences of text data

Representation Learning
Representation learning is a fundamental sub-field of machine learning that is
concerned with acquiring meaningful and compact representations of intricate
data, facilitating various tasks such as dimensionality reduction, clustering, and
classification.
Let us consider a dataset {x1 , x2 , … , x𝑛 }, where each x𝑖 ∈ ℝ𝑑 . The objective is
to find a representation that minimizes the reconstruction error.
We can start by seeking the best linear representation of the dataset, denoted
by w, subject to the constraint ||w|| = 1.
The representation is given by,

$$\left(\frac{\mathbf{x}_i^T \mathbf{w}}{\mathbf{w}^T \mathbf{w}}\right)\mathbf{w}$$

However, since $\|\mathbf{w}\| = 1$,

$$\therefore \text{Projection} = (\mathbf{x}_i^T \mathbf{w})\,\mathbf{w}$$

The reconstruction error is computed as follows,

$$\text{Reconstruction Error } f(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} \left\|\mathbf{x}_i - (\mathbf{x}_i^T \mathbf{w})\mathbf{w}\right\|^2$$

where $\mathbf{x}_i - (\mathbf{x}_i^T \mathbf{w})\mathbf{w}$ is termed the residue and can be represented as $\mathbf{x}_i'$.


The primary aim is to minimize the reconstruction error. Expanding the squared norm and dropping the constant term $\frac{1}{n}\sum_{i=1}^{n}\|\mathbf{x}_i\|^2$, this leads to the following optimization formulation:

$$\min_{\mathbf{w}:\,\|\mathbf{w}\|=1} f(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} -(\mathbf{x}_i^T\mathbf{w})^2$$

$$\therefore \max_{\mathbf{w}:\,\|\mathbf{w}\|=1} f(\mathbf{w}) = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i^T\mathbf{w})^2 = \mathbf{w}^T\left(\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^T\right)\mathbf{w}$$

$$\max_{\mathbf{w}:\,\|\mathbf{w}\|=1} f(\mathbf{w}) = \mathbf{w}^T \mathbf{C}\,\mathbf{w}$$

where $\mathbf{C} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i\mathbf{x}_i^T$ represents the Covariance Matrix, and $\mathbf{C} \in \mathbb{R}^{d\times d}$.
Notably, the eigenvector $\mathbf{w}$ corresponding to the largest eigenvalue $\lambda$ of $\mathbf{C}$ becomes the sought-after solution for the representation. This $\mathbf{w}$ is often referred to as the First Principal Component of the dataset.
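To make this concrete, here is a minimal NumPy sketch (an illustration, not part of the original notes) that recovers the first principal component as the top eigenvector of the covariance matrix, assuming the data matrix X of shape n × d has already been centered:

```python
import numpy as np

def first_principal_component(X):
    """Return the unit-norm direction w that maximizes w^T C w.

    Assumes X is an (n, d) NumPy array whose columns have zero mean.
    """
    n = X.shape[0]
    C = X.T @ X / n                       # covariance matrix C = (1/n) * sum_i x_i x_i^T
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh: C is symmetric positive semi-definite
    return eigvecs[:, -1]                 # eigenvector of the largest eigenvalue

# Illustrative usage with synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)                    # center the dataset
w1 = first_principal_component(X)
print(np.linalg.norm(w1))                 # ~1.0, since eigenvectors have unit length
```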

Potential Algorithm
Based on the above concepts, we can outline the following algorithm for representation learning:
Given a dataset {x1 , x2 , … , x𝑛 } where x𝑖 ∈ ℝ𝑑 ,
1. Center the dataset:
$$\boldsymbol{\mu} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i$$
$$\mathbf{x}_i \leftarrow \mathbf{x}_i - \boldsymbol{\mu} \quad \forall i$$

2. Find the best representation w ∈ ℝ𝑑 with ||w|| = 1.


3. Update the dataset with the representation:

$$\mathbf{x}_i \leftarrow \mathbf{x}_i - (\mathbf{x}_i^T\mathbf{w})\mathbf{w} \quad \forall i$$

4. Repeat steps 2 and 3 until the residues become zero, resulting in $\mathbf{w}_2, \mathbf{w}_3, \ldots, \mathbf{w}_d$.
The question arises: Is this the most effective approach, and how many w do
we need to achieve optimal compression?
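As a rough sketch of this procedure (assuming, as in the previous section, that the best unit-norm direction at each step is the top eigenvector of the residues' covariance matrix), the potential algorithm might look like this:

```python
import numpy as np

def potential_algorithm(X, tol=1e-10):
    """Iteratively extract directions w_1, w_2, ... until the residues vanish.

    X: (n, d) data matrix; returns a list of unit-norm direction vectors.
    """
    residue = X - X.mean(axis=0)                # step 1: center the dataset
    directions = []
    for _ in range(residue.shape[1]):           # at most d directions are needed
        if np.allclose(residue, 0, atol=tol):   # step 4: stop once residues are zero
            break
        C = residue.T @ residue / residue.shape[0]
        _, eigvecs = np.linalg.eigh(C)
        w = eigvecs[:, -1]                      # step 2: best unit-norm representation
        directions.append(w)
        residue -= np.outer(residue @ w, w)     # step 3: subtract the projections
    return directions
```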

Principal Component Analysis


Principal Component Analysis (PCA) is a powerful technique employed to reduce the dimensionality of a dataset by identifying its most important features,
known as principal components, which explain the maximum variance present
in the data. PCA achieves this by transforming the original dataset into a
new set of uncorrelated variables, ordered by their significance in explaining
the variance. This process is valuable for visualizing high-dimensional data and
preprocessing it before conducting machine learning tasks.
Following the potential algorithm mentioned earlier and utilizing the set of
eigenvectors {w1 , w2 , … , w𝑑 }, we can express each data point x𝑖 as a linear
combination of the projections on these eigenvectors:

$$\forall i \quad \mathbf{x}_i - \left((\mathbf{x}_i^T\mathbf{w}_1)\mathbf{w}_1 + (\mathbf{x}_i^T\mathbf{w}_2)\mathbf{w}_2 + \dots + (\mathbf{x}_i^T\mathbf{w}_d)\mathbf{w}_d\right) = 0$$

$$\therefore \mathbf{x}_i = (\mathbf{x}_i^T\mathbf{w}_1)\mathbf{w}_1 + (\mathbf{x}_i^T\mathbf{w}_2)\mathbf{w}_2 + \dots + (\mathbf{x}_i^T\mathbf{w}_d)\mathbf{w}_d$$

From the above equation, we observe that we can represent the data using
constants {x𝑇𝑖 w1 , x𝑇𝑖 w2 , … , x𝑇𝑖 w𝑑 } along with vectors {w1 , w2 , … , w𝑑 }.
Thus, a dataset initially represented as 𝑑 × 𝑛 can now be compressed to 𝑑(𝑑 + 𝑛)
elements, which might seem suboptimal at first glance.
However, if the data resides in a lower-dimensional subspace, the residues can be
reduced to zero without requiring all 𝑑 principal components. Suppose the data
can be adequately represented using only 𝑘 principal components, where 𝑘 ≪ 𝑑.
In that case, the data can be efficiently compressed from 𝑑 × 𝑛 to 𝑘(𝑑 + 𝑛).
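The compression argument can be sketched in a few lines of NumPy (the helper names here are hypothetical, and W_k is assumed to hold the first k unit-norm eigenvectors as its columns): storing only the k scores per point plus the k basis vectors costs k(d + n) numbers instead of the original d × n.

```python
import numpy as np

def compress(X, W_k):
    """Keep only k(d + n) numbers: the scores (n, k) and the basis W_k (d, k)."""
    return X @ W_k, W_k                  # scores[i, j] = x_i^T w_j

def reconstruct(scores, W_k):
    """Rebuild each x_i as the sum of its projections onto w_1, ..., w_k."""
    return scores @ W_k.T                # exact if the data lies in span(W_k)
```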

Approximate Representation
The question arises: If the data can be approximately represented by a lower-
dimensional subspace, would it suffice to use only those 𝑘 projections? Additionally, how much variance should be covered?
Let us consider a centered dataset {x1 , x2 , … , x𝑛 } where x𝑖 ∈ ℝ𝑑 . Let C represent its covariance matrix, and {𝜆1 , 𝜆2 , … , 𝜆𝑑 } be the corresponding eigenvalues, which are non-negative due to the positive semi-definiteness of the covariance matrix. These eigenvalues are arranged in descending order, with {w1 , w2 , … , w𝑑 } as their corresponding eigenvectors of unit length.
The eigen equation for the covariance matrix can be expressed as follows:

$$\mathbf{C}\mathbf{w} = \lambda\mathbf{w}$$
$$\mathbf{w}^T\mathbf{C}\mathbf{w} = \mathbf{w}^T\lambda\mathbf{w}$$
$$\therefore \lambda = \mathbf{w}^T\mathbf{C}\mathbf{w} \qquad \{\because \mathbf{w}^T\mathbf{w} = 1\}$$
$$\lambda = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i^T\mathbf{w})^2$$

Hence, since the mean of the dataset is zero, 𝜆 represents the variance captured by the eigenvector w.
A commonly accepted heuristic suggests that PCA should capture at least 95%
of the variance. If the first 𝑘 eigenvectors capture the desired variance, it can
be stated as:
$$\frac{\displaystyle\sum_{j=1}^{k}\lambda_j}{\displaystyle\sum_{i=1}^{d}\lambda_i} \geq 0.95$$

Thus, the higher the variance captured, the lower the error incurred.
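A small sketch of this heuristic (illustrative only): given the eigenvalues sorted in descending order, pick the smallest k whose cumulative share of the total variance reaches 95%.

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k such that the top-k eigenvalues cover `threshold` of the variance.

    eigvals: eigenvalues of the covariance matrix, sorted in descending order.
    """
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, threshold) + 1)

# Example: eigenvalues [4.0, 2.5, 0.3, 0.2]; the first two cover 6.5/7.0 ≈ 93%,
# so k = 3 is needed to meet the 95% heuristic.
print(choose_k(np.array([4.0, 2.5, 0.3, 0.2])))   # prints 3
```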

P.C.A. Algorithm
The Principal Component Analysis algorithm can be summarized as follows for a
centered dataset {x1 , x2 , … , x𝑛 } where x𝑖 ∈ ℝ𝑑 , and C represents its covariance
matrix:
• Step 1: Find the eigenvalues and eigenvectors of C. Let {𝜆1 , 𝜆2 , … , 𝜆𝑑 }
be the eigenvalues arranged in descending order, and {w1 , w2 , … , w𝑑 } be
their corresponding eigenvectors of unit length.
• Step 2: Calculate 𝑘, the number of top eigenvalues and eigenvectors
required, based on the desired variance to be covered.
• Step 3: Project the data onto the eigenvectors and obtain the desired
representation as a linear combination of these projections.
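Putting the three steps together, a minimal end-to-end sketch (assuming a centered data matrix X and reusing the variance heuristic above) might look as follows:

```python
import numpy as np

def pca(X, variance=0.95):
    """PCA on a centered (n, d) dataset X.

    Returns the top-k eigenvectors W_k and the projected data X @ W_k.
    """
    n = X.shape[0]
    C = X.T @ X / n                                  # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)             # step 1: eigen-decomposition
    order = np.argsort(eigvals)[::-1]                # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.searchsorted(ratios, variance) + 1)   # step 2: pick k by variance covered
    W_k = eigvecs[:, :k]
    return W_k, X @ W_k                              # step 3: project onto the eigenvectors
```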

Figure 1: The dataset depicted in the diagram has two principal components:
the green vector represents the first PC, whereas the red vector corresponds to
the second PC.

In essence, PCA is a dimensionality reduction technique that identifies feature combinations that are decorrelated (linearly uncorrelated with each other).

Acknowledgments
Professor Arun Rajkumar: The content, including the concepts and notations presented in this document, has been sourced from his slides and lectures.
