Weekly Homework X
Bhavik Shangari
CS550 Machine Learning
September 2, 2024
Question 1.
Figure 1: Data
Solution. Applying PCA to the given data involves the following steps:
1. Arrange the data as a matrix, with one data point per row:
$$X = \begin{pmatrix} 2 & 5 \\ 6 & 4 \\ 10 & 11 \\ 14 & 14 \end{pmatrix}$$
3. Calculate the covariance matrix $M = \frac{1}{N} X^T X$ of the standardized data, where $N$ denotes the number of data points:
$$M = \begin{pmatrix} 1 & 0.91524923 \\ 0.91524923 & 1 \end{pmatrix}$$
4. Calculate the eigenvectors and eigenvalues of this matrix $M$ ($Mx = \lambda x$):
$$\lambda_1 = 1.91524923, \qquad \lambda_2 = 0.08475077$$
$$v_1 = \begin{pmatrix} 0.70710678 \\ 0.70710678 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -0.70710678 \\ 0.70710678 \end{pmatrix}$$
5. After obtaining the eigenvectors and eigenvalues, project the standardized data points onto these vectors to get the projections:
$$P = \begin{pmatrix} -1.34164079 & -0.84270097 \\ -0.4472136 & -1.08347268 \\ 0.4472136 & 0.60192927 \\ 1.34164079 & 1.32424438 \end{pmatrix} \begin{pmatrix} 0.70710678 & -0.70710678 \\ 0.70710678 & 0.70710678 \end{pmatrix}$$
7. If only one dimension is required, take the first column of the projection, which corresponds to the largest eigenvalue.
8. The eigenvalues provided in the question do not match the eigenvalues computed above, so the corresponding eigenvectors cannot be obtained from this covariance matrix, and there is nothing to compare against.
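The numbers above can be reproduced with a short NumPy sketch; the data matrix comes from step 1, while the standardization step and the variable names are assumptions made for illustration.

import numpy as np

# Data matrix from step 1: rows are data points, columns are features
X = np.array([[2, 5], [6, 4], [10, 11], [14, 14]], dtype=float)

# Standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance matrix M = (1/N) X^T X of the standardized data
N = X_std.shape[0]
M = X_std.T @ X_std / N

# Eigenvalues and eigenvectors of M (M x = lambda x), sorted by decreasing eigenvalue
eig_vals, eig_vecs = np.linalg.eigh(M)
order = np.argsort(eig_vals)[::-1]
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

# Project the standardized data onto the eigenvectors
P = X_std @ eig_vecs

print(eig_vals)    # approx. [1.91524923, 0.08475077]
print(P[:, 0])     # 1-D representation: the column for the largest eigenvalue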
Question 2.
The rows represent different data points and the columns represent the x and y coordinates.
3. To start the K-means clustering algorithm, we randomly initialize the centroids (given K = 2) and assign each point to a cluster based on its Euclidean distance from each centroid; see Figure 3.
Figure 3: Randomly Initialized Centroids
4. Then we replace each centroid with the mean of the points assigned to its cluster and repeat this process iteratively until the centroids no longer change. At the end of the iterations we get the result shown in Figure 4.
Figure 4: K Means Final Result
import numpy as np
import matplotlib.pyplot as plt

def distance(a, b):
    # Euclidean distance between two points
    return np.linalg.norm(np.asarray(a) - np.asarray(b))

# points: (N, 2) array of data points; median: (2, 2) array holding the two
# randomly initialized centroids. Both are set up earlier in the script.
prev_median = np.zeros_like(median)
while not np.allclose(median, prev_median):
    # Distance of every point to every centroid
    dist_mat = []
    for pt in points:
        a = []
        for med in median:
            a.append(distance(pt, med))
        dist_mat.append(a)
    dist_mat = np.array(dist_mat)

    plt.pause(0.1)  # let the cluster plot refresh between iterations

    # Assign each point to its nearest centroid (minimum distance)
    cluster = {0: [], 1: []}
    for i, el in enumerate(np.argmin(dist_mat, axis=1)):
        cluster[el].append(points[i])
    cluster[0] = np.stack(cluster[0])
    cluster[1] = np.stack(cluster[1])

    # Replace each centroid with the mean of the points assigned to it
    prev_median = median.copy()
    for i in range(len(median)):
        median[i] = np.mean(cluster[i], axis=0)
    print(median)
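For completeness, a small setup that the loop above can operate on, to be run before it; the points here are synthetic placeholders (the actual data for this question comes from the figure), and the centroids are initialized by sampling two of the points at random.

import numpy as np

rng = np.random.default_rng(0)

# Two synthetic blobs standing in for the assignment's data points
points = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=1.0, size=(10, 2)),
    rng.normal(loc=[5.0, 5.0], scale=1.0, size=(10, 2)),
])

# Randomly initialized centroids (K = 2), drawn from the data points
median = points[rng.choice(len(points), size=2, replace=False)].copy()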
Question 3.
2. In the projected space, the difference between the means of the two classes should be large and the variance of each class's cluster should be small, which leads to the objective function
$$\max_v \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} \qquad (1)$$
where $\tilde{\mu}_1$ and $\tilde{\mu}_2$ are the means of the projections of the two classes, and $\tilde{s}_1^2$ and $\tilde{s}_2^2$ are the covariances of the projections of each class.
3. We compute the separate class covariance (scatter) matrices $s_1$ and $s_2$ for the data points of the two classes, in the original feature space:
$$s_1 = \sum_{x_i \in c_1} (x_i - \mu_1)(x_i - \mu_1)^T \qquad (2)$$
$$s_2 = \sum_{x_i \in c_2} (x_i - \mu_2)(x_i - \mu_2)^T \qquad (3)$$
where $c_1$ and $c_2$ denote the sets of points in class 1 and class 2 respectively, $\mu_1$ and $\mu_2$ are the means of the points of each class, and $s_1$ and $s_2$ are the class covariance matrices.
4. Let $v$ be the direction onto which the data is projected; the projection of a point $x_i$ is
$$p_i = v^T x_i \qquad (4)$$
5. So we will have
$$\tilde{s}_1^2 = \sum_{p_i \in c_1} (p_i - \tilde{\mu}_1)(p_i - \tilde{\mu}_1)^T$$
$$\tilde{s}_1^2 = \sum_{x_i \in c_1} (v^T x_i - v^T \mu_1)(v^T x_i - v^T \mu_1)^T$$
$$\tilde{s}_1^2 = v^T s_1 v \qquad (5)$$
We get the Within-Class Covariance Matrix as
$$S_W = s_1 + s_2 \qquad (6)$$
and the Between-Class Covariance Matrix as
$$S_B = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$$
So
$$(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (v^T \mu_1 - v^T \mu_2)^2 = v^T S_B v$$
and the objective becomes
$$\max_v \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{s}_1^2 + \tilde{s}_2^2} = \max_v \frac{v^T S_B v}{v^T S_W v}$$
Setting the derivative of this objective $J(v)$ to zero,
$$\frac{\partial J(v)}{\partial v} = 0,$$
we finally get the (generalized) eigenvalue problem
$$M v = \lambda v \qquad (7)$$
where $M = S_W^{-1} S_B$.
Figure 5: Data
$$\lambda_1 = 3.13137004, \qquad \lambda_2 = 0$$
$$v_1 = \begin{pmatrix} 0.91955932 \\ 0.39295122 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -0.59522755 \\ 0.80355719 \end{pmatrix}$$
The projected vector is shown below, with class labels in Figure 6:
$$P = \begin{pmatrix} 4.07118849 \\ 3.41092352 \\ 3.0179723 \\ 5.11638527 \\ 5.25004215 \\ 12.20554606 \\ 8.66096567 \\ 10.24078996 \\ 10.10713308 \\ 12.33920294 \end{pmatrix}$$
Figure 6: Projection
# class1 (the 5x2 array of class-1 points) and the per-class means mean1, mean2
# are defined earlier in the script (not shown in this excerpt).
class2 = np.array([[9, 10],
                   [6, 8],
                   [9, 5],
                   [8, 7],
                   [10, 8]])

# Between-class scatter S_B = (mu1 - mu2)(mu1 - mu2)^T
Sb = (mean1.reshape(-1, 2) - mean2.reshape(-1, 2)).T @ (
    mean1.reshape(1, 2) - mean2.reshape(1, 2))
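A sketch of the rest of the computation, assuming class1 is the array of class-1 points defined earlier in the script (its values come from the question's data and are not reproduced here). It follows equations (2)-(7) and should reproduce the eigenvalue and direction quoted above up to sign and scale.

import numpy as np

# Per-class means in the original feature space
mean1 = class1.mean(axis=0)          # class1: (5, 2) array of class-1 points
mean2 = class2.mean(axis=0)

# Class scatter matrices s1, s2 and the within-class matrix S_W (equations (2), (3), (6))
s1 = (class1 - mean1).T @ (class1 - mean1)
s2 = (class2 - mean2).T @ (class2 - mean2)
Sw = s1 + s2

# Between-class matrix S_B (the same quantity as Sb above)
Sb = np.outer(mean1 - mean2, mean1 - mean2)

# Solve M v = lambda v with M = Sw^{-1} S_B (equation (7))
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
eig_vals, eig_vecs = np.real(eig_vals), np.real(eig_vecs)
v = eig_vecs[:, np.argmax(eig_vals)]   # direction with the largest eigenvalue

# Project all points onto the discriminant direction v
P = np.vstack([class1, class2]) @ v
print(eig_vals, v, P)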
1. Advantages
2. Disadvantages
Question 4.
Question 5.
Solution. Let's start by exploring the difference between Model Parameters and Model Hyperparameters.
Model Parameters:
1. The internal variables that are learned through the optimization process, such as the weights and biases in models like linear regression and neural networks, or the positional embeddings of LLMs.
2. Learned during the training / optimization process.
3. Define the performance of the model.

Model Hyperparameters:
1. The external variables that are set before the training process, such as the number of neurons or layers in a neural network, or the learning rate in gradient descent.
2. Can be set manually, or made learnable (by turning them into parameters).
3. Control the learning process.
Hyperparameter Tuning: This is the process of finding the set of hyperparameters that makes training more stable and yields the best set of parameters, i.e. the model with optimal performance, tailored to the intended application.
K-Means: number of clusters (K) and the distance metric (Manhattan, Euclidean).
DBSCAN: the minimum number of samples in a neighbourhood (min_samples) and the maximum neighbourhood distance (eps).
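As an illustration of tuning one of these hyperparameters, a minimal sketch that sweeps K for K-means with scikit-learn and records the inertia (within-cluster sum of squares); the data X is a synthetic placeholder and the candidate range for K is arbitrary.

import numpy as np
from sklearn.cluster import KMeans

# Placeholder data; in practice X is the dataset being clustered
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [4, 4], [0, 4])])

# Sweep the hyperparameter K and record the inertia for each value
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)   # pick K at the "elbow" of this curve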
Question 6.
$$x = As$$
where $A$ is the mixing matrix and $s$ is the vector of source signals. The de-mixing matrix $W$ is the inverse of the mixing matrix, such that
$$s = Wx$$
Log-Likelihood and Cost Function
The log-likelihood of the observed data x, given the de-mixing matrix W, can be written
as:
$$L(W) = \sum_{i=1}^{n} \log p(s_i) + \log|\det(W)|$$
Gradient Descent
To minimize the cost function $C(W)$, we use gradient descent. The update rule for $W$ is given by
$$W \leftarrow W - \eta \frac{\partial C(W)}{\partial W}$$
where $\eta$ is the learning rate.
Gradient of the Cost Function
The gradient of the cost function C(W) with respect to W can be derived as follows:
$$\frac{\partial C(W)}{\partial W} = -\sum_{i=1}^{n} \frac{\partial \log p(s_i)}{\partial s_i}\,\frac{\partial s_i}{\partial W} - \frac{\partial \log|\det(W)|}{\partial W}$$
For a non-Gaussian distribution of the sources $s$, the gradient of the log-density $\log p(s)$ is a non-linear function of $s$. Therefore,
$$\frac{\partial \log p(s_i)}{\partial s_i} = g(s_i)$$
where g(si ) is a non-linear function that depends on the assumed distribution of the
sources.
The gradient of the determinant term is
$$\frac{\partial \log|\det(W)|}{\partial W} = (W^{-1})^\top$$
Therefore, the gradient of the cost function becomes
$$\frac{\partial C(W)}{\partial W} = -(W^{-1})^\top + \sum_{i=1}^{n} g(s_i)\,x_i^\top$$
Update Rule
The update rule for the de-mixing matrix $W$ using gradient descent is
$$W \leftarrow W + \eta\left((W^{-1})^\top - \sum_{i=1}^{n} g(s_i)\,x_i^\top\right)$$
This iterative update is performed until convergence, at which point the de-mixing ma-
trix W will allow the extraction of the independent source signals from the observed mixtures.
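A minimal NumPy sketch of this iterative update on synthetic data; it uses the sample-averaged form of the rule and the common choice g(s) = tanh(s) for a super-Gaussian source prior. The data, dimensions, learning rate and iteration count are illustrative assumptions, not part of the original solution.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic example: two super-Gaussian sources mixed by a random full-rank A
n = 2000
S = rng.laplace(size=(2, n))            # source signals (one per row)
A = rng.normal(size=(2, 2))             # mixing matrix
X = A @ S                               # observed mixtures, x = A s

W = np.eye(2)                           # initial de-mixing matrix
eta = 0.02                              # learning rate

for _ in range(5000):
    Y = W @ X                           # current source estimates, s = W x
    g = np.tanh(Y)                      # non-linearity g(s)
    # Averaged form of the update  W <- W + eta * (W^{-T} - sum_i g(s_i) x_i^T)
    W += eta * (np.linalg.inv(W).T - (g @ X.T) / n)

print(W @ A)    # ideally close to a scaled permutation matrix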
A is not Full rank: When the mixing matrix A in an Independent Component Analysis
(ICA) problem is not full rank, meaning it is singular or nearly singular, several challenges
and implications arise for the ICA process:
1. Loss of Information
2. Incomplete Separation
3. Instability in Estimation
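To make the rank issue concrete, a small NumPy check on a made-up rank-deficient mixing matrix: an exact inverse (and hence an exact de-mixing matrix W) does not exist, and only a pseudo-inverse approximation is possible.

import numpy as np

# Example of a rank-deficient 2x2 mixing matrix (second row = 2 * first row)
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.matrix_rank(A))   # 1, i.e. not full rank

try:
    W = np.linalg.inv(A)          # an exact de-mixing matrix does not exist
except np.linalg.LinAlgError as err:
    print("inversion failed:", err)

W_approx = np.linalg.pinv(A)      # best least-squares substitute; information is lost
print(W_approx)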