
CRUX of Linear Algebra -Version 2

SVD, Regression and Classification

Vectors in R^n: linear dependence, linear independence, linear combination, dot product, orthogonality

Vector spaces, basis sets, orthogonal basis sets

Rowspace, columnspace, left nullspace, right nullspace of matrices

Rank of a matrix

Eigenvalues and eigenvectors of square matrices

Main decompositions: A = CR and A = CMR, where r is the rank of A.

If A is m x n with rank r, then C is m x r, M is r x r and R is r x n.

1. A = [1 3 4; 2 4 6; 3 5 8]; rank of A is 2.

C and R form basis sets for the columnspace and rowspace respectively.

In-class Computational Experiments

1. Using RREF, get C and R for the given matrix
2. Demonstrate that A = CR

A=[1 3 4;2 4 6; 3 5 8];


[Rref,ColIndex]=rref(A);
r=length(ColIndex); % r is also rank of A
C=A(:,ColIndex); % Retrieve r independent columns from A
R=Rref(1:r,:); % Retrieve first r rows from Rref
disp(int16(C))

1 3
2 4
3 5

disp(int16(R))

1 0 1
0 1 1

disp(int16(C*R)) % Display C*R

1 3 4
2 4 6
3 5 8

2. A = [1 3 4; 2 4 6; 3 5 8]; rank of A is 2.

C and R, taken from A itself, form basis sets for the columnspace and rowspace.

In-class Computational Experiments

1. Using RREF, get C and R for the given matrix. C and R must come from A itself
2. Get the left inverse of C. If all columns are independent, the left inverse exists
3. Get the right inverse of R. If all rows are independent, the right inverse exists
4. Get M = Cinv*A*Rinv
5. Demonstrate A = CMR

A=[1 3 4;2 4 6; 3 5 8];


[Rref,ColIndex]=rref(A);
r=length(ColIndex); % r is also rank of A
C=A(:,ColIndex); % Retrieve r independent columns from A
[Rref,RowIndex]=rref(A'); % Transpose and get r independent rows
R=A(RowIndex,:); % r independent rows from A
Cinv=inv(C'*C)*C'; % left inverse: Cinv*C = inv(C'*C)*(C'*C) = I
Rinv=R'*inv(R*R'); % right inverse: R*Rinv = (R*R')*inv(R*R') = I
M=Cinv*A*Rinv;
disp(int16(C))

1 3
2 4
3 5

disp(int16(R))

1 3 4
2 4 6

disp(int16(C*M*R)) % Display C*M*R

1 3 4
2 4 6
3 5 8

Orthogonal Basis Sets using SVD

3. Economical form of the SVD: A = Ur*Sr*Vr', where r is the rank of A, Ur is m x r, Sr is r x r diagonal, and Vr is n x r.

Orthogonal basis set for the rowspace: columns of Vr

Orthogonal basis set for the columnspace: columns of Ur

Projection matrix for projecting a vector y into the columnspace: P = Ur*Ur'

In-class Computational Experiments

1. Demonstrate that the column vectors of Ur are in the columnspace of A, using rank
2. Demonstrate that the column vectors of Vr are in the rowspace of A, using rank
3. Demonstrate that A, A'A and AA' have the same rank
4. Demonstrate that the nonzero eigenvalues of A'A and AA' are the same
5. Demonstrate that the last m-r column vectors in U are in the left nullspace of A
6. Demonstrate that the last n-r column vectors in V are in the right nullspace of A
7. In one line, find the sum of squares of the elements in A
8. Find the trace of AA' and A'A and find their relation
9. Show that the trace of AA' is the same as the trace of S.^2 (S from the SVD), i.e. the sum of the squared singular values

A=[1 3 4;2 4 6; 3 5 8];


[U S V]=svd(A);
r=rank(A); % Find rank of A
% Append columns of A with first r columns from U
B=[A U(:,1:r)];
r1=rank(B); % Rank of appended matrix
% Append rows of A with the first r rows of V'
B1=[A; V(:,1:r)'];
r2=rank(B1);
disp([r r1 r2])

2 2 2

Tr1=trace(A*A');
Tr2=trace(A'*A);
Tr3=trace(S.^2);
disp(int16([Tr1 Tr2 Tr3]))

180 180 180
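A possible sketch for experiments 5, 6 and 7, reusing A, U, V and r from the code above:

% Last m-r columns of U: A'*u is (numerically) zero, so each such u is in the left nullspace
disp(A'*U(:,r+1:end))
% Last n-r columns of V: A*v is (numerically) zero, so each such v is in the right nullspace
disp(A*V(:,r+1:end))
% One-line sum of squares of all elements of A (equals trace(A'*A))
disp(sum(A(:).^2))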

Give an intuitive linear algebra based explanation

Consider a line a1*x1 + a2*x2 = b in the plane. The set of points (x1, x2) on this line is infinite.

We are interested in finding which point on this line is closest to the origin.

The squared norm (square of the distance from the origin) of any point (x1, x2) is x1^2 + x2^2.

We solve the problem using optimization theory from calculus:

minimize x1^2 + x2^2 subject to a1*x1 + a2*x2 = b.

Convert the objective into a single variable using the constraint, differentiate and set the derivative to zero.

The resulting point is the point on the line nearest to the origin. Note that it satisfies the constraint and hence it is a point on the line.

Now we connect the solution to concepts from Linear Algebra.

The line equation in matrix form is Ax = b, where A = [a1 a2] is a 1x2 matrix and x = (x1, x2).

For the matrix A the following is true.

1) The rowspace is the set of all points on the line passing through the origin and the point (a1, a2).

Any point on this line is a scalar multiple of (a1, a2).

A generic point in the rowspace is c*(a1, a2) for a scalar c.

Our solution point (the nearest point) is the unique point of the rowspace of A that also lies on the line Ax = b.

2) A nullspace vector for A is n = (-a2, a1), since A*n = 0.

Any generic solution to Ax = b is x = xr + c*n, where xr is the rowspace solution and c is any scalar.

For example, as c varies, x = xr + c*n traces out the whole solution line, which is parallel to the nullspace direction.

The least norm solution is obtained when c = 0.

That is, when x = xr. This solution is unique: there is no other solution for x in the rowspace of A.

Example 2

Let us try for the rowspace solution to Ax = b for another choice of A and b.

The generic solution is again x = xr + c*n, with n a nullspace vector.
Example 3

A = [1 1 2 2; 2 2 3 3; 3 3 4 4]; b = [3; 5; 7]. Rank of A is 2. b is the sum of column 1 and column 3 of A.

Ax = b therefore has infinitely many solutions. How do we find the least norm solution?

Method 1.

Remove redundant rows and solve. Here we remove the third row (it is a linear combination of the first two).

Method 2. Using SVD and Pseudoinverse

A = Ur*Sr*Vr' --(1), where r stands for the rank. Here r = 2.

So x = pinv(A)*b = Vr*inv(Sr)*Ur'*b --(2). This takes the solution vector from the rowspace.

A=[1 1 2 2; 2 2 3 3; 3 3 4 4 ]; b=[3;5;7];
x=pinv(A)*b;
disp(x)

0.5000
0.5000
0.5000
0.5000

Take the SVD of A: A = Ur*Sr*Vr'.

Project b into the columnspace of A: bp = Ur*Ur'*b ---(1)

Take x = Vr*inv(Sr)*Ur'*b ----(2)

From (2), A*x = (Ur*Sr*Vr')*(Vr*inv(Sr)*Ur'*b) = Ur*Ur'*b = bp, and x lies in the rowspace of A because it is a combination of the columns of Vr.
Computational Experiments

1. Find the columnspace projection matrix P for A using the SVD of A
2. Find the projected vector yp of a given vector y into the columnspace of A
3. Find the least-squares error vector e = y - yp for a given y
4. Demonstrate that the least-squares error vector e is orthogonal to the columnspace of A (see the sketch below)
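A possible sketch for these four experiments; y here is an arbitrary test vector chosen for illustration:

A=[1 1 2 2; 2 2 3 3; 3 3 4 4]; y=[1; 2; 4]; % y is an arbitrary test vector
r=rank(A);
[U,S,V]=svd(A);
P=U(:,1:r)*U(:,1:r)'; % columnspace projection matrix
yp=P*y; % projection of y into the columnspace of A
e=y-yp; % least-squares error vector
disp(A'*e) % ~0: e is orthogonal to every column of A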

Demonstrate through Computational Experiments (pseudoinverse A+ = pinv(A)):

1. If the size of A is m x n, the size of A+ is n x m
2. A*A+*A = A
3. A+*A*A+ = A+
4. A*A+ is symmetric, and so is A+*A
5. A*A+ = Ur*Ur'; the columnspace projection matrix
6. A+*A = Vr*Vr'; the rowspace projection matrix

A verification sketch is given below.
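A minimal sketch verifying these properties for the earlier rank-2 matrix:

A=[1 1 2 2; 2 2 3 3; 3 3 4 4];
Ap=pinv(A); % n x m pseudoinverse
disp(size(Ap))
disp(norm(A*Ap*A-A)) % ~0 : A*A+*A = A
disp(norm(Ap*A*Ap-Ap)) % ~0 : A+*A*A+ = A+
disp(norm(A*Ap-(A*Ap)')) % ~0 : A*A+ is symmetric (the same holds for A+*A)
r=rank(A); [U,S,V]=svd(A);
disp(norm(A*Ap-U(:,1:r)*U(:,1:r)')) % A*A+ equals the columnspace projection matrix
disp(norm(Ap*A-V(:,1:r)*V(:,1:r)')) % A+*A equals the rowspace projection matrix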

Demonstrate through Computational Experiments

1. Create a 3x3 nontrivial integer matrix A with det(A)=1

rng("default")
R=[-3 3]; A=randi(R,3,3);A=A-diag(diag(A))+eye(3); A=triu(A)*tril(A);
disp(int16(det(A)))

1

2. Create a 3x3 nontrivial integer matrix A for which inverse is also integer matrix

rng("default")
R=[-3 3]; A=randi(R,3,3);A=A-diag(diag(A))+eye(3); A=triu(A)*tril(A);
disp(A);disp(int16(inv(A)))

16 9 -2
3 1 0
-3 -3 1

1 -3 2
-3 10 -6
-6 21 -11

3. Create a 3x3 nontrivial integer matrix B for which eigenvalues are 1,2,3

rng("default")
R=[-3 3]; A=randi(R,3,3);A=A-diag(diag(A))+eye(3); A=triu(A)*tril(A);
B=A*diag([1 2 3])*inv(A); disp(B);disp(int16(eig(B)))

-2 6 -10
-3 11 -6
-3 12 -3

1
2
3

4. Create three nontrivial integer matrices A, B and C such that A*B*C = 0 (the zero matrix).

Refer to the Cayley-Hamilton theorem.

rng("default"); I3=eye(3); R=[-3 3]; A=randi(R,3,3);A=A-diag(diag(A))+I3;


A=triu(A)*tril(A); D=A*diag([1 2 3])*inv(A);
A=D-I3; B=D-2*I3; C=D-3*I3;
disp(int16(A))

-3 6 -10
-3 10 -6
-3 12 -4

disp(int16(B))

-4 6 -10
-3 9 -6
-3 12 -5

disp(int16(C))

-5 6 -10
-3 8 -6
-3 12 -6

disp(int16(A*B*C))

0 0 0
0 0 0
0 0 0

5. Write MATLAB code for the characteristic polynomial of an integer 3x3 matrix whose roots give the eigenvalues 1, 2, 3.

rng("default");I3=eye(3); R=[-3 3]; A=randi(R,3,3);A=A-diag(diag(A))+I3;


A=triu(A)*tril(A);A=A*diag([1 2 3])*inv(A); disp(A)

-2 6 -10
-3 11 -6
-3 12 -3

p=charpoly(A,'x') % display the symbolic polynomial

p = x^3 - 6*x^2 + 11*x - 6

p1=charpoly(A) % get the polynomial coefficients

p1 = 1×4
1 -6 11 -6

disp(int16(roots(p1))) % display roots.

3
2
1

6. Create nonsymmetric 3x3 matrix with rank 2 and rowspace equals columnspace. Verify the result.

R=[1 2 3;3 4 5]; C=R'; M=[1 2;3 4];


A=C*M*R;
% Verification: basis vectors of null(A) = basis vectors of null(A')
null(A)

ans = 3×1
0.4082
-0.8165
0.4082

null(A')

ans = 3×1
0.4082
-0.8165
0.4082

Miscellaneous Questions on Matrix Generation

1. Generate 100 integer rowspace vectors from a random 4x5 integer matrix A with rank 2
2. Generate 100 integer columnspace vectors from a random 4x5 integer matrix A with rank 2
3. Generate 100 integer left-nullspace vectors from a random 4x5 integer matrix A with rank 2
4. Generate 100 integer right-nullspace vectors from a random 4x5 integer matrix A with rank 2
5. Generate 100 random integer matrices A with ...
6. For a given matrix ..., generate an integer matrix whose columns form a basis set for its right nullspace

A sketch for items 1 and 2 is given below.
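A possible sketch for items 1 and 2, assuming a rank-2 matrix built from random integer factors:

rng("default")
C=randi([-3 3],4,2); R=randi([-3 3],2,5); % random integer factors
A=C*R; % 4x5 integer matrix, rank at most 2 (here rank 2)
coef=randi([-5 5],100,2); % 100 random integer coefficient pairs
RowVecs=coef*R; % 100 integer rowspace vectors (one per row, each 1x5)
ColVecs=C*coef'; % 100 integer columnspace vectors (one per column, each 4x1)
disp([rank(A) rank([A; RowVecs]) rank([A ColVecs])]) % all three ranks coincide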

Prove the following

1. ... form a vector space
2. Eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal
3. Eigenvalues of A'A are real and nonnegative
4. Eigenvectors corresponding to distinct eigenvalues of a square matrix A are independent
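A short sketch for item 3: since A'A is symmetric, its eigenvalues are real. For any eigenpair A'A*v = lambda*v with v nonzero,

lambda*(v'*v) = v'*(A'A*v) = (A*v)'*(A*v) = ||A*v||^2 >= 0,

so lambda >= 0.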

Linear and Non-Linear classifiers

Convert nominal and categorical variables into one-hot representation

Datano GenderM GenderF Age BPN BPH BPL


1 1 0 20 1 0 0
2 0 1 73 1 0 0
3 1 0 37 0 1 0

4 1 0 33 0 0 1
5 0 1 48 0 1 0
6 1 0 29 1 0 0
7 0 1 52 1 0 0
8 1 0 42 0 0 1
9 1 0 61 1 0 0
10 0 1 30 1 0 0
11 0 1 26 0 0 1
12 1 0 54 0 1 0
Table: Preprocessed data

The target variable is also converted into one-hot representation.

Regression model: A*w ≈ Y, with w learned by least squares (w = pinv(A)*Y).

Predicted class label: index of the maximum value in each row of A*w

% Medical data with one-hot representation of categorical variables


% 'Gender' Male= [1 0] ; Female =[0 1]
% 'BP' Normal=[ 1 0 0] , High =[ 0 1 0] , Low=[0 0 1]
% Target variable Drug A= [ 1 0]; Drug B =[0 1]
% 'Age' variable is normalized between 0 and 1
A=[ 1 0 20 1 0 0 ;
0 1 73 1 0 0;
1 0 37 0 1 0;
1 0 33 0 0 1;
0 1 48 0 1 0;
1 0 29 1 0 0;
0 1 52 1 0 0;
1 0 42 0 0 1;
1 0 61 1 0 0;
0 1 30 1 0 0;
0 1 26 0 0 1;
1 0 54 0 1 0;];
y = [1 0; 0 1; 1 0; 0 1; 1 0; 1 0; 0 1; 0 1; 0 1; 1 0; 0 1; 1 0];
% yd=Target variable in terms of position where 1 is put.
% This is for easy comparison with the predicted output
yd= [1 2 1 2 1 1 2 2 2 1 2 1];
x=A(:,3);
xmin=15;
xmax=80;
x=(x-xmin)./xmax; % scale the age variable into the range (0, 1)
A(:,3)=x;
% learned weight =regression coefficients; two columns

w=pinv(A)*y

w = 6×2
0.6603 -0.0603
0.6080 -0.0080
-1.7061 1.7061
0.4878 -0.0878
1.0253 -0.6253
-0.2448 0.6448

D=A*pinv(A)*y;
[val,index]=max(D,[],2); % max along dimension 2, i.e. within each row

% check whether the model is working or not
% 'index' gives the predicted class label
Accuracy=sum(yd(:)==index(:))*100/length(yd)

Accuracy =
100
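A sketch of using the learned weights w to classify a new, hypothetical patient (male, age 40, high BP), encoded the same way as the rows of A:

xnew=[1 0 (40-xmin)./xmax 0 1 0]; % hypothetical patient, one-hot encoded as above
[~,label]=max(xnew*w); % index of the maximum entry of xnew*w
disp(label) % 1 -> Drug A, 2 -> Drug B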

Non-Linear Classifier

Linear kernel classifier: K = A*A'

Nonlinear polynomial kernel classifier: K = (A*A' + 1).^p, a polynomial kernel of degree p (p = 10 in the code below)

Learning classifiers (functions) by kernel regression

It uses Fisher's Iris dataset: 3 classes, 150 data points.

The Iris dataset is a multivariate data set that contains measurements of the length and width of the sepals and
petals of three species of iris flowers:

• Iris setosa
• Iris virginica
• Iris versicolor:

The dataset is often used in examples of: Data mining, Classification, Clustering, and Testing algorithms.

Here are some details about the Iris dataset:

• The dataset was collected by Edgar Anderson to quantify the variation in the morphology of the flowers.
• The dataset was made famous by British statistician and biologist Ronald Fisher in his 1936 paper, The
use of multiple measurements in taxonomic problems.
• The dataset is available for download from the UCI Machine Learning Repository.
• The dataset is also included in the datasets library in R.
• The dataset is a 150x4 numpy.ndarray, with the rows representing the samples and the columns
representing the features.
• The columns in the dataset are:
• Sepal Length
• Sepal Width
• Petal Length
• Petal Width
• Species
• The dataset is simple and versatile, allowing for the construction of both linear and nonlinear models.
• One species of iris is linearly separable from the other two, but the other two are not linearly separable
from each other.
• In the dataset, the first 50 rows are species 1, the second 50 are species 2 and the rest are species 3
• Data Science Example - Iris dataset

A=[5.1 3.5 1.4 0.2 ;


4.9 3 1.4 0.2 ;
4.7 3.2 1.3 0.2 ;
4.6 3.1 1.5 0.2 ;
5 3.6 1.4 0.2 ;
5.4 3.9 1.7 0.4 ;
4.6 3.4 1.4 0.3 ;
5 3.4 1.5 0.2 ;
4.4 2.9 1.4 0.2 ;
4.9 3.1 1.5 0.1 ;
5.4 3.7 1.5 0.2 ;
4.8 3.4 1.6 0.2 ;
4.8 3 1.4 0.1 ;
4.3 3 1.1 0.1 ;
5.8 4 1.2 0.2 ;
5.7 4.4 1.5 0.4 ;
5.4 3.9 1.3 0.4 ;
5.1 3.5 1.4 0.3 ;
5.7 3.8 1.7 0.3 ;
5.1 3.8 1.5 0.3 ;
5.4 3.4 1.7 0.2 ;
5.1 3.7 1.5 0.4 ;
4.6 3.6 1 0.2 ;
5.1 3.3 1.7 0.5 ;

4.8 3.4 1.9 0.2 ;
5 3 1.6 0.2 ;
5 3.4 1.6 0.4 ;
5.2 3.5 1.5 0.2 ;
5.2 3.4 1.4 0.2 ;
4.7 3.2 1.6 0.2 ;
4.8 3.1 1.6 0.2 ;
5.4 3.4 1.5 0.4 ;
5.2 4.1 1.5 0.1 ;
5.5 4.2 1.4 0.2 ;
4.9 3.1 1.5 0.1 ;
5 3.2 1.2 0.2 ;
5.5 3.5 1.3 0.2 ;
4.9 3.1 1.5 0.1 ;
4.4 3 1.3 0.2 ;
5.1 3.4 1.5 0.2 ;
5 3.5 1.3 0.3 ;
4.5 2.3 1.3 0.3 ;
4.4 3.2 1.3 0.2 ;
5 3.5 1.6 0.6 ;
5.1 3.8 1.9 0.4 ;
4.8 3 1.4 0.3 ;
5.1 3.8 1.6 0.2 ;
4.6 3.2 1.4 0.2 ;
5.3 3.7 1.5 0.2 ;
5 3.3 1.4 0.2 ;
7 3.2 4.7 1.4 ;
6.4 3.2 4.5 1.5 ;
6.9 3.1 4.9 1.5 ;
5.5 2.3 4 1.3 ;
6.5 2.8 4.6 1.5 ;
5.7 2.8 4.5 1.3 ;
6.3 3.3 4.7 1.6 ;
4.9 2.4 3.3 1 ;
6.6 2.9 4.6 1.3 ;
5.2 2.7 3.9 1.4 ;
5 2 3.5 1 ;
5.9 3 4.2 1.5 ;
6 2.2 4 1 ;
6.1 2.9 4.7 1.4 ;
5.6 2.9 3.6 1.3 ;
6.7 3.1 4.4 1.4 ;
5.6 3 4.5 1.5 ;
5.8 2.7 4.1 1 ;
6.2 2.2 4.5 1.5 ;
5.6 2.5 3.9 1.1 ;
5.9 3.2 4.8 1.8 ;
6.1 2.8 4 1.3 ;
6.3 2.5 4.9 1.5 ;
6.1 2.8 4.7 1.2 ;

6.4 2.9 4.3 1.3 ;
6.6 3 4.4 1.4 ;
6.8 2.8 4.8 1.4 ;
6.7 3 5 1.7 ;
6 2.9 4.5 1.5 ;
5.7 2.6 3.5 1 ;
5.5 2.4 3.8 1.1 ;
5.5 2.4 3.7 1 ;
5.8 2.7 3.9 1.2 ;
6 2.7 5.1 1.6 ;
5.4 3 4.5 1.5 ;
6 3.4 4.5 1.6 ;
6.7 3.1 4.7 1.5 ;
6.3 2.3 4.4 1.3 ;
5.6 3 4.1 1.3 ;
5.5 2.5 4 1.3 ;
5.5 2.6 4.4 1.2 ;
6.1 3 4.6 1.4 ;
5.8 2.6 4 1.2 ;
5 2.3 3.3 1 ;
5.6 2.7 4.2 1.3 ;
5.7 3 4.2 1.2 ;
5.7 2.9 4.2 1.3 ;
6.2 2.9 4.3 1.3 ;
5.1 2.5 3 1.1 ;
5.7 2.8 4.1 1.3 ;
6.3 3.3 6 2.5 ;
5.8 2.7 5.1 1.9 ;
7.1 3 5.9 2.1 ;
6.3 2.9 5.6 1.8 ;
6.5 3 5.8 2.2 ;
7.6 3 6.6 2.1 ;
4.9 2.5 4.5 1.7 ;
7.3 2.9 6.3 1.8 ;
6.7 2.5 5.8 1.8 ;
7.2 3.6 6.1 2.5 ;
6.5 3.2 5.1 2 ;
6.4 2.7 5.3 1.9 ;
6.8 3 5.5 2.1 ;
5.7 2.5 5 2 ;
5.8 2.8 5.1 2.4 ;
6.4 3.2 5.3 2.3 ;
6.5 3 5.5 1.8 ;
7.7 3.8 6.7 2.2 ;
7.7 2.6 6.9 2.3 ;
6 2.2 5 1.5 ;
6.9 3.2 5.7 2.3 ;
5.6 2.8 4.9 2 ;
7.7 2.8 6.7 2 ;
6.3 2.7 4.9 1.8 ;

6.7 3.3 5.7 2.1 ;
7.2 3.2 6 1.8 ;
6.2 2.8 4.8 1.8 ;
6.1 3 4.9 1.8 ;
6.4 2.8 5.6 2.1 ;
7.2 3 5.8 1.6 ;
7.4 2.8 6.1 1.9 ;
7.9 3.8 6.4 2 ;
6.4 2.8 5.6 2.2 ;
6.3 2.8 5.1 1.5 ;
6.1 2.6 5.6 1.4 ;
7.7 3 6.1 2.3 ;
6.3 3.4 5.6 2.4 ;
6.4 3.1 5.5 1.8 ;
6 3 4.8 1.8 ;
6.9 3.1 5.4 2.1 ;
6.7 3.1 5.6 2.4 ;
6.9 3.1 5.1 2.3 ;
5.8 2.7 5.1 1.9 ;
6.8 3.2 5.9 2.3 ;
6.7 3.3 5.7 2.5 ;
6.7 3 5.2 2.3 ;
6.3 2.5 5 1.9 ;
6.5 3 5.2 2 ;
6.2 3.4 5.4 2.3 ;
5.9 3 5.1 1.8 ] ;
Y=zeros(150,3);
Y(1:50,1)=1; % first 50 is class 1
Y(51:100,2)=1;% second 50 is class 2
Y(101:150,3)=1;% third 50 is class 3
K=(A*A'+1).^10; % Nonlinear Kernel
Alpha=pinv(K)*Y;
[val,Index]=max(K*Alpha,[],2); % max along dimension 2, i.e. within each row
p=ones(1,50);
ylabel=[p 2*p 3*p]; % class labels as 1,2,3
Accuracy=100*sum(ylabel(:)==Index(:))/length(ylabel)

Accuracy =
100

Data from ellipses (interior points) for clustering and classification

Parametric equation of an ellipse: x = a*cos(theta), y = b*sin(theta). Take b > a.

Create two clusters of data, both of which are not globular.

To this data, apply different clustering algorithms and compare their performance.

Label the cluster points as (1, 0) and (0, 1) depending on the cluster and apply different classification algorithms. Compare their performance.

Before doing it, search on Google and find out about different clustering algorithms.

% data creation
a=5; b =8;
n=10000;
theta=rand(n,1)*pi/2; % 0 to pi/2
x=a*rand(n,1).*cos(theta);
y=b*rand(n,1).*sin(theta);
Data=[x y; -x y];
%Remove all data points which are inside x^2+y^2<a^2
Index=[];
for i=1:1:2*n
x1=Data(i,1); y1=Data(i,2);
z=x1^2+y1^2;
if z<a^2
Index=[Index;i];
end
end
Data(Index,:)=[];
plot(Data(:,1),Data(:,2),'r.')
axis equal
n1=5000;
a1=3;b1=1.5;
theta1=-pi+rand(n1,1)*2*pi; % -pi to pi
x1=a1*rand(n1,1).*cos(theta1);
y1=b1*rand(n1,1).*sin(theta1);
Data1=[x1 y1+2.5];
hold on
plot(Data1(:,1),Data1(:,2),'b.')
hold off

Another simple code

clearvars
a=5; b =8;
n=10000;
theta=rand(n,1)*pi/2; % 0 to pi/2
x=a*rand(n,1).*cos(theta);
y=b*rand(n,1).*sin(theta);
Data=[x y; -x y];
A = Data(Data(:,1).^2 + Data(:,2).^2 > a^2,:);
plot(A(:,1),A(:,2),'r.');
n1=5000;
a1=3;b1=1.5;
theta1=-pi+rand(n1,1)*2*pi; % -pi to pi
x1=a1*rand(n1,1).*cos(theta1);
y1=b1*rand(n1,1).*sin(theta1);
Data1=[x1 y1+2.5];
hold on
plot(Data1(:,1),Data1(:,2),'b.')
hold off

%-------------------

% data creation .
clearvars
a=5; b =8;
n=10000;
theta=rand(n,1)*pi/2; % 0 to pi/2
x=a*rand(n,1).*cos(theta);
y=b*rand(n,1).*sin(theta);
Data=[x y; -x y; -x -y; x -y];
A = Data(Data(:,1).^2 + Data(:,2).^2 > 20,:);
plot(A(:,1),A(:,2),'r.');
hold on
B=0.9*randn(n,2);
plot(B(:,1),B(:,2),'b.');
axis equal
hold off

What you should do:

1. Figure out how the code works
2. Apply K-means, spectral clustering and DBSCAN and compare the clustering results (a sketch for this item is given below)
3. After giving a one-hot representation for the label (y), apply the following for classification and find the training accuracy

a) regression model

b) polynomial kernel regression

c) Random Kitchen Sink algorithm
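A minimal sketch for item 2, assuming the Data and Data1 arrays created above (the DBSCAN epsilon and minpts values are illustrative choices, not tuned):

X=[Data; Data1]; % pool the two non-globular clusters
idx_km=kmeans(X,2); % K-means with 2 clusters
idx_db=dbscan(X,0.4,10); % DBSCAN; epsilon=0.4, minpts=10 chosen by trial
figure; gscatter(X(:,1),X(:,2),idx_km); title('K-means clusters')
figure; gscatter(X(:,1),X(:,2),idx_db); title('DBSCAN clusters')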

Assignment

1. Write the entire code in a Jupyter notebook

2. Write code for creating 3D globular and ellipsoidal clusters

https://mccormickml.com/2014/08/04/gaussian-mixture-models-tutorial-and-matlab-code/

https://www.analytic...
Welcome to Computational Discovery on Jupyter — Computational Discovery on Jupyter

DataSet for Data mining

XOR Data

clc; clear all


X=[0 0; 0 1;1 0; 1 1];
% 0 0 belongs to class 0 % 0 1 class 1 % 1 0 class 1 % 1 1 class 0
Y=[0; 1; 1; 0];
% Let us map to 10-d space and apply cos as an elementwise operation
% in Middle layer
B=randn(2,10);
K=cos(X*B); % map the data and apply cos
%Kw ~=y % Final model
w=pinv(K)*Y; % neural weight in last layer
class_label_predicted = round(K*w) % check whether the mapping has been learned

class_label_predicted = 4×1
0
1
1
0

Checker Board data

clc;clear all;close all;


x = rand(2000,1)*5;
y = rand(2000,1)*5;
c = mod((floor(x)+floor(y)),2);
ind = find(c);
a = [x(ind),y(ind)];
ind1 = find(c==0);
b = [x(ind1),y(ind1)];
figure(1)
plot(a(:,1),a(:,2),'*');hold on
plot(b(:,1),b(:,2),'o','Color','red');hold off;title('Checkerboard data')

Spiral data

% Spiral Data
clc;clear all;close all;
close(gcf)
x1 = [];y1 = [];
j = 1:100;
angle = j*pi/16;
radius = 6.5*(104-j)/104;
x = radius.*sin(angle);
y = radius.*cos(angle);
x1 = [x1;x]; y1 = [y1;y];
x2 = -x1;y2 = -y1;
figure
plot(x1,y1);hold on;
plot(x2,y2,'Color','red');title('Spiral data')

Ring data

%Ring data generation


d = 1000;
r1 = 5+rand(d,1);
theta = 2*pi*rand(d,1);
x1 = r1.*cos(theta);
y1 = r1.*sin(theta);

r2 = 10+rand(d,1);
x2 = r2.*cos(theta);
y2 = r2.*sin(theta);
figure
plot(x1,y1,'o');hold on;
plot(x2,y2,'*','Color','red');title('Ring data')

% Weather data

Outlook : Sunny= 1; Overcast = 0 ; rainy= -1

Temp : Hot = 1; mild = 0 ; cool = -1

Humidity: high= 1; Normal = -1

Windy : True =1; False = -1

Play : yes [1, 0]; No [0, 1]

% First Four values in each row represent values of Independent variables

% Last two values represent Target variable values

A=[1 1 1 -1 0 1 ;
1 1 1 -1 0 1 ;
0 1 1 -1 1 0;
-1 0 1 -1 1 0;
-1 -1 -1 -1 1 0;
-1 -1 -1 1 0 1;
0 -1 -1 1 1 0;
1 0 1 -1 0 1;
1 -1 -1 -1 1 0;
-1 0 -1 -1 1 0;
1 0 -1 1 1 0;
0 0 1 1 1 0;
0 1 -1 -1 1 0;
-1 0 1 1 0 1];
X=A(:,[1 2 3 4]); % 14x 4 matrix
Y=A(:,[5 6]); % 14x2 matrix
[~, Yd]=max(Y'); % Yd represent class values as 1 and 2
B=randn(4,10); % Random matrix to map to High Dimension
XM=X*B ; % each 4 tuple Mapped to 10 dimension
XM=cos(XM); % Nonlinear mapping; bring values in -1 to +1
% 4_tuple to 10 tuple. XM is 14x10
% W is 10x2 ; Y is 14 by 2
% Model is XM*W ~ Y
W=pinv(XM)*Y;
% Predicted class value
Yp=XM*W; % 14x2
% Find location of Max values
[~,Ypc]=max(Yp'); % need to use Transpose
% Display labelled class and predicted class values
Yd

Yd = 1×14
2 2 1 1 1 2 1 2 1 1 1 1 1 2

Ypc

Ypc = 1×14
2 2 1 1 1 2 1 2 1 1 1 1 1

ML for School Kids

Chinese AI books for school kids

Challenge: can you create one for Indian school kids?

Nonlinear classifier

When classes (data points in different classes) are not linearly separable, we may create new features that allow linear separation.

Manual feature generation (from existing features) and viewing non-linear class boundaries

% clearvars;
rng("default")
N = 1e3;
th = linspace(0,2*pi,N);
r1 = 5 + 0.4*randn(N,1); % for a circular cluster with mean radius=5
r2 = 10 + 0.5*randn(N,1); % for a circular cluster with mean radius=10
r3 = 15 + 0.4*randn(N,1); % for a circular cluster with mean radius=15
% make radii and theta into column vectors
th = th(:);
r1 = r1(:);
r2 = r2(:);
r3 = r3(:);

figure
plot(r1.*cos(th), r1.*sin(th),'r.');hold on

plot(r2.*cos(th), r2.*sin(th),'b.');
plot(r3.*cos(th), r3.*sin(th),'k.');
plot([-20 20],[0 0],"LineWidth",2);
plot([0 0],[-20 20],"LineWidth",2);
% hold off
% hold off
% axis off
title('Scatter plot of datapoints of 3 classes ')
hold off
axis equal

x1 = r1.*cos(th);
y1 = r1.*sin(th);

x2 = r2.*cos(th);
y2 = r2.*sin(th);

x3 = r3.*cos(th);
y3 = r3.*sin(th);

% New Feature Generated x^2+y^2

data = [[x1,y1,x1.^2+y1.^2 ];[x2,y2,x2.^2+y2.^2 ];[x3,y3,x3.^2+y3.^2 ]];


% for first N data, class label is 1,
% for second N data, class label is 2,
% for third N data, class label is 3,

clas = [ones(length(x1),1);...
2*ones(length(x2),1);...
3*ones(length(x3),1)];
% Note that in 'data' the decision tree treats the first column as x1, the second as x2, and so on
% Note that the third column x3 alone is enough for classification
ctree = fitctree(data,clas);
view(ctree,'mode','graph') % graphic description

% Looking at the decision tree we can decide the classifier

% equations of the circles representing the classification boundaries
figure
plot(r1.*cos(th), r1.*sin(th),'r.');hold on
plot(r2.*cos(th), r2.*sin(th),'b.');
plot(r3.*cos(th), r3.*sin(th),'k.');
plot([-20 20],[0 0],"LineWidth",2);
plot([0 0],[-20 20],"LineWidth",2);
r4=sqrt(56);
plot(r4.*cos(th), r4.*sin(th),'k.')
r4=sqrt(162);
plot(r4.*cos(th), r4.*sin(th),'k.')
title('Class Boundaries')

% Note that the decision tree used only the third column. Why?
% Let us plot the third column of data against the data indices.
% Note that for the first 1000 data points the class label is 1, for the next thousand it is 2, etc.
% You will see that with respect to the third variable the data is linearly separable.
figure
plot(data(:,3),'.')

% Let us plot 3d scatter plot with variables as x, y, z=x^2+y^2
% You will see that it is now linearly separable
figure
scatter3(data(:,1),data(:,2),data(:,3))
view([-13 9])

Note that when one more dimension (created by squaring and summing the first two features) is added, the data in different classes become linearly separable and hence we get a simple decision tree.

Decision tree without third variable

data1 = [[x1,y1];[x2,y2];[x3,y3]];
ctree = fitctree(data1,clas);
view(ctree,'mode','graph') % graphic description

It is very large; the tree depth is 12.

Clustering

idx = dbscan(data1,1,5); % The default distance metric is Euclidean distance

Visualize the clustering.

gscatter(data1(:,1),data1(:,2),idx);
title('DBSCAN Using Euclidean Distance Metric')

Use of Regression for classification -Random Kitchen sink Algorithm

N1=length(clas);
Nc=3; % number of classes
Y=zeros(N1,3); %
for i=1:N1
k=clas(i);
Y(i,k)=1;
end
A=data(:,1:2);
R=randn(2,100); % map each 2-tuple to a 100-tuple of random features
A1=cos(A*R);
W=pinv(A1)*Y;
[~,index]=max((A1*W)'); % Transpose and find location of maximum;
index=index(:);
acc=sum(clas==index)*100/N1

acc =
100

What you should explore further

1) How DBSCAN works: its algorithm

2) Implementing the DBSCAN algorithm from scratch

3) Given 2D scatter plot of data with class label, inferring a decision tree directly

An example is given below.

If X1> 5 class is 2

If X1<= 5 and if X2<=8.5 class 1

If X1<= 5 and if X2 > 8.5 class 2

References

https://in.mathworks.com/help/stats/predict-class-labels-using-classification-tree-predict-block.html

https://in.mathworks.com/help/stats/dbscan.html

Signal Processing and Pattern Classification with Linear Algebra

+2 level mathematical fundamentals required

If Q = [q1 q2 ... qr] is an orthonormal basis set for the column space of an m x n matrix A with rank r, then the projection matrix onto the column space is P = Q*Q'.

Example 1

Note that the rank of both A and P is r even though the matrices are different.

The projected vector corresponding to a vector y is yp = P*y = Q*Q'*y.
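A small MATLAB sketch of this projection, using orth (which returns an orthonormal basis for the column space) and the rank-2 matrix from earlier:

A=[1 3 4; 2 4 6; 3 5 8]; % rank-2 example used earlier
Q=orth(A); % m x r, orthonormal columns spanning the column space
P=Q*Q'; % projection matrix onto the column space
y=[1; 1; 5]; % arbitrary illustrative vector
yp=P*y; % projected vector
disp([rank(A) rank(P)]) % both equal r = 2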

Digital Filtering of Signals

Projection = Filtering

Method a. Filtering using the DCT

Steps:

1. Let y be a signal and D be the full basis matrix (basis vectors in its columns).

The basis vectors are in increasing order of frequency (wave numbers).

2. Find Y = D'*y (dot products of y with each basis vector).

Y is the vector of DCT coefficients.

Remove the unwanted frequency coefficients:

a) it can be lowpass, b) highpass, c) bandpass, d) bandstop.

3. Let Yc be the coefficient vector with the unwanted entries set to zero.

4. The modified (filtered) signal is x = D*Yc (a linear combination of the retained basis vectors).

This is the same as: select the wanted DCT basis vectors as the columns of a matrix Ds and the corresponding coefficients Ys, and compute x = Ds*Ys.

We can do the same with Fourier and wavelet bases.

clear all;clf
yd=[-0.099913 -0.22503 -0.047385 -0.032256 -0.17648 0.056719
0.056226 -0.066583 -0.030068 -0.045194 -0.081084 0.01051
-0.12047 0.15719 -0.074188 -0.048517 0.047455 -0.05256
-0.067285 -0.14016 -0.026658 -0.18888 0.017027 0.10881
-0.12185 0.033994 0.074463 -0.20944 -0.1933 0.0087719
-0.087463 0.022396 0.035829 0.026326 0.085025 0.023721
0.076801 -0.16168 -0.042568 -0.055422 -0.19933 -0.01236
-0.14292 0.10507 -0.11614 0.018061 -0.012062 -0.12536

-0.24941 -0.037424 -0.13172 0.031656 0.021867 0.14124 0.032059
-0.090465 0.012944 -0.12491 -0.024765 -0.02633 -0.020039
-0.050158 0.093061 -0.20158 0.031299 0.081191 0.068447
0.057542 0.0089849 0.078737 0.074939 0.00061216 -0.0022158
0.016596 -0.08935 0.048047 0.097815 0.1331 0.26258 0.1005
0.21579 0.25145 0.28359 0.10782 0.24483 0.26261 0.15142
0.18918 0.38034 0.26502 0.32038 0.2905 0.21554 0.21762
0.30967 0.15945 0.31953 0.28355 0.1284 0.16701 0.057241
-0.061959 0.2393 0.071659 0.13987 0.11518 0.079806 -0.03556
-0.041343 0.0058561 -0.083986 -0.078369 0.1683 0.021523
-0.11001 0.0040033 -0.10794 -0.13323 -0.028002 0.08821
-0.0024487 0.038274 -0.16073 -0.1114 -0.11761 -0.24055
-0.27601 -0.06731 -0.16124 -0.20787 -0.016045 0.005701
-0.00069108 0.38383 0.45959 0.77589 0.93776 0.92538 0.98152
1.1503 1.0947 0.94033 0.91103 0.81006 0.39377 0.15791
0.034675 -0.023729 -0.28593 -0.39291 -0.32142 -0.32143
-0.24688 -0.31853 -0.23 -0.21382 -0.13741 0.058111 -0.13008
-0.18592 0.0082326 -0.12068 -0.021134 -0.083541 0.035872
0.040503 -0.032211 -0.21421 0.0077977 0.15747 0.10402 -0.15144
0.0036101 -0.051316 -0.079693 -0.094185 0.065225 0.0011213
0.058003 0.024717 0.024834 0.11902 0.16613 0.31668 -0.017337
0.029816 0.25026 0.12258 0.039142 0.22509 0.3656 0.2982
0.44058 0.31292 0.4666 0.26381 0.20132 0.31136 0.46816
0.25351 0.44382 0.33433 0.46266 0.3094 0.43847 0.23846
0.3971 0.4985 0.42438 0.58258 0.39644 0.22585 0.33204
0.55565 0.43182 0.47246 0.51884 0.20088 0.33572 0.31593
0.26975 0.33773 0.49172 0.31888 0.42813 0.2181 0.2706
0.17361 0.29268 0.19109 0.2467 0.24835 0.15702 0.23766
0.36423 0.1711 0.10367 0.18329 0.085563 -0.041885 0.10707
0.048534 0.012736 0.16011 0.21166 -0.033343 0.023539 -0.072625
0.071749 -0.23661 0.095544 0.015594 0.017481 0.11066 -0.039821
-0.027015 -0.14181 -0.18194 -0.086519 0.01153 -0.070903
-0.15021 0.090457 -0.087865 -0.017538 -0.016166 -0.040636
0.02446 -0.041005 -0.18034 -0.11613 0.011806 -0.13743
-0.045756 -0.10446 -0.12708 -0.11106 -0.048453 0.0097401
-0.018573 0.10011 -0.030181 -0.15674 -0.0033692 0.0034374
0.028592 0.047554 0.12601 -0.055959 0.025522 -0.0012623
-0.057362 0.2149 -0.023892 -0.13849 0.18007 0.036197 -0.10768
0.066859 0.13243];
y=yd';
N=length(y);
X=dct(y);
c=floor(N/7); % cutoff index: coefficients from c onward are removed
X(c:end)=0; % Cut off coefficients corresponding to high frequency basis vectors
x=idct(X); % inverse transform

subplot(3,1,1);plot(y);axis tight;title('Noisy signal-Y');
subplot(3,1,2);plot(x);axis tight;title('Denoised signal-Ycs');
subplot(3,1,3);plot(y-x);axis tight;title('Noise -Ylns');
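The same filtering written with an explicit basis matrix instead of dct/idct, as a sketch (dctmtx is from the Signal Processing Toolbox; y, N, c and x are as defined above):

D=dctmtx(N)'; % columns of D are the DCT basis vectors, so D'*y = dct(y)
Yc=D'*y; % step 2: DCT coefficients
Yc(c:end)=0; % remove the high-frequency coefficients
xs=D*Yc; % step 4: filtered signal as a linear combination of retained bases
disp(norm(xs-x)) % ~0: matches the dct/idct result above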

Method b. Use of the FFT for low pass filtering
Cut off the high frequency components after taking the FFT. These coefficients are centered around index number N/2+1, where N is the number of data points. N is assumed to be even.

X=fft(y);
% Number of high frequencies to cut off
n=130; % total 2n+1 coefficients will be cut off
c=N/2+1; % index of the highest frequency
X(c-n:c+n)=0; % Cut off coefficients corresponding to high frequency basis vectors
x=real(ifft(X)); % inverse transform
subplot(2,1,1);plot(y);axis tight;title('Noisy signal');
subplot(2,1,2);plot(x);axis tight;title('Denoised signal');

Method c. Filtering using direct optimization with the pseudoinverse.

Assume the signal is continuous and x(t) is the function representing it.

The sampled values of x(t) are represented by the vector x = [x(1), x(2), ..., x(N)]'.

Assume the signal is contaminated by noise; the function representing the noisy signal is y(t).

The sampled values of y(t) are represented by the vector y.

Our aim is to retrieve the original x(t) from the noisy signal y(t).

For this we use concepts from calculus.

The existence of nonzero higher order derivatives at a point indicates the curvature of the function at that point.

If the first and all higher derivatives at a point (x, f(x)) are zero, the function is constant in a neighbourhood of the point.

If the first derivative at a point (x, f(x)) is nonzero and all higher order derivatives are zero, the function is linear in a neighbourhood of the point.

For denoising, we will use two objectives:

1) The unknown {x(k)} should approximately follow {y(k)} for every k, i.e. the sum of squares of the differences, sum_k (y(k) - x(k))^2 = ||y - x||^2, must be small.

2) The sum of squares of the second (or any chosen order) derivatives at all points must be small.

First and second 'derivative matrix'

Consider function values at indices 1, 2, 3: x(1), x(2), x(3).

The first derivative at index 2 is approximately (x(2) - x(1))/h, where h is the sampling interval.

For simplicity let us assume unit h.

Then the first derivative is x(2) - x(1).

The second derivative at index 2 is (x(3) - x(2)) - (x(2) - x(1)) = x(3) - 2*x(2) + x(1).

We can find the second derivative at the third point onwards in the same way:

x(4) - 2*x(3) + x(2)
x(5) - 2*x(4) + x(3)
...

This can be put in matrix form as Dx, where

D = [1 -2 1 0 ... 0
     0 1 -2 1 ... 0
     ...
     0 ... 0 1 -2 1]

D is an (N-2) x N matrix.

Dx is a column vector representing the second derivative of the unknown signal x at all points except the two end points.

The following is the objective function used for calculus-based filtering:

J(x) = ||y - x||^2 + lambda*||D*x||^2. We will try to minimize it with respect to x.

The first term is called the fidelity term. Through this term we insist that the vector x follows the vector y.

Through the second term we try to smooth the unknown function; it pushes the second derivative at every point towards zero.

lambda is a hyperparameter, which we should choose and give. It should strike a balance between the two terms in the minimization.

Setting the gradient of J to zero gives x = (I + lambda*D'*D)^(-1) * y. It is a column vector.

The above formula is used for filtering signals.

N = length(y); % length of the signal


e = ones(N, 1); % vector of ones
D = spdiags([e -2*e e], 0:2, N-2, N); % second-order difference matrix ((N-2) x N)
lam = 2; % regularization parameter
x2=pinv(eye(N)+lam*D'*D)*y(:);
subplot(2,1,1);plot(y);axis tight;title('Noisy signal');

subplot(2,1,2);plot(x2);axis tight;title('Denoised signal');

Another example Filtering using FFT with synthetic data

Create a 1 second signal sampled at 128 samples/second. The signal is obtained by summing cosine waves of frequencies 5, 30 and 60 with amplitudes c1 = 1.2, c2 = 0.5 and c3 = 0.5. Progressively remove the high frequency components.

N=128;
theta=(0:N-1)*2*pi/N;
f1=5;f2=30;f3=60; c1=1.2 ; c2= 0.5; c3=0.5;
t=linspace(0,1,N);
x=c1*cos(f1*theta)+c2*cos(f2*theta) +c3*cos(f3*theta);
figure
plot(t,x)
X=fft(x); % folding frequency basis location is 64+1
% We will remove the 65th +/- 30 coefficients and do the inverse transform
Index=35:95;
X(Index)=0;
xfiltered=real(ifft(X));
figure
plot(t,xfiltered);
% folding frequency basis location is 64+1
% We will remove the 65th +/- 40 coefficients and do the inverse transform
Index=25:105;
X(Index)=0;
xfiltered=real(ifft(X));
figure
plot(t,xfiltered);

Method d. Filtering using a designed low pass filter

Theory to be covered in class.

Frequency Response

% clear D; clf
wp=0.1*pi;
ws=0.3*pi;
% let us create Nw points between 0 and pi
% it represent sampled value of omega
Nw=200;
w=linspace(0,pi,Nw);
phi=4*w; % linear phase
den=(wp-ws); % denominator of the slope
% fill the Mag array (desired magnitude response) with a loop
Mag(1:length(w))=0;
for k= 1: length(w)
if w(k)<wp
Mag(k)=1;
elseif w(k)< ws
Mag(k)=1- (wp-w(k))/den;
else
Mag(k)=0.0;
end
end
N=10;% filter length is N+1
% Creating A matrix using outer product
w=w(:); % make it a colum vector
A=cos(w*(0:N)); % a single line to create the matrix
b=Mag(:).*cos(phi(:));
b=b(:);
h=pinv(A)*b;
x3=conv(y,h);
subplot(2,1,1);plot(y);axis tight;title('Noisy signal');
subplot(2,1,2);plot(x3);axis tight;title('Denoised signal');

Method e. Filtering Using Daubechies filters

[LoD,HiD] = wfilters('db2');
x4=conv(LoD(:),y(:));
subplot(2,1,1);plot(y);axis tight;title('Noisy signal');
subplot(2,1,2);plot(x4);axis tight;title('Denoised signal');

Image Denoising

% clc;
% clear all; close all;
A1= imread('cameraman.tif');
A2=imnoise(A1,'gaussian', .01);
img=double(A2);
lamda= 1;
output=LS_DEN(img,lamda);
A4= uint8(output);
subplot(1,2,1);

imshow(A2);
title('noisy image')
subplot(1,2,2);
imshow(A4);
title('denoised image')
function [ output ] = LS_DEN( img,lamda )
y=img;
[M N]= size(y);
e = ones(M, 1);
D = spdiags([e -2*e e], 0:2, M, M);
%% Solve the least square problem

lam=lamda;
F = speye(M) + lam * (D'*D);
x = F \ y;
F = speye(M) + lam * (D'*D);
x1 = F \ x';
output=x1';
end

More Notes to Follow on

1) Random Kitchen Sink Algorithm

2) NN and CNN

3) Neural Tangent Kernel

4) Kolmogorov-Arnold Networks

5) Auto Encoder and Variational Autoencoder

6) Attention Mechanisms and Transformers

7) Diffusion Models

Applications of FFT in Communication Engineering

OFDM

H:\akarsh\USB dump\Mtech 1 2021

H:\akarsh\USB dump\Mtech 1 2021\LA Strang Exercises

H:\akarsh\USB dump\Mtech 1 2021\svd image compression

H:\akarsh\QB\MIS notes 1\MIS notes 1 sample QPs in LA

H:\akarsh\QB Graph Algorithms part 1 to Part 5

H:\akarsh\python_notes

To be added

Projects

Demystify DNA Sequencing with Machine Learning (with data)

Filtering using Convolution

What is linear convolution? (Its interpretation is discussed below.)

It is given by

z(k) = sum over j of x(j)*y(k - j),

where x and y are one-sided sequences (they do not exist for negative indices) of lengths m and n respectively.

N = m + n - 1 is the length of the output sequence z.

Convolve the sequences

x: 1 2 3 4 5
y: 6 7 8

Process

1. Fold one of the sequences about its first index:

1 2 3 4 5
8 7 6

z(0) = 1x6 = 6.

2. Multiply and add the overlapping parts of the sequences.

3. Shift the lower (folded) sequence to the right one element at a time and take the dot product of the overlapping parts of the sequences. The index k of the z sequence indicates the amount of shift to the right.

z(1) = 1x7 + 2x6 = 19
z(2) = 1x8 + 2x7 + 3x6 = 40
z(3) = 2x8 + 3x7 + 4x6 = 61
z(4) = 3x8 + 4x7 + 5x6 = 82
z(5) = 4x8 + 5x7 = 67
z(6) = 5x8 = 40

z = [6 19 40 61 82 67 40]

The MATLAB command conv() does this operation.

x=[1 2 3 4 5]; y=[6 7 8];


z=conv(x,y);
disp(int2str(z))

Linear Convolution using the FFT

Compute the output length of the linear convolution: N = m + n - 1, where m and n are the lengths of the input sequences.

Make the length of both sequences equal to N by padding zeros.

Take the FFT of the padded sequences, multiply the corresponding elements and take the inverse transform.

x=[1 2 3 4 5 ]; y=[6 7 8];


m=length(x);n=length(y);
N=m+n-1;
x1=zeros(1,N);
x1(1:m)=x;
y1=zeros(1,N);

y1(1:n)=y;
z=real(ifft(fft(x1).*fft(y1)));
disp(int2str(z))
% Interpret the convolution as long multiplication of 12345 by 678
K=10.^(N-1:-1:0); % weight vector of powers of 10
format rat
Number=dot(K(:),z(:)) % convolution weighted by powers of 10
12345*678 % direct multiplication result, for comparison

Multivariate Optimization

Regression

Another example

lambda is a hyperparameter.

The solution is a column vector.
