PR Practical File
(ECECE26)
Department of
Electronics and Communication Engineering,
Netaji Subhas University of Technology
EXPERIMENT-1
Aim:
To plot the Gaussian (normal) probability density function and study the effect of the mean and the variance on its shape.
Theory:
The Gaussian distribution, also known as the normal distribution, is a fundamental probability distribution in statistics and machine learning. It is widely used due to its natural occurrence in many real-world phenomena. For a d-dimensional random vector x, the probability density function (PDF) of the Gaussian distribution is given by:
f(x) = \frac{1}{\sqrt{(2\pi)^d |S|}} \exp\left(-\frac{1}{2}(x - m)^T S^{-1} (x - m)\right)    (1)
where x is the random vector, m is the mean vector, S is the covariance matrix, and d is the dimension of x. In the one-dimensional case this reduces to:
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)    (2)
where µ is the mean and σ² is the variance; the mean shifts the curve along the x-axis, while the variance controls its spread.
Code:
import numpy as np
import matplotlib.pyplot as plt
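The remainder of the listing is not reproduced in this copy. A minimal sketch that plots the univariate PDF of equation (2) for a few hypothetical (mean, variance) pairs could look like:

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 500)
# Hypothetical (mean, variance) pairs to compare
params = [(0, 1), (0, 4), (2, 1)]

for mu, var in params:
    # Univariate Gaussian PDF from equation (2)
    pdf = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    plt.plot(x, pdf, label=f'mean={mu}, var={var}')

plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gaussian PDF for varying mean and variance')
plt.legend()
plt.show()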
Result:
Successfully plotted the Gaussian distribution and studied the effect of mean
and variance.
EXPERIMENT-2
Aim:
To write a MATLAB/Python function that will take as inputs: (a) the mean
vectors, (b) the covariance matrices of the class distributions of a c-class
problem, (c) the a priori probabilities of the c classes, and (d) a matrix X
containing column vectors that stem from the above classes. It will give as
output an N-dimensional vector whose ith component contains the class where
the corresponding vector is assigned, according to the Bayesian classification
rule.
Theory:
Bayesian classification is a probabilistic approach to classifying a given set
of input vectors based on prior knowledge of class distributions. It is derived
from Bayes’ theorem, which provides a mathematical framework for updating
prior beliefs with observed data. The goal of this experiment is to implement
a classifier that assigns each input vector to one of the given c classes based
on the Bayesian decision rule. For a given feature vector x, the probability
of belonging to class Cj is given by:
P(C_j \mid x) = \frac{P(x \mid C_j)\, P(C_j)}{P(x)}    (3)
where P(C_j | x) is the posterior probability of class C_j given x, P(x | C_j) is the class-conditional likelihood, P(C_j) is the a priori probability of class C_j, and P(x) is the evidence. The Bayesian rule assigns x to the class with the largest posterior probability.
Code:
# ... (earlier lines of the listing, defining the classifier and
# visualization functions, are not shown) ...
plt.xlabel('X1')
plt.ylabel('X2')
plt.legend()
plt.title('Bayesian Classification and Decision Boundary')
plt.show()

# Define means, covariance matrices, and priors
mean_vectors = [np.array([0, 0]), np.array([3, 3])]
cov_matrices = [np.array([[1, 0.5], [0.5, 1]]), np.array([[1, -0.5], [-0.5, 1]])]
priors = [0.5, 0.5]

# Example input vectors
X = np.array([[1, 2, 3], [1, 2, 3]])

# Perform classification
class_labels = bayesian_classifier(mean_vectors, cov_matrices, priors, X)
print("Class assignments:", class_labels)

# Visualize classification
visualize_classification(mean_vectors, cov_matrices, X, class_labels)
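The opening of the listing (the definitions of bayesian_classifier and visualize_classification) is not shown above. A minimal sketch of the classifier itself, assuming Gaussian class-conditional densities and the decision rule of equation (3), could be:

import numpy as np
from scipy.stats import multivariate_normal

def bayesian_classifier(mean_vectors, cov_matrices, priors, X):
    # X holds one sample per column; score each class by its
    # (unnormalized) posterior P(x | Cj) * P(Cj) and pick the largest.
    scores = np.array([
        prior * multivariate_normal.pdf(X.T, mean=m, cov=S)
        for m, S, prior in zip(mean_vectors, cov_matrices, priors)
    ])
    return np.argmax(scores, axis=0)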
Observation:
Class assignments: [0 1 1]
Figure 2: Bayesian classification and decision boundary
Result:
Successfully understood the Bayesian classification rule.
EXPERIMENT-3
Aim:
To write a MATLAB/Python function that will take as inputs: (a) the mean vectors, and (b) a matrix X containing column vectors that stem from the above classes. It will give as output an N-dimensional vector whose ith component contains the class where the corresponding vector is assigned, according to the minimum Euclidean distance classifier.
Theory:
The minimum Euclidean distance classifier assigns an input vector to the
class whose mean vector is closest to it in terms of Euclidean distance.
The Euclidean distance between two points x = (x1 , x2 , . . . , xn ) and m =
(m1 , m2 , . . . , mn ) in an n-dimensional space is given by:
d(x, m) = \sqrt{\sum_{i=1}^{n} (x_i - m_i)^2}    (4)
Code:
import numpy as np
import matplotlib.pyplot as plt

def classify_min_euclidean(mean_vectors, X):
    # Compute the Euclidean distance between each vector in X and each mean vector
    distances = np.linalg.norm(X[:, :, np.newaxis] - mean_vectors[:, np.newaxis, :], axis=0)
    # Assign each vector in X to the class with the minimum distance
    return np.argmin(distances, axis=1)

# Define mean vectors and data points
mean_vectors = np.array([[1, 4], [1, 4]])  # Two mean vectors (one per column, for two classes)
X = np.array([[2, 3, 5], [2, 3, 5]])       # Three column vectors to classify

# Classify points
classifications = classify_min_euclidean(mean_vectors, X)

# Visualization
plt.figure(figsize=(6, 6))
colors = ['red', 'blue']  # Colors for two classes

# Plot mean vectors
plt.scatter(mean_vectors[0], mean_vectors[1], c='black', marker='x', s=100, label='Mean Vectors')

# Plot data points, color-coded by classification
unique_classes = np.unique(classifications)
for cls in unique_classes:
    indices = np.where(classifications == cls)
    plt.scatter(X[0, indices], X[1, indices], c=colors[cls], label=f'Class {cls}')

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.title('Data Classification using Minimum Euclidean Distance')
plt.grid()
plt.show()
Observation:
Class assignments: [0 1 1]
Figure 3: Data classification using the minimum Euclidean distance classifier
Result:
Successfully implemented and understood the minimum Euclidean distance
classifier.
EXPERIMENT-4
Aim:
To write a MATLAB/Python function that will take as inputs: (a) the mean vectors, (b) the covariance matrix of the class distributions of a c-class problem, and (c) a matrix X containing column vectors that stem from the above classes. It will give as output an N-dimensional vector whose ith component contains the class where the corresponding vector is assigned according to the minimum Mahalanobis distance classifier.
Theory:
The Mahalanobis distance is a measure of the distance between a point and a
distribution, taking into account the correlations of the dataset. It is defined
as:
d_M(x, m) = \sqrt{(x - m)^T S^{-1} (x - m)}    (5)
where:
• x is the input vector,
• m is the mean vector of the class, and
• S is the covariance matrix of the class distribution.
Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial.distance import mahalanobis

# Mahalanobis classifier function
def mahalanobis_classifier(mean_vectors, cov_matrix, X):
    c, N = mean_vectors.shape  # Number of classes and feature dimension
    M = X.shape[1]             # Number of samples to classify
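The rest of the listing is not reproduced here. A minimal, self-contained sketch of the minimum Mahalanobis distance classifier of equation (5), assuming a single covariance matrix shared by all classes, could be:

import numpy as np

def mahalanobis_classifier_sketch(mean_vectors, cov_matrix, X):
    # mean_vectors: c x N array, one class mean per row
    # cov_matrix:   N x N covariance matrix shared by the classes
    # X:            N x M matrix holding the column vectors to classify
    S_inv = np.linalg.inv(cov_matrix)
    diffs = X.T[:, None, :] - mean_vectors[None, :, :]      # M x c x N
    d2 = np.einsum('mcn,nk,mck->mc', diffs, S_inv, diffs)   # squared Mahalanobis distances
    return np.argmin(d2, axis=1)                            # class index for each sample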
Observation:
Assigned Classes: [0 1 1]
Result:
Successfully implemented and understood the minimum Mahalanobis distance classifier.
EXPERIMENT-5
Aim:
To write a MATLAB/Python function that takes as inputs: (a) a set of
N1 vectors packed as columns of a matrix Z, (b) an N1 -dimensional vector
containing the classes where each vector in Z belongs, (c) the value for the
parameter k of the classifier, (d) a set of N vectors packed as columns in the
matrix X. It returns an N-dimensional vector whose ith component contains
the class where the corresponding vector of X is assigned, according to the
k-nearest neighbour classifier.
Theory:
The k-Nearest Neighbors (k-NN) classifier is a simple, non-parametric algorithm that assigns a class to a given input vector based on the majority class among its k nearest neighbors in the feature space. The Euclidean distance is typically used to determine the closeness of neighbors. The classification process involves:
• Computing the distance between the test sample and all training samples.
• Selecting the k training samples closest to the test sample.
• Assigning the most frequent class among these k neighbors to the test sample.
Code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import mode

def knn_classifier(Z, labels, k, X):
    distances = np.linalg.norm(Z[:, :, None] - X[:, None, :], axis=0)
    knn_indices = np.argsort(distances, axis=0)[:k]
    knn_labels = labels[knn_indices]
    predicted_labels = mode(knn_labels, axis=0).mode.squeeze()
    return predicted_labels, knn_indices

# More data points
Z = np.array([[1.0, 2.0, 3.0, 4.5, 2.2, 3.8],
              [1.0, 2.0, 3.0, 1.5, 3.1, 2.5]])  # 2D feature vectors
labels = np.array([0, 1, 1, 0, 1, 0])  # Class labels
X = np.array([[2.5, 1.5, 3.5],
              [2.5, 1.5, 2.0]])  # 2D test vectors
k = 3  # Updated k to 3

predicted_classes, knn_indices = knn_classifier(Z, labels, k, X)

# Define class colors
class_colors = {0: 'blue', 1: 'red'}

# Plot the training points
for label in np.unique(labels):
    plt.scatter(Z[0, labels == label], Z[1, labels == label],
                color=class_colors[label], marker='o', edgecolors='k',
                label=f'Class {label}')

# Plot the test points
for i, x in enumerate(X.T):
    plt.scatter(x[0], x[1], color='gold', marker='s', edgecolors='k',
                label=f'Test Point {i+1} (Pred: {predicted_classes[i]})')

# Highlight the nearest neighbors
for i, x in enumerate(X.T):
    neighbors = knn_indices[:, i]
    for neighbor in neighbors:
        plt.plot([Z[0, neighbor], x[0]], [Z[1, neighbor], x[1]], 'k--', alpha=0.6,
                 label='Nearest Neighbor' if i == 0 and neighbor == neighbors[0] else "")

plt.legend()
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('k-NN Classification Visualization')
plt.show()

print("Predicted Classes:", predicted_classes)
Observation:
Predicted Classes: [1 1 0]
Result:
Successfully implemented and understood the k-Nearest Neighbors (k-NN)
classifier.
EXPERIMENT-6
Aim:
To write a MATLAB/Python function that will take as inputs: (a) an N-
dimensional vector, each component of which contains the class where the
corresponding data vector belongs and (b) a similar N-dimensional vector
each component of which contains the class where the corresponding data
vector is assigned from a certain classifier. Its output will be the percentage
of the places where the two vectors differ (i.e., the classification error of the
classifier).
Theory:
Classification error is a key metric in evaluating the performance of a classifier. It quantifies the percentage of instances where the predicted class labels differ from the true class labels. The classification error can be mathematically expressed as:
E = \frac{\sum_{i=1}^{N} I(y_i \neq \hat{y}_i)}{N} \times 100    (6)
where y_i is the true class label of the ith sample, \hat{y}_i is the label predicted by the classifier, I(·) is the indicator function (1 when its argument is true, 0 otherwise), and N is the total number of samples.
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

def classification_error(true_labels, predicted_labels):
    true_labels = np.array(true_labels)
    predicted_labels = np.array(predicted_labels)
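The listing breaks off before the actual computation. A minimal completion of classification_error consistent with equation (6) would be:

import numpy as np

def classification_error(true_labels, predicted_labels):
    true_labels = np.array(true_labels)
    predicted_labels = np.array(predicted_labels)
    # Percentage of positions where the two label vectors differ
    return np.mean(true_labels != predicted_labels) * 100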
Observation:
Classification Error: 10.00%
Figure 6: Visualization of the classifier's predictions
Result:
Successfully implemented the classification error calculation for evaluating
classifier performance.
EXPERIMENT-7
Aim:
To write a MATLAB/Python function for the perceptron algorithm. This will take as inputs: (a) a matrix X containing N1-dimensional column vectors, (b) an N-dimensional row vector y, whose ith component contains the class (-1 or +1) where the corresponding vector belongs, and (c) an initial value vector w_ini for the parameter vector. It returns the estimated parameter vector.
Theory:
The perceptron algorithm is a fundamental method in machine learning used
for binary classification of linearly separable data. It is an iterative process
that updates a weight vector to find a linear decision boundary that correctly
separates the given classes. The perceptron learning rule updates the weight
vector whenever a misclassification occurs.
Given a feature matrix X ∈ R^{N×d} where each row represents a d-dimensional feature vector, a label vector y ∈ {−1, +1}^N containing class labels, and an initial weight vector w_ini ∈ R^d, the perceptron algorithm updates the weight vector w using the following rule:
w = w + y_i X_i \quad \text{if} \quad y_i (w \cdot X_i) \leq 0    (7)
where X_i is the ith input vector, y_i is the corresponding class label, and w · X_i is the dot product between the weight vector and the input vector.
The algorithm iterates over all data points and updates w whenever a misclassified point is encountered. The process continues until no misclassifications occur or a predefined number of iterations is reached.
The perceptron algorithm is guaranteed to converge if the data is linearly
separable. If the data is not linearly separable, the algorithm will continue
indefinitely unless a maximum iteration limit is imposed. The final weight
vector defines a hyperplane that separates the two classes.
In this experiment, the perceptron successfully learns the weight vector
that defines a linear decision boundary separating the input samples based
on their class labels.
Code:
import numpy as np
import matplotlib.pyplot as plt

def plot_data(X, y, w=None):
    # Plot one point for each class in legend
    plt.scatter(X[y == 1][0, 0], X[y == 1][0, 1], color='blue', label='Class 1')
    plt.scatter(X[y == -1][0, 0], X[y == -1][0, 1], color='red', label='Class -1')

    # ... (a few lines of the listing are not shown here) ...

    if w is not None:
        x_min, x_max = min(X[:, 0]) - 1, max(X[:, 0]) + 1
        x_vals = np.linspace(x_min, x_max, 100)
        if w[1] != 0:
            y_vals = -(w[0] * x_vals + w[2]) / w[1]  # Corrected decision boundary equation
            plt.plot(x_vals, y_vals, 'g--', label='Decision Boundary')

    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.legend()
    plt.show()

def perceptron(X, y, w_ini, max_iter=1000):
    w = np.copy(w_ini)
    N = X.shape[0]

# ... (the perceptron update loop and the data definition are not shown here) ...

# Initial visualization
plot_data(X[:, :2], y)

# Train perceptron
w_final = perceptron(X, y, w_ini)
print("Estimated parameter vector:", w_final)

# Final visualization
plot_data(X[:, :2], y, w_final)
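The perceptron update loop is missing from the listing above. A minimal sketch of the loop implementing rule (7), assuming X holds one (augmented) sample per row and y holds labels in {-1, +1}, is:

import numpy as np

def perceptron_sketch(X, y, w_ini, max_iter=1000):
    w = np.copy(w_ini)
    for _ in range(max_iter):
        errors = 0
        for xi, yi in zip(X, y):
            # Misclassification test from rule (7)
            if yi * np.dot(w, xi) <= 0:
                w = w + yi * xi
                errors += 1
        if errors == 0:  # converged: every sample classified correctly
            break
    return w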
Observation:
Estimated parameter vector: [ 2 1 -5]
Result:
Successfully implemented the perceptron algorithm and obtained the parameter vector of the separating decision boundary.
EXPERIMENT-8
Aim:
To design and train a three-layer feedforward neural network using gradient descent, with the hyperbolic tangent activation function for all nodes, so that it maps the given input vectors to the desired outputs.
Theory:
A Feedforward Neural Network (FFN) is a fundamental type of artificial neural network where connections between the nodes do not form a cycle. A three-layer FFN consists of an input layer, a hidden layer, and an output layer. The network maps inputs to outputs using a series of weighted connections and activation functions.
The activation function used for all the nodes in this experiment is the
hyperbolic tangent function (tanh), defined as:
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}    (8)
which produces outputs in the range of (-1,1), allowing better gradient
propagation compared to sigmoid functions.
The network is trained using gradient descent, which updates the weights
to minimize the mean squared error (MSE) between predicted and actual
outputs. The weight updates follow:
W = W - \eta \frac{\partial L}{\partial W}    (9)
where η is the learning rate and L is the loss function.
Several training algorithms can be used to minimize this loss; in this experiment, the weights are updated with standard (batch) gradient descent, as shown in the code below.
Code:
import numpy as np
import matplotlib.pyplot as plt

def tanh(x):
    return np.tanh(x)

def tanh_derivative(x):
    return 1.0 - np.tanh(x) ** 2

# Generate training data (XOR-like pattern with additional points)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [0.1, 0], [0.9, 1], [1, 0.1], [0, 0.9],
              [0.2, 0.8], [0.8, 0.2], [0.3, 0.7], [0.7, 0.3]])

y = np.array([[0, 0], [1, 0], [1, 0], [0, 1],
              [0, 0], [1, 0], [1, 0], [0, 1],
              [0, 1], [1, 0], [0, 1], [1, 0]])

# Initialize network parameters
input_dim = 2
hidden_dim = 4
output_dim = 2
np.random.seed(42)
W1 = np.random.randn(input_dim, hidden_dim)
B1 = np.zeros((1, hidden_dim))
W2 = np.random.randn(hidden_dim, output_dim)
B2 = np.zeros((1, output_dim))

# Training parameters
learning_rate = 0.1
epochs = 5000
losses = []

# Training loop
for epoch in range(epochs):
    # Forward pass
    Z1 = np.dot(X, W1) + B1
    A1 = tanh(Z1)
    Z2 = np.dot(A1, W2) + B2
    A2 = tanh(Z2)

    # Compute loss
    loss = np.mean((y - A2) ** 2)
    losses.append(loss)

    # Backpropagation
    dA2 = (A2 - y) * tanh_derivative(Z2)
    dW2 = np.dot(A1.T, dA2)
    dB2 = np.sum(dA2, axis=0, keepdims=True)
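The listing breaks off before the hidden-layer gradients and the weight updates. A minimal completion, placed inside the training loop and assuming the same variable names as above, would be:

    # Backpropagate into the hidden layer
    dA1 = np.dot(dA2, W2.T) * tanh_derivative(Z1)
    dW1 = np.dot(X.T, dA1)
    dB1 = np.sum(dA1, axis=0, keepdims=True)

    # Gradient-descent updates (equation (9))
    W2 -= learning_rate * dW2
    B2 -= learning_rate * dB2
    W1 -= learning_rate * dW1
    B1 -= learning_rate * dB1

    # Periodic progress report
    if epoch % 1000 == 0:
        print(f"Epoch {epoch}: Loss = {loss:.4f}")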
Observation:
Epoch 0: Loss = 1.8093
Epoch 1000: Loss = 0.0774
Epoch 2000: Loss = 0.0653
Epoch 3000: Loss = 0.0453
Epoch 4000: Loss = 0.0235
Result:
Successfully designed and trained a three-layer feedforward neural network
using gradient descent, demonstrating correct mapping from input to output
with the hyperbolic tangent activation function.
EXPERIMENT-9
Aim:
To write a MATLAB/Python function that computes the principal components of the covariance matrix of an l × N data matrix X, along with the corresponding variances. Further, to write a MATLAB/Python function that evaluates the performance of the PCA method when applied to a data matrix X.
Theory:
Principal Component Analysis (PCA) is a widely used dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional form, preserving as much variance as possible. It is based on an eigendecomposition of the covariance matrix of the data.
Given a data matrix X ∈ R^{l×N}, where each column represents a sample, the steps to compute PCA are (see the sketch after the listing below):
• Centering the data: subtract the mean of each row (feature) from the data.
• Covariance matrix: compute the covariance matrix of the centered data.
• Eigendecomposition: compute its eigenvalues and eigenvectors (equivalently, a singular value decomposition of the centered data); the eigenvectors are the principal components and the eigenvalues the corresponding variances.
• Projection: sort the components by decreasing variance and project the data onto the leading ones.
Code:
# ... (earlier lines of the listing, defining evaluate_pca, are not shown) ...
    return {
        "U": U,
        "S": S,
        "explained_variance_ratio": explained_variance_ratio,
        "X_reconstructed": X_reconstructed
    }

# Generate example 3D data
np.random.seed(0)
mean = np.array([[0], [0], [0]])
cov = np.array([[3, 1, 0.5], [1, 2, 0.3], [0.5, 0.3, 1]])
X = np.random.multivariate_normal(mean.flatten(), cov, 100).T

# Evaluate PCA
result = evaluate_pca(X, num_components=2)
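The body of evaluate_pca is not reproduced above. A minimal sketch consistent with the fields it returns (U, S, explained_variance_ratio, X_reconstructed), assuming an SVD of the centered data, could be:

import numpy as np

def evaluate_pca(X, num_components):
    # X: l x N data matrix, one sample per column
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                                    # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    variances = S ** 2 / (X.shape[1] - 1)            # variance along each principal component
    explained_variance_ratio = variances / variances.sum()
    # Keep the leading components, project and reconstruct
    Uk = U[:, :num_components]
    X_reconstructed = Uk @ (Uk.T @ Xc) + mean
    return {
        "U": U,
        "S": S,
        "explained_variance_ratio": explained_variance_ratio,
        "X_reconstructed": X_reconstructed,
    }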
Observation: