SOI Report
BACHELOR OF TECHNOLOGY
IN THE
DEPARTMENT OF ELECTRONICS AND TELECOMMUNICATION ENGINEERING
Submitted by:
SAMPAD MOHANTY
(2002070059)
CERTIFICATE
SAMPAD MOHANTY
Regd no.: - 2002070059
Electronics and Telecommunication
Engineering
Section: A2
ACKNOWLEDGEMENT
Lastly, I would like to thank the honorable Vice Chancellor and Head
of Department of Electronics and Telecommunication Engineering, Veer
Surendra Sai University of Technology, Burla for giving us this opportunity
to work together as a team on this project as a part of our curriculum.
SAMPAD MOHANTY
Regd no.: - 2002070059
Electronics and Telecommunication
Engineering
Section: A2
ABSTRACT
In this internship, in the Department of Electronics and Electrical
Communication Engineering at IIT Kharagpur, I learned about various topics
and techniques in machine learning and deep learning, which are essential for
the field of electronics. I started with the basics of model representation, cost
function, and gradient descent for linear regression. Then I moved on to logistic
regression and neural networks, where I learned how to classify data using
different activation functions, loss functions, and architectures. I also learned
how to implement these techniques using Python and TensorFlow frameworks.
In addition, I explored some advanced topics and techniques in deep
learning, such as convolutional neural networks, residual networks, transfer
learning, face recognition, and neural style transfer. I learned how these
techniques can improve the performance and efficiency of deep learning
models on complex tasks and domains. I also learned how to use pre-trained
models and frameworks, such as ResNet, MobileNetV2, FaceNet, and U-Net.
Moreover, I worked on a hardware project that involved performing a 3x3
convolution dot product of two arrays in both decimal and fixed-point binary
representation, optimizing the code to reduce the error, and implementing the
operation in Verilog.
This internship was a great opportunity for me to learn and grow in the
field of electronics and electrical communication engineering. I gained
valuable exposure to various topics and techniques in machine learning and
deep learning, which are essential for this field. I also acquired practical
experience and skills in hardware design and programming, which are
important for implementing these techniques in real-world applications. This
internship enhanced my knowledge and confidence in this field and prepared
me for future challenges and opportunities. I am grateful for this internship and
the guidance I received from my mentors.
Contents
Description of Internship
Assignment 1: Model Representation
Assignment 2: Cost Function
Assignment 3: Gradient Descent for Linear Regression
Assignment 4: Linear Regression
Assignment 5: Logistic Regression
Neural Networks
Assignment 6: Logistic Regression with a Neural Network mindset
Assignment 7: Planar Data classification with one hidden layer
Assignment 8: Build your Deep Network: Step by Step
Assignment 9: Deep Neural Network for Image Classification
Assignment 10: Initialization
Assignment 11: Regularization
Assignment 12: Gradient Checking
Assignment 13: Optimization Methods
Assignment 14: Convolutional Neural Network: Step by Step
Some Learning Stuffs 1
Assignment 15: Residual Networks (ResNets)
Some Learning Stuffs 2
Assignment 16: Transfer Learning with MobileNetV2
Assignment 17: Autonomous Driving – Car Detection
Some Learning Stuffs 3: U-Net
Assignment 18: Face recognition using FaceNet
Some Learning Stuffs 4: Neural Style Transfer
Description of Internship
I joined the Department of E&ECE as an intern on 10th May 2023. My internship lasted two
months. During this period, I learned various topics related to machine learning, deep learning,
convolutional neural networks, and fixed-point and floating-point binary number representations and
their arithmetic. These topics are described later in this section.
Machine Learning is an AI technique that teaches computers to learn from experience. Machine
learning algorithms use computational methods to “learn” information directly from data without
relying on a predetermined equation as a model.
In this part I learned about supervised learning. The concepts were based on linear regression
model, cost function, gradient descent, classification, logistic regression, overfitting problem and
regularization.
Several learn-by-doing assignments were included, as described later in this
section.
• I created the `x_train` and `y_train` variables. The data is stored in one-dimensional
NumPy arrays.
• I plotted the data using matplotlib’s scatter function.
• Then the model function 𝑓(𝑤,𝑏) was plotted according to the following equation using
matplotlib,
$f_{w,b}(x^{(i)}) = w x^{(i)} + b$
• I adjusted the values of 𝑤 and 𝑏 to fit the model by repeated checking with different
values.
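The model function above can be sketched in NumPy as follows. This is a minimal illustration, not the assignment code: the function name `compute_model_output`, the two data points and the chosen values of `w` and `b` are assumptions made for the example.

```python
import numpy as np

def compute_model_output(x, w, b):
    """Return f_wb(x) = w * x + b for every example in x."""
    return w * x + b

# Hypothetical training data: house size (1000 sqft) and price (1000s of dollars).
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Parameters found by trying different values until the line fits the points.
w, b = 200.0, 100.0
f_wb = compute_model_output(x_train, w, b)
print(f_wb)  # [300. 500.]
```

With these values the predictions coincide with the targets, which is what "fitting" the two parameters means for this tiny dataset.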
What I learned:
• Linear regression builds a model which establishes a relationship between features and
targets.
• In the assignment above, the feature was house size, and the target was house price.
• For simple linear regression, the model has two parameters 𝑤 and 𝑏 whose values are 'fit'
using training data.
• Once a model's parameters have been determined, the model can be used to make
predictions on novel data.
• How to implement and explore the `cost` function for linear regression with one variable.
• The cost equation provides a measure of how well your predictions match your training
data.
• Minimizing the cost can provide optimal values of 𝑤, 𝑏.
• Redeveloped the routines for linear regression, now with multiple variables.
• Utilized numpy `np.dot` to vectorize the implementations.
• Explored the impact of the learning rate 𝛼 on convergence.
• Discovered the value of feature scaling using z-score normalization in speeding
convergence.
• Learned how linear regression can model complex, even highly non-linear functions using
feature engineering.
• Recognized that it is important to apply feature scaling when doing feature engineering.
• Utilized an open-source machine learning toolkit, scikit-learn.
• Implemented linear regression using gradient descent and feature normalization from
that toolkit.
• Implemented linear regression using a closed-form solution from that toolkit.
• You have historical data from previous applicants that you can use as a training set for
logistic regression.
• For each training example, you have the applicant’s scores on two exams and the
admissions decision.
• Your task is to build a classification model that estimates an applicant’s probability of
admission based on the scores from those two exams.
How I solved:
$loss(f_{w,b}(x^{(i)}), y^{(i)}) = -y^{(i)} \log(f_{w,b}(x^{(i)})) - (1 - y^{(i)}) \log(1 - f_{w,b}(x^{(i)}))$
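A small sketch of how this loss can be evaluated, assuming the model output is the sigmoid of a linear function of the two exam scores. The data values and the names `sigmoid` and `logistic_loss` are illustrative assumptions, not the original assignment code.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(f, y):
    """Per-example loss: -y*log(f) - (1 - y)*log(1 - f)."""
    return -y * np.log(f) - (1 - y) * np.log(1 - f)

# Hypothetical exam scores (two per applicant) and admission decisions.
X = np.array([[35.0, 40.0], [60.0, 85.0], [80.0, 90.0]])
y = np.array([0.0, 1.0, 1.0])

w = np.zeros(2)   # untrained parameters
b = 0.0
f_wb = sigmoid(X @ w + b)             # every prediction is 0.5 when w = 0 and b = 0
cost = logistic_loss(f_wb, y).mean()  # average loss over the training set
```

With untrained parameters every prediction is 0.5, so the cost equals log 2; a confident correct prediction gives a loss near zero, while a confident wrong one is penalized heavily.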
Deep learning is a branch of machine learning that uses neural networks to learn from data.
Neural networks are composed of layers of artificial neurons that perform nonlinear
transformations on the input and pass it to the next layer. The output layer produces the final
prediction or classification. Neural networks can learn complex features and patterns from large
amounts of data, and can be applied to various domains such as computer vision, natural
language processing, speech recognition, etc.
Neural Networks
Neural networks are models that consist of multiple layers of artificial neurons that are
connected by weights. Each neuron receives an input vector and computes a weighted sum of
its elements, adds a bias term, and applies an activation function to produce an output scalar.
The output of one layer becomes the input of the next layer, until the final output layer
produces the prediction or classification. The structure and parameters of a neural network are
determined by its architecture and hyperparameters.
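The computation done by a single neuron, as described above, can be sketched as follows. The input, weights and bias are made-up values, and ReLU is chosen only as an example activation.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def neuron(x, w, b, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([1.0, -2.0, 0.5])   # input vector (illustrative)
w = np.array([0.4, 0.1, -0.6])   # weights (illustrative)
b = 0.2                          # bias (illustrative)

a = neuron(x, w, b, relu)        # z = 0.4 - 0.2 - 0.3 + 0.2 = 0.1, so a is about 0.1
```

A layer is a set of such neurons sharing the same input, which is why the weights of a layer are usually stored as one matrix.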
• Loaded the data from the given dataset `data.h5` and analyzed the number of
data points.
• Split the data into training and test sets.
• Standardized the training data.
• Studied the general architecture of the learning algorithm in neural
networks.
• Defined the model structure, starting with the helper function (the sigmoid function in this
case).
• Initialized the parameters with zeros.
• Computed the cost function and its gradient using the propagate function.
• Declared a function named optimize to find the parameters that minimize the cost.
• The loss was optimized iteratively to learn the parameters:
o Computed the cost and its gradient.
o Updated the parameters using gradient descent.
• Used the learned parameters to predict the labels for a given set of examples.
• Merged all the functions into a single function, which forms the final model.
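The steps listed above can be sketched roughly as follows. This is a simplified reconstruction under stated assumptions: the function names loosely follow the text (`propagate`, `optimize`, `predict`, `model`), the data is a tiny synthetic stand-in for the images in `data.h5`, and scaling by 255 is assumed as the standardization step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Forward pass for the cost, backward pass for its gradients."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                       # activations, shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = X @ (A - Y).T / m                         # shape (n_x, 1)
    db = np.sum(A - Y) / m
    return dw, db, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    """Plain gradient descent on the logistic cost."""
    costs = []
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w = w - learning_rate * dw
        b = b - learning_rate * db
        costs.append(cost)
    return w, b, costs

def predict(w, b, X):
    return (sigmoid(w.T @ X + b) > 0.5).astype(int)

def model(X_train, Y_train, num_iterations=200, learning_rate=0.1):
    w, b = np.zeros((X_train.shape[0], 1)), 0.0    # zero initialization
    return optimize(w, b, X_train, Y_train, num_iterations, learning_rate)

# Synthetic stand-in for the dataset: 4 "images" of shape (2, 2, 3).
images = np.zeros((4, 2, 2, 3))
images[2:] = 255.0                                 # two bright images labelled 1
X = images.reshape(images.shape[0], -1).T / 255.0  # flatten to (n_x, m), scale to [0, 1]
Y = np.array([[0, 0, 1, 1]])

w, b, costs = model(X, Y)
predictions = predict(w, b, X)
```

Zero initialization is acceptable here because logistic regression has a single output unit; the symmetry problem only appears once hidden layers are added.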
What I Learned:
$\hat{y}^{(i)} = a^{[2](i)} = \sigma(z^{[2](i)})$
Given the predictions on all the examples, you can also compute the cost $J$ as follows:
$J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log(a^{[2](i)}) + (1 - y^{(i)}) \log(1 - a^{[2](i)}) \right)$
• Backward propagation was implemented to compute the gradients dW1, db1, dW2, db2.
• Parameters are updated using the update_parameters function, which returns a
dictionary with W1, b1, W2, b2.
• The whole model was integrated into a single function nn_model(), which returns the
updated parameters.
• The model was tested, and the prediction function was defined.
• Accuracy was computed and found to be 90%.
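One training step of such a network with one hidden layer might look like the sketch below. It assumes a tanh hidden layer and a sigmoid output, which the text does not state explicitly; the data, the hidden size `n_h` and the learning rate are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])        # hidden layer (tanh is an assumption)
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])       # output layer
    return A2, {"A1": A1, "A2": A2}

def compute_cost(A2, Y):
    return -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

def backward_propagation(p, cache, X, Y):
    m = X.shape[1]
    A1, A2 = cache["A1"], cache["A2"]
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)    # derivative of tanh is 1 - tanh^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

def update_parameters(p, grads, learning_rate):
    return {k: p[k] - learning_rate * grads["d" + k] for k in p}

rng = np.random.default_rng(2)
X = rng.standard_normal((2, 5))                # 5 planar points (illustrative)
Y = (X[0:1] * X[1:2] > 0).astype(float)        # hypothetical labels, shape (1, 5)
n_h = 4
params = {
    "W1": rng.standard_normal((n_h, 2)) * 0.01, "b1": np.zeros((n_h, 1)),
    "W2": rng.standard_normal((1, n_h)) * 0.01, "b2": np.zeros((1, 1)),
}

A2, cache = forward_propagation(X, params)
cost = compute_cost(A2, Y)
grads = backward_propagation(params, cache, X, Y)
new_params = update_parameters(params, grads, learning_rate=1.2)
```

Repeating the last four lines in a loop gives the training procedure that a function like nn_model() wraps.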
What I learned:
• I implemented all the functions required for building a deep neural network.
• Used non-linear units to improve the model.
• Built a deeper neural network (with more than 1 hidden layer).
• Implemented an easy-to-use neural network class.
• Loaded the dataset and split the data into training and test sets.
• Reshaped and standardized the images before feeding them to the network.
• Then I studied the model architecture for the 2-layer neural network and the L-layer neural
network.
• The analysis was done for the 2-layer neural network by the following steps:
o The input is a (64,64,3) image which is flattened to a vector of size (12288,1).
o The corresponding vector $[x_0, x_1, \dots, x_{12287}]^T$ is then multiplied by the weight
matrix $W^{[1]}$ of size $(n^{[1]}, 12288)$.
o Then a bias term is added and ReLU is applied to get the following vector:
$[a_0^{[1]}, a_1^{[1]}, \dots, a_{n^{[1]}-1}^{[1]}]^T$.
o Repeated the same process.
o Multiplied the resulting vector by $W^{[2]}$ and added the intercept (bias).
o Finally, took the sigmoid of the result. If it is greater than 0.5, classified it as a cat.
• Detailed steps followed for the L-layer neural network:
o The input was a (64,64,3) image which was flattened to a vector of size
(12288,1).
o The corresponding vector $[x_0, x_1, \dots, x_{12287}]^T$ is then multiplied by the weight
matrix $W^{[1]}$ and the intercept $b^{[1]}$ is added. The result is called the linear unit.
o Next, I took the ReLU of the linear unit. This process is repeated several times for
each $(W^{[l]}, b^{[l]})$ depending on the model architecture.
o Finally, took the sigmoid of the final linear unit. If it is greater than 0.5, classified
it as a cat.
• The accuracy for the 2-layer neural network was computed as:
o Accuracy of prediction on the training data is found to be 0.99.
o Accuracy of prediction on the test data is found to be 0.72.
• Whereas the accuracy for the L-layer (4-Layer in this case) neural network was
computed as:
o Accuracy of prediction on the training data is found to be 0.99.
o Accuracy of prediction on the test data is found to be 0.8.
• The 4-layer neural network performed better (80%) than the 2-layer
neural network (72%) on the same test set.
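The L-layer forward pass described above can be sketched as a loop over the layers. The layer sizes, the random image and the weight scaling are assumptions made for illustration; an untrained network like this one gives an arbitrary answer.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def initialize_parameters(layer_dims, seed=1):
    """Random weights scaled by 1/sqrt(n_prev), zero biases (one possible choice)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n = layer_dims[l - 1], layer_dims[l]
        params[f"W{l}"] = rng.standard_normal((n, n_prev)) / np.sqrt(n_prev)
        params[f"b{l}"] = np.zeros((n, 1))
    return params

def l_model_forward(X, params):
    """[LINEAR -> RELU] for layers 1..L-1, then LINEAR -> SIGMOID for layer L."""
    L = len(params) // 2
    A = X
    for l in range(1, L):
        A = relu(params[f"W{l}"] @ A + params[f"b{l}"])
    return sigmoid(params[f"W{L}"] @ A + params[f"b{L}"])

# A random stand-in for one (64, 64, 3) image.
image = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))
x = image.reshape(-1, 1) / 255.0                  # flattened to (12288, 1), scaled to [0, 1]

params = initialize_parameters([12288, 20, 7, 5, 1])   # 4-layer network (illustrative sizes)
y_hat = l_model_forward(x, params)
is_cat = bool(y_hat[0, 0] > 0.5)
```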
What I learned:
• Learned to build and train a deep L-layer neural network and applied it to supervised
learning.
• Explored how the increase in layers increased the accuracy of the model.
• Imported all the required libraries for the assignment as per the instruction.
• Loaded the dataset and split it into train and test dataset.
• I used a 3-layer neural network.
• The hidden layers use the ReLU activation function.
• Zero Initialization of Parameters:
o W1, W2, b1, b2 = 0
o Accuracy on train set: 0.5
o Accuracy on test set: 0.5
• Random Initialization:
o b1 and b2 = 0
o W1 and W2 arrays are initialized randomly with large random values.
o Accuracy on train set: 0.83
o Accuracy on test set: 0.86
• `He` Initialization:
o Instead of multiplying `np.random.randn(..,..)` for 𝑊 and 𝑏 by 10, you will
multiply it by $\sqrt{\frac{2}{\text{dimension of the previous layer}}}$, which is what He initialization
recommends for layers with a ReLU activation.
o Accuracy on train set: 0.99
o Accuracy on test set: 0.96
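The three initialization schemes compared above can be sketched in one helper. The function name `initialize`, the layer sizes and the seed are illustrative; the biases are kept at zero in all three cases, which is an assumption of this sketch.

```python
import numpy as np

def initialize(layer_dims, method, seed=3):
    """Return W1, b1, ..., WL, bL for the chosen initialization method."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n = layer_dims[l - 1], layer_dims[l]
        if method == "zeros":
            W = np.zeros((n, n_prev))
        elif method == "random":
            W = rng.standard_normal((n, n_prev)) * 10           # large random values
        elif method == "he":
            W = rng.standard_normal((n, n_prev)) * np.sqrt(2.0 / n_prev)
        else:
            raise ValueError(f"unknown method: {method}")
        params[f"W{l}"] = W
        params[f"b{l}"] = np.zeros((n, 1))
    return params

zeros = initialize([200, 50, 1], "zeros")
large = initialize([200, 50, 1], "random")
he = initialize([200, 50, 1], "he")
```

With zero weights every hidden unit computes the same value and receives the same gradient, so the network cannot break symmetry, which is consistent with the 0.5 accuracy reported above. He scaling keeps the spread of the weights near sqrt(2 / n_prev), so activations neither explode nor vanish at the start of training.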
What I learned:
What I did:
• Loaded the dataset and split it into train and test dataset
• First, I analysed by building a non-regularized model:
o Accuracy on train set: 0.94
o Accuracy on test set: 0.915
o The scatter plot and decision boundary show that the neural network model
suffers from overfitting problem.
• To reduce overfitting, I analysed two techniques – L2 Regularization and Dropout
• L2-regularization:
o A hyperparameter λ is used.
o Computed cost with regularization.
o Backward propagation with regularization is also computed.
o I computed the parameters.
o Accuracy on train set: 0.93
o Accuracy on test set: 0.93
• Dropout:
o It randomly shuts down some neurons in each iteration.
o Computed forward propagation.
o Performed backward propagation.
o I got the parameters and computed the predictions.
o Accuracy on train dataset: 0.929
o Accuracy on test dataset: 0.95
• So, dropout regularization worked better than L2- regularization.
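The two regularization techniques can be sketched as small helpers. The names, the `keep_prob` value and the example weights are assumptions made for illustration; the dropout variant shown is the common "inverted dropout", which the text does not name explicitly.

```python
import numpy as np

def l2_regularization_cost(weights, lambd, m):
    """L2 penalty added to the cross-entropy cost: (lambda / 2m) * sum of squared weights."""
    return lambd / (2 * m) * sum(np.sum(np.square(W)) for W in weights)

def dropout_forward(A, keep_prob, rng):
    """Inverted dropout: shut down each unit with probability 1 - keep_prob, then rescale."""
    D = rng.random(A.shape) < keep_prob        # mask of the units that stay active
    return A * D / keep_prob, D

def dropout_backward(dA, D, keep_prob):
    """Apply the same mask and scaling to the gradients."""
    return dA * D / keep_prob

W1 = np.array([[1.0, 2.0], [3.0, 4.0]])
penalty = l2_regularization_cost([W1], lambd=0.1, m=5)   # 0.1 / 10 * 30 = 0.3

rng = np.random.default_rng(0)
A1 = np.ones((3, 1000))
A1_drop, D1 = dropout_forward(A1, keep_prob=0.8, rng=rng)
```

Dividing by `keep_prob` keeps the expected value of the activations unchanged, so no rescaling is needed at test time, when dropout is switched off.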
What I learned:
• Gradient checking verifies closeness between the gradients from backpropagation and
the numerical approximation of the gradient (computed using forward propagation).
• Gradient checking is slow, so you don't want to run it in every iteration of training. You
would usually run it only to make sure your code is correct, then turn it off and use
backprop for the actual learning process.
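A minimal sketch of gradient checking, using a simple cubic cost in place of a neural network so that the exact gradient is known. The function name, the test cost and the value of `epsilon` are illustrative assumptions.

```python
import numpy as np

def gradient_check(cost_fn, grad, theta, epsilon=1e-7):
    """Relative difference between an analytic gradient and a two-sided numerical estimate."""
    grad_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += epsilon
        theta_minus[i] -= epsilon
        grad_approx[i] = (cost_fn(theta_plus) - cost_fn(theta_minus)) / (2 * epsilon)
    numerator = np.linalg.norm(grad - grad_approx)
    denominator = np.linalg.norm(grad) + np.linalg.norm(grad_approx)
    return numerator / denominator

def cost(theta):
    return np.sum(theta ** 3)

theta = np.array([1.0, 2.0, 3.0])
good = gradient_check(cost, 3 * theta ** 2, theta)   # correct gradient: tiny difference
bad = gradient_check(cost, 2 * theta ** 2, theta)    # deliberately wrong gradient: large difference
```

Each parameter needs two extra cost evaluations, which is why the check is too slow to run during training.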
$b := b - \alpha \frac{v_{db}^{corrected}}{\sqrt{s_{db}^{corrected}} + \epsilon}$
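The update rule above is the last step of Adam. One full step for a single parameter can be sketched as follows; the hyperparameter defaults are commonly used values and the gradient is made up for the example.

```python
import numpy as np

def adam_update(param, grad, v, s, t, learning_rate=0.01,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step for a single parameter array; t is the step counter starting at 1."""
    v = beta1 * v + (1 - beta1) * grad             # moving average of the gradients
    s = beta2 * s + (1 - beta2) * grad ** 2        # moving average of the squared gradients
    v_corrected = v / (1 - beta1 ** t)             # bias correction
    s_corrected = s / (1 - beta2 ** t)
    param = param - learning_rate * v_corrected / (np.sqrt(s_corrected) + epsilon)
    return param, v, s

b = np.array([1.0, -1.0])
db = np.array([0.5, -2.0])                         # illustrative gradient
v_db, s_db = np.zeros_like(b), np.zeros_like(b)

b_new, v_db, s_db = adam_update(b, db, v_db, s_db, t=1)
```

On the first step the bias-corrected averages reduce to the gradient and its square, so each parameter moves by roughly the learning rate in the direction opposite to its gradient, regardless of the gradient's magnitude.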
• I combined mini batch with gradient descent, momentum and Adam optimization
techniques for better outcomes.
• Accuracy of mini batch gradient descent: 0.716
• Accuracy of mini batch gradient descent with momentum: 0.716
• Accuracy of mini batch gradient descent with Adam: 0.943
• Learning Rate Decay:
$\alpha = \frac{1}{1 + decayRate \times epochNumber} \alpha_0$
o Scheduled learning rate decay: the learning rate changes only when the epoch
number is a multiple of the time interval.
$\alpha = \frac{1}{1 + decayRate \times \lfloor \frac{epochNum}{timeInterval} \rfloor} \alpha_0$
• I implemented the learning rate decay with different optimization techniques.
• Gradient Descent with Learning Rate Decay:
o Accuracy: 0.94
• Gradient descent with momentum and learning rate decay:
o Accuracy: 0.95
• Adam with learning rate decay:
o Accuracy: 0.94
• I achieved nearly similar performance with different methods.
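The two learning rate decay formulas given above translate directly into code. The function names and the example values are illustrative.

```python
def update_lr(learning_rate0, epoch_num, decay_rate):
    """Decay applied at every epoch."""
    return learning_rate0 / (1 + decay_rate * epoch_num)

def schedule_lr_decay(learning_rate0, epoch_num, decay_rate, time_interval=1000):
    """Decay applied in steps: the rate changes once per time_interval epochs."""
    return learning_rate0 / (1 + decay_rate * (epoch_num // time_interval))

print(schedule_lr_decay(0.1, 999, 1.0))    # 0.1
print(schedule_lr_decay(0.1, 1500, 1.0))   # 0.05
```

The scheduled variant avoids the learning rate collapsing towards zero too early when training runs for thousands of epochs.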
What I learned:
• Shuffling and Partitioning are the two steps required to build mini batches.
• Powers of two are often chosen to be the mini-batch size, e.g., 16, 32, 64, 128.
• Momentum takes past gradients into account to smooth out the steps of gradient
descent. It can be applied with batch gradient descent, mini-batch gradient descent or
stochastic gradient descent.
• We have to tune a momentum hyperparameter 𝛽 and a learning rate 𝛼.
• Apply three different optimization methods to your models.
• Build mini batches for your training set.
• Use learning rate decay scheduling to speed up your training.
• There are two types of layers – convolutional layers and pooling layers
• Convolutional layers:
o Convolutional layers perform convolution over the input using one or more filters
and produce feature maps that represent different aspects or characteristics of
the input.
o I performed zero padding.
o Then I studied single-step convolution.
o Learned how to perform convolution in Python.
o Implemented the forward pass of a convolutional neural network.
• Pooling Layers:
o Pooling layers perform pooling over the feature maps using a pooling function
and produce pooled feature maps that reduce the size and complexity of the
feature maps and make them more invariant to translation or distortion.
o Learned about forward pooling.
$n_H = \lfloor \frac{n_{H_{prev}} - f}{stride} \rfloor + 1$
$n_W = \lfloor \frac{n_{W_{prev}} - f}{stride} \rfloor + 1$
$n_C = n_{C_{prev}}$
• A convolution extracts features from an input image by taking the dot product between
the input data and a 3D array of weights (the filter).
• The 2D output of the convolution is called the feature map.
• A convolution layer is where the filter slides over the image and computes the dot
product.
o This transforms the input volume into an output volume of different size.
• Zero padding helps keep more information at the image borders, and is helpful for
building deeper networks, because you can build a CONV layer without shrinking the
height and width of the volumes.
• Pooling layers gradually reduce the height and width of the input by sliding a 2D window
over each specified region, then summarizing the features in that region.
• Explored the backward pass of pooling layer by two different concepts – max pooling
layer backward pass and average pooling backward pass.
• Also used TensorFlow.
• Using TensorFlow Keras, I learned to build Sequential models.
• Trained and evaluated the model using Keras models.
• Learned to build a convolutional model using TensorFlow Keras' Conv2D.
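The NumPy building blocks mentioned in this assignment (zero padding, a single convolution step, the output-size formula and max pooling) can be sketched as follows. For brevity the pooling helper handles one example of shape (n_H, n_W, n_C) rather than a whole batch, and the `pad` argument of `output_size` is an addition for the convolution case; all names are illustrative.

```python
import numpy as np

def zero_pad(X, pad):
    """Pad the height and width of a batch X of shape (m, n_H, n_W, n_C) with zeros."""
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="constant")

def conv_single_step(a_slice, W, b):
    """Dot product of one (f, f, n_C) input slice with one filter, plus a bias."""
    return float(np.sum(a_slice * W) + b)

def output_size(n_prev, f, stride, pad=0):
    """floor((n_prev + 2*pad - f) / stride) + 1; pad=0 gives the pooling formula."""
    return (n_prev + 2 * pad - f) // stride + 1

def max_pool_forward(A, f, stride):
    """Max pooling over one example A of shape (n_H, n_W, n_C)."""
    n_H = output_size(A.shape[0], f, stride)
    n_W = output_size(A.shape[1], f, stride)
    out = np.zeros((n_H, n_W, A.shape[2]))
    for h in range(n_H):
        for w in range(n_W):
            window = A[h * stride:h * stride + f, w * stride:w * stride + f, :]
            out[h, w, :] = window.max(axis=(0, 1))   # per-channel maximum
    return out

X_pad = zero_pad(np.ones((1, 3, 3, 2)), pad=1)                             # shape (1, 5, 5, 2)
z = conv_single_step(np.ones((3, 3, 2)), np.ones((3, 3, 2)), 1.0)          # 18 + 1 = 19.0
pooled = max_pool_forward(np.arange(16.0).reshape(4, 4, 1), f=2, stride=2)
```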
Identity Block
The identity block is the standard block used in ResNets, and corresponds to the case where the
input activation (say 𝑎[𝑙] ) has the same dimension as the output activation (say 𝑎[𝑙+2] ).
Convolutional Block
The ResNet "convolutional block" is the second block type. You can use this type of block when
the input and output dimensions don't match up. The difference with the identity block is that
there is a CONV2D layer in the shortcut path.
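The difference between the two block types can be illustrated with a simplified sketch. It uses fully connected layers in place of the CONV2D layers to keep the example short, so it shows only the role of the shortcut path and not the actual ResNet blocks; all names and sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(a, W1, b1, W2, b2, W_shortcut=None):
    """Two-layer main path plus a shortcut added before the final ReLU.

    W_shortcut=None is the identity-block case (input and output sizes match);
    passing a matrix plays the role of the layer in the shortcut path of the
    convolutional block, which resizes the input so that it can be added.
    """
    shortcut = a if W_shortcut is None else W_shortcut @ a
    a1 = relu(W1 @ a + b1)
    z2 = W2 @ a1 + b2
    return relu(z2 + shortcut)

a = np.array([[1.0], [2.0], [3.0]])

# Identity block with all-zero weights: the block simply passes its input through.
out_identity = residual_block(a, np.zeros((3, 3)), np.zeros((3, 1)),
                              np.zeros((3, 3)), np.zeros((3, 1)))

# Block whose output has a different size (2 instead of 3), so the shortcut needs a projection.
rng = np.random.default_rng(0)
out_projected = residual_block(a, rng.standard_normal((4, 3)), np.zeros((4, 1)),
                               rng.standard_normal((2, 4)), np.zeros((2, 1)),
                               W_shortcut=rng.standard_normal((2, 3)))
```

The first call shows why the identity function is easy for a residual block to learn: driving the main-path weights to zero is enough.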
• Very deep "plain" networks don't work in practice because vanishing gradients make
them hard to train.
• Skip connections help address the Vanishing Gradient problem. They also make it easy
for a ResNet block to learn an identity function.
• There are two main types of blocks: The identity block and the convolutional block.
• Very deep Residual Networks are built by stacking these blocks together.
• Anchor boxes are chosen by exploring the training data to choose reasonable
height/width ratios that represent the different classes. For this assignment, 5 anchor
boxes were chosen for you (to cover the 80 classes), and stored in the file
'./model_data/yolo_anchors.txt'
• The anchor-box dimension is the second to last dimension of the encoding tensor,
whose shape is $(m, n_H, n_W, anchors, classes)$.
• The YOLO architecture is IMAGE (m, 608, 608, 3) -> DEEP CNN -> ENCODING (m, 19, 19,
5, 85).
Encoding
If the center/midpoint of an object falls into a grid cell, that grid cell is responsible for detecting
that object.
Class score
Now, for each box (of each cell) you'll compute the following element-wise product and extract
a probability that the box contains a certain class.
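The element-wise product can be sketched with NumPy broadcasting for a single image. The arrays are filled with random numbers purely to show the shapes, and the score threshold of 0.6 is an assumed value.

```python
import numpy as np

rng = np.random.default_rng(0)
box_confidence = rng.random((19, 19, 5, 1))     # probability that a box contains any object
box_class_probs = rng.random((19, 19, 5, 80))   # class probabilities for each box

# Broadcasting multiplies each box's confidence with its 80 class probabilities.
box_scores = box_confidence * box_class_probs   # shape (19, 19, 5, 80)

box_classes = np.argmax(box_scores, axis=-1)    # most likely class per box
box_class_scores = np.max(box_scores, axis=-1)  # score of that class

threshold = 0.6                                 # assumed value
keep = box_class_scores >= threshold            # boolean mask of confident boxes
kept_scores = box_class_scores[keep]
kept_classes = box_classes[keep]
```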
Visualizing classes
- For each of the 19x19 grid cells, find the maximum of the probability scores (taking a max
across the 80 classes, one maximum for each of the 5 anchor boxes).
- Color that grid cell according to what object that grid cell considers the most likely.
Note that this visualization isn't a core part of the YOLO algorithm itself for making predictions;
it's just a nice way of visualizing an intermediate result of the algorithm.
Visualizing bounding boxes
Another way to visualize YOLO's output is to plot the bounding boxes that it outputs. Doing that
results in a visualization like this:
Each cell gives you 5 boxes. In total, the model predicts: 19x19x5 = 1805 boxes just by looking
once at the image (one forward pass through the network)! Different colors denote different
classes.
Non-Max suppression
In the figure above, the only boxes plotted are ones for which the model had assigned a high
probability, but this is still too many boxes. You'd like to reduce the algorithm's output to a
much smaller number of detected objects.
To do so, you'll use non-max suppression. Specifically, you'll carry out these steps:
- Get rid of boxes with a low score. Meaning, the box is not very confident about detecting a
class, either due to the low probability of any object, or low probability of this particular class.
- Select only one box when several boxes overlap with each other and detect the same object.
Several more concepts are discussed in the coding part of the assignment.
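The second step is commonly implemented with the intersection over union (IoU) of two boxes as the overlap measure. The sketch below is a plain NumPy version under that assumption; it ignores class labels, and the boxes, scores and IoU threshold are made-up values.

```python
import numpy as np

def iou(box1, box2):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    xi1, yi1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    xi2, yi2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(xi2 - xi1, 0.0) * max(yi2 - yi1, 0.0)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    return inter / (area1 + area2 - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop boxes that overlap it too much, repeat."""
    order = list(np.argsort(scores)[::-1])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(int(best))
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = np.array([[0.0, 0.0, 2.0, 2.0],
                  [0.0, 0.0, 2.0, 2.2],    # almost the same object as the first box
                  [3.0, 3.0, 5.0, 5.0]])   # a separate object
scores = np.array([0.9, 0.8, 0.7])

kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```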
• Put more images of each person (under different lighting conditions, taken on different
days, etc.) into the database. Then, given a new image, compare the new face to
multiple pictures of the person. This would increase accuracy.
• Crop the images to contain just the face, and less of the "border" region around the face.
This preprocessing removes some of the irrelevant pixels around the face, and also
makes the algorithm more robust.
where $a_{ij}^{[l](G)}$ and $a_{ij}^{[l](C)}$ are the activations of layer $l$ for the generated image and the content
image, respectively.
STYLE LOSS
Style loss is a type of loss function that measures how well the generated image matches the style
of the style image. Style loss can be computed by using a pre-trained convolutional neural
network and comparing the Gram matrices of multiple hidden layers between the generated
image and the style image. Gram matrices are matrices that contain the inner products of the
feature maps of a layer, which capture the correlations and patterns of the features. Style loss can
capture the textures, colors, and styles of the style image and ignore the spatial structure and
semantics. Style loss is defined as:
$L_{style} = \sum_{l} \lambda_l \frac{1}{4 n_l^2 m_l^2} \sum_{i,j} \left( G_{ij}^{[l](G)} - G_{ij}^{[l](S)} \right)^2$
where $G_{ij}^{[l](G)}$ and $G_{ij}^{[l](S)}$ are the Gram matrices of layer $l$ for the generated image and the style
image, respectively, $n_l$ is the number of filters in layer $l$, $m_l$ is the height times width of the feature
map of layer $l$, and $\lambda_l$ is a weight parameter that controls the contribution of layer $l$ to the style
loss.
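The style loss can be sketched in NumPy as follows, assuming the activations of each layer are available as arrays of shape (height, width, filters). In practice they would come from a pre-trained network; here they are small hand-made arrays and the function names are illustrative.

```python
import numpy as np

def gram_matrix(A):
    """A has shape (n_l, m_l): one row per filter. Returns the (n_l, n_l) inner products."""
    return A @ A.T

def layer_style_cost(a_S, a_G):
    """Style cost of one layer for activations of shape (height, width, filters)."""
    n_H, n_W, n_C = a_G.shape
    # Unroll so that each row holds the feature map of one filter.
    S = a_S.reshape(n_H * n_W, n_C).T
    G = a_G.reshape(n_H * n_W, n_C).T
    GS, GG = gram_matrix(S), gram_matrix(G)
    return np.sum((GG - GS) ** 2) / (4 * n_C ** 2 * (n_H * n_W) ** 2)

def style_cost(layer_pairs, layer_weights):
    """Weighted sum of the per-layer costs; layer_pairs holds (a_S, a_G) for each layer."""
    return sum(lam * layer_style_cost(a_S, a_G)
               for lam, (a_S, a_G) in zip(layer_weights, layer_pairs))

a_S = np.zeros((1, 1, 1))          # style activations (illustrative)
a_G = np.full((1, 1, 1), 2.0)      # generated-image activations (illustrative)

one_layer = layer_style_cost(a_S, a_G)       # (4 - 0)^2 / 4 = 4.0
total = style_cost([(a_S, a_G)], [0.5])      # 0.5 * 4.0 = 2.0
```

Because the Gram matrix sums over all spatial positions, it records which filters fire together but not where, which is why the style loss ignores spatial structure.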
where $x_{i,j}$ is the pixel value at position $(i, j)$ in the generated image.
Problem Statement: Perform a 3x3 convolution dot product of two arrays, namely an input
array and a kernel or weights array. I had to do this in both decimal and fixed-point binary
representation. Then I had to analyze the error between the two representations. Then I had to
optimize the code to reduce the error. After achieving the minimum error, I had to implement the
whole operation in hardware using Verilog.
• Using Python, a class named `FixedPoint` is declared that converts decimal to fixed point
numbers. The class takes four arguments: `bits`, `num`, `frac`, and `signed`. `bits` is the total
number of bits used to represent the fixed-point number, `frac` is the number of fractional
bits, and `signed` is a boolean value that indicates whether the number is signed or unsigned.
The class has methods for converting decimal to binary, binary to decimal, adding,
subtracting, multiplying, dividing, shifting, rounding, and truncating fixed point numbers.
• Another class named `FixedPointArray` is declared that converts decimal arrays to fixed point
arrays. The class inherits from `FixedPoint` and takes an additional argument: `array`. `array`
is a numpy array of decimal numbers that needs to be converted to fixed point numbers. The
class has methods for converting decimal arrays to binary arrays, binary arrays to decimal
arrays, performing element-wise operations on fixed point arrays, and displaying fixed point
arrays.
• An image array of size 7x7 and a weight array of size 3x3 are defined as numpy arrays of
decimal numbers. The image array represents an image that needs to be convolved with the
weight array, which represents a kernel or a filter that modifies the image.
• The image array and the weight array are converted to fixed point arrays using the
`FixedPointArray` class. The image array in fixed point domain has 8 bits and 8 fractional bits,
whereas the weight array in fixed point domain has 4 bits and 4 fractional bits. The signed
argument is set to True for both arrays.
• To perform the convolution dot product in fixed point domain, the image array is padded with
zeros on all sides to make it 9x9. Then, a 3x3 submatrix of the padded image array is multiplied
element-wise with the weight array, producing a 3x3 product matrix. The elements of the
product matrix are summed up to get a single number that represents one element of the
output matrix at a defined position according to loop iteration. This process is repeated for all
possible positions of the submatrix on the padded image array.
• The output matrix in fixed point domain is requantized to 8 bits using the `FixedPointArray`
class's `requantize` method. The number of fractional bits is decided after observing the mean
squared error (MSE) plot for different fractional bits from 0 to 10. The MSE plot is generated
using matplotlib's `plot` function. The MSE plot shows that 7 fractional bits for the final output
has the least error.
• Similarly, requantization is carried out for the multiplication step and the addition step in
convolution dot product. The multiplication step produces a 3x3 matrix in both domains,
which is requantized to 8 bits and 8 fractional bits in fixed point domain. The addition step
produces a single number in both domains, which is requantized to 8 bits and 8 fractional bits
in fixed point domain. These requantizations are also based on MSE plots for different
fractional bits.
• Project Folder:
https://drive.google.com/drive/folders/1yLFY5iypBE6YkOo4p4tipRmIqk3wdPvi?usp=sharing
• The optimized convolution dot product is also implemented in hardware using
Verilog.
• To implement the convolution dot product in hardware using Verilog, I defined the modules
`multiplier`, `adder`, and `convolution`.
• The `convolution` module takes 18 elements as input and gives 1 element as output. The first
9 elements are 8-bit fixed-point inputs and the remaining 9 elements are 4-bit signed fixed-point
weights. The output is an 8-bit element.
• The `multiplier` module takes an 8-bit input and a 4-bit signed input and gives an 8-bit signed output.
• The `adder` module takes two 8-bit signed inputs and gives a 9-bit signed output.
• I also designed separate projects for the adder and the multiplier and tested both separately.
• The design performs the convolution dot product in fixed-point format.
• Using Python, I generated 10000 test cases each for the multiplier, the adder and the
convolution.
• Those test cases were saved in .mem files and then imported into the respective Verilog
projects.
• The designs were tested by writing testbench code for each project separately, reading a
large number of inputs from the .mem files and comparing the Verilog output with the Python
output.
• Project Folder:
https://drive.google.com/drive/folders/1P15NGwc71F6t48g0NXVjU6_mHefCk7a6?usp=sharing
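The comparison between the decimal and the fixed-point convolution can be sketched as follows. This is not the `FixedPoint` / `FixedPointArray` code from the project folder: the helper names are hypothetical, the data is random, both arrays use the same number of fractional bits, and saturation and the intermediate requantization steps are left out to keep the example short.

```python
import numpy as np

def to_fixed(x, frac_bits):
    """Round real values to integers scaled by 2**frac_bits."""
    return np.round(np.asarray(x) * (1 << frac_bits)).astype(np.int64)

def conv3x3_same(image, kernel):
    """Zero-pad by 1 and slide a 3x3 kernel; works for float or integer arrays."""
    padded = np.pad(image, 1, mode="constant")       # 7x7 becomes 9x9
    out = np.zeros(image.shape, dtype=padded.dtype)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.random((7, 7))                 # illustrative image values in [0, 1)
weights = rng.uniform(-1, 1, (3, 3))       # illustrative kernel values

reference = conv3x3_same(image, weights)   # decimal (floating-point) result

mse = {}
for frac in range(0, 11):
    img_q = to_fixed(image, frac)
    w_q = to_fixed(weights, frac)
    acc = conv3x3_same(img_q, w_q)         # integer multiply-accumulate
    result = acc / float(1 << (2 * frac))  # each product carries 2*frac fractional bits
    mse[frac] = np.mean((result - reference) ** 2)
```

Plotting `mse` against the number of fractional bits gives the kind of curve used above to choose the bit widths.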
What I Learned:
• Representation of decimal numbers in fixed-point form.
In fixed-point representation, the fraction is often expressed in the same number base as
the integer part, but using negative powers of the base b. The most common variants are
decimal (base 10) and binary (base 2). The latter is commonly known also as binary
scaling.
A fixed-point representation of a fractional number is essentially an integer that is to be
implicitly multiplied by a fixed scaling factor. For example, to represent the number 4.85
in fixed-point binary with 3 bits for the fractional part, you would first multiply 4.85 by 2^3
= 8 to get 38.8. Then you would round this number to the nearest integer to get 39. Finally,
you would represent this integer in binary as 100111. The binary point is implicitly located
three bits from the right, so this number represents 100.111 in binary, which is equivalent
to 4 + 0 + 0 + 0.5 + 0.25 + 0.125 = 4.875 in decimal.
• Importance of 2’s complement and why 2’s complement works with normal arithmetic:
The key to understanding two's complement is to note that we have a set of finitely many
(in particular, 2^8) values in which there is a sensible notion of addition by 1 that allows us
to cycle through all of the numbers. In particular, we have a system of modular arithmetic,
in this case modulo 2^8 = 256.
In the context of arithmetic with signed integers, we don't think of 11111101 as
being 253 in our 8-bit system; we instead consider it to represent the number −3. Rather
than having our numbers go from 0 to 255 around a clock, we have them go
from −128 to 127, where −x occupies the same spot that n − x would occupy (with n = 256)
for values of x from 1 to 128.
Succinctly, this amounts to saying that a number with 8 binary digits is deemed negative
if and only if its leading digit (its "most significant" digit) is a 1. For this reason, the leading
digit is referred to as the "sign bit" in this context.
• After completing the tasks related to the above concepts, I explored how to analyse each
step of a convolution dot product, that is, how to optimize the steps by requantizing the
number of bits of the product and the sum.
• I learned to use matplotlib to visualize the MSE (mean squared error) against the number of
fractional bits in each step, and to set the number of bits and fractional bits by defining a
`requantize` method.
• I explored the convolution dot product in Verilog. Although Verilog is a hardware description
language (HDL), I gained an understanding of how to carry out the operations by treating the
numbers in the binary domain.
• I also learned how to generate 10000 test cases in Python, store them in a memory
(.mem) file and import them into the Verilog testbench code to test the design.
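The fixed-point example (4.85 with 3 fractional bits) and the two's complement example (−3 in 8 bits) from the list above can be checked with a few lines of Python. The helper names are illustrative.

```python
def to_fixed(value, frac_bits):
    """Scale by 2**frac_bits and round to the nearest integer."""
    return round(value * 2 ** frac_bits)

def from_fixed(raw, frac_bits):
    return raw / 2 ** frac_bits

def twos_complement_bits(value, bits=8):
    """Bit pattern of a signed integer in two's complement (arithmetic modulo 2**bits)."""
    return format(value % (1 << bits), f"0{bits}b")

def from_twos_complement(pattern):
    """Signed value of a bit pattern: a leading 1 means the value is negative."""
    raw = int(pattern, 2)
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw

raw = to_fixed(4.85, 3)                        # 4.85 * 8 = 38.8, rounded to 39
print(format(raw, "b"))                        # 100111
print(from_fixed(raw, 3))                      # 4.875
print(twos_complement_bits(-3))                # 11111101
print(from_twos_complement("11111101"))        # -3

# Ordinary unsigned addition modulo 256 gives the right signed answer: -3 + 5 = 2.
print((int("11111101", 2) + 5) % 256)          # 2
```

The last line is the reason two's complement works with normal arithmetic: the adder does not need to know whether its operands are signed.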
This task has taught me many concepts, some of which were new and others that needed
revisiting. I learned how to analyze my work to determine if it was correct or if there were any
errors. If there were errors, I learned how to identify, address, and fix them. The most important
lesson I learned from repeatedly analyzing the model, fixing bugs, reducing errors, and trying
different approaches to further minimize errors is to keep persevering and striving for human-level
accuracy.
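The test-case generation step mentioned in the list above could look roughly like the following sketch. The file layout is an assumption: each line holds one test case as 18 hexadecimal words (nine values for each 3x3 array), written in two's complement with an 8-bit word width. The function names and defaults are hypothetical.

```python
import random

def to_hex_word(value, total_bits):
    """Two's complement hex word of a signed integer, zero-padded to the word width."""
    digits = (total_bits + 3) // 4
    return format(value % (1 << total_bits), f"0{digits}x")

def make_test_vectors(n_cases, total_bits=8, seed=0):
    """Build n_cases lines, each with 9 + 9 random signed words for two 3x3 arrays."""
    rng = random.Random(seed)
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    lines = []
    for _ in range(n_cases):
        a = [rng.randint(lo, hi) for _ in range(9)]
        b = [rng.randint(lo, hi) for _ in range(9)]
        lines.append(" ".join(to_hex_word(v, total_bits) for v in a + b))
    return lines

def write_mem_file(path, lines):
    """Write the test vectors to a text file, one test case per line."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```

A Verilog testbench would typically load such a file with the $readmemh system task, which reads whitespace-separated hex words into consecutive memory locations; the memory depth declared there would need to match the number of words written here.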
Conclusion
In this internship I gained valuable knowledge and skills in Machine Learning, Deep
Learning and Convolutional Neural Networks. I applied these concepts to a practical task
that involved Convolution Dot Product and learned how to perform it in both decimal and
fixed-point binary representation. I also learned how to analyze and minimize the error
between the two representations and how to optimize the code for better performance.
Finally, I learned how to implement the whole operation in hardware using Verilog and
verified its functionality. This internship was a great learning experience for me, and I
thank my mentors and guides for their support and guidance.
Besides the technical aspects, I also learned how to conduct analysis and
approach a problem from various angles. I was fascinated by the discussions on the
research work done by the students in the lab and how they shared their insights and
findings. I admired the culture of the lab, where the students had the freedom to explore
their interests and interact with each other in a professional and friendly way. This
experience motivated me to pursue my higher studies at IIT Kharagpur, as I aspire to be
part of such a stimulating and supportive environment.
Minor Project
Neural Image Compression and Explanation
ABSTRACT
Explaining the predictions of deep neural networks (DNNs) and semantic
image compression are two active research areas of deep learning with
numerous applications in decision-critical systems, such as surveillance
cameras, drones and self-driving cars, where interpretable decisions are critical
and storage/network bandwidth is limited. In this article, we propose a novel
end-to-end Neural Image Compression and Explanation (NICE) framework that
learns to (1) explain the predictions of convolutional neural networks (CNNs),
and (2) subsequently compress the input images for efficient storage or
transmission.
Specifically, NICE generates a sparse mask over an input image by
attaching a stochastic binary gate to each pixel of the image, whose parameters
are learned through the interaction with the CNN classifier to be explained. The
generated mask is able to capture the saliency of each pixel, measured by its
influence on the final prediction of the CNN; it can also be used to produce a
mixed-resolution image, where important pixels maintain their original high
resolution and insignificant background pixels are subsampled to a low resolution.
The produced images achieve a high compression rate (e.g., about 0.6×
the original image file size), while retaining a similar classification accuracy.
Extensive experiments across multiple image classification benchmarks
demonstrate the superior performance of NICE compared to the state-of-the-
art methods in terms of explanation quality and semantic image compression
rate.
RELATED WORKS
Neural Explanation
Neural explanation methods are techniques that help us understand
how deep neural networks (DNNs) make predictions. They can be divided into
two types: global and local. Global methods try to find out which input
variables are most important for the overall performance of a trained model.
This can help us discover general rules or knowledge from the model. Local
methods try to give understandable explanations for each individual
prediction. This can help us see what features or regions of the input data
influence the prediction the most.
There are different ways to implement local methods. Some methods
change or remove parts of the input data and see how the prediction changes.
Some methods calculate the gradient of the output with respect to the input
sample using backpropagation. This can show which features have high
sensitivity to the prediction. Some methods use a simpler model, such as a
linear model, to approximate the decision boundary of a DNN near a specific
prediction. This can give a local linear explanation for the prediction.
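The first two families of local methods can be illustrated with a small sketch. Here toy_score is a hypothetical stand-in for a classifier's class score (a weighted sum of pixels), chosen only so that the example is self-contained; a real DNN would replace it, and its gradient would come from backpropagation rather than from the closed form used below.

```python
import numpy as np

def toy_score(image, weights):
    """Hypothetical class score: a weighted sum of pixel values."""
    return float((image * weights).sum())

def occlusion_map(image, weights, patch=2):
    """Perturbation-based explanation: zero each patch and record the score drop."""
    base = toy_score(image, weights)
    h, w = image.shape
    heat = np.zeros((h, w), dtype=np.float64)
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = 0.0
            heat[r:r + patch, c:c + patch] = base - toy_score(occluded, weights)
    return heat

def gradient_map(weights):
    """Gradient-based explanation: for this linear score, the gradient of the
    score with respect to each pixel is simply that pixel's weight."""
    return np.abs(weights)
```

In both maps, larger values mark pixels whose change affects the score more.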
NICE is a local method that aims to produce simple and clear local
explanations, similar to some other methods such as Saliency Map, RTIS and
VIBI. However, NICE explicitly enforces sparsity and smoothness on the
explanations by using an L0-norm regularization and a smoothness constraint,
which are optimized by stochastic binary optimization.
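The text here does not spell out the gate parameterisation or the exact penalty terms, so the following is only a sketch under stated assumptions: each pixel gate is a Bernoulli variable whose probability is the sigmoid of a learnable logit, the L0 term is taken as the expected number of open gates, and smoothness is measured as the total absolute difference between neighbouring gate probabilities. All function names and the weights lam_sparse and lam_smooth are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping logits to gate probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def sample_mask(logits, rng):
    """Draw one binary gate per pixel: gate = 1 with probability sigmoid(logit)."""
    probs = sigmoid(logits)
    return (rng.random(logits.shape) < probs).astype(np.float64)

def expected_l0(logits):
    """Expected number of open gates, used here as the expected L0 norm of the mask."""
    return float(sigmoid(logits).sum())

def smoothness_penalty(logits):
    """Total absolute difference between vertically and horizontally adjacent
    gate probabilities (an assumed form of the smoothness constraint)."""
    p = sigmoid(logits)
    return float(np.abs(np.diff(p, axis=0)).sum() + np.abs(np.diff(p, axis=1)).sum())

def regulariser(logits, lam_sparse=0.01, lam_smooth=0.01):
    """Weighted sum of the sparsity and smoothness terms (weights are assumptions)."""
    return lam_sparse * expected_l0(logits) + lam_smooth * smoothness_penalty(logits)
```

In training, a term like regulariser would be added to the classification loss, so that the gates stay open only where they help the prediction.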
The sparse mask generator of NICE is also related to semantic
segmentation, which is a task of dividing an image into meaningful regions.
However, unlike most semantic segmentation methods that use different
kinds of supervision, such as pixel-level labels, image-level labels, bounding
boxes, etc., NICE trains the sparse mask generator to maximize the
classification accuracy of the mixed-resolution images without using any pixel-
level annotations. This makes NICE a weakly supervised binary segmentation
algorithm that detects salient regions of an image. Since NICE’s main goal is to
provide better or comparable neural explanations, we mainly compare it with
other deep explanation methods in our experiments, rather than segmentation
methods.
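A neural network h(x; θ) with parameters θ is typically fitted to training data
D = {(x_i, y_i)}, i = 1, …, N, by minimizing the empirical risk
R(θ) = (1/N) Σ_{i=1}^{N} L(h(x_i; θ), y_i),
where L(·) denotes the loss over training data D, such as the cross-entropy loss
for classification or the mean squared error (MSE) for regression.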
The goal of this article is to develop an approach that can explain the
prediction of a neural network h(x; θ) in response to an input image x;
meanwhile, to reduce the storage or network transmission cost of the image, we would
like to compress the image x based on the derived explanation such that
the compressed image x̃ has the minimal file size while retaining a
classification accuracy similar to that of the original image x.
To meet these interdependent goals, we develop a Neural Image
Compression and Explanation (NICE) framework that integrates explanation
and compression into an end-to-end trainable pipeline as illustrated in Fig. 1. In
this framework, given an input image, a mask generator under the L0-norm and
smoothness constraints generates a sparse mask that indicates salient regions
of the image. The generated mask is then used to transform the original input
image to a mixed-resolution image that has a high resolution in the salient
regions and a low resolution in the background.
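The mixed-resolution transformation described above can be sketched as follows, assuming a single-channel image, a binary mask of the same shape, and block averaging as the subsampling method. The block size and function names are hypothetical, and the subsampling used in NICE itself may differ.

```python
import numpy as np

def low_resolution(image, block=4):
    """Average each block x block tile, then repeat it back to the original size."""
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image height and width must be multiples of block")
    tiles = image.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return np.repeat(np.repeat(tiles, block, axis=0), block, axis=1)

def mixed_resolution(image, mask, block=4):
    """Keep original pixels where mask == 1 and low-resolution pixels elsewhere."""
    return mask * image + (1.0 - mask) * low_resolution(image, block)
```

With this form, salient pixels pass through unchanged, while each background tile collapses to a single value, which is what makes the result cheaper to store.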
To evaluate the quality of the sparse mask generator and the compressed
image, a discriminator network (e.g., a CNN) at the end of the pipeline classifies
the generated image for prediction. Finally, the prediction, sparse mask and
compressed image can be stored or transmitted efficiently for decision making,
interpretation and system diagnosis. The whole pipeline is fully differentiable
and can be trained end-to-end by backpropagation.
Overall architecture of NICE
CONCLUSION
In this ongoing project, we will present a novel framework, NICE, that can
simultaneously explain and compress images for deep neural network
classifiers. NICE leverages a stochastic binary gate mechanism to generate
sparse masks that highlight the salient regions of the input images. The masks
can also be used to produce mixed-resolution images that preserve the
semantic information while reducing the file size. We aim to show that NICE can
achieve high-quality explanations and high compression rates on various
image classification benchmarks, outperforming existing methods. Our
work will open up new possibilities for interpretable and efficient deep
learning applications in resource-constrained scenarios. As future work, we
plan to extend our framework to other modalities, such as natural language and
speech, and explore more ways to improve the explanation and compression
performance.