CNN Theory
A Convolutional Neural Network (CNN) is a specialized type of neural network designed to process data with a known grid-like topology, such as images (which can be seen as 2D grids of pixels).
The key components of a CNN include:
1. Convolutional Layers
2. Pooling Layers
3. Activation Functions
4. Fully Connected Layers
Each of these components relies on specific mathematical operations that allow the
network to learn and extract features from input data.
For a 2D input image I and a 2D kernel K, the convolution operation can be defined as:
S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(i + m, j + n) · K(m, n)
Kernel: A small matrix (e.g., 3×3) that slides over the image. Each position on the
kernel has a weight K(m,n).
Sliding Window: The kernel is applied to each overlapping region of the image. For
each position, we multiply the corresponding pixel values of the image and the kernel
weights and then sum these products to get a single number. This process creates a
new matrix, called the feature map, which highlights certain features of the image.
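As a concrete illustration, here is a minimal NumPy sketch of this sliding-window operation. The function name convolve2d, the valid padding, the stride of 1, and the example image and kernel are assumptions made only for illustration; like most deep-learning libraries, it slides the kernel without flipping it (i.e., it computes the cross-correlation that CNN frameworks conventionally call convolution).

import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over the image (valid padding, stride 1) and
    # sum the element-wise products to build the feature map.
    kh, kw = kernel.shape
    ih, iw = image.shape
    out_h, out_w = ih - kh + 1, iw - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(region * kernel)
    return feature_map

# Example: a 3x3 vertical-edge kernel applied to a 5x5 image
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])
print(convolve2d(image, kernel))  # 3x3 feature map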
Pooling Layers
Pooling is a down-sampling operation used to reduce the dimensions of the feature
maps while retaining the most important information. Two widely used pooling
techniques are listed below; a short sketch of both follows the list.
i) Max pooling is the most common pooling operation, where the maximum value
within a window is selected.
ii) Average pooling is the pooling operation where the average value within a
window is computed.
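A minimal NumPy sketch of both operations, assuming non-overlapping square windows with a stride equal to the window size; the function name pool2d and the example feature map are made up for illustration.

import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    # Down-sample with non-overlapping (size x size) windows,
    # keeping either the maximum or the average of each window.
    h, w = feature_map.shape
    out_h, out_w = h // size, w // size
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * size:(i + 1) * size,
                                 j * size:(j + 1) * size]
            pooled[i, j] = window.max() if mode == "max" else window.mean()
    return pooled

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 1., 2.],
               [7., 2., 9., 0.],
               [4., 8., 3., 1.]])
print(pool2d(fm, mode="max"))      # [[6. 4.] [8. 9.]]
print(pool2d(fm, mode="average"))  # [[3.75 2.25] [5.25 3.25]]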
Activation Functions: Activation functions introduce non-linearity into the network,
allowing it to learn more complex representations.
ReLU (Rectified Linear Unit): ReLU is a piecewise linear function defined as:
ReLU(x) = max(0, x)
ReLU simply replaces all negative values in the input with zero. This operation can be
thought of as "activating" only those neurons that contribute to the network's
decision-making.
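A one-line NumPy sketch of this behavior (the example input values are arbitrary):

import numpy as np

def relu(x):
    # Element-wise ReLU: negative values become zero, positives pass through.
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # -> [0. 0. 0. 1.5 3.]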
Fully Connected Layers: Fully connected layers are where each neuron is connected to
every neuron in the previous layer. Mathematically, this is a linear transformation
followed by an activation function.
Matrix Multiplication in Fully Connected Layers: Given an input vector x and a weight
matrix W, the output z is calculated as: z = Wx + b
Where: W is the weight matrix, b is the bias vector, and z is the output vector.
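A minimal NumPy sketch of this forward pass; the sizes (4 inputs, 3 output neurons), the random initialization, and the choice of ReLU as the activation are assumptions made only for illustration.

import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)        # input vector (4 features)
W = rng.normal(size=(3, 4))   # weight matrix (3 neurons x 4 inputs)
b = np.zeros(3)               # bias vector

z = W @ x + b                 # linear transformation z = Wx + b
a = np.maximum(0, z)          # followed by an activation function (ReLU here)
print(z, a)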
Gradient Descent: Once the gradients are calculated, gradient descent is used to
update the weights to minimize the loss function:
θ := θ − α · ∇_θ J(θ)
Where: θ is the vector of parameters (weights and biases), α is the learning rate, and
∇_θ J(θ) is the gradient of the loss function J with respect to θ.
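A minimal sketch of this update rule on a toy problem; the quadratic loss J(θ) = mean((θ − target)²), the target values, the learning rate α = 0.1, and the number of steps are assumptions chosen only to show the mechanics.

import numpy as np

target = np.array([1.0, -2.0])   # parameter values that minimize the toy loss
theta = np.zeros(2)              # parameters to learn
alpha = 0.1                      # learning rate

for step in range(100):
    grad = 2 * (theta - target)    # gradient of J with respect to theta
    theta = theta - alpha * grad   # update rule: theta := theta - alpha * grad

print(theta)  # converges toward [1.0, -2.0]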