Convolutional Neural Networks
Ms. G. Keerthika
Abstract:
1. Introduction
Key Words:
2. Architecture of CNNs
Input Layer
The input layer of a CNN receives the image data, which is typically represented as a 3D
tensor (height × width × channels). For instance, a color image has 3 channels
corresponding to the Red, Green, and Blue (RGB) color values. The input image size
depends on the dataset: images in the CIFAR-10 dataset are 32×32×3, while ImageNet models
typically take 224×224×3 inputs.
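As a minimal sketch (using NumPy; the variable names are illustrative), the example below shows how a CIFAR-10-sized RGB image maps onto such a 3D array:

```python
import numpy as np

# A CIFAR-10-style RGB image: height x width x channels.
image = np.random.randint(0, 256, size=(32, 32, 3)).astype(np.float32)

# Pixel intensities are commonly scaled to [0, 1] before training.
image = image / 255.0

print(image.shape)  # (32, 32, 3)
```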
Convolutional Layer
The convolutional layer is the core building block of a CNN. It applies a set of filters
(kernels) to the input image or the feature maps produced by previous layers. These filters slide
across the image (with a defined stride) and perform the convolution operation, which
essentially captures local patterns such as edges, corners, textures, or other features.
Mathematical Operation
A filter is convolved over the image using element-wise multiplication and summation. For
example, a 3×3 filter is applied to 3×3 sections of the input, moving across
the image. Each filter detects a different feature. The output of the convolution operation is a set
of feature maps, each representing a specific learned feature from the image.
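To make the operation concrete, here is a minimal NumPy sketch of a single-channel convolution with stride 1 and no padding (the function name conv2d and the edge filter are illustrative; like most deep learning frameworks, it computes cross-correlation, i.e., the kernel is not flipped):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (stride 1, no padding) on a single channel."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiplication and summation over a kh x kw patch.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A Sobel-like filter that responds to vertical edges, applied to a 5x5 image.
img = np.random.rand(5, 5)
edge_filter = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)
feature_map = conv2d(img, edge_filter)
print(feature_map.shape)  # (3, 3)
```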
Key Concepts
Filters/Kernels: Small-sized weight matrices (e.g., 3×3 or 5×5) that learn spatial features.
Stride: The number of pixels the filter moves across the image at each step.
Padding: Adding extra pixels around the input image to preserve the spatial dimensions after
applying convolution.
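The spatial size of the resulting feature map follows directly from these three quantities via the standard formula floor((n + 2p − k)/s) + 1. A small sketch, assuming square inputs and filters:

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

# A 3x3 filter with stride 1 and padding 1 preserves a 32x32 input ("same" padding).
print(conv_output_size(32, 3, stride=1, padding=1))  # 32
# Stride 2 roughly halves the spatial size.
print(conv_output_size(32, 3, stride=2, padding=1))  # 16
```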
After convolution, the feature map is passed through an activation function, typically
ReLU (Rectified Linear Unit), which introduces non-linearity into the network. ReLU sets all
negative values in the feature map to zero while leaving positive values unchanged, which helps
the network capture complex patterns. ReLU(x) = max(0, x), where x is the output from the
convolution operation. ReLU accelerates convergence and helps mitigate the vanishing gradient
problem, allowing the network to learn more efficiently.
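A minimal sketch of ReLU applied element-wise to a feature map with NumPy:

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x), applied element-wise."""
    return np.maximum(0, x)

feature_map = np.array([[-1.5, 2.0],
                        [ 0.3, -0.7]])
print(relu(feature_map))
# [[0.  2. ]
#  [0.3 0. ]]
```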
Pooling Layer
The pooling layer reduces the spatial dimensions of the feature maps, lowering computational
cost and making the representation more robust to small shifts in the input.
Types of Pooling:
Max Pooling: Selects the maximum value from each region (e.g., 2×2 or 3×3) of the
feature map. This helps preserve the most dominant features.
Average Pooling: Takes the average value of each region, offering a smoother
representation of the features.
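A minimal NumPy sketch of max pooling over non-overlapping 2×2 regions (the helper name max_pool2d is illustrative; replacing .max() with .mean() gives average pooling):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over size x size regions of a single feature map."""
    h, w = x.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Keep only the dominant activation in each region.
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 1],
              [3, 4, 6, 8]], dtype=float)
print(max_pool2d(x))
# [[6. 4.]
#  [7. 9.]]
```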
Fully Connected Layer
After several convolutional and pooling layers, the CNN typically ends with one or more
fully connected layers. These layers flatten the feature maps into a 1D vector and use standard
fully connected neural network architectures for final decision-making. Each node in a fully
connected layer is connected to every node in the previous layer, helping to combine the learned
features into higher-level concepts. The fully connected layer is responsible for classifying the
input image into the desired categories (e.g., "dog", "cat", "car").
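A minimal sketch of the flatten-then-classify step, assuming illustrative shapes (8 feature maps of 4×4, 10 output classes):

```python
import numpy as np

# Flatten a stack of feature maps into a 1D vector,
# then apply one fully connected layer: logits = W x + b.
feature_maps = np.random.rand(8, 4, 4)
x = feature_maps.reshape(-1)                     # shape (128,)

num_classes = 10
W = np.random.randn(num_classes, x.size) * 0.01  # weights (learned in practice)
b = np.zeros(num_classes)                        # biases

logits = W @ x + b                               # one raw score per class
print(logits.shape)                              # (10,)
```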
Output Layer
The final output layer typically uses the softmax activation function for classification
tasks. Softmax converts the raw output values (logits) into probabilities, ensuring that the output
is a distribution that sums to 1.
\[
\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}
\]
where \(z_i\) is the logit for class \(i\) and the denominator is the sum of exponentials of all
logits.
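A minimal NumPy sketch of softmax; subtracting the maximum logit before exponentiation is a common numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """Convert raw logits into a probability distribution that sums to 1."""
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # approx. [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```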
Dropout Layer
A dropout layer randomly deactivates a fraction of neurons during training, which reduces
overfitting by preventing the network from relying too heavily on any single feature.
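A minimal sketch of the widely used "inverted dropout" formulation, in which surviving activations are rescaled during training so no adjustment is needed at test time (the rate value is illustrative):

```python
import numpy as np

def dropout(x, rate=0.5, training=True):
    """Zero a random fraction `rate` of activations during training,
    rescaling the survivors so the expected activation is unchanged."""
    if not training:
        return x  # dropout is disabled at inference time
    mask = (np.random.rand(*x.shape) >= rate).astype(x.dtype)
    return x * mask / (1.0 - rate)

activations = np.ones((2, 4))
print(dropout(activations, rate=0.5))  # roughly half zeroed, rest scaled to 2.0
```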
3. Applications of CNNs
Image Classification: Models like AlexNet, VGG, and ResNet achieve high accuracy in
identifying objects within an image.
Object Detection: CNN-based frameworks like Faster R-CNN, YOLO (You Only Look Once),
and SSD (Single Shot MultiBox Detector) are used in real-time applications such as autonomous
vehicles.
Image Segmentation: Fully Convolutional Networks (FCNs) and U-Net for pixel-wise
predictions.
Face Recognition: FaceNet and DeepFace for identifying and verifying human faces.
4. Challenges and Solutions
Limited Training Data
CNNs typically require large amounts of labeled data, and training on small datasets
often leads to overfitting.
Solution: Data Augmentation: Apply transformations like rotation, flipping, scaling, and
cropping to artificially expand the training dataset.
Computational Complexity
Training and deploying deep CNNs demands substantial computation and memory,
especially for high-resolution inputs and very deep architectures.
Solution: Efficient Computation: Use GPU/TPU acceleration, and reduce model size through
techniques such as pruning, quantization, or lighter architectures.
Class Imbalance
In image recognition, some classes may have far more samples than others,
causing the CNN to be biased toward the more common classes.
Solution: Class Weights: Modify the loss function to assign higher weights to less frequent
classes, ensuring the model pays more attention to underrepresented classes.
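As one common heuristic (the class counts below are hypothetical), weights can be set inversely proportional to class frequency and then supplied to the loss function:

```python
import numpy as np

# Hypothetical class counts from an imbalanced dataset.
counts = np.array([900, 90, 10], dtype=float)

# Inverse-frequency weighting, normalized so the weights average to 1:
# rare classes receive proportionally larger weights in the loss.
weights = counts.sum() / (len(counts) * counts)
print(weights)  # approx. [0.37  3.70  33.33]
```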
Interpretability
CNNs are often considered "black boxes" because it is difficult to understand how
decisions are made, especially in complex models.
Solution: Visualization Tools: Techniques like Grad-CAM or saliency maps can help visualize
which parts of the image are contributing to the decision-making process of the model.
Adversarial Attacks
CNNs are vulnerable to adversarial attacks, where small, imperceptible changes to the
input image can cause the model to misclassify it.
Solution: Adversarial Training: Train the model on adversarial examples, allowing it to learn
how to defend against such attacks.
5. Advantages of CNNs
Automatic Feature Extraction: CNNs automatically learn and extract features from images,
such as edges, textures, and patterns, without needing manual feature engineering. This enables
them to perform well on complex image tasks where identifying relevant features by hand would
be difficult and time-consuming.
How It Works: Through layers like convolutions and pooling, CNNs progressively
detect low-level features in earlier layers (e.g., edges) and higher-level features in deeper layers
(e.g., object parts).
Translation Invariance: CNNs are highly effective at recognizing objects regardless of their
position in the image. This property, called translation invariance, means that the model can
correctly classify an object even if it appears in different locations within the image.
How It Works: Through pooling layers (such as max pooling), CNNs reduce the spatial
dimensions, making them less sensitive to small translations or shifts in the position of objects.
Parameter Sharing: CNNs utilize the concept of parameter sharing, where the same
convolutional filters are applied across different parts of the image. This drastically reduces the
number of parameters compared to fully connected layers and makes the model more efficient in
terms of memory and computation.
How It Works: A convolutional kernel (filter) is slid over the input image, which means
that the same set of weights is reused, reducing the total number of parameters to be learned.
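A quick worked comparison makes the savings concrete; the layer sizes below (16 filters of 3×3 on a 32×32×3 input) are illustrative:

```python
# Parameter counts for one layer on a 32x32x3 input, illustrating why
# shared convolutional filters are far cheaper than a dense layer.

# Convolutional layer: 16 filters of size 3x3x3, plus one bias per filter.
conv_params = 16 * (3 * 3 * 3) + 16      # 448

# Fully connected layer mapping the flattened input (3072 values) to 16 units.
dense_params = (32 * 32 * 3) * 16 + 16   # 49168

print(conv_params, dense_params)
```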
Local Connectivity: CNNs focus on local connections in the image by using small filters that
only consider local patches (receptive fields) of the image at a time. This mimics how the human
visual system works and allows CNNs to capture local patterns effectively.
How It Works: Each neuron in a convolutional layer is only connected to a local region of
the input, enabling the network to learn local spatial hierarchies and patterns.
Reduced Need for Feature Engineering: Unlike traditional machine learning algorithms that
require significant manual feature extraction (such as histogram of oriented gradients or scale-
invariant feature transform), CNNs can learn relevant features directly from the raw image data.
6. Case Studies
7. Future Directions
8. Conclusion
9. References