Paper Analysis

Paper 4

This image illustrates the process of license plate detection and recognition from a vehicle
image. Let's break it down step by step:

1. Input Image: The process starts with an input image of a blue BMW car with a visible
license plate "KAH-9329".
2. Pre-processing:
o Image Downsampling: The original image is reduced in size to improve
processing speed.
o Gray Image Conversion: The downsampled image is converted to grayscale to
simplify further processing.
3. Candidate Extraction:
o Binary Segmentation: The grayscale image is converted to a binary (black and
white) image, emphasizing the license plate area.
o Kernel Density Function: This step further refines the binary image to isolate
potential license plate regions.
4. Output of Detected Image: The license plate area is isolated and extracted from the
original image.
5. Segmentation: The extracted license plate is further processed to isolate individual
characters.

The final result shows the successfully extracted and segmented license plate "KAH-9329".

This process demonstrates a typical workflow in automatic license plate recognition systems,
involving image preprocessing, region of interest extraction, and character segmentation. These
steps prepare the image for the final stage of character recognition, which would typically
involve machine learning techniques to identify each character and reconstruct the full license
plate number.
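To make the pipeline concrete, here is a minimal OpenCV sketch of the downsampling, grayscale-conversion, and binary-segmentation stages (the kernel density step is omitted); the input path and the 0.5 scale factor are assumptions, and Otsu thresholding stands in for the paper's binary-segmentation step:

```python
import cv2

# Pre-processing and candidate extraction, as sketched above.
img = cv2.imread("car.jpg")                      # placeholder input path
small = cv2.resize(img, None, fx=0.5, fy=0.5)    # image downsampling (assumed factor)
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)   # gray image conversion
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binary segmentation
```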

 License Plate Extraction: Used a pre-trained Haar Cascade Classifier.

Haar Cascade is an object detection algorithm that can be used to detect objects and faces in images and videos. It is built on the idea of Haar-like features and serves as the foundation for object detection based on them. A Haar cascade is trained by feeding the cascade classifier a large number of positive and negative images.

Positive images – the images that the classifier is supposed to recognize, i.e., those containing the target object. Our dataset contained 1,100 positive images.

Negative images – random images that do not contain the object we want the classifier to detect. Our dataset contained 3,000 negative images.
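A minimal sketch of how a pre-trained cascade is applied with OpenCV; the paper uses its own trained cascade, so OpenCV's bundled Russian-plate cascade and the input path below are stand-ins:

```python
import cv2

# Detect license plates with a pre-trained Haar cascade (OpenCV 4.x).
plate_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_russian_plate_number.xml")
gray = cv2.cvtColor(cv2.imread("car.jpg"), cv2.COLOR_BGR2GRAY)
plates = plate_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in plates:
    plate_roi = gray[y:y + h, x:x + w]  # cropped candidate license plate
```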

 Image Pre-processing: Used techniques like grayscale conversion, thresholding, and morphological operations.

Image Pre-processing Methods

1. Grayscale Conversion:
o Purpose: Simplifies the image by converting it from color to shades of gray.
o How it Works: Each pixel's color value is transformed into a single intensity
value, representing the brightness of the pixel. This reduces the computational
complexity.
2. Thresholding:
o Purpose: Converts a grayscale image into a binary image (black and white).
o How it Works: Sets a threshold value. Pixels with intensity values above the
threshold are turned white, and those below are turned black. This helps in
distinguishing the license plate characters from the background.
3. Morphological Operations:
o Purpose: Enhances the structure of objects within the image.
o Common Operations:
 Dilation: Expands the boundaries of white regions, useful for connecting
disjointed parts of the license plate characters.
 Erosion: Shrinks the boundaries of white regions, helping to remove
small noise points.
 Opening: Erosion followed by dilation, used to remove small objects
from the foreground.
 Closing: Dilation followed by erosion, useful for closing small holes in
the foreground objects.

These pre-processing techniques prepare the image for the subsequent steps of license plate
localization and character recognition by enhancing the features and reducing noise.
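For illustration, a short OpenCV sketch of the four morphological operations applied to a thresholded image; the input path, the threshold of 127, and the 3x3 kernel are assumptions:

```python
import cv2
import numpy as np

gray = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

kernel = np.ones((3, 3), np.uint8)                    # assumed structuring element
dilated = cv2.dilate(binary, kernel)                        # expand white regions
eroded = cv2.erode(binary, kernel)                          # shrink white regions
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion then dilation
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation then erosion
```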

 Character Segmentation: Compared character bounding box dimensions to license plate dimensions.

Based on the provided code snippet and explanation, here's a brief summary of the character segmentation process:

1. Filtering: The code checks if the width and height of each potential character are within
specified lower and upper bounds. This helps to filter out noise and non-character
elements.
2. Contour Extraction: The x-coordinates of valid character contours are stored in
x_cntr_list for later use in indexing.
3. Character Extraction: Each character is extracted from the original image using the
bounding rectangle coordinates.
4. Resizing: The extracted character is resized to a standard size (20x40 pixels).
5. Visualization: A rectangle is drawn around each character for visualization purposes.
6. Color Inversion: The character image is inverted (255 - char) to prepare it for
classification.
7. Standardization: The character is placed within a larger 44x24 pixel image with a black
border, creating a consistent format for all characters.
8. Storage: The processed character image is appended to img_res, which stores all
segmented characters.

This process aims to isolate individual characters from the license plate image, standardize their
size and format, and prepare them for subsequent character recognition. The method uses
geometric properties (width and height ratios) to identify and extract valid characters while
filtering out non-character elements.
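A hedged reconstruction of this segmentation loop in OpenCV (the visualization step is omitted); the width and height bounds are illustrative placeholders, since the paper's exact values are not given here:

```python
import cv2
import numpy as np

plate = cv2.imread("plate.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder path
_, plate = cv2.threshold(plate, 127, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(plate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

x_cntr_list, img_res = [], []
lw, uw = plate.shape[1] // 20, plate.shape[1] // 4   # assumed width bounds
lh, uh = plate.shape[0] // 3, plate.shape[0]         # assumed height bounds
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    if lw < w < uw and lh < h < uh:                  # 1. filter by size
        x_cntr_list.append(x)                        # 2. keep x for ordering
        char = plate[y:y + h, x:x + w]               # 3. extract character
        char = cv2.resize(char, (20, 40))            # 4. resize to 20x40
        char = 255 - char                            # 6. invert colors
        padded = np.zeros((44, 24), dtype=np.uint8)  # 7. 44x24 image, black border
        padded[2:42, 2:22] = char
        img_res.append(padded)                       # 8. store the result
```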

 Character Recognition: Implemented and compared four deep learning models (CNN, MobileNet, Inception V3, ResNet50).

1. CNN Model Structure: The model consists of 10 layers in total:

 1 Input layer
 4 Convolutional (Conv2D) layers
 1 Max Pooling layer
 1 Dropout layer
 1 Flatten layer
 2 Dense layers

2. Layer Details:

 Input: (28, 28, 3) - likely for 28x28 pixel color images
 Conv2D layers: Increasing filters from 16 to 64
 Max Pooling: Reduces spatial dimensions from 28x28 to 7x7
 Dropout: Helps prevent overfitting
 Flatten: Converts 2D feature maps to 1D vector
 Dense layers: 128 units, then 36 units (likely for 26 letters + 10 digits)

3. Purpose of this Model: This CNN architecture is designed for character recognition in
license plates. It's structured to:

 Extract features from images using convolutional layers
 Reduce spatial dimensions and computational load with max pooling
 Prevent overfitting with dropout
 Classify characters with dense layers

The Flatten layer converts the output of the convolutional layers into a single one-dimensional vector that can be used as input for a dense layer. The last dense layer has the most parameters, since it connects every output 'pixel' from the convolutional layers to the output classes.
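A minimal Keras sketch matching the layer counts above; the kernel sizes, activations, dropout rate, and pool size are assumptions, as the text does not specify them:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(28, 28, 3)),
    layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D(pool_size=(4, 4)),   # 28x28 -> 7x7
    layers.Dropout(0.4),                     # assumed rate
    layers.Flatten(),                        # 2D feature maps -> 1D vector
    layers.Dense(128, activation="relu"),
    layers.Dense(36, activation="softmax"),  # 26 letters + 10 digits
])
```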

MobileNet

This image illustrates two key building blocks of the MobileNet architecture: the stride=1 block
and the stride=2 block. Let me explain each:

1. Stride=1 block (left side):
o Input goes through a 1x1 convolution with ReLU activation
o Then a 3x3 depthwise convolution with ReLU activation
o Followed by another 1x1 convolution (linear, no activation)
o The output is then added to the original input (residual connection)
2. Stride=2 block (right side):
o Input goes through a 1x1 convolution with ReLU activation
o Then a 3x3 depthwise convolution with stride=2 and ReLU activation
o Followed by another 1x1 convolution (linear, no activation)

Key points about MobileNet architecture:

1. Depthwise Separable Convolutions: The architecture uses depthwise convolutions (labeled as "Dwise" in the image) followed by pointwise convolutions (1x1 convolutions). This significantly reduces computational cost compared to standard convolutions.
2. Residual Connections: The stride=1 block includes a residual connection (skip
connection) that adds the input to the processed output. This helps in training deeper
networks by addressing the vanishing gradient problem.
3. ReLU6 Activation: The architecture uses ReLU6 as the activation function, which is a
variant of ReLU that caps the maximum output at 6.
4. Efficient Design: The combination of depthwise separable convolutions, residual
connections, and strategic use of stride=2 blocks for downsampling allows MobileNet to
be computationally efficient while maintaining good performance, making it suitable for
mobile and embedded vision applications.

These building blocks are repeated and stacked to form the complete MobileNet architecture,
with variations in the number of filters and layers depending on the specific version of
MobileNet being used.
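A sketch of these two blocks in Keras (they match MobileNetV2's inverted residual design); the expansion factor and filter counts are illustrative, not values from the paper:

```python
from tensorflow.keras import layers

def inverted_residual_block(x, filters, stride, expansion=6):
    in_channels = x.shape[-1]
    # 1x1 expansion convolution with ReLU6
    y = layers.Conv2D(expansion * in_channels, 1, padding="same")(x)
    y = layers.ReLU(6.0)(y)
    # 3x3 depthwise convolution (stride=1 keeps size, stride=2 downsamples)
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same")(y)
    y = layers.ReLU(6.0)(y)
    # 1x1 linear projection (no activation)
    y = layers.Conv2D(filters, 1, padding="same")(y)
    # Residual connection only in the stride=1 block with matching channels
    if stride == 1 and in_channels == filters:
        y = layers.Add()([x, y])
    return y

# Usage example with assumed shapes:
inp = layers.Input(shape=(80, 80, 3))
out = inverted_residual_block(inp, filters=24, stride=2)
```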

Key points about MobileNet V2:

 It has 53 layers and is pretrained on ImageNet.
 It's designed for image classification and mobile vision applications.
 The architecture starts with a fully convolutional layer with 32 filters, followed by 19
bottleneck layers.
 When using MobileNet for transfer learning, the last output layer is typically replaced
with a new layer specific to the task (in this case, 36 nodes for character recognition).
 The input shape is adjusted to (80,80,3) for the specific application mentioned.

This approach allows for efficient transfer learning on mobile devices while maintaining good
performance for tasks like image classification.
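A minimal transfer-learning sketch for this setup; the global-average-pooling head is an assumption, since the text only specifies the (80, 80, 3) input and the 36-node output:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import MobileNetV2

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_shape=(80, 80, 3))       # ImageNet-pretrained backbone
x = layers.GlobalAveragePooling2D()(base.output)  # assumed pooling head
out = layers.Dense(36, activation="softmax")(x)   # 36 nodes for character recognition
model = models.Model(base.input, out)
```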

Inception V3

Overview:

 Depth: 48-layer deep CNN model.
 Purpose: Primarily used for image recognition.
 Pre-training: Pretrained on the ImageNet dataset, which contains millions of images
with labeled objects.

Implementation Steps:

1. Modify Output Layer:
o Discard the default output layers.
o Replace with a custom output layer containing 36 nodes (for 36 characters).
o Configure the input layer for images of shape (80, 80, 3).
2. Training Configuration:
o Set all layers as trainable if training = True (weights can be updated during
training).
o Initialize the learning rate and decay value.
o Compile the model with categorical cross-entropy as the loss function and
accuracy as the metric.

ResNet50

Overview:

 Depth: 50-layer deep Residual Neural Network.
 Purpose: Used for image recognition.
 Pre-training: Pretrained on the ImageNet dataset.

Architecture:

 Five Stages: Each stage contains a convolution block and an identity block.
 Blocks: Each block has three layers.

Implementation Steps:

1. Modify Output Layer:
o Discard the default output layers.
o Replace with a custom output layer containing 36 nodes (for 36 characters).
o Configure the input layer for images of shape (80, 80, 3).
2. Training Configuration:
o Set all layers as trainable if training = True (weights can be updated during
training).
o Initialize the learning rate and decay value.
o Compile the model with categorical cross-entropy as the loss function and
accuracy as the metric.

Basic Concepts

Convolutional Neural Network (CNN):

 Purpose: Used for image recognition and processing.
 Layers: Consist of convolutional layers, pooling layers, and fully connected layers.
 Function: Extracts features from images and classifies them.

Transfer Learning:

 Purpose: Utilizes pre-trained models to leverage existing knowledge and apply it to new
tasks.
 Advantage: Reduces training time and computational resources.

Both Inception V3 and ResNet50 use transfer learning to adapt pre-trained models for
recognizing 36 characters, with modifications to the final layers to suit the specific task.
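Since both models share the same implementation steps, one hedged sketch covers them; swap ResNet50 for InceptionV3 to get the other variant. The pooling head and the learning-rate/decay values below are placeholders, not the paper's settings:

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import ResNet50  # or: InceptionV3

training = True
base = ResNet50(weights="imagenet", include_top=False, input_shape=(80, 80, 3))
for layer in base.layers:
    layer.trainable = training                     # all layers trainable if training=True
x = layers.GlobalAveragePooling2D()(base.output)   # assumed pooling head
out = layers.Dense(36, activation="softmax")(x)    # custom 36-node output layer
model = models.Model(base.input, out)

lr = optimizers.schedules.ExponentialDecay(        # learning rate with decay
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=optimizers.SGD(learning_rate=lr),
              loss="categorical_crossentropy", metrics=["accuracy"])
```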
Paper 1

What have they done?

The paper presents a methodology for detecting and recognizing Bangla license plate numbers using a deep learning approach. Specifically, the authors have used a 53-layer convolutional neural network model to perform the tasks of license plate detection and character recognition. They have captured images using 12-megapixel cameras and prepared a dataset comprising 1050 training images and 200 testing images of private vehicles. The images have been manually annotated and augmented to enhance model robustness (Paper 1).

 Proposed Method:

YOLOv3 (You Only Look Once version 3)

 Definition: YOLOv3 is a specific object detection algorithm that uses a single CNN to
predict multiple bounding boxes and class probabilities for objects in images.
 Structure: YOLOv3 is built upon CNN architecture. It uses a feature extraction network
(Darknet-53) which is a 53-layer convolutional network. The network is followed by
several layers that predict bounding boxes and class probabilities.
 Usage: YOLOv3 is designed specifically for real-time object detection. It divides the
input image into a grid and predicts bounding boxes and probabilities for each grid cell.

Relationship

 CNN Foundation: YOLOv3 is based on CNNs. The backbone of YOLOv3, called Darknet-53, is a deep convolutional neural network that extracts features from the input image.
 Specialization: While a general CNN can be used for various tasks, YOLOv3 is
specialized for object detection. It leverages the capabilities of CNNs to simultaneously
perform localization (bounding box prediction) and classification.

*******************

 Image Capture and Pre-processing:
o Images are captured and resized to 416 x 416 pixels.
 License Plate Detection:
o The first CNN model (YOLOv3) is used to detect the license plate.
o YOLOv3 chosen for its real-time detection, accuracy, and ability to filter
background noise.
o Network consists of 53 convolutional layers and uses the leaky rectified linear
unit activation function.
 License Plate Cropping:
o If a license plate is detected, it is cropped from the input image.
 Segmentation and Character Recognition:
o The cropped image is processed by a second CNN model (also based on
YOLOv3) for segmenting and recognizing the text.
o YOLOv3 helps segment words and characters and classify them simultaneously,
reducing processing time.
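A hedged sketch of running a trained YOLOv3 detector with OpenCV's DNN module; the cfg/weights file names and the input path are placeholders for the paper's trained model:

```python
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3-plate.cfg", "yolov3-plate.weights")
img = cv2.imread("vehicle.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)      # resize to 416x416
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())  # per-grid-cell predictions
# Each row holds [cx, cy, w, h, objectness, class scores...]; keep boxes whose
# best class score clears a confidence threshold, then crop the plate region.
```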
