Object Detection
Object detection is a computer vision technique for locating instances of objects in images or
videos. Humans can easily detect and identify the objects present in an image; evaluating how
well a machine does the same requires well-defined metrics.
To establish a fair comparison between different detectors, many metrics have been defined over
the years, the most dominant being mean average precision (mAP). Fully understanding mAP
requires a brief introduction to the metrics it builds on, so those are explained first.
Key Metrics:
The Intersection over Union (IoU) metric is the ratio of the area of overlap to the area of
union between two boxes. In other words, it measures the degree of overlap between the ground
truth (gt) and the prediction (pd): a value of 1 means a perfect overlap between the ground
truth and the prediction, while 0 means no overlap at all.
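The definition above can be written in a few lines of plain Python, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(gt, pd):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(gt[0], pd[0]), max(gt[1], pd[1])
    ix2, iy2 = min(gt[2], pd[2]), min(gt[3], pd[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    area_pd = (pd[2] - pd[0]) * (pd[3] - pd[1])
    union = area_gt + area_pd - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give 1.0 and disjoint boxes give 0.0, matching the bounds described above.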
In a two-stage architecture these steps are separated: the detector first generates object
region proposals, then classifies each one based on the features extracted from the proposed
region. These architectures achieve very high accuracy but are rather slow, which makes them
unfit for real-time applications like self-driving vehicles.
A one-stage detector predicts the bounding boxes over the image without a region proposal step,
achieving greater detection speeds; the ROI generation step is what differs between two-stage
and one-stage architectures.
Object detection is a computer vision task that involves identifying and locating objects within
an image or video. It's a fundamental task in various applications, including self-driving cars,
surveillance, and image analysis.
Object Detection Training Flow Diagram:
Data Collection and Annotation: Gather a dataset of images containing objects of interest and
annotate them with bounding boxes indicating object locations.
Data Preprocessing: Resize, normalise, and augment the data to ensure consistency and improve
model robustness.
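The resize and normalise steps can be sketched in plain Python. Nearest-neighbour resizing and the mean/std constants below are illustrative placeholders; real pipelines use an image library and dataset statistics:

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

def normalise(img, mean=0.5, std=0.25):
    """Scale 8-bit pixels to [0, 1], then shift and scale by mean and std.
    mean/std here are placeholders, not real dataset statistics."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]
```

The same preprocessing must be applied at inference time so the model sees inputs in the distribution it was trained on.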
Model Selection: Choose an object detection architecture like Faster R-CNN, YOLO, or SSD,
depending on your requirements.
Model Configuration: Configure the selected model architecture, including the backbone network,
anchor box sizes, and other hyperparameters.
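Anchor box configuration typically means choosing a set of scales and aspect ratios per feature-map location. A minimal sketch of generating anchor shapes (the base size, scales, and ratios are illustrative values, not defaults from any specific framework):

```python
def make_anchors(base_size, scales, ratios):
    """Generate (w, h) anchor shapes for each scale/ratio pair.
    ratio = h / w; the area (base_size * scale)**2 is preserved."""
    anchors = []
    for s in scales:
        area = (base_size * s) ** 2
        for r in ratios:
            w = (area / r) ** 0.5  # solve w*h = area with h = r*w
            h = w * r
            anchors.append((round(w, 2), round(h, 2)))
    return anchors
```

Each of these shapes is then tiled across every position of the backbone's feature map to form the full anchor grid.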
Loss Function: Define a suitable loss function, often a combination of localization and classification
losses like Smooth L1 loss and cross-entropy loss.
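The two loss terms named above can be sketched directly; the balancing weight `lam` is a hypothetical hyperparameter for illustration:

```python
import math

def smooth_l1(x, beta=1.0):
    """Smooth L1 loss on a regression residual x: quadratic near zero,
    linear beyond beta."""
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta

def cross_entropy(probs, target):
    """Cross-entropy for one sample given predicted class probabilities."""
    return -math.log(probs[target])

def detection_loss(reg_residuals, probs, target, lam=1.0):
    """Classification loss plus lam-weighted localisation loss
    (lam is an illustrative balancing weight)."""
    loc = sum(smooth_l1(r) for r in reg_residuals)
    return cross_entropy(probs, target) + lam * loc
```

Smooth L1 is preferred over plain L2 for box regression because its linear tail is less sensitive to outlier boxes.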
Optimization: Use an optimization algorithm like stochastic gradient descent (SGD) or Adam to
minimise the loss function.
Training: Train the model on your annotated dataset using a GPU. Iterate through the dataset
multiple times (epochs) to optimise the model.
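The loop itself (forward pass, loss, gradient step, repeated over epochs) can be illustrated with plain SGD on a toy regression problem; this stands in for a full detection training loop rather than reproducing one:

```python
def sgd_fit(xs, ys, lr=0.1, epochs=200):
    """Fit y = w*x + b by minimising squared error with plain SGD."""
    w, b = 0.0, 0.0
    for _ in range(epochs):           # one pass over the data per epoch
        for x, y in zip(xs, ys):
            pred = w * x + b          # forward pass
            err = pred - y            # dLoss/dpred for 0.5*(pred - y)**2
            w -= lr * err * x         # gradient step on w
            b -= lr * err             # gradient step on b
    return w, b
```

In a real detector the parameters number in the millions and the gradients come from backpropagation through the network, but the update structure is the same.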
Model Evaluation: Assess the model's performance using metrics like mean average precision
(mAP) and adjust the model or hyperparameters if needed.
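mAP is the mean of the per-class average precision (AP), the area under the precision-recall curve obtained by sweeping predictions in descending score order. A minimal all-point AP sketch (the inputs below are illustrative; real evaluation also matches predictions to ground truth via an IoU threshold):

```python
def average_precision(scores, is_tp, num_gt):
    """AP for one class: area under the precision-recall curve.
    scores[i]/is_tp[i] describe prediction i; num_gt is the number of
    ground-truth boxes for the class."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for i in order:                       # sweep confidence threshold down
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

mAP is then the mean of this quantity over all classes (and, in COCO-style evaluation, over several IoU thresholds as well).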
Fine-tuning: Adjust the model's hyperparameters (e.g., learning rate, regularisation strength) to
improve its performance.
Model Export: Save the trained model so it can be loaded later for inference and deployment.
Object Detection Inference Flow Diagram:
Preprocessing: Preprocess the input image by resizing and normalising it, ensuring it matches the
training-time preprocessing.
Load Pretrained Model: Load the trained object detection model that was saved during training.
Model Inference: Pass the preprocessed image through the model to obtain object predictions in the
form of bounding boxes and class scores.
Post-processing: Discard low-confidence predictions and apply non-maximum suppression (NMS) to
remove duplicate detections of the same object.
Object Localization: Extract each object's location and class label from the remaining bounding boxes.
Output Visualization: Draw bounding boxes and labels on the input image to visualize the detected
objects.
Results: Use the detected object information for the intended application, such as tracking objects or
triggering actions.
[Flow diagram created in Draw.io]