CV Unit V
Key components and basic architecture of deep neural networks, Convolutional neural networks,
Object detection using R-CNN, Segmentation using image-to-image neural networks,
Temporal processing and recurrent neural networks.
Output: A set of final detected objects with their corresponding bounding boxes, class labels,
and confidence scores
Pseudocode for Object Detection Using R-CNN:
# R-CNN algorithm for object detection (pseudocode).
# Helper functions such as selective_search and cnn_extract_features are placeholders.
def rcnn_object_detection(image):
    # Step 1: Generate region proposals using Selective Search
    region_proposals = selective_search(image)

    # List to store detected objects
    detected_objects = []

    # Step 2: For each region proposal
    for region in region_proposals:
        # Extract the bounding box for the region
        bounding_box = get_bounding_box(region)

        # Step 3: Extract features from the region using a CNN
        resized_region = resize_region(region)  # Resize to the CNN input size
        features = cnn_extract_features(resized_region)

        # Step 4: Classify the region using a trained SVM
        object_class = svm_classify(features)

        # Step 5: Apply bounding box regression to refine the bounding box
        refined_bounding_box = bounding_box_regression(bounding_box, features)

        # Step 6: Store the result if it's a valid object class (not background)
        if object_class != "background":
            detected_objects.append({
                'class': object_class,
                'bounding_box': refined_bounding_box,
                # The score depends on this region's features, not only on the class name
                'confidence': svm_confidence_score(features, object_class),
            })

    # Step 7: Apply Non-Maximum Suppression (NMS) to remove redundant detections
    final_detections = non_maximum_suppression(detected_objects)
    return final_detections

# Output: List of detected objects with bounding boxes, class labels, and confidence scores.
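Step 7 (Non-Maximum Suppression) can be illustrated with a small sketch. This is a common greedy, per-class variant written in plain Python, not necessarily the exact procedure used in the original R-CNN work. The box format (x1, y1, x2, y2), the 0.5 IoU threshold, and the sample detections are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(detections, iou_threshold=0.5):
    """Greedy per-class NMS: visit boxes by descending confidence and
    drop any box that overlaps an already kept box of the same class."""
    ordered = sorted(detections, key=lambda d: d['confidence'], reverse=True)
    kept = []
    for det in ordered:
        overlaps = any(
            k['class'] == det['class']
            and iou(k['bounding_box'], det['bounding_box']) > iou_threshold
            for k in kept
        )
        if not overlaps:
            kept.append(det)
    return kept


# Hypothetical detections: two near-duplicate cars, one distant car, one person.
detections = [
    {'class': 'car', 'bounding_box': (10, 10, 50, 50), 'confidence': 0.9},
    {'class': 'car', 'bounding_box': (12, 12, 52, 52), 'confidence': 0.8},
    {'class': 'car', 'bounding_box': (100, 100, 140, 140), 'confidence': 0.7},
    {'class': 'person', 'bounding_box': (11, 11, 51, 51), 'confidence': 0.6},
]
final = non_maximum_suppression(detections)
```

Here the second car box is suppressed because it overlaps the higher-scoring car box, while the person box survives because suppression is applied within each class only.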
5. Image Segmentation using Image-to-Image Neural Networks
Segmentation is the process of classifying every pixel in an image into a category.
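As a minimal sketch of "classifying every pixel": a segmentation network typically outputs one score map per class, and the predicted label at each pixel is the class with the highest score. The class names and score values below are made up for illustration.

```python
CLASSES = ["sky", "road", "car"]  # hypothetical label set


def segment(scores):
    """scores[c][y][x] is the score of class c at pixel (y, x).
    Returns a label map holding the best class index for every pixel."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


# Toy 2x3 image: one score map per class (values are invented).
scores = [
    [[0.9, 0.8, 0.7], [0.1, 0.2, 0.1]],    # sky
    [[0.05, 0.1, 0.2], [0.8, 0.3, 0.7]],   # road
    [[0.05, 0.1, 0.1], [0.1, 0.5, 0.2]],   # car
]
label_map = segment(scores)
named = [[CLASSES[c] for c in row] for row in label_map]
```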
5.1. Types of Segmentation
1. Semantic Segmentation
o Labels each pixel with a class (e.g., “sky,” “road,” “car”).
o Example: U-Net (used in medical imaging).
2. Instance Segmentation
o Separates different instances of the same class (e.g., multiple people).
o Example: Mask R-CNN (used in self-driving cars).
5.2. U-Net (Image-to-Image Network)
Developed for medical image segmentation.
Uses encoder-decoder architecture:
o Encoder (CNN) extracts features.
o Decoder upsamples the features back to the input resolution to produce a pixel-wise classification.
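The encoder-decoder idea can be traced on a toy 1-D signal. This is only a sketch: a real U-Net uses learned convolutions on 2-D feature maps, whereas here max pooling stands in for the encoder and nearest-neighbour upsampling for the decoder. The skip connection (pairing full-resolution encoder values with the upsampled ones) mirrors how U-Net recovers spatial detail. All function names are hypothetical.

```python
def downsample(signal):
    """Encoder step: 2x max pooling halves the resolution."""
    return [max(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]


def upsample(signal):
    """Decoder step: nearest-neighbour upsampling doubles the resolution."""
    return [v for v in signal for _ in range(2)]


def encode_decode(signal):
    skip = signal                     # full-resolution values saved by the encoder
    bottleneck = downsample(signal)   # coarse representation
    restored = upsample(bottleneck)   # input resolution again, but detail is lost
    # Skip connection: pair each fine value with its coarse counterpart
    # (U-Net concatenates the two feature maps along the channel axis).
    merged = list(zip(skip, restored))
    return bottleneck, restored, merged


signal = [1, 5, 2, 2, 7, 3, 0, 4]
bottleneck, restored, merged = encode_decode(signal)
```

The restored signal has the original length but only the pooled values; the merged pairs show what the skip connection adds back for the later layers to use.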
5.3. Mask R-CNN
Extends Faster R-CNN from bounding-box detection to instance segmentation.
Adds a segmentation mask prediction branch.
Used in autonomous vehicles, robotics, and AR/VR.
1. Types of Segmentation
• Semantic Segmentation: This classifies each pixel in an image into a particular class, but
does not distinguish between objects of the same class (e.g., all cars in an image are labeled
as "car").
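• Instance Segmentation: Similar to semantic segmentation, but it distinguishes different
instances of the same object class (e.g., each car gets its own mask).
• Panoptic Segmentation: A combination of semantic and instance segmentation. Every pixel
gets a class label; pixels of countable objects ("things") also get an instance ID, while
amorphous background regions ("stuff", such as sky or road) get only a class label.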
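The difference between the three outputs can be shown on a toy scene. The instance map, the class names, and the choice of "road" as the only stuff class are assumptions made for this example.

```python
# Hypothetical 3x4 scene: instance ID per pixel (0 = no countable object).
instance_map = [
    [0, 0, 0, 0],
    [1, 1, 0, 2],
    [1, 1, 0, 2],
]
instance_class = {1: "car", 2: "car"}  # class of each instance
STUFF = "road"                         # class of all remaining pixels

# Semantic segmentation: class per pixel; both cars collapse into one "car" label.
semantic = [[instance_class.get(i, STUFF) for i in row] for row in instance_map]

# Instance segmentation: one mask (set of pixel coordinates) per object.
instances = {}
for y, row in enumerate(instance_map):
    for x, i in enumerate(row):
        if i != 0:
            instances.setdefault(i, set()).add((y, x))

# Panoptic segmentation: (class, instance_id) for every pixel; stuff has no ID.
panoptic = [
    [(instance_class[i], i) if i else (STUFF, None) for i in row]
    for row in instance_map
]
```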
2. Neural Network Architectures for Segmentation
• Fully Convolutional Networks (FCN): Traditional CNNs for image classification have fully
connected layers at the end. However, for segmentation, FCNs replace these with
convolutional layers that output a pixel-wise classification.
• U-Net: A popular architecture originally designed for biomedical image segmentation. It
consists of an encoder-decoder structure with skip connections. The encoder extracts features,
and the decoder reconstructs the segmented image. Skip connections help recover spatial
information lost during downsampling.
• SegNet: Similar to U-Net, SegNet also uses an encoder-decoder structure, but it stores
the max-pooling indices in the encoder and reuses them in the decoder for upsampling, which
helps preserve boundary detail.
• Mask R-CNN: Extends Faster R-CNN (a two-stage object detector built on a region proposal
network) to also generate a segmentation mask for each detected object. It is commonly used
for instance segmentation.
• DeepLab: Uses dilated (atrous) convolutions to enlarge the receptive field and capture
multi-scale context without further downsampling, which improves segmentation of objects at
different scales. Variants like DeepLabV3 and DeepLabV3+ are widely used.
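The dilated (atrous) convolution used by DeepLab can be sketched in one dimension. The same three kernel weights are applied with gaps between them, so each output covers a wider stretch of the input without extra parameters. The function name and the all-ones kernel are illustrative assumptions; real networks learn the weights and work on 2-D feature maps.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D convolution (cross-correlation form) where consecutive
    kernel taps are spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output value
    return [
        sum(k * signal[start + j * dilation] for j, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
kernel = [1, 1, 1]  # three weights in both cases

dense = dilated_conv1d(signal, kernel, dilation=1)   # each output covers 3 adjacent inputs
atrous = dilated_conv1d(signal, kernel, dilation=3)  # same weights cover a span of 7 inputs
```

With dilation 3 the first output sums positions 0, 3, and 6, so the context grows from 3 to 7 samples while the kernel still has three weights.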
5. Applications
• Medical Imaging: Segmenting organs or tumors from MRI, CT, or ultrasound scans.
• Autonomous Driving: Segmenting roads, vehicles, and pedestrians for scene
understanding.
• Satellite Imagery: Segmenting land use areas, forests, or water bodies.
• Object Detection: Segmentation masks combined with detection boxes give pixel-accurate instance localization.