HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

SCHOOL OF ELECTRICAL AND ELECTRONICS ENGINEERING

PROJECT I
AN OVERVIEW OF IMAGE SEGMENTATION
ANALYSIS
VU QUACH TUAN MINH
[email protected]

ADVANCED PROGRAM - CONTROL AND AUTOMATION

Instructor: Dr. Pham Van Truong


Instructor’s Signature

Faculty: School of Electrical and Electronics Engineering

Hanoi, January 2025


Table of Contents

CHAPTER 1: INTRODUCTION AND KEY CONCEPTS
1.1 Definition of Image Segmentation
1.2 Applications
1.3 Pixel Classification
1.4 Mask

CHAPTER 2: TYPES OF IMAGE SEGMENTATION
2.1 Semantic Segmentation
2.2 Instance Segmentation
2.3 Panoptic Segmentation

CHAPTER 3: CONVOLUTIONAL NEURAL NETWORKS (CNNs)
3.1 Overview of CNNs
3.2 Layers in CNNs
3.3 Filters/Kernels

CHAPTER 4: KEY TECHNIQUES IN IMAGE SEGMENTATION
4.1 Convolution Operation
4.2 Padding
4.3 Strides
4.4 Pooling

CHAPTER 5: POPULAR ARCHITECTURES
5.1 U-Net
5.2 VGG-19
5.3 DoubleU-Net

CHAPTER 1: INTRODUCTION AND KEY CONCEPTS

1.1 Definition of Image Segmentation


Image segmentation is a computer vision technique that involves breaking down
an image into multiple segments or regions. The goal is to simplify the image and
make it more meaningful and easier to analyze by identifying and isolating objects or
areas of interest within the image.
1.2 Applications
Image segmentation is widely used in various fields, including medical imaging
(identifying tumors), autonomous vehicles (detecting road signs and obstacles), and
facial recognition (detecting facial features).
1.3 Pixel Classification
Each pixel in the image is classified into a specific category or class. This
classification can be binary (e.g., object vs. background) or multi-class (e.g., different
objects).
1.4 Mask
A mask is created where each pixel is labeled according to its class. This mask
highlights the segmented regions of the image.
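
As a concrete sketch of these two ideas (added for illustration; the pixel values
are invented), a binary mask can be produced by classifying every pixel of a tiny
grayscale image against a threshold:

    import numpy as np

    image = np.array([[ 12,  40, 200],
                      [ 30, 220, 210],
                      [ 25,  35, 190]])    # a tiny grayscale image

    # Pixel classification: object (1) vs. background (0), by thresholding.
    mask = (image > 128).astype(np.uint8)
    print(mask)
    # [[0 0 1]
    #  [0 1 1]
    #  [0 0 1]]  -> the mask labels each pixel and highlights the segmented region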

CHAPTER 2: TYPES OF IMAGE SEGMENTATION

2.1 Semantic Segmentation


Classifies each pixel into a class, without distinguishing between different instances
of the same class.
Semantic segmentation is like teaching a computer to understand and label every
single pixel in an image according to its category. Imagine having a photo of a busy
street, semantic segmentation would help the computer recognize that certain groups
of pixels belong to cars, others to pedestrians, some to buildings, and so forth. The
result is a new image where every pixel is assigned a label corresponding to what it
represents. This technique is particularly powerful because it enables detailed scene
understanding, allowing applications like autonomous driving to identify and react to
different elements in the environment accurately. Think of it as giving the computer
a detailed map of everything in the image, making it easier to analyze and interpret
complex scenes.
2.2 Instance Segmentation
Differentiates between individual instances of objects within the same class.
Instance segmentation is like giving the computer the ability to not only identify
and label different categories of objects in an image but also distinguish between
individual instances of the same category. Imagine you have a photo with several
people and cars. Instance segmentation will help the computer understand that there
are multiple people and cars, and it will label each person and car separately, even
if they belong to the same category. This is incredibly useful for applications like
autonomous vehicles, where it is essential to differentiate between multiple objects
of the same type, such as different cars on the road. Think of instance segmentation
as providing a detailed and precise breakdown of every individual object within each
category in the image.
2.3 Panoptic Segmentation
Combines semantic and instance segmentation, providing a complete scene understanding.
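
To make the three outputs concrete, the sketch below (an invented example, not
from the report) labels the same 3x4 scene under each scheme, using 0 for background
and 1 for the "car" class:

    import numpy as np

    # Semantic segmentation: every pixel gets a class ID; both cars share class 1.
    semantic = np.array([[1, 1, 0, 1],
                         [1, 1, 0, 1],
                         [0, 0, 0, 0]])

    # Instance segmentation: pixels of the same class get separate object IDs 1 and 2.
    instance = np.array([[1, 1, 0, 2],
                         [1, 1, 0, 2],
                         [0, 0, 0, 0]])

    # Panoptic segmentation: a (class ID, instance ID) pair per pixel combines both.
    panoptic = np.stack([semantic, instance], axis=-1)
    print(panoptic.shape)   # (3, 4, 2)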

CHAPTER 3: CONVOLUTIONAL NEURAL NETWORKS (CNNs)

3.1 Overview of CNNs


A Convolutional Neural Network (CNN) is a type of deep learning model particularly
well-suited for analyzing visual data like images. At its core, a CNN consists of
multiple layers that work together to automatically and adaptively learn spatial hierarchies
of features from the input images. The fundamental building blocks of a CNN include
convolutional layers, which apply filters (kernels) to the input images to extract features
such as edges, textures, and patterns; activation layers like ReLU (Rectified Linear
Unit) that introduce non-linearity; pooling layers that downsample the feature maps
to reduce computational load and control overfitting; and fully connected layers that
integrate the learned features for final classification or regression tasks. This layered
structure allows CNNs to efficiently process and understand complex visual data,
making them essential tools in various applications, from image classification and
object detection to image segmentation and beyond.
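
The four building blocks listed above can be lined up in a few lines of code. The
following is a minimal sketch, assuming PyTorch; it only illustrates the layer order
(convolution, ReLU, pooling, fully connected), not a tuned architecture:

    import torch
    import torch.nn as nn

    # A minimal CNN: convolution + ReLU extract features, pooling downsamples,
    # and a fully connected layer maps the features to class scores.
    model = nn.Sequential(
        nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),        # 28x28 -> 14x14
        nn.Flatten(),
        nn.Linear(8 * 14 * 14, 10),         # 10 hypothetical output classes
    )

    scores = model(torch.randn(1, 1, 28, 28))   # one 28x28 grayscale image
    print(scores.shape)                         # torch.Size([1, 10])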
3.2 Layers in CNNs
In the context of Convolutional Neural Networks (CNNs), "layers" refer to the
individual stages through which an input image passes as it is processed by the network.
Each layer performs specific operations that transform the input data, allowing the
network to learn and extract various features.
3.3 Filters/Kernels
In Convolutional Neural Networks (CNNs), filters (also known as kernels) are
small matrices of weights that play a crucial role in feature extraction from input
images. Filters are applied to the input data through the convolution operation, where
they slide across the image, performing element-wise multiplication and summation
at each position. Each filter is designed to detect specific features, such as edges,
textures, or patterns, within the image. During the training process, the values of these
filters are learned and adjusted to capture relevant features effectively. The resulting
output of the convolution operation is known as a feature map, which highlights
the presence of the detected features. By stacking multiple convolution layers with
different filters, CNNs can learn hierarchical representations of the image, from low-
level features in the initial layers to high-level, complex patterns in the deeper layers.
This ability to automatically and adaptively extract features makes filters essential
components in the powerful functionality of CNNs.
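
To see what a single fixed filter does before any training, the sketch below
(illustrative only, using NumPy) slides a classic vertical-edge kernel over a small
image; a CNN learns kernels like this automatically rather than being given them:

    import numpy as np

    # An image that is dark (0) on the left and bright (9) on the right.
    img = np.array([[0, 0, 0, 9, 9]] * 5)

    # A hand-crafted vertical-edge kernel: responds where brightness rises left to right.
    kernel = np.array([[-1, 0, 1],
                       [-1, 0, 1],
                       [-1, 0, 1]])

    rows, cols = img.shape[0] - 2, img.shape[1] - 2
    fmap = np.array([[np.sum(img[i:i+3, j:j+3] * kernel) for j in range(cols)]
                     for i in range(rows)])
    print(fmap)   # nonzero entries mark windows straddling the dark-to-bright edge
    # [[ 0 27 27]
    #  [ 0 27 27]
    #  [ 0 27 27]]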

CHAPTER 4: KEY TECHNIQUES IN IMAGE SEGMENTATION

4.1 Convolution Operation


Convolution is a mathematical operation used to extract features from an input
image. It involves applying a filter (kernel) to the image by sliding it across the image
with a specified stride and performing element-wise multiplication and summation at
each position.
For example, we have an "input image" which is illustrated by this matrix A:

    A = [ 1  2  0  1  3 ]
        [ 4  5  1  6  0 ]
        [ 7  8  1  3  2 ]
        [ 4  5  2  1  0 ]
        [ 0  2  1  3  4 ]

We also have this filter (or kernel) matrix B:

    B = [ 0   1  0 ]
        [ 1  −1  0 ]
        [ 0   1  0 ]

If we perform element-wise multiplication of matrix B with the top-left 3x3 portion
of matrix A, we have

    [ 1 ∗ 0   2 ∗ 1    0 ∗ 0 ]
    [ 4 ∗ 1   5 ∗ −1   1 ∗ 0 ]
    [ 7 ∗ 0   8 ∗ 1    1 ∗ 0 ]

Now we take the sum of all the products, that is

    (1 ∗ 0) + (2 ∗ 1) + (0 ∗ 0) + (4 ∗ 1) + (5 ∗ −1) + (1 ∗ 0) + (7 ∗ 0) + (8 ∗ 1) + (1 ∗ 0)
    = 0 + 2 + 0 + 4 + (−5) + 0 + 0 + 8 + 0
    = 9

And if we do the same with the other 8 possible positions on matrix A, we have
the following "feature map":

    [ 9   5  −1 ]
    [ 9  10   5 ]
    [ 9   5   7 ]
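
The same sliding-window computation can be reproduced in a few lines of NumPy.
This is an illustrative sketch, not part of the original report; note that, like the
worked example, it computes a plain multiply-and-sum (cross-correlation) without
flipping the kernel:

    import numpy as np

    A = np.array([[1, 2, 0, 1, 3],
                  [4, 5, 1, 6, 0],
                  [7, 8, 1, 3, 2],
                  [4, 5, 2, 1, 0],
                  [0, 2, 1, 3, 4]])

    B = np.array([[0,  1, 0],
                  [1, -1, 0],
                  [0,  1, 0]])

    def conv2d_valid(image, kernel, stride=1):
        """Slide `kernel` over `image` with no padding; multiply element-wise and sum."""
        k = kernel.shape[0]
        out = (image.shape[0] - k) // stride + 1
        result = np.zeros((out, out), dtype=int)
        for i in range(out):
            for j in range(out):
                window = image[i*stride:i*stride + k, j*stride:j*stride + k]
                result[i, j] = np.sum(window * kernel)
        return result

    print(conv2d_valid(A, B))
    # [[ 9  5 -1]
    #  [ 9 10  5]
    #  [ 9  5  7]]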

4.2 Padding
Padding in the context of Convolutional Neural Networks (CNNs) is a technique
used to add extra pixels around the borders of an input image or feature map. These
additional pixels, often set to zero (zero padding), allow the convolution filters to
process the border regions of the image more effectively. Padding helps in preserving
the spatial dimensions of the input during the convolution operation, preventing the
reduction in size that typically occurs. This is particularly important for maintaining
the alignment of feature maps across different layers of the network and ensuring
that important features near the edges of the image are not lost. By using padding,
CNNs can generate output feature maps that retain the same height and width as the
original input, facilitating the design of deeper and more complex neural network
architectures.
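
A short NumPy sketch (illustrative, not from the report) shows zero padding in
action: one ring of zeros around a 5x5 feature map makes it 7x7, so a 3x3 filter
produces a 5x5 output and the original spatial size is preserved:

    import numpy as np

    x = np.arange(25).reshape(5, 5)       # a 5x5 feature map
    x_padded = np.pad(x, pad_width=1)     # zero padding: one ring of zeros per side
    print(x.shape, "->", x_padded.shape)  # (5, 5) -> (7, 7)
    # A 3x3 filter over the 7x7 padded input yields a 5x5 output: "same" padding.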
4.3 Strides
In the context of Convolutional Neural Networks (CNNs), stride refers to the
step size with which a convolution filter or pooling window moves across the input
image or feature map. Stride determines how much the filter shifts at each step, both
horizontally and vertically. A stride of 1 means the filter moves one pixel at a time,
resulting in overlapping regions and larger output dimensions. A stride greater than 1
(e.g., 2 or 3) means the filter moves more than one pixel at a time, reducing the output
dimensions and making the computation more efficient. Stride plays a crucial role in
controlling the spatial dimensions of the output feature map and helps in balancing
the trade-off between spatial resolution and computational efficiency.

The example in Part 4.1 uses a stride of 1.
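
The interplay of input size, kernel size, padding, and stride is captured by the
standard output-size formula, sketched here as a small helper (the function and
variable names are mine, for illustration):

    def output_size(n, k, p=0, s=1):
        """Output size of a convolution or pooling window:
        n = input size, k = kernel size, p = padding per side, s = stride."""
        return (n + 2 * p - k) // s + 1

    print(output_size(n=5, k=3, p=0, s=1))  # 3 -> the 3x3 feature map of Part 4.1
    print(output_size(n=5, k=3, p=1, s=1))  # 5 -> "same" padding preserves the size
    print(output_size(n=5, k=3, p=0, s=2))  # 2 -> a larger stride shrinks the output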

4.4 Pooling
Pooling is a down-sampling operation used in Convolutional Neural Networks
(CNNs) to reduce the spatial dimensions (height and width) of feature maps while
retaining the most important information. There are two main types of pooling: max
pooling and average pooling.
In max pooling, the maximum value within a specified window matrix (e.g., 2x2)
is selected, whereas in average pooling, the average value within the window matrix
is calculated. Pooling helps in reducing the computational load and the number of
parameters in the network, which in turn helps prevent overfitting. Additionally,
pooling introduces a degree of invariance to small translations and distortions in
the input image, making the network more robust to variations in the input data.
By effectively summarizing the presence of features, pooling layers enable CNNs
to capture the essential characteristics of the input image efficiently.
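
Both pooling variants fit in a short NumPy sketch (illustrative; the reshape trick
assumes the window size divides the input size evenly):

    import numpy as np

    x = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 2],
                  [7, 2, 9, 0],
                  [4, 8, 3, 1]])

    def pool2d(x, size=2, reduce=np.max):
        """Non-overlapping pooling: apply `reduce` over each size x size window."""
        h, w = x.shape
        windows = x.reshape(h // size, size, w // size, size)
        return reduce(windows, axis=(1, 3))

    print(pool2d(x, reduce=np.max))   # max pooling:     [[6 4] [8 9]]
    print(pool2d(x, reduce=np.mean))  # average pooling: [[3.75 2.25] [5.25 3.25]]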

CHAPTER 5: POPULAR ARCHITECTURES

5.1 U-Net
The U-Net architecture, introduced by Olaf Ronneberger, Philipp Fischer, and
Thomas Brox in 2015, is a specialized type of Convolutional Neural Network (CNN)
designed specifically for image segmentation tasks, particularly in biomedical applications.
The key innovation of U-Net lies in its unique architecture, which is shaped like the
letter "U." This design allows the network to capture both context and fine details of
the input images, making it highly effective for precise segmentation.

The U-Net architecture consists of two main parts: the contracting path and
the expanding path. The contracting path, also known as the encoder, follows the
typical structure of a CNN, with repeated application of convolution and pooling
layers to progressively reduce the spatial dimensions and capture high-level features.
The expanding path, or decoder, mirrors the contracting path but uses up-sampling
and convolution layers to gradually restore the spatial dimensions and reconstruct the
segmented output. A crucial aspect of U-Net is the presence of skip connections
that directly link corresponding layers in the contracting and expanding paths. These
connections ensure that fine-grained details lost during down-sampling are preserved
and integrated during up-sampling, resulting in highly accurate segmentation maps.
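
A deliberately tiny sketch of this encoder-decoder idea is given below, assuming
PyTorch. It has a single down/up step and one skip connection, whereas the published
U-Net stacks four such levels; the class name and sizes are invented for illustration:

    import torch
    import torch.nn as nn

    class TinyUNet(nn.Module):
        """A one-level U-Net sketch: encoder, decoder, and one skip connection."""
        def __init__(self, in_ch=1, n_classes=2):
            super().__init__()
            self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
            self.down = nn.MaxPool2d(2)                        # contracting path
            self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
            self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # expanding path
            self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, n_classes, 1))

        def forward(self, x):
            e = self.enc(x)               # fine detail at full resolution
            m = self.mid(self.down(e))    # high-level context at half resolution
            u = self.up(m)                # up-sample back to full resolution
            u = torch.cat([u, e], dim=1)  # skip connection: re-inject encoder detail
            return self.dec(u)            # per-pixel class scores

    logits = TinyUNet()(torch.randn(1, 1, 64, 64))
    print(logits.shape)                   # torch.Size([1, 2, 64, 64])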
5.2 VGG-19
The VGG-19 architecture is a deep convolutional neural network introduced by
the Visual Geometry Group (VGG) at the University of Oxford in 2014. VGG-19
is part of the VGG family of models, known for their simplicity and effectiveness in
image classification tasks. The architecture is characterized by its use of small 3x3
convolution filters stacked on top of each other in multiple layers. VGG-19 consists
of 19 layers, including 16 convolution layers and 3 fully connected layers.

The design of VGG-19 emphasizes depth, with the network having a total of 19
weight layers, which enables it to capture hierarchical features from the input images.
The use of small filters allows the network to learn intricate patterns while keeping
the number of parameters manageable. VGG-19 has been widely adopted in various
computer vision tasks due to its strong performance and relatively straightforward
architecture. It has also served as a foundational model for many subsequent advancements
in deep learning, including the development of more complex architectures.
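
VGG-19 is available off the shelf; the snippet below assumes torchvision is
installed (the weights enum follows recent torchvision versions) and simply confirms
the 16 + 3 = 19 weight layers mentioned above:

    import torch.nn as nn
    from torchvision.models import vgg19, VGG19_Weights

    model = vgg19(weights=VGG19_Weights.IMAGENET1K_V1)   # ImageNet-pretrained VGG-19

    convs = [m for m in model.modules() if isinstance(m, nn.Conv2d)]
    fcs   = [m for m in model.modules() if isinstance(m, nn.Linear)]
    print(len(convs), len(fcs))   # 16 3 -> 16 convolution + 3 fully connected layers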
5.3 DoubleU-Net
The DoubleU-Net architecture is an extension of the popular U-Net model,
designed to improve performance in image segmentation tasks, particularly in biomedical
imaging. The key innovation of DoubleU-Net is the incorporation of two U-Net
structures connected in a cascade. The first U-Net performs initial segmentation,
and its output is passed as input to the second U-Net, which refines the segmentation
results.

This dual U-Net structure allows DoubleU-Net to leverage the strengths of the
original U-Net while addressing some of its limitations. The first U-Net focuses
on capturing coarse-level features and providing an initial segmentation map. The
second U-Net, using the output of the first U-Net, performs fine-level segmentation,
enhancing the accuracy and detail of the segmented regions. Skip connections within
each U-Net and between the two U-Nets ensure that important features and contextual
information are preserved throughout the network.

DoubleU-Net has shown significant improvements in segmentation performance,
especially in tasks where high precision and detail are crucial. Its ability to refine
segmentation results makes it a valuable tool in various applications, including medical
diagnostics and remote sensing.
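
Following only the cascade described in this chapter, and reusing the hypothetical
TinyUNet sketch from Section 5.1, the data flow can be summarized as below; the
published DoubleU-Net adds further details (such as a VGG-19 encoder in the first
network) that this sketch deliberately omits:

    import torch

    net1 = TinyUNet(in_ch=1, n_classes=1)   # first U-Net: initial segmentation
    net2 = TinyUNet(in_ch=1, n_classes=1)   # second U-Net: refinement

    x = torch.randn(1, 1, 64, 64)           # input image
    coarse = torch.sigmoid(net1(x))         # initial segmentation map
    refined = net2(coarse)                  # refined segmentation from the cascade
    print(refined.shape)                    # torch.Size([1, 1, 64, 64])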
