Unit 3
Image segmentation
Image segmentation is a computer vision technique that divides an image into
meaningful regions or segments, typically to simplify the representation of the
image into something easier to analyze.
Common Techniques:
Thresholding (e.g., Otsu's method)
Clustering
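Otsu's method, named in the list above, picks the threshold that maximises the between-class variance of the resulting dark/bright split. A minimal NumPy sketch (the toy bimodal image is illustrative):

```python
import numpy as np

def otsu_threshold(img):
    """Return the intensity threshold that maximises between-class variance."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]               # weight of class 0 (intensity <= t)
        if w0 == 0:
            continue
        w1 = total - w0             # weight of class 1 (intensity > t)
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0              # mean of class 0
        m1 = (sum_all - sum0) / w1  # mean of class 1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Toy bimodal "image": dark pixels around 10-30, bright pixels around 180-220
img = np.array([[10, 20, 30, 20], [180, 200, 220, 200]], dtype=np.uint8)
t = otsu_threshold(img)
binary = img > t   # pixels above the threshold become foreground
```

The threshold lands between the two intensity modes, so thresholding cleanly separates the dark row from the bright row.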
Clustering is a technique to group similar entities and label them.
In segmentation, it is the process of assigning a label to every pixel in an image
such that pixels with the same label share certain characteristics.
Hierarchical clustering
Divisive clustering is a type of hierarchical clustering where we start with one
large cluster that contains all the data points, and then we recursively split it into
smaller clusters until each cluster is sufficiently homogeneous.
🔍 Simple Explanation:
Think of it like organizing a messy drawer of mixed items:
You split the items into broad groups—say, electronics and stationery.
Then, you take the electronics and divide them into chargers and
headphones.
🧠 Example:
Suppose you have this data (2D points):
(1, 2), (2, 1), (1, 1), (8, 8), (9, 9), (8, 9)
1. Start: treat all six points as one cluster.
2. First split: Use a method like K-means (k=2) to split into two groups —
Cluster A = {(1, 2), (2, 1), (1, 1)} and Cluster B = {(8, 8), (9, 9), (8, 9)}.
3. Stop or split again: Now we decide if each of these smaller clusters can be
split further. Maybe Cluster A is fine, but we decide to split Cluster B more.
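The first split can be sketched with a tiny 2-means splitter; choosing the two most distant points as initial centroids is an illustrative choice, not part of the standard algorithm:

```python
import numpy as np

def two_means_split(points, iters=10):
    """Split one cluster into two using k=2 k-means."""
    pts = np.asarray(points, dtype=float)
    # Deterministic (illustrative) init: the two points farthest apart
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
    i, j = np.unravel_index(np.argmax(d), d.shape)
    centroids = pts[[i, j]]
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute centroids
        labels = np.argmin(np.linalg.norm(pts[:, None] - centroids[None], axis=2), axis=1)
        centroids = np.array([pts[labels == k].mean(axis=0) for k in (0, 1)])
    return labels

data = [(1, 2), (2, 1), (1, 1), (8, 8), (9, 9), (8, 9)]
labels = two_means_split(data)
# The three points near the origin form one cluster, the three near (8, 9) the other
```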
🔍 Step-by-Step Explanation:
Let’s say we have 4 data points:
A (1, 1), B (2, 1), C (5, 4), D (6, 5)
Step 1: Treat each point as its own cluster.
Step 2: Merge the closest pair — A and B (distance 1) → cluster {A, B}.
Step 3: Repeat — now C and D are closest → merge them into {C, D}.
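The merge order can be traced with a minimal single-linkage implementation (pure Python, illustrative):

```python
import math

def agglomerative_merges(points):
    """Record the order in which single-linkage clustering merges clusters."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(math.dist(points[p], points[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = clusters[a] | clusters[b]
        merges.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

pts = {"A": (1, 1), "B": (2, 1), "C": (5, 4), "D": (6, 5)}
merges = agglomerative_merges(pts)
# A and B merge first (distance 1), then C and D (distance ~1.41), then everything
```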
🔢 Data Points:
P1 = (1, 1)
P2 = (2, 1)
P3 = (4, 3)
P4 = (5, 4)
✅ Step 1: Initialization
Randomly select two points as initial centroids:
Centroid A = (1, 1) ← P1
Centroid B = (5, 4) ← P4
Assign each point to the nearest centroid (Euclidean distance):

Point | Distance to A (1,1) | Distance to B (5,4) | Cluster
P1 (1,1) | 0 | √[(1−5)² + (1−4)²] = 5 | A
P2 (2,1) | 1 | √18 ≈ 4.24 | A
P3 (4,3) | √13 ≈ 3.61 | √2 ≈ 1.41 | B
P4 (5,4) | 5 | 0 | B

🔄 Recalculate centroids:
Centroid A = mean(P1, P2) = (1.5, 1)
Centroid B = mean(P3, P4) = (4.5, 3.5)

Reassign each point to the new centroids:

Point | Distance to A (1.5, 1) | Distance to B (4.5, 3.5) | Cluster
P1 (1,1) | √[(1−1.5)²] = 0.5 | √[(1−4.5)² + (1−3.5)²] = √18.5 ≈ 4.30 | A
P2 (2,1) | √[(2−1.5)²] = 0.5 | √12.5 ≈ 3.54 | A
P3 (4,3) | √10.25 ≈ 3.20 | √0.5 ≈ 0.71 | B
P4 (5,4) | √21.25 ≈ 4.61 | √0.5 ≈ 0.71 | B

The assignments did not change, so the algorithm has converged.

Cluster A: P1, P2
Cluster B: P3, P4
✅ Summary:
Initial centroids: P1 (1,1) and P4 (5,4)
After Iteration 1: centroids move to (1.5, 1) and (4.5, 3.5), giving Cluster A = {P1, P2}
and Cluster B = {P3, P4}; a second pass leaves the assignments unchanged.
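The worked example can be replayed in a few lines of NumPy, using P1 and P4 as the initial centroids as above:

```python
import numpy as np

pts = np.array([[1, 1], [2, 1], [4, 3], [5, 4]], dtype=float)  # P1..P4
centroids = pts[[0, 3]].copy()   # initial centroids: P1 and P4

for _ in range(10):
    # Assignment step: each point goes to its nearest centroid
    dists = np.linalg.norm(pts[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its cluster
    new_centroids = np.array([pts[labels == k].mean(axis=0) for k in (0, 1)])
    if np.allclose(new_centroids, centroids):   # no movement: converged
        break
    centroids = new_centroids

# Converges to centroids (1.5, 1) and (4.5, 3.5), clusters {P1, P2} and {P3, P4}
```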
📉 How it works:
Run K-Means with different values of K (e.g., from 1 to 10).
Plot K vs WCSS.
Look for the "elbow point" — the point where the WCSS stops decreasing
significantly.
Let's compute the WCSS (Within-Cluster Sum of Squares) step by step for K = 2
using the earlier example dataset:
📍Data Points:
Point Coordinates
P1 (1, 2)
P2 (1, 4)
P3 (1, 0)
P4 (10, 2)
P5 (10, 4)
P6 (10, 0)
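For these six points the natural K = 2 split is {P1, P2, P3} and {P4, P5, P6}; given that assignment, the WCSS is a direct sum of squared distances to each cluster centroid (a sketch):

```python
import numpy as np

pts = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])   # the obvious K = 2 assignment

wcss = 0.0
for k in (0, 1):
    cluster = pts[labels == k]
    centroid = cluster.mean(axis=0)            # (1, 2) and (10, 2)
    wcss += ((cluster - centroid) ** 2).sum()  # squared distances to the centroid

# Each cluster contributes 0 + 4 + 4 = 8, so the total WCSS for K = 2 is 16
```

Repeating this for K = 1, 2, …, 10 and plotting K against WCSS gives the elbow curve described above.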
Mean Shift does not require specifying the number of clusters in advance; the
number of clusters is determined by the algorithm from the data.
Dense regions already exist in the data — areas where many points are close
together
A mode is the peak of a dense region — the point with highest data density.
Analogy: think of finding the top of a hill in the dark.
Choose any data point (or grid point) as your current location x.
Take all the points you saw and compute their average position:
mean = (1/N) Σᵢ xᵢ, where N is the number of neighbors and the xᵢ are the
neighboring points.
In the story: you ask everyone around, “Where are you standing on
average?”
In the story: you take a step uphill in the direction most of the crowd is
standing.
Keep shining your flashlight, recalculating the mean, and stepping, until
your step-size is almost zero (you’ve reached a top).
Technical explanation
In the Mean Shift algorithm, the value m(x) becomes the new position (mean) for
the data point during each iteration.
The bandwidth (also known as the window radius) directly controls the
neighborhood size, i.e., the number of points that contribute to the mean m(x),
in the Mean Shift algorithm.
Faster Convergence: Points move faster towards the mode because the
Gaussian kernel places less weight on distant points, allowing the algorithm to
converge more efficiently.
Pros:
Robust to outliers
Does not assume any prior shape like spherical, elliptical, etc. on data clusters
Cons:
Computationally expensive
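The hill-climbing loop described above can be sketched in 1-D with a flat kernel; the bandwidth and data values are illustrative:

```python
import numpy as np

def mean_shift_mode(x, data, bandwidth=3.0, tol=1e-6):
    """Climb to the nearest density mode: repeatedly move x to the mean of its neighbours."""
    while True:
        neighbours = data[np.abs(data - x) <= bandwidth]  # points inside the window
        new_x = neighbours.mean()                         # m(x): the shifted position
        if abs(new_x - x) < tol:                          # step size ~ 0: reached a mode
            return new_x
        x = new_x

data = np.array([1.0, 1.5, 2.0, 9.5, 10.0, 10.5])
mode_low = mean_shift_mode(1.0, data)    # climbs to the mode at 1.5
mode_high = mean_shift_mode(10.0, data)  # climbs to the mode at 10.0
```

Points started anywhere near the first group converge to one mode and points near the second group to another, so the number of distinct modes found is the number of clusters — no k is chosen in advance.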
Watershed segmentation
Imagine your image is a landscape made of hills (segment peaks) and valleys.
Now, picture rain falling on this landscape. Water naturally starts collecting in the
lowest points — the valleys.
When water from two different valleys is about to meet, construct a dam.
These walls, built to keep waters from different valleys apart, act as boundaries
separating the different segments in the image.
Step 1: Preprocessing
Convert the image to binary using thresholding.
Inside each coin, pixels get values based on their distance to the edge.
Now the coin center — which was a hilltop — becomes a valley again.
It tries to expand to its neighboring pixels, labelling each nearby pixel with the
label (e.g., 1) that was given to the local minimum.
It only expands to pixels that are lower or slightly higher in intensity (just like
water flows into valleys first).
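The flooding process can be illustrated on a 1-D intensity profile: visit pixels from darkest to brightest, grow a label outward from each local minimum, and build a dam wherever two different labels would meet. This is a simplified sketch of watershed-by-immersion, not a full image pipeline:

```python
def watershed_1d(profile):
    """Simplified watershed by immersion on a 1-D intensity profile."""
    n = len(profile)
    labels = [0] * n        # 0 = unlabelled, -1 = watershed boundary (dam)
    next_label = 1
    # Flood pixels in order of increasing intensity (the water level rises)
    for i in sorted(range(n), key=lambda k: profile[k]):
        neighbour_labels = {labels[j] for j in (i - 1, i + 1)
                            if 0 <= j < n and labels[j] > 0}
        if not neighbour_labels:            # a new local minimum: start a new basin
            labels[i] = next_label
            next_label += 1
        elif len(neighbour_labels) == 1:    # water flows in from a single basin
            labels[i] = neighbour_labels.pop()
        else:                               # two basins meet here: build a dam
            labels[i] = -1
    return labels

profile = [2, 1, 2, 3, 4, 3, 1, 2]   # two valleys separated by a peak at index 4
labels = watershed_1d(profile)
# labels -> [1, 1, 1, 1, -1, 2, 2, 2]: the dam sits on the ridge between the valleys
```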
Background subtraction
Background subtraction in real-time is a computer vision technique used to
detect and isolate moving objects (foreground) from a static or slowly changing
background in video streams. It’s commonly used in surveillance, gesture
recognition, and traffic monitoring.
An image can be digitally represented as a function of space, I = f(x, y),
where x and y are the row and column indices of a point and I is the intensity
at that point.
Advantages:
Simple and fast.
Limitations:
Doesn't handle dynamic backgrounds (e.g., trees moving in the wind).
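The simplest version, differencing against a static background frame, is a few lines of NumPy; the frames and threshold below are illustrative:

```python
import numpy as np

background = np.zeros((6, 6), dtype=np.uint8)   # static background model
frame = background.copy()
frame[2:4, 2:4] = 200                           # a "moving object" enters the scene

# Per-pixel absolute difference, then threshold to get the foreground mask
diff = np.abs(frame.astype(int) - background.astype(int))
mask = diff > 50

# mask is True exactly on the 2x2 object region
```

This is exactly the method whose limitations are listed above: any background change (swaying trees, lighting shifts) also exceeds the threshold and is wrongly marked as foreground.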
Visual Example:
Imagine the color of a pixel changes over time:
A single Gaussian would try to model all these colors with one mean and variance
— poor fit.
But a mixture of 3 Gaussians can capture each recurring color with its own component:
You're combining multiple bell curves, each with its own shape and
importance, to approximate a complex distribution.
For example, if you had pixel values that were mostly near 50, sometimes near
100, and rarely near 200, a 3-component GMM could model that with one
Gaussian centred near each of those values.
For each new pixel value, check whether it matches one of the high-weight,
low-variance components (the stable background colors):
If it matches → label it as B (background)
If it does not → label it as F (foreground)
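A sketch of fitting such a 3-component mixture with scikit-learn's GaussianMixture; the sample data mimics the 50/100/200 example, the weight threshold and the 2.5-sigma match rule are illustrative choices, and exact fitted values depend on the random samples:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Pixel history: mostly ~50, sometimes ~100, rarely ~200
samples = np.concatenate([
    rng.normal(50, 5, 700),
    rng.normal(100, 5, 250),
    rng.normal(200, 5, 50),
]).reshape(-1, 1)

gmm = GaussianMixture(n_components=3, random_state=0).fit(samples)
means = sorted(gmm.means_.ravel())   # roughly [50, 100, 200]
# gmm.weights_ is roughly 0.70 / 0.25 / 0.05 (in some component order)

def is_foreground(value, n_sigma=2.5):
    """Foreground if the value is unlikely under every heavy (background) component."""
    bg = gmm.weights_ > 0.1                       # treat heavy components as background
    mu = gmm.means_.ravel()[bg]
    sd = np.sqrt(gmm.covariances_.ravel()[bg])
    return bool(np.all(np.abs(value - mu) > n_sigma * sd))
```

With this rule a pixel value near 50 or 100 is labelled B, while the rare 200-ish value falls outside every heavy component and is labelled F.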