Module 4 – Digital Image Processing (DIP)
A. Manual Segmentation
• Done entirely by a person, who outlines regions by hand using drawing or tracing tools.
• Used by doctors or experts for critical tasks.
• Pros: Very accurate.
• Cons: Time-consuming, tiring, and different people may get different results.
B. Semi-Automatic Segmentation
• A person gives a starting point, and the system does the rest.
• Example: You click on the tumor, and software grows the area around it.
• Known as "seed-based" methods (like region-growing).
• Combines human decision + computer speed.
C. Automatic Segmentation
• No human help needed.
• The software detects and segments everything.
• Great for large datasets or real-time apps like face recognition.
✅ Example Mask (Laplacian mask used for isolated point detection):
-1 -1 -1
-1  8 -1
-1 -1 -1
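As an illustration (not from the source notes), here is a minimal Python sketch that applies this mask by convolution and thresholds the absolute response; the random image and the 90%-of-max threshold are assumptions for the example.

```python
import numpy as np
from scipy.ndimage import convolve

# Hypothetical grayscale image; in practice load one, e.g. with OpenCV.
img = np.random.randint(0, 256, (64, 64)).astype(np.float32)

# The point-detection mask shown above
mask = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]], dtype=np.float32)

response = convolve(img, mask)        # filter the whole image
T = 0.9 * np.abs(response).max()      # assumed threshold: 90% of max response
points = np.abs(response) > T         # True at detected isolated points
```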
🔸 5. Limitations
• If the background is noisy, false points may be detected.
• Not suitable for complex textures.
Edge Detection
🔹 30 % – Basic Introduction to Edge Detection
1. What Is an Edge?
An edge is simply a line (or curve) in an image where the pixel intensity changes abruptly.
In real‐world scenes, edges often correspond to object boundaries, changes in surface
orientation, or differences in material/lighting.
2. Why Detect Edges?
• Outlines & Shape: Edges give the “outline” of objects, making it easier to
recognize shapes.
• Data Reduction: By focusing on edges, you discard large swaths of nearly uniform
regions, keeping only important structural information.
• Preprocessing: Most high‐level tasks—like object recognition or segmentation—
begin by finding edges first.
3. Key Idea
• Edges are found where there’s a significant discontinuity (jump) in intensity.
• Mathematically, we look at the derivative of the image function:
• A first derivative (gradient) becomes large at an edge.
• A second derivative crosses zero at the location of an abrupt change (zero‐
crossing).
4. Edge Detection Pipeline (High‐Level)
• Smoothing/Filtering (to reduce noise)
• Compute Derivative (find where intensity changes strongly)
• Threshold/Localize (decide which strong changes are actual edges)
• Post‐Processing (thin, link, and clean up edge pixels)
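A minimal sketch of this four-step pipeline using OpenCV; the file name, blur size, and threshold values are illustrative assumptions, not prescribed by the notes.

```python
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file

# 1. Smoothing/Filtering: Gaussian blur to suppress noise
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

# 2. Compute Derivative: Sobel gradients in x and y
gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
magnitude = np.hypot(gx, gy)

# 3. Threshold/Localize: keep only strong gradient responses
edges = magnitude > 0.25 * magnitude.max()

# 4. Post-Processing: Canny bundles thinning + hysteresis edge linking
canny = cv2.Canny(img, 100, 200)
```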
Why Care About Edge Profiles?
Different detectors respond differently to the common edge profiles (step, ramp, spike). A step edge yields a strong peak in the first derivative; a ramp edge produces a wider, lower‐magnitude peak; a spike shows up sharply in the second derivative, and so on.
Thresholding
🔸 2. Types of Thresholding
A. Global Thresholding
• One fixed threshold T for the entire image.
• Simple but may fail if lighting varies.
• Best for clear-contrast images (like black text on a white background).
B. Local Thresholding
• The threshold varies by region, computed from a local neighborhood (see the sketch after this list).
• Handles uneven lighting, e.g., shadows.
C. Dynamic Thresholding
• Threshold depends on pixel coordinates (x, y) and local image properties.
• More advanced; adjusts dynamically.
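A minimal OpenCV sketch contrasting global and local (adaptive) thresholding; the input file and parameter values are assumptions for the example.

```python
import cv2

img = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input

# Global: one fixed threshold (here T = 127) for the whole image
_, global_bw = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)

# Local/adaptive: the threshold is recomputed from each 11x11 neighborhood,
# so uneven lighting (shadows) is handled much better
local_bw = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 11, 2)
```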
🔸 3. Histogram-Based Thresholding
• A histogram shows how many pixels have each intensity level.
• If the histogram has two peaks (bimodal), the valley between them is a good threshold.
📌 Types:
• Unimodal histogram → hard to threshold.
• Bimodal histogram → ideal for segmentation.
• Overlapping peaks → more complex; may need adaptive methods.
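For bimodal histograms, Otsu's method (a standard technique, mentioned here as an aside, not from the source notes) finds the valley threshold automatically; a short OpenCV sketch with an assumed input file:

```python
import cv2

img = cv2.imread("coins.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input

# Otsu searches the histogram for the split that best separates the two peaks
t, bw = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("threshold chosen from the histogram valley:", t)
```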
🔸 5. Multiple Thresholding
• Used when the image has more than two object classes.
• Instead of one threshold, use multiple thresholds T₁, T₂, …, Tₙ.
• Output: different values for different ranges
g(x,y) = \begin{cases} g_1, & f(x,y) < T_1 \\ g_2, & T_1 \le f(x,y) < T_2 \\ \vdots \\ g_n, & f(x,y) \ge T_n \end{cases}
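A NumPy sketch of this piecewise mapping; the threshold values and output levels are illustrative assumptions.

```python
import numpy as np

f = np.random.randint(0, 256, (64, 64))   # hypothetical grayscale image
thresholds = [85, 170]                     # T1, T2 (assumed values)
levels = np.array([0, 128, 255])           # g1, g2, g3 output values

# np.digitize picks the interval index for each pixel, implementing the
# piecewise definition of g(x, y) above
g = levels[np.digitize(f, thresholds)]
```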
🔸 6. Effect of Noise
• Noise can create false peaks in histogram.
• May mislead threshold selection.
• Use smoothing or averaging to remove noise from histogram before thresholding.
🔸 7. Peakiness Test (to check genuine peaks)
• Checks if a peak is sharp and deep enough.
• A "true" peak should be:
• Narrow (not spread out)
• Tall (clearly above valley)
📌 Peakiness Formula:
If a peak’s height is P, its width is W, the two neighbouring valley values are A and B, and the total number of pixels is N:
• Sharpness = P / N
• Peakiness = \frac{A + B}{2P} \times (1 - \text{Sharpness})
If peakiness > threshold → accept as true peak.
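A direct transcription of the formula above into Python (note that the width W appears in the definitions but not in the formula as given in these notes):

```python
def peakiness(P, W, A, B, N):
    """P = peak height, W = peak width, A and B = neighbouring valley
    heights, N = total number of pixels (per the definitions above)."""
    sharpness = P / N
    return (A + B) / (2 * P) * (1 - sharpness)

# Accept the peak as genuine if peakiness(...) exceeds a chosen threshold.
```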
✅ Summary Table
| Type | Description | Best Use |
| --- | --- | --- |
| Global Thresholding | Same threshold for all pixels | Simple, good for uniform lighting |
| Local Thresholding | Threshold varies by region | Uneven lighting (e.g., shadows) |
| Dynamic Thresholding | Based on pixel position + local properties | Complex/real-world applications |
| Multiple Thresholding | More than two classes (T₁, T₂…) | Color or multi-object segmentation |
Region Growing
🔸 Key Idea:
Start with one pixel, then “grow” the region by checking neighboring pixels that are similar.
Continue until no more pixels match the region criteria.
🧠 Example:
If you're segmenting a white flower, choose a pixel from the petal as the seed.
🔸 3. Connectivity
• Defines how neighbors are checked.
• 4-connectivity: Up, Down, Left, Right
• 8-connectivity: Also includes diagonals
8-connectivity lets the region grow across diagonal contacts, so it usually captures more of an object (and potentially more noise) than 4-connectivity.
🔸 4. Stopping Criteria
Region growth stops when:
• No new similar pixels are found.
• A maximum region size is reached.
• A predefined number of iterations is completed.
This ensures the region doesn’t grow endlessly or include unwanted areas.
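Putting the pieces together (seed, similarity test, connectivity, stopping criterion), a minimal breadth-first sketch; the tolerance value is an assumption.

```python
from collections import deque
import numpy as np

def region_grow(img, seed, tol=10, connectivity=4):
    """Grow a region from seed=(row, col), adding neighbours whose
    intensity is within tol of the seed value."""
    h, w = img.shape
    seed_val = float(img[seed])
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    frontier = deque([seed])
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # 4-connectivity
    if connectivity == 8:                               # add diagonals
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    while frontier:                  # stops when no new pixels qualify
        r, c = frontier.popleft()
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not region[nr, nc]
                    and abs(float(img[nr, nc]) - seed_val) <= tol):
                region[nr, nc] = True
                frontier.append((nr, nc))
    return region
```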
🔸 5. Advantages
• Very accurate for segmenting homogeneous regions.
• Can handle complex shapes as long as pixels are similar.
• Simple to implement and understand.
🔸 6. Disadvantages
• Sensitive to noise – a noisy pixel may break the region.
• Depends on initial seed – poor seed = poor segmentation.
• Computationally expensive for large images.
🔸 7. Improvements
• Use multiple seeds to segment multiple regions.
• Use region merging after growing to combine small similar regions.
• Combine with edge information for more accurate boundaries.
✅ Summary Table
Step Explanation
Select seed Choose starting pixel
Check neighbors Compare based on intensity, color, etc.
Add similar pixels Include if condition is met
Repeat Continue with newly added pixels
Stop When no more neighbors meet the criteria
Region Splitting and Merging (Split & Merge)
🔸 Key Idea:
• Split regions that are too varied (not uniform).
• Merge regions that are similar.
• Continue until all regions are uniform and no further splitting/merging is needed.
🧠 Example:
Two regions with average intensities 125 and 128 (very close) → can be merged.
🔸 4. Quadtree Structure
• The image is represented using a tree structure where:
• The root = whole image
• Each node has 4 children = 4 sub-regions
• Splitting continues until leaf nodes meet the uniformity condition.
• Then, merging is applied from bottom-up.
🔸 5. Advantages
• Flexible: Works even if objects are irregular.
• Systematic: Combines top-down and bottom-up methods.
• No need for initial seed point like region growing.
🔸 6. Disadvantages
• Needs proper homogeneity condition; too strict = over-splitting.
• May result in blocky segmentation (due to square division).
• Computationally more expensive compared to simple thresholding.
🔸 7. Improvement Tips
• Use adaptive thresholding for better homogeneity checking.
• Combine with edge detection to prevent merging across boundaries.
✅ Summary Table
| Process | Description |
| --- | --- |
| Splitting | Divide non-homogeneous regions into 4 equal parts |
| Merging | Combine adjacent regions that are similar |
| Quadtree | Tree structure to manage the recursive division |
| Homogeneity Test | Checks if pixels in a region are similar enough |
🔸 What is a Quadtree?
A quadtree is a tree data structure used to divide an image into square regions. Each square is
split into 4 equal sub-squares (quad = four).
🔸 1. Pyramid Representation
A. What Is a Pyramid?
• A multi-resolution representation: the full image at the bottom of a stack, with progressively smaller versions above it.
B. Types of Pyramids
1. Gaussian Pyramid:
• Each level is a smoothed and smaller version of the previous.
• Helps in image analysis at multiple resolutions.
2. Laplacian Pyramid:
• Formed by subtracting the Gaussian-blurred image from the original.
• Stores only the differences (details).
• Used in image compression and reconstruction.
C. Applications:
• Object detection at different sizes.
• Multi-resolution image blending.
• Image compression (e.g., JPEG uses a similar concept).
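A minimal OpenCV sketch of both pyramid types; the input file is an assumption. Note that pyrDown/pyrUp perform the Gaussian smoothing and resampling internally.

```python
import cv2

img = cv2.imread("scene.png")                # hypothetical input

# Gaussian pyramid: each level is smoothed and half the size
g0 = img
g1 = cv2.pyrDown(g0)
g2 = cv2.pyrDown(g1)

# Laplacian level: difference between a level and its upsampled coarser
# level, storing only the detail lost between resolutions (saturating
# uint8 subtract here; real pipelines use signed arithmetic)
l0 = cv2.subtract(g0, cv2.pyrUp(g1, dstsize=(g0.shape[1], g0.shape[0])))
```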
🔸 2. Quadtree Representation
A. How Quadtree Works
• Start with the full image as one large square node (root).
• Check if the region is homogeneous (all pixels similar).
• If yes, stop.
• If not, split into 4 equal parts.
• Repeat the process for each new region.
• Represented as a tree where each node has 4 children.
🧠 Example:
• An image region with mixed black and white pixels → not homogeneous → split into 4.
• If one quadrant is still mixed → split again.
• If another quadrant is all white → no need to split.
B. Homogeneity Check
A simple test:
\text{max(pixel values)} - \text{min(pixel values)} < T
If false → split the region.
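A recursive sketch of the split phase using exactly this max − min test; the threshold, minimum block size, and power-of-2 image side are assumptions, and the merge phase is omitted for brevity.

```python
import numpy as np

def split(img, x, y, size, T=20, min_size=4):
    """Return the (x, y, size) leaf blocks of a quadtree split:
    keep a block if max - min < T, otherwise split it into 4."""
    block = img[y:y + size, x:x + size]
    if size <= min_size or int(block.max()) - int(block.min()) < T:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dx, dy in [(0, 0), (half, 0), (0, half), (half, half)]:
        leaves += split(img, x + dx, y + dy, half, T, min_size)
    return leaves

img = np.random.randint(0, 256, (256, 256))  # hypothetical, power-of-2 side
blocks = split(img, 0, 0, 256)
```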
🔸 4. Advantages
| Pyramid | Quadtree |
| --- | --- |
| Handles multi-resolution | Handles spatial segmentation |
| Used for blending, scaling | Used for region-based analysis |
| Good for image compression | Good for object localization |
🔸 5. Disadvantages
• Pyramid:
• Loses fine detail in upper levels.
• Needs extra memory for all levels.
• Quadtree:
• Can result in blocky regions.
• Performance depends on threshold accuracy.
✅ Summary Table
| Concept | Purpose | Structure |
| --- | --- | --- |
| Pyramid | Multi-resolution image representation | Stack (bottom to top) |
| Quadtree | Region-based segmentation | Tree (top-down split) |
Image Compression – Fundamentals
🔸 Main Goal:
Remove redundant or unnecessary data from the image.
A. Coding Redundancy
• Some pixel values (like gray levels) occur more often than others.
• Use shorter codes for frequent values (e.g., Huffman coding).
B. Spatial Redundancy
• Neighboring pixels often have similar values.
• So we don’t need to store all pixel values individually—just the difference is enough.
C. Psycho-visual Redundancy
• Human eyes don’t notice small details or color differences.
• So we can remove data that the eye can’t detect (used in JPEG compression).
A. Lossless Compression
• No data is lost; the decompressed image is identical to the original.
• Examples: PNG, RLE, Huffman coding.
B. Lossy Compression
• Some image data is permanently removed.
• Results in smaller file sizes, but cannot restore the original image fully.
✅ Examples:
• JPEG
• Transform coding (DCT)
🔸 3. Lossless Compression Techniques
A. Run-Length Encoding (RLE)
• Stores runs of repeated pixel values as (value, count) pairs.
• Works best for simple images with large uniform areas.
B. Huffman Coding
• Assigns shorter binary codes to frequent pixel values.
• Based on a binary tree where each leaf node represents a symbol.
C. Predictive Coding
• Predicts pixel values from neighbors and stores only the error.
• Good when pixels are highly similar (low contrast images).
🔸 4. Lossy Compression Techniques
A. Transform Coding
• Uses mathematical transforms like DCT (Discrete Cosine Transform) to convert image to
frequency components.
• High-frequency parts (fine details) are discarded.
B. Quantization
• Groups nearby pixel values into ranges and stores them with fewer bits.
• Reduces file size but introduces small errors.
C. JPEG Compression
• A standard lossy method.
• Steps:
1. Divide image into 8×8 blocks
2. Apply DCT
3. Quantize DCT coefficients
4. Encode remaining values using Huffman coding
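A toy sketch of steps 1–3 on a single 8×8 block using SciPy's DCT; the uniform quantization step of 16 is a simplification (real JPEG uses a full quantization table and then Huffman entropy coding).

```python
import numpy as np
from scipy.fftpack import dct, idct

block = np.random.randint(0, 256, (8, 8)).astype(np.float32) - 128  # level shift

# Step 2: 2-D DCT of the 8x8 block
coeffs = dct(dct(block.T, norm='ortho').T, norm='ortho')

# Step 3: quantize (here a single step size of 16 instead of JPEG's table)
q = 16
quantized = np.round(coeffs / q)

# Decoder side: dequantize and invert the DCT (small errors remain -> lossy)
restored = idct(idct((quantized * q).T, norm='ortho').T, norm='ortho')
```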
🔸 5. Compression Metrics
A. Compression Ratio
\text{Compression Ratio} = \frac{\text{Original Size}}{\text{Compressed Size}}
Higher ratio = better compression.
B. Bit Rate
• Average bits per pixel.
• Lower bit rate = more compression.
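A quick worked example of both metrics (the compressed size is made up):

```python
original_bits = 512 * 512 * 8        # 8-bit, 512x512 image = 2,097,152 bits
compressed_bits = 262_144            # hypothetical compressed size

ratio = original_bits / compressed_bits   # 8.0 -> an 8:1 compression ratio
bit_rate = compressed_bits / (512 * 512)  # 1.0 bit per pixel
```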
✅ Summary Table
| Compression Type | Description | Use Case |
| --- | --- | --- |
| Lossless | No data loss | Medical, Legal |
| Lossy | Some data discarded | Photos, Web images |
| RLE | Store repeated pixels as a pair | Simple images |
| Huffman | Short codes for common symbols | Any data |
| DCT (Transform) | Converts image to frequency domain | JPEG compression |
Image Compression Models
🔸 Key Idea:
The model uses mathematical or logical techniques to remove redundant data and represent the
image more efficiently, either losslessly or lossily.
🔸 1. Source Encoder/Decoder
🔹 Process:
1. Source encoder converts pixel values into compact binary codes.
2. Source decoder reconstructs the exact pixel values from those codes (for lossless) or
approximate values (for lossy).
Used in both lossless (e.g., PNG) and lossy (e.g., JPEG with Huffman stage)
compression.
🔸 2. Channel Encoder/Decoder
🔹 Process:
1. Channel encoder adds error-correcting codes.
2. Channel decoder detects and corrects errors during data transmission.
Common in satellite imaging, wireless communication, where bit errors may occur.
🔸 3. Linear Transform Model
🔹 Concept:
• Converts image from spatial domain (pixels) to frequency domain.
• High-frequency details are usually less important and can be discarded (in lossy
compression).
🔹 Techniques:
• DCT (Discrete Cosine Transform) – Used in JPEG
• DFT (Discrete Fourier Transform)
• DWT (Discrete Wavelet Transform) – Used in JPEG 2000
🔹 Steps:
1. Apply transform (e.g., DCT) to image blocks.
2. Remove or quantize small coefficients (lossy).
3. Store or transmit remaining data.
4. Apply inverse transform during decompression.
Best for lossy image compression.
🔸 4. Statistical Model
🔹 Concept:
• Uses probability theory to predict the occurrence of pixel values.
• Based on Markov models or context-based prediction.
🔹 Process:
1. Predict next pixel value based on previous ones.
2. Encode the difference between actual and predicted value.
3. Use statistical coding (e.g., Arithmetic Coding).
Works well in predictive coding and context modeling.
✅ Summary Table
| Model | Purpose | Used In |
| --- | --- | --- |
| Source Encoder/Decoder | Minimize redundancy | Huffman, Arithmetic, JPEG |
| Channel Encoder/Decoder | Handle transmission errors | Wireless/Satellite compression |
| Linear Transform Model | Frequency-based compression | JPEG, JPEG 2000 (DCT, DWT) |
| Statistical Model | Predict pixel values & compress differences | Predictive coding, context modeling |
Lossless Compression
🔸 Key Point:
After decompression, you get back the exact original image—no changes, no loss.
B. Spatial Redundancy
• Neighboring pixels often have similar intensities.
• Instead of storing all pixels, store the difference between neighboring pixels (used in
predictive coding).
• Best for images with large areas of constant color, like scanned documents or icons.
B. Huffman Coding
• A form of entropy coding.
• Builds a binary tree where frequent symbols get short codes.
• Example codes for three symbols (the most frequent gets the shortest code):
• Symbol A (most common): 0
• Symbol B: 10
• Symbol C: 11
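A compact Huffman-table builder in Python, using the standard heap-based construction (written here as an illustration; the exact codes depend on the input frequencies):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Return {symbol: bitstring} with shorter codes for frequent symbols."""
    heap = [[freq, i, [sym, ""]]
            for i, (sym, freq) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:              # left subtree gets a leading 0
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:              # right subtree gets a leading 1
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], next_id, *lo[2:], *hi[2:]])
        next_id += 1
    return dict(heap[0][2:])

print(huffman_codes("AAAABBC"))  # e.g. {'C': '00', 'B': '01', 'A': '1'}
```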
C. Arithmetic Coding
• Encodes an entire message as a single fractional number.
• Used in JPEG 2000.
D. LZW (Lempel-Ziv-Welch)
• Builds a dictionary of patterns found in the image.
• If a pattern repeats, it stores a reference to the dictionary instead of the actual data.
• Used in GIF and TIFF formats.
E. Predictive Coding
• Predict the value of a pixel based on its neighbors.
• Store only the difference (error) between actual and predicted pixel values.
• If pixels are similar, the difference is small → fewer bits needed.
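A tiny round-trip sketch of predictive coding with a left-neighbour predictor:

```python
import numpy as np

row = np.array([100, 101, 101, 103, 104], dtype=np.int16)

# Encoder: predict each pixel as its left neighbour, store only the errors
errors = np.diff(row, prepend=np.int16(0))  # [100, 1, 0, 2, 1] -> mostly small

# Decoder: cumulative sum of the errors restores the row exactly (lossless)
reconstructed = np.cumsum(errors)
assert np.array_equal(reconstructed, row)
```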
🔸 5. Advantages
• No information is lost.
• Perfect for sensitive or critical images.
• Many techniques are fast and easy to implement.
🔸 6. Disadvantages
• Lower compression ratio compared to lossy methods.
• Not suitable for natural images like photographs (where pixel values vary a lot).
✅ Summary Table
| Method | How It Works | Used In |
| --- | --- | --- |
| RLE | Stores runs of repeated values | Fax, icons, simple scans |
| Huffman Coding | Short codes for frequent values | PNG, inside JPEG |
| Arithmetic Coding | Encodes entire message as a single number | JPEG 2000 |
| LZW | Uses dictionary-based pattern matching | GIF, TIFF |
| Predictive Coding | Stores difference between actual & predicted | Compression engines |
Image Compression Standards – JPEG and MPEG
🔸 1. JPEG (Joint Photographic Experts Group) – Still Images
✅ Characteristics:
• High compression ratio.
• Some quality loss (depends on compression level).
• Widely used in cameras, websites, WhatsApp, etc.
🔸 2. MPEG (Moving Picture Experts Group) – Video
✅ Frame Types:
1. I-Frames (Intra-coded)
• Compressed like JPEG (self-contained)
2. P-Frames (Predictive-coded)
• Stores only changes from previous frame
3. B-Frames (Bi-directional)
• Stores changes using previous + next frame
✅ Compression Techniques:
• Temporal Compression: Removes redundancy between frames.
• Spatial Compression: Compresses within a frame (like JPEG).
• Motion Estimation: Tracks movement of objects across frames and stores the motion
vectors.
✅ MPEG Versions:
• MPEG-1: For VCD, MP3
• MPEG-2: For DVDs, TV broadcasting
• MPEG-4: For internet videos (YouTube, MP4 format)
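A toy NumPy illustration of temporal redundancy, the idea behind P-frames (no motion estimation here, just a frame difference; the frame contents are made up):

```python
import numpy as np

prev = np.random.randint(0, 256, (16, 16)).astype(np.int16)  # previous frame
curr = prev.copy()
curr[4:8, 4:8] += 5          # only a small patch changes between frames

# P-frame idea: transmit only the residual, which is mostly zeros
residual = curr - prev
reconstructed = prev + residual   # decoder adds it back to the last frame
assert np.array_equal(reconstructed, curr)
```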
🔸 3. Comparison Table
| Feature | JPEG | MPEG |
| --- | --- | --- |
| Type | Still images | Video (image + time) |
| Method | DCT + quantization + VLC | Frame-based + motion estimation |
| Output Format | .jpg, .jpeg | .mpg, .mp4, .avi, etc. |
| Compression | Spatial | Spatial + Temporal |
| Lossy | Yes | Yes |
🔸 4. Advantages
✅ JPEG
• Simple, fast, efficient
• Great for photos
• Adjustable quality (compression ratio)
✅ MPEG
• Excellent for video
• Reduces file size by removing repetition between consecutive frames
• Used in almost all video platforms
🔸 5. Limitations
❌ JPEG
• Not suitable for high-quality image editing (loses details)
• Artifacts (blurring, blockiness) at high compression
❌ MPEG
• Complex algorithms
• Loss of quality over repeated compression
• Needs synchronization between frames