
UNIT 1

Introduction to Image Formation - capture and representation - linear filtering - correlation - convolution - visual features and representations: Edge, Blobs, Corner Detection; Visual Feature Extraction: Bag-of-words, VLAD, RANSAC, Hough Transform

Introduction to Image Formation in Computer Vision


Definition:

Image formation is a fundamental concept in computer vision, which focuses on how digital
images are created and represented in a way that can be analyzed and processed by computers.
Understanding the image formation process is crucial for building algorithms that can interpret visual
data accurately.

Key Concepts in Image Formation:

1. Light and Illumination:

o Light is the primary source of information in image formation. It interacts with objects in the environment, and its reflection is captured by sensors (cameras).

o Key properties of light include intensity, wavelength (color), and direction, all of
which affect the resulting image.

2. The Camera Model:

o Cameras simulate the human eye to capture images of the 3D world. The pinhole
camera model is a simple mathematical model widely used in computer vision:

▪ The world is projected onto a 2D plane (image plane) through a single point
(pinhole).

▪ The relationship between 3D points in the scene and their corresponding 2D points in the image is governed by geometric transformations.

3. Projection Geometry:

o Perspective Projection: Objects farther from the camera appear smaller, creating
depth perception.

o Orthographic Projection: Parallel projection used for simplicity in some applications, ignoring perspective effects.

4. Image Representation:

o An image is a 2D matrix of pixels, where each pixel stores intensity or color values:

▪ Grayscale Images: Represent intensity values (single channel).

▪ Color Images: Represent RGB values (three channels: Red, Green, Blue).

5. Image Formation Pipeline:

o Scene Illumination: Light source illuminates objects.


o Interaction with Objects: Light is reflected, absorbed, or scattered based on object
properties.

o Camera Lens: Collects light and focuses it onto the image sensor.

o Image Sensor: Converts light into electrical signals (e.g., CCD or CMOS sensors).

o Digital Image: Electrical signals are processed to create a digital image.

6. Radiometric and Photometric Properties:

o Radiometry: Measures light energy captured by the camera.

o Photometry: Relates light intensity to human perception.

o These properties influence brightness, contrast, and color in the image.

7. Distortions in Image Formation:

o Lens Distortions: Radial and tangential distortions caused by imperfections in the camera lens.

o Motion Blur: Caused by movement during image capture.

o Noise: Random variations in image data introduced during sensing or transmission.

8. Mathematical Tools:

o Homogeneous Coordinates: Simplify transformations by adding an extra coordinate to each point (e.g., a 2D point (x, y) becomes (x, y, 1)), so that translations and projections can be written as matrix multiplications (see the example after this list).

o Camera Calibration: The process of determining camera parameters to correct distortions and map between 3D and 2D spaces.
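For illustration, a minimal Python/NumPy sketch (not part of the original notes) of how homogeneous coordinates let a 2D translation be expressed as a single matrix multiplication; the point and the translation offsets are arbitrary example values:

import numpy as np

# A 2D point (x, y) written in homogeneous form (x, y, 1)
p = np.array([2.0, 3.0, 1.0])

# A translation by (tx, ty) becomes one 3x3 matrix multiplication,
# something an ordinary 2x2 linear transform cannot express.
tx, ty = 5.0, -1.0
T = np.array([[1.0, 0.0, tx],
              [0.0, 1.0, ty],
              [0.0, 0.0, 1.0]])

q = T @ p
x, y = q[0] / q[2], q[1] / q[2]   # back to ordinary coordinates
print(x, y)                       # -> 7.0 2.0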

How Image Formation Works in Computer Vision

Image formation is the process of converting a 3D real-world scene into a 2D digital representation
that can be analyzed by computers. The process involves several key steps:

1. Scene Illumination:

o Light from a source interacts with objects in the scene.

o The interaction depends on the material properties of the objects (reflective, absorptive, or refractive).

2. Light Projection:

o Light rays pass through the camera lens and project onto the image plane.

o The pinhole camera model or more complex lens models govern this projection.

3. Image Capture by Sensors:

o Light is converted into electrical signals using an image sensor (e.g., CCD or CMOS).

o Each pixel in the sensor captures light intensity (grayscale) or light of different
wavelengths (color).
4. Digital Image Processing:

o The electrical signals are digitized to form a matrix of pixel values.

o Additional processing, like correcting lens distortions and adjusting brightness or contrast, may be applied.

[Figure: Image sensing pipeline in a camera]

Advantages of Image Formation in Computer Vision:

1. Foundation for Advanced Applications:

o Enables core computer vision tasks like object detection, facial recognition, and 3D
scene reconstruction.

2. Realistic Scene Representation:

o Captures visual data that closely represents the physical world, making it intuitive for
humans and effective for AI models.

3. Automation Potential:

o Automates tasks like quality inspection, surveillance, and navigation that would
otherwise require human intervention.

4. Scalability:

o Once set up, systems leveraging image formation can process vast amounts of data
quickly and consistently.

5. Multi-Sensor Integration:
o Works with other sensors like LiDAR and depth cameras to create more robust
systems for 3D perception.

Limitations of Image Formation in Computer Vision:

1. Environmental Dependence:

o Varying lighting conditions, weather, and occlusions can degrade image quality and
affect accuracy.

2. Limited Depth Perception:

o Single cameras cannot capture depth information effectively. Stereo cameras or additional sensors (e.g., LiDAR) are required for 3D data.

3. Sensor Limitations:

o Dynamic range, resolution, and noise levels in sensors restrict the quality of captured
images.

4. Lens Distortions:

o Imperfections in lenses can introduce radial or tangential distortions, requiring calibration and correction.

5. Motion Blur:

o Fast-moving objects or camera motion can cause blurred images, affecting analysis.

6. High Computational Costs:

o Processing high-resolution images or video streams requires significant computational power and storage.

7. Data Ambiguity:

o Certain features may not be visible or may overlap, leading to ambiguities in interpretation.

8. Ethical Concerns:

o Privacy issues can arise, particularly in surveillance applications, where capturing images without consent is a concern.

Applications of Image Formation in Computer Vision:

1. 3D Reconstruction:

o Recovering the 3D structure of a scene from 2D images.

2. Object Detection and Recognition:

o Identifying and classifying objects in images.

3. Augmented Reality:

o Superimposing virtual objects onto real-world scenes.

4. Robotics:
o Helping robots perceive and navigate their environment.

5. Medical Imaging:

o Enhancing and analyzing images for diagnostics.

Capture and Representation in Computer Vision:


The process of converting a 3D real-world scene into a 2D digital representation in computer vision
can be divided into two fundamental stages:

1) image capture

2) image representation

Both stages are crucial for enabling computers to analyze and interpret visual data
effectively.

1. Image Capture:

Image capture involves the transformation of light from a scene into a digital format that can be
processed by a computer. This step includes:

A. Interaction of Light with the Scene

• Light originates from sources such as the sun, artificial lights, or ambient illumination.

• When light interacts with objects in the scene, the following phenomena occur:

o Reflection:

▪ Diffuse Reflection: Scattered uniformly in all directions; dominant in matte surfaces.

▪ Specular Reflection: Reflected in a single direction; observed in shiny surfaces.

o Absorption: Certain wavelengths are absorbed by the object, giving it color.

o Transmission and Refraction: Light passes through transparent objects and bends.

B. Optics and Projection

• Camera Lens:

o Focuses incoming light onto the image plane (or sensor).

o The lens introduces projection effects such as perspective, which impacts the
appearance of objects in the captured image.

• Projection Models:

o Pinhole Camera Model:

▪ Simplest model, where light rays pass through a small aperture (pinhole) to
form an inverted image on the image plane.

▪ 3D points in the scene (X, Y, Z) are mapped to 2D points (x, y) on the image plane by x = f·X/Z and y = f·Y/Z, where f is the focal length (a short sketch follows at the end of this list).

o Lens-Based Model:

▪ Accounts for real-world lens effects, such as magnification, distortion, and chromatic aberration.
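A minimal Python sketch of the pinhole projection above; the focal length and the 3D point are arbitrary example values, not from the source:

# Pinhole projection: a 3D point (X, Y, Z) in camera coordinates maps to
# image-plane coordinates x = f*X/Z, y = f*Y/Z.
def project_pinhole(X, Y, Z, f):
    if Z <= 0:
        raise ValueError("Point must be in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z

# Example: a point 2 units away, with focal length 0.05 (same units as X, Y, Z)
x, y = project_pinhole(0.4, 0.2, 2.0, 0.05)
print(x, y)  # -> 0.01 0.005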

C. Image Sensors

• Light is captured by sensors that convert photons into electrical signals:

o Charge-Coupled Device (CCD):

▪ Provides high-quality, low-noise images.

▪ Typically used in professional imaging.

o Complementary Metal-Oxide-Semiconductor (CMOS):

▪ Consumes less power and allows for on-chip processing.

▪ Common in consumer cameras and smartphones.

• Each pixel on the sensor measures light intensity (grayscale) or light intensity for different
wavelengths (color).

D. Analog-to-Digital Conversion

• The electrical signals from the sensor are digitized into discrete pixel values:

o Grayscale Images: Represent light intensity using a single value per pixel (e.g., 0–255
for 8-bit images).

o Color Images: Represent light intensity for different wavelengths, commonly stored
as RGB (Red, Green, Blue) triplets.

2. Image Representation:

Once the image is captured, it is represented in a format that can be processed by computer vision
algorithms. Representation involves the organization and encoding of pixel data to extract
meaningful information.

A. Pixel Grid

• An image is represented as a 2D matrix of pixels:

o Grayscale Images: Each pixel stores a single intensity value.


o Color Images: Each pixel stores three values corresponding to red, green, and blue
intensities (RGB).
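A small sketch, assuming NumPy, of how grayscale and color images are stored as pixel grids; the sizes and pixel values are arbitrary examples:

import numpy as np

# Grayscale: a 2D array, one 8-bit intensity per pixel (0 = black, 255 = white)
gray = np.zeros((480, 640), dtype=np.uint8)
gray[100, 200] = 255                     # set one pixel to white

# Color: a 3D array with a third axis for the R, G, B channels
rgb = np.zeros((480, 640, 3), dtype=np.uint8)
rgb[100, 200] = [255, 0, 0]              # set one pixel to pure red

print(gray.shape, rgb.shape)             # (480, 640) (480, 640, 3)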

B. Image Coordinate System

• Each pixel in the image has a position (x,y) in the image coordinate system.

• The origin (0,0) is usually at the top-left corner of the image.

• Pixel coordinates are discrete, while the physical world is continuous.

C. Resolution

• Defined by the number of pixels in the image (e.g., 1920x1080).

• Higher resolution provides finer detail but requires more storage and processing power.

D. Intensity and Color Representation

• Grayscale:

o Represents brightness levels on a scale (e.g., 0 for black, 255 for white in 8-bit
images).

• Color:

o RGB format is most common, with each channel typically stored as an 8-bit value (0–
255).

o Other color spaces, like HSV (Hue, Saturation, Value) or YUV, are used for specific
applications.

E. Depth Information (Optional)

• Some systems capture depth along with intensity or color using techniques like stereo vision,
LiDAR, or structured light.

• Depth is stored as an additional channel, resulting in 3D representations (e.g., point clouds).

F. High Dynamic Range (HDR)

• Standard cameras have limited dynamic range, which may cause loss of detail in very bright
or dark areas.

• HDR imaging combines multiple exposures to capture a wider range of intensities.


Methods of Image Capture and Representation in Computer Vision:

1. Image Capture Methods

Various methods are used to acquire visual data, depending on the application and the type of
information required:

1. Standard 2D Cameras:

o Captures 2D images using CCD or CMOS sensors.

o Applications: Object detection, facial recognition, and general imaging.

2. Stereo Cameras:

o Consist of two cameras positioned apart to simulate binocular vision.

o Captures depth information by analyzing disparities between two images.

o Applications: 3D reconstruction, robotics, and autonomous vehicles.

3. Depth Cameras:

o Uses techniques like structured light, time-of-flight (ToF), or LiDAR to capture depth.

o Applications: Gesture recognition, AR/VR, and environment mapping.

4. Multispectral and Hyperspectral Cameras:

o Captures images in multiple spectral bands beyond the visible range (e.g., infrared,
ultraviolet).

o Applications: Remote sensing, agriculture, and medical diagnostics.

5. High-Speed Cameras:

o Captures a large number of frames per second to analyze fast-moving objects.

o Applications: Sports analysis, scientific experiments, and industrial inspection.

6. Thermal Cameras:

o Detects infrared radiation to create images based on temperature.

o Applications: Night vision, surveillance, and heat detection.

2. Image Representation Methods

1. Pixel-Based Representation:

o Images are represented as a grid of pixels, each storing intensity or color values.

o Common formats: Grayscale, RGB, YUV, HSV.

2. Feature-Based Representation:

o Represents key features (e.g., edges, corners, textures) instead of raw pixel data.

o Used in tasks like feature matching and object detection.

3. Sparse Representations:
o Focuses only on important areas or features, reducing data size.

o Applications: Compression and efficient storage.

4. Graph-Based Representations:

o Models an image as a graph where pixels or regions are nodes, and edges represent
relationships.

o Applications: Image segmentation, object tracking.

5. 3D Representations:

o Captures geometric data, such as depth maps, point clouds, or meshes.

o Applications: 3D modeling, AR/VR, and autonomous navigation.

6. Fourier and Wavelet Transforms:

o Represents images in frequency or multi-resolution domains.

o Applications: Image compression, filtering, and enhancement.

Challenges in Capture and Representation:

1. Environmental Factors:

o Lighting conditions, occlusions, and shadows can degrade image quality.

2. Sensor Noise:

o Introduced during capture, such as thermal noise or quantization noise.

3. Projection Loss:

o Depth information is lost during the 3D-to-2D mapping.

4. Computational and Storage Costs:

o High-resolution images require significant resources for processing and storage.

Applications of Image Capture and Representation in Computer Vision:

1. Object Detection and Recognition

• Use: Identifying and classifying objects in images.

• Examples:

o Facial recognition for security systems.

o Vehicle detection for traffic monitoring.

o Product recognition in e-commerce.

2. 3D Reconstruction

• Use: Creating 3D models of objects or scenes.

• Examples:
o Archaeological site reconstruction.

o Medical imaging for creating anatomical models.

o 3D mapping in urban planning.

3. Autonomous Vehicles

• Use: Navigating and understanding the environment using cameras and sensors.

• Examples:

o Lane detection.

o Obstacle and pedestrian recognition.

o Depth estimation for path planning.

4. Augmented Reality (AR) and Virtual Reality (VR)

• Use: Integrating virtual elements with the real world or creating immersive environments.

• Examples:

o AR gaming.

o Virtual training simulators.

o Remote collaboration tools.

5. Medical Imaging

• Use: Diagnosing and analyzing medical conditions through image analysis.

• Examples:

o X-ray, MRI, and CT scan interpretation.

o Tumor detection and segmentation.

o Retinal image analysis for diabetes.

6. Surveillance and Security

• Use: Monitoring environments for safety and security.

• Examples:

o Intruder detection.

o Crowd analysis.

o License plate recognition.

7. Industrial Automation

• Use: Quality inspection and process automation in manufacturing.

• Examples:

o Detecting defects in products.


o Monitoring assembly lines.

o Robotics for material handling.

8. Agriculture

• Use: Monitoring crop health and optimizing farming practices.

• Examples:

o Disease detection in plants.

o Yield estimation from aerial imagery.

o Precision farming using drone-captured images.

9. Entertainment

• Use: Creating realistic visual effects and animations.

• Examples:

o Motion capture for movies and video games.

o Photo editing and enhancement.

o Content generation for social media.

10. Environmental Monitoring

• Use: Tracking and analyzing environmental changes.

• Examples:

o Monitoring deforestation using satellite imagery.

o Tracking wildlife movement.

o Analyzing climate patterns.

Linear Filtering Basics in Computer Vision:


Linear filtering is a fundamental operation in image and signal processing, widely used
for tasks such as noise reduction, edge detection, and image enhancement. The term "linear" refers
to the principle that the filtering operation satisfies the properties of linearity: additivity and
homogeneity.

What is Linear Filtering?

Linear filtering involves modifying the value of a pixel (or a data point in general) by
applying a mathematical function that depends linearly on the values of its neighboring pixels. The
output at each point is a weighted sum of the input values, where the weights are defined by a filter
kernel (or mask).
Steps in Linear Filtering

1. Choose a Filter Kernel:

o A small matrix of weights (e.g., 3×3, 5×5) that defines the transformation.

o Examples:

▪ Box Filter: Averages pixel values in the neighborhood.

▪ Gaussian Filter: Applies a Gaussian weighting for smoothing.

▪ Sobel Filter: Highlights edges by emphasizing gradient directions.

2. Apply Convolution or Correlation:

o Convolution: Flips the kernel before applying it to the image.

o Correlation: Directly slides the kernel across the image without flipping.

o For each pixel, compute the sum of the product of the kernel values and the
corresponding pixel values in the neighborhood.

3. Handle Image Borders:

o Options include padding with zeros, mirroring, or extending edge values to deal with
regions where the kernel extends beyond the image boundary.

4. Produce the Output Image:

o The result is an image with the same dimensions (or slightly reduced if no padding is
applied), where each pixel value reflects the weighted sum of its neighborhood.

Common Linear Filters:

1. Smoothing Filters:

o Purpose: Reduce noise by averaging pixel values.

o Example: Box filter.


o Kernel (3×3 box filter): (1/9) × [1 1 1; 1 1 1; 1 1 1]

Gaussian Filter:

• Purpose: Smooth the image while preserving edges better than the box filter.

• Kernel weights follow the 2D Gaussian function: G(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²)), where σ controls the amount of smoothing.

Edge Detection Filters:

• Purpose: Detect edges by emphasizing intensity gradients.

• Examples: Sobel, Prewitt filters.

• Sobel x-direction kernel: [−1 0 1; −2 0 2; −1 0 1]

Sharpening Filters:

• Purpose: Enhance edges and fine details.

• Example: Laplacian filter.

• Kernel (3×3 Laplacian): [0 1 0; 1 −4 1; 0 1 0]

Applications of Linear Filtering:

1. Image Smoothing:

o Reduces noise in images.

o Example: Preprocessing for facial recognition or object detection.

2. Edge Detection:
o Identifies boundaries of objects.

o Example: Used in medical imaging to detect tumor edges.

3. Feature Extraction:

o Enhances specific patterns like edges, corners, or textures.

o Example: Optical character recognition (OCR).

4. Image Enhancement:

o Improves visual quality by reducing blurriness or noise.

o Example: Digital photography post-processing.

5. Data Preprocessing:

o Smooths data for machine learning models.

Limitations of Linear Filtering:

1. Loss of Detail:

o Smoothing filters can blur edges and remove fine details.

2. Not Effective for Complex Noise:

o Linear filters cannot handle non-Gaussian or non-linear noise effectively.

3. Edge Artifacts:

o Edges near the border may be distorted due to padding methods.

4. Limited Context Awareness:

o Only considers local neighborhoods, which may not capture larger patterns.

Example:
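A minimal sketch of linear filtering with a 3×3 box (averaging) kernel, applied by sliding the kernel over the image with zero padding; it assumes NumPy, and the input values are arbitrary toy data:

import numpy as np

def linear_filter(image, kernel):
    """Apply a linear filter by correlation: each output pixel is the
    weighted sum of its neighborhood, with zero padding at the borders."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            region = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(region * kernel)
    return out

box = np.ones((3, 3)) / 9.0              # 3x3 box filter (simple average)
image = np.array([[0, 0, 0, 0],
                  [0, 9, 9, 0],
                  [0, 9, 9, 0],
                  [0, 0, 0, 0]], dtype=float)
print(linear_filter(image, box))         # the bright block is smoothed (averaged)

Using a Gaussian or Sobel kernel in place of the box kernel gives smoothing or edge responses with exactly the same sliding-window procedure.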

Correlation:
Correlation is a statistical measure that describes the extent to which two variables are linearly
related. In simpler terms, it quantifies the strength and direction of the relationship between two
data sets. Correlation is widely used in various fields, including statistics, machine learning, and
computer vision, to understand dependencies and interactions between variables.

Key Characteristics of Correlation:

1. Direction:

o Positive Correlation: As one variable increases, the other tends to increase.

o Negative Correlation: As one variable increases, the other tends to decrease.

o No Correlation: No consistent relationship between the variables.

2. Magnitude:

o Correlation values range from -1 to +1.

▪ +1: Perfect positive correlation.

▪ 0: No correlation.

▪ -1: Perfect negative correlation.

Types of Correlation:

1. Pearson Correlation Coefficient (r):

o Measures linear relationships between continuous variables.

o Formula: r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²), where x̄ and ȳ are the sample means.

2. Spearman Rank Correlation:

• Measures the strength and direction of a monotonic relationship between ranked variables.

• Used when data is not normally distributed or relationships are non-linear.

3. Kendall’s Tau:

• Measures the association between two variables based on the ranking of data.

4. Cross-Correlation:

• Measures similarity between two signals as a function of time-lag.

• Common in signal processing and image analysis.
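A small sketch, assuming NumPy, that computes a Pearson coefficient and a cross-correlation for two toy sequences; the data values are arbitrary examples:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # roughly y = 2x, so r should be near +1

# Pearson correlation coefficient via the correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))                           # close to 1.0

# Cross-correlation of two signals as a function of lag
a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0])     # the same pulse, shifted by one sample
lags = np.correlate(a, b, mode="full")       # correlation value at every lag
print(np.argmax(lags) - (len(a) - 1))        # -> -1: peak sits one step from zero lag,
                                             #    reflecting the one-sample shift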

Correlation in Machine Learning:

1. Feature Selection:

o Helps identify redundant or highly correlated features.

2. Predictive Modeling:
o Indicates dependencies that may improve model performance.

3. Interpretability:

o Highlights relationships between input variables and target outputs.

Correlation in Computer Vision:

In computer vision, correlation is used for:

1. Template Matching:

o Measures the similarity between an image template and regions in a larger image (a small sketch follows after this list).

2. Feature Matching:

o Identifies corresponding features in two images.

3. Optical Flow:

o Tracks pixel intensity patterns across video frames.
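A minimal sketch of correlation-based template matching using normalized cross-correlation; the arrays are toy examples, and a real system would more likely call a library routine (e.g., OpenCV's matchTemplate) than this loop:

import numpy as np

def match_template(image, template):
    """Slide the template over the image and score each position with
    normalized cross-correlation; return the best (row, col) and its score."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    best_score, best_pos = -np.inf, (0, 0)
    for i in range(ih - th + 1):
        for j in range(iw - tw + 1):
            window = image[i:i + th, j:j + tw]
            w = window - window.mean()
            denom = np.sqrt((w ** 2).sum() * (t ** 2).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_pos = score, (i, j)
    return best_pos, best_score

image = np.zeros((8, 8))
image[3:5, 4:6] = 1.0                     # a small bright patch hidden in the image
template = image[2:6, 3:7].copy()         # a 4x4 crop containing the patch
print(match_template(image, template))    # -> best match at (2, 3), score 1.0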

Advantages of Correlation:

1. Simplicity:

o Easy to calculate and interpret, especially for linear relationships.

2. Quantifies Relationships:

o Provides a numerical value to represent the strength and direction of the relationship between two variables.

3. Feature Analysis:

o Identifies dependencies between variables, useful in data exploration.

4. Signal Processing:

o Measures similarity between signals or patterns, useful in cross-correlation and template matching.

5. Predictive Modeling:

o Helps identify predictors and reduce multicollinearity in machine learning models.

Disadvantages of Correlation:

1. Linear Relationships Only:

o Correlation measures only linear dependencies and may not detect non-linear
relationships.

2. No Causation:

o Correlation does not imply causation; two correlated variables may be influenced by
a third factor.

3. Sensitivity to Outliers:
o Extreme values can distort the correlation coefficient, leading to misleading
interpretations.

4. Data Scale Dependency:

o Requires normalization or standardization for variables with different units or scales.

5. Limited in High Dimensions:

o Pairwise correlation analysis may not effectively capture complex interactions in high-dimensional datasets.

Applications of Correlation:

1. Statistics and Data Analysis

• Understanding relationships between variables.

• Identifying redundant or dependent variables in datasets.

2. Machine Learning

• Feature Selection: Removing highly correlated features to reduce redundancy.

• Feature Engineering: Identifying relevant input features for predictive models.

• Model Evaluation: Analyzing correlation between predicted and actual values.

3. Signal Processing

• Cross-Correlation: Comparing signals for similarity or time shifts.

• Pattern Recognition: Identifying patterns in data streams, such as audio or seismic signals.

4. Computer Vision

• Template Matching: Locating a template within an image using correlation-based similarity.

• Feature Matching: Matching corresponding points or features in different images (e.g., in stereo vision).

• Image Registration: Aligning multiple images based on correlated regions.

5. Finance and Economics

• Analyzing relationships between financial indicators (e.g., stock prices and interest rates).

• Measuring market dependencies and diversifying portfolios.

6. Bioinformatics

• Studying gene expression patterns or protein-protein interactions.

• Understanding correlations in biological datasets.

7. Social Sciences

• Exploring relationships between demographic or behavioral variables.

• Analyzing survey data for trends and dependencies.


8. Environmental Science

• Examining correlations between weather variables (e.g., temperature and humidity).

• Analyzing the relationship between pollution and health metrics.

Convolution:
Convolution is a mathematical operation that combines two functions to produce a third
function. In the context of images, convolution involves a small matrix called a kernel or filter sliding
over an image to perform operations like edge detection, blurring, or sharpening.

Mathematical Representation:

For an image I and a kernel K, the discrete 2D convolution is (I * K)(i, j) = Σ_m Σ_n I(i − m, j − n) · K(m, n). Correlation uses I(i + m, j + n) instead, which is why convolution is often described as correlation with a flipped kernel.

How Convolution Works:

1. Kernel/Filter:

o A small matrix (e.g., 3×3, 5×5) with predefined or learned weights.

o Examples: Edge detection filter, Gaussian blur, etc.

2. Sliding Window:

o The kernel slides over the image pixel by pixel.

o At each position, element-wise multiplication is performed between the kernel and the overlapping image region.

3. Aggregation:

o The results of the multiplication are summed up to produce a single value.

4. Output:

o The output is a feature map (or activation map) highlighting specific patterns or
features.
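A minimal sketch of the sliding-window convolution described above, written with NumPy; note the kernel flip that distinguishes convolution from correlation. The input values and the choice of a Sobel kernel are arbitrary examples:

import numpy as np

def convolve2d(image, kernel):
    """Valid 2D convolution: flip the kernel, slide it over the image,
    and sum the element-wise products at each position."""
    k = np.flipud(np.fliplr(kernel))          # flip the kernel (convolution)
    kh, kw = k.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
print(convolve2d(image, sobel_x))             # gradient response (sign flipped
                                              # relative to plain correlation)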
Key Components in Convolutional Operations:

1. Stride:

o The step size by which the kernel moves.

o Larger strides reduce the output size, capturing more abstract features.

2. Padding:

o Adds extra pixels around the image to control the output size.

o Types:

▪ Valid Padding: No padding (output size shrinks).

▪ Same Padding: Padding added to preserve the input size.

3. Channels:

o Handles multi-channel images (e.g., RGB).

o Each kernel applies convolution to individual channels, and the results are
aggregated.
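For reference, the standard relation between these components (a general formula, not specific to these notes): for an input of size N, kernel size K, padding P, and stride S, the output size along that dimension is floor((N + 2P − K) / S) + 1. For example, a 7×7 input with a 3×3 kernel, padding 1, and stride 2 gives floor((7 + 2 − 3) / 2) + 1 = 4, i.e., a 4×4 feature map.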

Why Convolution is Important in Computer Vision?

1. Feature Extraction:

o Identifies patterns such as edges, textures, and shapes in images.

2. Translation Invariance:

o The same filter is applied across the image, ensuring features are detected regardless
of location.

3. Parameter Efficiency:

o Reduces the number of parameters compared to fully connected layers, making models computationally efficient.
Advantages of Convolution in Computer Vision:

1. Efficient Representation:

o Captures spatial and hierarchical features with fewer parameters.

2. Scalable:

o Works for small and large images.

3. Universal Applicability:

o Applicable to various tasks like detection, segmentation, and classification.

Challenges of Convolution:

1. Computational Intensity:

o Requires high computational resources, especially for large kernels.

2. Limited Receptive Field:

o Each convolution captures local features, requiring deeper layers for global
understanding.

3. Overfitting:

o May occur without proper regularization techniques (e.g., dropout, weight decay).

Applications of Convolution in Computer Vision:

1. Image Processing:

o Edge Detection: Sobel, Prewitt, and Canny filters.


o Blurring/Sharpening: Gaussian blur or sharpening filters.

2. Object Detection:

o Identifying objects in an image using CNNs (e.g., YOLO, Faster R-CNN).

3. Image Segmentation:

o Partitioning images into meaningful regions (e.g., U-Net).

4. Feature Matching:

o Comparing features across images for tasks like panorama stitching or 3D reconstruction.

5. Facial Recognition:

o Using convolutional layers to extract facial features for identification.

6. Generative Models:

o GANs use convolutions for creating new images or altering existing ones.

7. Image Classification:

o CNNs use multiple convolution layers to classify objects.
