
UNIT I INTRODUCTION TO IMAGE FORMATION AND PROCESSING
Computer Vision – Geometric primitives and transformations – Photometric image formation – The
digital camera – Point operators – Linear filtering – More neighborhood operators – Fourier
transforms – Pyramids and wavelets – Geometric transformations – Global optimization.

Computer Vision
Computer Vision is a multidisciplinary field that enables computers to interpret and
understand visual information from the world, just like humans do with their eyes and brains. It
involves processing, analyzing, and making sense of images and videos to extract meaningful insights,
recognize objects, understand scenes, and even perform actions based on visual data. Computer
Vision combines techniques from various domains, such as computer science, mathematics, physics,
and psychology, to enable machines to perceive and interpret visual content.

Key Concepts in Computer Vision:

1. Image Acquisition: Computer Vision starts with obtaining visual data. This can be achieved through
various devices like cameras, scanners, drones, or sensors. Images are captured as grids of pixels, each pixel
representing a tiny unit of color or intensity.

2. Image Processing: This involves manipulating and enhancing images to improve their quality or extract
relevant features. Common techniques include noise reduction, image sharpening, and contrast adjustment.

3. Feature Extraction: In order to understand and identify objects in an image, relevant features must be
extracted. Features can be edges, corners, textures, colors, shapes, or more complex patterns.


4. Image Analysis and Understanding: This stage involves interpreting the extracted features to understand
the content of the image. For instance, detecting objects, recognizing faces, or identifying specific patterns
within an image.

5. Object Detection: Identifying and localizing objects within an image or video stream. This is commonly
used in applications like self-driving cars, where the system needs to detect pedestrians, vehicles, traffic
signs, etc.

6. Image Classification: Assigning a label or category to an image. For example, recognizing whether an
image contains a cat or a dog.

7. Object Tracking: Following the movement of an object over a sequence of frames in a video. This is
crucial in applications like surveillance and robotics.

8. Semantic Segmentation: Assigning a semantic label (e.g., "road," "building," "tree") to each pixel in an
image. This is often used for detailed scene understanding.

9. 3D Reconstruction: Creating a three-dimensional model of a scene or object from multiple images taken
from different viewpoints.

10. Motion Analysis: Analyzing the motion of objects in videos to understand patterns, behaviors, or
anomalies. This is used in applications like action recognition and video surveillance.

11. Deep Learning: In recent years, deep neural networks, particularly Convolutional Neural Networks
(CNNs), have revolutionized computer vision. They can automatically learn hierarchical features from raw
pixel data, enabling better performance in tasks like image recognition and object detection.

12. Applications: Computer Vision has a wide range of applications across industries. Some examples
include facial recognition, medical image analysis (like diagnosing diseases from medical scans),
autonomous vehicles, augmented reality, quality control in manufacturing, and more.

13. Challenges: Computer Vision faces challenges such as handling variations in lighting, viewpoint, and
occlusions (parts of objects being hidden). Building models that generalize well across different conditions
is a complex task.
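
As a small illustration of the first two concepts above, the following sketch (assuming OpenCV and NumPy are installed; the file name photo.jpg is only a placeholder) loads an image as a grid of pixels and applies basic processing steps:

```python
import cv2          # OpenCV for image I/O and processing
import numpy as np

# 1. Image acquisition: read the image as a grid of pixels (a NumPy array).
#    "photo.jpg" is a placeholder file name for this sketch.
img = cv2.imread("photo.jpg")          # shape: (height, width, 3), dtype: uint8
print(img.shape, img.dtype)

# 2. Image processing: simple noise reduction and contrast adjustment.
denoised = cv2.GaussianBlur(img, (5, 5), 0)        # smooth out sensor noise
gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)  # convert to grayscale
equalized = cv2.equalizeHist(gray)                 # stretch the contrast

cv2.imwrite("enhanced.jpg", equalized)
```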

Computer Vision has progressed significantly due to advancements in hardware capabilities, algorithm development, and the availability of large datasets. It is a rapidly evolving field with continuous research and innovation, making machines more capable of perceiving and understanding visual data, and bringing us closer to achieving tasks that were once exclusive to human vision.


Geometric Primitives and Transformations


Geometric primitives and transformations are fundamental concepts in computer vision that
enable the representation, manipulation, and analysis of visual data, such as images and videos.
These concepts help to describe the structure and relationships within visual scenes and facilitate
tasks like object recognition, tracking, and 3D reconstruction.

Geometric Primitives:

Geometric primitives are basic shapes or components that are used to represent objects and
structures in a visual scene. They serve as building blocks for more complex shapes and allow us to
describe the world in terms of simple elements. Some common geometric primitives include:

1. Points: Represented by their coordinates (x, y), points are fundamental elements that define positions in
a 2D or 3D space.

2. Lines and Line Segments: Lines are defined by two points and extend infinitely in both directions. Line
segments are portions of lines bounded by two points.

3. Curves: Curves can be represented by mathematical equations and can include circles, ellipses, and
Bézier curves.

4. Polygons: Closed shapes formed by connecting multiple line segments.

5. Planes: Defined by three points or a point and a normal vector, planes are flat surfaces in 3D space.

6. Solids: Represent three-dimensional objects, such as cubes, spheres, and cylinders.

Geometric Transformations:

Geometric transformations involve altering the position, size, orientation, or shape of geometric
primitives in a visual scene. These transformations are used to manipulate visual data and enable
various computer vision tasks. Some common types of geometric transformations include:

1. Translation: Moves an object from one location to another by adding a constant value to its coordinates.
Translation is often represented as (x + dx, y + dy) in 2D.


2. Rotation: Rotates an object around a specified point or axis. The rotation angle determines the amount
of rotation.

3. Scaling: Changes the size of an object by multiplying its coordinates by scaling factors. Scaling can be
uniform (equal in all directions) or non-uniform.

4. Shearing: Skews or distorts an object along one axis, causing it to change shape.

5. Reflection: Creates a mirror image of an object by reversing its coordinates along a specified axis.

6. Affine Transformation: Combines translation, rotation, scaling, and shearing to perform more complex
transformations while preserving parallel lines.

7. Projective Transformation: Handles more general transformations that include perspective effects, such
as foreshortening and vanishing points.
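
As an illustrative sketch of the transformations listed above (using plain NumPy; the point and parameter values are arbitrary), 2D transformations can be written as 3x3 matrices acting on homogeneous coordinates and composed by matrix multiplication:

```python
import numpy as np

# A 2D point in homogeneous coordinates: (x, y, 1).
p = np.array([2.0, 1.0, 1.0])

# Translation by (dx, dy).
dx, dy = 3.0, -1.0
T = np.array([[1, 0, dx],
              [0, 1, dy],
              [0, 0, 1]])

# Rotation by an angle theta about the origin.
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])

# Uniform scaling by a factor s.
s = 2.0
S = np.array([[s, 0, 0],
              [0, s, 0],
              [0, 0, 1]])

# An affine transform can be built by composing these matrices.
A = T @ R @ S
p_transformed = A @ p
print(p_transformed[:2])   # transformed (x, y)
```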

These geometric transformations are crucial for various computer vision tasks:

- Image Registration: Aligning multiple images or frames to a common reference frame.

- Object Recognition: Matching features or shapes in different images despite variations in viewpoint or
scale.

- Camera Calibration: Determining the parameters of a camera's intrinsic and extrinsic properties to enable
accurate mapping of 3D world points to 2D image points.

- Augmented Reality: Overlaying virtual objects onto the real world by transforming them according to the
camera's perspective.

- 3D Reconstruction: Estimating the 3D structure of a scene by transforming 2D image points into 3D space.

- Image Warping: Distorting images or videos to simulate different viewpoints, effects, or perspectives.

In summary, geometric primitives and transformations play a crucial role in computer vision by
providing the tools to describe, manipulate, and analyze visual data in both 2D and 3D space, enabling
a wide range of applications in the field.


Photometric Image Formation


Photometric image formation, also known as the imaging process or image formation model,
refers to the process by which light from a scene interacts with a camera's optics and sensor to create
a digital image. Understanding this process is essential in computer vision for tasks such as image
analysis, object recognition, and 3D reconstruction. The photometric image formation process
involves several stages:

1. Illumination: The scene is illuminated by a light source, either natural (such as sunlight) or artificial (like
indoor lighting). Illumination affects the appearance of objects in the scene, their colors, and the intensity
of light they reflect or transmit.

2. Reflection and Transmission: When light interacts with objects in the scene, it can be reflected, absorbed,
or transmitted. The properties of the materials in the scene determine how they respond to different
wavelengths of light, leading to variations in appearance.

3. Camera Optics: The light from the scene enters the camera through its lens system. The camera optics
focus the incoming light onto the camera's sensor (or film in traditional cameras). The lens characteristics,
including aperture size and focal length, affect how the light is captured.

4. Sensor Response: The camera's sensor consists of individual photosensitive elements (pixels) that detect
the incoming light. Each pixel responds to the intensity of light falling on it. The sensor's response can vary
based on its sensitivity to different wavelengths and the exposure settings.

5. Quantization: The analog signal generated by the sensor needs to be digitized for storage and processing.
This involves converting the continuous range of light intensities into discrete values (usually in the form
of digital numbers), a process known as quantization.

6. Noise and Distortions: During the entire imaging process, various sources of noise and distortions can
affect the captured image. These can include sensor noise, lens aberrations, and atmospheric effects.

7. Color Representation: For color images, information about the distribution of different wavelengths of
light is captured. This can be achieved using different techniques, such as using multiple sensors or filters
to capture red, green, and blue channels, or using a single sensor with a color filter array (e.g., Bayer filter).
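
To make the reflection stage above concrete, one commonly used simplification (an assumption of this sketch, not the only possible model) is Lambertian shading, where the observed intensity is proportional to the cosine of the angle between the surface normal and the light direction:

```python
import numpy as np

def lambertian_intensity(albedo, normal, light_dir, light_intensity=1.0):
    """Simplified Lambertian reflection: I = albedo * L * max(0, n . l)."""
    n = normal / np.linalg.norm(normal)       # unit surface normal
    l = light_dir / np.linalg.norm(light_dir) # unit direction toward the light
    return albedo * light_intensity * max(0.0, float(np.dot(n, l)))

# Example: a surface facing straight up, lit from 45 degrees above the horizon.
I = lambertian_intensity(albedo=0.8,
                         normal=np.array([0.0, 0.0, 1.0]),
                         light_dir=np.array([0.0, 1.0, 1.0]))
print(I)   # intensity is larger when the light is closer to the surface normal
```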

The entire process of photometric image formation is complex and influenced by numerous factors,
such as the properties of the camera, the characteristics of the scene, the lighting conditions, and the
interaction of light with different materials. Because of these complexities, computer vision
researchers have developed mathematical models to simulate and understand the image formation
process accurately.


These models help in tasks like:

- Camera Calibration: Estimating the intrinsic and extrinsic parameters of a camera, which are essential for
mapping 3D world points to 2D image coordinates.

- Image Enhancement: Correcting for noise, distortions, and other artifacts introduced during the imaging
process.

- Color Correction: Adjusting colors to account for differences in lighting conditions or sensor
characteristics.

- Material and Lighting Estimation: Inferring properties of materials and lighting in the scene based on
captured images.

- Scene Understanding: Interpreting the observed images to infer the structure, content, and characteristics
of the scene.

In summary, photometric image formation is a fundamental concept in computer vision that encompasses the entire process from illumination to the creation of a digital image. Understanding this process is essential for accurate analysis and interpretation of visual data in various applications.

The Digital Camera


A digital camera is a device that captures and records images in a digital format. It plays a
significant role in computer vision as it serves as the primary tool for capturing visual data that can
be analyzed, processed, and understood by computers. The working principle of a digital camera
involves several components and processes:

Components of a Digital Camera:

1. Lens: The lens gathers light from the scene and focuses it onto the camera's sensor. The quality and
characteristics of the lens affect the overall image quality.

2. Aperture: The aperture is an adjustable opening that controls the amount of light entering the camera. It
affects the depth of field (the range of distances in focus) and the exposure of the image.


3. Shutter: The shutter controls the duration of time that light is allowed to hit the sensor. It opens and closes
to control the exposure of the image. A fast shutter speed freezes motion, while a slow shutter speed captures
motion blur.

4. Sensor: The image sensor is the electronic component that converts light into an electrical signal. The
two most common types of sensors are CCD (Charge-Coupled Device) and CMOS (Complementary Metal-
Oxide-Semiconductor). Each pixel on the sensor corresponds to a photosensitive site that measures the
intensity of incoming light.

5. ADC (Analog-to-Digital Converter): The analog signal generated by the sensor needs to be converted
into digital form for processing and storage. The ADC assigns a digital value to each analog signal, which
represents the intensity of light.

6. Processor: The camera's processor handles various tasks, including image processing, compression, and
the application of camera settings like white balance and color correction.

7. Memory Card: The digital image data is stored on a memory card, typically in formats like JPEG or
RAW. This storage allows for easy transfer of images to other devices.

8. Viewfinder or LCD Screen: The viewfinder (optical or electronic) allows the photographer to compose
the shot, while the LCD screen displays the captured image and camera settings.

Working Principle:

1. Light Capture: When you press the shutter button, the camera's aperture opens, allowing light to pass
through the lens. The lens focuses this light onto the image sensor.

2. Sensor Exposure: The light falling on the sensor generates an electrical charge at each pixel site. The
amount of charge corresponds to the intensity of light.

3. Analog-to-Digital Conversion: The analog charges from the sensor are converted into digital values using
the ADC. Each pixel's charge is quantized into a specific digital number, representing its brightness.

4. Image Processing: The camera's processor applies various adjustments to the digital image, including
white balance, color correction, noise reduction, and sharpening.

5. Storage: The processed image data is stored on a memory card in a specific file format (such as JPEG or
RAW). This allows easy transfer and sharing of the images.

6. Display: The captured image can be displayed on the camera's LCD screen for review.
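
The sensor exposure and analog-to-digital conversion steps above can be mimicked in a few lines of NumPy; the sensor values below are purely illustrative:

```python
import numpy as np

# Simulated continuous sensor response in the range [0, 1]
# (proportional to the light falling on each pixel).
analog = np.random.rand(4, 4)

# Analog-to-digital conversion: quantize to 8-bit digital numbers (0..255).
bits = 8
levels = 2 ** bits
digital = np.clip(np.round(analog * (levels - 1)), 0, levels - 1).astype(np.uint8)

print(digital)   # each pixel now holds a discrete brightness value
```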


Digital cameras provide a flexible and efficient way to capture visual data for computer vision
applications. They allow for precise control over exposure settings, image quality, and other
parameters, making them valuable tools for capturing images that can be analyzed by computers to
extract information, recognize objects, and understand scenes.

Point Operators
In computer vision, a point operator, often referred to as a point processing or point-wise
operation, is a type of image processing operation that operates on individual pixels of an image
independently, without considering their neighbors. The operation modifies the pixel values based
on a predefined rule or function, transforming the pixel values one by one. Point operators are simple
yet powerful tools used for tasks like image enhancement, contrast adjustment, and intensity
transformation.

Key Concepts of Point Operators:

1. Pixel Transformation: In a point operator, each pixel in the input image is transformed into a new value
in the output image based on a specific rule or function. The transformation function defines how the pixel
values are modified.

2. Function Mapping: Point operators involve mapping the original pixel values to new values using a
mathematical function. The function's behavior determines how the pixel values change.

3. Pixel-wise Operation: Since point operators consider only the individual pixel being processed, they operate pixel by pixel rather than over neighborhoods. They don't take into account the surrounding pixels, which makes them computationally
efficient.

4. Intensity Transformation: Point operators are commonly used for intensity transformations, such as
adjusting the brightness, contrast, or gamma correction of an image.

Common Point Operator Examples:

1. Brightness Adjustment: A simple point operator involves adding or subtracting a constant value from all
pixel values in the image. This operation shifts the intensity levels without changing the overall contrast.


2. Contrast Enhancement: By applying a linear scaling factor to the pixel values, you can expand or
compress the range of intensity values, enhancing the image's contrast.

3. Gamma Correction: Adjusting the intensity levels using a power function can correct the gamma
characteristics of an image and improve its overall appearance.

4. Thresholding: Creating a binary image by assigning one value to pixels above a certain threshold and
another value to pixels below the threshold.

5. Negative Image: Inverting the pixel values, resulting in a negative image.

6. Histogram Equalization: A more advanced point operator that redistributes the intensity values to
enhance the image's contrast and reveal details.

7. Piecewise Linear Transformation: Applying different linear functions to different intensity ranges in the
image to achieve custom adjustments.
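
A minimal sketch of several of the point operators listed above, implemented with NumPy on a toy 8-bit grayscale image (the pixel values are arbitrary):

```python
import numpy as np

# A toy 8-bit grayscale image.
img = np.array([[ 10,  60, 120],
                [180, 200, 250]], dtype=np.uint8)

f = img.astype(np.float64)

# 1. Brightness adjustment: add a constant, then clip back to [0, 255].
brighter = np.clip(f + 40, 0, 255).astype(np.uint8)

# 2. Contrast enhancement: scale values about the mid-gray level.
contrast = np.clip((f - 128) * 1.5 + 128, 0, 255).astype(np.uint8)

# 3. Gamma correction: apply a power function to normalized intensities.
gamma = (255 * (f / 255) ** 0.5).astype(np.uint8)

# 4. Thresholding: binary image from a fixed threshold.
binary = np.where(f > 128, 255, 0).astype(np.uint8)

# 5. Negative image: invert the pixel values.
negative = (255 - img).astype(np.uint8)
```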

Applications:

Point operators are widely used in various computer vision tasks and image processing applications:

- Image Enhancement: Modifying pixel values to improve the overall visual quality of an image.

- Histogram Manipulation: Changing the distribution of pixel intensities in the image to improve contrast
or match a desired histogram.

- Image Correction: Adjusting images to account for variations in lighting conditions or sensor
characteristics.

- Preprocessing: Preparing images for further analysis by normalizing intensity ranges or reducing noise.

- Artistic Effects: Creating stylistic effects by manipulating pixel values, such as generating sepia tones or
applying filters.

In summary, point operators are basic but essential tools in image processing and computer vision.
They enable quick adjustments to pixel values, allowing for image enhancement and transformation
without the need for complex neighborhood-based operations.


Linear Filtering
Linear filtering is a fundamental concept in computer vision and image processing that involves
applying a convolution operation to an image using a filter kernel. This operation is used to perform
tasks like noise reduction, edge detection, and image smoothing. Linear filtering plays a crucial role
in enhancing images, extracting features, and preparing images for further analysis.

Key Concepts of Linear Filtering:

1. Filter Kernel: A filter kernel, also known as a convolution kernel or mask, is a small matrix of numeric
values. The size of the kernel determines the extent of the neighborhood considered during the filtering
operation. The values within the kernel define the coefficients used in the convolution operation.

2. Convolution Operation: The convolution operation involves placing the filter kernel over each pixel in
the image and calculating a weighted sum of the pixel values and their neighbors according to the values in
the kernel. The result of this sum becomes the new pixel value in the output image.

3. Pixel-wise Operation: Linear filtering is a pixel-wise operation, meaning that each pixel's value in the
output image is calculated independently based on its corresponding neighborhood in the input image.

Types of Linear Filtering:

1. Smoothing (Low-pass) Filters: These filters are used to reduce noise and smooth out image details.
Common smoothing filters include the Gaussian filter, which assigns higher weights to central pixels and
lower weights to neighboring pixels, and the mean filter, which replaces each pixel value with the average
of its neighbors.

2. Edge Detection (High-pass) Filters: These filters emphasize the sharp changes or edges in an image. The
Sobel, Prewitt, and Roberts operators are commonly used edge detection filters.

3. Gradient Filters: These filters highlight intensity changes in different directions. They are used for tasks
like edge detection and texture analysis. The Sobel filter is an example of a gradient filter.

4. Embossing Filters: These filters create a 3D effect by highlighting the edges in the image. They can make
objects appear raised or engraved.

5. Box Blur and Motion Blur Filters: These filters apply uniform blurring in a specific direction, simulating
motion blur or smoothing.
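
A brief sketch of a few of these filters using OpenCV (a random array stands in for a real grayscale image, which would normally be loaded with cv2.imread):

```python
import cv2
import numpy as np

# Stand-in 8-bit grayscale image; in practice this comes from cv2.imread.
gray = np.random.randint(0, 256, (128, 128), dtype=np.uint8)

# Smoothing (low-pass): 5x5 mean filter via explicit kernel convolution.
mean_kernel = np.ones((5, 5), np.float32) / 25.0
smoothed = cv2.filter2D(gray, -1, mean_kernel)

# Smoothing with a Gaussian kernel (weights fall off away from the center).
gaussian = cv2.GaussianBlur(gray, (5, 5), 1.0)

# Edge detection (high-pass): Sobel gradients in x and y.
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
edges = np.sqrt(gx ** 2 + gy ** 2)   # gradient magnitude highlights edges
```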


Applications of Linear Filtering:

1. Noise Reduction: Smoothing filters are used to reduce random noise in images, making them clearer and
easier to analyze.

2. Edge Detection: High-pass filters are employed to highlight edges, which are important features in image
analysis and object detection.

3. Image Enhancement: Filtering can enhance specific image details or structures to improve the visual
quality or prepare images for further processing.

4. Feature Extraction: Certain filters highlight textures, patterns, or other characteristics, which can be
useful for identifying objects or regions of interest.

5. Preprocessing: Filtering can be a crucial step before applying more complex computer vision algorithms,
improving their accuracy and performance.

In summary, linear filtering is a foundational technique in computer vision that involves convolving
an image with a filter kernel to achieve various effects such as noise reduction, edge detection, and
image enhancement. It's a versatile tool used in various image processing tasks to extract meaningful
information and prepare images for analysis.

More Neighborhood Operators


Neighborhood operators, also known as local operators or spatial filters, are techniques used
in computer vision and image processing to manipulate the pixel values of an image based on the
values of their neighboring pixels. Unlike point operators, which only consider individual pixel values,
neighborhood operators consider the context of nearby pixels to perform operations that involve
local patterns and structures. These operators are particularly useful for tasks like image
enhancement, edge detection, and texture analysis.


Key Concepts of Neighborhood Operators:

1. Neighborhood: A neighborhood refers to a specific region around a pixel in an image. It's defined by a
certain number of rows and columns (usually an odd number) that determines the size of the local region.
The pixels within this region are used to compute the output value for the central pixel.

2. Kernel or Mask: The kernel, also referred to as the mask or window, is a small matrix that defines the
weights or coefficients for each pixel within the neighborhood. These weights determine how much
influence each pixel has on the computation of the central pixel's new value.

3. Convolution Operation: The process involves sliding the kernel over each pixel in the image. At each
position, the elements of the kernel are multiplied with the corresponding pixel values in the neighborhood,
and the sum of these products becomes the new value for the central pixel.

Common Neighborhood Operators:

1. Blurring (Smoothing) Filters: Neighborhood operators can perform blurring or smoothing to reduce noise
and make images more visually coherent. Gaussian blur and median filters are examples of smoothing
operators.

2. Edge Detection Filters: These operators emphasize the differences in intensity between neighboring
pixels, making edges more visible. Examples include the Sobel, Prewitt, and Laplacian of Gaussian filters.

3. Gradient Filters: These operators compute the gradient of intensity changes in different directions,
helping to identify edges and boundaries. The Sobel and Scharr filters are popular gradient operators.

4. Noise Removal Filters: Some neighborhood operators, like the median filter, are effective at removing
salt-and-pepper noise or other types of noise from images.

5. Morphological Operators: These operators, such as erosion and dilation, are used for tasks like image
segmentation and noise reduction by modifying the shapes of objects in an image.
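
A short sketch of two of the neighborhood operators above, again using OpenCV on a stand-in grayscale image:

```python
import cv2
import numpy as np

# Stand-in 8-bit grayscale image; in practice this comes from cv2.imread.
gray = np.random.randint(0, 256, (128, 128), dtype=np.uint8)

# Median filter: effective at removing salt-and-pepper noise.
denoised = cv2.medianBlur(gray, 5)        # 5x5 neighborhood

# Morphological operators: erosion shrinks bright regions, dilation grows them.
kernel = np.ones((3, 3), np.uint8)
eroded = cv2.erode(gray, kernel, iterations=1)
dilated = cv2.dilate(gray, kernel, iterations=1)
```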

Applications of Neighborhood Operators:


1. Image Enhancement: By considering local patterns and structures, neighborhood operators can enhance
image features, improve contrast, and reveal hidden details.

2. Edge Detection: Operators like Sobel and Prewitt are used to detect edges by highlighting rapid intensity
changes between neighboring pixels.

3. Feature Extraction: Neighborhood operators can help extract textures, patterns, and other features from
images, aiding in object recognition and classification.

4. Image Restoration: These operators can restore images by reducing noise, smoothing, and correcting
artifacts.

5. Morphological Operations: Erosion, dilation, opening, and closing are used in image segmentation and
shape analysis.

Neighborhood operators are essential tools in computer vision and image processing, as they allow
us to analyze and manipulate images based on local patterns and context, which is often crucial for
understanding the content of images and extracting meaningful information.

Fourier Transforms
Fourier Transforms are fundamental mathematical tools used in various fields, including
computer vision, to analyze signals and images in the frequency domain. They enable the
decomposition of complex signals into simpler sinusoidal components, revealing underlying patterns
and structures that might not be immediately apparent in the spatial domain.

Key Concepts of Fourier Transforms:

1. Spatial Domain vs. Frequency Domain: In the spatial domain, data is represented as intensity values in
an image. The frequency domain, on the other hand, represents data in terms of its frequency components,
indicating how much each frequency contributes to the overall signal.

2. Complex Exponential Basis: Fourier Transforms use complex exponential functions as the basis to
represent signals. These basis functions are sinusoidal waves of various frequencies.

3. Transform and Inverse Transform: A Fourier Transform converts a signal from the spatial domain to the
frequency domain. The Inverse Fourier Transform converts it back from the frequency domain to the spatial
domain.


4. Magnitude and Phase: The magnitude of the Fourier Transform represents the strength of each frequency component, while the phase indicates the spatial position (shift) of each component.
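
A minimal sketch of these concepts using NumPy's FFT routines (a random array stands in for a real grayscale image):

```python
import numpy as np

# Stand-in grayscale image; in practice this would be a real image array.
gray = np.random.rand(128, 128)

# 2D Fourier Transform: spatial domain -> frequency domain.
F = np.fft.fft2(gray)
F_shifted = np.fft.fftshift(F)       # move the zero frequency to the center

magnitude = np.abs(F_shifted)        # strength of each frequency component
phase = np.angle(F_shifted)          # spatial shift of each component

# A log scale makes the magnitude spectrum easier to visualize.
log_magnitude = np.log1p(magnitude)

# Inverse Fourier Transform: frequency domain -> spatial domain.
reconstructed = np.real(np.fft.ifft2(F))
```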

Applications of Fourier Transforms in Computer Vision:

1. Image Filtering and Enhancement: Fourier Transforms are used to design and apply filters in the
frequency domain, enabling tasks such as noise reduction, sharpening, and blurring. Filtering in the
frequency domain can be more intuitive and powerful than in the spatial domain.

2. Image Compression: Fourier Transforms help compress images by representing them in the frequency
domain and removing high-frequency components that contribute less to human perception.

3. Pattern Recognition and Texture Analysis: Frequency information can be vital for detecting patterns and
analyzing textures in images. Certain frequency components are characteristic of specific patterns and
textures.

4. Edge Detection: High-frequency components in the Fourier Transform can help identify edges and abrupt
changes in the image.

5. Image Registration: Fourier Transforms are used for aligning and registering images by analyzing their
frequency content.

6. Holography and 3D Imaging: Fourier Transforms are utilized in holography to record and reconstruct
3D images.

7. Image Deconvolution: Fourier Transforms play a role in image deconvolution, a technique used to
recover the original image from a blurred or distorted version.

8. Frequency Analysis: Fourier Transforms provide insight into the frequency content of images and other signals; the same kind of analysis underlies related tasks such as speech and audio processing.

Fast Fourier Transform (FFT):

The Fast Fourier Transform is an algorithm that efficiently computes the Fourier Transform and its
inverse. It significantly speeds up the process and is widely used due to its computational efficiency.

In summary, Fourier Transforms are powerful tools in computer vision that allow us to analyze
images and signals in terms of their frequency components. They find applications in various tasks,


including filtering, enhancement, compression, and pattern recognition, helping us gain insights into
the underlying structure of visual data.

Pyramids and Wavelets


Pyramids and wavelets are two techniques used in computer vision and image processing for
multi-scale analysis of images. They allow for the representation of images at different scales, which
is particularly useful for tasks like image compression, object detection, texture analysis, and feature
extraction. Both pyramids and wavelets provide a way to capture information at various levels of
detail, helping to analyze images efficiently.

Pyramids:

Pyramids are hierarchical representations of images, organized into multiple levels or scales. Each
level contains a version of the image at a different resolution. There are two main types of pyramids:

1. Gaussian Pyramid: In a Gaussian pyramid, the image is repeatedly smoothed with a Gaussian filter and downsampled to reduce its size and resolution. This process effectively blurs the image and removes high-frequency details. The base level contains the original full-resolution image, and each successive level contains a smaller, lower-resolution image with progressively less high-frequency content.

2. Laplacian Pyramid: A Laplacian pyramid is constructed by taking the difference between consecutive
levels of the Gaussian pyramid. This pyramid represents the details that are removed during the blurring
process of the Gaussian pyramid. It captures the high-frequency information and can be used to reconstruct
the original image.
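
A short sketch of building Gaussian and Laplacian pyramids with OpenCV (a random array stands in for a real grayscale image; three pyramid levels is an arbitrary choice):

```python
import cv2
import numpy as np

# Stand-in 8-bit grayscale image; in practice this comes from cv2.imread.
gray = np.random.randint(0, 256, (128, 128), dtype=np.uint8)

# Gaussian pyramid: repeatedly blur and downsample.
gaussian = [gray]
for _ in range(3):
    gaussian.append(cv2.pyrDown(gaussian[-1]))   # each level is half the size

# Laplacian pyramid: difference between a level and the upsampled next level.
laplacian = []
for i in range(len(gaussian) - 1):
    up = cv2.pyrUp(gaussian[i + 1])
    # Resize in case odd dimensions made the sizes differ by a pixel.
    up = cv2.resize(up, (gaussian[i].shape[1], gaussian[i].shape[0]))
    laplacian.append(cv2.subtract(gaussian[i], up))
```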

Wavelets:

Wavelets are mathematical functions that can represent signals and images in both time and
frequency domains. Wavelet transformations break down an image into different frequency
components at different scales. The concept of wavelets is similar to pyramids, but they use a
different mathematical approach.


1. Continuous Wavelet Transform (CWT): This transform analyzes an image using a continuous wavelet
function that varies in scale and position. It provides a detailed view of the frequency content of an image
at various scales.

2. Discrete Wavelet Transform (DWT): DWT divides an image into approximate and detailed components
at different scales. It operates on discrete levels, making it suitable for computer applications.
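
A small sketch of a single-level 2D discrete wavelet transform, assuming the PyWavelets library (pywt) is available; the input array is illustrative:

```python
import numpy as np
import pywt   # PyWavelets

# A toy image; any 2D array works here.
image = np.random.rand(128, 128)

# Single-level 2D DWT with the Haar wavelet.
# cA: approximation (low-frequency) coefficients
# cH, cV, cD: horizontal, vertical, and diagonal detail coefficients
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')

# Reconstruct the image from its wavelet coefficients.
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), 'haar')
```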

Applications of Pyramids and Wavelets:

1. Image Compression: Both pyramids and wavelets are used in image compression algorithms like
JPEG2000. They allow for efficient representation of images by discarding less significant details at lower
scales.

2. Texture Analysis: Multi-scale analysis helps in capturing texture features at different levels, which can
be useful for texture classification.

3. Object Detection: Multi-scale analysis aids in detecting objects of different sizes within images. Objects
at different scales are often represented more clearly in different levels of the pyramid or wavelet
transformation.

4. Feature Extraction: Pyramids and wavelets can help extract features at various scales, improving the
accuracy of feature-based techniques.

5. Image Denoising: Multi-scale analysis can help in removing noise by isolating high-frequency details,
allowing for effective noise reduction.

6. Image Restoration: Wavelet and pyramid techniques can be used to restore images by enhancing the
important features while suppressing noise.

Both pyramids and wavelets offer valuable tools for multi-scale analysis in computer vision, enabling
the extraction of information from images at different levels of detail and contributing to various
image processing and analysis tasks.

Geometric Transformations
Geometric transformations in computer vision involve manipulating the spatial properties
of images, such as their position, orientation, scale, and perspective. These transformations are used
to align images, correct distortions, and project 3D scenes onto 2D images. Geometric


transformations play a crucial role in various computer vision applications, including image
registration, object tracking, augmented reality, and 3D reconstruction.

Common Geometric Transformations:

1. Translation: Shifting an image by a certain amount in the horizontal and vertical directions. This is often
represented as (x + dx, y + dy), where (dx, dy) are the translation parameters.

2. Rotation: Rotating an image around a specified point by a given angle. The rotation can be clockwise or
counterclockwise.

3. Scaling: Changing the size of an image by applying scaling factors independently along the horizontal
and vertical axes.

4. Shearing: Distorting an image by slanting it along one axis while keeping the other axis unchanged.

5. Reflection: Flipping an image along a specified axis, creating a mirror image.

6. Affine Transformation: A combination of translation, rotation, scaling, and shearing. It preserves straight
lines but not necessarily angles and lengths.

7. Projective Transformation (Homography): A more general transformation that includes perspective effects, such as mapping 3D scenes onto 2D images. It preserves collinearity (straight lines) but not necessarily angles, lengths, or parallelism.
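
A brief sketch applying two of the transformations above with OpenCV's warping functions (a random array stands in for a real image; the angles and offsets are arbitrary):

```python
import cv2
import numpy as np

# Stand-in color image; in practice this comes from cv2.imread.
img = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
h, w = img.shape[:2]

# Rotation by 30 degrees about the image center, combined with scaling by 0.8.
M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 0.8)
rotated = cv2.warpAffine(img, M_rot, (w, h))

# Translation by (50, 20) pixels, expressed as a 2x3 affine matrix.
M_trans = np.float32([[1, 0, 50],
                      [0, 1, 20]])
translated = cv2.warpAffine(img, M_trans, (w, h))
```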

Applications of Geometric Transformations:

1. Image Registration: Aligning multiple images from different viewpoints or time instances. This is crucial
in applications like medical imaging, satellite imagery, and image mosaicking.

2. Object Tracking: Adjusting the position and orientation of a tracking window as an object moves in a
video sequence.

3. Augmented Reality: Overlaying virtual objects onto the real world by transforming them according to
the camera's perspective.

4. Image Warping: Deforming images for artistic effects or simulating changes in viewing angles.


5. Camera Calibration: Determining the intrinsic and extrinsic parameters of a camera to map 3D points to
2D image coordinates.

6. Panoramic Image Stitching: Combining multiple images to create a panoramic view by aligning and
blending them using geometric transformations.

7. 3D Reconstruction: Estimating the 3D structure of a scene by mapping 2D points from multiple images
to a common 3D coordinate system.

Geometric transformations are essential for adapting images to different contexts, correcting
distortions introduced by camera optics, and aligning images for further analysis. They are a
fundamental part of computer vision algorithms that involve spatial relationships between images
and scenes.

Global Optimization
Global optimization in computer vision refers to the process of finding the optimal solution
across an entire search space, considering all possible solutions and avoiding local minima or maxima.
It's a challenging problem since many real-world optimization tasks involve complex, multi-
dimensional, and often nonlinear objective functions. Global optimization methods aim to find the
best possible solution regardless of the initial starting point.

Key Challenges in Global Optimization:

1. High-Dimensional Spaces: Many optimization problems in computer vision involve a large number of
variables, making the search space high-dimensional. Traditional optimization methods struggle to handle
such complexities.

2. Nonlinearity: Objective functions in computer vision can be highly nonlinear, with multiple local optima
and global optima scattered throughout the search space.

3. Non-Convexity: The presence of multiple local optima that are not necessarily convex adds complexity
to the optimization problem.

4. Noise and Uncertainty: In real-world scenarios, objective functions often include noise or uncertainty,
making it difficult to distinguish true optima from spurious ones.


5. Computationally Intensive: Finding a global optimum can be computationally expensive, especially in high-dimensional spaces.

Methods for Global Optimization in Computer Vision:

1. Simulated Annealing: This probabilistic technique mimics the annealing process in metallurgy. It allows
the optimization algorithm to explore the search space by accepting uphill moves with decreasing
probability. This helps escape local minima and eventually converge to a global optimum.

2. Genetic Algorithms: Inspired by natural evolution, genetic algorithms use concepts of selection,
mutation, and crossover to evolve a population of potential solutions. They explore the search space
broadly, enabling them to find global optima.

3. Particle Swarm Optimization (PSO): PSO models particles moving through a search space, with each
particle adjusting its position based on its own experience and the experience of other particles. This
cooperative behavior can help discover global optima.

4. Differential Evolution (DE): DE is a population-based optimization algorithm that creates new candidate
solutions by combining differences between existing solutions. It balances exploration and exploitation to
find global optima.

5. Bayesian Optimization: Bayesian optimization uses a probabilistic model to predict the value of the
objective function at different points. It aims to optimize the acquisition function, which balances
exploration and exploitation, to find the best solution.

6. Random Search: Despite its simplicity, random search can be surprisingly effective for global
optimization, especially in high-dimensional spaces. It involves randomly sampling points from the search
space and evaluating the objective function at those points.

7. Evolutionary Strategies: These algorithms employ strategies such as mutation, crossover, and selection
to evolve a population of solutions. They can be adapted to handle complex, noisy, and high-dimensional
optimization problems.

8. Metaheuristic Algorithms: Metaheuristic algorithms, like ant colony optimization, harmony search, and
cuckoo search, are designed to tackle complex optimization problems by simulating natural or social
behaviors.
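
As one concrete illustration of the first method above, here is a minimal, hand-rolled simulated annealing sketch for minimizing a toy one-dimensional objective (the objective function, step size, and cooling schedule are all illustrative choices):

```python
import math
import random

def objective(x):
    # A toy non-convex function with several local minima.
    return x ** 2 + 10 * math.sin(3 * x)

def simulated_annealing(start, temp=10.0, cooling=0.99, steps=5000):
    current, best = start, start
    for _ in range(steps):
        candidate = current + random.uniform(-0.5, 0.5)   # random local move
        delta = objective(candidate) - objective(current)
        # Accept improvements outright; accept worse moves with a probability
        # that shrinks as the temperature decreases (helps escape local minima).
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
            if objective(current) < objective(best):
                best = current
        temp *= cooling
    return best

print(simulated_annealing(start=4.0))
```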

Global optimization methods in computer vision are crucial for parameter tuning, model selection,
and solving optimization tasks where local optima are not sufficient. These methods help in finding


solutions that are closer to the true global optimum across a wide range of applications, from image
processing to machine learning.
