Computer Vision NOTES
• Computer Vision is a field that focuses on developing algorithms and techniques to enable computers to understand and interpret visual information from images or videos. It encompasses a wide range of tasks, including image recognition, object detection, image segmentation, and scene understanding. To tackle these tasks, Computer Vision can be categorized into three levels: low-level vision, mid-level vision, and high-level vision. These levels represent different stages of processing and analysis in the field of computer vision.
• 1. Low-level Vision:
• The goal of low-level vision is to enhance the quality of the image and extract fundamental information that can be used
as input for higher-level processing. Some key aspects of low-level vision include image enhancement, image filtering, and feature detection. Image enhancement techniques
aim to improve the visual quality of images. This may involve operations such as noise reduction, contrast enhancement,
and image sharpening. These techniques help to improve the visibility of important details in the image and reduce the
impact of unwanted artifacts or noise. Image filtering techniques are used to highlight or extract specific features from
images. Filters, such as Gaussian filters, Sobel filters, and Laplacian filters, can be applied to enhance edges, detect
textures, or smooth the image. Feature detection is the process of identifying and locating specific image features, such as
corners, edges, or blobs. These features serve as important cues for higher-level tasks. Techniques like the Harris corner
detector or the Canny edge detector are commonly used for feature detection in low-level vision.
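• A minimal sketch of these low-level operations using OpenCV and NumPy ("input.jpg", the kernel sizes, and the thresholds below are illustrative placeholders, not prescribed values):

```python
import cv2
import numpy as np

# Load an example image in grayscale ("input.jpg" is a placeholder path).
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)

# Image enhancement / filtering: Gaussian smoothing suppresses noise.
smoothed = cv2.GaussianBlur(img, (5, 5), 1.0)

# Sobel filters respond to horizontal and vertical intensity changes (edges).
grad_x = cv2.Sobel(smoothed, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)

# Feature detection: Canny edge map and Harris corner response.
edges = cv2.Canny(smoothed, 100, 200)
corners = cv2.cornerHarris(np.float32(smoothed), 2, 3, 0.04)
```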
• 2. Mid-level Vision: Mid-level vision involves higher-level processing that goes beyond individual pixels and focuses on
extracting meaningful information and grouping pixels into coherent entities. It involves analyzing regions, contours, and
boundaries to understand the structure of an image. Image segmentation is a key aspect of mid-level vision. It involves
dividing an image into meaningful regions or segments based on similarities in color, texture, or other visual properties.
Segmentation helps in object identification, boundary detection, and further analysis of specific regions of interest. Motion
analysis is another important task in mid-level vision. It involves tracking and analyzing motion patterns in image
sequences. This includes tasks such as object tracking, optical flow estimation, and activity recognition. By analyzing the
motion patterns, important temporal information can be extracted, enabling tasks such as action recognition or behavior
analysis.
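• A hedged sketch of one possible segmentation and motion-analysis pipeline with OpenCV (Otsu thresholding and Farneback optical flow are just two common choices; the file names are placeholders):

```python
import cv2

# Segmentation: Otsu thresholding splits a grayscale frame into two classes,
# then connected components labels the resulting regions.
gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
num_regions, labels = cv2.connectedComponents(mask)

# Motion analysis: dense optical flow between two consecutive frames gives
# one (dx, dy) displacement vector per pixel.
gray_next = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
flow = cv2.calcOpticalFlowFarneback(gray, gray_next, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
```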
Types of vision continued
• 3. High-level Vision: High-level vision involves advanced interpretation and understanding of visual data,
focusing on semantic understanding and higher-level concepts. It involves extracting high-level information
and making inferences based on the analyzed data. Object recognition is a fundamental task in high-level
vision. It involves identifying and classifying objects or specific object instances within an image or video.
This may involve training machine learning or deep learning models to recognize objects based on their
features or patterns.
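• As one hedged illustration of recognition with a pretrained deep model (this assumes torchvision 0.13 or later and Pillow are installed; "photo.jpg" and the ResNet-18 choice are placeholders, not part of these notes):

```python
import torch
from torchvision import models
from PIL import Image

# Pretrained ImageNet classifier; ResNet-18 is an arbitrary example choice.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

img = Image.open("photo.jpg").convert("RGB")      # placeholder image path
batch = weights.transforms()(img).unsqueeze(0)    # preprocess, add batch dim

with torch.no_grad():
    probs = model(batch).softmax(dim=1)
top_class = probs.argmax(dim=1).item()
print(weights.meta["categories"][top_class])      # human-readable class label
```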
• Object recognition is widely used in applications such as autonomous driving, surveillance, and image
retrieval. Scene understanding is another important aspect of high-level vision. It involves analyzing the
overall context and structure of a scene, including the relationships between objects, scene categories, and
spatial layouts. Scene understanding enables higher-level reasoning about the scene and can be valuable in
applications such as robotics and navigation systems. Image understanding is the
process of extracting meaning and understanding from images. This includes tasks such as image captioning,
visual question answering, and image-based reasoning.
• Image understanding aims to bridge the gap between visual perception and natural language understanding,
enabling computers to comprehend and interpret images in a way that is similar to human understanding. In
conclusion, the field of Computer Vision can be categorized into three levels: low-level vision, mid-level
vision, and high-level vision. These levels represent different stages of processing and analysis, from
extracting basic visual features to understanding complex scenes and making high-level inferences. Each
level of vision contributes to a deeper understanding of visual data and enables computers to interpret and
interact with the visual world in a more intelligent and meaningful way.
FUNDAMENTALS OF IMAGE FORMATION
• 1. Light and Electromagnetic Spectrum: Image formation begins with the interaction of light, which is
electromagnetic radiation, with objects in the scene. The electromagnetic spectrum ranges from low-frequency
radio waves to high-frequency gamma rays. Visible light, the portion of the spectrum that human eyes can perceive,
spans wavelengths of approximately 400 to 700 nanometers. When light interacts with objects, it undergoes various
processes such as reflection, transmission, absorption, and scattering.
• 2. Reflection and Absorption: When light strikes an object, it can be either reflected or absorbed. The properties of
the object, such as its color and texture, determine how it interacts with light. An object appears a certain color
because it reflects light of that color while absorbing other wavelengths. The surface characteristics of an object,
such as its smoothness or roughness, affect the scattering of light. A smooth surface tends to produce more
specular reflection, where light is reflected in a specific direction, while a rough surface scatters light in multiple
directions.
• 3. Illumination: Illumination refers to the light that falls on the objects in a scene. It plays a crucial role in image
formation as it determines the amount and distribution of light that is reflected from the objects. The intensity,
direction, and color of the incident light influence the appearance of objects in an image. Different lighting
conditions can result in variations in brightness, contrast, and color rendition in images. Controlling illumination is
important in applications such as photography, computer vision, and image analysis.
• 4. Imaging Geometry: Image formation involves the geometry of capturing light rays from the scene and projecting
them onto an image sensor or film. The key components of imaging geometry include the camera, lens, and the
relative positions of the scene, camera, and object of interest. The properties of the camera and lens, such as focal
length and aperture, affect the image formation process. The distance between the camera and the object, known
as the camera-object distance, also plays a role in determining the size and perspective of objects in the captured
image.
IMAGE FORMATION continued
• 5. Pinhole Camera Model: The pinhole camera model is a simplified representation of the imaging process. It assumes a
light-tight box with a small aperture (pinhole) on one side and a photosensitive surface (film or image sensor) on the
opposite side. Light rays from the scene pass through the aperture and form an inverted image on the sensor/film. This
model helps in understanding basic concepts of image formation, such as perspective projection and the role of aperture
size in controlling depth of field.
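• A minimal sketch of the perspective projection implied by the pinhole model (the focal length and point coordinates below are illustrative values):

```python
def project_pinhole(X, Y, Z, f):
    """Ideal pinhole projection: a 3D point (X, Y, Z) in camera coordinates
    maps to image coordinates (f*X/Z, f*Y/Z). The sign flip that produces
    the inverted image on the sensor is dropped, as is common when the image
    plane is modeled in front of the pinhole."""
    return f * X / Z, f * Y / Z

# The same point appears closer to the image centre when it is farther away.
print(project_pinhole(1.0, 0.5, 2.0, f=0.035))
print(project_pinhole(1.0, 0.5, 4.0, f=0.035))
```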
• 6. Lens and Optical Systems: In practical cameras, lenses are used to focus light rays onto the image sensor or film.
Lenses consist of curved surfaces that refract (bend) light, allowing the camera to capture a sharper and properly focused
image. Lens properties, such as focal length, aperture size, and lens aberrations, impact image quality and formation. The
focal length determines the field of view and the magnification of objects in the image. The aperture size controls the
amount of light entering the camera and affects the depth of field, which determines the range of distances that appear
in sharp focus.
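• The link between focal length and focusing distance mentioned above can be illustrated with the standard thin-lens equation (an idealization not derived in these notes; the distances below are illustrative):

```python
def thin_lens_image_distance(f, d_obj):
    """Thin-lens equation 1/f = 1/d_obj + 1/d_img, solved for the image
    distance d_img at which an object at distance d_obj is in sharp focus."""
    return 1.0 / (1.0 / f - 1.0 / d_obj)

# Example with a 50 mm lens; all distances in metres, values illustrative.
f = 0.050
for d_obj in (0.5, 2.0, 100.0):
    print(d_obj, thin_lens_image_distance(f, d_obj))
# Nearby objects focus farther behind the lens than distant ones, which is why
# refocusing moves the lens relative to the sensor.
```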
• 7. Image Sensor: Modern digital cameras use image sensors (e.g., CCD or CMOS) to convert light into electrical signals.
Each pixel on the sensor measures the intensity of light falling on it and generates a corresponding electrical signal.
These signals are then processed to form a digital image. The size and arrangement of pixels on the sensor influence the
spatial resolution and image quality. The sensor's sensitivity to light, noise characteristics, and dynamic range also affect
the image formation process.
• 8. Image Formation Process: The image formation process involves several steps. First, light rays from the scene pass
through the camera lens, which refracts and focuses them onto the image sensor. The lens controls the amount of light
entering the camera and determines the focusing distance. The image sensor converts the light into electrical signals,
representing the intensity of light falling on each pixel. The electrical signals are processed and digitized to form a digital
image, typically represented as a grid of pixel values. Further processing, such as demosaicing (color interpolation), may be applied to obtain a full-color image. The digital image can then undergo additional processing steps, such as image enhancement, analysis, or compression, depending on the application.
E. Transformation
• Image transformation refers to the process of applying geometric modifications to an image. It involves altering the spatial characteristics of an image, such as its shape, size, orientation, or perspective. By performing transformations on an image, it is possible to modify its appearance or adjust specific aspects of interest.
• Transformation techniques manipulate the pixel coordinates of an image, repositioning them according to
a defined mathematical model. These models typically involve matrix operations and mathematical
formulas that determine how the pixels should be transformed. Transformations can be used for various
purposes in image processing and computer vision. They are employed to correct geometric distortions in
images caused by camera or lens imperfections. For example, lens distortion can be corrected by applying
an appropriate transformation to the image pixels.
• Transformations are also utilized for image alignment and registration. By transforming images, they can
be brought into alignment with each other, allowing for comparison or fusion of multiple images. This is
particularly useful in applications such as panoramic image stitching or super-resolution imaging.
Furthermore, transformations play a crucial role in tasks such as image warping and morphing. Image
warping involves manipulating an image to match a particular shape or template. This can be used for
artistic purposes or to simulate effects such as perspective adjustments or non-rigid deformations.
• Image morphing, on the other hand, involves smoothly transitioning between two or more images by
applying a series of intermediate transformations. In summary, image transformation is a fundamental
concept in image processing. It enables the modification of an image's geometry, allowing for corrections,
alignments, distortions, or creative modifications to enhance visual content or facilitate further analysis.
Transformation CONTINUED
• 1. Orthogonal Transformation: Orthogonal transformations include rotation and reflection. Rotation turns the image around a specified center point, and the angle of rotation determines the extent of the transformation. Rotation is a fundamental operation in image processing as it allows images
to be adjusted to a desired orientation. It is commonly used for tasks like aligning images that are taken from different angles or merging
multiple images into a coherent panorama. By applying a rotation transformation, these images can be brought into a common orientation
or coordinate system, facilitating comparison and analysis. Reflection, another orthogonal transformation, produces a mirror image of the
original. It involves flipping the image along a line, known as the reflection axis. This transformation results in a reversed version of the
image, with objects appearing as mirror reflections of themselves. Reflections are frequently used in image processing for applications
such as symmetry analysis or pattern recognition.
• Orthogonal transformations are particularly valuable because they preserve important geometric properties of the image, such as
angles and distances between points. This property ensures that the overall shape and size of objects in the image remain unchanged
after applying these transformations. By maintaining these geometric properties, orthogonal transformations enable accurate
measurement and analysis of images, as well as effective alignment of images for further processing or visualization. In summary,
orthogonal transformations play a crucial role in image processing by preserving angles and distances between points. The rotation
operation allows for image orientation adjustment, while reflection generates mirror images. These transformations are essential for tasks
like image alignment, panorama creation, symmetry analysis, and pattern recognition. By preserving geometric properties, orthogonal
transformations ensure the integrity of image content and facilitate accurate analysis and manipulation.
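• A small NumPy sketch of the distance-preserving property described above (the angle and the two points are arbitrary illustrative values):

```python
import numpy as np

theta = np.deg2rad(30)                               # illustrative rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # rotation about the origin
F = np.array([[1.0, 0.0],
              [0.0, -1.0]])                          # reflection across the x-axis

p, q = np.array([3.0, 1.0]), np.array([-2.0, 4.0])   # two arbitrary points

# Orthogonal transformations preserve the distance between the points:
print(np.linalg.norm(p - q),
      np.linalg.norm(R @ p - R @ q),
      np.linalg.norm(F @ p - F @ q))                 # all three values match
```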
• 2. Euclidean Transformation: Euclidean transformations are a class of geometric transformations that include translation, rotation, and
scaling. These transformations are widely used in image processing and computer vision.
• Translation is commonly employed for tasks like image alignment, where multiple images need to be registered based on common
features or landmarks.
• Rotation is another important Euclidean transformation. It changes the orientation of the image by rotating it around a given center point.
The angle of rotation determines the extent of the transformation. Rotating an image preserves its shape and size while reorienting it in
space. This transformation is extensively used for tasks such as image stabilization, object detection, and image correction.
• Scaling is a Euclidean transformation that resizes the image. It can make the image larger (upscaling) or smaller (downscaling). Scaling is
often expressed as a scaling factor applied to the image dimensions. When scaling an image, all its elements, including shapes and
distances between points, are uniformly stretched or compressed. This transformation is utilized in applications such as image resizing,
zooming, and object recognition at different scales.
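• A hedged OpenCV sketch of the translation, rotation, and scaling operations described above ("input.jpg" and the parameter values are illustrative placeholders):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")                        # placeholder file name
h, w = img.shape[:2]

# Translation by (tx, ty) pixels, written as a 2x3 matrix for warpAffine.
tx, ty = 40, 25
M_trans = np.float32([[1, 0, tx],
                      [0, 1, ty]])
translated = cv2.warpAffine(img, M_trans, (w, h))

# Rotation by 15 degrees about the image centre, combined with 0.8x scaling.
M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 0.8)
rotated = cv2.warpAffine(img, M_rot, (w, h))
```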
AFFINE TRANSFORMATION
• 3. Affine Transformation:
• An affine transformation is a versatile geometric transformation that encompasses a combination of
translation, rotation, scaling, shearing, and reflection. It is a more general transformation than Euclidean
transformations and allows for a wider range of spatial modifications, including changes in the shape of
objects within an image.
• Translation is the simplest form of affine transformation and involves shifting an image in the x and y
directions. It moves the entire image without altering its shape or orientation.
• Rotation changes the orientation of the image by rotating it around a specified center point.
• Scaling modifies the size of the image, making it larger (upscaling) or smaller (downscaling) uniformly or
independently along each axis.
• These operations are similar to those in Euclidean transformations but are now part of a larger set of
transformations. In addition to translation, rotation, and scaling, affine transformations include shearing and
reflection. Shearing involves skewing the image along one or both axes, distorting its shape. Reflection
produces a mirror image of the original, flipping the image along a specified axis. These transformations
allow for more complex modifications, such as simulating perspective effects or creating artistic distortions.
One key property of affine transformations is that they preserve parallel lines. This means that if two lines are
parallel in the original image, they will remain parallel after the transformation. Affine transformations also
preserve the ratios of distances between points. This property ensures that the spatial relationships between
objects in the image remain intact during the transformation. Additionally, affine transformations preserve
affine combinations of points, that is, linear combinations of points whose coefficients sum to one.
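• A brief sketch of an affine shear plus a numerical check of the parallelism-preserving property described above (the shear factor, file name, and direction vectors are illustrative assumptions):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")                        # placeholder file name
h, w = img.shape[:2]

# Shear along x: every row is shifted in proportion to its y coordinate.
k = 0.3                                              # illustrative shear factor
M_shear = np.float32([[1, k, 0],
                      [0, 1, 0]])
sheared = cv2.warpAffine(img, M_shear, (w + int(k * h), h))

# Parallelism check: two parallel segments keep parallel directions after the map.
A = M_shear[:, :2]
d1 = A @ np.array([2.0, 4.0])                        # direction of segment 1
d2 = A @ np.array([1.0, 2.0])                        # direction of a parallel segment
print(d1[0] * d2[1] - d1[1] * d2[0])                 # 2D cross product: 0.0 -> still parallel
```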
Projective Transformation
• 4. Projective Transformation:
• Projective transformations, also known as perspective transformations, are a type of
geometric transformation that allows for more general distortions of an image. Unlike
affine transformations, which preserve parallel lines, projective transformations can represent the effects of 3D viewing, including perspective projection, under which parallel lines in the scene may no longer appear parallel in the image.
They are widely used in computer vision tasks such as 3D reconstruction, augmented
reality, and virtual reality. One key characteristic of projective transformations is their
ability to simulate perspective. In the real world, objects that are farther away appear
smaller than those closer to the viewer. Perspective transformations replicate this
effect by mapping points in the 3D space onto a 2D image plane, taking into account
the viewer's viewpoint. This is particularly important in applications such as computer
graphics and rendering, where realistic virtual scenes need to be generated.
• Projective transformations are commonly used in camera calibration. By applying a
known pattern or set of reference points, the properties of the camera and its distortion
parameters can be estimated. This enables accurate 3D reconstruction or
measurement of objects in the scene. Projective transformations are also utilized in
tasks like image rectification, where images taken from different viewpoints are
transformed to a common perspective, making them suitable for stereo vision or visual
odometry. In augmented reality, projective transformations play a crucial role in overlaying virtual objects onto real-world scenes.
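• A minimal sketch of a projective (perspective) warp with OpenCV, assuming four hand-picked point correspondences such as the corners of a document or sign; "scene.jpg" and the coordinates are illustrative placeholders:

```python
import cv2
import numpy as np

img = cv2.imread("scene.jpg")                        # placeholder file name

# Four corners of a planar region as seen in the image (illustrative values),
# and the rectangle they should map to in the rectified output.
src = np.float32([[120, 80], [430, 95], [460, 370], [100, 350]])
dst = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])

H = cv2.getPerspectiveTransform(src, dst)            # 3x3 homography matrix
rectified = cv2.warpPerspective(img, H, (400, 300))  # fronto-parallel view of the region
```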
Fourier Transform
• 5. Fourier Transform: The Fourier transform is a fundamental mathematical technique used in image processing
to analyze the frequency components of an image. It decomposes an image into its constituent frequencies,
providing valuable information about the spatial frequency content of the image. The Fourier transform
represents an image as a sum of sine and cosine functions at different frequencies. By applying the Fourier
transform to an image, the image's spatial domain representation is converted into its frequency domain
representation. This transformation enables operations in the frequency domain, such as frequency-based filtering. By analyzing the frequency spectrum, specific frequency components can be identified and
manipulated. For example, high-pass filtering can be applied to remove low-frequency components, effectively
reducing noise or blurring in the image. On the other hand, low-pass filtering can be employed to remove high-
frequency components, smoothing the image or reducing details. This ability to selectively filter frequency
components allows for image enhancement and noise reduction.
• Another application of the Fourier transform is image compression. The frequency spectrum provides
information about the relative importance of different frequency components in the image. By representing the
image in the frequency domain and discarding or quantizing less significant frequency components, the image
can be compressed while preserving essential visual information. This is the basis for transform-based image
compression techniques such as JPEG. The Fourier transform is also beneficial in the analysis of periodic
patterns or textures in images. By examining the frequency spectrum, it is possible to identify the dominant
frequencies that contribute to the periodic patterns. This can be useful in tasks such as texture analysis, pattern
recognition, and image segmentation.
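• A minimal NumPy sketch of the frequency-domain low-pass filtering described above ("input.jpg" and the cutoff radius are illustrative assumptions):

```python
import numpy as np
import cv2

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)

# Forward 2D FFT; fftshift moves the zero-frequency (DC) term to the centre.
F = np.fft.fftshift(np.fft.fft2(img))

# Ideal low-pass mask: keep frequencies within a radius of 30 of the centre.
h, w = img.shape
yy, xx = np.ogrid[:h, :w]
mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= 30 ** 2

# Suppress the high frequencies and invert the transform: a smoothed image.
smoothed = np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```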
Convolution and Filtering
• Convolution is a fundamental operation in image processing that combines two
functions to produce a third function. It is widely used for filtering operations, which
involve modifying an image by applying a convolution operation between the image
and a filter kernel. Convolution plays a crucial role in various image processing tasks
such as image enhancement, feature extraction, and noise reduction.
• The concept of convolution can be understood by considering a discrete image and a
filter kernel. The image consists of a grid of pixels, where each pixel has a
corresponding intensity value. The filter kernel is a small matrix of coefficients that
defines a local neighborhood around each pixel.
• The convolution operation involves sliding the filter kernel over the image,
computing the weighted sum of the pixel values in the neighborhood defined by the
kernel, and assigning the result to the corresponding output pixel. The values in the
filter kernel determine the nature of the convolution operation. They act as a
template that specifies how the values of nearby pixels contribute to the output pixel.
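• A naive NumPy implementation of the sliding-window convolution described above (zero padding and the 3x3 averaging kernel are illustrative choices):

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 2D convolution with zero padding: the (flipped) kernel slides over
    every pixel and the weighted sum of the neighbourhood becomes the output."""
    k = np.flipud(np.fliplr(kernel))       # true convolution flips the kernel
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros(image.shape, dtype=np.float64)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * k)
    return out

# Example: a 3x3 averaging (box blur) kernel applied to a small test image.
img = np.arange(25, dtype=np.float64).reshape(5, 5)
box = np.full((3, 3), 1.0 / 9.0)
print(convolve2d(img, box))
```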