
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

BELGAUM

COMPUTER VISION

(Course Code: BPLCK105B)

STUDY MATERIAL

VI-SEMESTER

Dr. Krishna Prasad K


Associate Professor, Dept of ISE

A J INSTITUTE OF ENGINEERING & TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(A unit of Laxmi Memorial Education Trust. (R))
NH - 66, Kottara Chowki, Kodical Cross - 575 006


MODULE -1

Syllabus: Introduction: What is computer vision? A brief history. Image Formation: Photometric image formation, The digital camera. Image processing: Point operators, Linear filtering.


1.1 What is Computer Vision?


Understanding Human and Machine Vision
Computer vision is the scientific field that enables machines to interpret and analyze images
and videos in a way similar to human vision. The human visual system naturally perceives the
three-dimensional world around it, recognizing objects, depth, and context with ease. For
example, when observing a vase of flowers, a person can immediately determine the shape,
color, and transparency of each petal due to the way light interacts with surfaces.
Similarly, humans can effortlessly count and recognize individuals in a group photo and even
infer emotions from facial expressions. Despite significant advancements, replicating this level
of understanding using computational models remains a challenging problem in artificial
intelligence.
Challenges of Computer Vision
Unlike traditional computer tasks that involve structured data, computer vision deals with an
inverse problem: reconstructing 3D structures and object properties from 2D images. This
process is inherently difficult due to the complexity of real-world lighting, object occlusion,
and variations in camera perspectives.
To overcome this, researchers use:
• Physics-based models: Understanding how light interacts with objects.
• Probabilistic models: Estimating likely interpretations from ambiguous visual data.
• Machine learning: Training models on vast datasets to recognize patterns.
The Role of Computer Vision in AI
Early artificial intelligence researchers believed that vision was an easy problem to solve,
expecting it to be simpler than logical reasoning and planning. However, decades of research
have shown that perception is one of the most complex aspects of intelligence.
Real-World Applications of Computer Vision
Despite its challenges, computer vision has achieved significant success across multiple
domains:
1. Optical Character Recognition (OCR)
• Extracts text from images, enabling applications such as automatic number plate
recognition (ANPR) and postal code scanning.
2. Machine Inspection
• Uses stereo vision to inspect parts in manufacturing for quality control.
• Detects defects in materials using specialized lighting and imaging techniques.
3. Retail & Automated Stores


• Recognizes objects for automated checkout systems.


• Powers systems like Amazon Go, where customers can pick up items without scanning
them.
4. Autonomous Vehicles
• Self-driving cars use object detection, lane tracking, and pedestrian recognition.
• Uses Simultaneous Localization and Mapping (SLAM) for navigation.
5. 3D Model Building & Photogrammetry
• Generates 3D reconstructions of environments from aerial and drone imagery.
6. Medical Imaging
• Analyzes CT scans, MRIs, and X-rays for diagnosis.
• Used for tracking disease progression over time.
7. Motion Capture & Surveillance
• Captures actor movements for animated films.
• Monitors traffic patterns and detects intrusions.
8. Facial Recognition & Biometrics
• Enhances security by authenticating individuals.
• Used in smartphones for facial unlocking.
Consumer-Level Applications of Computer Vision
Beyond industrial applications, computer vision has become an integral part of consumer
technology:
1. Image Stitching
• Combines multiple images into a seamless panorama.
• Example: Google Photos’ panorama feature.
2. Exposure Bracketing & HDR Imaging
• Merges multiple images taken at different exposures to create a perfectly lit photo.
• Used in smartphone cameras for better night photography.
3. Augmented Reality (AR)
• Integrates virtual objects into real-world scenes.
• Example: AR filters on social media platforms.
4. Face Detection & Authentication
• Enhances camera focusing and photo tagging in social media apps.


Engineering vs. Scientific Approaches to Computer Vision


Computer vision can be studied from two perspectives:
1. Engineering Approach: Focuses on building practical solutions by applying known
techniques.
2. Scientific Approach: Develops mathematical models of vision, considering the
physics of light and sensor behavior.
For instance, Bayesian models help estimate the likelihood of different interpretations
based on prior knowledge, improving the accuracy of vision systems.
Future of Computer Vision
While modern AI-driven vision systems can recognize objects and generate 3D models, they
still fall short of human-level perception. The field continues to advance through:
• Deep Learning: More sophisticated neural networks improve accuracy.
• Neural Rendering: Enhancing realism in synthetic images.
• Self-Supervised Learning: Reducing reliance on labeled datasets.
1.2 History of Computer Vision
1. Early Beginnings (1960s-1970s)
The Foundations of Computer Vision
The field of computer vision emerged as part of a larger effort to develop artificial intelligence
(AI) that could mimic human perception and reasoning. In the early 1970s, researchers at
institutions such as MIT, Stanford, and Carnegie Mellon University believed that solving visual
perception would be a stepping stone to higher-level reasoning.
One of the most famous early attempts was initiated by Marvin Minsky at MIT in 1966, who
assigned a student to "spend the summer linking a camera to a computer and getting the
computer to describe what it saw".
Key Research Topics in the 1970s
• Edge detection: Early work focused on identifying object boundaries in images.
• Line labeling & 3D structure inference: Researchers attempted to recover the structure
of 3D objects from 2D line drawings.
• Pattern recognition: Developing methods to classify visual data based on statistical
models.
• Stereo vision: Early techniques were developed to estimate depth using two images
taken from different viewpoints.
One of the major distinctions between computer vision and the existing field of digital image
processing was the desire to recover the 3D structure of the world rather than simply enhancing
images.


2. The 1980s: The Rise of Mathematical and Statistical Methods


During the 1980s, researchers focused on more rigorous mathematical models for vision tasks.
This period saw:
• Image pyramids: Multi-scale image representations were used for image blending and
coarse-to-fine processing.
• Shape from shading: Algorithms were developed to estimate surface properties based
on how light interacted with an object.
• Optical flow: Motion estimation techniques were introduced to track movement across
frames in a video.
• 3D reconstruction: Early structure from motion (SfM) techniques were proposed to
recover the 3D shape of objects from image sequences.
David Marr’s work (1982) played a significant role in shaping the way researchers approached
computer vision. He introduced the idea of three levels of vision processing:
1. Low-level vision: Extracting basic features like edges and textures.
2. Intermediate-level vision: Grouping features into meaningful structures.
3. High-level vision: Recognizing and interpreting objects.
3. The 1990s: Advancements in 3D Vision and Statistical Approaches
The 1990s witnessed significant progress in computer vision algorithms and applications due
to increased computational power. Key developments included:
• Projective geometry and camera models: Advances in geometric image formation
allowed for better 3D reconstruction.
• Facial recognition and object detection: The first successful methods for face
recognition were developed using Eigenfaces (1991).
• Markov Random Fields (MRFs) and Kalman Filters: Probabilistic models improved
object tracking and segmentation tasks.
This period also saw the rise of real-time vision systems, enabling applications such as
automatic number plate recognition (ANPR) and industrial inspection.
4. The 2000s: Machine Learning and Data-Driven Methods
The early 2000s marked a paradigm shift in computer vision with the introduction of machine
learning-based methods.
Key Developments
• Feature-based recognition: Algorithms like SIFT (Scale-Invariant Feature
Transform, 2004) became the standard for object detection.
• Graph-based segmentation: Techniques such as Normalized Cuts (Shi & Malik,
2000) improved image segmentation.
• 3D Modeling & Photogrammetry: Advances in Structure from Motion (SfM)
enabled large-scale 3D reconstructions from images.
• Image-based Rendering: Techniques such as light-field capture and HDR imaging
allowed for photo-realistic image synthesis.
A key trend in this decade was the interplay between computer vision and computer
graphics, enabling applications such as augmented reality (AR) and computational
photography (Computer Vision: Algorithms and Applications, pages 19-20).
5. The 2010s: The Deep Learning Revolution
The most significant breakthrough in computer vision came in the 2010s with deep learning.
The availability of large-scale labeled datasets, such as ImageNet (2009), combined with
advances in GPU computing, led to a dramatic improvement in vision algorithms.
Key Breakthroughs
• Convolutional Neural Networks (CNNs): The AlexNet model (2012) outperformed
traditional vision algorithms on the ImageNet Challenge, sparking the deep learning
revolution.
• Object Detection & Semantic Segmentation: Algorithms such as Faster R-CNN
(2015) and Mask R-CNN (2017) achieved state-of-the-art results.
• Generative Adversarial Networks (GANs): Introduced in 2014, GANs allowed for
realistic image synthesis and enhancement.
• Real-time 3D Scene Understanding: Systems like KinectFusion (2011) enabled
depth estimation and 3D reconstruction in real-time.
• Self-Supervised Learning: Models began learning representations from unlabeled
images, reducing dependence on large labeled datasets.


This period also saw a rise in autonomous driving, medical imaging AI, and facial
recognition systems, making computer vision a mainstream technology.
6. The 2020s and Beyond: Towards Generalized AI Vision
Today, computer vision continues to evolve toward more generalized AI systems that can
learn from fewer examples and adapt to new environments.
Emerging Trends
• Neural Rendering & 3D Reconstruction: AI-driven rendering techniques are making
virtual worlds more realistic.
• Self-Supervised & Few-Shot Learning: New models aim to reduce the need for large
labeled datasets.
• Multi-modal Vision Models: Combining vision with language and audio to improve
AI's understanding of the world.
• AI-powered Robotics: Advanced vision systems are enabling more capable robotic
assistants and self-driving cars.
Despite these advances, the ultimate goal of achieving human-level vision understanding
remains unsolved.
1.3 Image Formation
Image formation is a critical process in computer vision that describes how a scene is captured
and transformed into an image. This involves light interaction with objects, camera optics,
and sensor characteristics.
Photometric image formation focuses on how light interacts with surfaces and how that
interaction is captured by a camera sensor. The final image's brightness, contrast, and color are
affected by:
• Lighting conditions.
• Reflectance and shading.


• Optical properties of the lens.


• The image sensor’s response to light.
1.3.1 Lighting
Light is the primary medium that allows us to capture images. A scene must be illuminated for
an image to exist.
Types of Light Sources:
1. Point Light Sources – Emit light from a single point in all directions (e.g., light bulbs,
flashlights).
2. Directional Light Sources – Emit parallel rays, creating sharp shadows (e.g., sunlight).
3. Area Light Sources – Cover a broad region and produce soft shadows (e.g., LED
panels, skylights).
1.3. 2 Reflectance and Shading
Reflectance and shading define how light interacts with an object’s surface.
• Reflectance: The proportion of light that a surface reflects.
• Shading: The variation in brightness due to surface orientation, shadowing, and
occlusion.
Shading effects include:
• Diffuse shading: Light is scattered uniformly.
• Specular shading: Light is reflected in a specific direction, creating highlights.
• Shadowing and Occlusion: Some areas receive less light due to obstacles blocking
illumination.
1.3. 3 The Bidirectional Reflectance Distribution Function (BRDF)
The BRDF is a mathematical model that describes how light reflects off a surface at different
angles.
BRDF = dL_r / dE_i

where L_r is the reflected radiance and E_i is the incoming irradiance.


1.3.4 Diffuse Reflection
Diffuse reflection occurs when light is scattered in all directions. It is the primary form of
reflection for matte surfaces (e.g., paper, cloth, unpolished wood).
• Follows Lambert’s Law, which states that brightness remains the same regardless of
the viewer’s angle.


• The intensity is calculated as:

I=L⋅ cos(θ)
where L is the incoming light and θ is the angle between the light source and the surface normal.
1.3.5 Specular Reflection
Specular reflection occurs on shiny surfaces, where light reflects in a specific direction.
• The reflection angle equals the incidence angle.
• Common in materials like glass, water, and polished metal.
• Creates bright highlights, which depend on the viewing angle.
1.3.6 Phong Shading Model
The Phong shading model is used in computer graphics to approximate how light interacts
with surfaces.
It combines:
• Ambient Reflection (constant background illumination).
• Diffuse Reflection (Lambertian model).
• Specular Reflection (creates highlights).
Phong shading formula:
I = k_a I_a + k_d (L · N) I_d + k_s (R · V)^n I_s
where:
• k_a, k_d, k_s are the material's ambient, diffuse, and specular coefficients.
• I_a, I_d, I_s are the ambient, diffuse, and specular light intensities.
• L, N, R, V are the (unit) light, normal, reflection, and view vectors.
• n controls shininess.
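The model is straightforward to evaluate numerically. Below is a minimal Python/NumPy sketch for a single surface point; the function name and the example values are illustrative only, not part of the text.

import numpy as np

def phong_intensity(ka, kd, ks, n, Ia, Id, Is, L, N, V):
    # Phong model from the formula above; L, N, V are assumed to be unit vectors
    L, N, V = (np.asarray(v, dtype=float) for v in (L, N, V))
    R = 2.0 * np.dot(L, N) * N - L                     # mirror reflection of L about N
    diffuse = kd * max(np.dot(L, N), 0.0) * Id         # Lambertian term
    specular = ks * max(np.dot(R, V), 0.0) ** n * Is   # highlight term
    return ka * Ia + diffuse + specular

# Light and viewer directly above the surface: 0.1 + 0.6 + 0.3 = 1.0
print(phong_intensity(0.1, 0.6, 0.3, 32, Ia=1.0, Id=1.0, Is=1.0,
                      L=[0, 0, 1], N=[0, 0, 1], V=[0, 0, 1]))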
1.3.7 Di-Chromatic Reflection Model
The Di-Chromatic Reflection Model states that the observed color of a surface is a mix of:
• Diffuse reflection (color from the material).
• Specular reflection (color from the light source).
Lr=Li+Lb
where Li is specular and Lb is body reflection.
1.3.8 Global Illumination: Ray Tracing and Radiosity
Global illumination methods simulate realistic lighting.
• Ray Tracing: Follows rays of light as they bounce off surfaces, creating realistic
reflections and shadows.


• Radiosity: Simulates indirect lighting by distributing light energy across surfaces.


1.3. 9 Optics
Optics govern how light is focused onto the camera sensor. Key factors include:
• Focal Length: Determines zoom and perspective.
• Aperture (f-number): Controls light entry and depth of field.
• Shutter Speed: Affects exposure and motion blur.
1.3.10 Chromatic Aberration
Chromatic aberration occurs when different wavelengths of light focus at different points,
causing color fringing.

1.3.11 Vignetting
Vignetting causes brightness to fade towards the edges due to lens properties.
1.5 Digital Camera
A digital camera converts light into a digital image using optical lenses, image sensors, and a
digital signal processing pipeline. The accuracy and quality of the captured image depend on
the ability of the camera to sense light, process signals, and minimize noise.
Process of Digital Image Formation
1. Light enters the camera lens and passes through the optical system.
2. The sensor (CCD or CMOS) captures photons and converts them into electrical
signals.
3. The Analog-to-Digital Converter (ADC) transforms the electrical signals into digital
pixel values.
4. Image processing steps (demosaicing, noise reduction, compression, etc.) refine the
final image.
1.5.1 Light Sensing Technology
Light sensing technology in digital cameras involves capturing photons using specialized
sensors and converting them into electronic signals that can be processed to create an image.
1.5.1.1 The Role of Light Sensors
A digital camera’s sensor consists of millions of photodetectors that collect light and
transform it into electrical signals. The amount of charge generated is proportional to the
intensity of incoming light.
Types of Sensors in Light Sensing Technology:
1. Photodiodes – Convert light energy into electrical current.
2. Charge-Coupled Devices (CCDs) – Store and transfer charge pixel-by-pixel.


3. Complementary Metal-Oxide Semiconductor (CMOS) – Amplifies charge at each pixel.
1.5. 2. CCD Sensors (Charge-Coupled Devices)
1.5.2.1 Working Principle
CCD sensors capture images by accumulating light in photodiodes, which convert photons
into electrons. These electrons are stored and transferred pixel by pixel to an output node, where
the charge is amplified and digitized.
1.5.2.2 Advantages of CCD Sensors
• High Image Quality – Less noise and uniform response across pixels.
• Superior Light Sensitivity – Produces clearer images in low-light conditions.
• Better Color Reproduction – Higher dynamic range.
1.5.2.3 Disadvantages of CCD Sensors
• Power Consumption – Requires more power than CMOS sensors.
• Slow Readout Speed – Charge transfer takes time, making CCDs slower in capturing
fast-moving objects.
• Expensive Manufacturing Process – More costly than CMOS sensors.
CCD sensors are widely used in scientific imaging, medical imaging, and high-end
photography due to their superior image quality.
1.5.3. CMOS Sensors (Complementary Metal-Oxide Semiconductor)
1.5.3.1 Working Principle
Unlike CCD sensors, CMOS sensors integrate photodetectors and amplifiers within each
pixel, allowing parallel processing and faster readout speeds.
1.5.3.2 Advantages of CMOS Sensors
• Lower Power Consumption – More energy-efficient than CCDs.
• Faster Readout Speed – Ideal for high-speed photography and video recording.
• Lower Manufacturing Cost – More affordable than CCD sensors.
• Integration with Other Electronics – Easier to incorporate into smartphones and
digital cameras.
1.5.3.3 Disadvantages of CMOS Sensors
• Higher Noise Levels – More prone to fixed pattern noise.
• Lower Image Quality – Less uniform light sensitivity compared to CCDs.
• Reduced Dynamic Range – Can struggle in extreme lighting conditions.


Due to technological advancements, modern CMOS sensors now rival or exceed CCDs in
performance, making them the standard in smartphones, DSLRs, and action cameras.

Figure: Image sensing pipeline, showing the various sources of noise as well as typical digital
post-processing steps.
1.5.4. Key Factors Affecting Sensor Performance
Several factors determine the quality of images captured by digital sensors.
1.5.4.1 Sensor Size
• Larger sensors capture more light and produce images with less noise.
• Full-frame sensors (35mm) offer better performance than smaller sensors (APS-C,
micro four-thirds).
1.5.4.2 Pixel Size and Density
• Larger pixels collect more light, improving low-light performance.
• High pixel density can increase detail but may lead to increased noise.
1.5.4.3 Dynamic Range
• The ability to capture details in both bright and dark areas.
• Larger sensors and better sensor technology improve dynamic range.
1.5.4.4 Noise and Signal-to-Noise Ratio (SNR)
• Noise comes from sensor heat, electronic interference, and photon shot noise.
• Higher SNR leads to cleaner images with less grain.
1.5.4.5 Sensitivity and ISO Performance


• ISO determines how sensitive the sensor is to light.


• Higher ISO increases brightness but introduces more noise.
1.5.4.6 Color Accuracy and White Balance
• Sensors use a color filter array (CFA) to detect red, green, and blue (RGB) colors.
• Correct white balance ensures natural-looking colors.
1.5.4.7 Rolling Shutter vs. Global Shutter
• Rolling Shutter: Exposes pixels row-by-row, causing motion distortion.
• Global Shutter: Captures the entire image at once, eliminating motion artifacts.
1.5.5 Future Trends in Digital Camera Sensors
With continuous advancements in sensor technology, new trends are shaping the future of
digital imaging:
1.5.5.1 Backside-Illuminated (BSI) CMOS Sensors
• Improves low-light sensitivity by rearranging sensor layers.
• Found in smartphones and high-end mirrorless cameras.
1.5.5.2 Stacked Sensors
• Integrate memory and processing within the sensor for faster readout speeds.
1.5.5.3 Quantum Dot Image Sensors
• Use nanotechnology to enhance light sensitivity and color reproduction.
1.5.5.4 AI-Powered Image Processing
• Machine learning algorithms reduce noise and enhance details in real-time.
1.5.5.5 Multi-Sensor Systems
• Smartphones and AR devices use multiple sensors for depth estimation, HDR
imaging, and computational photography.
Conclusion
Digital camera technology has evolved significantly, with CCD and CMOS sensors playing a
crucial role in capturing high-quality images. While CCD sensors offer superior image quality,
CMOS sensors have become the industry standard due to faster readout speeds, lower power
consumption, and cost efficiency. The performance of a digital camera sensor depends on
various factors such as sensor size, pixel density, dynamic range, and noise reduction
techniques. With new advancements in AI, quantum dot sensors, and multi-sensor imaging, the
future of digital photography continues to improve, enabling better low-light performance,
high-speed imaging, and real-time processing.
1.6 Image Processing-Point Operation
Image Processing: Point Operations


Point operations are the simplest type of image processing operations where each pixel in an
image is transformed independently of its neighbors. These operations are commonly used for
brightness adjustment, contrast enhancement, color transformations, histogram equalization,
and compositing.
In computer vision, point operations are often the first stage of preprocessing, preparing an
image for further analysis like edge detection, segmentation, and object recognition.
1. 6.1 Pixel Transforms
1.6.1.1 Definition
A pixel transform is a mathematical operation that modifies the intensity of each pixel based
on a predefined function.
Mathematically, a pixel transform is given by:
𝒈(𝒊, 𝒋) = 𝒉(𝒇(𝒊, 𝒋))
where:
• 𝒇(𝒊, 𝒋)is the intensity of the input pixel at coordinates (𝒊, 𝒋)
• 𝒈(𝒊, 𝒋) is the transformed intensity of the output pixel.
• 𝒉() is the transformation function applied to each pixel.
1.6.1.2 Types of Pixel Transforms
1.6.1.2.1 Brightness Adjustment
Brightness adjustment modifies pixel intensities by adding a constant value b to every pixel:
g(x) = f(x) + b
• If b > 0, the image becomes brighter.
• If b < 0, the image becomes darker.
1.6.1.2.2 Contrast Adjustment
Contrast adjustment scales pixel intensities by a factor a:
𝒈(𝒙) = 𝒂𝒇(𝒙)
• If a>1, contrast is increased.
• If 0<a<1, contrast is reduced.
1.6.1.2.3 Gamma Correction
Gamma correction applies a non-linear transformation to adjust brightness and contrast:
𝒈(𝒙) = 𝒇(𝒙)𝜸
Used for display calibration and HDR imaging.
• Common gamma value: γ ≈ 2.2 for standard monitors.
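The three transforms above combine naturally into one small routine. The following Python/NumPy sketch is illustrative for 8-bit images; the function name and default values are not from the text.

import numpy as np

def adjust(img, a=1.0, b=0.0, gamma=1.0):
    # g(x) = a*f(x) + b, followed by gamma correction, for an 8-bit image
    f = img.astype(np.float64)
    g = np.clip(a * f + b, 0, 255)        # contrast (a) and brightness (b)
    g = 255.0 * (g / 255.0) ** gamma      # gamma applied to normalized intensities
    return g.astype(np.uint8)

img = np.tile(np.arange(256, dtype=np.uint8), (32, 1))   # synthetic gradient image
brighter = adjust(img, b=40)            # b > 0: brighter
higher_contrast = adjust(img, a=1.5)    # a > 1: more contrast
gamma_corrected = adjust(img, gamma=2.2)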


1.6. 2. Color Transforms


1.6.2.1 Purpose of Color Transforms
• Convert images between different color spaces.
• Adjust white balance to correct lighting conditions.
• Enhance specific colors for image segmentation and object recognition.
1.6.2.2 Common Color Transformations
1.6.2.2.1 RGB to Grayscale Conversion
Grayscale conversion removes color by computing a weighted sum of the red, green, and blue
channels:
I = 0.2989 R + 0.5870 G + 0.1140 B
• Preserves brightness perception according to human vision sensitivity.
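A one-line NumPy version of this weighted sum (illustrative sketch; expects an H × W × 3 array):

import numpy as np

def rgb_to_gray(rgb):
    # weighted sum of the R, G, B channels using the coefficients above
    return rgb.astype(np.float64) @ np.array([0.2989, 0.5870, 0.1140])

gray = rgb_to_gray(np.random.randint(0, 256, (4, 4, 3)))   # (4, 4) result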
1.6.2.2.2 White Balance Correction
White balance corrects color casts due to different lighting conditions using the Gray World
Assumption:
White-balanced color = Observed color / Average scene color

Other methods include:


• Color Temperature Correction: Adjusts RGB values to match a reference white
point.
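A small NumPy sketch of the Gray World correction (the function name and clipping behaviour are assumptions; real camera pipelines typically apply this in linear RGB):

import numpy as np

def gray_world(rgb):
    # scale each channel so that the average scene colour becomes a neutral grey
    f = rgb.astype(np.float64)
    channel_means = f.reshape(-1, 3).mean(axis=0)      # average scene colour
    gains = channel_means.mean() / channel_means       # per-channel correction gains
    return np.clip(f * gains, 0, 255).astype(np.uint8)

balanced = gray_world(np.random.randint(0, 256, (8, 8, 3)).astype(np.uint8))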
1.6. 2.2.3 HSV (Hue, Saturation, Value) Transformation
HSV transformation separates brightness (Value) from color (Hue, Saturation), making it
useful for color-based segmentation:
H = arctan( √3 (G − B) / (2R − G − B) )
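A quick numerical check of this hue term (illustrative only; arctan2 is used so that the correct quadrant is returned):

import numpy as np

def hue_degrees(r, g, b):
    # H = arctan( sqrt(3)(G - B) / (2R - G - B) ), mapped to [0, 360)
    return np.degrees(np.arctan2(np.sqrt(3.0) * (g - b), 2.0 * r - g - b)) % 360.0

print(hue_degrees(1.0, 0.0, 0.0))   # pure red   ->   0
print(hue_degrees(0.0, 1.0, 0.0))   # pure green -> 120
print(hue_degrees(0.0, 0.0, 1.0))   # pure blue  -> 240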

1.6.3. Compositing and Matting


1.6.3.1 What is Compositing?
• Compositing merges multiple images into a single image.
• Used in photo editing, green screen effects, and augmented reality (AR).
1.6.3.2 What is Matting?
• Matting extracts the foreground from the background.


• Example: Green screen removal in video production.


Mathematically, compositing is expressed as:
𝐶 = 𝛼𝐹 + (1 − 𝛼)𝐵
where:
• C is the composite image.
• F is the foreground image.
• B is the background image.
• α is the transparency (alpha) mask.
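The compositing equation maps directly onto array arithmetic. The sketch below assumes 8-bit foreground/background images and a per-pixel alpha matte with values in [0, 1]:

import numpy as np

def composite(fg, bg, alpha):
    # C = alpha*F + (1 - alpha)*B, with alpha broadcast over the colour channels
    a = alpha[..., None]                       # (H, W) -> (H, W, 1)
    c = a * fg.astype(np.float64) + (1.0 - a) * bg.astype(np.float64)
    return c.astype(np.uint8)

fg = np.full((4, 4, 3), 255, np.uint8)         # white foreground
bg = np.zeros((4, 4, 3), np.uint8)             # black background
alpha = np.full((4, 4), 0.25)                  # 25% opaque matte
print(composite(fg, bg, alpha)[0, 0])          # -> [63 63 63]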

1.6.4. Histogram Equalization


1.6.4.1 What is Histogram Equalization?
Histogram equalization enhances contrast by redistributing pixel intensities to utilize the full
intensity range.
1.6.4.2 Steps for Histogram Equalization
1. Compute the histogram of the image.
2. Calculate the cumulative distribution function (CDF).
3. Map old intensity values to new values using the CDF.
Mathematically, the mapping derived from the CDF is:

I′ = (L − 1) × CDF(I)

where:
• I′ is the new intensity and I is the original intensity.
• CDF(I) is the fraction of pixels with intensity less than or equal to I.
• L is the number of grey levels (256 for 8-bit images).
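The three steps above can be written compactly for an 8-bit grayscale image; the following is an illustrative NumPy sketch, not a library routine.

import numpy as np

def equalize(gray):
    hist = np.bincount(gray.ravel(), minlength=256)   # step 1: histogram
    cdf = hist.cumsum() / gray.size                   # step 2: normalized CDF
    lut = np.round(255.0 * cdf).astype(np.uint8)      # step 3: intensity mapping
    return lut[gray]

low_contrast = np.random.randint(100, 140, (64, 64), dtype=np.uint8)
print(equalize(low_contrast).min(), equalize(low_contrast).max())  # spread toward the full 0-255 range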
1.6.4.3 Applications
• Medical imaging: Enhances X-ray and MRI scans.
• Satellite imagery: Improves contrast in aerial photos.
1.6.5 Tonal Adjustment
Tonal adjustment improves image appearance by modifying brightness and contrast.
1.6.5.1 Techniques
• Exposure correction: Fixes underexposed or overexposed images.


• Contrast enhancement: Expands the difference between dark and bright areas.
• Shadow and highlight adjustments: Preserves details in dark and bright regions.
1.6.5.2 Applications in HDR Imaging
• Merges multiple images taken at different exposures.
• Enhances low-light images and high-contrast scenes.
Conclusion
Point operations in image processing modify each pixel independently, making them fast
and efficient for real-time applications. Key transformations include:
• Pixel transforms for brightness, contrast, and gamma correction.
• Color transforms for grayscale conversion, white balance, and HSV processing.
• Compositing and matting for blending images.
• Histogram equalization for contrast enhancement.
• Tonal adjustment for fine-tuning brightness and contrast.
1.7 Image Processing-Linear Filtering
Linear filtering is a fundamental technique in image processing and computer vision, where
each pixel in an output image is obtained by applying a fixed weighted combination of
neighboring pixel values in the input image. It is widely used for:
• Blurring (smoothing)
• Sharpening
• Edge detection
• Noise removal
• Feature enhancement
Linear filtering plays a crucial role in image preprocessing and feature extraction, making it
a building block for many vision-based applications.
1.7.1 Fundamentals of Linear Filtering
1.7.1.1 Definition
Linear filtering operates by computing the weighted sum of neighboring pixel values using a
filter kernel or mask.
Mathematically, a linear filter is expressed as:
g (i, j) = ∑𝑘,𝑙 𝑓(𝑖 + 𝑘, 𝑗 + 𝑙) ℎ(𝑘, 𝑙)
where:
g (i, j) is the filtered image at pixel (i, j)


f (i, j) is the input image


h (k, l) is the filter kernel (weight mask).
The summation runs over all values in the neighborhood defined by the kernel.
1.7. 1.2 Convolution vs. Correlation
Linear filtering can be performed using correlation or convolution.
• Correlation: the filter is applied to the image directly:
g(i, j) = ∑_{k,l} f(i + k, j + l) h(k, l), written g = f ⊛ h
• Convolution: the filter kernel is flipped before being applied:
g(i, j) = ∑_{k,l} f(i − k, j − l) h(k, l), written g = f ∗ h
The key difference is that convolution reverses the filter kernel, which is important for mathematical consistency in signal processing (for example, convolving with an impulse reproduces the kernel itself).
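The flip is easy to see on an impulse image. The naive loops below are an illustrative sketch (valid region only, no padding), not an efficient implementation:

import numpy as np

def correlate2d(f, h):
    # g(i, j) = sum_{k,l} f(i + k, j + l) h(k, l), valid region only
    K, L = h.shape
    out = np.zeros((f.shape[0] - K + 1, f.shape[1] - L + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i + K, j:j + L] * h)
    return out

def convolve2d(f, h):
    # convolution = correlation with the kernel flipped in both axes
    return correlate2d(f, np.flip(h))

f = np.zeros((5, 5)); f[2, 2] = 1.0            # impulse image
h = np.arange(1.0, 10.0).reshape(3, 3)
print(convolve2d(f, h))                        # reproduces h itself (impulse response)
print(correlate2d(f, h))                       # reproduces h rotated by 180 degrees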
1.7. 2 Types of Linear Filters
1.7.2.1 Smoothing Filters (Low-pass Filtering)
Smoothing filters reduce noise and remove high-frequency components, making an image
appear blurred.
1.7. 2.1.1 Box Filter (Mean Filter)
The box filter replaces each pixel with the average of its neighboring pixels.
h(k, l) = 1 / K²

where K is the width of the (square) filter kernel.


1.7.2.1.2 Gaussian Filter
The Gaussian filter is a weighted smoothing filter where closer pixels contribute more to the
output.
h(x, y) = (1 / (2πσ²)) · e^(−(x² + y²) / (2σ²))

where σ controls the amount of smoothing.
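A sampled version of this kernel can be built directly. In the sketch below (illustrative names and defaults), the final normalization makes the weights sum to 1 even after truncation to a finite window:

import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - (size - 1) / 2.0          # e.g. [-2, -1, 0, 1, 2]
    x, y = np.meshgrid(ax, ax)
    h = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return h / h.sum()                               # renormalize the truncated kernel

print(gaussian_kernel(3, 1.0))   # centre weight largest, corner weights smallest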


1.7.2.2 Sharpening Filters (High-pass Filtering)
Sharpening filters enhance edges and fine details.
1.7.2.2.1 Laplacian Filter
The Laplacian filter detects regions of rapid intensity change.
      [ 0  −1   0 ]
h =   [−1   4  −1 ]
      [ 0  −1   0 ]
1.7.2.2.2 Unsharp Masking
Unsharp masking sharpens an image by subtracting a blurred version from the original:
g_sharp = f + γ (f − h_blur ∗ f)

where h_blur is a Gaussian blur filter and γ controls the amount of sharpening.
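A short sketch of unsharp masking using SciPy's Gaussian blur (assumes SciPy is available; parameter values are illustrative):

import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp(f, gamma=1.0, sigma=2.0):
    # g_sharp = f + gamma * (f - h_blur * f)
    f = f.astype(np.float64)
    blurred = gaussian_filter(f, sigma)              # h_blur * f
    return np.clip(f + gamma * (f - blurred), 0, 255).astype(np.uint8)

sharp = unsharp(np.random.randint(0, 256, (64, 64)).astype(np.uint8))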


1.7. 2.3 Edge Detection Filters
Edge detection filters highlight boundaries in an image.
1.7. 2.3.1 Sobel Operator
The Sobel operator detects edges using two separable filters:
      [−1   0   1]          [−1  −2  −1]
Gx =  [−2   0   2]    Gy =  [ 0   0   0]
      [−1   0   1]          [ 1   2   1]

The gradient magnitude is computed as:

G = √(Gx² + Gy²)
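An illustrative NumPy/SciPy sketch of the Sobel gradient magnitude on a synthetic step edge (assumes SciPy; a hand-written convolution would work equally well):

import numpy as np
from scipy.ndimage import convolve

def sobel_magnitude(f):
    gx_kernel = np.array([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    gy_kernel = gx_kernel.T                    # the vertical-gradient kernel above
    f = f.astype(np.float64)
    gx = convolve(f, gx_kernel)                # horizontal gradient Gx
    gy = convolve(f, gy_kernel)                # vertical gradient Gy
    return np.hypot(gx, gy)                    # G = sqrt(Gx^2 + Gy^2)

step = np.zeros((5, 8)); step[:, 4:] = 255.0   # vertical step edge
print(sobel_magnitude(step)[2])                # strong response at the edge columns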

1.7. 3. Separable Filtering


Separable filtering speeds up convolution by splitting a 2D filter into two 1D filters.
h(x, y) = h_x(x) · h_y(y)

For a K × K kernel, this reduces the per-pixel cost from O(K²) to O(2K) ≈ O(K).
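A quick numerical check of separability for the box filter above: the outer product of two 1-D box filters reproduces the 2-D kernel, so the 2-D filtering can be done as two 1-D passes.

import numpy as np

K = 3
h1d = np.full(K, 1.0 / K)                # 1-D box filter, h_x = h_y
h2d = np.outer(h1d, h1d)                 # equivalent 2-D kernel
print(np.allclose(h2d, 1.0 / K**2))      # True: every weight equals 1/K^2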


1.7. 4 Applications of Linear Filtering
1.7.4.1 Noise Removal
• Gaussian filters remove high-frequency noise.
• Averaging filters reduce random noise.
1.7. 4.2 Feature Enhancement
• Laplacian filters sharpen edges.
• Unsharp masking enhances details.


1.7.4.3 Edge Detection


• Sobel filters highlight object boundaries.
1.7.4.4 Image Compression
• Low-pass filters reduce redundant information for efficient storage.
1.7. 5 Band-pass and Steerable Filters
Band-pass filters extract specific frequency components, useful for texture analysis and feature extraction. A common construction is the difference of Gaussians (DoG), which subtracts a wider low-pass kernel from a narrower one:

h_band-pass = h_low-pass(σ1) − h_low-pass(σ2), with σ1 < σ2

Steerable filters adjust orientation-dependent responses dynamically.


Conclusion
Linear filtering is a powerful tool in image processing, allowing:
• Smoothing (Gaussian, Box filters)
• Sharpening (Laplacian, Unsharp Masking)
• Edge Detection (Sobel, Prewitt)
• Feature Enhancement (Steerable Filters)
Important Questions
1. What is Computer Vision? Why is vision so difficult? Provide six real-world examples
of Computer Vision and explain each.
2. Illustrate with a real-world example how a photometric image is formed.
3. Explain the history of Computer Vision.
4. Illustrate with an example various types of Point operations in image processing.
5. Compare and contrast different types of image sensors used in digital cameras.
6. Discuss the role of feature extraction in Computer Vision.
7. Explain the different types of light sources and their impact on image formation.
8. Describe the impact of deep learning on modern Computer Vision applications.
9. What are the main challenges in object recognition and segmentation in Computer
Vision?
Brief Answers (indicative only; not sufficient for examination purposes)
Brief answers to the above questions, based on the syllabus:
1. What is Computer Vision? Why is vision so difficult? Provide six real-world examples
of Computer Vision and explain each.


o Computer Vision is a field of AI that enables machines to interpret visual information, similar to human vision. It is difficult because real-world images have variations in lighting, occlusions, noise, and different viewpoints.
o Real-world Examples:
1. Facial Recognition – Used in security and authentication systems.
2. Autonomous Vehicles – Detects objects, lanes, and pedestrians for self-
driving.
3. Medical Imaging – Identifies diseases in X-rays, MRIs, and CT scans.
4. Retail Automation – Used in checkout-free stores like Amazon Go.
5. Industrial Inspection – Detects defects in manufacturing.
6. Surveillance Systems – Used for real-time monitoring and security.

2. Illustrate with a real-world example how a photometric image is formed.


o Photometric image formation depends on light interaction with surfaces, camera
optics, and sensor response.
o Example: A photograph of a car taken at sunset has variations in brightness due
to sunlight direction, reflections on the metallic surface, and shadowing from
nearby objects.
3. Explain the history of Computer Vision.
o 1960s-70s: Early research focused on edge detection and pattern recognition.
o 1980s: Mathematical models like optical flow and 3D reconstruction emerged.
o 1990s: Machine learning methods improved feature extraction and object
recognition.
o 2000s: Feature-based recognition (e.g., SIFT), 3D modeling, and segmentation
advanced.
o 2010s: Deep learning revolutionized vision tasks with CNNs, GANs, and object
detection models.
o 2020s & Beyond: AI-driven robotics, multi-modal vision, and generalized AI
models.
4. Illustrate with an example various types of Point operations in image processing.
o Point operations modify pixel values based on intensity transformations.
o Example:
▪ Contrast Stretching: Enhances image contrast by stretching pixel values.
▪ Thresholding: Converts grayscale images into binary based on a
threshold.


▪ Log Transform: Enhances darker areas in low-light images.

5. Compare and contrast different types of image sensors used in digital cameras.
o CCD (Charge-Coupled Device):
▪ Produces high-quality images with low noise.
▪ Expensive and consumes more power.
o CMOS (Complementary Metal-Oxide-Semiconductor):
▪ Cheaper, consumes less power, and offers faster processing.
▪ Slightly lower image quality compared to CCD
6. Discuss the role of feature extraction in Computer Vision.
o Feature extraction identifies key patterns in images for tasks like object
recognition.
o Examples:
▪ SIFT (Scale-Invariant Feature Transform): Detects distinctive points in
images.
▪ HOG (Histogram of Oriented Gradients): Used in human detection.
▪ SURF (Speeded Up Robust Features): Faster alternative to SIFT.

7. Explain the different types of light sources and their impact on image formation.
o Point Light Sources: Emit light in all directions (e.g., bulbs, flashlights).
o Directional Light Sources: Produce parallel rays (e.g., sunlight).
o Area Light Sources: Cover broad areas and create soft shadows (e.g., LED
panels).

8. Describe the impact of deep learning on modern Computer Vision applications.


o CNNs (Convolutional Neural Networks): Revolutionized image recognition.
o Faster R-CNN & Mask R-CNN: Advanced object detection and segmentation.
o GANs (Generative Adversarial Networks): Enabled realistic image synthesis.
o Self-Supervised Learning: Reduced reliance on labeled data.

9. What are the main challenges in object recognition and segmentation in Computer
Vision?
o Challenges include:


▪ Occlusion (partially hidden objects).


▪ Variability in lighting and viewpoints.
▪ Noise and low-resolution images.
▪ Complex backgrounds affecting object separation

***************
