BELGAUM
COMPUTER VISION (BCS613B)
STUDY MATERIAL
VI-SEMESTER
MODULE - 1
1. The 1960s and 1970s: The Birth of Computer Vision
One of the most famous early attempts was initiated by Marvin Minsky at MIT in 1966, who
assigned a student to "spend the summer linking a camera to a computer and getting the
computer to describe what it saw."
Key Research Topics in the 1970s
• Edge detection: Early work focused on identifying object boundaries in images.
• Line labeling & 3D structure inference: Researchers attempted to recover the
structure of 3D objects from 2D line drawings.
• Pattern recognition: Developing methods to classify visual data based on statistical
models.
• Stereo vision: Early techniques were developed to estimate depth using two images
taken from different viewpoints.
One of the major distinctions between computer vision and the existing field of digital image
processing was the desire to recover the 3D structure of the world rather than simply
enhancing images.
2. The 1980s: The Rise of Mathematical and Statistical Methods
During the 1980s, researchers focused on more rigorous mathematical models for vision
tasks. This period saw:
• Image pyramids: Multi-scale image representations were used for image blending
and coarse-to-fine processing.
• Shape from shading: Algorithms were developed to estimate surface properties based
on how light interacted with an object.
• Optical flow: Motion estimation techniques were introduced to track movement across
frames in a video.
• 3D reconstruction: Early structure from motion (SfM) techniques were proposed to
recover the 3D shape of objects from image sequences.
David Marr’s work (1982) played a significant role in shaping the way researchers
approached computer vision. He introduced the idea of three levels of vision processing:
1. Low-level vision: Extracting basic features like edges and textures.
2. Intermediate-level vision: Grouping features into meaningful structures.
3. High-level vision: Recognizing and interpreting objects.
3. The 1990s: Advancements in 3D Vision and Statistical Approaches
The 1990s witnessed significant progress in computer vision algorithms and applications
due to increased computational power. Key developments included:
• Projective geometry and camera models: Advances in geometric image formation
allowed for better 3D reconstruction.
• Facial recognition and object detection: The first successful methods for face
recognition were developed using Eigenfaces (1991).
• Markov Random Fields (MRFs) and Kalman Filters: Probabilistic models improved
object tracking and segmentation tasks.
This period also saw the rise of real-time vision systems, enabling applications such as
automatic number plate recognition (ANPR) and industrial inspection.
4. The 2000s: Machine Learning and Data-Driven Methods
The early 2000s marked a paradigm shift in computer vision with the introduction of machine
learning-based methods.
Key Developments
• Feature-based recognition: Algorithms like SIFT (Scale-Invariant Feature
Transform, 2004) became the standard for feature matching and object recognition.
• Graph-based segmentation: Techniques such as Normalized Cuts (Shi & Malik,
2000) improved image segmentation.
• 3D Modeling & Photogrammetry: Advances in Structure from Motion (SfM)
enabled large-scale 3D reconstructions from images.
• Image-based Rendering: Techniques such as light-field capture and HDR imaging
allowed for photo-realistic image synthesis.
A key trend in this decade was the interplay between computer vision and computer
graphics, enabling applications such as augmented reality (AR) and computational
photography (Computer Vision: Algorithms and Applications, pages 19-20).
5. The 2010s: The Deep Learning Revolution
The most significant breakthrough in computer vision came in the 2010s with deep learning.
The availability of large-scale labeled datasets, such as ImageNet (2009), combined with
advances in GPU computing, led to a dramatic improvement in vision algorithms.
Key Breakthroughs
• Convolutional Neural Networks (CNNs): The AlexNet model (2012) outperformed
traditional vision algorithms on the ImageNet Challenge, sparking the deep learning
revolution.
• Object Detection & Semantic Segmentation: Algorithms such as Faster R-CNN
(2015) and Mask R-CNN (2017) achieved state-of-the-art results.
• Generative Adversarial Networks (GANs): Introduced in 2014, GANs allowed for
realistic image synthesis and enhancement.
• Real-time 3D Scene Understanding: Systems like KinectFusion (2011) enabled
depth estimation and 3D reconstruction in real-time.
• Self-Supervised Learning: Models began learning representations from unlabeled
images, reducing dependence on large labeled datasets.
This period also saw a rise in autonomous driving, medical imaging AI, and facial
recognition systems, making computer vision a mainstream technology.
6. The 2020s and Beyond: Towards Generalized AI Vision
Today, computer vision continues to evolve toward more generalized AI systems that can
learn from fewer examples and adapt to new environments.
Emerging Trends
• Neural Rendering & 3D Reconstruction: AI-driven rendering techniques are making
virtual worlds more realistic.
• Self-Supervised & Few-Shot Learning: New models aim to reduce the need for large
labeled datasets.
• Multi-modal Vision Models: Combining vision with language and audio to improve
AI's understanding of the world.
• AI-powered Robotics: Advanced vision systems are enabling more capable robotic
assistants and self-driving cars.
Despite these advances, the ultimate goal of achieving human-level vision understanding
remains unsolved.
1.3.4 Diffuse (Lambertian) Reflection
Diffuse (Lambertian) reflection scatters incoming light equally in all directions, so the perceived
brightness depends only on the angle between the light and the surface normal:
I = L · cos(θ)
where L is the incoming light and θ is the angle between the light source direction and the surface normal.
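As a quick illustration, here is a minimal Python/NumPy sketch of the cosine law above (the function and variable names are illustrative, not from the text):

```python
import numpy as np

def lambertian_intensity(L, light_dir, normal):
    """Lambert's cosine law: I = L * cos(theta), clamped at zero so that
    surfaces facing away from the light receive no illumination."""
    light_dir = light_dir / np.linalg.norm(light_dir)
    normal = normal / np.linalg.norm(normal)
    cos_theta = np.dot(light_dir, normal)   # cos(theta) = L_dir . N for unit vectors
    return L * max(cos_theta, 0.0)

# Example: light arriving at 60 degrees to the surface normal gives I = 0.5 * L
print(lambertian_intensity(1.0,
                           np.array([0.0, np.sin(np.pi / 3), np.cos(np.pi / 3)]),
                           np.array([0.0, 0.0, 1.0])))   # ≈ 0.5
```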
1.3.5 Specular Reflection
Specular reflection occurs on shiny surfaces, where light reflects in a specific direction.
• The reflection angle equals the incidence angle.
• Common in materials like glass, water, and polished metal.
• Creates bright highlights, which depend on the viewing angle.
1.3.6 Phong Shading Model
The Phong shading model is used in computer graphics to approximate how light interacts
with surfaces.
It combines:
• Ambient Reflection (constant background illumination).
• Diffuse Reflection (Lambertian model).
• Specular Reflection (creates highlights).
Phong shading formula:
I = ka Ia + kd (L · N) Id + ks (R · V)^n Is
where:
• ka, kd, ks are the ambient, diffuse, and specular material coefficients.
• Ia, Id, Is are the ambient, diffuse, and specular light intensities.
• L, N, R, V are the light, surface normal, reflection, and view vectors.
• n controls the shininess (specular exponent).
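A compact Python/NumPy sketch of the Phong model, combining the ambient, diffuse, and specular terms above (all coefficient values in the example call are arbitrary illustrations):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def phong_intensity(ka, kd, ks, n, Ia, Id, Is, L, N, V):
    """Phong model: ambient + diffuse (Lambertian) + specular highlight."""
    L, N, V = normalize(L), normalize(N), normalize(V)
    R = 2.0 * np.dot(L, N) * N - L                    # reflection of L about the normal N
    diffuse = kd * max(np.dot(L, N), 0.0) * Id        # Lambertian term
    specular = ks * max(np.dot(R, V), 0.0) ** n * Is  # highlight, sharper for larger n
    return ka * Ia + diffuse + specular

# Example: light at 45 degrees, viewer looking straight down the surface normal
I = phong_intensity(ka=0.1, kd=0.7, ks=0.2, n=32,
                    Ia=1.0, Id=1.0, Is=1.0,
                    L=np.array([0.0, 1.0, 1.0]),
                    N=np.array([0.0, 0.0, 1.0]),
                    V=np.array([0.0, 0.0, 1.0]))
print(round(I, 3))   # ambient + diffuse dominate; the highlight is tiny at this geometry
```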
1.3.7 Dichromatic Reflection Model
The Dichromatic Reflection Model states that the observed color of a surface is a mix of:
• Diffuse reflection (color from the material).
• Specular reflection (color from the light source).
Lr = Li + Lb
where Li is the interface (specular) reflection and Lb is the body (diffuse) reflection.
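An illustrative Python sketch of this color decomposition for a single RGB pixel (the colors and scale factors below are made-up values used only to show the mixing):

```python
import numpy as np

# Body (diffuse) color comes from the material; interface (specular) color
# comes from the light source, as the dichromatic model describes.
body_color = np.array([0.8, 0.2, 0.2])    # reddish surface
light_color = np.array([1.0, 1.0, 0.9])   # slightly warm illuminant

m_b, m_i = 0.6, 0.3                       # geometry-dependent weights for body/interface terms
Lr = m_b * body_color + m_i * light_color
print(Lr)   # the observed color is a mixture of material color and light color
```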
1.3.8 Global Illumination: Ray Tracing and Radiosity
Global illumination methods simulate realistic lighting.
• Ray Tracing: Follows rays of light as they bounce off surfaces, creating realistic
reflections and shadows.
• Radiosity: Models the diffuse inter-reflection of light between surfaces, producing soft,
view-independent illumination.
1.3.11 Vignetting
Vignetting is the gradual fall-off of image brightness toward the edges of the frame, caused by
the optical and mechanical properties of the lens.
1.5 Digital Camera
A digital camera converts light into a digital image using optical lenses, image sensors, and a
digital signal processing pipeline. The accuracy and quality of the captured image depend on
the ability of the camera to sense light, process signals, and minimize noise.
Process of Digital Image Formation
1. Light enters the camera lens and passes through the optical system.
2. The sensor (CCD or CMOS) captures photons and converts them into electrical
signals.
3. The Analog-to-Digital Converter (ADC) transforms the electrical signals into digital
pixel values.
4. Image processing steps (demosaicing, noise reduction, compression, etc.) refine the
final image.
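The following simplified Python sketch mimics steps 2 and 3 of this pipeline (photon capture with shot noise, well saturation, read noise, and ADC quantization); the constants such as full-well capacity, read noise, and bit depth are illustrative assumptions, and the step-4 processing (demosaicing, compression) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sense_and_digitize(photons, full_well=20000, read_noise=5.0, bits=12):
    """Toy sensor model: photon shot noise -> saturation -> read noise -> ADC."""
    electrons = rng.poisson(photons).astype(float)           # photon shot noise
    electrons = np.minimum(electrons, full_well)             # sensor well saturation
    electrons += rng.normal(0.0, read_noise, photons.shape)  # readout (electronic) noise
    dn = np.clip(electrons / full_well, 0.0, 1.0) * (2**bits - 1)
    return np.round(dn).astype(np.uint16)                    # digital pixel values

# A flat patch receiving roughly 5000 photons per pixel
print(sense_and_digitize(np.full((4, 4), 5000.0)))
```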
1.5.1 Light Sensing Technology
Light sensing technology in digital cameras involves capturing photons using specialized
sensors and converting them into electronic signals that can be processed to create an image.
1.5.1.1 The Role of Light Sensors
A digital camera’s sensor consists of millions of photodetectors that collect light and
transform it into electrical signals. The amount of charge generated is proportional to the
intensity of incoming light.
Types of Sensors in Light Sensing Technology:
1. Photodiodes – Convert light energy into electrical current.
2. Charge-Coupled Devices (CCDs) – Store and transfer charge pixel-by-pixel.
3. Complementary Metal-Oxide-Semiconductor (CMOS) sensors – Convert charge to voltage
at each pixel, enabling faster, lower-power readout.
Due to technological advancements, modern CMOS sensors now rival or exceed CCDs in
performance, making them the standard in smartphones, DSLRs, and action cameras.
Figure: Image sensing pipeline, showing the various sources of noise as well as typical digital
post-processing steps.
1.5.4. Key Factors Affecting Sensor Performance
Several factors determine the quality of images captured by digital sensors.
1.5.4.1 Sensor Size
• Larger sensors capture more light and produce images with less noise.
• Full-frame sensors (35mm) offer better performance than smaller sensors (APS-C,
micro four-thirds).
1.5.4.2 Pixel Size and Density
• Larger pixels collect more light, improving low-light performance.
• High pixel density can increase detail but may lead to increased noise.
1.5.4.3 Dynamic Range
• The ability to capture details in both bright and dark areas.
• Larger sensors and better sensor technology improve dynamic range.
1.5.4.4 Noise and Signal-to-Noise Ratio (SNR)
• Noise comes from sensor heat, electronic interference, and photon shot noise.
• Higher SNR leads to cleaner images with less grain.
1.5.4.5 Sensitivity and ISO Performance
• ISO controls how strongly the sensor signal is amplified; higher ISO settings brighten
images captured in low light.
• Raising the ISO also amplifies noise, so higher sensitivity trades off against image quality.
1.6 Image Processing-Point Operations
Point operations are the simplest type of image processing operations where each pixel in an
image is transformed independently of its neighbors. These operations are commonly used for
brightness adjustment, contrast enhancement, color transformations, histogram equalization,
and compositing.
In computer vision, point operations are often the first stage of preprocessing, preparing an
image for further analysis like edge detection, segmentation, and object recognition.
1.6.1 Pixel Transforms
1.6.1.1 Definition
A pixel transform is a mathematical operation that modifies the intensity of each pixel based
on a predefined function.
Mathematically, a pixel transform is given by:
𝒈(𝒊, 𝒋) = 𝒉(𝒇(𝒊, 𝒋))
where:
• 𝒇(𝒊, 𝒋) is the intensity of the input pixel at coordinates (𝒊, 𝒋).
• 𝒈(𝒊, 𝒋) is the transformed intensity of the output pixel.
• 𝒉() is the transformation function applied to each pixel.
1.6.1.2 Types of Pixel Transforms
1.6.1.2.1 Brightness Adjustment
Brightness adjustment modifies pixel intensities by adding a constant value b to every pixel:
𝒈(𝒙) = 𝒇(𝒙) + 𝒃
• If b > 0, the image becomes brighter.
• If b < 0, the image becomes darker.
1.6.1.2.2 Contrast Adjustment
Contrast adjustment scales pixel intensities by a factor a:
𝒈(𝒙) = 𝒂𝒇(𝒙)
• If a>1, contrast is increased.
• If 0<a<1, contrast is reduced.
1.6.1.2.3 Gamma Correction
Gamma correction applies a non-linear transformation to adjust brightness and contrast:
𝒈(𝒙) = 𝒇(𝒙)^𝜸
• Used for display calibration and HDR imaging.
• Common gamma values: γ ≈ 2.2 for standard monitors.
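The brightness, contrast, and gamma transforms above are all instances of g(i, j) = h(f(i, j)) from Section 1.6.1 and can be combined in a few lines of NumPy; a minimal sketch (the parameter values in the example call are arbitrary):

```python
import numpy as np

def adjust(image, a=1.0, b=0.0, gamma=1.0):
    """Apply contrast (a), brightness (b), and gamma correction to an 8-bit image."""
    f = image.astype(np.float64) / 255.0         # work in the range [0, 1]
    g = np.clip(a * f + b / 255.0, 0.0, 1.0)     # g(x) = a * f(x) + b
    g = g ** gamma                               # g(x) = f(x) ** gamma
    return (g * 255.0).astype(np.uint8)

img = np.arange(256, dtype=np.uint8).reshape(16, 16)   # synthetic gradient image
out = adjust(img, a=1.2, b=20, gamma=1 / 2.2)          # higher contrast, brighter, gamma-encoded
```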
Contrast stretching rescales the intensity values of an image to span the full output range:
I′ = (I − Imin) / (Imax − Imin) × 255
where:
• I′ is the new intensity.
• I is the original intensity.
• Imin and Imax are the minimum and maximum intensity values in the image.
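A short NumPy sketch of this min–max stretch (assuming 8-bit images; the flat-image guard is an added safety check):

```python
import numpy as np

def contrast_stretch(image):
    """Linearly rescale intensities so the darkest pixel maps to 0 and the brightest to 255."""
    f = image.astype(np.float64)
    i_min, i_max = f.min(), f.max()
    if i_max == i_min:                      # avoid division by zero on a flat image
        return image.copy()
    stretched = (f - i_min) / (i_max - i_min) * 255.0
    return stretched.astype(np.uint8)

low_contrast = np.random.default_rng(1).integers(100, 150, (8, 8), dtype=np.uint8)
print(contrast_stretch(low_contrast))       # output values now span the full 0–255 range
```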
1.6.4.3 Applications
• Medical imaging: Enhances X-ray and MRI scans.
• Satellite imagery: Improves contrast in aerial photos.
1.6.5 Tonal Adjustment
Tonal adjustment improves image appearance by modifying brightness and contrast.
1.6.5.1 Techniques
• Exposure correction: Fixes underexposed or overexposed images.
• Contrast enhancement: Expands the difference between dark and bright areas.
• Shadow and highlight adjustments: Preserves details in dark and bright regions.
1.6.5.2 Applications in HDR Imaging
• Merges multiple images taken at different exposures.
• Enhances low-light images and high-contrast scenes.
Conclusion
Point operations in image processing modify each pixel independently, making them fast
and efficient for real-time applications. Key transformations include:
• Pixel transforms for brightness, contrast, and gamma correction.
• Color transforms for grayscale conversion, white balance, and HSV processing.
• Compositing and matting for blending images.
• Histogram equalization for contrast enhancement.
• Tonal adjustment for fine-tuning brightness and contrast.
1.7 Image Processing-Linear Filtering
Linear filtering is a fundamental technique in image processing and computer vision, where
each pixel in an output image is obtained by applying a fixed weighted combination of
neighboring pixel values in the input image. It is widely used for:
• Blurring (smoothing)
• Sharpening
• Edge detection
• Noise removal
• Feature enhancement
Linear filtering plays a crucial role in image preprocessing and feature extraction, making it
a building block for many vision-based applications.
1.7.1 Fundamentals of Linear Filtering
1.7.1.1 Definition
Linear filtering operates by computing the weighted sum of neighboring pixel values using a
filter kernel or mask.
Mathematically, a linear filter is expressed as:
g(i, j) = ∑k,l f(i + k, j + l) h(k, l)
where:
• g(i, j) is the filtered (output) image value at pixel (i, j).
• f(i + k, j + l) are the input pixel values in the neighborhood of (i, j).
• h(k, l) is the filter kernel (mask) containing the weights.
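A direct Python implementation of this weighted-sum operation (a small sketch using zero padding at the borders; the box-blur and Sobel kernels are only example choices, and the Sobel output is the kind of gradient used in the magnitude formula below):

```python
import numpy as np

def linear_filter(f, h):
    """Compute g(i, j) = sum over (k, l) of f(i + k, j + l) * h(k, l), with zero padding."""
    kh, kw = h.shape
    pad_y, pad_x = kh // 2, kw // 2
    padded = np.pad(f.astype(np.float64), ((pad_y, pad_y), (pad_x, pad_x)))
    g = np.zeros(f.shape, dtype=np.float64)
    for i in range(f.shape[0]):
        for j in range(f.shape[1]):
            g[i, j] = np.sum(padded[i:i + kh, j:j + kw] * h)   # weighted sum over the window
    return g

box_blur = np.full((3, 3), 1.0 / 9.0)                                  # smoothing kernel
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient kernel

image = np.random.default_rng(0).integers(0, 256, (5, 5)).astype(float)
blurred = linear_filter(image, box_blur)   # blurring (smoothing)
gx = linear_filter(image, sobel_x)         # gradient estimate Gx (edge detection)
```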
G = √(Gx² + Gy²) is the gradient magnitude, where Gx and Gy are the horizontal and vertical
image gradients (for example, computed with the Sobel operator).
5. Compare and contrast different types of image sensors used in digital cameras.
o CCD (Charge-Coupled Device):
▪ Produces high-quality images with low noise.
▪ Expensive and consumes more power.
o CMOS (Complementary Metal-Oxide-Semiconductor):
▪ Cheaper, consumes less power, and offers faster processing.
▪ Traditionally offered slightly lower image quality than CCD, though modern
CMOS designs have largely closed the gap.
6. Discuss the role of feature extraction in Computer Vision.
o Feature extraction identifies key patterns in images for tasks like object
recognition.
o Examples:
▪ SIFT (Scale-Invariant Feature Transform): Detects distinctive points in
images.
▪ HOG (Histogram of Oriented Gradients): Used in human detection.
▪ SURF (Speeded Up Robust Features): Faster alternative to SIFT.
7. Explain the different types of light sources and their impact on image formation.
o Point Light Sources: Emit light in all directions (e.g., bulbs, flashlights).
o Directional Light Sources: Produce parallel rays (e.g., sunlight).
o Area Light Sources: Cover broad areas and create soft shadows (e.g., LED
panels).
9. What are the main challenges in object recognition and segmentation in Computer
Vision?
o Challenges include:
▪ Variations in illumination, viewpoint, and scale.
▪ Occlusion and cluttered backgrounds.
▪ Intra-class variation (objects of the same category can look very different).
▪ The need for large labeled datasets and high computational cost.
***************