1) What is Computer Vision?
Computer vision is a branch of artificial intelligence that enables machines to "see" and
interpret images and videos, in a way loosely analogous to human vision. It allows computers
to recognize objects, analyze their surroundings, and make decisions based on what they see.
Human vs. Machine Vision
Humans can easily recognize objects, understand depth, and interpret emotions from facial
expressions. For example, when you see a vase of flowers, you can quickly identify its shape,
color, and even its transparency based on how light interacts with it. You can also look at a
group photo and recognize people, count them, and even guess their emotions.
Computers, however, do not naturally "see" like humans. They process an image as a grid of
numbers (pixel brightness or colour values) and look for patterns in those numbers. Teaching
computers to interpret images as humans do is a complex challenge, but modern AI techniques,
such as deep learning, have made significant progress in this field.
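As a concrete illustration, the short Python sketch below builds a tiny made-up grayscale image
with NumPy and prints the numbers a computer actually works with. The image contents and sizes
here are illustrative assumptions, not data from any real camera.

import numpy as np

# Hypothetical 6x6 grayscale image: 0 = black, 255 = white.
image = np.zeros((6, 6), dtype=np.uint8)
image[2:5, 2:5] = 200   # a bright square in an otherwise dark scene

print(image)         # the raw grid of numbers the computer "sees"
print(image.shape)   # (rows, columns) -> (6, 6)
print(image.max())   # brightest pixel value -> 200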
--------------------------------------------------------------------------------------------------------------------------------------
2) What are the main challenges in object recognition and segmentation in
Computer Vision?
Challenges in Object Recognition and Segmentation in Computer Vision
Object recognition and segmentation help computers identify and separate objects in an
image, but several challenges make this difficult:
1. Occlusion (Hidden Objects) – Sometimes, objects are partially hidden behind other
objects. For example, in a group photo, one person might be standing in front of
another, making it hard for a computer to recognize both.
2. Lighting and Viewpoint Changes – Objects can look very different under different
lighting conditions or when viewed from different angles. A car at night looks
different from a car in daylight, making recognition harder.
3. Noise and Low-Resolution Images – Blurry or pixelated images make it difficult for
the computer to detect details. For example, a security camera might capture a low-
quality image, making it hard to recognize faces.
4. Complex Backgrounds – Sometimes, objects blend into their surroundings. If a cat is
sitting on a couch with a similar color, the computer might struggle to separate the cat
from the background.
Overcoming these challenges requires advanced AI techniques, high-quality data, and
powerful computing systems.
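To make the complex-background challenge above concrete, the Python sketch below (NumPy only,
with synthetic images and made-up brightness values) tries the simplest possible segmentation:
a fixed brightness threshold. It works when the object clearly stands out from the background,
but misclassifies many pixels when the object and background have similar brightness plus a
little noise.

import numpy as np

def segment_by_threshold(img, thresh=128):
    # Label pixels brighter than `thresh` as object (1) and the rest as background (0).
    return (img > thresh).astype(np.uint8)

truth = np.zeros((8, 8), dtype=np.uint8)   # ground truth: a 4x4 object in the centre
truth[2:6, 2:6] = 1

rng = np.random.default_rng(0)
noise = rng.normal(0, 15, size=(8, 8))     # sensor noise / surface texture

# Easy case: bright object (220) on a dark background (30).
easy = np.where(truth == 1, 220.0, 30.0) + noise

# Hard case: object (135) barely brighter than the background (120),
# like a cat sitting on a couch of a similar colour.
hard = np.where(truth == 1, 135.0, 120.0) + noise

for name, img in [("easy", easy), ("hard", hard)]:
    errors = int((segment_by_threshold(img) != truth).sum())
    print(name, ":", errors, "of 64 pixels misclassified")

Real systems replace the fixed threshold with learned models, but the underlying difficulty
(similar brightness values plus noise) is the same.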
--------------------------------------------------------------------------------------------------------------------------------------
3) Illustrate with a real-world example how a photometric image is formed.
How a Photometric Image is Formed (Simple Explanation)
When we take a photo, light interacts with objects and is captured by a camera. The way an
image looks depends on several factors, such as lighting, how objects reflect light, and how
the camera processes the light.
Key Factors in Image Formation
1. Lighting Conditions – Light is essential for capturing images. The type of light
source affects how an image appears:
o Point Light Sources (e.g., a bulb or flashlight) emit light from a single point in
all directions, so brightness falls off with distance.
o Directional Light Sources (e.g., sunlight) send nearly parallel rays and create
strong, sharp shadows.
o Area Light Sources (e.g., LED panels) spread light over a broad surface,
producing soft shadows.
2. Reflectance and Shading – Objects reflect light differently, affecting brightness and
contrast.
o Reflectance: How much light an object reflects. A mirror reflects more light
than a dark cloth.
o Shading: Changes in brightness due to object shape and light direction.
▪ Diffuse Shading: light scatters evenly in all directions, so brightness depends
on the angle between the surface and the light (e.g., a matte surface).
▪ Specular Shading: light reflects mostly in one direction, creating bright
highlights (e.g., a shiny car).
▪ Shadows and Occlusion: areas appear darker when other objects block the
light.
Real-World Example
Imagine taking a photo of an apple on a table:
• If sunlight comes from one side, one half of the apple looks bright while the other is
in shadow.
• A shiny apple creates highlights due to specular reflection.
• If another object blocks the light, part of the apple may be darker (occlusion).
Understanding these factors helps computer vision systems interpret images more accurately
and improves image processing in cameras, robotics, and AI applications.
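The Python sketch below shows, in a highly simplified form, how the apple example translates
into pixel brightness: a diffuse (Lambertian-style) term for matte shading plus a specular
(Phong-style) term for the highlight. The light direction, albedo, and shininess values are
illustrative assumptions, not measurements.

import numpy as np

def shade(normal, light_dir, view_dir, albedo=0.8, specular=0.5, shininess=32):
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    v = view_dir / np.linalg.norm(view_dir)

    # Diffuse (matte) term: brightness depends on the angle between surface and light.
    diffuse = albedo * max(0.0, float(n @ l))

    # Specular term: a bright highlight where the mirrored light direction meets the viewer.
    r = 2.0 * float(n @ l) * n - l
    spec = specular * max(0.0, float(r @ v)) ** shininess if float(n @ l) > 0 else 0.0

    return diffuse + spec

light = np.array([1.0, 0.0, 0.0])   # sunlight coming from the right
view = np.array([0.0, 0.0, 1.0])    # camera looking straight at the apple

print(shade(np.array([1.0, 0.0, 0.0]), light, view))    # right side of the apple: bright
print(shade(np.array([-1.0, 0.0, 0.0]), light, view))   # left side: no direct light (0.0)
print(shade(np.array([0.7, 0.0, 0.7]), light, view))    # between light and camera: diffuse plus a highlight

Evaluating this at many surface points with different normals reproduces the bright side, the
shaded side, and the specular highlight described in the apple example.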