
UNIT I VISION SYSTEM
Lighting and Machine Vision Systems

Basic Components – Elements of visual perception – Lenses: pinhole cameras, Gaussian optics – Cameras – Camera–computer interfaces

1.1 Introduction: Machine Vision Systems
Machine vision systems are assemblies of integrated electronic components, computer hardware, and software algorithms that offer operational guidance by processing and analyzing the images captured from their environment. The data acquired from the vision system are used to control and automate a process or to inspect a product or material.
Many manufacturing industries adopt machine vision systems to perform tasks that are mundane, repetitive, tiring, and time-consuming for workers, resulting in increased productivity and reduced operational cost. For instance, a machine vision system in a production line can inspect hundreds or thousands of parts per minute. The same inspection can be performed manually by human workers, but it is much slower and more expensive, prone to error, and not all parts can be quality-checked offline because of time limitations.
Machine vision systems also promote high product quality and production yield by providing accurate, consistent, and repeatable detection, verification, and measurement. They help detect defects earlier in the process, which prevents the production and escape of defective parts, and they improve the traceability of products and materials and their compliance with regulations and specifications in industrial processes.

Components of Machine Vision Systems
Machine vision systems are typically composed of five elements (or components), as discussed below. These components are common and may be seen in other systems; however, when they work together, each playing its distinct role, they create a vision system capable of sophisticated functions.

Lighting
Lighting is responsible for illuminating the object and highlighting its distinct features so that they can be viewed by the camera. It is one of the critical aspects of machine vision systems: the camera cannot inspect objects that it cannot see. Therefore, lighting parameters such as the distance of the light source from the camera and the object, as well as the angle, intensity, brightness, shape, size, and color of the lighting, must be optimized to highlight the features being inspected. In addition, the object must be seen clearly by the camera when it is struck by light; hence, the object's surface properties must also be considered during lighting optimization.
Lighting can be provided by LED, quartz halogen, fluorescent, and xenon strobe light sources. It can be directional or diffusive. Lighting techniques in machine vision systems are classified as follows.

Back Lighting
Back lighting illuminates the target from behind. It creates contrast, as dark silhouettes appear against a bright background. Back lighting is used to detect holes, gaps, cracks, bubbles, and scratches in clear parts. It is suitable for measuring, placing, and positioning parts. It is advisable to use monochrome light with light-control polarization if very precise (subpixel) edge detection is necessary.

Diffuse Lighting
Diffuse (or full bright) lighting is used to illuminate shiny specular and mixed reflective targets that require even, multi-directional lighting. There are three types of diffuse lighting:
• Dome diffuse lighting is the most common diffuse lighting technique and is effective on curved and specular surfaces. It is helpful in minimizing glare.
• On-axis (or co-axial) lighting uses a mirror to reflect light rays onto the target in a direction co-axial with the target and the camera. Surfaces that are at an angle to the camera will appear dark. On-axis lighting is effective in emphasizing angled, textured, or topographical features on flat surfaces.
• Flat diffuse lighting has a highly diffusive light source that can be used as a front or projection light. The object is viewed in the center of the flat diffusive light source. This technique is widely used in PCB inspection.

Partial Bright Field Lighting
In partial bright field (or directional) lighting, the light rays from an angled directional light source strike the material directly. The camera and the object are in a co-axial position with each other. Partial bright field lighting is good at generating contrast and emphasizing the topographical features of a surface. However, this arrangement is less effective with specular surfaces, as it creates lighting hotspot reflections.

Dark Field Lighting
In dark field lighting, the light rays from a directional light source (e.g., a bar, spot, or ring light) strike the object at a low angle (10–15°) from the surface. This arrangement makes surface flaws such as scratches, imprints, and notches appear bright by reflecting light toward the camera; the rest of the surface appears dark.

Devices such as color filters and polarizers may be used in machine vision lighting. Color filters are used to lighten or darken targeted features on the surface. Polarizers are installed on cameras to reduce lighting noise such as glare and hotspots and to increase contrast.

Machine Vision Lenses
The lens captures the image and relays it to the image sensor inside the camera in the form of light. Most lenses are equipped with color recognition capability. The lens of a machine vision camera can be an interchangeable lens (C-mount or CS-mount) or a fixed lens. Lenses are characterized by the following properties, which describe the image quality they can capture:
• Field of view refers to how much area the image sensor views; lenses with a higher focal length have a reduced field of view.
• Depth of field refers to the ability to maintain acceptable image quality without refocusing when the object is moved farther from the plane of best focus. It also influences the object's range of acceptable motion.
• Depth of focus refers to how the quality of focus changes as the sensor is moved while the object remains in the same position.
• Aperture is the opening of the lens through which light passes to enter the camera. It controls the amount of light entering the lens and is inversely related to the depth of field.
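As a rough illustration of how these properties interact, the thin-lens/pinhole approximation relates field of view, sensor size, working distance, and focal length by FOV ≈ sensor size × working distance / focal length. The short Python sketch below uses this relation with made-up numbers (the sensor width, working distance, and desired field of view are assumptions, not values from this text):

```python
def required_focal_length(sensor_width_mm, working_distance_mm, fov_width_mm):
    """Approximate focal length (mm) from the thin-lens/pinhole relation
    fov / working_distance ≈ sensor_width / focal_length.
    This ignores lens distortion and thick-lens effects."""
    return sensor_width_mm * working_distance_mm / fov_width_mm

# Hypothetical setup: a 7.2 mm wide sensor, parts 300 mm away,
# and a 90 mm wide region that must fill the image.
focal = required_focal_length(sensor_width_mm=7.2,
                              working_distance_mm=300.0,
                              fov_width_mm=90.0)
print(f"Approximate focal length: {focal:.1f} mm")  # about 24 mm
```

A longer focal length with the same sensor and working distance gives a smaller field of view, which matches the field-of-view property listed above.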
Image Sensor
The image sensor inside the machine vision camera converts the light captured by the lens into a digital image. It typically uses charge-coupled device (CCD) or complementary metal-oxide-semiconductor (CMOS) technology to translate photons into electrical signals. The output of the image sensor is a digital image composed of pixels that records the presence of light in the areas observed by the lens.
Resolution and sensitivity are critical specifications of image sensors. Resolution is the number of pixels produced by the sensor in the digital image. Sensors with a higher resolution produce higher-quality images, meaning that more detail can be observed in the object being inspected and more accurate measurements can be obtained. Resolution also refers to the ability of the machine vision system to perceive small changes. Sensitivity, on the other hand, refers to the minimum amount of light required to produce a distinguishable change in the output image. Resolution and sensitivity are inversely related to each other: increasing the resolution decreases the sensitivity.
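To make the resolution requirement concrete, a common rule of thumb (an assumption here, not a figure from this text) is that the smallest feature to be detected should span at least a few pixels. A minimal Python sketch:

```python
def min_pixels_per_axis(fov_mm, smallest_feature_mm, pixels_per_feature=3):
    """Minimum pixel count along one axis so that the smallest feature of
    interest covers at least `pixels_per_feature` pixels."""
    return round(fov_mm / smallest_feature_mm * pixels_per_feature)

# Hypothetical inspection: a 90 mm field of view and 0.2 mm scratches
# that should each cover at least 3 pixels.
print(min_pixels_per_axis(90.0, 0.2))  # 1350 -> choose e.g. a 1600-pixel-wide sensor
```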
Vision Processing Unit
The vision processing unit of a machine vision system uses algorithms to analyze the digital image produced by the sensor. Vision processing involves a series of steps, performed either externally (by a computer) or internally (in stand-alone machine vision systems). First, the digital image is extracted from the image sensor and relayed to the computer. Next, the digital image is prepared for analysis by making the necessary features in the image stand out. The image is then analyzed to locate the specific features that need to be observed and measured. Once the observations and measurements of the features are complete, they are compared to the defined, pre-programmed specifications and criteria. Finally, the decision is made and the results are communicated.
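A minimal sketch of these steps, assuming the OpenCV library and a hypothetical hole-diameter check (the file name, millimetre-per-pixel scale, and tolerance below are illustrative assumptions, not values from this text):

```python
import cv2

MM_PER_PIXEL = 0.05            # assumed calibration scale
SPEC_MIN, SPEC_MAX = 4.9, 5.1  # hypothetical diameter tolerance in mm

# 1. Acquire: read the image delivered by the sensor.
image = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

# 2. Prepare: smooth and threshold so the hole stands out as a blob.
blurred = cv2.GaussianBlur(image, (5, 5), 0)
_, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# 3. Analyze: locate the largest contour and measure its diameter.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)
(_, _), radius_px = cv2.minEnclosingCircle(largest)
diameter_mm = 2.0 * radius_px * MM_PER_PIXEL

# 4. Compare and decide: check the measurement against the specification.
decision = "PASS" if SPEC_MIN <= diameter_mm <= SPEC_MAX else "FAIL"
print(f"diameter = {diameter_mm:.2f} mm -> {decision}")
```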
Communication System
The communication system quickly passes the decision made by the vision processing unit to specific machine elements. Once the machine elements have received the information (or signal), they intervene in and control the process based on the output of the vision processing unit. This is accomplished by discrete I/O signals or by data communication over a serial connection, such as an RS-232 serial output or Ethernet.

Types of Machine Vision Cameras
The types of machine vision cameras (line scan, area scan, and 3D scan cameras) are discussed in Section 1.6.

1.2 Visual Perception

Visual perception enables machines to perceive and derive meaningful information from images, videos, and other optical inputs. This is aptly described in Figure 1. The information is useful for robots and autonomous systems to manipulate and navigate in their environment. It describes the shape and size of objects and their relative locations for manipulation, and it can also provide knowledge about the nature of obstacles and terrain characteristics for navigation systems.

Primarily, the camera is the source of images for visual perception. The images acquired from the camera must be pre-processed to ensure quality, i.e., to remove noise and distortions. After pre-processing, the objects present in the images can be identified using machine learning-based computer vision techniques. However, the objects are described with reference to the image, in terms of pixels. Hence, there is a need for techniques that transform image/pixel coordinates into the 3D world so that the knowledge of the object can be used further.
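As a hedged sketch of such a pixel-to-3D transformation, the pinhole camera model (introduced in Section 1.4) can back-project a pixel to a 3D point in the camera frame once the camera intrinsics and the depth of the point are known. The intrinsic values and depth below are illustrative assumptions:

```python
def pixel_to_3d(u, v, depth_z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with known depth Z to camera coordinates
    (X, Y, Z) using the pinhole model: u = fx*X/Z + cx,  v = fy*Y/Z + cy."""
    x = (u - cx) * depth_z / fx
    y = (v - cy) * depth_z / fy
    return x, y, depth_z

# Hypothetical intrinsics (in pixels) and a depth of 1.5 m for the observed pixel.
print(pixel_to_3d(u=400, v=300, depth_z=1.5, fx=800.0, fy=800.0, cx=320.0, cy=240.0))
# -> (0.15, 0.1125, 1.5): the point in metres, expressed in the camera frame
```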

Applications
1. Pick and place robots
2. Material handling systems
3. Assembly and quality inspection
4. Dimensioning and counting of objects
5. Automated device testing applications

1.3 Lens in Machine Vision Systems
The lens captures the image and relays it to the image sensor inside the camera in the form of light. Most lenses are equipped with color recognition capability. The lens of a machine vision camera can be an interchangeable lens (C-mount or CS-mount) or a fixed lens.

Types of lens:
• pericentric (hypercentric) lens
• entocentric lens
• telecentric lens

A hypercentric or pericentric lens is a lens system where the entrance pupil is located in front of the lens, in the space where an object could be located. This has the result that, in a certain region, objects that are farther away from the lens produce larger images than objects that are closer to the lens, in stark contrast to the behavior of the human eye or any ordinary camera (both entocentric lenses), where farther-away objects always appear smaller.
The geometry of a hypercentric lens can be visualized by imagining a point source of light at the center of the entrance pupil sending rays in all directions. Any point on the object will be imaged to the point on the image plane found by continuing the ray that passes through it, so the shape of the image will be the same as that of the shadow cast by the object from the imaginary point of light. So the closer an object gets to that point (the center of the entrance pupil), the larger its image will be.
This inversion of normal perspectivity can be useful for machine vision. Imagine a six-sided die sitting on a conveyor belt being imaged by a hypercentric lens system directly above, whose entrance pupil is below the conveyor belt. The image of the die would contain the top and all four sides at once, because the bottom of the die appears larger than the top.

An entocentric lens is a compound lens which has its entrance or exit pupil inside the lens. This is the most common type of photographic lens. The aperture diaphragm is located between the objective and the image-side focus. It corresponds to the "normal" human visual impression: distant objects appear smaller than closer objects. This is not the case for other lenses and perspectives, such as telecentric and hypercentric lenses.
A telecentric lens is a special optical lens (often an objective lens or a camera lens) that has its entrance or exit pupil, or both, at infinity. The size of the images produced by a telecentric lens is insensitive to the distance between the object being imaged and the lens, or to the distance between the image plane and the lens, or both; this optical property is called telecentricity. Telecentric lenses are used for precision optical two-dimensional measurements, reproduction (e.g., photolithography), and other applications that are sensitive to the image magnification or the angle of incidence of light.
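The practical difference between an entocentric and an (object-space) telecentric lens can be sketched with a toy calculation; the focal length, magnification, and distances below are assumptions chosen only for illustration:

```python
# Entocentric (ordinary) lens: the image shrinks as the object moves away.
# Ideal object-space telecentric lens: the image size stays constant.
def entocentric_image_height(object_height_mm, focal_mm, distance_mm):
    return object_height_mm * focal_mm / distance_mm   # simple pinhole approximation

def telecentric_image_height(object_height_mm, magnification):
    return object_height_mm * magnification            # independent of distance

for d in (200.0, 250.0, 300.0):
    ento = entocentric_image_height(10.0, focal_mm=50.0, distance_mm=d)
    tele = telecentric_image_height(10.0, magnification=0.25)
    print(f"distance {d:.0f} mm: entocentric {ento:.2f} mm, telecentric {tele:.2f} mm")
# The entocentric image shrinks from 2.50 mm to about 1.67 mm; the telecentric one stays 2.50 mm.
```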

1.4 Pinhole Camera

A pinhole camera is a light-proof box with a tiny hole in one side: it does not have a lens but does have a small aperture. The camera obscura effect occurs when light from an object travels through the aperture and projects an inverted picture on the opposite side of the box.
The notion of the camera obscura is used to create a pinhole camera. "Camera obscura" is a Latin phrase that means "darkened room." It refers to a box-shaped or room-shaped device that lets light in through a small opening on one side and projects it on the other. In this way, the picture of an object outside the box is created inside the box, upside-down.
Ibn al-Haytham, an Arab philosopher, was the first to demonstrate how we see by inventing the camera obscura, the forerunner of the pinhole camera. He showed how light may be used to project a picture onto a flat surface. In 1850, the Scottish scientist Sir David Brewster used a pinhole camera to take the first photograph.

Working Principle of Pinhole Camera
The rectilinear propagation of light underpins the pinhole camera: light travels in a straight line. A pinhole camera produces an inverted image because of this straight-line travel of light.
We can take an image of a quality comparable to a digital SLR camera by employing the right dimensions of the pinhole camera and the right size of the tiny hole through which light enters. The lack of a lens distinguishes pinhole cameras from other types of cameras; a pinhole camera can never be a camera with a lens.

Construction of Pinhole Camera
The photographer can create a one-of-a-kind pinhole camera based on how he intends to use it. A photographic pinhole camera's basic design consists of a light-tight box with a pin-sized hole on one end and a piece of film or photographic paper wedged or taped to the other end. A shutter can be made out of a cardboard flap and a tape hinge. A sewing needle can be used to punch or drill a pinhole in tin foil, thin aluminium, or brass sheet; this portion is then taped to the inside of the light-tight box, behind a hole cut through the box. A pinhole camera can be created out of either a cylindrical cornflakes container or a shoebox. Many pinhole photographers are inspired by the art of making their cameras; for them, discovering new materials or a perfect box is akin to discovering a hidden gem.
When the hole in a pinhole camera is the size of a green gram, the visual sharpness suffers: the image becomes thicker and more blurry. More light enters the pinhole camera as the hole grows larger, disrupting the formation of the image.

Image Formation of a Pinhole Camera
Imagine being inside a dark room with no light coming in; this is how a pinhole camera works. Consider making a small gap in the wall you are facing. You might see light creep into your chamber if someone held a torch outside, and you could see the light seeping in, varying in direction and intensity as the person holding the torch moved around the light source.
Now imagine a little box that has been light-proofed except for a small pin-sized opening, instead of a room, and a film that traps light rays inside instead of you. The film within the box records the image formed by the light rays hitting the side opposite the opening. Because the tiny opening limits the amount of light that may enter, the exposure must be prolonged.
A convex lens is used in a conventional camera to admit more light while simultaneously concentrating it on a narrow area; this cuts down the exposure time.

Image Characteristics of a Pinhole Camera
• As the image is displayed on a screen, a real image is obtained.
• The obtained image is significantly smaller in size than the actual object.
• The image is reversed on both the x-axis and the y-axis.
• The image can be used to study the rectilinear propagation of light.
• The size of the image created by a pinhole camera is determined by the distance between the pinhole and the screen or film behind it.
• Making numerous pinholes will result in multiple pictures on the screen.
• A larger pinhole allows more light to pass through, allowing light from one place on the object to reach multiple points on the screen. As a result, the image will be brighter but fuzzier.

Pinhole Camera Uses and Applications
• The image obtained by a pinhole camera can be projected on a translucent screen in real time to safely witness a solar eclipse.
• Pinhole photography is frequently used to record the sun's movement over time.
• Pinhole cameras are composed of materials that are impervious to electric and magnetic fields. Because they are difficult to notice, they are frequently used for surveillance.
• Because of their low manufacturing cost and small weight, they are an excellent choice for scientific photography in schools.
• A pinhole camera may take 360° images by drilling holes on each side of the camera box and mounting the film in the center with a cylindrical roll.
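The relationship between image size and the pinhole-to-film distance can be made concrete with similar triangles: image height / object height = pinhole-to-film distance / object distance. A tiny worked sketch (all numbers are illustrative assumptions):

```python
def pinhole_image_height(object_height_mm, object_distance_mm, film_distance_mm):
    """Similar triangles for an ideal pinhole:
    image_height / object_height = film_distance / object_distance."""
    return object_height_mm * film_distance_mm / object_distance_mm

# A 1.8 m (1800 mm) tall subject standing 5 m (5000 mm) from the pinhole,
# imaged on film 100 mm behind the pinhole.
print(pinhole_image_height(1800.0, 5000.0, 100.0))  # 36.0 mm tall, and inverted
```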
1.5 Gaussian Optics
Gaussian optics is a framework for describing optical phenomena which is based on geometrical optics (ray optics) and makes extensive use of the paraxial approximation. It was developed by Johann Carl Friedrich Gauss (1777–1855) and is still widely used for many purposes.
Gaussian optics is a simplified approach to the study of optical systems, such as lenses and mirrors, in which light rays are approximated to follow the paraxial approximation. It is the foundation of geometrical optics and focuses on situations where light rays make small angles with the optical axis, which simplifies the mathematical treatment.

A summary of the key concepts of Gaussian optics follows.

1. Paraxial Approximation
• Assumes that light rays travel close to the optical axis and at small angles.
• Simplifies the trigonometric functions using the approximations sin θ ≈ θ, tan θ ≈ θ, and cos θ ≈ 1, where θ is in radians.

2. Gaussian Lens Formula
The relationship between the object distance (u), the image distance (v), and the focal length (f) of a thin lens or spherical mirror is given by:
1/f = 1/u + 1/v

3. Magnification
The magnification (M) relates the image height (h_i) to the object height (h_o):
M = h_i / h_o = −v/u
(The negative sign indicates image inversion in some cases.)

4. Cardinal Points
Key reference points of Gaussian optics:
• Principal points: points on the optical axis where light rays passing through experience no angular deviation.
• Focal points (F): points where parallel rays converge (or appear to diverge).
• Nodal points: points through which a ray passes without any deviation; for thin lenses they often coincide with the principal points.

5. Thin Lens Approximation
For a thin lens, the thickness is negligible compared to the focal length. This simplifies the analysis by treating the lens as a single plane.

6. Ray Tracing
Ray tracing uses paraxial rays and specific rules to locate images:
• Parallel ray: a ray parallel to the optical axis passes through the focal point.
• Focal ray: a ray passing through the focal point emerges parallel to the optical axis.
• Central ray: a ray passing through the center of the lens or mirror remains undeviated.

7. Spherical Aberration Neglect
Gaussian optics assumes ideal conditions, neglecting higher-order effects such as spherical and chromatic aberrations, which are significant in real systems.

Applications:
• Optical system design: designing lenses, mirrors, and microscopes.
• Image formation: analyzing how images form in cameras and telescopes.
• Beam collimation: controlling the spread of light in lasers and optical fibers.
Key reference points for Gaussian optics:
Gaussian optics includes the use of rays around an optical axis and the paraxial approximation.
 Principal points: Points on the optical axis where light rays passing through experience no  Light can be described with geometrical light rays (geometrical optics); wave effects can be
angular deviation. ignored.
 Focal points (F): Points where parallel rays converge (or appear to diverge).
 The investigated systems are rotationally symmetric around an optical axis. (A simple
generalization can lead to different behavior in two transverse dimensions, for example for
treating cylindrical lenses.)
 All relevant light rays have only relatively small angles against the optical axis. Various
equations treat only first-order terms, e.g. identifying the sine or the tangent of an angle (in
radians) with the angle itself. The paraxial approximation is used throughout.
It is no problem that substantial angles can be involved e.g. in refraction at prisms; at those
optical components, the optical axis can also be assumed to be bent. Only angles relative to the
optical axis need to be small.
Under the mentioned assumptions, a substantially simplified mathematical description of optical
phenomena is possible:
 Any light ray can be described with two coordinates for a certain position along the optical axis:
for example, a transverse coordinate and an angle . (Sometimes, one uses reduced
coordinates and where some relations are simpler.)
 For a wide range of optical components such as lenses, prisms and mirrors, one can describe the
effect on the two coordinates with a 2 × 2 matrix (ABCD matrix) because the relations between
inputs and outputs are linear.
 Likewise, any combination of such optical elements (and air spaces between them) can be
described with such a matrix, which is obtained by multiplication of the matrices corresponding
to the different elements and air spaces.
 One can also describe the optical function of an element or a combination of elements by
specifying so-called cardinal points. Those can be calculated from the mentioned matrix, and
vice versa. A complete optical system can thus be treated as a kind of black box, which is
characterized with only a couple of Gaussian parameters.
One can also apply the related rules in geometrical drawings.
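A minimal Python sketch of the ABCD-matrix idea (the focal length and distances are illustrative assumptions). A paraxial ray is represented by its height y and angle θ; free-space propagation and a thin lens are 2 × 2 matrices, and the system matrix is their product:

```python
import numpy as np

def free_space(d):
    """Propagation over a distance d: y' = y + d*theta, theta' = theta."""
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_lens(f):
    """Thin lens of focal length f: y' = y, theta' = theta - y/f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

# System: thin lens (f = 50) followed by 50 units of free space.
# The matrices are multiplied in reverse order of traversal.
system = free_space(50.0) @ thin_lens(50.0)

ray_in = np.array([1.0, 0.0])   # ray parallel to the axis at height 1
ray_out = system @ ray_in
print(ray_out)                  # [0.0, -0.02]: the ray crosses the axis at the focal plane
```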
The described framework can be applied to a wide range of optical systems – for example,
to telescopes, photo cameras and microscopes. One can calculate parameters like focal lengths,
the transverse, linear and longitudinal magnification, identify conjugate planes, focal
planes, image planes etc. However, important phenomena like optical aberrations cannot be
treated because those involve geometrical nonlinearities which are neglected in Gaussian optics.
Their treatment requires substantially more sophisticated mathematical methods. One can
consider Gaussian optics to provide a simplified description, which is relatively easily calculated,
and aberrations (as calculated with more sophisticated methods) are deviations from that.
Although Gaussian optics belongs to the methods of geometrical optics, various parameters have
a direct correspondence to quantities in wave optics. Therefore, it is possible, for example, to
describe the propagation of Gaussian beams (including wave effects like diffraction) based on
parameters calculated with Gaussian optics.
Note that the well-known Gaussian beams do not appear within Gaussian optics itself; they belong to wave optics.
1.6 Types of Machine Vision Cameras
The types of machine vision cameras are the following.

Line Scan Camera
A line-scan camera precisely and quickly captures digital images one line at a time, while still viewing the whole of the object. The complete image is constructed in software, pixel line by pixel line. Either the part or the camera must be moving during the inspection.
Line scan cameras can inspect multiple objects in a single line. They are ideal for high-speed conveying systems and continuous processes. They are suitable for continuous webs of materials such as paper, metal, and textiles, as well as for large parts and cylinders.

Area Scan Camera
Area scan cameras use rectangular image sensors to capture images in a single frame. The resulting digital image has a height and width based on the number of pixels on the sensor. The vision processing unit analyzes the scene image by image. Area scan cameras can perform almost all common industrial tasks and are easier to set up and align. Unlike line scan cameras, area scan cameras are preferred for inspecting stationary objects; the objects can be paused momentarily in front of the camera to allow inspection.

3D Scan Camera
3D scan cameras can perform inspections in the X, Y, and Z planes and calculate the object's position and orientation in space. They utilize single or multiple cameras and laser displacement sensors. In a single-camera setup, the camera must be moved to generate a height map from the displacement of the laser's location on the object. The height of the object and its surface planarity can be calculated using a calibrated offset laser. In a multi-camera setup, laser triangulation is deployed to generate a digitized model of the object's shape and location.
3D scan cameras are ideal for inspecting 3D-formed parts and for robotic guidance applications. This type of machine vision camera can tolerate slight environmental disruptions (e.g., light, contrast, and color variations) while providing precise information. Hence, they are widely used in metrology, factory automation, and defect analysis of parts.

1.7 Camera–Computer Interfaces
In robotics, camera-computer interfaces are essential for enabling robots to perceive, analyze, and respond to their environment. This capability is a cornerstone of computer vision and machine vision systems in robotics. An overview of camera-computer interfaces in the context of robotics follows.

Types of Cameras Used in Robotics
1. Monocular cameras (single lens):
   o Capture a 2D view of the environment.
   o Used for tasks like object detection, QR code reading, or color recognition.
2. Stereo cameras:
   o Consist of two cameras placed apart to mimic human binocular vision.
   o Provide depth perception for 3D object recognition and mapping.
3. RGB-D cameras:
   o Combine RGB (color) data with depth information (e.g., Microsoft Kinect, Intel RealSense).
   o Common in applications like SLAM (Simultaneous Localization and Mapping).
4. Thermal cameras:
   o Capture infrared radiation to detect heat signatures.
   o Used in search-and-rescue robots or night vision applications.
5. LiDAR (Light Detection and Ranging):
   o Emits laser beams to measure distances and generate 3D maps.
   o Critical for autonomous navigation in self-driving cars and drones.
6. 360° cameras:
   o Provide a panoramic view of the environment.
   o Useful for robotic surveillance and telepresence.
7. High-speed cameras:
   o Capture fast-moving objects with high frame rates.
   o Essential for robotics in manufacturing and quality control.

Camera-Computer Interfaces in Robotics
1. Hardware interfaces
   o USB: common in most robotic setups, especially for plug-and-play cameras.
   o Ethernet/IP: used in industrial robots where high-speed and long-distance communication is necessary.
   o MIPI (Mobile Industry Processor Interface): used in embedded robotics systems (e.g., the Raspberry Pi Camera Module).
   o PCIe (Peripheral Component Interconnect Express): for high-performance systems requiring low latency and high data rates.
   o Wireless (Wi-Fi/Bluetooth): enables mobile robots to use cameras without physical tethering.
   o ROS (Robot Operating System) hardware interfaces: provide an abstraction for connecting cameras to robotic platforms.
2. Software interfaces
   o ROS (Robot Operating System):
     - ROS has drivers and packages like image_transport and camera_info_manager for interfacing with cameras.
     - Examples: ROS-compatible cameras such as the Intel RealSense or ZED Stereo Camera.
   o OpenCV:
     - Widely used for processing images and video streams in robotics.
     - Tasks: object detection, tracking, feature extraction, and edge detection.
   o GStreamer:
     - A multimedia framework for handling video streams.
     - Useful for robotics applications requiring real-time video processing.
   o SLAM libraries:
     - Systems like ORB-SLAM and RTAB-Map use camera data for mapping and localization.
   o Deep learning frameworks:
     - TensorFlow, PyTorch, or YOLO (You Only Look Once) models for real-time object recognition and decision-making.
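On the software side, a minimal sketch of a camera-computer interface using OpenCV (assuming a USB camera that the operating system exposes as device index 0; the index and window handling are assumptions for illustration):

```python
import cv2

capture = cv2.VideoCapture(0)        # open the first camera exposed by the OS

while capture.isOpened():
    ok, frame = capture.read()       # grab one frame over the camera interface
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # hand the frame to vision code
    cv2.imshow("camera", gray)
    if cv2.waitKey(1) & 0xFF == ord("q"):            # press 'q' to stop
        break

capture.release()
cv2.destroyAllWindows()
```

In a ROS-based robot the same frames would typically arrive as messages on an image topic (e.g., via image_transport) rather than through a direct device handle.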

Applications of Camera-Computer Interfaces in Robotics
1. Autonomous navigation: cameras provide visual inputs for path planning and obstacle avoidance in self-driving cars, drones, and delivery robots.
2. Object detection and manipulation: used in robotic arms to identify, pick, and place objects in industrial settings.
3. SLAM (Simultaneous Localization and Mapping): cameras help robots build a map of their environment and determine their position within it.
4. Human-robot interaction (HRI): cameras enable robots to recognize human gestures, faces, and body language for intuitive interaction.
5. Inspection and quality control: industrial robots use high-speed cameras to inspect products for defects or deviations.
6. Search and rescue: thermal and RGB cameras help robots navigate and locate survivors in disaster zones.

Challenges in Robotics Camera-Computer Interfaces
1. Latency: critical for real-time applications like obstacle avoidance.
2. Data processing load: high-resolution and high-frame-rate cameras generate large amounts of data, requiring efficient computation.
3. Calibration: cameras need to be accurately calibrated to provide reliable data, especially stereo and depth cameras.
4. Environmental factors: varying lighting, motion blur, or occlusion can affect the performance of camera-based systems.
5. Integration: ensuring seamless integration of cameras with robotic platforms and software stacks.

UNIT II VISION ALGORITHMS

Fundamental Data Structures: Images, Regions, Sub-pixel Precise Contours – Image Enhancement: Gray value transformations, image smoothing, Fourier Transform – Geometric Transformation – Image segmentation – Segmentation of contours, lines, circles and ellipses – Camera calibration – Stereo Reconstruction.

1. Image
An image may be defined as a two-dimensional function f(x, y), where x and y are spatial (plane) coordinates and the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray level of the image at that point. When x, y, and the amplitude values of f are all finite, discrete quantities, we call the image a digital image.

Image Processing
Image processing describes the process of digitally transforming an image and executing specific operations to obtain useful information from it. Image processing systems often treat images as 2D signals when applying predetermined signal processing approaches.

Image Sampling
Digitization of the spatial coordinates (x, y) is called image sampling. To be suitable for computer processing, an image function f(x, y) must be digitized both spatially and in magnitude.

Image Quantization
Digitizing the amplitude values is called quantization. The quality of a digital image is determined to a large degree by the number of samples and discrete gray levels used in sampling and quantization.
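A small NumPy sketch of the two digitization steps (the image here is random noise standing in for a real photograph; the level count is an arbitrary choice):

```python
import numpy as np

image = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # stand-in 8-bit image

# Coarser spatial sampling: keep every second pixel in each direction.
subsampled = image[::2, ::2]

# Coarser quantization: map 256 gray levels down to 8.
def quantize(img, levels):
    step = 256 // levels
    return (img // step) * step

coarse = quantize(image, levels=8)   # visible banding appears with few levels
```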
Types of image processing include the following:
• pattern recognition, to measure various patterns around objects in an image;
• recognition, to detect or differentiate objects within an image;
• retrieval, to browse or search an extensive image database for an image like the original image;
• sharpening and restoration, to create an enhanced image from the original image; and
• visualization, to identify objects that are not visible in an image.

Image Retention
A temporary or permanent residual image on a screen is known as image retention. Image retention sometimes occurs on screens when an image is displayed for an extensive period. It occurs because of the different characteristics of the materials used to achieve high-definition imaging. It is best to avoid images susceptible to image retention whenever possible, to protect digital screens.

Digital Image Correlation
Digital image correlation (DIC) describes the surface displacement measurement technique of capturing a solid object's motion, shape, and deformation. It is difficult to achieve reliable, high-quality DIC results but easy to obtain rudimentary DIC results.

Levels of Image Data Representation
• Iconic images – consist of images containing the original data: integer matrices with data about pixel brightness. E.g., the outputs of pre-processing operations (such as filtration or edge sharpening) used to highlight aspects of the image that are important for further treatment.
• Segmented images – parts of the image are joined into groups that probably belong to the same objects. It is useful to know something about the application domain while doing image segmentation; it is then easier to deal with noise and other problems associated with erroneous image data.
• Geometric representations – hold knowledge about 2D and 3D shapes. The quantification of a shape is very difficult but very important.
• Relational models – give the ability to treat data more efficiently and at a higher level of abstraction. Prior knowledge about the case being solved is usually used in processing of this kind. Example: counting the planes standing at an airport using satellite images.
Traditional Image Data Structures
Traditional image data structures include:
• matrices,
• chains,
• graphs,
• lists of object properties,
• relational databases,
• etc.
They are used not only for the direct representation of image information but also as a basis for more complex hierarchical methods of image representation.

Matrices
• The most common data structure for low-level image representation; the elements of the matrix are integer numbers.
• Image data of this kind are usually the direct output of the image capturing device, e.g., a scanner.

Chains
• Chains are used for the description of object borders.
• Symbols in a chain usually correspond to the neighborhood of primitives in the image.
• If local information is needed from the chain code, it is necessary to search through the whole chain systematically.
• Chains can be represented using static data structures (e.g., 1D arrays); their size is the longest expected length of the chain. Dynamic data structures are more advantageous for saving memory.

Run Length Coding
• Often used to represent strings of symbols in an image matrix (e.g., FAX machines use it).
• In binary images, run length coding records only the areas that belong to the object in the image; the area is then represented as a list of lists.
• Each row of the image is described by a sublist, the first element of which is the row number.
• Subsequent terms are coordinate pairs; the first element of a pair is the beginning of a run and the second is its end. There can be several such sequences in one row.
• Run length coding can be used for an image with multiple brightness levels as well; in that case the brightness of each run must also be recorded in the sublist.

Topological Data Structures
• Image description as a set of elements and their relations:
• graphs,
• evaluated graphs,
• region adjacency graphs.

Relational Structures
• Information is concentrated in relations between semantically important parts of the image – objects – that are the result of segmentation.
• Appropriate for higher-level image understanding.

Pyramids
• M-pyramid (matrix pyramid): a sequence {M_L, M_(L−1), ..., M_0} of images, where M_L has the same dimensions and elements as the original image and M_(i−1) is derived from M_i by reducing the resolution by one half.
• Square matrices with dimensions equal to powers of two are required; M_0 corresponds to a single pixel.
• M-pyramids are used when it is necessary to work with an image at different resolutions simultaneously.
• An image with a resolution one degree smaller in the pyramid contains four times less data, so it is processed approximately four times as quickly.
• It is often advantageous to use several resolutions simultaneously rather than choosing just one image from the M-pyramid. Such images can be represented using tree pyramids (T-pyramids).
• A T-pyramid is a tree; every node of the T-pyramid has four child nodes.
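A minimal Python sketch of the run length coding scheme described above (binary case only; the test image is made up for illustration):

```python
import numpy as np

def run_length_encode(binary_image):
    """Run length coding of a binary image as a list of lists:
    [row_number, (start, end), (start, end), ...] for rows containing object pixels."""
    encoded = []
    for r, row in enumerate(binary_image):
        runs = []
        c = 0
        while c < len(row):
            if row[c]:                       # start of a run of object pixels
                start = c
                while c < len(row) and row[c]:
                    c += 1
                runs.append((start, c - 1))  # second element is the end of the run
            else:
                c += 1
        if runs:
            encoded.append([r] + runs)
    return encoded

binary = np.array([[0, 1, 1, 0, 1],
                   [0, 0, 0, 0, 0],
                   [1, 1, 1, 1, 1]], dtype=bool)
print(run_length_encode(binary))   # [[0, (1, 2), (4, 4)], [2, (0, 4)]]
```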

Standard digital image file formats include the following.

JPEG
JPEG (pronounced JAY-peg) is a graphic image file produced according to the Joint Photographic Experts Group standard. This group of experts develops and maintains standards for a suite of compression algorithms for computer image files. JPEGs usually have a .jpg file extension.

GIF
GIF stands for Graphics Interchange Format. GIFs use a two-dimensional (2D) raster data type and are binarily encoded. GIF files usually have the .gif extension.

PNG
PNG is the Portable Network Graphics file format for image compression. It provides several improvements over the GIF format. Like a GIF, a PNG file is compressed in a lossless manner, meaning that all image information can be restored when decompressing the file for viewing. PNG files typically have a .png extension.

SVG
SVG (Scalable Vector Graphics) is a vector file format used to display 2D graphics, charts, and illustrations online. SVG files do not depend on unique pixels to create images, so they can scale up or down without losing resolution. This means that the file can be viewed on a computer display of any size and resolution, such as the small screen of a smartphone or a large widescreen display on a PC.
SVG files are also searchable and indexable, since they use an Extensible Markup Language (XML) format. Any program, such as a browser, that recognizes XML can display the image using the information provided in the SVG file. SVG files usually have an .svg extension.

TIFF
TIFF (Tag Image File Format) is a standard format for exchanging raster graphic (bitmap) images between application programs, including those used for scanner images. TIFF files have a .tiff or .tif file name suffix.

2. Regions
A region in an image is a group of connected pixels with similar properties. Regions are important for the interpretation of an image because they may correspond to objects in a scene. An image may contain several objects and, in turn, each object may contain several regions corresponding to different parts of the object. For an image to be interpreted accurately, it must be partitioned into regions that correspond to objects or parts of an object. However, due to segmentation errors, the correspondence between regions and objects will not be perfect, and object-specific knowledge must be used in later stages for image interpretation.
In this chapter, we discuss the basic concepts of regions, concentrating on two issues: segmenting an image into regions, and representing the regions.

3.1 Regions and Edges
Consider the simple image shown in Figure 3.1.
This figure contains several objects. The first step in the analysis and understanding of this image
is to partition the image so that regions representing different objects are explicitly marked. Such
partitions may be obtained from the characteristics of the gray values of the pixels in the image.
Recall that an image is a two-dimensional array and the values of the array elements are the gray
values. Pixels, gray values at specified indices in the image array, are the observations, and all
other attributes, such as region membership, must be derived from the gray values. There are two
approaches to partitioning an image into regions: region-based segmentation and boundary
estimation using edge detection.
3. Sub-pixel-Precise Contours
The data structures we have considered so far are pixel-precise. Often, it is important to extract subpixel-precise data from an image because the application requires an accuracy that is higher than the pixel resolution of the image. The subpixel data can, for example, be extracted with subpixel thresholding (see Section 3.4.3) or subpixel edge extraction (see Section 3.7.3). The results of these operations can be described with subpixel-precise contours. Figure 3.2 displays several example contours. As we can see, the contours can basically be represented as polygons, i.e., ordered sets of control points (r_i, c_i)^T, where the ordering defines which control points are connected to each other. Since the extraction is typically based on the pixel grid, the distance between the control points of a contour is approximately 1 pixel on average. In the computer, the contours are simply represented as arrays of floating-point row and column coordinates. From Figure 3.2, we can also see that there is a rich topology associated with the contours. For example, contours can be closed (contour 1) or open (contours 2–5). Closed contours are usually represented by having the first contour point identical to the last contour point, or by a special attribute that is stored with the contour. Furthermore, several contours can meet at a junction point, e.g., contours 3–5. It is sometimes useful to explicitly store this topological information with the contours.

2.4 Image Enhancement
The principal objective of image enhancement is to process a given image so that the result is more suitable than the original image for a specific application.
• It accentuates or sharpens image features such as edges, boundaries, or contrast to make a graphic display more helpful for display and analysis.
• Enhancement does not increase the inherent information content of the data, but it increases the dynamic range of the chosen features so that they can be detected easily.
The greatest difficulty in image enhancement is quantifying the criterion for enhancement; therefore, a large number of image enhancement techniques are empirical and require interactive procedures to obtain satisfactory results.
Image enhancement methods can be based on either spatial-domain or frequency-domain techniques.
Spatial domain enhancement methods:
• Spatial domain techniques operate on the image plane itself and are based on the direct manipulation of the pixels in an image.
• The operation can be formulated as g(x, y) = T[f(x, y)], where g is the output, f is the input image, and T is an operation on f defined over some neighborhood of (x, y).
• According to the operations on the image pixels, they can be further divided into two categories: point operations and spatial operations (including linear and non-linear operations).
Frequency domain enhancement methods:
• These methods enhance an image f(x, y) by convolving the image with a linear, position-invariant operator.
• The 2D convolution is performed in the frequency domain with the DFT.
Spatial domain: g(x, y) = f(x, y) ∗ h(x, y)
Frequency domain: G(w1, w2) = F(w1, w2) H(w1, w2)

Enhancement by Point Processing
These processing methods are based only on the intensity of single pixels.
Image enhancement techniques are used to enrich the contrast and enhance the details of an image. These techniques play a vital role in image processing, and one commonly used method is gray-level transformation.

Gray Level Transformation
Gray level transformation allows the modification of pixel intensities by mapping input gray levels to various output levels in order to achieve the desired image enhancements. This mapping can easily be achieved through different mathematical functions. The basic transformation function is given as follows:
O = T(I)
where O stands for the output pixel value, I stands for the input pixel value, and T stands for the transformation function that maps the pixel values of the input image to different output gray levels.

Types of Transformation
There are three common types of gray-level transformation:
1. Linear transformation
2. Logarithmic transformation
3. Power-law transformation

Linear Transformation
A linear transformation is achieved by applying a linear relationship to the pixels of an image to get the desired enhancements. This method is often used for adjusting the brightness and contrast of an image. The main types of linear transformation are discussed below.

Identity Transformation
The identity transformation leaves the original pixel value of an image unchanged and maps it as-is to the output image.
Formula: O = I
In this equation, the input and output pixel values are indicated by I and O respectively.
Graphical representation: the x-axis represents the input pixel values and the y-axis represents the output pixel values.

Negative Transformation
The negative transformation inverts the pixel value of the image by subtracting it from the maximum pixel value. The resulting image is a digital negative of the original image.
Formula: O = MAX − I
In this equation, O implies the output pixel value and I implies the input pixel value; MAX represents the maximum pixel value in the input image.
Graphical representation: the x-axis indicates the input pixel values, while the y-axis depicts the output pixel values.

Logarithmic Transformation
The logarithmic transformation uses logarithmic functions to modify the pixel values of an image. It redistributes the pixel values in an image, accentuating the detail in dim areas while compressing the details in brighter areas.
Formula: O = c ∗ log(1 + I)
In this equation, O indicates the resulting pixel value and I the original pixel value; c is the scaling factor, which controls the degree of image enhancement.
Graphical representation: the x-axis and y-axis represent the input and output pixel values respectively.

Power-Law Transformation
The power-law transformation, also called gamma transformation, is a technique that uses a power-law function to adjust the pixel values of an image. It is versatile, as it allows the emphasis of certain intensity ranges or the enhancement of specific details in an image.
Formula: O = c ∗ I^γ
In this equation, c represents the scaling factor, γ denotes the gamma correction value, and I and O respectively stand for the input and output pixel values.
• If the value of gamma is greater than one, the transformation stretches the contrast in brighter areas and compresses the pixel values in the darker areas.
• If the value of gamma is smaller than one, it enhances the contrast in dim areas and compresses the pixel values in the bright areas.
Graphical representation: the x-axis and y-axis represent the input and output pixel values respectively, and the shape of the graph varies with the gamma value.
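A short NumPy sketch of these gray-level transformations applied to an 8-bit image (the random test image and the chosen constants are assumptions for illustration):

```python
import numpy as np

def negative(img):
    """O = MAX - I for an 8-bit image (MAX = 255)."""
    return 255 - img

def log_transform(img):
    """O = c * log(1 + I), with c chosen so the output spans 0..255."""
    c = 255.0 / np.log(1.0 + img.max())
    return (c * np.log1p(img.astype(np.float64))).astype(np.uint8)

def gamma_transform(img, gamma, c=1.0):
    """O = c * I^gamma, applied on intensities normalised to [0, 1]."""
    normalised = img.astype(np.float64) / 255.0
    return np.clip(255.0 * c * normalised ** gamma, 0, 255).astype(np.uint8)

# Hypothetical 8-bit test image (random noise stands in for a real photo).
image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
inverted = negative(image)
compressed = log_transform(image)
dark_boosted = gamma_transform(image, gamma=0.5)   # gamma < 1 brightens dim areas
bright_spread = gamma_transform(image, gamma=2.2)  # gamma > 1 acts on bright areas
```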
2.5 Smoothing in Image Processing
Smoothing in image processing refers to the process of reducing noise or other unwanted artifacts in an image while preserving important features and structures. The goal of smoothing is to create a visually appealing image that is easy to interpret and analyze. Smoothing techniques use various algorithms, such as filters or convolutions, to remove noise or other distortions in the image. Effective smoothing requires striking a balance between removing unwanted artifacts and preserving important image details, and it is an essential step in many image processing applications, including image segmentation, object recognition, and computer vision.

Types of Smoothing Filters
Smoothing filters, also known as blurring filters, are a type of image filter commonly used in image processing to reduce noise and remove small details from an image. There are several types of smoothing filters, including the mean filter, median filter, Gaussian filter, and bilateral filter.

Mean Filter
The mean filter is a linear smoothing filter that replaces each pixel in the image with the average of its neighboring pixels. The size of the neighborhood is defined by the filter kernel or mask; the larger the kernel, the stronger the smoothing effect. Mean filters are easy to implement and are commonly used in low-level image processing tasks such as noise reduction and edge detection. However, they tend to blur edges and details in the image, leading to a loss of image quality.

Median Filter
The median filter is a nonlinear smoothing filter that replaces each pixel in the image with the median value of its neighboring pixels. The size of the neighborhood is also defined by the filter kernel. Unlike the mean filter, the median filter does not blur edges and details in the image; instead, it preserves them while removing the noise. Median filters are commonly used in image processing tasks that involve removing salt-and-pepper noise from the image.

Gaussian Filter
The Gaussian filter is a linear smoothing filter based on the Gaussian distribution. The filter works by convolving the image with a Gaussian kernel. The Gaussian kernel has a bell-shaped curve that determines the weight of each pixel in the neighborhood: pixels closer to the center of the kernel have a higher weight than pixels farther away. The Gaussian filter is a good choice for smoothing images while preserving edges and details. However, it is computationally more expensive and can produce ringing artifacts around edges.

Bilateral Filter
The bilateral filter is a nonlinear smoothing filter that uses a combination of spatial and range filtering. The spatial filter is similar to the Gaussian filter, while the range filter is based on the difference in pixel intensities. The bilateral filter preserves edges and details while removing noise. It is commonly used in image processing tasks such as image denoising, edge-preserving smoothing, and tone mapping.

Smoothing Techniques
• Convolution-based techniques (e.g., applying filters to an image)
• Non-local means techniques (e.g., using similar patches from other parts of the image)
• Wavelet-based techniques (e.g., applying wavelet transforms to an image)
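A minimal sketch applying the four filters described above with OpenCV (the kernel sizes and sigma values are arbitrary illustrative choices):

```python
import cv2
import numpy as np

# Noisy 8-bit test image (random noise stands in for a real photograph).
noisy = np.random.randint(0, 256, (128, 128), dtype=np.uint8)

mean_smoothed   = cv2.blur(noisy, (5, 5))                # mean filter, 5x5 kernel
median_smoothed = cv2.medianBlur(noisy, 5)               # median filter, 5x5 window
gauss_smoothed  = cv2.GaussianBlur(noisy, (5, 5), 1.5)   # Gaussian filter, sigma = 1.5
edge_preserving = cv2.bilateralFilter(noisy, 9, 75, 75)  # bilateral: spatial + range filtering
```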

2.6 Fourier Transform
The (2D) Fourier transform is a very classical tool in image processing. It is the extension of the well-known Fourier transform for signals, which decomposes a signal into a sum of complex oscillations (actually, complex exponentials). In image processing, the complex oscillations always come in pairs because the pixels have real intensities.
Fig. 32 shows the decomposition of a synthetic image into oscillations. In this toy example, the image is simple enough to be decomposed using only three oscillations. We will see further on that usual images need many more oscillations. The Fourier transform gives information about the frequency content of the image.

The discrete Fourier transform (DFT) of an image f of size M × N is an image F of the same size, defined (with the usual convention) as:
F(u, v) = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) · e^{−j2π(ux/M + vy/N)}

Note that the definition of the Fourier transform uses a complex exponential. In consequence, the DFT of an image is generally complex, so it cannot be displayed as a single image. That is why the amplitude (modulus) and phase (argument) of the DFT are shown separately, as in Fig. 33.

Fig. 33: DFT of the squirrel. The amplitude is shown with a logarithmic scale to distinguish the details clearly (a histogram transformation has been applied).

The amplitude and phase represent the distribution of energy in the frequency plane. The low frequencies are located in the center of the image, and the high frequencies near the boundaries.
In the figure above, the gray background behind the squirrel is a low-frequency area because the intensities of the pixels evolve slowly from one pixel to the next. On the contrary, the tail is a high-frequency area because the pixel intensities show a rapid alternation between the hair and the background.

Inverse Fourier Transform
The inverse discrete Fourier transform computes the original image from its Fourier transform:
f(x, y) = (1 / (MN)) Σ_{u=0}^{M−1} Σ_{v=0}^{N−1} F(u, v) · e^{j2π(ux/M + vy/N)}
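In practice the DFT is computed with an FFT. A minimal NumPy sketch (the random image is a stand-in; np.fft.fftshift centers the low frequencies, as in the figures above):

```python
import numpy as np

# Hypothetical grayscale image as a float array.
f = np.random.rand(256, 256)

F = np.fft.fft2(f)                 # 2D discrete Fourier transform
F_centered = np.fft.fftshift(F)    # move the low frequencies to the center

amplitude = np.abs(F_centered)     # modulus
phase = np.angle(F_centered)       # argument
log_amplitude = np.log1p(amplitude)  # logarithmic scale for display

f_back = np.fft.ifft2(F).real      # inverse DFT recovers the original image
assert np.allclose(f, f_back)
```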
2.7 Geometric Transformation
Geometric transformation is a fundamental technique used in image processing that involves
manipulating the spatial arrangement of pixels in an image. It is used to modify the geometric
properties of an image, such as its size, shape, position, and orientation. The following are some
of the basic concepts of geometric transformation in image processing:

1. Transformation Functions: Transformation functions are mathematical functions used to
modify the geometric properties of an image. These functions map the coordinates of each
pixel in an image to new coordinates based on a specified transformation rule. Some of the
commonly used transformation functions include scaling, rotation, translation, and shearing.

2. Coordinate System: A coordinate system is a reference system used to define the spatial
location of pixels in an image. In digital image processing, the most commonly used
coordinate system is the Cartesian coordinate system, which uses two perpendicular axes (x
and y) to represent the horizontal and vertical positions of pixels in an image.

3. Interpolation: Interpolation is the process of estimating the pixel values of an image at
locations that are not explicitly defined. This is necessary when transforming an image since
the new coordinates may not coincide with the original pixel positions. Interpolation
estimates the new pixel values from the surrounding pixel values.

Applications of Geometric Transformation
Geometric transformation is a versatile technique that finds a wide range of applications in image
processing. Some of the common applications of geometric transformation are as follows:

1. Image Registration: Image registration is the process of aligning two or more images to
enable comparison or analysis. Geometric transformation is used to align the images by
modifying their position, orientation, or scale.

2. Image Rectification: Image rectification is the process of removing distortions from an image
caused by the projection of a three-dimensional scene onto a two-dimensional image plane.
Geometric transformation is used to correct the perspective distortion by transforming the
image into a new coordinate system.

3. Image Resizing: Image resizing is the process of changing the size of an image while
preserving its aspect ratio. Geometric transformation is used to scale the image by a factor in
the x and y directions.

4. Image Rotation: Image rotation is the process of rotating an image by a specified angle.
Geometric transformation is used to perform the rotation by mapping the original pixel
positions to new positions based on the rotation angle.
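As a minimal illustrative sketch of these concepts (an added example; the file name input.png,
the angle, and the scale factor are placeholders, not taken from the text above), the following
OpenCV/Python code rotates, scales, and translates an image, with bilinear interpolation filling
in pixel values at non-integer coordinates:

```python
import cv2
import numpy as np

# Load the input image (placeholder file name).
img = cv2.imread("input.png")
h, w = img.shape[:2]

# Build a 2x3 affine matrix: rotate 30 degrees about the image center
# and scale by a factor of 0.5 in both x and y.
M = cv2.getRotationMatrix2D(center=(w / 2, h / 2), angle=30, scale=0.5)

# Apply the geometric transformation; bilinear interpolation estimates
# new pixel values from the surrounding pixels.
rotated = cv2.warpAffine(img, M, (w, h), flags=cv2.INTER_LINEAR)

# Pure translation: shift the image 50 pixels right and 20 pixels down.
T = np.float32([[1, 0, 50], [0, 1, 20]])
translated = cv2.warpAffine(img, T, (w, h))

cv2.imwrite("rotated.png", rotated)
cv2.imwrite("translated.png", translated)
```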
2.8 Image segmentation
Image segmentation is a commonly used technique in digital image processing and analysis
to partition an image into multiple parts or regions, often based on the characteristics of the
pixels in the image.

Image segmentation is an extension of image classification, a computer vision technique used to
understand what is at the pixel level in an image, in addition to classifying the information in the
image. It outlines the boundaries of objects to find out what they are and where they are, and it
extends individual object detection to individually labeling different regions in an image.

Image segmentation is one of the most important fields, especially in computer vision. In
machine learning, image segmentation refers to the process of separating data into discrete
groups. In deep learning, image segmentation is all about creating a segment map that associates
labels or categories with every pixel in an image, for example, for self-driving cars to identify
vehicles, pedestrians, traffic signs, and other roadways.
Approaches in Image Segmentation

There are two common approaches in image segmentation:

1. Similarity approach: This approach involves detecting similarity between image pixels to form
a segment based on a given threshold. Machine learning algorithms like clustering are based on
this type of approach to segment an image.
2. Discontinuity approach: This approach relies on the discontinuity of pixel intensity values of
the image. Line, point and edge detection techniques use this type of approach for obtaining
intermediate segmentation results that can later be processed to obtain the final segmented
image.
Image Segmentation Techniques

There are five common image segmentation techniques.

5 IMAGE SEGMENTATION TECHNIQUES TO KNOW

1. Threshold-based segmentation
2. Edge-based image segmentation
3. Region-based image segmentation
4. Clustering-based image segmentation
5. Artificial neural network-based segmentation

Edge-Based Image Segmentation

Edge-based segmentation relies on edges found in an image using various edge detection
operators. These edges mark image locations of discontinuity in gray levels, color, texture, etc.
When we move from one region to another, the gray level may change. So, if we can find that
discontinuity, we can find that edge. A variety of edge detection operators are available, but the
resulting image is an intermediate segmentation result and should not be confused with the final
segmented image. We have to perform further processing on the image to segment it.

Additional steps include combining the edge segments obtained into one segment in order to
reduce the number of segments, rather than leaving chunks of small borders which might hinder
the process of region filling. This is done to obtain a seamless border of the object. The goal of
edge segmentation is to get an intermediate segmentation result to which we can apply
region-based or any other type of segmentation to get the final segmented image.
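To illustrate the edge-based approach described above, here is a short added sketch using the
Canny operator (the thresholds and file name are placeholders, and OpenCV 4.x is assumed):

```python
import cv2

# Read the image in grayscale (placeholder file name).
gray = cv2.imread("part.png", cv2.IMREAD_GRAYSCALE)

# Light smoothing reduces noise before edge detection.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Canny edge detection: the two thresholds control edge linking
# (the values here are only examples).
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

# The edge map is only an intermediate segmentation result; further
# processing (edge linking, region filling) is still needed.
cv2.imwrite("edges.png", edges)
```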
Region-Based Image Segmentation

A region can be classified as a group of connected pixels exhibiting similar properties. Pixel
similarity can be in terms of intensity and color, etc. In this type of segmentation, some
predefined rules are present that pixels have to obey in order to be classified into similar pixel
regions. Region-based segmentation methods are preferred over edge-based segmentation
methods if it's a noisy image. Region-based techniques are further classified into two types
based on the approaches they follow:

1. Region growing method.
2. Region splitting and merging method.
Clustering-Based Segmentation

Clustering is a type of unsupervised machine learning algorithm. It's often used for image
segmentation. One of the most dominant clustering-based algorithms used for segmentation is
K-means clustering. This type of clustering can be used to make segments in a colored image.
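A minimal added sketch of K-means color segmentation with OpenCV (the number of clusters K
and the file name are assumptions, not from the text above):

```python
import cv2
import numpy as np

img = cv2.imread("scene.png")                    # placeholder file name
pixels = img.reshape(-1, 3).astype(np.float32)   # one row per pixel (B, G, R)

# Stop after 10 iterations or when centers move less than 1.0.
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
K = 4                                            # assumed number of segments

# cv2.kmeans returns compactness, per-pixel labels, and cluster centers.
_, labels, centers = cv2.kmeans(pixels, K, None, criteria,
                                attempts=10, flags=cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel by its cluster center to visualize the segments.
segmented = centers.astype(np.uint8)[labels.flatten()].reshape(img.shape)
cv2.imwrite("segmented.png", segmented)
```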
Segmentation of Contours

Contours can be explained simply as a curve joining all the continuous points (along the
boundary) having the same color or intensity. The contours are a useful tool for shape analysis
and object detection and recognition. For better accuracy, use binary images.
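The following added sketch finds and draws contours on a binary image obtained by thresholding
(the file name and threshold value are placeholders; OpenCV 4.x is assumed, where findContours
returns two values):

```python
import cv2

gray = cv2.imread("shapes.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# Contours work best on binary images, so threshold first.
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

# Retrieve external contours with simple chain approximation.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Draw all contours in green on a color copy of the image.
output = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
cv2.drawContours(output, contours, -1, (0, 255, 0), 2)
cv2.imwrite("contours.png", output)

print("Number of contours found:", len(contours))
```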
Line Level Segmentation:

In this level of segmentation, we are provided with a skew-corrected image containing text
written in the form of lines. The objective of Line Level Segmentation is to segment the image
into lines.

The Find Circles option is an automatic segmentation technique that you can use to segment an
image into foreground and background elements. The Find Circles option does not require
initialization.

Because ellipses are described by more parameters than are lines or circles, detecting ellipses is
more challenging than detecting lines or circles. And while we have nice functionality for
detecting the simpler shapes (houghlines, imfindcircles), we do not (currently) have a function to
detect ellipses. Enter Martin's ellipse-detection function.

2.10 Camera calibration
Camera calibration is a necessary step in 3D computer vision in order to extract metric
information from 2D images. Much work has been done, starting in the photogrammetry
community (see [3, 6] to cite a few), and more recently in computer vision ([12, 11, 33, 10, 37,
35, 22, 9] to cite a few). According to the dimension of the calibration objects, we can classify
those techniques roughly into three categories.

3D reference object based calibration. Camera calibration is performed by observing a
calibration object whose geometry in 3-D space is known with very good precision. Calibration
can be done very efficiently [8]. The calibration object usually consists of two or three planes
orthogonal to each other. Sometimes, a plane undergoing a precisely known translation is also
used [33], which equivalently provides 3D reference points. This approach requires an expensive
calibration apparatus and an elaborate setup.

2D plane based calibration. Techniques in this category require observing a planar pattern
shown at a few different orientations [42, 31]. Different from Tsai's technique [33], knowledge
of the plane motion is not necessary. Because almost anyone can make such a calibration pattern
by him/her-self, the setup is easier for camera calibration.

1D line based calibration. Calibration objects used in this category are composed of a set of
collinear points [44]. As will be shown, a camera can be calibrated by observing a moving line
around a fixed point, such as a string of balls hanging from the ceiling.

Self-calibration. Techniques in this category do not use any calibration object, and can be
considered as a 0D approach because only image point correspondences are required. Just by
moving a camera in a static scene, the rigidity of the scene provides in general two constraints
[22, 21] on the cameras' internal parameters from one camera displacement by using image
information alone. Therefore, if images are taken by the same camera with fixed internal
parameters, correspondences between three images are sufficient to recover both the internal and
external parameters, which allow us to reconstruct 3-D structure up to a similarity [20, 17].
Although no calibration objects are necessary, a large number of parameters need to be
estimated, resulting in a much harder mathematical problem.
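To make the 2D plane (checkerboard) approach concrete, here is an added OpenCV sketch; the
board size, square size, and file pattern are assumptions, and at least one calibration image is
assumed to be found:

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)          # inner corners of the assumed checkerboard
square_size = 0.025            # assumed square size in meters

# 3D coordinates of the corners in the board's own plane (Z = 0).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_points, img_points = [], []
for path in glob.glob("calib_*.png"):          # placeholder file pattern
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Estimate the intrinsic matrix, distortion coefficients, and per-view poses.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
print("Camera matrix:\n", K)
```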
Stereo Reconstruction

Stereo reconstruction lends the possibility for 3D reconstruction to be derived and utilized on
preexisting equipment, greatly reducing the financial barrier to entry.

In our specific case we will investigate the strategy of using stereo images to perform 3D
reconstruction. Stereo images mean that there are two cameras and two images required to
calculate a point in 3D space. Essentially, the pixels from one image are matched with pixels of
the second, and epipolar geometry is used to calculate that same point in 3D space. The result,
after processing all the relevant pixels in each image, is a 3D point map of the pictured object
including depth information.

In practice, however, 3D reconstruction is not that easy, and more steps are required in order to
see an accurate 3D point map. Several essential techniques need to be understood in order to
effectively perform 3D reconstruction.
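As an added illustrative sketch, the code below computes a disparity map from a rectified stereo
pair with OpenCV's semi-global block matcher; the parameter values and file names are
assumptions:

```python
import cv2
import numpy as np

# Rectified left/right images of the same scene (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Semi-global block matching; numDisparities must be divisible by 16.
stereo = cv2.StereoSGBM_create(minDisparity=0,
                               numDisparities=96,
                               blockSize=7)

# Disparity is returned as fixed-point values scaled by 16.
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Larger disparity means the point is closer to the cameras; with the
# calibration parameters, disparity can be converted to metric depth.
vis = cv2.normalize(disparity, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("disparity.png", vis)
```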
Visual object recognition refers to the ability to identify the objects in view based on visual
input. One important signature of visual object recognition is "object invariance", or the ability
to identify objects across changes in the detailed context in which objects are viewed, including
changes in illumination, object pose, and background context.

Basic stages of object recognition

Stage 1 Processing of basic object components, such as color, depth, and form.
Stage 2 These basic components are then grouped on the basis of similarity, providing
information on distinct edges to the visual form. Subsequently, figure-ground segregation
is able to take place.
Stage 3 The visual representation is matched with structural descriptions in memory.
Stage 4 Semantic attributes are applied to the visual representation, providing meaning,
and thereby recognition.

(i)Approaches based on CAD-like object models

Geons

Geons are the simple 2D or 3D forms, such as cylinders, bricks, wedges, cones, circles and
rectangles, corresponding to the simple parts of an object in Biederman's
recognition-by-components theory. The theory proposes that the visual input is matched against
structural representations of objects in the brain. These structural representations consist of
geons and their relations (e.g., an ice cream cone could be broken down into a sphere located
above a cone). Only a modest number of geons (< 40) are assumed.
When combined in different relations to each other (e.g., on-top-of, larger-than, end-to-end, end-
to-middle) and coarse metric variation such as aspect ratio and 2D orientation, billions of
possible 2- and 3-geon objects can be generated. Two classes of shape-based visual identification
that are not done through geon representations, are those involved in: a) distinguishing between
similar faces, and b) classifications that don’t have definite boundaries, such as that of bushes or
a crumpled garment. Typically, such identifications are not viewpoint-invariant.

(ii)Appearance-based methods

 Use example images (called templates or exemplars) of the objects to perform recognition
 Objects look different under varying conditions:
o Changes in lighting or color
o Changes in viewing direction
o Changes in size/shape
 A single exemplar is unlikely to succeed reliably. However, it is impossible to represent all
appearances of an object.
Edge matching Gradient matching

 Uses edge detection techniques, such as the Canny edge detection, to find edges.  Another way to be robust to illumination changes without throwing away as much
 Changes in lighting and color usually don't have much effect on image edges information is to compare image gradients
 Matching is performed like matching greyscale images
Strategy:  Simple alternative: Use (normalized) correlation

1. Detect edges in template and image (iii)Feature-based methods


2. Compare edges images to find the template
3. Must consider range of possible template positions  a search is used to find feasible matches between object features and image features.
 Measurements:  the primary constraint is that a single position of the object must account for all of the
o Good – count the number of overlapping edges. Not robust to changes in shape feasible matches.
o Better – count the number of template edge pixels with some distance of an edge in the  methods that extract features from the objects to be recognized and the images to be
search image searched.
o Best – determine probability distribution of distance to nearest edge in search image (if o surface patches
template at correct position). Estimate likelihood of each template position generating o corners
image o linear edges
Divide-and-Conquer search Interpretation trees

 Strategy:  A method for searching for feasible matches, is to search through a tree.
o Consider all positions as a set (a cell in the space of positions)  Each node in the tree represents a set of matches.
o Determine lower bound on score at best position in cell o Root node represents empty set
o If bound is too large, prune cell o Each other node is the union of the matches in the parent node and one additional match.
o If bound is not too large, divide cell into subcells and try each subcell recursively o Wildcard is used for features with no match
o Process stops when cell is “small enough”  Nodes are “pruned” when the set of matches is infeasible.
 Unlike multi-resolution search, this technique is guaranteed to find all matches that meet the o A pruned node has no children
criterion (assuming that the lower bound is accurate)  Historically significant and still used, but less commonly
 Finding the Bound: Hypothesize and test
o To find the lower bound on the best score, look at score for the template position
represented by the center of the cell  Hypothesize a correspondence between a collection of image features and a collection of
o Subtract maximum change from the “center” position for any other position in cell object features
(occurs at cell corners)  Then use this to generate a hypothesis about the projection from the object coordinate
 Complexities arise from determining bounds on distance frame to the image frame
Greyscale matching  Use this projection hypothesis to generate a rendering of the object. This step is usually
known as backprojection
 Edges are (mostly) robust to illumination changes, however they throw away a lot of  Compare the rendering to the image, and, if the two are sufficiently similar, accept the
information hypothesis
 Must compute pixel distance as a function of both pixel position and pixel intensity Pose consistency
 Can be applied to color also
 Also called Alignment, since the object is being aligned to the image memory as only structural parts need to be encoded, which can produce multiple object
 Correspondences between image features and model features are not independent – representations through the interrelations of these parts and mental rotation.[ Participants in a
Geometric constraints study were presented with one encoding view from each of 24 preselected objects, as well as five
 A small number of correspondences yields the object position – the others must be consistent filler images. Objects were then represented in the central visual field at either the same
with this orientation or a different orientation than the original image. Then participants were asked to
 General Idea: name if the same or different depth- orientation views of these objects presented.[9] The same
o If we hypothesize a match between a sufficiently large group of image features and a procedure was then executed when presenting the images to the left or right visual field.
sufficiently large group of object features, then we can recover the missing camera Viewpoint-dependent priming was observed when test views were presented directly to the right
parameters from this hypothesis (and so render the rest of the object) hemisphere, but not when test views were presented directly to the left hemisphere. The results
 Strategy: support the model that objects are stored in a manner that is viewpoint dependent because the
results did not depend on whether the same or a different set of parts could be recovered from the
o Generate hypotheses using small number of correspondences (e.g. triples of points for
different-orientation views.[9]
3D recognition)
3-D model representation
o Project other model features into image (backproject) and verify additional
The 3-D model representations obtained from the object are formed by first identifying the
correspondences
concavities of the object, which separate the stimulus into individual parts. Recent research
 Use the smallest number of correspondences necessary to achieve discrete object poses
suggests that an area of the brain, known as the caudal intraparietal area (CIP), is responsible for
Pose clustering storing the slant and tilt of a plan surface that allow for concavity recognition.
Recognition by components
 General Idea: An extension of Marr and Nishihara's model, the recognition-by-components theory, proposed
o Each object leads to many correct sets of correspondences, each of which has (roughly) by Biederman (1987), proposes that the visual information gained from an object is divided into
the same pose simple geometric components, such as blocks and cylinders, also known as "geons" (geometric
o Vote on pose. Use an accumulator array that represents pose space for each object ions), and are then matched with the most similar object representation that is stored in memory
o This is essentially a Hough transform to provide the object's identification
 Strategy:
o For each object, set up an accumulator array that represents pose space – each element in
the accumulator array corresponds to a “bucket” in pose space.
o Then take each image frame group, and hypothesize a correspondence between it and
every frame group on every object
o For each of these correspondences, determine pose parameters and make an entry in the
accumulator array for the current object at the pose value.
o If there are large numbers of votes in any object's accumulator array, this can be
interpreted as evidence for the presence of that object at that pose.
o The evidence can be checked using a verification method
(ii) Viewpoint-dependent theories
Recognition by combination of views Viewpoint-dependent theories suggest that object recognition is affected by the viewpoint at
which it is seen, implying that objects seen in novel viewpoints reduce the accuracy and speed of
(i) Viewpoint-invariant theories
object identification.[13] This theory of recognition is based on a more holistic system rather than
Viewpoint-invariant theories suggest that object recognition is based on structural information,
by parts, suggesting that objects are stored in memory with multiple viewpoints and angles. This
such as individual parts, allowing for recognition to take place regardless of the object's
form of recognition requires a lot of memory as each viewpoint must be stored. Accuracy of
viewpoint. Accordingly, recognition is possible from any viewpoint as individual parts of an
recognition also depends on how familiar the observed viewpoint of the object is.[14]
object can be rotated to fit any particular view. This form of analytical recognition requires little
(iii) Multiple views theory
This theory proposes that object recognition lies on a viewpoint continuum where each UNIT IV/ APPLICATIONS
viewpoint is recruited for different types of recognition. At one extreme of this continuum,
viewpoint-dependent mechanisms are used for within-category discriminations, while at the Transforming sensor readings
other extreme, viewpoint-invariant mechanisms are used for the categorization of objects.
Transforming sensor readings in the application of machine vision systems involves converting
raw sensor data into a format that is suitable for processing and analysis using machine vision
algorithms. This transformation typically includes several preprocessing steps to enhance the
quality and usability of the sensor data before feeding it into machine vision algorithms. Here are
some common steps involved in transforming sensor readings in the application of machine
vision systems:

1. Data Acquisition: The first step is to acquire sensor readings from the sensors deployed in the
environment. These sensors may include cameras, LiDAR, depth sensors, thermal sensors, or
any other sensors relevant to the specific application.
2. Data Calibration: Sensor data often needs to be calibrated to correct for distortions, noise, or
other artifacts introduced by the sensor hardware or environment. Calibration involves estimating
and compensating for systematic errors in the sensor readings to improve their accuracy and
consistency.
3. Image Formation: For vision-based sensors such as cameras, the acquired sensor readings are
typically in the form of images. Image formation involves converting raw sensor data into
images with the appropriate resolution, color space, and format for further processing.
4. Image Preprocessing: Preprocessing steps are applied to the acquired images to enhance their
quality and remove unwanted artifacts or noise. Common preprocessing techniques include
image denoising, smoothing, contrast enhancement, and image normalization.
5. Image Registration: If multiple sensors are used in the system, such as stereo cameras or multi-
modal sensors, image registration may be necessary to align the acquired images into a common
coordinate system. This ensures that corresponding features in different sensor readings are
properly matched for subsequent analysis.
6. Feature Extraction: Feature extraction involves identifying relevant visual features or keypoints
in the sensor readings that are informative for the task at hand. These features may include edges,
corners, blobs, textures, or other distinctive patterns in the images.

7. Dimensionality Reduction: In some cases, the sensor readings may contain high-dimensional 1. Acquisition of Sonar Data: The first step is to acquire sonar data using underwater sonar
data that is redundant or irrelevant for the machine vision task. Dimensionality reduction sensors. Sonar sensors emit sound waves into the water and measure the time it takes for the
techniques, such as principal component analysis (PCA) or feature selection, can be applied to sound waves to bounce off objects and return to the sensor. This information is used to estimate
reduce the complexity of the data while preserving its informative content. the distances to objects in the underwater environment.
8. Normalization and Standardization: Normalization and standardization techniques may be 2. Data Preprocessing: Raw sonar data may contain noise, artifacts, or inconsistencies that need to
applied to scale and normalize the sensor readings to a common range or distribution, which can be addressed before mapping. Preprocessing techniques such as filtering, noise removal, and
improve the performance of machine learning algorithms and enhance their robustness to calibration are applied to clean up the data and enhance its quality.
variations in the data. 3. Sonar Data Fusion: In many cases, sonar data is collected from multiple sensors or from
9. Data Augmentation: Data augmentation techniques may be employed to artificially increase the different types of sonar systems (e.g., side-scan sonar, multibeam sonar). Sonar data fusion
diversity and variability of the sensor readings, especially in scenarios where the available techniques are used to integrate data from different sensors or modalities into a unified
training data is limited. Augmentation techniques include rotation, translation, scaling, flipping, representation of the underwater environment.
and adding noise to the sensor readings. 4. Georeferencing: Sonar data is often georeferenced to a coordinate system that allows for
10. Data Fusion: If the system incorporates multiple sensors, data fusion techniques may be used to accurate mapping and localization. This involves associating each sonar reading with its
combine information from different sensors to obtain a more comprehensive and accurate corresponding position in three-dimensional space, typically using GPS coordinates or other
representation of the environment. This can improve the reliability and robustness of the machine navigation systems.
vision system. 5. Mapping Algorithm: Various mapping algorithms can be used to create a representation of the
underwater environment based on sonar data. Common approaches include occupancy grid
Mapping sonar data mapping, probabilistic mapping (e.g., using Bayesian filters), and simultaneous localization and
mapping (SLAM) techniques adapted for underwater environments.
By applying these preprocessing and transformation steps to sensor readings, machine vision
6. Terrain Reconstruction: Sonar data can be used to reconstruct the terrain or topography of the
systems can effectively extract meaningful information from raw sensor data and perform tasks
underwater environment. This involves estimating the depth or elevation of the seabed and
such as object detection, recognition, tracking, scene understanding, and navigation in various
identifying features such as underwater hills, valleys, ridges, and obstacles.
robotic and computer vision applications.
7. Object Detection and Classification: Sonar data can also be used to detect and classify
underwater objects such as wrecks, rocks, vegetation, marine life, and man-made structures.
Mapping sonar data involves creating a representation of the underwater environment based on
Detection algorithms analyze sonar readings to identify objects based on their size, shape,
the readings obtained from sonar sensors. Sonar (Sound Navigation and Ranging) is a technique
texture, and acoustic properties.
that uses sound waves to detect objects underwater and measure their distances. Mapping sonar
8. Visualization: The mapped sonar data can be visualized using graphical techniques to provide a
data is crucial in various underwater applications, including marine exploration, underwater
visual representation of the underwater environment. This may involve generating 2D or 3D
navigation, underwater archaeology, and underwater robotics. Here's an overview of the process
maps, depth contours, bathymetric charts, or point cloud representations of the seabed.
involved in mapping sonar data:
9. Integration with Navigation Systems: Mapped sonar data is often integrated with navigation
systems to enable underwater vehicles or robots to navigate autonomously in the underwater
environment. Navigation algorithms use the mapped environment to plan trajectories, avoid 4. Tracking: The detected lanes are tracked over time to maintain the vehicle's position within the
obstacles, and reach specified destinations. lane and anticipate changes in road geometry. Tracking algorithms, such as Kalman filters,
particle filters, or model-based tracking, are used to predict the future positions of lane markings
By mapping sonar data, researchers, scientists, and engineers can gain valuable insights into the
based on their current trajectories and update the vehicle's control inputs accordingly.
underwater world, explore uncharted territories, and support a wide range of underwater
5. Control and Guidance: Based on the information obtained from lane detection and tracking,
applications, including marine research, underwater exploration, environmental monitoring, and
control algorithms adjust the vehicle's steering angle to keep it centered within the lane and
underwater infrastructure inspection.
follow the curvature of the road. These algorithms may include proportional-integral-derivative
(PID) controllers, model predictive control (MPC), or reinforcement learning (RL) approaches.
Aligning Laser Scan Measurements
6. Integration with Sensor Fusion: Lane detection and tracking are often integrated with data
Aligning laser scan measurements in a machine vision system enables the system to leverage the from other sensors, such as GPS, inertial measurement units (IMUs), and radar, to improve
complementary strengths of different sensors, leading to improved perception, accuracy, and robustness and accuracy. Sensor fusion techniques combine information from multiple sensors to
understanding of the environment, which is essential for a wide range of applications in robotics, provide a more comprehensive understanding of the vehicle's surroundings and enhance
automation, and computer vision. navigation performance.
In the context of vision and tracking, "following the road" typically refers to the task of 7. Adaptive Behavior: Advanced systems may incorporate adaptive strategies to handle
autonomously guiding a vehicle along a roadway by detecting and tracking the road's boundaries challenging road conditions, such as occlusions, adverse weather, or poorly visible lane
or lane markings. This task is fundamental to various applications, including autonomous markings. These systems dynamically adjust their behavior and control parameters based on real-
driving, advanced driver-assistance systems (ADAS), and robotics navigation. Here's how vision time sensor inputs to maintain safe and reliable operation.
and tracking techniques are employed to achieve this:
Vision and Tracking Following By the Road
1. Road Detection: The first step in following the road is to detect the boundaries of the road or
lane markings in the captured images or sensor data. Vision algorithms, such as edge detection, By employing these vision and tracking techniques, vehicles and autonomous systems can
line detection, or semantic segmentation, are applied to identify the road surface and distinguish effectively follow the road, navigate complex environments, and ensure safe and reliable
it from other objects in the scene. operation in various driving conditions.
2. Lane Detection: Once the road is detected, lane markings are identified within the detected road Vision and tracking technologies play a crucial role in various applications related to road
area. This involves detecting lane boundaries, such as solid lines, dashed lines, or curbs, using monitoring, traffic management, and autonomous driving systems. Here's how these technologies
techniques like Hough transform, polynomial fitting, or convolutional neural networks (CNNs). are utilized for road-related tasks:
Lane markings provide crucial information for vehicle guidance and trajectory planning.
1. Lane Detection: Vision-based lane detection algorithms are used to identify and track lane
3. Feature Extraction: Relevant features of the detected lanes, such as curvature, width, and
markings on roads. These algorithms analyze captured images or video frames to detect lane
position relative to the vehicle, are extracted from the detected lane markings. These features
boundaries, including lane markings such as solid lines, dashed lines, or curbs. Lane detection is
help in determining the vehicle's position within the lane and estimating the curvature of the road
essential for tasks such as lane-keeping assistance systems in vehicles or autonomous navigation.
ahead.
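To make the lane-detection and feature-extraction steps above concrete, here is an added sketch
that finds straight lane-marking candidates with Canny edges and the probabilistic Hough
transform; the region-of-interest bounds, thresholds, and file name are assumptions:

```python
import cv2
import numpy as np

frame = cv2.imread("road_frame.png")           # placeholder file name
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)

# Keep only the lower half of the image, where the road usually appears.
mask = np.zeros_like(edges)
mask[edges.shape[0] // 2:, :] = 255
roi_edges = cv2.bitwise_and(edges, mask)

# Probabilistic Hough transform: returns line segments (x1, y1, x2, y2).
lines = cv2.HoughLinesP(roi_edges, rho=1, theta=np.pi / 180, threshold=50,
                        minLineLength=40, maxLineGap=20)

if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv2.imwrite("lanes.png", frame)
```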
2. Object Detection and Tracking: Object detection and tracking algorithms are employed to systems. These technologies help enhance road safety, improve traffic efficiency, and enable the
identify and monitor various objects on the road, including vehicles, pedestrians, cyclists, and development of advanced transportation systems.
obstacles. These algorithms utilize machine learning techniques, such as CNNs, to detect and
Multiscale image processing
track objects within the camera's field of view. Object tracking enables tasks such as collision
Multiscale image processing is a methodology that involves analyzing and processing images at
avoidance, adaptive cruise control, and pedestrian detection systems.
multiple resolutions or scales. Instead of treating an image as a single entity, multiscale
3. Traffic Sign Recognition: Vision-based traffic sign recognition systems automatically detect
processing involves decomposing the image into different levels of detail, each representing
and interpret traffic signs and signals, such as speed limits, stop signs, and traffic lights. These
different frequency bands or spatial resolutions. This approach allows for a more comprehensive
systems analyze captured images or video frames to recognize and interpret the symbols and
understanding of image content and facilitates various image processing tasks.
colors of traffic signs, providing valuable information for navigation and traffic management.
4. Road Surface Analysis: Vision-based systems can analyze the road surface to detect various
Here are some key aspects of multiscale image processing:
conditions, such as road markings, potholes, cracks, and surface irregularities. This information
can be used for road maintenance and repair planning, as well as to provide warnings to drivers 1. Multiscale Decomposition: The process of decomposing an image into multiple scales or levels
about hazardous road conditions. of detail. This can be achieved using techniques such as wavelet transforms, pyramid
5. Vehicle Localization and Mapping: Vision-based simultaneous localization and mapping decompositions (e.g., Gaussian pyramid, Laplacian pyramid), or scale-space representations.
(SLAM) techniques are used to accurately determine the position and orientation of vehicles on 2. Feature Extraction: Multiscale processing enables the extraction of features at different levels
the road, as well as to create detailed maps of the surrounding environment. SLAM algorithms of detail. Features may include edges, textures, shapes, or other salient characteristics present in
utilize visual data from cameras, along with other sensor inputs, to estimate the vehicle's pose the image.
and create a map of the road and surrounding landmarks. 3. Enhancement and Restoration: Multiscale techniques can be used to enhance or restore images
6. Behavioral Analysis: Vision-based systems can analyze the behavior of road users, such as by selectively processing different scales. For example, noise reduction algorithms may be
vehicle trajectories, speed profiles, and lane-changing patterns. This information is valuable for applied at coarse scales to preserve overall image structure while enhancing fine details at finer
traffic flow analysis, congestion management, and understanding driver behavior. scales.
7. Autonomous Driving Systems: Vision-based technologies are integral to autonomous driving 4. Segmentation: Multiscale segmentation techniques partition images into meaningful regions or
systems, enabling vehicles to perceive and understand their surroundings for safe and efficient objects at multiple resolutions. By considering different levels of detail, multiscale segmentation
navigation. Autonomous vehicles use vision sensors, along with other sensors such as LiDAR methods can handle both large-scale structures and fine details within an image.
and radar, to detect and track objects, interpret road markings and signs, plan optimal 5. Object Detection and Recognition: Multiscale processing aids in object detection and
trajectories, and make real-time driving decisions. recognition by analyzing objects at multiple resolutions. This allows for robust detection of
objects of varying sizes and scales within an image.
In summary, vision and tracking technologies are essential for various road-related applications,
6. Pyramid-based Techniques: Methods such as Gaussian and Laplacian pyramids construct
including lane detection, object detection and tracking, traffic sign recognition, road surface
hierarchical representations of images, where each level of the pyramid represents a different
analysis, vehicle localization and mapping, behavioral analysis, and autonomous driving
scale. These pyramids are used for tasks such as image blending, texture synthesis, and image
compression.
7. Feature Fusion: Multiscale features extracted from different levels of detail can be fused to 5. Enhancement and Restoration: Enhancing or restoring iconic images to improve their clarity,
improve the performance of various image processing tasks, such as classification, registration, contrast, or visual appearance while preserving the integrity of the iconic objects or symbols.
and matching. 6. Cultural and Contextual Analysis: Analyzing the cultural significance or contextual meaning
8. Scale-Invariant Analysis: Multiscale processing facilitates scale-invariant analysis, where of iconic images and symbols within different contexts or domains.
image features are detected and analyzed regardless of their size or scale. This is particularly
useful for tasks such as object recognition and tracking under varying viewing conditions. While "iconic image processing" may not be a standard term in the field of image processing, the
concept can encompass a range of techniques and methodologies aimed at understanding and
Overall, multiscale image processing provides a powerful framework for analyzing and working with images that contain iconic or symbolic representations. If you have a specific
understanding images at different levels of detail, enabling a wide range of applications in fields application or question in mind related to iconic image processing, feel free to provide more
such as computer vision, remote sensing, medical imaging, and multimedia processing. details, and I can offer further insights.
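As an added illustration of the pyramid-based multiscale idea discussed above, the sketch below
builds a small Gaussian pyramid and the corresponding Laplacian levels; the number of levels and
the file name are assumptions:

```python
import cv2

img = cv2.imread("scene.png")                  # placeholder file name

# Gaussian pyramid: each level is blurred and downsampled by a factor of 2.
gaussian = [img]
for i in range(3):
    gaussian.append(cv2.pyrDown(gaussian[-1]))

# Laplacian pyramid: difference between a level and the upsampled next level,
# capturing the detail (high-frequency content) lost at each scale.
laplacian = []
for i in range(3):
    up = cv2.pyrUp(gaussian[i + 1],
                   dstsize=(gaussian[i].shape[1], gaussian[i].shape[0]))
    laplacian.append(cv2.subtract(gaussian[i], up))

for i, level in enumerate(gaussian):
    cv2.imwrite(f"gaussian_level_{i}.png", level)
```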

Iconic Image Processing Video Tracking - Learning Land Marks


Video tracking involves the process of locating and following objects of interest within a
In the context of image processing, "iconic" could refer to the processing of iconic images or sequence of video frames over time. Learning landmarks in the context of video tracking
iconic representations within images. An iconic image could be an image that carries a particular typically refers to identifying specific features or landmarks on an object that serve as reference
symbolic or iconic significance, such as famous landmarks, logos, or culturally significant points for tracking.
symbols.
Here's how the process of learning landmarks in video tracking might be approached:
Iconic image processing might involve techniques that focus on the detection, recognition,
analysis, or manipulation of iconic objects or symbols within images. This could include: 1. Feature Selection: The first step is to select suitable landmarks or features on the object being
tracked. These landmarks should be distinctive and easily identifiable across different frames of
1. Object Detection: Identifying iconic objects or symbols within images using techniques such as
the video. Common landmarks could include corners, edges, or other salient points.
template matching, object recognition algorithms, or deep learning-based object detection
2. Feature Extraction: Once the landmarks are selected, computer vision techniques are used to
models.
extract these features from the video frames. Feature extraction methods could include corner
2. Feature Extraction: Extracting features specific to iconic objects or symbols, which could
detection algorithms like Harris corner detector, FAST, or SIFT (Scale-Invariant Feature
include shape descriptors, color histograms, texture features, or other characteristics that are Transform).
distinctive to these objects.
3. Feature Representation: The extracted features are then represented in a suitable format that
3. Classification and Recognition: Classifying or recognizing iconic objects or symbols within
allows for efficient comparison and matching across frames. This could involve encoding the
images, which could involve training machine learning models to distinguish between different features' locations, orientations, and other relevant attributes.
classes of iconic objects.
4. Learning: In the context of video tracking, learning landmarks often involves training a model
4. Segmentation: Segmenting iconic objects or symbols from the background or from other objects
to recognize and track these landmarks automatically. Machine learning techniques, such as
within images to isolate them for further analysis or manipulation.

supervised learning or deep learning, can be used to train models that can detect and track 3. Histogram Representation: For each spatial bin, a histogram is computed based on the
landmarks in video sequences. distribution of landmarks within that bin. The histogram captures the frequency or density of
5. Tracking: Once the landmarks are learned, the tracking algorithm uses them as reference points landmarks in different spatial locations.
to track the object across consecutive frames of the video. Various tracking algorithms, such as 4. Normalization: To make the representation invariant to changes in image scale, orientation, or
optical flow, Kalman filters, or more sophisticated deep learning-based trackers, can be illumination, the histograms may be normalized. Common normalization techniques include L1
employed for this purpose. or L2 normalization, where the histogram values are divided by the sum of all histogram values
6. Adaptation and Refinement: The tracking algorithm may need to adapt to changes in the or by the Euclidean norm, respectively.
object's appearance, scale, or orientation over time. Techniques such as online learning or model 5. Concatenation or Aggregation: The histograms from all spatial bins are typically concatenated
update strategies can be employed to continuously refine the learned landmarks and improve or aggregated into a single feature vector, forming the landmark spatiogram representation for
tracking accuracy. the image or region of interest.
6. Application: Landmark spatiograms can be used as feature descriptors for various computer
By learning and tracking landmarks in videos, it becomes possible to monitor the movement, vision tasks such as object recognition, image retrieval, scene classification, and activity
deformation, or other changes in the object of interest over time, enabling applications such as recognition. They capture information about the spatial arrangement and distribution of
object tracking, motion analysis, activity recognition, and surveillance. landmarks, which can be valuable for discriminating between different visual patterns and
categories.
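A minimal added sketch of tracking learned landmarks across video frames with Shi-Tomasi
corners and pyramidal Lucas-Kanade optical flow; the video file name and detector parameters
are assumptions:

```python
import cv2

cap = cv2.VideoCapture("input_video.mp4")      # placeholder file name
ok, prev_frame = cap.read()
prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)

# Select distinctive landmarks (Shi-Tomasi corners) in the first frame.
landmarks = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100,
                                    qualityLevel=0.3, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Lucas-Kanade optical flow predicts each landmark's new position.
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray,
                                                  landmarks, None)
    landmarks = new_pts[status.flatten() == 1].reshape(-1, 1, 2)
    prev_gray = gray

    for x, y in landmarks.reshape(-1, 2):
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:           # Esc to quit
        break
cap.release()
```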
Landmark Spatiograms

Landmark spatiograms are a representation used in computer vision and image processing to Landmark spatiograms offer a compact and informative representation of the spatial layout of
describe the spatial distribution of landmarks within an image or a region of interest. They are keypoints within an image or region, making them useful for a wide range of applications in
particularly useful in tasks such as object recognition, image classification, and scene computer vision and image analysis.
understanding.

K-Means Clustering
Here's an overview of landmark spatiograms:

1. Definition: A landmark spatiogram represents the spatial layout of landmarks or keypoints K-means clustering is a popular unsupervised machine learning algorithm used for partitioning a

within an image or a specific region. Landmarks are typically distinctive points or features dataset into a predefined number of clusters. The goal of K-means clustering is to group data

detected using keypoint detection algorithms like SIFT (Scale-Invariant Feature Transform), points into clusters in such a way that points within the same cluster are more similar to each

SURF (Speeded-Up Robust Features), or ORB (Oriented FAST and Rotated BRIEF). other compared to points in other clusters. It's widely used in various fields including image

2. Spatial Binning: The image or region of interest is divided into a grid or spatial bins. Each bin processing, data mining, and pattern recognition.

corresponds to a specific area within the image, and the number of landmarks falling into each
Here's how the K-means clustering algorithm works:
bin is counted.

1. Initialization: Choose the number of clusters, K, and randomly initialize K cluster centroids. Overall, K-means clustering is a powerful and widely used technique for partitioning data into
These centroids represent the initial cluster centers. clusters, with applications ranging from customer segmentation and image compression to
2. Assign Data Points to Clusters: For each data point, calculate the distance to each centroid and anomaly detection and document clustering.
assign the point to the cluster whose centroid is closest. This is typically done using a distance
metric such as Euclidean distance.
3. Update Cluster Centroids: After assigning all data points to clusters, update the cluster EM (Expectation-Maximization) clustering
centroids by computing the mean of all data points assigned to each cluster. The new centroid EM (Expectation-Maximization) clustering, also known as Gaussian Mixture Model (GMM)
becomes the center of the cluster. clustering, is a probabilistic model-based clustering algorithm. Like K-means clustering, EM
4. Repeat: Iteratively repeat the assignment and centroid update steps until convergence criteria are clustering is used to partition data into clusters, but it assumes that the data is generated from a
met. Convergence can be determined by checking whether the cluster assignments or centroids mixture of multivariate Gaussian distributions rather than assigning each point to a single cluster.
no longer change significantly between iterations, or by reaching a maximum number of
Here's how the EM clustering algorithm works:
iterations.
5. Finalization: Once convergence is reached, the algorithm outputs the final cluster assignments
1. Initialization: Initialize the parameters of the Gaussian mixture model. This includes the means,
and centroids.
covariances, and mixing coefficients of the Gaussian distributions.
2. Expectation Step (E-step): For each data point, calculate the probability that it belongs to each
Key aspects and considerations of K-means clustering:
of the Gaussian distributions (clusters) based on the current parameter estimates. This step
 Number of Clusters (K): The choice of K is crucial and often requires domain knowledge or computes the posterior probability of each data point belonging to each cluster using Bayes'
exploration through techniques such as the elbow method or silhouette analysis. theorem.

 Initialization: The choice of initial centroids can affect the final clustering result. Common 3. Maximization Step (M-step): Update the parameters of the Gaussian mixture model (means,
initialization methods include random selection of data points or using more sophisticated covariances, and mixing coefficients) based on the current assignments of data points to clusters.
techniques such as K-means++. This step involves maximizing the likelihood of the observed data given the current model
 Convergence: K-means may converge to a local minimum rather than the global minimum, parameters.

depending on the initial centroids. Running the algorithm multiple times with different 4. Repeat: Iteratively perform the E-step and M-step until convergence criteria are met.

initializations can help mitigate this issue. Convergence is typically determined by the change in the log-likelihood of the data or the
 Scalability: K-means can be computationally efficient and scales well to large datasets, but it change in the model parameters between iterations.

may struggle with clusters of varying sizes, non-convex shapes, or data with uneven density. 5. Finalization: Once convergence is reached, the algorithm outputs the final cluster assignments

 Cluster Interpretation: After clustering, it's important to interpret and analyze the resulting and model parameters.

clusters to understand their characteristics and relevance to the data.
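A short added sketch of the K-means workflow on synthetic 2-D points using scikit-learn; the
number of clusters, the generated data, and the random seed are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data: three groups of points around different centers.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2))
                  for c in [(0, 0), (5, 5), (0, 5)]])

# K-means with K = 3; n_init restarts mitigate bad initializations.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(data)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("First ten labels:", labels[:10])
```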


Key aspects and considerations of EM clustering:

 Number of Clusters: Similar to K-means, the number of clusters in EM clustering needs to be UNIT V/ ROBOT VISION
specified. Techniques such as the Bayesian Information Criterion (BIC) or cross-validation can
be used to determine the optimal number of clusters. Basic introduction to Robotic operating System (ROS) - Real and Simulated Robots -

 Initialization: The initialization of the model parameters can influence the convergence and Introduction to OpenCV, Open NI and PCL, installing and testing ROS camera Drivers, ROS to

final clustering result. Common initialization methods include random initialization or using the OpenCV - The cv_bridge Package.

results of K-means clustering as initial estimates.


Basic introduction to Robotic operating System (ROS)
 Cluster Covariances: Unlike K-means, EM clustering can model clusters with different shapes
and orientations by allowing each cluster to have its own covariance matrix. The Robotic Operating System, commonly referred to as ROS, is an open-source
 Soft Assignments: EM clustering provides soft assignments of data points to clusters, meaning framework for building robotic systems. It provides a collection of software libraries and tools
that each data point has a probability distribution over all clusters rather than belonging to a that help developers to create and manage complex robot applications. ROS was initially
single cluster. This makes it more flexible than K-means for capturing complex data developed by Willow Garage and is now maintained by the Open Robotics organization.
distributions.
Here's a basic introduction to ROS:
 Computational Complexity: EM clustering can be more computationally intensive compared to
K-means, especially for high-dimensional data or large datasets, due to the estimation of
Modularity: ROS is designed with a modular architecture, allowing developers to build robot
covariance matrices.
systems by combining pre-existing software components called packages. These packages
 Cluster Interpretation: After clustering, it's important to interpret and analyze the resulting
encapsulate functionalities such as control algorithms, sensor drivers, perception algorithms, and
clusters, which may involve examining the means, covariances, and mixing coefficients of the
communication protocols. This modular approach promotes code reuse and simplifies the
Gaussian distributions.
development process.

Overall, EM clustering is a powerful and flexible algorithm for clustering data with complex 1. Communication Infrastructure: One of the key features of ROS is its communication
distributions, and it has applications in fields such as pattern recognition, image segmentation, infrastructure, which enables seamless data exchange between different components of a robotic
and natural language processing. system. ROS uses a publish-subscribe messaging system known as the ROS Communication
(ROS Comms) system. This system allows nodes (individual software processes) to
communicate with each other by publishing messages to topics and subscribing to messages
from topics.
2. Tools and Utilities: ROS provides a suite of command-line tools and graphical user interfaces
(GUIs) to facilitate development, debugging, and visualization of robotic systems. Some
commonly used tools include roscore (ROS master), rostopic (topic command-line tool), rviz
(3D visualization tool), and rqt (ROS GUI toolset).
3. Support for Multiple Programming Languages: ROS supports multiple programming
languages, including C++, Python, and more recently, JavaScript. This allows developers to
write robot applications using their preferred programming language and seamlessly integrate 4. Installation: Depending on the version of ROS you've chosen, you'll need to install either the
them into the ROS ecosystem. ros-desktop-full package (includes ROS, rqt, rviz, robot-generic libraries, 2D/3D simulators,
4. Community and Ecosystem: ROS has a large and active community of developers, researchers, navigation, and perception) or a lighter package if you need a more minimal installation.
and enthusiasts contributing to its development and maintenance. The ROS community provides 5. Initialize rosdep: Before you can use many ROS tools, you will need to initialize rosdep, which
a wealth of resources, including documentation, tutorials, sample code, and forums, making it enables you to easily install system dependencies for the ROS packages to compile and run.
easier for newcomers to get started with robotics development using ROS. 6. Environment Setup: Set up the ROS environment variables in your shell session. This involves
5. Platform Independence: ROS is platform-independent, meaning it can run on various operating sourcing the setup script provided by ROS for your chosen distribution. You can either do this
systems such as Linux, macOS, and Windows. However, it is primarily developed and optimized manually each time you open a new terminal or add it to your shell configuration file (e.g.,
for Linux distributions such as Ubuntu. .bashrc) for automatic sourcing.
7. Dependencies Installation: You may need to install additional dependencies specific to your
Overall, ROS has become the de facto standard for robotics development due to its flexibility,
robot or project. ROS provides a tool called rosdep to help you install system dependencies for
modularity, and extensive community support. It is widely used in academia, industry, and
the ROS packages you want to compile.
hobbyist projects for building a wide range of robotic applications, including industrial robots,
8. Test Installation: After completing the installation, it's a good idea to verify that ROS is
autonomous vehicles, drones, manipulators, and more.
installed correctly by running some basic commands such as roscore, rostopic list, or launching

Basic Steps In To Install ROS ROS graphical tools like rviz.

Installing ROS involves several steps, primarily depending on the operating system you Real robots
are using. Here are the basic steps for installing ROS on Ubuntu, which is the officially
supported platform for ROS: Real robots refer to physical machines or devices designed and programmed to perform
tasks autonomously or semi-autonomously. These robots can range from simple industrial arms
1. Check System Requirements: Ensure that your system meets the minimum requirements for programmed for repetitive tasks to complex humanoid robots capable of interacting with humans
running ROS. Generally, it's recommended to have a relatively modern computer with a decent and performing various functions. Real robots are used in a wide range of fields including
amount of RAM and disk space. manufacturing, healthcare, agriculture, exploration, defense, and entertainment. They are built
2. Choose ROS Distribution: Decide which version of ROS you want to install. ROS versions are using a combination of mechanical, electrical, and software engineering principles, and they
named after turtles; for example, Melodic Morenia, Noetic Ninjemys, etc. Choose the version often incorporate sensors, actuators, and control systems to perceive their environment and
that best suits your needs, considering factors such as long-term support, compatibility with your execute tasks.
hardware, and availability of packages.
Real robots can be categorized into various types based on their design, functionality, and
3. Set Up Sources: Set up your computer to accept software from packages.ros.org. This involves
application. Here are some common types of real robots:
adding the ROS package repository to your system's list of package sources.
1. Industrial Robots: These are perhaps the most common type of real robot, used extensively in
manufacturing and production environments. They are typically fixed or articulated arms
equipped with tools or end-effectors for tasks such as welding, painting, assembly, and packaging.
2. Mobile Robots: These robots are designed to move around their environment. They can be wheeled, tracked, or legged and are often used for tasks such as material handling, surveillance, exploration, and logistics in indoor or outdoor settings.
3. Service Robots: Service robots are intended to assist humans in various tasks, often in domestic or commercial settings. Examples include robotic vacuum cleaners, assistive robots for the elderly or disabled, and customer service robots in retail environments.
4. Autonomous Vehicles: Autonomous vehicles, such as self-driving cars, trucks, and drones, are robotic systems capable of navigating and operating in their environment without direct human intervention. They utilize sensors, algorithms, and AI to perceive and interpret their surroundings and make decisions accordingly.
5. Humanoid Robots: Humanoid robots are designed to resemble humans in appearance and behavior to varying degrees. They are often used in research, entertainment, and social interaction applications, as well as in environments where human-like dexterity and mobility are required.
6. Medical Robots: Medical robots are used in healthcare settings for tasks such as surgery, rehabilitation, diagnosis, and patient care. Examples include surgical robots for minimally invasive procedures, exoskeletons for physical therapy, and telepresence robots for remote medical consultations.
7. Agricultural Robots: Agricultural robots, also known as agribots or agrobots, are designed to automate tasks in farming and agriculture. They can perform activities such as planting, harvesting, weeding, and monitoring crops and livestock, helping to improve efficiency and productivity in the agricultural sector.

These are just a few examples, and there are many other specialized types of real robots tailored to specific applications and industries.

Simulated robots

Simulated robots are virtual entities created within computer simulations or virtual environments to replicate the behavior and characteristics of real-world robots. These simulations allow researchers, engineers, and developers to test and validate robotic algorithms, control strategies, and designs before deploying them in physical robots. Simulated robots can vary widely in complexity, from simple models with basic functionalities to highly realistic representations with sophisticated behaviors.

Simulated robots offer several advantages:
1. Cost-Effectiveness: Building and testing physical robots can be expensive and time-consuming. Simulated robots provide a cost-effective alternative, allowing researchers to iterate quickly and explore a wide range of design choices without the need for physical hardware.
2. Safety: In some cases, testing certain robotic behaviors or algorithms in the real world can pose safety risks to humans or the environment. Simulated robots provide a safe environment for experimentation, where failures and errors can be analyzed and corrected without real-world consequences.
3. Scalability: Simulated environments can easily scale to accommodate large numbers of robots or complex scenarios that may be impractical or impossible to replicate in the physical world.
4. Control: Simulated robots offer precise control over various environmental factors, such as lighting, terrain, and obstacles, allowing researchers to systematically study the impact of these factors on robot performance.
5. Accessibility: Simulated robotics platforms and environments are often freely available or open-source, making them accessible to a broad community of researchers, students, and hobbyists.

Simulated robots are widely used in robotics research, education, and development across various domains, including industrial automation, autonomous vehicles, robotic manipulation, and swarm robotics. They play a crucial role in advancing our understanding of robotic systems and accelerating the development of new technologies and applications.

OpenCV (Open Source Computer Vision) ROS

OpenCV (Open Source Computer Vision) ROS (Robot Operating System) is a specialized integration of the OpenCV library within the ROS framework. It combines the powerful computer vision capabilities of OpenCV with the flexible robotics framework provided by ROS. Here's an introduction to OpenCV ROS:
1. Purpose: OpenCV ROS extends the capabilities of ROS by providing access to a wide range of computer vision algorithms and tools from the OpenCV library. It allows roboticists and developers to leverage computer vision techniques seamlessly within ROS-based robotic systems.
2. Integration: OpenCV ROS integrates the OpenCV library with the ROS ecosystem, enabling ROS nodes to utilize OpenCV functions for various computer vision tasks. This integration simplifies the development of vision-based robotic applications within the ROS framework.
3. ROS Nodes: OpenCV ROS provides ROS nodes that encapsulate common computer vision functionalities, such as image processing, object detection, feature extraction, and camera calibration. These nodes can be easily incorporated into ROS-based robot systems, facilitating tasks such as navigation, manipulation, perception, and localization.
4. Image Transport: OpenCV ROS includes components for efficient image transport between ROS nodes. It provides interfaces for subscribing to and publishing image messages, enabling seamless communication and data exchange between different parts of a robotic system.
5. Compatibility: OpenCV ROS is compatible with various versions of ROS and OpenCV, allowing developers to choose the combination that best suits their requirements. It supports both the C++ and Python programming languages, enabling developers to write vision-based ROS nodes in their preferred language.
6. Community and Resources: Like ROS and OpenCV individually, OpenCV ROS benefits from a large and active community of developers and users. It offers extensive documentation, tutorials, and examples to help developers get started with using computer vision in ROS-based robotics projects.
7. Applications: OpenCV ROS finds applications in a wide range of robotic systems and domains, including autonomous vehicles, industrial automation, service robots, agricultural robotics, surveillance systems, and more. It enables robots to perceive and interpret their environment using vision sensors, facilitating tasks such as object recognition, localization, navigation, and manipulation.

In summary, OpenCV ROS combines the computer vision capabilities of OpenCV with the flexibility and scalability of the ROS framework, empowering developers to create advanced robotic systems with integrated vision capabilities. It serves as a valuable tool for building vision-based applications in robotics and automation.

Within the ROS (Robot Operating System) ecosystem, OpenCV (Open Source Computer Vision) is often integrated in various ways to facilitate computer vision tasks for robotic applications. Here are some common types or aspects of OpenCV ROS integration:

1. ROS Wrapper Nodes for OpenCV Functions: These nodes provide a ROS interface to specific OpenCV functions or algorithms. For example, there might be a ROS node that subscribes to an image topic, performs edge detection using OpenCV, and publishes the result as a new image topic (a minimal sketch of such a node is shown after this list).
2. Image Processing Nodes: These nodes leverage OpenCV for various image processing tasks within ROS. This can include operations such as blurring, thresholding, color conversion, feature detection, and more.
3. Object Detection and Recognition Nodes: These nodes use OpenCV's object detection and recognition algorithms within ROS environments. Examples include face detection, object tracking, and template matching.
4. Camera Calibration Nodes: OpenCV provides robust camera calibration tools, which are often integrated into ROS for calibrating cameras used in robotic systems. These nodes help correct for lens distortion and determine camera intrinsic and extrinsic parameters.
5. Feature Extraction and Matching Nodes: OpenCV's feature extraction and matching algorithms are commonly used in ROS for tasks such as visual odometry, SLAM (Simultaneous Localization and Mapping), and point cloud registration.
6. ROS Image Transport with OpenCV: ROS provides an image transport mechanism for efficient image data communication between nodes. OpenCV can be used to encode, decode, and process images transported through ROS, ensuring compatibility and interoperability.
7. Custom ROS Messages for OpenCV Data: In some cases, custom ROS message types are defined to represent data structures used by OpenCV (e.g., keypoints, descriptors). This allows for seamless integration of OpenCV data with other ROS components.
8. Integration with ROS Visualization Tools: OpenCV results can be visualized using ROS visualization tools such as RViz (ROS Visualization) or rqt_image_view, providing real-time feedback and debugging capabilities.

These are just a few examples of how OpenCV is integrated into ROS for various computer vision tasks in robotic applications. Depending on the specific requirements of a project, developers may implement additional types of integration to leverage the full capabilities of both OpenCV and ROS.
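To make the wrapper-node pattern from point 1 above concrete, the following minimal C++ sketch subscribes to an image topic, applies OpenCV's Canny edge detector, and republishes the result as a new image topic. It is an illustrative sketch rather than code from any particular package; the topic names image_raw and edges and the Canny thresholds (50, 150) are assumptions you would adapt to your own system.

#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <cv_bridge/cv_bridge.h>
#include <sensor_msgs/image_encodings.h>
#include <opencv2/imgproc/imgproc.hpp>

image_transport::Publisher edge_pub;  // publishes the processed (edge) image

void imageCallback(const sensor_msgs::ImageConstPtr& msg)
{
  try {
    // Convert the incoming ROS image message to an OpenCV grayscale image
    cv::Mat gray = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::MONO8)->image;

    // Run Canny edge detection (the thresholds here are illustrative)
    cv::Mat edges;
    cv::Canny(gray, edges, 50, 150);

    // Convert back to a ROS image message, reusing the original header, and publish
    cv_bridge::CvImage out(msg->header, sensor_msgs::image_encodings::MONO8, edges);
    edge_pub.publish(out.toImageMsg());
  } catch (const cv_bridge::Exception& e) {
    ROS_ERROR("cv_bridge exception: %s", e.what());
  }
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "edge_detector");
  ros::NodeHandle nh;
  image_transport::ImageTransport it(nh);

  image_transport::Subscriber sub = it.subscribe("image_raw", 1, imageCallback);
  edge_pub = it.advertise("edges", 1);

  ros::spin();
  return 0;
}

Assuming a standard catkin workspace, the package containing this node would declare dependencies on roscpp, image_transport, cv_bridge, and sensor_msgs (plus OpenCV) in its package.xml and CMakeLists.txt.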
OpenNI:
• OpenNI is an open-source framework primarily used for developing applications that involve natural interaction, such as gesture recognition, motion tracking, and 3D sensing.
• Originally developed by PrimeSense, it provides a set of APIs and tools that enable developers to work with depth sensors, such as Microsoft Kinect, Asus Xtion, and other similar devices.
• OpenNI offers functionalities for depth sensing, skeleton tracking, hand gestures, and scene analysis, making it suitable for a wide range of interactive applications.

Integration of OpenNI and ROS:
• OpenNI can be integrated into ROS-based robotic systems to provide depth sensing and natural interaction capabilities.
• This integration allows ROS developers to leverage the features provided by OpenNI for tasks such as environment perception, human-robot interaction, and object recognition.
• ROS provides packages and libraries for interfacing with various sensors and devices, including those supported by OpenNI. For instance, there are ROS packages like openni_camera and openni_tracker that enable ROS nodes to communicate with depth sensors and perform tasks like capturing depth images, tracking skeletons, and detecting gestures.
• By integrating OpenNI with ROS, developers can create sophisticated robotic systems that can perceive their environment in 3D, interact with humans through gestures, and perform complex tasks autonomously.
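As a small, illustrative sketch of how a ROS node can consume data from an OpenNI-based driver, the node below subscribes to a depth image topic and reports the depth at the image centre. The topic name /camera/depth/image_raw and the 16-bit millimetre depth encoding are assumptions that depend on how the OpenNI driver is launched and configured, so adjust them to your setup.

#include <ros/ros.h>
#include <sensor_msgs/Image.h>
#include <cv_bridge/cv_bridge.h>
#include <sensor_msgs/image_encodings.h>
#include <cstdint>

// Print the depth value at the centre pixel of each incoming depth frame.
void depthCallback(const sensor_msgs::ImageConstPtr& msg)
{
  try {
    // Many OpenNI-based drivers publish depth as 16-bit unsigned integers in millimetres.
    cv::Mat depth = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::TYPE_16UC1)->image;
    uint16_t d = depth.at<uint16_t>(depth.rows / 2, depth.cols / 2);
    ROS_INFO("Depth at image centre: %u mm", static_cast<unsigned int>(d));
  } catch (const cv_bridge::Exception& e) {
    ROS_ERROR("cv_bridge exception: %s", e.what());
  }
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "depth_probe");
  ros::NodeHandle nh;
  // The topic name is an assumption; match it to your OpenNI launch configuration.
  ros::Subscriber sub = nh.subscribe("/camera/depth/image_raw", 1, depthCallback);
  ros::spin();
  return 0;
}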

In summary, OpenNI and ROS are two powerful frameworks that serve different purposes in the field of robotics and computer vision. Integrating OpenNI into ROS-based systems allows developers to harness the capabilities of both frameworks to build advanced robotic applications with natural interaction capabilities.

Point Cloud Library (PCL):
• PCL is an open-source library for 2D/3D image and point cloud processing. It offers numerous algorithms for filtering, feature estimation, registration, segmentation, and surface reconstruction from point cloud data.
• Developed by a community of researchers and engineers, PCL provides a comprehensive suite of tools for working with point cloud data captured from various 3D sensors such as LiDAR, depth cameras, and stereo cameras.
• PCL is widely used in robotics, computer vision, augmented reality, and other fields for tasks like environment perception, object recognition, scene understanding, and robot localization and mapping.

Integration of PCL with ROS:
• The integration of PCL with ROS allows developers to leverage the powerful point cloud processing capabilities provided by PCL within ROS-based robotic systems.
• ROS provides packages and libraries for interfacing with various sensors and devices, and there are specific packages available for integrating PCL with ROS.
• ROS packages such as pcl_ros provide convenient interfaces for converting point cloud messages between ROS and PCL formats, enabling seamless integration of PCL algorithms into ROS nodes.
• By integrating PCL with ROS, developers can perform advanced point cloud processing tasks such as filtering, feature extraction, object segmentation, and 3D reconstruction within ROS-based robotic systems.
• This integration enables the development of sophisticated robotic applications that leverage point cloud data for tasks such as environment mapping, object detection and tracking, obstacle avoidance, and localization and navigation in complex 3D environments.
In conclusion, the integration of PCL with ROS combines the powerful point cloud processing
capabilities of PCL with the flexible robotics development environment provided by ROS,
enabling the creation of advanced robotic systems with enhanced perception and spatial
awareness.
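To illustrate the pcl_ros-style workflow described above, here is a minimal sketch of a ROS node that converts an incoming sensor_msgs/PointCloud2 message to a PCL cloud, downsamples it with a VoxelGrid filter, and republishes the result. The topic names input_cloud and filtered_cloud and the 1 cm leaf size are illustrative assumptions, not values taken from the original text.

#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>
#include <pcl_conversions/pcl_conversions.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/voxel_grid.h>

ros::Publisher cloud_pub;  // publishes the downsampled cloud

void cloudCallback(const sensor_msgs::PointCloud2ConstPtr& msg)
{
  // Convert the ROS message to a PCL point cloud
  pcl::PointCloud<pcl::PointXYZ>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZ>);
  pcl::fromROSMsg(*msg, *cloud);

  // Downsample with a voxel grid (1 cm leaf size is an illustrative choice)
  pcl::VoxelGrid<pcl::PointXYZ> voxel;
  voxel.setInputCloud(cloud);
  voxel.setLeafSize(0.01f, 0.01f, 0.01f);
  pcl::PointCloud<pcl::PointXYZ> filtered;
  voxel.filter(filtered);

  // Convert back to a ROS message, keeping the original frame and timestamp
  sensor_msgs::PointCloud2 out;
  pcl::toROSMsg(filtered, out);
  out.header = msg->header;
  cloud_pub.publish(out);
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "voxel_filter_node");
  ros::NodeHandle nh;
  ros::Subscriber sub = nh.subscribe("input_cloud", 1, cloudCallback);
  cloud_pub = nh.advertise<sensor_msgs::PointCloud2>("filtered_cloud", 1);
  ros::spin();
  return 0;
}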

Installing and testing ROS camera Drivers

1. Install ROS:
If you haven't already installed ROS, you'll need to do so. Follow the instructions on the ROS website for your specific operating system and ROS distribution: ROS Installation Instructions.

2. Install camera driver dependencies:
Different cameras may require different dependencies. Check the documentation provided with your camera or the ROS package for specific instructions. Typically, you might need to install libraries such as libusb, libudev, libv4l, etc.

3. Install ROS camera driver package:
ROS provides several camera drivers as ROS packages. You'll need to install the appropriate package for your camera. Some common camera driver packages include:
• usb_cam for USB cameras.
• camera_umd for various camera drivers.
• uvc_camera for USB Video Class (UVC) cameras.
• gscam for GStreamer-based cameras.
You can usually install these packages using apt-get if they're available in your ROS distribution's package repositories. For example:

sudo apt-get install ros-<distro>-usb-cam

Replace <distro> with your ROS distribution name (e.g., melodic, noetic).

4. Configure camera driver:
Once the driver package is installed, you may need to configure it. This could involve specifying the camera device ID, resolution, frame rate, etc. Refer to the documentation provided with the camera driver package for instructions on configuration.

5. Launch camera driver node:
After installation and configuration, you can launch the camera driver node. This is typically done using ROS launch files. For example, assuming the usb_cam package:

roslaunch usb_cam usb_cam-test.launch

Replace usb_cam-test.launch with the appropriate launch file for your camera driver.

6. Test camera:
Once the camera driver node is running, you can test the camera by subscribing to its image topic and viewing the images. You can use tools like rviz, image_view, or write your own subscriber node to visualize the camera feed. For example, using image_view:

rosrun image_view image_view image:=/usb_cam/image_raw

Replace /usb_cam/image_raw with the appropriate image topic published by your camera driver.
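If you choose to write your own subscriber node for step 6, a minimal sketch could look like the following. The topic name /usb_cam/image_raw is an assumption that matches the usb_cam example above; substitute the topic actually published by your camera driver.

#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <sensor_msgs/Image.h>

// Log the resolution and encoding of every frame received from the camera driver.
void imageCallback(const sensor_msgs::ImageConstPtr& msg)
{
  ROS_INFO("Received %ux%u image, encoding: %s",
           msg->width, msg->height, msg->encoding.c_str());
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "camera_check");
  ros::NodeHandle nh;
  image_transport::ImageTransport it(nh);
  // Topic name is an assumption; adjust it to the topic published by your camera driver.
  image_transport::Subscriber sub = it.subscribe("/usb_cam/image_raw", 1, imageCallback);
  ros::spin();
  return 0;
}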
7. Troubleshooting:
If you encounter any issues during installation or testing, refer to the documentation provided with the camera driver package. You can also search online forums and ROS community resources for help with troubleshooting specific issues.

By following these steps, you should be able to install and test ROS camera drivers for your specific camera. Remember that the exact steps may vary depending on your camera model and ROS distribution.

ROS to OpenCV - The cv_bridge Package.

The cv_bridge package in ROS (Robot Operating System) facilitates the conversion of ROS image messages to OpenCV images and vice versa. This is particularly useful when you're working with ROS and need to process images using OpenCV libraries.
Here's how you can use the cv_bridge package:

Installation: Make sure you have ROS installed on your system. You can install cv_bridge using the ROS package manager (apt for Ubuntu):

sudo apt-get install ros-<distro>-cv-bridge

Include Headers: In your ROS node code where you want to use cv_bridge, include the necessary headers:

#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <cv_bridge/cv_bridge.h>
#include <sensor_msgs/Image.h>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

Replace image_transport/image_transport.h and sensor_msgs/Image.h with the appropriate paths based on your ROS distribution.

Subscribing to Image Topics: Suppose you have a ROS node that subscribes to an image topic. You can define a callback function to handle incoming images:

void imageCallback(const sensor_msgs::ImageConstPtr& msg) {
  try {
    cv::Mat img = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::BGR8)->image;
    // Process the image using OpenCV
    // Example: cv::imshow("Received Image", img);
    // Example: cv::waitKey(1);
  } catch (cv_bridge::Exception& e) {
    ROS_ERROR("cv_bridge exception: %s", e.what());
  }
}

Converting ROS Image Messages to OpenCV Images: Inside the callback function, the cv_bridge::toCvCopy() function is used to convert the ROS image message to an OpenCV image (cv::Mat). The second argument specifies the desired encoding (e.g., sensor_msgs::image_encodings::BGR8).

Publishing OpenCV Images as ROS Image Messages: If you want to publish OpenCV images as ROS image messages, you can use cv_bridge::CvImage:

cv_bridge::CvImage out_msg;
out_msg.header = msg->header; // Same timestamp and tf frame as input image
out_msg.encoding = sensor_msgs::image_encodings::BGR8; // or whatever your image encoding is
out_msg.image = your_cv_mat_image; // Your cv::Mat
pub.publish(out_msg.toImageMsg());

Compile and Run: Make sure to compile your ROS package after making changes to your code. Then, run your ROS node as usual.

By using the cv_bridge package, you can seamlessly integrate ROS image messages with OpenCV image processing capabilities, enabling sophisticated robotic vision applications within the ROS ecosystem.
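Putting the snippets above together, a complete minimal cv_bridge node might look like the sketch below. It subscribes to an input image, converts it with cv_bridge, applies an OpenCV operation (a Gaussian blur, chosen purely as a placeholder), and republishes the processed image. The node name, topic names, and kernel size are assumptions for the example, not part of the original text.

#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <cv_bridge/cv_bridge.h>
#include <sensor_msgs/image_encodings.h>
#include <opencv2/imgproc/imgproc.hpp>

image_transport::Publisher pub;  // publisher used inside the callback

void imageCallback(const sensor_msgs::ImageConstPtr& msg)
{
  try {
    // ROS image -> OpenCV BGR image
    cv::Mat img = cv_bridge::toCvCopy(msg, sensor_msgs::image_encodings::BGR8)->image;

    // Any OpenCV processing can go here; a blur is used only as a placeholder
    cv::Mat blurred;
    cv::GaussianBlur(img, blurred, cv::Size(5, 5), 0);

    // OpenCV image -> ROS image, preserving the original header
    cv_bridge::CvImage out_msg(msg->header, sensor_msgs::image_encodings::BGR8, blurred);
    pub.publish(out_msg.toImageMsg());
  } catch (cv_bridge::Exception& e) {
    ROS_ERROR("cv_bridge exception: %s", e.what());
  }
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "cv_bridge_example");
  ros::NodeHandle nh;
  image_transport::ImageTransport it(nh);
  image_transport::Subscriber sub = it.subscribe("image_raw", 1, imageCallback);
  pub = it.advertise("image_processed", 1);
  ros::spin();
  return 0;
}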
U20RA601/ROBOTICS VISION SYSTEM
PART-A
UNIT-1
VISION SYSTEM
1. Define Image?
An Image may be defined as a two-dimensional function f(x,y), where x and y are spatial (plane) coordinates, and the amplitude of f at any pair of coordinates (x,y) is called the intensity or gray level of the image at that point. When x, y and the amplitude values of f are all finite, discrete quantities, we call the image a Digital Image.
2. Define Image Sampling?
Digitization of the spatial coordinates (x,y) is called Image Sampling. To be suitable for computer processing, an image function f(x,y) must be digitized both spatially and in magnitude.
3. Define Quantization?
Digitizing the amplitude values is called Quantization. The quality of a digital image is determined to a large degree by the number of samples and discrete gray levels used in sampling and quantization.
4. What is Dynamic Range?
The range of values spanned by the gray scale is called the dynamic range of an image. An image will have high contrast if the dynamic range is high, and a dull, washed-out gray look if the dynamic range is low.
5. Define Mach band effect?
The spatial interaction of luminance from an object and its surround creates a phenomenon called the Mach band effect.
6. Define Brightness?
Brightness of an object is the perceived luminance of its surround. Two objects with different surroundings could have identical luminance but different brightness.
7. Define Tapered Quantization?
If gray levels in a certain range occur frequently while others occur rarely, the quantization levels are finely spaced in this range and coarsely spaced outside of it. This method is sometimes called Tapered Quantization.
8. What do you mean by Gray level?
Gray level refers to a scalar measure of intensity that ranges from black to grays and finally to white.
9. What do you mean by Color model?
A Color model is a specification of a 3D coordinate system and a subspace within that system where each color is represented by a single point.
10. List the hardware oriented color models?
1. RGB model
2. CMY model Brightness adaptation means the human visual system can operate only from scotopic to glare limit. It
3. YIQ model cannot operate over the range simultaneously. It accomplishes this large variation by changes in its
4. HSI model overall intensity.
11. What is Hue of saturation? 19. Define weber ratio
Hue is a color attribute that describes a pure color where saturation gives a measure of the degree to The ratio of increment of illumination to background of illumination is called as weber ratio.
which a pure color is diluted by white light. ฀ Ic/I. If the ratio is small, then small percentage of change in intensity is needed (ie) good brightness
12. List the applications of color models? adaptation. If the ratio is large, then large percentage of change in intensity is needed (ie) poor brightness

1. RGB model--- used for color monitor & color video camera adaptation.

2. CMY model---used for color printing 20. What is meant by machband effect?
Machband effect means the intensity of the stripes is constant. Therefore it preserves the brightness
3. HIS model --- used for color image processing
pattern near the boundaries, these bands are called as machband effect.
4. YIQ model -- used for color picture transmission
UNIT II/ VISION ALGORITHMS
13. What is Chromatic Adoption?
21. What is simultaneous contrast?
The hue of a perceived color depends on the adoption of the viewer. For example, the American Flag will
The region reserved brightness not depends on its intensity but also on its background. All centre square
not immediately appear red, white, and blue of the viewer has been subjected to high intensity red light
have same intensity. However they appear to the eye to become darker as the background becomes
before viewing the flag. The color of the flag will appear to shift in hue toward the red component cyan.
lighter.
14. Define Resolution?
22. What is meant by illumination and reflectance?
Resolution is defined as the smallest number of discernible detail in an image. Spatial resolution is the
Illumination is the amount of source light incident on the scene. It is represented as i(x, y). Reflectance is
smallest discernible detail in an image and gray level resolution refers to the smallest discernible change
the amount of light reflected by the object in the scene. It is represented by r(x, y).
is gray level.
23. Define sampling and quantization
15. Define of KL Transform?
Sampling means digitizing the co-ordinate value (x, y). Quantization means digitizing the amplitude
KL Transform is an optimal in the sense that it minimizes the mean square error between the vectors X
value.
and their approximations X^. Due to this idea of using the Eigenvectors corresponding to largest Eigen
24.What are the types of noise models
values. It is also known as principal component transform.
• Gaussian noise
16. Explain Mask or Kernels?
• Rayleigh noise
A Mask is a small two-dimensional array, in which the value of the mask coefficient determines the
• Erlang noise
nature of the process, such as image sharpening.
• Exponential noise
17. What are the steps involved in DIP?
• Uniform noise
1. Image Acquisition
• Impulse noise
2. Preprocessing
25.Define histogram.
3. Segmentation
A histogram of a digital image with gray levels in the range [0,L-1] is a discrete function given by H(rk) =
4. Representation and Description
nk
5. Recognition and Interpretation Where rk Kth gray level
18. Define subjective brightness and brightness adaptation? nk Number of pixels in the image having gray level rk
Subjective brightness means intensity as preserved by the human visual system. That is , histogram is a plot having gray level values in the axis and the number of pixels with the
corresponding gray levels in the y-axis.
26. What is white noise?
If all the frequencies of a function are in equal proportions, its Fourier spectrum is said to be constant. If It is useful to specify the shape of the histogram that we wish the processed image to have.
the Fourier spectrum of a noise is constant, it is called White noise. 34.What is histogram equalization?
27. What is meant by spatial averaging? A technique used to obtain a uniform histogram is known as histogram equalization. Pr(rk) Vs rk is called
In spatial or image averaging each pixel in an image is repalced by a weighted average of its as histogram equalization, where
neighborhood pixels. This process is used to reduce the noise content in an image. Pr(rk)= nk /n
28.What is directional smoothing? ฀ nk - no of pixels in the image having gray level rk
directional smoothing is the process used to protect the edges from distortions in the form of blurring ฀ n- total no . of pixels.
while smoothing the images. It is done by using directional averaging filters. Conditions:
29.What is a Median filter? T(r) is single valued and monotonically increasing in interval 0≤r≤1, 0≤ T(r) ≤ 1
The median filter replaces the value of a pixel by the median of the gray levels in the neighborhood of 35.What is salt and pepper noise?
that pixel The impulse noise has the PDF given by
30.Name the categories of Image Enhancement and explain? Here when Pa and pb are not equal to zero, but both are approximately equal, the impulse noise values in
The categories of Image Enhancement are an image look like salt and pepper granuales spreaded over the image. Therfore, it is called the salt and
• Spatial domain
• Frequency domain
Spatial domain: It refers to the image plane, itself and it is based on direct manipulation of pixels of an Restoration attempts to reconstruct or recover an image that has been degraded by using a clear
image. knowledge of the degrading phenomenon.
Frequency domain techniques are based on modifying the Fourier transform of an image. 37.What are the two properties in Linear Operator?
31Define box filter. ฀ Additivity
A spatial averaging filter in which all the coefficients are equal is called as a box filter. An example 3×3 ฀ Homogenity
filter is, 38.Explain additivity property in Linear Operator?
H[f1(x,y)+f2(x,y)]=H[f1(x,y)]+H[f2(x,y)]
The additive property says that if H is the linear operator,the response to a sum of two is equal to the sum
of the two responses.
39.How a degradation process is modeled?
A system operator H, which together with an additive white noise term _(x,y) a operates on an input
image f(x,y) to produce a degraded image g(x,y).
40.Explain homogenity property in Linear Operator?

32.What is homomorphic filtering? H[k1f1(x,y)]=k1 H[f1(x,y)]

The filter which controls both high frequency and low frequency components are called as homomorphic The homogeneity property says that,the response to a constant multiple of any input is equal to the

filters. response to that input multiplied by the same constant.

The equation for homomorphic filter can be derived from the illumination reflectance model given by the UNIT III/ OBJECT RECOGNITION

equation. 41.Define Gray-level interpolation?

f(x, y)= i(x, y) r(x, y) Gray-level interpolation deals with the assignment of gray levels to pixels in the spatially transformed

33.What is histogram specification? image

The method used to generate a processed image that has a specified histogram is called histogram 42.Why the restoration is called as unconstrained restoration?

matching (or) histogram specification.


In the absence of any knowledge about the noise ‘n’, a meaningful criterion function is to seek an f^ such The approach for linking edge points is to analyze the characteristics of pixels in a small neighborhood
that H f^ approximates of in a least square sense by assuming the noise term is as small as possible. (3x3 or 5x5) about every point (x, y) in an image that has undergone edge detection. All points that are
Where H = system operator. f^ = estimated input image. g = degraded image. similar are linked, forming a boundary of pixels that share some common properties.
43.What are the three methods of estimating the degradation function? 51. What are the two properties used for establishing similarity of edge pixels?
฀ Observation (1) The strength of the response of the gradient operator used to produce the edge pixel. (2) The direction
฀ Experimentation of the gradient.

฀ Mathematical modeling. 52. What is edge?

An edge is a set of connected pixels that lie on the boundary between two regions edges are more closely
44.What are the types of noise models?
modeled as having a ramp like profile. The slope of the ramp is inversely proportional to the degree of
฀ Guassian noise
blurring in the edge.
฀ Rayleigh noise
฀ Erlang noise 53. Give the properties of the second derivative around an edge?
* The sign of the second derivative can be used to determine whether an edge pixel lies on the dark or
฀ Exponential noise
light side of an edge. * It produces two values for every edge in an image. * An imaginary straight line
฀ Uniform noise
joining the extreme positive and negative values of the second derivative would cross zero near the
฀ Impulse noise
midpoint of the edge.
45.What is inverse filtering?
54. What is meant by object point and background point?
The simplest approach to restoration is direct inverse filtering, an estimate F^(u,v) of the transform of the
To execute the objects from the background is to select a threshold T that separates these modes. Then
original image simply by dividing the transform of the degraded image G^(u,v) by the degradation
any point (x, y) for which f(x, y)>T is called an object point. Otherwise the point is called background
function.
point.
F^ (u,v) = G^(u,v)/H(u,v)
55. What is global, Local and dynamic or adaptive threshold?
46.What is segmentation?
When Threshold T depends only on f(x,y) then the threshold is called global . If T depends both on f(x,y)
Segmentation subdivides on image in to its constitute regions or objects. The level to which the
and p(x,y) is called local. If T depends on the spatial coordinates x and y the threshold is called dynamic
subdivides is carried depends on the problem being solved .That is segmentation should when the objects
or adaptive where f(x,y) is the original image. 56. Define region growing?
of interest in application have been isolated.
Region growing is a procedure that groups pixels or subregions in to layer regions based on predefined
47. Write the applications of segmentation.
criteria. The basic approach is to start with a set of seed points and from there grow regions by appending
* Detection of isolated points.
to each seed these neighbouring pixels that have properties similar to the seed.
* Detection of lines and edges in an image.
58. Specify the steps involved in splitting and merging?
48. What are the three types of discontinuity in digital image?
Split into 4 disjoint quadrants any region Ri for which P(Ri)=FALSE. Merge any adjacent regions Rj and
Points,
Rk for which P(RjURk)=TRUE. Stop when no further merging or splitting is positive.
Lines and edges.
59. What is meant by markers?
49. How the derivatives are obtained in edge detection during formulation?
An approach used to control over segmentation is based on markers. marker is a connected component
The first derivative at any point in an image is obtained by using the magnitude of the gradient at that
belonging to an image. We have internal markers, associated with objects of interest and external markers
point. Similarly the second derivatives are obtained by using the laplacian.
associated with background.
50. Write about linking edge points.
60. What is thresholding? Give its types.
Thresholding is the process used to separate the object present in an image from its background. magnitude of this vector (gradient) and is denoted as Δf The direction of gradient vector also is an
Depending on the object present, thresholding is divided into 1. Single thresholding, 2. Multilevel important quantity. 70. Give the Robert’s cross- Gradient operators Robert’s cross- Gradient operators are
thresholding used to calculate the first order derivative at any point in the image. Two marks used for this purpose are -
UNIT IV/ APPLICATIONS 1 0 0 1 These two masks are known as Robert’s mask or Robert’s cross- gradiant operators
61. What is meant by region growing? 71. . What is computer vision?
Region growing is the process of grouping pixels or subregions into larger regions based on predefined Computer Vision uses images and videos to understand a real-world scene. Just like Humans use eyes for
similarity criteria. The similarity criteria may be based on Intensity values, texture, color, size capturing light, receptors in the brain for accessing it, and the visual cortex for processing it. Similarly, a
62. . Define catchment basin. computer understands images, videos, or a real-world scenario through machine learning algorithms and
In watershed segmentation technique, for a particular regional minimum, the set of points at which if a AI self-learning programming.
drop is placed, it would fall with certainty to a single minimum is called catchment basin of watershed of 72. What are machine learning algorithms available in OpenCV?
that minimum. An OpenCV is open for all and free cross-platform where you get a library of real-time computer vision
63. Why the laplacian cannot be used in its original form for edge detection? programming functions. It is developed by Intel and is mostly written in the C++ programming language.
The laplacian cannot be used for edge detection in its original form, because ฀ It is highly sensitive to A JavaScript version is also available as OpenCV.js which is built for web platforms.
noise ฀ Its magnitude produces double edges which makes the segmentation difficult. ฀ It is unable to Some machine learning libraries available in OpenCV are:
detect the direction of edges. Artificial Neural Networks, Random Forest, Support Vector Machine, Decision Tree Learning,
64. What is an isolated poit? How it is detected ? Convolution Neural Networks, Boosting and Gradient Boosting Trees, Expectation-Maximization
An isolated point is a point whose gray level is entirely different from its background and which is located algorithm, Naive Bayes classifier, K-nearest neighboring algorithm.
in a homogeneous or nearly homogeneous area. An isolated point is said to be detected if the response, R 73. How many types of image filters in OpenCV?
of a mask at the location on which it is centered is |R|>= T Where T – positive threshold Image filters used in OpenCV are: Bilateral Filter, Blur, Box Filter, Dilate, Build Pyramid, Erode,
65. What is hough transform? Filter2D, Gaussian Blur, Deriv, and Gabor Kernels, Laplacian, and Median Blur.
Assembling the edge pixels may not result in a meaningful edges in some cases. Because, there may be 74. What are face recognition algorithms?
breaks in the edges due to non uniform illumination and some other factors. Hough transform is a method The face recognition algorithm is basically the computer application that is used for tracking, detecting,
used to link edge pixels into meaningful edges by onsidering the global relationship between pixels. identifying, or verifying the human faces simply from the image or the video that has been captured using
66. What is meant by water shed lines? the digital camera.
The points at which water would be equally likely to fall to more than one single minimum (i.e. satisfying Some popular but evolving algorithms are:
condition (3)) will form crest lines on the topographic surface. Such line are called Divide lines or a ฀ PCA- Principal Component Analysis
watershed lines. ฀ LBPH- Local Binary Pattern Histograms
67. How to select the seed points in region growing? ฀ k-NN (nearest neighbors) algorithm
When no prior information is available a set of the same properties are compute d for every pixel. If the ฀ Eigen’s faces
result of these computations shows clusters or groups of values then the pixels with properties near the ฀ Fisher faces
centroid of the cluster can be selected as seed points. ฀ SIFT- Scale Invariant Feature Transform
68.What is the advantage of using sobel operator? ฀ SURF- Speed Up Robust Features
Sobel operators have the advantage of providing both the differencing and a smoothing effect. Because 75. What are languages supported by Computer vision?
derivatives enhance noise, the smoothing effect is particularly attractive feature of the sobel operators. LISP, Prolog, C/C++, Java, and Python.
69. Explain about gradient operator 76. What Do You Mean By Color Model?
The gradient of an image f(x, y) at location (x, y) is the vector -The gradient vector points are in the
direction of maximum rate of change of f at (x, y) - In edge detection an important quantity is the
A Color Model is a coordinate system and a subset of visible colors. With "Color Model" we create a
Q: What is PCL in robotics?
whole range of colors from a limited set of primary colors like RGB (Red Green Blue). Color Models are
A: PCL (Point Cloud Library) is a large-scale, open-source project for 2D/3D image and point cloud
of two types: Additive and Subtractive. processing.
77. What Is Dynamic Range?
Q: How can you install a ROS camera driver?
Dynamic Range is a ratio of small and large values that is assumed by a certain quantity. It is used in A: You can install a ROS camera driver using sudo apt-get install ros-<distro>-usb-cam (replace <distro>
with your ROS version).
signals, photography, sounds, and light. From a photographic point of view, it is a ratio of minimum and
maximum measuring light intensity or the lightest and darkest regions also called color contrast. Q: How do you test a ROS camera driver?
A: By running camera nodes and using rqt_image_view or rviz to visualize the image topic like
78. Define Digital Image?
/camera/image_raw.
A digital image is an image that is comprised of the elements of the picture, they are also admitted as
Q: What is the function of cv_bridge in ROS?
pixels. Each pixel is with the finite and the discrete numbers of the numerical representation which belong
A: cv_bridge is a ROS package that allows conversion between ROS image messages and OpenCV image
to its intensity and the gray level which is considered as it’s output from the functions of the two- formats.
dimensions that is feed by the input of spatial coordinates those are denoted by the x-axis and y-axis.
Q: Mention one command to view images from a ROS camera topic using OpenCV.
79. What Is Meant By Mach Band Effect? A: You can use Python with cv_bridge and OpenCV to subscribe to an image topic and display it using
cv2.imshow().
Mach Band Effect is an optical illusion. It emphasizes the differentiation between edges of the somewhat
varying shades of grey when they reach each other. The extreme left side is dark grey and it converts into
Q: What are ROS nodes?
the lighter shades as they move to the right side of the plate. A: ROS nodes are individual processes that perform computation and communicate with each other using
topics, services, or actions.
80. What Are Sampling And Quantization?
We use sampling and Quantization to convert analog images to digital images. An image has two things. Q: What is a ROS topic?
A: A topic in ROS is a named bus over which nodes exchange messages using a publish/subscribe model.
1. Coordinates
Digitizing of coordinates is called Sampling. That is, converting the coordinates of the analog images to Q: What is the role of a launch file in ROS?
A: A launch file in ROS automates the startup of multiple nodes and sets parameters using XML syntax.
the digital images.
2. Intensity/Amplitude Q: What is RViz used for in ROS?
A: RViz is a 3D visualization tool for ROS used to visualize sensor data, robot models, and planning.
Digitizing of Amplitude or Intensity is called Quantization. That is converting the Amplitude or Intensity
of an analog image to a digital image. Q: How does simulation help in robotics development?
A: Simulation allows safe and cost-effective testing of algorithms and behaviors before deploying them
UNIT V/ ROBOTIC OPERATING SYSTEM
on real hardware.
Q: What is the Robot Operating System (ROS)?
A: ROS is an open-source framework that provides libraries and tools to help developers build robot Q: What sensor types are supported by OpenNI?
applications. A: OpenNI supports RGB-D sensors such as Microsoft Kinect, Asus Xtion, and other depth cameras.

Q: Differentiate between real and simulated robots. Q: What is a point cloud?


A: Real robots interact with the physical world using hardware, while simulated robots operate in a virtual A: A point cloud is a collection of data points in 3D space, usually produced by 3D scanners or depth
environment for testing and development. cameras.

Q: Name any two simulators commonly used with ROS. Q: Which ROS command is used to list available topics?
A: Gazebo and RViz are commonly used simulators in ROS. A: The command rostopic list is used to list all available topics in the ROS system.

Q: What is OpenCV? Q: What is the file extension for ROS launch files?
A: OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine A: ROS launch files use the .launch file extension.
learning software library.
Q: How does ROS integrate with OpenCV using Python?
Q: What is the use of OpenNI in robotics? A: ROS integrates with OpenCV using the cv_bridge module to convert ROS image messages to OpenCV
A: OpenNI (Open Natural Interaction) provides drivers and middleware for natural user interfaces,
format for processing in Python.
including depth sensors like Kinect.
