CV Unit 1: Overview of Computer Vision and Applications

The document provides an overview of computer vision, explaining its definition as the extraction and interpretation of information from images, and its aspiration to match human vision capabilities. It discusses the components of vision, including sensing and interpreting devices, and outlines various applications such as face detection, self-driving cars, and augmented reality. Additionally, it covers image formation, geometric transformations, and the types of transformations used in computer graphics.

Computer Vision (Gujarat Technological University)

Unit 1
Computer Vision Introduction

What is Human Vision?


Human vision, ‘eyesight,’ is one of the most significant senses in the human body. It is the ability to
observe and understand one’s surroundings without complication. This is dependent on the eye, and
the eye is fully dependent on light.

The light enters the eye through the cornea and is concentrated on the retina, a light-sensitive
membrane at the back of the eye, by the lens.

The image formed on the retina is inverted. Human vision is coordinated by the eye, but it also relies on the brain to interpret what is seen.

To identify an object in an image, we need to recognize a certain number of general shapes and patterns.

What Is Computer Vision?


Computer vision is the systematic extraction, interpretation, and comprehension of relevant information from a single picture or a collection of images.

Computer vision is a technology that aspires to perform at the same level as human eyesight. The human eye struggles to see under rough conditions, but for computer vision this is not a significant concern. To attain this level of capability, computer vision draws on numerous methodologies: algorithms, machine learning, and other forms of training. Computer vision can recognize images without the assistance of a person. Progress in deep learning techniques has given new life to the field of computer vision.

Overview of computer vision and its applications


Computer vision is a sector of Artificial Intelligence that uses Machine Learning and Deep Learning to allow computers to see, recognize, and analyze things in photos and videos in the same way that people do. It is rapidly gaining popularity for automated AI vision inspection, remote monitoring, and automation.

Definition:

Computer vision can be defined as a scientific field that extracts information out of digital images. The type of information gained from an image can range from identification to spatial measurements for navigation to augmented reality applications.

Computer vision systems use

(1) cameras to obtain visual data,


(2) machine learning models for processing the images, and
(3) conditional logic to automate application-specific use cases.


What is vision?

Be it a computer or an animal, vision comes down to two components.

● 1: A sensing device captures as many details from the scene as possible.

   ○ Human vision: The eye captures light coming in through the iris and projects it onto the retina, where specialized cells transmit the information to the brain through neurons.
   ○ Computer vision: A camera captures images in a similar way and transmits the pixels to a computer. Cameras can even be better than the human eye in some respects, as they can sense infrared light with more precision.

● 2: An interpreting device has to process the information and extract meaning from it.
   ○ The human brain solves this in multiple steps in different regions of the brain.
   ○ Computer vision still lags behind human performance in this domain. It uses machine learning and deep learning algorithms to extract information from image semantics.
   ○ Extracting information from images
   ○ The information gained from images in computer vision can be divided into two categories: measurements and semantic information.
   ○ Vision as a measurement device
      ■ Robots navigating an unknown location need to be able to scan their surroundings to compute the best path.
      ■ Stereo cameras give depth information, like our two eyes, through triangulation. If we increase the number of viewpoints to cover all the sides of an object, we can create a 3D surface representing the object. An even more challenging idea is to reconstruct the 3D model of a monument from all the results of a Google image search for it.
   ○ A source of semantic information
      ■ On top of measurement information, an image contains a very dense amount of semantic information. We can label objects in an image, label the whole scene, recognize people, and recognize actions, gestures, and faces.

Applications

● Special effects Shape and motion capture are new techniques used in movies like Avatar to
animate digital characters by recording the movements played by a human actor. In order to
do that, we have to find the exact positions of markers on the actor’s face in a 3D space, and
then recreate them on the digital avatar.

● 3D urban modeling Taking pictures with a drone over a city can be used to render a 3D
model of the city. Computer vision is used to combine all the photos into a single 3D model.

● Scene recognition It is possible to recognize the location where a photo was taken. For instance, a photo of a landmark can be compared to billions of photos on Google to find the best matches.


● Face detection Face detection has been used for multiple years in cameras to take better
pictures and focus on the faces. Smile detection can allow a camera to take pictures
automatically when the subject is smiling.

● Face recognition is more difficult than face detection, but with the scale of today’s data,
companies like Facebook are able to get very good performance. Finally, we can also use
computer vision for biometrics, using unique iris pattern recognition or fingerprints.

● Optical Character Recognition One of the oldest successful applications of computer vision
is to recognize characters and numbers. This can be used to read zip codes, or license plates.

● Mobile visual search With computer vision, we can do a search on Google using an image
as the query.

● Self-driving cars Autonomous driving is one of the hottest applications of computer vision.
Companies like Tesla, Google or General Motors compete to be the first to build a fully
autonomous car.

● Automatic checkout Amazon Go is a new kind of store that has no checkout. With computer vision, algorithms detect exactly which products you take and charge you as you walk out of the store.

● Vision-based interaction Microsoft’s Kinect captures movement in real time and allows
players to interact directly with a game through moves.

● Augmented Reality AR is also a very hot field right now, and multiple companies are competing to provide the best mobile AR platform. Apple released ARKit in June 2017, and impressive applications have already been built with it.

● Virtual Reality VR is using similar computer vision techniques as AR. The algorithm needs
to know the position of a user, and the positions of all the objects around. As the user moves
around, everything needs to be updated in a realistic and smooth way.


Image Formation and Representation


Image formation combines radiometric and geometric processes by which 2D images of 3D objects
are formed.

Image Geometry:
https://medium.com/@madali.nabil97/representing-and-manipulating-points-lines-and-conics-using-homogeneous-coordinates-46d907ba2f54

Points:

One of the basic primitives in geometry is the point. A point can be defined by one or more coordinates depending on the space. For example, in 2D a point is defined in the Euclidean coordinate system by a vector with two components:

    x = (x, y)ᵀ ∈ ℝ²

Although this is the most widely used notation to define a point, it is rarely used in fields related to computer vision, where the point is commonly defined in homogeneous coordinates. This is done simply by adding an additional dimension:

    x̃ = (x, y, 1)ᵀ

If you are given a point in homogeneous coordinates x̃ = (x̃, ỹ, w̃)ᵀ, you can do the reverse transformation by simply dividing by the last term: (x, y) = (x̃ / w̃, ỹ / w̃).

For a single physical point in 2D, the set

    { (w·x, w·y, w) | w ≠ 0 }

defines all possible points in 3D space that represent the same physical point (x, y) in 2D. We can therefore conclude that all the points of this set are equivalent.
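These conversions are easy to express in code. Below is a minimal NumPy sketch; the helper names to_homogeneous and from_homogeneous are illustrative, not taken from the notes.

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1 to a Euclidean point: (x, y) -> (x, y, 1)."""
    return np.append(np.asarray(p, dtype=float), 1.0)

def from_homogeneous(p_h):
    """Divide by the last component to recover the Euclidean point."""
    p_h = np.asarray(p_h, dtype=float)
    return p_h[:-1] / p_h[-1]

p = (3.0, 4.0)
p_h = to_homogeneous(p)             # [3. 4. 1.]
# Any non-zero multiple represents the same physical point:
print(from_homogeneous(2.5 * p_h))  # [3. 4.]
```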

3D Points:

The point coordinates in 3D can be written using inhomogeneous coordinates x=(x,y,z) part of R3
or homogeneous coordinates x’ = (x’,y’,z’,1) part of P3.To Denote a 3D point using the augmented
vector x’ =(x,y,z,1) with x’= w’x’.

Lines:

Once we have defined a point, let us define a straight line. Recall that a straight line in 2D is defined by the following equation:

    a·x + b·y + c = 0

A fairly simple way to interpret this equation is to consider that every point belonging to a straight line parameterized by the vector ℓ = (a, b, c)ᵀ has a null dot product with it. More formally, for a point in homogeneous coordinates x̃ = (x, y, 1)ᵀ:

    ℓ · x̃ = a·x + b·y + c = 0

If we multiply the first equation by w, we get

    a·(w·x) + b·(w·y) + c·w = 0,

so the same equation is verified in homogeneous coordinates.

We can consider the parameter vector ℓ as the homogeneous representation of the straight line in 2D. In the same way, any line through the origin of the 3D homogeneous space (excluding the origin itself) is the homogeneous representation of a physical point in 2D.

The intersection of two lines parameterized by ℓ₁ and ℓ₂ is given by their cross product:

    x̃ = ℓ₁ × ℓ₂

We can also easily compute the line passing through two points x̃₁ and x̃₂ as their cross product:

    ℓ = x̃₁ × x̃₂

The first relation comes from the fact that x̃ belongs to both lines and therefore satisfies

    ℓ₁ · x̃ = 0 and ℓ₂ · x̃ = 0,

which is exactly what the cross product guarantees. The second relation comes from the fact that both points belong to the same line and satisfy

    ℓ · x̃₁ = 0 and ℓ · x̃₂ = 0.

Some basic calculations are needed to reach these results.
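As a quick illustration of these cross-product identities, here is a hedged NumPy sketch with two example lines and two example points (values chosen arbitrarily):

```python
import numpy as np

# Two lines in homogeneous form (a, b, c), i.e. a*x + b*y + c = 0
l1 = np.array([1.0, -1.0, 0.0])   # y = x
l2 = np.array([0.0, 1.0, -2.0])   # y = 2

# Intersection point = cross product of the two lines
p = np.cross(l1, l2)
print(p[:2] / p[2])               # -> [2. 2.]

# Line through two points = cross product of the points (homogeneous)
x1 = np.array([0.0, 0.0, 1.0])
x2 = np.array([1.0, 1.0, 1.0])
l = np.cross(x1, x2)
print(l)                          # a multiple of (1, -1, 0), i.e. y = x again
```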


3D Lines:
A line in 3D is determined by a direction vector v = ⟨a, b, c⟩ parallel to the line and a position vector r₀ = ⟨x₀, y₀, z₀⟩ of a point on the line. All other points (x, y, z) on the line satisfy

    ⟨x, y, z⟩ = ⟨x₀, y₀, z₀⟩ + t⟨a, b, c⟩, for some number t.

The above form (r = r₀ + t·v) is called the vector form of the line.

We also write this in parametric form as:

    x = x₀ + a·t,
    y = y₀ + b·t,
    z = z₀ + c·t

Plane:

A plane in 3D is determined by a normal vector n = ⟨a, b, c⟩ orthogonal to the plane and a position vector r₀ = ⟨x₀, y₀, z₀⟩ of a point on the plane. All other points (x, y, z) on the plane satisfy

    ⟨a, b, c⟩ ∙ ⟨x − x₀, y − y₀, z − z₀⟩ = 0.

The above form (n ∙ (r − r₀) = 0) is called the vector form of the plane.

We also write this in standard form as: a(x − x₀) + b(y − y₀) + c(z − z₀) = 0

    a·x + b·y + c·z − a·x₀ − b·y₀ − c·z₀ = 0,

and letting d = −a·x₀ − b·y₀ − c·z₀, we get

    a·x + b·y + c·z + d = 0.
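A small sketch (with assumed example values, using NumPy) that evaluates the parametric line and builds the plane coefficients from these definitions:

```python
import numpy as np

# Line through r0 with direction v:  r(t) = r0 + t * v
r0 = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, -1.0])
line_point = lambda t: r0 + t * v
print(line_point(2.0))                 # -> [1. 4. 1.]

# Plane through p0 with normal n:  a*x + b*y + c*z + d = 0
n = np.array([1.0, 2.0, 2.0])
p0 = np.array([0.0, 0.0, 1.0])
d = -np.dot(n, p0)                     # d = -(a*x0 + b*y0 + c*z0)
print(n, d)                            # coefficients (a, b, c) and d

# A point lies on the plane iff n . p + d == 0 (up to rounding)
print(np.dot(n, p0) + d)               # -> 0.0
```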


Transformation

Linear Transformations
•Linear transformations are combinations of scale, rotation, and shear.
•Properties satisfied:
•Origin maps to origin
•Lines map to lines
•Parallel lines remain parallel
•Ratios are preserved

Euclidean Transformations / Rigid Transformations


•The Euclidean transformations are the most commonly used transformations.
•A Euclidean transformation is either a translation, a rotation, or a reflection.
•Properties:
Preserve length and angle measures

Affine Transformations
•Affine transformations are generalizations of Euclidean transformations. They are combinations of
•linear transformations (rotation, scaling, shear) and
•translations and reflections.
•Properties:
•Origin does not necessarily map to origin
•Lines map to lines, but circles become ellipses
•Parallel lines remain parallel
•Ratios are preserved
•Length and angle are not preserved

Projective Transformations
•Projective transformations are the most general linear transformations and require the use of
homogeneous coordinates.
•Properties:


•Origin does not necessarily map to origin


•Lines map to lines
•Parallel lines do not necessarily remain parallel
•Ratios are not preserved
•Closed under composition

Transformations are movements of an object in the Cartesian plane.

TYPES OF TRANSFORMATION
There are two types of transformation in computer graphics.
1) 2D transformation
2) 3D transformation
Types of 2D and 3D transformation:
1) Translation 2) Rotation 3) Scaling 4) Shearing 5) Reflection

2D transformation:

Translation
The straight-line movement of an object from one position to another is called translation. Here the object is repositioned from one coordinate location to another.

Translation of a point:

To translate a point from coordinate position (x, y) to another (x₁, y₁), we algebraically add the translation distances Tx and Ty to the original coordinates:

    x₁ = x + Tx
    y₁ = y + Ty


The translation pair (Tx, Ty) is called the shift vector.

Translation is a movement of objects without deformation. Every position or point is translated by the same amount. When a straight line is translated, it is redrawn using its translated endpoints.

To translate a polygon, each vertex of the polygon is moved to a new position. Curved objects are translated similarly: to change the position of a circle or ellipse, its center coordinates are translated, and the object is then drawn using the new coordinates.

Let P be a point with coordinates (x, y). It will be translated to (x₁, y₁).

Matrix for translation (homogeneous coordinates):

    | x₁ |   | 1  0  Tx |   | x |
    | y₁ | = | 0  1  Ty | · | y |
    | 1  |   | 0  0  1  |   | 1 |

Scaling
● Scaling is used to alter or change the size of objects. The change is done using scaling factors.
● There are two scaling factors, i.e. Sx in the x direction and Sy in the y direction.
● If the picture is enlarged to twice its original size then Sx = Sy = 2. If Sx and Sy are not equal, scaling will still occur, but it will elongate or distort the picture.
● If the scaling factors are less than one, the size of the object is reduced.
● If the scaling factors are greater than one, the size of the object is enlarged.
● If Sx and Sy are equal, it is called uniform scaling; if they are not equal, it is called differential scaling.
● Scaling factors with values less than one move the object closer to the coordinate origin, while values greater than one move coordinate positions farther from the origin.
● Enlargement:

    ○ If (x₁, y₁) is the original position and S is the scaling matrix with Sx, Sy > 1, then (x₂, y₂) = S·(x₁, y₁) are the coordinates after scaling (enlargement).

● Reduction:

    ○ If (x₁, y₁) is the original position and S is the scaling matrix with Sx, Sy < 1, then (x₂, y₂) = S·(x₁, y₁) are the coordinates after scaling (reduction).

Matrix for scaling (homogeneous coordinates):

    | x₂ |   | Sx  0  0 |   | x₁ |
    | y₂ | = | 0  Sy  0 | · | y₁ |
    | 1  |   | 0   0  1 |   | 1  |

Rotation:

● Rotation is the process of changing the angle of the object. Rotation can be clockwise or anticlockwise. For rotation, we have to specify the angle of rotation and the rotation point. The rotation point is also called the pivot point.
● Types of rotation:
    ○ Clockwise
    ○ Anticlockwise (counterclockwise)
● A positive rotation angle rotates an object in the counterclockwise (anticlockwise) direction.
● A negative rotation angle rotates an object in the clockwise direction.
● When an object is rotated, every point of the object is rotated by the same angle.
    ○ Straight line: A straight line is rotated by rotating its endpoints by the same angle and redrawing the line between the new endpoints.
    ○ Polygon: A polygon is rotated by shifting every vertex using the same rotation angle.
    ○ Curved lines: Curved lines are rotated by repositioning all of their points and redrawing the curve at the new positions.
    ○ Circle: It can be rotated by moving its center position through the specified angle about the pivot.
    ○ Ellipse: Its rotation can be obtained by rotating the major and minor axes of the ellipse by the desired angle.


Matrix for rotation in the clockwise direction (by angle θ):

    |  cos θ   sin θ |
    | −sin θ   cos θ |

Matrix for rotation in the anticlockwise direction (by angle θ):

    |  cos θ  −sin θ |
    |  sin θ   cos θ |

Matrix for homogeneous-coordinate rotation (clockwise):

    |  cos θ   sin θ   0 |
    | −sin θ   cos θ   0 |
    |    0       0     1 |

Matrix for homogeneous-coordinate rotation (anticlockwise):

    |  cos θ  −sin θ   0 |
    |  sin θ   cos θ   0 |
    |    0       0     1 |
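To tie the translation, scaling, and rotation matrices together, here is a small NumPy sketch; the helper names are mine and the sample point is arbitrary. Points are treated as homogeneous column vectors:

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx],
                     [0, 1, ty],
                     [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0],
                     [0, sy, 0],
                     [0,  0, 1]], dtype=float)

def rotation(theta):
    """Anticlockwise rotation by theta (radians) about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]], dtype=float)

p = np.array([2.0, 0.0, 1.0])          # point (2, 0) in homogeneous form
print(translation(3, 4) @ p)           # -> [5. 4. 1.]
print(scaling(2, 2) @ p)               # -> [4. 0. 1.]
print(rotation(np.pi / 2) @ p)         # -> approx [0. 2. 1.]
```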


Rotation about an arbitrary point: If we want to rotate an object or point about an arbitrary point,
first of all, we translate the point about which we want to rotate to the origin. Then rotate a point or
object about the origin, and at the end, we again translate it to the original place. We get rotation
about an arbitrary point.

Example: Rotate a line CD whose endpoints are C(3, 4) and D(12, 15) about the origin through 45° in the anticlockwise direction.

Solution: With θ = 45°, cos θ = sin θ = 0.7071, and the anticlockwise rotation x′ = x·cos θ − y·sin θ, y′ = x·sin θ + y·cos θ:

For C(3, 4): x′ = 3(0.7071) − 4(0.7071) = −0.7071, y′ = 3(0.7071) + 4(0.7071) = 4.9497, so C′ ≈ (−0.707, 4.950).
For D(12, 15): x′ = 12(0.7071) − 15(0.7071) = −2.1213, y′ = 12(0.7071) + 15(0.7071) = 19.0919, so D′ ≈ (−2.121, 19.092).
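The same result can be checked numerically with a couple of lines of NumPy (a sketch, not part of the original notes):

```python
import numpy as np

theta = np.deg2rad(45)                  # anticlockwise rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

C = np.array([3.0, 4.0])
D = np.array([12.0, 15.0])
print(R @ C)    # -> approx [-0.707  4.950]
print(R @ D)    # -> approx [-2.121 19.092]
```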


Euler angles

Euler angles describe an arbitrary 3D rotation as a sequence of three elementary rotations about the coordinate axes, each specified by one angle.

Affine and Projective Transformation

Two classes of linear transformations: projective and affine. Affine transformations are a particular case of the projective ones. Both transformations can be represented with the following 3×3 matrix (in homogeneous coordinates):

    T = | a₁₁  a₁₂  b₁ |
        | a₂₁  a₂₂  b₂ |
        | c₁   c₂   1  |

Where:

● The 2×2 block [a₁₁ a₁₂; a₂₁ a₂₂] is the rotation (linear) part of the matrix. It defines the kind of transformation that will be performed: scaling, rotation, shear, and so on.

● b = (b₁, b₂)ᵀ is the translation vector. It simply moves the points.

● c = (c₁, c₂) is the projection vector. For affine transformations, all elements of this vector are always equal to 0.

If x and y are the coordinates of a point, the transformation can be done by the simple multiplication:

    | x′ |     | a₁₁  a₁₂  b₁ |   | x |
    | y′ |  ~  | a₂₁  a₂₂  b₂ | · | y |
    | 1  |     | c₁   c₂   1  |   | 1 |

(dividing the result by its last homogeneous coordinate). Here, x′ and y′ are the coordinates of the transformed point.

This transformation allows creating perspective distortion. The affine transformation is used for
scaling, skewing and rotation.

Difference Between Projective and Affine Transformations

For affine transformations, the first two elements of the last row of the matrix must be zeros.

● The projective transformation preserves collinearity and incidence.

● Since the affine transformation is a special case of the projective transformation, it has the same properties. In addition, it preserves parallelism.

A projective transformation can be represented as a transformation of an arbitrary quadrangle (i.e. a system of four points) into another one.

An affine transformation is determined by a transformation of a triangle: since the projective part of the last row is zero, three point correspondences are enough.
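The difference can be seen numerically. A hedged NumPy sketch, with an arbitrarily chosen affine matrix and projective matrix applied to the same points and renormalized by the last homogeneous coordinate:

```python
import numpy as np

# Affine: last row is (0, 0, 1)
A = np.array([[1.2, 0.3, 5.0],
              [0.1, 0.9, -2.0],
              [0.0, 0.0, 1.0]])

# Projective: last row has non-zero projection terms
H = np.array([[1.2, 0.3, 5.0],
              [0.1, 0.9, -2.0],
              [0.001, 0.002, 1.0]])

pts = np.array([[0.0, 0.0, 1.0],
                [10.0, 0.0, 1.0],
                [0.0, 10.0, 1.0]]).T   # 3 points as homogeneous columns

def apply(T, pts):
    out = T @ pts
    return out[:2] / out[2]            # divide by the last coordinate

print(apply(A, pts))   # affine: parallelism of lines is preserved
print(apply(H, pts))   # projective: parallel lines may converge (perspective distortion)
```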


3D Transformation:

3D Translation (homogeneous 4×4 matrix, with translation distances Tx, Ty, Tz):

    | 1  0  0  Tx |
    | 0  1  0  Ty |
    | 0  0  1  Tz |
    | 0  0  0  1  |


3D Rotation (by angle θ):

X-axis rotation:

    | 1    0       0     0 |
    | 0  cos θ  −sin θ   0 |
    | 0  sin θ   cos θ   0 |
    | 0    0       0     1 |

Y-axis rotation:

    |  cos θ   0   sin θ   0 |
    |    0     1     0     0 |
    | −sin θ   0   cos θ   0 |
    |    0     0     0     1 |


Z-axis rotation:

    | cos θ  −sin θ   0   0 |
    | sin θ   cos θ   0   0 |
    |   0       0     1   0 |
    |   0       0     0   1 |

3D Scaling (scaling factors Sx, Sy, Sz):

    | Sx  0   0   0 |
    | 0   Sy  0   0 |
    | 0   0   Sz  0 |
    | 0   0   0   1 |


3D Reflection: Reflection mirrors the object about a plane. For example, reflection about the xy-plane negates the z coordinate:

    | 1  0   0  0 |
    | 0  1   0  0 |
    | 0  0  −1  0 |
    | 0  0   0  1 |

Composing transformations is the process of applying several transformations in succession to form one overall transformation.
If we transform a point P using matrix M1 first, then transform the result using M2, and then M3, we have:
(M3 × (M2 × (M1 × P))) = (M3 × M2 × M1) × P
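A short sketch of the composition order in NumPy (the example matrices are assumed, not from the notes):

```python
import numpy as np

# Rotate 90 degrees anticlockwise, then translate by (3, 4)
c, s = np.cos(np.pi / 2), np.sin(np.pi / 2)
M1 = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])   # applied first
M2 = np.array([[1, 0, 3], [0, 1, 4], [0, 0, 1.0]])  # applied second

P = np.array([1.0, 0.0, 1.0])                       # point (1, 0)

step_by_step = M2 @ (M1 @ P)
composed = (M2 @ M1) @ P                            # pre-multiply: last applied is leftmost
print(step_by_step, composed)                       # both -> approx [3. 5. 1.]
```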


Image Radiometry

In image formation, radiometry is concerned with the relationship among the amounts of light energy emitted from light sources, reflected from surfaces, and captured by sensors.

Simple model for Image Formation

● A simple model of image formation: the scene is illuminated by a single source.

● The scene reflects radiation towards the camera. The camera senses it via solid-state cells (CCD cameras).
● There are two parts to the image formation process:
    ○ The geometry, which determines where in the image plane the projection of a point in the scene will be located.
    ○ The physics of light, which determines the brightness of a point in the image plane:
        f(x, y) = i(x, y) · r(x, y)
        ■ Simple model: i: illumination, r: reflectance
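A tiny hedged sketch of this simple model, synthesizing an image as the pointwise product of an assumed illumination field and an assumed reflectance map:

```python
import numpy as np

h, w = 64, 64
# Illumination i(x, y): a smooth gradient, brighter on the right (assumed example)
i = np.tile(np.linspace(0.2, 1.0, w), (h, 1))
# Reflectance r(x, y): a dark background with a brighter square (albedo in [0, 1])
r = np.full((h, w), 0.3)
r[20:40, 20:40] = 0.9

f = i * r                       # f(x, y) = i(x, y) * r(x, y)
print(f.min(), f.max())         # brightness depends on both factors
```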

Radiometry Parameters

The main radiometric quantities are radiant flux (energy per unit time), irradiance (flux per unit area arriving at a surface), and radiance (flux per unit area per unit solid angle leaving a surface).


Photometric image formation

● Images cannot exist without light.

● Light sources can be point or area light sources:
    ○ Point source (location only, e.g. a bulb)
    ○ Directional source (orientation only, e.g. the Sun)
    ○ Ambient source (no location or orientation)
    ○ Spot light (point source + spread angle)
    ○ Flap, barn-door (directional source + spatial extent)

● When light arrives at a surface, two factors affect the light available for vision:
    ○ Strength: characterized by its irradiance (energy per unit time per unit area)
    ○ Distance: how much of the emitted energy actually gets to the object (assuming no attenuation and no intermediate reflection)

● When the light hits a surface, three major reactions might occur:

    ○ Some light is absorbed. This depends on a factor called ρ (albedo). A low ρ of the surface means more light gets absorbed.

    ○ Some light gets reflected:
        ■ diffusely, which is independent of the viewing direction. It follows Lambert's cosine law: the amount of reflected light is proportional to cos(θ). E.g., cloth, brick.
        ■ specularly, which depends on the viewing direction. E.g., a mirror.

    ○ Some light may be refracted (absorbed into and traveling through the material).

    ○ absorption + reflection + refraction = total incident light


Bidirectional Reflectance Distribution Function (BRDF).

● It gives the measure of light scattered by a medium from one direction into another.
● The scattering of the light can reveal the topography of the surface:
    ○ smooth surfaces reflect almost entirely in the specular direction, while with increasing roughness the light tends to scatter into all possible directions.
    ○ An object will appear equally bright throughout the outgoing hemisphere if its surface is perfectly diffuse (i.e., Lambertian).

● Relative to some local coordinate frame on the surface, the BRDF is a four-dimensional function that describes how much of each wavelength arriving at an incident direction v̂ᵢ is emitted in a reflected direction v̂ᵣ.
● The function can be written in terms of the angles of the incident and reflected directions relative to the surface frame as

    fᵣ(θᵢ, φᵢ, θᵣ, φᵣ; λ)

● Owing to this, the BRDF can give valuable information about the nature of the target sample.


● The BRDF is the fraction of incident light arriving from the incident direction that is reflected toward the viewing direction, per unit surface area per unit viewing solid angle.

Typical BRDFs can often be split into their diffuse and specular components:

    fᵣ = f_d + f_s


● The diffuse component (also known as Lambertian or matte reflection) scatters light uniformly in all directions and is the phenomenon we most normally associate with shading.
● Diffuse reflection also often imparts a strong body color to the light, since it is caused by selective absorption and re-emission of light inside the object's material.
● For the diffuse component, light is scattered uniformly in all directions, i.e., the BRDF is constant.


The second major component of a typical BRDF is specular (gloss or highlight) reflection, which depends strongly on the direction of the outgoing light.

Consider light reflecting off a mirrored surface. Incident light rays are reflected in a direction that is rotated by 180° around the surface normal n̂.
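For intuition, here is a hedged sketch of shading with a Lambertian diffuse term plus a simple specular lobe (a Phong-style term, used here only as an illustration; the notes do not prescribe a particular specular model):

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def shade(n, light_dir, view_dir, albedo=0.7, ks=0.3, shininess=32):
    """Diffuse (Lambert) + specular (Phong-style) reflection at one surface point."""
    n, l, v = normalize(n), normalize(light_dir), normalize(view_dir)
    diffuse = albedo * max(np.dot(n, l), 0.0)          # proportional to cos(theta)
    r = 2.0 * np.dot(n, l) * n - l                     # mirror direction of the light
    specular = ks * max(np.dot(r, v), 0.0) ** shininess
    return diffuse + specular

n = np.array([0.0, 0.0, 1.0])          # surface normal
l = np.array([0.0, 1.0, 1.0])          # direction toward the light
print(shade(n, l, view_dir=np.array([0.0, -1.0, 1.0])))  # viewer near the mirror direction -> highlight
print(shade(n, l, view_dir=np.array([0.0, 1.0, 1.0])))   # viewer back toward the light -> mostly diffuse
```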

Color
From a viewpoint of color, we know visible light is only a small portion of a large electromagnetic
spectrum.
Two factors are noticed when a colored light arrives at a sensor:
Colour of the light
Colour of the surface


The Bayer grid/filter is an important development for capturing the color of light. In a camera, not every sensor element captures all three components (RGB) of light. Inspired by the human visual receptors, Bayer proposed a grid in which 50% of the sensors are green, 25% are red, and 25% are blue.
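A hedged sketch of what such a mosaic looks like in code: building a Bayer pattern (an RGGB tile layout is assumed, as the notes do not specify the exact arrangement) and keeping one color sample per pixel from a full RGB image:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Keep one color channel per pixel following an RGGB 2x2 tile (assumed layout)."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]   # R at even rows, even cols
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]   # G
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]   # G
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]   # B at odd rows, odd cols
    return mosaic

rgb = np.random.rand(4, 4, 3)                 # toy RGB image
print(bayer_mosaic(rgb))                      # 50% green, 25% red, 25% blue samples
```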

3D to 2D Projection
● Projections transform points in n-space to m-space, where m < n.
● We need to specify how 3D primitives are projected onto the image plane. A linear 3D-to-2D projection matrix can be used to do this.
● In 3D, we map points from 3-space to the projection plane (PP) along projectors emanating from the center of projection (COP).


There are two basic types of projections:

Perspective - distance from COP to PP finite


Parallel - distance from COP to PP infinite

Parallel Projection:
● In parallel projection, the center of projection lies at infinity. The view of the object obtained on the plane is less realistic, as there is no foreshortening, and the relative dimensions of the object are preserved.

● Characteristics of parallel projection:

    ○ In parallel projection, the projection lines are parallel to each other.
    ○ There is the least amount of distortion within the object.
    ○ Lines that are parallel on the object are also parallel in the drawing.
    ○ The view in parallel projection is less realistic because there is no foreshortening.
    ○ Parallel projections are good for accurate measurements.


● Parallel projection is further divided into two categories :
○ a) Orthographic Projection
○ b) Oblique Projection


(a) Orthographic Projection : It is a kind of parallel projection where the projecting lines emerge
parallelly from the object surface and incident perpendicularly at the projecting plane.

Orthographic projection is of two categories:

(1) Multiview Projection: It is further divided into three categories:

    (a) Top view: In this projection, the rays that emerge from the top of the polygon surface are observed.

    (b) Side view: Another type of orthographic projection, in which the side view of the polygon surface is observed.

    (c) Front view: In this orthographic projection, the front face view of the object is observed.

(2) Axonometric : Axonometric projection is an orthographic projection, where the


projection lines are perpendicular to the plane of projection, and the object is rotated around
one or more of its axes to show multiple sides.

It is further divided into three categories :

(1) Isometric Projection : It is a method for visually representing three-dimensional


objects in two-dimensional display in technical and engineering drawings. Here in
this projection, the three coordinate axes appear equally foreshortened and the angle
between any two of them is 120 degrees.

(2) Dimetric Projection : It is a kind of orthographic projection where the visualized


object appears to have only two adjacent sides and angles equal.


(3) Trimetric Projection : It is a kind of orthographic projection where the visualized


object appears to have all the adjacent sides and angles unequal.

(b) Oblique Projection : It is a kind of parallel projection where projecting rays emerge parallelly
from the surface of the polygon and incident at an angle other than 90 degrees on the plane.

It is of two kinds :

1. Cavalier Projection: It is a kind of oblique projection where the projecting lines emerge parallel from the object surface and are incident at 45° rather than 90° on the projection plane. In this projection, the length of the receding axis is larger than in the cabinet projection.

2. Cabinet Projection: It is similar to the cavalier projection, but here the length of the receding axis is just half that of the cavalier projection, and the incident angle at the projection plane is 63.4° rather than 45°.

Perspective Projection:

● In perspective projection, the distance of the projection plane from the center of projection is finite. The apparent object size varies inversely with its distance from the center of projection.
● In perspective projection, the projector lines converge at a single point. This single point is also called the "projection reference point" or "center of projection."
● Characteristics of perspective projection:
    ○ The distance between the object and the projection center is finite.
    ○ In perspective projection, it is difficult to judge the actual size and shape of the object.
    ○ Perspective projection has the concept of vanishing points.
    ○ Perspective projection is realistic but tough to implement.

● Vanishing point: A vanishing point can be defined as the point in the image plane where the projections of parallel lines appear to meet. The vanishing point is also called the "directing point."

Use of Vanishing Point:

○ It is used in 3D games and graphics editing.


○ It is also used to represent 3D objects.
○ We can also include perspective in the background of an image.
○ We can also insert the shadow effect in an image.

There are three types of Perspective Projection.


1.One Point:
● A One Point perspective contains only one vanishing point on the horizon line.
● It is easy to draw.
● Use of One Point- The One Point projection is mostly used to draw the images of roads,
railway tracks, and buildings.

2.Two Point:
● It is also called "Angular Perspective." A Two Point perspective contains two vanishing
points on the line.
● Use of Two Point- The main use of Two Point projection is to draw the two corner roads.

3. Three-Point:
● The three-point perspective contains three vanishing points. Two points lie on the horizon line, and one lies above or below the line.
● It is very difficult to draw.
● Use of three-point: It is mainly used for drawing skyscrapers and other tall buildings.


Camera and Projection


Image sensing Pipeline (The digital camera)

The light originates from multiple light sources, gets reflected on multiple surfaces, and finally
enters the camera where the photons are converted into the (R, G, B) values that we see while
looking at a digital image.

In a camera, the light first falls on the lens (optics). Following that is the aperture and shutter which
can be specified or adjusted. Then the light falls on sensors which can be CCD or CMOS , then the
image is obtained in an analog or digital form and we get the raw image.

Image is sharpened if required or any other important processing algorithms are applied. Post this,
white balancing and other digital signal processing tasks are done and the image is finally
compressed to a suitable format and stored.

CCD vs CMOS

The camera sensor can be CCD or CMOS. In a charge-coupled device (CCD), a charge is generated at each sensing element; this photogenerated charge is moved from pixel to pixel and converted into a voltage at the output node. An analog-to-digital converter (ADC) then converts the value of each pixel to a digital value.

Complementary metal-oxide-semiconductor (CMOS) sensors work by converting charge to voltage inside each element, as opposed to a CCD, which accumulates the charge. The CMOS output is digitized on-chip and therefore does not need a separate ADC. CMOS is widely used in today's cameras.

Properties of Digital Image Sensor

Let us look at some properties that you may see while clicking a picture on a camera.

Shutter Speed: It controls the amount of light reaching the sensor

Sampling Pitch: It defines the physical space between adjacent sensor cells on the imaging chip.


Fill Factor: It is the ratio of active sensing area size with respect to the theoretically available
sensing area (product of horizontal and vertical sampling pitches)

Chip Size: Entire size of the chip

Sensor Noise: Noise from various sources in the sensing process

Resolution: It tells you how many bits are specified for each pixel.

Post-processing: Digital image enhancement methods used before compression and storage.

Pinhole Camera

Figure : A simple working camera model: the pinhole camera model.

● A simple camera system – a system that can record an image of an object or scene in the 3D
world.
● This camera system can be designed by placing a barrier with a small aperture between the
3D object and a photographic film or sensor.
● Each point on the 3D object emits multiple rays of light outwards. Without a barrier in
place, every point on the film will be influenced by light rays emitted from every point on
the 3D object.

● Due to the barrier, only one (or a few) of these rays of light passes through the aperture and
hits the film. The result is that the film gets exposed by an “image” of the 3D object.
● This simple model is known as the pinhole camera model.


Figure : A formal construction of the pinhole camera model.

● In more formal construction of the pinhole camera the film is commonly called the image
plane.
○ O : The aperture ( the pinhole O or center of the camera)
○ f : The distance between the image plane and the pinhole O is the focal length f .
○ Virtual Image Plane: the image plane is placed between O and the 3D object at a
distance f from O.
● Let P = [x y z]ᵀ be a point on some 3D object visible to the pinhole camera. P will be mapped or projected onto the image plane Π′, resulting in the point P′ = [x′ y′]ᵀ.
● Similarly, the pinhole itself can be projected onto the image plane, giving a new point C′.

● Define a coordinate system [i j k] centered at the pinhole O such that the axis k is
perpendicular to the image plane and points toward it.
● This coordinate system is often known as the camera reference system or camera coordinate
system.
● The line defined by C′ and O is called the optical axis of the camera system.

● Recall that point P′ is derived from the projection of the 3D point P on the image plane Π′. Notice that triangle P′C′O is similar to the triangle formed by P, O and (0, 0, z).
● Using these similar triangles, the relationship between the 3D point P and the image plane point P′ is

    x′ = f · x / z,   y′ = f · y / z.
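A minimal hedged sketch of this pinhole projection in NumPy (the focal length and points are assumed example values; the principal point offset and lens distortion are ignored):

```python
import numpy as np

def pinhole_project(P, f=1.0):
    """Project 3D camera-frame points (x, y, z) to the image plane: (f*x/z, f*y/z)."""
    P = np.atleast_2d(P).astype(float)
    return f * P[:, :2] / P[:, 2:3]

points = np.array([[1.0, 2.0, 4.0],
                   [1.0, 2.0, 8.0]])     # same ray direction, twice as far
print(pinhole_project(points, f=2.0))    # farther point projects closer to the center
```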


The effects of aperture size on the image:

● As the aperture size decreases, the image gets sharper but darker.
● As the aperture size increases, the number of light rays that pass through the barrier increases, which blurs the image.

Cameras and lenses

● The above conflict between crispness and brightness is mitigated by using lenses, devices
that can focus light.
● If we replace the pinhole with a lens that is both properly placed and sized, then it satisfies
the following property:
○ all rays of light that are emitted by some point P are refracted by the lens such that
they converge to a single point P ′ in the image plane.
● Lenses have a specific distance for which objects are "in focus". This property is related to the depth of field used in photography.

Camera lenses have another interesting property:


they focus all light rays traveling parallel to the optical axis to one point known as the focal
point

The distance between the focal point and the center of the lens is commonly referred to as the focal
length f .


In the pinhole model z′ = f, while in the lens-based model z′ = f + z₀.

Additionally, since this derivation relies on the paraxial (or "thin lens") assumption, it is called the paraxial refraction model.

Because the paraxial refraction model is an approximation based on the thin-lens assumption, a number of aberrations can occur. One of these is radial distortion, which causes the image magnification to decrease or increase as a function of the distance to the optical axis. The radial distortion is called pincushion distortion when the magnification increases and barrel distortion when it decreases.


Projection matrix
There are three coordinate systems involved: camera, image, and world.

1. Camera: perspective projection.

This can be written as a linear mapping between homogeneous coordinates (the equation holds only up to a scale factor):

    | x |     | f  0  0  0 |   | X |
    | y |  ~  | 0  f  0  0 | · | Y |
    | 1 |     | 0  0  1  0 |   | Z |
                              | 1 |

where the 3×4 projection matrix represents a map from 3D to 2D.

2. Image: (intrinsic/internal camera parameters)

K is an upper triangular matrix, called the camera calibration matrix:

    K = | αx  0   x₀ |
        | 0   αy  y₀ |
        | 0   0   1  |

where αx and αy combine the focal length with the pixel scaling in the x and y image directions.
    ○ K provides the transformation between an image point and a ray in Euclidean 3-space.
    ○ There are four parameters:
        1. The scaling in the image x and y directions, αx and αy.
        2. The principal point (x₀, y₀), which is the point where the optic axis intersects the image plane.


    ○ The aspect ratio is αy / αx.

    ○ Once K is known, the camera is termed calibrated.
    ○ A calibrated camera is a direction sensor, able to measure the direction of rays, like a 2D protractor.

3. World: (extrinsic/external camera parameters)

The Euclidean transformation between the camera and world coordinates is X_cam = R·X_world + T, where R is a 3×3 rotation matrix and T a translation vector.

Finally, concatenating the three matrices (with the focal length absorbed into K) gives

    P = K · [ R | T ],

which defines the 3×4 projection matrix from Euclidean 3-space to an image:

    x ~ P · X

Image Representation

After getting an image, it is important to devise ways to represent the image. There are various
ways by which an image can be represented. Let’s look at the most common ways to represent an
image.

Image as a matrix
The simplest way to represent the image is in the form of a matrix.

In the figure, a part of the image, i.e., the clock, has been represented as a matrix; a similar matrix represents the rest of the image too. It is common to use one byte per pixel, which means values between 0 and 255 represent the intensity of each pixel in the image, where 0 is black and 255 is white. One such matrix is generated for every color channel in the image. In practice, it is also common to normalize the values between 0 and 1 (as done in the example in the figure above).

Image as a function
An image can also be represented as a function. An image (grayscale) can be thought of as a
function that takes in a pixel coordinate and gives the intensity at that pixel.
It can be written as a function f: ℝ² → ℝ that outputs the intensity at any input point (x, y). The value of the intensity can be between 0 and 255, or between 0 and 1 if the values are normalized.
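Both views are easy to express in code. A short sketch with a small synthetic grayscale image (no external image file is assumed):

```python
import numpy as np

# Image as a matrix: an 8-bit grayscale image, values in [0, 255]
img = np.zeros((4, 6), dtype=np.uint8)
img[1:3, 2:5] = 200                    # a bright rectangle
img_norm = img.astype(float) / 255.0   # normalized representation in [0, 1]

# Image as a function f: (x, y) -> intensity
def f(x, y):
    return img_norm[y, x]              # row index is y, column index is x

print(img)
print(f(3, 1))                         # intensity at pixel (x=3, y=1) -> ~0.784
```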


Image Digitization

In digital image processing, signals captured from the physical world need to be translated into digital form by a digitization process. In order to become suitable for digital processing, an image function f(x, y) must be digitized both spatially and in amplitude. This digitization involves two main processes:

1. Sampling: digitizing the coordinate values is called sampling.

2. Quantization: digitizing the amplitude values is called quantization.

Typically, a frame grabber or digitizer is used to sample and quantize the analogue video signal.

Sampling

● Since an analogue image is continuous not just in its coordinates (x axis) but also in its amplitude (y axis), the part that deals with digitizing the coordinates is known as sampling.
● In digitizing, sampling is done on the independent variables. In the case of the equation y = sin(x), it is done on the x variable.

● The more samples we take, the better the quality of the image will be.
● However, sampling on the x axis alone does not convert the signal to digital form; the amplitude (y axis) must also be sampled, which is known as quantization.
● Sampling has a relationship with image pixels.


● The total number of pixels in an image can be calculated as Pixels = total number of rows × total number of columns.
    ○ For example, if we have a total of 36 pixels, that means we have a square image of 6 × 6. As we know, in sampling, more samples eventually result in more pixels. So it means that we have taken 36 samples of our continuous signal on the x axis, which corresponds to the 36 pixels of this image. The number of samples is also directly related to the number of sensors on the CCD array.

Here is an example of image sampling and how it can be represented using a graph.

Quantization

● Quantization is opposite to sampling because it is done on “y axis” while sampling is done


on “x axis”.
● Quantization is a process of transforming a real valued sampled image to one taking only a
finite number of distinct values.
● In simple words, when you are quantizing an image, you are actually dividing a signal into
quanta(partitions).
● How Quantization done:
○ Here we assign levels to the values generated by the sampling process.
○ In the image shown in the sampling explanation, although the samples had been
taken, they were still spanning vertically to a continuous range of gray level values.
○ In the image shown below, these vertically ranging values have been quantized into 5
different levels or partitions. Ranging from 0 black to 4 white. This level could vary
according to the type of image wanted as output.


● There is a relationship between quantization and gray-level resolution.

● The quantized image above represents 5 different levels of gray, which means the image formed from this signal would only have 5 different gray values; it would be more or less a black-and-white image with some shades of gray.
● When we want to improve the quality of the image, we can increase the number of levels assigned to the sampled image. If we increase this number to 256, we have a standard grayscale image. Whatever value we assign is called the gray level. Most digital image-processing devices quantize into k equal intervals. If b bits per pixel are used, then

    k = 2ᵇ

● The number of quantization levels should be high enough for human perception of fine shading details in the image.
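A hedged sketch of uniform quantization into k = 2^b equal intervals, applied to normalized intensities:

```python
import numpy as np

def quantize(img, b):
    """Uniformly quantize intensities in [0, 1] into k = 2**b levels."""
    k = 2 ** b
    levels = np.floor(img * k).clip(0, k - 1)   # which of the k intervals each value falls in
    return levels / (k - 1)                     # map level indices back to [0, 1]

img = np.linspace(0.0, 1.0, 11)                 # a smooth ramp of intensities
print(quantize(img, b=1))                       # 2 levels: black or white
print(quantize(img, b=3))                       # 8 gray levels
```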


Extra For Knowledge:

Reference Frames
- Five reference frames are needed for general problems in 3D scene analysis.

• Object Coordinate Frame


- This is a 3D coordinate system: xb, yb, zb
- It is used to model ideal objects in both computer graphics and computer vision.
- It is needed to inspect an object (e.g., to check if a particular hole is in proper position relative to
other holes)
- The coordinates of 3D point B, e.g., relative to the object reference frame are (xb, 0, zb)
- Object coordinates do not change regardless how the object is placed in the scene.
Notation: (Xo, Yo, Zo)T

• World Coordinate Frame


- This is a 3D coordinate system: xw , yw , zw
- The scene consists of object models that have been placed (rotated and translated) into the scene,
yielding object coordinates in the world coordinate system.
- It is needed to relate objects in 3D (e.g., the image sensor tells the robot where to pick up a bolt and in which hole to insert it).
Notation: (Xw, Yw, Zw)T


• Camera Coordinate Frame


- This is a 3D coordinate system (xc , yc , zc axes)
- Its purpose is to represent objects with respect to the location of the camera.
Notation: (Xc , Yc , Zc )T

• Image Plane Coordinate Frame (CCD plane)


- This is a 2D coordinate system (x f , y f axes)
- Describes the coordinates of 3D points projected on the image plane.
- The projection of A, e.g., is the point a, both of whose coordinates are negative.
Notation: (x, y)T

• Pixel Coordinate Frame


- This is a 2D coordinate system (r, c axes)
- Each pixel in this frame has an integer pixel coordinates.
- Point A, e.g., gets projected to image point (ar , ac ) where ar and ac are integer row and column.
Notation: (xim , yim )T


● The camera matrix K contains some of the critical parameters that describe a camera’s
characteristics and its model, including the cx, cy, k, and l parameters.
● Two parameters are currently missing from this formulation: skewness and distortion. Most cameras have zero skew, but some degree of skewness may occur because of sensor manufacturing errors.
● Deriving the new camera matrix accounting for skewness is outside the scope of this class; with skew angle θ between the sensor axes it takes the form

    K = | α  −α·cot θ   cx |
        | 0   β/sin θ   cy |
        | 0      0       1 |

Substituting this into the projection equation and simplifying gives

    P′ = K [R  T] P_w = M P_w

These parameters R and T are known as the extrinsic parameters because they are external to and do not depend on the camera.
This completes the mapping from a 3D point P in an arbitrary world reference system to the image plane. To reiterate, we see that the full projection matrix M consists of the two types of parameters introduced above: intrinsic and extrinsic parameters.


All parameters contained in the camera matrix K are the intrinsic parameters, which change as the
type of camera changes.
The extrinsic parameters include the rotation and translation, which do not depend on the camera’s
build.
Overall, we find that the 3 × 4 projection matrix M has 11 degrees of freedom: 5 from the intrinsic
camera matrix, 3 from extrinsic rotation, and 3 from extrinsic translation.
