Unit - III CV
Topics:
● Translational alignment
● Parametric motion
● Spline-based motion
● Optical flow
● Layered motion
❖ Projections in Computer Graphics:
Representing an n-dimensional object in an (n-1)-dimensional space is known as projection. It is the process of converting a 3D object into a 2D one: we represent a 3D object on a 2D plane, {(x, y, z) -> (x, y)}. It is also defined as mapping or transforming the object onto the projection plane or view plane. When geometric objects are formed by the intersection of lines with a plane, the plane is called the projection plane and the lines are called projectors.
Types of Projections:
1. Parallel projections
2. Perspective projections
Center of Projection:
It is an arbitrary point from which the projection lines are drawn through each point of an object.
● If the center of projection (COP) is located at a finite point, the lines converge at it and the result is a perspective projection.
● If the COP is located at infinity, all the lines are parallel and the result is a parallel projection.
Parallel Projection:
Parallel projection is divided into two types, orthographic and oblique, each of which is further subdivided.
Orthographic Projections:
Orthographic projections are obtained by projectors that are perpendicular to the projection plane.
Oblique Projections:
Oblique projections are obtained by projectors along parallel lines that are not perpendicular to the projection plane. An oblique projection shows the front and top surfaces, covering the three dimensions of height, width, and depth. The front or principal surface of the object is parallel to the plane of projection. Oblique projections are effective for pictorial representation.
● Isometric Projections: Orthographic projections that show more than one face of an object; the projection plane makes equal angles with the three principal axes, so all three axes are equally foreshortened.
Cavalier Projections:
The projectors make an angle of 45 degrees with the projection plane. As a result, lines perpendicular to the projection plane are projected with no change in length.
Cabinet Projections:
All lines perpendicular to the projection plane are projected to one half of
their length. These gives a realistic appearance of object. It makes 63.4
degrees angle with the projection plane. Here lines perpendicular to the
viewing surface are projected at half their actual length.
Perspective Projections:
● A perspective projection is produced by straight lines (projectors) that pass from the center of projection through each point of the object to the projection plane; it approximates the way a person sees a scene.
● Any set of parallel lines of the object that are not parallel to the projection plane converge at a vanishing point.
● One-point perspective projection: exactly one principal axis is cut by the projection plane, giving a single vanishing point.
● Two-point perspective projection: exactly two principal axes are cut by the projection plane, giving two vanishing points and a stronger impression of depth.
● Three-point perspective projection: all three principal axes are cut by the projection plane; it is the most realistic but also the hardest to draw.
Perspective Foreshortening:
The size of the perspective projection of an object varies inversely with the distance of the object from the center of projection.
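To see the inverse relationship concretely, here is a small Python sketch (not taken from any graphics library; the function names and numbers are illustrative) that projects the same square placed at two different depths, once with a parallel projection and once with a perspective projection onto the plane z = d.

```python
import numpy as np

def orthographic_project(points):
    """Parallel (orthographic) projection onto the z = 0 plane: drop the z coordinate."""
    return points[:, :2]

def perspective_project(points, d=1.0):
    """Perspective projection with the center of projection at the origin and
    the view plane at z = d: x' = d * x / z, y' = d * y / z."""
    z = points[:, 2:3]
    return d * points[:, :2] / z

# Two identical squares, one twice as far from the center of projection.
near = np.array([[1, 1, 2], [-1, 1, 2], [-1, -1, 2], [1, -1, 2]], dtype=float)
far = near.copy()
far[:, 2] = 4

print(orthographic_project(near))   # same size as orthographic_project(far)
print(perspective_project(near))    # edge length 1.0
print(perspective_project(far))     # edge length 0.5 -> foreshortened with distance
```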
❖ Photometric Stereo:
For a Lambertian surface observed under m known light sources, each pixel satisfies
I = L·n,
where I is the vector of m observed intensities, n is the unknown unit surface normal, and L is a (known) m×3 matrix of normalized light directions (one row per light).
This model can easily be extended to surfaces with non-uniform albedo while keeping the problem linear. Taking an albedo (reflectivity) k, the model becomes
I = k(L·n).
If L is square (there are exactly 3 lights) and non-singular, it can be inverted, giving
L⁻¹I = kn.
Since the normal vector has unit length, the albedo k is recovered as the length of the vector kn, and the normal n as its normalized direction.
If L is not square (there are more than 3 lights), a generalisation of the inverse can be obtained using the Moore-Penrose pseudoinverse, by simply multiplying both sides with Lᵀ, giving:
LᵀI = Lᵀk(L·n)
(LᵀL)⁻¹LᵀI = kn
After which the normal vector and albedo can be solved as described above.
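A minimal numerical sketch of this recovery, assuming synthetic Lambertian data and using numpy's least-squares solver (the function name and array shapes are choices made here, not part of any standard API):

```python
import numpy as np

def photometric_stereo(I, L):
    """Recover per-pixel albedo k and unit normal n from m >= 3 images.

    I : (m, h, w) stack of observed intensities, one image per light.
    L : (m, 3) matrix of normalized light directions (one row per light).
    """
    m, h, w = I.shape
    I_flat = I.reshape(m, -1)                       # (m, h*w)
    # Least-squares solution of I = L @ (k n) for every pixel at once;
    # equivalent to kn = (L^T L)^{-1} L^T I when there are more than 3 lights.
    kn = np.linalg.lstsq(L, I_flat, rcond=None)[0]  # (3, h*w)
    k = np.linalg.norm(kn, axis=0)                  # albedo = length of kn
    n = kn / np.maximum(k, 1e-12)                   # unit normal = direction of kn
    return k.reshape(h, w), n.reshape(3, h, w)
```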
Figure 1: A model represented by elliptical splats (left), rendered using flat shading (center left), Gouraud shading (center right), and Phong shading (right).
High visual quality and flexible rendering are achieved using multi-pass deferred shading, the inherent anti-aliasing of surface splatting, as well as per-pixel Phong shading and shadow mapping (Botsch et al., 2004; Botsch et al., 2005).
Figure 2: Phong shading can be implemented efficiently using deferred shading, based
on the depicted three rendering passes.
The point-based rendering metaphor, where elliptical splats are generated from simple OpenGL points, has also been successfully applied in scientific visualization. For instance, in molecular visualization, individual atoms and their connections can be represented by spheres and cylinders, respectively, which are generated and rasterized completely on the GPU (Sigg et al., 2006). Thanks to the high rendering performance, even dynamic (pre-computed) MD simulations of large membrane patches can be visualized in real time, which we exploited for an interactive “atom-level magnifier tool” in a combined mesoscopic and molecular visualization (missing reference).
Figure 3: Point-based molecule rendering allows for an interactive magnifier tool that bridges the gap between cell visualization at the mesoscopic scale (left) and the molecular scale (right).
★ Also check the PDF shared in the WhatsApp group for the above topic.
❖ 2D vs. 3D Object Detection:
2D object detection has its limitations, however. Since it only considers two dimensions,
it doesn’t understand depth. This can make it hard to judge how far away or big an
object is. For example, a large object far away might appear the same size as a smaller
object that’s closer, which can be confusing. The lack of depth information can cause
inaccuracies in applications like robotics or augmented reality, where knowing the true
size and distance of objects is necessary. That’s where the need for 3D object detection
comes in.
3D object detection is vital for applications like self-driving cars, robotics, and
augmented reality systems. It works by using sensors like LiDAR or stereo cameras.
These sensors create detailed 3D maps of the environment, known as point clouds or
depth maps. These maps are then analyzed to detect objects in a 3D environment.
There are many advanced computer vision models designed specifically for
handling 3D data, like point clouds. For example, VoteNet is a model that
uses a method called Hough voting to predict where the center of an object
is in a point cloud, making it easier to detect and classify objects accurately.
Similarly, VoxelNet is a model that converts point clouds into a grid of small
cubes called voxels to simplify data analysis.
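As a toy illustration of the voxelization step that models like VoxelNet rely on (only the preprocessing idea, not the network itself; the grid extents and voxel size below are arbitrary), the following numpy sketch bins a point cloud into a regular voxel grid:

```python
import numpy as np

def voxelize(points, voxel_size=0.2, grid_range=((-40, 40), (-40, 40), (-3, 1))):
    """Assign each 3D point to a voxel index and count points per occupied voxel.

    points     : (N, 3) array of x, y, z coordinates (e.g. from a LiDAR scan).
    voxel_size : edge length of each cubic voxel, in the same units as the points.
    grid_range : (min, max) extent of the grid along x, y and z.
    """
    mins = np.array([r[0] for r in grid_range])
    maxs = np.array([r[1] for r in grid_range])
    # Keep only points inside the grid.
    mask = np.all((points >= mins) & (points < maxs), axis=1)
    inside = points[mask]
    # Integer voxel coordinates of each point.
    idx = np.floor((inside - mins) / voxel_size).astype(np.int64)
    # Count how many points fall into each occupied voxel.
    occupied, counts = np.unique(idx, axis=0, return_counts=True)
    return occupied, counts

# Example with a random synthetic "point cloud".
pts = np.random.uniform(-5, 5, size=(1000, 3))
voxels, counts = voxelize(pts, voxel_size=1.0, grid_range=((-5, 5), (-5, 5), (-5, 5)))
print(voxels.shape, counts.sum())  # number of occupied voxels, points kept
```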
Key Differences Between 2D and 3D Object Detection
Now that we've understood 2D and 3D object detection, let's explore their key
differences. 3D object detection is more complicated than 2D object detection because
it works with point clouds. Analyzing 3D data, like the point clouds generated by LiDAR,
requires a lot more memory and computing power. Another difference is the complexity
of the algorithms involved. 3D object detection models need to be more complex to be
able to handle depth estimation, 3D shape analysis, and analysis of an object’s
orientation.
3D object detection offers several advantages that make it stand out from traditional 2D
object detection methods. By capturing all three dimensions of an object, it provides
precise details about its location, size, and orientation with respect to the real world.
Such precision is crucial for applications like self-driving cars, where knowing the exact
position of obstacles is vital for safety. Another advantage of using 3D object detection
is that it can help you get a much better understanding of how different objects relate to
each other in 3D space.
Despite the many benefits, there are also limitations related to 3D object detection. Key challenges to keep in mind include the heavy memory and compute requirements of processing point clouds, the greater complexity of the algorithms, and the cost of depth sensors such as LiDAR.
3D object detection is used across several application areas:
➔ Autonomous Vehicles
Self-driving cars use 3D object detection to determine the real-world position, size, and orientation of pedestrians, vehicles, and other obstacles so that they can plan safe maneuvers.
➔ Robotics
Robotic systems use 3D object detection for several applications.
They use it to navigate through different types of environments, pick
up and place objects, and interact with their surroundings. Such use
cases are particularly important in dynamic settings like warehouses
or manufacturing facilities, where robots need to understand
three-dimensional layouts to function effectively.
➔ Augmented and Virtual Reality (AR/VR)
AR and VR systems use 3D object detection to understand the geometry of the surrounding scene, so that virtual content can be anchored to real objects and interact with them at the correct size and distance.
❖ 3D Reconstruction
➔ 3D Reconstruction Basic Terminology (Traditional
Computer Vision Approach)
Understanding 3D Reconstruction
3D reconstruction is the process of recovering the three-dimensional shape and appearance of objects or scenes from images, depth measurements, or other sensor data. Several classical techniques are used to estimate depth:
2. Depth from Focus: Depth from Focus is a technique that estimates depth information
based on the variations in the focus of an imaging system. By capturing multiple
images with different focus settings, the algorithm analyzes the sharpness of different
image regions and infers the corresponding depth values. This method is particularly
useful in applications where traditional stereo or structure from motion techniques may
not be applicable, such as micro-scale object reconstruction.
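One common way to implement depth from focus is to compute a per-pixel sharpness measure, such as the squared Laplacian response, for every image in the focal stack and select the focus setting that maximizes it. The sketch below assumes OpenCV and a pre-captured, registered focal stack; converting the winning index to metric depth requires the known focus distances from the capture setup.

```python
import cv2
import numpy as np

def depth_from_focus(focal_stack, ksize=5):
    """Estimate a depth index map from a focal stack.

    focal_stack : list of grayscale images (same size), taken at increasing
                  focus distances.
    Returns the index of the sharpest image at every pixel; mapping that index
    to metric depth requires the focus distance of each capture.
    """
    sharpness = []
    for img in focal_stack:
        lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F)
        # Local sharpness: squared Laplacian response, smoothed over a window.
        measure = cv2.GaussianBlur(lap * lap, (ksize, ksize), 0)
        sharpness.append(measure)
    sharpness = np.stack(sharpness, axis=0)        # (num_images, h, w)
    depth_index = np.argmax(sharpness, axis=0)     # sharpest slice per pixel
    return depth_index
```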
3. Shape from Shading: Shape from Shading techniques utilize the variations in the
intensity of an object's surface to infer its 3D shape. By assuming certain lighting
conditions, the algorithm estimates the surface normals at each pixel and generates a
depth map. This method finds applications in fields like computer graphics and medical
imaging.
➔ Applications of 3D Reconstruction:
1. Robotics and Automation: 3D reconstruction plays a vital role in robotic applications,
enabling robots to perceive their environments, localize themselves, and plan actions
accordingly. Robots equipped with 3D vision systems can navigate complex
environments, perform pick-and-place tasks with precision, and collaborate safely with
humans in shared workspaces.
5. Cultural Heritage Restoration: Cultural heritage sites and artifacts can deteriorate
over time due to environmental factors and human impact. 3D reconstruction assists in
the restoration and conservation of these valuable assets by creating digital archives
and aiding in virtual reconstruction efforts.
➔ Triangulation
Triangulation refers to determining a point in 3D space from its projections in two or more images. In practice, several error sources prevent the back-projected rays from intersecting exactly:
● Geometric distortion, for example lens distortion, which means that the 3D-to-2D mapping of the camera deviates from the pinhole camera model. To some extent these errors can be compensated for, leaving a residual geometric error.
● A single ray of light from x (3D point) is dispersed in the lens system of the
cameras according to a point spread function. The recovery of the
corresponding image point from measurements of the dispersed intensity
function in the images gives errors.
● In a digital camera, the image intensity function is only measured in discrete sensor elements. Inexact interpolation of the discrete intensity function has to be used to recover the true one.
● The image points y1' and y2' used for triangulation are often found using various types of feature extractors, for example for corners or interest points in general. There is an inherent localization error for any type of feature extraction based on neighborhood operations.
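Assuming two calibrated cameras with known 3×4 projection matrices P1 and P2, the classic linear (DLT) triangulation of a point from its two noisy image measurements can be sketched as follows; because of the error sources listed above, the back-projected rays rarely intersect exactly, so the algebraic least-squares solution is usually refined afterwards by minimizing reprojection error.

```python
import numpy as np

def triangulate_dlt(P1, P2, y1, y2):
    """Linear (DLT) triangulation of one 3D point.

    P1, P2 : (3, 4) camera projection matrices.
    y1, y2 : (2,) measured image points in the two views.
    Returns the 3D point X whose projections best satisfy y1 ~ P1 X and
    y2 ~ P2 X in the algebraic least-squares sense.
    """
    A = np.vstack([
        y1[0] * P1[2] - P1[0],
        y1[1] * P1[2] - P1[1],
        y2[0] * P2[2] - P2[0],
        y2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the smallest
    # singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]
```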
➔ Bundle Adjustment:
Bundle adjustment is the refinement step in Structure-from-Motion. It refines a visual reconstruction to produce jointly optimal 3D structure P and camera poses C by minimizing the total re-projection error.
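A minimal sketch of this idea with scipy.optimize.least_squares, assuming a shared intrinsic matrix K and poses parameterized by rotation vectors and translations (a simplified parameterization chosen here for illustration; real SfM pipelines add robust losses, sparsity structure, and distortion models):

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, cam_idx, pt_idx, obs_2d, K):
    """Residuals for bundle adjustment: predicted minus observed pixel positions.

    params : flat vector [rvec_0, t_0, ..., rvec_{n_cams-1}, t_{n_cams-1}, X_0, ..., X_{n_pts-1}]
    cam_idx, pt_idx : which camera / which 3D point each 2D observation belongs to.
    obs_2d : (n_obs, 2) observed image points; K : (3, 3) shared intrinsics.
    """
    cams = params[:n_cams * 6].reshape(n_cams, 6)
    pts = params[n_cams * 6:].reshape(n_pts, 3)
    R = Rotation.from_rotvec(cams[cam_idx, :3]).as_matrix()   # (n_obs, 3, 3)
    t = cams[cam_idx, 3:]                                     # (n_obs, 3)
    X_cam = np.einsum('nij,nj->ni', R, pts[pt_idx]) + t       # points in camera frames
    x = (K @ X_cam.T).T                                       # apply intrinsics
    proj = x[:, :2] / x[:, 2:3]                               # perspective division
    return (proj - obs_2d).ravel()

# Jointly refine all camera poses C and 3D structure P from an initial guess x0:
# result = least_squares(reprojection_residuals, x0, method='trf',
#                        args=(n_cams, n_pts, cam_idx, pt_idx, obs_2d, K))
```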
➔ Parametric Modeling
Parametric modeling refers to the ability to change the shape of a model's geometry as soon as one of its dimension values is modified.
For instance: an object can include various types of features like grooves, chamfers, holes, and fillets. The parametric solid model consists of features and dimensions that are interlinked, or we can say that it allows the designer to capture all the relationships between them.
For instance: if the designer has to alter the length, breadth, and height of a box, it may be enough to change one parameter, since the other two are automatically updated through their relationships.
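The idea of linked parameters can be sketched in a few lines of Python (a toy stand-in for a real CAD kernel, with the ratio-based constraints invented purely for illustration): the box below is driven by its length, and its breadth and height update automatically when the length changes.

```python
class ParametricBox:
    """Toy parametric solid: breadth and height are driven by length."""

    def __init__(self, length, breadth_ratio=0.5, height_ratio=0.25):
        self.length = length
        self.breadth_ratio = breadth_ratio
        self.height_ratio = height_ratio

    @property
    def breadth(self):          # dependent parameter
        return self.length * self.breadth_ratio

    @property
    def height(self):           # dependent parameter
        return self.length * self.height_ratio

    @property
    def volume(self):
        return self.length * self.breadth * self.height

box = ParametricBox(length=100)
print(box.breadth, box.height, box.volume)   # 50.0 25.0 125000.0
box.length = 200                             # change one parameter...
print(box.breadth, box.height, box.volume)   # ...dependent ones update: 100.0 50.0 1000000.0
```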
Common parametric modeling techniques are as follows:
● Solid Modeling: The model is built from simple primitives such as a cube, cylinder, rectangle (box), or sphere, which are then worked with using basic Boolean operations (union, difference, intersection).
● Surface (Boundary) Modeling: The object is described by the surfaces and their elements (points, edges, etc.) that define its spatial limits. Then, by connecting these spatial points, the object is created. This technique is widely used in CAD and computer graphics.
● Polygonal Modeling: The main components of a polygon mesh are vertices and faces. Also keep in mind that a single polygon is rarely enough on its own; many polygons are combined to approximate the model.
● Visual Programming: It allows us to create parametric 2D or 3D models by wiring together nodes with connected inputs. It is best for people who usually don't write code.
● Data Types: Each and every item has its own data type, and some items or data types are specific to particular software.
❖ Spline-based motion
1. Definition of Splines
Splines are piecewise polynomial curves that pass smoothly through (or near) a set of control points. There are several families of splines, with cubic splines being the most widely used due to their balance of smoothness and computational simplicity. In motion estimation, splines provide a compact, smooth representation of motion between the control points.
5. Mathematical Representation
A spline is defined piecewise: each piece is a polynomial valid over a specific interval. At the joints (knots), adjacent pieces are constrained to agree with the neighboring polynomial, ensuring continuity in both the function and its first and second derivatives. The spline parameters are estimated so as to minimize the error between the spline and the observed data points, which makes the representation useful in noisy or sparse data scenarios (a small fitting sketch is given at the end of this section).
7. Challenges
● Choosing the number and placement of control points is difficult, and fitting splines to dense motion data can be computationally intensive.
● They can also be integrated with Kalman filters for better tracking in dynamic environments.
Conclusion
Spline-based motion models provide smooth, compact representations of motion that are straightforward to fit and differentiate, which is why they appear in many tracking and video-stabilization implementations.
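The fitting sketch referred to above: a cubic spline interpolates a made-up 2D trajectory from a few control points (using scipy; all numbers are illustrative), and the same spline object also provides velocities via its derivative.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Control points of a 2D trajectory: positions observed at a few key times.
t_ctrl = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
xy_ctrl = np.array([[0.0, 0.0],
                    [1.0, 0.8],
                    [2.0, 1.0],
                    [3.2, 0.7],
                    [4.0, 0.0]])

# One cubic spline per coordinate; the pieces join with continuous first and
# second derivatives, which is what makes the interpolated motion look smooth.
spline = CubicSpline(t_ctrl, xy_ctrl, axis=0)

t_dense = np.linspace(0.0, 2.0, 50)
positions = spline(t_dense)        # (50, 2) smoothly interpolated positions
velocities = spline(t_dense, 1)    # first derivative: instantaneous velocity
print(positions[:3], velocities[0])
```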
❖ Optical Flow
The concept of optical flow dates back to the early work of James Gibson in the 1950s, who introduced it in the context of visual perception. Researchers did not start computing and using optical flow widely until the 1980s, when suitable computational tools were introduced.
A significant milestone was the development of the Lucas-Kanade method in 1981. This
provided a foundational algorithm for estimating optical flow in a local window of an
image. The Horn-Schunck algorithm followed soon after, introducing a global approach
to optical flow estimation across the entire image.
Optical flow estimation relies on the assumption that the brightness of a point is constant over short periods. Mathematically, this is expressed through the optical flow equation, Ix·vx + Iy·vy + It = 0, where:
● Ix and Iy reflect the spatial gradients of the pixel intensity in the x and y directions, respectively
● It is the temporal gradient
● vx and vy are the flow velocities in the x and y directions, respectively.
More recent breakthroughs involve leveraging deep learning models like FlowNet,
FlowNet 2.0, and LiteFlowNet. These models transformed optical flow estimation by
significantly improving accuracy and computational efficiency. This is largely because
of the integration of Convolutional Neural Networks (CNNs) and the availability of large
datasets.
Even in settings with occlusions, modern optical flow techniques can accurately estimate complicated patterns of apparent motion.
Techniques and Algorithms for Optical Flow
Different types of optic flow algorithms, each with a unique way of calculating the
pattern of motion, led to the evolution of computational approaches. Traditional
algorithms like the Lucas-Kanade and Horn-Schunck methods laid the groundwork for
this area of computer vision.
The Lucas-Kanade Method
This method caters to use cases with a sparse feature set. It operates on the assumption that flow is locally smooth, applying a Taylor-series approximation to the image gradients so that the optical flow equation, with its two unknown velocity components per point, can be solved over a small window around each feature. The method is highly efficient for tracking well-defined corners and textured patches, which are often identified by Shi-Tomasi corner detection or the Harris corner detector.
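A typical sparse Lucas-Kanade pipeline with OpenCV looks like the sketch below (the frame file names are placeholders): detect Shi-Tomasi corners in the first frame, then track them into the next frame with the pyramidal Lucas-Kanade routine.

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (paths are placeholders).
prev_gray = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# 1. Sparse feature set: Shi-Tomasi corners in the first frame.
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=7)

# 2. Pyramidal Lucas-Kanade: solve the local optical flow equation in a
#    small window around each corner.
p1, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None,
                                           winSize=(21, 21), maxLevel=3)

# 3. Keep only successfully tracked points; the difference is the flow vector.
good_new = p1[status.ravel() == 1]
good_old = p0[status.ravel() == 1]
flow = (good_new - good_old).reshape(-1, 2)
print("tracked", len(flow), "points, mean displacement:", flow.mean(axis=0))
```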
However, novel deep learning algorithms have ushered in a new era of optical flow estimation. Models like FlowNet, LiteFlowNet, and PWC-Net use CNNs to learn from vast datasets of images. This enables flow prediction with greater accuracy and robustness in challenging scenarios, for example in scenes with occlusions, varying illumination, and complex dynamic textures.