AIR Notes

The document provides an overview of robotics, including definitions, types of robots, the three laws of robotics, and components such as sensors and actuators. It also discusses artificial intelligence in robotics, machine learning techniques, and various robot configurations. Additionally, it covers concepts like kinematics, computer vision, and the role of sensors and transducers in robotic systems.

Unit 1: Intro to robotics

1. ROBOT: A re-programmable, multifunctional, automatic industrial machine designed to replace humans in hazardous work (a mechanical device that performs human tasks, either automatically or by remote control).

WHY IS IT NEEDED?
• Speed
• Can work at hazardous/dangerous temperatures
• Can do repetitive tasks
• Can work with accuracy
THREE LAWS OF ROBOTICS:

Zeroth Law : A robot may not injure humanity, or, through inaction, allow
humanity to come to harm
First Law : A robot may not injure a human being, or, through inaction, allow a
human being to come to harm, unless this would violate a higher order law
Second Law: A robot must obey orders given it by human beings, except where
such orders would conflict with a higher order law
Third Law : A robot must protect its own existence as long as such protection
does not conflict with a higher order law.
TYPES OF ROBOTS:
1. Stationary robots: perform repetitive tasks without ever moving an inch.
2. Autonomous robots: self-supporting or, in other words, self-contained; in a way they rely on their own 'brains'.
3. Virtual robots: just programs, building blocks of software inside a computer.
4. Remote controlled robots: perform difficult and usually dangerous tasks without a human being at the spot.
5. Mobile robots: rolling or walking; they explore areas or imitate a human being.

Robot control loop:


Robot Components:
1. Manipulator or Rover: Main body of the robot (links, joints, and other structural elements of the robot).
2. End Effector: The part connected to the last joint (hand) of a manipulator.
3. Actuators: Muscles of the manipulator (servomotors, stepper motors, pneumatic and hydraulic cylinders).
4. Sensors: Collect information about the internal state of the robot or communicate with the outside environment.
5. Controller: Like the cerebellum; it controls and coordinates the motion of the actuators.
6. Processor: The brain of the robot. It calculates the motions and the velocities of the robot's joints, etc.
7. Software: Operating system, robotic software, and the collection of routines.

SENSORS:

END EFFECTOR: An end effector is the device at the end of a robotic arm, designed to interact with the environment. It may consist of a gripper or a tool. The gripper can have two fingers, three fingers, or even five fingers.

ACTUATORS: A device that converts energy (like electricity, hydraulic


pressure, or air pressure) into physical motion, essentially acting as the
"muscle" of a robot, allowing it to move its joints, limbs, or other parts to
perform actions; it's the component that generates the force needed for a
robot to move or manipulate objects.
(locomotion, manipulation)
Locomotion:

Manipulation:

2. AIML IN ROBOTICS:
What is AI?
It is the simulation of human intelligence in machines enabling them to perform
cognitive tasks. These tasks include learning, reasoning, problem-solving, perception,
language understanding, and decision-making. AI systems can be rule-based
(symbolic AI) or data-driven (machine learning and deep learning), allowing them to improve
performance over time. AI is used in various fields, such as healthcare, finance, social media,
automation, and robotics.

Types:
Strong Artificial Intelligence
1. Computers thinking at a level that meets or surpasses people.
2. Abstract reasoning & thinking.
3. Not in use right now.
Weak (Pattern-Based) Artificial Intelligence
1. Machines solve problems by detecting useful patterns (pattern-based AI).
2. Has been used to automate many processes (dominant mode of AI today).
3. Examples: driving, language translation.
MAJOR AI TECHNIQUES

RULE/LOGIC BASED: Uses predefined rules and logic to make decisions.
How It Works: Follows "if-then" rules programmed by experts.
Strengths:
• Transparent decision-making process.
• Works well for structured problems with clear rules.
Limitations:
• Struggles with complex, unstructured data.
• Requires manual updates for new rules.
Examples: Expert systems, chatbots with predefined responses, medical diagnosis systems.

DATA DRIVEN (PATTERN-BASED): Learns patterns from large datasets and improves over time.
How It Works: Uses algorithms like neural networks, decision trees, and deep learning models.
Strengths:
• Can handle complex, unstructured data (images, speech, text).
• Continuously improves with more data.
Limitations:
• Requires large amounts of data for training.
• Less transparent ("black-box" nature).
Examples: Image recognition, self-driving cars, AI chatbots like ChatGPT.

MACHINE LEARNING: Branch of AI that enables systems to learn and improve from experience
without explicit programming. It develops algorithms that analyse data,
identify patterns, and make decisions. The goal is for computers to learn automatically and
adapt without human intervention.
Uses and Abuses:

• Predict the outcomes of elections


• Identify and filter spam messages from e-mail
• Foresee criminal activity
• Automate traffic signals according to road conditions
• Produce financial estimates of storms and natural disasters
• Examine customer churn
• Create auto-piloting planes and auto-driving cars
• Stock market prediction
• Target advertising to specific types of consumers

Recognizing patterns: Pattern recognition is the automated recognition of patterns and


regularities in data. It has applications in statistical data analysis, signal processing, image
analysis, information retrieval, data compression, machine learning.

How do machines learn?

Machines learn through machine learning (ML), which enables systems to improve their performance based on experience rather than explicit programming. The learning process generally follows these steps:

1. Data Collection – Machines require large datasets to recognize patterns and make
predictions.

2. Data Processing – The collected data is cleaned, structured, and prepared for
analysis.

3. Model Training – An algorithm is trained on the data, learning patterns and relationships (fitting an ML model to the dataset); the model's parameters are adjusted so that it can identify patterns and relationships within the data.

4. Evaluation – The trained model is tested on new data to assess its accuracy and
performance.

5. Optimization – The model is fine-tuned to improve accuracy and reduce errors.

6. Prediction & Deployment – The final model is used to make predictions or automate tasks in real-world applications.

Machines learn using different approaches, including supervised learning (regression, classification), unsupervised learning (clustering, anomaly detection), and reinforcement learning (learning through rewards and penalties).
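The same cycle can be sketched in code. Below is a minimal, illustrative example of the collect / split / train / evaluate / predict steps; scikit-learn and its built-in iris dataset are assumed here for convenience and are not part of the notes.

```python
# Minimal sketch of the ML workflow described above (assumes scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1-2. Data collection and preparation
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 3. Model training (fitting an ML model to the dataset)
model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)

# 4-5. Evaluation on unseen data (then tune hyperparameters and retrain)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# 6. Prediction / deployment: the fitted model is reused on new samples
print("prediction:", model.predict(X_test[:1]))
```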
TYPES OF ALGORITHMS IN ML:

INTELLIGENT ROBOTS: Intelligent robots are autonomous machines equipped with artificial
intelligence (AI) that enable them to perceive their environment, process
information, and make decisions to perform tasks without constant human intervention.
They integrate machine learning, computer vision, natural language processing, and sensor
technologies to adapt, learn from experiences, and improve performance over time. EX: TESLA,
DRONE, ALEXA

Scope of ML in Robotics:

AI enhances vision for object detection, grasping for optimal handling, motion control for dynamic
interaction and obstacle avoidance, and data analysis for proactive decision-
making.

COMPUTER VISION V/S ROBOT/MACHINE VISION

While computer vision focuses purely on image processing through algorithms, robot
vision or machine vision involves both software and hardware (like cameras and
sensors) that allow robots to perceive and interact with their environment.

Machine vision is responsible for advancements in robot guidance and inspection systems,
enabling automation.
The key difference is that robot vision also considers kinematics, meaning how a robot
understands its position and movements in a 3D space, allowing it to physically
interact with its surroundings.

IMITATION LEARNING:

• A machine learning technique where an AI or robot learns to perform tasks by observing and
mimicking human or expert demonstrations, rather than learning through trial and error.
(used in self-driving cars, robotics, gaming ai etc)
• Bayesian or probabilistic models are a common feature of this.

SELF SUPERVISED LEARNING:

• It is a type of machine learning where the model generates its own labels from raw data
instead of relying on manually labelled data.
• It serves as a bridge between supervised and unsupervised learning, allowing AI to learn
representations from vast amounts of unlabelled data.
• [A priori (pre-existing) training and data captured at close range are used to interpret long-range, ambiguous sensor data.] Examples – WatchBot, FSVMs, road probabilistic distribution model (RPDM)

MULTI-AGENT LEARNING:

Multi-Agent Learning (MAL) is a branch of machine learning where multiple AI agents learn and
interact within a shared environment. These agents can either collaborate, compete, or
coexist to achieve individual or collective goals. (coordination and negotiation)

KINEMATICS: Branch of classical mechanics which describes the motion of points


(alternatively “particles”), bodies (objects), and systems of bodies without consideration of the
masses of those objects nor the forces that may have caused the motion; often referred to as
“geometry of motion”.
BAYESIAN MODELS IN ML: Statistical models based on Bayes' theorem, which updates
probabilities as more evidence becomes available. They are widely used in machine learning, data
science, and AI for uncertainty estimation, decision-making, and probabilistic
reasoning.
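For reference, Bayes' theorem itself is P(H|D) = P(D|H)·P(H) / P(D). A small illustrative example (the numbers are assumed for illustration only): if a robot joint fault has prior probability 0.01, and a diagnostic sensor raises an alarm for 95% of real faults but also for 5% of fault-free cases, then P(fault | alarm) = (0.95·0.01) / (0.95·0.01 + 0.05·0.99) ≈ 0.16, i.e., observing the alarm updates the fault probability from 1% to about 16%.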

INVERSE OPTIMAL CONTROL OR INVERSE REINFORCEMENT LEARNING:

Inverse Optimal Control (IOC), also called Inverse Reinforcement Learning (IRL), is the process of
recovering an unknown reward function in a Markov Decision Process (MDP) by observing an
expert's optimal behaviour (policy).

Markov Decision Process (MDP) – A framework used in reinforcement learning where an agent
takes actions in a given state to maximize rewards over time.

3. ROBOT CONFIGURATIONS:

a) Cartesian/Rectangular Gantry (3P) (XYZ configuration): 3 linear/prismatic joints, works in a rectangular-prism workspace, with linear motion along the three principal axes.
Applications: Carrying high payloads, pick and place, 3D printing

STRUCTURE AND WORKSPACE:

b) Cylindrical (R2P): Cylindrical coordinate robots have 2 prismatic joints perpendicular to each other and one revolute joint mounted on a rotary axis.
Applications: simple assembly tasks, painting, material handling tasks.

Advantages: (precision, efficiency, versatility, easy to use)

Disadvantages: (limited flexibility, takes up space, high initial cost)


STRUCTURE AND WORKSPACE:

c) Spherical joint (2RP)/Polar Configuration: Operates in a spherical work volume, which is larger than the work volume of the cylindrical and Cartesian configurations. Has 2 revolute (rotary) joints and 1 prismatic joint.

Applications: surgeries, welding, entertainment STRUCTURE

AND WORKSPACE:

d) Articulated/anthropomorphic (3R)/Revolute: An articulated robot's joints are all revolute, like a human's arm. Its work envelope is also spherical.
Applications: Cutting steel, flat glass handling, spot welding
STRUCTURE AND WORKSPACE:

e) Selective Compliance Assembly Robot Arm (SCARA) (2R1P): They have


two revolute joints that are parallel and allow the Robot to move in a horizontal
plane, plus an additional prismatic joint that moves vertically. Its work
envelope is cylindrical in shape.
Applications: Assembly, packaging, machine loading

STRUCTURE AND WORK ENVELOPE:


REFERENCE FRAMES:

1. World Reference Frame:


o A universal coordinate system (x, y, z) for robot motion.
o All joints move simultaneously to achieve movement along major axes.
2. Joint Reference Frame:
o Defines motion for each individual joint separately.
o Only one joint moves at a time while others remain stationary.
3. Tool Reference Frame:
o A coordinate system (x', y', z') attached to the robot's hand/tool.
o All joints move together to control the tool's motion relative to this frame.

WRIST: DOF – 3 (PITCH- up down, YAW- left right, ROLL-rotation around arm)

CONTROL METHODS: Control could mean two things. One is motion control strategy, i.e., whether
a robot is servo controlled or not, and the other one is how the motion path is achieved, i.e., point-to-
point or continuous.

1) SERVO (closed loop + feedback) / NON-SERVO (open loop):


2) POINT TO POINT AND CONTINUOUS:

COORDINATE TRANSFORMATION:

- Coordinate systems are used to describe the locations of points in space.


- Multiple coordinate systems make graphics algorithms easier to implement.
- Transformations convert points between coordinate systems.

2D Affine Transformations:

Any affine transformation can be composed as a sequence of simple


transformations:
• Translation
• Scaling
• Rotation
• Shear
• Reflection.
1. 2D TRANSLATION: x' = x + bx and y' = y + by

3D TRANSLATION: x' = x + bx, y' = y + by, z' = z + bz

2. 2D SCALING (x' = x·Sx and y' = y·Sy) AND 3D SCALING (x' = x·Sx, y' = y·Sy, z' = z·Sz)

3. 2D ROTATION (ABOUT ORIGIN): x' = x·cosθ − y·sinθ and y' = x·sinθ + y·cosθ

3D ROTATION (X ROLL, Y ROLL, Z ROLL) ABOUT ORIGIN: a rotation by θ about one principal axis leaves that coordinate unchanged and applies the 2D rotation to the other two; e.g. a z-roll gives x' = x·cosθ − y·sinθ, y' = x·sinθ + y·cosθ, z' = z.

Combined: successive transformations are composed by multiplying their (homogeneous) transformation matrices.

Rotation about a point (px, py): translate so the point moves to the origin, rotate about the origin, then translate back.

2D SHEAR:

Along X axis: x' = x + shx·y and y' = y

Along Y: x' = x and y' = y + shy·x

2D REFLECT:

About X axis: x' = x and y' = −y

About Y axis: x' = −x and y' = y
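A minimal sketch of how these transformations compose, using 3×3 homogeneous matrices and NumPy (the library and the sample point are assumptions for illustration, not part of the notes):

```python
# Sketch of composing 2D affine transformations with homogeneous matrices.
import numpy as np

def translation(bx, by):
    return np.array([[1, 0, bx],
                     [0, 1, by],
                     [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    return np.array([[sx, 0, 0],
                     [0, sy, 0],
                     [0,  0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]], dtype=float)

# Rotation about an arbitrary point (px, py): translate the point to the
# origin, rotate, translate back. Matrices compose right-to-left.
def rotation_about_point(theta, px, py):
    return translation(px, py) @ rotation(theta) @ translation(-px, -py)

p = np.array([2.0, 1.0, 1.0])                       # point (2, 1) in homogeneous form
print(rotation_about_point(np.pi / 2, 1, 1) @ p)    # -> approximately (1, 2, 1)
```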

UNIT 02: SENSORS AND TRANSDUCERS


1) A sensor is a device that detects and measures a physical property from its
environment and converts it into a signal that can be read by an observer or an
electronic system.

2) A transducer is a device that converts one form of energy into another. It takes an input signal
in one physical form and transforms it into an output signal of a different form, typically
electrical.
3) An actuator is a device that converts energy (usually electrical, hydraulic, or pneumatic) into
mechanical motion. It is used to move or control a mechanism or system.

CLASSIFICATION OF SENSORS:

1) BASED ON MEASURED QUANTITIES

2) BASED ON THEIR OUTPUTS

3) BASED ON THEIR SUPPLY


SOME TERMS:

Non-Linearity Error: Refers to the deviation of the actual output from an ideal straight-line response over the measurement range. It indicates how much the sensor's output deviates from a perfectly linear relationship between input and output.

Hysteresis Error: Occurs when a sensor or transducer gives different output values for the same input, depending on whether the input is increasing or decreasing. This means the output follows a different path when measured during an increasing input versus a decreasing input, creating a loop-like effect.
SENSORS IN ROBOTS:
There are generally two categories of sensors used in robotics: those used for internal purposes and those used for external purposes.

INTERNAL: Used to monitor and control the various joints of the robot; they form a feedback control
loop with the robot controller.
Examples of internal sensors include potentiometers and optical encoders, while tachometers
of various types can be deployed to control the speed of the robot arm.

EXTERNAL: Sensors that are placed outside the robot and help it interact with its surroundings,
equipment, or objects in a workspace. They are essential for coordinating robotic operations in
industrial environments.
Light Detection and Ranging (LiDAR) is a remote sensing technology that uses laser
pulses to measure distances and create 3D models.
Thresholding is the binary conversion technique in which each pixel is converted into
a binary value, either black or white.

Region growing is a collection of segmentation techniques in which the pixels are


grouped in regions called grid elements based on attribute similarities.
Edge detection is considered as the intensity change that occurs in the pixels at the
boundary or edges of a part.
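A minimal sketch of the thresholding operation described above, using a tiny made-up NumPy array in place of a real grayscale camera image (the array values and the threshold are illustrative assumptions):

```python
# Binary thresholding: each pixel becomes black (0) or white (255)
# depending on whether it is below or above a fixed threshold.
import numpy as np

image = np.array([[ 12,  80, 200],
                  [ 90, 150,  30],
                  [220,  60, 140]], dtype=np.uint8)

threshold = 128
binary = np.where(image >= threshold, 255, 0).astype(np.uint8)
print(binary)
# [[  0   0 255]
#  [  0 255   0]
#  [255   0 255]]
```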
NOTE:
UNIT: 03 (KINEMATICS)

Degrees of freedom (DoF) is the number of independent movements a robot
can make. It's also used to describe the motion capabilities of robots,
including humanoid robots.

Forward kinematics refers to the process of determining the position and orientation of the end-
effector (tool or hand) of a robot given the values of its joint parameters (angles or
displacements).
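As a small illustration of forward kinematics, here is a sketch for a hypothetical two-link planar arm (the link lengths and joint angles are assumed values, not from the notes): given the joint angles, it returns the position of the end-effector.

```python
# Forward kinematics of a 2-link planar arm: joint angles -> end-effector (x, y).
import numpy as np

def forward_kinematics(theta1, theta2, l1=1.0, l2=0.5):
    # Each link contributes its projection onto the x and y axes;
    # the second link's orientation is the sum of both joint angles.
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return x, y

print(forward_kinematics(np.pi / 2, 0.0))   # arm pointing straight up -> (~0.0, 1.5)
```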
UNIT: 04 (CV AND ITS APPLICATION IN ROBOTICS)

1) What is computer vision? Computer Vision is a branch of Artificial Intelligence (AI) that enables
computers to interpret and extract information from images and videos, similar to human perception. It
involves developing algorithms to process visual data and derive meaningful insights.
Applications: OCR, 3D VIEW, FACE DETECTION, SMILE DETECTION, BOUNDARY DETECTION, MEDICAL
CV is the inverse of computer graphics: CV's goal is to recognize/understand, while CG's goal is to create/display.

2) Computational Photography: A field that combines image processing, computer vision, computer
graphics and photography to enhance or extend the capabilities of digital cameras using algorithms and
software.

Features: Panorama, Night mode, HDR, Portrait, etc. Uses: Smartphone cameras, LIDAR, AI devices, apps that remove blur.

Intersection of vision and graphics

Mathematics used: Signal and Image processing, Euclidean and projective geometry, Vector calculus,
Partial differential equations, Optimization, Probabilistic estimation.

3) CAMERAS, TRANSFORMATIONS, CALIBRATION:

• Challenge in 3D Vision: Extracting 3D information from 2D inputs.


• Techniques:
o 2D-to-3D reconstruction via standard cameras
o Direct 3D acquisition via range sensors
o Novel Approaches: Camera Arrays, One-pixel cameras, light field cameras
• Pinhole camera: Add a barrier between object and film with a small hole (aperture) to block
unwanted rays. Results in reduced blurring and clearer images.

o Shrinking the aperture: Small aperture = less light = clearer image but the trade-off is
that too small causes diffraction.

• Adding a LENS: A lens focuses light to a point; objects at certain distances are "in focus", while others form blurred circles (circle of confusion); adjusting the lens shape changes the focus distance.

o LENSES:

- Parallel rays focus to a single focal point.


- Focal length (f) depends on lens shape and refractive index.
- Aperture diameter (D) restricts light range.
- Most lenses are spherical for manufacturing simplicity.

Thin lenses:

o Depth of field: Range of ‘in-focus’ area. Smaller aperture = greater depth of field.
• Digital Camera: Replaces film with a sensor array (each cell in the array is a CCD – charge
coupled device; CMOS is also in use nowadays). Each cell converts light (photons) into
electronic signals (electrons).
• Perspective Projection (pinhole camera model): Uses the focal length f' to map a 3D point (x, y, z) to the 2D image point (x', y') = (f'·x/z, f'·y/z).

(projection of the 3D point (x, y, z) onto the image plane gives the 2D (x', y') image coordinates)
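A one-function sketch of this pinhole projection (NumPy, with an illustrative focal length and point):

```python
# Pinhole (perspective) projection: (x, y, z) -> (f*x/z, f*y/z).
import numpy as np

def project(point, f=1.0):
    x, y, z = point
    return np.array([f * x / z, f * y / z])

print(project(np.array([2.0, 1.0, 4.0]), f=2.0))   # -> [1.0, 0.5]
```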

• Orthographic Projection: (camera is infinitely far away, rays are parallel: x' = x, y' = y)

• Camera Calibrations: The process of determining the precise characteristics of a camera's


lens and sensor to accurately map 3D real-world objects to 2D images. To estimate intrinsic and
extrinsic parameters of a camera.

o Method: Use images of known scenes


o Tools: geometric camera models, SVD and constrained least-squares, feature/line
extraction
o Intrinsic Camera Parameters: (camera’s internal geometry)

• Focal Length (f)


• Pixel size (Sx, Sy)
• Image Centre (Ox, Oy)
• Non-linear radial distortion coefficients (k1, k2)

o Coordinate Frames: (EP – extrinsic params, IP – intrinsic params)

World Frame → Camera Frame → Image Frame

o Why calibrate? To relate 2D image points to 3D scene rays, necessary for accurate 3D
reconstruction.

o Extrinsic parameters: [Translation followed by rotation] Two formulations: Pc = R(Pw − T) and Pc = R·Pw + T. [R is the same; T is different]
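A quick numeric check (NumPy, with an illustrative R, T, and point) that the two formulations agree when the translation term of the second form is taken as T' = −R·T, which follows from expanding R(Pw − T):

```python
# The two extrinsic formulations: Pc = R(Pw - T)  vs  Pc = R*Pw + T' with T' = -R*T.
import numpy as np

R = np.array([[0, -1, 0],
              [1,  0, 0],
              [0,  0, 1]], dtype=float)      # 90-degree rotation about z
T = np.array([1.0, 2.0, 3.0])                # camera position in the world frame
Pw = np.array([4.0, 5.0, 6.0])               # an arbitrary world point

Pc_form1 = R @ (Pw - T)
Pc_form2 = R @ Pw + (-R @ T)
print(np.allclose(Pc_form1, Pc_form2))       # True
```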

o Perspective Camera Model:


o BASIC EQUATIONS:

o SUMMARY:
• SINGULAR VALUE DECOMPOSITION (SVD) STEPS: A = U·Σ·Vᵀ

Step 1: (Finding U)

o Calculate A·Aᵀ.
o Then find its eigenvalues (|C − λI| = 0) and the eigenvector for each eigenvalue.
o Organize the eigenvectors in decreasing order of eigenvalue.
o U is obtained after normalizing the eigenvector matrix.

Step 2: (Finding Vᵀ)

o Calculate Aᵀ·A.
o Then find its eigenvalues (characteristic equation λ³ − (sum of diagonal elements)·λ² + (sum of the principal minors of the diagonal elements)·λ − |Mat| = 0) and the eigenvectors (by Cramer's rule) for each eigenvalue.
o Organize the vectors in decreasing order of eigenvalue.
o V is obtained by normalizing the eigenvector matrix.
o Find Vᵀ.

Step 3: (Finding Σ)

o Σ is generated by placing the square roots of the eigenvalues (from Step 2) on the diagonal in decreasing order.
o Assemble the matrices together.
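A short NumPy check of these steps: numpy.linalg.svd returns U, the singular values (the square roots of the eigenvalues of AᵀA, already in decreasing order), and Vᵀ directly; the matrix below is an arbitrary example.

```python
# Verify A = U * Sigma * V^T numerically with NumPy's SVD.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])

U, S, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(S)

print(S)                                   # singular values in decreasing order
print(np.allclose(U @ Sigma @ Vt, A))      # True
```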

• VISION IN ROBOTICS:

o Methods for 3d vision – Used in inspection and measurement of complex shapes.

Common technologies: time of flight (time-based); geometric and angle-based methods such as laser-scanning triangulation (most common), stereo or stereoscopic vision, shape from shading, light-stripe projection, and white-light interferometry.

Line scan principle: Produces a point cloud (not image-like).

(potential blind spots caused by shadowing can be solved by using multiple cameras)

• Why to use 3D vision technology?

o Volumetric measurements (XYZ) for shape and position params.


o Contrast invariant measurements, ideal for low contrast images.
o Resistance to lighting changes
o Simpler multi-sensor configurations for large object inspection.
o High repeatability due to integrated objects, lighting, pre-calibration.
• PROJECTION: A 3D projection (or graphical projection) is a design technique used to display a
three-dimensional (3D) object on a two-dimensional (2D) surface.

o Based on visual perspective and aspect analysis


o Involves imaginary projectors to visualize objects on flat surfaces.
o Two types: Parallel Projection, Perspective Projection

• Parallel Projection – The lines of sight from the object to the projection plane are parallel to
each other. Thus, parallel lines in 3d space remain parallel in 2d projected image.

Equivalent to infinite focal length (zoom) in camera terms (no distortion).

• Perspective projection/transformation - Linear projection where three dimensional objects


are projected on a picture plane (2d). Parallel lines in reality appear to converge, mimics human
eye perception: distant objects look smaller, used in photorealistic rendering and simulations.

• Shape from shading (PHOTOCLINOMETRY): A 2-dimensional image of a surface is


transformed into a surface map that represents different levels of elevation. It uses shadows
and light direction as reference points. Useful in sculpture modelling and terrain mapping.
Helps generate a bump map of a surface, which uses grayscale levels to depict the height of a point on the surface.

• Photometric Stereo: Used to estimate the surface normals and albedo (reflectivity) of an
object by capturing multiple images of the same object under different illumination conditions.
Based on the fact that the amount of light reflected by a surface is dependent on the
orientation of the surface in relation to the light source and the observer. Creates normal
maps from pixel-by-pixel lighting analysis.

More light directions = better accuracy


• Shape from Texture: Computer vision technique that uses texture distortions and patterns in
a 2D image to infer its 3D shape.

• Shape from focus/defocus: A computer vision technique used to reconstruct 3D shape


information from an image sequence where the focal point of the camera is shifted. By
analysing the blur and sharpness in the images, SFF can infer the depth of objects in the scene.

• Active range finding: Use unilateral transmission and passive reflection. (ex: LIDAR, RADAR,
SONAR etc). A signal is radiated toward an object or target of interest and the reflected or
scattered signal is detected by a receiver and used to determine the range.

• Surface Representation: Extension of curve representation to surfaces. Represented as grid


points inside its bounding curves (2D/3D). Parametric and non-parametric equations used.
Techniques: Fitting (pass through all points) and Approximation (smooth transitions between
patches)

• Point based representation: Uses individual points (or "landmarks") to represent an object or
scene, often in a 3D space (point clouds). Local Binary Pattern (LBP) is a local descriptor that
compares each pixel's intensity to its neighbors, creating a binary code.
• Volumetric Representations: A method of representing 3D objects or scenes using a grid of 3D
cells called voxels. Each voxel contains information about a specific property at that location,
such as density, color, or temperature. This representation allows for detailed 3D modeling and
analysis.

o Subtractive and Additive Manufacturing (SM & AM) - In SM, a cutter removes
material from a homogeneous block until the desired shape is formed. AM needs
more flexible and advanced modeling. Volumetric methods aim to support AM’S
demands.

• 3D object Recognition and Reconstruction:

3D object recognition involves recognizing and determining 3D information, such as the pose, volume, or shape, of user-chosen 3D objects in a photograph or range scan. Some algorithms are designed to recognize one specific (pre-identified) object, while others can detect general types of objects like faces or common items.

3D reconstruction is the process of creating a three-dimensional model of objects or scenes


from two-dimensional images by capturing the shape and appearance of real objects. It can be
done by active (LIDAR) or passive (stereo vision) methods. If the model is allowed to change its shape in time, this is referred to as non-rigid or spatio-temporal reconstruction.

• Triangulation: A method used in 3D vision to determine the position or depth of a point in


space by using two or more camera views (or sensors).

o Two cameras capture images of the same object from different angles.
o By knowing the angle and distance between the cameras and measuring where the
object appears in each image, the system forms a triangle.
o Using geometry, it calculates the exact 3D position of the object point (depth or
distance from cameras).
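A minimal linear-triangulation sketch (NumPy, DLT-style; the camera matrices and the 3D point are illustrative assumptions, and this is only one of several possible triangulation methods):

```python
# Two-view triangulation: solve a small linear system (DLT) for the 3D point.
import numpy as np

def triangulate(P1, P2, x1, x2):
    # Each image point contributes two linear equations in the homogeneous 3D point X.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                       # null vector of A (up to scale)
    return X[:3] / X[3]

# Two unit-focal-length cameras one unit apart along x, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
x1 = X_true[:2] / X_true[2]                              # where camera 1 sees the point
x2 = (X_true[:2] + np.array([-1.0, 0.0])) / X_true[2]    # where camera 2 sees the point
print(triangulate(P1, P2, x1, x2))                       # -> approx [0.5, 0.2, 4.0]
```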

• Two problems of STEREO: correspondence and reconstruction

Correspondence: The task of finding matching points between two or more images taken from
different viewpoints, often from a stereo camera pair. It involves determining which pixels in one
image correspond to the same physical location in another, which is crucial for depth
perception and 3D reconstruction. Two main approaches:

• Intensity-based: Attempt to establish a correspondence by matching image intensities.

• Feature-based: Attempt to establish a correspondence by matching sparse sets of image features (edge points, line segments, corners, etc.).
Reconstruction (2d to 3d): Given the corresponding points, we compute the disparity map. The
disparity map can be converted to a 3D map of the scene assuming that the stereo geometry is
known.

• Depth from disparity: Disparity is inversely proportional to depth.
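In the standard rectified stereo setup (focal length f, baseline B, depth Z), the disparity is d = x_l − x_r = f·B/Z, so Z = f·B/d; for example, doubling the disparity halves the estimated depth.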


• Essential Matrix: The Essential Matrix is a 3 x 3 matrix that encodes epipolar geometry. Given a
point in one image, multiplying by the essential matrix will tell us the epipolar line in the second
view.

o Epipolar geometry (geometry of stereo vision): Describes the geometric relationship


between two views of a 3D scene taken by different cameras. It is independent of scene
structure, and only depends on the cameras’ internal parameters and relative pose.

o Epipolar line (l): A line that helps match where the same point in the real world appears in two different camera images.

Why? Because the dot product of two orthogonal vectors is zero: the second image point x' lies on its epipolar line l', so if x'ᵀ·l' = 0 and E·x = l', then x'ᵀ·E·x = 0.


Essential matrix derivation:

• Camera–camera transform (just like the world–camera transform) is: P_r = R(P_l − T).

• The vectors P_l, T, and (P_l − T) are coplanar (they lie in the epipolar plane), so (P_l − T)ᵀ·(T × P_l) = 0.

• Also, P_l − T = Rᵀ·P_r.

• Putting it together: (Rᵀ·P_r)ᵀ·(T × P_l) = P_rᵀ·R·(T × P_l) = 0.

• Now, the cross product changes to a (matrix) dot product: T × P_l = S·P_l.

How? The cross product with T can be written as multiplication by the skew-symmetric matrix S = [T]× built from the components of T.

• Here, E is the essential matrix: E = R·S = R·[T]×, which gives the epipolar constraint P_rᵀ·E·P_l = 0 (and the same relation holds for the corresponding image points).
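A small numeric check of this constraint (NumPy; the rotation, translation, and 3D point are illustrative assumptions), using the convention above, P_r = R(P_l − T) and E = R·[T]×:

```python
# Check the epipolar constraint x_r^T E x_l = 0 for a synthetic stereo pair.
import numpy as np

def skew(t):
    # Skew-symmetric matrix [t]_x such that skew(t) @ v == np.cross(t, v).
    return np.array([[    0, -t[2],  t[1]],
                     [ t[2],     0, -t[0]],
                     [-t[1],  t[0],     0]])

theta = np.deg2rad(10)
R = np.array([[ np.cos(theta), 0, np.sin(theta)],
              [ 0,             1, 0            ],
              [-np.sin(theta), 0, np.cos(theta)]])   # small rotation about y
T = np.array([0.5, 0.0, 0.0])                        # baseline along x

E = R @ skew(T)                                      # essential matrix E = R [T]_x

P_l = np.array([0.3, -0.2, 4.0])     # a 3D point in the left camera frame
P_r = R @ (P_l - T)                  # the same point in the right camera frame

x_l = P_l / P_l[2]                   # normalized image coordinates (left)
x_r = P_r / P_r[2]                   # normalized image coordinates (right)

print(np.isclose(x_r @ E @ x_l, 0.0))   # True
```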


Properties of E matrix:

Fundamental Matrix (F): It is a 3 × 3 matrix of rank 2 (rank: the maximum number of linearly independent rows or columns in the matrix) with 7 degrees of freedom.

If a point in 3-space X is imaged as x in the first view, and x' in the second, then the image points satisfy the relation x'ᵀ·F·x = 0.

Derivation:
Notes:

Summary terms:
