AIR Notes
Zeroth Law: A robot may not injure humanity, or, through inaction, allow humanity to come to harm.
First Law: A robot may not injure a human being, or, through inaction, allow a human being to come to harm, unless this would violate a higher-order law.
Second Law: A robot must obey orders given it by human beings, except where such orders would conflict with a higher-order law.
Third Law: A robot must protect its own existence as long as such protection does not conflict with a higher-order law.
TYPES OF ROBOTS:
1. Stationary robots: perform repeating tasks without ever moving an inch (robots are not only used to explore areas or imitate a human being).
2. Autonomous robots: self-supporting or, in other words, self-contained; in a way they rely on their own ‘brains’.
3. Virtual robots: just programs, building blocks of software inside a computer.
4. Remote-controlled robots: perform difficult and usually dangerous tasks without the operator being at the spot.
5. Mobile robots: move around, e.g., by rolling or walking.
SENSORS:
END EFFECTOR: An end effector is the device at the end of a robotic arm, designed to interact with the environment. It may consist of a gripper or a tool. The gripper can have two, three, or even five fingers.
Manipulation:
2. AIML IN ROBOTICS:
What is AI?
It is the simulation of human intelligence in machines enabling them to perform
cognitive tasks. These tasks include learning, reasoning, problem-solving, perception,
language understanding, and decision-making. AI systems can be rule-based
(symbolic AI) or data-driven (machine learning and deep learning), allowing them to improve
performance over time. AI is used in various fields, such as healthcare, finance, social media,
automation, and robotics.
MACHINE LEARNING: Branch of AI that enables systems to learn and improve from experience
without explicit programming. It develops algorithms that analyse data,
identify patterns, and make decisions. The goal is for computers to learn automatically and
adapt without human intervention.
Uses and Abuses:
Machines learn through machine learning (ML), which enables systems to improve their performance based on experience rather than explicit programming. The learning process generally follows these steps (a short code sketch follows the list):
1. Data Collection – Machines require large datasets to recognize patterns and make
predictions.
2. Data Processing – The collected data is cleaned, structured, and prepared for
analysis.
3. Model Training – A learning algorithm is applied to the processed data to build a model that captures the patterns.
4. Evaluation – The trained model is tested on new data to assess its accuracy and performance.
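A minimal sketch of these steps in Python with scikit-learn (the dataset and classifier are illustrative choices, not part of the notes):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# 1. Data collection: load a ready-made dataset.
X, y = load_iris(return_X_y=True)

# 2. Data processing: split into train/test sets and scale the features.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Model training: fit a classifier to the training data.
model = LogisticRegression(max_iter=200).fit(X_train, y_train)

# 4. Evaluation: test the trained model on unseen data.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))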
INTELLIGENT ROBOTS: Intelligent robots are autonomous machines equipped with artificial intelligence (AI) that enables them to perceive their environment, process information, and make decisions to perform tasks without constant human intervention. They integrate machine learning, computer vision, natural language processing, and sensor technologies to adapt, learn from experiences, and improve performance over time. EX: Tesla, drones, Alexa.
Scope of ML in Robotics:
AI enhances vision for object detection, grasping for optimal handling, motion control for dynamic
interaction and obstacle avoidance, and data analysis for proactive decision-
making.
While computer vision focuses purely on image processing through algorithms, robot
vision or machine vision involves both software and hardware (like cameras and
sensors) that allow robots to perceive and interact with their environment.
Machine vision is responsible for advancements in robot guidance and inspection systems,
enabling automation.
The key difference is that robot vision also considers kinematics, meaning how a robot
understands its position and movements in a 3D space, allowing it to physically
interact with its surroundings.
IMITATION LEARNING:
• A machine learning technique where an AI or robot learns to perform tasks by observing and
mimicking human or expert demonstrations, rather than learning through trial and error.
(used in self-driving cars, robotics, gaming AI, etc.)
• Bayesian or probabilistic models are a common feature of this.
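A minimal behavioural-cloning sketch (one common form of imitation learning): fit a supervised model to expert state-action pairs instead of learning by trial and error. The "expert rule" and the data below are invented for illustration:

import numpy as np
from sklearn.neural_network import MLPRegressor

# Behavioural cloning: learn to mimic an expert policy from demonstrations.
# Illustrative data: expert states (sensor readings) and the actions taken.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(500, 4))           # e.g., 4 sensor values
actions = states @ np.array([0.5, -0.2, 0.1, 0.7])   # hypothetical expert rule

# Supervised learning on (state, action) pairs -- no trial and error needed.
policy = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(states, actions)

new_state = rng.uniform(-1, 1, size=(1, 4))
print("imitated action:", policy.predict(new_state))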
SELF-SUPERVISED LEARNING:
• A type of machine learning where the model generates its own labels from raw data instead of relying on manually labelled data.
• It serves as a bridge between supervised and unsupervised learning, allowing AI to learn representations from vast amounts of unlabelled data.
• Uses a priori (pre-existing) training and data captured at close range to interpret long-range, ambiguous sensor data. EXAMPLES – Watch-Bot, FSVMs, road probabilistic distribution model (RPDM)
MULTI-AGENT LEARNING:
Multi-Agent Learning (MAL) is a branch of machine learning where multiple AI agents learn and
interact within a shared environment. These agents can either collaborate, compete, or
coexist to achieve individual or collective goals. (coordination and negotiation)
Inverse Optimal Control (IOC), also called Inverse Reinforcement Learning (IRL), is the process of
recovering an unknown reward function in a Markov Decision Process (MDP) by observing an
expert's optimal behaviour (policy).
Markov Decision Process (MDP) – A framework used in reinforcement learning where an agent
takes actions in a given state to maximize rewards over time.
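A toy sketch of an MDP solved by value iteration (the transition probabilities, rewards, and discount factor below are made up for illustration):

import numpy as np

# Toy MDP: 3 states, 2 actions. P[a][s][t] is the probability of moving from
# state s to state t under action a; R[s][a] is the immediate reward.
P = np.array([
    [[0.8, 0.2, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],   # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],   # action 1
])
R = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
gamma = 0.9  # discount factor

# Value iteration: repeatedly back up the best expected return per state.
V = np.zeros(3)
for _ in range(100):
    Q = R + gamma * np.einsum("ast,t->sa", P, V)   # Q[s][a]
    V = Q.max(axis=1)
print("optimal state values:", V)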
3. ROBOT CONFIGURATIONS AND WORKSPACE:
WRIST: DOF – 3 (PITCH – up/down, YAW – left/right, ROLL – rotation around the arm)
CONTROL METHODS: Control could mean two things. One is the motion control strategy, i.e., whether a robot is servo-controlled or not; the other is how the motion path is achieved, i.e., point-to-point or continuous.
COORDINATE TRANSFORMATION:
2D Affine Transformations: In homogeneous coordinates a point is written [x, y, 1]ᵀ and transformed by a 3×3 matrix, e.g., translation [[1, 0, tx], [0, 1, ty], [0, 0, 1]], rotation [[cosθ, −sinθ, 0], [sinθ, cosθ, 0], [0, 0, 1]], and scaling [[sx, 0, 0], [0, sy, 0], [0, 0, 1]].
3D TRANSLATION: [x′, y′, z′, 1]ᵀ = [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]] · [x, y, z, 1]ᵀ
Combined: successive transformations combine by matrix multiplication, e.g., M = T·R·S.
3D ROTATION:
Along X axis: Rx(θ) = [[1, 0, 0], [0, cosθ, −sinθ], [0, sinθ, cosθ]], or its 4×4 homogeneous form.
Along Y: Ry(θ) = [[cosθ, 0, sinθ], [0, 1, 0], [−sinθ, 0, cosθ]], or its 4×4 homogeneous form.
2D REFLECT:
Along X: [[1, 0], [0, −1]], mapping (x, y) to (x, −y).
Along Y: [[−1, 0], [0, 1]], mapping (x, y) to (−x, y).
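A small numpy sketch of composing these homogeneous transforms (the angle, translation, and test point are illustrative):

import numpy as np

# Sketch: 2D homogeneous transforms composed by matrix multiplication.
theta = np.deg2rad(90)
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0, 0, 1]])                        # rotate by 90 degrees
T = np.array([[1, 0, 2],
              [0, 1, 3],
              [0, 0, 1]])                        # translate by (2, 3)
Fx = np.diag([1, -1, 1])                         # reflect about the x-axis

p = np.array([1.0, 2.0, 1.0])                    # point (1, 2) in homogeneous form
print((T @ R @ p).round(3))                      # rotate, then translate -> (0, 4)
print((Fx @ p).round(3))                         # reflection -> (1, -2)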
1) A sensor is a device that detects a physical quantity (such as light, temperature, or pressure) in its environment and produces a corresponding output signal.
2) A transducer is a device that converts one form of energy into another. It takes an input signal in one physical form and transforms it into an output signal of a different form, typically electrical.
3) An actuator is a device that converts energy (usually electrical, hydraulic, or pneumatic) into
mechanical motion. It is used to move or control a mechanism or system.
CLASSIFICATION OF SENSORS:
Non-Linearity Error: Refers to the deviation of the actual output from an ideal straight-line response over the measurement range. It indicates how much the sensor's output deviates from a perfectly linear relationship between input and output.
Hysteresis Error: Occurs when a sensor or transducer gives different output values for the same input, depending on whether the input is increasing or decreasing. This means the output follows a different path when measured during an increasing input versus a decreasing input, creating a loop-like effect.
SENSORS IN ROBOTS:
There are generally two categories of sensors used in robotics: those for internal purposes and those for external purposes.
INTERNAL: Used to monitor and control the various joints of the robot; they form a feedback control
loop with the robot controller.
Examples of internal sensors include potentiometers and optical encoders, while tachometers
of various types can be deployed to control the speed of the robot arm.
EXTERNAL: Sensors that are placed outside the robot and help it interact with its surroundings,
equipment, or objects in a workspace. They are essential for coordinating robotic operations in
industrial environments.
Light Detection and Ranging (LiDAR) is a remote sensing technology that uses laser
pulses to measure distances and create 3D models.
Thresholding is a binary conversion technique in which each pixel is converted into a binary value, either black or white.
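A small numpy sketch of global thresholding (the image and threshold value are illustrative):

import numpy as np

# Sketch: simple global thresholding of a toy grayscale image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4), dtype=np.uint8)
threshold = 128
binary = np.where(image >= threshold, 255, 0).astype(np.uint8)  # white or black
print(binary)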
Degrees of freedom (DoF) is the number of independent movements a robot
can make. It's also used to describe the motion capabilities of robots,
including humanoid robots.
Forward kinematics refers to the process of determining the position and orientation of the end-
effector (tool or hand) of a robot given the values of its joint parameters (angles or
displacements).
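A minimal sketch of forward kinematics for a 2-link planar arm (the link lengths and joint angles are illustrative):

import numpy as np

# Sketch: forward kinematics of a 2-link planar arm.
def forward_kinematics(theta1, theta2, l1=1.0, l2=0.8):
    """Return (x, y) of the end-effector given joint angles in radians."""
    x = l1 * np.cos(theta1) + l2 * np.cos(theta1 + theta2)
    y = l1 * np.sin(theta1) + l2 * np.sin(theta1 + theta2)
    return x, y

print(forward_kinematics(np.deg2rad(30), np.deg2rad(45)))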
UNIT: 04 (CV AND ITS APPLICATION IN ROBOTICS)
1) What is computer vision? Computer Vision is a branch of Artificial Intelligence (AI) that enables
computers to interpret and extract information from images and videos, similar to human perception. It
involves developing algorithms to process visual data and derive meaningful insights.
Applications: OCR, 3D VIEW, FACE DETECTION, SMILE DETECTION, BOUNDARY DETECTION, MEDICAL
CV is the inverse of computer graphics: CV's goal is to recognize/understand, while CG's goal is to create/display.
2) Computational Photography: A field that combines image processing, computer vision, computer
graphics and photography to enhance or extend the capabilities of digital cameras using algorithms and
software.
Features: Panorama, Night mode, HDR, Portrait, etc. Uses: Smartphone cameras, LiDAR, AI devices, apps that remove blur.
Mathematics used: Signal and Image processing, Euclidean and projective geometry, Vector calculus,
Partial differential equations, Optimization, Probabilistic estimation.
o Shrinking the aperture: Small aperture = less light = clearer image, but the trade-off is that too small an aperture causes diffraction.
• Adding a LENS: Lens focuses light to a point, objects at certain distances are "in focus", Others
form blurred circles (circle of confusion), adjusting lens shape changes focus distance.
o LENSES:
Thin lenses: satisfy the thin-lens equation 1/zₒ + 1/zᵢ = 1/f, relating object distance zₒ, image distance zᵢ, and focal length f.
o Depth of field: Range of ‘in-focus’ area. Smaller aperture = greater depth of field.
• Digital Camera: Replaces film with a sensor array (each cell in the array is a CCD – charge
coupled device; CMOS is also in use nowadays). Each cell converts light (photons) into
electronic signals (electrons).
• Perspective Projection (pinhole camera model): Uses focal length f′ to map a 3D point (x, y, z) to the 2D image point (x′, y′) = (f′·x/z, f′·y/z)
(the projection of the 3D point (x, y, z) onto the image plane gives the 2D (x′, y′) image coordinates; a small sketch follows the calibration note)
o Why calibrate? To relate 2D image points to 3D scene rays, necessary for accurate 3D
reconstruction.
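A small sketch of the pinhole projection above (the focal length and points are illustrative):

import numpy as np

# Sketch: pinhole perspective projection (x', y') = (f*x/z, f*y/z).
def project(points_3d, f=1.0):
    """Map Nx3 camera-frame points (x, y, z) to Nx2 image points."""
    points_3d = np.asarray(points_3d, dtype=float)
    return f * points_3d[:, :2] / points_3d[:, 2:3]

print(project([[1.0, 2.0, 4.0], [0.5, 0.5, 2.0]], f=2.0))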
• SINGULAR VALUE DECOMPOSITION (SVD) STEPS: A = UΣVᵀ
Step 1: (Finding U)
o Calculate A·Aᵀ.
o Then find its eigenvalues (solve |A·Aᵀ − λI| = 0) and an eigenvector for each eigenvalue.
o Organize the eigenvectors in decreasing order of eigenvalue.
o U is obtained after normalizing the eigenvector matrix.
Step 2: (Finding Vᵀ)
o Calculate Aᵀ·A.
o Then find its eigenvalues (for a 3×3 matrix: λ³ − (sum of diagonal elements)λ² + (sum of minors of the diagonal elements)λ − |Aᵀ·A| = 0) and the eigenvectors (by Cramer's rule) for each eigenvalue.
o Organize the vectors in decreasing order of eigenvalue.
o V is obtained by normalizing the eigenvector matrix.
o Find Vᵀ.
Step 3: (Finding Σ)
o Σ is the matrix generated by placing the square roots of the eigenvalues (from Step 2) diagonally in decreasing order.
o Assemble the matrices together: A = UΣVᵀ.
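In practice the decomposition can be checked numerically; a small numpy sketch (the matrix is illustrative):

import numpy as np

# Sketch: SVD of a small matrix, verifying A = U @ diag(S) @ Vt.
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
U, S, Vt = np.linalg.svd(A)          # singular values come in decreasing order
print("singular values:", S)
print("reconstruction ok:", np.allclose(A, U @ np.diag(S) @ Vt))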
• VISION IN ROBOTICS:
Common technologies: time of flight (time-based); geometric/angle-based, e.g., laser-scanning triangulation (most common), stereo or stereoscopic vision, shape from shading, light-stripe projection, and white-light interferometry.
(Potential blind spots caused by shadowing can be solved by using multiple cameras.)
• Parallel Projection – The lines of sight from the object to the projection plane are parallel to each other. Thus, parallel lines in 3D space remain parallel in the 2D projected image.
• Photometric Stereo: Used to estimate the surface normals and albedo (reflectivity) of an
object by capturing multiple images of the same object under different illumination conditions.
Based on the fact that the amount of light reflected by a surface is dependent on the
orientation of the surface in relation to the light source and the observer. Creates normal
maps from pixel-by-pixel lighting analysis.
• Active range finding: Uses unilateral transmission and passive reflection (e.g., LiDAR, RADAR, SONAR). A signal is radiated toward an object or target of interest, and the reflected or scattered signal is detected by a receiver and used to determine the range.
• Point based representation: Uses individual points (or "landmarks") to represent an object or
scene, often in a 3D space (point clouds). Local Binary Pattern (LBP) is a local descriptor that
compares each pixel's intensity to its neighbors, creating a binary code.
• Volumetric Representations: A method of representing 3D objects or scenes using a grid of 3D
cells called voxels. Each voxel contains information about a specific property at that location,
such as density, color, or temperature. This representation allows for detailed 3D modeling and
analysis.
o Subtractive and Additive Manufacturing (SM & AM) – In SM, a cutter removes material from a homogeneous block until the desired shape is formed. AM needs more flexible and advanced modeling; volumetric methods aim to support AM's demands.
3D object recognition involves recognizing and determining 3D information, such as the pose,
volume, or shape, of user-chosen 3D objects in a photograph or range scan. Some algorithms
are designed to recognize one specific object (pre-identified), while others can detect general types of objects like faces or common items.
• Stereo Triangulation (a small depth sketch follows these points):
o Two cameras capture images of the same object from different angles.
o By knowing the angle and distance between the cameras and measuring where the
object appears in each image, the system forms a triangle.
o Using geometry, it calculates the exact 3D position of the object point (depth or
distance from cameras).
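A minimal depth-from-disparity sketch for the rectified two-camera setup above (the focal length, baseline, and pixel coordinates are illustrative):

import numpy as np

# Sketch: depth from stereo disparity with rectified cameras.
f = 700.0        # focal length in pixels
baseline = 0.12  # distance between the two cameras, in metres
x_left, x_right = 320.0, 290.0   # column of the same point in each image

disparity = x_left - x_right
depth = f * baseline / disparity  # standard rectified-stereo relation
print(f"depth = {depth:.2f} m")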
Correspondence: The task of finding matching points between two or more images taken from different viewpoints, often from a stereo camera pair. It involves determining which pixels in one image correspond to the same physical location in another, which is crucial for depth perception and 3D reconstruction. Two main approaches: correlation-based matching (comparing intensity windows) and feature-based matching (comparing detected features).
o Epipolar line (𝒍): A line that helps match where the same point in the real world
appears in two different camera images.
• If x in the first image and x′ in the second are projections of the same 3D point, then x′ must lie on the epipolar line l′ = F·x.
• Also, a point x′ lies on a line l′ exactly when x′ᵀ·l′ = 0.
• Putting it together: x′ᵀ·F·x = 0 (the epipolar constraint).
Fundamental Matrix (F): It is a 3 × 3 matrix of rank 2. (rank: the maximum number of linearly
independent rows or columns in the matrix) with 7 degrees of freedom.
If a point in 3-space X is imaged as x in the first view and as x′ in the second, then the image points satisfy the relation x′ᵀ·F·x = 0.
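A small sketch of using F: multiplying a point in one image by F gives the epipolar line in the other (the F below is an invented rank-2 example, skew-symmetric as arises for a pure camera translation):

import numpy as np

# Sketch: the epipolar constraint x'^T F x = 0.
F = np.array([[ 0.0, -0.1,  0.2],
              [ 0.1,  0.0, -0.3],
              [-0.2,  0.3,  0.0]])   # illustrative rank-2 fundamental matrix
x = np.array([1.0, 2.0, 1.0])        # point in image 1 (homogeneous)
l_prime = F @ x                      # epipolar line in image 2
# Any matching point x' on this line satisfies x'.T @ l_prime == 0.
print("epipolar line l' =", l_prime)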