Augmented Reality - Unit 5
Computer Vision for Augmented Reality
Computer vision for AR is concerned with
electronically perceiving and understanding
imagery from camera sensors that can inform
the AR system about the user and the
surrounding environment.
Computer vision forms the backbone of AR: it enables the system to
analyze real-world imagery, build a spatial understanding of the
scene, and overlay virtual graphics accurately.
Processes
• Spatial Mapping and Scene Reconstruction
Using visual inputs from smartphone cameras or specialized depth sensors
on AR headsets, computer vision algorithms construct detailed 3D maps of
the environment.
This process, known as simultaneous localization and mapping (SLAM),
tracks feature points across frames to model depth, surfaces, and spatial
relationships (see the feature-matching sketch after this list).
• Lighting Estimation
CV algorithms estimate real-world lighting conditions by analyzing
brightness and shadowing patterns in the environment.
This data is used to modulate the rendering of virtual AR objects so
that they blend seamlessly with the ambient illumination.
• Occlusion Handling
By combining environment mapping with object detection outputs, AR
rendering engines use CV results to determine where virtual objects
should be occluded by real surfaces and where they should remain
visible, preserving visual coherence.
• Motion Tracking
As users or their device cameras move, CV algorithms continuously
track visual motion across frames.
Marker-based or marker-less techniques identify anchor points to
update the 3D world and AR content positions relative to the
changing viewpoint and device motion.
• Object Detection and Recognition
Robust computer vision models identify real-world objects by
detecting their presence across images or video frames and
classifying them into known categories like people, cars, buildings,
furniture, etc.
• Surface Detection and Meshing
Beyond object recognition, CV models identify real-world planar
surfaces like walls, floors, and tabletops through geometric reasoning
and shape reconstruction (see the plane-fitting sketch after this list).
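As a minimal sketch of the feature matching that underlies motion
tracking and SLAM front-ends, the Python/OpenCV snippet below detects
ORB feature points in two consecutive frames and matches them; the
frame file names are placeholders.

```python
import cv2

# Load two consecutive camera frames (file names are placeholders).
frame_a = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
frame_b = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Detect ORB feature points and compute binary descriptors in each frame.
orb = cv2.ORB_create(nfeatures=1000)
kp_a, desc_a = orb.detectAndCompute(frame_a, None)
kp_b, desc_b = orb.detectAndCompute(frame_b, None)

# Match descriptors by Hamming distance; the surviving point pairs are
# the raw correspondences that motion tracking and SLAM build on.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)
tracked = [(kp_a[m.queryIdx].pt, kp_b[m.trainIdx].pt) for m in matches[:200]]
```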
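Surface detection can likewise be sketched as a RANSAC plane fit over
a 3D point cloud (e.g., from a depth sensor). The function below is an
illustrative implementation under that assumption, not a production
algorithm.

```python
import numpy as np

def fit_plane_ransac(points, iters=200, tol=0.01, seed=0):
    """Fit a dominant plane n.p + d = 0 to an (N, 3) point cloud."""
    rng = np.random.default_rng(seed)
    best_inliers, best_plane = np.zeros(len(points), dtype=bool), None
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)          # normal of the sampled triple
        if np.linalg.norm(n) < 1e-9:
            continue                            # degenerate (collinear) sample
        n /= np.linalg.norm(n)
        d = -n @ p0
        inliers = np.abs(points @ n + d) < tol  # distance-to-plane test
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers
```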
Working of Computer Vision (figure)
Applications of Computer Vision in AR
• Consumer
AR filters and lenses in social apps like Instagram and Snapchat use CV
for facial recognition, motion tracking and 3D animation. Gaming
companies leverage CV environment mapping for realistic gameplay
rendering.
• Retail and eCommerce
Virtual try-on apps overlay clothing and cosmetics on the shopper, and
product-visualization apps place virtual furniture in the shopper's
environment through real-time CV mapping. This visualization enhances
buyer confidence.
• Healthcare
AR surgery guidance systems use CV tracking to overlay rendered
anatomy graphics precisely aligned to the patient's body to help
surgeons during procedures.
Benefits of Augmented Reality Powered by Computer Vision
• Intuitive User Experience
• Enhanced Context and Understanding
• Remote Assistance
• Visualization and Previews
CASE STUDIES
• Case study on marker tracking: This simple example introduces a basic camera
representation, contour-based shape detection, pose estimation from a homography, and
nonlinear pose refinement.
• Case study on multi-camera infrared tracking: This case study presents a crash course in
multi-view geometry. The reader learns about 2D–2D point correspondences in multiple-
camera images, epipolar geometry, triangulation, and absolute orientation.
• Case study on natural feature tracking by detection: This case study introduces interest
point detection in images, creation and matching of descriptors, and robust computation
of the camera pose from known 2D–3D correspondences (Perspective-n-Point pose,
RANSAC); a PnP-with-RANSAC sketch follows this list.
• Case study on incremental tracking: This case study explains how to track features across
consecutive frames using active search methods (KLT, ZNCC) and how incremental
tracking can be combined with tracking by detection; a KLT sketch follows this list.
• Case study on simultaneous localization and mapping: This case study explores pose
computation from 2D–2D correspondences (five-point pose, bundle adjustment). We also
look into modern techniques such as parallel tracking and mapping, and dense tracking
and mapping.
• Case study on outdoor tracking: This case study presents methods for tracking in wide-
area outdoor environments—a capability that requires scalable feature matching and
assistance from sensor fusion and geometric priors.
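As a hedged illustration of robust pose computation from 2D–3D
correspondences, the sketch below calls OpenCV's solvePnPRansac; the
intrinsics and correspondences are synthetic placeholder values, not
real data.

```python
import numpy as np
import cv2

K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# Synthesize consistent 2D-3D correspondences by projecting random 3D
# points with a known pose (placeholder data standing in for real matches).
object_pts = np.random.rand(50, 3).astype(np.float32)
rvec_true, tvec_true = np.array([0.1, -0.2, 0.05]), np.array([0.0, 0.0, 3.0])
proj, _ = cv2.projectPoints(object_pts, rvec_true, tvec_true, K, None)
image_pts = proj.reshape(-1, 2).astype(np.float32)

# Robust Perspective-n-Point: RANSAC rejects outlier correspondences
# while estimating the camera rotation (rvec) and translation (tvec).
ok, rvec, tvec, inliers = cv2.solvePnRansac = cv2.solvePnPRansac(
    object_pts, image_pts, K, distCoeffs=None, reprojectionError=4.0)
```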
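Incremental tracking by active search can be sketched with pyramidal
Lucas-Kanade (KLT) in OpenCV; the frame file names below are
placeholders.

```python
import cv2

prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Pick good corners in the previous frame, then track them into the
# current frame with pyramidal Lucas-Kanade (KLT) active search.
p0 = cv2.goodFeaturesToTrack(prev, maxCorners=300, qualityLevel=0.01,
                             minDistance=7)
p1, status, err = cv2.calcOpticalFlowPyrLK(prev, curr, p0, None,
                                           winSize=(21, 21), maxLevel=3)
ok = status.ravel() == 1
tracked_prev, tracked_curr = p0[ok], p1[ok]
```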
1. Marker Tracking
Detecting the four corners of a flat marker in an image from a single
calibrated camera delivers just enough information to recover the
pose of the camera relative to the marker.
The following steps provide an overview of the marker tracking
pipeline, which consists of five stages.
Key Components of the Camera Model:
• Center of Projection (c): All 3D points' projections pass through this point.
• Image Plane (Π): The plane where the projected image is formed.
• Principal Point (c′): The point where the optical axis intersects the
image plane.
• Optical Axis: The line connecting the center of projection (c) and the
principal point (c′).
• Focal Length (f): The distance between the center of projection (c) and
the principal point (c′).
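These quantities combine into the standard pinhole projection. As a
sketch (the coordinates (c′x, c′y) for the principal point are
notation added here), a point (X, Y, Z) in camera coordinates projects
to:

```latex
% Pinhole projection with focal length f and principal point (c'_x, c'_y)
x = f\,\frac{X}{Z} + c'_x, \qquad y = f\,\frac{Y}{Z} + c'_y
```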
Marker Detection
Marker detection begins by converting the camera image into a binary
image via thresholding. Sophisticated thresholding methods can even
deal with strong artifacts such as glossy reflections on the marker.
Unfortunately, they are computationally intensive. A cheaper method is
to determine the threshold locally (e.g., in a 4 × 4 sub-area) and
interpolate it linearly over the image.
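A minimal Python/OpenCV sketch of this stage, assuming a placeholder
input image: local (adaptive) thresholding followed by contour-based
detection of four-corner marker candidates.

```python
import cv2

gray = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder input

# Local thresholding: the threshold is computed per neighborhood, which
# is cheap and copes with uneven illumination across the marker.
binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV, blockSize=31, C=7)

# Contour-based shape detection: keep convex 4-corner contours of
# reasonable size as marker candidates.
contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
candidates = []
for c in contours:
    approx = cv2.approxPolyDP(c, 0.03 * cv2.arcLength(c, True), True)
    if (len(approx) == 4 and cv2.isContourConvex(approx)
            and cv2.contourArea(approx) > 400):
        candidates.append(approx.reshape(4, 2))
```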
Pose Estimation from Homography
The four corners of a flat marker are an instance of a
frequently encountered geometric situation: the known points
qi are constrained to lie on a plane.
We assume that the marker defines the plane Π′: qz = 0 in world
coordinates, and that the marker corners have the coordinates
[0 0 0]T, [1 0 0]T, [1 1 0]T, and [0 1 0]T.
We can then express a 3D point q ∈ Π′ as a homogeneous 2D point
q′ = [qx qy 1]T. The mapping from one plane to another can be
modeled as a homography, defined by a 3 × 3 matrix H.
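A minimal sketch of pose recovery from this homography, assuming known
intrinsics K and the four detected corners in model order (all numeric
values below are placeholders): since H ≃ K[r1 r2 t] for a plane at
z = 0, the pose can be rebuilt column by column.

```python
import numpy as np
import cv2

# Unit-square marker corners in the plane z = 0, matching the model above.
model = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float64)
corners = np.array([[310, 228], [420, 240], [415, 352], [305, 338]],
                   dtype=np.float64)              # detected corners (placeholder)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

H, _ = cv2.findHomography(model, corners)

# H ~ K [r1 r2 t]: undo the intrinsics, fix the scale, and rebuild R.
B = np.linalg.inv(K) @ H
s = 2.0 / (np.linalg.norm(B[:, 0]) + np.linalg.norm(B[:, 1]))
r1, r2, t = s * B[:, 0], s * B[:, 1], s * B[:, 2]
if t[2] < 0:                                      # marker must lie in front of camera
    r1, r2, t = -r1, -r2, -t
R = np.column_stack([r1, r2, np.cross(r1, r2)])
U, _, Vt = np.linalg.svd(R)                       # re-orthonormalize the rotation
R = U @ Vt
```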
Pose Refinement
Pose estimation cannot always be computed
directly from imperfect point correspondences
with the desired accuracy. Therefore, the pose
estimate is refined by iteratively minimizing
the reprojection error.
When a first estimate of the camera pose is
known, we minimize the displacement between the
known points, projected into the image using the
estimated pose, and their measured image locations.
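As a sketch of such iterative refinement using OpenCV's
Levenberg-Marquardt refiner (the marker corners and intrinsics below
are placeholder values):

```python
import numpy as np
import cv2

object_pts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]],
                      dtype=np.float64)           # unit-square marker model
corners = np.array([[310, 228], [420, 240], [415, 352], [305, 338]],
                   dtype=np.float64)              # measured corners (placeholder)
K = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float64)

# First pose estimate (e.g., from the homography step); solvePnP stands in here.
ok, rvec, tvec = cv2.solvePnP(object_pts, corners, K, None)

# Levenberg-Marquardt refinement: iteratively adjusts (rvec, tvec) to
# minimize the reprojection error of the model points.
rvec, tvec = cv2.solvePnPRefineLM(object_pts, corners, K, None, rvec, tvec)
```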
2. Multiple-Camera Infrared Tracking