DIP L01 Introduction
Think-Pair-Share
Laptop: Biometrics auto-login (face recognition, 3D), OCR
Smartphones: QR codes, computational photography (Android Lens Blur, iPhone
Portrait Mode), panorama construction (Google Photo Spheres), face detection,
expression detection (smile), Snapchat filters (face tracking), FaceID (iPhone), Night
Sight (Pixel), iPhone 12 Pro (LiDAR)
Web: Image search, Google Photos (face recognition, object recognition, scene
recognition, geolocalization from vision), Facebook (image captioning), Google Maps
aerial imaging (image stitching), YouTube (content categorization)
VR/AR: Outside-in tracking (HTC VIVE), inside-out tracking (simultaneous localization
and mapping, HoloLens), object occlusion (dense depth estimation)
Motion: Kinect, full body tracking of skeleton, gesture recognition, virtual try-on
Medical imaging: CAT / MRI reconstruction, assisted diagnosis, automatic pathology,
connectomics, endoscopic surgery
Industry: Vision-based robotics (marker-based), machine-assisted router (jig),
automated post, ANPR (number plates), surveillance, drones, shopping
Transportation: Assisted driving (everything), face tracking/iris dilation to detect
drunkenness or drowsiness, automated distribution (all modes)
Media: Visual effects for film, TV (reconstruction), virtual sports replay
(reconstruction), semantics-based auto edits (reconstruction, recognition)
Optical character recognition (OCR)
Technology to convert images of text into text
If you have a scanner, it probably came with OCR software
Live camera translation
Liang et al. 2014
Vision-based biometrics
JH
Smile detection
Apple FaceTime
Attention Correction
Object recognition (in mobile phones)
e.g., Google Lens
Object recognition (in supermarkets)
How does it work? Think-Pair-Share
How does it work?
Source: Vivek Ramanujan
3D from images
JH
Sports
JH
Medical imaging
JH
AutoCars - Uber bought CMU’s lab (2015), then sold it (2020)
http://www.robocup.org/
Mobile robots
Saxena et al. 2008
STAIR at Stanford
Skydio 2 drone
6x fisheye cameras for obstacle avoidance
Onboard NVIDIA GPU
Vision in space
NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.
Oculus (Quest)
Niantic
AI for Physical Interaction