Computer_Vision_1_introduction

The document provides an overview of computer vision, emphasizing its ability to analyze and interpret images and videos using AI algorithms, which aims to replicate human vision capabilities. It discusses the differences between human and computer vision, the processing levels involved in computer vision (low, mid, and high), and various applications such as object detection, facial recognition, and scene reconstruction. Additionally, it highlights the importance of feature extraction and matching in enhancing computer vision performance.


Computer Vision

Introduction
Computer Vision in general
• Analysis and understanding of single or multiple images
• Use single or multiple cameras, apply pre-processing
and then apply pattern recognition or AI algorithms for decision making
• Computer vision - Image/video as inputs and output is interpretation
• Image processing - input is image and output is also an image
Computer Vision

• Objective is to see objects like humans and possibly even better


• Amount of data generated is tremendous - more than 3 billion images/day are
shared online
• Data can be used to train deep learning models to make computer vision better
• Requires high level understanding of digital images or videos
• Computer vision is a subfield of artificial intelligence that increasingly relies on deep learning
• Algorithms make computers see and interpret the scene/ images
Image
• Humans can
• observe in a few seconds
• process and take intelligent decisions
• perform tasks effortlessly and effectively
• For computers, this is neither fast nor easy
• Computer vision enables computers to see the world in the same way humans do
Features of Human Vision (stereo vision, shape)

• The two eyes measure slightly different distances to the same point
• Using this difference, depth is calculated
• Stereo vision is required for depth calculation

Shape similarity
Features of Human Vision (texture, color)
Different texture patterns give different shape perceptions; objects can also be identified based on color
Features of Human Vision (object recognition)
Recognize a friend in a photograph taken many years ago

• Humans can recognize a face under different illumination, viewpoints, expressions, etc.
• There is no limit on how many faces we can store in our brain for future recognition
Features of Human Vision (object identification)

• Human vision can infer the context and key information
• Computer vision is a more difficult task than human vision


Features of Human Vision (object identification)

Humans and computer vision both identify objects

• Humans do this easily; for computer vision, it is a challenge
• The algorithm has to be changed to identify each new object

Human vision is more powerful than computer vision

• What humans do effortlessly remains a challenge for computer vision
• The algorithm must be changed for each new identification task
Human Vision
• Human vision can provide
• Depth perception
• Relative position/ occlusion
• Shading of objects
• Sharpness of edges of objects
• Size and shape of objects
• Structure of object
• Limitations of Human Vision
• limited memory
• limited to visible spectrum
• illusion
•…
Limitations of Human Vision

• Humans are able to establish the context, but complete observation takes time
• Computer vision can interpret and make complete observations within a short time
Limitations of Human Vision

Difference in distance leads to difference in perception


Limitations of Human Vision

• The sizes of the orange circles appear to be different even though they are equal
• Humans interpolate objects from their surroundings
Computer Vision vs Image Processing
• Image processing is image-to-image transformation
• Typical image processing operations include
• image compression
• image restoration
• image enhancement
• Most computer vision algorithms work on images that have already been pre-processed to
improve image quality
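As a concrete image-to-image example, contrast stretching is a typical enhancement step. The sketch below is illustrative only (NumPy, with a made-up `contrast_stretch` helper), not from the slides:

```python
import numpy as np

def contrast_stretch(img, out_min=0, out_max=255):
    """Linearly rescale pixel intensities to the full output range.
    An image-to-image transformation: the input is an image and so is the output."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:                      # flat image: nothing to stretch
        return np.full_like(img, out_min)
    scaled = (img.astype(np.float64) - lo) / (hi - lo)
    return np.round(scaled * (out_max - out_min) + out_min).astype(np.uint8)

# A dim 2x2 grayscale patch (values 50..100) stretched to the full 0..255 range
patch = np.array([[50, 60], [80, 100]], dtype=np.uint8)
print(contrast_stretch(patch))
```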
Applications
• Object
• Classification: Broad category of object in image
• Identification: Type of a given object in image
• Detection: Check whether object exists in image
• Landmark Detection: Identify key points of the objects
• Segmentation: Identify pixels belonging to objects
• Recognition: Existence and location of objects
• Video motion analysis to estimate the velocity of objects in a video, or of the camera itself
• Scene reconstruction to create a 3D model of a scene captured in the form of
images or video
Specific Applications

Most of the applications are based on features of image


Features of image

• Features of a region of an image are used to represent the region
• Edges are features
• Corners are more localized and can be used to generate features
• Corner points are called key points
• Features are generated at key points
• Feature matching relates the features of a similar region of one image with those of
another image
Feature Matching
• Feature matching is used for object identification
• Steps for feature matching
• Detection of keypoints:
• Harris Corner Detection, SIFT, and SURF
• Local descriptors:
• Region surrounding each keypoint is captured and local descriptors are obtained
• Feature matching:
• Derive features from local descriptors
• Match features in the corresponding images
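The matching step above can be sketched as brute-force nearest-neighbour search over descriptors with Lowe's ratio test. This is a minimal NumPy illustration with toy 2-D descriptors; real systems use high-dimensional SIFT/SURF descriptors and optimized matchers:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbour matching with Lowe's ratio test.
    desc_a, desc_b: (n, d) arrays of local descriptors (e.g. from SIFT)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)   # distance to every candidate
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Keep the match only if it is clearly better than the runner-up
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

# Two toy 2-D descriptors per image; each should match its nearest counterpart
desc_a = np.array([[0.0, 0.0], [10.0, 10.0]])
desc_b = np.array([[0.5, 0.0], [9.0, 10.0], [100.0, 100.0]])
print(match_descriptors(desc_a, desc_b))  # [(0, 0), (1, 1)]
```

The ratio test rejects ambiguous matches: if the second-best candidate is nearly as close as the best one, the match is discarded.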
Scene Reconstruction
• Digital 3D reconstruction of an object from a photograph
Video Motion Analysis
• The study of moving objects and their trajectories
• Motion analysis is a combination of
• Object detection
• Tracking
• Segmentation
• Estimation of movement
• Human motion analysis is used in areas like
• Sports
• Intelligent video analytics etc
• Manufacturing
• Count and track microorganisms like bacteria and viruses
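A minimal sketch of the "estimation of movement" step, assuming the detection and tracking steps above have already produced a list of per-frame object centroids (the function and values here are illustrative, not from the slides):

```python
def estimate_velocity(centroids, fps):
    """Average velocity of a tracked object from its per-frame centroids.
    centroids: [(x, y), ...] one entry per frame; returns (vx, vy) in px/s."""
    (x0, y0), (x1, y1) = centroids[0], centroids[-1]
    dt = (len(centroids) - 1) / fps             # elapsed time in seconds
    return ((x1 - x0) / dt, (y1 - y0) / dt)

# Object moves 30 px to the right over 3 frame intervals at 30 fps
track = [(0, 0), (10, 0), (20, 0), (30, 0)]
vx, vy = estimate_velocity(track, fps=30)
print(vx, vy)
```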
Real-world computer vision applications
• Self-driving cars (allows self-driving cars to safely steer through streets and
highways)
• Facial recognition (match images of people’s faces to their identities)
• Augmented reality (mix virtual objects with real-world images)
• Medical imaging (scan X-rays, MRIs, and ultrasounds to detect health problems)
• Quality inspection (spot defective products on the assembly line and prevent them from
shipping to customers)
• Intelligent Video Analytics
• Manufacturing and Construction
• OCR
• Retail
• Banks use it to verify customers’ identities before conducting large transactions
Levels of processing for computer Vision

• Low Level Processing


• Mid Level Processing
• High Level Processing
Low Level Processing
• Image enhancement
• Edge detection, corner detection, filtering, and morphology are applied

Texture to determine repetitive pattern


Edge Detection
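The edge-detection step can be sketched with hand-rolled 3x3 Sobel kernels. This is a minimal NumPy illustration of the idea; a real pipeline would use an optimized library routine:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T                                   # vertical-gradient kernel
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            win = img[i:i + 3, j:j + 3].astype(np.float64)
            gx[i, j] = np.sum(win * kx)         # horizontal intensity change
            gy[i, j] = np.sum(win * ky)         # vertical intensity change
    return np.hypot(gx, gy)

# A vertical step edge: left half dark, right half bright
img = np.zeros((5, 6))
img[:, 3:] = 255
mag = sobel_magnitude(img)                      # strong response along the edge
```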
Low Level Processing

Low Level Features

Lines, corners, salient points


Low Level Processing
Image Matching

Image Stitching
Low level features/ vision

Boundary detection: variation in texture information determines the shape of objects


Low level features/ vision

Stereo images are used to obtain depth information

• Images from the left and right cameras are used to determine a disparity map, which gives
depth information
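For a rectified stereo pair, disparity converts to depth via Z = f * B / d, where f is the focal length in pixels and B the baseline between the cameras. A small sketch (the focal length and baseline values below are made up for illustration):

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth as Z = f * B / d for a rectified stereo pair.
    disparity_px: horizontal shift of the same point between the left and
    right images; a larger disparity means a closer surface."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 0.1 m baseline between the cameras
print(depth_from_disparity(35, 700, 0.1))  # a point 2.0 m away
```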
Low level features/ vision
Structure from motion

• For a moving object, a surface closer to the camera moves faster than a farther surface
• This helps in inferring shape/depth information
Mid Level Processing
• Segmentation by breaking images/ videos into useful pieces followed by interpretation
• Find video sequences that correspond to one scenario
• Keep track of moving object

Find correspondence between frames through a


sequence of video frames

Track object using background


subtraction
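Background subtraction, the simplest of the tracking tools mentioned above, can be sketched as thresholded frame differencing against a static background model (an illustrative NumPy sketch, not a production tracker):

```python
import numpy as np

def moving_mask(frame, background, threshold=25):
    """Background subtraction: flag pixels that differ from a static
    background model by more than `threshold` as moving."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold

background = np.zeros((4, 4), dtype=np.uint8)   # static dark scene
frame = background.copy()
frame[1:3, 1:3] = 200                           # a bright object appears
mask = moving_mask(frame, background)
print(int(mask.sum()))  # 4 pixels flagged as moving
```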
Mid Level Processing
Object tracking: grouping objects which have similar optical flow

• Detect the area of interest and predict where the object will be in the next frame
Mid Level Processing

• K-means clustering (k=7)
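The k-means segmentation idea can be sketched on raw pixel colors with a minimal Lloyd's-algorithm loop (`kmeans_colors` is an illustrative helper; k=2 here rather than the k=7 of the slide, to keep the toy data small):

```python
import numpy as np

def kmeans_colors(pixels, k, iters=10, seed=0):
    """Lloyd's k-means on pixel colors: assign every pixel to its nearest
    centroid, recompute the centroids, and repeat."""
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # distance from every pixel to every centroid
        d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):             # avoid emptying a cluster
                centroids[c] = pixels[labels == c].mean(axis=0)
    return labels, centroids

# Two clearly separated color groups; k=2 should recover them
pixels = np.array([[0, 0, 0], [10, 0, 0], [250, 250, 250], [240, 250, 255]], float)
labels, _ = kmeans_colors(pixels, k=2)
```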


High Level Processing (Image Understanding)
• Generated from low-level features
• Contain more complicated details about image/video
• Reconstruct, interpret and understand a 3D scene from its 2D images in terms of
the properties of the structures present in the scene
• Ex: Convolutional neural networks (CNNs), Recurrent Neural Networks (RNNs) for
learning high-level features
High Level Processing (Image Understanding)
Theme understanding

• Does it have people?
• Is it a marketplace, a football ground, or a garden?
• Identify the location of the bicycle
• What objects are present in the image?
• Draw a bounding box around each label
• Classify the object as a building (easy): is it a house or a shop?
High Level Processing (Image Understanding)
Example: Event recognition
• A video has a scene of clapping and a person cutting a cake
• If it is a birthday party, the birthday boy is cutting the cake
Visual recognition:
• Classifying images/ videos, localize objects
• Classify human activities

• Identify action
• Predict next action
Difference between Low and High Level Processing
• Low-level features are characteristics extracted from an image, such as colors, edges, and
textures
• High-level features are extracted from low-level features and denote more meaningful
concepts

Low level: lines, corners, edges. High level: faces with expressions


Difference between Low and High Level Processing
Content
• Low level: related to the raw pixel data of the image; more sensitive to noise and changes in the image
• High level: more robust; a higher level of understanding of the image content
Scale
• Low level: typically retrieved at a local scale; vulnerable to small modifications of the picture, like lighting or orientation
• High level: frequently retrieved at a global scale; takes the whole image/video into account and is more robust
Resources
• Low level: feature extraction usually takes fewer system resources than high-level feature extraction
• High level: requires more advanced machine learning methods
Task specificity
• Low level: frequently task-specific, appropriate for a certain set of activities
• High level: frequently more generic and suitable for a broader range of jobs
Useful for
• Low level: image segmentation, object detection, and feature matching
• High level: image classification, object recognition, and scene understanding