Unit 1 Introduction

The document provides an overview of computer vision, detailing its importance in artificial intelligence and its distinction from image processing. It explains the hierarchy of computer vision, including low-level, mid-level, and high-level processes, and how machines interpret visual data through pattern recognition. Additionally, it discusses the evolution of computer vision technology and its applications in understanding and analyzing visual information.

Uploaded by

aryansuthar194
Copyright
© All Rights Reserved

Unit 1: Introduction

Contents

◉ Computer Vision
◉ Image Processing
◉ Low-level Computer Vision
◉ Mid-level Computer Vision
◉ High-level Computer Vision
◉ Overview of Diverse Computer Vision
Computer Vision
Computer Vision (Cont.)

◉ Computer vision vs human vision

[Figure: what we see vs. what a computer sees]


Computer Vision (Cont.)

◉ Computer vision is one of the most important fields of artificial intelligence (AI) and
computer science engineering that focuses on creating digital systems that can process,
analyze, and make sense of visual data (images or videos) in the same way that humans
do.
◉ The concept of computer vision is based on teaching computers to process an image at a pixel level
and understand it.
◉ Further, it also helps to take appropriate actions and make recommendations based on the extracted
information.
◉ If artificial intelligence enables computer systems to think intelligently, computer vision makes them
capable of seeing, analyzing, and understanding.
◉ The image data can take many forms, such as a video sequence, depth images, views from multiple
cameras, or multi-dimensional data from a medical scanner.
Image Processing

How do computers see an image?

Pixel: [10, 250, 0]

What Is an Image?
◉ An image is described by its dimensions (width and height), measured in pixels. For
example, if the dimensions of an image are 500 x 400 (width x height), the total number of pixels in
the image is 500 x 400 = 200,000.
◉ A pixel is a point on the image that takes on a specific shade, opacity, or color.
Image Processing (Cont.)

◉ A pixel is usually represented in one of the following ways:

⮚ Grayscale - A pixel is a single integer between 0 and 255 (0 is completely black and 255 is
completely white).
⮚ RGB - A pixel is made up of 3 integers between 0 and 255 (the integers represent the intensities of
red, green, and blue).
⮚ RGBA - An extension of RGB with an added alpha field, which represents the opacity of the
pixel.
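The three pixel representations can be made concrete with a small sketch. It assumes NumPy arrays as the image container (the slides do not name a library) and reuses the 500 x 400 size and the [10, 250, 0] pixel from the slides; the alpha value 128 is an arbitrary illustration.

```python
import numpy as np

width, height = 500, 400

# NumPy convention (an assumption here): arrays are indexed (row, column),
# so the shape is (height, width[, channels]). uint8 holds values 0..255.
gray = np.zeros((height, width), dtype=np.uint8)     # one integer per pixel
rgb = np.zeros((height, width, 3), dtype=np.uint8)   # red, green, blue
rgba = np.zeros((height, width, 4), dtype=np.uint8)  # red, green, blue, alpha

gray[0, 0] = 255                # completely white
rgb[0, 0] = [10, 250, 0]        # a mostly green pixel
rgba[0, 0] = [10, 250, 0, 128]  # same color, roughly half opaque

pixel_count = gray.shape[0] * gray.shape[1]
print(pixel_count)  # 200000
```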
Image Processing (Cont.)
Image processing is the process of transforming an image into a digital form and performing
certain operations to get some useful information from it.

[Figure: digital image → mathematical operation or algorithm → enhanced image / edge image]
Image Processing (Cont.)

◉ Image processing requires fixed sequences of operations that are performed at each pixel of an
image.
◉ The image processor performs the first sequence of operations on the image, pixel by pixel. Once this
is fully done, it will begin to perform the second operation, and so on.
◉ The output value of these operations can be computed at any pixel of the image.
◉ Image Processing Techniques
⮚ Image Segmentation
⮚ Color Image Processing
⮚ Image Restoration
⮚ Object Detection
⮚ Morphological Operations
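The idea of a fixed sequence of per-pixel operations, each finished over the whole image before the next one starts, can be sketched as follows. This is a minimal illustration, not a real image processor: the two operations (`adjust_brightness`, `threshold`) and the helper `run_pipeline` are hypothetical names chosen for the example.

```python
import numpy as np

def adjust_brightness(image, offset):
    # Widen to int16 first so that adding the offset cannot wrap around in uint8.
    return np.clip(image.astype(np.int16) + offset, 0, 255).astype(np.uint8)

def threshold(image, cutoff):
    # Every pixel becomes either black (0) or white (255).
    return np.where(image >= cutoff, 255, 0).astype(np.uint8)

def run_pipeline(image, operations):
    result = image
    for operation in operations:
        # Each operation is applied to every pixel before the next one begins.
        result = operation(result)
    return result

image = np.array([[10, 100], [150, 240]], dtype=np.uint8)
output = run_pipeline(
    image,
    [lambda im: adjust_brightness(im, 30), lambda im: threshold(im, 128)],
)
print(output.tolist())  # [[0, 255], [255, 255]]
```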
Computer Vision
Why Computer Vision?

◉ Computer vision helps us solve some of the most difficult problems there are in
computer science related to real-time processing and understanding of visual
information such as an image, a video stream, etc. These problems were hard to
solve in the past because we did not, at that time, have the processing power
required to process such data at a fast enough speed. Also, we did not have any
way for our machines to be able to understand what a particular object looked like
and what it should be called.
◉ Because of these issues, even though our machines were becoming quite good at
tasks such as loading, transferring, and displaying data in visual formats like videos
and images, we were not able to build systems that could understand this kind of
data in any meaningful way. Tasks such as figuring out the text contained in an
image or being able to recognize a number in an image looked simple but were
quite hard practically. Even a simple task like detecting the presence of human
faces in a photo or a video was very hard to accomplish and was done after a lot of
research and failed attempts.
Why Computer Vision (Cont.)

◉ Giving machines the ability to understand these kinds of visual images has become
even more important in today’s digital age, where everyone has access to the
Internet and can put any content on any of the online social media platforms.
◉ For example, if someone tries to put some false information in a textual format on
any of these platforms, then most of these platforms are smart enough to either tag
it as unverified or even remove it. However, if the same information is put online as
an image or a video, then these systems, without computer vision, would not be
able to understand its content and would therefore leave it published until
someone reports it.
Computer Vision (Cont.)

How does Computer Vision Work?


◉ Computer vision technology tends to mimic the way the human brain works. But how does our brain
solve visual object recognition? One popular hypothesis states that our brains rely on patterns
to decode individual objects. This concept is used to create computer vision systems.
◉ Computer vision algorithms that we use today are based on pattern recognition. We train computers
on a massive amount of visual data—computers process images, label objects on them, and find
patterns in those objects.
◉ First, a vast amount of labelled visual data is provided to the machine to train it. This labelled data
enables the machine to analyse patterns across all the data points and relate them to the labels.
For example, suppose we provide millions of dog images. The computer learns from
this data by analyzing the shapes in each photo, the distances between shapes, the colors, etc., and thereby
identifies the patterns characteristic of dogs and generates a model. As a result, this computer vision model can
then accurately detect, for each input image, whether it contains a dog or not.
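The train-on-labelled-data idea can be sketched at toy scale. The following is an illustration under strong assumptions, not how production systems are built: the "images" are synthetic 8 x 8 arrays, label 1 stands in for "contains the pattern" (the dog, in the slide's example), and the model is a simple nearest-centroid classifier rather than a deep network. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_image(has_pattern):
    # Synthetic stand-in for a photo: dark background, optional bright square.
    image = np.full((8, 8), 50.0)
    if has_pattern:
        image[2:6, 2:6] = 200.0
    image += rng.normal(0, 10, size=image.shape)  # sensor-like noise
    return np.clip(image, 0, 255)

def train(images, labels):
    # "Learning the pattern": store the average image (centroid) of each label.
    return {int(label): images[labels == label].mean(axis=0)
            for label in np.unique(labels)}

def predict(model, image):
    # Pick the label whose learned pattern is closest to the input image.
    return min(model, key=lambda label: np.linalg.norm(image - model[label]))

train_labels = np.array([0, 1] * 50)
train_images = np.stack([make_image(bool(label)) for label in train_labels])
model = train(train_images, train_labels)

test_labels = np.array([0, 1] * 10)
test_images = np.stack([make_image(bool(label)) for label in test_labels])
predictions = np.array([predict(model, image) for image in test_images])
accuracy = float((predictions == test_labels).mean())
print(accuracy)
```

Because the two synthetic classes differ far more than the noise level, this toy model separates them easily; real photographs need far richer models and far more data.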
Computer Vision (Cont.)

◉ Machines interpret images as a series of pixels, each with its own set of color values.
◉ For example, below is a picture of Abraham Lincoln. Each pixel’s brightness in this image is
represented by a single 8-bit number, ranging from 0 (black) to 255 (white). These numbers are what
software sees when you input an image. This data is provided as an input to the computer vision
algorithm that will be responsible for further analysis and decision making.
Computer Vision v/s Image Processing

◉ Computer vision is quite a different field from image processing, and these two things should not be
considered as being similar. Digital image processing is the process of creating new images from an
existing image. The new images are created using special algorithms designed for achieving a specific
output from an image. This includes tasks such as creating a black and white version of an image,
removing noise from an image, etc. This is similar to digital signal processing. In other words, digital
image processing is used for the generation of new images and does not in any way try to understand
the content of an image, i.e., it has no idea what object an image contains. It only knows how to
convert it from one form to another.
◉ Computer vision, on the other hand, is used for understanding the content of an image or a video. It
deals with extracting useful information out of images, e.g., whether an image contains a human face,
whether it was taken during the day or the night, what objects are in the image, etc.
Computer vision does not manipulate images or create new ones in any way.
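The contrast can be sketched with two functions: one that turns an image into a new image (image processing) and one that turns an image into a piece of information (computer vision). This is a toy illustration: `looks_like_daytime` and its brightness cutoff of 100 are hypothetical and far too crude for real use; the grayscale weights are the standard ITU-R BT.601 luma coefficients.

```python
import numpy as np

def to_grayscale(rgb):
    """Image processing: image in, new image out. No understanding involved."""
    weights = np.array([0.299, 0.587, 0.114])  # ITU-R BT.601 luma weights
    return np.rint(rgb @ weights).astype(np.uint8)

def looks_like_daytime(rgb, cutoff=100):
    """Computer vision (toy): image in, information about its content out.

    Hypothetical heuristic: call the scene 'day' if it is bright on average.
    """
    return bool(to_grayscale(rgb).mean() > cutoff)

bright_scene = np.full((4, 4, 3), 220, dtype=np.uint8)
dark_scene = np.full((4, 4, 3), 20, dtype=np.uint8)

print(to_grayscale(bright_scene).shape)  # (4, 4): still an image
print(looks_like_daytime(bright_scene))  # True
print(looks_like_daytime(dark_scene))    # False
```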
Computer Vision Hierarchy

◉ The continuum from image processing to computer vision can be broken up into low-, mid-, and high-level
processes.
◉ Low-level vision − It includes processing an image for feature extraction.
◉ Mid-level (intermediate-level) vision − It includes object recognition and segmentation.
◉ High-level vision − It includes the conceptual description of a scene, such as activity, intention, and
behaviour.
Low Level Vision

◉ Set of operations performed on images aiming at enhancing their quality and selecting useful
information, which will be processed by humans or other algorithms.
◉ It is mainly concerned with extracting descriptions from images (that are usually represented as
images themselves). The analysis usually does not know anything about what objects are actually in
the scene, nor where the scene is relative to the observer. There may be multiple, largely independent
descriptions, such as edge fragments, spots, reflectances, line fragments, etc.
◉ For example, if one were looking at an image of a coffee mug on a desk, the low-level descriptions
would make explicit where the mug edges were, where specular highlights were on the mug surface,
and what the colours on the mug were. As this description is still linked to an image, these descriptions
would apply everywhere in the image, not just to the mug.
◉ Tasks:
⮚ Primitive image-processing operations such as noise reduction, contrast enhancement, and image
sharpening

[Figure: image → image processing → image]
Low Level Vision (Cont.)

[Figure: example images showing sharpening and blurring]
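Blurring and sharpening are both image-in, image-out operations, which a short sketch can show. The slides do not say which filters were used, so this example assumes a 3 x 3 box blur and unsharp masking (add back the detail that blurring removes); it also assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_blur(image, size=3):
    # Replace each pixel by the mean of its size x size neighbourhood.
    pad = size // 2
    padded = np.pad(image.astype(np.float64), pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))

def sharpen(image, amount=1.0):
    # Unsharp masking: detail = original - blurred; add the detail back.
    image = image.astype(np.float64)
    detail = image - box_blur(image)
    return np.clip(image + amount * detail, 0, 255)

# A vertical step edge: dark left half, bright right half.
edge = np.full((6, 6), 50, dtype=np.uint8)
edge[:, 3:] = 200

blurred = box_blur(edge)
sharpened = sharpen(edge)
print(blurred[0].tolist())    # the step is smeared across columns 2 and 3
print(sharpened[0].tolist())  # the step is exaggerated at columns 2 and 3
```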
Low Level Vision (Cont.)

◉ Noise Removal Example

[Figure: noise removal pipeline. Dirty image → calculate dirty image's background noise (median filtering) → background noise → subtract background noise from dirty image → denoised image]
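The noise removal example can be sketched as follows, assuming the median filter is the step that estimates the background: small dark strokes vanish under a median filter, so the filtered image approximates the stained background, and subtracting it isolates the strokes. The 5 x 5 window, the margin of 50, and the synthetic "dirty" image are assumptions made for this illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(image, size=5):
    # Replace each pixel by the median of its size x size neighbourhood.
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1))

def remove_background_noise(dirty, size=5, margin=50):
    background = median_filter(dirty, size)              # estimated background
    difference = dirty.astype(np.float64) - background   # subtract it
    is_foreground = difference < -margin                 # clearly darker than background
    # Keep foreground pixels, paint everything else white.
    return np.where(is_foreground, dirty, 255).astype(np.uint8)

# Synthetic dirty image: light paper, a darker stain on the right half,
# and two thin black strokes standing in for text.
dirty = np.full((20, 20), 200, dtype=np.uint8)
dirty[:, 10:] = 120
text_mask = np.zeros((20, 20), dtype=bool)
text_mask[5:8, 4] = True
text_mask[12, 13:16] = True
dirty[text_mask] = 0

clean = remove_background_noise(dirty)
print(np.unique(clean).tolist())  # [0, 255]
```

In this synthetic case the stain disappears while both strokes survive; on real scans the window size and margin would need tuning.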
Mid level vision

◉ In mid-level processes, the inputs are generally images, but the outputs are generally image attributes
(e.g., edges, contours, identity of individual objects).
◉ Includes extraction of symbolic information from pre-processed images (low-level vision output) and
analysis techniques of the visual characteristics of the objects that are in the images.
◉ It is mainly concerned with extracting descriptions of the scene from the image descriptions extracted
at the low level. The output is usually in some more symbolic form, describing the position and shape
of portions of the scene. The analysis usually does not know anything about what objects are in the
scene, but does use a lot of knowledge of scene shape and how shape appears in an image.
◉ In our coffee mug example, the kinds of descriptions one might expect are 3D position of the edges of
the mug, portions of its surface shape, depth relationships between adjacent surface patches, which
features are moving and where, etc.
◉ Tasks:
⮚ Segmentation (partitioning an image into regions or objects)
⮚ Description of those objects to reduce them to a form suitable for computer processing
⮚ Classification (recognition) of objects
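The first two tasks, segmentation and description, can be sketched on a tiny image. This is one simple approach among many, chosen for illustration: threshold the image, group bright pixels into 4-connected regions by flood fill, then reduce each region to an area and a bounding box. The threshold of 128 and the function names are assumptions.

```python
import numpy as np

def label_regions(mask):
    """Segmentation: give every 4-connected group of True pixels its own label."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already part of an earlier region
        count += 1
        labels[start] = count
        stack = [start]
        while stack:  # flood fill from the starting pixel
            r, c = stack.pop()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = count
                    stack.append((nr, nc))
    return labels, count

def describe_regions(labels, count):
    """Description: reduce each region to a few numbers a program can use."""
    descriptions = []
    for label in range(1, count + 1):
        rows, cols = np.nonzero(labels == label)
        descriptions.append({
            "label": label,
            "area": int(rows.size),
            # (top, left, bottom, right), inclusive
            "bounding_box": (int(rows.min()), int(cols.min()),
                             int(rows.max()), int(cols.max())),
        })
    return descriptions

image = np.zeros((6, 8), dtype=np.uint8)
image[1:3, 1:3] = 200  # a small bright object
image[3:6, 5:8] = 180  # a larger bright object

labels, count = label_regions(image > 128)
descriptions = describe_regions(labels, count)
for description in descriptions:
    print(description)
```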
High-level vision

◉ High-level vision infers semantics, for example, through object recognition and scene understanding.
◉ Here the input is a set of image attributes and the output is an understanding of the scene.
◉ It includes interpretation of the evolving information provided by the middle level vision as well as
directing what middle and low level vision tasks should be performed. Interpretation may include
conceptual description of a scene like activity, intention, and behaviour.
◉ High-level vision is concerned mainly with the interpretation of a scene in terms of the objects in it, and is
usually based on knowledge of specific objects and relationships. The analysis usually involves
symbolic descriptions, although it might make reference to results from the low and middle levels to
verify hypotheses.
◉ Typical results of high level analysis are a naming of objects present in the scene, estimates of their
position, identification of objects that can satisfy a particular function, descriptions of what sorts of
motions are occurring, or summaries of what sort of scene it is (e.g. an office scene).
◉ In the coffee mug example, the results might say that we are looking at a coffee mug sitting on a desk
at a given position, the mug is half-full, there is nothing else nearby that could be used to hold coffee,
and the desk is cluttered.
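The attribute-in, understanding-out step can be sketched with a toy rule-based interpreter. Everything in it is hypothetical: the object lists standing in for "knowledge of specific objects", the rule that two office objects make an office scene, and the clutter cutoff. It only illustrates the kind of output high-level vision produces.

```python
# Hypothetical knowledge base about specific objects and their functions.
CAN_HOLD_LIQUID = {"coffee mug", "glass", "bottle"}
OFFICE_OBJECTS = {"desk", "keyboard", "monitor", "stapler"}

def interpret_scene(objects):
    """Turn symbolic mid-level output (named objects) into a scene description."""
    names = [obj["name"] for obj in objects]
    office_votes = sum(name in OFFICE_OBJECTS for name in names)
    return {
        "objects": names,
        # Objects that can satisfy the function "hold coffee".
        "containers": [name for name in names if name in CAN_HOLD_LIQUID],
        # Summary of what sort of scene this is.
        "scene": "office" if office_votes >= 2 else "unknown",
        "cluttered": len(objects) > 5,
    }

# Assumed output of the lower levels: recognised objects with positions.
detected = [
    {"name": "coffee mug", "position": (120, 80)},
    {"name": "desk", "position": (0, 100)},
    {"name": "keyboard", "position": (60, 90)},
    {"name": "stapler", "position": (200, 85)},
    {"name": "papers", "position": (150, 95)},
    {"name": "pen", "position": (170, 92)},
]

result = interpret_scene(detected)
print(result["scene"], result["containers"], result["cluttered"])
```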
Overview
Overview of Computer Vision

◉ Computer vision is a field of artificial intelligence that trains computers to interpret and understand
the visual world. Using digital images from cameras and videos, together with deep learning models,
machines can accurately identify and locate objects and then react to what they “see”.
◉ As computer vision evolved, programming algorithms were created to solve individual challenges.
Machines became better at visual recognition with repetition. Over the years, there
has been a huge improvement in deep learning techniques and technology. We now have the ability to
program supercomputers to train themselves, self-improve over time, and provide capabilities to
businesses as online applications.
Thank You
