Unit 1 Introduction
Contents
◉ Computer Vision
◉ Image Processing
◉ Low-level Computer Vision
◉ Mid-level Computer Vision
◉ High-level Computer Vision
◉ Overview of Diverse Computer Vision
Computer Vision
Computer Vision (Cont.)
◉ Computer vision is an important field of artificial intelligence (AI) and
computer science that focuses on creating digital systems that can process,
analyze, and make sense of visual data (images or videos) in a way similar to how humans
do.
◉ The concept of computer vision is based on teaching computers to process an image at a pixel level
and understand it.
◉ Further, it also helps to take appropriate actions and make recommendations based on the extracted
information.
◉ If artificial intelligence enables computer systems to think intelligently, computer vision makes them
capable of seeing, analyzing, and understanding.
◉ Image data can take many forms, such as a video sequence, depth images, views from multiple
cameras, or multi-dimensional data from a medical scanner.
Image Processing
[Figure: a single pixel with the example color value (10, 250, 0)]
What Is an Image?
◉ An image is described by its dimensions (width and height), measured in pixels. For
example, if the dimensions of an image are 500 x 400 (width x height), the total number of pixels in
the image is 500 x 400 = 200,000.
◉ A pixel is a point in the image that takes on a specific shade, opacity, or color.
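The pixel count above can be checked with a short sketch. It assumes NumPy is available and stores the image as an array of shape (height, width, 3); the variable names and the all-black starting image are illustrative assumptions, not part of the slides.

```python
import numpy as np

# Hypothetical 500 x 400 (width x height) RGB image, initially all black.
width, height = 500, 400
image = np.zeros((height, width, 3), dtype=np.uint8)  # rows = height, columns = width

# The total number of pixels is width * height.
total_pixels = width * height

# Give the pixel at row 0, column 0 an example colour value (red, green, blue).
image[0, 0] = [10, 250, 0]
pixel = image[0, 0].tolist()

print(total_pixels, pixel)
```

Note that array libraries usually index images as [row, column], that is, height first and width second, which is the reverse of the "width x height" convention used when quoting image dimensions.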
Image Processing (Cont.)
[Figure: digital image → mathematical operation or algorithm → enhanced image or edges]
◉ Image processing requires fixed sequences of operations that are performed at each pixel of an
image.
◉ The image processor performs the first sequence of operations on the image, pixel by pixel. Once this
is fully done, it will begin to perform the second operation, and so on.
◉ The output value of these operations can be computed at any pixel of the image.
◉ Image Processing Techniques
⮚ Image Segmentation
⮚ Color Image Processing
⮚ Image Restoration
⮚ Object Detection
⮚ Morphological Operations
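The idea of running fixed operations over every pixel, one full pass after another, can be sketched as follows. This is a minimal illustration assuming NumPy and a grayscale image; the two operations (invert, threshold) and all function names are hypothetical examples chosen for brevity.

```python
import numpy as np

def invert(value):
    """Per-pixel operation: flip dark and bright."""
    return 255 - value

def threshold(value, cutoff=128):
    """Per-pixel operation: map a pixel to pure black or pure white."""
    return 255 if value >= cutoff else 0

def apply_per_pixel(image, operation):
    """Run one operation over the whole image, pixel by pixel."""
    result = np.empty_like(image)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            result[r, c] = operation(int(image[r, c]))
    return result

def run_pipeline(image, operations):
    """Finish each operation on the full image before starting the next."""
    for operation in operations:
        image = apply_per_pixel(image, operation)
    return image

sample = np.array([[0, 100], [200, 255]], dtype=np.uint8)
processed = run_pipeline(sample, [invert, threshold])
print(processed)
```

The explicit loops mirror the pixel-by-pixel description; in practice the same result is usually obtained with vectorized array operations.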
Computer Vision
Why Computer Vision?
◉ Computer vision helps us solve some of the most difficult problems in
computer science: real-time processing and understanding of visual
information such as an image or a video stream. These problems were hard to
solve in the past because we did not have the processing power
required to process such data fast enough. We also had no
way for our machines to understand what a particular object looked like
and what it should be called.
◉ Because of these issues, even though our machines were becoming quite good at
tasks such as loading, transferring, and displaying data in visual formats like videos
and images, we were not able to build systems that could understand this kind of
data in any meaningful way. Tasks such as reading the text contained in an
image or recognizing a number in an image looked simple but were
quite hard in practice. Even a simple task like detecting the presence of human
faces in a photo or a video was very hard to accomplish and was achieved only after a lot of
research and failed attempts.
Why Computer Vision (Cont.)
◉ Giving machines the ability to understand these kinds of visual images has become
even more important in today’s digital age, where everyone has access to the
Internet and can put any content on any of the online social media platforms.
◉ For example, if someone posts false information as text on
any of these platforms, most of them are smart enough to either tag
it as unverified or remove it. However, if the same information is posted as
an image or a video, these systems, without computer vision, would not be
able to understand its content and would therefore leave it published until
someone reports it.
Computer Vision (Cont.)
◉ Machines interpret images as a series of pixels, each with its own set of color values.
◉ For example, consider a grayscale picture of Abraham Lincoln. Each pixel's brightness in this image is
represented by a single 8-bit number, ranging from 0 (black) to 255 (white). These numbers are what
software sees when you input an image. This data is provided as input to the computer vision
algorithm that is responsible for further analysis and decision making.
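A small sketch of what "the numbers software sees" look like, assuming NumPy. The 3 x 3 patch values are made up for illustration and do not come from any real photograph.

```python
import numpy as np

# Hypothetical 3 x 3 grayscale patch: 0 is black, 255 is white.
patch = np.array(
    [[0, 128, 255],
     [64, 192, 32],
     [255, 0, 16]],
    dtype=np.uint8,  # one unsigned 8-bit number per pixel
)

darkest = int(patch.min())
brightest = int(patch.max())
mean_brightness = float(patch.mean())

print(patch)
print(darkest, brightest, mean_brightness)
```

A computer vision algorithm receives exactly this kind of array, only much larger, and has to derive meaning from the values alone.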
Computer Vision vs. Image Processing
◉ Computer vision is quite a different field from image processing, and the two should not be
treated as the same. Digital image processing is the process of creating new images from an
existing image. The new images are created using special algorithms designed for achieving a specific
output from an image. This includes tasks such as creating a black and white version of an image,
removing noise from an image, etc. This is similar to digital signal processing. In other words, digital
image processing is used for the generation of new images and does not in any way try to understand
the content of an image, i.e., it has no idea what object an image contains. It only knows how to
convert it from one form to another.
◉ Computer vision, on the other hand, is used for understanding the content of an image or a video. It
deals with extracting useful information out of images, e.g., whether an image contains a human face,
whether it was taken during the day or the night, what objects are in the image, etc.
Its goal is not to manipulate images or create new ones.
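The contrast can be made concrete with two toy functions, assuming NumPy. The first is image processing (an image goes in, a new image comes out); the second is a deliberately naive stand-in for computer vision (an image goes in, a piece of information comes out). The function names and the brightness cutoff of 100 are hypothetical; real day/night classification is far more involved.

```python
import numpy as np

def to_grayscale(rgb_image):
    """Image processing: produce a new image by averaging the colour channels."""
    gray = rgb_image.astype(np.float64).mean(axis=2)
    return gray.round().astype(np.uint8)

def looks_like_daytime(rgb_image, brightness_cutoff=100.0):
    """Toy computer vision: return information about the image, not an image.

    Assumption for illustration only: a bright image was taken during the day.
    """
    return float(rgb_image.mean()) >= brightness_cutoff

bright = np.full((4, 4, 3), 200, dtype=np.uint8)
dark = np.full((4, 4, 3), 20, dtype=np.uint8)

gray = to_grayscale(bright)
print(gray.shape, looks_like_daytime(bright), looks_like_daytime(dark))
```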
Computer Vision Hierarchy
◉ The continuum from image processing to computer vision can be broken up into low-, mid-, and
high-level processes.
◉ Low-level vision: processing images for feature extraction.
◉ Intermediate-level vision: object recognition and segmentation.
◉ High-level vision: conceptual description of a scene, such as activity, intention, and
behaviour.
Low Level Vision
◉ A set of operations performed on images to enhance their quality and select useful
information, which will then be processed by humans or other algorithms.
◉ It is mainly concerned with extracting descriptions from images (that are usually represented as
images themselves). The analysis usually does not know anything about what objects are actually in
the scene, nor where the scene is relative to the observer. There may be multiple, largely independent
descriptions, such as edge fragments, spots, reflectances, line fragments, etc.
◉ For example, if one were looking at an image of a coffee mug on a desk, the low-level descriptions
would make explicit where the mug edges were, where specular highlights were on the mug surface, and
what the colours on the mug were. As this description is still linked to the image, these descriptions
would apply everywhere in the image, not just to the mug.
◉ Tasks:
⮚ Primitive operations such as image processing to reduce noise, contrast enhancement, and image
sharpening
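Two of these primitive operations, blurring (a simple form of noise reduction) and sharpening, can be sketched as 3 x 3 neighbourhood filters. This is a minimal illustration assuming NumPy and a grayscale image; the kernels are common textbook choices rather than anything specified in the slides, and the helper name `filter3x3` is hypothetical.

```python
import numpy as np

def filter3x3(image, kernel):
    """Apply a 3 x 3 kernel to a grayscale image (border pixels are replicated)."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    rows, cols = image.shape
    output = np.zeros((rows, cols), dtype=np.float64)
    # Accumulate each kernel weight times the correspondingly shifted image.
    # Both kernels below are symmetric, so correlation and convolution agree.
    for dr in range(3):
        for dc in range(3):
            output += kernel[dr, dc] * padded[dr:dr + rows, dc:dc + cols]
    return np.clip(output.round(), 0, 255).astype(np.uint8)

# Box blur: every output pixel is the average of its 3 x 3 neighbourhood.
BLUR = np.full((3, 3), 1.0 / 9.0)
# Sharpen: boost the centre pixel and subtract its four direct neighbours.
SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float64)

image = np.array([[10, 10, 10],
                  [10, 100, 10],
                  [10, 10, 10]], dtype=np.uint8)

blurred = filter3x3(image, BLUR)
sharpened = filter3x3(image, SHARPEN)
print(blurred)
print(sharpened)
```

Both filters take an image and return an image, which is what places them at the low level of the hierarchy.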
[Figure: image → image processing → image; examples shown: sharpening and blurring]
Mid Level Vision
◉ In the mid-level process, the inputs are generally images, but the outputs are generally image attributes
(e.g., edges, contours, identity of individual objects).
◉ Includes extraction of symbolic information from pre-processed images (low-level vision output) and
analysis techniques of the visual characteristics of the objects that are in the images.
◉ It is mainly concerned with extracting descriptions of the scene from the image descriptions extracted
at the low level. The output is usually in some more symbolic form, describing the position and shape
of portions of the scene. The analysis usually does not know anything about what objects are in the
scene, but does use a lot of knowledge of scene shape and how shape appears in an image.
◉ In our coffee mug example, the kinds of descriptions one might expect are 3D position of the edges of
the mug, portions of its surface shape, depth relationships between adjacent surface patches, which
features are moving and where, etc.
◉ Tasks:
⮚ Segmentation (partitioning an image into regions or objects)
⮚ Description of those objects to reduce them to a form suitable for computer processing
⮚ Classification (recognition) of objects
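The first two tasks can be sketched together: segment a grayscale image into connected bright regions, then reduce each region to a compact description. This is a minimal illustration assuming NumPy; the threshold-plus-flood-fill approach, the cutoff of 128, and the function names are assumptions chosen for simplicity, not the only way to segment an image.

```python
from collections import deque

import numpy as np

def segment_regions(image, cutoff=128):
    """Label 4-connected regions of pixels at or above the cutoff (0 = background)."""
    foreground = image >= cutoff
    labels = np.zeros(image.shape, dtype=np.int32)
    rows, cols = image.shape
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not foreground[r, c] or labels[r, c] != 0:
                continue
            # Found an unlabelled foreground pixel: flood-fill its region.
            next_label += 1
            labels[r, c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and foreground[ny, nx] and labels[ny, nx] == 0):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
    return labels, next_label

def describe_regions(labels, count):
    """Reduce each region to its area and bounding box (top, left, bottom, right)."""
    descriptions = []
    for label in range(1, count + 1):
        ys, xs = np.nonzero(labels == label)
        descriptions.append({
            "label": label,
            "area": int(ys.size),
            "bbox": (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())),
        })
    return descriptions

image = np.array([
    [200, 200, 0, 0, 0],
    [200, 200, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 255, 255],
], dtype=np.uint8)

labels, count = segment_regions(image)
regions = describe_regions(labels, count)
print(count, regions)
```

The input is an image and the output is a list of attributes per region, which matches the mid-level pattern of "image in, attributes out". Classification would then operate on these descriptions.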
High-level vision
◉ High-level vision infers semantics, for example through object recognition and scene understanding.
◉ At this level, the input is image attributes and the output is understanding.
◉ It includes interpretation of the evolving information provided by the middle level vision as well as
directing what middle and low level vision tasks should be performed. Interpretation may include
conceptual description of a scene like activity, intention, and behaviour.
◉ High-level vision is concerned mainly with interpreting the scene in terms of the objects in it, and is
usually based on knowledge of specific objects and relationships. The analysis usually involves
symbolic descriptions, although it might make reference to results from the low and middle levels to
verify hypotheses.
◉ Typical results of high level analysis are a naming of objects present in the scene, estimates of their
position, identification of objects that can satisfy a particular function, descriptions of what sorts of
motions are occurring, or summaries of what sort of scene it is (e.g. an office scene).
◉ In the coffee mug example, the results might say that we are looking at a coffee mug sitting on a desk
at a given position, that the mug is half-full, that there is nothing else nearby that could be used to hold coffee,
and that the desk is cluttered.
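A toy sketch of this last step, in plain Python: symbolic object descriptions go in and a scene-level summary comes out. Everything here is hypothetical, including the `SCENE_HINTS` knowledge base, the object labels, and the overlap-counting rule; real high-level vision systems rely on much richer knowledge and learned models.

```python
# Hypothetical knowledge base: object labels that hint at each kind of scene.
SCENE_HINTS = {
    "office": {"desk", "monitor", "keyboard", "coffee mug"},
    "kitchen": {"stove", "sink", "kettle", "coffee mug"},
}

def classify_scene(object_labels):
    """Pick the scene whose hint set overlaps most with the observed labels."""
    observed = set(object_labels)
    best_scene, best_score = "unknown", 0
    for scene, hints in SCENE_HINTS.items():
        score = len(observed & hints)
        if score > best_score:
            best_scene, best_score = scene, score
    return best_scene

def summarize(objects):
    """Name the objects present and summarise what sort of scene it is."""
    labels = [obj["label"] for obj in objects]
    return {"objects": sorted(labels), "scene": classify_scene(labels)}

# Hypothetical output of the lower levels: recognised objects and positions.
detected = [
    {"label": "coffee mug", "position": (120, 80)},
    {"label": "desk", "position": (0, 150)},
    {"label": "keyboard", "position": (60, 140)},
]

summary = summarize(detected)
print(summary)
```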
Overview
Overview of Computer Vision
◉ Computer vision is a field of artificial intelligence that trains computers to interpret and understand
the visual world. Using digital images from cameras and videos together with deep learning models,
machines can accurately identify and locate objects, then react to what they “see”.
◉ As computer vision evolved, algorithms were created to solve individual challenges.
Machines became better at visual recognition with repetition. Over the years, deep learning
techniques and technology have improved enormously. We now have the ability to
program supercomputers to train themselves, improve over time, and provide capabilities to
businesses as online applications.