
Computer Vision

Primary Tasks: There are primarily three tasks that computer vision accomplishes:

1. Semantic Segmentation (Image Classification)
2. Classification + Localization
3. Object Detection
Semantic Segmentation
Semantic segmentation is also referred to here as image classification. It is a process in computer vision where an image is classified according to its visual content. Basically, a set of classes (objects to identify in images) is defined, and a model is trained to recognize them with the help of labelled example photos.
In simple terms, it takes an image as input and outputs a class, i.e. cat, dog, etc., or a probability for each class, from which the one with the highest probability is most likely to be correct. For humans, this ability comes naturally and effortlessly, but for machines it is a fairly complicated process.
• Image Classification: Predict the type or class of an object in an image.
• Input: An image with a single object, such as a photograph.
• Output: A class label (e.g. one or more integers that are mapped to class labels).
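The "probability of classes" idea above boils down to picking the class with the highest predicted probability. The sketch below uses made-up class names and scores, not real model output:

```python
# Hypothetical probabilities, as a trained classifier might output them.
class_probs = {"cat": 0.82, "dog": 0.15, "bird": 0.03}

def classify(probs):
    """Return the class label with the highest predicted probability."""
    return max(probs, key=probs.get)

print(classify(class_probs))  # -> cat
```

A real model would produce these scores from the image's pixels; only the final argmax step is shown here.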
Classification and Localization
Once the object is classified and labelled, the localization task is invoked, which puts a bounding box around the object in the picture. The term 'localization' refers to where the object is in the image. Say we have a dog in an image: the algorithm predicts the class and draws a bounding box around the object.
• Object Localization: Locate the presence of objects in an image and indicate their location with a bounding box.
• Input: An image with one or more objects, such as a photograph.
• Output: One or more bounding boxes (e.g. defined by a point, width, and height).
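A bounding box "defined by a point, width, and height" can be held in a small data structure. This is a minimal sketch; the field names and the top-left-corner convention are assumptions, not a standard:

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x: int       # x-coordinate of the top-left corner
    y: int       # y-coordinate of the top-left corner
    width: int
    height: int

    def area(self):
        """Area of the box in pixels."""
        return self.width * self.height

box = BoundingBox(x=50, y=30, width=200, height=150)
print(box.area())  # -> 30000
```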
Object Detection
• When human beings see a video or an image, they immediately identify the objects present in it. This intelligence can be duplicated using a computer. If we have multiple objects in the image, the algorithm will identify all of them and localise (put a bounding box around) each one. You will therefore have multiple bounding boxes and labels around the objects.
• Object Detection: Locate the presence of objects with a bounding box, and the types or classes of the located objects in an image.
• Input: An image with one or more objects, such as a photograph.
• Output: One or more bounding boxes (e.g. defined by a point, width, and height), and a class label for each bounding box.
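Since detection pairs every bounding box with a class label, its output is naturally a list of (label, box) pairs. The boxes and labels below are made up for illustration:

```python
# Hypothetical detections: each entry is (class_label, (x, y, width, height)).
detections = [
    ("dog", (40, 60, 180, 120)),
    ("cat", (300, 80, 90, 110)),
]

def summarize(dets):
    """Return one 'label at (x, y)' string per detected object."""
    return [f"{label} at ({x}, {y})" for label, (x, y, w, h) in dets]

print(summarize(detections))  # -> ['dog at (40, 60)', 'cat at (300, 80)']
```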


Pixel: The word "pixel" means a picture element. Every photograph, in digital form, is made up of pixels. They are the smallest units of information that make up a picture. Usually round or square, they are typically arranged in a 2-dimensional grid.
Resolution: The number of pixels in an image is called the resolution.
Another convention is to express the number of pixels as a single number, as in a 5-megapixel camera (a megapixel is a million pixels). This means that the pixels along the width multiplied by the pixels along the height of the image taken by the camera equal 5 million pixels. In the case of a 1280×1024 monitor, the resolution could also be expressed as 1280 × 1024 = 1,310,720 pixels, or about 1.31 megapixels.
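The megapixel arithmetic above is simply width times height divided by one million:

```python
def megapixels(width, height):
    """Resolution in megapixels: total pixel count divided by one million."""
    return width * height / 1_000_000

print(megapixels(1280, 1024))  # -> 1.31072
```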
Each pixel of such an image uses 1 byte, which is equivalent to 8 bits of data. Since each bit can take two possible values, 8 bits can represent 2^8 = 256 possible values, which start at 0 and end at 255.
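The 8-bit value range can be checked directly, since each additional bit doubles the number of combinations:

```python
bits = 8
num_values = 2 ** bits  # each bit doubles the number of combinations
print(num_values)                        # -> 256
print(f"0 to {num_values - 1}")          # -> 0 to 255
```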

Grayscale Images
Grayscale images are images which have a range of shades of gray without apparent colour. The darkest possible shade is black, which is the total absence of colour, or a pixel value of zero. The lightest possible shade is white, which is the total presence of colour, or a pixel value of 255. Intermediate shades of gray are represented by equal brightness levels of the three primary colours.
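The claim that a gray shade has equal brightness in all three primary colours can be sketched as a simple check; the function name here is an assumption for illustration:

```python
def is_gray(r, g, b):
    """A pixel is a shade of gray when its R, G, and B values are equal."""
    return r == g == b

print(is_gray(128, 128, 128))  # -> True  (mid gray)
print(is_gray(255, 0, 0))      # -> False (pure red)
```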
Coloured Images
All the images that we see around us are coloured images. These images are made up of the three primary colours: red, green, and blue. All the colours that are present can be made by combining different intensities of red, green, and blue.
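Combining different intensities of the three channels can be written as an (R, G, B) triple. A few familiar combinations, assuming 8-bit channels (0 to 255 each):

```python
# Each colour is a triple of 8-bit intensities (red, green, blue).
colours = {
    "red":    (255, 0, 0),
    "yellow": (255, 255, 0),    # full red + full green
    "white":  (255, 255, 255),  # full intensity in all three channels
    "black":  (0, 0, 0),        # zero intensity in all three channels
}
print(colours["yellow"])  # -> (255, 255, 0)
```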
