Unit-5 Computer Vision

Computer vision is a branch of AI that allows computers to interpret and analyze visual data from images and videos. Key applications include facial recognition, medical imaging, and autonomous vehicles, while tasks involve image classification, object detection, and feature extraction. Techniques like convolution and pooling are fundamental in processing images, with Convolutional Neural Networks (CNNs) being widely used for image recognition.

Uploaded by

araj131207
Copyright
© All Rights Reserved

Part-B, Unit-5 Computer Vision

Q: What is Computer Vision?


Ans: - Computer vision is a branch of AI that enables computers to extract meaningful information from
images, videos, and other visual inputs.
Computer vision plays a role similar to that of the human eye: it lets a machine take in images or other visual data, then
process and analyse them using algorithms in order to understand the real-world scene the images depict.

Q: Mention some Applications of Computer Vision.


Ans: Some applications of Computer Vision are:

Facial recognition
This technology is most frequently encountered on smartphones. It is used to recognise and verify a person, object, etc.
by comparing the visuals with pre-defined stored data. Such mechanisms are often used for security and safety
purposes.
For eg: Face-unlock security on devices and traffic cameras are some examples that use facial recognition.

Facial filters
A face filter is a computer-generated effect that applies predesigned edits or changes to a loaded image. Modern
social media apps like Snapchat and Instagram use this kind of technology, which extracts facial landmarks and processes
them using AI to get the best result.

Google Lens
To search for information, Google Lens uses computer vision to capture and analyse different features of the input image,
compares them with a database of images, and then gives us the search results.

Automotive (Autonomous Cars)


Machinery in industry now uses computer vision as well. Autonomous cars are equipped with sensors and software that
can detect movement through 360 degrees, determine the car's location, detect objects, and estimate the depth or dimensions
of the world around them.
For eg: Companies like Tesla are developing self-driving cars.

Medical Imaging
For the last few decades, computer vision applications in medical imaging have been a trustworthy aid for physicians and doctors.
They create and analyse images and help doctors with their interpretation.
For example, such applications are used to read 2D scan images and convert them into interactive 3D models.

Other applications of computer vision are:


Banking, Agriculture, Retail Business, Warehouse Automation, Damage Analysis, Livestock Farming etc.

Computer Vision Tasks


A computer vision application works by performing certain tasks on the data or input provided by the user, so that it can process
and analyse the situation and predict the outcome.

Single object
This means the input image given to the Computer Vision application contains a single object. It is divided into two categories: -

1. Image Classification
Image Classification is the task of identifying the object in the input image and assigning it a label from a set of predefined categories.

2. Classification + Localization
As the name suggests, the task identifies the object and locates it in the input image.

Multiple object
This means the input image given to the Computer Vision application contains multiple objects. It is divided into two categories: -

1. Object detection
Object detection tasks extract features from the input and use learning algorithms to recognize and locate instances of an object
category.
2. Instance segmentation
Instance segmentation detects each instance of an object and assigns a label to every pixel that belongs to it. It is used for
tasks such as counting the number of objects.
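
The counting use of instance segmentation can be sketched as follows. This is only an illustration: the mask below is made up, and the convention that 0 marks background while each positive integer marks one object instance is an assumption, not something stated in these notes.

```python
# Hypothetical per-pixel instance mask (assumed convention):
# 0 = background, every positive integer = one distinct object instance.
mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [3, 0, 0, 2],
]

def count_instances(label_mask):
    """Count distinct non-background labels in a per-pixel mask."""
    labels = {label for row in label_mask for label in row if label != 0}
    return len(labels)

object_count = count_instances(mask)
print(object_count)
```

Because every pixel carries the label of the instance it belongs to, counting objects reduces to counting distinct labels.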

Basics of Images
The word “pixel” means a picture element.

Pixels
• Pixels are the fundamental element of a photograph.
• They are the smallest unit of information that make up a picture.
• They are typically arranged in a 2-dimensional grid.
• In general terms, the more pixels an image has, the more closely it resembles the original scene.

Resolution
• The number of pixels in an image is sometimes called the resolution.
• The term is conventionally used for the area covered by the pixels.
• For eg: 1080 x 720 pixels is a resolution giving the number of pixels along the width and height of that picture.
• A megapixel is a million pixels.
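
As a small worked example of the arithmetic behind resolution and megapixels, the sketch below uses the 1080 x 720 figure from the notes. The function name `megapixels` is a hypothetical helper introduced only for illustration.

```python
def megapixels(width, height):
    """Return the total pixel count of an image, expressed in megapixels."""
    return (width * height) / 1_000_000  # one megapixel = one million pixels

total_pixels = 1080 * 720  # width x height
print(total_pixels, megapixels(1080, 720))
```

An image of 1080 x 720 pixels therefore holds 777,600 pixels, which is a little under 0.8 megapixels.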

Pixel value
• A pixel value represents the brightness of the pixel.
• The range of a pixel value is 0-255 (255 = 2^8 - 1).
• 0 is taken as black or no colour and 255 is taken as white.

Q: Why do pixel values have numbers?


Ans: Computer systems work only with ones and zeros, that is, the binary system. Each bit in a computer system can
be either a zero or a one. Each pixel of an image uses 1 byte, which is 8 bits. Since each bit has two possible values,
8 bits give 2^8 = 256 possible values, which start at 0 and end at 255.
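
The bit arithmetic can be checked with a few lines of Python. This is a minimal sketch and the variable names are illustrative only.

```python
BITS_PER_PIXEL = 8                    # 1 byte per pixel, as in the notes

num_values = 2 ** BITS_PER_PIXEL      # each bit doubles the number of combinations
max_value = num_values - 1            # counting starts at 0, so the top value is one less

darkest = format(0, "08b")            # all eight bits off
brightest = format(max_value, "08b")  # all eight bits on
print(num_values, max_value, darkest, brightest)
```

Eight bits give 256 distinct combinations, and because the first one is 0, the largest pixel value is 255.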

Greyscale Images
• Greyscale images are images which have a range of shades of grey without apparent colour. The lightest shade is
white, the total presence of colour, at 255, and the darkest shade is black, at 0.
• Intermediate shades of grey have equal brightness levels of the three primary colours RGB.
• Computers store the images we see in the form of these numbers.

RGB colours
• All coloured images are made up of three primary colours: Red, Green and Blue.
• All other colours are formed by mixing these primary colours in different proportions.
• A computer stores RGB images in three different channels, called the R channel, G channel and B channel.
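
To make the idea of channels concrete, here is a minimal sketch that represents a tiny 2 x 2 colour image as nested lists and splits it into its three channels. The image data and the helper names `split_channels` and `is_grey` are made up for illustration. Real libraries usually store images as arrays rather than lists of tuples.

```python
# A tiny 2x2 RGB image; each pixel is an (R, G, B) triple with values 0-255.
image = [
    [(255, 0, 0), (0, 255, 0)],        # red pixel, green pixel
    [(0, 0, 255), (128, 128, 128)],    # blue pixel, mid-grey pixel
]

def split_channels(img):
    """Return three 2D grids holding the R, G and B values separately."""
    r = [[px[0] for px in row] for row in img]
    g = [[px[1] for px in row] for row in img]
    b = [[px[2] for px in row] for row in img]
    return r, g, b

def is_grey(px):
    """A pixel is a shade of grey when its three channel values are equal."""
    return px[0] == px[1] == px[2]

r_channel, g_channel, b_channel = split_channels(image)
print(r_channel, g_channel, b_channel, is_grey(image[1][1]))
```

The last pixel has equal R, G and B values, which is why it appears as a shade of grey.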

Image Features
• A feature is a description of an image.
• Features are the specific structures in the image such as points, edges or objects.
• Other examples of features are related to CV tasks such as motion in image sequences, or to shapes defined in terms
of curves or boundaries between different image regions.

OpenCV, or Open Source Computer Vision Library, is a tool that helps a computer extract these features from
images. It is capable of processing images and videos to identify objects, faces, or even handwriting.

Convolution
Convolution is one of the most critical, fundamental building blocks in computer vision and image processing.
We learned that computers store images as numbers and that pixels are arranged in a particular manner to create the
picture we can recognize. As we change the values of these pixels, the image changes.
Image convolution is simply an element-wise multiplication of two matrices followed by a sum. Convolution uses
a ‘kernel’ to extract certain ‘features’ from an input image.

Kernel: A kernel is a small matrix used for blurring, sharpening, and many other effects. It is slid across the image
and multiplied with the input such that the output is enhanced in a certain desirable manner.

Q: Mention the steps of convolution.
Ans: - The steps of convolution are:
1. Take two matrices of the same dimensions (the kernel and the region of the input image it currently covers).
2. Multiply them, element-by-element (i.e., not a matrix product, just a simple multiplication of matching elements).
3. Sum the elements together.
4. The sum becomes the value of the output at the centre pixel of that region.
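
The steps above can be sketched in plain Python as follows. This is a minimal illustration under some assumptions that are not in the notes: no padding, a stride of 1, and a made-up 4 x 4 image with a common 3 x 3 sharpening kernel. Like most deep-learning code, it does not flip the kernel, so strictly speaking it computes a cross-correlation. For a symmetric kernel such as this one the result is the same.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    output = []
    for top in range(ih - kh + 1):
        row = []
        for left in range(iw - kw + 1):
            total = 0
            # Steps 2 and 3: multiply element-by-element, then sum.
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            # Step 4: the sum is the output for the centre of this region.
            row.append(total)
        output.append(row)
    return output

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
    [0, 1, 2, 3],
]
sharpen = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]
result = convolve2d(image, sharpen)
print(result)  # [[5, 12], [18, 27]]
```

Without padding, a 3 x 3 kernel fits on a 4 x 4 image in only four positions, so the output is 2 x 2.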

Convolutional Neural Network


A Convolutional Neural Network (CNN) is an efficient recognition algorithm, widely used in image recognition and processing, that is specifically
designed to process pixel data.

Convolution Layer
The first Convolution Layer is responsible for capturing low-level features such as edges, colour, gradient
orientation, etc. In the convolution layer, several kernels process the image to produce
several features. The output of this layer is called the feature map.
For eg: Think of teaching a child. We teach him the landmarks in an image, and if he later finds similar landmarks in
another image, he identifies the object. The same is the case with AI: we use convolution to pick out the landmarks from the input
for further processing.

Rectified Linear Unit Function


The next layer in the CNN is the Rectified Linear Unit function or the ReLU layer. This layer simply gets rid of all the
negative numbers in the feature map by replacing them with zero, and lets the positive numbers stay as they are. It has become the default activation
function for many types of neural networks because a model that uses it is easier to train and often achieves better
performance.
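
A minimal sketch of ReLU applied to a feature map is shown below. The feature map values are made up for illustration.

```python
def relu(feature_map):
    """Replace every negative value with 0 and keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in feature_map]

feature_map = [
    [-3, 2],
    [0, -1],
]
activated = relu(feature_map)
print(activated)  # [[0, 2], [0, 0]]
```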

Pooling Layer
The Pooling layer is responsible for reducing the spatial size of the convolved feature while still retaining the important
features. It also makes the network more resistant to small transformations, distortions, and translations in the input image.

There are two types of pooling:


1. Max pooling: The maximum pixel value in the region covered by the pooling window is selected.
2. Average pooling: The average value of all the pixels in the region covered by the pooling window is selected.
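
Both types of pooling can be sketched with one small function. The 2 x 2 window, the stride of 2 and the feature map values are assumptions chosen for illustration, and `pool2d` is a hypothetical helper name.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Reduce a 2D feature map by taking the max or average of each window."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, height - size + 1, stride):
        row = []
        for left in range(0, width - size + 1, stride):
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            else:
                row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
max_pooled = pool2d(feature_map, mode="max")      # [[6, 4], [8, 9]]
avg_pooled = pool2d(feature_map, mode="average")  # [[3.75, 2.25], [4.5, 4.0]]
print(max_pooled, avg_pooled)
```

In both cases the 4 x 4 feature map shrinks to 2 x 2, which is the reduction in spatial size described above.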

Q: What is the difference between convolution and pooling layer?


Ans: - The major difference is that if you use a large stride in the convolution filter, you change the types of features
you extract in the algorithm, whereas if you change the stride in the pooling layer, you simply change how much the data is
downsampled.
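
The effect of stride on downsampling can be made concrete with the usual output-size formula. For a window of size k moved with stride s over an input of size n, and assuming no padding, the output size is floor((n - k) / s) + 1. The helper name `output_size` is hypothetical.

```python
def output_size(input_size, window, stride):
    """Number of window positions along one dimension, with no padding."""
    return (input_size - window) // stride + 1

# An 8-pixel-wide input with a 2-wide window:
stride_1 = output_size(8, 2, 1)  # window moves one pixel at a time
stride_2 = output_size(8, 2, 2)  # window moves two pixels at a time
print(stride_1, stride_2)        # 7 4
```

Doubling the stride roughly halves the output size, whether the window is a convolution kernel or a pooling window.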

Fully Connected Layer


The final layer in the CNN connects all the dots and draws a conclusion, linking the input image to its output label.
The output from the convolutional/pooling layers represents high-level features in the data. That output needs to be
connected to the output layer; a fully-connected layer is a cheap way of learning non-linear combinations of these
features.
For eg: if the image is of a cat, features representing things like whiskers or fur should have high probabilities for the label
“cat”.
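
The cat example can be sketched as a single fully connected layer followed by a softmax, which turns scores into probabilities. Everything specific here is an assumption made for illustration: the three features, the weights, the biases and the two labels are invented, and softmax is a common choice for the output step rather than something these notes specify. A trained network would learn its weights from data.

```python
import math

def fully_connected(features, weights, biases):
    """Each output score is a weighted sum of all input features plus a bias."""
    return [
        sum(f * w for f, w in zip(features, neuron_weights)) + bias
        for neuron_weights, bias in zip(weights, biases)
    ]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical feature strengths: whiskers, fur, wheels.
features = [0.9, 0.8, 0.1]
labels = ["cat", "car"]
weights = [
    [2.0, 2.0, -1.0],   # "cat" responds to whiskers and fur
    [-1.0, -1.0, 3.0],  # "car" responds to wheels
]
biases = [0.0, 0.0]

scores = fully_connected(features, weights, biases)
probabilities = softmax(scores)
predicted = labels[probabilities.index(max(probabilities))]
print(predicted, probabilities)
```

Strong whisker and fur features push the score for "cat" well above the score for "car", so "cat" receives most of the probability.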

*********************
