ARTIFICIAL INTELLIGENCE IN
COMPUTER VISION
Aryan Karn
Motilal Nehru National Institute of Technology Allahabad, Prayagraj
Department of Electronics and Communication Engineering
Abstract- Computer vision is an area of research concerned with enabling computers to see. At the most abstract level, computer vision problems aim to infer something about the world from observed image data. It is a multidisciplinary subject that may be loosely classified as a branch of artificial intelligence and machine learning, both of which may involve specialized techniques as well as general-purpose learning methods. As an interdisciplinary field of research, it may seem disorganized, with methods borrowed and reused from various engineering and computer science disciplines. While one specific vision problem may be readily solved with a hand-crafted statistical technique, another may need a vast and sophisticated ensemble of generic machine learning algorithms. Computer vision as a discipline is at the cutting edge of science. As with any frontier, it is thrilling and chaotic, often with no trustworthy authority to turn to. Numerous useful concepts lack a theoretical foundation, and some theories prove ineffective in practice; developed areas are widely dispersed, and one often seems totally unreachable from another.

Keywords- Computer vision, Artificial intelligence, Neural networks, CNN, Deep learning, Machine learning

I. INTRODUCTION
Recently, computer vision has gained traction and popularity as a consequence of its many applications in areas such as health and medicine, sports and entertainment, robot design, and self-driving cars. Many of these applications rely on visual recognition tasks such as image classification, localization, and detection. Recent advances in Convolutional Neural Networks (CNNs) have led to extraordinary performance on these state-of-the-art visual recognition tasks and systems, demonstrating the power of CNNs. Consequently, CNNs have emerged as the basic building blocks of deep learning computations in computer vision.

Deep Neural Networks (DNNs) are a kind of neural network with strong image recognition capabilities and are often used in computer vision. Convolutional Neural Networks (CNNs or ConvNets) are a subtype of DNNs that are often employed in decoding visual signals. In addition, they are used in Computer Vision and Natural Language Processing (NLP) to organize material. A convolutional neural network can be constructed from a variety of building blocks, including convolution layers, pooling layers, and fully connected layers, all of which are discussed briefly in this article. The following sections cover Deep Learning and the many neural network techniques grouped under it. The article also covers Convolutional Neural Networks, their construction, and their applications in several fields, including medicine and engineering.

II. LITERATURE SURVEY

A. Deep Learning and Neural Networks

Deep Learning is a subset of Machine Learning, which is in turn a subset of Artificial Intelligence (AI). Machine learning uses algorithms and training data to detect patterns automatically, with little human intervention. Artificial Intelligence is a method for teaching computers to act like humans. Deep Learning, meanwhile, is inspired by the structure and function of the human brain, as represented symbolically by an artificial neural network. [12]

Although deep learning was originally proposed in the 1980s, it has shown significant benefits in recent years for two primary reasons:
A. Deep learning requires large amounts of labeled data. For instance, the development of autonomous vehicles requires collecting many images and lengthy video recordings.
B. Deep learning requires substantial computing power. High-performance GPUs offer an efficient parallel architecture that is well suited to deep learning. When combined with clusters or cloud computing, this can reduce the time required to train a deep learning network from weeks to hours or less. [11]

Deep learning can be used to solve a wide range of problems. In the closing part of the article, the author discusses autonomous driving, aerospace and defense, medical research, industrial automation, and electronics in more detail.

In general, a Neural Network is a kind of algorithm that accepts certain input parameters and processes them using an Activation Function to produce the desired Output. In this method, the input to
International Journal of Engineering Applied Sciences and Technology, 2021
Vol. 6, Issue 1, ISSN No. 2455-2143, Pages 249-254
Published Online May 2021 in IJEAST (http://www.ijeast.com)
output processing component is referred to as a neuron. Consider the basic example of calculating the purchase price of a home. Numerous factors must be taken into account, each of which affects the price: for instance, the square footage of the house, the number of bedrooms, and the zip code. Thus, if we take price as the Output, the following Neural Network shows how a neural network could produce that Output using the parameters stated above as inputs.

Fig 1: An example of a Standard Neural Network [1]

Each circle represents a neuron that is given an Activation Function, which computes the desired Output by combining the values of the various input parameters. The Activation Function is determined by the algorithm's purpose/application. For instance, in the above example, the objective is to determine the maximum price of a house. For the sake of simplicity, let us suppose that the Output depends solely on two input variables, namely the size and the number of bedrooms. In this case, the bigger the house and the more bedrooms it has, the greater its price. Thus, the Activation Function (Neuron) will be defined so that it picks the greatest possible value for each input parameter and then computes the Output. This seems very easy in this case, but when a large number of factors are involved, decision-making is not as straightforward as picking maximum or minimum values alone. This is where Data-Driven Machine Learning comes into play. The method takes advantage of data saved (learned!) from previous instances in order to calculate the optimal Output using the Activation Function.

The above example shows a Standard Neural Network, which is often used to generate Output from statistical, numerical, and other quantitative data. The kind of Neural Network to employ is determined by the nature of the input data that the algorithm must handle. The following table summarizes the capabilities of different Neural Networks in processing various kinds of input data. [1] In the remainder of this article, the author concentrates only on the Convolutional Neural Network method used in Deep Learning.

B. Deep Learning using Convolutional Neural Network for Computer Vision

In deep learning, a convolutional neural network (CNN), often known as a ConvNet, is a kind of deep neural network that is frequently used to analyze visual images. These networks are also known as shift-invariant or space-invariant artificial neural networks (SIANNs) because of their shared-weights architecture and translation-invariance properties. CNNs may be used to recognize images and videos, build recommender systems, classify images, perform medical image analysis, and process natural language. In the next part, the author discusses what convolution is, how it extracts information from images, and the architecture and components of a CNN, among other things. This shows how a CNN examines the content of an image and processes the data to deliver the intended result.

C. Architectural Overview

Convolution is a mathematical operation that takes two functions and produces a third function expressing how the shape of one is modified by the other. Carrying out the operation requires initializing the result function and then computing its values. In image processing, convolution breaks the content of an image down into its components in order to assist machine learning and ultimately generate the desired Output through the algorithm. Neural networks, including deep learning networks, are capable of analyzing such image data, and deep learning in particular enables data-driven learning. In effect, the convolution process separates the wheat from the chaff.

This structure may be seen as a three-dimensional volume of neurons (width, height, and depth). A distinguishing feature of CNNs, compared with earlier feed-forward networks, is that they improve computational efficiency by adding new types of layers to their design. Let us take a closer look at the general design of CNNs. [4]

D. Basic CNN components

1. Convolutional Layer:

A convolutional neural network is a kind of neural network model designed for two-dimensional image data, although it can also be used with one-dimensional and three-dimensional data. Convolution is performed using a channel (a small matrix whose size may be chosen, more commonly called a filter or kernel). This channel travels across the whole image, and the task is to reproduce the image's features from the original pixel values. At each position, every value in the channel is multiplied by the pixel value beneath it, and these products are added together to form a single number. When doing a comparison action, the channel moves
To detect a vertical edge in the image, look at the pixel values: a higher pixel value means that part of the image is brighter, and a lower value means it is darker, so a vertical edge appears where the values change sharply from one column to the next. [14]
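The sliding-channel convolution described under the convolutional layer, applied to vertical-edge detection, can be sketched in plain Python. This is a minimal sketch under stated assumptions: a 3×3 vertical-edge channel with weights [1, 0, -1] in every row, no padding, and a stride of one are illustrative choices, not values taken from the article, and the function name `conv2d` and the toy image are hypothetical. As in most deep learning libraries, the channel is not flipped, so strictly this computes a cross-correlation.

```python
def conv2d(image, kernel):
    """Hypothetical helper: 'valid' 2D convolution (no padding, stride 1).

    Like most deep learning libraries, the kernel is not flipped,
    so strictly this is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply each kernel value by the pixel beneath it and sum the products.
            total = 0
            for a in range(kh):
                for b in range(kw):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Toy 6x6 image (assumed for illustration): bright left half (10), dark right half (0).
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Assumed vertical-edge channel: positive weights on the left, negative on the right.
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

result = conv2d(image, vertical_edge)
for row in result:
    print(row)
```

On this toy image every output row is [0, 30, 30, 0]: the large values mark the columns where brightness drops from 10 to 0, which is the vertical edge, while uniform regions produce 0.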
2. Pooling Layer:
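A pooling layer typically shrinks each feature map by summarizing small windows of it. The sketch below assumes max pooling with a 2×2 window and a stride of 2, a common configuration that is an assumption here rather than something the article specifies; the function name `max_pool2d` and the sample feature map are hypothetical.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Hypothetical helper: max pooling over size x size windows with the given stride."""
    out = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            # Collect the values in the current window and keep the largest one.
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        out.append(row)
    return out


# Sample 4x4 feature map (assumed for illustration).
fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]
print(max_pool2d(fmap))
```

For the 4×4 sample map, the four non-overlapping 2×2 windows have maxima 6, 4, 8, and 9, so the pooled map is [[6, 4], [8, 9]], a quarter of the original size.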
LeNet, one of the earliest convolutional neural networks, was developed from the late 1980s through the late 1990s. The tasks that convolutional neural networks were capable of doing grew

At the beginning of the paper, we discussed an overview of deep learning and how neural networks in deep learning are deployed to process various inputs to obtain desired outputs. In the later part, the author has focused on Convolutional Neural
towardsdatascience.com/understanding-1d-and-3d-convolution-neural-network-keras-9d8f76e29610.
adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks/.