Chapter
INTRODUCTION:
"The real power of interactive technologies is that they let us learn in ways that aren’t
otherwise possible or practical" said by David Lessnar. Everything on our planet was looking
for a way to evolve or go through an evolutionary process. Likewise, Artificial intelligence
(AI) research and applications have sparked extensive scientific, economic, social, and political
debate. Here, we contend that statistics, as an interdisciplinary scientific area, is crucial to both
the current and future development of artificial intelligence. In fact, statistics could be seen as
a fundamental component of AI. Statistics is a natural partner for other disciplines in teaching,
research, and practise because of its specialised knowledge of data evaluation, which begins
with the precise formulation of the research question and progresses through a study design
stage to analysis and interpretation of the results. We examine the contributions of statistics to
the field of artificial intelligence with regard to methodological development, study planning
and design, evaluation of data quality and data collecting, distinction of causation and
associations, and evaluation of uncertainty in results.
STATISTICS:
The goal of statisticians is to comprehend and, whenever feasible, manage the sources of
variation.
DATA:
Data is a unique piece of information that is gathered and translated for a specific purpose.
Data that is not formatted properly is not useful to either computers or people. Data can be
found in a variety of formats, including bits and bytes saved in electronic memory, numbers or
text written on paper, or facts stored in a person's mind. Since the advent of computers, the
term "data" has come to describe any transferred or stored computer information. There are
different kinds of data, such as the following:
• Sound
• Video
• Single character
• Number (integer or floating-point)
• Picture
• Boolean (true or false)
• Text (string)
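As a minimal sketch in plain Python (the values are illustrative, not from the original text), these kinds of data can be represented directly in a program, and, as described next, each is ultimately stored as bits:

```python
# A minimal sketch (plain Python) of the kinds of data listed above.
single_character = "A"          # single character
number_int = 42                 # integer number
number_float = 3.14             # floating-point number
flag = True                     # Boolean (true or false)
text = "hello, world"           # text (string)
picture = bytes([255, 0, 127])  # raw pixel bytes, standing in for image data

# Every value is ultimately stored as binary digits (bits).
# For example, the character "A" is stored as the 8-bit pattern:
print(format(ord(single_character), "08b"))  # -> 01000001
```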
Data is kept in a computer's storage as a string of binary digits (bits) that can either be 1 or
0. The data may take the shape of images, text documents, computer programs, audio or video
clips, or other types of information. The computer's CPU processes the data by using logical
operations to create output (new data) from input (old data), which can be saved in files and
folders on the computer's storage. As binary data (zero or one) is how data is kept on computers,
it may be processed, created, saved, and stored digitally. This enables data to be transferred
between computers using various media devices or a network connection. Data does not lose
quality or deteriorate over time, however often it is used. Data falls into three types:
• Structured Data
• Semi-structured Data
• Unstructured Data
STRUCTURED DATA:
Data that has been arranged and formatted in a certain way to make it easily readable
and comprehensible by both people and machines is referred to as structured data. This is often
accomplished by using a well-defined data model or schema, which gives the data structure.
Spreadsheets and databases frequently contain structured data, which is distinguished by its
organization. Each record or row represents a distinct instance of that data, and each data
element is often given a specific field or column in the schema. A client database might, for
instance, have fields for the customer's name, address, phone number, and email address in
each record. Because it can be quickly accessed, queried, and analyzed using a variety of tools
and techniques, structured data is extremely important. This makes it the perfect format for
applications involving machine learning and artificial intelligence as well as data-driven
software like business intelligence and analytics. Common sources of structured data include:
• SQL Databases
• Spreadsheets such as Excel
• OLTP Systems
• Online forms
• Sensors such as GPS or RFID tags
• Network and Web server logs
• Medical devices
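As a minimal sketch of structured data, the customer table described above can be expressed as a schema of typed fields; the table name, field names, and sample record here are hypothetical:

```python
import sqlite3

# A hypothetical customer table: each column is a field in the schema,
# and each row is one record, as described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        name    TEXT,
        address TEXT,
        phone   TEXT,
        email   TEXT
    )
""")
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    ("Jane Doe", "12 Example St", "555-0100", "jane@example.com"),
)

# Structured data can be queried directly because the schema is known.
for row in conn.execute("SELECT name, email FROM customers"):
    print(row)
```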
SEMI-STRUCTURED DATA:
Semi-structured data does not conform to a rigid data model
or schema, and it may include components that are difficult to classify or categorize. Metadata
or tags that offer further details about the data elements are frequently used to identify semi-
structured data. An XML document, for instance, might have tags that describe the document's
structure as well as other tags that offer metadata about the content, like the author, the date,
or keywords. JSON, which is frequently used to exchange data between web applications, is
another example of semi-structured data, as are log files, which frequently combine structured
and unstructured data. As businesses gather and handle more data from many sources, such as
social media, IoT devices, and other unstructured sources, semi-structured data is becoming
more and more prevalent. Although working with semi-structured data can be more difficult
than working with rigorously organized data, semi-structured data gives more flexibility and
adaptability, making it an important tool for data management and analysis. In short, semi-structured
data has some structure but no set or rigid schema: it does not fit a conventional logical
database, yet its organizational characteristics make analysis easier, and parts of it can still be
stored in a relational database. Examples of semi-structured data include:
• E-mails
• XML and other markup languages
• Binary executables
• TCP/IP packets
• Zipped files
• Integration of data from different sources
• Web pages
Characteristics of semi-structured data:
• Data usually has an irregular and partial structure. Some sources have only an implicit
structure, which makes it difficult to interpret the relationships within the data
• Schema and data are usually tightly coupled, i.e. they are not only linked together but
are also dependent on each other. The same query may update both schema and data,
with the schema being updated frequently
• The distinction between schema and data is uncertain or unclear, which complicates
the design of the data structure
• Storage cost is high compared to structured data
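As a small illustration, the hypothetical JSON document below mixes data with self-describing tags: the field names act as metadata, but no external schema fixes which fields must appear.

```python
import json

# A hypothetical semi-structured record: field names act as metadata,
# but records need not share the same fields or nesting.
doc = """
{
  "author": "A. Writer",
  "date": "2023-01-15",
  "keywords": ["statistics", "AI"],
  "body": "Free-form text with no fixed structure."
}
"""
record = json.loads(doc)
print(record["author"], record["keywords"])
```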
UNSTRUCTURED DATA:
Unstructured data is any data that does not adhere to a data model and has no obvious
organization, making it difficult for computer programs to use. Because it is not organized in
a predefined way and has no predefined data model, unstructured data is not well suited to a
common relational database: it cannot be stored in the rows and columns that databases use,
it follows no rules, semantics, format, or order, and it lacks a well-defined structure. This lack
of a recognizable structure is what makes it difficult for computer programs to work with.
ARTIFICIAL INTELLIGENCE:
Building computers and machines with the ability to reason, learn, and behave in ways
that would typically need human intelligence or that involve data of a size beyond what people
can examine is the focus of the study of artificial intelligence. AI is a broad field that
encompasses many different disciplines, including computer science, data analytics and
statistics, hardware and software engineering, linguistics, neuroscience, and even philosophy
and psychology. On a practical level for business use, artificial intelligence (AI) is a set of
technologies used for data analytics, predictions and forecasting, object categorization, natural
language processing, recommendations, intelligent data retrieval, and more. These
technologies are primarily based on machine learning and deep learning.
Artificial intelligence (AI) technologies like deep learning and artificial neural
networks are rapidly developing, mostly because AI can process enormous volumes of data far
more quickly and correctly than a human can. While the enormous amount of data generated
every day would drown a human researcher, AI technologies that use machine learning can
swiftly transform that data into useful knowledge. The cost of processing the enormous
amounts of data that AI programming demands is now the main drawback of employing AI.
ADVANTAGES OF AI:
DISADVANTAGES OF AI:
TYPES OF AI:
Depending on its stage of development or the tasks being carried out, artificial
intelligence can be categorized in a variety of ways. For instance, it is widely held that AI
development occurs in four stages.
• Reactive Machines: AI with limited capabilities that only responds to various inputs
in accordance with predetermined rules. It does not employ memory, so it cannot
learn from new information. IBM's Deep Blue, which defeated world chess champion
Garry Kasparov in 1997, is a reactive machine.
• Limited Memory: Most contemporary AI is thought to have a limited memory. By
being trained with fresh data over time, generally using an artificial neural network
or other training model, it can use memory to get better. Deep learning, a subtype
of machine learning, is regarded as artificial intelligence with a restricted memory.
• Theory Of Mind: Theory-of-mind AI does not currently exist, but research into its
possibilities is ongoing. It describes AI with decision-making abilities comparable to
those of a human, including the ability to recognize and remember emotions and to
respond in social settings the same way a person would.
• Self-Aware: A step beyond theory-of-mind AI, self-aware AI refers to a hypothetical
machine with the mental and emotional capacities of a person and an awareness of its
own existence. Self-aware AI is a hypothetical concept that does not yet exist.
AI TECHNOLOGY:
• Automation: Automation tools can increase the number and variety of tasks carried out
when used in conjunction with AI technologies. An example is robotic process
automation (RPA), a form of software that automates repetitive, rule-based data
processing operations often carried out by humans. Combined with machine learning
and cutting-edge AI techniques, RPA can automate larger portions of company tasks,
so that RPA's tactical bots can relay AI intelligence and react to changes in processes.
• Natural Language Processing: This is how a computer program interprets human
language. Spam detection, which examines an email's subject line and body text and
decides whether it is junk, is one of NLP's more established and well-known
applications. The methods used in NLP today are based on machine learning; other
NLP tasks include text translation, speech recognition, and sentiment analysis.
• Robotics: This area of engineering is devoted to the creation and design of robots.
Robots are frequently utilized to complete jobs that are challenging for humans to
perform consistently. For instance, NASA employs robots to move heavy objects in
space, and auto assembly lines use them to produce cars. Researchers are also
utilizing machine learning to construct socially intelligent robots.
• Self-Driving Cars: Using a combination of computer vision, image recognition, and
deep learning, autonomous cars develop the automated ability to drive a vehicle while
staying in a given lane and avoiding unforeseen obstacles, such as pedestrians.
• Machine Vision: This technology gives a machine the ability to see. Using a camera,
analog-to-digital conversion, and digital signal processing, machine vision software
can capture and examine visual data. Machine vision is sometimes likened to human
eyesight, but it is not constrained by biology and can be programmed, for instance, to
see through walls. It is employed in a variety of applications, including medical image
analysis and signature identification. Machine vision and computer vision are
frequently confused, with computer vision concentrating on automated image
processing.
• Machine Learning: Machine learning is a developing technology that makes it
possible for computers to learn autonomously from historical data. It uses a variety of
techniques to create mathematical models and make predictions based on previous
information or data. Currently, it is utilized for many different things, including
recommender systems, email filtering, Facebook auto-tagging, image identification,
and speech recognition. In the real world, we are surrounded by people who are able
to learn from their experiences thanks to their capacity for learning, and we also have
computers and robots that carry out our orders. But can a machine learn from past
facts or experiences the same way a human does? Herein lies the function of machine
learning.
Machine learning can be described as a branch of artificial intelligence that focuses
primarily on creating algorithms that enable a computer to learn independently from data and
previous experiences. Arthur Samuel coined the phrase "machine learning" in 1959. In a
nutshell: machine learning enables a machine to automatically learn from data, improve its
performance from experience, and predict things without being explicitly programmed.
Machine learning algorithms create a mathematical model with the aid of historical sample
data, or "training data", that aids in making predictions or decisions without being explicitly
programmed. Machine learning combines computer science and statistics to create prediction
models; it creates or uses algorithms that learn from past data, and performance improves as
we supply more information.
When a machine learning system receives new data, it forecasts the outcome using the
prediction models it has built from prior data. How well the output is predicted depends on
the amount of data used, as a larger data set makes it easier to build a model that predicts the
outcome more precisely. Imagine that we have a complex problem that requires some
predictions. Instead of writing code for it, we can simply feed the data to generic algorithms,
and the machine develops its logic from the data and forecasts the output. Machine learning
has changed our perspective on such problems: in outline, a system builds a model from
historical training data and then uses that model to predict outcomes for new data.
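As a minimal sketch of this workflow (scikit-learn is an assumption, and the toy data and model choice are illustrative, not from the original text):

```python
# A minimal sketch of the workflow described above, using scikit-learn.
# Toy data stands in for historical "training data".
from sklearn.linear_model import LogisticRegression

# Historical data: inputs (hours studied) and known outcomes (pass/fail).
X_train = [[1.0], [2.0], [3.0], [8.0], [9.0], [10.0]]
y_train = [0, 0, 0, 1, 1, 1]

# Build a prediction model from past data instead of hand-coding logic.
model = LogisticRegression()
model.fit(X_train, y_train)

# Forecast the outcome for new, unseen data.
print(model.predict([[7.5]]))  # -> [1] (predicted "pass")
```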
AI LEARNING TYPES:
Businesses frequently refer to "training data" when discussing AI. What does that
mean? Recall that limited-memory AI is AI that gets better over time by being trained with
new data. Machine learning, a subset of artificial intelligence, uses algorithms trained on data
in order to produce outcomes.
DEEP LEARNING:
Since artificial neural networks mimic the human brain, deep learning can also be seen
as a kind of mimicry of the human brain. Deep learning is a subfield of machine learning that
is entirely based on neural networks; in deep learning, we do not have to explicitly program
everything. Deep learning is not a brand-new idea: it has existed for a while, and it is popular
now because we have access to more data and computing capacity than in the past. Deep
learning and machine learning emerged as a result of the exponential rise in processing power
over the past 20 years. A working definition of deep learning is a neural network built from
many layers of neurons.
An artificial neural network, a model loosely modelled on the human brain, is a frequent
sort of training model in AI. A neural network is a collection of synthetic neurons, also known
as perceptrons, that function as computational nodes in the classification and analysis of data.
A neural network's first layer receives the data, and each perceptron makes a choice before
relaying that information to a number of nodes in the layer above. Training models with more
than three layers are referred to as deep neural networks, or "deep learning"; some contemporary
neural networks have a thousand or more layers. The final perceptron's output allows the neural
network to complete the task assigned to it, such as classifying an object or identifying patterns
in data.
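As a minimal sketch of what a single perceptron computes (a weighted sum plus a bias, passed through a threshold activation; the weights below are illustrative, not from the original text):

```python
# A single artificial neuron (perceptron), sketched in plain Python:
# it weights its inputs, adds a bias, and "fires" (outputs 1) only
# if the weighted sum crosses the threshold.
def perceptron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Example: a neuron approximating logical AND (weights are illustrative).
print(perceptron([1, 1], [0.6, 0.6], -1.0))  # -> 1 (fires)
print(perceptron([1, 0], [0.6, 0.6], -1.0))  # -> 0 (does not fire)
```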
Some of the most common types of artificial neural networks you may encounter include:
• Feedforward Neural Networks (FF): Data flows through layers of artificial neurons
in feedforward neural networks, one of the earliest types of neural networks, until
the desired output is obtained. Nowadays, the majority of feedforward neural networks
are considered "deep feedforward" and have multiple layers (including multiple
"hidden" layers). Typically, backpropagation, an error-correction method, is used in
conjunction with feedforward neural networks to increase the accuracy of the network.
Backpropagation works by starting with the result of the neural network and
working backwards through the network to discover errors. Deep feedforward neural
networks are frequently basic yet effective; a training sketch appears after this list.
• Recurrent Neural Networks (RNN): These are distinct from feedforward neural
networks in that they typically handle time-series data or data involving sequences.
Unlike feedforward neural networks, which only pass weighted data forward through
each node, recurrent neural networks retain a "memory" of what happened in the
previous layer, so that earlier outputs influence the current one. RNNs, for instance,
may "remember" other words mentioned in a sentence when analyzing spoken
language. Speech recognition, translation, and image captioning are common
applications for RNNs.
• Long/Short Term Memory (LSTM): LSTMs are a more sophisticated type of RNN
that can "remember" what happened in earlier layers by using memory. RNNs and
LSTMs differ in that LSTMs use "memory cells" to base future decisions on past
events. LSTM is frequently employed in speech recognition and prediction.
• Generative Adversarial Networks (GAN): GANs involve two neural networks
playing a game against one another, which ultimately improves the accuracy of the
output. The generator network produces instances that the discriminator network
tries to classify as real or fake. GANs have been applied to the production of art and
realistic images.
• Convolutional Neural Networks (CNN): CNNs are among the most common
neural networks in modern artificial intelligence. Most often used in image recognition,
CNNs use several distinct layers (a convolutional layer, then a pooling layer) that filter
different parts of an image before putting it back together (in the fully connected layer).
The earlier convolutional layers may look for simple features of an image, such as
colors and edges, before looking for more complex features in later layers.
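As promised above, here is a minimal training sketch for a deep feedforward network with backpropagation. PyTorch is an assumption (the text names no framework), and the random data stands in for a real dataset:

```python
# A minimal sketch of a deep feedforward network trained with
# backpropagation, using PyTorch. Toy data replaces a real dataset.
import torch
import torch.nn as nn

model = nn.Sequential(            # data flows forward, layer by layer
    nn.Linear(4, 16), nn.ReLU(),  # hidden layer 1
    nn.Linear(16, 16), nn.ReLU(), # hidden layer 2 ("deep feedforward")
    nn.Linear(16, 3),             # output layer (3 classes)
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(32, 4)            # toy inputs
y = torch.randint(0, 3, (32,))    # toy class labels

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)   # forward pass: compute the error
    loss.backward()               # backpropagation: errors flow backwards
    optimizer.step()              # adjust weights to reduce the error
```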
HISTORY OF NEURAL NETWORKS:
• 1940’s
When Walter Pitts and Warren McCulloch developed a computer model based on
the neural networks of the human brain in 1943, deep learning had its beginnings. In
order to simulate the mental process, they employed a set of mathematical formulas and
techniques they named "threshold logic". Since then, there have been only two
important interruptions in the development of deep learning, both connected with
the infamous AI winters.
• 1950’s
Convolutional neural networks, or CNNs for short, form the backbone of many
modern computer vision systems. In 1959, David Hubel and Torsten Wiesel described
“simple cells” and “complex cells” in the human visual cortex. They proposed that both
kinds of cells are used in pattern recognition. A "simple cell" responds to edges and
bars of particular orientations.
A “complex cell” also responds to edges and bars of particular orientations, but it is
different from a simple cell in that these edges and bars can be shifted around the scene
and the cell will still respond. For instance, a simple cell may respond only to a
horizontal bar at the bottom of an image, while a complex cell might respond to
horizontal bars at the bottom, middle, or top of an image. This property of complex
cells is termed “spatial invariance.”
• 1960’s
Henry J. Kelley is given credit for developing the basics of a continuous Back
Propagation Model in 1960.
Hubel and Wiesel proposed in 1962 that complex cells achieve spatial invariance
by “summing” the output of several simple cells that all prefer the same orientation
(e.g., horizontal bars) but different receptive fields (e.g., bottom, middle, or top of an
image). By collecting information from a bunch of simple cell minions, the complex
cells can respond to horizontal bars that occur anywhere.
In 1962, a simpler version based only on the chain rule was developed by Stuart
Dreyfus. While the concept of back propagation (the backward propagation of errors
for purposes of training) did exist in the early 1960s, it was clumsy and inefficient. The
earliest efforts in developing deep learning algorithms came from Alexey Grigoryevich
Ivakhnenko (developed the Group Method of Data Handling) and Valentin
Grigorʹevich Lapa (author of Cybernetics and Forecasting Techniques) in 1965. They
used models with polynomial (complicated equation) activation functions, which were
then analyzed statistically. From each layer, the best statistically chosen features were
then forwarded on to the next layer (a slow, manual process).
• 1970’s
The first "convolutional neural networks" were employed by Kunihiko
Fukushima. Fukushima created neural networks with several convolutional and pooling
layers. He created the Neocognitron artificial neural network in 1979, which had a
hierarchical, multilayered design. The computer may "learn" to recognize visual
patterns thanks to this design. The networks resembled contemporary models but were
trained with a reinforcement strategy of recurring activation in numerous layers, which
gained strength over time. Furthermore, Fukushima's design allowed important
features to be adjusted manually by increasing the "weight" of certain connections.
Back propagation, the use of errors in training deep learning models, evolved
significantly in 1970. This was when Seppo Linnainmaa wrote his master’s thesis,
including a FORTRAN code for back propagation.
• 1980’s
In 1989, Yann LeCun provided the first practical demonstration of
backpropagation at Bell Labs. He combined convolutional neural networks with back
propagation to read "handwritten" digits. This system was eventually used to read
the numbers on handwritten checks.
• 1990’s
Some people continued to work on AI and DL, and some significant advances
were made. In 1995, Corinna Cortes and Vladimir Vapnik developed the support vector
machine (a system for mapping and recognizing similar data). LSTM (long short-term
memory) for recurrent neural networks was developed in 1997, by Sepp Hochreiter and
Juergen Schmidhuber.
The next significant evolutionary step for deep learning took place in 1999, when
computers started becoming faster at processing data and GPUs (graphics processing
units) were developed. Faster processing, with GPUs processing pictures, increased
computational speeds 1000-fold over a 10-year span. During this time, neural
networks began to compete with support vector machines. While a neural network
could be slow compared to a support vector machine, neural networks offered better
results using the same data. Neural networks also have the advantage of continuing to
improve as more training data is added.
• 2000’s
Around the year 2000, the vanishing gradient problem appeared. It was discovered
that "features" (lessons) formed in lower layers were not being learned by the upper
layers, because no learning signal reached those layers. This was not a fundamental
problem for all neural networks, just those with gradient-based learning methods.
Two solutions used to address this problem were layer-by-layer pre-training and the
development of long short-term memory.
The potential and difficulties of data growth were described as three-dimensional
in a research report published in 2001 by META Group, now known as Gartner.
According to the report, the variety of data sources and types has expanded as a result
of the growth in data volume and speed. This was a warning to get ready for the
impending invasion of big data.
A free collection of more than 14 million tagged photos was compiled by Stanford
AI scientist Fei-Fei Li in 2009 under the name ImageNet. Unlabeled photographs are,
and have always been, prevalent online, but labelled images were needed to "train"
neural nets. Professor Li stated, "Our hope was that the use of large data will
transform machine learning. Learning is driven by data."
• 2010’s
Convolutional neural networks could be trained without layer-by-layer pre-training
by 2011, thanks to a huge gain in GPU speed. With rising processing speed it became
clear that deep learning had considerable advantages in terms of efficiency and speed.
AlexNet is a convolutional neural network whose architecture won numerous
international contests in 2011 and 2012. It utilized rectified linear units to improve
speed and dropout to reduce overfitting.
Google Brain published the findings of an unusual project called The Cat
Experiment in 2012 as well. The adventurous study looked into the challenges of
"unsupervised learning." Convolutional neural networks used in deep learning are
trained with labelled data, which is known as "supervised learning" (think images from
ImageNet). A convolutional neural network is trained through unsupervised learning,
which involves feeding it unlabeled data and instructing it to look for recurrent patterns.
2014 saw the introduction of the Generative Adversarial Neural Network
(GAN). Ian Goodfellow developed the GAN programme. Two neural networks
compete with one another using GAN. In order to win, one network must successfully
duplicate a photograph and deceive the other into thinking it is real. Of course, the
adversary is searching for weaknesses. The game continues until the opponent is duped
by a nearly flawless photo. GANs offer a method for improving a product (and have
also begun to be used by scammers).
Deep learning and artificial intelligence are influencing the creation of new business models,
and companies are building new corporate cultures that embrace deep learning, artificial
intelligence, and modern technology.
Three types of layers make up a CNN: convolutional layers, pooling layers, and
fully-connected (FC) layers. When these layers are stacked, a CNN architecture is formed. In
addition to these three layers, there are two more important components, the dropout layer and
the activation function, which are defined below; a code sketch tying all five together follows
the list.
• CONVOLUTIONAL LAYER: This layer is the first layer that is used to extract the
various features from the input images. In this layer, the mathematical operation of
convolution is performed between the input image and a filter of a particular size MxM.
By sliding the filter over the input image, the dot product is taken between the filter and
the parts of the input image with respect to the size of the filter (MxM). The output is
termed as the Feature map which gives us information about the image such as the
corners and edges. Later, this feature map is fed to other layers to learn several other
features of the input image. The convolution layer in a CNN passes the result to the next
layer after applying the convolution operation to the input. Convolutional layers benefit
CNNs greatly because they keep the spatial relationships between the pixels intact.
• POOLING LAYER: In most cases, a convolutional layer is followed by a pooling
layer. The primary aim of this layer is to decrease the size of the convolved feature
map in order to reduce computational costs. This is performed by decreasing the
connections between layers; the layer operates independently on each feature map.
Depending upon the method used, there are several types of pooling operations; each
basically summarizes the features generated by a convolution layer. In max pooling,
the largest element is taken from the feature map. Average pooling calculates the
average of the elements in a predefined-size image section. In sum pooling, the total
sum of the elements in the predefined section is computed. The pooling layer usually
serves as a bridge between the convolutional layer and the FC layer. It generalizes the
features extracted by the convolution layer and helps the network recognize features
independently; it also reduces the computations in the network.
• FULLY CONNECTED LAYER: The fully connected (FC) layer consists of the
weights and biases along with the neurons, and is used to connect the neurons between
two different layers. These layers are usually placed before the output layer and form
the last few layers of a CNN architecture. Here, the input from the previous layers is
flattened and fed to the FC layer. The flattened vector then passes through a few more
FC layers, where mathematical operations usually take place; at this stage, the
classification process begins. Two or more FC layers are used because stacked fully
connected layers perform better than a single connected layer. These layers in a CNN
reduce the need for human supervision.
• DROPOUT LAYER: Usually, when all the features are connected to the FC layer, this
can cause overfitting on the training dataset. Overfitting occurs when a particular model
works so well on the training data that it has a negative impact on the model's
performance when used on new data. To overcome this problem, a dropout layer is
utilised, wherein a few neurons are dropped from the neural network during the
training process, resulting in a smaller model. On passing a dropout of 0.3, 30% of the
nodes are dropped out randomly from the neural network. Dropout improves the
performance of a machine learning model because it prevents overfitting by making
the network simpler.
• ACTIVATION FUNCTIONS: Finally, one of the most important parameters of the
CNN model is the activation function. They are used to learn and approximate any kind
of continuous and complex relationship between variables of the network. In simple
words, it decides which information of the model should fire in the forward direction
and which ones should not at the end of the network. It adds non-linearity to the
network. There are several commonly used activation functions, such as ReLU,
softmax, tanh, and sigmoid, and each has a specific usage. For a binary classification
CNN model, sigmoid and softmax functions are preferred, and for multi-class
classification, softmax is generally used. In simple terms, activation functions in a
CNN model determine whether a neuron should be activated or not: they decide, using
mathematical operations, whether the input to the network is important for the
prediction.
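The sketch below ties the five components together in PyTorch (an assumed framework; the layer sizes and the 32x32 RGB input are illustrative, not prescribed by the text):

```python
# A minimal CNN sketch (PyTorch) showing all five components described
# above: convolution, pooling, fully connected layers, dropout, and
# activation functions. Sizes are illustrative, for 32x32 RGB images.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # activation function
            nn.MaxPool2d(2),                             # pooling layer (max)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 8x8 maps
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                # flatten feature maps into a vector
            nn.Linear(32 * 8 * 8, 128),  # fully connected layer
            nn.ReLU(),
            nn.Dropout(0.3),             # dropout: randomly drop 30% of nodes
            nn.Linear(128, num_classes), # final FC layer; softmax is applied
        )                                # inside the loss during training

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
out = model(torch.randn(1, 3, 32, 32))  # one 32x32 RGB image
print(out.shape)                        # -> torch.Size([1, 10])
```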
ADVANTAGES OF CNN:
• Feature Engineering: CNNs can learn the necessary features during training,
eliminating the need for manual feature engineering. Even if you are working on a
completely different task, you can take a pre-trained CNN and, by feeding it data,
modify its weights: the CNN will adjust to the new task.
• Computationally Efficient: Convolutional neural networks are computationally far
more efficient than traditional neural networks. CNNs use parameter sharing and
dimensionality reduction, which makes models simple and quick to deploy. They can
be made to work smoothly on any platform, including smartphones.
• Accuracy: With the rise of vision transformers, convolutional networks are no
longer always the state-of-the-art neural networks for image classification. However,
CNNs have long led in tasks involving image and video recognition and other
comparable tasks, and, particularly when a large amount of data is involved, they
typically exhibit better accuracy than non-convolutional NNs.
DISADVANTAGES OF CNN:
• Adversarial Attacks: Adversarial attacks involve supplying the network with "poor"
samples (i.e., images that have been slightly altered in a specific way) in order to
cause misclassification. A CNN can be thrown off by just a small change in the
pixels. Criminals, for instance, can trick a CNN-based face recognition system and
go unnoticed in front of the camera.
• Limited Pre-Trained Models: There are only a handful of high-quality pre-trained
models, like GoogLeNet, VGG, Inception, and AlexNet, and global businesses hold
the majority of them.
APPLICATIONS OF CNN:
• Object Detection: CNN is frequently used by self-driving cars, AI-powered security
systems, and smart homes to be able to recognize and mark items. CNN has the
ability to recognize, classify, and label items in images in real time. This is
how an autonomous automobile avoids collisions with other vehicles and pedestrians,
and how smart homes can distinguish the owner's face from others.
• Agriculture: CNN is also applicable to agriculture. The networks receive imagery
from satellites such as Landsat and can use this data to classify land by its degree of
cultivation. As a result, this information can be utilised to forecast the fertility level
of the ground or to create a plan for the best possible use of farmland.
• Handwriting Recognition: One of the first applications of CNN in computer vision
was the recognition of handwritten digits.
WHAT IS AN IMAGE?
An image is defined as a two-dimensional function, F(x, y), where x and y are spatial
coordinates. The amplitude of F at any given pair of coordinates (x, y) is referred to as the
intensity of the image at that point. When the x, y, and amplitude values of F are all finite, we
call it a digital image. In other words, an image can be defined as a two-dimensional array
specifically arranged in rows and columns.
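As a minimal sketch (NumPy is an assumption; the pixel values are illustrative), a small digital image can be written down directly as such a two-dimensional array:

```python
# A digital image as a two-dimensional function F(x, y), sketched with
# NumPy. Each entry is the intensity at one (row, column) point.
import numpy as np

F = np.array([
    [  0,  64, 128],
    [ 64, 128, 192],
    [128, 192, 255],
], dtype=np.uint8)   # 3x3 grayscale image: 0 = black, 255 = white

print(F[1, 2])       # intensity at row 1, column 2 -> 192
print(F.shape)       # rows x columns -> (3, 3)
```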
TYPES OF IMAGES:
• Binary Image: The binary image, as its name implies, consists of only two pixel
elements, namely 0 and 1, where 0 stands for black and 1 for white. Monochrome is
another name for this picture.
• Black And White Image: The term "black and white image" refers to an image that
solely has the colors black and white.
• 8 Bit Color Format: The most popular image format is this one. It is sometimes
referred to as a grayscale image and contains 256 different color tones. In this format,
0 denotes black, 255 denotes white, and 127 denotes grey.
• 16 Bit Color Format: It's a format for colored images. It comes in 65,536 distinct
colors. High Color Format is another name for it. The distribution of color in this format
differs from that in a grayscale image. A 16-bit format is actually divided into three
further channels: red, green, and blue, the well-known RGB format.
OBJECTIVES:
• To collect and preprocess a large dataset of images for training, validating, and testing
the CNN model.
• To design and train a CNN model for image classification using the collected dataset.
• To evaluate the performance of the CNN model by measuring its accuracy on a
separate (unused) test dataset.
• To compare the performance of the CNN model with other existing models and
methods for image classification.
• To identify and analyze the factors that affect the performance of the CNN model,
such as the number of layers, the type of activation functions, and the size of the input
images, using various metrics such as accuracy, precision, recall, and F1-score (see
the sketch following these objectives).
• To provide insights and recommendations for future improvements and applications
of image classification using CNN techniques.
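As referenced in the objectives, here is a minimal sketch of computing the named evaluation metrics with scikit-learn (an assumed library; the label vectors are illustrative, not real results):

```python
# A minimal sketch of the evaluation metrics named above, computed with
# scikit-learn. The label vectors are illustrative placeholders.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # ground-truth classes on the test set
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]   # classes predicted by the CNN model

print("accuracy: ", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
```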
Overall, the aim of this study is to demonstrate the effectiveness and potential of CNN models
for image classification tasks and to provide insights and guidelines for further research and
development in this field.