Unit-6 AI (April 11, 2023)
What is Artificial Intelligence (AI)?
Artificial Neural Networks (ANNs) are the fundamental building blocks of Artificial Intelligence (AI) technology. ANNs are the basis of machine-learning models; they simulate the learning process of the human brain. Simply put, ANNs give machines the capacity to achieve human-like performance (and beyond) on specific tasks.
In today's world, technology is growing very fast, and we come into contact with new technologies every day.
AI is one of the most fascinating and universal fields of computer science, and it has great scope in the future. AI aims to make a machine work like a human. Artificial Intelligence exists when a machine has human-like skills such as learning, reasoning, and problem solving.
AI is not an entirely new idea: some people say that, as per Greek myth, there were mechanical men in early days that could work and behave like humans.
o With the help of AI, you can create software or devices that can solve real-world problems easily and accurately, such as health issues, marketing, traffic issues, etc.
o With the help of AI, you can create your own personal virtual assistant, such as Cortana, Google Assistant, Siri, etc.
o With the help of AI, you can build robots that can work in environments where human survival can be at risk.
o AI opens a path to other new technologies, new devices, and new opportunities.
o Playing games such as chess.
o Creating systems that can exhibit intelligent behaviour, learn new things on their own, and demonstrate, explain, and advise their users.
Artificial Intelligence is not just a part of computer science; it is vast and draws on many other fields that contribute to it. To create AI, we should first know how intelligence is composed: intelligence is an intangible property of our brain that combines reasoning, learning, problem solving, perception, language understanding, etc.
To achieve the above factors for a machine or software, Artificial Intelligence requires the following disciplines:
o Mathematics
o Biology
o Psychology
o Sociology
o Computer Science
o Neuroscience (the study of neurons)
o Statistics
Advantages of Artificial Intelligence
o High accuracy with fewer errors: AI machines and systems make fewer errors and achieve high accuracy because they take decisions based on prior experience and information.
o High speed: AI systems can make decisions very quickly; this is why an AI system can beat a chess champion at chess.
o High reliability: AI machines are highly reliable and can perform the same action multiple times with high accuracy.
o Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the ocean floor, where employing a human would be risky.
o Digital assistant: AI can be very useful for providing digital assistance to users; for example, AI technology is currently used by various e-commerce websites to show products matching a customer's requirements.
o Useful as a public utility: AI can be very useful in public utilities, such as self-driving cars that make our journeys safer and hassle-free, facial recognition for security purposes, and natural language processing for communicating with humans in human language.
Every technology has some disadvantages, and the same goes for Artificial Intelligence. Advantageous as this technology is, it still has some disadvantages that we need to keep in mind while creating an AI system. Following are the disadvantages of AI:
Artificial Neural Network (ANN)
Here, X1 and X2 are inputs to the artificial neuron, f(X) represents the processing done on the inputs, and y represents the output of the neuron.
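In symbols, the neuron computes something like y = f(W1*X1 + W2*X2 + b), where W1 and W2 are the connection weights and b is a bias term (the weights and bias are assumed here; they are not labelled in the figure).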
Artificial Neural Networks contain artificial neurons which are called units.
These units are arranged in a series of layers that together constitute the whole
Artificial Neural Networks in a system. A layer can have only a dozen units or
millions of units as this depends on the complexity of the system. Commonly,
Artificial Neural Network has an input layer, output layer as well as hidden
layers. The input layer receives data from the outside world which the neural
network needs to analyze or learn about. Then this data passes through one or
multiple hidden layers that transform the input into data that is valuable for the
output layer. Finally, the output layer provides an output in the form of the network's response to the input data provided.
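To make this layered flow concrete, here is a minimal Python/NumPy sketch of a forward pass (the layer sizes, random weights, and sigmoid activation are illustrative assumptions, not taken from the text):

import numpy as np

def sigmoid(z):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: 3 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))  # weights: input layer -> hidden layer
b_hidden = np.zeros(4)
W_output = rng.normal(size=(1, 4))  # weights: hidden layer -> output layer
b_output = np.zeros(1)

def forward(x):
    # The input layer passes the data in from the outside world
    h = sigmoid(W_hidden @ x + b_hidden)  # hidden layer transforms the input
    y = sigmoid(W_output @ h + b_output)  # output layer produces the response
    return y

print(forward(np.array([0.5, -1.2, 3.0])))  # a single number between 0 and 1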
In the majority of neural networks, units are interconnected from one layer to
another. Each of these connections has weights that determine the influence of
one unit on another unit. As the data transfers from one unit to another, the neural
network learns more and more about the data which eventually results in an
output from the output layer.
In our brain, there are billions of cells called neurons, which process information
in the form of electric signals. External information/stimuli is received by the
dendrites of the neuron, processed in the neuron cell body, converted to an output
and passed through the Axon to the next neuron. The next neuron can choose to
either accept it or reject it depending on the strength of the signal.
Now, let's try to understand how an ANN works:
As you can see from the above, an ANN is a very simplistic representation of how a brain neuron works.
Key Components of the Neural Network Architecture
(OR Representation of Neural Network)
The Neural Network architecture is made of individual units called neurons that mimic
the biological behavior of the brain.
Here are the various components of a neuron.
Input - It is the set of features that are fed into the model for the learning process.
For example, the input in object detection can be an array of pixel values pertaining
to an image.
Weight - Its main function is to give importance to the features that contribute more towards the learning. It does so by scalar multiplication of the input value and the weight matrix. For example, a negative word would impact the decision of a sentiment analysis model more than a pair of neutral words.
Transfer function - The job of the transfer function is to combine multiple inputs
into one output value so that the activation function can be applied. It is done by a
simple summation of all the inputs to the transfer function.
Activation Function - It introduces non-linearity into the working of perceptrons. Without it, the output would just be a linear combination of the input values, and the network would not be able to model non-linear relationships.
In a neural network, the activation function defines whether a given node should be "activated" or not based on the weighted sum. Let's define this weighted-sum value as z. In this section I will explain why the "Step Function" and "Linear Function" won't work, and talk about the "Sigmoid Function", one of the most popular activation functions. There are also other functions, which I will leave aside for now.
The transfer function translates the input signals to output signals. Four types of transfer functions are commonly used: unit step (threshold), sigmoid, piecewise linear, and Gaussian.
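As a rough Python sketch of these four functions (the threshold and Gaussian width are illustrative choices):

import numpy as np

def unit_step(z, threshold=0.0):
    # Output is set at one of two levels around the threshold
    return np.where(z >= threshold, 1.0, 0.0)

def sigmoid(z):
    # Smooth S-shaped curve with outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def piecewise_linear(z):
    # Linear in the middle, clipped to [0, 1] at the ends
    return np.clip(z + 0.5, 0.0, 1.0)

def gaussian(z, width=1.0):
    # Bell-shaped response centred at z = 0
    return np.exp(-(z ** 2) / (2.0 * width ** 2))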
Bias - The role of bias is to shift the value produced by the activation function. Its
role is similar to the role of a constant in a linear function.
When multiple neurons are stacked together in a row, they constitute a layer, and
multiple layers piled next to each other are called a multi-layer neural network.
We've described the main components of this type of structure below.
Input Layer
The data that we feed to the model is loaded into the input layer from external
sources like a CSV file or a web service. It is the only visible layer in the complete
Neural Network architecture that passes the complete information from the outside
world without any computation.
Hidden Layers
The hidden layers are what makes deep learning what it is today. They are
intermediate layers that do all the computations and extract the features from the
data.
There can be multiple interconnected hidden layers that account for searching for different hidden features in the data. For example, in image processing, the first hidden layers are responsible for lower-level features like edges, shapes, or boundaries, while the later hidden layers perform more complicated tasks like identifying complete objects (a car, a building, a person).
Output Layer
The output layer takes input from preceding hidden layers and comes to a final
prediction based on the model’s learnings. It is the most important layer where we
get the final result.
In the case of classification/regression models, the output layer generally has a single node. However, this is completely problem-specific and depends on the way the model was built.
Unit step (threshold)
The output is set at one of two levels, depending on whether the total input is
greater than or less than some threshold value.
Example of ANN
To make things clearer, let's understand ANNs using a simple example: a bank wants to assess whether to approve a customer's loan application, so it wants to predict whether the customer is likely to default on the loan. It has data like below:
Let's try to create an Artificial Neural Network architecture loosely based on the structure of a neuron using this example:
In general, a simple ANN architecture for the above example could be:
Key Points related to the architecture:
1. The network architecture has an input layer, hidden layer (there can be more than
1) and the output layer. It is also called MLP (Multi Layer Perceptron) because of
the multiple layers.
2. The hidden layer can be seen as a "distillation layer" that distils some of the important patterns from the inputs and passes them on to the next layer. It makes the network faster and more efficient by identifying only the important information from the inputs, leaving out the redundant information.
In the above example, the activation function used is the sigmoid:
O1 = 1 / (1 + exp(-F))
The sigmoid activation function creates an output with values between 0 and 1. There can be other activation functions like tanh, softmax, and ReLU.
4. Similarly, the hidden layer leads to the final prediction at the output layer:
O3 = 1 / (1 + exp(-F1))
Here, the output value (O3) is between 0 and 1. A value closer to 1 (e.g. 0.75) indicates a higher likelihood of the customer defaulting.
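As a quick worked check (the value of F1 here is made up purely for illustration): if the hidden layer produced F1 = 1.1, then O3 = 1 / (1 + exp(-1.1)) ≈ 0.75, which the bank would read as a fairly high likelihood of default.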
5. The weights W are the importance associated with the inputs. If W1 is 0.56 and
W2 is 0.92, then there is higher importance attached to X2: Debt Ratio than X1:
Age, in predicting H1.
6. The above network architecture is called a "feed-forward network", as you can see that input signals flow in only one direction (from inputs to outputs). We can also create "feedback" networks, where signals flow in both directions.
7. A good model with high accuracy gives predictions that are very close to the
actual values. So, in the table above, Column X values should be very close to
Column W values. The error in prediction is the difference between column W and
column X:
8. The key to getting a good model with accurate predictions is to find the optimal values of the weights W that minimize the prediction error. This is achieved by the "backpropagation algorithm", and this is what makes an ANN a learning algorithm: by learning from its errors, the model is improved.
How do Artificial Neural Networks learn?
Artificial neural networks are trained using a training set. For example, suppose
you want to teach an ANN to recognize a cat. Then it is shown thousands of
different images of cats so that the network can learn to identify a cat. Once the
neural network has been trained enough using images of cats, then you need to
check if it can identify cat images correctly. This is done by making the ANN
classify the images it is provided by deciding whether they are cat images or not.
The output obtained by the ANN is corroborated by a human-provided description of whether the image is a cat image or not. If the ANN identifies an image incorrectly, then back-propagation is used to adjust whatever it has learned during training. Back-propagation is done by fine-tuning the weights of the connections between ANN units based on the error rate obtained. This process continues until the artificial neural network can correctly recognize a cat in an image with the minimal possible error rate.
There are different types of neural networks, but they are generally classified into
feed-forward and feed-back networks.
5. Radial basis function Neural Network
Radial basis functions are functions that consider the distance of a point with respect to a center. RBF networks have two layers: in the first layer, the input is mapped onto all the radial basis functions in the hidden layer, and then the output layer computes the output in the next step. Radial basis function nets are normally used to model data that represents an underlying trend or function.
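A minimal Python sketch of this two-layer idea (the centers, width, and output weights are illustrative assumptions):

import numpy as np

def rbf_layer(x, centers, width=1.0):
    # Each hidden unit responds to the distance between x and its center
    dists = np.linalg.norm(centers - x, axis=1)
    return np.exp(-(dists ** 2) / (2.0 * width ** 2))

# First layer: a 2-dimensional input mapped onto three Gaussian RBFs
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
w_out = np.array([0.5, -1.0, 2.0])  # second layer: output weights

x = np.array([0.9, 1.1])
hidden = rbf_layer(x, centers)  # distances turned into activations
y = w_out @ hidden              # output layer computes the weighted sum
print(y)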
1. Social Media
Artificial Neural Networks are used heavily in social media. For example, take the 'People you may know' feature on Facebook, which suggests people you might know in real life so that you can send them friend requests. This magical effect is achieved by using Artificial Neural Networks that analyze your profile, your interests, your current friends, their friends, and various other factors to work out the people you might potentially know. Another common application of Machine Learning in social media is facial recognition. This is done by finding around 100 reference points on a person's face and then matching them with those already available in the database using convolutional neural networks.
ANNs have some key advantages that make them most suitable for certain
problems and situations:
1. ANNs have the ability to learn and model non-linear and complex relationships,
which is really important because in real-life, many of the relationships between
inputs and outputs are non-linear as well as complex.
2. ANNs can generalize: after learning from the initial inputs and their relationships, they can infer unseen relationships on unseen data as well, thus making the model able to generalize and predict on unseen data.
3. Unlike many other prediction techniques, ANN does not impose any restrictions
on the input variables (like how they should be distributed). Additionally, many
studies have shown that ANNs can better model heteroskedasticity i.e. data with
high volatility and non-constant variance, given its ability to learn hidden
relationships in the data without imposing any fixed relationships in the data. This
is something very useful in financial time series forecasting (e.g. stock prices)
where data volatility is very high.
Perceptron
In Machine Learning and Artificial Intelligence, the perceptron is one of the most commonly used terms. It is a primary step in learning Machine Learning and Deep Learning technologies, and it consists of a set of weights, input values or scores, and a threshold. The perceptron is a building block of an Artificial Neural Network.
The perceptron was invented by Frank Rosenblatt in the mid-20th century (1957) for performing certain calculations to detect capabilities in input data.
The perceptron is a linear Machine Learning algorithm used for the supervised learning of binary classifiers. The algorithm enables neurons to learn, processing training elements one by one. In this section on "Perceptron in Machine Learning," we will discuss the perceptron and its basic functions in depth. Let's start with a basic introduction to the perceptron.
What is the Perceptron model in Artificial Neural Networks / Machine Learning?
The perceptron is a Machine Learning algorithm for the supervised learning of various binary classification tasks. Further, the perceptron is also understood as an artificial neuron, or neural network unit, that helps to detect certain input data computations in business intelligence.
The perceptron model is also regarded as one of the simplest types of Artificial Neural Networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
Frank Rosenblatt invented the perceptron model as a binary classifier, which contains three main components. These are as follows:
o Input Nodes or Input Layer:
This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units. This is another most important parameter of the perceptron's components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, the bias can be considered as the intercept in a linear equation.
o Activation Function:
These are the final and most important components, which help to determine whether the neuron will fire or not. The activation function can be considered primarily as a step function. Common types include:
o Sign function
o Step function, and
o Sigmoid function
The data scientist uses the activation function to take a subjective decision based on the problem statement and the desired form of the outputs. The activation function used (e.g., sign, step, or sigmoid) may differ between perceptron models depending on, for example, whether the learning process is slow or suffers from vanishing or exploding gradients.
In the perceptron model, all input values are multiplied by their corresponding weights and summed together to create the weighted sum. Then this weighted sum is applied to the activation function 'f' to obtain the desired output. This activation function is also known as the step function and is represented by 'f'.
This step function or Activation function plays a vital role in ensuring that output
is mapped between required values (0,1) or (-1,1). It is important to note that the
weight of input is indicative of the strength of a node. Similarly, an input's bias
value gives the ability to shift the activation function curve up or down.
Perceptron Function
The perceptron function f(x) is obtained by multiplying the input 'x' by the learned weight coefficient 'w' and adding the bias 'b':
f(x) = 1 if w.x + b > 0
f(x) = 0 otherwise
The perceptron model works in two important steps, as follows:
Step-1
In the first step, multiply all input values by their corresponding weight values and then add the products to determine the weighted sum. A special term called the bias 'b' is added to this weighted sum to improve the model's performance:
∑wi*xi + b
Step-2
In the second step, an activation function 'f' is applied to the weighted sum obtained above, which gives the output:
Y = f(∑wi*xi + b)
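Putting the two steps together, a minimal Python sketch (the weight, bias, and input values are illustrative; how w and b are learned is not shown here):

import numpy as np

def perceptron(x, w, b):
    # Step 1: weighted sum of the inputs plus the bias
    z = np.dot(w, x) + b
    # Step 2: step activation function f
    return 1 if z > 0 else 0

w = np.array([0.6, -0.4])  # illustrative learned weights
b = -0.1                   # illustrative bias
print(perceptron(np.array([1.0, 0.5]), w, b))  # 0.6 - 0.2 - 0.1 > 0, so prints 1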
Characteristics of Perceptron
Types of Perceptron Models
Based on the layers, Perceptron models are divided into two types. These are as
follows:
Single Layer Perceptron Model
This is one of the simplest types of Artificial Neural Networks (ANNs). A single-layered perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.
In a single-layer perceptron model, the algorithm does not contain any recorded data, so it begins with randomly allocated values for the weight parameters. Further, it sums up all the weighted inputs. After adding all the inputs, if the total sum is more than a pre-determined value, the model is activated and shows the output value as +1.
Multi-Layer Perceptron Model
Like a single-layer perceptron model, a multi-layer perceptron model has the same model structure but with a greater number of hidden layers.
The multi-layer perceptron works in two stages, as follows:
o Forward Stage: Activation functions start from the input layer in the forward stage and terminate at the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. In this stage, the error between the actual output and the demanded output originates at the output layer and propagates backwards, ending at the input layer.
A multi-layer perceptron model has greater processing power and can process
linear and non-linear patterns. Further, it can also implement logic gates such as
AND, OR, XOR, NAND, NOT, XNOR, NOR.
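To make the XOR point concrete, here is a hand-weighted Python sketch (the weights are a standard textbook choice, not from the text): a single perceptron cannot compute XOR because XOR is not linearly separable, but two hidden units feeding one output unit can.

def step(z):
    # Threshold activation: fires only when z is positive
    return 1 if z > 0 else 0

def xor(x1, x2):
    # Hidden layer: one unit computes OR, the other computes AND
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    # Output layer: OR and not AND, i.e. exactly one input is on
    return step(h_or - h_and - 0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor(a, b))  # prints 0, 1, 1, 0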
Recurrent Neural Network (RNN)
Recurrent Neural Networks have the power to remember what they have learned in the past and apply it in future predictions.
The input is sequential data fed into the RNN, which has a hidden internal state that gets updated every time it reads the next element of the input sequence.
The internal hidden state will be fed back to the model. The RNN produces some
output at every timestamp.
The mathematical representation is given below:
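(The original figure is not reproduced here; the standard form of the update it depicts is h_t = f_W(h_t-1, x_t), for example h_t = tanh(W_hh * h_t-1 + W_xh * x_t) with output y_t = W_hy * h_t, where h_t is the new hidden state, h_t-1 the previous hidden state, x_t the input at timestamp t, and f_W a function with parameters W.)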
Note: We use the same function and parameters at every timestamp.
The Long Short-Term Memory (LSTM) network, a variant of the RNN, has the following key components:
• Cell state (c_t): It corresponds to the long-term memory content of the network.
• Forget Gate: Some information in the cell state is no longer needed and is erased. The gate receives two inputs, x_t (the current timestamp's input) and h_t-1 (the previous hidden state), which are multiplied with the relevant weight matrices before the bias is added. The result is sent into an activation function, which outputs a value between 0 and 1 that decides whether the information is retained or forgotten.
• Input gate: It decides what piece of new information is to be added to the cell state. It is similar to the forget gate, using the current timestamp's input and the previous hidden state, with the only difference being multiplication with a different set of weights.
• Output gate: The output gate's job is to extract meaningful information from the
current cell state and provide it as an output.
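In equation form (a standard reconstruction, since the original figure is not reproduced; σ is the sigmoid function, [h_t-1, x_t] is the concatenation of the previous hidden state and the current input, and each W, b pair is that gate's weights and bias):
f_t = σ(W_f · [h_t-1, x_t] + b_f)   (forget gate)
i_t = σ(W_i · [h_t-1, x_t] + b_i)   (input gate)
o_t = σ(W_o · [h_t-1, x_t] + b_o)   (output gate)
c_t = f_t * c_t-1 + i_t * tanh(W_c · [h_t-1, x_t] + b_c)
h_t = o_t * tanh(c_t)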
To train a machine learning algorithm, you strive to identify the weights and biases
within the network that will help you solve the problem under consideration.
For example, you may have a classification problem. When looking at an image, you want to determine whether the image is of a cat or a dog. To build your model, you train your algorithm on training data with correctly labelled samples of cat and dog images.
It is helpful to think of a neural network as a function that accepts inputs (data) to produce an output prediction. The variables of this function are the parameters, or weights, of the neurons.
The image below depicts a simple neural network that receives inputs (X1, X2, X3, ..., Xn); these inputs are fed forward to neurons within the layer with weights (W1, W2, W3, ..., Wn). The inputs and weights undergo a multiplication operation, the results are summed together by an adder, and an activation function regulates the final output of the layer.
To quantify how wrong a prediction is, a comparison between the predicted value and the actual value is required, yielding the calculation of a factor that influences the modification of weights and biases within a neural network. This measurement of the error gap between the predicted value of a neural network and the actual value of a data sample is provided by the cost function.
When a ‘3’ is fed forward through the network, we expect the connections
(represented by the arrows in the diagram) responsible for classifying a ‘3’ to have
higher activation, which results in a higher probability for the output neuron
associated with the digit ‘3’.
Several components are responsible for the activation of a neuron, namely biases,
weights, and the previous layer activations. These specified components have to be
iteratively modified for the neural network to perform optimally on a particular
dataset.
For completeness, below are examples of cost functions used within machine
learning:
• Categorical Cross-Entropy
• Binary Cross-Entropy
• Logarithmic Loss
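As an illustration, a minimal Python sketch of binary cross-entropy (the labels and predicted probabilities are made-up sample values):

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Penalizes confident wrong predictions heavily; 0 would be a perfect fit
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])          # actual labels (e.g. cat = 1, dog = 0)
y_pred = np.array([0.9, 0.2, 0.7, 0.6])  # the model's predicted probabilities
print(binary_cross_entropy(y_true, y_pred))  # smaller value = better predictions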
The image in Figure 3 illustrates a cost function plotted on axes that hold values from the function's parameter space. Let's take a look at how neural networks learn by visualizing the cost function as an uneven surface plotted over the parameter space of possible weight values.
Figure 3: Gradient Descent visualized
The blue points in the image above represent a step (an evaluation of parameter values in the cost function) in the search for a local minimum. The lowest point of the modelled cost function corresponds to the position of the weight values that results in the lowest value of the cost function. The smaller the cost function is, the better the neural network performs. Therefore, it is possible to modify the network's weights from the information gathered.
Gradient descent is the algorithm employed to guide the pairs of values chosen at
each step towards a minimum.
• Local Minimum: This is the smallest value of the cost function within a neighbouring region of the parameter space.
• Global Minimum: This is the smallest value of the cost function within the entire cost function domain.
The gradient descent algorithm guides the search for values that minimize the
function at a local/global minimum by calculating the gradient of a differentiable
function and moving in the opposite direction of the gradient.
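A minimal Python sketch of this rule on a one-dimensional cost function (the cost function, starting point, and learning rate are illustrative assumptions):

def cost(w):
    # Illustrative convex cost with its minimum at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the cost with respect to the parameter w
    return 2.0 * (w - 3.0)

w = 0.0              # initial parameter value
learning_rate = 0.1
for step in range(100):
    w -= learning_rate * grad(w)  # move in the opposite direction of the gradient
print(w)  # converges towards 3.0, the minimum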
Advantages of gradient descent and its variants:
1. Widely used: Gradient descent and its variants are widely used in
machine learning and optimization problems because they are effective
and easy to implement.
2. Convergence: Gradient descent and its variants can converge to a global
minimum or a good local minimum of the cost function, depending on
the problem and the variant used.
3. Scalability: Many variants of gradient descent can be parallelized and are
scalable to large datasets and high-dimensional models.
4. Flexibility: Different variants of gradient descent offer a range of trade-
offs between accuracy and speed, and can be adjusted to optimize the
performance of a specific problem.
Disadvantages of gradient descent and its variants:
1. Choice of learning rate: The choice of learning rate is crucial for the convergence of gradient descent and its variants. Choosing a learning rate that is too large can lead to oscillations or overshooting, while choosing a learning rate that is too small can lead to slow convergence or getting stuck in local minima.
2. Sensitivity to initialization: Gradient descent and its variants can be
sensitive to the initialization of the model’s parameters, which can affect
the convergence and the quality of the solution.
3. Time-consuming: Gradient descent and its variants can be time-consuming, especially when dealing with large datasets and high-dimensional models. The convergence speed can also vary depending on the variant used and the specific problem.
4. Local optima: Gradient descent and its variants can converge to a local minimum instead of the global minimum of the cost function, especially in non-convex problems. This can affect the quality of the solution, and techniques like random initialization and multiple restarts may be used to mitigate this issue.
Backpropagation
The minimization of the cost function by calculating the gradient leads to a local
minimum. In each iteration or training step, the weights in the network are adjusted
by the calculated gradient, alongside the learning rate, which controls the factor of
modification made to weight values. This process is repeated at each step of the training phase of a neural network, with the goal of getting closer to a local minimum after each step.
The name "backpropagation" comes from the process's literal meaning: "backward propagation of errors." The partial derivatives that make up the gradient quantify the error. By propagating the errors backwards through the network, the gradient of the last layer (the layer closest to the output) is used to calculate the gradient of the second-to-last layer.
This propagation of errors through the layers, with each layer's gradient computed using that of the layer after it, continues until the first layer (the layer closest to the input) is reached.
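A compact Python/NumPy sketch of this layer-by-layer flow for a tiny two-layer network (the sizes, data, learning rate, and squared-error cost are illustrative assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=3)   # one illustrative input sample
t = np.array([1.0])      # its target value
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

for _ in range(200):
    # Forward pass through both layers
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)
    # Backward pass: error signal at the output layer...
    delta2 = (y - t) * y * (1 - y)
    # ...is propagated back to compute the earlier layer's error signal
    delta1 = (W2.T @ delta2) * h * (1 - h)
    # Adjust the weights by the calculated gradients and the learning rate
    W2 -= 0.5 * np.outer(delta2, h)
    W1 -= 0.5 * np.outer(delta1, x)
print(y)  # moves towards the target 1.0 as training proceeds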
Overfitting
Overfitting occurs when a model fits the noise in its training data rather than the underlying trend. An overfitted model is inaccurate because the trend it has learned does not reflect the reality present in the data. To overcome this, there are a few techniques that can be used.
You can identify that your model is overfitting when it works well on training data but does not perform well on unseen, new data. You can also track the model's performance through concepts like bias and variance.
But how to solve this problem? Here are some of the techniques you can use to
effectively overcome the overfitting problem in your neural network.
Reduce the Model's Complexity
The first step when dealing with overfitting is to decrease the complexity of the model. To decrease the complexity, we can simply remove layers or reduce the number of neurons to make the network smaller. There is no general rule on how much to remove or how large your network should be, but if your neural network is overfitting, try making it smaller.
Early Stopping
Early stopping means halting training when the model's performance on a validation set stops improving, so that the network does not keep fitting noise in the training data.
Data Augmentation
In the case of neural networks, data augmentation simply means increasing the size of the dataset, i.e. increasing the number of images present in it. Some of the popular image augmentation techniques are flipping, translation, rotation, scaling, changing brightness, adding noise, etcetera.
Use Regularization
Regularization reduces overfitting by adding a penalty term to the loss function. The most common techniques are known as L1 and L2 regularization:
The L1 penalty aims to minimize the absolute value of the weights. This is
mathematically shown in the below formula.
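A standard form of this penalty (reconstructed here, since the original formula image is not reproduced; λ controls the regularization strength):
Loss = Error(y, ŷ) + λ * Σ |wi|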
The L2 penalty aims to minimize the squared magnitude of the weights. This is
mathematically shown in the below formula.
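Again in a standard reconstructed form (note the squared weights in place of the absolute values):
Loss = Error(y, ŷ) + λ * Σ wi²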
Use Dropouts
Dropout modifies the network itself. It randomly drops neurons from the neural
network during training in each iteration. When we drop different sets of neurons,
it’s equivalent to training different neural networks. The different networks will
overfit in different ways, so the net effect of dropout will be to reduce overfitting.
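A minimal Python sketch of dropout during training (the drop probability and the inverted-scaling convention are common choices, assumed here rather than taken from the text):

import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    # At test time the layer is left untouched
    if not training:
        return activations
    # Randomly zero out each neuron with probability p_drop
    mask = np.random.rand(*activations.shape) >= p_drop
    # Scale the survivors so the expected total activation stays the same
    return activations * mask / (1.0 - p_drop)

h = np.array([0.2, 0.9, 0.4, 0.7])
print(dropout(h))  # a different random subset of neurons is kept on each call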