Unit II ML
Introduction:
What is ANN?
Artificial Neural Networks (ANNs) are a type of machine learning algorithm modeled after the human brain. Just as the neurons in our nervous system are able to learn from past data, an ANN is able to learn from data and provide responses in the form of predictions or classifications.
ANNs are nonlinear statistical models that capture complex relationships between inputs and outputs and can discover new patterns in the data. A variety of tasks, such as image recognition, speech recognition, machine translation, and medical diagnosis, make use of artificial neural networks.
An important advantage of an ANN is that it learns from example data sets. The most common use of an ANN is as an approximator of an arbitrary function. With such tools, one has a cost-effective method of arriving at solutions that define the distribution. An ANN can also work from a sample of the data rather than the entire dataset to produce its output. With ANNs, one can enhance existing data analysis techniques owing to their advanced predictive capabilities.
The functioning of Artificial Neural Networks is similar to the way neurons work in our nervous system. The idea of neural networks goes back to 1943, when Warren S. McCulloch and Walter Pitts proposed the first mathematical model of an artificial neuron. In order to understand the workings of ANNs, let us first understand how they are structured. In a neural network, there are three essential layers –
Input Layers
The input layer is the first layer of an ANN that receives the input information in
the form of various texts, numbers, audio files, image pixels, etc.
Hidden Layers
In the middle of the ANN model are the hidden layers. There can be a single hidden layer, as in a simple multilayer perceptron, or multiple hidden layers, as in deep networks. These hidden layers perform various types of mathematical computation on the input data and recognize the patterns that the data contains.
Output Layer
In the output layer, we obtain the result produced by the computations of the middle layers. In a neural network, there are multiple parameters and hyper-parameters that affect the performance of the model, and the output of an ANN depends largely on them. Some of these parameters are the weights, biases, learning rate, and batch size. Each node in the network has weights assigned to it. A transfer function is used for calculating the weighted sum of the inputs and the bias.
After the transfer function has calculated this sum, the activation function processes the result. Based on the value received, the activation function decides what the node fires. For example, if the output received is above 0.5, the activation function fires a 1; otherwise it outputs a 0.
Some of the popular activation functions used in Artificial Neural Networks are
Sigmoid, RELU, Softmax, tanh etc.
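To make the weighted sum and activation step concrete, here is a minimal sketch in Python with NumPy; the inputs, weights, and bias values are made up for illustration, and the 0.5 threshold mirrors the example above.

```python
import numpy as np

# Illustrative inputs, weights, and bias (made-up values)
inputs = np.array([0.5, 0.3, 0.2])
weights = np.array([0.4, 0.7, 0.2])
bias = 0.1

# Transfer function: weighted sum of the inputs plus the bias
r = np.dot(weights, inputs) + bias

# A few common activation functions
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

output = sigmoid(r)

# Threshold rule from the text: fire a 1 if the output is above 0.5, else 0
fired = 1 if output > 0.5 else 0
print(r, output, fired)
```

Running this prints the weighted sum, the sigmoid output (about 0.63 here), and the fired value 1.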
Based on the value that the node has fired, we obtain the final output. Then, using an error function, we calculate the discrepancy between the predicted output and the actual output and adjust the weights of the neural network through a process known as backpropagation.
Perceptrons
What is Machine Perception?
Machine Perception refers to the added functionality in computer systems that
enables reaction based on senses, similar to human perception. Computers
now have the added capacity to see, hear, touch, and in some cases even
smell. The goal of this functionality is to learn and react as a human would, so
the computer can alert human operators to any impending issues and help
troubleshoot.
Computer vision, sometimes called machine vision, refers to the way in which
computers analyze and interpret images or videos. Obtaining and
understanding images is a functionality used quite often in this digital revolution
for facial recognition software and image classification through convolutional
neural networks (CNN). Machine hearing is the computer's ability to decipher
sounds, such as speech and music, and process the sound data. This is used
for recording music and in voice recognition software in cars and on
smartphones. Machine touch generally attempts to gain information based on
tactile interaction with physical surroundings. This functionality is less widely
used, as recreating a real-world physical reaction in an artificial intelligence
(AI) capacity has not yet been fully realized. Similarly, machine smell, or
olfaction, is still in its early stages. The intended use of machine olfaction is for
chemical analysis and necessary alerts.
Machine learning refers to the overall data analysis that improves over time as it
"learns", but machine perception specifically involves the human senses and
their capacity to receive and process information. Whether the incoming data is
a face or an image or a string of music notes, object recognition and analysis
are improving daily. As each new set of data adds on, the system as a whole
becomes more appropriately reactive and even predictive. Fully realizing the
benefits of machine learning requires analysis through all human senses and
how they continuously learn, grow, and react to incoming information.
Perceptron
Suppose we have n inputs (x_1, x_2, …, x_n) and a bias unit. Let the weights applied to them be w_1, w_2, …, w_n. The summation obtained by performing the dot product between the inputs and the weights and adding the bias unit is:
r = \sum_{i=1}^{n} w_i x_i + \text{bias}
On feeding r into the activation function F(r), we obtain the output of the neuron in the hidden layer. For the first neuron of the first hidden layer, this can be written as:
h_{11} = F(r)
For all the other hidden layers, repeat the same procedure. Keep repeating the process until you reach the last set of weights, which produces the output.
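The layer-by-layer computation just described can be sketched as follows in Python with NumPy; the network shape (3 inputs, 4 hidden units, 1 output), the random weights, and the sigmoid activation are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical network: 3 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # weights and bias for the hidden layer
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # weights and bias for the output layer

def forward(x):
    # r = sum_i w_i x_i + bias for each hidden neuron, then the activation F(r)
    h1 = sigmoid(W1 @ x + b1)
    # repeat the same procedure with the next set of weights
    return sigmoid(W2 @ h1 + b2)

print(forward(np.array([0.5, 0.3, 0.2])))
```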
Backpropagation Algorithm
What is a backpropagation algorithm?
Backpropagation (backward propagation of errors) is a supervised learning technique that adjusts a network's weights by propagating the error at the output backward through the network. Two variants are commonly distinguished: static backpropagation and recurrent backpropagation. The key difference is that static backpropagation offers instant mapping and recurrent backpropagation does not.
The algorithm gets its name because the weights are updated backward, from
output to input.
It does not have any parameters to tune except for the number of inputs.
It is highly adaptable and efficient and does not require any prior knowledge
about the network.
The difficulty of understanding exactly how changing weights and biases affect
the overall behavior of an artificial neural network was one factor that held back
more comprehensive use of neural network applications, arguably until the early
2000s when computers provided the necessary insight.
Backpropagation requires a known, desired output for each input value in order
to calculate the loss function gradient -- how a prediction differs from actual
results -- as a type of supervised machine learning. Along with classifiers such
as Naïve Bayesian filters and decision trees, the backpropagation training
algorithm has emerged as an important part of machine learning applications
that involve predictive analytics.
The time complexity of each iteration -- how long it takes to execute each statement in an algorithm -- depends on the network's structure. For a multilayer perceptron, matrix multiplications dominate the running time.
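The following is a minimal sketch, not a production implementation, of backpropagation for a one-hidden-layer network trained by gradient descent on a toy XOR data set; the network size, learning rate, and number of epochs are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy data set: XOR inputs X with known, desired outputs y
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # hidden layer parameters
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # output layer parameters
lr = 0.5                                             # learning rate

for epoch in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)        # hidden activations
    y_hat = sigmoid(h @ W2 + b2)    # predictions

    # Backward pass: error terms, propagated from output to input
    d_out = (y_hat - y) * y_hat * (1 - y_hat)   # output layer error (squared loss)
    d_hid = (d_out @ W2.T) * h * (1 - h)        # hidden layer error

    # Weight updates, applied backward (hence the name backpropagation)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0, keepdims=True)

print(y_hat.round(3))
```

After training, the printed predictions should typically be close to the desired XOR outputs, illustrating how repeated backward weight updates reduce the loss.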
Face Recognition
Modern technology amazes people with innovations that make life simpler and easier. Face recognition has over time proven to be the least intrusive and fastest form of biometric verification.
Image reading
A computer reads an image as a grid of values between 0 and 255. For a color image, there are three primary colors: red, green, and blue. A matrix is formed for each primary color, and together these matrices provide the pixel values for the individual R, G, and B channels. Each element of a matrix gives the intensity (brightness) of the corresponding pixel.
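As a hedged example, the snippet below uses the Pillow library (assumed to be installed) to load an image and split it into the red, green, and blue matrices of 0-255 values described above; the file name "face.jpg" is only a placeholder.

```python
import numpy as np
from PIL import Image  # Pillow, assumed to be installed

# "face.jpg" is a placeholder file name
img = Image.open("face.jpg").convert("RGB")
pixels = np.array(img)        # shape: (height, width, 3), values 0-255

R = pixels[:, :, 0]           # red intensity matrix
G = pixels[:, :, 1]           # green intensity matrix
B = pixels[:, :, 2]           # blue intensity matrix

print(pixels.shape, R.min(), R.max())
```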
As a simple example, we can map a “face” into a feature vector which can comprise various features like:
Height of face (cm)
Width of the face (cm)
Average color of face (RGB)
Width of lips (cm)
Height of nose (cm)
1. Deriving the feature vector: it is difficult to manually list down all of the features because there are so many. A Machine Learning algorithm can intelligently label many such features. For instance, a complex feature could be the ratio of the height of the nose to the width of the forehead.
2. Matching algorithms: Once the feature vectors have been obtained, a
Machine Learning algorithm needs to match a new image with the set of
feature vectors present in the corpus.
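A minimal sketch of this matching step, assuming each face has already been reduced to a numeric feature vector like the one listed above; the database entries and the Euclidean-distance rule are illustrative choices rather than any particular library's method.

```python
import numpy as np

# Hypothetical corpus of feature vectors:
# [face height (cm), face width (cm), lip width (cm), nose height (cm)]
database = {
    "person_a": np.array([23.1, 15.8, 5.2, 4.4]),
    "person_b": np.array([24.0, 16.2, 4.9, 4.8]),
}

def match(new_face, corpus):
    # Return the identity whose stored vector is closest (smallest
    # Euclidean distance) to the new face's feature vector.
    return min(corpus, key=lambda name: np.linalg.norm(corpus[name] - new_face))

print(match(np.array([23.9, 16.1, 5.0, 4.7]), database))
```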
Face Recognition Operations
The technology system may vary when it comes to facial recognition. Different
software applies different methods and means to achieve face recognition. The
stepwise method is as follows:
Face Detection: To begin with, the camera will detect and recognize a face. The face is best detected when the person is looking directly at the camera, as this makes facial recognition easier. With advancements in the technology, faces can now be detected even with slight variations in pose relative to the camera.
Face Analysis: Then the photo of the face is captured and analyzed. Most
facial recognition relies on 2D images rather than 3D because it is more
convenient to match to the database. Facial recognition software will analyze
the distance between your eyes or the shape of your cheekbones.
Image to Data Conversion: The face is now converted to a mathematical representation, and these facial features become numbers. This numerical code is known as a faceprint. Just as every person has a unique fingerprint, every person also has a unique faceprint.
Match Finding: The code is then compared against a database of other faceprints. This database contains photos with identification that can be compared. The technology identifies a match for your exact features in the provided database and returns the match along with the attached information, such as name and address, depending on what is stored in the database for that individual.
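The four operations above can be strung together as in the sketch below; detect_face, analyze_face, and to_faceprint are hypothetical placeholders for whatever detector and feature extractor a particular system uses, and the distance threshold is an arbitrary illustrative value.

```python
import numpy as np

def detect_face(image):
    # Hypothetical placeholder: a real system would run a face detector here
    return image

def analyze_face(face):
    # Hypothetical placeholder: measure facial features (distances, shapes)
    return np.asarray(face, dtype=float).ravel()

def to_faceprint(features):
    # Image-to-data conversion: the numeric code that acts as the faceprint
    return features / (np.linalg.norm(features) + 1e-9)

def find_match(faceprint, database, threshold=0.5):
    # Match finding: return the closest stored identity, or None if no
    # stored faceprint is within the (illustrative) distance threshold.
    best = min(database, key=lambda name: np.linalg.norm(database[name] - faceprint))
    dist = np.linalg.norm(database[best] - faceprint)
    return best if dist <= threshold else None
```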
Face Recognition Applications
Airline Industry:
Some airlines use facial recognition to identify passengers. Such a face scanner helps save time and prevents the hassle of keeping track of a ticket.
Hospital Security:
Facial recognition can be used in hospitals to keep a record of patients, which is far better than maintaining paper records and looking up names and addresses. It is easy for the staff to use such an app to recognize a patient and retrieve their details within seconds. It can also be used for security purposes, for example to detect whether a person is a genuine visitor or a patient.
Detection of emotions and sentiments:
Real-time emotion detection is yet another valuable application of face
recognition in healthcare. It can be used to detect emotions which patients
exhibit during their stay in the hospital and analyze the data to determine
how they are feeling. The results of the analysis may help to identify if
patients need more attention in case they’re in pain or sad.
Challenges in Face Recognition
Pose: A Face Recognition System can tolerate cases with small rotation angles, but detection becomes difficult when the angle is large; if the database does not contain all the angles of the face, this can pose a problem.
Expressions: Because of emotions, human mood varies and results in different facial expressions, which change the appearance of facial features; ageing changes the face in a similar way, so it may be difficult to identify a person who is now 60 years old from an older photograph.
Occlusion: Occlusion means blockage. This is due to the presence of various
occluding objects such as glasses, beard, moustache, etc. on the face, and
when an image is captured, the face lacks some parts. Such a problem can
severely affect the classification process of the recognition system.
Illumination: Illumination means light variations. Illumination changes can significantly alter the appearance of a face and make it harder to recognize.
Estimating Hypothesis Accuracy
For estimating hypothesis accuracy, statistical methods are applied. In this section, we will look at evaluating hypotheses and estimating their accuracy.
Evaluating hypotheses:
Whenever you form a hypothesis for a given training data set (for example, a hypothesis for the EnjoySport task, where the attributes of the instances decide whether a person will be able to enjoy their favorite sport), you need a way to test it. To test or evaluate how accurate the considered hypothesis is, we use different statistical measures. Evaluating hypotheses is an important step in training the model.
This section introduces basic notions from statistics and sampling theory,
including probability distributions, expected value, variance, Binomial and
Normal distributions, and two-sided and one-sided intervals.
Usefully, Mitchell provides a table that summarizes the key concepts of this section.
To save repeating all of this theory, the crux of the section is as follows:
Proportional values like classification accuracy and classification error fit a
Binomial distribution.
The Binomial distribution characterizes the probability of a binary event, such as
a coin flip or a correct/incorrect classification prediction.
The mean is the expected value of the distribution, the variance is the average squared distance of samples from the mean, and the standard deviation is the square root of the variance.
Ideally, we seek an unbiased estimate of our desired parameter that has the
smallest variance.
Confidence intervals provide a way to quantify the uncertainty in a population
parameter, such as a mean.
The Binomial distribution can be approximated with the simpler Gaussian
distribution for large sample sizes, e.g. 30 or more observations.
The interval can be centered on the mean (a two-sided interval), or it can be one-sided, covering only a radius to the left or right of the mean.
A more precise rule of thumb is that the approximation described above works well when n * p * (1 - p) >= 5, where p is the estimated proportion and n is the sample size.
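As a worked example of these ideas, the sketch below computes an approximate two-sided 95% confidence interval for a classification error estimated from n test examples, using the Normal approximation to the Binomial; the counts are made up.

```python
import math

# Made-up example: 30 misclassified examples out of n = 200 test examples
n = 200
errors = 30
error_rate = errors / n                       # sample error, the estimate of p

# The Normal approximation is reasonable here since n * p * (1 - p) >= 5
variance = error_rate * (1 - error_rate) / n  # variance of the estimate
std_err = math.sqrt(variance)                 # its standard deviation

z = 1.96                                      # z-value for a two-sided 95% interval
lower = error_rate - z * std_err
upper = error_rate + z * std_err
print(f"error = {error_rate:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With these numbers the estimated error is 0.15 with an interval of roughly (0.10, 0.20).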
Comparing Learning Algorithms
This final content section of the chapter focuses on the comparison of machine
learning algorithms.
The calculation of the confidence interval is updated to account for the reduced
number of degrees of freedom as each algorithm is evaluated on the same test
set.
Mitchell reminds us that the Student’s t-test does not technically apply in the
case where we use resampling methods. Nevertheless, he recommends using
k-fold cross-validation or random sampling in order to estimate the variance in
the estimate of model error, as they are the only methods available.
This is less than ideal, as the expectations of the statistical test will be violated, increasing Type I errors.
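A hedged sketch of the comparison Mitchell describes: evaluate two learning algorithms on the same k folds and apply a paired Student's t-test to the per-fold scores. It uses scikit-learn and SciPy (both assumed to be installed), and the dataset and the two classifiers are placeholders, not choices made in the text.

```python
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)              # placeholder dataset
cv = KFold(n_splits=10, shuffle=True, random_state=0)   # same folds for both algorithms

# Per-fold accuracy for each learning algorithm
acc_tree = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
acc_nb = cross_val_score(GaussianNB(), X, y, cv=cv)

# Paired t-test on the per-fold scores; as noted above, the folds are not
# truly independent, so the test's assumptions are only approximately met
t_stat, p_value = ttest_rel(acc_tree, acc_nb)
print(acc_tree.mean(), acc_nb.mean(), t_stat, p_value)
```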