
UNIT II

Introduction:

Introduction to Artificial Neural Networks

Artificial Neural Networks are among the most popular machine learning
algorithms today. The earliest neural network models date back to the 1940s,
but they have achieved huge popularity only recently, thanks to the increase in
available computation power, and they are now virtually everywhere. In many
applications that you use, Neural Networks power the intelligent interface
that keeps you engaged.

What is ANN?

Artificial Neural Networks are a special type of machine learning algorithm
modeled after the human brain. Just as the neurons in our nervous system are
able to learn from past data, an ANN is able to learn from data and provide
responses in the form of predictions or classifications.
ANNs are nonlinear statistical models that capture complex relationships
between inputs and outputs in order to discover new patterns. A variety of tasks
such as image recognition, speech recognition, machine translation and
medical diagnosis make use of artificial neural networks.

An important advantage of ANNs is that they learn from example data sets.
The most common use of an ANN is as an arbitrary function approximator:
it provides a cost-effective way of arriving at solutions that approximate the
underlying distribution. An ANN can also work from a sample of the data
rather than the entire dataset to produce its output. With ANNs, one can
enhance existing data analysis techniques owing to their advanced
predictive capabilities.

Artificial Neural Networks Architecture

The functioning of Artificial Neural Networks is similar to the way neurons
work in our nervous system. Neural networks go back to 1943, when Warren S.
McCulloch and Walter Pitts proposed the first mathematical model of an
artificial neuron. In order to understand the workings of ANNs, let us first
understand how they are structured. In a neural network, there are three
essential layers –
Input Layer

The input layer is the first layer of an ANN that receives the input information in
the form of various texts, numbers, audio files, image pixels, etc.
Hidden Layers

In the middle of the ANN model are the hidden layers. There can be a single
hidden layer or multiple hidden layers. These hidden layers perform various
types of mathematical computation on the input data and recognize the
patterns present in it.
Output Layer

In the output layer, we obtain the result produced by the computations of the
preceding layers. In a neural network, there are multiple parameters and
hyper-parameters that affect the performance of the model. The output of an
ANN largely depends on these parameters. Some of these parameters are
weights, biases, the learning rate, the batch size, etc.
Each node in the network has some weights assigned to it. A transfer function
is used for calculating the weighted sum of the inputs and the bias.

After the transfer function has calculated the weighted sum, the activation
function is applied to it to produce the node's result. Based on this result, the
activation function fires the appropriate output from the node. For example, a
simple threshold activation fires a 1 if the result is above 0.5 and remains 0
otherwise.
Some of the popular activation functions used in Artificial Neural Networks are
Sigmoid, ReLU, Softmax and tanh.

Based on the values that the nodes have fired, we obtain the final output. Then,
using an error function, we calculate the discrepancy between the predicted
output and the actual output and adjust the weights of the neural network through
a process known as backpropagation.
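
As a concrete illustration of a single node, here is a minimal sketch in Python with NumPy (the input values, weights and bias are made-up numbers, and the sigmoid activation is just one possible choice):

import numpy as np

def sigmoid(z):
    # squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical inputs, weights and bias for one node
x = np.array([0.5, -1.2, 3.0])   # input values
w = np.array([0.4, 0.1, -0.6])   # weights assigned to the node
b = 0.2                          # bias

z = np.dot(w, x) + b             # transfer function: weighted sum plus bias
output = sigmoid(z)              # activation function applied to the sum
print(output)                    # the value the node fires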

Appropriate Problems for NN Learning


 Instances are represented by many attribute-value pairs.
 The target function output may be discrete-valued, real-valued, or a
vector of several real-valued or discrete-valued attributes.
 The training examples may contain errors.
 Long training times are acceptable.
 Fast evaluation of the learned target function may be required.
 The ability of humans to understand the learned target function is not
important

Perception
What is Machine Perception?
Machine Perception refers to the added functionality in computer systems that
enables reaction based on senses, similar to human perception. Computers
now have the added capacity to see, hear, touch, and in some cases even
smell. The goal of this functionality is to learn and react as a human would, so
the computer can alert human operators to any impending issues and help
troubleshoot.

Computer vision, sometimes called machine vision, refers to the way in which
computers analyze and interpret images or videos. Obtaining and
understanding images is a functionality used quite often in this digital revolution
for facial recognition software and image classification through convolutional
neural networks (CNN). Machine hearing is the computer's ability to decipher
sounds, such as speech and music, and process the sound data. This is used
for recording music and in voice recognition software in cars and on
smartphones. Machine touch generally attempts to gain information based on
tactile interaction with physical surroundings. This functionality is less widely
used, as recreating a real-world physical reaction in an artificial intelligence
(AI) capacity has not yet been fully realized. Similarly, machine smell, or
olfaction, is still in its early stages. The intended use of machine olfaction is for
chemical analysis and necessary alerts.

Machine learning refers to the overall data analysis that improves over time as it
"learns", but machine perception specifically involves the human senses and
their capacity to receive and process information. Whether the incoming data is
a face or an image or a string of music notes, object recognition and analysis
are improving daily. As each new set of data is added, the system as a whole
becomes more appropriately reactive and even predictive. Fully realizing the
benefits of machine learning requires analysis through all human senses and
how they continuously learn, grow, and react to incoming information.

What are the business advantages of using Machine Perception?

 Predictive functionality: Accessing data that is processed through human-


like senses is the closest alternative to consumer testing. Machine
perception can help a business predict how a consumer or user will see,
hear, and experience a new product, site, or service.

 Robotics: Machines with robotic capabilities are advancing the


manufacturing and production sectors, but organizations can significantly
reduce the number of malfunctions with the added capabilities of machine
vision or tactile responsiveness. Smarter robots that can detect visible
errors and respond to equipment failure can save the organization costly
repairs and replacements.

 Accuracy: Collecting and analyzing data with computational methods is


an exact science. Even analyzing through models based on human
senses will be more accurate than human analysis alone.
 Efficiency and productivity: Computer analysis and computer processing
are much faster than human employees can physically function. Reducing
the number of error-prone tasks that are carried out by humans will
reduce both errors and time spent.

 Recommendations: Machine perception empowers the use of predictive


analytics, but aside from predicting customer reactions, businesses can
also forecast what consumers will like and buy. This provides an
additional opportunity for revenue by recommending additional products
and services based on data-backed customer preferences.

Perceptron

 Perceptron is a Linear Threshold Unit (LTU).

 A perceptron takes a vector of real-valued inputs, calculates a linear


combination of these inputs, then outputs 1 if the result is greater than
some threshold and -1 otherwise.

 Given inputs x1 through xn, the output o(x1, ..., xn) computed by the

perceptron is:

o(x1, ..., xn) = 1 if w0 + w1x1 + w2x2 + ... + wnxn > 0, and -1 otherwise

where each wi is a real-valued constant, or weight, that determines the
contribution of input xi to the perceptron output.

 The quantity (-w0) is a threshold that the weighted combination of
inputs w1x1 + ... + wnxn must surpass in order for the perceptron to output 1.
To simplify notation, we imagine an additional constant input x0 = 1, which
lets the threshold be written as part of the weighted sum.

Representational Power of Perceptrons

 A perceptron represents a hyperplane decision surface in the n-


dimensional space of instances.
 The perceptron outputs 1 for instances lying on one side of the
hyperplane and outputs -1 for instances lying on the other side.

 The equation for this decision hyperplane is w · x = 0, where w is the weight vector and x the input vector.

 Some sets of positive and negative examples cannot be separated


by any hyperplane. Those that can be separated are called linearly
separable sets of examples.

 A single perceptron can be used to represent many Boolean


functions.
– AND, OR, NAND, NOR are representable by a perceptron
– XOR cannot be represented by a perceptron.
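
To make this concrete, the following is a minimal Python sketch of a perceptron as a linear threshold unit. The weights shown are hand-picked to realize the Boolean AND function (they are illustrative, not the only possible choice); no choice of weights can realize XOR, because XOR is not linearly separable.

def perceptron(x, w, w0):
    # linear threshold unit: output 1 if w0 + w·x > 0, else -1
    s = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

# inputs are encoded as 1 (true) and -1 (false)
w_and, w0_and = [0.5, 0.5], -0.8   # hand-picked weights for AND
for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, perceptron(x, w_and, w0_and))
# prints -1, -1, -1, 1 -- exactly the AND function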

Multi-Layer Neural Network

To be precise, a fully connected multi-layered neural network is known as a
Multi-Layer Perceptron (MLP). A multi-layered neural network consists of multiple
layers of artificial neurons or nodes. Unlike single-layer networks, most networks
used in recent times are multi-layered. The following
diagram is a visualization of a multi-layer neural network.
Explanation: Here the nodes marked as “1” are known as bias units. The leftmost
layer or Layer 1 is the input layer, the middle layer or Layer 2 is the hidden
layer and the rightmost layer or Layer 3 is the output layer. We can say that the
above diagram has 3 input units (excluding the bias unit), 1 output unit, and 4 hidden
units (the bias unit is not included).

A multi-layered neural network is a typical example of a feed-forward neural
network. The number of neurons and the number of layers are hyperparameters
of the network and need tuning. In order to find ideal values
for the hyperparameters, one must use cross-validation techniques.
Weight adjustment during training is carried out using the back-propagation
technique.

Formula for Multi-Layered Neural Network

Suppose we have n inputs (x1, x2, ..., xn) and a bias unit. Let the applied weights
be w1, w2, ..., wn. The summation r is found by taking the dot product of the
inputs and weights and adding the bias:

r = Σ (i = 1 to n) wi xi + bias

On feeding r into the activation function F(r) we find the output of a hidden-layer
neuron. For the first neuron of the first hidden layer, the output can be calculated as:

h1(1) = F(r)

Repeat the same procedure for all the other hidden layers, and keep repeating the
process until the last weight set (leading into the output layer) is reached.
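
A minimal sketch of this forward pass for a network with one hidden layer, written in plain NumPy (the layer sizes, random weights and sigmoid activation are illustrative assumptions, not values from the text):

import numpy as np

def F(r):
    # sigmoid activation, used here for illustration
    return 1.0 / (1.0 + np.exp(-r))

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, -0.4])   # 3 input units
W1 = rng.normal(size=(4, 3))     # weights from 3 inputs to 4 hidden neurons
b1 = rng.normal(size=4)          # hidden-layer bias
W2 = rng.normal(size=(1, 4))     # weights from 4 hidden neurons to 1 output
b2 = rng.normal(size=1)          # output-layer bias

r1 = W1 @ x + b1                 # r = sum of wi*xi plus bias, per hidden neuron
h1 = F(r1)                       # hidden-layer outputs
r2 = W2 @ h1 + b2                # repeat the procedure for the next weight set
output = F(r2)                   # output of the network
print(output)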

Backpropagation Algorithm
What is a backpropagation algorithm?

Backpropagation, or backward propagation of errors, is an algorithm designed to
propagate errors back from the output nodes to the input nodes. It is
an important mathematical tool for improving the accuracy of predictions in data
mining and machine learning. Essentially, backpropagation is an algorithm used
to calculate derivatives quickly.

There are two leading types of backpropagation networks:

1. Static backpropagation. Static backpropagation is a network developed to


map static inputs for static outputs. Static backpropagation networks can
solve static classification problems, such as optical character recognition
(OCR).

2. Recurrent backpropagation. The recurrent backpropagation network is used


for fixed-point learning. Recurrent backpropagation activation feeds forward
until it reaches a fixed value.

The key difference here is that static backpropagation offers instant mapping
and recurrent backpropagation does not.

What is a backpropagation algorithm in a neural network?

Artificial neural networks use backpropagation as a learning algorithm to
compute the gradient of the error with respect to the weight values of the various
inputs; this gradient is then used by gradient descent.
By comparing desired outputs to the outputs the system actually produces, the
network is tuned by adjusting connection weights to narrow the difference between
the two as much as possible.

The algorithm gets its name because the weights are updated backward, from
output to input.
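
As a minimal sketch of this idea (a single hidden layer, sigmoid activations and squared error, trained on the XOR data purely for illustration; the layer sizes and learning rate are assumptions), the gradients are computed backward from the output and the weights are adjusted to narrow the difference between desired and achieved outputs:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# illustrative training data: the XOR function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights
lr = 0.5                                        # learning rate

for epoch in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: propagate the error from output toward input
    err = out - y                         # difference between achieved and desired outputs
    d_out = err * out * (1 - out)         # gradient through the output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient through the hidden sigmoid

    # gradient-descent weight updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))   # predictions should approach [0, 1, 1, 0]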

The advantages of using a backpropagation algorithm are as follows:

 It does not have any parameters to tune except for the number of inputs.

 It is highly adaptable and efficient and does not require any prior knowledge
about the network.

 It is a standard process that usually works well.

 It is user-friendly, fast and easy to program.

 Users do not need to learn any special functions.

The disadvantages of using a backpropagation algorithm are as follows:

 It prefers a matrix-based approach over a mini-batch approach.

 Data mining is sensitive to noise and irregularities.

 Performance is highly dependent on input data.

 Training is time- and resource-intensive.


What is the objective of a backpropagation algorithm?

Backpropagation algorithms are used extensively to train feedforward neural


networks in areas such as deep learning. They efficiently compute the gradient
of the loss function with respect to the network weights. This approach
eliminates the inefficient process of directly computing the gradient with respect
to each individual weight. It enables the use of gradient methods, like gradient
descent or stochastic gradient descent, to train multilayer networks and update
weights to minimize loss.

The difficulty of understanding exactly how changing weights and biases affects
the overall behavior of an artificial neural network was one factor that held back
wider use of neural network applications, arguably until the early 2000s, when
computers provided the necessary computational power.

Today, backpropagation algorithms have practical applications in many areas of


artificial intelligence (AI), including OCR, natural language processing and
image processing.

What is a backpropagation algorithm in machine learning?

Backpropagation requires a known, desired output for each input value in order
to calculate the loss function gradient -- how a prediction differs from actual
results -- as a type of supervised machine learning. Along with classifiers such
as Naïve Bayesian filters and decision trees, the backpropagation training
algorithm has emerged as an important part of machine learning applications
that involve predictive analytics.

What is the time complexity of a backpropagation algorithm?

The time complexity of each iteration -- how long it takes to execute each
statement in the algorithm -- depends on the network's structure. For a multilayer
perceptron, matrix multiplications dominate the running time.

Disadvantages of using Backpropagation

 The actual performance of backpropagation on a specific problem is


dependent on the input data.
 Back propagation algorithm in data mining can be quite sensitive to noisy
data
 You need to use the matrix-based approach for backpropagation instead
of mini-batch.

Summary

 A neural network is a group of connected input/output units where each

connection has an associated weight.
 Backpropagation is a short form for “backward propagation of errors.” It is
a standard method of training artificial neural networks
 Back propagation algorithm in machine learning is fast, simple and easy
to program
 A feedforward BPN network is an artificial neural network.
 Two Types of Backpropagation Networks are 1)Static Back-propagation
2) Recurrent Backpropagation
 The basic concepts of continuous backpropagation were derived in the
context of control theory by Henry J. Kelley (1960) and Arthur E. Bryson (1961).
 Back propagation in data mining simplifies the network structure by
removing weighted links that have a minimal effect on the trained
network.
 It is especially useful for deep neural networks working on error-prone
projects, such as image or speech recognition.
 The biggest drawback of backpropagation is that it can be sensitive
to noisy data.

Face Recognition

Modern technology continues to deliver innovations that make everyday life
simpler. Face recognition has over time proven to be one of the least intrusive
and fastest forms of biometric verification.

Facial Recognition is a category of biometric software that maps an individual’s


facial features and stores the data as a face print. The software uses deep
learning algorithms to compare a live captured image to the stored face print to
verify one’s identity. Image processing and machine learning are the backbones
of this technology. Face recognition has received substantial attention from
researchers due to its use in various security applications such as airport
security, criminal detection, face tracking, forensics, etc. Compared to other
biometric traits like palm print, iris, fingerprint, etc., face biometrics can be non-
intrusive.
They can be taken even without the user’s knowledge and further can be used
for security-based applications like criminal detection, face tracking, airport
security, and forensic surveillance systems. Face recognition involves capturing
face images from a video or a surveillance camera and comparing them with a
stored database. Face recognition involves training on known images, classifying
them into known classes, and then storing them in the database. When a
test image is given to the system, it is classified and compared with the stored
database.

Image Processing and Machine learning


Image processing by computers involves the field of Computer Vision. It
deals with the high-level understanding of digital images or videos. The
requirement is to automate tasks that the human visual system can do. So, a
computer should be able to recognize objects such as the face of a human
being, a lamppost, or even a statue.

Image reading
The computer reads any image as a grid of values between 0 and 255. For a
color image there are 3 primary colors – red, green, and blue. A matrix is
formed for each primary color, and together these matrices provide the pixel
values for the individual R, G, B channels. Each element of a matrix gives the
intensity (brightness) of the pixel in that channel.

OpenCV is a Python library that is designed to solve computer vision problems.


OpenCV was originally developed in 1999 by Intel and was later supported by
Willow Garage. OpenCV supports a variety of programming languages such as
C++, Python and Java, and it runs on multiple platforms including Windows,
Linux, and macOS. OpenCV-Python is a wrapper around the original C++ library
to be used with Python. Using it, all of the OpenCV array structures are converted
to/from NumPy arrays, which makes it easier to integrate with other libraries
that use NumPy, such as SciPy and Matplotlib.
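
As a small sketch of image reading with OpenCV-Python (the file name is a placeholder):

import cv2

# read an image from disk; the result is a NumPy array of shape (height, width, 3)
img = cv2.imread("face.jpg")           # hypothetical file name
if img is None:
    raise FileNotFoundError("image not found")

print(img.shape, img.dtype)            # e.g. (480, 640, 3) uint8, values 0-255
# OpenCV stores channels in B, G, R order; split into per-color matrices
b, g, r = cv2.split(img)
print(r[0, 0], g[0, 0], b[0, 0])       # brightness of the top-left pixel in each channel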
Machine learning
Every machine learning algorithm takes a dataset as input and learns from that
data; that is, the algorithm is learned from the provided input and output data.
It identifies the patterns in the data and produces the desired model.
For instance, to identify whose face is present in a given image, multiple things
can be looked at as a pattern:

 Height/width of the face.


 Height and width may not be reliable since the image could be rescaled to a
smaller face or grid. However, even after rescaling, what remains unchanged
are the ratios – the ratio of the height of the face to the width of the face
won’t change.
 Color of the face.
 Width of other parts of the face like lips, nose, etc.
There is a pattern involved – different faces have different dimensions like the
ones above, and similar faces have similar dimensions. Machine learning
algorithms only understand numbers, so the face must be represented
numerically. This numerical representation of a “face” (or of an element in the
training set) is termed a feature vector. A feature vector comprises various
numbers in a specific order.

As a simple example, we can map a “face” into a feature vector which can
comprise various features like:

 Height of face (cm)


 Width of the face (cm)
 Average color of face (R, G, B)
 Width of lips (cm)
 Height of nose (cm)
Essentially, given an image, we can convert it into a feature vector like:

Height of face (cm): 23.1
Width of the face (cm): 15.8
Average color of face (RGB): (255, 224, 189)
Width of lips (cm): 5.2
Height of nose (cm): 4.4


So, the image is now a vector that could be represented as (23.1, 15.8, 255,
224, 189, 5.2, 4.4). There could be countless other features derived from the
image, for instance, hair color, facial hair, spectacles, etc.
Machine Learning does two major functions in face recognition technology.
These are given below:

1. Deriving the feature vector: it is difficult to manually list down all of the
features because there are just so many. A machine learning algorithm can
intelligently derive many such features. For instance, a complex feature
could be the ratio of the height of the nose to the width of the forehead.
2. Matching algorithms: Once the feature vectors have been obtained, a
Machine Learning algorithm needs to match a new image with the set of
feature vectors present in the corpus.
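
As a sketch of the matching step, the following compares a new feature vector against a small corpus of stored faceprints using Euclidean distance and returns the closest match (the vectors and names are made-up illustrations; real systems use richer features and learned metrics):

import numpy as np

# hypothetical corpus of stored feature vectors (faceprints)
corpus = {
    "alice": np.array([23.1, 15.8, 255, 224, 189, 5.2, 4.4]),
    "bob":   np.array([25.0, 17.2, 198, 160, 120, 6.0, 5.1]),
}

def closest_match(new_vector, corpus):
    # nearest neighbour by Euclidean distance between feature vectors
    return min(corpus, key=lambda name: np.linalg.norm(corpus[name] - new_vector))

test = np.array([23.0, 15.9, 250, 220, 185, 5.3, 4.5])
print(closest_match(test, corpus))     # -> "alice"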
Face Recognition Operations
The technology system may vary when it comes to facial recognition. Different
software applies different methods and means to achieve face recognition. The
stepwise method is as follows:

 Face Detection: To begin with, the camera detects and locates a face.
The face is best detected when the person is looking directly at the
camera, as this makes facial recognition easier. With advancements in
the technology, faces can now be detected even with slight variations in
pose relative to the camera (a minimal detection sketch is given after this list).
 Face Analysis: Then the photo of the face is captured and analyzed. Most
facial recognition relies on 2D images rather than 3D because it is more
convenient to match to the database. Facial recognition software will analyze
the distance between your eyes or the shape of your cheekbones.
 Image to Data Conversion: The analyzed face is then converted into a
mathematical representation, and these facial features become numbers.
This numerical code is known as a face print. Just as every person has a
unique fingerprint, every person has a unique face print.
 Match Finding: The code is then compared against a database of other face
prints. This database contains photos with identification that can be compared.
The technology identifies a match for your features in the provided database
and returns the match together with attached information such as name and
address, depending on what is saved for that individual in the database.
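
The detection step referenced above can be sketched with OpenCV's bundled Haar-cascade face detector (one common approach among several; the image path is a placeholder):

import cv2

# load OpenCV's pre-trained frontal-face Haar cascade
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("photo.jpg")                    # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # detection works on grayscale

# returns one (x, y, width, height) box per detected face
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)                 # save the image with boxes drawn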

Utilization of Face Recognition


While facial recognition may seem futuristic, it’s currently being used in a variety
of ways. Here are some surprising applications of this technology.

 Genetic Disorder Identification:


There are healthcare apps such as Face2Gene and software like DeepGestalt
that use facial recognition to detect genetic disorders. The face is
analyzed and matched against an existing database of disorders.

 Airline Industry:
Some airlines use facial recognition to identify passengers. This face
scanning helps save time and prevents the hassle of keeping track of
a ticket.

 Hospital Security:
Facial recognition can be used in hospitals to keep a record of patients,
which is far better than maintaining paper records and looking up names and
addresses. It is easy for staff to use such an app to recognize a patient and
retrieve their details within seconds. It can also be used for security purposes,
for example to verify whether a person is a genuine patient or not.
 Detection of emotions and sentiments:
Real-time emotion detection is yet another valuable application of face
recognition in healthcare. It can be used to detect emotions which patients
exhibit during their stay in the hospital and analyze the data to determine
how they are feeling. The results of the analysis may help to identify if
patients need more attention in case they’re in pain or sad.

Problems and Challenges


The face recognition technology is facing several challenges. The common
problems and challenges that a face recognition system can have while
detecting and recognizing faces are discussed in the following paragraphs.

 Pose: A face recognition system can tolerate cases with small rotation

angles, but detection becomes difficult when the angle is large, and if the
database does not contain images of the face at all angles this can pose a
problem.
 Expressions: Because of emotions, human mood varies and results in

different expressions. With these facial expressions, the machine can

make mistakes in identifying the correct person.
 Aging: With time and age the face changes; it does not remain rigid, so it

may be difficult to identify a person who is now, say, 60 years old from an earlier image.
 Occlusion: Occlusion means blockage. This is due to the presence of various

occluding objects such as glasses, beard, moustache, etc. on the face, and
when an image is captured, the face lacks some parts. Such a problem can
severely affect the classification process of the recognition system.
 Illumination: Illumination means light variations. Illumination changes can vary

the overall magnitude of light intensity reflected from an object, as well as


the pattern of shading and shadows visible in an image. The problem of face
recognition over changes in illumination is widely recognized to be difficult
for humans and algorithms alike. The difficulties posed by illumination conditions
are a challenge for automatic face recognition systems.
 Identifying similar faces: Different persons may have similar appearances,

which sometimes makes it nearly impossible to distinguish between them.


Disadvantages of Face Recognition
1. Danger of automated blanket surveillance
2. Lack of clear legal or regulatory framework
3. Violation of the principles of necessity and proportionality
4. Violation of the right to privacy
5. Effect on democratic political culture

Evaluating Hypotheses: Estimating hypotheses Accuracy

For estimating hypothesis accuracy, statistical methods are applied. In this
section, we will look at evaluating hypotheses and estimating their accuracy.

Evaluating hypotheses:
Whenever you form a hypothesis from a given training data set – for example,
a hypothesis for the EnjoySport task, where the attributes of the instances
decide whether a person will be able to enjoy their favorite sport or not –
you need to test or evaluate how accurate that hypothesis is, and for this we
use different statistical measures. Evaluating hypotheses is an important step in
training the model.

Basics of Sampling Theory


The equation for calculating a confidence interval (given below under Confidence
Intervals for Discrete-Valued Hypotheses) makes many assumptions.

This section works through these assumptions in order to provide a basis of


understanding for the calculation.

This section introduces basic notions from statistics and sampling theory,
including probability distributions, expected value, variance, Binomial and
Normal distributions, and two-sided and one-sided intervals.

Usefully, Mitchell provides a table that summarizes the key concepts of this
section.

To save repeating all of this theory, the crux of the section is as follows:
 Proportional values like classification accuracy and classification error fit a
Binomial distribution.
 The Binomial distribution characterizes the probability of a binary event, such as
a coin flip or a correct/incorrect classification prediction.
 The mean is the expected value of the distribution, the variance is the average
squared distance of samples from the mean, and the standard deviation is the
square root of the variance.
 Ideally, we seek an unbiased estimate of our desired parameter that has the
smallest variance.
 Confidence intervals provide a way to quantify the uncertainty in a population
parameter, such as a mean.
 The Binomial distribution can be approximated with the simpler Gaussian
distribution for large sample sizes, e.g. 30 or more observations.
 The interval can be centered on the mean (two-sided), but it can also be
one-sided, such as a radius to the left or right of the mean.

To evaluate hypotheses precisely, focus on these points when statistical methods
are applied to estimate hypotheses:


 First, how well does this estimate the accuracy of a hypothesis across
additional examples, given the observed accuracy of a hypothesis over a limited
sample of data?
 Second, if one hypothesis outperforms another over a sample of data, how likely
is it that it is more accurate in general?
 Third, what is the best strategy to use limited data to both learn and measure
the accuracy of a hypothesis?

Bias in the estimation


There is a bias in the estimation. First, the observed accuracy of the learned
hypothesis over the training instances is a poor predictor of its accuracy over
future cases. Because the learned hypothesis was generated from those very
instances, they will typically yield an optimistically skewed estimate of
hypothesis correctness.
Estimation variability.
Second, depending on the nature of the particular set of test examples, even if
the hypothesis accuracy is tested over an unbiased set of test instances
independent of the training examples, the measurement accuracy can still differ
from the true accuracy.

The anticipated variance increases as the number of test examples decreases.

Confidence Intervals for Discrete-Valued Hypotheses:

“How well does the sample error errorS(h) estimate the true error errorD(h)?” – in
the case of a discrete-valued hypothesis h.

To estimate the true error for a discrete-valued hypothesis h based on its


observed sample error over a sample S, where
 According to the probability distribution D, the sample S contains n samples
drawn independently of one another and of h.
 n >= 30
 Over these n instances, hypothesis h commits r errors, so errorS(h) = r/n

Under these circumstances, statistical theory permits us to state the following:


 If no additional information is available, the most likely value of errorD(h) is
errorS(h).
 With approximately 95% probability, the true error errorD(h) lies in the interval

errorS(h) ± 1.96 · sqrt( errorS(h) (1 − errorS(h)) / n )

A more precise rule of thumb is that the approximation described above works
well when n · errorS(h) · (1 − errorS(h)) ≥ 5.
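
A minimal worked example of this interval in plain Python (the error count r and sample size n are made-up numbers):

import math

n = 40            # number of independent test examples (n >= 30)
r = 12            # number of mistakes hypothesis h makes on the sample
error_s = r / n   # sample error errorS(h) = 0.30

# 95% confidence interval for the true error errorD(h)
half_width = 1.96 * math.sqrt(error_s * (1 - error_s) / n)
print(f"errorS(h) = {error_s:.2f}")
print(f"95% interval: [{error_s - half_width:.3f}, {error_s + half_width:.3f}]")
# here n * errorS(h) * (1 - errorS(h)) = 8.4 >= 5, so the approximation is reasonable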
Comparing Learning Algorithms
This final content section of the chapter focuses on the comparison of machine
learning algorithms.

This is different from comparing models (hypotheses) as comparing algorithms


involves training them and evaluating them potentially on multiple different
samples of data from the domain.

The comparison of two algorithms is motivated by estimating the expected or


mean difference between the two methods. A procedure is presented that uses
k-fold cross-validation where each algorithm is trained and evaluated on the
same splits of the data. A final mean difference in error is calculated, from
which a confidence interval can be estimated.

The calculation of the confidence interval is updated to account for the reduced
number of degrees of freedom as each algorithm is evaluated on the same test
set.

The paired Student’s t-test is introduced as a statistical hypothesis test for


quantifying the likelihood that two means belong to the same (or different)
distributions. This test can be used with the outlined procedure, but only if each
train and test set contain independent samples, a fact that is not the case with
default k-fold cross-validation.
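
This procedure can be sketched as follows, using scikit-learn and SciPy purely as an illustration (these libraries, the dataset, the two learning algorithms, and k = 10 folds are assumptions, not part of the text): both algorithms are trained and evaluated on the same splits, the per-fold differences in error are averaged, and a paired Student's t-test is applied to those differences.

from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
cv = KFold(n_splits=10, shuffle=True, random_state=0)   # the same splits for both algorithms

# per-fold error (1 - accuracy) for each learning algorithm
err_a = 1 - cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)
err_b = 1 - cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=cv)

print("mean difference in error:", (err_a - err_b).mean())

# paired Student's t-test on the per-fold errors
t_stat, p_value = stats.ttest_rel(err_a, err_b)
print("t =", t_stat, "p =", p_value)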

In particular, in this idealized method we modify the procedure of Table 5.5 so

that on each iteration through the loop it generates a new random training set Si
and a new random test set Ti by drawing from the underlying instance distribution
instead of drawing from the fixed sample D0.

The section ends by outlining practical considerations when comparing machine


learning algorithms.

Mitchell reminds us that the Student’s t-test does not technically apply in the
case where we use resampling methods. Nevertheless, he recommends using
k-fold cross-validation or random sampling in order to estimate the variance in
the estimate of model error, as they are the only methods available.

This is less than ideal, as the assumptions of the statistical test will be violated,
increasing the chance of Type I errors.
