
MACHINE LEARNING

UNIT-IV

Mr. Kasi Bandla


Asst. Professor
Department of ECM
SNIST
E-mail: [email protected]

1
CONTENTS

 Dimensionality Reduction
 Linear Discriminant Analysis
 Principal Component Analysis
 Factor Analysis
 Independent Component Analysis
 Locally Linear Embedding
 Isomap
 Least Squares Optimization
 Evolutionary Learning
 Genetic algorithms
 Genetic Offspring
 Genetic Operators
 Using Genetic Algorithms
 Reinforcement Learning
 Overview
 Getting Lost Example
2
DIMENSIONALITY REDUCTION
 Dimensionality reduction refers to techniques for reducing the number of
input variables in training data.

 Dimensionality reduction is a statistical technique for reducing the number of
random variables in a problem by obtaining a set of principal variables. The
process is divided into two components: feature selection and feature extraction.

 It reduces the time and storage space required, and it becomes easier to
visualize the data when it is reduced to very low dimensions such as 2D or 3D.
3
COMMON DIMENSIONALITY REDUCTION TECHNIQUES

 Missing Value Ratio
 Low Variance Filter
 High Correlation Filter
 Random Forest
 Backward Feature Elimination
 Forward Feature Selection
 Factor Analysis
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
4
ADVANTAGES OF DIMENSIONALITY REDUCTION

 It helps in data compression, and hence reduces storage space.
 It reduces computation time.
 It also helps remove redundant features.

Disadvantages
 Non-linear data is first mapped and transformed onto a higher-dimensional
space and then PCA is used to reduce the dimensions. One downside of this
approach is that it is computationally very expensive.

5
LINEAR DISCRIMINANT ANALYSIS (LDA)

 Linear discriminant analysis is a supervised classification method that is used
to create machine learning models. These models, based on dimensionality
reduction, are used in a variety of applications.

 LDA is primarily used here to reduce the number of features to a more
manageable number before classification.

 Discriminant analysis is a statistical method that is used by researchers to
help them understand the relationship between a "dependent variable" and
"independent variables".
6
LDA PROCEDURE
1. Compute the d-dimensional mean vectors for the classes.
2. Compute the scatter (covariance) matrices.
3. Compute the eigenvectors and corresponding eigenvalues of the scatter
matrices.
4. Sort the eigenvalues and choose the eigenvectors with the largest eigenvalues
to form a d×k projection matrix.
5. Transform the samples onto the new subspace.
Note: A scatter matrix is a pair-wise scatter plot of several variables presented
in a matrix format. It can be used to determine whether the variables are
correlated and whether the correlation is positive or negative.
7
 The principal insight of LDA is that the covariance matrix can tell us about
the scatter within a dataset, which is the amount of spread that there is within
the data.
 The way to find this scatter is to multiply the covariance of each class by
p_c, the probability of the class (that is, the number of data points in that class
divided by the total number). Adding these values for all of the classes gives
us a measure of the within-class scatter of the dataset:

8
9
10
11
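The scatter-matrix formulas and the projection step appeared as figures on the
preceding slides. As a rough illustration only, the sketch below (with made-up
data and labels) computes the class-probability-weighted within-class and
between-class scatter and projects onto the leading eigenvectors of S_W^{-1} S_B:

```python
import numpy as np

def lda_fit_transform(X, y, k):
    """Minimal LDA sketch: X is (N, d) data, y class labels, k output dims."""
    N, d = X.shape
    mean_all = X.mean(axis=0)
    S_W = np.zeros((d, d))   # within-class scatter
    S_B = np.zeros((d, d))   # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        p_c = len(Xc) / N                      # class probability
        S_W += p_c * np.cov(Xc, rowvar=False)  # weighted class covariance
        diff = (Xc.mean(axis=0) - mean_all).reshape(-1, 1)
        S_B += p_c * diff @ diff.T

    # Eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(evals.real)[::-1]
    W = evecs[:, order[:k]].real               # d x k projection matrix
    return X @ W                               # samples in the new subspace

# Hypothetical example: project 4-D data with 3 classes down to 2 dimensions
X = np.random.randn(90, 4)
y = np.repeat([0, 1, 2], 30)
Z = lda_fit_transform(X, y, 2)
```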
PRINCIPAL COMPONENTS ANALYSIS (PCA)

 Principal component analysis (PCA) is a standard tool in modern data
analysis, used in diverse fields from neuroscience to computer graphics. It is a
very useful method for extracting relevant information from confusing data sets.

 PCA is a statistical procedure that uses an orthogonal transformation to
convert a set of observations of possibly correlated variables into a set of
values of linearly uncorrelated variables called principal components. The
number of principal components is less than or equal to the number of
original variables.
12
THE PRINCIPAL COMPONENTS ANALYSIS ALGORITHM

13
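The algorithm itself is given as a figure in the original slides. A minimal NumPy
sketch of the usual covariance-and-eigendecomposition formulation (an assumed
form, not necessarily the exact figure) is:

```python
import numpy as np

def pca(X, k):
    """Project (N, d) data X onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)            # subtract the mean
    cov = np.cov(X_centered, rowvar=False)     # d x d covariance matrix
    evals, evecs = np.linalg.eigh(cov)         # eigh: covariance is symmetric
    order = np.argsort(evals)[::-1]            # sort by decreasing eigenvalue
    W = evecs[:, order[:k]]                    # d x k projection matrix
    return X_centered @ W, evals[order]

# Toy usage on random 5-D data
Z, variances = pca(np.random.randn(200, 5), k=2)
```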
PCA APPROACH

 Standardize the data.
 Perform Singular Value Decomposition to get the eigenvectors and
eigenvalues.
 Sort the eigenvalues in descending order and choose the k eigenvectors with
the largest eigenvalues.
 Construct the projection matrix from the selected k eigenvectors.
 Transform the original dataset via the projection matrix to obtain a
k-dimensional feature subspace.
14
15
16
Goals
 The main goal of PCA is to identify patterns in data.
 PCA aims to detect the correlation between variables.
 It attempts to reduce the dimensionality.

Transformation
This transformation is defined in such a way that the first principal component
has the largest possible variance and each succeeding component in turn has
the next highest possible variance.
17
LIMITATIONS OF PCA

 The results of PCA depend on the scaling of the variables.
 A scale-invariant form of PCA has been developed.

Applications
 Spike-triggered covariance analysis in neuroscience.
 Quantitative finance.
 Image compression.
 Facial recognition.
 Other applications such as finding correlations in medical data.
18
PCA RELATION WITH THE MULTI-LAYER PERCEPTRON

 The auto-associative MLP actually computes something very similar to the
principal components of the data in its hidden nodes, and this is one of the
ways that we can understand what the network is doing.

 Computing the principal components with a neural network isn't necessarily
a good idea.

 PCA is linear (it just rotates and translates the axes). It should therefore be
clear that the hidden nodes that are computing PCA are effectively a bit like a
Perceptron: they can only perform linear tasks.
19
 The predictor variables are multicollinear in nature, which is overcome by
using Principal Component Analysis (PCA); this results in a new set of
independent variables that are then used to predict the results with a
Multi-Layer Perceptron (MLP) model.

 To evaluate the prediction ability of the model, we compare the performance
of models using a common error measure. The empirical results reveal that
the proposed approach is a promising alternative in many fields.

20
KERNEL PCA
 PCA is a linear method. Kernel PCA uses a kernel function to project the
dataset into a higher-dimensional feature space, where it is linearly separable.
This is similar to the idea used in Support Vector Machines. There are various
kernels, such as linear, polynomial, sigmoid and Gaussian.
 In the field of multivariate statistics, kernel principal component analysis
(KPCA) is an extension of principal component analysis (PCA) using
techniques of kernel methods. Using a kernel, the originally linear operations
of PCA are performed in a reproducing kernel Hilbert space.
 Comparing the projections of the points onto the leading eigenvector (the new
coordinate), the KPCA component can follow a curve such as a circle while the
PCA component is a straight line, so KPCA captures more of the variance than
PCA.

21
22
THE KERNEL PCA ALGORITHM

23
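The kernel PCA algorithm on this slide is a figure; one convenient way to
experiment with the idea in practice is scikit-learn's KernelPCA. The dataset
and the gamma value below are placeholders chosen purely for illustration:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

linear = PCA(n_components=2).fit_transform(X)

# RBF (Gaussian) kernel PCA works in the implicit higher-dimensional space
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
nonlinear = kpca.fit_transform(X)
```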
FACTOR ANALYSIS
 Factor analysis is a technique that is used to reduce a large number of
variables into a smaller number of factors. This technique extracts the maximum
common variance from all variables and puts it into a common score.
 Factor analysis is a statistical data reduction and analysis technique that
strives to explain correlations among multiple outcomes as the result of one or
more underlying explanations, or factors. The technique involves data
reduction, as it attempts to represent a set of variables by a smaller number.
 The difference between factor analysis and principal component analysis:
factor analysis explicitly assumes the existence of latent factors underlying the
observed data, whereas PCA instead seeks to identify variables that are
composites of the observed variables.
24
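As a hedged illustration of factor analysis in practice (the data, number of
factors and noise level below are all made up), scikit-learn's FactorAnalysis
estimates a loading matrix and factor scores for each sample:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Toy data: 6 observed variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
loading = rng.normal(size=(2, 6))
X = factors @ loading + 0.3 * rng.normal(size=(500, 6))

fa = FactorAnalysis(n_components=2)
scores = fa.fit_transform(X)      # estimated factor scores for each sample
print(fa.components_.shape)       # (2, 6) estimated loading matrix
```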
INDEPENDENT COMPONENTS ANALYSIS (ICA)
 Independent Component Analysis (ICA) is a machine learning technique for
separating independent sources from a mixed signal or dataset. Unlike principal
component analysis, which focuses on maximizing the variance of the data
points, independent component analysis focuses on independence.

 Independent component analysis (ICA) is a statistical and computational
technique for revealing hidden factors that underlie sets of random variables,
measurements, or signals. ICA defines a generative model for the observed
multivariate data, which is typically given as a large database of samples.

 Both are statistical transformations.
 PCA uses information from second-order statistics only.
 ICA uses information that goes up to higher-order statistics.
25
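A small sketch of the difference in practice: two hand-made source signals are
mixed together, FastICA recovers statistically independent components, while
PCA only produces uncorrelated ones. The signals and mixing matrix here are
invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA, PCA

# Two independent sources mixed together (a toy 'cocktail party' problem)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]   # sine + square wave
mixing = np.array([[1.0, 0.5], [0.5, 2.0]])
X = sources @ mixing.T                                    # observed mixtures

ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(X)            # independent signal estimates
pcs = PCA(n_components=2).fit_transform(X)  # merely uncorrelated components
```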
LOCALLY LINEAR EMBEDDING
 Locally linear embedding (LLE) seeks a lower-dimensional projection of the
data which preserves distances within local neighborhoods. It can be thought
of as a series of local Principal Component Analyses which are globally
compared to find the best non-linear embedding.

 The Locally Linear Embedding algorithm is a typical manifold learning
algorithm. The main idea of LLE is to solve globally nonlinear problems using
locally linear fitting, based on the assumption that data lying on a nonlinear
manifold can be viewed as linear in local areas.

26
THE LOCALLY LINEAR EMBEDDING ALGORITHM

27
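The LLE algorithm on this slide is a figure; a hedged usage sketch with
scikit-learn's LocallyLinearEmbedding on the swissroll dataset discussed below
(the neighbourhood size of 12 is an arbitrary choice):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1500, random_state=0)

# Each point is reconstructed from its 12 nearest neighbours, and a 2-D
# embedding that preserves those local reconstruction weights is found.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X_2d = lle.fit_transform(X)
```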
The LLE algorithm produces a very interesting result on the iris dataset: it
separates the three groups into three points (Figure 6.12). This shows that the
algorithm works very well on this type of data, but doesn't give us any hints
as to what else it can do.

28
Figure 6.13 shows a common demonstration dataset for these algorithms.
Known as the swissroll for obvious reasons, it is tricky to find a 2D
representation of the 3D data because it is rolled up. The right of Figure 6.13
shows that LLE can successfully unroll it.

29
MULTI-DIMENSIONAL SCALING (MDS)

 Like PCA, MDS tries to find a linear approximation to the full data space
that embeds the data into a lower dimensionality.
 In the case of MDS the embedding tries to preserve the distances between all
pairs of points. It turns out that if the space is Euclidean, then the two methods
are identical.
 We use the same notational setup as previously, starting with data points
x_1, x_2, ..., x_N ∈ R^M. We choose a new dimensionality L < M and compute
the embedding so that the data points are z_1, z_2, ..., z_N ∈ R^L. As usual,
we need a cost function to minimize. There are lots of choices for MDS cost
functions, but the more common ones are:

30
31
32
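The cost functions themselves appear as figures in the original slides. As an
illustration, the standard classical-MDS recipe (double-centre the squared
distance matrix and keep the top eigenvectors) can be sketched as follows; the
data are random and purely illustrative:

```python
import numpy as np

def classical_mds(D, L=2):
    """Embed points given an (N, N) matrix of pairwise Euclidean distances D."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred squared distances
    evals, evecs = np.linalg.eigh(B)
    order = np.argsort(evals)[::-1][:L]        # keep the L largest eigenvalues
    return evecs[:, order] * np.sqrt(np.maximum(evals[order], 0))

# Toy usage: distances between random 5-D points, embedded in 2-D
X = np.random.randn(100, 5)
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Z = classical_mds(D, L=2)
```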
 This classical MDS algorithm works fine on flat manifolds (flat data spaces).

 However, we are interested in manifolds that are not flat, and this is handled
by Isomap. This algorithm has to construct the distance matrix for all pairs of
data points on the manifold, and so the distances can't be computed exactly.

 Isomap approximates them by assuming that the distances between pairs of
points that are close together are good. It builds up the distances between
points that are far away by finding paths that run through points that are close
together, i.e., that are neighbours, and then uses normal MDS on this distance
matrix:

33
34
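The Isomap algorithm on this slide is a figure; a hedged usage sketch with
scikit-learn's Isomap (the neighbourhood size and dataset are arbitrary choices):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1500, random_state=0)

# Geodesic distances are approximated by shortest paths through the
# neighbourhood graph, then classical MDS is applied to that distance matrix.
iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)
```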
EVOLUTIONARY LEARNING

 Evolutionary learning uses mechanisms inspired by biological evolution, such
as reproduction, mutation, recombination, and selection. Evolutionary
algorithms often perform well at approximating solutions to all types of
problems because they ideally do not make any assumption about the
underlying fitness landscape.
 The genetic algorithm models the genetic process that gives rise to evolution.
In particular, it models sexual reproduction, where both parents give some
genetic information to their offspring.
 The genetic algorithm shows many of the things that are best and worst about
machine learning: it is often, but not always, very effective; it has an array of
parameters that are crucial, but hard to set; and it is impossible to guarantee
that it will find a result that is any good at all.

35
Each adult in the mating pair passes one of their two chromosomes to their
offspring.

36
THE GENETIC ALGORITHM (GA)

 The Genetic Algorithm is a computational approximation to how evolution
performs search, which is by producing modifications of the parent genomes in
their offspring and thus producing new individuals with different fitness. The
computational procedure needs:

 a method for representing problems as chromosomes
 a way to calculate the fitness of a solution
 a selection method to choose parents
 a way to generate offspring by breeding the parents

37
STRING REPRESENTATION

 The first thing that we need is some way to represent the individual solutions,
in analogy to the chromosome.
 GAs use a string, with each element of the string (equivalent to the gene)
being chosen from some alphabet. The different values in the alphabet, which
is often just binary, are analogous to the alleles.
 For the problem we are trying to solve, we work out a way of encoding the
description of a solution as a string. We then create a set of random strings to
be our initial population.
38
EVALUATING FITNESS
 The fitness function can be seen as an oracle that takes a string as an
argument and returns a value for that string. Together with the string encoding,
the fitness function forms the problem-specific part of the GA.

 For the knapsack problem, we decided that we wanted to make the bag as full
as possible. So we would need to know the volume of each item that we want
to put into the knapsack, and then for a given string that says which things
should be taken, and which should not, we can compute the total volume. This
is then a possible fitness function, as in the sketch below.

39
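A minimal sketch of such a fitness function for the volume-only knapsack
described above; the item volumes and the capacity are made up for illustration:

```python
import numpy as np

volumes = np.array([2, 3, 4, 5, 7, 9])   # hypothetical item volumes
limit = 15                               # hypothetical knapsack capacity

def fitness(string):
    """string is a binary array: 1 = take the item, 0 = leave it."""
    total = np.sum(volumes * string)
    # An overfull bag is useless, so give it zero fitness
    return total if total <= limit else 0

print(fitness(np.array([1, 0, 1, 0, 1, 0])))   # volume 2 + 4 + 7 = 13
```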
POPULATION

 We can now measure the fitness of any string. The GA works on a population
of strings, with the first generation usually being created randomly. The fitness
of each string is then evaluated, and that first generation is bred together to
make a second generation, which is then used to generate a third, and so on.

 After the initial population is chosen randomly, the algorithm evolves to
produce each successive generation, with the hope being that there will be
progressively fitter individuals in the populations as the number of generations
increases.
40
GENERATING OFFSPRING: PARENT SELECTION

 For the current generation we need to select those strings that will be used to
generate new offspring. The idea here is that average fitness will improve if we
select strings that are already relatively fit compared to the other members of
the population (following natural selection), which is exploitation of our
current population.

 However, it is also good to allow some exploration, which means that we
have to allow some possibility of weak strings being considered.

 If strings are chosen proportionally to their fitness, so that fitter strings are
more likely to be chosen to enter the 'mating pool', then this allows for both
options.
41
42
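The selection scheme on the accompanying slide is a figure; a hedged sketch of
fitness-proportional ('roulette wheel') selection, which implements the idea just
described, might look like this:

```python
import numpy as np

def select_parents(population, fitnesses, n_pairs, rng=np.random.default_rng()):
    """Pick pairs of parents with probability proportional to their fitness."""
    probs = np.asarray(fitnesses, dtype=float)
    probs = probs / probs.sum()                       # normalise to probabilities
    idx = rng.choice(len(population), size=(n_pairs, 2), p=probs)
    return [(population[i], population[j]) for i, j in idx]
```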
GENERATING OFFSPRING: GENETIC OPERATORS

 Crossover
 Crossover is the operator that performs global exploration, since the strings
that are produced are radically different to both parents in at least some places.
The hope is that sometimes we will take good parts of both solutions and put
them together to make an even better solution. The different forms of the
crossover operator are:

 (a) Single point crossover. A position in the string is chosen at random, and
the offspring is made up of the first part of parent 1 and the second part of
parent 2.
 (b) Multi-point crossover. Multiple points are chosen, with the offspring
being made in the same way.
 (c) Uniform crossover. Random numbers are used to select which parent each
element of the string is taken from.
43
 Mutation
 The other genetic operator is mutation, which effectively performs local
random search. The value of any element of the string can be changed,
governed by some (usually low) probability p. For our binary alphabet in the
knapsack problem, mutation causes a bit-flip, as is shown in the Figure. A
sketch of both operators follows.
44
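A compact sketch of the crossover and mutation operators just described, for
binary strings (the mutation probability below is an arbitrary example value):

```python
import numpy as np

rng = np.random.default_rng(0)

def single_point_crossover(p1, p2):
    """Offspring = first part of parent 1 + second part of parent 2."""
    point = rng.integers(1, len(p1))
    return np.concatenate([p1[:point], p2[point:]])

def uniform_crossover(p1, p2):
    """Each position is taken from a randomly chosen parent."""
    mask = rng.random(len(p1)) < 0.5
    return np.where(mask, p1, p2)

def mutate(string, p=0.01):
    """Flip each bit of a binary string with (low) probability p."""
    flips = rng.random(len(string)) < p
    return np.where(flips, 1 - string, string)
```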
 Elitism, Tournaments, and Niching
 Elitism takes some number of the fittest strings from one generation and puts
them directly into the next population, replacing strings that are already there
either at random, or by choosing the least fit to replace.
 While elitism and tournaments both ensure that good solutions aren't lost,
they both have the problem that they can encourage premature convergence,
where the algorithm settles down to a constant population that never changes
even though it hasn't found an optimum.
45
The randomness in the GA is a very large part of why it works, and schemes to
reduce that randomness often harm the overall results.
 One way to solve the problem of premature convergence is through niching
(also known as using island populations), where the population is separated
into several subpopulations, which all evolve independently for some period of
time, so that they are likely to have converged to different local maxima, and a
few members of one subpopulation are occasionally injected as 'immigrants'
into another subpopulation.

 Another approach is known as fitness sharing, where the fitness of a particular
string is averaged across the number of times that string appears in the
population. This biases the fitness function towards uncommon strings, but it
can also mean that good strings are penalised simply because they are common
in the population.
46
47
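Putting the pieces together, a minimal and purely illustrative GA loop for
binary strings (not a definitive implementation) using fitness-proportional
selection, single-point crossover, bit-flip mutation and simple elitism:

```python
import numpy as np

def genetic_algorithm(fitness, string_len, pop_size=50, generations=100,
                      p_mut=0.01, rng=np.random.default_rng(0)):
    pop = rng.integers(0, 2, size=(pop_size, string_len))   # random first generation
    for _ in range(generations):
        fit = np.array([fitness(s) for s in pop], dtype=float)
        probs = fit / fit.sum() if fit.sum() > 0 else np.full(pop_size, 1 / pop_size)
        new_pop = [pop[np.argmax(fit)]]                      # elitism: keep the best
        while len(new_pop) < pop_size:
            i, j = rng.choice(pop_size, size=2, p=probs)     # fitness-proportional parents
            point = rng.integers(1, string_len)              # single-point crossover
            child = np.concatenate([pop[i][:point], pop[j][point:]])
            flips = rng.random(string_len) < p_mut           # bit-flip mutation
            child = np.where(flips, 1 - child, child)
            new_pop.append(child)
        pop = np.array(new_pop)
    fit = np.array([fitness(s) for s in pop])
    return pop[np.argmax(fit)]                               # best string found
```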
USING GENETIC ALGORITHMS

Map Colouring
Graph colouring is a typical discrete optimisation problem. We want to colour
a graph using only k colours, and choose them in such a way that adjacent
regions have different colours. It has been mathematically proven that any
two-dimensional planar graph can be coloured with four colours; this was the
first ever proof that used a computer program to check the cases.
 Encode possible solutions as strings. For this problem, we'll choose our
alphabet to consist of the three possible shades (black (b), dark (d), and
light (l)).
 Choose a suitable fitness function. The thing that we want to minimise (a
cost function) is the number of times that two adjacent regions have the same
colour, as in the sketch below.
 Choose suitable genetic operators. We'll use the standard genetic operators
for this, since this example makes the operations of crossover and mutation
clear.
48
49
50
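A hedged sketch of the map-colouring cost function described above; the
adjacency list of regions is invented, and each gene is one character from the
alphabet {b, d, l}:

```python
# Hypothetical adjacency list: pairs of regions that share a border
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]

def cost(colouring):
    """colouring is a string over {'b', 'd', 'l'}, one gene per region."""
    return sum(colouring[i] == colouring[j] for i, j in edges)

print(cost("bdlbd"))   # number of clashing borders (0 is a perfect colouring)
```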
PUNCTUATED EQUILIBRIUM
The argument runs that if humans evolved from apes, then there should be
some evidence of a whole set of intermediary species that existed during the
transition phase, and there aren’t. Interestingly, GAs demonstrate one of the
explanations why this is not correct, which is that the way that evolution
actually seems to work is known as punctuated equilibrium.

51
EXAMPLES
 The Knapsack Problem
The knapsack problem states that: given a set of items, each with a mass and a
value, determine the number of each item to include in a collection so that the
total weight is less than or equal to a given limit and the total value is as large
as possible.
The Genetic Algorithm provides a way to solve the knapsack problem in linear
time complexity. The attribute reduction technique which incorporates Rough
Set Theory finds the important genes, hence reducing the search space and
ensuring that the effective information will not be lost.

Genetic Algorithms prove to be a very effective approach for obtaining
solutions to problems traditionally thought of as computationally infeasible,
such as the Knapsack problem.

52
53
EXAMPLE 2: THE FOUR PEAKS PROBLEM
The four peaks problem is a toy problem that is quite often used to test out
GAs and various developments of them. It is an invented fitness function that
rewards strings with lots of consecutive 0s at the start of the string, and lots of
consecutive 1s at the end. The fitness consists of counting the number of 0s at
the start and the number of 1s at the end, and returning the maximum of them
as the fitness, as sketched below.

54
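A sketch of the counting part of the four peaks fitness just described (the full
benchmark usually also adds a bonus when both counts exceed a threshold,
which is omitted here):

```python
def four_peaks(string):
    """string is a list/array of 0s and 1s."""
    leading_zeros = 0
    for bit in string:                 # count consecutive 0s at the start
        if bit != 0:
            break
        leading_zeros += 1
    trailing_ones = 0
    for bit in reversed(string):       # count consecutive 1s at the end
        if bit != 1:
            break
        trailing_ones += 1
    return max(leading_zeros, trailing_ones)

print(four_peaks([0, 0, 0, 1, 0, 1, 1]))   # 3 leading 0s, 2 trailing 1s -> 3
```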
LIMITATIONS OF GA

 A significant one is that they can be very slow.
 The main problem is that once a local maximum has been reached, it can
often be a long time before a string is produced that escapes from the local
maximum and finds another, higher, maximum.
 A more basic criticism of genetic algorithms is that it is very hard (read:
basically impossible) to analyse the behaviour of the GA.
 We cannot guarantee that the algorithm will converge at all, and certainly not
to the optimal solution.

55
TRAINING NEURAL NETWORKS WITH GENETIC ALGORITHMS
 We trained our neural networks, most notably the MLP, using gradient
descent. We could instead encode the problem of finding the correct weights as
a set of strings, with the fitness function measuring the sum-of-squares error.
This has been done, and with good reported results. However, there are some
problems with this approach.

Problems:
 The first is that we turn all the local information from the targets about the
error at each output node of the network into just one number, the fitness,
which is throwing away useful information; the second is that we are ignoring
the gradient information, which is also throwing away useful information.

56
SOLUTION

 A better way to combine GAs with neural networks is to use the GA to
choose the topology of the network. Previously, we chose the structure in a
completely ad hoc way by trying out different structures and choosing the one
that worked best.

 We can use a GA for this problem, although the crossover operator doesn't
make a great deal of sense, so we just consider mutation. However, we allow
for four different types of mutation: delete a neuron, delete a weight
connection, add a neuron, add a connection.

57
REINFORCEMENT LEARNING
 Reinforcement learning fills the gap between supervised
learning, where the algorithm is trained on the correct answers
given in the target data, and unsupervised learning, where the
algorithm can only exploit similarities in the data to cluster it.
 Reinforcement learning is usually described in terms of the
interaction between some agent and its environment. The agent is
the thing that is learning, and the environment is where it is
learning, and what it is learning about. The environment has
another task, which is to provide information about how good a
strategy is, through some reward function.
 The importance of reinforcement learning for psychological
learning theory comes from the concept of trial-and-error learning,
which has been around for a long time, and is also known as the
Law of Effect.
58
A robot perceives the current state of its environment through its sensors, and
performs actions by moving its motors. The reinforcement learner (agent)
within the robot tries to predict the next state and reward.

59
 Reinforcement learning maps states or situations to actions in order
to maximise some numerical reward. That is, the algorithm knows
about the current input (the state), and the possible things it can do
(the actions), and its aim is to maximise the reward. There is a clear
distinction drawn between the agent that is doing the learning and
the environment, which is where the agent acts, and which produces
the state and the rewards.

 The most common way to think about reinforcement learning is on a


robot. The current sensor readings of the robot, or processed versions
of them, could define the state. They are a representation of the
environment around the robot in some way. Note that the state
doesn’t necessarily tell us everything that it would be useful to know,
and there can be noise and inaccuracies in the state data.

 The possible ways that the robot can drive its motors are the actions, which
move the robot in the environment, and the reward could be how well it does
its task without crashing into things.
60
The reinforcement learning cycle: the learning agent performs action a_t in
state s_t and receives reward r_{t+1} from the environment, ending up in state
s_{t+1}.

61
EXAMPLE: GETTING LOST
You arrive in a foreign city exhausted after many hours of flying, catch the
train into town and stagger into a backpacker's hostel without noticing much of
your surroundings. When you wake up it is dark and you are starving. You
remember that you only walked through the old part of the city, so you don't
need to worry about any street that takes you out of the old part. So at the next
bus stop you come to, you have a proper look at the map, and note down the
map of the old town squares, which turns out to look like the Figure.

62
You decide that the backpacker's is almost definitely in the square labelled F
on the map, because its name seems vaguely familiar. You decide to work out
a reward structure so that you can follow a reinforcement learning algorithm to
get to the backpacker's. The first thing you work out is that staying still means
that you are sleeping on your feet, which is bad, so you assign a reward of −5
for that (while negative reinforcement can be viewed as punishment, it doesn't
necessarily correspond clearly, but you might want to imagine it as pinching
yourself so that you stay awake).

63
The state diagram if you are correct and the backpacker's is in square (state) F.
The connections from each state back into itself (meaning that you don't move)
are not shown, to avoid the figure getting too complicated. They are each
worth −5 (except for staying in state F, which means that you are in the
backpacker's).

64
THE FOLLOWING THINGS ARE DISCUSSED IN REINFORCEMENT LEARNING

 State and Action Spaces
Our reinforcement learner is basically a search algorithm, and obviously the
larger the number of states that the algorithm has to search through, the longer
it will take to find a good solution. The set of all states that are possible for the
learner to experience is known as the state space. There is a corresponding
action space that contains all of the possible actions.
 Carrots and Sticks: The Reward Function
The basic idea of the learner is that it will choose the action that gets the
maximum expected reward. The reward function takes the current state and the
chosen action and produces a numerical reward based on them.
 Discounting
The solution to this problem is known as discounting, and means that we take
into account how certain we can be about things that happen in the future:
there is lots of uncertainty in the learning anyway, so we should discount our
predictions of rewards in the future according to how much chance there is
that they are wrong.
65
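Discounting is usually written with a discount factor γ between 0 and 1, so that
(assuming the usual notation, with r_{t+1} the reward after time step t) the
discounted future reward is:

```latex
R_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1
```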
Action Selection
At each stage of the reinforcement learning process, the algorithm looks
at the actions that can be performed in the current state and computes the
value of each action. Based on the current average reward predictions,
there are three methods of choosing action a that are worth thinking about
for reinforcement learning. We’ve seen the first and third of them before:

66
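The three methods themselves are listed in the accompanying figure; they are
presumably the usual greedy, ε-greedy and soft-max choices. A hedged sketch of
the latter two, acting on a row of current value estimates Q_row:

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q_row, epsilon=0.1):
    """Mostly exploit the best-valued action, explore with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(Q_row)))   # random (exploratory) action
    return int(np.argmax(Q_row))               # greedy action

def soft_max(Q_row, tau=1.0):
    """Choose actions with probability proportional to exp(Q / tau)."""
    prefs = np.exp((Q_row - np.max(Q_row)) / tau)   # subtract max for stability
    return int(rng.choice(len(Q_row), p=prefs / prefs.sum()))
```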
POLICY

 We have just considered different action selection methods, such as ε-greedy
and soft-max. The aim of the action selection is to trade off exploration and
exploitation in such a way as to maximize the expected reward into the future.

 Instead, we can make an explicit decision that we are always going to take the
optimal choice at each stage, and not do exploration any more. This choice of
which action to take in each state in order to get optimal results is known as
the policy, π.

67
MARKOV DECISION PROCESSES
The Markov Property
A simple example of a Markov decision process is to decide on the state of
your mind tomorrow given your state of mind today.

A reinforcement learning problem that follows the Markov property is known
as a Markov Decision Process (MDP). It means that we can compute the likely
next reward, and what the next state will be, from only the current state and
action, based on previous experience.
68
PROBABILITIES IN MARKOV DECISION PROCESSES

 We have now reduced our reinforcement learning problem to


learning about Markov Decision Processes.
 We will only talk about the case where the number of possible
states and actions is finite, because reasoning about the infinite
case makes your head hurt. There is a very simple example of an
MDP, showing predictions for your state-of-mind while preparing
for an exam, together with the (transition probabilities) for moving
between each pair of states shown. This is known as a Markov
chain.

69
PROBABILITIES IN MARKOV DECISION PROCESSES

 There are three actions that can be taken in state E (shown by the
black circles), with associated probabilities and expected rewards.
Learning and using this transition diagram can be seen as the aim
of any reinforcement learner.

 The Markov Decision Process formalism is a powerful one that can deal with
additional uncertainties. For example, it can be extended to deal with the case
where the true states are not known, only an observation of the state can be
made, which is probabilistically related to the state, and possibly the action.
These are known as partially observable Markov Decision Processes
(POMDPs), and they are related to Hidden Markov Models.

70
VALUES
 The reinforcement learner is trying to decide on what action to take
in order to maximize the expected reward into the future. This
expected reward is known as the value. There are two ways that we
can compute a value.
 We can consider the current state, and average across all of the
actions that can be taken, leaving the policy to sort this out for itself
(the state-value function, V (s)), or we can consider the current
state and each possible action that can be taken separately, the
action-value function, Q(s, a). In either case we are thinking about
what the expected reward would be if we started in state s (where
E(·) is the statistical expectation):

71
72
73
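The value-function formulas on the preceding slides are figures; the standard
definitions they correspond to (assuming policy π and discount factor γ, with
E denoting the expectation) are:

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right],
\qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s,\ a_t = a \right]
```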
BACK ON HOLIDAY: USING REINFORCEMENT LEARNING

The transition structure of the map is shown in the Figure, and can be written
out as a matrix (where 1 means that there is a link, and 0 means that there is
not):

74
THE DIFFERENCE BETWEEN SARSA AND Q-LEARNING

The most important difference between the two is how Q is updated after each
action. SARSA uses the Q value of the next action A', which is drawn from the
ε-greedy policy it is actually following. In contrast, Q-learning uses the
maximum Q value over all possible actions for the next step.
 Both algorithms will start out with no information about the environment, and
will therefore explore randomly, using the ε-greedy policy. However, over
time, the strategies that the two algorithms produce are quite different.
 The main reason for the difference is that Q-learning always attempts to
follow the optimal path, which is the shortest one. This takes it close to the
cliff, and the ε-greedy part means that inevitably it will sometimes fall over.
By way of contrast, the SARSA algorithm will converge to a much safer route
that keeps it well away from the cliff, even though it takes longer.
75
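A hedged sketch of the two update rules for a tabular Q (a NumPy array indexed
by state and action); only the bootstrapping line differs, exactly as described
above. The learning rate and discount values are placeholders:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action in the next state."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually chosen (e.g. epsilon-greedily)."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```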
USES OF REINFORCEMENT LEARNING

 Reinforcement learning has been used successfully for many problems, and the
results of computer modeling of reinforcement learning have been of great
interest to psychologists, as well as computer scientists, because of the close
links to biological learning.
 Reinforcement learning has been used in other robotic applications, including
robots learning to follow each other, travel towards bright lights, and even
navigate.
 In general, reinforcement learning is fairly slow, because it has to build up all of
the information through exploration and exploitation in order to find the better
solutions.
 It is also very dependent upon a carefully chosen reward function: get that
wrong and the algorithm will do something completely unexpected.
 A famous example of reinforcement learning was TD-Gammon, which was
produced by Gerald Tesauro. His idea was that reinforcement learning should be
very good at learning to play games, because games were clearly episodic—you
played until somebody won—and there was a clear reward structure, with a
positive reward for winning.
76
ASSIGNMENT QUESTIONS

1. What is Reinforcement Learning?


2. Explain the Q function and Q Learning Algorithm.
3. Describe the K-nearest Neighbor learning algorithm for a
continuous-valued target function.
4. Discuss the major drawbacks of the K-nearest Neighbor learning
algorithm and how they can be corrected.
5. Define the following terms with respect to K - Nearest
Neighbor Learning:
i) Regression ii) Residual iii) Kernel Function.
6. Explain Q learning algorithm assuming deterministic
rewards and actions?
8. Explain Locally Weighted Linear Regression.
9. Explain High Dimensional Spaces in machine learning.

77
10. What is The Curse of Dimensionality?
11. How is dimensionality reduction performed with latent variables?
12. Write algorithm for Principal Component Analysis.
13. Explain about Probabilistic PCA.
14. Differentiate between Probabilistic PCA and Independent Components
Analysis.
15. What is Factor analysis?
16. (i)Describe in detail about Linear Discriminants. (ii)Discuss:
Generalizing the Linear Model and Geometry of the Linear
Discriminant.
17. Point out why dimensionality reduction is useful?
18. Define Factor Analysis or latent variables.
19. Distinguish between within-class scatter and between-class scatter.
78
20. Define PCA.
21. Describe what Isomap is.
22. Discover Locally Linear Embedding algorithm with k=12.
23. Explain the three different ways to do dimensionality reduction.
24. Explain what Least Squares Optimization is.
25. Differentiate between the action space and the state space.
26. What is Punctuated Equilibrium?
27. How does a reinforcement learner map its experience to the corresponding action?
28. Express the basic tasks that need to be performed for GA.
29. Identify how reinforcement learning maps states to actions.
30. Examine Genetic Programming.
31. Differentiate Sarsa and Q-learning.
32. Explain Least Squares Optimization.
33. (i)Describe in detail about Generating Offspring Genetic Operators.
(ii)Discuss the Basic Genetic Algorithm.
79
