3rd Unit Last 5 Answer AIML
b0 + b1*x1 + b2*x2 = 0
Where b0, b1 and b2 are the coefficients of the line that control the
intercept and slope, and x1 and x2 are two input variables.
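As a rough sketch of how these coefficients define a decision boundary, the snippet below classifies a point by the sign of b0 + b1*x1 + b2*x2; the coefficient values are made up purely for illustration.

# Minimal sketch: classify a point by which side of the line it falls on.
# The coefficient values are hypothetical, chosen only for illustration.
b0, b1, b2 = -1.0, 0.8, 0.5

def classify(x1, x2):
    score = b0 + b1 * x1 + b2 * x2
    return 1 if score >= 0 else 0   # the sign of the score decides the class

print(classify(2.0, 1.0))   # 1.1 >= 0, so class 1
print(classify(0.0, 0.5))   # -0.75 < 0, so class 0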
Logistic Regression
Linear Discriminant Analysis
Perceptron
Naive Bayes
Simple Neural Networks
Benefits of Parametric Machine Learning Algorithms:
Simpler: these methods are easier to understand and to interpret.
Speed: parametric models are very fast to learn from data.
Less Data: they do not require as much training data and can work well even if the fit to the data is not perfect.
If you have more than two classes then Linear Discriminant Analysis is the
preferred linear classification technique.
In this post you will discover the Linear Discriminant Analysis (LDA)
algorithm for classification predictive modeling problems. After reading this
post you will know:
The limitations of logistic regression and the need for linear discriminant analysis.
The representation of the model that is learned from data and can be saved to
file.
How the model is estimated from your data.
How to make predictions from a learned LDA model.
How to prepare your data to get the most from the LDA model.
This post is intended for developers interested in applied machine learning,
how the models work and how to use them well. As such no background in
statistics or linear algebra is required, although it does help if you know
about the mean and variance of a distribution.
LDA is a simple model in both preparation and application. There are some
interesting statistics behind how the model is set up and how the prediction
equation is derived, but they are not covered in this post.
These statistical properties are estimated from your data and plugged into the
LDA equation to make predictions. These are the model values that you
would save to file for your model.
The mean (mu) value of each input (x) for each class (k) can be estimated
in the normal way by dividing the sum of values by the total number of
values:
muk = 1/nk * sum(x)
Where muk is the mean value of x for the class k and nk is the number of
instances with class k. The variance is calculated across all classes as the
average squared difference of each value from the mean:
sigma^2 = 1/(n-K) * sum((x - mu)^2)
Where sigma^2 is the variance across all inputs (x), n is the number of
instances, K is the number of classes and mu is the mean for input x.
Where PIk refers to the base probability of each class (k) observed in your
training data (e.g. 0.5 for a 50-50 split in a two class problem). In Bayes’
Theorem this is called the prior probability.
PIk = nk/n
The class with the largest discriminant value is the output classification:
Dk(x) = x * (muk / sigma^2) - (muk^2 / (2 * sigma^2)) + ln(PIk)
Dk(x) is the discriminant function for class k given input x; muk,
sigma^2 and PIk are all estimated from your data.
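As a rough sketch of how these estimates combine, the snippet below estimates muk, sigma^2 and PIk from a tiny made-up one-input dataset and scores a new point with the discriminant function above (the data and variable names are purely illustrative).

import numpy as np

# Tiny made-up dataset: one input x and a binary class label (illustrative only)
x = np.array([1.0, 1.2, 0.8, 3.0, 3.2, 2.9])
y = np.array([0,   0,   0,   1,   1,   1])

classes = np.unique(y)
n, K = len(x), len(classes)

mu = {k: x[y == k].mean() for k in classes}        # muk per class
prior = {k: (y == k).mean() for k in classes}      # PIk = nk / n
# pooled variance across classes: sigma^2 = 1/(n-K) * sum((x - muk)^2)
sigma2 = sum(((x[y == k] - mu[k]) ** 2).sum() for k in classes) / (n - K)

def discriminant(x_new, k):
    # Dk(x) = x * (muk / sigma^2) - muk^2 / (2 * sigma^2) + ln(PIk)
    return x_new * mu[k] / sigma2 - mu[k] ** 2 / (2 * sigma2) + np.log(prior[k])

x_new = 2.8
scores = {k: discriminant(x_new, k) for k in classes}
print(max(scores, key=scores.get))   # predicted class: the largest Dk(x)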
Classification Problems. This might go without saying, but LDA is intended for
classification problems where the output variable is categorical. LDA supports
both binary and multi-class classification.
Gaussian Distribution. The standard implementation of the model assumes a
Gaussian distribution of the input variables. Consider reviewing the univariate
distributions of each attribute and using transforms to make them more
Gaussian-looking (e.g. log and root for exponential distributions and Box-Cox for
skewed distributions).
Remove Outliers. Consider removing outliers from your data. These can skew the
basic statistics used to separate classes in LDA, such as the mean and the
standard deviation.
Same Variance. LDA assumes that each input variable has the same variance. It is
almost always a good idea to standardize your data before using LDA so that it
has a mean of 0 and a standard deviation of 1.
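As a sketch of this preparation step (assuming scikit-learn is available; the synthetic data and pipeline here are only illustrative), the inputs can be standardized before fitting LDA:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data purely for demonstration
X, y = make_classification(n_samples=200, n_features=5, n_informative=3, random_state=1)

# Standardize each input to mean 0 and standard deviation 1, then fit LDA
model = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis())
model.fit(X, y)
print(model.predict(X[:5]))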
Extensions to LDA
Linear Discriminant Analysis is a simple and effective method for
classification. Because it is simple and so well understood, there are many
extensions and variations to the method. Some popular extensions include:
Quadratic Discriminant Analysis (QDA): Each class uses its own estimate of
variance (or covariance when there are multiple input variables).
Flexible Discriminant Analysis (FDA): Where non-linear combinations of inputs are
used, such as splines.
Regularized Discriminant Analysis (RDA): Introduces regularization into the
estimate of the variance (actually covariance), moderating the influence of
different variables on LDA.
The original development was called the Linear Discriminant or Fisher’s
Discriminant Analysis. The multi-class version was referred to as Multiple
Discriminant Analysis. These are all simply referred to as Linear
Discriminant Analysis now.
Gradient Descent
The shortest path from a starting point (the peak) to the optimum (the
valley) is along the gradient trajectory. The same principle applies in
multi-dimensional space, which is generally the case for machine learning
training.
To demonstrate how gradient descent is applied in machine
learning training, we’ll use logistic regression.
Binary Case
The model parameters are chosen to maximize the likelihood of the training labels:
L = Π pi^yi * (1 - pi)^(1 - yi)
Where Π is a product operator over the training examples, yi is the 0/1 label of example i and pi is the predicted probability for example i.
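A minimal sketch of gradient descent on this binary logistic regression objective (the toy data, learning rate and iteration count are arbitrary choices for illustration):

import numpy as np

# Toy one-feature binary dataset (illustrative only)
X = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])

w, b = 0.0, 0.0     # slope and intercept to be learned
lr = 0.1            # learning rate (arbitrary)

for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))   # sigmoid predictions
    # Gradient of the average negative log-likelihood (cross-entropy)
    grad_w = np.mean((p - y) * X)
    grad_b = np.mean(p - y)
    # Step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # parameters of the fitted decision boundary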
Multiclass Case
Backpropagation:
Okay, fine, we have selected some weight values in the beginning, but our
model output is way different from our actual output, i.e. the error value is
huge.
Calculate the error – How far is your model output from the actual
output.
Minimum Error – Check whether the error is minimized or not.
Update the parameters – If the error is large, update the
parameters (weights and biases). After that, check the error again.
Repeat the process until the error becomes minimum.
Model is ready to make a prediction – Once the error becomes
minimum, you can feed some inputs to your model and it will produce
the output.
I am pretty sure that by now you know why we need Backpropagation and
what it means to train a model.
What is Backpropagation?
The Backpropagation algorithm looks for the minimum value of the error
function in weight space using a technique called the delta rule or gradient
descent. The weights that minimize the error function are then considered to
be a solution to the learning problem.
Consider a simple model whose output is the input multiplied by a weight W. With an initial weight of W = 3:

Input   Desired Output   Model output (W=3)
0       0                0
1       2                3
2       4                6
Notice the difference between the actual output and the desired output:
Input   Desired Output   Model output (W=3)   Absolute Error   Square Error   Model output (W=2)   Square Error (W=2)
0       0                0                    0                0              0                    0
1       2                3                    1                1              2                    0
2       4                6                    2                4              4                    0
Now, what did we do here? We are trying to find the value of the weight such
that the error becomes minimum. Basically, we need to figure out whether we
should increase or decrease the weight value. Once we know that, we keep
updating the weight value in that direction until the error becomes minimum.
You might reach a point where, if you update the weight further, the error
will increase. At that time you need to stop, and that is your final weight
value.
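A minimal numeric sketch of this procedure, using the same toy data as the tables above (the starting weight matches the table; the learning rate is an arbitrary choice):

inputs = [0, 1, 2]
desired = [0, 2, 4]

W = 3.0        # initial weight, as in the table above
lr = 0.1       # learning rate (arbitrary)

for step in range(50):
    # Mean squared error and its gradient with respect to W
    error = sum((W * x - d) ** 2 for x, d in zip(inputs, desired)) / len(inputs)
    grad = sum(2 * (W * x - d) * x for x, d in zip(inputs, desired)) / len(inputs)
    W -= lr * grad            # move W in the direction that reduces the error

print(W)   # converges towards 2.0, where the error is minimum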
As a concrete example, consider a small neural network with:
two inputs
two hidden neurons
two output neurons
two biases
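A rough sketch of one forward and backward pass through such a network (the weights, inputs, targets and learning rate below are made-up example values, not taken from these notes):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up example values: two inputs, two hidden neurons, two outputs, two biases
x = np.array([0.05, 0.10])          # inputs
t = np.array([0.01, 0.99])          # desired outputs
W1 = np.array([[0.15, 0.20],        # input -> hidden weights
               [0.25, 0.30]])
W2 = np.array([[0.40, 0.45],        # hidden -> output weights
               [0.50, 0.55]])
b1, b2 = 0.35, 0.60                 # one shared bias per layer
lr = 0.5                            # learning rate (arbitrary)

# Forward pass
h = sigmoid(W1 @ x + b1)            # hidden activations
o = sigmoid(W2 @ h + b2)            # output activations
error = 0.5 * np.sum((t - o) ** 2)  # squared error

# Backward pass (delta rule): propagate the error back through the network
delta_o = (o - t) * o * (1 - o)             # output-layer deltas
delta_h = (W2.T @ delta_o) * h * (1 - h)    # hidden-layer deltas

# Gradient-descent updates of weights and biases
W2 -= lr * np.outer(delta_o, h)
W1 -= lr * np.outer(delta_h, x)
b2 -= lr * np.sum(delta_o)
b1 -= lr * np.sum(delta_h)

print(error)   # the error shrinks as these updates are repeated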