Lecture 1: Introduction to Machine Learning
Evaluation: Assignment 1 (please submit the report for Assignment 1 along with the
pseudo-code)
Textbooks:
T1. Simon Haykin, “Neural Networks – A Comprehensive Foundation”, Pearson Education, 1999.
T2. H. J. Zimmermann, “Fuzzy Set Theory and its Applications”, 3rd Edition, Kluwer Academic, 1996.
❑ Systems that use this method can considerably improve learning accuracy.
This is a natural algorithm that repeatedly takes a step in the direction of steepest decrease
of the cost function J. For the squared-error cost $J(w) = \frac{1}{2}\sum_{i=1}^{m}\left(h_w(x^{(i)}) - y^{(i)}\right)^2$, the partial derivative is evaluated as

$\frac{\partial J(w)}{\partial w_j} = \sum_{i=1}^{m}\left(h_w(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

where j indexes the attributes or features (j = 0, 1, 2, …, n) and the training instances run over
i = 1, 2, …, m. Because this method looks at every example in the entire training set on every
step, it is called batch gradient descent.
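As a concrete illustration, here is a minimal NumPy sketch of batch gradient descent for linear regression. The learning rate, iteration count, and toy data are illustrative assumptions, not values from the lecture.

import numpy as np

def batch_gradient_descent(X, y, alpha=0.5, n_iters=2000):
    """Batch gradient descent for linear regression (LMS).
    X is (m, n+1) with a leading column of ones for the intercept w0.
    Every step uses the gradient summed over the entire training set."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iters):
        error = X @ w - y                    # h_w(x^(i)) - y^(i) for all i
        w -= alpha * (X.T @ error) / m       # step opposite the gradient of J
    return w

# Toy data (illustrative): y = 1 + 2x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(50)
print(batch_gradient_descent(X, y))          # roughly [1., 2.]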
Stochastic gradient descent
❑ In this algorithm, we repeatedly run through the training set, and each time we
encounter a training example, we update the parameters according to the
gradient of the error with respect to that single training example only, i.e.,
$w_j := w_j - \alpha\left(h_w(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$. This
algorithm is called stochastic gradient descent (also incremental gradient
descent); a sketch follows below.
❑ Often, stochastic gradient descent gets w “close” to the minimum much faster
than batch gradient descent.
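A minimal sketch of the stochastic variant, under the same illustrative assumptions as the batch sketch above (the learning rate and epoch count are arbitrary choices):

import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, n_epochs=20):
    """Stochastic (incremental) gradient descent for linear regression:
    the weights are updated after every single training example, rather
    than after a full pass over the set as in batch gradient descent."""
    m, n = X.shape
    w = np.zeros(n)
    rng = np.random.default_rng(0)
    for _ in range(n_epochs):
        for i in rng.permutation(m):         # shuffle each pass through the set
            error = X[i] @ w - y[i]          # error on this one example only
            w -= alpha * error * X[i]        # LMS update from a single instance
    return w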
Regularized Linear Regression (Ridge Regression)
❑ Regularization helps to deal with the bias-variance problem in model
development. When small changes are made to the data, such as switching from
the training set to the testing set, the parameter estimates can change wildly;
regularization often smooths this problem out substantially. Ridge regression
penalizes large weights by adding an L2 term to the squared-error cost:
$J(w) = \frac{1}{2}\sum_{i=1}^{m}\left(h_w(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n} w_j^2$.
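A minimal closed-form sketch of ridge regression; leaving the intercept unpenalized and the default value of λ are assumptions on my part, not prescriptions from the lecture:

import numpy as np

def ridge_regression(X, y, lam=1.0):
    """Closed-form ridge (L2-regularized) linear regression:
    solves (X^T X + lam * I) w = X^T y.  The identity is zeroed at
    (0, 0) so the intercept w0 is left unpenalized."""
    n = X.shape[1]
    I = np.eye(n)
    I[0, 0] = 0.0                            # do not shrink the intercept
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)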
Logistic regression (Binary Classification)
The hypothesis is the sigmoid $h_w(x) = 1/(1 + e^{-w^{T}x})$, which models $P(y = 1 \mid x; w)$.
The minimization of error is the same as the maximization of the log likelihood, which is
given by

$\ell(w) = \sum_{i=1}^{m}\left[y^{(i)}\log h_w(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_w(x^{(i)})\right)\right]$

The partial derivative of the cost function for a single training instance
is given by

$\frac{\partial \ell(w)}{\partial w_j} = \left(y^{(i)} - h_w(x^{(i)})\right)x_j^{(i)}$

Hence, using the LMS rule, the weight values for logistic regression are estimated as

$w_j := w_j + \alpha\left(y^{(i)} - h_w(x^{(i)})\right)x_j^{(i)}$
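A minimal sketch of this per-example update (gradient ascent on the log likelihood); the learning rate and epoch count are illustrative assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression(X, y, alpha=0.1, n_epochs=100):
    """Binary logistic regression trained with the per-instance rule
    w_j := w_j + alpha * (y_i - h_w(x_i)) * x_ij, which ascends the
    log likelihood one training example at a time."""
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_epochs):
        for i in range(m):
            h = sigmoid(X[i] @ w)            # predicted P(y = 1 | x)
            w += alpha * (y[i] - h) * X[i]   # single-instance update
    return w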
❑ In the One-vs-All algorithm, one binary-class model is created per class:
with ‘n’ classes, ‘n’ models are built (e.g., 4 binary models for a 4-class
problem). The sketch below illustrates the procedure for One-vs-All based
multiclass coding.
Now, the gradient of the cost function for a single training instance of each binary model is given by

$\frac{\partial \ell(w)}{\partial w_j} = \left(y^{(i)} - h_w(x^{(i)})\right)x_j^{(i)}$

Hence, using the LMS rule, the weight values for the cost function are estimated as

$w_j := w_j + \alpha\left(y^{(i)} - h_w(x^{(i)})\right)x_j^{(i)}$
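A minimal One-vs-All sketch, reusing the logistic_regression helper from the sketch above; the function names and the integer class labels 0 … n−1 are my assumptions:

import numpy as np

def one_vs_all_train(X, y, n_classes, alpha=0.1, n_epochs=100):
    """Train one binary logistic model per class: for class c the labels
    are recoded as 1 (class c) vs 0 (all other classes)."""
    W = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        y_binary = (y == c).astype(float)    # class c against the rest
        W[c] = logistic_regression(X, y_binary, alpha, n_epochs)
    return W

def one_vs_all_predict(W, X):
    """Pick the class whose binary model assigns the highest probability."""
    probs = 1.0 / (1.0 + np.exp(-(X @ W.T)))  # sigmoid of each model's score
    return np.argmax(probs, axis=1)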
Logistic Regression with L1-Norm Regularization
The cost function or likelihood function for logistic regression with
L1-norm regularization is given by

$\ell_\lambda(w) = \sum_{i=1}^{m}\left[y^{(i)}\log h_w(x^{(i)}) + (1 - y^{(i)})\log\left(1 - h_w(x^{(i)})\right)\right] - \lambda\sum_{j=1}^{n}\left|w_j\right|$

Now, the (sub)gradient of the cost function for a single training instance is given by

$\frac{\partial \ell_\lambda(w)}{\partial w_j} = \left(y^{(i)} - h_w(x^{(i)})\right)x_j^{(i)} - \lambda\,\mathrm{sign}(w_j)$

Hence, using the LMS rule, the weight values for the cost function are estimated as

$w_j := w_j + \alpha\left[\left(y^{(i)} - h_w(x^{(i)})\right)x_j^{(i)} - \lambda\,\mathrm{sign}(w_j)\right]$
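A minimal sketch of this subgradient update; leaving the intercept unpenalized and the value of λ are my assumptions:

import numpy as np

def l1_logistic_regression(X, y, alpha=0.1, lam=0.01, n_epochs=100):
    """Logistic regression with an L1 penalty, trained by per-instance
    subgradient steps: the penalty contributes -lam * sign(w_j) to each
    update, pushing small weights toward exactly zero."""
    m, n = X.shape
    w = np.zeros(n)
    penalized = np.ones(n)
    penalized[0] = 0.0                       # assume w0 is the intercept: no penalty
    for _ in range(n_epochs):
        for i in range(m):
            h = 1.0 / (1.0 + np.exp(-(X[i] @ w)))
            grad = (y[i] - h) * X[i] - lam * np.sign(w) * penalized
            w += alpha * grad                # ascend the penalized likelihood
    return w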
Unsupervised Learning (k-means clustering)
❑ Calculate the distance between each data point and every cluster center.
❑ Assign each data point to the cluster center closest to it, i.e., the one whose
distance is the minimum over all cluster centers.
❑ Recompute each cluster center as the mean of the points assigned to it, and repeat
the two steps above until the assignments stop changing (see the sketch below).
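A minimal k-means sketch along these lines; initializing centers from randomly chosen data points and the iteration cap are my assumptions:

import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate the assignment step and the update step
    until the centers stop moving."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: nearest center for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: each center becomes the mean of its points
        # (an empty cluster keeps its old center)
        new_centers = np.array([X[labels == c].mean(axis=0)
                                if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels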
❑The holdout method is the simplest kind of cross validation. The data set is
separated into two sets, called the training set and the testing set.
❑The function approximator fits a function using the training set only. Then the
function approximator is asked to predict the output values for the data in the
testing set (it has never seen these output values before).
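A minimal sketch of the holdout split; the 30% test fraction and the shuffling seed are arbitrary illustrative choices:

import numpy as np

def holdout_split(X, y, test_fraction=0.3, seed=0):
    """Holdout method: randomly separate the data into a training set and
    a testing set; fit on the former, evaluate on the latter."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]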
❑ K-fold cross validation is one way to improve over the holdout method. The data set is divided into
k subsets, and the holdout method is repeated k times.
❑ Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together
to form a training set.
❑ Then the average error across all k trials is computed. The advantage of this method is that it
matters less how the data gets divided.
❑ The disadvantage of this method is that the training algorithm has to be rerun from scratch k
times, which means it takes k times as much computation to make an evaluation.
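A minimal k-fold sketch; here fit and error are caller-supplied callables (my assumption for illustration), and the model is retrained from scratch on each fold, as the text notes:

import numpy as np

def k_fold_cv(X, y, fit, error, k=5, seed=0):
    """K-fold cross validation: each of the k subsets serves once as the
    test set while the remaining k-1 subsets form the training set, and
    the k test errors are averaged."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(X)), test_idx)
        model = fit(X[train_idx], y[train_idx])   # retrain from scratch
        errors.append(error(model, X[test_idx], y[test_idx]))
    return np.mean(errors)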
Leave-one-out cross validation
❑ Leave-one-out cross validation is K-fold cross validation taken to its extreme,
with K equal to N, the number of data points. That means that N separate times,
the function approximator is trained on all the data except for one point, and a
prediction is made for that point.
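In terms of the k-fold sketch above, leave-one-out is simply the k = N special case:

# Leave-one-out: every fold holds exactly one point
loocv_error = k_fold_cv(X, y, fit, error, k=len(X))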
Performance Measures for Binary Classifier
❑ TP, TN, FP, and FN denote the numbers of true positives, true negatives,
false positives, and false negatives, respectively.
❑ The sensitivity (SE) is defined as the proportion of abnormal episodes that are
accurately classified as abnormal, and it is given by

$SE = \frac{TP}{TP + FN}$

❑ The specificity (SP) is defined as the proportion of normal episodes
that are accurately classified as normal, and it is given by

$SP = \frac{TN}{TN + FP}$
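A minimal sketch of these measures, assuming label 1 marks the abnormal (positive) class and 0 the normal class in NumPy arrays:

import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Confusion-matrix counts and the derived SE and SP."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    se = tp / (tp + fn)   # sensitivity: recall of the abnormal class
    sp = tn / (tn + fp)   # specificity: recall of the normal class
    return se, sp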
The posterior probability evaluated using Bayes’ theorem is given by

$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$

where $P(c)$ is the prior probability of class c, $P(x \mid c)$ is the likelihood, and $P(x)$ is the evidence.
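A tiny numeric sketch of the theorem; the probability values are made up purely for illustration:

def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(c | x) = P(x | c) * P(c) / P(x)."""
    return likelihood * prior / evidence

# Illustrative numbers only: P(c)=0.2, P(x|c)=0.9, P(x)=0.3 -> P(c|x)=0.6
print(posterior(prior=0.2, likelihood=0.9, evidence=0.3))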