ML QB
CP1104
MACHINE LEARNING TECHNIQUES
OBJECTIVES
❖ To introduce students to the basic concepts and techniques of Machine Learning.
❖ To have a thorough understanding of the Supervised and Unsupervised learning techniques
❖ To study the various probability-based learning techniques
❖ To understand graphical models of machine learning algorithms
UNIT I
INTRODUCTION
9
Learning – Types of Machine Learning – Supervised Learning – The Brain and the Neuron –
Design a Learning System – Perspectives and Issues in Machine Learning – Concept Learning
Task – Concept Learning as Search – Finding a Maximally Specific Hypothesis – Version
Spaces and the Candidate Elimination Algorithm – Linear Discriminants – Perceptron – Linear
Separability – Linear Regression.
UNIT II
LINEAR MODELS
Multi-layer Perceptron – Going Forwards – Going Backwards: Back Propagation Error – Multi-
layer Perceptron in Practice – Examples of using the MLP – Overview – Deriving Back-
Propagation – Radial Basis Functions and Splines – Concepts – RBF Network – Curse of
Dimensionality – Interpolations and Basis Functions – Support Vector Machines.
UNIT III
TREE AND PROBABILISTIC MODELS
Learning with Trees – Decision Trees – Constructing Decision Trees – Classification and
Regression Trees – Ensemble Learning – Boosting – Bagging – Different ways to Combine
Classifiers – Probability and Learning – Data into Probabilities – Basic Statistics – Gaussian
Mixture Models – Nearest Neighbor Methods – Unsupervised Learning – K means Algorithms –
Vector Quantization – Self Organizing Feature Map
UNIT IV
DIMENSIONALITY REDUCTION AND EVOLUTIONARY MODELS
UNIT V
GRAPHICAL MODELS
Markov Chain Monte Carlo Methods – Sampling – Proposal Distribution – Markov Chain Monte
Carlo – Graphical Models – Bayesian Networks – Markov Random Fields – Hidden Markov
Models – Tracking Methods
TOTAL : 45 PERIODS
TEXT BOOKS
1. Ethem Alpaydin, “Introduction to Machine Learning 3e (Adaptive Computation and Machine
Learning Series)”, Third Edition, MIT Press, 2014.
2. Jason Bell, “Machine Learning – Hands on for Developers and Technical Professionals”, First
Edition, Wiley, 2014.
REFERENCE BOOKS
1. Peter Flach, “Machine Learning: The Art and Science of Algorithms that Make Sense of
Data”, First Edition, Cambridge University Press, 2012.
2. Stephen Marsland, “Machine Learning – An Algorithmic Perspective”, Second Edition,
Chapman and Hall/CRC Machine Learning and Pattern Recognition Series, 2014.
3. Tom M Mitchell, “Machine Learning”, First Edition, McGraw Hill Education, 2013.
UNIT-I INTRODUCTION
PART-A
1. Give precise definition of learning.
A computer program is said to learn from experience E with respect to some class of tasks
T and performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
2. Find T, P and E for checkers learning problem, handwriting recognition learning problem,
robot driving learning problem
Checkers learning problem:
• Task T: playing checkers
• Performance measure P: percent of games won against opponents
• Training experience E: playing practice games against itself.
Handwriting recognition learning problem:
• Task T: recognizing and classifying handwritten words within images
• Performance measure P: percent of words correctly classified
• Training experience E: a database of handwritten words with given classifications
Robot driving learning problem:
• Task T: driving on public four-lane highways using vision sensors
• Performance measure P: average distance travelled before an error
• Training experience E: a sequence of images and steering commands recorded while
observing a human driver
6. What is the difference between artificial intelligence and machine learning methods?
23. What is feature engineering? How do you apply it in the process of modelling?
Feature engineering is the process of transforming raw data into features that better represent the
underlying problem to the predictive models, resulting in improved model accuracy on unseen
data.
24. What value is the sum of the residuals of a linear regression close to? Justify.
The sum of the residuals of a linear regression is close to 0. The model assumes that the
errors are normally distributed with a mean of 0, i.e.
Y = β^T X + ε
Here, Y is the target or dependent variable,
β is the vector of regression coefficients,
X is the feature matrix containing all the features as the columns,
ε is the error term such that ε ~ N(0, σ^2).
So, the expected sum of the errors is the expected value of a single error times the total number
of data points, which is 0. Moreover, when the model is fitted by least squares with an intercept
term, the fitted residuals sum to exactly zero (up to rounding), because setting the derivative of
the squared error with respect to the intercept to zero gives Σ(y_i − ŷ_i) = 0.
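The claim can be checked numerically. The following is a minimal sketch in plain Python, assuming a one-feature model with an intercept fitted by ordinary least squares; the function name `fit_line` and the data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Illustrative data (hypothetical values)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
residual_sum = sum(residuals)
print(residual_sum)  # zero up to floating-point rounding
```

The individual residuals are not zero; only their sum is, which is why the sum alone says nothing about how well the line fits.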
PART-B
1. Define learning. What are the three features of learning? Explain the three features of learning
with the following problems.
a) Checkers learning problem
b) Handwriting recognition learning problem
c) Robot driving learning problem.
2. a) List and explain some successful applications of machine learning.
b) List some disciplines and examples of their influence on machine learning.
3. Explain the various steps in designing a learning system.
4. Elaborate the perspectives and issues in machine learning.
5. Explain concept learning task with the example of ENJOYSPORT.
6. Explain concept learning as search with the example of ENJOYSPORT.
7. Write FIND-S algorithm. Trace the algorithm with example.
8. Explain version space and List-Then-Eliminate algorithm with an example.
9. Explain CANDIDATE ELIMINATION learning algorithm with an example.
10. List and explain the various remarks on version spaces and candidate elimination.
11. Explain supervised learning with the concept of regression and classification.
12. Draw the neuron structure and explain the various parts.
13. Explain McCulloch-Pitts model with an example. Explain the limitations of MCP model.
14. Explain Perceptron learning algorithm with an example.
4. What do you mean by going forwards and backwards through the network?
Training the MLP consists of two parts:
• Working out what the outputs are for the given inputs and the current weights
• Updating the weights according to the error, which is a function of the difference between
the outputs and the targets.
These are generally known as going forwards and backwards through the network.
Back propagation is a method used in artificial neural networks to calculate the error
contribution of each neuron after a batch of data (e.g. in image recognition, multiple images) is
processed. This is used by an enveloping optimization algorithm to adjust the weight of each
neuron, completing the learning process for that case.
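The two phases can be sketched as follows for a tiny network with two inputs, two hidden nodes and one output. This is an illustrative sketch only: it assumes sigmoid activations, a sum-of-squares error and a single training pattern, and the starting weights, learning rate `eta` and function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Going forwards: compute the hidden activations and the output."""
    x_b = x + [1.0]  # append the bias input
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x_b))) for row in w_hidden]
    h_b = hidden + [1.0]  # append the bias node of the hidden layer
    output = sigmoid(sum(w * hi for w, hi in zip(w_out, h_b)))
    return x_b, h_b, output

def backward(x_b, h_b, output, target, w_hidden, w_out, eta):
    """Going backwards: propagate the output error and update weights in place."""
    delta_out = (output - target) * output * (1.0 - output)
    # Hidden deltas are computed with the output weights before they change
    delta_hidden = [delta_out * w_out[j] * h_b[j] * (1.0 - h_b[j])
                    for j in range(len(w_hidden))]
    for j in range(len(w_out)):
        w_out[j] -= eta * delta_out * h_b[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= eta * delta_hidden[j] * x_b[i]

# Hypothetical starting weights; the last entry of each row is the bias weight
w_hidden = [[0.5, -0.4, 0.1], [-0.3, 0.8, -0.2]]
w_out = [0.7, -0.6, 0.05]
x, target = [1.0, 0.0], 1.0

_, _, before = forward(x, w_hidden, w_out)
for _ in range(200):
    x_b, h_b, out = forward(x, w_hidden, w_out)
    backward(x_b, h_b, out, target, w_hidden, w_out, eta=0.5)
_, _, after = forward(x, w_hidden, w_out)
print(before, after)  # the output moves towards the target
```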
17. How many weights are there in MLP with one hidden layer?
For the MLP with one hidden layer there are (m+1) × n + (n+1) × p weights, where m, n and p
are the number of nodes in the input, hidden and output layers respectively. The extra +1s come
from the bias nodes, which also have adjustable weights.
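As a quick check of the count, here is a small sketch that assumes one bias node feeding the hidden layer and one feeding the output layer; the function name `mlp_weight_count` is hypothetical.

```python
def mlp_weight_count(m, n, p):
    """Weights in an MLP with m inputs, n hidden nodes and p outputs."""
    input_to_hidden = (m + 1) * n   # each hidden node sees m inputs plus a bias
    hidden_to_output = (n + 1) * p  # each output node sees n hidden nodes plus a bias
    return input_to_hidden + hidden_to_output

count = mlp_weight_count(2, 3, 1)
print(count)  # (2+1)*3 + (3+1)*1 = 13
```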
18. What are the various factors that the number of hidden units depends on?
The best number of hidden units depends in a complex way on many factors, including:
• The number of training patterns
• The numbers of input and output units
• The amount of noise in the training data
• The complexity of the function or classification to be learned
• The type of hidden unit activation function
• The training algorithm
An input vector x is used as input to all radial basis functions, each with different
parameters. The output of the network is a linear combination of the outputs from radial basis
functions.
23. State the applications of radial basis function network. (Jan 2018)
A radial basis function network is an artificial neural network that uses radial basis functions as
activation functions. The output of the network is a linear combination of radial basis functions
of the inputs and neuron parameters. Radial basis function networks have many uses, including
function approximation, time series prediction, classification, and system control.
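The "linear combination of radial basis functions" can be made concrete with a short sketch. It assumes Gaussian basis functions, which are a common choice but not the only one; the centres, widths and weights below are made-up values and the function names are hypothetical.

```python
import math

def gaussian_rbf(x, centre, width):
    """Gaussian radial basis function: depends only on the distance to the centre."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))

def rbf_network(x, centres, widths, weights, bias=0.0):
    """Network output: a weighted sum of the basis function activations."""
    activations = [gaussian_rbf(x, c, s) for c, s in zip(centres, widths)]
    return bias + sum(w * a for w, a in zip(weights, activations))

# Hypothetical parameters for a network with two hidden RBF nodes
centres = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0]
weights = [2.0, -1.0]

y = rbf_network([0.0, 0.0], centres, widths, weights)
print(y)  # 2*1 - 1*exp(-1)
```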
24. Explain the bias-variance trade-off.
Bias refers to the systematic difference between the values predicted by the model and the real
values. One of the goals of an ML algorithm is to have a low bias. Variance refers to the sensitivity
of the model to small fluctuations in the training dataset. Another goal of an ML algorithm is to
have low variance. For a dataset that is not exactly linear, it is generally not possible to have both
bias and variance low at the same time. A straight line model will have low variance but high bias,
whereas a high-degree polynomial will have low bias but high variance. So, there is a trade-off
between the two; the ML specialist has to decide, based on the assigned problem, how much bias and
variance can be tolerated. Based on this, the final model is built.
25. What’s the “kernel trick” and how is it useful?
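The kernel trick plays a huge role in the application of SVMs to classification problems that are
not linearly separable. The idea is to map the data set into a higher dimensional space where we
can find a hyperplane that separates the samples. The "trick" is that a kernel function computes
the inner products in that space directly from the original inputs, so the mapping never has to be
carried out explicitly.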
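A small sketch shows what the kernel saves. It assumes the degree-2 polynomial kernel k(x, z) = (x · z)^2 on 2-D inputs, for which the explicit feature map is (x1^2, √2·x1·x2, x2^2); the function names and input values are illustrative.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (3-D feature space)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

def poly_kernel(x, z):
    """Same inner product, computed without leaving the input space."""
    return dot(x, z) ** 2

x, z = [1.0, 2.0], [3.0, -1.0]
explicit = dot(phi(x), phi(z))   # map first, then take the inner product
implicit = poly_kernel(x, z)     # kernel trick: no mapping needed
print(explicit, implicit)        # both equal 1 up to rounding
```

For higher degrees or Gaussian kernels the feature space becomes very large or infinite-dimensional, so only the implicit route is practical.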
PART-B
1. Explain how the XOR problem can be solved by an MLP.
2. Explain the multilayer perceptron algorithm in detail.
3. Explain the following with respect to MLP.
a) Initializing the weights
b) Different output activation functions
4. Explain the following in detail
a) Sequential and batch training
b) Local minima
5. What are the choices that can be made about the network in order to use it for real problems?
Explain in detail.
6. How can a regression problem be solved using an MLP? Explain with an example.
7. Explain classification with MLP in detail.
8. Explain time series prediction problem with MLP in detail.
9. Explain the auto associative network in detail.
10. Explain Radial Basis Function (RBF) network & training RBFN in detail.
11. Explain the following in detail
a) The curse of dimensionality
b) Interpolation and basis function.
12. Explain support vector machine in detail.
• Calculations can get very complex, particularly if many values are uncertain and/or if many
outcomes are linked.
8. What is CART?
Decision trees used in data mining are of two main types:
• Classification tree analysis is when the predicted outcome is the class to which the data
belongs.
• Regression tree analysis is when the predicted outcome can be considered a real number
(e.g. the price of a house, or a patient's length of stay in a hospital).
The term Classification And Regression Tree (CART) analysis is an umbrella term used to refer
to both of the above procedures, first introduced by Breiman et al. Trees used for regression
and trees used for classification have some similarities - but also some differences, such as the
procedure used to determine where to split.
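One such difference can be sketched in code. The example below assumes the usual choices of Gini impurity for classification splits and variance (mean squared error around the node mean) for regression splits; the function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a set of class labels (classification trees)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def variance(values):
    """Variance around the mean (regression trees)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def split_score(left, right, impurity):
    """Size-weighted impurity of the two children; lower is a better split."""
    n = len(left) + len(right)
    return (len(left) * impurity(left) + len(right) * impurity(right)) / n

# Candidate split of a classification node
class_score = split_score(["yes", "yes", "no"], ["no", "no"], gini)
# Candidate split of a regression node
reg_score = split_score([10.0, 12.0], [30.0, 34.0], variance)
print(class_score, reg_score)  # 4/15 and 2.5
```

The tree-growing loop is the same in both cases: evaluate every candidate split with the chosen impurity and keep the one with the lowest score.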
from other artificial neural networks as they apply competitive learning as opposed to error-
correction learning (such as back propagation with gradient descent), and in the sense that they use
a neighborhood function to preserve the topological properties of the input space.
PART-B
1. Explain ID3 algorithm with example.
2. Explain classification and regression trees with example.
3. Make a decision tree that computes the logical AND function. How does it compare to the
perceptron solution?
4. Explain AdaBoost algorithm with example.
5. Explain the different ways to combine classifiers.
6. Explain the following
a) Turning data into probabilities
b) Naïve Bayes classifier
7. Explain the important statistical concepts in detail.
8. Explain the Gaussian Mixture models in detail.
9. Explain nearest neighbor methods in detail.
10. Explain the k-means algorithm in detail
11. Explain vector quantization in detail.
12. Explain SOFM in detail.
variation that remains and finds another axis that is orthogonal to the first and covers as much of
the remaining variation as possible. It then iterates this until it has run out of possible axes.
12. What are three main types of rules the genetic algorithm uses at each step to create the
next generation from the current population?
The genetic algorithm uses three main types of rules at each step to create the next generation
from the current population:
• Selection rules select the individuals, called parents, that contribute to the population at the
next generation.
• Crossover rules combine two parents to form children for the next generation.
• Mutation rules apply random changes to individual parents to form children.
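The three rules can be sketched on a toy problem. The example assumes bit-string individuals, a "count the ones" fitness, tournament selection of size 2, single-point crossover and bit-flip mutation; all of these are illustrative choices, and here mutation is applied to the child produced by crossover, which is one common arrangement.

```python
import random

def fitness(bits):
    """Toy fitness: number of ones in the string."""
    return sum(bits)

def select(population, rng):
    """Selection rule: the fitter of two randomly drawn individuals becomes a parent."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2, rng):
    """Crossover rule: combine two parents at a random cut point."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:]

def mutate(bits, rate, rng):
    """Mutation rule: flip each bit with a small probability."""
    return [1 - b if rng.random() < rate else b for b in bits]

def next_generation(population, rng, rate=0.02):
    children = []
    for _ in population:
        child = crossover(select(population, rng), select(population, rng), rng)
        children.append(mutate(child, rate, rng))
    return children

rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(40):
    population = next_generation(population, rng)
best = max(fitness(ind) for ind in population)
print(best)
```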
13. Enumerate the difference between genetic algorithm and classical optimization
algorithm.
The genetic algorithm differs from a classical, derivative-based, optimization algorithm in
two main ways, as summarized in the following table.
17. What are the three basic tasks to be performed when you want to apply genetic
algorithm?
• Encode possible solutions as strings
• Choose a suitable fitness function
• Choose suitable genetic operators.
maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic
programming and reinforcement learning.
21. What two requirements should a problem satisfy in order to be suitable for solving it by
a Genetic Algorithm (GA)?
GA can only be applied to problems that satisfy the following requirements:
• The fitness function can be well-defined.
• Solutions should be decomposable into steps (building blocks) which could then be encoded as
chromosomes.
22. Explain the Confusion Matrix with Respect to Machine Learning Algorithms.
A confusion matrix (or error matrix) is a table that is used to measure the performance of a
classification algorithm. It is mostly used in supervised learning; in unsupervised learning, it is
called the matching matrix. The confusion matrix has two dimensions:
• Actual
• Predicted
Both dimensions are indexed by the same set of classes, so each cell counts the instances of one
actual class that were assigned to one predicted class.
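A minimal sketch of building such a table, assuming rows index the actual class and columns the predicted class (some texts use the transpose); the labels and predictions are made up.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows: actual class, columns: predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
matrix = confusion_matrix(actual, predicted, ["cat", "dog"])

# Correct predictions lie on the diagonal
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(actual)
print(matrix, accuracy)  # [[1, 1], [1, 2]] 0.6
```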
One of the primary differences between machine learning and deep learning is that feature
engineering is done manually in machine learning. In the case of deep learning, the model
consisting of neural networks will automatically determine which features to use (and which not
to use).
PART-B
1. Explain the three different ways to do dimensionality reduction in detail.
2. Explain Linear Discriminant analysis in detail.
3. Explain principal component analysis algorithm in detail.
4. Explain Factor analysis in detail.
5. Use the LDA on the Iris dataset. Compare the results with using PCA, which is not supervised
and will not therefore be able to find the same space.
6. Explain Levenberg-Marquardt algorithm in detail.
7. Explain Conjugate gradients algorithm in detail.
8. Explain the three basic approaches in search techniques.
9. Explain evolutionary learning in detail.
10. Explain reinforcement learning in detail
4. What are the two forms in which the Box–Muller transform is commonly expressed?
The Box–Muller transform is commonly expressed in two forms. The basic form as given
by Box and Muller takes two samples from the uniform distribution on the interval [0, 1] and maps
them to two standard, normally distributed samples. The polar form takes two samples from a
different interval, [−1, +1], keeps them only if the point they define lies inside the unit circle,
and maps them to two normally distributed samples without the use of sine or cosine functions.
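Both forms can be sketched with the standard library alone. The function names are hypothetical, and the sample size and seed are arbitrary; the empirical mean and variance are printed only as a sanity check.

```python
import math
import random

def box_muller_basic(rng):
    """Basic form: two uniform samples -> two standard normal samples."""
    u1 = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

def box_muller_polar(rng):
    """Polar form: rejection-sample a point in the unit circle, no sine or cosine."""
    while True:
        u = rng.uniform(-1.0, 1.0)
        v = rng.uniform(-1.0, 1.0)
        s = u * u + v * v
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            return u * factor, v * factor

def mean_and_variance(samples):
    mean = sum(samples) / len(samples)
    var = sum((z - mean) ** 2 for z in samples) / len(samples)
    return mean, var

rng = random.Random(42)
basic = [z for _ in range(20000) for z in box_muller_basic(rng)]
polar = [z for _ in range(20000) for z in box_muller_polar(rng)]
basic_mean, basic_var = mean_and_variance(basic)
polar_mean, polar_var = mean_and_variance(polar)
print(basic_mean, basic_var, polar_mean, polar_var)  # close to 0 and 1
```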
5. What is chain?
In probabilistic terms, a chain is a sequence of possible states, where the probability of being
in state s at time t is a function of the previous states.
A Markov chain is a chain with the Markov property. That is, the probability at time t depends
only on the state at t-1.
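The Markov property is easy to see in a simulation: the next state is drawn using only the current state. The sketch below assumes a two-state chain with made-up transition probabilities; the state names and function names are hypothetical.

```python
import random

# Hypothetical transition probabilities: transition[current][next]
transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    """Draw the next state; only the current state is consulted."""
    r = rng.random()
    cumulative = 0.0
    for nxt, p in transition[state].items():
        cumulative += p
        if r < cumulative:
            return nxt
    return nxt  # guard against floating-point rounding

def simulate(start, n_steps, rng):
    chain = [start]
    for _ in range(n_steps):
        chain.append(step(chain[-1], rng))
    return chain

rng = random.Random(1)
chain = simulate("sunny", 10000, rng)
frac_sunny = chain.count("sunny") / len(chain)
print(frac_sunny)  # approaches the stationary probability 5/6 for this chain
```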
factorization and independences, but they differ in the set of independences they can encode and
the factorization of the distribution that they induce
PART-B