Machine Learning - AL3451 - Important Questions With Answer
UNIT - I
Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed. ML is one of the most exciting technologies that one would have ever come across. As is evident from the name, it gives the computer the ability that makes it more similar to humans: the ability to learn. Machine learning is actively being used today, perhaps in many more places than one would expect.
- Netflix's Recommendation Engine: The core of Netflix is its famous recommendation engine. Over 75% of what you watch is recommended by Netflix, and these recommendations are made by implementing Machine Learning.
- Facebook's Auto-tagging feature: The logic behind Facebook's DeepFace face verification system is Machine Learning and Neural Networks. DeepFace studies the facial features in an image to tag your friends and family.
- Amazon's Alexa: The famous Alexa, which is based on Natural Language Processing and Machine Learning, is an advanced Virtual Assistant that does more than just play songs on your playlist. It can book you an Uber, connect with the other IoT devices at home, track your health, etc.
- Google's Spam Filter: Gmail makes use of Machine Learning to filter out spam messages. It uses Machine Learning algorithms and Natural Language Processing to analyze emails in real time and classify them as either spam or non-spam.
The three main types of Machine Learning are:
- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning
- Unsupervised learning involves training by using unlabeled data and allowing the model to act
on that information without guidance.
- Think of unsupervised learning as a smart kid that learns without any guidance. In this type of Machine Learning, the model is not fed labeled data; it has no clue that 'this image is Tom and this is Jerry'. It figures out patterns and the differences between Tom and Jerry on its own by taking in tons of data.
- Deep learning is a machine learning technique that teaches computers to do what comes
naturally to humans: learn by example. Deep learning is a key technology behind driverless cars,
enabling them to recognize a stop sign, or to distinguish a pedestrian from a lamppost. It is the
key to voice control in consumer devices like phones, tablets, TVs, and hands-free speakers.
Deep learning is getting lots of attention lately and for good reason. It's achieving results that
were not possible before.
- In deep learning, a computer model learns to perform classification tasks directly from images,
text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes
exceeding human-level performance.
- Models are trained by using a large set of labeled data and neural network architectures that
contain many layers.
- This type of Machine Learning is comparatively different. Imagine that you were dropped off
at an isolated island! What would you do?
- Panic? Yes, of course, initially we all would. But as time passes by, you will learn how to live on
the island. You will explore the environment, understand the climate condition, the type of
food that grows there, the dangers of the island, etc. This is exactly how Reinforcement
Learning works, it involves an Agent (you, stuck on the island) that is put in an unknown
environment (island), where he must learn by observing and performing actions that result in
rewards.
- Reinforcement Learning is mainly used in advanced Machine Learning areas such as self-driving cars, AlphaGo, etc.
7. Define VC Dimension.
The Vapnik-Chervonenkis (VC) dimension of a model is the size of the largest set of points that the model can shatter.
8. What is shattering?
Shattering is the ability of a model to classify a set of points perfectly. More generally, the model can create a function that divides the points into two distinct classes without overlap. It differs from simple classification because it considers all possible combinations of labels on those points. In the context of shattering, we simply define the VC dimension of a model as the size of the largest set of points that the model can shatter; the sketch below shows this idea in action.
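As an illustration (not part of the original notes), the following sketch checks whether a linear classifier can shatter a small set of 2-D points by trying every labeling; the point sets and the use of scikit-learn's LinearSVC are assumptions chosen for the demo.

```python
# Hypothetical sketch: check whether a linear model can shatter a point set.
import itertools
import numpy as np
from sklearn.svm import LinearSVC

def can_shatter(points):
    """Return True if a linear classifier separates every labeling of the points."""
    for labels in itertools.product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:      # all-0 or all-1 labelings are trivially separable
            continue
        clf = LinearSVC(C=1e6, max_iter=10000).fit(points, labels)  # large C ~ hard margin
        if clf.score(points, labels) < 1.0:                          # some labeling fails
            return False
    return True

three = np.array([[0, 0], [1, 0], [0, 1]])           # 3 points in general position
four = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])    # XOR-like configuration
print(can_shatter(three))  # True: lines shatter 3 points (VC dimension of lines in 2-D is 3)
print(can_shatter(four))   # False: the XOR labeling is not linearly separable
```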
9. What is a Hypothesis?
Null Hypothesis: A null hypothesis is a type of statistical hypothesis which states that no statistically significant effect exists in the given set of observations. It is also known as a conjecture and is used in quantitative analysis to test theories about markets, investment, and finance to decide whether an idea is true or false.
- The p-value in statistics is defined as the evidence against a null hypothesis. In other words, the p-value is the probability of obtaining the observed data, or data equally or more extreme, purely by random chance under the null hypothesis.
- The smaller the p-value, the stronger the evidence against the null hypothesis, which means the null hypothesis can be rejected in testing. It is always represented in decimal form, such as 0.035.
- Whenever a statistical test is carried out on a population and sample to find the p-value, the conclusion always depends upon the critical value. If the p-value is less than the critical value, the effect is significant and the null hypothesis can be rejected. Further, if it is higher than the critical value, there is no significant effect and hence we fail to reject the Null Hypothesis.
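As a hedged illustration with made-up data, SciPy can compute the p-value for a one-sample t-test and compare it against a chosen significance level:

```python
# Illustrative sketch: p-value from a one-sample t-test (made-up data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.3, scale=1.0, size=30)    # observed sample

# H0: the population mean is 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
alpha = 0.05                                        # chosen significance level
print(f"p-value = {p_value:.3f}")
if p_value < alpha:
    print("Reject the null hypothesis (significant effect).")
else:
    print("Fail to reject the null hypothesis.")
```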
The prediction results of a machine learning model stand somewhere between (a) low-bias, low-variance, (b) low-bias, high-variance, (c) high-bias, low-variance, and (d) high-bias, high-variance. A low-bias, high-variance model is called overfit and a high-bias, low-variance model is called underfit. By generalization, we find the best trade-off between underfitting and overfitting so that a trained model obtains the best performance. An overfit model obtains a high prediction score on seen data and a low one on unseen datasets. An underfit model has low performance on both seen and unseen datasets.
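A minimal sketch of this trade-off, using synthetic data and polynomial regression of increasing degree (the data and degrees are illustrative assumptions):

```python
# Sketch: under- vs overfitting as polynomial degree grows (illustrative data).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):   # underfit, balanced, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    # overfit: train score high, test score drops; underfit: both low
    print(degree, round(model.score(X_tr, y_tr), 2), round(model.score(X_te, y_te), 2))
```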
When a machine learning model becomes too complex, it is usually prone to overfitting. There are methods that help to make the model simpler; they are called Regularization methods and are explained below.
Regularization is a collection of methods to make a machine learning model simpler. To this end, certain approaches are applied to different machine learning algorithms, for instance, pruning for decision trees, dropout techniques for neural networks, and adding a penalty parameter to the cost function in regression.
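For instance, a brief sketch of the penalty approach in regression, using scikit-learn's Ridge (L2 penalty) on a synthetic dataset:

```python
# Sketch: L2 regularization (Ridge) adds a penalty alpha * ||w||^2 to the cost.
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=50, n_features=30, noise=10.0, random_state=0)
plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)    # larger alpha -> simpler model, smaller weights
print("max |coef|, unregularized:", abs(plain.coef_).max().round(2))
print("max |coef|, ridge:        ", abs(ridge.coef_).max().round(2))
```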
PART-B & C
9. What are the ways to ensure that a machine learning model is generalized?
UNIT – II
Regression algorithms are used when you notice that the output is a continuous variable,
whereas classification algorithms are used when the output is divided into sections such as
Pass/Fail, Good/Average/Bad, etc. We have various algorithms for performing the regression or
classification actions, with Linear Regression Algorithm being the basic algorithm in Regression.
Y = β0 + β1X + ε
- Predicting crop yields based on the amount of rainfall: yield is the dependent variable, while the amount of rainfall is the independent variable.
- Marks scored by a student based on the number of hours studied (ideally): here, marks scored is dependent and the number of hours studied is independent.
- Predicting the salary of a person based on years of experience: experience becomes the independent variable, while salary becomes the dependent variable.
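A minimal fitting sketch for the salary example above (the numbers are made up for illustration):

```python
# Sketch: fitting Y = b0 + b1*X with scikit-learn (made-up salary data).
import numpy as np
from sklearn.linear_model import LinearRegression

experience = np.array([[1], [2], [3], [5], [8], [10]])   # years (independent variable)
salary = np.array([30, 35, 42, 55, 72, 85])              # in thousands (dependent variable)

model = LinearRegression().fit(experience, salary)
print("b1 (slope):", model.coef_[0].round(2), " b0 (intercept):", round(model.intercept_, 2))
print("Predicted salary for 7 years:", model.predict([[7]])[0].round(1))
```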
The size of the error in our prediction doesn't change significantly across the values of the
independent variable.
6. Define Linearity?
The line of best fit through the data points is a straight line, rather than a curve or some sort of
grouping factor.
Before looking at the working principle of gradient descent, we should know some basic concepts for finding the slope of a line from linear regression. The equation for simple linear regression is given as: Y = mX + c
Where 'm' represents the slope of the line, and 'c' represents the intercept on the y-axis.
The starting point is used to evaluate the performance, as it is considered just an arbitrary point. At this starting point, we take the first derivative, or slope, and then use a tangent line to calculate the steepness of this slope. Further, this slope will inform the updates to the parameters (weights and bias).
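A small sketch of this procedure on a toy dataset, with a hand-rolled mean-squared-error gradient (an illustration, not from the original notes):

```python
# Sketch: gradient descent for simple linear regression Y = mX + c (toy data).
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = 2.0 * X + 1.0                      # true line: m = 2, c = 1
m, c = 0.0, 0.0                        # arbitrary starting point
lr = 0.02                              # learning rate (step size)

for _ in range(2000):
    pred = m * X + c
    # gradients of the mean squared error with respect to m and c
    dm = (-2.0 / len(X)) * np.sum(X * (Y - pred))
    dc = (-2.0 / len(X)) * np.sum(Y - pred)
    m -= lr * dm                       # move against the slope
    c -= lr * dc
print(round(m, 3), round(c, 3))        # approaches m = 2, c = 1
```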
9. What is SGD?
Stochastic gradient descent (SGD) is a type of gradient descent that runs one training example per iteration. In other words, within each training epoch it processes the examples one at a time, updating the parameters after each example. As it requires only one training example at a time, it is easier to fit in allocated memory.
Mini-batch gradient descent is the combination of both batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and then performs the updates on those batches separately. Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. Hence, we can achieve a special type of gradient descent with higher computational efficiency and a less noisy gradient.
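A hedged sketch of mini-batch updates on a toy linear model (batch size and learning rate are arbitrary choices):

```python
# Sketch: mini-batch gradient descent on a toy linear model (batch size 16).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = 3.0 * X.ravel() + 0.5 + rng.normal(scale=0.05, size=200)
w, b, lr, batch = 0.0, 0.0, 0.1, 16

for epoch in range(200):
    idx = rng.permutation(len(X))                 # shuffle each epoch
    for start in range(0, len(X), batch):
        sel = idx[start:start + batch]            # one mini-batch
        pred = w * X[sel].ravel() + b
        err = pred - y[sel]
        w -= lr * 2 * np.mean(err * X[sel].ravel())   # gradient over the batch only
        b -= lr * 2 * np.mean(err)
print(round(w, 2), round(b, 2))                   # near w = 3.0, b = 0.5
```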
Quadratic Discriminant Analysis (QDA): for multiple input variables, each class deploys its own estimate of variance.
Regularized Discriminant Analysis (RDA) uses regularization in the estimate of the variance (actually covariance) and hence moderates the influence of different variables on LDA.
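As an illustrative sketch, scikit-learn provides QDA directly; the synthetic dataset is an assumption:

```python
# Sketch: QDA in scikit-learn (each class gets its own covariance estimate).
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
qda = QuadraticDiscriminantAnalysis().fit(X, y)
print("training accuracy:", qda.score(X, y))
```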
Binary classifiers are defined as functions that decide whether input data, represented as vectors of numbers, belongs to some specific class. They can be considered linear classifiers. In simple words, we can understand a binary classifier as a classification algorithm that computes a linear predictor function in terms of weights and feature vectors.
- Naive Bayes
- Hidden Markov Models
- Autoencoder
- Boltzmann Machines
- Variational Autoencoder
- Generative Adversarial Networks
Support Vector Machine (SVM) is a non-parametric, supervised learning technique very popular with engineers because it produces excellent results with significantly less compute. A Machine Learning algorithm, it can be applied to both classification (output is discrete) and regression (output is continuous) problems. It is largely used in text classification, image classification, and protein and gene classification.
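A minimal classification sketch with a linear-kernel SVM on synthetic data (the dataset and parameters are illustrative assumptions):

```python
# Sketch: a maximum-margin linear SVM on a synthetic binary task.
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)   # maximum-margin separator
print("test accuracy:", clf.score(X_te, y_te))
```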
Like SVM, Random Forest also falls in the class of discriminative modelling. It is one of the most popular and powerful Machine Learning algorithms for performing classification and regression. Random Forest became popular in the Kaggle community as it helped win many competitions.
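A comparable Random Forest sketch on synthetic data (all settings are illustrative assumptions):

```python
# Sketch: Random Forest classifier on synthetic data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))
```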
PART-B & C
UNIT - III
It is a machine learning technique that combines several base models in order to produce one optimal predictive model. Decision Trees are often used to illustrate the definition and practicality of Ensemble Methods (however, it is important to note that Ensemble Methods do not only pertain to Decision Trees). The main approaches are listed below, followed by a short sketch.
- Stacking
- Blending
- Bagging
- Boosting
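As promised above, a short sketch of bagging and boosting built from decision trees (datasets and hyperparameters are illustrative assumptions):

```python
# Sketch: bagging and boosting ensembles built on decision trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, random_state=0)
base = DecisionTreeClassifier(max_depth=3)                  # weak base model
bagging = BaggingClassifier(base, n_estimators=50, random_state=0)
boosting = AdaBoostClassifier(base, n_estimators=50, random_state=0)
for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```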
Unsupervised learning, also known as unsupervised machine learning, uses machine learning algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden patterns or data groupings without the need for human intervention. Its ability to discover similarities and differences in information makes it the ideal solution for exploratory data analysis, cross-selling strategies, customer segmentation, and image recognition.
K-Means clustering is used in a variety of examples or business cases in real life, like:
- Academic performance
- Diagnostic systems
- Search engines
- Wireless sensor networks
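A minimal K-Means sketch on synthetic blobs (the data are an assumption for illustration):

```python
# Sketch: K-Means clustering on synthetic blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)     # learned centroids
print(km.labels_[:10])         # hard cluster assignment per point
```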
This model is a soft probabilistic clustering model that allows us to describe the membership of
points to a set of clusters using a mixture of Gaussian densities. It is a soft classification (in
contrast to a hard one) because it assigns probabilities of belonging to a specific class instead of
a definitive choice. In essence, each observation will belong to every class but with different
probabilities.
The Expectation-Maximization algorithm is performed in exactly the same way. In fact, the optimization procedure we described above for GMMs is a specific implementation of the EM algorithm. The EM algorithm is just defined more generally and formally (as it can be applied to many other optimization problems).
So the general idea is that we are trying to maximize a likelihood (and more frequently a log-likelihood), that is, we are trying to solve the following optimization problem:
θ* = argmaxθ Σi log p(xi | θ)
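In scikit-learn, GaussianMixture fits such a model with EM and exposes the soft memberships; a brief hedged sketch on synthetic blobs:

```python
# Sketch: a Gaussian Mixture Model gives soft (probabilistic) cluster memberships.
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)   # fitted via EM
print(gmm.predict_proba(X[:3]).round(3))   # each row sums to 1: soft membership
print(gmm.score(X))                        # average log-likelihood per sample
```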
PART-B & C
UNIT - IV
An activation function is a function that is added to an artificial neural network in order to help the network learn complex patterns in the data. In comparison with a neuron-based model like the one in our brains, the activation function is what finally decides what is to be fired to the next neuron. On its own, the neuron does not know how to bound the value and thus cannot decide the firing pattern. Thus the activation function is an important part of an artificial neural network.
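A plain NumPy sketch of three common activation functions (illustrative, not from the notes):

```python
# Sketch: common activation functions as plain NumPy expressions.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)          # passes positives, zeroes out negatives

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x).round(2), tanh(x).round(2), relu(x))
```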
In the process of training, we want to start with a badly performing neural network and wind up with a network with high accuracy. In terms of the loss function, we want our loss to be much lower at the end of training. Improving the network is possible because we can change its function by adjusting weights. We want to find another function that performs better than the initial one.
Gradient, in plain terms, means the slope or slant of a surface. So gradient descent literally means descending a slope to reach the lowest point on that surface.
In Stochastic Gradient Descent, a few samples are selected randomly instead of the whole data
set for each iteration. In Gradient Descent, there is a term called "batch" which denotes the
total number of samples from a dataset that is used for calculating the gradient for each
iteration. In typical Gradient Descent optimization, like Batch Gradient Descent, the batch is
taken to be the whole dataset.
1. The algorithm starts at a random point by initializing the weights with random values
2. Then it calculates the gradients at that random point
3. Then it moves in the opposite direction of the gradient
4. The process continues to repeat itself until it finds the point of minimum loss
Backpropagation defines the whole process encompassing both the calculation of the gradient and its use in stochastic gradient descent. Technically, backpropagation is used to calculate the gradient of the error of the network with respect to the network's modifiable weights. The characteristics of Backpropagation are the iterative, recursive and efficient approach through which it computes the updated weights to improve the network until it can perform the task for which it is being trained.
Each output unit compares its activation YK with the target value TK to determine the associated error for that unit. Based on this error, the factor δK (K = 1, …, m) is computed and is used to distribute the error at the output unit YK back to all units in the previous layer. Similarly, the factor δj (j = 1, …, p) is computed for each hidden unit Zj.
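A compact NumPy sketch of these delta factors for a one-hidden-layer network with sigmoid units (shapes and learning rate are illustrative assumptions):

```python
# Sketch: one backprop step for a 1-hidden-layer network, showing the delta factors.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))            # one input sample, 3 features
t = np.array([[1.0]])                  # target value T
W1, W2 = rng.normal(size=(3, 4)), rng.normal(size=(4, 1))

sigmoid = lambda z: 1 / (1 + np.exp(-z))
z = sigmoid(x @ W1)                    # hidden units Z_j
y = sigmoid(z @ W2)                    # output unit Y

delta_k = (y - t) * y * (1 - y)                 # error factor at the output unit
delta_j = (delta_k @ W2.T) * z * (1 - z)        # error distributed back to hidden units

lr = 0.1
W2 -= lr * z.T @ delta_k               # updated weights for the output layer
W1 -= lr * x.T @ delta_j               # updated weights for the hidden layer
```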
The vanishing gradient problem is an issue that sometimes arises when training machine
learning algorithms through gradient descent. This most often occurs in neural networks that
have several neuronal layers such as in a deep learning system, but also occurs in recurrent
neural networks.
The key point is that the calculated partial derivatives used to compute the gradient become smaller and smaller as one goes deeper into the network. Since the gradients control how much the network learns during training, if the gradients are very small or zero, then little to no training can take place, leading to poor predictive performance.
The rectified linear activation unit, or ReLU, is one of the few landmarks in the deep learning
revolution. It's simple, yet it's far superior to previous activation functions like sigmoid or tanh.
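A tiny numeric illustration of the contrast (the values are arbitrary): the sigmoid's derivative is at most 0.25, so products of it across layers shrink fast, while ReLU's derivative is exactly 1 for positive inputs.

```python
# Sketch: why gradients vanish - sigmoid derivatives multiply to tiny numbers.
import numpy as np

def sigmoid_grad(z):
    s = 1 / (1 + np.exp(-z))
    return s * (1 - s)                 # at most 0.25

layers = 10
print(sigmoid_grad(2.0) ** layers)     # ~1e-10: barely any signal reaches early layers
print(1.0 ** layers)                   # ReLU derivative for positive inputs stays 1
```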
Randomized Search CV solves the drawbacks of Grid Search CV, as it goes through only a fixed
number of hyperparameter settings. It moves within the grid in a random fashion to find the
best set of hyperparameters. This approach reduces unnecessary computation.
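A hedged sketch with scikit-learn's RandomizedSearchCV (the model, grid, and n_iter are illustrative choices):

```python
# Sketch: RandomizedSearchCV samples a fixed number of hyperparameter settings.
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=0)
param_dist = {"n_estimators": [50, 100, 200],
              "max_depth": [3, 5, 10, None]}
search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_dist, n_iter=5, cv=3, random_state=0)
search.fit(X, y)                       # tries only 5 of the 12 combinations
print(search.best_params_, round(search.best_score_, 3))
```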
Regularization refers to the modifications that can be made to a learning algorithm to help reduce the generalization error, but not the training error. It reduces the error by down-weighting or ignoring the less important features. It also helps prevent overfitting, making the model more robust and decreasing its complexity.
In the standard illustration of dropout, the left side shows a neural network where all units are activated; on the right, the red units have been dropped out of the model - the values of their weights and biases are not considered during training.
When we apply dropout to a neural network, we're creating a "thinned" network with unique
combinations of the units in the hidden layers being dropped randomly at different points in
time during training. Each time the gradient of our model is updated, we generate a new
thinned neural network with different units dropped based on a probability hyperparameter p.
Training a network using dropout can thus be viewed as training loads of different thinned
neural networks and merging them into one network that picks up the key properties of each
thinned network. This process allows dropout to reduce the overfitting of models on training
data.
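A minimal sketch of the "inverted dropout" mask often used in practice (shapes and p are illustrative assumptions):

```python
# Sketch: "inverted" dropout applied to a layer's activations (p = drop probability).
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 5))       # one mini-batch of hidden activations
p = 0.5                                      # probability of dropping a unit
mask = (rng.random(activations.shape) >= p) / (1 - p)   # scale kept units by 1/(1-p)
thinned = activations * mask                 # a randomly "thinned" layer
print(mask)
```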
Although dropout is clearly a highly effective tool, it comes with certain drawbacks. A network
with dropout can take 2-3 times longer to train than a standard network. One way to attain the
benefits of dropout without slowing down training is by finding a regularizer that is essentially
equivalent to a dropout layer. For linear regression, this regularizer has been proven to be a
modified form of L2 regularization.
PART-B & C
8. Discuss ReLU.
UNIT - V
Machine learning projects are highly iterative; as you progress through the ML lifecycle, you'll find yourself iterating on a section until reaching a satisfactory level of performance, then proceeding forward to the next task (which may be circling back to an even earlier step). Moreover, a project isn't complete after you ship the first version; you get feedback from real-world interactions and redefine the goals for the next iteration of deployment.
The Validation Set approach is a very simple and frequently used method when there is a sufficient number of observations to get reasonable results. It divides the available data into two parts, a training set and a validation set (or holdout set); the model is built on the training set and its accuracy is then checked on the validation set. The resulting accuracy on the validation set is the estimate for the real test data (unseen data).
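A brief holdout-split sketch with scikit-learn (dataset and split ratio are illustrative):

```python
# Sketch: the validation set approach with a simple holdout split.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("validation accuracy:", model.score(X_val, y_val))  # estimate for unseen data
```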
5. Define LOOCV.
The Leave One Out Cross Validation method addresses the drawbacks of the validation set approach and is just as simple. The main point here is to take each observation as a validation set one at a time. This means that if we have n observations, we will fit the model n times, and in every try we keep one observation as the test sample and train the model on the remaining n-1 observations.
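A short LOOCV sketch with scikit-learn (n is kept small here, since LOOCV fits the model n times):

```python
# Sketch: LOOCV fits the model n times, holding out one observation each time.
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50, random_state=0)   # n = 50 -> 50 fits
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=LeaveOneOut())
print("LOOCV accuracy estimate:", scores.mean())
```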
- Bootstrap Sampling
- K Fold Cross Validation
- Leave One Out Cross Validation
Resampling methods are very useful and beneficial in statistics and machine learning for fitting more accurate models, model selection, and parameter tuning. They draw samples from the training data and fit the model repeatedly to check its variability and gain additional information. We cannot be sure of a model's result from just a single fit without testing it on different samples. Resampling can be computationally expensive because the model is fitted more than once, but recent improvements tackle this issue without too much effort.
Cross validation resamples without replacement and thus produces surrogate data sets that are
smaller than the original. These data sets are produced in a systematic way so that after a pre-
specified number k of surrogate data sets, each of the n original cases has been left out exactly
once. This is called k-fold cross validation or leave-x-out cross validation with x=n/k, e.g. leave-
one-out cross validation omits 1 case for each surrogate set, i.e. k = n.
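A minimal k-fold sketch with k = 5 (dataset and model are illustrative assumptions):

```python
# Sketch: k-fold cross validation with k = 5 (each case left out exactly once).
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, random_state=0)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
print(scores.round(3), "mean:", scores.mean().round(3))
```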
In general, statistical inference uses data from a sample to make estimates or predictions about a population. Both bootstrapping and cross-validation are used to estimate a model's "standard error" on the population or, more simply, how our machine-learning algorithm will do in a production system on unseen data.
The main difference between the two methods is that bootstrapping is a resampling technique, while cross-validation is a partitioning technique. Bootstrapping involves random sampling with replacement from the training data set to create multiple new training sets. This means that bootstrapping will lower the variance of our machine-learning model.
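A small bootstrap sketch estimating the standard error of a sample mean (the data are made up):

```python
# Sketch: bootstrapping - resample with replacement, re-estimate, inspect the spread.
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=100)   # original sample

# 1000 bootstrap samples, each drawn with replacement from the data
boot_means = [resample(data, random_state=i).mean() for i in range(1000)]
print("bootstrap estimate of the standard error:", np.std(boot_means).round(3))
```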
The Classification algorithm is a Supervised Learning technique that is used to identify the category of new observations on the basis of training data. In Classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc. Classes can be called targets, labels, or categories.
The Naive Bayes method is a supervised learning algorithm based on applying Bayes' theorem with the "naive" assumption of conditional independence between every pair of features given the value of the class variable. Naive Bayes classifiers are a collection of classification algorithms based on Bayes' Theorem. It is not a single algorithm but a family of algorithms where all of them share a common principle, i.e. every pair of features being classified is independent of each other.
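A minimal Gaussian Naive Bayes sketch on the Iris dataset (the dataset choice is illustrative):

```python
# Sketch: Gaussian Naive Bayes, assuming conditionally independent features.
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
nb = GaussianNB().fit(X_tr, y_tr)
print("test accuracy:", nb.score(X_te, y_te))
```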
Support vector machine is based on statistical approaches. Here we try to find a hyperplane that best separates the two classes. SVM finds the maximum-margin hyperplane, meaning it maximizes the distance between the two classes. SVM works best when the dataset is small and complex. Only when the data is perfectly linearly separable can we use Linear SVM. When the data is not linearly separable, we can use Non-Linear SVM, which applies when the data points cannot be separated into 2 classes by a linear approach.
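A brief sketch contrasting linear and RBF-kernel SVMs on data a straight line cannot separate (make_moons is an illustrative choice):

```python
# Sketch: a non-linear (RBF kernel) SVM on data a line cannot separate.
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)
print("linear:", linear.score(X, y), "rbf:", rbf.score(X, y))  # rbf fits better
```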
A t-test is a statistical measure for determining differences between two means that may or may not be related. The testing uses randomly selected samples from the two categories or groups. It is a statistical method in which samples are chosen randomly, and a perfect normal distribution is not required.
McNemar's Test: This is a non-parametric test for the paired nominal data. This test is used
when we want to find the change in proportion for the paired data. This test is also known as
McNemar's Chi-Square test. This is because the test statistic has a chi-square distribution.
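A hedged sketch using statsmodels' mcnemar on a made-up 2x2 disagreement table:

```python
# Sketch: McNemar's test on paired predictions of two classifiers (made-up counts).
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of agreement/disagreement between classifier A and classifier B:
# rows: A correct / A wrong; columns: B correct / B wrong
table = [[60, 10],
         [25, 5]]
result = mcnemar(table, exact=True)    # exact binomial version of the test
print("statistic:", result.statistic, "p-value:", result.pvalue)
```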
The K-fold cross-validated paired t-test procedure is a common method for comparing the performance of two models (classifiers or regressors) and addresses some of the drawbacks of the resampled t-test procedure; however, this method still has the problem that the training sets overlap, and it is not recommended for use in practice.
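For illustration, a simple paired t-test on per-fold scores of two models evaluated on the same folds (models and data are illustrative; note the caveat above about overlapping training sets):

```python
# Sketch: paired t-test on per-fold scores of two models (same folds for both).
import numpy as np
from scipy import stats
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=0)
kf = KFold(n_splits=10, shuffle=True, random_state=0)
a = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)
b = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kf)
t_stat, p_val = stats.ttest_rel(a, b)   # paired: the same folds scored by both models
print(round(t_stat, 3), round(p_val, 3))
```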
PART-B & C
9. Briefly explain the popular algorithms that can be used for binary classification.