Machine Learning - AL3451 - Important Questions With Answer

Uploaded by

Vidhya Gopinath
Copyright
© All Rights Reserved

www.BrainKart.com

AL3451 MACHINE LEARNING

UNIT - I

TWO MARKS QUESTIONS AND ANSWERS (PART-A)

1. Define Machine Learning.

Machine Learning (ML) is the field of study that gives computers the capability to learn from data
without being explicitly programmed. ML is one of the most exciting technologies one is likely to
come across. As the name suggests, it gives computers the ability that makes them more similar to
humans: the ability to learn. Machine learning is actively used today, perhaps in many more places
than one would expect.

2. What are the applications of Machine Learning?

- Netflix's Recommendation Engine: The core of Netflix is its famous recommendation engine.
Over 75% of what you watch is recommended by Netflix, and these recommendations are made
using Machine Learning.

- Facebook's Auto-tagging feature: The logic behind Facebook's DeepFace face verification
system is Machine Learning and Neural Networks. DeepFace studies the facial features in an
image to tag your friends and family.

- Amazon's Alexa: Alexa, which is based on Natural Language Processing and Machine Learning,
is an advanced Virtual Assistant that does more than just play songs from your playlist. It can
book you an Uber, connect with the other IoT devices at home, track your health, etc.

- Google's Spam Filter: Gmail uses Machine Learning to filter out spam messages. It uses
Machine Learning algorithms and Natural Language Processing to analyze emails in real time and
classify them as either spam or non-spam.

3. List the types of Machine Learning?

- Supervised Learning
- Unsupervised Learning
- Reinforcement Learning

4. Define unsupervised Learning?

- Unsupervised learning involves training by using unlabeled data and allowing the model to act
on that information without guidance.

https://play.google.com/store/apps/details?id=info.therithal.brainkart.annauniversitynotes

- Think of unsupervised learning as a smart kid that learns without any guidance. In this type of
Machine Learning, the model is not fed labeled data; that is, the model has no clue that 'this
image is Tom and this is Jerry'. It figures out the patterns and the differences between Tom and
Jerry on its own by taking in large amounts of data.

5. Define deep learning?

- Deep learning is a machine learning technique that teaches computers to do what comes
naturally to humans: learn by example. Deep learning is a key technology behind driverless cars,
enabling them to recognize a stop sign, or to distinguish a pedestrian from a lamppost. It is the
key to voice control in consumer devices like phones, tablets, TVs, and hands-free speakers.
Deep learning is getting lots of attention lately and for good reason. It's achieving results that
were not possible before.

- In deep learning, a computer model learns to perform classification tasks directly from images,
text, or sound. Deep learning models can achieve state-of-the-art accuracy, sometimes
exceeding human-level performance.

- Models are trained by using a large set of labeled data and neural network architectures that
contain many layers.

6. What is meant by reinforcement learning?

- Reinforcement Learning is a part of Machine Learning where an agent is put in an environment
and learns to behave in it by performing actions and observing the rewards it gets from those
actions.

- This type of Machine Learning is quite different from the others. Imagine that you were dropped
off on an isolated island! What would you do?

- Panic? Yes, of course, initially we all would. But as time passes, you will learn how to live on
the island. You will explore the environment and understand the climate, the type of food that
grows there, the dangers of the island, etc. This is exactly how Reinforcement Learning works: it
involves an Agent (you, stuck on the island) that is put in an unknown environment (the island),
where it must learn by observing and performing actions that result in rewards.

- Reinforcement Learning is mainly used in advanced Machine Learning areas such as self-driving
cars, AlphaGo, etc.
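The island analogy can be made concrete with a tiny tabular Q-learning sketch. Q-learning is one standard reinforcement learning algorithm, chosen here as an assumption; the five-state corridor environment, the reward of 1 at the goal, the hyperparameter values, and the names `step` and `q_learning` are all hypothetical, made up for illustration.

```python
import random

# Hypothetical toy environment: a corridor of states 0..4. The agent starts in
# state 0, and reaching state 4 ends the episode with a reward of 1.
GOAL = 4
MOVES = (-1, +1)  # action 0 = move left, action 1 = move right


def step(state, action):
    """Apply an action; return (next_state, reward, done)."""
    next_state = min(max(state + MOVES[action], 0), GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, 0.0, False


def q_learning(episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]  # q[state][action]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # Epsilon-greedy: explore with probability epsilon (or on a tie),
            # otherwise take the action with the higher estimated value.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, steps = next_state, steps + 1
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(GOAL)]
print(policy)
```

After training, the greedy policy should be to move right in every state: the agent was never told where the reward is and discovers it purely by acting and observing rewards.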

7. Define VC Dimension.

The Vapnik-Chervonenkis dimension, more commonly known as the VC dimension, is a measure of the
capacity of a model (informally, how complex a set of functions it can represent), used in
statistics and machine learning. It is frequently used to guide the model selection process while
developing machine learning applications. To understand the VC dimension, we must first
understand shattering.

8. What is shattering?

A model (more precisely, a class of functions) shatters a set of points if, for every possible
assignment of two class labels to those points, it can produce a function that separates the two
classes perfectly. This differs from ordinary classification because it considers all possible
combinations of labels on those points, not just one labelling. In terms of shattering, the VC
dimension of a model is simply the size of the largest set of points that the model can shatter.
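Shattering can be checked by brute force for a very simple model. The sketch below assumes the hypothetical family of 1-D threshold classifiers h_t(x) = 1 if x >= t else 0 (not a model from the text) and distinct input points; it tries every possible labelling and reports whether some threshold produces it.

```python
from itertools import product


def threshold_labels(points, t):
    """Labels assigned by the 1-D threshold classifier h_t(x) = 1 if x >= t else 0."""
    return tuple(1 if x >= t else 0 for x in points)


def thresholds_shatter(points):
    """Return True if threshold classifiers can produce every labelling of `points`."""
    pts = sorted(points)
    # Only the position of t relative to the points matters, so it is enough to try one
    # threshold below all points, one between each neighbouring pair, and one above all.
    candidates = ([pts[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
                  + [pts[-1] + 1.0])
    achievable = {threshold_labels(points, t) for t in candidates}
    return all(labels in achievable for labels in product((0, 1), repeat=len(points)))


print(thresholds_shatter([0.0]))       # True: a single point can be shattered
print(thresholds_shatter([0.0, 1.0]))  # False: the labelling (1, 0) is impossible
```

Since one point can be shattered but no pair of points can, the VC dimension of this threshold family is 1.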

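9. What is a Hypothesis?

A hypothesis is a supposition or proposed explanation made on the basis of limited evidence or
assumptions. It is an educated guess based on some known facts that has not yet been proven. A
good hypothesis is testable: testing it shows it to be either true or false. In machine learning, a
hypothesis is a candidate function that maps inputs to outputs, and learning means choosing a
good one from the set of possible hypotheses (the hypothesis space).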

10. What is Null Hypothesis?

Null Hypothesis: A null hypothesis is a type of statistical hypothesis which states that no
statistically significant effect exists in the given set of observations. It is a type of
conjecture used in quantitative analysis, for example to test theories about markets, investment,
and finance, in order to decide whether an idea should be rejected.

11. What is Alternative Hypothesis?

Alternative Hypothesis: An alternative hypothesis is a direct contradiction of the null
hypothesis, which means that if one of the two hypotheses is true, the other must be false. In
other words, an alternative hypothesis is a type of statistical hypothesis which states that some
significant effect exists in the given set of observations.

12. State P-value.

- The p-value is a measure of the evidence against the null hypothesis. More precisely, it is the
probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the
one actually observed.

- The smaller the p-value, the stronger the evidence against the null hypothesis, and vice versa;
a small enough p-value means the null hypothesis can be rejected. It is usually written in decimal
form, such as 0.035.

- Whenever a statistical test is carried out on a sample, the p-value is compared with a
significance level (α, commonly 0.05) chosen before the test. If the p-value is less than the
significance level, the effect is significant and the null hypothesis can be rejected. If it is
higher, there is no significant effect, and we fail to reject the null hypothesis.
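A worked example of the definition above, as a minimal sketch: the null hypothesis is assumed to be "the coin is fair", the alternative "the coin favours heads", and an exact one-sided binomial test is used for simplicity. The coin-flip data and the function name `binomial_p_value` are hypothetical.

```python
from math import comb


def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided p-value: P(X >= heads) when X ~ Binomial(flips, p_null) under H0."""
    return sum(comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)
               for k in range(heads, flips + 1))


alpha = 0.05                    # significance level chosen before the test
p = binomial_p_value(9, 10)     # 9 heads in 10 flips of a supposedly fair coin
print(round(p, 4))              # 0.0107
print("reject H0" if p < alpha else "fail to reject H0")  # reject H0
```

The p-value is (10 + 1) / 1024: the probability of 9 or 10 heads if the coin really is fair. It is below α, so the null hypothesis is rejected.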

13. Define Variance-bias trade-off.

The prediction results of a machine learning model fall into one of four combinations: (a) low-bias,
low-variance, (b) low-bias, high-variance, (c) high-bias, low-variance, and (d) high-bias,
high-variance. A low-bias, high-variance model is called overfit, and a high-bias, low-variance
model is called underfit. To generalize well, we look for the best trade-off between underfitting
and overfitting so that the trained model obtains the best performance. An overfit model obtains a
high prediction score on seen data and a low one on unseen datasets. An underfit model has low
performance on both seen and unseen datasets.
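The trade-off can be seen with k-nearest-neighbour regression, where k controls flexibility: k = 1 memorises the training set (low bias, high variance), while k equal to the training-set size always predicts the overall mean (high bias, low variance). This is a minimal sketch; the data set, with its fixed +1/-1 "noise" pattern, and the helper names are hypothetical.

```python
def knn_predict(train, x, k):
    """k-nearest-neighbour regression: average the targets of the k closest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    """Mean squared error of the k-NN model (fitted on `train`) over `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


# Hypothetical data: the true relation is y = x; the training targets carry a fixed
# +1/-1 "noise" pattern, and the test points lie between the training inputs.
train = [(x, x + (1 if x % 2 == 0 else -1)) for x in range(10)]
test = [(x + 0.5, x + 0.5) for x in range(9)]

for k in (1, 3, 10):   # k = 1: very flexible; k = 10: always predicts the overall mean
    print(f"k={k:2d}  train MSE={mse(train, train, k):.3f}  test MSE={mse(train, test, k):.3f}")
```

With this data, k = 1 has zero training error but a higher test error than k = 3 (overfitting), and k = 10 has high error on both sets (underfitting); k = 3 sits in between.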

14. Explain Model complexity.

When a machine learning model becomes too complex, it is usually prone to overfitting. Methods that
help to make the model simpler are called Regularization methods.

15. State Regularization.

Regularization is a collection of methods that make a machine learning model simpler. Different
approaches are applied to different machine learning algorithms, for instance, pruning for
decision trees, dropout for neural networks, and adding a penalty term (weighted by a
regularization parameter) to the cost function in Regression.
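The effect of adding a penalty term to the cost function can be shown with ridge (L2) regression on a single feature with no intercept, where the minimiser has a closed form. A minimal sketch; the data and the function name `ridge_slope` are hypothetical.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for y ≈ w * x (one feature, no intercept).

    Minimises sum((y - w*x)**2) + lam * w**2; setting the derivative to zero
    gives w = sum(x*y) / (sum(x**2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # hypothetical data with true slope 2
for lam in (0.0, 1.0, 10.0):
    print(lam, round(ridge_slope(xs, ys, lam), 3))  # 2.0, then 1.867, then 1.167
```

A larger penalty shrinks the weight toward zero, giving a simpler model that fits the training data less exactly.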

PART-B & C

1. What is Machine Learning Process?

2. What are the types of machine learning?

3. Write some applications of machine learning.

4. Write a detail note on Linear algebra.

5. Write some popular examples of linear algebra and explain.

6. What are the types of problem in machine learning?

7. Write in detail about Vapnik-Chervonenkis (VC) dimension?

8. Write About Hypothesis in Machine Learning

9. What are the ways to ensure that a machine learning model generalizes?

10. What factors determine whether a trained model generalizes?

11. Compare Bias and Variance

UNIT – II

TWO MARKS QUESTIONS & ANSWERS (PART-A)

1. Define Regression algorithms.

Regression algorithms are used when the output is a continuous variable, whereas classification
algorithms are used when the output is divided into categories such as Pass/Fail or
Good/Average/Bad. There are various algorithms for performing regression or classification, with
the Linear Regression algorithm being the basic algorithm for regression.

2. What are the two types of Linear Regression?

- Simple Linear Regression
- Multiple Linear Regression

3. Write the simple linear regression equation.

$Y = \beta_0 + \beta_1 X + \varepsilon$

Y represents the output or dependent variable, and X represents the input or independent variable.
β0 and β1 are two unknown constants that represent the intercept and the coefficient (slope),
respectively.
ε (Epsilon) is the error term.
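The unknown constants β0 and β1 are usually estimated by ordinary least squares. A minimal sketch of the closed-form estimates; the hours-studied and marks data are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares estimates (beta0, beta1) for y = beta0 + beta1 * x + error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # beta1 = covariance(x, y) / variance(x); beta0 makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Hypothetical data: hours studied (x) and marks scored (y).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]
b0, b1 = fit_simple_linear_regression(hours, marks)
print(round(b0, 2), round(b1, 2))  # 47.7 4.1
```

Here each extra hour of study is associated with about 4.1 more marks, and 47.7 is the predicted score at zero hours.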

4. Write Applications of Simple Linear Regression.

- Predicting crop yields based on the amount of rainfall: Yield is the dependent variable, while
the amount of rainfall is the independent variable.

- Marks scored by a student based on the number of hours studied (ideally): Here marks scored is
dependent and the number of hours studied is independent.

- Predicting the salary of a person based on years of experience: Experience is the independent
variable, while Salary is the dependent variable.

5. What is meant by Homogeneity of variance?

Homogeneity of variance (homoscedasticity) means that the size of the error in our predictions
doesn't change significantly across the values of the independent variable.

6. Define Linearity?

The line of best fit through the data points is a straight line, rather than a curve or some sort of
grouping factor.

7. How does Gradient Descent work?

Before looking at how gradient descent works, we should recall how the slope of a line is used in
linear regression. The equation for simple linear regression is given as: Y = mX + c,
where 'm' represents the slope of the line and 'c' represents the intercept on the y-axis.

Gradient descent starts from an arbitrary point (initial values of the parameters), at which the
performance (cost) is evaluated. At this starting point, we compute the first derivative, i.e. the
slope of the cost function (the steepness of its tangent line). This slope then informs the updates
to the parameters (weights and bias): they are moved a small step, scaled by the learning rate, in
the direction opposite to the slope, and the process repeats until the cost stops decreasing.
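The procedure above, applied to fitting Y = mX + c by minimising the mean squared error, as a minimal sketch. The data points, learning rate, and iteration count are assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=5000):
    """Fit y = m*x + c by batch gradient descent on the mean squared error."""
    m, c = 0.0, 0.0                      # arbitrary starting point
    n = len(xs)
    for _ in range(iterations):
        # Partial derivatives (slopes) of MSE = (1/n) * sum((m*x + c - y)**2).
        errors = [m * x + c - y for x, y in zip(xs, ys)]
        grad_m = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_c = (2 / n) * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        m -= learning_rate * grad_m
        c -= learning_rate * grad_c
    return m, c


m, c = gradient_descent([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # points on y = 2x + 1
print(round(m, 3), round(c, 3))  # 2.0 1.0
```

Too large a learning rate makes the updates overshoot and diverge; too small a rate needs many more iterations.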

8. Write the Advantages of Batch gradient descent.

- It produces less noise in comparison to the other types of gradient descent.
- It produces stable gradient descent convergence.
- It is computationally efficient, as all resources are used for processing all training samples together.

9. What is SGD?

Stochastic gradient descent (SGD) is a type of gradient descent that processes one training
example per iteration. In other words, within each training epoch it goes through the examples in
the dataset one at a time and updates the model's parameters after each example. As it requires
only one training example at a time, it is easier to fit in allocated memory.

10. Write about Mini Batch Gradient Descent.

Mini Batch gradient descent is a combination of batch gradient descent and stochastic gradient
descent. It divides the training dataset into small batches and then performs the updates on those
batches separately. Splitting the training dataset into smaller batches strikes a balance between
the computational efficiency of batch gradient descent and the speed of stochastic gradient
descent. Hence, we achieve a type of gradient descent with higher computational efficiency and
less noisy gradients.
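Batch, stochastic, and mini-batch gradient descent differ only in how many examples are used per update, so one function with a `batch_size` parameter covers all three. A minimal sketch for fitting y = m*x + c; the data, learning rate, number of epochs, and seed are assumptions.

```python
import random


def minibatch_gd(xs, ys, batch_size, learning_rate=0.02, epochs=2000, seed=0):
    """Fit y = m*x + c with mini-batch gradient descent.

    batch_size == len(xs) gives batch gradient descent, batch_size == 1 gives SGD,
    and anything in between is mini-batch gradient descent.
    """
    rng = random.Random(seed)
    m, c = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                        # new sample order every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(m * xs[i] + c - ys[i], xs[i]) for i in batch]
            grad_m = 2 * sum(e * x for e, x in errors) / len(batch)
            grad_c = 2 * sum(e for e, _ in errors) / len(batch)
            m -= learning_rate * grad_m             # update after every batch
            c -= learning_rate * grad_c
    return m, c


xs, ys = [0, 1, 2, 3, 4], [1, 3, 5, 7, 9]          # points on y = 2x + 1
for size in (len(xs), 1, 2):                        # batch, stochastic, mini-batch
    m, c = minibatch_gd(xs, ys, size)
    print(size, round(m, 3), round(c, 3))
```

On this noise-free data all three variants end up near m = 2 and c = 1; they differ in how many updates are made per epoch and how noisy each update is.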

11. What is QDA?

In Quadratic Discriminant Analysis (QDA), when there are multiple input variables, each class uses
its own estimate of the variance (actually the covariance).

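12. What is FDA?

Flexible Discriminant Analysis (FDA) is an extension of LDA that uses non-linear combinations of
the inputs, such as splines. (Using regularization in the estimate of the variance, actually the
covariance, to moderate the influence of different variables on LDA is Regularized Discriminant
Analysis, RDA.)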

13. What is Binary classifier?

A binary classifier is a function that decides whether an input, represented as a vector of
numbers, belongs to a specific class or not. A linear binary classifier makes this decision using a
linear predictor function that combines a weight vector with the feature vector.

14. List some discriminative models.

- Support Vector Machine
- Logistic Regression
- k-Nearest Neighbour (KNN)
- Random Forest
- Deep Neural Network (such as AlexNet, VGGNet, and ResNet)

15. List the types of Generative Models.

- Naive Bayes
- Hidden Markov Models
- Autoencoder
- Boltzmann Machines
- Variational Autoencoder
- Generative Adversarial Networks

16. Write applications of Naïve Bayes Classifier.

- It is used for Credit Scoring.
- It is used in medical data classification.
- It can be used in real-time predictions because the Naïve Bayes Classifier is an eager learner.
- It is used in Text classification such as Spam filtering and Sentiment analysis.
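A minimal sketch of the spam-filtering application: a multinomial Naïve Bayes classifier over word counts with add-one (Laplace) smoothing, both of which are assumptions about the variant used. The four training messages and the function names are hypothetical.

```python
import math
from collections import Counter


def train_naive_bayes(documents):
    """documents: list of (text, label). Returns class counts, per-class word counts, vocabulary."""
    priors, word_counts, vocabulary = Counter(), {}, set()
    for text, label in documents:
        priors[label] += 1
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        vocabulary.update(words)
    return priors, word_counts, vocabulary


def predict(text, priors, word_counts, vocabulary):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    total_docs = sum(priors.values())
    best_label, best_score = None, -math.inf
    for label in priors:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(priors[label] / total_docs)
        for word in text.lower().split():
            # Laplace (add-one) smoothing so an unseen word does not make the probability 0.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set.
docs = [("win money now", "spam"), ("cheap money offer", "spam"),
        ("meeting at noon", "ham"), ("project meeting notes", "ham")]
model = train_naive_bayes(docs)
print(predict("win cheap money", *model))     # spam
print(predict("notes from meeting", *model))  # ham
```

Training is a single counting pass and prediction is a sum of logarithms, which is why the classifier is fast enough for real-time use. Working in log space avoids underflow when many small probabilities are multiplied.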

17. Write about SVM.

Support Vector Machine (SVM) is a non-parametric, supervised learning technique that is very popular
with engineers because it often produces excellent results with comparatively little computation. It
can be applied to both classification (the output is a discrete class) and regression (the output is
continuous) problems. It is widely used in text classification, image classification, and protein
and gene classification.

18. Define Random Forest.

Like SVM, Random Forest falls in the class of discriminative models. It is one of the most popular
and powerful Machine Learning algorithms for classification and regression. It builds an ensemble
of decision trees, each trained on a random sample of the data, and combines their predictions.
Random Forest became popular in the Kaggle community because it helped win many competitions.

PART-B & C

1. Explain the types of Linear Regression?

2. Write in detail about Least Square Method?

3. Write in detail about Bayesian Linear Regression?

4. What are the Types of Gradient Descent?

5. Explain about Linear Classification?

6. Discuss about Real-world Applications of LDA?

7. Write Briefly about Perceptron working Model?

8. Compare Generative and Discriminative Models?

9. What is Deep Neural Network?

10. Compare Linear Regression and Logistic Regression?

11. Write in detail about Generative Modelling?

12. Explain in detail about Naïve Bayes Classifier Algorithm?

13. Write in detail about Bayes' Theorem?

14. Explain in detail about Maximum Margin Classifier?

15. Write in detail about Decision tree?

UNIT - III

TWO MARKS QUESTIONS AND ANSWERS (PART-A)

1. Define Ensemble methods.

Ensemble methods are machine learning techniques that combine several base models in order to
produce one optimal predictive model. Decision Trees are a convenient example for outlining the
definition and practicality of Ensemble Methods (however, it is important to note that Ensemble
Methods do not only pertain to Decision Trees).

2. List the Ensemble Techniques.

A few simple but powerful techniques are:
- Max Voting
- Averaging
- Weighted Averaging
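The three simple techniques above combine the predictions of several already-trained models. A minimal sketch; the predictions and weights are hypothetical numbers, not outputs of real models.

```python
from collections import Counter


def max_voting(predictions):
    """Max voting: return the class label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]


def averaging(predictions):
    """Averaging: mean of the models' numeric predictions (e.g. probabilities)."""
    return sum(predictions) / len(predictions)


def weighted_averaging(predictions, weights):
    """Weighted averaging: models judged more reliable get larger weights."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


print(max_voting(["spam", "ham", "spam"]))                # spam
print(averaging([0.25, 0.5, 0.75]))                       # 0.5
print(weighted_averaging([0.25, 0.5, 0.75], [1, 1, 2]))   # 0.5625
```

Max voting suits classification labels, while (weighted) averaging suits regression outputs or class probabilities.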

3. What are the Advanced Ensemble techniques

- Stacking
- Blending
- Bagging
- Boosting

4. What do you mean by Unsupervised learning

Unsupervised learning, also known as unsupervised machine learning, uses machine learning
algorithms to analyze and cluster unlabeled datasets. These algorithms discover hidden
patterns or data groupings without the need for human intervention. Its ability to discover
similarities and differences in information makes it an ideal solution for exploratory data
analysis, cross-selling strategies, customer segmentation, and image recognition.

5. List some of the popular unsupervised learning algorithms.

- K-means clustering
- KNN (k-nearest neighbors)
- Hierarchical clustering
- Anomaly detection
- Neural Networks
- Principal Component Analysis
- Independent Component Analysis
- Apriori algorithm
- Singular value decomposition

6. State the Applications of K-Means Clustering.

K-Means clustering is used in a variety of examples or business cases in real life, like:
- Academic performance
- Diagnostic systems
- Search engines
- Wireless sensor networks
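All of these uses rest on the same two-step loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal sketch for 2-D points; taking the first k points as initial centroids is a simplifying assumption (k-means++ initialisation is common in practice), and the data is hypothetical.

```python
def kmeans(points, k, iterations=100):
    """Plain k-means on 2-D points; the first k points are used as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(range(k),
                          key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)
            clusters[nearest].append((x, y))
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        new_centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:   # converged: the centroids no longer move
            break
        centroids = new_centroids
    return centroids


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(sorted(kmeans(data, 2)))  # [(0.0, 0.5), (10.0, 10.5)]
```

Each iteration costs one distance computation per point and centroid, which is why k-means scales well to large datasets.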

7. State the Gaussian Mixture Model (GMM).

This model is a soft probabilistic clustering model that allows us to describe the membership of
points to a set of clusters using a mixture of Gaussian densities. It is a soft classification (in
contrast to a hard one) because it assigns probabilities of belonging to a specific class instead of
a definitive choice. In essence, each observation will belong to every class but with different
probabilities.

8. Mention the pros and cons of KNN Algorithm.

Advantages of KNN Algorithm:
- It is simple to implement.
- It is robust to noisy training data.
- It can be more effective if the training data is large.

Disadvantages of KNN Algorithm:
- The value of K always needs to be determined, which can sometimes be difficult.
- The computation cost is high because the distance between the data points must be calculated
for all the training samples.
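A minimal k-NN classifier illustrates both sides of the list: it needs no training step, but every prediction measures the distance to every training sample. The training points and labels are hypothetical; Euclidean distance and majority voting are assumed.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of ((x, y), label) pairs. Every prediction computes the distance to all
    training samples, which makes prediction expensive on large datasets.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify(train, (2, 2)))  # A
print(knn_classify(train, (7, 7)))  # B
```

Choosing an odd k avoids ties between two classes in the vote.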

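9. What is meant by Expectation Maximization (EM) intuition?

The Expectation-Maximization algorithm follows the same iterative idea that is used to fit GMMs. In
fact, the GMM fitting procedure (alternately estimating cluster memberships and re-estimating the
Gaussian parameters) is a specific implementation of the EM algorithm. The EM algorithm is simply
defined more generally and formally, so it can be applied to many other optimization problems.

So the general idea is that we are trying to maximize a likelihood (more frequently a
log-likelihood); that is, we are trying to solve the following optimization problem:

$\hat{\theta} = \arg\max_{\theta} \log p(X \mid \theta)$

Each iteration alternates an E-step, which computes the expected membership of each point under
the current parameters θ, and an M-step, which re-estimates θ to maximize the likelihood given
those memberships.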
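A minimal sketch of EM for a one-dimensional Gaussian mixture, showing the E-step (responsibilities) and M-step (weighted re-estimation). The data, the initial guesses, and the fixed iteration count are assumptions; there is also no safeguard against a component collapsing onto a single point, which a practical implementation would need.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture model with EM; returns the parameters and log-likelihood history."""
    means, variances, weights = list(means), list(variances), list(weights)
    k, n = len(means), len(data)
    history = []
    for _ in range(iterations):
        # E-step: resp[i][j] = P(component j | x_i) under the current parameters.
        resp, log_likelihood = [], 0.0
        for x in data:
            weighted = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(weighted)
            log_likelihood += math.log(total)
            resp.append([w / total for w in weighted])
        history.append(log_likelihood)
        # M-step: re-estimate each component from the responsibility-weighted data.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            weights[j] = nj / n
    return means, variances, weights, history


data = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]   # hypothetical: two groups
means, variances, weights, history = em_gmm_1d(data, [1.0, 8.0], [1.0, 1.0], [0.5, 0.5])
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

The means should settle near 0 and 10 with equal weights, and the recorded log-likelihood never decreases from one iteration to the next, which is the key guarantee of EM.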

PART-B & C

1. Elaborate Ensemble Techniques in detail.

2. Compare the Advanced Ensemble techniques.

3. Explain the Working of Unsupervised Learning.

4. Clarify on Unsupervised Learning Algorithm.

5. Detail the Steps to create k-means clusters.

6. Demonstrate Gaussian Mixture Model (GMM).

7. Discuss on Expectation Maximization (EM) intuition.

UNIT - IV

TWO MARKS QUESTIONS AND ANSWERS (PART-A)

1. What is the Perceptron model in Machine Learning?

Perceptron is a Machine Learning algorithm for supervised learning of binary classification tasks.
Perceptron is also understood as an Artificial Neuron, or neural network unit, that performs
computations to detect certain features in input data (for example, in business intelligence). The
Perceptron model is also treated as one of the best and simplest types of Artificial Neural
networks.

2. How does Perceptron work?

In Machine Learning, Perceptron is considered a single-layer neural network that consists of four
main components: input values (input nodes), weights and bias, the net sum, and an activation
function. The perceptron model begins by multiplying all input values by their weights, then adds
these values together to create the weighted sum. This weighted sum is then applied to the
activation function 'f' to obtain the desired output. This activation function is also known as the
step function.
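The steps above, plus the classic perceptron learning rule for adjusting the weights, as a minimal sketch. The logical AND data set, the learning rate of 1, and zero initial weights are assumptions chosen because AND is linearly separable, so the rule is guaranteed to converge.

```python
def step(z):
    """Step activation: fire (1) if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def train_perceptron(samples, targets, learning_rate=1.0, epochs=20):
    weights, bias = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in zip(samples, targets):
            # Weighted sum of the inputs plus bias, passed through the step function.
            output = step(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = target - output
            if error != 0:
                # Perceptron learning rule: nudge the weights toward the correct output.
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:      # every sample classified correctly: stop early
            break
    return weights, bias


# Logical AND is linearly separable, so the perceptron rule converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([step(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in X])  # [0, 0, 0, 1]
```

The same rule never converges on a problem such as XOR, where no single straight line separates the classes.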

3. List down the advantages of Multi-Layer Perceptron.

- A multi-layered perceptron model can be used to solve complex non-linear problems.
- It works well with both small and large input data.
- It helps us to obtain quick predictions after training.
- It helps to obtain the same accuracy ratio with large as well as small data.
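To illustrate the first advantage: XOR is not linearly separable, so a single perceptron cannot represent it, but a network with one hidden layer can. In this minimal sketch the weights are hand-picked for illustration, not learned by training.

```python
def step(z):
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    """A 2-2-1 multi-layer perceptron with hand-picked (not trained) weights."""
    h_or = step(1 * x1 + 1 * x2 - 0.5)       # hidden unit 1 behaves like OR
    h_and = step(1 * x1 + 1 * x2 - 1.5)      # hidden unit 2 behaves like AND
    return step(1 * h_or - 1 * h_and - 0.5)  # output: OR and not AND, i.e. XOR


print([xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

The hidden layer transforms the inputs into a new representation in which the output unit can separate the classes with a single line.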

4. Compare the types of Activation Functions.

5. Define an Activation function.

An activation function is a function that is added to an artificial neural network in order to help
the network learn complex patterns in the data, mainly by introducing non-linearity. Compared with
the neurons in our brains, the activation function sits at the end of a neuron and decides what is
fired to the next neuron. Without it, a neuron would simply pass on an unbounded weighted sum and
would not be able to decide its firing pattern. Thus, the activation function is an important part
of an artificial neural network.
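Three common activation functions, written out as a minimal sketch so their output ranges can be compared. The function names are chosen here for illustration; `math.tanh` is the standard-library hyperbolic tangent.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))   # squashes any input into (0, 1)


def tanh(z):
    return math.tanh(z)             # squashes any input into (-1, 1)


def relu(z):
    return max(0.0, z)              # passes positive inputs, zeroes negative ones


for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

Sigmoid and tanh flatten out for large positive or negative inputs, while ReLU stays linear for positive inputs.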

6. How do you train a neural network?

In the process of training, we start with a poorly performing neural network and want to end up
with a network with high accuracy. In terms of the loss function, we want the loss to be much lower
at the end of training. Improving the network is possible because we can change the function it
computes by adjusting its weights. We want to find another function that performs better than the
initial one.

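7. What is the objective of Gradient Descent?

Gradient, in plain terms, means the slope or slant of a surface, so gradient descent literally
means descending a slope to reach the lowest point on that surface. Its objective is to minimize
the loss (cost) function by repeatedly moving the parameters in the direction opposite to the
gradient.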

8. List the types of Gradient Descent.

Typically, there are three types of Gradient Descent:
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent

9. Define a Stochastic Gradient Descent (SGD):

In Stochastic Gradient Descent, a few samples are selected randomly instead of the whole data
set for each iteration. In Gradient Descent, there is a term called "batch" which denotes the
total number of samples from a dataset that is used for calculating the gradient for each
iteration. In typical Gradient Descent optimization, like Batch Gradient Descent, the batch is
taken to be the whole dataset.

10. Write an algorithm for stochastic gradient descent

1. The algorithm starts at a random point by initializing the weights with random values
2. Then it calculates the gradients at that random point
3. Then it moves in the opposite direction of the gradient
4. The process continues to repeat itself until it finds the point of minimum loss

11. What do you mean by back propagation?

Backpropagation refers to the whole process encompassing both the calculation of the gradient and
its use in stochastic gradient descent. Technically, backpropagation is used to calculate the
gradient of the network's error with respect to the network's modifiable weights. Backpropagation
is characterized by an iterative, recursive, and efficient approach through which it computes the
updated weights to improve the network until it is able to perform the task for which it is being
trained.

12. What is a Backpropagation of error?

Each output unit compares its activation $y_k$ with the target value $t_k$ to determine the
associated error for that unit. Based on this error, the factor $\delta_k$ ($k = 1, \dots, m$) is
computed and is used to distribute the error at output unit $Y_k$ back to all units in the previous
layer. Similarly, the factor $\delta_j$ ($j = 1, \dots, p$) is computed for each hidden unit $Z_j$.

13. How Backpropagation Algorithm Works

- Inputs X arrive through the preconnected path.
- The input is modeled using real weights W. The weights are usually randomly selected.
- Calculate the output of every neuron, from the input layer, through the hidden layers, to the output layer.
- Calculate the error in the outputs: Error = Actual Output - Desired Output.
- Travel back from the output layer to the hidden layer to adjust the weights such that the error is
decreased.
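The forward pass, output error, and backward pass above, for a hypothetical network with one sigmoid hidden neuron and one linear output neuron, as a minimal sketch. The backpropagated gradients are checked against finite differences, and one weight update is applied. The parameter values, the squared-error loss, and the learning rate are assumptions.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def forward(params, x):
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)   # hidden neuron
    o = w2 * h + b2            # linear output neuron
    return h, o


def loss(params, x, target):
    return 0.5 * (forward(params, x)[1] - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss with respect to (w1, b1, w2, b2) via the chain rule."""
    w1, b1, w2, b2 = params
    h, o = forward(params, x)
    delta_out = o - target                       # error at the output (actual - desired)
    delta_hidden = delta_out * w2 * h * (1 - h)  # error passed back through w2 and sigmoid'
    return [delta_hidden * x, delta_hidden, delta_out * h, delta_out]


params, x, target = [0.5, -0.3, 1.2, 0.1], 0.8, 1.0
grads = backprop(params, x, target)

# Check each gradient against a central finite difference.
eps = 1e-6
for i in range(4):
    plus, minus = list(params), list(params)
    plus[i] += eps
    minus[i] -= eps
    numeric = (loss(plus, x, target) - loss(minus, x, target)) / (2 * eps)
    print(i, round(grads[i], 6), round(numeric, 6))

# One gradient-descent update using the backpropagated gradients lowers the loss.
learning_rate = 0.5
new_params = [p - learning_rate * g for p, g in zip(params, grads)]
print(loss(params, x, target), loss(new_params, x, target))
```

A finite-difference check like this is a common way to verify a hand-written backward pass before trusting it in training.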

14. List the Types of Backpropagation

There are two types of Backpropagation, as follows:
- Static Back Propagation: In this type of backpropagation, a static output is produced from the
mapping of a static input. It is used to solve static classification problems like optical
character recognition.
- Recurrent Backpropagation: In recurrent backpropagation, activity is fed forward until it reaches
a specific fixed value or threshold value. After that, the error is evaluated and propagated
backward.

15. What is Unit saturation (vanishing gradient problem)

The vanishing gradient problem is an issue that sometimes arises when training machine
learning algorithms through gradient descent. This most often occurs in neural networks that
have several neuronal layers such as in a deep learning system, but also occurs in recurrent
neural networks.

The key point is that the calculated partial derivatives used to compute the gradient become smaller
and smaller as one goes deeper (backwards) into the network. Since the gradients control how much
the network learns during training, if the gradients are very small or zero, then little to no
training can take place, leading to poor predictive performance.
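Why the gradients shrink: backpropagation multiplies one activation derivative per layer, and the sigmoid's derivative is at most 0.25. This minimal sketch assumes, for simplicity, that every layer has the same pre-activation z and the same weight; `gradient_factor` is a hypothetical helper.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)   # never larger than 0.25 (its value at z = 0)


def gradient_factor(depth, z=0.0, weight=1.0):
    """Product of weight * sigmoid'(z) over `depth` layers, as picked up during backpropagation."""
    factor = 1.0
    for _ in range(depth):
        factor *= weight * sigmoid_derivative(z)
    return factor


for depth in (1, 5, 10, 20):
    print(depth, gradient_factor(depth))
```

Even in the best case (z = 0), the factor is 0.25 ** depth, so after 20 layers the gradient reaching the first layer is below one trillionth of its original size. ReLU avoids this for positive inputs because its derivative there is exactly 1.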

16. Define ReLU

The rectified linear activation unit, or ReLU, is one of the few landmarks in the deep learning
revolution. It is simple, yet in deep networks it usually works much better than earlier activation
functions such as sigmoid or tanh, partly because it does not saturate for positive inputs.

The ReLU formula is: $f(x) = \max(0, x)$

17. State Hyperparameter tuning

A Machine Learning model is defined as a mathematical model with a number of parameters that need
to be learned from the data. By training a model with existing data, we are able to fit the model
parameters. However, there is another kind of parameter, known as hyperparameters, that cannot be
directly learned from the regular training process. They are usually fixed before the actual
training process begins. These parameters express important properties of the model, such as its
complexity or how fast it should learn. Hyperparameter tuning is the process of searching for the
hyperparameter values that give the best model performance.

18. What is Randomized Search CV?

Randomized Search CV addresses the main drawback of Grid Search CV, which evaluates every
combination of hyperparameter values in the grid. Instead, it tries only a fixed number of
hyperparameter settings, sampled at random from the grid, to find the best set of hyperparameters.
This approach reduces unnecessary computation.
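The difference in cost between the two searches, as a minimal sketch. The search space and `validation_score` are hypothetical stand-ins for "train a model with these hyperparameters and score it on validation data"; this is not the scikit-learn API, and unlike scikit-learn's RandomizedSearchCV this sketch samples with replacement, so it may repeat a setting.

```python
import itertools
import math
import random

# Hypothetical search space: 5 x 10 = 50 combinations.
grid = {"learning_rate": [0.0001, 0.001, 0.01, 0.1, 1.0], "depth": list(range(1, 11))}


def validation_score(params):
    """Hypothetical stand-in for training a model and scoring it on validation data.

    It peaks at learning_rate = 0.01 and depth = 5.
    """
    return -(math.log10(params["learning_rate"]) + 2) ** 2 - (params["depth"] - 5) ** 2


def grid_search(grid):
    """Evaluate every combination in the grid (what Grid Search CV does)."""
    candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
    return max(candidates, key=validation_score), len(candidates)


def random_search(grid, n_iter, seed=0):
    """Evaluate only n_iter randomly sampled combinations (the idea behind Randomized Search CV)."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()} for _ in range(n_iter)]
    return max(candidates, key=validation_score), len(candidates)


best, evaluations = grid_search(grid)
print("grid search:", best, "after", evaluations, "evaluations")
best_random, evaluations_random = random_search(grid, n_iter=10)
print("random search:", best_random, "after", evaluations_random, "evaluations")
```

Random search spends a fixed budget (here 10 of the 50 evaluations) and may not hit the exact optimum, but it often finds a good setting much more cheaply, especially when only a few hyperparameters really matter.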

19. Clarify the concept of Regularization in Machine Learning.

Regularization is an application of Occam's Razor. It is one of the key concepts in Machine
learning as it helps choose a simple model rather than a complex one. We want our model to
perform well both on the training data and on new, unseen data, meaning the model must be able
to generalize. Generalization error is "a measure of how accurately an algorithm can
predict outcome values for previously unseen data."

Regularization refers to the modifications that can be made to a learning algorithm that help
to reduce this generalization error but not its training error. It does this by adding a penalty
on large weights to the loss function, which shrinks the weights of less important features (L2
regularization) or sets them to exactly zero (L1 regularization). It also helps prevent
overfitting, making the model more robust and decreasing the complexity of a model.
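To make "shrinking" and "ignoring" less important features concrete, here is a one-feature sketch (no intercept, our simplifying assumption) of the two most common penalties. With squared-error loss, the L2 (ridge) penalty λw² gives the closed form w = Σxy / (Σx² + λ), which shrinks the weight toward zero; the L1 (lasso) penalty λ|w| gives a soft-thresholded solution that becomes exactly zero once λ is large enough. The data is made up.

```python
import math
from typing import Sequence


def ridge_weight(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Minimises sum((y - w*x)**2) + lam * w**2 for a single feature."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)


def lasso_weight(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Minimises sum((y - w*x)**2) + lam * abs(w) for a single feature
    (soft-thresholding of sum(x*y) by lam / 2)."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return math.copysign(max(abs(sxy) - lam / 2.0, 0.0), sxy) / sxx


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # true relationship y = 2x
for lam in (0.0, 14.0, 56.0):
    print(lam, ridge_weight(x, y, lam), lasso_weight(x, y, lam))
# 0.0 2.0 2.0
# 14.0 1.0 1.5
# 56.0 0.4 0.0
```

As λ grows, the ridge weight keeps shrinking but stays non-zero, while the lasso weight reaches exactly 0, which is how L1 regularization can drop a feature entirely.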

20. Define Dropout.


"Dropout" in machine learning refers to the process of randomly ignoring certain nodes in a
layer during training. In the figure below, the neural network on the left represents a typical
neural network where all units are active. On the right, the red units have been dropped out
of the model; their outputs are set to zero, so their weights and biases are not used or updated
in that training step.

21. How Does Dropout Work?

When we apply dropout to a neural network, we're creating a "thinned" network with unique
combinations of the units in the hidden layers being dropped randomly at different points in
time during training. Each time the gradient of our model is updated, we generate a new
thinned neural network with different units dropped based on a probability hyperparameter p.
Training a network using dropout can thus be viewed as training loads of different thinned
neural networks and merging them into one network that picks up the key properties of each
thinned network. This process allows dropout to reduce the overfitting of models on training
data.
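A minimal sketch of dropout applied to one layer's activations. Two assumptions are ours: p is taken to be the probability of dropping a unit (the convention of PyTorch's `nn.Dropout` and Keras's `Dropout(rate)`; the original dropout paper instead uses p for the probability of keeping a unit), and the "inverted" variant is used, which rescales the surviving units during training so nothing needs rescaling at test time. The random generator is seeded only to make the example reproducible.

```python
import random
from typing import List, Optional, Tuple


def dropout(
    activations: List[float],
    p: float,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[int]]:
    """Inverted dropout on one layer's activations.

    p is the probability of DROPPING a unit. Surviving activations are scaled
    by 1 / (1 - p) so their expected value is unchanged.
    Returns (output, mask), where mask[i] == 0 means unit i was dropped.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations), [1] * len(activations)  # no units dropped
    rng = rng or random.Random()
    keep_prob = 1.0 - p
    mask = [1 if rng.random() < keep_prob else 0 for _ in activations]
    output = [a * m / keep_prob for a, m in zip(activations, mask)]
    return output, mask


rng = random.Random(42)
layer = [0.2, 1.5, -0.7, 3.0, 0.9, -1.1]
for step in range(3):  # each update trains a different "thinned" network
    out, mask = dropout(layer, p=0.5, rng=rng)
    print(f"step {step}: mask={mask}")
```

Each call draws a fresh mask, which is what makes every gradient update train a different thinned network; with `training=False` the full network is used unchanged.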

22. What is the Downside of Dropout?

Although dropout is clearly a highly effective tool, it comes with certain drawbacks. A network
with dropout can take 2-3 times longer to train than a standard network. One way to attain the
benefits of dropout without slowing down training is by finding a regularizer that is essentially
equivalent to a dropout layer. For linear regression, this regularizer has been proven to be a
modified form of L2 regularization.
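For linear regression the equivalence can be written out explicitly. In this sketch we follow the original dropout paper's convention, where p is the probability of keeping an input, and write R for a random mask matrix with independent Bernoulli(p) entries and ⊙ for element-wise multiplication (this notation is ours, not from the text above). Averaging the squared error over all possible masks gives:

```latex
\mathbb{E}_{R}\!\left[\lVert y - (R \odot X)\,w \rVert^{2}\right]
  = \lVert y - pXw \rVert^{2} + p(1-p)\,\lVert \Gamma w \rVert^{2},
\qquad
\Gamma = \bigl(\operatorname{diag}(X^{\top}X)\bigr)^{1/2}
```

The second term is an L2 penalty in which each squared weight w_j² is multiplied by the squared norm of its feature column, (X^T X)_jj, which is why the equivalent regularizer is described as a modified form of L2 regularization.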

PART-B & C

1. Explain the Basic Components of Perceptron.

2. Demonstrate the Multi-Layered Perceptron Model.

3. Elaborate on activation functions in detail.

4. Clarify how to train a neural network.

5. Explain Stochastic gradient descent.

6. What is backpropagation? How does the backpropagation algorithm work?

7. Write in detail about the vanishing gradient problem.

8. Discuss ReLU.

9. Demonstrate the strategies for Hyperparameter tuning.

10. Explain the Regularization Techniques in Machine Learning.

11. How Does Dropout Work? Discuss.


UNIT - V

TWO MARKS QUESTIONS AND ANSWERS (PART-A)

1. What are the guidelines for machine learning experiments?

Machine learning projects are highly iterative; as you progress through the ML lifecycle, you'll
find yourself iterating on a section until reaching a satisfactory level of performance, then
proceeding forward to the next task (which may be circling back to an even earlier step).
Moreover, a project isn't complete after you ship the first version; you get feedback from
real-world interactions and redefine the goals for the next iteration of deployment.

2. State the steps in Model exploration.

- Establish baselines for model performance
- Start with a simple model using the initial data pipeline
- Overfit the simple model to the training data (a sanity check that it can learn)
- Stay nimble and try many parallel (isolated) ideas during early stages
- Find the state-of-the-art (SoTA) model for your problem domain (if available) and reproduce its
results, then apply it to your dataset as a second baseline
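A first baseline can be as simple as always predicting the most frequent class; any real model should beat it. A minimal sketch with made-up spam labels (scikit-learn's `DummyClassifier(strategy="most_frequent")` is a library equivalent):

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_baseline(train_labels: Sequence[Hashable], n_predictions: int) -> List[Hashable]:
    """Predict the most frequent training label for every example."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_predictions


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


train_labels = ["not spam"] * 7 + ["spam"] * 3
test_labels = ["not spam", "spam", "not spam", "not spam", "spam"]
baseline_acc = accuracy(test_labels, majority_baseline(train_labels, len(test_labels)))
print(f"majority-class baseline accuracy: {baseline_acc:.2f}")  # 0.60
```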

3. List the Properties of an experiment, trial and trial component:

- An Experiment is uniquely characterized by its objective or hypothesis.
- An Experiment usually contains more than one Trial, one Trial for each variable set.
- A Trial is uniquely characterized by its variable set, sampled from the variable space defined by you.
- A Trial component is any artifact, parameter or job that is associated with a specific Trial.
- A Trial component is usually part of a Trial, but it can exist independently of an experiment or trial.
- A Trial component cannot be directly associated with an Experiment. It has to be associated
with a Trial which is associated with an Experiment.
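The relationships above can be modelled as a small data structure. This is a hypothetical sketch with class names of our own, not the API of any particular experiment-tracking tool; the rule that components attach only to Trials is enforced simply by giving `Experiment` no components field.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TrialComponent:
    """Any artifact, parameter set or job; may also exist on its own."""
    name: str
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Trial:
    """One variable set sampled from the variable space."""
    variables: Dict[str, Any]
    components: List[TrialComponent] = field(default_factory=list)

    def add_component(self, component: TrialComponent) -> None:
        self.components.append(component)


@dataclass
class Experiment:
    """Identified by its objective; holds Trials, never components directly."""
    objective: str
    trials: List[Trial] = field(default_factory=list)

    def add_trial(self, trial: Trial) -> None:
        self.trials.append(trial)


exp = Experiment("Does a larger learning rate speed up convergence?")
for lr in (0.01, 0.1):
    trial = Trial(variables={"learning_rate": lr})  # one Trial per variable set
    trial.add_component(TrialComponent("training-job", {"epochs": 10}))
    exp.add_trial(trial)

standalone = TrialComponent("data-preprocessing")  # not (yet) part of any trial
print(len(exp.trials), [t.variables for t in exp.trials])
```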

4. What is the Validation Set Approach?

The validation set approach is a very simple and frequently used method when there are enough
observations to get reasonable results. The available data is divided into two parts, a training
set and a validation set (or holdout set); the model is built on the training set, and its accuracy
is then checked on the validation set. The accuracy obtained on the validation set is the
estimate of how the model will perform on real test data (unseen data).
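A minimal sketch of the validation set approach in plain Python, assuming made-up one-feature regression data and, as the "model", a least-squares line through the origin (both are our choices for illustration). scikit-learn's `train_test_split` performs the same kind of shuffled split.

```python
import random
from typing import List, Sequence, Tuple


def train_validation_split(
    data: Sequence, validation_fraction: float = 0.3, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle once, then hold out `validation_fraction` of the rows for validation."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(data) * validation_fraction)
    val_idx, train_idx = indices[:n_val], indices[n_val:]
    return [data[i] for i in train_idx], [data[i] for i in val_idx]


# Toy regression data (made up): y = 3x plus a small alternating offset.
data = [(x, 3.0 * x + (0.5 if x % 2 else -0.5)) for x in range(20)]
train, val = train_validation_split(data, validation_fraction=0.25)

# Build the model on the training set: least-squares slope through the origin.
slope = sum(x * y for x, y in train) / sum(x * x for x, _ in train)

# Check it on the validation set: this error estimates performance on unseen data.
val_mse = sum((y - slope * x) ** 2 for x, y in val) / len(val)
print(f"train={len(train)} val={len(val)} slope={slope:.3f} validation MSE={val_mse:.3f}")
```

Because the estimate depends on which rows happen to land in the validation set, a different `seed` can give a noticeably different validation error, which is the weakness LOOCV and k-fold CV address.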

5. Define LOOCV.

The Leave One Out Cross Validation (LOOCV) method addresses the drawbacks of the validation set
approach (its estimate depends on a single random split and the model is trained on only part of
the data), and it is as simple as the validation set approach. The main idea is to use each
observation as the validation set exactly once. If we have "n" observations, we fit the model
"n" times; in each fit we keep one observation as the test sample and train the model on the
remaining "n-1" observations.
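A minimal LOOCV sketch, assuming a hypothetical `fit` function that takes a list of (x, y) training pairs and returns a prediction function; the toy model used here simply predicts the mean training target, and the data is made up.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def loocv_mse(
    data: Sequence[Point], fit: Callable[[List[Point]], Callable[[float], float]]
) -> float:
    """Leave-one-out CV: fit n models, each on n-1 points, test on the held-out point."""
    n = len(data)
    squared_errors = []
    for i in range(n):
        train = [p for j, p in enumerate(data) if j != i]  # n - 1 observations
        x_test, y_test = data[i]  # the single held-out observation
        model = fit(train)
        squared_errors.append((y_test - model(x_test)) ** 2)
    return sum(squared_errors) / n  # average of the n test errors


def fit_mean(train: List[Point]) -> Callable[[float], float]:
    """Hypothetical baseline model: always predicts the mean training target."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


data = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 6.0)]
print(loocv_mse(data, fit_mean))
```

There is no random split, so LOOCV gives the same answer every time it is run, but it needs n model fits, which can be expensive for large datasets.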

6. List down the types of resampling methods:

- Bootstrap Sampling
- K Fold Cross Validation
- Leave One Out Cross Validation

7. State the Resampling Method.

Resampling methods are very useful in statistics and machine learning for fitting more accurate
models, for model selection and for parameter tuning. They repeatedly draw samples from the
training data and refit the model on each sample, to check the variability of the model and to
get additional information. We cannot be sure of a model's result from a single fit without
testing it on different samples. Resampling can be computationally expensive because the model
is fit more than once, but modern computing power usually makes this cost manageable.

8. What is K-Fold Cross Validation?

Cross validation resamples without replacement and thus produces surrogate data sets that are
smaller than the original. These data sets are produced in a systematic way so that after a
pre-specified number k of surrogate data sets, each of the n original cases has been left out
exactly once. In each of the k rounds the model is trained on the other k-1 folds and evaluated on
the held-out fold, and the k scores are averaged. This is called k-fold cross validation, or
leave-x-out cross validation with x = n/k; for example, leave-one-out cross validation omits 1
case for each surrogate set, i.e. k = n.
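A sketch of how the k folds can be generated from row indices (the function name is our own; scikit-learn's `KFold(n_splits=k, shuffle=True)` is a library equivalent). Rows are shuffled once and split without replacement, so every row is held out exactly once, and choosing k = n reproduces LOOCV.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for k-fold cross validation."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # The first n % k folds get one extra row so the sizes differ by at most 1.
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = order[start:start + size]
        train_idx = order[:start] + order[start + size:]
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(n=10, k=3):
    print(len(train_idx), len(val_idx), sorted(val_idx))
```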

9. What do you mean by Bootstrapping?

Bootstrapping is a resampling method that is used in machine learning. It is a widespread
technique due to its flexibility, since it does not require anything other than your training
dataset. Bootstrap sampling draws many samples of the same size as the original dataset, with
replacement, and recomputes the statistic of interest on each one. The Bootstrap is a flexible and
powerful statistical tool: the spread of these recomputed estimates shows how far a statistic
computed from our sample may be from the true population parameter, for example its standard
error.
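A minimal bootstrap sketch using only the standard library, estimating the standard error of a sample mean from made-up measurements; `random.Random.choices` with `k=n` draws a resample of the same size, with replacement.

```python
import random
import statistics
from typing import Callable, List, Sequence


def bootstrap(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    n_resamples: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Recompute `statistic` on n_resamples samples drawn WITH replacement,
    each the same size as the original data."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic(rng.choices(data, k=n)) for _ in range(n_resamples)]


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]  # made-up measurements
estimates = bootstrap(sample, statistics.mean)

# The spread of the bootstrap estimates approximates the standard error of the mean.
print(f"sample mean              = {statistics.mean(sample):.3f}")
print(f"bootstrap standard error = {statistics.stdev(estimates):.3f}")
# For comparison, the textbook formula s / sqrt(n):
print(f"formula standard error   = {statistics.stdev(sample) / len(sample) ** 0.5:.3f}")
```

The same function works for statistics that have no simple standard-error formula (for example the median), which is where bootstrapping is most useful.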

10. What is The Difference Between Bootstrapping And Cross-Validation?

Bootstrapping and Cross-Validation are both resampling methods used in statistical inference.

In general, statistical inference uses data from a sample to make estimates or predictions about
a population. Both bootstrapping and cross-validation are used to estimate quantities such as the
standard error of an estimate or, more practically, how our machine-learning algorithm will
perform on unseen data in a production system.

The main difference between the two methods is that bootstrapping is a resampling technique,
while cross-validation is a partitioning technique. Bootstrapping involves random sampling with
replacement from the training data set to create multiple new training sets, whereas
cross-validation splits the data, without replacement, into k folds that are each used once for
validation.

Training a model on each bootstrap set and averaging their predictions (bagging) can lower the
variance of our machine-learning model.

11. What is the Classification Algorithm?

The Classification algorithm is a Supervised Learning technique that is used to identify the
category of new observations on the basis of training data. In Classification, a program learns
from the given dataset or observations and then classifies each new observation into one of a
number of classes or groups, such as Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
Classes can also be called targets, labels or categories.

12. Define Naive Bayes

The Naive Bayes method is a supervised learning algorithm based on applying Bayes' theorem
with the "naive" assumption of conditional independence between every pair of features given
the value of the class variable. Naive Bayes classifiers are a collection of classification
algorithms based on Bayes' Theorem. It is not a single algorithm but a family of algorithms
that all share a common principle: every pair of features being classified is assumed to be
independent of each other, given the class.
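A minimal categorical Naive Bayes sketch, assuming discrete (string-valued) features and Laplace smoothing with alpha = 1 (both our choices; scikit-learn's `CategoricalNB` is a library equivalent). The spam data and feature meanings are made up. The "naive" step is visible in `predict`: the per-feature log-probabilities are simply added, as if the features were independent given the class.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

Row = Tuple[Sequence[str], str]  # (feature values, class label)


def train_naive_bayes(rows: Sequence[Row]):
    """Count class frequencies and, per class, how often each feature value occurs."""
    class_counts = Counter(label for _, label in rows)
    value_counts: Dict[Tuple[int, str], Counter] = defaultdict(Counter)
    feature_values: Dict[int, set] = defaultdict(set)
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, features: Sequence[str], alpha: float = 1.0) -> str:
    """Pick the class c maximising log P(c) + sum_i log P(x_i | c).
    alpha is Laplace smoothing, so unseen values never get probability 0."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_log_prob = None, -math.inf
    for label, count in class_counts.items():
        log_prob = math.log(count / total)  # prior P(c)
        for i, value in enumerate(features):
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            log_prob += math.log(numerator / denominator)  # likelihood P(x_i | c)
        if log_prob > best_log_prob:
            best_label, best_log_prob = label, log_prob
    return best_label


# Made-up data: (contains the word "offer", contains a link) -> label
rows = [
    (("yes", "yes"), "spam"),
    (("yes", "no"), "spam"),
    (("yes", "yes"), "spam"),
    (("no", "no"), "not spam"),
    (("no", "yes"), "not spam"),
    (("no", "no"), "not spam"),
]
model = train_naive_bayes(rows)
print(predict(model, ("yes", "yes")))  # spam
print(predict(model, ("no", "no")))    # not spam
```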

13. Write about Support Vector Machine

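Support vector machine is based on statistical learning theory. Here we try to find a hyperplane
that best separates the two classes. SVM finds the hyperplane with the maximum margin, that is,
the maximum distance to the nearest data points (the support vectors) of the two classes. SVMs
often work well on small or medium-sized datasets, including complex, high-dimensional ones. A
hard-margin linear SVM can be used only when the data is perfectly linearly separable, while a
soft-margin linear SVM tolerates some misclassified points. When the data is not linearly
separable, i.e. the data points cannot be separated into 2 classes by a linear boundary, we can
use a non-linear SVM, which applies a kernel function (such as a polynomial or RBF kernel) to find
a separating hyperplane in a higher-dimensional space.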

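14. What is a T-test?

A T-test is a statistical test for determining whether the difference between two means is
significant; the two groups may be independent or related (paired). The test uses randomly
selected samples from the two categories or groups. It is typically used when the sample sizes
are small and the population standard deviation is unknown, so the test statistic follows a t
distribution rather than the normal distribution (the data itself is assumed to be roughly
normally distributed).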
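A minimal sketch of the independent two-sample t statistic, assuming the pooled-variance (Student's) form, which treats the two groups as having roughly equal variances; the data is made up. `scipy.stats.ttest_ind(a, b)` computes the same statistic together with a p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def two_sample_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic with pooled variance.
    Returns (t, degrees_of_freedom)."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / std_error
    return t, n1 + n2 - 2


group_a = [5.0, 6.0, 7.0]
group_b = [1.0, 2.0, 3.0]
t, df = two_sample_t(group_a, group_b)
print(f"t = {t:.3f} with {df} degrees of freedom")  # t = 4.899 with 4 degrees of freedom
```

With 4 degrees of freedom, the two-sided critical value at the 5% level is about 2.776; since |t| = 4.899 exceeds it, the difference between these two means would be judged significant.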


15. Define McNemar's test

McNemar's Test: This is a non-parametric test for paired nominal data. It is used when we want to
find the change in proportion for paired data, for example when comparing two classifiers on the
same test set. This test is also known as McNemar's Chi-Square test because its test statistic
approximately follows a chi-square distribution with one degree of freedom.
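A minimal sketch of McNemar's test for comparing two classifiers on the same test examples, using the common chi-square form with a continuity correction, (|b - c| - 1)² / (b + c); the predictions are made up.

```python
from typing import Sequence, Tuple


def mcnemar(
    y_true: Sequence, pred_a: Sequence, pred_b: Sequence, correction: bool = True
) -> Tuple[float, int, int]:
    """McNemar's chi-square statistic for two classifiers tested on the SAME examples.

    b = examples model A gets right but model B gets wrong
    c = examples model A gets wrong but model B gets right
    Examples both models get right (or both get wrong) do not affect the statistic.
    Returns (statistic, b, c).
    """
    b = sum(1 for t, pa, pb in zip(y_true, pred_a, pred_b) if pa == t and pb != t)
    c = sum(1 for t, pa, pb in zip(y_true, pred_a, pred_b) if pa != t and pb == t)
    if b + c == 0:
        return 0.0, b, c  # the models never disagree
    diff = max(abs(b - c) - 1, 0) if correction else abs(b - c)  # continuity correction
    return diff ** 2 / (b + c), b, c


# Made-up predictions on 20 test examples whose true label is always 1.
y_true = [1] * 20
pred_a = [1] * 10 + [0] * 2 + [1] * 5 + [0] * 3
pred_b = [0] * 10 + [1] * 2 + [1] * 5 + [0] * 3
stat, b, c = mcnemar(y_true, pred_a, pred_b)
print(f"b={b}, c={c}, chi-square={stat:.3f}")  # b=10, c=2, chi-square=4.083
```

With one degree of freedom, the 5% critical value of the chi-square distribution is 3.841, so a statistic of 4.083 would suggest the two classifiers' error rates differ.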

16. What is a K-fold CV paired t test?

The K-fold cross-validated paired t-test procedure is a common method for comparing the
performance of two models (classifiers or regressors) and addresses some of the drawbacks of
the resampled t-test procedure; however, this method still has the problem that the training
sets overlap, so it is not recommended for use in practice.
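A minimal sketch of the statistic, assuming both models were scored on the same k folds; the fold accuracies are made up. `scipy.stats.ttest_rel(a, b)` computes the same paired statistic together with a p-value.

```python
import math
import statistics
from typing import Sequence


def kfold_paired_t(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Paired t statistic over k folds.

    d_i = score of model A minus score of model B on fold i,
    t = mean(d) / (stdev(d) / sqrt(k)), with k - 1 degrees of freedom.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(k))


# Hypothetical 5-fold accuracies (in %) of two models evaluated on the same folds.
acc_a = [82, 85, 80, 84, 83]
acc_b = [80, 82, 79, 81, 81]
t = kfold_paired_t(acc_a, acc_b)
print(f"t = {t:.2f} with {len(acc_a) - 1} degrees of freedom")
```

The two-sided 5% critical value for 4 degrees of freedom is about 2.776, so this t (about 5.88) would look significant. However, because the k training sets overlap, the fold differences are not independent and the test tends to report a difference too often, which is the drawback noted above.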

PART-B & C

1. Discuss the guidelines for machine learning experiments.

2. Explain the project lifecycle in machine learning.

3. Discuss the term Cross Validation.

4. Compare and contrast the types of resampling methods.

5. Demonstrate Cross Validation.

6. How will you evaluate an ML model using K-Fold CV?

7. What is Bootstrapping? How do you implement it in Python?

8. Broadly explain how classifier performance is measured.

9. Briefly explain the popular algorithms that can be used for binary classification.

10. Explain the method of Performing a t test.

11. Demonstrate the McNemar Test.
