ML Answerbank


Q3. What are the limitations of inference machines? Explain.

Ans: The work in ML inference can sometimes be misallocated to the data scientist. If given only a low-level set of tools for ML inference, the data scientist may not be successful in the deployment.
Additionally, DevOps and data engineers are sometimes not able to help with deployment,
often due to conflicting priorities or a lack of understanding of what’s required for ML
inference. In many cases, the ML model is written in a language like Python, which is
popular among data scientists, but the IT team is more well-versed in a language like Java.
This means that engineers must take the Python code and translate it to Java to run it within
their infrastructure. In addition, the deployment of ML models requires some extra coding to
map the input data into a format that the ML model can accept, and this extra work adds to
the engineers’ burden when deploying the ML model.

Also, the ML lifecycle typically requires experimentation and periodic updates to the ML
models. If deploying the ML model is difficult in the first place, then updating models will be
almost as difficult. The whole maintenance effort can be difficult, as there are business
continuity and security issues to address.

Another challenge is attaining suitable performance for the workload. REST-based systems that perform the ML inference often suffer from low throughput and high latency. This might be suitable for some environments, but modern deployments that deal with IoT and online transactions face huge loads that can overwhelm these simple REST-based deployments. The system also needs to scale, not only to handle growing workloads but also to absorb temporary load spikes while retaining consistent responsiveness.

Q4. Write a note on


i. Approximation and estimation errors.
ii. Hypothesis class

Ans:
i. Approximation and estimation errors: Many learning algorithms search for a model within a predefined function class, and the optimal model may or may not belong to this class. Even if the optimal model lies within the reach of the algorithm, this does not necessarily mean that the algorithm will find it. In fact, in many cases the resulting model is worse than what could be obtained by using a more restricted function class that does not contain the optimal model.

This phenomenon can be explained in terms of the estimation and approximation errors.
The estimation error is the error implied by the fact that the algorithm works with a finite
training set that only partially reflects the true distribution of the data. By considering that the
training set is obtained by randomly sampling the true distribution, this also incurs a certain
variability in the resulting model.
The approximation error is the error implied by the choice of function class and is defined as
the difference in risk obtained by the best model within the function class and the optimal
model.

If we let the function class be large enough to contain the optimal model, the approximation error can be zero; on the other hand, the estimation error increases, because the larger the function class, the less likely the algorithm is to find the best model in it. Conversely, we can decrease the estimation error by reducing the size of the function class (think of the extreme case where the class contains a single function), but in doing so we probably increase the approximation error.
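This trade-off can be written as a risk decomposition. Writing R(h) for the risk of a model h, h* for the optimal model, h_H for the best model within the chosen function class H, and ĥ for the model the algorithm actually returns (notation introduced here for the sketch), the excess risk splits as:

```latex
\underbrace{R(\hat{h}) - R(h^*)}_{\text{excess risk}}
  = \underbrace{R(\hat{h}) - R(h_{\mathcal{H}})}_{\text{estimation error}}
  + \underbrace{R(h_{\mathcal{H}}) - R(h^*)}_{\text{approximation error}},
\qquad h_{\mathcal{H}} = \arg\min_{h \in \mathcal{H}} R(h).
```

Enlarging H shrinks the second term but tends to inflate the first, which is exactly the trade-off described above.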

ii. Hypothesis class: In most supervised machine learning algorithms, the main goal is to find a hypothesis from the hypothesis space that maps the inputs to the proper outputs.
Hypothesis Space (H):
The hypothesis space is the set of all possible legal hypotheses. It is the set from which the machine learning algorithm selects the single hypothesis that best describes the target function or the outputs.

Hypothesis (h):
A hypothesis is a function that best describes the target in supervised machine learning. The hypothesis an algorithm comes up with depends on the data and also on the restrictions and bias that we have imposed on it. To better understand the hypothesis space and hypothesis, consider a coordinate plane showing the distribution of some labeled data.
Suppose we have test data for which we have to determine the outputs or results. We can predict the outcomes by dividing the coordinate plane with a boundary that separates the classes; each test point is then labeled according to the side of the boundary on which it falls. Note, however, that the coordinate plane could have been divided in other ways; how it is divided depends on the data, the algorithm and the constraints.

All the legal possible ways in which we can divide the coordinate plane to predict the outcome of the test data compose the hypothesis space, and each individual possible way is known as a hypothesis.
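The idea can be sketched in code with a deliberately tiny hypothesis space: one-dimensional data and threshold classifiers. The data points and thresholds below are invented purely for illustration.

```python
# A toy hypothesis space: threshold classifiers on 1-D data.
# Each hypothesis h_t predicts 1 if x >= t, else 0.

train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]  # (x, label) pairs

# Hypothesis space H: one threshold per midpoint between sorted points.
thresholds = [0.5, 1.5, 2.5, 3.5, 4.5]

def h(t, x):
    """Hypothesis parameterised by threshold t."""
    return 1 if x >= t else 0

def training_error(t):
    return sum(h(t, x) != y for x, y in train) / len(train)

# The learner picks the single hypothesis in H with the lowest training error.
best_t = min(thresholds, key=training_error)
print(best_t, training_error(best_t))  # 2.5 separates the two classes perfectly
```

Each threshold is one "way of dividing the plane"; the set of all five thresholds plays the role of the hypothesis space H, and the selected threshold is the hypothesis h.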

Q5. Explain the K-nearest neighbour algorithm with an example.


ANS:
The K-nearest neighbors (KNN) algorithm is a type of supervised ML algorithm that can be used for both classification and regression predictive problems. However, it is mainly used for classification predictive problems in industry. The following two properties describe KNN well −
● Lazy learning algorithm − KNN is a lazy learning algorithm because it does not have a specialized training phase; it uses all of the training data during classification.
● Non-parametric learning algorithm − KNN is also a non-parametric learning algorithm because it doesn't assume anything about the underlying data.

Working of KNN Algorithm


The K-nearest neighbors (KNN) algorithm uses 'feature similarity' to predict the values of new data points, which means that a new data point is assigned a value based on how closely it matches the points in the training set. Its working can be understood with the help of the following steps −
Step 1 − For implementing any algorithm, we need a dataset. So during the first step of KNN, we must load the training as well as the test data.
Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data points to consider. K can be any integer.
Step 3 − For each point in the test data, do the following −
● 3.1 − Calculate the distance between the test point and each row of the training data using one of the distance measures, namely Euclidean, Manhattan or Hamming distance. Euclidean distance is the most commonly used.
● 3.2 − Sort the distances in ascending order.
● 3.3 − Choose the top K rows from the sorted array.
● 3.4 − Assign a class to the test point based on the most frequent class among these rows.
Step 4 − End

Example
The following example illustrates the concept of K and the working of the KNN algorithm.

Suppose we have a dataset of points belonging to a blue class and a red class, plotted in the plane. Now we need to classify a new data point, marked with a black dot at (60, 60), into the blue or the red class. Assuming K = 3, the algorithm finds the three data points nearest to the black dot. Among those three, two lie in the red class, hence the black dot is also assigned to the red class.
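The steps above can be sketched directly in code. The training coordinates below are made up to echo the example (two red neighbours and one blue neighbour near the query point), not taken from any real dataset.

```python
import math
from collections import Counter

# Hypothetical training points: (coordinates, class label).
train = [
    ((20, 35), "blue"), ((30, 40), "blue"), ((62, 66), "blue"),
    ((55, 58), "red"),  ((58, 62), "red"),  ((65, 55), "red"),
    ((75, 80), "blue"),
]

def knn_predict(query, data, k=3):
    # Step 3.1: Euclidean distance from the query to every training row.
    by_dist = sorted(data, key=lambda p: math.dist(query, p[0]))
    # Steps 3.2-3.3: the distances are sorted; take the k nearest rows.
    nearest = by_dist[:k]
    # Step 3.4: majority vote over the neighbours' classes.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((60, 60), train, k=3))  # "red": two of the three nearest are red
```

With these points, the three nearest neighbours of (60, 60) are (58, 62) and (55, 58), both red, and (62, 66), blue, so the majority vote yields red, matching the worked example.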

Q6. Describe sample complexity in detail.


ANS: The sample complexity of a machine learning algorithm represents the number of training samples that it needs in order to successfully learn a target function.
More precisely, the sample complexity is the number of training samples that we need to supply to the algorithm so that, with probability arbitrarily close to 1, the function returned by the algorithm is within an arbitrarily small error of the best possible function.
There are two variants of sample complexity:
● The weak variant fixes a particular input-output distribution;
● The strong variant takes the worst-case sample complexity over all input-output
distributions.
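For the special case of a finite hypothesis class H and a learner that returns a hypothesis consistent with the training set, a standard PAC-learning bound (quoted here as a textbook illustration of the strong, distribution-free variant) makes this concrete: to guarantee error at most ε with probability at least 1 − δ, it suffices to have

```latex
m \;\ge\; \frac{1}{\varepsilon}\left(\ln\lvert\mathcal{H}\rvert + \ln\frac{1}{\delta}\right)
```

training samples. Shrinking either the tolerated error ε or the failure probability δ drives the required sample size m up.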

Q7. Explain the importance & applications of support vector machine.


Ans: Support vector machines (SVMs) are supervised learning algorithms. The aim of using an SVM is to correctly classify unseen data. SVMs have a number of applications in several fields.
Some common applications of SVM are-

Face detection – SVMs classify parts of an image as face and non-face and create a square boundary around the face.
Text and hypertext categorization – SVMs allow text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, categorizing on the basis of the score generated and comparing it with a threshold value.
Classification of images – SVMs provide better search accuracy for image classification than traditional query-based searching techniques.
Bioinformatics – This includes protein classification and cancer classification. We use SVMs for classifying genes, classifying patients on the basis of their genes, and other biological problems.
Protein fold and remote homology detection – SVM algorithms are applied for protein remote homology detection.
Handwriting recognition – SVMs are widely used to recognize handwritten characters.
Generalized predictive control (GPC) – SVM-based GPC is used to control chaotic dynamics with useful parameters.

Q8. What is structural risk minimization? Explain in detail.

Ans:

Q9. Explain the importance of maximal margin classifiers.


Q10. Explain the concept of finite covering with suitable explanation.

Q11. What is model-based learning? Explain its importance.


Model-based learning is the formation and subsequent development of mental models by a learner.
Most often used in the context of dynamic phenomena, mental models organise information about
how the components of systems interact to produce the dynamic phenomena.

Importance:
● Model-based design provides a common design environment, which facilitates
general communication, data analysis, and system verification between various
(development) groups.
● Engineers can locate and correct errors early in system design, when the time and
financial impact of system modification are minimized.
● Design reuse, for upgrades and for derivative systems with expanded capabilities, is
facilitated.

Q12. Describe the process of Occam's learning in detail.


● Occam's razor is one of the simplest examples of inductive bias. It involves a preference for the simpler hypothesis that best fits the data. Though the razor can be used to eliminate other hypotheses, relevant justification may be needed to do so. Below is an analysis of how this principle applies in decision tree learning.
● Decision tree learning algorithms follow a search strategy through the hypothesis space for the hypothesis that best fits the training data. For example, the ID3 algorithm uses a simple-to-complex strategy, starting from an empty tree and adding nodes guided by the information gain heuristic to build a decision tree consistent with the training instances.
● The information gain of every attribute not already included in the tree is calculated to decide which attribute to consider as the next node. Information gain is the essence of the ID3 algorithm: it gives a quantitative measure of the information that an attribute can provide about the target variable, i.e. assuming only that attribute's value is available, how efficiently we can infer the target. For a set of examples S and an attribute A, it can be defined as:
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
● There can be many decision trees consistent with a given set of training examples, but the inductive bias of the ID3 algorithm results in a preference for simpler (or shorter) trees. This preference bias arises from the fact that there is an ordering of the hypotheses in the search strategy, which leads to the additional bias that attributes with high information gain are preferred closer to the root. Therefore, there is a definite order the algorithm follows until it terminates on reaching a hypothesis that is consistent with the training data. In each iteration ID3 chooses one node to expand, while other decision trees would still have been possible at that point.
● Hence, starting from an empty node, the algorithm graduates towards more complex decision trees and stops when the tree is sufficient to classify the training examples.
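As a sketch, the information-gain computation that ID3 uses to pick its next node can be written out directly. The weather-style rows below are invented for illustration.

```python
import math
from collections import Counter

# Toy training set. Each row: (outlook, windy, play).
rows = [
    ("sunny", False, "no"), ("sunny", True, "no"),
    ("overcast", False, "yes"), ("rain", False, "yes"),
    ("rain", True, "no"), ("overcast", True, "yes"),
]

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attr_index):
    labels = [r[-1] for r in rows]
    base = entropy(labels)          # Entropy(S)
    # Partition the labels by the attribute's values, then subtract the
    # weighted entropy of each partition (the Σ term in the formula).
    by_value = {}
    for r in rows:
        by_value.setdefault(r[attr_index], []).append(r[-1])
    remainder = sum(len(part) / len(rows) * entropy(part)
                    for part in by_value.values())
    return base - remainder

# ID3 would choose the attribute with the highest gain as the root.
gains = {name: information_gain(rows, i)
         for i, name in enumerate(["outlook", "windy"])}
print(max(gains, key=gains.get))  # "outlook" (gain 2/3 vs about 0.08)
```

Here "outlook" wins because two of its three values already split the examples into pure subsets, which is exactly the preference for informative attributes near the root described above.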

Q13. Write a note on:


i) Value Iteration.
Value iteration computes the optimal state value function by iteratively improving the estimate of V(s). The algorithm initializes V(s) to arbitrary random values and repeatedly updates the Q(s, a) and V(s) values until they converge. Value iteration is guaranteed to converge to the optimal values.

Example: FrozenLake8x8 (using value iteration)

We can implement value iteration in Python to solve the FrozenLake8x8 OpenAI Gym environment. Compared to the FrozenLake-v0 environment solved earlier using a genetic algorithm, FrozenLake8x8 has 64 possible states (the grid size is 8x8) instead of 16. Therefore the problem becomes harder, and a genetic algorithm will struggle to find the optimal solution.
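Since the Gym environment itself is not reproduced here, the following is a minimal value-iteration sketch on an invented two-state MDP rather than FrozenLake; the update is the same Bellman backup described above, and all transition probabilities and rewards are made up for illustration.

```python
# P[s][a] = list of (probability, next_state, reward) outcomes.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
gamma = 0.9                        # discount factor

V = {s: 0.0 for s in P}            # arbitrary initial values
for _ in range(200):               # repeat until V converges
    V_new = {}
    for s in P:
        # Q(s, a) = sum over outcomes of prob * (reward + gamma * V(s'))
        q = {a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
             for a in P[s]}
        V_new[s] = max(q.values())  # Bellman optimality backup
    if max(abs(V_new[s] - V[s]) for s in P) < 1e-10:
        V = V_new
        break                       # converged
    V = V_new

print({s: round(V[s], 3) for s in P})
```

State 1 can collect reward 2 forever, so V(1) converges to 2 / (1 − 0.9) = 20, and V(0) follows from the "go" action's backup; the same loop scales directly to FrozenLake8x8's 64 states given its transition table.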

ii) Eligibility Traces.

Eligibility traces are a kind of mathematical trick that improves the performance of temporal difference (TD) methods in reinforcement learning.

Here are the benefits of eligibility traces:

● They provide a way of implementing Monte Carlo in an online fashion (without waiting for the episode to finish) and on problems without episodes.
● They provide an algorithmic mechanism that uses a short-term memory vector.
● They are computationally efficient, storing a single memory vector instead of a list of feature vectors.
● Learning is done continually rather than waiting for results at the end of an episode.
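A minimal sketch of how the short-term memory vector plugs into tabular TD learning, using accumulating traces on a hypothetical five-state random walk (states 0–4, episodes end off either side, reward 1 only for exiting on the right; all constants are illustrative):

```python
import random

random.seed(0)
n_states, alpha, gamma, lam = 5, 0.1, 1.0, 0.8
V = [0.0] * n_states

for _ in range(2000):
    e = [0.0] * n_states             # eligibility trace: short-term memory
    s = 2                            # start in the middle of the chain
    while True:
        s2 = s + random.choice([-1, 1])
        done = s2 < 0 or s2 >= n_states
        reward = 1.0 if s2 >= n_states else 0.0
        target = reward if done else reward + gamma * V[s2]
        delta = target - V[s]        # TD error for this step
        e[s] += 1.0                  # mark the visited state as eligible
        for i in range(n_states):    # credit ALL recently visited states
            V[i] += alpha * delta * e[i]
            e[i] *= gamma * lam      # traces decay every step
        if done:
            break
        s = s2

print([round(v, 2) for v in V])      # estimates approach roughly [1/6 .. 5/6]
```

Because the trace vector decays rather than resetting, each TD error updates every recently visited state at once, which is what lets learning proceed continually instead of waiting for the episode's end.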

Q14. Explain the concept of Reinforcement learning with suitable example.

What is Reinforcement Learning?

● Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty.

● In reinforcement learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning.

● Since there is no labeled data, the agent is bound to learn from its experience only.

● RL solves a specific type of problem where decision making is sequential, and the
goal is long-term, such as game-playing, robotics, etc.

● The agent interacts with the environment and explores it by itself. The primary goal of
an agent in reinforcement learning is to improve the performance by getting the
maximum positive rewards.

● The agent learns by trial and error, and based on that experience it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.

● It is a core part of artificial intelligence, and many AI agents work on the concept of reinforcement learning. Here we do not need to pre-program the agent, as it learns from its own experience without any human intervention.

● Example: Suppose an AI agent is present within a maze environment, and its goal is to find the diamond. The agent interacts with the environment by performing actions; based on those actions, the state of the agent changes, and it also receives a reward or penalty as feedback.
● The agent continues doing these three things (take an action, change state or remain in the same state, and get feedback), and by doing these actions it learns and explores the environment.
● The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalty. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
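The maze example can be sketched with tabular Q-learning on a toy stand-in: a one-dimensional corridor of five cells with the "diamond" in the last cell. The environment, rewards, and constants below are invented for illustration.

```python
import random

random.seed(1)
n, goal = 5, 4
actions = [-1, +1]                        # move left / move right
Q = {(s, a): 0.0 for s in range(n) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.2

for _ in range(500):
    s = 0                                 # episode starts at the left end
    while s != goal:
        # Explore sometimes, otherwise exploit current Q estimates.
        if random.random() < eps:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a: Q[(s, a)])
        s2 = min(max(s + a, 0), n - 1)    # state changes (or stays at a wall)
        r = 10.0 if s2 == goal else -1.0  # reward or penalty feedback
        best_next = 0.0 if s2 == goal else max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

policy = [max(actions, key=lambda a: Q[(s, a)]) for s in range(n - 1)]
print(policy)  # heads right toward the diamond in every cell: [1, 1, 1, 1]
```

Each loop iteration is exactly the cycle described above: take an action, change state (or stay at a wall), get feedback, and fold that feedback into the agent's estimates until the rewarding behavior dominates.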
