ML Viva Questions
Unsupervised Learning
In unsupervised learning, we don't have labeled data. A model can identify
patterns, anomalies, and relationships in the input data.
Reinforcement Learning
In reinforcement learning, the model learns from the rewards it receives for
its previous actions.
3) What Is a ‘Training Set’ and ‘Test Set’ in a Machine Learning Model? How
Much Data Will You Allocate for Your Training, Validation, and Test Sets?
There is a three-step process followed to create a model:
1. Train the model
2. Test the model
3. Deploy the model
Consider a case where you have labeled data for 1,000 records. One way to
train the model is to expose all 1,000 records during the training process. Then
you take a small set of the same data to test the model, which would give good
results in this case.
But, this is not an accurate way of testing. So, we set aside a portion of that
data called the ‘test set’ before starting the training process. The remaining
data is called the ‘training set’ that we use for training the model. The training
set passes through the model multiple times until the accuracy is high, and
errors are minimized.
Now, we pass the test data to check if the model can accurately predict the
values and determine if training is effective. If you get errors, you either need
to change your model or retrain it with more data.
Regarding how to split the data into training and test sets, there is no fixed
rule; the ratio depends on the size of the dataset and the problem. Common
choices are 80/20 for train/test, or 70/15/15 when a separate validation set is
used for tuning.
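The three-way split can be sketched with scikit-learn (the 70/15/15 ratio here is one common choice, not a rule, and the data is a stand-in):

```python
# A minimal sketch of a common 70/15/15 split using scikit-learn.
from sklearn.model_selection import train_test_split

X = list(range(1000))        # stand-in for 1,000 labeled records
y = [i % 2 for i in X]       # stand-in labels

# First carve off 150 records (15%) for the test set,
# then split off another 150 for validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=150, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=150, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```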
4) How Do You Handle Missing or Corrupted Data in a Dataset?
One of the easiest ways to handle missing or corrupted data is to drop those
rows or columns or replace them entirely with some other value.
There are two useful methods in Pandas:
isnull() and dropna() will help find the columns/rows with missing
data and drop them
fillna() will replace the missing values with a placeholder value
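A short sketch of both approaches in Pandas (the DataFrame here is made up for illustration):

```python
# Finding, dropping, and filling missing values with Pandas.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40], "city": ["Pune", "Delhi", None]})

print(df.isnull().sum())   # count missing values per column

dropped = df.dropna()      # drop every row containing a missing value
filled = df.fillna({"age": df["age"].mean(), "city": "Unknown"})  # replace instead
print(filled)
```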
5) How Can You Choose a Classifier Based on a Training Set Data Size?
When the training set is small, a model with high bias and low variance works
better because it is less likely to overfit. Naive Bayes, for example, works
best when the training set is small. When the training set is large, models
with low bias and high variance tend to perform better because they can capture
complex relationships.
6) Explain the Confusion Matrix with Respect to Machine Learning Algorithms.
Consider the following confusion matrix, where the rows are the actual values
and the columns are the predicted values:

               Predicted Yes   Predicted No
Actual Yes          12               1
Actual No            3               9

For actual values:
Total Yes = 12 + 1 = 13
Total No = 3 + 9 = 12
Similarly, for predicted values:
Total Yes = 12 + 3 = 15
Total No = 1 + 9 = 10
For a model to be accurate, the values across the diagonals should be high. The
total sum of all the values in the matrix equals the total observations in the test
data set.
For the above matrix, total observations = 12+3+1+9 = 25
Now, accuracy = sum of the values across the diagonal/total dataset
= (12+9) / 25
= 21 / 25
= 84%
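The arithmetic above can be checked with a few lines of Python:

```python
# Accuracy from a confusion matrix: sum of the diagonal / total observations.
matrix = [[12, 1],   # actual Yes: 12 predicted Yes, 1 predicted No
          [3, 9]]    # actual No:   3 predicted Yes, 9 predicted No

total = sum(sum(row) for row in matrix)
accuracy = (matrix[0][0] + matrix[1][1]) / total
print(total, accuracy)  # 25 0.84
```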
7) What Is a False Positive and False Negative and How Are They Significant?
False positives are those cases that wrongly get classified as True but are False.
False negatives are those cases that wrongly get classified as False but are True.
In the term ‘False Positive,’ the word ‘Positive’ refers to a ‘Yes’ prediction
in the confusion matrix. The complete term indicates that the system predicted
a positive, but the actual value is negative.
Machine Learning vs. Deep Learning:

Machine Learning
Enables machines to take decisions on their own, based on past data
Needs only a small amount of data for training
Works well on low-end systems, so you don't need large machines
Most features need to be identified in advance and manually coded
The problem is divided into parts, solved individually, and then combined

Deep Learning
Enables machines to take decisions with the help of artificial neural networks
Needs a large amount of training data
Needs high-end machines because it requires a lot of computing power
The machine learns the features from the data it is provided
The problem is solved in an end-to-end manner
Inductive Machine Learning vs. Deductive Machine Learning:

Inductive Learning
It concludes from experiences.
Example: Allow the child to play with fire. If he or she gets burned, they will
learn that it is dangerous and will refrain from making the same mistake again.

Deductive Learning
It observes instances based on defined principles to draw a conclusion.
Example: Explaining to a child to keep away from the fire by showing a video
where fire causes damage.
K-Means vs. KNN:

K-Means
K-Means is unsupervised in nature
K-Means is a clustering algorithm
The points in each cluster are similar to each other, and each cluster is
different from its neighboring clusters

KNN
KNN is supervised in nature
KNN is a classification algorithm
It classifies an unlabeled observation based on its K (can be any number)
surrounding neighbors
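The contrast can be sketched with scikit-learn on a toy dataset (the points and labels here are made up for illustration):

```python
# K-Means (unsupervised clustering) vs. KNN (supervised classification).
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = [[1, 1], [1, 2], [8, 8], [8, 9]]
y = ["small", "small", "large", "large"]   # labels exist only for KNN

# K-Means never sees y; it simply groups the points into clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)

# KNN needs y; it classifies a new observation by the majority label
# among its K nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2]]))  # ['small']
```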
24) Considering a Long List of Machine Learning Algorithms, given a Data Set,
How Do You Decide Which One to Use?
There is no master algorithm for all situations. Choosing an algorithm depends
on the following questions:
How much data do you have, and is it continuous or categorical?
Is the problem related to classification, association, clustering, or
regression?
Is the data labeled, unlabeled, or a mix of both?
What is the goal?
Based on the above questions, the following algorithms can be used:
Observe that the five selected nearest neighbors do not all belong to the same
class: three are tennis balls, one is a basketball, and one is a football.
When multiple classes are involved, we go with the majority. Here the majority
is tennis balls, so the new data point is assigned to the tennis ball class.
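The majority vote described above can be sketched with Python's collections.Counter (the neighbor labels here are hypothetical):

```python
# Majority vote over the K nearest neighbors.
from collections import Counter

# Hypothetical labels of the five nearest neighbors (K = 5).
neighbors = ["tennis", "tennis", "tennis", "basketball", "football"]

label, votes = Counter(neighbors).most_common(1)[0]
print(label, votes)  # tennis 3
```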
32) What is a Recommendation System?
Anyone who has used Spotify or shopped at Amazon will recognize a
recommendation system: It’s an information filtering system that predicts what
a user might want to hear or see based on choice patterns provided by the
user.
33) What is Kernel SVM?
Kernel SVM is short for kernel support vector machine. Kernel methods are a
class of algorithms for pattern analysis: they implicitly map the data into a
higher-dimensional space so the SVM can learn non-linear decision boundaries.
The most common kernel method is the kernel SVM.
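As a minimal sketch (scikit-learn assumed, toy XOR data): an RBF-kernel SVM can separate data that no straight line could.

```python
# An RBF-kernel SVM learns a non-linear boundary on XOR-like data.
from sklearn.svm import SVC

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]  # XOR labels: not linearly separable

clf = SVC(kernel="rbf", gamma=10.0, C=10.0).fit(X, y)
print(clf.predict(X))  # matches y
```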
34) What Are Some Methods of Reducing Dimensionality?
You can reduce dimensionality by combining features with feature engineering,
removing collinear features, or using algorithmic dimensionality reduction.
35) What is Principal Component Analysis?
Principal Component Analysis or PCA is a multivariate statistical technique that
is used for analyzing quantitative data. The objective of PCA is to reduce higher
dimensional data to lower dimensions, remove noise, and extract crucial
information such as features and attributes from large amounts of data.
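A minimal sketch with scikit-learn (the synthetic data is illustrative): project 3-D points onto the 2 principal components that retain most of the variance.

```python
# PCA: reduce 3 features to 2 while keeping most of the variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.01 * rng.normal(size=100)  # third feature is nearly redundant

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (100, 2)
print(pca.explained_variance_ratio_.sum())  # close to 1.0: little information lost
```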
36) What do you understand by the F1 score?
The F1 score is a metric that combines Precision and Recall: it is the
harmonic mean of the two.
The F1 score can be calculated using the below formula:
F1 = 2 * (P * R) / (P + R)
The F1 score is one when both Precision and Recall scores are one.
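The formula above as a short sketch:

```python
# F1 = harmonic mean of precision (P) and recall (R).
def f1_score(precision: float, recall: float) -> float:
    return 2 * (precision * recall) / (precision + recall)

print(f1_score(1.0, 1.0))  # 1.0
print(f1_score(0.5, 1.0))  # the harmonic mean pulls toward the lower value
```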
37) What do you understand by Type I vs Type II error?
Type I Error: A Type I error occurs when the null hypothesis is true but we
reject it (a false positive).
Type II Error: A Type II error occurs when the null hypothesis is false but we
fail to reject it (a false negative).
38) Explain Correlation and Covariance?
Correlation: Correlation tells us how strongly two random variables are related
to each other. It takes values between -1 and +1.
Formula to calculate Correlation:
Corr(X, Y) = Cov(X, Y) / (σX σY)
Covariance: Covariance measures how two random variables vary together. Its
sign gives the direction of the relationship, but its magnitude depends on the
units of the variables, which is why it is normalized to obtain the
correlation.
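A quick NumPy sketch of both quantities (the data is illustrative):

```python
# Covariance and the correlation it normalizes to.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])  # y = 2x, so the correlation is +1

cov_xy = np.cov(x, y)[0, 1]          # sample covariance
corr_xy = np.corrcoef(x, y)[0, 1]    # unit-free, always in [-1, +1]
print(cov_xy, corr_xy)
```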
Gini Impurity: Splitting the nodes of a decision tree using Gini Impurity is
followed when the target variable is categorical.
43) How does the Support Vector Machine algorithm handle self-learning?
The SVM algorithm has a learning rate and expansion rate which takes care of
self-learning. The learning rate compensates or penalizes the hyperplanes for
making all the incorrect moves while the expansion rate handles finding the
maximum separation area between different classes.
44) What are the assumptions you need to take before starting with linear
regression?
There are primarily 5 assumptions for a Linear Regression model:
Multivariate normality
No auto-correlation
Homoscedasticity
Linear relationship
No or little multicollinearity
45) What is the difference between Lasso and Ridge regression?
Lasso (also known as L1) and Ridge (also known as L2) regression are two
popular regularization techniques used to avoid overfitting. Both penalize the
coefficients to reduce model complexity. Lasso regression penalizes the sum of
the absolute values of the coefficients, which can shrink some coefficients to
exactly zero. In Ridge or L2 regression, the penalty is the sum of the squares
of the coefficients, which shrinks coefficients toward zero without eliminating
them.
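The practical difference can be sketched with scikit-learn on synthetic collinear data (the data and alpha values here are illustrative):

```python
# On collinear features, Lasso (L1) tends to zero out a coefficient,
# while Ridge (L2) shrinks both without eliminating either.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)   # nearly a duplicate of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)
print("Lasso:", lasso.coef_)   # one coefficient driven to (near) zero
print("Ridge:", ridge.coef_)   # weight spread across both features
```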