
Machine Learning Viva Questions

1. What is Machine Learning?


Ans: Machine Learning is the field of study that gives computers the capability to learn without
being explicitly programmed. It is one of the most exciting technologies one comes across: as
the name suggests, it gives the computer the ability to learn, which makes it more similar to
humans. Machine learning algorithms build a model based on sample data, known as training
data, in order to make predictions without being explicitly programmed to do so.

2. How is machine learning different from traditional programming?


Ans: Traditional Programming: We feed in DATA (input) + PROGRAM (logic), run it on the
machine, and get the output.
Machine Learning: We feed in DATA (input) + OUTPUT, run it on the machine during training,
and the machine creates its own program (logic), which can then be evaluated during testing.

3. Real-life applications of Machine Learning


Ans: 1. Image recognition
2. Speech recognition
3. Medical diagnosis
4. Traffic prediction
5. Product recommendations
6. Email Spam and Malware Filtering
7. Virtual Personal Assistant
8. Stock Market trading
9. Automatic Language Translation

4. Types of Machine Learning


Ans: Supervised Learning, Unsupervised Learning, Semi-Supervised Learning, Reinforcement
Learning

5. What is NumPy?
Ans: NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects, and an
assortment of routines for fast operations on arrays, including mathematical, logical, shape
manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic
statistical operations, random simulation and much more.
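
A minimal sketch (not part of the original answer) showing a few of these NumPy array
operations; the array values below are made up for illustration.

import numpy as np

# Create a 2-D array and apply a few of the routines mentioned above.
a = np.array([[3, 1, 2],
              [6, 5, 4]])

print(a.shape)             # (2, 3) -> shape inspection
print(a.T)                 # transpose (shape manipulation)
print(np.sort(a, axis=1))  # sorting each row
print(a.sum(), a.mean())   # basic mathematical/statistical operations
print(np.linalg.norm(a))   # basic linear algebra (Frobenius norm)
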
6. What is Pandas?
Ans: Pandas is an open-source library that provides high-performance data manipulation and
analysis in Python. It is built on top of the NumPy package, which means NumPy is required
for Pandas to work. Pandas is used to analyze data.
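
A minimal sketch (assumed, not from the document) of a typical Pandas workflow on a small
made-up DataFrame:

import pandas as pd

# Build a small DataFrame (tabular, labelled data on top of NumPy arrays).
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena"],
    "marks": [82, 74, 91],
})

print(df.describe())            # basic statistics for numeric columns
print(df[df["marks"] > 80])     # filtering rows by a condition
print(df.sort_values("marks"))  # sorting by a column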

7. Difference between Pandas and NumPy


Ans: NumPy provides a fast multidimensional array object and numerical routines, and works
best with homogeneous numerical data. Pandas is built on top of NumPy and provides labelled,
tabular data structures (Series and DataFrame) that are better suited for analyzing
heterogeneous, real-world data.

8. Difference between Regression and Classification


Ans:

Regression:
• The output variable must be of continuous nature or a real value.
• We try to find the best-fit line, which can predict the output more accurately.
• Ex: weather prediction, house price prediction.

Classification:
• The output variable must be a discrete value.
• We try to find the decision boundary, which can divide the dataset into different classes.
• Ex: identification of spam emails, speech recognition, identification of cancer cells.
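
As a sketch of the distinction (assuming scikit-learn is available; the data is made up), a
regressor predicts a continuous value while a classifier predicts a discrete label:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])   # single feature

# Regression: continuous target (e.g. a price).
y_reg = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[6]]))                 # a real-valued prediction

# Classification: discrete target (e.g. spam = 1, not spam = 0).
y_clf = np.array([0, 0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_clf)
print(clf.predict([[6]]))                 # a class label
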
9. What is a confusion matrix?
Ans: A confusion matrix is an N x N matrix used for evaluating the performance of a
classification model, where N is the number of target classes. The matrix compares the actual
target values with those predicted by the machine learning model.
a. Accuracy = Correct Predictions / Total Predictions
= (TP + TN) / (TP + TN + FP + FN)
b. Precision = True Positives / Total Predicted Positive
= TP / (TP + FP)
c. Recall = True Positives / Total Actual Positive
= TP / (TP + FN)
d. F1 Score = 2 x (Precision x Recall) / (Precision + Recall)
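
A minimal sketch (assuming scikit-learn; the labels below are made up) computing these
metrics from a set of predictions:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # labels predicted by some model

print(confusion_matrix(y_true, y_pred))   # 2 x 2 matrix of TN, FP, FN, TP
print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1 Score :", f1_score(y_true, y_pred))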

10. What are MSE, MAE and R-Square metrics?


Ans: MSE (Mean Squared Error) represents the difference between the original and
predicted values, computed by squaring the differences and averaging them over the data set.
MAE (Mean Absolute Error) represents the difference between the original and
predicted values, computed by averaging the absolute differences over the data set.
R-squared (coefficient of determination) represents how well the predicted values fit
the original values. Its value ranges from 0 to 1 and can be interpreted as a percentage;
the higher the value, the better the model.
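
A small sketch (assuming scikit-learn; the values are made up) computing the three
regression metrics:

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, 5.0, 2.5, 7.0]   # original (actual) values
y_pred = [2.8, 5.4, 2.9, 6.5]   # values predicted by some regression model

print("MSE :", mean_squared_error(y_true, y_pred))
print("MAE :", mean_absolute_error(y_true, y_pred))
print("R^2 :", r2_score(y_true, y_pred))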

11. What is Decision Tree?


Ans: A decision tree is a non-parametric supervised learning algorithm which can be utilized
for both classification and regression tasks, but it is mostly preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes represent the
features of a dataset, branches represent the decision rules and each leaf node represents the
outcome. In a decision tree, there are two types of nodes: the Decision Node and the Leaf Node.
Decision nodes are used to make any decision and have multiple branches, whereas leaf nodes
are the outputs of those decisions and do not contain any further branches.
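
A minimal sketch (assuming scikit-learn and its built-in Iris dataset, which the document does
not mention) of training a decision tree classifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# max_depth limits how deep the tree can grow (a simple form of pre-pruning).
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))
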
12. Decision Tree Terminologies
Ans: 1. Root Node: The root node is where the decision tree starts. It represents the
entire dataset, which further gets divided into two or more homogeneous sets.
2. Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated
further after reaching a leaf node.
3. Splitting: Splitting is the process of dividing a decision node/root node into sub-
nodes according to the given conditions.
4. Branch/Sub-Tree: A sub-tree formed by splitting the tree.
5. Pruning: Pruning is the process of removing unwanted branches from the tree.
6. Parent/Child node: The root node of the tree is called the parent node, and the other
nodes are called the child nodes.

13. What is Cross-Validation?


Ans: Cross-validation, sometimes called rotation estimation or out-of-sample testing, is a
validation technique for assessing how the results of a statistical analysis will generalize to an
independent data set. Cross-validation is a resampling technique with the fundamental idea
of splitting the dataset into two parts: training data and test data. The training data is used to
train the model and the unseen test data is used for prediction. If the model performs well on
the test data and gives good accuracy, it means the model hasn't overfitted the training data
and can be used for prediction.
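
A minimal sketch (assuming scikit-learn and its Iris dataset) of 5-fold cross-validation, where
the data is split into five parts and each part takes a turn as the test set:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# cv=5 -> 5-fold cross-validation: five train/test splits, five accuracy scores.
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy  :", scores.mean())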

14. What is ‘Naive’ in a Naive Bayes?


Ans: The Naive Bayes method is a supervised learning algorithm; it is called naive because it
applies Bayes' theorem under the assumption that all attributes are independent of each other.
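
A minimal sketch (assuming scikit-learn and its Iris dataset) of a Gaussian Naive Bayes
classifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# GaussianNB treats each feature as independent and normally distributed per class.
nb = GaussianNB().fit(X_train, y_train)
print("Test accuracy:", nb.score(X_test, y_test))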

15. What is SVM Algorithm?


Ans: “Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be
used for both classification and regression challenges. However, it is mostly used for
classification problems. In the SVM algorithm, we plot each data item as a point in
n-dimensional space (where n is the number of features you have), with the value of each
feature being the value of a particular coordinate. Then, we perform classification by finding
the hyperplane that differentiates the two classes very well.

16. What are Different Kernels in SVM?


Ans: 1. Linear kernel - used when the data is linearly separable.
2. Polynomial kernel - used when you have discrete data that has no natural notion of
smoothness.
3. Radial basis function (RBF) kernel - creates a decision boundary that can do a much better
job of separating two classes than the linear kernel.
4. Sigmoid kernel - behaves like the sigmoid activation function used in neural networks.
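
A minimal sketch (assuming scikit-learn and its Iris dataset) comparing SVM classifiers with
these different kernels:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    print(f"{kernel:8s} kernel -> test accuracy {clf.score(X_test, y_test):.2f}")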

17. How to handle missing or corrupted data in a dataset?


Ans: We can find missing/corrupted data in a dataset and either drop those rows or columns
or replace them with another value. In Pandas, there are useful methods such as isnull(),
dropna() and fillna() that help to find columns with missing or corrupted data and either drop
or replace those values.
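
A minimal sketch (the DataFrame below is made up) of handling missing values in Pandas:

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40],
                   "salary": [50000, 60000, np.nan, 80000]})

print(df.isnull().sum())        # count missing values per column
dropped = df.dropna()           # option 1: drop rows containing missing values
filled = df.fillna(df.mean())   # option 2: replace missing values with the column mean
print(dropped)
print(filled)
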
18. What is Bias?
Ans: While making predictions, a difference occurs between the values predicted by the model
and the actual/expected values; this difference is known as bias error or error due to bias.
• Low Bias: A low-bias model makes fewer assumptions about the form of the target
function. Decision Trees, k-Nearest Neighbours and Support Vector Machines are low-bias
algorithms.
• High Bias: A model with a high bias makes more assumptions, and the model becomes
unable to capture the important features of our dataset. A high-bias model also cannot
perform well on new data. Linear Regression, Linear Discriminant Analysis and Logistic
Regression are high-bias algorithms.

Ways to reduce High Bias:

High bias mainly occurs because the model is too simple. Below are some ways to reduce high
bias:

• Increase the number of input features, as the model is underfitted.
• Decrease the regularization term.
• Use a more complex model, for example by including some polynomial features.

19. What is a Variance Error?


Ans: Variance tells how much a random variable differs from its expected value. Variance
errors are either low variance or high variance. Low variance means there is a small variation
in the prediction of the target function with changes in the training data set, while high
variance shows a large variation in the prediction of the target function with changes in the
training dataset. Examples of machine learning algorithms with low variance are Linear
Regression, Logistic Regression, and Linear Discriminant Analysis, while algorithms with high
variance are Decision Trees, Support Vector Machines, and k-Nearest Neighbours.

Ways to Reduce High Variance:

• Reduce the number of input features or parameters, as the model is overfitted.
• Do not use an overly complex model.
• Increase the training data.
• Increase the regularization term.

Bias-variance combinations:
• Low-Bias, Low-Variance: The combination of low bias and low variance is the ideal
machine learning model. However, it is practically not possible.
• Low-Bias, High-Variance: With low bias and high variance, model predictions are
inconsistent but accurate on average. This case occurs when the model learns with a
large number of parameters and hence leads to overfitting.
• High-Bias, Low-Variance: With high bias and low variance, predictions are consistent
but inaccurate on average. This case occurs when the model does not learn well from
the training dataset or uses very few parameters. It leads to an underfitting problem
in the model.
• High-Bias, High-Variance: With high bias and high variance, predictions are
inconsistent and also inaccurate on average.
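
As a rough illustrative sketch (not from the document), fitting polynomials of increasing
degree to a small noisy sample shows underfitting (high bias) at a low degree and overfitting
(high variance) at a high degree; the data here is synthetic.

import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)   # noisy samples
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)                              # true function

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)                # fit a polynomial
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # degree 1 typically gives high train and test error (underfitting / high bias);
    # degree 9 typically gives low train error but worse test error (overfitting / high variance).
    print(f"degree={degree}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")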

20. What is clustering?


Ans: Clustering is the process of grouping a set of objects into classes of similar objects. It is
an unsupervised learning method, hence no supervision is provided to the algorithm, and it
deals with an unlabelled dataset. Clustering is somewhat similar to a classification algorithm,
but the difference is the type of dataset we are using: in classification, we work with a labelled
data set, whereas in clustering, we work with an unlabelled dataset.
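
A minimal sketch (assuming scikit-learn; the points are made up) of k-means clustering on
unlabelled data:

import numpy as np
from sklearn.cluster import KMeans

# Unlabelled 2-D points forming two rough groups.
X = np.array([[1.0, 1.1], [1.2, 0.9], [0.8, 1.0],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels :", kmeans.labels_)           # group assigned to each point
print("Cluster centres:", kmeans.cluster_centers_)  # centroid of each group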
