
Question Bank

Course: B.Tech

Semester: 5th

Subject: Machine Learning

Short Questions:

1. What are a hyperplane and a support vector in SVM?


A hyperplane is the decision boundary in SVM that separates the different classes.
Support vectors are the data points closest to the hyperplane.

2. What is supervised learning?


Supervised learning is a type of machine learning where the model is trained
on labeled data to make predictions on new data.

3. What is a consistent hypothesis?


A consistent hypothesis is a hypothesis that agrees with every training
example, i.e. it is not contradicted by any of the training data.

4. What is concept learning?


Concept learning is the process of learning to classify objects or events based
on their features.

5. What is unbiased learning?


Unbiased learning is a learning process that makes no prior assumptions (no
inductive bias) about the target concept, so the learner can represent every
possible hypothesis over the instance space.

6. Define entropy.
Entropy is a measure of disorder or randomness in a system.

7. Relate entropy and information gain.


Entropy and information gain are related in decision trees, where entropy
measures the impurity of a set of examples and information gain measures
the decrease in entropy after a feature is chosen to split the examples.
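
For illustration, a minimal Python sketch of entropy and information gain for
a single split; the label counts and split below are made-up numbers used
only as an example.

    from math import log2
    from collections import Counter

    def entropy(labels):
        """Entropy of a list of class labels, in bits."""
        total = len(labels)
        counts = Counter(labels)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def information_gain(parent_labels, child_label_lists):
        """Parent entropy minus the weighted entropy of the child subsets."""
        total = len(parent_labels)
        weighted_child_entropy = sum(
            (len(child) / total) * entropy(child) for child in child_label_lists
        )
        return entropy(parent_labels) - weighted_child_entropy

    # Example: 14 examples (9 positive, 5 negative) split into two subsets.
    parent = ["yes"] * 9 + ["no"] * 5
    children = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]
    print(round(entropy(parent), 3))                    # about 0.940
    print(round(information_gain(parent, children), 3)) # about 0.048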

8. Define regression.
Regression is a type of supervised learning where the goal is to predict a
continuous output value from input features.

9. Compare classification and regression models


Classification models predict discrete output values (i.e. classes) while
regression models predict continuous output values.
10. Define clustering.
Clustering is the process of grouping similar data points together in a dataset.

11. List out the disadvantages of clustering schemes.


Disadvantages of clustering schemes include difficulty in defining similarity,
difficulty in determining the number of clusters, and sensitivity to initial
conditions.

12. Distinguish between classification and Clustering.


Classification assigns predefined labels to data points, while clustering
groups similar data points based on their features.

13. List out the applications of clustering algorithm.


Clustering algorithms are used in fields such as market segmentation, image
processing, and bioinformatics.
14. Identify the challenges of clustering algorithm.
Challenges of clustering algorithms include high dimensionality, clusters of
different densities, and overlapping clusters.

15. Estimate the problems associated with clustering large data.


Clustering large data sets poses computational and memory challenges, and
can also lead to overfitting or loss of global structure.

16. What is k in k-means algorithm? How it is selected?


"k" in k-means algorithm is the number of clusters. It is usually selected by trial and
error or using a technique such as elbow method
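
As a rough sketch of the elbow method, the following Python code (assuming
scikit-learn is available; the synthetic blob data and parameter values are
illustrative) plots how inertia falls as k grows, and the "elbow" where the
drop flattens suggests a reasonable k.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

    inertias = []
    for k in range(1, 10):
        km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        inertias.append(km.inertia_)  # within-cluster sum of squared distances

    # Look for the "elbow": the k after which inertia stops dropping sharply.
    for k, inertia in zip(range(1, 10), inertias):
        print(k, round(inertia, 1))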
17. Compare biological neuron and artificial neuron.
Biological neurons are cells in the brain that process and transmit information,
while artificial neurons are simple models of biological neurons used in
artificial neural networks.

18. Define perceptron.


Perceptron is a type of artificial neural network that can be used for binary
classification.

19. Draw the simple perceptron model.


A simple perceptron model consists of an input layer, a set of weights and a
bias, a weighted-sum unit, and an activation function that produces the
output used to make predictions.

20. Identify the parameters in a perceptron network and its significance.


Parameters in a perceptron network include weights and biases, which are
learned during training and used to make predictions.
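
As an illustrative sketch of how these parameters are learned, the following
Python code applies the perceptron learning rule to a toy AND-gate dataset
(the data, learning rate, and epoch count are assumptions made for the
example).

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])           # AND function

    w = np.zeros(2)                       # weights: one per input feature
    b = 0.0                               # bias: shifts the decision boundary
    lr = 0.1                              # learning rate

    for epoch in range(10):
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0   # step activation
            error = target - pred
            w += lr * error * xi          # move weights toward the correct output
            b += lr * error

    print("weights:", w, "bias:", b)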
21. What are activation functions?
Activation functions are mathematical functions that determine the output of
a neuron in an artificial neural network.
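
A short Python sketch of three common activation functions (the input values
are arbitrary examples; NumPy is assumed available):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))   # squashes input to (0, 1)

    def relu(x):
        return np.maximum(0, x)           # passes positives, zeroes negatives

    def tanh(x):
        return np.tanh(x)                 # squashes input to (-1, 1)

    x = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(x), relu(x), tanh(x))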

22. What is dimensionality reduction?


Dimensionality reduction is the process of reducing the number of features in
a dataset while preserving important information.

23. Justify the necessity for dimensionality reduction in the context of machine learning.
Dimensionality reduction is necessary in machine learning because it can
improve the performance of a model, reduce overfitting, and make the data
easier to visualize and interpret.

24. Define SVM.


SVM (Support Vector Machine) is a type of supervised learning algorithm that
can be used for classification and regression.

25. What is deep learning?


Deep learning is a branch of machine learning that uses multi-layered neural
networks to learn hierarchical representations of data.

26. When to use the regression?


Regression is used when the output variable is continuous.

27. What is meant by a recommendation system?


A recommendation system is a system that makes personalized
recommendations to users based on their past behavior and preferences.

28. What is testing data and training data?


Training data is the data used to train a model and testing data is used to
evaluate the performance of a trained model.

29. What is multiple linear Regression?


Multiple linear regression is a type of linear regression that models the
relationship between multiple independent variables and a single dependent
variable.
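
For example, a minimal multiple linear regression sketch using scikit-learn;
the two-feature matrix and target values below are invented numbers chosen
only to illustrate the idea.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Two independent variables (columns) and one dependent variable.
    X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]])
    y = np.array([5, 6, 11, 12, 17])

    model = LinearRegression().fit(X, y)
    print("coefficients:", model.coef_)   # one coefficient per feature
    print("intercept:", model.intercept_)
    print("prediction for [6, 5]:", model.predict([[6, 5]]))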

30. What is MLP?


MLP (Multi-Layer Perceptron) is a type of artificial neural network that has
multiple layers between the input and output layers.

31. What is numerical data and categorical data?


Numerical data is data that can be measured, such as age or income, while
categorical data is data that can be grouped into categories, such as gender
or color.

32. Point out/examine supervised learning categories and techniques.


Supervised learning categories include classification and regression.
Techniques include decision trees, k-nearest neighbors, and support vector
machines.

33. What is dimensionality reduction?


Dimensionality reduction is the process of reducing the number of features in
a dataset while preserving important information.

34. Explain biological Neuron.


A biological neuron is a specialized cell in the brain that receives, processes,
and transmits information through electrical and chemical signals.

35. Discuss about unbiased learner.


An unbiased learner is a learner whose hypothesis space can represent every
possible concept over the instance space, so it makes no prior assumptions
(no inductive bias); without such a bias, however, it cannot generalize
beyond the observed training examples.

36. What are Support Vectors in SVM?


Support vectors in SVM are the data points that are closest to the decision
boundary and have the greatest impact on determining the position of the
boundary.
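
As a hedged illustration, the following scikit-learn sketch trains a linear
SVM on a tiny invented 2-D dataset and prints the support vectors it finds
(the data and C value are assumptions made for the example):

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel="linear", C=1.0).fit(X, y)
    print("support vectors:\n", clf.support_vectors_)  # points nearest the hyperplane
    print("hyperplane weights:", clf.coef_, "bias:", clf.intercept_)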

37. Define Precision and Recall?


Precision and recall are two metrics used to evaluate the performance of a
classification model. Precision measures the proportion of true positive
predictions out of all positive predictions, while recall measures the
proportion of true positive predictions out of all actual positive examples.
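
A minimal sketch computing both metrics with scikit-learn; the label vectors
are invented for illustration.

    from sklearn.metrics import precision_score, recall_score

    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

    # precision = TP / (TP + FP), recall = TP / (TP + FN)
    print("precision:", precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
    print("recall:", recall_score(y_true, y_pred))        # 3 / (3 + 1) = 0.75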

38. What are overfitting and underfitting?


Overfitting occurs when a model is too complex and fits the training data too
well, leading to poor generalization to new data. Underfitting occurs when a
model is too simple and does not fit the training data well.

39. What is Ensemble learning?


Ensemble learning is a method of combining multiple models to improve the
overall performance of the resulting ensemble.

40. What is the difference between inductive and deductive learning?


Inductive learning is the process of learning from examples to make
generalizations and predictions, while deductive learning uses logical
reasoning to infer specific conclusions from general premises.

Focus Questions:

1. Explain what is the function of ‘Unsupervised Learning’?


2. What do you understand by Eigenvectors and Eigenvalues?
Eigenvectors and eigenvalues are concepts from linear algebra that describe the
behavior of a linear transformation. An eigenvector is a vector whose direction
is left unchanged by the transformation; it is only scaled, and the
corresponding eigenvalue is that scaling factor. In machine learning they are
used in algorithms such as PCA (Principal Component Analysis) and LDA (Linear
Discriminant Analysis): the eigenvectors of the data's covariance matrix give
the directions of maximal variance, and the eigenvalues give the amount of
variance along each of those directions. They are therefore used to identify
the most important features of a dataset and to reduce its dimensionality,
which can improve the performance of a machine learning model and make the data
easier to visualize and interpret. Eigenvectors are also used in image
compression, image processing, and image recognition, since an image can be
represented in terms of a small number of features.
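
To make the connection to PCA concrete, here is a small Python sketch (the
2-D data points are invented; NumPy and scikit-learn are assumed available)
that computes eigenvalues and eigenvectors of the covariance matrix by hand
and then recovers the same directions, up to sign and ordering, with PCA.

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
                  [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

    # Eigen-decomposition of the covariance matrix.
    cov = np.cov(X, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    print("eigenvalues:", eigenvalues)        # variance along each eigenvector
    print("eigenvectors:\n", eigenvectors)    # principal directions

    # The same directions (up to sign and ordering) recovered by PCA.
    pca = PCA(n_components=2).fit(X)
    print("explained variance:", pca.explained_variance_)
    print("components:\n", pca.components_)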
3. List different forms of learning.
There are several forms of learning in machine learning:

1. Supervised learning: This is the most common form of learning. It uses
labeled data, and the model is trained to make predictions or classifications
on new data. Examples of supervised learning algorithms include logistic
regression, decision trees, and support vector machines.
2. Unsupervised learning: In unsupervised learning, the model is not provided
with labeled data, and the goal is to discover patterns or relationships in the
data on its own. Examples of unsupervised learning algorithms include
k-means, hierarchical clustering, and principal component analysis (PCA).
3. Semi-supervised learning: In this type of learning, the model is provided with
some labeled data and some unlabeled data, and it learns to make
predictions or classifications on new data.
4. Reinforcement learning: It's a type of learning where an agent learns to make
decisions in an environment by performing actions and receiving rewards or
penalties.

4. Identify the disadvantages of the K-NN algorithm.


• Accuracy depends on the quality of the data.
• With large data, the prediction stage can be slow.
• Sensitive to the scale of the data and to irrelevant features.
• Requires high memory, since all of the training data must be stored.
• Because it stores all of the training data, prediction can be
computationally expensive.
5. What are the benefits of the K-NN algorithm?

• Quick calculation time – there is no explicit training phase
• Simple algorithm – easy to interpret
• Versatile – useful for both regression and classification
• High accuracy – relatively high, though not necessarily competitive
with the best-tuned supervised learning models
• No assumptions about data – no need to make additional assumptions,
tune several parameters, or build an explicit model, which makes it
useful for nonlinear data (a minimal sketch is given after this list)
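
The sketch below uses scikit-learn's K-NN classifier on a tiny invented 2-D
dataset with k = 3; the data and query points are illustrative assumptions.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[1, 1], [1, 2], [2, 2], [6, 6], [7, 6], [6, 7]])
    y = np.array([0, 0, 0, 1, 1, 1])

    knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
    print(knn.predict([[2, 1]]))   # nearest neighbours are class 0 points
    print(knn.predict([[6, 5]]))   # nearest neighbours are class 1 points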

6. Explain simple model of an Artificial Neuron and its functions.


An artificial neuron is a mathematical model that is designed to mimic the behavior
of a biological neuron. It is a fundamental building block of artificial neural
networks, which are used for tasks such as image recognition, natural language
processing, and decision making.

The basic function of an artificial neuron is to take the input values, multiply them
by their corresponding weights, add the bias term, and then pass the result through
an activation function. The activation function calculates the output of the neuron,
which is a scalar value that represents the neuron's prediction or classification.
Artificial neurons can be connected together to form a neural network, which allows
the model to learn more complex relationships between the inputs and outputs.
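
A short Python sketch of that forward pass (the input values, weights, and
bias below are arbitrary numbers chosen for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def neuron_output(inputs, weights, bias):
        z = np.dot(weights, inputs) + bias   # weighted sum of inputs plus bias
        return sigmoid(z)                    # activation produces the output

    inputs = np.array([0.5, 0.3, 0.2])
    weights = np.array([0.4, 0.7, -0.2])
    bias = 0.1
    print(neuron_output(inputs, weights, bias))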

8. What is the difference between Gini Impurity and Entropy in a Decision Tree?

Gini impurity and entropy are two measures used to evaluate the quality of a split in
a decision tree.

Gini impurity is a measure of the probability of a randomly chosen data point
being incorrectly classified if it were labeled at random according to the
class distribution in a subset. A low Gini impurity value indicates a pure
subset, which means that the data points are mostly of the same class.

Entropy is a measure of disorder or randomness in a set of data points; it
represents the amount of uncertainty in the data and likewise quantifies the
impurity of a subset. A low entropy value indicates a pure subset, which means
that the data points are mostly of the same class.

The main difference between Gini impurity and entropy is the range of their
values. Gini impurity ranges from 0 to 0.5 for a binary problem (at most
1 - 1/n in general), while entropy ranges from 0 to log2(n), where n is the
number of classes. Entropy tends to favor splits that produce balanced,
multi-class partitions, while Gini impurity tends to favor splits that isolate
the most frequent class.
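
The following sketch evaluates both measures on the same class counts; the
counts are invented for illustration.

    from math import log2

    def gini(counts):
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def entropy(counts):
        total = sum(counts)
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    for counts in [(10, 0), (5, 5), (8, 2)]:
        print(counts, "gini:", round(gini(counts), 3),
              "entropy:", round(entropy(counts), 3))
    # A pure node (10, 0) gives 0 for both measures; a 50/50 node gives the
    # maxima (0.5 for Gini impurity, 1.0 bit for entropy).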

9. Explain false negative, false positive, true negative and true positive with a simple example.
False negative, false positive, true negative, and true positive are terms used
to evaluate the performance of a binary classification model.

True positive (TP) refers to the number of cases in which the model correctly
identified the positive class.

True negative (TN) refers to the number of cases in which the model
correctly identified the negative class.

False positive (FP) refers to the number of cases in which the model
predicted the positive class but the actual class is negative.

False negative (FN) refers to the number of cases in which the model
predicted the negative class but the actual class is positive.

For example, consider a model that identifies spam emails. A true positive
would be when the model correctly identifies a spam email as spam. A true
negative would be when the model correctly identifies a non-spam email as
non-spam. A false positive would be when the model identifies a non-spam
email as spam, and a false negative would be when the model identifies a
spam email as non-spam.
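
These four counts can be read off a confusion matrix; a minimal scikit-learn
sketch follows, with invented spam (1) / non-spam (0) labels for
illustration.

    from sklearn.metrics import confusion_matrix

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # model's predictions

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)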

10. Explain the Difference Between Classification and Regression?


Classification and regression are two types of supervised learning in machine
learning.

Classification is used to predict a categorical value, such as a label or
class: it identifies to which set of categories an object or an instance
belongs. Examples of classification algorithms include logistic regression,
decision trees, and support vector machines.

Regression, on the other hand, is used to predict a continuous value, such as
a price, weight, or temperature, i.e. a value within a range of possible
values. Examples of regression algorithms include linear regression,
polynomial regression, and decision tree regression.

In summary, classification is used to predict discrete or categorical output
variables, while regression is used to predict continuous output variables.

11. State Bayes theorem.


Bayes' theorem is a fundamental result in probability theory that relates the
conditional probability of an event to the prior probability of that event. It
states that the probability of an event A given that another event B has
occurred is proportional to the probability of the event B given that event A
has occurred, multiplied by the probability of event A.
Formally, it is written as P(A|B) = (P(B|A) * P(A)) / P(B) where P(A|B) is the
conditional probability of event A given event B, P(B|A) is the conditional
probability of event B given event A, P(A) is the prior probability of event A,
and P(B) is the prior probability of event B.

Bayes' theorem is used in many machine learning algorithms, such as Naive
Bayes for classification and Bayesian networks for probabilistic reasoning.
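
A worked numeric sketch of the formula in Python; the disease/test
probabilities below are invented purely for illustration.

    p_disease = 0.01                  # P(A): prior probability of the disease
    p_pos_given_disease = 0.95        # P(B|A): probability of a positive test if diseased
    p_pos_given_healthy = 0.05        # probability of a positive test if healthy

    # Total probability of a positive test, P(B).
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))

    # Posterior probability of disease given a positive test, P(A|B).
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(round(p_disease_given_pos, 3))   # about 0.161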

12. Explain conditional probability


Conditional probability is the probability of an event occurring given that
another event has occurred. It is represented by P(A|B) where A is the event
of interest and B is the event that has occurred. It expresses the likelihood of
event A happening given that event B has already happened.

For example, if we know that a person has a certain disease, we can use
conditional probability to determine the probability that they have a
symptom given that they have the disease.

Conditional probability is calculated using the formula P(A|B) = P(A and B) /
P(B), where P(A and B) is the probability of events A and B occurring
together, and P(B) is the probability of event B occurring.
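
A tiny numeric sketch of this formula (the probabilities are invented, with
A = has the symptom and B = has the disease):

    p_a_and_b = 0.08    # P(symptom and disease)
    p_b = 0.10          # P(disease)

    p_a_given_b = p_a_and_b / p_b
    print(p_a_given_b)   # 0.8: 80% of people with the disease show the symptom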

Conditional probability is used in many machine learning algorithms, such as
decision trees, Bayesian networks, and Markov models for probabilistic
reasoning.

13. Write the difference between supervised learning and unsupervised learning.

Supervised learning is a type of machine learning where the model is trained
on labeled data, meaning that the desired output for each input is provided.
The goal of supervised learning is to learn a general rule that maps inputs
to outputs. Examples of supervised learning algorithms include linear
regression, logistic regression, and support vector machines.

Unsupervised learning, on the other hand, is a type of machine learning where
the model is not provided with labeled data. The goal of unsupervised
learning is to find patterns or structure in the data without any prior
knowledge of the output. Examples of unsupervised learning algorithms include
k-means, hierarchical clustering, and principal component analysis.

14. Discuss major applications of machine learning.


15. Explain various learning techniques involved in unsupervised learning?
16. Comment on the issues in machine learning
Machine learning has many issues that researchers and practitioners need to be
aware of. Some of the main issues include:

Overfitting: Overfitting occurs when a model is trained too well on the
training data but generalizes poorly to new, unseen data.

Underfitting: Underfitting occurs when a model is not trained well enough and
performs poorly on both the training data and new, unseen data.

Data bias: Data bias occurs when the data used to train a model is not
representative of the real-world data, leading to poor performance on new,
unseen data.

Data scarcity: Data scarcity occurs when there is not enough data to train a
model, leading to poor performance on new, unseen data.

Privacy and security: Machine learning algorithms can be used to access and
manipulate sensitive data, which can lead to privacy and security issues.

Explainability: Many machine learning algorithms are not interpretable or
understandable by humans, which makes it difficult to understand how the
model is making predictions.

Scalability: Some machine learning algorithms are computationally expensive
and can be challenging to scale to large datasets.

17. Explain various learning techniques involved in supervised learning?


Regression: This technique is used to predict continuous values, such as a
price, weight, or temperature. Examples of regression algorithms include
linear regression, polynomial regression, and decision tree regression.

Classification: This technique is used to predict categorical values, such as
a label or class. Examples of classification algorithms include logistic
regression, decision trees, and support vector machines.

Neural networks: This technique can approximate almost any function and is
used to classify, cluster, and even generate images, music, and more.

Ensemble methods: This technique combines the predictions of multiple models
to improve performance; examples include Random Forest, Gradient Boosting,
and Bagging.

Decision trees: This technique builds a model in the form of a tree structure
and is used for both classification and regression problems. (A brief sketch
contrasting a single decision tree with a random-forest ensemble is given
below.)
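
The following scikit-learn sketch compares a single decision tree with a
random forest on the built-in Iris dataset; the split ratio and parameter
values are illustrative assumptions.

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    forest = RandomForestClassifier(n_estimators=100,
                                    random_state=0).fit(X_train, y_train)

    print("decision tree accuracy:", tree.score(X_test, y_test))
    print("random forest accuracy:", forest.score(X_test, y_test))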

18. What is confusion matrix? Explain with one example.


19. What are the factors affecting the performance of machine learning algorithm?
20. Discuss about types of artificial neural networks.
21. Discuss major applications of machine learning.

Long Questions:

1. What is machine learning? Discuss learning and machine learning. Discuss about various
types of machine learning.
2. Explain the machine learning life cycle with a diagram.
3. Explain the following
a) Linear regression b) Logistic regression
4. Find the covariance and correlation coefficient of the data X = {1, 2, 3, 4, 5} and Y = {1, 4, 9, 16, 25}.
5. What is a support vector machine? Discuss in detail.
6. Define Multiclass Classification with a neat diagram?
7. Define clustering. What are the different types of clustering explain in detail.
8. Explain the K-means clustering algorithm with an example.
9. a. What is a density-based clustering algorithm?
b. Explain the DBSCAN clustering algorithm with an example.
10. a. How is a Random Forest related to Decision Trees?
b. Explain Random forest algorithm with an example.
11. Explain SVM classifier with a suitable example
12. What is reinforcement learning? Explain its concepts in detail.
13. How is ID3 constructed? Derive the procedure to construct a decision tree using ID3.
14. A. What is reinforcement learning? What are the features of reinforcement learning?
B. Explain the concept of the Bellman equation in reinforcement learning with an example.
15. A. What is Ensemble learning in Machine Learning?
B. Explain different methods of ensemble learning.
16. Explain the concept of CNN with an example.
17. Explain the K-Mode clustering algorithm with an example.

18. Consider the training dataset given in the following table. Use Weighted k-NN and
determine the class. Test instance (7.6, 60, 8) and K=3
