Ml-Unit 2-QB

The document covers various concepts in supervised learning, including linear regression, logistic regression, and classification models like decision trees and support vector machines. It discusses overfitting, the importance of training and test sets, and ensemble learning methods such as bagging and boosting. Additionally, it contrasts different machine learning algorithms and their applications, providing insights into Bayesian linear regression and random forests.


UNIT II SUPERVISED LEARNING

Introduction to machine learning – Linear Regression Models: Least squares,


single & multiple variables, Bayesian linear regression, gradient descent, Linear
Classification Models: Discriminant function – Probabilistic discriminative model - Logistic
regression, Probabilistic generative model – Naive Bayes, Maximum margin classifier –
Support vector machine, Decision Tree, Random forests.

PART - A

1. What is the niche of Machine Learning? (NOV/DEC 2024) CS3491


Machine learning is a branch of computer science that deals with programming systems so that they automatically learn and improve with experience. For example, robots are programmed to perform tasks based on data they gather from sensors; the system effectively learns its program from the data.

2. What is ‘Overfitting’ in Machine learning?


In machine learning, overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. It is typically observed when a model is excessively complex, having too many parameters relative to the number of training examples. A model that has been overfit performs well on the training data but poorly on unseen data.
3. Why does overfitting happen?
The possibility of overfitting exists because the criterion used for training the model is not the same as the criterion used to judge the efficacy of the model.

4. Assume a disease so rare that it is seen in only one person out of every million.
Assume also that we have a test that is effective in that if a person has the disease, there
is a 99 percent chance that the test result will be positive; however, the test is not
perfect, and there is a one in a thousand chance that the test result will be positive on a
healthy person. Assume that a new patient arrives and the test result is positive. What is
the probability that the patient has the disease? (APRIL/MAY 2024)CS3491
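The source leaves this question unanswered; a minimal worked answer via Bayes' theorem, using only the values given in the question, can be sketched as:

```python
# Worked answer via Bayes' theorem (all values taken from the question).
p_disease = 1e-6            # P(D): one person in a million
p_pos_given_d = 0.99        # P(+ | D): test sensitivity
p_pos_given_healthy = 1e-3  # P(+ | not D): false-positive rate

# P(+) by the law of total probability
p_pos = p_pos_given_d * p_disease + p_pos_given_healthy * (1 - p_disease)

# P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
print(round(p_d_given_pos, 6))  # 0.000989, i.e. roughly 0.1%
```

Despite the positive result, the probability of disease is still only about one in a thousand, because the disease is so rare that false positives vastly outnumber true positives.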

5. Write 3 types of ensemble learning (APRIL/MAY 2024)CS3491


 Bagging
 Boosting
 Stacking
6. What are the three stages to build the hypotheses or model in machine learning?
 Model building
 Model testing
 Applying the model

7. State the logic behind Gaussian processes ( Nov/Dec 2024) CS3491


The logic behind Gaussian Processes (GPs) is that they provide a non-parametric way to
model functions by assuming that any finite set of function values has a joint Gaussian
distribution. GPs define a distribution over functions, where the function values are
related through a covariance function (kernel) that encodes assumptions about the
smoothness and structure of the function.
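The covariance-function idea above can be sketched with the common RBF (squared-exponential) kernel; the length scale and input points here are illustrative choices:

```python
import numpy as np

# RBF (squared-exponential) kernel: the covariance function that encodes
# smoothness assumptions in a Gaussian process.
def rbf_kernel(x1, x2, length_scale=1.0):
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

x = np.array([0.0, 0.5, 1.0])
K = rbf_kernel(x, x)  # 3x3 covariance of the joint Gaussian over f(x)
print(K.shape)        # (3, 3); K is symmetric with ones on the diagonal
```

Nearby inputs get covariance close to 1 (their function values move together), while distant inputs get covariance close to 0, which is exactly the smoothness assumption described above.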

8. What is ‘Training set’ and ‘Test set’?


In machine learning and related areas of information science, a 'training set' is the set of data used to discover a potentially predictive relationship. The training set is the collection of examples given to the learner, while the 'test set' is a set of examples held back from the learner and used to test the accuracy of the hypotheses the learner generates. The training set and test set must be kept distinct.

9. What is Random Forest? (APRIL/MAY 2023) CS3491


Random forest is an ensemble learning algorithm that builds many decision trees on random subsets of the data and aggregates their predictions (majority vote for classification, averaging for regression). It is flexible and easy to use, often giving good results even without hyper-parameter tuning, which makes it one of the most widely used algorithms for both classification and regression tasks.

10. What are the advantages of Naive Bayes?


A Naïve Bayes classifier converges more quickly than discriminative models such as logistic regression, so it needs less training data. Its main limitation is that it cannot learn interactions between features, since it assumes the features are conditionally independent given the class.

11. What is the main key difference between supervised and unsupervised machine
learning? (APRIL/MAY 2023) CS3491
 Supervised learning: needs labelled data to train the model. For example, to solve a classification problem (a supervised learning task), you need labelled data to train the model and to classify the data into your labelled groups.
 Unsupervised learning: does not need any labelled dataset. This is the main key difference between supervised and unsupervised learning.

12. Compare and contrast linear regression and logistic regression ( April/May 2023)
Linear Regression and Logistic Regression are both fundamental algorithms in machine
learning, but they differ in their purpose and the type of problems they solve:
 Linear Regression: Used for predicting continuous numerical values. It models the
relationship between the dependent variable and independent variables by fitting a
straight line (linear relationship) to the data. The output is a real number.
 Logistic Regression: Used for classification problems, where the goal is to predict
discrete outcomes (e.g., 0 or 1). It applies the logistic (sigmoid) function to the linear
combination of inputs to produce a probability, which is then mapped to a class label
(typically binary).
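The contrast above can be sketched in a few lines; the weights here are illustrative values, not fitted coefficients:

```python
import math

# The same linear combination w*x + b is the final output in linear
# regression, but is passed through the sigmoid in logistic regression.
w, b = 2.0, -1.0
x = 1.5

linear_output = w * x + b                         # real number: 2.0
probability = 1 / (1 + math.exp(-linear_output))  # squashed into (0, 1)
label = int(probability >= 0.5)                   # mapped to a class label

print(linear_output, round(probability, 3), label)  # 2.0 0.881 1
```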

13. What is a gradient? How is gradient descent useful in machine learning? (APRIL/MAY 2023)
A gradient is a vector that represents the direction and rate of the fastest increase of a
function. In the context of machine learning, the gradient of a loss function with respect to
model parameters shows how much the loss will change if the parameters are adjusted.
Gradient Descent is an optimization algorithm used in machine learning to minimize the
loss function. It involves iteratively adjusting the model parameters in the opposite
direction of the gradient, effectively reducing the loss. The algorithm continues to update
parameters until it reaches a local minimum (or close enough to it).
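A minimal sketch of gradient descent on the one-dimensional loss f(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate and starting point are illustrative:

```python
# Gradient descent: repeatedly step opposite to the gradient of the loss.
w = 0.0
learning_rate = 0.1
for _ in range(100):
    grad = 2 * (w - 3)         # gradient of the loss at the current w
    w -= learning_rate * grad  # update in the direction of steepest descent
print(round(w, 4))  # converges to the minimum at w = 3
```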

14. Relate Entropy and information gain ( Nov/Dec 2023)


Entropy: Measures the uncertainty or randomness in a dataset. It quantifies how mixed
or pure the data is with respect to the target classes. Lower entropy indicates more
certainty (or purity), while higher entropy means more disorder.
Information Gain: Measures the reduction in entropy when a dataset is split based on a
particular attribute. It shows how much information is gained by knowing the value of
that attribute in reducing uncertainty about the target class.
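A toy computation showing the relationship; the dataset and the split are illustrative:

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# Parent node: 4 positives and 4 negatives -> entropy = 1 bit (maximum disorder).
parent = [1, 1, 1, 1, 0, 0, 0, 0]
# A split on some attribute produces two purer subsets.
left, right = [1, 1, 1, 0], [1, 0, 0, 0]

weighted = (len(left) / len(parent)) * entropy(left) \
         + (len(right) / len(parent)) * entropy(right)
info_gain = entropy(parent) - weighted  # reduction in entropy from the split
print(round(info_gain, 4))  # 0.1887
```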

15. How does CART solve regression problems? (Nov/Dec 2023)

CART (Classification and Regression Trees) solves regression problems by constructing a binary tree in which each internal node represents a decision rule based on feature values and each leaf node contains a predicted value, typically the mean of the target values of the training samples that reach that leaf. Splits are chosen to minimize the squared error (variance) within the resulting subsets.
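The split-selection idea for a regression tree can be sketched as follows; the data and the exhaustive threshold search are illustrative:

```python
# CART regression split: try each threshold on a feature and keep the one
# that minimizes the weighted variance (squared error) of the two leaves.
def variance(ys):
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

xs = [1, 2, 3, 10, 11, 12]           # one feature
ys = [1.0, 1.2, 0.9, 5.0, 5.2, 4.9]  # continuous target

best = None
for t in xs:
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    if not left or not right:
        continue
    score = (len(left) * variance(left) + len(right) * variance(right)) / len(ys)
    if best is None or score < best[0]:
        best = (score, t)

threshold = best[1]
print(threshold)  # 3: the split lands between the two target clusters
```

Each resulting leaf would then predict the mean of its targets, and the procedure recurses on each side until a stopping criterion is met.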

16. What are the types of classification models?


 Logistic Regression
 Naive Bayes
 K-Nearest Neighbors
 Decision Tree
 Support Vector Machines
17. Mention the merits of Bayesian linear regression. (April/May 2024)

The merits of Bayesian Linear Regression include:

1. Uncertainty Estimation: It provides a probabilistic approach to modelling the uncertainty in the regression coefficients, offering a distribution over the possible values rather than a single point estimate. This allows for more informed predictions with confidence intervals.
2. Regularization: Bayesian Linear Regression inherently incorporates regularization through prior distributions, which helps prevent overfitting, especially in cases with limited data or high-dimensional features.
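Both merits show up in the standard closed-form posterior mean for a Gaussian prior N(0, 1/alpha * I) on the weights with noise precision beta; the data and the alpha, beta values below are illustrative:

```python
import numpy as np

# Posterior over weights: S = (alpha*I + beta*X^T X)^-1, mean m = beta*S*X^T*y.
# The alpha*I term is exactly the regularization contributed by the prior.
alpha, beta = 1.0, 25.0
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # bias column + one feature
y = np.array([0.1, 1.1, 1.9])                        # roughly y = x

S = np.linalg.inv(alpha * np.eye(2) + beta * X.T @ X)  # posterior covariance
m = beta * S @ X.T @ y                                 # posterior mean
print(m.round(2))  # slope near 0.9, slightly shrunk toward zero by the prior
```

The diagonal of S quantifies the remaining uncertainty in each coefficient, giving the confidence intervals mentioned above.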

18. Which is better: linear regression or random forest?


Neither is universally better. Multiple linear regression is widely used for prediction and is preferable when the relationship is approximately linear, the data are limited, or interpretability matters. Random forest regression is an alternative that does not make the assumptions of linear regression and can capture non-linear relationships, but it typically needs more data and is harder to interpret. Linear regression can be superior to random forest regression when its assumptions hold.

19. What is SVM?


It is a supervised learning algorithm used both for classification and regression
problems. A type of discriminative modelling, support vector machine (SVM) creates a
decision boundary to segregate n-dimensional space into classes. The best decision
boundary is called a hyperplane created by choosing the extreme points called the
support vectors.
20. Distinguish between random forests vs SVM. (April/May 2024)
There are a couple of reasons why a random forest can be a better choice of model than a support vector machine:
● Random forests allow you to determine feature importance; SVMs cannot do this directly.
● Random forests are much quicker and simpler to build than an SVM.
● For multi-class classification problems, SVMs require a one-vs-rest approach, which is less scalable and more memory-intensive.

Part – B
1. Explain Naïve Bayes Classifier with an Example.
2. Elaborate on logistic regression with an example. Explain the process of computing coefficients. (APRIL/MAY 2023) CS3491
3. What is a classification tree? Explain the steps to construct a classification tree. List and explain the different procedures used. (APRIL/MAY 2023) CS3491
4. State when and why you would use random forests vs SVM. (April/May 2024) CS3491
5. Explain the principle of the gradient descent algorithm. Accompany your explanation
with a diagram. (April/May 2024) CS3491
6. Describe the general procedure of the random forest algorithm. (Nov/Dec 2023) CS3491
7. With a suitable example, explain knowledge extraction in detail. (Nov/Dec 2023) CS3491
8. Explain the following
a) Linear regression
b) Logistic Regression
9. CS3491 (April/May 2024)
10. CS3491 (April/May 2023)

11. Explain how the optimal hyperplane differs from other hyperplanes. Elaborate on how SVM is able to achieve this. (April/May 2023)
12. Explain the process of constructing CART (Classification and Regression Tree) with a suitable
example ( April/May 2023)
13. Write short notes on regression, correlation, and the limitations of the regression model. (Nov/Dec 2023)
14. List the advantages of SVM and how the optimal hyperplane differs from other hyperplanes. (Nov/Dec 2023)
15. Discuss supervised and unsupervised learning with examples. (Nov/Dec 2023)
16. With an example, explain decision tree concepts in detail. (April/May 2024)
17. April/May 2024
18. April/May 2024
