Aiml-Qb - Unit 3

UNIT III SUPERVISED LEARNING

Introduction to machine learning – Linear Regression Models: Least squares,
single & multiple variables, Bayesian linear regression, gradient descent – Linear
Classification Models: Discriminant function – Probabilistic discriminative model –
Logistic regression, Probabilistic generative model – Naive Bayes, Maximum margin
classifier – Support vector machine, Decision Tree, Random forests.

PART - A

1. What is the niche of Machine Learning? (NOV/DEC 2024)


Machine learning is a branch of computer science concerned with programming systems
so that they automatically learn and improve with experience. For example, robots are
programmed to perform tasks based on the data they gather from sensors; the system
effectively learns its program from data.

2. Mention the difference between Data Mining and Machine learning?


Machine learning relates to the study, design, and development of algorithms that
give computers the capability to learn without being explicitly programmed. Data
mining, in contrast, is the process of extracting knowledge or previously unknown,
interesting patterns from (often unstructured) data.

3. What is ‘Overfitting’ in Machine learning?


In machine learning, overfitting occurs when a statistical model describes random
error or noise instead of the underlying relationship. It is normally observed when a
model is excessively complex, i.e., it has too many parameters relative to the number
of training examples. An overfit model exhibits poor predictive performance on new
data.

4. Why overfitting happens?


Overfitting is possible because the criterion used for training the model is not the
same as the criterion used to judge the model's efficacy: a model can fit the training
data very well yet generalize poorly to unseen data.

5. How can you avoid overfitting?


Overfitting can be avoided by using a large amount of data; it tends to happen when the
dataset is small and you are forced to build a model from it. In that situation you can
use a technique known as cross-validation. The dataset is split into two sections: a
training set and a testing set. The data points in the training set are used to build the
model, while the testing set is used only to evaluate it. In this technique, a model is
given a dataset of known data on which training is run (the training dataset) and a
dataset of unknown data against which the model is tested. The idea of cross-validation
is to set aside data to “test” the model during the training phase, so that the model is
judged on data it has not seen.
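The hold-out split described above can be sketched as follows (the data values and the 80/20 split ratio are made up purely for illustration):

```python
import numpy as np

# Hold-out validation: split the data into training and testing sections.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 100)

idx = rng.permutation(100)
train_idx, test_idx = idx[:80], idx[80:]  # 80% train, 20% test

# Fit a linear model on the training section only
w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)

# Evaluate on the held-back testing section
test_mse = np.mean((X[test_idx] @ w - y[test_idx]) ** 2)
```

The test error measures how well the model generalizes to data it never saw during training, which is exactly what overfitting destroys.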

6. Assume a disease so rare that it is seen in only one person out of every million. Assume
also that we have a test that is effective in that if a person has the disease, there is a 99
percent chance that the test result will be positive; however, the test is not perfect, and
there is a one in a thousand chance that the test result will be positive on a healthy person.
Assume that a new patient arrives and the test result is positive. What is the probability
that the patient has the disease? (APRIL/MAY 2024)
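By Bayes' theorem, P(disease | positive) = (0.99 × 10⁻⁶) / (0.99 × 10⁻⁶ + 0.001 × (1 − 10⁻⁶)) ≈ 0.00099, i.e. about 0.1 percent: despite the positive test, the patient almost certainly does not have the disease, because false positives among the vast healthy population swamp the true positives. A quick numerical check:

```python
# Bayes' theorem: P(disease | positive test result)
p_disease = 1e-6              # prior: 1 in a million
p_pos_given_disease = 0.99    # test sensitivity
p_pos_given_healthy = 0.001   # false-positive rate (1 in a thousand)

p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))
p_disease_given_positive = p_pos_given_disease * p_disease / p_positive
# ≈ 0.00099, i.e. about 0.1%
```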

7. Write 3 types of ensemble learning (APRIL/MAY 2024)
● Bagging
● Boosting
● Stacking
8. What are the three stages to build the hypotheses or model in machine learning?
● Model building

● Model testing

● Applying the model

9. State the logic behind Gaussian processes ( Nov/Dec 2024)


The logic behind Gaussian Processes (GPs) is that they provide a non-parametric way to
model functions by assuming that any finite set of function values has a joint Gaussian
distribution. GPs define a distribution over functions, where the function values are related
through a covariance function (kernel) that encodes assumptions about the smoothness and
structure of the function.

10. What is ‘Training set’ and ‘Test set’?


In machine learning and related areas of information science, a ‘training set’ is the
set of examples given to the learner, used to discover a potentially predictive
relationship. A ‘test set’ is the set of examples held back from the learner and used
to test the accuracy of the hypotheses the learner generates. The training set and the
test set are kept distinct.

11. What is Random forest? (APRIL/MAY 2023)


Random forest is a flexible, easy-to-use ensemble machine learning algorithm that
builds many decision trees on bootstrapped samples of the data and aggregates their
predictions. It often gives good results even without hyper-parameter tuning; because
of its simplicity and versatility, it is one of the most widely used algorithms for
both classification and regression tasks.
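A minimal sketch of training a random forest, here using scikit-learn's `RandomForestClassifier` on the built-in Iris dataset (the dataset choice and parameter values are illustrative, assuming scikit-learn is available):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# 100 trees, each grown on a bootstrap sample with random feature subsets
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```

Note that the defaults work well here with no tuning, which is the property the answer above highlights.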

12. What are the advantages of Naive Bayes?


A Naïve Bayes classifier converges more quickly than discriminative models like
logistic regression, so you need less training data. Its main limitation, however, is
that it cannot learn interactions between features, since it assumes the features are
conditionally independent given the class.

13. What is the main key difference between supervised and unsupervised machine
learning? (APRIL/MAY 2023)
Supervised learning: needs labelled data to train the model. For example, to solve a
classification problem (a supervised learning task), you need labelled data to train
the model and to classify the data into your labelled groups.
Unsupervised learning: does not need any labelled dataset. The need for labels is the
main key difference between supervised and unsupervised learning.

14. What is a Linear Regression?
In simple terms, linear regression is a linear approach to modeling the relationship
between a dependent variable (the scalar response) and one or more independent
variables (the explanatory variables). With one explanatory variable, it is called
simple linear regression; with more than one independent variable, the process is
referred to as multiple linear regression.
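A minimal sketch of simple linear regression fitted by least squares with NumPy (the data values are made up for illustration and lie roughly on y = 2x):

```python
import numpy as np

# Fit y = w*x + b by least squares
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

A = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
# w is close to 2 and b close to 0, matching the data-generating trend
```

With several explanatory variables, the same call works unchanged: the design matrix simply gains one column per variable, which is multiple linear regression.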

15. What are the disadvantages of the linear regression model?


One of the most significant drawbacks of the linear model is that it is sensitive to
outliers, which can distort the overall result. Another notable drawback is
overfitting; similarly, underfitting is also a significant disadvantage of the linear
model.

16. What is the difference between classification and regression?


Classification produces discrete results: it is used to classify data into specific
categories, for example classifying emails into spam and non-spam. Regression
analysis, in contrast, is used when dealing with continuous data, for example
predicting stock prices at a certain point in time.

17. What is the difference between stochastic gradient descent (SGD) and
gradient descent (GD)?
Both algorithms find a set of parameters that minimize a loss function by evaluating
the parameters against data and then making adjustments.
In standard gradient descent, you evaluate all training samples for each parameter
update. This is akin to taking big, slow steps toward the solution. In stochastic
gradient descent, you evaluate only one training sample before updating the
parameters. This is akin to taking small, quick steps toward the solution.
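The two update rules can be sketched side by side on a toy one-parameter problem (the data, learning rate, and epoch count are illustrative assumptions):

```python
import numpy as np

# Minimize mean squared error for y ≈ w*x; the true slope is 3.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100)
y = 3.0 * x + rng.normal(0, 0.1, 100)

lr = 0.1
w_gd, w_sgd = 0.0, 0.0

for epoch in range(100):
    # Batch gradient descent: one update computed from ALL samples
    grad = np.mean(2 * (w_gd * x - y) * x)
    w_gd -= lr * grad

    # Stochastic gradient descent: one update from a single random sample
    i = rng.integers(len(x))
    w_sgd -= lr * 2 * (w_sgd * x[i] - y[i]) * x[i]
```

Both estimates approach the true slope of 3; the batch version moves smoothly, while the stochastic version moves noisily but each step is far cheaper on large datasets.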

18. What are the different types of least squares?


Least squares problems fall into two categories: linear or ordinary least squares and
nonlinear least squares, depending on whether or not the residuals are linear in all
unknowns. The linear least-squares problem occurs in statistical regression analysis;
it has a closed-form solution.
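The closed-form solution mentioned above is given by the normal equations, w = (XᵀX)⁻¹Xᵀy. A small sketch with made-up data whose targets are generated exactly by the weights [2, 1, 2]:

```python
import numpy as np

# Ordinary least squares via the normal equations: solve (X^T X) w = X^T y.
# Columns: two features plus an intercept column of ones.
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 0.0, 1.0],
              [3.0, 1.0, 1.0],
              [4.0, 3.0, 1.0]])
y = np.array([6.0, 6.0, 9.0, 13.0])  # exactly 2*x1 + 1*x2 + 2

w = np.linalg.solve(X.T @ X, X.T @ y)  # recovers [2, 1, 2]
```

Nonlinear least squares has no such closed form in general; the residuals are nonlinear in the unknowns, so iterative methods must be used instead.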

19. What are some advantages to using Bayesian linear regression?


Bayesian regression is not a different algorithm but a different approach to
statistical inference. Its major advantage is that Bayesian processing recovers the
whole posterior distribution over the inferential solutions, rather than only a point
estimate and a confidence interval as in classical regression.

20. What Is Bayesian Linear Regression?


Bayesian linear regression is a conditional modeling approach in which the mean of
one variable (the regressand) is characterized by a weighted sum of other variables
(the regressors). It aims to determine the posterior distribution of the regression
coefficients, starting from a prior distribution over them, and ultimately permits
out-of-sample forecasting of the regressand conditional on observed values of the
regressors.

Part – B
1. Explain Naïve Bayes Classifier with an Example.
2. Elaborate on logistic regression with an example. Explain the process of computing
coefficients. (APRIL/MAY 2023)
3. What is a classification tree? Explain the steps to construct a classification tree.
List and explain the different procedures used. (APRIL/MAY 2023)
4. State when and why you would use random forests vs. SVM. (April/May 2024)
5. Explain the principle of the gradient descent algorithm. Accompany your explanation
with a diagram. (April/May 2024)
6. Describe the general procedure of the random forest algorithm. (Nov/Dec 2023)
7. With a suitable example, explain knowledge extraction in detail. (Nov/Dec 2023)
8. Explain the following
a) Linear regression
b) Logistic Regression
9. APRIL/MAY 2024

10.APRIL/MAY 2023

Prepared By,                                        Approved By,
Mrs. A.Deepika                                      Dr.R.Deepalakshmi
AP/AIDS                                             Prof & Head /AIDS
