QB Unit 2
II YEAR / IV SEM
Prepared by
S. BASKARI, M.Tech, MBA, (Ph.D)
ASSISTANT PROFESSOR
PART A
The least squares method minimizes the sum of the squares of the residuals (the differences between
observed and predicted values) to find the best-fitting linear model.
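For illustration, here is a minimal NumPy sketch (not part of the original question bank; the toy data are assumed) of fitting a line by minimizing the sum of squared residuals:

    import numpy as np

    # Toy design matrix: the first column of ones supplies the intercept term.
    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
    y = np.array([2.1, 4.0, 6.2, 7.9])

    # np.linalg.lstsq returns the weights that minimize ||X w - y||^2.
    w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    print(w)   # [intercept, slope] of the best-fitting line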
Multiple linear regression involves more than one independent variable, whereas simple linear regression involves
only one independent variable.
4. How does Bayesian linear regression differ from ordinary linear regression?
Bayesian linear regression incorporates prior distributions over the model parameters and updates these priors with
data to obtain posterior distributions, providing a probabilistic interpretation of the regression model.
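A minimal sketch of the conjugate Gaussian case, assuming a prior w ~ N(0, alpha^-1 I) and Gaussian noise of precision beta (the values and toy data below are assumptions chosen only for illustration):

    import numpy as np

    alpha, beta = 1.0, 25.0                              # assumed prior and noise precisions
    X = np.array([[1.0, 0.1], [1.0, 0.4], [1.0, 0.7], [1.0, 0.9]])
    y = np.array([0.2, 0.5, 0.9, 1.1])

    S_inv = alpha * np.eye(X.shape[1]) + beta * X.T @ X  # posterior precision matrix
    S = np.linalg.inv(S_inv)                             # posterior covariance
    m = beta * S @ X.T @ y                               # posterior mean of the weights
    print(m, np.diag(S))                                 # point estimate plus its uncertainty

The posterior mean plays the role of the point estimate, while the posterior covariance quantifies how uncertain each weight remains after seeing the data.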
Gradient descent is an optimization algorithm used to minimize the cost function by iteratively adjusting the
model parameters in the direction of the steepest decrease in the cost function.
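A minimal sketch of batch gradient descent on the mean-squared-error cost for linear regression (the learning rate, iteration count, and toy data are assumed for illustration):

    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
    y = np.array([2.0, 4.1, 6.0, 8.1])
    w = np.zeros(X.shape[1])
    lr, n = 0.05, len(y)

    for _ in range(2000):
        grad = X.T @ (X @ w - y) / n   # gradient of the cost with respect to w
        w -= lr * grad                 # step in the direction of steepest decrease
    print(w)                           # approaches the least squares solution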
The discriminant function assigns input data points to different classes by defining the decision boundaries
between them.
7. Describe the Perceptron algorithm.
The Perceptron algorithm is a simple linear classifier that updates the weights of the model iteratively based on
misclassified examples until it finds a separating hyperplane or reaches a stopping criterion.
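A minimal sketch of the Perceptron update rule on a toy linearly separable set (the data and the pass budget are assumptions for illustration; labels are in {-1, +1}):

    import numpy as np

    X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
    y = np.array([1, 1, -1, -1])
    w, b = np.zeros(2), 0.0

    for _ in range(100):                   # stopping criterion: fixed pass budget
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:     # example is misclassified (or on the boundary)
                w += yi * xi               # update: w <- w + y*x, b <- b + y
                b += yi
                mistakes += 1
        if mistakes == 0:                  # a separating hyperplane has been found
            break
    print(w, b)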
Logistic regression models the probability of a binary outcome using a logistic function, providing a probabilistic
interpretation of classification.
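For reference, a one-line sketch of the logistic (sigmoid) link; the weights and input below are assumed purely for illustration:

    import numpy as np

    # P(y = 1 | x) = 1 / (1 + exp(-(w.x + b)))
    def predict_proba(x, w, b):
        return 1.0 / (1.0 + np.exp(-(w @ x + b)))

    print(predict_proba(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1))  # about 0.52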
The Naive Bayes classifier is a probabilistic generative model that assumes independence between features given
the class label and uses Bayes' theorem to predict the probability of each class.
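A toy sketch of the Naive Bayes decision rule, where P(class | x) is proportional to P(class) times the product of P(x_i | class) over the features; the priors and word likelihoods below are invented solely to illustrate the computation:

    import numpy as np

    priors = {"spam": 0.4, "ham": 0.6}                       # assumed class priors
    likelihood = {"spam": {"offer": 0.7, "meeting": 0.1},    # assumed P(word | class)
                  "ham":  {"offer": 0.1, "meeting": 0.6}}
    words = ["offer", "offer", "meeting"]

    # Work in log space and pick the class with the highest posterior score.
    scores = {c: np.log(priors[c]) + sum(np.log(likelihood[c][w]) for w in words)
              for c in priors}
    print(max(scores, key=scores.get))                       # -> "spam"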
The objective of an SVM is to find the hyperplane that maximizes the margin between different classes, thereby
achieving the best possible separation.
11. How does a decision tree classify data?
A decision tree classifies data by recursively splitting the data into subsets based on the value of input features,
leading to a tree structure where each leaf node represents a class label.
A random forest is an ensemble learning method that constructs multiple decision trees during training and outputs
the mode of their predictions for classification tasks or the average for regression tasks.
13. What are the assumptions of the linear regression model?
The assumptions of the linear regression model include linearity, independence, homoscedasticity (constant
variance of errors), normality of errors, and no multicollinearity among the independent variables.
The goodness-of-fit of a linear regression model is commonly measured using the coefficient of determination
(R²), which indicates the proportion of the variance in the dependent variable that is predictable from the
independent variables.
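A short sketch of how R² is computed from predictions (the toy values are assumed):

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.0, 9.0])
    y_pred = np.array([2.8, 5.1, 7.2, 8.7])

    ss_res = np.sum((y_true - y_pred) ** 2)           # unexplained (residual) variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation around the mean
    print(1 - ss_res / ss_tot)                        # R^2 close to 1 indicates a good fit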
The Perceptron algorithm is a simple linear classifier that updates weights based on misclassified examples
without providing probability estimates, while logistic regression models the probability of class membership using
the logistic function and is suitable for binary classification with a probabilistic interpretation.
16. What is the significance of the kernel trick in Support Vector Machines (SVM)?
The kernel trick allows SVMs to efficiently perform nonlinear classification by implicitly mapping input features
into a higher-dimensional space without explicitly computing the coordinates in that space, enabling the separation
of nonlinearly separable data.
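As a sketch, the radial basis function (RBF) kernel below evaluates a similarity that corresponds to an inner product in an implicit, much higher-dimensional feature space; the gamma value is an assumed hyperparameter:

    import numpy as np

    # k(x, z) = exp(-gamma * ||x - z||^2)
    def rbf_kernel(x, z, gamma=0.5):
        return np.exp(-gamma * np.sum((x - z) ** 2))

    print(rbf_kernel(np.array([1.0, 2.0]), np.array([2.0, 0.0])))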
17. What is the Gini index, and how is it used in decision trees?
The Gini index is a measure of impurity used to evaluate splits in decision trees. It quantifies how often a randomly
chosen element would be incorrectly classified if it were randomly labelled according to the distribution of labels
in the subset.
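A minimal sketch of the node-level computation, Gini = 1 - sum of p_k^2, where p_k is the fraction of samples in the node belonging to class k (the toy labels are assumed):

    import numpy as np

    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    print(gini([0, 0, 1, 1]))   # 0.5, a maximally impure two-class node
    print(gini([1, 1, 1, 1]))   # 0.0, a pure node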
18. How does the inclusion of prior distributions influence the results in Bayesian linear regression?
The inclusion of prior distributions in Bayesian linear regression allows incorporating prior knowledge about the
model parameters, leading to posterior distributions that combine the prior information with the observed data,
providing a more comprehensive uncertainty estimation.
The main objective of the Maximum Margin Classifier is to find a hyperplane that separates the data into classes
while maximizing the margin, which is the distance between the hyperplane and the nearest data points from either
class.
Support vectors are the data points that are closest to the separating hyperplane in an SVM. They are critical in
defining the position and orientation of the hyperplane and directly influence the margin.
A linear SVM uses a linear kernel to find a flat (linear) hyperplane for classification, suitable for linearly
separable data. A non-linear SVM uses kernel functions (e.g., polynomial, RBF) to transform the data into a
higher-dimensional space where a linear hyperplane can separate data that are not linearly separable in the original space.
The regularization parameter (C) in SVM controls the trade-off between maximizing the margin and minimizing
the classification error. A smaller C allows for a larger margin but potentially more misclassifications, while a
larger C aims for fewer misclassifications but may result in a smaller margin.
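A small scikit-learn sketch of this trade-off, assuming scikit-learn is installed; the toy data and C values are chosen only for illustration:

    from sklearn.svm import SVC

    X = [[0, 0], [1, 1], [1, 0], [0, 1], [2, 2], [-1, -1]]
    y = [0, 1, 0, 0, 1, 0]

    soft = SVC(kernel="linear", C=0.1).fit(X, y)      # wide margin, tolerates violations
    hard = SVC(kernel="linear", C=100.0).fit(X, y)    # narrow margin, penalizes violations
    print(len(soft.support_), len(hard.support_))     # a smaller C typically keeps more support vectors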
The kernel function in SVM enables the transformation of data into a higher-dimensional space where a linear
hyperplane can separate the classes, allowing SVM to handle non-linearly separable data. Common kernels include
linear, polynomial, and radial basis function (RBF).
The 'soft margin' concept in SVM allows for some misclassifications in the training data by introducing slack
variables. This approach provides a balance between achieving a large margin and allowing some classification
errors to improve generalization on noisy data.
The dual formulation of SVM involves expressing the optimization problem in terms of Lagrange multipliers,
which simplifies the computation when using kernel functions and allows handling higher-dimensional spaces
without explicit transformations.
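For reference, a sketch of the standard soft-margin dual (the notation here is assumed, not quoted from this question bank): maximize over the multipliers α the quantity Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j k(x_i, x_j), subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0. The training data enter only through the kernel values k(x_i, x_j), which is why a kernel can replace an explicit feature-space transformation.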
SVM handles multi-class classification problems using techniques like One-vs-One (OvO) and One-vs-All (OvA).
OvO trains a classifier for every pair of classes, while OvA trains a classifier for each class against all other
classes.
28. What are the advantages of using SVM for classification tasks?
Advantages of using SVM for classification tasks include its effectiveness in high-dimensional spaces, robustness
to overfitting (especially with proper regularization), and its ability to handle non-linear data using appropriate
kernel functions.
PART B&C
1. Explain the least squares method for linear regression and its application in machine learning.
2. Differentiate between single-variable and multiple-variable linear regression. Provide examples for both.
3. Discuss Bayesian linear regression and its advantages over standard linear regression.
4. Explain the gradient descent algorithm and its application in optimizing linear regression models.
5. Derive the equation for the cost function in linear regression and explain how gradient descent minimizes it.
6. Compare and contrast Bayesian linear regression and least squares regression.
8. Explain the Perceptron algorithm with an example. Discuss its convergence properties.
9. Illustrate the logistic regression model and derive its cost function.
10. Explain the Naive Bayes classifier and its assumptions. Discuss its application in text classification.
11. Compare probabilistic discriminative models (logistic regression) with probabilistic generative models (Naive
Bayes).
12. Discuss the maximum margin classifier and its implementation using support vector machines (SVM).
13. Explain the dual formulation of the support vector machine and the role of kernels.
14. Illustrate the process of constructing a decision tree using the CART algorithm.
15. Explain how information gain and Gini index are used in decision tree splitting criteria.
16. Discuss the advantages and limitations of decision trees. How are these addressed in random forests?
17. Explain the concept of ensemble learning and how random forests improve upon single decision trees.
18. Compare and contrast decision trees with random forests in terms of accuracy, overfitting, and interpretability.
19. Provide a detailed comparison of gradient descent and least squares methods for optimizing regression models.
20. Explain the concept of overfitting in supervised learning and discuss methods to prevent it, using examples
from decision trees and random forests.