
ML Unit 03 MCQ

This document provides solutions to multiple choice questions about logistic regression and linear regression. It addresses topics like: - Methods for fitting data in logistic regression and linear regression (maximum likelihood, least squares error) - Variable selection methods like LASSO - Evaluating models using metrics like mean squared error - Interpreting residuals and correlation coefficients - Understanding bias-variance tradeoff and overfitting - Applying regularization methods like ridge regression

Uploaded by

pranav chaudhari
Copyright
© © All Rights Reserved

UNIT –III

1. Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B
Ans Solution: B

2. Choose which of the following options is true regarding One-Vs-All method in Logistic
Regression.
A) We need to fit n models in n-class classification problem
B) We need to fit n-1 models to classify into n classes
C) We need to fit only 1 model to classify into n classes
D) None of these
Ans Solution: A

3. Suppose, You applied a Logistic Regression model on a given data and got a training accuracy
X and testing accuracy Y. Now, you want to add a few new features in the same data. Select the
option(s) which is/are correct in such a case.
Note: Consider remaining parameters are same.
A) Training accuracy increases
B) Training accuracy increases or remains the same
C) Testing accuracy decreases
D) Testing accuracy increases or remains the same
Ans Solution: A and D
Adding more features gives the model more information to fit, so the training accuracy generally
increases. The testing accuracy increases only if the added feature is found to be
significant.

4. Which of the following algorithms do we use for Variable Selection?


A) LASSO
B) Ridge
C) Both
D) None of these
Ans Solution: A
In case of lasso we apply an absolute-value (L1) penalty; after increasing the penalty in lasso, some of the
coefficients of the variables may become zero.

5. Which of the following statement is true about outliers in Linear regression?


A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Ans Solution: (A)
The slope of the regression line will change due to outliers in most of the cases. So Linear
Regression is sensitive to outliers.

6. Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Ans Solution: (A)
In linear regression, we try to minimize the least square errors of the model to identify the line
of best fit.

7. Which of the following is true about Residuals?


A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
Ans Solution: (A)
Residuals refer to the error values of the model. Therefore lower residuals are desired.

8. Suppose you plotted a scatter plot between the residuals and predicted values in linear
regression and you found that there is a relationship between them. Which of the following
conclusion do you make about this situation?

A) Since there is a relationship, our model is not good
B) Since there is a relationship, our model is good
C) Can’t say
D) None of these
Ans Solution: (A)
There should not be any relationship between predicted values and residuals. If there exists any
relationship between them, it means that the model has not perfectly captured the information
in the data.

9. Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge
regression with penalty x.
Choose the option which describes bias in best manner.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
Ans Solution: (B)
If the penalty is very large it means model is less complex, therefore the bias would be high.

10. Which of the following option is true?


A) Linear Regression error values have to be normally distributed, but in the case of Logistic
Regression this is not the case
B) Logistic Regression error values have to be normally distributed, but in the case of Linear
Regression this is not the case
C) Both Linear Regression and Logistic Regression error values have to be normally distributed
D) Neither Linear Regression nor Logistic Regression error values have to be normally
distributed
Ans Solution: A

11. Suppose you have trained a logistic regression classifier and, for a new example x, it outputs
a prediction hθ(x) = 0.2. This means
Our estimate for P(y=1 | x)
Our estimate for P(y=0 | x)
Our estimate for P(y=1 | x)
Our estimate for P(y=0 | x)
Ans Solution: B

12. True-False: Linear Regression is a supervised machine learning algorithm.


A) TRUE
B) FALSE
Solution: (A)
Yes, Linear regression is a supervised learning algorithm because it uses true labels for training.
Supervised learning algorithm should have input variable (x) and an output variable (Y) for each
example.

13. True-False: Linear Regression is mainly used for Regression.


A) TRUE
B) FALSE
Solution: (A)
Linear Regression has a dependent variable that takes continuous values.

14. True-False: It is possible to design a Linear regression algorithm using a neural network?

A) TRUE
B) FALSE

Solution: (A)

True. A Neural network can be used as a universal approximator, so it can definitely implement
a linear regression algorithm.

15. Which of the following methods do we use to find the best fit line for data in Linear
Regression?
A) Least Square Error
B) Maximum Likelihood
C) Logarithmic Loss
D) Both A and B
Solution: (A)
In linear regression, we try to minimize the least square errors of the model to identify the line
of best fit.

16. Which of the following evaluation metrics can be used to evaluate a model while modeling
a continuous output variable?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: (D)
Since linear regression gives its output as continuous values, in such a case we use the mean squared
error metric to evaluate the model performance. The remaining options are used in case of a
classification problem.
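As a small illustration of the metric named above, mean squared error can be computed as the average of the squared differences between targets and predictions. This is only a sketch; the numbers and the function name are made up for the example and are not from the question set.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical continuous targets and predictions, chosen only for illustration.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Squared errors are 1, 0 and 4, so the mean is 5/3.
mse = mean_squared_error(y_true, y_pred)
print(mse)
```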

17. True-False: Lasso Regularization can be used for variable selection in Linear Regression.
A) TRUE
B) FALSE
Solution: (A)
True, In case of lasso regression we apply absolute penalty which makes some of the coefficients
zero.

18. Which of the following is true about Residuals ?


A) Lower is better
B) Higher is better
C) A or B depend on the situation
D) None of these
Solution: (A)
Residuals refer to the error values of the model. Therefore lower residuals are desired.

19. Suppose that we have N independent variables (X1,X2… Xn) and dependent variable is Y.
Now Imagine that you are applying linear regression by fitting the best fit line using least square
error on this data.
You found that the correlation coefficient of one of its variables (say X1) with Y is -0.95.
Which of the following is true for X1?
A) Relation between the X1 and Y is weak
B) Relation between the X1 and Y is strong
C) Relation between the X1 and Y is neutral
D) Correlation can’t judge the relationship
Solution: (B)
The absolute value of the correlation coefficient denotes the strength of the relationship.
Since absolute correlation is very high it means that the relationship is strong between X1 and
Y.

20. You are given two variables V1 and V2, and they follow the two characteristics below.
1. If V1 increases then V2 also increases
2. If V1 decreases then V2 behavior is unknown
Looking at the above two characteristics, which of the following options is correct for the
Pearson correlation between V1 and V2?
A) Pearson correlation will be close to 1
B) Pearson correlation will be close to -1
C) Pearson correlation will be close to 0
D) None of these

Solution: (D)
We cannot comment on the correlation coefficient by using only statement 1. We need to
consider both statements. Consider V1 as x and V2 as |x|: statement 1 holds for positive x, but the
correlation coefficient would not be close to 1 in such a case.

21. Suppose Pearson correlation between V1 and V2 is zero. In such case, is it right to
conclude that V1 and V2 do not have any relation between them?
A) TRUE
B) FALSE
Solution: (B)
The Pearson correlation coefficient between two variables might be zero even when they have a
relationship between them. If the correlation coefficient is zero, it just means that they
do not move together linearly. We can take examples like y=|x| or y=x^2 with x spread symmetrically around zero.
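The y=x^2 example can be checked numerically. The sketch below assumes x values spread symmetrically around zero (an assumption made for the illustration); on such data the Pearson coefficient is zero up to rounding, even though y is fully determined by x. On positive x only, the same relationship gives a high correlation.

```python
import numpy as np

# Assumed for illustration: x is symmetric around zero.
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# Pearson correlation is (numerically) zero although y depends on x.
r_symmetric = np.corrcoef(x, y)[0, 1]

# Restricting to non-negative x makes the same relationship look strongly linear.
x_pos = np.linspace(0.0, 1.0, 201)
r_positive = np.corrcoef(x_pos, x_pos ** 2)[0, 1]

print(r_symmetric, r_positive)
```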
22. True-False: Overfitting is more likely when you have a huge amount of data to train on?
A) TRUE
B) FALSE
Solution: (B)
With a small training dataset, it’s easier to find a hypothesis to fit the training data exactly i.e.
overfitting.

23. We can also compute the coefficient of linear regression with the help of an analytical
method called “Normal Equation”. Which of the following is/are true about Normal Equation?
1. We don’t have to choose the learning rate
2. It becomes slow when number of features is very large
3. There is no need to iterate

A) 1 and 2
B) 1 and 3
C) 2 and 3
D) 1,2 and 3
Solution: (D)
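Instead of gradient descent, the Normal Equation can also be used to find the coefficients. It is a
closed-form solution, so there is no learning rate to choose and no need to iterate, but it requires solving
a linear system whose size grows with the number of features, so it becomes slow when the number of
features is very large.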
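A minimal sketch of the Normal Equation, theta = (X^T X)^(-1) X^T y, is given here. The data are hypothetical (generated from y = 1 + 2x) and the variable names are only for illustration. The system is solved directly, with no learning rate and no iterations.

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2x, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Normal Equation: solve (X^T X) theta = X^T y in one step.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # intercept and slope
```

The matrix X^T X has one row and one column per feature (plus the intercept), which is why this approach gets expensive when the number of features is very large.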

Question Context 24-26:


Suppose you have fitted a complex regression model on a dataset. Now, you are using Ridge
regression with penalty x.
24. Choose the option which describes bias in best manner.
A) In case of very large x; bias is low
B) In case of very large x; bias is high
C) We can’t say about bias
D) None of these
Solution: (B)
If the penalty is very large it means model is less complex, therefore the bias would be high.

25. What will happen when you apply very large penalty?
A) Some of the coefficients will become exactly zero
B) Some of the coefficients will approach zero but not become exactly zero
C) Both A and B depending on the situation
D) None of these
Solution: (B)
In lasso some of the coefficient values become zero, but in case of Ridge, the coefficients become
close to zero but not zero.

26. What will happen when you apply very large penalty in case of Lasso?
A) Some of the coefficients will become zero
B) Some of the coefficients will approach zero but not become exactly zero
C) Both A and B depending on the situation
D) None of these
Solution: (A)
As already discussed, lasso applies absolute penalty, so some of the coefficients will become
zero.
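The difference between the two penalties can be sketched in a simplified setting. The code below assumes orthonormal features (X^T X = I), which is an assumption made only for this illustration. In that case the ridge solution rescales each least-squares coefficient, while the lasso solution soft-thresholds it. The coefficient values and the penalty are hypothetical.

```python
import numpy as np

def ridge_shrink(beta_ols, lam):
    # Assumes X^T X = I and objective 0.5*||y - Xb||^2 + 0.5*lam*||b||^2.
    return beta_ols / (1.0 + lam)

def lasso_shrink(beta_ols, lam):
    # Assumes X^T X = I and objective 0.5*||y - Xb||^2 + lam*||b||_1.
    # This is the soft-thresholding rule.
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

# Hypothetical least-squares coefficients and penalty, for illustration only.
beta_ols = np.array([3.0, -0.5, 0.2])
lam = 1.0

ridge = ridge_shrink(beta_ols, lam)  # every coefficient shrinks, none is zero
lasso = lasso_shrink(beta_ols, lam)  # the two small coefficients become zero
print(ridge, lasso)
```

Under these assumptions, ridge only moves coefficients towards zero, whereas lasso sets any coefficient whose magnitude is below the penalty to zero, which is why lasso is used for variable selection.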

27. Which of the following statement is true about outliers in Linear regression?
A) Linear regression is sensitive to outliers
B) Linear regression is not sensitive to outliers
C) Can’t say
D) None of these
Solution: (A)
The slope of the regression line will change due to outliers in most of the cases. So Linear
Regression is sensitive to outliers.

28. Suppose you plotted a scatter plot between the residuals and predicted values in linear
regression and you found that there is a relationship between them. Which of the following
conclusion do you make about this situation?

A) Since there is a relationship, our model is not good
B) Since there is a relationship, our model is good
C) Can’t say
D) None of these
Solution: (A)
There should not be any relationship between predicted values and residuals. If there exists any
relationship between them, it means that the model has not perfectly captured the information
in the data.

Question Context 29-31:


Suppose that you have a dataset D1 and you design a linear regression model with a degree 3
polynomial, and you find that the training and testing error is “0”, or in other words, it
perfectly fits the data.
29. What will happen when you fit degree 4 polynomial in linear regression?
A) There are high chances that degree 4 polynomial will over fit the data
B) There are high chances that degree 4 polynomial will under fit the data
C) Can’t say
D) None of these
Solution: (A)
Since a degree 4 polynomial is more complex than the degree 3 model, it will again perfectly fit
the training data and is likely to overfit. In such a case the training error will be zero but the test
error may not be zero.

30. What will happen when you fit degree 2 polynomial in linear regression?
A) There are high chances that degree 2 polynomial will over fit the data
B) There are high chances that degree 2 polynomial will under fit the data
C) Can’t say
D) None of these
Solution: (B)
If a degree 3 polynomial fits the data perfectly, it is highly likely that a simpler model (a degree 2
polynomial) will under fit the data.

31. In terms of bias and variance. Which of the following is true when you fit degree 2
polynomial?

A) Bias will be high, variance will be high


B) Bias will be low, variance will be high
C) Bias will be high, variance will be low
D) Bias will be low, variance will be low
Solution: (C)
Since a degree 2 polynomial will be less complex as compared to degree 3, the bias will be high
and variance will be low.
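The training-error side of questions 29-31 can be sketched with hypothetical noise-free cubic data (an assumption for this example, standing in for dataset D1). A degree 2 fit leaves a clear training error (under fitting, high bias), while degree 3 and degree 4 both fit the training points essentially exactly. The sketch does not measure test error.

```python
import numpy as np

# Hypothetical noise-free cubic data, standing in for dataset D1.
x = np.linspace(-2.0, 2.0, 9)
y = x ** 3 - x

def train_mse(degree):
    """Fit a polynomial of the given degree by least squares; return training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

mse_deg2 = train_mse(2)  # clearly above zero: the model is too simple
mse_deg3 = train_mse(3)  # essentially zero: matches the true degree
mse_deg4 = train_mse(4)  # essentially zero on training data as well
print(mse_deg2, mse_deg3, mse_deg4)
```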

Question Context 32-33:


We have been given a dataset with n records in which we have input attribute as x and output
attribute as y. Suppose we use a linear regression method to model this data. To test our linear
regressor, we split the data in training set and test set randomly.
32. Now we increase the training set size gradually. As the training set size increases, what do
you expect will happen with the mean training error?

A) Increase
B) Decrease
C) Remain constant
D) Can’t Say
Solution: (D)
Training error may increase or decrease depending on the values that are used to fit the model.
If the values used to train contain more outliers gradually, then the error might just increase.

33. What do you expect will happen with bias and variance as you increase the size of training
data?

A) Bias increases and Variance increases


B) Bias decreases and Variance increases
C) Bias decreases and Variance decreases
D) Bias increases and Variance decreases
E) Can’t Say
Solution: (D)
As we increase the size of the training data, the bias would increase while the variance would
decrease.

Question Context 34:


Consider the following data where one input(X) and one output(Y) is given.

34. What would be the root mean square training error for this data if you run a Linear
Regression model of the form (Y = A0+A1X)?

A) Less than 0
B) Greater than zero
C) Equal to 0
D) None of these
Solution: (C)
We can perfectly fit the line on the following data so mean error will be zero.

Question Context 35-36:


Suppose you have been given the following scenario for training and validation error for Linear
Regression.
Scenario   Learning Rate   Number of iterations   Training Error   Validation Error
1          0.1             1000                   100              110
2          0.2             600                    90               105
3          0.3             400                    110              110
4          0.4             300                    120              130
5          0.4             250                    130              150

35. Which of the following scenario would give you the right hyper parameter?
A) 1
B) 2
C) 3
D) 4
Solution: (B)
Option B (scenario 2) would be the better option because it leads to the lowest training error as well as the lowest validation error.
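The choice can be written as a small selection over the scenarios listed for this question. This is only a sketch: it picks the scenario with the smallest validation error, and the tuple layout is an assumption made for the example.

```python
# (scenario, learning_rate, iterations, training_error, validation_error)
scenarios = [
    (1, 0.1, 1000, 100, 110),
    (2, 0.2, 600, 90, 105),
    (3, 0.3, 400, 110, 110),
    (4, 0.4, 300, 120, 130),
    (5, 0.4, 250, 130, 150),
]

# Pick the hyper parameter setting with the lowest validation error.
best = min(scenarios, key=lambda row: row[4])
best_scenario = best[0]
print(best_scenario)
```

In this data the same scenario also has the lowest training error, so both criteria agree.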
36. Suppose you got the tuned hyper parameters from the previous question. Now, imagine
you want to add a variable to the variable space such that this added feature is important. Which
of the following would you observe in such a case?
A) Training Error will decrease and Validation error will increase
B) Training Error will increase and Validation error will increase
C) Training Error will increase and Validation error will decrease
D) Training Error will decrease and Validation error will decrease
E) None of the above
Solution: (D)
If the added feature is important, the training and validation error would decrease.

Question Context 37-38:


Suppose, you got a situation where you find that your linear regression model is under fitting
the data.
37. In such situation which of the following options would you consider?
1. I will add more variables
2. I will start introducing polynomial degree variables
3. I will remove some variables
A) 1 and 2
B) 2 and 3
C) 1 and 3
D) 1, 2 and 3
Solution: (A)
In case of under fitting, you need to introduce more variables in the variable space, or you can add
some polynomial degree variables, to make the model more complex so that it is able to fit the data
better.

38. Now the situation is the same as in the previous question (under fitting). Which of the following
regularization algorithms would you prefer?

A) L1
B) L2
C) Any
D) None of these
Solution: (D)
I won’t use any regularization methods because regularization is used in case of overfitting.

39. True-False: Is Logistic regression a supervised machine learning algorithm?


A) TRUE
B) FALSE
Solution: A
True, Logistic regression is a supervised learning algorithm because it uses true labels for
training. A supervised learning algorithm should have input variables (x) and a target variable (Y)
when you train the model.

40. True-False: Is Logistic regression mainly used for Regression?


A) TRUE
B) FALSE
Solution: B
Logistic regression is a classification algorithm; do not be confused by the word “regression” in its name.

41. True-False: Is it possible to design a logistic regression algorithm using a Neural Network
Algorithm?
A) TRUE
B) FALSE
Solution: A
True, a Neural network is a universal approximator, so it can implement a logistic regression
algorithm.

42. True-False: Is it possible to apply a logistic regression algorithm on a 3-class Classification
problem?
A) TRUE
B) FALSE
Solution: A
Yes, we can apply logistic regression to a 3-class classification problem. We can use the One-Vs-All method
for 3-class classification in logistic regression.

43. Which of the following methods do we use to best fit the data in Logistic Regression?
A) Least Square Error
B) Maximum Likelihood
C) Jaccard distance
D) Both A and B
Solution: B
Logistic regression uses maximum likelihood estimation for training.

44. Which of the following evaluation metrics can not be applied in case of logistic regression
output to compare with target?
A) AUC-ROC
B) Accuracy
C) Logloss
D) Mean-Squared-Error
Solution: D
Since Logistic Regression is a classification algorithm, its output is not a continuous real value, so
mean squared error cannot be used for evaluating it.

45. One of the very good methods to analyze the performance of Logistic Regression is AIC,
which is similar to R-Squared in Linear Regression. Which of the following is true about AIC?
A) We prefer a model with minimum AIC value
B) We prefer a model with maximum AIC value
C) Both but depend on the situation
D) None of these
Solution: A
We select as the best model in logistic regression the one which has the least AIC.

46. [True-False] Standardisation of features is required before training a Logistic Regression.


A) TRUE
B) FALSE
Solution: B
Standardization isn’t required for logistic regression. The main goal of standardizing features is
to help convergence of the technique used for optimization.

47. Which of the following algorithms do we use for Variable Selection?


A) LASSO
B) Ridge
C) Both
D) None of these

Solution: A
In case of lasso we apply an absolute-value (L1) penalty; after increasing the penalty in lasso, some of the
coefficients of the variables may become zero.

Context: 48-49

Consider a following model for logistic regression: P (y =1|x, w)= g(w0 + w1x)
where g(z) is the logistic function.

In the above equation, P (y =1|x; w) is viewed as a function of x, whose shape we can change by changing the
parameters w.

48. What would be the range of p in such a case?

A) (0, inf)
B) (-inf, 0 )
C) (0, 1)
D) (-inf, inf)

Solution: C

For values of x in the range of real numbers from −∞ to +∞, the logistic function gives an output
in the interval (0,1).

49. In the above question, which function do you think makes p lie between (0,1)?

A) logistic function
B) Log likelihood function
C) Mixture of both
D) None of them

Solution: A

The explanation is the same as for question number 48.

50. Suppose you have been given a fair coin and you want to find out the odds of getting heads.
Which of the following option is true for such a case?

A) odds will be 0
B) odds will be 0.5
C) odds will be 1
D) None of these

Solution: C

Odds are defined as the ratio of the probability of success to the probability of failure. In the case of a fair
coin, the probability of success is 1/2 and the probability of failure is 1/2, so the odds would be 1.

51. The logit function(given as l(x)) is the log of odds function. What could be the range of logit
function in the domain x=[0,1]?
A) (– ∞ , ∞)
B) (0,1)
C) (0, ∞)
D) (- ∞, 0)

Solution: A

For our purposes, the odds function has the advantage of transforming the probability function, which
has values from 0 to 1, into an equivalent function with values between 0 and ∞. When we take the
natural log of the odds function, we get a range of values from -∞ to ∞.
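The odds and logit definitions used in questions 50 and 51 can be sketched directly. The function names and sample probabilities are only for illustration. Probabilities close to 0 give large negative logit values and probabilities close to 1 give large positive ones; the endpoints 0 and 1 themselves are left out because the log of the odds is not defined there.

```python
import math

def odds(p):
    """Ratio of the probability of success to the probability of failure."""
    return p / (1.0 - p)

def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))

fair_coin_odds = odds(0.5)    # probability 1/2 against 1/2
fair_coin_logit = logit(0.5)  # log of odds 1

# Probabilities near the ends of (0, 1) map far out on the real line.
low, high = logit(0.001), logit(0.999)
print(fair_coin_odds, fair_coin_logit, low, high)
```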

52. Which of the following option is true?

A) Linear Regression error values have to be normally distributed, but in the case of Logistic Regression
this is not the case
B) Logistic Regression error values have to be normally distributed, but in the case of Linear Regression
this is not the case
C) Both Linear Regression and Logistic Regression error values have to be normally distributed
D) Neither Linear Regression nor Logistic Regression error values have to be normally distributed

Solution: A

53. Which of the following is true regarding the logistic function for any value “x”?

Note:
Logistic(x): is a logistic function of any number “x”

Logit(x): is a logit function of any number “x”

Logit_inv(x): is an inverse logit function of any number “x”

A) Logistic(x) = Logit(x)
B) Logistic(x) = Logit_inv(x)
C) Logit_inv(x) = Logit(x)
D) None of these

Solution: B
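That the logistic function is the inverse of the logit function can be checked with a short round trip. This is a sketch using the standard definitions logistic(z) = 1 / (1 + e^(-z)) and logit(p) = log(p / (1 - p)); the sample values are arbitrary.

```python
import math

def logistic(z):
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log of odds; defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Applying logistic after logit returns the original probability.
probabilities = [0.1, 0.5, 0.9]
round_trip = [logistic(logit(p)) for p in probabilities]

# Applying logit after logistic returns the original real number.
z_back = logit(logistic(2.0))
print(round_trip, z_back)
```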

54. Suppose you are given two scatter plots “a” and “b” for two classes (blue for the positive and red for
the negative class). In scatter plot “a”, you correctly classified all data points using logistic regression (the black
line is the decision boundary).

How will the bias change on using high (infinite) regularisation?
A) Bias will be high
B) Bias will be low
C) Can’t say
D) None of these

Solution: A

Model will become very simple so bias will be very high.

55. Suppose, You applied a Logistic Regression model on a given data and got a training accuracy X
and testing accuracy Y. Now, you want to add a few new features in the same data. Select the
option(s) which is/are correct in such a case.

Note: Consider remaining parameters are same.

A) Training accuracy increases


B) Training accuracy increases or remains the same
C) Testing accuracy decreases
D) Testing accuracy increases or remains the same

Solution: A and D

Adding more features gives the model more information to fit, so the training accuracy generally increases.
The testing accuracy increases only if the added feature is found to be significant.

56. Choose which of the following options is true regarding One-Vs-All method in Logistic Regression.

A) We need to fit n models in n-class classification problem


B) We need to fit n-1 models to classify into n classes
C) We need to fit only 1 model to classify into n classes
D) None of these
Solution: A

If there are n classes, then n separate logistic regression models have to be fitted, where the probability of each
category is predicted against the rest of the categories combined.
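The One-Vs-All idea can be sketched with plain NumPy. This is an illustration under stated assumptions, not a reference implementation: the data are three hypothetical, well-separated 2-D clusters, each binary model is fitted by simple gradient descent on the log loss, and the function names, learning rate and iteration count are all made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary_logistic(X, y, lr=0.1, n_iter=2000):
    """Gradient descent on the mean log loss; returns weights incl. intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w

def fit_one_vs_all(X, labels):
    """Fit one binary model per class: that class (1) against the rest (0)."""
    classes = np.unique(labels)
    weights = [fit_binary_logistic(X, (labels == c).astype(float)) for c in classes]
    return classes, np.array(weights)

def predict_one_vs_all(X, classes, weights):
    """Pick the class whose model gives the highest probability."""
    Xb = np.column_stack([np.ones(len(X)), X])
    scores = sigmoid(Xb @ weights.T)
    return classes[np.argmax(scores, axis=1)]

# Hypothetical, well-separated clusters for a 3-class problem.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
labels = np.repeat([0, 1, 2], 20)

classes, weights = fit_one_vs_all(X, labels)
accuracy = float(np.mean(predict_one_vs_all(X, classes, weights) == labels))
print(weights.shape, accuracy)  # one row of weights per class
```

With n = 3 classes, the weights array has three rows, one per fitted model.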

57. Below are two different logistic models with different values for β0 and β1.

Which of the following statement(s) is true about the β0 and β1 values of the two logistic models (Green, Black)?

Note: consider Y = β0 + β1*X. Here, β0 is intercept and β1 is coefficient.

A) β1 for Green is greater than Black


B) β1 for Green is lower than Black
C) β1 for both models is same
D) Can’t Say

Solution: B

β0 = 0, β1 = 1 is the model in X1 color (black) and β0 = 0, β1 = −1 is the model in X4 color (green).

Context 58-60

Below are three scatter plots (A, B, C, left to right) with hand-drawn decision boundaries for logistic
regression.
58. Which of the above figures shows that the decision boundary is overfitting the training
data?

A) A
B) B
C) C
D) None of these

Solution: C

In the third figure (C), the decision boundary is not smooth, which means it is overfitting the data.

59. What do you conclude after seeing this visualization?

1. The training error in the first plot is the maximum compared to the second and third plots.

2. The best model for this regression problem is the last (third) plot because it has minimum
training error (zero).

3. The second model is more robust than first and third because it will perform best on unseen
data.

4. The third model is overfitting more compared to the first and second.

5. All will perform same because we have not seen the testing data.

A) 1 and 3
B) 1 and 3
C) 1, 3 and 4
D) 5

Solution: C

The trend in the graphs looks like a quadratic trend over the independent variable X. A higher-degree
polynomial (right graph) might have very high accuracy on the training population but is expected to fail badly
on the test dataset. The left graph has the maximum training error because it underfits the
training data.

60. Suppose the above decision boundaries were generated for different values of regularization.
Which of the above decision boundaries shows the maximum regularization?

A) A
B) B
C) C
D) All have equal regularization

Solution: A

More regularization means a larger penalty, which means a less complex decision boundary, as shown in the first
figure (A).

61. Suppose you are using a Logistic Regression model on a huge dataset. One of the problems you may face
on such huge data is that Logistic Regression will take a very long time to train.

What would you do if you want to train logistic regression on the same data so that it takes less time
and gives comparatively similar accuracy (it may not be the same)?

A) Decrease the learning rate and decrease the number of iteration


B) Decrease the learning rate and increase the number of iteration
C) Increase the learning rate and increase the number of iteration
D) Increase the learning rate and decrease the number of iteration

Solution: D

If you decrease the number of iterations, training will surely take less time, but it will not give the
same accuracy. To get similar (though not exactly the same) accuracy, you need to increase the learning rate.

62. Following is the loss function in logistic regression (Y-axis: loss function, X-axis: log probability) for
a two-class classification problem.

Note: Y is the target class

Which of the following images shows the cost function for y = 1?


A) A
B) B
C) Both
D) None of these

Solution: A

A is the correct answer, as the loss function decreases as the log probability increases.
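The shape described in this answer can be sketched numerically, assuming the standard log loss for logistic regression (the plotted curves themselves are not reproduced here). For target y = 1 the cost is -log(p), which falls as the predicted probability p rises; for y = 0 the cost is -log(1 - p), which rises instead.

```python
import math

def cost_y1(p):
    """Log loss for a positive example (y = 1) with predicted probability p."""
    return -math.log(p)

def cost_y0(p):
    """Log loss for a negative example (y = 0) with predicted probability p."""
    return -math.log(1.0 - p)

# Arbitrary predicted probabilities, for illustration only.
probs = [0.1, 0.5, 0.9]
costs_y1 = [cost_y1(p) for p in probs]  # decreasing in p
costs_y0 = [cost_y0(p) for p in probs]  # increasing in p
print(costs_y1, costs_y0)
```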

63. Suppose the following graph is a cost function for logistic regression.

Now, how many local minima are present in the graph?

A) 1
B) 2
C) 3
D) 4

Solution: C
There are three local minima present in the graph.

64. Can a Logistic Regression classifier do a perfect classification on the below data?

Note: You can use only X1 and X2 variables where X1 and X2 can take only two binary values(0,1).

A) TRUE
B) FALSE
C) Can’t say
D) None of these

Solution: B

No, logistic regression only forms a linear decision surface, but the examples in the figure are not linearly
separable.
