Sample Questions


1. What would you do if you want to train a logistic regression model on the same data in
less time, while getting comparable (though not necessarily identical) accuracy?
a. Decrease the learning rate and decrease the number of iterations
b. Decrease the learning rate and increase the number of iterations
c. Increase the learning rate and increase the number of iterations
d. Increase the learning rate and decrease the number of iterations

2. How will the bias in estimation in a logistic regression change on using very high
regularization in the model?
a. Bias will be high
b. Bias will be low
c. Bias may be high in some cases and low in others
d. None of the above options are correct

3. Which of the following statements is TRUE?


a. Linear Regression error values have to be normally distributed but in case of Logistic
Regression it is not the case.
b. Logistic Regression error values have to be normally distributed but in case of Linear
Regression it is not the case.
c. Both Linear Regression and Logistic Regression error values have to be normally
distributed.
d. Both for Linear Regression and Logistic Regression, the error values need not be
normally distributed.

4. The logit function is the natural log of the odds. What could be the range of the logit
function over the domain x = [0, 1]?
a. (– ∞ , ∞)
b. (0, 1)
c. (0, ∞)
d. (– ∞, 0)
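
A quick sanity check for question 4: with odds x/(1 − x),

    logit(x) = ln( x / (1 − x) ),   x ∈ (0, 1)

As x → 0⁺ the odds tend to 0 and logit(x) → – ∞; as x → 1⁻ the odds blow up and logit(x) → ∞, so the range is (– ∞, ∞).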

5. Is XOR problem solvable using a single perceptron?


a. Yes
b. No
c. Sometimes but not always
d. None of the above options are correct.
(With only the input and the output layer nodes, i.e., no hidden layer, it is impossible.)
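
A minimal sketch for question 5, assuming NumPy is available: brute-force a grid of weights and bias for a single perceptron y = step(w1·x1 + w2·x2 + b) and confirm that none of them reproduces XOR, since XOR is not linearly separable. The grid bounds are illustrative.

    import itertools
    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y_xor = np.array([0, 1, 1, 0])

    grid = np.linspace(-2, 2, 21)  # coarse illustrative grid over w1, w2, b
    solved = any(
        np.array_equal((X @ np.array([w1, w2]) + b > 0).astype(int), y_xor)
        for w1, w2, b in itertools.product(grid, grid, grid)
    )
    print(solved)  # False: no single-perceptron separator on this grid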

6. The class conditional probability in a Bayesian classifier is called


a. Evidence
b. Likelihood
c. Prior
d. Posterior
7. Which of the following statements are TRUE about subset selection? (Multiple options are
true)
a. Subset selection can substantially decrease the bias in estimation
b. Ridge regression frequently eliminates some of the features
c. Subset selection can reduce overfitting
d. Finding the best subset involves exponential time complexity

8. How does the bias-variance decomposition of a ridge regression estimator compare with
that of ordinary least squares regression?
a. Ridge has larger bias, larger variance
b. Ridge has smaller bias, larger variance
c. Ridge has smaller bias, and smaller variance
d. Ridge has larger bias, smaller variance

9. Both PCA and Lasso can be used for feature selection. Which of the following statements are
TRUE? (Multiple options are correct)
a. Lasso selects a subset (not necessarily a strict subset) of the original features
b. PCA and Lasso both allow you to specify how many features are chosen
c. PCA produces features that are linear combinations of the original features
d. PCA and Lasso are the same as far as their feature selection ability is concerned.

10. In neural networks, nonlinear activation functions such as the sigmoid function


a. Speed up the gradient calculation in backpropagation, as compared to linear units
b. Are applied only to output units of the networks
c. Help the model to learn nonlinear decision boundaries
d. Always produce output values in the range [0, 1]

11. Which of the following are TRUE about Generative Models? (Multiple options are correct)
a. They model the joint distribution P(class = C AND sample = x)
b. An Artificial Neural Network is a generative model
c. They can be used for classification
d. Linear Discriminant Analysis is a generative model

12. Which of the following assumptions do we make while deriving linear regression
parameters? (Multiple options are correct)
a. The true relationship between the response variable y and the predictor variable x is
linear
b. The model errors are statistically independent
c. The error is normally distributed with 0 mean and constant standard deviation
d. The predictor x is non-stochastic and is measured error-free

13. For k-fold cross-validation, a smaller k value implies less variance.


a. The statement is always TRUE
b. The statement is always FALSE
c. The statement is sometimes TRUE, but not always
d. There is no relation between the value of k and the variance in the model

14. As the model complexity increases, bias will decrease while variance will increase. This
statement is:
a. Always TRUE
b. Always FALSE
c. Sometimes TRUE and sometimes FALSE
d. Model complexity has got nothing to do with model variance

15. Likelihood
a. Is the same as a p-value
b. Is the probability of observing a particular parameter value given a set of data
c. Attempts to find the parameter value which is the most likely given the observed
data
d. Minimizes the difference between the model and the data.
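
For question 15, a one-line worked example of option (c), assuming n independent coin flips with k heads:

    L(θ) = θ^k (1 − θ)^(n−k),   d/dθ log L(θ) = k/θ − (n − k)/(1 − θ) = 0  ⇒  θ̂ = k/n

Maximum likelihood holds the observed data fixed and searches for the parameter value under which that data is most probable.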

16. Which of the following assumptions do we make while deriving linear regression
parameters? (Multiple options may be correct. Choose all correct options to get full credit.)
a. The true relationship between the response variable y and the predictor variable x is
linear
b. The model errors are statistically independent
c. The errors are normally distributed with 0 mean and constant standard deviation
d. The predictor x is non-stochastic and is measured error-free.

17. For k-fold cross-validation, a smaller k value implies less variance.


a. The statement is always TRUE
b. The statement is always FALSE
c. The statement is sometimes TRUE, but not always
d. There is no relation between the value of k and the variance in the model

18. As the model complexity increases, bias will decrease while variance will increase. This
statement is:
a. Always TRUE
b. Always FALSE
c. Sometimes TRUE and sometimes FALSE
d. Model complexity has got nothing to do with model variance.

19. In Principal Component Analysis, the correlation coefficients between the variables and the
components are known as:
a. Component scores
b. Component loadings
c. Correlation loadings
d. None of the above

20. Imagine you are solving a classification problem with two highly imbalanced classes. The
majority class accounts for 99% of the records in the training dataset, and your model has
99% accuracy on test-data class prediction. Which of the following is TRUE in such a case?
a. Classification accuracy, Precision, and Recall are all good metrics
b. None of Classification accuracy, Precision, and Recall are good metrics
c. Classification accuracy is not a good metric, while Precision and Recall are
d. Classification accuracy is a good metric, while Precision and Recall are not
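
A minimal sketch for question 20, assuming scikit-learn is available, of why 99% accuracy can be meaningless under a 99:1 class imbalance: a model that always predicts the majority class scores 99% accuracy yet has zero precision and recall on the minority class.

    import numpy as np
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = np.array([0] * 99 + [1])   # 99% majority class
    y_pred = np.zeros(100, dtype=int)   # always predict the majority class

    print(accuracy_score(y_true, y_pred))                    # 0.99
    print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
    print(recall_score(y_true, y_pred))                      # 0.0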

21. Which of the following is TRUE for a White Noise series? (Hint: Multiple options are
correct.)
a. Zero mean
b. Zero auto-covariances
c. Zero autocorrelations except at lag zero
d. Stationary time series
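
An empirical check for question 21, assuming NumPy is available: simulated Gaussian white noise has a mean close to zero and autocorrelations close to zero at every nonzero lag.

    import numpy as np

    e = np.random.default_rng(0).normal(size=10_000)
    print(round(e.mean(), 3))                    # close to 0
    for lag in (1, 2, 3):
        r = np.corrcoef(e[:-lag], e[lag:])[0, 1]
        print(lag, round(r, 3))                  # each close to 0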

22. Suppose we fit Lasso Regression to a data set which has 100 features (X1, X2, ..., X100).
Now we rescale one of these features, say X1, by multiplying it by 10, and then refit Lasso
Regression with the same regularization parameter. Which of the following options is now
correct?
a. It is more likely that X1 will be excluded from the model
b. It is more likely that X1 will be included in the model
c. Nothing can be said beforehand
d. None of the above
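
A minimal sketch for question 22, assuming scikit-learn; the data, the alpha value, and the 5-feature setup are illustrative stand-ins. After multiplying X1 by 10, the coefficient needed to express the same relationship is 10× smaller, so its L1 penalty is cheaper and X1 is more likely to survive the fit.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 0.5 * X[:, 0] + rng.normal(scale=0.5, size=200)

    coef_before = Lasso(alpha=0.7).fit(X, y).coef_[0]
    X_scaled = X.copy()
    X_scaled[:, 0] *= 10                # rescale X1 by a factor of 10
    coef_after = Lasso(alpha=0.7).fit(X_scaled, y).coef_[0]
    print(coef_before, coef_after)      # typically 0.0 before, nonzero after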

23. Which of the following is TRUE about "Ridge" or "Lasso" Regression methods in case of
feature selection?
a. Ridge Regression uses subset selection of features
b. Lasso Regression uses subset selection of features
c. Both use subset selection of features
d. None of the above

24. Suppose you have fitted a complex regression model on a dataset. Now you are using Ridge
Regression with the tuning parameter lambda to reduce its complexity. Which of the
following statements is correct?
a. In case of very small lambda, bias is low and variance is high
b. In case of very small lambda, bias is high and variance is low
c. In case of very high lambda, bias is high and variance is low
d. In case of very high lambda, bias is low and variance is high

25. Suppose we have generated a synthetic dataset with a predictor and a response using
polynomial regression of degree 3 (i.e., a degree-3 model will perfectly fit the data). Now
consider the following statements and identify which of them are CORRECT. (Multiple
options are correct; identify all of them)
a. A simple linear regression model fitted on the data will have a high bias and low
variance
b. A simple linear regression model will have a low bias and a high variance
c. A polynomial regression model with degree 3 will have a low bias and a high
variance
d. A polynomial regression model with a degree 3 will have a low bias and a low
variance

26. Youden's Index provides the best classification cut-off when


a. Sensitivity and specificity are equally important
b. Sensitivity and positive predictive value (PPV) are equally important
c. The number of positives in the data set is more than the number of negatives
d. The number of negatives in the data set is more than the number of positives
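
For question 26, Youden's index is J = sensitivity + specificity − 1, which weights the two equally. A minimal sketch assuming scikit-learn; the scores below are made-up illustrative values.

    import numpy as np
    from sklearn.metrics import roc_curve

    y_true   = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    y_scores = np.array([0.1, 0.3, 0.4, 0.6, 0.35, 0.7, 0.8, 0.9])

    fpr, tpr, thresholds = roc_curve(y_true, y_scores)
    j = tpr - fpr                       # J = sensitivity - (1 - specificity)
    print(thresholds[np.argmax(j)])     # cut-off that maximizes Youden's index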

27. Suppose your model is exhibiting high variance across different training sets. Which of the
following is NOT a valid way to try and reduce the variance?
a. Increase the amount of training data in each training set
b. Improve the optimization algorithm being used for error minimization
c. Decrease the model complexity
d. Reduce the noise in the training data

28. What would you do if you want to train a logistic regression model on the same data in
less time, while getting comparable (though not necessarily identical) accuracy?
a. Decrease the learning rate and decrease the number of iterations
b. Decrease the learning rate and increase the number of iterations
c. Increase the learning rate and increase the number of iterations
d. Increase the learning rate and decrease the number of iterations

29. How will the bias in estimation in a logistic regression change on using very high
regularization in the model?
a. Bias will be high
b. Bias will be low
c. Bias may be high in some cases and low in others
d. None of the above statements are correct

30. Which of the following statements are TRUE about subset selection? (Multiple options are
true)
a. Subset selection can substantially decrease the bias in estimation
b. Ridge regression frequently eliminates some of the features
c. Subset selection can reduce overfitting
d. Finding the best subset involves exponential time complexity

31. How does the bias-variance decomposition of a ridge regression estimator compare with
that of ordinary least squares regression?
a. Ridge has larger bias, larger variance
b. Ridge has smaller bias, larger variance
c. Ridge has smaller bias, and smaller variance
d. Ridge has larger bias, smaller variance

32. Both PCA and Lasso can be used for feature selection. Which of the following statements are
TRUE? (Multiple options are correct)
a. Lasso selects a subset (not necessarily a strict subset) of the original features
b. PCA and Lasso both allow you to specify how many features are chosen
c. PCA produces features that are linear combinations of the original features
d. PCA and Lasso are the same as far as their feature selection ability is concerned.

33. Likelihood
a. Is the same as a p-value
b. Is the probability of observing a particular parameter value given a set of data
c. Attempts to find the parameter value which is the most likely given the observed
data
d. Minimizes the difference between the model and the data.

34. Which of the following techniques would work better for reducing the dimensionality of a
data set?
a. Removing columns which have too many missing values
b. Removing columns which have high variance in data
c. Removing columns with dissimilar data trends
d. None of the above statements are correct

35. Which of the following statement(s) is/are TRUE about Principal Component Analysis?
(Multiple options are correct)
a. PCA is an unsupervised learning method
b. PCA searches for the directions in which the data have the largest variance
c. Maximum number of principal components is the number of features in the data
d. All principal components are orthogonal to each other
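
A minimal sketch for question 35, assuming scikit-learn, verifying two of the listed facts on random data: the number of principal components is capped by the number of features, and the components are mutually orthogonal.

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.default_rng(0).normal(size=(100, 4))
    pca = PCA().fit(X)

    print(pca.components_.shape[0])     # at most 4 components for 4 features
    print(np.allclose(pca.components_ @ pca.components_.T,
                      np.eye(pca.components_.shape[0])))  # True: orthonormal rows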

36. PCA works better if there is


a. A linear structure in the data
b. Data lying on a curved surface rather than a flat surface
c. Variables scaled in the same unit
d. All of the above

37. Which of the following features can be used for accuracy improvement of a text
classification model?
a. Frequency count of terms
b. Dependency Grammar
c. Part of Speech Tag
d. All of the above

38. Which of the following statements are TRUE for the Latent Dirichlet Allocation (LDA)
technique of Topic Modeling? (Multiple options are correct)
a. It is an unsupervised learning method
b. Selection of number of topics in a model does not depend on the size of the text
data
c. Number of topic terms is directly proportional to the size of the text data
d. It is used for sentiment analysis in the text data
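
A minimal sketch for question 38, assuming scikit-learn; the toy corpus and the choice of two topics are illustrative. LDA is fit on raw term counts with no labels, i.e., it is unsupervised.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["cats and dogs are pets", "stocks and bonds are assets",
            "dogs chase cats", "bonds hedge stocks"]
    counts = CountVectorizer().fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    print(lda.transform(counts).round(2))  # per-document topic mixtures
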
39. When building a regression or classification model, which of the following is the correct
sequence to follow?
a. Removal of NAs → Normalize the data → PCA → Training the model
b. Removal of NAs → PCA → Normalize the data → Training the model
c. Normalize the data → Removal of NAs → Training the model → PCA
d. Normalize the data → PCA → Training the model → PCA
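
A minimal sketch for question 39 of the order in option (a), assuming scikit-learn; the particular estimators are illustrative, with the imputer standing in for removal of NAs.

    from sklearn.pipeline import Pipeline
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression

    model = Pipeline([
        ("impute", SimpleImputer(strategy="mean")),  # deal with NAs first
        ("scale", StandardScaler()),                 # normalize the data
        ("pca", PCA(n_components=2)),                # then reduce dimensions
        ("clf", LogisticRegression()),               # finally train the model
    ])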

40. Which of the following statements is TRUE for a classification model?


a. A steep ROC curve and a flat Lift curve indicate a good model
b. A flat ROC curve and a steep Lift curve indicate a good model
c. A steep ROC curve and a steep Lift curve indicate a good model
d. A flat ROC curve and a flat Lift curve indicate a good model

41. LASSO regression uses


a. Gradient Descent algorithm and L1 regularization for computing the coefficients
b. Coordinate Descent algorithm and L1 regularization for computing the coefficients
c. Gradient Descent algorithm and L2 regularization for computing the coefficients
d. Stochastic Gradient Descent algorithm and L2 regularization for computing the
coefficients
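
For question 41: scikit-learn's Lasso, for reference, fits by coordinate descent with an L1 penalty. A minimal sketch on made-up toy data:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X[:, 0] + rng.normal(scale=0.1, size=50)

    print(Lasso(alpha=0.1).fit(X, y).coef_)  # L1 shrinks irrelevant coefficients to 0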

42. Artificial Neural Networks-based models can be used for


a. Classification only with linearly separable data
b. Regression only with data having linear relationships
c. Both for classification and regression with linearly separable data only
d. Both classification and regression with both linearly and nonlinearly separable data

For the two short answer type questions (each carrying five marks), study the following topics from
the materials that I shared with you:

1. Principal Component Analysis (the PPT and the Jupyter notebook that I shared)
2. LASSO and Ridge Regression (the Jupyter notebook that I shared)
3. Topic Modeling (from the Jupyter notebook for text mining that I shared)
4. Artificial Neural Networks (from the PDF document that I shared)
