ML MCQ Unit 2


1. Another name for an output attribute: (BL I)
A. Predictive variable
B. Independent variable
C. Estimated variable
D. Dependent variable
ANSWER: D

2. A feature F1 can take the values A, B, C, D, E, and F, and represents the grade of
students from a college. Which of the following statements is true in this case?
(BL I)
A. Feature F1 is an example of a nominal variable.
B. Feature F1 is an example of an ordinal variable.
C. It doesn't belong to either of the above categories.
D. Both of these
ANSWER: B

3. Which of the following is an example of a deterministic algorithm? (BL I)
A. PCA
B. K-Means
C. None of the above
D. Both of the above
ANSWER: A

4. The Pearson correlation between two variables is zero, but their values can still be
related to each other. (BL I)
A. TRUE
B. FALSE
C. Maybe
D. No relation will be there.
ANSWER: A
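
A minimal sketch (not part of the original question) illustrating the point with y = x²: the relationship is perfectly deterministic, yet the Pearson correlation is zero because it measures only linear association.

```python
import numpy as np

x = np.linspace(-1, 1, 101)
y = x ** 2  # y is fully determined by x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.4f}")  # ~0: Pearson captures only linear relationships
```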

5. Adding an unimportant feature to a linear regression model may result in: 1.
Increase in R-squared  2. Decrease in R-squared (BL I)
A. Only 1 is correct
B. Only 2 is correct
C. Either 1 or 2
D. None of these
ANSWER: A

6. What would you do in PCA to get the same projection as SVD? (BL I)
A. Transform the data to zero mean
B. Transform the data to zero median
C. Not possible
D. None of these
ANSWER: A
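
A minimal sketch of the idea (synthetic data assumed): after a zero-mean transform, the right singular vectors from SVD match the principal axes found by sklearn's PCA, up to sign.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ rng.normal(size=(3, 3))  # correlated features

Xc = X - X.mean(axis=0)                        # transform data to zero mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

pca = PCA(n_components=3).fit(X)               # PCA centers the data internally
print(np.allclose(np.abs(Vt), np.abs(pca.components_)))  # True (up to sign)
```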

7. One-hot encoding: (BL I)
A. increases the dimensionality of a data set
B. decreases the dimensionality of a data set
C. Depends upon the use case
D. All of the above
ANSWER: A
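
A minimal sketch (toy grade column assumed): one categorical column with six distinct values expands into six binary columns, so dimensionality increases.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

grades = np.array([["A"], ["B"], ["C"], ["D"], ["E"], ["F"], ["B"]])
onehot = OneHotEncoder().fit_transform(grades).toarray()
print(grades.shape, "->", onehot.shape)  # (7, 1) -> (7, 6)
```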

8. Which of the following techniques would perform better for reducing dimensions of a
data set? (BL I)
A. Removing columns which have too many missing values
B. Removing columns which have high variance in data
C. Removing columns with dissimilar data trends
D. None of these
ANSWER:A

9. Dimensionality reduction algorithms are one of the possible ways to reduce the
computation time required to build a model. (BL I)
A. TRUE
B. FALSE
C. Maybe
D. No relation will be there.
ANSWER:A

10. What will happen when eigenvalues are roughly equal? (BL I)
A. PCA will perform outstandingly
B. PCA will perform badly
C. Can't say
D. None of the above
ANSWER:B
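
A minimal sketch of why (isotropic synthetic data assumed): when every direction carries roughly the same variance, the eigenvalues are roughly equal, so no small set of components summarizes the data and PCA gains little.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))        # spherical cloud: no preferred direction

pca = PCA().fit(X)
print(pca.explained_variance_ratio_)  # all near 0.2: eigenvalues roughly equal
```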

11. In which of the following cases will LDA fail? (BL I)
A. If the discriminatory information is not in the mean but in the variance of the data
B. If the discriminatory information is in the mean but not in the variance of the data
C. If the discriminatory information is in the mean and variance of the data
D. None of these
ANSWER:A

12. Which of the following options are correct when you are applying PCA on an image
dataset? 1. It can be used to effectively detect deformable objects. 2. It is invariant
to affine transforms. 3. It can be used for lossy image compression. 4. It is not
invariant to shadows. (BL I)
A. 1 and 2
B. 2 and 3
C. 3 and 4
D. 1 and 4
ANSWER:C

13. Under which condition SVD and PCA produce the same projection result? (BL I)
A. When data has zero median
B. When data has zero mean
C. Both are always the same
D. None of these
ANSWER:B

14. Missing values have become a regular part of the modern-day data collection
procedure. The main reasons for missing values include: 1. Error in collection 2.
Error in recording 3. Non-responsiveness (e.g., people not revealing their income in
the income field) (BL I)
A. 1 and 2
B. 1 and 3
C. 1, 2 and 3
D. Can't say
ANSWER: C

15. We use dimensionality reduction techniques because: a. There are too many
variables/predictors, which increases the chance of multicollinearity. b. Too many
variables increase the chance of over-fitting while model building, and these
variables become redundant. (BL I)
A. Only a
B. Only b
C. Both a and b
D. None of the above
ANSWER: C

16. Scikit-learn is considered the best library for: (BL I)
A. Model building
B. evaluation metrics
C. hyper-parameter tuning
D. All of the above
ANSWER: D

17. Which of the following are reasons for using feature scaling? (BL I)
A. It speeds up solving for θ using the normal equation.
B. It prevents the matrix XᵀX (used in the normal equation) from being non-invertible
(singular/degenerate).
C. It is necessary to prevent gradient descent from getting stuck in local optima.
D. It speeds up gradient descent by making it require fewer iterations to get to a good
solution.
ANSWER: D
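
A minimal sketch of option D (synthetic data and a hand-rolled gradient descent, not from the question bank): with mismatched feature scales the problem is badly conditioned, so only a tiny learning rate is stable and convergence stalls; standardizing the columns lets the same method converge in far fewer iterations.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.c_[rng.uniform(0, 1, 200), rng.uniform(0, 1000, 200)]  # mismatched scales
y = X @ np.array([2.0, 0.003]) + rng.normal(0, 0.1, 200)

def gd_iters(X, y, lr, tol=1e-8, max_iter=200_000):
    """Return iterations until the gradient step becomes negligible."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = X.T @ (X @ w - y) / len(y)
        if np.linalg.norm(lr * grad) < tol:
            return i
        w -= lr * grad
    return max_iter

Xs = StandardScaler().fit_transform(X)
print("unscaled:", gd_iters(X, y, lr=1e-6))  # tiny stable lr -> hits the cap
print("scaled:  ", gd_iters(Xs, y, lr=0.1))  # same task, a few hundred steps
```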

18. How do you handle missing or corrupted data in a dataset? (BL I)
A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
ANSWER:D
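
A minimal sketch of the three strategies from the options (toy DataFrame assumed; pandas and scikit-learn available):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"income": [52_000, np.nan, 61_000, np.nan],
                   "city": ["Pune", "Mumbai", None, "Pune"]})

dropped = df.dropna()                      # A: drop rows with missing values
df["income"] = SimpleImputer(strategy="median").fit_transform(df[["income"]]).ravel()  # B
df["city"] = df["city"].fillna("Missing")  # C: assign a unique category
print(df)
```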

19. What is the purpose of performing cross-validation? (BL I)
A. To assess the predictive performance of the models
B. To judge how the trained model performs outside the sample on test data
C. Both A and B
D. None of the above
ANSWER: C
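
A minimal sketch (iris used as a stand-in dataset): each fold is held out in turn, so the averaged scores estimate out-of-sample performance.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean())  # mean accuracy over the 5 held-out folds
```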

20. When performing regression or classification, which of the following is the correct
way to preprocess the data? (BL I)
A. Normalize the data → PCA → training
B. PCA → normalize PCA output → training
C. Normalize the data → PCA → normalize PCA output → training
D. None of the above
ANSWER: A
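
A minimal sketch of option A as an sklearn Pipeline (iris as a stand-in dataset): standardize first, then project with PCA, then fit the model on the projections.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
model = Pipeline([("scale", StandardScaler()),   # normalize the data
                  ("pca", PCA(n_components=2)),  # then reduce dimensions
                  ("clf", LogisticRegression(max_iter=1000))])  # then train
print(model.fit(X, y).score(X, y))
```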

21. Which of the following is an example of feature extraction? (BL I)
A. Constructing a bag-of-words vector from an email
B. Applying a PCA projection to large, high-dimensional data
C. Removing stop words from a sentence
D. All of the above
ANSWER: D
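
A minimal sketch of option A (toy strings assumed): CountVectorizer extracts a bag-of-words feature vector from raw text.

```python
from sklearn.feature_extraction.text import CountVectorizer

emails = ["win a free prize now", "meeting moved to friday"]
bow = CountVectorizer().fit_transform(emails)  # one count per vocabulary word
print(bow.toarray())
```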

22. What is pca.components_ in sklearn? (BL I)
A. The set of all eigenvectors for the projection space
B. Matrix of principal components
C. Result of the multiplication matrix
D. None of the above options
ANSWER: A
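
A minimal sketch (iris as a stand-in): each row of pca.components_ is one eigenvector of the centered data's covariance matrix, i.e., one axis of the projection space.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2).fit(X)
print(pca.components_.shape)  # (2, 4): two eigenvectors in the 4-D feature space
```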

23. ________ allows you to rebuild a sample starting from a sparse dictionary of atoms
(similar to principal components, but without constraints on their independence).
(BL I)
A. Dictionary learning
B. Machine Learning
C. Both A and B
D. None of the above
ANSWER: A
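
A minimal sketch (synthetic data assumed): sklearn's DictionaryLearning fits a dictionary of atoms and rebuilds each sample from a sparse code over those atoms.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))

dl = DictionaryLearning(n_components=4, random_state=0)
codes = dl.fit_transform(X)        # sparse code for each sample
rebuilt = codes @ dl.components_   # sample rebuilt from the dictionary atoms
print(rebuilt.shape)               # (50, 8)
```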

24. Which of the following algorithms cannot be used for reducing the dimensionality of
data? (BL I)
A. t-SNE
B. PCA
C. LDA
D. None of these
ANSWER:D

25. True or False:  PCA can be used for projecting and visualizing data in lower
dimensions. (BL I)
A. TRUE
B. FALSE
ANSWER:A

26. Suppose we are using dimensionality reduction as a pre-processing technique, i.e.,
instead of using all the features, we reduce the data to k dimensions with PCA, and
then use these PCA projections as our features. Which of the following statements is
correct? (BL I)
A. Higher ‘k’ means more regularization
B. Higher ‘k’ means less regularization
C. Can’t Say
D. None of Above 
ANSWER: B

27. In which of the following scenarios is t-SNE better to use than PCA for
dimensionality reduction while working on a local machine with minimal
computational power? (BL I)
A. Dataset with 1 Million entries and 300 features
B. Dataset with 100000 entries and 310 features
C. Dataset with 10,000 entries and 8 features
D. Dataset with 10,000 entries and 200 features
ANSWER: C

28. Which of the following statements is true for the t-SNE cost function? (BL I)
A. It is asymmetric in nature.
B. It is symmetric in nature.
C. It is the same as the cost function for SNE.
D. None of the above
ANSWER: B

29. Imagine you are dealing with text data. To represent the words you are using word
embeddings (Word2vec). With this word embedding you will end up with 1000
dimensions. Now you want to reduce the dimensionality of this high-dimensional data
such that similar words should have a similar meaning in nearest-neighbour space. In
such a case, which of the following algorithms are you most likely to choose? (BL I)
A. t-SNE
B. PCA
C. LDA
D. None of these
ANSWER:A
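
A minimal sketch (random vectors standing in for a real 1000-D Word2vec matrix): t-SNE embeds the points in 2-D while trying to keep each point's nearest neighbours, which is exactly the requirement above.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(500, 1000))  # stand-in for 500 word embeddings

embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(word_vectors)
print(embedded.shape)  # (500, 2): local neighbourhoods, not global distances, kept
```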

30. True or False: t-SNE learns a non-parametric mapping. (BL I)
A. TRUE
B. FALSE
ANSWER: A

31. Which of the following statements is correct for t-SNE and PCA? (BL I)
A. t-SNE is linear whereas PCA is non-linear
B. t-SNE and PCA both are linear
C. t-SNE and PCA both are nonlinear
D. t-SNE is nonlinear whereas PCA is linear
ANSWER: D

32. In the t-SNE algorithm, which of the following hyperparameters can be tuned? (BL I)
A. Number of dimensions
B. Smooth measure of effective number of neighbours
C. Maximum number of iterations
D. All of the above
ANSWER: D

33. Which of the following statements is true about t-SNE in comparison to PCA? (BL I)
A. When the data is huge (in size), t-SNE may fail to produce better results.
B. t-SNE always produces better results regardless of the size of the data.
C. PCA always performs better than t-SNE for smaller-size data.
D. None of these
ANSWER: A

34. What will happen when eigenvalues are roughly equal? (BL I)
A. PCA will perform outstandingly
B. PCA will perform badly
C. Can't say
D. None of the above
ANSWER:B

35. PCA works better if there is: 1. A linear structure in the data 2. If the data lies on a
curved surface and not on a flat surface 3. If variables are scaled in the same unit
(BL I)
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1 ,2 and 3
ANSWER: C

36. What happens when you get features in lower dimensions using PCA? 1. The
features will still have interpretability 2. The features will lose interpretability 3. The
features must carry all information present in the data 4. The features may not carry
all information present in the data (BL I)
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
ANSWER: D

37. Which of the following option(s) is/are true? 1. You need to initialize parameters in
PCA 2. You don't need to initialize parameters in PCA 3. PCA can be trapped in
local minima 4. PCA can't be trapped in local minima (BL I)
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
ANSWER: D

38. Another name for an output attribute: (BL II)
A. Predictive variable
B. Independent variable
C. Estimated variable
D. Dependent variable
ANSWER: D

39. A feature F1 can take the values A, B, C, D, E, and F, and represents the grade of
students from a college. Which of the following statements is true in this case?
(BL II)
A. Feature F1 is an example of a nominal variable.
B. Feature F1 is an example of an ordinal variable.
C. It doesn't belong to either of the above categories.
D. Both of these
ANSWER: B

40. Which of the following is an example of a deterministic algorithm? (BL II)
A. PCA
B. K-Means
C. None of the above
D. Both of the above
ANSWER: A

41. The Pearson correlation between two variables is zero, but their values can still be
related to each other. (BL II)
A. TRUE
B. FALSE
C. Maybe
D. No relation will be there.
ANSWER: A

42. Which of the following comparison(s) are true about PCA and LDA? 1. Both LDA and
PCA are linear transformation techniques 2. LDA is supervised whereas PCA is
unsupervised 3. PCA maximizes the variance of the data, whereas LDA maximizes
the separation between different classes (BL II)
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1, 2 and 3
ANSWER: D
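
A minimal sketch of points 1–3 (iris as a stand-in dataset): both produce linear projections, but LDA consumes the class labels y while PCA never sees them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
Xp = PCA(n_components=2).fit_transform(X)        # unsupervised: max variance
Xl = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised
print(Xp.shape, Xl.shape)  # both (150, 2): linear projections to 2-D
```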

43. Which of the following statements is/are true about Type-1 and Type-2 errors? 1.
Type1 is known as false positive and Type2 is known as false negative. 2. Type1 is
known as false negative and Type2 is known as false positive. 3. Type1 error occurs
when we reject a null hypothesis when it is actually true. (BL II)
A.Only 3
B.1 and 2
C.1 and 3
D.2 and 3
ANSWER: C

44. k-NN is very likely to overfit due to the curse of dimensionality. Which of the
following options would you consider to handle such a problem? 1. Dimensionality
Reduction 2. Feature selection (BL II)
A. 1
B. 2
C. 1 and 2
D. None of these
ANSWER: C

45. PCA works better if there is: 1. A linear structure in the data 2. If the data lies on a
curved surface and not on a flat surface 3. If variables are scaled in the same unit
(BL II)
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1,2 and 3
ANSWER:C

46. What happens when you get features in lower dimensions using PCA? 1. The
features will still have interpretability 2. The features will lose interpretability 3. The
features must carry all information present in the data 4. The features may not carry
all information present in the data (BL II)
A. 1 and 3
B. 1 and 4
C. 2 and 3
D. 2 and 4
ANSWER:D

47. Which of the following options is/are true for K-fold cross-validation? 1. An increase
in K will result in more time required to cross-validate the result. 2. Higher values of K
will result in higher confidence in the cross-validation result as compared to a lower
value of K. 3. If K = N, then it is called leave-one-out cross-validation, where N is the
number of observations. (BL III)
A. 1 and 2
B. 2 and 3
C. 1 and 3
D. 1, 2 and 3
ANSWER:D
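
A minimal sketch (iris as a stand-in dataset): cv=5 trains 5 models, while cv=LeaveOneOut() is the K = N case with one fit per observation, so cost grows with K.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())              # 5 model fits
print(cross_val_score(clf, X, y, cv=LeaveOneOut()).mean())  # N = 150 model fits
```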

48. The most popularly used dimensionality reduction algorithm is Principal Component
Analysis (PCA). Which of the following is/are true about PCA? 1. PCA is an
unsupervised method 2. It searches for the directions in which the data has the
largest variance 3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other. (BL III)
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. All of the above
ANSWER:D

49. Which of the following can be the first 2 principal components after applying PCA?
1. (0.5, 0.5, 0.5, 0.5) and (0.71, 0.71, 0, 0)
2. (0.5, 0.5, 0.5, 0.5) and (0, 0, -0.71, -0.71)
3. (0.5, 0.5, 0.5, 0.5) and (0.5, 0.5, -0.5, -0.5)
4. (0.5, 0.5, 0.5, 0.5) and (-0.5, -0.5, 0.5, 0.5) (BL III)
A. 1 and 2
B. 1 and 3
C. 2 and 4
D. 3 and 4
ANSWER:D
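
A minimal worked check: successive principal components must be mutually orthogonal, so the dot product with the first component should be zero, which only options 3 and 4 satisfy.

```python
import numpy as np

pc1 = np.array([0.5, 0.5, 0.5, 0.5])
candidates = {1: [0.71, 0.71, 0, 0],
              2: [0, 0, -0.71, -0.71],
              3: [0.5, 0.5, -0.5, -0.5],
              4: [-0.5, -0.5, 0.5, 0.5]}
for k, v in candidates.items():
    print(k, np.dot(pc1, v))  # 1 -> 0.71, 2 -> -0.71, 3 -> 0.0, 4 -> 0.0
```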

50. The major reasons for performing data preprocessing include: 1. Lack of
appropriate variables or presence of missing values 2. Inconsistency or discrepancy
in the recorded data 3. Removal of outliers or errors (BL III)
A. 1 and 2
B. 1 and 3
C. 1, 2 and 3
D. Can't say
ANSWER: C

51. The most popularly used dimensionality reduction algorithm is Principal Component
Analysis (PCA). Which of the following is/are true about PCA? 1. PCA is an
unsupervised method 2. It searches for the directions in which the data has the
largest variance 3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other (BL III)
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. All of the above
ANSWER: D
