Machine-Learning Set 7
7 of 8 sets
601. The average squared difference between classifier predicted output and actual
output.
A. mean squared error
B. root mean squared error
C. mean absolute error
D. mean relative error
Answer:A
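Question 601's three error metrics are easy to confuse. A minimal sketch, using made-up predicted and actual values, that computes all three side by side:

```python
# Compare MSE, RMSE, and MAE on a small set of
# predicted vs. actual values (illustrative numbers).
import math

predicted = [2.5, 0.0, 2.0, 8.0]
actual    = [3.0, -0.5, 2.0, 7.0]

errors = [p - a for p, a in zip(predicted, actual)]
mse  = sum(e ** 2 for e in errors) / len(errors)   # mean squared error
rmse = math.sqrt(mse)                              # root mean squared error
mae  = sum(abs(e) for e in errors) / len(errors)   # mean absolute error

print(mse, rmse, mae)
```

MSE squares the differences before averaging, so large errors dominate; RMSE is just its square root, back in the original units.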
603. Regression trees are often used to model _______ data.
A. linear
B. nonlinear
C. categorical
D. symmetrical
Answer:B
606. This technique associates a conditional probability value with each data
instance.
A. linear regression
B. logistic regression
C. simple regression
D. multiple linear regression
Answer:B
607. This supervised learning technique can process both numeric and categorical
input attributes.
A. linear regression
B. bayes classifier
C. logistic regression
D. backpropagation learning
Answer:B
609. This clustering algorithm merges and splits nodes to help modify nonoptimal
partitions.
A. agglomerative clustering
B. expectation maximization
C. conceptual clustering
D. k-means clustering
Answer:C
610. This clustering algorithm initially assumes that each data instance represents
a single cluster.
A. agglomerative clustering
B. conceptual clustering
C. k-means clustering
D. expectation maximization
Answer:A
612. Machine learning techniques differ from statistical techniques in that machine
learning methods
A. typically assume an underlying distribution for the data.
B. are better able to deal with missing and noisy data.
C. are not able to explain their behavior.
D. have trouble with large-sized datasets.
Answer:B
618. If there is only a discrete number of possible outcomes (called categories),
the process becomes a ______.
A. Regression
B. Classification.
C. Modelfree
D. Categories
Answer:B
620. During the last few years, many ______ algorithms have been applied to deep
neural networks to learn the best policy for playing Atari video games and to teach
an agent how to associate the right action with an input representing the state.
A. Logical
B. Classical
C. Classification
D. None of above
Answer:D
625. If you need a more powerful scaling feature, with superior control over
outliers and the possibility to select a quantile range, there's also the class ________.
A. RobustScaler
B. DictVectorizer
C. LabelBinarizer
D. FeatureHasher
Answer:A
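Question 625's RobustScaler centers on the median and scales by an interquantile range, so outliers barely affect the other points. A minimal sketch with toy numbers (the quantile_range shown is scikit-learn's default):

```python
# RobustScaler centers on the median and scales by the IQR,
# so the single outlier barely shifts the other points.
import numpy as np
from sklearn.preprocessing import RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # 100.0 is an outlier

scaler = RobustScaler(quantile_range=(25.0, 75.0))   # default quantile range
X_scaled = scaler.fit_transform(X)

print(X_scaled.ravel())
```

Here the median is 3 and the IQR is 2, so the four normal points land in [-1, 0.5] while the outlier simply ends up far away instead of compressing them.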
627. There are also many univariate methods that can be used in order to select the
best features according to specific criteria based on________.
A. F-tests and p-values
B. chi-square
C. ANOVA
Answer:A
629. A feature F1 can take certain values: A, B, C, D, E, & F, and represents the
grade of students from a college. Which of the following statements is true in this
case?
A. Feature F1 is an example of nominal variable.
B. Feature F1 is an example of ordinal variable.
C. It doesn’t belong to any of the above category.
D. Both of these
Answer:B
630. The parameter ______ allows specifying the percentage of elements to put into
the test/training set
A. test_size
B. train_size
C. All above
D. None of these
Answer:C
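Question 630 refers to scikit-learn's train_test_split, where test_size (and its counterpart train_size) can be given as a fraction of the data. A minimal sketch with toy data:

```python
# train_test_split accepts test_size (and train_size) either as a
# fraction of the data or as an absolute number of samples.
from sklearn.model_selection import train_test_split

X = list(range(10))
y = [0, 1] * 5

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)  # 30% held out for testing

print(len(X_train), len(X_test))
```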
635. Which of the following methods do we use to find the best fit line for data in
Linear Regression?
A. Least Square Error
B. Maximum Likelihood
C. Logarithmic Loss
D. Both A and B
Answer:A
636. Which of the following evaluation metrics can be used to evaluate a model
while modeling a continuous output variable?
A. AUC-ROC
B. Accuracy
C. Logloss
D. Mean-Squared-Error
Answer:D
641. Features being classified are __________ of each other in a Naïve Bayes classifier
A. Independent
B. Dependent
C. Partial Dependent
D. None
Answer:A
649. SVMs directly give us the posterior probabilities P(y = 1|x) and P(y = −1|x)
A. True
B. False
Answer:B
651. Solving a nonlinear separation problem with a hard-margin kernelized SVM
(Gaussian RBF kernel) might lead to overfitting
A. True
B. False
Answer:A
654. The linear SVM classifier works by drawing a straight line between two
classes
A. True
B. False
Answer:A
657. Even if there are no actual supervisors, ________ learning is also based on
feedback provided by the environment
A. Supervised
B. Reinforcement
C. Unsupervised
D. None of the above
Answer:B
658. It is necessary to allow the model to develop a generalization ability and
avoid a common problem called ______.
A. Overfitting
B. Overlearning
Answer:A
659. Techniques that use both labeled and unlabeled data are called ______.
A. Supervised
B. Semi-supervised
C. Unsupervised
D. None of the above
Answer:B
662. Which of the following are models for feature extraction?
A. regression
B. classification
C. None of the above
Answer:C
663. _____ provides some built-in datasets that can be used for testing purposes.
A. scikit-learn
B. classification
C. regression
Answer:A
664. While using _____ all labels are turned into sequential numbers.
A. LabelEncoder class
B. LabelBinarizer class
C. DictVectorizer
D. FeatureHasher
Answer:A
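Question 664's LabelEncoder can be sketched as follows (toy labels of my own): it assigns sequential integers in the sorted order of the distinct labels.

```python
# LabelEncoder turns string labels into sequential integers (0, 1, 2, ...)
# in the sorted order of the distinct labels.
from sklearn.preprocessing import LabelEncoder

labels = ['cat', 'dog', 'cat', 'bird']
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

print(list(encoded))            # bird -> 0, cat -> 1, dog -> 2
print(list(encoder.classes_))   # the learned label order
```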
665. ________ produce sparse matrices of real numbers that can be fed into any
machine learning model.
A. DictVectorizer
B. FeatureHasher
C. Both A & B
D. None of the Mentioned
Answer:C
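For question 665, a minimal sketch of DictVectorizer on made-up records; FeatureHasher behaves similarly but uses hashing to get a fixed output width.

```python
# DictVectorizer maps feature dictionaries into a sparse numeric
# matrix, expanding string features into one-hot columns.
from sklearn.feature_extraction import DictVectorizer

records = [{'city': 'Pune', 'temp': 33.0},
           {'city': 'Delhi', 'temp': 41.0}]   # made-up records

vec = DictVectorizer(sparse=True)
X = vec.fit_transform(records)   # columns: city=Delhi, city=Pune, temp

print(X.toarray())
```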
666. scikit-learn offers the class______, which is responsible for filling the holes
using a strategy based on the mean, median, or frequency
A. LabelEncoder
B. LabelBinarizer
C. DictVectorizer
D. Imputer
Answer:D
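The Imputer class named in question 666 comes from older scikit-learn releases; in current versions the same behavior lives in SimpleImputer. A minimal sketch with a toy matrix:

```python
# SimpleImputer (the successor of Imputer) fills missing values
# with the mean, median, or most frequent value of each column.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [7.0, np.nan]])

imputer = SimpleImputer(strategy='mean')   # column means fill the holes
X_filled = imputer.fit_transform(X)
print(X_filled)
```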
667. Which of the following scales data by removing elements that don't belong to
a given range or by considering a maximum absolute value?
A. MinMaxScaler
B. MaxAbsScaler
C. Both A & B
D. None of the Mentioned
Answer:C
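Question 667's two scalers side by side on a toy column: MinMaxScaler maps to a given range (default [0, 1]), MaxAbsScaler divides by the per-column maximum absolute value.

```python
# MinMaxScaler maps each column to a given range (default [0, 1]);
# MaxAbsScaler divides by the per-column maximum absolute value.
import numpy as np
from sklearn.preprocessing import MinMaxScaler, MaxAbsScaler

X = np.array([[-2.0], [0.0], [2.0]])

mm = MinMaxScaler().fit_transform(X)    # -> 0, 0.5, 1
ma = MaxAbsScaler().fit_transform(X)    # -> -1, 0, 1
print(mm.ravel(), ma.ravel())
```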
670. In order to assess how much information is brought by each component, and
the correlation among them, a useful tool is the_____.
A. Concurrent matrix
B. Convergence matrix
C. Supportive matrix
D. Covariance matrix
Answer:D
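Question 670's covariance matrix can be computed directly with NumPy (toy data of my own): the diagonal holds each component's variance, the off-diagonal entries the covariance between components.

```python
# The covariance matrix summarizes each component's variance
# (diagonal) and the covariance between components (off-diagonal).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # perfectly correlated with x

cov = np.cov(np.vstack([x, y]))  # rows are variables -> 2x2 matrix
print(cov)
```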
671. The ______ parameter can assume different values which determine how the
data matrix is initially processed.
A. run
B. start
C. init
D. stop
Answer:C
672. ______ allows exploiting the natural sparsity of data while extracting
principal components.
A. SparsePCA
B. KernelPCA
C. SVD
D. init parameter
Answer:A
674. Suppose you plotted a scatter plot between the residuals and predicted values
in linear regression and you found that there is a relationship between them.
Which of the following conclusion do you make about this situation?
A. Since there is a relationship, our model is not good
B. Since there is a relationship, our model is good
C. Can’t say
D. None of these
Answer:A
675. Let’s say a “Linear regression” model perfectly fits the training data (train
error is zero). Now, which of the following statements is true?
A. You will always have test error zero
B. You can not have test error zero
C. None of the above
Answer:C
678. Which of the following steps/assumptions in regression modeling impacts the
trade-off between under-fitting and over-fitting the most?
A. The polynomial degree
B. Whether we learn the weights by matrix inversion or gradient descent
C. The use of a constant-term
Answer:A
679. Which of the following is true about “Ridge” or “Lasso” regression methods
in case of feature selection?
A. Ridge regression uses subset selection of features
B. Lasso regression uses subset selection of features
C. Both use subset selection of features
D. None of above
Answer:B
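Questions 679/694 hinge on Lasso's L1 penalty driving some coefficients exactly to zero (implicit subset selection), while Ridge's L2 penalty only shrinks them. A minimal sketch on synthetic data where only the first feature matters (alpha values are illustrative):

```python
# Lasso zeroes out irrelevant coefficients; Ridge only shrinks them.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = 3.0 * X[:, 0] + rng.randn(100) * 0.1   # only feature 0 matters

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print(lasso.coef_)   # irrelevant features driven exactly to zero
print(ridge.coef_)   # small but nonzero everywhere
```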
680. Which of the following statement(s) can be true post adding a variable in a
linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
3. R-Squared decreases and Adjusted R-squared decreases
4. R-Squared decreases and Adjusted R-squared increases
A. 1 and 2
B. 1 and 3
C. 2 and 4
D. None of the above
Answer:A
681. What is/are true about the kernel in SVM?
1. Kernel functions map low dimensional data to a high dimensional space
2. It’s a similarity function
A. 1
B. 2
C. 1 and 2
D. None of these
Answer:C
682. Suppose you are building an SVM model on data X. The data X can be error
prone, which means that you should not trust any specific data point too much.
Now suppose you want to build an SVM model with a quadratic kernel function of
polynomial degree 2 that uses the slack variable C as one of its
hyperparameters. What would happen when you use a very small C (C~0)?
A. Misclassification would happen
B. Data will be correctly classified
C. Can’t say
D. None of these
Answer:A
688. Some people are using the term ___ instead of prediction only to avoid the
weird idea that machine learning is a sort of modern magic.
A. Inference
B. Interference
C. Accuracy
D. None of above
Answer:A
689. The term _____ can be freely used, but with the same meaning adopted in
physics or system theory.
A. Accuracy
B. Cluster
C. Regression
D. Prediction
Answer:D
690. Common deep learning applications / problems can also be solved using____
A. Real-time visual object identification
B. Classic approaches
C. Automatic labeling
D. Bio-inspired adaptive systems
Answer:B
692. What are the two methods used for the calibration in Supervised Learning?
A. Platt Calibration and Isotonic Regression
B. Statistics and Information Retrieval
Answer:A
693. Suppose we fit “Lasso Regression” to a data set which has 100 features
(X1, X2 … X100). Now we rescale one of these features by multiplying it by 10
(say that feature is X1), and then refit Lasso regression with the same
regularization parameter. Now, which of the following options will be correct?
A. It is more likely for X1 to be excluded from the model
B. It is more likely for X1 to be included in the model
C. Can’t say
D. None of these
Answer:B
694. Which of the following is true about “Ridge” or “Lasso” regression methods
in case of feature selection?
A. Ridge regression uses subset selection of features
B. Lasso regression uses subset selection of features
C. Both use subset selection of features
D. None of above
Answer:B
695. Which of the following statement(s) can be true post adding a variable in a
linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
3. R-Squared decreases and Adjusted R-squared decreases
4. R-Squared decreases and Adjusted R-squared increases
A. 1 and 2
B. 1 and 3
C. 2 and 4
Answer:A
696. We can also compute the coefficients of linear regression with the help of an
analytical method called the “Normal Equation”. Which of the following is/are true
about the “Normal Equation”?
1. We don’t have to choose the learning rate
2. It becomes slow when the number of features is very large
3. No need to iterate
A. 1 and 2
B. 1 and 3.
C. 2 and 3.
D. 1,2 and 3.
Answer:D
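Question 696's Normal Equation, theta = (XᵀX)⁻¹ Xᵀy, can be sketched in a few lines of NumPy on a toy data set: no learning rate, no iteration, but the inversion grows costly as the number of features increases.

```python
# Closed-form linear regression: theta = (X^T X)^(-1) X^T y.
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])     # first column is the intercept term
y = np.array([2.0, 4.0, 6.0])  # y = 2 * x exactly

theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)                   # intercept ~0, slope ~2
```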
697. If two variables are correlated, is it necessary that they have a linear
relationship?
A. Yes
B. No
Answer:B
698. When the C parameter is set to infinity, which of the following holds true?
A. The optimal hyperplane, if it exists, will be the one that completely separates the data
B. The soft-margin classifier will separate the data
C. None of the above
Answer:A
699. Suppose you are building an SVM model on data X. The data X can be error
prone, which means that you should not trust any specific data point too much.
Now suppose you want to build an SVM model with a quadratic kernel function of
polynomial degree 2 that uses the slack variable C as one of its
hyperparameters. What would happen when you use a very large value of C
(C → infinity)?
A. We can still classify data correctly for given setting of hyper parameter C
B. We can not classify data correctly for given setting of hyper parameter C
C. Can’t Say
D. None of these
Answer:A
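Questions 682, 698, and 699 all probe the same trade-off: a very large C tolerates almost no slack, so on separable data the SVM behaves like a hard-margin classifier and fits every training point. A minimal sketch on toy, linearly separable data (points and C value are my own):

```python
# With a very large C the soft-margin SVM allows almost no slack,
# so on separable data it classifies every training point correctly.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.0, 2.0], [3.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 1, 1])                     # linearly separable

hard = SVC(kernel='poly', degree=2, C=1e6).fit(X, y)
train_acc = hard.score(X, y)
print(train_acc)
```

With a tiny C the penalty for slack is negligible, the margin widens, and misclassifications on the training set become likely, which is the point of question 682.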