Machine-Learning Set 7

This document contains a series of questions and answers related to machine learning concepts. It covers topics like regression, classification, clustering, evaluation metrics, naive bayes, and more. There are over 600 multiple choice questions in total.

Uploaded by

Salma Abobasha
Copyright
© All Rights Reserved

Machine Learning (ML)

7 of 8 sets

601. The average squared difference between classifier predicted output and actual
output.
A. mean squared error
B. root mean squared error
C. mean absolute error
D. mean relative error
Answer:A
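
To make question 601 concrete, here is a minimal Python sketch of the three error measures named in the options. The sample values are made up for illustration and the helper names are not from any particular library.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between targets and predictions."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between targets and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical targets and predictions.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mse = mean_squared_error(actual, predicted)   # squared errors: 1, 0, 4, 1
rmse = math.sqrt(mse)                         # same units as the target
mae = mean_absolute_error(actual, predicted)  # absolute errors: 1, 0, 2, 1
print(mse, rmse, mae)
```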

602. Simple regression assumes a __________ relationship between the input
attribute and output attribute.
A. linear
B. quadratic
C. reciprocal
D. inverse
Answer:A

603. Regression trees are often used to model _______ data.
A. linear
B. nonlinear
C. categorical
D. symmetrical
Answer:B

604. The leaf nodes of a model tree are


A. averages of numeric output attribute values.
B. nonlinear regression equations.
C. linear regression equations.
D. sums of numeric output attribute values.
Answer:C

605. Logistic regression is a ________ regression technique that is used to model
data having a _____ outcome.
A. linear, numeric
B. linear, binary
C. nonlinear, numeric
D. nonlinear, binary
Answer:D

606. This technique associates a conditional probability value with each data
instance.
A. linear regression
B. logistic regression
C. simple regression
D. multiple linear regression
Answer:B
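
As a rough illustration of question 606, the sketch below shows how a logistic model turns a linear score into a conditional probability for each instance. The weight and bias are hypothetical values chosen for the example, not fitted parameters.

```python
import math

def predict_probability(x, weight, bias):
    """Estimate P(y = 1 | x) by passing a linear score through the sigmoid."""
    z = weight * x + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters (assumed for illustration only).
weight, bias = 1.5, -3.0

probs = [predict_probability(x, weight, bias) for x in (0.0, 2.0, 4.0)]
labels = [int(p >= 0.5) for p in probs]  # threshold the probability at 0.5
print(probs, labels)
```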

607. This supervised learning technique can process both numeric and categorical
input attributes.
A. linear regression
B. bayes classifier
C. logistic regression
D. backpropagation learning
Answer:B

608. With Bayes classifier, missing data items are


A. treated as equal compares.
B. treated as unequal compares.
C. replaced with a default value.
D. ignored.
Answer:D

609. This clustering algorithm merges and splits nodes to help modify nonoptimal
partitions.
A. agglomerative clustering
B. expectation maximization
C. conceptual clustering
D. k-means clustering
Answer:C

610. This clustering algorithm initially assumes that each data instance represents
a single cluster.
A. agglomerative clustering
B. conceptual clustering
C. k-means clustering
D. expectation maximization
Answer:A

611. This unsupervised clustering algorithm terminates when mean values
computed for the current iteration of the algorithm are identical to the computed
mean values for the previous iteration.
A. agglomerative clustering
B. conceptual clustering
C. k-means clustering
D. expectation maximization
Answer:C
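
The stopping rule described in question 611 can be sketched in a few lines of Python. This is a simplified one-dimensional version with made-up points and starting means, meant only to show the "means unchanged" termination test.

```python
def k_means_1d(points, means, max_iter=100):
    """K-means on 1-D data; stops when the means no longer change."""
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each point to its nearest mean.
        clusters = [[] for _ in means]
        for p in points:
            nearest = min(range(len(means)), key=lambda i: abs(p - means[i]))
            clusters[nearest].append(p)
        # Update step: recompute each mean (an empty cluster keeps its old mean).
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # identical to the previous iteration: terminate
            return means, iteration
        means = new_means
    return means, max_iter

# Hypothetical data and deliberately poor starting means.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_means, iterations = k_means_1d(points, means=[1.0, 2.0])
print(final_means, iterations)
```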

612. Machine learning techniques differ from statistical techniques in that machine
learning methods
A. typically assume an underlying distribution for the data.
B. are better able to deal with missing and noisy data.
C. are not able to explain their behavior.
D. have trouble with large-sized datasets.
Answer:B

613. In reinforcement learning, negative feedback is defined as a ____.
A. Penalty
B. Overlearning
C. Reward
D. None of above
Answer:A

614. According to ____, it’s a key success factor for the survival and evolution of all
species.
A. Claude Shannon’s theory
B. Gini Index
C. Darwin’s theory
D. None of above
Answer:C

615. What is ‘Training set’?


A. Training set is used to test the accuracy of the hypotheses generated by the learner.
B. A set of data is used to discover the potentially predictive relationship.
C. Both A & B
D. None of above
Answer:B

616. Common deep learning applications include____


A. Image classification, Real-time visual tracking
B. Autonomous car driving, Logistic optimization
C. Bioinformatics, Speech recognition
D. All above
Answer:D

617. Reinforcement learning is particularly efficient when______________.


A. the environment is not completely deterministic
B. it's often very dynamic
C. it's impossible to have a precise error measure
D. All above
Answer:D

618. If there is only a discrete number of possible outcomes (called categories), the
process becomes a ______.
A. Regression
B. Classification
C. Modelfree
D. Categories
Answer:B

619. Which of the following are supervised learning applications


A. Spam detection, Pattern detection, Natural Language Processing
B. Image classification, Real-time visual tracking
C. Autonomous car driving, Logistic optimization
D. Bioinformatics, Speech recognition
Answer:A

620. During the last few years, many ______ algorithms have been applied to deep
neural networks to learn the best policy for playing Atari video games and to teach
an agent how to associate the right action with an input representing the state.
A. Logical
B. Classical
C. Classification
D. None of above
Answer:D

621. What is ‘Overfitting’ in Machine learning?


A. ‘Overfitting’ occurs when a statistical model describes random error or noise instead of the
underlying relationship.
B. Robots are programmed so that they can perform the task based on data they gather from
sensors.
C. ‘Overfitting’ occurs whenever a learning process is involved.
D. A set of data is used to discover the potentially predictive relationship.
Answer:A

622. What is ‘Test set’?


A. Test set is used to test the accuracy of the hypotheses generated by the learner.
B. It is a set of data used to discover the potentially predictive relationship.
C. Both A & B
D. None of above
Answer:A

623. ________ is much more difficult because it's necessary to determine a
supervised strategy to train a model for each feature and, finally, to predict their
value.
A. Removing the whole line
B. Creating sub-model to predict those features
C. Using an automatic strategy to impute them according to the other known values
D. All above
Answer:B

624. It's possible to use a different placeholder for missing values through the
parameter _______.
A. regression
B. classification
C. random_state
D. missing_values
Answer:D

625. If you need a more powerful scaling feature, with a superior control on
outliers and the possibility to select a quantile range, there's also the class________.
A. RobustScaler
B. DictVectorizer
C. LabelBinarizer
D. FeatureHasher
Answer:A
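
The idea behind question 625 can be sketched without any library: center each value on the median and divide by the spread between two percentiles, so a single outlier does not dominate the scale. The function below is a hypothetical stand-in written for illustration, not the RobustScaler implementation, and its percentile interpolation is an assumption of this sketch.

```python
import statistics

def robust_scale(values, quantile_range=(25, 75)):
    """Center on the median and divide by the spread between two percentiles.

    quantile_range holds integer percentiles between 1 and 99.
    """
    # With n=100 the result lists the 1st..99th percentiles in order.
    percentiles = statistics.quantiles(values, n=100, method="inclusive")
    low = percentiles[quantile_range[0] - 1]
    high = percentiles[quantile_range[1] - 1]
    median = statistics.median(values)
    return [(v - median) / (high - low) for v in values]

# Hypothetical feature column with one large outlier.
values = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(values)
print(scaled)
```

The four ordinary values stay within a narrow band around zero, while the outlier remains visibly far away instead of compressing everything else.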

626. scikit-learn also provides a class for per-sample normalization, Normalizer. It
can apply ________ to each element of a dataset.
A. max, l0 and l1 norms
B. max, l1 and l2 norms
C. max, l2 and l3 norms
D. max, l3 and l4 norms
Answer:B
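
To show what the three norms in question 626 do to a single sample, here is a small standalone sketch. The `normalize` helper is written for this example and only mirrors the idea of per-sample normalization.

```python
import math

def normalize(sample, norm="l2"):
    """Divide one sample (a row) by its own l1, l2, or max norm."""
    if norm == "l1":
        scale = sum(abs(v) for v in sample)
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in sample))
    elif norm == "max":
        scale = max(abs(v) for v in sample)
    else:
        raise ValueError("norm must be 'l1', 'l2' or 'max'")
    # An all-zero sample is returned unchanged to avoid dividing by zero.
    return [v / scale for v in sample] if scale else list(sample)

sample = [3.0, -4.0]  # hypothetical row of a dataset
l1_scaled = normalize(sample, "l1")    # absolute values sum to 1
l2_scaled = normalize(sample, "l2")    # Euclidean length becomes 1
max_scaled = normalize(sample, "max")  # largest absolute value becomes 1
print(l1_scaled, l2_scaled, max_scaled)
```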

627. There are also many univariate methods that can be used in order to select the
best features according to specific criteria based on________.
A. F-tests and p-values
B. chi-square
C. ANOVA
D. All above
Answer:A

628. ________performs a PCA with non-linearly separable data sets.


A. SparsePCA
B. KernelPCA
C. SVD
D. None of the Mentioned
Answer:B

629. A feature F1 can take certain value: A, B, C, D, E, & F and represents grade
of students from a college. Which of the following statement is true in following
case?
A. Feature F1 is an example of nominal variable.
B. Feature F1 is an example of ordinal variable.
C. It doesn’t belong to any of the above category.
D. Both of these
Answer:B

630. The parameter______ allows specifying the percentage of elements to put into
the test/training set
A. test_size
B. train_size
C. All above
D. None of these
Answer:C
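
The effect of a test-size fraction, as asked in question 630, can be sketched as follows. The function name and its rounding rule are assumptions made for this example; a library splitter may round differently.

```python
import random

def train_test_split_sketch(samples, test_size=0.25, random_state=None):
    """Shuffle the samples and hold out a fraction of them for testing."""
    rng = random.Random(random_state)  # fixed seed gives a reproducible split
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(8))  # hypothetical dataset of 8 items
train, test = train_test_split_sketch(samples, test_size=0.25, random_state=0)
print(len(train), len(test))
```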

631. In many classification problems, the target ______ is made up of categorical
labels which cannot immediately be processed by any algorithm.
A. random_state
B. dataset
C. test_size
D. All above
Answer:B

632. _______adopts a dictionary-oriented approach, associating to each category
label a progressive integer number.
A. LabelEncoder class
B. LabelBinarizer class
C. DictVectorizer
D. FeatureHasher
Answer:A
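
The dictionary-oriented approach in question 632 amounts to mapping each distinct label to a progressive integer. The sketch below does this by hand with made-up labels. Sorting the distinct labels first is an assumption of this example.

```python
# Hypothetical categorical target values.
labels = ["Male", "Female", "Female", "Male", "Other"]

# Build the label -> integer dictionary from the sorted distinct labels.
classes = sorted(set(labels))
label_to_index = {label: index for index, label in enumerate(classes)}

encoded = [label_to_index[label] for label in labels]
decoded = [classes[index] for index in encoded]  # inverse mapping
print(label_to_index, encoded)
```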

633. Function used for linear regression in R is __________


A. lm(formula, data)
B. lr(formula, data)
C. lrm(formula, data)
D. regression.linear(formula, data)
Answer:A

634. In syntax of linear model lm(formula,data,..), data refers to ______


A. Matrix
B. Vector
C. Array
D. List
Answer:D

635. Which of the following methods do we use to find the best fit line for data in
Linear Regression?
A. Least Square Error
B. Maximum Likelihood
C. Logarithmic Loss
D. Both A and B
Answer:A

636. Which of the following evaluation metrics can be used to evaluate a model
while modeling a continuous output variable?
A. AUC-ROC
B. Accuracy
C. Logloss
D. Mean-Squared-Error
Answer:D

637. Which of the following is true about Residuals ?


A. Lower is better
B. Higher is better
C. A or B depend on the situation
D. None of these
Answer:A

638. Naive Bayes classifiers are a collection of ______ algorithms.
A. Classification
B. Clustering
C. Regression
D. All
Answer:A

639. Naive Bayes classification is a form of _______________ learning.
A. Supervised
B. Unsupervised
C. Both
D. None
Answer:A

640. Features being classified are independent of each other in the Naïve Bayes
classifier.
A. False
B. true
Answer:B

641. Features being classified are __________ of each other in the Naïve Bayes classifier.
A. Independent
B. Dependent
C. Partial Dependent
D. None
Answer:A

642. Conditional probability is a measure of the probability of an event given that
another event has already occurred.
A. True
B. false
Answer:A

643. Bayes’ theorem describes the probability of an event, based on prior
knowledge of conditions that might be related to the event.
A. True
B. false
Answer:A
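
Questions 642 and 643 can be tied together with a short worked example. The probabilities below are invented for illustration: a hypothetical spam filter asks how likely a message is to be spam given that it contains a certain word.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B)."""
    # Total probability of the evidence B over both cases (A and not A).
    evidence = likelihood * prior + likelihood_given_not * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers (assumed for this example only).
p_spam = 0.2              # prior P(spam)
p_word_given_spam = 0.6   # P(word | spam)
p_word_given_ham = 0.05   # P(word | not spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(p_spam_given_word)  # close to 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to roughly 0.75, which is the "prior knowledge updated by related conditions" described in the question.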

644. The Bernoulli Naïve Bayes classifier assumes a ___________ distribution.
A. Continuous
B. Discrete
C. Binary
Answer:C

645. The Multinomial Naïve Bayes classifier assumes a ___________ distribution.
A. Continuous
B. Discrete
C. Binary
Answer:B

646. The Gaussian Naïve Bayes classifier assumes a ___________ distribution.
A. Continuous
B. Discrete
C. Binary
Answer:A

647. The binarize parameter of scikit-learn's BernoulliNB sets the threshold for
binarizing sample features.
A. True
B. false
Answer:A
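
What a binarization threshold does to a feature vector, as in question 647, can be shown with a one-line helper. This is a hand-written sketch, assuming the usual convention that values strictly above the threshold map to 1 and all others to 0.

```python
def binarize(features, threshold=0.0):
    """Map each feature to 1 if it exceeds the threshold, otherwise to 0."""
    return [1 if value > threshold else 0 for value in features]

features = [0.0, 0.3, 1.2, -0.5]  # hypothetical sample

at_zero = binarize(features, threshold=0.0)
at_half = binarize(features, threshold=0.5)
print(at_zero, at_half)
```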

648. Gaussian distribution when plotted, gives a bell shaped curve which is
symmetric about the _______ of the feature values.
A. Mean
B. Variance
C. Discrete
D. Random
Answer:A

649. SVMs directly give us the posterior probabilities P(y = 1 | x) and P(y = −1 | x)
A. True
B. false
Answer:B

650. Any linear combination of the components of a multivariate Gaussian is a
univariate Gaussian.
A. True
B. false
Answer:A

651. Solving a non linear separation problem with a hard margin Kernelized SVM
(Gaussian RBF Kernel) might lead to overfitting
A. True
B. false
Answer:A

652. SVM is a ------------------ algorithm


A. Classification
B. Clustering
C. Regression
D. All
Answer:A

653. SVM is a ______ learning algorithm.
A. Supervised
B. Unsupervised
C. Both
D. None
Answer:A

654. The linear SVM classifier works by drawing a straight line between two
classes
A. True
B. false
Answer:A

655. What is Model Selection in Machine Learning?


A. The process of selecting models among different mathematical models, which are used to
describe the same data set
B. when a statistical model describes random error or noise instead of underlying relationship
C. Find interesting directions in data and find novel observations/ database cleaning
D. All above
Answer:A

656. Which are two techniques of Machine Learning ?


A. Genetic Programming and Inductive Learning
B. Speech recognition and Regression
C. Both A & B
D. None of the Mentioned
Answer:A

657. Even if there are no actual supervisors ________ learning is also based on
feedback provided by the environment
A. Supervised
B. Reinforcement
C. Unsupervised
D. None of the above
Answer:B

658. It is necessary to allow the model to develop a generalization ability and
avoid a common problem called ______.
A. Overfitting
B. Overlearning
C. Classification
D. Regression
Answer:A

659. Techniques that involve the use of both labeled and unlabeled data are called ___.
A. Supervised
B. Semi-supervised
C. Unsupervised
D. None of the above
Answer:B

660. A supervised scenario is characterized by the concept of a _____.


A. Programmer
B. Teacher
C. Author
D. Farmer
Answer:B

661. Overlearning is caused by an excessive ______.
A. Capacity
B. Regression
C. Reinforcement
D. Accuracy
Answer:A

662. Which of the following are several models for feature extraction
A. regression
B. classification
C. None of the above
Answer:C

663. _____ provides some built-in datasets that can be used for testing purposes.
A. scikit-learn
B. classification
C. regression
D. None of the above
Answer:A

664. While using _____ all labels are turned into sequential numbers.
A. LabelEncoder class
B. LabelBinarizer class
C. DictVectorizer
D. FeatureHasher
Answer:A

665. _______produce sparse matrices of real numbers that can be fed into any
machine learning model.
A. DictVectorizer
B. FeatureHasher
C. Both A & B
D. None of the Mentioned
Answer:C

666. scikit-learn offers the class______, which is responsible for filling the holes
using a strategy based on the mean, median, or frequency
A. LabelEncoder
B. LabelBinarizer
C. DictVectorizer
D. Imputer
Answer:D
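
The three fill strategies named in question 666 can be sketched with the standard library. The `impute` helper and the use of `None` as the placeholder are assumptions made for this example. Recent scikit-learn releases expose this functionality as `SimpleImputer` in `sklearn.impute` rather than under the older `Imputer` name.

```python
import statistics

def impute(column, strategy="mean", missing_value=None):
    """Fill missing entries of one column using the chosen strategy."""
    strategies = {
        "mean": statistics.mean,
        "median": statistics.median,
        "most_frequent": statistics.mode,
    }
    # The statistic is computed from the observed values only.
    observed = [v for v in column if v != missing_value]
    fill = strategies[strategy](observed)
    return [fill if v == missing_value else v for v in column]

# Hypothetical column with two holes.
column = [1.0, None, 2.0, 2.0, None, 7.0]

by_mean = impute(column, "mean")
by_median = impute(column, "median")
by_mode = impute(column, "most_frequent")
print(by_mean, by_median, by_mode)
```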

667. Which of the following scale data by removing elements that don't belong to a
given range or by considering a maximum absolute value.
A. MinMaxScaler
B. MaxAbsScaler
C. Both A & B
D. None of the Mentioned
Answer:C

668. scikit-learn also provides a class for per-sample normalization,_____


A. Normalizer
B. Imputer
C. Classifier
D. All above
Answer:A

669. A(n) ______ dataset with many features contains information proportional to the
independence of all features and their variance.
A. normalized
B. unnormalized
C. Both A & B
D. None of the Mentioned
Answer:B

670. In order to assess how much information is brought by each component, and
the correlation among them, a useful tool is the_____.
A. Concurrent matrix
B. Convergence matrix
C. Supportive matrix
D. Covariance matrix
Answer:D
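
For question 670, a covariance matrix can be computed by hand in a few lines. The data below is invented so that the second feature is exactly twice the first, which makes the strong correlation easy to read off the off-diagonal entries.

```python
def covariance_matrix(rows):
    """Sample covariance matrix (divides by n - 1) of a list of rows."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]

# Hypothetical dataset: the second feature is twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = covariance_matrix(data)
print(cov)
```

The diagonal holds each feature's variance and the off-diagonal entries hold the covariance between the two features.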

671. The_____ parameter can assume different values which determine how the
data matrix is initially processed.
A. run
B. start
C. init
D. stop
Answer:C

672. ______allows exploiting the natural sparsity of data while extracting principal
components.
A. SparsePCA
B. KernelPCA
C. SVD
D. init parameter
Answer:A

673. Which of the following statement is true about outliers in Linear regression?
A. Linear regression is sensitive to outliers
B. Linear regression is not sensitive to outliers
C. Can’t say
D. None of these
Answer:A

674. Suppose you plotted a scatter plot between the residuals and predicted values
in linear regression and you found that there is a relationship between them.
Which of the following conclusion do you make about this situation?
A. Since there is a relationship, our model is not good
B. Since there is a relationship, our model is good
C. Can’t say
D. None of these
Answer:A

675. Let’s say a “Linear regression” model perfectly fits the training data (train
error is zero). Which of the following statements is true?
A. You will always have test error zero
B. You cannot have test error zero
C. None of the above
Answer:C

676. In a linear regression problem, we are using “R-squared” to measure
goodness-of-fit. We add a feature in linear regression model and retrain the same
model. Which of the following option is true?
A. If R Squared increases, this variable is significant.
B. If R Squared decreases, this variable is not significant.
C. Individually R squared cannot tell about variable importance. We can’t say anything about it
right now.
D. None of these.
Answer:C

677. To test the linear relationship between continuous variables y (dependent) and
x (independent), which of the following plots is best suited?
A. Scatter plot
B. Barchart
C. Histograms
D. None of these
Answer:A

678. which of the following step / assumption in regression modeling impacts the
trade-off between under-fitting and over-fitting the most.
A. The polynomial degree
B. Whether we learn the weights by matrix inversion or gradient descent
C. The use of a constant-term
Answer:A

679. Which of the following is true about “Ridge” or “Lasso” regression methods
in case of feature selection?
A. Ridge regression uses subset selection of features
B. Lasso regression uses subset selection of features
C. Both use subset selection of features
D. None of above
Answer:B
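
Why Lasso can act as a feature selector while Ridge does not (question 679) is easiest to see under a textbook simplification: an orthonormal design matrix and one common scaling of the penalty. Under those assumptions Lasso soft-thresholds each least-squares coefficient and Ridge only shrinks it. The coefficients and penalty below are made up.

```python
import math

def lasso_orthonormal(ols_coefficients, penalty):
    """Soft-thresholding: coefficients within the penalty become exactly zero."""
    return [
        math.copysign(max(abs(w) - penalty, 0.0), w) for w in ols_coefficients
    ]

def ridge_orthonormal(ols_coefficients, penalty):
    """Proportional shrinkage: coefficients get smaller but stay nonzero."""
    return [w / (1.0 + penalty) for w in ols_coefficients]

# Hypothetical least-squares coefficients and penalty strength.
ols = [3.0, 0.5, -1.5]
lasso = lasso_orthonormal(ols, penalty=1.0)
ridge = ridge_orthonormal(ols, penalty=1.0)
print(lasso, ridge)
```

The weak second coefficient is dropped entirely by the Lasso rule, while the Ridge rule keeps all three features.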

680. Which of the following statement(s) can be true post adding a variable in a
linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
3. R-Squared decreases and Adjusted R-squared decreases
4. R-Squared decreases and Adjusted R-squared increases
A. 1 and 2
B. 1 and 3
C. 2 and 4
D. None of the above
Answer:A
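
The two admissible cases in question 680 follow from the usual adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. The R-squared values below are hypothetical and chosen to show both outcomes.

```python
def adjusted_r_squared(r_squared, n_samples, n_features):
    """Adjusted R^2 penalizes R^2 for the number of predictors used."""
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / (n_samples - n_features - 1)

n = 50  # hypothetical sample size

base = adjusted_r_squared(0.800, n, 3)          # model with 3 features
weak_feature = adjusted_r_squared(0.801, n, 4)  # R^2 barely improves
good_feature = adjusted_r_squared(0.850, n, 4)  # R^2 improves clearly
print(base, weak_feature, good_feature)
```

In both cases R-squared rises after the extra variable is added, but the adjusted value falls when the gain is too small to justify the additional predictor.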

681. What is/are true about kernel in SVM?
1. Kernel functions map low-dimensional data to a high-dimensional space
2. It’s a similarity function
A. 1
B. 2
C. 1 and 2
D. None of these
Answer:C
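
Both statements in question 681 can be checked numerically for the degree-2 polynomial kernel on two-dimensional inputs. The sketch assumes the kernel k(x, z) = (x · z)² and the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). The input vectors are arbitrary.

```python
import math

def poly_kernel(x, z):
    """Degree-2 polynomial kernel computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def feature_map(x):
    """Explicit map from 2-D input to the 3-D space implied by the kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x = (1.0, 2.0)
z = (3.0, 4.0)

kernel_value = poly_kernel(x, z)
explicit_value = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))
print(kernel_value, explicit_value)
```

The kernel returns the same similarity score as a dot product in the higher-dimensional space, without ever building that space explicitly.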

682. Suppose you are building an SVM model on data X. The data X can be error
prone, which means that you should not trust any specific data point too much.
Now suppose you want to build an SVM model with a quadratic kernel
function (polynomial degree 2) that uses the penalty parameter C as one of its
hyperparameters. What would happen when you use a very small C (C~0)?
A. Misclassification would happen
B. Data will be correctly classified
C. Can’t say
D. None of these
Answer:A

683. The cost parameter in the SVM means:


A. The number of cross-validations to be made
B. The kernel to be used
C. The tradeoff between misclassification and simplicity of the model
D. None of the above
Answer:C
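
The trade-off in question 683 can be made visible with the standard soft-margin objective, ½‖w‖² + C · Σ max(0, 1 − y(w·x + b)). The one-dimensional data set and the two candidate separators below are hypothetical. One is simple but misclassifies a noisy point, the other fits every point with a much larger weight.

```python
def svm_objective(w, b, C, points):
    """Soft-margin objective: margin term plus C times the total hinge loss."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in points)
    return 0.5 * w * w + C * hinge

# Hypothetical 1-D data; the point (0.5, -1) sits on the "wrong" side.
points = [(-2.0, -1), (-1.0, -1), (0.5, -1), (1.0, 1), (2.0, 1)]

simple = (1.0, 0.0)   # small weight, misclassifies the noisy point
strict = (4.0, -3.0)  # large weight, classifies every point with margin

small_c = [svm_objective(w, b, 0.1, points) for w, b in (simple, strict)]
large_c = [svm_objective(w, b, 100.0, points) for w, b in (simple, strict)]
print(small_c, large_c)
```

With a small C the simpler separator has the lower objective despite its error, and with a large C the ranking flips in favor of the separator that classifies everything correctly.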

684. How do you handle missing or corrupted data in a dataset?


A. Drop missing rows or columns
B. Replace missing values with mean/median/mode
C. Assign a unique category to missing values
D. All of the above
Answer:D

685. Which of the following statements about Naive Bayes is incorrect?


A. Attributes are equally important.
B. Attributes are statistically dependent on one another given the class value.
C. Attributes are statistically independent of one another given the class value.
D. Attributes can be nominal or numeric
Answer:B

686. SVMs are less effective when:
A. The data is linearly separable
B. The data is clean and ready to use
C. The data is noisy and contains overlapping points
Answer:C

687. If there is only a discrete number of possible outcomes, they are called _____.
A. Modelfree
B. Categories
C. Prediction
D. None of above
Answer:B

688. Some people are using the term ___ instead of prediction only to avoid the
weird idea that machine learning is a sort of modern magic.
A. Inference
B. Interference
C. Accuracy
D. None of above
Answer:A

689. The term _____ can be freely used, but with the same meaning adopted in
physics or system theory.
A. Accuracy
B. Cluster
C. Regression
D. Prediction
Answer:D

690. Common deep learning applications / problems can also be solved using____
A. Real-time visual object identification
B. Classic approaches
C. Automatic labeling
D. Bio-inspired adaptive systems
Answer:B

691. What is the function of ‘Unsupervised Learning’?
A. Find clusters of the data and find low-dimensional representations of the data
B. Find interesting directions in data and find novel observations/ database cleaning
C. Interesting coordinates and correlations
D. All
Answer:D

692. What are the two methods used for the calibration in Supervised Learning?
A. Platt Calibration and Isotonic Regression
B. Statistics and Information Retrieval
Answer:A

693. Suppose we fit “Lasso Regression” to a data set which has 100 features
(X1, X2, …, X100). Now we rescale one of these features by multiplying it by 10 (say
that feature is X1), and then refit Lasso regression with the same regularization
parameter. Which of the following options will be correct?
A. It is more likely for X1 to be excluded from the model
B. It is more likely for X1 to be included in the model
C. Can’t say
D. None of these
Answer:B

694. Which of the following is true about “Ridge” or “Lasso” regression methods
in case of feature selection?
A. Ridge regression uses subset selection of features
B. Lasso regression uses subset selection of features
C. Both use subset selection of features
D. None of above
Answer:B

695. Which of the following statement(s) can be true post adding a variable in a
linear regression model?
1. R-Squared and Adjusted R-squared both increase
2. R-Squared increases and Adjusted R-squared decreases
3. R-Squared decreases and Adjusted R-squared decreases
4. R-Squared decreases and Adjusted R-squared increases
A. 1 and 2
B. 1 and 3
C. 2 and 4
D. None of the above
Answer:A

696. We can also compute the coefficient of linear regression with the help of an
analytical method called “Normal Equation”. Which of the following is/are true
about “Normal Equation”?
1. We don’t have to choose the learning rate
2. It becomes slow when number of features is very large
3. No need to iterate
A. 1 and 2
B. 1 and 3.
C. 2 and 3.
D. 1,2 and 3.
Answer:D
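
The points in question 696 are visible in a direct implementation. The sketch below solves the normal equation (XᵀX)θ = Xᵀy for one feature plus an intercept, using a closed-form 2×2 solve. The data is invented and lies exactly on the line y = 2x + 1.

```python
def normal_equation_fit(xs, ys):
    """Solve (X^T X) theta = X^T y for X = [1, x] without any iteration."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]; solve by Cramer's rule.
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

intercept, slope = normal_equation_fit(xs, ys)
print(intercept, slope)
```

No learning rate and no loop over epochs appear anywhere. The cost is the linear solve, which for many features means solving a much larger system, and that is why the method slows down as the number of features grows.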

697. If two variables are correlated, is it necessary that they have a linear
relationship?
A. Yes
B. No
Answer:B
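
A quick numerical check supports the answer to question 697. In the sketch below, y is the square of x, so the relationship is clearly nonlinear, yet the Pearson correlation over this made-up range is still very high.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [float(x) for x in range(1, 11)]  # 1, 2, ..., 10
ys = [x ** 2 for x in xs]              # quadratic, not linear

r = pearson_correlation(xs, ys)
print(r)
```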

698. When the C parameter is set to infinite, which of the following holds true?
A. The optimal hyperplane if exists, will be the one that completely separates the data
B. The soft-margin classifier will separate the data
C. None of the above
Answer:A

699. Suppose you are building an SVM model on data X. The data X can be error
prone, which means that you should not trust any specific data point too much.
Now suppose you want to build an SVM model with a quadratic kernel
function (polynomial degree 2) that uses the penalty parameter C as one of its
hyperparameters. What would happen when you use a very large value of C (C -> infinity)?
A. We can still classify data correctly for given setting of hyper parameter C
B. We can not classify data correctly for given setting of hyper parameter C
C. Can’t Say
D. None of these
Answer:A

700. SVM can solve linear and non-linear problems
A. true
B. false
Answer:A

View all MCQ's at McqMate.com
