
MLPUE2 Solution

The document contains questions and answers related to hypothesis testing and machine learning concepts. Specifically: 1) It provides definitions for key hypothesis testing terms like p-value, null hypothesis, and type I and type II errors. 2) It asks questions about machine learning algorithms like decision trees, Naive Bayes classifiers, and neural networks. Topics covered include information gain, greedy algorithms, and generalization. 3) Additional questions cover statistical topics like linear regression, correlation, residuals, and clustering. Relationships between variables and improving regression accuracy with clustering are discussed.

Uploaded by NAVEEN SAINI
Copyright © All Rights Reserved

1. In order to determine the p-value of a hypothesis test, which of the following is not needed?
a. whether the test is one-tail or two-tail
b. the value of the test statistic
c. the form of the null and alternate hypotheses
d. the level of significance
e. all of the above are needed to determine the p-value

Answer: D
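To see why the significance level is not an input, here is a minimal sketch in Python, assuming a standard-normal (z) test statistic; the function names are hypothetical. Only the value of the statistic and the tail direction (which follows from the form of the alternative hypothesis) enter the calculation. The level α is compared with the p-value afterwards, to make the decision.

```python
import math

def upper_tail(x: float) -> float:
    """P(Z > x) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def p_value_from_z(z: float, tails: str = "two") -> float:
    """p-value of a z test statistic.

    tails is fixed by the form of the alternative hypothesis:
    "two" for H1: mu != mu0, "right" for H1: mu > mu0, "left" for H1: mu < mu0.
    """
    if tails == "two":
        return 2 * upper_tail(abs(z))
    if tails == "right":
        return upper_tail(z)
    if tails == "left":
        return upper_tail(-z)
    raise ValueError("tails must be 'two', 'right' or 'left'")

z = 2.5                          # value of the test statistic
p = p_value_from_z(z, "two")     # about 0.012
alpha = 0.05                     # used only for the decision, not for p
print(f"p = {p:.4f}:", "reject H0" if p < alpha else "fail to reject H0")
```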

2. The purpose of hypothesis testing is to:
a. test how far the mean of a sample is from zero
b. determine whether a statistical result is significant
c. determine the appropriate value of the significance level
d. derive the standard error of the data
e. determine the appropriate value of the null hypothesis

Answer: B

3. Which of the following statements about hypothesis testing is true?
a. If the p-value is greater than the significance level, we fail to reject H0.
b. A Type II error is rejecting the null when it is actually true.
c. If the alternative hypothesis is that the population mean is greater than a specified value, then the test is a two-tailed test.
d. The significance level equals one minus the probability of a Type I error.

Answer: A

4. The probability of rejecting the null hypothesis when it is true is called the:
a) Level of Confidence
b) Level of Significance
c) Level of Margin
d) Level of Rejection

Answer: b

5. The alternative hypothesis is also called the:
a) Composite hypothesis
b) Research Hypothesis
c) Simple Hypothesis
d) Null Hypothesis

Answer: b

6. A feature F1 can take the values A, B, C, D, E and F, and represents the grades of students from a college. Which of the following statements is true in this case?

A) Feature F1 is an example of a nominal variable.
B) Feature F1 is an example of an ordinal variable.
C) It doesn't belong to any of the above categories.
D) Both of these
Solution: (B)

7. What is machine learning?
a) The autonomous acquisition of knowledge through the use of computer programs
b) The autonomous acquisition of knowledge through the use of manual programs
c) The selective acquisition of knowledge through the use of computer programs
d) The selective acquisition of knowledge through the use of manual programs

Answer: a

8. A statement made about a population for testing purposes is called:
a) Statistic
b) Hypothesis
c) Level of Significance
d) Test-Statistic

Answer: b

9. If the null hypothesis is false, the alternative hypothesis is accepted. The type of test (one-tailed or two-tailed) is defined by the alternative hypothesis.
a. True
b. False
Answer: True

10. A statement made about a population for testing purposes is called the level of significance.
a. True
b. False

Answer: False

11. Which of the following statements are correct in reference to information gain?
a. It is biased towards single-valued attributes
b. It is biased towards multi-valued attributes
c. ID3 makes use of information gain
d. The approach used by ID3 is greedy

Ans: b, c, d
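A minimal sketch of information gain as ID3 computes it (entropy in bits), on hypothetical toy data; the function names and rows are illustrative only. An attribute that is unique per row, such as an ID, splits the data into pure one-row subsets and so receives the maximum possible gain even though it cannot generalize. This is the bias towards multi-valued attributes that C4.5 counters by using the gain ratio instead.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Entropy of the target minus the weighted entropy after splitting on attr."""
    base = entropy([r[target] for r in rows])
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Hypothetical toy data: "id" is unique per row, "weather" has two values.
rows = [
    {"id": 1, "weather": "sunny", "play": "no"},
    {"id": 2, "weather": "sunny", "play": "no"},
    {"id": 3, "weather": "sunny", "play": "yes"},
    {"id": 4, "weather": "rainy", "play": "yes"},
    {"id": 5, "weather": "rainy", "play": "yes"},
    {"id": 6, "weather": "rainy", "play": "no"},
]

for attr in ("id", "weather"):
    print(attr, round(information_gain(rows, attr, "play"), 3))
```

A greedy learner that simply picks the highest gain would split on "id" first here, which is why the raw measure is considered biased.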

12. Which one of these is a tree-based learner?
a. Rule based
b. Bayesian Belief Network
c. Bayesian classifier
d. Random Forest

Ans: d

13. What is the approach of basic algorithm for decision tree induction?
a. Greedy
b. Top Down
c. Procedural
d. Step by Step
Ans: a

14. Which of the following techniques would best suit a student performance classification system?
a. If...then... analysis
b. Market-basket analysis
c. Regression analysis
d. Cluster analysis
Ans: a

15. Attribute selection measures are also known as splitting rules.
a. True b. False

Ans: True

16. What is generalization?
a) ability to store a pattern
b) ability to recall a pattern
c) ability to learn a mapping function
d) none of the mentioned

Answer: c

17. The generalization feature of a multilayer feedforward network depends on which factors?
a) architectural details
b) learning rate parameter
c) training samples
d) all of the mentioned

Answer: d

18. What is the capacity of a network?
a) number of inputs it can take
b) number of outputs it can deliver
c) number of patterns that can be stored
d) none of the mentioned

Answer: c

19. When does the storage problem become a hard problem?
a) when number of patterns is more than number of basins of attraction
b) when number of patterns is less than number of basins of attraction
c) when number of patterns is same as number of basins of attraction
d) none of the mentioned

Answer: a

20. For what purpose are energy minima used?
a) pattern classification
b) pattern mapping
c) pattern storage
d) none of the mentioned

Answer: c

21. Select the order of sampling schemes from best to worst.
a. simple random, stratified, convenience
b. simple random, convenience, stratified
c. stratified, simple random, convenience
d. stratified, convenience, simple random

Answer: A

22. When the correlation coefficient, r, is close to one:
a. there is no relationship between the two variables
b. there is a strong linear relationship between the two variables
c. it is impossible to tell if there is a relationship between the two variables
d. the slope of the regression line will be close to one

Answer: b
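Option d confuses correlation with slope. A minimal sketch in Python (function names and data are hypothetical) computes Pearson's r and the least squares slope for two perfectly linear series: r is ±1 in both cases while the slopes are 10 and -0.5. The falling series also illustrates question 31, where a strong negative correlation gives r well below 0.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ls_slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

xs = [1, 2, 3, 4, 5]
steep = [10 * x + 3 for x in xs]      # perfectly linear, slope 10
falling = [-0.5 * x + 1 for x in xs]  # perfectly linear, slope -0.5

for name, ys in (("steep", steep), ("falling", falling)):
    print(name, round(pearson_r(xs, ys), 3), round(ls_slope(xs, ys), 3))
```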

23. Given the following data pairs (x, y), find the regression equation.
(1, 1.24), (2, 5.23), (3, 7.24), (4, 7.60), (5, 9.97), (6, 14.31), (7, 13.99), (8, 14.88), (9, 18.04), (10, 20.70)
a. y = 0.490 x - 0.053
b. y = 2.04 x
c. y = 1.98 x + 0.436
d. y = 0.49 x

Answer: c
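The answer can be checked with the ordinary least squares formulas, slope = Sxy / Sxx and intercept = ȳ - slope · x̄. A minimal sketch in Python; the function name least_squares is hypothetical.

```python
def least_squares(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

xs = list(range(1, 11))
ys = [1.24, 5.23, 7.24, 7.60, 9.97, 14.31, 13.99, 14.88, 18.04, 20.70]
b1, b0 = least_squares(xs, ys)
print(f"y = {b1:.2f} x + {b0:.3f}")  # y = 1.98 x + 0.436
```

Here Sxy = 163.26 and Sxx = 82.5, so the slope is about 1.979 and the intercept is 11.32 - 1.979 · 5.5 ≈ 0.436.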

24. The intercept in linear regression represents:
a. the strength of the relationship between x and y
b. the expected x value when y is zero
c. the expected y value when x is zero
d. a population parameter

Answer: c

25. The list of all units in a population is called:

a. Random sampling
b. Sampling frame
c. Bias
d. Parameter
e. Statistic

Answer: b (sampling frame)

26. What is the number of parameters needed to represent a Naive Bayes classifier with n Boolean variables and a Boolean label?
Options:
(a) 2n + 1
(b) n + 1
(c) 2n
(d) n

Ans: (a)
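The count follows from listing the probabilities the classifier stores: one prior for the label, plus one conditional probability per variable for each of the two label values. Complementary values such as P(Y = 0) = 1 - P(Y = 1) are determined by these and are not counted separately.

```latex
\underbrace{P(Y = 1)}_{1\ \text{parameter}}
\;+\;
\underbrace{P(X_i = 1 \mid Y = y),\quad i = 1,\dots,n,\; y \in \{0, 1\}}_{2n\ \text{parameters}}
\;\Longrightarrow\; 2n + 1
```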

27. If we train a Naive Bayes classifier using infinite training data that satisfies all of its modeling assumptions (e.g., conditional independence), then in general, what can we say about the training error (error on the training data) and the test error (error on held-out test data)?
(a) It may not achieve either zero training error or zero test error
(b) It will always achieve zero training error and zero test error.
(c) It will always achieve zero training error but may not achieve zero test error.
(d) It may not achieve zero training error but will always achieve zero test error.

Ans: (a)

28. Which of the following is able to approximate any continuous function to an arbitrary accuracy?
(a) A two-layer neural network (input layer, output layer) using a linear activation function.
(b) A two-layer neural network (input layer, output layer) using a non-linear activation function.
(c) A three-layer neural network (input layer, hidden layer, output layer) using a linear activation function.
(d) A three-layer neural network (input layer, hidden layer, output layer) using a non-linear activation function.

Ans: (d)
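Why the non-linearity matters can be seen with hypothetical hand-picked weights in a tiny 1-input, 2-hidden-unit, 1-output network. With a linear (identity) activation the hidden layer collapses: the network computes W2·(W1·x + B1) + B2, which is again a single affine function of x, so adding a hidden layer as in option (c) does not help. With a non-linear activation (ReLU is used here only as an example) the same two hidden units compute |x|, which no affine function can. This is an illustration, not a proof of the universal approximation theorem.

```python
def relu(v: float) -> float:
    return max(0.0, v)

def identity(v: float) -> float:
    return v  # a "linear" activation

# Hypothetical hand-picked weights.
W1 = (1.0, -1.0)   # input -> hidden weights
B1 = (0.0, 0.0)    # hidden biases
W2 = (1.0, 1.0)    # hidden -> output weights
B2 = 0.0           # output bias

def forward(x: float, activation) -> float:
    """One hidden layer: activation applied to each hidden unit, linear output."""
    hidden = [activation(w * x + b) for w, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    # Identity: x - x, an affine (here constant) function. ReLU: relu(x) + relu(-x) = |x|.
    print(x, forward(x, identity), forward(x, relu))
```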

29. How many terms are required for building a Bayes model?
a) 1
b) 2
c) 3
d) 4

Answer: c
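The three terms are those on the right-hand side of Bayes' rule: one conditional probability (the likelihood P(B | A)) and two unconditional probabilities (the prior P(A) and the evidence P(B)), which together give the conditional probability being sought.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```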

30. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned

Answer: a

31. If there is a very strong correlation between two variables, then the correlation coefficient must be:
a. any value larger than 1
b. much smaller than 0, if the correlation is negative
c. much larger than 0, regardless of whether the correlation is negative or positive
d. None of these alternatives is correct.

Answer: b

32. In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is:
a. the correlation model
b. the regression model
c. used to compute the correlation coefficient
d. None of these alternatives is correct.

Answer: b

33. The relationship between the number of beers consumed (x) and blood alcohol content (y) was studied in 16 male college students by using least squares regression. The following regression equation was obtained from this study:
ŷ = -0.0127 + 0.0180x
The above equation implies that:
a. each beer consumed increases blood alcohol by 1.27%
b. on average it takes 1.8 beers to increase blood alcohol content by 1%
c. each beer consumed increases blood alcohol by an average amount of 1.8%
d. each beer consumed increases blood alcohol by exactly 0.018

Answer: c
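Differencing the fitted line at x + 1 and x shows what the slope means: the intercept cancels, and the predicted blood alcohol content rises by 0.0180 for each additional beer. Because a least squares line predicts the mean response, this change holds on average rather than exactly for every student, which rules out option d.

```latex
\hat{y}(x+1) - \hat{y}(x)
  = \bigl[-0.0127 + 0.0180\,(x+1)\bigr] - \bigl[-0.0127 + 0.0180\,x\bigr]
  = 0.0180
```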

34. Regression modeling is a statistical framework for developing a mathematical equation that describes how
a. one explanatory and one or more response variables are related
b. several explanatory and several response variables are related
c. one response and one or more explanatory variables are related
d. All of these are correct.

Answer: c

35. A residual plot:
a. displays residuals of the explanatory variable versus residuals of the response variable.
b. displays residuals of the explanatory variable versus the response variable.
c. displays the explanatory variable versus residuals of the response variable.
d. displays the explanatory variable versus the response variable.
e. displays the explanatory variable on the x axis versus the response variable on the y axis.

Answer: c
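A residual plot puts the explanatory variable on the horizontal axis and the residual e = y - ŷ on the vertical axis. A minimal sketch in Python, reusing the data from question 23, that computes the points such a plot would show; the plotting itself is left out to keep the example dependency-free.

```python
xs = list(range(1, 11))
ys = [1.24, 5.23, 7.24, 7.60, 9.97, 14.31, 13.99, 14.88, 18.04, 20.70]

# Least squares fit of y on x.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Residual e_i = y_i - y_hat_i; a residual plot shows the pairs (x_i, e_i).
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
plot_points = list(zip(xs, residuals))

for x, e in plot_points:
    print(f"x = {x:2d}   residual = {e:+.3f}")
```

With an intercept in the model, least squares residuals sum to zero, so a well-behaved residual plot scatters around the horizontal line e = 0 with no visible pattern.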

36. Is it possible that the assignment of observations to clusters does not change between successive iterations in K-Means?

A. Yes
B. No
C. Can't say
D. None of these

Solution: (A)
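Standard k-means (Lloyd's algorithm) alternates an assignment step and an update step, and a common stopping rule is exactly the situation in the question: the assignments no longer change between successive iterations. A minimal 1-D sketch in Python; the function name kmeans_1d and the data are hypothetical.

```python
def kmeans_1d(points, init_centroids, max_iter=100):
    """Plain (Lloyd's) k-means on 1-D data.

    Stops as soon as the assignment of points to clusters is the same
    in two successive iterations, and returns the iteration count.
    """
    centroids = list(init_centroids)
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for each point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            return centroids, assignment, iteration  # nothing changed: converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, assignment, max_iter

points = [1, 2, 3, 10, 11, 12]
centroids, assignment, iterations = kmeans_1d(points, [1, 2])
print(centroids, assignment, iterations)
```

On this data the assignments change in the first two iterations and are identical in the third, at which point the loop stops.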

37. How can clustering (unsupervised learning) be used to improve the accuracy of a linear regression model (supervised learning)?

1. Creating different models for different cluster groups.
2. Creating an input feature for cluster ids as an ordinal variable.
3. Creating an input feature for cluster centroids as a continuous variable.
4. Creating an input feature for cluster size as a continuous variable.

A. 1 only
B. 1 and 2
C. 1 and 4
D. 3 only
E. 2 and 4
F. All of the above

Solution: (F)

38. Which of the following is/are valid iterative strategies for treating missing values before clustering analysis?

A. Imputation with mean
B. Nearest Neighbor assignment
C. Imputation with Expectation Maximization algorithm
D. All of the above

Solution: (C)

39. Two variables, V1 and V2, are used for clustering. Which of the following are true for K-means clustering with k = 3?

1. If V1 and V2 have a correlation of 1, the cluster centroids will be in a straight line.
2. If V1 and V2 have a correlation of 0, the cluster centroids will be in a straight line.

Options:
A. 1 only
B. 2 only
C. 1 and 2
D. None of the above

Solution: (A)

40. What is true about K-Means clustering?

1. K-means is extremely sensitive to cluster center initializations.
2. Bad initialization can lead to poor convergence speed.
3. Bad initialization can lead to bad overall clustering.

Options:
A. 1 and 3
B. 1 and 2
C. 2 and 3
D. 1, 2 and 3

Solution: (D)
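Statements 1 and 3 can be seen on a tiny example. A minimal 1-D sketch in Python (function names and data are hypothetical) runs the same k-means loop from two different starting centroids on data with three obvious groups. Both runs converge, but the bunched start settles on a much worse partition with a far larger within-cluster sum of squares. Seeding schemes such as k-means++ and several random restarts are the usual remedies.

```python
def kmeans_1d(points, init_centroids, max_iter=100):
    """Plain (Lloyd's) k-means on 1-D data; stops when assignments stop changing."""
    centroids = list(init_centroids)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for each point.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, assignment

def sse(points, centroids, assignment):
    """Within-cluster sum of squared distances (the k-means objective)."""
    return sum((p - centroids[a]) ** 2 for p, a in zip(points, assignment))

points = [1, 2, 3, 10, 11, 12, 20, 21, 22]   # three well-separated groups
for init in ([1, 10, 20], [1, 2, 3]):        # one spread-out start, one bunched start
    centroids, assignment = kmeans_1d(points, init)
    print(init, "->", centroids, "SSE =", sse(points, centroids, assignment))
```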

41. PCR (polymerase chain reaction) products can be analysed in many ways. Which of the following is not possible?
a) Use of restriction enzymes
b) Determining whether a particular oligonucleotide probe hybridizes to a PCR product
c) Electrophoresis
d) Direct sequencing can't be carried out

Answer: d

42. PCR is useful in population genetics because at times it can be used to study the genetics of bacteria that can't be cultured axenically.
a) True
b) False

Answer: a

43. PCR amplification can be used for which type of samples?
a) Old samples only
b) Recent samples only
c) Equally to both recent and old samples
d) Recent samples are preferred but can be applied to old samples also

Answer: c

44. A represents the dominant allele and a represents the recessive allele of a pair. If, in 1000 offspring, 500 are aa and 500 are of some other genotype, which of the following are most probably the genotypes of the parents?
a. Aa and Aa
b. Aa and aa
c. AA and Aa
d. AA and aa
e. aa and aa

Answer: b
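Assuming simple Mendelian segregation of a single gene, a minimal sketch in Python enumerates a Punnett square for each candidate cross and reports the expected fraction of aa offspring; a 500 : 500 split corresponds to a fraction of 0.5. The function name offspring_ratios is hypothetical.

```python
from collections import Counter
from itertools import product

def offspring_ratios(parent1: str, parent2: str) -> dict:
    """Expected genotype fractions for a one-gene cross such as "Aa" x "aa".

    Each parent passes on one of its two alleles with equal probability
    (simple Mendelian segregation, an assumption of this sketch).
    """
    # Sorting puts the uppercase (dominant) allele first, so "aA" becomes "Aa".
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: count / total for genotype, count in counts.items()}

crosses = [("Aa", "Aa"), ("Aa", "aa"), ("AA", "Aa"), ("AA", "aa"), ("aa", "aa")]
for p1, p2 in crosses:
    ratios = offspring_ratios(p1, p2)
    print(f"{p1} x {p2}: fraction aa = {ratios.get('aa', 0.0)}")
```

Only the cross of a heterozygote with a homozygous recessive parent (a test cross) gives half aa offspring; Aa x Aa gives one quarter.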

45. Which of the following is the most likely explanation for a high rate of
crossing-over between two genes?
a. The two genes are far apart on the same chromosome.
b. The two genes are both located near the centromere.
c. The two genes are sex-linked.
d. The two genes code for the same protein.
e. The two genes are on different chromosomes

Answer: a

46. What is the final stage of an agent-based modeling (ABM) methodology?
1. Identifying the agents and determining their behavior
2. Determining agent-related data
3. Validating agent behavior against reality
4. Determining the suitability of ABM

Answer: 3

47. All of the following are suitable problems for genetic algorithms EXCEPT:
1. dynamic process control
2. pattern recognition with complex patterns
3. simulation of biological models
4. simple optimization with few variables

Answer: 4

48. Mutations occur on a random basis within a genome.
a) True
b) False

Answer: b

49. Detection of mismatches and fidelity of replication are maintained by the mutation repair system.
a) True
b) False

Answer: a

50. A nucleosome is made up of __________
a) DNA, histone core protein
b) DNA, histone core protein, linker H1
c) RNA, histone core protein
d) RNA, histone core protein, linker H1

Answer: b
