Nptel Assignment Answers
Week 1 – Answers
Week 2 – Answers
1) Which of the following is not a method for describing a sample space?
a) Roster or listing
b) Tree diagram
c) Offset builder notation
d) Venn Diagram
Ans) (c) Offset builder notation.
Set-builder notation (not "offset builder notation") is the standard method used for
describing a sample space.
2) A club of 4 is to be selected from a group of 12 people. How many
possible clubs can be selected?
a) 395
b) 425
c) 495
d) 525
Ans) (c) 495
12C4 = 12!/(4! × 8!) = (12 × 11 × 10 × 9)/(4 × 3 × 2 × 1) = 11880/24 = 495
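Python's standard library computes binomial coefficients directly with `math.comb` (available from Python 3.8). The sketch below checks it against the factorial formula used above:

```python
from math import comb, factorial

# Number of ways to choose a club of r = 4 from n = 12 people (order does not matter)
n, r = 12, 4
clubs = comb(n, r)

# Same value from the factorial formula n! / (r! (n - r)!)
clubs_from_formula = factorial(n) // (factorial(r) * factorial(n - r))

print(clubs, clubs_from_formula)  # 495 495
```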
Week 3 – Answers
1) State True or False: Statement: The specific value of a random variable is
called estimator.
a) True
b) False
Ans) (b) False. An estimator is a random variable; the specific value it takes for a given sample is called an estimate.
2) If the true proportion of customers who weigh below 100 kg is 0.4,
what is the probability that a sample of size 100 yields a sample proportion
between 0.3 and 0.4?
a) 0.961
b) 0.827
c) 0.706
d) 0.479
Ans) (d) 0.479
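The answer follows from the normal approximation to the sampling distribution of a sample proportion: the standard error is sqrt(p(1 − p)/n) = sqrt(0.4 × 0.6/100) ≈ 0.049, so 0.3 lies about 2.04 standard errors below 0.4, and P(−2.04 < Z < 0) ≈ 0.479. A minimal Python sketch of that calculation, using the standard library's `statistics.NormalDist` and assuming no continuity correction (the source does not mention one):

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 100              # true proportion and sample size
lower, upper = 0.3, 0.4      # interval for the sample proportion

# Standard error of the sample proportion
se = sqrt(p * (1 - p) / n)

# Standardise both limits and take the area between them under N(0, 1)
z = NormalDist()
prob = z.cdf((upper - p) / se) - z.cdf((lower - p) / se)
print(round(prob, 3))  # 0.479
```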
8) A car distributor in city Y averages 2.5 car sales per day.
Find the probability that on a randomly selected day, they will sell 5 cars:
a) 0.0668
b) 0.544
c) 0.082
d) 0.205
Ans) (a) 0.0668
10) In question 8, find the probability that on a randomly selected day, they
will sell at most 2 cars:
a) 0.0668
b) 0.544
c) 0.082
d) 0.205
Ans) (b) 0.544
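Both answers come from the Poisson distribution with mean λ = 2.5, where P(X = k) = e^(−λ) λ^k / k!. So P(X = 5) = e^(−2.5) × 2.5^5 / 120 ≈ 0.0668 and P(X ≤ 2) = e^(−2.5)(1 + 2.5 + 3.125) ≈ 0.544. A short Python check, assuming daily sales follow a Poisson model as the question implies:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 2.5  # average car sales per day

p_exactly_5 = poisson_pmf(5, lam)                         # question 8
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # question 10: P(X <= 2)

print(round(p_exactly_5, 4), round(p_at_most_2, 3))  # 0.0668 0.544
```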
Week 4 – Answers
1) In hypothesis testing, if the null hypothesis is rejected,
a) no conclusions can be drawn from the test
b) The alternative hypothesis is true
c) the data must have been accumulated incorrectly
d) the sample size has been too small
Ans) (b) The alternative hypothesis is true: when the null hypothesis is rejected, we conclude in favour of the alternative hypothesis.
When testing the hypothesis H0: μ ≥ 500 vs. Ha: μ < 500 at a
significance level α, we reject the null hypothesis if the p-value is
less than or equal to α. Therefore, the answer is ≤ α.
Since the sample size is 51, we can use a t-distribution with 50 degrees
of freedom (df = n - 1). We can find the test statistic that corresponds to
a left-tailed area of 0.2 using a t-distribution table or software.
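The table lookup can also be done in code. This is a minimal standard-library sketch (an assumption so that it runs without SciPy; where SciPy is available, `scipy.stats.t.ppf(0.2, 50)` returns the same quantile). It integrates the Student's t density with Simpson's rule and inverts the CDF by bisection; the helper names `t_pdf`, `t_cdf` and `t_quantile` are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF via Simpson's rule between 0 and x, using symmetry: P(T <= 0) = 0.5."""
    a, b = min(x, 0.0), max(x, 0.0)
    if a == b:
        return 0.5
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p: float, df: int) -> float:
    """Invert the (increasing) CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 51
df = n - 1                      # 50 degrees of freedom
t_crit = t_quantile(0.2, df)    # value with a left-tailed area of 0.2
print(round(t_crit, 3))
```

Because the t-distribution has heavier tails than the normal, this value is slightly further from 0 than the corresponding standard normal quantile (about −0.842).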
Week 5 – Answers
2) A term that means the same as the term "variable" in an ANOVA
procedure is
a) factor
b) treatment
c) replication
d) variance within
Ans) (a) factor
Week 6 – Answers
a) 0.887
b) 0.956
c) 0.945
d) 0.932
Ans) (b) 0.956
2) With reference to the data given in question no. 1, test the null
hypothesis: "There is no significant relationship between the variables". We
will:
a) Accept the null hypothesis
b) Reject the null hypothesis
c) Can’t state any conclusion
d) None of the above
Ans) (b) Reject the null hypothesis: there is a significant relationship between the variables
Week 7 – Answers
Ans(a) zero
4) Larger values of r² imply that the observations are more closely grouped about the
a) average value of the independent variables
b) average value of the dependent variable
c) least squares line
d) origin
Ans. C. least squares line
Ans. A. ±0.65
Ans. D. 64%
9) If all the points of a scatter diagram lie on the least squares regression line, then
the coefficient of determination for these variables based on these data is
a) 0
b) 1
c) either 1 or -1, depending upon whether the relationship is positive or negative
d) could be any value between -1 and 1
Ans. B. 1
10) A simple linear regression equation (y = mx + c) will always pass through the point
_____
a) (0, 0)
b) (1, 1)
c) (Ymean, Xmean)
d) (Xmean, Ymean)
Ans. D. (Xmean, Ymean)
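Questions 9 and 10 can both be checked numerically on an ordinary least squares fit. The sketch below uses small made-up data sets (purely illustrative, not from the assignment; `fit_line` and `r_squared` are hypothetical helper names): a perfectly linear data set gives r² = 1, and for any data set the fitted line passes through (x̄, ȳ) because the intercept is defined as c = ȳ − m·x̄.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares fit y = m*x + c; returns (m, c)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    c = y_bar - m * x_bar  # this intercept forces the line through (x_bar, y_bar)
    return m, c

def r_squared(xs, ys, m, c):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = fmean(ys)
    sse = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data: every point lies exactly on y = 2x + 1 (question 9)
xs_line = [1, 2, 3, 4, 5]
ys_line = [3, 5, 7, 9, 11]
m1, c1 = fit_line(xs_line, ys_line)
r2_line = r_squared(xs_line, ys_line, m1, c1)

# Hypothetical noisy data: the fitted line still passes through the means (question 10)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m2, c2 = fit_line(xs, ys)
y_at_x_bar = m2 * fmean(xs) + c2

print(r2_line)                             # 1.0
print(abs(y_at_x_bar - fmean(ys)) < 1e-9)  # True
```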
Week 8 – Answers
1) Which of the following methods do we use to best fit the data in Logistic Regression?
a) Least Square Error
b) Maximum Likelihood
c) Jaccard distance
d) All of these
Ans(b) Maximum Likelihood
2) Which of the following evaluation metrics cannot be applied in case of logistic
regression output to compare with the target?
a) AUC-ROC
b) Accuracy
c) Log Loss
d) Mean-Squared-Error
3) Let f(x) denote the logistic function. The range of f(x) for any real value of x is
a) (0,1)
b) (-1 , 1)
c) All positive integers
d) All negative integers
Ans(a) (0, 1)
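The logistic function is f(x) = 1 / (1 + e^(−x)). Since e^(−x) > 0 for every real x, the denominator is always greater than 1, so f(x) lies strictly between 0 and 1. A numerically stable Python sketch (the name `logistic` is illustrative):

```python
import math

def logistic(x: float) -> float:
    """Standard logistic (sigmoid) function f(x) = 1 / (1 + e^(-x))."""
    # Branch on the sign so math.exp never gets a large positive argument (avoids OverflowError)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

samples = [-5, -2, 0, 2, 5]
values = [logistic(x) for x in samples]
print([round(v, 4) for v in values])  # [0.0067, 0.1192, 0.5, 0.8808, 0.9933]
```

In floating-point arithmetic the computed value rounds to exactly 1.0 for large positive x (roughly 37 and above) and underflows to 0.0 for extremely negative x (below roughly −745), even though the mathematical range is the open interval (0, 1).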
Ans(a) Linear regression assumes that the error values are normally distributed, but logistic
regression does not.
5)For the figure given below, which decision boundary is overfitting the training data?
a) A
b) B
c) C
d) None of these
Ans(c) C
6) Select the correct alternatives from the following based on the figure
1. The training error in the first plot is maximum as compared to second and third plot.
2. The best model for this regression problem is the last (third) plot because it has minimum training error (zero).
3. The second model is more robust than first and third because it will perform best on unseen data.
4. The third model is overfitting more as compared to first and second.
5. All will perform the same because we have not seen the testing data.
a) 1 and 3
b) 1 and 4
c) 1, 3 and 4
d) 5
Ans(c) 1, 3 and 4
7) For categorical data with ‘n’ categories, the number of dummy variables will
be________
a) n
b) n-1
c) n+1
d) 2n
Ans(b) n – 1
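With n categories only n − 1 dummy variables are needed: the omitted baseline category is coded as all zeros, and including all n dummies would make them sum to 1 in every row, which is perfectly collinear with the intercept (the "dummy variable trap"). A small sketch with a hypothetical colour variable and a hypothetical `dummy_encode` helper; pandas users would typically get the same columns from `pandas.get_dummies(..., drop_first=True)`:

```python
def dummy_encode(values, categories):
    """Encode each value with n - 1 dummy columns; categories[0] is the baseline."""
    # The baseline category is represented by all-zero dummies, so it gets no column
    dummy_cols = categories[1:]
    return [[1 if v == c else 0 for c in dummy_cols] for v in values]

colours = ["red", "green", "blue"]  # hypothetical variable with n = 3 categories
rows = dummy_encode(["red", "blue", "green", "red"], colours)
print(rows)  # [[0, 0], [0, 1], [1, 0], [0, 0]]
```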
9) If the number of false negatives is 5 and the number of true positives is 20, the value of
recall will be equal to _______
a) 0.2
b) 0.6
c) 0.8
d) 0.3
Ans(c) 0.8
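Recall is TP / (TP + FN) = 20 / (20 + 5) = 0.8. As a one-function Python check (the helper name `recall` is illustrative):

```python
def recall(tp: int, fn: int) -> float:
    """Recall (sensitivity) = TP / (TP + FN)."""
    return tp / (tp + fn)

print(recall(tp=20, fn=5))  # 0.8
```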
10) If the precision is 0.6 and the recall value is 0.4, the value of f-measure will be
a) 0.48
b) 1
c) 0.24
d) None of these
Ans(a) 0.48
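The F-measure (F1) is the harmonic mean of precision and recall: 2PR / (P + R) = 2 × 0.6 × 0.4 / (0.6 + 0.4) = 0.48. A minimal Python check (the helper name `f_measure` is illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """F1 score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.6, 0.4), 2))  # 0.48
```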
Week 9 Assignment
1. The following confusion matrix was obtained from a classifier (confusion matrix not
reproduced). What is the accuracy of the classifier?
35%
27%
75%
80%
Sol:
5. For the above given confusion matrix, what is the F1-score of the
classifier for the Apple class?
0.5
0.4
0.2
0.6
Sol:
Ans. 0.4
Explanation: F1-score = 2*TP / (2*TP + FP + FN) = 2*7/(2*7 + 17 + 4) = 14/35 = 0.4
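The count form of F1 is algebraically the same as the harmonic mean of precision and recall. A quick Python cross-check using the counts from the explanation above (TP = 7, FP = 17, FN = 4, as assigned there; `f1_from_counts` is an illustrative name):

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN), equivalent to 2PR / (P + R)."""
    return 2 * tp / (2 * tp + fp + fn)

# Counts for the Apple class as used in the explanation
tp, fp, fn = 7, 17, 4
f1 = f1_from_counts(tp, fp, fn)

# Cross-check via precision and recall
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_check = 2 * precision * recall / (precision + recall)
print(round(f1, 2), round(f1_check, 2))  # 0.4 0.4
```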
8. For the given confusion matrix, determine the sensitivity for the
model.
Confusion Matrix:
33%
67%
50%
None of these
Sol:
9. For the given confusion matrix, determine the specificity for the
model
Confusion Matrix:
53%
67%
47%
33%
Sol:
Specificity = TN / (TN + FP) = 45/(45 + 40) = 45/85 ≈ 0.53, i.e. 53%
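The same arithmetic as a small Python helper, using the TN = 45 and FP = 40 counts from the solution (`specificity` is an illustrative name):

```python
def specificity(tn: int, fp: int) -> float:
    """Specificity (true negative rate) = TN / (TN + FP)."""
    return tn / (tn + fp)

spec = specificity(tn=45, fp=40)
print(f"{spec:.0%}")  # 53%
```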
10. According to the ROC Curve and AUC below, choose the
correct alternative for the effectiveness of classifiers A and B.
ROC Curve:
A=B
A <B
A>B
None of these
Sol:
Week 10 Assignment
1. State True or False: Statement: The null hypothesis for the chi-square test
of independence assumes that all the proportions are equal.
A. True
B. False
Ans. a. True
The null hypothesis for the chi-square test of independence assumes that there is no
association between the two variables being analysed. In other words, the proportions of one
variable are the same across all categories of the other variable (the two variables are
independent), and any observed differences are due to chance.
a. 0
b. 52.75
c. 32
d. None of these
Ans. 52.75
The critical value of chi-square for a significance level of 5% and degrees of freedom = 4 is 9.488.
Since the calculated chi-square statistic of 52.75 is greater than the critical value of
9.488, we can reject the null hypothesis of no association between the two variables
and conclude that there is a significant association between the flavour of candy and
the number of pieces of candy.
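The critical value can be reproduced without a table. For an even number of degrees of freedom, df = 2k, the chi-square upper-tail probability has the closed form P(X > x) = e^(−x/2) Σ_{i=0}^{k−1} (x/2)^i / i!. The standard-library sketch below is restricted to even df (an assumption that fits df = 4 here); it solves P(X > x) = 0.05 by bisection and compares the statistic 52.75 against the result. The helper names are hypothetical.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    For df = 2k the survival function is exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for positive even df")
    half = x / 2
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

def chi2_critical_even_df(alpha: float, df: int) -> float:
    """Upper-tail critical value found by bisection (the survival function is decreasing)."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

critical = chi2_critical_even_df(0.05, 4)
statistic = 52.75
print(round(critical, 3))                    # 9.488
print(statistic > critical)                  # True
print(chi2_sf_even_df(statistic, 4) < 0.05)  # True (p-value far below 5%)
```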
df = (6 - 1) * (3 - 1) = 5 * 2 = 10
Week 11 Assignment
A. 2
B. 4
C. 6
D. 8
Ans. 4
Reference: https://www.youtube.com/watch?v=3BzfOLnIY9w
There are several methods for determining the appropriate number of clusters in a
dendrogram. Here are some common ones:
3. For the given dendrogram, what would be the threshold value for
a total number of clusters equal to 4?
A. 10000
B. 15000
C. 20000
D. 5000
Ans. 15000
Reference: https://www.youtube.com/watch?v=3BzfOLnIY9w
The threshold value for a dendrogram to produce a specific number of clusters, such as 4 in
this case, will depend on the structure of the dendrogram and the clustering method used.
The threshold value determines the level at which the dendrogram is "cut" to create the
desired number of clusters.
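To make the idea of a cut concrete, here is a sketch that assumes single linkage (the source does not say which linkage the dendrogram used) and uses hypothetical one-dimensional data. Under single linkage, cutting at height t groups points that are connected by a chain of pairwise distances of at most t, so raising the threshold merges clusters and lowers their number. `single_linkage_cut` is a hypothetical helper name.

```python
from itertools import combinations

def single_linkage_cut(points, threshold):
    """Cluster labels from cutting a single-linkage dendrogram at `threshold`.

    Clusters are the connected components of the graph that joins every pair
    of points whose distance is <= threshold (found with union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) <= threshold:
            parent[find(i)] = find(j)

    return [find(i) for i in range(len(points))]

# Hypothetical 1-D data, not the assignment's data
points = [1, 2, 10, 11, 30, 31, 60]
for t in (1, 15, 30):
    labels = single_linkage_cut(points, t)
    print(t, len(set(labels)))  # prints "1 4", then "15 3", then "30 1"
```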
The Manhattan distance (also known as taxicab distance or L1 norm) between point
X (3, 4) and the origin (0, 0) is:
|3-0| + |4-0| = 3 + 4 = 7
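The same calculation in Python, with the Euclidean (L2) distance added for contrast; `math.dist` is in the standard library from Python 3.8, and `manhattan` is an illustrative helper name:

```python
import math

def manhattan(p, q):
    """L1 (taxicab) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

X, origin = (3, 4), (0, 0)
print(manhattan(X, origin), math.dist(X, origin))  # 7 5.0
```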
a. 2.8
b. 4.6
c. 22.6
d. -3.6
Ans. 2.8
Similarly, the Euclidean distances between the point (2, 0.5) and the centroids of Clusters A, B, and C
are:
Therefore, both points (2, 3) and (2, 0.5) will be assigned to Cluster B in the next iteration, since
the centroid of Cluster B is the closest to both points.
Week 12 Assignment
6. Suppose in a classification problem, you are using a decision tree and you use the Gini index as the
criterion for the algorithm to select the feature for the root node. The feature with the _____ Gini
index will be selected.
(A) maximum
(B) highest
(C) least
(D) None of these
Ans. Least
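The Gini impurity of a node is 1 − Σ p_i², where p_i is the proportion of class i. It is 0 for a pure node, so lower is better and the candidate split with the least (size-weighted) Gini impurity is chosen. A minimal sketch with hypothetical class counts (`feature_a`, `feature_b`, `gini` and `weighted_gini` are made-up names, not from the assignment):

```python
def gini(counts):
    """Gini impurity 1 - sum(p_i^2) for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Size-weighted Gini impurity of the child nodes produced by a split."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# Hypothetical candidate splits of 10 samples (class counts per child node)
splits = {
    "feature_a": [[5, 0], [0, 5]],  # perfectly separates the two classes
    "feature_b": [[3, 2], [2, 3]],  # barely helps
}
scores = {name: weighted_gini(children) for name, children in splits.items()}
best = min(scores, key=scores.get)  # feature with the least Gini impurity
print(best)  # feature_a
```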
7. In Decision Trees, for predicting a class label, the algorithm starts from
which node of the tree?
(A) Root
(B) Leaf
(C) Terminal
(D) Sub-node
Ans. Root
9. State True or False: A leaf node in a decision tree will have an entropy value
equal to 0.
A. True
B. False
Ans. True
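Entropy is −Σ p_i log2 p_i, which is 0 when every sample in the node belongs to one class. The answer assumes pure leaves, as in a fully grown tree; a tree stopped early (for example by a depth limit) can have impure leaves with positive entropy. A short sketch with hypothetical class counts (`entropy` is an illustrative name):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a node's class distribution; empty classes are skipped (0 log 0 = 0)."""
    total = sum(counts)
    # Written as p * log2(1/p) so a pure node gives 0.0 rather than -0.0
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

print(entropy([8, 0]))  # 0.0 (pure leaf)
print(entropy([4, 4]))  # 1.0 (maximally mixed two-class node)
```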