
Statistics Quiz

1. KNN can be used for both classification and regression problems. 2. In KNN, both having too large or too small a value of k can lead to issues - a large k may include points from other classes, while a small k makes the algorithm sensitive to noise. 3. K is the number of nearest neighbors considered to determine the class of a new data point in KNN classification.

Uploaded by Mohan Koppula


1 Consider the regression ANOVA table below:

                Degrees of freedom   Sum of squares   Mean of squares
   Regression           1            1258.55 (SSR)    1258.55 (MSR)
   Residual             6             773.451 (SSE)    128.909 (MSE)
   Total                7            2032    (SST)

Based on the above Table what is the value of F Statistic?


1.62

0.61

9.76

The F value is the ratio of the mean regression sum of squares (MSR) divided by the mean error sum of squares (MSE). So the F statistic is 1258.548639 / 128.9085602 = 9.76 (rounded to 2 decimal places).
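The ratio can be checked in a short Python sketch using the table's values (variable names are illustrative):

```python
# F statistic from the ANOVA table: F = MSR / MSE
ssr, df_reg = 1258.548639, 1      # regression sum of squares and its df
sse, df_res = 773.451, 6          # residual sum of squares and its df

msr = ssr / df_reg                # mean square regression
mse = sse / df_res                # mean square error
f_stat = msr / mse
print(round(f_stat, 2))           # → 9.76
```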

2 Correlation Coefficient takes values between +1 and -1

Correlation coefficient formulas are used to find how strong a relationship is between data. The formulas return a value between -1 and 1.

3 As per assumptions of Linear Regression: Variance of the error terms should be

Heteroscedastic

Homoscedastic

The error terms should be Homoscedastic, i.e. the error terms should have constant variance across different values of the independent variable.
4 Consider the regression ANOVA table below:

               Df   Sum of squares   Mean of squares
   Regression   1   1258.55 (SSR)    1258.55 (MSR)
   Residual     6    773.451 (SSE)    128.909 (MSE)
   Total        7   2032    (SST)

No. of Independent Variables (I.V) = 1. Based on the above table, what would be the value of R Square (R²)?

0.74

0.62

0.5

0.82

The R Squared is the ratio of the Sum of Squares Regression (SSR) to the Total Sum of Squares (SST) = 1258.548639 / 2032 = 0.62.
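Checking the ratio with the table's values in a quick sketch:

```python
# R-squared for simple linear regression: SSR / SST
ssr, sst = 1258.548639, 2032.0
r_squared = ssr / sst
print(round(r_squared, 2))        # → 0.62
```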

5 A Type II error is a

False Positive

False Negative

6 The VIF is a relative measure of the increase in the variance because of collinearity.

VIF is the variance inflation factor, which quantifies the severity of multicollinearity in an ordinary least squares regression analysis.
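As a sketch, the usual formula VIF_j = 1 / (1 − R_j²) — where R_j² comes from regressing predictor j on the remaining predictors — can be expressed directly (the R² inputs below are illustrative values, not from the quiz):

```python
# Variance inflation factor for predictor j: VIF_j = 1 / (1 - R_j^2)
def vif(r_squared_j):
    return 1.0 / (1.0 - r_squared_j)

print(vif(0.0))               # → 1.0  (no collinearity at all)
print(round(vif(0.9), 2))     # → 10.0 (severe collinearity)
```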

7 Which of the following statements is true about outliers in Linear regression?

1. Linear regression is sensitive to outliers

2. Linear regression is not sensitive to outliers

Can't Say
None of these

The slope of the regression line will change due to outliers in most of the cases. So Linear Regression is sensitive to outliers.

8 As a Quick Rule of Thumb: AIC or p-value; which one to choose for model selection?

Sum of Squared Errors

AIC

The Akaike information criterion (AIC) is an estimator of the relative quality of statistical models for a given set of data.

9 In the mathematical Equation of Linear Regression Y = β1 + β2X + ϵ,


what does β1 & β2 refer to ?

(X-intercept, Slope)

(Slope, X-Intercept)

(Y-Intercept, Slope)

(slope, Y-Intercept)

Y-intercept is β1 and slope is β2.

10 In a simple linear regression model (one independent variable), if we change the input variable by 1 unit, how much will the output variable change?

by 1

no change

by intercept

by its slope

For linear regression we know that Y = a + bx + error. If we neglect the error term, the equation is Y = a + bx. Therefore, if x increases by 1, then Y = a + b(x+1), which implies Y = a + bx + b. So Y increases by its slope.
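A quick numerical check of this identity, with an illustrative intercept and slope (a and b are arbitrary example values):

```python
# For Y = a + b*x, increasing x by 1 changes Y by exactly b (the slope)
a, b = 2.0, 3.5                   # illustrative intercept and slope
y = lambda x: a + b * x
print(y(5.0) - y(4.0))            # → 3.5  (equals the slope b)
```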
1 Calculate the odds ratio (rounded off to 2 decimal points) for the given probability, P =0.45
0.69
0.73
0.77
0.81
explain: q = 1 − p = 0.55; odds ratio = p/q = 0.45/0.55 ≈ 0.818
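The computation can be sketched in Python using the document's own figures:

```python
# Odds for a probability p: odds = p / (1 - p)
p = 0.45
odds = p / (1 - p)
print(round(odds, 3))             # → 0.818
```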

2 Observe the below table and calculate Specificity.

                        Actual Positive   Actual Negative   Total
   Predicted Positive        2300               180          2480
   Predicted Negative         200               820          1020
   Total                     2500              1000

0.78
0.82
0.92
0.96

explain: Specificity = TN / (TN + FP) = 820 / (820 + 180) = 0.82

3 Observe the below table and calculate Accuracy.

                        Actual Positive   Actual Negative   Total
   Predicted Positive        2300               180          2480
   Predicted Negative         200               820          1020
   Total                     2500              1000

0.59
0.69
0.79
0.89

explain: Accuracy = (TP + TN) / Total = (2300 + 820) / 3500 ≈ 0.89

4 Observe the below table and calculate Sensitivity.

                        Actual Positive   Actual Negative   Total
   Predicted Positive        2300               180          2480
   Predicted Negative         200               820          1020
   Total                     2500              1000

0.78
0.82
0.92
0.96

explain: Sensitivity = TP / (TP + FN) = 2300 / (2300 + 200) = 0.92
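The three metrics in questions 2–4 can all be computed directly from the table's four cells (TP = 2300, FP = 180, FN = 200, TN = 820); a quick sketch:

```python
# Metrics from the confusion matrix used in questions 2-4
tp, fp = 2300, 180                # predicted-positive row
fn, tn = 200, 820                 # predicted-negative row

sensitivity = tp / (tp + fn)      # true positive rate
specificity = tn / (tn + fp)      # true negative rate
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(round(sensitivity, 2))      # → 0.92
print(round(specificity, 2))      # → 0.82
print(round(accuracy, 2))         # → 0.89
```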

5 Which of the below is true for a good model?

There should be a very high correlation between two independent variables.

There should be a very low correlation between two independent variables.

There should be a very low correlation between a dependent and an independent variable.

None of the mentioned

6 Which of the following options is true?

Linear Regression error values have to be normally distributed, but in case of Logistic Regression it is not the case

Logistic Regression error values have to be normally distributed, but in case of Linear Regression it is not the case

Both Linear Regression and Logistic Regression error values have to be normally distributed

Neither Linear Regression nor Logistic Regression error values have to be normally distributed

7 The logit function is ________________

The log of the event = 1

The log of the odds ratio of an event

The log of the probability of success


All of the above

8 Logistic regression assumes a linear relationship between the log odds ratio of an event and the independent predictors.
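That linear relationship is on the logit scale. A minimal sketch of the logit (log-odds) transform, with illustrative probability values:

```python
import math

# The logit is the log of the odds; logistic regression models it as a
# linear function of the predictors: logit(p) = b0 + b1*x1 + ... + bk*xk
def logit(p):
    return math.log(p / (1 - p))

print(round(logit(0.5), 2))       # → 0.0  (even odds)
print(round(logit(0.8), 2))       # → 1.39
```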

9 Which of the following methods do we use to best fit the data in Logistic Regression?

Least Square Error

Maximum Likelihood

Jaccard distance

Both A and B

10 Which of the following is not an assumption of Logistic Regression?

There should exist No Multicollinearity between Independent variables

There should exist Normality of data distribution

The dependent variable should be a continuous variable

There can be more than 2 categories in the Dependent Variable for prediction
1 KNN can be used for both Classification and Regression problems. 

KNN can be used for both classification and regression problems.
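As a sketch of how k-NN supports both tasks, here is a minimal from-scratch implementation (illustrative only, not an optimized library version): a majority vote over the k nearest neighbours gives classification, while averaging their targets gives regression.

```python
from collections import Counter

def knn_predict(train, x, k=3, classify=True):
    """train: list of (point, label) pairs; x: query point of the same dimension."""
    # take the k nearest training points by squared Euclidean distance
    nearest = sorted(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[:k]
    labels = [label for _, label in nearest]
    if classify:
        return Counter(labels).most_common(1)[0][0]  # majority vote
    return sum(labels) / k                           # mean of neighbour targets

points = [((0, 0), "A"), ((0, 1), "A"), ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_predict(points, (1, 0), k=3))              # → A
```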

2 You are given the following 2 statements; find which of these options is/are true in case of k-NN?

1. In case of very large value of k, we may include points from other classes into the neighborhood.
2. In case of too small value of k the algorithm is very sensitive to noise

Only 1

Only 2

Both 1 & 2

None of the above

3 What is K in KNN ?
1 - Number of nearest neighbors to the new data point considered to decide the class of the new data point
2 - Number of nearest neighbors to the new data point, the majority class out of which is assigned to the new data point, based on simple 'voting'

Only 1

Only 2

Both 1 & 2

None of the above

4 Which one of the following is not true about KNN?

KNN algorithm has a high prediction cost for large data sets

KNN is lazy learning algorithm and therefore requires no training prior to making real time predictions

The KNN algorithm does work very well with categorical features.
This algorithm segregates unlabeled data points into well-defined groups
explained: The KNN algorithm doesn't work well with high dimensional data because, with a large number of dimensions, it becomes difficult for the algorithm to calculate the distance in each dimension.

5 Discriminant Analysis uses Bayes Theorem.

6 Naive Bayes can only be used for Binary classification

explained: Naive Bayes can be used for Binary and Multiclass classification. It provides different types of Naive Bayes Algorithms like GaussianNB, MultinomialNB, BernoulliNB.

7 Which of the following is a type of Naive Bayes Algorithms?

Gaussian

Bernoulli

Multinomial

All of the above

explained: Naive Bayes can be used for Binary and Multiclass classification. It provides different types of Naive Bayes Algorithms like GaussianNB, MultinomialNB, BernoulliNB.

8 Naive Bayes assumes all the features to be related so it can learn the relationship between features

explained: It assumes all the features to be unrelated so it cannot learn the relationship between features
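To illustrate the factorisation implied by that independence assumption — P(x1, x2 | c) ≈ P(x1 | c) · P(x2 | c) — here is a tiny numeric sketch (the per-feature likelihoods are made-up example values, not taken from the quiz):

```python
# Under the "naive" assumption, the class-conditional joint likelihood
# factorises into a product of per-feature likelihoods.
p_word_free = 0.30     # P(contains "free" | spam) -- assumed illustrative value
p_word_win = 0.20      # P(contains "win"  | spam) -- assumed illustrative value

joint = p_word_free * p_word_win   # P(free, win | spam) under independence
print(round(joint, 2))             # → 0.06
```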

9 Which of the following statement is not True about Naive Bayes

Can successfully train on small data set

Good for multiclass classification


Slow calculation since it is naive

Continuous feature data is assumed to be normally distributed


explained: In Naive Bayes the math is simpler, so the classifier runs quicker.

10 Naive Bayes is called Naive due to the assumption that the features in the dataset are mutually independent.

It is called Naïve due to the assumption that the features in the dataset are mutually independent.
