
Data Mining: Assignment Week 8: Regression

(Each question carries 1 mark)

1. Regression is used in:

A. predictive data mining

B. exploratory data mining

C. descriptive data mining

D. explanative data mining

Ans: A

Explanation: Regression is used for prediction.

2. In the regression equation Y = 21 - 3X, the slope is

A. 21
B. -21
C. 3
D. -3

Ans: D
Explanation: The slope-intercept form of a line is y = mx + c, where m is the slope. Comparing with Y = 21 - 3X gives m = -3.

3. The output of a regression algorithm is usually a:

A. real variable

B. integer variable

C. character variable

D. string variable

Ans: A

Explanation: Regression predicts a real-valued (continuous) output.


4. Regression finds the model parameters that produce the least squared error between:

A. input value and output value

B. input value and target value

C. output value and target value

D. model parameters and output value

Ans: C

Explanation: Regression finds the model parameters that minimise the error between the output value (the model's prediction) and the target value.
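As a minimal Python sketch of this idea (the data below is hypothetical, not from the assignment), ordinary least squares chooses the parameters of y = a0 + a1x that minimise the squared error between the model's output values and the target values:

import numpy as np

# Hypothetical input values and target values, assumed for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
t = np.array([1.1, 1.9, 3.2, 3.9])

# Design matrix: a column of ones for the intercept a0, then x for a1.
X = np.column_stack([np.ones_like(x), x])

# Least squares finds the parameters minimising sum((output - target)^2).
(a0, a1), *_ = np.linalg.lstsq(X, t, rcond=None)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")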

5. The linear regression model y = a0 + a1x is applied to the data in the table
shown below. What is the value of the sum squared error function S(a0, a1),
when a0 = 1, a1 = 2?

x y
1 1
2 1
4 6
3 2

A. 0.0
B. 27
C. 13.5
D. 54

Ans: D
Explanation: y’ is the predicted output.

y’ = 1+2x

x y y’
1 1 3
2 1 5
4 6 9
3 2 7

sum of squared errors = (1-3)² + (1-5)² + (6-9)² + (2-7)² = 4 + 16 + 9 + 25 = 54
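The same computation as a short Python check (an assumed helper, not part of the original solution):

import numpy as np

x = np.array([1, 2, 4, 3])
y = np.array([1, 1, 6, 2])

y_pred = 1 + 2 * x                # y' = a0 + a1*x with a0 = 1, a1 = 2
sse = np.sum((y - y_pred) ** 2)   # sum of squared errors
print(sse)                        # 54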


6. Consider x1, x2 to be the independent variables and y the dependent
variable, which of the following represents a linear regression model?

A. y = a0 + a1/x1 + a2/x2

B. y = a0 + a1x1 + a2x2

C. y = a0 + a1x1 + a2x2²

D. y = a0 + a1x1² + a2x2

Ans: B

Explanation: In option B, y is a linear function of the independent variables x1 and x2. The other options apply nonlinear transformations (reciprocals or squares) to the independent variables.

7. Find all the eigenvalues of the following matrix A.

[Matrix A: a triangular matrix with main-diagonal entries 1, 2 and 3]

A. 1,3
B. 2,3
C. 1,2,3
D. Eigenvalues cannot be found.
Ans: C
Explanation: If A is an n × n triangular matrix (upper triangular, lower
triangular, or diagonal), then the eigenvalues of A are entries of the main
diagonal of A. Therefore, eigenvalues are 1,2,3.
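A quick numerical check in Python; since the original matrix is not reproduced, the off-diagonal entries below are placeholders, which for a triangular matrix do not affect the eigenvalues:

import numpy as np

# Lower-triangular matrix with main diagonal 1, 2, 3; the off-diagonal
# entries are made up and do not change the eigenvalues.
A = np.array([[1.0, 0.0, 0.0],
              [4.0, 2.0, 0.0],
              [5.0, 6.0, 3.0]])

print(np.linalg.eigvals(A))  # 1, 2 and 3 (possibly in a different order)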

8. In the figures below, the training instances for regression problems are described by dots. The blue dotted lines indicate the actual functions and the red lines indicate the regression models. Which of the following statements is correct?
A. Figure 1 represents overfitting and Figure 2 represents underfitting

B. Figure 1 represents underfitting and Figure 2 represents overfitting

C. Both Figure 1 and Figure 2 represent underfitting

D. Both Figure 1 and Figure 2 represent overfitting

Ans: B

Explanation: In Figure 1 the model is too simple to capture the actual function (underfitting); in Figure 2 the model follows the training instances so closely that it fits the noise as well (overfitting).
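A minimal Python sketch of the distinction, using assumed synthetic data rather than the figures from the question: a degree-1 polynomial is too simple for a sinusoidal function (underfitting), while a high-degree polynomial also fits the noise (overfitting).

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)  # noisy samples

under = np.polyfit(x, y, deg=1)  # too simple: misses the actual function
over = np.polyfit(x, y, deg=9)   # too flexible: fits the noise as well

for name, coeffs in (("degree 1", under), ("degree 9", over)):
    residuals = y - np.polyval(coeffs, x)
    print(name, "training SSE:", np.sum(residuals ** 2))

# The overfitted model has a much lower training error but generalises
# worse to new points drawn from the same function.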

9. In principal component analysis, the projected lower-dimensional space corresponds to:

A. subset of the original co-ordinate axis

B. eigenvectors of the data covariance matrix

C. eigenvectors of the data distance matrix

D. orthogonal vectors to the original co-ordinate axis

Ans: B

Explanation: We must first subtract the mean of each variable from the dataset to centre the data around the origin. Then we compute the covariance matrix of the data and calculate the eigenvalues and corresponding eigenvectors of this covariance matrix. Each of the orthogonal eigenvectors is then normalised to become a unit vector. Once this is done, each of the mutually orthogonal unit eigenvectors can be interpreted as an axis of the ellipsoid fitted to the data. This choice of basis transforms the covariance matrix into a diagonalised form, with the diagonal elements representing the variance along each axis.
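The recipe above as a minimal Python sketch, on assumed random data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # 100 samples, 3 variables (toy data)

Xc = X - X.mean(axis=0)                 # centre the data around the origin
cov = np.cov(Xc, rowvar=False)          # data covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: the covariance matrix is symmetric

# Project onto the eigenvectors with the two largest eigenvalues.
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]      # columns are orthonormal eigenvectors
X_proj = Xc @ components                # data in the lower-dimensional space
print(X_proj.shape)                     # (100, 2)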
10. A time series prediction problem is often best solved using:

A. Multivariate regression

B. Autoregression

C. Logistic regression

D. Sinusoidal regression

Ans: B

Explanation: Autoregression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step.
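A minimal Python sketch of this idea on an assumed synthetic series: fit y[t] = a0 + a1*y[t-1] by least squares, then use the regression equation for a one-step-ahead prediction.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic AR(1) series: each value depends on the previous one plus noise.
y = np.zeros(200)
for t in range(1, y.size):
    y[t] = 0.8 * y[t - 1] + rng.normal(scale=0.1)

# Regression equation y[t] = a0 + a1*y[t-1], fitted by least squares.
X = np.column_stack([np.ones(y.size - 1), y[:-1]])
(a0, a1), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")  # a1 should be close to 0.8
next_value = a0 + a1 * y[-1]            # predicted value at the next time step
print(f"next value: {next_value:.3f}")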
