Data Mining

The document consists of a series of true/false questions related to concepts in supervised learning, regression analysis, data visualization, and data mining. It covers topics such as the importance of target variables, the curse of dimensionality, principal component analysis, and the characteristics of big data. Additionally, it addresses misconceptions in statistical modeling and provides insights into the properties of various analytical techniques.

Uploaded by Rani Raut
Copyright: © All Rights Reserved

Question 1 - Supervised learning must have a target/output variable - True
Question 2 - The curse of dimensionality is the affliction caused by adding variables to multivariate data models - True
Question 3 - A matrix plot is an example of a multidimensional plot - True
Question 4 - In logistic regression models, errors in functional form will create bias - True
Question 5 - Principal component analysis is a dimensionality reduction technique that retains much of the variation present in the data set - True
Question 6 - Data visualization is used for prediction and not exploration - False
Question 7 - A time series plot can be visually inspected to determine seasonality in the data - True
Question 8 - A negative covariance between variables X and Y means that they move in opposite directions - True
Question 9 - In a linear regression model of the form Y = B0 + B1X, the estimate of B1 is biased if it is different from the true parameter B1 - False
Question 10 - Data mining is the confluence of the fields of statistics and machine learning - True
Question 11 - A histogram is an example of a basic plot - False
Question 12 - In predictive modeling, the p-value of a coefficient is the most important measure - True
Question 13 - The odds and log odds in the context of logistic regression mean the same thing - False
Question 14 - A linear probability model is nothing but a linear regression with the fitted value restricted between 0 and 1 - False
Question 15 - Linear regression cannot be used when the outcome variable is categorical - True

16. If the data is homoscedastic, the parameters in the linear regression cannot be trusted?
A. False
17. In any given dataset, the covariance of any two variables can never be zero?
A. False
18. An odds of -0.5 means that the probabilities of winning and losing are equal?
A. False
19. In a standard linear regression equation, the coefficient b1 represents the intercept of the regression line on the axis of the outcome variable Y?
A. False
20. R-squared is a goodness-of-fit measure that adjusts the results based on the number of predictors?
A. False
21. Let X and Y be two variables in a dataset. You have calculated two corresponding principal components, Z1 and Z2, respectively. Which of the following is ALWAYS true?
A. Cov(Z1, Z2) = 0
22. In a logistic regression equation ln(p/(1-p)) = B0 + B1X, consider B1 to be equal to 1. Then a one-unit change in X results in:
A. Increasing the odds by a factor of 2.71
23. The error term in a linear regression equation?
A. Contains all factors affecting the outcome variable Y……
24. Which of the following is a characteristic of Big Data?
A. All of the above
25. The first principal component in the PCA algorithm has:
A. Highest variance
26. Which of the following is not a step in explanatory modelling?
A. Focus in on Y-hat
27. Which of the following is not a step in the principal component analysis methodology?
A. Square the covariance matrix
28. Which of the following is not a technique to deal with missing values?
A. Use linear regression to predict values
29. In the OLS solution in class, we rely on the following assumption:
A. Two of the other choices
30. Which of the following is not an iterative search algorithm for determining predictors?
A. Maximum likelihood estimation
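
The two principal component answers above (components are uncorrelated, and the first component has the highest variance) can be checked numerically. The following is a minimal sketch for the two-variable case only, using the Python standard library; the simulated data, the seed, and the helper names cov and pca_2d are illustrative assumptions, not part of the original quiz.

```python
import math
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def pca_2d(x, y):
    """Return principal component scores (z1, z2) for two variables.

    For a 2x2 covariance matrix the eigenvectors are a rotation, so the
    angle of the first principal axis can be computed in closed form.
    """
    sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the first axis
    c, s = math.cos(theta), math.sin(theta)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    # Project the centered data onto the two orthogonal axes.
    z1 = [c * (p - mx) + s * (q - my) for p, q in zip(x, y)]
    z2 = [-s * (p - mx) + c * (q - my) for p, q in zip(x, y)]
    return z1, z2

# Hypothetical correlated data, generated only for illustration.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.8 * xi + random.gauss(0, 0.5) for xi in x]

z1, z2 = pca_2d(x, y)
print("Cov(Z1, Z2):", cov(z1, z2))   # zero up to rounding error
print("Var(Z1):", cov(z1, z1))       # the larger of the two variances
print("Var(Z2):", cov(z2, z2))
```

The rotation leaves the total variance unchanged, so Var(Z1) + Var(Z2) equals Var(X) + Var(Y); the components only redistribute it.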

31. In the algebraic solution to OLS, the following answers are true?
A. Two of the other choices
32. In a linear probability model, the predicted probability:
A. Can be below 0 and above 1
33. The mean of a data sample is a measure of?
A. Central tendency
34. Heat maps are used to visualize ______________ and _____________?
A. Correlation and missing data
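
The linear probability model answer above can be illustrated with a small sketch: fitting ordinary least squares to a 0/1 outcome gives a straight line, and nothing keeps its fitted values inside [0, 1]. The toy data and the helper name ols_fit below are assumptions made for illustration.

```python
def ols_fit(x, y):
    """Simple one-predictor OLS: returns (intercept b0, slope b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Hypothetical binary outcome that becomes more likely as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = ols_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]

# The fitted "probabilities" are not constrained to the unit interval.
print("smallest fitted value:", min(fitted))
print("largest fitted value:", max(fitted))
```

On this toy data the fitted line already leaves the unit interval at both ends of the observed range, which is one reason logistic regression is usually preferred for binary outcomes.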

35. Consider a regression which gives a coefficient with a p-value of 0.555. Which of the following statements is true? (Use a statistical significance level of < 0.001)
Ans: There is not enough evidence to reject the null hypothesis

36. Which of the following is not a property of an eigenvector?
Ans: MxN there are M eigenvectors

37. An odds of 0.5 means the probability of winning is
Ans: Lower than losing
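
The odds answer above follows from the definition odds = p / (1 - p), which inverts to p = odds / (1 + odds). A minimal sketch of that conversion is given below; the function name odds_to_probability is an illustrative assumption.

```python
def odds_to_probability(odds):
    """Convert odds in favour of an event to its probability."""
    if odds < 0:
        # Odds are a ratio of two probabilities, so they cannot be negative.
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)

p_win = odds_to_probability(0.5)
p_lose = 1.0 - p_win
print("P(win):", p_win)
print("P(lose):", p_lose)
```

Odds of 0.5 correspond to a winning probability of one third, so winning is less likely than losing; the two are equally likely only at odds of 1.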

38. Data visualization supports?
Ans: All of the above

39. Which of the following is a reason NOT to select predictors from the full data set?
Ans: Parsimony is important

40. Jittering in scatter plots is the
Ans: Adding of noise to unstack markers that hide data points underneath

41. Which of the following is not part of the 4-step data mining cycle?
Ans: None of the above

42. Which of the following is not a property of predictive modeling?
Ans: Performance is measured by how well the model approximates the training data set

43. Which of the following is not a step in the ordinary least squares algorithm?
Ans: Find the eigenvectors

44. Which of the following is not a data reduction technique?
Ans: Classification trees

45. Which of the following is not a data mining step?
Ans: Overfit data

46. The diagonal of a 2-dimensional covariance matrix represents the following:
Ans: The variance of each variable

47. Cov(X, Y) is always equal to
Ans: Cov(Y, X)
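
The two covariance answers above can be verified with a short sketch: build the 2x2 sample covariance matrix, then compare its diagonal with the variances and its two off-diagonal entries with each other. The data values and helper names are illustrative assumptions.

```python
import math
import statistics

def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def covariance_matrix(x, y):
    """2x2 covariance matrix [[Cov(X,X), Cov(X,Y)], [Cov(Y,X), Cov(Y,Y)]]."""
    return [
        [covariance(x, x), covariance(x, y)],
        [covariance(y, x), covariance(y, y)],
    ]

# Hypothetical sample data.
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]

m = covariance_matrix(x, y)

# Diagonal entries are the variances of X and Y.
diag_matches = (
    math.isclose(m[0][0], statistics.variance(x))
    and math.isclose(m[1][1], statistics.variance(y))
)
# Off-diagonal entries are equal: Cov(X, Y) == Cov(Y, X).
is_symmetric = math.isclose(m[0][1], m[1][0])

print("diagonal equals variances:", diag_matches)
print("matrix is symmetric:", is_symmetric)
```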

48. Multidimensional visualization is the
Ans: Adding of colour, size and multiple panels to convey richer information

49. In a linear regression of the form Y = B0 + B1*X + U, homoskedasticity means the following:
Ans: E(xu) = 0 but not Cov(x, u) = 0

50. In a logistic regression equation of the form ln[P/(1-P)] = B0 + B1X,
Ans: p are unrelated
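
For a logistic regression of the form ln[P/(1-P)] = B0 + B1X, the odds equal exp(B0 + B1X), so a one-unit increase in X multiplies the odds by exp(B1). The sketch below checks this numerically; the coefficient values and the function names odds and probability are hypothetical, chosen only for illustration.

```python
import math

def odds(x, b0, b1):
    """Odds implied by the log-odds equation ln[P/(1-P)] = b0 + b1*x."""
    return math.exp(b0 + b1 * x)

def probability(x, b0, b1):
    """Probability P recovered from the odds."""
    o = odds(x, b0, b1)
    return o / (1.0 + o)

# Hypothetical coefficients; b1 = 1 makes the odds factor equal to e.
b0, b1 = -1.0, 1.0

# Ratio of odds for a one-unit increase in X (from 2 to 3).
ratio = odds(3.0, b0, b1) / odds(2.0, b0, b1)
print("odds ratio for a one-unit change in X:", ratio)
print("P at X = 2:", probability(2.0, b0, b1))
print("P at X = 3:", probability(3.0, b0, b1))
```

With B1 = 1 the ratio is e (about 2.718) regardless of the starting value of X, whereas the change in the probability itself depends on where X starts.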
