DMW MCQ
1) All questions are Multiple Choice Questions having single correct option.
7) Use only black/blue ball point pen to darken the appropriate circle.
A : sub tree
B : class label
C : testing node
D : condition
A : false rate
B : recall
C : negative rate
D : recognition rate
Q.no 3. The negative tuples that were correctly labeled by the classifier are called
A : False positives (FP)
B : True positives (TP)
C : True negatives (TN)
D : False negatives (FN)
A : recovery
B : data cleaning
C : data cleansing
D : data pruning
A : Pruning
B : Partitioning
C : Candidate generation
D : Itemset generation
B : many labels
D : no label
A : Support
B : Confidence
C : Support count
D : Concept Hierarchies under support-confidence framework
Q.no 8. What is the method to interpret the results after rule generation?
A : Absolute Mean
B : Lift ratio
C : Gini Index
D : Apriori
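A small worked illustration of the lift ratio (Python sketch; the transaction counts below are hypothetical and not taken from any question):

```python
# Hypothetical counts: 100 transactions, 40 contain tea, 30 contain sugar,
# and 20 contain both tea and sugar.
n, n_tea, n_sugar, n_both = 100, 40, 30, 20

support_rule = n_both / n          # support(tea -> sugar) = 0.20
confidence = n_both / n_tea        # confidence(tea -> sugar) = 0.50
lift = confidence / (n_sugar / n)  # 0.50 / 0.30 = 1.67 (> 1: positive correlation)
print(support_rule, confidence, round(lift, 2))
```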
A : supervised classification
B : semi-supervised classification
C : unsupervised classification
D : regression
Q.no 10. Which of the following is direct application of frequent itemset mining?
C : Outlier Detection
D : Intrusion Detection
B : An approach to a problem that is not guaranteed to work but performs well in most
cases
D : None of these
Q.no 12. The schema is a collection of stars. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : Data warehouse
C : ETL
D : Business Intelligence
Q.no 14. Which of the following are methods for supervised classification?
A : Decision tree
B : K-Means
C : Hierarchical
D : Apriori
Q.no 15. These are the intermediate servers that stand between a relational
back-end server and client front-end tools.
A : ROLAP
B : MOLAP
C : HOLAP
D : HaoLap
A : Nominal
B : Binary
C : Ordinal
D : numeric
A : Dimensions
B : Facts
D : Dimensions or Facts
A : Pruning
B : Partitioning
C : Candidate generation
D : Itemset generation
Q.no 20. What is the range of the cosine similarity of the two documents?
A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
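A minimal Python sketch of cosine similarity for two term-frequency vectors (the vectors below are made-up examples); because term frequencies are non-negative, the result always lies between zero and one:

```python
import math

def cosine_similarity(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical term-frequency vectors for two documents
doc1 = [3, 2, 0, 5]
doc2 = [1, 0, 0, 2]
print(round(cosine_similarity(doc1, doc2), 3))  # a value in [0, 1]
```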
A : learner waits until the last minute before constructing model to classify
B : a given training data constructs a model first and then uses it to classify
A : testing the machine on all possible ways by substituting the original sample into
training set
B : testing the machine on all possible ways by dividing the original sample into
training and validation sets.
D : Programs are not dependent on the physical attributes as well as logical attributes
of the data
A : Clustering
B : Regression
C : Summarization
D : Association rules
A : 1 Dimensional
B : 2 Dimensional
C : 3 Dimensional
D : n-Dimensional
B : data matrix
D : genomic data
A : Only one
B : Two
C : Three
D : Four
Q.no 32. What is the approach of basic algorithm for decision tree induction?
A : Greedy
B : Top Down
C : Procedural
D : Step by Step
Q.no 34. Which of the following probabilities are used in Bayes' theorem?
A : P(Ci|X)
B : P(Ci)
C : P(X|Ci)
D : P(X)
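As a reminder of how these four quantities combine, here is a short Python sketch of Bayes' theorem, P(Ci|X) = P(X|Ci)·P(Ci) / P(X); the prior and likelihood values are hypothetical:

```python
# Hypothetical priors and likelihoods for two classes C1 and C2
p_c = {"C1": 0.6, "C2": 0.4}          # P(Ci)
p_x_given_c = {"C1": 0.2, "C2": 0.5}  # P(X|Ci)

# P(X) by the law of total probability
p_x = sum(p_x_given_c[c] * p_c[c] for c in p_c)

# Posterior P(Ci|X) for each class
posterior = {c: p_x_given_c[c] * p_c[c] / p_x for c in p_c}
print(posterior)  # {'C1': 0.375, 'C2': 0.625}
```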
Q.no 35. In which step of Knowledge Discovery, multiple data sources are
combined?
A : Data Cleaning
B : Data Integration
C : Data Selection
D : Data Transformation
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 37. Handwritten digit recognition, i.e., classifying an image of a handwritten
number into a digit from 0 to 9, is an example of
A : Multiclassification
B : Multi-label classification
C : Imbalanced classification
D : Binary Classification
Q.no 38. What type of data do you need for a chi-square test?
A : Categorical
B : Ordinal
C : Interval
D : Scales
Q.no 39. For a classification problem with a highly imbalanced class, the majority
class is observed 99% of the time in the training data.
Your model has 99% accuracy on the test data. Which of
the following is not true in such a case?
C : Precision and recall metrics aren’t good for imbalanced class problems.
D : Precision and recall metrics are good for imbalanced class problems.
Q.no 40. Which of the following properties typically does not hold for similarity
measures between two objects?
A : Symmetry
B : Definiteness
C : Triangle inequality
D : Transitive
A : CART
B : C4.5
C : ID3
D : ALL
Q.no 42. In one of the frequent itemset examples, it is observed that if tea and milk
are bought, then sugar is also purchased by customers. After generating an
association rule among the given set of items, it is inferred:
C : Underfitting
Q.no 44. An ordinal attribute has three distinct values: Fair, Good, and Excellent.
If x and y are two objects of this ordinal attribute with the values Fair and Good
respectively, then what is the distance from object y to x?
A : 1
B : 0
C : 0.5
D : 0.75
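A short Python check of the ordinal-attribute distance, assuming the usual rank normalization z = (r - 1)/(M - 1) so that Fair, Good, Excellent map to 0, 0.5, 1:

```python
# Map the ordered values to ranks and normalize to [0, 1]
levels = ["Fair", "Good", "Excellent"]
rank = {v: i + 1 for i, v in enumerate(levels)}
M = len(levels)

def normalized(value):
    return (rank[value] - 1) / (M - 1)

x, y = "Fair", "Good"
distance = abs(normalized(y) - normalized(x))
print(distance)  # 0.5
```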
B : Exponent
C : Modulus
D : Percentage
Q.no 46. Which is the most well-known association rule algorithm, used in
most commercial products?
A : Apriori algorithm
B : Pincer-search algorithm
C : Distributed algorithm
D : Partition algorithm
B : City Block distance
C : Chebyshev distance
D : Euclidean distance
Q.no 48. A model predicts 50 examples belonging to the minority class, 45 of which
are true positives and five of which are false positives. The precision of the model is
A : Precision= 0.90
B : Precision= 0.79
C : Precision= 0.45
D : Precision= 0.68
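A one-line Python check of the precision arithmetic, using the counts stated in the question (45 true positives, 5 false positives):

```python
tp, fp = 45, 5
precision = tp / (tp + fp)  # 45 / 50
print(precision)            # 0.9
```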
Q.no 49. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
Q.no 50. A sub-database which consists of the set of prefix paths in the FP-tree
co-occurring with the suffix pattern is called
A : Suffix path
B : FP-tree
C : Prefix path
Q.no 52. The basic idea of the Apriori algorithm is to generate the itemsets of a
particular size and scan the database. These itemsets are
A : Primary
B : Secondary
C : Superkey
D : Candidate
B : Transaction processing
C : Recovery
Q.no 54. The problem of finding hidden structure from unlabeled data is called
A : Supervised learning
B : Unsupervised learning
C : Reinforcement Learning
D : Semisupervised learning
Q.no 55. A model makes predictions and predicts 120 examples as belonging to the
minority class, 90 of which are correct and 30 of which are incorrect. The precision of
the model is
A : Precision = 0.89
B : Precision = 0.23
C : Precision = 0.45
D : Precision = 0.75
Q.no 58. A model makes predictions and predicts 90 of the positive class
predictions correctly and 10 incorrectly. The recall of the model is
A : Recall=0.9
B : Recall=0.39
C : Recall=0.65
D : Recall=5.0
A : Pivot
B : Roll up
C : Drill down
D : Slice
A : ROLAP
B : MOLAP
C : HOLAP
D : HaoLap
Q.no 1. Postpruning is
B : Stop constructing tree if this would result in the measure falling below a threshold
D : Flow-Chart
Q.no 2. If two documents are similar, then what is the measure of angle between
two documents?
A : 30
B : 60
C : 90
D : 0
Q.no 3. CART stands for
A : Regression
B : Classification
D : Decision Trees
A : Data Integration
B : Data Selection
C : Data Transformation
D : Data Cleaning
Q.no 5. These are the intermediate servers that stand between a relational
back-end server and client front-end tools.
A : ROLAP
B : MOLAP
C : HOLAP
D : HaoLap
A : false rate
B : recall
C : negative rate
D : recognition rate
A : Data constraints
B : Rule constraints
D : Time constraints
Q.no 8. Bayesian classification is based on
B : Support
C : tree induction
D : Trees
B : An approach to a problem that is not guaranteed to work but performs well in most
cases
D : None of these
A : L1 norm
B : L2 norm
C : Lmax norm
D : L norm
Q.no 12. The distance between two points calculated using Pythagoras theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
A : A closed itemset
B : A frequent itemset
A : general tree
B : binary tree
C : prediction tree
Q.no 15. Cross-validation and bootstrap methods are common techniques for
assessing
A : accuracy
B : Precision
C : recall
D : performance
A : Dimension table
B : Fact table
Q.no 17. The problem of agents learning from the environment through their
interactions with a dynamic environment is addressed in
A : Reinforcement learning
B : Multi-label classification
C : Binary Classification
D : Multiclassification
A : impurity of an attribute
B : Purity of an attribute
C : Weight of an attribute
D : Class of an attribute
Q.no 19. The negative tuples that were correctly labeled by the classifier are called
A : False positives (FP)
B : True positives (TP)
C : True negatives (TN)
D : False negatives (FN)
A : random sampling
C : cross validation
D : the true positive rate (TPR) and the false positive rate
(FPR)
A : Single mode
B : Two mode
C : Multi mode
D : Large mode
A : Clustering
B : Regression
C : Summarization
D : Association rules
D : Programs are not dependent on the physical attributes as well as logical attributes
of the data
Q.no 26. If the first object's X and Y coordinates are 3 and 5 respectively, and the second
object's X and Y coordinates are 10 and 3 respectively, then what is the Manhattan
distance between these two objects?
A : 8
B : 13
C : 9
D : 10
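A quick Python check of the Manhattan (city block) distance for the coordinates given in the question:

```python
p1 = (3, 5)    # first object (X, Y)
p2 = (10, 3)   # second object (X, Y)
manhattan = sum(abs(a - b) for a, b in zip(p1, p2))
print(manhattan)  # |3-10| + |5-3| = 9
```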
B : OLTP
Q.no 28. Which of the following operations are used to calculate proximity
measures for an ordinal attribute?
B : Correlation coefficient
C : Discretization
D : Randomization
A : Test conditions
B : Class labels
C : Attribute values
D : Decision
Q.no 32. The Galaxy Schema is also called
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 33. For mining frequent itemsets, the data format used by the Apriori and
FP-Growth algorithms is
A : Noise
B : Sampling
C : Clustering
D : Histogram
A : Apriori probability
B : subjective probability
C : posterior probability
D : conditional probability
Q.no 37. In which step of Knowledge Discovery, multiple data sources are
combined?
A : Data Cleaning
B : Data Integration
C : Data Selection
D : Data Transformation
Q.no 38. Some company wants to divide their customers into distinct groups to
send offers; this is an example of
A : Data Extraction
B : Data Classification
C : Data Discrimination
D : Data Selection
Q.no 39. The accuracy of a classifier on a given test set is the percentage of
A : Cosine dissimilarity
B : Sine similarity
C : Sine dissimilarity
D : Cosine similarity
A : Rule based
D : Random Forest
Q.no 42. The problem of finding hidden structure from unlabeled data is called
A : Supervised learning
B : Unsupervised learning
C : Reinforcement Learning
D : Semisupervised learning
Q.no 43. Transforming a 3-D cube into a series of 2-D planes is an example of
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 44. What is the range of the angle between two term frequency vectors?
A : Zero to Thirty
B : Zero to Ninety
Q.no 45. Name the property of objects for which distance from first object to
second and vice-versa is same.
A : Symmetry
B : Transitive
C : Positive definiteness
D : Triangle inequality
Q.no 46. An ordinal attribute has three distinct values: Fair, Good, and Excellent.
If x and y are two objects of this ordinal attribute with the values Fair and Good
respectively, then what is the distance from object y to x?
A : 1
B : 0
C : 0.5
D : 0.75
Q.no 47. A concept hierarchy that is a total or partial order among attributes in a
database schema is called
A : Mixed hierarchy
B : Total hierarchy
C : Schema hierarchy
D : Concept generalization
A : CART
B : C4.5
C : ID3
D : ALL
Q.no 49. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
Q.no 50. A database has 4 transactions. Of these, 4 transactions include milk and
bread. Further, of the given 4 transactions, 3 transactions include cheese. Find
the support percentage for the following association rule: "If milk and bread are
purchased, then cheese is also purchased."
A : 0.6
B : 0.75
C : 0.8
D : 0.7
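A short Python check of the rule support, using the transaction counts from the question (transactions containing {milk, bread, cheese} over all 4 transactions):

```python
total_transactions = 4
transactions_with_milk_bread_cheese = 3
support = transactions_with_milk_bread_cheese / total_transactions
print(support)  # 0.75
```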
Q.no 51. A model predicts 50 examples belonging to the minority class, 45 of which
are true positives and five of which are false positives. The precision of the model is
A : Precision= 0.90
B : Precision= 0.79
C : Precision= 0.45
D : Precision= 0.68
Q.no 52. A sub-database which consists of the set of prefix paths in the FP-tree
co-occurring with the suffix pattern is called
A : Suffix path
B : FP-tree
C : Prefix path
Q.no 53. High entropy means that the partitions in classification are
A : pure
B : Not pure
C : Useful
D : Not useful
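A small Python sketch illustrating why high entropy corresponds to impure partitions (the class distributions below are made-up):

```python
import math

def entropy(probabilities):
    # Shannon entropy in bits; zero-probability classes contribute nothing
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([1.0, 0.0]))  # 0.0 -> pure partition, low entropy
print(entropy([0.5, 0.5]))  # 1.0 -> maximally mixed partition, high entropy
```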
A : 7
B : 9
C : 10
D : 11
Q.no 56. In one of the frequent itemset examples, it is observed that if tea and milk
are bought, then sugar is also purchased by customers. After generating an
association rule among the given set of items, it is inferred:
D : eliminating noise
Q.no 58. A data normalization technique for real-valued attributes that divides
each numerical value by the same power of 10.
A : min-max normalization
B : z-score normalization
C : decimal scaling
D : decimal smoothing
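A minimal Python sketch of decimal scaling normalization (the sample values are hypothetical): each value is divided by 10^j, where j is the smallest integer that brings the largest absolute value below 1:

```python
values = [-986, 123, 917]  # hypothetical attribute values

# Find the smallest power of 10 that makes max(|v|) < 1
j = 0
while max(abs(v) for v in values) / 10 ** j >= 1:
    j += 1

scaled = [v / 10 ** j for v in values]
print(j, scaled)  # 3 [-0.986, 0.123, 0.917]
```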
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 60. The holdout, cross-validation, and bootstrap methods are techniques
to estimate
A : Precision
B : Classifier performance
C : Recall
D : F-measure
A : Sin
B : Tan
C : Cos
D : Sec
A : Data Integration
B : Data Selection
C : Data Transformation
D : Data Cleaning
Q.no 3. Cross-validation and bootstrap methods are common techniques for
assessing
A : accuracy
B : Precision
C : recall
D : performance
Q.no 4. The task of building a decision model from labeled training data is called
A : Supervised Learning
B : Unsupervised Learning
C : Reinforcement Learning
D : Structure Learning
A : Dimension table
B : Fact table
A : Vector
B : Matrix
C : List
A : Regression
B : Classification
D : Decision Trees
A : Application-oriented
B : Object-oriented
C : Goal-oriented
D : Subject-oriented
Q.no 11. What is the method to interpret the results after rule generation?
A : Absolute Mean
B : Lift ratio
C : Gini Index
D : Apriori
Q.no 12. The distance between two points calculated using Pythagoras theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
Q.no 13. What is the range of the cosine similarity of the two documents?
A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
A : Nominal
B : Binary
C : Ordinal
D : numeric
Q.no 15. The schema is a collection of stars. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : Validation Data
B : Training Data
C : Testing Data
D : Hidden Data
Q.no 17. The problem of agents learning from the environment through their
interactions with a dynamic environment is addressed in
A : Reinforcement learning
B : Multi-label classification
C : Binary Classification
D : Multiclassification
Q.no 20. A learning algorithm which trains with a combination of labeled and
unlabeled data is
A : Supervised
B : Unsupervised
C : Semi supervised
D : Non- supervised
Q.no 23. Which of the following operations is correct about the supremum distance?
Q.no 24. Frequent patterns generated from association can be used for
classification is called
A : Naïve Bayes
B : Associative Classification
C : Predictive Mining
D : Decision Tree
Q.no 25. Holdout and random subsampling are common techniques for assessing
A : K-Fold validation
B : cross validation
C : accuracy
D : sampling
Q.no 26. Which statement is true about the decision tree attribute selection
process
A : A categorical attribute may appear in a tree node several times but a numeric
attribute may appear at most once.
B : A numeric attribute may appear in several tree nodes but a categorical attribute
may appear at most once.
C : Both numeric and categorical attributes may appear in several tree nodes.
D : Numeric and categorical attributes may appear in at most one tree node.
Q.no 27. Which of the following is not a correct use of cross-validation?
A : Selecting variables to include in a model
B : Comparing predictors
D : classification
A : Noise
B : Sampling
C : Clustering
D : Histogram
Q.no 32. If A and B are two sets of items, and A is a subset of B, which of the following
statements is always true?
A : Support(A) is less than or equal to Support(B)
A : 1 Dimensional
B : 2 Dimensional
C : 3 Dimensional
D : n-Dimensional
A : Regression
B : Classification
C : Sampling
D : Cross validation
Q.no 38. What type of data do you need for a chi-square test?
A : Categorical
B : Ordinal
C : Interval
D : Scales
A : misclassification rate
D : correctness
A : OLAP
B : OLTP
Q.no 41. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
Q.no 42. Which operation is required to calculate the Hamming distance between two
objects?
A : AND
B : OR
C : NOT
D : XOR
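A short Python sketch showing the XOR-based computation of Hamming distance between two binary objects (the bit patterns are arbitrary examples):

```python
a = 0b1011  # first binary object
b = 0b1001  # second binary object

# XOR sets a bit wherever the two objects differ; count those bits
hamming = bin(a ^ b).count("1")
print(hamming)  # 1
```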
Q.no 43. This technique uses mean and standard deviation scores to transform
real-valued attributes.
A : decimal scaling
B : min-max normalization
C : z-score normalization
D : logarithmic normalization
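A minimal Python sketch of z-score normalization on a hypothetical attribute, subtracting the mean and dividing by the standard deviation:

```python
import statistics

values = [10, 20, 30, 40, 50]    # hypothetical attribute values
mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation

z_scores = [(v - mean) / std for v in values]
print([round(z, 2) for z in z_scores])  # [-1.41, -0.71, 0.0, 0.71, 1.41]
```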
A : Rule based
C : Bayesian classifier
D : Random Forest
C : Underfitting
A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid
Q.no 48. In binning, we first sort the data and partition it into (equal-frequency) bins;
then which of the following is not a valid step?
Q.no 49. A model makes predictions and predicts 90 of the positive class
predictions correctly and 10 incorrectly. The recall of the model is
A : Recall=0.9
B : Recall=0.39
C : Recall=0.65
D : Recall=5.0
Q.no 50. A database has 4 transactions. Of these, 4 transactions include milk and
bread. Further, of the given 4 transactions, 3 transactions include cheese. Find
the support percentage for the following association rule: "If milk and bread are
purchased, then cheese is also purchased."
A : 0.6
B : 0.75
C : 0.8
D : 0.7
Q.no 51. The basic idea of the Apriori algorithm is to generate the itemsets of a
particular size and scan the database. These itemsets are
A : Primary
B : Secondary
C : Superkey
D : Candidate
Q.no 52. Which is the most well-known association rule algorithm, used in
most commercial products?
A : Apriori algorithm
B : Pincer-search algorithm
C : Distributed algorithm
D : Partition algorithm
Q.no 53. Name the property of objects for which distance from first object to
second and vice-versa is same.
A : Symmetry
B : Transitive
C : Positive definiteness
D : Triangle inequality
Q.no 55. These numbers are taken from the number of people that attended a
particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the
mean.
A : 25
B : 210
C : 62
D : 30
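A one-line Python check of the mean attendance using the seven values from the question:

```python
attendance = [62, 18, 39, 13, 16, 37, 25]
mean = sum(attendance) / len(attendance)  # 210 / 7
print(mean)  # 30.0
```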
Q.no 56. In one of the frequent itemset examples, it is observed that if tea and milk
are bought, then sugar is also purchased by customers. After generating an
association rule among the given set of items, it is inferred:
A : 7
B : 9
C : 10
D : 11
Q.no 60. The tables are easy to maintain and save storage space.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : Vector
B : Matrix
C : List
A : Study Class
B : Initial Class
C : Target Class
D : Final Class
A : False positives(FP)
B : True positives(TP)
D : False negatives(FN)
Q.no 5. A person trained to interact with a human expert in order to capture their
knowledge.
A : knowledge programmer
B : knowledge developer
C : knowledge engineer
D : knowledge extractor
A : recovery
B : data cleaning
C : data cleansing
D : data pruning
A : supervised classification
B : semi-supervised classification
C : unsupervised classification
D : regression
Q.no 8. What is the range of the cosine similarity of the two documents?
A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
Q.no 10. The task of building a decision model from labeled training data is called
A : Supervised Learning
B : Unsupervised Learning
C : Reinforcement Learning
D : Structure Learning
Q.no 11. What is the first step involved in knowledge discovery?
A : Data Integration
B : Data Selection
C : Data Transformation
D : Data Cleaning
A : false rate
B : recall
C : negative rate
D : recognition rate
A : general tree
B : binary tree
C : prediction tree
Q.no 14. Supervised learning and unsupervised clustering both require at least
one
A : hidden attribute
B : output attribute
C : input attribute
D : categorical attribute
Q.no 15. The distance between two points calculated using Pythagoras theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
A : Sin
B : Tan
C : Cos
D : Sec
B : An approach to a problem that is not guaranteed to work but performs well in most
cases
C : Information that is hidden in a database and that cannot be recovered by a simple
SQL query
D : None of these
Q.no 18. An example of knowledge type constraints in constraint-based mining is
A : Association or Correlation
B : Rule templates
D : Threshold measures
Q.no 19. Which technique finds the frequent itemsets in just two database scans?
A : Partitioning
B : Sampling
C : Hashing
Q.no 20. A data matrix in which attributes are of the same type and asymmetric is
called
A : Pattern matrix
D : Normal matrix
B : correctness
C : misclassification rate
Q.no 22. If the first object's X and Y coordinates are 3 and 5 respectively, and the second
object's X and Y coordinates are 10 and 3 respectively, then what is the Manhattan
distance between these two objects?
A : 8
B : 13
C : 9
D : 10
Q.no 23. Which of the following properties typically does not hold for similarity
measures between two objects?
A : Symmetry
B : Definiteness
C : Triangle inequality
D : Transitive
Q.no 25. One of the most well-known software tools used for classification is
A : Java
B : C4.5
C : Oracle
D : C++
Q.no 26. This supervised learning technique can process both numeric and
categorical input attributes.
A : linear regression
B : Bayes classifier
C : logistic regression
D : backpropagation learning
Q.no 27. A lattice of cuboids is called
A : Data cube
B : Dimension lattice
C : Master lattice
D : Fact table
D : Facts or keys
Q.no 31. Which of the following operations is correct about the supremum distance?
A : Normal matrix
B : Sparse matrix
C : Dense matrix
D : Contingency matrix
A : misclassification rate
D : correctness
Q.no 34. What is the limitation behind rule generation in Apriori algorithm?
B : Need to repeatedly scan the whole database and Check a large set of candidates by
pattern matching
Q.no 35. If A and B are two sets of items, and A is a subset of B, which of the following
statements is always true?
Q.no 36. Which of the following sequences is used to calculate proximity measures
for an ordinal attribute?
Q.no 37. For a classification problem with a highly imbalanced class, the majority
class is observed 99% of the time in the training data.
Your model has 99% accuracy on the test data. Which of
the following is not true in such a case?
C : Precision and recall metrics aren’t good for imbalanced class problems.
D : Precision and recall metrics are good for imbalanced class problems.
Q.no 38. Some company wants to divide their customers into distinct groups to
send offers; this is an example of
A : Data Extraction
B : Data Classification
C : Data Discrimination
D : Data Selection
Q.no 39. This operation may add a new dimension to the cube
A : Roll up
B : Drill down
C : Slice
D : Dice
A : Single mode
B : Two mode
C : Multi mode
D : Large mode
Q.no 41. The holdout, cross-validation, and bootstrap methods are techniques
to estimate
A : Precision
B : Classifier performance
C : Recall
D : F-measure
Q.no 42. Transforming a 3-D cube into a series of 2-D planes is an example of
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 43. The tables are easy to maintain and save storage space.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : Pivot
B : Roll up
C : Drill down
D : Slice
A : 7
B : 9
C : 10
D : 11
Q.no 47. This technique uses mean and standard deviation scores to transform
real-valued attributes.
A : decimal scaling
B : min-max normalization
C : z-score normalization
D : logarithmic normalization
Q.no 48. The problem of finding hidden structure from unlabeled data is called
A : Supervised learning
B : Unsupervised learning
C : Reinforcement Learning
D : Semisupervised learning
A : ROLAP
B : MOLAP
C : HOLAP
D : HaoLap
A : CART
B : C4.5
C : ID3
D : ALL
Q.no 51. High entropy means that the partitions in classification are
A : pure
B : Not pure
C : Useful
D : Not useful
Q.no 52. A database has 4 transactions. Of these, 4 transactions include milk and
bread. Further, of the given 4 transactions, 3 transactions include cheese. Find
the support percentage for the following association rule: "If milk and bread are
purchased, then cheese is also purchased."
A : 0.6
B : 0.75
C : 0.8
D : 0.7
Q.no 53. In one of the frequent itemset examples, it is observed that if tea and milk
are bought, then sugar is also purchased by customers. After generating an
association rule among the given set of items, it is inferred:
Q.no 54. A sub-database which consists of the set of prefix paths in the FP-tree
co-occurring with the suffix pattern is called
A : Suffix path
B : FP-tree
C : Prefix path
Q.no 55. The basic idea of the Apriori algorithm is to generate the itemsets of a
particular size and scan the database. These itemsets are
A : Primary
B : Secondary
C : Superkey
D : Candidate
Q.no 56. A model predicts 50 examples belonging to the minority class, 45 of which
are true positives and five of which are false positives. The precision of the model is
A : Precision= 0.90
B : Precision= 0.79
C : Precision= 0.45
D : Precision= 0.68
Q.no 58. Which operation is required to calculate the Hamming distance between two
objects?
A : AND
B : OR
C : NOT
D : XOR
Q.no 59. A concept hierarchy that is a total or partial order among attributes in a
database schema is called
A : Mixed hierarchy
B : Total hierarchy
C : Schema hierarchy
D : Concept generalization
Q.no 60. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
A : Reinforcement learning
B : Multi-label classification
C : Binary Classification
D : Multiclassification
B : Support
C : tree induction
D : Trees
Q.no 3. Which of the following is correct about Proximity measures?
A : Similarity
B : Dissimilarity
A : Pruning
B : Partitioning
C : Candidate generation
D : Itemset generation
A : Supervised
B : Unsupervised
C : Semi supervised
D : Non- supervised
Q.no 6. The most widely used metrics and tools to assess a classification model
are:
A : Confusion Matrix
B : Support
C : Entropy
D : Probability
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : random sampling
C : cross validation
D : the true positive rate (TPR) and the false positive rate
(FPR)
A : Support
B : Confidence
C : Support count
A : Data constraints
B : Rule constraints
D : Time constraints
Q.no 12. Each dimension is represented by only one table. Recognize the type of
schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 13. How can one represent a document to calculate cosine similarity?
A : Vector
B : Matrix
C : List
Q.no 14. What is the method to interpret the results after rule generation?
A : Absolute Mean
B : Lift ratio
C : Gini Index
D : Apriori
A : Regression
B : Classification
D : Decision Trees
A : false rate
B : recall
C : negative rate
D : recognition rate
A : Nominal
B : Binary
C : Ordinal
D : Numeric
Q.no 18. Cross-validation and bootstrap methods are common techniques for
assessing
A : accuracy
B : Precision
C : recall
D : performance
A : Application-oriented
B : Object-oriented
C : Goal-oriented
D : Subject-oriented
Q.no 21. Every key structure in the data warehouse contains a time element
A : records
B : Explicitly
D : Implicitly or explicitly
Q.no 22. This supervised learning technique can process both numeric and
categorical input attributes.
A : linear regression
B : Bayes classifier
C : logistic regression
D : backpropagation learning
Q.no 23. For mining frequent itemsets, the data format used by the Apriori and
FP-Growth algorithms is
A : A frequent-item-node
B : An item-prefix-tree
C : A frequent-item-header table
D : both B and C
Q.no 26. Learning with a complete system in mind, with reference to interactions
among the systems and subsystems and a proper understanding of systemic
boundaries, is
A : Multi-label classification
B : Reinforcement learning
C : Systemic learning
D : Machine Learning
Q.no 27. Handwritten digit recognition, i.e., classifying an image of a handwritten
number into a digit from 0 to 9, is an example of
A : Multiclassification
B : Multi-label classification
C : Imbalanced classification
D : Binary Classification
A : Only one
B : Two
C : Three
D : Four
Q.no 31. What is the limitation behind rule generation in Apriori algorithm?
B : Need to repeatedly scan the whole database and Check a large set of candidates by
pattern matching
A : Single mode
B : Two mode
C : Multi mode
D : Large mode
A : Test conditions
B : Class labels
C : Attribute values
D : Decision
A : Data cube
B : Dimension lattice
C : Master lattice
D : Fact table
Q.no 36. If the first object's X and Y coordinates are 3 and 5 respectively, and the second
object's X and Y coordinates are 10 and 3 respectively, then what is the Manhattan
distance between these two objects?
A : 8
B : 13
C : 9
D : 10
D : No class
Q.no 39. One of the most well-known software tools used for classification is
A : Java
B : C4.5
C : Oracle
D : C++
Q.no 41. The basic idea of the Apriori algorithm is to generate the itemsets of a
particular size and scan the database. These itemsets are
A : Primary
B : Secondary
C : Superkey
D : Candidate
A : Rule based
C : Bayesian classifier
D : Random Forest
B : Exponent
C : Modulus
D : Percentage
D : eliminating noise
A : CART
B : C4.5
C : ID3
D : ALL
Q.no 47. Transforming a 3-D cube into a series of 2-D planes is an example of
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 48. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
Q.no 49. What is the range of the angle between two term frequency vectors?
A : Zero to Thirty
B : Zero to Ninety
Q.no 50. If True Positives (TP) = 7, False Positives (FP) = 1, False Negatives (FN) = 4,
and True Negatives (TN) = 18, calculate Precision and Recall.
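A short Python check of precision and recall from the confusion-matrix counts given in the question:

```python
tp, fp, fn, tn = 7, 1, 4, 18
precision = tp / (tp + fp)  # 7 / 8  = 0.875
recall = tp / (tp + fn)     # 7 / 11 ≈ 0.636
print(round(precision, 3), round(recall, 3))
```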
Q.no 51. The cuboid that holds the lowest level of summarization is called
A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid
Q.no 52. In binning, we first sort the data and partition it into (equal-frequency) bins;
then which of the following is not a valid step?
Q.no 53. A model makes predictions and predicts 90 of the positive class
predictions correctly and 10 incorrectly. The recall of the model is
A : Recall=0.9
B : Recall=0.39
C : Recall=0.65
D : Recall=5.0
Q.no 54. Name the property of objects for which distance from first object to
second and vice-versa is same.
A : Symmetry
B : Transitive
C : Positive definiteness
D : Triangle inequality
Q.no 55. In one of the frequent itemset examples, it is observed that if tea and milk
are bought, then sugar is also purchased by customers. After generating an
association rule among the given set of items, it is inferred:
C : Underfitting
Q.no 57. Which operation is required to calculate the Hamming distance between two
objects?
A : AND
B : OR
C : NOT
D : XOR
Q.no 58. The tables are easy to maintain and save storage space.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 59. A model predicts 50 examples belonging to the minority class, 45 of which
are true positives and five of which are false positives. The precision of the model is
A : Precision= 0.90
B : Precision= 0.79
C : Precision= 0.45
D : Precision= 0.68
Q.no 60. The effectiveness of browsing is the highest. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : Sin
B : Tan
C : Cos
D : Sec
C : representing data
A : ROLAP
B : MOLAP
C : HOLAP
D : HaoLap
A : sub tree
B : class label
C : testing node
D : condition
A : Dimension table
B : Fact table
A : Validation Data
B : Training Data
C : Testing Data
D : Hidden Data
A : Data warehouse
C : ETL
D : Business Intelligence
A : Nominal
B : Binary
C : Ordinal
D : numeric
A : sampling
B : Reinforcement learning
C : unsupervised classification
D : semi-supervised classification
A : Decision tree
B : Regression Analysis
C : Induction
D : Association Rules
A : Pruning
B : Rule generation
C : Induction
D : Splitting
Q.no 12. A learning algorithm which trains with a combination of labeled and
unlabeled data is
A : Supervised
B : Unsupervised
C : Semi supervised
D : Non- supervised
A : Itemsets
B : Subsequences
C : Substructures
D : Associations
A : L1 norm
B : L2 norm
C : Lmax norm
D : L norm
Q.no 15. Which one of the following is true for a decision tree?
Q.no 16. What is the range of the cosine similarity of the two documents?
A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
A : false rate
B : recall
C : negative rate
D : recognition rate
Q.no 18. Which of the following are methods for supervised classification?
A : Decision tree
B : K-Means
C : Hierarchical
D : Apriori
Q.no 19. The schema is a collection of stars. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : recovery
B : data cleaning
C : data cleansing
D : data pruning
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 22. Every key structure in the data warehouse contains a time element
A : records
B : Explicitly
Q.no 23. If x and y are two objects of a nominal attribute with the values COMP and IT
respectively, then what is the similarity between these two objects?
A : Zero
B : Infinity
C : Two
D : One
Q.no 24. The accuracy of a classifier on a given test set is the percentage of
A : Data cube
B : Dimension lattice
C : Master lattice
D : Fact table
Q.no 27. Which of the following is not a correct use of cross-validation?
B : Comparing predictors
A : Only one
B : Two
C : Three
D : Four
A : Normal Distribution
B : Chi-Squared Distribution
C : Gamma Distribution
D : Poisson Distribution
Q.no 30. What is the approach of basic algorithm for decision tree induction?
A : Greedy
B : Top Down
C : Procedural
D : Step by Step
Q.no 31. Joins will be needed to execute the query. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 32. Which of the following sequences is used to calculate proximity measures
for an ordinal attribute?
Q.no 33. Some company wants to divide their customers into distinct groups to
send offers; this is an example of
A : Data Extraction
B : Data Classification
C : Data Discrimination
D : Data Selection
D : None of these
Q.no 36. What type of data do you need for a chi-square test?
A : Categorical
B : Ordinal
C : Interval
D : Scales
Q.no 37. In which step of Knowledge Discovery, multiple data sources are
combined?
A : Data Cleaning
B : Data Integration
C : Data Selection
D : Data Transformation
A : Cosine dissimilarity
B : Sine similarity
C : Sine dissimilarity
D : Cosine similarity
A : Top-down
B : Recursive
C : Bottom-up
Q.no 41. If the precision of a model is 0.75 and the recall is 0.43, then the F-score is
A : F-Score= 0.99
B : F-Score= 0.84
C : F-Score= 0.55
D : F-Score= 0.49
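A quick Python check of the F-score as the harmonic mean of the stated precision and recall:

```python
precision, recall = 0.75, 0.43
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 2))  # ≈ 0.55
```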
Q.no 44. A sub-database which consists of the set of prefix paths in the FP-tree
co-occurring with the suffix pattern is called
A : Suffix path
B : FP-tree
C : Prefix path
Q.no 45. These numbers are taken from the number of people that attended a
particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the
mean.
A : 25
B : 210
C : 62
D : 30
C : Underfitting
Q.no 47. The basic idea of the Apriori algorithm is to generate the itemsets of a
particular size and scan the database. These itemsets are
A : Primary
B : Secondary
C : Superkey
D : Candidate
A : Rule based
C : Bayesian classifier
D : Random Forest
Q.no 49. A model predicts 50 examples belonging to the minority class, 45 of which
are true positives and five of which are false positives. The precision of the model is
A : Precision= 0.90
B : Precision= 0.79
C : Precision= 0.45
D : Precision= 0.68
Q.no 50. Name the property of objects for which distance from first object to
second and vice-versa is same.
A : Symmetry
B : Transitive
C : Positive definiteness
D : Triangle inequality
A : 7
B : 9
C : 10
D : 11
Q.no 52. The holdout, cross-validation, and bootstrap methods are techniques
to estimate
A : Precision
B : Classifier performance
C : Recall
D : F-measure
Q.no 53. An ordinal attribute has three distinct values: Fair, Good, and Excellent.
If x and y are two objects of this ordinal attribute with the values Fair and Good
respectively, then what is the distance from object y to x?
A : 1
B : 0
C : 0.5
D : 0.75
A : CART
B : C4.5
C : ID3
D : ALL
Q.no 55. The cuboid that holds the lowest level of summarization is called
A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 57. In binning, we first sort the data and partition it into (equal-frequency) bins;
then which of the following is not a valid step?
B : City Block distance
C : Chebyshev distance
D : Euclidean distance
Q.no 59. In one of the frequent itemset examples, it is observed that if tea and milk
are bought, then sugar is also purchased by customers. After generating an
association rule among the given set of items, it is inferred:
Q.no 60. Which operation is required to calculate the Hamming distance between two
objects?
A : AND
B : OR
C : NOT
D : XOR
A : Data warehouse
C : ETL
D : Business Intelligence
A : Association or Correlation
B : Rule templates
D : Threshold measures
Q.no 3. If two documents are similar, then what is the measure of angle between
two documents?
A : 30
B : 60
C : 90
D : 0
Q.no 4. The most widely used metrics and tools to assess a classification model
are:
A : Confusion Matrix
B : Support
C : Entropy
D : Probability
Q.no 5. The distance between two points calculated using Pythagoras theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
A : Nominal
B : Binary
C : Ordinal
D : Numeric
A : Vector
B : Matrix
C : List
D : Term frequency vector
A : sampling
B : Reinforcement learning
C : unsupervised classification
D : semi-supervised classification
Q.no 9. Which is the keyword that distinguishes data warehouses from other data
repository systems?
A : Subject-oriented
B : Object-oriented
C : Client server
D : Time-invariant
A : supervised classification
B : semi-supervised classification
C : unsupervised classification
D : regression
A : Similarity
B : Dissimilarity
A : Pruning
B : Partitioning
C : Candidate generation
D : Itemset generation
B : An approach to a problem that is not guaranteed to work but performs well in most
cases
D : None of these
A : Nominal
B : Binary
C : Ordinal
D : numeric
A : Decision tree
B : Regression Analysis
C : Induction
D : Association Rules
Q.no 17. A learning algorithm which trains with a combination of labeled and
unlabeled data is
A : Supervised
B : Unsupervised
C : Semi supervised
D : Non- supervised
Q.no 18. An automatic car driver and business intelligence systems are examples of
A : Regression
B : Classification
C : Machine Learning
D : Reinforcement learning
Q.no 19. Which of the following is direct application of frequent itemset mining?
C : Outlier Detection
D : Intrusion Detection
A : ROLAP
B : MOLAP
C : HOLAP
D : HaoLap
Q.no 23. For mining frequent itemsets, the data format used by the Apriori and
FP-Growth algorithms is
A : Noise
B : Sampling
C : Clustering
D : Histogram
A : Solving queries
B : Increasing complexity
C : Decreasing complexity
Q.no 27. This operation may add a new dimension to the cube
A : Roll up
B : Drill down
C : Slice
D : Dice
Q.no 28. If x and y are two objects of a nominal attribute with the values COMP and IT
respectively, then what is the similarity between these two objects?
A : Zero
B : Infinity
C : Two
D : One
Q.no 29. Every key structure in the data warehouse contains a time element
A : records
B : Explicitly
D : Implicitly or explicitly
Q.no 30. What type of matrix is required to represent binary data for proximity
measures?
A : Normal matrix
B : Sparse matrix
C : Dense matrix
D : Contingency matrix
Q.no 31. In which step of Knowledge Discovery, multiple data sources are
combined?
A : Data Cleaning
B : Data Integration
C : Data Selection
D : Data Transformation
Q.no 32. In a decision tree each leaf node represents
A : Test conditions
B : Class labels
C : Attribute values
D : Decision
A : cross validation
B : sampling
C : Error-detecting codes
D : Error-correcting codes
A : testing the machine on all possible ways by substituting the original sample into
training set
B : testing the machine on all possible ways by dividing the original sample into
training and validation sets.
A : Consolidated
B : Primitive
C : Highly detailed
D : Recent data
A : weather forecast
B : data matrix
D : genomic data
B : Correlation coefficient
C : Discretization
D : Randomization
Q.no 40. Which of the following probabilities are used in the Bayes theorem.
A : P(Ci|X)
B : P(Ci)
C : P(X|Ci)
D : P(X)
Q.no 41. An ordinal attribute has three distinct values: Fair, Good, and Excellent.
If x and y are two objects of this ordinal attribute with the values Fair and Good
respectively, then what is the distance from object y to x?
A : 1
B : 0
C : 0.5
D : 0.75
Q.no 42. A sub-database which consists of the set of prefix paths in the FP-tree
co-occurring with the suffix pattern is called
A : Suffix path
B : FP-tree
C : Prefix path
D : eliminating noise
Q.no 44. These numbers are taken from the number of people that attended a
particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the
mean.
A : 25
B : 210
C : 62
D : 30
Q.no 45. This technique uses mean and standard deviation scores to transform
real-valued attributes.
A : decimal scaling
B : min-max normalization
C : z-score normalization
D : logarithmic normalization
Q.no 46. Transforming a 3-D cube into a series of 2-D planes is an example of
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 47. Which is the most well-known association rule algorithm, used in
most commercial products?
A : Apriori algorithm
B : Pincer-search algorithm
C : Distributed algorithm
D : Partition algorithm
Q.no 48. A database has 4 transactions. Of these, 4 transactions include milk and
bread. Further, of the given 4 transactions, 3 transactions include cheese. Find
the support percentage for the following association rule: "If milk and bread are
purchased, then cheese is also purchased."
A : 0.6
B : 0.75
C : 0.8
D : 0.7
Q.no 49. The effectiveness of browsing is the highest. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 50. The basic idea of the Apriori algorithm is to generate the itemsets of a
particular size and scan the database. These itemsets are
A : Primary
B : Secondary
C : Superkey
D : Candidate
Q.no 51. Name the property of objects for which distance from first object to
second and vice-versa is same.
A : Symmetry
B : Transitive
C : Positive definiteness
D : Triangle inequality
Q.no 52. Which operation is required to calculate the Hamming distance between two
objects?
A : AND
B : OR
C : NOT
D : XOR
Q.no 53. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
Q.no 54. What is the range of the angle between two term frequency vectors?
A : Zero to Thirty
B : Zero to Ninety
Q.no 55. If the precision of a model is 0.75 and the recall is 0.43, then the F-score is
A : F-Score= 0.99
B : F-Score= 0.84
C : F-Score= 0.55
D : F-Score= 0.49
Q.no 56. The tables are easy to maintain and save storage space.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
B : City Block distance
C : Chebyshev distance
D : Euclidean distance
A : CART
B : C4.5
C : ID3
D : ALL
Q.no 59. A concept hierarchy that is a total or partial order among attributes in a
database schema is called
A : Mixed hierarchy
B : Total hierarchy
C : Schema hierarchy
D : Concept generalization
C : Underfitting
A : Data Integration
B : Data Selection
C : Data Transformation
D : Data Cleaning
B : Regression Analysis
C : Induction
D : Association Rules
A : Itemsets
B : Subsequences
C : Substructures
D : Associations
Q.no 5. The distance between two points calculated using Pythagoras theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
A : Dimensions
B : Facts
D : Dimensions or Facts
A : random sampling
C : cross validation
D : the true positive rate (TPR) and the false positive rate
(FPR)
Q.no 8. Which of the following is a data mining tool?
A : Borland C
B : Weka
C : Borland C++
D : Visual C
A : sampling
B : Reinforcement learning
C : unsupervised classification
D : semi-supervised classification
Q.no 10. Each dimension is represented by only one table. Recognize the type of
schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 14. What is the range of the cosine similarity of the two documents?
A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
A : confusion matrix
D : classifier
Q.no 17. Supervised learning and unsupervised clustering both require at least
one
A : hidden attribute
B : output attribute
C : input attribute
D : categorical attribute
Q.no 18. CART stands for
A : Regression
B : Classification
D : Decision Trees
A : A closed itemset
B : A frequent itemset
A : Study Class
B : Initial Class
C : Target Class
D : Final Class
A : learner waits until the last minute before constructing model to classify
B : a given training data constructs a model first and then uses it to classify
A : P(Ci|X)
B : P(Ci)
C : P(X|Ci)
D : P(X)
A : A frequent-item-node
B : An item-prefix-tree
C : A frequent-item-header table
D : both B and C
Q.no 25. Holdout and random subsampling are common techniques for assessing
A : K-Fold validation
B : cross validation
C : accuracy
D : sampling
B : correctness
C : misclassification rate
Q.no 27. If A and B are two sets of items, and A is a subset of B, which of the following
statements is always true?
A : cross validation
B : sampling
C : Error-detecting codes
D : Error-correcting codes
Q.no 29. What is the limitation behind rule generation in Apriori algorithm?
B : Need to repeatedly scan the whole database and Check a large set of candidates by
pattern matching
D : No class
A : Consolidated
B : Primitive
C : Highly detailed
D : Recent data
Q.no 32. When you use cross validation in machine learning, it means
A : you verify how accurate your model is on multiple and different subsets of data.
A : Greedy
B : Top Down
C : Procedural
D : Step by Step
Q.no 34. Which of the following operations are used to calculate proximity
measures for an ordinal attribute?
A : Frequent 5 itemsets
B : Frequent 3 itemsets
C : Frequent 4 itemsets
D : Frequent 6 itemsets
A : Clustering
B : Regression
C : Summarization
D : Association rules
A : Noise
B : Sampling
C : Clustering
D : Histogram
Q.no 38. Some company wants to divide their customers into distinct groups to
send offers; this is an example of
A : Data Extraction
B : Data Classification
C : Data Discrimination
D : Data Selection
A : Data cube
B : Dimension lattice
C : Master lattice
D : Fact table
Q.no 41. A database has 4 transactions. Of these, 4 transactions include milk and
bread. Further, of the given 4 transactions, 3 transactions include cheese. Find
the support percentage for the following association rule: "If milk and bread are
purchased, then cheese is also purchased."
A : 0.6
B : 0.75
C : 0.8
D : 0.7
Q.no 42. A sub-database which consists of the set of prefix paths in the FP-tree
co-occurring with the suffix pattern is called
A : Suffix path
B : FP-tree
C : Prefix path
Q.no 43. The cuboid that holds the lowest level of summarization is called
A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid
C : Underfitting
Q.no 45. Transforming a 3-D cube into a series of 2-D planes is an example of
A : Pivot
B : Roll up
C : Drill down
D : Slice
B : Transaction processing
C : Recovery
Q.no 47. These numbers are taken from the number of people that attended a
particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the
mean.
A : 25
B : 210
C : 62
D : 30
Q.no 48. If True Positives (TP) = 7, False Positives (FP) = 1, False Negatives (FN) = 4,
and True Negatives (TN) = 18, calculate Precision and Recall.
Q.no 49. The problem of finding hidden structure from unlabeled data is called
A : Supervised learning
B : Unsupervised learning
C : Reinforcement Learning
D : Semisupervised learning
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 51. High entropy means that the partitions in classification are
A : pure
B : Not pure
C : Useful
D : Not useful
Q.no 52. A model makes predictions and predicts 120 examples as belonging to the
minority class, 90 of which are correct and 30 of which are incorrect. The precision of
the model is
A : Precision = 0.89
B : Precision = 0.23
C : Precision = 0.45
D : Precision = 0.75
Q.no 53. The tables are easy to maintain and save storage space.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 54. If the precision of a model is 0.75 and the recall is 0.43, then the F-score is
A : F-Score= 0.99
B : F-Score= 0.84
C : F-Score= 0.55
D : F-Score= 0.49
Q.no 55. A model makes predictions and predicts 90 of the positive class
predictions correctly and 10 incorrectly. The recall of the model is
A : Recall=0.9
B : Recall=0.39
C : Recall=0.65
D : Recall=5.0
Q.no 56. The effectiveness of browsing is the highest. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
B : C4.5
C : ID3
D : ALL
Q.no 58. A data normalization technique for real-valued attributes that divides
each numerical value by the same power of 10.
A : min-max normalization
B : z-score normalization
C : decimal scaling
D : decimal smoothing
Q.no 60. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
A : Pruning
B : Partitioning
C : Candidate generation
D : Itemset generation
A : Decision trees
B : Eclat
C : FP growth
D : Apriori
B : Rule constraints
D : Time constraints
A : random sampling
C : cross validation
D : the true positive rate (TPR) and the false positive rate
(FPR)
Q.no 5. If two documents are similar, then what is the measure of angle between
two documents?
A : 30
B : 60
C : 90
D : 0
Q.no 7. Supervised learning and unsupervised clustering both require at least one
A : hidden attribute
B : output attribute
C : input attribute
D : categorical attribute
Q.no 8. The fact is also called
A : Dimension
B : Key
C : Schema
D : Measure
Q.no 9. The most widely used metrics and tools to assess a classification model
are:
A : Confusion Matrix
B : Support
C : Entropy
D : Probability
Q.no 10. A person trained to interact with a human expert in order to capture
their knowledge.
A : knowledge programmer
B : knowledge developer
C : knowledge engineer
D : knowledge extractor
A : Pruning
B : Rule generation
C : Induction
D : Splitting
Q.no 12. The schema is a collection of stars. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 13. The distance between two points calculated using Pythagoras theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
A : confusion matrix
D : classifier
A : Pruning
B : Partitioning
C : Candidate generation
D : Itemset generation
Q.no 16. An example of knowledge type constraints in constraint-based mining is
A : Association or Correlation
B : Rule templates
D : Threshold measures
A : Nominal
B : Binary
C : Ordinal
D : Numeric
A : Dimensions
B : Facts
D : Dimensions or Facts
Q.no 19. Which one of the following is true for a decision tree?
A : ROLAP
B : MOLAP
C : HOLAP
D : HaoLap
Q.no 25. What type of matrix is required to represent binary data for proximity
measures?
A : Normal matrix
B : Sparse matrix
C : Dense matrix
D : Contingency matrix
A : misclassification rate
D : correctness
A : Frequent 5 itemsets
B : Frequent 3 itemsets
C : Frequent 4 itemsets
D : Frequent 6 itemsets
A : Multiclassification
B : Multi-label classification
C : Imbalanced classification
D : Binary Classification
A : Data cube
B : Dimension lattice
C : Master lattice
D : Fact table
B : correctness
C : misclassification rate
A : cross validation
B : sampling
C : Error-detecting codes
D : Error-correcting codes
Q.no 32. This operation may add a new dimension to the cube
A : Roll up
B : Drill down
C : Slice
D : Dice
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 34. For a classification problem with a highly imbalanced class, the majority
class is observed 99% of the time in the training data.
Your model has 99% accuracy on the test data. Which of
the following is not true in such a case?
C : Precision and recall metrics aren’t good for imbalanced class problems.
D : Precision and recall metrics are good for imbalanced class problems.
D : No class
A : Consolidated
B : Primitive
C : Highly detailed
D : Recent data
A : A frequent-item-node
B : An item-prefix-tree
C : A frequent-item-header table
D : both B and C
A : Regression
B : Classification
C : Sampling
D : Cross validation
A : testing the machine on all possible ways by substituting the original sample into
training set
B : testing the machine on all possible ways by dividing the original sample into
training and validation sets.
A : Rule based
C : Bayesian classifier
D : Random Forest
Q.no 42. An ordinal attribute has three distinct values: Fair, Good, and Excellent.
If x and y are two objects of this ordinal attribute with the values Fair and Good
respectively, then what is the distance from object y to x?
A : 1
B : 0
C : 0.5
D : 0.75
A : Pivot
B : Roll up
C : Drill down
D : Slice
A : 7
B : 9
C : 10
D : 11
Q.no 45. If True Positives (TP) = 7, False Positives (FP) = 1, False Negatives (FN) = 4,
and True Negatives (TN) = 18, calculate Precision and Recall.
Q.no 46. The tables are easy to maintain and save storage space.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 47. Accuracy is
Q.no 48. What is the range of the angle between two term frequency vectors?
A : Zero to Thirty
B : Zero to Ninety
Q.no 49. A sub-database which consists of the set of prefix paths in the FP-tree
co-occurring with the suffix pattern is called
A : Suffix path
B : FP-tree
C : Prefix path
Q.no 50. Transforming a 3-D cube into a series of 2-D planes is an example of
A : Pivot
B : Roll up
C : Drill down
D : Slice
Q.no 51. A model makes predictions and predicts 120 examples as belonging to the
minority class, 90 of which are correct and 30 of which are incorrect. The precision of
the model is
A : Precision = 0.89
B : Precision = 0.23
C : Precision = 0.45
D : Precision = 0.75
Q.no 52. The cuboid that holds the lowest level of summarization is called
A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid
Q.no 53. A data normalization technique for real-valued attributes that divides
each numerical value by the same power of 10.
A : min-max normalization
B : z-score normalization
C : decimal scaling
D : decimal smoothing
Q.no 54. High entropy means that the partitions in classification are
A : pure
B : Not pure
C : Useful
D : Not useful
Q.no 55. In binning, we first sort the data and partition it into (equal-frequency) bins;
then which of the following is not a valid step?
Q.no 56. This technique uses mean and standard deviation scores to transform
real-valued attributes.
A : decimal scaling
B : min-max normalization
C : z-score normalization
D : logarithmic normalization
Q.no 58. If the precision of a model is 0.75 and the recall is 0.43, then the F-score is
A : F-Score= 0.99
B : F-Score= 0.84
C : F-Score= 0.55
D : F-Score= 0.49
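Worked check for Q.no 58 (the F-Score is the harmonic mean of precision and recall):

p, r = 0.75, 0.43
f = 2 * p * r / (p + r)                             # 0.645 / 1.18 = 0.546..., roughly 0.55
print(round(f, 2))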
Q.no 59. The basic idea of the Apriori algorithm is to generate the item sets of a
particular size and scan the database. These item sets are
A : Primary
B : Secondary
C : Superkey
D : Candidate
Q.no 60. How can a Bayesian network be used to answer any query?
A : Full distribution
B : Joint distribution
C : Partial distribution
Q.no 1. What is the method to interpret the results after rule generation?
A : Absolute Mean
B : Lift ratio
C : Gini Index
D : Apriori
A : Application-oriented
B : Object-oriented
C : Goal-oriented
D : Subject-oriented
B : Confidence
C : Support count
Q.no 5. Supervised learning and unsupervised clustering both require at least one
A : hidden attribute
B : output attribute
C : input attribute
D : categorical attribute
Q.no 6. The task of building a decision model from labeled training data is called as
A : Supervised Learning
B : Unsupervised Learning
C : Reinforcement Learning
D : Structure Learning
Q.no 7. What is the range of the cosine similarity of the two documents?
A : Zero to One
B : Zero to infinity
C : Infinity to infinity
D : Zero to Zero
B : many labels
D : no label
A : Decision trees
B : Eclat
C : FP growth
D : Apriori
Q.no 10. The first step involved in knowledge discovery is?
A : Data Integration
B : Data Selection
C : Data Transformation
D : Data Cleaning
Q.no 11. The distance between two points calculated using the Pythagoras theorem is
A : Supremum distance
B : Euclidean distance
C : Linear distance
D : Manhattan Distance
C : cross validation
D : the true positive rate (TPR) and the false positive rate (FPR)
Q.no 14. Each dimension is represented by only one table. Recognize the type of
schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
A : Nominal
B : Binary
C : Ordinal
D : Numeric
A : Sin
B : Tan
C : Cos
D : Sec
Q.no 18. Which of the following is the data mining tool?
A : Borland C
B : Weka
C : Borland C++
D : Visual C
A : general tree
B : binary tree
C : prediction tree
Q.no 21. What is the approach of basic algorithm for decision tree induction?
A : Greedy
B : Top Down
C : Procedural
D : Step by Step
Q.no 23. For mining frequent itemsets, the Data format used by Apriori and FP-
Growth algorithms are
Q.no 24. Which of the following sequence is used to calculate proximity measures
for ordinal attribute?
Q.no 26. Which of the following is not correct use of cross validation?
B : Comparing predictors
D : classification
D : Facts or keys
Q.no 29. Every key structure in the data warehouse contains a time element
A : records
B : Explicitly
D : Implicitly or explicitly
Q.no 30. The accuracy of a classifier on a given test set is the percentage of
A : Regression
B : Classification
C : Sampling
D : Cross validation
Q.no 33. If A, B are two sets of items, and A is a subset of B. Which of the following
statement is always true?
Q.no 34. What is the limitation behind rule generation in Apriori algorithm?
B : Need to repeatedly scan the whole database and check a large set of candidates by
pattern matching
Q.no 36. One of the most well known software used for classification is
A : Java
B : C4.5
C : Oracle
D : C++
A : weather forecast
B : data matrix
D : genomic data
Q.no 38. What type of matrix is required to represent binary data for proximity
measures?
A : Normal matrix
B : Sparse matrix
C : Dense matrix
D : Contingency matrix
Q.no 39. Some company wants to divide their customers into distinct groups to
send offers; this is an example of
A : Data Extraction
B : Data Classification
C : Data Discrimination
D : Data Selection
Q.no 40. This operation may add a new dimension to the cube
A : Roll up
B : Drill down
C : Slice
D : Dice
B : 9
C : 10
D : 11
Q.no 43. These numbers are taken from the number of people that attended a
particular church every Friday for 7 weeks: 62, 18, 39, 13, 16, 37, 25. Find the
mean.
A : 25
B : 210
C : 62
D : 30
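Worked check for Q.no 43:

attendance = [62, 18, 39, 13, 16, 37, 25]
print(sum(attendance) / len(attendance))            # 210 / 7 = 30.0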
Q.no 44. The effectiveness of browsing is the highest. Recognize the type of schema.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 45. The cuboid that holds the lowest level of summarization is called as
A : 0-D cuboid
B : 1-D cuboid
C : Base cuboid
D : 2-D cuboid
Q.no 46. The tables are easy to maintain and save storage space.
A : Star Schema
B : Snowflake schema
C : Fact constellation
D : Database schema
Q.no 47. A model makes predictions and predicts 120 examples as belonging to the
minority class, 90 of which are correct, and 30 of which are incorrect. Precision of
model is
A : Precision = 0.89
B : Precision = 0.23
C : Precision = 0.45
D : Precision = 0.75
Q.no 48. A database has 4 transactions. Of these, 4 transactions include milk and
bread. Further, of the given 4 transactions, 3 transactions include cheese. Find
the support percentage for the following association rule: "If milk and bread are
purchased then cheese is also purchased".
A : 0.6
B : 0.75
C : 0.8
D : 0.7
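Worked check for Q.no 48: the support of a rule is the fraction of all transactions that contain every item in it, here milk, bread and cheese together.

total_transactions = 4
with_milk_bread_cheese = 3      # all 4 contain milk and bread; 3 of them also contain cheese
print(with_milk_bread_cheese / total_transactions)  # 0.75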
Q.no 49. What is the range of the angle between two term frequency vectors?
A : Zero to Thirty
B : Zero to Ninety
B : City Block distance
C : Chebyshev distance
D : Euclidean distance
Q.no 53. This technique uses mean and standard deviation scores to transform
real-valued attributes.
A : decimal scaling
B : min-max normalization
C : z-score normalization
D : logarithmic normalization
C : Underfitting
D : eliminating noise
Q.no 56. If True Positives (TP): 7, False Positives (FP): 1, False Negatives (FN): 4,
True Negatives (TN): 18. Calculate Precision and Recall.
A : Precision = 0.88, Recall=0.64
Q.no 57. A sub-database which consists of the set of prefix paths in the FP-tree co-
occurring with the suffix pattern is called as
A : Suffix path
B : FP-tree
C : Prefix path
A : CART
B : C4.5
C : ID3
D : ALL
Q.no 59. Which is the most well known association rule algorithm and is used in
most commercial products.
A : Apriori algorithm
B : Pincer-search algorithm
C : Distributed algorithm
D : Partition algorithm
Q.no 60. Which operation is required to calculate Hamming distance between two
objects?
A : AND
B : OR
C : NOT
D : XOR
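A minimal Python sketch for Q.no 60, assuming the two objects are binary vectors packed into integers: XOR them and count the 1 bits that remain.

x, y = 0b101101, 0b100110                           # hypothetical binary objects
hamming = bin(x ^ y).count("1")                     # bits that differ
print(hamming)                                      # 3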
Answer for Question No 1. is b
12 Incorrect attribute values may be due to: faulty data collection instruments | data entry problems | data transmission problems | all
D
14 Binning method first sort data and partition into (equi-depth) bins TRUE FALSE
A
15 Data can be smoothed by fitting the data to a function, such as with
regression.
TRUE FALSE
A
16 Linear regression - involves finding the___________line to fit two
attributes (or variables)
best average worst
A
22 Redundant data occur often when integration of multiple databases TRUE FALSE
A
23 The same attribute may have different names in different databases TRUE FALSE
A
24 Careful integration of the data from multiple sources may help
reduce/avoid redundancies and inconsistencies
TRUE FALSE
A
31 Data reduction obtains a reduced representation of the data set that is much smaller in volume but yet produces the same (or almost the same) analytical results TRUE FALSE
A
32 Run Length Encoding is lossless TRUE FALSE
A
33 Jpeg compression is lossy lossless
A
34 Wavelet Transform Decomposes a signal into different frequency
subbands
TRUE FALSE
A
35 Principal Component Analysis (PCA) is used for dimensionality
reduction
TRUE FALSE
A
36 Normalization by______________ scaling normalizes by moving the
decimal point of values of attribute A.
binary octal decimal
C
37 Data cube aggregation is normalization TRUE FALSE
B
38 ordinal attribute have values from an ___________set ordered unordered
A
39 Run Length Encoding is lossy lossless
B
40 Nominal attribute have values from an ___________set ordered unordered
B
UNIT TWO  SUB : 410244 (D) DMW
7 Among the types of fact tables, which is not a correct type? Fact-less fact table | Transaction fact tables | Integration fact tables | Aggregate fact tables
c
8 Among the following, which is not a characteristic of a Data Warehouse? Integrated | Volatile | Time-variant | Subject oriented
b
9 What is not considered as an issue in data warehousing? optimization | data transformation | extraction | intermediation
d
10 Which one is NOT considered a standard query technique? Drill-up | Drill-across | DSS | Pivoting
c
11 Among the following, which is not a type of business data? Real time data | Application data | Reconciled data | Derived data
b
12 A data warehouse is which of the following? Can be updated by end users. | Contains numerous naming conventions and formats. | Organized around important subject areas. | Contains only current data.
c
32 A ____________ combines facts from multiple processes into a single fact table and eases the analytic burden on BI applications. Aggregate fact table | Consolidated fact table | Transaction fact table | Accumulating snapshot fact table
b
13 Which of the following will be the Euclidean distance between the two data points A(1,3) and B(2,3)? 1 | 2 | 4 | 8
a
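Worked check for question 13 above:

A, B = (1, 3), (2, 3)
d = ((A[0] - B[0]) ** 2 + (A[1] - B[1]) ** 2) ** 0.5
print(d)                                            # sqrt(1 + 0) = 1.0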
14 Suppose you want to predict the class of a new data point x=1 and y=1 using Euclidean distance in 3-NN. To which class does this data point belong? + Class | – Class | can't say | None of these
a
15 Which of the following would be the leave-one-out cross-validation accuracy for k=5? (2/14) | (4/14) | (6/14) | None of these
d
16 What is Manhattan distance? The distance between two points in a vector data layer calculated as the length of the line between them. | The distance between two points in a raster data layer calculated as the number of cells crossed by a straight line between them. | The distance between two points in a raster data layer calculated as the sum of the cell sides intersected by a straight line between them. | None of these
c
17 Which of the following combinations is incorrect? Continuous – euclidean distance | Continuous – correlation similarity | Binary – manhattan distance | None of the Mentioned
d
30 …..are the different types of attributes nominal ordinal interval All of these
d
31 … are the types of data sets graph record ordered All of these
d
32 …are the types of ordered data spatial data temporal data sequential data All of these
d
33 … are the example of data quality problems missing value wrong data duplicate data All of these
d
34 Numerical measure of how different two
data objects are…
similarity measure dissimilarity
measure
Both a & b none of these
b
22 What is association rule mining? Same as frequent itemset mining | Finding of strong association rules using frequent itemsets | Both a and b | None of these
b
35 Which Association Rule would you prefer? High support and medium confidence | High support and low confidence | Low support and high confidence | Low support and low confidence
c
36 The apriori property means: If a set cannot pass a test, its supersets will also fail the same test | To decrease the efficiency, do level-wise generation of frequent item sets | To improve the efficiency, do level-wise generation of frequent item sets | If a set can pass a test, its supersets will fail the same test
a
37 If an item set 'XYZ' is a frequent item set, then all subsets of that frequent item set are: undefined | not frequent | frequent | can't say
c
38 To determine association rules from frequent item sets: Only minimum confidence needed | Neither support nor confidence needed | Both minimum support and confidence are needed | Minimum support is needed
c
7 Decision Tree is used to build classification and regression models. TRUE FALSE
A
8 Sequential Covering Algorithm can be used to
extract ___________ rules from the training data
Do_WHILE IF-THEN
B
16 In k-NN regression, the output is the property value for the object. TRUE FALSE
A
17 Decision Tree Mining belongs to supervised class learning. TRUE FALSE
A
18 A regression equation is a polynomial regression equation if the
power of independent variable is more than 1.
TRUE FALSE
A
19 Decision Tree is used to create data models that will predict class
labels or values for the decision-making process.
TRUE FALSE
A
22 A decision tree works for both discrete and continuous variables. TRUE FALSE
A
23 Decision tree induction is the method of learning the decision trees from the training set. TRUE FALSE
A
2 Which of the following is true for Classification? A subdivision of a set | A measure of the accuracy | The task of assigning a classification | All of these
a
UNIT-1
1) Binary attribute are
a) This takes only two values. In general, these values will be 0 and 1 and
they can be coded as one bit
b) The natural environment of a certain species
c) Systems that can be used without knowledge of internal operations
d) None of these
Ans: a
Explanation: A binary attribute takes only two values, typically 0 and 1, which can be coded as one bit.
2) “Efficiency and scalability of data mining algorithms” issues come under?
a) Mining Methodology and User Interaction Issues
b) Performance Issues
c) Diverse Data Types Issues
d) None of the above
Ans: b
Explanation: In order to effectively extract the information from huge amount of data
in databases, data mining algorithm must be efficient and scalable.
3) ——- is not a data mining functionality?
a) Clustering and Analysis
b) Selection and interpretation
c) Classification and regression
d) Characterization and Discrimination
Ans: b
Explanation: Selection and interpretation
Ans: d
Explanation: Data transformation
6) Which of the following is the right approach to Data Mining?
a) Infrastructure, exploration, analysis, exploitation, interpretation
b) Infrastructure, exploration, analysis, interpretation, exploitation
c) Infrastructure, analysis, exploration, interpretation, exploitation
d) None of these
Ans: b
Explanation: Infrastructure, exploration, analysis, interpretation, exploitation
8) Data mining is
a) The actual discovery phase of a knowledge discovery process
b) The stage of selecting the right data for a KDD process
c) A subject-oriented integrated time variant non-volatile collection of
data in support of management
d) None of these
Ans: a
Explanation: The actual discovery phase of a knowledge discovery process
9) Data selection is
Ans: a
Explanation: Nominal means “relating to names.” The values of a nominal attribute are
symbols or names of things
11) The example of binary attribute is
a) gender
b) drink_size
c) temperature
d) professional_rank
Ans: a
Explanation: A binary attribute is a nominal attribute with only two categories or states: 0
or 1
12) The example of ordinal attribute is
a) Years_of_experience
b) age
c) occupation
d) customer_id
Ans: b
Explanation: An ordinal attribute is an attribute with possible values that have a meaningful order or ranking among them
13) Data cleaning includes____
a. Handling missing values and noisy data
b. Reduction of attributes
c. Relevant attribute selection
d. Sample data selection
Ans: a
Explanation: Data cleaning (or data cleansing) routines attempt to fill in missing values,
smooth out noise while identifying outliers, and correct inconsistencies in the
data.
14) To deal with missing values, the following strategy is used__
a. Use a measure of central tendency
b. Reduction of attribute
c. Sample data selection
d. Data converted into other form
Ans: a
Explanation: measures of central tendency, which indicate the “middle” value of a data
distribution
15) Noise is ___
a) Missing value from dataset
b) Inaccurate data
c) a random error or variance in a measured variable
d) the data whose value known to user
Ans: c
Explanation:
a) K data values
b) Knowledge discovery from dataset
c) K dataset
d) None of the above
Ans. b
Explanation: Knowledge discovery from dataset
21) Data transformation includes:
Ans a
Explanation data are transformed and consolidated into forms appropriate for mining by
performing summary or aggregation operations
Ans c
Explanation visualization and knowledge representation techniques are used to present
mined knowledge to users
24) Data mining functionalities are used to___
a) to specify the kinds of patterns or knowledge to be found in data
mining tasks
b) to select data
c) to find missing values
d) to analyze the mining result
Ans a
Explanation: Data mining functionalities are used to specify the kinds of patterns or knowledge to be found in data mining tasks.
2. Data Warehouse is
a) Read only
b) Write only
c) Read and write only
d) none
Ans: a
Explanation: Because of historical data storage
3. Expansion for DSS in DW is___
a) Decision Single System
b) Decision storable system
c) Decision Support System
d) Data Support System
Ans: c
Explanation: Decision support system
4. The important aspect of data warehouse environment is that data found within the
data warehouse is___
a) Subject oriented
b) Time-variant
c) Integrated
d) All of the above
Ans: d
Explanation: All are correct
5. The time horizon in Data warehouse is usually__
a) 1-2 year
b) 3-4 year
c) 5-6 years
d) 5-10 years
Ans: d
Explanation: 5 to 10 years
6. The data is stored, retrieved and updated in ___
a) OLAP
b) OLTP
c) SMTP
d) FTP
Ans: b
Explanation: OLTP stands for Online Transaction Processing
7. ___describes the data oriented in the data warehouse
a) Relational data
b) Operational data
c) Metadata
d) Informational data
Ans: c
Explanation: metadata
8. ___ predicts the future trends and behaviours, allowing business managers to make
proactive knowledge-driven decisions
a) Data warehouse
b) Data mining
c) Datamarts
d) metadata
Ans: b
Explanation:
current data.
d) A system that is used to support decision making and is based on
historical data.
Ans: b
Explanation:
19. Good performance can be achieved in a data mart environment by extensive use of
a) Indexes
b) creating profile records
c) volumes of data
d) all of the above
Ans: d
Explanation:
20. Warehouse administrator is responsible for
a) Administrator
b) Maintenance
c) both a and b
d) none of the above
Ans c
Explanation:
21. What is data cube?
a) allows data to be modeled and viewed in multiple dimensions
b) data with dimensions
c) data values
d) description about data
Ans. a
23. Which of the following is not a multidimensional data model?
a) Star schema
b) Fact constellation
c) Snowflake schemas
d) Entity-relationship model
Ans d
Explanation Three models of data warehouse: star, snowflake and fact constellation
24. Snowflake schema consists of ___fact tables
a) One
b) Two
c) Three
d) four
Ans a
Explanation Having only one fact table and many dimension tables
25.Fact constellation consists of __ fact tables
a) one
b) two
c) three
d) many
Ans d
A) Data warehousing
B) Data mining
C) Text mining
D) Data selection
i) Data streams
v) Spatial data
A) i, ii, iii and v only
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
5. ............................. is a comparison of the general features of the target
class data objects against the general features of objects from one or
multiple contrasting classes.
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
A) cost-sensitive
B) work-sensitive
C) time-sensitive
D) technical-sensitive
A) Data Characterization
B) Data Classification
C) Data discrimination
D) Data selection
A) i, ii and iv only
A) Knowledge Database
A) Data
B) Information
C) Query
D) Useful information
Data Warehouse & Data Mining 700 - MCQ’s
6. The important aspect of the data warehouse environment is that data found within the data
warehouse is___________.
A. subject-oriented.
B. time-variant.
C. integrated.
D. All of the above.
ANSWER: D
7. The data is stored, retrieved & updated in ____________.
A. OLAP.
B. OLTP.
C. SMTP.
D. FTP.
ANSWER: B
11. ________________ defines the structure of the data held in operational databases and used
by operational applications.
A. User-level metadata.
B. Data warehouse metadata.
C. Operational metadata.
D. Data mining metadata.
ANSWER: C
13. _________maps the core warehouse metadata to business concepts, familiar and useful to end
users.
A. Application level metadata.
B. User level metadata.
C. Enduser level metadata.
D. Core level metadata.
ANSWER: A
14. Data can be updated in _____environment.
A. data warehouse.
B. data mining.
C. operational.
D. informational.
ANSWER: C
20. The term that is not associated with data cleaning process is ______.
A. domain consistency.
B. deduplication.
C. disambiguation.
D. segmentation.
ANSWER: D
ANSWER: C
26. _______________ helps to integrate, maintain and view the contents of the data warehousing
system.
A. Business directory.
B. Information directory.
C. Data dictionary.
D. Database.
ANSWER: B
28. Data marts that incorporate data mining tools to extract sets of data are called ______.
A. independent data mart.
B. dependent data marts.
C. intra-entry data mart.
D. inter-entry data mart.
ANSWER: B
29. A directory to help the DSS analyst locate the contents of the data warehouse is seen in ______.
A. Current detail data.
B. Lightly summarized data.
C. Metadata.
D. Older detail data.
ANSWER: C
32. Which of the following is not the other name of Data mining?
A. Exploratory data analysis.
B. Data driven discovery.
C. Deductive learning.
D. Data integration.
ANSWER: D
ANSWER: B
38. __________ is used to map a data item to a real valued prediction variable.
A. Regression.
B. Time series analysis.
C. Prediction.
D. Classification.
ANSWER: B
44. Treating incorrect or missing data is called as ___________.
A. selection.
B. preprocessing.
C. transformation.
D. interpretation.
ANSWER: B
45. Converting data from different sources into a common format for processing is called as
________.
A. selection.
B. preprocessing.
C. transformation.
D. interpretation.
ANSWER: C
49. __________ is used to proceed from very specific knowledge to more general information.
A. Induction.
B. Compression.
C. Approximation.
D. Substitution.
ANSWER: A
D. Summarization.
ANSWER: C
53. The ____________ of data could result in the disclosure of information that is deemed to be
confidential.
A. authorized use.
B. unauthorized use.
C. authenticated use.
D. unauthenticated use.
ANSWER: B
54. ___________ data are noisy and have many missing attribute values.
A. Preprocessed.
B. Cleaned.
C. Real-world.
D. Transformed.
ANSWER: C
55. __________ describes the discovery of useful information from the web contents.
A. Web content mining.
B. Web structure mining.
C. Web usage mining.
D. Web development.
ANSWER: A
56. _______ is concerned with discovering the model underlying the link structures of the web.
A. Web content mining.
B. Web structure mining.
C. Web usage mining.
D. Web development.
ANSWER: B
57. A _____ algorithm takes all the data at once and tries to create a hypothesis based on this data.
A. supervised.
B. batch learning.
C. unsupervised.
D. incremental learning.
ANSWER: B
58. A ________ algorithm takes a new piece of information at each learning cycle and tries to revise
the theory using
new data.
A. supervised.
B. batch learning.
C. unsupervised.
D. incremental learning.
ANSWER: B
59. ________ is used to find the vaguely known data.
A. SQL.
B. KDD.
C. Data mining.
D. Sybase.
ANSWER: C
60. The easiest way to gain access to the data and facilitate effective decision making is to set up a
_______.
A. database.
B. data mart.
C. data warehouse.
D. operational.
ANSWER: C
65. The _________ techniques are used to load information from operational database to data
warehouse.
A. reengineering.
B. reverse.
C. transfer.
D. replication.
ANSWER: D
66. In machine learning, the ________ phase tries to find the patterns from observations.
A. observation
B. theory
C. analysis
D. prediction
ANSWER: C
68. The ________ is used to express the hypothesis describing the concept.
A. computer language.
B. algorithm.
C. definition.
D. theory
ANSWER: A
70. The results of machine learning algorithms always have to be checked for their _________.
A. observations.
B. calculations
C. programs.
D. statistical relevance.
ANSWER: D
INTERMEDIATE QUESTIONS
79. Business Intelligence and data warehousing is not used for ________.
A. Forecasting.
B. Data Mining.
C. Analysis of large volumes of product sales data.
D. Discarding data.
ANSWER: D
80. Classification rules are extracted from _____________.
A. root node.
B. decision tree.
C. siblings.
D. branches.
ANSWER: B
81. Reducing the number of attributes to solve the high dimensionality problem is called as
________.
A. dimensionality curse.
B. dimensionality reduction.
C. cleaning.
D. Overfitting.
ANSWER: B
82. Data that are not of interest to the data mining task is called as ______.
A. missing data.
B. changing data.
C. irrelevant data.
D. noisy data.
ANSWER: C
84. Which of the following is not a desirable feature of any efficient algorithm?
A. to reduce number of input operations.
B. to reduce number of output operations.
C. to be efficient in computing.
D. to have maximal code length.
ANSWER: D
85. All set of items whose support is greater than the user-specified minimum support are called as
A. border set.
B. frequent set.
C. maximal frequent set.
D. lattice.
ANSWER: B
D. operational data.
ANSWER: C
94. Which of the following is closely related to statistical significance and transparency?
A. Classification Accuracy.
B. Transparency.
C. Statistical significance.
D. Search Complexity.
ANSWER: B
95. ________ is the technique which is used for discovering patterns in dataset at the beginning of
data mining process.
A. Kohenon map.
B. Visualization.
C. OLAP.
D. SQL.
ANSWER: B
100. Data mining is used to refer ______ stage in knowledge discovery in database.
A. selection.
B. retrieving.
C. discovery.
D. coding.
ANSWER: C
D. Poppers law.
ANSWER: A
103. The algorithms that are controlled by human during their execution is _______ algorithm.
A. unsupervised.
B. supervised.
C. batch learning.
D. incremental.
ANSWER: B
ADVANCED QUESTIONS
105. Dimensionality reduction reduces the data set size by removing ____________.
A. relevant attributes.
B. irrelevant attributes.
C. derived attributes.
D. composite attributes.
ANSWER: B
106. The main organizational justification for implementing a data warehouse is to provide ______.
A. cheaper ways of handling transportation.
B. decision support.
C. storing large volume of data.
D. access to data.
ANSWER: C
108. __________ are designed to overcome any limitations placed on the warehouse by the nature
of the relational data model.
A. Operational database.
B. Relational database.
C. Multidimensional database.
D. Data repository.
ANSWER: C
109. If a set is a frequent set and no superset of this set is a frequent set, then it is called ________.
A. maximal frequent set.
B. border set.
C. lattice.
D. infrequent sets.
ANSWER: A
110. The goal of _____ is to discover both the dense and sparse regions of a data set.
A. Association rule.
B. Classification.
C. Clustering.
D. Genetic Algorithm.
ANSWER: C
111. Rule based classification algorithms generate ______ rule to perform the classification.
A. if-then.
B. while.
C. do while.
D. switch.
ANSWER: A
112. ___________ training may be used when a clear link between input data sets and target output
values does not exist.
A. Competitive.
B. Perception.
C. Supervised.
D. Unsupervised.
ANSWER: D
113. Web content mining describes the discovery of useful information from the _______contents.
A. text.
B. web.
C. page.
D. level.
ANSWER: B
116. In web mining, _______ is used to find natural groupings of users, pages, etc.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: A
117. In web mining, _________ is used to know which URLs tend to be requested together.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: B
118. The ___________ engine for a data warehouse supports query-triggered usage of data
A. NNTP
B. SMTP
C. OLAP
D. POP
ANSWER: C
119. ________ displays of data such as maps, charts and other graphical representation allow data
to be presented compactly to the users.
A. Hidden
B. Visual
C. Obscured
D. Concealed
ANSWER: B
120. Which of the following are the important qualities of good learning algorithm.
A. Consistent, Complete.
B. Information content, Complex.
C. Complete, Complex.
D. Transparent, Complex.
ANSWER: A
121. The _______ is a symbolic representation of facts or ideas from which information can
potentially be extracted.
A. knowledge.
B. data.
C. algorithm.
D. program.
ANSWER: B
123. The main organizational justification for implementing a data warehouse is to provide ______.
A. cheaper ways of handling transportation.
B. decision support.
C. storing large volume of data.
D. access to data.
ANSWER: C
124. The process of finding the right formal representation of a certain body of knowledge in order to
represent it in a knowledge-based system is __________.
A. re-engineering.
B. replication.
C. knowledge engineering.
D. reverse engineering.
ANSWER: C
126. ________analysis divides data into groups that are meaningful, useful, or both.
A. Cluster.
B. Association.
C. Classification.
D. Relation.
ANSWER: A
130. Nominal and ordinal attributes are collectively referred to as_________ attributes.
A. qualitative.
B. perfect.
C. consistent.
D. optimized.
ANSWER: A
C. data object.
D. template.
ANSWER: C
136. Nominal and ordinal attributes are collectively referred to as_________ attributes.
A. qualitative.
B. perfect.
C. consistent.
D. optimized.
ANSWER: A
139. ___________ is used for discrete target variable.
A. Nominal.
B. Classification.
C. Clustering.
D. Association.
ANSWER: B
INTERMEDIATE QUESTIONS
144. The term that is not associated with data cleaning process is ______.
A. domain consistency.
B. de-duplication.
C. disambiguation.
D. segmentation.
ANSWER: D
The _____ is a useful method of discovering patterns at the beginning of data mining process.
A. calculating distance.
B. visualization techniques.
C. decision trees.
D. association rules.
ANSWER: B
145. Data mining methodology states that in optimal situation data mining is an _____.
A. standard process.
B. complete process.
C. creative process.
D. ongoing process.
ANSWER: D
150. Data marts that incorporate data mining tools to extract sets of data is called______.
A. independent data mart.
B. dependent data marts.
C. intra-entry data mart.
D. inter-entry data mart.
ANSWER: B
D. Semisupervised learning
ANSWER : B
156. Which one of the following is not a part of empirical cycle in scientific research?
A. Observation
B. Theory.
C. Self learning.
D. Prediction.
ANSWER: C
157. In machine learning, the ________ phase tries to find the patterns from observations.
A. observation
B. theory
C. analysis
D. prediction
ANSWER: C
ANSWER: D
158. Data warehouse architecture is based on ______________.
A. DBMS.
B. RDBMS.
C. Sybase.
D. SQL Server.
ANSWER: B
ADVANCED QUESTIONS
160. ________ is the type of pollution that is difficult to trace.
A. Duplication of records.
B. Ambiguity.
C. Lack of domain consistency.
D. Lack of information.
ANSWER: C
167. The _________ is knowledge that can be found by using a pattern recognition algorithm.
A. hidden knowledge.
B. deep.
C. shallow.
D. multidimensional.
ANSWER: A
169. Which of the following features usually applies to data in a data warehouse
A. Data are often deleted.
B. Most applications consist of transactions.
C. Data are rarely deleted.
D. Relatively few records are processed by applications.
ANSWER: C
175. Which of the following is an extract process
A. Capturing all of the data contained in various operational systems.
B. Capturing a subset of the data contained in various operational systems.
C. Capturing all of the data contained in various decision support systems.
D. Capturing a subset of the data contained in various decision support systems.
ANSWER: B
179. Which of the given technology is not well-suited for data mining
A. Expert system technology.
B. Data visualization.
C. Technology limited to specific data types such as numeric data types.
D. Parallel architecture.
ANSWER: C
181. Which of the following function involves data cleaning, data standardizing and summarizing
A. Storing data.
B. Transforming data.
C. Data acquisition.
D. Data Access.
ANSWER: B
182. Which of the following problems bog down the development of data mining projects
A. Financial problem.
B. Lack of technical assistance.
C. Lack of long-term vision.
D. Legal and privacy restrictions.
ANSWER: C
186. You are given data about seismic activity in Japan, and you want to predict a magnitude of the
next earthquake, this is an example of
A. Supervised learning
B. Unsupervised learning
C. Serration
D. Dimensionality reduction
ANSWER: A
187. Algorithm is
A. It uses machine-learning technique. Here a program can learn from past experience.
B. Computational procedure that takes some values as input and produces some values as output
C. Science of making machines perform tasks that would require intelligence when performed by
humans
D. Processing procedure
ANSWER: A
ANSWER: C
192. Analysis of variance is a statistical method of comparing the ________ of several populations.
A. standard deviations
B. variances
C. means
D. proportions
Answer: A
II M.Sc(IT) [2012-2014]
Semester III
Core: Data Warehousing and Mining - 363U1
Multiple Choice Questions.
4. The important aspect of the data warehouse environment is that data found within the data
warehouse is___________.
A. subject-oriented.
B. time-variant.
C. integrated.
D. All of the above.
ANSWER: D
D. FTP.
ANSWER: B
8. ____________predicts future trends & behaviors, allowing business managers to make proactive,
knowledge-driven decisions.
A. Data warehouse.
B. Data mining.
C. Datamarts.
D. Metadata.
ANSWER: B
11. ________________defines the structure of the data held in operational databases and used by
operational applications.
A. User-level metadata.
B. Data warehouse metadata.
C. Operational metadata.
D. Data mining metadata.
ANSWER: C
13. _________maps the core warehouse metadata to business concepts, familiar and useful to end
users.
A. Application level metadata.
B. User level metadata.
C. Enduser level metadata.
D. Core level metadata.
ANSWER: A
19. The key used in operational environment may not have an element of__________.
A. time.
B. cost.
C. frequency.
D. quality.
ANSWER: A
D. data warehouse
ANSWER: D
23. Data warehouse contains_____________data that is never found in the operational environment.
A. normalized.
B. informational.
C. summary.
D. denormalized.
ANSWER: C
24. Data redundancy between the environments results in less than ____________percent.
A. one.
B. two.
C. three.
D. four.
ANSWER: A
25. Bill Inmon has estimated___________of the time required to build a data warehouse, is consumed
in the conversion process.
A. 10 percent.
B. 20 percent.
C. 40 percent
D. 80 percent.
ANSWER: D
29. The biggest drawback of the level indicator in the classic star-schema is that it limits_________.
A. quantify.
B. qualify.
C. flexibility.
D. ability.
ANSWER: C
B. Data Mining.
C. Analysis of large volumes of product sales data.
D. All of the above.
ANSWER: D
45. The data administration subsystem helps you perform all of the following, except__________.
A. backups and recovery.
B. query optimization.
C. security management.
D. create, change, and delete information.
ANSWER: D
46. The most common source of change data in refreshing a data warehouse is _______.
A. queryable change data.
B. cooperative change data.
C. logged change data.
D. snapshot change data.
ANSWER: A
47. ________ are responsible for running queries and reports against data warehouse tables.
A. Hardware.
B. Software.
C. End users.
D. Middle ware.
ANSWER: C
50. Dimensionality reduction reduces the data set size by removing ____________.
A. relevant attributes.
B. irrelevant attributes.
C. derived attributes.
D. composite attributes.
ANSWER: B
52. Effect of one attribute value on a given class is independent of values of other attribute is called
_________.
A. value independence.
B. class conditional independence.
C. conditional independence.
D. unconditional independence.
ANSWER: A
53. The main organizational justification for implementing a data warehouse is to provide ______.
A. cheaper ways of handling transportation.
B. decision support.
C. storing large volume of data.
D. access to data.
ANSWER: C
D. Symmetric Microprogramming.
ANSWER: A
60. __________ are designed to overcome any limitations placed on the warehouse by the nature of the
relational data model.
A. Operational database.
B. Relational database.
C. Multidimensional database.
D. Data repository.
ANSWER: C
61. __________ are designed to overcome any limitations placed on the warehouse by the nature of the
relational data model.
A. Operational database.
B. Relational database.
C. Multidimensional database.
D. Data repository.
ANSWER: C
68. The term that is not associated with data cleaning process is ______.
A. domain consistency.
B. deduplication.
C. disambiguation.
D. segmentation.
ANSWER: D
74. The terms equality and roll up are associated with ____________.
A. OLAP.
B. visualization.
C. data mart.
D. decision tree.
ANSWER: C
79. The first International conference on KDD was held in the year _____________.
A. 1996.
B. 1997.
C. 1995.
D. 1994.
ANSWER: C
81. ____________ contains information that gives users an easy-to-understand perspective of the
information stored in the data warehouse.
A. Business metadata.
B. Technical metadata.
C. Operational metadata.
D. Financial metadata.
ANSWER: A
82. _______________ helps to integrate, maintain and view the contents of the data warehousing
system.
A. Business directory.
B. Information directory.
C. Data dictionary.
D. Database.
ANSWER: B
84. Data marts that incorporate data mining tools to extract sets of data are called ______.
A. independent data mart.
B. dependent data marts.
C. intra-entry data mart.
D. inter-entry data mart.
ANSWER: B
85. ____________ can generate programs itself, enabling it to carry out new tasks.
A. Automated system.
B. Decision making system.
C. Self-learning system.
D. Productivity system.
ANSWER: D
87. Building the informational database is done with the help of _______.
A. transformation or propagation tools.
B. transformation tools only.
C. propagation tools only.
D. extraction tools.
ANSWER: A
90. ________ is data that is distilled from the low level of detail found at the current detailed level.
A. Highly summarized data.
B. Lightly summarized data.
C. Metadata.
D. Older detail data.
ANSWER: B
92. A directory to help the DSS analyst locate the contents of the data warehouse is seen in ______.
A. Current detail data.
B. Lightly summarized data.
C. Metadata.
D. Older detail data.
ANSWER: C
95. The data from the operational environment enter _______ of data warehouse.
A. Current detail data.
B. Older detail data.
C. Lightly summarized data.
D. Highly summarized data.
ANSWER: A
96. The data in current detail level resides till ________ event occurs.
A. purge.
B. summarization.
C. archived.
D. all of the above.
ANSWER: D
D. units of measures.
ANSWER: B
98. The granularity of the fact is the _____ of detail at which it is recorded.
A. transformation.
B. summarization.
C. level.
D. transformation and summarization.
ANSWER: C
101. ___________ of data means that the attributes within a given entity are fully dependent on the
entire primary key of the entity.
A. Additivity.
B. Granularity.
C. Functional dependency.
D. Dimensionality.
ANSWER: C
105. Non-additive measures can often combined with additive measures to create new _________.
A. additive measures.
B. non-additive measures.
C. partially additive.
D. All of the above.
ANSWER: A
106. A fact representing cumulative sales units over a day at a store for a product is a _________.
A. additive fact.
B. fully additive fact.
C. partially additive fact.
D. non-additive fact.
ANSWER: B
107. ____________ of data means that the attributes within a given entity are fully dependent on the
entire primary key of the entity.
A. Additivity.
B. Granularity.
C. Functional Dependency.
D. Dependency.
ANSWER: C
ANSWER: B
114. __________ is used to map a data item to a real valued prediction variable.
A. Regression.
B. Time series analysis.
C. Prediction.
D. Classification.
ANSWER: B
C. five.
D. six.
ANSWER: C
122. Converting data from different sources into a common format for processing is called as ________.
A. selection.
B. preprocessing.
C. transformation.
D. interpretation.
ANSWER: C
126. __________ is used to proceed from very specific knowledge to more general information.
A. Induction.
B. Compression.
C. Approximation.
D. Substitution.
ANSWER: A
127. Describing some characteristics of a set of data by a general model is viewed as ____________
A. Induction.
B. Compression.
C. Approximation.
D. Summarization.
ANSWER: B
A. Induction.
B. Compression.
C. Approximation.
D. Summarization.
ANSWER: C
129. _______ are needed to identify training data and desired results.
A. Programmers.
B. Designers.
C. Users.
D. Administrators.
ANSWER: C
134. The ____________ of data could result in the disclosure of information that is deemed to be
confidential.
A. authorized use.
B. unauthorized use.
C. authenticated use.
D. unauthenticated use.
ANSWER: B
135. ___________ data are noisy and have many missing attribute values.
A. Preprocessed.
B. Cleaned.
C. Real-world.
D. Transformed.
ANSWER: C
139. Reducing the number of attributes to solve the high dimensionality problem is called as ________.
A. dimensionality curse.
B. dimensionality reduction.
C. cleaning.
D. Overfitting.
ANSWER: B
140. Data that are not of interest to the data mining task is called as ______.
A. missing data.
B. changing data.
C. irrelevant data.
D. noisy data.
ANSWER: C
C. marketing strategies.
D. All of the above.
ANSWER: D
146. The value that says that transactions in D that support X also support Y is called ______________.
A. confidence.
B. support.
C. support count.
D. None of the above.
ANSWER: A
147. If T consists of 500000 transactions, 20000 transactions contain bread, 30000 transactions contain
jam, and 10000 transactions contain both bread and jam. Then the support of bread and jam is _______.
A. 2%
B. 20%
C. 3%
D. 30%
ANSWER: A
148. If T consists of 500000 transactions, 20000 transactions contain bread, 30000 transactions contain
jam, and 10000 transactions contain both bread and jam. Then the confidence of buying bread with jam is
_______.
A. 33.33%
B. 66.66%
C. 45%
D. 50%
ANSWER: D
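Worked check for questions 147 and 148:

total, bread, jam, both = 500000, 20000, 30000, 10000
support = both / total          # 10000 / 500000 = 0.02, i.e. 2%
confidence = both / bread       # 10000 / 20000  = 0.50, i.e. 50%
print(support, confidence)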
ANSWER: A
151. Which of the following is not a desirable feature of any efficient algorithm?
A. to reduce number of input operations.
B. to reduce number of output operations.
C. to be efficient in computing.
D. to have maximal code length.
ANSWER: D
152. All set of items whose support is greater than the user-specified minimum support are called as
_____________.
A. border set.
B. frequent set.
C. maximal frequent set.
D. lattice.
ANSWER: B
153. If a set is a frequent set and no superset of this set is a frequent set, then it is called ________.
A. maximal frequent set.
B. border set.
C. lattice.
D. infrequent sets.
ANSWER: A
156. If an itemset is not a frequent set and no superset of this is a frequent set, then it is _______.
A. Maximal frequent set
B. Border set.
C. Upward closure property.
D. Downward closure property.
ANSWER: B
161. The _______ step eliminates the extensions of (k-1)-itemsets which are not found to be frequent,
from being considered for counting support.
A. Candidate generation.
B. Pruning.
C. Partitioning.
D. Itemset eliminations.
ANSWER: B
162. The a priori frequent itemset discovery algorithm moves _______ in the lattice.
A. upward.
B. downward.
C. breadthwise.
D. both upward and downward.
ANSWER: A
167. Itemsets in the ______ category of structures have a counter and the stop number with them.
A. Dashed.
B. Circle.
C. Box.
D. Solid.
ANSWER: A
168. The itemsets in the _______category structures are not subjected to any counting.
A. Dashes.
B. Box.
C. Solid.
D. Circle.
ANSWER: C
169. Certain itemsets in the dashed circle whose support count reach support value during an iteration
move into the ______.
A. Dashed box.
B. Solid circle.
C. Solid box.
D. None of the above.
ANSWER: A
170. Certain itemsets enter afresh into the system and get into the _______, which are essentially the
supersets of the itemsets that move from the dashed circle to the dashed box.
A. Dashed box.
B. Solid circle.
C. Solid box.
D. Dashed circle.
ANSWER: D
171. The itemsets that have completed on full pass move from dashed circle to ________.
A. Dashed box.
B. Solid circle.
C. Solid box.
D. None of the above.
ANSWER: B
B. a frequent-item-header table.
C. a frequent-item-node.
D. both A & B.
ANSWER: D
176. The paths from root node to the nodes labelled 'a' are called __________.
A. transformed prefix path.
B. suffix subpath.
C. transformed suffix path.
D. prefix subpath.
ANSWER: D
177. The transformed prefix paths of a node 'a' form a truncated database of patterns which co-occur
with 'a', and this is called _______.
A. suffix path.
B. FP-tree.
C. conditional pattern base.
D. prefix path.
ANSWER: C
178. The goal of _____ is to discover both the dense and sparse regions of a data set.
A. Association rule.
B. Classification.
C. Clustering.
D. Genetic Algorithm.
ANSWER: C
180. _______ clustering technique start with as many clusters as there are records, with each cluster
having only one record.
A. Agglomerative.
B. divisive.
C. Partition.
D. Numeric.
ANSWER: A
181. __________ clustering techniques starts with all records in one cluster and then try to split that
cluster into small pieces.
A. Agglomerative.
B. Divisive.
C. Partition.
D. Numeric.
ANSWER: B
182. Which of the following is a data set in the popular UCI machine-learning repository?
A. CLARA.
B. CACTUS.
C. STIRR.
D. MUSHROOM.
ANSWER: D
183. In ________ algorithm each cluster is represented by the center of gravity of the cluster.
A. k-medoid.
B. k-means.
C. STIRR.
D. ROCK.
ANSWER: B
184. In ___________ each cluster is represented by one of the objects of the cluster located near the
center.
A. k-medoid.
B. k-means.
C. STIRR.
D. ROCK.
ANSWER: A
189. The cluster features of different subclusters are maintained in a tree called ___________.
A. CF tree.
B. FP tree.
C. FP growth tree.
D. B tree.
ANSWER: A
190. The ________ algorithm is based on the observation that the frequent sets are normally very few
in number compared to the set of all itemsets.
A. A priori.
B. Clustering.
C. Association rule.
D. Partition.
ANSWER: D
191. The partition algorithm uses _______ scans of the databases to discover all frequent sets.
A. two.
B. four.
C. six.
D. eight.
ANSWER: A
192. The basic idea of the apriori algorithm is to generate________ item sets of a particular size &
scans the database.
A. candidate.
B. primary.
C. secondary.
D. superkey.
ANSWER: A
193. ________is the most well known association rule algorithm and is used in most commercial
products.
A. Apriori algorithm.
B. Partition algorithm.
C. Distributed algorithm.
D. Pincer-search algorithm.
ANSWER: A
194. An algorithm called________is used to generate the candidate item sets for each pass after the
first.
A. apriori.
B. apriori-gen.
C. sampling.
D. partition.
ANSWER: B
195. The basic partition algorithm reduces the number of database scans to ________ & divides it into
partitions.
A. one.
B. two.
C. three.
D. four.
ANSWER: B
197. ___________can be thought of as classifying an attribute value into one of a set of possible
classes.
A. Estimation.
B. Prediction.
C. Identification.
D. Clarification.
ANSWER: B
199. _________data consists of sample input data as well as the classification assignment for the data.
A. Missing.
B. Measuring.
C. Non-training.
D. Training.
ANSWER: D
200. Rule based classification algorithms generate ______ rule to perform the classification.
A. if-then.
B. while.
C. do while.
D. switch.
ANSWER: A
201. ____________ are a different paradigm for computing which draws its inspiration from
neuroscience.
A. Computer networks.
B. Neural networks.
C. Mobile networks.
D. Artificial networks.
ANSWER: B
D. muscles.
ANSWER: A
204. The ___________is a long, single fibre that originates from the cell body.
A. axon.
B. neuron.
C. dendrites.
D. strands.
ANSWER: A
207. _________ is the connectivity of the neuron that gives simple devices their real power.
A. Water.
B. Air.
C. Power.
D. Fire.
ANSWER: D
209. The biological neuron's _________ is a continuous function rather than a step function.
A. read.
B. write.
C. output.
D. input.
ANSWER: C
210. The threshold function is replaced by continuous functions called ________ functions.
A. activation.
B. deactivation.
C. dynamic.
D. standard.
ANSWER: A
213. In a feed-forward network, the connections between layers are ___________ from input to
output.
A. bidirectional.
B. unidirectional.
C. multidirectional.
D. directional.
ANSWER: B
217. RBF hidden layer units have a receptive field which has a ____________; that is, a particular input
value at which they have a maximal output.
A. top.
B. bottom.
C. centre.
D. border.
ANSWER: C
218. ___________ training may be used when a clear link between input data sets and target output
values does not exist.
A. Competitive.
B. Perception.
C. Supervised.
D. Unsupervised.
ANSWER: D
220. ________________ design involves deciding on their centres and the sharpness of their Gaussians.
A. DR.
B. AND.
C. XOR.
D. RBF.
ANSWER: D
223. ____________ is one of the most popular models in the unsupervised framework.
A. SOM.
B. SAM.
C. OSM.
D. MSO.
ANSWER: A
224. The actual amount of reduction at each learning step may be guided by _________.
A. learning cost.
B. learning level.
C. learning rate.
D. learning time.
ANSWER: C
B. Teuvokohonen.
C. Tomoki Toda.
D. Julia.
ANSWER: B
227. Investment analysis used in neural networks is to predict the movement of _________ from
previous data.
A. engines.
B. stock.
C. patterns.
D. models.
ANSWER: B
228. SOMs are used to cluster a specific _____________ dataset containing information about the
patient's drugs etc.
A. physical.
B. logical.
C. medical.
D. technical.
ANSWER: C
231. Genetic algorithms are search algorithms based on the mechanics of natural_______.
A. systems.
B. genetics.
C. logistics.
D. statistics.
ANSWER: B
ANSWER: A
239. Web content mining describes the discovery of useful information from the _______contents.
A. text.
B. web.
C. page.
D. level.
ANSWER: B
C. meta.
D. digital.
ANSWER: B
241. _______ mining is concerned with discovering the model underlying the link structures of the web.
A. Data structure.
B. Web structure.
C. Text structure.
D. Image structure.
ANSWER: B
243. The ________ proposes a measure of the standing of a node based on path counting.
A. open web.
B. close web.
C. link web.
D. hidden web.
ANSWER: B
244. In web mining, _______ is used to find natural groupings of users, pages, etc.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: A
245. In web mining, _________ is used to know the order in which URLs tend to be accessed.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: C
246. In web mining, _________ is used to know which URLs tend to be requested together.
A. clustering.
B. associations.
C. sequential analysis.
D. classification.
ANSWER: B
247. __________ describes the discovery of useful information from the web contents.
A. Web content mining.
B. Web structure mining.
C. Web usage mining.
D. All of the above.
ANSWER: A
248. _______ is concerned with discovering the model underlying the link structures of the web.
249. A link is said to be _________ link if it is between pages with different domain names.
A. intrinsic.
B. transverse.
C. direct.
D. contrast.
ANSWER: B
250. A link is said to be _______ link if it is between pages with the same domain name.
A. intrinsic.
B. transverse.
C. direct.
D. contrast.
ANSWER: A
Staff Name
LAXMI.SREE.B.R.
Q.0 (1 mark): To integrate heterogeneous databases, how many approaches are there in Data Warehousing?
A. 2   B. 3   C. 4   D. 5
ANSWER: 2 — data warehousing involves data cleaning, data integration, and data consolidation; to integrate heterogeneous databases there are two approaches: the query-driven approach and the update-driven approach.
Q.1 (1 mark): __________ refers to the description and model regularities or trends for objects whose behavior changes over time.
A. Evolution Analysis   B. Outlier Analysis   C. Prediction   D. Classification
ANSWER: Evolution Analysis
Q.2 (1 mark): The mapping or classification of a class with some predefined group or class is known as?
A. Data Discrimination   B. Data Characterization   C. Data Set   D. Data Sub Structure
ANSWER: Data Discrimination
Q.3 (1 mark): In which step of Knowledge Discovery, multiple data sources are combined?
A. Data Integration   B. Data Cleaning   C. Data Selection   D. Data Transformation
ANSWER: Data Integration
Q.4 (1 mark): What is the strategic value of data mining?
A. Time-sensitive   B. Work-sensitive   C. Cost-sensitive   D. Technical-sensitive
ANSWER: Time-sensitive
Q.5 (2 marks): The first step involved in knowledge discovery is?
A. Data Cleaning   B. Data Selection   C. Data Transformation   D. Data Integration
ANSWER: Data Integration
Q.6 (2 marks): Which of the following is not a data mining functionality?
A. Selection and interpretation   B. Classification and regression   C. Characterization and discrimination   D. Clustering and analysis
ANSWER: Selection and interpretation
Q.7 (2 marks): In Data Characterization, the class under study is called as?
A. Target Class   B. Initial Class   C. Study Class   D. Final Class
ANSWER: Target Class — data characterization summarizes the data of the class under study, and this class is called the target class.
Q.8 (2 marks): Capability of data mining is to build ___________ models.
A. Predictive   B. Interrogative   C. Retrospective   D. Imperative
ANSWER: Predictive
Q.9 (2 marks): "Handling of relational and complex types of data" issue comes under?
A. Diverse Data Types Issues   B. Performance Issues   C. Mining Methodology and User Interaction Issues   D. None
ANSWER: Diverse Data Types Issues — a database may contain complex data objects, multimedia data objects, spatial data, temporal data, etc., and one system cannot mine all these kinds of data.
Q.10 (2 marks): What is true about data mining?
A. All   B. Data mining also involves other processes such as data cleaning, data integration, and data transformation   C. Data mining is defined as the procedure of extracting information from huge sets of data   D. Data mining is the procedure of mining knowledge from data
ANSWER: All — data mining extracts information from huge sets of data; in other words, it is the procedure of mining knowledge from data so that the extracted information or knowledge can be used.
Q.11 (2 marks): What is KDD?
A. Knowledge Discovery Database   B. Knowledge Data House   C. Knowledge Data Definition   D. Knowledge Data Discovery
ANSWER: Knowledge Discovery Database
Q.12 (2 marks): Which of the following is the correct application of data mining?
A. All   B. Corporate Analysis & Risk Management   C. Fraud Detection   D. Market Analysis and Management
ANSWER: All — data mining is highly useful in market analysis and management, corporate analysis & risk management, and fraud detection.
Q.13 (2 marks): Which of the following is not a data mining metric?
A. All   B. Time complexity   C. ROI   D. Space complexity
ANSWER: All — these are algorithm metrics.
Q.14 (2 marks): DMQL stands for?
A. Data Mining Query Language   B. Dataset Mining Query Language   C. DBMiner Query Language   D. Data Marts Query Language
ANSWER: Data Mining Query Language — DMQL was proposed by Han, Fu, Wang, et al. for the DBMiner data mining system.
marks question A B C D ans
Mining of
Correlations: It is
a kind of
additional analysis
performed to
uncover interesting
statistical
correlations
The analysis performed to uncover interesting
Mining of Mining of Mining of between
15 2 statistical correlations between associated- None
Correlations Clusters Association associated-
attribute-value pairs is called?
attribute-value
pairs or between
two item sets to
analyze that if they
have positive,
negative, or no
effect on each
other.
Data cleaning is a
technique that is
applied to remove
the noisy data and
correct the
inconsistencies in
data. Data
cleaning involves
Correct the Transformations to transformations to
To remove the
16 2 What is the use of data cleaning? All inconsistencies in correct the wrong correct the wrong
noisy data
data data. data. Data
cleaning is
performed as a
data
preprocessing
step while
preparing the data
for a data
warehouse.
In order to
effectively extract
the information
Mining from a huge
"Efficiency and scalability of data mining Performance Methodology Diverse Data Types amount of data in
17 2 None
algorithms" issues come under? Issues and User Issues databases, the
Interaction Issues data mining
algorithm must be
efficient and
scalable.
All are the
Sales promotion Marketing Inventory
18 3 Data mining helps in __________. All properties of data
strategies. strategies. management.
mining
Outlier Analysis:
Outliers may be
defined as the
__________ may be defined as the data objects data objects that
Evolution
19 3 that do not comply with the general behavior or Outlier Analysis Prediction Classification do not comply
Analysis
model of the data available. with the general
behavior or model
of the data
available.
……………………….. is a comparison of the
Data
general features of the target class data objects Data Data
20 3 Data discrimination Data selection discrimination is
against the general features of objects from one Classification Characterization
the feature
or multiple contrasting classes.
marks question A B C D ans
Frequent
Subsequence: A
sequence of
patterns that occur
A sequence of patterns that occur frequently is Frequent . Frequent Item Frequent Sub All of the
21 3 frequently such as
known as? Subsequence Set Structure above
purchasing a
camera is
followed by a
memory card.
Data mining is an
-------- is an essential process where intelligent Data
22 3 Data mining Text mining Data selection essential process
methods are applied to extract data patterns. warehousing
where AI is used.
Pattern evaluation:
The patterns
discovered should
Mining
be interesting
Methodology and Performance Diverse Data Types None of the
23 3 Does the pattern evaluation issue come under? because either
User Interaction Issues Issues above
they represent
Issues
common
knowledge or lack
of novelty.
What predicts future trends & behaviors, Data mining
24 3 allowing business managers to make Data mining. Data warehouse. Datamarts. Metadata. predicts future
proactive,knowledge-driven decisions. trends.
All the above are
Which of the following is the other name of Data Data-driven Exploratory
25 3 All Deductive learning. the name of data
mining? discovery. data analysis.
mining
There are two
categories of
functions involved
How many categories of functions involved in
26 3 2 3 4 5 in Data Mining: 1.
Data Mining?
Descriptive, 2.
Classification and
Prediction
A data mining
system can be
classified
according to the
following criteria:
Database
Does Data Mining System Classification consist Machine Database
27 3 All Information Science Technology,
of? Learning Technology
Statistics,
Machine Learning,
Information
Science,
Visualization,
Other Disciplines
The Query
It is very Driven All statements are
This approach is
Which of the following is the correct inefficient and Approach a disadvantage of
expensive for
28 3 disadvantage of the Query-Driven Approach in All very expensive needs complex the Query-Driven
queries that require
Data Warehousing? for frequent integration and Approach in Data
aggregations.
queries. filtering Warehousing.
processes.
The data can be
copied,
Both A and B are
processed,
the advantages of
Which of the following is the correct advantage integrated, This approach
the Update-
29 3 of the Update-Driven Approach in Data Both A and B annotated, provides high None
Driven Approach
Warehousing? summarized, and performance.
in Data
restructured in
Warehousing.
the semantic data
store in advance.
Q.30 (1 mark): SELECT item_name, color, clothes_size, SUM(quantity) FROM sales GROUP BY ROLLUP(item_name, color, clothes_size); — how many groupings are possible in this rollup?
A. 4   B. 8   C. 2   D. 1
ANSWER: 4 — { (item_name, color, clothes_size), (item_name, color), (item_name), () }.
Q.31 (1 mark): The operation of changing the dimensions used in a cross-tab is called ________
A. Pivoting   B. Alteration   C. Piloting   D. Renewing
ANSWER: Pivoting
Q.32 (1 mark): OLAP stands for
A. Online analytical processing   B. Online analysis processing   C. Online transaction processing   D. Online aggregate processing
ANSWER: Online analytical processing — OLAP is the manipulation of information to support decision making.
Q.33 (1 mark): State true or false: In OLAP, analysts cannot view a dimension in different levels of detail.
A. False   B. True
ANSWER: False — the different levels of detail are classified into a hierarchy, so analysts can view a dimension at different levels of detail.
Q.34 (1 mark): Data that can be modeled as dimension attributes and measure attributes are called _______ data.
A. Multidimensional   B. Singledimensional   C. Measured   D. Dimensional
ANSWER: Multidimensional — given a relation used for data analysis, we can identify some of its attributes as measure attributes, since they measure some value and can be aggregated upon.
Q.35 (1 mark): Business Intelligence and data warehousing is used for ________.
A. All   B. Data Mining   C. Analysis of large volumes of product sales data   D. Forecasting
ANSWER: All
Q.36 (1 mark): The operation of moving from coarser granular data to finer granular data is called _______
A. Drill down   B. Increment   C. Rollback   D. Reduction
ANSWER: Drill down — OLAP systems permit users to view the data at any level of granularity; moving from coarser granular data to finer granular data is called drill-down.
Q.37 (2 marks): The operation of moving from finer-granularity data to a coarser granularity (using aggregation) is called a ________
A. Rollup   B. Drill down   C. Dicing   D. Pivoting
ANSWER: Rollup — the opposite operation, moving from coarser-granularity data to finer-granularity data, is called a drill down.
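Illustrative sketch for Q.30 above: GROUP BY ROLLUP on three columns produces four grouping sets (every prefix of the column list plus the grand total). The helper name and data below are assumptions chosen for illustration, not part of the question paper.

# Enumerate the grouping sets produced by GROUP BY ROLLUP(col1, ..., colN).
# ROLLUP yields N+1 prefixes of the column list, from the full list down to ().
def rollup_grouping_sets(columns):
    return [tuple(columns[:i]) for i in range(len(columns), -1, -1)]

sets = rollup_grouping_sets(["item_name", "color", "clothes_size"])
print(len(sets), sets)
# 4 [('item_name', 'color', 'clothes_size'), ('item_name', 'color'), ('item_name',), ()]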
marks question A B C D ans
OLAP systems
can be
implemented as
client-server
State true or false: OLAP systems can be systems. Most of
38 2 "True" "False" None None
implemented as client-server systems the current OLAP
systems are
implemented as
client-server
systems.
Data that can be
modeled as
Data that can be modelled as dimension dimension
Multi-dimensional Mono-
39 2 attributes and measure attributes are called Measurable data Efficient data attributes and
data dimensional data
___________ measure attributes
are called multi-
dimensional data.
The slice
operation selects
one particular
dimension from a
given cube and
The process of viewing the cross-tab (Single Both Slicing provides a new
40 2 Slicing Dicing Pivoting
dimensional) with a fixed value of one attribute is and Dicing sub-cube. Dice
selects two or
more dimensions
from a given cube
and provides a
new sub-cube.
The time horizon in Data warehouse is usually 5 to 10 years is
41 2 5-10 years. 3-4 years 5-6 years. 1-2 years.
__________. the horizon time
Cross-tabs
enables analysts to
view two
How many dimensions of multi-dimensional data dimensions of
42 2 2 1 3 None
do cross tabs enable analysts to view? multi-dimensional
data, along with
the summaries of
the data.
Operational OLAP support
43 2 What do data warehouses support? OLAP OLTP OLAP and OLTP
databases data warehouses
RDBMS is the
Data warehouse architecture is based on
44 2 RDBMS DBMS Sybase. SQL Server data warehouse
______________.
architecture.
What does collector_type_id stands for in the
collector_type_uid
following code snippet?
45 2 uniqueidentifier membership role directory None is the GUID for
core.sp_remove_collector_type [
the collector type.
@collector_type_uid = ] ‘collector_type_uid’
Each cell in the
cube is identified
The generalization of cross-tab which is
Two-dimensional Multidimensional for the values for
46 2 represented visually is ____________ which is N-dimensional cube Cuboid
cube cube the three-
also called as a data cube.
dimensional
attributes.
Operational
The source of all data warehouse data is Operational Informal Formal Technology environment is the
47 2
the____________. environment. environment. environment. environment source of data
warehouse
marks question A B C D ans
A normalized
histogram. p(rk) =
nk / n\nWhere, n
is total number of
pixels in image, rk
the kth gray level
What is the sum of all components of a
48 3 1 -1 0 None and nk total pixels
normalized histogram?
with gray level
rk.\nHere, p(rk)
gives the
probability of
occurrence of
rk.\n
HOLAP means
Hybrid OLAP,
MOLAP means
multidimensional
Which of the following OLAP systems do not OLAP, ROLAP
49 3 None MOLAP ROLAP HOLAP
exist? means relational
OLAP. This
means all of the
above OLAP
systems exist.
We want to add the following capabilities to
Table2: show the data for 3 age groups (20-39, Between 40
Between 10 and 30 More than 100 is
40-60, over 60), 3 revenue groups (less than and 60
50 3 More than 100 4 (boundaries the capabilities to
$10,000, $10,000-$30,000, over $30,000) and (boundaries
includeD. Table2
add a new type of account: Money market. The includeD.
total number of measures will be:
The decode
function allows
substitution of
values in an
attribute of a
tuple. The decode
The _______ function allows substitution of function does not
51 3 Decode Unknown Cube Substitute
values in an attribute of a tuple always work as
we might like for
null values
because
predicates on null
values evaluate to
unknown.
OLAP systems
permit users to
view the data at
any level of
The operation of moving from finer granular data granularity. The
52 3 Roll up Increment Reduction Drill down
to coarser granular data is called _______ process of moving
from finer granular
data to coarser
granular data is
called as a roll-up.
OLAP is the
The ___________ engine for a data warehouse
53 3 OLAP SMTP NNTP POP engine of data
supports query-triggered usage of data
warehouse
Pivot
(sum(quantity) for
54 3 In SQL the cross-tabs are created using Slice Dice Pivot All color in
(’dark’,’pastel’,’
white’)).
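Illustrative sketch for Q.48 above: the normalized histogram is p(r_k) = n_k / n, so its components always sum to 1. The toy pixel values below are assumed data used only to demonstrate the formula.

from collections import Counter

pixels = [0, 0, 1, 2, 2, 2, 3, 3]            # toy "image" of gray levels (assumed data)
n = len(pixels)
counts = Counter(pixels)                     # n_k for each gray level r_k
p = {level: count / n for level, count in counts.items()}
print(p)                 # {0: 0.25, 1: 0.125, 2: 0.375, 3: 0.25}
print(sum(p.values()))   # 1.0 -> the normalized histogram sums to 1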
marks question A B C D ans
DECODE The right synatax
DECODE DECODE (search, for DECODE is
DECODE (search,
(expression, (expression, expression, DECODE
Which one of the following is the right syntax for result [, search,
55 3 search, result [, result [, search, result [, (expression,
DECODE? result]… [, default],
search, result]… [, result]… [, search, search, result [,
expression)
default]) default], search) result]… [, search, result]…
default]) [, default])
800,000 is value
at the intersection
The value at the intersection of the row labeled
of the row labeled
56 3 "India" and the column "Savings" in Table2 should 800000 300000 200000 300000
"India" and the
be:
column "Savings"
in Table2
The heart of data
Data warehouse Data mining Data mart database Relational data warehouse is Data
57 3 __________ is the heart of the warehouse.
database servers database servers. servers. base servers. warehouse
database servers.
{ (item name, color, clothes size), (item name,
Group by the Group by ‘Group by cube’
58 3 color), (item name, clothes size), (color, clothes None Group by
cubic rollup is used.
size), (item name), (color), (clothes size), () }
The data
59 3 The data Warehouse is__________. Read-only. Write only. Read write only None warehouse is
read-only
Unsupervised data
Unsupervised data Supervised data Depends on the
60 1 Cluster analysis is a type of ... ? Can not say mining is the
mining mining data
cluster analysis
High All are the
61 1 Challenges of clustering includes? All Scalability Noisy data dimensionality challenges of
of data clustering
You should
Continuous – Continuous – choose a
Binary – manhattan
62 1 Which of the following combination is incorrect? None correlation euclidean distance/similarity
distance
similarity distance that makes sense
for your problem.
Hierarchical
Hierarchical clustering should be primarily used
63 1 "True" "False" None None clustering is
for exploration.
deterministic.
Reduction of All mention are
In clustering high dimensional data comes with Reduction in Increase in
64 1 All algorithm the problems od
problems like? algorithm efficiency complexity
performance clustering
Hierarchical
Which of the following clustering requires clustering requires
65 1 Hierarchical Partitional Naive Bayes None
merging approach? a defined distance
as well.
K-means
Which of the following is required by K-means Number of Initial guess as to Defined clustering follows
66 1 All
clustering? clusters cluster centroids distance metric the partitioning
approach.
Hierarchical
Which clustering procedure is characterized by Hierarchical Optimizing Partition based Density
67 1 clustering is tree
the formation of a tree like structure? clustering partitioning clustering clustering
like structure.
k-means
k-means clustering k-nearest
k-nearest neighbor clustering is a
aims to partition n neighbor has
68 1 Point out the wrong statement. is same as k- none er method of
observations into k nothing to do with
means vector
clusters k-means.
quantization
Dissimilarity
A metric that is
means metric used
used to measure A metric that is used
69 2 What is dissimilarity? Both a and b None in clustering and
the closeness of in clustering.
closeness of
objects.
objects.
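Illustrative sketch for the K-means questions above (e.g. Q.66): the algorithm needs the number of clusters, an initial guess for the centroids, and a distance metric. This is a minimal, assumed implementation on toy data using Euclidean distance; empty-cluster handling and convergence checks are deliberately omitted.

import numpy as np

def kmeans(X, k, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # initial guess: k random points
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of the points assigned to it
        # (no special handling of empty clusters in this sketch).
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])   # toy data (assumed)
labels, centroids = kmeans(X, k=2)
print(labels, centroids)   # two clusters: one near (1.1, 0.9), one near (5.1, 4.95)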
marks question A B C D ans
Formulating the
Data
The most important part of ... is selecting the Formulating the Deciding the Analysing the clustering problem
70 2 preprocessing for
attributes on which clustering is done? clustering problem clustering procedure cluster is the imporatant
clustering
part of clustering.
K-means
clustering
K-means is not deterministic and it also consists
71 2 "True" "False" None None produces the final
of number of iterations.
estimate of cluster
centroids.
Non-hierarchical
Non-hierarchical Optimizing Agglomerative clustering is called
72 2 k-means clustering is also referred to as ....? Divisive clustering
clustering partitioning clustering as k-means
clustering
Partition All other are the
73 2 Which is not a type of clustering? Decision driven Similarity based Density based
Based type of clustering
Hierarchical
Tree showing how
Which of the following is finally produced by Final estimate of Assignment of each clustering is an
74 2 close things are to All
Hierarchical Clustering? cluster centroids point to clusters agglomerative
each other
approach.
Derivative is not a
Which of the following is not clustering
75 2 Derivative Agglomerative Partitioning Density Based clustering
technique?
technique.
K-means requires
Which of the following function is used for k-
76 2 k-means k-mean heatmap None a number of
means clustering?
clusters.
Cluster
analysis
reduces the
number of
In clustering,
In clustering, larger objects, not
The dendrogram Clustering should be larger the distance
Which of the below sentences is true with respect the distance the the number of
77 2 is read from right done on samples of the more similar
of clustering? more similar the variables, by
to left 300 or more the object is true
object grouping them
for clustering.
into a much
smaller
number of
clusters
Unsupervised is a
78 2 Clustering is what type of learning? Unsupervised supervised Semi-supervised None
type of learning
Some elements
The choice of may be close to
In general, the an appropriate one another
Hierarchical
merges and splits metric will according to one
79 2 Point out the correct statement. All clustering is also
are determined in a influence the distance and
called HCA
greedy manner shape of the farther away
clusters according to
another.
Hierarchical
clustering is
Hierarchical clustering is slower than non-
80 2 "True" "False" Depends on data Can not say slower than non-
hierarchical clustering?
hierarchical
clustering
It does not fit in It does not fit in
It does not fit in It does not fit in
81 3 When does a model is said to do over-fitting? both current and None future state is a
future state current state
future state model.
The group of
Group of similar
Group objects Simplification of similar objects
objects with
having a similar data to make it with significant
significant
82 3 What is a cluster? feature from a ready for a None dissimilarity with
dissimilarity with
group of similar classification objects of other
objects of other
objects. algorithm. groups is called as
groups
cluster
marks question A B C D ans
Cluster analysis is
not classify
Which method of analysis does not classify Discriminant Regression
83 3 Cluster analysis Analysis of variance variables as
variables as dependent or independent? analysis analysis
dependent or
independent
Groups are not
Groups are not Groups are Depends on the
84 3 In clustering ? Can not say predefined in
predefined predefined data
clustering
All are the
85 3 Which of the following are clustering techniques? All Density Based Partitioning Agglomerative clustering
techniques.
Process of Process of Clustering is a
None of the
86 3 What is clustering? grouping similar classifying new Both a and b group of similar
above
objects object objects
Not sure about Clusters are
Noise and outliers All are the density
87 3 When is density based clustering preferred? All the number of irregular or
are present based clustering
clusters present intertwined
Euclidean distance
In the K-means clustering algorithm the distance
is the k-means
88 3 between cluster centroid to each object is Euclidean distance Cluster distance Cluster width None
clustering
calculated using ....method.
algorithm.
Partitioning is
Dynamic
Which technique finds the frequent itemsets in technique that
89 1 Partitioning Sampling Hashing itemset
just two database scans? finds the frequent
counting
itemsets
Finding of strong
Finding of strong
Using association to association rules
association rules Same as frequent
90 1 What is association rule mining? analyse correlation None using frequent
using frequent itemset mining
rules itemsets is an
itemsets
assoication rule.
An itemset which
An itemset which is is both closed and
An itemsetwhose no proper super-itemset has A frequent A closed itemsetA
91 1 both closed and None frequent are
same support is closed itemsets itemset closed itemset
frequent closed frequent
itemsets.
Apriori uses
Both apriori and Both apriori and Apriori uses Both apriori and
vertical and
FP-Growth uses FP-Growth uses horizontal and FP- FP-Growth uses
92 1 Which of the following is true? FP-Growth
horizontal data vertical data Growth uses vertical horizontal data
uses horizontal
format format data format format is true
data format
Support is
Some itemsets will Some itemsets will reduced by some
The number of
add to the current become infrequent itemsets will add
93 1 What will happen if support is reduced? frequent itemsets Can not say
set of frequent while others will to the current set
remains same
itemsets become frequent of frequent
itemsets
Support(A B) / Support(A B) / Support(A B) / Support(A B)
94 1 How do you calculate Confidence(A -> B)? None
Support (A) Support (B) Support (A) / Support (B)
The Apriori
If a rule is
If a rule is algorithm works
infrequent, its
What is the principle on which Apriori algorithm infrequent, its on if a rule is
95 1 generalized rules Both a and b None
work? specialized rules infrequent, its
are also
are also infrequent specialized rules
infrequent
are also infrequent
Apriori algorithm
It mines all It mines all
works on It mines
frequent patterns frequent patterns
None of the all frequent
96 1 What does Apriori algorithm through pruning through pruning Both a and b
above patterns through
rules with lesser rules with higher
pruning rules with
support support
lesser support
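Illustrative sketch for Q.90 and Q.94 above, which rest on the definitions support(A) = |transactions containing A| / |all transactions| and confidence(A -> B) = support(A ∪ B) / support(A). The transaction set below is assumed toy data.

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]        # toy data (assumed)

def support(itemset):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread"}))                # 0.75
print(confidence({"bread"}, {"milk"}))   # 0.666... = support({bread, milk}) / support({bread})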
marks question A B C D ans
A frequent
A frequent A frequent A non-frequent itemsetwhose no
itemsetwhose no itemset whose itemset whose super-itemset is
97 2 What are maximal frequent itemsets? None
super-itemset is super-itemset is super-itemset is frequent is
frequent also frequent frequent maximal frequent
itemsets.
It mines
There are frequent It expands the
It expands the
chances that FP FP trees are very itemsets original database
98 2 What is not true about FP growth algorithms? original database
trees may not fit expensive to build . without to build FP trees
to build FP trees.
in the memory. candidate is not true
generation.
Decision trees is
Which of these is not a frequent pattern mining not a frequent
99 2 Decision trees FP growth Apriori Eclat
algorithm? pattern mining
algorithm
This clustering algorithm terminates when mean K-Means
values computed for the current iteration of the K-Means Conceptual Expectation Agglomerative clustering is the
100 2
algorithm are identical to the computed mean clustering clustering maximization clustering current iteration of
values for the previous iteration the algorithm.
Which of the following is not null invariant
lift is not null
101 2 measure(that does not considers null lift max_confidence cosine measure all_confidence
invariant measure
transactions)?
Absolute - Absolute -
Minimum support Minimum support
What is the difference between absolute and count threshold threshold and
102 2 Both mean same None None
relative support? and Relative - Relative -
Minimum support Minimum support
threshold count threshold
No we cannot use
Can FP growth algorithm be used if FP tree None of the
103 2 No Yes Both a and b FP growth
cannot be fit in memory? above
algorithm
An
An itemset for An itemsetwhose
An itemsetwhose An itemset for itemsetwhose
which at least no proper super-
no proper super- which at least no proper
104 2 What are closed itemsets? one proper itemset has same
itemset has same super-itemset has super-itemset
super-itemset has support is closed
support same confidence has same
same support itemsets
confidence
It mines all FP growth
It mines all It mines all frequent
frequent patterns algorithm do all
frequent patterns patterns through
105 2 What does FP growth algorithm do? through pruning All frequent patterns
by constructing a pruning rules with
rules with higher by constructing a
FP tree lesser support
support FP tree.
Number of
transactions
not containing
Support (A)
A Number of A / Total
means Number of
transactions Total Number of Total number of number of
transactions
106 2 What do you mean by support(A)? containing A / transactions not transactions transactions
containing A /
Total number of containing A containing Ans: Number
Total number of
transactions of transactions
transactions
containing A /
Total number
of transactions
Find all strong association rules given the support Cannot be
107 2 → I5, → → I5, → → I2 Null rule set None
is 0.6 and confidence is 0.8. determined
If it only satisfies If it satisfies both
There are
If it satisfies both min_support If it min_support and
When do you consider an association rule If it only satisfies other
108 3 min_support and satisfies both min_confidence
interesting? min_confidence measures to
min_confidence min_support and association rule
check so
min_confidence works
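Illustrative sketch of the level-wise Apriori idea referred to in the questions above (candidates pruned by minimum support, because an infrequent itemset cannot have frequent supersets). This is a simplified, assumed version on toy data; the full subset-based candidate pruning of Apriori is omitted.

from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]   # toy data (assumed)
min_support = 0.5

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
frequent = {frozenset([i]) for i in items if support({i}) >= min_support}
level = set(frequent)
k = 2
while level:
    # Join step: combine frequent (k-1)-itemsets into k-item candidates, then prune by support.
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    level = {c for c in candidates if support(c) >= min_support}
    frequent |= level
    k += 1

print(sorted(tuple(sorted(s)) for s in frequent))
# [('a',), ('a', 'b'), ('a', 'c'), ('b',), ('b', 'c'), ('c',)] -- {'a','b','c'} is pruned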
marks question A B C D ans
A frequent
itemset 'P' is a both are ture
When both a and b Support (P) = When a is true
109 3 When is sub-itemset pruning done? proper subset of when sub-itemset
is true Support(Q) and b is not
another frequent pruning is done
itemset 'Q'
Some association
rules will add to
Some association
Some association the current set of
Number of rules will become
What is the effect of reducing min confidence rules will add to association rules is
110 3 association rules invalid while others Can not say
criteria on the same? the current set of the effect of
remains same. might become a
association rules reducing min
rule.
confidence criteria
on the same
Market Basket
Analysis is direct
Which of the following is direct application of Market Basket Social Network Intrusion
111 3 Outlier Detection application of
frequent itemset mining? Analysis Analysis Detection
frequent itemset
mining.
Why is correlation analysis important?\nFor To weed out
To restrict the
questions given below consider the data To weed out To find large uninteresting
To make apriori number of
112 3 Transactions :\n1. I1, I2, I3, I4, I5, I6\n2. I7, I2, uninteresting number of frequent itemsets
memory efficient database
I3, I4, I5, I6\n3. I1, I8, I4, I5\n4. I1, I9, I10, I4, frequent itemsets interesting itemsets is correlation
iterations
I6\n5. I10, I2, I4, I11, I5 analysis
Apriori algorithm
Bottom-up and Top-down and Bottom-up and Top-down and works in bottom-
113 3 The apriori algorithm works in a ..and ..fashion?
breath-first breath-first depth-first depth-first up and breath-first
fashion.
FP growth
algorithm requires
114 3 Which algorithm requires fewer scans of data? FP growth Apriori Both a and b None
fewer scans of
data
115 3 Find odd man out: DBSCAN K mean PAM K medoid None
All techniques are
What techniques can be used to improve the Transaction Hash-based used to improve
116 3 All Partitioning
efficiency of apriori algorithm? Reduction techniques the efficiency of
apriori algorithm
Relation between
candidate and
A frequent itemset A candidate
What is the relation between candidate and No relation between frequent itemsets
117 3 must be a itemset is always Both are same
frequent itemsets? the two is frequent itemset
candidate itemset a frequent itemset
must be a
candidate itemset
Pattern evaluation
Measures to
measure are
What are Max_confidence, Cosine similarity, Pattern evaluation improve Frequent pattern
118 3 None Max_confidence,
All_confidence? measure efficiency of mining algorithms
Cosine similarity,
apriori
All_confidence
119 1 End Nodes are represented by __________ Triangles Squares Disks Circles None
Multivariate split is where the partitioning of
120 1 tuples is based on a combination of attributes "True" "False" None None None
rather than on a single attribute.
Unsupervised Supervised Reinforcement Missing data
121 1 Self-organizing maps are an example of None
learning learning learning imputation
Regression can
Assume you want to perform supervised learning
predict number of
and to predict number of newborns according to Structural
newborns
122 1 size of storks’ population Regression Classification Clustering equation
according to size
(http://www.brixtonhealth.com/storksBabies.pdf), modeling
of storks’
it is an example of
population
marks question A B C D ans
Some telecommunication company wants to Unsupervised
segment their customers into distinct groups to Unupervised Supervised learning is
123 1 Data extraction Serration
send appropriate subscription offers, this is an learning learning telecommunication
example of company
Decision Nodes are represented by
124 1 Squares Disks Circles Triangles None
____________
Outcome is the
In the example of predicting number of babies
example of
125 1 based on storks’ population size, number of Outcome Feature Attribute Observation
predicting
babies is
numbers.
126 1 Cost complexity pruning algorithm is used in? CART C4 ID3 All None
Attribute selection
Attribute selection measures are also known as measures are also
127 1 "True" "False" None None
splitting rules. known as splitting
rules
By pruning the
Both By pruning the
longer rules you
How will you counter over-fitting in decision By pruning the By creating new longer rules’ and ‘ None of the
128 1 can counter over-
tree? longer rules rules By creating new options
fitting in decision
rules’
tree
Gain ratio tends to
prefer unbalanced
Gain ratio tends to prefer unbalanced splits in
splits in which one
129 1 which one partition is much smaller than the "True" "False" None None
partition is much
other.
smaller than the
other.
If...then... analysis
is the best suit the
Which of the following classifications would best
Market-basket Cluster student
130 2 suit the student performance classification If...then... analysis Regression analysis
analysis analysis performance
systems?
classification
systems
A _________ is a decision support tool that uses
Refer the
a tree-like graph or model of decisions and their Neural
131 2 Decision tree Graphs Trees definition of
possible consequences, including chance event Networks
Decision tree.
outcomes, resource costs, and utility.
CART is the cost
132 2 Cost complexity pruning algorithm is used in? CART C4.5 ID3 All
complexity used
Unsupervised
The problem of finding hidden structure in Unupervised Supervised Reinforcement Data
133 2 learning is
unlabeled data is called learning learning learning extraction
unlabeled data
You are given data about seismic activity in
Supervised Unsupervised Dimensionality
134 2 Japan, and you want to predict a magnitude of Serration None
learning learning reduction
the next earthquake, this is in an example of
Greedy approach
What is the approach of basic algorithm for is basic algorithm
135 2 Greedy Top Down Procedural Step by Step
decision tree induction? for decision tree
induction
Choose from the following that are Decision Tree Decision
136 2 All End Nodes Chance Nodes None
nodes? Nodes
In pre-pruning
A pruning set of The best pruned
a tree is
class labelled tree is the one that
'pruned' by All statements are
137 2 Which of the following sentences are true? All tuples is used to minimizes the
halting its true
estimate cost number of encoding
construction
complexity. bits.
early.
Data
Which of the following is not involve in data Knowledge Data Data transformation is
138 2 Data exploration
mining? extraction transformation archaeology not involved in
data mining
marks question A B C D ans
Q.139 (2 marks): Gini index does not favour equal sized partitions.
A. False   B. True
ANSWER: False — the Gini index favours equal sized partitions.
Q.140 (3 marks): What is Decision Tree?
A. Flow-chart, and a structure in which an internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label   B. Flow-Chart   C. Structure in which an internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label   D. None
ANSWER: Both of these together (refer to the definition of a decision tree).
Q.141 (3 marks): Which one of these is not a tree based learner?
A. Bayesian classifier   B. ID3   C. CART   D. Random Forest
ANSWER: Bayesian classifier
Q.142 (3 marks): Task of inferring a model from labeled training data is called
A. Supervised learning   B. Unsupervised learning   C. Reinforcement learning   D. Complex learning
ANSWER: Supervised learning
Q.143 (3 marks): What are the two steps of tree pruning work?
A. Postpruning and Prepruning   B. Pessimistic pruning and Optimistic pruning   C. Cost complexity pruning and time complexity pruning   D. None of the options
ANSWER: Postpruning and Prepruning
Q.144 (3 marks): What are tree-based classifiers?
A. Classifiers that perform a series of condition checks with one attribute at a time   B. Classifiers which form a tree with each attribute at one level   C. Both   D. None
ANSWER: Both
Q.145 (3 marks): Which one of these is a tree based learner?
A. Random Forest   B. Bayesian classifier   C. Bayesian Belief Network   D. Rule based
ANSWER: Random Forest
Q.146 (3 marks): Which of the following are the advantage/s of Decision Trees?
A. All   B. Use a white box model; if a given result is provided by a model   C. Worst, best and expected values can be determined for different scenarios   D. Possible scenarios can be added
ANSWER: All
Q.147 (3 marks): When the number of classes is large, the Gini index is not a good choice.
A. True   B. False
ANSWER: True
Q.148 (3 marks): Discriminating between spam and ham e-mails is a classification task, true or false?
A. True   B. False
ANSWER: True
Q.149 (1 mark): Point out the wrong statement.
A. Simple random sampling of time series is probably the best way to resample time series data   B. Horizon parameter is the number of consecutive values in the test set sample   C. Three parameters are used for time series splitting   D. All
ANSWER: Simple random sampling of time series is probably not the best way to resample time series data.
Q.150 (1 mark): The cluster sampling, stratified sampling or systematic samplings are types of ________
A. Random sampling   B. Non random sampling   C. Indirect sampling   D. Direct sampling
ANSWER: Random sampling
Q.151 (1 mark): Which of the following can be used to generate balanced cross-validation groupings from a set of data?
A. createFolds   B. createSample   C. createResample   D. None
ANSWER: createFolds — createResample can be used to make simple bootstrap samples.
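Illustrative sketch for the Gini index questions above (Q.139 and Q.147): Gini(D) = 1 − Σ p_i², where p_i is the fraction of tuples in class i; a pure partition has Gini 0 and an even two-class split has Gini 0.5. The class labels below are assumed toy data.

from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "yes"]))         # 0.0  (pure partition)
print(gini(["yes", "no", "yes", "no"]))    # 0.5  (maximally mixed for two classes)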
marks question A B C D ans
The unknown or
exact value that
represents the
whole population
Which of the following is classified as unknown
is called as
152 1 or exact value that represents the whole Parameter Guider Predictor Estimator
parameter.
population?
Generally
parameters are
defined by small
Roman symbols.
PCA is a
technique for
reducing the
dimensionality of
Which of the following is NOT supervised Naive large datasets,
153 1 PCA Decision Tree Linear Regression
learning? Bayesian increasing
interpretability but
at the same time
minimizing
information loss.
In judgement
sampling is carried
under an opinion
of an expert. The
In which of the following types of sampling the
Judgement Convenience Quota judgement
154 1 information is carried out under the opinion of an Purposive sampling
sampling sampling sampling sampling often
expert?
results in a bias
because of the
variance in the
expert opinion.
There are many
Which of the following package tools are present Pre-
155 1 All Feature selection Model tuning different modeling
in caret? processing
functions in R.
Which of the following can be used to create
Splitting is based
156 1 sub–samples using a maximum dissimilarity maxDissim minDissim inmaxDissim All
on the predictors.
approach?
Factors that affect
the performance
Which of the factors affect the performance of Good data Representation of learner system
157 2 Training scenario Type of feedback
learner system does not include? structures scheme used does not include
good data
structures.
If the argument to
this function is a
factor, the random
sampling occurs
Which of the following function can be used to within each class
158 2 createDataPartition newDataPartition renameDataPartition None
create balanced splits of the data? and should
preserve the
overall class
distribution of the
data.
In language
understanding, the
levels of
In language understanding, the levels of
159 2 Empirical Syntactic Phonological Logical knowledge that do
knowledge that does not include?
not include
empirical
knowledge.
Rolling forecasting
origin techniques
Which of the following function can create the
160 2 createTimeSlices newTimeSlices binTimeSlices None are associated
indices for time series type of splitting?
with time series
type of splitting.
marks question A B C D ans
In Cluster the
population is
divided into
various groups
The selected clusters in a clustering sampling are Proportional called as clusters.
161 2 Elementary units Primary units Secondary units
known as ________ units The selected
clusters in a
sample are called
as elementary
units.
p → Øq is not a
162 2 Among the following which is not a horn clause? p → Øq Øp V q p→q p
horn clause.
Entropy is a
measure of the
randomness in the
information being
processed. The
higher the entropy,
the harder it is to
High entropy means that the partitions in
163 2 Not pure Pure Useful Useless draw any
classification are
conclusions from
that
information.\nIt is
a measure of
disorder or purity
or unpredictability
or uncertainty.\n
Generally a
sample having 30
or more sample
values is called a
A sample size is considered large in which of the large sample. By
164 2 n > or = 30 n > or = 50 n < or = 30 n < or = 50
following cases? the Central Limit
Theorem such a
sample follows a
Normal
Distribution.
The method of
selecting a
desirable portion
from a population
The method of selecting a desirable portion from
that describes the
165 2 a population which describes the characteristics Sampling Segregating Dividing Implanting
characteristics of
of whole population is called as ________
the whole
population is
called as
Sampling.
Attributes are
statistically
dependent of one
Attributes are Attributes are
another given the
statistically Attributes are statistically Attributes can
Which of the following statements about Naive class value
166 2 dependent of one equally independent of one be nominal or
Bayes is incorrect? Attributes are
another given the important. another given the numeric
statistically
class value. class value.
independent of
one another given
the class value.
Sampling error is
inversely
proportional to the
Sampling error increases as we increase the sampling size. As
167 2 "False" "True" None None
sampling size. the sampling size
increases the
sampling error
decreases.
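Illustrative sketch for Q.163 above, which treats entropy as a measure of impurity or unpredictability: H = −Σ p_i log₂ p_i, computed here on assumed toy class labels.

from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["spam"] * 4))          # -0.0, i.e. zero entropy: a pure partition
print(entropy(["spam", "ham"] * 2))   # 1.0: two equally likely classes, maximally impure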
marks question A B C D ans
The function
dummyVars takes
The function
a formula and a
Caret includes dummyVars can be
Asymptotics data set and
several functions used to generate a
are used for outputs an object
168 2 Point out the correct statement. All to pre-process complete set of
inference that can be used
the predictor dummy variables
usually to create the
data from one or more
dummy variables
factors
using the predict
method.
In a sampling
distribution the
mean of the
population is equal
to the mean of the
If the mean of population is 29 then the mean of sampling
169 2 29 30 21 31
sampling distribution is __________ distribution.
Hence mean of
population=29.
Hence mean of
sampling
distribution=29.
The caret package
is a set of
functions that
Caret stands for classification and regression attempt to
170 3 "True" "False" None None
training. streamline the
process for
creating predictive
models.
Caret uses the
171 3 Caret does not use the proxy package. "False" "True" None None
proxy package.
A model of
language consists
A model of language consists of the categories Role structure of of the categories
172 3 Structural units System constraints Language units
which does not include? units which does not
include structural
units.
Study of
population is
called a Census.
Suppose we want to make a voters list for the
Hence for making
173 3 general elections 2019 then we require Census Sampling error Random error Simple error
a voter list for the
__________
general elections
2019 we require
Census.
Different learning
methods does not
174 3 Different learning methods does not include? Introduction Analogy Deduction Memorization
include the
introduction.
The density-based
clustering methods
recognize clusters
based on the
density function
Suppose we would like to perform clustering on
distribution of the
spatial data such as the geometrical locations of
Density-based Model-based K-means data object. For
175 3 houses. We wish to produce clusters of many Decision Trees
clustering clustering clustering clusters with
different sizes and shapes. Which of the following
arbitrary shapes,
methods is the most appropriate?
these algorithms
connect regions
with sufficiently
high densities into
clusters.
Q.176 (3 marks): Which of the following functions can be used to maximize the minimum dissimilarities?
A. All   B. minDiss   C. avgDiss   D. sumDiss
ANSWER: minDiss — sumDiss can be used to maximize the total dissimilarities.
Q.177 (3 marks): In a sampling distribution, what does the parameter k represent?
A. Sampling interval   B. Multi stage interval   C. Secondary interval   D. Sub stage interval
ANSWER: Sampling interval — it represents the distance between which data is taken.
Q.178 (3 marks): A machine learning problem involves four attributes plus a class. The attributes have 3, 2, 2, and 2 possible values each. The class has 3 possible values. How many maximum possible different examples are there?
A. 72   B. 24   C. 48   D. 12
ANSWER: 72 — the maximum number of different examples is the product of the possible values of each attribute and the number of classes: 3 × 2 × 2 × 2 × 3 = 72.
1. Data selection is:
A. The actual discovery phase of a knowledge discovery process
B. The stage of selecting the right data for a KDD process
C. A subject-oriented integrated time variant non-volatile collection of data in support of management
D. None of these
Answer: B
2. Discovery is:
A. It is hidden within a database and can only be recovered if one is given certain clues (an example is
encrypted information).
B. The process of extracting implicit, previously unknown and potentially useful information from data
C. An extremely complex molecule that occurs in human chromosomes and that carries genetic information
in the form of genes.
D. None of these
Answer: B
12. The process of removing the deficiencies and loopholes in the data is called as
A. Aggregation of data
B. Extracting of data
C. Cleaning up of data.
D. Loading of data
Answer: C
13. Which of the following processes includes data cleaning, data integration, data selection, data
transformation, data mining, pattern evaluation and knowledge presentation?
A. KDD process
B. ETL process
C. KTL process
D. MDX process
Answer: A
(A)
Data Cleaning
(B)
Data Transformation
(C)
Data Reduction
(D)
Data Integration
Answer:A
(A)
Data Reduction
(B)
Data Cleaning
(C)
Data Integration
(D)
Data Transformation
Answer:C
21. Data set {brown, black, blue, green, red} is an example of (select one):
A. Continuous attribute
B. Ordinal attribute
C. Numeric attribute
D. Nominal attribute
Answer:D
A.
This takes only two values. In general, these values will be 0 and 1 and .they can be coded as one bit
B.
The natural environment of a certain species
C.
Systems that can be used without knowledge of internal operations
D.
None of these
Answer:A
A.
A stage of the KDD process in which new data is added to the existing selection.
B.
The process of finding a solution for a problem simply by enumerating all possible solutions according to
some pre-defined order and then testing them
C.
The distance between two points as calculated using the Pythagoras theorem
D.None of These
24. If there is a very strong correlation between two variables, then the correlation coefficient must be
a. any value larger than 1
b. much smaller than 0, if the correlation is negative
c. much larger than 0, regardless of whether the correlation is negative or positive
d. None of these alternatives is correct.
Answer:B
Which of the following is a good alternative to the star schema?
A. Snowflake schema
B. Star schema
C. Star snowflake schema
D. Fact constellation
ANSWER: D
Patterns that can be discovered from a given database are of which type?
A. More than one type
B. Multiple types always
C. One type only
D. No specific type
ANSWER: A
The name of the table used for measuring similarity between objects
represented using 2 or more binary attributes is:
A. Square Matrix
B. Contingency Table
C. Triangular Matrix
D. None of the above
ANSWER: B
44. A large DB can be viewed as using ___________ to help uncover hidden information about the
data.
a) Search
b) Compression
51. ________ models describe the relationship between I/O through algebraic equation.
a) Parametric
b) Non-parametric
c) Static
d) Dynamic
52. _______ may also be used to estimate error.
a) Squared error
b) Root mean error
c) Mean Root square
d) Mean squared error
53. _________ assumes that a linear relationship exists between the input data and the output data.
a) Bivariate regression
b) Correlation
c) Multiple regression
d) Linear regression
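Illustrative sketch for question 53 above, which concerns the assumption of a linear relationship between input and output: a minimal ordinary-least-squares fit of y = a + b·x using the standard closed-form slope and intercept, on assumed toy data.

xs = [1.0, 2.0, 3.0, 4.0]            # toy inputs (assumed)
ys = [2.1, 4.0, 6.2, 7.9]            # toy outputs (assumed)
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
# Slope b = covariance(x, y) / variance(x); intercept a = mean_y - b * mean_x.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x
print(a, b)   # intercept ≈ 0.15, slope ≈ 1.96: y grows roughly linearly with x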
54. The _________ algorithm solves the estimation problem with incomplete data.
a) Expectation maximization
b) Expectation minimization
c) Summarization-maximization
d) Summarization minimization
55. Decision tree uses a _________ techniques.
a) Greedy
b) Divide & Conquer
c) Shortest Path
d) BFS
56. Null hypothesis and _______ hypothesis are two complementary hypothesis.
a) Classical
b) Testing
c) Alternative
d) None of the above
57. The BIAS of an estimator is the difference between ______ & _______ values.
a) Expected, actual
b) Actual, expected
c) Maximal, minimal
d) Minimal, maximal
58. An __________ estimator is one whose BIAS is 0.
a) Unbiased
b) Rule biased
c) Mean Root square
d) Mean squared error
62. In Box plot the Total range of the data value is divided into ________.
a) Regions
b) Quartiles
c) Divisions
d) Partitions
63. ________ measure is used instead of similarity measures.
a) Distance
b) Dissimilarity
c) Both a,b
d) None of the above
64. _________ relates the overlap to the average size of the two sets together.
a) Dice
b) Jaccard
c) Cosine
d) Overlap
65. ________ is used to measure the overlap of two sets as related to the whole set caused by their
union.
a) Dice
b) Jaccard
c) Cosine
d) Overlap
66. ________ coefficient relates the overlap to the geometric average of the two sets.
a) Dice
b) Jaccard
c) Cosine
d) Overlap
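Illustrative sketch for questions 64-66 above: the Dice, Jaccard, Cosine and Overlap coefficients for two finite sets, computed with the standard textbook formulas (Jaccard = overlap relative to the union, Dice = overlap relative to the average size, Cosine = overlap relative to the geometric average). The example sets are assumed toy data.

from math import sqrt

def similarities(a, b):
    inter = len(a & b)
    return {
        "jaccard": inter / len(a | b),               # overlap vs. the union of the two sets
        "dice": 2 * inter / (len(a) + len(b)),       # overlap vs. the average size of the sets
        "overlap": inter / min(len(a), len(b)),      # overlap vs. the smaller set
        "cosine": inter / sqrt(len(a) * len(b)),     # overlap vs. the geometric average size
    }

print(similarities({"data", "mining", "rules"}, {"data", "mining", "olap", "cube"}))
# {'jaccard': 0.4, 'dice': 0.571..., 'overlap': 0.666..., 'cosine': 0.577...}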
1 A 2 C 3 C 4 A 5 C 6 A 7 B 8 A 9 C 10 A
11 D 12 D 13 A 14 A 15 D 16 B 17 A 18 A 19 C 20 C
21 B 22 B 23 A 24 C 25 B 26 A 27 C 28 A 29 A 30 A
31 D 32 C 33 B 34 D 35 B 36 D 37 D 38 B 39 D 40 B
41 A 42 A 43 D 44 C 45 B 46 A 47 A 48 D 49 A 50 D
UNIT-II
51 A 52 B 53 D 54 A 55 B 56 C 57 A 58 A 59 A 60 A
61 C 62 B 63 C 64 A 65 B 66 C 67 D 68 B 69 B 70 C
71 C 72 B 73 A 74 C 75 C 76 B 77 B 78 C 79 B 80 A
81 A 82 B 83 A 84 C 85 B 86 C 87 B 88 A 89 B 90 A
91 C 92 A 93 A 94 B 95 C 96 C 97 D 98 B 99 B 100 A
UNIT-III
111 B 112 A 113 B 114 D 115 B 116 C 117 B 118 C 119 D 120 A
121 B 122 A 123 C 124 D 125 A 126 C 127 B 128 D 129 A 130 C
131 A 132 C 133 A 134 C 135 B 136 B 137 C 138 C 139 C 140 A
141 A 142 A 143 A 144 A 145 C 146 C 147 B 148 B 149 A 150 D
UNIT-IV
151 C 152 A 153 D 154 C 155 B 156 B 157 C 158 B 159 A 160 C
161 B 162 D 163 C 164 B 165 D 166 C 167 A 168 B 169 B 170 A
171 C 172 D 173 B 174 D 175 A 176 A 177 B 178 C 179 B 180 D
181 C 182 B 183 C 184 A 185 B 186 A 187 C 188 A 189 B 190 C
191 A 192 C 193 C 194 A 195 A 196 B 197 A 198 B 199 A 200 B
UNIT-V
211 B 212 D 213 C 214 A 215 D 216 B 217 C 218 C 219 A 220 C
221 C 222 A 223 A 224 B 225 D 226 A 227 B 228 A 229 C 230 A
231 A 232 D 233 A 234 A 235 A 236 A 237 B 238 A 239 B 240 C
241 A 242 B 243 A 244 C 245 A 246 A 247 C 248 C 249 D 250 B