Ban Quiz Answer
Week 2
1) Which of the following is a motivating challenge for developing data mining:
A. No one cares about data mining
B. Scalability
C. Algorithms are very complex
D. Boredom
2) Data mining, as a research discipline, does not draw ideas from this research discipline:
A. Statistics
B. Linguistics
C. Artificial Intelligence
D. Information Theory
4) Which of the following is an important data-related success factor in data mining efforts:
A. Data type(s) being mined.
B. Processing power available
C. Algorithm design
D. Descriptive statistics-based tools
7) The terms “Analytics” and “Big Data” have become essentially synonymous with the
term “data mining” in recent years.
Answer true
8) Search algorithms are the only AI techniques of interest to data mining researchers
Answer false
9) The term “noise” has a technical meaning in data mining referring to the distortion of
data from their true value and/or the addition of spurious objects.
Answer true
10) Effectively handling “noisy” data is the object of much data mining research, as data used
in the real world often contains significant amounts of noise, thus potentially
contaminating data mining results to the point where those results are useless.
Answer true
11) Cluster analysis is a means for discovering patterns in the data based on highly associated
features of the data
Answer false
12) Cluster analysis is a means for discovering and grouping together (clustering) sets of
observations that are closely related
Answer true
13) Association analysis is a means for discovering and grouping together (clustering) sets of
observations that are closely related
Answer false
Week 3
1) Which of the following is not a factor in data quality:
A. Accuracy
B. Completeness
C. Relevance
D. Timeliness
14) Which of the following data attributes is not one of the most common found in real-world
databases and data warehouses
A. Inaccuracy
B. Interpretability
C. Incompleteness
D. Timeliness
16) Dirty data can cause which of the following problems regarding data mining results
A. Distrust of the results by those who must rely on them to make important decisions
B. Inaccurate results
C. Incomplete results
D. All of the above
Answer false
19) Median is a summary statistic, where the function median(x) returns the middle value in a
data set with an odd number of values and the average of the two middle values if the
number of values is even.
Answer true
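The definition in question 19 can be written out as a short Python sketch. The function name `median` follows the question; the sample values are made up for illustration.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5])       # middle of 1, 5, 7
even_case = median([7, 1, 5, 3])   # mean of 3 and 5
```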
20) Visualization techniques are often specialized to the type of data being analyzed.
Answer true
21) A multidimensional representation of data with all possible totals (aggregates) is known
as a _data cube_.
22) If we aggregate over all the dimensions of a data set except for two, we are creating a two
dimensional table using a data reduction approach through aggregation known as
_pivoting_.
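As a rough illustration of question 22, the sketch below aggregates a three-dimensional data set over every dimension except two. The `SALES` records, the dimension order and the `pivot` helper are all hypothetical, chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical fact records: (product, location, month, units_sold).
SALES = [
    ("tea", "Oslo", "Jan", 4),
    ("tea", "Oslo", "Feb", 6),
    ("tea", "Rome", "Jan", 3),
    ("coffee", "Oslo", "Jan", 5),
    ("coffee", "Rome", "Feb", 2),
]


def pivot(records, row_dim, col_dim, measure=3):
    """Sum the measure over every dimension except row_dim and col_dim."""
    table = defaultdict(int)
    for record in records:
        table[(record[row_dim], record[col_dim])] += record[measure]
    return dict(table)


# Aggregating over month leaves a two-dimensional product x location table.
by_product_location = pivot(SALES, row_dim=0, col_dim=1)
```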
23) A _MOLAP_ is a database system that might use as its base data model a data cube
representation.
24) If we analyze monthly data in terms of the days of each month we are _drilling down_.
25) Two steps are necessary to define data in a multidimensional array representation:
_identification of dimensions_ and an _attribute_ that is the analysis focus.
Week 4
1) Which of the following is/are unordered categorical variable(s):
A. Gender(male, female)
B. Economic status (low, middle, high)
C. Set of even integers from n=2 to n=30
D. Hair color (brown, brown, red)
E. All of the above
26) The set of odd integers from n=5 to n=41 is which of these types of variables:
A. Categorical
B. Interval
C. Independent
D. Ordinal
E. Dependent
31) According to Han, Kamber and Pei, data warehouses are a multidimensional space for
storing data
Answer true
32) The four key words in William H. Inmon’s definition of a data warehouse that separate
them from other types of data storage structures are: subject-oriented, integrated,
time-variant and nonvolatile.
Answer true
33) A library maintained by a business does not fit William H. Inmon’s definition of a data
warehouse.
Answer false
35) Data warehouses are not a form of online analytical processing (OLAP) systems
Answer false
36) The term “data cube” refers to a data structure that is defined by dimensions and facts.
Answer true
Answer true
38) If one dimension of a data cube is “location”, described by the attributes number, street,
city, province_or_state, zip-code and country, then there is implicit in the location
dimension a concept hierarchy defined in the schema for that database.
Answer true
39) Concept hierarchies in a data warehouse represent easily materialized views of the data
using several non-interactive data cube operations.
Answer false
40) The slice and dice operations both form subcubes, but the slice is done on one dimension,
while the dice is done on multiple dimensions.
Answer true
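The difference described in question 40 can be sketched in Python. The `CUBE` contents, the `DIMS` order and both helper functions are assumptions made for this example, not part of any particular OLAP system.

```python
# Hypothetical cube cells keyed by (time, item, location) with a sales measure.
CUBE = {
    ("Q1", "phone", "Toronto"): 10,
    ("Q1", "laptop", "Toronto"): 4,
    ("Q2", "phone", "Toronto"): 7,
    ("Q2", "phone", "Vancouver"): 9,
    ("Q2", "laptop", "Vancouver"): 3,
}
DIMS = ("time", "item", "location")


def dice(cube, **selections):
    """Keep cells whose value on each named dimension is in the allowed set."""
    def keep(key):
        return all(
            key[DIMS.index(dim)] in allowed
            for dim, allowed in selections.items()
        )
    return {key: value for key, value in cube.items() if keep(key)}


def slice_(cube, dim, value):
    """A slice is a selection on exactly one dimension."""
    return dice(cube, **{dim: {value}})


q1 = slice_(CUBE, "time", "Q1")                       # one dimension fixed
q2_phones = dice(CUBE, time={"Q2"}, item={"phone"})   # two dimensions restricted
```

Both calls return a subcube; the slice simply restricts a single dimension, while the dice restricts several at once.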
Week 5
1) A key motivation for using a multidimensional approach to the data to be analyzed is that
aggregating data in multiple ways is:
A. Data cubes are an easy data structure for high-level managers to grasp
B. The computational efficiency of algorithms that work with more than two dimensions
is very high
C. Important
D. Non-existent, there is no key motivation to do so
46) Data cubes can store precomputed measures, such as aggregated values of dimensions of
an attribute such as daily sales totals.
Answer true
47) Exploratory multidimensional data mining is never an interactive process, given the
intense computations that are necessarily involved.
Answer false
Answer false
49) Preprocessing, or computing, the values to be stored in a given data cube allows for more
efficient real-time querying in a multidimensional online analytical processing
environment.
Answer true
Answer true
52) If we aggregate only those cells where the number of items bought by a particular
customer on a given day is greater than x, then the resulting partially materialized data
cube is an _iceberg_ data cube.
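A minimal sketch of the condition in question 52, assuming a purchase log with one (customer, day) entry per item bought. The `PURCHASES` data and the `iceberg_cells` helper are hypothetical.

```python
from collections import Counter

# Hypothetical purchase log: one (customer, day) pair per item bought.
PURCHASES = [
    ("ann", "Mon"), ("ann", "Mon"), ("ann", "Mon"),
    ("bob", "Mon"),
    ("bob", "Tue"), ("bob", "Tue"),
    ("ann", "Tue"),
]


def iceberg_cells(purchases, x):
    """Materialize only the (customer, day) cells whose item count exceeds x."""
    counts = Counter(purchases)
    return {cell: n for cell, n in counts.items() if n > x}


cells = iceberg_cells(PURCHASES, x=1)
```

Cells at or below the threshold are never stored, which is what makes the cube only partially materialized.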
53) Drill-down operations on a prediction cube are a computational challenge, given the need
to materialize cell values at many different granularity levels.
Answer true
Week 6
1) Which of the following is/are true of a rule extracted via association analysis (AA):
A. Always describes an association between / among two or more items that is
non-random, not based in chance
B. Can be, but is not always, reliably useful to predict future behaviors of the population
described by the data from which the rule is derived
C. Is often computationally expensive to discover via AA
D. Must always be assumed to be true in 100% of all future behaviors of the population
that is described by the data from which the rule was extracted
55) If the equation $\sigma(X) = |\{\, t_i \mid X \subseteq t_i,\ t_i \in T \,\}|$ represents the value known as the support
count in association analysis, then
A. It must be true that $t_i$ represents a member of the itemset X
B. The higher the value of $\sigma(X)$, the less likely it is that the
itemset is meaningful in the final analysis results
C. $\sigma(X)$ cannot be the support value, because the set T is clearly always going to
be equal to the null set
D. X is the itemset for which we are trying to determine the number of transactions,
$t_i$, in the data set of transactions, T, that contain X
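The support count in question 55 is the number of transactions in T that contain every item of X. A small Python sketch, with a made-up transaction set `T` and a helper named `support_count` for illustration:

```python
def support_count(itemset, transactions):
    """Count the transactions in T that contain every item of X."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


# Hypothetical market-basket transactions.
T = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

milk_diapers = support_count({"milk", "diapers"}, T)
```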
56) The theorem that states “If an itemset is frequent, then all of its subsets must also be
frequent” is often referred to as the
A. Theorem that can never be proved
B. Apriori Principle
C. The key to understanding market basket analysis
D. All of the above
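The principle in question 56 follows from the fact that every transaction containing an itemset also contains each of its subsets, so a subset's support count can never be lower. The sketch below checks this on hypothetical data; the helper names are illustrative only.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


def subsets_at_least_as_frequent(itemset, transactions):
    """Check that no non-empty proper subset has lower support than the itemset."""
    base = support_count(itemset, transactions)
    items = sorted(itemset)
    return all(
        support_count(subset, transactions) >= base
        for size in range(1, len(items))
        for subset in combinations(items, size)
    )


# Hypothetical market-basket transactions.
T = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
)]

holds = subsets_at_least_as_frequent({"bread", "milk", "diapers"}, T)
```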
57) In association analysis, “frequent” or, put differently, the minimum support value that an
itemset must have to be considered in the final results
A. Is specific to the data being analyzed
B. Is a constant for a given organization’s data, but can be different across organizations
C. Is often found to be non-determinable
D. Is often just a random value chosen by the analyst and so cannot be tested as to its
validity
60) Which of these possible attributes of a maximal itemset actually is the defining one:
A. None of the immediate subsets of the given itemset are frequent, but it is frequent
B. All of the immediate supersets of the given itemset are frequent, but it is frequent
C. None of the immediate supersets of the given itemset are frequent, but it is frequent
D. One or more of the immediate supersets of the given itemset are frequent, but it is not
frequent
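A minimal sketch of the maximal-itemset definition in question 60, assuming the complete collection of frequent itemsets is already known. The `FREQUENT` list is hypothetical. Because every subset of a frequent itemset is also frequent, checking for any frequent proper superset is equivalent to checking the immediate supersets.

```python
def maximal_itemsets(frequent):
    """Return the frequent itemsets that have no frequent proper superset."""
    frequent = {frozenset(s) for s in frequent}
    return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical complete collection of frequent itemsets.
FREQUENT = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}]

maximal = maximal_itemsets(FREQUENT)
```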
61) Which of these algorithms is often useful in finding maximal itemsets in a collection:
A. Shallow first
B. Binary sort
C. Backtrack
D. Depth-first
62) Because the FP-Growth algorithm abandons the generate and test approach of the Apriori
algorithm in favor of a significantly more direct paradigm of storing into a compact data
structure and directly selecting the frequent itemsets from the structure, one can safely
say the FP-Growth algorithm is a radical departure from Apriori.
Answer true
Answer false
65) The end user of an analysis is the only one who can ultimately judge if a given rule is
interesting in terms of the results it produces.
Answer true
66) A good interestingness measure will not be impacted by transactions that do not contain
itemsets of interest, because measures that are so impacted generate unstable results
67) If the function Lift(X, Y) returns a value less than 1, then the presence of event X in a
given set of events most likely means event Y is absent, and the reverse is also true.
Answer true
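Lift compares how often X and Y occur together with how often they would co-occur if they were independent: lift(X, Y) = P(X ∪ Y) / (P(X) P(Y)). The sketch below uses made-up transactions in which the two items rarely appear together, and it assumes both itemsets occur at least once.

```python
def lift(x, y, transactions):
    """Ratio of observed joint support to the support expected under independence."""
    n = len(transactions)
    x, y = frozenset(x), frozenset(y)
    sup_x = sum(1 for t in transactions if x <= t)
    sup_y = sum(1 for t in transactions if y <= t)
    sup_xy = sum(1 for t in transactions if (x | y) <= t)
    # Assumes sup_x and sup_y are non-zero.
    return (sup_xy / n) / ((sup_x / n) * (sup_y / n))


# Hypothetical data: "game" and "video" each appear 4 times, together only once.
T = [{"game", "video"}] + [{"game"}] * 3 + [{"video"}] * 3 + [{"snack"}]

value = lift({"game"}, {"video"}, T)
```

Here the value falls below 1, and swapping X and Y gives the same result because the formula is symmetric.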
68) If the χ2 value is greater than 1 and the observed value of a slot (X, Y) is less than the
expected value, then there is a negative correlation between members X and Y of the slot.
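To make the comparison in question 68 concrete, the sketch below computes χ2 for a 2x2 contingency table, together with the expected count for the (X, Y) slot. The counts are hypothetical and all row and column totals are assumed to be non-zero.

```python
def chi_square_2x2(n_xy, n_x_noty, n_notx_y, n_notx_noty):
    """Return (chi-square statistic, expected count of the (X, Y) slot)."""
    observed = [[n_xy, n_x_noty], [n_notx_y, n_notx_noty]]
    n = sum(map(sum, observed))
    row = [sum(r) for r in observed]
    col = [sum(c) for c in zip(*observed)]
    # Expected count under independence: row total * column total / n.
    expected = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum(
        (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    return chi2, expected[0][0]


# Hypothetical counts: X and Y together once, each alone 3 times, neither once.
chi2, expected_xy = chi_square_2x2(1, 3, 3, 1)
observed_below_expected = 1 < expected_xy
```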
69) To say that the measures for interestingness, lift and χ2 are not null-invariant, is to assert
that their values are not independent of the number of null transactions in the data set
being analyzed.
Answer true
70) Han, Kamber and Pei recommend the use of the Kulc null-invariant measure in
conjunction with the imbalance ratio in determining interestingness.
Answer true
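A short sketch of the two measures named in question 70, written in terms of support counts: Kulc(A, B) = (P(A|B) + P(B|A)) / 2 and IR(A, B) = |sup(A) − sup(B)| / (sup(A) + sup(B) − sup(A ∪ B)). The function names and sample counts are illustrative assumptions.

```python
def kulczynski(sup_a, sup_b, sup_ab):
    """Average of the two conditional probabilities P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def imbalance_ratio(sup_a, sup_b, sup_ab):
    """Imbalance between the supports of A and B (0 means perfectly balanced)."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


# Hypothetical support counts.
balanced = (kulczynski(4, 4, 2), imbalance_ratio(4, 4, 2))
skewed = (kulczynski(8, 2, 2), imbalance_ratio(8, 2, 2))
```

Neither function takes the total number of transactions as input, so adding null transactions (those containing neither A nor B) leaves both values unchanged.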
Answer false
72) __________ is a methodology useful for discovering interesting relationships within
large sets of data.
A. Big Data
B. Association analysis
C. Data Mining
D. Algorithm
74) Sets of frequent items hidden in large data sets are called _________.
A. association rules.
B. big data.
C. binary representation.
D. analysis.
75) Two key issues that must be addressed when applying association analysis are:
A. Overpopulation
B. Computational Expense
C. Discovering spurious patterns
D. Data density
76) The strength of an association rule can be measured in terms of its _____ and _____.
A. support
B. finances
C. confidence
D. size
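For a rule X → Y, support is the fraction of all transactions containing both X and Y, and confidence is the fraction of transactions containing X that also contain Y. The sketch below uses hypothetical transactions and helper names, and assumes X occurs at least once.

```python
def support_count(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


def rule_support(x, y, transactions):
    """Fraction of all transactions that contain both X and Y."""
    return support_count(set(x) | set(y), transactions) / len(transactions)


def rule_confidence(x, y, transactions):
    """Fraction of transactions containing X that also contain Y."""
    return support_count(set(x) | set(y), transactions) / support_count(x, transactions)


# Hypothetical market-basket transactions.
T = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

s = rule_support({"milk", "diapers"}, {"beer"}, T)
c = rule_confidence({"milk", "diapers"}, {"beer"}, T)
```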
77) The objective of the ________________ strategy is to find all the items that satisfy the
minsup threshold.
A. Association Rule Discovery
B. Correct Answer
C. Frequent Itemset Generation
D. Incorrect Answer
78) The objective of the ________________ strategy is to extract all the high-confidence
rules from the frequent itemsets found in the Frequent Itemset Generation.
A. Association Rule
B. Rule Generation
C. Association Analysis
D. Association Generation
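A brute-force sketch of the step in question 78: split one frequent itemset into every antecedent/consequent pair and keep the rules that meet a minimum confidence. The data, the `minconf` value and the helper names are assumptions. The itemset is assumed to be frequent, so every antecedent has non-zero support.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    x = frozenset(itemset)
    return sum(1 for t in transactions if x <= t)


def generate_rules(frequent_itemset, transactions, minconf):
    """Return (antecedent, consequent, confidence) for rules meeting minconf."""
    items = frozenset(frequent_itemset)
    sup_all = support_count(items, transactions)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            a = frozenset(antecedent)
            conf = sup_all / support_count(a, transactions)
            if conf >= minconf:
                rules.append((a, items - a, conf))
    return rules


# Hypothetical market-basket transactions.
T = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rules = generate_rules({"milk", "diapers", "beer"}, T, minconf=0.7)
```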
79) Trimming the exponential search space based on the support measure is known as:
A. Exponential Pruning
B. Apriori Algorithm
C. Trimming
D. Support-Based Pruning
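Support-based pruning can be sketched as a compact level-wise search in the style of Apriori. This is a simplified illustration on hypothetical data, not a tuned implementation: a candidate k-itemset is counted only if all of its (k−1)-subsets were found frequent at the previous level.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset: support_count} for all itemsets with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any
        # candidate that has an infrequent (k-1)-subset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


# Hypothetical market-basket transactions.
T = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(T, minsup=3)
```

Pruning means that supersets of an infrequent itemset, such as any 3-itemset containing both beer and bread in this data, are never counted against the transactions.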
80) __________ are visual structures that use branches and leaf nodes to search an item or
itemset.
A. Functions
B. Hash Trees
C. Root Node
D. Algorithms
81) Market basket analysis studies customers’ buying habits by searching for itemsets that are
frequently purchased together (or in sequence).
TRUE
82) None of the association rule mining algorithms use support measure to prune rules and
itemsets.
FALSE
83) The Apriori Principle says that if an itemset is frequent then all of its subsets must also be
frequent.
TRUE