Data Warehouse and Data Mining MCQ Questions: Name: Shivani Dattatraya Chatte Roll No: 08
1) Which of the following refers to the problem of finding abstract patterns (or structures) in
unlabeled data?
a. Supervised learning
b. Unsupervised learning
c. Hybrid learning
d. Reinforcement learning
Answer: b
Explanation: Unsupervised learning is a type of machine learning that is generally used to
find hidden structures and patterns in unlabeled data.
2) Which one of the following refers to querying the unstructured textual data?
a. Information access
b. Information update
c. Information retrieval
d. Information manipulation
Answer: c
Explanation: Information retrieval refers to querying unstructured textual data. It can also be
understood as the activity of obtaining, from a large collection of information resources, the items
that are relevant to an information need.
3) Which of the following can be considered as the correct process of Data Mining?
Answer: a
Explanation: The process of data mining contains several sub-processes that run in a specific order:
Infrastructure, Exploration, Analysis, Interpretation, and Exploitation.
4) Which of the following is an essential process in which intelligent methods are applied to
extract data patterns?
a. Warehousing
b. Data Mining
c. Text Mining
d. Data Selection
Answer: b
Explanation: Data mining is a process in which intelligent methods are used to extract
meaningful patterns from a huge collection (or set) of data.
Answer: a
Explanation: The term KDD (Knowledge Discovery in Databases) refers to the broad process of
discovering knowledge in data and emphasizes the high-level application of specific data
mining techniques.
a. The science of making machines perform tasks that would require intelligence when performed
by humans.
b. A computational procedure that takes some values as input and produces some values as the
output.
c. It uses machine learning techniques, in which programs learn from their past experience and
adapt themselves to new conditions or situations.
d. All of the above.
Answer: c
Explanation: Generally, adaptive system management refers to using machine learning techniques
in which programs learn from their past experience and adapt themselves to new conditions and
events.
7) For what purpose do analysis tools pre-compute summaries of huge amounts of
data?
Answer: d
Explanation: Analysis tools pre-compute summaries of huge amounts of data so that the response to a
query can be returned quickly. Without pre-computed summaries, every query would have to scan and
aggregate the detailed data when it is fired. As an example, when you type a keyword into Google
search, the results come back quickly because much of the underlying data has already been
processed in advance rather than at query time.
Answer: d
Explanation: Data mining offers several functionalities for performing different types of
tasks. Common ones are cluster analysis, prediction, characterization, and evolution analysis.
Association and correlation analysis and classification are also important
functionalities of data mining.
a. Hierarchical
b. Naive Bayes
c. Partitional
d. None of the above
Answer: a
Explanation: In the above-given diagram, hierarchical clustering is used. Hierarchical
clustering groups data over a variety of scales by building a cluster tree. So the correct answer
is A.
10) Which of the following statements is incorrect about hierarchical clustering?
Answer: a
Explanation: All following statements given in the above question are incorrect, so the correct answer
is D.
11) Which one of the following can be considered as the final output of hierarchical
clustering?
a. A tree that displays how close things are to each other
b. Assignment of each point to clusters
c. Finalize estimation of cluster centroids
d. None of the above
Answer: a
Explanation: Hierarchical clustering is commonly carried out with the agglomerative approach, and its
final output is a tree (dendrogram) that shows how close the objects are to each other.
12) Which one of the following statements about the K-means clustering is incorrect?
a. The goal of the k-means clustering is to partition (n) observation into (k) clusters
b. K-means clustering can be defined as the method of quantization
c. The nearest neighbor is the same as the K-means
d. All of the above
Answer: c
Explanation: K-means and k-nearest neighbors are unrelated methods: k-means is a clustering technique, whereas k-nearest neighbors is a classification technique.
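The partitioning goal described in option (a) can be illustrated with a minimal sketch of the usual iterative k-means procedure (Lloyd's algorithm). The function name `kmeans`, the sample points, and the starting centroids are assumptions made for this illustration; they do not come from the question bank.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Partition points into len(centroids) clusters (illustrative sketch)."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visually separate groups, k = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

Unlike k-nearest neighbors, nothing here uses class labels: the clusters come only from distances between the points and the current centroids.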
13) Which of the following statements about hierarchical clustering is incorrect?
a. Hierarchical clustering can primarily be used for the aim of exploration
b. Hierarchical clustering should not be primarily used for the aim of exploration
c. Both A and B
d. None of the above
Answer: b
Explanation: Hierarchical clustering can be used for exploration because it is a
deterministic clustering technique, so statement B is the incorrect one.
14) Which one of the clustering techniques needs the merging approach?
a. Partitioned
b. Naïve Bayes
c. Hierarchical
d. Both A and C
Answer: c
Explanation: Hierarchical clustering is one of the most commonly used methods to analyze
social network data. In this method, nodes are compared with each other on
the basis of their similarity, and larger groups are formed by merging the nodes or groups of
nodes that have similar characteristics.
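The merging approach can be sketched as follows. This is a minimal illustration that assumes one-dimensional values and single linkage (the distance between two groups is the distance between their closest members); the function name `agglomerative` and the sample values are hypothetical.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on 1-D values.

    Returns the merges in the order they happen; read together,
    they describe the cluster tree.
    """
    clusters = [[p] for p in points]  # start with one cluster per point
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = agglomerative([1.0, 2.0, 6.0, 7.0, 20.0])
for left, right, distance in merges:
    print(left, "+", right, "at distance", distance)
```

Each pass merges the two most similar groups, so the sequence of merges runs from fine-grained clusters up to a single cluster containing everything.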
15) The self-organizing maps can also be considered as the instance of _________ type of learning.
a. Supervised learning
b. Unsupervised learning
c. Missing data imputation
d. Both A & C
Answer: b
Explanation: The Self Organizing Map (SOM), or the Self Organizing Feature Map is a kind of Artificial
Neural Network which is trained through unsupervised learning.
16) The following statement can be considered as an example of _________
Suppose one wants to predict the number of newborns according to the size of the stork population by
performing supervised learning
a. Structural equation modelling
b. Clustering
c. Regression
d. Classification
Answer: c
Explanation: Predicting a numeric quantity (the number of newborns) from an input variable (the size of
the stork population) is an example of regression. Therefore the correct answer is C.
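As a rough illustration of regression, the sketch below fits a straight line by ordinary least squares and uses it to predict a numeric outcome. The data values are invented for the example (they follow newborns = 2 * storks + 5 exactly), and the function name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations: stork population (feature) and newborns (outcome).
storks = [10, 20, 30, 40]
newborns = [25, 45, 65, 85]

slope, intercept = fit_line(storks, newborns)
predicted = slope * 50 + intercept  # predicted newborns for 50 storks
print(slope, intercept, predicted)
```

The stork population plays the role of the feature and the number of newborns is the outcome, which is the distinction the next question asks about.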
17) In the example predicting the number of newborns, the final number of total newborns can
be considered as the _________
a. Features
b. Observation
c. Attribute
d. Outcome
Answer: d
Explanation: In the example of predicting the number of newborns, the value being predicted is
the outcome. The total number of newborns is therefore the outcome, while the size of the stork
population is the feature used to predict it.
a. It is a measure of accuracy
b. It is a subdivision of a set
c. It is the task of assigning a classification
d. None of the above
Answer: b
Explanation: The term "classification" refers to the classification of the given data into certain sub-
classes or groups according to their similarities or on the basis of the specific given set of rules.
Explanation: The term data mining can be defined as the process of extracting information from the
massive collection of data. In other words, we can also say that data mining is the procedure of mining
useful knowledge from a huge set of data.
a. 5
b. 4
c. 2
d. 3
Answer: c
Explanation: There are only two categories of functions in data mining: (1) Descriptive and
(2) Classification and Prediction. Therefore the correct answer is C.
21) Which of the following can be considered as the classification or mapping of a set or class
with some predefined group or classes?
a. Data set
b. Data Characterization
c. Data Sub Structure
d. Data Discrimination
Answer: d
Explanation: The discrimination refers to the mapping (or classification) of a class with some
predefined groups or classes. So the correct answer is D.
22) The analysis performed to uncover interesting statistical correlations between
associated attribute-value pairs is known as the _______.
a. Mining of association
b. Mining of correlation
c. Mining of clusters
d. All of the above
Answer: b
Explanation: Mining of correlation refers to the additional analysis performed to uncover
interesting statistical correlations between associated attribute-value pairs.
23) Which one of the following can be defined as the data object which does not comply with
the general behavior (or the model of available data)?
a. Evaluation Analysis
b. Outlier Analysis
c. Classification
d. Prediction
Answer: b
Explanation: An outlier may be defined as a data object that does not comply with the general behavior
or model of the available data.
24) Which one of the following statements about data cleaning is not correct?
Answer: d
Explanation: Data cleaning is a process applied to a data set to remove noise (noisy data) and
inconsistent data. It also involves transformations in which wrong data is corrected. In other
words, data cleaning is a preprocessing step in which the given data set is prepared for the data
warehouse.
a. Database technology
b. Information Science
c. Machine learning
d. All of the above
Answer: d
Explanation: Generally, the classification of a data mining system depends on the following criteria:
Database technology, machine learning, visualization, information science, and several other disciplines.
26) In order to integrate heterogeneous databases, how many types of approaches are there in
the data warehousing?
a. 3
b. 4
c. 5
d. 2
Answer: d
Explanation: In general, data warehousing consists of data integration, data cleaning, and data
consolidation. To integrate heterogeneous databases, there are two approaches: the
update-driven approach and the query-driven approach. So the correct answer is D.
27) Issues like the efficiency and scalability of data mining algorithms come under _______
a. Performance issues
b. Diverse data type issues
c. Mining methodology and user interaction
d. All of the above
Answer: a
Explanation: In order to extract information effectively from a huge collection of data in databases, the
data mining algorithm must be efficient and scalable. Therefore the correct answer is A.
28) Which of the following is the correct advantage of the Update-Driven Approach?
Answer: c
Explanation: The statements given in both A and B are advantages of the Update-Driven Approach
in Data Warehousing. So the correct answer is C.
29) Which of the following statements about the query tools is correct?
Explanation: The query tools are used to query the database. Or we can also say that these tools are
generally used to get only the necessary information from the entire database.
30) Which one of the following correctly defines the term cluster?
Answer: a
Explanation: The term "cluster" refers to a set of similar objects or items that differ significantly
from the other available objects. In other words, clustering forms groups of
objects with similar characteristics from all available objects. Therefore the correct answer is A.
a. This takes only two values. In general, these values will be 0 and 1, and they can be coded as one
bit
b. The natural environment of a certain species
c. Systems that can be used without knowledge of internal operations
d. All of the above
Answer: a
Explanation: In general, a binary attribute takes only two values, 0 and 1, and these
values can be coded as one bit. So the correct answer is A.
Answer: c
Explanation: Data selection can be defined as the stage in which the correct data is selected for a
phase of the knowledge discovery process (or KDD process). Therefore the correct answer is C.
33) Which one of the following correctly refers to the task of the classification?
a. A measure of the accuracy, of the classification of a concept that is given by a certain theory
b. The task of assigning a classification to a set of examples
c. A subdivision of a set of examples into a number of classes
d. None of the above
Answer: b
Explanation: The task of classification refers to assigning a classification (a class label) to each
example in a set of examples. Therefore the correct answer is B.
a. Approach to the design of learning algorithms that is structured along the lines of the theory of
evolution.
b. Decision support systems that contain an information base filled with the knowledge of an
expert formulated in terms of if-then rules.
c. Combining different types of method or information
d. None of these
Answer: c
Explanation: The term "hybrid" refers to merging two objects to form a single object that contains
features of the combined objects.
a. It is hidden within a database and can only be recovered if one is given certain clues (an
example IS encrypted information).
b. An extremely complex molecule that occurs in human chromosomes and that carries genetic
information in the form of genes.
c. It is the process of extracting implicit, previously unknown and potentially useful
information from data
d. None of the above
Answer: c
Explanation: The term "discovery" means finding something new that was not known before.
It can also be interpreted as the process of extracting implicit, previously unknown and potentially
useful information from data.
a. The process of finding a solution for a problem simply by enumerating all possible solutions
according to some predefined order and then testing them
b. The distance between two points as calculated using the Pythagorean theorem
c. A stage of the KDD process in which new data is added to the existing selection.
d. All of the above
Answer: b
Explanation: The Euclidean distance between two points, in a plane or in three-dimensional
space, is the length of the line segment connecting them. It can also be defined as the
distance between two points as calculated using the Pythagorean
theorem.
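A minimal sketch of the Euclidean distance, written out from the Pythagorean theorem so that it works in two, three, or more dimensions. The helper name `euclidean` and the sample points are assumptions for illustration; Python's standard library also provides `math.dist` for the same computation.

```python
import math

def euclidean(p, q):
    """Length of the straight segment between points p and q."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d_plane = euclidean((0, 0), (3, 4))        # 3-4-5 right triangle
d_space = euclidean((1, 2, 3), (4, 6, 3))  # differences 3, 4, 0
print(d_plane, d_space)
```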
37) Which one of the following can be considered as the correct application of the data mining?
a. Fraud detection
b. Corporate Analysis & Risk management
c. Management and market analysis
d. All of the above
Answer: d
Explanation: Data mining is highly useful in a variety of areas such as fraud detection, corporate
analysis, and risk management, and market analysis, etc., so the correct option is D.
38) Which one of the following correctly refers to the class under study in data characterization?
a. Final class
b. Study class
c. Target class
d. Both A and C
Answer: c
Explanation: In data characterization, the class under study is generally called the target class; it
is the class whose data is being summarized.
39) Which of the following refers to a sequence of patterns that occurs frequently?
a. Frequent sub-sequence
b. Frequent sub-structure
c. Frequent sub-items
d. All of the above
Answer: a
Explanation: In data mining, the frequent sub-sequence refers to a certain sequence of patterns that
occurs frequently, for example, buying a camera followed by the memory card. So the correct answer
will be A.
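The camera-then-memory-card example can be sketched by counting how many purchase sequences contain a pattern in order, not necessarily consecutively. The purchase histories and the names `contains_subsequence` and `support` are invented for this illustration.

```python
def contains_subsequence(sequence, pattern):
    """True if the items of pattern occur in sequence in the same order."""
    remaining = iter(sequence)
    # Each membership test consumes the iterator up to the matched item,
    # so later pattern items must appear after earlier ones.
    return all(item in remaining for item in pattern)

def support(sequences, pattern):
    """Fraction of sequences that contain the pattern as a sub-sequence."""
    hits = sum(1 for s in sequences if contains_subsequence(s, pattern))
    return hits / len(sequences)

# Hypothetical purchase histories, one list per customer.
purchases = [
    ["camera", "memory card", "tripod"],
    ["memory card", "camera"],
    ["camera", "bag", "memory card"],
    ["phone"],
]
pattern_support = support(purchases, ["camera", "memory card"])
print(pattern_support)
```

A pattern counts as frequent when its support reaches a user-chosen minimum threshold.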
40) Which one of the following refers to modeling the regularities or trends of objects whose
behavior changes over time?
a. Prediction
b. Evolution analysis
c. Classification
d. Both A and B
Answer: b
Explanation: In general, the evolution analysis refers to the model regularities or the object trends that
vary with change in time.
41) Issues like "handling relational and complex types of data" come under which of the
following categories?
Answer: a
Explanation: A database often contains multiple types of data, such as complex objects and
temporal data, so it is not realistic for a single system to mine every kind of data. This
type of issue therefore comes under the category of diverse data types. So the correct answer is A.
42) Which of the following is also used as the first step in the knowledge discovery process?
a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration
Answer: b
Explanation: Data cleaning is included as one of the first steps of the knowledge discovery process. So
the correct answer is B.
43) Which of the following refers to the steps of the knowledge discovery process, in which the
several data sources are combined?
a. Data selection
b. Data cleaning
c. Data transformation
d. Data integration
Answer: d
Explanation: The step "data integration" of the knowledge discovery process refers to combining
several data sources. Therefore the correct answer is D.
44) Which of the following can be considered as the drawback of the query-Driven approach in
data warehousing?
Answer: d
Explanation: All statements given in the above question are drawbacks of the query-driven approach.
Therefore the correct answer is D.
45) Which of the following correctly refers to the term "Data Independence"?
a. It means that the programs are not dependent on the logical attributes of data
b. It refers to data that is defined separately and not included in the program
c. It means that the programs are not dependent on the physical attributes of data
d. Both A and C
Answer: d
Explanation: The term "Data Independence" means that programs depend neither on the
physical attributes of data nor on the logical attributes of data.
46) Which of the following is generally used by the E-R model to represent the weak entities?
a. Diamond
b. Doubly outlined rectangle
c. Dotted rectangle
d. Both B & C
Answer: b
Explanation: Generally, a doubly outlined rectangle is used in the E-R model to represent weak
entities.
a. It refers to a system that can be used without knowledge of its internal
operations
b. It refers to the natural environment of a specific species
c. It takes at most two values, 0 and 1
d. All of the above
Answer: a
Explanation: A black box is a system that can be used without knowledge of its internal
operations.
48) Which one of the following issues must be considered before investing in data mining?
a. Compatibility
b. Functionality
c. Vendor consideration
d. All of the above
Answer: d
Explanation: Common but important issues such as functionality, compatibility, and vendor
considerations must always be discussed before investing in data mining. Therefore the correct answer is D.
Answer: c
Explanation: The term "DMQL" refers to the Data Mining Query Language. Therefore the correct
answer is C.
50) In certain cases, it is not clear what kind of pattern needs to be found. In such cases, data mining
should _________:
Answer: c
Explanation: In data mining operations where it is not clear what kind of pattern needs to be found,
the user can guide the data mining process, because the user has a good sense of which type of
pattern is wanted. By setting up some rules, the user can rule out the discovery of patterns that
are not required and focus the process on the required pattern only. Therefore the correct answer is
C.
51) __________ contains information that gives users an easy-to-understand perspective of the
information stored in the data warehouse
a. Financial metadata
b. Operational metadata
c. Technical metadata
d. Business metadata
Answer: d
a. Deduplication
b. Domain consistency
c. Segmentation
d. Disambiguation
Answer: c
53) ___________ is a good alternative to the star schema.
a. Star schema.
b. Snowflake schema.
c. Fact constellation.
d. Star-snowflake schema.
Answer: c
Answer: b
55) A star schema has what type of relationship from a dimension to the fact table?
a. Many-to-many
b. Many-to-one
c. One-to-one
d. One-to-many
Answer: d
56) __________ predicts future trends & behaviors, allowing business managers to make
knowledge-driven decisions
a. Meta data
b. Data mart
c. Data warehouse
d. Data Mining
Answer: d
Answer: d
Answer: b
60) The main organisational justification for implementing a data warehouse is to provide
___________
Answer: d
a. must import data from transactional systems whenever significant changes occur in the
transactional data
b. works on live transactional data to provide up to date and valid results
c. takes regular copies of transaction data
d. takes preprocessed transaction data and stores in a way that is optimised for analysis
Answer: d
62) Data warehouse contains ________data that is seldom found in the operational environment
a. informational
b. normalized
c. denormalized
d. summary
Answer: d
a. A data warehouse is necessary to all those organisations that are using relational OLTP
b. A data warehouse is useful to all organisations that currently use OLTP
c. A data warehouse is valuable to the organisations that need to keep an audit trail of their
activities
d. A data warehouse is valuable only if the organisation has an interest in analysing historical data
Answer: d
Answer: b
Answer: a
Answer: d
67) Fact tables are described by which of the following?
a. Partially normalized
b. Completely denormalized
c. Partially denormalized
d. Completely normalized
Answer: d
68) _______ are numeric measurements or values that represent a specific business aspect or
activity
a. Dimensions
b. Schemas
c. Facts
d. Tables
Answer: c
Answer: d
Answer: a
71) Data cubes can grow to n-number of dimensions, thus becoming _______
a. Hypercubes
b. Star Cubes
c. Dimensional Cubes
d. Solid cubes
Answer: a
a. Relational data
b. Operational data
c. Informational data
d. Meta data
Answer: d
a. A system that is used to run the business in real time and is based on current data
b. A system that is used to run the business in real time and is based on historical data
c. A system that is used to support decision making and is based on historical data
d. A system that is used to support decision making and is based on current data
Answer: a
a. Preprocessing
b. Interpretation
c. Selection
d. Transformation
Answer: a
75) ______ makes a copy of a table and places it in a different location, to improve access time
a. Archive
b. Replication
c. Partitioning
d. Aggregation
Answer: b
76) When you ________ the data, you are aggregating the data to a higher level
a. Slice
b. Roll up
c. Accumulate
d. Drill down
Answer: b
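A roll-up can be illustrated with a small aggregation from the day level to the month level. The sales figures and the function name `roll_up_to_month` are hypothetical, and dates are assumed to be ISO strings so that the first seven characters identify the month.

```python
from collections import defaultdict

def roll_up_to_month(rows):
    """Aggregate (date, amount) rows from day level up to month level."""
    totals = defaultdict(float)
    for date, amount in rows:
        month = date[:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[month] += amount
    return dict(totals)

# Hypothetical daily sales.
daily_sales = [
    ("2024-01-05", 100.0),
    ("2024-01-20", 50.0),
    ("2024-02-03", 70.0),
]
monthly_sales = roll_up_to_month(daily_sales)
print(monthly_sales)
```

Drilling down is the reverse navigation: going from the monthly totals back to the daily rows.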
77) ______ is a measurement of the density of the data held in the data cube
a. Mass
b. Sparsity
c. Compactness
d. Concentration
Answer: b
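One common way to quantify sparsity, assumed here for illustration, is the fraction of cube cells that hold no value. The two-dimensional cube below and the function name `sparsity` are hypothetical.

```python
def sparsity(cube):
    """Fraction of empty cells in a 2-D cube given as a list of rows."""
    cells = [cell for row in cube for cell in row]
    empty = sum(1 for cell in cells if cell is None)
    return empty / len(cells)

# Hypothetical cube: rows are products, columns are stores, None means no sales.
cube = [
    [120, None, None],
    [None, 45, None],
]
cube_sparsity = sparsity(cube)
print(cube_sparsity)
```

Here four of the six cells are empty, so the cube is about 67% sparse (equivalently, about 33% dense).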
78) A fact table in the centre with dimension tables directly linked to it
a. A star schema
b. A star flake schema
c. A snowflake schema
d. A constellation
Answer: a
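A star schema can be sketched with Python's built-in `sqlite3` module: one central fact table whose rows reference two dimension tables directly. The table names, columns, and data are invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per product / per store.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);

    -- Fact table in the centre: numeric measures plus a key to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     REAL
    );

    INSERT INTO dim_product VALUES (1, 'camera'), (2, 'memory card');
    INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Mumbai');
    INSERT INTO fact_sales  VALUES
        (1, 1, 300.0), (2, 1, 20.0), (1, 2, 310.0), (2, 2, 25.0), (2, 2, 25.0);
""")

# Each dimension row relates to many fact rows (one-to-many).
rows = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
print(rows)
conn.close()
```

In a snowflake schema the dimension tables would themselves be normalized into further tables, whereas here every dimension is linked directly to the fact table.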
a. Re-format
b. Selection
c. Projection
d. Comparison
Answer: a
80)_________ introduces the Management Data Warehouse (MDW) to SQL Server Management
Studio for streamlined performance troubleshooting.
a. SQL Server 2005
b. SQL Server 2008
c. SQL Server 2012
d. SQL Server 2014
Answer: b
Explanation: MDW is a set of components that enable a database developer or administrator to quickly
track down problems that could be causing performance degradation.
Answer: a
Explanation: MDW consists of three components: Data Collector, MDW database and MDW reports.
82) Which of the following modes allows the collection and uploading of data to occur on
demand?
a. Non-cached mode
b. Cached mode
c. Mixed mode
d. All of the mentioned
Answer: a
Explanation: In non-cached mode, collection and upload are on the same schedule.
Answer: d
Explanation: Cached mode uses separate schedules for collection and upload.
Answer: b
Explanation: You should not change the database name after creation, because all of the jobs created to
manage the database collection refer to the database by the original name and will generate errors if
the name is changed.
85) Which of the following is a best practice for the Management Data Warehouse?
a. Use a centralized server for the MDW database
b. The XML parameters for a single T-SQL collection item can have multiple <Query> elements
c. Use a distributed server for the MDW database
d. All of the mentioned
Answer: a
Explanation: Centralized server allows you to use a single point for viewing reports for multiple
instances.
86) ____________ stores information about how the management data warehouse reports should
group and aggregate performance counters.
a. core.snapshots_internal
b. core.supported_collector_types_internal
c. core.wait_categories
d. core.performance_counter_report_group_items
Answer: d
Explanation: core.wait_categories contains the categories used to group wait types according to
wait_type characteristic.
87) Which of the following tables in the management data warehouse schema is
required for the Server Activity?
a. snapshots.query_stat
b. snapshots.os_latch_stats
c. snapshots.active_sessions
d. all of the mentioned
Answer: b
Explanation: snapshots.os_latch_stats is a System level resource table.
Answer: c
Explanation: core.sp_add_collector_type adds a new entry to the core.supported_collector_types view
in the management data warehouse database.
89) What does collector_type_uid stand for in the following code snippet?
core.sp_remove_collector_type [ @collector_type_uid = ] 'collector_type_uid'
a. uniqueidentifier
b. membership role
c. directory
d. none of the mentioned
Answer: a
Explanation: collector_type_uid is the GUID for the collector type.
Answer: d
Explanation: Data mining is defined as extracting information from huge sets of data. In other
words, data mining is the procedure of mining knowledge from data so that the extracted
information or knowledge can be put to use.
Answer: a
Explanation: There are two categories of functions involved in data mining: 1. Descriptive, 2.
Classification and Prediction
92) The mapping or classification of a class with some predefined group or class is known as?
a. Data Characterization
b. Data Discrimination
c. Data Set
d. Data Sub Structure
Answer: b
93) The analysis performed to uncover interesting statistical correlations between associated-
attribute-value pairs is called?
a. Mining of Association
b. Mining of Clusters
c. Mining of Correlations
d. None of the above
Answer: c
94) __________ may be defined as the data objects that do not comply with the general behavior
or model of the data available.
a. Outlier Analysis
b. Evolution Analysis
c. Prediction
d. Classification
Answer: a
Explanation: Outlier Analysis : Outliers may be defined as the data objects that do not comply
with the general behavior or model of the data available.
Answer: b
Explanation: In order to effectively extract the information from huge amount of data in
databases, data mining algorithm must be efficient and scalable.
96) To integrate heterogeneous databases, how many approaches are there in Data
Warehousing?
a. 2
b. 3
c. 4
d. 5
Answer: a
Explanation: Data warehousing involves data cleaning, data integration, and data
consolidations. To integrate heterogeneous databases, we have the following two approaches :
Query Driven Approach, Update Driven Approach
Answer: c
Answer: d
Explanation: Data cleaning is a technique that is applied to remove the noisy data and correct
the inconsistencies in data. Data cleaning involves transformations to correct the wrong data.
Data cleaning is performed as a data preprocessing step while preparing the data for a data
warehouse.
Answer: d
Explanation: A data mining system can be classified according to the following criteria :
Database Technology, Statistics, Machine Learning, Information Science, Visualization, Other
Disciplines
Answer: d
a. Bottom-up process
b. Top-down process
c. None
d. Both a & b
Answer: d
102) The pincer-search algorithm has an advantage over the Apriori algorithm when the largest frequent itemset is
long.
a. True
b. False
Answer: a
Answer: a
Answer: a
Answer: a
106) If the item set is in a dashed circle while completing a full pass it moves towards
a. Dashed circle
b. Dashed box
c. Solid Box
d. Solid circle
Answer: d
107) If the item set is in the dashed box then it moves into a solid box after completing a full pass
a. True
b. False
Answer: a
108) The dashed arrow indicates the movement of the item set
a. True
b. False
Answer: b
109) The vertical arrow indicates the movement of the item set after reaching the frequency threshold
a. True
b. False
Answer: a
Answer: c
Answer: a
Answer: d
114) The main idea of the algorithm is to maintain a frequent pattern tree of the data set: an extended
prefix tree structure storing crucial and quantitative information about frequent sets
a. Apriori algorithm
b. Pincer-search algorithm
c. FP-tree growth algorithm
d. All of these
Answer: c
115) Data warehousing and data mining technologies have extensive potential applications in
various central government sectors such as:
a. Agriculture
b. Rural Development
c. Health and Energy
d. All of the above
Answer: d
Answer: a
117) Good performance can be achieved in a data mart environment by extensive use of
a. Indexes
b. creating profile records
c. volumes of data
d. all of the above
Answer: d
119) For a list T, we denote head_t as its first element and body_t as the remaining part of the list (the
portion of the list T after removal of head_t); thus T is
a. {head} {body}
b. {head_t} {body_t}
c. {t_head}{t_body}
d. None of these
Answer: b
Answer: b
Answer: a
Answer: c
Answer: a
124) The pincer-search algorithm is based on the principle of
a. Bottom-up
b. Top-Down
c. Directional
d. Bi-Directional
Answer: d
Answer: d
126) ___ is a full-breadth search, where no background knowledge of frequent itemsets is used for
pruning.
a. Level-cross filtering by single item
b. Level-by-level independent
c. Multi-level mining with uniform support
d. Multi-level mining with reduced support
Answer: b
Answer: c
Answer: c
129) The pincer-search algorithm has an advantage over the Apriori algorithm when the largest frequent itemset is
long
a. True
b. False
Answer: a
Answer: a
Answer: b
Answer: b
Answer: a
b. The "Best Pruned Tree" is the one that maximizes the number of encoding bits.
a. True
b. False
Answer: b
Answer: c
Answer: d
Answer: b
Answer: a
Answer: c
140) The step in which the class label of each training sample is provided is known as
a. Unsupervised learning
b. Supervised learning
c. Training samples
d. Clustering
Answer: b
Answer: d
Answer: d
143) To select the test attribute of each node in a decision tree we use
a. Entity Selection Measure
b. Data Selection Measure
c. Information Gain Measure
d. None of these
Answer: c
144) Test attribute for the current node in the decision tree is chosen on the basis of
a. Lowest entity gain
b. Highest data gain
c. Highest Information Gain
d. Lowest Attribute Gain
Answer: c
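The attribute-selection rule in the two questions above can be sketched as follows: compute the information gain of each candidate attribute and test on the one with the highest gain. The class counts are invented for the example (a data set with 9 positive and 5 negative samples and two hypothetical attributes "A" and "B"), and the function names are assumptions.

```python
import math

def entropy(counts):
    """Expected information (in bits) for a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, partitions):
    """Entropy before the split minus the weighted entropy after it."""
    total = sum(parent_counts)
    weighted = sum(sum(part) / total * entropy(part) for part in partitions)
    return entropy(parent_counts) - weighted

# Each partition lists [positive, negative] counts for one attribute value.
gain_a = information_gain([9, 5], [[2, 3], [4, 0], [3, 2]])
gain_b = information_gain([9, 5], [[6, 2], [3, 3]])

# The test attribute for the node is the one with the highest gain.
best = max([("A", gain_a), ("B", gain_b)], key=lambda item: item[1])[0]
print(round(gain_a, 3), round(gain_b, 3), best)
```

Attribute "A" separates the classes more cleanly (one of its partitions is pure), so it yields the higher gain and would be chosen for the node.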
Answer: a
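146) Let s_i be the number of samples of S in class C_i. Then the expected information needed to classify a given sample is
given by
a. $I(s_1, s_2, \dots, s_m) = \sum_{i=1}^{m} \log_2(p_i)$
b. $I(s_1, s_2, \dots, s_m) = -\sum_{i=1}^{m} p_i \log_2(p_i)$
c. $I(s_1, s_2, \dots, s_m) = \sum_{i=1}^{m} p_i \log_2(x)$
d. $I(s_1, s_2, \dots, s_m) = \sum_{i=1}^{m} p_i \log_2(p_i)$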
Answer: b
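The expected-information formula in option (b) can be evaluated directly from class counts. This sketch assumes p_i is estimated as s_i divided by the total number of samples, and treats a class with zero samples as contributing nothing; the function name `expected_information` is hypothetical.

```python
import math

def expected_information(class_counts):
    """I(s_1, ..., s_m) = -sum(p_i * log2(p_i)), with p_i = s_i / total."""
    total = sum(class_counts)
    return -sum(
        (s / total) * math.log2(s / total)
        for s in class_counts
        if s > 0  # a class with no samples contributes 0
    )

info_even = expected_information([5, 5])    # two equally likely classes
info_skewed = expected_information([9, 5])  # 9 samples in one class, 5 in the other
print(info_even, round(info_skewed, 3))
```

An even split needs a full bit per sample, while the more skewed the class distribution becomes, the less information is needed.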
147) Steps applied to the data in order to improve the accuracy, efficiency, and scalability are:-
a. Data cleaning
b. Relevance analysis
c. Data transformation
d. All of the above
Answer: d
148) The process used to remove or reduce noise and to treat missing values is
a. Data cleaning
b. Relevance analysis
c. Data transformation
d. None of above
Answer: a
149) Relevance analysis may be performed on the data by removing any irrelevant attribute from the
process.
a. True
b. False
Answer: a
151) In a decision tree internal node denotes a test on an attribute and Leaf nodes represent classes or
class distributions
a. True
b. false
Answer: a
152) ___ attempts to identify and remove branches, with the goal of improving accuracy
a. decision tree
b. tree pruning
c. both of them
d. none of above
Answer: b
153) To deal with larger data sets, a sampling-based method called ___ can be used
a. CLARA
b. Dara
c. PAM
d. None
Answer: a
Answer: c
Answer: a
156) Which algorithm was proposed that combines the sampling technique with PAM?
a. CLARA
b. CLARANS
c. Both a and b
d. None of these.
Answer: b
Answer: b
158) Which of the following is true about clusters?
a. The process of grouping a set of physical or abstract objects into classes of similar objects is
called clustering.
b. A cluster of data objects can be treated collectively as one group in many applications
c. Cluster analysis is an important human activity.
d. All of the above
Answer: d
Answer: c
Answer: c
Answer: c
Answer: d
Answer: c
Answer: a
Answer: c
Answer: c
Answer: b
Answer: a
Answer: b
172) The k-means and the k-modes methods can be integrated to cluster data with mixed numeric and
categorical values, resulting in
a. k-median method
b. k-partition method
c. k-prototypes method
d. k-medoids method
Answer: c
Answer: d
Answer: c
Answer: b
Answer: d
Answer: b
Answer: b
Answer: a
Answer: a
182) The ___ client is a desktop that relies on the server to which it is connected for the majority of its
computing power.
a. thin
b. none
c. thick
d. web server
Answer: a
183) An object is said to be the Core Object if
a. $|N_\varepsilon(O)| \ge \text{MinPts}$
b. $|N(O)| \ge \text{MaxPts}$
c. none of above
d. both a & b
Answer: a
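The core-object condition can be checked with a short sketch: count the points within distance eps of an object and compare the count with MinPts. The sample points and parameter values are invented, and the neighborhood is assumed to include the object itself, which is a common convention in density-based clustering.

```python
import math

def is_core_object(obj, points, eps, min_pts):
    """True if the eps-neighborhood of obj contains at least min_pts points."""
    neighborhood = [p for p in points if math.dist(obj, p) <= eps]
    return len(neighborhood) >= min_pts

# Hypothetical data: a dense group of four points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]

dense = is_core_object((0, 0), points, eps=1.5, min_pts=4)
isolated = is_core_object((10, 10), points, eps=1.5, min_pts=4)
print(dense, isolated)
```

The point (0, 0) has four points within distance 1.5 (including itself), so it is a core object; the isolated point has only itself in its neighborhood.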
Answer: a
Answer: d
Answer: d
Answer: c
Answer: a
Answer: c
191) Unnormalized data, which is the basis for online analytical processing tools, is prepared
periodically but is directly based on detailed ___.
a. reference data
b. transaction data
c. reference and transaction data
d. none of the above
Answer: a
192) The data mart is loaded with data from a data warehouse by means of a ___
a. load program
b. process
c. project
d. all is valid
Answer: a
Answer: d
194) Periodic maintenance of a data mart means
a. all are true
b. loading
c. refreshing
d. purging
Answer: a
195) Detailed-level data, summary-level data, preprocessed data, and ad hoc data are data in
a. data warehouse
b. data mart
c. both
d. none of the above
Answer: b
Answer: c
197) ___ tables help and enable the end users of the data mart to relate the data to its expanded version.
a. data
b. reference
c. both a and b
d. none of the above
Answer: b