
DWM Mid 2 Question Bank

This document contains sample questions and answers related to key concepts in data mining and data warehousing. The questions cover topics like frequent pattern mining, association rule mining, classification methods, ensemble learning, accuracy evaluation measures, and more. Example algorithms discussed include Apriori, decision trees, Naive Bayes, and support vector machines.

Uploaded by

Indu Alluru

Data Mining and Data Warehousing (AIDS & AIML)

Question Bank for Mid – II


Unit - 3
One and Two-mark Questions:

1. Define frequent patterns in the context of data mining.


Answer: Frequent patterns refer to sets of items or attributes that appear together frequently in a
dataset.
2. What are the key objectives of association mining?
Answer: The key objectives of association mining are to discover frequent itemsets, generate
association rules, and identify correlations between different items or attributes in a dataset.
3. Explain the concept of correlation analysis in data mining.
Answer: Correlation analysis in data mining aims to measure the statistical relationship or
association between different variables or attributes in a dataset.
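One widely used correlation measure for association rules is lift, defined as P(A and B) / (P(A) * P(B)). A value above 1 suggests the itemsets occur together more often than expected under independence, and a value below 1 suggests the opposite. The sketch below is only an illustration: the basket data and the function name `lift` are made up for this example.

```python
def lift(transactions, a, b):
    """Estimate lift(A, B) = P(A and B) / (P(A) * P(B)) from transaction counts.

    Assumes both A and B occur at least once; otherwise this divides by zero.
    """
    n = len(transactions)
    p_a = sum(1 for t in transactions if a <= t) / n
    p_b = sum(1 for t in transactions if b <= t) / n
    p_ab = sum(1 for t in transactions if (a | b) <= t) / n
    return p_ab / (p_a * p_b)


# Hypothetical baskets, not taken from the question bank.
baskets = [{"milk", "bread"}, {"milk", "bread"}, {"milk"}, {"tea"}]
milk_bread_lift = lift(baskets, {"milk"}, {"bread"})
print(milk_bread_lift)  # greater than 1: milk and bread are positively correlated here
```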
4. What is constraint-based association mining?
Answer: Constraint-based association mining involves incorporating additional constraints or
user-defined criteria during the process of mining association rules to focus on specific patterns of
interest.
5. Name some popular classification methods used in data mining.
Answer: Some popular classification methods in data mining include decision tree induction,
Bayesian classification, rule-based classification, support vector machines, and associative
classification.
6. What are lazy learners in the context of classification?
Answer: Lazy learners are classification algorithms that defer the learning process until a new
instance needs to be classified. They store the training data and use it directly during
classification.
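A k-nearest-neighbour classifier is the usual textbook example of a lazy learner. The sketch below assumes numeric features and Euclidean distance; the training points and the function name `knn_predict` are hypothetical. Note that there is no training step at all: the stored data is consulted only when a query arrives.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs that is simply stored as-is.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
print(knn_predict(train, (1.1, 1.0)))  # the three nearest points are all labelled "A"
```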
7. What is ensemble learning in data mining?
Answer: Ensemble learning involves combining multiple classifiers or predictors to improve the
overall accuracy and robustness of the classification or prediction tasks.
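The simplest way to combine classifiers is a majority vote. The sketch below assumes each base classifier is just a Python function returning a label; the three threshold rules are hypothetical stand-ins for trained models. Methods such as bagging and boosting build on the same idea but also control how the base models are trained.

```python
from collections import Counter


def majority_vote(classifiers, x):
    """Return the label predicted by most of the base classifiers for input x."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical weak classifiers: each applies a different threshold.
base_models = [lambda x: x > 2, lambda x: x > 4, lambda x: x > 6]
print(majority_vote(base_models, 5))  # two of the three models vote True
```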
8. Explain the concept of support vector machines (SVM) in data mining.
Answer: Support vector machines (SVM) are supervised learning models used for classification
and regression analysis. For classification, they find the hyperplane that separates the classes
with the largest margin.
9. What is the purpose of prediction in data mining?
Answer: The purpose of prediction in data mining is to make inferences or forecasts about future
events or outcomes based on patterns and relationships discovered in historical or existing data.
10. What error measures are used in evaluating predictors or regression models?
Answer: Error measures used in evaluating predictors or regression models include mean squared
error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R-squared
(coefficient of determination).
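The four measures can be computed directly from the residuals. The sketch below uses only the standard library; the function name `regression_errors` and the sample values are assumptions made for illustration.

```python
import math


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE, MAE and R-squared for paired actual/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1 - ss_res / ss_tot,  # undefined if all y_true values are equal
    }


# Hypothetical actual and predicted values.
errors = regression_errors([3, 5, 7], [2, 5, 9])
print(errors)
```

MSE and RMSE penalise large errors more heavily than MAE, which is why MAE is often preferred when the data contains outliers.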
11. Explain the concept of associative classification in data mining.
Answer: Associative classification combines association rule mining and classification techniques
to generate classification rules based on the relationships between itemsets and class labels in a
dataset.
Unit – 4
One and Two-mark Questions:
1. Define cluster analysis and its purpose in data mining.
Answer: Cluster analysis is the process of grouping similar objects or data points into clusters to
discover patterns or relationships. It is used in data mining to identify inherent structures in the
data and uncover meaningful insights.
2. What are the major types of data used in cluster analysis?
Answer: The major types of data used in cluster analysis are categorical, numerical, binary, and
mixed data.
3. Differentiate between partitioning methods and hierarchical methods in cluster analysis.
Answer: Partitioning methods divide the dataset into non-overlapping partitions or clusters, while
hierarchical methods create a hierarchical structure of nested clusters.
4. Name one density-based clustering algorithm.
Answer: One example of a density-based clustering algorithm is DBSCAN (Density-Based
Spatial Clustering of Applications with Noise).
5. What are grid-based clustering methods?
Answer: Grid-based clustering methods divide the dataset into cells or grids and use these
divisions to efficiently perform clustering.
6. Explain the concept of model-based clustering methods.
Answer: Model-based clustering methods assume that the data is generated from a mixture of
probability distributions. These methods estimate the parameters of the distributions to identify
clusters.
7. What is the main challenge in clustering high-dimensional data?
Answer: The main challenge in clustering high-dimensional data is the "curse of dimensionality,"
where the distance between points becomes less meaningful as the number of dimensions
increases.
8. What is constraint-based cluster analysis?
Answer: Constraint-based cluster analysis incorporates user-defined constraints or prior
knowledge to guide the clustering process. It allows users to specify constraints on the objects
that should or should not belong to certain clusters.
9. Define outlier analysis in the context of cluster analysis.
Answer: Outlier analysis involves identifying objects or data points that deviate significantly
from the normal patterns or clusters in the data.
10. What is the primary goal of cluster analysis?
Answer: The primary goal of cluster analysis is to group similar objects together and separate
dissimilar objects based on their characteristics or patterns.
11. Give an example of a partitioning-based clustering algorithm.
Answer: K-means is an example of a partitioning-based clustering algorithm.
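A minimal sketch of the basic K-means loop (often called Lloyd's algorithm) is given below. To keep the result reproducible it takes the initial centroids from the caller instead of choosing them at random; that choice, the sample points, and the function name `kmeans` are assumptions of this example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids until stable."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centroids, final_clusters = kmeans(points, [(1, 1), (8, 8)])
print(final_centroids)
```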
12. Mention one advantage of density-based clustering algorithms.
Answer: Density-based clustering algorithms can discover clusters of arbitrary shape and handle
noise and outliers effectively.
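The noise handling mentioned above can be seen in a compact DBSCAN sketch. This version uses a brute-force neighbour search, so it is only suitable for small inputs; the parameter names `eps` and `min_pts`, the sample points, and the convention of labelling noise as -1 are choices made for this example.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNSEEN, NOISE = None, -1
    labels = [UNSEEN] * len(points)

    def neighbours(i):
        # All points within eps of point i (the point itself is included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return labels


# Hypothetical data: two dense squares and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (50, 50)]
labels = dbscan(data, eps=1.5, min_pts=3)
print(labels)  # the isolated point (50, 50) is labelled -1
```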
Unit – 5
One and Two-mark Questions:

1. What is data stream mining?


Answer: Data stream mining involves analyzing and extracting patterns from continuous,
fast-paced, and potentially infinite streams of data.
2. Name one algorithm used for mining time series data.
Answer: One technique used for mining time series data is Dynamic Time Warping (DTW), which
measures the similarity between two series that may be shifted or stretched in time.
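DTW is usually computed with dynamic programming. The sketch below assumes one-dimensional series and uses the absolute difference as the local cost; the function name `dtw_distance` and the sample series are made up for illustration.

```python
def dtw_distance(a, b):
    """Return the DTW distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also aligns with an earlier b element
                cost[i][j - 1],      # b[j-1] also aligns with an earlier a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
print(stretched, shifted)
```

The first pair differs only by a repeated value, so warping aligns the two series with zero cost, which a point-by-point Euclidean comparison could not do because the lengths differ.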
3. What is sequence pattern mining in transactional databases?
Answer: Sequence pattern mining in transactional databases involves discovering sequential
patterns or frequent subsequences in transactional data.
4. Give an example of mining sequence patterns in biological data.
Answer: An example of mining sequence patterns in biological data is finding recurring motifs
(conserved subsequences) in DNA or protein sequences.
5. What is graph mining?
Answer: Graph mining involves analyzing and extracting patterns or insights from structured data
represented as graphs.
6. What is multi-relational data mining?
Answer: Multi-relational data mining involves analyzing and extracting patterns from databases
that contain multiple interrelated tables or datasets.
7. Define spatial data mining.
Answer: Spatial data mining refers to the process of discovering patterns, relationships, or
insights from spatial datasets, such as geographic information systems (GIS) data.
8. What is multimedia data mining?
Answer: Multimedia data mining involves analyzing and extracting patterns from multimedia
datasets, which include different types of media such as images, audio, video, and text.
9. Explain text mining in the context of data mining.
Answer: Text mining is the process of extracting meaningful information and patterns from
unstructured text data, such as documents, emails, or social media posts.
10. Give an example of mining the World Wide Web.
Answer: Mining the World Wide Web can involve tasks such as web page clustering, sentiment
analysis of online reviews, or recommendation systems for personalized web content.
11. What is multidimensional analysis in the context of complex data objects?
Answer: Multidimensional analysis involves analyzing and exploring complex data objects with
multiple attributes or dimensions, typically represented in a data cube or data warehouse.
12. Define sequence pattern mining.
Answer: Sequence pattern mining is the process of discovering sequential patterns or frequent
subsequences in a given dataset.
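The core operation in sequence pattern mining is counting how many sequences contain a candidate pattern as an order-preserving (not necessarily contiguous) subsequence. The sketch below simplifies each sequence to a list of single items; full sequential pattern miners also handle itemsets per step. The helper names and the sample database are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the match, preserving order.
    return all(item in remaining for item in pattern)


def sequence_support(pattern, database):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


# Hypothetical sequence database.
database = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["c", "a"]]
support_ac = sequence_support(["a", "c"], database)
print(support_ac)  # "a" is followed by "c" in the first three sequences
```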
Big Questions: UNIT-3

1. Find the frequent itemsets and strong association rules for the following transactional database
table using the Apriori algorithm. Consider the thresholds minimum support count = 2 and minimum
confidence = 50%.

TID     List of Item IDs
T100    I1, I2, I5
T200    I2, I4
T300    I2, I3
T400    I1, I2, I4
T500    I1, I3
T600    I2, I3
T700    I1, I3
T800    I1, I2, I3, I5
T900    I1, I2, I3
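A hand-worked answer to the table above can be checked with a short Apriori sketch. It uses the nine transactions from the table, a minimum support count of 2, and a minimum confidence of 0.5. The function names `apriori` and `strong_rules` are made up for this example, and the code favours readability over efficiency.

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def strong_rules(frequent, min_conf):
    """Return (antecedent, consequent, confidence) for rules meeting min_conf."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = support / frequent[lhs]
                if confidence >= min_conf:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


frequent = apriori(transactions, min_support=2)
rules = strong_rules(frequent, min_conf=0.5)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

For this table the largest frequent itemsets are {I1, I2, I3} and {I1, I2, I5}, each with support count 2.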
2. Explain the decision tree induction algorithm for classification and discuss the use of
information gain in it.
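When answering the question on information gain, it can help to compute it once by hand and confirm the result in code. The sketch below measures the entropy of the class labels and the reduction obtained by splitting on an attribute. The toy rows and attribute names are hypothetical and chosen so that one attribute is perfectly informative and the other is useless.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical training rows.
rows = [
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rain", "windy": True, "play": "yes"},
    {"outlook": "rain", "windy": False, "play": "yes"},
]
gain_outlook = information_gain(rows, "outlook", "play")
gain_windy = information_gain(rows, "windy", "play")
print(gain_outlook, gain_windy)
```

A decision tree learner that selects attributes by information gain would split on `outlook` first here, because it has the higher gain.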

3. Discuss the process of frequent itemset mining and its significance in data mining. Provide
examples to illustrate your answer.
4. Compare and contrast different classification methods, such as decision tree induction, Bayesian
classification, and support vector machines. Highlight their strengths and weaknesses.
5. Explain the concept of ensemble learning in data mining. Discuss the advantages and challenges
associated with ensemble methods. Provide examples to support your answer.
6. Evaluate the accuracy and error measures commonly used in data mining for assessing the
performance of classifiers or predictors. Discuss the scenarios in which different error measures are
appropriate to use.
7. Discuss the challenges and techniques involved in handling large datasets and missing or noisy
data during the classification and prediction process in data mining. Provide recommendations for
addressing these challenges.

Big Questions: UNIT-4

1. Discuss the main categories of clustering methods and provide examples of each.
2. Compare and contrast partitioning methods, hierarchical methods, and density-based methods in
cluster analysis. Discuss their strengths and weaknesses.
3. Explain the challenges and techniques involved in clustering high-dimensional data. Provide
recommendations for addressing these challenges.
4. Discuss the concept of outlier analysis in the context of cluster analysis. Explain the techniques
used to identify and handle outliers.
5. What is constraint-based cluster analysis, and how does it differ from traditional cluster analysis
methods? Discuss the advantages and limitations of constraint-based clustering.
Big Questions: UNIT-5

1. Discuss the challenges and techniques involved in mining data streams. Explain how data stream
mining differs from traditional data mining.
2. Explain the concept of sequence pattern mining in transactional databases. Provide an example to
illustrate its application.
3. Discuss the applications and challenges of mining sequence patterns in biological data. Explain
the techniques used for mining biological sequences.
4. What is the role of graph mining in data analysis? Discuss the algorithms and techniques used for
graph mining.
5. Describe the process of text mining and its applications in data mining. Discuss the challenges
and techniques involved in text mining.
