Data Warehousing and Data Mining Question Bank
16 Marks
1. Design a complete data warehouse architecture for an enterprise system that
supports both retail (e.g., Walmart) and banking operations. Your design should
include appropriate schemas (star, snowflake, and fact constellation), metadata, and
concept hierarchies.
2. Compare and analyze OLAP and OLTP systems by applying them to real-time use
cases like airline reservation systems and online ticket booking. Highlight their
differences in terms of structure, speed, and suitability for decision support.
3. Evaluate OLAP operations (Roll-up, Drill-down, Slice, Dice, Pivot) in the context
of financial forecasting and educational data analysis. Explain how these operations
enhance business intelligence.
4. Develop a scalable data warehouse model for a multi-platform business like an
online travel agency. Address challenges related to real-time fraud detection,
integration with multiple DBMSs, and the role of OLAP tools in operational
reporting.
5. Construct and apply the Knowledge Discovery in Databases (KDD) process to
analyze social media user behavior and predict telecom customer churn. Explain each
step with focus on data selection, cleaning, transformation, and pattern evaluation.
6. Evaluate the role of data preprocessing techniques (cleaning, integration,
transformation, reduction, and discretization) in telecom customer churn
prediction, transportation analysis, and stock market trend forecasting.
7. Discuss the challenges and importance of data visualization and statistical
description in real-time systems like surveillance, IoT, and fraud detection in
banking. Provide appropriate preprocessing and visualization workflows.
8. Assess the role of transformation, discretization, and visualization in predicting
stock market trends and processing IoT sensor data. Discuss the challenges of
implementing these preprocessing stages in large-scale data environments.
9. Design and implement an association rule mining model using the Apriori
algorithm for analyzing customer purchase patterns in an online grocery store.
Highlight how pattern evaluation measures like support, confidence, and lift impact
decision-making.
10. Evaluate constraint-based frequent pattern mining techniques with applications
in fraud detection and telecom churn prediction. Discuss the importance of constraints
in reducing pattern explosion and increasing mining efficiency.
11. Construct a multidimensional frequent pattern mining model for e-commerce
platforms like Amazon or Flipkart. Explain the design process, schema selection, and
pattern generalization across product categories and customer segments.
12. Develop a classification model using frequent patterns mined from social media
content. Analyze how mined patterns can improve the accuracy and interpretability of
the classifier.
13. Develop classification models using a decision tree and a support vector machine
(SVM) for detecting credit card fraud and analyzing customer sentiment. Compare
their performance using evaluation metrics like accuracy, precision, and recall.
14. Compare and evaluate clustering algorithms (k-means, hierarchical,
density-based, grid-based) for real-world applications such as document
categorization, customer segmentation, and health diagnostics.
15. Construct and analyze a clustering model for high-dimensional data such as
medical image segmentation and satellite imagery. Discuss challenges like
dimensionality, scalability, and noise handling.
16. Design a classification-clustering hybrid approach to group students based on
learning behavior and predict performance using rule-based classification. Justify
model selection and discuss improvements with clustering evaluation techniques.
17. Apply classification algorithms using WEKA (such as J48, Naive Bayes, or SVM)
on real-world datasets like Breast Cancer or Diabetes. Evaluate model performance
using a confusion matrix, ROC curves, and cross-validation.
18. Design and compare clustering models in WEKA using the Iris and Auto Imports
datasets. Explain how clustering results vary by algorithm (e.g., k-means vs. EM) and
the role of visualization in interpretation.
19. Use WEKA to perform association rule mining on retail transaction datasets.
Interpret mined rules using support, confidence, and lift. Explain how filters and data
format (e.g., ARFF) influence preprocessing and results.
20. Develop a complete end-to-end workflow in WEKA involving data preprocessing
(cleaning, filtering), model building, testing, and result visualization. Apply it to a
real-time text classification task or campaign targeting system.
2 Marks
1. Define OLAP and list its types.
2. List components of a Data Warehouse.
3. Differentiate between OLTP and OLAP.
4. What are star, snowflake, and fact constellation schemas?
5. Mention any two OLAP operations used in business analytics.
6. Define metadata and explain its role in a data warehouse.
7. List any two benefits of using data warehouses in retail industries.
8. Mention any two key characteristics of OLAP systems.
9. State the purpose of concept hierarchies in business analysis.
10. List the phases involved in building a data warehouse.
11. Define data mining.
12. List the steps in the knowledge discovery in databases (KDD) process.
13. Mention two data cleaning techniques.
14. State the role of data integration in preprocessing.
15. Define data transformation and discretization, with one example of each.
16. List types of data attributes.
17. Mention two real-world applications of data mining.
18. Define data visualization and give one benefit.
19. List two statistical methods used in data preprocessing.
20. Mention any two issues or challenges in data mining.
21. Define frequent itemset.
22. What is the Apriori principle?
23. State the purpose of support and confidence in association rule mining.
24. Mention one difference between multilevel and multidimensional pattern mining.
25. List two pattern evaluation measures.
26. What is constraint-based pattern mining?
27. State the need for lift and conviction in association analysis.
28. List two real-world applications of association rule mining.
29. Mention one advantage of FP-Growth over Apriori.
30. List any two steps involved in the Apriori algorithm.
31. Define classification and give one example.
32. What is a decision tree?
33. Mention one real-world application of SVM.
34. List types of clustering algorithms.
35. State one difference between supervised and unsupervised learning.
36. Mention one property of density-based clustering.
37. Define hierarchical clustering.
38. List two clustering evaluation measures.
39. What is overfitting in classification?
40. Mention one use-case of hybrid classification and clustering.
41. What is WEKA and why is it used?
42. List two datasets available in WEKA.
43. Mention any two classification algorithms in WEKA.
44. What is an ARFF file?
45. List any two preprocessing filters available in WEKA.
46. Define clustering and list one algorithm used in WEKA.
47. Mention one use-case for association rule mining in WEKA.
48. State the use of cross-validation in model evaluation.
49. List two tabs/features available in the WEKA Explorer interface.
50. Mention one advantage of using WEKA for educational purposes.