COMP4433
Credit Value 3
Level 3
Objectives This subject aims at equipping students with the latest knowledge and skills to:
1. create a clean, consistent repository of data within a data warehouse for large
corporations;
4. expose students to new techniques and ideas that can be used to improve the
effectiveness of current data mining tools.
(a) identify and analyse why there is a need for data warehouse in addition to
traditional operational database systems, motivated by real examples;
(b) conduct in-depth analysis of the key components in typical and advanced data
warehouse architectures;
(c) design a data warehouse and understand the process required to construct one;
(d) identify and analyse why there is a need for data mining and in what ways it is
different from traditional statistical techniques, motivated by real examples;
(e) learn and master the algorithms made available by popular commercial data
mining software;
(f) solve real data mining problems by using the right tools to find interesting
patterns;
(h) obtain hands-on experience with some popular data mining software;
(k) generate innovative solutions individually or in groups and develop group work
skills directly and indirectly.
Subject Synopsis/Indicative Syllabus
Topic
1. Introduction to Data Warehousing and Data Mining
Introduction to data warehousing and data mining; possible application areas
in business and finance; definitions and terminologies; types of data mining
problems.
2. Data Warehousing
Data warehouse and data warehousing; data warehouse and the industry;
definitions; operational databases vs. data warehouses.
6. Association Rules
Mining of association rules; the Apriori algorithm; binary, quantitative and
generalised association rules; interestingness measures.
7. Classification
Classification; decision tree based algorithms; Bayesian approach; statistical
approaches, nearest neighbour approach; neural network based approach;
genetic algorithms based technique; evaluation of classification model.
8. Clustering
Clustering; k-means algorithm; hierarchical algorithm; Condorcet; neural
network and genetic algorithms based approach; evaluation of effectiveness.
Jul 2022
10. Other Techniques
Computational intelligence techniques; fuzzy logic, genetic algorithms and
neural networks for data mining.
Laboratory Experiment:
Topic
1. Discover association rules and sequential patterns using data mining tools
2. Discover classification rules using data mining tools
3. Discover clusters using data mining tools
Case Study:
1. Application of data mining techniques to solve real business problems.
2. Attributes leading to success and failure of data warehousing projects tutorials
when appropriate.
Teaching/Learning Methodology
This subject consists mainly of class lectures and laboratory sessions. In the class
lectures, various cases will be presented to help students understand why there is a
need for a data warehouse to be built and why data mining is important for modern-day
business intelligence. Students will be given time to participate in discussions when
the cases are presented.
All assignments and projects will also be given in the form of different cases collected
so as to allow students to learn more about how data warehouses and data mining can
be and have been used in real business environments. For the projects and
assignments, students are expected to learn independently and think critically with
minimal guidance. They are expected to practise their writing skills through project
documentation and report writing. As students will work in teams on the project,
they are also expected to learn to work with each other collaboratively.
Assessment Methods in Alignment with Intended Learning Outcomes

Specific assessment     % weighting    Intended subject learning outcomes to be assessed
methods/tasks                          a b c d e f g h i j k

Continuous Assessment   55%
  1. Assignment
  2. Project
Examination             45%
Total                   100%
Tutorials 0 Hrs.
Reading List and References
Reference Books:
1. Han, Jiawei and Kamber, Micheline, Data Mining: Concepts and Techniques,
3rd Edition, Morgan Kaufmann, 2012.
3. Inmon, W.H., Strauss, Derek and Neushloss, Genia, DW 2.0: The Architecture
for the Next Generation of Data Warehousing, Morgan Kaufmann, 2008.
4. Rokach, Lior and Maimon, Oded Z., Data Mining with Decision Trees: Theory
and Applications, World Scientific, 2008.
5. Witten, Ian H., Frank, Eibe and Hall, Mark A., Data Mining: Practical Machine
Learning Tools and Techniques, 3rd Edition, Morgan Kaufmann, 2011.
7. Cox, Earl, Fuzzy Modeling and Genetic Algorithms for Data Mining and
Exploration, Morgan Kaufmann, 2005.
8. Liu, Bing, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,
Springer, Berlin Heidelberg, 2009.
10. Shapiro, A.F. and Jain, L.C., Intelligent and Other Computational Techniques
in Insurance: Theory and Applications, World Scientific, 2003.