Data Mining and Data Warehousing

This document discusses data warehousing and data mining. It covers key concepts like data warehouse architectures, data cube modeling, OLAP operations, and efficient processing techniques. It also discusses data mining tasks like association analysis, classification, clustering analysis and algorithms like decision trees, k-means clustering etc. The outcomes are to identify data mining problems, implement data warehouses, write association rules and choose classification or clustering solutions. It defines a data warehouse as a subject-oriented, integrated, time-variant and non-volatile collection of data to support management decision making.

Uploaded by

pavanalina

Data Mining and Data Warehousing
18CS641
Module 1

 Data Warehousing & Modelling
Basic Concepts: Data Warehousing: A multitier Architecture, Data
warehouse models: Enterprise warehouse, Data mart and virtual
warehouse, Extraction, Transformation and Loading, Data Cube: A
multidimensional data model, Stars, Snowflakes and Fact
constellations: Schemas for multidimensional Data models,
Dimensions: The role of concept Hierarchies, Measures: Their
Categorization and computation, Typical OLAP Operations.
Module 2

 Data warehouse implementation & Data mining
Efficient Data Cube computation: An overview, Indexing OLAP
Data: Bitmap index and join index, Efficient processing of OLAP
Queries, OLAP server Architecture: ROLAP versus MOLAP versus
HOLAP. Introduction: What is data mining, Challenges, Data
Mining Tasks, Data: Types of Data, Data Quality, Data
Preprocessing, Measures of Similarity and Dissimilarity.
Module 3

 Association Analysis
Association Analysis: Problem Definition, Frequent Itemset
Generation, Rule Generation, Alternative Methods for Generating
Frequent Itemsets, FP-Growth Algorithm, Evaluation of Association
Patterns.
Module 4

 Classification
Decision Tree Induction, Methods for Comparing Classifiers,
Rule-Based Classifiers, Nearest Neighbor Classifiers, Bayesian
Classifiers.
Module 5

 Clustering Analysis
Overview, K-Means, Agglomerative Hierarchical Clustering,
DBSCAN, Cluster Evaluation, Density-Based Clustering, Graph-
Based Clustering, Scalable Clustering Algorithms.
Course Outcomes

 The student will be able to:
• Identify data mining problems and implement the data warehouse.
• Write association rules for a given data pattern.
• Choose between classification and clustering solutions.
Textbooks

1. Pang-Ning Tan, Michael Steinbach, Vipin Kumar: Introduction to Data Mining, Pearson, First impression, 2014.
2. Jiawei Han, Micheline Kamber, Jian Pei: Data Mining: Concepts and Techniques, 3rd Edition, Morgan Kaufmann Publishers, 2012.
What is a Data Warehouse?

Definition: “A data warehouse is a subject-oriented, integrated, time-variant, and
non-volatile collection of data in support of management’s decision making
process” - William H. Inmon
 It provides architectures and tools for business executives to systematically
organize, understand, and use their data to make strategic decisions.
 It is a data repository that is maintained separately from an organization’s
operational databases.
 It allows for integration of a variety of application systems.
 It supports information processing by providing a solid platform of
consolidated historic data for analysis.
Key features of a Data Warehouse are

 Subject-oriented
 Integrated
 Time-variant
 Non-volatile
Subject-oriented

 A data warehouse is organized around major subjects such
as customer, supplier, product, and sales, rather than around
day-to-day transactions.
 It focuses on the modelling and analysis of data for
decision makers, not on transaction processing.
 It provides a concise view of a particular subject issue by
excluding data that are not useful in the decision making process.
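
As a rough illustration of the subject-oriented view, the sketch below rolls individual transaction rows up into per-subject totals. The `transactions` rows, their field names, and the `summarize_by` helper are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical day-to-day transaction records (operational view).
transactions = [
    {"txn_id": 1, "customer": "alice", "product": "pen", "amount": 2.0},
    {"txn_id": 2, "customer": "bob", "product": "ink", "amount": 3.0},
    {"txn_id": 3, "customer": "alice", "product": "ink", "amount": 3.0},
]


def summarize_by(subject, rows):
    """Total the 'amount' field per value of the chosen subject column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[subject]] += row["amount"]
    return dict(totals)


# The same rows viewed around two different subjects.
sales_by_customer = summarize_by("customer", transactions)
sales_by_product = summarize_by("product", transactions)
```

The transaction identifiers drop out of the result: only what matters for analysing the chosen subject is kept.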
Integrated

 A data warehouse is usually constructed by integrating
multiple heterogeneous sources, such as relational databases, flat
files, and online transaction records.
 Data cleaning and data integration techniques are applied
to ensure consistency in naming conventions, encoding
structures, attribute measures, and so on.
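
A minimal sketch of what this integration step can look like, assuming two hypothetical source systems (`crm_rows` and `billing_rows`) that disagree on column names, gender encoding, and the unit of the revenue measure. None of these names come from the slides.

```python
# Hypothetical records from two source systems with different conventions.
crm_rows = [{"cust_id": 1, "gender": "M", "revenue_usd": 120.0}]
billing_rows = [{"CustomerID": 2, "sex": "female", "revenue_cents": 4550}]

# One agreed encoding for gender across all sources.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def from_crm(row):
    """Map a CRM record onto the warehouse naming and encoding."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["gender"].lower()],
        "revenue_usd": float(row["revenue_usd"]),
    }


def from_billing(row):
    """Map a billing record onto the same schema, converting cents to dollars."""
    return {
        "customer_id": row["CustomerID"],
        "gender": GENDER_CODES[row["sex"].lower()],
        "revenue_usd": row["revenue_cents"] / 100,
    }


def integrate(crm, billing):
    """Combine both sources into one consistently named and encoded list."""
    return [from_crm(r) for r in crm] + [from_billing(r) for r in billing]


warehouse_rows = integrate(crm_rows, billing_rows)
```

After integration every row uses the same column names, the same gender codes, and the same unit for revenue, whichever system it came from.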
Time-variant

 Data are stored to provide information from a historic
perspective (e.g., the past 5–10 years).
 Every key structure in the data warehouse contains, either
implicitly or explicitly, a time element.
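
One way to picture a key structure with an explicit time element is a lookup keyed by (product, effective date). The `price_history` data and the `price_as_of` helper below are assumptions made for illustration, not part of the course material.

```python
from datetime import date

# Hypothetical history: each key carries an explicit time element.
price_history = {
    ("P1", date(2020, 1, 1)): 10.0,
    ("P1", date(2021, 1, 1)): 12.0,
}


def price_as_of(product_id, as_of):
    """Return the most recent price effective on or before `as_of`, or None."""
    candidates = [
        (effective, price)
        for (pid, effective), price in price_history.items()
        if pid == product_id and effective <= as_of
    ]
    if not candidates:
        return None
    # Pick the entry with the latest effective date.
    return max(candidates, key=lambda pair: pair[0])[1]
```

Because old entries are kept rather than overwritten, a question such as "what was the price in mid-2020?" remains answerable after the price changes.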
Non-volatile

 A data warehouse is always a physically separate store of
data transformed from the application data found in the
operational environment.
 Due to this separation, a data warehouse does not require
transaction processing, recovery, and concurrency control
mechanisms.
 It usually requires only two operations in data accessing:
initial loading of data and access of data.
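
The two-operation access pattern can be sketched as a store that exposes only a load and a read method. The `WarehouseStore` class is a hypothetical toy, not a real warehouse API; it simply omits update and delete operations.

```python
class WarehouseStore:
    """Toy store exposing only the two operations named above."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append copies of the given rows; existing rows are never changed."""
        self._rows.extend(dict(r) for r in rows)

    def query(self, predicate):
        """Return copies of the rows that satisfy `predicate`."""
        return [dict(r) for r in self._rows if predicate(r)]


store = WarehouseStore()
store.load([{"year": 2020, "sales": 100}, {"year": 2021, "sales": 150}])
recent = store.query(lambda r: r["year"] >= 2021)

# Changing a query result does not alter the stored data.
recent[0]["sales"] = 0
still_stored = store.query(lambda r: r["year"] == 2021)
```

Returning copies from `query` is a design choice in this sketch to keep the stored rows untouched by callers.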
