DWDM Lecture Notes U-1

Uploaded by

harshale13

UNIT-I

Knowledge Discovery in Databases (KDD)

Some people treat data mining as a synonym for knowledge discovery, while others view data
mining as an essential step in the process of knowledge discovery. The steps involved
in the knowledge discovery process are:

Data Cleaning - In this step, noise and inconsistent data are removed.
Data Integration - In this step, multiple data sources are combined.
Data Selection - In this step, data relevant to the analysis task are retrieved from the database.
Data Transformation - In this step, data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations.
Data Mining - In this step, intelligent methods are applied in order to extract data
patterns.
Pattern Evaluation - In this step, data patterns are evaluated.
Knowledge Presentation - In this step, knowledge is represented.
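
The steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the records, the function names, and the "best-selling item" stand-in for a mining method are all hypothetical and do not come from the notes.

```python
from collections import Counter

# Hypothetical raw records from two sources; None marks a missing value.
source_a = [{"customer": "ann", "item": "milk", "amount": 2.0},
            {"customer": "bob", "item": "bread", "amount": None}]
source_b = [{"customer": "ann", "item": "bread", "amount": 1.5},
            {"customer": "cy", "item": "milk", "amount": 2.0}]

def clean(records):
    """Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for src in sources for r in src]

def select(records, fields):
    """Data selection: keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """Data transformation: aggregate the amount sold per item."""
    totals = Counter()
    for r in records:
        totals[r["item"]] += r["amount"]
    return dict(totals)

def mine(totals):
    """Data mining (a trivial stand-in): find the best-selling item."""
    return max(totals, key=totals.get)

def evaluate(pattern, totals, threshold=3.0):
    """Pattern evaluation: keep the pattern only if it passes a threshold."""
    return pattern if totals[pattern] >= threshold else None

records = integrate(clean(source_a), clean(source_b))
totals = transform(select(records, ["item", "amount"]))
pattern = evaluate(mine(totals), totals)
# Knowledge presentation: show the result to the user.
print("Top item:", pattern, "| totals:", totals)
```

Each function maps to one step of the list, so a real method can replace any stage without touching the others.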

[Figure: Architecture of KDD (diagram of the knowledge discovery process)]

Data Warehouse:

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of
data in support of management's decision-making process.

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A
and source B may have different ways of identifying a product, but in a data warehouse, there
will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months, or 12 months ago, or even older data, from a data warehouse. This contrasts
with a transaction system, where often only the most recent data is kept. For example, a
transaction system may hold only the most recent address of a customer, whereas a data warehouse can
hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data
warehouse should never be altered.
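
The time-variant and non-volatile properties can be sketched with an append-only store of customer addresses. The class name `AddressHistory` and its methods are hypothetical and chosen only for illustration.

```python
from datetime import date

class AddressHistory:
    """Hypothetical append-only store of customer addresses."""

    def __init__(self):
        # Rows are (customer_id, address, valid_from); never updated in place.
        self._rows = []

    def load(self, customer_id, address, valid_from):
        # Non-volatile: new facts are appended, existing rows are not changed.
        self._rows.append((customer_id, address, valid_from))

    def all_addresses(self, customer_id):
        # Time-variant: the full history stays available.
        return [a for c, a, _ in self._rows if c == customer_id]

    def address_as_of(self, customer_id, day):
        # Latest address whose start date is on or before the given day.
        known = [(d, a) for c, a, d in self._rows
                 if c == customer_id and d <= day]
        return max(known)[1] if known else None

history = AddressHistory()
history.load(1, "12 Oak St", date(2020, 1, 1))
history.load(1, "7 Elm Ave", date(2022, 6, 1))
print(history.all_addresses(1))
print(history.address_as_of(1, date(2021, 1, 1)))
```

A transaction system would typically overwrite the first address with the second; here both remain queryable.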

Data Warehouse Design Process:

A data warehouse can be built using a top-down approach, a bottom-up approach, or a
combination of both.

The top-down approach starts with the overall design and planning. It is useful in cases
where the technology is mature and well known, and where the business problems that must
be solved are clear and well understood.

The bottom-up approach starts with experiments and prototypes. This is useful in the early
stage of business modeling and technology development. It allows an organization to move
forward at considerably less expense and to evaluate the benefits of the technology before
making significant commitments.

In the combined approach, an organization can exploit the planned and strategic nature of
the top-down approach while retaining the rapid implementation and opportunistic
application of the bottom-up approach.

The warehouse design process consists of the following steps:

Choose a business process to model, for example, orders, invoices, shipments, inventory,
account administration, sales, or the general ledger. If the business process is organizational
and involves multiple complex object collections, a data warehouse model should be
followed. However, if the process is departmental and focuses on the analysis of one kind of
business process, a data mart model should be chosen.
Choose the grain of the business process. The grain is the fundamental, atomic level of data
to be represented in the fact table for this process, for example, individual transactions,
individual daily snapshots, and so on.
Choose the dimensions that will apply to each fact table record. Typical dimensions are
time, item, customer, supplier, warehouse, transaction type, and status.
Choose the measures that will populate each fact table record. Typical measures are numeric
additive quantities like dollars sold and units sold.
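
The four design choices (business process, grain, dimensions, measures) can be sketched as a tiny in-memory star schema. The class and field names below are assumptions made for illustration and are not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeDim:
    time_key: int
    day: str          # ISO date string

@dataclass(frozen=True)
class ItemDim:
    item_key: int
    name: str

@dataclass(frozen=True)
class SalesFact:
    # Business process: sales. Grain: one row per individual transaction.
    time_key: int        # reference to TimeDim
    item_key: int        # reference to ItemDim
    dollars_sold: float  # numeric additive measure
    units_sold: int      # numeric additive measure

items = {1: ItemDim(1, "milk"), 2: ItemDim(2, "bread")}
times = {10: TimeDim(10, "2024-01-01"), 11: TimeDim(11, "2024-01-02")}
facts = [
    SalesFact(10, 1, 4.0, 2),
    SalesFact(10, 2, 3.0, 1),
    SalesFact(11, 1, 2.0, 1),
]

def total_by_item(facts, items):
    """Sum the additive measures per item name."""
    totals = {}
    for f in facts:
        name = items[f.item_key].name
        dollars, units = totals.get(name, (0.0, 0))
        totals[name] = (dollars + f.dollars_sold, units + f.units_sold)
    return totals

print(total_by_item(facts, items))
```

Because the measures are additive, they can be summed along any dimension without losing meaning.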

A Three Tier Data Warehouse Architecture:

Tier-1:

The bottom tier is a warehouse database server that is almost always a relational database
system. Back-end tools and utilities are used to feed data into the bottom tier from
operational databases or other external sources (such as customer profile information
provided by external consultants). These tools and utilities perform data extraction,
cleaning, and transformation (e.g., to merge similar data from different sources into a
unified format), as well as load and refresh functions to update the data warehouse. The
data are extracted using application program interfaces known as gateways. A gateway is
supported by the underlying DBMS and allows client programs to generate SQL code to
be executed at a server.

Examples of gateways include ODBC (Open Database Connectivity) and OLE DB (Object
Linking and Embedding, Database) by Microsoft, and JDBC (Java Database
Connectivity).

This tier also contains a metadata repository, which stores information about the data
warehouse and its contents.
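
A minimal sketch of the extract, clean, transform, and load functions follows. It assumes Python's built-in `sqlite3` module as a stand-in for a gateway such as ODBC or JDBC: the client program sends SQL through a connection and the database engine executes it. The table and column names are hypothetical.

```python
import sqlite3

# In-memory databases stand in for an operational source and the warehouse.
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, product TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, " Milk ", "2.50"), (2, "bread", None), (3, "MILK", "1.00")],
)
warehouse.execute(
    "CREATE TABLE sales (order_id INTEGER, product TEXT, amount REAL)"
)

# Extract: the client sends SQL through the connection to the source.
rows = source.execute("SELECT id, product, amount FROM orders").fetchall()

# Clean: drop rows with a missing amount.
rows = [r for r in rows if r[2] is not None]

# Transform: unify product spelling and convert the amount to a number.
rows = [(oid, product.strip().lower(), float(amount))
        for oid, product, amount in rows]

# Load: write the unified rows into the warehouse table.
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
warehouse.commit()

loaded = warehouse.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product"
).fetchall()
print(loaded)
```

The two spellings of "milk" end up as a single product in the warehouse, which is the "unified format" the tier description refers to.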

Tier-2:

The middle tier is an OLAP server that is typically implemented using either a relational
OLAP (ROLAP) model or a multidimensional OLAP (MOLAP) model.

A relational OLAP (ROLAP) model, that is, an extended relational DBMS that maps operations on
multidimensional data to standard relational operations.
A multidimensional OLAP (MOLAP) model, that is, a special-purpose server
that directly implements multidimensional data and operations.

Tier-3:

The top tier is a front-end client layer, which contains query and reporting
tools, analysis tools, and/or data mining tools (e.g., trend analysis, prediction, and so
on).

Data Warehouse Models:

There are three data warehouse models.

1. Enterprise warehouse:
An enterprise warehouse collects all of the information about subjects spanning the entire
organization.
It provides corporate-wide data integration, usually from one or more operational
systems or external information providers, and is cross-functional in scope.
It typically contains detailed data as well as summarized data, and can range in size from a
few gigabytes to hundreds of gigabytes, terabytes, or beyond.
An enterprise data warehouse may be implemented on traditional mainframes, computer
superservers, or parallel architecture platforms. It requires extensive business modeling
and may take years to design and build.

2. Data mart:

A data mart contains a subset of corporate-wide data that is of value to a specific group of
users. The scope is confined to specific selected subjects. For example, a marketing data
mart may confine its subjects to customer, item, and sales. The data contained in data
marts tend to be summarized.

Data marts are usually implemented on low-cost departmental servers that
are UNIX/LINUX- or Windows-based. The implementation cycle of a data mart is more
likely to be measured in weeks rather than months or years. However, it may involve
complex integration in the long run if its design and planning were not enterprise-wide.

Depending on the source of data, data marts can be categorized as independent
or dependent. Independent data marts are sourced from data captured from one or
more operational systems or external information providers, or from data generated
locally within a particular department or geographic area. Dependent data marts are
sourced directly from enterprise data warehouses.

3. Virtual warehouse:

A virtual warehouse is a set of views over operational databases. For efficient
query processing, only some of the possible summary views may be materialized.
A virtual warehouse is easy to build but requires excess capacity on operational
database servers.
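
The difference between a plain view and a materialized summary can be sketched with `sqlite3`. SQLite has no materialized views, so this sketch assumes a regular table filled from the view as a stand-in; all table and view names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for an operational database
db.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("east", "milk", 2.0), ("east", "bread", 1.5), ("west", "milk", 3.0)],
)

# A view: no data is copied; the summary is recomputed on every query,
# which is why the operational server needs spare capacity.
db.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

# A materialized summary (stand-in): the result is stored once as a table.
db.execute("CREATE TABLE sales_by_region_mat AS SELECT * FROM sales_by_region")

# A new order arrives in the operational table.
db.execute("INSERT INTO orders VALUES ('west', 'bread', 1.0)")

view_rows = db.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
mat_rows = db.execute(
    "SELECT region, total FROM sales_by_region_mat ORDER BY region"
).fetchall()
print(view_rows)  # reflects the new order
print(mat_rows)   # stays as computed until it is refreshed
```

The view is always current but costs a scan per query; the stored copy is cheap to read but needs a refresh to pick up new data.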

Meta Data Repository:

Metadata are data about data. When used in a data warehouse, metadata are the data that define
warehouse objects. Metadata are created for the data names and definitions of the given
warehouse. Additional metadata are created and captured for timestamping any extracted data,
the source of the extracted data, and missing fields that have been added by data cleaning or
integration processes.

A metadata repository should contain the following:

A description of the structure of the data warehouse, which includes the warehouse
schema, view, dimensions, hierarchies, and derived data definitions, as well as data mart
locations and contents.

Operational metadata, which include data lineage (history of migrated data and the
sequence of transformations applied to it), currency of data (active, archived, or purged),
and monitoring information (warehouse usage statistics, error reports, and audit trails).

The algorithms used for summarization, which include measure and dimension
definition algorithms, data on granularity, partitions, subject areas, aggregation,
summarization, and predefined queries and reports.

The mapping from the operational environment to the data warehouse, which
includes source databases and their contents, gateway descriptions, data partitions, data
extraction, cleaning, transformation rules and defaults, data refresh and purging rules,
and security (user authorization and access control).

Data related to system performance, which include indices and profiles that improve data
access and retrieval performance, in addition to rules for the timing and scheduling of
refresh, update, and replication cycles.

Business metadata, which include business terms and definitions, data
ownership information, and charging policies.

OLAP (Online Analytical Processing):

OLAP is an approach to answering multi-dimensional analytical (MDA) queries swiftly.

OLAP is part of the broader category of business intelligence, which also
encompasses relational databases, report writing and data mining.
OLAP tools enable users to analyze multidimensional data interactively
from multiple perspectives.

OLAP consists of three basic analytical operations:

• Consolidation (Roll-Up)
• Drill-Down
• Slicing and Dicing

Consolidation involves the aggregation of data that can be accumulated and computed in
one or more dimensions. For example, all sales offices are rolled up to the sales
department or sales division to anticipate sales trends.

The drill-down is a technique that allows users to navigate through the details. For
instance, users can view the sales by individual products that make up a region’s
sales.

Slicing and dicing is a feature whereby users can take out (slicing) a specific set of
data of the OLAP cube and view (dicing) the slices from different viewpoints.
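
The three operations can be sketched on a small cube held in a dictionary. The data and function names are hypothetical, and the sketch assumes the common convention that a slice fixes one dimension to a single value while a dice restricts several dimensions to sets of values.

```python
from collections import defaultdict

# Hypothetical cube cells: (division, office, product) -> sales
cells = {
    ("north", "oslo", "milk"): 10,
    ("north", "oslo", "bread"): 5,
    ("north", "bergen", "milk"): 7,
    ("south", "rome", "milk"): 4,
    ("south", "rome", "bread"): 6,
}
DIMS = ("division", "office", "product")

def aggregate(cells, keep):
    """Sum sales over every dimension except those named in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, sales in cells.items():
        out[tuple(key[i] for i in idx)] += sales
    return dict(out)

def slice_cube(cells, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}

def dice_cube(cells, allowed):
    """Dice: restrict several dimensions to sets of values."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())}

# Roll-up: offices are consolidated into their divisions.
roll_up = aggregate(cells, ["division"])
# Drill-down: from division totals back down to office level.
drill_down = aggregate(cells, ["division", "office"])
milk = slice_cube(cells, "product", "milk")
sub = dice_cube(cells, {"division": {"north"}, "product": {"milk", "bread"}})
print(roll_up, drill_down, milk, sub, sep="\n")
```

Roll-up and drill-down are the same aggregation run at different levels of the division/office hierarchy.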

Types of OLAP:

1. Relational OLAP (ROLAP):

ROLAP works directly with relational databases. The base data and the dimension
tables are stored as relational tables and new tables are created to hold the aggregated
information. It depends on a specialized schema design.
This methodology relies on manipulating the data stored in the relational database to
give the appearance of traditional OLAP's slicing and dicing functionality. In essence,
each action of slicing and dicing is equivalent to adding a "WHERE" clause in the
SQL statement.
ROLAP tools do not use pre-calculated data cubes but instead pose the query to the
standard relational database and its tables in order to bring back the data required to
answer the question.
ROLAP tools can answer any question because the methodology is not limited
to the contents of a cube. ROLAP also has the ability to drill down to the
lowest level of detail in the database.
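
The point that each slice or dice adds a WHERE condition can be sketched with `sqlite3`. The `sales` table and the helper `rolap_query` are hypothetical; a real ROLAP server generates far more elaborate SQL.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)"
)
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("east", "milk", 2023, 2.0), ("east", "bread", 2023, 1.0),
    ("west", "milk", 2023, 3.0), ("east", "milk", 2024, 4.0),
])

ALLOWED = {"region", "product", "year"}  # whitelist of column names

def rolap_query(db, group_by, filters):
    """Build a GROUP BY query; each filter becomes one WHERE condition."""
    if not group_by or not set(group_by) <= ALLOWED or not set(filters) <= ALLOWED:
        raise ValueError("unknown or missing column")
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    cols = ", ".join(group_by)
    sql = (f"SELECT {cols}, SUM(amount) FROM sales "
           f"WHERE {where} GROUP BY {cols} ORDER BY {cols}")
    return db.execute(sql, list(filters.values())).fetchall()

by_region = rolap_query(db, ["region"], {})
slice_2023 = rolap_query(db, ["region"], {"year": 2023})
dice = rolap_query(db, ["region"], {"year": 2023, "product": "milk"})
print(by_region, slice_2023, dice, sep="\n")
```

No aggregate is stored in advance: every call goes back to the base table, which is why any question can be asked but response time depends on the query.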

2. Multidimensional OLAP (MOLAP):

MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP.

MOLAP stores its data in an optimized multi-dimensional array storage, rather than
in a relational database. It therefore requires the pre-computation and storage of
information in the cube, an operation known as processing.

MOLAP tools generally utilize a pre-calculated data set referred to as a data cube.
The data cube contains all the possible answers to a given range of questions.

MOLAP tools have a very fast response time and the ability to quickly write
back data into the data set.

3. Hybrid OLAP (HOLAP):

There is no clear agreement across the industry as to what constitutes Hybrid OLAP,
except that a database will divide data between relational and specialized storage.
For example, for some vendors, a HOLAP database will use relational tables to hold
the larger quantities of detailed data, and use specialized storage for at least some
aspects of the smaller quantities of more-aggregate or less-detailed data.
HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the
capabilities of both approaches.
HOLAP tools can utilize both pre-calculated cubes and relational data sources.
