Data Warehousing and Data Mining
1 12/08/21 05:34 PM
DATA WAREHOUSING
A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format.
A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile
collection of data in support of management's
decision-making process (W. H. Inmon).
Characteristics of data warehousing
– Subject oriented
– Integrated
– Time variant (time series)
– Nonvolatile
– Web based
– Relational/multidimensional
– Client/server
– Real-time
– Include metadata
Refer PDF
Data Warehousing
Definitions and Concepts
Data mart
A departmental data warehouse that stores only
relevant data
Dependent data mart
A subset that is created directly from a data
warehouse
Independent data mart
A small data warehouse designed for a strategic
business unit or a department
The concept of the DW emerges from several sets of information needs that
users have. These needs have arisen from a change in the management style
of different classes of users, who now need an organization-wide view of
information. These needs are critical to the success of the business.
Decision makers are required to react quickly to mission-critical
needs because of rapidly changing, volatile, and competitive markets.
They need multidimensional support of information.
They need information for strategic decisions. They need both internal
and external information, which gives a larger view of a problem scenario.
Meeting such needs depends on identifying patterns and trends, and it
requires an enterprise-wide view rather than a localized, functional view
of the subject. The DW is designed to meet these needs and to deliver the
required information effectively.
There are three kinds of end users of information:
The management
Knowledge workers
Operations staff
Management needs a holistic view of a situation, including what is expected
in the future. This view helps management identify critical changes that
have taken place in the business, see the patterns and factors behind those
changes, and use them to business advantage.
Knowledge workers belong to the middle-management level of the
organizational hierarchy. Their needs are multidimensional and depend
on their role and position.
The needs of operations staff are fulfilled through transaction
processing systems, where the decision-making process is automated by
embedding the rules in the system.
DATA WAREHOUSING INCLUDES:
Retrieving data
Analyzing data
Extracting data
Loading data
Transforming data
Managing data
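The extracting, transforming, and loading activities in this list can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout, the function names, and the in-memory list standing in for the warehouse are all assumptions, not part of any particular product.

```python
# Hypothetical operational records; the field names are illustrative only.
SOURCE_ROWS = [
    {"id": "1", "customer": " alice ", "amount": "120.50"},
    {"id": "2", "customer": "BOB", "amount": "80"},
    {"id": "3", "customer": "", "amount": "15.25"},  # missing customer name
]


def extract(rows):
    """Retrieve raw records from an operational source (here, a list)."""
    return list(rows)


def transform(rows):
    """Cleanse records and put them into a standardized format."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        if not name:
            # Assumed quality rule: skip records without a customer name.
            continue
        cleaned.append(
            {"id": int(row["id"]), "customer": name, "amount": float(row["amount"])}
        )
    return cleaned


def load(rows, warehouse):
    """Append the cleansed records to the warehouse table."""
    warehouse.extend(rows)
    return warehouse


# Run the three steps in sequence against an empty warehouse.
warehouse = load(transform(extract(SOURCE_ROWS)), [])
```

In a real warehouse each step would talk to databases or files rather than Python lists, but the order of responsibilities stays the same.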
Data Warehousing
Process Overview
Components of a Data Warehouse
WAREHOUSE MANAGEMENT relates to the day-to-day
management of the data warehouse. The management tasks
associated with the warehouse include ensuring its availability,
the effective backup of its contents, and its security.
QUERY MANAGEMENT relates to the provision of access to
the contents of the warehouse and may include the partitioning
of information into different areas with different privileges to
different users. Access may be provided through custom-built
applications, or ad hoc query tools.
The architecture
[Figure: Typical architecture of a data warehouse. Components shown:
operational data source 1, operational data source 2, ..., operational
data source n, and the operational data store (ODS); Load Manager;
Warehouse Manager; Query Manager; DBMS holding meta-data, highly
summarized data, lightly summarized data, and detailed data;
archive/backup data; end-user access tools, namely reporting, query,
application development, and EIS (executive information system) tools,
and OLAP (online analytical processing) tools.]
Load manager: also called the frontend component, it performs all the
operations associated with the extraction and loading of data into the
warehouse. These operations include simple transformations of the data
to prepare it for entry into the warehouse.
Warehouse manager: performs all the operations associated with
the management of the data in the warehouse. The operations
performed by this component include analysis of data to ensure
consistency, transformation and merging of source data, creation
of indexes and views, generation of denormalizations and
aggregations, and archiving and backing up data.
Query manager: also called the backend
component, it performs all the operations
associated with the management of user
queries. The operations performed by this
component include directing queries to the
appropriate tables and scheduling the
execution of queries.
Detailed data, lightly and highly summarized data, archive/backup data
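As a rough illustration of the query manager's two duties described above, directing queries to the appropriate tables and scheduling their execution, consider the sketch below. The table names, the notion of a query "grain", and the rule that summary-table queries run first are all assumptions made for the example.

```python
# Hypothetical table names, one per level of summarization.
TABLES_BY_GRAIN = {
    "day": "sales_detailed",
    "month": "sales_lightly_summarized",
    "year": "sales_highly_summarized",
}

# Assumed relative cost of scanning each table (smaller runs first).
TABLE_COST = {
    "sales_highly_summarized": 0,
    "sales_lightly_summarized": 1,
    "sales_detailed": 2,
}


def route_query(grain):
    """Return the table that can answer a query at the requested grain."""
    if grain not in TABLES_BY_GRAIN:
        raise ValueError(f"unsupported grain: {grain!r}")
    return TABLES_BY_GRAIN[grain]


def schedule(requests):
    """Route each (user, grain) request, then order the cheapest first."""
    routed = [(user, route_query(grain)) for user, grain in requests]
    return sorted(routed, key=lambda item: TABLE_COST[item[1]])


queue = schedule([("ann", "day"), ("raj", "year"), ("lee", "month")])
```

With this policy, the yearly query against the highly summarized table is scheduled ahead of the daily query against the detailed table.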
Meta-data
End-user access tools can be categorized
into five main groups: data reporting and
query tools, application development tools,
executive information system (EIS) tools,
online analytical processing (OLAP) tools,
and data mining tools.
DATA WAREHOUSE ARCHITECTURE
Data Warehousing Architectures
The purpose of the Data Warehouse is to integrate
corporate data.
The amount of data in the Data Warehouse is
massive. Data is stored at a very deep level of
detail.
This allows data to be grouped in many different
ways.
A Data Warehouse does not contain all the data in
the organization. Its purpose is to provide the base data
needed by the organization for strategic and
tactical decision making.
ETL extracts data from the Data Warehouse and
sends it to one or more Data Marts for end users.
A Data Mart acts as a shortcut to the Data
Warehouse that saves time.
It is simply a partition of the data present in the Data
Warehouse.
Each Data Mart can contain different
combinations of tables, columns and rows from
the Enterprise Data Warehouse.
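A data mart built as a selection of columns and rows from a warehouse table can be sketched with Python's built-in sqlite3 module. The table names (edw_sales, mart_north_sales), the columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# An in-memory database stands in for the enterprise data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edw_sales (region TEXT, product TEXT, amount REAL, cost REAL)"
)
conn.executemany(
    "INSERT INTO edw_sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Pen", 10.0, 4.0),
        ("South", "Pen", 12.0, 4.0),
        ("North", "Ink", 30.0, 11.0),
    ],
)

# The data mart keeps only two columns and only the rows for one region.
conn.execute(
    "CREATE TABLE mart_north_sales AS "
    "SELECT product, amount FROM edw_sales WHERE region = 'North'"
)

mart_rows = conn.execute(
    "SELECT product, amount FROM mart_north_sales ORDER BY product"
).fetchall()
```

Another department could build its own mart from the same warehouse table with a different choice of columns and rows.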
Data in a Data Warehouse
There are three types of data in
the data warehouse:
· Base-level data
· Summary-level data
· Metadata
Business data in a data warehouse can be stored in atomic form
or in summary form. For example, sales data could be stored by product
(atomic form) and also summarized by product family.
Base-Level Data
Base-level data contains historical data that is normalized. It is
at the atomic level and is used to create summary-level data.
Base-level data is also used to reconcile the data contained in
the summary-level to the operational data.
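The relationship between the two levels can be sketched as follows: summary-level totals are derived from base-level rows, and the base-level rows are then used to check the summary. The sample rows, field names, and function names are assumptions for illustration; here the reconciliation check compares the summary against the base-level total only.

```python
from collections import defaultdict

# Hypothetical base-level (atomic) sales rows, one per product.
BASE_SALES = [
    {"product": "Pen", "family": "Stationery", "amount": 10.0},
    {"product": "Ink", "family": "Stationery", "amount": 30.0},
    {"product": "Lamp", "family": "Furniture", "amount": 45.0},
]


def summarize_by_family(rows):
    """Derive summary-level totals per product family from atomic rows."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["family"]] += row["amount"]
    return dict(totals)


def reconciles(summary, rows, tolerance=1e-9):
    """Check that the summary totals add up to the base-level total."""
    base_total = sum(row["amount"] for row in rows)
    return abs(sum(summary.values()) - base_total) <= tolerance


summary = summarize_by_family(BASE_SALES)
is_consistent = reconciles(summary, BASE_SALES)
```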
Summary-Level Data
Summary-level data contains historical data that is derived (i.e.,
summarized and aggregated) to support end-user reports and queries.
It is accessed by the end-user to perform decision making activities.
The three currency features for business data are:
Current data – a view of the business at the present time.
Point-in-time data – a snapshot of business data at a particular moment.
Periodic data – business data represented by periods, such as the last
three years or the last 12 quarters.
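The three currency features can be illustrated with a small dated history. The account-balance data and the function names below are hypothetical; the sketch only shows how the same stored history answers the three kinds of question.

```python
from datetime import date

# Hypothetical history of one business measure: (effective date, value).
HISTORY = [
    (date(2021, 1, 31), 100.0),
    (date(2021, 2, 28), 140.0),
    (date(2021, 3, 31), 90.0),
    (date(2021, 4, 30), 120.0),
]


def current(history):
    """Current data: the most recently recorded value."""
    return max(history, key=lambda entry: entry[0])[1]


def point_in_time(history, as_of):
    """Point-in-time data: the value in effect on a particular date."""
    eligible = [entry for entry in history if entry[0] <= as_of]
    if not eligible:
        raise ValueError("no data recorded on or before the requested date")
    return max(eligible, key=lambda entry: entry[0])[1]


def periodic(history, start, end):
    """Periodic data: all values recorded within a period, oldest first."""
    ordered = sorted(history, key=lambda entry: entry[0])
    return [value for day, value in ordered if start <= day <= end]


latest = current(HISTORY)
snapshot = point_in_time(HISTORY, date(2021, 3, 15))
first_quarter = periodic(HISTORY, date(2021, 1, 1), date(2021, 3, 31))
```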
Advantages of a Data Warehouse
REASONS FOR CREATING A DATA MART
DATA MINING
Where Has It Come From?
Motivation
How does data mining work?
DATA MINING MEASURES
Accuracy
Clarity
Dirty Data
Scalability
Speed
Validation
Typical Applications of Data Mining
ADVANTAGES OF DATA MINING
Engineering and Technology
Medical Science
BUSINESS
Combating terrorism
Games
Research And Development
List of the top eight data-mining
software vendors in 2008
Angoss Software
Infor CRM Epiphany
Portrait Software
SAS
G-Stat
SPSS
ThinkAnalytics
Unica
Viscovery
THANK YOU