Data Warehousing and Data Mining

Data warehousing involves combining data from multiple sources into a centralized database for analysis. It provides businesses with analytics results from data mining, OLAP, scorecarding and reporting. A data warehouse is a subject-oriented, integrated, time-variant collection of data that supports management decision making. It combines data retrieval, analysis, extraction, loading, transformation and management. The goal is to provide a comprehensive view of business metrics and trends to support strategic business decisions.
Copyright
© Attribution Non-Commercial (BY-NC)

DATA WAREHOUSING AND DATA MINING

1 12/08/21 05:34 PM
DATA WAREHOUSING

• Data warehousing is combining data from
multiple sources into one comprehensive and
easily manipulated database.
• The primary aim of data warehousing is to
provide businesses with analytics results from
data mining, OLAP, scorecarding and
reporting.

• A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format.
• A data warehouse is a subject-oriented,
integrated, time-variant and non-volatile
collection of data in support of management’s
decision-making process – W. H. Inmon.

• Characteristics of data warehousing
– Subject oriented
– Integrated
– Time variant (time series)
– Nonvolatile
– Web based
– Relational/multidimensional
– Client/server
– Real-time
– Include metadata

Refer PDF
Data Warehousing
Definitions and Concepts
• Data mart
A departmental data warehouse that stores only
relevant data
• Dependent data mart
A subset that is created directly from a data
warehouse
• Independent data mart
A small data warehouse designed for a strategic
business unit or a department
• Operational data store (ODS)
A type of database often used as an interim area
for a data warehouse, especially for customer
information files
• Enterprise data warehouse (EDW)
A large-scale data warehouse that is used across
the enterprise for decision support
• Enterprise application integration (EAI)
A technology that provides a vehicle for pushing
data from source systems into a data warehouse
• Metadata
Data about data. In a data warehouse, metadata
describe the contents of the data warehouse and
the manner of its use
NEED FOR DATA WAREHOUSING

• Information is now considered the key to all
work.
• Those who gather, analyze, understand, and
act upon information are winners.
• Information has no limits, and it is very hard to
collect information from various sources, so
we need a data warehouse from which we
can get all the information.

TODAY'S BUSINESS INFORMATION

• The concept of the DW emerges from several sets of information that
users need. The need has arisen from a change in the management style
of different classes of users, who now need an organization-wide view of
information. These needs are critical to the success of the business.
• Decision makers are required to react quickly to mission-critical
needs because of rapidly changing, volatile and competitive markets.
• They need multidimensional support of information.
• They need information for strategic decisions. They need both internal
and external information, which gives a larger view of a problem scenario.

• These needs are fundamental to finding patterns and trends, and they
require an enterprise view as against a functional, localized view
of the subject. The DW is designed to meet these needs and to deliver
on them effectively.
• There are three kinds of end users of information:
– The management
– Knowledge workers
– Operations staff

• Management needs a holistic view of a situation and of what is expected
in the future. It helps to see which critical changes have taken place in
the business, showing any patterns and factors affecting the change, so
that they can be used to business advantage.
• Knowledge workers belong to the middle-management level in the
organizational hierarchy. Their needs are multidimensional, depending
on their role and position.
• The needs of operations staff are fulfilled through a transaction
processing system, where the decision-making process is automated by
embedding the rules in the system.
DATA WAREHOUSING INCLUDES:

• Retrieving data
• Analyzing data
• Extracting data
• Loading data
• Transforming data
• Managing data
Data Warehousing
Process Overview

• Organizations continuously collect data,
information, and knowledge at an increasingly
accelerated rate and store them in computerized
systems
• The number of users needing to access the
information continues to increase as a result of
improved reliability and availability of network
access, especially the Internet
• The major components of a data warehousing
process
– Data sources
– Data extraction
– Data loading
– Comprehensive database
– Metadata
– Middleware tools
• The first step in building a DW is to extract data from different sources.
After this, the data needs to be validated for coding structures, names
and formats. It is rationalized to a common unit of measure through a
transformation or conversion process. Such data is then
consolidated to a common reference level such as end of month,
region or zone. The data so processed is then moved to the DW. All
these processes are handled by middleware written to construct the
DW. Middleware is a set of programs and routines that pulls data
from various sources, checks and validates it, moves it from one
platform to another, transforms it to the DW design specifications and
then loads it into the DW. Since data in the DW is ready to use for decision
making, it needs to be delivered to the DW after instituting QA measures
on the data.
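
The middleware steps described above (extract, validate, rationalize to a common unit, consolidate to a common reference level, load) can be sketched in Python. This is only a minimal illustration under stated assumptions, not any particular product's middleware: the source records, the `UNIT_TO_KG` table and all function names are hypothetical, and an in-memory SQLite database stands in for the DW.

```python
import sqlite3
from collections import defaultdict

# Hypothetical source records: (region, ISO date, quantity, unit).
SOURCE_A = [("north", "2021-01-05", 1200.0, "g"), ("north", "2021-01-20", 3.0, "kg")]
SOURCE_B = [("south", "2021-01-11", 2.5, "kg"), ("south", "2021-02-02", None, "kg")]

UNIT_TO_KG = {"g": 0.001, "kg": 1.0}  # assumed common unit of measure: kilograms


def extract():
    """Pull rows from every source into one list."""
    return SOURCE_A + SOURCE_B


def validate(rows):
    """Keep only rows with a quantity and a known unit code."""
    return [r for r in rows if r[2] is not None and r[3] in UNIT_TO_KG]


def transform(rows):
    """Convert to kilograms, then consolidate to (region, month) totals."""
    totals = defaultdict(float)
    for region, day, qty, unit in rows:
        month = day[:7]  # "YYYY-MM" is the common reference level here
        totals[(region, month)] += qty * UNIT_TO_KG[unit]
    return sorted((region, month, kg) for (region, month), kg in totals.items())


def load(conn, rows):
    """Write the consolidated rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS monthly_sales (region TEXT, month TEXT, kg REAL)"
    )
    conn.executemany("INSERT INTO monthly_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the whole pipeline and return the warehouse connection."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(validate(extract())))
    return conn
```

Calling `run_etl()` and querying `monthly_sales` returns one row per region and month; the record with a missing quantity is rejected at the validation step, which plays the role of a simple QA measure.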

Components of a Data Warehouse

• LOAD MANAGEMENT relates to the collection of information
from disparate internal or external sources. In most cases the
loading process includes summarizing, manipulating and
changing the data structures into a format that lends itself to
analytical processing. Actual raw data should be kept
alongside, or within, the data warehouse itself, thereby
enabling the construction of new and different representations.
A worst-case scenario, if the raw data is not stored, would be
to reassemble the data from the various disparate sources
around the organization simply to facilitate a different analysis.

• WAREHOUSE MANAGEMENT relates to the day-to-day
management of the data warehouse. The management tasks
associated with the warehouse include ensuring its availability,
the effective backup of its contents, and its security.
• QUERY MANAGEMENT relates to the provision of access to
the contents of the warehouse and may include the partitioning
of information into different areas with different privileges to
different users. Access may be provided through custom-built
applications, or ad hoc query tools.
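
As a rough sketch of partitioning information into areas with different privileges, the Python fragment below maps roles to the warehouse areas they may query. The role names, area names and the `PRIVILEGES` table are assumptions made for illustration; a real warehouse would normally rely on the access controls of its DBMS.

```python
# Hypothetical partitioning of warehouse areas by user role.
PRIVILEGES = {
    "management": {"summary", "finance", "sales"},
    "analyst": {"sales", "detail"},
    "operations": {"detail"},
}


def can_query(role, area):
    """Return True if the role has been granted access to the area."""
    return area in PRIVILEGES.get(role, set())


def visible_areas(role):
    """List, in alphabetical order, the areas a role may query."""
    return sorted(PRIVILEGES.get(role, set()))
```

A custom-built application or an ad hoc query tool would call a check such as `can_query` before running a user's query.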

• The architecture
[Figure: Typical architecture of a data warehouse. Components shown:
operational data sources 1 to n and an operational data store (ODS);
load manager, warehouse manager and query manager; metadata, detailed
data (DBMS), lightly summarized data, highly summarized data and
archive/backup data; end-user access tools, namely reporting, query,
application development and EIS (executive information system) tools,
OLAP (online analytical processing) tools, and data mining tools.]

• Load manager: also called the front-end component, it
performs all the operations associated with the extraction and
loading of data into the warehouse. These operations include
simple transformations of the data to prepare the data for entry
into the warehouse.
• Warehouse manager: performs all the operations associated with
the management of the data in the warehouse. The operations
performed by this component include analysis of data to ensure
consistency, transformation and merging of source data, creation
of indexes and views, generation of denormalizations and
aggregations, and archiving and backing up data.

• Query manager: also called the back-end
component, it performs all the operations
associated with the management of user
queries. The operations performed by this
component include directing queries to the
appropriate tables and scheduling the
execution of queries.
• Detailed data, lightly and highly
summarized data, archive/backup data
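
One way to picture "directing queries to the appropriate tables" is a router that picks the most summarized table able to answer a query. The sketch below is an assumption-laden illustration: the table names and their grouping columns are hypothetical, and query scheduling is left out.

```python
# Hypothetical tables, ordered from most to least summarized, with the
# grouping columns (grain) that each one can answer.
TABLES = [
    ("sales_by_year", {"year"}),                             # highly summarized
    ("sales_by_month_region", {"year", "month", "region"}),  # lightly summarized
    ("sales_detail", {"year", "month", "day", "region", "product", "customer"}),
]


def route_query(group_by):
    """Direct a query to the first (most summarized) table covering its columns."""
    needed = set(group_by)
    for table, grain in TABLES:
        if needed <= grain:
            return table
    raise ValueError(f"no table can answer columns {sorted(needed)}")
```

Because the list is ordered from highly summarized to detailed, a query grouped only by year never touches the detailed data.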

• Metadata
• End-user access tools: can be categorized
into five main groups: data reporting and
query tools, application development tools,
executive information system (EIS) tools,
online analytical processing (OLAP) tools,
and data mining tools
DATA WAREHOUSE ARCHITECTURE

• Data warehousing is designed to provide an
architecture that will make corporate data
accessible and useful to users.
• There is no right or wrong architecture.
• The worthiness of an architecture can be
judged by its use and the concept behind it.
• Data warehouses can be architected in
many different ways, depending on the
specific needs of a business.
Typical Data Warehousing Environment

Data Warehousing Architectures

• Three parts of the data warehouse
– The data warehouse that contains the data and
associated software
– Data acquisition (back-end) software that extracts data
from legacy systems and external sources,
consolidates and summarizes them, and loads them
into the data warehouse
– Client (front-end) software that allows users to access
and analyze data from the warehouse
Ten factors that potentially affect the architecture selection
decision:
1. Information interdependence between organizational units
2. Upper management’s information needs
3. Urgency of need for a data warehouse
4. Nature of end-user tasks
5. Constraints on resources
6. Strategic view of the data warehouse prior to implementation
7. Compatibility with existing systems
8. Perceived ability of the in-house IT staff
9. Technical issues
10. Social/political factors
• An operational data store (ODS) is basically a
database that is used as a temporary
storage area for a data warehouse.
• Its primary purpose is to handle data that
are in ongoing operational use.
• An operational data store contains data that are
constantly updated through the course of
business operations.
• ETL (Extract, Transform, Load) is used to copy
data from:
– ODS to data warehouse staging area
– Data warehouse staging area to data warehouse
– Data warehouse to data mart
• ETL extracts data, transforms values of
inconsistent data, cleanses "bad" data, filters
data and loads data into a target database.
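
To make "transforms values of inconsistent data, cleanses bad data, filters data and loads" concrete, here is a small Python sketch of a single ETL step. The record layout, the `GENDER_CODES` mapping and the rule that a negative amount counts as "bad" are assumptions chosen for the example, and plain lists stand in for the source and target databases.

```python
from typing import Optional

# Hypothetical mapping that reconciles inconsistent source encodings.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "0": "F"}


def clean_record(rec: dict) -> Optional[dict]:
    """Return a standardized copy of rec, or None if the record is "bad"."""
    gender = GENDER_CODES.get(str(rec.get("gender", "")).strip().lower())
    try:
        amount = float(rec.get("amount"))
    except (TypeError, ValueError):
        return None  # cleanse: missing or unparseable amount
    if gender is None or amount < 0:
        return None  # cleanse: unknown code or impossible value
    return {"id": rec["id"], "gender": gender, "amount": amount}


def etl_step(source: list, target: list, min_amount: float = 0.0) -> int:
    """Transform and cleanse each record, filter, and load into target."""
    loaded = 0
    for rec in source:
        cleaned = clean_record(rec)
        if cleaned is not None and cleaned["amount"] >= min_amount:  # filter
            target.append(cleaned)
            loaded += 1
    return loaded
```

The same step could be reused for each hop listed above (ODS to staging area, staging area to warehouse, warehouse to data mart) with different rules per hop.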
• The data warehouse staging area is a
temporary location where data from source
systems is copied.
• It increases the speed of the data warehouse
architecture.
• It is essential because the volume of data is
increasing day by day.

• The purpose of the data warehouse is to integrate
corporate data.
• The amount of data in the data warehouse is
massive. Data is stored at a very deep level of
detail.
• This allows data to be grouped in ways that were
not anticipated when it was stored.
• A data warehouse does not contain all the data in
the organization. Its purpose is to provide the base
data needed by the organization for strategic and
tactical decision making.
• ETL extracts data from the data warehouse and
sends it to one or more data marts for use by end users.
• Data marts act as a shortcut to the data
warehouse, to save time.
• A data mart is just a partition of the data present in
the data warehouse.
• Each data mart can contain different
combinations of tables, columns and rows from
the enterprise data warehouse.
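
The idea that a data mart holds a chosen combination of columns and rows from the enterprise data warehouse can be sketched with Python's built-in `sqlite3` module. Everything named here (the `sales` table, its columns, the `build_edw` and `build_mart` helpers) is hypothetical, and an in-memory database stands in for the warehouse.

```python
import sqlite3


def build_edw():
    """Create a tiny in-memory stand-in for the enterprise data warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, product TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)",
        [
            ("north", "bolt", "acme", 10.0),
            ("south", "nut", "zeta", 4.0),
            ("north", "nut", "acme", 6.0),
        ],
    )
    return conn


def build_mart(conn, name, columns, region):
    """Copy selected columns and rows from the EDW into a data mart table.

    name and columns are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    cols = ", ".join(columns)
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.execute(
        f"INSERT INTO {name} SELECT {cols} FROM sales WHERE region = ?", (region,)
    )
    return conn.execute(f"SELECT {cols} FROM {name} ORDER BY {cols}").fetchall()
```

For example, `build_mart(build_edw(), "north_mart", ["product", "amount"], "north")` yields a mart holding only the product and amount columns for the northern region, while the warehouse table itself is left unchanged.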
Data in the Data Warehouse

There are three types of data in
the data warehouse:

• Base-level data
• Summary-level data
• Metadata
• Business data in the data warehouse can be stored in atomic form
or in summary form. For example, sales data could be stored by product
(atomic form) and also summarized by product family.
• Base-level data
Base-level data contains historical data that is normalized. It is
at the atomic level and is used to create summary-level data.
Base-level data is also used to reconcile the summary-level data
with the operational data.

• Summary-level data
Summary-level data contains historical data that is derived (i.e.,
summarized and aggregated) to support end-user reports and queries.
It is accessed by the end user to perform decision-making activities.
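
The relationship between the two levels can be sketched as follows: summary-level rows are derived from base-level rows by aggregation, and the base level is then used to reconcile the summary against an operational total. The table names, columns and the `reconcile` helper are assumptions for illustration, using Python's built-in `sqlite3` module.

```python
import sqlite3


def build_levels(base_rows):
    """Load base-level rows, then derive the summary level from them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE base_sales (product TEXT, family TEXT, amount REAL)")
    conn.executemany("INSERT INTO base_sales VALUES (?, ?, ?)", base_rows)
    conn.execute("CREATE TABLE summary_sales (family TEXT, total REAL)")
    conn.execute(
        "INSERT INTO summary_sales "
        "SELECT family, SUM(amount) FROM base_sales GROUP BY family"
    )
    return conn


def reconcile(conn, operational_total, tol=1e-9):
    """Check that summary, base and operational totals all agree."""
    base = conn.execute("SELECT SUM(amount) FROM base_sales").fetchone()[0]
    summary = conn.execute("SELECT SUM(total) FROM summary_sales").fetchone()[0]
    return abs(base - summary) <= tol and abs(base - operational_total) <= tol
```

Here sales stored by product are the atomic form and the totals by product family are the summary, matching the example given earlier on this slide.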
• The three currency features of business data are:
– Current data: a view of the business at the present time.
– Point-in-time data: a snapshot of business data at a particular moment.
– Periodic data: business data represented by periods, such as the last
three years or the last 12 quarters.
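
A minimal Python sketch of the three currency views over one time-variant history follows. The `HISTORY` data (dated account balances) and the function names are hypothetical; the point is only that all three views can be read from the same stored history.

```python
from datetime import date

# Hypothetical time-variant history: (effective date, account balance).
HISTORY = [
    (date(2021, 1, 1), 100.0),
    (date(2021, 4, 1), 140.0),
    (date(2021, 7, 1), 90.0),
]


def point_in_time(history, when):
    """Snapshot: the latest value effective on or before the given date."""
    past = [(d, v) for d, v in history if d <= when]
    return max(past)[1] if past else None


def current(history, today):
    """Current data: the point-in-time view taken at the present date."""
    return point_in_time(history, today)


def periodic(history, period_ends):
    """Periodic data: one snapshot per period end, for example per quarter."""
    return [(end, point_in_time(history, end)) for end in period_ends]
```

Passing quarter-end dates to `periodic` gives a "last N quarters" series, while `point_in_time` with a single date gives a snapshot.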
Advantages of a Data Warehouse

• One consistent data store for reporting, forecasting, and analysis
• Easier and timely access to data
• Improved end-user productivity
• Reduced costs
• Scalability, flexibility, reliability
• Competitive advantage
• Trend analysis and detection
• Drill-down analysis
• Problem monitoring
• Executive analysis

REASONS FOR CREATING A DATA MART

• Easy access to frequently needed data.
• Creates a collective view for a group of users.
• Improves user response time.
• Ease of creation.
• Lower cost than implementing a full data
warehouse.

DATA MINING

• The non-trivial extraction of implicit,
previously unknown, and potentially useful
information from large databases.
– Extremely large datasets
– Useful knowledge that can improve
processes
– Cannot be done manually
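
As one small, concrete illustration of extracting implicit information, the Python sketch below counts which pairs of items occur together across transactions. This is simple frequent-pair counting, a technique chosen here for illustration and not one named in the slides; the function name and the basket data in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) removes duplicates and gives each pair one fixed order
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For instance, given baskets such as `["bread", "milk"]` and `["bread", "butter", "milk"]`, the function reports how often bread and milk are bought together, a pattern that is not stored explicitly anywhere in the data.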

Where Has It Come From?

Motivation

• Databases today are huge:
– More than 1,000,000 entities/records/rows
– From 10 to 10,000 fields/attributes/variables
– Gigabytes and terabytes
• Databases are growing at an unprecedented rate
• The corporate world is a cut-throat world
– Decisions must be made rapidly
– Decisions must be made with maximum knowledge

How does data mining work?

• Extract, transform, and load transaction data onto
the data warehouse system.
• Store and manage the data in a multidimensional
database system.
• Provide data access to business analysts and
information technology professionals.
• Analyze the data by application software.
• Present the data in a useful format, such as a graph
or table.

DATA MINING MEASURES

• Accuracy
• Clarity
• Dirty data
• Scalability
• Speed
• Validation

Typical Applications of Data Mining

ADVANTAGES OF DATA MINING

• Engineering and Technology
• Medical Science
• Business
• Combating Terrorism
• Games
• Research and Development

Engineering and Technology

• In electrical power engineering
– used for condition monitoring of high-voltage
electrical equipment
– vibration monitoring and analysis of
transformer on-load tap-changers
• Education
– to concentrate their knowledge

Medical Science

• Data mining has been widely used in the areas of
bioinformatics and genetics
• It is used to study the relationship between variation
in DNA sequences and variability in disease
susceptibility, which is very important for
improving the diagnosis, prevention and
treatment of diseases

BUSINESS

• Used in Customer Relationship Management
applications
• It translates data from customer to merchant
accurately
• Distributes business processes
• Powerful tool for marketing

Combating terrorism

• Data mining has been used in U.S. government
counter-terrorism programs to search records, for example
the Multistate Anti-Terrorism Information Exchange (MATRIX)
• Other examples include the Secure Flight program (formerly
the Computer Assisted Passenger Prescreening System) and
Analysis, Dissemination, Visualization, Insight,
Semantic Enhancement (ADVISE)

Games

• Oracles for certain combinatorial games, also called
tablebases (e.g. for 3x3 chess), have opened a new
area for data mining
• It includes the extraction of human-usable
strategies from these oracles
• Berlekamp in dots-and-boxes and John Nunn
in chess endgames are notable examples

Research and Development

• Helps to develop search algorithms
• It offers huge libraries of graphing and
visualization software
• Users can easily create optimized models

List of the top eight data-mining
software vendors in 2008

• Angoss Software
• Infor CRM Epiphany
• Portrait Software
• SAS
• G-Stat
• SPSS
• ThinkAnalytics
• Unica
• Viscovery

THANK YOU
