Data Warehousing and Data Mining
A Paper Presentation on
Abstract
Organisations today suffer from a malaise of data overflow. Developments in transaction
processing technology have given rise to a situation where the amount and rate of data capture is
very high, but the processing of this data into information that can be utilised for decision
making is not developing at the same pace. Data warehousing and data mining (both data and text)
provide a technology that enables decision-makers in the corporate sector and in government to
process this huge amount of data in a reasonable amount of time and to extract
intelligence/knowledge.
The data warehouse allows the storage of data in a format that facilitates its access, but if
the tools for deriving information and/or knowledge, and for presenting them in a format that is
useful for decision making, are not provided, the whole rationale for the existence of the
warehouse disappears. Various technologies for extracting new insight from the data warehouse
have come up.
Our paper focuses on the need for information repositories and the discovery of knowledge, and
thence gives an overview of the much-hyped Data Warehousing and Data Mining.
Content Overview
Introduction
What is Data-Warehousing?
Warehousing Functions
Compendium
Bibliography
Introduction
“Knowledge [no more Information] is not only power, but also has significant competitive
advantage”
Organizations have lately realized that just processing transactions and/or information
faster and more efficiently no longer provides them with a competitive advantage vis-à-vis their
competitors for achieving business excellence. Information technology (IT) tools that are oriented
towards knowledge processing can provide the edge that organizations need to survive and thrive
in the current era of fierce competition. The increasing
competitive pressures and the
desire to leverage information technology have led many organizations to explore the
benefits of emerging technologies, viz. "Data Warehousing and Data Mining". What is needed
Evolution of Information Technology Tools
Information systems have evolved from data maintenance systems into systems that transform
data into "information" for use in the decision making process. These systems supported
information acquisition from the database of transactional data, but they did not directly
support the managerial knowledge acquisition function. Nor could they directly reveal new
patterns emerging in a changing scenario; the planner was expected to infer these from experience.
[Figure: Data, Information, Knowledge]
What is Data-Warehousing?
The data warehouse makes an attempt to figure out "what we need," before we know we
need it.
∗ This data is taken from various, perhaps incompatible, sources and stored in a uniform format
∗ Several tools transform this data into meaningful business information for the purpose of
comparisons, trends and forecasting
∗ Data in a warehouse is not updated or changed in any way, but is only loaded and accessed
later on
In general a database is not a data warehouse unless it has the following two features:
∗ It collects information from a number of different disparate sources and is the place where this
disparity is reconciled, and
Information Sources always include the core operational systems which form the backbone of day-
to-day activities. It is these systems which have traditionally provided management information to
support decision making.
Decision Support Tools are used to analyze the information stored in the warehouse, typically to
identify trends and new business opportunities.
The Data Warehouse itself is the bridge between the operational systems and the decision support
tools. It holds a copy of much of the operational system data in a logical structure which is more
conducive to analysis. The Data Warehouse, which will be refreshed in scheduled bursts from
operational systems and from relevant external data sources, provides a single, consistent view of
corporate data, leaving operational systems unaffected.
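The flow just described, from information sources through the warehouse to a decision support query, can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions: the two sources (`billing_rows`, `web_shop_rows`), their layouts, the `sales_fact` table and all function names are hypothetical and are not taken from any particular product.

```python
import sqlite3

# Hypothetical operational sources with incompatible layouts (illustrative data only).
billing_rows = [("C001", "2024-01-05", "120.50"), ("C002", "2024-01-06", "80.00")]
web_shop_rows = [{"customer": "c001", "day": "07/01/2024", "amount_cents": 4550}]


def from_billing(row):
    """Billing already uses ISO dates; amounts arrive as strings."""
    customer_id, iso_date, amount = row
    return (customer_id.upper(), iso_date, float(amount), "billing")


def from_web_shop(row):
    """The web shop uses DD/MM/YYYY dates and integer cents."""
    day, month, year = row["day"].split("/")
    return (
        row["customer"].upper(),
        f"{year}-{month}-{day}",
        row["amount_cents"] / 100,
        "web_shop",
    )


def refresh_warehouse(conn, billing, web_shop):
    """One scheduled load: bring both sources into a uniform layout and append."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(customer_id TEXT, sale_date TEXT, amount REAL, source TEXT)"
    )
    rows = [from_billing(r) for r in billing] + [from_web_shop(r) for r in web_shop]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def total_per_customer(conn):
    """A decision-support style query that runs against the warehouse copy only."""
    cursor = conn.execute(
        "SELECT customer_id, ROUND(SUM(amount), 2) FROM sales_fact "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return cursor.fetchall()


warehouse = sqlite3.connect(":memory:")
loaded = refresh_warehouse(warehouse, billing_rows, web_shop_rows)
totals = total_per_customer(warehouse)
print(loaded, totals)
```

The analysis query reads only the warehouse copy, so the operational systems are left unaffected, and the load step only appends rows, in line with the load-and-access character of warehouse data.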
The main function behind a data warehouse is to get the enterprise-wide data in a format
that is most useful to end-users, regardless of their locations. Data warehousing is used for:
∗ Increasing the speed and flexibility of analysis.
∗ A data model to define the warehouse contents.
∗ A front end for Decision Support System (DSS) for reporting and for structured and
unstructured analysis.
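[Figure: Data warehouse architecture. Legacy databases, operational databases and external data
sources pass through Extract, Transform and Maintain steps into the warehouse (with its metadata),
which serves query and reporting tools, multi-dimensional analysis tools, other OLAP tools and
data mining tools.]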
Data Mining
Database mining or data mining (DM), formally termed Knowledge Discovery in Databases (KDD),
is a process that aims to use existing data to discover new facts and to uncover new
relationships previously unknown even to experts thoroughly familiar with the data. It is like
extracting precious metals (say, gold) or gems, hence the term “mining”: it is based on the
filtration and assaying of a mountain of data “ore” in order to get “nuggets” of knowledge. The
data mining process is illustrated in the figure below.
[Figure: The data mining process. Data sources 1, 2, ..., N feed the data warehouse; the steps
Select, Transform, Mine and Assimilate yield selected data, transformed data, extracted
information and assimilated information.]
∗ Data mining can be used in conjunction with a data warehouse to help with certain types of
decisions.
∗ To make data mining more efficient, the data warehouse should have an aggregated or
summarized collection of data.
∗ Data mining helps in extracting meaningful new patterns that cannot necessarily be found by
merely querying or processing data or metadata in the data warehouse.
∗ Knowledge Discovery in Databases, frequently abbreviated as KDD, typically encompasses more than
data mining.
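As a small illustration of a pattern that a fixed query would not surface unless the analyst already knew to ask for it, the sketch below counts which items occur together in the same transaction. The `baskets` data, the `frequent_pairs` function and the 75% support threshold are our own illustrative assumptions; pair counting is only one simple mining technique among many.

```python
from collections import Counter
from itertools import combinations

# Illustrative transactions; each set holds the items recorded together.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support of transactions."""
    counts = Counter()
    for items in transactions:
        # Sorting gives every pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    threshold = min_support * len(transactions)
    return {pair: n for pair, n in counts.items() if n >= threshold}


pairs = frequent_pairs(baskets, min_support=0.75)
print(pairs)
```

No one has to name bread and butter in advance: the pair emerges from the data because it clears the support threshold.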
Data selection: data about specific items or categories of items, or from stores in a specific
region or area of the country, may be selected.
Data cleansing: the cleansing process may then correct invalid zip codes or eliminate records
with incorrect phone prefixes.
Data transformation and encoding: these may be done to reduce the amount of data.
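The three preparation steps above can be sketched as a small pipeline. The record layout, the five-digit zip rule and the function names are hypothetical choices made for illustration; here invalid records are eliminated rather than corrected, and the transformation shown is a simple aggregation per store.

```python
import re
from collections import defaultdict

# Illustrative records; field names and values are assumptions.
records = [
    {"store": "S1", "region": "north", "item": "tea", "zip": "12345", "amount": 4.0},
    {"store": "S1", "region": "north", "item": "tea", "zip": "1234", "amount": 6.0},
    {"store": "S2", "region": "south", "item": "tea", "zip": "54321", "amount": 5.0},
    {"store": "S3", "region": "north", "item": "coffee", "zip": "12399", "amount": 7.5},
    {"store": "S3", "region": "north", "item": "tea", "zip": "12399", "amount": 3.0},
]

ZIP_PATTERN = re.compile(r"\d{5}")  # assumed rule: a valid zip has exactly five digits


def select(rows, region, item):
    """Data selection: keep one item category from one region."""
    return [r for r in rows if r["region"] == region and r["item"] == item]


def cleanse(rows):
    """Data cleansing: eliminate records whose zip code is invalid."""
    return [r for r in rows if ZIP_PATTERN.fullmatch(r["zip"])]


def transform(rows):
    """Data transformation: reduce the data to one total amount per store."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["store"]] += r["amount"]
    return dict(totals)


prepared = transform(cleanse(select(records, region="north", item="tea")))
print(prepared)
```

Five input records are reduced to two summary values, which is the kind of reduction the transformation step aims at.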
Goals of Data Mining and Knowledge Discovery
Prediction: Data mining can show how certain attributes within the data will behave in the future.
Identification: Data patterns can be used to identify the existence of an item, an event, or an activity.
Classification: Data mining can partition the data so that different classes or categories can be identified
based on combinations of parameters.
Optimization: One eventual goal of data mining may be to optimize the use of limited resources such as
time, space, money, or materials and to maximize output variables such as sales or profits under a given
set of constraints.
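To make the prediction goal concrete, the following sketch fits a straight line to past values of one attribute and extrapolates one period ahead. The monthly sales figures and the choice of an ordinary least-squares trend are assumptions made for illustration; real data mining tools use far richer models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative history: sales for months 1 to 4 (made-up numbers).
months = [1, 2, 3, 4]
sales = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 5 + intercept  # predicted sales for month 5
print(slope, intercept, forecast)
```

With this perfectly linear history the fitted slope is 2 and the intercept is 8, so the forecast for month 5 is 18.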
Compendium
∗ A data warehouse takes the organisation's operational data, historical data and external data
∗ Consolidates it into a separately designed database (which can be either relational or
multi-dimensional in nature)
∗ Manages it into a format that is optimised for end users to access and analyse.
When a data warehouse has been constructed, it provides a complete picture of the
enterprise. It provides an unparalleled opportunity to the management to learn about their
customers.
Data warehouse technology, together with online transaction processing and data mining,
allows management to provide better customer service, create greater customer loyalty and
activity, focus customer acquisition and retention on the most profitable customers, increase
revenue and reduce operating costs. It provides tools that facilitate sounder decision making,
improves worker and management knowledge and productivity, spares the operational database from
ad-hoc queries with the resulting performance degradation, and clears the legacy database system,
while moving the corporate system architecture forward.
With the incorporation of new data delivery and presentation techniques, such as HyperText
Markup Language (HTML) and Open Database Connectivity (ODBC), the database mining (data and
text) operation has gained widespread recognition as a viable tool for business intelligence
gathering. Advances in document mining technology (database mining of free-form text/data,
in contrast to the “classical” approach to data mining of fixed-length records) are making data
mining technology more powerful.
Last but not least, the Internet has emerged as the largest data warehouse of
unstructured and free-form data. The new technologies are geared towards mining this great data
warehouse.
Bibliography
Using Information Technology, by Williams, Sawyer and Hutchinson
http://www.infogoal.com/
http://www.cse.buffalo.edu/
http://www.datawarehousingonline.com/