DMDW Technical Paper Presentation.
Organisations today suffer from a malaise of data overflow. Developments in
transaction processing technology have given rise to a situation where the amount and rate of data
capture are very high, but the processing of this data into information that can be utilised for decision
making is not developing at the same pace. Knowledge discovery and data mining provide a technology that
enables decision-makers in the corporate sector and government to process this huge amount of data in a
reasonable amount of time and to extract intelligence/knowledge in near real time. The data warehouse,
also called knowledge, allows the storage of data in a format that facilitates its access, but if the tools
for deriving information and/or knowledge and presenting them in a format that is useful for decision
making are not provided, the whole rationale for the existence of the warehouse disappears. Various
technologies for extracting new insight from the data warehouse have emerged, which we classify
loosely as "Data Mining Techniques".
Our paper focuses on the need for information repositories and the discovery of knowledge, and
then gives an overview of the much-hyped fields of Data Warehousing and Data Mining.
Content Overview
Introduction
What is Data-Warehousing?
Warehousing Functions
Architecture Of Data Warehouse
What is Data Mining?
Warehousing and Mining
Data Mining as a part of Knowledge Discovery
Technologies used in Data Mining
Goals of Data Mining and Knowledge Discovery
Applications & Compendium
Bibliography
Introduction
“Knowledge [no more Information] is not only power, but also has significant competitive
advantage”
Organizations have lately realized that just processing transactions and/or information faster
and more efficiently no longer provides them with a competitive advantage over their competitors
in achieving business excellence. Information technology (IT) tools that are oriented towards
knowledge processing can provide the edge that organizations need to survive and thrive in the
current era of fierce competition. Increasing competitive pressures and the desire to leverage
information technology have led many organizations to explore the benefits of the newly
emerging technology of "Data Warehousing and Data Mining". What is needed today is not just
information that is current to the nanosecond, but cross-functional information that can support
decision making as an "on-line" process.
The evolution of information systems is characterized by a move from data
maintenance systems to systems that transform the data into "information" for use in the decision
making process. These systems supported information acquisition from the database of transactional
data. The managerial knowledge acquisition function was not directly supported by these systems.
They could not directly reveal the new patterns emerging in a changing scenario;
the planner was expected to infer these from experience.
What is Data-Warehousing?
The data warehouse makes an attempt to figure out "what we need" before we know we need it. What
is it, actually?
• This data is taken from various, perhaps incompatible, sources and stored in a uniform
format
• Several tools transform this data into meaningful business information for the purpose
of comparisons, trends and forecasting
• Data in a warehouse is not updated or changed in any way, but is only loaded and
accessed later on
• Data is organized according to subject instead of application.
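As a minimal sketch of the points above, the following Python fragment loads records from two incompatible sources into one uniform, subject-oriented format and stores them in a load-and-read structure. The source layouts, the field names, and the `SaleFact` and `Warehouse` classes are assumptions made purely for illustration; they do not describe any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleFact:
    """Uniform, subject-oriented record (subject: sales)."""
    customer_id: str
    sale_date: date
    amount: float  # always expressed in the same currency unit


def from_billing(row):
    # Hypothetical billing system: dict with ISO date string and text amount.
    return SaleFact(row["cust"], date.fromisoformat(row["date"]), float(row["amount"]))


def from_web_shop(row):
    # Hypothetical web shop: tuple (day, month, year, customer, amount in cents).
    day, month, year, customer, cents = row
    return SaleFact(str(customer), date(year, month, day), cents / 100)


class Warehouse:
    """Load-and-read store: rows are appended, never updated in place."""

    def __init__(self):
        self._facts = []

    def load(self, facts):
        self._facts.extend(facts)

    def facts(self):
        # Return an immutable copy so callers cannot change stored data.
        return tuple(self._facts)


warehouse = Warehouse()
warehouse.load([from_billing({"cust": "C1", "date": "2010-03-05", "amount": "120.50"})])
warehouse.load([from_web_shop((7, 3, 2010, "C2", 9900))])
```

Each source gets its own small conversion function, so the disparity between sources is reconciled at load time and everything downstream sees only the uniform record.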
In general a database is not a data warehouse unless it has the following two features:
• It collects information from a number of different disparate sources and is the place
where this disparity is reconciled, and
Information Sources always include the core operational systems, which form the backbone
of day-to-day activities. It is these systems that have traditionally provided management
information to support decision-making.
Decision Support Tools are used to analyze the information stored in the warehouse, typically
to identify trends and new business opportunities.
The Data Warehouse itself is the bridge between the operational systems and the decision
support tools. It holds a copy of much of the operational system data in a logical structure,
which is more conducive to analysis. The Data Warehouse, which will be refreshed in
scheduled bursts from operational systems and from relevant external data sources, provides a
single, consistent view of corporate data, leaving operational systems unaffected.
The main function behind a data warehouse is to get the enterprise-wide data
Each implementation of a data warehouse is different in its detailed design (as shown in figure
below), but all are characterised by a handful of the following key components:
• A front end for Decision Support System (DSS) for reporting and for structured
and unstructured analysis
Data Mining
Database mining or data mining (DM) (formally termed Knowledge Discovery in Databases)
is a process that aims to use existing data to discover new facts and to uncover new relationships
previously unknown even to experts thoroughly familiar with the data. It is based on the filtration and
assaying of a mountain of data “ore” in order to get “nuggets” of knowledge. The data mining process
is illustrated in the figure below.
[Figure: The data mining process. Labels: Data Sources, Data Warehouse, Selected Data, Transformed Data, Extracted Information, Assimilated Information]
The goal of a data warehouse is to support decision making with data. Data mining can be
used in conjunction with a data warehouse to help with certain types of decisions. Data mining can also be
applied to operational databases with individual transactions. To make data mining more efficient,
the data warehouse should have an aggregated or summarized collection of data. Data mining helps
in extracting meaningful new patterns that cannot necessarily be found by merely querying or
processing data or metadata in the data warehouse.
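To illustrate what an aggregated or summarized collection might look like, here is a small sketch that rolls individual transactions up into per-customer, per-month totals before any mining is attempted. The transaction tuples and the `summarize` function are hypothetical and stand in for whatever summarization a real warehouse would perform.

```python
from collections import defaultdict

# Hypothetical operational transactions: (customer, "YYYY-MM-DD", amount).
transactions = [
    ("C1", "2010-03-05", 120.0),
    ("C1", "2010-03-19", 30.0),
    ("C2", "2010-03-07", 99.0),
    ("C1", "2010-04-02", 45.0),
]


def summarize(rows):
    """Aggregate transactions into (customer, month) -> (count, total)."""
    summary = defaultdict(lambda: [0, 0.0])
    for customer, day, amount in rows:
        month = day[:7]  # keep only "YYYY-MM"
        cell = summary[(customer, month)]
        cell[0] += 1
        cell[1] += amount
    return {key: tuple(value) for key, value in summary.items()}


monthly = summarize(transactions)
for key in sorted(monthly):
    print(key, monthly[key])
```

A mining step that works on `monthly` touches one row per customer and month instead of one row per transaction, which is the efficiency gain the paragraph above refers to.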
Knowledge discovery encompasses more than data mining. The knowledge discovery process comprises five phases:
Extraction: Extract patterns from the data warehouse, turning data into knowledge
Artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.
Genetic algorithms: Optimization techniques that use processes such as genetic combination,
mutation, and natural selection in a design based on the concepts of natural evolution.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions
generate rules for the classification of a dataset.
Case-based reasoning (CBR): To forecast a situation, or to make a correct decision, such
systems find the closest past analogs of the present situation and choose the same solution that
was the right one in those past situations. That is why this method is also called the nearest
neighbor method.
Rule induction: The extraction of useful if-then rules from data based on statistical
significance.
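Of the techniques listed above, the nearest neighbor idea behind case-based reasoning is the simplest to sketch: find the most similar past case and reuse its decision. The example below is only an illustration under stated assumptions. The past cases, the two features (read here as age and income), the rescaling step, and the choice of Euclidean distance are all hypothetical and are not prescribed by the text.

```python
import math

# Hypothetical past cases: ((age, income), decision that proved right).
past_cases = [
    ((25, 20_000), "decline"),
    ((40, 60_000), "approve"),
    ((35, 52_000), "approve"),
    ((22, 15_000), "decline"),
]


def scale(features, ranges):
    """Rescale each feature by the range seen in past cases.

    Assumes every feature takes at least two distinct values in the
    past cases, so that hi - lo is never zero.
    """
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(features, ranges))


def nearest_case(query, cases):
    """Return the past case closest to `query` by Euclidean distance."""
    columns = list(zip(*(features for features, _ in cases)))
    ranges = [(min(col), max(col)) for col in columns]
    scaled_query = scale(query, ranges)
    return min(cases, key=lambda case: math.dist(scale(case[0], ranges), scaled_query))


features, decision = nearest_case((38, 55_000), past_cases)
print(features, decision)
```

Rescaling matters because otherwise the income feature, with its much larger numbers, would dominate the distance and age would be ignored in practice.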
• Prediction: Data mining can show how certain attributes within the
an event, or an activity.
• Classification: Data mining can partition the data so that different classes or
• Optimization: One eventual goal of data mining may be to optimize the use of
• Applications
• Using Data Mining and Warehousing For Knowledge Discovery
• Knowledge Discovery is a powerful new solution to information overload.
• Make predictions about new data.
• Identify and explain hidden patterns and trends in existing data.
• Summarize the contents of large databases to facilitate understanding and decision making
• Dependability and fault tolerance
• High Availability and Disaster Recovery
• Survivability of evaluative systems
• Reliability and Robustness Issues
• Accuracy and reliability of responses
• Reliable and Failure Tolerant Business Process Integration
• Failure Tolerant and trustworthy Sensor Networks
• Privacy and security policies and social impact of data mining
• Privacy preserving data integration
• Access control techniques and secure data models
• Encryption & Authentication
• Pseudonymization and Encryption
• Anonymization and pseudonymization
Compendium
A new technological leap is needed to structure and prioritize information for specific end-user
problems.
Data mining tools can make this leap. Quantifiable business benefits have been proven through
the integration of data mining with current information systems.
Over the next few years, the growth of data warehousing is going to be enormous with new products
and technologies coming out frequently.
Data mining helps businesses focus on serving customers and on providing efficient business
processes.
Data Warehousing provides the means to change raw data into information for making effective
business decisions; the emphasis is on information, not data. The data warehouse is the hub for
decision support data. A good data warehouse will... provide the RIGHT data... to the RIGHT
people... at the RIGHT time: RIGHT NOW! While the data warehouse organizes data for business
analysis, the Internet has emerged as the standard for information sharing.
The paper concludes in the hope that companies will use data mining and data warehousing
effectively in order to focus on serving their customers and, in doing so, serving themselves.
Bibliography
• Internet E-books
• blog4jntu.co.cc
• http://www.spss.com/datamine/ocdm.html