DBMS, Data Warehousing and Data Mining
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
Data problems and difficulties
• The amount of data increases exponentially with
time.
• Data are scattered throughout organizations and
are collected by many individuals using several
methods.
• Data security, quality, and integrity are critical, yet
are easily jeopardized.
• Selecting data management tools can be a major
problem.
Data warehouse
• The main repository of an organization's
historical data, its corporate memory
• Contains the raw material for
management's decision support system
• A data analyst can perform complex
queries and analysis, such as data mining,
on the information without slowing down
the operational systems
Data warehouse
• In data warehousing, you create stores
of informational data: data that is
extracted from the operational data and
then transformed for decision making.
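As a minimal sketch of the extract-and-transform step described above, the Python below takes a few operational order records and aggregates them into an informational form (total sales per product per month). The record fields, the sample values, and the function name `transform_for_warehouse` are assumptions made for illustration only; a real warehouse load would involve far more cleaning and integration.

```python
from collections import defaultdict

# Hypothetical operational records: one row per order line.
operational_orders = [
    {"order_id": 1, "product": "tea", "date": "2023-01-05", "amount": 12.0},
    {"order_id": 2, "product": "tea", "date": "2023-01-19", "amount": 8.0},
    {"order_id": 3, "product": "coffee", "date": "2023-01-07", "amount": 15.0},
    {"order_id": 4, "product": "tea", "date": "2023-02-02", "amount": 10.0},
]


def transform_for_warehouse(orders):
    """Aggregate order lines into total sales per (product, month)."""
    totals = defaultdict(float)
    for row in orders:
        month = row["date"][:7]  # keep only "YYYY-MM"
        totals[(row["product"], month)] += row["amount"]
    return dict(totals)


# The aggregated, decision-oriented view of the operational data.
warehouse_facts = transform_for_warehouse(operational_orders)
```

The operational rows answer "what happened in order 2?", while the transformed rows answer "how did tea sell in January?", which is the kind of question decision makers ask.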
Data Warehouse
• A data warehouse is a
– subject-oriented
– integrated
– time-variant
– non-volatile
collection of data that is used primarily in
organizational decision making.
-- Bill Inmon, Building the Data Warehouse, 1996
Characteristics
• Organisation: Data are organised by subject and contain
information relevant to DSS only.
• Consistency: Data are coded in a consistent manner.
• Time variant: The data are kept for many years and used for
trends, forecasting, and comparisons.
• Non-volatile: Once entered into the warehouse, data are not
erased.
• Relational: The data warehouse uses a relational structure.
• Client/server: Gives the end user easy access to its data.
• Web based: Provides an efficient computing environment for
web-based applications.
Advantages of data warehouse
• Enhances end-user access to a wide
variety of data.
• Decision support system users can obtain
specified trend reports, e.g. the item with
the most sales in a particular
area/country within the last two years.
• A data warehouse can be a significant
enabler of commercial business
applications, most notably customer
relationship management (CRM).
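The trend report mentioned above (the best-selling item in a given country over a recent period) could be expressed as a query along the following lines. This is only a sketch using Python's built-in `sqlite3` module; the `sales` table, its columns, and the sample rows are hypothetical, and a real warehouse would have its own schema.

```python
import sqlite3


def top_selling_item(conn, country, start_date):
    """Return (item, total_quantity) for the best seller in a country since start_date."""
    return conn.execute(
        """
        SELECT item, SUM(quantity) AS total
        FROM sales
        WHERE country = ? AND sale_date >= ?
        GROUP BY item
        ORDER BY total DESC, item ASC
        LIMIT 1
        """,
        (country, start_date),
    ).fetchone()


# Hypothetical warehouse table with a few rows of history.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (item TEXT, country TEXT, sale_date TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("kettle", "IN", "2022-03-01", 40),
        ("kettle", "IN", "2023-06-15", 25),
        ("toaster", "IN", "2023-01-10", 50),
        ("toaster", "IN", "2019-05-05", 500),  # too old, falls outside the window
        ("kettle", "US", "2023-02-02", 90),    # different country
    ],
)

best = top_selling_item(conn, "IN", "2022-01-01")
```

Because the query runs against the warehouse copy, it does not compete with the operational systems for resources.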
[Figure: Operational data, historical data, and external data are extracted and transformed into the data warehouse, which supports queries, reports, OLAP, and data mining.]
Capabilities of data mining
Automated prediction of trends and behaviors.
Data mining automates the process of finding
predictive information in large databases.
A typical example of a predictive problem is
targeted marketing. Data mining uses data on
past promotional mailings to identify the targets
most likely to maximize return on investment in
future mailings.
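To make the targeted-marketing example concrete, the sketch below ranks customer segments by how often they responded to past mailings. It is a deliberately simple stand-in for what a data mining tool would do with a richer predictive model; the segment names, the sample data, and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical history of past mailings: (customer segment, did they respond?)
past_mailings = [
    ("students", False), ("students", False), ("students", True), ("students", False),
    ("retirees", True), ("retirees", True), ("retirees", False), ("retirees", True),
    ("families", True), ("families", False),
]


def response_rates(mailings):
    """Compute the fraction of mailings that got a response, per segment."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for segment, did_respond in mailings:
        sent[segment] += 1
        if did_respond:
            responded[segment] += 1
    return {segment: responded[segment] / sent[segment] for segment in sent}


def rank_targets(mailings):
    """Order segments from most to least likely to respond."""
    rates = response_rates(mailings)
    return sorted(rates, key=lambda segment: rates[segment], reverse=True)


ranking = rank_targets(past_mailings)
```

Mailing the segments at the top of the ranking first is one simple way to aim for a better return on a fixed mailing budget.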
Capabilities of data mining (contd.)
Automated discovery of previously unknown
patterns.
Data mining tools sweep through databases
and identify previously hidden patterns in one
step.
An example of pattern discovery is the
analysis of retail sales data to identify
seemingly unrelated products that are often
purchased together.
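The retail example above can be illustrated by counting how often each pair of products appears in the same basket. This is only a sketch of the counting step behind association analysis, not a full association-rule algorithm; the baskets and the function name `pair_counts` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "diapers"},
    {"bread", "butter", "diapers"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical order, e.g. ("bread", "butter").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(baskets)
most_common_pair, support_count = counts.most_common(1)[0]
```

Pairs with a high count relative to the number of baskets are candidates for "often purchased together".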
Techniques
• Case based reasoning
• Neural computing
• Intelligent agents
• Other tools
  – Decision trees
  – Genetic algorithms
  – Nearest neighbor method
  – Rule induction
Types of information obtained from data mining
• Associations
• Sequences
• Classifications
• Clustering
• Forecasting
Associations
• Occurrences linked to a single event
Sequence
• Events are linked over time
Classification
• Recognizes patterns that describe the
group to which an item belongs by
examining existing items that have been
classified and by inferring a set of rules
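As a small illustration of inferring a rule from items that have already been classified, the sketch below learns a single threshold rule ("value at or above t means positive") from labelled examples. A one-threshold rule is the simplest possible case, chosen here for brevity; real classifiers infer many rules over many attributes. The income figures, labels, and function names are hypothetical.

```python
def learn_threshold_rule(examples):
    """Find the threshold t for which 'value >= t -> positive' makes the fewest errors."""
    candidates = sorted({value for value, _ in examples})
    best_threshold, best_errors = None, len(examples) + 1
    for t in candidates:
        # An error is any example whose predicted class differs from its label.
        errors = sum((value >= t) != label for value, label in examples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def classify(value, threshold):
    """Apply the learned rule to a new, unclassified item."""
    return value >= threshold


# Hypothetical classified items: (annual income in thousands, repaid loan?)
labelled = [(20, False), (25, False), (32, False), (40, True), (55, True), (70, True)]
threshold = learn_threshold_rule(labelled)
```

Once the rule is learned, `classify(45, threshold)` assigns a new applicant to a group without anyone having labelled that applicant by hand.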
Clustering
• Works similarly to classification, but when no
groups have yet been defined
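Clustering can be sketched with a tiny one-dimensional k-means routine, which groups values without being told the groups in advance. K-means is one common clustering technique picked here as an example; the slides do not name a specific one. The spending figures, starting centers, and function name are assumptions for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Simple k-means on numbers: assign each value to its nearest center, then recompute centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it in place if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend per customer; no groups are given up front.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])
```

Here the routine separates low spenders from high spenders on its own, whereas classification would have needed examples already labelled "low" or "high".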
Forecasting
• Uses a series of existing values to predict
what other values will be
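One very simple way to forecast from a series of existing values is a moving average: the next value is estimated as the mean of the most recent few. This is just one basic technique shown as a sketch; the sales figures, the window size, and the function name are hypothetical.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if window <= 0 or len(series) < window:
        raise ValueError("need a positive window no longer than the series")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical monthly sales figures.
monthly_sales = [100, 110, 120, 130, 140]
next_month = moving_average_forecast(monthly_sales, window=3)
```

A moving average lags behind a steady upward trend like this one, which is why practical forecasting tools usually also model trend and seasonality.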
Applications of data mining
• Retailing & Sales
• Banking
• Manufacturing & production
• Insurance
• Computer hardware & software
• Police work
• Government & defense
• Airlines
• Broadcasting