Chapter BI4
Business Intelligence
Lecture outline
Databases to Improve Business Performance and Decision Making
Business Intelligence
Data Warehouses
Association Rule Mining
Classification
Clustering
Others
Using Databases to Improve Business
Performance and Decision Making
• Databases provide information to help the company
run the business more efficiently
• help managers and employees make better
decisions
Tools for analyzing, accessing vast quantities of
data:
• Data warehousing
• Multidimensional data analysis
• Data mining
Using Databases to Improve Business
Performance and Decision Making
Businesses use their databases to keep track of
basic transactions, such as paying suppliers,
processing orders, serving customers, and paying
employees.
If a company wants to know which product is the
most popular or who is its most profitable
customer, the answer lies in the data.
A Good Data Warehouse
is a pre-requisite for
Business Decision Making
WHY & WHAT
DATA WAREHOUSE
MOTIVATION
“We are drowning in information,
but starving for knowledge.”
John Naisbitt
A producer wants to know ...
• Which are our lowest/highest margin customers?
• Who are my customers and what products are they buying?
• What is the most effective distribution channel?
• What product promotions have the biggest impact on revenue?
• Which customers are most likely to go to the competition?
• What impact will new products/services have on revenue and margins?
Data, Data ... Everywhere
yet ... I can’t find the data I need
• data is scattered over the network
• many versions, subtle differences
What Is Data Mining? A Definition
Knowledge Discovery in
Databases
The non-trivial extraction of
implicit, previously
unknown and potentially
useful knowledge from
data in large data
repositories
Alternative names
• Knowledge Discovery
(mining) in Databases
(KDD)
• Knowledge extraction
• Data/pattern analysis
• Business intelligence, etc.
Problem Behind…..
Heterogeneous Information Sources
“Heterogeneities are
everywhere”
[Figure: heterogeneous sources: Personal Databases, Scientific Databases, Medical data, World Wide Web]
They have:
• Different interfaces
• Different data representations
• Duplicate and inconsistent information
Problem
Data Management in Large Enterprises
Application-driven development of operational systems resulted in vertical
fragmentation of informational systems.
[Figure: fragmented operational systems: Sales Planning, Suppliers, Stock Mngmt, Debt Mngmt, Inventory Mngmt]
Integration System
[Figure: an integration system over heterogeneous sources: World Wide Web, Personal Databases, Digital Libraries, Scientific Databases]
Uses multidimensional
databases
Components of a Data Warehouse
Why a Data Warehouse?
[Figure: data warehouse components. Each Source feeds its own Extractor/Monitor; the top of the diagram is labelled "... and analysis".]
Business intelligence and data mining
Once data have been captured and organized in
data warehouses, they are available for further
analysis.
A series of tools enables users to analyze these data
to see new patterns, relationships, and insights that
are useful for guiding decision making.
BI cont’d
Definition
According to Adelman et al. (2002), BI is a term that
encompasses a broad range of analytical software and solutions for
gathering, consolidating, analyzing, and providing access to information in a
way that is supposed to let an enterprise's users make better business
decisions.
Stackowiak et al. (2007) define business intelligence as the process
of taking large amounts of data, analyzing that data, and presenting a high-
level set of reports that condense the essence of that data into the basis of
business actions, enabling management to make fundamental daily business
decisions.
Business intelligence is also described as a “business management term used to
describe applications and technologies which are used to gather, provide
access to, and analyze data and information about an enterprise, in order to help
them make better informed business decisions.”
Cont’d
These tools for consolidating, analyzing, and
providing access to vast amounts of data to help
users make better business decisions are often
referred to as business intelligence (BI).
Business intelligence provides firms with the
capability to amass information; develop
knowledge about customers, competitors, and
internal operations; and change decision-making
behavior to achieve higher profitability and other
business goals.
BI cont’d
Classification: Determine to which class a data item belongs
Clustering and outlier analysis: Partition a set into classes, whereby items with similar
characteristics are grouped together
Market Basket Analysis…
Analyzes customer buying habits by finding associations between the
different items that customers place in their “shopping baskets”.
The discovery of interesting correlations can help retailers develop
marketing strategies by giving insight into which items are frequently
purchased together by customers.
This information can lead to increased sales by helping retailers do
selective marketing and plan their shelf space.
Goal: find all rules that correlate the presence of one set of items
(itemset) with that of another set of items.
Why Association Rule Mining
Support
Simplest question: find sets of items that appear “frequently”
in the baskets.
Support for itemset I = the number of baskets containing all
items in I.
Given a support threshold s, sets of items that appear in more than s
baskets are called frequent itemsets.
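The counting described above can be sketched in a few lines of Python. This is a brute-force illustration only, not an efficient miner such as Apriori; the function name `frequent_itemsets`, the `max_size` cap, and the toy baskets are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(baskets, s, max_size=2):
    """Return {itemset: support} for itemsets that appear in more than s baskets.

    Support is the number of baskets containing every item of the itemset.
    Brute force: every subset of up to max_size items is counted per basket.
    """
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n > s}


# Hypothetical toy baskets, invented for illustration.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "coke"},
]

frequent = frequent_itemsets(baskets, s=1)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

With a threshold of s = 1, only itemsets found in at least two baskets are kept, so "coke" and every pair containing it drop out.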
Association Mining from Frequent Patterns
The rule A ⇒ B holds in the transaction set D with support s,
where s is the percentage of transactions in D that contain A ∪ B
(i.e., the union of itemsets A and B, or say, both A and B).
I.e.,
support(A ⇒ B) = P(A ∪ B)
confidence(A ⇒ B) = P(B|A)
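As a rough illustration of these two measures, the sketch below computes the support of a rule as the fraction of transactions containing both itemsets, and its confidence as that support divided by the support of the antecedent. The helper names and the five transactions are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical transactions, invented for illustration.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"computer", "printer"},
]

rule_support = support(transactions, {"computer", "software"})
rule_confidence = confidence(transactions, {"computer"}, {"software"})
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Here two of the five transactions contain both items, and two of the four transactions containing "computer" also contain "software".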
Association Rule- Basic Concepts
Association rule form:
Antecedent ⇒ Consequent [support, confidence]
Example:
buys(x, “computer”) ⇒ buys(x, “financial Mgt. software”)
[0.5%, 60%]
Model Construction:
[Figure: Training Data → Classification Algorithms → Classifier; the classifier is then applied to Testing Data and Unseen Data, e.g. (Merga, Professor, 8) → Tenured?]

NAME    RANK            YEARS  TENURED
Kedir   Assistant Prof  4      no
Abebe   Associate Prof  8      yes
Kebede  Professor       9      yes
Alima   Assistant Prof  9      yes
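To make the train-then-predict flow concrete, here is a minimal sketch that learns a single threshold on YEARS from the four rows above and applies it to the unseen record (Merga, Professor, 8). The slides do not name a specific algorithm; the one-attribute threshold rule, the function names, and the use of these rows as training data are all assumptions made for illustration.

```python
# The four labelled rows from the table, used here as training data (an assumption).
training = [
    ("Kedir", "Assistant Prof", 4, "no"),
    ("Abebe", "Associate Prof", 8, "yes"),
    ("Kebede", "Professor", 9, "yes"),
    ("Alima", "Assistant Prof", 9, "yes"),
]


def classify(years, threshold):
    """Apply the learned rule: tenured if years exceeds the threshold."""
    return "yes" if years > threshold else "no"


def fit_years_threshold(rows):
    """Model construction: pick the midpoint threshold with the fewest training errors."""
    years = sorted({row[2] for row in rows})
    candidates = [(a + b) / 2 for a, b in zip(years, years[1:])]

    def errors(threshold):
        return sum(classify(row[2], threshold) != row[3] for row in rows)

    return min(candidates, key=errors)


threshold = fit_years_threshold(training)

# Model usage: classify the unseen record (Merga, Professor, 8).
prediction = classify(8, threshold)
print(f"rule: tenured if YEARS > {threshold}; Merga -> {prediction}")
```

A real classifier would also use RANK and would be checked against separate testing data before being trusted on unseen records.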
Bayes' Theorem for Prediction
• P(X | buys_Comp = “no”)
= 0.6 × 0.4 × 0.2 × 0.4
≈ 0.019
Outlook      P     N        Humidity   P     N
sunny        2/9   3/5      high       3/9   4/5
overcast     4/9   0        normal     6/9   1/5
rain         3/9   2/5

Temperature  P     N        Windy      P     N
hot          2/9   2/5      true       3/9   3/5
mild         4/9   2/5      false      6/9   2/5
cool         3/9   1/5

We also have the class probabilities:
Prob(P) = 9/14
Prob(N) = 5/14
Naive Bayesian Classifier Example
To classify a new sample X:
outlook = sunny
temperature = cool
humidity = high
windy = false
Prob(P|X) ∝ Prob(P)*Prob(sunny|P)*Prob(cool|P)*
Prob(high|P)*Prob(false|P) = 9/14*2/9*3/9*3/9*6/9 ≈
0.0106
Prob(N|X) ∝ Prob(N)*Prob(sunny|N)*Prob(cool|N)*
Prob(high|N)*Prob(false|N) = 5/14*3/5*1/5*4/5*2/5 ≈
0.0137
(The common factor 1/Prob(X) is omitted because it does not change the comparison.)
Since 0.0137 > 0.0106, X takes class label N.
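The calculation above can be checked with a short Python sketch that multiplies the class prior by the per-attribute conditional probabilities from the table. Exact fractions are used so the results can be compared directly with the hand calculation; the dictionary layout and the function name `score` are choices made for this example.

```python
from fractions import Fraction as F

# Class priors and conditional probabilities copied from the table.
priors = {"P": F(9, 14), "N": F(5, 14)}
likelihoods = {
    "P": {
        "outlook": {"sunny": F(2, 9), "overcast": F(4, 9), "rain": F(3, 9)},
        "temperature": {"hot": F(2, 9), "mild": F(4, 9), "cool": F(3, 9)},
        "humidity": {"high": F(3, 9), "normal": F(6, 9)},
        "windy": {"true": F(3, 9), "false": F(6, 9)},
    },
    "N": {
        "outlook": {"sunny": F(3, 5), "overcast": F(0), "rain": F(2, 5)},
        "temperature": {"hot": F(2, 5), "mild": F(2, 5), "cool": F(1, 5)},
        "humidity": {"high": F(4, 5), "normal": F(1, 5)},
        "windy": {"true": F(3, 5), "false": F(2, 5)},
    },
}


def score(label, sample):
    """Unnormalized posterior: Prob(label) times the product of Prob(value | label)."""
    result = priors[label]
    for attribute, value in sample.items():
        result *= likelihoods[label][attribute][value]
    return result


sample = {"outlook": "sunny", "temperature": "cool", "humidity": "high", "windy": "false"}
scores = {label: score(label, sample) for label in priors}
predicted = max(scores, key=scores.get)

for label, value in scores.items():
    print(label, value, round(float(value), 4))
print("predicted class:", predicted)
```

The scores are not true probabilities because they are not divided by Prob(X), but that shared factor does not affect which class scores highest.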
What is Cluster Analysis?
Finding groups of objects such that the objects in a group will be similar (or
related) to one another and different from (or unrelated to) the objects in other
groups
• Intra-cluster distances are minimized
• Inter-cluster distances are maximized
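The two quantities can be measured directly for a given grouping. The sketch below computes the mean pairwise distance within clusters and between clusters for a small, invented set of 2-D points; it evaluates a clustering rather than producing one, and the function names and data are assumptions for illustration.

```python
from itertools import combinations
from math import dist


def mean_intra_cluster_distance(clusters):
    """Average distance between pairs of points that share a cluster."""
    distances = [dist(a, b) for cluster in clusters for a, b in combinations(cluster, 2)]
    return sum(distances) / len(distances)


def mean_inter_cluster_distance(clusters):
    """Average distance between pairs of points that lie in different clusters."""
    distances = [
        dist(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1
        for b in c2
    ]
    return sum(distances) / len(distances)


# Hypothetical 2-D points, already split into two groups.
clusters = [
    [(0, 0), (0, 1), (1, 0)],
    [(5, 5), (5, 6), (6, 5)],
]

intra = mean_intra_cluster_distance(clusters)
inter = mean_inter_cluster_distance(clusters)
print(f"intra-cluster: {intra:.2f}, inter-cluster: {inter:.2f}")
```

A good clustering gives a small intra-cluster value relative to the inter-cluster value, as this grouping does.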
Clustering