By Ngatunga Elizabeth
Roll No: 13PTIT011
MSc. IT and Management
Business Intelligence and Analytics
2013 Batch
12/19/15
STATISTICAL BASED
METHOD DATA MINING
ALGORITHM
OUTLINE
Introduction
Correlation Analysis
Regression Analysis
Bayesian model
Conclusion
References
INTRODUCTION
Data Mining => searching for certain patterns of
data so as to obtain knowledge that can be used for
decision making
Statistics is useful for mining various patterns
from data as well as for understanding the
underlying mechanisms generating and
affecting the patterns
Both statistics and data mining are concerned
with drawing inferences from data
STATISTICAL BASED ALGORITHMS
Correlation Analysis
Regression Analysis
Bayesian model
CORRELATION
Correlation is a statistical technique used to
determine the degree to which two variables are
related, expressed as the correlation coefficient, r
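As a minimal sketch of how r can be computed, the Python below assumes the coefficient meant here is Pearson's product-moment r (the slides do not name a specific coefficient). The function name `pearson_r` and the data values are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r for two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (unnormalised covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: advertising spend vs. units sold
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(ad_spend, units_sold)
print(f"r = {r:.4f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.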
CORRELATION USING LIFT
A correlation rule is measured not only by its support and
confidence but also by the correlation between itemsets A and
B
Expressed as
lift(A, B) = P(A ∪ B) / (P(A) P(B))
where P(A ∪ B) is the probability that a transaction contains both A and B
Such that lift(A,B)
< 1
|
negatively correlated
= 1
|
independent
> 1
|
positively correlated
The lift of an association (or correlation) rule
assesses the degree to which the occurrence of
one itemset lifts the occurrence of the other
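A short sketch of the lift calculation follows, taking lift(A, B) = P(A and B together) / (P(A) P(B)) and estimating each probability from transaction counts. The function names and the counts are hypothetical; a lift below 1 is read as negative correlation, 1 as independence, and above 1 as positive correlation.

```python
def lift(n_total, n_a, n_b, n_both):
    """Lift of itemsets A and B estimated from transaction counts."""
    p_a = n_a / n_total        # P(A): share of transactions containing A
    p_b = n_b / n_total        # P(B): share of transactions containing B
    p_both = n_both / n_total  # share of transactions containing both A and B
    return p_both / (p_a * p_b)

def interpret(value, tol=1e-9):
    """Map a lift value to its usual reading."""
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positively correlated" if value > 1.0 else "negatively correlated"

# Hypothetical counts: 10,000 transactions, 6,000 with A, 7,500 with B, 4,000 with both
value = lift(10_000, 6_000, 7_500, 4_000)
print(f"lift = {value:.3f} -> {interpret(value)}")
```

With these counts the items occur together less often than independence would predict, so the lift falls below 1.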
CORRELATION USING CHI-SQUARE METHOD
Applicable to categorical (binary) data, e.g. customer
loyalty to a supermarket.
The χ² statistic tests the hypothesis that A and B
are independent, that is, there is no correlation
between them.
If the hypothesis can be rejected, then A and B are
statistically correlated.
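The test can be sketched in a few lines of Python. This is an illustrative implementation of the Pearson chi-square statistic on a contingency table, not a method taken from the slides: the counts are hypothetical, and 3.841 is the standard critical value for 1 degree of freedom at the 0.05 significance level.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical 2x2 table of transaction counts
#            A      not A
observed = [[4000, 3500],   # B
            [2000,  500]]   # not B

CRITICAL_VALUE_DF1_ALPHA_05 = 3.841  # chi-square critical value, 1 d.f., alpha = 0.05

stat = chi_square(observed)
reject_independence = stat > CRITICAL_VALUE_DF1_ALPHA_05
print(f"chi-square = {stat:.2f}, reject independence: {reject_independence}")
```

The statistic only shows that A and B are related; comparing observed and expected counts (or computing lift) shows whether the relationship is positive or negative.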
APPLICATION IN DATA MINING
The discovery of interesting correlation relationships
among huge amounts of business transaction records
can help in many business decision-making processes
such as
Catalog design
Cross-marketing (joint promotion)
Customer shopping behavior analysis
REGRESSION
Regression analysis can be used to model the
relationship between one or more independent or
predictor variables and a dependent or response
variable (which is continuous-valued).
Types
Linear Regression (single independent variable)
Multiple Regression (more than one independent variable)
Logit Regression (categorical dependent variable)
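As an illustration of the simplest case, the sketch below fits a linear regression with a single independent variable by ordinary least squares. The function names and the data (advertising expenditure vs. sales) are hypothetical and deliberately lie on an exact line so the result is easy to check by hand.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """Predicted response for a new value of the independent variable."""
    return intercept + slope * x_new

# Hypothetical data: advertising expenditure (predictor) and sales (response)
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_line(ad_spend, sales)
forecast = predict(intercept, slope, 6.0)
print(f"sales = {intercept:.2f} + {slope:.2f} * ad_spend; forecast at 6.0: {forecast:.2f}")
```

Multiple regression extends the same idea to several predictors, and logit regression replaces the continuous response with a categorical one.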
APPLICATION IN DATA MINING
Applications of this statistical method in data mining are
many, including
Commerce: predicting sales amounts of a new product
based on advertising expenditure
Meteorology: predicting wind velocities and directions
as a function of temperature, humidity, air pressure
Stock exchange: time series prediction of stock market
indices (trend estimation)
Medicine: effect of parental birth weight/height on
infant birth weight/height, for instance (Gorunescu,
2011)
BAYESIAN MODEL
APPLICATION IN DATA MINING
Minimizes the probability of making a wrong
decision, or the expected risk
It also lets experts give an
estimate of the weight or importance of their
prior knowledge, compared to the training
data available
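A minimal sketch of this idea, assuming the model in question is the usual Bayes decision rule: posterior probabilities are computed with Bayes' theorem from prior probabilities (expert knowledge) and likelihoods (estimated from training data), and the class with the largest posterior is chosen, which minimizes the probability of a wrong decision. The class names and all probabilities are hypothetical.

```python
def posteriors(priors, likelihoods):
    """Posterior P(class | evidence) for each class via Bayes' theorem."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())  # P(evidence), the normalising constant
    return {c: joint[c] / evidence for c in joint}

def bayes_decision(priors, likelihoods):
    """Choose the class with the highest posterior probability."""
    post = posteriors(priors, likelihoods)
    best = max(post, key=post.get)
    return best, post

# Hypothetical prior knowledge supplied by an expert
priors = {"loyal": 0.7, "not_loyal": 0.3}
# Hypothetical P(evidence | class), as might be estimated from training data
likelihoods = {"loyal": 0.2, "not_loyal": 0.6}

decision, post = bayes_decision(priors, likelihoods)
error_probability = 1.0 - post[decision]
print(decision, post, f"P(error) = {error_probability:.4f}")
```

Changing the priors shifts the weight given to prior knowledge relative to the evidence observed in the data.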
CONCLUSION
Industries such as banking, insurance,
medicine, and retailing commonly use data
mining to reduce costs, enhance research, and
increase sales.
Through the use of correlation, regression
and the Bayesian model, inferences are made to
understand the patterns of
correlation and causal links among the data
values, or to predict future data
values.
REFERENCES
Hand, Mannila and Smyth (2001), Principles of Data
Mining, The MIT Press, Cambridge, Massachusetts; London,
England.
Han and Kamber (2012), Data Mining: Concepts and
Techniques, 3rd Edition, Morgan Kaufmann, USA.
Gorunescu (2011), Data Mining: Concepts, Models and
Techniques, Springer-Verlag Berlin Heidelberg, Romania.
Berry and Linoff (2004), Data Mining Techniques: For
Marketing, Sales, and Customer Relationship Management,
2nd Edition, Wiley Publishing, Inc., Indianapolis, Indiana.
Robinson and Officer (2008), Data Mining: Predicting
Laptop Retail Price Using Regression, Accessed on 8/3/2015,
http://www.spelman.edu/docs/aspire-research/joibritneypdf.pdf?s
END OF PRESENTATION