Data Mining
A Newer Definition
Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.
Techniques
Regression/Fitting
Clustering
Neural networks
Bayesian networks
Hidden Markov models
[Figure: weather example with attributes temp (hot, mild, cool), humidity (high), and windy (true, false), and outcome play (yes, no); the values sunny and rainy also appear.]
Clustering
Iterative clustering
K-means
Hierarchical clustering
Agglomerative method
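The iterative idea behind K-means can be sketched in a few lines: assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until nothing changes. The sketch below is a minimal illustration, not the method used in the slides; the function name kmeans, the toy points, and the choice of squared Euclidean distance with random initial centroids are all assumptions made for this example.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-means on tuples of floats (illustrative sketch only)."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

On data this cleanly separated the two groups are recovered whichever starting points are drawn. On real data the result depends on the initial centroids and on the choice of k, so several restarts are commonly used.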
Applications:
Marketing: sales model
Finance: loan decision
Insurance: risk analysis
Telecom: load prediction
Web/text mining
Surveillance: security
Bioinformatics
In Bioinformatics
Analysis of microarray data
Mining free text
Structural genomics: protein crystallization
Predicting structure from sequence
Common theme: complex data, fast growing (outgrowing our processing power)
Data Representations
Cluster Analysis
Bayesian Network
Wrap It Up
Data mining has great potential.
Danger: don't over-predict.
S&P index = function of the previous year's butter production, cheese production, and sheep population in Bangladesh and the US?
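The warning can be made concrete with a small experiment: search enough unrelated series and one of them will correlate strongly with the target by chance alone. The sketch below uses purely synthetic random numbers, not real market or agricultural data; the function names, the 10-year window, and the 2000 candidate series are assumptions chosen for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def best_spurious_predictor(n_years=10, n_candidates=2000, seed=0):
    """Return the largest |correlation| between a random target series
    and any of n_candidates independent random series."""
    rng = random.Random(seed)
    # Synthetic stand-in for a yearly index; pure noise by construction.
    target = [rng.gauss(0, 1) for _ in range(n_years)]
    best = 0.0
    for _ in range(n_candidates):
        # Each candidate is generated independently of the target.
        candidate = [rng.gauss(0, 1) for _ in range(n_years)]
        best = max(best, abs(pearson(candidate, target)))
    return best

best_r = best_spurious_predictor()
print(f"best |r| among unrelated series: {best_r:.2f}")
```

Every candidate here is independent of the target, so any strong correlation found is an artifact of the search. With few observations and many candidate predictors, an impressive in-sample fit says little about predictive power, which is why such findings should be checked on data not used in the search.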