
Clustering, Association and
Forecasting Analysis

Week 12
Cluster Analysis for Data Mining
 Used for automatic identification of natural groupings of things
 Part of the machine learning family
 Employs unsupervised learning
 Learns clusters of things from past data, then assigns new instances to the appropriate cluster
 There is no output variable
 Also known as segmentation

-2 Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall


Cluster Analysis for Data Mining
 Clustering results may be used to
 Identify groupings of customers
 Identify rules for assigning new cases to classes for targeting/diagnostic purposes
 Provide characterization, definition, and labeling of populations
 Reduce the size and complexity of problems for other data mining methods
 Identify outliers in a specific domain (e.g., rare-event detection)


Cluster Analysis for Data Mining
 Analysis methods
 Statistical methods (including both
hierarchical and nonhierarchical), such as
k-means, k-modes, and so on
 Neural networks (adaptive resonance
theory [ART], self-organizing map [SOM])
 Fuzzy logic (e.g., fuzzy c-means algorithm)
 Genetic algorithms
 Divisive versus Agglomerative methods
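The divisive/agglomerative distinction above can be made concrete with a short sketch. The following is a minimal illustration of agglomerative (bottom-up) clustering with single linkage, not any particular library's implementation: every point starts in its own cluster, and the two closest clusters are merged until only k remain.

```python
import math

def agglomerative(points, k):
    """Agglomerative clustering sketch: start with one cluster per point,
    repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair across clusters.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))
    return clusters

clusters = agglomerative([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

A divisive method would run in the opposite direction: start with one cluster containing everything and recursively split it.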
Cluster Analysis for Data Mining
 How many clusters?
 There is no "truly optimal" way to calculate it
 Heuristics are often used
 Look at the sparseness of clusters
 Number of clusters = (n/2)^(1/2) (n: number of data points)
 Use Akaike information criterion (AIC)
 Use Bayesian information criterion (BIC)
 Most cluster analysis methods involve the use of a distance measure to calculate the closeness between pairs of items
 Euclidean versus Manhattan (rectilinear) distance
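The two distance measures and the square-root heuristic above can be sketched in a few lines of Python (the point values and n are made up for illustration):

```python
import math

def euclidean(p, q):
    # Straight-line distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Rectilinear distance: sum of axis-wise moves.
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1.0, 2.0), (4.0, 6.0)
d_euc = euclidean(p, q)   # sqrt(3^2 + 4^2) = 5.0
d_man = manhattan(p, q)   # 3 + 4 = 7.0

# Rule-of-thumb cluster count for n data points: (n/2)^(1/2)
n = 200
k_guess = round(math.sqrt(n / 2))  # 10
```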


Cluster Analysis for Data Mining
 k-Means Clustering Algorithm
 k : pre-determined number of clusters
 Algorithm (Step 0: determine the value of k)
Step 1: Randomly generate k points as
initial cluster centers
Step 2: Assign each point to the nearest cluster
center
Step 3: Re-compute the new cluster centers
Repetition step: Repeat steps 2 and 3 until some
convergence criterion is met (usually that the
assignment of points to clusters becomes stable)
Cluster Analysis for Data Mining -
k-Means Clustering Algorithm

[Figure: k-Means clustering illustration, panels labeled Step 1 (initial centers), Step 2 (point assignment), Step 3 (re-computed centers)]
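The steps above can be sketched in pure Python. This is a minimal illustration, not the textbook's code; `math.dist` requires Python 3.8+, and the sample points are invented:

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """k-means sketch following the slide's steps: pick k initial centers,
    assign points to the nearest center, re-compute centers, repeat until
    the assignment of points to clusters becomes stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step 1: initial cluster centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest cluster center.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:  # convergence: assignments stable
            break
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(points, k=2)
```

On these two well-separated groups the algorithm converges to the obvious split regardless of which initial centers are drawn.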


Association Rule Mining
 A very popular DM method in business
 Finds interesting relationships (affinities)
between variables (items or events)
 Part of the machine learning family
 Employs unsupervised learning
 There is no output variable
 Also known as market basket analysis
 Often used as an example to describe DM to
ordinary people, such as the famous
“relationship between diapers and beers!”
Association Rule Mining
 Input: the simple point-of-sale transaction data
 Output: Most frequent affinities among items
 Example: according to the transaction data…
"Customers who bought a laptop computer and virus
protection software also bought an extended service plan
70 percent of the time."
 How do you use such a pattern/knowledge?
 Put the items next to each other for ease of finding
 Promote the items as a package (do not put one on sale if the
other(s) are on sale)
 Place items far apart from each other so that the customer
has to walk the aisles, and in doing so potentially sees
and buys other items
Association Rule Mining
 Representative applications of association
rule mining include
 In business: cross-marketing, cross-selling, store
design, catalog design, e-commerce site design,
optimization of online advertising, product pricing,
and sales/promotion configuration
 In medicine: relationships between symptoms and
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes
and their functions (to be used in genomics
projects)…



Association Rule Mining
 Are all association rules interesting and useful?
A Generic Rule: X → Y [S%, C%]
X, Y: products and/or services
X: Left-hand side (LHS)
Y: Right-hand side (RHS)
S: Support: how often X and Y go together
C: Confidence: how often Y goes together with X
(of the transactions containing X, the fraction
that also contain Y)
Example: {Laptop Computer, Antivirus Software} →
{Extended Service Plan} [30%, 70%]
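Support and confidence are simple counts over the transaction data. The following is an illustration on a made-up mini basket log (item names and values are hypothetical, not the slide's 30%/70% example):

```python
# Hypothetical basket log: each set is one point-of-sale transaction.
baskets = [
    {"laptop", "antivirus", "service_plan"},
    {"laptop", "antivirus", "service_plan", "mouse"},
    {"laptop", "antivirus", "service_plan"},
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"mouse"},
    {"keyboard"},
    {"laptop"},
    {"antivirus"},
    {"service_plan", "mouse"},
]
X = {"laptop", "antivirus"}   # left-hand side (LHS)
Y = {"service_plan"}          # right-hand side (RHS)

n_xy = sum(1 for b in baskets if X | Y <= b)  # baskets containing X and Y
n_x = sum(1 for b in baskets if X <= b)       # baskets containing X

support = n_xy / len(baskets)  # S: how often X and Y go together -> 3/10
confidence = n_xy / n_x        # C: of baskets with X, fraction with Y -> 3/5
```

So this log would yield the rule X → Y [30%, 60%].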


Association Rule Mining
 Algorithms are available for generating
association rules
 Apriori
 Eclat
 FP-Growth
 + Derivatives and hybrids of the three
 The algorithms help identify the
frequent itemsets, which are then
converted to association rules
Association Rule Mining
 Apriori Algorithm
 Finds subsets that are common to at least
a minimum number of the itemsets
 uses a bottom-up approach
 frequent subsets are extended one item at a
time (the size of frequent subsets increases
from one-item subsets to two-item subsets,
then three-item subsets, and so on), and
 groups of candidates at each level are tested
against the data for minimum support
 see the figure…
Association Rule Mining
 Apriori Algorithm
Raw Transaction Data:

    Transaction No   SKUs (Item No)
    1                1, 2, 3, 4
    2                2, 3, 4
    3                2, 3
    4                1, 2, 4
    5                1, 2, 3, 4
    6                2, 4

One-item Itemsets:

    Itemset (SKUs)   Support
    1                3
    2                6
    3                4
    4                5

Two-item Itemsets:

    Itemset (SKUs)   Support
    1, 2             3
    1, 3             2
    1, 4             3
    2, 3             4
    2, 4             5
    3, 4             3

Three-item Itemsets:

    Itemset (SKUs)   Support
    1, 2, 4          3
    2, 3, 4          3
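A simplified Python sketch of this bottom-up search reproduces the frequent itemsets in the table above with a minimum support count of 3. Candidates are generated by pairwise unions of frequent itemsets (a shortcut, not the full Apriori join-and-prune step):

```python
def apriori(transactions, min_support):
    """Simplified Apriori sketch: grow frequent k-itemsets from frequent
    (k-1)-itemsets, keeping only those meeting the minimum support count."""
    def support(itemset):
        return sum(1 for t in transactions if set(itemset) <= t)

    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [(i,) for i in items]  # start from one-item itemsets
    k = 1
    while level:
        survivors = [c for c in level if support(c) >= min_support]
        frequent.update({c: support(c) for c in survivors})
        # Next level: unions of pairs of frequent k-itemsets of size k+1.
        candidates = {tuple(sorted(set(a) | set(b)))
                      for a in survivors for b in survivors
                      if len(set(a) | set(b)) == k + 1}
        level = sorted(candidates)
        k += 1
    return frequent

# The six transactions from the table above.
transactions = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4}]
frequent = apriori(transactions, min_support=3)
```

With minimum support 3, the infrequent pair {1, 3} (support 2) is dropped, and the surviving three-item itemsets are {1, 2, 4} and {2, 3, 4}, matching the table.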


Data Mining Software

 Commercial
 SPSS - PASW (formerly Clementine)
 SAS - Enterprise Miner
 IBM - Intelligent Miner
 StatSoft - Statistica Data Miner
 … many more
 Free and/or Open Source
 Weka
 RapidMiner…

[Figure: KDnuggets poll of data mining tools used (bars show "Total (w/ others)" vs. "Alone"): SPSS PASW Modeler (formerly Clementine), RapidMiner, SAS / SAS Enterprise Miner, Microsoft Excel, your own code, Weka (now Pentaho), KXEN, MATLAB, KNIME, Microsoft SQL Server, Zementis, Oracle DM, Statsoft Statistica, Salford CART/MARS, Orange, Angoss, C4.5/C5.0/See5, Bayesia, Insightful Miner/S-Plus (now TIBCO), Megaputer, Viscovery, Clario Analytics, Miner3D, Thinkanalytics, and other commercial/free tools. Source: KDNuggets.com, May 2009]
Data Mining Myths
 Data mining …
 provides instant solutions/predictions
 is not yet viable for business applications
 requires a separate, dedicated database
 can only be done by those with advanced
degrees
 is only for large firms that have lots of
customer data
 is another name for the good-old statistics



Common Data Mining Mistakes
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data
mining is and what it really can/cannot do
3. Not leaving sufficient time for data
acquisition, selection and preparation
4. Looking only at aggregated results and not
at individual records/predictions
5. Being sloppy about keeping track of the data
mining procedure and results



Common Data Mining Mistakes
6. Ignoring suspicious (good or bad) findings
and quickly moving on
7. Running mining algorithms repeatedly and
blindly, without thinking about the next stage
8. Naively believing everything you are told
about the data
9. Naively believing everything you are told
about your own data mining analysis
10. Measuring your results differently from the
way your sponsor measures them
End of the Chapter

 Questions / Comments…



All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without the prior written
permission of the publisher. Printed in the United States of America.

Copyright © 2011 Pearson Education, Inc.


Publishing as Prentice Hall

