
Clustering, Association and
Forecasting Analysis

Week 12
Cluster Analysis for Data Mining
 Used for automatic identification of natural groupings of things
 Part of the machine learning family
 Employs unsupervised learning
 Learns clusters of things from past data, then assigns new instances to the appropriate cluster
 There is no output variable
 Also known as segmentation

-2 Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall


Cluster Analysis for Data Mining
 Clustering results may be used to
 Identify groupings of customers
 Identify rules for assigning new cases to classes for targeting/diagnostic purposes
 Provide characterization, definition, and labeling of populations
 Reduce the size and complexity of problems for other data mining methods
 Identify outliers in a specific domain (e.g., rare-event detection)


Cluster Analysis for Data Mining
 Analysis methods
 Statistical methods (including both
hierarchical and nonhierarchical), such as
k-means, k-modes, and so on
 Neural networks (adaptive resonance
theory [ART], self-organizing map [SOM])
 Fuzzy logic (e.g., fuzzy c-means algorithm)
 Genetic algorithms
 Divisive versus Agglomerative methods
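The divisive/agglomerative distinction above can be made concrete with a short sketch. The following is a minimal illustration of agglomerative (bottom-up) clustering with single linkage, not any particular library's implementation: every point starts in its own cluster, and the two closest clusters are merged until only k remain.

```python
import math

def agglomerative(points, k):
    """Agglomerative clustering sketch: start with one cluster per point,
    repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Single linkage: distance between the closest pair across clusters.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > k:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))
    return clusters

clusters = agglomerative([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

A divisive method would run in the opposite direction: start with one cluster containing everything and recursively split it.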
Cluster Analysis for Data Mining
 How many clusters?
 There is no "truly optimal" way to calculate it
 Heuristics are often used
 Look at the sparseness of clusters
 Number of clusters = (n/2)^(1/2) (n: number of data points)
 Use Akaike information criterion (AIC)
 Use Bayesian information criterion (BIC)
 Most cluster analysis methods involve the use of a distance measure to calculate the closeness between pairs of items
 Euclidean versus Manhattan (rectilinear) distance
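The two distance measures and the square-root heuristic above can be sketched in a few lines of Python (the point values and n are made up for illustration):

```python
import math

def euclidean(p, q):
    # Straight-line distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    # Rectilinear distance: sum of axis-wise moves.
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1.0, 2.0), (4.0, 6.0)
d_euc = euclidean(p, q)   # sqrt(3^2 + 4^2) = 5.0
d_man = manhattan(p, q)   # 3 + 4 = 7.0

# Rule-of-thumb cluster count for n data points: (n/2)^(1/2)
n = 200
k_guess = round(math.sqrt(n / 2))  # 10
```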


Cluster Analysis for Data Mining
 k-Means Clustering Algorithm
 k : pre-determined number of clusters
 Algorithm (Step 0: determine the value of k)
Step 1: Randomly generate k points as
initial cluster centers
Step 2: Assign each point to the nearest cluster
center
Step 3: Re-compute the new cluster centers
Repetition step: Repeat steps 2 and 3 until some
convergence criterion is met (usually that the
assignment of points to clusters becomes stable)
Cluster Analysis for Data Mining -
k-Means Clustering Algorithm

[Figure: k-Means clustering illustration, panels labeled Step 1 (initial centers), Step 2 (point assignment), Step 3 (re-computed centers)]
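The steps above can be sketched in pure Python. This is a minimal illustration, not the textbook's code; `math.dist` requires Python 3.8+, and the sample points are invented:

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """k-means sketch following the slide's steps: pick k initial centers,
    assign points to the nearest center, re-compute centers, repeat until
    the assignment of points to clusters becomes stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # Step 1: initial cluster centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each point to the nearest cluster center.
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:  # convergence: assignments stable
            break
        assignment = new_assignment
        # Step 3: re-compute each center as the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, assignment

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = kmeans(points, k=2)
```

On these two well-separated groups the algorithm converges to the obvious split regardless of which initial centers are drawn.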


Association Rule Mining
 A very popular DM method in business
 Finds interesting relationships (affinities)
between variables (items or events)
 Part of the machine learning family
 Employs unsupervised learning
 There is no output variable
 Also known as market basket analysis
 Often used as an example to describe DM to
ordinary people, such as the famous
“relationship between diapers and beers!”
Association Rule Mining
 Input: the simple point-of-sale transaction data
 Output: Most frequent affinities among items
 Example: according to the transaction data…
"Customers who bought a laptop computer and virus
protection software also bought an extended service plan
70 percent of the time."
 How do you use such a pattern/knowledge?
 Put the items next to each other for ease of finding
 Promote the items as a package (do not put one on sale if the
other(s) are on sale)
 Place items far apart from each other so that the customer
has to walk the aisles, and in doing so potentially sees
and buys other items
Association Rule Mining
 Representative applications of association
rule mining include
 In business: cross-marketing, cross-selling, store
design, catalog design, e-commerce site design,
optimization of online advertising, product pricing,
and sales/promotion configuration
 In medicine: relationships between symptoms and
illnesses; diagnosis and patient characteristics and
treatments (to be used in medical DSS); and genes
and their functions (to be used in genomics
projects)…



Association Rule Mining
 Are all association rules interesting and useful?
A Generic Rule: X → Y [S%, C%]
X, Y: products and/or services
X: Left-hand side (LHS)
Y: Right-hand side (RHS)
S: Support: how often X and Y go together
C: Confidence: how often Y goes together with X
(of the transactions containing X, the fraction
that also contain Y)
Example: {Laptop Computer, Antivirus Software} →
{Extended Service Plan} [30%, 70%]
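Support and confidence are simple counts over the transaction data. The following is an illustration on a made-up mini basket log (item names and values are hypothetical, not the slide's 30%/70% example):

```python
# Hypothetical basket log: each set is one point-of-sale transaction.
baskets = [
    {"laptop", "antivirus", "service_plan"},
    {"laptop", "antivirus", "service_plan", "mouse"},
    {"laptop", "antivirus", "service_plan"},
    {"laptop", "antivirus"},
    {"laptop", "antivirus", "mouse"},
    {"mouse"},
    {"keyboard"},
    {"laptop"},
    {"antivirus"},
    {"service_plan", "mouse"},
]
X = {"laptop", "antivirus"}   # left-hand side (LHS)
Y = {"service_plan"}          # right-hand side (RHS)

n_xy = sum(1 for b in baskets if X | Y <= b)  # baskets containing X and Y
n_x = sum(1 for b in baskets if X <= b)       # baskets containing X

support = n_xy / len(baskets)  # S: how often X and Y go together -> 3/10
confidence = n_xy / n_x        # C: of baskets with X, fraction with Y -> 3/5
```

So this log would yield the rule X → Y [30%, 60%].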


Association Rule Mining
 Algorithms are available for generating
association rules
 Apriori
 Eclat
 FP-Growth
 + Derivatives and hybrids of the three
 The algorithms help identify the
frequent itemsets, which are then
converted to association rules
Association Rule Mining
 Apriori Algorithm
 Finds subsets that are common to at least
a minimum number of the itemsets
 uses a bottom-up approach
 frequent subsets are extended one item at a
time (the size of frequent subsets increases
from one-item subsets to two-item subsets,
then three-item subsets, and so on), and
 groups of candidates at each level are tested
against the data for minimum support
 see the figure…
Association Rule Mining
 Apriori Algorithm
Raw Transaction Data:

    Transaction No   SKUs (Item No)
    1                1, 2, 3, 4
    2                2, 3, 4
    3                2, 3
    4                1, 2, 4
    5                1, 2, 3, 4
    6                2, 4

One-item Itemsets:

    Itemset (SKUs)   Support
    1                3
    2                6
    3                4
    4                5

Two-item Itemsets:

    Itemset (SKUs)   Support
    1, 2             3
    1, 3             2
    1, 4             3
    2, 3             4
    2, 4             5
    3, 4             3

Three-item Itemsets:

    Itemset (SKUs)   Support
    1, 2, 4          3
    2, 3, 4          3
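A simplified Python sketch of this bottom-up search reproduces the frequent itemsets in the table above with a minimum support count of 3. Candidates are generated by pairwise unions of frequent itemsets (a shortcut, not the full Apriori join-and-prune step):

```python
def apriori(transactions, min_support):
    """Simplified Apriori sketch: grow frequent k-itemsets from frequent
    (k-1)-itemsets, keeping only those meeting the minimum support count."""
    def support(itemset):
        return sum(1 for t in transactions if set(itemset) <= t)

    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [(i,) for i in items]  # start from one-item itemsets
    k = 1
    while level:
        survivors = [c for c in level if support(c) >= min_support]
        frequent.update({c: support(c) for c in survivors})
        # Next level: unions of pairs of frequent k-itemsets of size k+1.
        candidates = {tuple(sorted(set(a) | set(b)))
                      for a in survivors for b in survivors
                      if len(set(a) | set(b)) == k + 1}
        level = sorted(candidates)
        k += 1
    return frequent

# The six transactions from the table above.
transactions = [{1, 2, 3, 4}, {2, 3, 4}, {2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {2, 4}]
frequent = apriori(transactions, min_support=3)
```

With minimum support 3, the infrequent pair {1, 3} (support 2) is dropped, and the surviving three-item itemsets are {1, 2, 4} and {2, 3, 4}, matching the table.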


Data Mining Software

 Commercial
 SPSS - PASW (formerly Clementine)
 SAS - Enterprise Miner
 IBM - Intelligent Miner
 StatSoft - Statistica Data Miner
 … many more
 Free and/or Open Source
 Weka
 RapidMiner…

[Figure: KDnuggets poll of data mining tools used (bars show "Total (w/ others)" vs. "Alone"): SPSS PASW Modeler (formerly Clementine), RapidMiner, SAS / SAS Enterprise Miner, Microsoft Excel, your own code, Weka (now Pentaho), KXEN, MATLAB, KNIME, Microsoft SQL Server, Zementis, Oracle DM, Statsoft Statistica, Salford CART/MARS, Orange, Angoss, C4.5/C5.0/See5, Bayesia, Insightful Miner/S-Plus (now TIBCO), Megaputer, Viscovery, Clario Analytics, Miner3D, Thinkanalytics, and other commercial/free tools. Source: KDNuggets.com, May 2009]
Data Mining Myths
 Data mining …
 provides instant solutions/predictions
 is not yet viable for business applications
 requires a separate, dedicated database
 can only be done by those with advanced
degrees
 is only for large firms that have lots of
customer data
 is another name for the good-old statistics



Common Data Mining Mistakes
1. Selecting the wrong problem for data mining
2. Ignoring what your sponsor thinks data
mining is and what it really can/cannot do
3. Not leaving sufficient time for data
acquisition, selection and preparation
4. Looking only at aggregated results and not
at individual records/predictions
5. Being sloppy about keeping track of the data
mining procedure and results



Common Data Mining Mistakes
6. Ignoring suspicious (good or bad) findings
and quickly moving on
7. Running mining algorithms repeatedly and
blindly, without thinking about the next stage
8. Naively believing everything you are told
about the data
9. Naively believing everything you are told
about your own data mining analysis
10. Measuring your results differently from the
way your sponsor measures them
End of the Chapter

 Questions / Comments…



All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise, without the prior written
permission of the publisher. Printed in the United States of America.

Copyright © 2011 Pearson Education, Inc.


Publishing as Prentice Hall

