Data Mining (Penambangan Data)

Graduate Program, Faculty of Engineering

JTETI - UGM

Indriana Hidayah
References
1. Witten, Ian H., and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition. Morgan Kaufmann Publishers, 2005.
2. Han, Jiawei, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques, 3rd edition. Morgan Kaufmann Publishers, 2012.
3. Liu, Bing. Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer, 2007.
Lecture plan
RPKPS (Semester Learning Activity Program Plan)

• General instructional objectives
– To introduce students to the basic concepts and techniques of data mining.
– To develop skills in using recent data mining software to solve practical problems.
– To gain experience in independent study and research.
• Specific instructional objectives for each topic
Understand the elements detailed below:
– What is data mining
– Input: Concepts, instances, attributes
– Output: Knowledge representation
– Algorithms: The basic method
– Credibility: Evaluating what has been learned
– Advanced Data Mining: Implementation
– Data Transformation
– WEKA Data Mining Implementation.
Today’s topic
• What is data mining:
(1) data mining and machine learning;
(2) simple examples;
(3) machine learning and statistics;
(4) generalization as search.
What Is Data Mining?
• Data mining (knowledge discovery in databases):
– Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) information or patterns from data in large databases
• Alternative names:
– Data mining: a misnomer?
– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archaeology, data dredging, information harvesting, business intelligence, etc.

Source: Jiawei Han's slide

9/13/2013 Data Mining: Concepts and Techniques 7

Why data mining?
• The motivation:
– Data explosion problem
– Automated data collection tools
– Mature database technology
– Result: big data in databases and other repositories
– Data rich but information poor!

• Solution: Data warehousing and data mining
– Data warehousing and on-line analytical processing (OLAP)
– Data mining: extraction of interesting knowledge (rules, patterns, constraints) from data in large databases


Evolution of Database Technology
• 1960s:
– Data collection, change from primitive file processing to database systems
• 1970s:
– Relational data model, relational DBMS implementation
• 1980s:
– RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s-2000s:
– Data mining and data warehousing, multimedia databases, and Web databases


How about machine learning?
• Data mining is defined as the process of discovering useful patterns, automatically or semi-automatically, in large quantities of data.
• Machine learning, by contrast, is…
– Learning (noun): the cognitive process of acquiring skill or knowledge (WordWeb 6.6)
– Thus, machine learning can be thought of as a machine (i.e., a computer) going through a process of acquiring skill or knowledge.
• So…
– What is the relationship between data mining and machine learning?
Potential Applications (1)
• Data mining is applied in a multidisciplinary field involving:
– machine learning,
– statistics,
– databases,
– artificial intelligence, and
– pattern recognition
• Web usage mining
• Text mining
Simple example
Contact lens prescription

• The patterns can be:
– Classification
– Presented in a decision tree
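
The decision-tree figure for this slide did not survive extraction. As a minimal sketch of what "a pattern presented as a decision tree" means, the function below encodes the tree commonly shown for the contact lens data in reference 1. The attribute names, values, and branch order are assumptions here and should be checked against that book; the function name `recommend_lens` is hypothetical.

```python
def recommend_lens(tear_production_rate, astigmatism, spectacle_prescription):
    """Walk a small decision tree and return a lens recommendation.

    Assumed attribute values (not taken from the slide):
      tear_production_rate: "reduced" or "normal"
      astigmatism: "yes" or "no"
      spectacle_prescription: "myope" or "hypermetrope"
    """
    # Root test: reduced tear production means no lenses are recommended.
    if tear_production_rate == "reduced":
        return "none"
    # Second test: without astigmatism, soft lenses are recommended.
    if astigmatism == "no":
        return "soft"
    # Third test: with astigmatism, the prescription decides.
    if spectacle_prescription == "myope":
        return "hard"
    return "none"


if __name__ == "__main__":
    print(recommend_lens("normal", "no", "myope"))
    print(recommend_lens("normal", "yes", "hypermetrope"))
```

Each path from the root test to a returned label corresponds to one classification rule, which is why a tree is a compact way to present this kind of pattern.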


More realistic example: vertebral column

pelvic_incidence  pelvic_tilt  lumbar_lordosis_angle  sacral_slope  pelvic_radius  degree_spondylolisthesis  class (class attribute)
63.0278175        22.55258597  39.60911701            40.47523153   98.67291675    -0.254399986              Hernia
39.05695098       10.06099147  25.01537822            28.99595951   114.4054254    4.564258645               Hernia
68.83202098       22.21848205  50.09219357            46.61353893   105.9851355    -3.530317314              Hernia
69.29700807       24.65287791  44.31123813            44.64413017   101.8684951    11.21152344               Hernia
49.71285934       9.652074879  28.317406              40.06078446   108.1687249    7.918500615               Hernia
40.25019968       13.92190658  25.1249496             26.32829311   130.3278713    2.230651729               Hernia
53.43292815       15.86433612  37.16593387            37.56859203   120.5675233    5.988550702               Hernia
45.36675362       10.75561143  29.03834896            34.61114218   117.2700675    -10.67587083              Hernia
43.79019026       13.5337531   42.69081398            30.25643716   125.0028927    13.28901817               Hernia
36.68635286       5.010884121  41.9487509             31.67546874   84.24141517    0.664437117               Hernia
49.70660953       13.04097405  31.33450009            36.66563548   108.6482654    -7.825985755              Hernia

Data mining process
• As a process, data mining encompasses three main steps:
– Pre-processing → dealing with unsuitable raw data
– Data mining → applying a data mining method
– Post-processing → interpreting mined patterns
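
The three steps can be sketched as three small functions chained together. This is only an illustration under stated assumptions: the records, the field names `month` and `subscribed`, and the function names are all hypothetical, and the "mining" step is a deliberately trivial per-month success rate rather than a real data mining method.

```python
from collections import defaultdict

# Hypothetical raw records; None marks a missing value.
raw_records = [
    {"month": "mar", "subscribed": True},
    {"month": "mar", "subscribed": True},
    {"month": "may", "subscribed": False},
    {"month": "may", "subscribed": True},
    {"month": None, "subscribed": False},
    {"month": "may", "subscribed": None},
]


def preprocess(records):
    """Pre-processing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def mine(records):
    """Data mining (toy stand-in): success rate per month."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in records:
        totals[r["month"]] += 1
        hits[r["month"]] += int(r["subscribed"])
    return {month: hits[month] / totals[month] for month in totals}


def postprocess(patterns):
    """Post-processing: rank the patterns and phrase them for a human reader."""
    ranked = sorted(patterns.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{month}: {rate:.0%} of contacts subscribed" for month, rate in ranked]


clean = preprocess(raw_records)
patterns = mine(clean)
report = postprocess(patterns)

if __name__ == "__main__":
    for line in report:
        print(line)
```

Keeping the steps as separate functions mirrors the slide: the mining method can be swapped without touching the cleaning or the interpretation code.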
Architecture of a Typical Data Mining System
(Figure: system components, listed roughly from the top layer to the bottom layer)
– Graphical user interface
– Pattern evaluation
– Data mining engine
– Knowledge base
– Database or data warehouse server
– Data cleaning, data integration, and filtering
– Databases and data warehouse
Another example: Directed marketing
(S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology.)

• Problem:
– A vast and growing number of marketing campaigns
– A globally competitive environment
– Mass campaigns are ineffective
• Solution:
– Directed campaigns with a strict and rigorous selection of contacts.
  • Focus on targets that presumably will be keener on that specific product/service
  • More efficient, with reduced costs and time
• The dataset:
– A Portuguese marketing campaign related to bank deposit subscription.
– The dataset covers 17 campaigns that occurred between May 2008 and November 2010, corresponding to a total of 79354 contacts.
– For each contact, the following were recorded:
  • a large number of attributes
  • the target variable (class attribute)
– There were 6499 successes (8% success rate).
Steps
1. Goal definition
– To predict whether a client will subscribe to the deposit
– Classification task
2. Simple data pre-processing (Data Preparation phase)
– Non-conclusive instances were discarded, leaving a total of 55817 contacts.
– Attribute reduction, leaving 29 attributes and 1 class attribute
– Instances containing missing values were discarded, leaving 45211 instances (5289 of which were successful, an 11.7% success rate).
3. Data mining step (Modeling phase), using naive Bayes (NB), decision trees (DT), and support vector machines (SVM)
– The dataset was randomly divided into training (2/3) and test (1/3) sets
4. Evaluation of the model
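
The arithmetic behind these steps can be checked with a short sketch. The counts are the ones quoted above; the shuffling seed, the rounding of the 2/3 cut point, and the function names are assumptions, since the slides do not say how the authors performed the random split.

```python
import random


def success_rate(successes, total):
    """Fraction of contacts that ended in a subscription."""
    return successes / total


def split_indices(n, train_fraction=2 / 3, seed=0):
    """Randomly split the indices 0..n-1 into training and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    cut = round(n * train_fraction)
    return indices[:cut], indices[cut:]


# Counts quoted in the slides.
rate_before = success_rate(6499, 79354)   # all recorded contacts
rate_after = success_rate(5289, 45211)    # after pre-processing

train_idx, test_idx = split_indices(45211)

if __name__ == "__main__":
    print(f"success rate before pre-processing: {rate_before:.1%}")
    print(f"success rate after pre-processing:  {rate_after:.1%}")
    print(f"training instances: {len(train_idx)}, test instances: {len(test_idx)}")
```

Note that the success rate rises after pre-processing only because discarded instances were disproportionately unsuccessful contacts; the class remains heavily imbalanced, which matters when the model is evaluated in step 4.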
Conclusion
• Call duration is the most relevant feature, meaning that longer calls tend to increase successes.
• In second place comes the month of contact.
• Success is most likely to occur in the last month of each quarter (March, June, September, and December).
• Such knowledge can be used to shift campaigns to occur in those months.
Data Mining: On What Kind of Data?

• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous and legacy databases
– WWW
Functionality
Knowledge produced by data mining
• In data mining terms, knowledge means a useful pattern
• The pattern should be
– Useful
– Valid
– Understandable
• Pattern types that data mining methods can produce:
– Frequent patterns, associations, correlations
– Data characterization and discrimination
– Classification and prediction
– Clusters
Frequent pattern, association, correlation

• Patterns that occur frequently in data
– Frequent itemsets
– Frequent subsequences
– Frequent substructures
• Leading to associations and correlations within data
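
To make "frequent itemset" concrete, here is a minimal sketch that counts every itemset occurring in a handful of transactions and keeps those whose support meets a threshold. The transactions, the 0.6 threshold, and the function name are hypothetical, and the brute-force enumeration is only practical for tiny examples.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)
        # Count every non-empty subset of this transaction once.
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


frequent = frequent_itemsets(transactions, min_support=0.6)

if __name__ == "__main__":
    for itemset, support in sorted(frequent.items()):
        print(itemset, support)
```

With this toy data every single item and every pair is frequent, while the triple appears in only two of five transactions and is dropped.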
Classification and prediction
Cluster analysis
Are All the “Discovered” Patterns Interesting?
• A data mining system may generate thousands of patterns; not all of them are interesting.
– Suggested approach: human-centered, query-based, focused mining
• Interestingness measures: A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
• Objective vs. subjective interestingness measures:
– Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
– Subjective: based on the user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.
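
Support and confidence, the two objective measures named above, can be computed directly from transaction counts. The sketch below uses the standard definitions for an association rule A → B: support is the fraction of transactions containing both A and B, and confidence is the fraction of transactions containing A that also contain B. The baskets and function names are hypothetical.

```python
# Hypothetical transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]


def count_containing(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def rule_support(antecedent, consequent, baskets):
    """Fraction of all baskets containing antecedent and consequent together."""
    return count_containing(antecedent | consequent, baskets) / len(baskets)


def rule_confidence(antecedent, consequent, baskets):
    """Among baskets containing the antecedent, fraction also containing the consequent."""
    both = count_containing(antecedent | consequent, baskets)
    return both / count_containing(antecedent, baskets)


sup = rule_support({"bread"}, {"milk"}, baskets)
conf = rule_confidence({"bread"}, {"milk"}, baskets)

if __name__ == "__main__":
    print(f"bread -> milk: support={sup:.2f}, confidence={conf:.2f}")
```

Here bread and milk occur together in 3 of 5 baskets and bread occurs in 4, so the rule has support 0.6 and confidence 0.75. Whether that makes the rule interesting is the subjective part.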


Can We Find All and Only Interesting Patterns?

• Search for only interesting patterns: Optimization
– Can a data mining system find only the interesting patterns?
– Approaches
  • First generate all the patterns and then filter out the uninteresting ones.
  • Generate only the interesting patterns: mining query optimization
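
The two approaches can be contrasted on frequent itemsets, taking "interesting" to mean "occurs in at least `MIN_COUNT` transactions". The slide does not name a technique for the second approach; the sketch below swaps in level-wise pruning (the Apriori idea that an itemset can only be frequent if all its subsets are) purely as an illustration. The data, threshold, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical transactions and threshold.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk", "eggs"},
]
MIN_COUNT = 3
ITEMS = sorted(set().union(*TRANSACTIONS))


def count(itemset):
    """Number of transactions containing the whole itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def generate_then_filter():
    """Approach 1: enumerate every itemset, then filter by count."""
    candidates = [
        frozenset(c)
        for size in range(1, len(ITEMS) + 1)
        for c in combinations(ITEMS, size)
    ]
    kept = {c for c in candidates if count(c) >= MIN_COUNT}
    return kept, len(candidates)


def generate_only_frequent():
    """Approach 2: grow itemsets level by level, pruning hopeless candidates."""
    level = {frozenset([item]) for item in ITEMS}
    frequent, examined = set(), 0
    while level:
        examined += len(level)
        kept = {c for c in level if count(c) >= MIN_COUNT}
        frequent |= kept
        # Join frequent itemsets into candidates that are one item larger.
        joined = {a | b for a in kept for b in kept if len(a | b) == len(a) + 1}
        # Prune candidates that have an infrequent subset.
        level = {
            c for c in joined
            if all(frozenset(s) in kept for s in combinations(c, len(c) - 1))
        }
    return frequent, examined


all_first, examined_all = generate_then_filter()
pruned, examined_pruned = generate_only_frequent()

if __name__ == "__main__":
    print(f"same result: {all_first == pruned}")
    print(f"candidates examined: {examined_all} vs {examined_pruned}")
```

Both functions return the same itemsets, but the second examines fewer candidates because nothing containing the rare item is ever generated beyond the first level.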


Machine learning and statistics
• Both lie on a continuum of data analysis techniques
– Some derive from the skills taught in standard statistics courses,
– others are more closely associated with algorithms that have arisen out of computer science.
Generalization as search
• One way of visualizing the problem of learning—
and one that distinguishes it from statistical
approaches—is to imagine a search through a
space of possible concept descriptions for one
that fits the data.
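
A minimal sketch of this idea: enumerate a small space of concept descriptions and keep those that fit the training examples. Here a concept is a conjunction in which each attribute is either fixed to one value or left as a wildcard `?`. The attributes, values, examples, and function names are hypothetical, and exhaustive enumeration is only feasible because the space has nine members.

```python
from itertools import product

ANY = "?"  # wildcard: the concept does not constrain this attribute

# Hypothetical attributes and labelled examples (True = positive example).
ATTRIBUTES = {"outlook": ["sunny", "rainy"], "windy": ["yes", "no"]}
EXAMPLES = [
    ({"outlook": "sunny", "windy": "no"}, True),
    ({"outlook": "sunny", "windy": "yes"}, True),
    ({"outlook": "rainy", "windy": "yes"}, False),
]


def concept_space():
    """Yield every concept: each attribute takes a specific value or ANY."""
    names = list(ATTRIBUTES)
    choices = [ATTRIBUTES[name] + [ANY] for name in names]
    for combo in product(*choices):
        yield dict(zip(names, combo))


def covers(concept, instance):
    """True if the instance satisfies every constraint in the concept."""
    return all(v == ANY or instance[a] == v for a, v in concept.items())


def fits(concept, examples):
    """A concept fits if it covers all positives and no negatives."""
    return all(covers(concept, x) == label for x, label in examples)


space = list(concept_space())
consistent = [c for c in space if fits(c, EXAMPLES)]

if __name__ == "__main__":
    print(f"{len(space)} concepts searched, {len(consistent)} fit the data:")
    for concept in consistent:
        print(concept)
```

With these three examples only one description survives the search: outlook is sunny and windy is unconstrained. Real learners cannot enumerate the space like this, so they search it heuristically instead.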
