Chapter 1

Chapter One of 'Data Mining Techniques and Applications' introduces the concepts of data, information, and knowledge, highlighting the significance of data mining in extracting useful insights from large datasets. It discusses the objectives of data mining, its relationship with other disciplines, current applications, and challenges faced in the field. Additionally, it provides a brief overview of the Weka tool, which aids in data mining processes.


Chapter One

Introduction

Data Mining Techniques and Applications, 1st edition
Hongbo Du
ISBN 978-1-84480-891-5 © 2010 Cengage Learning
Chapter Overview

• Roles of data, information and knowledge
• Background of data mining
• What is data mining?
• Main data mining objectives
• Data mining and other related disciplines
• Current state of data mining
• Promises and challenges
• A brief preview of the data mining tool Weka
Data, Information and Knowledge
• Data (D)
– Isolated factual recordings of separate objects and events
– Enables the recording of seen events
• Information (I)
– Facts in a meaningful context, represented by
relationships between isolated data items
– Enables responding to seen events
• Knowledge (K)
– Verified information that is accommodated
into the business process
– Enables the anticipation of unseen events

Data Mining: The Background
• Computerisation of operations in commercial,
governmental and scientific organisations has resulted
in large volumes of operational data, e.g.
– Itemised telephone bills
– Bank statements
– Supermarket transactions
– Share prices
– Scientific experimental data sets
– Published web pages
– CCTV video footage
– ……

Data Mining: The Background
• Facts:
– Storing the data is an operational necessity
– Storing the data has become easy and affordable
– Data acquisition is fully or partially automatic and fast
• Consequences:
– The speed of data comprehension does not match the
speed of data acquisition
– Many commercial database management systems
(DBMSs) are not equipped with data comprehension and
analysis tools.
– We may be data rich, but information poor.
Data Mining: The Background
• An intriguing quotable quote:

“I know half the money I spend on advertising is wasted, but I can
never find out which half!”

Attributed to Lord Leverhulme, founder of Lever Brothers (later part of Unilever)

Data Mining: What it is
Knowledge discovery in databases (KDD) refers to the efficient process
of searching through large volumes of raw data in databases to find
potentially useful information that is implicitly embedded in the data. Data
Mining is an integral step of KDD that discovers hidden patterns from an
input data set.

• Useful information; leading to a course of action or an
understanding of data
• Non-trivial implicit information; not the raw data, nor
the result of a simple data summary
• Real life databases; not laboratory generated data sets
• Efficient novel discovery methods; expected to be
scaled up and applied to large databases
Data Mining: Useful Information
Example 1 (A well-known example, not a joke):
Customers who purchase beer are also likely (say 90%) to
purchase nappies.

Example 2 (May already be in practical use in credit card
applications):
If 20,000 ≤ Customer’s Salary ≤ 40,000 pounds and
Customer has a house, then Customer is a safe
customer.
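
Both examples can be checked mechanically. The Python sketch below counts how often the beer rule holds on a handful of invented baskets and writes Example 2 as a predicate. The basket contents, the function names and the inclusive salary bounds are assumptions made for illustration; none of them comes from the book.

```python
# Hypothetical toy baskets, invented purely for illustration.
transactions = [
    {"beer", "nappies", "crisps"},
    {"beer", "nappies"},
    {"beer", "nappies", "milk"},
    {"beer", "bread"},
    {"milk", "bread"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    with_antecedent = count(antecedent, baskets)
    if with_antecedent == 0:
        return 0.0  # the rule never fires, so there is nothing to measure
    return count(antecedent | consequent, baskets) / with_antecedent


def is_safe_customer(salary, owns_house):
    """Example 2 as a predicate; inclusive bounds are an assumption."""
    return 20_000 <= salary <= 40_000 and owns_house


print("support(beer)            =", support({"beer"}, transactions))
print("confidence(beer->nappies) =", confidence({"beer"}, {"nappies"}, transactions))
print("safe customer?            =", is_safe_customer(30_000, True))
```

Support says how common an itemset is; confidence is the "say 90%" figure quoted in Example 1, measured here on the toy data rather than on real sales.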

Data Mining: Non-trivial Information
• Putting the “search for information” into a spectrum, from the low end of sophistication to the high end:
• Data retrieval (low end of sophistication)
– Retrieval of stored data
– Trivial data aggregation
– Written in standard SQL
• Online analytic processing (OLAP)
– Interactive reporting on stored data
– Summarisation and drilling along different attributes
– Written in extended SQL
• Data mining (high end of sophistication)
– Discovery of hidden and embedded patterns
– Discovery algorithms
– Written in programming language probably with the assistance of SQL
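
The two ends of the spectrum can be put side by side in a few lines. The sketch below uses Python's standard `sqlite3` module with an invented `sales` table: the first two queries are plain retrieval and trivial aggregation in standard SQL, while the last step fetches rows with SQL and leaves the pattern search to program code. The table, its rows and the pair-counting step are all assumptions for illustration; the OLAP middle ground is left out because it usually relies on SQL extensions that this sketch does not assume are available.

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical sales table held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (basket_id INTEGER, item TEXT, price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "beer", 4.0), (1, "nappies", 6.0),
        (2, "beer", 4.0), (2, "nappies", 6.0), (2, "milk", 1.0),
        (3, "milk", 1.0), (3, "bread", 2.0),
    ],
)

# Low end: retrieval of stored data in standard SQL.
basket_1 = conn.execute(
    "SELECT item FROM sales WHERE basket_id = 1 ORDER BY item"
).fetchall()

# Still the low end: trivial aggregation in standard SQL.
totals = dict(conn.execute("SELECT item, SUM(price) FROM sales GROUP BY item"))

# High end: SQL only fetches the rows; the program looks for the pattern,
# here the pair of items that most often share a basket.
baskets = {}
for basket_id, item in conn.execute("SELECT basket_id, item FROM sales"):
    baskets.setdefault(basket_id, set()).add(item)

pair_counts = Counter()
for items in baskets.values():
    pair_counts.update(combinations(sorted(items), 2))

most_common_pair, frequency = pair_counts.most_common(1)[0]
print(basket_1, totals, most_common_pair, frequency)
```

The first two results were already stored, directly or as a sum; the co-occurring pair is not stored anywhere and has to be searched for.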

Data Mining: Real-life Databases
• Characteristics of a real-life database
– The size may be extremely large
– The dimensionality can be very high
– Attributes can be of different data types
– Data quality can be very poor
– Data may exist in pieces, isolated in different systems
– Value distribution can be extremely skewed
– Database content can be dynamic and evolving
– Data may lack traditional record-based structure
– Data are available on secondary storage media

Data Mining: Efficient Algorithms
• Discovering interesting patterns supported by given facts
can be computationally hard because many discoveries
are combinatorial problems. Trivial algorithms may take
too long.
• A discovery algorithm is considered efficient if its
execution time and memory requirement are comparable
to those of sorting algorithms; otherwise, it is unlikely to
scale up well enough to cope with data sets of large
sizes.
• Efficient discovery algorithms may be hard to find. Using
advanced hardware, optimising the implementation of the
algorithms and developing approximate solutions can be
viable alternative options.
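
To make the gap concrete, the sketch below compares two growth rates in Python: the number of candidate itemsets an exhaustive search over n distinct items would have to consider, and an n·log2(n) cost of the kind sorting algorithms work at. The function names are hypothetical and the figures are only orders of magnitude, not measurements of any particular algorithm.

```python
import math


def candidate_itemsets(n_items):
    """Non-empty subsets of n_items distinct items: 2**n - 1."""
    return 2 ** n_items - 1


def sort_scale_cost(n_records):
    """Rough n * log2(n) comparison count, the scale sorting works at."""
    return n_records * math.log2(n_records)


for n in (10, 20, 40):
    print(
        f"n={n:>2}  exhaustive candidates={candidate_itemsets(n):>15,}  "
        f"sort-scale cost={sort_scale_cost(n):>8.0f}"
    )
```

Doubling n roughly doubles the sort-scale cost but squares the number of candidates, which is why a trivial enumeration stops being usable long before the data set is large.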
Data Mining Objectives
• Classification
– Using existing data to form a classification model and then
using the model to assign an appropriate class label for a
data record (e.g. safe vs. risky customers)
• Estimation
– Similar to classification, but assigns a numeric value to an output
variable of a data record (e.g. estimated house value)
• Prediction
– Similar to classification and estimation, but more concerned
with the future outcome of the output (e.g. tomorrow’s weather)
• Description
– General description of data characteristics (e.g. customer
profile)
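
The difference between the first two objectives is easiest to see when the same data feeds both. The Python sketch below uses a nearest-neighbour approach, chosen only because it is short (the slide does not prescribe any technique): classification returns a class label, estimation returns a number. The training records, the feature choice and the unscaled distance are all assumptions for illustration.

```python
from collections import Counter

# Hypothetical records:
# ((salary in thousands, owns_house as 0/1), class label, house value in thousands)
train = [
    ((30.0, 1.0), "safe", 200.0),
    ((32.0, 1.0), "safe", 220.0),
    ((35.0, 1.0), "safe", 240.0),
    ((18.0, 0.0), "risky", 100.0),
    ((20.0, 0.0), "risky", 120.0),
]


def squared_distance(a, b):
    """Squared Euclidean distance; features are left unscaled for brevity."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(query, k):
    """The k training records closest to query."""
    return sorted(train, key=lambda rec: squared_distance(rec[0], query))[:k]


def classify(query, k=3):
    """Classification: majority class label among the k nearest records."""
    labels = Counter(label for _, label, _ in k_nearest(query, k))
    return labels.most_common(1)[0][0]


def estimate(query, k=3):
    """Estimation: mean output value among the k nearest records."""
    values = [value for _, _, value in k_nearest(query, k)]
    return sum(values) / len(values)


new_customer = (31.0, 1.0)
print("class label:", classify(new_customer))
print("estimated house value:", estimate(new_customer))
```

Prediction would use the same machinery with an outcome that lies in the future, and description would summarise the records rather than score a new one; neither is shown here.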
Data Mining & Other Disciplines

[Figure: three disciplines contributing to DATA MINING]
• Machine Learning (Artificial Intelligence): inductive & deductive learning methods
• Statistics: data analysis theories, methods and measures
• Database Management: fast storage structures & retrieval operations

Data Mining: Current State
• Many data mining algorithms have been developed or
adapted
• Many data mining software tools have been built and
are in use
• A cross-industry methodology has been formed
• Besides general solutions, more application-oriented
data mining solutions are being developed
• More and more organisations are either doing their
own data mining or hiring consultants to do the job
• Data mining has been extended to web mining and
text mining
Data Mining: Current State
• Some nuisances
– Mining cookies
– Spyware and miningware
– Invasion of privacy
• Some serious problems
– “Big Brother is watching”
– Unfair advantages in trading practice, e.g. high-frequency
trading (HFT)
– Abuse of personal data
– Ethical concerns

Data Mining: Promises
• Areas of data mining application:
– Finance and insurance
– Marketing and sales
– Medicine
– Agriculture
– Society, politics and economics
– Science
– Engineering
– Law enforcement
– Military and intelligence (classified)
Data Mining: Challenges Faced
• Some difficult problems to solve
– Extremely large data sets
– Extremely high dimensionalities (curse of dimensionality)
– Combinatorial problems and fast algorithms
– Meaningful evaluation of the patterns
– Discovery of changing and evolving patterns
– Integration of data mining techniques
– Comprehensibility of patterns
– Data pre-processing
– Mining non-standard complex data such as multimedia
materials
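
One way to see why high dimensionality is a difficulty is to ask how large a neighbourhood must be before it contains a fixed share of the data. The Python sketch below assumes points spread uniformly over a unit hypercube, which is an idealisation and not a claim about any real data set, and computes the edge length of a sub-cube that holds a given fraction of them. The function name is hypothetical.

```python
def edge_for_fraction(fraction, dims):
    """Edge length of a sub-cube holding `fraction` of uniformly spread
    points in a unit hypercube with `dims` dimensions."""
    return fraction ** (1.0 / dims)


for dims in (1, 2, 10, 100):
    edge = edge_for_fraction(0.1, dims)
    print(f"{dims:>3} dimensions: edge {edge:.3f} of each axis holds 10% of the data")
```

In one dimension a tenth of the axis is enough; with many dimensions the "local" neighbourhood has to span most of every axis, so notions such as nearness that discovery methods rely on lose much of their meaning.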

Weka: A Brief Introduction
• Overview
– Java tool set developed at the University of Waikato (NZ)
– Free to download and widely used
– A wide range of learning and data pre-processing
methods and algorithms, with Java API
– Offering a GUI (Explorer) and a command-line (Simple
CLI) interface to the tools
– Experimenter module to assist the evaluation of
classification techniques
– KnowledgeFlow module to enable batch-processing
style discovery and incremental mining
– Some visualisation facilities
Weka: A Brief Introduction
• Weka Explorer
– For investigative, interactive data mining with small data sets
– Preprocess, Classify, Cluster, Associate, Select Attributes
and Visualise pages

Weka: A Brief Introduction
• Weka Simple CLI
– Weka facilities as Java classes
– Calling the Java functions as commands

Weka: A Brief Introduction
• Weka Experimenter
– Comparing performances of different classification solutions
on a collection of data sets

Weka: A Brief Introduction
• Weka KnowledgeFlow
– Setting up a flow of knowledge discovery in a diagram
– Overview of the entire discovery project

Chapter Summary
• Importance of data in operation and importance of
information and knowledge in decision-making
• Data rich does not mean information rich
• Data mining: automatic or semi-automatic data
understanding and decision support
• To classify, to estimate, to predict and to describe
• Data mining closely relates to database, statistics and
machine learning
• Data mining: from technology towards application
• A lot of potential uses and a lot of challenges to face
• Weka: excellent tool to support teaching data mining
References
Read Chapter 1 of Data Mining Techniques and Applications

Useful further references
• Han & Kamber, Chapter 1
• Berry & Linoff, Chapter 1 (business-like)
• KDnuggets: http://www.kdnuggets.com/
