
Ch1 Overview KDD - ML

Uploaded by Hunzila Nisar
Copyright © All Rights Reserved

ARIN-2137
KNOWLEDGE DISCOVERY & DATA MINING

TOPIC 1

Knowledge Discovery in Databases


KDD is the automatic extraction of non-obvious,
hidden knowledge from large volumes of data.

10^6 to 10^12 bytes: what is the knowledge?

- We never see the whole data set, so we put it in the memory of computers.
- Then we run data mining algorithms on it.
- How do we represent the knowledge and use it?

Data, Information, Knowledge


We often see data as a string of bits, or numbers
and symbols, or “objects” which we collect daily.

Information is data stripped of redundancy and reduced to the minimum necessary to characterize the data.

Knowledge is integrated information, including facts and their relations, which have been perceived, discovered, or learned as our “mental pictures”. Knowledge can be considered data at a high level of abstraction and generalization.

Example:
- Data: 10
- Information: 10°C temperature
- Knowledge: It's cold

Exercise: give your own example of data, information, and knowledge.

Data Rich Knowledge Poor

- People have gathered and stored so much data because they think some valuable assets are implicitly coded within it.
- Raw data is rarely of direct benefit. Its true value depends on the ability to extract information useful for decision support.
- Manual data analysis is impractical.

How to acquire knowledge for knowledge-based systems (knowledge base + inference engine) remains the main difficult and crucial problem:
- Tradition: via knowledge engineers
- New trend: via automatic programs

Benefits of Knowledge Discovery

[Figure: layers DSS, MIS, EDP (top to bottom) against the axes Value and Volume; annotations: Generate, Disseminate, Rapid Response]
EDP: Electronic Data Processing
MIS: Management Information Systems
DSS: Decision Support Systems

The KDD process

“The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad, Piatetsky-Shapiro, Smyth, 1996)

- Non-trivial process: a multi-step process
- Valid: justified patterns/models
- Novel: previously unknown
- Useful: can be used
- Understandable: by human and machine

The Knowledge Discovery Process

1. Understand the domain and define problems
2. Collect and preprocess data
3. Data mining: extract patterns/models
4. Interpret and evaluate discovered knowledge
5. Put the results into practical use

Data mining is a step in the KDD process consisting of methods that produce useful patterns or models from the data, under some acceptable computational efficiency limitations.

KDD is inherently interactive and iterative.
The KDD Process (detailed view)

[Figure: detailed KDD process flow, stages numbered 1 to 5]
1. Data organized by function; data warehousing; create/select target database; select sampling technique and sample data
2. Supply missing values; eliminate noisy data
3. Normalize values; transform values; create derived attributes; find important attributes & value ranges
4. Select DM task(s); select DM method(s); extract knowledge; test knowledge; refine knowledge
5. Query & report generation; aggregation & sequences; advanced methods; transform to different representation
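The cleaning and transformation steps listed above (supplying missing values, eliminating noisy data, normalizing values) can be sketched in plain Python. This is a minimal illustration only; the method choices are assumptions, not something the slides prescribe: median imputation for missing values, a z-score cut-off (the hypothetical parameter `max_z=2.0`) as the definition of "noisy", and min-max scaling for normalization.

```python
from statistics import mean, median, stdev

def supply_missing(values):
    """Fill missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def eliminate_noise(values, max_z=2.0):
    """Drop values lying more than max_z sample standard deviations from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= max_z * s]

def normalize(values):
    """Min-max scale values into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw attribute: two missing readings and one suspicious spike (95.0).
raw = [12.0, None, 15.0, 14.0, 13.0, 95.0, None, 16.0]
clean = normalize(eliminate_noise(supply_missing(raw)))
print(clean)  # [0.0, 0.625, 0.75, 0.5, 0.25, 0.625, 1.0]
```

The median is used for imputation because, unlike the mean, it is not dragged upward by the 95.0 spike before that spike is removed.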

KDD: Opportunity and Challenges

Factors converging on KDD:
- Competitive pressure
- Data rich, knowledge poor (the resource)
- Mature data mining technology
- Enabling technology (interactive MIS, OLAP, parallel computing, Web, etc.)

Challenges of KDD

- Scalability
- Dimensionality
- Complex and heterogeneous data
- Data ownership
- Data quality
- Privacy preservation

Data Mining

Data mining is a step in the KDD process of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns (or models) on the data.

“Solving problems by analyzing data that already exists in databases” (Witten and Frank, 2005)

Data Mining Tasks

• Descriptive: find human-interpretable rules, relationships, and patterns in data. E.g.: clustering, summarization
• Predictive: infer from current data to make predictions. E.g.: regression, classification
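To make the descriptive/predictive distinction concrete, here is a small sketch with one task of each kind: a summary of the data (descriptive) and an ordinary least-squares line used to predict an unseen value (predictive). The choice of least squares and the toy data are assumptions made for illustration.

```python
from statistics import mean

def summarize(values):
    """Descriptive task: a compact, human-readable description of the data."""
    return {"count": len(values), "min": min(values), "max": max(values), "mean": mean(values)}

def fit_line(xs, ys):
    """Predictive task: ordinary least-squares fit of y = a + b * x."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical toy data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

print(summarize(ys))   # describes the data we already have
a, b = fit_line(xs, ys)
print(a + b * 5.0)     # predicts y for an unseen x = 5 -> 11.0
```

The summary says nothing about new cases, whereas the fitted line generalizes beyond the observed points.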

Primary Tasks of Data Mining

- Classification: finding the description of several predefined classes and classifying a data item into one of them.
- Clustering: identifying a finite set of categories or clusters to describe the data.
- Regression: mapping a data item to a real-valued prediction variable.
- Dependency modeling: finding a model which describes significant dependencies between variables.
- Deviation and change detection: discovering the most significant changes in the data.
- Summarization: finding a compact description for a subset of data.
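Clustering, as described above, can be illustrated with k-means (Lloyd's algorithm), one common clustering method; the slides do not name a specific algorithm, so this choice is an assumption. The sketch alternates an assignment step and an update step until the centroids stop moving. The points, `k=2`, and the fixed random seed are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: partition 2-D points into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

Note that k must be chosen in advance; finding a good k is itself part of the iterative KDD loop.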

Data Mining vs Machine Learning

Both data mining and machine learning are rooted in data science and generally fall under that umbrella. They often intersect or are confused with each other.

- Data mining discovers hidden value, i.e., it reveals valuable knowledge hidden in raw data in a data warehouse.
- Machine learning is a branch of artificial intelligence (AI): the science of getting computers to act without being explicitly programmed.

TOP MACHINE LEARNING ALGORITHMS

- Linear regression
- Logistic regression
- Linear discriminant analysis
- Classification trees
- Naïve Bayes
- K-nearest neighbours
- Learning vector quantization
- Support vector machine
- Bagging and random forest
- Boosting and AdaBoost
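As an example of one algorithm from this list, K-nearest neighbours can be written in a few lines: a query point gets the majority label among its k closest training points. Euclidean distance, `k=3`, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort training examples by Euclidean distance to the query point.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (height_cm, weight_kg) -> label
train = [
    ((150.0, 50.0), "small"),
    ((155.0, 55.0), "small"),
    ((160.0, 58.0), "small"),
    ((180.0, 85.0), "large"),
    ((185.0, 90.0), "large"),
    ((190.0, 95.0), "large"),
]
print(knn_predict(train, (158.0, 56.0)))  # small
print(knn_predict(train, (183.0, 88.0)))  # large
```

KNN has no training phase; all the work happens at prediction time, which is why it scales poorly to very large data sets.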

Potential Applications

Business information
- Marketing and sales data analysis
- Investment analysis
- Loan approval
- Fraud detection
- etc.

Manufacturing information
- Controlling and scheduling
- Network management
- Experiment result analysis
- etc.

Scientific information
- Sky survey cataloging
- Biosequence databases
- Geosciences: Quakefinder
- etc.

Personal information
Example: Recommender Systems
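A recommender system can be sketched with user-based collaborative filtering, one common approach; the slides do not specify a method, so this is an assumption. Users similar to the target user (cosine similarity over co-rated items) contribute their ratings, weighted by similarity, to items the target user has not rated yet. The users, items, and ratings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two users' ratings, computed over co-rated items only."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend(ratings, user):
    """Rank items `user` has not rated by a similarity-weighted sum of others' ratings."""
    mine = ratings[user]
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"book_a": 5, "book_b": 1},
    "bob":   {"book_a": 5, "book_b": 1, "book_c": 4},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 5},
}
print(recommend(ratings, "alice"))  # ['book_c', 'book_d']
```

Bob rates the shared books exactly as Alice does, so his favourite (`book_c`) outranks Carol's, even though Carol gave `book_d` a higher rating.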
