LectureSlide 1

1) The document provides an outline for a course on data mining, covering an intuitive introduction, textbook chapters, and student presentations. 2) Key concepts around data, information, and knowledge are defined - data is unprocessed facts, information is interpreted data, and knowledge combines information with experience and insight. 3) Data mining aims to discover useful patterns and knowledge automatically from large amounts of data through techniques like classification, clustering, and prediction. It helps address the "data explosion problem" of having more data than the ability to analyze it.

7/22/2010

Data Mining
Dr. Muhammad Abulaish
Reader, Dept. of Computer Science
Jamia Millia Islamia, New Delhi - 25
Email: [email protected]

Data Mining
Part One: Intuitive Introduction and DM Overview
Part Two: Textbook chapters
Part Three: Students Presentations
Course Textbook: J. Han, M. Kamber, DATA MINING: Concepts and Techniques, Morgan Kaufmann, 2003/2006

Course Outline
Click here to see the course outline

Data
Data is the Latin plural of datum.
Used to represent unprocessed facts and figures without any added interpretation or analysis.
Generally associated with some entity and often viewed as the lowest level of abstraction from which information and knowledge are derived.
Data may be unstructured, semi-structured, or structured.
Example: The price of petrol is Rs. 48 per liter.

Information
Information is interpreted (processed) data, so that it has meaning for the user.
"The price of petrol has risen from Rs. 43 to Rs. 48 per liter" is information for a person who tracks petrol prices.
Data becomes information when it is processed for some purpose and adds value for the recipient.
A set of raw sales figures: data
A sales report (chart plotting, trend analysis): information

Knowledge
Knowledge is a fluid mix of information, experience and insight that may benefit the individual or the organization.
"When petrol prices go up by Rs. 5 per liter, it is likely that bus fares will rise by 10%" is knowledge.
The boundaries between data, information, and knowledge are fuzzy.
What is data to one person is information to someone else.
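The petrol example can be made concrete with a small sketch: a list of raw (date, price) readings is data, and the sentence produced by processing them is information for someone who tracks prices. The dates, the intermediate reading, and the function name `price_change_report` are hypothetical, chosen only to reproduce the slide's Rs. 43 to Rs. 48 example.

```python
# Raw data: unprocessed (date, price in Rs. per liter) readings.
# These readings are hypothetical, chosen to match the slide's example.
readings = [
    ("2010-07-01", 48.0),
    ("2010-01-01", 43.0),
    ("2010-04-01", 45.5),
]


def price_change_report(observations):
    """Process raw readings into information: the overall change in price."""
    ordered = sorted(observations)  # ISO dates sort correctly as strings
    start, end = ordered[0][1], ordered[-1][1]
    if end == start:
        return f"The price of petrol is unchanged at Rs. {end:g} per liter"
    verb = "risen" if end > start else "fallen"
    return f"The price of petrol has {verb} from Rs. {start:g} to Rs. {end:g} per liter"


print(price_change_report(readings))
```

The same raw list could yield different information for a different recipient (for example, an average price), which is the point the slide makes about fuzzy boundaries.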


Summarized View
Data: as in databases
Information: processed data
Knowledge: meta-information about the patterns hidden in the data
The patterns must be discovered automatically

Data Mining, Text Mining and Web Mining
Data are stored in documents (a file):
Unstructured: a file stored on your PC (Text Mining)
Semi-structured: a web page stored on the WWW (Web Mining)
Structured: a database (Data Mining)

Data Mining
Main Objectives
Identification of data as a source of useful information
Use of discovered information for competitive advantage when working in a business environment

Why Data Mining?
Data explosion problem
The explosive growth of data: from terabytes to petabytes
Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories

Why Data Mining? (c.d.)
Data explosion problem (c.d.)
Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, …
Science: Remote sensing, bioinformatics, scientific simulation
Society and everyone: news, digital cameras, …

Why DM? (c.d.)
Data explosion problem (c.d.)
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and Data Mining
Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases


The Huber Taxonomy of Data Set Sizes
Descriptor | Data Set Size in Bytes | Storage Mode
Tiny | 10^2 | Piece of Paper
Small | 10^4 | A Few Pieces of Paper
Medium | 10^6 | A Floppy Disk
Large | 10^8 | Hard Disk
Huge | 10^10 | Multiple Hard Disks, e.g. RAID Storage
Massive | 10^12 | Robotic Magnetic Tape, Storage Silos

Algorithmic Complexity
Algorithm | Complexity
Plot a scatterplot | O(n^(1/2))
Calculate means, variances, kernel density estimates | O(n)
Calculate fast Fourier transforms | O(n log(n))
Calculate singular value decomposition of an r x c matrix; solve a multiple linear regression | O(nc)
Run most clustering algorithms | O(n^2)

No. of Operations for Algorithms of Various Computational Complexities and Various Data Set Sizes
Data set size | n^(1/2) | n | n log(n) | n^(3/2) | n^2
tiny | 10 | 10^2 | 2x10^2 | 10^3 | 10^4
small | 10^2 | 10^4 | 4x10^4 | 10^6 | 10^8
medium | 10^3 | 10^6 | 6x10^6 | 10^9 | 10^12
large | 10^4 | 10^8 | 8x10^8 | 10^12 | 10^16
huge | 10^5 | 10^10 | 10^11 | 10^15 | 10^20

Computational Feasibility on a Pentium PC (10 MegaFLOPs Performance Assumed)
Data set size | n^(1/2) | n | n log(n) | n^(3/2) | n^2
tiny | 10^-6 seconds | 10^-5 seconds | 2x10^-5 seconds | .0001 seconds | .001 seconds
small | 10^-5 seconds | .001 seconds | .004 seconds | .1 seconds | 10 seconds
medium | .0001 seconds | .1 seconds | .6 seconds | 1.67 minutes | 1.16 days
large | .001 seconds | 10 seconds | 1.3 minutes | 1.16 days | 31.7 years
huge | .01 seconds | 16.7 minutes | 2.78 hours | 3.17 years | 317,000 years
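Each feasibility entry is simply the number of operations divided by the assumed machine speed. A minimal sketch that regenerates the Pentium PC table, assuming (as the operation-count table implies) a base-10 logarithm for n log(n) and a 365.25-day year; the names `seconds_needed` and `human_readable` are illustrative, not from the slides.

```python
import math

# Operation counts as in the table above; n log(n) uses a base-10 logarithm,
# which is why a tiny data set (n = 10^2) needs 2x10^2 operations.
COMPLEXITIES = {
    "n^(1/2)": lambda n: n ** 0.5,
    "n": lambda n: n,
    "n log(n)": lambda n: n * math.log10(n),
    "n^(3/2)": lambda n: n ** 1.5,
    "n^2": lambda n: n ** 2,
}

# Data set sizes from the Huber taxonomy (tiny = 10^2 ... huge = 10^10 bytes).
SIZES = {"tiny": 1e2, "small": 1e4, "medium": 1e6, "large": 1e8, "huge": 1e10}


def seconds_needed(n, complexity, flops):
    """Estimated run time: number of operations divided by machine speed."""
    return COMPLEXITIES[complexity](n) / flops


def human_readable(seconds):
    """Express a duration in the largest convenient unit."""
    for unit, size in (("years", 365.25 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.3g} {unit}"
    return f"{seconds:.3g} seconds"


PENTIUM_FLOPS = 10e6  # 10 MegaFLOPs, as assumed on the slide
for name, n in SIZES.items():
    row = [human_readable(seconds_needed(n, c, PENTIUM_FLOPS)) for c in COMPLEXITIES]
    print(name, row)
```

Changing `PENTIUM_FLOPS` to 300e6, 4.2e9 or 1e12 gives the rows for the other machines below; small differences from the printed tables come from rounding.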

Computational Feasibility on a Silicon Graphics Onyx Workstation (300 MegaFLOPs Performance Assumed)
Data set size | n^(1/2) | n | n log(n) | n^(3/2) | n^2
tiny | 3.3x10^-8 seconds | 3.3x10^-7 seconds | 6.7x10^-7 seconds | 3.3x10^-6 seconds | 3.3x10^-5 seconds
small | 3.3x10^-7 seconds | 3.3x10^-5 seconds | 1.3x10^-4 seconds | 3.3x10^-3 seconds | .33 seconds
medium | 3.3x10^-6 seconds | 3.3x10^-3 seconds | .02 seconds | 3.3 seconds | 55 minutes
large | 3.3x10^-5 seconds | .33 seconds | 2.7 seconds | 55 minutes | 1.04 years
huge | 3.3x10^-4 seconds | 33 seconds | 5.5 minutes | 38.2 days | 10,464 years

Computational Feasibility on an Intel Paragon XP/S A4 (4.2 GigaFLOPs Performance Assumed)
Data set size | n^(1/2) | n | n log(n) | n^(3/2) | n^2
tiny | 2.4x10^-9 seconds | 2.4x10^-8 seconds | 4.8x10^-8 seconds | 2.4x10^-7 seconds | 2.4x10^-6 seconds
small | 2.4x10^-8 seconds | 2.4x10^-6 seconds | 9.5x10^-6 seconds | 2.4x10^-4 seconds | .024 seconds
medium | 2.4x10^-7 seconds | 2.4x10^-4 seconds | .0014 seconds | .24 seconds | 4.0 minutes
large | 2.4x10^-6 seconds | .024 seconds | .19 seconds | 4.0 minutes | 27.8 days
huge | 2.4x10^-5 seconds | 2.4 seconds | 24 seconds | 66.7 hours | 761 years


Computational Feasibility on a TeraFLOP Grand Challenge Computer (1000 GigaFLOPs Performance Assumed)
Data set size | n^(1/2) | n | n log(n) | n^(3/2) | n^2
tiny | 10^-11 seconds | 10^-10 seconds | 2x10^-10 seconds | 10^-9 seconds | 10^-8 seconds
small | 10^-10 seconds | 10^-8 seconds | 4x10^-8 seconds | 10^-6 seconds | 10^-4 seconds
medium | 10^-9 seconds | 10^-6 seconds | 6x10^-6 seconds | .001 seconds | 1 second
large | 10^-8 seconds | 10^-4 seconds | 8x10^-4 seconds | 1 second | 2.8 hours
huge | 10^-7 seconds | .01 seconds | .1 seconds | 16.7 minutes | 3.2 years

Types of Computers for Interactive Feasibility (Response Time < 1 Second)
Data set size | n^(1/2) | n | n log(n) | n^(3/2) | n^2
tiny | Personal Computer | Personal Computer | Personal Computer | Personal Computer | Personal Computer
small | Personal Computer | Personal Computer | Personal Computer | Personal Computer | Super Computer
medium | Personal Computer | Personal Computer | Personal Computer | Super Computer | Teraflop Computer
large | Personal Computer | Workstation | Super Computer | Teraflop Computer | ---
huge | Personal Computer | Super Computer | Teraflop Computer | --- | ---

Types of Computers for Feasibility (Response Time < 1 Week)
Data set size | n^(1/2) | n | n log(n) | n^(3/2) | n^2
tiny | Personal Computer | Personal Computer | Personal Computer | Personal Computer | Personal Computer
small | Personal Computer | Personal Computer | Personal Computer | Personal Computer | Personal Computer
medium | Personal Computer | Personal Computer | Personal Computer | Personal Computer | Personal Computer
large | Personal Computer | Personal Computer | Personal Computer | Personal Computer | Teraflop Computer
huge | Personal Computer | Personal Computer | Personal Computer | Super Computer | ---

Massive Data Sets: Commonly Used Language
Data Mining = DM
Knowledge Discovery in Databases = KDD
Massive Data Sets = MD
Data Analysis = DA

What is Data Mining?
There are many activities with the same name: CONFUSION
DM: Huge volumes of data
DM: Potential hidden knowledge
DM: Process of discovery of hidden patterns in data

DM: Intuitive Definition
Process to extract previously unknown knowledge from large volumes of data
Requires both new technologies and methods


Data Mining
DM creates models (algorithms):
Classification
Clustering
Association
Prediction
DM often presents the knowledge as a set of rules of the form: IF.... THEN...
Finds other relationships in data
Detects deviations

DM: Some Applications
Target marketing, customer relationship management, market basket analysis, cross selling, market segmentation
Forecasting, customer retention, quality control, competitive analysis
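To show what knowledge in IF.... THEN... form looks like once it has been found, here is a minimal sketch that stores rules as (conditions, conclusion) pairs and applies them to a new record. The attributes (`income`, `age`, `buys_computer`) and the rules themselves are hypothetical; discovering such rules is the mining step, which this sketch does not perform.

```python
# Hypothetical rules: each is (IF-part as attribute/value pairs, THEN-part).
RULES = [
    ({"income": "high", "age": "young"}, ("buys_computer", "yes")),
    ({"income": "low"}, ("buys_computer", "no")),
]


def fire_rules(record, rules):
    """Return the THEN-part of every rule whose IF-part matches the record."""
    return [
        conclusion
        for conditions, conclusion in rules
        if all(record.get(attr) == value for attr, value in conditions.items())
    ]


print(fire_rules({"income": "high", "age": "young", "student": "no"}, RULES))
```

A record that matches no IF-part simply gets an empty list, which is one reason such rule sets are easy for a user to read and check.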

DM: Other Applications
Text mining (newsgroups, email, documents) and Web analysis
Intelligent query answering
Scientific applications

DM: Business Advantages
Data Mining uses gathered data to:
Predict tendencies and waves
Classify new data
Find previously unknown patterns
Discover unknown relationships

DM: Technologies
Many commercially available tools
Many methods (models, algorithms) for the same task
TOOLS ALONE ARE NOT THE SOLUTION
The user must be able to interpret the results; one of the requirements of DM is: "the results must be easily comprehensible to the user"
Most often, especially when dealing with statistical methods, analysts are needed to interpret the knowledge; this is a weakness of statistical methods.

Data Mining vs Statistics
Some statistical methods are considered part of Data Mining, i.e. they are used as Data Mining algorithms or as part of Data Mining algorithms.
Some, such as statistical prediction methods (different types of regression) and clustering methods, are now considered an integral part of Data Mining research and applications.
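As an example of a statistical prediction method used inside data mining, the sketch below fits a simple linear regression (ordinary least squares with one predictor) and uses it to predict a numeric value. The slides do not name a specific regression type; the one-predictor form, the function names, and the advertising/sales data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(model, x):
    """Predict y for a new x using the fitted (intercept, slope)."""
    a, b = model
    return a + b * x


# Hypothetical data: advertising spend (x) and resulting sales (y)
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
model = fit_line(spend, sales)
print(model, predict(model, 5.0))
```

The fitted coefficients are directly readable (each unit of spend adds two units of sales here), which fits the requirement that results be comprehensible to the user.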


Business Applications
Buying patterns
Fraud detection
Decision support
Medical applications
Marketing
and more

Fraud Detection and Management (B1)
Applications: widely used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
Approach: use historical data to build models of fraudulent behavior and use data mining to help identify similar instances

Fraud Detection and Management (B2)
Examples
Auto insurance: detect characteristics of groups of people who stage accidents to collect on insurance
Money laundering: detect characteristics of suspicious money transactions (US Treasury's Financial Crimes Enforcement Network)
Medical insurance: detect characteristics of fraudulent patients and doctors

Fraud Detection and Management (B3)
Detecting inappropriate medical treatment
The Australian Health Insurance Commission detected that in many cases blanket screening tests were requested (saving Australian $1m/yr).
Detecting telephone fraud
DM builds a telephone call model: destination of the call, duration, time of day or week. It detects patterns that deviate from an expected norm.
British Telecom identified discrete groups of callers with frequent intra-group calls, especially mobile phones, and broke a multimillion dollar fraud.
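The telephone-fraud approach (model each caller's normal behavior, then flag deviations) can be sketched as follows. This minimal version uses only call duration and a simple mean and standard deviation per caller; the slide's model also uses destination and time of day, and the threshold `k`, the function names, and the call records are all assumptions.

```python
from statistics import mean, pstdev


def build_call_model(history):
    """Learn each caller's norm (mean, std of call duration in minutes) from past calls."""
    durations = {}
    for caller, minutes in history:
        durations.setdefault(caller, []).append(minutes)
    return {caller: (mean(d), pstdev(d)) for caller, d in durations.items()}


def deviating_calls(model, new_calls, k=3.0):
    """Flag calls more than k standard deviations away from the caller's norm."""
    flagged = []
    for caller, minutes in new_calls:
        if caller not in model:
            continue  # no norm has been learned for an unseen caller
        mu, sigma = model[caller]
        if abs(minutes - mu) > k * sigma:
            flagged.append((caller, minutes))
    return flagged


# Hypothetical call records: (caller, duration in minutes)
history = [("alice", 2), ("alice", 3), ("alice", 4), ("alice", 3),
           ("bob", 10), ("bob", 12), ("bob", 11)]
model = build_call_model(history)
print(deviating_calls(model, [("alice", 45), ("alice", 4), ("bob", 11.5), ("carol", 60)]))
```

A real system would combine several attributes and learn from labelled fraud cases, as the "Approach" slide describes; the per-caller norm here only illustrates the deviation idea.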

Fraud Detection and Management (B4)
Retail
Analysts used Data Mining techniques to estimate that 38% of retail shrink is due to dishonest employees
and more...

Data Mining vs Data Marketing
Data Mining methods apply to many domains
Applications of Data Mining methods in which the goal is to find buying patterns in Transactional Data Bases have been named: Data Marketing


Market Analysis and Management (MA1)
Where are the data sources for analysis? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
Target marketing: DM finds clusters of "model" customers who share the same characteristics: interest, income level, spending habits, etc.

Market Analysis and Management (MA2)
Determine customer purchasing patterns over time, e.g. conversion of a single to a joint bank account when marriage occurs
Cross-market analysis: associations/co-relations between product sales, and prediction based on the association information
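Associations between product sales start from counting which products are bought together. A minimal sketch, assuming each transaction is a list of product names: it computes, for every product pair, the fraction of transactions containing both (the pair's support). The baskets and the function name `pair_support` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions):
    """Fraction of transactions in which each pair of products appears together."""
    counts = Counter()
    for basket in transactions:
        # sorted(set(...)) counts each unordered pair once, in a fixed order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items()}


# Hypothetical market baskets
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
for pair, s in sorted(pair_support(baskets).items(), key=lambda kv: -kv[1]):
    print(pair, s)
```

Counting every pair is fine for a handful of products; association rule algorithms such as APRIORI exist precisely to avoid enumerating all combinations on large catalogues.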

Market Analysis and Management (MA3)
Customer profiling: data mining can tell you what types of customers buy what products (clustering or classification)
Identifying customer requirements: identifying the best products for different customers

Corporate Analysis and Risk Management (CA1)
Finance planning and asset evaluation: cash flow analysis and prediction; contingent claim analysis to evaluate assets; cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning: summarize and compare the resources and spending

Corporate Analysis and Risk Management (CA2)
Competition: monitor competitors and market directions; group customers into classes and define a class-based pricing procedure; set pricing strategy in a highly competitive market

Business Summary
Data Mining helps to improve the competitive advantage of organizations in a dynamically changing environment; it improves client retention and conversion.
Different Data Mining methods are required for different kinds of data and different kinds of goals.


Scientific Applications
Network failure detection
Controllers
Geographic Information Systems
Genome / Bioinformatics
Intelligent robots
etc.

Other Applications
Sports: IBM Advanced Scout analyzed NBA game statistics (shots blocked, assists, and fouls) to gain competitive advantage for the New York Knicks and the Miami Heat
Astronomy: JPL and the Palomar Observatory discovered 22 quasars with the help of data mining
And more...

What is NOT Data Mining
Once the patterns are found, the Data Mining process is finished.
The use of the patterns is not Data Mining.
Queries to the database are not DM.

Evolution of Database Technology
1960s: Data collection, database creation, IMS and network DBMS
1970s: Relational data model, relational DBMS implementation

Evolution of Database Technology (c.d.)
1980s: RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s-2000s: Data mining and data warehousing, multimedia databases, and Web databases

Short History of Data Mining
1989 - The term KDD (Knowledge Discovery in Databases) appears (IJCAI Workshop)
1991 - A collection of research papers edited by Piatetsky-Shapiro and Frawley
1993 - Association rule mining proposed by Agrawal, Imielinski and Swami; the APRIORI algorithm followed in 1994 (Agrawal and Srikant)
1996 - present: KDD evolves as a conjunction of different knowledge areas (databases, machine learning, statistics, artificial intelligence) and the term Data Mining becomes popular


Data Mining: Confluence of Multiple Disciplines
Data Mining sits at the meeting point of Database Technology, Statistics, Machine Learning, Visualization, Information Science, and Other Disciplines.

KDD process: Definition [Piatetsky-Shapiro 97]
KDD is a non-trivial process for the identification of:
Valid
New
Potentially useful
Understandable
patterns in data

The KDD process
(figure) Data -> SELECTION -> Target data -> CLEANING -> Processed data -> CODIFICATION -> Transformed data -> DATA MINING -> Models -> INTERPRETATION AND EVALUATION -> knowledge

Steps of the KDD process
Preprocessing: includes all the operations that have to be performed before a data mining algorithm is applied (Chapter 3)
Data Mining: knowledge discovery algorithms are applied in order to obtain the patterns (Chapters 6, 7, and 8)
Interpretation: discovered patterns are presented in a proper format and the user decides whether it is necessary to re-iterate the algorithms
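The KDD steps can be traced end to end on a toy table. A minimal sketch, with one function per stage of the figure; the records, the `harvest` attribute, the 4.0 threshold, the `min_count` cut-off, and the choice of a simple co-occurrence count as the "mining" step are all hypothetical stand-ins for the real techniques covered in later chapters.

```python
from collections import Counter

# Hypothetical raw data; None marks a missing value.
RAW = [
    {"id": 1, "climate": "wet", "crop": "rice", "harvest": 5.0},
    {"id": 2, "climate": "dry", "crop": "millet", "harvest": None},
    {"id": 3, "climate": "wet", "crop": "rice", "harvest": 4.5},
    {"id": 4, "climate": "dry", "crop": "millet", "harvest": 2.0},
]


def select(records, attributes):
    """SELECTION: keep only the attributes relevant to the task (target data)."""
    return [{a: r[a] for a in attributes} for r in records]


def clean(records):
    """CLEANING: drop records with missing values (processed data)."""
    return [r for r in records if all(v is not None for v in r.values())]


def codify(records, threshold=4.0):
    """CODIFICATION: discretize the numeric harvest into 'high'/'low' (transformed data)."""
    return [{**r, "harvest": "high" if r["harvest"] >= threshold else "low"} for r in records]


def mine(records):
    """DATA MINING: count each (climate, harvest) combination (the model)."""
    return Counter((r["climate"], r["harvest"]) for r in records)


def interpret(model, min_count=2):
    """INTERPRETATION AND EVALUATION: report only patterns seen at least min_count times."""
    return [f"IF climate={c} THEN harvest={h}" for (c, h), n in model.items() if n >= min_count]


knowledge = interpret(mine(codify(clean(select(RAW, ["climate", "harvest"])))))
print(knowledge)
```

If the reported knowledge is unsatisfactory, the user changes a choice (the threshold, `min_count`, the selected attributes) and re-runs the chain, which is the re-iteration the Interpretation step refers to.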

DM: Data Mining
DM is a step of the KDD process in which algorithms are applied to look for patterns in data.
It is necessary to first apply the preprocessing operations to clean and preprocess the data in order to obtain significant patterns.

KDD vs DM
KDD is a term used by academia.
DM is a commercial term.
The term DM is also being used in academia, as it has become a "brand name" for both the KDD process and its DM sub-process.
The important point is to see Data Mining as a process.


Architecture of a Typical Data Mining System
(figure, top to bottom)
Graphical user interface
Pattern evaluation
Data mining engine
Database or data warehouse server
Data cleaning & data integration, Filtering
Databases, Data Warehouse
Knowledge-base (shown beside the main layers)

Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories:
Object-oriented and object-relational databases
Spatial databases
Time-series data and temporal data
Text databases and multimedia databases
Heterogeneous and legacy databases
WWW

DM Functionalities (1)
Concept, class, description
Concept: defined semantically as any subset of records.
We often define the concept by an attribute c and its value v. In this case the concept description is syntactically written as c=v, and we define:
CONCEPT = {records: c=v}
For example: climate=wet (description of the concept)
CONCEPT = {records: climate=wet}
We use the words CLASS and class attribute for concept and concept attribute.

DM Functionalities (2)
Concept characteristics
The characteristics of a concept C are a set of attributes a1, a2, …, ak and their respective values v1, v2, …, vk that are characteristic for C, i.e.
{records: a1=v1 & a2=v2 & … & ak=vk}
The characteristics description is then syntactically written as
a1=v1 & a2=v2 & … & ak=vk
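Both definitions describe a subset of records selected by attribute/value conditions, so they translate directly into filters. A minimal sketch, assuming records are dictionaries; the weather records and the attributes `rainfall` and `humidity` are hypothetical.

```python
def characteristic_set(records, conditions):
    """{records: a1=v1 & a2=v2 & ... & ak=vk}"""
    return [r for r in records if all(r.get(a) == v for a, v in conditions.items())]


def concept(records, attribute, value):
    """CONCEPT = {records: c=v}, i.e. a characteristic set with a single condition."""
    return characteristic_set(records, {attribute: value})


# Hypothetical records
weather = [
    {"climate": "wet", "rainfall": "high", "humidity": "high"},
    {"climate": "wet", "rainfall": "high", "humidity": "high"},
    {"climate": "wet", "rainfall": "medium", "humidity": "high"},
    {"climate": "dry", "rainfall": "low", "humidity": "low"},
]
wet = concept(weather, "climate", "wet")  # CONCEPT = {records: climate=wet}
typical = characteristic_set(wet, {"rainfall": "high", "humidity": "high"})
print(len(wet), len(typical))
```

How "characteristic" a description is can then be judged by how large a share of the concept's records it covers, which leads to the support measure on the next slide.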

Characterization
Describes the process whose aim is to find rules that describe properties of a concept. They take the form:
If concept then characteristics
C=1 -> A=1 & B=3   25%   (support: the rule is true for 25% of the records)
C=1 -> A=1 & B=4   17%
C=1 -> A=0 & B=2   16%

Discrimination
The process whose aim is to find rules that allow us to discriminate the objects (records) belonging to a given concept (one class) from the rest of the records (classes).
If characteristics then concept
A=0 & B=1 -> C=1   33%   83%   (support, confidence: the conditional probability of the concept given the characteristics)
A=2 & B=0 -> C=1   27%   80%
A=1 & B=1 -> C=1   12%   76%
A discriminant rule can be good even if it has low support (and high confidence).
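Support and confidence, as used on these slides, can be computed by counting records. A minimal sketch, assuming support means the fraction of all records for which both sides of the rule hold and confidence means the fraction of records matching the IF-part that also match the THEN-part. The six records are hypothetical and do not reproduce the percentages above.

```python
def matches(record, conditions):
    """True if the record satisfies every attribute=value condition."""
    return all(record.get(a) == v for a, v in conditions.items())


def support(records, lhs, rhs):
    """Fraction of all records for which both sides of the rule hold."""
    both = sum(1 for r in records if matches(r, lhs) and matches(r, rhs))
    return both / len(records)


def confidence(records, lhs, rhs):
    """Conditional probability of rhs given lhs, estimated by counting."""
    covered = [r for r in records if matches(r, lhs)]
    if not covered:
        return 0.0
    return sum(1 for r in covered if matches(r, rhs)) / len(covered)


# Hypothetical records with attributes A, B and concept (class) attribute C
RECORDS = [
    {"A": 0, "B": 1, "C": 1},
    {"A": 0, "B": 1, "C": 1},
    {"A": 0, "B": 1, "C": 0},
    {"A": 1, "B": 3, "C": 1},
    {"A": 2, "B": 0, "C": 0},
    {"A": 1, "B": 1, "C": 0},
]
# Discriminant rule: A=0 & B=1 -> C=1
print(support(RECORDS, {"A": 0, "B": 1}, {"C": 1}), confidence(RECORDS, {"A": 0, "B": 1}, {"C": 1}))
```

Swapping the two arguments gives the characterization rule C=1 -> A=0 & B=1: its support is the same, but its confidence is measured over the records of the concept instead.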


Data Mining Functionalities (3)
Classification and Prediction: supervised learning
Finding models (rules) that describe (characterize) and/or distinguish (discriminate) classes or concepts for future prediction
Example: classify countries based on climate (characteristics), or classify cars based on gas mileage and use the model to predict the classification of a new car
Presentation: decision tree, classification rules, neural network, Bayes network

Data Mining Functionalities (4)
Prediction (statistical): predict some unknown or missing numerical values
Cluster analysis: the class label is unknown; group data to form new classes (unsupervised learning)
For example: cluster houses to find distribution patterns
Clustering is based on the principle: maximizing the intra-class similarity and minimizing the inter-class similarity
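k-means is one common way to act on the clustering principle above; the slides do not prescribe a specific algorithm, so treat this as a swapped-in example. It repeatedly assigns each point to its nearest centroid and moves each centroid to the mean of its points, which shrinks within-cluster distances (high intra-class similarity). Because the total spread of the data is fixed, this also increases the spread between clusters. The house coordinates (hypothetical price and size units) and the parameter defaults are assumptions.

```python
import math
import random


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's k-means on points given as tuples of numbers; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the old centroid for an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids (and hence assignments) no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical houses as (price, size) in arbitrary units
houses = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(houses, 2)
print(labels, centroids)
```

The number of clusters k must be chosen by the user, and different starting centroids can give different results on less well-separated data; `seed` makes a run reproducible.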

Data Mining Functionalities (5)
Outlier analysis
Outlier: a data object that does not comply with the general behavior of the data
It can be considered as noise or exception, but is quite useful in fraud detection and rare events analysis

Major Issues in Data Mining (1)
Mining methodology and user interaction:
Mining different kinds of knowledge in databases
Interactive mining of knowledge at multiple levels of abstraction
Incorporation of background knowledge
Data mining query languages and ad-hoc data mining
Expression and visualization of data mining results
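For the outlier analysis described above, one simple rule for "does not comply with the general behavior of the data" on a single numeric attribute is Tukey's fences: values far outside the interquartile range. This is a swapped-in illustrative technique, not one named on the slides; the factor 1.5 is a common convention, and the transaction amounts are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles; needs at least two values
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical transaction amounts; one is far from the rest
amounts = [120, 95, 130, 110, 105, 125, 2400, 115]
print(iqr_outliers(amounts))
```

Whether a flagged value is noise to be removed or an exception worth investigating (for example, a possible fraud) is a decision left to the analyst.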

Major Issues in Data Mining (2)
Handling noise and incomplete data
Pattern evaluation: the interestingness problem
Performance and scalability:
Efficiency and scalability of data mining algorithms
Parallel, distributed and incremental mining methods

Major Issues in Data Mining (3)
Issues relating to the diversity of data types:
Handling relational and complex types of data
Mining information from heterogeneous databases and global information systems (WWW)
Issues related to applications and social impacts:
Application of discovered knowledge: domain-specific data mining tools, intelligent query answering, process control and decision making
Integration of the discovered knowledge with existing knowledge: a knowledge fusion problem
Protection of data security, integrity, and privacy


APPROACHES TO DATA MINING

Approaches (I)
Mathematics: consists in the creation of mathematical models to extract rules, regularities and patterns (rough sets)
Statistics: focused on the creation of statistical models to analyse data (Bayesian networks)

Approaches (II)

Artificial Intelligence:
Classification trees (ID3, C4.5, ...)
Clustering

Neural Networks
Genetic algorithms
Visualization techniques
...
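Classification trees such as ID3 choose, at each node, the attribute whose split most reduces class entropy (information gain). A minimal sketch of that split criterion only, not a full tree builder; the car records echo the earlier "classify cars based on gas mileage" example but the attributes and values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on attr (ID3's split criterion)."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical training records
cars = [
    {"cylinders": 4, "weight": "light", "mileage": "good"},
    {"cylinders": 4, "weight": "heavy", "mileage": "good"},
    {"cylinders": 8, "weight": "heavy", "mileage": "bad"},
    {"cylinders": 8, "weight": "light", "mileage": "bad"},
]
best = max(["cylinders", "weight"], key=lambda a: information_gain(cars, a, "mileage"))
print(best)
```

A full ID3 implementation would split on `best`, then repeat the same choice recursively inside each branch until the leaves are pure or no attributes remain; C4.5 refines the criterion (gain ratio) and handles numeric attributes.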
