Lecture 1 Introduction Updated (1)

The document is an introductory lecture on data mining, covering its definition, trends leading to data generation, and the necessity of data mining in understanding large datasets. It explains the knowledge discovery process, types of data suitable for mining, and various data mining techniques such as association analysis, classification, and clustering. Additionally, it provides examples of applications in marketing, fraud detection, and inventory management.

King Saud University

College of Computer & Information Sciences

IS 463 Introduction to Data Mining

Lecture 1
Preliminaries and Overview
Dr Mohammad Mehedi Hassan

The slide content is derived and adapted from many references

Trends leading to Data Flood
• More data is generated:
– Bank, telecom, and other business transactions
– Scientific data: astronomy, biology, etc.
– Web, text, and e-commerce
• More data is captured:
– Storage technology is faster and cheaper
– DBMSs are capable of handling bigger databases

Examples
• Europe's Very Long Baseline Interferometry (VLBI)
– has 16 telescopes, each of which produces 1 Gigabit/second of astronomical data over a 25-day observation session
– storage and analysis are a big problem
• Walmart reportedly has a 24-terabyte database
• AT&T handles billions of calls per day
– data cannot be stored, so analysis is done on the fly

Growth Trends
• Moore’s law
– Computer speed doubles every 18 months
• Storage law
– Total storage doubles every 9 months
• Consequence
– Very little data will ever be looked at by a human
• Data mining is NEEDED to make sense of, and use, the data.

• What is data mining?
• Data mining on what kind of data
• Data mining : A KDD process
• What kind of patterns can be mined?
• Data mining system
• Data mining applications

What is data mining
• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data stored in databases, data warehouses or other information repositories.
• Alternative names :
– KDD (Knowledge Discovery in Databases),
– data pattern analysis,
– business intelligence

What is data mining
• Data mining can be viewed as a result of the
natural evolution of information technology.

• The evolutionary path :

1960s: data collection, database creation, network DBMS
1970s: relational data model (Codd)
1980s: advanced data models (extended relational, OO, deductive, …)
1990s: data analysis and understanding (data mining & data warehousing)
2000s: data mining with a variety of applications, web technology

What is (not) Data Mining?
• What is not Data Mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
• What is Data Mining?
– Certain names are more prevalent in certain US locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Group together similar documents returned by a search engine according to their context (e.g. Amazon rainforest, Amazon.com)

What is (not) Data Mining?
Supermarket Analysis
What is Data Mining:
A supermarket uses data mining to analyze customer
purchase data over time. They look for patterns, like which
products are often bought together (e.g., bread and butter) or
which days certain items sell the most. This helps them
optimize stock levels, create better promotions, and
understand customer preferences.
What is Not Data Mining:
Simply generating a report that shows last month’s total sales
figures without analyzing patterns or trends is not data
mining. It’s just basic data reporting.

What is (not) Data Mining?
Email Spam Filtering
What is Data Mining:
An email service uses data mining to analyze thousands of
emails, identifying patterns that distinguish spam from
legitimate emails. For example, the system might learn that
emails with certain keywords, phrases, or sender addresses
are often flagged as spam. Over time, this helps the service
automatically filter out spam emails for users.

What is Not Data Mining:

A user manually marking individual emails as spam is not
data mining. It’s just manual filtering without any analysis of
patterns across large datasets.
• What is data mining?
• Data mining on what kind of data
• Data mining : A KDD process
• What kind of patterns can be mined?
• Data mining system
• Data mining applications

Data mining on what kind of data
• Relational database
1. A relational database system is a collection of tables, with ER for modeling and SQL for querying
2. A data mining system may analyze customer data to predict the credit risk of new customers based on their income, age and previous credit information
3. A data mining system may detect deviations, such as items whose sales are far from those expected in comparison with the previous year.

Data mining on what kind of data
• Data warehouse
A data warehouse is a repository of multiple heterogeneous data sources, organized under a unified schema at a single site in order to facilitate management decision making.

• Data mining can handle many prediction problems

Data warehouse
[Figure: Data source 1 … Data source n → Clean, Transform, Integrate, Load → Data warehouse → Querying & Analysis → clients]

Data mining on what kind of data
• Transactional database
1. A transactional database is a file where each record represents a transaction
sales(trans_ID, list of item_IDs)

trans_ID   list of item_IDs
T1         I1, I3, I8
…          …

2. Data mining can answer questions such as “Which items sold well together?”
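The "which items sold well together" question can be sketched by counting how often each pair of items appears in the same transaction. This is only an illustration: the transactions other than T1 are hypothetical, and the function name `pair_counts` is an assumption, not something from the slides.

```python
from collections import Counter
from itertools import combinations

# sales(trans_ID, list of item_IDs); only T1 comes from the slide,
# the other records are hypothetical.
sales = {
    "T1": ["I1", "I3", "I8"],
    "T2": ["I1", "I3"],
    "T3": ["I3", "I8"],
    "T4": ["I1", "I3", "I5"],
}

def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for items in transactions.values():
        # sorted(set(...)) makes (I1, I3) and (I3, I1) the same key
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(sales)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('I1', 'I3') 3
```

With this toy data, I1 and I3 appear together in three of the four transactions, so they are the pair that "sold well together".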

Data mining on what kind of data

• Advanced database and information repository

1. Spatial and temporal data

– Characteristics of houses located near a park
– Change in trend of metropolitan poverty rates based on city distances from major highways
2. Heterogeneous and legacy database
3. Text databases & WWW

• What is data mining?
• Data mining on what kind of data
• Data mining: A KDD process
• What kind of patterns can be mined?
• Data mining system
• Data mining applications

Knowledge Discovery Process
[Figure: Raw Data → (Integration) → Data Warehouse → (Selection & Cleaning) → Target Data → (Transformation) → Transformed Data → (Data Mining) → Patterns and Rules → (Interpretation & Evaluation) → Knowledge → Understanding]

• What is data mining?
• Data mining on what kind of data
• Data mining: A KDD process
• What kind of patterns can be mined?
• Data mining system
• Data mining applications

What kind of Patterns can be mined
(Association analysis)
• Association analysis discovers association rules showing attribute-value conditions that frequently occur together in a set of data, e.g. market basket analysis
• A rule has the form body → head
buys(X, “milk”) → buys(X, “sugar”)

What kind of Patterns can be mined
(Association analysis)
• Itemset X = {x1, …, xk}
• Find all the rules X → Y with min confidence and support
– support, s, probability that a transaction contains X ∪ Y
support(X → Y) = P(X ∪ Y)
– confidence, c, conditional probability that a transaction having X also contains Y
confidence(X → Y) = P(Y|X) = support({X, Y}) / support({X})

Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

[Figure: Venn diagram with regions "Customer buys milk", "Customer buys sugar" and "Customer buys both"]

Let min_support = 50%, min_conf = 50%
A → C (50%, 66.7%)
C → A (50%, 100%)
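The support and confidence figures on this slide can be checked against its four transactions. The following is a minimal Python sketch; the helper names `support` and `confidence` are illustrative assumptions rather than anything defined in the lecture.

```python
# The four transactions from the slide's table.
transactions = {
    10: {"A", "B", "C"},
    20: {"A", "C"},
    30: {"A", "D"},
    40: {"B", "E", "F"},
}

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(x, y):
    """Estimate P(Y | X) as support(X and Y together) / support(X)."""
    return support(set(x) | set(y)) / support(x)

for x, y in [("A", "C"), ("C", "A")]:
    s = support({x, y})
    c = confidence({x}, {y})
    print(f"{x} -> {y}: support={s:.1%}, confidence={c:.1%}")
# A -> C: support=50.0%, confidence=66.7%
# C -> A: support=50.0%, confidence=100.0%
```

Both rules clear the 50% thresholds. A → C has lower confidence because A also appears in transaction 30 without C.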
Association Rule Mining: Application 1
• Marketing and Sales Promotion:
• The rule discovered is: {Bagels, … } --> {Potato Chips}
• This rule means that when customers buy bagels (and possibly other items), they
also tend to buy potato chips. Here’s what each part of the rule implies for
marketing strategies:
• Potato Chips as consequent: This tells us that potato chips are often bought when
customers buy other products like bagels. Knowing this, the store can think of
strategies to increase the sales of potato chips. For instance, they might place
potato chips closer to bagels or advertise them together in promotions to boost
sales.
• Bagels in the antecedent: Since bagels are in the antecedent (the part before the
arrow, which triggers the rule), the store can analyze the impact of bagels on sales
of other products like potato chips. If the store considers discontinuing bagels, this
analysis will help them understand how that decision might affect the sales of
products that are often bought with bagels, like potato chips.
• Bagels in antecedent and Potato chips in consequent: This part of the rule
suggests that selling bagels alongside potato chips can be an effective strategy to
promote the sale of potato chips. The store could use this insight to create bundle
deals, discounts, or marketing campaigns that feature both products, potentially
increasing sales for both.
Association Rule Mining: Application 2
• Supermarket shelf management.
– Goal: To identify items that are bought together
by sufficiently many customers.
– Approach: Process the point-of-sale data
collected with barcode scanners to find
dependencies among items.
– A classic rule:
• If a customer buys diapers and milk, then they are very likely to buy juice.

Association Rule Mining: Application 3
• Inventory Management:
– Goal: The appliance repair company wants to be more efficient in
fixing consumer products. They aim to understand what kind of
repairs are usually needed for different appliances. This knowledge
will help them ensure that their service vehicles always carry the right
parts. By doing this, they hope to fix the appliances in one visit, rather
than having to make multiple trips to get the correct parts.

– Approach: To achieve this, the company plans to analyze the data from
past repair jobs. This data includes what tools and parts were needed
for each repair job at different locations. By examining this
information, the company can identify patterns. For example, they
might find that washing machines in a particular area often need a
specific type of belt replaced. Recognizing these patterns (called
"co-occurrence patterns") will help them predict what parts are likely to be
needed at future jobs in similar areas or with similar appliances.
What kind of Patterns can be mined
(Classification and Prediction)

• Construct models (functions) that describe and distinguish classes or concepts for future prediction
– E.g., classify countries based on climate,
– or classify cars based on gas mileage
• Presentation
– decision tree, classification rule, neural network
• Predict some unknown or missing numerical values

Training Dataset
age     income   student   leasing_rating   buys_computer
<=30    high     no        fair             no
<=30    high     no        excellent        no
31…40   high     no        fair             yes
>40     medium   no        fair             yes
>40     low      yes       fair             yes
>40     low      yes       excellent        no
31…40   low      yes       excellent        yes
<=30    medium   no        fair             no
<=30    low      yes       fair             yes
>40     medium   yes       fair             yes
<=30    medium   yes       excellent        yes
31…40   medium   no        excellent        yes
31…40   high     yes       fair             yes
>40     medium   no        excellent        no

Output: A Decision Tree for
“buys_computer” Target Class
age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: leasing_rating?
    excellent: no
    fair: yes
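The tree on this slide can be read as nested if/else tests. The sketch below is one way to write it in Python and replay it against the 14 training rows. It assumes the three age branches are <=30, 31..40 and >40, and the function name `predict` is an illustrative choice.

```python
# Training rows from the slide: (age, income, student, leasing_rating, buys_computer)
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]

def predict(age, student, leasing_rating):
    """Follow the tree: test age first, then student or leasing_rating."""
    if age == "<=30":
        return "yes" if student == "yes" else "no"
    if age == "31..40":
        return "yes"
    # Remaining branch: age > 40
    return "no" if leasing_rating == "excellent" else "yes"

correct = sum(predict(a, s, r) == label for a, _income, s, r, label in ROWS)
print(f"{correct}/{len(ROWS)} training rows classified correctly")  # 14/14
```

Note that income is never tested: the tree separates the training data without it.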

Classification: Application 1
• Direct Marketing
– Goal: Reduce cost of mailing by targeting a set of consumers likely to
buy a new cell-phone product.
– Approach:
• Use the data for a similar product introduced before.
• We know which customers decided to buy and which decided
otherwise.
• This {buy, don’t buy} decision forms the class attribute.
• Collect various demographic, lifestyle, and company-interaction
related information about all such customers.
– Type of business, where they stay, how much they earn, etc.
• Use this information as input attributes to learn a classifier model.

Classification: Application 2
• Fraud Detection
– Goal: Predict fraudulent cases in credit card
transactions.
– Approach:
• Use credit card transactions and the information on its
account-holder as attributes.
– When does a customer buy, what does he buy, how often he
pays on time, etc
• Label past transactions as fraud or fair transactions.
This forms the class attribute.
• Learn a model for the class of the transactions.
• Use this model to detect fraud by observing credit card
transactions on an account.
What kind of Patterns can be mined
(Cluster Analysis)
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no
predefined classes
• Typical applications
– As a stand-alone tool to get insight into data
distribution
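One way to make "grouping a set of data objects into clusters" concrete is k-means, a common clustering algorithm that the slide itself does not name. The sketch below assumes two-dimensional numeric points, hand-picked starting centroids, and hypothetical customer data. The function name `kmeans` and its parameters are illustrative.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Mean of each cluster; keep the old centroid if a cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return clusters, centroids

# Hypothetical customers as (age, purchases per year) pairs.
points = [(22, 4), (25, 5), (24, 6), (51, 20), (55, 22), (53, 19)]
clusters, centroids = kmeans(points, centroids=[points[0], points[3]])
print(clusters)
```

No class labels are supplied anywhere, which is what "unsupervised" means here: the two groups emerge only from distances between points.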

Clustering: Application 1
• Market Segmentation:
– Goal: subdivide a market into distinct subsets of
customers where any subset may conceivably be
selected as a market target to be reached with a
distinct marketing mix.
– Approach:
• Collect different attributes of customers based on their
geographical and lifestyle related information.
• Find clusters of similar customers.
• Measure the clustering quality by observing buying
patterns of customers in same cluster vs. those from
different clusters.
Clustering: Application 2
• Document Clustering:
– Goal: To find groups of documents that are
similar to each other based on the important
terms appearing in them.
– Approach:
• To identify frequently occurring terms in each
document.
• Form a similarity measure based on the frequencies
of different terms. Use it to cluster.
– Gain: Information Retrieval can utilize the clusters to relate a new document or search term to clustered documents.
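The approach above needs a similarity measure built from term frequencies. Cosine similarity is one common choice, though the slide does not say which measure was used. This sketch assumes whitespace tokenization with no stemming or stop-word filtering, and the three documents are hypothetical.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Lowercase, split on whitespace, and count terms."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical mini-documents.
docs = {
    "sports_1": "team wins final match",
    "sports_2": "team loses final match",
    "finance_1": "stock market falls again",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

same_topic = cosine_similarity(tf["sports_1"], tf["sports_2"])
other_topic = cosine_similarity(tf["sports_1"], tf["finance_1"])
print(same_topic, other_topic)  # 0.75 0.0
```

A clustering algorithm would then place the two sports documents together because their similarity is high, and keep the finance document apart.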
Illustrating Document Clustering
• Clustering Points: 3204 Articles of Los Angeles Times.
• Similarity Measure: How many words are common in these
documents (after some word filtering).
Category        Total Articles   Correctly Placed
Financial       555              364
Foreign         341              260
National        273              36
Metro           943              746
Sports          738              573
Entertainment   354              278
Clustering of S&P 500 Stock Data
• Observe Stock Movements every day.
• Clustering points: Stock-{UP/DOWN}
• Similarity Measure: Two points are more similar if the events
described by them frequently happen together on the same
day.
• We used association rules to quantify a similarity measure.
Discovered Clusters and Industry Groups

Cluster 1 (Technology1-DOWN): Applied-Matl-DOWN, Bay-Network-DOWN, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-DOWN, Tellabs-Inc-DOWN, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN

Cluster 2 (Technology2-DOWN): Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN

Cluster 3 (Financial-DOWN): Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN

Cluster 4 (Oil-UP): Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP

Clustering Complex (social) network
• Complex networks are large networks where local behavior generates non-trivial global features.
• Network Clustering
• Clustering coefficients – how well connected?
• What does a complex network look like when you can really see it?
• Community discovery: separate into densely connected subsets
• Automatic discovery of communities
• Split by interest or meaning
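The "how well connected?" question behind clustering coefficients can be illustrated with the local clustering coefficient of a node: the fraction of pairs of its neighbours that are themselves linked. The graph below is hypothetical, and the function name `clustering_coefficient` is an assumption made for this sketch.

```python
from itertools import combinations

# Hypothetical undirected friendship graph as an adjacency map.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def clustering_coefficient(node):
    """Fraction of pairs of `node`'s neighbours that are connected to each other."""
    neighbours = graph[node]
    if len(neighbours) < 2:
        return 0.0  # no pairs to check
    pairs = list(combinations(neighbours, 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)

for node in sorted(graph):
    print(node, round(clustering_coefficient(node), 3))
```

Here node b scores 1.0 because its two neighbours know each other, while node a scores 1/3 because only one of its three neighbour pairs is linked. Densely connected subsets, the "communities" of the slide, show up as groups of nodes with high coefficients.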
Regression
• Predict a value of a given continuous valued variable
based on the values of other variables, assuming a
linear or nonlinear model of dependency.
• Extensively studied in statistics and neural network fields.
• Examples:
– Predicting sales amounts of a new product based on advertising expenditure.
– Predicting wind velocities as a function of
temperature, humidity, air pressure, etc.
– Time series prediction of stock market indices.
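The first example, sales as a function of advertising expenditure, can be sketched with ordinary least squares under the linear-model assumption. The numbers below are hypothetical and the function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical data: advertising spend and resulting sales, in the same currency unit.
advertising = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(advertising, sales)
predicted = slope * 6.0 + intercept  # forecast for an unseen spend of 6.0
print(round(slope, 2), round(intercept, 2), round(predicted, 2))  # 1.97 1.09 12.91
```

Unlike classification, the output is a continuous value rather than a class label.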

• What is data mining?
• Data mining on what kind of data
• Data mining : A KDD process
• What kind of patterns can be mined?
• Data mining system
• Data mining applications

Data Mining System
[Figure: layered architecture, top to bottom]
• Graphical user interface
• Pattern evaluation
• Data mining engine
• Data warehouse server
• Data cleaning & data integration, filtering
• Databases, data warehouse
• Knowledge base

Confluence of Multiple Disciplines
[Figure: Data Mining at the centre, surrounded by Database Systems, Statistics, Machine Learning, Visualization, Algorithm, and Other Disciplines]
• What is data mining?
• Data mining on what kind of data
• Data mining : A KDD process
• What kind of patterns can be mined?
• Data mining system
• Data mining applications

Major Application Areas for
Data Mining Solutions
• Advertising
• Customer Relationship Management (CRM)
• Database Marketing
• Fraud Detection
• eCommerce
• Health Care
• Investment/Securities
• Manufacturing, Process Control
• Sports and Entertainment
• Telecommunications
• Web
• Bioinformatics

Case Study: Search Engines
• Early search engines mainly used keywords on a page and were subject to manipulation
• Google's success is due to its algorithm, which mainly uses links to the page
• Google founders Sergey Brin and Larry Page were students at Stanford doing research in databases and data mining in 1998, which led to Google

Case Study: Direct Marketing and CRM
• Most major direct marketing companies are
using modeling and data mining
• Most financial companies are using customer
modeling
• Modeling is easier than changing customer
behaviour
• Some successes (Homework)
– Verizon Wireless reduced churn rate from 2% to
1.5%
Case Study:
Security and Fraud Detection
• Credit Card Fraud Detection
• Money laundering
– FAIS (US Treasury)
• Securities Fraud
– NASDAQ Sonar system
• Phone fraud
– AT&T, Bell Atlantic, British
Telecom/MCI
• Bio-terrorism detection at Salt Lake Olympics 2002
Data Mining with Privacy
• Data mining is about finding patterns in large sets of data to gain insights
or make predictions, not about tracking individual people.
• Protecting Privacy: Here's how privacy can be maintained while using
data mining:
• Replacing Personal Data: Instead of using sensitive personal details like
names or addresses, these are replaced with anonymous identifiers. This
means the data can be used without revealing who it belongs to.
• Randomized Outputs: Sometimes, data mining systems are designed to
provide outputs (results) that are slightly randomized. This helps to ensure
that the results can't be used to figure out personal details about the people
in the data.
• Multi-party Computation: This is a method where data is distributed
across different locations or parties. No single party has access to all the
information. They can work together to perform calculations or analyses
without actually sharing the sensitive data they each hold.
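The first technique, replacing personal data with anonymous identifiers, can be sketched as follows. This is a toy illustration only: the records are hypothetical, the function name `pseudonymize` is an assumption, and sequential identifiers are used purely for readability.

```python
def pseudonymize(records, key="name"):
    """Replace the value under `key` with an identifier that is consistent per person."""
    mapping = {}  # original value -> anonymous identifier; kept out of the shared data
    cleaned_records = []
    for record in records:
        person = record[key]
        if person not in mapping:
            mapping[person] = f"user_{len(mapping) + 1:04d}"
        cleaned = dict(record)  # copy so the input is left untouched
        cleaned[key] = mapping[person]
        cleaned_records.append(cleaned)
    return cleaned_records

# Hypothetical purchase records.
purchases = [
    {"name": "Alice", "item": "milk"},
    {"name": "Bob", "item": "sugar"},
    {"name": "Alice", "item": "bread"},
]
anonymous = pseudonymize(purchases)
print(anonymous)
```

Because the same person always receives the same identifier, patterns such as "this customer bought milk and bread" survive, while the name itself no longer appears in the data being mined.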
Summary
• Data mining: discovering interesting patterns from large
amounts of data
• KDD process includes data cleaning, data integration, data
selection, transformation, data mining, pattern evaluation,
and knowledge presentation
• Mining can be performed in a variety of information
repositories
• Data mining functionalities: association, classification,
clustering, outlier and trend analysis, etc.
