Knowledge Management - 10 - Data Mining Overview


Knowledge Management
Shofiana, DA – Universitas Lampung
Review
• Learning From Data Approach
• Artificial Neural Network
• Association Rules

Data Mining Overview
Today’s Topic

01 Data Mining Definition

02 Knowledge Discovery in Database

03 Data Mining System Architecture

04 Data Mining Tasks

05 Data Mining Applications


What is the reason behind the advancement of data mining in industry and business?
Motivation
Data Exploration Problem
• We are drowning in data, but starving for knowledge!
• Information is “hidden” within data
• Data are often collected with no further analysis conducted
Why use data mining?
Commercial Perspective
• Data have become ubiquitous
• Computer technology is getting more powerful and accessible
• Competitive pressure is getting stronger
Scientific Perspective
• Data are collected and stored at high speed (GB/hour)
- Remote sensors on satellites
- Microarrays generating gene expression data
- Spatial data (GIS), etc.
• Conventional techniques are not enough
• Data mining is the gate to modelling, classification, clustering, etc.
The Growth Of Internet Users

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon, and infographics & images by Freepik

Please keep this slide for attribution


What is Data Mining?

• Extraction of interesting information or patterns (non-trivial, implicit, previously unknown, and potentially useful) from ‘big’ databases
• Other names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
• Non-data mining tasks: deductive query, expert system, information system
Data Mining Tasks?

Not data mining
• Search for the names of Computer Science students on SIAKADU
• Search for a phone number in the telephone directory
• Query a search engine for information about "Corona"

Data mining
• Find product names that are often purchased at the same time at the supermarket
• Group similar documents returned by search engines based on their context (example: Amazon rain forest, Amazon river, Amazon.com)

Relationship of Data Mining with Other Fields
• Close to AI, machine learning, statistics, and database systems
• Conventional techniques become irrelevant because of:
- Dimensionality of data
- Size of data
- Heterogeneity and distribution of data
Data Mining: Multi Discipline



Data Mining: A Process in KDD
Data mining is one main element of the Knowledge Discovery from Data (KDD) process.



Data Mining: A Process in KDD

• Data Cleaning: Remove noise and inconsistent data
• Data Integration: Collect and integrate data from various sources
• Data Selection: Select only the relevant data from the database
• Data Transformation: Transform the data into a form suitable for mining
• Data Mining: Apply intelligent methods to extract patterns
• Pattern Evaluation: Evaluate which patterns are interesting based on some measures
• Knowledge Representation: Present the mined knowledge as the result of the KDD process
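The steps above can be sketched as a chain of small functions. This is only a minimal illustration under stated assumptions: the two toy data sources, the field names (customer, item, qty), and the min_count threshold are all hypothetical and do not come from the slides, and the "mining" step is reduced to simple frequency counting.

```python
from collections import Counter

# Hypothetical point-of-sale records from two sources (toy data, not from the slides).
store_a = [
    {"customer": "c1", "item": "Cola", "qty": 2},
    {"customer": "c2", "item": "cola ", "qty": 1},
    {"customer": "c3", "item": None, "qty": 1},       # noise: missing item
]
store_b = [
    {"customer": "c4", "item": "Chips", "qty": -5},   # inconsistent: negative quantity
    {"customer": "c5", "item": "Cola", "qty": 3},
    {"customer": "c6", "item": "chips", "qty": 1},
]

def clean(records):
    """Data cleaning: drop records with a missing item or a non-positive quantity."""
    return [r for r in records if r["item"] is not None and r["qty"] > 0]

def integrate(*sources):
    """Data integration: combine records from several sources into one list."""
    return [r for source in sources for r in source]

def select(records):
    """Data selection: keep only the attribute relevant to the task."""
    return [r["item"] for r in records]

def transform(items):
    """Data transformation: normalize item names into a consistent form."""
    return [item.strip().lower() for item in items]

def mine(items):
    """Data mining (simplified): count how often each item occurs."""
    return Counter(items)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that occur at least min_count times."""
    return {item: n for item, n in patterns.items() if n >= min_count}

records = integrate(clean(store_a), clean(store_b))
knowledge = evaluate(mine(transform(select(records))), min_count=2)
print(knowledge)  # knowledge representation: {'cola': 3}
```

Each function maps to one KDD step, so a real system could swap any stage (for example, a proper mining algorithm in place of the counter) without touching the others.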
KDD Process Typical View from Machine Learning and Statistics



Data Mining In Business Intelligence



Data Mining System: Architecture



Data Mining System: Architecture

• Database: The place where information is stored.
• Database or Data Warehouse Server: The component responsible for retrieving relevant data, based on user requests.
• Knowledge Base: Domain knowledge used to guide searches or evaluate patterns.
• Data Mining Engine: Functional modules for data mining (association, classification, cluster analysis, etc.)
• Pattern Evaluation Module: Measures the interestingness of patterns and interacts with the data mining modules to focus the search on interesting patterns.
• Graphical User Interface: Communication bridge between users and the data mining system.
What kind of data can we mine?

• Relational database
• Transactional database
• Data warehouse
• Other data storage: object-oriented and object-relational databases, spatial databases, time series and temporal data, sequence data, text data, multimedia data, WWW
Methods in Data Mining Tasks

Prediction
• Using multiple variables (attributes) to predict the unknown value or the value coming from another variable (attribute)

Description
• Finding patterns (correlations, trends, clusters, trajectories, and anomalies) that summarize the relationships in the data.
Association (correlation, causality)

• Multi-dimensional vs. single-dimensional association
• Snack → Soft Drink [0.5%, 80%] (support, confidence)
• contains(T, “computer”) → contains(T, “software”) [1%, 75%]
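The bracketed pair after each rule is (support, confidence). As a minimal sketch, support is taken here as the fraction of all transactions that contain both sides of the rule, and confidence as the fraction of transactions containing the left side that also contain the right side. The ten toy transactions and the function name support_confidence are assumptions made for illustration, so the resulting numbers differ from the slide's 0.5% and 80%.

```python
def support_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_total = len(transactions)
    # Transactions that contain every item on the left-hand side.
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    # Transactions that contain every item on both sides.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / n_total
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical market baskets (not from the slides).
transactions = [
    {"snack", "soft drink"},
    {"snack", "soft drink", "bread"},
    {"snack", "soft drink"},
    {"snack", "soft drink", "milk"},
    {"snack", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"soft drink"},
    {"milk", "bread"},
]

support, confidence = support_confidence(transactions, {"snack"}, {"soft drink"})
print(f"Snack -> Soft Drink [{support:.0%}, {confidence:.0%}]")
```

Here 4 of the 10 baskets contain both items (support 40%), and 4 of the 5 baskets containing a snack also contain a soft drink (confidence 80%).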
Data Mining Tasks (1)
Classification or Prediction

• Find a model (function) that describes and differentiates classes or concepts for future predictions
• Presentation: decision tree, classification rule, neural network, etc.
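One of the simplest models of this kind is a single classification rule, equivalent to a one-level decision tree. The sketch below learns a threshold on one numeric attribute from labeled examples and then uses it for future predictions. The attribute (age), the labels (buys), and the toy data are hypothetical, and this is just one possible way to fit such a rule.

```python
def fit_stump(values, labels):
    """Pick the threshold that misclassifies the fewest training examples.

    The learned rule is: predict True when value >= threshold.
    """
    best_threshold, best_errors = None, len(values) + 1
    for threshold in sorted(set(values)):
        errors = sum((v >= threshold) != y for v, y in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def predict(threshold, value):
    """Apply the learned rule to a new, unlabeled value."""
    return value >= threshold

# Hypothetical training set: customer age and whether the customer bought the product.
ages = [22, 25, 31, 38, 45, 52]
buys = [False, False, False, True, True, True]

threshold = fit_stump(ages, buys)
print(threshold)               # 38
print(predict(threshold, 40))  # True
print(predict(threshold, 28))  # False
```

The class labels are known during training, which is what separates this task from cluster analysis.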
Cluster Analysis
• The class label is unknown
• To analyze the distribution pattern
• Maximizing intra-class similarity and minimizing inter-class similarity
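A common way to group unlabeled data like this is k-means, shown below in one dimension as a minimal sketch. The slides do not name a specific clustering algorithm, so k-means, the toy spending values, and the initial centroids are all assumptions chosen for illustration.

```python
def k_means_1d(points, centroids, iterations=10):
    """Cluster 1-D points by repeatedly assigning each point to its nearest centroid."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spending of six customers; no class labels are given.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(spend, centroids=[10, 100])
print(centroids)  # [11.0, 100.0]
print(clusters)   # [[10, 12, 11], [95, 100, 105]]
```

Points inside each cluster are close to each other (high intra-class similarity) and far from the other cluster (low inter-class similarity).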

Data Mining Tasks (2)
Outlier Analysis
• Outliers: part of the data that doesn't follow the general behavior of the data
• Can be seen as noise or exceptions, but useful in fraud detection and analysis of rare events
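As a rough illustration, one simple way to flag such values is a z-score test: a value is treated as an outlier when it lies more than a chosen number of standard deviations from the mean. The method, the threshold of 2.0, and the toy transaction amounts are assumptions for this sketch; the slides do not prescribe a detection technique.

```python
from statistics import mean, pstdev

def find_outliers(values, z_threshold=2.0):
    """Return the values whose z-score magnitude exceeds z_threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []           # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

# Hypothetical credit card transaction amounts; one is unusually large.
amounts = [20, 22, 19, 21, 23, 20, 22, 500]
outliers = find_outliers(amounts)
print(outliers)  # [500]
```

In a fraud detection setting the flagged transaction would be a candidate for review, not proof of fraud.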

Trend and evolution analysis
• Regression analysis
• Sequential pattern mining (e.g., antivirus)
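For the regression part, a minimal sketch is an ordinary least-squares line fitted to a short time series and used to extrapolate one step ahead. The monthly sales figures and the function name fit_line are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales per month (toy data).
months = [1, 2, 3, 4, 5]
sales = [12, 14, 16, 18, 20]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept
print(slope, intercept, forecast)  # 2.0 10.0 22.0
```

The slope describes the trend (two more units sold each month in this toy series), and the fitted line gives the forecast for month 6.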
Interesting Pattern?
A pattern is called interesting when it is easy to comprehend, valid on new data with a certain level of certainty, useful, novel, or when it validates a hypothesis generated by the user.

Applications of Data Mining

• Web page analysis: from web page classification and clustering to PageRank & HITS algorithms
• Collaborative analysis & recommender systems
• Basket data analysis for targeted marketing
• Biological and medical data analysis: classification, cluster analysis (microarray data analysis), biological sequence analysis, biological network analysis

Application: Classification

Direct Marketing
• Goal: reduce the cost of intermediaries by targeting a group of consumers who are likely to buy a new product.

Fraud Detection
• Goal: predict fraudulent cases in credit card transactions.
Application: Clustering

Market Segmentation
• Goal: Divide a heterogeneous market into relatively more homogeneous segments based on certain parameters such as geographic, demographic, psychographic, behavioral, etc.
• Approach:
- Gather different attributes of consumer information.
- Find clusters of similar consumers.
- Measure the quality of the clusters by comparing the purchase patterns of consumers in the same cluster with those of consumers in different clusters.
Application: Establishment of Association Rules

Managing the placement of goods in supermarkets
• Goal: identify items that are purchased together by many buyers.
• Approach: process point-of-sale data collected with barcode scanners to find dependencies between items.
• Example:
{Cola, … } --> {Potato Chips}
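A first step toward rules like the example above is counting how often pairs of items appear in the same basket. The sketch below does only that, on hypothetical baskets with an assumed minimum count. It is not a full association rule miner such as Apriori, and the elided items in the slide's example are not reconstructed.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count):
    """Count item pairs bought together and keep those seen at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical point-of-sale baskets (toy data).
baskets = [
    ["Cola", "Potato Chips", "Bread"],
    ["Cola", "Potato Chips"],
    ["Bread", "Milk"],
    ["Cola", "Potato Chips", "Milk"],
]

pairs = frequent_pairs(baskets, min_count=3)
print(pairs)  # {('Cola', 'Potato Chips'): 3}
```

A store could use such frequently co-purchased pairs as candidates when deciding which goods to place near each other.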
What We Have Learned

• Why data mining exists
• Definition of data mining
• KDD Process
• Data Mining System Architecture
• Data Mining Tasks
• Data Mining Applications
Thank you
Do you have any questions?

[email protected]
linkedin.com/dashofiana



Learning From Data
• Build learning models that automatically improve with experience.
• Top-down approach
- Generate ideas
- Develop models
- Validate models
• Bottom-up approach
- Discover new (unknown) patterns
- Find key relationships in data
Top-down Approach (Example)

• Start with a hypothesis derived from observation or prior knowledge
- “Tourists visiting Egypt earn an annual income of at least $50,000”
• The hypothesis is tested by querying the database, followed by analysis
• If the tests are not supportive, the hypothesis is revised and tested again
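The query-then-analyze step can be sketched as follows: select the records the hypothesis talks about, then measure what share of them satisfies it. The tourist records, field names, and the function name hypothesis_support are hypothetical, and deciding how large a share counts as "supportive" is left to the analyst.

```python
def hypothesis_support(records, destination, min_income):
    """Fraction of tourists to `destination` whose income is at least `min_income`."""
    # Query: select the incomes of tourists who visited the destination.
    incomes = [r["income"] for r in records if r["destination"] == destination]
    if not incomes:
        return None  # no data to test the hypothesis against
    # Analysis: share of the selected tourists that satisfies the hypothesis.
    return sum(income >= min_income for income in incomes) / len(incomes)

# Hypothetical tourist database (toy data).
tourists = [
    {"destination": "Egypt", "income": 62000},
    {"destination": "Egypt", "income": 48000},
    {"destination": "Egypt", "income": 75000},
    {"destination": "Egypt", "income": 51000},
    {"destination": "Italy", "income": 30000},
]

share = hypothesis_support(tourists, "Egypt", 50000)
print(share)  # 0.75
```

Since one of the four Egypt visitors earns less than $50,000, the hypothesis as stated would need revising (for example, to "most tourists") before being tested again.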
Bottom-up Approach (Example)

• No hypothesis to test
• “Find unknown buying patterns by analyzing the shopping basket”
• “ … showed married females, age 21 to 27, shopped for diapers also bought cosmetic products.”
• “store decided to stack cosmetics cases next to diaper shelf”
A Neuron Model



Learning in Neural Network
• Supervised
- The NN needs a teacher with a training set of examples of input and output
• Unsupervised (or Self-Supervised)
- Does not need a teacher
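To make the supervised case concrete, the sketch below trains a single neuron (a perceptron) on a training set of input/output examples, where the target outputs play the role of the teacher. The choice of the logical AND function as training data, the step activation, the learning rate, and the number of epochs are all assumptions for illustration and are not taken from the slides.

```python
def neuron_output(weights, bias, inputs):
    """Step-activation neuron: fire (1) when the weighted sum plus bias is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(examples, epochs=10, learning_rate=1.0):
    """Adjust weights from (inputs, target) pairs using the perceptron learning rule."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            # The teacher's target is compared with the neuron's current output.
            error = target - neuron_output(weights, bias, inputs)
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical training set: the logical AND of two binary inputs.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(examples)
predictions = [neuron_output(weights, bias, inputs) for inputs, _ in examples]
print(predictions)  # [0, 0, 0, 1]
```

An unsupervised network would receive only the inputs, with no target column to correct it.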
Association Rules

• A KB tool that generates a set of rules to help understand the relationships that exist in data
• Types:
- Boolean rule
- Quantitative rule
- Multi-dimensional rule
- Multi-level association rule
Boolean Rule (An Example)

• A rule that examines the presence or absence of items
• For example, if a customer buys a PC and a 17” monitor, then he will buy a printer. Presence of items (a PC and 17” monitor) implies presence of the printer in the customer’s buying list
Quantitative Rule (An Example)

• A rule that considers the quantitative values of items
• For example, if a customer earns between $30,000 and $50,000 and owns an apartment worth between $250,000 and $500,000, he will buy a 4-door automobile
Multi-dimensional Rule

• A rule that refers to a multitude of dimensions
• If a customer lives in a big city and earns more than $35,000, then he will buy a cellular phone
• This rule involves 3 attributes: living, earning, and buying. Therefore, it is a multi-dimensional rule
Multi-level Association Rule
