
Data Mining & Warehousing 01

This document provides an introduction to data mining and warehousing. It defines data mining as the process of analyzing large databases to find hidden patterns and relationships. The key reasons for using data mining now are the competitive pressure on businesses, the massive amount of data being collected and stored, increased computing power and availability of data mining algorithms and commercial tools. Data mining works with data warehouses, which store refined historic data to support management decision making and intelligence applications.

Uploaded by

Mulat Shieraw
Copyright © All Rights Reserved

Data Mining & Warehousing

(INSY3041)

Chapter 1
Introduction to Data Mining and Warehousing
(DMW)
Compiled by: Mulat S. 6/1/20
Data Mining & Warehousing 1
What is data mining?
Data is growing at a phenomenal rate. At the same time, users expect more
sophisticated information.
A marketing manager is no longer satisfied with a simple listing of
marketing contacts, but wants detailed information about customers' past
purchasing behavior and predictions of future purchases.
Data mining steps in to meet such needs.
– How? Data mining uncovers hidden patterns in a database.

Data mining is a technology that uses various techniques to discover
hidden knowledge from heterogeneous and distributed historical data
stored in large databases, warehouses and other massive information
repositories.
What is data mining? …
Data mining is the process of analyzing large databases using various
techniques to find patterns in data that are:
valid: they represent not only the current state but also hold on new data with
some certainty; the patterns hold in general
novel: they are non-obvious to the system and are generated as new facts; the
pattern was not known beforehand
useful: it should be possible to act on the item or problem; actions can be
devised from the patterns
understandable: humans should be able to interpret and comprehend the patterns



Why DM Now?
There are four main reasons why DM is needed now:
The competitive pressure is very strong
– How to gain a competitive advantage?
– How to cope with a volatile market?
– How to satisfy customers' (prosumers') needs?
– How to manage the high turnover rate of professionals?

Massive data collection: data is produced at an alarming rate and is being warehoused

The computing power is available and is affordable
DM commercial products and machine learning algorithms are available



Why use DM Now: Massive data collection
Massive data collection: large databases (data warehouses) are growing at
unprecedented rates to accommodate the explosive growth in stored data.
Data is being produced (generated and collected) at an alarming rate because of:
– The computerization of business and scientific transactions
– Advances in data collection tools, ranging from scanned texts and image platforms to
satellite remote sensing systems
– Popular use of the WWW as a global information system

Examples of massive data sets

Google: on the order of 10 billion Web pages indexed
– Hundreds of millions of site visitors per day
MEDLINE text database: 17 million published articles
Retail transaction data: eBay, Amazon, Wal-Mart: on the order of 100 million transactions per day
– Visa, MasterCard: similar or larger numbers
Too much data & too little knowledge
There is a need to extract knowledge (useful information) from the massive
data.
Strong competitive pressure creates a need for useful information for prediction,
classification, clustering, association and linking.
Faced with such enormous volumes of data, human analysts without special tools can
no longer make sense of them.
Data mining can automate the process of finding patterns & relationships in raw
data and the results can be utilized for decision support. That is why
data mining is used, especially in science and business areas.
If we know how to reveal valuable knowledge hidden in raw data, data might be
one of our most valuable assets.
Data mining is the tool that involves retrospective analysis to extract diamonds of
knowledge from historical data and to predict future outcomes.
Why use DM Now: Powerful computers
Powerful computers: The computing power is available and is also affordable
The need for improved computational engines can now be met in a cost-effective
manner with parallel multiprocessor computer technology.
Technological Driving Factors
Larger, cheaper memory (hundreds of GBs, not MBs)
– Moore’s law for magnetic disk density:
“capacity doubles every 18 months”
– Storage cost per byte is falling rapidly

Faster, cheaper processors (GHz, not MHz)
– The CRAY supercomputer of 15 years ago is now on your desk
Success of relational databases and the World Wide Web
– Everybody is a “data owner”
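The 18-month doubling rule quoted above compounds quickly. As a rough worked example, treating the rule as exact (which real hardware trends are not) and applying it over a 15-year span, the same span used in the CRAY comparison:

```latex
\text{doublings} = \frac{15 \times 12\ \text{months}}{18\ \text{months}} = 10,
\qquad
\text{capacity growth factor} = 2^{10} = 1024 \approx 10^{3}
```

So, under this assumption, disk capacity per unit area would grow roughly a thousandfold in 15 years.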



Why DM Now: DM algorithms
To support rational decisions in business and other applications, data mining tools and
techniques are needed to classify, cluster, associate and predict (future values) from data.
Commercial products for data mining are available.
Data mining algorithms have matured, and there are reliable tools that
consistently outperform older statistical methods.
New ideas in machine learning/statistics
– Boosting, SVMs, decision trees, non-parametric Bayes, text models, etc.
There are around 20-30 data mining tool vendors.
There are many embedded products for:
– Fraud detection
– CRM
– Health care (medical diagnosis)
– E-commerce applications (market segmentation, market basket analysis (MBA))



Example: Why Data Mining
Customer relationship management:
Which of my customers are likely to be the most loyal, and which are most likely
to leave for a competitor?
Credit ratings/targeted marketing:
Given a database of 100,000 names, which persons are the least likely to default
on their credit cards?
Identify likely responders to sales promotions

Fraud detection/Network intrusion detection


Which types of transactions are likely to be fraudulent, given the demographics
and transactional history of a particular customer?

•Data Mining helps extract such useful information


Database Processing vs. Data Mining
Aspect | Database processing | Data mining | Comments
Query | Well defined; structured query language | Poorly defined; no precise query language | The data miner might not know exactly what he wants to see (hidden patterns)
Data | Operational data | Not only operational data (historical + operational) | The data have been cleansed and modified to better support the mining process
Output | Precise; a subset of the database | Not a subset of the database (new, interesting, useful patterns) | The output is some hidden patterns and knowledge in the database



What is (not) Data Mining?
What is not data mining?
– Look up a phone number in a phone directory
– Query a Web search engine for information about “Amazon”
What is data mining?
– Discover that certain names are more prevalent in certain locations (O’Brien, O’Rourke, O’Reilly… in the Boston area)
– Discover groups of similar documents on the Web



Data Mining … then
Type of data + Type of interestingness criteria = Hidden patterns



Query Examples
Database
Find all credit applicants with first name ‘Alex’.
Identify customers who have purchased more than Birr 10,000 in the last month.

Find all customers who have purchased bread.

Data Mining
Find all credit applicants who pose no credit risk. (classification)
Identify customers with similar buying habits. (clustering)
Find all items which are frequently purchased with bread. (association rules)
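The contrast between the two kinds of question can be sketched on a handful of hypothetical baskets: the database query returns an exact, predefined set of customers, while the mining question asks which items tend to appear together with bread. The transactions and the 50% threshold are arbitrary assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical toy transactions: (customer id, set of items bought)
transactions = [
    ("c1", {"bread", "milk", "butter"}),
    ("c2", {"bread", "butter"}),
    ("c3", {"milk", "coffee"}),
    ("c4", {"bread", "milk"}),
    ("c5", {"bread", "butter", "jam"}),
]

# Database query: a precise, well-defined question with an exact answer set
bread_buyers = [cust for cust, items in transactions if "bread" in items]

# Data mining question: which items co-occur with bread, and how often?
co_counts = Counter()
for _, items in transactions:
    if "bread" in items:
        co_counts.update(items - {"bread"})

# Keep items that appear in at least half of the baskets containing bread
min_share = 0.5
frequent_with_bread = sorted(
    item for item, n in co_counts.items() if n / len(bread_buyers) >= min_share
)

print(bread_buyers)         # ['c1', 'c2', 'c4', 'c5']
print(frequent_with_bread)  # ['butter', 'milk']
```

The first answer was fully specified by the query; the second is a pattern nobody asked for by name.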



Primitives for specifying a data mining task



Data Mining works with Data Warehouse
• Data warehouse provides the enterprise with a memory
Enterprise + Memory
– Contains more refined data and acts like ROM (new data can be added, but existing data cannot be deleted or updated)
– An RDBMS responsible for the collection and storage of data to support management
decision making and problem solving

• Data mining provides the enterprise with intelligence
Enterprise + Intelligence (Applications)
Data Warehouse
Data warehouse
A data warehouse is an RDBMS responsible for the collection and storage of data to
support management decision making and problem solving.
It enables managers and other business professionals to carry out data mining, online analytical
processing (OLAP), market research and decision support.
It represents the current stage in the evolution of Decision Support Systems (DSSs).

Data mart
A subset of a data warehouse for small and medium-size businesses or departments
within larger companies



Data Warehouse Stores Heterogeneous Data
Data sources: relational databases, hierarchical databases, network databases, flat files, spreadsheets
→ Data extraction process
→ Data cleanup process
→ Data warehouse
→ End-user access through query and analysis tools



Data warehousing …
Data warehouse is an integrated, subject-oriented, time-variant,
non-volatile database that provides support for decision making.
Integrated: a centralized, consolidated database that integrates data
derived from the entire organization.
– Consolidates data from multiple, diverse sources with diverse formats.
– Helps managers better understand the company’s operations.

• Subject-oriented: the data warehouse contains data organized by topic
for a specific application domain.
– E.g., sales, marketing, finance, etc.



Data warehousing…
Time-variant: in contrast to operational databases, which focus on
current transactions, the data warehouse represents the flow of data
through time (operational + historical data).
The data warehouse contains data that reflect what happened last week, last month,
over the past five years, and so on.

Non-volatile: once data enter the data warehouse, they are never
removed, because the data in the warehouse represent the company’s
entire history.
– Because data are added all the time, the warehouse keeps growing.
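The non-volatile and time-variant properties can be illustrated with a tiny in-memory store that only supports appends and keeps every dated snapshot. `SalesSnapshot` and `AppendOnlyWarehouse` are hypothetical names invented for this sketch, not part of any product; a real warehouse enforces the same rules through its load process and access rights.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class SalesSnapshot:
    # One warehouse row; frozen=True makes it read-only after loading
    load_date: date
    product: str
    units_sold: int

@dataclass
class AppendOnlyWarehouse:
    # Hypothetical minimal store: rows can be added, never updated or deleted
    _rows: list = field(default_factory=list)

    def load(self, row: SalesSnapshot) -> None:
        self._rows.append(row)  # non-volatile: loading only appends

    def history(self, product: str) -> list:
        # Time-variant: every past snapshot is kept, ordered by load date
        return sorted(
            (r for r in self._rows if r.product == product),
            key=lambda r: r.load_date,
        )

    def __len__(self) -> int:
        return len(self._rows)

wh = AppendOnlyWarehouse()
wh.load(SalesSnapshot(date(2020, 1, 31), "laptop", 40))
wh.load(SalesSnapshot(date(2020, 2, 29), "laptop", 55))
wh.load(SalesSnapshot(date(2020, 2, 29), "printer", 12))
print([r.units_sold for r in wh.history("laptop")])  # [40, 55]
print(len(wh))                                        # 3
```

The February load did not overwrite January's figure; both remain available for trend analysis, and the warehouse only grows.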



Why Data Warehousing?
Which are our lowest/highest margin customers?
Who are my customers and what products are they buying?
What is the most effective distribution channel?
What product promotions have the biggest impact on revenue?
Which customers are most likely to go to the competition?
What impact will new products/services have on revenue and margins?



Database & data warehouse: Differences
The data warehouse and operational environments are separated. Data
warehouse receives its data from operational databases.
The data warehouse environment is characterized by read-only transactions on
very large data sets (like ROM).
The operational environment is characterized by numerous update transactions on a
few data entities at a time.
The data warehouse contains historical data over a long time horizon.

– Ultimately, information is created from data warehouses. Such information
becomes the basis for rational decision making.
– The data found in the data warehouse are analyzed to discover previously
unknown data characteristics, relationships, dependencies, or trends.



Data Processing Technologies
OLAP – Online Analytical Processing
Refers to an advanced data analysis environment that supports decision making.
Provides access to multidimensional databases with managerially useful display
techniques.
– Data mining tools analyze the data and uncover problems or opportunities hidden
in the data relationships.
– E.g., credit system: who is likely not to pay their debts?
– Crime database: who is likely to commit what kind of crime?

OLAP provides top-down, query-driven analysis.
Data mining provides bottom-up, discovery-driven analysis.
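To make the top-down, query-driven side concrete, here is a minimal sketch of an OLAP-style roll-up: the analyst chooses the dimensions and the tool aggregates a measure over them. The fact rows, the `roll_up` helper and the dimension names are hypothetical; real OLAP servers work over multidimensional cubes rather than Python lists.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, sales in Birr)
facts = [
    ("Addis Ababa", "Q1", "laptop", 120_000),
    ("Addis Ababa", "Q2", "laptop", 150_000),
    ("Addis Ababa", "Q1", "printer", 30_000),
    ("Bahir Dar", "Q1", "laptop", 40_000),
    ("Bahir Dar", "Q2", "printer", 10_000),
]

# Position of each dimension inside a fact row; index 3 holds the measure
DIMENSIONS = {"region": 0, "quarter": 1, "product": 2}

def roll_up(rows, *dims):
    """Sum the sales measure grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

# The analyst poses the question: total sales per region, then per region and quarter
by_region = roll_up(facts, "region")
by_region_quarter = roll_up(facts, "region", "quarter")
print(by_region)
print(by_region_quarter)
```

A data mining algorithm, by contrast, would not be told which grouping to examine; it would search the same rows for clusters or rules the analyst did not ask about.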



Data Mining(DM) vs. Knowledge Discovery in Databases(KDD)
KDD is often used as a synonym for Data Mining.
Some authors define KDD as the whole process involving: data selection → data
pre-processing (cleaning) → data transformation → data mining → result
evaluation → visualization/representation.
Data mining is the step in the KDD process that applies data analysis and discovery
algorithms.
KDD is the process of extracting previously unknown, valid, and actionable
(understandable) information from large databases.
It is the process of finding useful information and patterns in data.
DM is the use of algorithms to extract hidden patterns and knowledge
in data.
Knowledge discovery is NEEDED to make sense of data and put it to use.
Stages in data mining: The KDD process
• Knowledge discovery in databases (KDD) is the non-trivial process of identifying valid, potentially
useful and ultimately understandable patterns in data

• Data → Selection → Preprocessing (cleansing, reduction, integration) → Transformation →
Mining (modeling using algorithms) → Evaluation → Interpretation → Knowledge/patterns
Stages in DM: The KDD process …
Selection: Obtain data from various heterogeneous sources such as
databases, data warehouses, files, non-electronic records, etc.
Preprocessing: Cleanse inconsistent and incorrect data; fill incomplete
records; predict missing values; correct erroneous and anomalous data.
Transformation: Convert data from different sources into a common
format. Apply data reduction and data categorization/binning to ease data
mining.
Mining: Apply classification or clustering techniques to obtain predictive or
descriptive models.
Interpretation/Evaluation: Present results to the user in a meaningful manner
using various visualization and GUI strategies.
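The stages above can be sketched end to end on toy data. Everything here is hypothetical: two small record lists stand in for the output of the selection step, mean imputation stands in for preprocessing, a single income threshold of 5000 stands in for binning, and the "model" is only the average age per income band.

```python
from statistics import mean

# Selection (assumed done): records from two sources with different conventions
source_a = [{"age": "23", "income": "4500"}, {"age": "", "income": "7000"}]
source_b = [{"Age": 41, "Income": 12000}, {"Age": 35, "Income": None}]

def preprocess(records):
    """Unify field names and types, then fill missing values with the column mean."""
    rows = []
    for record in records:
        norm = {k.lower(): v for k, v in record.items()}
        rows.append({k: float(v) if v not in ("", None) else None for k, v in norm.items()})
    for col in ("age", "income"):
        fill = mean(row[col] for row in rows if row[col] is not None)
        for row in rows:
            if row[col] is None:
                row[col] = fill
    return rows

def transform(rows):
    """Binning: map income onto a two-value categorical scale."""
    for row in rows:
        row["income_band"] = "low" if row["income"] < 5000 else "high"
    return rows

def mine(rows):
    """Toy descriptive model: average age per income band."""
    bands = {}
    for row in rows:
        bands.setdefault(row["income_band"], []).append(row["age"])
    return {band: mean(ages) for band, ages in bands.items()}

data = transform(preprocess(source_a + source_b))
model = mine(data)
print(model)  # interpretation/evaluation: present the pattern to the user
```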



CRoss Industry Standard Process for Data Mining (CRISP-DM)



Phases and Tasks
Business Understanding
– Determine Business Objectives: Background; Business Objectives; Business Success Criteria
– Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
– Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
– Produce Project Plan: Project Plan; Initial Assessment of Tools and Techniques
Data Understanding
– Collect Initial Data: Initial Data Collection Report
– Describe Data: Data Description Report
– Explore Data: Data Exploration Report
– Verify Data Quality: Data Quality Report
Data Preparation (outputs: Data Set; Data Set Description)
– Select Data: Rationale for Inclusion / Exclusion
– Clean Data: Data Cleaning Report
– Construct Data: Derived Attributes; Generated Records
– Integrate Data: Merged Data
– Format Data: Reformatted Data
Modeling
– Select Modeling Technique: Modeling Technique; Modeling Assumptions
– Generate Test Design: Test Design
– Build Model: Parameter Settings; Models; Model Description
– Assess Model: Model Assessment; Revised Parameter Settings
Evaluation
– Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
– Review Process: Review of Process
– Determine Next Steps: List of Possible Actions; Decision
Deployment
– Plan Deployment: Deployment Plan
– Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
– Produce Final Report: Final Report; Final Presentation
– Review Project: Experience Documentation



Data Mining and Business Intelligence
Layers ordered by increasing potential to support business decisions (top = highest) | typical user
Making Decisions (informed decisions) | End User
Data Presentation: Visualization Techniques | Business Analyst
Data Mining: Information Discovery | Data Analyst
Data Exploration: Statistical Analysis, Querying and Reporting | Data Analyst
Data Warehouses / Data Marts: OLAP | DBA
Data Sources: Paper, Files, Information Providers, Database Systems, OLTP | DBA
DM Process Ex: Web Log
Web log: web access data from either the client or the server perspective (found on the client PC,
browser, proxy…)
Selection:
– Select the log data (dates and locations) to use → web access properties
Preprocessing (select the target data, remove unnecessary data):
– Remove identifying URLs
– Remove error logs
Transformation (bring the data into the same format):
– Sessionize (sort and group) the requests by time period
Data mining (find useful/interesting patterns in the data):
– Identify and count patterns; apply them to new web log data
– Construct data structure
Interpretation/Evaluation (display, visualize):
– Identify and display frequently accessed sequences.
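The transformation and mining steps for web logs can be sketched as follows. The log format, the 30-minute inactivity timeout used to split sessions (a common convention, assumed here) and the choice to count consecutive page pairs as the "pattern" are all assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical, already-preprocessed log lines: (client, timestamp, page)
log = [
    ("10.0.0.1", "2020-06-01 10:00:00", "/home"),
    ("10.0.0.1", "2020-06-01 10:01:30", "/courses"),
    ("10.0.0.1", "2020-06-01 10:03:00", "/courses/dmw"),
    ("10.0.0.2", "2020-06-01 10:02:00", "/home"),
    ("10.0.0.2", "2020-06-01 10:02:40", "/courses"),
    ("10.0.0.1", "2020-06-01 11:30:00", "/home"),     # gap > 30 min: new session
    ("10.0.0.1", "2020-06-01 11:31:00", "/courses"),
]

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout

def sessionize(entries):
    """Transformation: sort by client and time, then split on client change or long gaps."""
    parsed = sorted((c, datetime.fromisoformat(t), p) for c, t, p in entries)
    sessions, current, last = [], [], None
    for client, ts, page in parsed:
        if last and (client != last[0] or ts - last[1] > SESSION_GAP):
            sessions.append(current)
            current = []
        current.append(page)
        last = (client, ts)
    if current:
        sessions.append(current)
    return sessions

def frequent_pairs(sessions, min_count=2):
    """Mining: count consecutive page pairs and keep those seen at least min_count times."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return {pair: n for pair, n in counts.items() if n >= min_count}

sessions = sessionize(log)
print(len(sessions))             # 3
print(frequent_pairs(sessions))  # {('/home', '/courses'): 3}
```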



Origins of Data Mining
[Figure: timeline from pre-1960 through the 1960s, 1970s, 1980s and 1990s. Contributing strands: hardware (sensors, storage, computation), “pencil and paper”, EDA, “flexible models”, AI, pattern recognition, machine learning and relational databases, converging in data mining and its branches (text mining, Web mining, spatial mining, knowledge mining); “data dredging” is also marked.]



Data Mining Development
Concepts from several areas that feed into DATA MINING:
• Relational data model, SQL, association rule algorithms, data warehousing, scalability techniques
• Similarity measures, hierarchical clustering, IR systems, imprecise queries, textual data, Web search engines
• Bayes theorem, regression analysis, EM algorithm, k-means clustering, time series analysis
• Algorithm design techniques, algorithm analysis, data structures
• Neural networks, decision tree algorithms
HIGH PERFORMANCE
DM: Intersection of Many Fields
• Data mining overlaps with machine learning, statistics,
artificial intelligence, databases, visualization
Fields surrounding data mining: machine learning (ML), statistics (stats), data structures and
algorithm analysis, databases (DB), visualization (viz), human-computer interaction (HCI),
high-performance parallel computing, information retrieval
Data Mining Metrics
How can the effectiveness or usefulness of a data mining approach be measured?
Return on Investment (ROI)
From an overall business or usefulness perspective, a measure such as ROI is used.
ROI compares the costs of DM techniques against the savings or benefits from their use.

Accuracy in classification
Analyze true positives, false positives and false negatives to calculate the recall and precision of the system.
Measure the percentage of correct classifications and misclassifications.

Space/Time complexity
Running time: how fast the algorithm runs
Storage or memory space requirement
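The classification metrics above follow directly from the four confusion-matrix counts. This sketch uses made-up labels for a hypothetical fraud classifier; the `roi` helper assumes the common (benefit - cost) / cost definition, since the slide does not fix a formula.

```python
def classification_metrics(actual, predicted, positive="fraud"):
    """Count TP/FP/FN/TN for one positive class and derive the usual ratios."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
        "error_rate": (fp + fn) / len(pairs),
    }

def roi(benefit, cost):
    """Return on investment: net gain relative to cost (assumed definition)."""
    return (benefit - cost) / cost

actual    = ["fraud", "ok", "ok", "fraud", "ok", "fraud", "ok", "ok"]
predicted = ["fraud", "fraud", "ok", "ok", "ok", "fraud", "ok", "ok"]
m = classification_metrics(actual, predicted)
print(m)                                   # precision 2/3, recall 2/3, accuracy 0.75
print(roi(benefit=150_000, cost=100_000))  # 0.5, i.e. 50 %
```

Here TP = 2, FP = 1, FN = 1 and TN = 4, so precision and recall are both 2/3 while accuracy is 6/8.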



Data Mining implementation issues
Scalability
Data mining techniques should perform well on massive real-world
data sets.
Techniques should also work regardless of the amount of available main memory.
Real-world data (quality of data)
Real-world data are noisy and have many missing attribute values. Algorithms
should be able to work even in the presence of these problems.
Updates (flexibility)
The database cannot be assumed to be static; the data change frequently.
However, many data mining algorithms work with static data sets. This requires
that the algorithm be completely rerun any time the database changes.
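One common way to address both scalability and updates is to process data in a single pass and keep summary counts that can be updated when new records arrive, instead of rerunning over the whole database. The `IncrementalPairCounter` class below is a hypothetical sketch of that idea for item-pair support; it is not a specific published algorithm.

```python
from collections import Counter
from itertools import combinations

class IncrementalPairCounter:
    """Keep item-pair counts up to date as new transactions arrive."""

    def __init__(self):
        self.n_transactions = 0
        self.pair_counts = Counter()

    def add_batch(self, transactions):
        # One pass over the new batch only; memory grows with the number of
        # distinct pairs, not with the number of stored transactions.
        for items in transactions:
            self.n_transactions += 1
            self.pair_counts.update(combinations(sorted(set(items)), 2))

    def support(self, a, b):
        pair = tuple(sorted((a, b)))
        return self.pair_counts[pair] / self.n_transactions

counter = IncrementalPairCounter()
counter.add_batch([["bread", "milk"], ["bread", "butter"]])       # initial load
print(counter.support("bread", "milk"))                            # 0.5
counter.add_batch([["milk", "bread"], ["bread", "milk", "jam"]])   # later update
print(counter.support("bread", "milk"))                            # 0.75
```

Only the new batch is scanned on update; the earlier transactions are never read again.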



Data Mining implementation issues …
High dimensionality:
A conventional database schema may be composed of many different attributes. The
problem here is that not all attributes may be needed to solve a given DM problem.
The use of unnecessary attributes may increase the overall complexity and decrease the
efficiency of an algorithm.
The solution is dimensionality reduction (reducing the number of attributes). However,
determining which attributes are not needed is a tough task!
Overfitting (fit of the model/pattern vs. DB states)
The size and representativeness of the dataset determine whether the model
associated with a given database state also fits future database states.
Overfitting occurs when the model fits the training data but does not fit future states;
it is typically caused by a small or unbalanced training database.
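For the dimensionality problem above, one very simple reduction heuristic is to drop attributes that barely vary, since they cannot help distinguish records. This is only one assumed criterion (a variance threshold), not the general solution; variance depends on an attribute's units, so in practice attributes would be normalized first. The records and the 0.5 threshold are hypothetical.

```python
from statistics import pvariance

# Hypothetical records; "branch" never changes and "visits" barely changes
records = [
    {"age": 23, "income": 4500, "branch": 1, "visits": 3},
    {"age": 41, "income": 12000, "branch": 1, "visits": 3},
    {"age": 35, "income": 7800, "branch": 1, "visits": 4},
]

def drop_low_variance(rows, threshold=0.5):
    """Keep only attributes whose population variance exceeds the threshold."""
    keep = [col for col in rows[0] if pvariance([r[col] for r in rows]) > threshold]
    return [{col: r[col] for col in keep} for r in rows], keep

reduced, kept = drop_low_variance(records)
print(kept)        # ['age', 'income']
print(reduced[0])  # {'age': 23, 'income': 4500}
```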



Data Mining implementation issues …
Ease of Use of the DM tool
Since data mining problems are often not precisely stated, interfaces may be
needed with both domain and technical experts
Although some techniques may work well, they may not be accepted by users if
they are difficult to use or understand
Application
Determining the intended use for the information obtained from the DM tool is a
challenge.
Indeed, how business executives can effectively use the output is sometimes considered
the most difficult part, because the results are of a type that has not previously been
known (surprising patterns).
Business practices may have to be modified to determine how to effectively use the
information uncovered. The model/pattern may also fail to fit future DB states, giving
unexpected results (e.g., because of a small or unbalanced training set in the DBs/WHs).
Focus area of DM
Designing efficient DM algorithms and architectures
that scale with the number of features and instances extracted from
high-dimensional databases
Data miners that handle large, heterogeneous data (including multimedia data,
spatial data, textual data, time series data …): the data types to be mined and the
techniques used
Presentation of DM results → visualization
To view and understand the output of DM algorithms easily, knowledge
representations (decision trees, rules, equations, semantic networks) and
visualization techniques (such as graphs, bar charts, etc.) are needed.
Integration of DM functions into traditional DBMS in order to design an
intelligent database



Focus area of DM …
Advertising
Bioinformatics
Customer Relationship Management (CRM)
Database Marketing
Fraud Detection
E-commerce
Health Care
Investment/Securities
Manufacturing, Process Control
Sports and Entertainment
Telecommunications
Web
DM Architecture
Database, data warehouse, WWW or other information
repository (store data)
Database or data warehouse server (fetch and
combine data)
Knowledge base (turn data into meaningful groups
according to domain knowledge)
Data mining engine (perform mining tasks)
Pattern evaluation module (find interesting patterns)
User interface (interact with the user)



DM Architecture



Data Warehouse: A Multi-Tiered Architecture
Data sources: operational DBs and other sources
→ Extract, transform, load, refresh (coordinated by a monitor & integrator, with metadata)
→ Data storage: data warehouse and data marts
→ OLAP engine: OLAP server
→ Front-end tools: analysis, query, reports, data mining
DM Tasks and Models



A Multi-Dimensional View of Data Mining Classification
Databases to be mined
Relational, transactional, object-oriented, object-relational, active, spatial, time-series,
text, multi-media, heterogeneous, legacy, WWW, etc.
Knowledge to be mined
Characterization, discrimination, association, classification, clustering, trend,
deviation and outlier analysis, etc.
Multiple/integrated functions and mining at multiple levels
Techniques utilized
Database-oriented, data warehouse (OLAP) oriented, machine learning, statistics,
visualization, neural network, etc.
Applications adapted
– Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining,
Weblog analysis, etc.
Data Mining Tasks and Models
Data mining tasks are generally divided into two major categories:
• Predictive tasks - Use some attributes to predict unknown or future values of other attributes.
– Predictive data mining tasks perform inference on the current data in order to
make predictions about the future.
» Classification
» Regression
» Deviation Detection
• Descriptive tasks - Find human-interpretable patterns that describe the data.
– Descriptive data mining tasks characterize the general properties of the data in a database.
» Association Discovery
» Clustering
» etc.



Predictive Data Mining (Supervised Learning)
Given a collection of records (training set)
Each record contains a set of attributes, one of the attributes is the class.
Find ("learn") a model for the class attribute as a function of the values of
the other attributes.

Goal: previously unseen records should be assigned a class as accurately
as possible.
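The supervised setting can be illustrated with one of the simplest learners, a 1-nearest-neighbour classifier; the slide does not name an algorithm, so this is only an example choice. The training records, attributes and class labels are hypothetical, the attributes are not rescaled, and accuracy is measured on held-out records the model has not seen.

```python
import math

# Hypothetical training set: ((income in thousands of Birr, years as customer), class)
training = [
    ((4.5, 1), "risky"),
    ((5.0, 2), "risky"),
    ((12.0, 6), "safe"),
    ((9.5, 8), "safe"),
]

def predict_1nn(record, train):
    """Assign the class of the closest training record (Euclidean distance)."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], record))
    return label

# Previously unseen records, with their true classes for evaluation
test = [((4.8, 1), "risky"), ((11.0, 7), "safe"), ((6.0, 5), "safe")]
correct = sum(predict_1nn(x, training) == y for x, y in test)
print(correct / len(test))  # 2 of 3 test records classified correctly
```

The third test record is misclassified, a reminder that "as accurately as possible" rarely means perfectly.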



Data Mining Functionalities
The supervised predictive data mining functionalities include:
Classification
– Finding the class label for an unknown data point
Regression
– Predicting the value of a given dependent variable
Time series
– Finding the relation between previous observations and the current one for a given variable
Estimation
– Estimating a missing element, which can be done by classification, regression or time series analysis
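For regression, a minimal example is ordinary least squares with one independent variable. The data are made up and perfectly linear, so the fitted line is exact; real data would leave residual error.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with one independent variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b

# Hypothetical data: advertising spend (x) vs. sales (y), both in thousands of Birr
ads = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
a, b = fit_line(ads, sales)
print(a, b)       # 1.0 2.0
print(a + b * 6)  # predicted sales for a spend of 6: 13.0
```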



Data Mining Functionalities
The unsupervised descriptive data mining functionalities include:
• Concept/class description: characterization (summarization) and discrimination
– Characterization refers to describing a given set of data that belongs to a class
– Discrimination refers to finding the similarities and differences between one class and another class (or classes)
• Association analysis
– Finding how one or more attributes and their values are related, i.e., which values tend to occur together (co-occurrence, not necessarily cause and effect)
• Clustering analysis
– Finding the possible set of classes (clusters) for a given set of data that minimizes the intra-cluster distance and
maximizes the inter-cluster distance
• Outlier analysis
– Finding data or activities that deviate from normal behaviour
• Trend or evolution analysis
– Identifying the evolutionary trend in the data, which may be used to make predictions about the present or the future
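Clustering with small distances within clusters and large distances between them is commonly done with k-means; the sketch below is a plain version of Lloyd's algorithm on hypothetical customer data. The number of clusters k, the random seed and the fixed iteration count are assumptions, and because the attributes are not rescaled, the attribute with the larger range contributes more to the distance.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster ends up empty
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (age, monthly spend in hundreds of Birr)
customers = [(22, 3), (25, 4), (23, 2), (52, 15), (48, 14), (55, 16)]
centroids, clusters = kmeans(customers, k=2)
print(sorted(centroids))  # roughly (23.3, 3.0) and (51.7, 15.0)
```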
Potential Applications of Data Mining
Data mining has many varied fields of application, some of which are
listed below.
1. Marketing analysis
2. Fraud detection and management
3. Insurance and health care analysis
4. Transportation analysis
5. Medicine analysis



Potential Applications of Data Mining
Data mining can be applied in various application areas
1. Target Market Analysis
– Given a product, who should be targeted so that they will buy the product?
– This requires analyzing which type of customer is likely to buy which type of product
– This tells us where, when and to whom to direct promotions using various strategies
– Data mining can also predict the response rate of a marketing campaign conducted by email,
phone calls or advertising
– E.g., customers who are BSc students are candidates to buy a new laptop model, so the company (if it sells one)
can promote the laptop primarily to them
2. Market Basket Analysis
– Which items should be placed together on a shopping shelf so that customers can conveniently
find the items they need in the same vicinity?
– A person who buys chat also buys Coke
– A person who buys meat on Friday night also buys beer
– A person who buys diapers on Thursday night will also buy beer
Potential Applications
Data mining can be applied in various application areas
3. Market Segmentation
– Clustering customers by their demographic, geographic, behavioral and physiological information and
treating them accordingly
– Examples:
– The telecommunication agency of Ethiopia has manually derived customer types that it uses to
treat customers differently
– Travel agencies handle travelers differently depending on their status
– Retailers have segments that allow or disallow credit purchases
– The Ministry of Education has segments to treat students differently based on their gender (affirmative
action)



Potential Applications
Data mining can be applied in various application areas
4. Fraud detection and management
Intrusion detection
Misuse detection of resources
Malicious activity detection
Credit card fraud
Telephone line fraud
5. Identify loyal customers vs. risky customers
– Loan risk assessment
– Protection against customers' abuse of the various insurance products

6. Risk analysis and management
– Forecasting, customer retention, improved underwriting, quality control, competitive analysis
Question: Are All the “Discovered” Patterns Interesting?
A pattern is interesting if it is easily understood by humans, valid on new or test data with some
degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to
confirm.
An interesting pattern represents knowledge.
– Interestingness measures
– Two types (objective vs. subjective)
– Objective: based on statistics and structures of patterns, e.g., support, confidence, error (mean squared error,
absolute error, etc.), similarity measures, etc.
– Subjective: based on the user’s beliefs about the data, e.g., unexpectedness (contradicting a user’s belief), novelty,
actionability, etc.
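Support and confidence, the first two objective measures listed above, are simple ratios over transactions: support is the fraction of transactions that contain an itemset, and the confidence of a rule X → Y is support(X ∪ Y) / support(X). The baskets below are hypothetical; confidence is computed from raw counts to avoid floating-point rounding.

```python
def count(itemset, transactions):
    """Number of transactions containing every item of the itemset."""
    return sum(set(itemset) <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of transactions containing the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs) for the rule lhs -> rhs."""
    return count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "milk", "coffee"},
]

# Rule: {bread} -> {milk}
print(support({"bread", "milk"}, baskets))       # 0.6
print(confidence({"bread"}, {"milk"}, baskets))  # 0.75
```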

End!!!
Question ???

