Unit 5 Introduction To Data Mining: Prashasti Kanikar 9/26/2020

This document introduces data mining, whose aim is to discover hidden patterns in large databases. It describes the kinds of data that can be mined, including relational databases, time-series data, graphs, text, and web data. It also outlines several data mining functions, such as classification, clustering, association analysis, and outlier detection, and closes with common applications and major issues in data mining.

Unit 5

Introduction to Data Mining

1 PRASHASTI KANIKAR 9/26/2020

• Data mining is “an information extraction activity whose goal is to discover hidden facts contained in large databases.”


Data Mining: On What Kinds of Data?

• Database-oriented data sets and applications
  • Relational database, data warehouse, transactional database
• Advanced data sets and advanced applications
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data (incl. bio-sequences)
  • Structure data, graphs, social networks and multi-linked data
  • Object-relational databases
  • Heterogeneous databases and legacy databases
  • Spatial data and spatiotemporal data
  • Multimedia database
  • Text databases
  • The World-Wide Web


What Kinds of Patterns Can Be Mined?
Data Mining Function: (1) Generalization

• Information integration and data warehouse construction
  • Data cleaning, transformation, integration, and multidimensional data model
• Data cube technology
  • Scalable methods for computing (i.e., materializing) multidimensional aggregates
  • OLAP (online analytical processing)
• Multidimensional concept description: Characterization and discrimination
  • Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet region
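To make "multidimensional aggregates" concrete, here is a minimal sketch of a roll-up: summing a measure while keeping only some dimensions. The fact table, the dimension names, and the `roll_up` helper are all hypothetical and chosen only for illustration; real data cube systems materialize such aggregates far more efficiently.

```python
from collections import defaultdict

# Hypothetical fact table: (region, quarter, product, units_sold).
sales = [
    ("dry", "Q1", "umbrella", 3),
    ("dry", "Q2", "umbrella", 1),
    ("wet", "Q1", "umbrella", 40),
    ("wet", "Q2", "umbrella", 55),
    ("dry", "Q1", "sunscreen", 30),
    ("wet", "Q1", "sunscreen", 8),
]

DIMENSIONS = ("region", "quarter", "product")


def roll_up(rows, keep):
    """Sum the measure (last column) over every dimension not in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


by_region = roll_up(sales, ["region"])                      # one cell per region
by_region_product = roll_up(sales, ["region", "product"])   # a finer cuboid
grand_total = roll_up(sales, [])                            # the apex of the cube
print(by_region)
```

Comparing the "dry" and "wet" cells of `by_region_product` is a small instance of discrimination: contrasting the characteristics of two classes of data.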

Data Mining Function: (2) Association and Correlation Analysis
• Frequent patterns (or frequent itemsets)
  • What items are frequently purchased together in your Walmart?
• Association, correlation vs. causality
  • A typical association rule
    • Bread → Butter [0.5%, 75%] (support, confidence)
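The two numbers attached to a rule can be computed directly from transaction data. The sketch below uses a hypothetical list of baskets; `support` is the fraction of all transactions containing every item of the rule, and `confidence` is the fraction of transactions containing the antecedent that also contain the consequent. The helper names are assumptions for illustration.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of antecedent transactions that also contain the consequent."""
    both = count(transactions, set(antecedent) | set(consequent))
    return both / count(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

rule_support = support(baskets, {"bread", "butter"})          # 3 of 5 baskets
rule_confidence = confidence(baskets, {"bread"}, {"butter"})  # 3 of 4 bread baskets
print(rule_support, rule_confidence)
```

Note that `confidence` divides by the antecedent count, so it is undefined for an antecedent that never occurs. A high confidence also says nothing about causality, which is the distinction the slide draws.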


Data Mining Function: (3) Classification

• Classification and label prediction
  • Construct models (functions) based on some training examples
  • Describe and distinguish classes or concepts for future prediction
    • E.g., classify countries based on (climate), or classify cars based on (gas mileage)
  • Predict some unknown class labels
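As a rough illustration of building a model from training examples and then predicting unknown labels, the sketch below uses a nearest-centroid classifier on a single feature (gas mileage). This is only one simple choice among many classification methods, and the training data and label names are hypothetical.

```python
from collections import defaultdict


def train(examples):
    """Learn one centroid (mean feature value) per class label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in examples:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Assign the label whose centroid is closest to `value`."""
    return min(model, key=lambda label: abs(model[label] - value))


# Hypothetical training examples: (miles per gallon, class label).
training = [
    (15, "low"), (18, "low"),
    (22, "medium"), (26, "medium"),
    (34, "high"), (40, "high"),
]

model = train(training)
print(predict(model, 17), predict(model, 30), predict(model, 38))
```

The model here is just a dictionary of class centroids; the training step is supervised because every example carries a known label.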


Data Mining Function: (4) Cluster Analysis

• Unsupervised learning (i.e., class label is unknown)
• Group data to form new categories (i.e., clusters), e.g., cluster houses to find distribution patterns
• Principle: Maximizing intra-class similarity & minimizing interclass similarity
• Many methods and applications
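One of the many clustering methods is k-means. The sketch below runs it on one-dimensional data with caller-supplied starting centroids so that the result is deterministic. The house prices and the choice of three clusters are hypothetical, and k-means is used here only as a familiar example, not as the method the slides prescribe.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical house prices (in thousands); no class labels are given.
prices = [100, 110, 120, 300, 310, 320, 900, 950]
centers, groups = k_means(prices, centroids=[100, 300, 900])
print(centers, groups)
```

Points within a group are close to each other and far from the other groups, which is the intra-class versus interclass principle stated above.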


Data Mining Function: (5) Outlier Analysis

• Outlier analysis
  • Outlier: A data object that does not comply with the general behavior of the data
  • Noise or exception? One person’s garbage could be another person’s treasure
  • Methods: by-product of clustering or regression analysis, …
  • Useful in fraud detection, rare events analysis
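A minimal sketch of outlier detection as a by-product of regression: fit a least-squares line, then flag points whose residual is unusually large. The data, the helper names, and the threshold of two standard deviations are assumptions made for illustration; the slides do not specify a particular rule.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residual_outliers(xs, ys, threshold=2.0):
    """Indices whose residual exceeds `threshold` standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if spread and abs(r) > threshold * spread]


# Hypothetical data that follows y = 2x except for one point.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2, 4, 6, 8, 10, 60, 14, 16, 18, 20]
suspects = residual_outliers(xs, ys)
print(suspects)
```

Whether a flagged point is noise to discard or an exception worth investigating (for example, a fraudulent transaction) depends on the application.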


Time and Ordering: Sequential Pattern, Trend and Evolution Analysis

• Sequence, trend and evolution analysis
  • Trend, time-series, and deviation analysis: e.g., regression and value prediction
  • Sequential pattern mining
    • e.g., first buy digital camera, then buy large SD memory cards
  • Periodicity analysis
  • Motifs and biological sequence analysis
    • Approximate and consecutive motifs
  • Similarity-based analysis
• Mining data streams
  • Ordered, time-varying, potentially infinite, data streams
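The camera-then-memory-card example can be checked against purchase histories by measuring how many sequences contain the pattern in order. The sketch below only evaluates a given pattern; it does not search for frequent patterns as a full sequential pattern mining algorithm would. The histories and function names are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """True if the items of `pattern` occur in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator past the match,
    # so later pattern items must appear after earlier ones.
    return all(item in remaining for item in pattern)


def sequence_support(sequences, pattern):
    """Fraction of sequences that contain `pattern` as an ordered subsequence."""
    return sum(contains_in_order(s, pattern) for s in sequences) / len(sequences)


# Hypothetical purchase histories, one list per customer, oldest purchase first.
histories = [
    ["camera", "sd_card", "tripod"],
    ["sd_card", "camera"],
    ["camera", "bag", "sd_card"],
    ["laptop"],
]

camera_then_card = sequence_support(histories, ["camera", "sd_card"])
card_then_camera = sequence_support(histories, ["sd_card", "camera"])
print(camera_then_card, card_then_camera)
```

Unlike the itemset support used for association rules, the order of the items matters here, so the two patterns get different support values.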


Structure and Network Analysis
• Graph mining
  • Finding frequent subgraphs (e.g., chemical compounds), trees (XML), substructures (web fragments)
• Information network analysis
  • Social networks: actors (objects, nodes) and relationships (edges)
    • e.g., author networks in CS, terrorist networks
  • Multiple heterogeneous networks
    • A person could belong to multiple information networks: friends, family, classmates, …
  • Links carry a lot of semantic information: Link mining
• Web mining
  • Web is a big information network: from PageRank to Google
  • Analysis of Web information networks
    • Web community discovery, opinion mining, usage mining, …
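Since PageRank is named above as an example of mining the Web's link structure, here is a minimal power-iteration sketch. The four-page graph is hypothetical, the damping factor of 0.85 is a conventional choice rather than something the slides state, and the code assumes every linked page also appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web graph: page -> pages it links to.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The ranks always sum to one, and a page with many incoming links from well-ranked pages ends up near the top, which is the sense in which links carry semantic information.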


Technologies to be used

[Figure: Data Mining: Confluence of Multiple Disciplines. Data mining sits at the center, surrounded by machine learning, pattern recognition, statistics, visualization, high-performance computing, database technology, algorithms, and applications.]

Why Confluence of Multiple Disciplines?
• Tremendous amount of data
  • Algorithms must be highly scalable to handle data volumes such as terabytes
• High-dimensionality of data
  • Micro-array may have tens of thousands of dimensions
• High complexity of data
  • Data streams and sensor data
  • Time-series data, temporal data, sequence data
  • Structure data, graphs, social networks and multi-linked data
  • Heterogeneous databases and legacy databases
  • Spatial, spatiotemporal, multimedia, text and Web data
  • Software programs, scientific simulations
• New and sophisticated applications


Data Mining Application: Marketing

• Sales Analysis
  • Associations between product sales:
    • Bread and butter
    • Toothpaste and toothbrush
• Customer Profiling
  • Data mining can tell you what types of customers buy what products
• Identifying Customer Requirements
  • Identify the best products for different customers
  • Use prediction to find what factors will attract new customers

Data Mining Application: Fraud Detection

• Association rule mining can detect a group of people who stage accidents to collect on insurance
• A data-mining application can be used to detect suspicious money transactions
• Data mining can be used to help commercial lending decisions and to prevent fraud


Other Applications of Data Mining

• Web page analysis: from web page classification, clustering to PageRank & HITS algorithms
• Collaborative analysis & recommender systems
• Basket data analysis to targeted marketing
• Biological and medical data analysis: classification, cluster analysis (microarray data analysis), biological sequence analysis, biological network analysis
• Data mining and software engineering (e.g., IEEE Computer, Aug. 2009 issue)
• From major dedicated data mining systems/tools (e.g., SAS, MS SQL-Server Analysis Manager, Oracle Data Mining Tools) to invisible data mining


Major Issues in Data Mining

• Mining Methodology
  • Mining various and new kinds of knowledge
  • Mining knowledge in multi-dimensional space
  • Data mining: An interdisciplinary effort
  • Boosting the power of discovery in a networked environment
  • Handling noise, uncertainty, and incompleteness of data
  • Pattern evaluation and pattern- or constraint-guided mining
• User Interaction
  • Interactive mining
  • Incorporation of background knowledge
  • Presentation and visualization of data mining results
