Advanced-Applications

Data Mining IOE - Chapter 7 Notes

Uploaded by

flamboyantmcclintock4

Unit 7

Advanced Applications

1
Advanced Applications
• Multimedia data mining
• Similarity search in multimedia data
• Mining association in multimedia data
• An introduction to text mining
• Natural language processing and information extraction
• Web mining
– Web content mining
– Web structure mining
– Web usage mining

2
Mining Complex Data Types

3
Mining Sequence Data
• A sequence is an ordered list of events.
• Sequences may be categorized into three groups, based on the
characteristics of the events they describe:
(1) time-series data,
(2) symbolic sequence data, and
(3) biological sequences.

4
Mining Sequence Data
• Time-series data consist of long sequences of numeric data,
recorded at equal time intervals (e.g., per minute, per hour, or
per day).
• Time-series data can be generated by many natural and economic
processes, such as stock markets, and by scientific, medical, or
natural observations.

5
Mining Sequence Data
• Symbolic sequence data consist of long sequences of event or
nominal data, which typically are not observed at equal time
intervals.
• For many such sequences, gaps (i.e., lapses between recorded
events) do not matter much.
• Examples include customer shopping sequences and web click
streams, as well as sequences of events in science and
engineering and in natural and social developments.

6
Mining Sequence Data
• Biological sequences include DNA and protein sequences. Such
sequences are typically very long, and carry important,
complicated, but hidden semantic meaning.
• Here, gaps are usually important.

7
Mining Sequence Data: Similarity Search in
Time-Series Data
• Unlike normal database queries, which find data that match a
given query exactly, a similarity search finds data sequences
that differ only slightly from the given query sequence.
• Many time-series similarity queries require subsequence
matching, that is, finding a set of sequences that contain
subsequences that are similar to a given query sequence.
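
Subsequence matching can be sketched as a sliding-window scan. The example below is a minimal illustration, not a prescribed method: it assumes Euclidean distance as the similarity measure and a user-chosen threshold `epsilon`, and the function names and sample values are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_similar_subsequences(series, query, epsilon):
    """Return (start_index, distance) for every window within epsilon of the query."""
    m = len(query)
    matches = []
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = euclidean(window, query)
        if d <= epsilon:  # "differs only slightly" is modelled as distance <= epsilon
            matches.append((start, d))
    return matches

# Illustrative data (assumed values, not from the notes).
series = [1.0, 2.0, 3.1, 4.0, 3.0, 2.0, 1.0, 2.1, 2.9, 4.2]
query = [2.0, 3.0, 4.0]
matches = find_similar_subsequences(series, query, epsilon=0.5)
match_starts = [start for start, _ in matches]
print(match_starts)
```

The scan costs O(n·m) distance work for a series of length n and a query of length m, which is one reason reduction and indexing techniques are usually applied first on large collections.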

8
Mining Sequence Data: Similarity Search in
Time-Series Data
• For similarity search, it is often necessary to first perform data or
dimensionality reduction and transformation of time-series data.
Typical dimensionality reduction techniques include
(1) the discrete Fourier transform (DFT),
(2) discrete wavelet transforms (DWT), and
(3) principal component analysis (PCA) based on singular value
decomposition (SVD).
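
As a rough sketch of DFT-based reduction, the code below keeps only the first k Fourier coefficients of each series as a compact signature. It assumes a naive O(n²) transform scaled by 1/sqrt(n) and Euclidean distance; the names `dft`, `reduce_dft`, and `distance` are hypothetical. Under that scaling, Parseval's theorem means the distance between truncated signatures never exceeds the distance between the full series, so the signatures can serve as a cheap pre-filter.

```python
import cmath
import math

def dft(series):
    """Naive discrete Fourier transform, scaled by 1/sqrt(n) so that it preserves distances."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(series))
        / math.sqrt(n)
        for k in range(n)
    ]

def reduce_dft(series, k):
    """Keep only the first k DFT coefficients as a low-dimensional signature."""
    return dft(series)[:k]

def distance(a, b):
    """Euclidean distance; works for real or complex sequences."""
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)))

# Two illustrative series: a sine wave and the same wave shifted by one step.
a = [math.sin(2 * math.pi * t / 16) for t in range(16)]
b = [math.sin(2 * math.pi * (t + 1) / 16) for t in range(16)]

full_dist = distance(a, b)
reduced_dist = distance(reduce_dft(a, 3), reduce_dft(b, 3))
print(full_dist, reduced_dist)  # the reduced distance is a lower bound on the full one
```

In practice a fast Fourier transform from a numerical library would replace the naive loop; the truncation idea stays the same.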

9
Mining Sequence Data: Regression and
Trend Analysis in Time-Series Data
• Regression analysis of time-series data has been studied
substantially in the fields of statistics and signal analysis.
• However, one may often need to go beyond pure regression analysis
and perform trend analysis for many practical applications.

10
Mining Sequence Data: Regression and
Trend Analysis in Time-Series Data
• Trend analysis builds an integrated model using the following four major
components or movements to characterize time-series data:
1. Trend or long-term movements: These indicate the general direction in
which a time-series graph is moving over time. Trend curves, such as the
dashed curve indicated in the figure, can be found using, for example, the
weighted moving average and least squares methods.
2. Cyclic movements: These are the long-term oscillations about a trend line
or curve.
3. Seasonal variations: These are nearly identical patterns that a time series
appears to follow during corresponding seasons of successive years, such as
holiday shopping seasons. For effective trend analysis, the data often need to be
“deseasonalized” based on a seasonal index computed by autocorrelation.
4. Random movements: These characterize sporadic changes due to chance
events, such as labor disputes or announced personnel changes within
companies.
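
The two trend-finding methods named under item 1 can be sketched as follows. This is a minimal illustration covering only the trend component; the function names, the sample series, and the weights (1, 4, 1) are assumptions chosen for the example.

```python
def moving_average(series, window):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]

def weighted_moving_average(series, weights):
    """Moving average in which each position in the window has its own weight."""
    total = sum(weights)
    w = len(weights)
    return [sum(x * wt for x, wt in zip(series[i:i + w], weights)) / total
            for i in range(len(series) - w + 1)]

def least_squares_line(series):
    """Fit y = intercept + slope * t by ordinary least squares, with t = 0, 1, ..., n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

series = [3, 7, 2, 0, 4, 5, 9, 7, 2]  # illustrative values
smooth = moving_average(series, 3)
weighted = weighted_moving_average(series, [1, 4, 1])
intercept, slope = least_squares_line(series)
print(smooth)    # [4.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0]
print(weighted)  # [5.5, 2.5, 1.0, 3.5, 5.5, 8.0, 6.5]
print(intercept, slope)
```

Giving the centre of the window a larger weight, as in (1, 4, 1), makes the smoothed curve follow the raw data more closely than the unweighted average does.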

11
Multimedia Data Mining
“What is a multimedia database?”
• A multimedia database system stores and manages a large
collection of multimedia data, such as audio, video, image,
graphics, speech, text, document, and hypertext data, which
contain text, text markups, and linkages.
• Multimedia database systems are increasingly common owing to
the popular use of audio-video equipment, digital cameras,
CD-ROMs, and the Internet.
• Typical multimedia database systems include NASA’s EOS (Earth
Observation System), various kinds of image and audio-video
databases, and Internet databases.

12
Similarity Search in Multimedia Data

“When searching for similarities in multimedia data, can we
search on either the data description or the data content?”
• The answer is yes.
• For similarity searching in multimedia data, we consider two main
families of multimedia indexing and retrieval systems:
(1) description-based retrieval systems,
– which build indices and perform object retrieval based on
image descriptions, such as keywords, captions, size, and
time of creation;
(2) content-based retrieval systems,
– which support retrieval based on the image content, such as
color histogram, texture, pattern, image topology, and the
shape of objects and their layouts and locations within the
image.

13
Similarity Search in Multimedia Data

Several approaches have been proposed and studied for
similarity-based retrieval in image databases, based on image signature:
• Color histogram–based signature:
– In this approach, the signature of an image includes color
histograms based on the color composition of an image
regardless of its scale or orientation.
– This method does not contain any information about shape,
image topology, or texture.
– Thus, two images with similar color composition but very
different shapes or textures may be identified as similar,
although they could be completely unrelated semantically.
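
The limitation described above can be seen in a small sketch. The code assumes an image is a list of rows of 8-bit (r, g, b) tuples, quantizes each channel into a few bins, and compares normalized histograms by histogram intersection; the function names and the tiny test images are hypothetical.

```python
from collections import Counter

def color_histogram(image, bins_per_channel=4):
    """Normalized histogram over quantized colors; image is rows of 8-bit (r, g, b) tuples."""
    step = 256 // bins_per_channel
    counts = Counter(
        (r // step, g // step, b // step)
        for row in image
        for (r, g, b) in row
    )
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the overlap between two normalized histograms."""
    return sum(min(weight, h2.get(color, 0.0)) for color, weight in h1.items())

RED, BLUE = (255, 0, 0), (0, 0, 255)
horizontal_stripes = [[RED, RED], [BLUE, BLUE]]
vertical_stripes = [[RED, BLUE], [RED, BLUE]]

h_a = color_histogram(horizontal_stripes)
h_b = color_histogram(vertical_stripes)
similarity = histogram_intersection(h_a, h_b)
print(similarity)  # 1.0: identical color composition, different layout
```

Because the histograms are normalized, enlarging or rotating an image leaves its signature unchanged, which is the scale and orientation invariance noted above; the same property is what hides the difference in layout.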

14
Similarity Search in Multimedia Data

Several approaches have been proposed and studied for
similarity-based retrieval in image databases, based on image signature:
• Multifeature composed signature:
– In this approach, the signature of an image includes a
composition of multiple features: color histogram, shape,
image topology, and texture.
– The extracted image features are stored as metadata, and
images are indexed based on such metadata.

15
Similarity Search in Multimedia Data

Several approaches have been proposed and studied for
similarity-based retrieval in image databases, based on image signature:
• Wavelet-based signature:
– This approach uses the dominant wavelet coefficients of an
image as its signature.
– Wavelets capture shape, texture, and image topology
information in a single unified framework.
– However, since this method computes a single signature for
an entire image, it may fail to identify images containing similar
objects where the objects differ in location or size.
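
To make "dominant wavelet coefficients" concrete, here is a minimal sketch using the Haar wavelet on a single row of pixel intensities. Real image signatures use a 2-D transform; the 1-D simplification, the unnormalized averaging and differencing scheme, and the function names are all assumptions made for illustration.

```python
def haar_transform(values):
    """Unnormalized 1-D Haar decomposition; len(values) must be a power of two."""
    coeffs = []
    current = list(values)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = [(current[i] - current[i + 1]) / 2 for i in pairs]
        coeffs = details + coeffs  # coarser-level details go in front
        current = averages
    return current + coeffs        # [overall average, coarse details, ..., fine details]

def dominant_coefficients(coeffs, k):
    """Keep the k coefficients of largest magnitude as an {index: value} signature."""
    top = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k]
    return {i: coeffs[i] for i in sorted(top)}

row = [9, 7, 3, 5]  # illustrative pixel intensities
coeffs = haar_transform(row)
signature = dominant_coefficients(coeffs, 2)
print(coeffs)     # [6.0, 2.0, 1.0, -1.0]
print(signature)  # {0: 6.0, 1: 2.0}
```

Each coefficient is tied to a fixed position and scale, so moving or resizing an object changes which coefficients dominate. That is consistent with the weakness noted in the last point above.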

16
Text Mining
• In reality, a substantial portion of the available information is
stored in text databases (or document databases), which consist
of large collections of documents from various sources, such as
news articles, research papers, books, digital libraries, e-mail
messages, and Web pages.

17
Text Mining
• Nowadays most of the information in government, industry,
business, and other institutions is stored electronically, in the
form of text databases.
• Data stored in most text databases are semistructured data in
that they are neither completely unstructured nor completely
structured.
• For example, a document may contain a few structured fields,
such as title, authors, publication date, category, and so on, but
also contain some largely unstructured text components, such as
abstract and contents.

18
Text Mining

19
Text Mining

20
Some Text Mining Applications

21
Text Mining: Classification of News

22
Text Mining: Sentiment Analysis

23
Text Mining: Search Log Mining

24
Text Mining: Search Vs Discovery

25
Text Mining: Process

26
Text Mining: Text Preprocessing

27
Text Mining: Text Preprocessing
Syntactic and Linguistic Text Preprocessing

28
Text Mining: Text Preprocessing

Stopword Removal

29
Text Mining: Text Preprocessing
Stemming

30
Text Mining: Text Preprocessing
Some Basic Stemming Rules

31
Text Mining: Feature Generation

32
Text Mining: Feature Generation
Bag-of-Words: The Term-Document Matrix

33
Text Mining: Feature Generation
Bag-of-Words: Feature Generation

34
Text Mining: Feature Generation
The TF-IDF Term Weighting Scheme

35
Text Mining: Feature Generation
Word Embeddings

36
Text Mining: Feature Generation
Embedding Methods and Pretrained Models

37
Text Mining: Feature Selection

38
Text Mining: Feature Selection
Filter Tokens by POS Tags

39
Text Mining: Pattern Discovery

40
Text Mining: Pattern Discovery
Document Clustering

41
Text Mining: Pattern Discovery
Jaccard Coefficient

42
Text Mining: Pattern Discovery
Example: Jaccard Coefficient
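
A minimal sketch of the Jaccard coefficient between two documents, treating each document as a set of lowercase, whitespace-separated tokens. The tokenization, the function name, and the two sentences are assumptions for illustration; they are not the example from the original slide.

```python
def jaccard(doc_a, doc_b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| over the documents' token sets."""
    a = set(doc_a.lower().split())
    b = set(doc_b.lower().split())
    if not a and not b:
        return 1.0  # assumed convention: two empty documents count as identical
    return len(a & b) / len(a | b)

d1 = "data mining finds patterns"
d2 = "text mining finds patterns in text"
score = jaccard(d1, d2)
print(score)  # 3 shared tokens out of 6 distinct tokens -> 0.5
```

Because the documents are reduced to sets, repeated words (such as "text" in the second sentence) count only once.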

43
Text Mining: Pattern Discovery
Cosine Similarity

44
Text Mining: Pattern Discovery
Example: Cosine Similarity and TF-IDF

45
Text Mining: Pattern Discovery
Example: Cosine Similarity and TF-IDF
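
The sketch below combines TF-IDF weighting with cosine similarity. It assumes one common variant, raw term counts multiplied by log(N / df); other weightings exist, and the slide may have used a different one. The function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors as {term: weight}, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [
    "web mining extracts knowledge from web data",
    "text mining extracts knowledge from documents",
    "color histograms describe image content",
]
v0, v1, v2 = tfidf_vectors(docs)
print(cosine(v0, v1))  # positive: the documents share several terms
print(cosine(v0, v2))  # 0.0: no terms in common
```

With this weighting, a term that occurs in every document gets weight log(1) = 0 and therefore contributes nothing to the similarity.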

46
Text Mining: Pattern Discovery
Document Classification

47
Text Mining: Pattern Discovery
Example Application: Sentiment Analysis

48
Example Application: Sentiment Analysis

49
GoEmotions: A Dataset for Fine-Grained Emotion Classification

50
Web Mining
• Web mining is the use of data mining techniques to extract knowledge
from web data.

• Web data includes:

- web documents
- hyperlinks between documents
- usage logs of web sites

• The WWW is a huge, widely distributed, global information service centre
and therefore constitutes a rich source for data mining.

51
Web Mining

52
Web Mining: Issues

• Web data sets can be very large

- Tens to hundreds of terabytes
• Cannot be mined on a single server
- Need large farms of servers
• Proper organization of hardware and software is needed to mine
multi-terabyte data sets
• Difficulty in finding relevant information
• Extracting new knowledge from the web

53
Web Mining: Issues
