0% found this document useful (0 votes)

25 views78 pages

(FREE PDF SAMPLE) Pattern Discovery Using Sequence Data Mining Applications and Studies 1st Edition Pradeep Kumar Ebook Full Chapters

The document promotes the ebook 'Pattern Discovery Using Sequence Data Mining: Applications and Studies' edited by Pradeep Kumar and others, which provides a comprehensive overview of sequence mining techniques and current research. It includes various chapters on applications, techniques, and case studies related to sequential data mining. The book aims to highlight the significance of sequential data in various fields, including biology and web analytics.

Uploaded by

susienottet6

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

25 views78 pages

(FREE PDF SAMPLE) Pattern Discovery Using Sequence Data Mining Applications and Studies 1st Edition Pradeep Kumar Ebook Full Chapters

Uploaded by

susienottet6

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 78

Get the full ebook with Bonus Features for a Better Reading Experience on ebookgate.

com

Pattern Discovery Using Sequence Data Mining

Applications and Studies 1st Edition Pradeep Kumar

https://ebookgate.com/product/pattern-discovery-using-
sequence-data-mining-applications-and-studies-1st-edition-
pradeep-kumar/

OR CLICK HERE

DOWLOAD NOW

Download more ebook instantly today at https://ebookgate.com

Instant digital products (PDF, ePub, MOBI) available
Download now and explore formats that suit you...

Statistical Data Mining Using SAS Applications Second

Edition Chapman Hall CRC Data Mining and Knowledge
Discovery Series George Fernandez
https://ebookgate.com/product/statistical-data-mining-using-sas-
applications-second-edition-chapman-hall-crc-data-mining-and-
knowledge-discovery-series-george-fernandez/
ebookgate.com

Knowledge Discovery Practices and Emerging Applications of

Data Mining Trends and New Domains 1st Edition A.V.
Senthil Kumar
https://ebookgate.com/product/knowledge-discovery-practices-and-
emerging-applications-of-data-mining-trends-and-new-domains-1st-
edition-a-v-senthil-kumar/
ebookgate.com

Data Mining with R Learning with Case Studies Chapman Hall

CRC Data Mining and Knowledge Discovery Series 1st Edition
Torgo
https://ebookgate.com/product/data-mining-with-r-learning-with-case-
studies-chapman-hall-crc-data-mining-and-knowledge-discovery-
series-1st-edition-torgo/
ebookgate.com

Data Mining and Knowledge Discovery for Geoscientists 1st

Edition Guangren Shi (Auth.)

https://ebookgate.com/product/data-mining-and-knowledge-discovery-for-
geoscientists-1st-edition-guangren-shi-auth/

ebookgate.com
Data Mining Algorithms Explained Using R 1st Edition Pawel
Cichosz

https://ebookgate.com/product/data-mining-algorithms-explained-
using-r-1st-edition-pawel-cichosz/

ebookgate.com

Knowledge Discovery with Support Vector Machines Wiley

Series on Methods and Applications in Data Mining 1st
Edition Lutz H. Hamel
https://ebookgate.com/product/knowledge-discovery-with-support-vector-
machines-wiley-series-on-methods-and-applications-in-data-mining-1st-
edition-lutz-h-hamel/
ebookgate.com

Biological Knowledge Discovery Handbook Preprocessing

Mining and Postprocessing of Biological Data 1st Edition
Mourad Elloumi
https://ebookgate.com/product/biological-knowledge-discovery-handbook-
preprocessing-mining-and-postprocessing-of-biological-data-1st-
edition-mourad-elloumi/
ebookgate.com

Data mining and medical knowledge management cases and

applications 1st Edition Petr Berka

https://ebookgate.com/product/data-mining-and-medical-knowledge-
management-cases-and-applications-1st-edition-petr-berka/

ebookgate.com

Data Mining with Python Theory Application and Case

Studies 1st Edition Di Wu

https://ebookgate.com/product/data-mining-with-python-theory-
application-and-case-studies-1st-edition-di-wu/

ebookgate.com
Pattern Discovery Using
Sequence Data Mining:
Applications and Studies
Pradeep Kumar
Indian Institute of Management Lucknow, India

P. Radha Krishna
Infosys Lab, Infosys Limited, India

S. Bapi Raju
University of Hyderabad, India
Senior Editorial Director: Kristin Klinger
Director of Book Publications: Julia Mosemann
Editorial Director: Lindsay Johnston
Acquisitions Editor: Erika Carter
Development Editor: Joel Gamon
Production Editor: Sean Woznicki
Typesetters: Jennifer Romanchak, Lisandro Gonzalez
Print Coordinator: Jamie Snavely
Cover Design: Nick Newcomer

Published in the United States of America by

Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: [email protected]
Web site: http://www.igi-global.com

Copyright © 2012 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or
companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Pattern discovery using sequence data mining : applications and studies / Pradeep Kumar, P. Radha Krishna, and S. Bapi
Raju, editors.
p. cm.
Summary: “This book provides a comprehensive view of sequence mining techniques, and present current research and
case studies in Pattern Discovery in Sequential data authored by researchers and practitioners”-- Provided by publisher.
Includes bibliographical references and index.
ISBN 978-1-61350-056-9 (hardcover) -- ISBN 978-1-61350-058-3 (print & perpetual access) -- ISBN 978-1-61350-057-6
(ebook) 1. Sequential pattern mining. 2. Sequential processing (Computer science) I. Kumar, Pradeep, 1977- II. Radha
Krishna, P. III. Raju, S. Bapi, 1962-
QA76.9.D343P396 2012
006.3’12--dc22
2011008678

British Cataloguing in Publication Data

A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the
authors, but not necessarily of the publisher.
List of Reviewers
Manish Gupta, University of Illinois at Urbana, USA
Chandra Sekhar, Indian Institute of Technology Madras, India
Arnab Bhattacharya, Indian Institute of Technology Kanpur, India
Padmaja T Maruthi, University of Hyderabad, India
T. Ravindra Babu, Infosys Technologies Ltd, India
Pratibha Rani, International Institute of Information Technology Hyderabad, India
Nita Parekh, International Institute of Information Technology Hyderabad, India
Anass El-Haddadi, IRIT, France
Pinar Senkul, Middle East Technical University, Turkey
Jessica Lin, George Mason University, USA
Pradeep Kumar, Indian Institute of Management Lucknow, India
Raju S. Bapi, University of Hyderabad, India
P. Radha Krishna, Infosys Lab, Infosys Limited, India
Table of Contents

Preface...................................................................................................................................................vii

Section 1
Current State of Art

Chapter 1
Applications of Pattern Discovery Using Sequential Data Mining......................................................... 1
Manish Gupta, University of Illinois at Urbana-Champaign, USA
Jiawei Han, University of Illinois at Urbana-Champaign, USA

Chapter 2
A Review of Kernel Methods Based Approaches to Classification and Clustering of Sequential
Patterns, Part I: Sequences of Continuous Feature Vectors................................................................... 24
Dileep A. D., Indian Institute of Technology, India
Veena T., Indian Institute of Technology, India
C. Chandra Sekhar, Indian Institute of Technology, India

Chapter 3
A Review of Kernel Methods Based Approaches to Classification and Clustering of Sequential
Patterns, Part II: Sequences of Discrete Symbols.................................................................................. 51
Veena T., Indian Institute of Technology, India
Dileep A. D., Indian Institute of Technology, India
C. Chandra Sekhar, Indian Institute of Technology, India

Section 2
Techniques

Chapter 4
Mining Statistically Significant Substrings Based on the Chi-Square Measure.................................... 73
Sourav Dutta, IBM Research Lab, India
Arnab Bhattacharya, Indian Institute of Technology Kanpur, India
Chapter 5
Unbalanced Sequential Data Classification Using Extreme Outlier Elimination and Sampling
Techniques............................................................................................................................................. 83
T. Maruthi Padmaja, University of Hyderabad (UoH), India
Raju S. Bapi, University of Hyderabad (UoH), India
P. Radha Krishna, Infosys Lab, Infosys Limited, India

Chapter 6
Quantization Based Sequence Generation and Subsequence Pruning for Data Mining
Applications........................................................................................................................................... 94
T. Ravindra Babu, Infosys Limited, India
M. Narasimha Murty, Indian Institute of Science Bangalore, India
S. V. Subrahmanya, Infosys Limited, India

Chapter 7
Classification of Biological Sequences................................................................................................ 111
Pratibha Rani, International Institute of Information Technology Hyderabad, India
Vikram Pudi, International Institute of Information Technology Hyderabad, India

Section 3
Applications

Chapter 8
Approaches for Pattern Discovery Using Sequential Data Mining..................................................... 137
Manish Gupta, University of Illinois at Urbana-Champaign, USA
Jiawei Han, University of Illinois at Urbana-Champaign, USA

Chapter 9
Analysis of Kinase Inhibitors and Druggability of Kinase-Targets Using Machine Learning
Techniques........................................................................................................................................... 155
S. Prasanthi, University of Hyderabad, India
S. Durga Bhavani, University of Hyderabad, India
T. Sobha Rani, University of Hyderabad, India
Raju S. Bapi, University of Hyderabad, India

Chapter 10
Identification of Genomic Islands by Pattern Discovery..................................................................... 166
Nita Parekh, International Institute of Information Technology Hyderabad, India

Chapter 11
Video Stream Mining for On-Road Traffic Density Analytics............................................................ 182
Rudra Narayan Hota, Frankfurt Institute for Advanced Studies, Germany
Kishore Jonna, Infosys Lab, Infosys Limited, India
P. Radha Krishna, Infosys Lab, Infosys Limited, India
Chapter 12
Discovering Patterns in Order to Detect Weak Signals and Define New Strategies............................ 195
Anass El Haddadi, University of Toulouse III, France & University of Mohamed V, Morocco
Bernard Dousset, University of Toulouse, France
Ilham Berrada, University of Mohamed V, Morocco

Chapter 13
Discovering Patterns for Architecture Simulation by Using Sequence Mining.................................. 212
Pınar Senkul, Middle East Technical University, Turkey
Nilufer Onder, Michigan Technological University, USA
Soner Onder, Michigan Technological University, USA
Engin Maden, Middle East Technical University, Turkey
Hui Meen Nyew, Michigan Technological University, USA

Chapter 14
Sequence Pattern Mining for Web Logs.............................................................................................. 237
Pradeep Kumar, Indian Institute of Management Lucknow, India
Raju S. Bapi, University of Hyderabad, India
P. Radha Krishna, Infosys Lab, Infosys Limited, India

Compilation of References................................................................................................................ 244

About the Contributors..................................................................................................................... 264

Index.................................................................................................................................................... 270
vii

Preface

A huge amount of data is collected every day in the form of sequences. These sequential data are valu-
able sources of information not only to search for a particular value or event at a specific time, but also
to analyze the frequency of certain events or sets of events related by particular temporal/sequential
relationship. For example, DNA sequences encode the genetic makeup of humans and all other species,
and protein sequences describe the amino acid composition of proteins and encode the structure and
function of proteins. Moreover, sequences can be used to capture how individual humans behave through
various temporal activity histories such as weblog histories and customer purchase patterns. In general
there are various methods to extract information and patterns from databases, such as time series ap-
proaches, association rule mining, and data mining techniques.
The objective of this book is to provide a concise state-of-the-art in the field of sequence data min-
ing along with applications. The book consists of 14 chapters divided into 3 sections. The first section
provides review of state-of-art in the field of sequence data mining. Section 2 presents relatively new
techniques for sequence data mining. Finally, in section 3, various application areas of sequence data
mining have been explored.
Chapter 1, Approaches for Pattern Discovery Using Sequential Data Mining, by Manish Gupta and
Jiawei Han of University of Illinois at Urbana-Champaign, IL, USA, discusses different approaches for
mining of patterns from sequence data. Apriori based methods and the pattern growth methods are the
earliest and the most influential methods for sequential pattern mining. There is also a vertical format
based method which works on a dual representation of the sequence database. Work has also been done
for mining patterns with constraints, mining closed patterns, mining patterns from multi-dimensional
databases, mining closed repetitive gapped subsequences, and other forms of sequential pattern mining.
Some works also focus on mining incremental patterns and mining from stream data. In this chapter,
the authors have presented at least one method of each of these types and discussed advantages and
disadvantages.
Chapter 2, A Review of Kernel Methods Based Approaches to Classification and Clustering of
Sequential Patterns, Part I: Sequences of Continuous Feature Vectors, was authored by Dileep A. D.,
Veena T., and C. Chandra Sekhar of Department of Computer Science and Engineering, Indian Institute
of Technology Madras, India. They present a brief description of kernel methods for pattern classifica-
tion and clustering. They also describe dynamic kernels for sequences of continuous feature vectors.
The chapter also presents a review of approaches to sequential pattern classification and clustering using
dynamic kernels.
viii

Chapter 3 is A Review of Kernel Methods Based Approaches to Classification and Clustering of

Sequential Patterns, Part II: Sequences of Discrete Symbols by Veena T., Dileep A. D., and C. Chandra
Sekhar of Department of Computer Science and Engineering, Indian Institute of Technology Madras,
India. The authors review methods to design dynamic kernels for sequences of discrete symbols. In their
chapter they have also presented a review of approaches to classification and clustering of sequences of
discrete symbols using the dynamic kernel based methods.
Chapter 4 is titled, Mining Statistically Significant Substrings Based on the Chi-Square Measure,
contributed by Sourav Dutta of IBM Research India along with Arnab Bhattacharya Dept. of Computer
Science and Engineering, Indian Institute of Technology, Kanpur, India. This chapter highlights the chal-
lenge of efficient mining of large string databases in the domains of intrusion detection systems, player
statistics, texts, proteins, et cetera, and how these issues have emerged as challenges of practical nature.
Searching for an unusual pattern within long strings of data is one of the foremost requirements for
many diverse applications. The authors first present the current state-of-art in this area and then analyze
the different statistical measures available to meet this end. Next, they argue that the most appropriate
metric is the chi-square measure. Finally, they discuss different approaches and algorithms proposed for
retrieving the top-k substrings with the largest chi-square measure. The local-maxima based algorithms
maintain high quality while outperforming others with respect to the running time.
Chapter 5 is Unbalanced Sequential Data Classification Using Extreme Outlier Elimination and
Sampling Techniques, by T. Maruthi Padmaja along with Raju S. Bapi from University of Hyderabad,
Hyderabad, India and P. Radha Krishna, Infosys Lab, Infosys Technologies Ltd, Hyderabad, India. This
chapter focuses on problem of predicting minority class sequence patterns from the noisy and unbal-
anced sequential datasets. To solve this problem, the atuhors proposed a new approach called extreme
outlier elimination and hybrid sampling technique.
Chapter 6 is Quantization Based Sequence Generation and Subsequence Pruning for Data Mining
Applications by T. Ravindra Babu and S. V. Subrahmanya of E-Comm. Research Lab, Education and
Research, Infosys Technologies Limited, Bangalore, India, along with M. Narasimha Murty, Dept. of
Computer Science and Automation, Indian Institute of Science, Bangalore, India. This chapter has high-
lighted the problem of combining data mining algorithms with data compaction used for data compression.
Such combined techniques lead to superior performance. Approaches to deal with large data include
working with a representative sample instead of the entire data. The representatives should preferably
be generated with minimal data scans, methods like random projection, et cetera.
Chapter 7 is Classification of Biological Sequences by Pratibha Rani and Vikram Pudi of International
Institute of Information Technology, Hyderabad, India, and it discusses the problem of classifying a newly
discovered sequence like a protein or DNA sequence based on their important features and functions,
using the collection of available sequences. In this chapter, the authors study this problem and present
two techniques Bayesian classifiers: RBNBC and REBMEC. The algorithms used in these classifiers
incorporate repeated occurrences of subsequences within each sequence. Specifically, RBNBC (Repeat
Based Naive Bayes Classifier) uses a novel formulation of Naive Bayes, and the second classifier,
REBMEC (Repeat Based Maximum Entropy Classifier) uses a novel framework based on the classical
Generalized Iterative Scaling (GIS) algorithm.
Chapter 8, Applications of Pattern Discovery Using Sequential Data Mining, by Manish Gupta and
Jiawei Han of University of Illinois at Urbana-Champaign, IL, USA, presents a comprehensive review
of applications of sequence data mining algorithms in a variety of domains like healthcare, education,
Web usage mining, text mining, bioinformatics, telecommunications, intrusion detection, et cetera.
ix

Chapter 9, Analysis of Kinase Inhibitors and Druggability of Kinase-Targets Using Machine Learn-
ing Techniques, by S. Prashanthi, S. Durga Bhavani, T. Sobha Rani, and Raju S. Bapi of Department
of Computer & Information Sciences, University of Hyderabad, Hyderabad, India, focuses on human
kinase drug target sequences since kinases are known to be potential drug targets. The authors have also
presented a preliminary analysis of kinase inhibitors in order to study the problem in the protein-ligand
space in future. The identification of druggable kinases is treated as a classification problem in which
druggable kinases are taken as positive data set and non-druggable kinases are chosen as negative data set.
Chapter 10, Identification of Genomic Islands by Pattern Discovery, by Nita Parekh of International
Institute of Information Technology, Hyderabad, India addresses a pattern recognition problem at the
genomic level involving identifying horizontally transferred regions, called genomic islands. A horizon-
tally transferred event is defined as the movement of genetic material between phylogenetically unrelated
organisms by mechanisms other than parent to progeny inheritance. Increasing evidence suggests the
importance of horizontal transfer events in the evolution of bacteria, influencing traits such as antibiotic
resistance, symbiosis and fitness, virulence, and adaptation in general. Considerable effort is being made
in their identification and analysis, and in this chapter, a brief summary of various approaches used in
the identification and validation of horizontally acquired regions is discussed.
Chapter 11, Video Stream Mining for On-Road Traffic Density Analytics, by Rudra Narayan Hota of
Frankfurt Institute for Advanced Studies, Frankfurt, Germany along with Kishore Jonna and P. Radha
Krishna, Infosys Lab, Infosys Technologies Limited, India, addresses the problem of estimating computer
vision based traffic density using video stream mining. The authors present an efficient approach for
traffic density estimation using texture analysis along with Support Vector Machine (SVM) classifier, and
describe analyzing traffic density for on-road traffic congestion control with better flow management.
Chapter 12, Discovering Patterns in Order to Detect Weak Signals and Define New Strategies, by
Anass El Haddadi of Université de Toulouse, IRIT UMR France Bernard Dousset, Ilham Berrada of
Ensias, AL BIRONI team, Mohamed V University – Souissi, Rabat, Morocco presents four methods
for discovering patterns in the competitive intelligence process: “correspondence analysis,” “multiple
correspondence analysis,” “evolutionary graph,” and “multi-term method.” Competitive intelligence
activities rely on collecting and analyzing data in order to discover patterns from data using sequence
data mining. The discovered patterns are used to help decision-makers considering innovation and de-
fining business strategy.
Chapter 13, Discovering Patterns for Architecture Simulation by Using Sequence Mining, by Pınar
Senkul (Middle East Technical University, Computer Engineering Dept., Ankara, Turkey) along with
Nilufer Onder (Michigan Technological University, Computer Science Dept., Michigan, USA), Soner
Onder (Michigan Technological University, Computer Science Dept., Michigan, USA), Engin Maden
(Middle East Technical University, Computer Engineering Dept., Ankara, Turkey) and Hui Meen Nyew
(Michigan Technological University, Computer Science Dept., Michigan, USA), discusses the problem
of designing and building high performance systems that make effective use of resources such as space
and power. The design process typically involves a detailed simulation of the proposed architecture fol-
lowed by corrections and improvements based on the simulation results. Both simulator development
and result analysis are very challenging tasks due to the inherent complexity of the underlying systems.
They present a tool called Episode Mining Tool (EMT), which includes three temporal sequence mining
algorithms, a preprocessor, and a visual analyzer.
Chapter 14 is called Sequence Pattern Mining for Web Logs by Pradeep Kumar, Indian Institute of
Management, Lucknow, India, Raju S. Bapi, University of Hyderabad, India and P. Radha Krishna,
x

Infosys Lab, Infosys Technologies Limited, India. In their work, the authors utilize a variation to the
AprioriALL Algorithm, which is commonly used for the sequence pattern mining. The proposed varia-
tion adds up the measure Interest during every step of candidate generation to reduce the number of
candidates thus resulting in reduced time and space cost.
This book can be useful to academic researchers and graduate students interested in data mining
in general and in sequence data mining in particular, and to scientists and engineers working in fields
where sequence data mining is involved, such as bioinformatics, genomics, Web services, security, and
financial data analysis.
Sequence data mining is still a fairly young research field. Much more remains to be discovered in
this exciting research domain in the aspects related to general concepts, techniques, and applications.
Our fond wish is that this collection sparks fervent activity in sequence data mining, and we hope this
is not the last word!

Pradeep Kumar
Indian Institute of Management Lucknow, India

P. Radha Krishna
Infosys Lab, Infosys Limited, India

S. Bapi Raju
University of Hyderabad, India
Section 1
Current State of Art
1

Chapter 1
Applications of Pattern
Discovery Using Sequential
Data Mining
Manish Gupta
University of Illinois at Urbana-Champaign, USA

Jiawei Han
University of Illinois at Urbana-Champaign, USA

ABSTRACT
Sequential pattern mining methods have been found to be applicable in a large number of domains.
Sequential data is omnipresent. Sequential pattern mining methods have been used to analyze this data
and identify patterns. Such patterns have been used to implement efficient systems that can recommend
based on previously observed patterns, help in making predictions, improve usability of systems, de-
tect events, and in general help in making strategic product decisions. In this chapter, we discuss the
applications of sequential data mining in a variety of domains like healthcare, education, Web usage
mining, text mining, bioinformatics, telecommunications, intrusion detection, et cetera. We conclude
with a summary of the work.

HEALTHCARE Patterns in patient paths: The purpose of the

French Diagnosis Related Group’s information
Patterns in healthcare domain include the common system is to describe hospital activity by focusing
patterns in paths followed by patients in hospitals, on hospital stays. (Nicolas, Herengt & Albuisson,
patterns observed in symptoms of a particular 2004) propose usage of sequential pattern mining
disease, patterns in daily activity and health data. for patient path analysis across multiple healthcare
Works related to these applications are discussed institutions. The objective is to discover, to classify
in this sub-section. and to visualize frequent patterns among patient
path. They view a patient path as a sequence of
DOI: 10.4018/978-1-61350-056-9.ch001

Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Applications of Pattern Discovery Using Sequential Data Mining

sets. Each set in the sequence is a hospitaliza- rules. It describes the criteria on the relationships
tion instance. Each element in a hospitalization between subgroups. A constraint on rule outlines
can be any symbolic data gathered by the PMSI the composition of an association rule; it describes
(medical data source). They used the SLPMiner the attributes that form the antecedents and the
system (Seno & Karypis, 2002) for mining the consequents, and calculates the confidence of an
patient path database in order to find frequent association rule. It also specifies the minimum
sequential patterns among the patient path. They support for a rule and prunes away item-sets that do
tested the model on the 2002 year of PMSI data at not meet this support at the end of each subgroup-
the Nancy University Hospital and also propose merging step. A typical user constraint can look
an interactive tool to perform inter-institutional like [1,2,3][1, a=A1&n<=2][2, a=B1&n<=2][3,
patient path analysis. v=1][rule, (s1 s2) ⇒s3]. This can be interpreted
Patterns in dyspepsia symptoms: Consider as: looking at subgroups 1, 2 and 3, from subgroup
a domain expert, who is an epidemiologist and 1, extract patterns that contain the attribute A1
is interested in finding relationships between (a=A1) and contain no more than 2 attributes
symptoms of dyspepsia within and across time (n<=2); from subgroup 2, extract patterns that
points. This can be done by first mining patterns contain the attribute B1 (a=B1) and contain no
from symptom data and then using patterns to more than 2 attributes (n<=2); then from subgroup
define association rules. Rules could look like 3, extract patterns with at least one attribute that
ANOREX2=0 VOMIT2=0 NAUSEA3=0 AN- has a value of 1 (v=1). Attributes from subgroups
OREX3=0 VOMIT3=0 ⇒ DYSPH2=0 where 1 and 2 form the antecedents in a rule, and at-
each symptom is represented as <symptom>N=V tributes from subgroup 3 form the consequents
(time=N and value=V). ANOREX (anorexia), ([rule, (s1 s2) ⇒ s3]). Such constraints are easily
VOMIT (vomiting), DYSPH (dysphagia) and incorporated into the Apriori process by pruning
NAUSEA (nausea) are the different symptoms. away more candidates based on these constraints.
However, a better way of handling this is to de- They experimented on a dataset with records
fine subgroups as a set of symptoms at a single of 303 patients treated for dyspepsia. Each record
time point. (Lau, Ong, Mahidadia, Hoffmann, represented a patient, the absence or presence of
Westbrook, & Zrimec, 2003) solve the problem 10 dyspepsia symptoms at three time points (initial
of identifying symptom patterns by implement- presentation to a general practitioner, 18 months
ing a framework for constraint based association after endoscopy screening, and 8–9 years after
rule mining across subgroups. Their framework, endoscopy) and the endoscopic diagnosis for the
Apriori with Subgroup and Constraint (ASC), is patient. Each of these symptoms can have one
built on top of the existing Apriori framework. of the following three values: symptom present,
They have identified four different types of phase- symptom absent, missing (unknown). At each of
wise constraints for subgroups: constraint across the three time points, a symptom can take any of
subgroups, constraint on subgroup, constraint on these three possible values. They show that their
pattern content and constraint on rule. A constraint approach leads to interesting symptom pattern
across subgroups specifies the order of subgroups discovery.
in which they are to be mined. A constraint on Patterns in daily activity data: There are also
subgroup describes the intra-subgroup criteria works, which investigate techniques for using
of the association rules. It describes a minimum agent-based smart home technologies to provide
support for subgroups and a set of constraints for at-home automated assistance and health moni-
each subgroup. A constraint on pattern content toring. These systems first learn patterns from
outlines the inter-subgroup criteria on association at-home health and activity data. Further, for any

2
Applications of Pattern Discovery Using Sequential Data Mining

new test cases, they identify behaviors that do not 3. A pair of set-up and clean-up: <set-up
conform to normal behavior and report them as method>, <misc action>, …, <clean-up
predicted anomalous health problems. method>
4. Exception Handling: Every instance is in-
cluded in a try-catch statement.
EDUCATION 5. Other patterns.

In the education domain, work has been done They have made this technique available as
to extract patterns from source code and student a tool: Fung(http://sel.ist.osaka-u.ac.jp/~ishio/
teamwork data. fung/)
Patterns in source code: A coding pattern is Patterns in student team-work data: (Kay,
a frequent sequence of method calls and control Maisonneuve, Yacef, & Zaïane, 2006) describe
statements to implement a particular behavior. data mining of student group interaction data to
Coding patterns include copy-and-pasted code, identify significant sequences of activity. The goal
crosscutting concerns (parts of a program which is to build tools that can flag interaction sequences
rely on or must affect many other parts of the indicative of problems, so that they can be used
system) and implementation idioms. Dupli- to assist student teams in early recognition of
cated code fragments and crosscutting concerns problems. They also want tools that can identify
that spread across modules are problematic in patterns that are markers of success so that these
software maintenance. (Ishio, Date, Miyake, & might indicate improvements during the learning
Inoue, 2008) propose a sequential pattern min- process. They obtain their data using TRAC which
ing approach to capture coding patterns in Java is an open source tool designed for use in software
programs. They define a set of rules to translate development projects. Students collaborate by
Java source code into a sequence database for sharing tasks via the TRAC system. These tasks
pattern mining, and apply PrefixSpan algorithm are managed by a “Ticket” system; source code
to the sequence database. They define constraints writing tasks are managed by a version control
for mining source code patterns. A constraint for system called “SVN”; students communicate by
control statements could be: If a pattern includes means of collaborative web page writing called
a LOOP/IF element, the pattern must include its “Wiki”. Data consist of events where each event
corresponding element generated from the same is represented as Event = {EventType, Resour-
control statement. They classify sub-patterns into ceId, Author, Time} where: EventType is one of
pattern groups. As a case study, they applied their T (for Ticket), S (for SVN), W (for Wiki). One
tool to six open-source programs and manually such sequence is generated for each of the group
investigated the resultant patterns. of students.
They identify about 17 pattern groups which The original sequence obtained for each group
they classify into 5 categories: was 285 to 1287 long. These event sequences
were then broken down into several “sequences”
1. A boolean method to insert an additional of events using a per session approach or a per
action: <Boolean method>, <IF>, <action- resource approach. In breakdown per session ap-
method>, <END-IF> proach, date and the resourceId are omitted and
2. A boolean method to change the behavior a sequence is of form: (iXj) which captures the
of multiple methods: <Boolean method>, number of i consecutive times a medium X was
<IF>, <action-method>, <END-IF> used by j different authors, e.g., <(2T1), (5W3),
(2S1),(1W1)>. In breakdown per resource ap-

3
Applications of Pattern Discovery Using Sequential Data Mining

proach, sequence is of form <iXj, t> which captures being accepted by a team member and then SVN
the number of i different events of type X, the work relating to that task being completed and the
number j of authors, and the number of days over second likely being work being done followed
which t the resource was modified, e.g., <10W5, by the ticket being closed. The close coupling of
2>. In a follow-up paper (Perera, Kay, Yacef, & task-related SVN and Wiki activity and Ticket
Koprinska, 2007), they have a third approach, events for this group was also shown by relatively
breakdown by task where every sequence is of high support for the patterns (1,t,b)(1,t,b)(1,t,b),
the form (i,X,A) which captures the number of (1,t,b)(1,s,b)(1,t,b) and (1,t,b)(1,w,b)(1,t,b). The
consecutive events (i) occurring on a particular poorest group displayed the highest support for
TRAC medium (X), and the role of the author (A). the last pattern, but no support for the former,
Patterns observed in group sessions: Better again indicating their lack of SVN use in tasks.
groups had many alternations of SVN and Wiki Patterns observed in resource sequences: The
events, and SVN and Ticket events whereas best group had very high support for patterns
weaker groups had almost none. The best group where the leader interacted with group members
also had the highest proportion of author ses- on tickets, such as (L,1,t)(b,1,t)(L,1,t). The poorest
sions containing many consecutive ticket events group in contrast lacked these interaction patterns,
(matching their high use of ticketing) and SVN and had more tickets which were created by the
events (suggesting they committed their work to Tracker rather than the Leader, suggestive of
the group repository more often). weaker leadership. The best group displayed the
A more detailed analysis of these patterns highest support for patterns such as (b,3,t) and
revealed that the best group used the Ticket (b,4,t), suggestive of group members making at
more than the Wiki, whereas the weakest group least one update on tickets before closing them.
displayed the opposite pattern. The data sug- In contrast, the weaker groups showed support
gested group leaders in good groups were much mainly for the pattern (b,2,t), most likely indicative
less involved in technical work, suggesting work of group members accepting and closing tickets
was being delegated properly and the leader was with no update events in between.
leading rather than simply doing all the work. In
contrast, the leaders of the poorer groups either Web Usage Mining
seemed to use the Wiki (a less focused medium)
more than the tickets, or be involved in too much The complexity of tasks such as Web site design,
technical work. Web server design, and of simply navigating
Patterns observed in task sequences: The two through a Web site has been increasing continu-
best groups had the greatest percentage support ously. An important input to these design tasks
for the pattern (1,t,L)(1,t,b), which were most is the analysis of how a Web site is being used.
likely tickets initiated by the leader and accepted Usage analysis includes straightforward statistics,
by another team member. The fact this occurred such as page access frequency, as well as more
more often than (1,t,L)(2,t,b), suggests that the sophisticated forms of analysis, such as finding
better groups were distinguished by tasks being the common traversal paths through a Web site.
performed on the Wiki or SVN files before the Web Usage Mining is the application of pattern
ticket was closed by the second member. Notably, mining techniques to usage logs of large Web
the weakest group had higher support for this latter data repositories in order to produce results that
pattern than the former. The best group was one of can be used in the design tasks mentioned above.
the only two to display the patterns (1,t,b)(1,s,b) However, there are several preprocessing tasks that
and (1,s,b)(1,t,b) – the first likely being a ticket

4
Applications of Pattern Discovery Using Sequential Data Mining

must be performed prior to applying data mining Patterns for customer acquisition: (Buchner &
algorithms to the data collected from server logs. Mulvenna, 1998) propose an environment that al-
Transaction identification from web usage data: lows the discovery of patterns from trading related
(Cooley, Mobasher, & Srivastava, 1999) present web sites, which can be harnessed for electronic
several data preparation techniques in order to commerce activities, such as personalization,
identify unique users and user sessions. Also, a adaptation, customization, profiling, and recom-
method to divide user sessions into semantically mendation.
meaningful transactions is defined. Each user The two essential parts of customer attraction
session in a user session file can be thought of in are the selection of new prospective customers and
two ways; either as a single transaction of many the acquisition of the selected potential candidates.
page references, or a set of many transactions each One marketing strategy to perform this exercise,
consisting of a single page reference. The goal of among others, is to find common characteristics in
transaction identification is to create meaningful already existing visitors’ information and behavior
clusters of references for each user. Therefore, for the classes of profitable and non-profitable
the task of identifying transactions is one of customers. The authors discover these sequences
either dividing a large transaction into multiple by extending GSP so it can handle duplicates in
smaller ones or merging small transactions into sequences, which is relevant to discover naviga-
fewer larger ones. This process can be extended tional behavior.
into multiple steps of merge or divide in order to
create transactions appropriate for a given data A found sequence looks as the
mining task. Both types of approaches take a following:
transaction list and possibly some parameters as {ecom.infm.ulst.ac.uk/, ecom.infm.
input, and output a transaction list that has been ulst.ac.uk/News_Resources.html, ecom.
operated on by the function in the approach in infm.ulst.ac.uk/Journals.html, ecom.
the same format as the input. They consider three infm.ulst.ac.uk/, ecom.infm.ulst.
different ways of identifying transactions based ac.uk/search.htm} Support = 3.8%;
on: Reference Length (time spent when visiting a Confidence = 31.0%
page), Maximal Forward Reference (set of pages
in the path from the first page in a user session up The discovered sequence can then be used
to the page before a backward reference is made) to display special offers dynamically to keep a
and Time Window. customer interested in the site, after a certain
By analyzing this information, a Web Usage page sequence with a threshold support and/or
Mining system can determine temporal relation- confidence value has been visited.
ships among data items such as the following
Olympics Web site examples: Patterns to Improve Web Site Design

• 9.81% of the site visitors accessed the For the analysis of visitor navigation behavior
Atlanta home page followed by the in web sites integrating multiple information
Sneakpeek main page. systems (multiple underlying database servers
• 0.42% of the site visitors accessed or archives), (Berendt, 2000) proposed the web
the Sports main page followed by the usage miner (WUM), which discovers naviga-
Schedules main page. tion patterns subject to advanced statistical and
structural constraints. Experiments with a real web
site that integrates data from multiple databases,

5
Applications of Pattern Discovery Using Sequential Data Mining

the German SchulWeb (a database of German- frequent navigational paths) are more suitable for
language school magazines), demonstrate the ap- predictive tasks, such as Web pre-fetching, which
propriateness of WUM in discovering navigation involve predicting which item is accessed next by
patterns and show how those discoveries can help a user), while less constrained patterns, such as
in assessing and improving the quality of the site frequent item-sets or general sequential patterns
design i.e. conformance of the web site’s structure are more effective alternatives in the context of
to the intuition of each group of visitors accessing Web personalization and recommender systems.
the site. The intuition of the visitors is indirectly Web usage preprocessing ultimately results
reflected in their navigation behavior, as repre- in a set of n page-views, P = {p1, p2... pn}, and a
sented in their browsing patterns. By comparing set of m user transactions, T = {t1, t2... tm}. Each
the typical patterns with the site usage expected transaction t is defined as an l-length sequence of
by the site designer, one can examine the quality ordered pairs: t = <(pt1, w(pt1)), (pt2, w(pt2)),...,(ptl,
of the site and give concrete suggestions for its w(ptl))> where w(pti) is the weight associated with
improvement. For instance, repeated refinements page-view pti. Contiguous sequential patterns
of a query may indicate a search environment that (CSPs -- patterns in which the items appearing
is not intuitive for some users. Also, long lists in the sequence must be adjacent with respect
of results may signal that sufficiently selective to the underlying ordering) are used to capture
search options are lacking, or that they are not frequent navigational paths among user trails.
understood by everyone. General sequential patterns are used to represent
A session is a directed list of page accesses more general navigational patterns within the site.
performed by a user during her/his visit in a site. To build a recommendation algorithm using
Pages of a session are mapped onto elements sequential patterns, the authors focus on frequent
of a sequence, whereby each element is a pair sequences of size |w| + 1 whose prefix contains an
comprised of the page and a positive integer. This active user session w. The candidate page-views
integer is the occurrence of the page in the session, to be recommended are the last items in all such
taking the fact into account that a user may visit the sequences. The recommendation values are based
same page more than once during a single session. on the confidence of the patterns. A simple trie
Further, they also define generalized sequences structure is used to store both the sequential and
which are sequences with length constraints on contiguous sequential patterns discovered during
gaps. These constraints are expressed in a mining the pattern discovery phase. The recommendation
language MINT. algorithm is extended to generate all kth order
The patterns that they observe are as follows. recommendations as follows. First, the recom-
Searches reaching a ‘school’ entry are a dominant mendation engine uses the largest possible active
sub-pattern. ‘State’ lists of schools are the most session window as an input for recommendation
popular lists. Schools are rarely reached in short engine. If the engine cannot generate any recom-
searches. mendations, the size of active session window is
iteratively decreased until a recommendation is
Pattern Discovery for generated or the window size becomes 0.
Web Personalization The CSP model can do better in terms of pre-
cision, but the coverage levels, in general, may
Pattern discovery from usage data can also be used be too low when the goal is to generate as many
for Web personalization. (Mobasher, Dai, Luo, & good recommendations as possible. On the other
Nakagawa, 2002) find that more restrictive pat- hand, when dealing with applications such as
terns, such as contiguous sequential patterns (e.g., Web pre-fetching in which the primary goal is to

6
Applications of Pattern Discovery Using Sequential Data Mining

predict the user’s immediate next actions (rather the results saved. The user’s query is translated
than providing a broader set of recommendations), into a shape query and this query is then executed
the CSP model provides the best choice. This is over the mined data yielding the desired trends.
particularly true in sites with many dynamically The results of the mining are a set of phrases that
generated pages, where often a contiguous navi- occur frequently in the underlying documents and
gational path represents a semantically meaningful that match a query supplied by the user. Thus, the
sequence of user actions each depending on the system has three major steps: Identifying frequent
previous actions. phrases using sequential patterns mining, generat-
ing histories of phrases and finding phrases that
satisfy a specified trend.
TEXT MINING 1-phrase is a list of elements where each ele-
ment is a phrase. k-phrase is an iterated list of
Pattern mining has been used for text databases to phrases with k levels of nesting. <<(IBM)><(data
discover trends, for text categorization, for docu- mining)>> is a 1-phrase, which can mean that
ment classification and authorship identification. IBM and “data mining” should occur in the same
We discuss these works below. paragraph, with “data mining” being contiguous
words in the paragraph.
Trends in Text Databases A word in a text field is mapped to an item in
a data-sequence or sequential pattern and a phrase
(Lent, Agrawal, & Srikant, 1997) describe a to a sequential pattern that has just one item in
system for identifying trends in text documents each element. Each element of a data sequence
collected over a period of time. Trends can be in the sequential pattern problem has some as-
used, for example, to discover that a company sociated timestamp relative to the other elements
is shifting interests from one domain to another. in the sequence thereby defining an ordering of
Their system mines these trends and also provides the elements of a sequence. Sequential pattern
a method to visualize them. algorithms can now be applied to the transaction
The unit of text is a word and a phrase is a list ID labeled words to identify simple phrases from
of words. Associated with each phrase is a history the document collection.
of the frequency of occurrence of the phrase, ob- User may be interested in phrases that are
tained by partitioning the documents based upon contained in individual sentences only. Alterna-
their timestamps. The frequency of occurrence in tively, the words comprising a phrase may come
a particular time period is the number of docu- from sequential sentences so that a phrase spans
ments that contain the phrase. A trend is a specific a paragraph. This generalization can be accom-
subsequence of the history of a phrase that satisfies modated by the use of distance constraints that
the users’ query over the histories. For example, specify a minimum and/or maximum gap between
the user may specify a shape query like a spike adjacent words of a phrase. For example, the first
query to find those phrases whose frequency of variation described above would be constrained
occurrence increased and then decreased. In this by specifying a minimum gap of one word and a
trend analysis, sequential pattern mining is used maximum gap of one sentence. The second varia-
for phrase identification. tion would have a minimum gap of one sentence
A transaction ID is assigned to each word of and a maximum gap of one paragraph. For this
every document treating the words as items in the latter example, one could further generalize the
data mining algorithms. This transformed data is notion from a single word from each sentence
then mined for dominant words and phrases, and to a set of words from each sentence by using a

7
Applications of Pattern Discovery Using Sequential Data Mining

sliding transaction time window within sentences. words. E.g., the sequential pattern < (data) (infor-
The generalizations made in the GSP algorithm mation) (machine)> means that some texts contain
for mining sequential patterns allow a one-to-one words ‘data’ then ‘information’ then ‘machine’ in
mapping of the minimum gap, maximum gap, three different sentences. Once sequential patterns
and transaction window to the parameters of the have been extracted for each category, the goal is
algorithm. to derive a categorizer from the obtained patterns.
Basic mapping of phrases to sequential patterns This is done by computing, for each category, the
is extended by providing a hierarchical mapping confidence of each associated sequential pattern.
over sentences, paragraphs, or even sections of a To solve this problem, a rule R is generated in the
text document. This extended mapping helps in following way:
taking advantage of the structure of a document
to obtain a richer set of phrases. Where a docu- R:<s1... sp> ⇒ Ci; confidence(R)=(#texts from Ci
ment has completely separate sections, phrases matching <s1... sp>)/(#texts matching <s1... sp>).
that span multiple sections can also be mined,
thereby discovering a new set of relationships. Rules are sorted depending on their confidence
This enhancement of the GSP algorithm can be level and the size of the associated sequence.
implemented by changing the Apriori-like candi- When considering a new text to be classified,
date generation algorithm, to consider both phrases a simple categorization policy is applied: the K
and words as individual elements when generating rules having the best confidence level and being
candidate k-phrases. The manner in which these supported are applied. The text is then assigned
candidates are counted would similarly change. to the class mainly obtained within the K rules.

Patterns for Text Categorization Patterns for XML Document

Classification
(Jaillet, Laurent, & Teisseire, 2006) propose us-
age of sequential patterns in the SPaC method (Garboni, Masseglia, & Trousse, 2005) present
(Sequential Patterns for Classification) for text a supervised classification technique for XML
categorization. Text categorization is the task of documents which is based on structure only. Each
assigning a boolean value to each pair (document, XML document is viewed as an ordered labeled
category) where the value is true if the document tree, represented by its tags only. After a cleaning
belongs to the particular category. SPaC method step, each predefined cluster is characterized in
consists of two steps. In the first step, sequential terms of frequent structural subsequences. Then
patterns are built from texts. In the second step, the XML documents are classified based on the
sequential patterns are used to classify texts. mined patterns of each cluster.
The text consists of a set of sentences. Each Documents are characterized using frequent
sentence is associated with a timestamp (its posi- sub-trees which are common to at least x% (the
tion in the text). Finally the set of words contained minimum support) documents of the collection.
in a sentence corresponds to the set of items pur- The system is provided a set of training docu-
chased by the client in the market basket analysis ments each of which is associated with a category.
framework. This representation is coupled with a Frequently occurring tags common to all clusters
stemming step and a stop-list. Sequential patterns are removed. In order to transform an XML docu-
are extracted using a different support applied for ment to a sequence, the nodes of the XML tree
each category Ci. The support of a frequent pattern are mapped into identifiers. Then each identifier
is the number of texts containing the sequence of is associated with its depth in the tree. Finally

8
Applications of Pattern Discovery Using Sequential Data Mining

a depth-first exploration of the tree gives the sequence of words including empty sequence.
corresponding sequence. An example sequential These sequential word patterns were introduced
pattern looks like <(0 movie), (1 title), (1 url), for authorship identification based on the fol-
(1 CountryOfProduction), (2 item), (2 item), (1 lowing assumption. Because people usually
filmography), (3 name)>. Once the whole set of generate words from the beginning to the end of
sequences (corresponding to the XML documents a sentence, how one orders words in a sentence
of a collection) is obtained, a traditional sequential can be an indicator of author’s writing style. As
pattern extraction algorithm is used to extract word order in Japanese (they study a Japanese
the frequent sequences. Those sequences, once corpus) is relatively free, rigid word segments
mapped back into trees, will give the frequent and non-contiguous word sequences may be a
sub-trees embedded in the collection. particularly important indicator of the writing
They tested several measures in order to decide style of authors.
which class each test document belongs to. The two While N-grams (consecutive word sequences)
best measures are based on the longest common fail to account for non-contiguous patterns, se-
subsequence. The first one computes the average quential pattern mining methods can do so quite
matching between the test document and the set naturally.
of sequential patterns and the second measure is a
modified measure, which incorporates the actual
length of the pattern compared to the maximum BIOINFORMATICS
length of a sequential pattern in the cluster.
Pattern mining is useful in the bioinformatics
Patterns to Identify Authors domain for predicting rules for organization of
of Documents certain elements in genes, for protein function pre-
diction, for gene expression analysis, for protein
(Tsuboi, 2002) aims at identifying the authors fold recognition and for motif discovery in DNA
of mailing list messages using a machine learn- sequences. We study these applications below.
ing technique (Support Vector Machines). In
addition, the classifier trained on the mailing Pattern Mining for Bio-Sequences
list data is applied to identify the author of Web
documents in order to investigate performance in Bio-sequences typically have a small alphabet,
authorship identification for more heterogeneous a long length, and patterns containing gaps (i.e.,
documents. Experimental results show better “don’t care”) of arbitrary size. A long sequence
identification performance when features of not (especially, with a small alphabet) often contains
only conventional word N-gram information but long patterns. Mining frequent patterns in such
also of frequent sequential patterns extracted by sequences faces a different type of explosion
a data mining technique (PrefixSpan) are used. than in transaction sequences primarily moti-
They applied PrefixSpan to extract sequential vated in market-basket analysis. (Wang, Xu, &
word patterns from each sentence and used them Yu, 2004) study how this explosion affects the
as author’s style markers in documents. The classic sequential pattern mining, and present a
sequential word patterns are sequential patterns scalable two-phase algorithm to deal with this
where item and sequence correspond to word and new explosion.
sentence, respectively. Biosequence patterns have the form of X1
Sequential pattern is <w1*w2*...*wl> where wi *...* Xn spanning over a long region, where each
is a word and l is the length of pattern. * is any Xi is a short region of consecutive items, called

9
Applications of Pattern Discovery Using Sequential Data Mining

a segment, and * denotes a variable length gap the positions after i when matching P’*X
corresponding to a region not conserved in the against s.
evolution. The presence of * implies that pattern
matching is more permissible and involves the Further to deal with the huge size of the
whole range in a sequence. The support of a pattern sequences, they introduce compression based
is the percentage of the sequences in the database querying. In this method, all positions in a
that contain the pattern. Given a minimum segment non-coding region are compressed into a new
length min_len and a minimum support min_sup, item ε that matches no existing item except *. A
a pattern X1 *...* Xn is frequent if |Xi|>=min_len non-coding region contains no part of a frequent
for 1<=i<=n and the support of the pattern is at segment. Each original sequence is scanned once,
least min_sup. The problem of mining sequence each consecutive region not overlapping with any
patterns is to find all frequent patterns. frequent segment is identified and collapsed into
The Segment Phase first searches short patterns the new item ε. For a long sequence and large
containing no gaps (Xi), called segments. This min_len and min_sup, a compressed sequence is
phase is efficient. This phase finds all frequent typically much shorter than the original sequence.
segments and builds an auxiliary structure for On real life datasets like DNA and protein
answering position queries. GST (generalized sequences submitted from 2002/12, 2003/02, they
suffix tree) is used to find: (1) The frequent seg- show the superiority of their method compared
ments of length min_len, Bi, called base segments, to PrefixSpan with respect to execution time and
and the position lists for each Bi, s:p1, p2... where the space required.
pj<pj+1 and each <s, pj> is a start position of Bi.
(2) All frequent segments of length>min_len. Note Patterns in Genes for Predicting
that position lists for such frequent segments are Gene Organization Rules
not extracted. This information about the base
segments and their positions is then stored in an In eukaryotes, rules regarding organization of cis-
index, Segment to Position Index. regulatory elements are complex. They sometimes
The Pattern Phase searches for long patterns govern multiple kinds of elements and positional
(X1 *...* Xn) containing multiple segments sepa- restrictions on elements. (Terai & Takagi, 2004)
rated by variable length gaps. This phase grows propose a method for detecting rules, by which the
rapidly one segment at a time, as opposed to one order of elements is restricted. The order restric-
item at a time. This phase is time consuming. The tion is expressed as element patterns. They extract
purpose of two phases is to exploit the information all the element patterns that occur in promoter
obtained from the first phase to speed up the pat- regions of at least the specified number of genes.
tern growth and matching and to prune the search Then, significant patterns are found based on
space in the second phase. the expression similarity of genes with promoter
Two types of pruning techniques are used. regions containing each of the extracted patterns.
Consider a pattern P’, which is a super-pattern of P: By applying the method to Saccharomyces cerevi-
siae, they detected significant patterns overlooked
• Pattern Generation Pruning: If P*X fails by previous methods, thus demonstrating the
to be a frequent pattern, so does P’*X. So, utility of sequential pattern mining for analysis
we can prune P’*X. of eukaryotic gene regulation. Several types of
• Pattern Matching Pruning: If P*X fails element organization exist, those in which (1)
to occur before position i in sequence s, so only the order of elements is important, (2) order
does P’*X. So, we only need to examine and distance both are important and (3) only the

10
Applications of Pattern Discovery Using Sequential Data Mining

combination of elements is important. In this case, Patterns for Analysis of

pattern support is the number of genes containing Gene Expression Data
the pattern in their promoter region. Minimum
length of the patterns may vary with the species. (Icev, 2003) introduces a sequential pattern mining
They use Apriori algorithm to perform mining. based technique for the analysis of gene expres-
Each element typically has a length of 10–20 sion. Gene expression is the effective production
base pairs. Therefore, two elements sometimes of the protein that a gene encodes. They focus on
overlap one another. In this study, any two ele- the characterization of the expression patterns of
ments overlapping each other are not considered genes based on their promoter regions. The pro-
to be ordered elements, because they use elements moter region of a gene contains short sequences
defined by computational prediction. Most of called motifs to which gene regulatory proteins
these overlapping sites may have no biological may bind, thereby controlling when and in which
meaning; they may simply be false-positive hits cell types the gene is expressed. Their approach
during computational prediction of elements. addresses two important aspects of gene expres-
The decision of how to treat such overlapping sion analysis: (1) Binding of proteins at more than
elements is reflected in the count stage −if a one motif is usually required, and several different
pattern consisting of element A followed by and types of proteins may need to bind several differ-
overlapping with B should not be considered as ent types of motifs in order to confer transcrip-
<A,B>, we can exclude genes containing such tional specificity. (2) Since proteins controlling
elements when counting the support of <A,B>. transcription may need to interact physically, the
This is an interesting tweak in counting support, order and spacing in which motifs occur can affect
specific to this problem. expression. They use association rules to address
the combinatorial aspect. The association rules
Patterns for Predicting Protein have the ability to involve multiple motifs and
Sequence Function to predict expression in multiple cell types. To
address the second aspect, association rules are
(Wang, Shang, & Li, 2008) present a novel method enhanced with information about the distances
of protein sequence function prediction based on among the motifs, or items that are present in
sequential pattern mining. First, known function the rule. Rules of interest are those whose set of
sequence dataset is mined to get frequent patterns. motifs deviates properly, i.e. set of motifs whose
Then, a classifier is built using the patterns gen- pair-wise distances are highly conserved in the
erated to predict function of protein sequences. promoter regions where these motifs occur.
They propose the usage of joined frequent pat- They define the cvd of a pair of motifs with
terns based and joined closed frequent patterns respect to a collection (or item-set) I of motifs as
based sequential pattern mining algorithms for the ratio between the standard deviation and the
mining this data. First, the joined frequent pattern mean of the distances between the motifs in those
segments are generated. Then, longer frequent promoter regions that contain all the motifs in I.
patters can be obtained by combining the above Given a dataset of instances D, a minimum
segments. They generate closed patterns only. support min_sup, a minimum confidence min_
The purpose of producing closed patterns is to use conf, and a maximum coefficient of variation of
them to construct a classifier for protein function distances (max-cvd), they find all distance-based
prediction. So using non-redundant patterns can association rules from D whose support and confi-
improve the accuracy of classification. dence are >= the min_sup and min_conf thresholds
and such that the cvd’s of all the pairs of items

11
Applications of Pattern Discovery Using Sequential Data Mining

in the rule are <= the maximum cvd threshold. into two parts, one for the antecedent and one for
Their algorithm to mine distance-based associa- the consequent of the rule, is considered. If the
tion rules from a dataset of instances extends the rule so formed satisfies the min_conf constraint,
Apriori algorithm. then the rule is added to the output. These rules are
In order to obtain distance-based association then used for building a classification/predictive
rules, one could use the Apriori algorithm to model for gene expression.
mine all association rules whose supports and
confidences satisfy the thresholds, and then an- Patterns for Protein
notate those rules with the cvd’s of all the pair of Fold Recognition
items present in the rule. Only those rules whose
cvd’s satisfy the max-cvd threshold are returned. Protein data contain discriminative patterns that
They call this algorithm to mine distance-based can be used in many beneficial applications if
association rules, Naïve distance-Apriori. they are defined correctly. (Exarchos, Papaloukas,
Distance-based Association Rule Mining Lampros, & Fotiadis, 2008) use sequential pat-
(DARM) algorithm first generates all the frequent tern mining for sequence-based fold recognition.
item-sets that satisfy the max-cvd constraint (cvd- Protein classification in terms of fold recognition
frequent item-sets), and then generates all associa- plays an important role in computational protein
tion rules with the required confidence from those analysis, since it can contribute to the determina-
item-sets. Note that the max-cvd constraint is a tion of the function of a protein whose structure is
non-monotonic property. An item-set that does not unknown. Fold means 3D structure of a protein.
satisfy this constraint may have supersets that do. They use cSPADE (Zaki, Sequence mining in
However, they define the following procedure that categorical domains: incorporating constraints,
keeps under consideration only frequent item-sets 2000), for the analysis of protein sequence. Se-
that deviate properly in an interesting manner. quential patterns were generated for each category
Let n be the number of promoter regions (in- (fold) separately. A patterni extracted from foldi,
stances) in the dataset. Let I be a frequent item- indicates an implication (rule) of the form patterni
set, and let S be the set of promoter regions that ⇒foldi. A maximum gap constraint is also used.
contain I. I is then said to deviate properly if either: When classifying an unknown protein to one of
the folds, all the extracted sequential patterns from
1. I is cvd-frequent. That is, the cvd over S of all folds are examined to find which of them are
each pair of motifs in I is <= max-cvd, or contained in the protein. For a pattern contained
2. For each pair of motifs P∈I, there is a subset in a protein, the score of this protein with respect
S’ of S with cardinality >= ⌈min_sup*n⌉ such to this fold is increased by: scoreai=(length of the
that the cvd over S’ of P is <= max-cvd. patternai-k) /(number of patterns in foldi) where ‘i’
represents a fold, ‘a’ represents a pattern of a fold.
The k-level of item-sets kept by the DARM Here, the length is the size of the pattern with gaps.
algorithm is the collection of frequent item-sets of Patternai is the ath pattern of the ith fold and k is a
cardinality k that deviate properly. Those item-sets value employed to assign the minimum score, to
are used to generate the (k+1)-level. Once, all the the minimal pattern. It should be mentioned that
frequent item-sets that deviate properly have been if a pattern is contained in a protein sequence
generated, distance-based association rules are more than once, it receives the same score as if
constructed from those item-sets that satisfy the it was contained only once. The scores for each
max-cvd constraint. As is the case with the Apriori fold are summed and the new protein is assigned
algorithm, each possible split of such an item-set to the fold exhibiting the highest sum.

12
Applications of Pattern Discovery Using Sequential Data Mining

The score of a protein with respect to a fold These subsequences are possibly implied in a
is calculated based on the number of sequential structural or biological function of the family and
patterns of this fold contained in the protein. The have been preserved through the protein evolution.
higher the number of patterns of a fold contained Thus, if a sequence shares patterns with other
in a protein, the higher the score of the protein sequences it is expected that the sequences are
for this fold. biologically related. Considering the two types
A classifier uses the extracted sequential pat- of patterns, rigid gap patterns reveal better con-
terns to classify proteins in the appropriate fold served regions of similarity. On the other hand,
category. For training and evaluating the proposed flexible gap patterns have a greater probability
method they used the protein sequences from of occur by chance, having a smaller biological
the Protein Data Bank and the annotation of the significance. Since the protein alphabet is small,
SCOP database. The method exhibited an overall many small patterns that express trivial local
accuracy of 25% (random would be 2.8%) in a similarity may arise. Therefore, longer patterns
classification problem with 36 candidate catego- are expected to express greater confidence in the
ries. The classification performance reaches up to sequences similarity.
56% when the five most probable protein folds
are considered. Patterns in DNA Sequences

Patterns for Protein Family Detection Large collections of genomic information have
been accumulated in recent years, and embedded
In another work on protein family detection (pro- in them is potentially significant knowledge for
tein classification), (Ferreira & Azevedo, 2005) exploitation in medicine and in the pharmaceutical
use the number and average length of the relevant industry. (Guan, Liu, & Bell, 2004) detect strings
subsequences shared with each of the protein in DNA sequences which appear frequently, either
families, as features to train a Bayes classifier. within a given sequence (e.g., for a particular
Priors for the classes are set using the number of patient) or across sequences (e.g., from different
patterns and average length of the patterns in the patients sharing a particular medical diagnosis).
corresponding class. Motifs are strings that occur very frequently.
Having discovered such motifs, they show how to
They Identify Two Types of Patterns mine association rules by an existing rough-sets
based technique.
Rigid Gap Patterns (only contain gaps with a
fixed length) and Flexible Gap Patterns (allow a
variable number of gaps between symbols of the TELECOMMUNICATIONS
sequence). Frequent patterns are mined with the
constraint of minimum length. Apart from this, Pattern mining can be used in the field of tele-
they also support item constraints (restricts set of communications for mining of group patterns
other symbols that can occur in the pattern), gap from mobile user movement data, for customer
constraints (minGap and maxGap), duration or behavior prediction, for predicting future location
window constraints which defines the maximum of a mobile user for location based services and
distance (window) between the first and the last for mining patterns useful for mobile commerce.
event of the sequence patterns. We discuss these works briefly in this sub-section.
Protein sequences of the same family typically
share common subsequences, also called motifs.

13
Applications of Pattern Discovery Using Sequential Data Mining

Patterns in Mobile User In the second series of experiments, they show

Movement Data that the methods using location summarization
reduce the mining overheads for group patterns
(Wang, Lim, & Hwang, 2006) present a new ap- of size two significantly.
proach to derive groupings of mobile users based
on their movement data. User movement data are Patterns for Customer
collected by logging location data emitted from Behavior Prediction
mobile devices tracking users. This data is of
the form D = (D1, D2... DM), where Di is a time Predicting the behavior of customers is challeng-
series of tuples (t, (x, y, z)) denoting the x, y and ing, but important for service oriented businesses.
z coordinates of user ui at time t. A set of consecu- Data mining techniques are used to make such
tive time points [ta, tb] is called a valid segment predictions, typically using only recent static data.
of G (where G is a set of users) if all the pair of (Eichinger, Nauck, & Klawonn) propose the usage
users are within dist max_dis for time [ta,tb], at of sequence mining with decision tree analysis for
least one pair of users has distance greater than this task. The combined classifier is applied to real
max_dis before time ta, at least one pair of users customer data and produces promising results.
has distance greater than max_dis after time tb and
tb-ta+1 >=min_dur. Given a set of users G, thresh- They Use Two Sequence
olds max_dis and min_dur, these form a group Mining Parameters
pattern, denoted by P = <G,max_dis,min_dur>, if
G has a valid segment. Thus, a group pattern is a maxGap, the maximum number of allowed ex-
group of users that are within a distance threshold tra events in between a sequence and maxSkip,
from one another for at least a minimum duration. the maximum number of events at the end of a
In a movement database, a group pattern may sequence before the occurrence of the event to
have multiple valid segments. The combined be predicted.
length of these valid segments is called the weight- They use an Apriori algorithm to detect fre-
count of the pattern. Thus the significance of the quent patterns from a Sequence tree and hash
pattern is measured by comparing its weight-count table based data structure. This avoids multiple
with the overall time duration. database scans, which are otherwise necessary
Since weight represents the proportion of the after every generation of candidate sequences in
time points a group of users stay close together, Apriori based algorithms.
the larger the weight is, the more significant (or The frequent sequences are combined with
interesting) the group pattern is. Furthermore, if decision tree based classification to predict cus-
the weight of a group pattern exceeds a threshold tomer behavior.
min_wei, it is called a valid group pattern, and
the corresponding group of users a valid group. Patterns for Future Location
To mine group patterns, they first propose two Prediction of Mobile Users
algorithms, namely AGP (based on Apriori) and
VG-growth (based on FP-growth). They show that Future location prediction of mobile users can
when both the number of users and logging dura- provide location-based services (LBSs) with ex-
tion are large, AGP and VG-growth are inefficient tended resources, mainly time, to improve system
for the mining group patterns of size two. There- reliability which in turn increases the users’ confi-
fore, they propose a framework that summarizes dence and the demand for LBSs. (Vu, Ryu, & Park,
user movement data before group pattern mining. 2009) propose a movement rule-based Location

14
Applications of Pattern Discovery Using Sequential Data Mining

Prediction method (RLP), to guess the user’s future patterns and sequential intrusion patterns from a
location for LBSs. They define moving sequences collection of attack packets, and then converts the
and frequent patterns in trajectory data. Further, patterns to Snort detection rules for on-line intru-
they find out all frequent spatiotemporal move- sion detection. Patterns are extracted both from
ment patterns using an algorithm based on GSP packet headers and the packet payload. A typical
algorithm. The candidate generating mechanism pattern is of the form “A packet with DA port as
of the technique is based on that of GSP algorithm 139, DgmLen field in header set to 48 and with
with an additional temporal join operation and content as 11 11”. Intrusion behavior detection
a different method for pruning candidates. In engine creates an alert when a series of incom-
addition, they employ the clustering method to ing packets match the signatures representing
control the dense regions of the patterns. With sequential intrusion scenarios.
the frequent movement patterns obtained from
the preceding subsection, the movement rules are Patterns for Discovering Multi-
generated easily. Stage Attack Strategies

Patterns for Mobile Commerce In monitoring anomalous network activities,

intrusion detection systems tend to generate a
To better reflect the customer usage patterns in large amount of alerts, which greatly increase the
the mobile commerce environment, (Yun & Chen, workload of post-detection analysis and decision-
2007) propose an innovative mining model, called making. A system to detect the ongoing attacks
mining mobile sequential patterns, which takes and predict the upcoming next step of a multistage
both the moving patterns and purchase patterns attack in alert streams by using known attack
of customers into consideration. How to strike a patterns can effectively solve this problem. The
compromise among the use of various knowledge complete, correct and up to date pattern rule of
to solve the mining on mobile sequential patterns, various network attack activities plays an impor-
is a challenging issue. They devise three algorithms tant role in such a system. An approach based on
for determining the frequent sequential patterns sequential pattern mining technique to discover
from the mobile transaction sequences. multistage attack activity patterns is efficient to
reduce the labor to construct pattern rules. But
in a dynamic network environment where novel
INTRUSION DETECTION attack strategies appear continuously, the novel
approach proposed by (Li, Zhang, Li, & Wang,
Sequential pattern mining has been used for in- 2007) to use incremental mining algorithm shows
trusion detection to study patterns of misuse in better capability to detect recently appeared attack.
network attack data and thereby detect sequential They remove the unexpected results from mining
intrusion behaviors and for discovering multistage by computing probabilistic score between suc-
attack strategies. cessive steps in a multistage attack pattern. They
use GSP to discover multistage attack behavior
Patterns in Network Attack Data patterns. All the alerts stored in database can be
viewed as a global sequence of alerts sorted by
(Wuu, Hung, & Chen, 2007) have implemented ascending DetectTime timestamp. Sequences of
an intrusion pattern discovery module in Snort alerts describe the behavior and actions of attack-
network intrusion detection system which applies ers. Multistage attack strategies can be found by
data mining technique to extract single intrusion analyzing this alert sequence. A sequential pattern

15
Applications of Pattern Discovery Using Sequential Data Mining

is a collection of alerts that occur relatively close WP, Solar etc are different earth science parameters
to each other in a given order frequently. Once with values Hi (High) or Lo (Low).
such patterns are known, the rules can be produced
for describing or predicting the behavior of the Patterns for Computer
sequence of network attack. Systems Management

Predictive algorithms play a crucial role in sys-

OTHER APPLICATIONS tems management by alerting the user to potential
failures. (Vilalta, Apte, Hellerstein, Ma, & Weiss,
Apart from the different domains mentioned 2002) focus on three case studies dealing with the
above, sequential pattern mining has been found prediction of failures in computer systems: (1)
useful in a variety of other domains. We briefly long-term prediction of performance variables
mention works in some of such areas in this sub- (e.g., disk utilization), (2) short-term prediction
section. Besides the works mentioned below, there of abnormal behavior (e.g., threshold violations),
are some applications that may need to classify and (3) short-term prediction of system events
sequence data, such as based on sequence patterns. (e.g., router failure). Empirical results show
An overview on research in sequence classification that predictive algorithms based on mining of
can be found in (Xing, Pei & Keogh). sequential patterns can be successfully employed
in the estimation of performance variables and the
Patterns in Earth Science Data prediction of critical events.

The earth science data consists of time series mea- Patterns to Detect Plan Failures
surements for various Earth science and climate
variables (e.g. soil moisture, temperature, and (Zaki, Lesh, & Mitsunori, 1999) present an al-
precipitation), along with additional data from gorithm to extract patterns of events that predict
existing ecosystem models (e.g. Net Primary failures in databases of plan executions: Plan-
Production). The ecological patterns of interest Mine. Analyzing execution traces is appropriate
include associations, clusters, predictive models, for planning domains that contain uncertainty,
and trends. (Potter, Klooster, Torregrosa, Tan, such as incomplete knowledge of the world or
Steinbach, & Kumar) discuss some of the chal- actions with probabilistic effects. They extract
lenges involved in preprocessing and analyzing causes of plan failures and feed the discovered
the data, and also consider techniques for handling patterns back into the planner. They label each
some of the spatio-temporal issues. Earth Science plan as “good” or “bad” depending on whether
data has strong seasonal components that need it achieved its goal or it failed to do so. The goal
to be removed prior to pattern analysis, as Earth is to find “interesting” sequences that have a high
scientists are primarily interested in patterns confidence of predicting plan failure. They use
that represent deviations from normal seasonal SPADE to mine such patterns.
variation such as anomalous climate events (e.g., TRIPS is an integrated system in which a
El Nino) or trends (e.g., global warming). They person collaborates with a computer to develop a
de-seasonalize the data and then compute variety high quality plan to evacuate people from a small
of spatio-temporal patterns. Rules learned from island. During the process of building the plan,
the patterns look like (WP-Hi) ⇒ (Solar-Hi) ⇒ the system simulates the plan repeatedly based
(NINO34-Lo) ⇒ (Temp-Hi) ⇒ (NPP-Lo) where on a probabilistic model of the domain, includ-

16
Applications of Pattern Discovery Using Sequential Data Mining

ing predicted weather patterns and their effect on Significant patterns provide knowledge of one or
vehicle performance. more product failures that lead to future product
The system returns an estimate of the plan’s fault(s). The effectiveness of the algorithm is il-
success. Additionally, TRIPS invokes PlanMine lustrated with the warranty data mining application
on the execution traces produced by simulation, from the automotive industry.
in order to analyze why the plan failed when it
did. The system runs PlanMine on the execution Patterns in Alarm Data
traces of the given plan to pinpoint defects in the
plan that most often lead to plan failure. It then Increasingly powerful fault management systems
applies qualitative reasoning and plan adaptation are required to ensure robustness and quality of
algorithms to modify the plan to correct the defects service in today’s networks. In this context, event
detected by PlanMine. correlation is of prime importance to extract
meaningful information from the wealth of alarm
Patterns in Automotive data generated by the network. Existing sequen-
Warranty Data tial data mining techniques address the task of
identifying possible correlations in sequences of
When a product fails within a certain time period, alarms. The output sequence sets, however, may
the warranty is a manufacturer’s assurance to a contain sequences which are not plausible from
buyer that the product will be repaired without the point of view of network topology constraints.
a cost to the customer. In a service environment (Devitt, Duffin, & Moloney, 2005) presents the
where dealers are more likely to replace than to Topographical Proximity (TP) approach which
repair, the cost of component failure during the exploits topographical information embedded in
warranty period can easily equal three to ten times alarm data in order to address this lack of plausibil-
the supplier’s unit price. Consequently, companies ity in mined sequences. Their approach is based
invest significant amounts of time and resources on an Apriori approach and introduces a novel
to monitor, document, and analyze product war- criterion for sequence selection which evaluates
ranty data. (Buddhakulsomsiri & Zakarian, 2009) sequence plausibility and coherence in the context
present a sequential pattern mining algorithm that of network topology. Connections are inferred at
allows product and quality engineers to extract run-time between pairs of alarm generating nodes
hidden knowledge from a large automotive war- in the data and a Topographical Proximity (TP)
ranty database. The algorithm uses the elementary measure is assigned based on the strength of the
set concept and database manipulation techniques inferred connection. The TP measure is used to
to search for patterns or relationships among reject or promote candidate sequences on the basis
occurrences of warranty claims over time. The of their plausibility, i.e. the strength of their con-
sequential patterns are represented in a form of nection, thereby reducing the candidate sequence
IF–THEN association rules, where the IF portion set and optimizing the space and time constraints
of the rule includes quality/warranty problems, of the data mining process.
represented as labor codes, that occurred in an
earlier time, and the THEN portion includes Patterns for Personalized
labor codes that occurred at a later time. Once a Recommendation System
set of unique sequential patterns is generated, the
algorithm applies a set of thresholds to evaluate (Romero, Ventura, Delgado, & Bra, 2007) describe
the significance of the rules and the rules that a personalized recommender system that uses web
pass these thresholds are reported in the solution. mining techniques for recommending a student

17
Applications of Pattern Discovery Using Sequential Data Mining

which (next) links to visit within an adaptable with a dataset consisting of samples from aerosol
educational hypermedia system. They present a time-of-flight mass spectrometer (ATOFMS).
specific mining tool and a recommender engine A mass spectrum is a plot of signal intensity
that helps the teacher to carry out the whole web (often normalized to the largest peak in the spec-
mining process. The overall process of Web per- trum) versus the mass-to-charge (m/z) ratio of
sonalization based on Web usage mining generally the detected ions. Thus, the presence of a peak
consists of three phases: data preparation, pattern indicates the presence of one or more ions con-
discovery and recommendation. The first two taining the m/z value indicated, within the ion
phases are performed off-line and the last phase cloud generated upon the interaction between
on-line. To make recommendations to a student, the particle and the laser beam. In many cases,
the system first, classifies the new students in one the ATOFMS generates elemental ions. Thus, the
of the groups of students (clusters). Then, it only presence of certain peaks indicates that elements
uses the sequential patterns of the correspond- such as Na+ (m/z = +23) or Fe+ (m/z = +56) or
ing group to personalize the recommendations O- (m/z = -16) ions are present. In other cases,
based on other similar students and his current cluster ions are formed, and thus the m/z observed
navigation. Grouping of students is done using represents that of a sum of the atomic weights of
k-means. They use GSP to get frequent sequences various elements.
for each of the clusters. They mine rules of the For many kinds of analysis, what is significant
form readme⇒install, welcome⇒install which are in each particle’s mass spectrum is the composi-
intuitively quite common patterns for websites. tion of the particle, i.e., the ions identified by the
peak labels (and, ideally, their proportions in the
Patterns in Atmospheric particle, and our confidence in having correctly
Aerosol Data identified them). While this representation is
less detailed than the labeled spectrum itself, it
EDAM (Exploratory Data Analysis and Man- allows us to think of the ATOFMS data stream as
agement) is a joint project between researchers a time-series of observations, one per observed
in Atmospheric Chemistry and Computer Sci- particle, where each observation is a set of ions
ence at Carleton College and the University of (possibly labeled with some additional details).
Wisconsin-Madison that aims to develop data This is precisely the market-basket abstraction
mining techniques for advancing the state of the used in e-commerce: a time-series of customer
art in analyzing atmospheric aerosol datasets. transactions, each recording the items purchased
The traditional approach for particle measure- by a customer on a single visit to a store. This
ment, which is the collection of bulk samples of analogy opens the door to applying a wide range
particulates on filters, is not adequate for studying of association rule and sequential pattern algo-
particle dynamics and real-time correlations. This rithms to the analysis of mass spectrometry data.
has led to the development of a new generation Once these patterns are mined, they can be used to
of real-time instruments that provide continuous extrapolate to periods where filter-based samples
or semi-continuous streams of data about certain were not collected.
aerosol properties. However, these instruments
have added a significant level of complexity to at- Patterns in Individuals’ Time Diaries
mospheric aerosol data, and dramatically increased
the amounts of data to be collected, managed, and Identifying patterns of activities within indi-
analyzed. (Ramakrishnan, et al., 2005) experiment viduals’ time diaries and studying similarities and
deviations between individuals in a population

18
Applications of Pattern Discovery Using Sequential Data Mining

is of interest in time use research. So far, activ- of the population) that should be performing the
ity patterns in a population have mostly been pattern. The sequential mining algorithm that
studied either by visual inspection, searching for they have used for the activity pattern extraction
occurrences of specific activity sequences and is an “AprioriAll” algorithm which is adapted to
studying their distribution in the population, or the time diary data.
statistical methods such as time series analysis Two stage classification using patterns: (Ex-
in order to analyze daily behavior. (Vrotsou, El- archos, Tsipouras, Papaloukas, & Fotiadis, 2008)
legård, & Cooper) describe a new approach for present a methodology for sequence classification,
extracting activity patterns from time diaries that which employs sequential pattern mining and
uses sequential data mining techniques. They optimization, in a two-stage process. In the first
have implemented an algorithm that searches the stage, a sequence classification model is defined,
time diaries and automatically extracts all activ- based on a set of sequential patterns and two sets
ity patterns meeting user-defined criteria of what of weights are introduced, one for the patterns and
constitutes a valid pattern of interest. Amongst the one for classes. In the second stage, an optimiza-
many criteria which can be applied are: a time tion technique is employed to estimate the weight
window containing the pattern, and minimum values and achieve optimal classification accuracy.
and maximum number of people that perform the Extensive evaluation of the methodology is car-
pattern. The extracted activity patterns can then ried out, by varying the number of sequences, the
be interactively filtered, visualized and analyzed number of patterns and the number of classes and
to reveal interesting insights using the VISUAL- it is compared with similar sequence classifica-
TimePAcTS application. To demonstrate the value tion approaches.
of this approach they consider and discuss sequen-
tial activity patterns at a population level, from a
single day perspective, with focus on the activity CONCLUSION
“paid work” and some activities surrounding it.
An activity pattern in this paper is defined as a We presented selected applications of the se-
sequence of activities performed by an individual quential pattern mining methods in the fields of
which by itself or together with other activities, healthcare, education, web usage mining, text
aims at accomplishing a more general goal/proj- mining, bioinformatics, telecommunications,
ect. When analyzing a single day of diary data, intrusion detection, etc. We envision that the
activity patterns identified in a single individual power of sequential mining methods has not yet
(referred to as an individual activity pattern) are been fully exploited. We hope to see many more
unlikely to be significant but those found amongst strong applications of these methods in a variety
a group or population (a collective activity pat- of domains in the years to come.
tern) are of greater interest. Seven categories of
activities that they consider are: care for oneself,
care for others, household care, recreation/reflec- REFERENCES
tion, travel, prepare/procure food, work/school.
{“cook dinner”; “eat dinner”; “wash dishes”} is Berendt, B. A. (2000). Analysis of navigation
a typical pattern. They also incorporate a variety behaviour in web sites integrating multiple infor-
of constraints like min and max pattern duration, mation systems. The VLDB Journal, 9(1), 56–75.
min and max gap between activities, min and doi:10.1007/s007780050083
max number of occurrences of the pattern and
min and max number of people (or a percentage

19
Applications of Pattern Discovery Using Sequential Data Mining

Buchner, A. G., & Mulvenna, M. D. (1998). Dis- Ferreira, P. G., & Azevedo, P. J. (2005). Protein
covering Internet marketing intelligence through sequence classification through relevant sequence
online analytical web usage mining. SIGMOD Re- mining and bayes classifiers. Proc. 12th Portu-
cord, 27(4), 54–61. doi:10.1145/306101.306124 guese Conference on Artificial Intelligence (EPIA)
(pp. 236-247). Springer-Verlag.
Buddhakulsomsiri, J., & Zakarian, A. (2009). Se-
quential pattern mining algorithm for automotive Garboni, C., Masseglia, F., & Trousse, B. (2005).
warranty data. Journal of Computers and Indus- Sequential pattern mining for structure-based
trial Engineering, 57(1), 137–147. doi:10.1016/j. XML document classification. Workshop of the
cie.2008.11.006 INitiative for the Evaluation of XML Retrieval.
Chen, Y.-L., & Huang, T. C.-K. (2008). A novel Guan, J. W., Liu, D., & Bell, D. A. (2004). Dis-
knowledge discovering model for mining fuzzy covering motifs in DNA sequences. Fundam.
multi-level sequential patterns in sequence data- Inform., 59(2-3), 119–134.
bases. Data & Knowledge Engineering, 66(3),
Icev, A. (2003). Distance-enhanced association
349–367. doi:10.1016/j.datak.2008.04.005
rules for gene expression. BIOKDD’03, in con-
Cooley, R., Mobasher, B., & Srivastava, J. (1999). junction with ACM SIGKDD.
Data preparation for mining World Wide Web
Ishio, T., Date, H., Miyake, T., & Inoue, K. (2008).
browsing patterns. Knowledge and Information
Mining coding patterns to detect crosscutting con-
Systems, 1(1), 5–32.
cerns in Java programs. WCRE ‘08: Proceedings
Devitt, A., Duffin, J., & Moloney, R. (2005). of the 2008 15th Working Conference on Reverse
Topographical proximity for mining network Engineering (pp. 123-132). Washington, DC:
alarm data. MineNet ‘05: Proceedings of the 2005 IEEE Computer Society.
ACM SIGCOMM workshop on Mining network
Jaillet, S., Laurent, A., & Teisseire, M. (2006).
data (pp. 179-184). Philadelphia, PA: ACM.
Sequential patterns for text categorization. Intel-
Eichinger, F., Nauck, D. D., & Klawonn, F. (n.d.). ligent Data Analysis, 10(3), 199–214.
Sequence mining for customer behaviour predic-
Kay, J., Maisonneuve, N., Yacef, K., & Zaïane,
tions in telecommunications.
O. (2006). Mining patterns of events in students’
Exarchos, T. P., Papaloukas, C., Lampros, C., & teamwork data. In Educational Data Mining
Fotiadis, D. I. (2008). Mining sequential patterns Workshop, held in conjunction with Intelligent
for protein fold recognition. Journal of Biomedi- Tutoring Systems (ITS), (pp. 45-52).
cal Informatics, 41(1), 165–179. doi:10.1016/j.
Kum, H.-C., Chang, J. H., & Wang, W. (2006).
jbi.2007.05.004
Sequential Pattern Mining in Multi-Databases via
Exarchos, T. P., Tsipouras, M. G., Papaloukas, C., Multiple Alignment. Data Mining and Knowl-
& Fotiadis, D. I. (2008). A two-stage methodology edge Discovery, 12(2-3), 151–180. doi:10.1007/
for sequence classification based on sequential s10618-005-0017-3
pattern mining and optimization. Data & Knowl-
Kum, H.-C., Chang, J. H., & Wang, W. (2007).
edge Engineering, 66(3), 467–487. doi:10.1016/j.
Benchmarking the effectiveness of sequential
datak.2008.05.007
pattern mining methods. Data & Knowledge
Engineering, 60(1), 30–50. doi:10.1016/j.
datak.2006.01.004

20
Applications of Pattern Discovery Using Sequential Data Mining

Kuo, R. J., Chao, C. M., & Liu, C. Y. (2009). In- Masseglia, F., Poncelet, P., & Teisseire, M. (2009).
tegration of K-means algorithm and AprioriSome Efficient mining of sequential patterns with time
algorithm for fuzzy sequential pattern mining. Ap- constraints: Reducing the combinations. Expert
plied Soft Computing, 9(1), 85–93. doi:10.1016/j. Systems with Applications, 36(2), 2677–2690.
asoc.2008.03.010 doi:10.1016/j.eswa.2008.01.021
Lau, A., Ong, S. S., Mahidadia, A., Hoffmann, Mendes, L. F., Ding, B., & Han, J. (2008). Stream
A., Westbrook, J., & Zrimec, T. (2003). Mining sequential pattern mining with precise error
patterns of dyspepsia symptoms across time points bounds. Proc. 2008 Int. Conf. on Data Mining
using constraint association rules. PAKDD’03: (ICDM’08), Italy, Dec. 2008.
Proceedings of the 7th Pacific-Asia conference on
Mobasher, B., Dai, H., Luo, T., & Nakagawa, M.
Advances in knowledge discovery and data mining
(2002). Using sequential and non-sequential pat-
(pp. 124-135). Seoul, Korea: Springer-Verlag.
terns in predictive Web usage mining tasks. ICDM
Laur, P.-A., Symphor, J.-E., Nock, R., & Pon- ‘02: Proceedings of the 2002 IEEE International
celet, P. (2007). Statistical supports for mining Conference on Data Mining (pp. 669-672). Wash-
sequential patterns and improving the incremental ington, DC: IEEE Computer Society.
update process on data streams. Intelligent Data
Nicolas, J. A., Herengt, G., & Albuisson, E. (2004).
Analysis, 11(1), 29–47.
Sequential pattern mining and classification of
Lent, B., Agrawal, R., & Srikant, R. (1997). Dis- patient path. MEDINFO 2004: Proceedings Of
covering trends in text databases. Proc. 3rd Int. The 11th World Congress On Medical Informatics.
Conf. Knowledge Discovery and Data Mining,
Parthasarathy, S., Zaki, M., Ogihara, M., &
KDD (pp. 227-230). AAAI Press.
Dwarkadas, S. (1999). Incremental and interactive
Li, Z., Zhang, A., Li, D., & Wang, L. (2007). Dis- sequence mining. In Proc. of the 8th Int. Conf.
covering novel multistage attack strategies. ADMA on Information and Knowledge Management
‘07: Proceedings of the 3rd international confer- (CIKM’99).
ence on Advanced Data Mining and Applications
Perera, D., Kay, J., Yacef, K., & Koprinska, I.
(pp. 45-56). Harbin, China: Springer-Verlag.
(2007). Mining learners’ traces from an online
Lin, N. P., Chen, H.-J., Hao, W.-H., Chueh, H.-E., collaboration tool. Proceedings of Educational
& Chang, C.-I. (2008). Mining strong positive and Data Mining workshop (pp. 60–69). CA, USA:
negative sequential patterns. W. Trans. on Comp., Marina del Rey.
7(3), 119–124.
Pinto, H., Han, J., Pei, J., Wang, K., Chen, Q., &
Mannila, H., Toivonen, H., & Verkamo, I. (1997). Dayal, U. (2001). Multi-dimensional sequential
Discovery of frequent episodes in event sequences. pattern mining. CIKM ‘01: Proceedings of the
Data Mining and Knowledge Discovery, 1(3), Tenth International Conference on Information
259–289. doi:10.1023/A:1009748302351 and Knowledge Management (pp. 81-88). New
York, NY: ACM.
Masseglia, F., Poncelet, P., & Teisseire, M. (2003).
Incremental mining of sequential patterns in large Potter, C., Klooster, S., Torregrosa, A., Tan, P.-
databases. Data & Knowledge Engineering, 46(1), N., Steinbach, M., & Kumar, V. (n.d.). Finding
97–121. doi:10.1016/S0169-023X(02)00209-4 spatio-temporal patterns in earth science data.

21
Applications of Pattern Discovery Using Sequential Data Mining

Ramakrishnan, R., Schauer, J. J., Chen, L., Huang, Wang, J. L., Chirn, G., Marr, T., Shapiro, B.,
Z., Shafer, M. M., & Gross, D. S. (2005). The Shasha, D., & Zhang, K. (1994). Combinatorial
EDAM project: Mining atmospheric aerosol da- pattern discovery for scientific data: Some pre-
tasets: Research articles. International Journal of liminary results. Proc. ACM SIGMOD Int’l Conf.
Intelligent Systems, 20(7), 759–787. doi:10.1002/ Management of Data, (pp. 115-125).
int.20094
Wang, K., Xu, Y., & Yu, J. X. (2004). Scalable
Romero, C., Ventura, S., Delgado, J. A., & Bra, sequential pattern mining for biological sequences.
P. D. (2007). Personalized links recommendation CIKM ‘04: Proceedings of the Thirteenth ACM
based on data mining un adaptive educational International Conference on Information and
hypermedia systems. Creating New Learning Knowledge Management (pp. 178-187). Wash-
Experiences on a Global Scale. Second European ington, DC: ACM.
Conference on Technology Enhanced Learning,
Wang, M., Shang, X.-Q., & Li, Z.-H. (2008).
EC-TEL 2007 (pp. 293-305). Crete, Greece:
Sequential pattern mining for protein function
Springer.
prediction. ADMA ‘08: Proceedings of 4th In-
Seno, M., & Karypis, G. (2002). SLPMiner: An ternational Conference on Adv Data Mining and
algorithm for finding frequent sequential patterns Applications (pp. 652-658). Chengdu, China:
using length-decreasing support constraint. In Springer-Verlag.
Proceedings of the 2nd IEEE International Con-
Wang, Y., Lim, E.-P., & Hwang, S.-Y. (2006).
ference on Data Mining (ICDM), (pp. 418-425).
Efficient mining of group patterns from user
Srikant, R., & Agrawal, R. (1996)... Advances in movement data. Data & Knowledge Engineering,
Database Technology EDBT, 96, 3–17. 57(3), 240–282. doi:10.1016/j.datak.2005.04.006
Terai, G., & Takagi, T. (2004). Predicting rules Wong, P. C., Cowley, W., Foote, H., Jurrus, E.,
on organization of cis-regulatory elements, tak- & Thomas, J. (2000). Visualizing sequential pat-
ing the order of elements into account. Bioin- terns for text mining. Proc. IEEE Information
formatics (Oxford, England), 20(7), 1119–1128. Visualization, 2000 (pp. 105-114). Society Press.
doi:10.1093/bioinformatics/bth049
Wuu, L.-C., Hung, C.-H., & Chen, S.-F. (2007).
Tsuboi, Y. (2002). Authorship identification for Building intrusion pattern miner for Snort network
heterogeneous documents. intrusion detection system. Journal of Systems
and Software, 80(10), 1699–1715. doi:10.1016/j.
Vilalta, R., Apte, C. V., Hellerstein, J. L., Ma, S., &
jss.2006.12.546
Weiss, S. M. (2002). Predictive algorithms in the
management of computer systems. IBM Systems Xing, Z., Pei, J., & Keogh, E. (2010). A
Journal, 41(3), 461–474. doi:10.1147/sj.413.0461 brief survey on sequence classification. SIG-
KDD Explorations Newsletter, 12(1), 40–48.
Vrotsou, K., Ellegård, K., & Cooper, M. (n.d.).
doi:10.1145/1882471.1882478
Exploring time diaries using semi-automated
activity pattern extraction. Yun, C. H., & Chen, M. S. (2007). Mining mobile
sequential patterns in a mobile commerce environ-
Vu, T. H., Ryu, K. H., & Park, N. (2009). A
ment. IEEE Transactions on Systems, Man, and
method for predicting future location of mobile
Cybernetics, 278–295.
user for location-based services system. Com-
puters & Industrial Engineering, 57(1), 91–105.
doi:10.1016/j.cie.2008.07.009

22
Applications of Pattern Discovery Using Sequential Data Mining

Yun, U. (2008). A new framework for detecting Han, J., & Kamber, M. (2006). Data Mining:
weighted sequential patterns in large sequence Concepts and Techniques (2nd ed.). Morgan
databases. Knowledge-Based Systems, 21(2), Kaufmann Publishers.
110–122. doi:10.1016/j.knosys.2007.04.002
Li, T.-R., Xu, Y., Ruan, D., & Pan, W.-m. Sequen-
Zaki, M. J. (2001). SPADE: An efficient algorithm tial pattern mining. In R. Da, G. Chen, E. E. Kerre,
for mining frequent sequences. Machine Learning, & G. Wets, Intelligent data mining: techniques
42(1-2), 31–60. doi:10.1023/A:1007652502315 and applications (pp. 103-122). Springer.
Zaki, M. J., Lesh, N., & Mitsunori, O. (1999). Lu, J., Adjei, O., Chen, W., Hussain, F., & Enach-
PlanMine: Predicting plan failures using sequence escu, C. (n.d.). Sequential Patterns Mining.
mining. Artificial Intelligence Review, 14(6),
Srinivasa, R. N. (2005). Data mining in e-com-
421–446. doi:10.1023/A:1006612804250
merce: A survey. Sadhana, 275–289. doi:10.1007/
BF02706248
Teisseire, M., Poncelet, P., Scientifique, P., Besse,
ADDITIONAL READING
G., Masseglia, F., & Masseglia, F. (2005). Se-
Adamo, J.-M. (2001). Data Mining for Associa- quential pattern mining: A survey on issues and
tion Rules and Sequential Patterns: Sequential approaches. Encyclopedia of Data Warehousing
and Parallel Algorithms. Secaucus, NJ, USA: and Mining, nformation Science Publishing (pp.
Springer-Verlag New York, Inc.doi:10.1007/978- 3–29). Oxford University Press.
1-4613-0085-4 Yang, L. (2003). Visualizing frequent itemsets, as-
Alves, R., & Rodriguez-Baena, D. S., Aguilar- sociation rules, and sequential patterns in parallel
Ruiz, & S., J. (2009). Gene association analysis: coordinates. ICCSA’03: Proceedings of the 2003
a survey of frequent pattern mining from gene international conference on Computational sci-
expression data. Briefings in Bioinformatics, ence and its applications (pp. 21-30). Montreal,
210–224. Canada: Springer-Verlag.
Zhao, Q., & Bhowmick, S. S. (2003). Sequential
Pattern Matching: A Survey.

23
24

Chapter 2
A Review of Kernel Methods
Based Approaches to
Classification and Clustering
of Sequential Patterns, Part I:
Sequences of Continuous Feature Vectors

Dileep A. D.
Indian Institute of Technology, India

Veena T.
Indian Institute of Technology, India

C. Chandra Sekhar
Indian Institute of Technology, India

ABSTRACT
Sequential data mining involves analysis of sequential patterns of varying length. Sequential pattern
analysis is important for pattern discovery from sequences of discrete symbols as in bioinformatics and
text analysis, and from sequences or sets of continuous valued feature vectors as in processing of au-
dio, speech, music, image, and video data. Pattern analysis techniques using kernel methods have been
explored for static patterns as well as sequential patterns. The main issue in sequential pattern analysis
using kernel methods is the design of a suitable kernel for sequential patterns of varying length. Kernel
functions designed for sequential patterns are known as dynamic kernels. In this chapter, we present a
brief description of kernel methods for pattern classification and clustering. Then we describe dynamic
kernels for sequences of continuous feature vectors. We then present a review of approaches to sequential
pattern classification and clustering using dynamic kernels.

DOI: 10.4018/978-1-61350-056-9.ch002

Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
A Review of Kernel Methods Based Approaches, Part I

INTRODUCTION TO SEQUENTIAL of the within-cluster scatter matrix for the clusters

PATTERN ANALYSIS USING KERNEL formed in the kernel feature space (Girolami, 2002;
METHODS Satish, 2005). The choice of the kernel function
used in the kernel methods is important for their
Classification and clustering of patterns extracted performance. Several kernel functions have been
from sequential data are important for pattern proposed for static patterns. Kernel methods for
discovery using sequence data mining. Pattern sequential pattern analysis adopt one of the fol-
discovery from bio-sequences involves classifica- lowing two strategies: (1) Convert a sequential
tion and clustering of discrete symbol sequences. pattern into a static pattern and then use a kernel
Pattern discovery from multimedia data such as function defined for static patterns, and (2) Design
audio, speech and video data involves classifica- and use a kernel function for sequential patterns.
tion and clustering of continuous valued feature Kernel functions designed for sequential data are
vector sequences. Classification and clustering of referred to as dynamic kernels or sequence kernels
sequential patterns of varying length have been (Wan & Renals, 2002). Examples of dynamic
challenging tasks in pattern recognition. Con- kernels for continuous feature vector sequences
ventional methods for classification of sequential are Gaussian mixture model (GMM) supervector
patterns use discrete hidden Markov models kernel (Campbell et al., 2006b) and intermedi-
(HMMs) for discrete sequences, and continuous ate matching kernel (Boughorbel et al., 2005).
density HMMs for continuous feature vector se- Fisher kernel (Jaakkola et al,. 2000) is used for
quences. Conventional methods for clustering of both the discrete observation symbol sequences
sequential patterns use distance measures such as and sequences of continuous feature vectors. This
edit distance for discrete sequences and dynamic Chapter discusses the issues in designing dynamic
time warping based distance for continuous feature kernels for continuous feature vector sequences
vector sequences. During the past 15 years, kernel and then presents a review of dynamic kernels
methods based approaches such as support vector proposed in the literature.
machines and kernel K-means clustering have been Dynamic kernels for continuous feature vec-
explored for classification and clustering of static tor sequences belong to the following two main
patterns and sequential patterns. Kernel methods categories: (1) Kernels such as Fisher kernels that
have been shown to give a good generalization capture the sequence information in the feature
performance. This Chapter presents a review of vector sequences, and (2) Kernels such as GMM
kernel methods based approaches to classification supervector kernels and intermediate matching
and clustering of sequential patterns. kernels that consider the feature vector sequences
Kernel methods for pattern analysis involve as sets of feature vectors. The kernels belonging
performing a nonlinear transformation from the to the first category have been explored for clas-
input feature space to a higher dimensional feature sification of units of speech such as phonemes,
space induced by a Mercer kernel function, and syllables and words in speech recognition. The
then constructing an optimal linear solution in the kernels belonging to the second category have
kernel feature space. Support vector machine for been explored for tasks such as speaker identifica-
two class pattern classification constructs an op- tion and verification, speech emotion recognition
timal hyperplane corresponding to the maximum and image classification. This chapter presents a
margin separating hyperplane in the kernel feature review of dynamic kernels based approaches to
space (Burges, 1998). Kernel K-means clustering classification and clustering of sequential patterns.
gives an optimal nonlinear separation of clusters The organization of the rest of the chapter is
in the input feature space by minimizing the trace as follows: The next section describes the kernel

25
A Review of Kernel Methods Based Approaches, Part I

methods for pattern analysis. The SVM based ap- Optimal Hyperplane for Linearly
proach to pattern classification and kernel based Separable Classes
approaches to pattern clustering are presented in
this section. Then the design of dynamic kernels Suppose the training data set consists of L ex-
for sequential patterns is presented in the third amples, {xi , yi }
L
, xi ∈ Rd and yi ∈ {+1, −1},
section. This section also describes the dynamic i =1

kernels for continuous feature vector sequences. where xi is ith training example and yi is the cor-
Finally, we present a review of kernel methods responding class label. Figure 1 illustrates the
based approaches to sequential pattern analysis. construction of an optimal separating hyperplane
for linearly separable classes in the two-dimen-
sional input space of x.
KERNEL METHODS FOR A hyperplane is specified as wtx + b = 0, where
PATTERN ANALYSIS w is the parameter vector and b is the bias. A
separating hyperplane that separates the data points
In this section we describe different approaches of two linearly separable classes satisfies the fol-
using kernel methods for patterns analysis. We first lowing constraints:
describe the support vector machines (SVMs) for
yi(wtxi + b) > 0 for i = 1, 2,…, L (1)
pattern classification, and then present the kernel
K-means clustering and support vector clustering
The distance between the nearest example and
methods for pattern clustering.
the separating hyperplane, called the margin, is
given by 1/||w||. The problem of finding the op-
Support Vector Machines for
timal separating hyperplane that maximizes the
Pattern Classification
margin is the same as the problem of minimizing
the Euclidean norm of the parameter vector w. For
The SVM (Burges, 1998; Cristianini & Shawe-
reducing the search space of w, the constraints that
Taylor, 2000; Sekhar et al., 2003) is a linear two-
the optimal separating hyperplane must satisfy are
class classifier. An SVM constructs the maximum
specified as follows:
margin hyperplane (optimal hyperplane) as a
decision surface to separate the data points of two
classes. The margin of a hyperplane is defined as
the minimum distance of training points from the Figure 1. Illustration of constructing the optimal
hyperplane. We first discuss the construction of an hyperplane for linearly separable classes
optimal hyperplane for linearly separable classes.
Then we discuss the construction of an optimal
hyperplane for linearly nonseparable classes, i.e.,
some training examples of the classes cannot be
classified correctly. Finally, we discuss building
an SVM for nonlinearly separable classes by con-
structing an optimal hyperplane in a high dimen-
sional feature space corresponding to a nonlinear
transformation induced by a kernel function.

26
A Review of Kernel Methods Based Approaches, Part I

yi(wtxi + b) ≥ 1 for i = 1, 2,…, L (2) L

∑αy i i
=0 (8)
i =1
The learning problem of finding the optimal
separating hyperplane is a constrained optimiza-
Substituting the expression for w from (7) in
tion problem stated as follows: Given the training
(4) and using the condition in (8), the dual form of
data set, find the values of w and b such that they
Lagrangian objective function can be derived as a
satisfy the constraints in (2) and the parameter
function of Lagrangian multipliers α, as follows:
vector w minimizes the following cost function:
L
1 L L
1 2 Ld (α) = ∑ αi − ∑ ∑ α α y y xt x (9)
J(w) = w (3) i =1 2 i =1 j =1 i j i j i j
2

The constrained optimization problem is solved The optimum values of Lagrangian multipli-
using the method of Lagrangian multipliers. The ers are determined by maximizing the objective
primal form of the Lagrangian objective function function Ld(α) subject to the following constraints:
is given by
L

L
∑αy i i
=0 (10)
1
( )
L p (w, b, α) = w − ∑ αi yi w t xi + b − 1
2 i =1

2 i =1
 
(4) αi ≥ 0 for i = 1, 2, …, L (11)

where the non-negative variables αi are called This optimization problem is solved using
Lagrange multipliers. The saddle point of the quadratic programming methods (Kaufman,
Lagrangian objective function provides the solu- 1999). The data points for which the values of the
tion for the optimization problem. The solution optimum Lagrange multipliers are not zero are
is determined by first minimizing the Lagrang- the support vectors. For these data points the
ian objective function with respect to w and b, distance to the optimal hyperplane is minimum.
and then maximizing with respect to α. The two Hence, the support vectors are the training data
conditions of optimality due to minimization are points that lie on the margin, as illustrated in
Figure 1. For the optimum Lagrange multipliers
{α }
Ls
∂L p (w, b, α) (5)
*
, the optimum parameter vector w∗ is
=0 j j =1
∂w given by

∂L p (w, b, α) Ls
=0 (6) w * = ∑ α *j y j x j (12)
∂b j =1

Application of optimality conditions gives

where Ls is the number of support vectors. The
L discriminant function of the optimal hyperplane
w = ∑ αi yi xi (7) in terms of support vectors is given by
i =1

27
A Review of Kernel Methods Based Approaches, Part I

Ls
The slack variable ξi is a measure of the devia-
D (x) = w *t x + b * = ∑ αj*y j xt x j + b * (13)
j =1
tion of a data point xi from the ideal condition of
separability. For 0 ≤ ξi ≤ 1, the data point falls
inside the region of separation, but on the correct
where b∗ is the optimum bias.
side of the separating hyperplane. For ξi > 1, the
However, the data for most of the real world
data point falls on the wrong side of the separat-
tasks are not linearly separable. Next we present
ing hyperplane. The support vectors are those
a method to construct an optimal hyperplane for
particular data points that satisfy the constraint
linearly non-separable classes.
in (14) with equality sign. The cost function for
linearly non-separable classes is given as
Optimal Hyperplane for Linearly
Non-Separable Classes L
1 2
J (w, ξ) = w + C ∑ ξi (15)
The training data points of the linearly non- 2 i =1

separable classes cannot be separated by a hyper-

plane without classification error. In such cases, where C is a user-specified positive parameter that
it is desirable to find an optimal hyperplane that controls the trade-off between the complexity of
minimizes the probability of classification error the classifier and the number of non-separable data
for the training data set. A data point is non- points. Using the method of Lagrange multipliers
separable when it does not satisfy the constraint to solve the constrained optimization problem as
in (2). This corresponds to a data point that falls in the case of linearly separable classes, the dual
either within margin or on the wrong side of the form of the Lagrangian objective function can be
separating hyperplane as illustrated in Figure 2. obtained as follows (Haykin, 1999):
For linearly non-separable classes, the con-
straints in (2) are modified by introducing the L
1 L L
Ld (α) = ∑ αi − ∑ ∑ α α y y xt x (16)
nonnegative slack variables ξi as follows: i =1 2 i =1 j =1 i j i j i j

yi(wtxi + b) ≥ 1- ξi for i = 1, 2,…, L (14)

subject to the constraints:

Figure 2. Illustration of constructing the optimal L

hyperplane for linearly nonseparable classes ∑αy i i

=0 (17)
i =1

0 ≤ αi ≤ C for i = 1, 2, …, L (18)

It may be noted that the maximum value that

the Lagrangian multipliers αi can take is C for the
linearly non-separable classes. For the optimum
{ }
Ls
Lagrange multipliers α*j , the optimum pa-
j =1

rameter vector w∗ is given by

w * = ∑ α *j y j x j (19)
j =1

28
A Review of Kernel Methods Based Approaches, Part I

where Ls is the number of support vectors. The mapped onto three-dimensional feature vectors
discriminant function of the optimal hyperplane Φ(xi) =[ x i21, x i22 , 2x i 1x i 2 ]t, i = 1, 2, …, L where
for an input vector x is given by they are linearly separable.
For the construction of the optimal hyperplane
Ls

D (x) = w *t x + b * = ∑ αj*y j xt x j + b * (20) in the high dimensional feature space Φ(x), the
j =1 dual form of the Lagrangian objective function
in (16) takes the following form:
where b∗ is the optimum bias. L
1 L L
Ld (α) = ∑ αi − ∑ ∑ α α y y Φ(xi )t Φ(x j )
Support Vector Machine for i =1 2 i =1 j =1 i j i j
Nonlinearly Separable Classes (21)

For nonlinearly separable classes, an SVM is built subject to the constraints:

by mapping the input vector xi, i = 1, 2, …, L into
a high dimensional feature vector Φ(xi) using a L

nonlinear transformation Φ, and constructing an ∑αy i i

=0 (22)
i =1
optimal hyperplane defined by wtΦ(x) + b = 0 to
separate the examples of two classes in the feature
space Φ(x). This is based on Cover’s theorem 0 ≤ αi ≤ C for i = 1, 2, …, L (23)
which states that an input space where the patterns
are nonlinearly separable may be transformed into For the optimal α∗, the optimal parameter vec-
a feature space where the patterns are linearly tor w∗ is given by
separable with a high probability, provided two Ls
conditions are satisfied (Haykin, 1999). The first w * = ∑ α *j y j Φ(x j ) (24)
condition is that the transformation is nonlinear j =1

and the second condition is that the dimensional-

ity of the feature space is high enough. The con- where Ls is the number of support vectors. The
cept of support vector machine for pattern classi- discriminant function of the optimal hyperplane
fication is illustrated in Figure 3. It is seen that the for an input vector x is defined as
nonlinearly separable data points xi = [xi1, xi2]t, i =
1, 2, …, L in a two-dimensional input space are

Figure 3. Illustration of nonlinear transformation used in building an SVM for nonlinearly separable
classes

29
A Review of Kernel Methods Based Approaches, Part I

Ls L
1 L L
D (x) = w *t ¦ (x) + b * = ∑ αj*y j ¦ (x)t ¦ (x j ) + b * Ld (α) = ∑ αi − ∑ ∑ α α y y K (xi , x j )
j =1 i =1 2 i =1 j =1 i j i j
(25) (28)

Solving (21) involves computation of the in- Ls

nerproduct operation Φ(xi)tΦ(xj). Evaluation of D (x) = w t Φ(x) + b = ∑ αjy j K (x, x j ) + b

j =1
innerproducts in a high dimensional feature space
is avoided by using an innerproduct kernel, K(xi, (29)
xj), defined as K(xi, xj) = Φ(xi)tΦ(xj) (Scholkopf
et al., 1999). A valid innerproduct kernel K(xi, xj) The architecture of a support vector machine
for two pattern vectors xi and xj is a symmetric for two-class pattern classification that implements
function for which the following Mercer’s condi- the discriminant function of the hyperplane in (29)
tion holds good: is given in Figure 4. The number of hidden nodes
corresponding to the number of support vectors,
and the training examples corresponding to the
∫ K (x , x )g(x )g(x )d x d x
i j i j i j
≥0 (26)
support vectors are determined by maximizing the
objective function in (28) using a given training
for all g(xi) such that data set and for a chosen kernel function.
Some commonly used innerproduct kernel
functions are as follows:
∫ g (x )d x (27)
2
i i
<∞

Polynomial kernel: K(xi, xj) = (axitxj + c)p

The objective function in (21) and the discrimi- Sigmoidal kernel: K(xi, xj) = tanh(axitxj + c)
nant function of the optimal hyperplane in (25) Gaussian kernel: K(xi, xj) = exp(−δ||xi − xj||2)
can now be specified using the kernel function
as follows: Here, xi and xj are vectors in the d-dimensional
input pattern space, a and c are constants, p is the

Figure 4. Architecture of a support vector machine for two-class pattern classification. The class of
the input pattern x is given by the sign of the discriminant function D(x). The number of hidden nodes
corresponds to the number of support vectors Ls. Each hidden node computes the innerproduct kernel
function K(x, xi) on the input pattern x and a support vector xi.

30
A Review of Kernel Methods Based Approaches, Part I

degree of the polynomial and δ is a nonnegative we assume that the number of examples for each
constant used for numerical stability in Gaussian class is the same, i.e., Lt = L/T.
kernel function. The dimensionality of the feature
space is (p+d)!/(p! d!) for the polynomial kernel One-Against-the-Rest Approach
(Cristianini & Shawe-Taylor, 2000). The feature
spaces for the sigmoidal and Gaussian kernels In this approach, an SVM is constructed for each
are of infinite dimension. The kernel functions class by discriminating that class against the re-
involve computations in the d-dimensional input maining (T-1) classes. The classification system
space and avoid the innerproduct operations in based on this approach consists of T SVMs. All
the high dimensional feature space. the L training examples are used in constructing
The best choice of the kernel function for a an SVM for each class. In constructing the SVM
given pattern classification problem is still a re- for the class t the desired output yi for a training
search issue (Burges, 1998). The suitable kernel example xi is specified as follows:
function and its parameters are chosen empirically.
The complexity of a two-class support vector +1, if ci = t
machine is a function of the number of support yi =  (30)
−1, if ci ≠ t
vectors (Ls) determined during its training. Multi- 
class pattern classification problems are generally
solved using a combination of two-class SVMs. The examples with the desired output yi = +1
Therefore, the complexity of a multiclass pattern are called positive examples. The examples with
classification system depends on the number of the desired output yi = −1 are called negative
SVMs and the complexity of each SVM used. In examples. An optimal hyperplane is constructed
the next subsection, we present the commonly to separate Lt positive examples from L(T-1)/T
used approaches to multiclass pattern classifica- negative examples. The much larger number of
tion using SVMs. negative examples leads to an imbalance, resulting
in the dominance of negative examples in deter-
Multiclass Pattern Classification mining the decision boundary (Kressel & Ulrich,
Using SVMs 1999). The extent of imbalance increases with the
number of classes and is significantly high when
Support vector machines are originally designed the number of classes is large. A test pattern x is
for two-class pattern classification. Multiclass classified by using the winner-takes-all strategy
pattern classification problems are commonly that uses the following decision rule:
solved using a combination of two-class SVMs
and a decision strategy to decide the class of the Class label for x = argt max Dt (x) (31)
input pattern (Allwein et al., 2001). Each SVM has
the architecture given in Figure 4 and is trained where Dt(x) is the discriminant function of the
independently. Now we present the two approaches SVM constructed for the class t.
to decomposition of the learning problem in mul-
ticlass pattern classification into several two-class One-Against-One Approach
learning problems so that a combination of SVMs
can be used. The training data set {(xi, ci)} consists In this approach, an SVM is constructed for ev-
of L examples belonging to T classes. The class ery pair of classes by training it to discriminate
label ci ∈ {1, 2,..., T}. For the sake of simplicity, the two classes. The number of SVMs used in
this approach is T(T-1)/2. An SVM for a pair of

31
A Review of Kernel Methods Based Approaches, Part I

classes s and t is constructed using 2Lt training recognition and verification, and speech emotion
examples belonging to the two classes only. The recognition.
desired output yi for a training example xi is speci-
fied as follows: Kernel Methods for
Pattern Clustering
+1, if ci = s
yi =  (32) In this subsection we the describe kernel K-means
−1, if ci = t
 clustering and support vector clustering methods
for clustering in the kernel feature space.
The small size of the set of training examples
and the balance between the number of positive Kernel K-means Clustering
and negative examples lead to a simple optimi-
zation problem to be solved in constructing an The commonly used K-means clustering method
SVM for a pair of classes. When the number of gives a linear separation of data, as illustrated
classes is large, the proliferation of SVMs leads in Figure 5, and is not suitable for separation of
to a complex classification system. nonlinearly separable data. In this subsection,
The maxwins strategy is commonly used to the criterion for partitioning the data into clusters
determine the class of a test pattern x in this ap- in the input space using the K-means clustering
proach. In this strategy, a majority voting scheme algorithm is first presented. Clustering in the
is used. If Dst(x), the value of the discriminant kernel feature space is then realised using the
function of the SVM for the pair of classes s and t, K-means clustering algorithm (Girolami, 2002;
is positive, then the class s wins a vote. Otherwise, Satish, 2005).
the class t wins a vote. Outputs of SVMs are used Consider a set of L data points in the input
to determine the number of votes won by each
space, {xi }
L
, xi ∈ Rd. Let the number of clusters
class. The class with the maximum number of i =1

votes is assigned to the test pattern. When there are to be formed is Q. The criterion used by the K-
multiple classes with the same maximum number means clustering method in the input space for
of votes, the class with the maximum value of the grouping the data into Q clusters is to minimize
total magnitude of discriminant functions (TMDF) the trace of the within-cluster scatter matrix, Sw,
is assigned. The total magnitude of discriminant defined as follows (Girolami, 2002):
functions for the class s is defined as follows:
1 Q L
Sw = ∑ ∑ z (x − µq )(xi − µq )t
L q =1 i =1 qi i
(34)
TMDF = ∑ Dst (x) (33)
t

where μq is the center of the qth cluster, Cq, and

where the summation is over all t with which the zqi is the membership of data point xi to the cluster
class s is paired. The maxwins strategy needs Cq. The membership value zqi = 1, if xi ∈Cq and 0
evaluation of discriminant functions of all the otherwise. The number of points in the qth cluster
SVMs in deciding the class of a test pattern. is given as Lq defined by
The SVM based classifiers have been suc-
cessfully used in various applications like image L

categorization, object categorization, text clas- Lq = ∑ zqi (35)

sification, handwritten character recognition, i =1

speech recognition (Sekhar et al., 2003), speaker

32
A Review of Kernel Methods Based Approaches, Part I

Figure 5. Illustration of K-means clustering in input space. (a) Scatter plot of the data in clusters sepa-
rable by a circular shaped curve in a 2-dimensional space. Inner cluster belongs to cluster 1 and the
outer cluster belongs to cluster 2. (b) Linear separation of data obtained using K-means clustering in
the input space.

The center of the cluster Cq is given as μq where µqΦ , the center of the qth cluster in the
defined by feature space, is given by
L
1 L
µq =
Lq
∑z x
qi i
(36) µqΦ =
1
∑z qi
Φ(x i ) (38)
i =1 Lq i =1

The optimal clustering of the data points in- The trace of the scatter matrix SwΦ can be
volves determining the Q × L indicator matrix,
computed using the innerproduct operations as
Z, with the elements as zqi, that minimizes the
given below:
trace of the matrix Sw. This method is used in the
K-means clustering algorithm for linear separation
1 Q L
( ) (Φ(x ) − µ )
t
of the clusters. For nonlinear separation of clusters Tr (SwΦ ) = ∑ ∑ z Φ(x i ) − µqΦ
L q =1 i =1 qi i
Φ
q
of data points, the input space is transformed into
a high dimensional feature space using a smooth (39)
and continuous nonlinear mapping, Φ, and the
clusters are formed in the feature space. The When the feature space is explicitly repre-
optimal partitioning in the feature space is based sented, as in the case of mapping using polyno-
on the criterion of minimizing the trace of the mial kernels, the K-means clustering algorithm
within-cluster scatter matrix in the feature space, can be used to minimise the trace given in the
above equation. However, for Mercer kernels such
SwΦ . The feature space scatter matrix is given by
as Gaussian kernels with implicit mapping used
for transformation, it is necessary to express the
1 Q L
( )( )
t
SwΦ = ∑ ∑ z Φ(x i ) − µqΦ Φ(x i ) − µqΦ
L q =1 i =1 qi
trace in terms of kernel function. The Mercer
kernel function in the input space corresponds to
(37) the inner-product operation in the feature space,

33
A Review of Kernel Methods Based Approaches, Part I

L
i.e., Ki j = K(xi, xj) = Φ(xi)tΦ(xj). The trace of SwΦ 1
can be rewritten as
Dqi = Kˆii −
Lq
∑z qj
Kˆij (43)
j =1

L z
1 Q L 1 Q L For implicit mapping kernels such as the
Tr (SwΦ ) = ∑ ∑
L q =1 i =1
z K
qi ii
− ∑ ∑
L q =1 i =1
zqi ∑ qj K ij
j =1 Lq Gaussian kernel function, the explicit feature
1 Q L  1 L  space representation is not known. A Gaussian
= ∑ ∑ zqi K ii − ∑ zqj K ij 
L q =1 i =1  Lq j =1  kernel is defined as K(x, xi) = exp(−δ||x − xi||2),
1 Q L where δ is the kernel width parameter. For Gauss-
= ∑ ∑ zqi Dqi ian kernel, Dq j takes a nonnegative value because
L q =1 i =1
(40) Kii =1 and Ki j ≤ Kii.
In the kernel K-means clustering, the optimiza-
where tion problem is to determine the indicator matrix
Z∗ such that
L
1
Dqi = K ii −
Lq
∑z qj
K ij (41) Z * = arg min Tr (SwΦ ) (44)
j =1 Z

The term Dqi is the penalty associated with An iterative method for solving this optimiza-
assigning xi to the qth cluster in the feature space. tion problem is given in (Girolami, 2002). The
For explicit mapping kernels such as the polyno- clusters obtained for the ring data using the kernel
mial kernel function, the feature space represen- K-means clustering method are shown in Figure 6.
tation is explicitly known. Polynomial kernel is
given by K(x, xi) = (axtxi + c)p, where a and c are
constants, and p is the degree of polynomial ker-
nel. The vector Φ(x) in the feature space of the
Figure 6. Nonlinear separation of data obtained
polynomial kernel corresponding to the input
using the kernel K-means clustering method for
space vector x includes the monomials upto order
the ring data plotted in Figure 5(a).
p of elements in x. For a polynomial kernel, Dqi
may take a negative value because the magnitude
of Kq j can be greater than that of Kii. To avoid Dqi
taking negative values, Ki j, in the equation for Dqi
is replaced with the normalized value K̂ ij defined
as

K ij
K̂ ij = (42)
K ii K jj

F r o m C a u c h y - S c h w a r z i n e q u a l i t y,
K ij ≤ K ii K jj . It follows that for the polyno-
mial kernel K̂ = 1 and Kˆ ≤ Kˆ , and D is
ii ij ii qi

defined as:

34
A Review of Kernel Methods Based Approaches, Part I

Support Vector Clustering L

a = ∑ αi Φ(x i ) (49)
i =1
Support vector clustering (Ben-Hur et al., 2001) is
a clustering method that follows the support vec- αi = C - βi (50)
tor data description technique. Here, data points
are mapped by means of a Gaussian kernel to a Using these relations, the variables R, a and
high dimensional feature space, where a search ζi may be eliminated from the Lagrangian objec-
for the minimal enclosing sphere is performed. tive function giving rise to the Wolf dual, that is
This sphere, when mapped back to data space, can expressed solely in terms of αi, as follows:
form several contours, each enclosing a separate
cluster of points. L
2
L L

Consider a set of L data points in the input Ld = ∑ αi Φ(x i ) − ∑ ∑ αi αj Φ(x i )t Φ(x j )

i =1 i =1 j =1
space, {x i }
L
, xi ∈ Rd. Using a nonlinear trans- (51)
i =1
formation Φ from the input space to a high di-
mensional feature space, the smallest enclosing subject to the following constraints:
sphere of radius R is found in the feature space.
This is described by the constraints as given below: L

∑α i
=1 (52)
i =1
|| Φ(xi) - a||2 ≤ R2, for i = 1, 2, …, L (45)
0 ≤ αi ≤ C for i = 1, 2, …, L (53)
where a is the center of the sphere. Soft constraints
are incorporated by adding slack variables ζi as
The objective function in (51) can now be
follows:
specified using the kernel function as follows:
|| Φ(xi) - a||2 ≤ R2 + ζi, for i = 1, 2, …, L (46) L L L
Ld = ∑ αi K (x i , x i ) − ∑ ∑ αi αj K (x i , x j )
with ζi ≥ 0. This constrained optimization problem i =1 i =1 j =1

is solved using the method of Lagrangian multipli- (54)

ers. The primal form of the Lagrangian objective
function is given by This optimization problem is solved using
a quadratic programming method to determine
L
 2 L L
the optimum center of the sphere a in the feature
Lp = R 2 − ∑ αi R 2 + ζi − Φ(x i ) − a  − ∑ β i ζi + C ∑ ζi
i =1
  i =1 i =1 space. Like in SVM, the set of points whose cor-
(47) responding Lagrange multipliers are non-zero
become support vectors. Further, the support
where αi ≥ 0 and βi ≥ 0 are the Lagrange multipli- vectors whose Lagrange multipliers are at C
L
ers, C is a constant, and C ∑ ζi is a penalty term. are called bounded support vectors and the rest
i =1 of them are called unbounded support vectors.
Setting to zero the derivative of Lp with respect Geometrically, the unbounded support vectors
to R, a and ζi, respectively, leads to lie on the surface of the sphere, bounded support
vectors lie outside the sphere and the remaining
L
points lie inside the sphere.
∑α i
=1 (48)
i =1

35
A Review of Kernel Methods Based Approaches, Part I

Let Z(x) be the distance of Φ(x) to the center Though the methods are described for static pat-
of the sphere a is given by terns with each example represented as a vector
in d-dimensional input space, these methods
Z2(x) = || Φ(x) - a||2 (55) can also be used for patterns with each example
represented as a non-vectorial type structure.
From equations (55) and (49) we have, However, it is necessary to design a Mercer
kernel function for patterns represented using
a non-vectorial type structure so that the kernel
L L L
Z 2 (x ) = K (x , x ) − 2∑ αi K (x i , x ) − ∑ ∑ αi αj K (x i , x j )
i =1 i =1 j =1 methods can be used for analysis of such pat-
(56) terns. Kernel functions have been proposed for
different types of structured data such as strings,
Then, the radius of the sphere R can be deter- sets, texts, graphs, images and time series data.
mined by computing Z(xi), where xi is a unbounded In the next section, we present dynamic kernels
support vector. for sequential patterns represented as sequences
The sphere in the feature space when mapped of continuous feature vectors.
back to the input space leads to the formation of
a set of contours which are interpreted as cluster
boundaries. To identify the points that belong to DESIGN OF DYNAMIC
different clusters, a geometric approach involv- KERNELS FOR CONTINUOUS
ing Z(x) and based on the following observation FEATURE VECTOR
is used: Given a pair of data points that belong to
different clusters, any path that connects them must Sequences
exit from the sphere in feature space. Therefore,
such a path contains a segment of points v such Continuous sequence data is represented in the
that Z(v) > R. This leads to the following definition form of a sequence of continuous feature vectors.
of the adjacency Ai j between a pairs of points xi Examples of continuous sequence data are speech
and xj with Φ(xi) and Φ(xj) being present in or data, handwritten character data, video data and
on the sphere in feature space shown in Box 1. time series data such as weather forecasting data,
Clusters are now defined as the connected financial data, stock market data and network
components of the graph induced by the adja- traffic data. Short-time spectral analysis of the
cency matrix A. Bounded support vectors are speech signal of an utterance gives a sequence of
unclassified by this procedure since their feature continuous feature vectors. Short-time analysis
space images lie outside the enclosing sphere. One of speech signal involves performing spectral
may decide either to leave them unclassified, or to analysis on each frame of about 20 milliseconds
assign them to the cluster that they are closest. duration and representing each frame by a real
In this section, we presented the kernel meth- valued feature vector. These feature vectors corre-
ods for classification and clustering of patterns. spond to the observations. The speech signal of an

Box 1.

1, if Z (v ) ≤ R, for all v on the line segment connecting x i and x j

Aij =  (57)
0, otherwise


36
A Review of Kernel Methods Based Approaches, Part I

utterance with M number of frames is represented for designing the kernel between two sequences of
as X = x1x2... xm... xM, where xm is a vector of real feature vectors (Boughorbel et al., 2005; Grauman
valued observations for frame m. The duration & Darrell, 2007). In this Section, we describe dif-
of utterances belonging to a class varies from ferent dynamic kernels such as generalized linear
one utterance to another. Hence, the number of discriminant sequence kernel (Campbell et al.,
frames also differs from one utterance to another. 2006a), the probabilistic sequence kernel (Lee
This makes the number of observations to vary. In et al., 2007), Kullback-Leibler divergence based
the tasks such as speech recognition, duration of kernel (Moreno et al., 2004), GMM supervector
the data is short and there is a need to model the kernel (Campbell et al., 2006b), Bhattacharyya
temporal dynamics and correlations among the distance based kernel (You et al., 2009a), earth
features. This requires the sequence information mover’s distance kernel (Jing et al,. 2003), inter-
present in the data to be preserved. In such cases, mediate matching kernel (Boughorbel et al., 2005),
a speech utterance is represented as a sequence and pyramid match kernel (Grauman & Darrell,
of feature vectors. On the other hand, in the tasks 2007) used for sequences or sets of continuous
such as speaker identification, spoken language feature vectors.
identification, and speech emotion recognition,
the duration of the data is long and preserving Generalized Linear Discriminant
sequence information is not critical. In such cases, Sequence Kernel
a speech signal is represented as a set of feature
vectors. In the handwritten character data also, Generalized linear discriminant sequence
each character is represented as a sequence of (GLDS) kernel (Campbell et al., 2006a) uses
feature vectors. In the video data, each video clip an explicit expansion into a kernel feature space
is considered as a sequence of frames and a frame defined by the polynomials of degree p. Let X
may be considered as an image. Each image can be = x1x2... xm... xM, where xm ∈ Rd be a set of M
represented by a feature vector. Since the sequence feature vectors. The GLDS kernel is derived by
information present among the adjacent frames is considering polynomials as the generalized linear
to be preserved, a video clip data is represented as discriminant functions (Campbell et al., 2002).
a sequence of feature vectors. An image can also A feature vector xm is represented in a higher
be represented as a set of local feature vectors. dimensional space Ψ as a polynomial expansion
The main issue in designing a kernel for se- Ψ (xm) =[ψ1(xm), ψ2(xm),..., ψr(xm)]t. The expansion
quences of continuous feature vectors is to handle Ψ(xm) includes all monomials of elements of xm
the varying length nature of sequences. Dynamic upto and including degree p. The set of feature
kernels for sequences of continuous feature vectors vectors X is represented as a fixed dimensional
are designed in three ways. In the first approach, vector Φ(X) which is obtained as follows:
a sequence of feature vectors is mapped onto a
vector in a fixed dimension feature space and a 1 M

kernel is defined in that space (Campbell et al.,

ΦGLDS (X ) =
M
∑ Ψ(x m
) (58)
m =1
2006a; Lee et al., 2007). The second approach
involves in kernelizing a suitable distance measure
The GLDS kernel between two examples X =
used to compare two sequences of feature vectors
x1x2... xM and Y = y1y2..., yN is given as
(Campbell et al., 2006b; Jing et al., 2003; Moreno
et al., 2004; You et al., 2009a). In the third ap-
proach, matching based technique is considered

37
Other documents randomly have
different content
I GO DOWN INTO EGYPT

I t was sitting on a housetop overlooking Jerusalem on the last day

of our visit to the Holy City that we heard in detail the story of the
official entry of the British forces. The woman who told us had spent
the years of the war as well as most of her life in Jerusalem though
she is an American. Her children were born there, she speaks many
of its languages, she knows its people and she loves it. She is one of
the women appointed by Lord Samuel, the High Commissioner, to
serve on an advisory council to assist the government in establishing
policies for the protection and betterment of women. She is the
Christian representative, the others being Mohammedan and
Hebrew.
The story, as she told it, revealing all the fear and anxiety of those
hard years, the pressure of uncertainty, the daily, hourly struggle for
food with sufficient nourishment to keep the children and the old
people alive, made us feel again, as so many times before, the sharp
stab of the fact that it is not those who bear arms alone who go to
war. Little faces, whose right it is to be round and rosy, covered with
smiles, must be pale and wan, yes, must even forget how to smile, as
millions of little ones have forgotten since nineteen hundred and
fourteen. Old faces, whose right it is to bear the marks of peace and
contentment after the struggle of the years, must be left instead with
marks of pain and anguish as have millions of the aged since
nineteen hundred and fourteen. He is a cruel monster, War, and if
man, after what he has seen these past years, does not imprison him,
starve him, and leave him to die, then man deserves the bitterness of
the fate that will be his.
When we opened our eyes, the Nile lay almost at
our feet.

Threatened deportation, first by the German, then by the Turk,

had been again and again postponed for the little group known as the
American Colony but finally word came that in ten days all must go.
The men of military age, though neutral, had been ordered away a
week before and were expecting the arrival of a Turkish officer at any
moment, to tell them the time had come. Food was very scarce, there
was no sugar, little flour, and no fats. The woman who told us the
story was herself doing all-day and sometimes all-night duty as a
nurse in the Red Crescent Hospital which was our present hotel.
Some of the letters she showed to us proved what a consolation she
must have been to the young British soldiers, who lay with the Turks,
prisoners, and sorely wounded. The thundering of the guns had been
drawing nearer but, despite rumors that crept into the streets and
the hospital wards that the Turks were slowly losing ground, they
themselves reported progress. Great airplanes droned over the city,
cannon roared in the hills. One afternoon there was unusual
commotion in the open space before the hotel and, standing by the
window, our friend saw the Germans making hurried preparations
for leaving. Signal wires, telephone wires, rugs, removable furniture,
tons of supplies, went out through the Jaffa Gate. The Turkish
General visited her—he must leave the wounded in her care. The
German doctors would go with the troops. He would leave two days’
provisions and medicines, after that—
When he had gone, half afraid to believe it, she whispered the
word to the British patients. They were nearly mad with joy, and
there was little sleeping that night. “When? When?” they would
whisper as she or her helper passed them. She could only answer, “I
do not know. We have provisions for two days.” Very early in the
morning, before it was fully light, the Mayor of the city, one of the
direct descendants of Mohamet, sent her a message saying that he
was about to surrender the city.
At half-past eight that morning the outposts saw the white flag
approaching. General Shea was the officer sent by General Allenby to
accept the surrender of the city. At half-past twelve Jerusalem had
passed into the hands of the British, guards were placed at all public
buildings, and instructions given to the Chief of Police. The joy of the
people—Jew, Christian, and Moslem—was shown in the crowds that
filled the street, the tears and embraces, the shrill cries in many
tongues. Even the Turkish sick and wounded in the hospital showed
relief. For the twenty-third time in its history Jerusalem had
surrendered. But this time there were no cries for mercy, no
bitterness, no wailing, no terrible fear of the conqueror.
Outside the walls the guns banged and hammered at the Turkish
defences. Much had to be done before Jerusalem was safe from
attack, but although, spurred on by advices from Germany, the Turks
made an attempt to recapture the City, it was a disastrous failure.
One of the young men joined us on the roof and contributed his
share of the description of the next thrilling days. It was on
December 11th, 1917, that General Allenby, Commander-in-Chief,
took formal possession of the City. He would not enter through the
break in the old wall made when the former Kaiser with his great
retinue entered as a Crusader. Indeed orders have been given to have
the break closed. Allenby would carry no flag. On foot and
accompanied by a Guard that altogether numbered about one
hundred fifty, he stood on Mount Zion on the steps of the Citadel at
the entrance to David’s Tower. He had been met outside the narrow
gate by the Guard representing all branches, faiths and races that
make up the British Army. Behind him, as he stood on the steps with
his staff, were the leading men of the City, ready to listen to the
reading of the Proclamation. There were no shouts of victory, no
trumpets, no evidence of the spirit of triumph over a foe. The
Proclamation was read in Arabic and English, in Hebrew and Greek,
in Russian, in French and Italian. As the people, standing
respectfully in the open spaces and upon the housetops, heard each
in his own tongue that all men might “pursue their lawful business
without fear,” and the promise that “every Holy Place, revered and
held sacred by any faith, will be defended and protected,” a look, first
of incredulity, then of confidence, passed over their faces. Many
Mohammedans ran from the square to repeat the words in homes
from which some fearful ones had not dared to come. Murmurs and
gestures of approval were given on every side. The windows of the
Red Crescent Hospital were filled with faces of those who, though
very ill or badly wounded, could not miss this significant moment of
the Great War. All the promises made that day have been sacredly
kept, our friends told us.
In the old Turkish barrack square the Commander-in-Chief met
the heads of all the religious communities. The sheikhs in charge of
the Mosque of Omar, the representatives of the Priests and
Patriarchs of the Latin, Greek Orthodox, Armenian, and Coptic
churches who had been deported by the Turks, the heads of the
Jewish communities, the Syrian Church, the Greek Catholic, the
Abyssinian and the Anglican Churches, all were there. It was indeed
a cosmopolitan company as to religious faiths. “The bright color of
the holiday dress which most of the people had put on, I can see in
detail at this moment,” said our friend. “Not one step in that simple
ceremony shall I ever forget. In two hours, leaving guards over all
holy places, with Mohammedan officers and soldiers from the Indian
regiments to guard the Mosque of Omar, the General walked back
through the Old Jaffa Gate as he had come, received the salute of his
troops, entered his car and went down to the fighting area.”
The relief workers took up the heavy burden of bringing food to
the people whose thin bodies and pale faces showed the effect of
months of starvation diet. All supplies must come up over roads
muddy and torn by heavy traffic, crowded with army food and
equipment that must have right-of-way. The task was one that taxed
patience and energies to the utmost. The health commission
assumed the equally great task of clearing the streets of unspeakable
filth left in the wake of the Turkish rule, our friends shouldered their
burden of securing medical attention and food for the sick, helpless
and wounded men.
“Only gradually,” said our friend walking up and down on the roof,
her cheeks flushed by the memories of days so vividly recalled, “did
we come to realize that this Holy City was free. It was as though we
had been going about in heavy chains that, suddenly taken from us,
had left us too dazed to move. We unconsciously looked for old
restrictions, old threats, old taxes suddenly to be laid upon us.
Despite the glory of its past,” she added, leaning far over the parapet
to look out upon it, “the City in all its long history was never so truly
the City of Zion as now.”
When we went back to our hotel, standing at the windows from
which the wounded had looked, we felt that though other scenes of
many cities in many lands would fade with the years, the description
of that day when a victorious army under the leadership of a great
General, a true soldier, and a Christian gentleman had, beneath the
shadow of the Tower, proclaimed a message of possession more truly
in accord with the word and teaching of Jesus than any ever recorded
in history, would never leave us.
The sunset that night was more glorious than any we had seen.
The hills were on fire with it, the Gate was gold. Then the valleys
darkened, the streets were still, the crowd of Arab and Greek, Jew
and Moslem, the shepherds, the merchant with his camels, the
shopmen and the traveler, all sought shelter. It was night, and we
looked for the last time out on the hills to the place where the stars
shone over Bethlehem. Never did more reluctant pilgrims leave a
Holy Place.
The train, going twice each week down from Jerusalem and
connecting with the train for Cairo, had begun to carry both sleeping
compartments and dining car. We boarded it late that night at the
foot of the long hill. Before it was light, we had pulled out of the
Jerusalem station and did not get even one more glimpse of the city
set on a hill. Instead when we opened our eyes we were almost out of
the Judean Hills and soon were moving along through desert-
rimmed lowlands. Then the desert itself lay about us for hours,
livened by occasional caravans that also were going down to Egypt.
Once an airplane flew over us and on into the glare of the cloudless
sky. We thought often of Joseph and Mary and the Child fleeing
through this lonely desert, to find in alien Egypt refuge from the
jealous wrath of the Roman king. How rapidly the world has learned
to cover time and space since then! How slow has been its progress
toward the kingdom the Child, whom Herod feared, had come to
build. God grant that now over the new paths through old Palestine
messengers bearing Good Will may come with all speed.
At Kantara we saw a company of Jews sent by the Zionists into
Palestine. They were a weary group, their faces bore marks of deep
suffering. They spoke Russian only, so we could not talk with them.
They were going by train to Haifa. We could only hope that their
sorrows were over and that in the land of their fathers they would
find peace, a chance to forget, shelter, food and a home. But we could
not feel sure. The threatening words we had heard from the lips of
the Arab, the protests on the part of the Jews that there was not land
enough in Palestine capable of bearing crops to give food to those
who now struggled to live, the bitter race hatreds and religious feuds
very near the surface always ready to burst into flame,—these things
made us doubtful. The transformation within a generation of this
land of Palestine into a safe and happy home for all the Jews of all
the world seems but the futile day dream of children when one faces
conditions as they are. Many lands have done many things to the Jew
who once in simplicity worshiped the God of Abraham, Isaac, and
Jacob and tried sincerely, not only to obey the great commandments
of Moses, but to teach them to his children. Whether again they can
gather from all lands and bring with them the best who shall say? It
may be that the fervent exhortations of rabbis, the wisdom of judges,
the training of educators, the science of agriculturists, the modern
irrigation miracles of engineers, the proper placing of peoples, will
make of Palestine a modern state where the dream of economists,
political and social, will one day be demonstrated in action. The only
spot we saw that seemed like the coming of the day was Ramallah,
the city that had to be taken by storm before the troops could reach
Jerusalem and visited by us to see the splendid work of the schools of
the Quakers who for long years have given learning and new life to
children of Syrian and Armenian, and now and then to a Moslem or a
Jew. From that little town between six and seven hundred Jews and
Syrians had emigrated to America. Since the war many had returned
and were rapidly building the city. Others had sent money to rebuild
the homes of their families. These homes were of stone built two
stories high with the flat roof but with plenty of space for light and
air. Gardens surrounded many of them and trees were being planted
everywhere. The children were well nourished, well clothed, and well
trained. It helped us to have more confidence in the dreams with the
fulfilment of which all Christendom is in sympathy. If only patience
and unselfishness can be set as watchmen over against the door of
enthusiasm! We were told that the Hebrew High Commissioner of
Jerusalem looks to Ramallah as a prophecy.
We sat for a while on the journey from Kantara to Cairo with a
British officer and a nurse who had seen hard service during the war
and is now in charge of a number of stations where the native nurses
meet her to report progress and receive further instructions. She
helps them to understand the care of mothers and young babies and
trains them to fight the terrible diseases of the eye that result in a
staggering percentage of blindness. The native girls are taking up
their work with enthusiasm, even the little children are sharing in the
campaign against flies. “That is a far harder campaign than any we
have ever waged when you consider the people you have to train to
fight,” said the officer. And remembering the flies on our days at
Suez and Port Said, we could agree with him. It was nearly midnight
when she left the train at a little station in the sand. Her man-servant
was waiting with her horse and we watched her ride off to her
hospital out there somewhere in the blackness. “No finer women on
God’s earth than those who wear that uniform,” said the officer.
A half hour or more and he left us. He was a lover of the desert and
almost made us forget how pitiless, how cruel, how destitute of all
that makes life for most of us, it is. He knew the names of all the stars
and when they would appear in the velvet sky over his great stretch
of camps. He loved the cold of the night and was not afraid of the
heat of midday, he loved the sunrise and sunsets and “the desert-folk
worth many times the puny men of cities.” He, too, rode off into the
darkness.
At last—Cairo—and we left the train through the long station,
bright as day. A car was waiting. The luxury of our room made us feel
that we had entered fairyland. It had been so long since we had seen
the things for rest, comfort and cleanliness that had in our past been
common necessities, that now after these months of journeying
about the world they seemed extravagances indeed!
The next morning we looked through the open French windows
with their rose hangings, out upon the Pyramids and the Nile. The
river lay almost at our feet. The beauty of it was intoxicating in the
soft light of the rising sun. When my friend broke the silence she
said, “It is indeed beautiful,” then smiling she added “but,—
If I forget thee, O Jerusalem,
Let my right hand forget her skill.
Let my tongue cleave to the roof of my mouth,
If I remember thee not,
If I prefer not Jerusalem
Above my chief joy....

Awake, awake, put on thy strength; put on thy beautiful garments, O Jerusalem,
the holy city.... Shake thyself from the dust; arise, sit on thy throne, O Jerusalem.—
Peace be unto thee, O Zion.”
TRANSCRIBER’S NOTE
1. Silently corrected typographical errors and variations in
spelling.
2. Retained anachronistic, non-standard, and uncertain
spellings as printed.
*** END OF THE PROJECT GUTENBERG EBOOK NEW PATHS
THROUGH OLD PALESTINE ***

Updated editions will replace the previous one—the old editions

will be renamed.

Creating the works from print editions not protected by U.S.

copyright law means that no one owns a United States
copyright in these works, so the Foundation (and you!) can copy
and distribute it in the United States without permission and
without paying copyright royalties. Special rules, set forth in the
General Terms of Use part of this license, apply to copying and
distributing Project Gutenberg™ electronic works to protect the
PROJECT GUTENBERG™ concept and trademark. Project
Gutenberg is a registered trademark, and may not be used if
you charge for an eBook, except by following the terms of the
trademark license, including paying royalties for use of the
Project Gutenberg trademark. If you do not charge anything for
copies of this eBook, complying with the trademark license is
very easy. You may use this eBook for nearly any purpose such
as creation of derivative works, reports, performances and
research. Project Gutenberg eBooks may be modified and
printed and given away—you may do practically ANYTHING in
the United States with eBooks not protected by U.S. copyright
law. Redistribution is subject to the trademark license, especially
commercial redistribution.

START: FULL LICENSE

THE FULL PROJECT GUTENBERG LICENSE
PLEASE READ THIS BEFORE YOU DISTRIBUTE OR USE THIS WORK

To protect the Project Gutenberg™ mission of promoting the

free distribution of electronic works, by using or distributing this
work (or any other work associated in any way with the phrase
“Project Gutenberg”), you agree to comply with all the terms of
the Full Project Gutenberg™ License available with this file or
online at www.gutenberg.org/license.

Section 1. General Terms of Use and

Redistributing Project Gutenberg™
electronic works
1.A. By reading or using any part of this Project Gutenberg™
electronic work, you indicate that you have read, understand,
agree to and accept all the terms of this license and intellectual
property (trademark/copyright) agreement. If you do not agree
to abide by all the terms of this agreement, you must cease
using and return or destroy all copies of Project Gutenberg™
electronic works in your possession. If you paid a fee for
obtaining a copy of or access to a Project Gutenberg™
electronic work and you do not agree to be bound by the terms
of this agreement, you may obtain a refund from the person or
entity to whom you paid the fee as set forth in paragraph 1.E.8.

1.B. “Project Gutenberg” is a registered trademark. It may only

be used on or associated in any way with an electronic work by
people who agree to be bound by the terms of this agreement.
There are a few things that you can do with most Project
Gutenberg™ electronic works even without complying with the
full terms of this agreement. See paragraph 1.C below. There
are a lot of things you can do with Project Gutenberg™
electronic works if you follow the terms of this agreement and
help preserve free future access to Project Gutenberg™
electronic works. See paragraph 1.E below.
1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright
law in the United States and you are located in the United
States, we do not claim a right to prevent you from copying,
distributing, performing, displaying or creating derivative works
based on the work as long as all references to Project
Gutenberg are removed. Of course, we hope that you will
support the Project Gutenberg™ mission of promoting free
access to electronic works by freely sharing Project Gutenberg™
works in compliance with the terms of this agreement for
keeping the Project Gutenberg™ name associated with the
work. You can easily comply with the terms of this agreement
by keeping this work in the same format with its attached full
Project Gutenberg™ License when you share it without charge
with others.

1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside
the United States, check the laws of your country in addition to
the terms of this agreement before downloading, copying,
displaying, performing, distributing or creating derivative works
based on this work or any other Project Gutenberg™ work. The
Foundation makes no representations concerning the copyright
status of any work in any country other than the United States.

1.E. Unless you have removed all references to Project

Gutenberg:

1.E.1. The following sentence, with active links to, or other

immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project
Gutenberg™ work (any work on which the phrase “Project
Gutenberg” appears, or with which the phrase “Project
Gutenberg” is associated) is accessed, displayed, performed,
viewed, copied or distributed:

This eBook is for the use of anyone anywhere in the United

States and most other parts of the world at no cost and
with almost no restrictions whatsoever. You may copy it,
give it away or re-use it under the terms of the Project
Gutenberg License included with this eBook or online at
www.gutenberg.org. If you are not located in the United
States, you will have to check the laws of the country
where you are located before using this eBook.

1.E.2. If an individual Project Gutenberg™ electronic work is

derived from texts not protected by U.S. copyright law (does not
contain a notice indicating that it is posted with permission of
the copyright holder), the work can be copied and distributed to
anyone in the United States without paying any fees or charges.
If you are redistributing or providing access to a work with the
phrase “Project Gutenberg” associated with or appearing on the
work, you must comply either with the requirements of
paragraphs 1.E.1 through 1.E.7 or obtain permission for the use
of the work and the Project Gutenberg™ trademark as set forth
in paragraphs 1.E.8 or 1.E.9.

1.E.3. If an individual Project Gutenberg™ electronic work is

posted with the permission of the copyright holder, your use and
distribution must comply with both paragraphs 1.E.1 through
1.E.7 and any additional terms imposed by the copyright holder.
Additional terms will be linked to the Project Gutenberg™
License for all works posted with the permission of the copyright
holder found at the beginning of this work.

1.E.4. Do not unlink or detach or remove the full Project

Gutenberg™ License terms from this work, or any files
containing a part of this work or any other work associated with
Project Gutenberg™.

1.E.5. Do not copy, display, perform, distribute or redistribute

this electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
with active links or immediate access to the full terms of the
Project Gutenberg™ License.

1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if
you provide access to or distribute copies of a Project
Gutenberg™ work in a format other than “Plain Vanilla ASCII” or
other format used in the official version posted on the official
Project Gutenberg™ website (www.gutenberg.org), you must,
at no additional cost, fee or expense to the user, provide a copy,
a means of exporting a copy, or a means of obtaining a copy
upon request, of the work in its original “Plain Vanilla ASCII” or
other form. Any alternate format must include the full Project
Gutenberg™ License as specified in paragraph 1.E.1.

1.E.7. Do not charge a fee for access to, viewing, displaying,

performing, copying or distributing any Project Gutenberg™
works unless you comply with paragraph 1.E.8 or 1.E.9.

1.E.8. You may charge a reasonable fee for copies of or

providing access to or distributing Project Gutenberg™
electronic works provided that:

• You pay a royalty fee of 20% of the gross profits you derive
from the use of Project Gutenberg™ works calculated using the
method you already use to calculate your applicable taxes. The
fee is owed to the owner of the Project Gutenberg™ trademark,
but he has agreed to donate royalties under this paragraph to
the Project Gutenberg Literary Archive Foundation. Royalty
payments must be paid within 60 days following each date on
which you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly marked
as such and sent to the Project Gutenberg Literary Archive
Foundation at the address specified in Section 4, “Information
about donations to the Project Gutenberg Literary Archive
Foundation.”

• You provide a full refund of any money paid by a user who

notifies you in writing (or by e-mail) within 30 days of receipt
that s/he does not agree to the terms of the full Project
Gutenberg™ License. You must require such a user to return or
destroy all copies of the works possessed in a physical medium
and discontinue all use of and all access to other copies of
Project Gutenberg™ works.

• You provide, in accordance with paragraph 1.F.3, a full refund of

any money paid for a work or a replacement copy, if a defect in
the electronic work is discovered and reported to you within 90
days of receipt of the work.

• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.

1.E.9. If you wish to charge a fee or distribute a Project

Gutenberg™ electronic work or group of works on different
terms than are set forth in this agreement, you must obtain
permission in writing from the Project Gutenberg Literary
Archive Foundation, the manager of the Project Gutenberg™
trademark. Contact the Foundation as set forth in Section 3
below.

1.F.

1.F.1. Project Gutenberg volunteers and employees expend

considerable effort to identify, do copyright research on,
transcribe and proofread works not protected by U.S. copyright
law in creating the Project Gutenberg™ collection. Despite these
efforts, Project Gutenberg™ electronic works, and the medium
on which they may be stored, may contain “Defects,” such as,
but not limited to, incomplete, inaccurate or corrupt data,
transcription errors, a copyright or other intellectual property
infringement, a defective or damaged disk or other medium, a
computer virus, or computer codes that damage or cannot be
read by your equipment.

1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except

for the “Right of Replacement or Refund” described in
paragraph 1.F.3, the Project Gutenberg Literary Archive
Foundation, the owner of the Project Gutenberg™ trademark,
and any other party distributing a Project Gutenberg™ electronic
work under this agreement, disclaim all liability to you for
damages, costs and expenses, including legal fees. YOU AGREE
THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT
LIABILITY, BREACH OF WARRANTY OR BREACH OF CONTRACT
EXCEPT THOSE PROVIDED IN PARAGRAPH 1.F.3. YOU AGREE
THAT THE FOUNDATION, THE TRADEMARK OWNER, AND ANY
DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE
TO YOU FOR ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL,
PUNITIVE OR INCIDENTAL DAMAGES EVEN IF YOU GIVE
NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.

1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you

discover a defect in this electronic work within 90 days of
receiving it, you can receive a refund of the money (if any) you
paid for it by sending a written explanation to the person you
received the work from. If you received the work on a physical
medium, you must return the medium with your written
explanation. The person or entity that provided you with the
defective work may elect to provide a replacement copy in lieu
of a refund. If you received the work electronically, the person
or entity providing it to you may choose to give you a second
opportunity to receive the work electronically in lieu of a refund.
If the second copy is also defective, you may demand a refund
in writing without further opportunities to fix the problem.

1.F.4. Except for the limited right of replacement or refund set

forth in paragraph 1.F.3, this work is provided to you ‘AS-IS’,
WITH NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.

1.F.5. Some states do not allow disclaimers of certain implied

warranties or the exclusion or limitation of certain types of
damages. If any disclaimer or limitation set forth in this
agreement violates the law of the state applicable to this
agreement, the agreement shall be interpreted to make the
maximum disclaimer or limitation permitted by the applicable
state law. The invalidity or unenforceability of any provision of
this agreement shall not void the remaining provisions.

1.F.6. INDEMNITY - You agree to indemnify and hold the

Foundation, the trademark owner, any agent or employee of the
Foundation, anyone providing copies of Project Gutenberg™
electronic works in accordance with this agreement, and any
volunteers associated with the production, promotion and
distribution of Project Gutenberg™ electronic works, harmless
from all liability, costs and expenses, including legal fees, that
arise directly or indirectly from any of the following which you
do or cause to occur: (a) distribution of this or any Project
Gutenberg™ work, (b) alteration, modification, or additions or
deletions to any Project Gutenberg™ work, and (c) any Defect
you cause.

Section 2. Information about the Mission

of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new
computers. It exists because of the efforts of hundreds of
volunteers and donations from people in all walks of life.

Volunteers and financial support to provide volunteers with the

assistance they need are critical to reaching Project
Gutenberg™’s goals and ensuring that the Project Gutenberg™
collection will remain freely available for generations to come. In
2001, the Project Gutenberg Literary Archive Foundation was
created to provide a secure and permanent future for Project
Gutenberg™ and future generations. To learn more about the
Project Gutenberg Literary Archive Foundation and how your
efforts and donations can help, see Sections 3 and 4 and the
Foundation information page at www.gutenberg.org.

Section 3. Information about the Project

Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-
profit 501(c)(3) educational corporation organized under the
laws of the state of Mississippi and granted tax exempt status
by the Internal Revenue Service. The Foundation’s EIN or
federal tax identification number is 64-6221541. Contributions
to the Project Gutenberg Literary Archive Foundation are tax
deductible to the full extent permitted by U.S. federal laws and
your state’s laws.

The Foundation’s business office is located at 809 North 1500

West, Salt Lake City, UT 84116, (801) 596-1887. Email contact
links and up to date contact information can be found at the
Foundation’s website and official page at
www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission
of increasing the number of public domain and licensed works
that can be freely distributed in machine-readable form
accessible by the widest array of equipment including outdated
equipment. Many small donations ($1 to $5,000) are particularly
important to maintaining tax exempt status with the IRS.

The Foundation is committed to complying with the laws

regulating charities and charitable donations in all 50 states of
the United States. Compliance requirements are not uniform
and it takes a considerable effort, much paperwork and many
fees to meet and keep up with these requirements. We do not
solicit donations in locations where we have not received written
confirmation of compliance. To SEND DONATIONS or determine
the status of compliance for any particular state visit
www.gutenberg.org/donate.

While we cannot and do not solicit contributions from states

where we have not met the solicitation requirements, we know
of no prohibition against accepting unsolicited donations from
donors in such states who approach us with offers to donate.

International donations are gratefully accepted, but we cannot

make any statements concerning tax treatment of donations
received from outside the United States. U.S. laws alone swamp
our small staff.

Please check the Project Gutenberg web pages for current

donation methods and addresses. Donations are accepted in a
number of other ways including checks, online payments and
credit card donations. To donate, please visit:
www.gutenberg.org/donate.

Section 5. General Information About

Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could
be freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose
network of volunteer support.

Project Gutenberg™ eBooks are often created from several

printed editions, all of which are confirmed as not protected by
copyright in the U.S. unless a copyright notice is included. Thus,
we do not necessarily keep eBooks in compliance with any
particular paper edition.

Most people start at our website which has the main PG search
facility: www.gutenberg.org.

This website includes information about Project Gutenberg™,

including how to make donations to the Project Gutenberg
Literary Archive Foundation, how to help produce our new
eBooks, and how to subscribe to our email newsletter to hear
about new eBooks.
Welcome to Our Bookstore - The Ultimate Destination for Book Lovers
Are you passionate about books and eager to explore new worlds of
knowledge? At our website, we offer a vast collection of books that
cater to every interest and age group. From classic literature to
specialized publications, self-help books, and children’s stories, we
have it all! Each book is a gateway to new adventures, helping you
expand your knowledge and nourish your soul
Experience Convenient and Enjoyable Book Shopping Our website is more
than just an online bookstore—it’s a bridge connecting readers to the
timeless values of culture and wisdom. With a sleek and user-friendly
interface and a smart search system, you can find your favorite books
quickly and easily. Enjoy special promotions, fast home delivery, and
a seamless shopping experience that saves you time and enhances your
love for reading.
Let us accompany you on the journey of exploring knowledge and
personal growth!

ebookgate.com