Mining Stream, Time Series, and Sequence Data
Methodologies for Stream Data Processing and Stream Data Systems
Common techniques for building synopses (compact summaries) of stream data:
• Random sampling
• Sliding windows
• Histograms
• Multiresolution methods
• Sketches
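Random sampling is the simplest of these to illustrate. When the stream length is not known in advance, a common choice is reservoir sampling, which the slide does not name explicitly. The following is a minimal Python sketch of the classic Algorithm R. The function name `reservoir_sample` and the fixed `seed` parameter are illustrative assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # fixed seed only to make runs reproducible
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # uniform index in [0, i], inclusive
            if j < k:
                reservoir[j] = item  # item i is kept with probability k / (i + 1)
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 10)` never holds more than 10 items in memory, and every element of the stream has the same chance of appearing in the final sample.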
Randomized Algorithms to Analyze Data Streams
Randomized algorithms, in the form of random sampling and sketching, are often used to deal with massive, high-dimensional data streams.
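As an example of sketching, the Count-Min sketch (Cormode and Muthukrishnan) answers approximate frequency queries over a stream in fixed memory. It is swapped in here for illustration; the slides do not name a specific sketch. The class name, the default `width` and `depth`, and the use of salted SHA-256 as the family of hash functions are all assumptions of this sketch.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Approximate item frequencies in width * depth counters.

    Estimates never undercount; they may overcount because of hash collisions.
    """

    def __init__(self, width: int = 272, depth: int = 5) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Derive one hash function per row by salting SHA-256 with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))
```

With width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, each estimate exceeds the true count by at most ε·N (N being the total count added) with probability at least 1 − δ, using memory that does not grow with the stream.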
Data Stream Management Systems and Stream Queries
In traditional database systems, data are stored in finite, persistent databases. Stream data, by contrast, are potentially infinite and cannot be stored in full. A data stream management system (DSMS) may receive multiple data streams at once. Once an element from a data stream has been processed, it is discarded or archived, and it cannot easily be retrieved unless it is explicitly stored in memory.
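Because processed elements are gone, a continuous query in a DSMS has to answer from whatever it keeps in memory. A common way to bound that memory is a sliding window. The hypothetical generator below (`sliding_window_average` is an illustrative name, not a DSMS API) reports the mean of the most recent `window` elements after each arrival, touching each element only once.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def sliding_window_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Continuous query: after each arrival, yield the mean of the last `window` elements.

    Only the current window is held in memory; expired elements are dropped for good.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    buf: Deque[float] = deque()
    total = 0.0
    for x in stream:
        buf.append(x)
        total += x
        if len(buf) > window:
            total -= buf.popleft()  # expire the oldest element
        yield total / len(buf)
```

For example, `list(sliding_window_average([1, 2, 3, 4, 5], 2))` evaluates to `[1.0, 1.5, 2.5, 3.5, 4.5]`.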
Critical Layers of a Stream Data Cube
A stream data cube maintains two critical cuboids (or layers):
• The minimal interest layer is the most detailed layer that an analyst would like to study.
• The observation layer is the layer at which an analyst (or an automated system) would like to continuously study the data.
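To make the two layers concrete, here is a hypothetical roll-up in Python. The concept hierarchies (street to city to region, second to minute to hour), the place names, and the choice of a summed measure are invented for illustration. The sketch shows only how raw records are aggregated into a minimal interest layer and then rolled up further into an observation layer.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical concept hierarchies, for illustration only.
STREET_TO_CITY = {"Elm St": "Springfield", "Oak Ave": "Springfield", "Main St": "Shelbyville"}
CITY_TO_REGION = {"Springfield": "North", "Shelbyville": "South"}

Record = Tuple[str, int, float]  # (street, timestamp in seconds, measure)
Cell = Tuple[str, int]           # (location, time bucket)


def minimal_interest_layer(records: Iterable[Record]) -> Dict[Cell, float]:
    """Aggregate raw records to (city, minute): the finest level kept for analysis."""
    cube: Dict[Cell, float] = defaultdict(float)
    for street, ts, value in records:
        cube[(STREET_TO_CITY[street], ts // 60)] += value
    return dict(cube)


def observation_layer(minimal: Dict[Cell, float]) -> Dict[Cell, float]:
    """Roll the minimal interest layer up to (region, hour): the level watched continuously."""
    cube: Dict[Cell, float] = defaultdict(float)
    for (city, minute), value in minimal.items():
        cube[(CITY_TO_REGION[city], minute // 60)] += value
    return dict(cube)
```

In a real stream cube, the raw records would typically not be kept at all; analysis works from the materialized layers.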
Hoeffding Tree Algorithm
The Hoeffding tree algorithm is a decision tree learning method for stream data classification. It was initially used to track Web clickstreams and construct models to predict which Web hosts and Web sites a user is likely to access. It typically runs in sublinear time and produces a decision tree nearly identical to the one produced by traditional batch learners. It builds Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute.
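The guarantee behind "a small sample is often enough" is the Hoeffding bound: for a real-valued variable with range R, after n independent observations, the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean with probability 1 − δ. A Hoeffding tree splits a leaf once the best attribute's G exceeds the second best's by more than ε. The Python sketch below assumes information gain as G (whose range for c classes is log2 c); the function names and the default δ are illustrative.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable with
    range `value_range` lies within epsilon of the mean observed over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def confident_split(gain_best: float, gain_second: float, n: int,
                    num_classes: int, delta: float = 1e-7) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    value_range = math.log2(num_classes)  # range of information gain for num_classes classes
    return gain_best - gain_second > hoeffding_bound(value_range, delta, n)
```

With two classes (R = 1), δ = 10⁻⁷ and n = 1000 examples, ε is roughly 0.09, so a gain difference of 0.2 already justifies a split while a difference of 0.02 does not.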
Very Fast Decision Tree (VFDT)
The VFDT (Very Fast Decision Tree) algorithm makes several modifications to the Hoeffding tree algorithm:
• breaking near-ties during attribute selection more aggressively
• recomputing the evaluation function G only after a batch of new training examples has been seen, rather than after every example
• deactivating the least promising leaves whenever memory is running low
• dropping poor splitting attributes
• improving the initialization method
VFDT works well on stream data and compares very favorably with traditional classifiers in both speed and accuracy. However, it cannot handle concept drift on its own.
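Two of these modifications can be sketched on top of the Hoeffding test: G is re-evaluated only every `grace_period` examples, and a near-tie is broken by splitting anyway once ε falls below a tie threshold τ. The function name, parameter names, and defaults below are assumptions chosen for illustration, not values taken from the slides.

```python
import math
from typing import Dict


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon for n observations of a variable with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def vfdt_should_split(gains: Dict[str, float], n_seen: int, num_classes: int,
                      delta: float = 1e-7, tau: float = 0.05,
                      grace_period: int = 200) -> bool:
    """Decide whether a leaf that has seen n_seen examples should split.

    gains maps each candidate attribute to its current G (here: information gain).
    """
    if n_seen == 0 or n_seen % grace_period != 0:
        return False  # only re-evaluate G every grace_period examples
    if len(gains) < 2:
        raise ValueError("need at least two candidate attributes")
    best, second = sorted(gains.values(), reverse=True)[:2]
    epsilon = hoeffding_bound(math.log2(num_classes), delta, n_seen)
    # Split on a clear winner, or break a near-tie once epsilon is below tau.
    return best - second > epsilon or epsilon < tau
```

The tie rule matters when two attributes are almost equally good: without it, the leaf could wait indefinitely for evidence that one is better, even though choosing either one would cost little.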
Concept-adapting Very Fast Decision Tree (CVFDT)
To adapt to concept-drifting data streams, VFDT was extended into the Concept-adapting Very Fast Decision Tree algorithm (CVFDT). CVFDT uses a sliding window of recent examples, but it does not construct a new model from scratch each time the window moves. Instead, it updates the statistics at the nodes by incrementing the counts associated with new examples and decrementing the counts associated with old ones. If there is a concept drift, some nodes may therefore no longer pass the Hoeffding bound. When this happens, an alternate subtree is grown, with the new best splitting attribute at its root.
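The counting step that lets CVFDT avoid rebuilding from scratch can be sketched as follows. The hypothetical `WindowedNodeStats` class keeps, for a single node, counts of (attribute, value, class) over the most recent `window_size` examples, incrementing on arrival and decrementing on expiry. A full CVFDT additionally routes each example through the tree, periodically re-checks the Hoeffding test at each node, and grows alternate subtrees; none of that is shown here.

```python
from collections import Counter, deque
from typing import Any, Deque, Dict, Tuple

Example = Tuple[Dict[str, Any], Any]  # (attribute -> value, class label)


class WindowedNodeStats:
    """Sufficient statistics count(attribute, value, class) over a sliding window."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.window: Deque[Example] = deque()
        self.counts: Counter = Counter()

    def _update(self, example: Example, step: int) -> None:
        attrs, label = example
        for attr, value in attrs.items():
            self.counts[(attr, value, label)] += step

    def add(self, example: Example) -> None:
        self.window.append(example)
        self._update(example, +1)  # increment counts for the new example
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self._update(old, -1)  # decrement counts for the example leaving the window
```

Because the counts always describe only the current window, split statistics recomputed from them reflect the recent concept rather than the whole history.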
A Classifier Ensemble Approach to Stream Data Classification
The idea is to train an ensemble, or group, of classifiers (using, say, naïve Bayes) from sequential chunks of the data stream. Whenever a new chunk arrives, a new classifier is built from it. The individual classifiers are weighted based on their expected classification accuracy in a time-changing environment, and only the top-k classifiers are kept. Decisions are then based on the weighted votes of the classifiers.
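A minimal sketch of the chunk-based ensemble follows. To keep it short, it assumes a trivial one-rule base learner in place of naïve Bayes, and it weights each classifier by its plain accuracy on the newest chunk as a stand-in for expected accuracy under the current concept. Both are simplifications, and the class names and the default `k` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[Tuple[str, ...], str]  # (categorical feature values, class label)


class OneRule:
    """Stand-in base learner: majority label for each value of one chosen feature."""

    def __init__(self, chunk: Sequence[Row], feature: int = 0) -> None:
        self.feature = feature
        by_value: Dict[str, Counter] = defaultdict(Counter)
        for x, y in chunk:
            by_value[x[feature]][y] += 1
        self.rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        self.default = Counter(y for _, y in chunk).most_common(1)[0][0]

    def predict(self, x: Tuple[str, ...]) -> str:
        return self.rule.get(x[self.feature], self.default)


def accuracy(model: OneRule, chunk: Sequence[Row]) -> float:
    return sum(model.predict(x) == y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.members: List[Tuple[float, OneRule]] = []  # (weight, classifier)

    def update(self, chunk: Sequence[Row]) -> None:
        """Train on a new non-empty chunk, re-weight all classifiers, keep the top k."""
        candidates = [m for _, m in self.members] + [OneRule(chunk)]
        # Simplification: the new classifier is scored on its own training chunk,
        # which is optimistic; cross-validation on the chunk would be fairer.
        scored = [(accuracy(m, chunk), m) for m in candidates]
        scored.sort(key=lambda wm: wm[0], reverse=True)
        self.members = scored[: self.k]

    def predict(self, x: Tuple[str, ...]) -> str:
        votes: Counter = Counter()
        for weight, model in self.members:
            votes[model.predict(x)] += weight  # weighted vote
        return votes.most_common(1)[0][0]
```

When the concept flips between chunks, classifiers trained on the old concept score poorly on the new chunk, lose their weight, and are eventually evicted from the top k.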
Clustering in Evolving Data Streams
• Compute and store summaries of past data
• Apply a divide-and-conquer strategy
• Cluster incoming data streams incrementally
• Perform microclustering as well as macroclustering analysis
• Explore multiple time granularities for the analysis of cluster evolution
• Divide stream clustering into online and offline processes
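The "summaries of past data" used for microclustering are often cluster feature vectors: the point count n, the per-dimension linear sum LS, and the per-dimension sum of squares SS, extended with sums of timestamps in CluStream-style micro-clusters. Because these sums are additive, a micro-cluster can absorb points online one at a time and be merged with others later by an offline macroclustering step. The `MicroCluster` class below is an illustrative sketch of that summary only, not a complete stream clustering algorithm.

```python
import math
from typing import List, Sequence


class MicroCluster:
    """Cluster feature summary (n, LS, SS) plus timestamp sums; raw points are not stored."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.ls = [0.0] * dim  # linear sum per dimension
        self.ss = [0.0] * dim  # sum of squares per dimension
        self.lst = 0.0         # linear sum of timestamps
        self.sst = 0.0         # sum of squared timestamps

    def absorb(self, point: Sequence[float], timestamp: float) -> None:
        """Online step: fold one arriving point into the summary."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x
        self.lst += timestamp
        self.sst += timestamp * timestamp

    def merge(self, other: "MicroCluster") -> None:
        """Additivity: the summary of a union is the element-wise sum of the summaries."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]
        self.lst += other.lst
        self.sst += other.sst

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of member points from the centroid."""
        var = sum(ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding errors

    def mean_time(self) -> float:
        return self.lst / self.n
```

The timestamp sums let an offline step tell recent micro-clusters from stale ones, which supports analyzing cluster evolution over different time horizons.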
Mining Time-Series Data
A time-series database consists of sequences of values or events obtained over repeated measurements of time. Two major tasks are:
• Trend analysis
• Similarity search in time-series analysis
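The two tasks can be illustrated with deliberately simple methods: a moving average to expose the trend, and brute-force Euclidean matching for similarity search. Practical systems typically reduce dimensionality and index the series (for example with the discrete Fourier or wavelet transform) rather than scanning every offset. The function names here are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Smooth a series to expose its long-term trend (one value per full window)."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])
    out = [total / window]
    for i in range(window, len(series)):
        total += series[i] - series[i - window]  # slide the window by one step
        out.append(total / window)
    return out


def best_match(series: Sequence[float], query: Sequence[float]) -> Tuple[int, float]:
    """Brute-force subsequence search: offset and Euclidean distance of the window
    of `series` closest to `query`. Returns (-1, inf) if the query is too long."""
    m = len(query)
    best: Tuple[int, float] = (-1, math.inf)
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + j] - query[j]) ** 2 for j in range(m)))
        if d < best[1]:
            best = (start, d)
    return best
```

For example, `moving_average([1, 2, 3, 4, 5], 3)` returns `[2.0, 3.0, 4.0]`, and `best_match([0, 1, 5, 6, 5, 1, 0], [5, 6, 5])` returns `(2, 0.0)`.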
Markov Chains for Sequence Analysis
A Markov chain is a model that generates sequences in which the probability of a symbol depends only on the previous symbol.
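Under this first-order assumption, the probability of a sequence x of length L factorizes as P(x) = P(x₁) · ∏ P(xᵢ | xᵢ₋₁) for i = 2, …, L. The sketch below computes it in log space to avoid underflow on long sequences. The dictionary-based parameter format and the two-symbol example model are assumptions for illustration.

```python
import math
from typing import Dict, Sequence


def _log(p: float) -> float:
    # log(0) is undefined; treat an impossible event as log-probability -inf.
    return math.log(p) if p > 0 else -math.inf


def markov_log_prob(seq: Sequence[str], initial: Dict[str, float],
                    transition: Dict[str, Dict[str, float]]) -> float:
    """log P(x) = log P(x_1) + sum over i of log P(x_i | x_{i-1})."""
    if len(seq) == 0:
        raise ValueError("sequence must be non-empty")
    logp = _log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        logp += _log(transition[prev][cur])  # each symbol depends only on its predecessor
    return logp
```

For example, with initial probabilities A: 0.5, B: 0.5 and transitions A→A 0.9, A→B 0.1, B→A 0.2, B→B 0.8, the sequence "AAB" has probability 0.5 × 0.9 × 0.1 = 0.045.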
Tasks using hidden Markov models include:
• Evaluation: Given a sequence x, determine the probability P(x) of obtaining x in the model.
• Decoding: Given a sequence, determine the most probable path through the model that produced the sequence.
• Learning: Given a model and a set of training sequences, find the model parameters (i.e., the transition and emission probabilities) that explain the training sequences with relatively high probability.
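Evaluation has to sum over every hidden state path. The forward algorithm does this in O(L·N²) time for L symbols and N states by reusing partial sums. Below is a plain-probability Python sketch; the dictionary parameter format and the two-state weather model in the usage note are illustrative assumptions, and long sequences would need scaling or log-space arithmetic to avoid underflow.

```python
from typing import Dict, Sequence


def forward_probability(obs: Sequence[str], states: Sequence[str],
                        start: Dict[str, float],
                        trans: Dict[str, Dict[str, float]],
                        emit: Dict[str, Dict[str, float]]) -> float:
    """P(x): probability of the observation sequence, summed over all hidden paths."""
    # f[k] = P(x_1 .. x_i, hidden state at step i = k)
    f = {k: start[k] * emit[k][obs[0]] for k in states}
    for symbol in obs[1:]:
        # Sum over every predecessor state, then emit the current symbol.
        f = {l: emit[l][symbol] * sum(f[k] * trans[k][l] for k in states) for l in states}
    return sum(f.values())
```

With states Rainy and Sunny, start probabilities 0.6 and 0.4, transitions Rainy→Rainy 0.7, Rainy→Sunny 0.3, Sunny→Rainy 0.4, Sunny→Sunny 0.6, and emissions Rainy (walk 0.1, shop 0.4, clean 0.5) and Sunny (walk 0.6, shop 0.3, clean 0.1), the observation sequence walk, shop, clean has P(x) = 0.033612.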
Algorithms for Sequence Analysis with Hidden Markov Models
• Forward algorithm (evaluation)
• Viterbi algorithm (decoding)
• Baum-Welch algorithm (learning)
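Decoding replaces the forward algorithm's sum over predecessor states with a maximum and remembers which predecessor achieved it. Here is a minimal Viterbi sketch in Python; the dictionary parameter format and the weather model in the usage note are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(obs: Sequence[str], states: Sequence[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]]) -> Tuple[List[str], float]:
    """Most probable hidden state path for obs, together with its joint probability."""
    # v[k] = probability of the best path ending in state k; paths[k] = that path
    v = {k: start[k] * emit[k][obs[0]] for k in states}
    paths = {k: [k] for k in states}
    for symbol in obs[1:]:
        new_v: Dict[str, float] = {}
        new_paths: Dict[str, List[str]] = {}
        for l in states:
            # Keep only the best predecessor instead of summing over all of them.
            best_prev = max(states, key=lambda k: v[k] * trans[k][l])
            new_v[l] = v[best_prev] * trans[best_prev][l] * emit[l][symbol]
            new_paths[l] = paths[best_prev] + [l]
        v, paths = new_v, new_paths
    last = max(states, key=lambda k: v[k])
    return paths[last], v[last]
```

With states Rainy and Sunny, start probabilities 0.6 and 0.4, transitions Rainy→Rainy 0.7, Rainy→Sunny 0.3, Sunny→Rainy 0.4, Sunny→Sunny 0.6, and emissions Rainy (walk 0.1, shop 0.4, clean 0.5) and Sunny (walk 0.6, shop 0.3, clean 0.1), the observations walk, shop, clean decode to Sunny, Rainy, Rainy with probability 0.01344. Baum-Welch, the learning algorithm, re-estimates the parameters iteratively from forward and backward passes and is omitted here for length.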
Visit More Self-Help Tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free and self-guided and does not involve any additional support.
Visit us at www.dataminingtools.net