Module 4
Subject In-charge
Sonali Suryawanshi
Assistant Professor, Department of Information Technology, SFIT
Room No. 328
email: [email protected]
Need of DSMS
Types of queries
1) One-time queries: evaluated once over a point-in-time
snapshot of the data set, with the answer
returned to the user
Other category:
• 1) Predefined queries: supplied to the
DSMS before any relevant data streams
have begun, most commonly
continuous queries (max, min, avg, sum)
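A continuous query of the kind listed above can be sketched as a small piece of state that is updated on every arriving element. This is only an illustrative sketch: the `RunningAggregates` class and the sample values are assumptions made for the example, not part of any particular DSMS.

```python
from dataclasses import dataclass


@dataclass
class RunningAggregates:
    """Running state for a continuous max/min/avg/sum query (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")
    minimum: float = float("inf")

    def update(self, value: float) -> dict:
        # Fold one new stream element into the state and return the current answer.
        self.count += 1
        self.total += value
        self.maximum = max(self.maximum, value)
        self.minimum = min(self.minimum, value)
        return {
            "max": self.maximum,
            "min": self.minimum,
            "avg": self.total / self.count,
            "sum": self.total,
        }


# The query is registered before any data arrives (a predefined query),
# then produces a fresh answer after each element.
query = RunningAggregates()
answers = [query.update(v) for v in [4.0, 10.0, 1.0]]
```

Each call to `update` uses constant memory, so the answer stays current without storing the stream itself.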
1) Sensor network:
• Many sensors: a huge source of data, arriving as streams that feed into a central controller
• Situations: require constant monitoring of many parameters to make important decisions
• Response: alerts and alarms
Need for analysis, aggregation and joins over multiple streams corresponding to various
sensors
- Joins on multiple streams, such as temperature streams and ocean current streams from weather
stations, to give alerts or warnings of disasters, since the information changes rapidly with
the vagaries of nature
- Monitoring the stream of current power usage statistics reported to a power station, grouping
them by location, user type, etc., to manage power distribution efficiently
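The power usage example can be sketched as a grouped aggregation over a stream. This is a minimal sketch under stated assumptions: the function name `usage_by_group`, the tuple layout `(location, user_type, kilowatts)` and the sample readings are all made up for illustration.

```python
from collections import defaultdict


def usage_by_group(readings):
    """Yield running power-usage totals keyed by (location, user_type)."""
    totals = defaultdict(float)
    for location, user_type, kilowatts in readings:
        totals[(location, user_type)] += kilowatts
        # Emit a snapshot after every reading so the result is always current.
        yield dict(totals)


# Made-up sample readings: (location, user_type, kilowatts)
sample = [
    ("north", "residential", 1.5),
    ("south", "industrial", 40.0),
    ("north", "residential", 2.0),
]
snapshots = list(usage_by_group(sample))
```

The state grows with the number of distinct groups, not with the length of the stream, which is what makes this kind of grouping practical on unbounded input.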
St. Francis Institute of Technology
Department of Information Technology
The material in this presentation belongs to St. Francis Institute of Technology and is solely for educational purposes. Distribution and modifications of the content is prohibited.
● External memory is not suitable for real-time analysis due to
high latency
● External memory is not suitable for continuous queries
● Thus there is a need for algorithms that are confined to main memory, without
accessing disk.
3) Sliding Windows:
● Insights based on the recent past are more informative and useful than insights
based on stale data.
● The window must be kept fresh, deleting the oldest elements as new ones
come in.
● A window can be the most recent n elements of a stream, for some n, or it can be all
the elements that arrived within the last t time units, for example, 1 month.
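The two window definitions above (most recent n elements, or everything within the last t time units) can be sketched as follows. The class names `CountWindow` and `TimeWindow` are hypothetical, and timestamps are assumed to be plain numbers that arrive in non-decreasing order.

```python
from collections import deque


class CountWindow:
    """Keeps the most recent n elements of a stream."""

    def __init__(self, n):
        # A bounded deque drops the oldest element automatically when full.
        self.items = deque(maxlen=n)

    def add(self, value):
        self.items.append(value)
        return list(self.items)


class TimeWindow:
    """Keeps all elements that arrived within the last t time units."""

    def __init__(self, t):
        self.t = t
        self.items = deque()  # (timestamp, value) pairs in arrival order

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        # Evict elements whose timestamp is t or more units behind the newest one.
        while self.items and self.items[0][0] <= timestamp - self.t:
            self.items.popleft()
        return [v for _, v in self.items]


count_window = CountWindow(3)
for v in range(5):
    last_three = count_window.add(v)

time_window = TimeWindow(10)
time_window.add(0, "a")
time_window.add(5, "b")
recent = time_window.add(12, "c")
```

In both cases the oldest elements are deleted as new ones come in, so memory use is bounded by the window rather than by the stream.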
5) Blocking Operators
● A blocking operator is a query operator that is unable to produce an answer until it has seen its entire input.
● Examples: aggregation operators such as SUM, COUNT, MIN, MAX and AVG.
● Incorporating blocking operators into the query tree poses problems.
● Because data streams are infinite, a blocking operator would never be able to produce any output.
● Dealing with them effectively is one of the challenges of data stream computation.
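The problem with blocking operators can be seen by contrasting a plain SUM with a version that emits one result per fixed-size group of elements. Restricting the aggregate to windows is one common way of dealing with blocking operators, but it is an assumption of this sketch rather than something prescribed above. The function names are hypothetical.

```python
from itertools import count, islice


def blocking_sum(stream):
    # Blocking: sum() must consume the entire input before returning,
    # so this never returns on an infinite stream.
    return sum(stream)


def windowed_sum(stream, n):
    """Emit one SUM per consecutive group of n elements (non-blocking)."""
    total, seen = 0, 0
    for value in stream:
        total += value
        seen += 1
        if seen == n:
            yield total
            total, seen = 0, 0


infinite_stream = count(1)  # 1, 2, 3, ... never ends
first_three = list(islice(windowed_sum(infinite_stream, 4), 3))
```

Calling `blocking_sum(count(1))` would never terminate, whereas `windowed_sum` produces output after every fourth element and can feed later operators in a query tree.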
01 Reservoir Sampling
02 Biased Reservoir Sampling
03 Concise Sampling
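As an illustration of the first technique in the list, the following is a minimal sketch of basic reservoir sampling: it keeps a uniform random sample of k elements from a stream of unknown length using only O(k) memory. The function name `reservoir_sample` and the optional `rng` parameter are assumptions for this example. The biased and concise variants are not shown.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k elements chosen uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k elements.
            reservoir.append(item)
        else:
            # Element i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# A seeded generator is used here only to make the run repeatable.
sample = reservoir_sample(range(1000), 5, random.Random(0))
```

The stream is read once and never stored, which fits the main-memory constraint of stream processing.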