
Lecture-19

Spark Streaming and Sliding Window Analytics (Part-I)



Refer slide time: (0:17)

Preface
Content of this lecture: In this lecture we will discuss real-time big data processing with Spark Streaming
and sliding window analytics. We will also discuss a case study based on Twitter sentiment analysis using
Spark Streaming.

Refer slide time: (0:35)


The motivation for streaming data processing is deep-rooted and comes from applications such as fraud
detection in online banking transactions: how can fraud be detected in real time, as the transactions
happen? That is the challenge, and it motivates a streaming data processing system with which we can do
online, real-time fraud detection on banking transactions. Another application that has motivated
streaming data processing is detecting anomalies in sensor data. Many IoT deployments are happening these
days which collect sensor data, and anomalies in this sensed data have to be detected in real time; this
too has given a new way of solving problems, using real-time streaming data processing. Similarly, we may
want to analyze tweets and, based on this analysis, find the sentiment or mood of the people at a
particular instant, so real-time tweet analysis also requires streaming data processing. These newer
applications have given us a reason to provide streaming data processing for real-time analytics.
Therefore, we are going to understand the new concepts and theory of Spark Streaming, and how it is used
in today's workloads and newer applications.

Refer slide time: (3:03)

Now, the question is how to process big streaming data. In this part of the lecture we deal with the data
called 'Streaming Data'. If the stream of data is very fast and also continuous, then it is categorized
under the big data scenario, where the characteristic called 'Velocity' requires big data infrastructure
to handle it. That means we require hundreds to thousands of nodes, that is, we must scale out to deal
with this fast data, which is also called 'Streaming Data'. So for scalability we require hundreds to
thousands of nodes to process big streaming data. Another aspect is achieving low latency; this requires
insight into the technology, and we will see how low latency is achieved in this streaming data processing
framework. Another requirement is how to deal with failures, and not only how to deal with them but how to
recover from them efficiently, so that failures can be tolerated by applications that monitor events in
real time. So failure recovery has to be done in a way that caters to real-time applications based on
streaming data. Furthermore, a lot of interactive processing now happens on streaming data, which is also
called 'Fast Data'. Besides this fast data, we sometimes also need to integrate with batch data. If there
are two different modes of data, one being batch data and the other streaming data, there are applications
which require the integration of both, and traditionally these two types of data have different stacks for
processing. In the previous technologies, with two different stacks, the integration is time-consuming and
may not be useful for real-time applications. So here we are going to see the new technology called 'Spark
Streaming', which combines, or integrates, the processing of batch and streaming data simultaneously.

Refer slide time: (7:16)

Now, let us see what people have been doing so far, that is, what the systems before Spark Streaming were
and how this was being done. For batch processing and for stream processing, different stacks were
required, and these stacks were optimized for different types of data, so integration was not common in
the previous generation of systems. For example, for batch processing we have the Spark system and the
MapReduce framework, and for streaming, the earlier systems were such as Storm. Integration then requires
two different stacks to be combined, hence it involves a lot of latency and may not be sufficient for
real-time applications. That is why the framework we are going to discuss in today's lecture, 'Spark
Streaming', integrates both batch and streaming applications using the same stack. Therefore, it is the
most efficient way of dealing with multiple types of data, batch data and stream data, when they have to
be processed at the same time. The existing frameworks such as Storm and MapReduce cannot do both kinds of
processing, batch and streaming, at the same point in time: either the stream processing of hundreds of
megabytes with low latency, or the batch processing of terabytes of data with high latency, had to be
dealt with separately, and a combined or integrated view was not available before Spark Streaming. So
Spark Streaming is the new technology we are going to discuss, and we will also see different use cases
where batch processing and stream processing together are required in many applications.

Refer slide time: (10:22)

So, if integration is not supported, it is required to maintain two different stacks, which may require
different programming models, duplicated development effort, and extra operational cost. That is not
feasible.

Refer slide time: (10:45)


Therefore, we are going to discuss today how Spark Streaming integrates all of this and is useful for
various applications. Before going into more detail about Spark Streaming, let us see how fault tolerance
is achieved in stream processing systems. Traditional processing models use a pipeline of different nodes;
each node maintains mutable state, and each input record updates the state and new records are sent out.
The problem with these previous systems is that the mutable state is lost if a node fails. And node
failure is the norm rather than the exception on commodity hardware. So when a node fails, the mutable
state it maintains is lost, and thus some updates are simply lost in the traditional systems. Therefore,
making stateful stream processing fault tolerant is very much needed, and we will see how Spark Streaming
does stateful stream processing in a fault-tolerant manner.
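
To make this concrete, here is a minimal sketch (not from the lecture slides) of fault-tolerant stateful
stream processing in Spark Streaming, using the updateStateByKey operation: the running counts live in
checkpointed state managed by Spark rather than in one node's mutable memory. The socket source, host,
port and checkpoint path are illustrative assumptions.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("StatefulWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("hdfs:///checkpoints/wordcount")   // assumed path; state survives node failures

    // Merge the new values seen in this batch with the previous count, if any.
    val updateCount = (newValues: Seq[Int], state: Option[Int]) =>
      Some(newValues.sum + state.getOrElse(0))

    val lines = ssc.socketTextStream("localhost", 9999)   // assumed source
    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .updateStateByKey(updateCount)      // per-key state, checkpointed
    counts.print()
    ssc.start()
    ssc.awaitTermination()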

Refer slide time: (12:44)


Now, let us see what streaming is. Data streaming is a technique for transferring data so that it can be
processed as a steady, continuous stream as it arrives at the system. You can visualize the data as
flowing continuously through pipes, and as it passes through these pipes it has to be processed in real
time. When that is the case, it is called 'Data Streaming' or 'Streaming Data'. The sources that generate
streaming data are many: for example, the internet traffic that is flowing can be seen as network
streaming data, and similarly the Twitter stream can be taken up in some applications. Netflix, with
real-time online movie watching, also generates streaming data, and YouTube data is likewise a kind of
streaming data. Streaming data can be generated in many other ways; it can also be generated from a
database, where data is read and transmitted in the form of a stream, as in ETL. Companies normally do
this for analysis, so streaming data technologies are becoming increasingly important with the growth of
the Internet and of Internet-enabled services such as Netflix, Facebook, Twitter, YouTube, Pandora, iTunes
and so on. There are tons of such services available through the internet nowadays.

Refer slide time: (14:49)


Now, let us see the Spark ecosystem and the positioning of the Spark Streaming system within it. In the
Spark ecosystem you will see that at the core there is the Spark engine, and on top of it the Spark
Streaming system is running. Spark Streaming runs over the Spark core, and this enables analytical and
interactive applications on live streaming data. The core components of the Spark framework provide the
utilities and architecture for the other components, so building on Spark gives many other advantages. For
example, the analytics to be performed on streaming data may sometimes need machine learning to be applied
to that streaming data. With one single common stack, the Spark core engine, it is now possible to use the
machine learning libraries built on top of Spark for analytics on streaming data. Similarly, graph
processing components such as GraphX, a graph computation engine that combines data-parallel and
graph-parallel concepts, can be useful for the analytics of streaming data. Likewise, Spark SQL, the
structured way of accessing data such as key-value stores, can be used with Spark Streaming for storage
and retrieval purposes. Therefore, by sharing the same stack, Spark Streaming gains a lot of advantages:
not only can batch processing and stream processing be integrated, but the other libraries can be used as
well, such as machine learning for streaming analysis, and graph processing for streaming data that can be
represented and analyzed as a graph. So Spark Streaming, positioned on top of the Spark core, gains the
advantages of the other utilities such as MLlib, GraphX, Spark SQL and so on. We will see in further
slides how we are going to utilize this integration in solving today's workload problems.
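
As a hedged illustration of this one-stack advantage (this code is not from the slides), each micro-batch
of a stream can be handed to Spark SQL by converting it to a DataFrame inside foreachRDD; here words is
assumed to be an existing DStream of strings, and the table and column names are made up for the example.

    import org.apache.spark.sql.SparkSession

    // Query each micro-batch of the stream with Spark SQL.
    words.foreachRDD { rdd =>
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._
      rdd.toDF("word").createOrReplaceTempView("words")   // assumed names
      spark.sql("SELECT word, COUNT(*) AS total FROM words GROUP BY word").show()
    }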

Refer slide time: (17:54)


So, what is Spark Streaming? It extends Spark for big data stream processing. Big data stream processing
can now be done with the help of Spark Streaming, which is an extension of the Spark core. The Spark
Streaming project was started in 2012, and it was first released with Spark 0.9. Spark Streaming has a lot
of built-in support to consume data from different sources: Kafka is one of the receivers which can feed
live stream data into Spark Streaming, and similarly Flume, Twitter, ZeroMQ, Kinesis and TCP/IP sockets
are some of the built-in receiver systems which can be plugged into the Spark Streaming system through
APIs to get stream data for computation. In Spark 2.0, a separate technology based on Datasets, called
'Structured Streaming', has been designed, which provides a higher-level interface to support streaming.
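
For contrast with the DStream examples in this lecture, here is a minimal hedged sketch of that
higher-level Structured Streaming interface (the socket host and port are illustrative assumptions):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("StructuredWordCount").getOrCreate()
    import spark.implicits._

    // A streaming DataFrame: every arriving line becomes a new row.
    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999)   // assumed source
      .load()

    val counts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

    // Continuously print the updated counts to the console.
    counts.writeStream.outputMode("complete").format("console").start().awaitTermination()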

Refer slide time: (19:26)


So, let us look at the Spark Streaming framework for large-scale processing of streaming data, and review
the requirements: it should scale to hundreds or thousands of nodes, and it can achieve second-scale
latencies. That means latencies are on the scale of seconds, not of minutes or hours. Achieving
second-scale latency is not an easy task, and it requires a new design; therefore we are going to
understand the Spark Streaming system. The Spark Streaming system also integrates batch and interactive
processing together, with a unified view on the same stack; with this unification of batch and interactive
applications, it is now possible to achieve latencies on the scale of seconds. It also provides a simple
batch-like API for implementing complex algorithms, and it can integrate with live data streams from other
tools such as Kafka, Flume, ZeroMQ and so on. That is why Spark Streaming is very much needed nowadays.

Refer slide time: (21:22)


Now, Spark Streaming receives data streams from different input sources, processes them in the cluster,
and pushes the results out to databases or dashboards as output. Therefore, Spark Streaming is scalable,
fault tolerant, and has latencies on the scale of seconds. As the diagram shows, the input data can be
received from different sources: Kafka, Flume, HDFS, Kinesis or Twitter. This live stream of data is fed
into the Spark Streaming system for the streaming computation, and after that the output is either stored
in a database or pushed to a dashboard. In this part of the discussion, we will see what Spark Streaming
is and how it handles streaming data from different sources together in a way that is useful for various
applications.
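
As a hedged sketch of the output side of this pipeline (counts is assumed to be a DStream of results, and
DashboardClient is a hypothetical helper for illustration, not a real API), each batch of results can be
persisted or pushed onward:

    // Persist every batch; a database loader or dashboard can read from here.
    counts.saveAsTextFiles("hdfs:///streaming/wordcounts")   // assumed path; one directory per batch

    // Or push each batch to an external store explicitly.
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val sink = DashboardClient.connect("dashboard-host:8080")   // hypothetical client
        records.foreach(sink.send)
        sink.close()
      }
    }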

Refer slide time: (22:54)


So, once again, why Spark Streaming? Many big data applications need to process large data streams in real
time. For example, website monitoring sometimes requires watching the load and the hits coming to a
website and checking whether the website is performing well. Similarly, online transactions, for a bank or
for credit cards, really need to be analyzed in real time; those become streaming data applications.
Various ad monetization tasks also require real-time processing: whenever a user clicks on a particular
ad, the click-through data has to be analyzed in real time, and the right ads are then pushed accordingly.

Refer slide time: (24:13)


So, Spark Streaming is very much needed: many important applications must process large streams of live
data and provide results in near real time. Near real time means that the system we are going to discuss,
Spark Streaming, has latencies down to about half a second. Applications that can tolerate latencies down
to half a second can be served by Spark Streaming; for even lower latencies, separate streaming systems
such as Storm, which have very low latency, are already available. Applications such as social network
trends, website statistics, and intrusion detection systems need this kind of streaming system, require
large clusters to handle their workloads, and require latencies of the order of seconds.

Refer slide time: (25:29)


So, we can use Spark Streaming to stream real-time data from various sources like Twitter, the stock
market, and geographical systems, and perform powerful analytics to help various businesses.

Refer slide time: (25:51)


So, there is a need for a framework for big data stream processing that scales to hundreds or thousands of
nodes, achieves second-scale latency, efficiently recovers from failures, and integrates with batch and
interactive processing.

Refer slide time: (26:06)

Let us see the different features that cater to all these requirements. The Spark Streaming features:
first is scaling, as Spark Streaming can easily scale to hundreds or thousands of nodes; second is speed,
as it achieves low latency; third is fault tolerance, as it can efficiently recover from failures; fourth
is integration with batch and real-time processing; and finally, business analytics is also supported.

Refer slide time: (26:45)


So, let us look at another aspect, besides the integration of batch processing and real-time streaming
data processing: the requirement of stateful stream processing. In the traditional model, as we have seen,
processing is organized as a pipeline, and if the mutable state is lost, that has to be handled.

Refer slide time: (27:15)

So, let us see how modern data applications approach data for more insights, beyond the kind of analysis
that traditional analytics requires.
Refer slide time: (27:32)

Now, the existing system for streaming data analysis is 'Storm'. Storm replays a record if it is not
processed by a node, and therefore it provides at-least-once semantics. That means that when nodes fail,
some updates to the mutable state may be applied twice; this double update is the problem with
at-least-once semantics, and mutable state can still be lost due to failures. So at-least-once semantics
creates problems in cases where updates end up being applied twice, and existing systems like Storm
achieve only that state of the art. Exactly-once semantics is what is required, and it is supported in the
Spark Streaming system. There are other streaming systems, such as Trident, which use transactions to
update state and thereby also achieve exactly-once semantics, but a per-state transaction to an external
database is slow.
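
As a hedged sketch of what this recovery looks like in code (the checkpoint path and the pipeline body are
illustrative assumptions), a failed driver can be restarted from its checkpoint so the computation resumes
consistently:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def createContext(): StreamingContext = {
      val conf = new SparkConf().setAppName("RecoverableApp")
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint("hdfs:///checkpoints/app")   // assumed path
      // ... define the DStream pipeline here ...
      ssc
    }

    // After a failure, the context is rebuilt from the checkpoint instead of
    // being created afresh, so completed batches are not reprocessed.
    val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/app", createContext _)
    ssc.start()
    ssc.awaitTermination()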

Refer slide time: (28:58)


Now, how does Spark Streaming work? Let us understand this. It runs a streaming computation as a series of
small, deterministic batch jobs, and the live stream taken into the system is divided into batches of X
seconds. Spark treats each batch of data as an RDD and processes it using RDD operations. Finally, the
processed results of the RDD operations are returned in batches. Let us see all this using the diagram:
when the live stream data enters, the Spark Streaming system divides it into batches of X seconds, and
these batches, in the form of RDDs, are given to the Spark engine for processing; the processed results
are finally emitted. So, to go over the entire scenario again: running Spark Streaming requires the
streaming data to enter the system as data streams. Data streams are received by the receivers of the
Spark Streaming system, which divides them into batches of X seconds; these are called 'micro-batches', or
micro RDDs. After the division into batches of X seconds, these micro-batches are given to Spark, and
Spark reads each batch of data as an RDD and processes it using the different RDD operations provided by
the Spark engine. Finally, the processed results of the RDD operations are returned in the form of batches
and are either stored in a database or in HDFS, or output on dashboards. So there are different ways this
output, the result of Spark Streaming, can be used in different applications.
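
The idea that each batch is an ordinary RDD can be made concrete with foreachRDD, which hands every
micro-batch to regular RDD code; this is a small hedged sketch, with stream assumed to be an existing
DStream:

    // Called once per batch interval with that batch's RDD.
    stream.foreachRDD { (rdd, batchTime) =>
      val n = rdd.count()   // any ordinary RDD operation applies here
      println(s"Batch at $batchTime contained $n records")
    }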

Refer slide time: (31:47)


So, we have seen that a Spark Streaming computation runs as a series of very small, deterministic batch
jobs, and the batch sizes can be as low as half a second. The latency cannot go much below half a second,
because the batch size is set by the system. If, say, the batch size is half a second, the end-to-end
latency comes out to be only about one second. So second-scale latency is achieved because batches of half
a second are collected and processed. Therefore, the potential for combining batch and streaming data
processing in the same system exists: these micro-batches are RDDs given to the Spark engine, and batch
data is also processed by the Spark engine, so in the end there is a single engine, the Spark engine,
which processes both the batch data and the streaming data once Spark Streaming has formed the batches. It
is therefore possible to combine, or integrate, both batch processing and stream processing in the same
system.

Refer slide time: (33:12)


And now let us see an example of word count with Kafka as the live input source, and how this is all done
in the Spark Streaming system. In this word count application, we create the Spark Streaming context with
one-second batch sizes. This context is then bound to the Kafka utilities, which provide the live stream
in the form of lines, and each line is further processed by Spark Streaming using flatMap, which splits
the line into words; a map function is then applied to the words. The map function performs a
transformation on all the words received from the Kafka live stream: for every word it emits the pair
(word, 1), which in turn is taken by reduceByKey, which for every word sums up all the ones, so every word
ends up with a count, which is printed. Then we call start on the context and finally await its
termination. You can see that word count with Kafka is quite easy to understand, and you can write the
program without worrying about the rest: the entire operation of the stream computation is completely
abstracted from the user.
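
A hedged reconstruction of the program described above, using the receiver-based KafkaUtils API of that
era (the ZooKeeper address, consumer group and topic name are illustrative assumptions):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("KafkaWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))      // one-second batches

    // Receive (key, message) pairs from Kafka; keep only the message text.
    val lines = KafkaUtils.createStream(
      ssc, "zookeeper-host:2181", "wordcount-group", Map("topic" -> 1)).map(_._2)

    val words = lines.flatMap(_.split(" "))               // split each line into words
    val wordCounts = words.map(word => (word, 1))         // emit (word, 1) per word
                          .reduceByKey(_ + _)             // sum the ones per word
    wordCounts.print()

    ssc.start()                                           // start the computation
    ssc.awaitTermination()                                // wait for it to terminate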

Refer slide time: (35:12)


Now, before looking at the internal details of Spark Streaming execution, let us first understand a Spark
application. In a Spark application, the Spark engine organizes processing in terms of a driver program,
which communicates with executors running on different nodes of the cluster. The driver communicates with
the executors and assigns them tasks for processing the data. The driver launches the executors in the
cluster, and this can be done with the help of YARN, Mesos, or the Spark standalone cluster manager. In
this way Spark launches the driver program and the executors; the driver assigns the tasks to the
executors, and the executors in turn perform the execution, the processing of data, in the Spark system.

Refer slide time: (36:25)


Now, Spark Streaming runs on top of the Spark engine, so let us see how a Spark Streaming application
progresses in this setting. Again there is a driver program, and the same word count program we have
written is given to the driver. The driver runs the receivers as long-running tasks: these tasks are given
to the executors, and an executor hosts the receiver, which receives the live data stream from Kafka.
After receiving the data from Kafka, the live stream is received in the form of lines, and these lines are
assembled into data blocks. These data blocks are replicated to other executors; replication is required
to achieve fault tolerance of the data blocks. Further, the driver assigns the tasks to be performed by
the executors on these data blocks; for example, the flatMap and the map-reduce operations of the word
count are all executed at the executors. So these are the components of the Spark Streaming system, and
here we see how it receives the data using the receiver.

Refer slide time: (38:12)


And now we have also seen how the data is processed: at each batch interval, the driver launches tasks to
process the data blocks, the processing produces results, and the results are stored in the data store.
With these parts, we have seen internally how the data is processed in the Spark Streaming system.

Refer slide time: (38:49)


So, let us review the entire scenario as the Spark Streaming architecture. It is based on a micro-batch
architecture that operates on intervals of time: new batches are created at regular time intervals, and
each received batch is divided into blocks for parallelism. Each batch forms a graph that translates into
multiple jobs, and the system has the ability to create larger-sized batch windows as it processes over
time.
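
Since sliding window analytics is the theme of this lecture, a minimal hedged preview of a windowed
operation is worth adding here (the 30-second window and 10-second slide are illustrative choices; words
is assumed to be an existing DStream, and both durations must be multiples of the batch interval):

    // Count words over the last 30 seconds, recomputed every 10 seconds.
    val windowedCounts = words.map(word => (word, 1))
      .reduceByKeyAndWindow(
        (a: Int, b: Int) => a + b,   // combine counts inside the window
        Seconds(30),                 // window length
        Seconds(10))                 // sliding interval
    windowedCounts.print()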
