Lec 19
Preface
Content of this lecture: In this lecture we will discuss real-time big data processing with Spark Streaming
and sliding window analytics. We will also discuss a case study based on Twitter sentiment analysis using
Spark Streaming.
Therefore, today we are going to discuss Spark Streaming: how it is all integrated and how it is
useful for various applications. Now, before going into more detail about Spark Streaming, let us see
how fault tolerance is achieved in stream processing systems.
So, traditional processing models use a pipeline of different nodes; each node maintains mutable
state, each input record updates that state, and new records are sent out. Now, the problem with the
previous systems was that the mutable state was lost if a node failed. And node failure is the norm
rather than the exception on commodity hardware. So, therefore, when a node fails, the mutable state
it was maintaining is lost; there is no fault tolerance for the mutable state.
So, some things are simply lost in the traditional, previous systems. Therefore, making stateful
stream processing fault tolerant is also very much needed, and we will see how, in the Spark
Streaming system, this stateful stream processing is done in a fault-tolerant manner.
Now, let us see what streaming is. So, Data Streaming is a technique for transferring data so that it
can be processed as a steady and continuous stream as it arrives at the system. So, you can visualize
it as if the data is flowing continuously through pipes, and as it passes through these pipes it is
required to be processed in real time. And, if that is the case, then it is called 'Data Streaming'
or 'Streaming Data'. Now, the sources which generate streaming data are many: for example, internet
traffic, if it is observed as it flows, is network streaming data; similarly, the Twitter stream can
also be taken up in some applications. Similarly, Netflix, with its real-time online movie watching,
also generates streaming data, and YouTube data is also a kind of streaming data; and there are many,
many other ways streaming data can be generated. Streaming data can also be generated
from a database, where data is read and transmitted in the form of a stream, that is, ETL.
So, companies normally do this for analysis. So, streaming data technologies are becoming
increasingly important with the growth of the Internet and of internet-enabled services, which
are available in the form of Netflix, Facebook, Twitter, YouTube, Pandora, iTunes and so on. There are
tons of such services available through the internet nowadays.
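The "data flowing through pipes" idea above can be sketched in a few lines of plain Python. This is not Spark code; the source records and the counting logic are illustrative assumptions, standing in for a live feed such as network traffic or a Twitter stream:

```python
# A minimal pure-Python sketch of data streaming: records arrive one by
# one and are processed as they pass through, rather than being
# collected into a file first. The record list is a stand-in for a
# hypothetical live source (Twitter, Netflix events, network traffic).

def stream_source():
    """Yield records one at a time, as a live source would."""
    for record in ["click", "view", "click", "purchase", "click"]:
        yield record

def process_stream(source):
    """Consume the stream record by record, updating running counts."""
    counts = {}
    for record in source:
        counts[record] = counts.get(record, 0) + 1  # processed as it arrives
    return counts

print(process_stream(stream_source()))  # → {'click': 3, 'view': 1, 'purchase': 1}
```

The point of the sketch is that `process_stream` never sees the whole dataset at once; it only ever holds the current record and its running state, which is exactly the constraint a streaming framework must work under.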
So, therefore, there is a need for a framework for big data stream processing that scales to hundreds
or thousands of nodes, achieves second-scale latency, is able to recover efficiently from failures,
and integrates with batch and interactive processing.
Refer slide time: (26:06)
Let us see what the different features are which are able to cater to all these requirements.
So, the Spark Streaming features: first is scaling, as Spark Streaming can easily scale to hundreds or
thousands of nodes; next is speed, as it achieves low latency; fault tolerance is achieved here, to
recover from failures; and it is also integrated with batch and real-time processing, and business
analytics is also supported.
So, let us see another part: besides the integration of batch processing and real-time streaming
data processing, we will see another requirement, which is about stateful stream processing. In the
traditional model, as we have seen, processing is organized as a pipeline, and if the mutable state
is lost then that has to be handled.
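The stateful stream processing just described can be sketched as follows, in the spirit of Spark Streaming's `updateStateByKey` operation. This is a plain-Python illustration, not the Spark API; the batches and the word-count state are assumed examples:

```python
# A minimal sketch of stateful stream processing: per-key mutable state
# (a running word count) persists across incoming batches. Losing this
# dict when a node fails is exactly the fault-tolerance problem the
# lecture describes; Spark Streaming solves it by storing such state in
# fault-tolerant RDDs instead of a bare in-memory structure.

def update_state(state, new_values):
    """Fold one batch's new values into the existing state for a key."""
    return (state or 0) + sum(new_values)

def run_batches(batches):
    state = {}  # mutable state maintained across all batches
    for batch in batches:
        # group this batch's (word, count) pairs by key
        grouped = {}
        for word, n in batch:
            grouped.setdefault(word, []).append(n)
        # apply the update function per key, carrying state forward
        for word, values in grouped.items():
            state[word] = update_state(state.get(word), values)
    return state

batches = [[("spark", 1), ("storm", 1)], [("spark", 1)]]
print(run_batches(batches))  # → {'spark': 2, 'storm': 1}
```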
So, let us see how modern data applications approach the data for more insights, compared with the
kind of analysis that traditional analytics requires.
Refer slide time: (27:32)
Now, an existing system for streaming data analysis is called 'Storm'. And Storm replays a record if
it was not processed by a node, and therefore it provides at-least-once semantics. So, that means,
when nodes fail, some of the updates to the mutable state may be applied again when the record is
replayed. So, an update may happen twice, and that becomes the problem with at-least-once semantics;
and the mutable state can still be lost due to failures. So, this at-least-once semantics creates
problems in those cases where updates end up being done twice, and an existing system like Storm
achieves only this state of the art, which is called at least once. So, exactly-once is the semantics
which is required, and it is supported in the Spark Streaming system. There are other streaming
systems, such as Trident, which use transactions to update the state and thereby also achieve
exactly-once semantics, but a per-state transaction to an external database is slow.
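The difference between at-least-once and exactly-once semantics can be made concrete with a small sketch. The record stream below is hypothetical; the duplicated record simulates a replay after a node failure, as in Storm:

```python
# Why at-least-once replay can corrupt mutable state, and how tracking
# processed record ids restores an exactly-once effect. Record id 2 is
# replayed (appears twice), simulating recovery after a node failure.

records = [(1, 10), (2, 5), (2, 5), (3, 7)]  # (record_id, amount)

def at_least_once(stream):
    total = 0
    for _rid, amount in stream:
        total += amount          # the replayed record is counted twice
    return total

def exactly_once(stream):
    total, seen = 0, set()
    for rid, amount in stream:
        if rid in seen:          # skip duplicates caused by replay
            continue
        seen.add(rid)
        total += amount
    return total

print(at_least_once(records))  # → 27: the replay inflated the total
print(exactly_once(records))   # → 22: each record applied exactly once
```

Deduplicating by record id is only one way to get an exactly-once effect; Trident's transactional state updates and Spark Streaming's lineage-based recovery are the mechanisms the lecture actually refers to.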
So, let us review the entire scenario of the Spark Streaming architecture. So, it is based on a
micro-batch architecture that operates on intervals of time: new batches are created at a regular
time interval, and each received batch is divided into blocks for parallelism. And each batch is a
graph of operations that translates into multiple jobs. And it has the ability to create larger-sized
batch windows as it processes over time.
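The micro-batch idea can be sketched as follows: records received over time are grouped into fixed intervals, and each interval's batch is handed to the engine as one small job. The timestamps and the 2-second interval here are illustrative assumptions, not Spark defaults:

```python
# A minimal sketch of micro-batching: incoming (timestamp, value)
# records are grouped by which fixed time interval they fall into, and
# each interval's batch is then processed as a separate small job.

def to_batches(records, interval):
    """Group (timestamp, value) records into per-interval batches."""
    batches = {}
    for ts, value in records:
        batch_id = ts // interval            # which interval this record falls in
        batches.setdefault(batch_id, []).append(value)
    return [batches[k] for k in sorted(batches)]

records = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (5, "e")]
jobs = to_batches(records, interval=2)
print(jobs)  # → [['a', 'b'], ['c', 'd'], ['e']] — each inner list is one job
```

In Spark Streaming the same grouping happens inside the receiver: the batch interval is fixed when the streaming context is created, and each batch additionally gets split into blocks so that the job over it can run in parallel.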