Cs498 Week 12 Slide

The document discusses the importance of real-time stream processing in cloud computing, highlighting the limitations of traditional batch processing systems like Hadoop. It introduces Apache Storm as a solution for real-time data processing, detailing its architecture, components (spouts and bolts), and the concept of topologies. Additionally, it touches on the challenges of state management in streaming systems and mentions Trident for providing exactly-once semantics in stateful stream processing.


Cloud Computing

Prof Roy Campbell
Reza Farivar
University of Illinois at Urbana-Champaign (UIUC)
Fall 2015

Streaming Introduction

Cloud Computing Applications - Roy Campbell

Why Real-Time Stream Processing?

• Real-time data processing at massive scale is becoming a requirement for businesses
  • Real-time search, high-frequency trading, social networks
• A stream of events flows into the system at a given data rate
• The processing system must keep up with the event rate or degrade gracefully by dropping events. This is typically called load shedding
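
To make load shedding concrete, here is a minimal Python sketch, assuming a bounded in-memory buffer that drops new events once it is full. The class `LoadSheddingBuffer`, the function `simulate`, and all rates are hypothetical names and numbers chosen for illustration; they are not taken from Storm or any other engine.

```python
from collections import deque


class LoadSheddingBuffer:
    """Bounded buffer that sheds (drops) new events when it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.dropped = 0

    def offer(self, event):
        """Accept the event if there is room; otherwise shed it."""
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return False
        self.events.append(event)
        return True

    def drain(self, max_events):
        """Remove and return up to max_events buffered events, oldest first."""
        n = min(max_events, len(self.events))
        return [self.events.popleft() for _ in range(n)]


def simulate(arrivals_per_tick, capacity_per_tick, buffer_size, ticks):
    """Feed a fixed arrival rate into a processor with a fixed service rate.

    Returns (processed, dropped, still_buffered).
    """
    buf = LoadSheddingBuffer(buffer_size)
    processed = 0
    next_id = 0
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            buf.offer(next_id)
            next_id += 1
        processed += len(buf.drain(capacity_per_tick))
    return processed, buf.dropped, len(buf.events)


# Arrival rate (10/tick) exceeds the service rate (6/tick), so events are shed.
print(simulate(arrivals_per_tick=10, capacity_per_tick=6, buffer_size=8, ticks=5))
# -> (30, 18, 2)
```

Dropping the newest events is only one possible policy; a real system might instead drop the oldest events or sample the stream.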
Why Real-Time Stream Processing?

• MapReduce, Hadoop, etc., store and process data at scale, but not for real-time systems
• There's no hack that will turn Hadoop into a real-time streaming system
• Fundamentally different set of requirements than batch processing
• Lack of a "Hadoop of real-time" has become the biggest hole in the data processing ecosystem
Cloud Streaming Engines

• Apache Storm
• Twitter Heron
• Apache Flink
• Older non-cloud systems
  • IBM System S
  • Borealis
    • Descendant of Aurora from Brown University; no longer active
The Rise of Real-Time

▪ As Hadoop ramped up to offer batch data availability, a growing need arose to provide
data in real-time for analytic and instant feedback use cases

▪ Storm became stable for production scale in 2012

The Storm Fire Hose

▪ Topologies
● graph of spouts and bolts that are connected with stream groupings
● runs indefinitely (no time/batch boundaries)
▪ Streams
● unbounded sequence of tuples that is processed and created in parallel in a distributed fashion
▪ Spouts
● input source of streams in topology
▪ Bolts
● processing container, which can perform transformation, filter, aggregation, join, etc.
● sinks: special type of bolts that have an output interface
How Did We Get Here?

▪ People always have wanted data faster

▪ Finally we had hardware costs that were in line with doing in-memory streaming for
billions of events/day

The Lambda Architecture: Real-Time + Batch

The Present Architecture

[Figure: batch data input feeds Hadoop (Transforms, Joins, Validation, Aggs on HDFS); real-time data input feeds Storm (Spout, Bolts, Sink); both feed Druid, which serves Data Users / Customers]

The Next Frontier: Real-Time as Source of Truth

[Figure: Storm (Spout, Transforms, Joins, Validation, Sink) is the single input path; its output goes to Hadoop (Aggs, HDFS) and Druid, which serve Data Users / Customers]

Storm Introduction: Bolts & Spouts

Apache Storm

• Guaranteed data processing
• Horizontal scalability
• Fault tolerance
• No intermediate message brokers
• Higher-level abstraction than message passing
• “Just works”
• Hadoop of real-time streaming jobs
• Built by BackType, then developed at Twitter, and eventually open-sourced under Apache
Storm

[Figure: Storm cluster, showing the master node, coordination, and worker processes]
Storm Concepts
• Streams
• Unbounded sequences of tuples

• Spout
• Source of Streams
• E.g., Read from Twitter streaming API

• Bolts
• Processes input streams and produces new streams
• E.g., Functions, Filters, Aggregation, Joins

• Topologies
• Network of spouts and bolts
Storm Tasks

• Spouts and bolts execute as many tasks across the cluster
• When a tuple is emitted, which task does it go to? User programmable:
  • Shuffle grouping: pick a random task
  • Fields grouping: consistent hashing on a subset of tuple fields
  • All grouping: send to all tasks
  • Global grouping: pick task with lowest id
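
The four groupings can be sketched as plain Python functions that map a tuple to a list of target task indices. This is an illustration only, not Storm's API: the function names are hypothetical, tuples are modeled as dictionaries, and the fields grouping uses a simple CRC32 hash modulo the task count as a stand-in for the hashing the slide mentions.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Pick a random task, spreading tuples evenly on average."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, fields):
    """Hash the selected fields so equal values always reach the same task."""
    key = "|".join(str(tup[f]) for f in fields)
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(tup, num_tasks):
    """Replicate the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Send every tuple to the single task with the lowest id."""
    return [0]


word = {"word": "the"}
# The same word is always routed to the same counting task.
assert fields_grouping(word, 12, ["word"]) == fields_grouping(word, 12, ["word"])
print(all_grouping(word, 3))     # -> [0, 1, 2]
print(global_grouping(word, 3))  # -> [0]
```

Fields grouping is what makes a distributed word count correct: every occurrence of a word lands on the same task, so that task holds the complete count for it.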
Storm Word Count Example

Streaming Word Count Example

Writing the Storm Word Count Example

Example: Word Count in Storm

public static class SplitSentence extends ShellBolt implements IRichBolt {
    // Code to split a sentence
}

public static class WordCount implements IBasicBolt {
    // Code to count words; have to override the execute function
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // …
    }
}
Example: Word Count in Storm

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout(1, new KestrelSpout("kestrel.twitter.com",
        22133, "sentence_queue", new StringScheme()), 5);
builder.setBolt(2, new SplitSentence(), 8).shuffleGrouping(1);
builder.setBolt(3, new WordCount(), 12)
       .fieldsGrouping(2, new Fields("word"));

Parallelism degree: the last argument (5, 8, 12) is the number of tasks for a spout or bolt
Example: Word Count in Storm

Config conf = new Config();
StormSubmitter.submitTopology("word count", conf, builder.createTopology());
Scaling Storm to 4000 nodes

Open Source Big Data @Yahoo
Bobby (Robert) Evans
@bobbydata
Architect @ Yahoo

Provide a Hosted Platform for Yahoo

What We Do
• Yahoo Scale
• Make it Secure
• Make it Easy

Yahoo Scale

          Largest cluster size (nodes)    Total nodes
Hadoop    5400                            41000
Storm     300                             2300

Yahoo Scale (Solving Hard Problems)
Network Topology Aware Scheduling

https://en.wikipedia.org/wiki/Network_topology
https://en.wikipedia.org/wiki/Knapsack_problem
Understanding Software and Hardware
State Storage (ZooKeeper):
▪ Limited to disk write speed (80 MB/sec typically)
▪ Scheduling
  • O(num_execs * resched_rate)
▪ Supervisor
  • O(num_supervisors * hb_rate)
▪ Topology Metrics (worst case)
  • O(num_execs * num_comps * num_streams * hb_rate)

On one 240-node Yahoo Storm cluster, ZK writes 16 MB/sec; about 99.2% of that is worker heartbeats

Theoretical Limit:
80 MB/sec / 16 MB/sec * 240 nodes = 1,200 nodes
Apply it to Work Around Bottlenecks
Fix: Secure In-Memory Store for Worker Heartbeats (PaceMaker)
▪ Removes Disk Limitation
▪ Writes Scale Linearly
(but Nimbus still needs to read it all, ideally in 10 sec or less)
A 240-node cluster's complete HB state is 48 MB; Gigabit Ethernet is about 125 MB/s
10 s / (48 MB / 125 MB/s) * 240 nodes = 6,250 nodes

[Figure: Theoretical Maximum Cluster Size. ZooKeeper: 1,200 nodes; PaceMaker over Gigabit: 6,250 nodes]
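
The two theoretical limits in these slides are simple linear extrapolations from the 240-node cluster. The sketch below reproduces the arithmetic in Python; the function and parameter names are hypothetical, and it assumes, as the slides do, that heartbeat volume grows linearly with the number of nodes.

```python
def zookeeper_limit(disk_mb_per_s, observed_mb_per_s, observed_nodes):
    """Nodes supportable before ZooKeeper writes saturate the disk."""
    return disk_mb_per_s / observed_mb_per_s * observed_nodes


def pacemaker_limit(read_budget_s, hb_state_mb, network_mb_per_s, observed_nodes):
    """Nodes supportable if Nimbus must read all heartbeat state within the budget."""
    seconds_to_read = hb_state_mb / network_mb_per_s  # 48 MB / 125 MB/s = 0.384 s
    return read_budget_s / seconds_to_read * observed_nodes


# Disk-bound ZooKeeper: 80 MB/s disk, 16 MB/s observed on 240 nodes.
print(round(zookeeper_limit(80, 16, 240)))       # -> 1200
# Network-bound PaceMaker: 10 s budget, 48 MB state, 125 MB/s Gigabit link.
print(round(pacemaker_limit(10, 48, 125, 240)))  # -> 6250
```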
Make it Secure

Make it Easy
• Simple API
• Easy to Debug
• Easy to Setup
• Easy to Upgrade (no downtime ideally)

Heavy lifting done by the platform

Guaranteeing Message Processing in Three Tasty Flavors

1. None (like the old S4)
2. At Least Once: tuple trees, anchoring, and spout replay
3. Exactly Once (like Hadoop or Puma)
Tuple Tree
• A spout tuple is not fully processed until all tuples in the tree have been completed
• If the tuple tree is not completed within a specified timeout, the spout tuple is replayed
• Uses acker tasks to keep track of tuple progress
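
One way an acker can track a whole tuple tree in constant memory is to keep a single XOR checksum per spout tuple: every tuple id is XOR-ed in once when the tuple is created and once when it is acked, so the checksum returns to zero exactly when the tree is complete. Storm's acker is documented as working along these lines with random 64-bit ids. The sketch below is a simplified illustration; the class `Acker` and its methods are hypothetical names, and timeouts and replay are left out.

```python
class Acker:
    """Tracks tuple trees with one XOR checksum per spout (root) tuple."""

    def __init__(self):
        self.pending = {}  # root id -> XOR of ids of tuples still outstanding

    def track(self, root_id, tuple_id):
        """Register the spout tuple that starts a new tree."""
        self.pending[root_id] = self.pending.get(root_id, 0) ^ tuple_id

    def ack(self, root_id, tuple_id, new_child_ids=()):
        """Ack one tuple and register the children anchored to it.

        Returns True when the whole tree rooted at root_id is complete.
        """
        checksum = self.pending[root_id] ^ tuple_id
        for child_id in new_child_ids:
            checksum ^= child_id
        if checksum == 0:
            del self.pending[root_id]
            return True
        self.pending[root_id] = checksum
        return False


# Small power-of-two ids keep the example readable; real ids would be random.
acker = Acker()
acker.track("sentence-1", 1)                   # spout emits the sentence
print(acker.ack("sentence-1", 1, [2, 4, 8]))   # split bolt emits 3 words -> False
print(acker.ack("sentence-1", 2))              # count bolt acks word 1 -> False
print(acker.ack("sentence-1", 4))              # count bolt acks word 2 -> False
print(acker.ack("sentence-1", 8))              # last word acked -> True
```

If the checksum for a spout tuple has not reached zero when the timeout expires, the spout replays that tuple.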
Anchoring
Reliability API for the user:
• “Anchoring” creates a new edge in the tuple tree
• Acking marks a single node in the tree as complete
At Least Once
• What happens if there is a failure?
  • You can double-process events
  • This is not so critical if you have something like Hadoop to back you up and correct the issue later
  • Or if you are looking at statistical trends and replay does not happen that often
• This requires you to have a spout that supports replay. Not all messaging infrastructure does
Example

SPOUT                               SPLIT         COUNT
[“the cow jumped over the moon”]    [“the”]       [“the”, 2]
                                    [“cow”]       [“cow”, 1]
                                    [“jumped”]    [“jumped”, 1]
                                    [“over”]      [“over”, 1]
                                    [“the”]
                                    [“moon”]      [“moon”, 1]

Acker
Example

SPOUT                               SPLIT         COUNT (old → new)
[“the cow jumped over the moon”]    [“the”]       [“the”, 2] → [“the”, 4]
                                    [“cow”]       [“cow”, 1] → [“cow”, 2]
                                    [“jumped”]    [“jumped”, 1]
                                    [“over”]      [“over”, 1] → [“over”, 2]
                                    [“the”]
                                    [“moon”]      [“moon”, 1] → [“moon”, 2]

Acker
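
The double counting in this example can be reproduced with a small simulation. It is a sketch under stated assumptions: the failure is assumed to hit the “jumped” tuple on the first attempt, counts already applied are not rolled back, and the spout then replays the whole sentence. The function names are hypothetical and no Storm API is involved.

```python
from collections import Counter


def run_topology(sentence, counts, fail_on=None):
    """Split a sentence and count its words.

    Returns True if every word tuple succeeded (the tuple tree completed).
    """
    completed = True
    for word in sentence.split():
        if word == fail_on:
            completed = False  # this word tuple fails; its count is not applied
            continue
        counts[word] += 1      # counts already applied are NOT rolled back
    return completed


def process_at_least_once(sentence, counts, fail_first_attempt_on=None):
    """Process a sentence, replaying it once if the first attempt fails."""
    attempts = 1
    if not run_topology(sentence, counts, fail_on=fail_first_attempt_on):
        attempts += 1
        run_topology(sentence, counts)  # the spout replays the whole sentence
    return attempts


counts = Counter()
process_at_least_once("the cow jumped over the moon", counts,
                      fail_first_attempt_on="jumped")
print(dict(counts))
# -> {'the': 4, 'cow': 2, 'over': 2, 'moon': 2, 'jumped': 1}
```

Only “jumped” ends up with the correct count; every word that succeeded on the first attempt is counted twice.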

But What About State?
• For most of Storm, state storage is left up to you
• If your bolt goes down with 3 weeks of aggregated data that you have not stored anywhere, well, too bad!
Enter Trident
• Provides exactly-once semantics
• In Trident, state is a first-class citizen, but the exact implementation of state is up to you
  • There are many prebuilt connectors to various NoSQL stores like HBase
• Provides a high-level API (similar to Cascading for Hadoop)
Trident Example

public class Split extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));  // no acking required
        }
    }
}
Trident Example

TridentTopology topology = new TridentTopology();

TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        // Aggregates values and stores them.
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .parallelismHint(6);

Inside Apache Storm
• Open Source, Git
• IntelliJ

DEMO

Structure
• Clojure
  • Clojure: functional programming language
  • Dialect of LISP
  • Runs on JVM
  • Complete Java interop
  • Fast!
• Java
  • Lots of code in Java, including the scheduler

Thrift
• Storm.thrift
• Thrift compiler


Scheduler
• IScheduler
• Multi-tenant scheduler

Spark Streaming

Stateful Stream Processing

• Traditional streaming systems have a record-at-a-time processing model
  • Each node has mutable state
  • For each record, update state and send new records
• State is lost if node dies!
• Lambda Architecture
• Making stateful stream processing fault-tolerant is challenging

[Figure: input records processed by nodes 1, 2, and 3, each holding mutable state]
Existing Streaming Systems

• Storm
  • Replays record if not processed by a node
  • Processes each record at least once
  • May update mutable state twice!
  • Mutable state can be lost due to failure!
• Trident: uses transactions to update state
  • Processes each record exactly once
  • Per-state transaction to external database is slow
Spark
• Spark was a project out of Berkeley from 2010
• Has become very popular
• Most contributed open source project in big-data domain
• RDD: Resilient Distributed Dataset
Spark Streaming
• Window a bit of data
• Run a batch
• Repeat
Discretized Stream Processing

▪ Chop up the live stream into batches of X seconds
▪ Spark treats each batch of data as RDDs and processes them using RDD operations
▪ Finally, the processed results of the RDD operations are returned in batches

[Figure: live data stream → Spark Streaming → batches of X seconds → Spark → processed results]
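
The discretization step can be sketched without Spark at all: group timestamped events into fixed-width batches, then run an ordinary batch computation on each one. The code below is a plain-Python illustration with hypothetical function names and made-up events. Unlike Spark Streaming, which produces a batch for every interval, this sketch simply skips intervals that received no events.

```python
from collections import Counter, defaultdict


def discretize(events, batch_seconds):
    """Group (timestamp, line) events into consecutive batches of batch_seconds."""
    batches = defaultdict(list)
    for timestamp, line in events:
        batches[int(timestamp // batch_seconds)].append(line)
    # Return batches in time order; intervals with no events are skipped.
    return [batches[index] for index in sorted(batches)]


def word_count(batch):
    """An ordinary batch computation applied to one micro-batch."""
    return Counter(word for line in batch for word in line.split())


events = [(0.2, "the cow"), (0.9, "the moon"), (1.5, "cow"), (3.1, "moon moon")]
for batch in discretize(events, batch_seconds=1):
    print(dict(word_count(batch)))
# -> {'the': 2, 'cow': 1, 'moon': 1}
# -> {'cow': 1}
# -> {'moon': 2}
```

An event waits until its batch closes before it is processed, which is why latency in this model cannot drop below the batch interval.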
Discretized Stream Processing

▪ Batch sizes as low as ½ second, latency of about 1 second
▪ Potential for combining batch processing and streaming processing in the same system
Spark Streaming example

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)  # batch interval of 1 second
lines = ssc.socketTextStream("localhost", 9999)
# Split each line into words
words = lines.flatMap(lambda line: line.split(" "))
# Count each word in each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)
# Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.pprint()
# Start the computation and wait for it to terminate
ssc.start()
ssc.awaitTermination()
DStream Input Sources
• Out of the box
  • Kafka
  • HDFS
  • Flume
  • Akka Actors
  • Raw TCP sockets
• Very easy to write a receiver for your own data source
Arbitrary Stateful Computations

• updateStateByKey
  • Maintain arbitrary state while continuously updating it with new information
• How to use
  • Define the state: the state can be an arbitrary data type
  • Define the state update function: specify with a function how to update the state using the previous state and the new values from an input stream
• The state update function is applied in every batch for all existing keys
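
The semantics of updateStateByKey can be imitated in plain Python to show what the update function receives. This is a sketch of the behaviour described above, not Spark code: `update_state_by_key` and `running_count` are hypothetical names, and state is held in an ordinary dictionary. As in PySpark, the update function takes the list of new values and the previous state, and returning None drops the key.

```python
def update_state_by_key(state, batch, update_func):
    """Apply update_func to every key seen so far, once per batch.

    state:       dict of key -> previous state
    batch:       list of (key, value) pairs from the current batch
    update_func: function(new_values, previous_state) -> new state or None
    """
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state = {}
    # Existing keys are updated even if the batch holds no new values for them.
    for key in set(state) | set(grouped):
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:  # returning None removes the key
            new_state[key] = updated
    return new_state


def running_count(new_values, last_count):
    """State update function: add this batch's values to the running total."""
    return sum(new_values) + (last_count or 0)


state = {}
state = update_state_by_key(state, [("the", 1), ("cow", 1), ("the", 1)], running_count)
state = update_state_by_key(state, [("cow", 1)], running_count)
print(sorted(state.items()))  # -> [('cow', 2), ('the', 2)]
```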
Spark ML, Graph, etc.
• Advantage of Spark Streaming:
  • Rich ecosystem of big data tools
    • Spark SQL
    • Spark ML
    • Spark GraphX
    • SparkR
• Disadvantage:
  • Not really streaming
Lambda and Kappa Architecture

Lambda Architecture
• Lambda architecture
• Why Lambda? Because things fail
  • Batch handles failures just fine
  • A true streaming system has to guarantee idempotency

[Figure: Lambda architecture diagram, including an in-memory KV store]
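
Idempotency here means that applying the same event twice leaves the state as if it had been applied once, so replays after a failure are harmless. One common way to get this, sketched below, is to remember which event ids have already been applied. The class name is hypothetical, and the sketch assumes every event carries a unique id and that the set of seen ids fits in memory, which a production system would need to bound.

```python
class IdempotentCounter:
    """Word counter that ignores events it has already applied."""

    def __init__(self):
        self.counts = {}
        self.seen_ids = set()

    def apply(self, event_id, word):
        """Apply the event once; return False if it was a duplicate."""
        if event_id in self.seen_ids:
            return False
        self.seen_ids.add(event_id)
        self.counts[word] = self.counts.get(word, 0) + 1
        return True


counter = IdempotentCounter()
counter.apply(1, "the")
counter.apply(2, "cow")
counter.apply(1, "the")  # replayed event: ignored
print(counter.counts)    # -> {'the': 1, 'cow': 1}
```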
Kappa Architecture
• Kappa Architecture
• Only the streaming path
• But what about the state?
• Perhaps micro-batching can help
Streaming Ecosystem
Reza Farivar
Capital One

Components of a streaming ecosystem
• Gather the data
• Funnel
• Distributed Queue
• Real-Time Processing
• Semi-Real-Time Processing
• Real-time OLAP
Step 1: Gather the Data
• Apache NiFi is a good distributed funnel
• Was developed at the NSA
• Over 8 years of development
• Open sourced in 2014 and picked up by Hortonworks
• Great visual UI to design a data flow
• Has many processor types in the box
• But not very good for heavyweight distributed processing
• Same graph is executed on all the nodes
NiFi Components
• FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
• Processor
• Performs the work, can access FlowFiles
• Connection
• Links between processors
• Queues that can be dynamically prioritized
• Process Group
• Set of processors and their connections
• Receive data via input ports, send data via output ports
NiFi GUI
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors and connections
NiFi Site-to-Site
• Site-to-site makes it very easy to push data from one data center to another
• Makes it a great choice for a distributed funnel
Step 2: Distributed Queue
• Pub-sub model
• Kafka is a very popular example

[Figure: publish-subscribe system. Producers call publish(topic, msg); the system holds Topic 1, Topic 2, and Topic 3; consumers subscribe to topics and receive msg]
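
The pub-sub model in the figure can be sketched in a few lines of Python. This is a toy in-memory broker with hypothetical names, not Kafka's API: it pushes each message straight to subscriber callbacks, whereas Kafka stores messages in a durable log that consumers read from.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory publish-subscribe system keyed by topic."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a consumer callback for a topic."""
        self.subscribers[topic].append(callback)

    def publish(self, topic, msg):
        """Deliver msg to every subscriber of topic; return how many received it."""
        for callback in self.subscribers[topic]:
            callback(msg)
        return len(self.subscribers[topic])


broker = Broker()
received = []
broker.subscribe("topic-1", received.append)
broker.publish("topic-1", "hello")   # delivered to the one subscriber
broker.publish("topic-2", "ignored") # nobody subscribed to topic-2
print(received)                      # -> ['hello']
```

Producers and consumers only share topic names, so either side can be added or removed without the other knowing.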
Kafka Architecture
• Distributed, high-throughput, pub-sub messaging system
  • Fast, Scalable, Durable
• Main use cases:
  • Log aggregation, real-time processing, monitoring, queueing
• Originally developed by LinkedIn
• Implemented in Scala/Java

[Figure: producers write to a set of brokers coordinated through ZooKeeper (ZK); consumers read from the brokers]
Kafka Manager
• There are some CLI tools
  • kafka-console-producer
  • kafka-console-consumer
  • kafka-topics
  • kafka-consumer-offset-checker
• Some very new open-source projects for monitoring Kafka
  • kafka-manager by Yahoo
  • https://github.com/yahoo/kafka-manager
Step 3: Distributed Processing
• Once data is in the Kafka message broker, we need to process it
• Filter
• Join
• Windowing
• Business logic
• Real-time requirements
• Sub ms to 10 ms
Storm
• Apache Storm
• Built at BackType, which was acquired by Twitter
• Written in Clojure
Storm Architecture
Storm programming
• Topology
• Spouts
• Bolts
• Tuples
• Streams
• TopologyBuilder API

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("words", new TestWordSpout(), 10);
builder.setBolt("exclaim1", new ExclamationBolt(), 3)
       .shuffleGrouping("words");
builder.setBolt("exclaim2", new ExclamationBolt(), 2)
       .shuffleGrouping("exclaim1");
Example topology
• Storm is great for non-trivial large-scale processing
• Mature enterprise-level features, including multitenancy and security
• Work on resource-aware scheduling
Step 5: Micro batch processing / SQL / ML
• Instead of real-time event-by-event processing, we can do micro-batching
• Reduce overheads
• Fault tolerance → Kappa architecture
• High latency
Spark
• Spark was a project out of Berkeley from 2010
• Has become very popular
• Most contributed open source project in big-data domain
• RDD: Resilient Distributed Dataset
Spark Streaming
• Window a bit of data
• Run a batch
• Repeat
Spark ML, Graph, etc.
• Advantage of Spark Streaming:
  • Rich ecosystem of big data tools
    • Spark SQL
    • Spark ML
    • Spark GraphX
    • SparkR
• Disadvantage:
  • Not really streaming
Benchmark: ETL pipeline
Three-way Comparison
• Flink and Storm have similar linear performance profiles
  • These two systems process an incoming event as it becomes available
• Spark Streaming has much higher latency, but is expected to handle higher throughputs
  • System behaves in a stepwise function, a direct result from its micro-batching nature
Side note: in-memory key-value store
• Redis
• Cassandra
Step 6: OLAP (Online Analytical Processing)
• Business Intelligence
• Multidimensional data analytics
• Analyze multidimensional data interactively
• Basic Operations
• Consolidation (roll-up, aggregation in dimensions)
• Drill-down (filter)
• Slicing and dicing (Look at the data from different viewpoints)
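
The basic OLAP operations can be illustrated on a tiny in-memory fact table. The rows, dimension names, and helper functions below are all made up for illustration; a real OLAP engine performs the same operations over far larger, indexed data.

```python
from collections import defaultdict

# Hypothetical fact table: two dimensions (country, device) and one measure (clicks).
rows = [
    {"country": "US", "device": "mobile", "clicks": 3},
    {"country": "US", "device": "desktop", "clicks": 5},
    {"country": "DE", "device": "mobile", "clicks": 2},
    {"country": "DE", "device": "mobile", "clicks": 4},
]


def roll_up(rows, dims, measure):
    """Consolidation: aggregate the measure over the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return dict(totals)


def drill_down(rows, **filters):
    """Drill-down: keep only the rows that match every filter."""
    return [row for row in rows if all(row[k] == v for k, v in filters.items())]


print(roll_up(rows, ["country"], "clicks"))
# -> {('US',): 8, ('DE',): 6}
print(roll_up(drill_down(rows, country="DE"), ["device"], "clicks"))
# -> {('mobile',): 6}
print(roll_up(rows, ["device"], "clicks"))  # same data, different viewpoint
# -> {('mobile',): 9, ('desktop',): 5}
```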
Druid
• Developed at Metamarkets in 2011
  • RDBMS: too slow
  • NoSQL key-value store: fast, but exponential memory space, and precomputation is very slow
• Gaining in popularity
• Open Source (Apache license) in late 2012
• OLAP queries
• Column oriented
• Sub-second query time (average query time 0.5 seconds)
• Real-time streaming ingestion
• Scalable
Druid
• Arbitrary slice and dice of data
Druid Architecture
Druid Bitmap Index
• This is one of the reasons Druid is so fast
• Dictionary encoding
• Bitmap Index
  • Compression ratio: 1 bit per record
• Logical AND/OR of a few thousand numbers for a query → lightning-fast queries
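
A minimal sketch of the idea, assuming one uncompressed bitmap per distinct column value, stored as a Python integer with bit i set when row i holds that value. The function names and sample columns are hypothetical, and a production index would compress its bitmaps. A filter then reduces to a bitwise AND or OR of a few integers.

```python
def build_bitmap_index(column):
    """Dictionary-encode a column and build one bitmap per distinct value."""
    dictionary = {}  # value -> small integer code
    bitmaps = {}     # value -> int whose bit i is set if row i has the value
    for row, value in enumerate(column):
        dictionary.setdefault(value, len(dictionary))
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return dictionary, bitmaps


def rows_of(bitmap):
    """List the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


country = ["US", "DE", "US", "FR", "DE"]
device = ["mobile", "mobile", "desktop", "mobile", "desktop"]

country_dict, country_bits = build_bitmap_index(country)
_, device_bits = build_bitmap_index(device)

print(country_dict)                                           # -> {'US': 0, 'DE': 1, 'FR': 2}
print(rows_of(country_bits["US"] & device_bits["mobile"]))    # US AND mobile -> [0]
print(rows_of(country_bits["US"] | country_bits["DE"]))       # US OR DE -> [0, 1, 2, 4]
```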
Step 7: BI
• Pivot
  • Web-based exploratory visualization UI for Druid
  • Easily filter, split, visualize, etc.
• Tableau and SQL not natively supported
• But wait!
Pivot
Druid and Spark
• Druid’s native API is JSON
  • No Tableau, SQL support
• But there is hope!
  • https://github.com/SparklineData/spark-druid-olap
  • Connect Druid to Tableau through Spark
Why Druid and Spark together?
• Spark is great as a general engine
• Everything and the kitchen sink
• Queries can take a long time
• Still much faster than Hive on Yarn
• Druid is optimized for Column based time-series queries
Questions?