Module 07 Streaming - Distributed Stream Computing Engine
Streaming
www.huawei.com
Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved. Page 2
Contents
1. Introduction to Streaming
2. System Architecture
3. Key Features
4. Introduction to StreamCQL
Streaming Overview
Streaming is a distributed real-time computing framework based on
open source Storm, with the following features:
Real-time response with low latency
Continuous query
No waiting; results are delivered in-flight
Event-driven
[Figure labels: Event, Data, Alerts, Actions, Queries]
Application Scenarios of Streaming
Streaming is applicable to the following scenarios:
Real-time analysis: real-time log processing and vehicle traffic analysis
Position of Streaming in FusionInsight
[Figure: FusionInsight architecture. Recovered labels: Application service layer; OpenAPI/SDK; REST/SNMP/Syslog]
Comparison with Spark Streaming
Comparison of Application Scenarios
Real-time performance (response time):
Streaming: milliseconds
Spark Streaming: seconds
Basic Concepts (1)
Topology: a real-time application in Streaming.
Nimbus: assigns resources and schedules tasks.
Supervisor: receives tasks assigned by Nimbus, and starts/stops
Worker processes.
Worker: runs component logic processes.
Spout: generates source data flows in a topology.
Bolt: receives and processes data in a topology.
Basic Concepts (2)
Task: a Spout or Bolt thread in a Worker.
Tuple: the core data structure of Streaming. It is the basic message
delivery unit, organized as key-value pairs, and can be created and
processed in a distributed way.
Stream: an unbounded, continuous sequence of tuples.
ZooKeeper: provides distributed collaboration services for processes.
Active/Standby Nimbus, Supervisor, and Worker register their
information in ZooKeeper. This enables Nimbus to detect the health
status of all roles.
System Architecture
[Figure: system architecture, with ZooKeeper between Nimbus, the Supervisors, and the Workers]
Client: submits a topology to Nimbus.
Nimbus: monitors the heartbeat and assigns tasks.
Supervisor: obtains tasks and starts Workers.
Worker: runs Executors and reports the heartbeat.
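The heartbeat reporting and monitoring described here can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not Streaming's implementation: the class HeartbeatMonitor, its method names, and the timeout value are hypothetical, and an in-memory map stands in for the records that the real roles register in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an in-memory map stands in for heartbeat records kept in ZooKeeper.
class HeartbeatMonitor {
    private final Map<String, Long> lastBeatMillis = new HashMap<>();
    private final long timeoutMillis;

    HeartbeatMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a Supervisor or Worker reports a heartbeat.
    void report(String roleId, long nowMillis) {
        lastBeatMillis.put(roleId, nowMillis);
    }

    // Returns the roles whose last heartbeat is older than the timeout, sorted by ID.
    List<String> findDead(long nowMillis) {
        List<String> dead = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastBeatMillis.entrySet()) {
            if (nowMillis - entry.getValue() > timeoutMillis) {
                dead.add(entry.getKey());
            }
        }
        Collections.sort(dead);
        return dead;
    }
}
```

A monitoring role would call findDead periodically and reassign the tasks of any role it returns.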
Topology
A topology is a directed acyclic graph (DAG) consisting of Spout (data source) and Bolt
(for logical processing). Spout and Bolt are connected through Stream Groupings.
Service processing logic is encapsulated in topologies in Streaming.
[Figure: example topology. A Spout feeds Bolts that filter data, trigger external messages, and perform persistent archiving.]
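To make the Spout and Bolt roles concrete, the following self-contained Java sketch models a linear topology (the simplest kind of DAG) with one source and a chain of processing steps. None of these types belong to the Storm or Streaming API: MiniTopology, MiniBolt, run, and filterByName are hypothetical names, and an Iterator stands in for the Spout.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical mini-model of a topology; these types are not part of the Storm API.
class MiniTopology {
    // A bolt maps one input tuple to zero or more output tuples.
    interface MiniBolt extends Function<Map<String, Object>, List<Map<String, Object>>> {}

    // Pulls every tuple from the spout and pushes it through the bolts in order.
    static List<Map<String, Object>> run(Iterator<Map<String, Object>> spout, List<MiniBolt> bolts) {
        List<Map<String, Object>> current = new ArrayList<>();
        spout.forEachRemaining(current::add);
        for (MiniBolt bolt : bolts) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> tuple : current) {
                next.addAll(bolt.apply(tuple));
            }
            current = next;
        }
        return current;
    }

    // A filtering bolt: keeps only tuples whose "name" field equals the wanted value.
    static MiniBolt filterByName(String wanted) {
        return tuple -> wanted.equals(tuple.get("name")) ? List.of(tuple) : List.of();
    }
}
```

A real topology processes an unbounded stream tuple by tuple across many Workers; this sketch drains a finite source in one thread purely to show the data flow.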
Worker
Worker: A Worker is a JVM process, and a topology runs in one or more Workers.
Once started, a Worker keeps running.
Executor: A Worker process runs one or more Executor threads. Each Executor can run one
or more task instances of either a Spout or a Bolt.
[Figure: a Worker process containing Executors, each running Tasks]
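The relationship between task instances and Executor threads can be illustrated with a small Java sketch. The round-robin policy, the class TaskAssignment, and the method assign are assumptions made for illustration; the actual scheduler may distribute tasks differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of spreading task instances over Executor threads.
class TaskAssignment {
    // Returns one list of task IDs per executor, assigned round-robin.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("each executor needs at least one task");
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int task = 0; task < numTasks; task++) {
            executors.get(task % numExecutors).add(task);
        }
        return executors;
    }
}
```

With five tasks and two executors, the first executor runs tasks 0, 2, and 4, and the second runs tasks 1 and 3.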
Task
Both Spout and Bolt in a topology support concurrent running. In the topology, you can
specify the number of concurrently running tasks on each node. Streaming assigns tasks
in the cluster to enable simultaneous calculation and enhance processing capability of
the system.
Message Delivery Policies
Grouping Mode: Description
fieldsGrouping (field grouping): Delivers messages in groups to tasks of the target Bolt according to the hash values of the grouping fields.
globalGrouping (global grouping): Delivers all messages to a fixed task of the target Bolt.
shuffleGrouping (shuffle grouping): Delivers messages to a random task of the target Bolt.
localOrShuffleGrouping (local or shuffle grouping): If one or more tasks of the target Bolt exist in the same Worker process, delivers messages randomly to those tasks; otherwise, delivers messages in shuffle grouping mode.
allGrouping (broadcast grouping): Delivers messages to all tasks of the target Bolt.
directGrouping (direct grouping): Delivers messages to the task of the target Bolt specified by the data producer. The task ID must be specified by using the emitDirect(taskID, tuple) interface.
partialKeyGrouping (partial field grouping): Balanced field grouping.
noneGrouping (no grouping): Same as shuffle grouping.
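The differences between four of these modes can be sketched as functions that pick target task indexes. This is an illustrative model only: GroupingSketch and its methods are hypothetical names, tasks are numbered 0 to numTasks - 1, and the choice of task 0 as the fixed global target is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Hypothetical sketch of how four grouping modes pick target task indexes (0..numTasks-1).
class GroupingSketch {
    // fieldsGrouping: equal field values always hash to the same task.
    static int fieldsGrouping(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    // globalGrouping: every message goes to one fixed task (index 0 in this sketch).
    static int globalGrouping(int numTasks) {
        return 0;
    }

    // shuffleGrouping: a random task is chosen for each message.
    static int shuffleGrouping(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // allGrouping: the message is broadcast to every task.
    static List<Integer> allGrouping(int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int task = 0; task < numTasks; task++) {
            targets.add(task);
        }
        return targets;
    }
}
```

The property that matters for fieldsGrouping is stability: two tuples with the same field value always reach the same task, which is what makes per-key aggregation in a Bolt possible.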
Nimbus HA
[Figure: a ZooKeeper cluster and a Streaming cluster with an active Nimbus and a standby Nimbus]
Disaster Recovery
Services are automatically migrated from faulty nodes to normal ones, preventing
service interruptions.
Zero manual operation
[Figure: workers of Topo1 and Topo3 distributed across Node1, Node2, and Node3]
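Automatic migration from a faulty node can be pictured with the following Java sketch. The placement map, the worker names, and the "move to the least-loaded surviving node" policy are all assumptions chosen for illustration; the source does not state which policy Streaming uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of migrating workers away from a failed node.
class FailoverSketch {
    // Moves each worker of the failed node to the surviving node that currently has the fewest workers.
    static Map<String, List<String>> migrate(Map<String, List<String>> placement, String failedNode) {
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : placement.entrySet()) {
            if (!entry.getKey().equals(failedNode)) {
                result.put(entry.getKey(), new ArrayList<>(entry.getValue()));
            }
        }
        if (result.isEmpty()) {
            throw new IllegalStateException("no surviving node");
        }
        for (String worker : placement.getOrDefault(failedNode, List.of())) {
            String target = null;
            for (String node : result.keySet()) {
                if (target == null || result.get(node).size() < result.get(target).size()) {
                    target = node;
                }
            }
            result.get(target).add(worker);
        }
        return result;
    }
}
```

The input map is left unchanged; the method returns a new placement in which the failed node no longer appears.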
Message Reliability
Reliability Level: At Most Once
Processing Mechanism: None
Description: This mode offers the highest throughput and applies to messages with low reliability requirements.
Reliability Level: At Least Once
Processing Mechanism: Ack
Description: This mode involves low throughput and applies to messages with high reliability requirements. All data must be completely processed.
Reliability Level: Exactly Once
Processing Mechanism: Trident
Description: Trident is a special transactional API provided by Storm and involves the lowest throughput.
A tuple is completely processed in Streaming when the tuple and all its derived tuples have been successfully
processed. A tuple fails to be processed if the processing is not complete within the timeout period.
Ack Mechanism
When a Spout sends a tuple, it notifies the Acker
that a new root message has been generated.
[Figure: ack message flow between the Spout and the Acker]
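One way to track whether a root message and all of its derived tuples have been processed is to XOR tuple IDs into a single value per root, which is the approach Apache Storm's acker is known for. The Java sketch below illustrates that idea only; the class AckerSketch, its method names, and the small fixed IDs are hypothetical. Real IDs are random 64-bit values, which makes an accidental zero before completion very unlikely.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of XOR-based ack tracking; not the Streaming or Storm source code.
class AckerSketch {
    // Root message ID -> XOR of the IDs of all tuples still outstanding in its tree.
    private final Map<Long, Long> pending = new HashMap<>();

    // The Spout announces a new root message and the ID of its first tuple.
    void init(long rootId, long tupleId) {
        pending.put(rootId, tupleId);
    }

    // A Bolt acks one tuple and reports the IDs of the tuples it emitted while processing it.
    // Returns true when the whole tuple tree of rootId is completely processed.
    boolean ack(long rootId, long ackedTupleId, long... emittedTupleIds) {
        Long value = pending.get(rootId);
        if (value == null) {
            throw new IllegalStateException("unknown root " + rootId);
        }
        long updated = value ^ ackedTupleId;
        for (long id : emittedTupleIds) {
            updated ^= id;
        }
        if (updated == 0L) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, updated);
        return false;
    }
}
```

Each tuple ID is XORed in once when the tuple is emitted and once when it is acked, so the value returns to zero exactly when every emitted tuple has been acked. A timeout check on entries that remain in the map would cover the failure case.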
Reliability Level Setting
If it is not required that every message be processed (that is, some
message loss is acceptable), the reliability mechanism can be disabled
for better performance.
Streaming and Other Components
HDFS, HBase, Kafka…
[Figure: Streaming topologies (Topology1 to Topology N) connected to Kafka topics (Topic1 to Topic N), HDFS, Redis, HBase, and Kafka]
StreamCQL Overview
StreamCQL (Stream Continuous Query Language) is a query language for
distributed stream processing platforms. It can be built on various stream
processing engines (mainly Apache Storm).
StreamCQL Easy to Develop
Native Storm API:
    //Def Input:
    public void open(Map conf,
            TopologyContext context,
            SpoutOutputCollector collector) {…}
    public void nextTuple() {…}
    public void ack(Object id) {…}
    public void declareOutputFields(OutputFieldsDeclarer declarer) {…}
    //Def logic:
    public void execute(Tuple tuple, BasicOutputCollector collector) {…}
    public void declareOutputFields(OutputFieldsDeclarer ofd) {…}
    //Def Output:
    public void execute(Tuple tuple, BasicOutputCollector collector) {…}
    public void declareOutputFields(OutputFieldsDeclarer ofd) {…}
    //Def Topology:
    public static void main(String[] args) throws Exception {…}
StreamCQL:
    --Def Input:
    CREATE INPUT STREAM S1 …
    --Def logic:
    INSERT INTO STREAM filterstr SELECT * FROM S1 WHERE name="HUAWEI";
    --Def Output:
    CREATE OUTPUT STREAM S2…
    --Def Topology:
    SUBMIT APPLICATION test;
StreamCQL and Stream Processing Platform
Function: Join, Aggregate, Split, Merge, Pattern Matching, Stream, Window
Engine: Storm, other stream processing engines
Summary
This module describes the following information about
Streaming:
Definition
Application Scenarios
Position of Streaming in FusionInsight
System architecture of Streaming
Key features of Streaming
Introduction to StreamCQL
Quiz
1. How is message reliability guaranteed in Streaming?
Quiz
2. Which of the following statements about Supervisor is
CORRECT? ( )
More Information
Training materials:
http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
Exam outline:
http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
Mock exam:
http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
Authentication process:
http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40