Module 08 Flink – Stream Processing and Batch Processing Platform

The document provides an overview of Flink, a unified computing framework for batch and stream processing, detailing its technical principles, architecture, and integration with FusionInsight HD. Key features include low latency, fault tolerance through a checkpoint mechanism, and scalability. The document also discusses application scenarios, data processing capabilities, and interaction with other components like HDFS and Yarn.

Uploaded by

Lucas Oliveira

Technical Principles of

Flink

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.


Objectives
 After completing this course, you will be able to understand:
 Technical principles of Flink
 Key features of Flink
 Flink integration in FusionInsight HD

Contents
1. Flink Overview

2. Technical Principles and Architecture of Flink

3. Flink Integration in FusionInsight HD

Flink Overview
 Flink is a unified computing framework that supports both batch processing
and stream processing. It provides a streaming data processing engine that
supports data distribution and parallel computing. Stream processing is the
core strength of Flink, which is a leading open-source stream processing
engine in the industry.
 Like Storm, Flink is an event-driven real-time streaming system.

Key Features of Flink

 Streaming-first
 Stream processing engine
 Fault-tolerant
 Reliability and checkpoint mechanism
 Scalable
 Scaling out to over 1000 nodes
 Excellent performance
 High throughput and low latency

Application Scenarios of Flink
 Flink provides high-concurrency data processing, millisecond-level
latency, and high reliability, making it well suited to
low-latency data processing scenarios.
 Typical scenarios:
 Internet finance services
 Clickstream log processing
 Public opinion monitoring

Key Features of Flink
 Low Latency
 Millisecond-level processing capability.
 Exactly Once
 Asynchronous snapshot mechanism, ensuring that each record is reflected
in the application state exactly once.
 HA
 Active/standby JobManagers, preventing single points of failure (SPOFs).
 Scale-out
 Manual scale-out supported by TaskManagers.

Hadoop Compatibility

 Flink supports Yarn and can obtain data from the Hadoop distributed file
system (HDFS) and HBase.
 Flink supports the input and output formats of Hadoop.
 Flink supports the Mappers and Reducers of Hadoop, which can be used
together with Flink operations.
 Flink can run Hadoop jobs faster.

Performance Comparison of Stream
Computing Frameworks

Flink Architecture

Flink Technology Stack

Core Concept of Flink - DataStream
 DataStream: Flink uses DataStream to represent data streams in applications. A data
stream can be considered an immutable collection of data that may contain duplicates. The
number of elements in a DataStream can be unbounded.

DataStream
 Data source: the source of streaming data, which can be HDFS
files, Kafka data, or text.
 Transformations: the operations that convert streaming data.
 Data sink: the output of streaming data, which can be HDFS files, Kafka data,
or text.

Data Source of Flink
Batch processing
 Files
 HDFS, local file system, and MapR file system
 Text, CSV, Avro, and Hadoop input formats
 JDBC
 HBase
 Collections

Stream processing
 Files
 Socket streams
 Kafka
 RabbitMQ
 Flume
 Collections
 Implement your own
 SourceFunction.collect

DataStream Transformations
Common transformations:
public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper)

public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper)

public SingleOutputStreamOperator<T> filter(FilterFunction<T> filter)

public KeyedStream<T, Tuple> keyBy(int... fields)

public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, int field)

public DataStream<T> rebalance()

public DataStream<T> shuffle()

public DataStream<T> broadcast()

public <R extends Tuple> SingleOutputStreamOperator<R> project(int... fieldIndexes)
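
The signatures above are meant to be chained into a pipeline. As a rough illustration of what flatMap, filter, map, and key-based grouping each do to the elements, the following plain-Java sketch uses java.util.stream on a small finite list. It is an analogy only and does not call the Flink API; the class name TransformationSketch and the method wordCount are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Flink: mirrors transformation semantics on a finite list.
final class TransformationSketch {

    // Counts words per key, mirroring flatMap -> filter -> map -> group by key.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                // flatMap: one input line yields zero or more words
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // filter: drop empty tokens produced by leading whitespace
                .filter(word -> !word.isEmpty())
                // map: transform each element one-to-one
                .map(String::toLowerCase)
                // grouping by key stands in for keyBy followed by a per-key count
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("Flink streams", "flink batch")));
    }
}
```

Unlike this sketch, a DataStream is potentially unbounded, so a per-key aggregate in Flink is normally computed over a window or as a continuously updated value.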

DataStream Transformations

[Figure: example transformation pipeline. HDFS → textFile → flatMap → map → keyBy → Window/Join → writeAsText → HDFS]
Flink Application Running Process - Key
Roles
Client:
The request initiator. It submits application requests and creates the data
flow.

JobManager:
Manages the resources for applications. The JobManager applies to the
ResourceManager for resources based on the requirements of
applications.

ResourceManager of Yarn:
The resource management component, which schedules and allocates the
resources of the entire cluster in a unified manner.

TaskManager:
Performs the computing work. An application is split into tasks that are assigned to multiple
TaskManagers for computing.

Flink Job Running Process

Flink on Yarn

Technical Principles of Flink (1)
 A Flink application consists of streaming data and transformation operators.
 Conceptually, a stream is a (potentially never-ending) flow of data records,
and a transformation is an operator that takes one or more streams as input,
and produces one or more output streams as a result.

Technical Principles of Flink (2)
The source operator is used to load streaming data. Transformation operators,
such as map(), keyBy(), and apply(), are used to process streaming data. After
streaming data is processed, the sink writes the processed streaming data into
related storage systems, such as HDFS, HBase, and Kafka.

[Figure: Source operator (Source) → transformation operators (map(), then keyBy()/apply()) → sink operator (Sink). The arrows between operators are streams.]

Streaming Dataflow

Parallel DataStream of Flink
Streaming Dataflow (condensed view)

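[Figure: Source → map() → keyBy()/apply() → Sink. Each box is an operator and each arrow is a stream.]

[Figure: Source [1] and [2] → map() [1] and [2] → keyBy()/apply() [1] and [2], each with parallelism = 2, → Sink [1] with parallelism = 1. Each box is an operator subtask and each arrow is a stream partition.]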
Streaming Dataflow (parallelized view)
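
One way to picture how keyBy() splits a stream into stream partitions for parallel subtasks is hash partitioning: every record with the same key goes to the same subtask. The sketch below assumes the simplest possible rule, the key's hash code modulo the parallelism. Flink's real key assignment is more involved, so treat this only as an illustration; KeyPartitionSketch, subtaskFor, and partition are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink: simplified key-to-subtask routing.
final class KeyPartitionSketch {

    // Assumption: subtask index = non-negative hash of the key modulo the parallelism.
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Splits one stream of keys into `parallelism` stream partitions, one per subtask.
    static List<List<String>> partition(List<String> keys, int parallelism) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : keys) {
            partitions.get(subtaskFor(key, parallelism)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        // With parallelism = 2, both "a" records land in the same partition.
        System.out.println(partition(Arrays.asList("a", "b", "a", "c"), 2));
    }
}
```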

Operator Chain of Flink
Streaming Dataflow (condensed view)

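[Figure: Source and map() form one operator chain, which runs as a single task. keyBy()/apply() and Sink are separate tasks.]

[Figure: the chained Source → map() runs as subtasks [1] and [2], keyBy()/apply() runs as subtasks [1] and [2], and Sink runs as subtask [1]. Each subtask is executed by one thread.]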

Streaming Dataflow (parallelized view)

Windows of Flink
 Flink supports window operations based on time and window operations
based on element count.
 Categorized by splitting criterion: time windows and count windows
 Categorized by window behavior: tumbling windows, sliding windows, and
custom windows

Common Window Types of Flink (1)
 Tumbling windows: fixed-size windows whose time ranges do not overlap
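
Because tumbling windows do not overlap, every timestamp belongs to exactly one window. A minimal sketch of that assignment follows, assuming non-negative millisecond timestamps and windows aligned to time zero. TumblingWindowSketch and its methods are hypothetical names, not Flink classes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Flink: assigns timestamps to tumbling windows.
final class TumblingWindowSketch {

    // Start of the single window [start, start + sizeMs) that contains the timestamp.
    // Assumes timestampMs >= 0 and windows aligned to time zero.
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    // Counts how many events fall into each window, keyed by window start.
    static Map<Long, Integer> countPerWindow(long[] timestampsMs, long sizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            counts.merge(windowStart(ts, sizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // 5-second windows: 1000 and 4999 share a window, 5000 starts the next one.
        System.out.println(countPerWindow(new long[] {1_000, 4_999, 5_000, 12_500}, 5_000));
    }
}
```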

Common Window Types of Flink (2)
 Sliding windows: fixed-size windows whose time ranges can overlap
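
Because sliding windows overlap, one timestamp can belong to several windows at once. The sketch below lists the start of every window that contains a given timestamp, assuming non-negative millisecond timestamps, windows aligned to time zero, and a slide no larger than the window size. SlidingWindowSketch and windowStarts are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: assigns a timestamp to sliding windows.
final class SlidingWindowSketch {

    // Returns the start of every window [start, start + sizeMs) containing the timestamp,
    // newest window first. Early timestamps can yield windows with a negative start.
    static List<Long> windowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestampMs - (timestampMs % slideMs);
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // A 10-second window sliding every 5 seconds: each timestamp is in two windows.
        System.out.println(windowStarts(12_000, 10_000, 5_000));
    }
}
```

When the slide equals the window size, the loop yields a single start and the result degenerates to a tumbling window.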

Common Window Types of Flink (3)
 Session windows: a window is considered complete when no data arrives
within the preset time period.
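
The session rule can be sketched as a single pass over sorted timestamps: a session stays open while events keep arriving, and closes once the distance to the next event reaches the preset gap. The code assumes timestamps are already sorted in ascending order; SessionWindowSketch and sessions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: groups sorted timestamps into sessions.
final class SessionWindowSketch {

    // Returns one {firstTimestamp, lastTimestamp} pair per session.
    // Assumes sortedTimestampsMs is in ascending order.
    static List<long[]> sessions(long[] sortedTimestampsMs, long gapMs) {
        List<long[]> result = new ArrayList<>();
        if (sortedTimestampsMs.length == 0) {
            return result;
        }
        long start = sortedTimestampsMs[0];
        long last = start;
        for (int i = 1; i < sortedTimestampsMs.length; i++) {
            long ts = sortedTimestampsMs[i];
            if (ts - last >= gapMs) {
                // No data arrived within the gap: close the current session.
                result.add(new long[] {start, last});
                start = ts;
            }
            last = ts;
        }
        result.add(new long[] {start, last});
        return result;
    }

    public static void main(String[] args) {
        for (long[] s : sessions(new long[] {1_000, 2_000, 9_000, 9_500}, 3_000)) {
            System.out.println(s[0] + " .. " + s[1]);
        }
    }
}
```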

Fault Tolerance of Flink
 The checkpoint mechanism is a key fault tolerance measure of Flink.
 The checkpoint mechanism periodically creates state snapshots of stream
applications. The snapshots are stored at a
configurable place (for example, in the memory of JobManager or on HDFS).
 The core of the distributed snapshot mechanism of Flink is the barrier. Barriers
are periodically inserted into data streams and flow as part of the data streams.
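[Figure: checkpoint barriers in a DataStream, from new tuples to old tuples: records that are part of checkpoint n+1, checkpoint barrier n, records that are part of checkpoint n, checkpoint barrier n-1, records that are part of checkpoint n-1.]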

Checkpoint Mechanism (1)
 The checkpoint mechanism is the cornerstone of Flink's reliability. When
an exception occurs on an operator in the Flink cluster (for example, an
unexpected exit), the checkpoint mechanism can restore all application
state to a previous point in time so that the state is consistent.
 This mechanism ensures that when a running application fails, the
state of the application can be restored from a checkpoint so that
each record affects the state exactly once. Alternatively, you can choose to process
data at least once.
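
Restoring from a checkpoint gives exactly-once state because the operator state and the source position are saved together, so after a failure the source replays only the records the restored state has not yet seen. The following toy simulation assumes a replayable source (an array), a single operator that keeps a running sum, and one injected failure. CheckpointRestoreSketch and sumWithOneFailure are hypothetical names.

```java
// Hypothetical simulation, not part of Flink: checkpoint, fail once, restore, replay.
final class CheckpointRestoreSketch {

    // Sums all records, taking a checkpoint every `interval` records and simulating
    // one failure just before the record at index `failAt` would be processed.
    static long sumWithOneFailure(int[] records, int interval, int failAt) {
        int checkpointOffset = 0;  // source position saved in the last checkpoint
        long checkpointSum = 0;    // operator state saved in the last checkpoint
        int offset = 0;            // current source position
        long sum = 0;              // current operator state
        boolean failed = false;
        while (offset < records.length) {
            if (!failed && offset == failAt) {
                // Failure: discard the live state, restore the last checkpoint,
                // and replay the source from the saved position.
                failed = true;
                offset = checkpointOffset;
                sum = checkpointSum;
                continue;
            }
            sum += records[offset];
            offset++;
            if (offset % interval == 0) {
                // Checkpoint: state and source position are saved together.
                checkpointOffset = offset;
                checkpointSum = sum;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Records 4 and 5 are processed twice, yet the sum counts each record once.
        System.out.println(sumWithOneFailure(new int[] {1, 2, 3, 4, 5, 6, 7}, 3, 5));
    }
}
```

If the state were kept across the failure while the source still replayed from the checkpoint, the replayed records would be counted twice, which corresponds to at-least-once processing.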

Checkpoint Mechanism (2)
[Figure: four stages of a checkpoint. A barrier issued under the control of the CheckpointCoordinator travels from the source operator through the intermediate operator to the sink operator, and a snapshot is taken as the barrier passes each operator.]

Checkpoint Mechanism (3)

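[Figure: barrier alignment in three stages. Operators A and B both feed operator C, which feeds operator D. C waits until it has received both the barrier of source A and the barrier of source B, then takes a snapshot and forwards a merged barrier to D.]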
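
The alignment step at operator C can be sketched as follows: once a barrier arrives on one input, later records from that input are held back until the barrier from the other input arrives. Only then is the state snapshotted, so the snapshot contains exactly the pre-barrier records of both inputs. The sketch assumes two input channels, a running sum as the operator state, and a single checkpoint in flight. BarrierAlignmentSketch, BARRIER, and onEvent are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical simulation, not part of Flink: a two-input operator aligning barriers.
final class BarrierAlignmentSketch {
    static final int BARRIER = -1;  // assumption: this marker value denotes a barrier

    private final boolean[] barrierSeen = new boolean[2];
    private final Deque<Integer> buffered = new ArrayDeque<>();
    private long sum = 0;                             // operator state
    final List<Long> snapshots = new ArrayList<>();   // state captured at each checkpoint

    // Handles one event arriving on input channel 0 (from A) or 1 (from B).
    void onEvent(int channel, int value) {
        if (value == BARRIER) {
            barrierSeen[channel] = true;
            if (barrierSeen[0] && barrierSeen[1]) {
                snapshots.add(sum);                   // covers only pre-barrier records
                barrierSeen[0] = false;
                barrierSeen[1] = false;
                while (!buffered.isEmpty()) {         // release records held during alignment
                    sum += buffered.poll();
                }
            }
        } else if (barrierSeen[channel]) {
            buffered.add(value);                      // this input is ahead: hold the record back
        } else {
            sum += value;
        }
    }

    long state() {
        return sum;
    }

    public static void main(String[] args) {
        BarrierAlignmentSketch c = new BarrierAlignmentSketch();
        c.onEvent(0, 1);
        c.onEvent(1, 2);
        c.onEvent(0, BARRIER);  // barrier of source A arrives first
        c.onEvent(0, 10);       // post-barrier record from A is buffered
        c.onEvent(1, 3);        // pre-barrier record from B is still processed
        c.onEvent(1, BARRIER);  // barrier of source B arrives: snapshot, then release
        System.out.println(c.snapshots + " " + c.state());
    }
}
```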

Location of Flink in FusionInsight
Products
[Figure: FusionInsight product stack. Application service layer; Open API/SDK and REST/SNMP/Syslog interfaces; DataFarm (Porter, Miner, Farmer), turning data into information, knowledge, and wisdom; Hadoop API and Plugin API; Hadoop (Hive, MapReduce, Spark, Storm, Flink, Yarn/ZooKeeper, HDFS/HBase); LibrA; Manager (system management, service governance, security management).]

FusionInsight HD provides a Big Data processing environment built on enhanced open source
software and on industry best practices selected for each scenario.
Flink is a unified computing framework that supports both batch processing and stream
processing. Flink provides high-concurrency pipeline data processing, millisecond-level
latency, and high reliability.

Flink WebUI
 The FusionInsight HD platform provides a visual management and monitoring UI for
Flink. You can use the Yarn WebUI to query the running status of Flink tasks.

Interaction of Flink with Other
Components
 In the FusionInsight HD cluster, Flink interacts with the
following components:
 HDFS: (mandatory) Flink reads and writes data in HDFS.
 Yarn: (mandatory) Flink relies on Yarn to schedule and manage
resources for running tasks.
 ZooKeeper: (mandatory) Flink relies on ZooKeeper to implement
the checkpoint mechanism.
 Kafka: (optional) Flink can receive data streams sent from Kafka.

Summary
 These slides describe the following information about Flink:
basic concepts, application scenarios, technical architecture,
window types, and Flink on Yarn.
 These slides also describe Flink integration in FusionInsight HD.

Quiz
1. What are the key features of Flink?

2. What are the common window types of Flink?

More Information
 Training materials:
 http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term1000025450&id=Node1000011796
 Exam outline:
 http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node1000011797
 Mock exam:
 http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node1000011798
 Authentication process:
 http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40

Thank You