
Module 10 Flume - Massive Logs Aggregation

Flume is an open-source, distributed log aggregation system designed for collecting, processing, and transferring massive amounts of log data. It offers various features such as real-time log collection, customizable data senders and receivers, and supports multiple data sources and sinks including HDFS, HBase, and Kafka. The document outlines Flume's architecture, key characteristics, and provides examples of its applications in log collection and real-time data processing.


Technical Principles of

Flume

www.huawei.com

Copyright © 2018 Huawei Technologies Co., Ltd. All rights reserved.


Foreword
 Flume is an open-source log system: a distributed, reliable, and
highly available system for aggregating massive volumes of logs.
Flume supports customized data senders and receivers for
collecting, processing, and transferring data.

Objectives
 Upon completion of this course, you will know:
 What Flume is
 Functions of Flume
 Position of Flume in FusionInsight
 System architecture of Flume
 Key characteristics of Flume
 Application Examples of Flume

Contents
1. Flume Overview and Architecture

2. Key Characteristics of Flume

3. Flume Applications

What Is Flume
 Flume is a streaming log collection tool. Flume performs simple
processing on collected data and writes the data to customizable data
receivers. Flume can collect data from various data sources, such as
local files (spooling directory source), real-time logs (taildir and
exec sources), REST messages, Thrift, Avro, Syslog, and Kafka.

Functions of Flume
 Flume can collect logs from a specified directory and save them to a
specified destination (HDFS, HBase, or Kafka).
 Flume can collect logs in real time (taildir source) and save them to a
specified path.
 Flume supports the cascading mode (multiple Flume nodes interworking
with each other) and data aggregation.
 Flume supports customized data collection.
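The real-time collection case above (taildir source into HDFS) can be sketched as a minimal agent definition. This is an illustrative sketch, not a configuration from this course: the agent name `server`, the file paths, and the channel and sink names are assumptions, and the TAILDIR source follows the Apache Flume convention.

```properties
# Hypothetical agent sketch: tail application logs in real time and write them to HDFS
server.sources = r1
server.channels = c1
server.sinks = k1

# taildir source: follows matching files as they grow, resuming from a position file
server.sources.r1.type = TAILDIR
server.sources.r1.filegroups = f1
server.sources.r1.filegroups.f1 = /var/log/app/.*log
server.sources.r1.positionFile = /tmp/flume/taildir_position.json
server.sources.r1.channels = c1

# file channel: buffered events survive a process restart
server.channels.c1.type = file
server.channels.c1.checkpointDir = /tmp/flume/checkpoint
server.channels.c1.dataDirs = /tmp/flume/data

# HDFS sink: archive the collected logs
server.sinks.k1.type = hdfs
server.sinks.k1.hdfs.path = /tmp/flume_taildir
server.sinks.k1.hdfs.fileType = DataStream
server.sinks.k1.channel = c1
```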

Position of Flume in FusionInsight
[Architecture figure] The FusionInsight stack: an application service layer (OpenAPI/SDK, REST/SNMP/Syslog) on top; the DataFarm layer (Data > Information > Knowledge > Wisdom) containing Flume, Miner, Farmer, and Manager; below it, via the Hadoop API and Plugin API, the Hadoop layer (HDFS/HBase, YARN/ZooKeeper, Hive, M/R, Spark, Storm, Flink) and LibrA; with system management, service governance, and security management alongside.

Flume is a distributed framework for collecting and aggregating stream data.

Architecture of Flume (1)
 Basic Flume architecture: Flume can directly collect data on a single node. This architecture is mainly
applicable to data collection within a cluster.

Log -> [ Source -> Channel -> Sink ] -> HDFS

 Multi-agent architecture of the Flume: Multiple Flume nodes can be connected. After
collecting initial data from data sources, Flume saves the data in the final storage system.
This architecture is mainly applicable to the import of data outside to the cluster.

Log -> [ Source -> Channel -> Sink ] -> [ Source -> Channel -> Sink ] -> HDFS

Architecture of Flume (2)

Within an agent, events flow from the Source into the Channel Processor, where Interceptors are applied; the Channel Selector then routes the events to one or more Channels. On the delivery side, the Sink Runner drives the Sink Processor, which takes events from a Channel and hands them to a Sink.

Basic Concept - Source (1)
 The source receives events, or generates events based on specific
mechanisms, and saves the events to one or more channels in batches.
Sources are classified into event-driven sources and event-polling sources.
 Event-driven source: The external source actively sends data to Flume,
driving Flume to accept the data.
 Event-polling source: Flume periodically pulls data in an active manner.
 A source must be associated with at least one channel.

Basic Concept - Source (2)
Source Type | Description
exec source | Runs a command or script and uses its output as a data source.
avro source | Provides an Avro-based server bound to a port; waits for data sent from Avro clients.
thrift source | The same as the avro source, except that the transmission protocol is Thrift.
http source | Supports data transmission based on HTTP POST.
syslog source | Collects syslog logs.
spooling directory source | Collects local static files.
jms source | Obtains data from a message queue.
kafka source | Obtains data from Kafka.
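As an example of the first row of the table, an exec source can be declared as follows. This is a sketch with assumed names (agent `server`, source `r1`, channel `ch1`) and an assumed command:

```properties
# Hypothetical exec source: run a command and treat each output line as an event
server.sources = r1
server.sources.r1.type = exec
server.sources.r1.command = tail -F /var/log/messages
server.sources.r1.channels = ch1
```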

Basic Concept - Channel (1)
 The channel is located between the source and the sink. It functions like a
queue and temporarily holds events. When the sink successfully sends events to
the next-hop channel or the final destination, the events are removed from the
current channel.
 The persistence level varies with the channel type:
 Memory channel: Persistence is not supported.
 File channel: Persistence is achieved based on a write-ahead log (WAL).
 JDBC channel: Persistence is achieved based on an embedded database.
 Channels support transactions and provide weak ordering guarantees. A channel
can connect to any number of sources and sinks.

Basic Concept - Channel (2)
 Memory channel: Messages are stored in memory. This channel provides high
throughput but no reliability; data may be lost.
 File channel: It supports data persistence, but the configuration is more
complex: both a data directory and a checkpoint directory must be configured,
and different file channels must use different checkpoint directories.
 JDBC channel: It uses an embedded Derby database. It supports event
persistence and high reliability, and can be used in place of the file
channel, which also supports persistence.
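The file channel requirements above (a data directory plus a per-channel checkpoint directory) can be sketched as follows; the directory paths and capacity values are assumptions for illustration:

```properties
# Hypothetical file channel: WAL-based persistence on local disk
server.channels = ch1
server.channels.ch1.type = file
server.channels.ch1.checkpointDir = /srv/flume/ch1/checkpoint
server.channels.ch1.dataDirs = /srv/flume/ch1/data
server.channels.ch1.capacity = 100000
server.channels.ch1.transactionCapacity = 1000
```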

Basic Concept - Sink (1)
 The sink transmits events to the next hop or to the final
destination. After the events are successfully transmitted, they are
removed from the current channel.
 A sink must be bound to a specific channel.

Basic Concept - Sink (2)
Sink Type | Description
hdfs sink | Writes data to HDFS.
avro sink | Transmits data to the next-hop Flume node using the Avro protocol.
thrift sink | The same as the avro sink, except that the transmission protocol is Thrift.
file roll sink | Saves data in the local file system.
hbase sink | Writes data to HBase.
kafka sink | Writes data to Kafka.
MorphlineSolr sink | Writes data to Solr.
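For example, the avro sink from the table, which forwards events to the next-hop Flume node, might be configured like this; the host address and port are placeholders, not values from this course:

```properties
# Hypothetical avro sink: forward events to a next-hop Flume agent
server.sinks = s1
server.sinks.s1.type = avro
server.sinks.s1.hostname = 192.168.0.2
server.sinks.s1.port = 21154
server.sinks.s1.channel = ch1
```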

Contents
1. Flume Overview and Architecture

2. Key Characteristics of Flume

3. Flume Applications

Log Collection
 Flume can collect logs from outside a cluster and archive them in
HDFS, HBase, or Kafka for data analysis and cleaning by upper-layer
applications.

Log -> [ Source -> Channel -> Sink ] -> HDFS
Log -> [ Source -> Channel -> Sink ] -> HBase
Log -> [ Source -> Channel -> Sink ] -> Kafka

Multi-level Cascading and Multi-channel Duplication
 Multiple Flume nodes can be cascaded. The cascaded nodes support
internal data duplication.
Log -> first agent (Source -> Channel -> Sink) -> second agent, whose Source duplicates each event into two Channels: one Channel -> Sink path writes to HDFS, the other Channel -> Sink path writes to HBase.
Message Compression and Encryption by Cascaded Flume Nodes
 Data transmitted between cascaded Flume nodes can be compressed
and encrypted, thereby improving data transmission efficiency and
security.

Application -> Flume API -> Flume (compression and encryption) -> Flume (decompression and decryption) -> HDFS/Hive/HBase/Kafka

Data Monitoring

[Figure] FusionInsight Manager collects Flume monitoring information: the received data size at the Source, the data buffer size in the Channel, and the transmitted data size at the Sink, along the path Application -> Flume API -> Source -> Channel -> Sink -> HDFS/Hive/HBase/Kafka.
Transmission Reliability
 Flume adopts a transaction-based mode for data transmission. This mode
ensures data security and enhances reliability during transmission. In
addition, if the file channel is used, data buffered in the channel is not
lost when a process or node restarts.

On the sending agent, the Sink starts a transaction, takes events from its Channel, and sends them. On the receiving agent, the Source starts a transaction and puts the received events into its Channel. Each transaction ends (commits) only after the hand-off succeeds.

Transmission Reliability (Failover)
 During data transmission, if the next-hop Flume node is faulty or receives data
abnormally, the data is automatically switched over to another path.

Log -> first agent (Source -> Channel), whose Channel feeds two Sinks: the active Sink sends to one next-hop agent (Source -> Channel -> Sink -> HDFS); if that path fails, the standby Sink sends to an alternate next-hop agent (Source -> Channel -> Sink -> HDFS).
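In Apache Flume, this failover behavior is expressed with a sink group using the failover sink processor: the highest-priority healthy sink receives the events, and a failed sink is penalized before being retried. A sketch with assumed names and priorities:

```properties
# Hypothetical failover sink group: s1 is preferred; s2 takes over if s1 fails
server.sinkgroups = g1
server.sinkgroups.g1.sinks = s1 s2
server.sinkgroups.g1.processor.type = failover
server.sinkgroups.g1.processor.priority.s1 = 10
server.sinkgroups.g1.processor.priority.s2 = 5
server.sinkgroups.g1.processor.maxpenalty = 10000
```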

Data Filtering During Transmission
 During data transmission, Flume can roughly filter and clean the data,
discarding unnecessary data. If complex data needs to be filtered, you can
develop filter plug-ins based on the characteristics of the data. Flume
supports third-party filter plug-ins.

events: Source -> Channel Processor (Interceptors applied) -> Channel Selector -> Channels
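As an example of built-in filtering, Apache Flume ships a regex_filter interceptor that drops (or keeps) events whose body matches a pattern. The pattern and names below are assumptions for illustration:

```properties
# Hypothetical interceptor: discard events whose body starts with DEBUG
server.sources.a1.interceptors = i1
server.sources.a1.interceptors.i1.type = regex_filter
server.sources.a1.interceptors.i1.regex = ^DEBUG.*
server.sources.a1.interceptors.i1.excludeEvents = true
```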

Contents
1. Flume Overview and Architecture

2. Key Characteristics of Flume

3. Flume Applications

Flume Example 1 (1)
 Description
 In this application scenario, Flume collects logs from an application
(for example, an online banking system) outside the cluster and
saves the logs to HDFS.

 Data preparations
 Create a log directory /tmp/log_test on a node in the cluster.

 Use this directory as the monitoring directory.

Flume Example 1 (2)
 Download the Flume Client:

 Log in to the FusionInsight HD cluster. Choose Service
Management > Flume > Download Client.

Flume Example 1 (3)
 Install the Flume client:
 Decompress the client:
tar -xvf FusionInsight_V100R002C60_Flume_Client.tar
tar -xvf FusionInsight_V100R002C60_Flume_ClientConfig.tar
cd FusionInsight_V100R002C60_Flume_ClientConfig/Flume
tar -xvf FusionInsight-Flume-1.6.0.tar.gz

 Install the client:
./install.sh -d /opt/FlumeClient -f hostIP -c flume/conf/client.properties.properties

Flume Example 1 (4)
 Configure the Flume source
server.sources = a1
server.channels = ch1
server.sinks = s1
# the source configuration of a1
server.sources.a1.type = spooldir
server.sources.a1.spoolDir = /tmp/log_test
server.sources.a1.fileSuffix = .COMPLETED
server.sources.a1.deletePolicy = never
server.sources.a1.trackerDir = .flumespool
server.sources.a1.ignorePattern = ^$
server.sources.a1.batchSize = 1000
server.sources.a1.inputCharset = UTF-8
server.sources.a1.deserializer = LINE
server.sources.a1.selector.type = replicating
server.sources.a1.fileHeaderKey = file
server.sources.a1.fileHeader = false
server.sources.a1.channels = ch1

Flume Example 1 (5)
 Configure the Flume channel
# the channel configuration of ch1
server.channels.ch1.type = memory
server.channels.ch1.capacity = 10000
server.channels.ch1.transactionCapacity = 1000
server.channels.ch1.channlefullcount = 10
server.channels.ch1.keep-alive = 3
server.channels.ch1.byteCapacityBufferPercentage = 20

Flume Example 1 (6)
 Configure the Flume sink
server.sinks.s1.type = hdfs
server.sinks.s1.hdfs.path = /tmp/flume_avro
server.sinks.s1.hdfs.filePrefix = over_%{basename}
server.sinks.s1.hdfs.inUseSuffix = .tmp
server.sinks.s1.hdfs.rollInterval = 30
server.sinks.s1.hdfs.rollSize = 1024
server.sinks.s1.hdfs.rollCount = 10
server.sinks.s1.hdfs.batchSize = 1000
server.sinks.s1.hdfs.fileType = DataStream
server.sinks.s1.hdfs.maxOpenFiles = 5000
server.sinks.s1.hdfs.writeFormat = Writable
server.sinks.s1.hdfs.callTimeout = 10000
server.sinks.s1.hdfs.threadsPoolSize = 10
server.sinks.s1.hdfs.failcount = 10
server.sinks.s1.hdfs.fileCloseByEndEvent = true
server.sinks.s1.channel = ch1

Flume Example 1 (7)
 Name the Flume agent configuration file properties.properties.

 Upload the configuration file.

Flume Example 1 (8)
 Move the data files to the monitored directory /tmp/log_test:
mv /var/log/log.11 /tmp/log_test

 Check whether the data has been written to HDFS:
hdfs dfs -ls /tmp/flume_avro

 log.11 has been renamed log.11.COMPLETED, which indicates that the data
was collected successfully.

Flume Example 2 (1)
 Description
 In this application scenario, Flume collects real-time clickstream
logs and saves them to Kafka for real-time analysis and processing.

 Data preparations
 Create a log directory /tmp/log_click on a node in the cluster.

 Collect data to the Kafka topic topic_1028.

Flume Example 2 (2)
 Configure the Flume source:
server.sources = a1
server.channels = ch1
server.sinks = s1

# the source configuration of a1


server.sources.a1.type = spooldir
server.sources.a1.spoolDir = /tmp/log_click
server.sources.a1.fileSuffix = .COMPLETED
server.sources.a1.deletePolicy = never
server.sources.a1.trackerDir = .flumespool
server.sources.a1.ignorePattern = ^$
server.sources.a1.batchSize = 1000
server.sources.a1.inputCharset = UTF-8
server.sources.a1.selector.type = replicating
server.sources.a1.basenameHeaderKey = basename
server.sources.a1.deserializer.maxBatchLine = 1
server.sources.a1.deserializer.maxLineLength = 2048
server.sources.a1.channels = ch1

Flume Example 2 (3)
 Configure the Flume channel:
# the channel configuration of ch1
server.channels.ch1.type = memory
server.channels.ch1.capacity = 10000
server.channels.ch1.transactionCapacity = 1000
server.channels.ch1.channlefullcount = 10
server.channels.ch1.keep-alive = 3
server.channels.ch1.byteCapacityBufferPercentage = 20

Flume Example 2 (4)
 Configure the Flume sink:
# the sink configuration of s1
server.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink
server.sinks.s1.kafka.topic = topic_1028
server.sinks.s1.flumeBatchSize = 1000
server.sinks.s1.kafka.producer.type = sync
server.sinks.s1.kafka.bootstrap.servers = 192.168.225.15:21007
server.sinks.s1.kafka.security.protocol = SASL_PLAINTEXT
server.sinks.s1.requiredAcks = 0
server.sinks.s1.channel = ch1

Flume Example 2 (5)
 Upload the configuration file to Flume.

 Use Kafka commands to view the data collected in the topic topic_1028.
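One way to view the topic is the Kafka console consumer. The exact script path and security options depend on the Kafka/FusionInsight client version, so treat this as a sketch; the bootstrap server is taken from the sink configuration above, and consumer.properties is an assumed file carrying the SASL settings:

```shell
# Hypothetical check: read the collected events from topic_1028
kafka-console-consumer.sh --topic topic_1028 \
  --bootstrap-server 192.168.225.15:21007 \
  --consumer.config consumer.properties \
  --from-beginning
```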

Summary
 This course describes Flume functions and application scenarios,
including its basic concepts, functions, reliability, and
configuration items. Upon completion of this course, you will
understand Flume functions, application scenarios, and
configuration methods.

Quiz
1. What is Flume? What are the functions of Flume?

2. What are the key characteristics of Flume?

3. What are the functions of the source, channel, and sink?

Quiz
True or False
 Flume supports cascading. That is, multiple Flume nodes can be cascaded for
data transmission.

More Information
 Training materials:
 http://support.huawei.com/learning/Certificate!showCertificate?lang=en&pbiPath=term100002
5450&id=Node1000011796
 Exam outline:
 http://support.huawei.com/learning/Certificate!toExamOutlineDetail?lang=en&nodeId=Node10
00011797
 Mock exam:
 http://support.huawei.com/learning/Certificate!toSimExamDetail?lang=en&nodeId=Node10000
11798
 Authentication process:
 http://support.huawei.com/learning/NavigationAction!createNavi#navi[id]=_40

Thank You
www.huawei.com

