Apache Flume

Uploaded by

Veerabhadra Durgam

12/31/21, 9:19 AM Apache Flume - Quick Guide

Apache Flume - Quick Guide

Apache Flume - Introduction

What is Flume?

Apache Flume is a tool/service/data ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log
files and events, from various sources to a centralized data store.

Flume is a highly reliable, distributed, and configurable tool. It is principally designed to copy streaming data (log data) from various web servers to
HDFS.

Applications of Flume
Assume an e-commerce web application wants to analyze customer behavior from a particular region. To do so, it would need to move the
available log data into Hadoop for analysis. Here, Apache Flume comes to our rescue.

Flume is used to move the log data generated by application servers into HDFS at a higher speed.

Advantages of Flume

Here are the advantages of using Flume −

Using Apache Flume, we can store the data into any of the centralized stores (HBase, HDFS).

When the rate of incoming data exceeds the rate at which data can be written to the destination, Flume acts as a mediator between data
producers and the centralized stores and provides a steady flow of data between them.

Flume provides the feature of contextual routing.

The transactions in Flume are channel-based where two transactions (one sender and one receiver) are maintained for each message. It
guarantees reliable message delivery.

Flume is reliable, fault tolerant, scalable, manageable, and customizable.

Features of Flume
Some of the notable features of Flume are as follows −
Flume ingests log data from multiple web servers into a centralized store (HDFS, HBase) efficiently.

Using Flume, we can get the data from multiple servers immediately into Hadoop.

Along with the log files, Flume is also used to import huge volumes of event data produced by social networking sites like Facebook and
Twitter, and e-commerce websites like Amazon and Flipkart.

Flume supports a large set of source and destination types.

Flume supports multi-hop flows, fan-in fan-out flows, contextual routing, etc.

Flume can be scaled horizontally.

Apache Flume - Data Transfer In Hadoop

Big Data, as we know, is a collection of large datasets that cannot be processed using traditional computing techniques. Big Data, when analyzed,
gives valuable results. Hadoop is an open-source framework that allows us to store and process Big Data in a distributed environment across
clusters of computers using simple programming models.

Streaming / Log Data

https://www.tutorialspoint.com/apache_flume/apache_flume_quick_guide.htm

Generally, most of the data that is to be analyzed will be produced by various data sources like application servers, social networking sites, cloud
servers, and enterprise servers. This data will be in the form of log files and events.

Log file − In general, a log file is a file that lists events/actions that occur in an operating system. For example, web servers list every request
made to the server in the log files.

On harvesting such log data, we can get information about −

the application performance and locate various software and hardware failures.
the user behavior and derive better business insights.
The traditional method of transferring data into the HDFS system is to use the put command. Let us see how to use the put command.

HDFS put Command

The main challenge in handling the log data is in moving these logs produced by multiple servers to the Hadoop environment.

Hadoop File System Shell provides commands to insert data into Hadoop and read from it. You can insert data into Hadoop using the put
command as shown below.

$ hadoop fs -put <path of the required file> <path in HDFS where to save the file>

Problem with put Command

We can use the put command of Hadoop to transfer data from these sources to HDFS. However, it suffers from the following drawbacks −
Using the put command, we can transfer only one file at a time, while the data generators generate data at a much higher rate. Since
analysis made on older data is less accurate, we need a solution that transfers data in real time.
If we use the put command, the data needs to be packaged and ready for upload. Since web servers generate data
continuously, this is a very difficult task.

What we need here is a solution that can overcome the drawbacks of the put command and transfer the "streaming data" from data generators to
centralized stores (especially HDFS) with less delay.

Problem with HDFS

In HDFS, the file exists as a directory entry and the length of the file will be considered as zero till it is closed. For example, if a source is writing
data into HDFS and the network was interrupted in the middle of the operation (without closing the file), then the data written in the file will be lost.

Therefore we need a reliable, configurable, and maintainable system to transfer the log data into HDFS.

Note − In a POSIX file system, whenever we are accessing a file (say, performing a write operation), other programs can still read this file (at least the
saved portion of the file). This is because the file exists on the disk before it is closed.

Available Solutions
To send streaming data (log files, events, etc.) from various sources to HDFS, we have the following tools available at our disposal −

Facebook’s Scribe

Scribe is an immensely popular tool that is used to aggregate and stream log data. It is designed to scale to a very large number of nodes and be
robust to network and node failures.

Apache Kafka

Kafka has been developed by Apache Software Foundation. It is an open-source message broker. Using Kafka, we can handle feeds with high-
throughput and low-latency.

Apache Flume

Apache Flume is a tool/service/data ingestion mechanism for collecting, aggregating, and transporting large amounts of streaming data, such as log
data and events, from various web servers to a centralized data store.

It is a highly reliable, distributed, and configurable tool that is principally designed to transfer streaming data from various sources to HDFS.
In this tutorial, we will discuss in detail how to use Flume with some examples.

Apache Flume - Architecture

The following illustration depicts the basic architecture of Flume. As shown in the illustration, data generators (such as Facebook, Twitter)
generate data which gets collected by individual Flume agents running on them. Thereafter, a data collector (which is also an agent) collects the
data from the agents which is aggregated and pushed into a centralized store such as HDFS or HBase.


Flume Event

An event is the basic unit of the data transported inside Flume. It contains a payload of byte array that is to be transported from the source to the
destination accompanied by optional headers. A typical Flume event would have the following structure −
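
As a minimal sketch of the structure just described, an event can be modelled as a byte-array payload plus optional string headers. The class name FlumeEvent and the header keys used here are illustrative assumptions; this is not Flume's Java API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlumeEvent:
    """Illustrative model of a Flume event: a byte payload plus optional headers."""
    body: bytes                                             # payload carried from source to destination
    headers: Dict[str, str] = field(default_factory=dict)  # optional key-value metadata


# A web-server log line wrapped as an event; the header keys are hypothetical.
event = FlumeEvent(
    body=b"GET /index.html 200",
    headers={"host": "web01", "timestamp": "1640942340000"},
)
print(len(event.body), sorted(event.headers))
```

The payload is kept as raw bytes because Flume does not interpret it; only the headers are meant to be inspected while the event is in transit.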

Flume Agent
An agent is an independent daemon process (JVM) in Flume. It receives the data (events) from clients or other agents and forwards it to its next
destination (sink or agent). Flume may have more than one agent. The following diagram represents a Flume agent.

As shown in the diagram, a Flume agent contains three main components, namely source, channel, and sink.

Source

A source is the component of an Agent which receives data from the data generators and transfers it to one or more channels in the form of
Flume events.

Apache Flume supports several types of sources and each source receives events from a specified data generator.
Example − Avro source, Thrift source, Twitter 1% source, etc.

Channel

A channel is a transient store which receives the events from the source and buffers them till they are consumed by sinks. It acts as a bridge
between the sources and the sinks.
These channels are fully transactional and they can work with any number of sources and sinks.

Example − JDBC channel, File channel, Memory channel, etc.

Sink

A sink stores the data into centralized stores like HBase and HDFS. It consumes the data (events) from the channels and delivers it to the
destination. The destination of the sink might be another agent or the central stores.

Example − HDFS sink

Note − A Flume agent can have multiple sources, sinks, and channels. We have listed all the supported sources, sinks, and channels in the Flume
configuration chapter of this tutorial.

Additional Components of Flume Agent

What we have discussed above are the primitive components of the agent. In addition to this, we have a few more components that play a vital
role in transferring the events from the data generator to the centralized stores.

Interceptors

Interceptors are used to alter/inspect Flume events which are transferred between source and channel.

Channel Selectors

These are used to determine which channel should be used to transfer the data when there are multiple channels. There are two types of channel
selectors −

Default channel selectors − These are also known as replicating channel selectors; they replicate all the events in each channel.

Multiplexing channel selectors − These decide which channel to send an event to, based on the address in the header of that event.
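
The difference between the two selector types can be sketched in a few lines of Python. This is only an illustration of the routing idea; the function names, the header name region, and the mapping are assumptions, not Flume's implementation.

```python
from typing import Dict, List


def replicating_select(channels: List[str]) -> List[str]:
    """Replicating (default) selector: every event goes to every channel."""
    return list(channels)


def multiplexing_select(headers: Dict[str, str], header_name: str,
                        mapping: Dict[str, List[str]],
                        default: List[str]) -> List[str]:
    """Multiplexing selector: pick the channels from the value of one header."""
    return mapping.get(headers.get(header_name, ""), default)


channels = ["ch1", "ch2"]
mapping = {"IN": ["ch1"], "US": ["ch2"]}   # hypothetical header value -> channels

print(replicating_select(channels))
print(multiplexing_select({"region": "US"}, "region", mapping, default=["ch1"]))
print(multiplexing_select({}, "region", mapping, default=["ch1"]))
```

An event without the expected header falls back to the default channel list, so it is not silently dropped.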

Sink Processors

These are used to invoke a particular sink from the selected group of sinks. These are used to create failover paths for your sinks or load balance
events across multiple sinks from a channel.
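
The two behaviours mentioned above, failover and load balancing, can be illustrated with a small Python sketch. Here a sink is reduced to a plain function that delivers one event; the names failover_deliver and broken_sink are hypothetical and only serve the illustration.

```python
from itertools import cycle
from typing import Callable, List

Sink = Callable[[str], None]   # a sink here is just a function that delivers one event


def failover_deliver(event: str, sinks_by_priority: List[Sink]) -> int:
    """Try sinks in priority order; return the index of the one that succeeded."""
    for index, sink in enumerate(sinks_by_priority):
        try:
            sink(event)
            return index
        except ConnectionError:
            continue            # this sink is down, fall through to the next one
    raise RuntimeError("all sinks failed")


def broken_sink(event: str) -> None:
    raise ConnectionError("sink unavailable")


delivered: List[str] = []
used = failover_deliver("event-1", [broken_sink, delivered.append])
print(used, delivered)

# Load balancing (round robin): alternate between the sinks of the group.
a: List[str] = []
b: List[str] = []
round_robin = cycle([a.append, b.append])
for n in range(4):
    next(round_robin)(f"event-{n}")
print(a, b)
```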

Apache Flume - Data Flow

Flume is a framework which is used to move log data into HDFS. Generally events and log data are generated by the log servers and these
servers have Flume agents running on them. These agents receive the data from the data generators.

The data in these agents will be collected by an intermediate node known as Collector. Just like agents, there can be multiple collectors in Flume.
Finally, the data from all these collectors will be aggregated and pushed to a centralized store such as HBase or HDFS. The following diagram
explains the data flow in Flume.

Multi-hop Flow

Within Flume, there can be multiple agents and before reaching the final destination, an event may travel through more than one agent. This is
known as multi-hop flow.

Fan-out Flow
The dataflow from one source to multiple channels is known as fan-out flow. It is of two types −

Replicating − The data flow where the data will be replicated in all the configured channels.
Multiplexing − The data flow where the data will be sent to a selected channel which is mentioned in the header of the event.

Fan-in Flow
The data flow in which the data will be transferred from many sources to one channel is known as fan-in flow.

Failure Handling
In Flume, for each event, two transactions take place: one at the sender and one at the receiver. The sender sends events to the receiver. Soon
after receiving the data, the receiver commits its own transaction and sends a “received” signal to the sender. After receiving the signal, the sender
commits its transaction. (Sender will not commit its transaction till it receives a signal from the receiver.)
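
The commit order described above can be sketched as follows. This is a simplified model under stated assumptions: the class Hop, the function transfer, and the link_up flag (standing in for a network failure) are all hypothetical and are not part of Flume.

```python
from typing import List


class Hop:
    """Receiving side of a hop: staged events become durable only on commit."""

    def __init__(self) -> None:
        self.committed: List[str] = []
        self.staged: List[str] = []

    def commit(self) -> None:
        self.committed.extend(self.staged)
        self.staged.clear()


def transfer(sender_channel: List[str], receiver: Hop, link_up: bool) -> bool:
    """Move all events from the sender's channel to the receiver.

    The sender removes events from its channel only after the receiver
    has committed and acknowledged them.
    """
    batch = list(sender_channel)         # sender transaction: take, but do not delete yet
    if not link_up:
        return False                     # no "received" signal: the sender keeps its events
    receiver.staged.extend(batch)        # receiver transaction: put
    receiver.commit()                    # the receiver commits first ...
    del sender_channel[:len(batch)]      # ... then the sender commits (removes the events)
    return True


channel = ["e1", "e2"]
receiver = Hop()
print(transfer(channel, receiver, link_up=False), channel, receiver.committed)
print(transfer(channel, receiver, link_up=True), channel, receiver.committed)
```

Because the sender deletes nothing until it hears back, a failed hop leaves the events in the sender's channel, ready to be retried.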

Apache Flume - Environment

We already discussed the architecture of Flume in the previous chapter. In this chapter, let us see how to download and set up Apache Flume.

Before proceeding further, you need to have a Java environment on your system. So first of all, make sure you have Java installed.
For some examples in this tutorial, we have used Hadoop HDFS (as sink). Therefore, we recommend that you install Hadoop along with
Java. For more information, follow the link − http://www.tutorialspoint.com/hadoop/hadoop_enviornment_setup.htm

Installing Flume
First of all, download the latest version of Apache Flume software from the website https://flume.apache.org/ .

Step 1

Open the website. Click on the download link on the left-hand side of the home page. It will take you to the download page of Apache Flume.

Step 2

In the Download page, you can see the links for binary and source files of Apache Flume. Click on the link apache-flume-1.6.0-bin.tar.gz

You will be redirected to a list of mirrors where you can start your download by clicking any of these mirrors. In the same way, you can download
the source code of Apache Flume by clicking on apache-flume-1.6.0-src.tar.gz .

Step 3

Create a directory with the name Flume in the same directory where the installation directories of Hadoop, HBase, and other software were
installed (if you have already installed any) as shown below.

$ mkdir Flume

Step 4

Extract the downloaded tar files as shown below.

$ cd Downloads/
$ tar zxvf apache-flume-1.6.0-bin.tar.gz
$ tar zxvf apache-flume-1.6.0-src.tar.gz

Step 5

Move the content of the extracted apache-flume-1.6.0-bin directory to the Flume directory created earlier as shown below. (Assume we have created the Flume
directory in the home directory of the local user named Hadoop.)

$ mv apache-flume-1.6.0-bin/* /home/Hadoop/Flume/

Configuring Flume
To configure Flume, we have to modify three files, namely flume-env.sh, flume-conf.properties, and .bashrc.

Setting the Path / Classpath

In the .bashrc file, set the home folder, the path, and the classpath for Flume as shown below.


conf Folder

If you open the conf folder of Apache Flume, you will have the following four files −

flume-conf.properties.template,
flume-env.sh.template,
flume-env.ps1.template, and
log4j.properties.

Now rename the flume-conf.properties.template file as flume-conf.properties and flume-env.sh.template as flume-env.sh.

flume-env.sh

Open the flume-env.sh file and set JAVA_HOME to the folder where Java was installed in your system.


Verifying the Installation

Verify the installation of Apache Flume by browsing through the bin folder and typing the following command.

$ ./flume-ng

If you have successfully installed Flume, you will get a help prompt of Flume as shown below.

Apache Flume - Configuration

After installing Flume, we need to configure it using the configuration file, which is a Java properties file having key-value pairs. We need to pass
values to the keys in the file.

In the Flume configuration file, we need to −

Name the components of the current agent.
Describe/Configure the source.
Describe/Configure the sink.
Describe/Configure the channel.
Bind the source and the sink to the channel.
Usually we can have multiple agents in Flume. We can differentiate each agent by using a unique name. And using this name, we have to
configure each agent.

Naming the Components

First of all, you need to name/list the components such as sources, sinks, and the channels of the agent, as shown below.


agent_name.sources = source_name
agent_name.sinks = sink_name
agent_name.channels = channel_name

Flume supports various sources, sinks, and channels. They are listed in the table given below.

Sources Channels Sinks

You can use any of them. For example, if you are transferring Twitter data using Twitter source through a memory channel to an HDFS sink, and
the agent name is TwitterAgent, then

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

After listing the components of the agent, you have to describe the source(s), sink(s), and channel(s) by providing values to their properties.

Describing the Source

Each source will have a separate list of properties. The property named “type” is common to every source, and it is used to specify the type of the
source we are using.

Along with the property “type”, you need to provide values for all the required properties of a particular source to configure it, as shown
below.

agent_name.sources.source_name.type = value

agent_name.sources.source_name.property2 = value
agent_name.sources.source_name.property3 = value

For example, if we consider the twitter source, following are the properties to which we must provide values to configure it.

TwitterAgent.sources.Twitter.type = Twitter (type name)

TwitterAgent.sources.Twitter.consumerKey =
TwitterAgent.sources.Twitter.consumerSecret =
TwitterAgent.sources.Twitter.accessToken =
TwitterAgent.sources.Twitter.accessTokenSecret =

Describing the Sink

Just like the source, each sink will have a separate list of properties. The property named “type” is common to every sink, and it is used to specify
the type of the sink we are using. Along with the property “type”, you need to provide values for all the required properties of a particular sink to
configure it, as shown below.

agent_name.sinks.sink_name.type = value

agent_name.sinks.sink_name.property2 = value
agent_name.sinks.sink_name.property3 = value

For example, if we consider HDFS sink, following are the properties to which we must provide values to configure it.

TwitterAgent.sinks.HDFS.type = hdfs (type name)

TwitterAgent.sinks.HDFS.hdfs.path = HDFS directory’s Path to store the data

Describing the Channel

Flume provides various channels to transfer data between sources and sinks. Therefore, along with the sources and the sinks, you need to
describe the channel used in the agent.

To describe each channel, you need to set the required properties, as shown below.

agent_name.channels.channel_name.type = value
agent_name.channels.channel_name.property2 = value
agent_name.channels.channel_name.property3 = value

For example, if we consider memory channel, following are the properties to which we must provide values to configure it.

TwitterAgent.channels.MemChannel.type = memory (type name)

Binding the Source and the Sink to the Channel

Since the channels connect the sources and sinks, it is required to bind both of them to the channel, as shown below. Note that a source uses the
property channels (it can write to several channels), whereas a sink uses the property channel (it reads from exactly one channel).

agent_name.sources.source_name.channels = channel_name
agent_name.sinks.sink_name.channel = channel_name

The following example shows how to bind the sources and the sinks to a channel. Here, we consider the Twitter source, memory channel, and
HDFS sink.

TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
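
Binding mistakes are easy to make, so it can help to check a configuration before starting the agent. The following Python sketch parses simple key = value lines and reports sources or sinks that are not bound to a declared channel. The helper names parse_properties and unbound_components are hypothetical, and the parser handles only the plain key = value subset of the properties format used in this tutorial.

```python
from typing import Dict, List

CONFIG = """
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
"""


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key = value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props


def unbound_components(props: Dict[str, str], agent: str) -> List[str]:
    """Return sources/sinks whose channel binding is missing or undeclared."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems: List[str] = []
    for source in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not bound or not set(bound) <= declared:
            problems.append(source)
    for sink in props.get(f"{agent}.sinks", "").split():
        # A sink reads from exactly one channel, hence the singular key.
        if props.get(f"{agent}.sinks.{sink}.channel") not in declared:
            problems.append(sink)
    return problems


print(unbound_components(parse_properties(CONFIG), "TwitterAgent"))
```

An empty list means every listed source and sink is bound to a channel that the agent declares.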

Starting a Flume Agent

After configuration, we have to start the Flume agent. It is done as follows −

$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

where −

agent − Command to start the Flume agent

--conf, -c <conf> − Use the configuration files in the <conf> directory

-f <file> − Specifies the path of the config file

--name, -n <name> − Name of the agent (TwitterAgent in this example)

-Dproperty=value − Sets a Java system property value.
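
When the agent is started from a script, it is safer to assemble the command as an argument list than as one long string. The sketch below does this in Python using only the options described above; the function name flume_agent_command is a hypothetical helper, and the paths are the ones used in this chapter.

```python
import shlex
from typing import List


def flume_agent_command(conf_dir: str, conf_file: str, agent_name: str,
                        log_level: str = "DEBUG") -> List[str]:
    """Assemble the flume-ng argument list from the options described above."""
    return [
        "bin/flume-ng", "agent",
        "--conf", conf_dir,
        "-f", conf_file,
        f"-Dflume.root.logger={log_level},console",
        "-n", agent_name,
    ]


args = flume_agent_command("./conf/", "conf/twitter.conf", "TwitterAgent")
print(shlex.join(args))
```

The resulting list could be handed to subprocess.run with the Flume home directory as the working directory; that step is left out here because it requires a Flume installation.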

Apache Flume - Fetching Twitter Data

Using Flume, we can fetch data from various services and transport it to centralized stores (HDFS and HBase). This chapter explains how to fetch
data from Twitter service and store it in HDFS using Apache Flume.
As discussed in Flume Architecture, a web server generates log data and this data is collected by an agent in Flume. The channel buffers this data
until a sink consumes it, and the sink finally pushes it to centralized stores.

In the example provided in this chapter, we will create an application and get the tweets from it using the experimental twitter source provided by
Apache Flume. We will use the memory channel to buffer these tweets and HDFS sink to push these tweets into the HDFS.

To fetch Twitter data, we will have to follow the steps given below −

Create a twitter Application

Install / Start HDFS
Configure Flume

Creating a Twitter Application

In order to get the tweets from Twitter, you need to create a Twitter application. Follow the steps given below to create a Twitter application.

Step 1

To create a Twitter application, click on the following link https://apps.twitter.com/ . Sign in to your Twitter account. You will have a Twitter
Application Management window where you can create, delete, and manage Twitter Apps.

Step 2

Click on the Create New App button. You will be redirected to a window where you will get an application form in which you have to fill in your
details in order to create the App. While filling in the website address, give the complete URL pattern, for example, http://example.com.

Step 3

Fill in the details and accept the Developer Agreement. When finished, click on the Create your Twitter application button which is at the bottom of
the page. If everything goes fine, an App will be created with the given details as shown below.


Step 4

Under the Keys and Access Tokens tab at the bottom of the page, you can observe a button named Create my access token. Click on it to generate
the access token.

Step 5

Finally, click on the Test OAuth button which is at the top right of the page. This will lead to a page which displays your Consumer key,
Consumer secret, Access token, and Access token secret. Copy these details. These are useful to configure the agent in Flume.

Starting HDFS
Since we are storing the data in HDFS, we need to install / verify Hadoop. Start Hadoop and create a folder in it to store Flume data. Follow the
steps given below before configuring Flume.

Step 1: Install / Verify Hadoop

Install Hadoop. If Hadoop is already installed in your system, verify the installation using the hadoop version command, as shown below.

$ hadoop version

If your system contains Hadoop, and if you have set the path variable, then you will get the following output −


Hadoop 2.6.0
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r
e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1
Compiled by jenkins on 2014-11-13T21:10Z
Compiled with protoc 2.5.0
From source with checksum 18e43357c8f927c0695f1e9522859d6a
This command was run using /home/Hadoop/hadoop/share/hadoop/common/hadoop-common-2.6.0.jar

Step 2: Starting Hadoop

Browse through the sbin directory of Hadoop and start yarn and Hadoop dfs (distributed file system) as shown below.

$ cd $HADOOP_HOME/sbin/
$ start-dfs.sh
localhost: starting namenode, logging to
/home/Hadoop/hadoop/logs/hadoop-Hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to
/home/Hadoop/hadoop/logs/hadoop-Hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
starting secondarynamenode, logging to
/home/Hadoop/hadoop/logs/hadoop-Hadoop-secondarynamenode-localhost.localdomain.out

$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to
/home/Hadoop/hadoop/logs/yarn-Hadoop-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to
/home/Hadoop/hadoop/logs/yarn-Hadoop-nodemanager-localhost.localdomain.out

Step 3: Create a Directory in HDFS

In Hadoop DFS, you can create directories using the command mkdir. Browse through it and create a directory with the name twitter_data in the
required path as shown below.

$ cd $HADOOP_HOME/bin/
$ hdfs dfs -mkdir hdfs://localhost:9000/user/Hadoop/twitter_data

Configuring Flume
We have to configure the source, the channel, and the sink using the configuration file in the conf folder. The example given in this chapter uses
an experimental source provided by Apache Flume named Twitter 1% Firehose, a memory channel, and an HDFS sink.

Twitter 1% Firehose Source

This source is highly experimental. It connects to the 1% sample Twitter Firehose using streaming API and continuously downloads tweets,
converts them to Avro format, and sends Avro events to a downstream Flume sink.

We will get this source by default along with the installation of Flume. The jar files corresponding to this source can be located in the lib folder as
shown below.

Setting the classpath

Set the classpath variable to the lib folder of Flume in the flume-env.sh file as shown below.

export CLASSPATH=$CLASSPATH:$FLUME_HOME/lib/*

This source needs the details such as Consumer key, Consumer secret, Access token, and Access token secret of a Twitter application.
While configuring this source, you have to provide values to the following properties −

Channels

Source type : org.apache.flume.source.twitter.TwitterSource

consumerKey − The OAuth consumer key

consumerSecret − OAuth consumer secret

accessToken − OAuth access token

accessTokenSecret − OAuth token secret

maxBatchSize − Maximum number of twitter messages that should be in a twitter batch. The default value is 1000 (optional).

maxBatchDurationMillis − Maximum number of milliseconds to wait before closing a batch. The default value is 1000 (optional).

Channel

We are using the memory channel. To configure the memory channel, you must provide value to the type of the channel.

type − It holds the type of the channel. In our example, the type is memory (MemChannel is the name we give to the channel).

capacity − It is the maximum number of events stored in the channel. Its default value is 100 (optional).

transactionCapacity − It is the maximum number of events the channel accepts or sends per transaction. Its default value is 100 (optional).
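
How these two limits interact can be illustrated with a toy in-memory channel. The class BoundedChannel below is a hypothetical Python model, not Flume's memory channel: it rejects a batch that exceeds transactionCapacity or that would push the channel beyond capacity.

```python
from collections import deque
from typing import Deque, List


class BoundedChannel:
    """Toy in-memory channel that enforces capacity and transactionCapacity."""

    def __init__(self, capacity: int = 100, transaction_capacity: int = 100) -> None:
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self.events: Deque[bytes] = deque()

    def put_batch(self, batch: List[bytes]) -> bool:
        """Accept the whole batch or nothing, mimicking a transaction."""
        if len(batch) > self.transaction_capacity:
            return False                      # batch larger than one transaction allows
        if len(self.events) + len(batch) > self.capacity:
            return False                      # channel full: the source must retry later
        self.events.extend(batch)
        return True

    def take_batch(self, size: int) -> List[bytes]:
        """Hand up to `size` events (bounded by transactionCapacity) to a sink."""
        size = min(size, self.transaction_capacity, len(self.events))
        return [self.events.popleft() for _ in range(size)]


channel = BoundedChannel(capacity=3, transaction_capacity=2)
print(channel.put_batch([b"a", b"b"]))        # fits
print(channel.put_batch([b"c", b"d"]))        # would exceed capacity
print(channel.put_batch([b"c", b"d", b"e"]))  # exceeds transactionCapacity
print(channel.take_batch(5))
```

In practice this is why transactionCapacity should not be larger than capacity, and why a sink batch size should not be larger than transactionCapacity.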

HDFS Sink

This sink writes data into the HDFS. To configure this sink, you must provide the following details.

Channel
type − hdfs

hdfs.path − the path of the directory in HDFS where data is to be stored.

And we can provide some optional values based on the scenario. Given below are the optional properties of the HDFS sink that we are configuring
in our application.

fileType − This is the required file format of our HDFS file. SequenceFile, DataStream, and CompressedStream are the three types
available with this sink. In our example, we are using DataStream.

writeFormat − Could be either Text or Writable.

batchSize − It is the number of events written to a file before it is flushed into the HDFS. Its default value is 100.

rollSize − It is the file size, in bytes, that triggers a roll. Its default value is 1024.

rollCount − It is the number of events written into the file before it is rolled. Its default value is 10.
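
The roll decision based on these two properties can be sketched as a small Python function. It is a simplified illustration: the function name should_roll is hypothetical, and the sketch ignores the time-based rollInterval trigger. A threshold of 0 disables that trigger, which is how rollSize = 0 is used in the example configuration of this chapter.

```python
def should_roll(bytes_written: int, events_written: int,
                roll_size: int, roll_count: int) -> bool:
    """Return True when the current file should be closed and a new one started.

    A threshold of 0 disables the corresponding trigger.
    """
    by_size = roll_size > 0 and bytes_written >= roll_size
    by_count = roll_count > 0 and events_written >= roll_count
    return by_size or by_count


# With rollSize = 0 and rollCount = 10000, only the event count matters.
print(should_roll(bytes_written=5_000_000, events_written=9_999,
                  roll_size=0, roll_count=10_000))
print(should_roll(bytes_written=5_000_000, events_written=10_000,
                  roll_size=0, roll_count=10_000))
```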

Example – Configuration File

Given below is an example of the configuration file. Copy this content and save as twitter.conf in the conf folder of Flume.

# Naming the components on the current agent.

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Describing/Configuring the source

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = Your OAuth consumer key
TwitterAgent.sources.Twitter.consumerSecret = Your OAuth consumer secret
TwitterAgent.sources.Twitter.accessToken = Your OAuth consumer key access token
TwitterAgent.sources.Twitter.accessTokenSecret = Your OAuth consumer key access token secret
TwitterAgent.sources.Twitter.keywords = tutorials point,java, bigdata, mapreduce, mahout, hbase, nosql

# Describing/Configuring the sink

TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000

# Describing/Configuring the channel

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

# Binding the source and sink to the channel

TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel


Execution
Browse through the Flume home directory and execute the application as shown below.

$ cd $FLUME_HOME
$ bin/flume-ng agent --conf ./conf/ -f conf/twitter.conf -Dflume.root.logger=DEBUG,console -n TwitterAgent

If everything goes fine, the streaming of tweets into HDFS will start. Given below is the snapshot of the command prompt window while fetching
tweets.

Verifying HDFS

You can access the Hadoop Administration Web UI using the URL given below.

http://localhost:50070/

Click on the dropdown named Utilities on the right-hand side of the page. You can see two options as shown in the snapshot given below.

Click on Browse the file system and enter the path of the HDFS directory where you have stored the tweets. In our example, the path will be
/user/Hadoop/twitter_data/. Then, you can see the list of twitter log files stored in HDFS as given below.


Apache Flume - Sequence Generator Source

In the previous chapter, we saw how to fetch data from the Twitter source into HDFS. This chapter explains how to fetch data from the Sequence
generator.

Prerequisites

To run the example provided in this chapter, you need to install HDFS along with Flume. Therefore, verify Hadoop installation and start the HDFS
before proceeding further. (Refer to the previous chapter to learn how to start the HDFS.)

Configuring Flume

We have to configure the source, the channel, and the sink using the configuration file in the conf folder. The example given in this chapter uses a
sequence generator source, a memory channel, and an HDFS sink.

Sequence Generator Source

It is a source that generates events continuously. It maintains a counter that starts from 0 and increments by 1. It is used for testing purposes.
While configuring this source, you must provide values to the following properties −

Channels

Source type − seq
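
The behaviour of this source is simple enough to mimic in a few lines of Python, which can help when reading the files it produces in HDFS. The generator below is only a sketch of the counting behaviour described above; the function name sequence_source is hypothetical, and encoding each number as a text payload is an assumption.

```python
from itertools import count, islice
from typing import Iterator


def sequence_source(start: int = 0) -> Iterator[bytes]:
    """Yield an endless stream of event bodies: b"0", b"1", b"2", ..."""
    for n in count(start):
        yield str(n).encode("utf-8")


# Take only the first five events; the real source keeps going until stopped.
print(list(islice(sequence_source(), 5)))
```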

Channel

We are using the memory channel. To configure the memory channel, you must provide a value for the type of the channel. Given below is the
list of properties that you can supply while configuring the memory channel −

type − It holds the type of the channel. In our example, the type is memory (MemChannel is the name we give to the channel).

capacity − It is the maximum number of events stored in the channel. Its default value is 100. (optional)

transactionCapacity − It is the maximum number of events the channel accepts from a source or gives to a sink per transaction. Its default value is 100. (optional)
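To see how the two limits interact, here is a small Python sketch of a bounded in-memory buffer. It is a simplified model under stated assumptions, not the Flume memory channel itself: the class BoundedChannel, its method names, and the exceptions it raises are all hypothetical.

```python
from collections import deque


class BoundedChannel:
    """Toy in-memory buffer with a total capacity and a per-transaction limit."""

    def __init__(self, capacity=100, transaction_capacity=100):
        self.capacity = capacity
        self.transaction_capacity = transaction_capacity
        self._events = deque()

    def put_batch(self, events):
        """Accept one batch from a source, or reject it as a whole."""
        events = list(events)
        if len(events) > self.transaction_capacity:
            raise ValueError("batch is larger than transaction_capacity")
        if len(self._events) + len(events) > self.capacity:
            raise OverflowError("channel does not have room for this batch")
        self._events.extend(events)

    def take_batch(self, max_events):
        """Hand at most transaction_capacity events to a sink, oldest first."""
        n = min(max_events, self.transaction_capacity, len(self._events))
        return [self._events.popleft() for _ in range(n)]

    def __len__(self):
        return len(self._events)


# Same values as the example configuration file in this chapter.
channel = BoundedChannel(capacity=1000, transaction_capacity=100)
channel.put_batch(range(100))
taken = channel.take_batch(10)
print(taken, len(channel))
```

The capacity bounds how much can be buffered in total, while the transaction capacity bounds how much moves in or out in one step; a batch of 101 events would be rejected here even though the buffer has room for it.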

HDFS Sink

This sink writes data into the HDFS. To configure this sink, you must provide the following details.

channel

type − hdfs

hdfs.path − the path of the directory in HDFS where data is to be stored.

And we can provide some optional values based on the scenario. Given below are the optional properties of the HDFS sink that we are configuring
in our application.

writeFormat − Could be either Text or Writable.

batchSize − It is the number of events written to a file before it is flushed into the HDFS. Its default value is 100.

rollSize − It is the file size, in bytes, that triggers a roll. Its default value is 1024.

rollCount − It is the number of events written into the file before it is rolled. Its default value is 10.

In the configuration file, these properties carry the hdfs. prefix (for example, hdfs.rollCount).
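Rolling by event count can be sketched in Python as follows. This is a simplified illustration, not the HDFS sink's code: the function roll_by_count is hypothetical, and each "file" is modelled as a plain list of events.

```python
def roll_by_count(events, roll_count=10):
    """Split a stream of events into 'files' holding roll_count events each."""
    files, current = [], []
    for event in events:
        current.append(event)
        if len(current) == roll_count:
            # The current file is full: close it and start a new one.
            files.append(current)
            current = []
    if current:
        # Events left over in a file that has not reached roll_count yet.
        files.append(current)
    return files


files = roll_by_count(range(25), roll_count=10)
print([len(f) for f in files])
```

With 25 events and a roll count of 10, the sketch produces two full files and one partial file. In Flume, a rollCount of 0 disables count-based rolling; the sketch behaves the same way and keeps everything in a single file when roll_count is 0.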

Example – Configuration File

Given below is an example of the configuration file. Copy this content and save it as seq_gen.conf in the conf folder of Flume.

# Naming the components on the current agent

SeqGenAgent.sources = SeqSource
SeqGenAgent.channels = MemChannel
SeqGenAgent.sinks = HDFS

# Describing/Configuring the source

SeqGenAgent.sources.SeqSource.type = seq

# Describing/Configuring the sink

SeqGenAgent.sinks.HDFS.type = hdfs
SeqGenAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/Hadoop/seqgen_data/
SeqGenAgent.sinks.HDFS.hdfs.filePrefix = log
SeqGenAgent.sinks.HDFS.hdfs.rollInterval = 0
SeqGenAgent.sinks.HDFS.hdfs.rollCount = 10000
SeqGenAgent.sinks.HDFS.hdfs.fileType = DataStream

# Describing/Configuring the channel

SeqGenAgent.channels.MemChannel.type = memory
SeqGenAgent.channels.MemChannel.capacity = 1000
SeqGenAgent.channels.MemChannel.transactionCapacity = 100

# Binding the source and sink to the channel

SeqGenAgent.sources.SeqSource.channels = MemChannel
SeqGenAgent.sinks.HDFS.channel = MemChannel
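A common slip in such a file is a source or sink bound to a channel name that the agent never declares. The Python sketch below shows one way to check for that before starting the agent. It is not part of Flume: the helper names parse_properties and unbound_components are hypothetical, CONFIG is a trimmed copy of the file above, and only the naming and binding lines are inspected.

```python
def parse_properties(text):
    """Read 'key = value' lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def unbound_components(props, agent):
    """Return sources and sinks that point at a channel the agent does not declare."""
    channels = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        wanted = props.get(f"{agent}.sources.{source}.channels", "").split()
        if not wanted or not set(wanted) <= channels:
            problems.append(source)
    for sink in props.get(f"{agent}.sinks", "").split():
        if props.get(f"{agent}.sinks.{sink}.channel") not in channels:
            problems.append(sink)
    return problems


CONFIG = """
SeqGenAgent.sources = SeqSource
SeqGenAgent.channels = MemChannel
SeqGenAgent.sinks = HDFS
SeqGenAgent.sources.SeqSource.channels = MemChannel
SeqGenAgent.sinks.HDFS.channel = MemChannel
"""

print(unbound_components(parse_properties(CONFIG), "SeqGenAgent"))
```

An empty list means every source and sink named by the agent is bound to a declared channel. A stray space inside a key, such as "sinks. HDFS.channel", would cause that sink to be reported.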

Execution
Change to the Flume home directory and execute the application as shown below.

$ cd $FLUME_HOME
$ ./bin/flume-ng agent --conf $FLUME_CONF --conf-file $FLUME_CONF/seq_gen.conf \
--name SeqGenAgent

If everything goes fine, the source starts generating sequence numbers which will be pushed into the HDFS in the form of log files.

Given below is a snapshot of the command prompt window fetching the data generated by the sequence generator into the HDFS.

Verifying the HDFS

You can access the Hadoop Administration Web UI using the following URL −

http://localhost:50070/

Click on the dropdown named Utilities on the right-hand side of the page. You can see two options as shown in the diagram given below.


Click on Browse the file system and enter the path of the HDFS directory where you have stored the data generated by the sequence generator.

In our example, the path will be /user/Hadoop/seqgen_data/. Then, you can see the list of log files generated by the sequence generator, stored
in the HDFS as given below.

Verifying the Contents of the File

All these log files contain sequential numbers. You can verify the contents of these files in the file system using the cat command as
shown below.


Apache Flume - NetCat Source

This chapter uses an example to explain how you can generate events and subsequently log them to the console. For this, we are using the
NetCat source and the logger sink.

Prerequisites

To run the example provided in this chapter, you need to install Flume.

Configuring Flume

We have to configure the source, the channel, and the sink using the configuration file in the conf folder. The example given in this chapter uses a
NetCat source, a memory channel, and a logger sink.

NetCat Source

While configuring the NetCat source, we have to specify a port. The source then listens on the given port, receives each line entered on that
port as an individual event, and transfers it to the sink through the specified channel.

While configuring this source, you have to provide values to the following properties −

channels

Source type − netcat

bind − Host name or IP address to bind.

port − Port number to which we want the source to listen.
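The listen-and-acknowledge behaviour described above can be imitated with Python's standard socketserver module. This is a sketch of the idea, not the NetCat source itself: LineHandler, start_listener, and the received list (which stands in for the channel) are hypothetical names, and port 0 is used so that the operating system picks a free port.

```python
import socket
import socketserver
import threading

received = []  # stands in for the channel in this sketch


class LineHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Treat every line sent by the client as one event and acknowledge it.
        for raw in self.rfile:
            received.append(raw.rstrip(b"\r\n"))
            self.wfile.write(b"OK\n")


def start_listener(host="localhost", port=0):
    """Start the listener on a background thread and return the server object."""
    server = socketserver.ThreadingTCPServer((host, port), LineHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_listener()
# Connect as a client, send one line, and read the acknowledgement.
with socket.create_connection(server.server_address) as conn, \
        conn.makefile("rb") as reader:
    conn.sendall(b"hello\n")
    ack = reader.readline()
server.shutdown()
server.server_close()
print(ack, received)
```

Each line becomes one entry in received, and the client gets one "OK" per line, which mirrors the one-event-per-line behaviour of the source.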

Channel

type − It holds the type of the channel. In our example, the type is memory (MemChannel is the name we give to the channel).

capacity − It is the maximum number of events stored in the channel. Its default value is 100. (optional)

transactionCapacity − It is the maximum number of events the channel accepts from a source or gives to a sink per transaction. Its default value is 100. (optional)

Logger Sink

This sink logs all the events passed to it. Generally, it is used for testing or debugging purposes. To configure this sink, you must provide the
following details.

channel

type − logger

Example Configuration File

Given below is an example of the configuration file. Copy this content and save as netcat.conf in the conf folder of Flume.

# Naming the components on the current agent

NetcatAgent.sources = Netcat
NetcatAgent.channels = MemChannel
NetcatAgent.sinks = LoggerSink

# Describing/Configuring the source
NetcatAgent.sources.Netcat.type = netcat
NetcatAgent.sources.Netcat.bind = localhost
NetcatAgent.sources.Netcat.port = 56565

# Describing/Configuring the sink

NetcatAgent.sinks.LoggerSink.type = logger

# Describing/Configuring the channel

NetcatAgent.channels.MemChannel.type = memory
NetcatAgent.channels.MemChannel.capacity = 1000
NetcatAgent.channels.MemChannel.transactionCapacity = 100

# Bind the source and sink to the channel

NetcatAgent.sources.Netcat.channels = MemChannel
NetcatAgent.sinks.LoggerSink.channel = MemChannel

Execution

Change to the Flume home directory and execute the application as shown below.

$ cd $FLUME_HOME
$ ./bin/flume-ng agent --conf $FLUME_CONF --conf-file $FLUME_CONF/netcat.conf \
--name NetcatAgent -Dflume.root.logger=INFO,console

If everything goes fine, the source starts listening on the given port. In this case, it is 56565. Given below is the snapshot of the command prompt
window of a NetCat source which has started and is listening on port 56565.

Passing Data to the Source

To pass data to the NetCat source, you have to connect to the port given in the configuration file. Open a separate terminal and connect to the source
(56565) using the curl command. When the connection is successful, you will get a message “connected” as shown below.

$ curl telnet://localhost:56565
connected

Now you can enter your data line by line (after each line, you have to press Enter). The NetCat source receives each line as an individual event
and you will get a received message “OK”.
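If you prefer to send the lines from a script instead of typing them, a small Python client can do the same job. This is a sketch under assumptions: the function name send_lines and the 5-second timeout are hypothetical choices, and the client simply expects one reply line for every line it sends.

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send each line to a line-oriented TCP listener and collect one reply per line."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn, \
            conn.makefile("rb") as reader:
        for line in lines:
            # A trailing newline marks the end of one event.
            conn.sendall(line.encode("utf-8") + b"\n")
            replies.append(reader.readline().decode("utf-8").strip())
    return replies
```

With the agent from this chapter running, calling send_lines("localhost", 56565, ["hello", "world"]) should return one acknowledgement for each line sent.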

Whenever you are done with passing data, you can exit the console by pressing (Ctrl+C). Given below is the snapshot of the console where we
have connected to the source using the curl command.


Each line that is entered in the above console will be received as an individual event by the source. Since we have used the Logger sink, these
events will be logged on to the console (source console) through the specified channel (memory channel in this case).
The following snapshot shows the NetCat console where the events are logged.


