Presentation of Big Data

This document discusses how large amounts of data can be collected, aggregated, and moved between the Hadoop Distributed File System and relational database management systems using tools like Sqoop and Flume. It explains that Flume is used for ingesting streaming data into Hadoop, while Sqoop can import data from relational databases into Hadoop.


Big Data refers to information that cannot be processed or analyzed using traditional processes or tools such as a relational database management system (RDBMS). Analytical processing requires large amounts of data, and this data is loaded from different sources into Hadoop clusters (i.e. clusters used for storing and analyzing huge amounts of data in a distributed manner). Sourcing this bulk data into Hadoop clusters from different sources raises problems such as maintaining and ensuring data consistency, since each data source may hold data in a different form and structure. The best way of collecting, aggregating, and moving large amounts of data between the Hadoop Distributed File System (HDFS) and an RDBMS is to use tools such as Sqoop or Flume.

On the one hand, you may have a Hadoop cluster used for processing and storing large amounts of data; on the other hand, you have an application that produces a large amount of data, or a legacy system that stores its data in a relational database. How do you connect the two? That is exactly where Flume and Sqoop come in.
Flume is used for ingesting streaming data into Hadoop; it has three major components: sources, channels, and sinks.
Sqoop is used to port data from an existing relational database into Hadoop; for example, we can use Sqoop to import data from MySQL into Hadoop.
Generally, the data comes from two kinds of sources: either an application that produces data on a regular basis, or a traditional relational database management system such as Oracle DB or SQL Server. In both cases you have sources that contain data and a destination that is a Hadoop ecosystem data store.

The question now is: how do we get our data from these sources into Hadoop?
Of course, after the introduction you will say Flume and Sqoop, but let us explain how this process is done, or rather, what the steps would be in the absence of these tools.
Normally, every Hadoop ecosystem technology exposes Java APIs (application programming interfaces), and you can use these APIs directly to write data to, for example, HDFS, HBase, or Cassandra.
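
As a rough illustration, here is a minimal sketch of writing a record to HDFS through its Java FileSystem API; the namenode URI, output path, and record content are placeholder assumptions, not values from this document.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode address; normally picked up from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);
        // Create a new file in HDFS and write one line to it.
        try (FSDataOutputStream out = fs.create(new Path("/data/events/sample.txt"))) {
            out.writeBytes("one event record\n");
        }
        fs.close();
    }
}

Writing a single file like this is straightforward; the difficulties described next come from doing it continuously and reliably for a stream of small events.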
But there are a few reasons why that can be harder than it looks, depending on whether you are transferring data from an application or bulk-transferring data from an RDBMS.
Let us start with the application case. Suppose that a number of events produce data for this application, and the data needs to be stored as the events occur; this is called streaming data. To handle it yourself you would have to:
1- First integrate your application with HDFS's Java API.
2- Create a mechanism to buffer your data, because HDFS files have to be large to take advantage of its distributed architecture. This means buffering the data in memory or in an intermediate file before writing it to HDFS.
3- Make the buffer layer fault-tolerant and non-lossy: you should not lose any data even if there is a crash, so you need a guarantee that no data will be lost.
All of these difficulties and problems are taken care of by Flume.
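
As a sketch of what this looks like from the application side, the snippet below hands an event to a Flume agent through Flume's client SDK instead of writing to HDFS directly; the agent hostname and port are placeholder assumptions and presume an agent configured with an Avro source.

import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeClientSketch {
    public static void main(String[] args) throws EventDeliveryException {
        // Placeholder host/port of a Flume agent running an Avro source.
        RpcClient client = RpcClientFactory.getDefaultInstance("flume-agent-host", 41414);
        // Build one event and send it; the agent's channel and sink handle
        // buffering and reliable delivery to the store (for example HDFS).
        Event event = EventBuilder.withBody("one application event", StandardCharsets.UTF_8);
        client.append(event);
        client.close();
    }
}

The buffering and fault tolerance listed above are handled by the agent's channel and sink rather than by the application itself.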

There are also a few problems with taking data from an RDBMS and integrating directly with a Java API.
Let us say that you have a legacy system that uses an RDBMS and you want to port its data to HDFS.
The first option is to:
1/ Dump all your tables into large files and then manually copy these files to HDFS.
The second option is to:
2/ Use scripts that read the data from the RDBMS and then write it out to HDFS, as in the sketch below.
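
To make the second option concrete, here is a minimal sketch of such a hand-written transfer: it reads rows over JDBC and writes them to HDFS as CSV. The connection URL, credentials, table, columns, and output path are hypothetical placeholders.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ManualRdbmsToHdfs {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC connection details for a MySQL database.
        Connection db = DriverManager.getConnection(
                "jdbc:mysql://dbhost:3306/salesdb", "user", "password");
        FileSystem fs = FileSystem.get(new Configuration());
        try (Statement stmt = db.createStatement();
             ResultSet rows = stmt.executeQuery("SELECT id, amount FROM orders");
             FSDataOutputStream out = fs.create(new Path("/warehouse/orders.csv"))) {
            // Write each row as a comma-separated line.
            while (rows.next()) {
                out.writeBytes(rows.getLong("id") + "," + rows.getDouble("amount") + "\n");
            }
        }
        db.close();
    }
}

Such a script has to be maintained, parallelized, and restarted by hand whenever it fails, which is exactly the work Sqoop automates.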
Fortunately, thanks to Sqoop, we do not need to implement or choose either of these options by hand.

So Flume and Sqoop are technologies developed to isolate and abstract the transport of data between a source and the data store.
Both are open-source technologies developed by Apache. Their roles in the Hadoop ecosystem are somewhat similar, but the use cases for each are slightly different.

The first difference is that Flume is designed for continuously ingesting streaming event data, while Sqoop is designed for bulk transfers between relational databases and Hadoop.
