Real-Time Processing of Events (Sensor, Telecommunications, Fraud Etc.) Even

Apache Hadoop v2 represents a major shift in Hadoop's architecture with the introduction of YARN. YARN separates resource management from job processing, allowing Hadoop to support various workloads beyond batch processing like real-time analytics and interactive SQL queries. Key features of Hadoop v2 include YARN, high availability for HDFS, HDFS federation, and improved performance and Windows support. The community continues enhancing capabilities like YARN scheduling and long-running services.

Uploaded by

amitbcm007

Apache Hadoop v2 is not just a major release number; it represents a generational shift
in the architecture of Apache Hadoop. With YARN, Apache Hadoop is recast as a
significantly more powerful platform, one that takes Hadoop beyond merely batch
applications to taking its position as a data operating system.
To recap, Apache Hadoop v1 comprised HDFS and MapReduce.
With HDFS one could store data of all kinds; however, MapReduce was the only
processing model you could use to work on that data in parallel. That was very limiting
since MapReduce, although very general, proved inadequate to satisfy all the demands
being placed on Apache Hadoop.
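
To make the model concrete, here is a minimal sketch of a MapReduce-style word count in plain Python. It is not Hadoop code: the helper names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and everything runs in one process rather than in parallel across a cluster. It only shows the shape that every computation had to fit under Hadoop v1: a map step, a group-by-key shuffle, and a reduce step.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterator[Pair]]) -> Iterator[Pair]:
    """Apply the mapper to every input record and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Apply the reducer once per key to the grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[Pair]:
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key: str, values: List[int]) -> int:
    """Add up the counts collected for one word."""
    return sum(values)


def run_job(records: Iterable[str],
            mapper: Callable[[str], Iterator[Pair]],
            reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Chain map, shuffle and reduce into a single job."""
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


if __name__ == "__main__":
    lines = ["store all data in HDFS", "process that data in parallel"]
    print(run_job(lines, word_count_mapper, sum_reducer))
```

Anything that does not decompose naturally into these three steps (iterative graph algorithms, low-latency queries, continuous event streams) has to be forced into a chain of such jobs, which is the limitation described above.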
As Apache Hadoop crystallizes into a key component of a Modern Data Architecture,
users and customers want to store all data in HDFS and interact with that data in
multiple ways:

Real-time processing of events (sensor, telecommunications, fraud, etc.) even before they land on HDFS

Interactive query capabilities for interrogating new data, for data analysts (SQL) and data scientists (SQL plus scripting, etc.)

The need to productionize the insight, i.e. batch processing, reporting, etc., in a well-defined and timely manner

The community has worked together to make HDFS itself a much more scalable,
efficient and enterprise-friendly storage platform by addressing key functionality: High
Availability for the HDFS NameNode, Federation for scaling, and HDFS Snapshots, to list a
few.
With YARN, Apache Hadoop now clearly delineates the system (resource management,
security, SLAs etc.) from the application framework (e.g. MapReduce) and allows for
multiple ways to interact with the data in HDFS (batch with MapReduce, streaming with
Apache Storm, interactive SQL with Apache Hive and Apache Tez).

We are already seeing the benefits of this vision in the form of many and varied
applications and services being re-vectored on top of YARN such as Apache Storm for
event processing, Apache Giraph for graph processing, Apache Tez for interactive SQL
queries, HOYA for running services such as Apache HBase and Apache Accumulo on
YARN and so on. Exciting times indeed!
As a result, the Hadoop stack looks very different with Hadoop v2.

Personally, it's a huge thrill to see this baby grow up and reach adulthood since
the original Jira ticket (MAPREDUCE-279) was opened more than 5 years ago!

Apache Hadoop v2
As a lot of people are aware, Apache Hadoop 2 landed the Beta tag a few months ago.
Since then the community has spent a lot of time validating the APIs, protocols and the
system itself. As a result, we are now very confident not only in our ability to handle the
workloads that will be thrown at Apache Hadoop, but also in our ability to do so in a
forward-compatible manner, such that Apache Hadoop v2 represents a stable base atop
which the ecosystem can flourish in the future.
For those who, like me, are more comfortable with simplified lists (*smile*), here are the
enhancements and major features:

YARN

High Availability for HDFS

HDFS Federation

HDFS Snapshots

NFSv3 access to data in HDFS

Binary Compatibility for MapReduce applications between Hadoop v1 and Hadoop v2 to ease migration

Performance

Support for running Hadoop on Microsoft Windows

Integration testing for the entire Apache Hadoop ecosystem at the ASF.

Onwards
Although it's a major milestone and a big reason to celebrate, the Apache Hadoop
community will continue to drive it forward under the aegis of the ASF. There are
ever more things to do, use cases to fulfill and users to thrill. The HDFS community is
striving hard to finish up the addition of symlinks to HDFS, which just didn't make the cut
at the last minute. On the YARN side we plan to add more enhancements such as
advanced scheduling features, high availability for the YARN ResourceManager, enhanced
support for long-running services, and generally make it easier to run other applications
such as Apache Storm within YARN. Stay tuned!

Terminology and Architecture

MapReduce from Hadoop 1 (MapReduce 1) has been split into two components. The cluster resource
management capabilities have become YARN (Yet Another Resource Negotiator), while the
MapReduce-specific capabilities remain MapReduce. In the MapReduce 1 architecture, the cluster was
managed by a service called the JobTracker. TaskTracker services lived on each node and would launch
tasks on behalf of jobs. The JobTracker would also serve information about completed jobs. In MapReduce 2,
the functions of the JobTracker have been split among three services. The ResourceManager is a persistent
YARN service that receives and runs applications (a MapReduce job is an application) on the cluster. It
contains the scheduler, which, as previously, is pluggable. The MapReduce-specific capabilities of the
JobTracker have been moved into the MapReduce Application Master, one of which is started to manage each
MapReduce job and terminated when the job completes. The JobTracker's function of serving information
about completed jobs has been moved to the JobHistoryServer. The TaskTracker has been replaced with
the NodeManager, a YARN service that manages resources and deployment on a node. It is responsible
for launching containers, each of which can house a map or reduce task.
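
The division of labor described above can be sketched as a toy model in Python. This is an illustration only, not the YARN API: the class names mirror the services in the text, but the methods, the memory sizes, and the first-fit placement rule are assumptions chosen for brevity. In real YARN the Application Master itself runs in a container and talks to the ResourceManager and NodeManagers over RPC, which is left out here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Container:
    """A slice of one node's resources that houses a single task."""
    node: str
    memory_mb: int
    task: str


@dataclass
class NodeManager:
    """Per-node service: tracks free memory and launches containers."""
    name: str
    free_mb: int
    containers: List[Container] = field(default_factory=list)

    def launch(self, memory_mb: int, task: str) -> Optional[Container]:
        if memory_mb > self.free_mb:
            return None  # not enough room on this node
        self.free_mb -= memory_mb
        container = Container(self.name, memory_mb, task)
        self.containers.append(container)
        return container


class ResourceManager:
    """Cluster-wide service: decides which node hosts each container."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes

    def allocate(self, memory_mb: int, task: str) -> Optional[Container]:
        # First-fit placement, standing in for the pluggable scheduler.
        for node in self.nodes:
            container = node.launch(memory_mb, task)
            if container is not None:
                return container
        return None


class MapReduceApplicationMaster:
    """One instance per job: requests a container for each map/reduce task."""

    def __init__(self, job: str, num_maps: int, num_reduces: int,
                 task_mb: int = 1024) -> None:
        self.task_mb = task_mb
        self.tasks = [f"{job}-map-{i}" for i in range(num_maps)]
        self.tasks += [f"{job}-reduce-{i}" for i in range(num_reduces)]

    def run(self, rm: ResourceManager) -> List[Container]:
        placed = []
        for task in self.tasks:
            container = rm.allocate(self.task_mb, task)
            if container is None:
                raise RuntimeError(f"no capacity left for {task}")
            placed.append(container)
        return placed


if __name__ == "__main__":
    nodes = [NodeManager("node1", 2048), NodeManager("node2", 2048)]
    am = MapReduceApplicationMaster("wordcount", num_maps=3, num_reduces=1)
    for c in am.run(ResourceManager(nodes)):
        print(c.task, "->", c.node)
```

Note that nothing in `ResourceManager` or `NodeManager` knows about maps or reduces; only the Application Master does. That separation is what lets other frameworks reuse the same cluster services.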

The new architecture has its advantages. First, by breaking up the JobTracker into a few different
services, it avoids many of the scaling issues faced by MapReduce in Hadoop 1. More importantly, it
makes it possible to run frameworks other than MapReduce on a Hadoop cluster. For example, Impala
can also run on YARN and share resources on a cluster with MapReduce.
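
As a rough illustration of what sharing resources on a cluster means, the sketch below models one memory pool divided between two frameworks by fixed percentage shares. It is a simplification under stated assumptions: `SharedCluster`, its methods, and the 50/50 split are hypothetical, and real YARN schedulers are considerably richer (for example, they can lend idle capacity from one queue to another).

```python
from typing import Dict


class SharedCluster:
    """One pool of memory shared by several frameworks with fixed shares."""

    def __init__(self, total_mb: int, share_percent: Dict[str, int]) -> None:
        if sum(share_percent.values()) > 100:
            raise ValueError("shares must not exceed 100 percent")
        self.total_mb = total_mb
        self.share_percent = share_percent
        self.used_mb = {name: 0 for name in share_percent}

    def limit_mb(self, framework: str) -> int:
        """Memory this framework may hold at most (hard cap in this toy)."""
        return self.total_mb * self.share_percent[framework] // 100

    def request(self, framework: str, memory_mb: int) -> bool:
        """Grant the request only if the framework stays within its share."""
        if self.used_mb[framework] + memory_mb > self.limit_mb(framework):
            return False
        self.used_mb[framework] += memory_mb
        return True

    def release(self, framework: str, memory_mb: int) -> None:
        """Return memory to the pool when a container finishes."""
        self.used_mb[framework] = max(0, self.used_mb[framework] - memory_mb)

    def free_mb(self) -> int:
        """Memory not currently held by any framework."""
        return self.total_mb - sum(self.used_mb.values())


if __name__ == "__main__":
    cluster = SharedCluster(8192, {"mapreduce": 50, "impala": 50})
    print(cluster.request("mapreduce", 4096))  # fits within the 50% share
    print(cluster.request("mapreduce", 1024))  # would exceed the share
    print(cluster.request("impala", 2048))
    print(cluster.free_mb())
```

The point of the sketch is only that both frameworks draw from the same pool under one arbiter, rather than each needing its own dedicated cluster.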

http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html
