YARN Essentials
Ebook · 319 pages · 1 hour

About this ebook

About This Book
  • Learn the inner workings of YARN and how its robust and generic framework enables optimal resource utilization across multiple applications
  • Get to grips with single and multi-node installation, administration, and real-time distributed application development
  • A step-by-step self-learning guide to help you perform optimal resource utilization in a cluster
Who This Book Is For

If you have a working knowledge of Hadoop 1.x but want to start afresh with YARN, this book is ideal for you. You will be able to install and administer a YARN cluster and also discover the configuration settings to fine-tune your cluster both in terms of performance and scalability. This book will help you develop, deploy, and run multiple applications/frameworks on the same shared YARN cluster.

Language: English
Release date: Feb 24, 2015
ISBN: 9781784397722

    Book preview

    YARN Essentials - Amol Fasale

    Table of Contents

    YARN Essentials

    Credits

    About the Authors

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    Why subscribe?

    Free access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. Need for YARN

    The redesign idea

    Limitations of the classical MapReduce or Hadoop 1.x

    YARN as the modern operating system of Hadoop

    What are the design goals for YARN

    Summary

    2. YARN Architecture

    Core components of YARN architecture

    ResourceManager

    ApplicationMaster (AM)

    NodeManager (NM)

    YARN scheduler policies

    The FIFO (First In First Out) scheduler

    The fair scheduler

    The capacity scheduler

    Recent developments in YARN architecture

    Summary

    3. YARN Installation

    Single-node installation

    Prerequisites

    Platform

    Software

    Starting with the installation

    The standalone mode (local mode)

    The pseudo-distributed mode

    The fully-distributed mode

    HistoryServer

    Slave files

    Operating Hadoop and YARN clusters

    Starting Hadoop and YARN clusters

    Stopping Hadoop and YARN clusters

    Web interfaces of the Ecosystem

    Summary

    4. YARN and Hadoop Ecosystems

    The Hadoop 2 release

    A short introduction to Hadoop 1.x and MRv1

    MRv1 versus MRv2

    Understanding where YARN fits into Hadoop

    Old and new MapReduce APIs

    Backward compatibility of MRv2 APIs

    Binary compatibility of org.apache.hadoop.mapred APIs

    Source compatibility of org.apache.hadoop.mapred APIs

    Practical examples of MRv1 and MRv2

    Preparing the input file(s)

    Running the job

    Result

    Summary

    5. YARN Administration

    Container allocation

    Container allocation to the application

    Container configurations

    YARN scheduling policies

    The FIFO (First In First Out) scheduler

    The capacity scheduler

    Capacity scheduler configurations

    The fair scheduler

    Fair scheduler configurations

    YARN multitenancy application support

    Administration of YARN

    Administrative tools

    Adding and removing nodes from a YARN cluster

    Administrating YARN jobs

    MapReduce job configurations

    YARN log management

    YARN web user interface

    Summary

    6. Developing and Running a Simple YARN Application

    Running sample examples on YARN

    Running a sample Pi example

    Monitoring YARN applications with web GUI

    YARN's MapReduce support

    The MapReduce ApplicationMaster

    Example YARN MapReduce settings

    YARN's compatibility with MapReduce applications

    Developing YARN applications

    The YARN application workflow

    Writing the YARN client

    Writing the YARN ApplicationMaster

    Responsibilities of the ApplicationMaster

    Summary

    7. YARN Frameworks

    Apache Samza

    Writing a Kafka producer

    Writing the hello-samza project

    Starting a grid

    Storm-YARN

    Prerequisites

    Hadoop YARN should be installed

    Apache ZooKeeper should be installed

    Setting up Storm-YARN

    Getting the storm.yaml configuration of the launched Storm cluster

    Building and running Storm-Starter examples

    Apache Spark

    Why run on YARN?

    Apache Tez

    Apache Giraph

    HOYA (HBase on YARN)

    KOYA (Kafka on YARN)

    Summary

    8. Failures in YARN

    ResourceManager failures

    ApplicationMaster failures

    NodeManager failures

    Container failures

    Hardware Failures

    Summary

    9. YARN – Alternative Solutions

    Mesos

    Omega

    Corona

    Summary

    10. YARN – Future and Support

    What YARN means to the big data industry

    Journey – present and future

    Present on-going features

    Future features

    YARN-supported frameworks

    Summary

    Index

    YARN Essentials


    Copyright © 2015 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: February 2015

    Production reference: 1190215

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78439-173-7

    www.packtpub.com

    Credits

    Authors

    Amol Fasale

    Nirmal Kumar

    Reviewers

    Lakshmi Narasimhan

    Swapnil Salunkhe

    Jenny (Xiao) Zhang

    Commissioning Editor

    Taron Pereira

    Acquisition Editor

    James Jones

    Content Development Editor

    Arwa Manasawala

    Technical Editor

    Indrajit A. Das

    Copy Editors

    Karuna Narayanan

    Laxmi Subramanian

    Project Coordinator

    Purav Motiwalla

    Proofreaders

    Safis Editing

    Maria Gould

    Indexer

    Priya Sane

    Graphics

    Sheetal Aute

    Valentina D'silva

    Abhinash Sahu

    Production Coordinator

    Shantanu N. Zagade

    Cover Work

    Shantanu N. Zagade

    About the Authors

    Amol Fasale has more than 4 years of industry experience in big data and distributed computing; he is also an active blogger and a contributor to the open source community. Amol works as a senior data system engineer at MakeMyTrip.com, a well-known travel and hospitality portal in India, where he is responsible for real-time personalization of the online user experience using Apache Kafka, Apache Storm, Apache Hadoop, and other technologies. Amol also has hands-on experience in Java/J2EE, Spring Frameworks, Python, machine learning, Hadoop framework components, SQL, NoSQL, and graph databases.

    You can follow Amol on Twitter at @amolfasale or on LinkedIn. Amol is very active on social media. You can catch him online for any technical assistance; he would be happy to help.

    Amol completed his bachelor's degree in engineering (electronics and telecommunication) at Pune University and a postgraduate diploma in computers at CDAC.

    The gift of love is one of the greatest blessings from parents, and I am heartily thankful to my mom, dad, friends, and colleagues who have shown and continue to show their support in different ways. Finally, I owe much to James and Arwa, without whose direction and understanding I would not have completed this work.

    Nirmal Kumar is a lead software engineer at iLabs, the R&D team at Impetus Infotech Pvt. Ltd. He has more than 8 years of experience in open source technologies such as Java, JEE, Spring, Hibernate, web services, Hadoop, Hive, Flume, Sqoop, Kafka, Storm, NoSQL databases such as HBase and Cassandra, and MPP databases such as Teradata.

    You can follow him on Twitter at @nirmal___kumar. He spends most of his time reading about and playing with different technologies. He has also given many tech talks and training sessions on big data technologies.

    He holds a master's degree in computer applications from Harcourt Butler Technological Institute (HBTI), Kanpur, India and is currently part of the big data R&D team in iLabs at Impetus Infotech Pvt. Ltd.

    I would like to thank my organization, especially iLabs, for supporting me in writing this book. Also, a special thanks to the Packt Publishing team; without you guys, this work would not have been possible.

    About the Reviewers

    Lakshmi Narasimhan is a full stack developer who has been working on big data and search since the early days of Lucene and was a part of the search team at Ask.com. He is a big advocate of open source and regularly contributes and consults on various technologies, most notably Drupal and technologies related to big data. Lakshmi is currently working as the curriculum designer for his own training company, http://www.readybrains.com. He blogs occasionally about his technical endeavors at http://www.lakshminp.com and can be contacted via his Twitter handle, @lakshminp.

    It's hard to find a ready reference or documentation for a subject like YARN. I'd like to thank the author for writing a book on YARN and hope the target audience finds it useful.

    Swapnil Salunkhe is a passionate software developer who is keenly interested in learning and implementing new technologies. He has a passion for functional programming, machine learning, and working with data. He has experience working in the finance and telecom domains.

    I'd like to thank Packt Publishing and its staff for an opportunity to contribute to this book.

    Jenny (Xiao) Zhang is a technology professional in business analytics, KPIs, and big data. She helps businesses better manage, measure, report, and analyze data to answer critical business questions and drive business growth. She is an expert in SaaS business and has experience in a variety of industry domains such as telecom, oil and gas, and finance. She has written a number of blog posts at http://jennyxiaozhang.com on big data, Hadoop, and YARN. She also actively uses Twitter at @smallnaruto to share insights on big data and analytics.

    I want to thank all my blog readers. It is the encouragement from them that motivates me to deep dive into the ocean of big data. I also want to thank my dad, Michael (Tiegang) Zhang, for providing technical insights in the process of reviewing the book. A special thanks to the Packt Publishing team for this great opportunity.

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Free access for Packt account holders

    If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

    Preface

    In a short span of time, YARN has attained a great deal of momentum and acceptance in the big data world.

    YARN Essentials is about YARN, the modern operating system for Hadoop. This book contains all that you need to know about YARN, right from its inception to the present and future.

    In the first part of the
