YARN Essentials
By Amol Fasale and Nirmal Kumar
About this ebook
- Learn the inner workings of YARN and how its robust and generic framework enables optimal resource utilization across multiple applications
- Get to grips with single and multi-node installation, administration, and real-time distributed application development
- A step-by-step self-learning guide to help you perform optimal resource utilization in a cluster
If you have a working knowledge of Hadoop 1.x but want to start afresh with YARN, this book is ideal for you. You will be able to install and administer a YARN cluster and also discover the configuration settings to fine-tune your cluster both in terms of performance and scalability. This book will help you develop, deploy, and run multiple applications/frameworks on the same shared YARN cluster.
Book preview
YARN Essentials - Amol Fasale
Table of Contents
YARN Essentials
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Need for YARN
The redesign idea
Limitations of the classical MapReduce or Hadoop 1.x
YARN as the modern operating system of Hadoop
What are the design goals for YARN
Summary
2. YARN Architecture
Core components of YARN architecture
ResourceManager
ApplicationMaster (AM)
NodeManager (NM)
YARN scheduler policies
The FIFO (First In First Out) scheduler
The fair scheduler
The capacity scheduler
Recent developments in YARN architecture
Summary
3. YARN Installation
Single-node installation
Prerequisites
Platform
Software
Starting with the installation
The standalone mode (local mode)
The pseudo-distributed mode
The fully-distributed mode
HistoryServer
Slave files
Operating Hadoop and YARN clusters
Starting Hadoop and YARN clusters
Stopping Hadoop and YARN clusters
Web interfaces of the Ecosystem
Summary
4. YARN and Hadoop Ecosystems
The Hadoop 2 release
A short introduction to Hadoop 1.x and MRv1
MRv1 versus MRv2
Understanding where YARN fits into Hadoop
Old and new MapReduce APIs
Backward compatibility of MRv2 APIs
Binary compatibility of org.apache.hadoop.mapred APIs
Source compatibility of org.apache.hadoop.mapred APIs
Practical examples of MRv1 and MRv2
Preparing the input file(s)
Running the job
Result
Summary
5. YARN Administration
Container allocation
Container allocation to the application
Container configurations
YARN scheduling policies
The FIFO (First In First Out) scheduler
The capacity scheduler
Capacity scheduler configurations
The fair scheduler
Fair scheduler configurations
YARN multitenancy application support
Administration of YARN
Administrative tools
Adding and removing nodes from a YARN cluster
Administrating YARN jobs
MapReduce job configurations
YARN log management
YARN web user interface
Summary
6. Developing and Running a Simple YARN Application
Running sample examples on YARN
Running a sample Pi example
Monitoring YARN applications with web GUI
YARN's MapReduce support
The MapReduce ApplicationMaster
Example YARN MapReduce settings
YARN's compatibility with MapReduce applications
Developing YARN applications
The YARN application workflow
Writing the YARN client
Writing the YARN ApplicationMaster
Responsibilities of the ApplicationMaster
Summary
7. YARN Frameworks
Apache Samza
Writing a Kafka producer
Writing the hello-samza project
Starting a grid
Storm-YARN
Prerequisites
Hadoop YARN should be installed
Apache ZooKeeper should be installed
Setting up Storm-YARN
Getting the storm.yaml configuration of the launched Storm cluster
Building and running Storm-Starter examples
Apache Spark
Why run on YARN?
Apache Tez
Apache Giraph
HOYA (HBase on YARN)
KOYA (Kafka on YARN)
Summary
8. Failures in YARN
ResourceManager failures
ApplicationMaster failures
NodeManager failures
Container failures
Hardware Failures
Summary
9. YARN – Alternative Solutions
Mesos
Omega
Corona
Summary
10. YARN – Future and Support
What YARN means to the big data industry
Journey – present and future
Present on-going features
Future features
YARN-supported frameworks
Summary
Index
YARN Essentials
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2015
Production reference: 1190215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-173-7
www.packtpub.com
Credits
Authors
Amol Fasale
Nirmal Kumar
Reviewers
Lakshmi Narasimhan
Swapnil Salunkhe
Jenny (Xiao) Zhang
Commissioning Editor
Taron Pereira
Acquisition Editor
James Jones
Content Development Editor
Arwa Manasawala
Technical Editor
Indrajit A. Das
Copy Editors
Karuna Narayanan
Laxmi Subramanian
Project Coordinator
Purav Motiwalla
Proofreaders
Safis Editing
Maria Gould
Indexer
Priya Sane
Graphics
Sheetal Aute
Valentina D'silva
Abhinash Sahu
Production Coordinator
Shantanu N. Zagade
Cover Work
Shantanu N. Zagade
About the Authors
Amol Fasale has more than 4 years of industry experience actively working in the fields of big data and distributed computing; he is also an active blogger in, and contributor to, the open source community. Amol works as a senior data system engineer at MakeMyTrip.com, a very well-known travel and hospitality portal in India, where he is responsible for real-time personalization of the online user experience with Apache Kafka, Apache Storm, Apache Hadoop, and many more. Amol also has hands-on experience in Java/J2EE, Spring Frameworks, Python, machine learning, Hadoop framework components, SQL, NoSQL, and graph databases.
You can follow Amol on Twitter at @amolfasale or on LinkedIn. Amol is very active on social media. You can catch him online for any technical assistance; he would be happy to help.
Amol completed his bachelor's degree in engineering (electronics and telecommunication) at Pune University and a postgraduate diploma in computers at CDAC.
The gift of love is one of the greatest blessings from parents, and I am heartily thankful to my mom, dad, friends, and colleagues who have shown and continue to show their support in different ways. Finally, I owe much to James and Arwa without whose direction and understanding, I would not have completed this work.
Nirmal Kumar is a lead software engineer at iLabs, the R&D team at Impetus Infotech Pvt. Ltd. He has more than 8 years of experience in open source technologies such as Java, JEE, Spring, Hibernate, web services, Hadoop, Hive, Flume, Sqoop, Kafka, Storm, NoSQL databases such as HBase and Cassandra, and MPP databases such as Teradata.
You can follow him on Twitter at @nirmal___kumar. He spends most of his time reading about and playing with different technologies. He has also given many tech talks and training sessions on big data technologies.
He holds a master's degree in computer applications from Harcourt Butler Technological Institute (HBTI), Kanpur, India, and is currently part of the big data R&D team in iLabs at Impetus Infotech Pvt. Ltd.
I would like to thank my organization, especially iLabs, for supporting me in writing this book. Also, a special thanks to the Packt Publishing team; without you guys, this work would not have been possible.
About the Reviewers
Lakshmi Narasimhan is a full stack developer who has been working on big data and search since the early days of Lucene and was a part of the search team at Ask.com. He is a big advocate of open source and regularly contributes and consults on various technologies, most notably Drupal and technologies related to big data. Lakshmi is currently working as the curriculum designer for his own training company, http://www.readybrains.com. He blogs occasionally about his technical endeavors at http://www.lakshminp.com and can be contacted via his Twitter handle, @lakshminp.
It's hard to find a ready reference or documentation for a subject like YARN. I'd like to thank the author for writing a book on YARN and hope the target audience finds it useful.
Swapnil Salunkhe is a passionate software developer who is keenly interested in learning and implementing new technologies. He has a passion for functional programming, machine learning, and working with data. He has experience working in the finance and telecom domains.
I'd like to thank Packt Publishing and its staff for an opportunity to contribute to this book.
Jenny (Xiao) Zhang is a technology professional in business analytics, KPIs, and big data. She helps businesses better manage, measure, report, and analyze data to answer critical business questions and drive business growth. She is an expert in SaaS business and has experience in a variety of industry domains such as telecom, oil and gas, and finance. She has written a number of blog posts at http://jennyxiaozhang.com on big data, Hadoop, and YARN. She also actively uses Twitter at @smallnaruto to share insights on big data and analytics.
I want to thank all my blog readers. It is the encouragement from them that motivates me to deep dive into the ocean of big data. I also want to thank my dad, Michael (Tiegang) Zhang, for providing technical insights in the process of reviewing the book. A special thanks to the Packt Publishing team for this great opportunity.
www.PacktPub.com
Support files, eBooks, discount offers, and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Preface
In a short span of time, YARN has attained a great deal of momentum and acceptance in the big data world.
YARN Essentials is about YARN, the modern operating system for Hadoop. This book contains all that you need to know about YARN, right from its inception to the present and future.
In the first part of the