Hadoop Platform & Services
➔ ES (Elasticsearch), Druid: other data stores used for analytics and reporting
▪ Secondary NameNode
‣ Periodically merges the FS image with the edit log (checkpointing) … the NameNode is not taken down to merge
‣ Serves as a stale copy of the FS image … edits made since the last checkpoint can be lost
http://blog.cloudera.com/blog/2012/03/high-availability-for-the-hadoop-distributed-file-system-hdfs/
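A toy sketch of what a checkpoint does, assuming a hypothetical in-memory namespace (a dict of path → metadata) and an edit log of (op, path, arg) records; the real fsimage and edits files are binary formats that this does not model. Replaying the log over the last image yields the new image. Anything logged after that point exists only on the NameNode, which is why the Secondary's copy is stale.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: namespace = {path: metadata}, edit = (op, path, arg).
Edit = Tuple[str, str, object]


def apply_edit(image: Dict[str, dict], edit: Edit) -> None:
    """Apply one logged namespace operation to the image in place."""
    op, path, arg = edit
    if op == "create":
        image[path] = {"replication": arg}
    elif op == "delete":
        image.pop(path, None)
    elif op == "set_replication":
        image[path]["replication"] = arg
    else:
        raise ValueError(f"unknown op {op!r}")


def checkpoint(fsimage: Dict[str, dict], edit_log: List[Edit]) -> Dict[str, dict]:
    """Replay the edit log over a copy of the last FS image and return the new image."""
    new_image = {path: dict(meta) for path, meta in fsimage.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image


if __name__ == "__main__":
    image = {"/data/a.txt": {"replication": 3}}
    edits = [("create", "/data/b.txt", 3),
             ("set_replication", "/data/a.txt", 2),
             ("delete", "/data/b.txt", None)]
    print(checkpoint(image, edits))  # {'/data/a.txt': {'replication': 2}}
```

The input image is copied rather than mutated, mirroring the idea that the old checkpoint stays valid until the new one replaces it.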
Slave/Worker: DataNode
▪ Stores & retrieves blocks
▪ Responds to client and master requests for block operations
▪ Sends a heartbeat every 3 secs for liveness
▪ Periodically sends a block report: the list of block IDs stored on that node
‣ Piggybacked on heartbeat messages
‣ e.g., send block list every hour
▪ Caches blocks in memory as requested by per-file cache directives; by default a block is cached on a single DataNode
‣ e.g., an index or a small lookup table
‣ Schedulers can run tasks on the DataNode that caches a block, for faster reads
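A sketch of how the master could turn missed heartbeats into a dead-node decision. It assumes the expiry rule used by HDFS's DatanodeManager as we understand it, 2 × recheck interval + 10 × heartbeat interval, with the defaults dfs.heartbeat.interval = 3 s and dfs.namenode.heartbeat.recheck-interval = 300000 ms; the helper names are hypothetical.

```python
from typing import Dict, List

# Assumed defaults: dfs.heartbeat.interval = 3 s,
# dfs.namenode.heartbeat.recheck-interval = 300000 ms (check hdfs-default.xml).
HEARTBEAT_INTERVAL_S = 3
RECHECK_INTERVAL_MS = 300_000


def heartbeat_expire_ms(recheck_ms: int = RECHECK_INTERVAL_MS,
                        heartbeat_s: int = HEARTBEAT_INTERVAL_S) -> int:
    """Silence (ms) after which the master treats a DataNode as dead."""
    return 2 * recheck_ms + 10 * 1000 * heartbeat_s


def dead_nodes(last_heartbeat_ms: Dict[str, int], now_ms: int) -> List[str]:
    """Hypothetical helper: DataNodes whose last heartbeat is older than the expiry window."""
    limit = heartbeat_expire_ms()
    return sorted(node for node, seen in last_heartbeat_ms.items()
                  if now_ms - seen > limit)


if __name__ == "__main__":
    print(heartbeat_expire_ms())  # 630000, i.e. 10.5 minutes
    print(dead_nodes({"dn1": 1_000_000, "dn2": 300_000}, now_ms=1_000_000))  # ['dn2']
```

The long window (minutes, versus 3-second heartbeats) avoids re-replicating a node's blocks because of a brief network hiccup.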
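Network Topology
▪ Distance levels: same node, same rack, same data center, different data centers
▪ Node locations are provided in config (e.g., a topology script); the distance between two logical nodes is derived from them
‣ /dc/rack/node … default is “flat”, i.e., all nodes in one rack, so all distances are the same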
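A minimal sketch of the distance function, assuming locations are written as /dc/rack/node paths: the distance is the number of hops from each node up to their closest common ancestor. The function name is hypothetical; the 0/2/4/6 values match the distance levels listed above.

```python
def network_distance(a: str, b: str) -> int:
    """Hops from a and b up to their closest common ancestor in the /dc/rack/node tree."""
    path_a = [part for part in a.split("/") if part]
    path_b = [part for part in b.split("/") if part]
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


if __name__ == "__main__":
    print(network_distance("/d1/r1/n1", "/d1/r1/n1"))  # 0: same node
    print(network_distance("/d1/r1/n1", "/d1/r1/n2"))  # 2: same rack
    print(network_distance("/d1/r1/n1", "/d1/r2/n3"))  # 4: same data center
    print(network_distance("/d1/r1/n1", "/d2/r3/n4"))  # 6: different data centers
```

Under the flat default every node sits under one rack (e.g. /default-rack/nodeX), so any two distinct nodes are distance 2 apart.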
MRv1 vs MRv2 Application Lifecycle
Apache Hadoop YARN, Arun C. Murthy, et al, HortonWorks, Addison Wesley, 2014
MapReduce v1 → MapReduce v2 (YARN)
YARN
▪ Designed for scalability
‣ 10k nodes, 100k tasks
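▪ Designed for availability
‣ Separates application management (per-application ApplicationMaster) from resource management (ResourceManager)
▪ Improves utilization
‣ Flexible allocation: resources are not pre-divided into fixed Map or Reduce slots
▪ Goes beyond MapReduce: other processing frameworks can run on the same cluster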
YARN
▪ One ResourceManager per cluster
‣ Keeps track of nodes, capacities, allocations
‣ Detects failures and handles recovery (via heartbeats)
▪ Coordinates scheduling of jobs on the cluster
‣ Decides which node to allocate to a job
‣ Ensures load balancing
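A heavily simplified, hypothetical model of the ResourceManager's bookkeeping: it tracks each node's capacity and allocations, skips nodes marked lost, and balances load by placing each container on the live node with the most free memory. Real YARN schedulers also weigh queues, locality, and vcores; none of that is modeled, and the class and method names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeState:
    capacity_mb: int
    allocated_mb: int = 0
    alive: bool = True

    @property
    def free_mb(self) -> int:
        return self.capacity_mb - self.allocated_mb


class ToyResourceManager:
    """Hypothetical, heavily simplified model of the ResourceManager's bookkeeping."""

    def __init__(self) -> None:
        self.nodes: Dict[str, NodeState] = {}

    def register(self, name: str, capacity_mb: int) -> None:
        self.nodes[name] = NodeState(capacity_mb)

    def mark_lost(self, name: str) -> None:
        # In YARN this follows missed NodeManager heartbeats; here it is called by hand.
        self.nodes[name].alive = False

    def allocate(self, request_mb: int) -> Optional[str]:
        """Place a container on the live node with the most free memory (load balancing)."""
        candidates = [(node.free_mb, name) for name, node in self.nodes.items()
                      if node.alive and node.free_mb >= request_mb]
        if not candidates:
            return None
        _, best = max(candidates)
        self.nodes[best].allocated_mb += request_mb
        return best


if __name__ == "__main__":
    rm = ToyResourceManager()
    rm.register("nm1", 8192)
    rm.register("nm2", 4096)
    print([rm.allocate(2048) for _ in range(3)])  # ['nm1', 'nm1', 'nm2']
    rm.mark_lost("nm2")
    print(rm.allocate(8192))  # None
```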
[Figure: YARN application run (labels: Container, heartbeat, status to AM)]
Hadoop: The Definitive Guide, 4th Edition, 2015
MapReduce ApplicationMaster
▪ First requests containers for Map tasks
‣ As many as the number of input splits
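YARN Schedulers
▪ Capacity
‣ Uses separate queues, each with a minimum guaranteed capacity
‣ Allocates excess (idle) resources to more heavily loaded queues
▪ Fair
‣ A lone job gets all available resources
‣ Resources are redistributed as new jobs arrive, so each gets a fair share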
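A sketch of the two policies, counting resources as a single number (e.g. MB). fair_shares uses max-min fairness (split evenly, cap each job at its demand, redistribute the leftover); capacity_allocation first grants each queue up to its guaranteed fraction, then hands idle capacity to queues with unmet demand. Both functions are hypothetical simplifications: the real Capacity and Fair Schedulers add per-queue limits, weights, and preemption, and splitting the spare capacity max-min fairly is our own simplifying choice.

```python
from typing import Dict

EPS = 1e-9


def fair_shares(total: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Max-min fair split: divide evenly, cap each job at its demand, redistribute leftovers."""
    shares = {job: 0.0 for job in demands}
    remaining = float(total)
    active = {job for job, demand in demands.items() if demand > 0}
    while active and remaining > EPS:
        slice_ = remaining / len(active)
        for job in sorted(active):
            give = min(slice_, demands[job] - shares[job])
            shares[job] += give
            remaining -= give
        active = {job for job in active if demands[job] - shares[job] > EPS}
    return shares


def capacity_allocation(total: float, capacities: Dict[str, float],
                        demands: Dict[str, float]) -> Dict[str, float]:
    """Each queue first gets up to its guaranteed fraction; idle capacity then goes
    to queues with unmet demand (split max-min fairly here, a simplification)."""
    alloc = {q: min(demands.get(q, 0.0), frac * total) for q, frac in capacities.items()}
    spare = total - sum(alloc.values())
    unmet = {q: demands.get(q, 0.0) - alloc[q] for q in capacities
             if demands.get(q, 0.0) > alloc[q]}
    for q, extra in fair_shares(spare, unmet).items():
        alloc[q] += extra
    return alloc


if __name__ == "__main__":
    print(fair_shares(100, {"job1": 100}))               # job1 gets everything
    print(fair_shares(100, {"job1": 100, "job2": 100}))  # 50/50 once job2 arrives
    print(capacity_allocation(100, {"prod": 0.6, "dev": 0.4},
                              {"prod": 90, "dev": 10}))  # prod borrows dev's idle capacity
```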
Hadoop MapReduce
Mapping tasks to blocks
▪ FileInputFormat divides input files into “splits”; one Map task per split
‣ Typically, 1 split per block … balances task-creation overhead (too many small splits) against overwhelming a single task (too few large splits)
‣ Can specify splits smaller/larger than a block size
‣ Splits spanning blocks hurt locality
‣ Many small files hurt performance (combine them!)
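A sketch of split computation for one non-empty, splittable file. It mirrors our understanding of FileInputFormat's logic: split size = max(minSize, min(maxSize, blockSize)), and the last split may be up to 10% larger than that (a "slop" factor of 1.1) instead of leaving a tiny tail. Function names are hypothetical; the number of splits is the number of Map containers the MapReduce ApplicationMaster asks for.

```python
from typing import List, Tuple

MB = 1024 * 1024
SPLIT_SLOP = 1.1  # last split may be up to 10% larger than split_size


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """max(minSize, min(maxSize, blockSize)): defaults give one split per block."""
    return max(min_size, min(max_size, block_size))


def split_file(file_len: int, block_size: int, min_size: int = 1,
               max_size: int = 2**63 - 1) -> List[Tuple[int, int]]:
    """Return (offset, length) splits for one non-empty, splittable file."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    offset, remaining = 0, file_len
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    if remaining > 0:
        splits.append((offset, remaining))
    return splits


if __name__ == "__main__":
    block = 128 * MB
    print(len(split_file(300 * MB, block)))                    # 3 map tasks
    print(len(split_file(300 * MB, block, max_size=64 * MB)))  # 5: smaller splits
    print(split_file(136 * MB, block))   # one split spanning two blocks (within the slop)
    print(len(split_file(1024, block)))  # 1: a tiny file still costs a whole map task
```

The 136 MB case shows how a split can span blocks (hurting locality), and the 1 KB case shows why many small files mean many short-lived Map tasks.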
▪ Background thread “spills” to disk when the circular in-memory buffer (100MB) reaches its threshold (80%)
‣ Asynchronous: the Map task keeps writing to the buffer and blocks only if the buffer fills before the spill finishes
▪ Divides the data into in-memory partitions, one for each reducer
‣ Sorts each partition by key
‣ Runs the combiner (if any) on the sorted output
‣ Writes to a local directory; reducers fetch it over HTTP (Not HDFS!)
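A toy model of the spill path, assuming a buffer measured in records rather than bytes and a deterministic stand-in for hash partitioning. When the buffer reaches its threshold it is partitioned per reducer, each partition is sorted by key, the combiner runs on each sorted run, and the result is kept as one "spill". Unlike Hadoop, the spill here runs synchronously in the caller; all names are hypothetical.

```python
from itertools import groupby
from typing import Callable, List, Optional, Tuple

KV = Tuple[str, int]


def combine(sorted_records: List[KV], combiner: Callable[[List[int]], int]) -> List[KV]:
    """Apply the combiner to each run of equal keys in key-sorted records."""
    return [(key, combiner([v for _, v in group]))
            for key, group in groupby(sorted_records, key=lambda kv: kv[0])]


class ToyMapOutputBuffer:
    """Hypothetical model of the map-side buffer, counted in records instead of bytes.
    Spills run synchronously here; in Hadoop a background thread does this."""

    def __init__(self, num_reducers: int, capacity_records: int = 100,
                 spill_percent: float = 0.8,
                 combiner: Optional[Callable[[List[int]], int]] = None) -> None:
        self.num_reducers = num_reducers
        self.threshold = max(1, int(capacity_records * spill_percent))
        self.combiner = combiner
        self.buffer: List[Tuple[int, str, int]] = []
        self.spills: List[List[List[KV]]] = []  # spill -> partition -> records

    def partition(self, key: str) -> int:
        # Deterministic stand-in for hash partitioning (Python's str hash is salted).
        return sum(key.encode()) % self.num_reducers

    def collect(self, key: str, value: int) -> None:
        self.buffer.append((self.partition(key), key, value))
        if len(self.buffer) >= self.threshold:
            self.spill()

    def spill(self) -> None:
        """Partition, sort each partition by key, combine, and record one spill."""
        if not self.buffer:
            return
        parts: List[List[KV]] = [[] for _ in range(self.num_reducers)]
        for p, key, value in self.buffer:
            parts[p].append((key, value))
        for i in range(self.num_reducers):
            parts[i].sort(key=lambda kv: kv[0])
            if self.combiner is not None:
                parts[i] = combine(parts[i], self.combiner)
        self.spills.append(parts)
        self.buffer = []


if __name__ == "__main__":
    buf = ToyMapOutputBuffer(num_reducers=2, capacity_records=5, combiner=sum)
    for word in "a b a c b a d a".split():
        buf.collect(word, 1)
    print(buf.spills)
    # [[[('b', 1)], [('a', 2), ('c', 1)]], [[('b', 1), ('d', 1)], [('a', 2)]]]
```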
▪ Spill files are merged into a single partitioned, sorted output file on disk
‣ If there are at least 3 spill files when the Map task finishes, the combiner runs again during the merge
‣ Optionally compressed
▪ Map task output is always written to local disk … enables recovery!
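A sketch of the final map-side merge, assuming each spill is a list of per-partition, key-sorted runs (as in a toy spill model). For every partition, the runs from all spills are k-way merged with heapq.merge; the combiner is re-run only when there were at least 3 spills, which we take to be the default of mapreduce.map.combine.minspills. The function name is hypothetical.

```python
import heapq
from itertools import groupby
from typing import Callable, List, Optional, Tuple

KV = Tuple[str, int]
MIN_SPILLS_FOR_COMBINE = 3  # assumed default of mapreduce.map.combine.minspills


def merge_spills(spills: List[List[List[KV]]],
                 combiner: Optional[Callable[[List[int]], int]] = None) -> List[List[KV]]:
    """Merge per-partition sorted runs from every spill into one partitioned, sorted output."""
    output = []
    for p in range(len(spills[0])):
        runs = [spill[p] for spill in spills]
        merged = list(heapq.merge(*runs, key=lambda kv: kv[0]))
        if combiner is not None and len(spills) >= MIN_SPILLS_FOR_COMBINE:
            merged = [(key, combiner([v for _, v in group]))
                      for key, group in groupby(merged, key=lambda kv: kv[0])]
        output.append(merged)
    return output


if __name__ == "__main__":
    three = [[[("a", 2), ("c", 1)]], [[("a", 1)]], [[("b", 4), ("c", 2)]]]
    print(merge_spills(three, combiner=sum))      # [[('a', 3), ('b', 4), ('c', 3)]]
    print(merge_spills(three[:2], combiner=sum))  # [[('a', 2), ('a', 1), ('c', 1)]]
```

With only two spills the combiner is skipped: the saving in output size would likely not pay for the extra combiner pass.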
▪ Once the outputs of all Map tasks have been copied to the reducer, a final merge-sort over all spilled files runs before the reduce method is called
‣ Multiple rounds, up to 10 files (the merge factor) merged per round
‣ Reducer input comes from the merged sorted files plus any remaining in-memory sorted key-value pairs (KVPs)
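A sketch of how the merge rounds can be planned, assuming a merge factor of 10 (mapreduce.task.io.sort.factor). The first round merges only enough files that every later round, including the final one that streams into reduce(), uses the full factor. For 40 files this gives a first round of 4, then three rounds of 10, then a final round of 10 (4 merged + 6 untouched files), matching the example described in Hadoop: The Definitive Guide. The function name is hypothetical.

```python
from typing import List


def merge_plan(num_segments: int, factor: int = 10) -> List[int]:
    """Segments merged per round; the final round streams straight into reduce().
    The first round merges just enough files that later rounds use the full factor."""
    if factor < 2:
        raise ValueError("merge factor must be at least 2")
    rounds = []
    segments = num_segments
    first = True
    while segments > factor:
        if first:
            mod = (segments - 1) % (factor - 1)
            n = factor if mod == 0 else mod + 1
            first = False
        else:
            n = factor
        rounds.append(n)
        segments -= n - 1  # n inputs become one merged file
    rounds.append(segments)
    return rounds


if __name__ == "__main__":
    print(merge_plan(40))  # [4, 10, 10, 10, 10]
    print(merge_plan(7))   # [7]: a single round feeds the reducer
```

Merging a small first round keeps the amount of data written to disk in intermediate rounds low, since the final round is never written out.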
Liveness
▪ A Hadoop job or task is alive as long as it is making progress
‣ Reading an input record or writing an output record
‣ Setting status or incrementing a counter
▪ Tasks report progress to the ApplicationMaster every ~3 secs
▪ Client polls the ApplicationMaster
‣ every ~1 sec
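A sketch of progress-based liveness, assuming a task is considered hung once it reports no progress for longer than a timeout; we take the default to be mapreduce.task.timeout = 600000 ms (10 minutes). The tracker and task names are hypothetical, and an injectable clock makes the behavior testable without waiting.

```python
import time
from typing import Callable, Dict, List

TASK_TIMEOUT_S = 600.0  # assumed default: mapreduce.task.timeout = 600000 ms


class ProgressTracker:
    """Hypothetical AM-side view: a task is alive while it keeps reporting progress."""

    def __init__(self, timeout_s: float = TASK_TIMEOUT_S,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_progress: Dict[str, float] = {}

    def report_progress(self, task_id: str) -> None:
        """Record progress: a record read/written, a status update, or a counter increment."""
        self.last_progress[task_id] = self.clock()

    def timed_out(self) -> List[str]:
        """Tasks with no progress for longer than the timeout (candidates to fail)."""
        now = self.clock()
        return sorted(task for task, seen in self.last_progress.items()
                      if now - seen > self.timeout_s)


if __name__ == "__main__":
    fake_now = [0.0]
    tracker = ProgressTracker(clock=lambda: fake_now[0])
    tracker.report_progress("map-0")
    tracker.report_progress("map-1")
    fake_now[0] = 590.0
    tracker.report_progress("map-1")  # map-1 is still making progress
    fake_now[0] = 601.0
    print(tracker.timed_out())  # ['map-0']
```

A long-running computation that reads no records can therefore appear hung; incrementing a counter or setting status is how such a task signals that it is still alive.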
Reading
▪ Hadoop: The Definitive Guide, 4th Edition, 2015
‣ Chapters 3, 4, 7
Additional Resources
▪ Apache Hadoop YARN: Moving Beyond MapReduce and Batch Processing with
Apache Hadoop, 2015
‣ Chapters 1, 3, 4, 7