Module_3_Session_2 Features and Components of Hadoop

The document outlines the features of Hadoop, highlighting its fault-tolerant, scalable, and modular design that efficiently handles Big Data storage and processing. It describes the robustness of the Hadoop Distributed File System (HDFS) and its ability to continue operations despite server failures, along with its open-source nature and reliance on Java and Linux. Additionally, it details the core components of the Apache Hadoop framework, including Hadoop Common, HDFS, YARN, and MapReduce.


CS6CRT19 Big Data Analytics – Module 3

Features of Hadoop
●  Fault-tolerant, scalable, flexible and modular design: Hadoop uses a simple and modular programming model and scales to a large number of servers; the system grows by adding new nodes to handle larger data. Hadoop proves very helpful in storing, managing, processing and analyzing Big Data. Modular functions make the system flexible: one can add or replace components with ease, and modularity allows swapping a component for a different software tool.
●  Robust design of HDFS: Execution of Big Data applications continues even when an individual server or node in the cluster fails, because Hadoop provides backup and data-recovery mechanisms. HDFS thus offers high reliability.
●  Store and process Big Data: Hadoop stores and processes Big Data with 3V (volume, velocity, variety) characteristics.
●  Distributed cluster computing model with data locality: Hadoop processes Big Data at high speed because application tasks and subtasks are submitted to the DataNodes that already hold the data. More computing power is obtained by increasing the number of computing nodes. Processing is split across multiple DataNodes, which yields fast processing and aggregated results.
●  Hardware fault tolerance: A hardware fault does not affect data or application processing. If a node goes down, the remaining nodes take over its work, because all data blocks are replicated automatically; the default is three copies of each block.
●  Open-source framework: Open-source access and cloud services enable large data stores. Hadoop runs on a cluster of multiple inexpensive servers or in the cloud.
●  Java and Linux based: Hadoop exposes Java interfaces. Its base platform is Linux, and it provides its own set of shell commands (a small example of both follows this list).
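To make the replication and Java-interface points concrete, the following is a minimal sketch, assuming a reachable cluster and the standard Hadoop FileSystem API; the path /data/sample.txt is only an illustrative placeholder. It reads a file's current replication factor and resets it to the usual default of three copies.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster described by the local Hadoop configuration.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Placeholder path used only for illustration.
        Path file = new Path("/data/sample.txt");
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Current replication factor: " + status.getReplication());

        // Ask HDFS to keep three copies of each block (the usual default).
        fs.setReplication(file, (short) 3);
    }
}

The same information is available from the Hadoop shell, for example with hdfs dfs -ls /data/sample.txt (the replication factor appears in the second column for files) or hdfs dfs -setrep 3 /data/sample.txt.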


The base Apache Hadoop framework is composed of the following modules:


●​ Hadoop Common – contains libraries and utilities needed by other Hadoop
modules
●​ Hadoop Distributed File System (HDFS) – a distributed file-system that
stores data on commodity machines, providing very high aggregate
bandwidth across the cluster
●​ Hadoop YARN – (introduced in 2012) a platform responsible for managing
computing resources in clusters and using them for scheduling users'
applications
●  Hadoop MapReduce – an implementation of the MapReduce programming model for large-scale data processing (a minimal word-count sketch follows this list)
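As an illustration of the MapReduce programming model, below is a minimal sketch of the classic word-count example: the mapper emits a (word, 1) pair for every word in its input, and the reducer sums the counts for each distinct word. It omits the job driver and is not a complete application.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase: split each input line into words and emit (word, 1).
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: add up the counts emitted for each distinct word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : values) {
                sum += count.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}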
