Hadoop Project: Hardware Specific
Hardware Specifics:
The number of machines, and the specs of those machines, depend on a few factors:
Hardware Requirement
Network considerations: Hadoop is very bandwidth-intensive! Often, all nodes are communicating with each other at the same time.
Daily data input                    100 GB
HDFS replication factor             3
    Storage space used by daily data input = daily data input * replication factor = 300 GB
Monthly growth                      5%
    Monthly volume = (300 * 30) + 5% = 9450 GB
    After one year = 9450 * (1 + 0.05)^12 = 16971 GB
Intermediate MapReduce data         25%
Non-HDFS reserved space per disk    30%
Size of a hard drive disk           4 TB
    Dedicated space = HDD size * (1 - (Non-HDFS reserved space per disk / 100 + Intermediate MapReduce data / 100))
                    = 4 * (1 - (0.25 + 0.30)) = 1.8 TB (which is the node capacity per disk)
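The sizing arithmetic above can be sketched as a short Python script. The figures come straight from the table; the variable names are illustrative, not part of any Hadoop API:

```python
# Hadoop cluster storage sizing sketch (figures from the sizing table above).
daily_input_gb = 100      # daily data input
replication = 3           # HDFS replication factor
monthly_growth = 0.05     # 5% monthly growth
intermediate_mr = 0.25    # space reserved for intermediate MapReduce data
non_hdfs_reserved = 0.30  # non-HDFS reserved space per disk
disk_size_tb = 4          # size of one hard drive

# Replicated daily input: 100 GB * 3 = 300 GB stored per day.
daily_stored_gb = daily_input_gb * replication

# First-month volume with 5% growth, then compounded monthly for a year.
monthly_gb = daily_stored_gb * 30 * (1 + monthly_growth)      # 9450 GB
yearly_gb = monthly_gb * (1 + monthly_growth) ** 12           # ~16971 GB

# Usable HDFS space per disk after the two reservations.
usable_per_disk_tb = disk_size_tb * (1 - (intermediate_mr + non_hdfs_reserved))

print(f"Monthly volume:        {monthly_gb:.0f} GB")
print(f"After one year:        {yearly_gb:.0f} GB")
print(f"Usable space per disk: {usable_per_disk_tb:.1f} TB")
```

Adjusting any input (e.g. a higher replication factor or larger disks) immediately shows its effect on the one-year storage projection and the per-disk capacity.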
Operating system: RHEL or CentOS
Architecture review
Discuss key points that will dictate deployment decisions, including:
Cluster redundancy
Pre-installation
Hue installation
High availability
Cluster management
Cluster monitoring
Documentation