Experiment 01 PDF
Aim: Set up a Hadoop single-node cluster. Compare Hadoop 1.x, 2.x and 3.x
Theory:
The Apache Hadoop software library provides a framework for the distributed processing of massive
data volumes across clusters of computers. It is designed to scale from a single server to thousands of
machines, each offering local computation and storage. Rather than relying on hardware to provide
high availability, the library itself is designed to detect and handle failures at the application layer. As
a result, a highly available service is delivered on top of a cluster of computers, each of which may be
prone to failure.
Implementation:
b. Since we want to manage Hadoop files independently, we create a dedicated Hadoop user named
'hadoop'. We then switch from the angela_user account to the hadoop user and make sure this
newly created user is a member of the hadoop group.
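The user-creation step above can be sketched with the following commands (a sketch assuming a Debian-style system with `sudo` available; the user and group names follow the text, and `adduser` creates a matching `hadoop` group by default):

```shell
# Create a dedicated 'hadoop' user with its own home directory
sudo adduser hadoop

# Ensure the new user belongs to the 'hadoop' group
sudo usermod -aG hadoop hadoop

# Switch from the current account (angela_user in the text) to the new user
su - hadoop

# Confirm group membership of the newly created user
groups hadoop
```

These commands require administrative privileges, so they are shown here for illustration rather than as a verbatim transcript of the session.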
e. Next, we configure the Java environment variables and then edit core-site.xml, hdfs-site.xml,
mapred-site.xml and yarn-site.xml.
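As an illustration, a minimal core-site.xml for a single-node setup often looks like the following (a sketch: the fs.defaultFS value and port 9000 are common single-node defaults, not taken from the text; JAVA_HOME is typically set in etc/hadoop/hadoop-env.sh before these files are edited):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Point HDFS clients at the local NameNode -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```

hdfs-site.xml, mapred-site.xml and yarn-site.xml are edited in the same way, for example setting dfs.replication to 1 since a single node cannot replicate blocks to other machines.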
g. To verify the running components, we run jps (the Java Virtual Machine Process Status tool), as
shown above.
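The verification step can be sketched as follows (a sketch assuming the Hadoop sbin scripts are on the PATH; the exact set of process names can vary with the Hadoop version and configuration):

```shell
# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode)
start-dfs.sh

# Start the YARN daemons (ResourceManager, NodeManager)
start-yarn.sh

# List running Java processes; on a healthy single-node cluster this
# typically shows NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager alongside Jps itself
jps
```

Since these commands assume a configured Hadoop installation, they are shown for illustration of the verification workflow rather than as exact session output.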
Knowing the machine's IP address and the Hadoop web UI port allows access to the Hadoop
dashboard. In Hadoop 3.x the NameNode web UI listens on port 9870 (in Hadoop 1.x and 2.x it was
50070).
Example: http://localhost:9870/
Conclusion:
Thus, in this experiment, we have set up a Hadoop single-node cluster and have compared different
versions of Hadoop (v1.x, v2.x and v3.x).