
Computer Engineering

A Report On

“HADOOP Configuration”

Submitted for the

“DIPLOMA IN COMPUTER ENGINEERING”

MSBTE, PUNE

Department of Computer Engineering


Abhaysinhraje Bhonsle Institute of Technology (Polytechnic),
Shendre, Satara.
Academic Year – 2023-24


Vidyavardhini Charitable Trust


Abhaysinhraje Bhonsle Institute of Technology
Shahunagar – Shendre, Satara.

CERTIFICATE
This is to certify that the following students:

Sr. No.  Enrollment No.  Name
1        2109830054      Thorat Om Avinash
2        2209830253      Bhandari Mansi Tejkumar
3        2209830259      Inamdar Jameer Rafik

of the Diploma in Computer Engineering have satisfactorily completed the micro-project work titled "Hadoop Configuration" under my guidance and supervision. This work is submitted in partial fulfillment of the requirements of the Maharashtra State Board of Technical Education, Mumbai, during the third semester of the academic year 2023-24.


TABLE OF CONTENTS

Sr. No.  Index
1.  Abstract
2.  Introduction
3.  Project Objective
4.  What is Hadoop
5.  Features of Hadoop
6.  Installation of Hadoop
7.  Conclusion
8.  Reference

ABSTRACT


What is Hadoop
Hadoop is an open-source framework from Apache that is used to store, process, and analyze data that is very large in volume. Hadoop is written in Java and is not an OLAP (online analytical processing) system; it is used for batch/offline processing. It is used by Facebook, Yahoo, Google, Twitter, LinkedIn, and many others.
Moreover, it can be scaled up simply by adding nodes to the cluster.

ACTION PLAN:


Sr. No.  Details of Activity                                        Start Date   Finish Date   Responsible Team Member
1        Selected the topic for the micro-project                   4-08-2023    7-08-2023     Om, Mansi, Jameer
2        Organised the things required for our project              8-08-2023    25-08-2023    Mansi
3        Browsed the internet for information and raw data          30-08-2023   15-09-2023    Jameer
4        Attended extra lectures for our project topic              20-09-2023   30-09-2023    Om
5        Made points/notes on the information we collected          1-10-2023    5-10-2023     Mansi
6        Created a Word document with the help of our teacher       8-10-2023    15-10-2023    Mansi
7        Made corrections by discussing with our teacher            17-10-2023   25-10-2023    Jameer
8        Created a PDF document to make a hard copy of the report   25-10-2023   27-10-2023    Om


INTRODUCTION:

Apache Hadoop is an open-source software framework used to develop data-processing applications that are executed in a distributed computing environment.

Applications built using Hadoop run on large data sets distributed across clusters of commodity computers. Commodity computers are cheap and widely available, and they are mainly useful for achieving greater computational power at low cost.

Similar to data residing in the local file system of a personal computer, in Hadoop, data resides in a distributed file system called the Hadoop Distributed File System (HDFS). The processing model is based on the 'Data Locality' concept, wherein computational logic is sent to the cluster nodes (servers) containing the data. This computational logic is nothing but a compiled version of a program written in a high-level language such as Java; such a program processes data stored in HDFS.


PROJECT OBJECTIVE

1) Hadoop MapReduce: MapReduce is a computational model and software framework for writing applications that run on Hadoop. These MapReduce programs are capable of processing enormous amounts of data in parallel on large clusters of computation nodes.

2) HDFS (Hadoop Distributed File System): HDFS takes care of the storage part of Hadoop applications. MapReduce applications consume data from HDFS. HDFS creates multiple replicas of data blocks and distributes them on compute nodes in a cluster. This distribution enables reliable and extremely rapid computations. A short example of working with HDFS from the command line is given below.
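To illustrate how application data gets into HDFS, the commands below create a directory and copy a local file into it. This is a hedged sketch: the path /user/hadoop/input and the file sample.txt are illustrative names, and a running HDFS (as set up in the installation section) is assumed.

Command: hdfs dfs -mkdir -p /user/hadoop/input

Command: hdfs dfs -put sample.txt /user/hadoop/input

Command: hdfs dfs -ls /user/hadoop/input

The -ls output lists the stored file together with its replication factor, owner and size.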


Features Of ‘Hadoop’

• Suitable for Big Data Analysis
As Big Data tends to be distributed and unstructured in nature, Hadoop clusters are best suited for analysis of Big Data. Since it is the processing logic (not the actual data) that flows to the computing nodes, less network bandwidth is consumed. This concept is called data locality, and it helps increase the efficiency of Hadoop-based applications.

• Scalability
Hadoop clusters can easily be scaled to any extent by adding additional cluster nodes, thus allowing for the growth of Big Data. Also, scaling does not require modifications to the application logic.

• Fault Tolerance
The Hadoop ecosystem has a provision to replicate the input data onto other cluster nodes. That way, in the event of a cluster node failure, data processing can still proceed by using the data stored on another cluster node. A quick way to inspect this replication on a running cluster is shown below.
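Since fault tolerance relies on this replication, the HDFS fsck tool can be used to see how many replicas each block of a stored file has. This is a hedged example: it assumes a running cluster and data already stored under /user/hadoop/input.

Command: hdfs fsck /user/hadoop/input -files -blocks

The report printed by fsck shows, for each block, its length and replica count, plus an overall health summary for the path.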


Modules of Hadoop
1) HDFS: Hadoop Distributed File System. Google published its GFS paper, and HDFS was developed on the basis of it. It states that files will be broken into blocks and stored in nodes over the distributed architecture.
2) YARN: Yet Another Resource Negotiator is used for job scheduling and for managing the cluster.
3) MapReduce: This is a framework that helps Java programs do parallel computation on data using key-value pairs. The Map task takes input data and converts it into a data set that can be computed over as key-value pairs. The output of the Map task is consumed by the Reduce task, and the output of the reducer then gives the desired result (see the example after this list).
4) Hadoop Common: These Java libraries are used to start Hadoop and are used by the other Hadoop modules.
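To make the MapReduce module concrete, the commands below run the word-count example that ships with Hadoop 2.7.3 on data stored in HDFS. This is a hedged sketch: the input and output paths are illustrative, the jar path assumes Hadoop was extracted into the home directory as in the installation section, and the output directory must not already exist.

Command: hadoop jar hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/hadoop/input /user/hadoop/output

Command: hdfs dfs -cat /user/hadoop/output/part-r-00000

The map phase emits (word, 1) key-value pairs and the reduce phase sums them, so the second command prints each distinct word with its count.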


Installation of Hadoop

Step 1: Download the Java 8 package (for example, jdk-8u101-linux-i586.tar.gz) and save the file in your home directory.
Step 2: Extract the Java tar file.
Command: tar -xvf jdk-8u101-linux-i586.tar.gz

Fig: Hadoop Installation – Extracting Java Files

Step 3: Download the Hadoop 2.7.3 package.

Command: wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz

Fig: Hadoop Installation – Downloading Hadoop

Step 4: Extract the Hadoop tar File.


Command: tar -xvf hadoop-2.7.3.tar.gz

Fig: Hadoop Installation – Extracting Hadoop Files


Step 5: Add the Hadoop and Java paths to the bash file (.bashrc). Open the .bashrc file and add the Hadoop and Java paths as shown below.


Command: vi .bashrc

Fig: Hadoop Installation – Setting Environment Variable
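Since the screenshot is not reproduced here, the following is a minimal sketch of the entries to add, assuming Java was extracted to $HOME/jdk1.8.0_101 and Hadoop to $HOME/hadoop-2.7.3 as in the earlier steps:

export JAVA_HOME=$HOME/jdk1.8.0_101
export HADOOP_HOME=$HOME/hadoop-2.7.3
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin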

Then, save the bash file and close it.

To apply these changes to the current terminal, execute the source command.

Command: source .bashrc


Fig: Hadoop Installation – Refreshing environment variables

To make sure that Java and Hadoop have been properly installed on your system and
can be accessed through the Terminal, execute the java -version and hadoop version
commands.

Command: java -version

Fig: Hadoop Installation – Checking Java Version

Command: hadoop version

Fig: Hadoop Installation – Checking Hadoop Version

Step 6: Edit the Hadoop Configuration files.


Command: cd hadoop-2.7.3/etc/hadoop/

Command: ls

All the Hadoop configuration files are located in the hadoop-2.7.3/etc/hadoop directory, as you can see in the snapshot below:

Fig: Hadoop Installation – Hadoop Configuration Files
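Since the snapshot is not reproduced here: the listing should include, among others, core-site.xml, hdfs-site.xml, mapred-site.xml.template, yarn-site.xml and hadoop-env.sh, which are the files referred to in the steps and sketches that follow.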

Step 7: Open core-site.xml and edit the property mentioned below inside the configuration tag:
core-site.xml informs the Hadoop daemons where the NameNode runs in the cluster. It contains configuration settings of the Hadoop core, such as I/O settings that are common to HDFS and MapReduce.

Command: vi core-site.xml


Fig: Hadoop Installation – Configuring core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<!-- fs.default.name is the older (deprecated) alias of fs.defaultFS; it names the NameNode URI -->
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Step 8: Open hdfs-site.xml and edit the property mentioned below inside the configuration tag:
hdfs-site.xml contains configuration settings of the HDFS daemons (i.e. the NameNode, DataNode and Secondary NameNode). It also includes the replication factor and block size of HDFS.

Command: vi hdfs-site.xml



Fig: Hadoop Installation – Configuring hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<!-- disables HDFS permission checking; dfs.permissions.enabled is the Hadoop 2.x property name -->
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
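As a quick check, the replication value that clients will use can be read back from the configuration with the hdfs getconf tool (a hedged example, assuming the environment from Step 5 so that the hdfs command is on the PATH):

Command: hdfs getconf -confKey dfs.replication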

Step 9: Open mapred-site.xml and edit the property mentioned below inside the configuration tag:
mapred-site.xml contains configuration settings for MapReduce applications, such as the number of JVMs that can run in parallel, the size of the mapper and reducer processes, the CPU cores available to a process, etc.

In some cases the mapred-site.xml file is not available, so we have to create it from the mapred-site.xml.template file.


Command: cp mapred-site.xml.template mapred-site.xml

Command: vi mapred-site.xml

Fig: Hadoop Installation – Configuring mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>


Step 10: Open yarn-site.xml and edit the property mentioned below inside the configuration tag:
yarn-site.xml contains configuration settings of the ResourceManager and NodeManager, such as the memory available to applications, the auxiliary services to run, etc.


Command: vi yarn-site.xml

Fig: Hadoop Installation – Configuring yarn-site.xml

<?xml version="1.0">
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>

Page 17 of 25
Computer Engineering

<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
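The configuration steps of this report end here. For the conclusion below to hold, a single-node setup normally also needs JAVA_HOME set in hadoop-env.sh, passwordless SSH to localhost, a one-time format of the NameNode, and the daemons started. The following is a minimal sketch of these extra commands, not part of the original steps, assuming the paths from Step 5:

Command: vi hadoop-2.7.3/etc/hadoop/hadoop-env.sh
(in this file, set: export JAVA_HOME=$HOME/jdk1.8.0_101)

Command: hdfs namenode -format

Command: start-dfs.sh

Command: start-yarn.sh

Command: jps

If the daemons started correctly, jps should list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.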


CONCLUSION

Following the steps above, a single-node Hadoop cluster has been successfully installed and configured. The same approach extends to a multi-node cluster, which would be a natural next stage of this work.

With the installation understood, the wider Hadoop ecosystem (HDFS, YARN, MapReduce, Pig, Hive, HBase, Oozie, Flume and Sqoop) can be explored through real-world use cases in domains such as retail, social media, aviation, tourism and finance.


REFERENCES:

Books:
1. A Guide to Measuring and Monitoring Project Performance by Harold Kerzner
2. Advanced Database Systems by Nabil R. Adam, Bhagvan
3. Database Systems: Design, Implementation, and Management by Peter Rob

Websites:
1. https://www.geeksforgeeks.org/DBMS
2. https://www.tutorialspoint.com
3. https://data-flair.training/blogs/best-data-mining-books/amp/
4. https://www.guru99.com/learn-hadoop-in-10-minutes.html
