Bda 1
Theory: Hadoop is a Java-based programming framework that supports the processing and
storage of extremely large datasets on a cluster of inexpensive machines. It was the first major
open-source project in the big data field and is sponsored by the Apache Software
Foundation.
Installation:
Prerequisites:
Step 1: Install Java 8.
After installation, running java -version should print output similar to:
openjdk version "1.8.0_91"
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-3ubuntu1~16.04.1-b14)
OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
This output verifies that OpenJDK has been successfully installed.
Note: Set the JAVA_HOME environment variable to the Java installation path.
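The version check in this step can also be scripted. The following is a minimal Python sketch, not part of the original procedure: the helper names parse_java_major and java_home_is_set are hypothetical, and the sketch assumes the version banner has the form shown above.

```python
import os
import re


def parse_java_major(version_output):
    """Return the Java major version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_91).
    if first == 1 and second is not None:
        return int(second)
    return first


def java_home_is_set(env=None):
    """Report whether JAVA_HOME is present and non-empty in the given environment."""
    env = os.environ if env is None else env
    return bool(env.get("JAVA_HOME", "").strip())


# The banner line from the installation check above.
sample = 'openjdk version "1.8.0_91"'
print(parse_java_major(sample))  # prints 8
```

A result of 8 from parse_java_major confirms the Java 8 prerequisite, and java_home_is_set() returning False indicates that JAVA_HOME still needs to be configured.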
Fig 2
1. Install Apache Hadoop 2.2.0 on Microsoft Windows: If Apache Hadoop 2.2.0 is not
already installed, follow the post "Build, Install, Configure and Run Apache Hadoop
2.2.0 in Microsoft Windows OS".
2. Start HDFS (NameNode and DataNode) and YARN (ResourceManager and
NodeManager).
Run the following commands.
Command Prompt
C:\Users\abhijitg>cd c:\hadoop
c:\hadoop>sbin\start-dfs
c:\hadoop>sbin\start-yarn
starting yarn daemons
Run wordcount MapReduce job
Now we'll run the wordcount MapReduce job available in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar
Create a text file with some content. We'll pass this file as input to
the wordcount MapReduce job for counting words.
C:\file1.txt
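To make the expected result concrete, the counting that the wordcount job performs can be sketched locally in Python. This is only an illustration of the map and reduce idea, not the code shipped in the Hadoop examples jar: the function names and the sample lines are hypothetical, and it assumes words are separated by whitespace and compared case-sensitively.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_phase(pairs):
    """Sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines):
    """Run the map step followed by the reduce step on in-memory lines."""
    return reduce_phase(map_phase(lines))


# Hypothetical file content, standing in for C:\file1.txt.
sample_lines = ["hello hadoop", "hello world"]
print(word_count(sample_lines))  # prints {'hello': 2, 'hadoop': 1, 'world': 1}
```

On a real cluster the map and reduce steps run on different nodes and the framework groups the pairs by word between them; the sketch collapses that into a single process.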
Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt') to be used for
counting words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input
Copy the text file (say 'file1.txt') from local disk to the newly created 'input' directory in
HDFS.
C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input
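The HDFS preparation steps, together with the job submission that follows them, can be collected into one list of commands. The Python sketch below only builds the command lines and does not execute them. The first two commands are the ones shown above. The third is an assumption, since this write-up does not show the submission command: it uses the yarn jar form with the wordcount program name, and the HDFS output directory name 'output' is hypothetical.

```python
def wordcount_commands(local_file, hdfs_input="input", hdfs_output="output"):
    """Build the command lines for preparing HDFS and submitting wordcount.

    Paths are relative to the Hadoop installation directory (c:\\hadoop here).
    """
    jar = r"share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar"
    return [
        # Create the HDFS directory that will hold the input files.
        ["bin\\hdfs", "dfs", "-mkdir", hdfs_input],
        # Copy the local text file into that directory.
        ["bin\\hdfs", "dfs", "-copyFromLocal", local_file, hdfs_input],
        # Assumed submission step: run the wordcount example from the jar.
        ["bin\\yarn", "jar", jar, "wordcount", hdfs_input, hdfs_output],
    ]


for argv in wordcount_commands("c:/file1.txt"):
    print(" ".join(argv))
```

Keeping each command as a list of arguments makes it straightforward to pass to a process launcher later, if the steps are to be automated.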
The status of the cluster and its applications can be viewed in the YARN ResourceManager web UI:
http://abhijitg:8088/cluster
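Before opening the web UI in a browser, it can help to confirm that something is listening on port 8088. The following Python sketch performs a plain TCP connection test. The function name port_is_open is hypothetical, and a successful connection only shows that the port accepts connections, not that the ResourceManager is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For the cluster in this write-up the call would be port_is_open("abhijitg", 8088). If it returns False, the YARN daemons may not have started yet.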
Result: We've installed Hadoop in stand-alone mode and verified it by running an example
program it provided.