Big Data Lab
6. Configure yarn-site.xml
$ nano yarn-site.xml
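A minimal yarn-site.xml sketch for this step, assuming the master node's hostname is master (the ResourceManager hostname is an assumption; adjust it for your cluster). The shuffle auxiliary service is required for MapReduce jobs to run on YARN:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value> <!-- assumed hostname of the master node -->
  </property>
</configuration>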
7. On the master node, run:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
8. Format NameNode
10. END
INPUT:
OUTPUT:
DataNode, NameNode, SecondaryNameNode, NodeManager, ResourceManager
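These daemons can be verified with the jps command on the corresponding nodes; on a single-node (pseudo-distributed) setup a typical listing looks like the following (process IDs are illustrative):

$ jps
2401 NameNode
2537 DataNode
2712 SecondaryNameNode
2893 ResourceManager
3021 NodeManager
3190 Jps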
________________________________________________________________________
ALGORITHM: -
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS
Step-1
Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the
data into HDFS first. Let's create a directory and put a file in it. HDFS has a
default working directory of /user/$USER, where $USER is your login user name. This
directory isn't automatically created for you, though, so let's create it with the
mkdir command. For the purpose of illustration, we use chuck. You should substitute
your user name in the example commands.
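For example, assuming the login user is chuck and a file example.txt in the current local directory (on recent Hadoop versions add -p so that the parent /user directory is created as well):

$ hadoop fs -mkdir -p /user/chuck
$ hadoop fs -put example.txt /user/chuck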
Step-2
Retrieving Files from HDFS
The Hadoop command get copies files from HDFS back to the local filesystem. To
retrieve example.txt into the current local directory, we can run the following command:
hadoop fs -get example.txt .
To display the file's contents directly from HDFS without copying it, use cat instead:
hadoop fs -cat example.txt
Step-3
Deleting Files from HDFS
hadoop fs -rm example.txt
The command for creating a directory in HDFS is hdfs dfs -mkdir /lendicse.
A local directory is added to HDFS with the command hdfs dfs -put lendi_english /.
Step-4
Copying Data from NFS to HDFS
The command for copying a file from the local filesystem into HDFS is
hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/
View the copied file with the command hdfs dfs -cat /lendicse/glossary
___________________________________________________________________
III. Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
a) Find the number of occurrences of each word appearing in the input file(s)
b) Performing a MapReduce Job for word search count (look for specific keywords in
a file)
MAPREDUCE PROGRAM
WordCount is a simple program which counts the number of occurrences of each word
in a given text input data set. WordCount fits very well with the MapReduce
programming model, making it a good example for understanding the Hadoop MapReduce
programming style. Our implementation consists of three main parts:
1. Mapper
2. Reducer
3. Driver
Step-1. Write a Mapper
A Mapper overrides the map function from the class
org.apache.hadoop.mapreduce.Mapper, which provides <key, value> pairs as the
input. A Mapper implementation may output <key, value> pairs using the provided
Context. The input value of the WordCount Map task is a line of text from the
input data file, and the key is the offset of that line within the file:
<line_offset, line_of_text>. The Map task outputs <word, one> for each word in the line of text.
Pseudo-code
void Map (key, value)
{
for each word x in value:
output.collect(x, 1);
}
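A corresponding Java Mapper, given as a sketch against the org.apache.hadoop.mapreduce API (the class name Map matches the driver description below; field names are illustrative):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the input line into words and emit <word, 1> for each word
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one);
        }
    }
}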
Step-2. Write a Reducer
A Reducer collects the intermediate <key, value> output from all the Map tasks, receiving each word together with the list of counts emitted for it. The WordCount Reduce task sums these counts and outputs <word, total> for each word.
Pseudo-code
void Reduce (keyword, <list of value>)
{
sum = 0;
for each x in <list of value>:
sum += x;
final_output.collect(keyword, sum);
}
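A corresponding Java Reducer sketch (the class name Reduce matches the driver description below):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the counts emitted for this word by all Map tasks
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        context.write(key, new IntWritable(sum));
    }
}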
Step-3. Write a Driver
The Driver program configures and runs the MapReduce job. We use the main program
to perform basic configurations such as:
- Job Name: the name of this job
- Executable (Jar) Class: the main executable class; here, WordCount
- Mapper Class: the class which overrides the "map" function; here, Map
- Reducer Class: the class which overrides the "reduce" function; here, Reduce
- Output Key: the type of the output key; here, Text
- Output Value: the type of the output value; here, IntWritable
- File Input Path
- File Output Path
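A driver sketch that wires these settings together, assuming the Map and Reduce classes from the sketches above and taking the input and output paths from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount");             // Job Name
        job.setJarByClass(WordCount.class);                       // Executable (Jar) Class
        job.setMapperClass(Map.class);                            // Mapper Class
        job.setReducerClass(Reduce.class);                        // Reducer Class
        job.setOutputKeyClass(Text.class);                        // Output Key type
        job.setOutputValueClass(IntWritable.class);               // Output Value type
        FileInputFormat.addInputPath(job, new Path(args[0]));     // File Input Path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // File Output Path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The job would then typically be packaged into a jar and run with a command such as
hadoop jar wordcount.jar WordCount /input_dir /output_dir
where the jar name and the HDFS paths are illustrative; the output directory must not already exist.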