Big Data Analytics – Lab Manual

‭LAB MANUAL‬

‭PRACTICAL NO – 1‬

‭Exp No:‬

‭Date:‬

‭Aim:‬‭Installation of Single Node Hadoop Cluster on Ubuntu‬

THEORY:
Apache Hadoop 3.1 has noticeable improvements and many bug fixes over the previous stable 3.0 releases. This version has many improvements in HDFS and MapReduce. This how-to guide will help you to set up a Hadoop 3.1.0 single-node cluster on CentOS/RHEL 7/6/5, Ubuntu 18.04, 17.10, 16.04 & 14.04, Debian 9/8/7 and Linux Mint systems. This article has been tested with Ubuntu 18.04 LTS.

1. Prerequisites
Java is the primary requirement for running Hadoop on any system, so make sure you have Java installed. If you don't have Java installed on your system, use one of the following links to install it first. Hadoop supports only Java 8; if any other version is already present, uninstall it using these commands.
sudo apt-get purge openjdk-\* icedtea-\* icedtea6-\*
OR
sudo apt remove openjdk-8-jdk

‭∙‬‭Step 1.1 – Install Oracle Java 8 on Ubuntu‬


You need to enable an additional repository on your system to install Java 8 on an Ubuntu VPS. After that, install Oracle Java 8 on the Ubuntu system using apt-get. This repository contains a package named oracle-java8-installer, which is not an actual Java package; instead, it contains a script that installs Java on Ubuntu. Run the commands below to install Java 8 on Ubuntu and Linux Mint.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
OR
sudo apt install openjdk-8-jre-headless
sudo apt install openjdk-8-jdk
∙ Step 1.2 – Verify Java Installation
The apt repository also provides the package oracle-java8-set-default to set Java 8 as your default Java version. This package will be installed along with the Java installation. To make sure, run the command below.
sudo apt-get install oracle-java8-set-default
After successfully installing Oracle Java 8 using the above steps, let's verify the installed version using the following command.
java -version
java version "1.8.0_201"

Java(TM) SE Runtime Environment (build 1.8.0_201-b09)


Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)
∙ Step 1.3 – Setup the JAVA_HOME and JRE_HOME Variables
Add the Java path to the JAVA_HOME variable in the .bashrc file. Go to your home directory and, in the folder options, enable "show hidden files"; a .bashrc file will then be visible. Open the file and add the following line at the end.
NOTE: The path will be the location on your PC where Java is installed.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
NOTE: After making all the changes and saving the file, run the following command to apply the changes from .bashrc.
source ~/.bashrc
All done, you have successfully installed Java 8 on a Linux system.

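For reference, a minimal sketch of the lines this step adds to ~/.bashrc, assuming the Oracle Java 8 path used above (adjust the path to wherever Java is installed on your machine; the JRE_HOME line is optional and is included only because the step title mentions it):
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin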
2. Create Hadoop User
We recommend creating a normal (non-root) account for Hadoop to work under. To create the account, use the following commands.
adduser hadoop
passwd hadoop
Set up this new user for Hadoop separately from the normal users. NOTE: It is important to create a separate user with the username hadoop, otherwise you may run into path and file issues later.
Also run these commands from a user with admin privileges on the machine.
sudo adduser hadoop sudo
If you have already created the user and want to give sudo/root privileges to it, then run the following command.
sudo usermod -a -G sudo hadoop
Otherwise, you can directly edit the permission lines in the sudoers file. Switch to root by running
sudo -i or su - <username>
then open the sudoers file with the following command and add the line below to it.
visudo
hadoop ALL=(ALL:ALL) ALL
After creating the account, it is also required to set up key-based SSH for the hadoop account itself. To do this, execute the following commands.
su - hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Let's verify key-based login. The command below should not ask for a password, but the first time it will prompt for adding the RSA key to the list of known hosts.
ssh localhost
exit
Disable all firewall restrictions.
sudo ufw disable

If the above command doesn't work, then go with:
service iptables stop
OR
sudo chkconfig iptables off
Sometimes it's better to manage the firewall using third-party software, e.g. YaST.

3. Download Hadoop 3.1 Archive

In this step, download the Hadoop 3.1 archive file using the command below. You can also select an alternate download mirror to increase the download speed.
‭cd ~‬
‭wget http://www-eu.apache.org/dist/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz‬
‭tar xzf hadoop-3.1.0.tar.gz‬
‭mv hadoop-3.1.0 hadoop‬

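As a quick sanity check (not part of the original steps), you can confirm that the extracted archive works by printing the Hadoop version; the path below assumes the archive was extracted into your home directory and renamed to hadoop as above:
~/hadoop/bin/hadoop version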
4. Setup Hadoop Pseudo-Distributed Mode

4.1. Setup Hadoop Environment Variables
First, we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file and append the following values at the end of the file.
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now apply the changes in the current running environment:
source ~/.bashrc
Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME environment variable. Change the Java path as per the installation on your system; this path may vary with your operating system version and installation source, so make sure you are using the correct path.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

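If you are unsure of the Java path on your machine (a general tip, not from the original manual), you can usually resolve it with:
readlink -f $(which java)
This prints the full path of the java binary, for example /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java for the Ubuntu OpenJDK 8 package; JAVA_HOME is then the JVM installation directory, e.g. /usr/lib/jvm/java-8-openjdk-amd64.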
4.2. Setup Hadoop Configuration Files

Hadoop has many configuration files, which need to be configured as per the requirements of your Hadoop infrastructure. Let's start with the configuration for a basic Hadoop single-node cluster setup. First, navigate to the location below.
cd $HADOOP_HOME/etc/hadoop
Edit core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Edit hdfs-site.xml
‭<configuration>‬
‭<property>‬
‭<name>dfs.replication</name>‬
‭<value>1</value>‬
‭</property>‬
‭<property>‬
‭<name>dfs.name.dir</name>‬
‭<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>‬
‭</property>‬
‭<property>‬
‭<name>dfs.data.dir</name>‬
‭<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>‬
‭</property>‬
‭</configuration>‬

Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

Edit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
4.3. Format Namenode
Now format the namenode using the following command; check in the output that the storage directory has been successfully formatted.
hdfs namenode -format

Sample output:
WARNING: /home/hadoop/hadoop/logs does not exist. Creating.
2018-05-02 17:52:09,678 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.1.0
‭...‬
‭...‬
‭...‬
‭2018-05-02 17:52:13,717 INFO common.Storage: Storage directory‬
‭/home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted. 2018-05-02‬
‭17:52:13,806 INFO namenode.FSImageFormatProtobuf: Saving image file‬
‭/home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using‬
‭no‬
‭compression‬
‭2018-05-02 17:52:14,161 INFO namenode.FSImageFormatProtobuf: Image file‬
‭/home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size‬
‭391 bytes saved in 0 seconds .‬
‭2018-05-02 17:52:14,224 INFO namenode.NNStorageRetentionManager: Going to retain‬
‭1 images with txid >= 0‬
‭2018-05-02 17:52:14,282 INFO namenode.NameNode: SHUTDOWN_MSG:‬
‭/************************************************************‬
‭SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.1.1‬
‭************************************************************/‬

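As an optional check (not in the original text), you can confirm that the format step created the metadata directory configured in hdfs-site.xml; it should contain files such as fsimage_* and VERSION:
ls /home/hadoop/hadoopdata/hdfs/namenode/current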
5. Start Hadoop Cluster

Let's start your Hadoop cluster using the scripts provided by Hadoop. Just navigate to your $HADOOP_HOME/sbin directory and execute the scripts one by one.
cd $HADOOP_HOME/sbin/
Now run the start-dfs.sh script.
./start-dfs.sh
Sample output:
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [localhost]
2018-05-02 18:00:32,565 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Now run the start-yarn.sh script.
./start-yarn.sh
Sample output:
Starting resourcemanager
Starting nodemanagers

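To confirm that the daemons are running (a common check, not part of the original text), list the Java processes with the jps tool that ships with the JDK. On a single-node setup you would expect to see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager, along with Jps itself:
jps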
6. Access Hadoop Services in Browser

The Hadoop NameNode starts on port 9870 by default. Access your server on port 9870 in your favorite web browser.
http://localhost:9870/
Now access port 8042 for information about the cluster and all applications.
http://localhost:8042/
Access port 9864 to get details about your Hadoop node.
http://localhost:9864/

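Besides the web interfaces, you can also query the cluster state from the command line (a supplementary check, not part of the original steps); the following prints the configured capacity and the live datanodes with their usage:
hdfs dfsadmin -report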
7. Test Hadoop Single Node Setup

7.1 Make the required HDFS directories using the following commands.
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/hadoop
7.2 Copy all files from the local file system /var/log/apache2 to the Hadoop distributed file system using the command below.
bin/hdfs dfs -put /var/log/apache2 logs
7.3 Browse the Hadoop distributed file system by opening the URL below in the browser. You will see an apache2 folder in the list.
http://localhost:9870/explorer.html#/user/hadoop/logs/
‭PRACTICAL NO – 2‬

Exp No:
Date:

‭Aim:‬‭Hadoop Programming: Word Count MapReduce Program Using Eclipse‬

‭THEORY:‬

Steps to run the WordCount application in Eclipse
Step-1
Download Eclipse if you don't have it (64-bit Linux OS or 32-bit Linux OS).
Step-2
Open Eclipse and make a Java project.
In Eclipse, click on the File menu -> New -> Java Project. Enter your project name there; here it is WordCount. Make sure the Java version is 1.6 or above. Click on Finish.

Step-3
Make a Java class file and write the code.
Click on the WordCount project. There will be an 'src' folder. Right-click on the 'src' folder -> New -> Class. Enter the class file name; here it is Wordcount. Click on Finish.

‭Copy and Paste below code in Wordcount.java. Save it.‬


You will get lots of errors, but don't panic. This is because of the external Hadoop libraries that are required to run a MapReduce program.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Wordcount {

    // Mapper: emits (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sums up the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(Wordcount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Step-4
Add external libraries from Hadoop.
Right-click on the WordCount project -> Build Path -> Configure Build Path -> click on Libraries -> click on the 'Add External JARs...' button.
Select the below files from the Hadoop folder.
In my case: /usr/local/hadoop/share/hadoop
4.1 Add jar files from the /usr/local/hadoop/share/hadoop/common folder.
4.2 Add jar files from the /usr/local/hadoop/share/hadoop/common/lib folder.
4.3 Add jar files from the /usr/local/hadoop/share/hadoop/mapreduce folder (no need to add hadoop-mapreduce-examples-2.7.3.jar).
4.4 Add jar files from the /usr/local/hadoop/share/hadoop/yarn folder.
Click on OK. Now you can see that all errors in the code are gone.
Step 5
Running the MapReduce code.
5.1 Make an input file for the WordCount project.
Right-click on the WordCount project -> New -> File. Write the file name and click on OK. You can copy and paste the contents below into your input file.
car bus bike
bike bus aeroplane
truck car bus
5.2 Right-click on the WordCount project -> click on Run As -> click on Run Configurations… Make a new configuration by clicking on 'new launch configuration'. Set the configuration name, project name and class file name.
Output of the WordCount application and the output logs appear in the console.
Refresh the WordCount project: right-click on the project -> click on Refresh. You will find an 'out' directory in the project explorer. Open the 'out' directory. There will be a 'part-r-00000' file. Double-click to open it.
‭PRACTICAL NO – 3‬

‭Exp No:‬

‭Date:‬

‭Aim:‬‭Implementing Matrix Multiplication Using One Map-Reduce Step.‬

‭THEORY:‬
In mathematics, matrix multiplication or the matrix product is a binary operation that produces a matrix from two matrices. In more detail, if A is an n × m matrix and B is an m × p matrix, their matrix product AB is an n × p matrix, in which the m entries across a row of A are multiplied with the m entries down a column of B and summed to produce an entry of AB. When two linear transformations are represented by matrices, then the matrix product represents the composition of the two transformations.

Algorithm for Map Function:

for each element mij of M do
    produce (key, value) pairs as ((i,k), (M, j, mij)), for k = 1, 2, 3, ... up to the number of columns of N

for each element njk of N do
    produce (key, value) pairs as ((i,k), (N, j, njk)), for i = 1, 2, 3, ... up to the number of rows of M

return the set of (key, value) pairs, where each key (i,k) has a list with values (M, j, mij) and (N, j, njk) for all possible values of j.

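As an illustration (using the 2 × 2 matrices M and N that appear later in Step 8), the map function emits, for the element M(0,0) = 1, the pairs ((0,0), (M,0,1)) and ((0,1), (M,0,1)), one for each column k of N; and for the element N(0,0) = 5 it emits ((0,0), (N,0,5)) and ((1,0), (N,0,5)), one for each row i of M.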
‭Algorithm for Reduce Function:‬

for each key (i,k) do
    sort values that begin with M by j into listM
    sort values that begin with N by j into listN
    multiply mij and njk for the jth value of each list
    sum up mij x njk
return (i,k), Σj mij x njk
‭Step 1. Download the hadoop jar files with these links.‬

Download the Hadoop Common jar file: https://goo.gl/G4MyHp
$ wget https://goo.gl/G4MyHp -O hadoop-common-2.2.0.jar

Download the Hadoop MapReduce jar file: https://goo.gl/KT8yfB
$ wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-2.7.1.jar

Step 2. Creating the Mapper file for Matrix Multiplication.

import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;

public class Map extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, Text> {

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        int m = Integer.parseInt(conf.get("m")); // number of rows of M
        int p = Integer.parseInt(conf.get("p")); // number of columns of N
        String line = value.toString();
        // Each input line is (M, i, j, Mij) or (N, j, k, Njk)
        String[] indicesAndValue = line.split(",");
        Text outputKey = new Text();
        Text outputValue = new Text();
        if (indicesAndValue[0].equals("M")) {
            for (int k = 0; k < p; k++) {
                outputKey.set(indicesAndValue[1] + "," + k); // key = (i,k)
                outputValue.set(indicesAndValue[0] + "," + indicesAndValue[2] + "," + indicesAndValue[3]); // value = (M,j,Mij)
                context.write(outputKey, outputValue);
            }
        } else {
            // (N, j, k, Njk)
            for (int i = 0; i < m; i++) {
                outputKey.set(i + "," + indicesAndValue[2]); // key = (i,k)
                outputValue.set("N," + indicesAndValue[1] + "," + indicesAndValue[3]); // value = (N,j,Njk)
                context.write(outputKey, outputValue);
            }
        }
    }
}

Step 3. Creating the Reducer.java file for Matrix Multiplication.
‭import org.apache.hadoop.io.Text;‬

import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
import java.util.HashMap;

public class Reduce extends org.apache.hadoop.mapreduce.Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String[] value;
        // key = (i,k), values = [(M/N, j, V/W), ...]
        HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
        HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
        for (Text val : values) {
            value = val.toString().split(",");
            if (value[0].equals("M")) {
                hashA.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            } else {
                hashB.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
        }
        int n = Integer.parseInt(context.getConfiguration().get("n")); // common dimension
        float result = 0.0f;
        float m_ij;
        float n_jk;
        for (int j = 0; j < n; j++) {
            m_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
            n_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
            result += m_ij * n_jk;
        }
        if (result != 0.0f) {
            context.write(null,
                    new Text(key.toString() + "," + Float.toString(result)));
        }
    }
}

Step 4. Creating the MatrixMultiply.java file for Matrix Multiplication.

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MatrixMultiply {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MatrixMultiply <in_dir> <out_dir>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        // Matrix dimensions: M is m x n, N is n x p
        conf.set("m", "1000");
        conf.set("n", "100");
        conf.set("p", "1000");

        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "MatrixMultiply");
        job.setJarByClass(MatrixMultiply.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}

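Note (an observation, not from the original text): the driver hard-codes the dimensions as m = 1000, n = 100 and p = 1000. For the small 2 × 2 sample matrices used in Step 8 you could instead set
conf.set("m", "2");
conf.set("n", "2");
conf.set("p", "2");
The job still produces the correct products with the larger values, because missing entries are treated as 0 and zero results are not written, but it generates far more intermediate keys than necessary.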
‭Step 5. Compiling the program in particular folder named as operation/‬

$ javac -cp hadoop-common-2.2.0.jar:hadoop-mapreduce-client-core-2.7.1.jar:operation/:. -d operation/ Map.java

$ javac -cp hadoop-common-2.2.0.jar:hadoop-mapreduce-client-core-2.7.1.jar:operation/:. -d operation/ Reduce.java

$ javac -cp hadoop-common-2.2.0.jar:hadoop-mapreduce-client-core-2.7.1.jar:operation/:. -d operation/ MatrixMultiply.java
‭Step 6. Let’s retrieve the directory after compilation.‬

‭$ ls -R operation/‬

‭operation/:‬

‭www‬

‭operation/www:‬

‭ehadoopinfo‬

‭operation/www/ehadoopinfo:‬

‭com‬

‭operation/www/ehadoopinfo/com:‬

‭Map.class MatrixMultiply.class Reduce.class‬


‭Step 7. Creating Jar file for the Matrix Multiplication.‬

‭$ jar -cvf MatrixMultiply.jar -C operation/ .‬

‭added manifest‬

‭adding: www/(in = 0) (out= 0)(stored 0%)‬

‭adding: www/ehadoopinfo/(in = 0) (out= 0)(stored 0%)‬

‭adding: www/ehadoopinfo/com/(in = 0) (out= 0)(stored 0%)‬

adding: www/ehadoopinfo/com/Reduce.class(in = 2919) (out= 1271)(deflated 56%)
adding: www/ehadoopinfo/com/MatrixMultiply.class(in = 1815) (out= 932)(deflated 48%)
adding: www/ehadoopinfo/com/Map.class(in = 2353) (out= 993)(deflated 57%)

Step 8. Uploading the M and N files, which contain the matrix data, to HDFS.
‭$ cat M‬
‭M,0,0,1‬

‭M,0,1,2‬

‭M,1,0,3‬

‭M,1,1,4‬

‭$ cat N‬

‭N,0,0,5‬

‭N,0,1,6‬

‭N,1,0,7‬

‭N,1,1,8‬

‭$ hadoop fs -mkdir Matrix/‬

‭$ hadoop fs -copyFromLocal M Matrix/‬

‭$ hadoop fs -copyFromLocal N Matrix/‬

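At this point the job itself is run with the hadoop jar command. The exact invocation is not shown in the original manual; a likely form, assuming the classes were compiled into the www.ehadoopinfo.com package suggested by the directory listing in Step 6 and that the output directory is named result as read in Step 9, is:
$ hadoop jar MatrixMultiply.jar www.ehadoopinfo.com.MatrixMultiply Matrix result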

Step 9. Getting the output from part-r-00000 that was generated after the execution of the hadoop command.

‭$ hadoop fs -cat result/part-r-00000‬

‭0,0,19.0‬

‭0,1,22.0‬

‭1,0,43.0‬

‭1,1,50.0‬

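These values can be checked by hand from the reduce formula: for key (0,0) the sum is 1 × 5 + 2 × 7 = 19, for (0,1) it is 1 × 6 + 2 × 8 = 22, for (1,0) it is 3 × 5 + 4 × 7 = 43, and for (1,1) it is 3 × 6 + 4 × 8 = 50, matching the part-r-00000 output above.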