Big Data Analytics - Lab-Manual
PRACTICAL NO – 1
Exp No:
Date:
THEORY:
Apache Hadoop 3.1 has noticeable improvements and many bug fixes over the previous stable 3.0 releases. This version has many improvements in HDFS and MapReduce. This how-to guide will help you set up a Hadoop 3.1.0 single-node cluster on CentOS/RHEL 7/6/5, Ubuntu 18.04, 17.10, 16.04 & 14.04, Debian 9/8/7 and Linux Mint systems. This article has been tested with Ubuntu 18.04 LTS.
1. Prerequisites
Java is the primary requirement for running Hadoop on any system, so make sure you have Java installed on your system using the following command. If you don't have Java installed on your system, use one of the following links to install it first. Hadoop supports only Java 8; if any other version is already present, uninstall it using these commands.
sudo apt-get purge openjdk-\* icedtea-\* icedtea6-\*
OR
sudo apt remove openjdk-8-jdk
Edit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Edit hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
Edit mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
4.3. Format Namenode
Now format the namenode using the following command; make sure the storage directory configured in hdfs-site.xml is correct.
$ hdfs namenode -format
Sample output:
WARNING: /home/hadoop/hadoop/logs does not exist. Creating.
2018-05-02 17:52:09,678 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = localhost/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 3.1.0
...
...
...
2018-05-02 17:52:13,717 INFO common.Storage: Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted.
2018-05-02 17:52:13,806 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression
2018-05-02 17:52:14,161 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/hadoopdata/hdfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds.
2018-05-02 17:52:14,224 INFO namenode.NNStorageRetentionManager: Going to retain
1 images with txid >= 0
2018-05-02 17:52:14,282 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.1.1
************************************************************/
Now access port 8042 to get information about the cluster and all applications: http://localhost:8042/
PRACTICAL NO – 2
Exp No:
Date:
THEORY:
Step-3
Make a Java class file and write the code.
Click on the WordCount project. There will be an 'src' folder. Right click on the 'src' folder -> New -> Class. Write the class file name, here WordCount. Click on Finish.
Step-4
Add external libraries from Hadoop.
Right click on WordCount Project -> Build Path -> Configure Build Path -> click on Libraries -> click on the 'Add External Jars...' button.
Select the following jar files from the Hadoop folder (in my case /usr/local/hadoop/share/hadoop):
4.1 Add jar files from the /usr/local/hadoop/share/hadoop/common folder.
4.2 Add jar files from the /usr/local/hadoop/share/hadoop/common/lib folder.
4.3 Add jar files from the /usr/local/hadoop/share/hadoop/mapreduce folder (no need to add hadoop-mapreduce-examples-2.7.3.jar).
4.4 Add jar files from the /usr/local/hadoop/share/hadoop/yarn folder.
Click on OK. Now you can see that all errors in the code are gone.
Step 5
Running the MapReduce code.
5.1 Make an input file for the WordCount project.
Right click on WordCount project -> New -> File. Write the file name and click on OK. You can copy and paste the contents below into your input file.
car bus bike
bike bus aeroplane
truck car bus
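To know what output to expect from this input, the core logic of WordCount (tokenize each line, emit each word with a count of 1, sum the counts per word) can be sketched in plain Java. This is only an illustrative sketch: WordCountSim is a made-up name, not part of the project, and no Hadoop is needed to run it.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSim {
    // Core logic of WordCount: split lines into words and count each word.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines)
            for (String word : line.trim().split("\\s+"))
                counts.merge(word, 1, Integer::sum); // emit (word, 1), then sum
        return counts;
    }

    public static void main(String[] args) {
        String[] input = { "car bus bike", "bike bus aeroplane", "truck car bus" };
        // Print one "word<TAB>count" line per word, like part-r-00000.
        for (Map.Entry<String, Integer> e : count(input).entrySet())
            System.out.println(e.getKey() + "\t" + e.getValue());
    }
}
```

For the three-line input above, bus appears 3 times, car and bike twice each, and aeroplane and truck once each.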
5.2 Right click on WordCount Project -> click on Run As -> click on Run Configurations... Make a new configuration by clicking on 'New launch configuration'. Set the configuration name, project name, and class file name.
The output of the WordCount application and the output logs appear in the console.
Refresh the WordCount project: right click on the project -> click on Refresh. You will find an 'out' directory in the project explorer. Open the 'out' directory; there will be a 'part-r-00000' file. Double click to open it.
PRACTICAL NO – 3
Exp No:
Date:
THEORY:
In mathematics, matrix multiplication or the matrix product is a binary operation that produces a
matrix from two matrices. In more detail, if A is an n × m matrix and B is an m × p matrix, their
matrix product AB is an n × p matrix, in which the m entries across a row of A are multiplied
with the m entries down a column of B and summed to produce an entry of AB. When two
linear transformations are represented by matrices, then the matrix product represents the
composition of the two transformations.
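In symbols, with A an n × m matrix and B an m × p matrix, each entry of the product is:

```latex
(AB)_{ik} = \sum_{j=1}^{m} a_{ij}\, b_{jk},
\qquad i = 1, \dots, n, \quad k = 1, \dots, p
```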
For each element m_ij of M, produce the (key, value) pairs ((i,k), (M, j, m_ij)) for k = 1, 2, ..., up to the number of columns of N.
For each element n_jk of N, produce the (key, value) pairs ((i,k), (N, j, n_jk)) for i = 1, 2, ..., up to the number of rows of M.
Return the set of (key, value) pairs in which each key (i,k) has a list with values (M, j, m_ij) and (N, j, n_jk) for all possible values of j.
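The map, shuffle, and reduce steps described above can be simulated end-to-end in plain Java, with no Hadoop required. This is only a sketch: MatMulMapReduceSim is an illustrative name, and the 2×2 sample matrices match the M and N files used later in this practical (M's first entry, 1, is inferred from the printed result, since the listing of M is truncated in this manual).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MatMulMapReduceSim {
    // Simulates the matrix-multiply MapReduce job for M (m x n) and N (n x p),
    // given as sparse "name,row,col,value" lines; returns "i,k,value" lines.
    static List<String> run(String[] input, int m, int n, int p) {
        // Map phase + shuffle: group key (i,k) -> list of (M/N, j, value).
        Map<String, List<String[]>> shuffled = new TreeMap<>();
        for (String line : input) {
            String[] t = line.split(",");
            if (t[0].equals("M")) {
                for (int k = 0; k < p; k++)          // emit ((i,k), (M, j, m_ij))
                    shuffled.computeIfAbsent(t[1] + "," + k, x -> new ArrayList<>())
                            .add(new String[]{"M", t[2], t[3]});
            } else {
                for (int i = 0; i < m; i++)          // emit ((i,k), (N, j, n_jk))
                    shuffled.computeIfAbsent(i + "," + t[2], x -> new ArrayList<>())
                            .add(new String[]{"N", t[1], t[3]});
            }
        }
        // Reduce phase: for each key (i,k), join M and N values on j and sum.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String[]>> e : shuffled.entrySet()) {
            Map<Integer, Float> hashA = new HashMap<>(), hashB = new HashMap<>();
            for (String[] t : e.getValue())
                (t[0].equals("M") ? hashA : hashB)
                        .put(Integer.parseInt(t[1]), Float.parseFloat(t[2]));
            float result = 0.0f;
            for (int j = 0; j < n; j++)
                result += hashA.getOrDefault(j, 0.0f) * hashB.getOrDefault(j, 0.0f);
            out.add(e.getKey() + "," + result);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {
            "M,0,0,1", "M,0,1,2", "M,1,0,3", "M,1,1,4",
            "N,0,0,5", "N,0,1,6", "N,1,0,7", "N,1,1,8"
        };
        for (String line : run(input, 2, 2, 2))
            System.out.println(line);   // prints i,k,value lines of the product
    }
}
```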
$ wget https://goo.gl/G4MyHp -O hadoop-common-2.2.0.jar
$ wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-2.7.1.jar
Step 2. Creating Map.java file for Matrix Multiplication.
package www.ehadoopinfo.com;
import java.io.IOException;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class Map extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        int m = Integer.parseInt(conf.get("m"));
        int p = Integer.parseInt(conf.get("p"));
        // input line: (M/N, i, j, value)
        String[] indicesAndValue = value.toString().split(",");
        Text outputKey = new Text();
        Text outputValue = new Text();
        if (indicesAndValue[0].equals("M")) {
            for (int k = 0; k < p; k++) {
                // outputKey.set(i,k);
                outputKey.set(indicesAndValue[1] + "," + k);
                // outputValue.set(M,j,Mij);
                outputValue.set("M," + indicesAndValue[2] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        } else {
            for (int i = 0; i < m; i++) {
                // outputKey.set(i,k); outputValue.set(N,j,Njk);
                outputKey.set(i + "," + indicesAndValue[2]);
                outputValue.set("N," + indicesAndValue[1] + "," + indicesAndValue[3]);
                context.write(outputKey, outputValue);
            }
        }
    }
}
Step 3. Creating Reducer.java file for Matrix Multiplication.
package www.ehadoopinfo.com;
import java.io.IOException;
import java.util.HashMap;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
public class Reduce extends Reducer<Text, Text, Text, Text> {
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String[] value;
        // key = (i,k),
        // values = [(M/N,j,V/W),..]
        HashMap<Integer, Float> hashA = new HashMap<Integer, Float>();
        HashMap<Integer, Float> hashB = new HashMap<Integer, Float>();
        for (Text val : values) {
            value = val.toString().split(",");
            if (value[0].equals("M")) {
                hashA.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            } else {
                hashB.put(Integer.parseInt(value[1]), Float.parseFloat(value[2]));
            }
        }
        int n = Integer.parseInt(context.getConfiguration().get("n"));
        float result = 0.0f;
        float m_ij;
        float n_jk;
        for (int j = 0; j < n; j++) {
            m_ij = hashA.containsKey(j) ? hashA.get(j) : 0.0f;
            n_jk = hashB.containsKey(j) ? hashB.get(j) : 0.0f;
            result += m_ij * n_jk;
        }
        if (result != 0.0f) {
            context.write(null, new Text(key.toString() + "," + Float.toString(result)));
        }
    }
}
Step 4. Creating MatrixMultiply.java file for Matrix Multiplication.
package www.ehadoopinfo.com;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class MatrixMultiply {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MatrixMultiply <in_dir> <out_dir>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        // M is an m x n matrix; N is an n x p matrix.
        conf.set("m", "1000");
        conf.set("n", "100");
        conf.set("p", "1000");
        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "MatrixMultiply");
        job.setJarByClass(MatrixMultiply.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}
$ javac -cp hadoop-common-2.2.0.jar:hadoop-mapreduce-client-core-2.7.1.jar:operation/:. -d operation/ Map.java
$ javac -cp hadoop-common-2.2.0.jar:hadoop-mapreduce-client-core-2.7.1.jar:operation/:. -d operation/ Reduce.java
$ javac -cp hadoop-common-2.2.0.jar:hadoop-mapreduce-client-core-2.7.1.jar:operation/:. -d operation/ MatrixMultiply.java
Step 6. Let’s retrieve the directory after compilation.
$ ls -R operation/
operation/:
www
operation/www:
ehadoopinfo
operation/www/ehadoopinfo:
com
operation/www/ehadoopinfo/com:
added manifest
$ cat M
M,0,0,1
M,0,1,2
M,1,0,3
M,1,1,4
$ cat N
N,0,0,5
N,0,1,6
N,1,0,7
N,1,1,8
Output of the matrix multiplication job:
0,0,19.0
0,1,22.0
1,0,43.0
1,1,50.0
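As a hand check, taking the sample matrices to be M = ((1,2),(3,4)) and N = ((5,6),(7,8)) (M's first entry is recovered from the output itself), the job's result agrees with the matrix product computed directly:

```latex
MN =
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
=
\begin{pmatrix}
1\cdot 5 + 2\cdot 7 & 1\cdot 6 + 2\cdot 8 \\
3\cdot 5 + 4\cdot 7 & 3\cdot 6 + 4\cdot 8
\end{pmatrix}
=
\begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
```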