BDA Lab Manual

The document outlines the curriculum for the Big Data Analytics Laboratory at JKKN College of Engineering & Technology, detailing various experiments and procedures related to Hadoop installation and file management tasks. It includes step-by-step instructions for installing Hadoop, implementing matrix multiplication using MapReduce, and performing file management operations in HDFS. The document serves as a practical guide for students in the Information Technology department during the academic year 2023-2024.

JKKN COLLEGE OF ENGINEERING & TECHNOLOGY
(Approved by AICTE and Affiliated to Anna University)
Natarajapuram, Komarapalayam, Namakkal-638 183.

DEPARTMENT OF INFORMATION TECHNOLOGY

BONAFIDE CERTIFICATE

Certified that this is the bonafide record of work done by Mr/Ms............................................……..

with Register number …………………………… of…………………………...III Year / V Semester

during the academic year 2023 – 2024 for the CCS334 – BIG DATA ANALYTICS LABORATORY.

Signature of Lab In-charge Signature of Head of the Department

Submitted for the Anna University Practical Examination held on

Internal Examiner External Examiner

INDEX

Ex. No.   Name of the Experiment                                                              Page No.   Mark   Faculty Signature

1   Downloading and installing Hadoop; understanding different Hadoop modes, startup scripts, configuration files
2   Hadoop implementation of file management tasks, such as adding files and directories, retrieving files and deleting files
3   Implementation of Matrix Multiplication with Hadoop MapReduce
4   Run a basic Word Count MapReduce program to understand the MapReduce paradigm
5   Installation of Hive along with practice examples
6   Installation of HBase, installing Thrift along with practice examples
7   Practice importing and exporting data from various databases
Ex.No:1
HADOOP INSTALLATION
Date:

AIM
To download and install Hadoop and to understand the different Hadoop modes, startup scripts and configuration files.

PROCEDURE:
Step-by-step Hadoop 2.8.0 installation on Windows 10
Prepare:
The following software is required to install Hadoop 2.8.0 on Windows 10 (64-bit).


1) Download Hadoop 2.8.0
(Link: http://wwweu.apache.org/dist/hadoop/common/hadoop-2.8.0/hadoop-2.8.0.tar.gz OR
http://archive.apache.org/dist/hadoop/core/hadoop-2.8.0/hadoop-2.8.0.tar.gz)
2) Java JDK 1.8.0
(Link: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
Set up:
1) Check whether Java 1.8.0 is already installed on your system; use "javac -version" to check the Java version.
2) If Java is not installed on your system, first install Java under "C:\Java".
3) Extract hadoop-2.8.0.tar.gz (or hadoop-2.8.0.zip) and place the contents under "C:\Hadoop-2.8.0".
4) Set the HADOOP_HOME environment variable on Windows 10.
5) Set the JAVA_HOME environment variable on Windows 10.
6) Next, add the Hadoop bin directory and the Java bin directory to the PATH environment variable.

Configuration
a) Edit the file C:/Hadoop-2.8.0/etc/hadoop/core-site.xml, paste the XML paragraph below into it, and save the file.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
b) Rename "mapred-site.xml.template" to "mapred-site.xml" and edit this file
C:/Hadoop- 2.8.0/etc/hadoop/mapred-site.xml, paste below xml paragraph and
savethis file.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

c) Create folder "data" under "C:\Hadoop-2.8.0"


1) Create folder "datanode" under "C:\Hadoop-2.8.0\data"
2) Create folder "namenode" under "C:\Hadoop-2.8.0\data" data
d) Edit file C:\Hadoop-2.8.0/etc/hadoop/hdfs-site.xml, paste below
xml paragraphand save this file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>

<name>dfs.namenode.name.dir</name>
<value>C:\hadoop-2.8.0\data\namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>C:\hadoop-2.8.0\data\datanode</value>
</property>
</configuration>
e) Edit the file C:/Hadoop-2.8.0/etc/hadoop/yarn-site.xml, paste the XML paragraph below into it, and save the file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
f) Edit the file C:/Hadoop-2.8.0/etc/hadoop/hadoop-env.cmd and replace the line set "JAVA_HOME=%JAVA_HOME%" with set "JAVA_HOME=C:\Java" (where C:\Java is the path to the JDK 1.8.0 installation).

Hadoop Configuration
7) Download the file Hadoop Configuration.zip
(Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/Hadoop%20Configuration.zip)
8) Delete the bin folder at C:\Hadoop-2.8.0\bin and replace it with the bin folder from the Hadoop Configuration.zip you just downloaded.

9) Open cmd and type the command "hdfs namenode -format". This formats the HDFS namenode.
Testing
10) Open cmd, change directory to "C:\Hadoop-2.8.0\sbin" and type "start-all.cmd" to start the Hadoop daemons.
11) Make sure these applications are running:
a) Hadoop Namenode
b) Hadoop Datanode
c) YARN Resource Manager
d) YARN Node Manager
12) Open: http://localhost:8088
13) Open: http://localhost:50070
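Optionally, the configuration can be cross-checked from Java. The sketch below is only a minimal illustration, assuming the Hadoop jars are on the classpath and the install path from the steps above; it loads the edited core-site.xml through the Hadoop Configuration API and prints the configured default filesystem.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class CheckConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Explicitly load the core-site.xml edited in step (a) above
        conf.addResource(new Path("C:/Hadoop-2.8.0/etc/hadoop/core-site.xml"));
        // Should print hdfs://localhost:9000 if the configuration was picked up
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    }
}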

RESULT
Thus, Hadoop was downloaded and installed, and the different Hadoop modes, startup scripts and configuration files were studied successfully.

Ex.No:2
HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS
Date:

AIM:
To write a program to implement file management tasks in Hadoop, such as adding files and directories, retrieving files, and deleting files.

PROGRAM :
Implement the following file management tasks in Hadoop:
i. Adding files and directories
ii. Retrieving files
iii. Deleting files
Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the
data into HDFS first. Let's create a directory and put a file in it. HDFS has a default working
directory of /user/$USER, where $USER is your login user name. This directory isn't
automatically created for you, though, so let's create it with the mkdir command. For the
purpose of illustration, we use chuck. You should substitute your user name in the example
commands.
hadoop fs -mkdir /user/chuck
hadoop fs -put example.txt /user/chuck

Retrieving Files from HDFS


The Hadoop get command copies files from HDFS back to the local filesystem, while cat displays a file's contents on screen. To retrieve example.txt, we can run the following commands:
hadoop fs -get example.txt .
hadoop fs -cat example.txt

Deleting files from HDFS


hadoop fs -rm example.txt

● A directory can be created in HDFS with the command "hdfs dfs -mkdir /lendicse".
● A local directory is added to HDFS with the command "hdfs dfs -put lendi_english /".
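These file management tasks can also be performed programmatically through the Hadoop FileSystem Java API. The sketch below is a minimal illustration, assuming Hadoop is on the classpath and the configured default filesystem is HDFS; the class name and paths simply follow the example above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileTasks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);          // handle to the default filesystem (HDFS)

        // i. Adding files and directories
        fs.mkdirs(new Path("/user/chuck"));
        fs.copyFromLocalFile(new Path("example.txt"), new Path("/user/chuck/example.txt"));

        // ii. Retrieving files (HDFS -> local)
        fs.copyToLocalFile(new Path("/user/chuck/example.txt"), new Path("example_copy.txt"));

        // iii. Deleting files (false = do not delete recursively)
        fs.delete(new Path("/user/chuck/example.txt"), false);

        fs.close();
    }
}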

RESULT

Thus, the file management tasks in Hadoop were implemented, executed and verified successfully.

Ex.No:3 IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP
MAP REDUCE
Date:

AIM:
To Develop a Map Reduce program to implement Matrix Multiplication.

Procedure:
In mathematics, matrix multiplication or the matrix product is a binary operation that
produces a matrix from two matrices. The definition is motivated by linear equations and linear
transformations on vectors, which have numerous applications in applied mathematics, physics,
and engineering. In more detail, if A is an n × m matrix and B is an m × p matrix, their matrix
product AB is an n × p matrix, in which the m entries across a row of A are multiplied with the
m entries down a column of B and summed to produce an entry of AB. When two linear
transformations are represented by matrices, then the matrix product represents the composition
of the two transformations.

● Create two files M1, M2 and put the matrix values in them (separate columns with spaces and
rows with a line break); see the note on the input format expected by the mapper code after this list.

● Put the above files to HDFS at location /user/clouders/matrices/
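Note that the mapper code in Step 2 below parses every input line as a comma-separated triple row,column,value (one matrix cell per line), so input files in that sparse format would look like the following sketch; the concrete entries are only an illustration.

M1 (entries of M as "i,j,mij"):
0,0,1.0
0,1,2.0
1,0,3.0
1,1,4.0

M2 (entries of N as "j,k,njk"):
0,0,5.0
0,1,6.0
1,0,7.0
1,1,8.0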

Algorithm for the Map Function

a. For each element mij of M, produce the (key, value) pairs ((i,k), (M, j, mij)) for k = 1, 2, 3, ... up to the number of columns of N.
b. For each element njk of N, produce the (key, value) pairs ((i,k), (N, j, njk)) for i = 1, 2, 3, ... up to the number of rows of M.
c. Return the set of (key, value) pairs, so that each key (i,k) has a list with the values (M, j, mij) and (N, j, njk) for all possible values of j.

Algorithm for the Reduce Function

d. For each key (i,k):
e. Sort the values that begin with M by j into listM, and the values that begin with N by j into listN; multiply mij and njk for the jth value of each list.
f. Sum up the products mij x njk and return ((i,k), Σj mij x njk).
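As a small worked illustration with 2 x 2 matrices: the cell m11 of M produces ((1,1),(M,1,m11)) and ((1,2),(M,1,m11)), while the cell n12 of N produces ((1,2),(N,1,n12)) and ((2,2),(N,1,n12)). The key (1,2) therefore collects (M,1,m11), (M,2,m12), (N,1,n12) and (N,2,n22), and the reducer emits ((1,2), m11 x n12 + m12 x n22), which is exactly the (1,2) entry of the product AB.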

Step 1. Download the Hadoop jar files with these links.
Download Hadoop Common jar file: https://goo.gl/G4MyHp
$ wget https://goo.gl/G4MyHp -O hadoop-common-2.2.0.jar
Download Hadoop MapReduce jar file: https://goo.gl/KT8yfB
$ wget https://goo.gl/KT8yfB -O hadoop-mapreduce-client-core-2.7.1.jar

Step 2. Creating Mapper file for Matrix Multiplication.


import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.ReflectionUtils;
// Writable value type carrying the matrix tag (0 = M, 1 = N), the join index j and the cell value
class Element implements Writable {
    int tag;
    int index;
    double value;

    Element() {
        tag = 0;
        index = 0;
        value = 0.0;
    }

    Element(int tag, int index, double value) {
        this.tag = tag;
        this.index = index;
        this.value = value;
    }

    @Override
    public void readFields(DataInput input) throws IOException {
        tag = input.readInt();
        index = input.readInt();
        value = input.readDouble();
    }

    @Override
    public void write(DataOutput output) throws IOException {
        output.writeInt(tag);
        output.writeInt(index);
        output.writeDouble(value);
    }
}
// Composite key (i,k) identifying one cell of the product matrix
class Pair implements WritableComparable<Pair> {
    int i;
    int j;

    Pair() {
        i = 0;
        j = 0;
    }

    Pair(int i, int j) {
        this.i = i;
        this.j = j;
    }

    @Override
    public void readFields(DataInput input) throws IOException {
        i = input.readInt();
        j = input.readInt();
    }

    @Override
    public void write(DataOutput output) throws IOException {
        output.writeInt(i);
        output.writeInt(j);
    }

    @Override
    public int compareTo(Pair compare) {
        if (i > compare.i) {
            return 1;
        } else if (i < compare.i) {
            return -1;
        } else {
            if (j > compare.j) {
                return 1;
            } else if (j < compare.j) {
                return -1;
            }
        }
        return 0;
    }

    public String toString() {
        return i + " " + j + " ";
    }
}
public class Multiply {
    // Mapper for matrix M: input lines are "i,j,value"; emit key j (the column index of M) with tag 0
    public static class MatriceMapperM extends Mapper<Object, Text, IntWritable, Element> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String readLine = value.toString();
            String[] stringTokens = readLine.split(",");
            int index = Integer.parseInt(stringTokens[0]);
            double elementValue = Double.parseDouble(stringTokens[2]);
            Element e = new Element(0, index, elementValue);
            IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[1]));
            context.write(keyValue, e);
        }
    }
    // Mapper for matrix N: input lines are "j,k,value"; emit key j (the row index of N) with tag 1
    public static class MatriceMapperN extends Mapper<Object, Text, IntWritable, Element> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String readLine = value.toString();
            String[] stringTokens = readLine.split(",");
            int index = Integer.parseInt(stringTokens[1]);
            double elementValue = Double.parseDouble(stringTokens[2]);
            Element e = new Element(1, index, elementValue);
            IntWritable keyValue = new IntWritable(Integer.parseInt(stringTokens[0]));
            context.write(keyValue, e);
        }
    }
    // Reducer for the first job: join M and N on the shared index j and emit the partial products
    public static class ReducerMxN extends Reducer<IntWritable, Element, Pair, DoubleWritable> {
        @Override
        public void reduce(IntWritable key, Iterable<Element> values, Context context)
                throws IOException, InterruptedException {
            ArrayList<Element> M = new ArrayList<Element>();
            ArrayList<Element> N = new ArrayList<Element>();
            Configuration conf = context.getConfiguration();
            // Hadoop reuses the value object while iterating, so copy each element before
            // storing it; elements tagged 0 came from M, elements tagged 1 came from N
            for (Element element : values) {
                Element tempElement = ReflectionUtils.newInstance(Element.class, conf);
                ReflectionUtils.copy(conf, element, tempElement);
                if (tempElement.tag == 0) {
                    M.add(tempElement);
                } else if (tempElement.tag == 1) {
                    N.add(tempElement);
                }
            }
            for (int i = 0; i < M.size(); i++) {
                for (int j = 0; j < N.size(); j++) {
                    Pair p = new Pair(M.get(i).index, N.get(j).index);
                    double multiplyOutput = M.get(i).value * N.get(j).value;
                    context.write(p, new DoubleWritable(multiplyOutput));
                }
            }
        }
    }

    // Mapper for the second job: parse the "i j value" lines produced by the first job
    public static class MapMxN extends Mapper<Object, Text, Pair, DoubleWritable> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String readLine = value.toString();
            String[] pairValue = readLine.split(" ");
            Pair p = new Pair(Integer.parseInt(pairValue[0]), Integer.parseInt(pairValue[1]));
            DoubleWritable val = new DoubleWritable(Double.parseDouble(pairValue[2]));
            context.write(p, val);
        }
    }
    // Reducer for the second job: sum the partial products for each (i,k) pair
    public static class ReduceMxN extends Reducer<Pair, DoubleWritable, Pair, DoubleWritable> {
        @Override
        public void reduce(Pair key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0.0;
            for (DoubleWritable value : values) {
                sum += value.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJobName("MapIntermediate");
        job.setJarByClass(Multiply.class);
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, MatriceMapperM.class);
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, MatriceMapperN.class);
        job.setReducerClass(ReducerMxN.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Element.class);
        job.setOutputKeyClass(Pair.class);
        job.setOutputValueClass(DoubleWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.waitForCompletion(true);

        Job job2 = Job.getInstance();
        job2.setJobName("MapFinalOutput");
        job2.setJarByClass(Multiply.class);
        job2.setMapperClass(MapMxN.class);
        job2.setReducerClass(ReduceMxN.class);
        job2.setMapOutputKeyClass(Pair.class);
        job2.setMapOutputValueClass(DoubleWritable.class);
        job2.setOutputKeyClass(Pair.class);
        job2.setOutputValueClass(DoubleWritable.class);
        job2.setInputFormatClass(TextInputFormat.class);
        job2.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job2, new Path(args[2]));
        FileOutputFormat.setOutputPath(job2, new Path(args[3]));
        job2.waitForCompletion(true);
    }
}

Step 3. Compile the program in a folder named operation.

#!/bin/bash
rm -rf multiply.jar classes
module load hadoop/2.6.0
mkdir -p classes
javac -d classes -cp classes:`$HADOOP_HOME/bin/hadoop classpath` Multiply.java
jar cf multiply.jar -C classes .
echo "end"

Step 4. Run the program from the same folder.

export HADOOP_CONF_DIR=/home/$USER/cometcluster
module load hadoop/2.6.0
myhadoop-configure.sh
start-dfs.sh
start-yarn.sh
hdfs dfs -mkdir -p /user/$USER
hdfs dfs -put M-matrix-large.txt /user/$USER/M-matrix-large.txt
hdfs dfs -put N-matrix-large.txt /user/$USER/N-matrix-large.txt
hadoop jar multiply.jar edu.uta.cse6331.Multiply /user/$USER/M-matrix-large.txt /user/$USER/N-matrix-large.txt /user/$USER/intermediate /user/$USER/output
rm -rf output-distr
mkdir output-distr
hdfs dfs -get /user/$USER/output/part* output-distr
Output:

module load hadoop/2.6.0
rm -rf output intermediate
hadoop --config $HOME jar multiply.jar edu.uta.cse6331.Multiply M-matrix-small.txt N-matrix-small.txt intermediate output

RESULT

Thus, the program to implement matrix multiplication with Hadoop MapReduce was written,
executed and verified successfully.
Ex.No:4
IMPLEMENTATION OF WORD COUNT PROGRAMS
USING MAP REDUCE
Date:

AIM:
To write a program to implement a MapReduce application for word counting on a Hadoop cluster.
PROGRAM
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
public class WordCount {
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                value.set(tokenizer.nextToken());
                context.write(value, new IntWritable(1));
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable x : values) {
                sum += x.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "My Word Count Program");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        Path outputPath = new Path(args[1]);
        // Configuring the input/output path from the filesystem into the job
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Deleting the output path automatically from HDFS so that we don't have to delete it explicitly
        outputPath.getFileSystem(conf).delete(outputPath);
        // Exiting the job only if the flag value becomes false
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The entire MapReduce program can be fundamentally
divided into three parts:
• Mapper Phase Code
• Reducer Phase Code
• Driver Code
We will understand the code for each of these
three parts sequentially.

Mapper code:
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            value.set(tokenizer.nextToken());
            context.write(value, new IntWritable(1));
        }
    }
• We have created a class Map that extends the class Mapper which is already defined in the MapReduce Framework.
• We define the data types of the input and output key/value pair after the class declaration using angle brackets.
• Both the input and output of the Mapper is a key/value pair.
• Input:
◦ The key is nothing but the offset of each line in the text file: LongWritable
◦ The value is each individual line: Text
• Output:
◦ The key is the tokenized words: Text
◦ We have the hardcoded value in our case which is 1: IntWritable
◦ Example – Dear 1, Bear 1, etc.
• We have written Java code where we have tokenized each word and assigned it a hardcoded value equal to 1.
Reducer Code:
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable x : values) {
            sum += x.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
• We have created a class Reduce which extends the class Reducer, like that of the Mapper.
• We define the data types of the input and output key/value pair after the class declaration using angle brackets, as done for the Mapper.
• Both the input and the output of the Reducer is a key/value pair.
• Input:
◦ The key is nothing but those unique words which have been generated after the sorting and shuffling phase: Text
◦ The value is a list of integers corresponding to each key: IntWritable
◦ Example – Bear, [1, 1], etc.
• Output:
◦ The key is all the unique words present in the input text file: Text
◦ The value is the number of occurrences of each of the unique words: IntWritable
◦ Example – Bear, 2; Car, 3, etc.
• We have aggregated the values present in each of the lists corresponding to each key and produced the final answer.
• In general, a single reducer is created for each of the unique words, but you can specify the number of reducers in mapred-site.xml.
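Putting the two phases together on one illustrative input line (using the same words as the examples above):

Input line: Dear Bear River Car Car River
Mapper output: (Dear,1) (Bear,1) (River,1) (Car,1) (Car,1) (River,1)
After sorting and shuffling: (Bear,[1]) (Car,[1,1]) (Dear,[1]) (River,[1,1])
Reducer output: Bear 1, Car 2, Dear 1, River 2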
Driver Code:
Configuration conf = new Configuration();
Job job = new Job(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path outputPath = new Path(args[1]);
// Configuring the input/output path from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

• In the driver class, we set the configuration of our MapReduce job to run in Hadoop.
• The method setInputFormatClass() is used to specify how a Mapper will read the input data, or what the unit of work will be. Here, we have chosen TextInputFormat so that a single line is read by the mapper at a time from the input text file.
• The main() method is the entry point for the driver. In this method, we instantiate a new Configuration object for the job.
Run the MapReduce code:
The command for running a MapReduce code is:
hadoop jar hadoop-mapreduce-example.jar WordCount /sample/input /sample/output

Running the Word Count program on Windows:
Prepare:
Download MapReduceClient.jar
(Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/MapReduceClient.jar)
Download input_file.txt
(Link: https://github.com/MuhammadBilalYar/HADOOP-INSTALLATION-ON-WINDOW-10/blob/master/input_file.txt)
Place both files in "C:/".
1. Create an input directory in HDFS.
hadoop fs -mkdir /input_dir
2. Copy the input text file named input_file.txt into the input directory (input_dir) of HDFS.
hadoop fs -put C:/input_file.txt /input_dir
3. Verify the content of the copied file.
hadoop fs -cat /input_dir/input_file.txt
4. Run MapReduceClient.jar, providing the input and output directories.
hadoop jar C:/MapReduceClient.jar wordcount /input_dir /output_dir
5. Verify the content of the generated output file.
hadoop fs -cat /output_dir/*
6. To delete a file from an HDFS directory:
hadoop fs -rm /input_dir/input_file.txt
7. To delete a directory from HDFS:
hadoop fs -rm -r /input_dir
RESULT

Thus, the Java program to implement a MapReduce application for word counting on a
Hadoop cluster was written, executed and verified successfully.

Ex.No:5
INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES
Date:

AIM:

To install Hive and work through practice examples.

PROCEDURE:
Step 1: Verifying JAVA Installation
Java must be installed on your system before installing Hive. Let us verify java installation using the
following command:

If Java is already installed on your system, you get to see the following response:

If Java is not installed on your system, then follow the steps given below for installing Java.
Installing Java
Step I:
Download Java (JDK <latest version> - X64.tar.gz) by visiting the following link:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Then jdk-7u71-linux-x64.tar.gz will be downloaded onto your system.
Step II:
Generally you will find the downloaded Java file in the Downloads folder. Verify it and extract
the jdk-7u71-linux-x64.gz file using the following commands.

Step III:

To make java available to all the users, you have to move it to the location “/usr/local/”. Open root, and
type the following commands.

Step IV:
For setting up PATH and JAVA_HOME variables, add the following commands to ~/.bashrc file.

Now apply all the changes into the current running system.

Step V:
Use the following commands to configure java alternatives:

Now verify the installation using the command java -version from the terminal as explained above.
Step 2: Verifying Hadoop Installation
Hadoop must be installed on your system before installing Hive. Let us verify the Hadoop
installation using the following command:

If Hadoop is already installed on your system, then you will get the following response:

If Hadoop is not installed on your system, then proceed with the following steps:
Downloading Hadoop
Download and extract Hadoop 2.4.1 from Apache Software Foundation using the following commands.

Installing Hadoop in Pseudo Distributed Mode
The following steps are used to install Hadoop 2.4.1 in pseudo distributed mode.
Step I: Setting up Hadoop
You can set Hadoop environment variables by appending the following commands to ~/.bashrc file.

Now apply all the changes into the current running system.

Step II: Hadoop Configuration


You can find all the Hadoop configuration files in the location “$HADOOP_HOME/etc/hadoop”.
You need to make suitable changes in those configuration files according to your Hadoop
infrastructure.

In order to develop Hadoop programs using java, you have to reset the java environment variables in
hadoop-env.sh file by replacing JAVA_HOME value with the location of java in your system.

Given below are the list of files that you have to edit to configure Hadoop.
core-site.xml
The core-site.xml file contains information such as the port number used for Hadoop instance, memory
allocated for the file system, memory limit for storing the data, and the size of Read/Write buffers.
Open the core-site.xml and add the following properties in between the <configuration> and
</configuration> tags.

hdfs-site.xml
The hdfs-site.xml file contains information such as the value of replication data, the namenode path,
and the datanode path of your local file systems. It means the place where you want to store the
Hadoop infra.
Let us assume the following data.

Open this file and add the following properties in between the <configuration>, </configuration>
tags in this file.

Note: In the above file, all the property values are user-defined and you can make changes according to
your Hadoop infrastructure.
yarn-site.xml

This file is used to configure yarn into Hadoop. Open the yarn-site.xml file and add the following
properties in between the <configuration>, </configuration> tags in this file.

mapred-site.xml
This file is used to specify which MapReduce framework we are using. By default, Hadoop contains
a template of mapred-site.xml. First of all, you need to copy the file mapred-site.xml.template to
mapred-site.xml using the following command.

Open the mapred-site.xml file and add the following properties in between the <configuration>,
</configuration> tags in this file.

Verifying Hadoop Installation


The following steps are used to verify the Hadoop installation.
Step I: Name Node Setup
Set up the namenode using the command “hdfs namenode -format” as follows.

The expected result is as follows.

Step II: Verifying Hadoop dfs
The following command is used to start dfs. Executing this command will start your Hadoop file
system.

The expected output is as follows:

Step III: Verifying Yarn Script


The following command is used to start the yarn script. Executing this command will start your yarn
daemons.

The expected output is as follows:

Step IV: Accessing Hadoop on Browser


The default port number to access Hadoop is 50070. Use the following url to get Hadoop services on
your browser.

Step V: Verify all applications for cluster


The default port number to access all applications of cluster is 8088. Use the following url

to visit this service.

Step 3: Downloading Hive
We use hive-0.14.0 in this tutorial. You can download it by visiting the following link
http://apache.petsads.us/hive/hive-0.14.0/. Let us assume it gets downloaded onto the /Downloads
directory. Here, we download Hive archive named “apache-hive-0.14.0-bin.tar.gz” for this tutorial.
The following command is used to verify the download:

On successful download, you get to see the following response:

Step 4: Installing Hive


The following steps are required for installing Hive on your system. Let us assume the Hive archive
is downloaded onto the /Downloads directory.
Extracting and verifying Hive Archive
The following command is used to verify the download and extract the hive archive:

On successful download, you get to see the following response:


Copying files to /usr/local/hive directory
We need to copy the files as the super user "su -". The following commands are used to copy the
files from the extracted directory to the /usr/local/hive directory.

Setting up environment for Hive


You can set up the Hive environment by appending the following lines to ~/.bashrc file:

The following command is used to execute ~/.bashrc file.

Step 5: Configuring Hive
To configure Hive with Hadoop, you need to edit the hive-env.sh file, which is placed in the
$HIVE_HOME/conf directory. The following commands redirect to Hive config folder and copy
the template file:

Edit the hive-env.sh file by appending the following line:

Hive installation is completed successfully. Now you require an external database server to configure
Metastore. We use Apache Derby database.

Step 6: Downloading and Installing Apache Derby


Follow the steps given below to download and install Apache Derby:
Downloading Apache Derby
The following command is used to download Apache Derby. It takes some time to download.

The following command is used to verify the download:


On successful download, you get to see the following response:
Extracting and verifying Derby archive
The following commands are used for extracting and verifying the Derby archive:

On successful download, you get to see the following response:


Copying files to /usr/local/derby directory
We need to copy the files as the super user "su -". The following commands are used to copy the files
from the extracted directory to the /usr/local/derby directory:

You can set up the Derby environment by appending the following lines to ~/.bashrc file:

Create a directory to store Metastore


Create a directory named data in $DERBY_HOME directory to store Metastore data.

Derby installation and environmental setup is now complete.


Step 7: Configuring Metastore of Hive
Configuring Metastore means specifying to Hive where the database is stored. You can do this by
editing the hive-site.xml file, which is in the $HIVE_HOME/conf directory. First of all, copy the
template file using the following command:

Edit hive-site.xml and append the following lines between the <configuration> and
</configuration> tags:

Create a file named jpox.properties and add the following lines into it:

Step 8: Verifying Hive Installation
Before running Hive, you need to create the /tmp folder and a separate Hive folder in HDFS. Here,
we use the /user/hive/warehouse folder. You need to set write permission for these newly created
folders as shown below:

Now set them in HDFS before verifying Hive. Use the following commands:

The following commands are used to verify Hive installation:

On successful installation of Hive, you get to see the following response:

The following sample command is executed to display all the tables:

Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.

PROGRAM:
SYNTAX for HIVE Database Operations
Database Creation
CREATE DATABASE|SCHEMA [IF NOT EXISTS] <database name>
Drop Database Statement
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
Creating and Dropping Table in HIVE
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment] [ROW FORMAT row_format] [STORED AS file_format]
Loading Data into a Table
Syntax:
LOAD DATA LOCAL INPATH '<path>/u.data' OVERWRITE INTO TABLE u_data;
Alter Table in HIVE
Syntax
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Creating and Dropping View
CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)] [COMMENT table_comment] AS SELECT ...
Dropping View
Syntax:
DROP VIEW view_name
Functions in HIVE
String functions: substr(), upper(), regexp_replace(), etc. Mathematical functions: round(), ceil(), etc.
Date and time functions: year(), month(), day(), to_date(), etc.
Aggregate functions: sum(), min(), max(), count(), avg(), etc.
INDEXES
CREATE INDEX index_name ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[ [ ROW FORMAT ...] STORED AS ... | STORED BY ... ]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
Creating Index
CREATE INDEX index_ip ON TABLE log_data(ip_address) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED
REBUILD;
Altering and Inserting Index
ALTER INDEX index_ip_address ON log_data REBUILD;
Storing Index Data in Metastore
SET hive.index.compact.file=/home/administrator/Desktop/big/metastore_db/tmp/index_ipaddress_result;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
Dropping Index
DROP INDEX INDEX_NAME on TABLE_NAME;
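The DDL and DML statements above can be run from the Hive shell; they can also be submitted from a Java program through the Hive JDBC driver. The sketch below is only a minimal illustration: it assumes HiveServer2 is running on its default port 10000, that the hive-jdbc and hadoop-common jars are on the classpath, and the database and table names are made-up examples.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");           // register the Hive JDBC driver
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");     // connect to HiveServer2
        Statement stmt = con.createStatement();

        stmt.execute("CREATE DATABASE IF NOT EXISTS studentdb");
        stmt.execute("CREATE TABLE IF NOT EXISTS studentdb.marks "
                + "(name STRING, score INT) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

        ResultSet rs = stmt.executeQuery("SHOW TABLES IN studentdb");
        while (rs.next()) {
            System.out.println(rs.getString(1));                     // list the tables just created
        }
        con.close();
    }
}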

RESULT:

Thus, the installation of Hive along with practice examples was completed and verified successfully.

Ex.No:6
INSTALLATION OF HBASE, INSTALLING THRIFT ALONG WITH
EXAMPLES
Date:

AIM:

To write a procedure for the installation of HBase and Thrift along with examples.

PROCEDURE:

Installing HBase
We can install HBase in any of the three modes: Standalone mode, Pseudo Distributed mode, and
Fully Distributed mode.
Installing HBase in Standalone Mode
Download the latest stable version of HBase from http://www.interior-dsgn.com/apache/hbase/stable/
using the "wget" command, and extract it using the tar "zxvf" command. See the following commands.
$ cd /usr/local/
$ wget http://www.interior-dsgn.com/apache/hbase/stable/hbase-0.98.8-hadoop2-bin.tar.gz
$ tar -zxvf hbase-0.98.8-hadoop2-bin.tar.gz
Shift to super user mode and move the HBase folder to /usr/local as shown below.
$ su
$ password: enter your password here
$ mv hbase-0.99.1/* Hbase/
Configuring HBase in Standalone Mode
Before proceeding with HBase, you have to edit the following files and configure
HBase. hbase-env.sh
Set the java Home for HBase and open hbase-env.sh file from the conf folder. Edit JAVA_HOME
environment variable and change the existing path to your current JAVA_HOME variable as shown
below.
cd /usr/local/Hbase/conf
gedit hbase-env.sh
This will open the env.sh file of HBase. Now replace the existing JAVA_HOME value with your
current value as shown below.

export JAVA_HOME=/usr/lib/jvm/java-1.7.0
hbase-site.xml
This is the main configuration file of HBase. Set the data directory to an appropriate location by
opening the HBase home folder in /usr/local/HBase. Inside the conf folder, you will find several files,
open the hbase-site.xml file as shown below.
#cd /usr/local/HBase/
#cd conf
# gedit hbase-site.xml
Inside the hbase-site.xml file, you will find the <configuration> and </configuration> tags. Within
them, set the HBase directory under the property key with the name “hbase.rootdir” as shown below.

With this, the HBase installation and configuration part is successfully complete. We can start HBase
by using start-hbase.sh script provided in the bin folder of HBase. For that, open HBase Home
Folder and run HBase start script as shown below.
$cd /usr/local/HBase/bin
$./start-hbase.sh
If everything goes well, when you try to run HBase start script, it will prompt you a message saying
that HBase has started.
starting master, logging to /usr/local/HBase/bin/../logs/hbase-tpmaster-localhost.localdomain.out
Installing HBase in Pseudo-Distributed Mode
Let us now check how HBase is installed in pseudo-distributed mode.
Configuring HBase
Before proceeding with HBase, configure Hadoop and HDFS on your local system or on a remote
system and make sure they are running. Stop HBase if it is running.
hbase-site.xml
Edit hbase-site.xml file to add the following properties.

It will mention in which mode HBase should be run. In the same file, change the hbase.rootdir from the
local file system to your HDFS instance address, using the hdfs:// URI syntax. We are
running HDFS on the localhost at port 8030.

Starting HBase
After configuration is over, browse to HBase home folder and start HBase using the following
command.
$cd /usr/local/HBase
$bin/start-hbase.sh
Note: Before starting HBase, make sure Hadoop is running.
Checking the HBase Directory in HDFS
HBase creates its directory in HDFS. To see the created directory, browse to Hadoop bin and type the
following command.
$ ./bin/hadoop fs -ls /hbase
If everything goes well, it will give you the following output.
Found 7 items
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/.tmp
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/WALs
drwxr-xr-x - hbase users 0 2014-06-25 18:48 /hbase/corrupt
drwxr-xr-x - hbase users 0 2014-06-25 18:58 /hbase/data
-rw-r--r-- 3 hbase users 42 2014-06-25 18:41 /hbase/hbase.id
-rw-r--r-- 3 hbase users 7 2014-06-25 18:41 /hbase/hbase.version
drwxr-xr-x - hbase users 0 2014-06-25 21:49 /hbase/oldWALs
Starting and Stopping a Master
Using "local-master-backup.sh" you can start up to 10 servers. Open the HBase home folder
and execute the following command to start a backup master.
$ ./bin/local-master-backup.sh 2 4
To kill a backup master, you need its process id, which will be stored in a file named
"/tmp/hbase-USER-X-master.pid". You can kill the backup master using the following
command.

HBase Web Interface
To access the web interface of HBase, type the following url in the browser.
http://localhost:60010
This interface lists your currently running Region servers, backup masters and HBase tables.
HBase Region servers and Backup Masters

Starting and Stopping RegionServers
You can run multiple region servers from a single system using the following command.
$ ./bin/local-regionservers.sh start 2 3
To stop a region server, use the following command.
$ ./bin/local-regionservers.sh stop 3

Starting HBaseShell
After Installing HBase successfully, you can start HBase Shell. Below given are the sequence of
steps that are to be followed to start the HBase shell. Open the terminal, and login as super user.
Start Hadoop File System
Browse through Hadoop home sbin folder and start Hadoop file system as shown below.
$cd $HADOOP_HOME/sbin
$start-all.sh
Start HBase
Browse through the HBase root directory bin folder and start HBase.
$cd /usr/local/HBase
$./bin/start-hbase.sh
Start HBase Master Server
This will be the same directory. Start it as shown below.
$ ./bin/local-master-backup.sh start 2 (the number signifies a specific server)
Start Region
Start the region server as shown below.
$ ./bin/local-regionservers.sh start 3
Start HBase Shell
You can start HBase shell using the following command.
$cd bin
$./hbase shell
This will give you the HBase Shell Prompt as shown below.
2014-12-09 14:24:27,526 INFO [main] Configuration.deprecation:
hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported
commands. Type "exit<RETURN>" to leave the HBase Shell
Version 0.98.8-hadoop2, r6cfc8d064754251365e070a10a82eb169956d5fe, Fri
Nov 14 18:26:29 PST 2014
hbase(main):001:0>
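Besides the shell, tables can also be written and read from Java through the HBase client API. The sketch below is a minimal illustration using the Connection/Table API introduced in HBase 1.0 (older 0.98 releases use HTable instead); it assumes the hbase-client jars are on the classpath and that a table named 'emp' with a column family 'personal' already exists.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientExample {
    public static void main(String[] args) throws Exception {
        Configuration config = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(config);
             Table table = connection.getTable(TableName.valueOf("emp"))) {

            // Insert one cell: row "1", column personal:name
            Put put = new Put(Bytes.toBytes("1"));
            put.addColumn(Bytes.toBytes("personal"), Bytes.toBytes("name"), Bytes.toBytes("raju"));
            table.put(put);

            // Read the same cell back
            Get get = new Get(Bytes.toBytes("1"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("personal"), Bytes.toBytes("name"));
            System.out.println("personal:name = " + Bytes.toString(value));
        }
    }
}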

HBase Tables

RESULT:

Thus, the installation of HBase and Thrift along with examples was completed and verified
successfully.

Ex.No:7
IMPORTING AND EXPORTING DATA FROM VARIOUS
DATABASES
Date:

AIM:

To write a procedure for importing and exporting data from various databases.

PROCEDURE:
SQOOP is basically used to transfer data from relational databases such as MySQL and Oracle to data
warehouses such as Hadoop HDFS (Hadoop Distributed File System). Thus, when data is transferred from a
relational database to HDFS, we say we are importing data. Otherwise, when we transfer data from
HDFS to relational databases, we say we are exporting data.
Note: To import or export, the order of columns in both MySQL and Hive should be the same.

Importing data from MySQL to HDFS

In order to store data into HDFS, we make use of Apache Hive which provides an SQL-like interface
between the user and the Hadoop distributed file system (HDFS) which integrates Hadoop. We
perform the following steps:
Step 1: Log in to MySQL.
mysql -u root -pcloudera

Step 2: Create a database and table and insert data.
create database geeksforgeeeks;
create table geeksforgeeeks.geeksforgeeks(author_name varchar(65), total_no_of_articles int, phone_no int, address varchar(65));
insert into geeksforgeeeks.geeksforgeeks values("Rohan", 10, 123456789, "Lucknow");

Step 3: Create a database and table in Hive into which the data should be imported.
create table geeks_hive_table(name string, total_articles int, phone_no int, address string) row format delimited fields terminated by ',';

Step 4: Run the import command below on Hadoop.
sqoop import --connect \
jdbc:mysql://127.0.0.1:3306/database_name_in_mysql \
--username root --password cloudera \
--table table_name_in_mysql \
--hive-import --hive-table database_name_in_hive.table_name_in_hive \
--m 1

In the above command, the following things should be noted.

● 127.0.0.1 is the localhost IP address.
● 3306 is the port number for MySQL.
● m is the number of mappers.
Step 5: Check in Hive whether the data was imported successfully or not.

Exporting data from HDFS to MySQL

To export data into MySQL from HDFS, perform the following steps:

Step 1: Create a database and table in the hive.
create table hive_table_export(name string, company string, phone int, age int) row format delimited fields terminated by ',';

Step 2: Insert data into the hive table.
insert into hive_table_export values("Ritik","Amazon",234567891,35);
Data in Hive table

Step 3: Create a database and table in MySQL into which the data should be exported.

Step 4: Run the following command on Hadoop.
sqoop export --connect \
jdbc:mysql://127.0.0.1:3306/database_name_in_mysql \
--table table_name_in_mysql \
--username root --password cloudera \
--export-dir /user/hive/warehouse/hive_database_name.db/table_name_in_hive \
--m 1 \
--driver com.mysql.jdbc.Driver \
--input-fields-terminated-by ','

Step 5: Check in MySQL whether the data was exported successfully or not.
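The same check can be performed programmatically with a small JDBC query against the target table. The sketch below is only an illustration: it assumes the MySQL Connector/J driver is on the classpath and reuses the example credentials and placeholder database/table names from the commands above.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class VerifyExport {
    public static void main(String[] args) throws Exception {
        // Same JDBC URL, user and password as in the sqoop export command
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://127.0.0.1:3306/database_name_in_mysql", "root", "cloudera");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery("SELECT * FROM table_name_in_mysql");
        while (rs.next()) {
            System.out.println(rs.getString(1));   // print the first column of every exported row
        }
        con.close();
    }
}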

RESULT:

Thus, the procedure for importing and exporting data from various databases was executed and
verified successfully.
