CCS334 BDA Lab Manual
DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE
OUTCOMES:
After the completion of this course, students will be able to:
CO1: Describe big data and use cases from selected business domains.
CO2: Explain NoSQL big data management.
CO3: Install, configure, and run Hadoop and HDFS.
CO4: Perform map-reduce analytics using Hadoop.
CO5: Use Hadoop-related tools such as HBase, Cassandra, Pig, and Hive for big data
analytics.
SOFTWARE REQUIRED:
Cassandra, Hadoop, Java, Pig, Hive and HBase.
TABLE OF CONTENTS
S.NO.  TITLE OF THE EXPERIMENTS                                PAGE NO.
5.     Installation of Hive along with Practice Examples.           24
7.     Installation of Thrift.                                       30
EX. No. 1: DOWNLOADING AND INSTALLING HADOOP; UNDERSTANDING
DIFFERENT HADOOP MODES. STARTUP SCRIPTS, CONFIGURATION
FILES.
VIRTUAL BOX (for Linux): used to install and run the guest operating system.
OPERATING SYSTEM: You can install Hadoop on Windows or on Linux-based
operating systems. Ubuntu and CentOS are very commonly used.
JAVA: You need to install the Java 8 (JDK 1.8) package on your system.
HADOOP: You require a recent Hadoop release; this manual uses Hadoop 3.3.0.
1. Install Java
Download the Java JDK from:
https://www.oracle.com/java/technologies/javase-jdk8-downloads.html
Extract and install Java in C:\Java
Open cmd and verify the installation: javac -version
2. Download Hadoop
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
Extract it so that Hadoop resides in C:\Hadoop-3.3.0 (the path used in the remaining steps).
3. Set the JAVA_HOME environment variable
4. Set the HADOOP_HOME environment variable
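These variables can also be set from the command line; a minimal sketch, assuming Java is installed in C:\Java and Hadoop is extracted to C:\Hadoop-3.3.0 as above (setx values take effect in a newly opened cmd window):
setx JAVA_HOME "C:\Java"
setx HADOOP_HOME "C:\Hadoop-3.3.0"
Also add C:\Java\bin, C:\Hadoop-3.3.0\bin and C:\Hadoop-3.3.0\sbin to the Path variable (System Properties > Environment Variables).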
5. Configurations
Edit the file C:\Hadoop-3.3.0\etc\hadoop\core-site.xml, paste the following XML and save the file.
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
======================================================
Rename "mapred-site.xml.template" to "mapred-site.xml" (if the .template file is present), edit the file C:\Hadoop-3.3.0\etc\hadoop\mapred-site.xml, paste the following XML and save the file.
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
======================================================
Create folder “data” under “C:\Hadoop-3.3.0”
Create folder “datanode” under “C:\Hadoop-3.3.0\data”
Create folder “namenode” under “C:\Hadoop-3.3.0\data”
======================================================
Edit file C:\Hadoop-3.3.0/etc/hadoop/hdfs-site.xml,
paste xml code and save this file.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-3.3.0/data/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-3.3.0/data/datanode</value>
</property>
</configuration>
======================================================
Edit file C:/Hadoop-3.3.0/etc/hadoop/yarn-site.xml,
paste xml code and save this file.
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
======================================================
Edit the file C:\Hadoop-3.3.0\etc\hadoop\hadoop-env.cmd and replace the line
set JAVA_HOME=%JAVA_HOME%
with
set JAVA_HOME=C:\Java
then save the file.
6. Hadoop Configurations
Download
https://github.com/brainmentorspvtltd/BigData_RDE/blob/master/Hadoop%20Configuration.zip
or (for hadoop 3)
https://github.com/s911415/apache-hadoop-3.1.0-winutils
Copy the bin folder from the download and replace the existing bin folder at C:\Hadoop-3.3.0\bin
Format the NameNode
Open cmd and type the command: hdfs namenode -format
7. Testing
Open cmd and change directory to C:\Hadoop-3.3.0\sbin
type start-all.cmd
Open: http://localhost:8088 (YARN ResourceManager web UI)
Open: http://localhost:9870 (HDFS NameNode web UI)
EX. No : 2 HADOOP IMPLEMENTATION OF FILE MANAGEMENT TASKS, SUCH AS
ADDING FILES AND DIRECTORIES, RETRIEVING FILES AND
DELETING FILES
Example:
hadoop fs -cat /user/saurzcode/dir1/abc.txt
Example:
hadoop fs -cp /user/saurzcode/dir1/abc.txt /user/saurzcode/dir2
copyFromLocal
Usage:
hadoop fs -copyFromLocal <localsrc> URI
Example:
hadoop fs -copyFromLocal /home/saurzcode/abc.txt /user/saurzcode/abc.txt
Similar to the put command, except that the source is restricted to a local file reference.
copyToLocal
Usage:
hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
Similar to the get command, except that the destination is restricted to a local file reference.
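The experiment also covers adding, retrieving and deleting files and directories; a short sketch of the corresponding commands, assuming the same /user/saurzcode paths used above:
hadoop fs -mkdir /user/saurzcode/dir1
hadoop fs -put /home/saurzcode/abc.txt /user/saurzcode/dir1
hadoop fs -get /user/saurzcode/dir1/abc.txt /home/saurzcode/
hadoop fs -rm /user/saurzcode/dir1/abc.txt
hadoop fs -rm -r /user/saurzcode/dir1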
EX. No : 3 IMPLEMENTATION OF MATRIX MULTIPLICATION WITH HADOOP MAP REDUCE
AIM:-
To write a Map Reduce Program that implements Matrix Multiplication.
PROCEDURE:
We assume that the input matrices are already stored in Hadoop Distributed File System
(HDFS) in a suitable format (e.g., CSV, TSV) where each row represents a matrix element. The
matrices are compatible for multiplication (the number of columns in the first matrix is equal
to the number of rows in the second matrix).
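For instance, with this (row, column, value) layout a small 2 x 2 matrix could be stored one element per line as follows (the values are purely illustrative):
0,0,1
0,1,2
1,0,3
1,1,4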
STEP 1: MAPPER
The mapper will take the input matrices and emit key-value pairs for each element in
the result matrix. The key will be the (row, column) index of the result element, and the value
will be the corresponding element value.
STEP 2: REDUCER
The reducer will take the key-value pairs emitted by the mapper and calculate the partial
sum for each element in the result matrix.
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;
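// Note: in Java each public class below must be saved in its own .java file
// (MatrixMultiplicationMapper.java, MatrixMultiplicationReducer.java, MatrixMultiplicationDriver.java).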
public class MatrixMultiplicationMapper extends Mapper<LongWritable, Text, Text, Text>
{
protected void map(LongWritable key, Text value, Context context) throws IOException,
InterruptedException {
// Parse the input line to get row, column, and value of each element in the input matrices
String[] elements = value.toString().split(",");
int row = Integer.parseInt(elements[0]);
int col = Integer.parseInt(elements[1]);
int val = Integer.parseInt(elements[2]);
// Emit key-value pairs where key is (row, column) index of the result element
// and value is the corresponding element value
context.write(new Text(row + "," + col), new Text(String.valueOf(val)));
}
}
public class MatrixMultiplicationReducer extends Reducer<Text, Text, Text, IntWritable> {
protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException,
InterruptedException {
int result = 0;
for (Text value : values) {
// Accumulate the partial sum for the result element
result += Integer.parseInt(value.toString());
}
// Emit the final result for the result element
context.write(key, new IntWritable(result));
}
}
public class MatrixMultiplicationDriver {
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Matrix Multiplication");
job.setJarByClass(MatrixMultiplicationDriver.class);
job.setMapperClass(MatrixMultiplicationMapper.class);
job.setReducerClass(MatrixMultiplicationReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
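A typical way to compile and submit the job (the jar name and HDFS paths below are assumptions, and HADOOP_CLASSPATH must point to the JDK's tools.jar for the first command):
hadoop com.sun.tools.javac.Main MatrixMultiplication*.java
jar cf matrixmult.jar MatrixMultiplication*.class
hadoop jar matrixmult.jar MatrixMultiplicationDriver /user/input/matrices /user/output/result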
RESULT:
Thus the Map Reduce Program that implements Matrix Multiplication was executed
and verified successfully.
EX. NO: 4 RUN A BASIC WORD COUNT MAP REDUCE PROGRAM TO
UNDERSTAND MAP REDUCE PARADIGM.
AIM:-
To write a Basic Word Count program to understand Map Reduce Paradigm.
PROCEDURE:
The entire MapReduce program can be fundamentally divided into three parts:
Mapper Phase Code
Reducer Phase Code
Driver Code
Input:
The key is nothing but the unique words that have been generated after the sorting
and shuffling phase: Text
The value is a list of integers corresponding to each key: IntWritable
Example – Bear, [1, 1], etc.
Output:
The key is all the unique words present in the input text file: Text
The value is the number of occurrences of each of the unique words: IntWritable
Example – Bear, 2; Car, 3, etc.
We have aggregated the values present in each of the lists corresponding to each key and
produced the final answer.
In general, the reduce() method is called once for each unique word; the number of reduce
tasks (one by default) can be configured in mapred-site.xml.
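Equivalently, the number of reduce tasks can be set programmatically in the driver; a one-line illustration (not part of the listing below):
job.setNumReduceTasks(2);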
PROGRAM:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.fs.Path;
public class WordCount
{
public static class Map extends Mapper<LongWritable,Text,Text,IntWritable> {
public void map(LongWritable key, Text value,Context context) throws
IOException,InterruptedException{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens()) {
value.set(tokenizer.nextToken());
context.write(value, new IntWritable(1));
}
}
}
public static class Reduce extends Reducer<Text,IntWritable,Text,IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values,Context context)
throws IOException,InterruptedException {
int sum=0;
for(IntWritable x: values)
{
sum+=x.get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf= new Configuration();
Job job = Job.getInstance(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path outputPath = new Path(args[1]);
//Configuring the input/output path from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//deleting the output path automatically from hdfs so that we don't have to delete it explicitly
outputPath.getFileSystem(conf).delete(outputPath, true);
//exit with status 0 if the job completes successfully, 1 otherwise
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
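Before running the jar, the input file must already exist in HDFS; a minimal preparation sketch (sample.txt is an assumed local file name):
hadoop fs -mkdir -p /sample/input
hadoop fs -put sample.txt /sample/input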
Run the MapReduce code:
The command for running a MapReduce code is:
hadoop jar hadoop-mapreduce-example.jar WordCount /sample/input /sample/output
OUTPUT:
RESULT:
Thus the Map Reduce Program that implements word count was executed and verified
successfully.
EX. NO : 5 INSTALLATION OF HIVE ALONG WITH PRACTICE EXAMPLES.
PREREQUISITES:
Java Development Kit (JDK) installed and the JAVA_HOME environment variable
set.
Hadoop installed and configured on your Windows system.
STEP-BY-STEP INSTALLATION:
1. Download HIVE:
Visit the Apache Hive website and download the latest stable version of Hive.
Official Apache Hive website: https://hive.apache.org/
2. Extract the Downloaded Hive Archive to a Directory on Your Windows Machine,
e.g., C:\hive.
3. Configure Hive:
Open the Hive configuration file (hive-site.xml) located in the conf folder of the
extracted Hive directory.
Set the necessary configurations, such as Hive Metastore connection settings and
Hadoop configurations. Make sure to adjust paths accordingly for Windows. Here's an
example of some configurations:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:derby:;databaseName=/path/to/metastore_db;create=true</value>
<description>JDBC connect string for a JDBC metastore.</description>
</property>
<!-- Other Hive configurations -->
</configuration>
5. Initialize and Start the Hive Metastore:
Initialize the metastore schema with the schematool utility, then start the metastore service.
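With the embedded Derby metastore configured in hive-site.xml above, a typical sequence is (adjust -dbType for other metastore databases):
schematool -dbType derby -initSchema
hive --service metastore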
6. Start Hive:
Open a command prompt or terminal and navigate to the Hive installation directory.
Execute the hive command to start the Hive shell.
EXAMPLES:
1. Create a Database:
To create a new database in HIVE, use the following syntax:
CREATE DATABASE database_name;
Example:
CREATE DATABASE mydatabase;
2. Use a Database:
To use a specific database in HIVE, use the following syntax:
USE database_name;
Example:
USE mydatabase;
3. Show Databases:
To display a list of available databases in HIVE, use the following syntax:
SHOW DATABASES;
4. Create a Table:
To create a table in HIVE, use the following syntax:
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
...
);
Example:
CREATE TABLE mytable (
id INT,
name STRING,
age INT
);
5. Show Tables:
To display a list of tables in the current database, use the following syntax:
SHOW TABLES;
6. Describe a Table:
To view the schema and details of a specific table, use the following syntax:
DESCRIBE table_name;
Example:
DESCRIBE mytable;
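As a further, purely illustrative example (not in the original list), rows can be inserted and queried:
INSERT INTO mytable VALUES (1, 'Alice', 30);
SELECT * FROM mytable;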
RESULT:
Thus the Installation of HIVE was done successfully.
EX. NO : 6 INSTALLATION OF HBASE ALONG WITH PRACTICE EXAMPLES
AIM:
To install HBASE using Virtual Machine and perform some operations in HBASE.
PROCEDURE:
Step 1: Install a Virtual Machine
Download and install a virtual machine software such as VirtualBox
(https://www.virtualbox.org/) or VMware (https://www.vmware.com/).
Create a new virtual machine and install a Unix-based operating system like Ubuntu or
CentOS. You can download the ISO image of your desired Linux distribution from their
official websites.
Move the extracted HBase directory to a desired location:
sudo mv <hbase_extracted_directory> /opt/hbase
Replace <hbase_extracted_directory> with the actual name of the extracted HBase
directory.
Step 3: Create a Table
In the HBase shell, you can create a table with column families.
For example, let's create a table named "my_table" with a column family called "cf":
>> create 'my_table', 'cf'
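A few basic operations can then be tried on the new table (the row key and value below are illustrative):
>> put 'my_table', 'row1', 'cf:name', 'Alice'
>> get 'my_table', 'row1'
>> scan 'my_table'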
RESULT:
Thus the installation of HBase using Virtual Machine was done successfully.
EX. NO : 7 INSTALLATION OF THRIFT
AIM:
To install Apache thrift on Windows OS.
PROCEDURE:
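A minimal outline, assuming the prebuilt Windows compiler (thrift.exe) from https://thrift.apache.org/download is used:
1. Download the prebuilt Windows executable (thrift-x.y.z.exe) and rename it to thrift.exe.
2. Place thrift.exe in a folder such as C:\thrift and add that folder to the Path environment variable.
3. Open cmd and verify the installation: thrift -version
4. (Optional) Generate code from a .thrift definition file, e.g.: thrift --gen java example.thrift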
RESULT:
Thus the installation of Thrift on Windows OS was done successfully.
EX. NO : 8 PRACTICE IMPORTING AND EXPORTING DATA FROM VARIOUS
DATABASES.
AIM:
To import and export data from various Databases using SQOOP.
PROCEDURE:
sqoop export \
--connect jdbc:<DB_TYPE>://<DB_HOST>:<DB_PORT>/<DB_NAME> \
--username <DB_USERNAME> \
--password <DB_PASSWORD> \
--table <TABLE_NAME> \
--export-dir <HDFS_EXPORT_DIR> \
--input-fields-terminated-by '<DELIMITER>'
Replace the placeholders
(<DB_TYPE>, <DB_HOST>, <DB_PORT>, <DB_NAME>, <DB_USERNAME>,
<DB_PASSWORD>, <TABLE_NAME>, <HDFS_EXPORT_DIR>, and
<DELIMITER>) with the appropriate values for your database and Hadoop
environment.
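The import direction follows the same pattern; a sketch using the same placeholders, with <HDFS_IMPORT_DIR> as an additional placeholder for the HDFS target directory:
sqoop import \
--connect jdbc:<DB_TYPE>://<DB_HOST>:<DB_PORT>/<DB_NAME> \
--username <DB_USERNAME> \
--password <DB_PASSWORD> \
--table <TABLE_NAME> \
--target-dir <HDFS_IMPORT_DIR> \
-m 1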
RESULT:
Thus importing and exporting data from various databases using SQOOP was done
successfully.