BDA Lab
EXERCISE 1 :
AIM: Implement the following data structures in Java: a) Linked Lists b) Stacks
c) Queues d) Sets e) Maps
PROGRAM :
a) Linked List
Program:
import java.util.*;
public class LinkedListDemo {
   public static void main(String args[]) {
      // create a linked list
      LinkedList ll = new LinkedList();
      // add elements to the linked list
      ll.add("F");
      ll.add("B");
      ll.add("D");
      ll.add("E");
      ll.add("C");
      ll.addLast("Z");
      ll.addFirst("A");
      ll.add(1, "A2");
      System.out.println("Original contents of ll: " + ll);
      // remove elements from the linked list
      ll.remove("F");
      ll.remove(2);
      System.out.println("Contents of ll after deletion: " + ll);
   }
}
b) Stack
Program:
import java.util.*;
public class StackDemo {
   static void showpush(Stack st, int a) {
      st.push(new Integer(a));
      System.out.println("push(" + a + ")");
      System.out.println("stack: " + st);
   }
   static void showpop(Stack st) {
      System.out.print("pop -> ");
      Integer a = (Integer) st.pop();
      System.out.println(a);
      System.out.println("stack: " + st);
   }
   public static void main(String args[]) {
      Stack st = new Stack();
      System.out.println("stack: " + st);
      showpush(st, 42);
      showpush(st, 66);
      showpush(st, 99);
      showpop(st);
      showpop(st);
      showpop(st);
      try {
         showpop(st);
      } catch (EmptyStackException e) {
         System.out.println("empty stack");
      }
   }
}
Output:
stack: [ ]
push(42)
stack: [42]
push(66)
stack: [42, 66]
push(99)
stack: [42, 66, 99]
pop -> 99
stack: [42, 66]
pop -> 66
stack: [42]
pop -> 42
stack: [ ]
pop -> empty stack
c) Queue
Program:
import java.util.LinkedList;
import java.util.Queue;
public class QueueExample {
   public static void main(String[] args) {
      Queue<Integer> q = new LinkedList<>();
      // Add elements {0, 1, 2, 3, 4} to the queue
      for (int i = 0; i < 5; i++)
         q.add(i);
      // Display the contents of the queue.
      System.out.println("Elements of queue-" + q);
      // To remove the head of queue.
      int removedele = q.remove();
      System.out.println("removed element-" + removedele);
      System.out.println(q);
      // To view the head of queue
      int head = q.peek();
      System.out.println("head of queue-" + head);
      // The other methods of the Collection interface, like size
      // and contains, can be used with this implementation.
      int size = q.size();
      System.out.println("Size of queue-" + size);
   }
}
Output:
Elements of queue-[0, 1, 2, 3, 4]
removed element-0
[1, 2, 3, 4]
head of queue-1
Size of queue-4
d) Set
Program:
import java.util.*;
public class SetDemo {
   public static void main(String args[]) {
      int count[] = {34, 22, 10, 60, 30, 22};
      Set<Integer> set = new HashSet<Integer>();
      try {
         for (int i = 0; i < 5; i++) {
            set.add(count[i]);
         }
         System.out.println(set);
         // copy the HashSet into a TreeSet to get the elements in sorted order
         TreeSet<Integer> sortedSet = new TreeSet<Integer>(set);
         System.out.println(sortedSet);
      } catch (Exception e) {
      }
   }
}
Output:
e) Map
Program:
import java.awt.Color;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
public class MapDemo {
   public static void main(String[] args) {
      Map<String, Color> favoriteColors = new HashMap<String, Color>();
      favoriteColors.put("sai", Color.BLUE);
      favoriteColors.put("krishna", Color.RED);
      favoriteColors.put("Ram", Color.GREEN);
      // print every name together with its colour value
      Set<String> keySet = favoriteColors.keySet();
      for (String key : keySet) {
         System.out.println(key + " : " + favoriteColors.get(key));
      }
   }
}
Output:
sai : java.awt.Color[r=0,g=0,b=255]
krishna : java.awt.Color[r=255,g=0,b=0]
Ram : java.awt.Color[r=0,g=255,b=0]
EXERCISE 2 :
AIM: (i)Perform setting up and Installing Hadoop in its three operating modes:
Standalone, Pseudo distributed,Fully distributed
(ii)Use web based tools to monitor your Hadoop setup.
PROGRAM :
https://drive.google.com/file/d/1nCN_jK7EJF2DmPUUxgOggnvJ6k6tksYz/view
Editing core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Editing hdfs-site.xml
Also replace PATH~1 and PATH~2 with the paths of the namenode and
datanode folders created earlier.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>PATH~1\namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>PATH~2\datanode</value>
<final>true</final>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- Site specific YARN configuration properties --></configuration>
Verifying hadoop-env.cmd
set JAVA_HOME=%JAVA_HOME%
OR
set JAVA_HOME="C:\Program Files\Java\jdk1.8.0_221"
Replacing bin: replace the bin folder of the extracted Hadoop distribution with
the Windows-compatible bin folder (containing winutils.exe) for your Hadoop
version.
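After the configuration files are saved, the file system is formatted once and the daemons are started. A minimal sketch of the commands (run from a new command prompt; the web UI ports below assume Hadoop 3.x):
hdfs namenode -format
start-dfs.cmd
start-yarn.cmd
jps
The jps command should list NameNode, DataNode, ResourceManager and NodeManager. The setup can then be monitored with the web based tools: the NameNode UI at http://localhost:9870 (http://localhost:50070 on Hadoop 2.x) and the YARN ResourceManager UI at http://localhost:8088.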
EXERCISE 3 :
AIM: Implement the following file management tasks in Hadoop:
1. Adding files and directories 2. Retrieving files 3. Deleting files
PROGRAM :
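These tasks map directly onto the hdfs dfs shell; a minimal set of commands is sketched below (the local file name and HDFS paths are only placeholders):
Adding files and directories:
hdfs dfs -mkdir -p /user/hadoop/lab3
hdfs dfs -put sample.txt /user/hadoop/lab3
Retrieving files:
hdfs dfs -ls /user/hadoop/lab3
hdfs dfs -get /user/hadoop/lab3/sample.txt retrieved.txt
hdfs dfs -cat /user/hadoop/lab3/sample.txt
Deleting files:
hdfs dfs -rm /user/hadoop/lab3/sample.txt
hdfs dfs -rm -r /user/hadoop/lab3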
EXERCISE 4 :
AIM: Run a basic Word Count MapReduce program to understand
MapReduce Paradigm.
PROGRAM :
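The standard WordCount example from the Hadoop MapReduce tutorial can be used here; a minimal version is sketched below (the input and output directories are passed as command-line arguments):
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
   // Mapper: emits (word, 1) for every token in the input line
   public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();
      public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
         StringTokenizer itr = new StringTokenizer(value.toString());
         while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
         }
      }
   }
   // Reducer: sums the counts for each word
   public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();
      public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
         int sum = 0;
         for (IntWritable val : values) {
            sum += val.get();
         }
         result.set(sum);
         context.write(key, result);
      }
   }
   // Driver
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}
Assuming the class is packaged into wc.jar and the input text is already in HDFS, the job can be run with "hadoop jar wc.jar WordCount /wordcount/input /wordcount/output" and the result inspected with "hdfs dfs -cat /wordcount/output/part-r-00000" (the jar name and paths are illustrative).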
EXERCISE 5 :
AIM: Write a map reduce program that mines weather data.
PROGRAM :
Open Eclipse → create a new Java project named MyProject → create a class
named MyMaxMin and copy the code below.
// importing Libraries
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
public class MyMaxMin {
// Mapper
/*MaxTemperatureMapper class is static
* and extends Mapper abstract class
* having four Hadoop generics type
* LongWritable, Text, Text, Text.
*/
public static class MaxTemperatureMapper extends
Mapper<LongWritable, Text, Text, Text> {
/**
* @method map
* This method takes one line of the weather data set as input,
* extracts the date and the maximum and minimum temperatures
* from fixed character positions, and passes days with
* temp_max > 30 or temp_min < 15 on to the reducer.
*/
// records containing this value hold inconsistent
// (missing) data in our data set
public static final int MISSING = 9999;
@Override
public void map(LongWritable arg0, Text Value, Context context)
throws IOException, InterruptedException {
// Convert the single row(Record) to
// String and store it in String
// variable name line
String line = Value.toString();
// Check for the empty line
if (!(line.length() == 0)) {
// from character 6 to 14 we have
// the date in our dataset
String date = line.substring(6, 14);
// similarly we have taken the maximum
// temperature from 39 to 45 characters
float temp_Max = Float.parseFloat(line.substring(39, 45).trim());
// similarly we have taken the minimum
// temperature from 47 to 53 characters
float temp_Min = Float.parseFloat(line.substring(47, 53).trim());
// if maximum temperature is
// greater than 30, it is a hot day
if (temp_Max > 30.0) {
// Hot day
context.write(new Text("The Day is Hot Day :" + date),
new Text(String.valueOf(temp_Max)));
}
// if the minimum temperature is
// less than 15, it is a cold day
if (temp_Min < 15) {
// Cold day
context.write(new Text("The Day is Cold Day :" + date),
new Text(String.valueOf(temp_Min)));
}
}
}
}
// Reducer
/*MaxTemperatureReducer class is static
and extends Reducer abstract class
having four Hadoop generics type
Text, Text, Text, Text.
*/
public static class MaxTemperatureReducer extends
Reducer<Text, Text, Text, Text> {
/**
* @method reduce
* This method takes the input as key and
* list of values pair from the mapper,
* it does aggregation based on keys and
* produces the final context.
*/
public void reduce(Text Key, Iterable<Text> Values, Context context)
throws IOException, InterruptedException {
// take the first reported temperature for this key
// as the value to emit
String temperature = Values.iterator().next().toString();
context.write(Key, new Text(temperature));
}
}
/**
* @method main
* This method is used for setting
* all the configuration properties.
* It acts as a driver for map-reduce
* code.
*/
public static void main(String[] args) throws Exception {
// reads the default configuration of the
// cluster from the configuration XML files
Configuration conf = new Configuration();
// Initializing the job with the
// default configuration of the cluster
Job job = new Job(conf, "weather example");
// Assigning the driver class name
job.setJarByClass(MyMaxMin.class);
// Key type coming out of mapper
job.setMapOutputKeyClass(Text.class);
// value type coming out of mapper
job.setMapOutputValueClass(Text.class);
// Defining the mapper class name
job.setMapperClass(MaxTemperatureMapper.class);
// Defining the reducer class name
job.setReducerClass(MaxTemperatureReducer.class);
// Defining input Format class which is
// responsible to parse the dataset
// into a key value pair
job.setInputFormatClass(TextInputFormat.class);
// Defining output Format class which is
// responsible to parse the dataset
// into a key value pair
job.setOutputFormatClass(TextOutputFormat.class);
// setting the second argument
// as a path in a path variable
Path OutputPath = new Path(args[1]);
// Configuring the input path
// from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
// Configuring the output path from
// the filesystem into the job
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// deleting the context path automatically
// from hdfs so that we don't have
// to delete it explicitly
OutputPath.getFileSystem(conf).delete(OutputPath);
// exiting the job only if the
// flag value becomes false
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
C. Add yarn jar files. Select yarn jar files and then select Open.
D. Add MapReduce jar files. Select MapReduce jar files. Click Open.
E. Add HDFS jar files. Select HDFS jar files and click Open. Click on Apply and
Close to add all the Hadoop jar files.
Now, we have added all required jar files in our project.
Step 5. Now create a new class that performs the map job (the MyMaxMin class
shown above), then export the project as a JAR file so that it can be submitted
to Hadoop.
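Assuming the project has been exported as MyProject.jar and a weather data file has been copied into HDFS, the job can be submitted as sketched below (the jar name, data file and HDFS paths are only placeholders):
hdfs dfs -mkdir -p /weather/input
hdfs dfs -put weather_data.txt /weather/input
hadoop jar MyProject.jar MyMaxMin /weather/input /weather/output
hdfs dfs -cat /weather/output/part-r-00000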
Output:
EXERCISE 6 :
AIM: Write a MapReduce program that finds the shortest path between two
nodes in a graph.
PROGRAM :
1. Map Function: The map function takes the input graph and generates
key-value pairs where the key is the node ID and the value is a tuple
containing the distance from the source node and the list of neighbors.
2. Reduce Function: The reduce function receives the key-value pairs
generated by the map function and processes them to update the distance
and neighbors list for each node.
3. Iteration: The map and reduce functions are repeated iteratively until the
target node is reached or no more updates are made to the distances (a
driver-loop sketch for this step follows the listing below).
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class ShortestPathFinder {
// id of the source node of the search
public static final long SOURCE_NODE_ID = 1;
// Mapper: performs one relaxation step over lines of the form
// "node<TAB>adjacencyList" (adjacency list = comma-separated node ids)
public static class ShortestPathMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// Parse the input line as a node and its adjacency list
String[] tokens = value.toString().split("\t");
long node = Long.parseLong(tokens[0]);
String adjacencyList = tokens[1];
// Re-emit the adjacency list so the reducer can preserve the graph structure
context.write(new LongWritable(node), new Text("ADJ," + adjacencyList));
// If the node is the source node, emit each neighbour with a distance of 1
// (one hop from the source) and the source as its predecessor
if (node == SOURCE_NODE_ID) {
String[] neighbors = adjacencyList.split(",");
for (String neighbor : neighbors) {
long neighborNode = Long.parseLong(neighbor);
context.write(new LongWritable(neighborNode), new Text(1 + "," + SOURCE_NODE_ID));
}
}
}
}
// Reducer: keeps the smallest distance seen for each node
public static class ShortestPathReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
@Override
public void reduce(LongWritable key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
boolean visited = false;
long distance = Long.MAX_VALUE;
String adjacencyList = null;
for (Text value : values) {
String[] parts = value.toString().split(",", 2);
if (parts[0].equals("ADJ")) {
// the node's own adjacency list
adjacencyList = parts[1];
} else {
// a candidate "distance,predecessor" record from the mapper
visited = true;
distance = Math.min(distance, Long.parseLong(parts[0]));
}
}
// If the node has been reached, emit its adjacency list with the updated distance
if (visited) {
context.write(key, new Text(distance + "," + adjacencyList));
}
}
}
// Driver program
public static void main(String[] args) throws IOException,
ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "ShortestPathFinder");
job.setJarByClass(ShortestPathFinder.class);
job.setMapperClass(ShortestPathMapper.class);
job.setReducerClass(ShortestPathReducer.class);
job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
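The listing above runs a single map-reduce pass. The iteration described in step 3 is usually driven from the client by re-running the job with the previous pass's output as the next pass's input; a minimal sketch of such a driver follows (the path layout under args[0] and the iteration bound are assumptions):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class IterativeShortestPathDriver {
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      String base = args[0];      // assumed HDFS directory holding iter0 as the initial graph
      int maxIterations = 10;     // assumed upper bound on the number of passes
      for (int i = 0; i < maxIterations; i++) {
         Job job = Job.getInstance(conf, "ShortestPathFinder-" + i);
         job.setJarByClass(ShortestPathFinder.class);
         job.setMapperClass(ShortestPathFinder.ShortestPathMapper.class);
         job.setReducerClass(ShortestPathFinder.ShortestPathReducer.class);
         job.setOutputKeyClass(LongWritable.class);
         job.setOutputValueClass(Text.class);
         // output of pass i becomes the input of pass i + 1
         FileInputFormat.addInputPath(job, new Path(base + "/iter" + i));
         FileOutputFormat.setOutputPath(job, new Path(base + "/iter" + (i + 1)));
         if (!job.waitForCompletion(true)) {
            break;                // stop if a pass fails
         }
      }
   }
}
A full implementation would also track, for example with a Hadoop counter, whether any distance changed in a pass and stop as soon as no update is made.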
EXERCISE 7 :
AIM: Write a MapReduce program that finds the friends-of-friends of each user
in a social network data set.
PROGRAM :
The FoFMapper class emits user and friends as key-value pairs, and the
FoFReducer class counts the number of unique friends-of-friends for each user
and emits the result as output. The main method sets up the MapReduce job,
including the input and output file paths, mapper and reducer classes, and
output key and value classes.
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class FriendsOfFriends {
// Mapper class
public static class FoFMapper extends Mapper<Object, Text, Text, Text> {
@Override
protected void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// Split the input line into tokens: the first token is the user, the rest are friends
String[] tokens = value.toString().trim().split("\\s+");
// every pair of the user's friends are (candidate) friends-of-friends of each other
for (int i = 1; i < tokens.length; i++) {
for (int j = 1; j < tokens.length; j++) {
if (i != j) {
context.write(new Text(tokens[i]), new Text(tokens[j]));
}
}
}
}
}
// Reducer class
public static class FoFReducer extends Reducer<Text, Text, Text, IntWritable> {
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// Create a set to hold unique friends-of-friends
Set<String> uniqueFriends = new HashSet<>();
for (Text value : values) {
uniqueFriends.add(value.toString());
}
// emit the user together with the count of unique friends-of-friends
context.write(key, new IntWritable(uniqueFriends.size()));
}
}
// Main method
public static void main(String[] args) throws Exception {
// Create a Hadoop configuration
Configuration conf = new Configuration();
// Create a MapReduce job
Job job = Job.getInstance(conf, "FriendsOfFriends");
job.setJarByClass(FriendsOfFriends.class);
job.setMapperClass(FoFMapper.class);
job.setReducerClass(FoFReducer.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
EXERCISE 8 :
AIM: Write a MapReduce program that computes the PageRank of a set of web
pages.
PROGRAM :
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class PageRank {
// Mapper: each input line has the form "page<TAB>pageRank<TAB>link1,link2,..."
public static class Map extends MapReduceBase
implements Mapper<LongWritable, Text, Text, Text> {
public void map(LongWritable key, Text value,
OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
String[] fields = value.toString().split("\t");
String page = fields[0];
float pageRank = Float.parseFloat(fields[1]);
String[] outgoingLinks = fields.length > 2 ? fields[2].split(",") : new String[0];
// Pass the outgoing-link list through so the reducer can rebuild the graph
output.collect(new Text(page), new Text("OPR:" + (fields.length > 2 ? fields[2] : "")));
// If the page has outgoing links, emit its pageRank divided by the
// number of outgoing links for each outgoing link
if (outgoingLinks.length > 0) {
float outgoingPageRank = pageRank / outgoingLinks.length;
for (String link : outgoingLinks) {
output.collect(new Text(link), new Text("PR:" + outgoingPageRank));
}
}
}
}
// Reducer: sums the rank contributions received by each page
public static class Reduce extends MapReduceBase
implements Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterator<Text> values,
OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
float sumPageRank = 0;
String outgoingLinks = "";
while (values.hasNext()) {
String value = values.next().toString();
String[] parts = value.split(":", 2);
if (parts[0].equals("PR")) {
// Accumulate the sum of pageRank contributions for this key
sumPageRank += Float.parseFloat(parts[1]);
} else if (parts[0].equals("OPR")) {
// Recover the outgoing links of this page
outgoingLinks = parts[1];
}
}
// Emit the accumulated rank together with the outgoing links
// (a damping factor can be applied to sumPageRank here)
output.collect(key, new Text(sumPageRank + "\t" + outgoingLinks));
}
}
// Driver (old mapred API); one job runs a single PageRank iteration
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(PageRank.class);
conf.setJobName("pagerank");
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(Text.class);
conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
JobClient.runJob(conf);
}
}
EXERCISE 9 :
AIM: Perform an efficient semi-join in MapReduce.
PROGRAM :
A semi-join in MapReduce is an operation that filters data from one data set
based on the existence of matching keys in another data set, similar to an inner
join in relational databases.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
// Reducer class
public static class SemiJoinReducer extends Reducer<Text, Text, Text, Text>
{
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
boolean hasMatch = false;
for (Text value : values) {
// Iterate through values to check for a match
if (value.toString().equals("match")) {
hasMatch = true;
break;
}
}
if (hasMatch) {
// If there is a match, emit the key as output
context.write(key, new Text(""));
}
}
}
In this program, the SemiJoinMapper class reads input data and emits the join
key as the output key, and the join value as the output value. The
SemiJoinReducer class receives the key-value pairs from the mapper and
checks for the existence of the value "match" among the values. If a match is found,
the key is emitted as the output. The input and output paths are specified as
command-line arguments when running the MapReduce job.
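The enclosing class, the SemiJoinMapper, and the driver are not shown in the listing above. A minimal sketch of those missing pieces, assuming the smaller (filter) data set contains one join key per line and is tagged with the literal value "match", and that the SemiJoinReducer from the listing sits inside the same class, could look like this:
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class SemiJoin {
   // Mapper for the large data set: emits the join key (first tab-separated
   // field) as the output key and the rest of the record as the value
   public static class SemiJoinMapper extends Mapper<Object, Text, Text, Text> {
      @Override
      protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
         String[] fields = value.toString().split("\t", 2);
         context.write(new Text(fields[0]), new Text(fields.length > 1 ? fields[1] : ""));
      }
   }
   // Mapper for the small (filter) data set: marks every key it sees with "match"
   public static class FilterMapper extends Mapper<Object, Text, Text, Text> {
      @Override
      protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
         context.write(new Text(value.toString().trim()), new Text("match"));
      }
   }
   // ... the SemiJoinReducer from the listing above goes here ...
   // Driver: each input goes through its own mapper
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "semi join");
      job.setJarByClass(SemiJoin.class);
      MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, SemiJoinMapper.class);
      MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, FilterMapper.class);
      job.setReducerClass(SemiJoinReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(Text.class);
      FileOutputFormat.setOutputPath(job, new Path(args[2]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}
A more efficient variant would load the small key set into the distributed cache and filter on the map side, avoiding the reduce phase entirely.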
EXERCISE 10 :
AIM: Install and Run Pig then write Pig Latin scripts to sort, group, join,
project, and filter your data.
PROGRAM :
Pig is a data processing tool in Hadoop ecosystem that provides a high-level
scripting language called Pig Latin for processing large data sets.
1. Download the latest stable release of Pig from the Apache Pig website
(https://pig.apache.org/).
2. Extract the downloaded Pig archive to a directory of your choice.
3. Set the PIG_HOME environment variable to the path of the extracted Pig
directory.
4. Add the Pig binaries to your system's PATH environment variable.
5. Verify the Pig installation by running the following command: pig -version.
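Pig can then be started either interactively (the Grunt shell) or with a script file; for example (the script name below is only a placeholder):
Start the Grunt shell in local mode:
pig -x local
Start the Grunt shell in MapReduce (cluster) mode:
pig
Run a Pig Latin script in MapReduce mode:
pig -x mapreduce myscript.pig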
Sort data:
-- Load data from a CSV file
data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
-- Sort the data by age in ascending order
sorted_data = ORDER data BY age ASC;
STORE sorted_data INTO 'sorted_output' USING PigStorage(',');
Group data:
-- Group the data loaded above by city
grouped_data = GROUP data BY city;
STORE grouped_data INTO 'grouped_output' USING PigStorage(',');
Join data:
-- Load a second CSV file and join it with the first relation on name
salaries = LOAD 'salaries.csv' USING PigStorage(',') AS (name:chararray, salary:int);
joined_data = JOIN data BY name, salaries BY name;
STORE joined_data INTO 'joined_output' USING PigStorage(',');
Project data:
-- Load data from a CSV file
data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
-- Keep only the name and city columns
projected_data = FOREACH data GENERATE name, city;
STORE projected_data INTO 'projected_output' USING PigStorage(',');
Filter data:
-- Load data from a CSV file
data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);
-- Keep only the rows where age is greater than 25
filtered_data = FILTER data BY age > 25;
STORE filtered_data INTO 'filtered_output' USING PigStorage(',');
In the above examples, the input data is loaded from a CSV file using the LOAD
statement, and the processed data is stored in a new file using the STORE
statement with the specified output file name.
EXERCISE 11 :
AIM: Install and Run Hive then use Hive to create, alter, and drop databases,
tables, views, functions, and indexes
PROGRAM :
Hive is a data warehouse tool that provides an SQL-like interface for querying
and managing large datasets stored in distributed file systems like Hadoop
HDFS. Here are the steps to install and run Hive, and then create, alter, and
drop databases, tables, views, functions, and indexes.
1. Install Hadoop: Hive runs on top of Hadoop, so you need to have Hadoop
installed and configured on your system. You can download Hadoop from
the Apache Hadoop website (https://hadoop.apache.org/).
2. Download Hive: You can download Hive from the Apache Hive website
(https://hive.apache.org/).
3. Extract Hive: Extract the downloaded Hive archive to a directory of your
choice.
4. Configure Hive: Hive requires some configuration settings. Copy the hive-
default.xml.template file from the Hive installation directory to hive-
site.xml, and then configure the necessary settings, such as Hadoop's
fs.defaultFS, Hive's javax.jdo.option.ConnectionURL, and
javax.jdo.option.ConnectionDriverName.
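Before the statements below can be run, the metastore has to be initialised and the Hive shell started; a minimal sketch using the embedded Derby metastore:
schematool -dbType derby -initSchema
hive
The schematool command creates the metastore schema once, and the hive command opens the interactive CLI in which the following DDL statements are executed.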
Step 3: Create, Alter, and Drop Databases, Tables, Views, Functions, and
Indexes
Create a Database: You can create a new database in Hive using the CREATE
DATABASE command. For example:
CREATE DATABASE mydb;
Alter a Database: You can alter a database in Hive using the ALTER
DATABASE command. For example, you can set properties for a database:
ALTER DATABASE mydb SET DBPROPERTIES ('description'='My database');
Drop a Database: You can drop a database in Hive using the DROP DATABASE
command. For example:
DROP DATABASE mydb;
Create a Table: You can create a table in Hive using the CREATE TABLE
command. For example:
CREATE TABLE mytable (id INT, name STRING) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
Alter a Table: You can alter a table in Hive using the ALTER TABLE command.
For example, you can add a new column to a table:
ALTER TABLE mytable ADD COLUMNS (age INT);
Drop a Table: You can drop a table in Hive using the DROP TABLE command.
For example:
DROP TABLE mytable;
Create a View: You can create a view in Hive using the CREATE VIEW
command. For example:
CREATE VIEW myview AS SELECT id, name FROM mytable WHERE age > 18;
Create a Function: You can create a custom function in Hive using the
CREATE FUNCTION command. For example:
CREATE FUNCTION myfunc AS 'com.example.MyUDF' USING JAR
'hdfs://localhost:9000/myudf.jar';
Create an Index: You can create an index on a table in Hive using the CREATE
INDEX command. For example:
CREATE INDEX myindex ON TABLE mytable (name) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
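The AIM also covers altering and dropping views, functions, and indexes; the corresponding statements follow the same pattern, for example:
ALTER VIEW myview AS SELECT id, name FROM mytable WHERE age > 21;
DROP VIEW myview;
DROP FUNCTION myfunc;
DROP INDEX myindex ON mytable;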