BDA Lab

EXERCISE 1 :

AIM: Implement the following Data structures in Java


a) Linked Lists b) Stacks c) Queues d) Set e) Map

PROGRAM :

Linked List

Program:

import java.util.*;

public class LinkedListDemo {
   public static void main(String args[]) {
      // create a linked list
      LinkedList ll = new LinkedList();
      // add elements to the linked list
      ll.add("F");
      ll.add("B");
      ll.add("D");
      ll.add("E");
      ll.add("C");
      ll.addLast("Z");
      ll.addFirst("A");
      ll.add(1, "A2");
      System.out.println("Original contents of ll: " + ll);
      // remove elements from the linked list
      ll.remove("F");
      ll.remove(2);
      System.out.println("Contents of ll after deletion: " + ll);
      // remove first and last elements
      ll.removeFirst();
      ll.removeLast();
      System.out.println("ll after deleting first and last: " + ll);
      // get and set a value
      Object val = ll.get(2);
      ll.set(2, (String) val + " Changed");
      System.out.println("ll after change: " + ll);
   }
}
Output:
Original contents of ll: [A, A2, F, B, D, E, C, Z]
Contents of ll after deletion: [A, A2, D, E, C, Z]
ll after deleting first and last: [A2, D, E, C]
ll after change: [A2, D, E Changed, C]
b) Stacks

Program:

import java.util.*;

public class StackDemo {

   static void showpush(Stack st, int a) {
      st.push(Integer.valueOf(a));
      System.out.println("push(" + a + ")");
      System.out.println("stack: " + st);
   }

   static void showpop(Stack st) {
      System.out.print("pop -> ");
      Integer a = (Integer) st.pop();
      System.out.println(a);
      System.out.println("stack: " + st);
   }

   public static void main(String args[]) {
      Stack st = new Stack();
      System.out.println("stack: " + st);
      showpush(st, 42);
      showpush(st, 66);
      showpush(st, 99);
      showpop(st);
      showpop(st);
      showpop(st);
      try {
         showpop(st);
      } catch (EmptyStackException e) {
         System.out.println("empty stack");
      }
   }
}

Output:

stack: []
push(42)
stack: [42]
push(66)
stack: [42, 66]
push(99)
stack: [42, 66, 99]
pop -> 99
stack: [42, 66]
pop -> 66
stack: [42]
pop -> 42
stack: []
pop -> empty stack


c) Queues

program

import java.util.LinkedList;

import java.util.Queue;

public class QueueExample {

   public static void main(String[] args) {
      Queue<Integer> q = new LinkedList<>();

      // Add elements {0, 1, 2, 3, 4} to the queue
      for (int i = 0; i < 5; i++)
         q.add(i);

      // Display contents of the queue
      System.out.println("Elements of queue-" + q);

      // Remove the head of the queue
      int removedele = q.remove();
      System.out.println("removed element-" + removedele);
      System.out.println(q);

      // View the head of the queue
      int head = q.peek();
      System.out.println("head of queue-" + head);

      // The rest of the Collection interface methods,
      // like size() and contains(), can also be used
      // with this implementation.
      int size = q.size();
      System.out.println("Size of queue-" + size);
   }
}

Output:

Elements of queue-[0, 1, 2, 3, 4]
removed element-0
[1, 2, 3, 4]
head of queue-1
Size of queue-4
d) Set

Program
import java.util.*;

public class SetDemo {
   public static void main(String args[]) {
      int count[] = {34, 22, 10, 60, 30, 22};
      Set<Integer> set = new HashSet<Integer>();
      try {
         for (int i = 0; i < 5; i++) {
            set.add(count[i]);
         }
         System.out.println(set);

         TreeSet sortedSet = new TreeSet<Integer>(set);
         System.out.println("The sorted list is:");
         System.out.println(sortedSet);
         System.out.println("The First element of the set is: " + (Integer) sortedSet.first());
         System.out.println("The last element of the set is: " + (Integer) sortedSet.last());
      } catch (Exception e) {
      }
   }
}

Output:

[34, 22, 10, 60, 30]
The sorted list is:
[10, 22, 30, 34, 60]
The First element of the set is: 10
The last element of the set is: 60


e) Map

Program:

import java.awt.Color;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class MapDemo {
   public static void main(String[] args) {
      Map<String, Color> favoriteColors = new HashMap<String, Color>();
      favoriteColors.put("sai", Color.BLUE);
      favoriteColors.put("Ram", Color.GREEN);
      favoriteColors.put("krishna", Color.RED);
      favoriteColors.put("narayana", Color.BLUE);

      // Print all keys and values in the map
      Set<String> keySet = favoriteColors.keySet();
      for (String key : keySet) {
         Color value = favoriteColors.get(key);
         System.out.println(key + " : " + value);
      }
   }
}
Output:
narayana : java.awt.Color[r=0,g=0,b=255]
sai : java.awt.Color[r=0,g=0,b=255]
krishna : java.awt.Color[r=255,g=0,b=0]
Ram : java.awt.Color[r=0,g=255,b=0]
EXERCISE 2 :
AIM: (i) Set up and install Hadoop in its three operating modes: standalone,
pseudo-distributed, and fully distributed.
(ii) Use web-based tools to monitor your Hadoop setup.

PROGRAM :
https://drive.google.com/file/d/1nCN_jK7EJF2DmPUUxgOggnvJ6k6tksYz/view
Editing core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Editing hdfs-site.xml
Replace PATH~1 and PATH~2 with the paths of the namenode and datanode
folders created earlier.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>PATH~1\namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>PATH~2\datanode</value>
<final>true</final>
</property>
</configuration>

mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- Site specific YARN configuration properties --></configuration>
Verifying hadoop-env.cmd
set JAVA_HOME=%JAVA_HOME%
OR
set JAVA_HOME="C:\Program Files\Java\jdk1.8.0_221"

Replacing bin: replace Hadoop's default bin folder with a Windows-compatible
bin folder containing winutils.exe and the matching native binaries.
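With the configuration files in place, format the NameNode once with
hdfs namenode -format, then start the daemons from an administrator command
prompt with start-dfs.cmd and start-yarn.cmd (start-dfs.sh and start-yarn.sh
on Linux); jps should then list NameNode, DataNode, ResourceManager and
NodeManager. For web-based monitoring, the NameNode UI is served at
http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088
(the defaults for Hadoop 3.x; adjust if the ports were changed in the
configuration).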
EXERCISE 3 :
AIM: Implement the following file management tasks in Hadoop:
1. Adding files and directories 2. Retrieving files 3. Deleting files

PROGRAM :
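These tasks can be carried out from the command line with hdfs dfs (for
example -mkdir, -put, -get and -rm), or programmatically through the Hadoop
FileSystem API. Below is a minimal Java sketch of the API approach, assuming
the pseudo-distributed setup from Exercise 2 with the NameNode at
hdfs://localhost:9000; the class name, the HDFS directory /user/demo and the
local file names are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileManagement {
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // assumed NameNode address, matching the core-site.xml above
      conf.set("fs.defaultFS", "hdfs://localhost:9000");
      FileSystem fs = FileSystem.get(conf);

      // 1. Adding files and directories
      Path dir = new Path("/user/demo");
      fs.mkdirs(dir);
      fs.copyFromLocalFile(new Path("input.txt"), new Path("/user/demo/input.txt"));

      // 2. Retrieving files: copy back to the local file system and list the directory
      fs.copyToLocalFile(new Path("/user/demo/input.txt"), new Path("retrieved.txt"));
      for (FileStatus status : fs.listStatus(dir)) {
         System.out.println(status.getPath());
      }

      // 3. Deleting files (second argument = delete recursively)
      fs.delete(new Path("/user/demo/input.txt"), false);
      fs.delete(dir, true);

      fs.close();
   }
}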
EXERCISE 4 :
AIM: Run a basic Word Count MapReduce program to understand the
MapReduce paradigm.
PROGRAM :
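A standard way to do this is the classic WordCount program sketched below
(following the example shipped with Hadoop): the mapper tokenizes each input
line and emits (word, 1) pairs, the combiner and reducer sum the counts per
word, and the driver configures and submits the job. Package it into a jar
and run it with a command of the form hadoop jar wordcount.jar WordCount
<input dir> <output dir> (the jar name and directories are illustrative).

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

   // Mapper: emits (word, 1) for every token in the input line
   public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
         StringTokenizer itr = new StringTokenizer(value.toString());
         while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
         }
      }
   }

   // Reducer (also used as combiner): sums the counts for each word
   public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();

      public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
         int sum = 0;
         for (IntWritable val : values) {
            sum += val.get();
         }
         result.set(sum);
         context.write(key, result);
      }
   }

   // Driver: configures and submits the job
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}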
EXERCISE 5 :
AIM: Write a map reduce program that mines weather data.

PROGRAM :
Open Eclipse → create a new Java project named MyProject → create a class
named MyMaxMin.
Copy the code below:

// importing Libraries
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
public class MyMaxMin {
// Mapper
/*MaxTemperatureMapper class is static
* and extends Mapper abstract class
* having four Hadoop generics type
* LongWritable, Text, Text, Text.
*/
public static class MaxTemperatureMapper extends
Mapper<LongWritable, Text, Text, Text> {
/**
* @method map
* This method takes the input as a text data type.
* It extracts the date and the maximum and minimum
* temperatures from fixed character positions in each
* record; days with temp_max > 30 or temp_min < 15
* are passed to the reducer.
*/
// the data in our data set with
// this value is inconsistent data
public static final int MISSING = 9999;
@Override
public void map(LongWritable arg0, Text Value, Context context)
throws IOException, InterruptedException {
// Convert the single row(Record) to
// String and store it in String
// variable name line
String line = Value.toString();
// Check for the empty line
if (!(line.length() == 0)) {
// from character 6 to 14 we have
// the date in our dataset
String date = line.substring(6, 14);
// similarly we have taken the maximum
// temperature from 39 to 45 characters
float temp_Max = Float.parseFloat(line.substring(39, 45).trim());
// similarly we have taken the minimum
// temperature from 47 to 53 characters
float temp_Min = Float.parseFloat(line.substring(47, 53).trim());
// if maximum temperature is
// greater than 30, it is a hot day
if (temp_Max > 30.0) {
// Hot day
context.write(new Text("The Day is Hot Day :" + date),
new Text(String.valueOf(temp_Max)));
}
// if the minimum temperature is
// less than 15, it is a cold day
if (temp_Min < 15) {
// Cold day
context.write(new Text("The Day is Cold Day :" + date),
new Text(String.valueOf(temp_Min)));
}
}
}
}
// Reducer
/*MaxTemperatureReducer class is static
and extends Reducer abstract class
having four Hadoop generics type
Text, Text, Text, Text.
*/
public static class MaxTemperatureReducer extends
Reducer<Text, Text, Text, Text> {
/**
* @method reduce
* This method takes the input as key and
* list of values pair from the mapper,
* it does aggregation based on keys and
* produces the final context.
*/
public void reduce(Text Key, Iterable<Text> Values, Context context)
throws IOException, InterruptedException {
// take the first value reported for this key and
// store it in a String variable named temperature
String temperature = Values.iterator().next().toString();
context.write(Key, new Text(temperature));
}
}
/**
* @method main
* This method is used for setting
* all the configuration properties.
* It acts as a driver for map-reduce
* code.
*/
public static void main(String[] args) throws Exception {
// reads the default configuration of the
// cluster from the configuration XML files
Configuration conf = new Configuration();
// Initializing the job with the
// default configuration of the cluster
Job job = new Job(conf, "weather example");
// Assigning the driver class name
job.setJarByClass(MyMaxMin.class);
// Key type coming out of mapper
job.setMapOutputKeyClass(Text.class);
// value type coming out of mapper
job.setMapOutputValueClass(Text.class);
// Defining the mapper class name
job.setMapperClass(MaxTemperatureMapper.class);
// Defining the reducer class name
job.setReducerClass(MaxTemperatureReducer.class);
// Defining input Format class which is
// responsible to parse the dataset
// into a key value pair
job.setInputFormatClass(TextInputFormat.class);
// Defining output Format class which is
// responsible to parse the dataset
// into a key value pair
job.setOutputFormatClass(TextOutputFormat.class);
// setting the second argument
// as a path in a path variable
Path OutputPath = new Path(args[1]);
// Configuring the input path
// from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
// Configuring the output path from
// the filesystem into the job
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// deleting the context path automatically
// from hdfs so that we don't have
// to delete it explicitly
OutputPath.getFileSystem(conf).delete(OutputPath);
// exiting the job only if the
// flag value becomes false
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Add all the necessary jar libraries


To do so, right-click on the project name >> Build Path >> Configure Build
Path, and add the External JARs from hadoop-3.1.2 >> share >> hadoop.
Then Add the client jar files.
A. Select client jar files and click on Open.
B. Add common jar files.
Select common jar files and Open. Also, add common/lib libraries. Select all
common/lib jars and click Open.

C. Add yarn jar files. Select yarn jar files and then select Open.
D. Add MapReduce jar files. Select MapReduce jar files. Click Open.
E. Add HDFS jar files. Select HDFS jar files and click Open. Click on Apply and
Close to add all the Hadoop jar files.
Now, we have added all required jar files in our project.
Step 5. Now create a new class that performs the map job.

Then export the project as a JAR file.
Output:

Open cmd in admin mode


Launch all dfs and yarn nodes
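The exported jar can then be submitted with a command of the form
hadoop jar MyProject.jar MyMaxMin <input path in HDFS> <output path in HDFS>,
where the input path points to the uploaded weather data set (the jar name
and paths are illustrative; the driver above deletes any existing output
directory before the job runs).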

EXERCISE 6 :
AIM: Use MapReduce to find the shortest path between two people
in a social graph.
PROGRAM :

Here's a high-level overview of the MapReduce process:

1. Map Function: The map function takes the input graph and generates
key-value pairs where the key is the node ID and the value is a tuple
containing the distance from the source node and the list of neighbors.
2. Reduce Function: The reduce function receives the key-value pairs
generated by the map function and processes them to update the distance
and neighbors list for each node.
3. Iteration: The map and reduce functions are repeated iteratively until the
target node is reached or no more updates are made to the distances.
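The code below assumes the graph is stored as adjacency lists, one node per
line, with the node ID and a comma-separated neighbor list separated by a tab
(for example, 1<TAB>2,3,4), and that the source person is identified by a
SOURCE_NODE_ID constant defined in the class.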

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ShortestPathFinder {

// ID of the source person's node for the BFS
// (illustrative value; set it to the actual starting node)
private static final long SOURCE_NODE_ID = 1L;

// Mapper class for BFS


public static class BFSMapper extends Mapper<LongWritable, Text,
LongWritable, Text> {

@Override
public void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException {
// Parse the input line as a node and its adjacency list
String[] tokens = value.toString().split("\t");
long node = Long.parseLong(tokens[0]);
String adjacencyList = tokens[1];

// Emit the node and its adjacency list


context.write(new LongWritable(node), new Text(adjacencyList));

// If this is the source node, emit a distance of 0 to each of its neighbors
if (node == SOURCE_NODE_ID) {
String[] neighbors = adjacencyList.split(",");
for (String neighbor : neighbors) {
long neighborNode = Long.parseLong(neighbor);
context.write(new LongWritable(neighborNode), new Text("0," +
SOURCE_NODE_ID));
}
}
}
}

// Reducer class for BFS


public static class BFSReducer extends Reducer<LongWritable, Text,
LongWritable, Text> {

@Override
public void reduce(LongWritable key, Iterable<Text> values, Context
context)
throws IOException, InterruptedException {
boolean visited = false;
long distance = Long.MAX_VALUE;
String adjacencyList = null;

// Iterate through the input values


for (Text value : values) {
String[] tokens = value.toString().split(",");
if (tokens.length == 1) { // Adjacency list
visited = true;
adjacencyList = tokens[0];
} else { // Distance and previous node
long currentDistance = Long.parseLong(tokens[0]);
if (currentDistance < distance) {
distance = currentDistance;
}
}
}

// If the node has been visited, emit its adjacency list with the updated distance
if (visited) {
context.write(key, new Text(distance + "," + adjacencyList));
}
}
}

// Driver program
public static void main(String[] args) throws IOException,
ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "ShortestPathFinder");

// Set input and output paths


job.setJarByClass(ShortestPathFinder.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// Set mapper and reducer classes


job.setMapperClass(BFSMapper.class);
job.setReducerClass(BFSReducer.class);

// Set output key and value classes


job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);

// Wait for the job to complete


System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
EXERCISE 7 :
AIM: Implement Friends-of-friends algorithm in MapReduce.
PROGRAM :

The FoFMapper class emits user and friends as key-value pairs, and the
FoFReducer class counts the number of unique friends-of-friends for each user
and emits the result as output. The main method sets up the MapReduce job,
including the input and output file paths, mapper and reducer classes, and
output key and value classes.
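The mapper below assumes whitespace-separated input in which the first token
on each line is a user and the remaining tokens are that user's direct
friends; for example, the line "A B C" states that A is friends with B and C.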

// Import necessary libraries


import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FriendsOfFriends {

// Mapper class
public static class FoFMapper extends Mapper<Object, Text, Text, Text> {

@Override
protected void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// Split the input line into tokens
String[] tokens = value.toString().trim().split("\\s+");

// Extract user and friends


String user = tokens[0];
List<String> friends = new ArrayList<>();
for (int i = 1; i < tokens.length; i++) {
friends.add(tokens[i]);
}

// Emit user and friends as key-value pairs


for (String friend : friends) {
// Emit friend as key with user as value
context.write(new Text(friend), new Text(user));
// Emit user as key with friend as value
context.write(new Text(user), new Text(friend));
}
}
}

// Reducer class
public static class FoFReducer extends Reducer<Text, Text, Text, IntWritable>
{

@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// Create a set to hold unique friends
Set<String> uniqueFriends = new HashSet<>();

// Iterate through input values and add them to the set


for (Text value : values) {
uniqueFriends.add(value.toString());
}

// Remove the user from the set of unique friends


uniqueFriends.remove(key.toString());

// Emit friend-of-friend count as output


context.write(key, new IntWritable(uniqueFriends.size()));
}
}

// Main method
public static void main(String[] args) throws Exception {
// Create a Hadoop configuration
Configuration conf = new Configuration();
// Create a MapReduce job
Job job = Job.getInstance(conf, "FriendsOfFriends");

// Set the classes for the job


job.setJarByClass(FriendsOfFriends.class);
job.setMapperClass(FoFMapper.class);
job.setReducerClass(FoFReducer.class);

// Set the input and output file paths


FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// Set the output key and value classes


job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// Wait for the job to complete and print the result
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
EXERCISE 8 :
AIM: Implement an iterative PageRank graph algorithm in MapReduce.
PROGRAM :

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class PageRank {

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {

// Split the input line into pageID, pageRank, and outgoingLinks


String[] parts = value.toString().split("\t");
String pageID = parts[0];
float pageRank = Float.parseFloat(parts[1]);
String[] outgoingLinks = parts[2].split(",");

// Emit the pageID and its pageRank


output.collect(new Text(pageID), new Text("PR:" + pageRank));

// If the page has outgoing links, emit its pageRank divided
// by the number of outgoing links for each outgoing link
if (outgoingLinks.length > 0) {
float outgoingPageRank = pageRank / outgoingLinks.length;
for (String link : outgoingLinks) {
output.collect(new Text(link), new Text("OPR:" +
outgoingPageRank));
}
}
}
}

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, FloatWritable> {

private static final float DAMPING_FACTOR = 0.85f;

public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, FloatWritable> output, Reporter reporter) throws IOException {

float sumPageRank = 0.0f;


String outgoingLinks = "";

while (values.hasNext()) {
String value = values.next().toString();
String[] parts = value.split(":");
if (parts[0].equals("PR")) {
// Accumulate the sum of pageRank for this key
sumPageRank += Float.parseFloat(parts[1]);
} else if (parts[0].equals("OPR")) {
// Collect the outgoing links
outgoingLinks += "," + parts[1];
}
}

// Update the pageRank using the PageRank formula


float newPageRank = (1 - DAMPING_FACTOR) + (DAMPING_FACTOR *
sumPageRank);
output.collect(key, new FloatWritable(newPageRank));

// Emit a sentinel value for keys that have outgoing links
// so they are preserved for the next iteration
if (!outgoingLinks.isEmpty()) {
output.collect(key, new FloatWritable(Float.MIN_VALUE));
}
}
}

public static void main(String[] args) throws Exception {


JobConf conf = new JobConf(PageRank.class);
conf.setJobName("PageRank");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(FloatWritable.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(Text.class);

conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));


FileOutputFormat.setOutputPath(conf, new Path(args[1]));

JobClient.runJob(conf);
}
}
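A single run of this job computes one PageRank iteration. To make the
algorithm iterative, the driver is typically wrapped in a loop that feeds
each iteration's output back in as the next iteration's input (reshaped into
the pageID, pageRank, outgoingLinks layout the mapper expects), stopping
after a fixed number of iterations or once the ranks change by less than a
chosen tolerance.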
EXERCISE 9 :
AIM: Perform an efficient semi-join in MapReduce.
PROGRAM :
A semi-join in MapReduce is an operation that filters data from one data set
based on the existence of matching keys in another data set, similar to an inner
join in relational databases.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SemiJoinMapReduce {


// Mapper class
public static class SemiJoinMapper extends Mapper<Object, Text, Text, Text>
{
@Override
protected void map(Object key, Text value, Context context) throws
IOException, InterruptedException {
// assuming tab-separated values
String[] record = value.toString().split("\t");
String joinKey = record[0]; // extract the join key
String joinValue = record[1]; // extract the value
// emit the key-value pair
context.write(new Text(joinKey), new Text(joinValue));
}
}

// Reducer class
public static class SemiJoinReducer extends Reducer<Text, Text, Text, Text>
{
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
boolean hasMatch = false;
for (Text value : values) {
// Iterate through values to check for a match
if (value.toString().equals("match")) {
hasMatch = true;
break;
}
}
if (hasMatch) {
// If there is a match, emit the key as output
context.write(key, new Text(""));
}
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "SemiJoinMapReduce");
job.setJarByClass(SemiJoinMapReduce.class);
job.setMapperClass(SemiJoinMapper.class);
job.setReducerClass(SemiJoinReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0])); // input path
FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

In this program, the SemiJoinMapper class reads the input data and emits the
join key as the output key and the join value as the output value. The
SemiJoinReducer class receives the grouped key-value pairs from the mapper
and checks whether any of the values equals the marker string "match"; if so,
the key is emitted as output. The input and output paths are specified as
command-line arguments when running the MapReduce job.
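For this to behave as a semi-join, the smaller (filter) data set is assumed
to be prepared so that each of its records carries the literal value "match"
for its join key; keys that appear only in the larger data set then never
receive a "match" value and are dropped by the reducer.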
EXERCISE 10 :
AIM: Install and Run Pig then write Pig Latin scripts to sort, group, join,
project, and filter your data.
PROGRAM :
Pig is a data processing tool in the Hadoop ecosystem that provides a high-level
scripting language called Pig Latin for processing large data sets.

Step 1: Install Pig

1. Download the latest stable release of Pig from the Apache Pig website
(https://pig.apache.org/).
2. Extract the downloaded Pig archive to a directory of your choice.
3. Set the PIG_HOME environment variable to the path of the extracted Pig
directory.
4. Add the Pig binaries to your system's PATH environment variable.
5. Verify the Pig installation by running the following command: pig -version.

Step 2: Start Pig in Local mode

1. Open a command prompt or terminal window.


2. Run the following command to start Pig in local mode: pig -x local. This
will start Pig in local mode, where you can process data stored on your
local machine.
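Pig Latin statements can be entered interactively at the Grunt shell that
opens, or saved in a script file and executed with pig -x local <script>.pig.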

Sort data:

-- Load data from a CSV file

data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

-- Sort data by age in ascending order

sorted_data = ORDER data BY age ASC;

-- Store sorted data in a new file

STORE sorted_data INTO 'sorted_output' USING PigStorage(',');

Group data:

-- Load data from a CSV file


data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int,
city:chararray);

-- Group data by city


grouped_data = GROUP data BY city;

-- Store grouped data in a new file


STORE grouped_data INTO 'grouped_output' USING PigStorage(',');

Join data:

-- Load data from two CSV files


data1 = LOAD 'input1.csv' USING PigStorage(',') AS (name:chararray, age:int);
data2 = LOAD 'input2.csv' USING PigStorage(',') AS (name:chararray,
city:chararray);

-- Join data1 and data2 on the 'name' field


joined_data = JOIN data1 BY name, data2 BY name;

-- Store joined data in a new file


STORE joined_data INTO 'joined_output' USING PigStorage(',');

Project data:
-- Load data from a CSV file
data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int,
city:chararray);

-- Project only the 'name' and 'city' fields


projected_data = FOREACH data GENERATE name, city;

-- Store projected data in a new file


STORE projected_data INTO 'projected_output' USING PigStorage(',');

Filter data:
-- Load data from a CSV file
data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int,
city:chararray);

-- Filter data to keep only records where age is greater than 30


filtered_data = FILTER data BY age > 30;

-- Store filtered data in a new file


STORE filtered_data INTO 'filtered_output' USING PigStorage(',');

In the above examples, the input data is loaded from a CSV file using the LOAD
statement, and the processed data is stored in a new file using the STORE
statement with the specified output file name.
EXERCISE 11 :
AIM: Install and Run Hive then use Hive to create, alter, and drop databases,
tables, views, functions, and indexes
PROGRAM :

Hive is a data warehouse tool that provides an SQL-like interface for querying
and managing large datasets stored in distributed file systems like Hadoop
HDFS. Here are the steps to install and run Hive, and then create, alter, and
drop databases, tables, views, functions, and indexes.

Step 1: Install Hive

1. Install Hadoop: Hive runs on top of Hadoop, so you need to have Hadoop
installed and configured on your system. You can download Hadoop from
the Apache Hadoop website (https://hadoop.apache.org/).
2. Download Hive: You can download Hive from the Apache Hive website
(https://hive.apache.org/).
3. Extract Hive: Extract the downloaded Hive archive to a directory of your
choice.
4. Configure Hive: Hive requires some configuration settings. Copy the
hive-default.xml.template file from the Hive installation directory to
hive-site.xml, and then configure the necessary settings, such as Hadoop's
fs.defaultFS, Hive's javax.jdo.option.ConnectionURL, and
javax.jdo.option.ConnectionDriverName.

Step 2: Start Hive

1. Start Hadoop: Start your Hadoop cluster using the appropriate commands,
such as start-dfs.sh and start-yarn.sh.
2. Start Hive: Start Hive by running the Hive CLI or HiveServer2, depending
on your use case. Hive CLI provides an interactive command-line
interface, while HiveServer2 provides a Thrift service for remote clients to
connect and execute Hive queries.
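For example, the Hive CLI is launched with the hive command, while remote
clients usually connect to HiveServer2 through Beeline, e.g.
beeline -u jdbc:hive2://localhost:10000 (10000 is HiveServer2's default
Thrift port).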

Step 3: Create, Alter, and Drop Databases, Tables, Views, Functions, and
Indexes

Create a Database: You can create a new database in Hive using the CREATE
DATABASE command. For example:
CREATE DATABASE mydb;

Alter a Database: You can alter a database in Hive using the ALTER
DATABASE command. For example, you can set properties for a database:
ALTER DATABASE mydb SET DBPROPERTIES ('description'='My database');

Drop a Database: You can drop a database in Hive using the DROP DATABASE
command. For example:
DROP DATABASE mydb;

Create a Table: You can create a table in Hive using the CREATE TABLE
command. For example:
CREATE TABLE mytable (id INT, name STRING) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

Alter a Table: You can alter a table in Hive using the ALTER TABLE command.
For example, you can add a new column to a table:
ALTER TABLE mytable ADD COLUMNS (age INT);

Drop a Table: You can drop a table in Hive using the DROP TABLE command.
For example:
DROP TABLE mytable;

Create a View: You can create a view in Hive using the CREATE VIEW
command. For example:
CREATE VIEW myview AS SELECT id, name FROM mytable WHERE age > 18;

Create a Function: You can create a custom function in Hive using the
CREATE FUNCTION command. For example:
CREATE FUNCTION myfunc AS 'com.example.MyUDF' USING JAR
'hdfs://localhost:9000/myudf.jar';

Create an Index: You can create an index on a table in Hive using the CREATE
INDEX command. For example:
CREATE INDEX myindex ON TABLE mytable (name) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
