BDA Lab

EXERCISE 1 :

AIM: Implement the following Data structures in Java


a) Linked Lists b) Stacks c) Queues d) Set e) Map

PROGRAM :

Linked List

Program:

import java.util.*;

public class LinkedListDemo {
   public static void main(String args[]) {
      // create a linked list
      LinkedList ll = new LinkedList();
      // add elements to the linked list
      ll.add("F");
      ll.add("B");
      ll.add("D");
      ll.add("E");
      ll.add("C");
      ll.addLast("Z");
      ll.addFirst("A");
      ll.add(1, "A2");
      System.out.println("Original contents of ll: " + ll);
      // remove elements from the linked list
      ll.remove("F");
      ll.remove(2);
      System.out.println("Contents of ll after deletion: " + ll);
      // remove first and last elements
      ll.removeFirst();
      ll.removeLast();
      System.out.println("ll after deleting first and last: " + ll);
      // get and set a value
      Object val = ll.get(2);
      ll.set(2, (String) val + " Changed");
      System.out.println("ll after change: " + ll);
   }
}
Output:
Original contents of ll: [A, A2, F, B, D, E, C, Z]
Contents of ll after deletion: [A, A2, D, E, C, Z]
ll after deleting first and last: [A2, D, E, C]
ll after change: [A2, D, E Changed, C]
b) Stacks

Program:

import java.util.*;

public class StackDemo {

   static void showpush(Stack st, int a) {
      st.push(Integer.valueOf(a));
      System.out.println("push(" + a + ")");
      System.out.println("stack: " + st);
   }

   static void showpop(Stack st) {
      System.out.print("pop -> ");
      Integer a = (Integer) st.pop();
      System.out.println(a);
      System.out.println("stack: " + st);
   }

   public static void main(String args[]) {
      Stack st = new Stack();
      System.out.println("stack: " + st);
      showpush(st, 42);
      showpush(st, 66);
      showpush(st, 99);
      showpop(st);
      showpop(st);
      showpop(st);
      try {
         showpop(st);
      } catch (EmptyStackException e) {
         System.out.println("empty stack");
      }
   }
}

Output:

stack: []
push(42)
stack: [42]
push(66)
stack: [42, 66]
push(99)
stack: [42, 66, 99]
pop -> 99
stack: [42, 66]
pop -> 66
stack: [42]
pop -> 42
stack: []
pop -> empty stack


c) Queues

program

import java.util.LinkedList;

import java.util.Queue;

public class QueueExample {

   public static void main(String[] args) {
      Queue<Integer> q = new LinkedList<>();

      // Add elements {0, 1, 2, 3, 4} to the queue
      for (int i = 0; i < 5; i++)
         q.add(i);

      // Display contents of the queue
      System.out.println("Elements of queue-" + q);

      // Remove the head of the queue
      int removedele = q.remove();
      System.out.println("removed element-" + removedele);
      System.out.println(q);

      // View the head of the queue
      int head = q.peek();
      System.out.println("head of queue-" + head);

      // The rest of the Collection interface methods,
      // like size() and contains(), can also be used
      // with this implementation.
      int size = q.size();
      System.out.println("Size of queue-" + size);
   }
}

Output:

Elements of queue-[0, 1, 2, 3, 4]
removed element-0
[1, 2, 3, 4]
head of queue-1
Size of queue-4
d) Set

Program
import java.util.*;

public class SetDemo {
   public static void main(String args[]) {
      int count[] = {34, 22, 10, 60, 30, 22};
      Set<Integer> set = new HashSet<Integer>();
      try {
         for (int i = 0; i < 5; i++) {
            set.add(count[i]);
         }
         System.out.println(set);

         TreeSet sortedSet = new TreeSet<Integer>(set);
         System.out.println("The sorted list is:");
         System.out.println(sortedSet);
         System.out.println("The First element of the set is: " + (Integer) sortedSet.first());
         System.out.println("The last element of the set is: " + (Integer) sortedSet.last());
      } catch (Exception e) {
      }
   }
}

Output:

[34, 22, 10, 60, 30]
The sorted list is:
[10, 22, 30, 34, 60]
The First element of the set is: 10
The last element of the set is: 60


e) Map

Program:

import java.awt.Color;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class MapDemo {
   public static void main(String[] args) {
      Map<String, Color> favoriteColors = new HashMap<String, Color>();
      favoriteColors.put("sai", Color.BLUE);
      favoriteColors.put("Ram", Color.GREEN);
      favoriteColors.put("krishna", Color.RED);
      favoriteColors.put("narayana", Color.BLUE);

      // Print all keys and values in the map
      Set<String> keySet = favoriteColors.keySet();
      for (String key : keySet) {
         Color value = favoriteColors.get(key);
         System.out.println(key + " : " + value);
      }
   }
}
Output:
narayana : java.awt.Color[r=0,g=0,b=255]
sai : java.awt.Color[r=0,g=0,b=255]
krishna : java.awt.Color[r=255,g=0,b=0]
Ram : java.awt.Color[r=0,g=255,b=0]
EXERCISE 2 :
AIM: (i) Set up and install Hadoop in its three operating modes: standalone,
pseudo-distributed, and fully distributed.
(ii) Use web-based tools to monitor your Hadoop setup.

PROGRAM :
https://drive.google.com/file/d/1nCN_jK7EJF2DmPUUxgOggnvJ6k6tksYz/view
Editing core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

Editing hdfs-site.xml
Replace PATH~1 and PATH~2 with the paths of the namenode and datanode
folders created earlier.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>PATH~1\namenode</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>PATH~2\datanode</value>
<final>true</final>
</property>
</configuration>

mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- Site specific YARN configuration properties --></configuration>
Verifying hadoop-env.cmd
set JAVA_HOME=%JAVA_HOME%
OR
set JAVA_HOME="C:\Program Files\Java\jdk1.8.0_221"

Replacing bin: replace Hadoop's default bin folder with a Windows-compatible
bin folder containing winutils.exe and the matching native binaries.
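With the configuration files in place, format the NameNode once with
hdfs namenode -format, then start the daemons from an administrator command
prompt with start-dfs.cmd and start-yarn.cmd (start-dfs.sh and start-yarn.sh
on Linux); jps should then list NameNode, DataNode, ResourceManager and
NodeManager. For web-based monitoring, the NameNode UI is served at
http://localhost:9870 and the YARN ResourceManager UI at http://localhost:8088
(the defaults for Hadoop 3.x; adjust if the ports were changed in the
configuration).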
EXERCISE 3 :
AIM: Implement the following file management tasks in Hadoop:
1. Adding files and directories 2. Retrieving files 3. Deleting files

PROGRAM :
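These tasks can be carried out from the command line with hdfs dfs (for
example -mkdir, -put, -get and -rm), or programmatically through the Hadoop
FileSystem API. Below is a minimal Java sketch of the API approach, assuming
the pseudo-distributed setup from Exercise 2 with the NameNode at
hdfs://localhost:9000; the class name, the HDFS directory /user/demo and the
local file names are illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileManagement {
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // assumed NameNode address, matching the core-site.xml above
      conf.set("fs.defaultFS", "hdfs://localhost:9000");
      FileSystem fs = FileSystem.get(conf);

      // 1. Adding files and directories
      Path dir = new Path("/user/demo");
      fs.mkdirs(dir);
      fs.copyFromLocalFile(new Path("input.txt"), new Path("/user/demo/input.txt"));

      // 2. Retrieving files: copy back to the local file system and list the directory
      fs.copyToLocalFile(new Path("/user/demo/input.txt"), new Path("retrieved.txt"));
      for (FileStatus status : fs.listStatus(dir)) {
         System.out.println(status.getPath());
      }

      // 3. Deleting files (second argument = delete recursively)
      fs.delete(new Path("/user/demo/input.txt"), false);
      fs.delete(dir, true);

      fs.close();
   }
}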
EXERCISE 4 :
AIM: Run a basic Word Count MapReduce program to understand the
MapReduce paradigm.
PROGRAM :
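A standard way to do this is the classic WordCount program sketched below
(following the example shipped with Hadoop): the mapper tokenizes each input
line and emits (word, 1) pairs, the combiner and reducer sum the counts per
word, and the driver configures and submits the job. Package it into a jar
and run it with a command of the form hadoop jar wordcount.jar WordCount
<input dir> <output dir> (the jar name and directories are illustrative).

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

   // Mapper: emits (word, 1) for every token in the input line
   public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();

      public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
         StringTokenizer itr = new StringTokenizer(value.toString());
         while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
         }
      }
   }

   // Reducer (also used as combiner): sums the counts for each word
   public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      private IntWritable result = new IntWritable();

      public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
         int sum = 0;
         for (IntWritable val : values) {
            sum += val.get();
         }
         result.set(sum);
         context.write(key, result);
      }
   }

   // Driver: configures and submits the job
   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenizerMapper.class);
      job.setCombinerClass(IntSumReducer.class);
      job.setReducerClass(IntSumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}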
EXERCISE 5 :
AIM: Write a map reduce program that mines weather data.

PROGRAM :
Open Eclipse → create a new Java project named MyProject → create a class
named MyMaxMin.
Copy the code below:

// importing Libraries
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
public class MyMaxMin {
// Mapper
/*MaxTemperatureMapper class is static
* and extends Mapper abstract class
* having four Hadoop generics type
* LongWritable, Text, Text, Text.
*/
public static class MaxTemperatureMapper extends
Mapper<LongWritable, Text, Text, Text> {
/**
* @method map
* This method takes the input as a text data type.
* It extracts the date and the maximum and minimum
* temperatures from fixed character positions in each
* record; days with temp_max > 30 or temp_min < 15
* are passed to the reducer.
*/
// the data in our data set with
// this value is inconsistent data
public static final int MISSING = 9999;
@Override
public void map(LongWritable arg0, Text Value, Context context)
throws IOException, InterruptedException {
// Convert the single row(Record) to
// String and store it in String
// variable name line
String line = Value.toString();
// Check for the empty line
if (!(line.length() == 0)) {
// from character 6 to 14 we have
// the date in our dataset
String date = line.substring(6, 14);
// similarly we have taken the maximum
// temperature from 39 to 45 characters
float temp_Max = Float.parseFloat(line.substring(39, 45).trim());
// similarly we have taken the minimum
// temperature from 47 to 53 characters
float temp_Min = Float.parseFloat(line.substring(47, 53).trim());
// if maximum temperature is
// greater than 30, it is a hot day
if (temp_Max > 30.0) {
// Hot day
context.write(new Text("The Day is Hot Day :" + date),
new Text(String.valueOf(temp_Max)));
}
// if the minimum temperature is
// less than 15, it is a cold day
if (temp_Min < 15) {
// Cold day
context.write(new Text("The Day is Cold Day :" + date),
new Text(String.valueOf(temp_Min)));
}
}
}
}
// Reducer
/*MaxTemperatureReducer class is static
and extends Reducer abstract class
having four Hadoop generics type
Text, Text, Text, Text.
*/
public static class MaxTemperatureReducer extends
Reducer<Text, Text, Text, Text> {
/**
* @method reduce
* This method takes the input as key and
* list of values pair from the mapper,
* it does aggregation based on keys and
* produces the final context.
*/
public void reduce(Text Key, Iterable<Text> Values, Context context)
throws IOException, InterruptedException {
// take the first value reported for this key and
// store it in a String variable named temperature
String temperature = Values.iterator().next().toString();
context.write(Key, new Text(temperature));
}
}
/**
* @method main
* This method is used for setting
* all the configuration properties.
* It acts as a driver for map-reduce
* code.
*/
public static void main(String[] args) throws Exception {
// reads the default configuration of the
// cluster from the configuration XML files
Configuration conf = new Configuration();
// Initializing the job with the
// default configuration of the cluster
Job job = new Job(conf, "weather example");
// Assigning the driver class name
job.setJarByClass(MyMaxMin.class);
// Key type coming out of mapper
job.setMapOutputKeyClass(Text.class);
// value type coming out of mapper
job.setMapOutputValueClass(Text.class);
// Defining the mapper class name
job.setMapperClass(MaxTemperatureMapper.class);
// Defining the reducer class name
job.setReducerClass(MaxTemperatureReducer.class);
// Defining input Format class which is
// responsible to parse the dataset
// into a key value pair
job.setInputFormatClass(TextInputFormat.class);
// Defining output Format class which is
// responsible to parse the dataset
// into a key value pair
job.setOutputFormatClass(TextOutputFormat.class);
// setting the second argument
// as a path in a path variable
Path OutputPath = new Path(args[1]);
// Configuring the input path
// from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
// Configuring the output path from
// the filesystem into the job
FileOutputFormat.setOutputPath(job, new Path(args[1]));
// deleting the context path automatically
// from hdfs so that we don't have
// to delete it explicitly
OutputPath.getFileSystem(conf).delete(OutputPath);
// exiting the job only if the
// flag value becomes false
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

Add all the necessary jar libraries


To do so, right-click on the project name >> Build Path >> Configure Build
Path, and add the External JARs from hadoop-3.1.2 >> share >> hadoop.
Then Add the client jar files.
A. Select client jar files and click on Open.
B. Add common jar files.
Select common jar files and Open. Also, add common/lib libraries. Select all
common/lib jars and click Open.

C. Add yarn jar files. Select yarn jar files and then select Open.
D. Add MapReduce jar files. Select MapReduce jar files. Click Open.
E. Add HDFS jar files. Select HDFS jar files and click Open. Click on Apply and
Close to add all the Hadoop jar files.
Now, we have added all required jar files in our project.
Step 5. Now create a new class that performs the map job.

Then export the project as a JAR file.
Output:

Open cmd in admin mode


Launch all dfs and yarn nodes
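The exported jar can then be submitted with a command of the form
hadoop jar MyProject.jar MyMaxMin <input path in HDFS> <output path in HDFS>,
where the input path points to the uploaded weather data set (the jar name
and paths are illustrative; the driver above deletes any existing output
directory before the job runs).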

EXERCISE 6 :
AIM: Use MapReduce to find the shortest path between two people
in a social graph.
PROGRAM :

Here's a high-level overview of the MapReduce process:

1. Map Function: The map function takes the input graph and generates
key-value pairs where the key is the node ID and the value is a tuple
containing the distance from the source node and the list of neighbors.
2. Reduce Function: The reduce function receives the key-value pairs
generated by the map function and processes them to update the distance
and neighbors list for each node.
3. Iteration: The map and reduce functions are repeated iteratively until the
target node is reached or no more updates are made to the distances.
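The code below assumes the graph is stored as adjacency lists, one node per
line, with the node ID and a comma-separated neighbor list separated by a tab
(for example, 1<TAB>2,3,4), and that the source person is identified by a
SOURCE_NODE_ID constant defined in the class.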

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ShortestPathFinder {

// ID of the source person's node for the BFS
// (illustrative value; set it to the actual starting node)
private static final long SOURCE_NODE_ID = 1L;

// Mapper class for BFS


public static class BFSMapper extends Mapper<LongWritable, Text,
LongWritable, Text> {

@Override
public void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException {
// Parse the input line as a node and its adjacency list
String[] tokens = value.toString().split("\t");
long node = Long.parseLong(tokens[0]);
String adjacencyList = tokens[1];

// Emit the node and its adjacency list


context.write(new LongWritable(node), new Text(adjacencyList));

// If this is the source node, emit a distance of 0 to each of its neighbors
if (node == SOURCE_NODE_ID) {
String[] neighbors = adjacencyList.split(",");
for (String neighbor : neighbors) {
long neighborNode = Long.parseLong(neighbor);
context.write(new LongWritable(neighborNode), new Text("0," +
SOURCE_NODE_ID));
}
}
}
}

// Reducer class for BFS


public static class BFSReducer extends Reducer<LongWritable, Text,
LongWritable, Text> {

@Override
public void reduce(LongWritable key, Iterable<Text> values, Context
context)
throws IOException, InterruptedException {
boolean visited = false;
long distance = Long.MAX_VALUE;
String adjacencyList = null;

// Iterate through the input values


for (Text value : values) {
String[] tokens = value.toString().split(",");
if (tokens.length == 1) { // Adjacency list
visited = true;
adjacencyList = tokens[0];
} else { // Distance and previous node
long currentDistance = Long.parseLong(tokens[0]);
if (currentDistance < distance) {
distance = currentDistance;
}
}
}

// If the node has been visited, emit its adjacency list with the updated distance
if (visited) {
context.write(key, new Text(distance + "," + adjacencyList));
}
}
}

// Driver program
public static void main(String[] args) throws IOException,
ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "ShortestPathFinder");

// Set input and output paths


job.setJarByClass(ShortestPathFinder.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// Set mapper and reducer classes


job.setMapperClass(BFSMapper.class);
job.setReducerClass(BFSReducer.class);

// Set output key and value classes


job.setOutputKeyClass(LongWritable.class);
job.setOutputValueClass(Text.class);

// Wait for the job to complete


System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
EXERCISE 7 :
AIM: Implement Friends-of-friends algorithm in MapReduce.
PROGRAM :

The FoFMapper class emits user and friends as key-value pairs, and the
FoFReducer class counts the number of unique friends-of-friends for each user
and emits the result as output. The main method sets up the MapReduce job,
including the input and output file paths, mapper and reducer classes, and
output key and value classes.
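The mapper below assumes whitespace-separated input in which the first token
on each line is a user and the remaining tokens are that user's direct
friends; for example, the line "A B C" states that A is friends with B and C.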

// Import necessary libraries


import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FriendsOfFriends {

// Mapper class
public static class FoFMapper extends Mapper<Object, Text, Text, Text> {

@Override
protected void map(Object key, Text value, Context context)
throws IOException, InterruptedException {
// Split the input line into tokens
String[] tokens = value.toString().trim().split("\\s+");

// Extract user and friends


String user = tokens[0];
List<String> friends = new ArrayList<>();
for (int i = 1; i < tokens.length; i++) {
friends.add(tokens[i]);
}

// Emit user and friends as key-value pairs


for (String friend : friends) {
// Emit friend as key with user as value
context.write(new Text(friend), new Text(user));
// Emit user as key with friend as value
context.write(new Text(user), new Text(friend));
}
}
}

// Reducer class
public static class FoFReducer extends Reducer<Text, Text, Text, IntWritable>
{

@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
// Create a set to hold unique friends
Set<String> uniqueFriends = new HashSet<>();

// Iterate through input values and add them to the set


for (Text value : values) {
uniqueFriends.add(value.toString());
}

// Remove the user from the set of unique friends


uniqueFriends.remove(key.toString());

// Emit friend-of-friend count as output


context.write(key, new IntWritable(uniqueFriends.size()));
}
}

// Main method
public static void main(String[] args) throws Exception {
// Create a Hadoop configuration
Configuration conf = new Configuration();
// Create a MapReduce job
Job job = Job.getInstance(conf, "FriendsOfFriends");

// Set the classes for the job


job.setJarByClass(FriendsOfFriends.class);
job.setMapperClass(FoFMapper.class);
job.setReducerClass(FoFReducer.class);

// Set the input and output file paths


FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// Set the output key and value classes


job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// Wait for the job to complete and print the result
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
EXERCISE 8 :
AIM: Implement an iterative PageRank graph algorithm in MapReduce.
PROGRAM :

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class PageRank {

public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {

public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {

// Split the input line into pageID, pageRank, and outgoingLinks


String[] parts = value.toString().split("\t");
String pageID = parts[0];
float pageRank = Float.parseFloat(parts[1]);
String[] outgoingLinks = parts[2].split(",");

// Emit the pageID and its pageRank


output.collect(new Text(pageID), new Text("PR:" + pageRank));

// If the page has outgoing links, emit its pageRank divided
// by the number of outgoing links for each outgoing link
if (outgoingLinks.length > 0) {
float outgoingPageRank = pageRank / outgoingLinks.length;
for (String link : outgoingLinks) {
output.collect(new Text(link), new Text("OPR:" +
outgoingPageRank));
}
}
}
}

public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, FloatWritable> {

private static final float DAMPING_FACTOR = 0.85f;

public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, FloatWritable> output, Reporter reporter) throws IOException {

float sumPageRank = 0.0f;


String outgoingLinks = "";

while (values.hasNext()) {
String value = values.next().toString();
String[] parts = value.split(":");
if (parts[0].equals("PR")) {
// Accumulate the sum of pageRank for this key
sumPageRank += Float.parseFloat(parts[1]);
} else if (parts[0].equals("OPR")) {
// Collect the outgoing links
outgoingLinks += "," + parts[1];
}
}

// Update the pageRank using the PageRank formula


float newPageRank = (1 - DAMPING_FACTOR) + (DAMPING_FACTOR *
sumPageRank);
output.collect(key, new FloatWritable(newPageRank));

// Emit a sentinel value for keys that have outgoing links
// so they are preserved for the next iteration
if (!outgoingLinks.isEmpty()) {
output.collect(key, new FloatWritable(Float.MIN_VALUE));
}
}
}

public static void main(String[] args) throws Exception {


JobConf conf = new JobConf(PageRank.class);
conf.setJobName("PageRank");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(FloatWritable.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(Text.class);

conf.setMapperClass(Map.class);
conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));


FileOutputFormat.setOutputPath(conf, new Path(args[1]));

JobClient.runJob(conf);
}
}
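A single run of this job computes one PageRank iteration. To make the
algorithm iterative, the driver is typically wrapped in a loop that feeds
each iteration's output back in as the next iteration's input (reshaped into
the pageID, pageRank, outgoingLinks layout the mapper expects), stopping
after a fixed number of iterations or once the ranks change by less than a
chosen tolerance.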
EXERCISE 9 :
AIM: Perform an efficient semi-join in MapReduce.
PROGRAM :
A semi-join in MapReduce is an operation that filters data from one data set
based on the existence of matching keys in another data set, similar to an inner
join in relational databases.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SemiJoinMapReduce {


// Mapper class
public static class SemiJoinMapper extends Mapper<Object, Text, Text, Text>
{
@Override
protected void map(Object key, Text value, Context context) throws
IOException, InterruptedException {
// assuming tab-separated values
String[] record = value.toString().split("\t");
String joinKey = record[0]; // extract the join key
String joinValue = record[1]; // extract the value
// emit the key-value pair
context.write(new Text(joinKey), new Text(joinValue));
}
}

// Reducer class
public static class SemiJoinReducer extends Reducer<Text, Text, Text, Text>
{
@Override
protected void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
boolean hasMatch = false;
for (Text value : values) {
// Iterate through values to check for a match
if (value.toString().equals("match")) {
hasMatch = true;
break;
}
}
if (hasMatch) {
// If there is a match, emit the key as output
context.write(key, new Text(""));
}
}
}

public static void main(String[] args) throws Exception {


Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "SemiJoinMapReduce");
job.setJarByClass(SemiJoinMapReduce.class);
job.setMapperClass(SemiJoinMapper.class);
job.setReducerClass(SemiJoinReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0])); // input path
FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}

In this program, the SemiJoinMapper class reads the input data and emits the
join key as the output key and the join value as the output value. The
SemiJoinReducer class receives the grouped key-value pairs from the mapper
and checks whether any of the values equals the marker string "match"; if so,
the key is emitted as output. The input and output paths are specified as
command-line arguments when running the MapReduce job.
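For this to behave as a semi-join, the smaller (filter) data set is assumed
to be prepared so that each of its records carries the literal value "match"
for its join key; keys that appear only in the larger data set then never
receive a "match" value and are dropped by the reducer.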
EXERCISE 10 :
AIM: Install and Run Pig then write Pig Latin scripts to sort, group, join,
project, and filter your data.
PROGRAM :
Pig is a data processing tool in the Hadoop ecosystem that provides a high-level
scripting language called Pig Latin for processing large data sets.

Step 1: Install Pig

1. Download the latest stable release of Pig from the Apache Pig website
(https://pig.apache.org/).
2. Extract the downloaded Pig archive to a directory of your choice.
3. Set the PIG_HOME environment variable to the path of the extracted Pig
directory.
4. Add the Pig binaries to your system's PATH environment variable.
5. Verify the Pig installation by running the following command: pig -version.

Step 2: Start Pig in Local mode

1. Open a command prompt or terminal window.


2. Run the following command to start Pig in local mode: pig -x local. This
will start Pig in local mode, where you can process data stored on your
local machine.
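Pig Latin statements can be entered interactively at the Grunt shell that
opens, or saved in a script file and executed with pig -x local <script>.pig.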

Sort data:

-- Load data from a CSV file

data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int, city:chararray);

-- Sort data by age in ascending order

sorted_data = ORDER data BY age ASC;

-- Store sorted data in a new file

STORE sorted_data INTO 'sorted_output' USING PigStorage(',');

Group data:

-- Load data from a CSV file


data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int,
city:chararray);

-- Group data by city


grouped_data = GROUP data BY city;

-- Store grouped data in a new file


STORE grouped_data INTO 'grouped_output' USING PigStorage(',');

Join data:

-- Load data from two CSV files


data1 = LOAD 'input1.csv' USING PigStorage(',') AS (name:chararray, age:int);
data2 = LOAD 'input2.csv' USING PigStorage(',') AS (name:chararray,
city:chararray);

-- Join data1 and data2 on the 'name' field


joined_data = JOIN data1 BY name, data2 BY name;

-- Store joined data in a new file


STORE joined_data INTO 'joined_output' USING PigStorage(',');

Project data:
-- Load data from a CSV file
data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int,
city:chararray);

-- Project only the 'name' and 'city' fields


projected_data = FOREACH data GENERATE name, city;

-- Store projected data in a new file


STORE projected_data INTO 'projected_output' USING PigStorage(',');

Filter data:
-- Load data from a CSV file
data = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int,
city:chararray);

-- Filter data to keep only records where age is greater than 30


filtered_data = FILTER data BY age > 30;

-- Store filtered data in a new file


STORE filtered_data INTO 'filtered_output' USING PigStorage(',');

In the above examples, the input data is loaded from a CSV file using the LOAD
statement, and the processed data is stored in a new file using the STORE
statement with the specified output file name.
EXERCISE 11 :
AIM: Install and Run Hive then use Hive to create, alter, and drop databases,
tables, views, functions, and indexes
PROGRAM :

Hive is a data warehouse tool that provides an SQL-like interface for querying
and managing large datasets stored in distributed file systems like Hadoop
HDFS. Here are the steps to install and run Hive, and then create, alter, and
drop databases, tables, views, functions, and indexes.

Step 1: Install Hive

1. Install Hadoop: Hive runs on top of Hadoop, so you need to have Hadoop
installed and configured on your system. You can download Hadoop from
the Apache Hadoop website (https://hadoop.apache.org/).
2. Download Hive: You can download Hive from the Apache Hive website
(https://hive.apache.org/).
3. Extract Hive: Extract the downloaded Hive archive to a directory of your
choice.
4. Configure Hive: Hive requires some configuration settings. Copy the
hive-default.xml.template file from the Hive installation directory to
hive-site.xml, and then configure the necessary settings, such as Hadoop's
fs.defaultFS, Hive's javax.jdo.option.ConnectionURL, and
javax.jdo.option.ConnectionDriverName.

Step 2: Start Hive

1. Start Hadoop: Start your Hadoop cluster using the appropriate commands,
such as start-dfs.sh and start-yarn.sh.
2. Start Hive: Start Hive by running the Hive CLI or HiveServer2, depending
on your use case. Hive CLI provides an interactive command-line
interface, while HiveServer2 provides a Thrift service for remote clients to
connect and execute Hive queries.
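For example, the Hive CLI is launched with the hive command, while remote
clients usually connect to HiveServer2 through Beeline, e.g.
beeline -u jdbc:hive2://localhost:10000 (10000 is HiveServer2's default
Thrift port).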

Step 3: Create, Alter, and Drop Databases, Tables, Views, Functions, and
Indexes

Create a Database: You can create a new database in Hive using the CREATE
DATABASE command. For example:
CREATE DATABASE mydb;

Alter a Database: You can alter a database in Hive using the ALTER
DATABASE command. For example, you can set properties for a database:
ALTER DATABASE mydb SET DBPROPERTIES ('description'='My database');

Drop a Database: You can drop a database in Hive using the DROP DATABASE
command. For example:
DROP DATABASE mydb;

Create a Table: You can create a table in Hive using the CREATE TABLE
command. For example:
CREATE TABLE mytable (id INT, name STRING) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

Alter a Table: You can alter a table in Hive using the ALTER TABLE command.
For example, you can add a new column to a table:
ALTER TABLE mytable ADD COLUMNS (age INT);

Drop a Table: You can drop a table in Hive using the DROP TABLE command.
For example:
DROP TABLE mytable;

Create a View: You can create a view in Hive using the CREATE VIEW
command. For example:
CREATE VIEW myview AS SELECT id, name FROM mytable WHERE age > 18;

Create a Function: You can create a custom function in Hive using the
CREATE FUNCTION command. For example:
CREATE FUNCTION myfunc AS 'com.example.MyUDF' USING JAR
'hdfs://localhost:9000/myudf.jar';

Create an Index: You can create an index on a table in Hive using the CREATE
INDEX command. For example:
CREATE INDEX myindex ON TABLE mytable (name) AS
'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
