DA Lab EXERCISE
1. Install Java 8:
a. Download and install Java 8.
b. Set environment variables:
i. User variable:
• Variable: JAVA_HOME
• Value: C:\java
ii. System variable:
• Variable: PATH
• Value: C:\java\bin
c. Check on cmd by running "java -version".
2. Download Hadoop-2.6.x:
a. Extract the Hadoop-2.6.x archive and place the files in the D: drive.
b. Download "hadoop-common-2.6.0-bin-master" and paste all of its files into the
"bin" folder of Hadoop-2.6.x.
c. Create a "data" folder inside Hadoop-2.6.x, and create two more folders inside
it named "data" and "name."
d. Create a folder to store temporary data during execution of a project, such as
“D:\hadoop\temp.”
e. Create a log folder, such as "D:\hadoop\userlog."
f. Go to Hadoop-2.6.x\etc\hadoop and edit four files:
i. core-site.xml
ii. hdfs-site.xml
iii. mapred-site.xml
iv. yarn-site.xml
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file. -->
<!-- Put site-specific property overrides in this file. -->
<configuration>
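<!-- hadoop.tmp.dir: base directory for Hadoop's temporary files (the folder created in step d) -->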
<property>
<name>hadoop.tmp.dir</name>
<value>D:\hadoop\temp</value>
</property>
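<!-- fs.default.name: URI of the default filesystem, i.e., the NameNode address -->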
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:50071</value>
</property>
</configuration>
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop-2.6.0/data/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop-2.6.0/data/data</value>
<final>true</final>
</property>
</configuration>
mapred-site.xml
<?xml version="1.0"?>
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>/hadoop-2.6.0/share/hadoop/mapreduce/*,
/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,
/hadoop-2.6.0/share/hadoop/common/*,
/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,
/hadoop-2.6.0/share/hadoop/yarn/lib/*,
/hadoop-2.6.0/share/hadoop/hdfs/*,
/hadoop-2.6.0/share/hadoop/hdfs/lib/*
</value>
</property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>D:\hadoop\userlog</value>
<final>true</final>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>D:\hadoop\temp\nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>600</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/hadoop-2.6.0/,
/hadoop-2.6.0/share/hadoop/common/*,
/hadoop-2.6.0/share/hadoop/common/lib/*,
/hadoop-2.6.0/share/hadoop/hdfs/*,
/hadoop-2.6.0/share/hadoop/hdfs/lib/*,
/hadoop-2.6.0/share/hadoop/mapreduce/*,
/hadoop-2.6.0/share/hadoop/mapreduce/lib/*,
/hadoop-2.6.0/share/hadoop/yarn/*,
/hadoop-2.6.0/share/hadoop/yarn/lib/*</value>
</property>
</configuration>
g. Go to "Hadoop-2.6.0\etc\hadoop" and edit "hadoop-env.cmd" by adding the line
set JAVA_HOME=C:\java\jdk1.8.0_91
(point it at your own JDK installation path).
h. Set environment variables: go to My Computer -> Properties -> Advanced system
settings -> Advanced -> Environment Variables
i. User variables:
• Variable: HADOOP_HOME
• Value: D:\hadoop-2.6.0
ii. System variable
• Variable: Path
• Value: D:\hadoop-2.6.0\bin
D:\hadoop-2.6.0\sbin
D:\hadoop-2.6.0\share\hadoop\common\*
D:\hadoop-2.6.0\share\hadoop\hdfs
D:\hadoop-2.6.0\share\hadoop\hdfs\lib\*
D:\hadoop-2.6.0\share\hadoop\hdfs\*
D:\hadoop-2.6.0\share\hadoop\yarn\lib\*
D:\hadoop-2.6.0\share\hadoop\yarn\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\lib\*
D:\hadoop-2.6.0\share\hadoop\mapreduce\*
D:\hadoop-2.6.0\share\hadoop\common\lib\*
i. Check on cmd by running "hadoop version".
j. Format the name node: on cmd, go to the "bin" folder by typing
"cd D:\hadoop-2.6.0\bin" and then run "hdfs namenode -format"
k. Start Hadoop. Go to "D:\hadoop-2.6.0\sbin" and run the following files as
administrator: "start-dfs.cmd" and "start-yarn.cmd"
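Once both scripts are running, the daemons can be verified with jps (a process-listing tool shipped with the JDK); NameNode, DataNode, ResourceManager, and NodeManager should all appear in its output:
jps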
In Eclipse, go to Window -> Perspective -> Open Perspective -> Other -> Map/Reduce and
click OK.
A bar appears at the bottom; click on Map/Reduce Locations.
Right-click on the blank space, then click "Edit settings" and fill in the location details.
Result:-
Thus, the Hadoop environment was installed and configured successfully.
EX.2 Implementation of Word Count program using MapReduce
Aim:-
To implement a word count program using MapReduce.
Program:-
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
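The listing above stops after the imports; the class body below is a minimal completion that follows the standard WordCount example shipped with Apache Hadoop (the class names TokenizerMapper and IntSumReducer come from that example).
public class WordCount {

  // Mapper: emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: wires the mapper and reducer into a job and runs it.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}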
Result:-
Thus, the word count program was executed in the Hadoop environment.
EX.3 Implementation of MR program using Weather dataset
Aim:-
To find the maximum temperature per year from a sensor temperature dataset, using
the Hadoop MapReduce framework.
Procedure:-
Implement the Mapper and Reducer for finding the maximum temperature in Java.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

//Mapper class: emits a (year, temperature) pair for every record.
//The parsing below assumes each input line holds a year and an integer
//temperature reading separated by whitespace; adjust it to the layout
//of your sensor data sheet.
class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().trim().split("\\s+");
    if (fields.length >= 2) {
      context.write(new Text(fields[0]),
          new IntWritable(Integer.parseInt(fields[1])));
    }
  }
}

//Reducer class: keeps the maximum temperature seen for each year.
class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values,
      Context context)
      throws IOException, InterruptedException {
    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}

//Driver class: configures the job and submits it.
public class MaxTemperature {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "max temperature");
    job.setJarByClass(MaxTemperature.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.submit();
  }
}
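Assuming the classes above are packaged into a jar named maxtemp.jar (the jar name is illustrative), the job can be run from cmd with:
hadoop jar maxtemp.jar MaxTemperature <input path> <output path>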
Result:-
Thus, the maximum temperature per year was found using the Hadoop MapReduce framework.
EX.4 Installation and Configuration of Spark
Aim:-
To install and configure Spark on a standalone machine.
Procedure:-
Step 1: Install Java 8
Apache Spark requires Java 8. You can check whether Java is installed from the
command prompt.
Open the command line by clicking Start, typing cmd, and clicking Command Prompt.
Type the following command in the command prompt:
java -version
If Java is installed, the command replies with the installed version details.
Step 2: Install Python
1. To install Python, navigate to https://www.python.org/ in your
web browser.
2. Mouse over the Download menu option and click Python 3.8.3 (the latest version
at the time of writing).
3. Once the download finishes, run the file.
4. Near the bottom of the first setup dialog box, check the Add Python 3.8 to PATH box.
Leave the other box checked.
5. Next, click Customize installation.
6. You can leave all boxes checked at this step, or you can uncheck the options you
do not want.
7. Click Next.
8. Select the box Install for all users and leave other boxes as they are.
9. Under Customize install location, click Browse and navigate to the C drive. Add a
new folder and name it Python.
10. Select that folder and click OK.
11. Click Install, and let the installation complete.
12. When the installation completes, click the Disable path length limit option at the
bottom and then click Close.
13. If you have a command prompt open, restart it. Verify the installation by checking
the version of Python:
python --version
The output should print Python 3.8.3.
Step 3: Download Apache Spark
1. Open a browser and navigate to https://spark.apache.org/downloads.html.
2. Under the Download Apache Spark heading, there are two drop-down menus. Use
the current non-preview version.
In our case, in the Choose a Spark release drop-down menu, select 2.4.5.
In the second drop-down, Choose a package type, leave the selection Pre-built for
Apache Hadoop 2.7.
3. Click the spark-2.4.5-bin-hadoop2.7.tgz link.
4. A page with a list of mirrors loads where you can see different servers to download
from. Pick any from the list and save the file to your Downloads folder.
Step 4: Verify Spark Software File
1. Verify the integrity of your download by checking the checksum of the file. This
ensures you are working with unaltered, uncorrupted software.
2. Navigate back to the Spark Download page and open the Checksum link, preferably
in a new tab.
3. Next, open a command line and enter the following command:
certutil -hashfile c:\users\username\Downloads\spark-2.4.5-bin-hadoop2.7.tgz SHA512
4. Change the username to your username. The system displays a long alphanumeric
code, along with the message Certutil: -hashfile completed successfully.
5. Compare the code to the one you opened in a new browser tab. If they match, your
download file is uncorrupted.
Step 5: Install Apache Spark
Installing Apache Spark involves extracting the downloaded file to the desired location.
1. Create a new folder named Spark in the root of your C: drive. From a command
line, enter the following:
cd \
mkdir Spark
2. In Explorer, locate the Spark file you downloaded.
3. Right-click the file and extract it to C:\Spark using the tool you have on your system
(e.g., 7-Zip).
4. Now, your C:\Spark folder has a new folder spark-2.4.5-bin-hadoop2.7 with the
necessary files inside.
Step 6: Add winutils.exe File
Download the winutils.exe file that matches the underlying Hadoop version of the
Spark package you downloaded.
1. Navigate to this URL https://github.com/cdarlint/winutils and inside the bin
folder, locate winutils.exe, and click it.
2. Find the Download button on the right side to download the file.
3. Now, create a new folder hadoop with a bin folder inside it (C:\hadoop\bin), using
Windows Explorer or the Command Prompt.
4. Copy the winutils.exe file from the Downloads folder to C:\hadoop\bin.
Step 7: Configure Environment Variables
Configuring environment variables in Windows adds the Spark and Hadoop locations
to your system PATH. It allows you to run the Spark shell directly from a command
prompt window.
1. Click Start and type environment.
2. Select the result labeled Edit the system environment variables.
3. A System Properties dialog box appears. In the lower-right corner, click
Environment Variables and then click New in the next window.
4. For Variable Name type SPARK_HOME.
5. For Variable Value type C:\Spark\spark-2.4.5-bin-hadoop2.7 and click OK. If you
changed the folder path, use that one instead.
6. In the top box, click the Path entry, then click Edit. Be careful with editing the system
path. Avoid deleting any entries already on the list.
7. You should see a box with entries on the left. On the right, click New.
8. The system highlights a new line. Enter the path to the Spark folder C:\Spark\spark-
2.4.5-bin-hadoop2.7\bin. We recommend using %SPARK_HOME%\bin to avoid
possible issues with the path.
9. Repeat this process for Hadoop and Java.
• For Hadoop, the variable name is HADOOP_HOME and for the value use the
path of the folder you created earlier: C:\hadoop. Add C:\hadoop\bin to the Path
variable field, but we recommend using %HADOOP_HOME%\bin.
• For Java, the variable name is JAVA_HOME and for the value use the path to
your Java JDK directory (in our case it’s C:\Program Files\Java\jdk1.8.0_251).
To verify the installation, open a new command prompt and run spark-shell. To exit
Spark and close the Scala shell, press Ctrl-D in the command-prompt window.
Result:-
Thus, the SPARK was installed and configured successfully.
EX.5 IMPLEMENT WORD COUNT / FREQUENCY PROGRAMS USING SPARK
Aim:-
To implement word count / frequency programs using Spark.
Program:-
package org.apache.spark.examples;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
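The listing ends with the package and imports; the class body below is a minimal completion that follows the JavaWordCount example shipped with Spark (compatible with the Spark 2.4.x Java API).
public final class JavaWordCount {
  private static final Pattern SPACE = Pattern.compile(" ");

  public static void main(String[] args) throws Exception {
    if (args.length < 1) {
      System.err.println("Usage: JavaWordCount <file>");
      System.exit(1);
    }

    // Set up the Spark context.
    SparkConf conf = new SparkConf().setAppName("JavaWordCount");
    JavaSparkContext sc = new JavaSparkContext(conf);

    // Read the input file, split each line into words on spaces,
    // map each word to (word, 1), and sum the counts per word.
    JavaRDD<String> lines = sc.textFile(args[0]);
    JavaRDD<String> words =
        lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
    JavaPairRDD<String, Integer> ones = words.mapToPair(s -> new Tuple2<>(s, 1));
    JavaPairRDD<String, Integer> counts = ones.reduceByKey((a, b) -> a + b);

    // Collect the results on the driver and print each word with its count.
    List<Tuple2<String, Integer>> output = counts.collect();
    for (Tuple2<?, ?> tuple : output) {
      System.out.println(tuple._1() + ": " + tuple._2());
    }
    sc.stop();
  }
}
Package the class into a jar and submit it with spark-submit --class org.apache.spark.examples.JavaWordCount <jar file> <input file>; each distinct word and its count are printed to the console.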