Big Data Lab

The document provides detailed instructions for setting up and installing Hadoop in three modes: Standalone, Pseudo Distributed, and Fully Distributed, along with the necessary configurations and commands. It also covers file management tasks in Hadoop, including adding, retrieving, and deleting files in HDFS, as well as running a basic Word Count MapReduce program to demonstrate the MapReduce paradigm. The document outlines the algorithm steps for each task and includes example commands for effective execution.


i) Perform setting up and installing Hadoop in its three operating modes:

* Standalone
* Pseudo-Distributed
* Fully Distributed

DESCRIPTION: Hadoop is written in Java, so you will need to have Java installed on
your machine, version 6 or later. Sun's JDK is the one most widely used with Hadoop,
although others have been reported to work.

Hadoop runs on Unix and on Windows. Linux is the only supported production
platform, but other flavors of Unix (including Mac OS X) can be used to run Hadoop
for development. Windows is only supported as a development platform, and
additionally requires Cygwin to run. During the Cygwin installation process, you
should include the openssh package if you plan to run Hadoop in pseudo-distributed
mode.
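Before installing Hadoop it is worth confirming that a suitable JDK is present and that JAVA_HOME points to it. A minimal check might look like the following; the JDK path shown is only a placeholder and will differ between machines:

$ java -version
$ # Placeholder path; substitute the actual JDK location on your machine.
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
$ export PATH=$PATH:$JAVA_HOME/bin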

ALGORITHM STEPS INVOLVED IN INSTALLING HADOOP IN PSEUDO DISTRIBUTED MODE:-


1. To install Hadoop in pseudo-distributed mode, we need to configure the Hadoop
configuration files, which reside in the directory /home/lendi/hadoop-2.7.1/etc/hadoop.
2. First configure the hadoop-env.sh file by setting the Java path.
3. Configure core-site.xml, which contains a property tag with a name and a value.
Set the name to fs.defaultFS and the value to hdfs://localhost:9000.
4. Configure hdfs-site.xml.
5. Configure yarn-site.xml.
6. Configure mapred-site.xml; before configuring it, copy mapred-site.xml.template to
mapred-site.xml. (Sample contents for these files are sketched after this list.)
7. Now format the NameNode by using the command hdfs namenode -format.
8. Type the commands start-dfs.sh and start-yarn.sh, which start the daemons
NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.
9. Run jps, which lists all running daemons. Create a directory in HDFS by using the
command hdfs dfs -mkdir /csedir, enter some data into lendi.txt using the command
nano lendi.txt, copy it from the local directory to HDFS using the command hdfs dfs
-copyFromLocal lendi.txt /csedir/, and run the sample wordcount jar file to check
whether pseudo-distributed mode is working or not.
10. Display the contents of the output file by using the command hdfs dfs -cat
/newdir/part-r-00000.
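For reference, a minimal sketch of the configuration files from steps 3-6 is given below. The fs.defaultFS value comes from step 3; the other property values (a replication factor of 1 and the usual YARN/MapReduce settings) are typical choices for a single-machine pseudo-distributed setup rather than values taken from this handout, so adjust them to your installation.

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>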

FULLY DISTRIBUTED MODE INSTALLATION:


ALGORITHM
1. Stop all single-node clusters:
$ stop-all.sh

2. Designate one node as the NameNode (Master) and the remaining nodes as DataNodes (Slaves).

3. Copy the public key to all three hosts to get passwordless SSH access:
$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub lendi@l5sys24

4. Configure all configuration files to name the Master and Slave nodes.

$ cd $HADOOP_HOME/etc/hadoop
$ nano core-site.xml
$ nano hdfs-site.xml

5. Add the slave hostnames to the file slaves and save it (a sample is sketched after this list).

$ nano slaves

6. Configure yarn-site.xml:
$ nano yarn-site.xml
7. On the Master node, run:
$ hdfs namenode -format
$ start-dfs.sh
$ start-yarn.sh
8. The hdfs namenode -format command formats the NameNode.
9. start-dfs.sh and start-yarn.sh start the daemons on the Master and Slave nodes.

10. END
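As an illustration of step 5, the slaves file on the Master simply lists the DataNode hostnames, one per line, and core-site.xml on every node points fs.defaultFS at the Master. The hostnames below are hypothetical placeholders; substitute the actual host names used in your lab (l5sys24 stands in for the Master from the ssh-copy-id example above).

slaves:
l5sys25
l5sys26

core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://l5sys24:9000</value>
  </property>
</configuration>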

INPUT

ubuntu@localhost> jps

OUTPUT:
DataNode, NameNode, SecondaryNameNode, NodeManager, ResourceManager

________________________________________________________________________

II Implement the following file management tasks in Hadoop:

* Adding files and directories
* Retrieving files
* Deleting files

HDFS is a scalable distributed filesystem designed to scale to petabytes of data


while running on top of the underlying filesystem of the operating system.
HDFS keeps track of where the data resides in a network by associating the name
of its rack (or network switch) with the dataset. This allows Hadoop to efficiently
schedule tasks to those nodes that contain data, or which are nearest to it,
optimizing bandwidth utilization. Hadoop provides a set of command line utilities
that work similarly to the Linux file commands, and serve as your primary interface
with HDFS. We're going to have a look into HDFS by interacting with it from the
command line. We will take a look at the following operations:

* Adding files and directories to HDFS
* Retrieving files from HDFS to the local filesystem
* Deleting files from HDFS

ALGORITHM: -
SYNTAX AND COMMANDS TO ADD, RETRIEVE AND DELETE DATA FROM HDFS
Step-1
Adding Files and Directories to HDFS
Before you can run Hadoop programs on data stored in HDFS, you'll need to put the
data into HDFS first. Let's create a directory and put a file in it. HDFS has a
default working directory of /user/$USER, where $USER is your login user name. This
directory isn't automatically created for you, though, so let's create it with the
mkdir command. For the purpose of illustration, we use chuck. You should substitute
your user name in the example commands.

hadoop fs -mkdir /user/chuck


hadoop fs -put example.txt
hadoop fs -put example.txt /user/chuck
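The first form relies on the default working directory /user/$USER as the destination, while the second names the target directory explicitly. As a quick sanity check (a sketch, using the chuck user from the example above), you can list the directory afterwards:

hadoop fs -ls /user/chuck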

Step-2
Retrieving Files from HDFS
The Hadoop command get copies files from HDFS back to the local filesystem. To
retrieve example.txt into the current local directory, we can run the following command:
hadoop fs -get example.txt .
To simply display the contents of the file without copying it, use:
hadoop fs -cat example.txt

Step-3
Deleting Files from HDFS
hadoop fs -rm example.txt
The command for creating a directory in HDFS is hdfs dfs -mkdir /lendicse.
Adding a directory is done through the command hdfs dfs -put lendi_english /.

Step-4
Copying Data from NFS to HDFS
The command for copying from a local directory is
hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/

* View the file by using the command hdfs dfs -cat /lendi_english/glossary

* The command for listing items in Hadoop is hdfs dfs -ls hdfs://localhost:9000/.

* The command for deleting files is hdfs dfs -rm -r /kartheek.
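Putting these commands together, a short end-to-end session (reusing the paths from the examples above) might look like this:

hdfs dfs -mkdir /lendicse
hdfs dfs -copyFromLocal /home/lendi/Desktop/shakes/glossary /lendicse/
hdfs dfs -ls /lendicse
hdfs dfs -cat /lendicse/glossary
hdfs dfs -rm -r /lendicse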

___________________________________________________________________

III Run a basic Word Count Map Reduce program to understand Map Reduce Paradigm.
a) Find the number of occurrences of each word appearing in the input file(s)
b) Performing a MapReduce Job for word search count (look for specific keywords in
a file)

MAPREDUCE PROGRAM
WordCount is a simple program which counts the number of occurrences of each word
in a given text input data set. WordCount fits very well with the MapReduce
programming model making it a great example to understand the Hadoop Map/Reduce
programming style. Our implementation consists of three main parts:
1. Mapper
2. Reducer
3. Driver
Step-1. Write a Mapper
A Mapper overrides the "map" function from the class
"org.apache.hadoop.mapreduce.Mapper", which provides <key, value> pairs as the
input. A Mapper implementation may output <key, value> pairs using the provided
Context. The input value of the WordCount Map task will be a line of text from the
input data file, and the key will be the line number: <line_number, line_of_text>.
The Map task outputs <word, one> for each word in the line of text.

Pseudo-code
void Map (key, value)
{
for each word x in value:
output.collect(x, 1);
}
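As a concrete Java sketch of this Mapper (a minimal version of the classic Hadoop WordCount mapper; the class and variable names are illustrative rather than taken from the lab's own source files):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: <line offset, line of text>; output: <word, 1> for every word in the line.
public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words and emit <word, 1> for each one.
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}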

Step-2. Write a Reducer


A Reducer collects the intermediate <key, value> output from multiple map
tasks and assembles a single result. Here, the WordCount program will sum up the
occurrences of each word and emit pairs of the form <word, occurrence>.

Pseudo-code
void Reduce (keyword, <list of value>)
{
sum = 0;
for each x in <list of value>:
sum += x;
final_output.collect(keyword, sum);
}
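A corresponding Java sketch of the Reducer (again a minimal version of the standard WordCount reducer; names are illustrative):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Input: <word, list of counts>; output: <word, total count>.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Add up all the counts emitted by the mappers for this word.
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}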
Step-3. Write Driver
The Driver program configures and runs the MapReduce job. We use the main program
to perform basic configurations such as:
* Job Name: name of this Job
* Executable (Jar) Class: the main executable class. Here, WordCount.
* Mapper Class: class which overrides the "map" function. Here, Map.
* Reducer Class: class which overrides the "reduce" function. Here, Reduce.
* Output Key: type of output key. Here, Text.
* Output Value: type of output value. Here, IntWritable.
* File Input Path
* File Output Path
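A minimal Driver sketch covering these configuration points (class names match the two sketches above and are illustrative; the input and output paths are taken from the command line):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");            // Job Name
        job.setJarByClass(WordCount.class);                       // Executable (Jar) Class
        job.setMapperClass(WordCountMapper.class);                // Mapper Class
        job.setReducerClass(WordCountReducer.class);              // Reducer Class
        job.setOutputKeyClass(Text.class);                        // Output Key type
        job.setOutputValueClass(IntWritable.class);               // Output Value type
        FileInputFormat.addInputPath(job, new Path(args[0]));     // File Input Path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // File Output Path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, the job could then be run against the lendi.txt file loaded earlier with a command along the lines of hadoop jar wordcount.jar WordCount /csedir /newdir (the jar name here is an assumption), after which hdfs dfs -cat /newdir/part-r-00000 displays the word counts.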
