Day 7

Path for Hadoop MapReduce

/usr/lib/hadoop-mapreduce

MR Framework
Initially, MapReduce jobs could only be written in Java.
Hadoop Streaming → Python, shell, C/C++, Ruby, etc.

Steps to execute a MapReduce job


hadoop jar <streaming jar path> -files <mapper.py>,<reducer.py> -mapper <mapper.py> -reducer <reducer.py> -input <hdfs path> -output <hdfs path>
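The same invocation can be assembled programmatically. This is a hypothetical helper, not an official Hadoop API — it only builds the argument list that mirrors the command above (all paths and URIs are caller-supplied placeholders):

```python
# Hypothetical helper that assembles the generic streaming invocation above.
# Nothing here is a Hadoop API; it only builds the argument list for the shell.
def streaming_command(jar, mapper, reducer, input_uri, output_uri):
    mapper_name = mapper.rsplit("/", 1)[-1]    # scripts run by basename on the nodes
    reducer_name = reducer.rsplit("/", 1)[-1]
    return [
        "hadoop", "jar", jar,
        "-files", f"{mapper},{reducer}",       # ship both scripts to the cluster
        "-mapper", mapper_name,
        "-reducer", reducer_name,
        "-input", input_uri,
        "-output", output_uri,
    ]
```

Passing the resulting list to `subprocess.run` would launch the job, assuming `hadoop` is on the PATH.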

Hadoop Streaming is a utility that comes with Hadoop and enables you to develop MapReduce executables in languages other than Java.
Streaming is implemented as a jar file, so you can run it from EMR or the command line just like a standard jar file.
Streaming lets you take advantage of the benefits of MapReduce while using any scripting language you like.

TO SEE WHERE THE JAR FILES ARE IN HADOOP


cd /usr/lib/hadoop-mapreduce
Map code
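The notes reference a `mapper.py` but do not include its source. As a sketch, a word-count mapper for Hadoop Streaming could look like this (the word-count task itself is an assumption, not from the notes; Streaming feeds raw input lines to the script's stdin and collects tab-separated key/value pairs from its stdout):

```python
#!/usr/bin/env python3
# Hypothetical mapper.py for Hadoop Streaming (word count assumed).
# Reads raw text lines from stdin and emits "word<TAB>1" pairs on stdout.
import sys

def map_line(line):
    """Yield (word, 1) pairs for one input line."""
    for word in line.strip().split():
        yield word.lower(), 1

if __name__ == "__main__":
    for line in sys.stdin:
        for word, count in map_line(line):
            print(f"{word}\t{count}")
```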
HOW TO RUN MAP REDUCE CODE USING HADOOP STREAMING LIBRARY
================================================
Make an mr-demo directory
ls
Copy the code files from the S3 bucket's code directory to EMR here
aws s3 cp <s3 code file uri of mapper> <destination> → AWS CLI used
aws s3 cp <s3 code file uri of reducer> <destination>
chmod +x <mapcode.py> → making the file executable
chmod +x <reducercode.py>
The input data is in a data.txt file in S3
Linux commands:
hadoop jar \
/usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-files /home/hadoop/mr-demo/mapper.py,/home/hadoop/mr-demo/reducer.py \
-mapper mapper.py \
-reducer reducer.py \
-input s3://vita-24-artefact/data/ \
-output s3://vita-24-artefact/output/output_class_demo

example
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -files /home/hadoop/mr-demo/mapper.py,/home/hadoop/mr-demo/reducer.py -mapper mapper.py -reducer reducer.py -input <data.txt s3 uri> -output <output folder s3 uri>
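The `reducer.py` referenced in the command is likewise not included in the notes. A sketch, again assuming the word-count example: Hadoop Streaming sorts the mapper output by key before it reaches the reducer, so all lines for one word arrive consecutively and can be summed with a single running counter:

```python
#!/usr/bin/env python3
# Hypothetical reducer.py for Hadoop Streaming (word count assumed).
# Streaming delivers mapper output sorted by key, so all "word<TAB>count"
# lines for one word arrive consecutively.
import sys

def reduce_sorted(lines):
    """Sum counts for consecutive lines sharing the same key."""
    current_word, current_count = None, 0
    for line in lines:
        word, _, count = line.strip().partition("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield current_word, current_count
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield current_word, current_count

if __name__ == "__main__":
    for word, total in reduce_sorted(sys.stdin):
        print(f"{word}\t{total}")
```

Locally, the whole pipeline can be rehearsed without a cluster as `cat data.txt | ./mapper.py | sort | ./reducer.py`, which mimics what Streaming does on EMR.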

Cloud Storage as HDFS


S3 can act like HDFS.
Pros: availability → once we put data in S3, it is AWS's job to protect it (durable to 99.999999999%, eleven nines).
Even when the EMR cluster is terminated, the data is safe on S3.
Cons: slow compared to HDFS → data from S3 travels over the network, so transfer is slower than HDFS, which is local to the EMR cluster.
S3 → AWS
Bucket → GCP
Blob → Azure

To put data from Linux to an S3 bucket


hdfs dfs -put <linux file path> <s3 folder uri path>

To get data from S3 to HDFS


hdfs dfs -cp <s3 uri> <hdfs path>
Hive with S3
In EMR

In Hive commands:
CREATE EXTERNAL TABLE
Student ( …… )
LOCATION 's3://vita/data/student';
SELECT * FROM Student;
