
060 Fileformats-Pig

This document discusses different file formats that can be used with Pig including compressed, sequence, and Avro files. It shows how to enable compression when storing Pig data, load and query compressed and sequence files in Pig, and how to store Pig data in sequence and Avro file formats. Key steps include setting compression options in Pig, loading and dumping compressed and sequence files, registering necessary jars, and using Elephant Bird and Avro Storage to write sequence and Avro files.

Uploaded by

Pradeep Saraswat


Hadoop In Real World *** http://www.hadoopinrealworld.com

File Formats - Pig
/**************Hadoop In Real World**************/

### ENABLE COMPRESSION ###

grunt> SET mapred.compress.map.output true;

grunt> SET mapred.output.compress true;
grunt> SET mapred.output.compression.codec org.apache.hadoop.io.compress.GzipCodec;

grunt> stocks = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as
(exchange:chararray, symbol:chararray, date:chararray, open:float, high:float,
low:float, close:float, volume:int, adj_close:float);

### STORE stocks AS COMPRESSED FILE ###

grunt> STORE stocks INTO 'output/pig/compressed' USING PigStorage(',');

### VERIFY FILE IS COMPRESSED ###

hadoop fs -ls output/pig/compressed

hadoop fs -copyToLocal output/pig/compressed/part-m-00000.gz

tail part-m-00000.gz
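
Running tail on a .gz file only shows binary noise. A more explicit local check could look like the sketch below, which uses the standard gzip tool. The file name sample-part.gz and its single row are made-up stand-ins for the part file copied from HDFS, not data from the course.

```shell
# Hypothetical stand-in for a part file copied from HDFS.
printf 'NYSE,AAA,2010-01-04,10.0\n' | gzip -c > sample-part.gz

# gzip -t tests integrity and exits non-zero if the file is not valid gzip.
if gzip -t sample-part.gz; then
  echo "sample-part.gz is a valid gzip file"
fi

# Decompress to stdout and show the first rows without unpacking on disk.
gzip -dc sample-part.gz | head -n 10
```

Substitute the real part-m-00000.gz for the sample file to run the same check on the Pig output.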

### LOAD A COMPRESSED FILE ###

grunt> stocks_comp = LOAD 'output/pig/compressed' USING PigStorage(',') as
(exchange:chararray, symbol:chararray, date:chararray, open:float, high:float,
low:float, close:float, volume:int, adj_close:float);

grunt> top10 = LIMIT stocks_comp 10;

grunt> DUMP top10;

### LOADING A SEQUENCE FILE ###

grunt> REGISTER /usr/lib/pig/piggybank.jar;

grunt> DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();

grunt> seq_dataset = LOAD '/user/hirw/output/fileformats/sequence-file/compressed-sequence-file' USING SequenceFileLoader AS (key:long, value:chararray);

grunt> split_value = FOREACH seq_dataset GENERATE FLATTEN(STRSPLIT(value, ',', 9));

grunt> sym_vol = FOREACH split_value GENERATE (chararray)$1 as symbol, (double)$7 as volume;

grunt> grp_sym = GROUP sym_vol BY symbol;
grunt> avg_vol = FOREACH grp_sym GENERATE group, AVG(sym_vol.volume);

grunt> top10 = LIMIT avg_vol 10;

grunt> DUMP top10;

### STORING A SEQUENCE FILE ###

REGISTER '/hirw-workshop/elephant-bird/lib/elephant-bird-pig-4.0.jar';
REGISTER '/hirw-workshop/elephant-bird/lib/elephant-bird-hadoop-compat-4.0.jar';
REGISTER '/hirw-workshop/elephant-bird/lib/elephant-bird-core-4.0.jar';

STORE avg_vol INTO 'output/pig/fileformats/sequence-file/pig-sequence' USING
com.twitter.elephantbird.pig.store.SequenceFileStorage (
'-c com.twitter.elephantbird.pig.util.TextConverter',
'-c com.twitter.elephantbird.pig.util.TextConverter'
);

### VERIFY FILE IS A SEQUENCE FILE ###

hadoop fs -ls output/pig/fileformats/sequence-file/pig-sequence

hadoop fs -copyToLocal output/pig/fileformats/sequence-file/pig-sequence/part-r-00000

vi part-r-00000

hadoop fs -text output/pig/fileformats/sequence-file/pig-sequence/part-r-00000
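
Besides opening the file in vi, the copied file can be checked for the SequenceFile header, which begins with the three ASCII bytes SEQ followed by a one-byte version number. The sketch below is only an illustration: sample-seq is a made-up file holding just that header, and is_sequence_file is a hypothetical helper name.

```shell
# Hypothetical stand-in: only the 4-byte header of a SequenceFile
# (magic "SEQ" followed by version byte 6).
printf 'SEQ\006' > sample-seq

# Succeeds when the file starts with the SequenceFile magic bytes.
is_sequence_file() {
  [ "$(head -c 3 "$1")" = "SEQ" ]
}

if is_sequence_file sample-seq; then
  echo "sample-seq looks like a SequenceFile"
fi
```

Run is_sequence_file part-r-00000 against the local copy to apply the same check to the Pig output.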

### STORE AN AVRO FILE ###

REGISTER /usr/lib/pig/piggybank.jar;
REGISTER /usr/lib/pig/lib/avro-*.jar;
REGISTER /usr/lib/pig/lib/jackson-core-asl-*.jar;
REGISTER /usr/lib/pig/lib/jackson-mapper-asl-*.jar;
REGISTER /usr/lib/pig/lib/json-simple-*.jar;

grunt> stocks = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as
(exchange:chararray, symbol:chararray, date:chararray, open:float, high:float,
low:float, close:float, volume:int, adj_close:float);

grunt> STORE stocks INTO 'output/pig/avro' USING
org.apache.pig.piggybank.storage.avro.AvroStorage(
'{
  "schema": {
    "namespace": "com.hirw.avro",
    "type": "record",
    "name": "Stock",
    "fields": [
      {"name": "exch", "type": "string"},
      {"name": "symbol", "type": ["string", "null"]},
      {"name": "ymd", "type": ["string", "null"]},
      {"name": "price_open", "type": "float"},
      {"name": "price_high", "type": ["float", "null"]},
      {"name": "price_low", "type": ["float", "null"]},
      {"name": "price_close", "type": "float"},
      {"name": "volume", "type": ["int", "null"]},
      {"name": "price_adj_close", "type": ["float", "null"]}
    ]
  }
}');

### VERIFY FILE IS AN AVRO FILE ###

hadoop fs -ls output/pig/avro
rm -f part-m-00000.avro
hadoop fs -copyToLocal output/pig/avro/part-m-00000.avro
vi part-m-00000.avro
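
An Avro container file opens with the four bytes O, b, j and 0x01, and its header also embeds the writer schema as JSON, which is why the field names are readable in vi. The sketch below shows a quick magic-byte check. sample.avro is a made-up file holding only those four bytes, and is_avro_file is a hypothetical helper name.

```shell
# Hypothetical stand-in: just the 4 magic bytes that open an Avro container file.
printf 'Obj\001' > sample.avro

# Succeeds when the first three bytes are the ASCII letters "Obj".
# The fourth byte (0x01, the format version) is not checked here.
is_avro_file() {
  [ "$(head -c 3 "$1")" = "Obj" ]
}

if is_avro_file sample.avro; then
  echo "sample.avro starts with the Avro magic bytes"
fi
```

Run is_avro_file part-m-00000.avro against the local copy to apply the same check to the Pig output.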

### LOAD AN AVRO FILE ###

grunt> stocks_avro = LOAD 'output/pig/avro'
USING org.apache.pig.piggybank.storage.avro.AvroStorage(
'no_schema_check',
'schema_file',
'avro/stocks.avro.schema');

grunt> top10 = LIMIT stocks_avro 10;

grunt> DUMP top10;

