Lecture 18
Operators (continued)
FOREACH …. GENERATE
• Applies expressions to each input record and generates one or more output records
• Syntax: FOREACH relation_name GENERATE expression [, expression …];
• Usage example:
• projection (select one or multiple fields from a relation)
X = FOREACH A GENERATE f1;
X = FOREACH A GENERATE a1, a2;
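Besides plain projection, GENERATE also accepts expressions and can name the resulting fields with AS. A minimal sketch, assuming a relation A with three integer fields (the same schema as in the ORDER BY example); the field names total and doubled are hypothetical:

```pig
-- Assumed input: 'data' holds three integer columns per record
A = LOAD 'data' AS (a1:int, a2:int, a3:int);
-- One output record per input record: a1 unchanged, plus two computed fields
Y = FOREACH A GENERATE a1, a2 + a3 AS total, a1 * 2 AS doubled;
-- Print the schema of Y to check the new field names
DESCRIBE Y;
```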
FOREACH …. GENERATE
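• If a field of the input relation is a tuple, bag, or map, we can project fields inside it with the dereference operator (.)
• here C is a grouped relation (it has a group field and a bag named A); the projection keeps only a1 and a2 of each tuple in the bag
X = FOREACH C GENERATE group, A.(a1, a2);
DUMP X;
(1,{(1,2)})
(4,{(4,3),(4,2)})
(8,{(8,4),(8,3)})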
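The slide does not show how C was built. One way to obtain a relation with a group field and a bag named A is to GROUP A; this is a sketch under that assumption, so its output will not necessarily match the DUMP above:

```pig
-- Assumption: A is loaded with the three-integer schema (a1, a2, a3)
A = LOAD 'data' AS (a1:int, a2:int, a3:int);
-- GROUP yields one tuple per a1 value: (group, A), where A is a bag of the matching tuples
C = GROUP A BY a1;
-- Project two fields from every tuple inside the bag
X = FOREACH C GENERATE group, A.(a1, a2);
-- Project a single field from the bag
Y = FOREACH C GENERATE group, A.a3;
DUMP X;
```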
FOREACH …. GENERATE
• Apply functions
• In the following example, we group C by word, then apply the COUNT function to each group
D = group C by word;
E = foreach D generate COUNT(C), group;
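The relation C is not defined on the slide. A common way to get one record per word is the classic word-count pattern sketched below; the input file name 'input.txt' and the field name line are hypothetical:

```pig
-- Hypothetical input: one line of text per record
lines = LOAD 'input.txt' AS (line:chararray);
-- TOKENIZE splits a line into a bag of words; FLATTEN turns that bag into one record per word
C = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Group identical words together, then count the tuples in each group
D = GROUP C BY word;
E = FOREACH D GENERATE COUNT(C), group;
DUMP E;
```

This is also where FOREACH produces more records than it reads: FLATTEN emits one output record per word of each input line.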
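• Nested example: inside a FOREACH block, operators such as FILTER and DISTINCT can be applied to each group before GENERATE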
A = LOAD 'data' AS (url:chararray,outlink:chararray);
DUMP A;
(www.ccc.com,www.hjk.com)
(www.ddd.com,www.xyz.org)
(www.aaa.com,www.cvn.org)
(www.www.com,www.kpt.net)
(www.www.com,www.xyz.org)
(www.ddd.com,www.xyz.org)
B = GROUP A BY url;
DUMP B;
(www.aaa.com,{(www.aaa.com,www.cvn.org)})
(www.ccc.com,{(www.ccc.com,www.hjk.com)})
(www.ddd.com,{(www.ddd.com,www.xyz.org),(www.ddd.com,www.xyz.org)})
(www.www.com,{(www.www.com,www.kpt.net),(www.www.com,www.xyz.org)})
X = FOREACH B {
    FA = FILTER A BY outlink == 'www.xyz.org';
    PA = FA.outlink;
    DA = DISTINCT PA;
    GENERATE group, COUNT(DA);
}
DUMP X;
(www.aaa.com,0)
(www.ccc.com,0)
(www.ddd.com,1)
(www.www.com,1)
Order By
• Sorts a relation on one or more fields (ASC is the default)
• Syntax: ORDER relation_name BY field_1 [ASC | DESC], field_2 [ASC | DESC], …
A = LOAD 'data' AS (a1:int,a2:int,a3:int);
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
(4,3,3)
(7,2,5)
(8,4,3)
X = ORDER A BY a3 DESC;
DUMP X;
(7,2,5)
(8,3,4)
(1,2,3)
(4,3,3)
(8,4,3)
(4,2,1)
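In the output above, (1,2,3), (4,3,3) and (8,4,3) all have a3 = 3, and ORDER does not promise any particular order among such ties. Adding a second sort key makes the order fully determined. A sketch, assuming 'data' holds exactly the six tuples that appear in the DUMP X output above:

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);
-- Primary key: a3, largest first; ties are broken by a1, smallest first
Y = ORDER A BY a3 DESC, a1 ASC;
DUMP Y;
-- (7,2,5)
-- (8,3,4)
-- (1,2,3)
-- (4,3,3)
-- (8,4,3)
-- (4,2,1)
```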
Limit
• Limits the number of tuples taken from a relation
• Syntax: LIMIT relation_name n;
• returns at most n tuples from the relation; unless the relation was sorted with ORDER immediately before, which tuples are returned is not guaranteed
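A short sketch of both uses, assuming the same three-integer relation A as in the ORDER BY example:

```pig
A = LOAD 'data' AS (a1:int, a2:int, a3:int);
-- LIMIT on its own: any 3 tuples of A; which ones is not guaranteed
X = LIMIT A 3;
-- ORDER followed by LIMIT: the 3 tuples with the largest a3 ("top 3")
S = ORDER A BY a3 DESC;
Y = LIMIT S 3;
DUMP Y;
```

With the tuples from the ORDER BY example, Y holds (7,2,5), (8,3,4) and one of the tuples with a3 = 3, since ties are not ordered.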
Built-in Functions
• Eval functions
• String functions
• Date-time functions
• Math functions
Eval functions
• Function names are case sensitive: write COUNT(), not count()
• AVG()
• COUNT()
• MIN()
• MAX()
• SUM()
• CONCAT()
• DIFF()
• SUBTRACT()
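Several of these aggregates are usually applied together after a GROUP. A minimal sketch; the file 'sales.txt' and its fields region and amount are hypothetical, and the function names are written in upper case because Pig function names are case sensitive:

```pig
-- Hypothetical input: comma-separated (region, amount) records
sales = LOAD 'sales.txt' USING PigStorage(',') AS (region:chararray, amount:int);
-- One group per region; each group carries a bag named sales
by_region = GROUP sales BY region;
-- COUNT works on the bag itself; MIN, MAX and SUM work on a projected column of the bag
stats = FOREACH by_region GENERATE group AS region,
                                   COUNT(sales) AS n,
                                   MIN(sales.amount) AS lowest,
                                   MAX(sales.amount) AS highest,
                                   SUM(sales.amount) AS total;
DUMP stats;
```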
AVG()
• Syntax: AVG(expression)
• To average over a whole relation, first put all tuples into a single group with the ALL keyword: GROUP relation_name ALL
• GROUP relation_name ALL produces a single tuple with two fields: the first, named group, holds the one value all; the second is a bag of all the tuples
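The GROUP … ALL pattern above can be sketched as follows, assuming the student_details file and schema used in the string-function examples:

```pig
student_details = LOAD 'student_details.txt' USING PigStorage(',') AS (id:int, name:chararray, gpa:int);
-- Put every tuple into one group; the result has fields (group, student_details)
all_students = GROUP student_details ALL;
-- AVG takes a bag, so project the gpa column out of the single bag
avg_gpa = FOREACH all_students GENERATE AVG(student_details.gpa);
DUMP avg_gpa;
```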
String functions
• ENDSWITH
• checks whether the first string ends with the second string
• STARTSWITH
• checks whether the first string starts with the second string
grunt> student_details = LOAD 'student_details.txt' USING PigStorage(',') AS (id:int, name:chararray, gpa:int);
grunt> student_ends_with = FOREACH student_details GENERATE (name, id), ENDSWITH(name, 'n');
grunt> student_starts_with = FOREACH student_details GENERATE (name, id), STARTSWITH(name, 'R');
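Because ENDSWITH and STARTSWITH return booleans, they can also be used directly as FILTER conditions. A sketch, assuming the same student_details schema; the relation names r_students and not_n are hypothetical:

```pig
student_details = LOAD 'student_details.txt' USING PigStorage(',') AS (id:int, name:chararray, gpa:int);
-- Keep only the students whose name starts with 'R'
r_students = FILTER student_details BY STARTSWITH(name, 'R');
-- Keep only the students whose name does not end with 'n'
not_n = FILTER student_details BY NOT ENDSWITH(name, 'n');
DUMP r_students;
```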
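SUBSTRING()
• Returns part (a substring) of a given string: SUBSTRING(string, startIndex, stopIndex), where indexes start at 0 and stopIndex is the position just after the last character taken
• the following example generates the first 3 letters of each name
grunt> student_substring = FOREACH student_details GENERATE (name, id), SUBSTRING(name, 0, 3);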
EqualsIgnoreCase()
• Compares two strings for equality, ignoring case
• returns true if they are equal
• false if not
grunt> student_equal = FOREACH student_details GENERATE (name, id), EqualsIgnoreCase(name, 'Ali');
UPPER, LOWER
• UPPER
• converts all letters of a string to upper case
• LOWER
• converts all letters of a string to lower case
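A sketch in the same style as the other string examples, assuming the student_details schema loaded earlier:

```pig
student_details = LOAD 'student_details.txt' USING PigStorage(',') AS (id:int, name:chararray, gpa:int);
-- Keep the (name, id) tuple and add upper-case and lower-case copies of the name
student_case = FOREACH student_details GENERATE (name, id), UPPER(name), LOWER(name);
DUMP student_case;
```

A name such as Robin would appear as ROBIN and robin in the last two fields.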
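REPLACE
• Replaces the parts of a string that match a pattern (a regular expression) with new characters
• the example below assumes student_details also has a city:chararray field, which the schema loaded above does not include
grunt> student_replace = FOREACH student_details GENERATE (name, id), REPLACE(city, 'Washington', 'WA');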
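Since the second argument of REPLACE is a regular expression, a single call can match a whole class of characters. A sketch, assuming a hypothetical file 'contacts.txt' with a free-text phone field:

```pig
-- Hypothetical input: comma-separated (name, phone) records
contacts = LOAD 'contacts.txt' USING PigStorage(',') AS (name:chararray, phone:chararray);
-- '[^0-9]' matches any character that is not a digit; replacing matches with '' keeps only the digits
digits_only = FOREACH contacts GENERATE name, REPLACE(phone, '[^0-9]', '');
DUMP digits_only;
```

With this pattern, a phone value such as 555-123 4567 would become 5551234567.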