
Exercises

Hadoop

Uploaded by Pramod Vr
Copyright © All Rights Reserved

Exercises

1) Wordcount
2) Distributed Grep
3) Distributed Sed
4) Return a maximum-length word
5) Number of lines in a file
6) Given a large collection of HTML files that contain links to other pages, write a MapReduce program that returns a data structure holding all unique URLs.
7) Implement an inverted index
8) Suppose the input consists of URLs, each with the number of times it was viewed. Find the average view count of each URL.
Example input:
A1.html 10
A2.html 20
A3.html 5
A1.html 20
A4.html 60
A1.html 30
Output:
A1.html 20
A2.html 20
A3.html 5
A4.html 60
9) Moving average

10) Count the number of words that start with each letter a, b, c, ..., z.
11) Given a stream of data, compute aggregates on an hourly basis.
12) [Distributed Cache][Exercise] Using a lookup file of stop words, filter out all stop words and give the count of the remaining (normal) words.
13) [DC] Using a lookup file that contains user names and phone numbers, and an input file that contains only user names, extract the phone numbers of all the users in the input.
14) Secondary Sort
15) Configurations
16) Example of Key-Value Input Format
17) Example of Sequence File Input Format
18) MRUnit test case
19) http://www.apache.org/dyn/closer.cgi
20) http://wiki.apache.org/hadoop/HowToContribute
http://www.slideshare.net/cwsteinbach/hive-quick-start-tutorial

https://cwiki.apache.org/Hive/languagemanual-udf.html
https://cwiki.apache.org/Hive/languagemanual-explain.html

http://www.antlr.org/wiki/display/ANTLR3/Interfacing+AST+with+Java
http://www.riccomini.name/Topics/DistributedComputing/Hadoop/SortByValue/
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

http://hbase.apache.org/book/standalone_dist.html#confirm
http://apache.techartifact.com/mirror/mrunit/mrunit-0.9.0-incubating/ and mockito-all-1.8.5: put these jars on the classpath
https://cwiki.apache.org/MRUNIT/mrunit-tutorial.html
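Exercise 1 (Wordcount) can be sketched in plain Java as below. This is an in-memory simulation of the map, shuffle and reduce phases, not Hadoop API code:
- `map` emits (word, 1) pairs.
- A `TreeMap` stands in for the sorted shuffle.
- `reduce` sums the ones.

Splitting on whitespace is an assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the Wordcount job (exercise 1); no Hadoop API is used.
class WordCountSketch {

    // Map phase: emit (word, 1) for every whitespace-separated token of one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all the 1s emitted for one word.
    static int reduce(List<Integer> ones) {
        int sum = 0;
        for (int v : ones) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group map output by key; the TreeMap keeps keys sorted, as Hadoop does before reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, ones) -> result.put(word, reduce(ones)));
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(run(List.of("to be or not to be", "to do")));
    }
}
```

Because summation is associative and commutative, the same reduce logic could also run as a combiner on the map side to cut shuffle traffic.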
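For exercise 4 (return a maximum-length word), one possible approach works in two steps:
- Each map call emits only the longest word of its line, all under a single shared key.
- The reducer then picks the overall winner.

The sketch below simulates that in plain Java without the Hadoop API. The tie rule (first candidate wins) and the names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of exercise 4 (return a max-length word); no Hadoop API is used.
class MaxLengthWordSketch {

    // Map phase: emit the longest word of one line under a single shared key,
    // so that every candidate reaches the same reducer.
    static String map(String line) {
        String best = "";
        for (String word : line.split("\\s+")) {
            if (word.length() > best.length()) {
                best = word;
            }
        }
        return best;
    }

    // Reduce phase: keep the longest candidate; on a tie the first one seen wins.
    static String reduce(List<String> candidates) {
        String best = "";
        for (String word : candidates) {
            if (word.length() > best.length()) {
                best = word;
            }
        }
        return best;
    }

    static String run(List<String> lines) {
        List<String> candidates = new ArrayList<>();
        for (String line : lines) {
            candidates.add(map(line));
        }
        return reduce(candidates);
    }

    public static void main(String[] args) {
        // prints distributed
        System.out.println(run(List.of("hadoop map reduce", "distributed cache")));
    }
}
```

Taking a maximum is associative, so the reduce logic can double as a combiner. Routing every candidate to one key does mean a single reducer makes the final comparison.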
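Exercise 6 (all unique URLs across many HTML files) maps naturally onto the key-deduplication that the shuffle already performs:
- The mapper emits each link target as a key.
- The reducer writes each key once.

In this plain-Java sketch, a `TreeSet` plays the reducer's role. Extracting links with an `href` regular expression is a simplifying assumption; a real job would more likely use an HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java simulation of exercise 6 (unique URLs); no Hadoop API is used.
class UniqueUrlsSketch {

    // Simplifying assumption: links appear as href="..." or href='...'.
    // A real job would use a proper HTML parser instead of a regular expression.
    private static final Pattern HREF =
        Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Map phase: emit every link target found in one HTML document as a key (value unused).
    static List<String> map(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    // Shuffle + reduce: duplicate keys meet at one reducer, which writes each URL once;
    // a TreeSet plays that role here.
    static Set<String> run(List<String> htmlFiles) {
        Set<String> unique = new TreeSet<>();
        for (String html : htmlFiles) {
            unique.addAll(map(html));
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(
            "<a href=\"a.html\">A</a> <a href='b.html'>B</a>",
            "<A HREF=\"a.html\">again</A>")));
    }
}
```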
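An inverted index (exercise 7) maps each word to the documents that contain it:
- The mapper emits (word, documentId).
- The reducer collects the distinct document ids for that word.

The plain-Java sketch below simulates this in memory rather than using the Hadoop API. The input shape (document id to text), the lowercasing, and the `\W+` word split are assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Plain-Java simulation of exercise 7 (inverted index); no Hadoop API is used.
class InvertedIndexSketch {

    // Input: document id -> document text.
    // Map phase: for every word in a document, emit (word, documentId).
    // Reduce phase: collect the distinct document ids per word into a sorted posting list.
    static Map<String, Set<String>> run(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            // Assumption: words are runs of letters, digits or '_', compared case-insensitively.
            for (String word : doc.getValue().toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    // Grouping by word plays the role of the shuffle; the TreeSet drops duplicate ids.
                    index.computeIfAbsent(word, k -> new TreeSet<>()).add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of(
            "doc1", "Hadoop runs MapReduce",
            "doc2", "Hive runs on Hadoop");
        // prints {hadoop=[doc1, doc2], hive=[doc2], mapreduce=[doc1], on=[doc2], runs=[doc1, doc2]}
        System.out.println(run(docs));
    }
}
```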
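Exercise 8 can be checked against its own example input with a small plain-Java simulation, which does not use the Hadoop API:
- The mapper parses each line into (url, count).
- The reducer averages the counts for one URL.

The "url count" one-record-per-line format is an assumption. The averages come out as doubles (20.0 rather than 20).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of exercise 8: average view count per URL. No Hadoop API is used.
class UrlAverageSketch {

    // Map phase: "A1.html 10" -> ("A1.html", 10). Assumes whitespace-separated "url count" lines.
    static Map.Entry<String, Long> map(String line) {
        String[] parts = line.trim().split("\\s+");
        return Map.entry(parts[0], Long.parseLong(parts[1]));
    }

    // Reduce phase: average all counts seen for one URL.
    static double reduce(List<Long> counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return (double) sum / counts.size();
    }

    static Map<String, Double> run(List<String> lines) {
        // Shuffle: group counts by URL.
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<String, Long> kv = map(line);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        Map<String, Double> result = new TreeMap<>();
        grouped.forEach((url, counts) -> result.put(url, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of(
            "A1.html 10", "A2.html 20", "A3.html 5",
            "A1.html 20", "A4.html 60", "A1.html 30");
        // A1 = (10 + 20 + 30) / 3 = 20, matching the expected output of exercise 8.
        run(input).forEach((url, avg) -> System.out.println(url + "\t" + avg));
    }
}
```

Unlike the Wordcount sum, an average cannot simply be reused as a combiner: the mean of partial means is wrong when the parts have different sizes. A combiner for this job would have to emit partial (sum, count) pairs instead.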
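Exercise 9 gives no specification, so this sketch assumes the most common reading: a simple moving average over a fixed window of values, computed separately for each named series.
- The input is assumed to be hypothetical "series,timestamp,value" lines.
- The mapper keys each point by its series.
- The reducer orders the points by timestamp and slides the window.

This is plain Java simulating the phases, not Hadoop API code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a moving average (exercise 9); no Hadoop API is used.
class MovingAverageSketch {

    // Hypothetical record type: one observation of a named series at a given timestamp.
    static final class Point {
        final String series;
        final long time;
        final double value;

        Point(String series, long time, double value) {
            this.series = series;
            this.time = time;
            this.value = value;
        }
    }

    // Map phase: parse a hypothetical "series,timestamp,value" line; the series name is the key.
    static Point map(String line) {
        String[] f = line.split(",");
        return new Point(f[0].trim(), Long.parseLong(f[1].trim()), Double.parseDouble(f[2].trim()));
    }

    // Reduce phase for one series: order by timestamp, then slide a fixed-size window.
    static List<Double> reduce(List<Point> points, int window) {
        List<Point> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingLong((Point p) -> p.time));
        Deque<Double> inWindow = new ArrayDeque<>();
        double sum = 0;
        List<Double> averages = new ArrayList<>();
        for (Point p : sorted) {
            inWindow.addLast(p.value);
            sum += p.value;
            if (inWindow.size() > window) {
                sum -= inWindow.removeFirst(); // drop the oldest value
            }
            if (inWindow.size() == window) {
                averages.add(sum / window);
            }
        }
        return averages;
    }

    static Map<String, List<Double>> run(List<String> lines, int window) {
        // Shuffle: group points by series.
        Map<String, List<Point>> grouped = new TreeMap<>();
        for (String line : lines) {
            Point p = map(line);
            grouped.computeIfAbsent(p.series, s -> new ArrayList<>()).add(p);
        }
        Map<String, List<Double>> result = new TreeMap<>();
        grouped.forEach((series, points) -> result.put(series, reduce(points, window)));
        return result;
    }

    public static void main(String[] args) {
        // prints {S=[1.5, 2.5, 3.5]}
        System.out.println(run(List.of("S,3,3", "S,1,1", "S,4,4", "S,2,2"), 2));
    }
}
```

Sorting inside the reducer holds a whole series in memory. The secondary sort technique of exercise 14 avoids this by having the framework deliver each series' values already in timestamp order.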
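For exercise 11, a MapReduce job reads bounded input, so this sketch assumes the stream has already been landed as files of events. The approach buckets events by hour:
- Each event is assumed to be a hypothetical "epochMillis,value" line.
- The mapper keys the event by the start of its UTC hour.
- The reducer sums the values in each hour.

Sum is chosen as the example aggregate. The code is plain Java, not Hadoop API code.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of exercise 11 (hourly aggregates); no Hadoop API is used.
class HourlyAggregateSketch {

    // Map phase: turn a hypothetical "epochMillis,value" line into (start of its UTC hour, value).
    static Map.Entry<Instant, Long> map(String line) {
        String[] f = line.split(",");
        Instant hour = Instant.ofEpochMilli(Long.parseLong(f[0].trim())).truncatedTo(ChronoUnit.HOURS);
        return Map.entry(hour, Long.parseLong(f[1].trim()));
    }

    // Shuffle + reduce: sum the values that share an hour key.
    // Summing is associative, so a combiner could pre-aggregate on the map side.
    static Map<Instant, Long> run(List<String> lines) {
        Map<Instant, Long> totals = new TreeMap<>();
        for (String line : lines) {
            Map.Entry<Instant, Long> kv = map(line);
            totals.merge(kv.getKey(), kv.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("0,5", "1800000,7", "3600000,1")));
    }
}
```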
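In exercise 12, the stop-word file is small enough to ship to every mapper. In a real Hadoop 2 job it would typically be registered with `Job.addCacheFile` and read once in the mapper's `setup` method.

The plain-Java sketch below only simulates that pattern: the constructor stands in for `setup`, and `run` does the filtered word count. One stop word per line in the lookup file is an assumption, and the class name is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation of exercise 12 (stop-word filter with a lookup file); no Hadoop API is used.
class StopWordFilterSketch {

    private final Set<String> stopWords = new HashSet<>();

    // Stands in for the mapper's setup step: load the lookup file once per task.
    // Assumption: the lookup file holds one stop word per line.
    StopWordFilterSketch(List<String> lookupFileLines) {
        for (String line : lookupFileLines) {
            stopWords.add(line.trim().toLowerCase());
        }
    }

    // Map phase emits (word, 1) only for non-stop words; the reduce phase sums per word.
    Map<String, Integer> run(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty() && !stopWords.contains(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        StopWordFilterSketch job = new StopWordFilterSketch(List.of("the", "a", "of"));
        // prints {and=1, cat=2, hat=1, mine=1}
        System.out.println(job.run(List.of("The cat and the hat", "a cat of mine")));
    }
}
```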
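Exercise 13 is a map-side join: the small user-to-phone lookup is held in memory by every mapper, so no reducer is needed.

This plain-Java sketch simulates the pattern without the Hadoop API, under three assumptions:
- The lookup file format is one "userName,phoneNumber" pair per line.
- Users missing from the lookup are skipped.
- The output is "user, tab, phone".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of exercise 13 (map-side join against a lookup file); no Hadoop API is used.
class PhoneLookupSketch {

    // Setup step: parse the lookup file. Assumption: one "userName,phoneNumber" pair per line.
    static Map<String, String> loadLookup(List<String> lookupFileLines) {
        Map<String, String> phoneByUser = new HashMap<>();
        for (String line : lookupFileLines) {
            String[] f = line.split(",", 2);
            if (f.length == 2) {
                phoneByUser.put(f[0].trim(), f[1].trim());
            }
        }
        return phoneByUser;
    }

    // Map-only phase: join each input user against the in-memory lookup.
    // Users missing from the lookup are skipped (an assumption; they could also be reported).
    static List<String> run(Map<String, String> phoneByUser, List<String> users) {
        List<String> out = new ArrayList<>();
        for (String user : users) {
            String phone = phoneByUser.get(user.trim());
            if (phone != null) {
                out.add(user.trim() + "\t" + phone);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = loadLookup(List.of("alice,555-0101", "bob,555-0102"));
        run(lookup, List.of("bob", "carol", "alice")).forEach(System.out::println);
    }
}
```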
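Secondary sort (exercise 14) moves the value into a composite key, so the framework sorts on it. In a Hadoop job this is usually wired up with `Job.setSortComparatorClass` (order by the full composite key), `Job.setGroupingComparatorClass` (group by the natural key only) and `Job.setPartitionerClass` (partition by the natural key).

The plain-Java sketch below only simulates the effect for one reducer. It uses a hypothetical "year,temperature" input with temperatures sorted in descending order; the class and field names are illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of secondary sort (exercise 14); no Hadoop API is used.
class SecondarySortSketch {

    // Hypothetical composite key: natural key (a year) plus the value to sort on (a temperature).
    static final class CompositeKey {
        final String year;
        final int temperature;

        CompositeKey(String year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }
    }

    // Sort comparator: year ascending, then temperature descending.
    static final Comparator<CompositeKey> SORT_ORDER =
        Comparator.comparing((CompositeKey k) -> k.year)
            .thenComparing(Comparator.comparingInt((CompositeKey k) -> k.temperature).reversed());

    // Map phase: parse a hypothetical "year,temperature" line into a composite key.
    static CompositeKey map(String line) {
        String[] f = line.split(",");
        return new CompositeKey(f[0].trim(), Integer.parseInt(f[1].trim()));
    }

    // Shuffle: sort by the full composite key, then group by year only (the grouping
    // comparator's job), so each reduce call receives its temperatures already ordered.
    static Map<String, List<Integer>> run(List<String> lines) {
        List<CompositeKey> keys = new ArrayList<>();
        for (String line : lines) {
            keys.add(map(line));
        }
        keys.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : keys) {
            groups.computeIfAbsent(k.year, y -> new ArrayList<>()).add(k.temperature);
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints {1949=[17, 5], 1950=[22, 10, -3]}
        System.out.println(run(List.of("1950,10", "1949,5", "1950,22", "1949,17", "1950,-3")));
    }
}
```

With more than one reducer, the partitioner on the natural key is what guarantees that all records for a given year reach the same reducer. The single in-memory grouping above hides that step.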
