Unit 5 2 Marks
UNIT V – FRAMEWORKS
Applications on Big Data using Pig and Hive, Data processing Operators in
Pig, Hive Services, HiveQL, Querying Data in Hive, Fundamentals of HBase
and ZooKeeper, IBM InfoSphere BigInsights and Streams, Visualizations,
Visual Data Analysis techniques, Interaction techniques, Systems and
Applications
1. Define PIG
Pig is a high-level data-flow platform for creating MapReduce programs that
run on Hadoop.
It is an Apache project.
It works like a compiler: just as a compiler translates a high-level language
such as Java into low-level code, Pig translates high-level scripts into
MapReduce jobs.
The language for Pig is Pig Latin.
Every task that can be achieved using Pig can also be achieved by writing
MapReduce programs in Java.
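To make the translation into MapReduce concrete, here is a minimal Python sketch (not Pig's actual compiler) of how a word-count data flow, which in Pig Latin would use FOREACH with FLATTEN(TOKENIZE(...)), GROUP and COUNT, maps onto a map phase, a shuffle that groups records by key, and a reduce phase. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def map_phase(lines: Iterable[str]) -> list[tuple[str, int]]:
    # Like FOREACH ... FLATTEN(TOKENIZE(line)): emit (word, 1) for every word.
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    # Like GROUP words BY word: collect all values that share a key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Like FOREACH grp GENERATE group, COUNT(words): aggregate each group.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["pig runs on hadoop", "hive runs on hadoop"]))
```

Pig generates and chains jobs like these automatically; hand-writing the equivalent Java MapReduce classes is possible but takes considerably more code.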
7. Define GRUNT
Grunt is Pig’s interactive shell. It lets users enter Pig Latin statements
interactively and also provides a shell for interacting with HDFS. It is a
command interpreter.
Pig’s data types can be divided into two categories: scalar types, which contain a
single value, and complex types, which contain other types.
Scalar Types
Pig’s scalar types are simple types that appear in most programming languages.
int
An integer, stored as a four-byte signed integer.
long
A long integer, stored as an eight-byte signed integer.
float
A floating-point number, stored in four bytes.
double
A double-precision floating-point number, stored in eight bytes.
chararray
A string or character array. Constants are written as string literals in
single quotes, e.g. 'hello'.
bytearray
A blob or array of bytes.
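The byte sizes listed above can be checked with Python's struct module, whose standard-size format codes use the same widths. This sketch only illustrates storage sizes and the resulting signed-integer ranges; it is not how Pig serializes data internally, and the mapping table is an assumption made for illustration.

```python
import struct

# struct format codes with '>' (standard sizes, no padding):
# 'i' = 4-byte signed int, 'q' = 8-byte signed int,
# 'f' = 4-byte float,      'd' = 8-byte double.
PIG_SCALARS = {"int": ">i", "long": ">q", "float": ">f", "double": ">d"}


def byte_size(pig_type: str) -> int:
    return struct.calcsize(PIG_SCALARS[pig_type])


def signed_range(num_bytes: int) -> tuple[int, int]:
    # An n-byte signed integer holds values in [-2^(8n-1), 2^(8n-1) - 1].
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    for name in PIG_SCALARS:
        print(name, byte_size(name), "bytes")
    print("int range:", signed_range(byte_size("int")))
    print("long range:", signed_range(byte_size("long")))
```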
Complex Types
Pig has several complex data types such as maps, tuples, and bags. All of these
types can contain data of any type, including other complex types. So it is possible
to have a map where the value field is a bag, which contains a tuple where one of
the fields is a map.
Map
A map in Pig is a chararray to data element mapping, where that element can be
any Pig type, including a complex type. The chararray is called a key and is used
as an index to find the element, referred to as the value.
Tuple
A tuple is a fixed-length, ordered collection of Pig data elements. Tuples are
divided into fields, with each field containing one data element. These elements
can be of any type—they do not all need to be the same type. A tuple is analogous
to a row in SQL, with the fields being SQL columns.
Bag
A bag is an unordered collection of tuples. Because it has no order, it is not
possible to reference tuples in a bag by position. Like tuples, a bag can, but is not
required to, have a schema associated with it. In the case of a bag, the schema
describes all tuples within the bag.
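A rough Python model of the complex types, assumed purely for illustration: a dict with str keys stands for a map, a Python tuple for a tuple, and a list for a bag. Because a bag is unordered, the hypothetical helper bags_equal compares two bags as multisets rather than by position. The sample record reproduces the nesting described earlier: a map whose value is a bag holding tuples, one field of which is itself a map.

```python
from typing import Any


def bags_equal(a: list, b: list) -> bool:
    # A bag is unordered: compare contents as multisets, never by position.
    remaining = list(b)
    for item in a:
        if item not in remaining:
            return False
        remaining.remove(item)
    return not remaining


# map (chararray keys) -> bag -> tuple -> map, mirroring the nesting above.
record: dict[str, Any] = {
    "visits": [                                 # bag of tuples
        ("alice", 3, {"browser": "firefox"}),   # tuple: (chararray, int, map)
        ("bob", 1, {"browser": "chrome"}),
    ]
}

if __name__ == "__main__":
    # Iterate over the bag instead of indexing it: a bag has no positions.
    for name, count, props in record["visits"]:
        # Tuple fields are positional, like columns in a SQL row;
        # the inner map is looked up by its chararray key.
        print(name, count, props["browser"])
    print(bags_equal(record["visits"], list(reversed(record["visits"]))))
```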
Nulls
Pig includes the concept of a data element being null. Data of any type can be
null. It is important to understand that in Pig the concept of null is the same as in
SQL, which is completely different from the concept of null in C, Java, Python, etc.
In Pig a null data element means the value is unknown.
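A minimal Python sketch of these SQL-style semantics, using None for null. It follows Pig's rules that arithmetic and comparison operators return null when either operand is null, and that FILTER keeps a record only when its condition is true, so a null condition drops the record. Note the contrast with plain Python, where None == None is True. The function names are hypothetical.

```python
from typing import Callable


def null_safe(op: Callable) -> Callable:
    # Arithmetic and comparison: if either operand is null, the result is null.
    def wrapped(a, b):
        if a is None or b is None:
            return None
        return op(a, b)
    return wrapped


add = null_safe(lambda a, b: a + b)
eq = null_safe(lambda a, b: a == b)
gt = null_safe(lambda a, b: a > b)


def pig_filter(rows, condition):
    # Keep a row only when the condition is true; null is treated as not true.
    return [row for row in rows if condition(row) is True]


if __name__ == "__main__":
    print(add(2, None))      # unknown + 2 is still unknown
    print(eq(None, None))    # two unknowns are not known to be equal
    ages = [("ann", 30), ("bob", None), ("cy", 17)]
    print(pig_filter(ages, lambda r: gt(r[1], 18)))
```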
Casts
A cast converts a value from one type to another. In Pig Latin it is written
by placing the target type in parentheses before the expression, e.g. (int)$0.
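The Python sketch below mimics casting a chararray field to int. Returning None (null) when the text is not a valid integer is an assumption made for this sketch, modeled on Pig's tendency to produce null with a warning rather than abort when a value cannot be converted; check the behaviour of your Pig version. The function name is hypothetical.

```python
from typing import Optional


def cast_to_int(value: Optional[str]) -> Optional[int]:
    # Sketch of (int)field applied to a chararray field.
    if value is None:
        return None          # casting null yields null
    try:
        return int(value)
    except ValueError:
        # Assumption: an unconvertible value becomes null instead of failing.
        return None


if __name__ == "__main__":
    print([cast_to_int(v) for v in ["42", "-3", "abc", "1.5", None]])
```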
9. Define HIVE
• Hive is a data warehouse system for Hadoop. It runs SQL-like queries
written in HQL (Hive Query Language), which are internally converted into
MapReduce jobs.
• Hive was developed by Facebook.
• Hive supports Data Definition Language (DDL), Data Manipulation
Language (DML), and user-defined functions.
• Buckets (or clusters): Data in each partition may in turn be divided into
buckets based on the value of a hash function of some column of the table.
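A sketch of how rows might be assigned to buckets (declared in HiveQL with CLUSTERED BY (col) INTO n BUCKETS): hash the bucketing column, clear the sign bit, and take the remainder modulo the number of buckets. The hash below is a Java-style 31-multiplier string hash, similar to what older Hive versions used for string columns; the exact hash function depends on the Hive version, so treat the bucket numbers as illustrative. The function names are hypothetical.

```python
def java_string_hash(s: str) -> int:
    # Java-style String.hashCode: h = 31*h + char code, wrapped to signed 32 bits.
    # Matches Java for ASCII/BMP characters (Java hashes UTF-16 code units).
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value: str, num_buckets: int) -> int:
    # Clear the sign bit (like hash & Integer.MAX_VALUE), then take the remainder.
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for user in ["alice", "bob", "carol", "dave"]:
        print(user, "-> bucket", bucket_for(user, 4))
```

Because the hash is deterministic, rows with the same column value always land in the same bucket, which is what lets Hive sample a table by bucket or join two tables bucketed on the same column bucket by bucket.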
DESCRIBE DATABASE
Shows the directory location of the database.
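DROP DATABASE
Removes a database. By default Hive refuses to drop a database that still
contains tables; adding the CASCADE keyword drops its tables first.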
ALTER DATABASE
You can set key-value pairs in the DBPROPERTIES associated with a database
using the ALTER DATABASE command. No other metadata about the database
can be changed, including its name and directory location.
Aspect | RDBMS | HBase
Schema | Fixed schema that describes the structure of every table | Schema-less: no fixed set of columns; only column families are defined
Table shape | Built for small (thin) tables | Built for wide tables
Data grouping | Table | Column family
Record | Record (row) | Record (row)
Data layout | Row-oriented | Column-oriented
Query language | SQL | get, put, and scan operations
Maximum data size | Terabytes (TBs) | Hundreds of petabytes (PBs)
Throughput | Thousands of reads/writes per second | Millions of queries per second
Transactions | Transactional | No multi-row transactions
Normalization | Normalized data | De-normalized data
Best suited for | Structured data | Semi-structured as well as structured data
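To illustrate the HBase column of this table, here is a toy in-memory Python model of its data layout (not the HBase client API). Each row key maps to column families, each family holds column-qualifier/value pairs that can be added freely at write time, and data is accessed through put, get, and a scan over row keys kept in sorted order. The class and method names are hypothetical.

```python
from bisect import insort
from collections import defaultdict
from typing import Iterator, Optional


class ToyHTable:
    def __init__(self, families: list[str]):
        # Only the column families are fixed up front; columns are not.
        self.families = set(families)
        self.rows: dict[str, dict[str, dict[str, str]]] = {}
        self.sorted_keys: list[str] = []

    def put(self, row: str, family: str, qualifier: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row not in self.rows:
            self.rows[row] = defaultdict(dict)
            insort(self.sorted_keys, row)   # rows are kept sorted by key
        self.rows[row][family][qualifier] = value

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        return self.rows.get(row, {}).get(family, {}).get(qualifier)

    def scan(self, start: str, stop: str) -> Iterator[tuple[str, dict]]:
        # Yield rows with start <= key < stop, in key order.
        for key in self.sorted_keys:
            if start <= key < stop:
                yield key, {f: dict(cols) for f, cols in self.rows[key].items()}


if __name__ == "__main__":
    t = ToyHTable(["info", "stats"])
    t.put("user#002", "info", "name", "bob")
    t.put("user#001", "info", "name", "alice")
    t.put("user#001", "stats", "logins", "7")   # new column, no schema change
    print(t.get("user#001", "info", "name"))
    for key, cells in t.scan("user#000", "user#999"):
        print(key, cells)
```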
Most business intelligence software vendors embed data visualization tools into
their products, either developing the visualization technology themselves or
sourcing it from companies that specialize in visualization.
Geometric Zoom
Fisheye Zoom
Flip Zooming
Semantic Zoom
Geometric, fisheye, and semantic zooming are the three basic types of zooming,
described below.
Geometric zooming lets the user specify a scale of magnification and then
increases or decreases the magnification of an image by that scale. This lets
the user focus on a specific area, and information outside that area is
generally discarded from view. Mapping software such as MapQuest or Yahoo Maps
is a good example.
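A minimal sketch of geometric zooming on 2-D points: scale every coordinate about a focus point by the chosen magnification, then discard whatever falls outside the viewport. The function name, the unit-square viewport, and the sample points are assumptions made for illustration.

```python
Point = tuple[float, float]


def geometric_zoom(points: list[Point], focus: Point, scale: float) -> list[Point]:
    # Magnify every point about the focus by the same scale factor.
    fx, fy = focus
    zoomed = [(fx + (x - fx) * scale, fy + (y - fy) * scale) for x, y in points]
    # Clip to the viewport [0, 1] x [0, 1]: points pushed outside are lost from view.
    return [(x, y) for x, y in zoomed if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0]


if __name__ == "__main__":
    pts = [(0.5, 0.5), (0.55, 0.5), (0.9, 0.9)]
    print(geometric_zoom(pts, focus=(0.5, 0.5), scale=4.0))
```

With a 4x zoom, the point at (0.9, 0.9) moves to (2.1, 2.1) and is dropped, which is exactly the loss of outside information that distinguishes geometric zooming from fisheye zooming.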
The fisheye zoom is similar to the geometric zoom with the exception that the
outside information is not lost from view; this information is merely distorted.
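To show distortion instead of clipping, this sketch uses the well-known graphical fisheye transformation of Sarkar and Brown, G(x) = (d + 1)x / (dx + 1), applied to a point's normalized distance x from the focus (0 at the focus, 1 at the display edge) with distortion factor d. The notes above do not prescribe a particular formula; this one is chosen as a standard example, and the function names are hypothetical.

```python
def fisheye(x: float, d: float) -> float:
    # Sarkar-Brown fisheye: G(x) = (d + 1) * x / (d * x + 1), for x in [0, 1].
    # d = 0 leaves positions unchanged; larger d magnifies more near the focus.
    return (d + 1) * x / (d * x + 1)


def fisheye_1d(positions: list[float], focus: float, d: float) -> list[float]:
    # Distort positions in [0, 1] around the focus without pushing any out.
    out = []
    for p in positions:
        edge = 1.0 if p >= focus else 0.0       # display boundary on p's side
        span = abs(edge - focus)
        if span == 0:
            out.append(p)
            continue
        x = abs(p - focus) / span               # normalized distance from focus
        direction = 1.0 if p >= focus else -1.0
        out.append(focus + direction * fisheye(x, d) * span)
    return out


if __name__ == "__main__":
    print(fisheye_1d([0.0, 0.4, 0.5, 0.6, 1.0], focus=0.5, d=3))
```

Points near the focus spread apart (0.6 moves to about 0.75) while points near the edges are compressed, yet every point stays on the display.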
Semantic zooming approaches the process from a different angle: it changes the
shape or context in which the information is presented. An example of this
technique is a digital clock within an application.
In a normal view, the clock may show the hour of the day and the date. If the
user zooms in, the clock may alter its appearance by adding the minutes and
seconds. If the user zooms out, information is discarded and only the date
remains. The actual information did not change, only the presentation method.
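The clock example can be sketched in Python: one timestamp, three presentations chosen by zoom level. The zoom-level names and format strings are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical zoom levels for the clock example: same data, different detail.
FORMATS = {
    "out": "%Y-%m-%d",                 # zoomed out: only the date remains
    "normal": "%Y-%m-%d, hour %H",     # normal view: date and hour of the day
    "in": "%Y-%m-%d %H:%M:%S",         # zoomed in: minutes and seconds added
}


def render_clock(moment: datetime, zoom: str) -> str:
    # The underlying timestamp never changes; only its presentation does.
    return moment.strftime(FORMATS[zoom])


if __name__ == "__main__":
    now = datetime(2024, 3, 5, 14, 7, 9)
    for level in ("out", "normal", "in"):
        print(level, "->", render_clock(now, level))
```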
Magic Lens filters are a newer user interface tool that combines an
arbitrarily shaped region with an operator that changes the view of objects
seen through that region.
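A minimal sketch of a Magic Lens filter: the lens pairs a region (here any point-in-region test, so any shape works) with an operator, and the operator is applied only to objects that fall inside the region. The Item fields, the circular lens, and the "show detail" operator are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Item:
    x: float
    y: float
    label: str
    detail: str


Region = Callable[[float, float], bool]   # any shape: a point-in-region test
Operator = Callable[[Item], Item]         # how the view changes inside the lens


def apply_lens(items: list[Item], region: Region, op: Operator) -> list[Item]:
    # Only items inside the lens are transformed; the rest render unchanged.
    return [op(it) if region(it.x, it.y) else it for it in items]


def circle(cx: float, cy: float, r: float) -> Region:
    return lambda x, y: (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2


def show_detail(it: Item) -> Item:
    # Example operator: reveal extra information through the lens.
    return replace(it, label=f"{it.label} ({it.detail})")


if __name__ == "__main__":
    items = [Item(1, 1, "A", "pop 500"), Item(5, 5, "B", "pop 90")]
    for it in apply_lens(items, circle(1, 1, 2), show_detail):
        print(it.label)
```

Because the operator returns new objects instead of modifying the originals, the underlying data is untouched; moving or removing the lens simply restores the normal view.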