
MC7011 BIG DATA ANALYTICS

UNIT V – FRAMEWORKS

Applications on Big Data using Pig and Hive, Data processing Operators in
Pig, Hive Services, Hive QL, Querying Data in Hive, Fundamentals of HBase
and ZooKeeper, IBM InfoSphere BigInsights and Streams, Visualizations,
Visual Data Analysis techniques, Interaction techniques, Systems and
Applications

1. Define PIG
Pig is a high-level data flow platform for creating MapReduce programs on
Hadoop.
It is provided by Apache.
It can be thought of as a compiler: just as a compiler takes a high-level
language like Java and converts it into assembly, Pig takes Pig Latin scripts
and converts them into MapReduce jobs.
The language for Pig is Pig Latin.
Every task that can be achieved using Pig can also be achieved using Java
MapReduce directly.
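
A minimal Pig Latin sketch of this kind of data flow (the file path and field
names are hypothetical):

-- load a hypothetical tab-separated log file from HDFS
logs = LOAD '/data/access_log' AS (user:chararray, url:chararray, time:long);
-- keep only hits on the home page
home = FILTER logs BY url == '/index.html';
-- count hits per user
grouped = GROUP home BY user;
counts = FOREACH grouped GENERATE group AS user, COUNT(home) AS hits;
STORE counts INTO '/data/home_hits';

Pig compiles this script into one or more MapReduce jobs; the author never
writes map or reduce functions directly.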

2. Mention the two modes for PIG execution


The Pig execution environment has two modes:
Local mode: All scripts are run on a single machine. Hadoop MapReduce
and HDFS are not required.
Hadoop: Also called MapReduce mode, all scripts are run on a given
Hadoop cluster.
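
The mode is selected with the -x flag when launching Pig:

pig -x local          (local mode: single JVM, local filesystem)
pig -x mapreduce      (MapReduce mode, the default: runs on the Hadoop cluster)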

3. List the 3 ways by which PIG programs can be run


Pig Latin Script: Simply a file containing Pig Latin commands, identified by
the .pig suffix (for example, file.pig or myscript.pig).
Grunt shell: Grunt is a command interpreter. You can type Pig Latin on the
grunt command line and Grunt will execute the command on your behalf.
Embedded: Pig programs can be executed as part of a Java program.
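
Each of the three ways, sketched (the script and file names are hypothetical):

pig myscript.pig          (run a Pig Latin script file)
pig                       (no arguments: opens the Grunt shell)

// embedded in Java, using Pig's PigServer class (org.apache.pig)
PigServer pig = new PigServer(ExecType.LOCAL);
pig.registerQuery("logs = LOAD '/data/access_log' AS (user:chararray);");
pig.store("logs", "/data/out");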

4. Mention the Features of PIG


It is a large-scale data processing system
Scripts are written in Pig Latin, a dataflow language
Developed by Yahoo and open source
Pig runs on Hadoop. It makes use of both the Hadoop Distributed File
System, HDFS, and Hadoop’s processing system, MapReduce.
5. Differentiate PIG and Map reduce
PIG                                        MAP REDUCE
PIG is a data flow language; the key       MapReduce is a programming model,
focus of Pig is managing the flow of       or framework, for processing large
data from input source to output           data sets in a distributed manner,
store.                                     using a large number of computers,
                                           i.e. nodes.
Pig is written specifically for
managing the data flow of MapReduce-
type jobs; PIG commands are submitted
as MapReduce jobs internally.

It is more concise: the 200 lines of       A 200-line Java program is written
Java code are reduced to about 10          for MapReduce.
lines in PIG.

It is a bit slower compared to             No translation required.
MapReduce, since PIG commands are
translated into MapReduce prior to
execution.

6. Mention the various operations supported by PIG


Loading and storing of data
Streaming data
Filtering data
Grouping and joining data
Sorting data
Combining and splitting data
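
A sketch touching several of these operators (relation, field, and file names
are hypothetical):

users  = LOAD '/data/users'  AS (id:int, name:chararray);        -- loading
orders = LOAD '/data/orders' AS (uid:int, amount:double);
big    = FILTER orders BY amount > 100.0;                        -- filtering
joined = JOIN users BY id, big BY uid;                           -- joining
sorted = ORDER joined BY amount DESC;                            -- sorting
SPLIT orders INTO small IF amount < 10.0, rest OTHERWISE;        -- splitting
STORE sorted INTO '/data/top_orders';                            -- storing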

7. Define GRUNT
Grunt is Pig’s interactive shell. It enables users to enter Pig Latin interactively and
provides a shell for users to interact with HDFS. It is a command interpreter.
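
A short Grunt session (paths hypothetical):

grunt> fs -ls /data
grunt> logs = LOAD '/data/access_log' AS (user:chararray);
grunt> DUMP logs;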

8. Mention the various data types supported by PIG

Pig’s data types can be divided into two categories: scalar types, which contain a
single value, and complex types, which contain other types.

Scalar Types
Pig’s scalar types are simple types that appear in most programming languages.
int
An integer, stored as a four-byte signed integer.
long
A long integer, stored as an eight-byte signed integer.
float
A floating-point number, stored in four bytes.
double
A double-precision floating-point number, stored in eight bytes.
chararray
A string or character array, expressed as a string literal with single quotes.
bytearray
A blob or array of bytes.

Complex Types

Pig has several complex data types such as maps, tuples, and bags. All of these
types can contain data of any type, including other complex types. So it is possible
to have a map where the value field is a bag, which contains a tuple where one of
the fields is a map.
Map
A map in Pig is a chararray to data element mapping, where that element can be
any Pig type, including a complex type. The chararray is called a key and is used
as an index to find the element, referred to as the value.
Tuple
A tuple is a fixed-length, ordered collection of Pig data elements. Tuples are
divided into fields, with each field containing one data element. These elements
can be of any type—they do not all need to be the same type. A tuple is analogous
to a row in SQL, with the fields being SQL columns.
Bag
A bag is an unordered collection of tuples. Because it has no order, it is not
possible to reference tuples in a bag by position. Like tuples, a bag can, but is not
required to, have a schema associated with it. In the case of a bag, the schema
describes all tuples within the bag.
Nulls
Pig includes the concept of a data element being null. Data of any type can be
null. It is important to understand that in Pig the concept of null is the same as in
SQL, which is completely different from the concept of null in C, Java, Python, etc.
In Pig a null data element means the value is unknown.
Casts
A cast converts a value of one type to another type.

Type        Description                   Example
int         Signed 32-bit integer         2
long        Signed 64-bit integer         15L or 15l
float       32-bit floating point         2.5f or 2.5F
double      64-bit floating point         1.5 or 1.5e2 or 1.5E2
chararray   Character array (string)      'hello'
bytearray   BLOB (byte array)
tuple       Ordered set of fields         (12,43)
bag         Collection of tuples          {(12,43),(54,28)}
map         Set of key#value pairs        [open#apache]
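
A sketch of how the complex types appear in a LOAD schema (file and field
names are hypothetical):

-- each record carries a tuple, a bag, and a map alongside a scalar
data = LOAD '/data/records' AS (
    name:chararray,
    location:tuple(x:int, y:int),
    scores:bag{t:tuple(score:int)},
    props:map[]
);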

9. Define HIVE

• Hive is a data warehouse system for Hadoop. It runs SQL-like queries
called HQL (Hive Query Language) which get internally converted to
MapReduce jobs.
• Hive was developed by Facebook.
• Hive supports Data Definition Language (DDL), Data Manipulation
Language (DML) and user-defined functions.
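
A minimal HiveQL sketch (table and column names are hypothetical):

hive> CREATE TABLE employees (id INT, name STRING, salary DOUBLE);
hive> SELECT name, salary FROM employees WHERE salary > 50000;

Behind the scenes, the SELECT is compiled into a MapReduce job that scans
the table’s files in HDFS.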

10. List the various HIVE services


• cli - The command-line interface to Hive (the shell). This is the default
service; it can be run using the hive command.
• hiveserver2 - Runs Hive as a server exposing a Thrift service, enabling
access from a range of clients written in different languages.
• beeline - A command-line interface to Hive that works in embedded mode,
or connects to a HiveServer2 process over JDBC.
• hwi - The Hive Web Interface
• jar - The Hive equivalent of hadoop jar, a convenient way to run Java
applications

11. Mention the various clients connected to Hive server


• Thrift Client
The Hive server is exposed as a Thrift service, so it’s possible to interact
with it using any programming language that supports Thrift.
• JDBC driver
• ODBC driver
An ODBC driver allows applications that support the ODBC protocol (such
as business intelligence software) to connect to Hive.
• The Metastore
The metastore is the central repository of Hive metadata

12. List the advantages of Hive


• Fits the low-level interface requirements of Hadoop perfectly
• Hive supports external tables and ODBC/JDBC
• Has an intelligent query optimizer
• Hive supports table-level partitioning to speed up query times
• The metadata store is a big plus in the architecture; it makes lookups easy

13. List the Data Units of Hive


Hive data is organized into:
• Databases: Namespaces that separate tables and other data units to avoid
naming conflicts.
• Tables: Homogeneous units of data which have the same schema.
• Partitions: Each table can have one or more partition keys which
determine how the data is stored. Partitions, apart from being storage
units, also allow the user to efficiently identify the rows that satisfy
certain criteria.
• Partition columns are virtual columns; they are not part of the data itself
but are derived on load.
• Buckets (or Clusters): Data in each partition may in turn be divided into
buckets based on the value of a hash function of some column of the table.
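
A sketch showing partitions and buckets in a table definition (table and
column names are hypothetical):

hive> CREATE TABLE page_views (user_id INT, url STRING)
    > PARTITIONED BY (view_date STRING)
    > CLUSTERED BY (user_id) INTO 32 BUCKETS;

Each distinct view_date gets its own directory in HDFS, and within a partition
rows are hashed on user_id into 32 bucket files.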

14. Mention the various Hive data Types


Hive supports two categories of data types:
1. Primitive data types
2. Collection data types

Primitive Data Types
TINYINT, SMALLINT, INT, BIGINT (integer types of increasing width),
BOOLEAN, FLOAT, DOUBLE, STRING, TIMESTAMP and BINARY.

Collection Data Types
ARRAY - an ordered collection of elements of the same type
MAP - a collection of key-value pairs
STRUCT - a record with named fields of possibly different types
UNIONTYPE - a value that may hold exactly one of several declared types
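
A sketch of a table definition using the collection types (table and column
names are hypothetical):

hive> CREATE TABLE employees (
    >   name STRING,
    >   subordinates ARRAY<STRING>,
    >   deductions MAP<STRING, FLOAT>,
    >   address STRUCT<street:STRING, city:STRING, zip:INT>
    > );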

15. Differentiate PIG vs Hive


Pig                                         Hive
Procedural data flow language               Declarative SQLish language
For programming                             For creating reports
Mainly used by researchers and              Mainly used by data analysts
programmers
Operates on the client side of a            Operates on the server side of a
cluster.                                    cluster.
Does not have a dedicated metadata          Makes use of an exact variation of a
database.                                   dedicated SQL DDL language by
                                            defining tables beforehand.
Pig is SQL-like but varies to a             Directly leverages SQL and is easy
great extent.                               to learn for database experts.
Pig supports the Avro file format.          Hive does not support it.
Developed by Yahoo                          Developed by Facebook
Language used is Pig Latin                  Language used is HiveQL

16. Define Hive QL


HiveQL is the Hive query language. It supports SQL features like CREATE
tables, DROP tables, SELECT ... FROM ... WHERE clauses, joins (inner, left
outer, right outer and full outer joins), Cartesian products, GROUP BY,
SORT BY, aggregations, UNION and many useful functions on primitive as well
as complex data types.

hive> CREATE DATABASE IF NOT EXISTS financials;

hive> SHOW DATABASES;

hive> CREATE DATABASE human_resources;

hive> SHOW DATABASES;

DESCRIBE database
shows the directory location for the database.

hive> DESCRIBE DATABASE financials;


USE database

The USE command sets a database as your working database, analogous to
changing working directories in a filesystem.

hive> USE financials;

DROP database

hive> DROP DATABASE IF EXISTS financials;


The IF EXISTS is optional and suppresses warnings if financials doesn’t exist.

Alter Database
You can set key-value pairs in the DBPROPERTIES associated with a database
using the ALTER DATABASE command. No other metadata about the database
can be changed, including its name and directory location:

hive> ALTER DATABASE financials SET DBPROPERTIES ('edited-by' = 'active steps');
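
Querying data, sketched with a join and GROUP BY (table and column names
are hypothetical):

hive> SELECT d.name, COUNT(*)
    > FROM employees e JOIN departments d ON (e.dept_id = d.id)
    > GROUP BY d.name;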

17. Define HBase


HBase is an open-source, sorted map datastore built on top of Hadoop. It is
column-oriented and horizontally scalable.

18. Explain the need for HBase


• An RDBMS gets exponentially slower as the data becomes large
• It expects data to be highly structured, i.e. to fit in a well-defined
schema
• Any change in schema might require downtime
• For sparse (thin) datasets, there is too much overhead in maintaining
NULL values

19. List the Features of Hbase


• Horizontally scalable: new machines can be added to the cluster as data
grows, and new columns can be added to a column family at any time.
• Automatic failover: automatic failover is a resource that allows a system
administrator to automatically switch data handling to a standby system in
the event of system compromise.
• Integration with the MapReduce framework: all the commands and Java code
internally implement MapReduce to do the task, and HBase is built over the
Hadoop Distributed File System.
• It is a sparse, distributed, persistent, multidimensional sorted map,
which is indexed by row key, column key, and timestamp.
• Often referred to as a key-value store or column-family-oriented database,
or as storing versioned maps of maps.
• Fundamentally, it is a platform for storing and retrieving data with
random access.
• It does not care about datatypes (you can store an integer in one row and
a string in another for the same column).
• It does not enforce relationships within your data.
• It is designed to run on a cluster of computers built using commodity
hardware.
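
The random-access style, sketched in the HBase shell (table name, column
family, and values are hypothetical):

hbase> create 'users', 'info'
hbase> put 'users', 'row1', 'info:name', 'alice'
hbase> put 'users', 'row1', 'info:age', '34'
hbase> get 'users', 'row1'
hbase> scan 'users'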

20. Differentiate RDBMS vs Hbase

RDBMS                                    HBASE
Has a fixed schema, defined by the       Schema-less: no concept of a fixed-
database/schema                          column schema; defines only column
                                         families
Built for small tables                   Built for wide tables
Table in an RDBMS                        Column family in HBase
Record in an RDBMS                       Record in HBase
Data layout is row-oriented              Data layout is column-oriented
SQL is the query language used           get/put/scan commands are used
Maximum data size is TBs                 Hundreds of PBs
1000s of queries/second can be           Millions of queries per second
read and written
RDBMS is transactional.                  No transactions in HBase.
It has normalized data.                  It has de-normalized data.
It is good for structured data.          It is good for semi-structured as
                                         well as structured data.

21. List the Applications of Hbase

• It is used whenever there is a need for write-heavy applications.
• HBase is used whenever we need to provide fast random access to available
data.
• Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase
internally.

22. List the Components of Hbase


HBase has three major components:
• the client library,
• a master server,
• region servers.
• Region servers can be added or removed as per requirement.

23. Define Zoo Keeper


Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate
among themselves and maintain shared data with robust synchronization
techniques.

24. List the Benefits of Zoo Keeper


• Simple distributed coordination process
• Synchronization − mutual exclusion and co-operation between server
processes. This process helps in Apache HBase for configuration
management.
• Ordered messages
• Serialization − encode the data according to specific rules and ensure
your application runs consistently. This approach can be used in MapReduce
to coordinate queues for executing running threads.
• Reliability
• Atomicity − a data transfer either succeeds or fails completely; no
transaction is partial
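
A sketch of shared configuration data in the ZooKeeper command-line client
(the znode path and values are hypothetical):

zkCli.sh -server localhost:2181
create /app/config "v1"       (create a znode holding shared data)
get /app/config               (any node in the cluster can read it)
set /app/config "v2"          (ordered, atomic update visible to all clients)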

25. Define Data Visualization


Data visualization is a general term that describes any effort to help people
understand the significance of data by placing it in a visual context.
Patterns, trends, and correlations that might go undetected in text-based data
can be exposed and recognized more easily with data visualization software.

Most business intelligence software vendors embed data visualization tools
into their products, either developing the visualization technology themselves
or sourcing it from companies that specialize in visualization.

26. List the various Interaction Techniques used in information visualization


A multiple-view system uses two or more distinct views to support the
investigation of a single conceptual entity.
Fish-eye lenses magnify the center of the field of view, with a continuous
fall-off in magnification toward the edges. Degree-of-interest values
determine the level of detail to be displayed for each item and are assigned
through user interaction.
Dynamic queries continuously update the data that is filtered from the database
and visualized.
The details-on-demand technique allows interactively selecting parts of the
data to be visualized in more detail while providing an overview of the whole
informational concept.
Filtering is one of the basic interaction techniques often used in information
visualization; it limits the amount of displayed information through filter
criteria.
The idea of linking and brushing is to combine different visualization methods to
overcome the shortcomings of single techniques. Interactive changes made in one
visualization are automatically reflected in the other visualizations.

Zooming is one of the basic interaction techniques of information
visualization. Since the maximum amount of information can be limited by the
resolution and color depth of a display, zooming is a crucial technique to
overcome this limitation. Zooming techniques include:
Geometric Zoom
Fisheye Zoom
Flip Zooming
Semantic Zoom
Three of these are described below.
Geometric zooming allows the user to specify the scale of magnification,
increasing or decreasing the magnification of an image by that scale. This
allows the user to focus on a specific area; information outside of this area
is generally discarded. A great example is mapping software like MapQuest or
Yahoo Maps.
The fisheye zoom is similar to the geometric zoom with the exception that the
outside information is not lost from view; this information is merely distorted.
Semantic zooming approaches the process from a different angle. Semantic
zooming changes the shape or context in which the information is being presented.
An example of this type of technique is the use of a digital clock within an
application.

In a normal view, the clock may show the hour of the day and the date. If the
user zooms in, the clock may alter its appearance by adding the minutes and
seconds. If the user zooms out, information is discarded, with only the date
remaining. The actual information did not change, only the presentation
method.
Magic Lens filters are a user interface tool that combines an arbitrarily
shaped region with an operator that changes the view of objects viewed through
that region.

Brushing is the process of interactively selecting data items from a visual
representation. The original intention of brushing is to highlight brushed
data items in different views of a visualization.

Prepared by : Mrs.M.Nirmala / AP / MCA