Hadoop MCQs
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “History of
Hadoop”.
1. IBM and ________ have announced a major initiative to use Hadoop to support university
courses in distributed computer programming.
a) Google Latitude
c) Google Variations
d) Google
View Answer
Answer: d
a) Hadoop is an ideal environment for extracting and transforming small volumes of data
c) The Giraph framework is less useful than a MapReduce job to solve graph and machine
learning
View Answer
Answer: b
Explanation: Data compression can be achieved using compression algorithms like bzip2,
gzip, LZO, etc. Different algorithms can be used in different scenarios based on their
capabilities.
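As a small illustration of the point above, output compression can be requested directly from the MapReduce job-configuration API. The helper class below is only a sketch; the property names shown (mapreduce.map.output.compress and mapreduce.map.output.compress.codec) are the ones used by recent Hadoop 2.x/3.x releases.

    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public final class CompressionConfig {
        // Compress the final output of a MapReduce job with gzip.
        public static void enableGzipOutput(Job job) {
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        }

        // Compress intermediate map output as well, which reduces shuffle traffic.
        public static void enableMapOutputCompression(Job job) {
            job.getConfiguration().setBoolean("mapreduce.map.output.compress", true);
            job.getConfiguration().setClass("mapreduce.map.output.compress.codec",
                    GzipCodec.class, CompressionCodec.class);
        }
    }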
3. What license is Hadoop distributed under?
c) Shareware
d) Commercial
View Answer
Answer: a
4. Sun also has the Hadoop Live CD ________ project, which allows running a fully functional
Hadoop cluster using a live CD.
a) OpenOffice.org
b) OpenSolaris
c) GNU
d) Linux
View Answer
Answer: b
Explanation: The OpenSolaris Hadoop LiveCD project built a bootable CD-ROM image.
b) JAX-RS
View Answer
Answer: a
Explanation: The Hadoop Distributed File System (HDFS) is designed to store very large data
sets reliably, and to stream those data sets at high bandwidth to the user.
b) Perl
View Answer
Answer: c
Explanation: The Hadoop framework itself is mostly written in the Java programming
language, with some native code in C and command-line utilities written as shell scripts.
a) Bare metal
b) Debian
c) Cross-platform
d) Unix-like
View Answer
Answer: c
8. Hadoop achieves reliability by replicating the data across multiple hosts and hence does
not require ________ storage on hosts.
a) RAID
c) ZFS
d) Operating system
View Answer
Answer: a
Explanation: With the default replication value, 3, data is stored on three nodes: two on the
same rack, and one on a different rack.
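A minimal sketch of how that replication factor is controlled from the HDFS Java API; the file path used here is hypothetical, and in practice the cluster-wide default is normally set in hdfs-site.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Cluster-wide default replication factor (normally set in hdfs-site.xml).
            conf.setInt("dfs.replication", 3);

            try (FileSystem fs = FileSystem.get(conf)) {
                // Override the replication factor for a single, hypothetical file.
                fs.setReplication(new Path("/data/example.txt"), (short) 3);
            }
        }
    }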
9. Above the file systems comes the ________ engine, which consists of one Job Tracker, to
which client applications submit MapReduce jobs.
a) MapReduce
b) Google
c) Functional programming
d) Facebook
View Answer
Answer: a
10. The Hadoop list includes the HBase database, the Apache Mahout ________ system, and
matrix operations.
a) Machine learning
b) Pattern recognition
c) Statistical classification
d) Artificial intelligence
View Answer
Answer: a
Explanation: The Apache Mahout project’s goal is to build a scalable machine learning tool.
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Hadoop
Ecosystem”.
1. ________ is a platform for constructing data flows for extract, transform, and load (ETL)
processing and analysis of large datasets.
a) Pig Latin
b) Oozie
c) Pig
d) Hive
View Answer
Answer: c
Explanation: Apache Pig is a platform for analyzing large data sets that consists of a high-
level language for expressing data analysis programs.
a) Hive is not a relational database, but a query engine that supports the parts of SQL
specific to querying data
View Answer
Answer: a
Explanation: Hive is a SQL-based data warehouse system for Hadoop that facilitates data
summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-
compatible file systems.
3. _________ hides the limitations of Java behind a powerful and concise Clojure API for
Cascading.
a) Scalding
b) HCatalog
c) Cascalog
View Answer
Answer: c
Explanation: Cascalog also adds Logic Programming concepts inspired by Datalog. Hence the
name “Cascalog” is a contraction of Cascading and Datalog.
a) C#
b) Java
c) C
d) C++
View Answer
Answer: b
Explanation: Hive also supports custom extensions written in Java, including user-defined
functions (UDFs) and serializer-deserializers for reading and optionally writing custom
formats.
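For illustration, a user-defined function of the kind this explanation describes can be as small as the sketch below; the class name is made up, and it uses the classic org.apache.hadoop.hive.ql.exec.UDF base class (newer Hive versions prefer GenericUDF). Once packaged in a jar, it would typically be registered with a CREATE TEMPORARY FUNCTION statement.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // A tiny Hive UDF that upper-cases a string column.
    public final class UpperCaseUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().toUpperCase());
        }
    }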
b) Amazon Web Service Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering
c) Scalding is a Scala API on top of Cascading that removes most Java boilerplate
View Answer
Answer: a
Explanation: Rather than building Hadoop deployments manually on EC2 (Elastic Compute
Cloud) clusters, users can spin up fully configured Hadoop installations using simple
invocation commands, either through the AWS Web Console or through command-line
tools.
a) Scalding
b) HCatalog
c) Cascalog
d) Cascading
View Answer
Answer: d
a) MapReduce
b) Drill
c) Oozie
Answer: a
Explanation: MapReduce provides a flexible and scalable foundation for analytics, from
traditional reporting to leading-edge machine learning algorithms.
8. The Pig Latin scripting language is not only a higher-level data flow language but also has
operators similar to ____________
a) SQL
b) JSON
c) XML
View Answer
Answer: a
Explanation: Pig Latin, in essence, is designed to fill the gap between the declarative style of
SQL and the low-level procedural style of MapReduce.
a) MapReduce
b) Drill
c) Oozie
d) Hive
View Answer
Answer: d
Explanation: Hive Queries are translated to MapReduce jobs to exploit the scalability of
MapReduce.
10. ______ is a framework for performing remote procedure calls and data serialization.
a) Drill
b) BigTop
c) Avro
d) Chukwa
View Answer
Answer: c
Explanation: In the context of Hadoop, Avro can be used to pass data from one program or
language to another.
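A brief sketch of what passing data between programs or languages looks like with Avro's generic Java API; the "User" schema below is invented for the example.

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public final class AvroRecordExample {
        public static void main(String[] args) {
            // An inline schema for a hypothetical "User" record.
            String schemaJson = "{\"type\":\"record\",\"name\":\"User\","
                    + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"},"
                    + "{\"name\":\"age\",\"type\":\"int\"}]}";
            Schema schema = new Schema.Parser().parse(schemaJson);

            // Build a record that any Avro-aware program or language can read back.
            GenericRecord user = new GenericData.Record(schema);
            user.put("name", "Alice");
            user.put("age", 30);
            System.out.println(user);
        }
    }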
This set of Multiple Choice Questions & Answers (MCQs) focuses on “Big-Data”.
1. As companies move past the experimental phase with Hadoop, many cite the need for
additional capabilities, including _______________
View Answer
Answer: d
Explanation: Adding security to Hadoop is challenging because not all of the interactions
follow the classic client-server pattern.
c) In the Hadoop programming framework output files are divided into lines or records
Answer: b
Explanation: Hadoop batch-processes data distributed over a number of computers, ranging
into the hundreds and thousands.
3. According to analysts, for what can traditional IT systems provide a foundation when
they’re integrated with big data technologies like Hadoop?
View Answer
Answer: a
Explanation: Data warehousing integrated with Hadoop would give a better understanding
of data.
4. Hadoop is a framework that works with a variety of related tools. Common cohorts
include ____________
View Answer
Answer: a
Explanation: To use Hive with HBase you’ll typically want to launch two clusters, one to run
HBase and the other to run Hive.
5. Point out the wrong statement.
a) Hadoop processing capabilities are huge and its real advantage lies in the ability to
process terabytes & petabytes of data
b) Hadoop uses a programming model called “MapReduce”, all the programs should
conform to this model in order to work on the Hadoop platform
c) The programming model, MapReduce, used by Hadoop is difficult to write and test
View Answer
Answer: c
Explanation: The programming model, MapReduce, used by Hadoop is simple to write and
test.
View Answer
Answer: c
Explanation: Doug Cutting, Hadoop creator, named the framework after his child’s stuffed
toy elephant.
a) Open-source
b) Real-time
c) Java-based
View Answer
Answer: b
a) MapReduce
b) Mahout
c) Oozie
View Answer
Answer: a
a) Apple
b) Datamatics
c) Facebook
View Answer
Answer: c
Explanation: Facebook has many Hadoop clusters; the largest among them is the one used
for data warehousing.
a) ‘Project Prism’
b) ‘Prism’
c) ‘Project Big’
d) ‘Project Data’
View Answer
Answer: a
Explanation: Prism automatically replicates and moves data wherever it’s needed across a
vast network of computing facilities.
This set of Multiple Choice Questions & Answers (MCQs) focuses on “Introduction to
MapReduce”.
1. A ________ node acts as the Slave and is responsible for executing a Task assigned to it by
the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
View Answer
Answer: c
Explanation: The TaskTracker receives the information necessary for the execution of a Task
from the JobTracker, executes the Task, and sends the results back to the JobTracker.
a) MapReduce tries to place the data and the compute as close as possible
View Answer
Answer: a
3. ___________ part of the MapReduce is responsible for processing one or more chunks
of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
View Answer
Answer: a
4. _________ function is responsible for consolidating the results produced by each of the
Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
View Answer
Answer: a
Explanation: The Reduce function collates the work and resolves the results.
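As a concrete (if standard) example of such a consolidating function, the word-count reducer below sums the per-word counts emitted by the map tasks; it is a sketch using the org.apache.hadoop.mapreduce API rather than anything specific to this question set.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Consolidates the per-word counts emitted by the Map() tasks into one total per word.
    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }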
a) A MapReduce job usually splits the input data-set into independent chunks which are
processed by the map tasks in a completely parallel manner
c) Applications typically implement the Mapper and Reducer interfaces to provide the map
and reduce methods
View Answer
Answer: d
Explanation: The MapReduce framework takes care of scheduling tasks, monitoring them
and re-executes the failed tasks.
a) Java
b) C
c) C#
View Answer
Answer: a
Explanation: Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce
applications (not based on JNI™).
7. ________ is a utility which allows users to create and run jobs with any executables as the
mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
View Answer
a) Mapper
b) Reducer
View Answer
a) inputs
b) outputs
c) tasks
Answer: a
Explanation: Total size of inputs means the total number of blocks of the input files.
a) HashPar
b) Partitioner
c) HashPartitioner
Answer: c
Explanation: The default partitioner in Hadoop is the HashPartitioner, which has a
getPartition method that assigns each key to a partition.
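A custom partitioner that does essentially what the default HashPartitioner's getPartition method does is sketched below; the class name is illustrative.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Maps each key to one of the reduce tasks, mirroring the default hash-based behaviour.
    public class WordPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }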
11. Running a ___________ program involves running mapping tasks on many or all of the
nodes in our cluster.
a) MapReduce
b) Map
c) Reducer
Answer: a
Explanation: In some applications, component tasks need to create and/or write to side-files,
which differ from the actual job-output files.
a) Partitioner
b) OutputCollector
c) Reporter
View Answer
Answer: b
Explanation: Hadoop MapReduce comes bundled with a library of generally useful mappers,
reducers, and partitioners.
10. _________ is the primary interface for a user to describe a MapReduce job to the
Hadoop framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
View Answer
Answer: b
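A driver sketch using JobConf (the older org.apache.hadoop.mapred API) to describe a job to the framework; the mapper and reducer classes are left as placeholders and would have to be supplied before the job produces useful output.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCountDriver.class);
            conf.setJobName("wordcount");

            // Describe the job: output types, then input/output locations.
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            // conf.setMapperClass(...); conf.setReducerClass(...);  // hypothetical classes

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }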
This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Scaling out
in Hadoop”.
1. ________ systems are scale-out file-based (HDD) systems moving to more uses of
memory in the nodes.
a) NoSQL
b) NewSQL
c) SQL
Answer: a
Explanation: NoSQL systems make the most sense whenever the application is based on data
with varying data types and the data can be stored in key-value notation.
View Answer
Answer: a
Explanation: Hadoop, together with a relational data warehouse, can form a very effective
data warehouse infrastructure.
3. Hadoop data is not sequenced and is in 64MB to 256MB block sizes of delimited record
values with schema applied on read based on ____________
a) HCatalog
b) Hive
c) Hbase
View Answer
Answer: a
a) EMR
b) Isilon solutions
c) AWS
View Answer
Answer: b
Explanation: Enterprise data protection and security options, including file system auditing
and data-at-rest encryption to address compliance requirements, are also provided by the
Isilon solution.
a) EMC Isilon Scale-out Storage Solutions for Hadoop combine a powerful yet simple and
highly efficient storage platform
b) Isilon native HDFS integration means you can avoid the need to invest in a separate
Hadoop infrastructure
c) NoSQL systems do provide high latency access and accommodate less concurrent users
View Answer
Answer: c
Explanation: NoSQL systems do provide low latency access and accommodate many
concurrent users.
6. HDFS and NoSQL file systems focus almost exclusively on adding nodes to ____________
a) Scale out
b) Scale up
View Answer
Answer: a
Explanation: HDFS and NoSQL file systems focus almost exclusively on adding nodes to
increase performance (scale-out) but even they require node configuration with elements of
scale up.
7. Which is the most popular NoSQL database for scalable big data store with Hadoop?
a) Hbase
b) MongoDB
c) Cassandra
View Answer
Answer: a
Explanation: HBase is the Hadoop database: a distributed, scalable Big Data store that lets
you host very large tables — billions of rows multiplied by millions of columns — on clusters
built with commodity hardware.
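A small sketch of storing and reading a cell through the HBase Java client API; the table name "users" and column family "info" are assumptions for the example and would need to exist on the cluster.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public final class HBaseExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                // Write one cell: row "row1", column family "info", qualifier "name".
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
                table.put(put);

                // Read the same cell back.
                Result result = table.get(new Get(Bytes.toBytes("row1")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
            }
        }
    }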
8. The ___________ can also be used to distribute both jars and native libraries for use in
the map and/or reduce tasks.
a) DataCache
b) DistributedData
c) DistributedCache
View Answer
Answer: c
Explanation: The child-jvm always has its current working directory added to the
java.library.path and LD_LIBRARY_PATH.
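In the newer MapReduce API the same facility is exposed on the Job object; the sketch below ships a (hypothetical) lookup file and an archive to every task's working directory.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public final class CacheFileSetup {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cache-example");

            // Both paths are hypothetical; the fragment after '#' is the symlink name
            // that appears in each task's working directory.
            job.addCacheFile(new URI("/shared/lookup.txt#lookup.txt"));
            job.addCacheArchive(new URI("/shared/deps.jar#deps"));
        }
    }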
a) TopTable
b) BigTop
c) Bigtable
View Answer
Answer: c
Explanation: Google Bigtable leverages the distributed data storage provided by the Google
File System.
10. __________ refers to incremental costs with no major impact on solution design,
performance and complexity.
a) Scale-out
b) Scale-down
c) Scale-up
View Answer
Answer: c
Explanation: Adding more CPU/RAM/disk capacity to a Hadoop DataNode that is already part
of a cluster does not require additional network switches.
a) generic
b) tool
c) library
d) task
View Answer
Answer: a
Explanation: Place the generic options before the streaming options, otherwise the
command will fail.
a) You can specify any executable as the mapper and/or the reducer
b) You cannot supply a Java class as the mapper and/or the reducer
c) The class you supply for the output format should return key/value pairs of Text class
View Answer
Answer: a
Explanation: If you do not specify an input format class, the TextInputFormat is used as the
default.
a) output directoryname
b) mapper executable
c) input directoryname
d) all of the mentioned
View Answer
Answer: d
Explanation: The input and output directories and the mapper executable are all required parameters.
a) -cmden EXAMPLE_DIR=/home/example/dictionaries/
b) -cmdev EXAMPLE_DIR=/home/example/dictionaries/
c) -cmdenv EXAMPLE_DIR=/home/example/dictionaries/
d) -cmenv EXAMPLE_DIR=/home/example/dictionaries/
View Answer
Answer: c
b) Aggregate allows you to define a mapper plugin class that is expected to generate
“aggregatable items” for each input key/value pair of the mappers
View Answer
Answer: c
6. The ________ option allows you to copy jars locally to the current working directory of
tasks and automatically unjar the files.
a) archives
b) files
c) task
View Answer
Answer: a
7. ______________ class allows the Map/Reduce framework to partition the map outputs
based on certain key fields, not the whole keys.
a) KeyFieldPartitioner
b) KeyFieldBasedPartitioner
c) KeyFieldBased
View Answer
Answer: b
Explanation: The primary key is used for partitioning, and the combination of the primary
and secondary keys is used for sorting.
8. Which of the following class provides a subset of features provided by the Unix/GNU Sort?
a) KeyFieldBased
b) KeyFieldComparator
c) KeyFieldBasedComparator
Answer: c
Explanation: Hadoop has a library class, KeyFieldBasedComparator, that is useful for many
applications.
a) Map
b) Reducer
c) Reduce
View Answer
Answer: b
Explanation: Aggregate provides a special reducer class and a special combiner class, and a
list of simple aggregators that perform aggregations such as “sum”, “max”, “min” and so on
over a sequence of values.
a) Copy
b) Cut
c) Paste
d) Move
View Answer
Answer: b
Explanation: The map function defined in the class treats each input key/value pair as a list
of fields.
Hadoop Questions and Answers – Introduction to HDFS
This set of Multiple Choice Questions & Answers (MCQs) focuses on “Introduction to
HDFS”.
1. A ________ serves as the master and there is only one NameNode per cluster.
a) Data Node
b) NameNode
c) Data block
d) Replication
View Answer
Answer: b
Explanation: All the metadata related to HDFS including the information about data nodes,
files stored on HDFS, and Replication, etc. are stored and maintained on the NameNode.
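A minimal sketch showing that file metadata (path, replication, size) is served by the NameNode through the FileSystem API; the directory path used is hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public final class ListHdfsDirectory {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            try (FileSystem fs = FileSystem.get(conf)) {
                // Each FileStatus is answered from metadata held by the NameNode.
                for (FileStatus status : fs.listStatus(new Path("/user/example"))) {
                    System.out.println(status.getPath() + "  replication="
                            + status.getReplication() + "  size=" + status.getLen());
                }
            }
        }
    }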
a) DataNode is the slave/worker node and holds the user data in the form of Data Blocks
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of
fault tolerance
View Answer
Answer: a
a) master-worker
b) master-slave
c) worker/slave
d) all of the mentioned
View Answer
Answer: a
Explanation: The NameNode serves as the master and each DataNode serves as a worker/slave.
a) Rack
b) Data
c) Secondary
View Answer
Answer: c
Explanation: The Secondary NameNode is used for all-time availability and reliability.
a) Replication Factor can be configured at a cluster level (Default is set to 3) and also at a file
level
b) Block Report from each DataNode contains a list of all the blocks that are stored on that
DataNode
View Answer
Answer: d
Explanation: The NameNode is aware of the files to which the blocks stored on each DataNode belong.
6. Which of the following scenario may not be a good fit for HDFS?
a) HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
b) HDFS is suitable for storing data related to applications requiring low latency data access
c) HDFS is suitable for storing data related to applications requiring low latency data access
View Answer
Answer: a
Explanation: HDFS can be used for storing archive data because it is cheaper: it allows data
to be stored on low-cost commodity hardware while ensuring a high degree of fault
tolerance.
7. The need for data replication can arise in various scenarios like ____________
View Answer
Answer: d
Explanation: Data is replicated across different DataNodes to ensure a high degree of fault-
tolerance.
8. ________ is the slave/worker node and holds the user data in the form of Data Blocks.
a) DataNode
b) NameNode
c) Data block
d) Replication
View Answer
Answer: a
9. HDFS provides a command line interface called __________ used to interact with HDFS.
a) “HDFS Shell”
b) “FS Shell”
c) “DFS Shell”
View Answer
Answer: b
Explanation: The File System (FS) shell includes various shell-like commands that directly
interact with the Hadoop Distributed File System (HDFS).
a) C++
b) Java
c) Scala
View Answer
Answer: b
Explanation: HDFS is implemented in Java and any computer which can run Java can host a
NameNode/DataNode on it.
11. For YARN, the ___________ Manager UI provides host and port information.
a) Data Node
b) NameNode
c) Resource
d) Replication
View Answer
Answer: c
Explanation: The YARN ResourceManager web UI provides host and port information for the
cluster's nodes and running applications.
a) The Hadoop framework publishes the job flow status to an internally running web server
on the master nodes of the Hadoop cluster
c) Data blocks are replicated across different nodes in the cluster to ensure a low degree of
fault tolerance
View Answer
Answer: a
Explanation: The web interface for the Hadoop Distributed File System (HDFS) shows
information about the NameNode itself.
13. For ________ the HBase Master UI provides information about the HBase Master
uptime.
a) HBase
b) Oozie
c) Kafka
View Answer
Answer: a
Explanation: HBase Master UI provides information about the number of live, dead and
transitional servers, logs, ZooKeeper information, debug dumps, and thread stacks.
14. During start up, the ___________ loads the file system state from the fsimage and the
edits log file.
a) DataNode
b) NameNode
c) ActionNode
View Answer
Answer: b
Explanation: HDFS is implemented in Java, so any computer which can run Java can host a
NameNode/DataNode.