
Midterm Solution

The document contains 20 multiple choice questions about Hadoop, MapReduce, and big data concepts. Key points covered include:
- Hadoop is an open-source, distributed computing framework used for storing and analyzing large datasets across clusters of computers.
- MapReduce is a programming model used by Hadoop applications to process massive amounts of data in parallel across nodes in a cluster.
- The main components of big data systems are MapReduce, HDFS for storage, and YARN for resource management.
- Characteristics of big data include large volume, high velocity of data generation, and unsuitability for processing with traditional techniques.

Uploaded by Dasha Desho

1. According to analysts, for what can traditional IT systems provide a foundation when they’re integrated with big data technologies like Hadoop?
a) Big data management and data mining
b) Data warehousing and business intelligence
c) Management of Hadoop clusters
d) Collecting and storing unstructured data

2. All of the following accurately describe Hadoop, EXCEPT:
a) Open-source
b) Real-time
c) Java-based
d) Distributed computing approach

3. ___________ is a general-purpose computing model and runtime system for distributed data analytics.
a) MapReduce
b) Drill
c) Oozie
d) None of the above
4. As companies move past the experimental phase with Hadoop, many cite the
need for additional capabilities, including _______________
a) Improved data storage and information retrieval
b) Improved extract, transform and load features for data integration
c) Improved data warehousing functionality
d) Improved security, workload management, and SQL support

5. Hadoop is a framework that works with a variety of related tools. Common cohorts include ____________
a) MapReduce, Hive and HBase
b) MapReduce, MySQL and Google Apps
c) MapReduce, Hummer and Iguana
d) MapReduce, Heron and Trumpet

6. __________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
a) MapReduce
b) Mahout
c) Oozie
d) All of the mentioned
7. What was Hadoop named after?
a) Creator Doug Cutting’s favorite circus act
b) Cutting’s high school rock band
c) The toy elephant of Cutting’s son
d) A sound Cutting’s laptop made during Hadoop development

8. Which of the following genres does Hadoop produce?
a) Distributed file system
b) JAX-RS
c) Java Message Service
d) Relational Database Management System

9. ___________ is a general-purpose computing model and runtime system for distributed data analytics.
a) MapReduce
b) Drill
c) Oozie
d) None of the mentioned

10. A ________ node acts as the Slave and is responsible for executing a Task
assigned to it by the JobTracker.
a) MapReduce
b) Mapper
c) TaskTracker
d) JobTracker
11. ___________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.
a) Maptask
b) Mapper
c) Task execution
d) All of the mentioned

12. _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
a) Reduce
b) Map
c) Reducer
d) All of the mentioned
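Questions 11 and 12 describe the split between the Map() and Reduce() functions. As an illustration only (not Hadoop's actual Java API), the relationship can be sketched in plain Python with a hypothetical word-count job: map tasks emit intermediate (key, value) pairs from each input chunk, and the reduce function consolidates every value emitted for a given key.

```python
from collections import defaultdict

def map_task(chunk):
    """Map(): process one chunk of input and emit intermediate (key, value) pairs."""
    for word in chunk.split():
        yield (word.lower(), 1)

def reduce_task(key, values):
    """Reduce(): consolidate all values emitted for one key by the map tasks."""
    return (key, sum(values))

def run_job(chunks):
    # Shuffle phase: group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for chunk in chunks:
        for key, value in map_task(chunk):
            groups[key].append(value)
    return dict(reduce_task(k, vs) for k, vs in groups.items())

counts = run_job(["big data big", "data velocity"])
```

In real Hadoop the grouping (shuffle/sort) step is performed by the framework between the map and reduce phases; here it is simulated in-process with a dictionary.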

13. Point out the wrong statement.
a) A MapReduce job usually splits the input data-set into independent chunks
which are processed by the map tasks in a completely parallel manner
b) The MapReduce framework operates exclusively on <key, value> pairs
c) Applications typically implement the Mapper and Reducer interfaces to provide
the map and reduce methods
d) None of the mentioned
14. Although the Hadoop framework is implemented in Java, MapReduce
applications need not be written in ____________
a) Java
b) C
c) C#
d) None of the mentioned

15. ________ is a utility which allows users to create and run jobs with any
executables as the mapper and/or the reducer.
a) Hadoop Strdata
b) Hadoop Streaming
c) Hadoop Stream
d) None of the mentioned
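Hadoop Streaming (question 15) lets any executable serve as the mapper and/or reducer by reading input lines on stdin and writing tab-separated key/value lines to stdout. A minimal sketch of a streaming-style mapper, assuming the default tab separator:

```python
import sys

def streaming_mapper(lines):
    """Emit "key<TAB>value" lines, the format a Hadoop Streaming
    mapper executable is expected to write to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"

if __name__ == "__main__":
    # In a real streaming job, Hadoop feeds input splits on stdin
    # and collects the emitted pairs from stdout.
    for pair in streaming_mapper(sys.stdin):
        print(pair)
```

A matching reducer executable would read the framework-sorted "key\tvalue" lines from stdin and sum the values per key.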

16. __________ maps input key/value pairs to a set of intermediate key/value pairs.
a) Mapper
b) Reducer
c) Both Mapper and Reducer
d) None of the mentioned
17. What are the main components of Big Data?
A) MapReduce
B) HDFS
C) YARN
D) All of the above

18. Which of the following is true about big data?

A) Big Data refers to data sets that are at least a petabyte in size
B) Big Data has low velocity, meaning that it is generated slowly
C) Big Data can be processed using traditional techniques
D) None of the above

19. Identify the term used to define the multidimensional model of the data warehouse.
A) Table
B) Data Cube
C) Tree
D) Data structure
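The data cube named in question 19 models a measure aggregated over every combination of dimensions. A toy sketch, assuming hypothetical region/product dimensions and a sales measure, with "*" marking a rolled-up (aggregated-away) dimension:

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product) dimensions with a sales measure.
facts = [("east", "widget", 10), ("east", "gadget", 5), ("west", "widget", 7)]

def build_cube(rows):
    """Aggregate the measure over every combination of dimensions,
    using "*" for a rolled-up dimension -- what a data cube materializes."""
    cube = defaultdict(int)
    for region, product, sales in rows:
        for cell in [(region, product), (region, "*"), ("*", product), ("*", "*")]:
            cube[cell] += sales
    return dict(cube)

cube = build_cube(facts)
```

Each cell answers one aggregate query: for example, `cube[("east", "*")]` is total sales in the east across all products, and `cube[("*", "*")]` is the grand total.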

20. The total number of forms of big data is ____
A) 1
B) 2
C) 3
D) 4
