Big Data Analytics Unit 1 MCQ

Hadoop is an open source framework for distributed storage and processing of large datasets across clusters of computers. It consists of HDFS for storage and MapReduce as a programming model for distributed computing.

The main components of Hadoop are HDFS for distributed storage, MapReduce as a programming model for distributed computing, YARN for resource management, and ecosystem components such as Pig, Hive, HBase, and ZooKeeper.

The main steps in MapReduce processing are Map, Shuffle, Sort, and Reduce. Mappers process the data in parallel and produce intermediate output, which is shuffled, sorted, and passed to the reducers to produce the final output.
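For concreteness, the Map and Reduce steps above can be sketched with the classic word-count example using Hadoop's Java MapReduce API. This is a minimal illustrative sketch, not part of the question set: the class names are hypothetical, while the Mapper/Reducer base classes and Writable types come from the standard org.apache.hadoop.mapreduce and org.apache.hadoop.io packages.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Map step: each mapper processes one chunk (input split) in parallel
    // and emits intermediate (word, 1) pairs.
    class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reduce step: after the framework shuffles and sorts the intermediate
    // pairs by key, each reducer consolidates the values for one word.
    class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

A driver would register these classes on a Job and submit it; the final counts are written to part files (e.g. part-r-00000) in the job's output directory, which is what the "Part File" answer in question 12 below refers to.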

1. What was Hadoop named after?
1) Creator Doug Cutting’s favorite circus act
2) Cutting’s high school rock band
3) The toy elephant of Cutting’s son
4) A sound Cutting’s laptop made during Hadoop development
Answer: 3
2. Hadoop is a framework that works with a variety of related tools. Common cohorts
include ____________
1) MapReduce, Hive and HBase
2) MapReduce, MySQL and Google Apps
3) MapReduce, Hummer and Iguana
4) MapReduce, Heron and Trumpet

Answer: 1

3. __________ can best be described as a programming model used to develop Hadoop-based
applications that can process massive amounts of data.
1) MapReduce
2) Mahout
3) Oozie
4) All of the mentioned
Answer: 1
4. Point out the correct statement.
1) Hadoop is an ideal environment for extracting and transforming small volumes of data
2) Hadoop stores data in HDFS and supports data compression/decompression
3) The Giraph framework is less useful than a MapReduce job for solving graph and machine
learning problems
4) None of the mentioned
Answer: 2
5. What was Hadoop written in?
1) Java (software platform)
2) Perl
3) Java (programming language)
4) Lua (programming language)

Answer: 3
6. ___________ is a general-purpose computing model and runtime system for distributed
data analytics.
1) MapReduce
2) Drill
3) Oozie
4) None of the mentioned
Answer: 1

7. A ________ node acts as the Slave and is responsible for executing a Task assigned to it
by the JobTracker.
1) MapReduce
2) Mapper
3) TaskTracker
4) JobTracker

Answer: 3

8. ___________ part of the MapReduce is responsible for processing one or more chunks of
data and producing the output results.
1) Maptask
2) Mapper
3) Task execution
4) All of the mentioned

Answer: 1

9. _________ function is responsible for consolidating the results produced by each of the
Map() functions/tasks.
1) Reduce
2) Map
3) Reducer
4) All of the mentioned

Answer: 1

10. Although the Hadoop framework is implemented in Java, MapReduce applications need
not be written in ____________
1) Java
2) C
3) C#
4) None of the mentioned

Answer: 1
11. The MapReduce process has how many steps?

1) 3
2) 4
3) 5
4) 6

Answer: 2

12. MapReduce output is displayed in _______________ file

1) Output File
2) Success File
3) Result File
4) Part File
Answer: 4
13. Which of the following Hadoop jobs are managed by Oozie?
1) MapReduce, Pig, Hive, and Flume
2) MapReduce, Pig, Hive, and Sqoop
3) MapReduce, Pig, Hive
4) MapReduce, Pig, HDFS
Answer: 2
14. What license is Hadoop distributed under?

1) Apache License 2.0
2) Mozilla Public License
3) Shareware
4) Commercial

Answer: 1

15. What are the five V’s of Big Data?

1) Volume

2) Velocity

3) Variety

4) All the above

Answer: 4
16. What are the main components of Big Data?

1) MapReduce

2) HDFS

3) YARN

4) All of these

Answer: 4

17. What does commodity hardware mean in the Hadoop world?

1) Very cheap hardware

2) Industry standard hardware

3) Discarded hardware

4) Low specifications Industry grade hardware

Answer: 4

18. What does “Velocity” in Big Data mean?

1) Speed of input data generation

2) Speed of individual machine processors

3) Speed of ONLY storing data

4) Speed of storing and processing data

Answer: 4

19. The term Big Data originated from:

1) Stock Markets Domain

2) Banking and Finance Domain

3) Genomics and Astronomy Domain

4) Social Media Domain

Answer: 3
20. Which of the following are NOT true for Hadoop?

1) It’s a tool for Big Data analysis

2) It supports structured and unstructured data analysis

3) It aims for vertical scaling out/in scenarios

4) Both (1) and (3)

Answer: 4

21. Which of the following are the core components of Hadoop?

1) HDFS

2) MapReduce

3) HBase

4) Both (1) and (2)

Answer: 4

22. Hadoop is open source.

1) ALWAYS True

2) True only for Apache Hadoop

3) True only for Apache and Cloudera Hadoop

4) ALWAYS False
Answer: 2
23. Hive can be used for real time queries.

1) TRUE

2) FALSE

3) True if data set is small

4) True for some distributions

Answer: 2
24. Which of the following is NOT the component of Flume?

1) Sink

2) Database

3) Source

4) Channel

Answer: 2

25. What is Hive used as?

1) Hadoop query engine

2) MapReduce wrapper

3) Hadoop SQL interface

4) All of the above

Answer: 4

26. What is the default HDFS replication factor?

1) 4

2) 1

3) 3

4) 2
Answer: 3
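As a small illustration of where that default of 3 comes from and how it can be overridden, the sketch below sets the dfs.replication property that hdfs-site.xml normally carries and then changes the factor for a single file through the FileSystem API. This is a hedged sketch only: the NameNode address and file path are placeholders, and it assumes a reachable HDFS instance.

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // dfs.replication is the cluster-wide default; Hadoop ships with 3.
            conf.set("dfs.replication", "3");
            // hdfs://localhost:9000 is a placeholder NameNode address.
            FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);
            // Per-file override: keep only 2 copies of this hypothetical file.
            fs.setReplication(new Path("/data/example.txt"), (short) 2);
            fs.close();
        }
    }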
27. The mechanism used to create replicas in HDFS is ____________.

1) Gossip protocol

2) Replicate protocol

3) HDFS protocol

4) Store and Forward protocol

Answer: 3
28. From the options listed below, select the suitable data sources for Flume.

1) Publicly open web sites

2) Local data folders

3) Remote web servers

4) Both (1) and (3)

Answer: 4

29. Which of the following is the correct sequence of MapReduce flow?

1) Map → Reduce → Combine

2) Combine → Reduce → Map

3) Map → Combine → Reduce

4) Reduce → Combine → Map

Answer: 3
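A hedged sketch of how that Map → Combine → Reduce sequence is wired together in a Java driver, reusing the hypothetical word-count classes sketched after the summary above (input/output paths are placeholders): the combiner performs map-side aggregation before the shuffled data reaches the reducers.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCountDriver.class);

            job.setMapperClass(WordCountMapper.class);    // Map
            job.setCombinerClass(WordCountReducer.class); // Combine (local map-side aggregation)
            job.setReducerClass(WordCountReducer.class);  // Reduce

            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            FileInputFormat.addInputPath(job, new Path("/input"));   // placeholder paths
            FileOutputFormat.setOutputPath(job, new Path("/output"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }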
30. A MapReduce job can be written in:

1) Java

2) Ruby

3) Python

4) Any language that can read from the input stream

Answer: 4

31. Who will initiate the mapper?

1) Task tracker

2) Job tracker

3) Combiner

4) Reducer

Answer: 1
32. Hadoop EcoSystem is described in how many stages?
1) 6
2) 7
3) 4
4) 5
Answer: 3

33. The initial version of Hadoop was developed in which year?

1) 2004
2) 2005
3) 2006
4) 2007
Answer: 1
34. Latest Hadoop version is
1) HADOOP 1.X
2) HADOOP 2.X
3) HADOOP 3.X
4) HADOOP 4.X
Answer: 3
35. Which of the following platforms does Hadoop run on?
1) Bare metal
2) Debian
3) Cross-platform
4) Unix-Like
Answer: 3
36. The Hadoop list includes the HBase database, the Apache Mahout ___________ system,
and matrix operations.
1) Machine learning
2) Pattern recognition
3) Statistical classification
4) Artificial intelligence
Answer: 1
37. All of the following accurately describe Hadoop, EXCEPT
1) Open source
2) Real-time
3) Java-based
4) Distributed computing approach
Answer: 2
38. ___________ has the world’s largest Hadoop cluster.

1) Apple
2) Datamatics
3) Facebook
4) None of the mentioned
Answer: 3
39. Which component in the Hadoop ecosystem is used for provisioning, managing, monitoring,
and securing an Apache Hadoop cluster?
1) Zookeeper
2) Ambari
3) Pig
4) Oozie
Answer: 2
40. Which of the following are Big Data Applications?
1) Transportation
2) Education
3) Automobile
4) All the above
Answer: 4
