MCQ Type Questions

1. HDFS provides fault tolerance through data replication: it replicates data blocks across multiple DataNodes, typically on different racks.
2. The core component of HDFS is the DataNode, which manages storage attached to the nodes in the cluster.
3. The fsck command is used to check the health and integrity of the HDFS file system.

Big Data Concepts

Q No Complexity Question

1 Low The nature of hardware for the NameNode should be

2 Medium Which of the following technologies is used to import and export data in Hadoop?

3 High Splunk announced a new product to search, access and report on ________

4 Low What is the largest single source where data is gathered?

5 Low What open-source software was developed from Google's MapReduce concept?

6 Medium Which of the following is false about Hadoop?

7 Medium All the Slaves in the Hadoop Cluster should be of the same configuration.

8 Medium In NameNode HA, when the active node fails, which node takes over the responsibility of the active node?

9 Medium What is a big hurdle enterprises need to overcome when embracing big data?
Big Data Concepts
Type Options Correct Answer

A. Superior to commodity grade
B. Commodity grade
Radio Button A. Superior to commodity grade
C. More RAM than each of the data nodes
D. Should have more memory for the Node Manager

A. HBase
B. Sqoop
Radio Button B. Sqoop
C. Zookeeper
D. Parquet

A. Splunk Storm
B. MongoDB
Radio Button D. Hunk
C. Splunk Cloud
D. Hunk

A. Emails
B. Business Transactions
Radio Button B. Business Transactions
C. Social Media
D. Log Data

A. Puppet
B. Splunk
Radio Button C. Hadoop
C. Hadoop
D. Mongo DB

A. Hadoop works in Master-Slave fashion
B. Master and Slave both are worker nodes
Radio Button B. Master and Slave both are worker nodes
C. User submits his work on master
D. Slaves are actual worker nodes
A. True
Radio Button A. True
B. False
A. Secondary Namenode
B. Backup Node
Radio Button C. StandBy Node
C. StandBy Node
D. Checkpoint Node

A. Infrastructure requirements
B. Cyber security risks
Radio Button D. Lack of digital skills
C. Cloud integration
D. Lack of digital skills
HDFS concepts
Q No Complexity Question

1 Low Which of the following is the core component of HDFS?

2 Medium Which command is used to check the status of all daemons running in HDFS?

3 Medium Which of the following commands is used to copy a directory from one node to another in HDFS?

4 Low HDFS does not allow files to?

5 Low What is Block in Hadoop HDFS?

6 Low HDFS allows a client to read a file which is already opened for writing?

7 Low Where is HDFS replication factor controlled?

8 Low For 129 MB file how many Blocks will be created?

9 Low For frequently accessed HDFS files, the blocks are cached in?

10 Low The HDFS command to create the copy of a file from a local system is which of the following?

11 Medium In which of the following scenarios can the need for data replication arise?
12 Low Which utility is used for checking the health of an HDFS file system?

13 Medium A 10 GB file is split into chunks of 100 MB and distributed among the nodes of a Hadoop cluster. Due to a power failure the system got switched off, and when power returns, the system administrator restarts the process. How will the NameNode know what kind of processing was being performed on which file?

14 Low A ________ serves as the master and there is only one active NameNode per cluster.

15 Low HDFS works in a __________ architecture.

16 Medium Which of the following scenarios may not be a good fit for HDFS?

17 Medium HDFS provides a command line interface called __________ used to interact with HDFS.

18 Medium A high degree of fault tolerance in HDFS can be ensured by
HDFS concepts
Type Options Correct Answer
A. Node Manager
B. Data Node
Radio Button B. Data Node
C. Resource Manager
D. All of the above

A. fsck
B. distcp
Radio Button C. jps
C. jps
D. All of the above will work

A. cp
B. distcp
Radio Button B. distcp
C. copyFromLocal
D. put

A. read
B. copy
Radio Button C. execute
C. execute
D. archive

A. It is the logical representation of data
B. It is the physical representation of data
Radio Button B. It is the physical representation of data
C. Both of the above
D. None of the above
A. True
Radio Button A. True
B. False
A. mapred-site.xml
B. core-site.xml
Radio Button D. hdfs-site.xml
C. yarn-site.xml
D. hdfs-site.xml

A. 3
B. 2
Radio Button B. 2
C. 1
D. 4
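The arithmetic behind this answer can be sketched quickly (assuming the default HDFS block size of 128 MB; the helper name is hypothetical):

```python
import math

def hdfs_block_count(file_size_mb: float, block_size_mb: int = 128) -> int:
    """Blocks needed for a file: size divided by block size, rounded up."""
    return math.ceil(file_size_mb / block_size_mb)

# A 129 MB file fills one 128 MB block and spills 1 MB into a second block.
print(hdfs_block_count(129))  # 2
```

Note that the last block occupies only as much disk as its actual data (1 MB here), not a full 128 MB.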

A. in the memory of the datanode
B. in the memory of the namenode
Radio Button A. in the memory of the datanode
C. Both
D. None

A. copyFromLocal
B. CopyFromLocal
Radio Button A. copyFromLocal
C. CopyLocal
D. copyfromlocal

A. Replication Factor is changed


B. DataNode goes down
Radio Button D. All of the mentioned
C. Data Blocks get corrupted
D. All of the mentioned
A. fsck
B. fchk
Radio Button A. fsck
C. fsch
D. fcks

A. Through the combiner
B. Through the scheduler
Radio Button C. Through the input list
C. Through the input list
D. Through the DataNode

A. Data Node
B. NameNode
Radio Button B. NameNode
C. Data block
D. Replication

A. master-worker
B. master-slave
Radio Button B. master-slave
C. worker/slave.
D. All of the mentioned

A. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
B. HDFS is suitable for storing data related to applications requiring low latency data access
Radio Button A. HDFS is not suitable for scenarios requiring multiple/simultaneous writes to the same file
C. HDFS is suitable for storing data related to applications requiring low latency data access
D. None of the mentioned

A. “HDFS Shell”
B. “FS Shell”
Radio Button B. “FS Shell”
C. “DFS Shell”
D. None of the mentioned

A. Adding Data Nodes to the cluster
B. Increasing the Namenode capacity
Radio Button C. Increasing the replication factor to 1 or above
C. Increasing the replication factor to 1 or above
D. Configuring an edge node
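The reasoning behind replication-based fault tolerance can be illustrated with a small simulation (a sketch with hypothetical helper names; real block placement is decided by the NameNode and is rack-aware):

```python
from itertools import cycle

def place_replicas(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy placement)."""
    placement = {}
    ring = cycle(range(len(nodes)))
    for block in blocks:
        start = next(ring)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement

def still_readable(placement, failed_node):
    """A block survives a node failure if at least one replica lives elsewhere."""
    return all(any(n != failed_node for n in replicas)
               for replicas in placement.values())

placement = place_replicas(["blk_1", "blk_2"], ["dn1", "dn2", "dn3", "dn4"])
print(still_readable(placement, "dn1"))  # True: replication factor 3 tolerates a failure
```

With a replication factor of 1 the same check fails after any node loss, which is why a factor above 1 is what actually buys fault tolerance.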
Map Reduce Concepts
Q No Complexity Question

1 Medium How can you disable the reduce step?

2 Medium Which of the following is used to set mappers for MapReduce jobs?

3 Medium Which of the following statements is true about key-value pairs?

4 Low Aggregation cannot be done in Mapper

5 Medium The MapReduce programming model widely used in analytics was developed at?

6 Medium Which of the following can be used to control the number of part files in a MapReduce program output directory?

7 Low Who will initiate the mapper?

8 Medium Which of the following classes is responsible for converting inputs to key-value pairs?

9 Low Which of the following permits the use of multiple Mapper classes within a single Map task?

10 Low Which of the following is true about MapReduce?


11 Medium What are the core methods of Reducer?

12 Low ________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.

13 Medium Mapper implementations are passed the JobConf for the job via the ________ method.

14 Medium Input to the _______ is the sorted output of the mappers

15 High The output of the _______ is not sorted in the MapReduce framework for Hadoop.

16 Low Which of the following phases occur simultaneously ?

17 Medium The ___________ can also be used to distribute both jars and native libraries for use in the map and/or reduce tasks.

18 High Which of the following Hadoop streaming command option parameters is required?

19 High Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the unix ______ utility.

20 Medium The number of maps is usually driven by the total size of :

21 Low A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.
Map Reduce Concepts
Type Options Correct Answer
A. conf.setNumReduceTasks(0)
B. job.setNumReduceTasks(0)
Radio Button B. job.setNumReduceTasks(0)
C. job.setNumReduceTasks()=0
D. None of these

A. job.setNumMapTasks()
B. job.setNum.MapTasks()
Radio Button A. job.setNumMapTasks()
C. job.setNumMap.tasks()
D. job.setNumMap()

A. key class must implement Writable
B. key class must implement WritableComparable
Radio Button B. key class must implement WritableComparable
C. Value class must implement WritableComparable
D. Value class must extend WritableComparable

A. True
Radio Button A. True
B. False
A. Apache Foundation
B. Google
Radio Button B. Google
C. Microsoft Research
D. None of the above

A. Number of mappers
B. Number of reducers
Radio Button B. Number of reducers
C. Counter
D. Partitioner

A. Task Tracker
B. Job Tracker
Radio Button A. Task Tracker
C. Combiner
D. Data Node

A. FileInputFormat
B. InputSplit
Radio Button C. RecordReader
C. RecordReader
D. Mapper

A. Identity Mapper
B. Chain Mapper
Radio Button B. Chain Mapper
C. Both
D. None

A. Data processing layer of hadoop
B. It provides the resource management
Radio Button A. Data processing layer of hadoop
C. It is an open source data warehouse system for querying and analyzing large datasets stored in hadoop files
D. All of the above
A. setup(),reduce(),cleanup()
B. Get(), Mapreduce(), cleanup()
Radio Button A. setup(),reduce(),cleanup()
C. Put(), reduce(), clean()
D. set-up(),reduce(),cleanup()
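The lifecycle named in the correct answer — setup() once, reduce() once per key group, cleanup() once at the end — can be mimicked in a plain-Python sketch (Hadoop's real Reducer is a Java class; this only mirrors the call order):

```python
class ToyReducer:
    """Mirrors the Hadoop Reducer lifecycle: setup once, reduce per key, cleanup once."""

    def setup(self):
        self.results = {}  # one-time initialisation before any keys arrive

    def reduce(self, key, values):
        self.results[key] = sum(values)  # invoked once per key group

    def cleanup(self):
        return sorted(self.results.items())  # one-time teardown, emit final output

reducer = ToyReducer()
reducer.setup()
for key, values in [("big", [1, 1]), ("data", [1])]:
    reducer.reduce(key, values)
print(reducer.cleanup())  # [('big', 2), ('data', 1)]
```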

A. MapReduce
B. Mahout
Radio Button A. MapReduce
C. Oozie
D. All of the mentioned

A. JobConfigure.configure
B. JobConfigurable.configure
Radio Button B. JobConfigurable.configure
C. JobConfigurable.configureable
D. None of the mentioned

A. Reducer
B. Mapper
Radio Button A. Reducer
C. Shuffle
D. All of the mentioned

A. Mapper
B. Cascader
Radio Button D. None of the mentioned
C. Scalding
D. None of the mentioned

A. Shuffle and Sort


B. Reduce and Sort
Radio Button A. Shuffle and Sort
C. Shuffle and Map
D. All of the mentioned

A. DataCache
B. DistributedData
Radio Button C. DistributedCache
C. DistributedCache
D. All of the mentioned

A. output directoryname
B. mapper executable
C. input directoryname
Radio Button D. All of the mentioned
D. All of the mentioned

A. Copy
B. Cut
Radio Button B. Cut
C. Paste
D. Move

A. no. of input files


B. no. of input splits
Radio Button B. no. of input splits
C. pre defined variable
D. None of the mentioned

A. MapReduce
B. Mapper
Radio Button C. TaskTracker
C. TaskTracker
D. JobTracker
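The programming model these questions revolve around — map, then shuffle/sort by key, then reduce — can be sketched in plain Python (an illustration of the model, not Hadoop's API):

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1

def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    # Reducer: aggregate all values emitted for one key.
    return key, sum(values)

lines = ["big data big", "data"]
mapped = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(mapped))
print(result)  # {'big': 2, 'data': 2}
```

Setting the number of reducers to zero, as in the first question of this section, simply means the `mapped` pairs are written out directly with no shuffle or reduce step.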
Spark dataframe
Q No Complexity Question

1 Low We can create DataFrame using?

2 Low Which of the following is transformation?

3 Low What will be the output:
val rawData = spark.read.textFile("PATH").rdd
val result = rawData.filter…

4 Low Which of the following is not a transformation?

5 Low The basic abstraction of Spark Streaming is?

6 Medium In Spark, once a domain object is converted into a DataFrame, the regeneration of the domain object is not possible?

7 Low Which of the following require Garbage collection?

8 Low DataFrame in Apache Spark prevails over RDD and does not contain any feature of RDD?

9 Low Which of the following is true about DataFrame?


Spark dataframe
Type Options Correct Answer
A. Tables in Hive
B. Structured data files
Radio Button D. All of the above
C. External databases
D. All of the above

A. take(n)
B. top()
Radio Button D. mapPartitionsWithIndex()
C. countByValue()
D. mapPartitionsWithIndex()

A. Process the data as per the specified logic


B. Compilation error
Radio Button C. Won't be executed
C. Won't be executed
D. None

A. map()
B. flatMap()
Radio Button C. reduce()
C. reduce()
D. filter()
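The distinction this question tests — map(), flatMap() and filter() are lazy transformations, while reduce() is an action that triggers execution — can be imitated with Python generators (a loose analogy, not Spark itself):

```python
from functools import reduce

data = range(1, 6)

# "Transformations": build a lazy pipeline; nothing has been computed yet.
squared = (x * x for x in data)              # analogous to rdd.map(lambda x: x * x)
evens = (x for x in squared if x % 2 == 0)   # analogous to rdd.filter(lambda x: x % 2 == 0)

# "Action": consuming the pipeline forces evaluation, like rdd.reduce(...).
total = reduce(lambda a, b: a + b, evens)
print(total)  # 4 + 16 = 20
```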

A. DStream
B. RDD
Radio Button A. DStream
C. Shared Variable
D. None of the above

A. True
Radio Button A. True
B. False

A. RDD
B. DataFrame
Radio Button A. RDD
C. Dataset
D. All of the above
A. True
Radio Button B. False
B. False

A. DataFrame API has provision for compile time type safety.
B. DataFrames provide a more user-friendly API than RDDs.
Radio Button B. DataFrames provide a more user-friendly API than RDDs.
C. Both a and b
D. None of the above
Spark query language
Q No Complexity Question

1 Low Which of the following is the entry point of Spark Application?

2 Low How many Spark Context can be active per JVM?

3 Medium Which of the following is not a Spark SQL query execution phase?

4 Low Which of the following is a module for structured data processing?

5 Medium Which scheduler is used by SparkContext by default?


Spark query language
Type Options Correct Answer
A. SparkSession
B. SparkContext
Radio Button B. SparkContext
C. Both a and b
D. None of the above

A. more than one


B. only one
Radio Button B. only one
C. not specific
D. None of the above

A. Analysis
B. Logical Optimization
Radio Button C. Execution
C. Execution
D. Physical planning

A. GraphX
B. MLlib
Radio Button C. Spark SQL
C. Spark SQL
D. Spark R

A. DAG Scheduler
B. Fair Scheduler
Radio Button A. DAG Scheduler
C. Capacity Scheduler
D. None of the above
Hive Concepts
Q No Complexity Question

1 Low Hive Data models represent?

2 Medium An ORC file used by Hive contains groups of row data called?

3 Low Integral literals are assumed to be what by default?

4 Low Hive does not support literals for which types?

5 Low What Hive cannot offer?

6 Medium What is the drawback of managed tables in Hive?

7 Low A view in Hive can be seen by writing which command?

1 Low Which of the following operators executes a shell command from the Hive shell?

2 Medium Which of the following will remove the resource(s) from the distributed cache?

3 Medium Which of the following is a command line option ?


4 Low Avro-backed tables can simply be created by using _________ in a DDL statement.

5 Medium ________ is used to embed the schema in the create statement.

6 Low ________ was designed to overcome limitations of the other Hive file formats.

7 Medium _______ is a lossless data compression library that favors speed over compression ratio.

8 High Which functions allow concatenation in Hive?

9 High How can the sub directories be accessed recursively in Hive queries?
10 High What is the precedence hierarchy for setting properties in Hive?
Hive Concepts
Type Options Correct Answer
A. Tables in metastore DB
B. Table in HDFS
Radio Button C. Directories in HDFS
C. Directories in HDFS
D. None of the above

A. postscript
B. stripes
Radio Button B. stripes
C. script
D. None of the mentioned

A. Small Int
B. Int
Radio Button B. Int
C. Big Int
D. Tiny Int

A. Scalar
Radio Button B. Complex B. Complex
C. Int
D. Char

A. Storing data in tables and columns


B. Online transaction processing
Radio Button B. Online transaction processing
C. Handling date time data
D. Partitioning stored data

A. They are always stored under default directory
B. They cannot grow bigger than a fixed size of 100GB
Radio Button D. They cannot be shared with other applications
C. They can never be dropped
D. They cannot be shared with other applications

A. SHOW TABLES
B. SHOW VIEWS
Radio Button C. DESCRIBE VIEWS A. SHOW TABLES
D. VIEW VIEWS

A. |
B. !
Radio Button B. !
C. ^
D. +

A. delete FILE[S] *
B. delete JAR[S] *
Radio Button D. All of the mentioned
C. delete ARCHIVE[S] *
D. All of the mentioned

A. -d, --define
B. -e, --define
Radio Button A. -d, --define
C. -f, --define
D. None of the mentioned
A. “STORED AS AVRO”
B. “STORED AS HIVE”
Radio Button A. “STORED AS AVRO”
C. “STORED AS AVROHIVE”
D. “STORED AS SERDE”

A. schema.literal
B. schema.lit
Radio Button A. schema.literal
C. row.literal
D. All of the mentioned

A. ORC
B. OPC
Radio Button A. ORC
C. ODC
D. None of the mentioned

A. LOZ
Radio Button B. LZO B. LZO
C. OLZ
D. All of the mentioned

A. CONCATENATE ('Hive','-','query')
B. CONCAT ('query','-','query')
Checkbox B. CONCAT ('query','-','query'); C. CONCAT_WS ('-','Hive','Query')
C. CONCAT_WS ('-','Hive','Query')
D. CONCATENATE_WS ('-','Hive','Query')
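Hive's CONCAT_WS takes the separator first and then the strings to join; its behaviour is analogous to Python's str.join (shown here only as an analogy, not Hive code):

```python
def concat_ws(separator: str, *parts: str) -> str:
    """Analog of Hive's CONCAT_WS: join all parts with the given separator."""
    return separator.join(parts)

print(concat_ws('-', 'Hive', 'Query'))  # Hive-Query
print(concat_ws('-', 'a', 'b', 'c'))    # a-b-c
```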

A. Set mapred.input.dir.recursive=true;
B. Set hive.supports.subdirectories=true;
Checkbox A. Set mapred.input.dir.recursive=true; C. Set hive.mapred.supports.subdirectories=true;
C. Set hive.mapred.supports.subdirectories=true;
D. Set hive.input.dir.recursive=true;
Radio Button
A.
1) The command line --hiveconf option
2) SET Command in HIVE
3) Hive-site.xml
4) Hive-default.xml
5) Hadoop-site.xml
6) Hadoop-default.xml

B.
1) SET Command in HIVE
2) The command line --hiveconf option
3) Hadoop-site.xml
4) Hadoop-default.xml
5) Hive-site.xml
6) Hive-default.xml

C.
1) SET Command in HIVE
2) The command line --hiveconf option
3) Hive-site.xml
4) Hive-default.xml
5) Hadoop-site.xml
6) Hadoop-default.xml

D.
1) Hadoop-site.xml
2) Hadoop-default.xml
3) Hive-site.xml
4) Hive-default.xml
5) SET Command in HIVE
6) The command line --hiveconf option

Correct Answer: C
Oozie concept
Q No Complexity Question

1 Low ___________ is a Java Web application used to schedule Apache Hadoop jobs.

2 Low Oozie Workflow jobs are Directed ________ graphs of actions.

3 Medium Which of the following is one of the possible states for a workflow job?

4 Medium A workflow definition is a ______ with control flow nodes or action nodes.

5 High Node names and transitions must conform to the pattern =[a-zA-Z][\-_a-zA-Z0-9]*=, of up to __________ characters long.

6 Medium A workflow definition must have one ________ node.

7 Medium If one or more actions started by the workflow job are executing when the ________ node is reached, the actions will be killed.

8 Medium A ___________ node enables a workflow to make a selection on the execution path to follow.

9 Medium Which of the following can be seen as a switch-case statement?

10 Medium All decision nodes must have a _____________ element to avoid bringing the workflow into an error state if none of the predicates evaluates to true.

11 Medium If the failure is of ___________ nature, Oozie will suspend the workflow job.
Oozie concept
Type Options Correct Answer
A. Impala
B. Oozie
Radio Button B. Oozie
C. Mahout
D. All of the mentioned

A. Acyclical
B. Cyclical
Radio Button A. Acyclical
C. Elliptical
D. All of the mentioned

A. PREP
B. START
Radio Button A. PREP
C. RESUME
D. END

A. CAG
B. DAG
Radio Button B. DAG
C. BAG
D. None of the mentioned
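"Directed Acyclic Graph" means every workflow admits a topological order of its actions with no cycles; a minimal check using Python's standard library (illustrative only — Oozie validates this from the workflow XML):

```python
from graphlib import CycleError, TopologicalSorter

def is_dag(graph):
    """True if the dependency mapping {node: iterable_of_dependencies} has no cycle."""
    try:
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

# A typical workflow shape: start -> fork -> two actions -> join -> end.
workflow = {"start": [], "fork": ["start"], "action_a": ["fork"],
            "action_b": ["fork"], "join": ["action_a", "action_b"], "end": ["join"]}
print(is_dag(workflow))                  # True: a valid acyclic workflow
print(is_dag({"a": ["b"], "b": ["a"]}))  # False: cycles are not allowed
```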

A. 10
B. 15
Radio Button C. 20
C. 20
D. 25

A. stop
B. resume
Radio Button A. stop
C. finish
D. None of the mentioned

A. kill
B. start
Radio Button A. kill
C. end
D. finish

A. fork
B. decision
Radio Button B. decision
C. start
D. None of the mentioned

A. fork
B. decision
Radio Button A. fork
C. start
D. None of the mentioned

A. name
B. default
Radio Button B. default
C. server
D. client

A. transient
B. non-transient
Radio Button B. non-transient
C. permanent
D. All of the mentioned
