
Lecture 4

(Hadoop & MapReduce)
MCQ Questions:

1. Which of the following best describes Hadoop?

 A) A relational database system.
 B) A distributed framework for storing and processing large datasets.
 C) A cloud storage service.
 D) A SQL-based querying platform.

2. What is the main function of the "NameNode" in HDFS?

 A) To store data blocks.
 B) To manage metadata and the directory structure.
 C) To process MapReduce jobs.
 D) To replicate data across DataNodes.

3. In Hadoop, which component is responsible for processing data in parallel?

 A) HDFS.
 B) MapReduce.
 C) Spark.
 D) YARN.

4. What is an example of unstructured data that Hadoop can process?

 A) Tables in a database.
 B) JSON files.
 C) Video files.
 D) E-mail headers.

5. Which of the following is NOT part of the Hadoop ecosystem?

 A) Hive.
 B) Pig.
 C) TensorFlow.
 D) HDFS.

6. What does the "Map" phase in MapReduce do?

 A) It combines intermediate results.
 B) It processes input data and transforms it into key-value pairs.
 C) It merges data from different nodes.
 D) It reduces the size of data stored in HDFS.

7. Why is MapReduce scalable?

 A) It uses expensive hardware.
 B) It supports SQL queries.
 C) It divides tasks into smaller jobs that can run on multiple nodes.
 D) It relies on in-memory processing.

8. Which of the following is NOT a use case for MapReduce?

 A) Word count in large text files.
 B) Data sorting.
 C) Relational database updates.
 D) Log analysis.

9. What is the default block size in HDFS?

 A) 32 MB.
 B) 64 MB.
 C) 128 MB.
 D) 256 MB.

10. Which phase of MapReduce is responsible for aggregating data?

 A) Splitting.
 B) Mapping.
 C) Reducing.
 D) Combining.

11. Which of the following is a benefit of HDFS?

 A) Centralized storage.
 B) Storing large files across distributed nodes.
 C) Performing real-time analytics.
 D) Encrypting small datasets.

12. What does Hadoop use for fault tolerance?

 A) Backup servers.
 B) Data replication.
 C) RAID arrays.
 D) Cloud storage integration.

13. In MapReduce, what is a Combiner?

 A) A required step for reducing data.
 B) A local reducer that processes intermediate data on the same node.
 C) A function to map data into key-value pairs.
 D) A secondary task for YARN.

14. Which of the following tools in the Hadoop ecosystem supports SQL-like queries?

 A) Pig.
 B) Hive.
 C) HDFS.
 D) Spark Streaming.

15. What is the purpose of the JobTracker in Hadoop 1.x?

 A) Managing distributed data.
 B) Tracking metadata.
 C) Coordinating MapReduce jobs across nodes.
 D) Processing SQL queries.

16. Which of the following tools is used for large-scale data storage in Hadoop?

 A) MapReduce.
 B) YARN.
 C) HDFS.
 D) Oozie.

17. Which of the following is true about Pig?

 A) It is used for real-time streaming.
 B) It provides a high-level scripting language for data analysis.
 C) It processes relational database queries.
 D) It stores data on HDFS.

18. What is a DataNode in Hadoop?

 A) A node that manages metadata.
 B) A node that processes MapReduce jobs.
 C) A node that stores blocks of data.
 D) A node that tracks task execution.

19. What is a key feature of HDFS that ensures reliability?

 A) Data replication across multiple nodes.
 B) Encryption of stored data.
 C) Automated SQL query optimization.
 D) In-memory data processing.

20. Which of the following best describes Spark compared to MapReduce?

 A) It is slower but simpler.
 B) It processes data in memory, making it faster.
 C) It supports only unstructured data.
 D) It does not integrate with HDFS.

21. What is the role of the ResourceManager in Hadoop 2.x (YARN)?

 A) Managing storage blocks.
 B) Scheduling resources for applications.
 C) Tracking job progress.
 D) Replicating data across nodes.

22. Which of the following is NOT a phase in MapReduce?

 A) Splitting.
 B) Shuffling.
 C) Mapping.
 D) Indexing.

23. What is "YARN" in Hadoop?

 A) A storage system.
 B) A resource management framework.
 C) A database query tool.
 D) A data transformation tool.

24. In Hadoop, what is a Block Report?

 A) A list of corrupted blocks.
 B) Metadata sent by a DataNode to the NameNode.
 C) A summary of HDFS storage usage.
 D) A report detailing completed MapReduce jobs.

25. What is an example of structured data?

 A) Log files.
 B) Relational database tables.
 C) Social media posts.
 D) Video streams.

26. What is the primary function of the "Reducer" in MapReduce?

 A) Transforming input data into key-value pairs.
 B) Aggregating intermediate results from the Mappers.
 C) Storing data on HDFS.
 D) Dividing tasks into smaller jobs.

27. What is a typical file format supported by Hadoop?

 A) XML.
 B) CSV.
 C) JSON.
 D) All of the above.

28. Which programming language is NOT commonly used with Hadoop?

 A) Java.
 B) Python.
 C) R.
 D) HTML.

29. Which of the following describes the "Shuffle and Sort" phase in MapReduce?

 A) Splitting data into smaller chunks.
 B) Sorting and grouping intermediate results by key.
 C) Transforming key-value pairs into final results.
 D) Writing data to HDFS.

30. Which of the following is a real-world use case for MapReduce?

 A) Banking transactions processing.
 B) Search engine indexing.
 C) Real-time weather analysis.
 D) Video game rendering.
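Several of the questions above (6, 10, 26, 29) revolve around the classic word-count flow: map input into key-value pairs, shuffle and sort the intermediate pairs by key, then reduce to aggregate. A minimal sketch in plain Python that simulates the three phases locally (an illustration only, not an actual Hadoop job):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_and_sort(pairs):
    # Shuffle and Sort: group intermediate values by key, sorted by key
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(grouped):
    # Reduce: aggregate the list of values for each key
    return {key: sum(values) for key, values in grouped}

lines = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle_and_sort(map_phase(lines)))
print(counts["the"])  # 2
```

On a real cluster, each phase runs in parallel across nodes; a Combiner would apply the same summing logic locally on each mapper's output before the shuffle, cutting network traffic.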
Applications of NLP

o Sentiment Analysis: determining whether a text is positive or negative.
o Information extraction and text classification using NLP and Machine Learning.
o Text Classification.
o Named Entity Recognition.
o Recommendation systems, like the ones Netflix and Amazon use to suggest movies or products.
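Sentiment analysis at its simplest can be sketched as a cue-word lookup (a toy illustration only; the word lists below are invented for the example, and real systems use trained models):

```python
# Hypothetical cue-word lists for the toy example
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "terrible", "awful", "hate"}

def classify_sentiment(text):
    # Count positive vs. negative cue words and compare the totals
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("I love this great movie"))  # positive
```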

3. Key Components of HDFS

1. NameNode:
o Role: Manages the filesystem namespace and metadata (file directory, block locations).
o Responsibilities:
 Keeps track of where blocks are stored across the cluster.
 Handles client requests for file operations (read/write).
2. DataNode:
o Role: Stores the actual data blocks.
o Responsibilities:
 Performs read and write operations as instructed by the NameNode.
 Sends regular heartbeat signals to the NameNode to indicate it’s functional.
3. Secondary NameNode:
o Role: Periodically merges the NameNode's edit log into its filesystem image (checkpointing), keeping a copy of the metadata.
o Note: It is NOT a failover for the NameNode.
4. Blocks:
o Files are divided into smaller chunks called Blocks.
o Example: A 512 MB file is split into four 128 MB blocks, distributed across DataNodes.
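The block arithmetic in the example above can be checked directly (a sketch assuming the default 128 MB block size; the last block of a file may be smaller):

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size

def num_blocks(file_size_mb):
    # A file is split into ceil(size / block_size) blocks
    return math.ceil(file_size_mb / BLOCK_SIZE_MB)

print(num_blocks(512))  # 4 blocks, matching the example above
print(num_blocks(200))  # 2 blocks: one 128 MB, one 72 MB
```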

Summary

 HDFS is a distributed file system for storing large datasets.
 Key components: NameNode (manages metadata), DataNode (stores data), and Blocks (small parts of a file distributed across nodes).