Chapter 6 Spark and Flink Questions Answers

The document consists of multiple-choice, single-choice, and true/false questions related to Spark and Flink architectures, data structures, features, and processing models. It covers components, data types, fault tolerance mechanisms, and specific operations within both frameworks. The questions are designed to assess knowledge of Spark and Flink's capabilities and functionalities.

Uploaded by Mahmoud Ibrahim

Multiple-Choice Questions (Select multiple correct answers):

1. Which of the following are components of the Spark architecture?
○ A. Spark Core
○ B. Spark SQL
○ C. Spark Streaming
○ D. GraphX
2. What are the data structures used in Spark?
○ A. RDD
○ B. DataFrame
○ C. DataSet
○ D. Key/Value Pairs
3. Which of the following features describe Spark?
○ A. In-memory computing
○ B. Low latency
○ C. Supports batch processing
○ D. Only supports static data
4. What are some advantages of using Spark?
○ A. Supports various processing paradigms
○ B. High fault tolerance
○ C. Low throughput
○ D. Seamless integration with Hadoop
5. Which of the following are Spark’s primary use cases?
○ A. Machine learning
○ B. Batch processing
○ C. Streaming processing
○ D. Log analysis
6. Which of the following are characteristics of RDD in Spark?
○ A. Read-only
○ B. In-memory data storage
○ C. Partitioned data
○ D. Dynamic modifications
7. Which of the following are state storage methods in Flink?
○ A. MemoryStateBackend
○ B. FsStateBackend
○ C. RocksDBStateBackend
○ D. SQLStateBackend
8. Which of the following are components of Flink?
○ A. JobManager
○ B. TaskManager
○ C. ResourceManager
○ D. Dispatcher
9. Which of the following are supported window types in Flink?
○ A. Tumbling window
○ B. Sliding window
○ C. Session window
○ D. Real-time window
10. Which of the following describe Flink's time semantics?
○ A. Event time
○ B. Processing time
○ C. Window time
○ D. Ingestion time
11. Which of the following are types of dependencies in Spark?
○ A. Narrow dependency
○ B. Wide dependency
○ C. Loop dependency
○ D. Stream dependency
12. Which of the following are features of Structured Streaming in Spark?
○ A. Handles real-time data
○ B. Uses RDDs
○ C. Executes SQL-like queries
○ D. Incrementally processes data
13. Which of the following are API layers provided by Flink?
○ A. DataStream API
○ B. DataSet API
○ C. Table API
○ D. SQL API
14. Which of the following describe Flink’s fault tolerance mechanism?
○ A. Checkpointing
○ B. Distributed snapshots
○ C. Speculative execution
○ D. Event replay
15. Which of the following are benefits of using Flink’s stream processing model?
○ A. Stateful stream processing
○ B. Continuous processing of stream data
○ C. SQL-like queries for stream processing
○ D. Only supports real-time processing
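The "stateful stream processing" in question 15 can be illustrated with a minimal sketch. This is plain Python standing in for Flink's DataStream API; the generator and its state dictionary are illustrative, not actual Flink code:

```python
def stateful_word_count(stream):
    """Running word count over an unbounded stream: the operator keeps
    per-key state (counts) and emits an updated result for every event,
    loosely analogous to keyed state in Flink."""
    counts = {}  # per-key state
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
        yield word, counts[word]
```

Feeding it `["spark", "flink", "spark"]` yields `("spark", 1)`, `("flink", 1)`, `("spark", 2)`: each event updates state and produces output immediately, rather than waiting for a batch boundary.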

Single-Choice Questions (Select one correct answer):

1. What is the core data structure in Spark used for fault-tolerant in-memory
computations?
○ A. DataFrame
○ B. RDD
○ C. DataSet
○ D. Key/Value Pair
2. Which of the following best describes Flink's stream processing model?
○ A. Stateless processing
○ B. Stateful stream processing
○ C. Batch processing only
○ D. Synchronous stream processing
3. What type of dependency does a groupByKey operation in Spark have?
○ A. Narrow dependency
○ B. Wide dependency
○ C. Map dependency
○ D. Filter dependency
4. What is the function of the reduceByKey operation in Spark?
○ A. Shuffle and sort data
○ B. Group and reduce data based on a function
○ C. Filter the dataset
○ D. Convert an RDD to a DataFrame
5. Which backend is recommended in Flink for jobs with very large states?
○ A. MemoryStateBackend
○ B. FsStateBackend
○ C. RocksDBStateBackend
○ D. ExternalBackend
6. What does the Flink JobManager do?
○ A. Executes tasks
○ B. Manages resources and schedules jobs
○ C. Stores data
○ D. Monitors clusters
7. Which Spark API is used for real-time stream processing?
○ A. Spark SQL
○ B. MLlib
○ C. Spark Streaming
○ D. GraphX
8. What does a Flink DataStream represent?
○ A. A collection of batch data
○ B. An immutable collection of stream data
○ C. A collection of real-time processing tasks
○ D. A set of graphs
9. What is the default state backend for storing small states in Flink?
○ A. FsStateBackend
○ B. RocksDBStateBackend
○ C. MemoryStateBackend
○ D. SQLBackend
10. What mechanism does Flink use to handle out-of-order data?
○ A. Checkpoints
○ B. Watermarks
○ C. RDD lineage
○ D. Fault-tolerant framework
11. In Spark, what is used to trigger computation in an RDD?
○ A. Transformation
○ B. Action
○ C. Control operation
○ D. Job submission
12. Which window type in Flink is defined by a specified session interval?
○ A. Tumbling window
○ B. Sliding window
○ C. Session window
○ D. Count window
13. What is the main goal of Checkpointing in Flink?
○ A. To optimize queries
○ B. To save state in case of failure
○ C. To store data permanently
○ D. To prevent latency issues
14. Which operation in Spark converts an RDD into a new RDD based on a
user-defined function?
○ A. map()
○ B. collect()
○ C. reduce()
○ D. saveAsTextFile()
15. Which Flink window type processes data without overlap?
○ A. Tumbling window
○ B. Sliding window
○ C. Session window
○ D. Count window
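Questions 12 and 15 both hinge on how timestamps are assigned to windows: tumbling windows never overlap, while sliding windows overlap whenever the slide is smaller than the window size. A plain-Python sketch of the assignment arithmetic (assuming non-negative integer timestamps and a size divisible by the slide; this is not Flink's actual WindowAssigner API):

```python
def tumbling_windows(ts, size):
    # Non-overlapping: each timestamp falls into exactly one window.
    start = (ts // size) * size
    return [(start, start + size)]

def sliding_windows(ts, size, slide):
    # Overlapping when slide < size: each timestamp falls into
    # size // slide windows.
    first = ((ts - size) // slide + 1) * slide  # earliest window containing ts
    last = (ts // slide) * slide                # latest window containing ts
    return [(s, s + size) for s in range(first, last + 1, slide)]
```

For example, `tumbling_windows(7, 10)` gives `[(0, 10)]`, while `sliding_windows(7, 10, 5)` gives `[(0, 10), (5, 15)]`: the same timestamp belongs to two overlapping sliding windows.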

True/False Questions:

1. Spark’s RDD is a mutable, distributed dataset.
○ True/False (RDD is immutable.)
2. Flink supports both batch and stream processing using a unified engine.
○ True/False
3. Spark SQL is slower than Hive when executing queries.
○ True/False (Spark SQL is faster than Hive.)
4. Flink’s Stateful Stream Processing is the main advantage over other engines.
○ True/False
5. Spark uses MapReduce as its execution engine.
○ True/False (Spark has its own execution engine.)
6. RDDs in Spark support fault tolerance through lineage tracking.
○ True/False
7. In Flink, a watermark is used to handle out-of-order events.
○ True/False
8. Flink’s TaskManager is responsible for managing job submission and scheduling.
○ True/False (That is the JobManager’s responsibility.)
9. In Spark, actions trigger the execution of a computation.
○ True/False
10. Flink supports real-time computation with exactly-once processing guarantees.
○ True/False
11. A sliding window in Flink allows overlapping time windows.
○ True/False
12. Spark supports SQL-like queries on both structured and semi-structured data.
○ True/False
13. Flink provides fault tolerance through speculative execution.
○ True/False (Fault tolerance is achieved through checkpointing.)
14. In Spark, the reduceByKey function groups data based on key and reduces it with
a function.
○ True/False
15. Flink's Checkpointing mechanism is enabled by default.
○ True/False (It needs to be enabled.)
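True/false items 7 and 11 concern watermarks and out-of-order events. The core idea can be sketched in plain Python: the watermark trails the highest event time seen so far by a fixed out-of-orderness bound, similar in spirit to Flink's bounded-out-of-orderness strategy (the function name and margin here are illustrative, not Flink API):

```python
def with_watermarks(event_times, max_out_of_orderness):
    """Pair each event time with the current watermark: the highest event
    time seen so far minus the allowed out-of-orderness. A window over
    [a, b) may fire once the watermark reaches b."""
    max_seen = float("-inf")
    for ts in event_times:
        max_seen = max(max_seen, ts)
        yield ts, max_seen - max_out_of_orderness
```

With events `[1, 5, 3, 8]` and a bound of 2, the watermarks emitted are `-1, 3, 3, 6`: the out-of-order event with timestamp 3 arrives while the watermark is still 3, so it is not yet considered late.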
