Chapter 6: Spark and Flink Questions and Answers
1. What is the core data structure in Spark used for fault-tolerant in-memory
computations?
○ A. DataFrame
○ B. RDD
○ C. DataSet
○ D. Key/Value Pair
2. Which of the following best describes Flink's stream processing model?
○ A. Stateless processing
○ B. Stateful stream processing
○ C. Batch processing only
○ D. Synchronous stream processing
3. What type of dependency does a groupByKey operation in Spark have?
○ A. Narrow dependency
○ B. Wide dependency
○ C. Map dependency
○ D. Filter dependency
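The narrow/wide distinction in question 3 can be sketched in plain Python (no Spark cluster needed, so this only mirrors the semantics, not the Spark API): a map stays within each partition, while groupByKey must collect every value for a key from all partitions, which is why it forces a shuffle.

```python
from collections import defaultdict

# Two toy "partitions" of key/value pairs.
partitions = [[("a", 1), ("b", 1)], [("a", 1), ("c", 1)]]

# Narrow dependency: each output partition depends on exactly one
# input partition, so no data moves between partitions.
mapped = [[(k, v * 2) for k, v in p] for p in partitions]

# Wide dependency: groupByKey must pull the "a" records from BOTH
# partitions, so every output partition may depend on every input
# partition -- this cross-partition movement is the shuffle.
shuffled = defaultdict(list)
for p in partitions:
    for k, v in p:
        shuffled[k].append(v)
# dict(shuffled) == {"a": [1, 1], "b": [1], "c": [1]}
```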
4. What is the function of the reduceByKey operation in Spark?
○ A. Shuffle and sort data
○ B. Group and reduce data based on a function
○ C. Filter the dataset
○ D. Convert an RDD to a DataFrame
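The reduceByKey behavior asked about in question 4 can be sketched in plain Python (a conceptual model only; real reduceByKey also pre-aggregates on each partition before the shuffle, which is why it is preferred over groupByKey for aggregations):

```python
from collections import defaultdict
from functools import reduce

def reduce_by_key(pairs, fn):
    """Group values by key, then fold each group with fn."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return {k: reduce(fn, vs) for k, vs in groups.items()}

# Classic word-count step: sum the 1s for each key.
counts = reduce_by_key([("a", 1), ("b", 1), ("a", 1)], lambda x, y: x + y)
# counts == {"a": 2, "b": 1}
```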
5. Which backend is recommended in Flink for jobs with very large states?
○ A. MemoryStateBackend
○ B. FsStateBackend
○ C. RocksDBStateBackend
○ D. ExternalBackend
6. What does the Flink JobManager do?
○ A. Executes tasks
○ B. Manages resources and schedules jobs
○ C. Stores data
○ D. Monitors clusters
7. Which Spark API is used for real-time stream processing?
○ A. Spark SQL
○ B. MLlib
○ C. Spark Streaming
○ D. GraphX
8. What does a Flink DataStream represent?
○ A. A collection of batch data
○ B. An immutable collection of stream data
○ C. A collection of real-time processing tasks
○ D. A set of graphs
9. What is the default state backend for storing small states in Flink?
○ A. FsStateBackend
○ B. RocksDBStateBackend
○ C. MemoryStateBackend
○ D. SQLBackend
10. What mechanism does Flink use to handle out-of-order data?
○ A. Checkpoints
○ B. Watermarks
○ C. RDD lineage
○ D. Fault-tolerant framework
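Question 10's watermark mechanism can be sketched in plain Python, assuming a bounded-out-of-orderness strategy (the watermark trails the highest timestamp seen by a fixed lateness bound; an event whose timestamp falls below the current watermark is considered late):

```python
def watermarks(events, max_lateness):
    """For a stream of (id, timestamp) tuples, return (id, is_late)
    flags using a bounded-out-of-orderness watermark."""
    watermark = float("-inf")
    out = []
    for eid, ts in events:
        late = ts < watermark          # arrived behind the watermark?
        watermark = max(watermark, ts - max_lateness)
        out.append((eid, late))
    return out

events = [("e1", 10), ("e2", 14), ("e3", 9), ("e4", 20)]
marked = watermarks(events, 2)
# After e2 (ts=14) the watermark is 12, so e3 (ts=9) is flagged late.
```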
11. In Spark, what is used to trigger computation in an RDD?
○ A. Transformation
○ B. Action
○ C. Control operation
○ D. Job submission
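The transformation/action split in question 11 comes down to lazy evaluation: transformations (map, filter) only build a lineage of pending work, and nothing executes until an action (collect, count) forces it. Python generators give a rough stand-in for this behavior, though Spark additionally records the lineage for fault recovery:

```python
data = range(1, 6)

# "Transformations": lazy generators -- nothing has been computed yet.
mapped = (x * x for x in data)
filtered = (x for x in mapped if x > 5)

# "Action": list() forces the whole pipeline to evaluate, like collect().
result = list(filtered)
# result == [9, 16, 25]
```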
12. Which window type in Flink is defined by a gap of inactivity between events (the session gap)?
○ A. Tumbling window
○ B. Sliding window
○ C. Session window
○ D. Count window
13. What is the main goal of checkpointing in Flink?
○ A. To optimize queries
○ B. To save state in case of failure
○ C. To store data permanently
○ D. To prevent latency issues
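The checkpointing idea in question 13 can be sketched with a toy stateful operator (a conceptual model only; Flink's real checkpoints are coordinated, asynchronous snapshots written to durable storage): periodically snapshot the operator state so that, after a failure, processing can resume from the last snapshot instead of from scratch.

```python
import copy

class CheckpointedCounter:
    """Toy stateful operator with snapshot/restore."""
    def __init__(self):
        self.state = {}
        self.checkpoint = {}

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1

    def snapshot(self):
        # Persist a consistent copy of the state.
        self.checkpoint = copy.deepcopy(self.state)

    def restore(self):
        # On failure, roll state back to the last snapshot.
        self.state = copy.deepcopy(self.checkpoint)

op = CheckpointedCounter()
op.process("a"); op.process("a"); op.process("b")
op.snapshot()                  # checkpoint holds {"a": 2, "b": 1}
op.process("c")                # state diverges after the checkpoint
op.restore()                   # "failure": roll back to the snapshot
# op.state == {"a": 2, "b": 1}
```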
14. Which operation in Spark converts an RDD into a new RDD based on a
user-defined function?
○ A. map()
○ B. collect()
○ C. reduce()
○ D. saveAsTextFile()
15. Which Flink window type processes data without overlap?
○ A. Tumbling window
○ B. Sliding window
○ C. Session window
○ D. Count window
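The non-overlapping property of tumbling windows (question 15) follows from how a timestamp is assigned to exactly one window; a minimal sketch of that assignment rule, with window boundaries aligned to multiples of the window size:

```python
def tumbling_window(ts, size):
    """Return the (start, end) of the single tumbling window of
    width `size` that contains timestamp ts. Each ts falls in
    exactly one window, so windows never overlap."""
    start = ts - (ts % size)
    return (start, start + size)

tumbling_window(17, 5)  # -> (15, 20)
```

A sliding window, by contrast, advances by a slide smaller than its size, so one timestamp can belong to several overlapping windows.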
True/False Questions: