Serverless Data Processing With Dataflow - Foundations
------------------QUIZ 1
What is the Beam Portability Framework?
- A set of protocols for executing pipelines
- A language-agnostic way to represent pipelines
Which of the following are benefits of Beam Portability? (Select ALL that apply)
- Implement new Beam transforms using a language of choice and utilize these transforms from other languages
- Cross-language transforms
- Running pipelines authored in any SDK on any runner
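As an illustration, a minimal sketch of running the Python wordcount example on Dataflow with Runner v2, Dataflow's implementation of the portable runner; $PROJECT, $REGION, and $BUCKET are assumed placeholders, not part of the quiz:
python3 -m apache_beam.examples.wordcount \
--input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://$BUCKET/results/output \
--runner DataflowRunner \
--project $PROJECT \
--region $REGION \
--temp_location gs://$BUCKET/tmp/ \
--experiments use_runner_v2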
------------------QUIZ 2
The Dataflow Shuffle service is available only for batch jobs.
- True
What are the benefits of Dataflow Streaming Engine? Select ALL that apply:
- Reduced consumption of worker CPU, memory, and storage
- More responsive autoscaling for incoming data variations
- Lower resource and quota consumption
Which of the following are TRUE about Flexible Resource Scheduling? (Select ALL that apply)
- FlexRS helps to reduce batch processing costs by using advanced scheduling techniques
- When you submit a FlexRS job, the Dataflow service places the job into a queue and submits it for execution within 6 hours from job creation.
- FlexRS leverages a mix of preemptible and normal VMs
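For reference, a minimal sketch of launching the wordcount example as a FlexRS batch job; the --flexrs_goal flag requests the delayed, cost-optimized scheduling described above, and $PROJECT, $REGION, and $BUCKET are assumed placeholders:
python3 -m apache_beam.examples.wordcount \
--input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://$BUCKET/results/output \
--runner DataflowRunner \
--project $PROJECT \
--region $REGION \
--temp_location gs://$BUCKET/tmp/ \
--flexrs_goal COST_OPTIMIZED
(For streaming jobs, Streaming Engine can likewise be requested with the --enable_streaming_engine flag.)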
------------------QUIZ 3
You want to run the following command:
gcloud dataflow jobs cancel 2021-01-31_14_30_00-9098096469011826084 --region=$REGION
Which of these roles can be assigned to you for the command to work?
- Dataflow Admin
- Dataflow Developer
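Both roles carry the dataflow.jobs.cancel permission. A sketch of granting one of them with gcloud ($PROJECT_ID and the member email are assumed placeholders):
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="user:developer@example.com" \
--role="roles/dataflow.developer"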
Your project’s current SSD usage is 100 TB. You want to launch a streaming pipeline
with shuffle done on the VM. You set the initial number of workers to 5 and the
maximum number of workers to 100. What will be your project’s SSD usage when the
job launches?
- 140 TB
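(A streaming job that shuffles on worker VMs deploys one 400 GB Persistent Disk per worker in the maximum pool, so the job adds 100 × 400 GB = 40 TB, bringing usage to 100 TB + 40 TB = 140 TB.)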
------------------QUIZ 4
You are a Beam developer for a university in Googleville. Googleville law mandates
that all student data is kept within Googleville. Compute Engine resources can be
launched in Googleville; the region name is google-world1. Dataflow, however, does
not currently have a regional endpoint set up in google-world1. Which flags are
needed in the following command to allow you to launch a Dataflow job and to
conform with Googleville’s law?
python3 -m apache_beam.examples.wordcount \
--input gs://dataflow-samples/shakespeare/kinglear.txt \
- --region (set to any available Dataflow regional endpoint) and --worker_region=google-world1
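For reference, a completed command with those flags might look like the sketch below; $BUCKET, $PROJECT, and the choice of us-central1 as the regional endpoint are assumptions, not part of the quiz:
python3 -m apache_beam.examples.wordcount \
--input gs://dataflow-samples/shakespeare/kinglear.txt \
--output gs://$BUCKET/results/output \
--runner DataflowRunner \
--project $PROJECT \
--temp_location gs://$BUCKET/tmp/ \
--region us-central1 \
--worker_region google-world1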
Your project’s current In-use IP address usage is 500/575. You run the following
command:
python3 -m apache_beam.examples.wordcount \
--input gs://dataflow-samples/shakespeare/kinglear.txt \
--subnetwork regions/$REGION/subnetworks/$SUBNETWORK \
What will be the in-use IP address usage after the job starts?
- 500/575
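(The In-use IP addresses quota counts only external IPs. A job whose workers run on a subnetwork with public IPs disabled, via the Python SDK flag --no_use_public_ips, presumably among the elided flags here, uses internal IPs only, so the quota reading stays at 500/575.)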