The document discusses the Beam Portability Framework and Dataflow Streaming Engine. It provides quiz questions about Beam, Dataflow batch and streaming jobs, Flexible Resource Scheduling, and launching Dataflow jobs with specific region and resource requirements.


Serverless Data Processing with Dataflow: Foundations

------------------QUIZ 1
What is the Beam Portability Framework?
- A set of protocols for executing pipelines
- A language-agnostic way to represent pipelines

Which of the following are benefits of Beam Portability? (Select ALL that apply)
- Implement new Beam transforms using a language of choice and utilize these
transforms from other languages
- Cross-language transforms
- Running pipelines authored in any SDK on any runner

------------------QUIZ 2
The Dataflow Shuffle service is available only for batch jobs.
- True

What are the benefits of Dataflow Streaming Engine? Select ALL that apply:
- Reduced consumption of worker CPU, memory, and storage
- More responsive autoscaling for incoming data variations
- Lower resource and quota consumption

Which of the following are TRUE about Flexible Resource Scheduling? (Select ALL that
apply)
- FlexRS helps to reduce batch processing costs by using advanced scheduling
techniques
- When you submit a FlexRS job, the Dataflow service places the job into a queue
and submits it for execution within 6 hours from job creation.
- FlexRS leverages a mix of preemptible and normal VMs

------------------QUIZ 3
You want to run the following command:
gcloud dataflow jobs cancel 2021-01-31_14_30_00-9098096469011826084 --region=$REGION

Which of these roles can be assigned to you for the command to work?
- Dataflow Admin
- Dataflow Developer
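Both of these roles carry the permission needed to cancel a job. As a sketch (assuming you have permission to manage IAM on the project, and using a placeholder user email), either role could be granted like this:

```shell
# Grant the Dataflow Developer role (roles/dataflow.developer) to a user.
# "you@example.com" is a placeholder; roles/dataflow.admin would also work
# for cancelling jobs.
gcloud projects add-iam-policy-binding $PROJECT \
  --member="user:you@example.com" \
  --role="roles/dataflow.developer"
```

Dataflow Viewer, by contrast, grants read-only access and cannot cancel jobs.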

Your project’s current SSD usage is 100 TB. You want to launch a streaming pipeline
with shuffle done on the VM. You set the initial number of workers to 5 and the
maximum number of workers to 100. What will be your project’s SSD usage when the
job launches?
- 140 TB
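The arithmetic behind this answer can be sketched as follows, assuming Dataflow's documented defaults: a streaming job that shuffles on the worker VMs (i.e., without Streaming Engine) attaches a 400 GB Persistent Disk to each worker, and disks are allocated for the MAXIMUM worker count at launch, not the initial count:

```shell
# Project SSD usage after launching the streaming job.
current_tb=100            # existing SSD usage
max_workers=100           # disks are provisioned for max_num_workers
disk_gb_per_worker=400    # default PD size for streaming, shuffle on VM
added_tb=$(( max_workers * disk_gb_per_worker / 1000 ))
echo "$(( current_tb + added_tb )) TB"   # prints "140 TB"
```

Note that the initial worker count of 5 plays no role: all 100 disks are allocated up front so the job can scale without re-provisioning storage.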

------------------QUIZ 4
You are a Beam developer for a university in Googleville. Googleville law mandates
that all student data is kept within Googleville. Compute Engine resources can be
launched in Googleville; the region name is google-world1. Dataflow, however, does
not currently have a regional endpoint set up in google-world1. Which flags are
needed in the following command to allow you to launch a Dataflow job and to
conform with Googleville’s law?
python3 -m apache_beam.examples.wordcount \
  --input gs://dataflow-samples/shakespeare/kinglear.txt \
  --output gs://$BUCKET/results/outputs --runner DataflowRunner \
  --project $PROJECT --temp_location gs://$BUCKET/tmp/ \
Answer
- --region northamerica-northeast1 --worker_region google-world1
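Putting the answer flags together, the full launch command would look something like the sketch below. The key idea is that --region selects the Dataflow regional endpoint (which manages job metadata and must be a region where Dataflow has an endpoint, e.g. northamerica-northeast1), while --worker_region pins the worker VMs, and therefore the student data being processed, to google-world1:

```shell
python3 -m apache_beam.examples.wordcount \
  --input gs://dataflow-samples/shakespeare/kinglear.txt \
  --output gs://$BUCKET/results/outputs --runner DataflowRunner \
  --project $PROJECT --temp_location gs://$BUCKET/tmp/ \
  --region northamerica-northeast1 \
  --worker_region google-world1
```

Setting --region google-world1 alone would fail, since there is no Dataflow regional endpoint there.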

Your project’s current In-use IP address usage is 500/575. You run the following
command:
python3 -m apache_beam.examples.wordcount \
  --input gs://dataflow-samples/shakespeare/kinglear.txt \
  --output gs://$BUCKET/results/outputs --runner DataflowRunner \
  --project $PROJECT --temp_location gs://$BUCKET/tmp/ --region $REGION \
  --subnetwork regions/$REGION/subnetworks/$SUBNETWORK \
  --num_workers 20 --machine_type n1-standard-4 --no_use_public_ips \

What will be the in-use IP address usage after the job starts?

Answer
- 500/575
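The reason the quota does not move is the --no_use_public_ips flag: the workers launch with internal IPs only (on the given subnetwork), so no external, in-use IP addresses are consumed. One hedged way to sanity-check this, assuming $REGION is set and gcloud is authenticated, is to inspect the region's IN_USE_ADDRESSES quota before and after launch:

```shell
# Show the region's in-use external IP address quota and current usage.
# With --no_use_public_ips the "usage" value should stay at 500.
gcloud compute regions describe $REGION --format=json | grep -A2 IN_USE_ADDRESSES
```

Without the flag, the 20 n1-standard-4 workers would each take an external IP, pushing usage to 520/575.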
