Question 8

The document discusses a question from Google's Professional Cloud Architect exam regarding the most cost-effective way to analyze telemetry data from TerramEarth's 20 million vehicles. Various options are presented, with a consensus leaning towards option D, which involves launching a cluster in each region to preprocess and compress data before moving it to a regional bucket for final analysis. The discussion highlights the importance of minimizing data transfer costs and optimizing processing time.



Exam Professional Cloud Architect Topic 8 Question 8 Discussion

Actual exam question from Google's Professional Cloud Architect


Question #: 8
Topic #: 8
[All Professional Cloud Architect Questions]

TerramEarth's 20 million vehicles are scattered around the world. Based on the vehicle's location, its telemetry data
is stored in a Google Cloud Storage (GCS) regional bucket (US, Europe, or Asia). The CTO has asked you to run a
report on the raw telemetry data to determine why vehicles are breaking down after 100K miles. You want to run
this job on all the data.
What is the most cost-effective way to run this job?

A. Move all the data into 1 zone, then launch a Cloud Dataproc cluster to run the job

B. Move all the data into 1 region, then launch a Google Cloud Dataproc cluster to run the job

C. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a multi-region bucket and use a Dataproc cluster to finish the job

D. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a region
bucket and use a Cloud Dataproc cluster to finish the job


by JoeShmoe at Nov. 15, 2019, 10:29 a.m.

Comments


cetanx Highly Voted 3 years, 9 months ago

I will look at it from a different perspective:
A and B say "move all data", but the analysis is trying to explain breakdowns after 100K miles, so there is no point in transferring data for vehicles with less than 100K mileage. Transferring all the data is just a waste of time and money.

One thing is for sure here: if we move/copy data between continents it will cost us money, so compressing the data before copying it to another region/continent makes sense. Preprocessing also makes sense, because we probably want to work with smaller chunks of data first (remember the 100K mileage criterion).
Now, the type of target bucket: multi-region or regional? Multi-region is good for high availability and low latency at a little more cost, but the question doesn't require either of those features. Therefore I think the regional option is the way to go, given that lower cost is always better.

So my answer would be D
upvoted 66 times

DiegoQ 3 years, 6 months ago

I totally agree with you, and I think what confuses people here is the phrase "run a report on the raw data". Preprocessing doesn't necessarily mean transforming the raw data; it could just mean selecting the data that you need (as you said, filtering on the 100K mileage).
upvoted 2 times
mrhege 3 years ago
You will need data from non-broken machines too, for labelling.
upvoted 1 times

stfnz 11 months, 1 week ago

Yes, and you will still only be interested in the 100K+ mileage vehicles, whether broken or not.
upvoted 1 times
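To make the flow the thread favors concrete, here is a minimal PySpark sketch of the per-region preprocessing step cetanx describes. The bucket names, the CSV layout, and the mileage field are illustrative assumptions rather than anything given in the question; Dataproc is one natural place to run a Spark job like this in each region.

```python
# Per-region preprocessing sketch (PySpark, e.g. on a regional Dataproc cluster).
# Bucket names and the telemetry schema are assumptions for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("telemetry-preprocess-us").getOrCreate()

# Read this region's raw telemetry from its regional bucket.
raw = spark.read.option("header", True).csv("gs://telemetry-us-raw/*.csv")

# Keep only vehicles at or past the 100K-mile mark; rows for other
# vehicles never need to cross a continent.
relevant = raw.filter(F.col("mileage").cast("long") >= 100_000)

# Write compact, Snappy-compressed Parquet; columnar, compressed output
# is what actually shrinks the egress bill when it is copied out.
relevant.write.mode("overwrite").parquet("gs://telemetry-us-preprocessed/")
```

Once a job like this has run in the US, Europe, and Asia, the much smaller Parquet output can be copied into the single regional bucket (for example with gsutil or the Storage Transfer Service), and a final Dataproc cluster in that region finishes the report, which is option D's flow.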
JoeShmoe Highly Voted 4 years, 5 months ago
D is the most cost-effective, and Dataproc is regional.
upvoted 32 times

nitinz 3 years, 1 month ago

It is D.
upvoted 1 times
Rafaa 3 years, 10 months ago
Hold on guys, you do not need to 'preprocess' the data. This rules out C and D.
upvoted 2 times

guid1984 3 years, 2 months ago

Why not? It's raw data, so it can be preprocessed for optimization.
upvoted 2 times
passnow 4 years, 4 months ago
Dataproc can use global endpoints too.
upvoted 1 times

tartar 3 years, 8 months ago

D is ok
upvoted 11 times
passnow 4 years, 4 months ago
Honestly, if we read the question carefully and factor in cost, D is the better option.
upvoted 2 times

vindahake 4 years, 1 month ago

I think running additional compute in every region will be more expensive than paying the data transfer charges and processing everything centrally.
upvoted 4 times
msahdra Most Recent 4 months, 3 weeks ago
Selected Answer: C
While regional preprocessing can be efficient, moving the data back to regional buckets after compression defeats the purpose of a multi-region bucket. It adds unnecessary data transfer costs and reduces the availability of the preprocessed data for global analysis.
upvoted 2 times
thewalker 5 months, 1 week ago
D
Considering https://cloud.google.com/storage/docs/locations#considerations
upvoted 2 times
Jeena345 1 year, 2 months ago
Selected Answer: D
D should be fine
upvoted 1 times
omermahgoub 1 year, 3 months ago
Answer is C
To run the report on all of TerramEarth's raw telemetry data in the most cost-effective way, it would be best to launch a cluster in each region to preprocess and compress the raw data. This lets you process the data in place, which minimizes the amount of data that needs to be transferred between regions. After the data has been preprocessed and compressed, you can move it into a multi-region bucket and use a Dataproc cluster to finish the job.
upvoted 2 times

omermahgoub 1 year, 3 months ago

D, moving the data into a region bucket and using a Cloud Dataproc cluster to finish the job, would also not be as cost-effective as moving the data into a multi-region bucket, as it would not take advantage of the lower costs of storing data in a multi-region bucket.
upvoted 1 times
megumin 1 year, 5 months ago
Selected Answer: D
ok for D
upvoted 1 times
Mahmoud_E 1 year, 6 months ago
Selected Answer: D
D seems better
upvoted 1 times
AMohanty 1 year, 8 months ago
What is the use of multi-regional Dataproc if your storage data is regional?
upvoted 2 times
AzureDP900 1 year, 9 months ago
D is fine. There is no need for multi-region as mentioned in C; D is right in my opinion.
upvoted 2 times
vincy2202 2 years, 4 months ago
Selected Answer: D
D is the correct answer. A regional bucket is required, since a multi-regional bucket will incur additional cost to transfer the data to a centralized location.
upvoted 2 times
vincy2202 2 years, 4 months ago
D seems to be the correct answer
upvoted 1 times
joe2211 2 years, 4 months ago
Selected Answer: D
vote D
upvoted 2 times
MaxNRG 2 years, 6 months ago
D: Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster.
Egress rates are what matter most. Transfer is free inside a region, so it makes sense to move all the data into one region for processing (from all continents). Cross-region transfer costs $0.01 per GB, and inter-continent transfer costs $0.12 per GB.
Considering just option B (moving all raw data into one region), the monthly volume alone would cost: 900 TB (all 20M units daily) x 30 days x $0.12/GB = $3.24M, just for data transfer. So it definitely makes sense to preprocess/compress the data per region and then move it into one region for the final analysis; that can save 10-100x on egress costs. Another important aspect is processing time: running the preprocessing in parallel across all regions accelerates the overall analysis. Faster results mean faster in-field improvements.
There is also an interesting video about price optimization in GCP (the first 11.5 minutes are about Storage/Network), and see https://cloud.google.com/storage/docs/locations#considerations
upvoted 6 times
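As a sanity check on the arithmetic above (using decimal units, 1 TB = 1,000 GB), with the 10x size reduction from filtering and compression treated as an assumption in line with the 10-100x estimate in that comment:

```latex
% Option B: ship all raw telemetry (900 TB/day) to one region, monthly.
\[
9\times10^{5}\ \tfrac{\text{GB}}{\text{day}} \times 30\ \text{days}
\times \frac{\$0.12}{\text{GB}} = \$3.24\text{M per month}
\]
% Option D: preprocess and compress first; assuming a 10x size
% reduction (an illustrative figure, not given in the question):
\[
\$3.24\text{M} / 10 \approx \$0.32\text{M per month}
\]
```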
victory108 2 years, 9 months ago
D. Launch a cluster in each region to preprocess and compress the raw data, then move the data into a region bucket and use a Cloud Dataproc cluster to finish the job
upvoted 1 times
MamthaSJ 2 years, 9 months ago
Answer is D
upvoted 3 times
Yogikant 2 years, 10 months ago
Answer D:

Moving data from one region to another incurs network egress cost; compressing the data before moving it reduces this cost. Running Dataproc preprocessing in each region incurs additional cost, but it also reduces the cost of running the final Dataproc job on the preprocessed data, offsetting the extra cost of the regional clusters.
upvoted 1 times
