
Professional Data Engineer on Google Cloud Platform

Google Professional-Data-Engineer

Version Demo

Total Demo Questions: 10

Total Premium Questions: 173


Buy Premium PDF

https://dumpsarena.com

[email protected]
QUESTION NO: 1

You are building a model to predict whether or not it will rain on a given day. You have thousands of input features and want
to see if you can improve training speed by removing some features while having a minimal effect on model accuracy.
What can you do?

A. Eliminate features that are highly correlated to the output labels.

B. Combine highly co-dependent features into one representative feature.

C. Instead of feeding in each feature individually, average their values in batches of 3.

D. Remove the features that have null values for more than 50% of the training records.

ANSWER: B
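
For reference, a minimal Python sketch of the idea in option B, combining a group of highly co-dependent features into one representative feature with PCA; the file and column names are hypothetical.

import pandas as pd
from sklearn.decomposition import PCA

# Hypothetical training data with several highly correlated humidity readings.
df = pd.read_csv("weather_features.csv")
correlated = ["humidity_9am", "humidity_noon", "humidity_3pm"]

# Collapse the co-dependent columns into a single representative feature.
pca = PCA(n_components=1)
df["humidity_combined"] = pca.fit_transform(df[correlated]).ravel()

# Train on fewer inputs by dropping the original columns.
df = df.drop(columns=correlated)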

QUESTION NO: 2

Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday
season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They
are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine
learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial
load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

A. Redefine the schema by evenly distributing reads and writes across the row space of the table.

B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.

C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.

D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.

ANSWER: A
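
For reference, a minimal Python sketch of the row-key idea behind option A, spreading writes across the key space instead of using monotonically increasing IDs; the project, instance, table, and column-family names are hypothetical.

import hashlib
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("campaign-instance").table("offer_signals")

def make_row_key(user_id: str, event_ts: str) -> bytes:
    # A short hash prefix keeps keys for consecutive users on different
    # tablet ranges, avoiding hotspots during the 10 TB initial load.
    prefix = hashlib.md5(user_id.encode()).hexdigest()[:4]
    return f"{prefix}#{user_id}#{event_ts}".encode()

row = table.direct_row(make_row_key("user-42", "2024-12-01T10:15:00Z"))
row.set_cell("signals", "offer_viewed", "offer-123")
row.commit()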

QUESTION NO: 3

You used Cloud Dataprep to create a recipe on a sample of data in a BigQuery table. You want to reuse this recipe on a
daily upload of data with the same schema, after the load job with variable execution time completes. What should you do?

A. Create a cron schedule in Cloud Dataprep.

B. Create an App Engine cron job to schedule the execution of the Cloud Dataprep job.

C. Export the recipe as a Cloud Dataprep template, and create a job in Cloud Scheduler.

D. Export the Cloud Dataprep job as a Cloud Dataflow template, and incorporate it into a Cloud Composer job.

ANSWER: C
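
For reference, a hedged Python sketch of the mechanics behind option D, launching an exported Cloud Dataflow template once the variable-length load job has completed; the job, bucket, and template names are hypothetical.

import subprocess

# Launch the exported Dataflow template; in option D this call would be
# wrapped in a Cloud Composer task that runs after the load job finishes.
subprocess.run(
    [
        "gcloud", "dataflow", "jobs", "run", "daily-dataprep-run",
        "--gcs-location=gs://my-bucket/templates/dataprep_recipe_template",
        "--region=us-central1",
    ],
    check=True,
)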

QUESTION NO: 4

You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a
storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table?

A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

B. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the
time of the backup. In the event of an emergency, use the backup copy of the table.

C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the
data.

D. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with
the time of the backup. In the event of an emergency, use the backup copy of the table.

ANSWER: B
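
For reference, a minimal Python sketch of the backup step in options B and D, copying the table to a time-suffixed backup table; in practice this would run as a BigQuery scheduled query, and the project, dataset, and table names are hypothetical.

from datetime import datetime, timezone
from google.cloud import bigquery

client = bigquery.Client()
suffix = datetime.now(timezone.utc).strftime("%Y%m%d")
source = "my-project.analytics.events"
backup = f"my-project.analytics_backup.events_{suffix}"   # backup dataset assumed to exist

# Copy the table and wait for the copy job to finish.
client.copy_table(source, backup).result()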

QUESTION NO: 5

You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format
is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to
replicate some data to the cluster’s local Hadoop Distributed File System (HDFS) to maximize performance. What are two
ways to start using Hive in Cloud Dataproc? (Choose two.)

A. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.

B. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount
the Hive tables locally.

C. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster.
Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.

D. Leverage Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate external Hive
tables to the native ones.

E. Load the ORC files into BigQuery. Leverage BigQuery connector for Hadoop to mount the BigQuery tables as external
Hive tables. Replicate external Hive tables to the native ones.

ANSWER: B C
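
For reference, a hedged Python sketch of the copy step described in option C, run on the Dataproc master node; the bucket and paths are hypothetical.

import os
import subprocess

# Pull the ORC files from Cloud Storage onto the master node's local disk...
os.makedirs("/tmp/orc", exist_ok=True)
subprocess.run(["gsutil", "-m", "cp", "gs://my-bucket/orc/*.orc", "/tmp/orc/"], check=True)

# ...then push them into HDFS so the Hive tables can be mounted from there.
subprocess.run(["hadoop", "fs", "-mkdir", "-p", "/warehouse/orc"], check=True)
subprocess.run(["hadoop", "fs", "-put", "/tmp/orc/", "/warehouse/orc/"], check=True)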

QUESTION NO: 6

Your company produces 20,000 files every hour. Each data file is formatted as a comma separated values (CSV) file that is
less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a
200 ms latency to Google Cloud, and your Internet connection bandwidth is limited to 50 Mbps. You currently deploy a
secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client
runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day
available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even
though the bandwidth utilization is rather low.

You are told that due to seasonality, your company expects the number of files to double for the next three months. Which
two actions should you take? (Choose two.)

A. Introduce data compression for each file to increase the rate of file transfer.

B. Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.

C. Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.

D. Assemble 1,000 files into a tape archive (TAR) file. Transmit the TAR files instead, and disassemble the CSV files in the
cloud upon receiving them.

E. Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer
on-premises data to the designated storage bucket.

ANSWER: C E
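
For reference, a minimal Python sketch of option C, replacing the SFTP hop with a parallel gsutil copy into a Cloud Storage bucket; the local path and bucket name are hypothetical.

import subprocess

# The -m flag uploads the many small CSV files concurrently, which matters
# far more here than raw bandwidth, given the 200 ms round-trip latency.
subprocess.run(
    ["gsutil", "-m", "cp", "-r", "/data/outgoing/csv/", "gs://ingest-bucket/incoming/"],
    check=True,
)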

QUESTION NO: 7

You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a
BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance
type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion,
when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your
pipeline? (Choose two.)

A. Increase the number of max workers

B. Use a larger instance type for your Cloud Dataflow workers

C. Change the zone of your Cloud Dataflow pipeline to run in us-central1

D. Create a temporary table in Cloud Bigtable that will act as a buffer for new data. Create a new step in your pipeline to
write to this table first, and then create a new pipeline to write from Cloud Bigtable to BigQuery

E. Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to
write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery

ANSWER: B E
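
For reference, a hedged Python sketch of the tuning knobs named in options A and B, expressed as pipeline options for an Apache Beam pipeline on Dataflow; the project, bucket, and machine type are hypothetical.

from apache_beam.options.pipeline_options import PipelineOptions

# Raise the autoscaling ceiling and use a larger worker machine type,
# then pass these options to beam.Pipeline(options=options).
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=europe-west4",
    "--temp_location=gs://my-bucket/tmp",
    "--max_num_workers=10",
    "--worker_machine_type=n1-standard-4",
])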

QUESTION NO: 8

You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by
your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the
initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will
build machine learning models based on this data. You want to maximize performance and usability for your data science
team. Which two strategies should you adopt? (Choose two.)

A. Denormalize the data as much as possible.

B. Preserve the structure of the data as much as possible.

C. Use BigQuery UPDATE to further reduce the size of the dataset.

D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.

E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery’s support for
external data sources to query.

ANSWER: D E
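
For reference, a hedged Python sketch of option D, appending status changes as new rows and reading the latest status per transaction at query time; the table and column names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()
latest_status_sql = """
SELECT * EXCEPT(rn)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY transaction_id ORDER BY updated_at DESC) AS rn
  FROM `my-project.payments.transaction_status_events`
)
WHERE rn = 1
"""
# Each status change is appended as a new row; this query returns only the
# most recent row per transaction.
rows = client.query(latest_status_sql).result()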

QUESTION NO: 9

You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user
base could grow exponentially, but you do not want to manage infrastructure scaling.

Which Google database service should you use?

A. Cloud SQL

B. BigQuery

C. Cloud Bigtable

D. Cloud Datastore

ANSWER: A

QUESTION NO: 10

You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will
account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data
periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a
different environment. You want to archive these snapshots for a long time. Which two methods can accomplish this?
(Choose two.)

A. Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.

B. Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for
that export.

C. Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary
export files.

D. Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table
row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each
row. Make sure that the BigQuery table is partitioned using the export timestamp column.

E. Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a
JSON file. Apply compression before storing the data in Cloud Source Repositories.

ANSWER: C E
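
For reference, a hedged Python sketch of the managed-export step shared by options A and C: export Datastore entities to Cloud Storage, then load one kind's export metadata file into BigQuery; the bucket, kind, and paths are hypothetical.

import subprocess
from google.cloud import bigquery

# Managed export to a Cloud Storage bucket (for option A the bucket would
# use the Nearline or Coldline storage class).
subprocess.run(
    ["gcloud", "datastore", "export", "gs://telemetry-archive/exports/2024-12-01"],
    check=True,
)

# For option C, load the export into a BigQuery table created for this snapshot.
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.DATASTORE_BACKUP
)
client.load_table_from_uri(
    "gs://telemetry-archive/exports/2024-12-01/all_namespaces/kind_Telemetry/"
    "all_namespaces_kind_Telemetry.export_metadata",
    "my-project.telemetry.snapshot_2024_12_01",
    job_config=job_config,
).result()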
