DevOps - GCP - Final

The document outlines best practices and recommendations for managing applications on Google Cloud, specifically focusing on DevOps and Site Reliability Engineering (SRE) principles. It covers various scenarios including performance monitoring, incident management, logging, deployment strategies, and capacity planning. Each section presents multiple-choice questions with explanations to guide decision-making in cloud environments.


Professional Cloud DevOps – Google Cloud

1. You support a Node.js application running on Google Kubernetes Engine (GKE) in


production. The application makes several HTTP requests to dependent applications.
You want to anticipate which dependent applications might cause performance
issues. What should you do?
a. Instrument all applications with Stackdriver Profiler. Cloud Profiler provides continuous CPU and heap profiling to improve performance and reduce costs; it does not show inter-service request latency.
b. Instrument all applications with Stackdriver Trace and review inter-service
HTTP requests. When it says “make several requests to dependent app” you’ll
need traces. Cloud Trace is usually used to find performance bottlenecks in
production. HTTP requests -> Trace.
Cloud Trace, a distributed tracing system for Google Cloud, helps you
understand how long it takes your application to handle incoming requests from
users or other applications, and how long it takes to complete operations like
RPC calls performed when handling the requests.
c. Use Stackdriver Debugger to review the execution of logic within each
application to instrument all applications.
d. Modify the Node.js application to log HTTP request and response times to dependent applications. Use Stackdriver Logging to find dependent applications that are performing poorly. Logging every request and response time is not a best practice; it adds overhead and makes the logs heavier.

Note: the Stackdriver products have been rebranded; Stackdriver Monitoring is now Cloud Monitoring and Stackdriver Profiler is now Cloud Profiler, both part of Google Cloud's operations suite.
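Illustrative only: the question's service is Node.js, but the same idea in Python. A hedged sketch assuming the opentelemetry-sdk, opentelemetry-exporter-gcp-trace, and requests packages, with a hypothetical dependency URL:

import requests
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter

# Send all spans produced by this process to Cloud Trace.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(CloudTraceSpanExporter())
)
tracer = trace.get_tracer(__name__)

def call_dependency(url: str) -> int:
    # Each outbound HTTP call gets its own span, so slow dependencies
    # show up directly in the Cloud Trace waterfall view.
    with tracer.start_as_current_span("call-dependency") as span:
        span.set_attribute("http.url", url)
        resp = requests.get(url, timeout=5)
        span.set_attribute("http.status_code", resp.status_code)
        return resp.status_code

call_dependency("https://dependency.example.com/api")  # hypothetical dependency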

2. You created a Stackdriver chart for CPU utilization in a dashboard within your
workspace project. You want to share the chart with your Site Reliability Engineering
(SRE) team only. You want to ensure you follow the principle of least privilege. What
should you do?
a. Share the workspace Project ID with the SRE team. Assign the SRE team the
Monitoring Viewer IAM role in the workspace project. You don’t want to share
the whole project (least privilege).
b. Share the workspace Project ID with the SRE team. Assign the SRE team the
Dashboard Viewer IAM role in the workspace project. You don’t want to share
the whole project (least privilege).
c. Click “Share chart by URL” and provide the URL to the SRE team. Assign the SRE team the Monitoring Viewer IAM role in the workspace project. The role exists, and it follows the least-privilege principle by sharing just the chart.
d. Click “Share chart by URL” and provide the URL to the SRE team. Assign the SRE team the Dashboard Viewer IAM role in the workspace project. There is no Dashboard Viewer IAM role; the closest is Monitoring Dashboard Configuration Viewer.
3. Your organization wants to implement Site Reliability Engineering (SRE) culture and
principles. Recently, a service that you support had a limited outage. A manager on
another team asks you to provide a formal explanation of what happened so they can
action remediations. What should you do?
a. Develop a postmortem that includes the root causes, resolution, lessons
learned, and a prioritized list of action items. Share it with the manager only.
b. Develop a postmortem that includes the root causes, resolution, lessons learned, and a prioritized list of action items. Share it on the engineering organization's document portal. You share it with the whole organization, not just the manager; everyone involved must learn how to work better.
c. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it with the manager only. SRE postmortems are blameless and don't single out specific people, even if they were part of the incident.
d. Develop a postmortem that includes the root causes, resolution, lessons learned, the list of people responsible, and a list of action items for each person. Share it on the engineering organization's document portal. SRE postmortems are blameless and don't single out specific people, even if they were part of the incident.
4. You have a set of applications running on a Google Kubernetes Engine (GKE) cluster,
and you are using Stackdriver Kubernetes Engine Monitoring. You are bringing a new
containerized application required by your company into production. This application
is written by a third party and cannot be modified or reconfigured. The application
writes its log information to /var/log/app_messages.log, and you want to send these
log entries to Stackdriver Logging. What should you do?
a. Use the default Stackdriver Kubernetes Engine Monitoring agent
configuration.
b. Deploy a Fluentd daemonset to GKE. Then create a customized input and
output configuration to tail the log file in the application's pods and write to
Stackdriver Logging. To collect log entries from a specific file within each node
in your GKE cluster, you can use a DaemonSet, Fluentd is often used as the
logging agent for log forwarding in GKE.
GKE's default logging agent provides a managed solution to deploy and manage the agents that send the logs for your clusters to Cloud Logging. Depending on your GKE cluster master version, either fluentd or fluentbit is used to collect logs. Common use cases include:
- Collecting additional logs not written to STDOUT or STDERR.
- Customized log formatting.
c. Install Kubernetes on Google Compute Engine (GCE) and redeploy your applications. Then customize the built-in Stackdriver Logging configuration to tail the log file in the application's pods and write to Stackdriver Logging. Even if this works, it is not the best option in terms of effort.
d. Write a script to tail the log file within the pod and write entries to standard output. Run the script as a sidecar container with the application's pod. Configure a shared volume between the containers to allow the script to have read access to /var/log in the application container. You have to write and maintain a custom script, even though the question says the application can't be reconfigured.
5. You are running an application in a virtual machine (VM) using a custom Debian image.
The image has the Stackdriver Logging agent installed. The VM has the cloud-platform
scope. The application is logging information via syslog. You want to use Stackdriver
Logging in the Google Cloud Platform Console to visualize the logs. You notice that
syslog is not showing up in the "All logs" dropdown list of the Logs Viewer. What is
the first thing you should do?
a. Look for the agent's test log entry in the Logs Viewer.
b. Install the most recent version of the Stackdriver agent. This has little to do with the problem.
c. Verify the VM service account access scope includes the monitoring.write scope. The monitoring.write scope is for the monitoring agent, not the logging agent.
d. SSH to the VM and execute the following commands on your VM: ps ax | grep fluentd. The first recommended troubleshooting step is to check whether the agent is running; this command verifies that.
6. You use a multiple step Cloud Build pipeline to build and deploy your application to
Google Kubernetes Engine (GKE). You want to integrate with a third-party monitoring
platform by performing a HTTP POST of the build information to a webhook. You want
to minimize the development effort. What should you do?
a. Add logic to each Cloud Build step to HTTP POST the build information to a webhook. Build steps have no built-in attribute for making an HTTP request, and remember you want to minimize the development effort.
b. Add a new step at the end of the pipeline in Cloud Build to HTTP POST the
build information to a webhook. This approach allows you to handle the
integration cleanly without cluttering each individual step with additional logic,
keeping your pipeline simple and maintainable.
c. Use Stackdriver Logging to create a logs-based metric from the Cloud Build
logs. Create an Alert with a Webhook notification type. It would work but it
doesn’t minimize the development effort.
d. Create a Cloud Pub/Sub push subscription to the Cloud Build cloud-builds
PubSub topic to HTTP POST the build information to a webhook.
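As a rough sketch of what that final build step could run (not an official Cloud Build feature; the step would simply invoke a small script inside a Python builder image), assuming hypothetical WEBHOOK_URL, BUILD_ID, and COMMIT_SHA values passed in via substitutions or environment variables:

import json
import os
import urllib.request

# Hypothetical values; in Cloud Build these would come from built-in
# substitutions such as $BUILD_ID and $COMMIT_SHA passed into the step.
payload = {
    "build_id": os.environ.get("BUILD_ID", "unknown"),
    "commit": os.environ.get("COMMIT_SHA", "unknown"),
    "status": "SUCCESS",
}

req = urllib.request.Request(
    os.environ.get("WEBHOOK_URL", "https://monitoring.example.com/hook"),
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req, timeout=10) as resp:
    print("Webhook responded with", resp.status)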
7. You use Spinnaker to deploy your application and have created a canary deployment
stage in the pipeline. Your application has an in-memory cache that loads objects at
start time. You want to automate the comparison of the canary version against the
production version. How should you configure the canary analysis?
a. Compare the canary with a new deployment of the current production
version. You might be tempted to compare the canary deployment against your
current production deployment. Instead, always compare the canary against an
equivalent baseline, deployed at the same time.
The baseline uses the same version and configuration that is currently running
in production, but is otherwise identical to the canary:
• Same time of deployment
• Same size of deployment
• Same type and amount of traffic
b. Compare the canary with a new deployment of the previous production version. This doesn't make much sense.
c. Compare the canary with the existing deployment of the current production
version.
d. Compare the canary with the average performance of a sliding window of
previous production versions.
8. You support a high-traffic web application and want to ensure that the home page
loads in a timely manner. As a first step, you decide to implement a Service Level
Indicator (SLI) to represent home page request latency with an acceptable page load
time set to 100 ms. What is the Google-recommended way of calculating this SLI?
a. Bucketize the request latencies into ranges, and then compute the percentile
at 100 ms.
b. Bucketize the request latencies into ranges, and then compute the median and
90th percentiles.
c. Count the number of home page requests that load in under 100 ms, and then divide by the total number of home page requests. It is recommended to treat the SLI as the ratio of two numbers: the number of good events divided by the total number of events. For example: number of successful HTTP requests / total HTTP requests (success rate).
d. Count the number of home page requests that load in under 100 ms, and then divide by the total number of all web application requests. The SLI should be scoped to home page requests, not to all web application requests.
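A minimal sketch of the recommended good-events/total-events calculation, assuming a hypothetical sample of observed home page latencies:

# Hypothetical sample of home page request latencies in milliseconds.
latencies_ms = [42, 87, 95, 103, 61, 250, 98, 77]

THRESHOLD_MS = 100
good = sum(1 for latency in latencies_ms if latency < THRESHOLD_MS)
sli = good / len(latencies_ms)

print(f"Latency SLI: {sli:.2%} of home page requests served under {THRESHOLD_MS} ms")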
9. You deploy a new release of an internal application during a weekend maintenance
window when there is minimal user traffic. After the window ends, you learn that one
of the new features isn't working as expected in the production environment. After
an extended outage, you roll back the new release and deploy a fix.
You want to modify your release process to reduce the mean time to recovery so you
can avoid extended outages in the future. What should you do? (Choose two.)
a. Before merging new code, require 2 different peers to review the code
changes. That doesn’t automate anything.
b. Adopt the blue/green deployment strategy when releasing new code via a CD
server. It makes sense to use this strategy.
c. Integrate a code linting tool to validate coding standards before any code is accepted into the repository. Linting only checks code format; I assume the coding standards are already implied, and it does not reduce recovery time.
d. Require developers to run automated integration tests on their local development environments before release. It's a good practice, but passing locally doesn't guarantee the code will behave the same in other environments, including production.
e. Configure a CI server. Add a suite of unit tests to your code and have your CI server run them on commit and verify any changes. Since a second option is needed, this covers the rest of the CI/CD pipeline.
10. You have a pool of application servers running on Compute Engine. You need to
provide a secure solution that requires the least amount of configuration and allows
developers to easily access application logs for troubleshooting. How would you
implement the solution on GCP?
a. Deploy the Stackdriver logging agent to the application servers. Give the
developers the IAM Logs Viewer role to access Stackdriver and view logs. It is
the most secure in terms of least privilege principle. Logs Viewer Role. A logging
agent is required to enable the custom logs pushed to Stackdriver. Developers
need only Log Viewer permission, which is enough in this case.
b. Deploy the Stackdriver logging agent to the application servers. Give the developers the IAM Private Logs Viewer role to access Stackdriver and view logs. Private Logs Viewer gives extra access to Data Access logs; it is a superset of the Logs Viewer permission, with elevated permission to view private data in logs, which is more than the developers need.
c. Deploy the Stackdriver monitoring agent to the application servers. Give the
developers the IAM Monitoring Viewer role to access Stackdriver and view
metrics. It says logs not metrics.
d. Install the gsutil command line tool on your application servers. Write a script using gsutil to upload your application log to a Cloud Storage bucket, and then schedule it to run via cron every 5 minutes. Give the developers the IAM Object Viewer access to view the logs in the specified bucket. Clearly not the best option when the requirement is the least amount of configuration.
11. You support the backend of a mobile phone game that runs on a Google Kubernetes
Engine (GKE) cluster. The application is serving HTTP requests from users.
You need to implement a solution that will reduce the network cost. What should you
do?
a. Configure the VPC as a Shared VPC Host project. This doesn't make sense here.
b. Configure your network services on the Standard Tier. It could reduce cost, but the question doesn't specify the current tier, and it is not the most suitable answer.
c. Configure your Kubernetes cluster as a Private Cluster. What does that have to do with network costs?
d. Configure a Google Cloud HTTP Load Balancer as Ingress. Costs associated with
a load balancer are charged to the project containing the load balancer
components. Because of these benefits, container-native load balancing is the
recommended solution for load balancing through Ingress. When NEGs are used
with GKE Ingress, the Ingress controller facilitates the creation of all aspects of
the L7 load balancer. This includes creating the virtual IP address, forwarding
rules, health checks, firewall rules, and more.
12. You encountered a major service outage that affected all users of the service for
multiple hours. After several hours of incident management, the service returned to
normal, and user access was restored. You need to provide an incident summary to
relevant stakeholders following the Site Reliability Engineering recommended
practices. What should you do first?
a. Call individual stakeholders to explain what happened. No.
b. Develop a post-mortem to be distributed to stakeholders. Nothing else to add.
c. Send the Incident State Document to all the stakeholders. The point is to communicate the resolution and lessons learned, not just the problem; the Incident State Document is a working document used by the incident participants.
d. Require the engineer responsible to write an apology email to all
stakeholders. Blameless culture.
13. You are performing a semi-annual capacity planning exercise for your flagship service.
You expect a service user growth rate of 10% month-over-month over the next six
months. Your service is fully containerized and runs on Google Cloud Platform (GCP),
using a Google Kubernetes Engine (GKE) Standard regional cluster on three zones with
cluster autoscaler enabled. You currently consume about 30% of your total deployed
CPU capacity, and you require resilience against the failure of a zone. You want to
ensure that your users experience minimal negative impact as a result of this growth
or as a result of zone failure, while avoiding unnecessary costs. How should you
prepare to handle the predicted growth?
a. Verify the maximum node pool size, enable a horizontal pod autoscaler, and
then perform a load test to verify your expected resource needs. The
Horizontal Pod Autoscaler changes the shape of your Kubernetes workload by
automatically increasing or decreasing the number of Pods in response to the
workload's CPU or memory consumption.
b. Because you are deployed on GKE and are using a cluster autoscaler, your GKE cluster will scale automatically, regardless of growth rate. The cluster autoscaler works based on the workload (pending Pods), not on CPU utilization alone.
c. Because you are at only 30% utilization, you have significant headroom and
you won't need to add any additional capacity for this rate of growth.
d. Proactively add 60% more node capacity to account for six months of 10%
growth rate, and then perform a load test to make sure you have enough
capacity. Even if it works, for 5 months you’ll be spending more than needed.
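A quick back-of-the-envelope check of why option A is reasonable: compounding 10% month-over-month for six months from the current 30% utilization.

current_utilization = 0.30   # of total deployed CPU capacity
monthly_growth = 0.10
months = 6

projected = current_utilization * (1 + monthly_growth) ** months
print(f"Projected utilization after {months} months: {projected:.0%}")  # roughly 53%
# Still below 100%, but a zone failure removes one third of a three-zone
# regional cluster, so you verify node pool limits, enable the HPA, and
# load test rather than blindly trusting the headroom.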
14. Your application images are built and pushed to Google Container Registry (GCR). You
want to build an automated pipeline that deploys the application when the image is
updated while minimizing the development effort. What should you do?
a. Use Cloud Build to trigger a Spinnaker pipeline. These days Cloud Build also has Pub/Sub integration, but two years ago I assume it did not, which is why the answer is B.
b. Use Cloud Pub/Sub to trigger a Spinnaker pipeline.
What is Google Cloud Build? Continuously build, test, and deploy. Cloud Build
lets you build software quickly across all languages. Get complete control over
defining custom workflows for building, testing, and deploying across multiple
environments such as VMs, serverless, Kubernetes, or Firebase.
What is Spinnaker? Multi-cloud continuous delivery platform for releasing
software changes with high velocity and confidence. Created at Netflix, it has
been battle-tested in production by hundreds of teams over millions of
deployments. It combines a powerful and flexible pipeline management system
with integrations to the major cloud providers.
c. Use a custom builder in Cloud Build to trigger Jenkins pipeline.
d. Use Cloud Pub/Sub to trigger a custom deployment service running in Google Kubernetes Engine (GKE). The fact that it runs on GKE is just a detail; this doesn't really answer the question.
15. Your product is currently deployed in three Google Cloud Platform (GCP) zones with
your users divided between the zones. You can fail over from one zone to another, but
it causes a 10-minute service disruption for the affected users. You typically
experience a database failure once per quarter and can detect it within five minutes.
You are cataloging the reliability risks of a new real-time chat feature for your product.
You catalog the following information for each risk:
• Mean Time to Detect (MTTD) in minutes
• Mean Time to Repair (MTTR) in minutes
• Mean Time Between Failure (MTBF) in days
• User Impact Percentage
The chat feature requires a new database system that takes twice as long to
successfully fail over between zones. You want to account for the risk of the new
database failing in one zone. What would be the values for the risk of database failover
with the new system?
a. MTTD: 5 MTTR: 10 MTBF: 90 Impact: 33%
b. MTTD: 5 MTTR: 20 MTBF: 90 Impact: 33%. It's just arithmetic: MTTD and MTBF stay the same across options, and because users are split between three zones the impact is one third (33%), which leaves A and B. Since the new database takes twice as long to fail over (it was 10 minutes before), the MTTR is 20.
c. MTTD: 5 MTTR: 10 MTBF: 90 Impact: 50%
d. MTTD: 5 MTTR: 20 MTBF: 90 Impact: 50%
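A small sketch of the arithmetic behind option B, plus the "expected bad minutes per year" figure that SRE risk-analysis worksheets typically derive from these values (an assumption about how you might use them, not part of the question):

mttd_minutes = 5          # unchanged: detection time stays the same
mttr_minutes = 10 * 2     # the new database takes twice as long to fail over
mtbf_days = 90            # one database failure per quarter
impact = 1 / 3            # users are split across three zones

print(f"MTTD: {mttd_minutes}  MTTR: {mttr_minutes}  MTBF: {mtbf_days}  "
      f"Impact: {impact:.0%}")

# Expected user-impacting minutes per year for this single risk.
bad_minutes_per_year = (mttd_minutes + mttr_minutes) * impact * (365 / mtbf_days)
print(f"Expected bad minutes per year: {bad_minutes_per_year:.1f}")  # about 33.8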
16. You are managing the production deployment to a set of Google Kubernetes Engine
(GKE) clusters. You want to make sure only images which are successfully built by your
trusted CI/CD pipeline are deployed to production. What should you do?
a. Enable Cloud Security Scanner on the clusters.
b. Enable Vulnerability Analysis on the Container Registry.
c. Set up the Kubernetes Engine clusters as private clusters.
d. Set up the Kubernetes Engine clusters with Binary Authorization. Binary
Authorization is a feature of Google Kubernetes Engine that allows you to
ensure that only containers that are verified to be from a trusted source are
deployed to your clusters. It works by using a policy that checks the signatures
of container images before they are deployed. You can configure Binary
Authorization to require that all images are signed by a trusted certificate
authority (CA) or that they are signed by a trusted key that you manage. This
ensures that only images that have been successfully built by your trusted CI/CD
pipeline are deployed to your production clusters.
17. You support an e-commerce application that runs on a large Google Kubernetes
Engine (GKE) cluster deployed on-premises and on Google Cloud Platform. The
application consists of microservices that run in containers. You want to identify
containers that are using the most CPU and memory. What should you do?
a. Use Stackdriver Kubernetes Engine Monitoring.
b. Use Prometheus to collect and aggregate logs per container, and then analyze
the results in Grafana.
c. Use the Stackdriver Monitoring API to create custom metrics, and then
organize your containers using groups.
d. Use Stackdriver Logging to export application logs to BigQuery, aggregate logs
per container, and then analyze CPU and memory consumption.
18. Your company experiences bugs, outages, and slowness in its production systems.
Developers use the production environment for new feature development and bug
fixes. Configuration and experiments are done in the production environment, causing
outages for users. Testers use the production environment for load testing, which
often slows the production systems. You need to redesign the environment to reduce
the number of bugs and outages in production and to enable testers to load test new
features. What should you do?
a. Create an automated testing script in production to detect failures as soon as they occur. And nobody ever reviews it?
b. Create a development environment with smaller server capacity and give access only to developers and testers. You couldn't test the load that way.
c. Secure the production environment to ensure that developers can't change it and set up one controlled update per year. Only one update per year?
d. Create a development environment for writing code and a test environment for configurations, experiments, and load testing. This is the best option: create a dev environment and a separate test environment.
19. You support an application running on App Engine. The application is used globally
and accessed from various device types. You want to know the number of
connections. You are using Stackdriver Monitoring for App Engine. What metric should
you use?
a. flex/connections/current. Number of current active connections per App
Engine flexible environment version. An App Engine app is made up of a single
application resource that consists of one or more services. Each service can be
configured to use different runtimes and to operate with different performance
settings. Within each service, you deploy versions of that service. Each version
then runs within one or more instances, depending on how much traffic you
configured it to handle.
b. tcp_ssl_proxy/new_connections. Metrics for Cloud Load Balancing.
c. tcp_ssl_proxy/open_connections. Metrics for Cloud Load Balancing.
d. flex/instance/connections/current. Number of current active connections per
App Engine flexible environment instance.
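A hedged sketch of reading that metric with the Cloud Monitoring API (the google-cloud-monitoring client library), assuming a hypothetical project ID; the metric type shown is the documented name for option A:

import time
from google.cloud import monitoring_v3

project_id = "my-project"  # hypothetical
client = monitoring_v3.MetricServiceClient()

now = time.time()
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": int(now)},
        "start_time": {"seconds": int(now - 3600)},  # last hour
    }
)

# flex/connections/current = active connections per flexible-environment version.
results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = "appengine.googleapis.com/flex/connections/current"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    # This metric is an integer gauge per version.
    print(series.resource.labels, series.points[0].value.int64_value)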
20. You support an application deployed on Compute Engine. The application connects to
a Cloud SQL instance to store and retrieve data. After an update to the application,
users report errors showing database timeout messages. The number of concurrent
active users remained stable. You need to find the most probable cause of the
database timeout. What should you do?
a. Check the serial port logs of the Compute Engine instance. This has nothing to do with the problem.
b. Use Stackdriver Profiler to visualize the resources utilization throughout the
application. The most probable cause of the database timeout when the
number of concurrent active users remained stable is a performance issue.
Stackdriver Profiler can be used to identify and diagnose performance issues in
the application. Profiler can help you to visualize the resources utilization
throughout the application, including CPU and memory usage, and identify any
parts of the application that might be causing high load. This can help you
understand how the application is utilizing the resources and identify any
bottlenecks in the code that might be causing the timeouts.
c. Determine whether there is an increased number of connections to the Cloud SQL instance. Doesn't the question say the number of concurrent active users remained stable?
d. Use Cloud Security Scanner to see whether your Cloud SQL is under a
Distributed Denial of Service (DDoS) attack. It’s a security tool that can detect
vulnerabilities in the application, but it's not related to the database timeouts.
21. Your application images are built using Cloud Build and pushed to Google Container
Registry (GCR). You want to be able to specify a particular version of your application
for deployment based on the release version tagged in source control. What should
you do when you push the image?
a. Reference the image digest in the source control tag.
b. Supply the source control tag as a parameter within the image name.
c. Use Cloud Build to include the release version tag in the application image. I have worked with Cloud Build, and this is the simplest option that meets the requirement.
d. Use GCR digest versioning to match the image to the tag in source control.
22. You are on-call for an infrastructure service that has a large number of dependent
systems. You receive an alert indicating that the service is failing to serve most of its
requests and all of its dependent systems with hundreds of thousands of users are
affected. As part of your Site Reliability Engineering (SRE) incident management
protocol, you declare yourself Incident Commander (IC) and pull in two experienced
people from your team as Operations Lead (OL) and
Communications Lead (CL). What should you do next?
a. Look for ways to mitigate user impact and deploy the mitigations to
production.
b. Contact the affected service owners and update them on the status of the
incident.
c. Establish a communication channel where incident responders and leads can
communicate with each other.
Prepare Beforehand: In addition to incident response training, it helps to
prepare for an incident beforehand. Use the following tips and strategies to be
better prepared.
Decide on a communication channel: Decide and agree on a communication
channel (Slack, a phone bridge, IRC, HipChat, etc.) beforehand.
Keep your audience informed: Unless you acknowledge that an incident is
happening and actively being addressed, people will automatically assume
nothing is being done to resolve the issue. Similarly, if you forget to call off the
response once the issue has been mitigated or resolved, people will assume the
incident is ongoing. You can preempt this dynamic by keeping your audience
informed throughout the incident with regular status updates. Having a
prepared list of contacts (see the next tip) saves valuable time and ensures you
don’t miss anyone.
d. Start a postmortem, add incident information, circulate the draft internally,
and ask internal stakeholders for input. The postmortem should be done after
the incident is resolved.
23. You are developing a strategy for monitoring your Google Cloud Platform (GCP)
projects in production using Stackdriver Workspaces. One of the requirements is to be
able to quickly identify and react to production environment issues without false
alerts from development and staging projects. You want to ensure that you adhere to
the principle of least privilege when providing relevant team members with access to
Stackdriver Workspaces. What should you do?
a. Grant relevant team members read access to all GCP production projects.
Create Stackdriver workspaces inside each project.
b. Grant relevant team members the Project Viewer IAM role on all GCP
production projects. Create Stackdriver workspaces inside each project.
c. Choose an existing GCP production project to host the monitoring workspace.
Attach the production projects to this workspace. Grant relevant team
members read access to the Stackdriver Workspace.
d. Create a new GCP monitoring project and create a Stackdriver Workspace
inside it. Attach the production projects to this workspace. Grant relevant
team members read access to the Stackdriver Workspace. When you want to
view or monitor time-series data that is stored by multiple resource containers,
you must create or select a project, and then configure the metrics scope of that
project. After you add a resource container to a metrics scope, it becomes a
monitored resource container.
Adding a resource container to a metrics scope doesn't change the container.
However, this action lets the scoping project chart and monitor the time-series
data stored by the resource container. If the added resource container includes
children, time-series data stored in those child resources isn't charted or
monitored by the scoping project.
24. You currently store the virtual machine (VM) utilization logs in Stackdriver. You need
to provide an easy-to-share interactive VM utilization dashboard that is updated in
real time and contains information aggregated on a quarterly basis. You want to use
Google Cloud Platform solutions. What should you do?
a. 1. Export VM utilization logs from Stackdriver to BigQuery. 2. Create a
dashboard in Data Studio. 3. Share the dashboard with your stakeholders.
b. 1. Export VM utilization logs from Stackdriver to Cloud Pub/Sub. 2. From Cloud Pub/Sub, send the logs to a Security Information and Event Management (SIEM) system. 3. Build the dashboards in the SIEM system and share with your stakeholders. The question is about VM utilization, not about security.
c. 1. Export VM utilization logs from Stackdriver to BigQuery. 2. From BigQuery,
export the logs to a CSV file. 3. Import the CSV file into Google Sheets. 4. Build
a dashboard in Google Sheets and share it with your stakeholders. You want to use Google Cloud solutions; Google Sheets is SaaS, but not strictly a Google Cloud product.
d. 1. Export VM utilization logs from Stackdriver to a Cloud Storage bucket. 2.
Enable the Cloud Storage API to pull the logs programmatically. 3. Build a
custom data visualization application. 4. Display the pulled logs in a custom
dashboard. It’s not quite it.
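A hedged sketch of step 1 of option A (creating the export sink) with the google-cloud-logging client library, assuming hypothetical project, sink, and dataset names; the BigQuery dataset must already exist and the sink's writer identity needs access to it:

from google.cloud import logging

client = logging.Client(project="my-project")  # hypothetical project

# Route VM utilization / agent logs to a BigQuery dataset for Data Studio.
sink = client.sink(
    "vm-utilization-to-bq",                  # hypothetical sink name
    filter_='resource.type="gce_instance"',  # adjust to the logs you need
    destination="bigquery.googleapis.com/projects/my-project/datasets/vm_logs",
)
if not sink.exists():
    sink.create(unique_writer_identity=True)
    print("Created sink; grant", sink.writer_identity, "access to the dataset")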
25. You need to run a business-critical workload on a fixed set of Compute Engine
instances for several months. The workload is stable with the exact amount of
resources allocated to it. You want to lower the costs for this workload without any
performance implications. What should you do?
a. Purchase Committed Use Discounts. When you know that you will have a
workload running on a fixed set of instances for several months, you can take
advantage of Committed Use Discounts to lower the costs. These discounts
provide a lower, sustained usage rate for a committed period of time (e.g. 1 or
3 years) in exchange for committing to use a certain number of virtual machine
(VM) instances or n1-standard hours. This is a good choice because you can
lower costs without any performance implications and the workload is stable
with the exact amount of resources allocated to it.
b. Migrate the instances to a Managed Instance Group.
c. Convert the instances to preemptible virtual machines. They can be stopped
randomly. So, as it is business-critical this doesn’t fit.
d. Create an Unmanaged Instance Group for the instances used to run the
workload.
26. You are part of an organization that follows SRE practices and principles. You are
taking over the management of a new service from the Development Team, and you
conduct a Production Readiness Review (PRR). After the PRR analysis phase, you
determine that the service cannot currently meet its Service Level
Objectives (SLOs). You want to ensure that the service can meet its SLOs in production.
What should you do next?
a. Adjust the SLO targets to be achievable by the service so you can bring it into production. Sooner or later, even if you adjust the SLOs just to get by, worse errors will surface once the service is deployed.
b. Notify the development team that they will have to provide production
support for the service.
c. Identify recommended reliability improvements to the service to be
completed before handover. A Production Readiness Review (PRR) is an
assessment of a service's readiness to be deployed in production. A service that
cannot meet its Service Level Objectives (SLOs) is not ready to be deployed in
production. The next step is to identify the recommended reliability
improvements that should be made to the service before it can be handed over
to the SRE team.
d. Bring the service into production with no SLOs and build them when you have
collected operational data.
27. You are running an experiment to see whether your users like a new feature of a web
application. Shortly after deploying the feature as a canary release, you receive a spike
in the number of 500 errors sent to users, and your monitoring reports show increased
latency. You want to quickly minimize the negative impact on users. What should you
do first?
a. Roll back the experimental canary release. When you receive a spike in the
number of 500 errors sent to users and increased latency after deploying a new
feature, it is important to take immediate action to minimize the negative
impact on users. The first step should be to roll back the experimental canary
release. This will remove the new feature from production and revert the
system to its previous state, which should reduce the number of errors and
decrease latency. After rolling back the canary release, you can start monitoring
latency, traffic, errors, and saturation (Option B) to determine the impact of the
rollback and to make sure that the system is stable. You can also record data for
the postmortem document of the incident (Option C) to learn from the incident
and to improve future releases. Lastly, you can trace the origin of 500 errors and
the root cause of increased latency (Option D) after rolling back the canary
release to understand what went wrong and to prevent similar issues from
happening again in the future.
b. Start monitoring latency, traffic, errors, and saturation.
c. Record data for the postmortem document of the incident.
d. Trace the origin of 500 errors and the root cause of increased latency.
28. You are responsible for creating and modifying the Terraform templates that define
your Infrastructure. Because two new engineers will also be working on the same
code, you need to define a process and adopt a tool that will prevent you from
overwriting each other's code. You also want to ensure that you capture all updates
in the latest version. What should you do?
a. Store your code in a Git-based version control system. Establish a process that allows developers to merge their own changes at the end of each day. Package and upload code to a versioned Cloud Storage bucket as the latest master version. It makes no sense for developers to merge their own changes without tests or prior approval.
b. Store your code in a Git-based version control system. Establish a process that
includes code reviews by peers and unit testing to ensure integrity and
functionality before integration of code. Establish a process where the fully
integrated code in the repository becomes the latest master version. That's it: once the tests pass, open a pull request and merge everything into main/master.
c. Store your code as text files in Google Drive in a defined folder structure that
organizes the files. At the end of each day, confirm that all changes have been
captured in the files within the folder structure. Rename the folder structure
with a predefined naming convention that increments the version. Terraform files as plain text files in Drive? A version control repository is preferable.
d. Store your code as text files in Google Drive in a defined folder structure that
organizes the files. At the end of each day, confirm that all changes have been
captured in the files within the folder structure and create a new .zip archive
with a predefined naming convention. Upload the .zip archive to a versioned
Cloud Storage bucket and accept it as the latest version.
29. You support a high-traffic web application with a microservice architecture. The home
page of the application displays multiple widgets containing content such as the
current weather, stock prices, and news headlines. The main serving thread makes a
call to a dedicated microservice for each widget and then lays out the homepage for
the user. The microservices occasionally fail; when that happens, the serving thread
serves the homepage with some missing content. Users of the application are unhappy
if this degraded mode occurs too frequently, but they would rather have some content
served instead of no content at all. You want to set a Service Level Objective (SLO) to
ensure that the user experience does not degrade too much. What Service Level
Indicator (SLI) should you use to measure this?
a. A quality SLI: the ratio of non-degraded responses to total responses. Quality
is a helpful SLI for complex services that are designed to fail gracefully by
degrading when dependencies are slow or unavailable. The SLI for quality is
defined as follows: The proportion of valid requests served without degradation
of service.
b. An availability SLI: the ratio of healthy microservices to the total number of
microservices. An availability SLI would measure the availability of the
microservices, not the user experience of the web application.
c. A freshness SLI: the proportion of widgets that have been updated within the
last 10 minutes. A freshness SLI would measure the freshness of the content,
not the user experience of the web application.
d. A latency SLI: the ratio of microservice calls that complete in under 100 ms to
the total number of microservice calls. A latency SLI would measure the speed
of the microservices, not the user experience of the web application.
30. You support a multi-region web service running on Google Kubernetes Engine (GKE)
behind a Global HTTP/S Cloud Load Balancer (CLB). For legacy reasons, user requests
first go through a third-party Content Delivery Network (CDN), which then routes
traffic to the CLB. You have already implemented an availability
Service Level Indicator (SLI) at the CLB level. However, you want to increase coverage
in case of a potential load balancer misconfiguration, CDN failure, or other global
networking catastrophe. Where should you measure this new SLI? (Choose two.)
a. Your application servers' logs.
b. Instrumentation coded directly in the client.
c. Metrics exported from the application servers.
d. GKE health checks for your application servers.
e. A synthetic client that periodically sends simulated user requests.

From GCP “Measure your SLO’S”

Choose a measurement method

The following are suggested approaches that you can implement over time, listed in
order of increasing effort:

• Use application server exports and infrastructure metrics. Typically, you can access
these metrics immediately, and they quickly provide value. Some APM tools include
built-in SLO tooling.
• Use client instrumentation. Because legacy systems typically lack built-in, end-
user client instrumentation, setting up instrumentation might require a significant
investment. However, if you use an APM suite or frontend framework that
provides client instrumentation, you can quickly gain insight into your customer's
happiness.
• Use logs processing. If you can't implement server exports or client instrumentation
(previous bullets) but logs do exist, logs processing might be your best approach.
Another method is to combine exports and logs processing. Use exports as an
immediate source for some SLIs (such as immediate availability) and logs processing
for long-term signals (such as slow-burn alerts discussed in the SLOs and Alert)
guide.
• Implement synthetic testing. After you have a basic understanding of how your customers use your service, you can test your service. For example, you can seed test accounts with known-good data and query for it. This approach can help highlight failure modes that aren't readily observed, such as for low-traffic services. A minimal synthetic-probe sketch follows below.
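As referenced above, a minimal synthetic-client sketch, assuming a hypothetical endpoint and a simple success criterion (status 200 within 500 ms); in practice you would run this from outside your infrastructure on a schedule and export the counters to your monitoring system:

import time
import requests

ENDPOINT = "https://www.example.com/"   # hypothetical user-facing URL
LATENCY_SLO_SECONDS = 0.5

good, total = 0, 0
for _ in range(10):                      # e.g. run periodically from a scheduler
    total += 1
    start = time.monotonic()
    try:
        resp = requests.get(ENDPOINT, timeout=5)
        elapsed = time.monotonic() - start
        if resp.status_code == 200 and elapsed < LATENCY_SLO_SECONDS:
            good += 1
    except requests.RequestException:
        pass                             # a failed probe counts against the SLI
    time.sleep(1)

print(f"Synthetic availability SLI: {good}/{total} = {good / total:.1%}")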
31. Your team is designing a new application for deployment into Google Kubernetes
Engine (GKE). You need to set up monitoring to collect and aggregate various
application-level metrics in a centralized location. You want to use Google Cloud
Platform services while minimizing the amount of work required to set up monitoring.
What should you do?
a. Publish various metrics from the application directly to the Stackdriver Monitoring API, and then observe these custom metrics in Stackdriver. Some applications may not have enough instrumentation in place to do this out of the box.
b. Install the Cloud Pub/Sub client libraries, push various metrics from the application to various topics, and then observe the aggregated metrics in Stackdriver. Pub/Sub client libraries for metrics? No.
c. Install the OpenTelemetry client libraries in the application, configure Stackdriver as the export destination for the metrics, and then observe the application's metrics in Stackdriver. Option A could also work, but this option wins in terms of ease. Remember that through the library configuration you can set the collector or export destination for metrics, logs, etc.
d. Emit all metrics in the form of application-specific log messages, pass these messages from the containers to the Stackdriver logging collector, and then observe metrics in Stackdriver. This does not minimize the effort.
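A hedged sketch of option C for metrics, assuming the opentelemetry-sdk and opentelemetry-exporter-gcp-monitoring packages (class and module names follow those packages' documented usage, but verify against the versions you install):

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.exporter.cloud_monitoring import CloudMonitoringMetricsExporter

# Export application metrics to Cloud Monitoring on a fixed interval.
reader = PeriodicExportingMetricReader(
    CloudMonitoringMetricsExporter(), export_interval_millis=60_000
)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter(__name__)
request_counter = meter.create_counter(
    "request_count", description="Number of handled requests"
)

def handle_request(path: str) -> None:
    # Increment the custom metric; it then appears in Cloud Monitoring,
    # where it can be charted and alerted on centrally.
    request_counter.add(1, {"path": path})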
32. You support a production service that runs on a single Compute Engine instance. You
regularly need to spend time on recreating the service by deleting the crashing
instance and creating a new instance based on the relevant image. You want to reduce
the time spent performing manual operations while following Site Reliability
Engineering principles. What should you do?
a. File a bug with the development team so they can find the root cause of the
crashing instance.
b. Create a Managed instance Group with a single instance and use health checks
to determine the system status. A Managed Instance Group (MIG) is a GCP
service that automatically creates, scales, and deletes instances based on the
policies that you set. By creating a MIG with a single instance, you can set up
health checks to automatically detect when the instance is crashing and
replace it with a new instance, without manual intervention. This ensures that
the service is always available, and it can reduce the time spent recreating the
service and also minimize the risk of human errors.
c. Add a Load Balancer in front of the Compute Engine instance and use health
checks to determine the system status.
d. Create a Stackdriver Monitoring dashboard with SMS alerts to be able to start
recreating the crashed instance promptly after it was crashed.
33. Your application artifacts are being built and deployed via a CI/CD pipeline. You want
the CI/CD pipeline to securely access application secrets. You also want to more easily
rotate secrets in case of a security breach. What should you do?
a. Prompt developers for secrets at build time. Instruct developers to not store
secrets at rest.
b. Store secrets in a separate configuration file on Git. Provide select developers
with access to the configuration file.
c. Store secrets in Cloud Storage encrypted with a key from Cloud KMS. Provide
the CI/CD pipeline with access to Cloud KMS via IAM. By storing secrets in
Cloud Storage, you can take advantage of the security features provided by the
platform and encrypt them using Cloud KMS, a GCP service that allows you to
create, manage, and use encryption keys. This way you can control who has
access to the secrets, and you can easily rotate the encryption keys in case of a
security breach. Additionally, you can use IAM to give the CI/CD pipeline the
necessary permissions to access the secrets and use them during the
deployment process, without the need to store them in the source code or give
access to them to specific developers.
d. Encrypt the secrets and store them in the source code repository. Store a
decryption key in a separate repository and grant your pipeline access to it.
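A hedged sketch of how the pipeline side of option C could look, assuming hypothetical bucket, object, and key names; the CI/CD service account would need storage.objects.get on the bucket and the cloudkms.cryptoKeyDecrypter role on the key:

from google.cloud import kms, storage

PROJECT = "my-project"                      # hypothetical
BUCKET = "my-secrets-bucket"                # hypothetical
BLOB = "app-secrets.enc"                    # ciphertext previously encrypted with KMS
KEY_NAME = (
    f"projects/{PROJECT}/locations/global/keyRings/ci-cd/cryptoKeys/app-secrets"
)

# 1. Download the encrypted secret material from Cloud Storage.
ciphertext = storage.Client().bucket(BUCKET).blob(BLOB).download_as_bytes()

# 2. Decrypt it with Cloud KMS; rotating the key only requires re-encrypting
#    the object, without touching the pipeline code.
kms_client = kms.KeyManagementServiceClient()
response = kms_client.decrypt(request={"name": KEY_NAME, "ciphertext": ciphertext})
secrets = response.plaintext.decode("utf-8")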
34. Your company follows Site Reliability Engineering practices. You are the person in
charge of Communications for a large, ongoing incident affecting your customer-facing
applications. There is still no estimated time for a resolution of the outage. You are
receiving emails from internal stakeholders who want updates on the outage, as well
as emails from customers who want to know what is happening. You want to
efficiently provide updates to everyone affected by the outage.
What should you do?
a. Focus on responding to internal stakeholders at least every 30 minutes.
Commit to next update times.
b. Provide periodic updates to all stakeholders in a timely manner. Commit to a "next update" time in all communications. During an incident, it's important to keep all stakeholders informed about the current situation and any progress made towards resolving the problem. Providing periodic updates to all stakeholders, including internal stakeholders and customers, is the most effective way of ensuring everyone is informed. Additionally, by committing to a "next update" time in all communications, you ensure that stakeholders know when they can expect to receive new information. This way you avoid the unnecessary pressure of constantly responding to emails, and you can focus on resolving the incident and providing accurate information.
c. Delegate the responding to internal stakeholder emails to another member of the Incident Response Team. Focus on providing responses directly to customers. If the lead wants to delegate, the Incident Commander should be asked to appoint another communications lead; it is the IC who delegates roles.
d. Provide all internal stakeholder emails to the Incident Commander, and allow them to manage internal communications. Focus on providing responses directly to customers. It makes no sense to hand off communications if I am the one in charge of communications.
35. Your team uses Cloud Build for all CI/CD pipelines. You want to use the kubectl builder
for Cloud Build to deploy new images to Google Kubernetes Engine (GKE). You need
to authenticate to GKE while minimizing development effort. What should you do?
a. Assign the Container Developer role to the Cloud Build service account. I tested this for deploying to GKE from the pipeline. Google Cloud Build uses
a default service account to run the build, this service account is automatically
created by Cloud Build and it has the necessary permissions to access the
resources used by the build. By assigning the Container Developer role to this
service account, it will have the necessary permissions to deploy new images to
GKE. This way you don't need to create a new service account or specify the role
in the cloudbuild.yaml file. This is an easy and secure way to authenticate to GKE
without adding extra steps to the CI/CD pipeline.
b. Specify the Container Developer role for Cloud Build in the cloudbuild.yaml
file.
c. Create a new service account with the Container Developer role and use it to run Cloud Build. This is valid, but it's faster to grant the role to the default service account in IAM.
d. Create a separate step in Cloud Build to retrieve service account credentials
and pass these to kubectl.
36. You support an application that stores product information in cached memory. For
every cache miss, an entry is logged in Stackdriver Logging. You want to visualize how
often a cache miss happens over time. What should you do?
a. Link Stackdriver Logging as a source in Google Data Studio. Filter the logs on
the cache misses.
b. Configure Stackdriver Profiler to identify and visualize when the cache misses
occur based on the logs.
c. Create a logs-based metric in Stackdriver Logging and a dashboard for that
metric in Stackdriver Monitoring. Stackdriver Logging provides the ability to
extract metrics from logs, these metrics are called logs-based metrics. You can
create a logs-based metric that counts the number of cache miss logs and
configure it to be collected at a regular interval, this way you can see how often
a cache miss happens over time. Additionally, Stackdriver Monitoring provides
the ability to create dashboards that display the metrics collected by logs-based
metrics, you can use this dashboard to visualize the cache misses over time and
easily identify trends or spikes in the data. Logs-based metrics make sense here because the question states that each cache miss generates a log entry.
d. Configure BigQuery as a sink for Stackdriver Logging. Create a scheduled query
to filter the cache miss logs and write them to a separate table.
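A hedged sketch of creating such a logs-based metric programmatically with the google-cloud-logging client library (it can also be done in the Console); the filter text is an assumption about how the cache-miss entries look:

from google.cloud import logging

client = logging.Client(project="my-project")  # hypothetical project

# Counter metric that increments for every log entry matching the filter.
metric = client.metric(
    "cache_miss_count",                                    # hypothetical metric name
    filter_='resource.type="gce_instance" AND textPayload:"cache miss"',
    description="Number of cache misses logged by the application",
)
if not metric.exists():
    metric.create()

# The metric then appears in Cloud Monitoring as
# logging.googleapis.com/user/cache_miss_count and can be charted on a dashboard.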
37. You need to deploy a new service to production. The service needs to automatically
scale using a Managed Instance Group (MIG) and should be deployed over multiple
regions. The service needs a large number of resources for each instance and you need
to plan for capacity. What should you do?
a. Use the n1-highcpu-96 machine type in the configuration of the MIG.
b. Monitor results of Stackdriver Trace to determine the required amount of
resources.
c. Validate that the resource requirements are within the available quota limits of each region. It is important to ensure that the resource requirements are within the available quota limits in each region before deploying the service, to avoid exceeding the limits and causing problems. This is essential to ensure that the service is deployed correctly and has the necessary capacity to handle the load. Allocation quotas, also known as resource quotas, define the number of resources that your project has access to. Compute Engine enforces allocation quotas on resource usage for various reasons. For example, quotas help to protect the community of Google Cloud users by preventing unforeseen spikes in usage.
d. Deploy the service in one region and use a global load balancer to route traffic
to this region. It says multiple regions, not just one. It’s also a good option, but
C is the right one.
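A hedged sketch of checking regional quota headroom with the Compute Engine client library (the same data is shown by gcloud compute regions describe), assuming a hypothetical project and region list:

from google.cloud import compute_v1

PROJECT = "my-project"                      # hypothetical
REGIONS = ["us-central1", "europe-west1"]   # regions planned for the MIG

client = compute_v1.RegionsClient()
for region_name in REGIONS:
    region = client.get(project=PROJECT, region=region_name)
    # Compare the CPUS / INSTANCES rows against the per-instance needs of the MIG.
    for quota in region.quotas:
        headroom = quota.limit - quota.usage
        print(region_name, quota.metric,
              f"usage={quota.usage}", f"limit={quota.limit}",
              f"headroom={headroom}")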
38. You are running an application on Compute Engine and collecting logs through
Stackdriver. You discover that some personally identifiable information (PII) is leaking
into certain log entry fields. All PII entries begin with the text userinfo. You want to
capture these log entries in a secure location for later review and prevent them from
leaking to Stackdriver Logging. What should you do?
a. Create a basic log filter matching userinfo, and then configure a log export in
the Stackdriver console with Cloud Storage as a sink. It still leaks to Stackdriver
Logging.
b. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries
containing userinfo, and then copy the entries to a Cloud Storage bucket.
Fluentd can filter the logs quite nicely before passing information to Stackdriver.
It can cover sensitive information such as credit card details, social security
numbers, etc. Once the filtering is done, then the log can be passed to Cloud
Storage.
c. Create an advanced log filter matching userinfo, configure a log export in the
Stackdriver console with Cloud Storage as a sink, and then configure a log
exclusion with userinfo as a filter. It still leaks to Stackdriver Logging.
d. Use a Fluentd filter plugin with the Stackdriver Agent to remove log entries
containing userinfo, create an advanced log filter matching userinfo, and then
configure a log export in the Stackdriver console with Cloud Storage as a sink.
It still leaks to Stackdriver Logging.
39. You have a CI/CD pipeline that uses Cloud Build to build new Docker images and push
them to Docker Hub. You use Git for code versioning. After making a change in the
Cloud Build YAML configuration, you notice that no new artifacts are being built by
the pipeline. You need to resolve the issue following Site Reliability Engineering
practices. What should you do?
a. Disable the CI pipeline and revert to manually building and pushing the artifacts. There is clearly an error, but building manually doesn't solve anything in the long run.
b. Change the CI pipeline to push the artifacts to Container Registry instead of Docker Hub. Depending on the company's dependencies, pushing to Docker Hub may be exactly what is wanted; in any case, this doesn't fix the issue.
c. Upload the configuration YAML file to Cloud Storage and use Error Reporting to identify and fix the issue. I don't think this would solve it.
d. Run a Git compare between the previous and current Cloud Build
Configuration files to find and fix the bug.
40. Your company follows Site Reliability Engineering principles. You are writing a
postmortem for an incident, triggered by a software change, that severely affected
users. You want to prevent severe incidents from happening in the future. What
should you do?
a. Identify engineers responsible for the incident and escalate to their senior
management.
b. Ensure that test cases that catch errors of this type are run successfully before
new software releases. Implement a strong continuous integration and
continuous delivery (CI/CD) pipeline. This will help to automate the testing
process and ensure that new code is tested thoroughly before it is deployed to
production. Use a blameless postmortem process. This will encourage engineers
to be open and honest about their mistakes, which will help to identify and fix
problems more quickly. Invest in training and education for your engineers. This
will help them to understand the importance of reliability and how to write code
that is less prone to errors. By taking these steps, you can help to prevent future
incidents and improve the overall reliability of your system.
c. Follow up with the employees who reviewed the changes and prescribe
practices they should follow in the future. While following up with reviewers is
important, the primary focus should be on improving the testing process to
catch errors before they reach production.
d. Design a policy that will require on-call teams to immediately call engineers
and management to discuss a plan of action if an incident occurs. Might be
helpful in some cases, but it's not a general solution. Not all incidents require
immediate escalation, and relying solely on on-call teams to make that decision
can lead to unnecessary escalations.
41. You support a high-traffic web application that runs on Google Cloud Platform (GCP).
You need to measure application reliability from a user perspective without making
any engineering changes to it. What should you do? (Choose two.)
a. Review current application metrics and add new ones as needed. Adding metrics doesn't necessarily reflect reliability from a user perspective.
b. Modify the code to capture additional information for user interaction. The application shouldn't be changed; you need to measure, not modify. Modifying would be a later step anyway.
c. Analyze the web proxy logs only and capture response time of each request.
Analyzing proxy logs doesn't connect the findings to a user perspective. Web
proxy is not a reverse proxy, it is a forward proxy - a type of server that runs at
the client side.
d. Create new synthetic clients to simulate a user journey using the application.
You simulate the user experience without making changes to the application,
although you do have to build the synthetic client (a minimal sketch follows this
question). Instana, for example, offers synthetic tests that behave like a real
user and collect metrics from the journey.
e. Use current and historic Request Logs to trace customer interaction with the
application. This is clearly one of the two correct options.
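For option (d), a minimal sketch of what a synthetic client could look like in Python, assuming a hypothetical user journey of login, search, and checkout pages; the base URL and paths are placeholders, not part of the question.

```python
# Minimal synthetic-client sketch; URL and paths are hypothetical placeholders.
import time
import requests

BASE_URL = "https://booking.example.com"

def probe_user_journey() -> dict:
    """Runs one synthetic user journey and records availability and latency."""
    results = {}
    for path in ("/login", "/search", "/checkout"):
        start = time.monotonic()
        try:
            resp = requests.get(BASE_URL + path, timeout=10)
            ok = resp.status_code < 500
        except requests.RequestException:
            ok = False
        results[path] = {"ok": ok, "latency_s": round(time.monotonic() - start, 3)}
    return results

if __name__ == "__main__":
    print(probe_user_journey())
```

Run on a schedule (for example from Cloud Scheduler or a cron job), probes like this produce user-perspective reliability data without touching the application code.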
42. You manage an application that is writing logs to Stackdriver Logging. You need to give
some team members the ability to export logs. What should you do?
a. Grant the team members the IAM role of logging.configWriter on Cloud IAM.
b. Configure Access Context Manager to allow only these members to export
logs.
c. Create and grant a custom IAM role with the permissions logging.sinks.list and
logging.sinks.get. Not entirely complete, since other permissions would also be
needed to create exports, but in principle it follows least privilege.
d. Create an Organizational Policy in Cloud IAM to allow only these members to
create log exports. Organization Policies constrain resource configurations; they
are not a per-user access-control mechanism, so this is error prone and the wrong
tool.
43. Your application services run in Google Kubernetes Engine (GKE). You want to make
sure that only images from your centrally-managed Google Container Registry (GCR)
image registry in the altostrat-images project can be deployed to the cluster while
minimizing development time. What should you do?
a. Create a custom builder for Cloud Build that will only push images to
gcr.io/altostrat-images.
b. Use a Binary Authorization policy that includes the whitelist name pattern
gcr.io/altostrat-images/. Binary Authorization enforces at deploy time that only
images matching the whitelisted registry pattern can run on the cluster, with no
extra work added to the development workflow.
c. Add logic to the deployment pipeline to check that all manifests contain only
images from gcr.io/altostrat-images.
d. Add a tag to each image in gcr.io/altostrat-images and check that this tag is
present when the image is deployed.
44. Your team has recently deployed a NGINX-based application into Google Kubernetes
Engine (GKE) and has exposed it to the public via an HTTP Google Cloud Load Balancer
(GCLB) ingress. You want to scale the deployment of the application's frontend using
an appropriate Service Level Indicator (SLI). What should you do?
a. Configure the horizontal pod autoscaler to use the average response time
from the Liveness and Readiness probes. Using health-check response time as a
scaling trigger is unreliable: if probe responses are delayed, the cause is
usually underlying resource pressure (CPU, memory, and so on), and those are the
signals that belong in your SLIs instead.
b. Configure the vertical pod autoscaler in GKE and enable the cluster autoscaler
to scale the cluster as pods expand.
c. Install the Stackdriver custom metrics adapter and configure a horizontal pod
autoscaler to use the number of requests provided by the GCLB. To scale the
deployment of the application's frontend using an appropriate Service Level
Indicator (SLI), we need to monitor the traffic coming to the application. One
way to do this is to install the Stackdriver custom metrics adapter, which
provides visibility into GCLB metrics such as request counts, bytes sent and
received, and active connections. We can then configure a horizontal pod
autoscaler (HPA) to scale the number of pods based on the request count
coming through the GCLB, which will help to ensure that our application is
always available to handle the incoming traffic.
d. Expose the NGINX stats endpoint and configure the horizontal pod autoscaler
to use the request metrics exposed by the NGINX deployment. It should be
using custom metrics, that’s why it’s C.
45. Your company follows Site Reliability Engineering practices. You are the Incident
Commander for a new, customer-impacting incident. You need to immediately assign
two incident management roles to assist you in an effective incident response. What
roles should you assign? (Choose two.)
a. Operations Lead. The main roles in incident response are the Incident
Commander (IC), Communications Lead (CL), and Operations or Ops Lead (OL).
IMAG organizes these roles into a hierarchy: the IC leads the incident response,
and the CL and OL report to the IC.
b. Engineering Lead
c. Communications Lead. Correct for the same reason as option (a): the CL and the
OL are the two roles that report directly to the IC in the IMAG hierarchy.
d. Customer Impact Assessor
e. External Customer Communications Lead
46. You support an application running on GCP and want to configure SMS notifications
to your team for the most critical alerts in Stackdriver Monitoring. You have already
identified the alerting policies you want to configure this for. What should you do?
a. Download and configure a third-party integration between Stackdriver
Monitoring and an SMS gateway. Ensure that your team members add their
SMS/phone numbers to the external tool.
b. Select the Webhook notifications option for each alerting policy, and
configure it to use a third-party integration tool. Ensure that your team
members add their SMS/phone numbers to the external tool.
c. Ensure that your team members set their SMS/phone numbers in their
Stackdriver Profile. Select the SMS notification option for each alerting policy
and then select the appropriate SMS/phone numbers from the list. This
approach is the simplest and most straightforward of the options presented. It
requires no additional integration or configuration, and team members can
easily manage their contact information in their Stackdriver profile. However, it
does require team members to have access to and familiarity with Stackdriver,
which may not be the case for all members of the team.
To configure SMS notifications, do the following: In the SMS section, click Add
new and follow the instructions. Click Save. When you set up your alerting
policy, select the SMS notification type and choose a verified phone number
from the list.
d. Configure a Slack notification for each alerting policy. Set up a Slack-to-SMS
integration to send SMS messages when Slack messages are received. Ensure
that your team members add their SMS/phone numbers to the external
integration.
47. You are managing an application that exposes an HTTP endpoint without using a load
balancer. The latency of the HTTP responses is important for the user experience. You
want to understand what HTTP latencies all of your users are experiencing. You use
Stackdriver Monitoring. What should you do?
a. In your application, create a metric with a metricKind set to DELTA and a
valueType set to DOUBLE. In Stackdriver's Metrics Explorer, use a Stacked Bar
graph to visualize the metric. DELTA: In which the value measures the change
since it was last recorded.
b. In your application, create a metric with a metricKind set to CUMULATIVE and
a valueType set to DOUBLE. In Stackdriver's Metrics Explorer, use a Line graph
to visualize the metric. CUMULATIVE: In which the value constantly increases
over time
c. In your application, create a metric with a metricKind set to GAUGE and a
valueType set to DISTRIBUTION. In Stackdriver's Metrics Explorer, use a
Heatmap graph to visualize the metric. GAUGE: the value measures a specific
instant in time, which is exactly what individual latency measurements are. GAUGE
metrics record a value at a particular point in time, DISTRIBUTION captures
distribution statistics, and a Heatmap is a good way to visualize latencies across
all users (a minimal metric-creation sketch follows this question).
d. In your application, create a metric with a metricKind set to
METRIC_KIND_UNSPECIFIED and a valueType set to INT64. In Stackdriver's
Metrics Explorer, use a Stacked Area graph to visualize the metric.
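For option (c), a minimal sketch of creating such a metric with the google-cloud-monitoring client library, assuming that library is used; the project ID and metric type name are hypothetical.

```python
# Creates a custom GAUGE/DISTRIBUTION metric descriptor for HTTP latency.
from google.api import metric_pb2 as ga_metric
from google.cloud import monitoring_v3

PROJECT_ID = "my-project"  # placeholder

client = monitoring_v3.MetricServiceClient()
descriptor = ga_metric.MetricDescriptor(
    type="custom.googleapis.com/http/response_latency",  # hypothetical metric name
    metric_kind=ga_metric.MetricDescriptor.MetricKind.GAUGE,
    value_type=ga_metric.MetricDescriptor.ValueType.DISTRIBUTION,
    unit="ms",
    description="HTTP response latency observed by the application.",
)
descriptor = client.create_metric_descriptor(
    name=f"projects/{PROJECT_ID}", metric_descriptor=descriptor
)
print(f"Created {descriptor.name}")
```

The application then writes DISTRIBUTION time series against this descriptor, and Metrics Explorer can render them as a Heatmap.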
48. Your team is designing a new application for deployment both inside and outside
Google Cloud Platform (GCP). You need to collect detailed metrics such as system
resource utilization. You want to use centralized GCP services while minimizing the
amount of work required to set up this collection system. What should you do?
a. Import the Stackdriver Profiler package, and configure it to relay function
timing data to Stackdriver for further analysis. Profiler works both inside and
outside GCP. Cloud Profiler is a statistical, low-overhead profiler that
continuously gathers CPU usage and memory-allocation information from your
production applications (a minimal setup sketch follows this question).
b. Import the Stackdriver Debugger package, and configure the application to
emit debug messages with timing information.
c. Instrument the code using a timing library, and publish the metrics via a health
check endpoint that is scraped by Stackdriver.
d. Install an Application Performance Monitoring (APM) tool in both locations,
and configure an export to a central data storage location for analysis.
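For option (a), a minimal setup sketch assuming the google-cloud-profiler Python package; the service name, version, and project ID are placeholders. The agent works on and off GCP, provided credentials with the Cloud Profiler Agent role are available.

```python
# Starts the Cloud Profiler agent; it samples CPU and heap in the background.
import googlecloudprofiler

try:
    googlecloudprofiler.start(
        service="checkout-service",   # hypothetical service name
        service_version="1.0.0",
        project_id="my-project",      # needed when running outside GCP
    )
except (ValueError, NotImplementedError) as exc:
    print(f"Profiler not started: {exc}")

# ... the rest of the application runs unchanged.
```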
49. You need to reduce the cost of virtual machines (VM) for your organization. After
reviewing different options, you decide to leverage preemptible VM instances.
Which application is suitable for preemptible VMs?
a. A scalable in-memory caching system. Preemptible VMs are best suited for
fault-tolerant, non-critical applications due to their temporary nature.
b. The organization's public-facing website. The drawback of preemptible VMs is
that they can be shut down at any moment, which would leave the website, a
critical component, unavailable.
c. A distributed, eventually consistent NoSQL database cluster with sufficient
quorum.
d. A GPU-accelerated video rendering platform that retrieves and stores videos
in a storage bucket. This option could also work, but (a) is a better fit.
50. Your organization recently adopted a container-based workflow for application
development. Your team develops numerous applications that are deployed
continuously through an automated build pipeline to a Kubernetes cluster in the
production environment. The security auditor is concerned that developers or
operators could circumvent automated testing and push code changes to production
without approval. What should you do to enforce approvals?
a. Configure the build system with protected branches that require pull request
approval. Not a bad option, but (d) makes more sense because it enforces the
approval at deploy time.
b. Use an Admission Controller to verify that incoming requests originate from
approved sources. An admission controller is a piece of code that intercepts
requests to the Kubernetes API server prior to persistence of the object, but
after the request is authenticated and authorized.
c. Leverage Kubernetes Role-Based Access Control (RBAC) to restrict access to
only approved users. We need to "enforce approvals"; RBAC roles apply inside the
cluster, and operators could still push to production without approval.
d. Enable binary authorization inside the Kubernetes cluster and configure the
build pipeline as an attestor. They cannot push code to production without
approval because their images are not signed.
51. You support a stateless web-based API that is deployed on a single Compute Engine
instance in the europe-west2-a zone. The Service Level Indicator (SLI) for service
availability is below the specified Service Level Objective (SLO). A postmortem has
revealed that requests to the API regularly time out. The time outs are due to the API
having a high number of requests and running out memory. You want to improve
service availability. What should you do?
a. Change the specified SLO to match the measured SLI
b. Move the service to higher-specification compute instances with more
memory
c. Set up additional service instances in other zones and load balance the traffic
between all instances. This option will provide redundancy and increase the
availability of the service by distributing the traffic across multiple instances.
Additionally, if one instance goes down, the load balancer will redirect the traffic
to the other healthy instances, minimizing the impact on the service availability.
d. Set up additional service instances in other zones and use them as a failover
in case the primary instance is unavailable. No, because the problem is timeouts
under load, not the instance going down; scaling out and load balancing (option c)
prevents the saturation seen in that single zone.
52. You are running a real-time gaming application on Compute Engine that has a
production and testing environment. Each environment has their own Virtual Private
Cloud (VPC) network. The application frontend and backend servers are located on
different subnets in the environment's VPC. You suspect there is a malicious process
communicating intermittently in your production frontend servers. You want to
ensure that network traffic is captured for analysis. What should you do?
a. Enable VPC Flow Logs on the production VPC network frontend and backend
subnets only with a sample volume scale of 0.5. Enable VPC Flow Logs on the
production VPC network frontend and backend subnets only with a sample
volume scale of 0.5 is not adequate, as it captures only half of the network
traffic, there is a chance that the logs of the malicious process are not captured.
b. Enable VPC Flow Logs on the production VPC network frontend and backend
subnets only with a sample volume scale of 1.0. VPC flow logs are a feature
that allows you to capture network traffic data in your VPC network. To ensure
that all network traffic is captured for analysis, you should enable VPC flow logs
on the production VPC network frontend and backend subnets with a sample
volume scale of 1.0. This will capture all network traffic data, including the
potentially malicious process, for further analysis.
c. Enable VPC Flow Logs on the testing and production VPC network frontend
and backend subnets with a volume scale of 0.5. Apply changes in testing
before production.
d. Enable VPC Flow Logs on the testing and production VPC network frontend
and backend subnets with a volume scale of 1.0. Apply changes in testing
before production.
53. Your team of Infrastructure DevOps Engineers is growing, and you are starting to use
Terraform to manage infrastructure. You need a way to implement code versioning
and to share code with other team members. What should you do?
a. Store the Terraform code in a version-control system. Establish procedures for
pushing new versions and merging with the master.
b. Store the Terraform code in a network shared folder with child folders for each
version release. Ensure that everyone works on different files.
c. Store the Terraform code in a Cloud Storage bucket using object versioning.
Give access to the bucket to every team member so they can download the
files.
d. Store the Terraform code in a shared Google Drive folder so it syncs
automatically to every team member's computer. Organize files with a naming
convention that identifies each new version.
54. You are using Stackdriver to monitor applications hosted on Google Cloud Platform
(GCP). You recently deployed a new application, but its logs are not appearing on the
Stackdriver dashboard. You need to troubleshoot the issue. What should you do?
a. Confirm that the Stackdriver agent has been installed in the hosting virtual
machine. As with Elastic agents, for example, if the agent is not installed, the
other three options are irrelevant and unnecessary. You need the agent to export
logs, so the first thing to check is always that the agent is present and running;
next the service account, then the client libraries.
b. Confirm that your account has the proper permissions to use the Stackdriver
dashboard.
c. Confirm that port 25 has been opened in the firewall to allow messages
through to Stackdriver.
d. Confirm that the application is using the required client library and the service
account key has proper permissions. Check this only after confirming that the
agent is installed and running: agent first, then the service account, then the
client libraries.
55. Your organization recently adopted a container-based workflow for application
development. Your team develops numerous applications that are deployed
continuously through an automated build pipeline to the production environment. A
recent security audit alerted your team that the code pushed to production could
contain vulnerabilities and that the existing tooling around virtual machine (VM)
vulnerabilities no longer applies to the containerized environment. You need to
ensure the security and patch level of all code running through the pipeline. What
should you do?
a. Set up Container Analysis to scan and report Common Vulnerabilities and
Exposures. To ensure the security and patch level of all code running through
the pipeline, you should set up Container Analysis to scan and report Common
Vulnerabilities and Exposures. Container Analysis is a service on GCP that allows
you to scan and analyze container images for vulnerabilities, malware and other
issues. This will help you identify vulnerabilities in your container images and
take appropriate action to address them.
b. Configure the containers in the build pipeline to always update themselves
before release. The images can still contain vulnerabilities.
c. Reconfigure the existing operating system vulnerability software to exist
inside the container. How does this help the pipeline? Vulnerable code would still
be pushed.
d. Implement static code analysis tooling against the Docker files used to create
the containers. This has no effect on the problem.
56. You use Cloud Build to build your application. You want to reduce the build time while
minimizing cost and development effort. What should you do?
a. Use Cloud Storage to cache intermediate artifacts. To increase the speed of a
build, reuse the results from a previous build. You can copy the results of a
previous build to a Google Cloud Storage bucket, use the results for faster
calculation, and then copy the new results back to the bucket. Use this method
when your build takes a long time and produces a small number of files that
does not take time to copy to and from Google Cloud Storage.
b. Run multiple Jenkins agents to parallelize the build. Granted, this speeds up
the process, but it is costly, and those Jenkins agents could be dedicated to
other purposes.
c. Use multiple smaller build steps to minimize execution time. The process in
question is not a single Docker image; it is the whole application going through
the pipeline, so splitting steps does not by itself reduce the build time.
d. Use larger Cloud Build virtual machines (VMs) by using the machine-type
option. This is valid but requires more effort and cost.
57. You support a web application that is hosted on Compute Engine. The application
provides a booking service for thousands of users. Shortly after the release of a new
feature, your monitoring dashboard shows that all users are experiencing latency at
login. You want to mitigate the impact of the incident on the users of your service.
What should you do first?
a. Roll back the recent release. Rolling back is almost always the best first step
to mitigate user impact.
b. Review the Stackdriver monitoring.
c. Upsize the virtual machines running the login services. This might help, but it
does not guarantee that the impact is mitigated.
d. Deploy a new release to see whether it fixes the problem.
58. You are deploying an application that needs to access sensitive information. You need
to ensure that this information is encrypted and the risk of exposure is minimal if a
breach occurs. What should you do?
a. Store the encryption keys in Cloud Key Management Service (KMS) and rotate
the keys frequently. Cloud KMS supports rotating keys, for example in response to
a security incident, which limits exposure if a breach occurs.
b. Inject the secret at the time of instance creation via an encrypted
configuration management system.
c. Integrate the application with a Single sign-on (SSO) system and do not expose
secrets to the application.
d. Leverage a continuous build pipeline that produces multiple versions of the
secret for each instance of the application.
59. You encounter a large number of outages in the production systems you support. You
receive alerts for all the outages that wake you up at night. The alerts are due to
unhealthy systems that are automatically restarted within a minute. You want to set
up a process that would prevent staff burnout while following Site Reliability
Engineering practices. What should you do?
a. Eliminate unactionable alerts. The problem is fixed automatically by a restart
of the service within a minute, so engineers do not need to be woken up for it. If
the restart failed, or the same system failed repeatedly, then the engineer should
be paged.
b. Create an incident report for each of the alerts.
c. Distribute the alerts to engineers in different time zones.
d. Redefine the related Service Level Objective so that the error budget is not
exhausted.
60. You have migrated an e-commerce application to Google Cloud Platform (GCP). You
want to prepare the application for the upcoming busy season. What should you do
first to prepare for the busy season?
a. Load test the application to profile its performance for scaling. The objective
of the preparation stage is to test the system's ability to scale for peak user
traffic and to document the results. Completing the preparation stage results in
architecture refinement to handle peak traffic more efficiently and increase
system reliability. This stage also yields procedures for operations and support
that help streamline processes for handling the peak event and any issues that
might occur. Consider this stage as practice for the peak event from a system
and operations perspective.
b. Enable AutoScaling on the production clusters, in case there is growth. If
anything, this is done after running the load test.
c. Pre-provision double the compute power used last season, expecting growth.
d. Create a runbook on inflating the disaster recovery (DR) environment if there
is growth.
61. You support a web application that runs on App Engine and uses CloudSQL and Cloud
Storage for data storage. After a short spike in website traffic, you notice a big increase
in latency for all user requests, increase in CPU use, and the number of processes
running the application. Initial troubleshooting reveals:
• After the initial spike in traffic, load levels returned to normal but users still experience
high latency.
• Requests for content from the CloudSQL database and images from Cloud Storage
show the same high latency.
• No changes were made to the website around the time the latency increased.
• There is no increase in the number of errors to the users.
You expect another spike in website traffic in the coming days and want to make sure
users don't experience latency. What should you do?
a. Upgrade the GCS buckets to Multi-Regional. The problem is independent of the
bucket region.
b. Enable high availability on the CloudSQL instances.
c. Move the application from App Engine to Compute Engine.
d. Modify the App Engine configuration to have additional idle instances. With
automatic scaling, App Engine scales the number of instances in response to
processing volume, factoring in the automatic_scaling settings (including idle
instances) provided on a per-version basis in the configuration file. A service
with basic scaling is configured by setting the maximum number of instances in the
max_instances parameter of the basic_scaling setting; the number of live instances
scales with the processing volume. With manual scaling, you configure the number
of instances of each version in that service's configuration file, usually
corresponding to the size of a dataset held in memory or the desired throughput
for offline work, and you can adjust it very quickly, without stopping running
instances, using the Modules API set_num_instances function.
62. Your application runs on Google Cloud Platform (GCP). You need to implement Jenkins
for deploying application releases to GCP. You want to streamline the release process,
lower operational toil, and keep user data secure. What should you do?
a. Implement Jenkins on local workstations.
b. Implement Jenkins on Kubernetes on-premises.
c. Implement Jenkins on Google Cloud Functions. Cloud Functions is not designed to
run an application like Jenkins; Cloud Run would be closer, if anything.
d. Implement Jenkins on Compute Engine virtual machines. Implement Jenkins
on Compute Engine virtual machines. This will allow you to leverage GCP's
security and compliance features, and integrate with other GCP services such as
Cloud Storage or Cloud SQL for storing build artifacts and user data.
Additionally, using Compute Engine virtual machines for Jenkins will provide
flexibility in terms of scaling and managing resources.
63. You are working with a government agency that requires you to archive application
logs for seven years. You need to configure Stackdriver to export and store the logs
while minimizing costs of storage. What should you do?
a. Create a Cloud Storage bucket and develop your application to send logs
directly to the bucket.
b. Develop an App Engine application that pulls the logs from Stackdriver and
saves them in BigQuery.
c. Create an export in Stackdriver and configure Cloud Pub/Sub to store logs in
permanent storage for seven years.
d. Create a sink in Stackdriver, name it, create a bucket on Cloud Storage for
storing archived logs, and then select the bucket as the log export destination.
This allows you to export logs from Stackdriver to a Cloud Storage bucket, which
is a cost-effective option for long-term storage, especially with a Coldline or
Archive storage class. Additionally, you can configure bucket retention policies
to keep the logs for the required seven years (a minimal sink-creation sketch
follows this question).
Sinks control how Cloud Logging routes logs. Using sinks, you can route some or
all of your logs to supported destinations. Some of the reasons that you might
want to control how your logs are routed include the following:
• To store logs that are unlikely to be read but that must be retained for
compliance purposes.
• To organize your logs in buckets in a format that is useful to you.
• To use big-data analysis tools on your logs.
• To stream your logs to other applications, other repositories, or third
parties. For example, if you want to export your logs from Google Cloud
so that you can view them on a third-party platform, then configure a
sink to route your log entries to Pub/Sub.
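For option (d), a minimal sink-creation sketch assuming the google-cloud-logging client library; the sink name, bucket, and filter are placeholders. The sink's writer identity must also be granted write access on the bucket.

```python
# Creates a log sink that routes matching entries to a Cloud Storage bucket.
from google.cloud import logging

client = logging.Client()
sink = client.sink(
    "archive-sink",                          # hypothetical sink name
    filter_='resource.type="gae_app"',       # hypothetical filter for the app logs
    destination="storage.googleapis.com/my-archive-bucket",
)
if not sink.exists():
    sink.create()
    print(f"Created sink {sink.name}")
```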
64. You support a trading application written in Python and hosted on App Engine flexible
environment. You want to customize the error information being sent to Stackdriver
Error Reporting. What should you do?
a. Install the Stackdriver Error Reporting library for Python, and then run your
code on a Compute Engine VM.
b. Install the Stackdriver Error Reporting library for Python, and then run your
code on Google Kubernetes Engine.
c. Install the Stackdriver Error Reporting library for Python, and then run your
code on App Engine flexible environment.
You can send error reports to Error Reporting from Python applications by using
the Error Reporting library for Python (a minimal usage sketch follows this
question). Use the library to create error groups for the following cases:
• Your log bucket has customer-managed encryption keys (CMEK).
• Your log buckets aren't in the global region.
• You want to report custom error events.
d. Use the Stackdriver Error Reporting API to write errors from your application
to ReportedErrorEvent, and then generate log entries with properly formatted
error messages in Stackdriver Logging.
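For option (c), a minimal usage sketch assuming the google-cloud-error-reporting package; the service name and the example error are hypothetical.

```python
# Reports exceptions and custom error events to Error Reporting.
from google.cloud import error_reporting

client = error_reporting.Client(service="trading-app")  # hypothetical service name

def place_order(order):
    try:
        raise ValueError("order validation failed")  # placeholder for real logic
    except Exception:
        client.report_exception()  # sends the current exception and stack trace

# Custom error events can also be reported without an exception context:
client.report("Order book snapshot was empty")
```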
65. You need to define Service Level Objectives (SLOs) for a high-traffic multi-region web
application. Customers expect the application to always be available and have fast
response times. Customers are currently happy with the application performance and
availability. Based on current measurements, you observe that the 90th percentile of
latency is 120ms and the 95th percentile of latency is 275ms over a 28-day window.
What latency SLO would you recommend to the team to publish?
a. 90th percentile – 100ms / 95th percentile – 250ms
b. 90th percentile – 120ms / 95th percentile – 275ms
c. 90th percentile – 150ms / 95th percentile – 300ms. It is better to start from a
looser SLO that you can tighten later than to choose an overly strict SLO that has
to be relaxed after you discover it is unattainable (a small percentile-calculation
sketch follows this question).
d. 90th percentile – 250ms / 95th percentile – 400ms
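As a quick illustration of the percentiles this question is based on, a tiny sketch computing p90/p95 from raw latency samples; the sample values are made up.

```python
# Computes the 90th and 95th latency percentiles from a list of samples.
import numpy as np

latencies_ms = np.array([80, 95, 110, 120, 130, 150, 180, 210, 260, 275])
p90, p95 = np.percentile(latencies_ms, [90, 95])
print(f"p90={p90:.0f}ms  p95={p95:.0f}ms")
```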
66. Your company is developing applications that are deployed on Google Kubernetes
Engine (GKE). Each team manages a different application. You need to create the
development and production environments for each team, while minimizing costs.
Different teams should not be able to access other teams' environments.
What should you do?
a. Create one GCP Project per team. In each project, create a cluster for
Development and one for Production. Grant the teams IAM access to their
respective clusters.
b. Create one GCP Project per team. In each project, create a cluster with a
Kubernetes namespace for Development and one for Production. Grant the
teams IAM access to their respective clusters.
c. Create a Development and a Production GKE cluster in separate projects. In
each cluster, create a Kubernetes namespace per team, and then configure
Identity Aware Proxy so that each team can only access its own namespace.
d. Create a Development and a Production GKE cluster in separate projects. In
each cluster, create a Kubernetes namespace per team, and then configure
Kubernetes Role-based access control (RBAC) so that each team can only
access its own namespace. Option D is a good approach for creating the
development and production environments for each team while minimizing
costs and ensuring that different teams cannot access other teams'
environments. This approach involves creating a Development and Production
GKE cluster in separate GCP projects. In each cluster, a Kubernetes namespace
is created per team. Then, Kubernetes Role-based access control (RBAC) is
configured so that each team can only access its own namespace. This ensures
that the teams are isolated from each other and can only access the resources
they need, while minimizing costs by using the same clusters for different
teams.
67. Some of your production services are running in Google Kubernetes Engine (GKE) in
the eu-west-1 region. Your build system runs in the us-west-1 region. You want to
push the container images from your build system to a scalable registry to maximize
the bandwidth for transferring the images to the cluster. What should you do?
a. Push the images to Google Container Registry (GCR) using the gcr.io
hostname.
b. Push the images to Google Container Registry (GCR) using the us.gcr.io
hostname.
c. Push the images to Google Container Registry (GCR) using the eu.gcr.io
hostname. With the eu prefix, the images are stored in Google's registry in
Europe, which maximizes transfer bandwidth between the registry and the cluster
running in that nearby region.
d. Push the images to a private image registry running on a Compute Engine
instance in the eu-west-1 region.
68. You manage several production systems that run on Compute Engine in the same
Google Cloud Platform (GCP) project. Each system has its own set of dedicated
Compute Engine instances. You want to know how much it costs to run each of the
systems. What should you do?
a. In the Google Cloud Platform Console, use the Cost Breakdown section to
visualize the costs per system.
b. Assign all instances a label specific to the system they run. Configure BigQuery
billing export and query costs per label. Using labels to tag instances with the
specific system they run allows you to easily filter and query costs by system in
BigQuery. This lets you see the costs associated with each system and make
informed decisions about cost optimization (a sample label-based query follows
this question).
c. Enrich all instances with metadata specific to the system they run. Configure
Stackdriver Logging to export to BigQuery, and query costs based on the
metadata.
d. Name each virtual machine (VM) after the system it runs. Set up a usage report
export to a Cloud Storage bucket. Configure the bucket as a source in BigQuery
to query costs based on VM name.
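For option (b), a sketch of the label-based cost query, assuming billing export to BigQuery is already enabled and the instances carry a "system" label; the dataset and table names are hypothetical placeholders.

```python
# Sums billed cost per value of the "system" label from the billing export table.
from google.cloud import bigquery

client = bigquery.Client()
query = """
SELECT l.value AS system, ROUND(SUM(cost), 2) AS total_cost
FROM `my-project.billing_export.gcp_billing_export_v1_XXXXXX`,
     UNNEST(labels) AS l
WHERE l.key = 'system'
GROUP BY system
ORDER BY total_cost DESC
"""
for row in client.query(query).result():
    print(row.system, row.total_cost)
```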
69. You use Cloud Build to build and deploy your application. You want to securely
incorporate database credentials and other application secrets into the build pipeline.
You also want to minimize the development effort. What should you do?
a. Create a Cloud Storage bucket and use the built-in encryption at rest. Store
the secrets in the bucket and grant Cloud Build access to the bucket.
b. Encrypt the secrets and store them in the application repository. Store a
decryption key in a separate repository and grant Cloud Build access to the
repository.
c. Use client-side encryption to encrypt the secrets and store them in a Cloud
Storage bucket. Store a decryption key in the bucket and grant Cloud Build
access to the bucket.
d. Use Cloud Key Management Service (Cloud KMS) to encrypt the secrets and
include them in your Cloud Build deployment configuration. Grant Cloud Build
access to the KeyRing. This option lets you use Google-managed encryption and
access controls, and it also minimizes the development effort required to securely
incorporate the secrets into the build pipeline (a minimal encryption sketch
follows this question).
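For option (d), a minimal encryption sketch assuming the google-cloud-kms library and an existing key ring and key; all resource names are placeholders.

```python
# Encrypts a secret with Cloud KMS so the ciphertext can sit in the build config.
from google.cloud import kms

client = kms.KeyManagementServiceClient()
key_name = client.crypto_key_path(
    "my-project", "global", "build-secrets", "db-credentials"  # placeholders
)
response = client.encrypt(request={"name": key_name, "plaintext": b"db-password"})
ciphertext = response.ciphertext

# At build time, a step that has access to the key ring recovers the secret with:
# client.decrypt(request={"name": key_name, "ciphertext": ciphertext})
```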
70. You support a popular mobile game application deployed on Google Kubernetes
Engine (GKE) across several Google Cloud regions. Each region has multiple
Kubernetes clusters. You receive a report that none of the users in a specific region
can connect to the application. You want to resolve the incident while following Site
Reliability Engineering practices. What should you do first?
a. Reroute the user traffic from the affected region to other regions that don't
report issues. Google always aims to first stop the impact of an incident, and
then find the root cause (unless the root cause just happens to be identified
early on). The issue is that one region is not serving requests, so the first
priority is to make the application responsive again for the affected users as
soon as possible; look into the error logs later.
b. Use Stackdriver Monitoring to check for a spike in CPU or memory usage for
the affected region.
c. Add an extra node pool that consists of high memory and high CPU machine
type instances to the cluster.
d. Use Stackdriver Logging to filter on the clusters in the affected region, and
inspect error messages in the logs. As above: restore service for the affected
users first; inspect the error logs later.
71. You are writing a postmortem for an incident that severely affected users. You want
to prevent similar incidents in the future. Which two of the following sections should
you include in the postmortem? (Choose two.)
a. An explanation of the root cause of the incident.
b. A list of employees responsible for causing the incident.
c. A list of action items to prevent a recurrence of the incident.
d. Your opinion of the incident's severity compared to past incidents.
e. Copies of the design documents for all the services impacted by the incident.
72. You are ready to deploy a new feature of a web-based application to production. You
want to use Google Kubernetes Engine (GKE) to perform a phased rollout to half of the
web server pods. What should you do?
a. Use a partitioned rolling update. A partitioned rolling update allows you to
control the percentage of pods that are updated at a time, which allows you to
perform a phased rollout. This way you can incrementally test and monitor the
new feature, before it is deployed to all the pods. This approach is useful when
you want to minimize the risk of introducing new bugs or breaking changes in
your production environment, it allows you to have more control over the
process, and it's less likely to cause service disruption, or to have all the pods
down at the same time.
b. Use Node taints with NoExecute. Using Node taints with NoExecute could be
used to prevent pods from being scheduled on certain nodes, but it would not
be the best option for a phased rollout as it does not allow for a specific
percentage or number of pods to be updated.
c. Use a replica set in the deployment specification. A replica set in the
deployment specification is used for ensuring that a specified number of
replicas of a pod are running at any given time, but it does not provide a way to
perform a phased rollout.
d. Use a stateful set with parallel pod management policy. A stateful set with
parallel pod management policy is used for managing stateful applications, but
it also does not provide a way to perform a phased rollout.
73. You are responsible for the reliability of a high-volume enterprise application. A large
number of users report that an important subset of the application's functionality,
a data-intensive reporting feature, is consistently failing with an HTTP 500 error. When
you investigate your application's dashboards, you notice a strong correlation
between the failures and a metric that represents the size of an internal queue used
for generating reports. You trace the failures to a reporting backend that is
experiencing high I/O wait times. You quickly fix the issue by resizing the backend's
persistent disk (PD). Now you need to create an availability Service Level Indicator (SLI)
for the report generation feature. How would you define it?
a. As the I/O wait times aggregated across all report generation backends
b. As the proportion of report generation requests that result in a successful
response. For a request-based SLI, availability = successful requests / total
requests (a small calculation sketch follows this question).
c. As the application's report generation queue size compared to a known-good
threshold. Comparing against a known-good threshold makes this an SLO rather than
an SLI.
d. As the reporting backend PD throughput capacity compared to a known-good
threshold. Same issue: this is an SLO, not an SLI.
74. You have an application running in Google Kubernetes Engine. The application invokes
multiple services per request but responds too slowly. You need to identify which
downstream service or services are causing the delay. What should you do?
a. Analyze VPC flow logs along the path of the request.
b. Investigate the Liveness and Readiness probes for each service.
c. Create a Dataflow pipeline to analyze service metrics in real time.
d. Use a distributed tracing framework such as OpenTelemetry or Stackdriver
Trace. Distributed tracing allows you to trace the path of a request as it travels
through multiple services and identify where delays may be occurring. This can
provide detailed information about the request and response timings for each
service, making it easier to pinpoint which services are causing delays in your
application. OpenTelemetry and Stackdriver Trace are both available on GCP,
and provide easy integration with Kubernetes and other GCP services (a minimal
OpenTelemetry sketch follows this question).
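For option (d), a minimal OpenTelemetry sketch in Python, assuming the opentelemetry-api and opentelemetry-sdk packages; the exporter here just prints spans to the console (a Cloud Trace exporter would be wired up the same way), and the span names are placeholders.

```python
# Creates nested spans so slow downstream calls show up as long child spans.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

def handle_request():
    with tracer.start_as_current_span("handle_request"):
        with tracer.start_as_current_span("call_user_service"):
            pass  # downstream call; its duration is recorded on this span
        with tracer.start_as_current_span("call_billing_service"):
            pass  # a slow dependency appears here as the longest child span

handle_request()
```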
75. You are creating and assigning action items in a postmortem for an outage. The outage
is over, but you need to address the root causes. You want to ensure that your team
handles the action items quickly and efficiently. How should you assign owners and
collaborators to action items?
a. Assign one owner for each action item and any necessary collaborators. It is
important to have a clear ownership for each action item, so that there is no
confusion about who is responsible for the task and it will be easier to track the
progress and follow up if there is any delay. The owner should also be given the
necessary authority and resources to carry out the task. The necessary
collaborators should also be assigned to support the owner in completing the
task efficiently. This can be seen, for example, in Azure DevOps boards with task
items.
b. Assign multiple owners for each item to guarantee that the team addresses
items quickly.
c. Assign collaborators but no individual owners to the items to keep the
postmortem blameless.
d. Assign the team lead as the owner for all action items because they are in
charge of the SRE team.
76. Your development team has created a new version of their service's API. You need to
deploy the new versions of the API with the least disruption to third-party developers
and end users of third-party installed applications. What should you do?
a. Introduce the new version of the API. Announce deprecation of the old version
of the API. Deprecate the old version of the API. Contact remaining users of
the old API. Provide best effort support to users of the old API. Turn down the
old version of the API.
b. Announce deprecation of the old version of the API. Introduce the new version
of the API. Contact remaining users on the old API. Deprecate the old version
of the API. Turn down the old version of the API. Provide best effort support
to users of the old API. It makes no sense to announce the deprecation before
introducing the new version.
c. Announce deprecation of the old version of the API. Contact remaining users
on the old API. Introduce the new version of the API. Deprecate the old version
of the API. Provide best effort support to users of the old API. Turn down the
old version of the API. Again, it makes no sense to announce the deprecation
before introducing the new version.
d. Introduce the new version of the API. Contact remaining users of the old API.
Announce deprecation of the old version of the API. Deprecate the old version
of the API. Turn down the old version of the API. Provide best effort support
to users of the old API. The deprecation has not been announced yet, so contacting
the remaining users of the old API at that point makes no sense.
77. You are running an application on Compute Engine and collecting logs through
Stackdriver. You discover that some personally identifiable information (PII) is leaking
into certain log entry fields. You want to prevent these fields from being written in
new log entries as quickly as possible. What should you do?
a. Use the filter-record-transformer Fluentd filter plugin to remove the fields
from the log entries in flight. It is included in the fluentd core and does not
require installing a new plugin. Fluentd is a log collector and processor that is
commonly used with Google Cloud Platform. The filter-record-transformer
plugin for Fluentd can be used to modify log entries as they are being collected,
allowing you to remove sensitive fields from the log entries in real-time before
they are written to Stackdriver. This can be done quickly, as it doesn't require
changes on the application code.
b. Use the fluent-plugin-record-reformer Fluentd output plugin to remove the
fields from the log entries in flight. record_reformer is a Fluentd plugin for
adding or replacing fields of an event record; it might work as well, but (a) is
better because filter-record-transformer ships with the Fluentd core.
c. Wait for the application developers to patch the application, and then verify
that the log entries are no longer exposing PII.
d. Stage log entries to Cloud Storage, and then trigger a Cloud Function to
remove the fields and write the entries to Stackdriver via the Stackdriver
Logging API.
78. You support a service that recently had an outage. The outage was caused by a new
release that exhausted the service memory resources. You rolled back the release
successfully to mitigate the impact on users. You are now in charge of the post-
mortem for the outage. You want to follow Site Reliability Engineering practices when
developing the post-mortem. What should you do?
a. Focus on developing new features rather than avoiding the outages from
recurring.
b. Focus on identifying the contributing causes of the incident rather than the
individual responsible for the cause. According to Site Reliability Engineering
(SRE) practices, the goal of a post-mortem is to identify the underlying causes
of the incident in order to take steps to prevent it from happening again in the
future. This involves looking for patterns and issues in the system rather than
looking for a specific person to blame. It's important to have a focus on learning
and continuous improvement, rather than assigning blame.
c. Plan individual meetings with all the engineers involved. Determine who
approved and pushed the new release to production.
d. Use the Git history to find the related code commit. Prevent the engineer who
made that commit from working on production services.
79. You support a user-facing web application. When analyzing the application's error
budget over the previous six months, you notice that the application has never
consumed more than 5% of its error budget in any given time window. You hold a
Service Level Objective (SLO) review with business stakeholders and confirm that the
SLO is set appropriately. You want your application's SLO to more closely reflect its
observed reliability. What steps can you take to further that goal while balancing
velocity, reliability, and business needs? (Choose two.)
a. Add more serving capacity to all of your application's zones.
b. Have more frequent or potentially risky application releases.
c. Tighten the SLO to match the application's observed reliability.
d. Implement and measure additional Service Level Indicators (SLIs) from the
application.
e. Announce planned downtime to consume more error budget, and ensure that
users are not depending on a tighter SLO.
80. You support a service with a well-defined Service Level Objective (SLO). Over the
previous 6 months, your service has consistently met its SLO and customer satisfaction
has been consistently high. Most of your service's operations tasks are automated and
few repetitive tasks occur frequently. You want to optimize the balance between
reliability and deployment velocity while following site reliability engineering best
practices. What should you do? (Choose two.)
a. Make the service's SLO more strict.
b. Increase the service's deployment velocity and/or risk.
c. Shift engineering time to other services that need more reliability.
d. Get the product team to prioritize reliability work over new features.
e. Change the implementation of your Service Level Indicators (SLIs) to increase
coverage.

For SLO (Met), Toil (Low) and High Customer Satisfaction:

Choose to (a) relax release and deployment processes and increase velocity, or (b) step
back from the engagement and focus engineering time on services that need more
reliability.

81. Your organization uses a change advisory board (CAB) to approve all changes to an
existing service. You want to revise this process to eliminate any negative impact on
the software delivery performance. What should you do? (Choose two.)
a. Replace the CAB with a senior manager to ensure continuous oversight from
development to deployment. Boards can even be mandatory; in very large companies
they are essential for keeping track of all the issues and changes to be made.
b. Let developers merge their own changes, but ensure that the team's
deployment platform can roll back changes if any issues are discovered.
c. Move to a peer-review based process for individual changes that is enforced
at code check-in time and supported by automated tests. Implementing a
peer-review process ensures that changes are reviewed by team members,
which can catch issues early in the development process. Automated tests can
provide additional confidence in the quality of changes. This approach
encourages collaboration and reduces the need for a formal CAB.
d. Batch changes into larger but less frequent software releases.
e. Ensure that the team's development platform enables developers to get fast
feedback on the impact of their changes. Fast feedback mechanisms, such as
automated testing and continuous integration pipelines, allow developers to
quickly identify and address issues with their changes. This reduces the need for
a formal approval board like CAB and promotes a culture of ownership and
responsibility among developers.
82. Your organization has a containerized web application that runs on-premises. As part
of the migration plan to Google Cloud, you need to select a deployment strategy and
platform that meets the following acceptance criteria:
1. The platform must be able to direct traffic from Android devices to an Android-
specific microservice.
2. The platform must allow for arbitrary percentage-based traffic splitting
3. The deployment strategy must allow for continuous testing of multiple versions of
any microservice.
What should you do?
a. Deploy the canary release of the application to Cloud Run. Use traffic splitting
to direct 10% of user traffic to the canary release based on the revision tag.
b. Deploy the canary release of the application to App Engine. Use traffic splitting
to direct a subset of user traffic to the new version based on the IP address.
c. Deploy the canary release of the application to Compute Engine. Use Anthos
Service Mesh with Compute Engine to direct 10% of user traffic to the canary
release by configuring the virtual service.
d. Deploy the canary release to Google Kubernetes Engine with Anthos Service
Mesh. Use traffic splitting to direct 10% of user traffic to the new version based
on the user-agent header configured in the virtual service. This option offers a
comprehensive solution that aligns well with your criteria. Anthos Service Mesh,
integrated with Google Kubernetes Engine (GKE), supports advanced traffic
management capabilities, including the ability to perform traffic splitting based
on HTTP headers. This would allow you to use the user-agent header to identify
Android devices and direct traffic accordingly. Additionally, it supports arbitrary
percentage-based traffic splitting and allows for the testing of multiple versions
of a microservice, meeting the requirement for continuous testing.
83. Your team is running microservices in Google Kubernetes Engine (GKE). You want to
detect consumption of an error budget to protect customers and define release
policies. What should you do?
a. Create SLIs from metrics. Enable Alert Policies if the services do not pass.
b. Use the metrics from Anthos Service Mesh to measure the health of the
microservices.
c. Create a SLO. Create an Alert Policy on select_slo_burn_rate. This approach
involves defining specific SLOs for your services, which are quantitative
measures of the desired reliability of a service. Once you have these SLOs, you
can set up Alert Policies based on the rate at which your error budget is
consumed (burn rate). Using select_slo_burn_rate is a more granular way to
detect consumption of error budget. It can be used to monitor individual SLOs
and to identify specific types of errors. However, it can be more difficult to set
up and to interpret the results (a small burn-rate calculation sketch follows this
question).
d. Create a SLO and configure uptime checks for your services. Enable Alert
Policies if the services do not pass. Creating an SLO and configuring uptime
checks is a good way to get a high-level view of the health of your services. It
can also help you to identify trends over time. However, it can be difficult to
configure uptime checks for complex services, and it may not be possible to
detect all types of errors.
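A tiny sketch of the burn-rate idea behind select_slo_burn_rate in option (c); the numbers are made up.

```python
# Burn rate: how fast the error budget is consumed relative to what the SLO allows.
def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    allowed_error_ratio = 1.0 - slo_target  # the error budget, e.g. 0.001 for 99.9%
    return observed_error_ratio / allowed_error_ratio

# Example: a 99.9% SLO with 0.5% of requests failing burns the budget 5x too fast,
# which is a reasonable condition to alert on.
print(burn_rate(observed_error_ratio=0.005, slo_target=0.999))
```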
84. Your organization wants to collect system logs that will be used to generate
dashboards in Cloud Operations for their Google Cloud project. You need to configure
all current and future Compute Engine instances to collect the system logs, and you
must ensure that the Ops Agent remains up to date. What should you do?
a. Use the gcloud CLI to install the Ops Agent on each VM listed in the Cloud Asset
Inventory,
b. Select all VMs with an Agent status of Not detected on the Cloud Operations
VMs dashboard. Then select Install agents.
c. Use the gcloud CLI to create an Agent Policy. It will ensure the installation will
be done in the future as well. Agent policies enable automated installation and
maintenance of the Google Cloud Observability agents across a fleet of
Compute Engine VMs that match user-specified criteria. With one command,
you can create a policy for a Google Cloud project that governs existing and new
VMs associated with that Google Cloud project, ensuring proper installation and
optional auto-upgrade of all Google Cloud Observability agents on those VMs.
d. Install the Ops Agent on the Compute Engine image by using a startup script

In general terms, the answer is (c), because you are not going to install an agent
machine by machine. Think of Elastic, where you had an agent policy and linked that
policy to a cluster, for example; here it is the same idea but applied to the VMs,
with their corresponding configuration.

85. Your company has a Google Cloud resource hierarchy with folders for production, test,
and development. Your cyber security team needs to review your company's Google
Cloud security posture to accelerate security issue identification and resolution. You
need to centralize the logs generated by Google Cloud services from all projects only
inside your production folder to allow for alerting and near-real time analysis. What
should you do?
a. Enable the Workflows API and route all the logs to Cloud Logging.
b. Create a central Cloud Monitoring workspace and attach all related projects.
c. Create an aggregated log sink associated with the production folder that uses
a Pub/Sub topic as the destination. Not a bad option either, but (d) makes more
sense for near-real-time analysis directly in Cloud Logging.
d. Create an aggregated log sink associated with the production folder that uses
a Cloud Logging bucket as the destination. Routed logs are generally available
within seconds of their arrival to Logging, with 99% of logs available in less than
60 seconds. Cloud Logging: " A log bucket can store logs that are received by
multiple Google Cloud projects."
86. You are configuring the frontend tier of an application deployed in Google Cloud. The
frontend tier is hosted in nginx and deployed using a managed instance group with an
Envoy-based external HTTP(S) load balancer in front. The application is deployed
entirely within the europe-west2 region, and only serves users based in the United
Kingdom. You need to choose the most cost-effective network tier and load balancing
configuration. What should you use?
a. Premium Tier with a global load balancer
b. Premium Tier with a regional load balancer
c. Standard Tier with a global load balancer
d. Standard Tier with a regional load balancer. europe-west2 is London; since
everything is in a single region and serves only UK users, a regional load
balancer on the Standard network tier is the most cost-effective choice.
87. You recently deployed your application in Google Kubernetes Engine (GKE) and now
need to release a new version of the application. You need the ability to instantly roll
back to the previous version of the application in case there are issues with the new
version. Which deployment model should you use?
a. Perform a rolling deployment, and test your new application after the
deployment is complete.
b. Perform A/B testing, and test your application periodically after the
deployment is complete.
c. Perform a canary deployment, and test your new application periodically after
the new version is deployed.
d. Perform a blue/green deployment, and test your new application after the
deployment is complete. This method involves deploying the new version of
your application alongside the old version (two separate but identical
environments: blue for the old version and green for the new one). You then
switch traffic from blue to green. If any issues arise with the green environment
(the new version), you can instantly route traffic back to the blue environment
(the old version). This approach offers the fastest rollback mechanism as it
merely involves a change in the traffic routing.
88. You are building and deploying a microservice on Cloud Run for your organization.
Your service is used by many applications internally. You are deploying a new release,
and you need to test the new version extensively in the staging and production
environments. You must minimize user and developer impact. What should you do?
a. Deploy the new version of the service to the staging environment. Split the
traffic, and allow 1% of traffic through to the latest version. Test the latest
version. If the test passes, gradually roll out the latest version to the staging
and production environments. Deploying the new version to the staging
environment and allowing only 1% of traffic to the latest version minimizes user
and developer impact, as the majority of traffic continues to be served by the
current version. Testing the latest version in the staging environment ensures
that any issues can be identified and addressed before rolling out the new
version to production.
b. Deploy the new version of the service to the staging environment. Split the
traffic, and allow 50% of traffic through to the latest version. Test the latest
version. If the test passes, send all traffic to the latest version. Repeat for the
production environment.
c. Deploy the new version of the service to the staging environment with a new-
release tag without serving traffic. Test the new-release version. If the test
passes, gradually roll out this tagged version. Repeat for the production
environment. In my view, the usual reasoning against this option is not quite right; the claim is that deploying the new version with a new-release tag without serving traffic in the staging environment delays the testing process and does not provide an accurate representation of how the new version performs under real-world conditions with actual traffic.
d. Deploy a new environment with the green tag to use as the staging
environment. Deploy the new version of the service to the green environment
and test the new version. If the tests pass, send all traffic to the green
environment and delete the existing staging environment. Repeat for the
production environment.
89. You work for a global organization and run a service with an availability target of 99%
with limited engineering resources.
For the current calendar month, you noticed that the service has 99.5% availability.
You must ensure that your service meets the defined availability goals and can react
to business changes, including the upcoming launch of new features.
You also need to reduce technical debt while minimizing operational costs. You want
to follow Google-recommended practices. What should you do?
a. Add N+1 redundancy to your service by adding additional compute resources
to the service.
b. Identify, measure, and eliminate toil by automating repetitive tasks. In the
context of running a service with a 99% availability target and limited
engineering resources, this approach aligns with Google's Site Reliability
Engineering (SRE) principles. By automating manual and repetitive operational
work, engineering teams can enhance efficiency, reduce the risk of human
error, and free up valuable resources. The emphasis on eliminating toil not only
contributes to meeting availability targets by minimizing the potential for errors
but also allows engineering teams to allocate more time to strategic tasks,
reducing technical debt, and facilitating a more agile response to business
changes, such as the launch of new features.
c. Define an error budget for your service level availability and minimize the
remaining error budget.
d. Allocate available engineers to the feature backlog while you ensure that the
service remains within the availability target
90. You are developing the deployment and testing strategies for your CI/CD pipeline in
Google Cloud. You must be able to:
• Reduce the complexity of release deployments and minimize the duration of
deployment rollbacks.
• Test real production traffic with a gradual increase in the number of affected users.
You want to select a deployment and testing strategy that meets your requirements.
What should you do?
a. Recreate deployment and canary testing
b. Blue/green deployment and canary testing. Blue/Green Deployment: In a
blue/green deployment, you maintain two separate environments, one (blue)
with the current version of your application in production and another (green)
with the new version. You switch traffic from the blue environment to the green
environment once testing is successful.
Canary Testing: Canary testing involves gradually rolling out a new version of
the application to a small subset of users or traffic. This allows for real
production traffic testing with minimal impact, and if issues are detected, the
deployment can be rolled back quickly.
c. Rolling update deployment and A/B testing
d. Rolling update deployment and shadow testing
91. You are creating a CI/CD pipeline to perform Terraform deployments of Google Cloud
resources. Your CI/CD tooling is running in Google Kubernetes Engine (GKE) and uses
an ephemeral Pod for each pipeline run. You must ensure that the pipelines that run
in the Pods have the appropriate Identity and Access Management (IAM) permissions
to perform the Terraform deployments. You want to follow Google-recommended
practices for identity management. What should you do? (Choose two.)
a. Create a new Kubernetes service account, and assign the service account to
the Pods. Use Workload Identity to authenticate as the Google service
account. Suggests creating a new Kubernetes service account and assigning it
to the Pods. This service account is then associated with a Google service
account using Workload Identity. This setup enables seamless authentication of
Pods as the specified Google service account without relying on manual
management of service account keys.
b. Create a new JSON service account key for the Google service account, store
the key as a Kubernetes secret, inject the key into the Pods, and set the
GOOGLE_APPLICATION_CREDENTIALS environment variable.
c. Create a new Google service account, and assign the appropriate IAM
permissions. Complements the approach by emphasizing the creation of a new
Google service account and assigning the necessary IAM permissions. While the
Kubernetes service account establishes the identity within the GKE cluster, the
Google service account is associated with the underlying Google Cloud
resources, ensuring that the appropriate permissions are granted for Terraform
deployments.
d. Create a new JSON service account key for the Google service account, store
the key in the secret management store for the CI/CD tool, and configure
Terraform to use this key for authentication.
e. Assign the appropriate IAM permissions to the Google service account
associated with the Compute Engine VM instances that run the Pods.
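A minimal sketch of the Workload Identity setup described in options (a) and (c). The project ID, namespace, and account names are assumptions, and Workload Identity is assumed to be enabled on the cluster:

# Google service account that will hold the IAM permissions Terraform needs.
gcloud iam service-accounts create terraform-deployer --project=my-project

# Kubernetes service account used by the ephemeral pipeline Pods.
kubectl create serviceaccount pipeline-ksa --namespace ci

# Allow the Kubernetes service account to impersonate the Google service account.
gcloud iam service-accounts add-iam-policy-binding \
    terraform-deployer@my-project.iam.gserviceaccount.com \
    --role=roles/iam.workloadIdentityUser \
    --member="serviceAccount:my-project.svc.id.goog[ci/pipeline-ksa]"

# Annotate the Kubernetes service account so Workload Identity maps it to the Google SA.
kubectl annotate serviceaccount pipeline-ksa --namespace ci \
    iam.gke.io/gcp-service-account=terraform-deployer@my-project.iam.gserviceaccount.com

The Pods then pick up short-lived credentials automatically; no service account key is ever created.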
92. You are the on-call Site Reliability Engineer for a microservice that is deployed to a
Google Kubernetes Engine (GKE) Autopilot cluster. Your company runs an online store
that publishes order messages to Pub/Sub, and a microservice receives these
messages and updates stock information in the warehousing system. A sales event
caused an increase in orders, and the stock information is not being updated quickly
enough. This is causing a large number of orders to be accepted for products that are
out of stock. You check the metrics for the microservice and compare them to typical
levels: [The metrics comparison was shown as an image and is not reproduced here.]
You need to ensure that the warehouse system accurately reflects product inventory
at the time orders are placed and minimize the impact on customers. What should you
do?
a. Decrease the acknowledgment deadline on the subscription.
b. Add a virtual queue to the online store that allows typical traffic levels.
c. Increase the number of Pod replicas. By scaling horizontally, additional
instances of the microservice are created, enabling parallel processing of order
messages and reducing the backlog. This approach leverages the flexibility and
automation of GKE Autopilot, ensuring that the infrastructure dynamically
adapts to the increased load, providing a more responsive and scalable solution
to minimize the impact on customers and maintain accurate stock information
in the warehousing system.
d. Increase the Pod CPU and memory limits.
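A quick mitigation along the lines of option (c), assuming the consumer Deployment is named stock-updater:

# Scale the Pub/Sub consumer out so more Pods process order messages in parallel.
kubectl scale deployment stock-updater --replicas=10

A Horizontal Pod Autoscaler driven by a suitable metric could then keep the replica count adjusted automatically.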
93. Your team deploys applications to three Google Kubernetes Engine (GKE)
environments: development, staging, and production. You use GitHub repositories as
your source of truth. You need to ensure that the three environments are consistent.
You want to follow Google-recommended practices to enforce and install network
policies and a logging DaemonSet on all the GKE clusters in those environments. What
should you do?
a. Use Google Cloud Deploy to deploy the network policies and the DaemonSet.
Use Cloud Monitoring to trigger an alert if the network policies and
DaemonSet drift from your source in the repository.
b. Use Google Cloud Deploy to deploy the DaemonSet and use Policy Controller
to configure the network policies. Use Cloud Monitoring to detect drifts from
the source in the repository and Cloud Functions to correct the drifts.
c. Use Cloud Build to render and deploy the network policies and the
DaemonSet. Set up Config Sync to sync the configurations for the three
environments. C is not as effective as D because it does not enforce the network
policies and DaemonSet configurations. This means that unauthorized changes
could still be made to the configurations.
Config Sync is a tool that can be used to synchronize Kubernetes configurations
across multiple clusters. However, it does not prevent unauthorized changes
from being made to the configurations.
d. Use Cloud Build to render and deploy the network policies and the
DaemonSet. Set up a Policy Controller to enforce the configurations for the
three environments. Policy Controller is a tool that can be used to enforce
Kubernetes configurations. It does this by monitoring the Kubernetes API for
changes to the configurations and automatically reverting unauthorized
changes.
94. You are using Terraform to manage infrastructure as code within a CI/CD pipeline. You
notice that multiple copies of the entire infrastructure stack exist in your Google Cloud
project, and a new copy is created each time a change to the existing infrastructure is
made. You need to optimize your cloud spend by ensuring that only a single instance
of your infrastructure stack exists at a time. You want to follow Google-recommended
practices. What should you do?
a. Create a new pipeline to delete old infrastructure stacks when they are no
longer needed.
b. Confirm that the pipeline is storing and retrieving the terraform.tfstate file
from Cloud Storage with the Terraform gcs backend. By storing and retrieving
the terraform.tfstate file from Cloud Storage with the Terraform Google Cloud
Storage (GCS) backend, you centralize the state management. This ensures that
Terraform has a single source of truth for the infrastructure state, preventing
the creation of redundant instances. The GCS backend enables state locking,
consistency, and collaboration across the CI/CD pipeline, allowing for proper
tracking and management of infrastructure changes. This practice helps avoid
unnecessary duplication of resources and promotes efficient cloud spend
management.
c. Verify that the pipeline is storing and retrieving the terraform.tfstate file from
a source control system. This is not wrong in itself, but the question asks for Google-recommended practices, so the better choice is to use Google's own services, as in option (b).
d. Update the pipeline to remove any existing infrastructure before you apply
the latest configuration.
95. You are creating Cloud Logging sinks to export log entries from Cloud Logging to
BigQuery for future analysis. Your organization has a Google Cloud folder named Dev
that contains development projects and a folder named Prod that contains production
projects. Log entries for development projects must be exported to dev_dataset, and
log entries for production projects must be exported to prod_dataset. You need to
minimize the number of log sinks created, and you want to ensure that the log sinks
apply to future projects. What should you do?
a. Create a single aggregated log sink at the organization level. This would work, but it gives you no quick way to separate development logs from production logs in the exported data.
b. Create a log sink in each project. If you do this, future projects will not export their logs.
c. Create two aggregated log sinks at the organization level, and filter by project ID. This also works, but it is inefficient: the project ID filters would have to be updated every time a project is added.
d. Create an aggregated log sink in the Dev and Prod folders. By creating an
aggregated log sink at the folder level for both the Dev and Prod folders, you
can enforce consistent export configurations for all existing and forthcoming
projects within each folder. This approach facilitates centralized management,
eliminating the need for creating individual sinks for each project. It reduces
complexity and ensures that any new projects added to the Dev or Prod folders
will automatically adopt the log sink configuration, streamlining the process and
fostering uniformity throughout the organization.
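A minimal sketch of option (d): one aggregated sink per folder, each exporting to the matching BigQuery dataset. The folder IDs and the destination project name are placeholders:

gcloud logging sinks create dev-logs-to-bq \
    bigquery.googleapis.com/projects/central-logging/datasets/dev_dataset \
    --folder=DEV_FOLDER_ID --include-children

gcloud logging sinks create prod-logs-to-bq \
    bigquery.googleapis.com/projects/central-logging/datasets/prod_dataset \
    --folder=PROD_FOLDER_ID --include-children

The --include-children flag makes each sink apply to every current and future project under its folder; each sink's writer identity still needs write access to its dataset.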
96. Your company runs services by using multiple globally distributed Google Kubernetes
Engine (GKE) clusters. Your operations team has set up workload monitoring that uses
Prometheus-based tooling for metrics, alerts, and generating dashboards. This setup
does not provide a method to view metrics globally across all clusters. You need to
implement a scalable solution to support global Prometheus querying and minimize
management overhead. What should you do?
a. Configure Prometheus cross-service federation for centralized data access.
b. Configure workload metrics within Cloud Operations for GKE.
c. Configure Prometheus hierarchical federation for centralized data access.
d. Configure Google Cloud Managed Service for Prometheus. This approach
involves leveraging the fully managed service offered by Google Cloud for
Prometheus, which streamlines the collection, querying, and alerting on
metrics. With this service, you can effortlessly centralize metrics from various
globally distributed GKE clusters, eliminating the need for intricate
configurations or federation mechanisms. By opting for the Google Cloud
Managed Service for Prometheus, you simplify operational tasks, reduce
administrative overhead, and gain a unified and straightforward method for
monitoring and analyzing metrics on a global scale across all clusters within the
Google Cloud environment.
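If a cluster was created without it, managed collection can be enabled per cluster; a sketch with placeholder names (the flag is assumed to be available in your gcloud version):

# Enable Google Cloud Managed Service for Prometheus managed collection on an existing cluster.
gcloud container clusters update my-cluster \
    --region=europe-west1 \
    --enable-managed-prometheus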
97. You need to build a CI/CD pipeline for a containerized application in Google Cloud.
Your development team uses a central Git repository for trunk-based development.
You want to run all your tests in the pipeline for any new versions of the application
to improve the quality. What should you do?
The correct answer is D. [The answer options for this question were shown as an image and are not reproduced here.]
In this method, unit tests are seamlessly integrated into the initial stages of the pipeline,
automatically triggered when code is pushed. Successful unit tests then prompt Cloud
Build to construct and push the application container to a central registry. The pipeline
subsequently orchestrates the deployment of the container to a testing environment,
where comprehensive integration and acceptance tests are executed. Only when all
tests pass successfully does the pipeline advance to deploying the application to the
production environment, accompanied by the execution of smoke tests.
This systematic and thorough process ensures that any code changes undergo rigorous
testing at various stages, ensuring high-quality standards and instilling confidence in the
deployment process to production.
98. The new version of your containerized application has been tested and is ready to be
deployed to production on Google Kubernetes Engine (GKE). You could not fully load-
test the new version in your pre-production environment, and you need to ensure that
the application does not have performance problems after deployment. Your
deployment must be automated. What should you do?
a. Deploy the application through a continuous delivery pipeline by using canary
deployments. Use Cloud Monitoring to look for performance issues, and ramp
up traffic as supported by the metrics. In Blue/Green deployment you can
rollback quickly after facing the performance issue, but in Canary you can
detect performance issue on partial deployment and rollback before the issue
get affected. Canary approach is better in this specific scenario as it allows for
monitoring the performance of the new version in production while minimizing
the risk of widespread issues.
b. Deploy the application through a continuous delivery pipeline by using
blue/green deployments. Migrate traffic to the new version of the application
and use Cloud Monitoring to look for performance issues.
c. Deploy the application by using kubectl and use Config Connector to slowly
ramp up traffic between versions. Use Cloud Monitoring to look for
performance issues.
d. Deploy the application by using kubectl and set the spec.updateStrategy.type
field to RollingUpdate. Use Cloud Monitoring to look for performance issues,
and run the kubectl rollback command if there are any issues.
99. You are managing an application that runs in Compute Engine. The application uses a
custom HTTP server to expose an API that is accessed by other applications through
an internal TCP/UDP load balancer. A firewall rule allows access to the API port from
0.0.0.0/0. You need to configure Cloud Logging to log each IP address that accesses
the API by using the fewest number of steps. What should you do first?
a. Enable Packet Mirroring on the VPC.
b. Install the Ops Agent on the Compute Engine instances.
c. Enable logging on the firewall rule.
d. Enable VPC Flow Logs on the subnet. To configure Cloud Logging to log each IP
address accessing the API with the fewest steps in a Compute Engine
environment using an internal TCP/UDP load balancer, the first step would be
to enable VPC Flow Logs on the subnet. That will allow you to capture network
flow information, including source and destination IP addresses, as traffic
passes through the load balancer.
VPC Flow Logs provide detailed visibility into network activity without requiring
modifications to individual instances or the installation of additional agents.
Enabling VPC Flow Logs is a straightforward and efficient way to capture the
necessary information for logging IP addresses accessing the API in a Compute
Engine environment.
Professional Cloud DevOps – Google Cloud II
1. Your organization wants to increase the availability target of an application from
99.9% to 99.99% for an investment of $2,000. The application's current revenue is
$1,000,000. You need to determine whether the increase in availability is worth the
investment for a single year of usage. What should you do?
a. Calculate the value of improved availability to be $900, and determine that
the increase in availability is not worth the investment. To assess the cost-
effectiveness of investing in an availability increase from 99.9% to 99.99% for a
single year, it's essential to calculate the additional revenue generated by the
improved availability.
The improvement in availability is 99.99% - 99.9% = 0.09%, so its value is $1,000,000 x 0.0009 = $900. This indicates that the potential revenue protected by the higher availability is $900. Given that
the investment cost is $2,000, the gain falls short of covering the investment,
making the decision not to pursue the availability increase financially
impractical.
b. Calculate the value of improved availability to be $1,000, and determine that
the increase in availability is not worth the investment.
c. Calculate the value of improved availability to be $1,000, and determine that
the increase in availability is worth the investment.
d. Calculate the value of improved availability to be $9,000, and determine that
the increase in availability is worth the investment.
2. A third-party application needs to have a service account key to work properly. When
you try to export the key from your cloud project, you receive an error: “The
organization policy constraint iam.disableServiceAccountKeyCreation is enforced.”
You need to make the third-party application work while following Google-
recommended security practices.
What should you do?
a. Enable the default service account key, and download the key.
b. Remove the iam.disableServiceAccountKeyCreation policy at the organization
level, and create a key.
c. Disable the service account key creation policy at the project's folder, and
download the default key.
d. Add a rule to set the iam.disableServiceAccountKeyCreation policy to off in
your project, and create a key. By adding a rule to set the
"iam.disableServiceAccountKeyCreation" policy to "off" specifically in your
project, you can override the organization-level constraint temporarily for your
project. This allows you to create the necessary service account key for the
third-party application without compromising the organization-wide security
policy. This targeted adjustment ensures that the key creation is enabled only
for the project in question, maintaining security standards across the broader
organization.
3. Your team is writing a postmortem after an incident on your external facing
application. Your team wants to improve the postmortem policy to include triggers
that indicate whether an incident requires a postmortem. Based on Site Reliability
Engineering (SRE) practices, what triggers should be defined in the postmortem
policy? (Choose two.)
a. An external stakeholder asks for a postmortem
b. Data is lost due to an incident.
c. An internal stakeholder requests a postmortem.
d. The monitoring system detects that one of the instances for your application
has failed.
e. The CD pipeline detects an issue and rolls back a problematic release
Teams have some internal flexibility, but common postmortem triggers include:
• User-visible downtime or degradation beyond a certain threshold
• Data loss of any kind
• On-call engineer intervention (release rollback, rerouting of traffic, etc.)
• A resolution time above some threshold
• A monitoring failure (which usually implies manual incident discovery)
Options C and E can also justify a postmortem, but some triggers carry more weight than others, with B above all.
4. You are implementing a CI/CD pipeline for your application in your company’s multi-
cloud environment. Your application is deployed by using custom Compute Engine
images and the equivalent in other cloud providers. You need to implement a solution
that will enable you to build and deploy the images to your current environment and
is adaptable to future changes. Which solution stack should you use?
a. Cloud Build with Packer. Cloud Build integrates seamlessly with Packer, a tool
for creating machine images across various platforms. This combination
provides a flexible and scalable solution for building custom images and
deploying them across different cloud providers. Packer allows you to define
infrastructure as code and supports multiple cloud providers, ensuring
adaptability to future changes in the multi-cloud environment. The unified
approach of Cloud Build and Packer streamlines the CI/CD pipeline, enabling
efficient image creation and deployment processes while maintaining cross-
cloud compatibility.
b. Cloud Build with Google Cloud Deploy. This does not work for multi-cloud; Google Cloud Deploy targets Google Cloud only.
c. Google Kubernetes Engine with Google Cloud Deploy
d. Cloud Build with kpt
5. Your application's performance in Google Cloud has degraded since the last release.
You suspect that downstream dependencies might be causing some requests to take
longer to complete. You need to investigate the issue with your application to
determine the cause. What should you do?
a. Configure Error Reporting in your application.
b. Configure Google Cloud Managed Service for Prometheus in your application.
c. Configure Cloud Profiler in your application.
d. Configure Cloud Trace in your application. Cloud Trace is a distributed tracing
system that collects latency data from your applications and displays it in the
Google Cloud Console. You can track how requests propagate through your
application and receive detailed near real-time performance insights.
Outbound HTTP requests to other services, for example, are downstream dependencies.
Cloud Trace provides detailed insights into the end-to-end latency of requests,
enabling you to trace the execution flow across various services and
dependencies. By utilizing tracing, you can identify bottlenecks and delays,
allowing for a comprehensive analysis of the application's performance. Cloud
Trace offers valuable data, including latency information and detailed timelines
for each request, facilitating effective troubleshooting and performance
optimization. This makes it a suitable choice when investigating issues related
to request completion times and downstream dependencies in your Google
Cloud environment.
6. You are creating a CI/CD pipeline in Cloud Build to build an application container
image. The application code is stored in GitHub. Your company requires that
production image builds are only run against the main branch and that the change
control team approves all pushes to the main branch. You want the image build to be
as automated as possible. What should you do? (Choose two.)
a. Create a trigger on the Cloud Build job. Set the repository event setting to ‘Pull
request’.
b. Add the OWNERS file to the Included files filter on the trigger.
c. Create a trigger on the Cloud Build job. Set the repository event setting to
‘Push to a branch’. Setting the repository event setting to ‘Push to a branch’ will
trigger the Cloud Build job whenever a push is made to any branch in the
repository. This is necessary because you want the image build to be triggered
when a push is made to the main branch.
d. Configure a branch protection rule for the main branch on the repository.
Configuring a branch protection rule for the main branch on the repository will
require that all pushes to the main branch be approved by the change control
team. This is necessary to ensure that only approved changes are made to the
main branch, which will then trigger the image build.
e. Enable the Approval option on the trigger.
7. You built a serverless application by using Cloud Run and deployed the application to
your production environment. You want to identify the resource utilization of the
application for cost optimization. What should you do?
a. Use Cloud Trace with distributed tracing to monitor the resource utilization of
the application.
b. Use Cloud Profiler with Ops Agent to monitor the CPU and memory utilization
of the application.
c. Use Cloud Monitoring to monitor the container CPU and memory utilization
of the application. Cloud Monitoring provides comprehensive monitoring
capabilities, allowing you to track container CPU and memory utilization
effectively. Specifically tailored for containerized environments like Cloud Run,
Cloud Monitoring provides insights into key metrics, enabling you to analyze
resource consumption, identify potential bottlenecks, and optimize costs based
on observed utilization patterns.
By monitoring container CPU and memory metrics, you gain valuable data for
making informed decisions about resource allocation, ensuring efficient usage,
and ultimately optimizing the cost of running the serverless application in your
production environment.
d. Use Cloud Ops to create logs-based metrics to monitor the resource utilization
of the application.
8. Your company is using HTTPS requests to trigger a public Cloud Run-hosted service
accessible at the https://booking-engine-abcdef.a.run.app URL. You need to give
developers the ability to test the latest revisions of the service before the service is
exposed to customers. What should you do?
a. Run the gcloud run deploy booking-engine --no-traffic --tag dev command. Use
the https://dev--booking-engine-abcdef.a.run.app URL for testing. To enable
developers to test the latest revisions of a Cloud Run-hosted service before
exposing it to customers, the recommended approach is to use the --no-traffic
flag during deployment.
By running the command gcloud run deploy booking-engine --no-traffic --tag
dev, you deploy the service with no traffic routed to it initially. Subsequently,
developers can test the latest revisions using the URL https://dev--booking-engine-abcdef.a.run.app. This allows for a controlled and private testing
environment where developers can validate the service's functionality and
behavior before making it publicly accessible. Utilizing a dedicated URL with the
--tag option ensures that developers can interact with the specific version
intended for testing, facilitating a smooth and secure testing process.
b. Run the gcloud run services update-traffic booking-engine --to-revisions
LATEST=1 command. Use the https://booking-engine-abcdef.a.run.app URL for testing. This shifts live production traffic to the latest revision on the public URL, exposing customers to the untested version.
c. Pass the curl –H “Authorization:Bearer $(gcloud auth print-identity-token)”
auth token. Use the https://booking-engine-abcdef.a.run.app URL to test
privately.
d. Grant the roles/run.invoker role to the developers testing the booking-engine
service. Use the https://booking-engine-abcdef.private.run.app URL for
testing.
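A sketch of the flow in option (a); the image path and region are assumptions:

# Deploy a revision that receives no production traffic but gets its own tagged URL.
gcloud run deploy booking-engine \
    --image=europe-west1-docker.pkg.dev/my-project/apps/booking-engine:new \
    --region=europe-west1 --no-traffic --tag=dev

# Later, optionally shift a small share of live traffic to the tagged revision.
gcloud run services update-traffic booking-engine \
    --region=europe-west1 --to-tags=dev=5

The deploy command prints the tag-specific URL that developers use for testing.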
9. You are configuring connectivity across Google Kubernetes Engine (GKE) clusters in
different VPCs. You notice that the nodes in Cluster A are unable to access the nodes
in Cluster B. You suspect that the workload access issue is due to the network
configuration. You need to troubleshoot the issue but do not have execute access to
workloads and nodes. You want to identify the layer at which the network
connectivity is broken. What should you do?
a. Install a toolbox container on the node in Cluster A. Confirm that the routes to Cluster B are configured appropriately.
b. Use Network Connectivity Center to perform a Connectivity Test from Cluster
A to Cluster B. By performing a Connectivity Test from Cluster A to Cluster B
using Network Connectivity Center, you can assess the network path and
identify potential issues affecting the connectivity. This diagnostic tool helps
pinpoint where the connectivity breaks down, enabling you to analyze the
network configuration and resolve any misconfigurations or obstacles that
might be hindering communication between the clusters. This approach
provides a focused and efficient way to troubleshoot the specific network layer
where the connectivity issue occurs without requiring execute access to
workloads and nodes in the clusters.
c. Use a debug container to run the traceroute command from Cluster A to
Cluster B and from Cluster B to Cluster A. Identify the common failure point.
d. Enable VPC Flow Logs in both VPCs, and monitor packet drops.
10. You manage an application that runs in Google Kubernetes Engine (GKE) and uses the
blue/green deployment methodology. Extracts of the Kubernetes manifests are
shown below: [The manifest extracts were shown as an image and are not reproduced here.]
The Deployment app-green was updated to use the new version of the application.
During post-deployment monitoring, you notice that the majority of user requests are
failing. You did not observe this behavior in the testing environment. You need to
mitigate the incident impact on users and enable the developers to troubleshoot the
issue. What should you do?
a. Update the Deployment app-blue to use the new version of the application.
b. Update the Deployment app-green to use the previous version of the
application.
c. Change the selector on the Service app-svc to app: my-app.
d. Change the selector on the Service app-svc to app: my-app, version: blue. This
adjustment allows the Service to route traffic specifically to the previous version
of the application (app-blue). By changing the selector to include the version
label, you effectively roll back the traffic to the working version, providing a
quick and effective solution to address the user request failures observed after
the update to app-green.
This enables developers to investigate and troubleshoot the issue in a controlled
manner without affecting user experience, and it serves as an interim solution
while further investigation and remediation are carried out.
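A minimal sketch of option (d), using the Service and label names from the question:

# Point the Service back at the blue (previous) Deployment's Pods.
kubectl patch service app-svc \
    -p '{"spec":{"selector":{"app":"my-app","version":"blue"}}}'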
11. You are running a web application deployed to a Compute Engine managed instance
group. Ops Agent is installed on all instances. You recently noticed suspicious activity
from a specific IP address. You need to configure Cloud Monitoring to view the number
of requests from that specific IP address with minimal operational overhead. What
should you do?
a. Configure the Ops Agent with a logging receiver. Create a logs-based metric.
To efficiently monitor the number of requests from a specific IP address in a
web application deployed to a Compute Engine managed instance group with
Ops Agent installed, the recommended approach is to configure the Ops Agent
with a logging receiver and create a logs-based metric.
By configuring the Ops Agent with a logging receiver, it can collect and forward
relevant log data to Cloud Monitoring. Subsequently, creating a logs-based
metric allows you to define a metric based on the extracted information from
the logs, specifically focusing on the number of requests from the suspicious IP
address. This setup minimizes operational overhead by leveraging existing
infrastructure and provides a streamlined way to monitor and analyze the
targeted metric related to the suspicious activity, aiding in swift detection and
response to potential security issues.
b. Create a script to scrape the web server log. Export the IP address request
metrics to the Cloud Monitoring API.
c. Update the application to export the IP address request metrics to the Cloud
Monitoring API.
d. Configure the Ops Agent with a metrics receiver.
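A sketch of option (a) once the Ops Agent is forwarding the web server's access logs. The log name, payload field, and IP address are assumptions that depend on how the logging receiver parses the access log:

gcloud logging metrics create suspicious-ip-requests \
    --description="Requests from 203.0.113.7" \
    --log-filter='logName:"nginx_access" AND httpRequest.remoteIp="203.0.113.7"'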
12. Your organization is using Helm to package containerized applications. Your
applications reference both public and private charts. Your security team flagged that
using a public Helm repository as a dependency is a risk. You want to manage all charts
uniformly, with native access control and VPC Service Controls. What should you do?
a. Store public and private charts in OCI format by using Artifact Registry. To
address security concerns and maintain consistent access controls for Helm
charts, it's recommended to store both public and private charts in the Open
Container Initiative (OCI) format using Google Cloud's Artifact Registry.
Artifact Registry provides a centralized, secure repository for Helm charts with
native access control features and integration capabilities with VPC Service
Controls. Storing charts in OCI format ensures a standardized approach to
packaging, and Artifact Registry offers a robust solution for organizing and
securing container artifacts. This approach improves security by centralizing
both public and private charts, aligning with best practices for Helm chart
management in containerized applications.
b. Store public and private charts by using GitHub Enterprise with Google
Workspace as the identity provider.
c. Store public and private charts by using Git repository. Configure Cloud Build
to synchronize contents of the repository into a Cloud Storage bucket. Connect
Helm to the bucket by using https://[bucket].storage-
googleapis.com/[helmchart] as the Helm repository.
d. Configure a Helm chart repository server to run in Google Kubernetes Engine
(GKE) with Cloud Storage bucket as the storage backend.
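A sketch of pushing a chart to Artifact Registry in OCI format (option a), assuming a Docker-format repository named helm-charts in europe-west2 and Helm 3.8 or later:

# Authenticate Helm against Artifact Registry with a short-lived access token.
gcloud auth print-access-token | helm registry login \
    -u oauth2accesstoken --password-stdin europe-west2-docker.pkg.dev

# Push a packaged chart; it can later be pulled with the same oci:// reference.
helm push my-chart-0.1.0.tgz oci://europe-west2-docker.pkg.dev/my-project/helm-charts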
13. You use Terraform to manage an application deployed to a Google Cloud
environment. The application runs on instances deployed by a managed instance
group. The Terraform code is deployed by using a CI/CD pipeline. When you change
the machine type on the instance template used by the managed instance group, the
pipeline fails at the terraform apply stage with the following error message:

You need to update the instance template and minimize disruption to the application
and the number of pipeline runs.
What should you do?
a. Delete the managed instance group, and recreate it after updating the
instance template.
b. Add a new instance template, update the managed instance group to use the
new instance template, and delete the old instance template. This suggests a manual change, whereas the change should flow through Terraform in the CI/CD pipeline. Answer D is the better one: it achieves the same result as B, but automatically through Terraform. Because Terraform's default behavior is to destroy a resource before recreating it when it cannot be updated in place, the lifecycle block has to be tuned.
c. Remove the managed instance group from the Terraform state file, update the
instance template, and reimport the managed instance group.
d. Set the create_before_destroy meta-argument to true in the lifecycle block on
the instance template. By setting the create_before_destroy meta-argument to
true in the lifecycle block on the instance template, Terraform will create a new
instance template before destroying the old one. This ensures a smooth
transition by allowing the managed instance group to gradually replace
instances with the new template without causing downtime. This method
optimizes the update process, reducing disruption to the application and
minimizing the number of pipeline runs needed.
14. Your company operates in a highly regulated domain that requires you to store all
organization logs for seven years. You want to minimize logging infrastructure
complexity by using managed services. You need to avoid any future loss of log
capture or stored logs due to misconfiguration or human error. What should you do?
a. Use Cloud Logging to configure an aggregated sink at the organization level to
export all logs into a BigQuery dataset.
b. Use Cloud Logging to configure an aggregated sink at the organization level to
export all logs into Cloud Storage with a seven-year retention policy and
Bucket Lock. By using Cloud Logging to configure an aggregated sink at the
organization level to export all logs into Cloud Storage with a seven-year
retention policy and Bucket Lock, you establish a centralized and secure storage
solution. This setup leverages Cloud Storage's features, such as retention
policies and Bucket Lock, to prevent accidental or intentional deletion of logs,
reducing the risk of data loss due to misconfiguration or human error. It
provides a robust and managed solution for storing logs over the required
seven-year period, ensuring compliance with regulatory standards.
c. Use Cloud Logging to configure an export sink at each project level to export
all logs into a BigQuery dataset
d. Use Cloud Logging to configure an export sink at each project level to export
all logs into Cloud Storage with a seven-year retention policy and Bucket Lock.
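A sketch of option (b) with placeholder bucket and organization IDs. Locking the retention policy is irreversible, which is what protects the logs against misconfiguration or human error:

# Bucket with a locked seven-year retention policy.
gsutil mb -l EU gs://org-audit-logs-example
gsutil retention set 7y gs://org-audit-logs-example
gsutil retention lock gs://org-audit-logs-example

# Aggregated sink at the organization level covering all current and future projects.
gcloud logging sinks create org-logs-to-gcs \
    storage.googleapis.com/org-audit-logs-example \
    --organization=ORG_ID --include-children

The sink's writer identity still needs permission to create objects in the bucket.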
15. You are building the CI/CD pipeline for an application deployed to Google Kubernetes
Engine (GKE). The application is deployed by using a Kubernetes Deployment, Service,
and Ingress. The application team asked you to deploy the application by using the
blue/green deployment methodology. You need to implement the rollback actions.
What should you do?
a. Run the kubectl rollout undo command. Using the kubectl rollout undo
command is another rollback method, but updating the Service provides a
cleaner and more declarative approach within the Kubernetes environment.
b. Delete the new container image, and delete the running Pods.
c. Update the Kubernetes Service to point to the previous Kubernetes
Deployment. Updating the Kubernetes Service to point to the previous
Kubernetes Deployment effectively rolls back the service to the previous
version. This action ensures that traffic is directed to the previous deployment,
mitigating any issues introduced by the new version. It is a controlled and
Kubernetes-native way to perform rollbacks, allowing for seamless transitions
between different versions of the application without downtime.
d. Scale the new Kubernetes Deployment to zero.
16. You are building and running client applications in Cloud Run and Cloud Functions.
Your client requires that all logs must be available for one year so that the client can
import the logs into their logging service. You must minimize required code changes.
What should you do?
a. Update all images in Cloud Run and all functions in Cloud Functions to send
logs to both Cloud Logging and the client's logging service. Ensure that all the
ports required to send logs are open in the VPC firewall.
b. Create a Pub/Sub topic, subscription, and logging sink. Configure the logging
sink to send all logs into the topic. Give your client access to the topic to
retrieve the logs.
c. Create a storage bucket and appropriate VPC firewall rules. Update all images
in Cloud Run and all functions in Cloud Functions to send logs to a file within
the storage bucket.
d. Create a logs bucket and logging sink. Set the retention on the logs bucket to
365 days. Configure the logging sink to send logs to the bucket. Give your client
access to the bucket to retrieve the logs. Creating a logs bucket and logging
sink, is the recommended approach to meet the client's requirement of
retaining logs for one year while minimizing code changes. By configuring a
logging sink to send logs to a bucket and setting the retention period on the
bucket to 365 days, you leverage Google Cloud's managed logging and storage
services. This approach abstracts the log storage from your applications running
in Cloud Run and Cloud Functions, ensuring that logs are retained for the
specified duration without requiring modifications to individual codebases.
Additionally, providing your client access to the designated bucket allows them
to retrieve the logs seamlessly. It's a centralized and efficient solution that aligns
with best practices for log management in Google Cloud.
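A sketch of option (d); the project ID, bucket name, and filter are assumptions:

# Log bucket with a one-year retention period.
gcloud logging buckets create client-logs --location=global --retention-days=365

# Sink that routes the Cloud Run and Cloud Functions logs into that bucket.
gcloud logging sinks create client-logs-sink \
    logging.googleapis.com/projects/my-project/locations/global/buckets/client-logs \
    --log-filter='resource.type=("cloud_run_revision" OR "cloud_function")'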
17. You have an application that runs in Google Kubernetes Engine (GKE). The application
consists of several microservices that are deployed to GKE by using Deployments and
Services. One of the microservices is experiencing an issue where a Pod returns 403
errors after the Pod has been running for more than five hours. Your development
team is working on a solution, but the issue will not be resolved for a month. You need
to ensure continued operations until the microservice is fixed. You want to follow
Google-recommended practices and use the fewest number of steps. What should you
do?
a. Create a cron job to terminate any Pods that have been running for more than
five hours.
b. Add a HTTP liveness probe to the microservice's deployment. Liveness probes
are used to monitor the health of containers inside pods. They can identify
application instances that have failed, even if the Pod appears to be operational.
If a liveness probe detects an unhealthy state, Kubernetes kills the container
and tries to redeploy it. If the probe succeeds, no action is taken and no events
are logged.
This approach ensures continuous operations by proactively detecting and
addressing issues, minimizing manual intervention. It aligns with Google-
recommended practices for handling service health and availability in a
Kubernetes environment with minimal steps and complexity.
c. Monitor the Pods, and terminate any Pods that have been running for more
than five hours.
d. Configure an alert to notify you whenever a Pod returns 403 errors.
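One way to add the probe from option (b) without editing the full manifest; the Deployment name, port, and health check path are assumptions:

# Add an HTTP liveness probe so the kubelet restarts containers that become unhealthy.
kubectl patch deployment my-microservice --type=json -p '[
  {"op": "add",
   "path": "/spec/template/spec/containers/0/livenessProbe",
   "value": {"httpGet": {"path": "/healthz", "port": 8080},
             "periodSeconds": 60, "failureThreshold": 3}}
]'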
18. You want to share a Cloud Monitoring custom dashboard with a partner team. What
should you do?
a. Provide the partner team with the dashboard URL to enable the partner team
to create a copy of the dashboard. Provide the partner team with the
dashboard URL to enable them to create a copy of the dashboard. By sharing
the dashboard URL, the partner team can access and duplicate the specific
Cloud Monitoring custom dashboard easily. This method allows for a
straightforward and efficient way to share the dashboard configuration without
the need for additional manual steps or exporting/importing files. It ensures a
seamless transfer of the custom dashboard to the partner team, enabling them
to leverage the same monitoring setup for their needs.
b. Export the metrics to BigQuery. Use Looker Studio to create a dashboard, and
share the dashboard with the partner team.
c. Copy the Monitoring Query Language (MQL) query from the dashboard, and
send the ML query to the partner team.
d. Download the JSON definition of the dashboard, and send the JSON file to the
partner team.
19. You are building an application that runs on Cloud Run. The application needs to
access a third-party API by using an API key. You need to determine a secure way to
store and use the API key in your application by following Google-recommended
practices. What should you do?
a. Save the API key in Secret Manager as a secret. Reference the secret as an
environment variable in the Cloud Run application. Save the API key in Secret
Manager as a secret and reference the secret as an environment variable in the
Cloud Run application. This approach aligns with Google-recommended
practices for securely managing sensitive information. Secret Manager provides
a centralized and secure storage for secrets, allowing you to store and retrieve
the API key. Referencing the secret as an environment variable in the Cloud Run
application ensures that the key remains confidential and is easily accessible
without exposing it directly in the code. It enhances security by separating
sensitive information from the application logic and adheres to best practices
for secure credential management in a cloud environment.
b. Save the API key in Secret Manager as a secret key. Mount the secret key
under the /sys/api_key directory, and decrypt the key in the Cloud Run
application.
c. Save the API key in Cloud Key Management Service (Cloud KMS) as a key.
Reference the key as an environment variable in the Cloud Run application.
d. Encrypt the API key by using Cloud Key Management Service (Cloud KMS), and
pass the key to Cloud Run as an environment variable. Decrypt and use the
key in Cloud Run.
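A sketch of option (a); the secret, service, and region names are assumptions, and the Cloud Run runtime service account still needs the Secret Manager Secret Accessor role on the secret:

# Store the third-party API key once in Secret Manager.
printf 'REPLACE_WITH_KEY' | gcloud secrets create third-party-api-key --data-file=-

# Expose the secret to the service as the API_KEY environment variable.
gcloud run services update my-app --region=europe-west1 \
    --set-secrets=API_KEY=third-party-api-key:latest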
20. You are currently planning how to display Cloud Monitoring metrics for your
organization’s Google Cloud projects. Your organization has three folders and six
projects: [The folder and project structure was shown as an image and is not reproduced here.]
You want to configure Cloud Monitoring dashboards to only display metrics from the
projects within one folder. You need to ensure that the dashboards do not display
metrics from projects in the other folders. You want to follow Google-recommended
practices. What should you do?
a. Create a single new scoping project.
b. Create new scoping projects for each folder. Create new scoping projects for
each folder. Google Cloud Monitoring dashboards allow you to scope metrics
based on projects, and by creating separate scoping projects for each folder,
you can effectively isolate and display metrics only from the projects within
that specific folder. This approach aligns with Google-recommended practices
by providing a structured and organized way to manage and visualize metrics.
Using dedicated scoping projects for each folder ensures a clean separation of
monitoring data, allowing you to customize dashboards according to the specific
projects within a given folder while excluding metrics from projects in other
folders.
c. Use the current app-one-prod project as the scoping project.
d. Use the current app-one-dev, app-one-staging, and app-one-prod projects as
the scoping project for each folder.
21. Your company’s security team needs to have read-only access to Data Access audit
logs in the _Required bucket. You want to provide your security team with the
necessary permissions following the principle of least privilege and Google-
recommended practices. What should you do?
a. Assign the roles/logging.viewer role to each member of the security team.
b. Assign the roles/logging.viewer role to a group with all the security team
members.
c. Assign the roles/logging.privateLogViewer role to each member of the
security team.
d. Assign the roles/logging.privateLogViewer role to a group with all the security
team members. Assign the roles/logging.privateLogViewer role to a group with
all the security team members. This approach follows the principle of least
privilege by granting the specific role needed for read-only access to Data Access
audit logs. The roles/logging.privateLogViewer role is more restrictive than
roles/logging.viewer, providing access only to private logs, such as Data
Access audit logs, and aligns with Google-recommended practices for securing
sensitive data.
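A sketch of option (d) with placeholder group and project names; the role is granted once, to the group:

gcloud projects add-iam-policy-binding my-logging-project \
    --member="group:security-team@example.com" \
    --role="roles/logging.privateLogViewer"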
22. Your team is building a service that performs compute-heavy processing on batches
of data. The data is processed faster based on the speed and number of CPUs on the
machine. These batches of data vary in size and may arrive at any time from multiple
third-party sources. You need to ensure that third parties are able to upload their data
securely. You want to minimize costs, while ensuring that the data is processed as
quickly as possible. What should you do?
a. Provide a secure file transfer protocol (SFTP) server on a Compute Engine
instance so that third parties can upload batches of data, and provide
appropriate credentials to the server.
Create a Cloud Function with a google.storage.object.finalize Cloud Storage
trigger. Write code so that the function can scale up a Compute Engine
autoscaling managed instance group
Use an image pre-loaded with the data processing software that terminates
the instances when processing completes.
b. Provide a Cloud Storage bucket so that third parties can upload batches of
data, and provide appropriate Identity and Access Management (IAM) access
to the bucket.
Use a standard Google Kubernetes Engine (GKE) cluster and maintain two
services: one that processes the batches of data, and one that monitors Cloud
Storage for new batches of data.
Stop the processing service when there are no batches of data to process.
c. Provide a Cloud Storage bucket so that third parties can upload batches of
data, and provide appropriate Identity and Access Management (IAM) access
to the bucket.
Create a Cloud Function with a google.storage.object.finalize Cloud Storage
trigger. Write code so that the function can scale up a Compute Engine
autoscaling managed instance group.
Use an image pre-loaded with the data processing software that terminates
the instances when processing completes. Provide a Cloud Storage bucket for
third parties to upload batches of data, and utilize a Cloud Function with a
google.storage.object.finalize trigger to scale up a Compute Engine autoscaling
managed instance group. This approach ensures secure data uploads to a Cloud
Storage bucket with proper IAM access controls. The Cloud Function, triggered
upon new object finalization in the bucket, scales up a managed instance group
with pre-loaded data processing software, optimizing for compute-heavy tasks.
The instances terminate upon completion, minimizing costs. This design
efficiently leverages serverless and autoscaling capabilities, ensuring quick and
cost-effective processing of data batches arriving at varying times from multiple
sources.
d. Provide a Cloud Storage bucket so that third parties can upload batches of
data, and provide appropriate Identity and Access Management (IAM) access
to the bucket.
Use Cloud Monitoring to detect new batches of data in the bucket and trigger
a Cloud Function that processes the data.
Set a Cloud Function to use the largest CPU possible to minimize the runtime
of the processing.
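A sketch of the trigger from option (c) for a 1st-gen Cloud Function; the bucket, function, and entry point names are assumptions, and the function code (not shown) would resize the autoscaling managed instance group:

gcloud functions deploy start-batch-processing \
    --runtime=python311 \
    --entry-point=on_upload \
    --trigger-resource=uploads-bucket \
    --trigger-event=google.storage.object.finalize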
23. You are reviewing your deployment pipeline in Google Cloud Deploy. You must reduce
toil in the pipeline, and you want to minimize the amount of time it takes to complete
an end-to-end deployment. What should you do? (Choose two.)
a. Create a trigger to notify the required team to complete the next step when
manual intervention is required.
b. Divide the automation steps into smaller tasks.
c. Use a script to automate the creation of the deployment pipeline in Google
Cloud Deploy. We would rather use a webhook or a Pub/Sub message to create releases than maintain a script.
d. Add more engineers to finish the manual steps.
e. Automate promotion approvals from the development environment to the
test environment.
24. You work for a global organization and are running a monolithic application on
Compute Engine. You need to select the machine type for the application to use that
optimizes CPU utilization by using the fewest number of steps. You want to use
historical system metrics to identify the machine type for the application to use. You
want to follow Google-recommended practices. What should you do?
a. Use the Recommender API and apply the suggested recommendations.
Utilizing the Recommender API allows you to leverage Google Cloud's machine
learning algorithms to analyze historical data and provide specific
recommendations for resource optimization. This method is proactive and can
automatically suggest appropriate changes to improve efficiency, aligning with
Google-recommended practices. By incorporating the insights provided by the
Recommender API, you can make informed decisions to select the most suitable
machine type for the application, ensuring optimal CPU utilization with minimal
manual intervention.
b. Create an Agent Policy to automatically install Ops Agent in all VMs.
c. Install the Ops Agent in a fleet of VMs by using the gcloud CLI.
d. Review the Cloud Monitoring dashboard for the VM and choose the machine
type with the lowest CPU utilization.
25. You deployed an application into a large Standard Google Kubernetes Engine (GKE)
cluster. The application is stateless and multiple pods run at the same time. Your
application receives inconsistent traffic. You need to ensure that the user experience
remains consistent regardless of changes in traffic and that the resource usage of the
cluster is optimized.
a. Configure a cron job to scale the deployment on a schedule
b. Configure a Horizontal Pod Autoscaler. Configure a Horizontal Pod Autoscaler
(HPA). HPA automatically adjusts the number of replica pods based on observed
CPU utilization or other custom metrics. In the context of varying traffic
patterns, HPA dynamically scales the number of pods to meet demand, ensuring
that there are enough instances to handle increased traffic and scaling down
during periods of lower demand. This helps maintain consistent performance
while optimizing resource utilization in response to changing workloads.
c. Configure a Vertical Pod Autoscaler
d. Configure cluster autoscaling on the node pool.
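A minimal sketch of option (b), assuming the Deployment is named my-app and CPU is the scaling signal:

# Keep average CPU around 60%, scaling between 3 and 20 replicas.
kubectl autoscale deployment my-app --cpu-percent=60 --min=3 --max=20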
26. You need to deploy a new service to production. The service needs to automatically
scale using a managed instance group and should be deployed across multiple regions.
The service needs a large number of resources for each instance and you need to plan
for capacity. What should you do?
a. Monitor results of Cloud Trace to determine the optimal sizing.
b. Use the n2-highcpu-96 machine type in the configuration of the managed
instance group.
c. Deploy the service in multiple regions and use an internal load balancer to
route traffic.
d. Validate that the resource requirements are within the available project quota
limits of each region. Validating that the resource requirements are within the
available project quota limits for each region is crucial to avoid issues during
deployment. Each Google Cloud region has specific quota limits for various
resources, such as CPU, memory, and instances. Ensuring that the planned
capacity aligns with the allocated quotas prevents unexpected scaling
limitations and helps in effective capacity planning for the service across
multiple regions. This approach ensures a smooth deployment and operation of
the service without encountering resource constraints.
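A quick way to review the per-region quotas referenced in option (d); the region is an example:

gcloud compute regions describe europe-west2 \
    --flatten="quotas[]" \
    --format="table(quotas.metric,quotas.usage,quotas.limit)"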
27. You are analyzing Java applications in production. All applications have Cloud Profiler
and Cloud Trace installed and configured by default. You want to determine which
applications need performance tuning. What should you do? (Choose two.)
a. Examine the wall-clock time and the CPU time of the application. If the
difference is substantial increase the CPU resource allocation.
b. Examine the wall-clock time and the CPU time of the application. If the
difference is substantial, increase the memory resource allocation.
c. Examine the wall-clock time and the CPU time of the application. If the
difference is substantial, increase the local disk storage allocation.
d. Examine the latency time, the wall-clock time, and the CPU time of the application. If the latency time is slowly burning down the error budget, and the difference between wall-clock time and CPU time is minimal, mark the application for optimization. Latency time: high latency directly impacts user
experience and can negatively affect your error budget. Wall-clock time: This
represents the elapsed time from start to finish of a request. A large difference
with CPU time indicates bottlenecks outside the CPU, requiring optimization.
CPU time: This represents the actual CPU processing time used by the
application. Minimal difference with wall-clock time suggests inefficient use of
CPU, potentially needing optimization.
e. Examine the heap usage of the application. If the usage is low, mark the
application for optimization. Low heap usage can indicate over-provisioned memory; either way, it implies inefficient resource utilization and deserves investigation. Low heap usage may indicate that the application is
not fully utilizing available resources, presenting an opportunity for
optimization to enhance efficiency and responsiveness.
28. Your organization stores all application logs from multiple Google Cloud projects in a
central Cloud Logging project. Your security team wants to enforce a rule that each
project team can only view their respective logs and only the operations team can
view all the logs. You need to design a solution that meets the security team
requirements while minimizing costs. What should you do?
a. Grant each project team access to the project _Default view in the central
logging project. Grant logging viewer access to the operations team in the
central logging project.
b. Create Identity and Access Management (IAM) roles for each project team and
restrict access to the _Default log view in their individual Google Cloud
project. Grant viewer access to the operations team in the central logging
project.
c. Create log views for each project team and only show each project team their
application logs. Grant the operations team access to the _AllLogs view in the
central logging project. Creating log views for each project team allows you to
tailor access to only show each team their relevant application logs. This fine-
grained control ensures that project teams can access their own logs while
maintaining isolation from logs of other teams. Granting the operations team
access to the _AllLogs view in the central logging project provides them with the
necessary visibility across all logs. This approach not only satisfies the security
requirements but also minimizes costs by efficiently organizing and restricting
access to the logs based on project teams' needs.
d. Export logs to BigQuery tables for each project team. Grant project teams
access to their tables. Grant logs writer access to the operations team in the
central logging project.
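To illustrate option (c), the sketch below creates a log view in the central logging project that is restricted to one team's source project. The bucket, project, and view names are placeholders, and the flag names follow the current gcloud documentation.

    # Create a view on the central _Default log bucket that only exposes
    # entries originating from team A's project.
    gcloud logging views create team-a-view \
        --bucket=_Default \
        --location=global \
        --log-filter='source("projects/team-a-project")' \
        --description="Logs for team A only"

Each project team is then granted roles/logging.viewAccessor with an IAM condition limited to its own view, while the operations team is granted access to the _AllLogs view.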
29. Your company uses Jenkins running on Google Cloud VM instances for CI/CD. You need
to extend the functionality to use infrastructure as code automation by using
Terraform. You must ensure that the Terraform Jenkins instance is authorized to
create Google Cloud resources. You want to follow Google-recommended practices.
What should you do?
a. Confirm that the Jenkins VM instance has an attached service account with
the appropriate Identity and Access Management (IAM) permissions.
Downloading a key file and setting it in an environment variable can expose
high-privilege credentials, whereas a dedicated service account attached to the
VM exposes no credential material at all. Terraform running on the VM uses the
attached service account by default through Application Default Credentials.
b. Use the Terraform module so that Secret Manager can retrieve credentials.
c. Create a dedicated service account for the Terraform instance. Download and
copy the secret key value to the GOOGLE_CREDENTIALS environment variable
on the Jenkins server. This is not the recommended approach: exporting a
service account key creates a long-lived credential that must be stored,
rotated, and protected on the Jenkins server, and a leaked key grants the same
access as the service account itself. Google recommends attaching a service
account to the VM instead of downloading keys.
d. Add the gcloud auth application-default login command as a step in Jenkins
before running the Terraform commands.
30. As part of your company's initiative to shift left on security, the InfoSec team is asking
all teams to implement guard rails on all the Google Kubernetes Engine (GKE) clusters
to only allow the deployment of trusted and approved images. You need to determine
how to satisfy the InfoSec team's goal of shifting left on security. What should you do?
a. Enable Container Analysis in Artifact Registry, and check for common
vulnerabilities and exposures (CVEs) in your container images
b. Use Binary Authorization to attest images during your CI/CD pipeline. Binary
Authorization allows you to define and enforce policies that determine which
container images can run in your GKE environment based on image signatures.
By integrating Binary Authorization into your CI/CD pipeline, you can ensure
that only trusted and approved images, with the correct attestations, are
deployed to the GKE clusters. This proactive security measure aligns with the
concept of shifting security left, as it establishes controls early in the
development and deployment process, minimizing the risk of deploying
compromised or unapproved images in production.
c. Configure Identity and Access Management (IAM) policies to create a least
privilege model on your GKE clusters.
d. Deploy Falco or Twistlock on GKE to monitor for vulnerabilities on your
running Pods
31. Your company operates in a highly regulated domain. Your security team requires that
only trusted container images can be deployed to Google Kubernetes Engine (GKE).
You need to implement a solution that meets the requirements of the security team
while minimizing management overhead. What should you do?
a. Configure Binary Authorization in your GKE clusters to enforce deploy-time
security policies. Binary Authorization allows you to define and enforce policies
that determine which container images can be deployed based on image
signatures. By configuring Binary Authorization, you can enforce deploy-time
security policies, ensuring that only trusted and verified container images are
allowed to run in your GKE clusters. This approach provides a robust security
mechanism without requiring additional custom validators or complex
configurations, minimizing management overhead while meeting the stringent
security requirements of a highly regulated domain.
b. Grant the roles/artifactregistry.writer role to the Cloud Build service account.
Confirm that no employee has Artifact Registry write permission.
c. Use Cloud Run to write and deploy a custom validator. Enable an Eventarc
trigger to perform validations when new images are uploaded.
d. Configure Kritis to run in your GKE clusters to enforce deploy-time security
policies.
32. Your CTO (Chief Technology Officer) has asked you to implement a postmortem policy
on every incident for internal use. You want to define what a good postmortem is to
ensure that the policy is successful at your company. What should you do? (Choose
two.)
a. Ensure that all postmortems include what caused the incident, identify the
person or team responsible for causing the incident, and how to prevent a
future occurrence of the incident.
b. Ensure that all postmortems include what caused the incident, how the
incident could have been worse, and how to prevent a future occurrence of
the incident.
c. Ensure that all postmortems include the severity of the incident, how to
prevent a future occurrence of the incident, and what caused the incident
without naming internal system components. It emphasizes including the
severity of the incident, prevention strategies for future occurrences, and an
analysis of what caused the incident without necessarily naming internal system
components. This approach ensures a balance between transparency and
security, providing valuable insights without exposing sensitive internal details.
d. Ensure that all postmortems include how the incident was resolved and what
caused the incident without naming customer information.
e. Ensure that all postmortems include all incident participants in postmortem
authoring and share postmortems as widely as possible. Involving all incident
participants in postmortem authoring and sharing postmortems widely
promotes a collaborative and inclusive culture. Involving
all relevant stakeholders ensures a comprehensive understanding of the
incident, and sharing postmortems widely fosters transparency, enabling the
organization to learn from incidents collectively.
33. You are developing reusable infrastructure as code modules. Each module contains
integration tests that launch the module in a test project. You are using GitHub for
source control. You need to continuously test your feature branch and ensure that all
code is tested before changes are accepted. You need to implement a solution to
automate the integration tests. What should you do?
a. Use a Jenkins server for CI/CD pipelines. Periodically run all tests in the feature
branch.
b. Ask the pull request reviewers to run the integration tests before approving
the code.
c. Use Cloud Build to run the tests. Trigger all tests to run after a pull request is
merged.
d. Use Cloud Build to run tests in a specific folder. Trigger Cloud Build for every
GitHub pull request. By configuring Cloud Build to run tests in a specific folder,
you can focus on the relevant tests for the modified code in the feature branch.
Triggering Cloud Build for every GitHub pull request ensures that tests are
automatically executed whenever changes are proposed, providing continuous
integration. This approach allows for automated testing of each pull request,
providing early feedback to developers and ensuring that changes are
thoroughly tested before being merged, contributing to a more reliable and
efficient development process.
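A minimal sketch of option (d) follows; the repository names, module path, and test command are assumptions, and the trigger flags follow the current gcloud documentation.

    # cloudbuild.yaml: run the integration tests for one module folder.
    steps:
      - name: 'hashicorp/terraform:1.7'
        dir: 'modules/network/tests'
        entrypoint: 'sh'
        args: ['-c', 'terraform init -input=false && terraform test']

    # Create a trigger that runs on every GitHub pull request targeting main,
    # but only when files under modules/ change.
    gcloud builds triggers create github \
        --name=module-integration-tests \
        --repo-owner=example-org \
        --repo-name=infra-modules \
        --pull-request-pattern='^main$' \
        --included-files='modules/**' \
        --build-config=cloudbuild.yaml

Because the trigger fires on pull requests rather than on merges, reviewers see the test result before the change is accepted.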
34. Your company processes IoT data at scale by using Pub/Sub, App Engine standard
environment, and an application written in Go. You noticed that the performance
inconsistently degrades at peak load. You could not reproduce this issue on your
workstation. You need to continuously monitor the application in production to
identify slow paths in the code. You want to minimize performance impact and
management overhead. What should you do?
a. Use Cloud Monitoring to assess the App Engine CPU utilization metric.
b. Install a continuous profiling tool into Compute Engine. Configure the
application to send profiling data to the tool.
c. Periodically run the go tool pprof command against the application instance.
Analyze the results by using flame graphs.
d. Configure Cloud Profiler, and initialize the cloud.google.com/go/profiler
library in the application. By configuring Cloud Profiler and initializing the
corresponding library in the Go application, you can collect detailed
performance data without significantly impacting the application's
performance. This approach allows you to analyze profiling information, identify
slow paths in the code, and gain insights into performance bottlenecks,
providing a powerful and efficient way to troubleshoot and optimize the
application in a production environment.
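For reference, enabling Cloud Profiler in a Go service as described in option (d) is a one-time initialization at startup. This is a minimal sketch; the service name and version are placeholders (on App Engine standard they are usually detected automatically).

    // main.go (sketch): start Cloud Profiler before serving traffic.
    package main

    import (
        "log"
        "net/http"

        "cloud.google.com/go/profiler"
    )

    func main() {
        if err := profiler.Start(profiler.Config{
            Service:        "iot-ingest", // placeholder service name
            ServiceVersion: "1.0.0",      // placeholder version
        }); err != nil {
            // Profiling is best-effort; do not crash the service if it fails.
            log.Printf("failed to start profiler: %v", err)
        }

        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }

Once deployed, the Profiler UI shows flame graphs per service and version, which is how slow paths at peak load can be identified.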
35. Your company runs services by using Google Kubernetes Engine (GKE). The GKE
clusters in the development environment run applications with verbose logging
enabled. Developers view logs by using the kubectl logs command and do not use
Cloud Logging. Applications do not have a uniform logging structure defined. You need
to minimize the costs associated with application logging while still collecting GKE
operational logs. What should you do?
a. Run the gcloud container clusters update --logging=SYSTEM command for the
development cluster.
b. Run the gcloud container clusters update --logging=WORKLOAD command for
the development cluster.
c. Run the gcloud logging sinks update _Default --disabled command in the
project associated with the development environment.
d. Add the severity >= DEBUG resource.type = "k8s_container" exclusion filter to
the _Default logging sink in the project associated with the development
environment. This filter excludes container logs at DEBUG severity and above,
which in practice drops all application (workload) logs emitted by containers,
while GKE operational logs, which use other resource types such as k8s_node
and k8s_cluster, continue to be collected. This allows you to keep collecting
GKE operational logs while excluding the verbose and potentially costly
application logs, striking a balance between operational visibility and the cost
of log storage and processing.
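A sketch of adding that exclusion with the gcloud CLI is shown below; the exclusion name is a placeholder and the flag names follow the current gcloud documentation.

    # Exclude GKE container (workload) logs from the _Default sink in the
    # development project; GKE system logs use other resource types and
    # continue to be ingested.
    gcloud logging sinks update _Default \
        --add-exclusion='name=exclude-dev-container-logs,filter=severity>=DEBUG AND resource.type="k8s_container"'

Developers can still use kubectl logs, because exclusions only affect what Cloud Logging stores, not what the containers write.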
36. You have deployed a fleet of Compute Engine instances in Google Cloud. You need to
ensure that monitoring metrics and logs for the instances are visible in Cloud Logging
and Cloud Monitoring by your company's operations and cyber security teams. You
need to grant the required roles for the Compute Engine service account by using
Identity and Access Management (IAM) while following the principle of least privilege.
What should you do?
a. Grant the logging.logWriter and monitoring.metricWriter roles to the
Compute Engine service accounts. These roles provide the necessary
permissions for writing logs and metrics to Cloud Logging and Cloud Monitoring,
respectively, without granting overly broad access. This aligns with the principle
of least privilege, ensuring that the Compute Engine service accounts have the
specific permissions needed for monitoring tasks without unnecessary
additional privileges. This approach enables effective visibility for both
operations and cyber security teams while maintaining a secure and well-
defined access model.
b. Grant the logging.admin and monitoring.editor roles to the Compute Engine
service accounts.
c. Grant the logging.editor and monitoring.metricWriter roles to the Compute
Engine service accounts.
d. Grant the logging.logWriter and monitoring.editor roles to the Compute
Engine service accounts.
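A sketch of the corresponding IAM bindings follows; the project ID and service account email are placeholders.

    # Grant only the write-level roles needed to send logs and metrics.
    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:vm-sa@my-project.iam.gserviceaccount.com" \
        --role="roles/logging.logWriter"

    gcloud projects add-iam-policy-binding my-project \
        --member="serviceAccount:vm-sa@my-project.iam.gserviceaccount.com" \
        --role="roles/monitoring.metricWriter"

The operations and security teams are granted viewer roles (for example, roles/logging.viewer and roles/monitoring.viewer) separately; the instance service account itself only needs to write.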
37. You are the Site Reliability Engineer responsible for managing your company's data
services and products. You regularly navigate operational challenges, such as
unpredictable data volume and high cost, with your company's data ingestion
processes. You recently learned that a new data ingestion product will be developed
in Google Cloud. You need to collaborate with the product development team to
provide operational input on the new product. What should you do?
a. Deploy the prototype product in a test environment, run a load test, and share
the results with the product development team.
b. When the initial product version passes the quality assurance phase and
compliance assessments, deploy the product to a staging environment. Share
error logs and performance metrics with the product development team.
c. When the new product is used by at least one internal customer in production,
share error logs and monitoring metrics with the product development team.
d. Review the design of the product with the product development team to
provide feedback early in the design phase. Review the design of the product
with the product development team to provide feedback early in the design
phase. By engaging in the design phase, you can contribute valuable insights
from an operational perspective, helping to identify potential challenges and
considerations related to data volume, cost, and overall operability. This early
collaboration allows for proactive discussions on scalability, reliability, and cost-
effectiveness, leading to a more robust and operationally sound product.
Providing input during the design phase ensures that operational concerns are
addressed from the outset, reducing the likelihood of issues during the
development and deployment phases.
38. You are investigating issues in your production application that runs on Google
Kubernetes Engine (GKE). You determined that the source of the issue is a recently
updated container image, although the exact change in code was not identified. The
deployment is currently pointing to the latest tag. You need to update your cluster to
run a version of the container that functions as intended. What should you do?
a. Create a new tag called stable that points to the previously working container,
and change the deployment to point to the new tag.
b. Alter the deployment to point to the sha256 digest of the previously working
container. Because the exact code change is not known, the image may have
been rebuilt and pushed under the same tag, so pointing the deployment at a
tag could still resolve to the faulty image. Using the sha256 digest of the
previously working container instead provides a precise and immutable reference
to a specific version of the container image, ensuring that the exact image that
was known to work is deployed. This approach eliminates any ambiguity
associated with using tags like "latest" and provides a reliable way to rollback to
a known-good state. By referencing the sha256 digest, you maintain control
over the version of the container image deployed in the cluster and can
effectively address issues arising from unanticipated changes in the latest tag.
c. Build a new container from a previous Git tag, and do a rolling update on the
deployment to the new container.
d. Apply the latest tag to the previous container image, and do a rolling update
on the deployment.
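To illustrate option (b), the deployment can be pointed at the digest directly; the deployment name, image path, and digest below are placeholders.

    # Pin the container to the known-good image digest rather than a tag.
    kubectl set image deployment/frontend \
        frontend=us-docker.pkg.dev/my-project/my-repo/frontend@sha256:0123abcd...

    # Equivalent manifest form:
    #   image: us-docker.pkg.dev/my-project/my-repo/frontend@sha256:0123abcd...

The digest of the previously working image can be looked up in Artifact Registry or from the imageID recorded in the status of Pods that ran it.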
39. You need to create a Cloud Monitoring SLO for a service that will be published soon.
You want to verify that requests to the service will be addressed in fewer than 300 ms
at least 90% of the time per calendar month. You need to identify the metric and
evaluation method to use. What should you do?
a. Select a latency metric for a request-based method of evaluation. Select a
latency metric for a request-based method of evaluation. Latency metrics,
which measure the time it takes for requests to be processed, are appropriate
for evaluating the specific performance requirements outlined in the scenario.
By selecting a latency metric for a request-based method, such as P95
(percentile 95), you can precisely measure and evaluate the desired
performance level—requests being addressed in fewer than 300 ms at least 90%
of the time. This approach aligns with the specific performance criteria and
allows for a granular assessment of the service's responsiveness.
b. Select a latency metric for a window-based method of evaluation.
c. Select an availability metric for a request-based method of evaluation.
d. Select an availability metric for a window-based method of evaluation.
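For orientation, a request-based latency SLO in the Cloud Monitoring API is expressed as a distributionCut over a latency distribution metric. The sketch below is illustrative only: the metric and resource types depend on how the service is instrumented, and the range is expressed in the metric's unit (milliseconds here).

    {
      "displayName": "90% of requests under 300 ms per calendar month",
      "goal": 0.90,
      "calendarPeriod": "MONTH",
      "serviceLevelIndicator": {
        "requestBased": {
          "distributionCut": {
            "distributionFilter": "metric.type=\"loadbalancing.googleapis.com/https/total_latencies\" resource.type=\"https_lb_rule\"",
            "range": { "max": 300 }
          }
        }
      }
    }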
40. You have an application that runs on Cloud Run. You want to use live production traffic
to test a new version of the application, while you let the quality assurance team
perform manual testing. You want to limit the potential impact of any issues while
testing the new version, and you must be able to roll back to a previous version of the
application if needed. How should you deploy the new version? (Choose two.)
a. Deploy the application as a new Cloud Run service.
b. Deploy a new Cloud Run revision with a tag and use the --no-traffic option.
c. Deploy a new Cloud Run revision without a tag and use the --no-traffic option.
d. Deploy the new application version and use the --no-traffic option. Route
production traffic to the revision’s URL.
e. Deploy the new application version, and split traffic to the new version.

In short, what matters is that the new revision initially receives no traffic so it can be tested, and that it is tagged so testers can reach it at a dedicated URL.

It involves deploying the new version of the application on Cloud Run by creating a new
revision with a tag and using the --no-traffic option. This approach allows the isolation of the
new revision from live production traffic initially. Once the deployment is complete,
extensive testing can be conducted without affecting users. Subsequently, when confident
in the new version's stability, production traffic can be gradually directed to it using the
Cloud Run services update-traffic command. This combination ensures a controlled and risk-
mitigated approach to deploying and testing new versions, with the ability to roll back if any
issues arise during the testing phase.

A common use case for this feature is to use it for testing and vetting of a new service
revision before it serves any traffic, in this typical sequence:

1. Run integration tests on a container during development.

2. Deploy the container to a Google Cloud project that you use only for staging, serving no
traffic, and test against a tagged revision.

3. Deploy it to production, without serving traffic, and test against a tagged revision in
production.

4. Migrate traffic to the tagged revision.

Cloud Run allows you to specify which revisions should receive traffic and to specify traffic
percentages that are received by a revision. This feature allows you to rollback to a previous
revision, gradually roll out a revision, and split traffic between multiple revisions.
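A sketch of that sequence with the gcloud CLI follows; the service name, image, region, tag, revision name, and traffic percentages are placeholders.

    # Deploy the new version as a tagged revision that serves no live traffic.
    gcloud run deploy my-service \
        --image=us-docker.pkg.dev/my-project/my-repo/my-app:v2 \
        --region=us-central1 \
        --tag=beta \
        --no-traffic

    # QA tests against the tag-specific URL, e.g.
    #   https://beta---my-service-<hash>-uc.a.run.app

    # When the revision looks healthy, shift a share of live traffic to it.
    gcloud run services update-traffic my-service \
        --region=us-central1 \
        --to-tags=beta=10

    # Roll back by sending 100% of traffic to the previous revision.
    gcloud run services update-traffic my-service \
        --region=us-central1 \
        --to-revisions=my-service-00042-abc=100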

41. You recently noticed that one of your services has exceeded the error budget for the
current rolling window period. Your company's product team is about to launch a new
feature. You want to follow Site Reliability Engineering (SRE) practices. What should
you do?
a. Notify the team about the lack of error budget and ensure that all their tests
are successful so the launch will not further risk the error budget
b. Notify the team that their error budget is used up. Negotiate with the team
for a launch freeze or tolerate a slightly worse user experience. By notifying
the team that the error budget has been exhausted, the SRE team is proactively
communicating the potential risks associated with launching the new feature
during a period of heightened error rates. The suggestion to negotiate for a
launch freeze or tolerate a slightly degraded user experience demonstrates a
commitment to preserving the system's reliability and ensuring that user impact
is minimized. This approach fosters a collaborative effort between the SRE team
and the product team, allowing for informed decision-making that prioritizes
reliability over feature deployment when necessary, adhering to the core tenets
of SRE practices.
c. Escalate the situation and request additional error budget.
d. Look through other metrics related to the product and find SLOs with
remaining error budget. Reallocate the error budgets and allow the feature
launch.
42. You need to introduce postmortems into your organization. You want to ensure that
the postmortem process is well received. What should you do? (Choose two.)
a. Encourage new employees to conduct postmortems to learn through practice.
b. Create a designated team that is responsible for conducting all postmortems.
c. Encourage your senior leadership to acknowledge and participate in
postmortems. When senior leadership acknowledges and participates in
postmortems, it shows that they value the process and are committed to
learning from mistakes. This can help in creating a culture where postmortems
are well received and taken seriously by the organization.
d. Ensure that writing effective postmortems is a rewarded and celebrated
practice. When effective postmortems are rewarded and celebrated, it creates
a positive incentive for employees to invest time and effort in conducting
thorough postmortems. It helps in reinforcing the importance of the process
and encourages others to follow suit.
e. Provide your organization with a forum to critique previous postmortems.
43. You need to enforce several constraint templates across your Google Kubernetes
Engine (GKE) clusters. The constraints include policy parameters, such as restricting
the Kubernetes API. You must ensure that the policy parameters are stored in a GitHub
repository and automatically applied when changes occur. What should you do?
a. Set up a GitHub action to trigger Cloud Build when there is a parameter
change. In Cloud Build, run a gcloud CLI command to apply the change.
b. When there is a change in GitHub, use a webhook to send a request to Anthos
Service Mesh, and apply the change.
c. Configure Anthos Config Management with the GitHub repository. When
there is a change in the repository, use Anthos Config Management to apply
the change. Anthos Config Management allows you to declaratively manage
configurations using Kubernetes-style manifests, making it well-suited for policy
enforcement in a Kubernetes environment. By configuring Anthos Config
Management with the GitHub repository, any changes made to the policy
parameters, stored in the repository, can be automatically applied to the GKE
clusters. This ensures consistency and compliance across clusters and
streamlines the process of managing and enforcing policy changes in a
Kubernetes environment. The integration with GitHub provides version control
and auditability for the changes made to the policy parameters.
d. Configure Config Connector with the GitHub repository. When there is a
change in the repository, use Config Connector to apply the change.
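For illustration, a constraint stored in the synced repository might look like the following (Policy Controller / constraint framework style). The template name and parameters are examples only, not the specific policies from the question.

    # constraints/allowed-image-repos.yaml
    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sAllowedRepos
    metadata:
      name: allowed-image-repos
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Pod"]
      parameters:
        repos:
          - "us-docker.pkg.dev/my-project/approved/"

Once Anthos Config Management is pointed at the repository, any change merged to this file is pulled and applied to every registered cluster automatically.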
44. You are the Operations Lead for an ongoing incident with one of your services. The
service usually runs at around 70% capacity. You notice that one node is returning 5xx
errors for all requests. There has also been a noticeable increase in support cases from
customers. You need to remove the offending node from the load balancer pool so
that you can isolate and investigate the node. You want to follow Google-
recommended practices to manage the incident and reduce the impact on users. What
should you do?
The correct answer is A:

First, communicating your intent to the incident team ensures transparency and
collaboration. Performing a load analysis is crucial to determine if the remaining nodes can
handle the increased traffic after offloading from the unhealthy node. Scaling appropriately
is essential to maintain the overall capacity. Once new nodes report as healthy, draining
traffic from the unhealthy node ensures a gradual transition without disrupting user
experience. Removing the unhealthy node from service comes after ensuring that the other
nodes can handle the load effectively.

This step-by-step approach, coupled with communication and load analysis, aligns with
Google-recommended practices for incident response and minimizes the impact on users
during the investigation and resolution process.

45. You are configuring your CI/CD pipeline natively on Google Cloud. You want builds in
a pre-production Google Kubernetes Engine (GKE) environment to be automatically
load-tested before being promoted to the production GKE environment. You need to
ensure that only builds that have passed this test are deployed to production. You
want to follow Google-recommended practices. How should you configure this
pipeline with Binary Authorization?
a. Create an attestation for the builds that pass the load test by requiring the
lead quality assurance engineer to sign the attestation by using their personal
private key.
b. Create an attestation for the builds that pass the load test by using a private
key stored in Cloud Key Management Service (Cloud KMS) with a service
account JSON key stored as a Kubernetes Secret.
c. Create an attestation for the builds that pass the load test by using a private
key stored in Cloud Key Management Service (Cloud KMS) authenticated
through Workload Identity. This option involves creating an attestation for the
builds that pass the load test using a private key stored in Cloud Key
Management Service (Cloud KMS) authenticated through Workload Identity.
Workload Identity allows you to securely authenticate to Google Cloud services
from your GKE clusters without the need for storing and managing service
account keys. By using Cloud KMS for key storage and Workload Identity for
authentication, you enhance the security of your pipeline.
d. Create an attestation for the builds that pass the load test by requiring the
lead quality assurance engineer to sign the attestation by using a key stored
in Cloud Key Management Service (Cloud KMS). This would work, but requiring
a person to sign manually removes part of the automation from the pipeline.
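A sketch of the attestation step for the correct option (c) is shown below; all resource names and the digest are placeholders, the flag names follow the current gcloud documentation, and the build step is assumed to authenticate through Workload Identity rather than an exported key.

    # Run after the load test succeeds in the pre-production pipeline.
    gcloud container binauthz attestations sign-and-create \
        --artifact-url="us-docker.pkg.dev/my-project/my-repo/my-app@sha256:0123abcd..." \
        --attestor="projects/my-project/attestors/load-test-passed" \
        --keyversion="projects/my-project/locations/global/keyRings/binauthz/cryptoKeys/load-test/cryptoKeyVersions/1"

The production cluster's Binary Authorization policy then requires an attestation from the load-test-passed attestor before admitting the image.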
46. You are deploying an application to Cloud Run. The application requires a password
to start. Your organization requires that all passwords are rotated every 24 hours, and
your application must have the latest password. You need to deploy the application
with no downtime. What should you do?
a. Store the password in Secret Manager and send the secret to the application
by using environment variables.
b. Store the password in Secret Manager and mount the secret as a volume
within the application. Mount each secret as a volume, which makes the
secret available to the container as files. Reading a volume always fetches the
secret value from Secret Manager, so it can be used with the latest version.
This method also works well with secret rotation.
By storing the password in Secret Manager and mounting the secret as a volume
within the application, you can achieve password rotation without causing
downtime. This allows you to update the password in Secret Manager, and
Cloud Run can dynamically mount the latest version of the secret without
requiring a redeployment of the application.
c. Use Cloud Build to add your password into the application container at build
time. Ensure that Artifact Registry is secured from public access.
d. Store the password directly in the code. Use Cloud Build to rebuild and deploy
the application each time the password changes.
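A sketch of the deployment command for option (b) follows; the service, image, secret name, and mount path are placeholders.

    # Mount the latest version of the Secret Manager secret as a file.
    gcloud run deploy my-service \
        --image=us-docker.pkg.dev/my-project/my-repo/my-app:v1 \
        --region=us-central1 \
        --set-secrets="/secrets/password=app-password:latest"

    # The application reads /secrets/password each time it needs the value,
    # so a rotated secret is picked up without redeploying the service.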
47. You are designing a system with three different environments: development, quality
assurance (QA), and production. Each environment will be deployed with Terraform
and has a Google Kubernetes Engine (GKE) cluster created so that application teams
can deploy their applications. Anthos Config Management will be used and templated
to deploy infrastructure level resources in each GKE cluster. All users (for example,
infrastructure operators and application owners) will use GitOps. How should you
structure your source control repositories for both Infrastructure as Code (IaC) and
application code?
a. • Cloud Infrastructure (Terraform) repository is shared: different directories
are different environments
• GKE Infrastructure (Anthos Config Management Kustomize manifests)
repository is shared: different overlay directories are different environments
• Application (app source code) repositories are separated: different branches
are different features
For GitOps, Google recommends: Use folders for variants of the configuration
instead of branches. With folders, you can use the tree command to see
variants. For example, with branches, you can't tell if the delta between a prod
and stage branch is an upcoming change in configuration or a permanent
difference between what stage and prod should look like.
b. • Cloud Infrastructure (Terraform) repository is shared: different directories
are different environments
• GKE Infrastructure (Anthos Config Management Kustomize manifests)
repositories are separated: different branches are different environments
• Application (app source code) repositories are separated: different branches
are different features
c. • Cloud Infrastructure (Terraform) repository is shared: different branches are
different environments
• GKE Infrastructure (Anthos Config Management Kustomize manifests)
repository is shared: different overlay directories are different environments
• Application (app source code) repository is shared: different directories are
different features
d. • Cloud Infrastructure (Terraform) repositories are separated: different
branches are different environments
• GKE Infrastructure (Anthos Config Management Kustomize manifests)
repositories are separated: different overlay directories are different
environments
• Application (app source code) repositories are separated: different branches
are different features
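A rough sketch of the layout described in option (a); repository and directory names are illustrative.

    # Cloud infrastructure (Terraform) repository - shared, one directory per environment
    terraform-infra/
        modules/
        environments/
            dev/
            qa/
            prod/

    # GKE infrastructure (Anthos Config Management) repository - shared, Kustomize overlays per environment
    acm-config/
        base/
        overlays/
            dev/
            qa/
            prod/

    # Application repositories - one per application, feature branches for features
    app-frontend/   (branches: main, feature/...)
    app-backend/    (branches: main, feature/...)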
48. You are configuring Cloud Logging for a new application that runs on a Compute Engine
instance with a public IP address. A user-managed service account is attached to the
instance. You confirmed that the necessary agents are running on the instance but you
cannot see any log entries from the instance in Cloud Logging. You want to resolve the
issue by following Google-recommended practices. What should you do?
a. Export the service account key and configure the agents to use the key.
b. Update the instance to use the default Compute Engine service account. They
specified that a user manager service account is attached to the instance, so the
default one will not be used.
c. Add the Logs Writer role to the service account. By adding the Logs Writer role
to the service account, you grant the necessary permissions to write logs to
Cloud Logging. This role provides the required access for the agents running on
the instance to send log entries to Cloud Logging. Make sure to follow the
principle of least privilege and only grant the minimum permissions required for
your application to function.
d. Enable Private Google Access on the subnet that the instance is in.
49. As a Site Reliability Engineer, you support an application written in Go that runs on
Google Kubernetes Engine (GKE) in production. After releasing a new version of the
application, you notice the application runs for about 15 minutes and then restarts.
You decide to add Cloud Profiler to your application and now notice that the heap
usage grows constantly until the application restarts. What should you do?
a. Increase the CPU limit in the application deployment.
b. Add high memory compute nodes to the cluster. The problem lies more with
the application's deployment configuration than with the cluster's node
resources; the correct answer is C.
c. Increase the memory limit in the application deployment. Given the scenario
where the heap usage of the application grows constantly until it restarts, the
issue is likely related to memory consumption. Therefore, the appropriate
action to take is to address the memory-related problem.
By increasing the memory limit in the application deployment configuration,
you provide more memory resources to the application, potentially preventing
it from exhausting its memory and triggering restarts.
d. Add Cloud Trace to the application, and redeploy.
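For reference, the change described in option (c) is made in the Deployment's resources section; the names and sizes below are placeholders, and the leak indicated by the constantly growing heap should still be investigated with Profiler.

    # Fragment of the Deployment Pod spec.
    containers:
      - name: my-go-app
        image: us-docker.pkg.dev/my-project/my-repo/my-go-app:v2
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"    # raised so the container is not OOM-killed
            cpu: "500m"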
50. You are deploying a Cloud Build job that deploys Terraform code when a Git branch is
updated. While testing, you noticed that the job fails. You see the following error in
the build logs:
Initializing the backend...
Error: Failed to get existing workspaces: querying Cloud Storage failed: googleapi:
Error 403
You need to resolve the issue by following Google-recommended practices. What
should you do?
a. Change the Terraform code to use local state.
b. Create a storage bucket with the name specified in the Terraform
configuration.
c. Grant the roles/owner Identity and Access Management (IAM) role to the
Cloud Build service account on the project.
d. Grant the roles/storage.objectAdmin Identity and Access Management (IAM)
role to the Cloud Build service account on the state file bucket. This permission
grants the necessary access for Cloud Build to read and write objects (which
include Terraform state files) in the specified Cloud Storage bucket, resolving
the 403 error. It's important to follow the principle of least privilege and only
grant the permissions needed for the specific task at hand.
Storage Object Admin (roles/storage.objectAdmin) Grants full control over
objects, including listing, creating, viewing, and deleting objects, as well as
setting object ACLs. Also grants access to create, delete, get, and list managed
folders.
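A sketch of the backend configuration and the bucket-scoped IAM binding follows; the bucket name, state prefix, and the Cloud Build service account address (PROJECT_NUMBER@cloudbuild.gserviceaccount.com by default) are placeholders.

    # backend.tf: keep Terraform state in a Cloud Storage bucket.
    terraform {
      backend "gcs" {
        bucket = "my-tf-state-bucket"
        prefix = "env/prod"
      }
    }

    # Grant the Cloud Build service account object admin on that bucket only.
    gcloud storage buckets add-iam-policy-binding gs://my-tf-state-bucket \
        --member="serviceAccount:123456789012@cloudbuild.gserviceaccount.com" \
        --role="roles/storage.objectAdmin"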
51. Your company runs applications in Google Kubernetes Engine (GKE). Several
applications rely on ephemeral volumes. You noticed some applications were unstable
due to the DiskPressure node condition on the worker nodes. You need to identify
which Pods are causing the issue, but you do not have execute access to workloads
and nodes. What should you do?
a. Check the node/ephemeral_storage/used_bytes metric by using Metrics
Explorer.
b. Check the container/ephemeral_storage/used_bytes metric by using Metrics
Explorer. This metric provides information about the ephemeral storage usage
of containers within Pods. By examining this metric, you can identify the specific
Pods that are contributing to the DiskPressure condition on the worker nodes.
Metrics Explorer allows you to visualize and analyze metrics data, making it a
suitable tool for investigating resource usage and identifying potential issues in
a GKE environment.
This approach provides the necessary detail to identify which Pods are causing
the DiskPressure condition without the need for direct access to the nodes or
the ability to execute commands on them. Metrics Explorer in Google Cloud's
operations suite (formerly Stackdriver) allows you to query and visualize metrics
from your GKE environment, making it a powerful tool for this kind of diagnostic
task.
c. Locate all the Pods with emptyDir volumes. Use the df -h command to
measure volume disk usage.
d. Locate all the Pods with emptyDir volumes. Use the df -sh * command to
measure volume disk usage.
52. You are designing a new Google Cloud organization for a client. Your client is
concerned with the risks associated with long-lived credentials created in Google
Cloud. You need to design a solution to completely eliminate the risks associated with
the use of JSON service account keys while minimizing operational overhead. What
should you do?
a. Apply the constraints/iam.disableServiceAccountKeyCreation constraint to
the organization.
By applying the constraints/iam.disableServiceAccountKeyCreation constraint
to the organization, you can prevent the creation of JSON service account keys,
thus minimizing the risk associated with long-lived credentials. This constraint
disables the ability to create new service account keys, reducing the potential
for misuse or compromise of credentials.
You can use the iam.disableServiceAccountKeyCreation boolean constraint to
disable the creation of service account keys across the organization. This allows
you to centralize management of credentials while not restricting the other
permissions your developers have on projects.
b. Use custom versions of predefined roles to exclude all
iam.serviceAccountKeys.* service account role permissions.
c. Apply the constraints/iam.disableServiceAccountKeyUpload constraint to the
organization.
d. Grant the roles/iam.serviceAccountKeyAdmin IAM role to organization
administrators only.
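A sketch of enforcing the constraint at the organization level follows; the organization ID is a placeholder, and the newer gcloud org-policies command group can be used instead of the legacy resource-manager group.

    # Enforce the boolean constraint for the whole organization.
    gcloud resource-manager org-policies enable-enforce \
        constraints/iam.disableServiceAccountKeyCreation \
        --organization=123456789012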
53. You are designing a deployment technique for your applications on Google Cloud. As
part of your deployment planning, you want to use live traffic to gather performance
metrics for new versions of your applications. You need to test against the full
production load before your applications are launched. What should you do?
a. Use A/B testing with blue/green deployment.
b. Use canary testing with continuous deployment.
c. Use canary testing with rolling updates deployment.
d. Use shadow testing with continuous deployment. With shadow testing, you
deploy and run a new version alongside the current version, but in such a way
that the new version is hidden from the users.
An incoming request is mirrored and replayed in a test environment. This
process can happen either in real time or asynchronously after a copy of the
previously captured production traffic is replayed against the newly deployed
service.
Shadow testing allows the deployment and execution of a new version
alongside the current one, but in a manner that is hidden from users. This
approach ensures zero production impact, as the incoming requests are
mirrored and replayed in a test environment. By duplicating traffic, it enables
comprehensive testing of new features and improvements against the full
production load without affecting end-users. Additionally, the continuous
deployment aspect ensures that the deployment process remains ongoing,
allowing for iterative testing and refinement based on real-world performance
metrics and user interactions before a full rollout occurs.
54. Your Cloud Run application writes unstructured logs as text strings to Cloud Logging.
You want to convert the unstructured logs to JSON-based structured logs. What
should you do?
a. Modify the application to use the Cloud Logging software development kit
(SDK), and send log entries with a jsonPayload field. On Cloud Run there is no
user-configurable logging agent: the platform automatically captures whatever
the container writes to stdout and stderr. To get JSON-based structured logs,
the application itself has to emit them, either by using the Cloud Logging client
libraries to send entries with a jsonPayload field or by writing one JSON object
per line to stdout, which Cloud Logging then records as a jsonPayload.
Modifying the application is therefore the way to convert the unstructured
text logs into structured logs.
b. Install a Fluent Bit sidecar container, and use a JSON parser.
c. Install the log agent in the Cloud Run container image, and use the log agent
to forward logs to Cloud Logging.
d. Configure the log agent to convert log text payload to JSON payload. Cloud
Run does not expose a logging agent that can be installed or reconfigured, so
this option does not apply to this environment.
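A minimal Go sketch of structured logging from a Cloud Run container follows. When a single JSON object is written per line to stdout, Cloud Logging records it as a jsonPayload, and the severity and message fields are recognized as special fields; the orderId field is an illustrative custom field. The Cloud Logging client library can be used instead to send entries with an explicit jsonPayload.

    // Write one JSON object per line to stdout; Cloud Run forwards it to
    // Cloud Logging, which parses it into jsonPayload instead of textPayload.
    package main

    import (
        "encoding/json"
        "os"
    )

    type logEntry struct {
        Severity string `json:"severity"`
        Message  string `json:"message"`
        OrderID  string `json:"orderId"` // illustrative application field
    }

    func main() {
        enc := json.NewEncoder(os.Stdout)
        enc.Encode(logEntry{
            Severity: "INFO",
            Message:  "order processed",
            OrderID:  "12345",
        })
    }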
55. Your company is planning a large marketing event for an online retailer during the
holiday shopping season. You are expecting your web application to receive a large
volume of traffic in a short period. You need to prepare your application for potential
failures during the event. What should you do? (Choose two.)
a. Configure Anthos Service Mesh on the application to identify issues on the
topology map. While Anthos Service Mesh can provide valuable insights into
service interactions, it's not directly related to preparing for potential failures
during high-traffic events.
b. Ensure that relevant system metrics are being captured with Cloud
Monitoring, and create alerts at levels of interest. Monitoring system metrics
and setting up alerts can help you detect and respond to issues quickly,
minimizing potential downtime during periods of high traffic.
c. Review your increased capacity requirements and plan for the required quota
management. Reviewing your capacity requirements and planning for quota
management ensures that your application has the necessary resources to
handle the expected increase in traffic.
d. Monitor latency of your services for average percentile latency. While
monitoring latency is important for understanding service performance, it
doesn't directly help prepare for potential failures during high-traffic events.
e. Create alerts in Cloud Monitoring for all common failures that your application
experiences. Creating alerts for all common failures could result in alert fatigue.
56. Your company recently migrated to Google Cloud. You need to design a fast, reliable,
and repeatable solution for your company to provision new projects and basic
resources in Google Cloud. What should you do?
a. Use the Google Cloud console to create projects.
b. Write a script by using the gcloud CLI that passes the appropriate parameters
from the request. Save the script in a Git repository.
c. Write a Terraform module and save it in your source control repository. Copy
and run the terraform apply command to create the new project.
d. Use the Terraform repositories from the Cloud Foundation Toolkit. Apply the
code with appropriate parameters to create the Google Cloud project and
related resources. Using Terraform and the Cloud Foundation Toolkit ensures
that you have a structured and version-controlled approach to provisioning
resources. It also facilitates automation and scalability, making it easier to
manage projects and resources across different environments.
The Cloud Foundation Toolkit provides a series of reference templates for
Deployment Manager and Terraform which reflect Google Cloud best practices.
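A sketch of option (d) using the Cloud Foundation Toolkit project factory module follows; the module version, organization ID, billing account, and folder are placeholders, and the input names should be checked against the module's documentation.

    module "new_project" {
      source  = "terraform-google-modules/project-factory/google"
      version = "~> 15.0" # illustrative version

      name            = "team-a-prod"
      org_id          = "123456789012"
      billing_account = "AAAAAA-BBBBBB-CCCCCC"
      folder_id       = "folders/987654321"

      activate_apis = [
        "compute.googleapis.com",
        "logging.googleapis.com",
      ]
    }

Running terraform plan and terraform apply from CI against this module gives a fast, repeatable, and reviewable way to provision new projects.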
57. You are configuring a CI pipeline. The build step for your CI pipeline integration testing
requires access to APIs inside your private VPC network. Your security team requires
that you do not expose API traffic publicly. You need to implement a solution that
minimizes management overhead. What should you do?
a. Use Cloud Build private pools to connect to the private VPC. Private pools are
private, dedicated pools of workers that offer greater customization over the
build environment, including the ability to access resources in a private network.
Private pools, similar to default pools, are hosted and fully-managed by Cloud
Build and scale up and down to zero, with no infrastructure to set up, upgrade,
or scale. Because private pools are customer-specific resources, you can
configure them in more ways.
Cloud Build private pools allow you to run builds in a Google Cloud environment
with access to your private VPC network. This ensures that the build step for
your CI pipeline can access APIs inside the private VPC without exposing API
traffic to the public internet. Private pools minimize management overhead by
providing a secure and controlled environment for running builds.
b. Use Spinnaker for Google Cloud to connect to the private VPC.
c. Use Cloud Build as a pipeline runner. Configure Internal HTTP(S) Load
Balancing for API access.
d. Use Cloud Build as a pipeline runner. Configure External HTTP(S) Load
Balancing with a Google Cloud Armor policy for API access.
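A sketch of option (a) follows; the pool, network, and project names are placeholders, the flag names follow the current gcloud documentation, and the Service Networking peering between the VPC and Google's service producer network is assumed to already exist.

    # Create a private worker pool attached to the peered VPC network.
    gcloud builds worker-pools create my-private-pool \
        --region=us-central1 \
        --peered-network=projects/my-project/global/networks/my-vpc

    # Run the CI build on that pool so test steps can reach the private APIs.
    gcloud builds submit --config=cloudbuild.yaml \
        --region=us-central1 \
        --worker-pool=projects/my-project/locations/us-central1/workerPools/my-private-pool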
58. You are leading a DevOps project for your organization. The DevOps team is
responsible for managing the service infrastructure and being on-call for incidents.
The Software Development team is responsible for writing, submitting, and reviewing
code. Neither team has any published SLOs. You want to design a new joint-ownership
model for a service between the DevOps team and the Software Development team.
Which responsibilities should be assigned to each team in the new joint-ownership
model?

a. (answer option presented as a responsibility table in the original document)
b. (answer option presented as a responsibility table in the original document)
c. (answer option presented as a responsibility table in the original document)

The correct answer is C: it does not make sense for the Software Development team to be on call while the DevOps team is not, or the other way around. In a joint-ownership model, both teams share the on-call rotation and jointly define and maintain the service's SLOs.

59. You recently migrated an ecommerce application to Google Cloud. You now need to
prepare the application for the upcoming peak traffic season. You want to follow
Google-recommended practices. What should you do first to prepare for the busy
season?
a. Migrate the application to Cloud Run, and use autoscaling.
b. Create a Terraform configuration for the application's underlying
infrastructure to quickly deploy to additional regions.
c. Load test the application to profile its performance for scaling. Google-
recommended practices often involve load testing applications to understand
their performance characteristics and determine appropriate scaling strategies.
By load testing the ecommerce application, you can simulate peak traffic
scenarios and identify potential bottlenecks or areas for optimization. This
allows you to make informed decisions about scaling strategies, such as setting
up auto-scaling policies based on the observed performance metrics. Load
testing helps ensure that the application can handle the expected peak traffic
efficiently and reliably.
d. Pre-provision the additional compute power that was used last season, and
expect growth.
60. You are monitoring a service that uses n2-standard-2 Compute Engine instances that
serve large files. Users have reported that downloads are slow. Your Cloud Monitoring
dashboard shows that your VMs are running at peak network throughput. You want
to improve the network throughput performance. What should you do?
a. Add additional network interface controllers (NICs) to your VMs. Neither
additional virtual network interfaces (vNICs) nor additional IP addresses per
vNIC increase ingress or egress bandwidth for a VM. For example, a C3 VM with
22 vCPUs is limited to 23 Gbps total egress bandwidth. If you configure the C3
VM with two vNICs, the VM is still limited to 23 Gbps total egress bandwidth,
not 23 Gbps bandwidth per vNIC.
b. Deploy a Cloud NAT gateway and attach the gateway to the subnet of the VMs.
c. Change the machine type for your VMs to n2-standard-8. Changing the
machine type to a higher-capacity will increases the available resources,
including CPU, memory, and potentially network bandwidth.
d. Deploy the Ops Agent to export additional monitoring metrics.
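For reference, changing the machine type as described in option (c) requires stopping the instance first; the instance name and zone are placeholders. For instances in a managed instance group, the instance template would be updated instead.

    gcloud compute instances stop file-server-1 --zone=us-central1-a
    gcloud compute instances set-machine-type file-server-1 \
        --zone=us-central1-a --machine-type=n2-standard-8
    gcloud compute instances start file-server-1 --zone=us-central1-a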
61. Your organization is starting to containerize with Google Cloud. You need a fully
managed storage solution for container images and Helm charts. You need to identify
a storage solution that has native integration into existing Google Cloud services,
including Google Kubernetes Engine (GKE), Cloud Run, VPC Service Controls, and
Identity and Access Management (IAM). What should you do?
a. Use Docker to configure a Cloud Storage driver pointed at the bucket owned
by your organization.
b. Configure an open source container registry server to run in GKE with a
restrictive role-based access control (RBAC) configuration.
c. Configure Artifact Registry as an OCI-based container registry for both Helm
charts and container images. Artifact Registry is fully managed, natively
integrated with GKE, Cloud Run, IAM, and VPC Service Controls, and stores both
container images and Helm charts as OCI artifacts in the same repository
format. Container Registry does not support Helm charts.
d. Configure Container Registry as an OCI-based container registry for container
images.
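A sketch of option (c) follows; the repository name, location, and project are placeholders.

    # One Docker-format Artifact Registry repository stores both container
    # images and Helm charts (as OCI artifacts).
    gcloud artifacts repositories create my-repo \
        --repository-format=docker \
        --location=us-central1

    # Push a container image:
    docker push us-central1-docker.pkg.dev/my-project/my-repo/my-app:v1

    # Push a Helm chart as an OCI artifact:
    helm push my-chart-0.1.0.tgz oci://us-central1-docker.pkg.dev/my-project/my-repo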
62. You are running an application on an autoscaled managed instance group in a single
zone. This is a high-priority workload, and you need to test changes to the managed
instance group before any changes are finalized. You need to ensure that there is
sufficient capacity to serve any requests while you are testing the changes. What
should you do?
a. Configure autoscaling to "Only scale out" while you make changes. Configuring
autoscaling to “Only scale out” ensures that capacity doesn’t drop. If there are
more requests, the managed instance group scales out.
b. Change the managed instance group to run in multiple zones. Changing the
managed instance group to run in multiple zones does not guarantee that the
changes being tested don't impact capacity.
c. Temporarily disable autoscaling while you make changes. Disabling
autoscaling only supports the current capacity, which might not be sufficient
during the period you make the changes.
d. Enable predictive autoscaling. Predictive autoscaling does not guarantee that
your application will have sufficient capacity. While you make changes to the
existing configuration, the number of actual requests might spike or the changes
that you make might affect the predictive autoscaling, which might result in
insufficient capacity.
63. Your company requires that all employees within a team use a consistent, templated
development environment, which includes popular integrated development
environments (IDEs). You need to maintain security patches and updates to the
development environments. You also need to identify a reliable, Google-
recommended way to ensure that all developers within each team use a consistent
development environment. What should you do?
a. Create daily Compute Engine snapshots of a 'golden' machine with all security
patches and make the snapshots available for developers to update their own
machines. Relying on the developers to make the update themselves might
result in inconsistencies.
b. Create a checklist of all software updates with update scripts and allow
developers to update their own machines. Relying on the developers to make
the update themselves might result in inconsistencies.
c. Create an image with all up-to-date dependencies and install the image on
laptops for the developers to use. Critical patches and updates cannot be
reliably rolled out to all developers’ laptops.
d. Create Cloud Workstations configurations specific to each team. Cloud
Workstations configurations provide templates for the creation of consistent
workstations across multiple developers, and specify configuration settings
such as machine type, disk size, tools, and pre-installed libraries. Any operations
performed on a workstation configuration, such as changing the machine type
or container image, reflect on each workstation the next time the workstation
starts up.
64. Your team has received information about a security patch that addresses a critical
vulnerability. The patch needs to be applied to Compute Engine instances that serve
millions of customers across the world. You need to roll out the security patch quickly,
while minimizing cost and impact on end customers. What should you do?
a. Perform a canary rollout of the security patch with a short duration. A canary
rollout with a short duration lets you quickly and cost-effectively identify any
side effects of the security patch before you expand the rollout.
b. Simultaneously update all the Compute Engine instances across the world.
If the security patch has any adverse or unexpected implications, all users could
be affected simultaneously.
c. Perform a 24 hour-rolling update to occur during each region’s overnight
hours. This is not an effective strategy, because critical systems in regions that
currently serve customers in the day time could be an attack vector.
d. Perform an A/B rollout, and create a copy of all the instances in Compute
Engine. Apply the security patch, and then switch users. This is not cost-
effective and would also be time consuming to create a copy of all the services
and machines.
65. You have an application deployed on Cloud Run with users around the world. You
need to configure Cloud Run to route all users to their nearest region so that request
responses have low latency. What should you do?
a. Deploy the application to any one region, and serve the application at a global
anycast IP address behind a global external HTTP(S) load balancer. All requests
would have to be routed to the single Cloud Run instance, which causes latency
and congestion.
b. Deploy the application to each region, and serve the application at a global
anycast IP address behind a global external HTTP(S) load balancer. The Cloud
Run instance needs to be available in the regions closest to where the users are
located. The global external HTTP(S) load balancer can be the endpoint for the
anycast IP address which will route the request to the closest region with the
Cloud Run instance running.
c. Deploy the application to one region per continent and serve the application
by using a regional external HTTP(S) load balancer in each region and attach
the same global IP address to all the regional load balancers. This is not the
most efficient solution. You will have to set up multiple load balancers which
are complex in configuration and not unified for the customer.
d. Deploy the application to any one region, and configure Google Cloud Armor
in the frontend to allow traffic from all IP addresses. Google Cloud Armor allow
rules will allow traffic from all over the world, but responses will have higher
latency when served from one region.
66. Your company is running a legacy database in Google Kubernetes Engine (GKE) that
does not respond well to Pod evictions and rescheduling. Your company cannot afford
a disruption to the running database. You need to choose a GKE maintenance strategy
to use. What should you do?
a. Configure the GKE maintenance channel to “Stable”. If you choose the stable
release channel alone, all updates will be performed on the stable release
schedule, which can cause Pod evictions.
b. Configure a node pool upgrade strategy with max-surge-upgrade set to 0. A
surge upgrade strategy will not restrict maintenance updates, and that could
cause pod evictions.
c. Create a "PodDisruptionBudget" object and specify maxUnavailable as 100%.
A ”PodDisruptionBudget” of 100% will not stop the maintenance
upgrades,therefore, it could cause Pod evictions.
d. Configure a maintenance exclusion window with a scope of “No minor or node
upgrades”. The "No minor or node upgrades" setting prevents node pool
disruption and avoids any eviction and rescheduling of your workloads because
of node upgrades.
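A sketch of option (d) with the gcloud CLI follows; the cluster name, location, and dates are placeholders, and the flag names follow the current gcloud documentation.

    # Block minor and node upgrades for the duration of the exclusion window.
    gcloud container clusters update legacy-db-cluster \
        --region=us-central1 \
        --add-maintenance-exclusion-name=db-freeze \
        --add-maintenance-exclusion-start=2024-11-01T00:00:00Z \
        --add-maintenance-exclusion-end=2025-01-15T00:00:00Z \
        --add-maintenance-exclusion-scope=no_minor_or_node_upgrades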
67. Your organization uses Cloud Run to deploy a containerized application. A new version
of the application recently passed internal tests and now needs to be tested in
production. You need to implement a configuration so that beta testers can test the
application before you roll out the application to end users. What should you do?
a. Deploy the Cloud Run application to production, and direct 1% of traffic to the
application. End users might fall within the 1% and access the application.
b. Configure Google Cloud Armor to reject requests that originate from IP
addresses outside of the testing team. Maintaining a list of IP addresses that
the testing team uses is cumbersome and not reliable.
c. Deploy the container to a private VM instance on Compute Engine, and enable
access for the testing team only. This deployment environment is different
from the production environment, and more effort is required to reliably roll
the application out to end users.
d. Deploy the Cloud Run application to production with a revision tag and the “-
-no-traffic“ option, and share the URL with the testing team. Assigning a tag
lets you access the revision at a specific URL without serving traffic.
68. You work for a small company that uses open source Java packages as part of the
CI/CD pipeline in Cloud Build. Your security team is concerned that vulnerabilities in
the open source packages could be exploited. You need to implement a cost-effective
solution to ensure that your production system dependencies are secure. What should
you do?
a. Pull only the latest versions of GitHub packages into your Cloud Build pipeline.
Pulling the latest open source versions might have bugs and vulnerabilities that
have not been discovered yet.
b. Pull the packages into Cloud Build from Assured Open Source Software
(Assured OSS) packages. The Assured OSS Packages are built in Google Cloud’s
own secured pipelines and are regularly scanned, analyzed, and fuzz tested for
vulnerabilities.
c. Pull the packages into Cloud Source Repositories, and have the security team
validate open source packages. This is not the most cost-effective solution
because Google Cloud already offers open source packages that have been
curated and validated.
d. Pull the open source packages locally, use Cloud Build to scan all packages,
and delete any flagged packages. Scanning all packages in each build is not cost-
effective. Also, builds will not be successful if you delete essential packages.
69. You have a custom application that runs on a few Compute Engine instances. The
servers occasionally experience spikes in utilization up to 90% but then quickly return
to about 50% utilization. You have alerts set up to notify you to take action when your
server utilization is above 80%. You are receiving too many notifications, and need to
update your alerts to only notify you when there is consistently high utilization for at
least 5 minutes. What should you do?
a. Send notifications to a Cloud Run application that identifies utilization above
80% for 5 minutes. It is a considerable effort to build a custom solution to filter
and analyze alerts.
b. Create an alert from a log-based metric with a rolling window of 5 minutes
and a metric absence condition. The absence of the metric is not indicative of
high utilization.
c. Create an alert from a log-based metric with a rolling window of 5 minutes
and a mean threshold of 80%. You need to be notified only when utilization is
consistently high over a period of 5 minutes.
d. Create an alert from a log-based metric with a rolling window of 5 minutes
and a mean threshold of 90%. A notification based on 90% reduces the number
of notifications but it does not trigger for high utilization between 80% and 90%.
70. Your organization has many projects with production environments that have to be
migrated to Google Cloud. You manage organizational policies centrally. You want to
follow Google-recommended practices to roll out policy changes in the resource
hierarchy. What should you do?
a. Create an organization to experiment with policy changes and a separate one
for the production environments. Any change in the resource hierarchy the
policy applies to might have far-reaching and unintended consequences. To
mitigate this risk, it is a recommended practice to test policy changes in a
separate organization before you apply them in your main organization.
b. Create an organization for the production environments, and then apply
policy changes gradually across this organization going down the hierarchy.
Many policies are applicable to the entire organization. Policy changes to the
existing production organization might have far-reaching and unintended
consequences.
c. Create an organization for the production environments and a staging
environment for pre-release checks. Perform all experiments in the staging
environment. Staging environments should reflect the production environment
configuration. It's not recommended to change configurations or make the
security configurations easier just to do experiments.
d. Create an organization for the production environments, and then apply
policy changes gradually going up the hierarchy starting from production
projects in this organization. Policy changes to the existing production
organization could have unintended consequences.
71. Your organization runs a fault-tolerant, batch processing application that processes
very large amounts of data. The computations are spread over hundreds of VMs for
parallel processing. You need to implement a cost-effective solution to perform the
data processing at scale. What should you do?
a. Allocate Spot VMs to process data. Spot VMs are a cost-effective solution for
fault-tolerant processing in Compute Engine in comparison to regular VMs.
b. Allocate sole tenant machines to process data. Sole tenant VMs are not the
most cost-effective solution.
c. Create a managed instance group with autohealing enabled. Autohealing will
automatically repair unhealthy VMs, but it won't be the most cost-effective
solution with a regular VM configuration.
d. Create a managed instance group with accelerator-optimized VMs and
autoscaling enabled. Autoscaling allocates VMs as required, but it’s not the
most cost-effective solution with an accelerator-optimized VM configuration.
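A minimal sketch of option a, creating one Spot VM for the batch workers (instance name, zone, and machine type are hypothetical):

    # Spot VMs are heavily discounted but can be preempted at any time,
    # which is acceptable for a fault-tolerant batch workload
    gcloud compute instances create batch-worker-001 \
        --zone=us-central1-a \
        --machine-type=e2-standard-8 \
        --provisioning-model=SPOT \
        --instance-termination-action=DELETE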
72. Your security team requires you to create a replica of the production infrastructure
for testing in case of a suspected system breach. You want to follow Google-
recommended practices to quickly deliver replicas to the security team. What should
you do?
a. Run Cloud Asset Inventory, list the resources in the production environment,
and recreate the resources by using the Google Cloud console. Repeatedly
recreating entire technical environments manually through the console is not
Google-recommended.
b. Run a scan with the Security Command Center, list the resources in the
production environment, and recreate the resources by using the gcloud CLI.
Resources listed in the Security Command Center will not have sufficient
information to recreate them. Also, repeatedly recreating entire technical
environments manually with the gcloud command line tool is not a Google-
recommended practice.
c. Create a Terraform script to recreate your production environment and rerun
the script as required. Infrastructure as code is the recommended way to
manage repeatability technical environments. Terraform is the Google-
recommended tool for infrastructure as code.
d. Create bash shell scripts to recreate your production environment, and rerun
the scripts as required. Repeatedly recreating entire technical environments
manually with bash scripts tool is not a Google-recommended practice.
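The value of option c is repeatability: once the environment is captured as Terraform configuration, the replica can be recreated on demand with the standard workflow. A sketch, assuming the configuration lives in a directory named infra/prod-replica:

    cd infra/prod-replica
    terraform init              # download providers and configure the backend
    terraform plan -out=tfplan  # preview the resources that will be created
    terraform apply tfplan      # create the replica environment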
73. You need to investigate some network traffic to a VM in your Google Cloud VPC. You
have VPC Flow Logs enabled. However, when you filter for the traffic information, the
network traffic is not available. You need to identify the issue. What should you do?
a. Determine whether you are filtering by TCP protocol. This is not the issue. TCP
is supported by VPC Flow Logs, and it should be available in the data captured.
b. Determine whether you need to lower the sample rate. Lowering the sample rate captures fewer logs. A sample rate of 1.0 (100%) means that all entries are kept.
c. Determine whether the traffic you are filtering for has very low volume. Very
low traffic volume could be the issue. Logs are sampled, so some packets in low
volume flows might be missed.
d. Determine whether the traffic is flowing from a VM outside your VPC to your
VM inside the VPC. This is not the issue. VPC Flow Logs can capture traffic
coming from external servers into VMs in a Google Cloud VPC.
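If sampling turns out to be the cause, the sampling rate on the subnet can be raised. A sketch with a hypothetical subnet name and region:

    # Capture 100% of flows on the subnet (expect higher log volume and cost)
    gcloud compute networks subnets update my-subnet \
        --region=us-central1 \
        --enable-flow-logs \
        --logging-flow-sampling=1.0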
74. Your colleague noticed an increasing amount of suspicious traffic entering your
project's virtual private cloud (VPC) while on call at night. Your colleague updated the
configuration of Cloud Armor to stop the traffic that originates from certain IP
addresses. The change reduced the network traffic, but there were escalations from
an affected customer, which resulted in penalties for your company. Your team wants
to use Google-recommended practices to prevent a recurrence. What should you do?
a. Institute a policy that requires that all changes are made during day time when
there are additional people to review changes. This is not a viable policy,
because configurations would need to be changed at any time. In this case, a
potential cyber attack happened at night. The team needs processes that can
be applied any time.
b. Create a process that requires a second level of review for this colleague for
any changes. This approach does not follow recommended practices because it
focuses specifically on an individual instead of improving the process for
everybody.
c. Identify core technical issues, and document the learnings for use by the entire
team. This solution takes the DevOps/SRE approach of objectively identifying
the technical issue without blaming the individual.
d. Create an incident report for your colleague and escalate to senior
management. This approach does not follow recommended practices, because
it focuses on blaming the individual instead of improving the process.
75. You are creating a software delivery pipeline for your company. New application
features delivered by the development team must be deployed by using Cloud Deploy
in testing, staging, and then production. The corresponding deployment targets are
already defined. You instantiated the delivery pipeline by creating a release. You need
to perform the deployments while following Google-recommended practices. What
should you do?
a. Create an image for each stage in Artifact Registry and apply each image in the
corresponding stage. Creating separate images for each deployment is not the
recommended approach.
b. Configure Binary Authorization to approve the deployment before each stage.
Binary authorization provides a security check prior to deployment. However, it
is not a solution for deployment itself.
c. Create an independent delivery pipeline for each of the delivery stages.
Creating separate delivery pipelines in Cloud Deploy is not recommended for
this use case. The recommended approach is to promote the deployment to the
next stage.
d. Promote the initial release through the staging and production stages.
Instantiating the release will deploy to the first target. You can promote it to go
to staging and then again to go to production.
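With Cloud Deploy, promotion to the next target in the pipeline is a single command that is repeated for each stage; the release, pipeline, and region names below are hypothetical:

    # Promote the release from its current target (testing) to the next one (staging);
    # run the same command again to promote from staging to production
    gcloud deploy releases promote \
        --release=release-001 \
        --delivery-pipeline=web-app-pipeline \
        --region=us-central1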
76. Your organization runs many builds on Google Cloud daily. You need to ensure that
VMs do not have public IP addresses and secure the build machines against data
exfiltration. You want to use Google-recommended practices to create a
customizable, easily-managed solution for your builds that can scale to the demands
of your organization. What should you do?
a. Implement private pools in Cloud Build with VPC service control. Creating a
VPC service control, and running builds in a private pool gives you more security
and allows multiple customizations that the default pool does not allow.
b. Run and configure the builds in Cloud Build in a default pool for required
privacy. The default pool does not allow customization to limit VM
configurations such as public IP addresses.
c. Create a build environment by using Compute Engine VMs in a custom VPC.
Use Jenkins to run the builds. This custom approach incurs more effort and is
not the recommended approach on Google Cloud.
d. Create a build environment by using Compute Engine Spot VM instances in a
custom VPC. Use Jenkins to run the builds. This custom approach incurs more
effort and is not the recommended approach on Google Cloud.
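A sketch of option a: create a private worker pool peered to your VPC so the build workers have no public IP addresses (pool, network, project, and region names are hypothetical):

    gcloud builds worker-pools create my-private-pool \
        --region=us-central1 \
        --peered-network=projects/my-project/global/networks/build-vpc \
        --no-public-egress

Builds are then pointed at the pool, for example through the options.pool.name field in cloudbuild.yaml, and a VPC Service Controls perimeter around the project guards against data exfiltration.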
77. Your company has a major sales event planned in September. You have identified a
group of employees to support operations during the sales event. You want to follow
Google-recommended practices to provide the employees with access to Google
Cloud resources for September only. What should you do?
a. Add a firewall rule to allow traffic from the identified employees’ machines to
the resources. Change the firewall rule to deny traffic at the end of September.
The firewall rules might allow network traffic, but that does not mean that the
user has the IAM permissions to work on the resources.
b. Create a group that contains the identified employees. Assign the group
permissions to access the resources and configure an Identity and Access
Management (IAM) condition with a date/time expression for September.
IAM conditions automatically assign the required permissions for the
designated period only.
c. Create a group that contains the identified employees and assign the group a
custom Identity and Access Management (IAM) role that has permissions to
access the resources. Delete the custom role at the end of September. You
need to remember to remove the role at the end of September, which is not
recommended. Also, the permissions apply right away, which is not necessary.
d. Assign a custom Identity and Access Management (IAM) role that has
permissions to access the resources to each of the identified employees.
Create a Cloud Scheduler task to delete the custom role at the end of
September. Assigning roles to individual users is not recommended. Also, using
Cloud Scheduler to later delete the role is not the recommended practice.
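A sketch of option b, granting a group a role under a time-bound IAM condition; the project, group, role, and year are hypothetical:

    # The binding is effective only during September of the given year
    gcloud projects add-iam-policy-binding my-project \
        --member="group:sales-event-ops@example.com" \
        --role="roles/compute.viewer" \
        --condition='title=september-only,expression=request.time >= timestamp("2024-09-01T00:00:00Z") && request.time < timestamp("2024-10-01T00:00:00Z")'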
78. Your company runs applications that serve users in Europe. Industry regulations
require you to instantiate resources only in specific regions. You need to ensure that
all projects in your Google Cloud organization follow this requirement. What should
you do?
a. Create a log-based alert to identify resources created outside of the allowed
list of regions. Delete any non-compliant resources. This option allows a
resource to be created, which is a breach of the regulation.
b. Run an hourly Asset Inventory scan to catalog all resources. Delete any
instance outside of the allowed list of regions. This option allows a resource to
be created, which is a breach of the regulation.
c. Create a limited group of users with permission to create new instances.
Create processes to verify compliance. This is not reliable, because the team
might still accidentally or maliciously create resources in the wrong regions.
d. Configure an organizational policy with the allowed regions in an “in:
allowed_values“ list. The organizational policy constraint preempts the
creation of a resource if it is not in the allowed regions.
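A sketch of option d using the resource locations constraint and the predefined EU value group; the organization ID is a placeholder:

    # Restrict resource creation to EU locations across the whole organization
    gcloud resource-manager org-policies allow \
        gcp.resourceLocations in:eu-locations \
        --organization=123456789012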
79. Your application has a module that performs a computationally intensive task. Your
development team wants to optimize this module by experimenting with different
algorithms. You need to help the team test the algorithms in both the staging and
production environments to gather detailed data on the performance of each
algorithm. What should you do?
a. Instrument the code with profiling libraries. Visualize the data in Cloud Profiler
as flame graphs. Flame graphs in Cloud Profiler help you visualize the call stack
within a program. This lets you identify bottlenecks and guides you in optimizing
the code and algorithms.
b. Instrument the code to capture the flow of requests and responses. Review
the request latencies in Cloud Trace. Cloud Trace does not provide sufficient
information on benchmarking code. It is better suited for distributed tracing of
applications with multiple calls between them.
c. Instrument the code with log statements at various parts of the algorithm.
Send log-based notifications to collect data on the algorithm's execution. Log
statements affect the timing of algorithms. Data collected in this way is
inconvenient to visualize and is not detailed enough to provide guidance on
bottlenecks.
d. Instrument the code with log statements at various parts of the algorithm.
Send notifications based on log-metrics when the same control structures are
called multiple times. Log statements affect the timing of algorithms. Data
collected in this way is inconvenient to visualize and is not detailed enough to
provide guidance on bottlenecks.
80. Your code is stored on GitHub and built with Cloud Build, both manually and with
webhooks. Building the code requires accessing confidential information. You want to
follow Google-recommended practices to ensure the security of the confidential
information. What should you do?
a. Trigger the builds from the command line each time, and supply the required
confidential information as parameters. This solution does not work for those
cases when webhooks trigger the build.
b. Create an encrypted file in your GitHub repository to only contain the
confidential information, and reference the encrypted file in your Cloud Build
file. As a rule, any confidential information, secrets, and credentials should not
be stored in a source repository. Others who have access to the GitHub
repository would have access to the credentials, thus breaking the principle of
least privilege.
c. Create an entry in Secret Manager, specify the URI for the confidential secrets
in the Cloud Build file, and reference the secret as parameters in the build
steps. Secret Manager can be configured with Identity and Access Management
(IAM) to follow the principle of least privilege. Cloud Build can reference Secret
Manager and you can include the secrets in the build steps by using parameter
substitution
d. Retain the code in Github. Create a new repository only for confidential
information in Cloud Source Repositories, and reference the new repository in
your Cloud Build file. As a rule, any confidential information, secrets, and
credentials should not be stored in a source repository. Isolating confidential
information in a separate repository is also not acceptable. Others who have
access to the Cloud Source Repositories repository would have access to the
information, thus breaking the principle of least privilege.
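A sketch of option c: store the value once in Secret Manager and let Cloud Build read it at build time (the secret name and value are hypothetical):

    # Create the secret; the Cloud Build service account needs
    # the roles/secretmanager.secretAccessor role on it
    echo -n "s3cr3t-api-key" | gcloud secrets create build-api-key --data-file=-

In cloudbuild.yaml the secret is then referenced in the availableSecrets section and exposed to individual build steps through secretEnv, so the value never lives in the GitHub repository.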
81. Your company uses an application that is deployed on Compute Engine virtual
machines (VMs) to deliver a service to your customer. You have a Service Level
Agreement (SLA) to provide 99% availability per month. Telemetry data over the first
22 days of the month shows that you served 10,000,000 requests with 99.28%
availability. The development team has added new features to the application and
wants to deploy them. You need to decide on a deployment approach while following
Site Reliability Engineering (SRE) practices. What should you do?
a. Deploy the new features right away to Confidential VMs. Confidential VMs do
not guarantee high availability for end-user requests.
b. Defer the deployment of the new features until the next month. The error
budget is very low and you need to balance stability over feature updates.
c. Ask the customer to modify the SLOs and deploy new features right away.
SLOs are an internal measure and do not reflect your service agreement with
the customer.
d. Ask the customer to modify the SLA to 98.5% and deploy new features right
away. Changing SLAs when you have a low remaining error budget is not a
recommended approach for application delivery.
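A rough way to see why the remaining error budget is tight, assuming traffic stays roughly constant over a 30-day month:
Error budget for a 99% SLA: 1% of requests may fail.
Failures so far: (100% - 99.28%) x 10,000,000 = 72,000 failed requests.
Budget corresponding to the first 22 days: 1% x 10,000,000 = 100,000 failed requests.
Roughly 72% of that budget is already spent, which leaves little headroom to absorb a risky deployment before the end of the month.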
82. You support a large service with a well-defined Service Level Objective (SLO). The
development team deploys new releases of the service multiple times a week. If a
major incident causes the service to miss its SLO, you want the development team to
shift its focus from working on features to improving service reliability. What should
you do before a major incident occurs?
a. Develop an appropriate error budget policy in cooperation with all service stakeholders. An error budget policy defines how errors and deviations from the SLO are handled. Having this policy in place and agreed on by all stakeholders before a major incident occurs ensures there is a clear plan for prioritizing service reliability over new feature development when necessary.
b. Negotiate with the product team to always prioritize service reliability over releasing new features. This option may be too rigid; it does not account for changing business needs or customer expectations, and always prioritizing reliability over new features may not be practical or realistic in every situation.
c. Negotiate with the development team to reduce the release frequency to no
more than once a week.
d. Add a plugin to your Jenkins pipeline that prevents new releases whenever
your service is out of SLO.
83. Peter is working at TPT Ltd. He has been asked to deploy a new application both on-
premises at various international locations for internal use and on the Google Cloud
Platform for external use.
Given - In order to optimize resource utilization, all traffic is directed through Google
Cloud load balancers. Requirement - Service Level Indicators (SLIs) need to collect
latency and availability data comprehensively.
Which of the following options should Peter choose to meet the requirement?
a. Gather and process data separately from the on-prem VMs and the GCP VMs
to create separate SLIs which can then be aggregated.
b. Gather the data only from the GCP VMs as they represent your most important
users.
c. Gather the data from the load balancers.
d. Gather and process data together from the on-prem VMs and the GCP VMs to
create a single bucketed SLI.
84. Peter is working at TPT Ltd. His team is responsible for managing an application serving
a global audience. A recent update caused a service downtime. Requirement - Peter
has been designated as the Incident commander. Which of the given options should
be excluded by Peter in the Incident Document according to Google SRE’s best
practices?
a. Command hierarchy.
b. Incident timeline.
c. List of Actions carried out to restore the service.
d. The developers are responsible for the update.
85. Peter is working at TPT Ltd. He has been assigned the task of investigating the
progressive decline in response time for a production application. The application is
deployed on a Managed Instance Group consisting of five instances.
Which of the following measures should Peter undertake to investigate this issue while minimizing unnecessary overhead?
a. Install the monitoring agent and create a dashboard to view the response
time.
b. Install the logging agent and create a logs-based metric in Cloud Logging
c. Install the debugger agent and investigate in the Cloud Debugger console.
d. Instrument your application with the tracing agent and inspect the latency
data in the Cloud Trace console.
86. In order to examine a sample of network traffic within a particular subnet on your
Virtual Private Cloud (VPC), which tool should you employ for analysis?
a. VPC Flow Logs
b. Monitoring Agents
c. Firewall Logs
d. Logging agents
87. During which phase of a service's life cycle is an SRE (Site Reliability Engineering) least
engaged?
a. During the General Availability phase.
b. During the Limited Availability phase.
c. During the Active Development phase.
d. During the Architecture and Development phase.
88. A newly emerged public cloud provider is gaining significant popularity. Your SRE team
is receiving a substantial number of tickets regarding resource consumption quota
increases.
In accordance with Google's SRE best practices, what steps can you take to enhance
this situation?
a. Set aside time to automate the collection, validation and update of the Quotas
b. Create a rotating schedule for the SRE team to handle the requests.
c. Employ more staff to handle the quota requests as needed.
d. Increase the current quota of all users.
89. Peter is working at TPT Ltd., a gaming company. Recently, it was decided to transition all the company's operations to the cloud.
For this, applications will be created and deployed using cloud services like Compute
Engine. Requirement - Peter has been asked to gather all audit logs from the used
services. Which of the following option should Peter choose to meet the requirement?
a. Enable the Access Transparency logs
b. Enable the System Event audit logs.
c. Enable the Data Access audit logs for the services used.
d. Enable the Admin Activity audit logs for the services.
90. Which of the given options does NOT represent a characteristic of white-box
monitoring?
a. Includes metrics exposed by the internals of the system.
b. Used to test externally visible behavior as the user would see it.
c. The focus is on predicting problems.
d. Best for detecting imminent issues.
91. Peter is working at TPT Ltd. He has been asked to assist in the designing of a data
processing pipeline for the company. Given - The pipeline receives data streams from
various devices, processes the data, and ultimately loads it into a final storage for
analysis purposes. Requirement - Determine the essential Service Level Indicators
(SLIs) for the pipeline, ensuring the data in the final storage remains current. Which
SLI should Peter exclude from considerations?
a. Durability
b. Correctness
c. Latency
d. Throughput
92. Which of the following describes the consequences when the availability of an
application is compromised?
a. SLA.
b. SLO.
c. PRR.
d. SLI.
93. You're using Stackdriver (Cloud Operations) to set up some alerts. You want to reuse your existing REST-based notification tools that your ops team has created. You also
need the setup to be as simple as possible to configure and maintain since your
customer does not have programming skills. Which notification option would be the
best option?
a. Create a webhook to get this done
b. Send notifications via SMS and use a custom app to forward them to the REST
API.
c. Send it to an email account that is being polled by a custom process that can
handle the notification.
d. Send notifications via Cloud Pub/Sub and use a custom app to forward them
to the REST API.
94. Google Cloud offers both “ops” and “no-ops” compute services, which differ in their management requirements and customizability. How would
you rank the four compute services on a scale ranging from the fewest management
requirements and lowest customizability to the most management requirements and
highest customizability?
a. Cloud Functions, App Engine, Compute Engine, Kubernetes Engine
b. Cloud Functions, Compute Engine, Kubernetes Engine, App Engine
c. Cloud Functions, Kubernetes Engine, App Engine Compute Engine
d. Cloud Functions, App Engine, Kubernetes Engine, Compute Engine. Compute Engine is the most customizable and requires the most management; between GKE and App Engine, GKE clearly requires more management than App Engine.
95. You are currently looking at your GCP platform with gcloud and would like to list all
the instances in GCP Compute Engine. What command would you use?
a. gcloud grep compute instances
b. gcloud compute list instances
c. gcloud compute instances list
d. gcloud compute instances grep
96. Which of the following are the typical SRE activities according to Google?
a. Software Engineering, Systems Engineering, Toil and Overhead
b. Software Architecture, Systems Engineering, Toil and Overhead
c. Software Engineering, Systems Engineering, Toil and Support
d. Software Engineering, Systems Development, Toil and Overhead
97. You have been advised by your CISO that you will need to maintain a record of all
policy violations and failed deployment attempts around your GKE container
deployments. What service in Google Cloud would be the best solution?
a. Container Analysis
b. Binary Authorization
c. Use a Cloud Marketplace Solution for GKE Logs
d. Use a Cloud Marketplace solution for Splunk
e. Cloud Audit Logs. Cloud Operations (Stackdriver) Logging maintains a record of all policy violations and failed deployment attempts using Cloud Audit Logs.
98. The _______________________________ resource represents the Access Control Lists
(ACLs) for buckets within Google Cloud Storage. ACLs let you specify who has access
to your data and to what extent.
a. SetIAMPolicy
b. DefaultAccessControls
c. TestIAMPermissions
d. BucketAccessControls. Buckets contain objects which can be accessed by their own methods. In addition to the ACL property, buckets contain bucketAccessControls, for use in fine-grained manipulation of an existing bucket's access controls.
99. You are currently deploying an application on a Kubernetes cluster. You're aware that a
Deployment’s rollout is triggered if and only if the Deployment’s pod template is
changed, for example if the labels or container images of the template are updated.
Other updates, such as scaling the Deployment, do not trigger a rollout. What is the
file name that would need to be changed?
a. .template.yaml
b. .spectemplate.yaml
c. .spec.template
d. App.py
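To illustrate the distinction in question 99, assuming a Deployment named my-app with a container named app: changing the Pod template (for example, the container image) triggers a rollout, while scaling does not:

    # Updates .spec.template (container image) -> triggers a new rollout
    kubectl set image deployment/my-app app=gcr.io/my-project/app:v2

    # Changes only .spec.replicas -> no rollout is triggered
    kubectl scale deployment/my-app --replicas=5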
DevOps Professional GCP III
1. Google states that an SRE should spend no more than what percentage of time on operations?
a. 50%. Google states that SREs should not be spending more than 50% of their
time on operations and considers any violation of this rule a sign of system poor
health.
b. 25%
c. Depends on the scenario used
d. There is no official number from Google
e. 75%
2. Which of the following is an API that is used to store trusted metadata about our
software artifacts and is also used during the Binary Authorization process
a. Attestation
b. Container Analysis. Container Analysis is an API that is used to store trusted
metadata about our software artifacts and is used during the Binary
Authorization process.
c. Cloud Audit
d. DLP
3. What is the main issue that SRE really exists to solve?
a. Remove layers of redundancy in software development
b. Reducing latency of applications
c. Increasing performance of applications.
d. Remove silos in the organization and create a hybrid role. The main issue that
SRE exists to solve is that of organisational silos. SRE works to break down the
traditional barriers between Dev and Ops by pulling together their roles into
a new hybrid role.
A Site Reliability Engineer's job is balanced between developing new features on the one hand, and ensuring that production systems run smoothly and reliably on the other.
SREs enable development teams to deploy faster, while using any failures that
occur as pointers towards relentlessly improving the overall health of their
system.
4. You are currently reviewing your project in GCP using gcloud. You would like to
confirm what the DNS related info is for a project. What is the command to do this?
a. gcloud dns project-info show
b. gcloud dns project-info describe. Using gcloud is very important for this exam
around Kubernetes since the gcloud commands are what interact with GCP
resources that create and manage the clusters and then the kubectl, which is
the Kubernetes command line tool is used to run commands against Kubernetes
clusters on GKE.
c. gcloud dns project-info list
d. gcloud dns project-info grep
5. You have downloaded the SDK kit from Google and now would like to manage
containers on GKE with gcloud. What command would be typed to install kubectl in
the CLI?
a. gcloud components kubectl install
b. gcloud components install components kubernetes
c. gcloud components install components kubectl
d. gcloud components install kubectl.
6. A recent software update to your enterprise's e-commerce website that is running on
Google Cloud has caused the website to crash for several hours.
Your CTO decides that all critical changes must now have a back-out/roll-back plan.
The website is deployed on hundreds of virtual machines (VMs), and critical changes
are frequent.
Which two actions should you take to implement the back-out/roll-back plan?
a. Enable object versioning on the website's static data files stored in Google
Cloud Storage
b. Use managed instance groups with the "update-instances" command when
starting a rolling update
c. Create a new instance template with applied fixes and roll out via A/B test
d. Use unmanaged instance groups with the "update-instances" command when
starting a rolling update
7. Which of the following is the “maximum size” of a single cached data value in
MemCache?
a. 1GB
b. 10MB
c. 1MB. The maximum size of a cached data value is 1 MB (10^6 bytes).
d. 10GB
8. You are evaluating new GCP services and would like to use tools to help you evaluate
the costs of using GCP. What are two tools available from GCP to help analyse costs.
a. Cost Optimization Tool
b. TCO Tool
c. Pricing Calculator
d. ROI Calculator
9. You're currently an SRE for My widgets Corp. The development team has asked you to deploy a Java 9 application on GCP App Engine. You realize that you can't use App Engine Standard because Java 8 and 11 are the only Java versions supported at the time of your planning.
What are your options for this scenario?
a. Let the team know that they will deploy this internally since it's not supported on GCP
b. App Engine Flexible will work with custom runtimes on containers
c. App Engine Flexible will deploy VMs only
d. Advise the team that App Engine Standard does not support this runtime environment
10. Which of the following statements are true when discussing toil in SRE?
a. A toil budget is a result of toil hours minus toil costs
b. Reducing toil is one of the most important tasks of an SRE
c. Toil is mundane, repetitive operational work providing no enduring value,
which scales linearly with service growth
d. Reducing toil is not expected as an SRE, that is DevOps that reduces toil
11. What role in an SRE organization manages the immediate, detailed, technical, and
tactical work of the incident response, which is typically the largest aspect of the
response?
a. Incident Commander
b. Operations Lead. The Ops Lead (OL) or tech lead works to respond to the
incident by applying operational tools to mitigate or resolve the incident
c. Communications Lead
d. Project Manager
12. You are currently working on a pipeline that is hosted on GCP and you are getting ready to deploy your working copies with Cloud Build. What would the following command do as part of the process: git checkout -b new-feature
a. Create a production branch and push it to the Kubernetes
b. Create a development branch and push it to the Cloud Build server
c. Create a production branch and push it to the Git server
d. Create a development branch and push it to the Git server
13. You are deploying an application to a Kubernetes cluster that requires a username and
password to connect to another service.
When you deploy the application, you want to ensure that the credentials are used
securely in multiple environments with minimal code changes. What should you do?
a. Use a Kubernetes secret and setup a KSM to handle the secrets
b. Enable the Kubernetes secret API and then setup a KSM to handle the secrets.
c. Store the credentials in a Kubernetes Secret and then allow the application
access via environment variables at run time. This will enable secrets usage
without needing to modify the code per environment, update build pipelines,
or store secrets insecurely.
d. Bundle the secret in a file on your desktop and add as needed
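A sketch of option c with hypothetical credential values; the Secret is created once per environment, and the application reads it through environment variables, so the code stays unchanged:

    # Store the credentials as a Kubernetes Secret in the target cluster/namespace
    kubectl create secret generic db-credentials \
        --from-literal=username=app-user \
        --from-literal=password=s3cr3t

The Deployment then maps the Secret keys to environment variables with env[].valueFrom.secretKeyRef, so each environment can carry its own values without code changes.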
14. The Monitoring agent, ________________, is based on the original collectd system
statistics collection daemon?
a. Stackdriver-statsd
b. Stackdriver-collectd
c. Stackdriver-agent. The Stackdriver Monitoring agent is a collectd-based
daemon that gathers system and application metrics from virtual machine
instances and sends them to Stackdriver Monitoring
d. Stackdriver-agent-collectd
15. By default you can create up to _______ networks per project.
a. 100
b. 50
c. 10
d. 5
16. You have been contacted by the enterprise support team, which has told you there have been reports of significant latency at specific times for an application running on GCP. They would like you to review the issue and provide insight into why the application is slow at specific times.
What Google Cloud service could you use to inspect latency data that has been
collected in near real time?
a. VPC Flow Logs
b. Debug
c. Profiler
d. Trace. Cloud Trace formerly Stackdriver Trace is a distributed tracing system
that collects latency data from your applications and displays it in the Google
Cloud Console. You can track how requests propagate through your application
and receive detailed near real-time performance insights. Cloud Trace
automatically analyses all your application's traces to generate in-depth latency
reports to surface performance degradation, and can capture traces from all
your VMs, containers, or App Engine projects.
17. The __________ Tier delivers traffic over Google’s well-provisioned, low latency,
highly reliable global network.
a. VPN
b. Cloud Interconnect
c. Standard
d. Premium. The Premium Tier delivers traffic over Google’s well-provisioned, low
latency, highly reliable global network.
18. You have just created a cluster called “devops” in GKE and now you need to get
authentication credentials to interact with the cluster. What is the proper CLI syntax
to accomplish this task?
a. kubectl container clusters get-credentials devops
b. gcloud container clusters get-credentials devops
c. gcloud containers cluster get-credentials devops
d. kubectl containers cluster get-credentials devops
19. Who in an SRE organization coordinates efforts of the response team to address an
active incident?
a. Communications Lead
b. Incident Commander. Incident Commander is the person who declares the
incident typically steps into the IC role and directs the high-level state of the
incident.
c. Project Manager
d. Primary Engineer
20. You are currently planning a Kubernetes deployment on premises while also extending Kubernetes to GCP. Your team would like to understand how management and routing could work, as well as how users could expose services in a cluster. What would you specify to deal with these concerns?
a. Edge Proxy. The edge proxy is commonly called an ingress controller because it is commonly configured using ingress resources in Kubernetes; however, the edge proxy can also be configured in other ways.
b. Cloud Endpoints
c. Ingress Controller. Kubernetes ingress is a collection of routing rules that govern
how external users access services running in a Kubernetes cluster
d. Core Proxy
21. When we speak of best practices around IAM, and specifically the “principle of least privilege”, what would be a best practice related to least privilege?
a. Never control who can change policies and group memberships at the
organizational level
b. Always apply the maximum access level required
c. Never control who can change policies and group memberships at the project
level
d. Always apply the minimal access level required
22. What is the flag for estimating costs for bytes read in Bigquery with the bq command?
a. --estimate_reads
b. Must use the pricing calculator.
c. --dry_run. You can perform a dry run (estimate the bytes that would be read) for a query job by using the dry run syntax.
d. --dry_run_read
e. Must contact support for the BQ spreadsheet
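For example, a dry run reports the bytes that would be read without actually running the query; the dataset and table names are hypothetical:

    # Prints the estimated bytes processed instead of executing the query
    bq query --use_legacy_sql=false --dry_run \
        'SELECT name FROM `my-project.my_dataset.my_table` LIMIT 10'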
23. Your organization would like to obtain significant discounts on your VM instance
deployments on Google Cloud. These VM instances only need to be used for a few
hours a month.
What pricing model would you want to consider?
a. Committed Use Instances
b. On Demand Instances
c. Spot Instances
d. Reserved Instances
e. Preemptible Instance. Some of this terminology, such as Spot and Reserved, is AWS terminology. Google's equivalent of “Spot” instances is “Preemptible” instances. A preemptible VM is an instance that you can create and run at a much lower price than normal instances. However, Compute Engine might terminate (preempt) these instances if it requires access to those resources for other tasks.
24. Your customer is currently developing on App Engine with Python. They would like to implement standard images for their VM configurations. What deployment do they need to subscribe to in order to use a standard image rather than their own VMs?
a. App Engine Flexible
b. App Engine Standard. App Engine Standard is all they need, since no specific development language requirement such as Node.js is stated. App Engine Standard runs a sandbox, while App Engine Flexible deploys via containers. The question did not specify a particular version of Python.
c. App Engine Custom
d. Neither deployment support Python
25. What would be the best definition of “StatefulSets” with Google Kubernetes Engine
a. StatefulSets represent a Cluster with unique, persistent identities and stable
hostnames that GKE maintains regardless of where they are scheduled
b. StatefulSets are applications which do not store data or application state to
the cluster or to persistent storage
c. StatefulSets represent a Service with unique, persistent identities and stable
hostnames that GKE maintains regardless of where they are scheduled
d. StatefulSets represent a set of Pods with unique, persistent identities and
stable hostnames that GKE maintains regardless of where they are scheduled.
This is the correct definition of StatefulSets in Google Kubernetes Engine (GKE).
StatefulSets are a Kubernetes resource that is used to manage a set of Pods with
a persistent identity. This means that each Pod in a StatefulSet has a unique
name and can be identified by its ordinal index within the set. Additionally,
StatefulSets guarantee that Pods in the set have stable hostnames, which is
important for applications that require persistent storage or need to
communicate with each other using stable network addresses.
26. You are currently designing a cloud application that your user base will connect to
without a gateway VPN. The company wants to ensure that the application maintains user identity and context to guard access to the applications and VMs. What
would you recommend?
a. Cloud Endpoints
b. Cloud NAT
c. Cloud VPN
d. Identity Aware Proxy (IAP). IAP protects SSH and RDP access to your VMs hosted on GCP. This is an effective whitelisting approach. Your VM instances don't even need public IP addresses.
27. You have created several preemptible Linux virtual machine instances using Google
Compute Engine. You want to properly shut down your application before the virtual
machines are pre-empted.
a. Create a shutdown script and use it as the value for a new metadata entry with
the key shutdown-script and then use the Google Cloud Github for resources
to complete
b. Use the CLI, Console or API to pass the contents directly
c. Create a shutdown script and use it as the value for a new metadata entry with
the key shutdown-script in Deployment Manager
d. Create a shutdown script and use it as the value for a new log point entry with
the key shutdown-script in the Cloud Platform Console when you create the
new virtual machine instance
e. Create a shutdown script and use it as the value for a new metadata entry with
the key shutdown-script in the Cloud Platform Console when you create the
new virtual machine instance.
28. You're currently ready to deploy some Cloud Deployment Manager templates, and you will need to ensure that specific (“explicit”) requirements exist before the templates deploy. What would be the option you would add to your templates or configuration
files?
a. variables
b. dependsOn. You can specify these dependencies using the dependsOn option
in your configuration files or templates. When you add the dependsOn option
for a resource, Deployment Manager creates or updates the dependencies
before creating or updating the resource.
c. deployON
d. properties
29. Which command will configure Cloud Build to store the image in Container Registry as
part of build flow?
a. Push command
b. Docker insert command
c. Docker push command. docker push command will push an image or a
repository to a registry such as Container Registry. Specify the hostname which
specifies location where you will store the image.
To specify use these prefixes (multi-region)
gcr.io hosts images in data centers in the United States, but the location may
change in the future
us.gcr.io hosts image in data centers in the United States, in a separate storage
bucket from images hosted by gcr.io
eu.gcr.io hosts the images in the European Union
asia.gcr.io hosts images in data centers in Asia
The Docker credential helper is the simplest way to configure Docker to
authenticate directly with Container Registry. You then use the docker
command to tag, push, and pull images. Alternatively, you can use the client
libraries to manage container images, or you can interact directly with the
Docker API.
d. Pull command
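A sketch of the push flow with a hypothetical project and image name:

    # One-time setup: let Docker authenticate to Container Registry
    gcloud auth configure-docker

    # Tag the local image with the registry hostname and project, then push it
    docker tag my-app gcr.io/my-project/my-app:v1
    docker push gcr.io/my-project/my-app:v1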
30. Your company currently uses a third-party monitoring solution for your enterprise
apps. You are using Kubernetes Engine for your container deployments and would like
to enable this internal monitoring app for Kubernetes clusters. What would be the
best approach?
a. Deploy the monitoring extension for Stackdriver Trace
b. Deploy the monitoring pod as a cluster
c. Deploy a solution from the Cloud Marketplace
d. Deploy the monitoring pod as a DaemonSet. Many monitoring solutions use the Kubernetes DaemonSet structure to deploy an agent on every cluster node. Note that each tool has its own software for cluster monitoring. Heapster is another option that could also be used; it is a bridge between a cluster and a storage backend designed to collect the cluster metrics. Stackdriver is native to Google Cloud and is therefore the approach recommended by Google Cloud.
31. You have created a new image of an application without the signature part and you tried to deploy it. Instead of deploying, you received the error “Denied by Attestor”. What could be the problem to resolve?
a. Contact Support since its clearly a Google Issue
b. App was deployed via a Marketplace solution that expired and needs to be
renewed
c. Extract the signature of PGP with Putty
d. Enable Cloud Build with proper permissions in IAM
e. Create an attestation and submit it to Binary Authorization. Binary Authorization is a deploy-time security service provided by Google that ensures that only trusted containers are deployed in our GKE cluster. It uses a policy-driven model that allows us to configure security policies. Behind the scenes, this service talks to the Container Analysis service. An attestation is a statement from the Attestor that an image is ready to be deployed. This attestation needs to be submitted properly or the error will occur. A note is also needed, which is a piece of metadata in Container Analysis storage that is associated with an Attestor. There is a setup process required in the project that hosts the cluster: enable the required APIs, create a Kubernetes cluster that has Binary Authorization enabled, set up a Note, generate the PGP keys, and create an Attestor.
32. You have just started your cluster and deployed your pods. You now need to view all
the running pods. What is the proper CLI syntax to accomplish this task?
a. kubectl list pods
b. kubectl get pods
c. gcloud list pods
d. gcloud get pods
33. Where does Container Analysis store the resulting metadata and make it available for consumption through an API?
a. Cloud Storage
b. Cloud Source Repositories
c. GKE persistent storage
d. Container Registry Container Analysis is an API that is used to store trusted
metadata about our software artefacts and is used during the Binary
Authorization process. However, the scanning service performs vulnerability
scans on images in Container Registry, then stores the resulting metadata and
makes it available for consumption through an API.
34. Container Analysis performs vulnerability scans on images in Container Registry and
monitors the vulnerability information to keep it up to date. What are the two main
tasks that Container Analysis performs?
a. Continuous Logging
b. Vulnerability Reporting
c. Incremental Scanning. Incremental scanning: Container Analysis scans new images when they're uploaded to Container Registry.
d. Continuous Analysis. Container Analysis continuously monitors the metadata
of scanned images in Container Registry for new vulnerabilities.
35. Cloud Trace can collect latency data from which of the following services?
a. App Engine
b. All of the Above
c. Applications
d. GKE
e. Load Balancers
36. __________________ is a unified programming model and also a managed service for
developing and executing a wide range of data processing patterns including ETL,
batch computation, and continuous computation. What is the service?
a. Cloud DataProc
b. Cloud Datalab
c. Cloud Spanner
d. Cloud Dataflow. Cloud Dataflow is a unified programming model and a
managed service for developing and executing a wide range of data processing
patterns including ETL, batch computation, and continuous computation. The
challenge with a lot of the GCP services is that they sound the same or have
the same prefix which can be confusing.
37. Your company is getting ready to deploy a CI pipeline on GCP. You need to confirm
that you have the proper syntax for creating a Kubernetes namespace called
“production” that will logically isolate the deployment.
What is the Kubernetes command to do this?
a. kubectl create name production
b. kubectl create ns production
c. kubectl ns create production
d. kubectl namespace create production
38. What does Cloud Logging in Google Cloud include as part of the service?
a. Storage for logs
b. API for programmatic access
c. User Interface (Logs Viewer)
d. Kubernetes Logging extensions
e. Analytics Tools
39. The first step in Cloud Deployment manager is to create what____________?
a. Resources
b. Pipeline
c. Configuration. The first step in creating your deployment is to create a
configuration. A configuration is a list of resources, and their respective
properties, that will be part of your deployment.
d. Template
40. You're currently considering moving your CI pipeline from on premises to Google Cloud Platform. You would like to have code maintained in a private Git
repository which is hosted on the Google Cloud Platform. What service would you
choose?
a. Cloud Source Repositories
b. Kubernetes Engine
c. Cloud Run
d. Cloud Build
e. Container Registry
41. You would like to add a strict deploy-time policy enforcement to your Kubernetes
Engine cluster. What would be your best option?
a. IAM Policies
b. Security Scanner
c. Cloud Armor
d. Binary Authorization. Binary Authorization is a deploy-time security control
that ensures only trusted container images are deployed on Google Kubernetes
Engine (GKE).
Using Binary Authorization, you can require images to be signed by trusted
authorities during the development process and then enforce signature
validation when deploying
42. You would like to deploy a LAMP stack for your development team. The only issue is
you’re not sure how to configure this LAMP stack. You would like to use a solution
that has ready-made templates to deploy. What GCP service could you use?
a. Cloud Deployment Manager
b. Cloud Endure
c. Cloud Marketplace. Google Cloud Marketplace formerly Cloud Launcher offers
ready-to-go development stacks, solutions, and services to accelerate
development, so you spend less time installing and more time developing.
d. Cloud Build
43. What would be the reason to implement Cloud Run APIs?
a. Deploy and manage user provided virtual machine images that scale
automatically based on HTTP traffic
b. Deploy and manage user provided container images that scale automatically
based on HTTP traffic. Cloud Run is a newer managed service whose main purpose is to deploy and manage user-provided container images that scale automatically based on HTTP traffic.
c. Deploy and manage application provided container images that scale
automatically based on HTTP traffic
d. Deploy and manage user provided container images that scale automatically
based on TCP/UDP traffic
44. Which of the following is not possible using primitive roles in GCP?
a. Allows a user access to view all datasets in a project, but not run queries on
them. Primitive roles can be used to give owner, editor, or viewer access to a
user or group, but they can't be used to separate data access permissions from
job-running permissions.
b. Allows a user access to view all datasets in a project only
c. Allows Development owner access and Production editor access for all
datasets in a project
d. None of the Above
45. You would like to create a new repository in Cloud Source Repositories with gcloud.
What would be the command to create a repo called “devops”
a. gcloud create source repos "devops"
b. gcloud source repo create devops
c. gcloud create source repos devops
d. gcloud source repos create devops
46. You need to create many projects for many different teams. You want to use a Cloud
Deployment Manager (DM) deployment to create those projects in a folder called
devops1.
What should you do?
a. Create a project called devops1 and enable appropriate APIs. Grant the project
owner role to the service account Use command “gcloud deployment-
manager deployment create -project devops1
b. Create a project called devops1 and enable appropriate APIs. Grant the project creator role to the service account. Use command “gcloud deployment-manager …”. The best option is to grant the project creator role (never owner) to a service account. The command syntax is correct.
c. Create a project called devops1 and enable appropriate APIs. Grant the
organization role to the service account Use command “gcloud deployment-
manager deployments create new -project devops1
d. This cannot be done. Use Terraform since it supports teams better.
47. App Engine has several solid use cases for the enterprise. What are three use cases for which App Engine would be a good candidate for a customer?
a. Testing Applications on a hosted platform
b. Scaling Applications on a hosted platform
c. Deploying your SaaS application on a hosted platform
d. Develop Applications on a hosted platform
e. Deploying your VDI application on a hosted platform
48. What is the default retention period for Admin Activity Logs?
a. 500 days
b. 30 days
c. 31 days
d. 400 days
49. The company that has hired you to design a cloud application for their business is now
requiring you to adhere to the following requirements. They want to utilize as many
GCP data focused services as possible
1. Enterprise Data Warehouse (EDW) with SQL
2. Fast response times for OLAP workloads up to petabyte-scale,
3. Supports Big Data services and BI Tools.
4. Fully managed service
What service would you recommend the customer consider based on the limited
information?
a. BigQuery. BigQuery is an Enterprise Data Warehouse (EDW) with SQL and fast response times for OLAP workloads up to petabyte-scale, Big Data exploration and processing, and reporting via Business Intelligence (BI) tools.
b. Cloud Dataflow
c. BigTable
d. Cloud Datastore
e. Cloud SQL
50. What GCP service is a lightweight, event-based, asynchronous compute solution that
allows you to create small, single-purpose functions that respond to cloud events
without the need to manage a server or a runtime environment.
a. Cloud Run
b. Cloud PubSub
c. Cloud Functions
d. Cloud Datastore
51. You are currently running your containers on Google Kubernetes Engine. You have decided to also monitor the nodes that GKE has deployed your containers on. You have set up your application to send logging information to stdout, and your app starts as a system service on your GKE nodes. Without changing the app, how do you get logs sent to Stackdriver (Cloud Operations)?
a. Install Stackdriver Logging Agent. Review the application logs from the
Compute Engine VM Instance system event logs in Stackdriver
b. Delete the application logs from the Compute Engine VM Instance activity logs
in Stackdriver
c. Install Stackdriver Logging Agent. Review the application logs from the
Compute Engine VM Instance syslog logs in Stackdriver. The question does not directly state whether the Stackdriver agent was installed; assume it was not, because with the agent installed the logs would flow. You need to first install the Stackdriver agent and then view the logs. Note that when deploying compute services, you have the option to install the agents at the time of creation, which would be the simplest way.
d. Review the application logs from the Compute Engine VM Instance activity
logs in Stackdriver
52. Your Site Reliability Engineering team does toil work to archive unused data in tables
within your application’s relational database.
This toil is required to ensure that your application has a low Latency Service Level
Indicator (SLI) to meet your Service Level Objective (SLO). Toil is preventing your team
from focusing on a high-priority engineering project that will improve the Availability
SLI of your application.
You want to reduce repetitive tasks to avoid burnout, improve organizational
efficiency, and follow the Site Reliability Engineering recommended practices. What
should you do?
a. Change the availability SLI to meet the SLA
b. Contact your staffing company for help
c. Identify repetitive tasks that create toil and automate as much as possible.
d. Change the SLO to meet the SLA
53. Which of the following statements are true about simplicity in software systems?
a. Constantly strive to eliminate complexity in systems they onboard and for
which they assume operational responsibility
b. Reliability is the main prerequisite over simplicity
c. Expect and accept complexity will be introduced into the systems for which
they are responsible
d. Simple releases are generally better than complicated releases
54. What is the maximum size of a log entry with Cloud Operations Logging?
a. 512 KB
b. 256 KB
c. 128 KB
d. 127 KB
55. Your application runs in Google Kubernetes Engine (GKE).
a. Use a Kubernetes Replica Set and then use Spinnaker to create a new service
for each new version deployed
b. Use a Kubernetes Replica Set and then use Spinnaker to update the replica set
for each new version deployed
c. Use a Kubernetes deployment and use Spinnaker for each new version of the
application
d. Use a Kubernetes deployment object and use Spinnaker for each new version
of the application
56. You're considering placing your Infrastructure as Code processes on Cloud
Deployment Manager. What would be a risk of doing this?
a. Cloud Deployment Manager can be used to permanently delete cloud
resources.
b. Cloud Deployment Manager requires a Google APIs service account to run.
c. Cloud Deployment Manager takes some training to use
d. Cloud Deployment Manager APIs could be deprecated in the future.
57. Which of the following is a GCP resource/service used for infrastructure automation,
in which you can also specify repeatable deployment processes?
a. Cloud Build
b. Cloud Deployment Manager. Google Cloud Deployment Manager allows you to
specify all the resources needed for your application in a declarative format
using YAML. You can also use Python or Jinja2 templates to parameterize the
configuration and allow reuse of common deployment paradigms such as a load
balanced, auto-scaled instance group. Treat your configuration as code and
perform repeatable deployments; a minimal configuration sketch follows this
question.
c. Puppet
d. Container Registry
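As a point of reference, the sketch below shows roughly what a Deployment Manager configuration looks like; the deployment name, VM name, zone, and machine type are placeholder values, not part of the question.
    # config.yaml - a minimal Deployment Manager configuration (hypothetical names/values)
    resources:
    - name: devops-vm
      type: compute.v1.instance
      properties:
        zone: us-central1-a
        machineType: zones/us-central1-a/machineTypes/e2-small
        disks:
        - deviceName: boot
          type: PERSISTENT
          boot: true
          autoDelete: true
          initializeParams:
            sourceImage: projects/debian-cloud/global/images/family/debian-11
        networkInterfaces:
        - network: global/networks/default

    # Deploy (and later delete) the configuration from the command line
    gcloud deployment-manager deployments create devops-demo --config config.yaml
    gcloud deployment-manager deployments delete devops-demo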
58. The HTTPS load balancer can leverage which of the following types of GCP resources?
a. Two or more Instance Groups
b. Global IP Address (ephemeral only)
c. One or more Instance Groups
d. Global IP Address (ephemeral or static)
59. You would like to deploy a new cluster on GCP with gcloud. The cluster you need is
going to be named devops1. You have already set your profile and authenticated. What
is the syntax to deploy the cluster?
a. gcloud container cluster create devops1
b. gcloud containers clusters create devops
c. gcloud container clusters deploy devops1
d. gcloud container clusters create devops1
60. Cloud Endpoints can be implemented in which languages?
a. Node.JS
b. Ruby
c. Java
d. Python
e. Go
61. You’re getting ready to deploy a CI pipeline on GCP. You need to confirm that you have
the proper syntax for creating a Kubernetes namespace called “production” that will
logically isolate the deployment. What is the Kubernetes command to do this?
a. kubectl create namespace production
b. kubectl create ns production
c. kubectl ns create production
d. kubectl namespace create production
62. Your company manufactures devices with sensors and has the need to stream huge
amounts of data from these devices to a storage option in the cloud. Which Google
Cloud Platform storage option is the best choice for your application?
a. BigQuery
b. Cloud Storage
c. Cloud Run
d. Cloud SQL
e. BigTable. Bigtable is ideal for storing very large amounts of data in a key-value
store and supports high read and write throughput at low latency for fast access
to large amounts of data.
63. Google Cloud Platform has several unique and innovative benefits when it comes to
billing and resource control. What are these benefits?
a. Sub Hour Billing
b. Spot Instances
c. Hourly billing (Billed for 1 hour and thereafter every minute on VMs)
d. Sustained Use Discounts
e. Compute Engine Custom Machines
64. Which of the following would be a common Service Level Indicator (SLI) for a big data
system?
a. Availability
b. Throughput
c. Latency
d. End to End Latency
65. Which log type provides you with logs of actions taken by Google Support staff when
accessing your Google Cloud resources?
a. Admin Activity
b. Data Access
c. Access Transparency. Access Transparency provides near real-time logs when
Google Cloud administrators access your content. Cloud Audit Logs already
provide visibility into the actions of your own administrators.
d. System Events
66. Which of the following two statements are true about Cloud Operations (Stackdriver)
Logging with Kubernetes Engine?
a. Stackdriver Logging is deployed to a new cluster by default and you cannot opt
out
b. Stackdriver is the default logging solution for clusters deployed on Google
Kubernetes Engine.
c. To ingest logs, you must deploy the Stackdriver Logging agent to each node in
your cluster. The agent is a configured fluentd instance, where the
configuration is stored in a ConfigMap and the instances are managed using a
Kubernetes DaemonSet. The actual deployment of the ConfigMap and
DaemonSet for your cluster depends on your individual cluster setup.
d. The actual deployment of the ConfigMap and DaemonSet for your cluster
depends on your node licenses
67. You would like to create a new Compute Engine instance called gcelab2 in the zone
us-central1-c. What is the proper command?
a. gcloud compute instances create gcelab2 --zone us-central1-c
b. gcloud compute instances init gcelab2 --zone us-central1-c
c. gcloud compute instances make gcelab2 --zone us-central1-c
d. gcloud compute instances init gcelab2 --region us-central1-c
68. Which role does your service account for GKE need to be granted to access Cloud
Storage and perform a “storage.buckets.update” ?
a. Project Viewer
b. Storage Admin
c. Storage Object Owner
d. Project Owner
69. Which of the following would be a common Service Level Indicator (SLI) for a data
storage system?
a. Throughput
b. Availability
c. TPS
d. Latency
70. Which of the following methods will not cause a shutdown script to be executed?
a. An instance reset thru an API Call. Create and run shutdown scripts that
execute commands right before an instance is terminated or restarted, on a
best-effort basis. This is useful if you rely on automated scripts to start up and
shut down instances, allowing instances time to clean up or perform tasks, such
as exporting logs, or syncing with other systems.
b. Preemptible instance being shutdown
c. When a user initiates a shutdown though a request to the guest operating
system
d. Shut down via the Cloud Console
71. Which of the following resources are global resources in GCP?
a. Zones
b. Snapshots
c. Disks
d. Images
72. You’re currently being summoned to the CIO office and he would like to have a copy
of the billing reports from Google Cloud Platform. What answer has the correct
formats you can export billing info to?
a. JSON or .Doc
b. CSV or JSON
c. JSON or XML
d. CSV or XML
73. You are developing an application that will need to meet strict GDPR requirements
around several facets of the regulations. You have been asked to store your
enterprise's data on GCP as efficiently as possible. This data will need to be archived
for at least 5 years. What would be the best option?
a. BigTable
b. Cloud Storage
c. BigQuery
d. Cloud DataStore
74. You have been contacted by your CIO to improve your application availability. You
have decided to use instance groups by spreading your instances across three zones.
What type of instance group do you select?
a. Regional managed groups
b. Multi-Regional managed groups
c. Zonal managed groups
d. Multi-Zonal managed groups
75. How do you isolate VM systems within one project to guarantee that they can‘t
communicate over the internal IP address?
a. Place them in a separate project
b. Place them in different zones
c. Place them in different networks
d. Place them in separate organizations
76. Which of the following two statements are true about choices around Cloud
Deployment Manager templates?
a. Python is a simpler but less powerful templating language that uses the same
syntax as YAML
b. Jinja2 is a simpler but less powerful templating language that uses the same
syntax as YAML
c. Python templates are more powerful and give you the option to
programmatically create or manage your templates
d. Jinja2 templates are more powerful and give you the option to
programmatically create or manage your templates

Answer: Jinja2 is a simpler but less powerful templating language that uses the same
syntax as YAML, and Python templates are more powerful and give you the option to
programmatically create or manage your templates.
Explanation: You can write templates in your choice of Python 2.7 or Jinja2. Python
templates are more powerful and give you the option to programmatically create or
manage your templates. If you are familiar with Python, use Python for your templates.
Jinja2 is a simpler but less powerful templating language that uses the same syntax as
YAML. If you aren't familiar with Python or just want to write simple templates without
messing with Python, use Jinja2. A small Jinja2 example follows.
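To make the difference concrete, here is a sketch of a Jinja2 template and the configuration that imports it; the template file name, bucket count, and location are hypothetical and only illustrate parameterization and looping.
    # bucket_template.jinja (hypothetical) - Jinja2 lets you parameterize and loop over resources
    resources:
    {% for i in range(properties["count"]) %}
    - name: {{ env["deployment"] }}-bucket-{{ i }}
      type: storage.v1.bucket
      properties:
        location: {{ properties["location"] }}
    {% endfor %}

    # config.yaml - imports the template and passes properties to it
    imports:
    - path: bucket_template.jinja
    resources:
    - name: example-buckets
      type: bucket_template.jinja
      properties:
        count: 2
        location: US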

77. Your team is designing a new User-facing application to serve requests. Which of the
following is Google‘s SRE suggested best practice for selecting SLIs?
a. Choose as many SLIs as possible to cover all aspects of the system
b. Choose very few SLIs as possible
c. Discover what users expect from the system and choose SLIs to measure it.
d. Allow users to feedback on issues before choosing SLIs.
78. Your company has tasked you with setting up a Continuous Integration pipeline. When
code is committed to the source repository, the pipeline will build docker containers
to be pushed to Container Registry and non-container artifacts to be pushed to Cloud
Storage. How would you accomplish this?
a. Add an images field, that specifies the docker images to be pushed to
container registry, to the Cloud Build config file.
b. Add an artifacts field, that specifies the docker images to be pushed to
container registry, to the source repository config file
c. Add an artifacts field, that specifies the non-container artifacts to be stored in
Cloud Storage, to the Cloud Build config file
d. Add an images field, that specifies the non-container artifacts to be stored in
Cloud Storage, to the source repository config file.
e. Add an options field, that specifies the non-container artifacts to be stored in
Cloud Storage, to the source repository config file.
79. Your team is running a production apache application on Google Compute Engine. You
currently monitor the default metrics such as CPU utilization. You have a new
requirement to monitor metrics from the Apache application in the Google Cloud
console. What should you do?
a. Install the fluentd agent on the Compute Engine instance. Fluentd is used for
logging and you have to install the monitoring (collectd) agent in order to
monitor custom metrics.
b. Install the collectd agent on the Compute Engine instance.
c. Download the apache.conf, place it in the directory
/etc/stackdriver/collectd.d/ and restart the monitoring agent.
d. Download the apache.conf, place it in the directory
/etc/stackdriver/fluentd.d/ and restart the monitoring agent.
e. Do nothing.
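Putting the correct steps together (install the collectd-based monitoring agent, then place apache.conf in /etc/stackdriver/collectd.d/ and restart the agent), a rough sketch on a Debian/Ubuntu instance is shown below; the download URLs reflect the legacy Stackdriver agent documentation and may have moved, so verify them before use.
    # Install the legacy Cloud Monitoring (collectd) agent
    curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
    sudo bash add-monitoring-agent-repo.sh --also-install
    # Enable the Apache plugin by placing apache.conf in the collectd config directory
    cd /etc/stackdriver/collectd.d/
    sudo curl -O https://raw.githubusercontent.com/Stackdriver/stackdriver-agent-service-configs/master/etc/collectd.d/apache.conf
    # Restart the agent so the Apache metrics start flowing to Cloud Monitoring
    sudo service stackdriver-agent restart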
80. Your team has been working on building a web application that will have a local
audience. The plan is to deploy to Kubernetes as soon as your deployments are
reviewed and approved.
You currently have a Docker file that works locally but needs to be deployed to the
cloud. How can you get the application deployed to Kubernetes?
a. Use docker to create a container image, push it to the Google Container
Registry, deploy the uploaded image to Kubernetes with kubectl.
b. Use kubectl convert -f FILENAME to push the converted Docker file into a
deployment
c. Use kubectl apply to push the Docker file to Kubernetes
d. Use docker to create a container image, save the image to Cloud Storage,
deploy the uploaded image to Kubernetes with kubectl
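A rough end-to-end sketch of option (a) follows; the project, image, and cluster names are placeholders.
    # Build the image locally from the existing Dockerfile
    docker build -t gcr.io/my-project/web-app:v1 .
    # Allow docker to push to Container Registry, then push the image
    gcloud auth configure-docker
    docker push gcr.io/my-project/web-app:v1
    # Point kubectl at the target cluster and deploy the image
    gcloud container clusters get-credentials my-cluster --zone us-central1-c
    kubectl create deployment web-app --image=gcr.io/my-project/web-app:v1
    kubectl expose deployment web-app --type=LoadBalancer --port=80 --target-port=8080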
81. According to Google Cloud Platform design principles stateless servers are easier to
work with than stateful servers? (True or False) ?
a. True
Stateless servers are easier to work with than stateful servers according to Google Cloud
Platform design principles. This is because stateless servers:

• Are simpler to manage: They don’t require complex mechanisms to manage state,
such as session replication or sticky sessions.
• Are more scalable: They can be easily scaled horizontally by adding more instances,
as they don’t rely on any specific server for maintaining state.
• Are more resilient: They are less prone to failures, as the loss of a single instance
doesn’t affect the overall application state.
• Are easier to test and debug: They can be tested and debugged more easily, as there
is no need to worry about state consistency or race conditions.

b. False
82. Each Cloud Platform project has three unique identifiers. Which one is NOT a correct
identifier?
a. Project Name
b. Project ID
c. Project Number
d. Project Scope
83. You’re currently a developer at XYZ Corporation and you have over 60 projects
deployed on GCP. You would like to integrate SSO into your GCP and additional IT
services. What are two features of SSO with GCP?
a. Use your own authentication mechanism and manage your own credentials
b. Federate your AD Trees to Google Cloud Platform
c. Federate your AWS accounts with Google Cloud Platform
d. Google Apps Directory Sync integrates with LDAP
84. Google Cloud Deployment Manager allows you to specify all the resources needed for
your application in a declarative format using which format?
a. PHP
b. Python
c. YAML
d. JSON
85. Which of the following two details about managing secrets would not be considered
a best practice?
a. Use a separate solution or platform. This helps to isolate secrets from other
sensitive data and reduce the risk of exposure.
b. Cache Secrets locally many times a day. This is not recommended because
caching secrets locally increases the risk of exposure if the local machine is
compromised. Secrets should be retrieved from a secure secret management
solution as needed, rather than being cached locally.
c. Rotating Secrets is a must. Regularly rotating secrets is essential to prevent
unauthorized access.
d. Cache Secrets locally once a year. While caching secrets locally for a short
period of time might be acceptable in certain scenarios, caching them for an
entire year is not a good practice. Secrets should be rotated regularly to
minimize the risk of exposure.
86. Logs are associated primarily with GCP ____________, although _______________ can
also have logs.
a. Projects and Organizations
b. Projects and Regions
c. Project and Regions
d. Zones and Regions
87. The edge proxy in a Kubernetes ingress controller could be configured in several ways.
Which are two ways we could configure the edge proxy?
a. Annotations
b. Tags
c. Egress resources
d. Custom resource definitions
88. You would like to create a new repository in Cloud Source Repositories with gcloud.
What would be the command to create a repo called “developer”?
a. gcloud create source repos developer
b. gcloud source repos create developer
c. gcloud create source repos "developer
d. gcloud source repo create developer
89. Which incident severity classification, is characterized by a user-visible outage with no
lasting damage to your services or customers. There is a possible noticeable revenue
loss that could be incurred by the company.
a. Detrimental
b. Minor
c. Negligible
d. Major
90. The Versioning Configuration feature in gsutil enables you to configure a Google Cloud
Storage bucket to keep old versions of objects. The gsutil versioning command has
two sub-commands. What are the two subcommands?
a. Set
b. Get
c. Show
d. List
e. Put
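For reference, the two sub-commands in use look like the sketch below; the bucket name is a placeholder.
    # Enable object versioning on a bucket, then confirm its state
    gsutil versioning set on gs://my-example-bucket
    gsutil versioning get gs://my-example-bucket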
91. Which of the following is a feature of using a VPC In Google Cloud?
a. Global Resource. A single Google Cloud VPC can span multiple regions without
communicating across the public Internet. For on-premises scenarios, you can
share a connection between VPC and on-premises resources with all regions in
a single VPC. You don‘t need a connection in every region.
b. Regional Resource
c. AWS Compatible Resource
d. Multi-Regional Resource
92. In GCP there are two types of managed instance groups. What are they?
a. Regional
b. Zonal
c. Global
d. Multi-Regional
e. GDPR
93. What is a direct measurement of a service’s behavior such as latency?
a. SLA
b. SLI
c. None of the Above
d. SLO
94. What does Intent-based Capacity Planning involves?
a. Encoding the dependencies of a service
b. All of the above. Intent-based Capacity Planning is Google‘s approach to declare
reliability intent for a service and then solve for the most efficient resource
allocation plan dynamically.
c. Auto-generation of allocation plans
d. Encoding programmatically
95. You need to follow a “best practice” for dealing with processes that do not shut down
correctly on your VMs. What do you configure in the autoscaling options that will
reduce risk by running shutdown scripts to redirect incoming traffic at the load
balancer and flush the cache prior to exit?
a. Soft Exit
b. Not supported in GCP
c. Graceful Exit. Graceful exit with autoscaling event by running shutdown script
to redirect incoming traffic at load balancer, and flush cache prior to exit.
d. Hard Exit
96. Your customer requires that metrics from all applications be retained for 5 years for
future analysis in possible legal proceedings. Which approach should you use?
a. Configure Operations Monitoring for all Projects, and export to BigQuery
b. Configure Operations Monitoring for all Projects, and export to Datastore
c. Configure Operations Monitoring for all Projects with the default retention
policies
d. Configure Operation Monitoring for all Projects, and export to Cloud Storage
97. You would like to understand operations of deploying Compute Engine resources and
its operations. What log would you want to view?
a. Access Transparency
b. Data Access
c. Admin Activity. Admin Activity audit logs contain log entries for API calls or
other administrative actions that modify the configuration or metadata of
resources.
d. System Events
98. What type of account would you use in your code when you want to have services
interact with other services.
a. User Account
b. Service Account
c. GitHub Account
d. GitLab Account
99. Your customer requires that metrics from all applications be retained for 5 years for
future analysis in possible legal proceedings. Which approach should you use?
a. Configure Stackdriver Monitoring for all Projects, and export to Cloud
Datastore
b. Configure Stackdriver Monitoring for all Projects with the default retention
policies
c. Configure Stackdriver Monitoring for all Projects, and export to Cloud Storage
d. Configure Stackdriver Monitoring for all Projects, and export to BigQuery
DevOps Professional GCP IV
1. You are being requested to migrate VMS from your onsite datacenter to GCP Compute
Engine. What is the gcloud command to import images and create a bootable image?
a. gcloud compute import "images"
b. gcloud compute images import
c. gcloud compute import images
d. gcloud compute images "import"
2. You are currently building an SRE organization and you would like to follow what
Google does to build culture. Which of the following two ways could you introduce
this culture?
a. Contact HR for more people asap
b. Launch and Iterate
c. Enable daily culture meetings
d. Create and communicate a clear message
e. Lower your standards initially
3. You’re currently learning that Google Cloud has two specific platforms that provide
an implementation for message passing and asynchronous integration of your
message services. They do have similarities, but differences as well. From the
following statements, select the two correct statements about Cloud Tasks.
a. Cloud Tasks is aimed at implicit invocation where the publisher retains full
control of execution
b. Cloud Tasks are appropriate for use cases where a task producer needs to
defer or control the execution timing of a specific webhook or remote
procedure call
c. Cloud Tasks are optimal for more general event data ingestion and
distribution patterns where some degree of control over execution can be
sacrificed
d. Cloud Tasks is aimed at explicit invocation where the publisher retains full
control of execution
4. You have an application that accepts inputs from users. The application needs to kick
off different background tasks based on these inputs. You want to allow for
automated asynchronous execution of these tasks as soon as input is submitted by
the user. Which product should you use?
a. Cloud SDK
b. Cloud Pub/Sub
c. Cloud Tasks. Cloud Task Queues Push or Pull. The core difference between
Pub/Sub and Cloud Tasks is the notion of implicit vs explicit invocation.
d. Cloud Crons
5. You are helping with the design of an e-commerce application. The web application
receives web requests and stores sales transactions in a database.
You need to identify minimal Service Level Indicators (SLIs) for the application to
ensure that forecasted numbers are based on the latest sales numbers.
Which SLIs should you set for the application?
a. Database - Freshness
b. Web Application - Availability
c. Web Application - Durability
d. Database – Availability
6. What GCP service component can you select to create, manage, and upgrade
Kubernetes clusters in your on-premises environment while using your Kubernetes
Engine services?
a. Cloud Anthos
b. Third party services
c. GKE On Prem
d. GKE has this native capability
7. Google recommends using the “___________” technique, which is an iterative
interrogation technique to help identify the root cause of a problem and to get past
the apparent surface cause. What is the technique named?
a. Contact Support
b. 6 Why‘s
c. 5 Why‘s. The “Five Why‘s” is an iterative interrogation technique to help identify
the root cause of a problem and to get past the apparent surface cause.
d. GCP Resolution Workflow
8. In an SRE based organization there are several roles with distinct responsibilities.
What role would execute the technical response around an incident?
a. Operations Lead
b. Incident Commander
c. Communications Lead
d. Primary Responder. The primary responder (first responder) is the SME and the
one who repairs or resolves the outage.
9. You are the DevOps Engineer in a Finance company. You manage the Cloud Landscape.
The company has several applications on GKE clusters, and the clusters write logs to
Cloud Logging. There is a legal requirement to store logs for 7 years. What is the most
cost-effective place to store the logs?
a. Route the logs to a multi-region Cloud Storage bucket with standard storage
class.
b. Route the logs to a single region Cloud Storage bucket with the archive storage
class
c. Update the retention policy on the _Default log bucket to 7 years .
d. Route the logs from Cloud Logging to BigQuery. BigQuery is mostly suited
towards analytics not long-term storage.
10. You are tasked with designing an automated CI pipeline for building and pushing
images to Container Registry when there is a commit with a particular tag. In the
current system, developers issue build commands after code is pushed to the test
branch in the source repository. What steps can you take to automate the build
described above with the least amount of management overhead?
a. Add a cloud build config file when code is pushed to the branch. Create a
trigger in Cloud Source Repository and select the event “Push to a branch”
b. Add a cloud build config file when code is pushed to the branch. Create a
trigger in Cloud Build and select the event “Push new tag”
c. Add a cloud build config file when code is pushed to the branch. Create a
trigger in Cloud Build and select the event “Pull request”
d. Create a cloud function that is triggered when code is committed to the cloud
source repository.
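Assuming the repository lives in Cloud Source Repositories, the correct option (b) could be created from the command line roughly as follows; the repo name, tag pattern, and config path are placeholders, and flag names can vary slightly between gcloud releases.
    # Create a Cloud Build trigger that fires when a tag matching v* is pushed
    gcloud builds triggers create cloud-source-repositories \
      --repo=my-repo \
      --tag-pattern="^v.*$" \
      --build-config=cloudbuild.yaml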
11. You are tasked with investigating the gradual degradation of a production
application’s response time. The application is deployed to a Managed Instance Group
of five instances. What steps can you take to investigate this issue with the least
amount of overhead?
a. Install the logging agent and create a logs-based metric in Cloud Logging
b. Instrument your application with the tracing agent and inspect the latency
data in the Cloud Trace console.
c. Install the monitoring agent and create a dashboard to view the response
time.
d. Install the debugger agent and investigate in the Cloud Debugger console. This
is used to investigate the state of your applications in real time and does not
contain the latency data needed.
12. A gaming company has decided to move its operations to the Cloud. Applications will
be developed and deployed using cloud services such as Compute Engine. You have
been tasked to capture all audit logs from the services used.
How can you achieve this?
a. Enable the Access Transparency logs
b. Enable the Admin Activity audit logs for the services.
c. Enable the Data Access audit logs for the services used.
d. Enable the System Event audit logs.
13. Your organization has recently decided to build and deploy the new version of its
applications in the Cloud. The application is deployed to Compute Engine. During
testing users complain of slow response from the application. What steps can you take
to understand why the application’s response time is high?
a. Install the Profiler package on the VMs and analyse the data in Cloud Profiler
to identify what aspect of the application is causing the slow response.
b. Instrument your application with Google Client libraries and view the traces
in Cloud Trace to identify what aspect of the application is causing the slow
response.
c. Install the logging agent on the VMs and analyse the logs in Cloud Logging to
identify what aspect of the application is causing the slow response.
d. Install the Debugger agent on the VMs and analyse the data in Cloud Profiler
to identify what aspect of the application is causing the slow response.
14. A customer deployed an application on the Compute Engine. The instance uses the
default service account, and the application writes to the logs of the instance. You
have been asked to investigate why no logs are appearing in Cloud Logging. Which of
the following is most likely the problem?
a. There is no logging agent installed on the instance.
b. VM machine logs have not been turned on.
c. The service account does not have the Logging Viewer role. The default service
account has the editor role attached. It has sufficient permissions to write to
Cloud Logging.
d. The scopes of the service account are not sufficient to write to Cloud Logging.
15. Your team has deployed a new version of a service and suddenly more instances are
being created in your Kubernetes cluster. Your service scales when average CPU
utilization is greater than 80%. What tool would help you investigate the problem?
a. Cloud Profiler. Cloud Profiler provides insight into how CPU and memory is
consumed by applications. This will help with understanding why your cluster is
scaling out.
b. Cloud Trace.
c. Cloud Logging.
d. Cloud Monitoring. Cloud Monitoring provides a centralised view of metrics that
can be monitored for GCP services. It does not show the root cause of the scaling
action in your cluster.
16. A betting organization analyses the bets placed on its website at night. The analysis
takes about 5 hours and must be run between midnight and 6am. The bets are
analysed using standard Compute Engine instances and cannot handle interruptions.
You have been tasked with optimising the cost of the analysis which is to run for
another twelve months. Which of the following is the best option?
a. Move to committed-use instances
b. Move to preemptible instances
c. Use the current set of instances because they are the cheapest.
d. Switch from the standard compute instances to Cloud Functions.
17. Your DevOps team is responsible for implementing the aggregated logs collection for
all the projects in the Google organization of your company. Your team needs to
reduce the quantity of logs collected to save costs. Which of the following log types
CANNOT be disabled?
a. Firewall logs
b. VPC Flow logs
c. Policy Denied logs.
d. Data Access logs.
18. Your team is planning on the structure of the Cloud Monitoring workspace that will
monitor multiple projects. You need to grant permissions to the service account of
Compute Engine instances to send metric data to Cloud Monitoring. Following the
principle of least privilege, which of the following roles should be assigned?
a. Monitoring Admin.
b. Monitoring Metric Writer. Monitoring Metric Writer provides enough
permissions for users or service accounts to write metrics to Cloud Monitoring.
c. Logging Admin.
d. Logs Configuration Writer.
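Granting the correct role (b) to the instances' service account might look like the sketch below; the project ID and service account address are placeholders.
    # Allow the Compute Engine service account to write metric data to Cloud Monitoring
    gcloud projects add-iam-policy-binding my-monitored-project \
      --member="serviceAccount:123456789012-compute@developer.gserviceaccount.com" \
      --role="roles/monitoring.metricWriter"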
19. Your team has deployed a Java application to a Managed Instance Group. The
compute engine instances have the logging agent installed and application logs are
sent to Cloud Logging. You have been tasked with creating an alerting policy if the
errors in the application logs exceed a threshold. What of the following is NOT a valid
notification channel for your alert policy?
a. Slack.
b. Webhooks.
c. PagerDuty Services.
d. Twitter.
20. Your team has deployed a python application to a Managed Instance Group. The
Managed Instance Group is placed behind a load balancer. You have been tasked with
ensuring the load balancer only sends requests to instances that are working. What of
the following helps you achieve this?
a. Setup the Readiness Probe to continuously check if the instance is available.
b. Setup the Liveness Probe to continuously check if the instance is available.
Liveness Probe is used by Kubernetes to check if a pod is in the running state, if
it is not it restarts the pod.
c. Setup Health Checks to continuously check if the instance is available. Health
Checks is used by the load balancer to determine if the backend is reachable
(responds to traffic).
d. Setup Uptime Checks probe to continuously check if the instance is available.
21. You are a DevOps engineer for a social media company. You are on the monitoring
team for their flagship web application that is growing rapidly. The application is
deployed on Managed Instance Groups behind a HTTP(S) load balancer. The number
of logs created by the application is causing the Project to exceed the logging API
quota. You have created exclusion filters in Cloud Logging. You notice the issue
persists. What could be the problem?
a. The exclusion filter was not configured properly.
b. More log types need to be added to the exclusion filters
c. Logs are excluded after they are received by the Logging API. Therefore,
excluding logs does not reduce the number of entries.write API calls. The
problem is the number of entries.write API calls which pushes logs to Cloud
Logging before exclusion filters can be applied. The solution will be to reduce
the logs collected.
d. The service account of the Managed Instance Group needs additional
permissions to use the exclusion filters. There is no need for extra permissions.
The Managed Instance Group can already access Cloud Logging.
22. You are a DevOps engineer for a tech company. You are responsible for the production
Project. At the end of the month, you are informed by finance that charges from logs
stored are very high. You have been asked to investigate and reduce the number of
logs generated in the project. What of the following is unlikely to be generating a lot
of logs?
a. Load Balancer. The load balancer logging is enabled by default and generates a
lot of logs. This can be turned off or excluded
b. VPC Flow logs
c. Data Access logs. Data access logs are turned off by default because they can
generate a lot of logs.
d. IAM. IAM logs fall under audit logs which cannot be turned off and are very little
compared to data access logs.
23. A customer has multiple projects in Google Cloud. The projects represent the different
environments. You have been tasked with sending certain logs from all projects to
Splunk. There is a requirement to send any data access logs to Splunk. What of the
following DOES NOT help you meet this requirement?
a. Set up log sink destination Pub/Sub and subscription.
b. Create the logs sink to route data access logs.
c. Set IAM policy permissions for log sink destination Pub/Sub topic.
d. Set up the log sink destination Cloud Storage bucket.
24. You work as a DevOps Engineer for an energy client. The client runs their applications
on Google Kubernetes Engine and logs are sent to Cloud Logging. They would like to
use the logs generated to monitor the application usage in real time. What is the best
destination for the export sink?
a. Pub/Sub
b. Cloud Storage. Logs routed to Cloud Storage are done in hourly batches.
c. BigQuery. It takes minutes for logs route to BigQuery to appear in the table.
d. Spanner
25. A large professional services client uses Google Cloud for some of its workload. Your
DevOps team is now required to route all logs that show actions taken by Google staff
in its account to a separate logging bucket. Which of the following helps you achieve
this?
a. Create a log sink and route all Access Transparency logs to the specified
logging bucket
b. Create a log sink and route all Admin Activity logs to the specified logging
bucket. Admin Activity logs show actions that modify the config or metadata of
resources.
c. Create a log sink and route all Data Access logs to the specified logging bucket.
Data Access logs shows actions that read the config or metadata of resources as
well as user API calls that perform CRUD operations.
d. Create a log sink and route all System Event logs to the specified logging
bucket. System Event logs are generated by Google systems for Google Cloud
actions that modify the config of resources.
26. Your customer is a financial organization, and you are responsible for setting up an
automated CICD pipeline to deploy applications to GKE clusters in production. You
need to restrict the kinds and origin of images that can be used to deploy containers
into clusters. How can you achieve this?
a. Apply firewall rules to the VPCs.
b. Create custom routes to control traffic to the clusters.
c. Enable binary authorization on the clusters and apply a policy to govern the
allowed images
d. Apply IAM permissions to restrict the container images that can be deployed
on clusters
27. Your Site Reliability (SRE) team members are managing the CICD of your organization.
Applications are deployed to Compute Engine instances. There is a requirement to
send the logs of the instances in the Development Projects to a user-created bucket.
Which step can you take to achieve this?
a. Create a log sink in Cloud Logging with the destination as Cloud storage
bucket.
b. Create a log sink in Cloud Logging with the destination as Cloud Pub/Sub.
c. Create a log sink in Cloud Monitoring with the destination as Cloud Logging
bucket
d. Create a log sink in Cloud Monitoring with the destination as Cloud storage
bucket
28. A customer has opted to use an external source code management such as GitLab. The
customer wants to use Cloud Build for its Continuous Integration and Deployment to
Cloud Run. They would like to automatically trigger a build in Cloud Build when code
is pushed to GitLab. How can this be done?
a. Create a trigger in Cloud Build with the Event Manual invocation.
b. Create a trigger in GitLab with the Destination as Cloud Build.
c. Create a trigger in Cloud Build with the Webhook event. Cloud Build can be
triggered by external systems only via webhooks.
d. Create a trigger in Cloud Build with the Event Pub/Sub message
29. You are designing the CICD pipeline for a customer. The pipeline will be used by
developers to push changes to production. The customer strategy dictates the use of
cloud native tools in the pipeline. Cloud source repositories and Cloud Build have been
chosen. The customer has requested that automated builds in the pipeline are
approved by a senior engineer. How can this be done?
a. In the Cloudbuild.yaml file, specify approval as a requirement.
b. Create a trigger in Cloud Source repository and enable require approval before
build executes.
c. Create a trigger in Cloud Build with the Webhook event.
d. Create a trigger in Cloud Build and enable require approval before build
executes.
30. You work as a DevOps Engineer for a client. The company uses cloud native tools for
its CICD pipeline. Automated build is done using Cloud Build when code is pushed to
repositories in Cloud Source Repositories. Which of the following CANNOT be used as
a trigger with Cloud Source Repositories?
a. Push to a branch
b. Pull Request. This option cannot be used to trigger builds in Cloud Build if Cloud
Source Repositories is used.
c. Push new tag
d. Manual invocation
31. You work as a DevOps Engineer for a client. Developers make changes and push code
to branches in a repository. Each branch is merged into a staging branch daily. The
client wants to trigger a build of the staging branch every night. How can you achieve
this?
a. Create a trigger in Cloud Build with a webhook event
b. Create a trigger in Cloud Build with “manual invocation” as the event
c. Create a trigger in Cloud Build with “Push to a branch” as the event
d. In Cloud Scheduler, create a trigger for Cloud Build.
e. Enable Cloud Scheduler API and create a Cloud Scheduler job.
32. An organization is planning to use an automated CI/CD pipeline to deploy applications
to Compute Engine. The organization would like to use a combination of cloud native
and open-source tools for the pipeline. Which of the following helps you achieve this?
a. Cloud Build as the continuous integration tool and Spinnaker as the
continuous delivery tool
b. Cloud Build as the continuous integration tool and continuous delivery tool
c. Jenkins as the continuous integration tool and Spinnaker as the continuous
delivery tool
d. Jenkins as the continuous integration tool and continuous delivery tool.
33. You are the DevOps Engineer in a healthcare start-up firm. The company has a new
application it is testing. Before the application is promoted to production for live
traffic, you have been tasked with creating an incident response strategy. Which of
the following are incident response team roles that should be delegated?
a. Lead Architect, DevOps and Testing.
b. Operations Lead, Communications Lead and Incident Commander.
c. Helpdesk, Engineering and Executive.
d. Developers, Helpdesk and Operations.
34. A gaming company recently launched a new version of its popular game. The traffic to
the company’s site has increased by over 70%. Users are now complaining of timed
out requests when they attempt to launch the game. Your team declares an incident.
What action is the most important?
a. Dig deeper and Try to find out why the user requests are timing out.
b. Document the incident, all contributing root causes and effective preventive
actions that can be taken.
c. Work to resolve service as soon as possible.
d. Determine who is responsible for not planning the capacity properly.
35. Your client just recovered from a major outage that disrupted application service for
almost an hour. Your DevOps team has been tasked with creating a document that
summarizes the events that took place during the incident. Which of the following
documents will you create?
a. Alerts
b. Support Tickets
c. Postmortems
d. Press Release.
36. You are planning on deploying Nginx using Kubernetes Engine. You need to track the
number of requests Nginx has serviced. Which of the following can help you achieve
this?
a. Enable Cloud Operations for GKE in the Cluster Features.
b. On the Cluster, install the Cloud Monitoring Agent and
c. Choose the System and workload logging and monitoring option in the Cluster
Features.
d. Install the necessary configuration files for Nginx.
e. Install the catch-all configuration to automatically capture the Nginx metrics.

37. You are the on-call SRE for a betting company. You are managing an application
deployed on App Engine flexible environment within a custom VPC. The application
accepts user traffic from anywhere using HTTPS. You have been tasked with logging
all successful incoming SSH traffic to the GCE instances from the company network.
How will you achieve this?
a. Create a firewall rule that denies ingress traffic on Port 22 from the company
network to the VPC network and turn on Logs .
b. Create a firewall rule that allows ingress traffic on Port 22 from the company
network to the VPC network and turn on Logs.
c. Create a firewall rule that denies egress traffic on Port 22 from the company
network to the VPC network and turn on Logs.
d. Create a firewall rule that allows egress traffic on Port 22 from the company
network to the VPC network and turn on Logs.
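A sketch of the correct rule (b), created with gcloud, is shown below; the VPC name and company CIDR range are placeholders.
    # Allow SSH from the corporate range into the custom VPC and log every matched connection
    gcloud compute firewall-rules create allow-ssh-from-corp \
      --network=my-custom-vpc \
      --direction=INGRESS \
      --action=ALLOW \
      --rules=tcp:22 \
      --source-ranges=203.0.113.0/24 \
      --enable-logging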
38. Your company has several Google projects in its organization. As part of the
monitoring strategy, the projects will be added to specified workspaces. Your team
has been assigned the task of creating the workspaces. Following the principle of least
privilege, what IAM role would your team need to create workspaces?
a. Assign the Project Editor role to the team in the Project where the monitoring
workspace will be created.
b. Assign the Monitoring Editor role to the team in the Project where the
monitoring workspace will be created.
c. Assign the Monitoring Admin role to the team in the Project where the
monitoring workspace will be created.
d. Assign the Project Owner role to the team in the Project where the monitoring
workspace will be created
39. Your company has multiple projects in Google Cloud. The projects represent the
available environments such as development, test, pre-production and production. A
centralized logging system needs to be implemented where all the environments send
their logs to a security project. There is a requirement to not send any logs generated
by an apache application to the security project. What steps can you take to achieve
this?
a. Create a Logs bucket in the security project.
b. Create a Logs bucket in each project.
c. In the Logs router of each project, create a sink with the logs bucket in the
security project as destination and specify the apache logs to exclude with a
filter rate of 0.
d. In the Logs router of each project, create a sink with the logs bucket in the
security project as destination and specify the apache logs to exclude with a
filter rate of 100.
e. In the Logs router of each project, create a sink with the logs bucket in each
project as destination and specify the apache logs to exclude with a filter rate
of 100.
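A rough gcloud version of the intended setup (a log bucket in the security project plus a sink in each source project that excludes the apache logs at 100%) is sketched below; the sink, bucket, and filter values are placeholders, and the --exclusion flag syntax should be checked against your gcloud version.
    # In each source project: route logs to the central bucket, excluding all apache logs
    gcloud logging sinks create central-security-sink \
      logging.googleapis.com/projects/security-project/locations/global/buckets/central-logs \
      --exclusion=name=drop-apache,filter='logName:"apache"'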
40. You are part of the Site Reliability Engineering Team at your company. Your team
manages all the updates to production, and review of application performance in
production. Recently there was an incident in production that affected a whole region
of users. A meeting has been called to review the incident. Following Google’s best
practice, which of the following should not be discussed?
a. The root cause of the incident.
b. The action items in the post-mortem and their status.
c. Team members responsible for the incident and training needed.
d. Any scheduled deployments to production.
41. You are the on-call SRE for a growing media company. You are managing an
application deployed on Compute Engine within a custom VPC. The application
accepts user traffic from anywhere using HTTPS. You have been tasked with logging
all failed incoming SSH traffic to the GCE instances. How will you achieve this?
a. Create a firewall rule that denies ingress traffic on Port 22 from anywhere to
the VPC network and turn on Logs.
b. Create a firewall rule that allows ingress traffic on Port 22 from anywhere to
the VPC network and turn on Logs.
c. Create a firewall rule that denies egress traffic on Port 22 from anywhere to
the VPC network and turn on Logs.
d. Create a firewall rule that allows egress traffic on Port 22 from anywhere to
the VPC network and turn on Logs.
42. You are part of an on-call Site Reliability Engineering team managing a web application
in the production. The application serves user requests from several regions. A new
update was deployed over the weekend to introduce new features into the
application. Users are reporting errors and failed processed requests from the
application. Your team declares an incident, accesses the impact and discovers the
issue is affecting users in one region. Which of the following is the recommended
action?
a. Take steps to mitigate the impact.
b. Carry out a root-cause analysis of the incident, to prevent it from happening
again.
c. Fix what caused the incident and write a post-mortem.
d. Roll back the previous version of the application.
43. You are part of an on-call SRE team managing an apache web service application in
production. The application is deployed to Google Compute Engine. The FluentD agent
is installed on the GCE instance. You have been tasked with reviewing the apache logs
from the application. Which of the following queries helps you do this?
a. resource.type="gae_app" AND (logName:"/apache-access" OR
logName:"/apache-error").
b. resource.type="gce_instance" AND (logName:"/apache-access" OR
logName:"/apache-error").
c. resource.type="gce_instance" AND
log_id("cloudaudit.googleapis.com/activity").
d. resource.type="gce_instance" AND
log_id("cloudaudit.googleapis.com/data_access").
44. Your organization has several applications running on Compute Engine. The instances
generate logs and metrics which are been monitored on dashboards. There is a new
requirement to capture operating system (OS) level logs for security reasons. How can
you achieve this?
a. Install the google-collectd-catch-all package on the Instances
b. Create log-based metrics from the logs in Cloud Logging
c. Install the google-fluentd-catch-all package on the Instances. The FluentD
agent is needed for OS level logs.
d. Create a sink to route all OS level logs to a specified Logs bucket
45. Your company has three environments called production, staging and development.
A GCP Project has been set up for each environment, there is also a monitoring project
with two workspaces, one for production while the other is for development and
staging. A GKE Cluster has been set up in both staging and development for testing an
application to be deployed to production. Both clusters have a service called app-
serve and an alerting policy was created to monitor the service in the workspace.
When there is an incident on the service, the GKE monitoring dashboard can’t
associate the incident uniquely with the development service or the staging service.
How can you resolve this with little operational overhead?
a. Create separate workspace for the development workspace
b. Change the service name in one of the clusters
c. Update the Group By field in the alerting policy by adding namespace, cluster
and location
d. Change the name of one of the clusters
46. Your SRE team is responsible for monitoring and logging of the applications in
Production Projects. The applications are deployed on different resources like
Compute Engine and GKE. Your team has created a centralised monitoring dashboard
in the monitoring Project for the metrics from all the production Projects. An uptime
check was created for the applications. You have been tasked with setting up the
Notification channels for one of the applications to send the notification to a public
endpoint. Which of these helps you meet the requirement?
a. In Monitoring, select Alerting > Edit Notification Channels > configure
Webhooks.
b. In Monitoring, select Dashboards > Edit Notification Channels > configure
Webhooks.
c. In Monitoring, select Settings > Edit Notification Channels > configure
Webhooks.
d. In Monitoring, select Notification Channels > configure Webhooks.
47. A company wants to use GCP for their development and deployment of applications.
They have set up an organization, folders and projects. They want to set up multiple
Cloud Source Repositories (CSR) in one Project. Different teams have different access
requirements to the CSRs in the Project. Which of the following is the best way of
managing access to the CSR for the different teams?
a. Assign the permissions for the different teams at the folder level
b. Assign the permissions for the different teams at the Project level
c. Assign the permissions for the different teams at the Organization level
d. Assign the permissions for the different teams at the repository level
48. You are on the SRE team that monitor production-grade applications. One of your
team members notices that one application's performance has degraded, and
customers are noticing. As this incident begins to unfold, what is Google’s
recommended first action for managing incidents?
a. Assign the role of an operational lead
b. Assign the role of an Incident Commander
c. Assign the role of a communications lead
d. Assign the role of a Planning lead
49. You are one of the on-call engineers managing an application running in production.
A recent update has caused the application’s response time to increase drastically. An
incident has been declared and all the roles except the Planning Lead has been
assigned. Following Google’s SRE practice, who is to assume this role and its
responsibilities?
a. Assign the role to the operational lead
b. Assign the role to the Incident Commander
c. Assign the role to the communications lead
d. Leave the role unassigned
50. You are on a cross-functional team of SREs and product developers managing an
application that needs to be deployed to production. Metrics for measuring reliability
and performance of the application have been agreed on. There is a need to decide
the frequency of releasing new changes. Following Google’s SRE practice, what
measure should be used to control this?
a. The Service Level Indicators (SLI) measurements can be used to control the
frequency of new releases to production
b. The Service Level Agreements (SLA) should be used to control the frequency
of new releases to production
c. The Service Level Objectives (SLO) measurements should be used to control
the frequency of new releases to production
d. The Error Budget should be used to control the frequency of new releases to
production
51. You are on the SRE team of your company. The client has decided to also keep the logs
that record Compute Engine operations that read user-provided data in the Production
project for two years, in order to fulfil a new compliance requirement. Which of the
following can help you achieve this?
a. Update the retention period of the _Required Logs bucket from 30 days to
730 days.
b. Create a Logs Bucket with a retention period of 730 days. Create a new Logs
Sink with the new Logs Bucket as destination and the inclusion filter of the
_Required Sink.
c. Create a Logs Bucket with a retention period of 730 days. Create a Logs Sink
with the new Logs Bucket as destination and the inclusion filter of the
_Default Sink.
d. Enable the Data Read audit logs for Compute Engine API
e. Enable the Data Write audit logs for Compute Engine API
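Assuming the intended combination is to enable the Data Read audit logs and keep them in a user-defined log bucket with a 730-day retention, the gcloud side of that might look like the sketch below; the bucket, sink, and project names are placeholders.
    # Create a log bucket that retains entries for two years
    gcloud logging buckets create compliance-logs \
      --location=global \
      --retention-days=730
    # Route Data Access audit logs into that bucket
    gcloud logging sinks create compliance-sink \
      logging.googleapis.com/projects/my-project/locations/global/buckets/compliance-logs \
      --log-filter='log_id("cloudaudit.googleapis.com/data_access")'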
52. You are planning on deploying JVM using Compute Engine. You need to track the peak
number of live threads in the instance. Which of the following can help you achieve
this?
a. Enable Cloud Operations for Compute Engine in the Features. (This describes
GKE, where the agent is pre-installed and you only enable it and choose the
logging and monitoring type; it does not apply to a standalone Compute Engine
instance.)
b. On the GCE instance, install the Cloud Monitoring Agent
c. Choose the System and workload logging and monitoring option in the Cluster
Features. (Again, this is a GKE cluster option, not a Compute Engine one.)
d. Install the necessary configuration files for JVM.
e. Install the catch-all configuration to automatically capture the JVM metrics.
53. You work as a DevOps Engineer for a start-up company. The company’s strategy is to
use an automated CI/CD pipeline to deliver software faster. You have been tasked
with choosing the tools for the pipeline. A key requirement is selecting a repository
that can trigger builds in Cloud Build. Which of the following repositories does not
meet the requirements?
a. AWS CodeCommit
b. Bitbucket
c. GitLab
d. Cloud Source Repositories
54. Your SRE team is responsible for monitoring and logging of the applications in
Production Projects. The applications are deployed on different resources like
Compute Engine and GKE. Your team has created a centralised monitoring dashboard
in the monitoring Project for the metrics from all the production Projects. An uptime
check was created for the applications. You have been tasked with setting up the
Notification channels for one of the applications to send notifications to the team.
Which of these helps you meet the requirement?
a. On the Monitoring page, select Alerting > Edit Notification Channels >
configure Slack to send notifications to the team’s Slack channel.
b. On the Monitoring page, select Dashboards > Edit Notification Channels >
configure Email to send notifications to the team’s group email.
c. On the Monitoring page, select Settings > Edit Notification Channels >
configure Email to send notifications to the team’s group email.
d. On the Monitoring page, select Notification Channels > configure Pub/Sub to
notify the team’s topic.
55. You are on the SRE team of your company. There is a new government regulation to
keep the logs of all API calls made in the Production project for three years. Which of
the following can help you achieve this?
a. Update the retention period of the _Default Logs bucket from 30 days to
1095 days.
b. Update the retention period of the _Required Logs bucket from 400 days to
1095 days.
c. Create a Logs Bucket with a retention period of 1095 days. Create a new Logs
Sink with the new Logs Bucket as destination and the inclusion filter of the
_Required Sink.
d. Create a Logs Bucket with a retention period of 1095 days. Create a new Logs
Sink with the new Logs Bucket as destination and the inclusion filter of the
_Default Sink.
56. You manage a Java application running on Kubernetes Engine in Production. The
organization has decided there is a need to understand and benchmark the
performance of the application such as CPU time and Heap. The continuous measuring
process should not affect the performance of the application. Which of the following
can help you achieve this?
a. Install and start the Profiler agent to continuously gather the performance
data and send it to Profiler.
b. Install the Logging agent and send logs to Cloud Logging, create log-based
metrics.
c. Install the monitoring agent and send metrics to Cloud Monitoring.
d. Install the Tracing agent and send latency metrics to Cloud Trace.
57. You manage an application running on App Engine Standard in a production project.
The application serves customers worldwide and downtime needs to be kept to a
minimum. There is a need to troubleshoot the application behaviour by injecting
logging without stopping it. Which of the following can help you achieve this?
a. Enable the Debugger agent and insert Logpoints. Cloud Debugger agent is
needed to use Logpoints. Logpoints allow you to inject logging into running
services without restarting or interfering with the normal function of the
service.
b. Install the Logging agent and send logs to Cloud Logging.
c. Add the monitoring agent and send metrics to Cloud Monitoring.
d. No setup is needed.
58. You are part of the SRE team in your organisation. After a recent incident in production
and the follow-up post-mortem, your team has been invited to a production meeting.
Following Google SRE’s best practice, which of the following should not be discussed
at the meeting?
a. Upcoming production changes that have been planned.
b. Priority action items to be assigned to people and tracked.
c. Team members responsible for the last outage.
d. An issue that is not pageable and does not require attention so it can be
removed from alerting.
59. Your team has been tasked with deploying a python application to Cloud Run. The
developer team needs a way to inspect the state of a python application in real time,
without stopping or slowing it down. You are responsible for implementing the
requirement. Which of the following is needed?
a. Ensure the Compute Engine instance has the access scope option “Allow full
access to all Cloud APIs”.
b. Make sure your app.yaml contains the following line “runtime: python37”.
c. Make sure the compute Engine has Python 3 installed.
d. Add the following line “google-python-cloud-debugger” to the
requirement.txt
60. A financial organization analyses, at night, the transactions carried out throughout the
day. The analysis takes about three hours and must be run between midnight and
5am. The analysis is currently run on standard Compute Engine instances, with several
OS level guardrails to satisfy government regulations, and can handle interruptions.
You have been tasked with optimising the cost of the analysis which is to run for
another six months. Which of the following will optimise the cost?
a. Switch from the standard compute instances to preemptible instances
b. Switch from the standard compute instances to committed-use instances
c. Use the current set of instances because they are the cheapest.
d. Switch from the standard compute instances to Cloud Functions.
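A minimal sketch of option a (the instance name, zone, and machine type are illustrative):

# Preemptible VMs are heavily discounted but can be reclaimed at any time,
# which is acceptable here because the analysis can handle interruptions.
gcloud compute instances create nightly-analysis-1 \
    --zone=us-central1-a \
    --machine-type=n2-standard-8 \
    --preemptible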
61. You are part of the DevOps team that manages applications running in the production
project of your company. After a recent security incident, there was a new
requirement to capture network traffic going to and from the Compute instances in the
VPCs in the production project. VPC Flow Logs was enabled on the production VPC, but
no vpc_flows logs are present in Cloud Logging. Which of the following could be the
reason?
a. Logging Inclusion filters could be blocking the vpc_flows log
b. VPC Flow logs was configured incorrectly.
c. The service account for the instances has insufficient permissions.
d. Logging Exclusion filters could be blocking the vpc_flows log
62. Your company has deployed compute resources in VPCs. There are three VPCs in the
Development Project and applications are deployed to GCE Instances in the VPCs.
There is a new security requirement to collect sample network flows sent to and
received by the VM instances. Which of the following can help you achieve this?
a. Deploy the FluentD agent to instances so application logs can be sent to Cloud
Logging. The FluentD agent is useful for application and OS specific logs not the
network traffic in the VPC.
b. Create a Logs Sink with an inclusion filter to sample traffic in the VPCs. This
will only work after the VPC Flow logs has been enabled
c. Enable Firewall logs on the firewall rules affecting instances deployed to the
VPCs.
d. Enable VPC Flow Logs in the subnets where there are instances. VPC Flow Logs
record a sample of network flows sent from and received by VM instances,
including instances used as GKE nodes. These logs can be used for network
monitoring, forensics, real-time security analysis, and expense optimization.
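A minimal sketch of enabling VPC Flow Logs on one subnet (the subnet name, region, sampling rate, and aggregation interval are illustrative):

# Flow Logs are configured per subnet; sampling keeps the log volume manageable.
gcloud compute networks subnets update prod-subnet \
    --region=us-central1 \
    --enable-flow-logs \
    --logging-flow-sampling=0.5 \
    --logging-aggregation-interval=interval-5-sec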
63. Your company has decided to use GCP services to automate its Continuous
Integration and Deployment process. Cloud Build will be used to build images and
other artifacts. A new developer has been tasked with creating the build config files.
The Cloud Build process is failing. Which of the following could be the reason?
a. The cloudbuild.yaml was not placed in the same folder as the application
source code
b. The cloudbuild.xml was not placed in the same folder as the application source
code
c. The cloudbuild.py was not placed in the same folder as the application source
code
d. The cloudbuild.json was not placed in the same folder as the application
source code
e. The cloudbuild.sh was not placed in the same folder as the application source
code
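If the config file is kept elsewhere or named differently, the build can still be started by pointing at it explicitly; a hedged example (the path is illustrative):

# Submit the build, passing the config file location and the build context (".").
gcloud builds submit --config=ci/cloudbuild.yaml .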
64. The Company has a GCP organization that has applications running in GCP projects.
The applications push logs into Cloud Logging. The company wants to analyse the logs
using third-party software such as Elasticsearch. You have set up the Logs sink to
route logs to a Pub/Sub topic, but no logs are appearing in Elasticsearch. Which of the
following could be the reason?
a. The logging filter was not configured correctly
b. The service account of the Logging sink has not been granted access to the
Pub/Sub topic
c. The Pub/Sub topic is in a different region from the Logging Sink configured.
d. There is a firewall rule denying egress traffic to the third-party software
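A frequent cause is answer b: the sink's writer identity was never granted the Pub/Sub Publisher role on the topic. A hedged sketch of checking and fixing this (the sink and topic names are illustrative):

# Find the service account the sink writes as; the output already includes the
# "serviceAccount:" prefix.
gcloud logging sinks describe elastic-sink --format='value(writerIdentity)'

# Grant that identity permission to publish to the destination topic.
gcloud pubsub topics add-iam-policy-binding elastic-topic \
    --member='WRITER_IDENTITY_FROM_ABOVE' \
    --role='roles/pubsub.publisher'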
65. You are managing an application that generates many logs in the staging Project. The
Company has an organization in GCP, two folders and four projects. The folders are
dev and prod, while the projects are dev, test, staging and production. The dev and
test projects are in the dev folder and the staging and production projects are in the
prod folder. The company wants to generate metrics from the logs for alerting
purposes for that application alone. What IAM solution will help achieve the
requirement following the principle of least privilege?
a. Assign the Logs Admin to developers in the prod Folder
b. Assign the Logs Configuration Writer to developers in the prod Folder
c. Assign the Logs Admin to developers in the staging Project
d. Assign the Logs Configuration Writer to developers in the staging Project
66. You are developing a completely serverless application. The application is going to be
built using Cloud Build. There is a requirement to store all non-container artifacts in
Cloud Storage. How will you meet this requirement?
a. Add an Artifacts field in your build config file with the location of the bucket
to store the artifact and the path to one or more artifacts
b. Add an Options field in your build config file with the location of the bucket to
store the artifact and the path to one or more artifacts
c. Add an Images field in your build config file with the location of the bucket to
store the artifact and the path to one or more artifacts
d. Add a substitutions field in your build config file with the location of the
bucket to store the artifact and the path to one or more artifacts
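A hedged sketch of a build config that uses the artifacts field (the build step, bucket name, and artifact path are illustrative), written here as a shell heredoc so it can be pasted into a terminal:

# Write a minimal cloudbuild.yaml that declares a non-container artifact.
cat > cloudbuild.yaml <<'EOF'
steps:
- name: 'gcr.io/cloud-builders/go'
  args: ['build', '-o', 'bin/app', './...']
artifacts:
  objects:
    location: 'gs://my-artifact-bucket/'
    paths: ['bin/app']
EOF

After the build succeeds, Cloud Build uploads the listed paths to the bucket and records the upload in a manifest file.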
67. You have been tasked with building an automated build for the deployment of
applications to serverless infrastructure in Google Cloud Platform. Which of the
following can help you complete the task with little overhead?
a. Cloud Source Repository for source code management, Cloud Build as a CI tool,
and Cloud Build as a CD tool for deploying to Compute Engine.
b. GitHub for source code management, Cloud Build as a CI tool, and Cloud Build as a
CD tool for deploying to Cloud Functions. Cloud Functions and Cloud Run are
serverless products. The use of fully managed Cloud Build for CICD reduces the
overhead. Also, builds can be triggered in Cloud Build from code pushed to
GitHub.
c. Cloud Source Repository for source code management, Cloud Build as a CI tool,
and Cloud Build as a CD tool for deploying to Cloud Run. Cloud Functions and Cloud
Run are serverless products. The use of fully managed Cloud Build for CICD
reduces the overhead. Also, builds can be triggered in Cloud Build from code
pushed to GitHub.
d. GitHub for source code management, Cloud Build as a CI tool, and Cloud Source
Repository as a CD tool for deploying to Cloud Functions.
e. GitHub for source code management, Cloud Build as a CI tool, and Jenkins as a CD
tool for deploying to Cloud Run.
68. You are one of the on-call engineers in a global team managing an application running
in production. A recent update has caused the application’s response time to increase
drastically. An incident has been declared and actions to mitigate the issue have not
yet been deployed. Your team is coming to the end of your workday. Following
Google’s SRE practice, what should be done?
a. Hand over command to developers in another location
b. Stay over-time to resolve the issue
c. Hand over command to an Incident Commander in another location
d. Close for the day and resume resolving the next working day.
69. By default, the number of host projects to which a service project can attach for Shared
VPC is ______
a. 50
b. 5
c. 100. Number of Shared VPC host projects in a single organization (it can be
increased)
d. 1. The number of host projects to which a service project can attach is 1. This
limit cannot be increased
e. 10
70. Your team is developing a containerized python application for a government project.
The application uses a microservices architecture and will be deployed using Cloud
Run. You have been asked to capture the application‘s top or new errors in a clear
dashboard in real-time. How would you achieve this?
a. Report errors to the API using either the REST API or a client library
b. Install the Monitoring agent and modify your application, so it logs exceptions
and their stack traces to Error reporting.
c. No additional setup or configuration is required. Error reporting is
automatically enabled for Cloud Run.
d. Install the Logging agent and modify your application, so it logs exceptions and
their stack traces to Cloud Logging.
71. You are part of a team designing a containerized application to be deployed to GKE.
The application will be deployed to a five-node cluster in a single region. The
application will be used to process sensitive user data and there is a requirement to
remove any sensitive data from the logs before it goes to Cloud Logging. Which of the
following helps you meet the requirement?
a. Enable Cloud Operations in GKE Select System and workload logging and
monitoring
b. Enable Cloud Operations in GKE Select Legacy logging and monitoring.
c. Enable Cloud Operations in GKE Select System monitoring only (Logging
disabled). Logging needs to be disabled so it can be installed manually and
customized.
d. Deploy a custom FluentD deployment to the cluster that filters out the
sensitive information, so it is not logged
e. Deploy a custom FluentD daemonset to the cluster that filters out the sensitive
information, so it is not logged
72. You are part of the DevOps team in a growing analytics company. The company
currently deploys its docker applications on Virtual Machines on-premises. The
company has three different environments: dev, staging and production. The
company is planning to move its applications to GKE. The key requirement is the need
to have the environments separate in a way that allows for restricting access using IAM
policy. Which of the following helps you meet the requirement following GCP’s best
practice?
a. Create a VPC with three subnets in a Project, Create a GKE cluster in each
subnet for the different environments.
b. Create three VPCs with one subnet in a Project, Create a GKE cluster in each
VPC for the different environments.
c. Create one GKE cluster with three namespaces for the different environments.
d. Create three Projects, Create a GKE cluster in each Project for the different
environment.
73. Your SRE team is responsible for monitoring and logging of the applications in
different Production Projects. The applications are deployed on different resources
like Compute Engine and GKE. Your team has created a centralised monitoring
dashboard in the monitoring Project for the metrics from all the production Projects.
A new member needs to be given access to one of the charts in the centralised
dashboard for training purposes. Which steps will help you meet the requirements?
a. View the desired Chart in Metrics Explorer, use Share by URL to get a
parameterized URL for the Chart, and share the URL with the new member.
b. Use Share by URL to get a parameterized URL for the Dashboard and share the
URL with the new member.
c. View the desired Chart in Uptime checks, use Share by URL to get a
parameterized URL for the Chart, and share the URL with the new member.
d. Grant the new member the Monitoring Dashboard Configuration Viewer role
in the monitoring Project.
e. Grant the new member the Monitoring Viewer role in the monitoring Project.
74. Your organization has recently decided to move its applications to the Cloud. The
current CICD pipeline uses GitHub repositories for source code version control. You
have been directed to build a proof-of-concept deployment linking GitHub to Cloud Build for
image creation and deployment. What steps can you take to achieve this with minimal
overhead?
a. Create a new repository in Cloud Source Repository, create a trigger in Cloud
Build to automate the image creation, and clone the GitHub repository.
b. Connect to an external repository in Cloud Source Repository, authorize
Google Cloud Platform to connect to the GitHub repository, and create a trigger in
Cloud Build to automate the image creation.
c. Commit code to the new repository in Cloud Source Repository.
d. Grant the Cloud Build service account the permissions to build the image.
e. Commit code to the repository in GitHub.
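If the repository is connected through the Cloud Build GitHub app, the trigger can also be created from the CLI with a hedged sketch like the following (the owner, repository, branch pattern, and trigger name are illustrative):

# Build on every push to main, using the cloudbuild.yaml in the repository.
gcloud builds triggers create github \
    --name=poc-image-build \
    --repo-owner=my-org \
    --repo-name=my-app \
    --branch-pattern='^main$' \
    --build-config=cloudbuild.yaml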
75. You are part of the SRE team working for a data-processing company. Your team
manages an application that was manually deployed to App Engine. The application
source code is stored in Cloud Source Repositories. A new version of the application
has been developed and tested. Approval has been given to deploy to production. You
pushed the code update to the Cloud Source Repository, and after some time you
notice the application is still serving the old version. What should you do?
a. Check the trigger in Cloud build is set up correctly and push the update again.
b. In the Cloud Shell, change directory to where the app.yaml is stored and run
the command gcloud app browse.
c. In the Cloud Shell, change directory to where the app.yaml is stored and run
the command gcloud app deploy app.yaml.
d. Ensure the Cloud Build Service Account has permissions to deploy to App
Engine.
76. You are responsible for setting up an automated CICD pipeline. The pipeline will be
used to build docker images for application deployment to GKE. Recently the
performance (build speed) of Cloud Build in your pipeline has been dropping. What
steps can you take to improve the speed of builds?
a. Exclude files not needed by your build with a .gcloudignore file.
b. Utilize larger base images where possible.
c. Exclude files not needed by your build with a .gitignore file
d. Select a virtual machine with a higher CPU
e. Add more permissions to the Cloud Build Service Account
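Two hedged examples of the leaner-context and larger-machine options (the ignored directory and machine type are illustrative, and assume builds are submitted with the gcloud CLI):

# Keep large directories that the build does not need out of the upload.
echo "node_modules/" >> .gcloudignore

# Request a higher-CPU worker for this build.
gcloud builds submit --machine-type=e2-highcpu-8 --config=cloudbuild.yaml .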
77. Your company has multiple Projects in its Google Cloud Organization hierarchy. There
are resources in the different Projects which have been configured to send metrics to a
centralised monitoring workspace. You recently deployed Apache on a Compute
Engine instance with a custom Service Account in one of the Projects. You installed
and configured the monitoring agent to get metrics from the Apache application. You
notice there are no Apache metrics in the centralised monitoring workspace. Which of
the following is a possible reason?
a. The custom Service Account has insufficient permissions.
b. The Fluentd agent was not installed properly. FluentD is used for logging not
monitoring.
c. The agent is not running on the Compute Engine. The agent is running because
the question says it was installed and configured.
d. The Compute Engine is a different region from the monitoring workspace.
78. You are designing an online gaming application. The web application allows users to
select games and view leaderboards. Game scores are stored in a database. You want
to identify the minimum Service Level Indicators (SLIs) for the application to ensure
the leader board has the latest scores. What SLIs should you select?
a. Web Application - Durability; Database - Coverage.
b. Web Application - Availability, Latency; Database - Availability.
c. Web Application - Durability, Quality; Database - Latency.
d. Web Application - Coverage; Database - Latency.
79. Your organization has created three monitoring workspaces called dev-workspace,
test-workspace and prod-workspace. The workspaces monitor the Projects outlined
below:
dev-workspace: dev-1, dev-2, dev-3
test-workspace: test-1, test-2
prod-workspace: prod-1, prod-b and prod-c
You have been asked to monitor the project prod-1 alongside test-1 and test-2 in the
same workspace. How will you achieve this?
a. Move the prod-1 project to the test-workspace, this will delete it from prod-
workspace.
b. The monitoring workspace of Projects cannot be updated after creation.
c. Merge test-workspace and prod-workspace.
d. Add the prod-1 project to the test-workspace and leave it in prod-workspace
as well.
80. Your team is building an automated CICD pipeline in the development Project. The
Cloud Source Repository will be used for code versioning, Cloud Build will be used to
build and deploy the application to Google Kubernetes Engine. The Cloud Build Service
account has been given the Kubernetes Engine Developer permissions. After a
developer pushes code to the Cloud Source Repository, you notice the application is
not getting deployed to GKE. Which of the following could be the reason?
a. The Kubernetes Engine Developer permissions are not sufficient.
b. No Trigger was created in Cloud Build.
c. Cloud Source Repositories needs permissions to use Cloud Build.
d. The GKE API is not enabled.
81. Your team has been tasked with the monitoring of a new application to be deployed
on Managed Instance Groups. You are responsible for setting up the monitoring agent
and the custom metrics for the application. You have chosen to create the metric
descriptor manually. You need to monitor the memory utilization metric of the
application and create an alerting policy. What value should the Metric Kind be set to
in the descriptor?
a. Set the metric kind to cumulative.
b. Set the metric kind to gauge. This is the best suited Metric Kind because it
measures the value at a particular instant in time.
c. Set the metric kind to delta.
d. Set the metric kind to distribution. There is no Metric Kind of distribution
82. During load testing of the application, there were application failures, infrastructure
issues, and some capacity issues, which were resolved and documented for reference
in future incidents. Which of the following is not a recommended practice for Incident
management?
a. Encourage team members to be familiar with each role in the incident
management process.
b. Focus on restoring service during incidents
c. Prioritize root-cause analysis during incidents.
d. Develop and document your incident management procedures in advance.
83. You use Cloud Build to build and deploy your application. You want to securely
incorporate database credentials and other application secrets into the build pipeline.
You also want to minimize the development effort. What should you do?
a. Use client-side encryption to encrypt the secrets and store them in a Cloud
Storage bucket. Store a decryption key in the bucket and grant Cloud Build
access to the bucket.
b. Use Cloud Key Management Service (Cloud KMS) to encrypt the secrets and
include them in your Cloud Build deployment configuration. Grant Cloud Build
access to the KeyRing.
c. Encrypt the secrets and store them in the application repository. Store a
decryption key in a separate repository and grant Cloud Build access to the
repository.
d. Create a Cloud Storage bucket and use the built-in encryption at rest. Store
the secrets in the bucket and grant Cloud Build access to the bucket.
84. Your application artifacts are being built and deployed via a CI/CD pipeline. You want
the CI/CD pipeline to securely access application secrets. You also want to more easily
rotate secrets in case of a security breach. What should you do?
a. Store secrets in Cloud Storage encrypted with a key from Cloud KMS. Provide
the CI/CD pipeline with access to Cloud KMS via IAM.
b. Prompt developers for secrets at build time. Instruct developers to not store
secrets at rest.
c. Encrypt the secrets and store them in the source code repository. Store a
decryption key in a separate repository and grant your pipeline access to it.
d. Store secrets in a separate configuration file on Git. Provide select developers
with access to the configuration file.
85. As a DevOps Engineer, you have been tasked to optimize resource utilization and
develop a plan to address areas of greatest cost or lowest utilization in a Google Cloud
Platform project. Your project is a multi-tier web application that includes Compute
Engine instances, Cloud Storage, Cloud SQL, and Cloud Pub/Sub. Which of the
following strategies would be the most effective in achieving these goals?
a. Enable autoscaling and use managed instance groups for Compute Engine
instances, optimize Cloud Storage by leveraging object lifecycle policies,
employ Cloud SQL with automatic storage increase, and monitor Cloud
Pub/Sub usage to adjust quotas.
b. Migrate all Compute Engine instances to Preemptible VMs and use the Always
Free tier for Cloud Storage, Cloud SQL, and Cloud Pub/Sub.
c. Utilize the GCP Cost Calculator to estimate costs, and then create a custom
machine type for each Compute Engine instance, use Nearline storage for all
Cloud Storage objects, and disable all Cloud SQL instances and Cloud Pub/Sub
topics.
d. Move the entire application to a single Compute Engine instance with
maximum vCPUs and memory, and use the Always Free tier for Cloud Storage,
Cloud SQL, and Cloud Pub/Sub.
86. You are responsible for designing a CI/CD pipeline in your organization. Your company
has asked you to ensure all data access logs for the pipeline are turned on and kept for
at least 90 days. What should you take into consideration before Data Access logs are
turned on?
a. Cost implications for log ingestion.
b. Cost Implications for logs storage.
c. Logs destination.
d. Log structure.
e. Logs allotment limits.
f. Cost implications for logs routing.
87. You are helping with the design of a data processing pipeline for a company. Data is
streamed from different devices into the pipeline and then processed before it is
loaded into the final storage for analytic use. You want to identify minimal Service
Level Indicators (SLIs) for the pipeline to ensure that the data in the final storage is up
to date. Which SLI should not be part of your consideration?
a. Throughput.
b. Latency.
c. Correctness
d. Durability.
88. You are part of the SRE team tasked with writing a postmortem of an outage for one
of the services your team manages. Which of these should not be a part of the creation
of the postmortem document according to Google's SRE best practices?
a. Root cause analysis.
b. Collaboration and Knowledge-sharing.
c. Unreviewed postmortem.
d. Outlining preventive actions.
89. Your company has deployed all its Cloud Source Repositories in a separate GCP
Project. You have been tasked with granting developers in the dev Project
permission to commit code to the dev repository in that Project. How can you achieve this
according to Google’s best practice of least privilege?
a. Grant the developers the Source Repository Admin role at Project level
b. Grant the developers the Source Repository Writer role at Repo level. This
grants permissions at the repo level to list, clone, fetch and update repositories.
c. Grant the developers the Source Repository Reader role at Project level
d. Grant the developers the Source Repository Admin role at Repo level
90. Your company is serving an application through the Compute Engine service behind a
global load balancer. You have been tasked with monitoring the availability of the
application and alerting the on-call engineer if the application is unavailable for more
than five minutes. What should you do with the least management overhead?
a. Use Cloud Logging alerts to trigger a notification
b. Deploy a service to the instances to notify you when they fail
c. Create Uptime checks with the IP address of the individual VMs and an alerting
policy to trigger a notification
d. Create an Uptime check with the IP address of the load balancer and an
alerting policy to trigger a notification
91. Your team recently pushed an update to production. Several customers are now
complaining that the service is taking too long to respond. What should you do first
following Google’s SRE best practice for effective troubleshooting?
a. Try to figure out the severity of the issue
b. Open a bug (ticket) for the issue. Opening a bug ticket or reviewing application
logs can be helpful, but they should not be done before you have a good
understanding of the severity of the issue. This is because you may be wasting
your time troubleshooting a problem that is not actually affecting a large
number of customers.
c. Make the system work as well as it can while you troubleshoot. Making the
system work as well as it can while you troubleshoot is also a good idea, but it
is not the first step. You need to understand the severity of the issue before you
can make any changes to the system.
d. Review the application logs
92. Your team is managing multiple Projects with different applications. You have been
asked to centralize all billing data for the projects for ease of analysis. What steps
should you take, following Google’s best practice?
a. Create a separate Project for billing and restrict access to this Project using
roles.
b. Export Cloud Billing data to a file in a Cloud Storage bucket in the billing
Project.
c. Export Cloud Billing data to Bigtable in the billing Project
d. Export Cloud Billing data to BigQuery dataset in the billing Project
e. Export Cloud Billing data to Cloud PubSub in the billing Project
93. You are a devops engineer on a large-scale application development for a
multinational company. The development, testing and production environment
consists of several Projects. You have been tasked with designing and implementing a
billing export for the multiple Projects to a central billing Project. Following the
principle of least privilege, what role will be needed?
a. Project Owner.
b. Cloud Billing Administrator.
c. Project Billing Manager
d. Billing Account User
94. Your team manages several applications in different Projects with a central billing
Project. Finance requires the ability to break down billing by department or
project in BigQuery. How would you
accomplish this?
a. Apply the appropriate Tags to resources in the different Projects.
b. Apply the appropriate Security Marks to resources in the different Projects.
c. Create StackDriver Resource Groups in the different Projects
d. Apply the appropriate Labels to resources in the different Projects.
95. You are responsible for the VPC network design of an application that your team will
be deploying on Compute Engine (GCE). Minimal cost for Internet egress traffic
charges is a requirement. Which Network Service Tier option provides the lowest cost?
a. Select the Premium Tier for your network design.
b. Utilize NATs for egress to reduce cost of your network design
c. Select the Standard Tier for your network design
d. Select the Basic Tier for your network design. There are two tiers (Premium and
Standard).
96. You are on-call managing an application in production. You receive alerts from the
monitoring system of the application which show it is failing uptime checks. What do
you do first following SRE best practice of managing incidents?
a. Perform a root cause analysis.
b. Start fixing it.
c. Inform your team lead. This is not the first thing because you don’t know what
the real problem is yet.
d. Investigate, and if it persists, appoint an incident commander.
97. Your team is planning to deploy an application to App Engine in the production
Project. You need to be able to inspect the state of the app in real time, without
stopping or slowing it down. How can you accomplish this?
a. Using Cloud Monitoring to inspect the state of the app in real time.
b. Using Cloud Logging to inspect the state of the app in real time. Cloud Logging,
is used to store, search, analyze, monitor, and alert on log data and events.
While it is useful for analyzing logs, it may not be the best choice for real-time
inspection of the state of an application.
c. Using Cloud Profiler to inspect the state of the app in real time.
d. Using Cloud debugger to inspect the state of the app in real time. Cloud
Debugger allows real-time inspection of the state of an application running on
Google Cloud Platform without stopping or slowing it down. It enables
developers to examine the state of an application, including variables and call
stack at any code location, without using print statements or stopping the
application.
98. You are responsible for designing the logging of an application. Your company has
asked you to ensure logs are sent to the company’s Splunk instance. How should you
accomplish this with the least amount of operation overhead?
a. Create logs export to Cloud Storage buckets as the destination and use Cloud
Functions to copy to the Splunk.
b. Create logs export to a Pub/Sub topic as the destination and subscribe the
Splunk to the Pub/Sub topic. This is the recommended approach for exporting
logs to third-party applications.
c. Create logs export to a BigQuery dataset as the destination and use Cloud
Functions to copy to the Splunk.
d. Create logs export to Splunk as the destination.
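A minimal sketch of the Pub/Sub route (the topic name, sink name, and filter are illustrative); Splunk then consumes the topic, for example through the Splunk Add-on for Google Cloud Platform or a Dataflow export job:

# Create the export topic and route matching logs to it.
gcloud pubsub topics create splunk-export
gcloud logging sinks create splunk-sink \
    pubsub.googleapis.com/projects/PROJECT_ID/topics/splunk-export \
    --log-filter='resource.type="gce_instance"'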
99. You are responsible for designing a new logs collection system in your organization.
Your company has asked you to ensure all audit logs from all projects in the
organization are aggregated in one location. How should you accomplish this?
a. Create a Project for collecting logs, and create a logging bucket in Cloud
Storage using the console in that Project.
b. Create a Project for collecting logs, and create a logging bucket in Logging using
the console in that Project.
c. Create a logs sink in the console, select the type of logs to be collected, logging
bucket, organization and all the projects in the organizations.
d. Create a logs sink from the CLI, specifying the type of logs to be collected,
logging bucket, organization, and the --include-children flag.
e. Create logs sink in every Project to the logging bucket.
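A hedged sketch of an aggregated organization-level sink (the sink name, destination project, bucket, organization ID, and filter are illustrative):

# Route audit logs from the organization and all child projects into one bucket.
gcloud logging sinks create org-audit-sink \
    logging.googleapis.com/projects/LOGS_PROJECT_ID/locations/global/buckets/org-audit-bucket \
    --organization=ORGANIZATION_ID \
    --include-children \
    --log-filter='logName:"cloudaudit.googleapis.com"'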
DevOps Professional GCP V
1. You are responsible for designing a CICD pipeline in your organization. Your company
has asked you to ensure the continuous deployment (CD) part of the pipeline can
handle Blue/Green deployment. How could you accomplish this?
a. Spinnaker deployed on GKE
b. Cloud Build
c. Deployment Manager
d. Cloud Run.
e. Jenkins deployed on GKE
2. Your team is designing a CICD pipeline for your organization. Jenkins was chosen as
the Continuous Deployment Tool. Following GCP’s recommended practice, how
should the CD Tool be deployed?
a. Jenkins deployed on GKE
b. Jenkins deployed on Cloud Run
c. Jenkins deployed on Compute Engine
d. Jenkins deployed on App Engine
e. Jenkins deployed on Cloud Functions
3. Your team is designing a web-facing application for your organization. The application
is intended to serve users globally. Your job is to plan for the capacity of the
application. Following GCP’s SRE best practice for capacity management, which of
these is not recommended?
a. Design for graceful degradation
b. Carry out load testing
c. Implement monitoring and alerting
d. Provision a higher capacity to account for possible demand spikes
4. You are responsible for deploying a web-facing application. The application will serve
users in multiple regions. There is a reliability requirement for the system not to be
overloaded with requests during peak periods. Following GCP’s SRE best practice,
which of these is not recommended?
a. Implement queue management
b. Load Shedding
c. Implement cross-layer communication. This is not recommended because the
intra-layer communication is susceptible to a distributed deadlock.
d. Implement Retries
5. To meet industry compliance, your company has asked you to configure VPC Flow
Logs. A key priority is to streamline the logs collected from Flow Logs to reduce storage
costs. What steps can you take to achieve this?
a. You can set filters so that only logs that match certain criteria are generated.
b. Metadata annotations can be turned off, or you can specify only certain
annotations.
c. Create Log Sinks to store VPC Flow Logs
d. Modify Logs using the record_transformer plugin to reduce the number of logs
written to Logging
e. Modify the default retention period on the Logs bucket
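A hedged sketch of trimming Flow Logs at the source with a filter expression and reduced metadata (the subnet, region, and filter expression are illustrative):

# Generate records only for traffic matching the filter, and drop metadata
# annotations to shrink each record.
gcloud compute networks subnets update prod-subnet \
    --region=us-central1 \
    --enable-flow-logs \
    --logging-filter-expr="inIpRange(connection.src_ip, '10.0.0.0/8')" \
    --logging-metadata=exclude-all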
6. To meet security compliance of centrally collecting VPC Flow Logs, your company
asked you to configure a Logs routing sink. The Sink destination is a Logging bucket in
another project. After you configure the Logs Sink, a few days later one of the security
team members points out that there are no logs in the logging bucket. Which of the
following is not a possible reason?
a. Flow Logs were not enabled in the monitored Project.
b. Firewall rules are blocking traffic.
c. Logging exclusion filters defined on the sink block specified logs
d. Viewing the wrong Logging bucket
7. Your team manages a financial application for an organisation. You have been given a
requirement to preserve the logs from the application for 10 years as part of a
compliance process. Logs will be reviewed once a year. What is the most cost-effective
way to achieve this?
a. Create a sink to route the application logs to a Cloud Storage Archive bucket
and set the retention policy to 10 years.
b. Create a sink to route the application logs to a user-defined logs bucket and
set the retention period to 3650 days.
c. Create a sink to route the application logs to a Cloud Storage Coldline bucket
and set the retention policy to 10 years.
d. Create a sink to route the application logs to BigQuery dataset.
8. Your company has several Google Projects. As part of the CI/CD pipeline it has a
Project where automated Compute and Docker Image creation is done. Users in the
developer, staging and Production Projects require access to the images created for
deployments. Following principle of least privilege, what IAM role would you need to
assign to users to achieve this?
a. Allow users to create instances from these images by granting them the
compute.imageUser role in the image creation Project.
b. Allow users to create instances from these images by granting them the
compute.instanceAdmin role in the image creation Project.
c. Allow users to create instances from these images by granting them the
compute.imageUser role in their different Projects.
d. Allow users to create instances from these images by granting them the
compute.instanceAdmin role in their different Projects.
9. You are developing a mobile application for a financial institution. A key security
requirement is that application passwords are changed frequently. The application
will comprise two parts; the frontend deployed on Google Kubernetes Engine and the
database is Google Cloud SQL. You need a secure way to pass the database credentials
to the application at runtime and also meet the security requirement. How can you
achieve this following best practice?
a. Store the credentials in the application code and update it as needed by
releasing new versions/updates to the application.
b. Use the CI/CD pipeline to inject the credentials into the application at
deployment
c. Create a secret via the console and configure secret rotation. Store the
credentials in the secret. Configure the application to get the credentials from
Secrets Manager using secret versions and update the secret version used by
the application after every rotation and disable previous versions.
d. Create a secret via the CLI and configure secret rotation. Store the credentials
in the secret, Configure the application to get the credentials from Secrets
Manager using secret versions and update the secret version used by the
application after every rotation and disable previous versions.
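A hedged CLI sketch of the Secret Manager flow (the secret name and values are illustrative); the GKE workload reads the latest enabled version at startup or on a schedule:

# Create the secret and store the current credential as version 1.
echo -n 'initial-db-password' | gcloud secrets create db-credentials \
    --replication-policy=automatic \
    --data-file=-

# At runtime, read the newest enabled version.
gcloud secrets versions access latest --secret=db-credentials

# During rotation: add the new credential and disable the old version.
echo -n 'rotated-db-password' | gcloud secrets versions add db-credentials --data-file=-
gcloud secrets versions disable 1 --secret=db-credentials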
10. You are working on a new application development for a gambling company. The
application will utilize a microservices architecture to allow for loose coupling of the
different components. You are using Cloud Build to build the docker images. You have
tested the build locally using the local builder, but when you try to run the build in
Cloud Build it fails. Which of the following could be the problem?
a. Certain Firewall rules set in the VPC deny the Cloud Build traffic.
b. Cloud Build is in a different region from where you tested.
c. Certain permissions on your personal account are missing from the cloud Build
service account.
d. You are running multiple builds at the same time.
11. You are developing a new application for a global media company. The application
will serve content to users in several countries. The application needs to have a high
availability and reliability. Your team has agreed on relevant SLOs and Error budget
policy with stakeholders. Which of the following is not a recommended action when
the service has consumed its entire error budget?
a. Lowering the SLOs will provide more Error budget to work with.
b. The development team gives top priority to bugs.
c. To reduce the risk of more outages, a production freeze halts certain changes
to the system until there is sufficient error budget to resume changes.
d. The development team focuses exclusively on reliability issues until the
system is within SLO.
12. Your company is planning to deploy a python application on Google App Engine
Standard Environment. There is a requirement to continuously gather CPU usage
information from your production application. What steps will help achieve this?
a. Enable the Cloud Trace API
b. Install pip and Install the Trace package
c. Enable the Cloud Profiler API and add google-cloud-profiler to your
requirements.txt file
d. Install pip and Install the Cloud Profiler package. You don’t have access to the
underlying instance when using App Engine Standard Environment.
e. Import googlecloudprofiler module and call the googlecloudprofiler. start
function
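For reference, a hedged sketch of the setup steps outside the application code (assuming the gcloud CLI and a requirements.txt-based deployment); the code itself then imports googlecloudprofiler and calls googlecloudprofiler.start() at startup, as option e describes:

# Enable the Profiler API for the project and declare the agent dependency.
gcloud services enable cloudprofiler.googleapis.com
echo "google-cloud-profiler" >> requirements.txt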
13. Your team is creating an incident management procedure which will be a guide for
your team during incidents. Part of Google‘s SRE incident management best practice
is the separation of responsibilities. Which of the following responsibilities is not
essential during an incident?
a. A person responsible for assigning responsibilities according to need and
priority
b. The person or team responsible for modifying the system during an incident
c. The public face of the incident response task force
d. The person who created the incident management procedure. Creating the
incident management procedure is a team effort so anyone can use it when an
incident occurs.
14. As a DevOps engineer working with Google Cloud Platform, you want to foster a
learning culture and promote healthy communication and collaboration among team
members. Which of the following strategies can help you achieve this goal in GCP?
a. Create a safe and supportive environment where team members are
encouraged to experiment, take risks, and learn from their mistakes. Creating
a safe and supportive environment where team members are encouraged to
experiment, take risks, and learn from their mistakes is a key strategy for
fostering a learning culture and promoting healthy communication and
collaboration. This approach allows team members to learn from each other
and from their experiences, which can lead to better performance and job
satisfaction.
b. Provide team members with clear instructions and directives to avoid
confusion and ensure that everyone is working towards the same goals.
c. Encourage team members to work independently and avoid collaboration to
minimize the risk of conflicts and misunderstandings.
d. Focus solely on meeting project deadlines and delivering results to ensure that
projects are completed on time and within budget.
15. In a Google Cloud Platform (GCP) project, you are designing a monitoring system for a
multi-region application using Cloud Monitoring metrics scopes. Your goal is to
analyze the performance and usage of resources in each region separately, as well as
to have an aggregated view of the entire application. Which of the following
configurations would you choose to implement Cloud Monitoring metrics scopes
correctly?
a. Set up individual metrics scopes for each region and use Pub/Sub to aggregate
the data.
b. Create individual metrics scopes for each region and use a separate metrics
scope for the aggregated view.
c. Configure a single metrics scope that covers all the regions and filter the data
by region using labels. This allows you to monitor the performance and usage
of resources in each region separately by using labels for filtering, while still
having an aggregated view of the entire application.
d. Use the default metrics scope and apply a resource group to each region for
easier management.
16. A multinational company has recently migrated its infrastructure to Google Cloud
Platform. The company uses a variety of Compute Engine instances to run its
applications, and they want to optimize resource utilization and utilize committed use
discounts where appropriate. As a DevOps Engineer, which of the following strategies
should you recommend to optimize their costs?
a. Purchase committed use contracts for all instance types they use, regardless
of the actual utilization rates.
b. Use preemptible instances for all workloads to minimize costs, ignoring
committed use contracts.
c. Implement autoscaling on all instances and purchase committed use contracts
for the average usage.
d. Analyze instance utilization patterns and purchase committed use contracts
only for the instances with high and predictable usage.
17. A company has deployed a multi-tier web application on Google Cloud Platform (GCP)
and wants to use Cloud Monitoring to analyze application performance data. They
have decided to integrate Cloud Monitoring with BigQuery to perform more complex
analysis on the collected metrics. Which of the following approaches is the most
appropriate way to achieve this integration while ensuring a scalable, cost-effective
solution?
a. Create a Pub/Sub topic to export monitoring data, use Dataflow to process the
data, and then use BigQuery sink to store the data in BigQuery. Creating a
Pub/Sub topic to export monitoring data allows for real-time streaming and
decoupling of the data ingestion process. Using Dataflow to process the data
provides a scalable, serverless solution for data transformation, and the
BigQuery sink enables efficient storage of the data in BigQuery. This approach
is the most appropriate, as it offers a scalable, cost-effective, and real-time
solution to integrate Cloud Monitoring with BigQuery.
b. Enable the BigQuery export feature within Cloud Monitoring, which will
export the monitoring data directly into BigQuery tables. While it would be
ideal to have a direct BigQuery export feature within Cloud Monitoring, this
feature does not currently exist. Therefore, this option is not a valid solution.
c. Use the Cloud Monitoring API to fetch the metric data and then use the
BigQuery Streaming API to insert the data into BigQuery tables in real-time. It
might work for small-scale scenarios. However, this approach is not scalable, as
it would require significant compute resources to handle large volumes of
metric data.
d. Export the monitoring data to a Cloud Storage bucket, then set up a Data
Transfer service to move the data to BigQuery. It adds unnecessary complexity
and latency to the solution. This approach also requires additional storage and
transfer costs, which makes it less cost-effective.
18. A company is developing a web application on Google Cloud Platform that uses both
front-end and back-end technologies. The front-end is built using React and the back-
end is built using Node.js. They want to design a CI/CD pipeline to automate the build,
test, and deployment processes. Which approach would be most appropriate for
designing the pipeline?
a. Use Jenkins to create separate pipelines for front-end and back-end.
b. Use Google Cloud Build and Cloud Run to create a single pipeline for both
front-end and back-end.
c. Use Google Cloud Build to create separate pipelines for front-end and back-
end.
d. Use Kubernetes Engine and Jenkins to create a single pipeline for both front-
end and back-end.
19. A DevOps Engineer is tasked with implementing an org-level export of Cloud Logging
data to a specific destination. They must also manage the Cloud Logging platform for
better organization and efficient usage. Which of the following methods should the
engineer use to achieve the desired results?
a. Create a logs export sink with the --organization flag, and use the Cloud
Logging API to manage logs. Creating a logs export sink with the --organization
flag ensures that the export sink is configured at the org-level. Using the Cloud
Logging API to manage logs provides the required flexibility and control over log
data.
b. Create a logs export sink with the --folder flag, and use the gcloud command-
line tool for log management. The --folder flag creates a folder-level logs export
sink, not an org-level one. Also, the gcloud command-line tool is not as flexible
and comprehensive as the Cloud Logging API for log management.
c. Implement an org-level logs export sink without any specific flag, and use the
Cloud Logging API for log management.
d. Implement a project-level logs export sink, and use the Cloud Logging API for
log management.
20. A company has implemented a Google Cloud Platform (GCP) project that adheres to
their Service Level Objectives (SLOs). As a Professional Cloud DevOps Engineer, you
have been asked to define alerting policies based on Service Level Indicators (SLIs)
with Cloud Monitoring for this project. Which of the following approaches would be
the most appropriate way to implement this?
a. Set up a single alerting policy based on an aggregation of all SLIs, and trigger
the alert when the overall SLI threshold is breached.
b. Create individual alerting policies for each SLI, and trigger alerts when the
associated SLI thresholds are breached. Use alert documentation to provide
context and recommended actions. Creating individual alerting policies for
each SLI allows for better visibility and understanding of which specific indicator
has breached the threshold. Including alert documentation provides context
and recommended actions to help the team react accordingly.
c. Monitor only the most critical SLIs and rely on default GCP policies to alert for
the other SLIs.
d. Utilize Google Error Reporting to automatically create alerts for all SLIs
without any additional configurations. Google Error Reporting is primarily
designed for application error tracking and not for defining alerting policies
based on SLIs. It does not provide the necessary level of customization or
granularity required for this use case.
21. As a DevOps Engineer, you are tasked with optimizing resource utilization and utilizing
committed use discounts where appropriate in a Google Cloud Platform (GCP) project.
The project has multiple Compute Engine instances with varying resource
requirements and usage patterns. Which of the following strategies should you
implement to achieve the desired optimization and cost-saving goals?
a. Analyze the resource requirements and usage patterns of the Compute Engine
instances, then purchase the appropriate committed use contracts for vCPUs
and memory, while also enabling autoscaling based on custom metrics that
reflect each instance‘s resource requirements and usage patterns. Analyzing
resource requirements and usage patterns allows you to purchase committed
use contracts that match your instances‘ needs, maximizing cost savings.
Additionally, configuring autoscaling based on custom metrics tailored to each
instance‘s requirements and usage patterns ensures optimal resource
utilization.
b. Purchase committed use contracts for the maximum possible vCPUs and
memory for all Compute Engine instances, regardless of their resource
requirements and usage patterns, to ensure maximum cost savings.
c. Analyze the resource requirements and usage patterns of the Compute Engine
instances and then migrate them all to Preemptible VMs to reduce costs.
d. Allocate the minimum possible vCPUs and memory for all Compute Engine
instances and enable autoscaling based on instance uptime to minimize costs.
22. Which of the following best describes the difference between a push and a pull trigger
in a CI/CD pipeline with Cloud Source Repositories in Google Cloud Platform?
a. A push trigger initiates the CI/CD pipeline when a new tag is pushed to the
repository, while a pull trigger initiates the pipeline when a pull request is
merged.
b. A push trigger initiates the CI/CD pipeline when a pull request is made, while
a pull trigger initiates the pipeline when code changes are pushed to the
repository.
c. A push trigger initiates the CI/CD pipeline when code changes are pushed to
the repository, while a pull trigger initiates the pipeline when a pull request is
made. In Cloud Source Repositories, a push trigger initiates the CI/CD pipeline
when code changes are pushed to the repository. This can be configured to
trigger on any push or on specific branches. On the other hand, a pull trigger
initiates the pipeline when a pull request is made. This allows developers to test
their code changes before they are merged into the main branch.
d. A push trigger initiates the CI/CD pipeline when code changes are pushed to
the repository, while a pull trigger initiates the pipeline when a new branch is
created.
23. A multinational company has deployed a distributed application on Google Cloud
Platform (GCP) using multiple microservices. They have been experiencing
intermittent performance issues in their application. As a DevOps Engineer, you have
been tasked with optimizing service performance and debugging the application.
Which of the following approaches would you take to identify and resolve the
performance bottlenecks?
a. Enable Cloud Monitoring and analyze latency data, configure Cloud Debugger
to identify issues in the source code, use Cloud Trace to analyze CPU and
memory usage, and implement circuit breaking patterns to isolate
microservices.
b. Enable Cloud Trace and analyze latency data, configure Cloud Debugger to
identify issues in the source code, use Cloud Profiler to analyze CPU and
memory usage, and implement circuit breaking patterns to isolate
microservices.
c. Enable Cloud Trace and analyze latency data, configure Cloud Monitoring to
identify issues in the source code, use Cloud Profiler to analyze CPU and
memory usage, and implement load balancing patterns to distribute traffic
evenly.
d. Enable Cloud Trace and analyze latency data, configure Cloud Monitoring to
identify issues in the source code, use Cloud Debugger to analyze CPU and
memory usage, and implement circuit breaking patterns to isolate
microservices.
24. Which of these GCP features ‘automatically and digitally checks each component of
your software supply chain, ensuring the quality and integrity of your software before
an application is deployed to the production environment‘?
a. GKE
b. Binary Authorization. Binary Authorization is a GCP feature that automatically
and digitally checks each component of a software supply chain to ensure the
quality and integrity of the software before it is deployed to the production
environment. Binary Authorization uses policy-based controls to enforce
security and compliance checks on container images and their associated
metadata, ensuring that only trusted and approved images are deployed to
production.
c. Container Registry
d. Container Analysis
25. You work with a video rendering application that publishes small tasks as messages to
a Cloud Pub/Sub topic. You need to deploy the application that will execute these
tasks on multiple virtual machines (VMs). Each task takes less than 1 hour to complete.
The rendering is expected to be completed within a month. You need to minimize
rendering costs. What should you do?
a. Deploy the application as a managed instance group.
b. Deploy the application as a managed instance group with Preemptible VMs.
Configure a Committed Use Discount for the amount of CPU and memory
required.
c. Deploy the application as a managed instance group. Configure a Committed
Use Discount for the amount of CPU and memory required.
d. Deploy the application as a managed instance group with Preemptible VMs.
26. You are running a production application on Compute Engine. You want to monitor
the key metrics of CPU, Memory, and Disk I/O time. You want to ensure that the
metrics are visible by the team and will be explorable if an issue occurs. What should
you do?
a. Create a Dashboard with key metrics and indicators that can be viewed by the
team.
b. Set up logs-based metrics based on your application logs to identify errors.
c. Set up alerts in Stackdriver Monitoring for key metrics breaching defined
thresholds.
d. Export key metrics to a Google Cloud Function and then analyze them for
outliers.
e. Export key metrics to BigQuery and then run hourly queries on the metrics to
identify outliers.
27. You have a data processing pipeline that uses Cloud Dataproc to load data into
BigQuery. A team of analysts works with the data using a Business Intelligence (BI)
tool running on Windows Virtual Machines (VMs) in Compute Engine. The BI tool is in
use 24 hours a day, 7 days a week, and will be used increasingly over the coming years.
The BI tool communicates to BigQuery only. Cloud Dataproc nodes are the main part
of the GCP cost of this application. You want to reduce the cost without affecting the
performance. What should you do?
a. Apply Committed Use Discounts to the BI Tool VMs and the Cloud Dataproc
nodes.
b. Apply Committed Use Discounts to the BI Tool VMs. Create the Cloud Dataproc
cluster when loading data, and delete the cluster when no data is being
loaded.
c. Apply Committed Use Discounts to the BI Tool VMs and the Cloud Dataproc
nodes. Create the Cloud Dataproc cluster when loading data, and delete the
cluster when no data is being loaded.
d. Apply Committed Use Discounts to the BI Tool VMs.
28. Which incident communication channel from Google reaches the widest audience?
a. Cloud status dashboard
b. Support cases
c. Google groups
d. GCSC known issues
29. Your company currently has its containerised applications deployed in an on-premises
Kubernetes cluster. They plan to deploy a similar environment in GCP. The
company is concerned about the amount of operational workload that will be created
to keep both environments in sync. Which of the following can keep the Kubernetes
environments in sync and provide a centralised multi-cluster management solution?
a. Anthos. Designed explicitly for centralised multi-cluster management and the
ability to keep multiple environments in sync. Anthos allows companies to
manage their applications across multiple clusters and clouds, including on-
premises environments and GCP, from a single control plane. This enables them
to easily deploy, manage, and update their applications consistently across
different environments, reducing the operational workload required to keep
both environments in sync.
b. Cloud Source Repositories.
c. Jenkins
d. Cloud Build. Cloud Build is a CI/CD service, not a multi-cluster management solution.
30. You are a DevOps Engineer tasked with analyzing your company‘s application
performance using Google Cloud‘s Metrics Explorer for ad hoc metric analysis. You
notice a sudden increase in latency and want to investigate the cause. Which of the
following options would be the most effective way to identify the source of the
problem?
a. Create a custom metric to display latency per instance, then use Stackdriver
Logging to analyze logs and filter by region, zone, and instance type.
b. Create a dashboard with a custom metric to display latency per instance, and
use the Aggregations tab to filter by region, zone, and instance type. This
option allows you to create a custom metric to specifically monitor latency,
which is the issue you are investigating. By using the Aggregations tab, you can
filter the results by region, zone, and instance type to identify the source of the
problem effectively.
c. Use Metrics Explorer to search for a pre-existing metric like
‘instance/network/received_packets_count‘ and filter by region, zone, and
instance type.
d. Use Metrics Explorer to search for a pre-existing metric like
‘compute.googleapis.com/instance/cpu/utilization‘ and filter by region, zone,
and instance type.
31. You are designing a CI/CD pipeline for a service on Google Cloud Platform, and you
need to store immutable artifacts in Artifact Registry. Which of the following is a best
practice for creating and storing immutable artifacts?
a. Use versioning and metadata to manage and track artifacts and versions.
Using versioning and metadata to manage and track artifacts and versions can
help teams to ensure that artifacts are created and stored consistently across
different environments and services. It can also help to manage dependencies
and version control, and ensure that artifacts are deployed consistently across
different environments.
b. Use custom scripts to automate the artifact creation and storage process.
c. Store all artifacts and versions as binary files in Google Cloud Storage.
d. Use a single artifact repository for all artifacts and versions, without
separating them by service or environment.
32. As a DevOps Engineer, you are responsible for managing service incidents for an
application deployed on Google Kubernetes Engine (GKE). Your organization has
recently experienced an incident that affected the application‘s availability. Which
two of the following actions should you take to mitigate the impact of the incident
and ensure efficient incident management?
a. Use GKE autoscaling to increase the number of replicas for the affected service
to distribute the load evenly and improve application availability.
b. Deploy a canary release of the application with the same configuration as the
current production environment to identify the root cause of the incident.
c. Configure Google Cloud Pub/Sub to notify the on-call team when an incident
is detected to reduce the mean time to resolution (MTTR).
d. Create a postmortem report to analyze the root cause of the incident and
implement corrective actions to prevent similar incidents in the future.
e. Perform a rolling update in the production environment to introduce a new,
untested version of the application to resolve the incident.
33. As a DevOps Engineer, you are ready to deploy a new feature of a web-based
application to production. You want to use Google Kubernetes Engine (GKE) to
perform a phased rollout to half of the web server pods. What should you do?
a. Use a partitioned rolling update. Partitioned rolling updates allow you to
perform a phased rollout in GKE, where you can update a specific percentage of
your web server pods. In this case, you can set the partition to update 50% of
the pods to achieve a phased rollout to half of the web server pods.
b. Use Node taints with NoExecute. Node taints with NoExecute are used to control which nodes pods can be scheduled on, and they evict existing pods from a tainted node. They do not help with phased rollouts of a new feature to web server pods.
c. Use a replica set in the deployment specification. A replica set ensures that a
specified number of pod replicas are running at any given time.
d. Use a stateful set with parallel pod management policy. Stateful sets are used
for managing stateful applications that require stable network identities and
persistent storage. The parallel pod management policy ensures that the pods
are launched and terminated in parallel. This option is not suitable for phased
rollouts of a new feature to web server pods, as stateful sets are not designed
for this purpose.
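For reference, the partition field that enables this kind of phased rollout lives on a StatefulSet's RollingUpdate strategy in Kubernetes: only pods with an ordinal greater than or equal to the partition value are updated. The following is a hedged sketch using the official kubernetes Python client, assuming the web servers run as a StatefulSet named "web" with 10 replicas, so a partition of 5 updates half of them; all names and the image tag are hypothetical.

```python
# Hedged sketch: phase a rollout to half of 10 web pods by setting the StatefulSet
# RollingUpdate partition to 5 and bumping the image. Only pods with ordinal >= 5
# are updated. Names ("web", "default", image tag) are hypothetical.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

apps.patch_namespaced_stateful_set(
    name="web",
    namespace="default",
    body={
        "spec": {
            "updateStrategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"partition": 5},
            },
            "template": {
                "spec": {
                    "containers": [{"name": "web", "image": "gcr.io/my-project/web:v2"}]
                }
            },
        }
    },
)
```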
34. You are building a CI/CD pipeline for a service on Google Cloud Platform, and you need
to store immutable artifacts in Artifact Registry. Which of the following is a best
practice for creating and storing artifacts in Artifact Registry?
a. Use a separate pipeline for each service to manage and track the artifact
creation and storage process.
b. Store only the final build artifacts in Artifact Registry, and keep intermediate
artifacts in Google Cloud Storage. Storing only the final build artifacts in Artifact
Registry, and keeping intermediate artifacts in Google Cloud Storage can help
to reduce storage costs and improve performance. It can also make it easier to
manage and track artifacts and versions, and ensure that they meet the required
standards.
c. Use a separate Artifact Registry instance for each environment (development,
staging, and production) to ensure isolation and security.
d. Use a custom naming convention for artifacts that includes the service name,
version number, and environment.
35. You are tasked with designing an automated CI pipeline for building and pushing
images to Container Registry. In the current system, developers have to issue build
commands after code is pushed to the test branch in the source repository. What steps
can you take to automate the build of the test branch with the least amount of
management overhead?
a. Add a cloud build config file when code is pushed to the branch. Create a
trigger in Cloud Source Repository and select the event “Push to a branch”
b. Add a cloud build config file when code is pushed to the branch. Create a
trigger in Cloud Build and select the event “Push to a branch”
c. Add a cloud build config file when code is pushed to the branch. Create a
trigger in Cloud Build and select the event “Pull request”
d. Create a cloud function that is triggered when code is committed to the cloud
source repository.
36. Several teams in your company want to use Cloud Build to deploy to their own Google
Kubernetes Engine (GKE) clusters. The clusters are in projects that are dedicated to
each team. The teams only have access to their own projects. One team should not
have access to the cluster of another team. You are in charge of designing the Cloud
Build setup, and want to follow Google-recommended practices. What should you do?
a. Create a single project for Cloud Build that all the teams will use. List the
service accounts in this project and identify the one used by Cloud Build. Grant
the Kubernetes Engine Developer IAM role to that service account in each
team’s project.
b. In each team’s project, list the service accounts and identify the one used by
Cloud Build for each project. In each project, grant the Kubernetes Engine
Developer IAM role to the service account used by Cloud Build. Ask each team
to execute Cloud Build builds in their own project.
c. Limit each team member’s access so that they only have access to their team’s
clusters. Ask each team member to install the gcloud CLI and to authenticate
themselves by running “gcloud init”. Ask each team member to execute Cloud
Build builds by using the “gcloud builds submit”.
d. In each team’s project, create a service account, download a JSON key for that
service account, and grant the Kubernetes Engine Developer IAM role to that
service account in that project. Create a single project for Cloud Build that all
the teams will use. In that project, encrypt all the service account keys by using
Cloud KMS. Grant the Cloud KMS CryptoKey Decrypter IAM role to Cloud
Build’s service account. Ask each team to include in their “cloudbuild.yaml”
files a step that decrypts the key of their service account, and use that key to
connect to their cluster.
37. You are deploying an application to a Kubernetes cluster that requires a username and
password to connect to another service. When you deploy the application, you want
to ensure that the credentials are used securely in multiple environments with
minimal code changes. What should you do?
a. Leverage a CI/CD pipeline to update the variables at build time and inject them
into a templated Kubernetes application manifest.
b. Store the credentials as a Kubernetes ConfigMap and let the application access
it via environment variables at runtime.
c. Store the credentials as a Kubernetes Secret and let the application access it
via environment variables at runtime.
d. Bundle the credentials with the code inside the container and secure the
container registry.
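The Secret-based option keeps the application code identical across environments: the Deployment maps Secret keys to environment variables (via env[].valueFrom.secretKeyRef), and the code only reads those variables. Below is a minimal sketch; the variable names, Secret keys, and the connection logic are purely illustrative.

```python
# Minimal sketch: the Deployment maps keys of a Kubernetes Secret (hypothetical name
# "db-credentials") to environment variables via env[].valueFrom.secretKeyRef, so the
# application only touches os.environ and needs no per-environment code changes.
import os

def get_service_credentials():
    username = os.environ["SERVICE_USERNAME"]  # injected from secret key "username"
    password = os.environ["SERVICE_PASSWORD"]  # injected from secret key "password"
    return username, password

if __name__ == "__main__":
    user, _ = get_service_credentials()
    print(f"Connecting to the dependent service as {user}")
```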
38. In the App Engine standard environment, which runtimes do not need to use the Cloud Trace client libraries?
a. Java8
b. Python2
c. C#
d. PHP5
e. Node
39. Which of the following operators is not allowed in a snapshot condition in Cloud Debugger?
a. The operator <=
b. The operator ==
c. The operator ||
d. The operator &&
40. As a DevOps Engineer, you are tasked with managing service incidents in a multi-tier,
containerized application deployed on Google Kubernetes Engine (GKE). During an
incident, you notice that the application‘s API response time has significantly
increased. Which of the following steps should you take FIRST to mitigate the issue?
a. Scale the application vertically by increasing the resources allocated to each
container.
b. Identify the root cause of the performance degradation by analyzing logs and
metrics. To effectively manage service incidents, the first step is to identify the
root cause of the performance degradation by analyzing logs and metrics. This
approach enables you to implement a targeted solution and prevent the
problem from reoccurring.
c. Delete and recreate the affected containers.
d. Create a new GKE cluster in a different region and redeploy the application.
41. You are a DevOps engineer responsible for managing the logging infrastructure for
your organization‘s Google Cloud Platform (GCP) projects. You have been asked to
implement a more efficient way to manage, analyze, and troubleshoot logs using the
Cloud Logging platform. Which of the following approaches would be the most
appropriate?
a. Export all logs to a third-party logging system for management and analysis,
ignoring GCP‘s native Cloud Logging capabilities.
b. Configure log sinks to export logs to BigQuery for analysis, set up log-based
metrics, and use the Logs Explorer for troubleshooting. This option provides a
comprehensive approach for managing, analyzing, and troubleshooting logs
using the Cloud Logging platform. Configuring log sinks to export logs to
BigQuery allows for efficient log analysis using BigQuery‘s powerful querying
capabilities. Setting up log-based metrics helps in monitoring the logs‘ data
patterns and generating insights. Using the Logs Explorer provides a user-
friendly interface to search, filter, and troubleshoot logs in real-time, making
this the most appropriate choice.
c. Store all logs in Google Cloud Storage, and analyze them using Data Studio for
visualization and reporting.
d. Disable all logs, except for critical and error logs, to reduce log volume and
focus on only essential logs.
42. Which of the following is not a characteristic of a well-defined Service Level Indicator
(SLI)?
a. It is easy to measure and interpret.
b. It is aligned with the service level objectives (SLOs).
c. It provides insight into the underlying system architecture. Providing insight
into the underlying system architecture is not a characteristic of a well-defined
SLI. While understanding the underlying system architecture may be important
for troubleshooting performance issues, SLIs are typically focused on user
experience and are not concerned with the technical details of the system.
d. It is based on a user-centric perspective.
43. When constructing feedback loops to decide what to build next, which of the following
is a key step to ensure the feedback is relevant and actionable?
a. Only soliciting feedback from internal stakeholders.
b. Gathering feedback from as many sources as possible.
c. Prioritizing feedback based on its impact on business goals. When constructing
feedback loops, it is important to prioritize feedback based on its impact on
business goals. This ensures that the feedback received is relevant and
actionable, and that it will help drive the business forward. By prioritizing
feedback, DevOps teams can focus on the most important areas for
improvement and allocate resources accordingly.
d. Ignoring feedback that is critical of existing products or processes.
44. You are a DevOps Engineer responsible for optimizing the performance of an API
service running on Google Cloud Platform. The service is built using Cloud Run and
experiences periodic spikes in demand. Which of the following strategies will best help
you ensure optimal performance during periods of high demand while adhering to the
principles of Site Reliability Engineering (SRE)?
a. Increase the allocated resources (CPU and memory) for each instance without
analyzing the service‘s actual resource requirements.
b. Disable container instance logging and monitoring to reduce resource
consumption.
c. Migrate the service to a Compute Engine instance with the maximum available
resources, without considering autoscaling.
d. Configure autoscaling for the Cloud Run service based on concurrency, and set
up monitoring and alerting using Cloud Operations.
45. As a Cloud DevOps Engineer, you are debugging an application running on Google
Cloud. You decide to use Cloud Debugger to capture the state of your application at a
specific point in the execution flow without stopping or slowing it down. What can
you say about the data captured in a Cloud Debugger snapshot?
a. The snapshot includes the call stack and local variables at the snapshot
location. A snapshot created by Cloud Debugger provides insight into the
application‘s state without stopping its execution. The snapshot includes the call
stack and the values of local variables at the specific snapshot location.
b. The snapshot contains the content of the local storage of the virtual machine
running the application.
c. The snapshot includes only the stack trace of the current execution point.
d. The snapshot provides detailed logs of all operations performed by the
application.
46. Your organization has adopted Site Reliability Engineering principles, and a postmortem must be written following an incident that resulted from a software change and had a significant impact on users. What steps should be taken to ensure similarly severe issues do not occur in the future?
a. Make sure that test cases which detect errors like this are conducted
successfully before any new software is released.
b. Establish a policy that will necessitate on-call teams to quickly contact
engineers and management to discuss a plan of action if an incident occurs.
c. Follow up with the personnel who inspected the changes and determine
measures they should stick to in the future.
d. Locate engineers involved with the incident and report to their higher-ups.
47. As a DevOps Engineer, you have been tasked with optimizing the performance of a
web application that frequently experiences sudden spikes in traffic. The application
utilizes multiple microservices, and efficient resource utilization is a top priority.
Which of the following strategies would be most effective in ensuring optimal
performance and resource efficiency during traffic spikes?
a. Utilizing Google Kubernetes Engine (GKE) with cluster autoscaling enabled
b. Deploying the microservices on individual Compute Engine instances without
autoscaling
c. Implementing a Cloud Load Balancer with global backend services
d. Implementing a custom cache eviction policy in Cloud CDN
48. You are performing a semi-annual capacity planning exercise for a flagship service to ensure that predicted growth and a possible zone failure have minimal negative impact on users. The service runs on Google Cloud Platform (GCP), is fully containerized, and uses a Google Kubernetes Engine (GKE) Standard regional cluster across three zones with the cluster autoscaler enabled. You expect a 10% month-over-month user growth rate over the next six months, the service currently consumes roughly 30% of the total deployed CPU capacity, and resilience against zone failure is paramount. What preparation should be done to handle the predicted growth while avoiding unnecessary costs?
a. As you are currently only using 30%, you have a large amount of spare capacity
and you don‘t need to add any extra for this rate of development.
b. Confirm the upper limit of the node pool size, turn on a horizontal pod
autoscaler, and then run a load test to make sure your anticipated resource
needs are met.
c. Since you are deployed on GKE and are making use of a cluster autoscaler,
your GKE cluster can resize automatically, regardless of the increase in growth.
d. Ahead of time, add 60% more node capacity to handle a six-month 10%
growth rate, and then carry out a load test to make sure you have enough
room.
49. You are building a CI/CD pipeline for a service that needs to be deployed to hybrid and
multicloud environments. Which of the following is a best practice for building and
deploying to hybrid and multicloud environments?
a. Use a different configuration file for each environment to manage
environment-specific variables.
b. Use a different pipeline for each environment to ensure consistency and avoid
conflicts.
c. Use a different container image for each environment to ensure consistency
and avoid conflicts.
d. Use cloud-native tools and platforms to manage and deploy the service
components. Using cloud-native tools and platforms can help teams to manage
and deploy service components efficiently and consistently across different
hybrid and multicloud environments. It can also help to manage dependencies
and version control, and ensure that the service components are deployed
consistently across different environments.
50. You are a member of an organization that adheres to SRE practices and principles, and you recently took over management of a new service from the Development Team. During the Production Readiness Review (PRR) analysis phase, it became apparent that the service cannot meet its Service Level Objectives (SLOs). What steps should you take next to make sure the service can meet its SLOs in production?
a. Set SLO targets that the service can reach so you can put it into production.
b. Alert the development team that they will need to provide production support
for the service.
c. Discover proposed reliability enhancements for the service to be done before
transfer.
d. Bring the service into production without SLOs and construct them when you
have obtained operational data.
51. When managing service incidents in Google Cloud Platform, a DevOps engineer needs
to effectively coordinate roles and implement communication channels. During a
major incident, your team needs to quickly identify the root cause and resolve the
issue. Which of the following steps best demonstrates this process, while adhering to
recommended practices in Google Cloud Platform?
a. Assign an Incident Commander and Communications Lead, create a dedicated
Google Hangouts chat, update stakeholders using Statuspage, and perform a
root cause analysis after the incident is resolved.
b. Assign a primary Incident Commander, create a dedicated Slack channel, share
status updates through Google Sheets, and perform a root cause analysis after
the incident is resolved.
c. Appoint an Incident Commander, use a shared company-wide Slack channel,
skip status updates, and perform a root cause analysis only if stakeholders
request it.
d. Designate a single team member to handle all roles, communicate via email,
and perform a root cause analysis only if the incident recurs.
52. In the context of managing Cloud Logging platform and viewing export logs in Cloud
Storage and BigQuery in Google Cloud Platform, which of the following statements is
true regarding the export of log entries to different storage systems?
a. Log entries cannot be exported to BigQuery, as it is primarily a data analysis
tool and not meant for storing logs.
b. The Cloud Logging API allows you to export logs to Cloud Storage and BigQuery
in real-time, without needing to set up sinks.
c. To export log entries to different storage systems, you need to create sinks
that specify the destination, format, and a filter for logs to be exported. In
order to export log entries to different storage systems like Cloud Storage and
BigQuery, you need to create sinks. Sinks define the destination, format (i.e.,
JSON, Avro, or other formats), and a filter for logs that you want to be exported.
Once a sink is created, the logs will be exported automatically based on the
specified configuration.
d. Log entries exported to Cloud Storage can only be viewed using the Cloud
Storage Console.
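Sinks can be created in the Console, with gcloud, or programmatically. Below is a minimal sketch using the google-cloud-logging Python client, assuming the BigQuery dataset already exists and that the sink's writer identity is granted write access to it; all names are placeholders.

```python
# Minimal sketch: create a sink that exports Cloud Run error logs to BigQuery with the
# google-cloud-logging client. The dataset "app_logs" must already exist, and the sink's
# writer identity needs the BigQuery Data Editor role on it. Names are placeholders.
from google.cloud import logging

client = logging.Client(project="PROJECT_ID")
destination = "bigquery.googleapis.com/projects/PROJECT_ID/datasets/app_logs"
log_filter = 'resource.type="cloud_run_revision" AND severity>=ERROR'

sink = client.sink("error-logs-to-bq", filter_=log_filter, destination=destination)
if not sink.exists():
    sink.create()
    print(f"Created sink {sink.name}")
```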
53. You are a DevOps Engineer responsible for optimizing the performance of a service
running on the Google App Engine local development server. The service is a critical
part of your company‘s web application and must respond to user requests within 500
ms. You notice that the service is experiencing increased latency during peak hours.
Which of the following approaches should you adopt to optimize the service‘s
performance while using App Engine local development server in Google Cloud
Platform?
a. Use manual scaling, increase the number of instances during peak hours, and
reduce the number of instances during off-peak hours.
b. Implement a cache mechanism like Memcached or Redis to reduce the
number of datastore reads. It can help optimize service performance by
reducing the number of datastore reads, resulting in lower latency during peak
hours. This approach addresses the performance issue more directly and
efficiently.
c. Configure a cron job to restart the service every 15 minutes to ensure that the
service runs efficiently.
d. Enable automatic scaling for the service, set a minimum and maximum
number of instances, and configure appropriate latency targets.
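A minimal cache-aside sketch of the caching option is shown below, assuming the redis-py client and a Redis (or Memorystore) endpoint; the key scheme, TTL, and the datastore helper are illustrative placeholders.

```python
# Minimal cache-aside sketch using redis-py: read from the cache first and fall back to
# the datastore only on a miss, caching the result for 5 minutes. REDIS_HOST, the key
# naming scheme, and load_profile_from_datastore() are placeholders.
import json
import redis

cache = redis.Redis(host="REDIS_HOST", port=6379)

def load_profile_from_datastore(user_id: str) -> dict:
    # Placeholder for the real (slow) datastore read.
    return {"user_id": user_id, "name": "example"}

def get_user_profile(user_id: str) -> dict:
    key = f"profile:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)                  # cache hit: no datastore read
    profile = load_profile_from_datastore(user_id)
    cache.setex(key, 300, json.dumps(profile))     # cache for 300 seconds
    return profile
```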
54. As a Cloud DevOps Engineer, you are tasked with maintaining the health of instances
within a Managed Instance Group (MIG). You have configured a health check for your
MIG. How does the health check system determine that an instance in the MIG is
unhealthy?
a. The instance is considered unhealthy if it doesn‘t return a 200 OK status within
the check interval defined in the health check. Health checks send HTTP,
HTTPS, or TCP requests to each instance at a specified frequency. If an instance
doesn‘t respond or doesn‘t return a 200 OK HTTP status to the health check
system within the check interval defined, it is considered unhealthy.
b. The instance is considered unhealthy if it hasn‘t been used for a specified
period of time.
c. The instance is considered unhealthy if the CPU usage goes above 80%.
d. The instance is considered unhealthy if it does not respond to a ping. While a
lack of response to a ping could indicate an unhealthy instance, Google Cloud
Health Checks do not use pings (ICMP Echo Request and Echo Reply messages)
to determine the health status. They send HTTP, HTTPS, or TCP requests to the
instance.
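A minimal sketch of an endpoint that satisfies such a health check is shown below, assuming Flask (any HTTP framework works). The path must match the request path configured on the health check, and the handler should return 200 only when the instance can actually serve traffic.

```python
# Minimal sketch of an HTTP health endpoint that a Compute Engine health check can probe.
# Flask is an assumption; the configured request path (here /healthz) must return
# 200 OK within the check interval for the instance to be considered healthy.
from flask import Flask

app = Flask(__name__)

@app.route("/healthz")
def healthz():
    # Return 200 only when the instance is actually able to serve traffic.
    return "ok", 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```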
55. How can a DevOps Engineer effectively manage service incidents and coordinate roles
and communication channels during an incident?
a. By implementing an incident management process and establishing clear
communication channels using a collaboration tool. To effectively manage
service incidents and coordinate roles and communication channels during an
incident, a DevOps engineer should implement an incident management
process and establish clear communication channels using a collaboration tool.
This includes creating an incident response plan, defining roles and
responsibilities, and establishing a communication plan that includes the
appropriate channels for different stakeholders. A collaboration tool, such as
Slack or Microsoft Teams, can provide real-time communication, visibility, and
coordination capabilities that are critical for managing service incidents.
b. By assigning a single point of contact for all stakeholders and providing regular
updates via phone calls.
c. By creating a post-incident report and notifying stakeholders via email.
d. By establishing a dedicated chat channel for incident communication and
relying on automated alerts for updates.
56. As a Cloud DevOps Engineer, you are required to manage infrastructure as code for a
multi-environment GCP project. Which of the following Terraform strategies ensures
the highest level of modularity, reusability, and maintainability while minimizing
human error and providing a consistent deployment process across environments?
a. Use a single Terraform workspace for each environment, sharing a common
configuration and utilizing input variables. By using a single Terraform
workspace for each environment, you can share a common configuration while
utilizing input variables to manage environment-specific settings. This approach
ensures modularity, reusability, and maintainability, minimizes human error,
and provides a consistent deployment process.
b. Use separate Terraform configurations for each environment with hardcoded
values for resources and service configurations.
c. Use Terraform modules for each environment with environment-specific
variables in a centralized repository.
d. Use a single Terraform configuration for all environments with distinct
variables files for each environment.
57. In the context of implementing SLO monitoring and alerting with Cloud Monitoring in
Google Cloud Platform, which of the following steps is the correct approach to set up
a comprehensive monitoring and alerting system for the error budget of a service?
a. Set up an alerting policy based on the service‘s overall performance, create a
custom metric for the error budget, define the SLO, and configure the alert
notification channel.
b. Create a custom metric for the error budget, set up a Service Level Indicator
(SLI) based on that metric, create an alerting policy based on the SLI, and
configure the alert notification channel.
c. Use the default metrics provided by Cloud Monitoring to define the Service
Level Objective (SLO), create an alerting policy based on the SLO, and
configure the alert notification channel without any custom metric or SLI.
d. Create a custom metric for the error budget, define the SLO based on that
metric, create an alerting policy based on the SLO, and configure the alert
notification channel. It highlights the complete process to set up a
comprehensive monitoring and alerting system for the error budget of a service,
including creating a custom metric, defining the SLO based on that metric,
setting up an alerting policy based on the SLO, and configuring the alert
notification channel.
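A hedged sketch of the first step is shown below: publishing a hypothetical custom error-budget metric with the google-cloud-monitoring client. The SLO and the alerting policy are then defined on top of this metric type; the project ID and metric name are placeholders.

```python
# Hedged sketch: write one point of a hypothetical custom metric
# "custom.googleapis.com/sre/error_budget_remaining_ratio" with the Cloud Monitoring
# Python client. An SLO and burn-rate alerting policy can then be built on this metric.
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/PROJECT_ID"  # placeholder

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/sre/error_budget_remaining_ratio"
series.resource.type = "global"

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)
point = monitoring_v3.Point({"interval": interval, "value": {"double_value": 0.72}})
series.points = [point]

client.create_time_series(name=project_name, time_series=[series])
```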
58. You are a DevOps engineer working on a Google Cloud Platform (GCP) project. The
project has been experiencing performance issues and you have been tasked with
optimizing service performance and evaluating user impact. Which two of the
following actions should you take to achieve these objectives?
a. Disable logging and monitoring to reduce the overhead and improve
performance.
b. Use Cloud Monitoring to identify performance bottlenecks and create custom
dashboards for specific services. Cloud Monitoring provides visibility into the
performance, uptime, and overall health of cloud-powered applications. By
using this tool, you can identify performance bottlenecks and create custom
dashboards for specific services, which helps you optimize service performance
and evaluate user impact.
c. Implement autoscaling policies based on CPU utilization and request rate to
automatically adjust the number of instances. Autoscaling policies help to
optimize service performance by automatically adjusting the number of
instances based on demand and resource usage, such as CPU utilization and
request rate. This ensures that the application can handle varying workloads
and provide a consistent user experience, while minimizing costs and resource
wastage.
d. Deploy a single monolithic application across all services, to simplify the
architecture and reduce complexity.
e. Manually increase the number of instances for all services, regardless of
resource usage or demand.
59. A DevOps Engineer is tasked with analyzing audit and flow logs to detect anomalies
and potential security threats in a Google Cloud Platform environment. They have
decided to use SIEM tools to help with this task. Which of the following approaches is
the most effective way to integrate SIEM tools with Google Cloud Platform for this
purpose?
a. Use the Google Cloud Pub/Sub service to stream logs from Google Cloud
Logging to the SIEM tool in real-time. Google Cloud Pub/Sub provides a real-
time messaging service that can be used to stream logs from Google Cloud
Logging to a SIEM tool. This approach allows for real-time analysis of logs and
ensures that the SIEM tool has the most up-to-date information for detecting
anomalies and potential security threats.
b. Install a SIEM agent on each virtual machine (VM) and configure it to send logs
directly to the SIEM tool.
c. Rely on the built-in integration between Google Cloud Logging and the SIEM
tool to automatically forward logs.
d. Manually export logs from Google Cloud Logging periodically and then import
them into the SIEM tool.
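On the Google Cloud side this pattern is a log sink whose destination is a Pub/Sub topic; the SIEM consumes a subscription on that topic. Below is a minimal consumer-side sketch with the google-cloud-pubsub client; the project, subscription, and forward_to_siem helper are placeholders for the real integration.

```python
# Minimal sketch of the SIEM-side consumer: pull log entries exported to a Pub/Sub topic
# by a Cloud Logging sink. PROJECT_ID, SUBSCRIPTION_ID, and forward_to_siem() are
# placeholders.
import json
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("PROJECT_ID", "SUBSCRIPTION_ID")

def forward_to_siem(entry: dict) -> None:
    # Placeholder for the real SIEM client call.
    print("forwarding", entry.get("insertId"))

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    entry = json.loads(message.data.decode("utf-8"))   # one LogEntry per message
    forward_to_siem(entry)
    message.ack()

streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull_future.result()                     # block and stream indefinitely
except KeyboardInterrupt:
    streaming_pull_future.cancel()
```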
60. A DevOps Engineer at XYZ Company is tasked with optimizing the performance of a
latency-sensitive microservice and troubleshooting network issues in their Google
Cloud Platform environment. The microservice is running on a Kubernetes cluster, and
the company is using Cloud Monitoring and Cloud Logging for observability. The
microservice is experiencing intermittent network issues, resulting in high latency for
API calls. Which of the following approaches should the engineer take to diagnose and
address the issue?
a. Use Cloud Trace and Cloud Profiler to identify performance bottlenecks in the
microservice.
b. Increase the number of vCPUs and memory allocated to the microservice‘s
Kubernetes Pods.
c. Modify the Kubernetes Ingress controller to use HTTP/2.
d. Use VPC Flow Logs and Network Intelligence Center to identify network
bottlenecks and connectivity issues. VPC Flow Logs provide visibility into
network traffic, enabling the engineer to identify potential bottlenecks or
connectivity issues. Network Intelligence Center offers comprehensive network
monitoring, helping to diagnose and fix network issues in the GCP environment.
This approach addresses both the optimization and troubleshooting aspects of
the question.
61. You are a DevOps Engineer working on a project that requires the optimization of
resource utilization and identification of resource utilization levels in Google Cloud
Platform. Your team has implemented a series of microservices on Google Kubernetes
Engine (GKE), and you need to monitor the cluster‘s performance to ensure optimal
resource utilization. Which of the following approaches would be the most
appropriate to achieve this goal?
a. Enable Cloud Monitoring and Logging on your GKE cluster, and use custom
metrics to track resource utilization levels.
b. Deploy a custom monitoring solution on Compute Engine to scrape GKE cluster
metrics and analyze resource utilization.
c. Create a cron job on a Compute Engine instance to periodically check GKE
cluster resource utilization using the kubectl top command.
d. Set up Google BigQuery to store GKE resource utilization data and analyze it
using SQL queries.
62. What is the purpose of a Service Level Objective (SLO) in a service managed using site
reliability engineering principles?
a. To specify the maximum acceptable level of error or downtime for the service.
b. To ensure that the service is always available to users.
c. To establish a minimum level of performance that the service must meet.
d. To identify the root cause of any service issues and address them proactively.
63. A DevOps Engineer is responsible for managing the Cloud Monitoring platform to
provide effective monitoring of their organization‘s applications and infrastructure in
the Google Cloud Platform (GCP). The engineer also needs to filter and share
dashboards with other teams to collaborate and ensure service uptime. Which of the
following actions will best help the engineer meet these requirements?
a. Create custom dashboards with the required metrics, filter dashboards based
on specific conditions, then export them as JSON files and email them to other
teams.
b. Create custom dashboards with the required metrics, filter dashboards based
on specific conditions, then create a URL to share the filtered view with other
teams. It allows the engineer to create custom dashboards, apply filters based
on specific conditions, and share a live, filtered view with other teams using a
URL. This approach enables teams to collaborate effectively with real-time data.
c. Create custom dashboards with the required metrics, filter dashboards based
on specific conditions, then take screenshots and share them with other
teams.
d. Create custom dashboards with the required metrics, filter dashboards based
on specific conditions, then grant IAM roles to other teams and ask them to
access the Google Cloud Console.
64. Which of the following Google Cloud services is designed to help DevOps teams to
automatically identify and remediate security risks in their infrastructure?
a. Cloud Armor
b. Cloud Security Command Center. Security Command Center is a comprehensive
security and data risk platform that helps DevOps teams to discover, manage,
and remediate security risks across their Google Cloud infrastructure. It
provides a centralized view of security findings, actionable recommendations,
and integrated remediation tools.
c. Cloud Security Scanner
d. Cloud Vulnerability Scanning
65. As a DevOps Engineer, you are tasked with managing the Cloud Logging platform and
viewing logs in the Google Cloud Console. Which of the following options is the most
effective method for configuring the Cloud Logging platform and logs retention policy,
while ensuring the easy accessibility and real-time analysis of log data in the Google
Cloud Console?
a. Create a custom logs-based metric, export logs to Google Cloud Storage, set
up the retention policy in Cloud Storage, and use the Metrics Explorer for real-
time analysis.
b. Export logs to BigQuery for retention, use Cloud Pub/Sub for real-time
streaming, and create a custom dashboard in the Google Cloud Console for log
analysis.
c. Configure the logs retention policy in Cloud Logging, use the Logs Explorer in
the Google Cloud Console to view logs, and enable Data Access logs for real-
time analysis. Configuring the logs retention policy in Cloud Logging ensures
efficient log management, using Logs Explorer in the Google Cloud Console
enables easy accessibility to logs, and enabling Data Access logs allows for real-
time analysis of log data.
d. Set up the retention policy directly in Cloud Logging, use Cloud Pub/Sub to
stream logs in real-time, and configure Cloud Logging to view logs in the
Google Cloud Console.
66. Let’s say your company currently has its containerized applications deployed in an on-premises Kubernetes cluster. There is a plan to deploy a similar environment in GCP, and the company is concerned about the amount of operational work needed to keep both environments in sync. Which of the following can be used to keep the Kubernetes environments in sync and provide centralized multi-cluster management?
a. CloudBuild.
b. Cloud Source Repositories.
c. Jenkins
d. Anthos
67. As a Cloud DevOps Engineer, you are responsible for managing infrastructure as code
and ensuring best practices for infrastructure code versioning in a GCP project. Which
of the following approaches will provide the most efficient way to maintain versioned
infrastructure code while also promoting collaboration, modularity, and traceability?
a. Use separate repositories for each environment, maintaining versioned
infrastructure code with Git branches.
b. Store infrastructure code in a single repository, using Terraform modules and
Git branches for versioning and managing changes across environments.
Storing infrastructure code in a single repository and using Terraform modules
alongside Git branches for versioning provides a balance between collaboration,
modularity, and traceability. This approach allows for efficient management of
changes across environments and promotes maintainability.
c. Store infrastructure code in a single monolithic repository, with each
environment in a separate branch.
d. Store infrastructure code in a centralized repository and use Git tags to version
the code for each environment.
68. As a Cloud DevOps Engineer, you have rolled out a canary release of your new
application on Google Kubernetes Engine (GKE) to test the new features. However,
you notice that there are unexpected errors that are causing disruptions in the system.
What is the best course of action to roll back the experimental canary release?
a. Delete all pods associated with the canary release.
b. Implement a new canary release with fixes for the errors.
c. Use Google Cloud Deployment Manager to switch back to the previous version
of the application.
d. Update the service object in Kubernetes to direct traffic back to the stable
pods. In a canary deployment, the traffic is managed at the Kubernetes service
level. The service object routes traffic to different sets of pods (stable vs canary).
To roll back the canary release, you would update the service object to direct
traffic away from the canary pods and back to the stable pods.
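Below is a hedged sketch of that rollback using the official kubernetes Python client, assuming the Service is named "web" in the "default" namespace and that stable and canary pods are distinguished by a track label; the same change can be applied with kubectl patch.

```python
# Hedged sketch: roll back a canary by pointing the Service selector at the stable pods
# again. Assumes pods are labelled track=stable / track=canary and the Service is named
# "web" in the "default" namespace; all names are hypothetical.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

core.patch_namespaced_service(
    name="web",
    namespace="default",
    body={"spec": {"selector": {"app": "web", "track": "stable"}}},
)
```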
69. As a DevOps Engineer, you have been tasked with optimizing resource utilization and
considering network pricing for a multi-region application in the Google Cloud
Platform. Given the following options, which two will help you achieve these
objectives?
a. Deploy all resources in a single region to minimize egress traffic
b. Use Cloud Interconnect for traffic between on-premises and GCP
c. Leverage Network Service Tiers for different traffic priorities. This is correct because Google Cloud offers two network service tiers: Premium and Standard. Premium Tier provides higher performance and availability, while Standard Tier offers lower cost for less performance-critical workloads. By using the appropriate tier for different traffic priorities, you can optimize network pricing and resource utilization.
d. Manually provision a fixed number of instances in each region
e. Utilize managed instance groups with autoscaling. This is correct because autoscaling managed instance groups automatically adjust the number of instances in the group based on demand, which optimizes resource utilization, helps to avoid over-provisioning, and minimizes costs.
70. You are crafting a postmortem for an incident that had a major impact on users, with the aim of preventing similar issues in the future. Which two sections should the postmortem include?
a. Steps to stop the incident from happening again.
b. A description of why the incident occurred.
c. Documents that include the plans for the services affected by the incident.
d. The names of personnel involved in the incident.
e. An evaluation of how serious the incident was in comparison to other
incidents.
71. You have a batch processing workload that requires hundreds of compute instances
to run every day for a few hours. The workload is not time-sensitive, and you want to
reduce costs as much as possible. Which of the following strategies should you use to
optimize resource utilization and manage preemptible VMs in Google Cloud Platform?
a. Use standard VMs and scale up and down manually based on the workload
demands.
b. Use managed instance groups with autoscaling and preemptible VMs. Using
managed instance groups with autoscaling and preemptible VMs would be the
most efficient and cost-effective option for this batch processing workload.
Managed instance groups allow for automatic scaling based on workload
demands, and preemptible VMs provide further cost savings without sacrificing
performance. The combination of the two also helps to ensure that there are
always enough instances available to complete the workload, even if some are
terminated prematurely.
c. Use preemptible VMs and scale up and down manually based on the workload
demands.
d. Use standard VMs and automate scaling based on the workload demands.
72. In a GKE-based environment, you are required to manage different development
environments and create environments dynamically per feature branch. Which of the
following strategies best achieves this requirement while maintaining efficient
resource utilization and optimal CI/CD pipeline integration?
a. Use a single GKE cluster with Kubernetes namespaces for each feature branch,
and utilize Skaffold for deploying and managing resources. Using a single GKE
cluster with Kubernetes namespaces provides an efficient way to isolate
different environments. Skaffold can help in deploying and managing resources
while integrating with CI/CD pipelines to create environments dynamically for
each feature branch.
b. Use a single GKE cluster with multiple node pools for each feature branch, and
label nodes accordingly.
c. Use individual GKE clusters for each feature branch and configure CI/CD
pipelines to deploy to the appropriate cluster.
d. Use multiple Kubernetes namespaces for each feature branch, with individual
GKE clusters per namespace.
73. You are working as a DevOps Engineer and have been tasked with managing and
optimizing logging in a Google Cloud Platform (GCP) project. You want to both exclude
certain logs from being ingested into Cloud Logging and export logs to an external
storage service for long-term analysis. Which combination of steps should you follow
to achieve these goals?
a. Create a logging exclusion filter for the logs you want to exclude, then create
a log sink for the logs you want to export.
b. Create a log sink for the logs you want to export, then create a logging
exclusion filter for the logs you want to exclude.
c. Create a logging exclusion filter for the logs you want to export, then create a
log sink for the logs you want to exclude.
d. Create a log sink for the logs you want to exclude, then create a logging
exclusion filter for the logs you want to export.
74. As a DevOps Engineer, you are responsible for implementing a CI/CD pipeline that
involves the deployment of multiple interdependent microservices on Google
Kubernetes Engine (GKE). To ensure efficient and reliable deployments, which of the
following strategies should you adopt to manage inter-service dependencies during
the deployment process?
a. Deploy all microservices simultaneously, regardless of their dependencies.
b. Implement separate CI/CD pipelines for each microservice and use Kubernetes
readiness and liveness probes to manage inter-service dependencies during
deployment. Implementing separate CI/CD pipelines for each microservice and
using Kubernetes readiness and liveness probes to manage inter-service
dependencies during deployment allows for better isolation, traceability, and
reliability. Readiness probes ensure that dependent services are available
before a service starts receiving traffic, and liveness probes monitor the health
of running services, restarting them if necessary. This approach promotes
efficient and reliable deployments while maintaining the advantages of a
microservices architecture.
c. Use a monolithic deployment approach by bundling all microservices into a
single container.
d. Use Kubernetes Init Containers to manage the order of deployment for
dependent microservices.
75. When retiring a service on Google Cloud Platform, which of the following is a
recommended best practice for managing the retirement process?
a. Removing the service without testing for any potential impacts on other
services in the same architecture.
b. Archiving all service data and logs for future reference.
c. Leaving the service running for an extended period of time to allow users to
migrate to a replacement service. When retiring a service, it is important to
consider the impact on users and other services. If the service is suddenly
retired, users may experience disruptions. Other services may also be affected
if they depend on the retired service.
To minimize the impact of retiring a service, it is important to give users and
other services time to migrate to a replacement service. This can be done by
leaving the retired service running for an extended period of time.
d. Retiring the service immediately without prior notice to minimize costs
76. When managing infrastructure as code in Google Cloud Platform, which of the
following options provides a declarative language to describe your infrastructure,
automates the creation and modification of resources, and supports version control?
a. Google Cloud Console
b. Google Cloud Build
c. Google Cloud Deployment Manager. When managing infrastructure as code in
Google Cloud Platform, it‘s important to use a tool that provides a declarative
language to describe your infrastructure, automates the creation and
modification of resources, and supports version control. The best option for this
in Google Cloud Platform is Google Cloud Deployment Manager. Google Cloud
Deployment Manager is a tool that allows you to define your infrastructure
using YAML or Python templates. It provides a declarative way to describe your
infrastructure, which means you don‘t have to worry about the underlying
implementation details. This makes it easier to manage your infrastructure as
code.
Other options are all powerful tools in their own right, but they do not provide
the same level of functionality as Google Cloud Deployment Manager when it
comes to managing infrastructure as code.
d. Google Cloud Functions
77. You support a large service with a defined SLO, and the development team carries out multiple weekly deployments of new releases. If a major incident causes the service to miss its SLO, you want the development team to prioritize improving service reliability over working on new features. What action should be taken before such an incident takes place?
a. Create a suitable error budget strategy in collaboration with all service
participants.
b. Reach an agreement with the product team to always prioritize service
dependability over introducing new features.
c. Install a plugin to your Jenkins pipeline that prevents new releases when your
service is out of SLO.
d. Negotiate with the development team to reduce the release frequency to no
more than once a week.
78. You are a DevOps Engineer responsible for a critical service experiencing an incident.
Your top priority is to mitigate the impact on users. Which of the following approaches
should you take to effectively reduce the impact on users?
a. Disable all non-critical features of the service to reduce the load on the system
and focus on core functionality.
b. Implement a new, untested fix in the production environment without
evaluating its potential impact.
c. Implement a load-shedding strategy to redirect traffic from the affected
service to a stable fallback service or a static error page.
d. Send mass notifications to all users, informing them about the incident and
asking them to stop using the service temporarily.
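One minimal form of the load-shedding strategy described in option (c) is capping in-flight requests inside the service and answering the overflow with a fallback response. Below is a sketch assuming Flask; the concurrency threshold and fallback body are purely illustrative.

```python
# Minimal load-shedding sketch: cap concurrent in-flight requests and shed the overflow
# with a 503 plus a static fallback body, protecting core functionality during an
# incident. The limit of 100 is illustrative.
import threading
from flask import Flask, g

app = Flask(__name__)
MAX_IN_FLIGHT = 100
_in_flight = 0
_lock = threading.Lock()

@app.before_request
def shed_load():
    global _in_flight
    with _lock:
        if _in_flight >= MAX_IN_FLIGHT:
            g.counted = False
            return "Service is degraded, please retry shortly.", 503
        _in_flight += 1
        g.counted = True

@app.teardown_request
def release_slot(exc):
    global _in_flight
    if getattr(g, "counted", False):
        with _lock:
            _in_flight -= 1

@app.route("/")
def index():
    return "core functionality", 200
```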
79. A DevOps Engineer is building a CI/CD pipeline in Google Cloud Platform using Cloud
Source Repositories. Which of the following statements about triggers for Cloud
Source Repositories is correct?
a. Cloud Source Repositories triggers can only be configured to execute pipelines
on a schedule, rather than on a commit or other event in the repository.
b. Cloud Source Repositories triggers can be configured to execute pipelines
when a new commit is made to the repository, as well as on a schedule.
c. Cloud Source Repositories triggers can only be configured to execute pipelines
when a new tag is pushed to the repository, and cannot be triggered on a
commit or on a schedule.
d. Cloud Source Repositories triggers can only be configured to execute pipelines
when a new commit is made to the repository, and cannot be triggered on a
schedule.
80. What does Cloud Run provide in GCP?
a. Object storage for unstructured data
b. Scalable and managed MySQL and PostgreSQL databases
c. A serverless compute platform
d. Fully managed Kubernetes service
81. Which GCP service provides a scalable and fully managed NoSQL database?
a. Cloud Datastore. Cloud Datastore is another NoSQL database service in GCP,
but it is not specifically designed for massive scale or high-performance
workloads like Cloud Bigtable.
b. Cloud SQL. Cloud SQL is a managed relational database service in GCP that
supports SQL-based databases like MySQL and PostgreSQL. It is not a NoSQL
database.
c. Cloud Bigtable. Cloud Bigtable is the GCP service that provides a scalable and
fully managed NoSQL database. It is designed to handle massive amounts of
structured and unstructured data with low latency and high throughput. Cloud
Bigtable is a highly scalable and performant database suitable for use cases that
require real-time, read/write access to large datasets.
d. Cloud Spanner. Cloud Spanner is a globally distributed, horizontally scalable
relational database service, rather than a NoSQL database.
82. Which GCP service is used for managed service mesh infrastructure?
a. Cloud Build
b. GKE
c. Container
d. Anthos. Anthos is the GCP service used for managed service mesh
infrastructure. It is a modern application management platform that allows you
to build, deploy, and operate applications across multiple environments,
including on-premises and multiple clouds. Anthos includes features for
managing service mesh, which is a dedicated infrastructure layer for controlling
and monitoring the communication between services in a microservices
architecture.
83. What is the purpose of Cloud Armor in GCP?
a. Securing network traffic. Cloud Armor is a service in GCP that provides security
for your applications and services by protecting against distributed denial-of-
service (DDoS) attacks and securing network traffic. It offers a web application
firewall (WAF) capability that allows you to define and enforce security policies
to protect your applications from malicious traffic and unauthorized access.
Cloud Armor uses Google‘s global infrastructure and security capabilities to
inspect and filter incoming traffic, blocking requests that violate the defined
security policies. It helps safeguard your applications and resources from
common web-based attacks and ensures the availability and integrity of your
services.
b. Analyzing log data
c. Storing and managing secrets
d. Creating and managing databases
84. Let’s say your team is developing a python application for a government agency. The
company has decided that the application should be deployed to App Engine Flexible
environment in GCP. There is a security requirement for collection of the application
logs. Which steps can you take to fulfill this requirement?
a. Enable Cloud Logging
b. Grant the Logs Writer role to App Engine
c. You can write structured logs as JSON objects serialized on a single line to
stdout or stderr
d. There is nothing to be done, App Engine automatically sends these logs to the
Cloud Logging agent.
e. Integrate the python logging module with Cloud Logging.
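Option (c) looks like the following in practice: each log entry is one JSON object on a single line written to stdout, and the Logging agent on App Engine flexible parses recognized fields such as severity and message into the corresponding LogEntry fields. The helper and field names below are illustrative.

```python
# Minimal sketch of structured logging to stdout: one JSON object per line, with
# "severity" and "message" parsed by the Logging agent on App Engine flexible.
import json
import sys

def log(severity: str, message: str, **fields) -> None:
    entry = {"severity": severity, "message": message, **fields}
    print(json.dumps(entry), file=sys.stdout, flush=True)

log("INFO", "request processed", request_id="abc123")
log("ERROR", "downstream call failed", request_id="abc123")
```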
85. Let’s say your team has developed and tested a video processing service for your
company. The video service accepts videos in one format and converts it to another
specified format. Your team has agreed on the indicator metrics to track the
performance of the system. All stakeholders of the application have agreed on a
minimum target value, within a rolling 4-week window, for the indicator metric used
to measure the service. What is needed to guarantee a level of service to the customer
with consequences for missing it?
a. Create a Service Level Agreement and share with the users
b. Create an Error Budget and share with the users.
c. Create a Service Level Indicator and share with the users
d. Create a Service Level Objective and share with the users.
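Whatever artifact is shared with customers, the numbers behind it follow from the agreed target and the rolling 4-week window. A small worked example with an assumed 99.9% availability target:

```python
# Worked example (assumed numbers): a 99.9% availability target over a rolling 4-week
# window leaves an error budget of 0.1% of the window.
slo_target = 0.999
window_minutes = 28 * 24 * 60                      # 40,320 minutes in 4 weeks
error_budget_minutes = (1 - slo_target) * window_minutes
print(f"Allowed unavailability: {error_budget_minutes:.1f} minutes per 4 weeks")
# -> Allowed unavailability: 40.3 minutes per 4 weeks
```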
86. Let’s say your Site Reliability Engineering (SRE) team members are managing the CI/CD of your organization. The organization uses GCP Projects to separate environments. The pipeline consists of Cloud Source Repositories, Cloud Build and Spinnaker. There is a security requirement to send the logs of Cloud Build in the Production Project to a user-created bucket in a Project designated for logs. Which step can you take to achieve this?
a. Grant the Cloud Build Service account of the Production Project the Project
Viewer role in the logging Project
b. Grant the Cloud Build Service account of the Production Project the Storage
Admin role in the Production Project.
c. Grant the Cloud Build Service account of the Production Project the Project
Viewer role in the Production Project
d. Grant the Cloud Build Service account of the Production Project the Storage
Admin role in the logging Project.
87. Let’s say your company has decided to migrate from on-premises to Google Cloud. The
first environment to be migrated is the development and testing environments.
Currently each environment is fully documented, consists of a network with 3 subnets,
several firewall rules, routes, VMs, Storage, Databases and DNS. The environments
need to be consistent and immutable. Following best practice, how would you deploy
the environments and make them reproducible with little overhead?
a. Create the environment as code using python in a Cloud Function. Assign
variables to values that are unique across environments
b. Create the resources individually in the console following the documentation
provided.
c. Divide the environment amongst experienced engineers, who will deploy
them and be responsible for the environment’s reproduction.
d. Create the resources individually in CLI following the documentation provided
e. Create the environment as code using Deployment manager or Terraform
templates. Assign variables to values that are unique across environments.
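A hedged sketch of the recommended option using a Deployment Manager Python template is shown below: the same template is reused for every environment, and values that differ per environment (zone, machine type, subnet) arrive through context.properties from each environment's config file. Resource details are trimmed for brevity and all property names are illustrative.

```python
# Hedged sketch of a Deployment Manager Python template. The deployment's YAML config
# imports this template and supplies environment-specific properties (zone, machineType,
# subnet), so dev and test are reproduced consistently from the same code.
def GenerateConfig(context):
    zone = context.properties["zone"]
    resources = [{
        "name": context.env["deployment"] + "-app-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "zones/{}/machineTypes/{}".format(
                zone, context.properties["machineType"]),
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": "projects/debian-cloud/global/images/family/debian-11"
                },
            }],
            "networkInterfaces": [{
                "subnetwork": context.properties["subnet"],
            }],
        },
    }]
    return {"resources": resources}
```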
88. Which GCP service is used for managing and securing user identities?
a. Cloud KMS
b. Cloud Identity-Aware Proxy. Cloud Identity-Aware Proxy is a service in GCP that
provides secure access to web applications. It focuses on controlling access to
web applications based on user identity, but it is not specifically designed for
managing and securing user identities.
c. Cloud IAM. Cloud IAM (Identity and Access Management) is the GCP service
used for managing and securing user identities. It provides centralized access
control and permissions management for GCP resources. With Cloud IAM, you
can define and manage fine-grained access policies, assign roles to users and
groups, and control who has access to specific resources and actions within your
GCP projects.
d. Cloud Armor
89. What is the purpose of Cloud Composer in GCP?
a. Analyzing and visualizing data
b. Creating and managing virtual machines
c. Monitoring and logging application performance
d. Orchestrating and managing workflows and pipelines. Cloud Composer is a
GCP service that provides workflow orchestration and pipeline management
capabilities. It is based on the open-source Apache Airflow project and allows
you to author, schedule, and monitor complex workflows involving multiple
tasks, dependencies, and data processing operations.
With Cloud Composer, you can define workflows using Python or the Airflow
DSL (domain-specific language), create directed acyclic graphs (DAGs) to
represent dependencies between tasks and execute them based on predefined
schedules or event triggers. It enables you to automate and manage the
execution of data pipelines, ETL (extract, transform, load) processes, and other
types of workflows.
90. Which GCP service is used for managing and analyzing time-series data?
a. Cloud Spanner
b. Cloud Monitoring. Cloud Monitoring is the GCP service used for managing and
analyzing time-series data. It allows you to collect, visualize, and alert on metrics
and logs from your GCP resources and applications. Cloud Monitoring provides
a unified view of your system‘s performance and availability, enabling you to
monitor and analyze time-series data for metrics such as CPU utilization,
latency, error rates, and more.
c. Cloud Dataflow
d. Cloud Firestore
91. Which GCP service can be used to automatically scale virtual machine instances based
on defined criteria?
a. Cloud Spanner
b. Cloud Pub/Sub
c. Compute Engine Autoscaler. Compute Engine Autoscaler is the GCP service
used to automatically scale virtual machine instances based on defined criteria.
It allows you to specify scaling policies that determine when to add or remove
instances based on factors like CPU utilization, load balancing capacity, or
custom metrics.
d. Cloud SQL
92. Which GCP service can be used for creating and managing virtual networks?
a. Cloud VPN. Cloud VPN is a service in GCP that provides secure and encrypted
connectivity between your on-premises network and a VPC in the cloud. It is
used for establishing site-to-site VPN connections.
b. Cloud DNS
c. Virtual Private Cloud (VPC). Virtual Private Cloud (VPC) is the GCP service used
for creating and managing virtual networks. It allows you to define and control
a logically isolated virtual network environment in the cloud. Within a VPC, you
can create subnets, configure IP addresses, set up firewall rules, and connect
resources such as virtual machines and load balancers.
d. Cloud Interconnect. Cloud Interconnect is a service in GCP that allows you to
establish dedicated and high-bandwidth network connections between your on-
premises network and a VPC in the cloud. It is used for creating private
connections that bypass the public internet.
Therefore, the correct answer is Virtual Private Cloud (VPC) as it is the GCP
service specifically used for creating and managing virtual networks.
93. What is the role of Cloud CDN in GCP?
a. Deliver content to users with low latency. Cloud CDN (Content Delivery
Network) in GCP is a service that helps deliver content to users with low latency
and high availability. It works by caching content at edge locations strategically
placed around the world, closer to the end-users. When a user requests content,
Cloud CDN serves it from the nearest edge location, reducing latency and
improving the overall user experience.
b. Store and analyze large datasets
c. Securely manage user identities
d. Monitor and analyze network traffic
94. What does Cloud Load Balancing in GCP provide?
a. Improved network security. Improved network security and automatic cost
optimization are not the primary objectives of Cloud Load Balancing, although
it can indirectly contribute to improved network security by distributing traffic
and minimizing the risk of overload or single points of failure.
b. Scalability for virtual machines
c. Automatic cost optimization
d. High availability for services. Cloud Load Balancing in GCP provides high
availability for services by distributing incoming traffic across multiple instances
or backends. It helps ensure that your applications and services are highly
available and can handle increased traffic loads by evenly distributing requests.
This load balancing capability helps improve the overall performance and
reliability of your applications.
While scalability for virtual machines is an important aspect of load balancing,
it is not the sole purpose of Cloud Load Balancing. It focuses on distributing
traffic rather than specifically scaling virtual machines.