VMMIG - Module05 - Optimize Phase

The Optimize Phase

Assess/Discover your application landscape -> Plan/Foundation: create a landing zone -> Migrate! Pick a path to the cloud and get started -> Optimize your operations and save on costs

Cloud migration is the journey: the end-to-end lifecycle whereby things move from other locations (on-prem, other clouds) into GCP. GCP is the destination these things migrate to, and they are often modernized/optimized in-cloud afterwards.
Learn how to...
Leverage image and configuration management solutions

Enable autoscaling and rolling updates

Provide high-availability and disaster recovery solutions

Consolidate and simplify network and security settings

Select managed services to replace migrated workloads

Optimize costs

Migrate VMs directly into containers with Migrate for Anthos

Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
The Optimize phase is where it gets cloudy

Having moved your workloads, you can now update to fully exploit the cloud.

Input:
● Migrated VMs
● Business objectives
● Relationships with app owners

Activities:
● Update and prioritize backlog of optimizations
● Design and test strategy for specific optimizations
● Implement optimizations

Output:
● Updated workloads
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Basic image management scheme

Base: OS install / GCE public image
Period 1: Hardened OS image
Period 2: Platform image
Period 3: App image

https://cloud.google.com/solutions/image-management-best-practices

Start with a base OS installation, or if building images for GCP, start with a public boot
image

Periodically, take the base image and harden it by removing services, changing settings, installing security components, etc. Build the hardened image every 90 days, or at whatever frequency makes sense for the organization. This becomes the basis of subsequent builds.

More frequently, build platform-specific images. One image for web servers, one for
application servers, one for databases, etc. Build this image maybe every 30 days.

As frequently as you build an app, create new VM images for the new versions of the
application. You might create new application images on a daily basis.
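To make the cadence concrete, a minimal sketch (all names hypothetical): after hardening a VM that was booted from a public image, capture its disk as the new hardened base and publish it to an image family, so downstream builds always pick up the latest version.

# Capture the hardened VM's disk as a versioned image in a family
gcloud compute images create hardened-base-v20200101 \
    --source-disk=hardener-vm \
    --source-disk-zone=us-central1-a \
    --family=hardened-base

# Platform builds then boot from the newest image in the family
gcloud compute instances create platform-builder \
    --image-family=hardened-base \
    --zone=us-central1-a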
Goals for image management

(Diagram: post-migration, bespoke servers 1-4 on Compute Engine, each managed as a unique asset -> consistent servers built from versioned boot images (v1.0, v1.1) -> Managed Instance Groups with autoscaling and rolling updates.)

After migration, you have servers with independent configurations. They may, or may
not, be managed with a configuration management solution. However, each is
managed as a unique asset.

By updating the servers to all use a consistent base image, you ensure uniform
configuration across multiple instances. You also make it possible to combine like
servers into managed instance groups. This provides benefits such as:
- Health checks
- Ability to resize the cluster easily
- Autoscaling (for workloads that will scale horizontally)
- A cloud-native approach to VM updates - that is, use of immutable images.
This, combined with the rolling update feature of MIGs, makes rolling out new
versions easy.
Organizational maturity

(Spectrum of maturity: robust image factory -> core image library, e.g., a web image and a DB image -> "What's an image?")

Customers may or may not have well-developed practices in place for creating and managing VM images for on-prem or AWS deployments.

Robust systems will be run much like a standard DevOps pipeline. Commits to a code
base will trigger build jobs, which will create/test/deploy images. The image building
tool can leverage configuration management systems to automate the configuration of
the image.

Many customers will have some version of the second option, with a set of images
that may be built manually or with partial automation. They don't get built as often,
and certainly not daily.

Some customers will have hand-crafted servers, and have no existing process in
place for creating/baking images.
GCP Images

(Diagram: three starting points — public image, on-prem image definition, migrated VM disk — each producing a baked image.)
Baked image Baked image Baked image

https://cloud.google.com/solutions/image-management-best-practices

There are three main approaches to creating GCP boot images that can be used for
managed instance groups.

Best: Build an image from the ground up, starting with a public image. Develop a
clean CI/CD pipeline for generating these images, using tools like Packer and
Chef/Puppet/Ansible.

Good: Use existing image-generation pipelines and produce output images for GCP.
Tools like Packer and Vagrant that are being used to produce VMware images can
also output these images for use with GCP.

Not-so-good (some would say bad): Take the migrated VM's disk and create an image. Then manually prune and tailor the image.
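A sketch of that last path (instance and image names hypothetical): stop the migrated VM, capture its boot disk as an image, and then prune and tailor from there.

# Stop the VM so the disk can be imaged cleanly
gcloud compute instances stop migrated-vm --zone=us-central1-a

# Create an image from the migrated VM's boot disk
gcloud compute images create migrated-vm-image \
    --source-disk=migrated-vm \
    --source-disk-zone=us-central1-a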
How much configuration is baked in?

Base image, configuration on boot -> Major components in image -> Everything in image

https://cloud.google.com/solutions/image-management-best-practices

There are many variables that go into deciding how much you bake into an image:

- How mature is the organization when it comes to building images frequently and efficiently? (more mature -> bake in more)
- How long does it take to install the necessary components so your app is functional? (longer install times -> bake in more)
- To what extent do you want to move away from in-place upgrades to immutable images and machine replacement? (away from in-place -> bake in more)
Demo: Image factory with Packer and GCP

https://cloud.google.com/community/tutorials/create-cloud-build-image-factory-using-p
acker

You will need to make a few tweaks to the demo to get it to work:

1. In the set the variable section, add these two commands


a. ZONE=us-east1-b
b. ACCOUNT=$(gcloud config get-value account)
2. In the create the build trigger section
a. Change the _IMAGE_ZONE variable to us-east1-b
b. For Tag, select the .* (any tag) entry from the dropdown
3. In the add your repository and push section
a. Before step 1, edit the config.yaml file - insert a line 20 that says
"disk_size" : 20,

The image creation process takes about 5 minutes. While waiting, take time to explain
how Cloud Build works.

Another example can be found here:


https://cloud.google.com/solutions/automated-build-images-with-jenkins-kubernetes

Packer can't SSH in successfully on instances where OS-Login is enabled. Make sure
the metadata to enable this feature is not set on the project where you are demoing.
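One hedged way to verify before the demo: list the project metadata and, if the enable-oslogin key is set, remove it.

# Inspect project-wide metadata for enable-oslogin
gcloud compute project-info describe \
    --format="value(commonInstanceMetadata.items)"

# Remove the key if present so Packer can SSH in
gcloud compute project-info remove-metadata --keys=enable-oslogin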
Optimizing for configuration management
(Diagram: two topologies. Left: an on-prem CM server manages on-prem VMs, while a CM server on Compute Engine manages the migrated VMs. Right: a single on-prem CM server manages both on-prem VMs and migrated VMs across the interconnect, subject to routes, firewall rules, and bandwidth. In both cases, disable network, authorization, and firewall management in the playbooks.)
https://cloud.google.com/solutions/configuration-management/

As noted in the module on the Plan phase, companies should really have
configuration management for their on-prem assets in place prior to migrating VMs
into the cloud.

When extending infrastructure into the cloud, one common approach is to place CM
servers in the cloud as well. You configure the on-prem servers to manage the
on-prem inventory, and the cloud servers to manage the cloud inventory. You then
have either separate playbooks for the different environments, or adaptable playbooks
that use environment-specific variables or context to perform slightly different
configuration depending on whether the VM is in the cloud or on-prem.

An alternative approach is to leave the CM infrastructure on-prem and have configuration management orchestration happen across the interconnect. This approach is affected by latency, available bandwidth, and network access.

For VMs migrated into GCP, you'll want to remove the normal CM commands that
configure network, firewall, and authorization settings as they will be managed
differently in GCP.
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing for scaling and release management

● Managed Instance Groups offer…


○ Health checks
○ Autoscaling
○ Rolling updates and restarts
○ A/B testing, canary releases

● Stateless apps lend themselves to horizontal scaling


● Some stateful apps are not too difficult to refactor (move state off
server)
● Apps with licensing restrictions, MAC address hard coding, complex
state aren't good candidates for autoscaling

https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-manag
ed-instance-groups

GCE Managed Instance Groups provide a mechanism for updating instances by replacing running VMs with new VMs built from a new image. With this scheme, significant changes (and perhaps all changes) are accomplished by replacing instances rather than by using configuration management processes to update the software on the server. The Rolling Update feature allows zero-downtime upgrades to instance groups, and can support A/B testing, canary releases, and rollback.

In addition to the ability to accommodate scaling out horizontally, you need to consider scaling back, or deleting instances. Applications that have long-lived sessions, or long-running processes, might not scale down as expected. There can be other reasons that applications don't tolerate removal of instances well.
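A hedged sketch of the rolling update flow (group and template names hypothetical): point the MIG at a new instance template built from the new image, and let the updater replace VMs gradually with no capacity loss.

gcloud compute instance-groups managed rolling-action start-update app-mig \
    --version=template=app-template-v1-1 \
    --zone=us-central1-a \
    --max-surge=3 \
    --max-unavailable=0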
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing for high availability

● Distribute workloads across zones
○ Regional MIGs
○ Load balancing
● Potentially distribute across regions
● Use resilient data stores
○ GCS is inherently HA
○ GCS offers multi-regional buckets
○ Managed services are often HA

(Diagram: load balancing in front of an app distributed across Zones A, B, and C of a region, backed by a multi-regional Cloud Storage bucket.)

https://cloud.google.com/docs/enterprise/best-practices-for-enterprise-organizations#
high-availability
https://cloud.google.com/docs/geography-and-regions

Regional instance groups distribute instances created from your template across zones. If a zone goes down, the instances in other zones remain available; however, the regional MIG will not automatically create replacement instances in the remaining zones (unless autoscaling is enabled). An alternative is to use multiple zonal managed instance groups.

Google typically recommends single-region deployments as being sufficient for


achieving high availability. Multi-region deployments do increase availability, but can
significantly increase costs due to network fees, and introduce other challenges
based on your application design. More often, deploying across regions is motivated
by a desire to place the app near consumers to reduce latency and improve
performance. It is also a strategy for disaster recovery situations.

Google's database managed services all offer high-availability options.

Not mentioned on slide, but also important, is ensuring you have a high-availability
interconnect between GCP and your on-premises networks. This should have been
handled during the Plan phase.
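A minimal sketch (names hypothetical) of a regional MIG with autohealing, which spreads instances across zones in the region:

gcloud compute instance-groups managed create web-mig \
    --region=us-central1 \
    --template=web-template \
    --size=3 \
    --health-check=web-health-check \
    --initial-delay=300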
Optimizing for disaster recovery

● Your strategy depends on your


○ Recovery time objective
○ Recovery point objective

● The lower the tolerance for loss, the higher the cost and complexity
● Options include…
○ Cold: rebuild app in another region
○ Warm: unused app in another region
○ Hot: app runs across regions

https://cloud.google.com/solutions/dr-scenarios-planning-guide
DR: Cold pattern

(Diagram: Cloud Load Balancing and Cloud DNS in front of the app. Region 1 runs the app on Compute Engine and the app DB on Cloud SQL; Region 2 is empty but can be rebuilt from Deployment Manager templates. Database backups are stored in a multi-regional Cloud Storage bucket.)

https://cloud.google.com/solutions/dr-scenarios-planning-guide

The original environment is deployed using Infrastructure as Code (IaC). The app is
implemented using a managed instance group and instance templates. The database
is backed up periodically to a multiregional bucket.

If a region fails, the application can be redeployed fairly quickly into a new region, the database can be restored from the latest backup, and the load balancer can be reconfigured with a new backend service.

RTO is bounded typically by the time required to restore the DB. RPO is bounded by
how frequently you perform database backups.
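A hedged sketch of the two halves of this pattern (instance names and BACKUP_ID are hypothetical placeholders): take scheduled backups in normal operation, and restore the latest one into a replacement instance after a failure.

# Normal operation: back up the primary database
gcloud sql backups create --instance=app-db

# After a regional failure: find the latest backup and restore it
# into the replacement instance in the surviving region
gcloud sql backups list --instance=app-db
gcloud sql backups restore BACKUP_ID \
    --restore-instance=app-db-dr \
    --backup-instance=app-db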
DR: Warm pattern

(Diagram: Cloud Load Balancing and Cloud DNS in front of the app. Region 1 runs the app on Compute Engine and the primary DB on Cloud SQL; Region 2 runs a smaller, idle app deployment and a Cloud SQL replica receiving replication traffic.)

https://cloud.google.com/solutions/dr-scenarios-planning-guide

App deployments are made into multiple regions, but failover regions have smaller application MIGs which don't serve traffic. A DB replica is created in the failover region; it receives replication traffic from the DB master, keeping it nearly up-to-date.

In the case of failure, update the load balancer to include the region 2 instance group as a backend, increase the size of the instance group, and point the app to the replica (this could be done via DNS changes, or by placing a load balancer in front of the DB and changing the load balancing configuration).

This design reduces the RTO and RPO significantly. However, it does introduce
cross-regional replication costs.
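A hedged failover sketch (names hypothetical): promote the replica to a standalone primary, then grow the failover region's instance group to production size.

# Promote the read replica to a standalone primary
gcloud sql instances promote-replica app-db-replica

# Scale the failover region's MIG up to serve production traffic
gcloud compute instance-groups managed resize app-mig-region2 \
    --region=us-east1 \
    --size=10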
DR: Hot pattern

(Diagram: Cloud Load Balancing and Cloud DNS in front of the app. Regions 1 and 2 both run the app on Compute Engine, backed by a single Cloud Spanner database that replicates across regions.)

https://cloud.google.com/solutions/dr-scenarios-planning-guide

App deployment occurs in multiple regions. The load balancer does geo-aware
routing of requests to the nearest region. The backing database service, Spanner,
handles replication across regions.

If a region goes down, the application continues to operate without interruption.


Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing routes and firewall rules

● After many sprints, there is often an untidy collection of routes and firewall rules
○ Best practice is to consolidate and simplify
○ Use Security Command Center to discover routes and rules
○ Consider Forseti for firewall rule scanning
● Google recommends service account-based firewall rules (see the example after the links below)

https://www.youtube.com/watch?v=1ibeCQjjpBw&autoplay=1
https://forsetisecurity.org/about/
https://cloud.google.com/vpc/docs/firewalls#service-accounts-vs-tags
https://cloud.google.com/blog/products/gcp/simplify-cloud-vpc-firewall-management-w
ith-service-accounts
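For example, a service-account-based rule might look like this (account, project, and network names hypothetical): allow web servers to reach database servers on port 3306, regardless of tags or IP ranges.

gcloud compute firewall-rules create web-to-db \
    --network=prod-vpc \
    --allow=tcp:3306 \
    --source-service-accounts=web-sa@my-project.iam.gserviceaccount.com \
    --target-service-accounts=db-sa@my-project.iam.gserviceaccount.com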
Optimizing load balancing

● GCP load balancers offer high performance and high availability
● Proxy-based load balancers offer cross-regional routing
● Hybrid load balancing can be achieved with round-robin and/or weighted DNS

(Diagram: DNS resolving to both a GCP HTTPS load balancer fronting an app on Compute Engine and an on-premises load balancer fronting an on-premises app.)
https://cloud.google.com/load-balancing/
Optimizing security at the edge

● Google's proxy-based load balancers protect against DDoS attacks
○ SYN floods
○ IP fragment floods
○ Port exhaustion
○ Etc.
● Cloud Armor provides additional controls to secure HTTP(S) load balancing
○ IP deny/allow lists
○ Geo-based access control (alpha)
○ L3-L7 parameter-based rules (alpha)

https://cloud.google.com/files/GCPDDoSprotection-04122016.pdf
https://cloud.google.com/armor/
Optimizing secret management
● Cloud-native secret management solutions make managing secrets for GCP instances easier
● VM service accounts can be used by apps when calling Google services
● Cloud KMS provides a means for IAM-based encryption/decryption of secrets
● Cloud HSM does the same

(Diagram: a migrated VM calling Google APIs with its service account, and a migrated VM calling a 3rd-party service after asking Cloud KMS to decrypt a GCS-stored secret.)

https://cloud.google.com/compute/docs/access/create-enable-service-accounts-for-in
stances
https://cloud.google.com/kms/
https://cloud.google.com/hsm/
https://cloud.google.com/kms/docs/encrypt-decrypt

Apps running on an instance can use the VM-assigned service account via the Cloud Client Libraries; credentials are provided to the application via the metadata server.

Apps can leverage Cloud KMS to decrypt secrets that are stored either in app configuration or in GCS. The application operates within the context of a service account. That account has permission to use a given key, and that key is used by Cloud KMS to encrypt/decrypt secrets. The diagram shows a secret stored in GCS; the app reads the file and asks KMS to decrypt it.
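A minimal sketch of the GCS-plus-KMS flow (keyring, key, and bucket names hypothetical): encrypt the secret once and stage it in a bucket; at runtime the app downloads and decrypts it under its service account's IAM permissions.

# One-time: encrypt the secret and stage it in GCS
gcloud kms encrypt \
    --location=global --keyring=app-keyring --key=app-secrets \
    --plaintext-file=db-password.txt --ciphertext-file=db-password.enc
gsutil cp db-password.enc gs://my-app-secrets/

# On the instance: fetch and decrypt
gsutil cp gs://my-app-secrets/db-password.enc .
gcloud kms decrypt \
    --location=global --keyring=app-keyring --key=app-secrets \
    --ciphertext-file=db-password.enc --plaintext-file=db-password.txt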
Optimizing IAM configurations

● After many sprints, there is often an untidy collection of IAM role assignments
● APIs make it possible to write tools that will extract role assignments and definitions for analysis
○ Possible to place in BigQuery for resultant-set-of-policy reporting (see the sketch after the link below)
● New Policy Intelligence tools aid in IAM cleanup
○ Recommender uses machine learning to identify over-permissioning
○ Analyzer answers questions like "who can access this resource?"
○ Troubleshooter answers questions like "why can't Bob access this resource?"
○ Simulator reports the impact of a proposed policy change

https://cloud.google.com/policy-intelligence/
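A hedged sketch of the extraction step (dataset and table names hypothetical): dump a project's role bindings as newline-delimited JSON and load them into BigQuery for analysis.

# Flatten the bindings array to one JSON object per line
gcloud projects get-iam-policy my-project --format=json \
    | jq -c '.bindings[]' > bindings.json

# Load into BigQuery for resultant-set-of-policy queries
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect \
    audit_ds.iam_bindings bindings.json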
Logging and monitoring for security

● Google services write logging and metrics data into Cloud Logging
○ VMs with agents installed write guest OS and application data as well
○ This data can flow through to your logging/monitoring tools of choice

● Enable logging and create dashboards and alerts for new, cloud-native signals (example filters after the links below)
○ Changes to IAM role definitions
○ IAM role assignments
○ Firewall rule logging
○ VPC flow logs
○ Etc.
● If you are using other logging/monitoring solutions, you'll likely need cloud-based aggregators

https://cloud.google.com/logging/docs/export/
https://cloud.google.com/solutions/exporting-stackdriver-logging-for-splunk
https://www.splunk.com/blog/2016/03/23/announcing-splunk-add-on-for-google-cloud-
platform-gcp-at-gcpnext16.html
https://resources.netskope.com/cloud-security-collateral-2/netskope-for-google-cloud-
platform
https://help.sumologic.com/03Send-Data/Sources/02Sources-for-Hosted-Collectors/G
oogle-Cloud-Platform-Source
https://cloud.google.com/logging/docs/audit/
https://cloud.google.com/vpc/docs/using-flow-logs
https://cloud.google.com/vpc/docs/firewall-rules-logging
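For example (hedged; filters simplified), you can test such signals from the command line before wiring up alerts:

# IAM policy changes
gcloud logging read 'protoPayload.methodName="SetIamPolicy"' --limit=20

# Firewall rule changes (method names contain "compute.firewalls")
gcloud logging read 'protoPayload.methodName:"compute.firewalls"' --limit=20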
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Optimizing with managed services

● One common way to optimize your applications is to replace VM-based parts of the architecture with managed services
○ MySQL -> Cloud SQL, Spanner
○ HBase -> Bigtable
○ Kafka -> Cloud Pub/Sub
○ Hadoop/Spark -> Dataproc
● Leveraging managed services has multiple benefits
○ Much reduced administrative overhead
○ Potentially better availability and scalability
○ Increased functionality
Managed services behave differently

GCP Service | VM-based solution | Things to know...
Cloud SQL | MySQL | Cost, not on your VPC, see differences and issues docs
Pub/Sub | Kafka | Messaging only, no ordering, at-least-once delivery, different latencies, general architecture, pay by volume
Dataproc | Hadoop/Spark | Cost for large persistent clusters, GCS performance characteristics, workflows, configuration mechanisms
Memorystore | Redis | Failover period not configurable, no persistence, no support for user modules

https://cloud.google.com/sql/docs/mysql/features#differences
https://cloud.google.com/sql/faq

Cloud SQL costs roughly 2x the cost of un-managed MySQL running on a VM. Cloud
SQL VMs are not on a VPC in the project; they are accessed via peering or public IP.

https://cloud.google.com/pubsub/architecture
https://cloud.google.com/pubsub/docs/faq
https://cloud.google.com/pubsub/docs/ordering
https://cloud.google.com/pubsub/pricing

https://cloud.google.com/dataproc/pricing
https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage
https://cloud.google.com/dataproc/docs/resources/faq

Dataproc's $0.01/vcpu/hr. charge adds up on very large clusters that are long lived.

In general, the online documentation does a good job of detailing key issues. Review the concepts section, the known issues section, and the pricing. Also, Googling "<gcp product> vs. <other product>" often yields good initial results.
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Make sure you tailor instance sizes to real needs

● You chose VM sizes during initial planning and migration
● After the workloads have been in production for a while, you should review and evaluate the real usage
○ Cloud Monitoring is your friend
● Have at least one phase where you go through an additional right-sizing pass
○ Note that resizing a VM does entail downtime
○ With Managed Instance Groups, you can do this with rolling updates
● GCP offers sizing recommendations
○ Based on monitoring metrics over 8 days
○ Cloud Monitoring agent improves recommendations

API access to recommendations is coming soon!

Sizing recommendations are currently not available for: VM instances created using
App Engine Flexible Environment, Cloud Dataflow, or Google Kubernetes Engine; or
VM instances with ephemeral disks, GPUs, or TPUs.

The sizing recommendation algorithm is suited to workloads that follow weekly


patterns, workloads that grow or shrink over weeks of time, workloads that
persistently underutilize their resources, or workloads that are persistently throttled by
insufficient resources. In such cases, 8 days of historical data is enough to predict
how a change in the size of the machine can improve resource utilization.

The sizing recommendation algorithm is less suited to workloads that spike less
frequently (for example, monthly spikes) because 8 days of data is not enough to
capture or predict the processing fluctuations.

https://cloud.google.com/compute/docs/instances/apply-sizing-recommendations-for-i
nstances
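Applying a recommendation to a standalone VM is a stop/resize/start cycle; a hedged sketch (instance name and machine type hypothetical):

gcloud compute instances stop over-sized-vm --zone=us-central1-a
gcloud compute instances set-machine-type over-sized-vm \
    --zone=us-central1-a --machine-type=n1-standard-2
gcloud compute instances start over-sized-vm --zone=us-central1-a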
There's more to TCO than VM costs

● Persistent Disk costs
● Network egress costs
● Intra-VPC traffic costs
● Load balancer costs

BigQuery and billing exports are your friends

https://cloud.google.com/billing/docs/how-to/export-data-bigquery

BigQuery is hugely useful for analyzing billing data. It can be used to find large,
and potentially unexpected, sources of cost - which can then be optimized.
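A hedged example query (the table name follows the billing export naming convention but is hypothetical): the ten most expensive services in a given invoice month.

bq query --use_legacy_sql=false '
SELECT service.description AS service, ROUND(SUM(cost), 2) AS total_cost
FROM `my-project.billing_ds.gcp_billing_export_v1_XXXXXX`
WHERE invoice.month = "202001"
GROUP BY service
ORDER BY total_cost DESC
LIMIT 10'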
Watch network costs

● Look for intra-VPC network costs
○ Consider moving VMs into the same zone or region
● Avoid VPC egress
○ Place VMs that exchange traffic on the same VPC, or use peering
● Consider standard tier networking
○ Consider latency and reliability tradeoffs

Remember that traffic transferred within a VPC, but across zones or regions incurs
costs (in addition to the more obvious VPC egress).

https://cloud.google.com/vpc/docs/vpc-peering
https://cloud.google.com/vpc/docs/shared-vpc
https://cloud.google.com/network-tiers/
Working with budgets

● GCP has a mechanism for setting budgets
○ Can provide alerts when reaching % of budget
● GCP also offers programmatic budget notifications (see the sketch after the links below)
○ Sends messages via Pub/Sub
○ Cloud Functions are an easy way to receive and act on messages
○ Theoretically, you could disable runaway services
● BigQuery and App Engine offer cost controls
○ Can stop use of product after $X spend

https://cloud.google.com/billing/docs/how-to/budgets
https://cloud.google.com/billing/docs/how-to/notify
https://cloud.google.com/bigquery/docs/custom-quotas
https://cloud.google.com/appengine/pricing#spending_limit
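A hedged sketch using recent gcloud releases (account ID, project, and topic are hypothetical): create a $1000 monthly budget with alerts at 50%, 90%, and 100%, publishing notifications to Pub/Sub for programmatic handling.

gcloud billing budgets create \
    --billing-account=XXXXXX-XXXXXX-XXXXXX \
    --display-name="prod-budget" \
    --budget-amount=1000USD \
    --threshold-rule=percent=0.5 \
    --threshold-rule=percent=0.9 \
    --threshold-rule=percent=1.0 \
    --notifications-rule-pubsub-topic=projects/my-project/topics/budget-alerts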
Lab 13
Defining an optimization strategy
Agenda
Introduction
Image strategies and configuration management
Managed Instance Groups
Availability and disaster recovery
Networking and security consolidation
Managed services
Cost optimization
Migrate for Anthos (VMs to containers)
Moving VMs into containers

Why Kubernetes/GKE?

Secure kernel

Density

Resiliency

Modernization

Experience with desired end state

GKE is simply the best K8s experience

● Google offers automatic updates, which keeps the kernel on the machines running your apps secure.
● You can run more apps on a given host for better resource utilization.
● If a node goes down, workloads are quickly rescheduled to another node.
● Istio, for example, makes service discovery, traffic splitting, authorization, circuit-breaker patterns, and other features easy to implement without having to rewrite apps.
● Teams can get experience using GKE and K8s without having to totally re-engineer their apps.
Migrate for Anthos moves VMs to containers

● Move and convert workloads into containers
○ Automated
○ Easy path from existing app to k8s container
● Workloads can start as physical servers or VMs
○ On-prem or in other cloud providers
● Builds on Migrate for Compute Engine (Velostrata) tech
○ Not required for GCE -> GKE migrations
● Moves workload compute to container immediately (<10 min)
● Data can be migrated all at once or "streamed" to cloud until app is live in cloud
Operational Tiers
Processing
● GKE on GCP or Anthos GKE on-prem
● Runs Migrate for Anthos components to generate artifacts

Control
● Migration CRD, console, CLI

Workload Execution
● Linux -> Anthos runtime embedded in container image, replacing VM's kernel
● Windows -> generates Dockerfile which builds Windows Server container image

Maintenance
● Dockerfile can be used in CI/CD pipeline that build updated container images

- For GCP, AWS, and Azure sources, the processing cluster can be either a GKE or an Anthos GKE on Google Cloud cluster; for VMware sources, you need an Anthos GKE on-prem cluster
- For Linux workloads, you can manage the entire process within the console (rather than the old CLI installation and migration processes)
- For more details, see
https://cloud.google.com/migrate/anthos/docs/architecture
Migrate for Anthos Architecture
(Diagram: a GKE processing cluster running Migrate for Anthos components (CRDs, StorageClass, CSI drivers, Docker) generates artifacts — Dockerfiles and YAML files in Cloud Storage, container images in Container Registry — which are deployed to a production GKE cluster. Below, the Migrate for Compute Engine layer (Migrate Manager and Edge Nodes on Compute Engine, a Cloud Storage cache) streams data from on-prem or other-cloud compute.)

1. Migrate for Compute Engine creates the pipeline for streaming/migrating data
from on-prem or cloud systems into GCP.
2. Migrate for Anthos is installed on a GKE processing cluster, and is comprised
of many Kubernetes resources.
3. Migrate for Anthos is used to generate deployment artifacts
a. Some, like the Kubernetes configurations and the Docker file used to
create the VM-wrapping container go into GCS
b. The container images themselves are stored in GCR
4. Once the deployment assets have been created, they can be used to test or deploy into a target cluster
a. You simply apply the generated configuration and it creates the necessary Kubernetes elements on the target cluster

Keep in mind that the bottom half of the diagram doesn't apply when migrating from
GCE.
Migration works with specific OS and GKE versions
Supported workload OSes: see the link below.

Processing cluster requirements: GKE version 1.13.5-gke.10 and later, with Ubuntu or COS (ext2/3/4) as the node OS.

https://cloud.google.com/migrate/anthos/docs/supported-os-versions
Migrations follow a typical path

1. Configure processing cluster: create the cluster and install the Migrate for Anthos components.
2. Add migration source: contains the details needed to migrate from VMware, AWS, Azure, or GCP.
3. Generate and review plan: create a migration object with details of the migration; customize the generated plan as needed.
4. Generate artifacts: generate container images and YAML files for deployment.
5. Test: test the container images and test deployment.
6. Deploy: use the generated artifacts to deploy to production clusters.

https://cloud.google.com/migrate/anthos/docs/migration-journey
Migrate for Anthos requires a processing cluster

gcloud container clusters create $CLUSTER_NAME \
    --project $PROJECT_ID \
    --zone $CLUSTER_ZONE \
    --machine-type "n1-standard-4" \
    --image-type "UBUNTU" \
    --num-nodes 1 \
    --enable-ip-alias \
    --tags="http-server"

(Diagram: the processing cluster on Kubernetes Engine.)

https://cloud.google.com/migrate/anthos/docs/configuring-a-cluster

● You must be a GKE admin to set up the cluster.
● You must have firewall rules in place that allow communication between Migrate for Anthos and Migrate for Compute Engine.
● The processing cluster must be on the same VPC as your Migrate for Compute Engine infrastructure if you are migrating from a non-GCP source environment.

The example command enables a VPC-native cluster.


Add the processing cluster

Using the Console to add a processing cluster provides a set of Cloud Shell
commands that are used to complete the installation of Migrate for Compute Engine.
You can run each command by clicking on the RUN IN CLOUD SHELL buttons.

The first command enables all the services required for Migrate for Anthos to operate.
Configuring access

gcloud iam service-accounts create m4a-125744758

gcloud projects add-iam-policy-binding \
    qwiklabs-gcp-01-8415e218f0d2 \
    --member=serviceAccount:m4a-125744758@qwiklabs-gcp-01-8415e218f0d2.iam.gserviceaccount.com \
    --role=roles/storage.admin

gcloud iam service-accounts keys create sa.json \
    --iam-account=m4a-125744758@qwiklabs-gcp-01-8415e218f0d2.iam.gserviceaccount.com \
    --project qwiklabs-gcp-01-8415e218f0d2

● A service account is required for storing the migration artifacts in Container Registry and Cloud Storage. The first command creates that account.
● The second command grants the account the permissions required to access Container Registry and Cloud Storage.
● The Migrate to containers software requires a key to use the service account. The third command creates a new key and exports it to a file.
Installing Migrate for Anthos uses migctl
gcloud container clusters get-credentials \
    migrate-processing-cluster \
    --zone us-central1-f \
    --project qwiklabs-gcp-01-8415e218f0d2 && \
migctl setup install --json-key=sa.json

(Diagram: installation creates resources on the processing cluster: namespaces, services, serviceaccounts, statefulsets, roles/clusterroles, rolebindings/clusterrolebindings, daemonsets, jobs, configmaps, storageclasses, and a CRD.)

● The gcloud command configures kubectl to work with the processing cluster
● The migctl command installs the CRDs and creates resources:
○ namespace/migrate-system created
○ namespace/v2k-system created
○ storageclass.storage.k8s.io/v2k-generic-disk created
○ customresourcedefinition.apiextensions.k8s.io/migrations.anthos-migrate.cloud.google.com created
○ serviceaccount/csi-vlsdisk-csi-controller-sa created
○ serviceaccount/csi-vlsdisk-csi-node-sa created
○ serviceaccount/v2k-generic-csi-csi-controller-sa created
○ serviceaccount/v2k-generic-csi-csi-node-sa created
○ serviceaccount/v2k-reconciler-sa created
○ serviceaccount/validator-sa created
○ role.rbac.authorization.k8s.io/csi-vlsdisk-node-healthcheck-pods-role created
○ role.rbac.authorization.k8s.io/v2k-leader-election-role created
○ clusterrole.rbac.authorization.k8s.io/csi-vlsdisk-controller-role-vls created
○ clusterrole.rbac.authorization.k8s.io/csi-vlsdisk-driver-registrar-role created
○ clusterrole.rbac.authorization.k8s.io/csi-vlsdisk-node-healthcheck-role created
○ clusterrole.rbac.authorization.k8s.io/v2k-generic-csi-controller-role-vls created
○ clusterrole.rbac.authorization.k8s.io/v2k-generic-csi-driver-registrar-role created
○ clusterrole.rbac.authorization.k8s.io/v2k-manager-role created
○ clusterrole.rbac.authorization.k8s.io/v2k-proxy-role created
○ clusterrole.rbac.authorization.k8s.io/validator-role created
○ rolebinding.rbac.authorization.k8s.io/csi-vlsdisk-node-healthcheck-pods-binding created
○ rolebinding.rbac.authorization.k8s.io/v2k-leader-election-rolebinding created
○ clusterrolebinding.rbac.authorization.k8s.io/csi-vlsdisk-controller-attacher-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/csi-vlsdisk-controller-provisioner-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/csi-vlsdisk-controller-secret-access-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/csi-vlsdisk-driver-registar-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/csi-vlsdisk-node-healthcheck-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/v2k-generic-csi-controller-attacher-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/v2k-generic-csi-controller-provisioner-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/v2k-generic-csi-controller-secret-access-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/v2k-generic-csi-driver-registar-binding created
○ clusterrolebinding.rbac.authorization.k8s.io/v2k-manager-rolebinding created
○ clusterrolebinding.rbac.authorization.k8s.io/v2k-proxy-rolebinding created
○ clusterrolebinding.rbac.authorization.k8s.io/validator-role-binding created
○ configmap/default-logs created
○ configmap/exporter-default-config created
○ configmap/v2k-manager-config-map created
○ service/v2k-controller-manager-metrics-service created
○ service/v2k-controller-web-server created
○ deployment.apps/v2k-controller-manager created
○ statefulset.apps/csi-vlsdisk-controller created
○ statefulset.apps/v2k-generic-csi-controller created
○ daemonset.apps/csi-vlsdisk-node created
○ daemonset.apps/v2k-generic-csi-node created
○ job.batch/admission-validator created

This process takes a couple minutes to complete, and can be checked using the
provided command: gcloud container clusters get-credentials
migrate-processing-cluster --zone us-central1-f --project
qwiklabs-gcp-01-8415e218f0d2 && migctl doctor
Adding a source enables migrations from a specific
environment

You select the processing cluster, the source type, the target project for migrated
workloads, and a service account used to do the migration (this can be created for
you as part of install)

The example command is for migrating from GCE.


Creating a migration generates a migration plan

● Defines the migration resource that will be created on the cluster
● There are different migration intents
○ Image
○ ImageAndData
○ Data
● Creates a migration plan

https://cloud.google.com/migrate/anthos/docs/creating-a-migration
Creating a migration generates a migration plan

● Migration plans are reviewed and customized prior to generating artifacts

You should review the migration plan file resulting from creating a migration, and
customize it before proceeding to executing the migration. The details of your
migration plan will be used to extract the workload container artifacts from the source
VM, and also to generate Kubernetes deployment files that you can use to deploy the
container image to other clusters, such as a production cluster.
Generating artifacts

(Diagram: the processing cluster (Kubernetes Engine, CRD) writes Dockerfiles and YAML files to Cloud Storage and container images to Container Registry.)

https://cloud.google.com/migrate/anthos/docs/executing-a-migration

● Migrate for Anthos creates two new Docker images: a runnable image for
deployment to another cluster and a non-runnable image layer that can be
used to update the container image in the future. See Customizing a migration
plan for information on how to identify these images.
● It also generates configuration YAML files that you can use to deploy the VM
to another GKE cluster. These are copied into a Cloud Storage bucket as an
intermediate location. You run migctl migration get-artifacts to download them.
Deployment files typically need modification
migctl migration get-artifacts demo-migration

● The configuration defines resources to deploy
○ Deployment or StatefulSet
○ Headless service
○ PersistentVolumes and PersistentVolumeClaims
● You can edit to enable load-balancing, ingress, disk size, etc. (example after the link below)

https://cloud.google.com/migrate/anthos/docs/review-deployment-files
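For example (hedged; names hypothetical), rather than hand-editing the generated headless Service, you could expose the migrated workload through an external load balancer:

kubectl expose deployment demo-app \
    --name=demo-app-lb \
    --type=LoadBalancer \
    --port=80 --target-port=8080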
Apply the configuration to deploy the workload

kubectl apply -f deployment_spec.yaml

https://cloud.google.com/migrate/anthos/docs/review-deployment-files
Best Practices

Good fit workloads | Poor fit workloads | Things to consider
Web/application servers | High-perf databases | DNS
Business logic layer | In-memory databases | NFS
Multi-VM, multi-tier | Special kernel requirements | Environment variables
Small/medium databases | Hardware dependencies | Runlevels
Low duty-cycle, bursty | Software with licenses tied to hardware ID | Disable unneeded services
Dev, test, training | |
Low-load services | |

https://cloud.google.com/migrate/anthos/docs/planning-best-practices?authuser=0

DNS
- Replace host names with service names (and create services)
- Handling /etc/hosts files requires special effort
- Figure out how discovery will work, and if you need to place services in same
namespace

NFS
- Must modify YAML files to mount NFS
- Can't run with kernel mode NFS servers
- Data from NFS mount is not automatically migrated

Environment variables
- If your applications rely on injected metadata (for example, environment
variables), you will need to ensure that these are available on GKE. If the
same metadata injection method is not available, GKE offers ConfigMaps and
Secrets.

Runlevels
- Migrate for Anthos workloads reach runlevel 3 only. VMs migrated into GKE
with Migrate for Anthos will be booted in the container at Linux runlevel 3.
- Certain services (for example X11 or XDM, for remote GUI access using VNC)
are configured by default to start only at runlevel 5. Any necessary services
should be configured to start at runlevel 3.

Disable unneeded services
- Migrate for Anthos will automatically disable hardware- or environment-specific services, as well as a pre-defined set of additional services running on VMs. You might choose to disable additional services.
Lab 14
Migrating VMs to containers with Migrate for Anthos