Advanced Deployment Strategies
OpsMx
An overview of the most common software deployment strategies
Table of Contents
Introduction
Blue-Green Deployments
Canary Release
Dark Launches & Feature Toggles
Progressive Delivery
Introduction
Low-Risk Releases are Incremental
Today’s high-performance DevOps teams accelerate software delivery and reduce cycle
times in a safer, low-risk environment.
Continuous Delivery has enabled companies like Google, Netflix, and Amazon to bring on
new revenue streams faster, achieving the agility needed to respond immediately to
marketplace opportunities, events, and trends.
Continuous Delivery is a continuous-flow approach associated with just-in-time and
Kanban. The goal is an optimally balanced deployment pipeline with little waste, the lowest
possible cost, and on-time, defect-free deployment to production.
[Figure: Waterfall and Agile Scrum delivery models]
While Continuous Delivery drastically reduces the time between releases, DevOps teams
must implement advanced deployment strategies to ensure that software deployments can
be fast, repeatable, safe, and secure.
This ebook provides an overview of the latest deployment strategies and how best to
implement them in your software delivery practice.
Blue-Green Deployment
Minimizes downtime during the “cut-over”
One of the challenges with automating deployments is the cut-over, taking software from
the final stage of testing to live production. You usually need to do this quickly to minimize
downtime.
The blue-green deployment strategy requires you to have two production environments that
are as identical as possible. One of them, say blue, is live.
[Figure: after validation, the load balancer routes 100% of user traffic to the new environment]
You do your final testing stage in the green environment as you prepare a new software
release. When your testing completes, you switch the router so that all incoming requests
go to the green environment - the blue environment is now idle.
How to Implement
DevOps engineers can now run smoke tests on the green instance to assess whether any
issues will impact users of the new version. If any bugs or performance issues are detected
during the smoke tests, users can quickly be rolled back to the stable blue version without
any substantial interruptions.
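As a rough sketch of that cut-over, assuming the blue and green stacks run behind a Kubernetes Service (the Service name, namespace, and "color" label below are illustrative, not something the ebook prescribes), the switch can be a single selector patch:

    # Sketch: re-point a Kubernetes Service from the blue to the green stack.
    # Assumes a kubeconfig is available and that the Deployments carry a
    # "color" label; all names here are hypothetical.
    from kubernetes import client, config

    def cut_over(service_name, namespace, target_color):
        """Point the Service at the environment labelled with target_color."""
        config.load_kube_config()
        core = client.CoreV1Api()
        patch = {"spec": {"selector": {"app": service_name, "color": target_color}}}
        core.patch_namespaced_service(service_name, namespace, patch)

    # After the smoke tests pass on green, flip traffic; call it again with
    # "blue" to roll back.
    cut_over("my-app", "production", "green")

Because the same call switches traffic back, rollback is as cheap as the original cut-over.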
Benefits
Instant rollbacks
You can undo the change without adverse effects.
Testing parity
The newer versions can be accurately tested in real-world scenarios.
The inherent equivalence of the Blue and Green instances, combined with a quick recovery
mechanism, also makes this strategy ideal for simulating and rehearsing disaster recovery.
Gone are the days when you had to wait for low-traffic windows to deploy updates; Blue-Green
eliminates the need to maintain downtime schedules. Developers can move their updates into
production as soon as their code is ready.
Challenges
Some sessions may fail during the initial switch to the new environment, or users may
be forced to log back into the application. This issue can be mitigated by using a load
balancer, rather than DNS, to shift traffic from one instance to the other.
[Figure: on failure, the load balancer routes 100% of user traffic back to the stable production environment]
Code compatibility
Different code versions need to co-exist to support seamless switching between the
blue and green instances. For example, if a software update requires changes to a
database, the Blue-Green strategy is challenging to implement because traffic may
switch back and forth between the blue and green instances. Therefore, the database
should remain compatible with both software versions.
Common Practices
Do not use DNS to switch between servers. It can take browsers a long time to pick up
the new IP address, so some of your users may still be served by the old environment.
Instead, use a load balancer. Load balancers let you switch to your new servers
immediately, without depending on the DNS mechanism, so you can ensure that all
traffic reaches the new production environment.
A rolling update slowly replaces the old version with the new version. As instances of
the new version come up, instances of the old version are scaled down to maintain the
overall instance count of the application.
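A minimal sketch of that rolling replacement (the scale() hook and the 30-second pause are assumptions standing in for whatever actually resizes your deployments):

    import time

    def rolling_update(scale, old="app-v1", new="app-v2", total=10, pause_s=30):
        """Replace `old` with `new` one replica at a time, keeping `total` constant.

        `scale(name, replicas)` is a placeholder for the call that resizes a
        deployment (cloud API, kubectl wrapper, and so on).
        """
        for replaced in range(1, total + 1):
            scale(new, replaced)           # bring up one more new instance
            scale(old, total - replaced)   # retire one old instance
            time.sleep(pause_s)            # let health checks settle before continuing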
The Blue-Green deployment strategy is one of the most widely used deployment
strategies. It is a great fit when environments are consistent between releases and
user sessions are reliable even across new releases.
Canary Release
An early warning system for potential issues
Canary deployments refer to the practice of releasing a code change to a subset of users
and then observing how that code performs relative to the old code that the majority of
users are still running. Like a “canary in a coal mine,” this serves as an early warning
system for potential issues.
This is achieved by setting up Canary servers that run the new code. As new users arrive, a
subset of them is routed via the load balancer to those canary servers.
For example, if you see a much higher rate of I/O on the canary servers, that might indicate
a problem. Canary testing is ideal when you wish to test the performance of your backend.
How to Implement
[Figure: canary rollout in stages - instances running stable version 1.0 are gradually replaced by update version 1.1 until every instance runs 1.1]
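A minimal sketch of that progression, assuming a router that can weight traffic between the stable (1.0) and canary (1.1) pools; the pool names, starting weight, and step size are illustrative:

    import random

    pools = {"stable-1.0": 0.95, "canary-1.1": 0.05}  # start with ~5% canary traffic

    def pick_pool(pools):
        """Weighted choice of the pool that serves the next request."""
        return random.choices(list(pools), weights=list(pools.values()), k=1)[0]

    def promote_canary(pools, step=0.10):
        """Shift more traffic to the canary once its metrics look healthy."""
        pools["canary-1.1"] = min(1.0, pools["canary-1.1"] + step)
        pools["stable-1.0"] = 1.0 - pools["canary-1.1"]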
Benefits
Zero-downtime releases
Switch users from one release to another instantaneously.
Challenges
It can be time-consuming and error-prone without automation
Today, many companies execute the analysis phase of canary deployments in a siloed
and non-integrated fashion. You need an automated and integrated toolchain.
If a DevOps engineer is assigned to manually collect monitoring data and logs from the
canary version and analyze them, the process will not scale for rapid deployments.
Decisions on whether to roll back or roll forward will be delayed and may be based on
incorrect data.
On-Premise/thick client applications are challenging to update
Databases (or any other shared resource) need to work with all versions of the application
you want to have in production. You might need to create a very complex deployment
process if you try to modify how the application interacts with the database or change
the database schema.
To perform the canary, first change the database schema to support two or more instances
of the application. This allows the old and new versions of the application to run
simultaneously. Once the new database structure is in place, the latest version can be
deployed and switched over.
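One way to read that advice in code (a sketch only; the column names below are hypothetical, since the ebook does not name a schema) is to make the new application version tolerant of rows written in either shape:

    def read_display_name(row):
        """Expand/contract style read: prefer the new column, fall back to the old.

        Version 1.1 can then run alongside version 1.0 while old-shape rows
        still exist; the old column is dropped only after the cut-over.
        """
        return row.get("display_name") or row.get("username", "")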
Common Practices
You might be tempted to compare the canary deployment against your current
production deployment. Instead, always compare the canary against an equivalent
baseline deployed alongside it.
The baseline runs the same version and configuration currently in production, but is
otherwise deployed identically to the canary:
Same time of deployment
Same size of deployment
Same type and amount of traffic
In this way, the only variables are the version and configuration, and you reduce factors
that could affect the analysis, such as cache warm-up time, heap size, and so on.
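A toy illustration of that comparison (not OpsMx's actual analysis): judge the canary against the freshly deployed baseline, not against long-running production; the latency metric and 10% tolerance are assumptions.

    from statistics import mean

    def canary_ok(baseline_latencies_ms, canary_latencies_ms, tolerance=0.10):
        """Pass the canary only if its mean latency stays within `tolerance` of the baseline."""
        return mean(canary_latencies_ms) <= mean(baseline_latencies_ms) * (1 + tolerance)

    # 210 ms against a 200 ms baseline is inside the 10% budget.
    print(canary_ok([195, 200, 205], [205, 210, 215]))  # True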
Slower rollouts mean better data but reduced velocity. It’s a balancing act, but canaries
for critical services should live longer, since longer canary durations help detect issues.
For highly critical services, a duration of 4 to 24 hours is recommended.
If you see regular workload peaks and troughs, time your canary period to begin
before peak traffic and cover a portion of the peak traffic period.
For example, if a service tier typically comprises 100 instances, you should canary on
5 to 10 of those instances. A canary covering only 1% to 2% of the workload is more
likely to miss or minimize some important cases; a canary covering more than 10% of
the workload may have too much impact if it doesn’t work as expected.
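Expressed as a tiny helper (the 5%-10% band comes from the guidance above; the clamping and minimum of one instance are my assumptions):

    def canary_size(total_instances, fraction=0.05):
        """Size the canary within the 5%-10% band, never below one instance."""
        fraction = min(max(fraction, 0.05), 0.10)
        return max(1, round(total_instances * fraction))

    print(canary_size(100))  # 5 instances for a 100-instance tier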
Conclusion
The canary deployment strategy is widely used because it lowers the risk of moving
changes into production while reducing the need for additional infrastructure.
Organizations using canary can test the new release in a live production environment
without exposing all users to the latest release at once.
Dark Launches & Feature Toggles
A safe way to gauge interest in a new feature
Dark Launching is similar to Canary deployments. However, the difference here is that you
are looking to assess users’ responses to new features in your frontend rather than testing
the performance of the backend.
The concept is that rather than launch a new feature for all users, you instead release it to a
small set of users.
[Figure: versions V1-V3 moving from development through release to production, with features 1-3 exposed to different user groups]
Usually, these users aren’t aware they are being used as guinea pigs for the new feature,
and often you don’t even highlight the new feature to them, hence the term “Dark”
launching.
How to Implement
You can use UX instrumentation to monitor if the feature improves the user experience or
increases your revenue (e.g., the new feature may encourage them to spend longer using
your app and thus consume more ads or make more in-app purchases).
This process is precisely what any product manager is doing when assessing how well an
app performs. The only difference is that you are now looking at the performance of a
single new feature.
Use feature toggles to incrementally roll out a new feature to more and more users and
assess performance.
[Figure: a feature toggle turning the new feature ON for user bases 1 and 3 and OFF for user base 2]
Dark Launching enables product teams to roll back features that are not performing well or
fully launch features that users love.
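A minimal sketch of such a toggle, assuming a percentage-based rollout bucketed on a stable hash of the user id; the feature name, percentage, and bucketing scheme are illustrative rather than anything the ebook prescribes:

    import hashlib

    rollout_percent = {"new-checkout": 10}  # dark-launch the feature to ~10% of users

    def is_enabled(feature, user_id):
        """Deterministically bucket each user so they see a consistent experience."""
        digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < rollout_percent.get(feature, 0)

    if is_enabled("new-checkout", "user-42"):
        pass  # render the dark-launched checkout flow and record UX metrics
    else:
        pass  # render the existing flow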
Benefits
Dark launching empowers DevOps teams to safely experiment with new features and
new software versions at a lower risk. This creates faster feedback cycles so that
product teams can adjust features and software versions to the needs and wants of
the users and the markets they serve faster than their competitors.
With dark launches and feature toggles, product teams can gauge interest in and
adoption of new features. They can make rollout decisions based on how the current
subset of users accepts the new feature and how it will contribute to the success of the
product and company.
Like canary releases, dark launching and feature toggles will enable you to roll out new
features in a controlled manner. However, dark launches and feature toggles do not
require you to run multiple versions of an application in an environment simultaneously,
which you must do for a canary release.
Organizations can save time and money since features can safely be tested in
production and by actual users instead of QA engineers. Feedback would be directly
from the users in a real-world production environment.
Challenges
Dark launches and feature toggles require you to change the code in the application
you want to deploy. Development teams will need to design, code, build, maintain, and
deploy this support. Implementing this support for legacy applications with large
codebases could be problematic.
Feature toggles require code updates to implement. Typically these toggles are
temporary software updates to support the new feature. Once the feature has been
tested and accepted, the feature toggle is no longer required. If you don’t have a
process to maintain, update, and remove old temporary feature toggles, technical debt
will increase.
As more feature toggles are added, the code can become more fragile and brittle,
harder to understand and maintain, and less secure. Feature toggling is about your
software being able to choose between two or more execution paths based on a
toggle configuration. This increases the complexity of the code and makes it harder to
test, support, and secure.
Common Practices
As a general rule of thumb, you should encapsulate your feature toggle with the
business logic it supports. This avoids other areas of your codebase needing to know
the context required to toggle the feature. Sometimes this is impossible because the
core business logic is broken up into several different services. In that case, place the
toggle as close as possible to the service call, passing a parameter to the target services.
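One way to picture that encapsulation (a sketch with hypothetical names): the toggle check lives inside the service that owns the business logic, so callers never need to know the flag exists.

    class PricingService:
        """Callers just ask for a quote; the toggle is an internal detail."""

        def __init__(self, flags):
            self._flags = flags  # any flag store exposing is_enabled(name)

        def quote(self, cart):
            if self._flags.is_enabled("dynamic-pricing"):
                return self._dynamic_quote(cart)
            return self._legacy_quote(cart)

        def _dynamic_quote(self, cart):
            return sum(item.price * item.demand_factor for item in cart)

        def _legacy_quote(self, cart):
            return sum(item.price for item in cart)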
Ensure that you are setting service-level objectives and defining service-level
indicators to track the performance impact on the affected service or feature. It’s critical
to have the tools and infrastructure to assess the system’s performance, monitor for
unexpected responses to client requests, and compare any system deviation to a baseline.
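For instance, a service-level indicator for the toggled feature might be the fraction of successful requests, checked against the objective before widening the rollout; the 99.9% objective and request counts below are made up for illustration.

    def availability_sli(success_count, total_count):
        """SLI: share of requests served successfully."""
        return success_count / total_count if total_count else 1.0

    SLO = 0.999  # objective: 99.9% of requests succeed

    if availability_sli(success_count=99_870, total_count=100_000) < SLO:
        print("SLI below objective - hold or roll back the feature")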
Development and delivery teams are asked to add toggles for various reasons but
aren’t often asked to remove a toggle after it has served its purpose. Teams need to
put processes in place to ensure that toggles are eventually retired. Whether adding a
toggle retirement task to the team’s work backlog or creating an expiration notification
event when a toggle’s expiration date has passed, you should have an automated
system that manages the lifecycle of a feature toggle.
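A sketch of that lifecycle guard, assuming every toggle records an expiration date when it is created (the dataclass and dates are illustrative):

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class FeatureToggle:
        name: str
        enabled: bool
        expires_on: date  # set when the toggle is created

    def expired_toggles(toggles):
        """Names of toggles past their expiration date: candidates for a retirement task."""
        today = date.today()
        return [t.name for t in toggles if t.expires_on < today]

    stale = expired_toggles([FeatureToggle("new-checkout", True, date(2022, 1, 31))])
    if stale:
        print(f"Retire these toggles: {stale}")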
Progressive Delivery
Control which users see which feature, and when
Progressive Delivery extends Continuous Delivery by enabling more control over feature
delivery. The process deploys features to a subset of users, then evaluates key metrics
before rolling out to more users or rolling back if there are issues. Progressive Delivery
introduces two core tenets: release progressions and progressive delegation.
[Figure: Features 1 and 2 toggled ON or OFF independently for the canary, regional, and global user groups, with APM monitoring each group]
How It Works
Incremental delivery and feature management are the key enablers of Progressive Delivery.
Together, they provide fine-grained control over user exposure to a new feature. This
exposure is referred to as the “blast radius.” By limiting the blast radius, you restrict the set
of users exposed to a possible bad outcome.
The decision to proceed or fall back is based on testing criteria and careful monitoring. You
might use canary analysis, A/B testing, observability, or other methods to meet the service
objectives and success criteria.
For example, using a release progression with canary, you can start with a small blast
radius of 1-5% of the entire user population and then gradually increase it to 10%, 20%, and
so on, based on the feature’s performance and user feedback at each stage. If there are
critical issues, you can either turn that feature off or roll back to the baseline version.
Once the feature is deployed to all users, the management of the feature returns to
development so they can remove the feature toggle and ensure the new feature is part of
the new baseline version.
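A condensed sketch of that release progression; set_rollout_percent() and metrics_healthy() are hypothetical hooks into your feature-management and observability tools, and the stage percentages echo the example above.

    import time

    STAGES = [1, 5, 10, 20, 50, 100]  # percent of users exposed at each stage

    def progressive_rollout(feature, set_rollout_percent, metrics_healthy, soak_s=3600):
        """Widen the blast radius stage by stage; turn the feature off on any regression."""
        for percent in STAGES:
            set_rollout_percent(feature, percent)
            time.sleep(soak_s)                   # let the stage soak before judging it
            if not metrics_healthy(feature):
                set_rollout_percent(feature, 0)  # kill switch: back to the baseline
                return False
        return True  # fully rolled out; hand the flag back to development for removal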
Benefits
Limits the blast radius, so only a small subset of users is affected if a feature has
problems or doesn’t work as expected.
Developers get faster feedback and reduce break/fix cycle times, improving the
quality of the features as they get deployed across the user community.
With progressive delegation, real-time feedback and control of a feature are routed to
the team most responsible for the outcome. This improves both technical and
business outcomes.
Organizations can save time and money since features can safely be tested in
production and by actual users instead of QA engineers. Feedback would be directly
from the users in a real-world production environment.
Challenges
Testing in production poses real risks. You cannot rely on automated testing to catch
every issue before it hits production, yet testing in production can be complex,
degrade the user experience, and slow down your development team. It can take a
long time to become confident in a release: you might have a new team of developers
who are not yet familiar with the application, traffic may be slow, or there may be many
features and code paths to exercise. Aligning the product team to a fast release
cadence may impose time constraints, and impatience can lead to poor quality and
broken releases.
Once Progressive Delivery has been adopted, it’s easy for teams to get lazy and wait
until the feature is deployed into production to do all of their testing. Fixing a defect is
cheaper if found on the developer’s desktop or during integration testing in a
pre-production environment. Progressive delivery enables you to get additional
validation, but it should not be an excuse to cut corners.
Common Practices
When you are in the planning phase of developing your Progressive Delivery practice,
make sure it’s a collaborative exercise with the product team. Development, product
management, and marketing should be involved in defining how to expose a new
feature to the users. You want to make sure everyone is clear about why you are
releasing the feature and what a positive or negative outcome is.
The key to progressive delivery is quickly making data-driven decisions about whether
to roll back or roll forward a feature. The data to drive that decision is typically
dispersed across many different tools within the DevOps toolchain, so it might take
hours to approve or reject a canary. That is why it is common for organizations and
teams to use a solution that automates workflows, securely deploys code while meeting
all compliance requirements, and provides an automated risk assessment.
With OpsMx you can set up a production-ready pipeline in under 60 minutes

Step 1: Provision Infrastructure (Duration: 10 mins)
Automate your pipelines; no scripts required!

Integrate all of your required DevOps tools, like Jenkins, Artifactory, and Splunk, through our self-service integrations module.

Step 3: Automate Compliance
Automate your compliance checks out of the box with standard static and dynamic policies.

Set up control gates and assign ownership at each pipeline stage. Automated dashboards ensure nothing is pending in a queue for too long.

Step 5: Setup Deployments
Leverage ready-to-use support for on-prem, hybrid, and multi-cloud deployments. Utilize canary, blue-green, and highlander strategies with automated rollbacks. Use our unique architecture to deploy across security zones.

Quickly and safely make informed decisions with DORA metrics, real-time insights, and audit reports. AI-driven risk assessment automates the pre-check of software releases at every stage.
Duration: 10 mins
About OpsMx
Founded with the vision of “delivering software without human intervention,” OpsMx enables customers to transform and automate their
software delivery process. OpsMx’s intelligent software delivery platform is an AI/ML-powered software delivery and verification platform
that enables enterprises to accelerate their software delivery, reduce risk, decrease cost, and minimize manual effort. Follow us on Twitter
@Ops_Mx and learn more at www.opsmx.com