
CI/CD with Jenkins pipelines, part 1: .NET Core application deployments on AWS ECS

Alexander Savchuk
May 23, 2018 · 11 min read

Pipeline chain with manual approval of deployment to production

In theory, deploying a dockerised .NET Core app is easy (because Docker simplifies
everything, right?). Just trigger your CI/CD pipeline on any new commit to the GitHub
repository, build an image, run the tests, push the image to the ECR repository, update
the ECS task definition to point to the new image, and then update the ECS service to
use the new task revision. Rinse and repeat for all environments.
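
On paper, the whole loop is a handful of CLI calls. Here is a minimal sketch of that naive flow, with illustrative names (my-app, my-cluster) and assuming the ECR repository, cluster, and service already exist:

#!/usr/bin/env bash
set -euo pipefail

IMAGE="$ECR_REPO/my-app:$BUILD_NUMBER"

# Build and push the image, tagged with the CI build number
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Register a new task definition revision that points at the fresh image
aws ecs register-task-definition --family my-app --container-definitions \
  "[{\"name\": \"my-app\", \"image\": \"$IMAGE\", \"memory\": 512, \"essential\": true}]"

# Update the service; a bare family name resolves to the latest ACTIVE revision
aws ecs update-service --cluster my-cluster --service my-app --task-definition my-app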

In practice, this turned out to be somewhat less straightforward.

TL;DR: using infrastructure management tools such as Terraform or
CloudFormation for application deployments is problematic. We use pipeline
chains on Jenkins for easy ad-hoc deployments and rollbacks.

Setting the stage


Our team is working on a rather complex .NET Core / ASP.NET Core project with about
20 deployable microservices. All applications live in the same repository, which helps
developer agility a lot but makes deployments tricky. Each deployable microservice has
its own Dockerfile and Jenkinsfile.


The branching model of the repository is pretty simple. There is a master branch, which
is built on every push and is always deployable, and a bunch of feature branches that are
mostly ignored by the deployment system.

There are also a lot of infrastructure bits and pieces that need to be deployed for each
microservice — an ECR repository, ECS task and service definitions, an IAM role with policies,
and in some cases security groups, Route53 records, a Redis cluster, and an ALB
with listeners and target groups. We use Terraform to manage most of our
infrastructure. Infrastructure deployments use a different pipeline and will be covered in
the next blog post.

Build
All Docker images follow the same pattern. Dockerfiles are multi-stage — we use a larger
SDK image to compile the application and a much smaller runtime image to deploy it.
Base images are pinned to an exact version (e.g. microsoft/aspnetcore-build:2.0.5-2.1.4)
to avoid the surprises that could happen if we just used 'latest'.
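
A condensed sketch of what such a multi-stage Dockerfile looks like (project names are illustrative, and the restore-caching trick described below is left out for brevity):

# Stage 1: compile with the full SDK image, pinned to an exact version
FROM microsoft/aspnetcore-build:2.0.5-2.1.4 AS build
WORKDIR /src
COPY . .
RUN dotnet publish src/MyService/MyService.csproj -c Release -o /out

# Stage 2: copy only the published output into the small runtime image
FROM microsoft/aspnetcore:2.0.5
WORKDIR /app
COPY --from=build /out .
COPY docker-entrypoint.sh .
ENTRYPOINT ["./docker-entrypoint.sh"]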

There are three main stages in the build.

First, we need to restore NuGet packages, which are referenced in *.csproj files. Copying
all files in one go would invalidate the Docker cache and trigger a lengthy restore
whenever any part of the application code changes. We copy *.csproj files separately
from C# source files to avoid this problem. There are complex inter-dependencies
between the projects, and copying *.csproj files one by one would be quite a chore. We
copy them in bulk (which, unfortunately, flattens the directory structure) and then run a
simple script to move them to the correct folders. The end goal here is to optimise
layering, leverage the cache, and reduce build times.

COPY src/*/*.csproj ./
RUN for file in $(ls *.csproj); do mkdir -p ${file%.*}/ && mv $file ${file%.*}/; done
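To make the effect of that one-liner concrete: the ${file%.*} expansion strips the .csproj extension, leaving a directory name that matches the project. With two hypothetical projects, the layout changes like this:

# After COPY (flattened):        # After the move script:
#   /app/ServiceA.csproj         #   /app/ServiceA/ServiceA.csproj
#   /app/ServiceB.csproj         #   /app/ServiceB/ServiceB.csproj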

Next comes the compilation. This step usually can’t be cached, so it takes some time.


Lastly, the compiled assets are copied into the runtime image, along with the entrypoint
script. This script is responsible for setting up the execution environment correctly and
bailing out early if any of the mandatory environment variables are missing. The
applications are typically environment-agnostic (they do not care whether they run
locally on a Windows laptop or in a Linux Docker container on AWS), but certain
environment variables must always be set. All conditional logic is pushed to the
entrypoint script, which determines the environment and then, if necessary, fetches
secrets from Parameter Store using the pstore utility, grabs metadata like the task
revision and container ID from the container metadata file, and so on. All logs emitted
by the application contain this metadata, which is crucial for identifying
which of the hundreds of containers running at a given point in time is having issues.
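
As an illustration, an entrypoint script along those lines might look like the following sketch. The variable names and the pstore invocation are assumptions made for the example, not Xero's actual tooling; only ECS_CONTAINER_METADATA_FILE is a standard variable set by the ECS agent when container metadata is enabled.

#!/bin/sh
set -e

# Bail out early if mandatory configuration is missing
: "${APP_ENVIRONMENT:?APP_ENVIRONMENT must be set}"
: "${AWS_REGION:?AWS_REGION must be set}"

if [ "$APP_ENVIRONMENT" != "local" ]; then
  # Hypothetical: export secrets from Parameter Store into the environment
  eval "$(pstore export --prefix "/my-app/$APP_ENVIRONMENT")"

  # Enrich logs with task revision / container ID from the metadata file
  export CONTAINER_METADATA="$(cat "$ECS_CONTAINER_METADATA_FILE")"
fi

exec dotnet MyApp.dll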

We add several tags to our Docker images. The build number comes from Jenkins and is
used during the application rollouts. We also add a timestamp and a git hash, which
point to the last commit included in this build and help to establish which application
changes a given Docker image includes.
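
A sketch of what that tagging looks like at build time (the repository name and exact tag layout are illustrative):

GIT_HASH=$(git rev-parse --short HEAD)
TIMESTAMP=$(date -u +%Y%m%d-%H%M%S)

# One image, several tags: the build number for rollouts, a timestamp and a
# git hash for tracing an image back to the commits it contains
docker tag my-app "$ECR_REPO/my-app:$BUILD_NUMBER"
docker tag my-app "$ECR_REPO/my-app:$TIMESTAMP"
docker tag my-app "$ECR_REPO/my-app:git-$GIT_HASH"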

Another handy tag is the base image version (e.g. '2.1-runtime' for .NET Core apps),
which helps us understand whether we need to update an app if any security
vulnerabilities are discovered in the base image. The CI system needs to be aware of the
base image version, so instead of hard-coding it in the Dockerfile we pass it in as an
argument. The Dockerfile expects this argument to be set at build time, but defaults to
something sensible for easier local builds:

# This will be overridden at build time
ARG VERSION=2.0.6
FROM microsoft/aspnetcore:$VERSION

Jenkins sources the file that stores the base image version and passes it to the 'docker
build' command:

docker build --build-arg VERSION=${version} .
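
The version file itself can be as simple as a one-line variable assignment. A hypothetical example (the actual file is not shown in the article):

# base-image.version -- shared between Jenkins and local builds
version=2.0.6

# Jenkins then runs:
#   source ./base-image.version
#   docker build --build-arg VERSION=${version} .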


Deploy
As involved as building a Docker image can be, deploying it in a repeatable and safe
manner turned out to be more complicated.

First, here are the options that we tried or evaluated.

Create a new task revision in Terraform

We already have the full task definition as part of our infrastructure deployments in
Terraform, so it made sense to use it for application deployments too. In theory, all you need
to do is swap the image tag for the new one. It's a one-line change — how hard can it be?

{
  "name": "${app}",
  "image": "${image}",   <<< THE CHANGE GOES HERE
  "cpu": ${cpu},
  ...
}

The question is: how would Terraform know which image to deploy? The tool does
accept parameters, but we have a strict rule that the only external parameter
allowed is the AWS region, and even that is determined internally by the 'terraform-
deployer' Docker container which we use for infrastructure deployments. All other
arguments are either defined in the configuration files, if they are static, or fetched by
Terraform dynamically using external sources (usually a remote Terraform state, AWS
API calls, or custom data sources). This approach allows us to use standard boilerplate
deployments for all our 100-odd infrastructure projects and not have to maintain
separate scripts for any of them.

We needed some way to deploy new images quickly, so the workaround was to just hard-code
the task definition to use 'latest'. That wasn't enough though — because the task definition
never changed ('latest' is always the same tag/string/hash, even though the underlying
image changes), Terraform did not detect any changes in the task and correspondingly
did not update the ECS service. We had to force its hand by passing a timestamp
environment variable to the task definition. Its only purpose was to change the hash of
the task definition, so Terraform would have to create a new task revision and then
update the service.
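
The workaround, roughly, looked like the following sketch (Terraform 0.11-era syntax; resource names are illustrative). The DEPLOY_TIMESTAMP variable does nothing at runtime; it exists purely so that every apply produces a different task definition hash:

resource "aws_ecs_task_definition" "app" {
  family                = "my-app"
  container_definitions = <<JSON
[{
  "name": "my-app",
  "image": "${aws_ecr_repository.app.repository_url}:latest",
  "memory": 512,
  "essential": true,
  "environment": [
    {"name": "DEPLOY_TIMESTAMP", "value": "${timestamp()}"}
  ]
}]
JSON
}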

Another problem was that each application deployment touched all infrastructure
components even though it only needed to change one line in the task definition. This
slowed down deployments and increased the blast radius if anything went wrong.
What was even worse, though, Terraform was designed to manage infrastructure, not
application deployments, and it would report a deployment as successful even if the
application could not start or consistently failed its health checks.

In summary, this was a quick and dirty way to get something to the test environment. It
was not suitable for production deployments, but it worked for a while during the initial
development phase.

Separate deployment stage in the build Jenkins pipeline

The next thing we tried was to create a new task revision as part of our standard build
pipeline. At this stage the image ID is known, and it's trivial to create a new task revision
pointing to this image and then lean on the ECS scheduler to roll it out. It also
makes sense to keep application deployments tied to application builds to follow the
spirit of CI/CD.

Not surprisingly, this caused other sorts of issues. Now we were changing ECS task
definitions from two separate pipelines, and they did not always agree with each other.
Terraform only knew about the 'latest' image, and each time we refreshed our
infrastructure it reset the task definition back to the 'latest' image. Frequently this was the
same image that was already deployed, so it was just a relatively harmless refresh of the
service. Occasionally, however, the 'latest' image was not the same as the one currently
running, and in that case Terraform would perform a surreptitious application
deployment.

Another problem was that with this setup, all promotions between environments (test →
UAT → prod) happened in the same pipeline. If a deployment to, say, UAT failed for
whatever reason (frequently this would be some transient network issue), we had to re-run
the whole pipeline from the start. There was also no clean way to do a rollback with
this setup.


Blue-green deployments using AWS reference templates


AWS offers two examples of doing blue-green deployments on ECS.

ecs-blue-green-deployment is a set of templates and scripts which leverage
CloudFormation, CodePipeline, CodeBuild, and Lambda to perform simple blue-green
deployments (swapping 100% of traffic at once). ECS tasks and service configurations
are managed in CloudFormation. At a high level, the whole process works like this:

CloudFormation deploys two copies of a service

CodeBuild builds a Docker image

CodeBuild then passes some arguments to CloudFormation, which updates one of
the services with the new image and task. ECS rolls out the new version of the
service and registers it with the load balancer. At this stage, the latest version of the
app can be tested on port 8080 of the load balancer.

After a manual approval, CodePipeline invokes a deployment lambda, which flips
target groups. Now the new version of the app is serving the live traffic, while the
old version switches to standby mode.

There were several reasons why this wasn't a good fit for our needs. One was that the
suggested solution only works with web services behind a load balancer. Some of our
most important services are worker-type console applications, and they would require a
different approach. Ideally, we would prefer to find something that could work for all
our services in a similar fashion, to keep the maintenance overhead low. Another reason
was that the provided template relied on the Code* family of services for deployments. These
technologies duplicate the capability that we already possess via Jenkins and have some
configuration issues with non-public GitHub Enterprise and cross-account access.

ecs-canary-blue-green-deployment is another way to do blue-green deployments. It has
some really interesting ideas, such as full immutability of services — instead of updating
an existing service and then relying on an ECS rolling deployment to push out the change,
it creates a new service and then uses Step Functions to perform a gradual shift of traffic
to it. However, similar to the first approach, it only works with web services, and in our
case it would require some non-trivial effort to refactor so that it works with microservices
using path-based routing.

Both approaches used infrastructure-as-code tooling to perform application
deployments, which is far from a perfect fit, especially considering that CloudFormation
is less flexible than Terraform.

Separate deployment pipeline

Some of the issues mentioned above were limitations of our CI/CD server (Jenkins) or of
Terraform, but we had to work with what we've got and find workarounds where necessary.

Split infrastructure and application deployments

Even though infrastructure and application deployments were happening in entirely
separate pipelines, sometimes they would touch the same things, which is always
problematic.

First of all, we needed to find a way for Terraform not to change the current image ID
in ECS task definitions. This was now managed outside of Terraform, and Terraform
should not try to reset it.

There didn’t seem to be a clean way of achieving this using any of the existing Terraform
providers and data sources, so we ended up extending Terraform by writing an external
data source. The next blog post will cover this in more detail.
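
The general shape of that fix, sketched here with a hypothetical helper script (the real implementation is the subject of the next post): the external data source asks AWS which image the service is currently running, and the task definition reuses that value, so a plain 'terraform apply' no longer resets deployments.

data "external" "current_image" {
  # Hypothetical script that queries ECS for the currently deployed image
  # and prints a JSON object like {"image": "<current image>"}
  program = ["bash", "${path.module}/scripts/current-image.sh"]

  query = {
    cluster = "my-cluster"
    service = "my-app"
  }
}

# The task definition then interpolates:
#   "image": "${data.external.current_image.result["image"]}"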

Split application builds and deployments

As there is no way to re-run only the deployment stage(s) of a pipeline, the logical solution
was to split the pipeline into two distinct but related pipelines. The build part stays
essentially the same — Jenkins builds a Docker image, tags it, and publishes the image
to the ECR repository in the test environment. At this point, instead of deploying the new
image or exiting, it triggers the deployment pipeline and waits for it to complete.

stage('Start deployment') {
  when {
    branch 'master'
  }
  steps {
    build job: "Deployment/${serviceName}/${env.BRANCH_NAME}",
      propagate: true,
      wait: true,
      parameters: [
        [$class: 'StringParameterValue', name: 'imageName', value: imageName],
        [$class: 'StringParameterValue', name: 'serviceName', value: serviceName],
        [$class: 'StringParameterValue', name: 'tag', value: "${env.BUILD_NUMBER}"]
      ]
  }
}

Deployments to test and UAT are defined in a single Jenkinsfile used by all services,
which is parameterised with a service name, an image name, and an image tag. It
deploys the new image to the test environment, runs some integration tests, and then
promotes the image to UAT. Only one image can be updated at a time. If a task
contains several containers (for example, an app container and an nginx container),
updating them will require separate deployments. This setup limits the blast radius and
makes rollbacks easier in case anything goes wrong.
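
In outline, the shared deployment Jenkinsfile looks something like this sketch. The deploy.toEnvironment helper and the test script are illustrative stand-ins, not the actual shared-library calls:

pipeline {
  agent any

  parameters {
    string(name: 'serviceName', description: 'Name of the ECS service to deploy')
    string(name: 'imageName', description: 'Name of the Docker image to update')
    string(name: 'tag', description: 'Docker image tag')
  }

  stages {
    stage('Deploy to test') {
      steps { script { deploy.toEnvironment('test', params.serviceName, params.imageName, params.tag) } }
    }
    stage('Integration tests') {
      steps { sh './run-integration-tests.sh' }
    }
    stage('Promote to UAT') {
      steps { script { deploy.toEnvironment('uat', params.serviceName, params.imageName, params.tag) } }
    }
    stage('Trigger prod pipeline') {
      steps {
        build job: "Production/${params.serviceName}",
          wait: false,
          parameters: [
            [$class: 'StringParameterValue', name: 'serviceName', value: params.serviceName],
            [$class: 'StringParameterValue', name: 'imageName', value: params.imageName],
            [$class: 'StringParameterValue', name: 'tag', value: params.tag]
          ]
      }
    }
  }
}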

After going through the same sequence of steps in UAT, this pipeline triggers yet another
pipeline to start the deployment to the prod environment.

Production deployments are similar, but require explicit approval from one of the
authorised users to start the actual deployment process. This stage proved tricky to
implement correctly. Without a timeout, the confirmation step would block the pipeline
indefinitely, so at one point we had dozens of hanging builds which were soaking up our
executors. After we added a timeout, the builds started to error out, which was also not
really what we wanted. In the end, we added a rather clunky workaround which times
out and marks the build as successful even if it wasn't approved to proceed to
production.

def release = false

pipeline {
  agent any

  options {
    timestamps()
    disableConcurrentBuilds()
  }

  parameters {
    choice(choices: services, description: 'Name of the ECS service to deploy', name: 'serviceName')
    choice(choices: services, description: 'Name of Docker image to update', name: 'imageName')
    string(defaultValue: 'Tag to deploy', description: 'Docker image tag', name: 'tag')
  }

  stages {
    stage("Confirm") {
      when {
        branch 'master'
      }
      options {
        timeout(time: 5, unit: 'MINUTES')
      }
      agent none
      steps {
        script {
          approvalMap = deploy.getSignoff('prod', 'app', "${params.serviceName} image ${params.imageName}")
          if (approvalMap['Release']) {
            release = approvalMap['Release']
          }
        }
      }
    }

    // Other deployment steps
    // <...>
  }

  post {
    failure {
      script {
        if (!release) {
          currentBuild.result = "SUCCESS"
        }
      }
    }
  }
}

The deployment is then reported to Slack and recorded in our monitoring and auditing
systems.


In summary, we have pipelines triggering other pipelines, which invoke some more
pipelines. Effectively, this is a pipeline chain. A failure in any of the links will propagate
back, and it's straightforward to re-run any of the previous build jobs.

Deploy, roll back, or roll forward

We use a containerised version of the open-source ecs-deploy CLI tool for application
deployments. After the new Docker image is promoted to an environment, ecs-deploy
creates a new task revision which points to the new image and then updates the ECS service
configuration to use this new task revision.
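
A typical invocation looks something like this (cluster and service names are illustrative; see the ecs-deploy documentation for the full set of flags):

ecs-deploy \
  --cluster my-cluster \
  --service-name my-app \
  --image "$ECR_REPO/my-app:$TAG" \
  --timeout 300   # how long to wait for the service to stabilise before rolling back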

Deployment log on Jenkins

The actual rollout is controlled by ECS, and we can affect it by setting minimum and
maximum deployment percentages. The configuration below, for example, will
temporarily double the number of running tasks before scaling them back, and will
maintain at least 70% of tasks running at all times (so with a desired count of 10, ECS
may run up to 20 tasks during the rollout and never fewer than 7).

resource "aws_ecs_service" "web" {
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 70
}

During this rollout, the ecs-deploy utility monitors the status of the ECS service. If the
new revision deployed successfully, the utility marks the old ECS task revision as INACTIVE,
and the deployment is considered successful. If the tasks consistently failed to start,
ecs-deploy would, after a configurable timeout, attempt to roll back to the previous ECS
task revision.

There may be cases where tasks start successfully, but the latest changes introduce
some bug which is only found after the deployment. In this case, we would roll forward
by configuring the ECS service to use a new ECS task revision pointing to the old image.
As far as Jenkins is concerned, this is a standard deployment, triggered
manually by providing the image tag to the build job as a parameter. Alternatively (and
more easily), we can just re-run an older build job, which will have all parameters already set.

The pathway to production

Overall, the current deployment process looks like this. First, something triggers an
image build — usually either a developer who wants to deploy their changes, or a
scheduled build which happens at 7 am every business day. Then the new image is
propagated across the non-prod environments and finally starts a production deployment
pipeline. After this, the deployment to production is either approved by a developer or
times out. If it needs to go to production, all we need to do is re-run this build and
authorise the deployment.

Such a setup allows us to compose complex pipelines and to re-run them arbitrarily at
different stages. It also ensures that we have the latest version of the master branch
running in all environments, while keeping the deployment to production gated.

Over a million small businesses, and their advisors, are looking for the best cloud apps that
integrate with Xero. Partner with us, and we'll make sure they find yours.

