Project Report
Submitted for Bachelor of Technology
2021
ACKNOWLEDGEMENTS
Computer Science and Engineering) for their kind co-operation and supervision. I
also thank the Computer Science and Engineering Department and my friends for
their cooperation and encouragement.
Aditya Pareek
ABSTRACT
1 INTRODUCTION
2 DOCKER
2.1 Virtual machines and containers
2.2 Docker and Virtual machines
2.3 Linux containers
2.4 Storage drivers
2.5 Dockerfile
2.5.1 BUILD phase
2.5.2 RUN phase
2.6 Dockerfile best practices
2.7 Docker-compose
1 INTRODUCTION
The two main components used in this thesis are Docker and Kubernetes. Docker
is used to create a Docker image of the application by using a Dockerfile. A
Dockerfile contains all the instructions on how to build the final image for
deployment and distribution. The resulting images can be reused indefinitely. The
image is then used by Kubernetes for the deployment. The benefits of Docker
include the reusability of once-created resources and the fast setup of the target
environment, whether it is for testing or production purposes. This is achieved
through the container technologies provided by Docker and Kubernetes.
Container technology is relatively new and has been growing rapidly for the past
five years. (Docker Company 2018, cited 14.1.2018.)
Once the Docker image is created with the Docker platform, it is ready to be used
with the Kubernetes container platform. With the Docker platform a base image
is created, which is then used by the Kubernetes deployment platform. At best,
this is done with the press of a button. The ease of deployment eliminates possible
human errors in the process, which makes the deployment reliable, efficient and
quick. Kubernetes was selected for its versatility, scalability and the potential for
automating the deployment process. The technologies are still quite new and are
being developed every day to improve the end-user experience, which is already
enjoyable.
The field of Development and Operations (DevOps) benefits greatly from
containerization in the form of automated deployment. There are several types of
software for creating a continuous integration and deployment (CI/CD) pipeline.
This enables a DevOps team to deploy an application seamlessly to the targeted
environment. Compared to ordinary virtual machines, containerized platforms
require less configuration and can be deployed quickly through the CI/CD
pipeline. Container technologies also solve the problem of software environment
mismatches, because all the needed dependencies are installed inside the
container and do not interfere with the outside world. This way the container is
isolated and has everything it needs to run the application. (Rubens 2017, cited
26.12.2017.)
With containers, all possible mismatches between different software versions
and operating systems are eliminated. Developers can use whichever
programming language and software tools they want, as long as they run without
problems inside the container. Combined with the deployment process, this
makes the whole workflow agile, highly scalable and, most importantly, fast. With
Docker and Kubernetes, it is possible to create a continuous integration and
deployment pipeline which, for example, guarantees a quickly deployed
development version of an application to test locally. (Janmyr 2015, cited
4.1.2018.)
The company that assigned the thesis is Sparta Consulting Ltd. The company was
founded in 2012 and operates in the Information Management (IM) and Cyber
Security consultancy business. In addition to support services in this field, the
company has also developed a software product of its own to support this work.
Currently there are fewer than 50 Spartans. The employees mostly consist of
consultants, but the company also has a small development and deployment
team. The company has two main locations: the headquarters reside in the heart
of Helsinki and the development team is located in Jyväskylä, Central Finland.
One of the most powerful things about Docker is the flexibility it affords IT
organizations. The decision of where to run your applications can be based
100% on what’s right for your business. -- you can pick and choose and
mix and match in whatever manner makes sense for your organization. --
With Docker containers you get a great combination of agility, portability,
and control. (Coleman, 2016).
For instance, the Finnish railway company VR Group uses Docker to automate
its deployment and testing process. Its problems were high operating costs,
quality issues and a slow time-to-market process. After implementing Docker EE
(Enterprise Edition), its average cost savings increased by 50%. Logging and
monitoring became easier for all applications in use. Standardizing the
applications on one platform enables them to be used everywhere. A delivery
pipeline was set up which works for the whole platform, enabling easy
implementation of new items in the same environment.
2.1 Virtual machines and containers
During the early years of virtual machines, they gained popularity for their ability
to enable higher levels of server utilization, which is still true today. By mixing and
combining Docker hosts with regular virtual machines, system administrators can
get the maximum efficiency out of their physical hardware (Coleman 2016, cited
18.01.2018). Building a container cluster on top of a virtual machine, whether it is
made with Docker Swarm or Kubernetes, allows all the resources provided by the
physical machine to be used to maximize performance.
When Docker first came out, Linux-based containers had already existed for quite
some time, and the technologies Docker is based on are not brand new. The
predecessor of Docker, LinuX Containers (LXC), had been around for almost a
decade, and at first Docker's purpose was to build a specialized LXC container.
Docker then detached itself from LXC and created its own platform. (Matthias &
Kane 2015, 4; Upguard 2017, cited 13.1.2018.) At the most basic level, comparing
the two technologies reveals some similarities (TABLE 1). Both are lightweight
user-space virtualization mechanisms, and both use cgroups (control groups) and
namespaces for resource management. (Wang 2017, cited 18.01.2018.)
TABLE 1. Similarities between LXC and Docker (Rajdep 2014, slide 25/30).
The difference between namespaces and cgroups is that namespaces deal with
a single process, whereas cgroups allocate resources for a group of processes
(Wang 2017, cited 18.01.2018). By allocating resources per process or per group
of processes, it is possible to scale the amount of resources up and down, for
example during traffic peaks. This makes utilizing the processing power of the
physical computer possible and, more importantly, efficient.
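Docker exposes this cgroup-based resource control directly through its command
line. As an illustrative sketch (not part of the original project; the image name and
limit values are arbitrary), a container can be started with explicit memory and
CPU limits:

docker run --memory=512m --cpus=1.5 ubuntu:18.04 sleep 60

Docker translates these flags into cgroup settings for the container's processes,
which is what makes this kind of per-container resource scaling possible.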
Chroot (change root) is used to change the apparent root directory of an
application. Its purpose is to isolate certain applications from the rest of the
operating system; this is called a chroot jail. It is especially handy when testing a
program that could potentially harm the computer or is insecure in some way. An
important thing to remember is to disable root permissions for the application
placed inside the jail, so that it cannot run privileged commands. Other potential
use cases are, for example, running 32-bit applications on 64-bit operating
systems or executing old versions of certain applications on modern operating
systems. (Ubuntu 2015, cited 12.2.2018.)
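As a minimal sketch of the idea (the jail path is hypothetical and the required
libraries are omitted for brevity), a chroot jail could be prepared and entered on a
Linux host roughly as follows:

sudo mkdir -p /srv/jail/bin /srv/jail/lib /srv/jail/lib64
sudo cp /bin/bash /srv/jail/bin/
# the libraries listed by "ldd /bin/bash" must also be copied into the jail
sudo chroot /srv/jail /bin/bash

Inside the jail, /srv/jail appears as the root directory (/) and the rest of the host
filesystem is not visible to the process.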
“For image management, Docker relies heavily on its storage backend, which
communicates with the underlying Linux filesystem to build and manage the
multiple layers that combine into a single usable image.” (Matthias & Kane 2015,
44.) If the operating system's kernel supports multiple storage drivers, Docker
chooses from a prioritized list of usable storage drivers, unless a driver is
configured explicitly. By default, the Docker Community Edition uses the overlay2
filesystem, which provides a quick copy-on-write system for image management
(Docker storage 2018, cited 17.01.2018).
The default filesystem was used in this project because there was no need to
switch to a different one. Depending on the operating system Docker is installed
on, some filesystems might not be enabled and need specific drivers to be
installed. When in doubt about choosing the right storage driver, the best and
safest way is to use a modern Linux distribution with a kernel that supports the
overlay2 storage driver. (Docker storage 2018, cited 17.01.2018.)
There are generally two levels of storage drivers: file level and block level. For
example, the aufs, overlay and overlay2 drivers operate at the file level. They use
memory more efficiently, but in turn the writable layer of the container can grow
unnecessarily large in write-heavy workloads. The default overlay2 driver was
chosen since there are no write-heavy workloads in the project. The
devicemapper, btrfs and zfs drivers operate at the block level and perform well in
write-heavy workloads. (Docker storage 2018, cited 17.01.2018.)
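As a hedged illustration (not part of the original project configuration), the storage
driver in use can be checked on the Docker host, and set explicitly if needed,
through the daemon configuration:

docker info --format '{{.Driver}}'

# /etc/docker/daemon.json
{
  "storage-driver": "overlay2"
}

sudo systemctl restart docker

After restarting the daemon, docker info should report overlay2 as the active
storage driver.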
2.5 Dockerfile
FIGURE 5. Images built with a multi-stage Dockerfile, first and second stage.
Although premade images are available from Docker Hub
(https://hub.docker.com/), it is sometimes better to write a specific Dockerfile.
This way, it is known exactly how the final image is built and what licenses and/or
properties it contains. The downside is that the responsibility for keeping the
Dockerfile updated falls on the organization. It is also possible to combine
prepared images with self-made Dockerfiles to maximize efficiency.
Commonly used Dockerfile instructions include COPY, ADD, RUN, EXPOSE,
VOLUME and ENTRYPOINT.
The FROM instruction tells Docker where to get the base image for the build.
Depending on the use case, the Dockerfile can start FROM scratch, which starts
the container without any operating system. Normally, an operating system image
is pulled from Docker Hub to act as a basis for the final image. Many popular Linux
distributions, such as Ubuntu, CentOS, Debian, Alpine and CoreOS, have their
own official images there. (Dockerfile 2017, cited 25.12.2017.)
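For illustration (a minimal sketch; the tag is chosen arbitrarily), the first line of a
Dockerfile could look like either of the following:

# start from an Ubuntu base image pulled from Docker Hub
FROM ubuntu:18.04

# or start from an empty base with no operating system at all
FROM scratch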
The MAINTAINER instruction simply records the name and e-mail address of the
author. Its purpose is to tell the end user whom to contact in problem situations.
It is good to keep in mind that the author in question might not keep the Dockerfile
updated or reply to questions; in that case it is best to create a new Dockerfile.
(Dockerfile 2017, cited 25.12.2017.)
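A sketch with placeholder contact details (the name and address are not real):

MAINTAINER Jane Doe <jane.doe@example.com>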
The COPY instruction is used to copy items from the source system to the target
destination inside the container. It can also be used with multiple sources. In this
project the folder where the application resided was copied. The command COPY
./ /root (FIGURE 6) states that all the contents of the current folder are copied to
the /root path inside the image. (Janetakis 2017, cited 18.01.2018.)
The ADD instruction is similar to COPY, but with ADD a single tar file can be
extracted from the source to the destination. In addition, a URL address can be
used instead of a local file. A typical use case is extracting a TAR file into a specific
directory in the Docker image. (Janetakis 2017, cited 18.01.2018.)
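A hedged sketch (the archive name, URL and target paths are made up):

# a local tar archive is copied and automatically extracted into the image
ADD app-release.tar.gz /opt/app/

# a remote file is copied as-is; files fetched from a URL are not extracted
ADD https://example.com/config.tar.gz /tmp/config.tar.gz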
The RUN instruction executes commands inside a new layer on top of the current
image. The results are then committed and used for the next step in the
Dockerfile. It is used when installing important dependencies or updates. For
example, if the base image of the Dockerfile, usually an operating system, needs
to be updated, a command such as RUN apt-get upgrade is executed.
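In practice the package index is usually refreshed and the upgrade run
non-interactively in a single RUN instruction; a minimal sketch:

RUN apt-get update && apt-get -y upgrade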
The ONBUILD instruction adds a trigger instruction to the image, which is
executed later, when the image is used as a base for another build. This is useful
when building an image which will later serve as a base for other images. For
example, something like this could be added to the Dockerfile: ONBUILD ADD .
/app/src. This registers the instruction in advance, to be run in the next build
stage. (Dockerfile 2017, cited 25.12.2017.)
The ENV instruction is meant to define environment variables for the application
running inside the container. These are heavily application specific, for example
when setting the user and password for a PostgreSQL instance. They are best
specified inside a docker-compose.yml file, if one is used in the project. (Matthias
& Kane 2015, 43.)
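A hedged sketch with made-up variable names and values:

# hypothetical environment variables for a PostgreSQL-backed application
ENV POSTGRES_USER=appuser
ENV POSTGRES_DB=appdb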
The EXPOSE instruction tells Docker which ports the container listens on at
runtime. The port can be specified as TCP or UDP. This instruction does not
actually publish the port; it only documents which port is intended to be published.
(Dockerfile 2017, cited 25.12.2017.) In this project, the port assignment is done in
the docker-compose.yml file (APPENDIX 7).
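A minimal sketch (the port number is an example, not the one used in the project):

# the container listens on TCP port 4000 at runtime
EXPOSE 4000
# a UDP port would be declared as
EXPOSE 4000/udp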
“The VOLUME instruction creates a mount point with the specified name and
marks it as holding externally mounted volumes from native host or other
containers.” (Dockerfile 2017, cited 25.12.2017.) It can be used, for example, to
mount a shell script which is then run in the docker-compose up phase. Volumes
can also be used in Docker environments to create persistent storage for
important information such as user databases.
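A minimal sketch (the path is an example):

# declare a mount point for persistent database data
VOLUME /var/lib/postgresql/data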
The ENTRYPOINT instruction allows the user to configure a container that runs
as an executable process. Like the CMD instruction, if several ENTRYPOINT
instructions are listed in a Dockerfile, only the last one takes effect. ENTRYPOINT
is used in scenarios where the container needs to behave as if it were the
executable it contains, and where the end user is not supposed to override the
specified executable. (Dockerfile 2017, cited 25.12.2017; DeHamer 2015, cited
10.2.2018.)
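A minimal sketch (the executable path is hypothetical):

# the container behaves as if it were this executable;
# arguments passed to docker run are appended to it
ENTRYPOINT ["/opt/app/bin/start"]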
Not all Dockerfile instructions were used in this project. The CMD instruction, for
instance, was moved to the docker-compose.yml file so that the command can be
easily modified without rebuilding the image. This saved a lot of time in the
process.
The Dockerfile created for this project is shown in FIGURE 6. For convenience
and compatibility, the image nikolauska/phoenix:1.5.3-ubuntu was pulled straight
from Docker Hub (https://hub.docker.com) to be used in this project. This made
the deployment process easier, since there was a ready image on which to base
the build. The mix commands are specific to the application and not related to
this thesis, so they will not be explained.
Docker images are layered. When building an image, Docker creates a new
intermediate container for each instruction described in the Dockerfile. When
commands are chained together to form a coherent line of build instructions, the
build time and the resources used by the Dockerfile are reduced (FIGURE 7).
(Dockerfile guide 2017, cited 25.12.2017.)
Each of the RUN commands in the un-optimized version creates its own
intermediate container. Each of them takes time to set up, making the build
process slower than the optimized version, which creates only one layer to handle
the instructions. (Dockerfile guide 2017, cited 25.12.2017.)
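FIGURE 7 is not reproduced here, but the idea can be sketched as follows
(the package names are arbitrary examples):

# Un-optimized: every RUN instruction creates its own layer
RUN apt-get update
RUN apt-get install -y git
RUN apt-get install -y curl

# Optimized: one chained RUN instruction, one layer
RUN apt-get update && \
    apt-get install -y git curl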
Using a multi-stage build helps decrease the final image size. In the multi-stage
example shown earlier (FIGURE 5), the first stage of the build is where all the
dependencies and libraries are installed and the application is built and
compressed into an artifact. In the second stage of the build the artifact is taken
and unpacked. How the application itself is run is determined in the
docker-compose.yml file (APPENDIX 7). The image contains the dependencies
needed to run the application and is referred to in the docker-compose.yml file.
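Since FIGURE 5 and the project Dockerfile are not reproduced here, the following
is only a hedged sketch of a two-stage build (the base images, paths and build
script are made up):

# first stage: install dependencies, build the application, pack it into an artifact
FROM ubuntu:18.04 AS builder
COPY ./ /root
RUN /root/build.sh && tar -czf /root/app.tar.gz -C /root/_build .

# second stage: copy only the artifact into a clean image and unpack it
FROM ubuntu:18.04
COPY --from=builder /root/app.tar.gz /opt/
RUN tar -xzf /opt/app.tar.gz -C /opt/ && rm /opt/app.tar.gz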
When writing a Dockerfile, a plan should be made about what the final image
needs and does not need. For instance, there is no need for a PDF reader inside
a database instance. Minimizing the number of unnecessary packages is one of
the main goals, as it keeps the image small, fast to use and efficient in terms of
processing power. (McKendrick & Gallagher 2017, 36; Dockerfile guide 2017,
cited 25.12.2017.)
The purpose of the .dockerignore file is to exclude unwanted files which are not
needed in the Docker build process. The build context is what the Dockerfile uses
to build the final image; by using a .dockerignore file, all the irrelevant items can
be left out of the build context (Dockerfile guide 2017, cited 25.12.2017). The file
will seem familiar to anyone who has worked with .gitignore files in Git. (Cane
2017, cited 18.01.2018; McKendrick & Gallagher 2017, 36.)
To leave items out of the build context, for example the docker-compose.yml file,
simply type in the name of the file. To add an item back into the build context, an
exclamation mark (!) can be placed before the filename (FIGURE 8). The file is
read from top to bottom, which means that the rules at the top are applied first.
As an example, after telling the file to leave out all markdown files (*.md), a
specific file can be added back with the exclamation mark flag (!README.md).
(Cane 2017, cited 18.01.2018.)
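FIGURE 8 is not reproduced here; a minimal sketch of such a .dockerignore file
could look like this:

# leave the compose file and all markdown files out of the build context
docker-compose.yml
*.md
# but add this one markdown file back in
!README.md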
2.7 Docker-compose
The purpose of the docker-compose.yml file is to build and link containers
together. This is used, for example, when making a multi-container application,
such as linking an application to one or more databases. In this thesis the actual
docker-compose up command is not used. Instead, Kubernetes' own tool,
kompose, is used. It is very similar to docker-compose, but instead of building the
containers straight from the docker-compose.yml file, it first translates the file into
Kubernetes-readable resources. After translating, it can use those resources to
deploy the applications inside containers. (Kubernetes kompose 2018, cited
14.1.2018.) An example docker-compose.yml file (APPENDIX 7) links an
application (app) with three database instances (DB1, DB2, DB3).
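APPENDIX 7 is not reproduced here; as a hedged sketch, a compose file of that
shape (the service names, images and ports are made up) could look roughly
like this:

version: "3"
services:
  app:
    build: .
    ports:
      - "4000:4000"
    depends_on:
      - db1
      - db2
      - db3
  db1:
    image: postgres:10
  db2:
    image: postgres:10
  db3:
    image: postgres:10

The corresponding kompose command to translate such a file into Kubernetes
resources is kompose convert -f docker-compose.yml.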
INSTALLING DOCKER APPENDIX 1
Install Docker (this assumes the official Docker apt repository has already been
added to the system's package sources):
sudo apt-get update
sudo apt-get install docker-ce