
Introduction and Intermediate Docker

Docker is a tool used to develop, run, and ship containers. It’s an essential part of every data professional’s toolbelt, helping to create robust, secure, and scalable applications or workflows. In this course, you’ll become a Docker pro, gaining hands-on experience using the Docker CLI. This course builds upon the foundations of learning Docker and containerization found in the Introduction to Docker course. We extend the concepts and tools covered in that course, adding the ideas of container image management and optimization, networking, file system communication, multi-platform and multi-container applications. https://ebooks-tech.sellfy.store/p/docker-ebook/

Uploaded by

jcmayac
Copyright
© All Rights Reserved

Containers and their advantages
INTRODUCTION TO DOCKER

Tim Sangster
Software Engineer @ DataCamp
Prerequisites

Please take DataCamp's Introduction to Shell before starting this course.

We will use:

nano to edit files.

ls , cd , and mkdir to find our way in and manage the file system.

Containers
A portable computing environment

Making it less abstract

Containers run identically every time

Containers run identically everywhere

Isolation

Containers provide security

Containers are lightweight
Security

Portability

Reproducibility

Lightweight
In comparison to running an application:
Outside of a container

Using a virtual machine

Containers and data science
Automatically reproducible

Dependencies are automatically included

Datasets can be included

Code will work on your colleague's machine

Easier sharing than alternatives

Let's practice!

The Docker Engine

Docker ecosystem

Docker Engine

1 https://docs.docker.com/engine/

The Docker daemon

1 https://docs.docker.com/engine/ 2 https://docs.docker.com/get-started/overview/#docker-architecture

Images and Containers

1 https://docs.docker.com/engine/ 2 https://docs.docker.com/get-started/overview/#docker-architecture

Containers are processes

Containers are isolated processes

Let's practice!

Containers vs. Virtual Machines

Containers and Virtual Machines

Resource Virtualization

Containers vs Virtual Machines

Security of Virtualization

Containers are lightweight

Advantages of containers
Because of their smaller size, containers are faster to:
Start

Stop

Distribute

Change or update

Containers also have a large ecosystem of pre-made container images.

Advantages of Virtual Machines

Let's practice!

Running Docker containers

Prerequisite
Command                Usage
nano <file-name>       Opens <file-name> in the nano text editor
touch <file-name>      Creates an empty file with the specified name
echo "<text>"          Prints <text> to the console
<command> >> <file>    Appends the output of <command> to the end of <file>
<command> -y           Automatically responds yes to all prompts from <command>

The Docker CLI
The Docker command-line interface (CLI) sends instructions to the Docker daemon.

Every command starts with docker.

Docker container output
docker run <image-name>

docker run hello-world

Hello from Docker!

To generate this message, Docker took the following steps:


1. The Docker client contacted the Docker daemon.
2. The Docker daemon created a new container from the hello-world image which runs
the executable that produces the output you are currently reading.
3. The Docker daemon streamed that output to the Docker client, which sent it to
your terminal.

Choosing Docker container output
docker run <image-name>

docker run ubuntu

repl@host:/# docker run ubuntu

repl@host:/#

An interactive Docker container
Adding -it to docker run will give us an interactive shell in the started container.

docker run -it <image-name>

docker run -it ubuntu

docker run -it ubuntu


repl@container:/#

repl@container:/# exit
exit
repl@host:/#

Running a container detached
Adding -d to docker run will run the container in the background, giving us back control of
the shell.

docker run -d <image-name>


docker run -d postgres

repl@host:/# docker run -d postgres


4957362b5fb7019b56470a99f52218e698b85775af31da01958bab198a32b072
repl@host:/#

Listing and stopping running containers
docker ps

repl@host:/# docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED              STATUS              PORTS      NAMES
4957362b5fb7   postgres   "docker-entrypoint.s…"   About a minute ago   Up About a minute   5432/tcp   awesome_curie

docker stop <container-id>

repl@host:/# docker stop cf91547fd657


cf91547fd657

Summary of new commands
Usage                             Command
Start a container                 docker run <image-name>
Start an interactive container    docker run -it <image-name>
Start a detached container        docker run -d <image-name>
List running containers           docker ps
Stop a container                  docker stop <container-id>

Let's practice!

Working with Docker containers

Listing containers
repl@host:/# docker ps
CONTAINER ID IMAGE .. CREATED STATUS ... NAMES
3b87ec116cb6 postgres 2 seconds ago Up 1 second ... adoring_germain
8a7830bbc787 postgres 3 seconds ago Up 2 seconds ... exciting_heisenberg
fefdf1687b39 postgres 3 seconds ago Up 2 seconds ... vigilant_swanson
b70d549d4611 postgres 4 seconds ago Up 3 seconds ... nostalgic_matsumoto
a66c71c54b92 postgres 4 seconds ago Up 4 seconds ... lucid_matsumoto
8d4f412adc3f postgres 6 seconds ago Up 5 seconds ... fervent_ramanujan
fd0b3b2a843e postgres 7 seconds ago Up 6 seconds ... cool_dijkstra
0d1951db81c4 postgres 8 seconds ago Up 7 seconds ... happy_sammet
...

Named containers
docker run --name <container-name> <image-name>

repl@host:/# docker run --name db_pipeline_v1 postgres


repl@host:/# docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED              STATUS              PORTS      NAMES
43aa37614330   postgres   "docker-entrypoint.s…"   About a minute ago   Up About a minute   5432/tcp   db_pipeline_v1

docker stop <container-name>

repl@host:/# docker stop db_pipeline_v1

Filtering running containers
docker ps -f "name=<container-name>"

repl@host:/# docker ps -f "name=db_pipeline_v1"


CONTAINER ID   IMAGE      COMMAND                  CREATED              STATUS              PORTS      NAMES
43aa37614330   postgres   "docker-entrypoint.s…"   About a minute ago   Up About a minute   5432/tcp   db_pipeline_v1

Container logs
docker logs <container-id>

repl@host:/# docker logs 43aa37614330


The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".


The default database encoding has accordingly been set to "UTF8".

PostgreSQL init process complete; ready for start up.

2022-10-24 12:10:40.318 UTC [1] LOG: database system is ready to accept connect..

Live logs
docker logs -f <container-id>

repl@host:/# docker logs -f 43aa37614330


PostgreSQL init process complete; ready for start up.

2022-10-24 12:10:40.309 UTC [1] LOG: starting PostgreSQL 14.5 (Debian 14.5-1.pg..
2022-10-24 12:10:40.309 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port ..
2022-10-24 12:10:40.309 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-10-24 12:10:40.311 UTC [1] LOG: listening on Unix socket "/var/run/postgre..
2022-10-24 12:10:40.315 UTC [62] LOG: database system was shut down at 2022-10-..
2022-10-24 12:10:40.318 UTC [1] LOG: database system is ready to accept connect..

Cleaning up
docker container rm <container-id>

repl@host:/# docker stop 43aa37614330


43aa37614330
repl@host:/# docker container rm 43aa37614330
43aa37614330

Summary of new commands
Usage                                Command
Start a container with a name        docker run --name <container-name> <image-name>
Filter running containers by name    docker ps -f "name=<container-name>"
See existing logs for a container    docker logs <container-id>
See live logs for a container        docker logs -f <container-id>
Exit the live log view               CTRL+C
Remove a stopped container           docker container rm <container-id>

Let's practice!

Managing local Docker images

Pulling an image
docker pull <image-name>

docker pull postgres


docker pull ubuntu

repl@host:/# docker pull hello-world


Using default tag: latest
latest: Pulling from library/hello-world
7050e35b49f5: Pull complete
Digest: sha256:e18f0a777aefabe047a671ab3ec3eed05414477c951ab1a6f352a06974245fe7
Status: Downloaded newer image for hello-world:latest
docker.io/library/hello-world:latest

Image versions

docker pull <image-name>:<image-version>

docker pull ubuntu:22.04


docker pull ubuntu:jammy

Listing images
docker images

repl@host:/# docker images


REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest 46331d942d63 7 months ago 9.14kB
ubuntu bionic-20210723 7c0c6ae0b575 15 months ago 56.6MB
postgres 12.7 f076c2fa35f5 15 months ago 300MB
postgres 10.3 cbb7481ff9d5 4 years ago 232MB
...

Removing images
docker image rm <image-name>

repl@host:/# docker image rm hello-world


Untagged: hello-world:latest
Untagged: hello-world@sha256:e18f0a777aefabe047a671ab3ec3eed05414477c951ab1a6f35..
Deleted: sha256:46331d942d6350436f64e614d75725f6de3bb5c63e266e236e04389820a234c4
Deleted: sha256:efb53921da3394806160641b72a2cbd34ca1a9a8345ac670a85a04ad3d0e3507

repl@host:/# docker image rm hello-world


Error response from daemon: conflict: unable to remove repository reference
"hello-world" (must force) - container 96a7b7b0c535 is using its
referenced image 46331d942d63

Cleaning up containers
docker container prune

repl@host:/# docker container prune


WARNING! This will remove all stopped containers.
Are you sure you want to continue? [y/N] y
Deleted Containers:
4a7f7eebae0f63178aff7eb0aa39cd3f0627a203ab2df258c1a00b456cf20063
f98f9c2aa1eaf727e4ec9c0283bc7d4aa4762fbdba7f26191f26c97f64090360

Total reclaimed space: 212 B

Cleaning up images
docker image prune -a

repl@host:/# docker image prune -a


WARNING! This will remove all images without at least one container associated t..
Are you sure you want to continue? [y/N] y
Deleted Images:
untagged: alpine:latest
untagged: alpine@sha256:3dcdb92d7432d56604d4545cbd324b14e647b313626d99b889d0626d..
deleted: sha256:4e38e38c8ce0b8d9041a9c4fefe786631d1416225e13b0bfe8cfa2321aec4bba
deleted: sha256:4fe15f8d0ae69e169824f25f1d4da3015a48feeeeebb265cd2e328e15c6a869f

Total reclaimed space: 16.43 MB

Dangling images
docker images

repl@host:/# docker images


REPOSITORY TAG IMAGE ID CREATED SIZE
testsql latest 6c49f0cce145 7 months ago 3.73GB
<none> <none> a22b8450b88f 7 months ago 3.73GB
<none> <none> 10dd2d03f59c 7 months ago 3.73GB
<none> <none> 878bae40320b 7 months ago 3.73GB
<none> <none> 4ea70583ba54 7 months ago 3.75GB
<none> <none> 3c64576a3a7d 7 months ago 3.75GB

Summary of new commands
Usage                                  Command
Pull an image                          docker pull <image-name>
Pull a specific version of an image    docker pull <image-name>:<image-version>
List all local images                  docker images
Remove an image                        docker image rm <image-name>
Remove all stopped containers          docker container prune
Remove all unused images               docker image prune -a

Let's practice!

Distributing Docker Images

Private Docker registries
Unlike Docker official images, there is no quality guarantee

Name starts with the URL of the private registry

dockerhub.myprivateregistry.com/classify_spam

docker pull dockerhub.myprivateregistry.com/classify_spam:v1

Using tag: v1
latest: Pulling from dockerhub.myprivateregistry.com
ed02c6ade914: Pull complete
Digest: sha256:b6b83d3c331794420340093eb706b6f152d9c1fa51b262d9bf34594887c2c7ac
Status: Downloaded newer image for dockerhub.myprivateregistry.com/classify_spam:v1
dockerhub.myprivateregistry.com/classify_spam:v1

Pushing to a registry
docker image push <image-name>

To push to a specific registry, the name of the image needs to start with the registry URL:

docker tag classify_spam:v1 dockerhub.myprivateregistry.com/classify_spam:v1

docker image push dockerhub.myprivateregistry.com/classify_spam:v1
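
The naming rule can be made concrete with a small sketch. The Python function below is hypothetical (it is not part of Docker) and models a simplified version of how an image name is read: the first part counts as a registry only if it looks like a host name (it contains a "." or ":" or is localhost). Otherwise Docker Hub (docker.io) is assumed, and a missing tag becomes latest. Digests (@sha256:...) are not handled.

```python
def parse_image_reference(reference):
    """Split an image name into registry, repository, and tag (simplified sketch)."""
    # A ':' after the last '/' separates the tag; without one, 'latest' is assumed.
    last_slash = reference.rfind("/")
    last_colon = reference.rfind(":")
    if last_colon > last_slash:
        name, tag = reference[:last_colon], reference[last_colon + 1:]
    else:
        name, tag = reference, "latest"

    # The first path component is a registry only if it looks like a host name.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        registry, repository = first, rest
    else:
        registry = "docker.io"
        # Official images on Docker Hub live under the 'library/' namespace.
        repository = name if "/" in name else "library/" + name
    return {"registry": registry, "repository": repository, "tag": tag}


print(parse_image_reference("classify_spam:v1"))
print(parse_image_reference("dockerhub.myprivateregistry.com/classify_spam:v1"))
```

Under these assumptions, only the second name points at the private registry, which is why the image is re-tagged before it is pushed.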

Authenticating against a registry
Docker official images --> No authentication needed

Private Docker repository --> Owner can choose

docker login dockerhub.myprivateregistry.com

user@pc ~ % docker login dockerhub.myprivateregistry.com


Username: student
Password:
Login succeeded

Docker images as files
Sending a Docker image to one or a few people? Send it as a file!

Save an image

docker save -o image.tar classify_spam:v1

Load an image

docker load -i image.tar

Summary of new commands
Usage                               Command
Pull image from private registry    docker pull <private-registry-url>/<image-name>
Name an image                       docker tag <old-name> <new-name>
Push an image                       docker image push <image-name>
Log in to private registry          docker login <private-registry-url>
Save image to file                  docker save -o <file-name> <image-name>
Load image from file                docker load -i <file-name>

Let's practice!

Creating your own Docker images

Creating images with Dockerfiles

Starting a Dockerfile
A Dockerfile always starts from another image, specified using the FROM instruction.

FROM postgres
FROM ubuntu
FROM hello-world
FROM my-custom-data-pipeline

FROM postgres:15.0
FROM ubuntu:22.04
FROM hello-world:latest
FROM my-custom-data-pipeline:v1

Building a Dockerfile
Building a Dockerfile creates an image. The path passed to docker build is the folder that contains the Dockerfile.

docker build /location/to/Dockerfile


docker build .

[+] Building 0.1s (5/5) FINISHED


=> [internal] load build definition from Dockerfile
=> => transferring dockerfile: 54B
...
=> CACHED [1/1] FROM docker.io/library/ubuntu
=> exporting to image
=> => exporting layers
=> => writing image sha256:a67f41b1d127160a7647b6709b3789b1e954710d96df39ccaa21..

Naming our image
In practice we almost always give our images a name using the -t flag:

docker build -t first_image .

...
=> => writing image sha256:a67f41b1d127160a7647b6709b3789b1e954710d96df39ccaa21..
=> => naming to docker.io/library/first_image

docker build -t first_image:v0 .

=> => writing image sha256:a67f41b1d127160a7647b6709b3789b1e954710d96df39ccaa21..


=> => naming to docker.io/library/first_image:v0

Customizing images
RUN <valid-shell-command>

FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3

Use the -y flag to avoid any prompts:

...
After this operation, 22.8 MB of additional disk space will be used.
Do you want to continue? [Y/n]

Building a non-trivial Dockerfile
When building an image, Docker actually runs the commands that follow RUN.
Running RUN apt-get update during a build takes the same amount of time as running apt-get update ourselves!

root@host:/# apt-get update


Get:1 http://ports.ubuntu.com/ubuntu-ports jammy InRelease [270 kB]
...
Get:17 http://ports.ubuntu.com/ubuntu-ports jammy-security/restricted arm64 Pack..
Fetched 23.0 MB in 2s (12.3 MB/s)
Reading package lists... Done

Summary
Usage                                                      Dockerfile Instruction
Start a Dockerfile from an image                           FROM <image-name>
Add a shell command to image                               RUN <valid-shell-command>
Make sure no user input is needed for the shell command    RUN apt-get install -y python3

Usage                                       Shell Command
Build image from Dockerfile                 docker build /location/to/Dockerfile
Build image in current working directory    docker build .
Choose a name when building an image        docker build -t first_image .

Let's practice!

Managing files in your image

COPYing files into an image
The COPY instruction copies files from our local machine into the image we're building:

COPY <src-path-on-host> <dest-path-on-image>


COPY /projects/pipeline_v3/pipeline.py /app/pipeline.py

docker build -t pipeline:v3 .


...
[4/4] COPY ./projects/pipeline_v3/pipeline.py /app/pipeline.py

If the destination path does not have a filename, the original filename is used:

COPY /projects/pipeline_v3/pipeline.py /app/

COPYing folders
Not specifying a filename in the src-path will copy all the files in that folder.

COPY <src-folder> <dest-folder>


COPY /projects/pipeline_v3/ /app/

COPY /projects/pipeline_v3/ /app/ will copy everything under pipeline_v3/ :

/projects/
    pipeline_v3/
        pipeline.py
        requirements.txt
        tests/
            test_pipeline.py

Copy files from a parent directory
/init.py
/projects/
    Dockerfile
    pipeline_v3/
        pipeline.py

If our current working directory is the projects/ folder, we can't copy init.py into an image, because init.py lies outside the folder we pass to docker build:

docker build -t pipeline:v3 .


=> ERROR [4/4] COPY ../init.py / 0.0s
failed to compute cache key: "../init.py" not found: not found
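
Why ../init.py fails can be sketched with a small check. The helper below is hypothetical and only a simplified model for relative source paths: it resolves the path against the build folder and tests whether the result is still inside that folder. Docker's own check happens inside the builder and may differ in detail.

```python
import posixpath


def inside_build_context(context_dir, source_path):
    """Return True if source_path, resolved against context_dir, stays inside it."""
    context = posixpath.normpath(context_dir)
    resolved = posixpath.normpath(posixpath.join(context, source_path))
    return posixpath.commonpath([context, resolved]) == context


print(inside_build_context("/projects", "pipeline_v3/pipeline.py"))  # True
print(inside_build_context("/projects", "../init.py"))               # False
```

Moving init.py into projects/, or building from the parent folder instead, would make the file reachable.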

Downloading files
Instead of copying files from a local directory, files are often downloaded in the image build:

Download a file

RUN curl <file-url> -o <destination>

Unzip the file

RUN unzip <dest-folder>/<filename>.zip

Remove the original zip file

RUN rm <dest-folder>/<filename>.zip

Downloading files efficiently
Each instruction that downloads files adds to the total size of the image, even if the files are later deleted.

The solution is to download, unpack, and remove files in a single instruction.

RUN curl <file_download_url> -o <destination_directory>/<filename>.zip \
&& unzip <destination_directory>/<filename>.zip -d <unzipped-directory> \
&& rm <destination_directory>/<filename>.zip
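
A toy model shows why a single instruction helps. It assumes, as a simplification, that every instruction produces one layer and that a layer stores whatever was added and still exists when the instruction ends. The function name and the file sizes are made up for illustration.

```python
def image_size(layers):
    """Sum the size of every layer; each layer keeps what exists when its instruction ends."""
    total = 0
    for operations in layers:
        files = {}
        for action, name, size in operations:
            if action == "add":
                files[name] = size
            elif action == "rm":
                # Removing a file only saves space if it was added in this same layer.
                files.pop(name, None)
        total += sum(files.values())
    return total


# Hypothetical sizes in MB: a 100 MB zip that unpacks to 300 MB.
three_instructions = [
    [("add", "data.zip", 100)],   # RUN curl ...
    [("add", "data/", 300)],      # RUN unzip ...
    [("rm", "data.zip", 0)],      # RUN rm ...
]
one_instruction = [
    [("add", "data.zip", 100), ("add", "data/", 300), ("rm", "data.zip", 0)],
]

print(image_size(three_instructions))  # 400
print(image_size(one_instruction))     # 300
```

In the three-instruction version the zip file stays in the first layer, so deleting it later does not shrink the image.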

Summary
Usage                                                                           Dockerfile Instruction
Copy files from host to the image                                               COPY <src-path-on-host> <dest-path-on-image>
Copy a folder from host to the image                                            COPY <src-folder> <dest-folder>
We can't copy from a parent of the directory where we build the Dockerfile      COPY ../<file-in-parent-directory> /

Keep images small by downloading, unzipping, and cleaning up in a single RUN instruction:

RUN curl <file_download_url> -o <destination_directory>/<filename>.zip \
&& unzip <destination_directory>/<filename>.zip -d <unzipped-directory> \
&& rm <destination_directory>/<filename>.zip

Let's practice!

Choosing a start command for your Docker image

What is a start command?
The hello-world image prints text and then stops.

docker run hello-world

Hello from Docker!

To generate this message, Docker took the following steps:


1. The Docker client contacted the Docker daemon.
2. The Docker daemon created a new container from the hello-world image which runs
the executable that produces the output you are currently reading.
3. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.

What is a start command?
An image with Python could start Python on startup.

docker run python3-sandbox

Python 3.10.6 (main, Nov 2 2022, 18:53:38) [GCC 11.3.0] on linux


Type "help", "copyright", "credits" or "license" for more information.
>>>
...

....
>>> exit()
repl@host:/#

Running a shell command at startup
CMD <shell-command>

The CMD instruction:

Runs when the image is started.

Does not increase the size of the image.

Does not add any time to the build.

If multiple exist, only the last will have an effect.

Typical usage
Starting an application to run a workflow or that accepts outside connections.

CMD python3 my_pipeline.py

CMD postgres

Starting a script that, in turn, starts multiple applications

CMD start.sh

CMD python3 start_pipeline.py

When will it stop?

hello-world image -> After printing text

A database image -> When the database exits

A more general image needs a more general start command.

An Ubuntu image -> When the shell is closed

Overriding the default start command
Starting an image

docker run <image>

Starting an image with a custom start command

docker run <image> <shell-command>

Starting an image interactively with a custom start command

docker run -it <image> <shell-command>

docker run -it ubuntu bash
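
The two rules for the start command (the last CMD wins, and a command passed to docker run replaces it) can be sketched as below. The function is hypothetical and deliberately small: it ignores a CMD inherited from the base image and other instructions that affect startup.

```python
def effective_start_command(dockerfile_lines, run_command=None):
    """Return the command a container would start with (simplified sketch)."""
    if run_command is not None:
        # A command given to docker run replaces the CMD stored in the image.
        return run_command
    start_command = None
    for line in dockerfile_lines:
        stripped = line.strip()
        if stripped.startswith("CMD "):
            # Each CMD replaces the previous one, so the last one wins.
            start_command = stripped[len("CMD "):]
    return start_command


dockerfile = ["FROM ubuntu", "CMD python3 my_pipeline.py", "CMD python3 start_pipeline.py"]
print(effective_start_command(dockerfile))          # python3 start_pipeline.py
print(effective_start_command(dockerfile, "bash"))  # bash
```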

Summary
Usage                                                                  Dockerfile Instruction
Add a shell command run when a container is started from the image    CMD <shell-command>

Usage                                                      Shell Command
Override the CMD set in the image                          docker run <image> <shell-command>
Override the CMD set in the image and run interactively    docker run -it <image> <shell-command>

Let's practice!

Introduction to Docker layers and caching

Docker build
Downloading and unzipping a file using Dockerfile instructions:

RUN curl -O http://example.com/example_folder.zip


RUN unzip example_folder.zip

Will change the file system and add:

/example_folder.zip
/example_folder/
    example_file1
    example_file2

It is these changes that are stored in the image.

Docker instructions are linked to file system changes
Each instruction in the Dockerfile is linked to the changes it made in the image file system.

FROM docker.io/library/ubuntu
=> Gives us a file system to start from with all files needed to run Ubuntu

COPY /pipeline/ /pipeline/


=> Creates the /pipeline/ folder
=> Copies multiple files in the /pipeline/ folder

RUN apt-get install -y python3


=> Add python3 to /var/lib/

Docker layers
Docker layer: All changes caused by a single Dockerfile instruction.
Docker image: All layers created during a build

--> Docker image: All changes to the file system by all Dockerfile instructions.

While building a Dockerfile, Docker tells us which layer it is working on:

=> [1/3] FROM docker.io/library/ubuntu


=> [2/3] RUN apt-get update
=> [3/3] RUN apt-get install -y python3

Docker caching
Consecutive builds are much faster because Docker re-uses layers that haven't changed.

Re-running a build:

=> [1/3] FROM docker.io/library/ubuntu


=> CACHED [2/3] RUN apt-get update
=> CACHED [3/3] RUN apt-get install -y python3

Re-running a build but with changes:

=> [1/3] FROM docker.io/library/ubuntu


=> CACHED [2/3] RUN apt-get update
=> [3/3] RUN apt-get install -y R

Understanding Docker caching
Knowing when layers are cached helps us understand why images sometimes don't change after a rebuild.

Docker can't know when a new version of python3 is released.

Docker will use cached layers because the instructions are identical to previous builds.

=> [1/3] FROM docker.io/library/ubuntu


=> CACHED [2/3] RUN apt-get update
=> CACHED [3/3] RUN apt-get install -y python3

Understanding Docker caching
Helps us write Dockerfiles that build faster because not all layers need to be rebuilt.

In the following Dockerfile, all instructions need to be rebuilt if the pipeline.py file is changed:

FROM ubuntu
COPY /app/pipeline.py /app/pipeline.py
RUN apt-get update
RUN apt-get install -y python3

=> [1/4] FROM docker.io/library/ubuntu


=> [2/4] COPY /app/pipeline.py /app/pipeline.py
=> [3/4] RUN apt-get update
=> [4/4] RUN apt-get install -y python3

Understanding Docker caching
Helps us write Dockerfiles that build faster because not all layers need to be rebuilt.

In the following Dockerfile, only the COPY instruction will need to be re-run.

FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
COPY /app/pipeline.py /app/pipeline.py

=> [1/4] FROM docker.io/library/ubuntu


=> CACHED [2/4] RUN apt-get update
=> CACHED [3/4] RUN apt-get install -y python3
=> [4/4] COPY /app/pipeline.py /app/pipeline.py
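
The effect of instruction order can be sketched with a toy model of the cache rule: a step is reused only if it and every step before it are unchanged since the previous build. The function name is hypothetical, and the "@v1"/"@v2" suffix is an assumed stand-in for the contents of pipeline.py, since a changed file changes the COPY step.

```python
def plan_build(previous_steps, current_steps):
    """Label each step of the current build as CACHED or REBUILD."""
    plan = []
    cache_usable = True
    for index, step in enumerate(current_steps):
        same_as_before = index < len(previous_steps) and previous_steps[index] == step
        if cache_usable and same_as_before:
            plan.append((step, "CACHED"))
        else:
            # Once one step changes, every later step is rebuilt as well.
            cache_usable = False
            plan.append((step, "REBUILD"))
    return plan


install = ["RUN apt-get update", "RUN apt-get install -y python3"]
# The "@v1"/"@v2" suffix is a stand-in for the contents of pipeline.py.
copy_first = plan_build(["COPY pipeline.py@v1"] + install, ["COPY pipeline.py@v2"] + install)
copy_last = plan_build(install + ["COPY pipeline.py@v1"], install + ["COPY pipeline.py@v2"])

print([status for _, status in copy_first])  # ['REBUILD', 'REBUILD', 'REBUILD']
print([status for _, status in copy_last])   # ['CACHED', 'CACHED', 'REBUILD']
```

Placing the instructions that change most often at the end keeps the slow installation steps cached.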

Let's practice!

Changing users and working directory

Dockerfile instruction interaction
FROM, RUN, and COPY interact through the file system.

COPY /projects/pipeline_v3/start.sh /app/start.sh


RUN /app/start.sh

Some influence other instructions directly:

WORKDIR : Changes the working directory for all following instructions

USER : Changes the user for all following instructions

WORKDIR - Changing the working directory
Starting all paths at the root of the file system:

COPY /projects/pipeline_v3/ /app/

Becomes cluttered when working with long paths:

COPY /projects/pipeline_v3/ /home/my_user_with_a_long_name/work/projects/app/

Alternatively, use WORKDIR:

WORKDIR /home/my_user_with_a_long_name/work/projects/

COPY /projects/pipeline_v3/ app/
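
How a relative destination is read after WORKDIR can be sketched with ordinary path joining. The helper name is hypothetical, and this is only a model of the path arithmetic, not of Docker itself.

```python
import posixpath


def resolve_in_workdir(workdir, path):
    """Resolve a path the way a relative destination is read after WORKDIR."""
    # posixpath.join keeps absolute paths unchanged and prefixes relative ones.
    return posixpath.normpath(posixpath.join(workdir, path))


workdir = "/home/my_user_with_a_long_name/work/projects/"
print(resolve_in_workdir(workdir, "app/"))   # /home/my_user_with_a_long_name/work/projects/app
print(resolve_in_workdir(workdir, "/app/"))  # /app
```

A destination that starts with / is still taken from the root of the file system, so WORKDIR only shortens relative paths.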

RUN in the current working directory
Instead of using the full path for every command:

RUN /home/repl/projects/pipeline/init.sh
RUN /home/repl/projects/pipeline/start.sh

Set the WORKDIR:

WORKDIR /home/repl/projects/pipeline/
RUN ./init.sh
RUN ./start.sh

Changing the startup behavior with WORKDIR
Instead of using the full path:

CMD /home/repl/projects/pipeline/start.sh

Set the WORKDIR:

WORKDIR /home/repl/projects/pipeline/
CMD ./start.sh

An overriding command will also be run in WORKDIR:

docker run -it pipeline_image ./start.sh

Linux permissions
Permissions are assigned to users.
Root is a special user with all permissions.

Best practice

Use root to create new users with permissions for specific tasks.

Stop using root.

Changing the user in an image
Best practice: Don't run everything as root
Ubuntu -> root by default

FROM ubuntu --> Root user by default


RUN apt-get update --> Run as root

USER Dockerfile instruction:

FROM ubuntu --> Root user by default


USER repl --> Changes the user to repl
RUN apt-get update --> Run as repl

Changing the user in a container
Dockerfile setting the user to repl:

FROM ubuntu --> Root user by default


USER repl --> Changes the user to repl
RUN apt-get update --> Run as repl

Will also start containers with the repl user:

docker run -it ubuntu bash


repl@container: whoami
repl

Summary
Usage                                   Dockerfile Instruction
Change the current working directory    WORKDIR <path>
Change the current user                 USER <user-name>

Time for practice!

Variables in Dockerfiles

Variables with the ARG instruction
Create variables in a Dockerfile

ARG <var_name>=<var_value>

For example ARG path=/home/repl

To use in the Dockerfile

$path

For example COPY /local/path $path

Use-cases for the ARG instruction
Setting the Python version

FROM ubuntu
ARG python_version=3.9.7-1+bionic1
RUN apt-get install -y python3=$python_version
RUN apt-get install -y python3-dev=$python_version

Configuring a folder

FROM ubuntu
ARG project_folder=/projects/pipeline_v3
COPY /local/project/files $project_folder
COPY /local/project/test_files $project_folder/tests

Setting ARG variables at build time
FROM ubuntu
ARG project_folder=/projects/pipeline_v3
COPY /local/project/files $project_folder
COPY /local/project/test_files $project_folder/tests

Setting a variable in the build command

docker build --build-arg project_folder=/repl/pipeline .

ARG is overwritten, and files end up in:

COPY /local/project/files /repl/pipeline


COPY /local/project/test_files /repl/pipeline/tests
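
The override can be traced with a small sketch. The function below is hypothetical and models only the substitution step: ARG lines set a default, a value passed with --build-arg replaces it, and $name is filled in on the lines that follow.

```python
from string import Template


def resolve_build_args(dockerfile_lines, build_args=None):
    """Apply ARG defaults and --build-arg overrides to the following lines (sketch)."""
    overrides = build_args or {}
    values = {}
    resolved = []
    for line in dockerfile_lines:
        if line.startswith("ARG "):
            name, _, default = line[len("ARG "):].partition("=")
            # A value passed with --build-arg takes precedence over the default.
            values[name] = overrides.get(name, default)
        else:
            resolved.append(Template(line).safe_substitute(values))
    return resolved


dockerfile = [
    "FROM ubuntu",
    "ARG project_folder=/projects/pipeline_v3",
    "COPY /local/project/files $project_folder",
    "COPY /local/project/test_files $project_folder/tests",
]
for line in resolve_build_args(dockerfile, {"project_folder": "/repl/pipeline"}):
    print(line)
```

Calling resolve_build_args(dockerfile) without overrides falls back to the default /projects/pipeline_v3.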

Variables with ENV
Create variables in a Dockerfile

ENV <var_name>=<var_value>

For example ENV DB_USER=pipeline_user

To use in the Dockerfile or at runtime

$DB_USER

For example CMD psql -U $DB_USER

Use-cases for the ENV instruction
Setting a directory to be used at runtime

ENV DATA_DIR=/usr/local/var/postgres

ENV MODE=production

Setting or replacing a variable at runtime

docker run --env <key>=<value> <image-name>

docker run --env POSTGRES_USER=test_db --env POSTGRES_PASSWORD=test_db postgres

1 https://hub.docker.com/_/postgres
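
Setting or replacing a variable at runtime behaves like updating a dictionary. The sketch below is a hypothetical model, not Docker code: values from ENV in the image are the defaults, and values passed with --env are applied on top.

```python
def container_environment(image_env, run_env=None):
    """Combine ENV values from the image with --env values from docker run (sketch)."""
    environment = dict(image_env)
    # Values passed with --env add new variables or replace the image's defaults.
    environment.update(run_env or {})
    return environment


image_env = {"DB_USER": "pipeline_user", "MODE": "production"}
print(container_environment(image_env))
print(container_environment(image_env, {"MODE": "test", "POSTGRES_USER": "test_db"}))
```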

Secrets in variables are not secure
docker history <image-name>

ARG DB_PASSWORD=example_password

Will show in docker history :

IMAGE CREATED CREATED BY SIZE ...


cd338027297f 2 months ago ARG DB_PASSWORD=example_password 0B ...

Summary
Usage                                                 Dockerfile Instruction
Create a variable accessible only during the build    ARG <name>=<value>
Create a variable                                     ENV <name>=<value>

Usage                                           Shell Command
Override an ARG in docker build                 docker build --build-arg <name>=<value>
Override an ENV in docker run                   docker run --env <name>=<value> <image-name>
See the instructions used to create an image    docker history <image-name>

Let's practice!

Creating Secure Docker Images

Inherent Security

Making secure images

In exceptional cases, attackers can break out of a container.

Additional security measures can lower this risk.

This becomes especially important once running containers are exposed to the Internet.

Images from a trusted source
Creating secure images -> Start with an image from a trusted source
Docker Hub filters:

Keep software up-to-date

Keep images minimal
Adding unnecessary packages reduces security
Ubuntu with:

Python2.7

Python3.11

Java default-jre

Java openjdk-11

Java openjdk-8

Airflow

Our pipeline application

Installing only essential packages improves security
Ubuntu with:

Python3.11

Our pipeline application

INTRODUCTION TO DOCKER
Don't run applications as root
Allowing root access to an image defeats the purpose of keeping it up-to-date and minimal.

Instead, make containers start as a user with fewer permissions:

# The user is set to root by default.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
# Switch the user after installing what we need for our use-case.
# The repl user must already exist in the image.
USER repl
CMD python3 pipeline.py

INTRODUCTION TO DOCKER
Let's practice!

Wrap-up

Tim Sangster
Software Engineer @ DataCamp
Chapter 1: The theoretical foundation

INTRODUCTION TO DOCKER
Chapter 2: The Docker CLI
Usage                                  Command
Start a container                      docker run (--name <container-name>) (-it) (-d) <image-name>
List running containers                docker ps (-f "name=<container-name>")
Stop a container                       docker stop <container-id>
See (live) logs for container          docker logs (-f) <container-id>
Remove stopped container               docker container rm <container-id>
Pull a specific version of an image    docker pull <image-name>:<image-version>
List all local images                  docker images
Remove an image                        docker image rm <image-name>

INTRODUCTION TO DOCKER
Chapter 3: Dockerfiles
FROM ubuntu
RUN apt-get update && apt-get install -y python3
COPY /projects/pipeline /app/
CMD /app/init.py

docker build -t my_pipeline .

=> [1/3] FROM docker.io/library/ubuntu
=> CACHED [2/3] RUN apt-get update && apt-get install -y python3
=> CACHED [3/3] COPY /projects/pipeline /app/

INTRODUCTION TO DOCKER
Chapter 4: Security and Customization
Usage                                                 Dockerfile Instruction
Change the current working directory                  WORKDIR <path>
Change the current user                               USER <user-name>
Create a variable accessible only during the build    ARG <name>=<value>
Create a variable                                     ENV <name>=<value>

Usage                                                 Shell Command
Override an ARG in docker build                       docker build --build-arg <name>=<value>
Override an ENV in docker run                         docker run --env <name>=<value> <image-name>
See the instructions used to create an image          docker history <image-name>

INTRODUCTION TO DOCKER
Chapter 4: Security and Customization
Isolation provided by containers gives security but is not perfect.
Use the "Trusted Content" images from the official Docker Hub registry

Keep software on images up-to-date

Only install the software you need for the current use case.

Do not leave the user in images set to root.

INTRODUCTION TO DOCKER
What more is there to learn?
Dockerfile instructions
ENTRYPOINT
HEALTHCHECK
EXPOSE
...

Multi stage builds
FROM ubuntu as stage1
RUN generate_data.py
...
FROM postgres as stage2
COPY --from=stage1 /tmp /data

INTRODUCTION TO DOCKER
Thank you!
INTRODUCTION TO DOCKER
Intermediate Docker
Commands
INTERMEDIATE DOCKER

Mike Metzger
Data Engineering Consultant
Docker refresher
Docker is a container runtime
Is designed to run and manage various containerized applications on Windows, Mac, and Linux

Can run containers using pre-built images, or create our own

Dockerfiles are text files used to create Docker container images

Containers are instances of a given Docker image

The DataCamp Introduction to Docker course is a prerequisite to this course

INTERMEDIATE DOCKER
Docker commands
docker run

docker stop

docker build

docker --help
Provides a list of potential Docker commands

docker COMMAND --help
docker run --help
Provides options for the docker run command

INTERMEDIATE DOCKER
Temporary containers
Docker containers are usually created with docker run

Containers remain even after stopping / exiting

Often want to run a container instance and remove it immediately upon exit
Development
Testing
Scripts

docker run --rm
Referenced as 'clean-up' or 'remove'
docker run --rm alpine:latest /bin/sh

INTERMEDIATE DOCKER
docker ps
Used for determining name, id, status, and other attributes of containers on a given machine
running Docker

Use the -a flag to list all containers, including stopped ones


docker ps -a

Will cover how to get extremely detailed information about containers later in the course

INTERMEDIATE DOCKER
Let's practice!

Mounting the host filesystem

Mike Metzger
Data Engineer
Container filesystems
Container instances each have their own filesystem
Based off the image the container was created with
Any changes are tied to that specific container instance

Any changes are maintained across restarts
For that instance only

New containers only have the data in the image, not instance-specific changes

INTERMEDIATE DOCKER
Sharing files or directories
Can attach specific files or directories to containers
Allows for persistence of data, without maintaining a specific container
Can upgrade container to new version but safely keep data / changes

Known as bind-mount
Can be read-only or read/write
Note: When files or directories are attached to a container, they are not accessible to the host until the container is shut down

INTERMEDIATE DOCKER
Using the -v option
bind-mounts most often use the -v flag
-v <source>:<destination>

Multiple -v flags permitted

Can also use the --mount option

Note: bind-mount hides any content already present in the destination directory

docker run -v ~/html:/var/www/html \
  nginx

docker run \
  -v ~/pgdata:/opt/data \
  -v ~/pg.conf:/etc/pg.conf \
  postgresql

INTERMEDIATE DOCKER
Let's practice!

Persistent volumes

Mike Metzger
Data Engineering Consultant
What is a volume?
Volumes are an option to store data in Docker, unrelated to the container image or host filesystem

Are managed from the command line (or API)

Can share with multiple containers

Higher performance than file share / bind mounts

Exist until removed

1 Image modified from https://docs.docker.com

INTERMEDIATE DOCKER
Managing volumes
docker volume

docker volume create <volumename>

docker volume ls or
docker volume list

docker volume inspect


Provides assorted metadata about the
volume, including Name, Mountpoint,
Options, and so forth

docker volume rm

INTERMEDIATE DOCKER
Volume creation example
bash> docker volume create sqldata

sqldata

bash> docker volume ls

DRIVER VOLUME NAME


local 2f2b7f710551e004dcdd9edf4cad31c37826b428de12f1c04ca02305d216ab00
local 14da7ff0c6eb29f644e6f9f9d59bbcf56b3699c04881dd7cbcaa9ecd6bef239c
local 150aa3c5c7aee30ffd1ec7ecf39f03989bf561536a9413ebed96ffbaa537d103
local sqldata
...

INTERMEDIATE DOCKER
Volume inspect example
bash> docker volume inspect sqldata
[
{
"CreatedAt": "2024-01-27T04:27:51Z",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/sqldata/_data",
"Name": "sqldata",
"Options": null,
"Scope": "local"
}
]

INTERMEDIATE DOCKER
Attaching volumes
Uses the -v flag
docker run -v <volumename>:<destination path>:<options>

Volume name is the name of an existing volume

Destination path is the location where the volume will be mounted (such as /data)

Options are an optional comma-separated list of values such as ro for read-only.

$ docker run -v sqldata:/data postgres

--mount exists as with bind-mounts
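
The three-part -v value can be split apart as sketched below. This is a simplified illustration, not Docker's own parser: it assumes Unix-style paths without extra colons, and the function name parse_volume_spec is hypothetical.

```python
def parse_volume_spec(spec):
    """Split '<volumename>:<destination path>[:<options>]' into its parts."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 2 or 3 colon-separated parts, got: {spec!r}")
    # The options field is a comma-separated list, for example "ro".
    options = parts[2].split(",") if len(parts) == 3 else []
    return {"source": parts[0], "destination": parts[1], "options": options}


if __name__ == "__main__":
    print(parse_volume_spec("sqldata:/data:ro"))
```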

INTERMEDIATE DOCKER
Drivers
Methods of storing Docker volumes
Can include:
Local filesystem (default)

NFS (Unix filesharing)

SMB / CIFS (Windows filesharing)

INTERMEDIATE DOCKER
Let's practice!

Networking refresher

Mike Metzger
Data Engineering Consultant
What is networking?
A computer network consists of systems communicating via a defined method

Varying levels of communication, physical or logical, referred to as protocols

Common physical networks include Ethernet and WiFi

Logical networking includes TCP/IP, HTTP, and SMTP

Networks are defined in various layers or levels, often referred to as a networking stack
1 Photo by Jordan Harrison on Unsplash.

INTERMEDIATE DOCKER
Networking terms
Host
General term for a computer

Network
Group of hosts
Interface
Actual connection from a host to a network, such as Ethernet or WiFi

Can be virtual, meaning entirely in software

LAN
Local Area Network, or set of computers at a given location

VLAN
Virtual LAN, or a software LAN

INTERMEDIATE DOCKER
Internet Protocol
IP
Internet protocol, method to connect between networks using IP addresses

IPv4
Version of IP supporting about 4.3 billion (2^32) addresses, currently exhausted

IPv6
Newer version of IP, supporting 2^128 addresses, still being deployed

IPv4: 10.10.10.1
IPv6: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
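
The two address formats and their sizes can be checked with Python's standard ipaddress module. This is a small sketch using the example addresses from this slide; the helper name address_space is illustrative.

```python
import ipaddress

v4 = ipaddress.ip_address("10.10.10.1")
v6 = ipaddress.ip_address("2001:0db8:85a3:0000:0000:8a2e:0370:7334")


def address_space(version):
    """Number of possible addresses: 32-bit for IPv4, 128-bit for IPv6."""
    bits = 32 if version == 4 else 128
    return 2 ** bits


if __name__ == "__main__":
    print(v4.version, address_space(v4.version))
    # The compressed form drops leading zeros and collapses the zero groups.
    print(v6.version, v6.compressed)
```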

INTERMEDIATE DOCKER
TCP / UDP
TCP
Transmission Control Protocol, used to reliably communicate between hosts on IP
networks

UDP
User Datagram Protocol, used to communicate between hosts on IP networks where reliable
communication is not required.

INTERMEDIATE DOCKER
Ports
Port
Addresses services on a given host, a value between 0 and 65535, used to communicate
between hosts via TCP or UDP

Ports below 1024 are typically reserved for privileged accounts

Ports 1024 and above are available to regular users; ephemeral (temporary) ports are usually taken from the upper part of this range

Applications listen on a port.

INTERMEDIATE DOCKER
Application protocols
HTTP/HTTPS
Application protocol, defaulting to TCP port 80 for web communication. Secure version on
TCP 443

SMTP
Email transfer protocol, over TCP port 25

SNMP
Network management protocol, over UDP port 161

INTERMEDIATE DOCKER
Docker and networking
Can communicate between containers
Can communicate with the host system

Depending on settings can communicate with external hosts

Typical communication is handled by exposing ports from container to host

Acts like a translation between containers and hosts

INTERMEDIATE DOCKER
Docker and IP
Containers can have IP addresses

Use ifconfig <interface> or ip addr show <interface> from within container to find
addresses

Use ping -c <x> <host> to verify connectivity


ping -c 3 myhost

INTERMEDIATE DOCKER
Let's practice!

Making network services available in Docker

Mike Metzger
Data Engineering Consultant
Network services
Network services listen on a given port
Only one program can listen on an IP:port combo at a given time
For example, 10.1.2.3:80 would be listening on 10.1.2.3 on port 80.

Consider trying to debug different versions of a web server that listens on port 80
Could only run one copy of the application at a time given that it listens only on that port
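
The one-listener-per-IP:port rule can be observed directly with two sockets. This is a minimal sketch, assuming a typical system where address reuse options are not enabled; the function name second_bind_fails is illustrative, and port 0 asks the operating system to choose a free port.

```python
import socket


def second_bind_fails(host="127.0.0.1"):
    """Return True if a second socket cannot bind to an IP:port already in use."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same IP:port combination
        except OSError:
            return True
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print(second_bind_fails())
```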

INTERMEDIATE DOCKER
Containerized services
Wrapping application in a container means that each container can now listen on that port
(as the IP:port combo is different, each container has a different IP)

Can have multiple copies of the containers running at once

But how to connect to container's version of application from the host?

INTERMEDIATE DOCKER
Port mapping
The answer is the use of port mapping, or port forwarding / translation

Port mapping takes a connection to a given IP:port and automatically forwards it to another IP:port combo

In this case, we could map an unused port on our host and point it to port 80 on the container(s)

The Docker engine can handle this automatically if we configure it to

INTERMEDIATE DOCKER
Enabling port mapping
To enable port mapping on a given container, we use the docker run command, and the
-p flag

-p <host port>:<container port>

-p 5501:80

Can have multiple -p flags for different ports

repl@host:~$ docker run -p 5501:80 nginx


repl@host:~$ docker ps -a
CONTAINER ID IMAGE ... PORTS NAMES
84266724ff47 nginx ... 0.0.0.0:5501->80/tcp, :::5501->80/tcp coiled_elgamal
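
Each entry in the PORTS column reads as host address and port, an arrow, then container port and protocol. The sketch below splits one entry into those parts; it covers only the two forms shown in the output above, and the function name parse_port_mapping is hypothetical.

```python
def parse_port_mapping(entry):
    """Parse a PORTS entry such as '0.0.0.0:5501->80/tcp'."""
    host_side, container_side = entry.split("->")
    # rpartition keeps IPv6 wildcards such as '::' intact on the left.
    host_ip, _, host_port = host_side.rpartition(":")
    container_port, _, protocol = container_side.partition("/")
    return {
        "host_ip": host_ip,
        "host_port": int(host_port),
        "container_port": int(container_port),
        "protocol": protocol or "tcp",
    }


if __name__ == "__main__":
    for entry in "0.0.0.0:5501->80/tcp, :::5501->80/tcp".split(", "):
        print(parse_port_mapping(entry))
```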

INTERMEDIATE DOCKER
Let's practice!

Exposing ports with Dockerfiles

Mike Metzger
Data Engineer
Exposing services
EXPOSE instruction

Defines which ports the container will use at runtime

Can be defined as <number> , <number>/tcp , or <number>/udp


Such as EXPOSE 80 or EXPOSE 80/tcp

Multiple entries permitted

Used as a documentation method

INTERMEDIATE DOCKER
Using the -p / -P flags
Still requires use of -p or -P options to docker run to make the ports available outside
the container

The -P option will automatically map an ephemeral port to the exposed port(s). Must use
docker ps -a to see which ports are mapped.

Using -p <host port>:<container port> allows use of specific ports.

INTERMEDIATE DOCKER
EXPOSE example
# Dockerfile
FROM python:3.11-slim
ENTRYPOINT ["python", "-m", "http.server"]
# Let the Docker engine know
# port 8000 should be available
EXPOSE 8000

Create a container from the image


docker run pyserver

Print the state of the container


docker ps -a

CONTAINER ID IMAGE ... PORTS NAMES


8c3d320255ae pyserver ... 8000/tcp angry_chaum

INTERMEDIATE DOCKER
Making ports reachable
Automatically map temporary port from host to the container

docker run -P pyserver

docker ps -a

CONTAINER ID IMAGE ... PORTS NAMES


6bb458ef25da pyserver ... 0.0.0.0:55001->8000/tcp beautiful_lamarr

INTERMEDIATE DOCKER
Finding exposed ports
docker inspect provides a lot of information

docker inspect <id>

"NetworkSettings": {
"Bridge": "",
"Ports": {
"8000/tcp": [{
"HostIp": "0.0.0.0",
"HostPort": "55001"
}]
},
...
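
The same lookup can be scripted by loading the docker inspect output as JSON. This sketch assumes the output is a JSON array with one object per container; SAMPLE is a trimmed stand-in built from the fragment above, and the function name host_ports is illustrative.

```python
import json

# Trimmed stand-in for real `docker inspect <id>` output.
SAMPLE = """
[{"NetworkSettings": {"Bridge": "",
                      "Ports": {"8000/tcp": [{"HostIp": "0.0.0.0",
                                              "HostPort": "55001"}]}}}]
"""


def host_ports(inspect_json):
    """Map each container port to the list of host ports bound to it."""
    container = json.loads(inspect_json)[0]
    ports = container["NetworkSettings"]["Ports"]
    # A port that is exposed but not published has no bindings (null).
    return {
        container_port: [int(b["HostPort"]) for b in (bindings or [])]
        for container_port, bindings in ports.items()
    }


if __name__ == "__main__":
    print(host_ports(SAMPLE))
```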

INTERMEDIATE DOCKER
Let's practice!

Docker networks

Mike Metzger
Data Engineering Consultant
Docker networking
Docker has extensive networking options
Can create networks to communicate between containers, host, and external systems

Will cover various commands to interact with networks.

INTERMEDIATE DOCKER
Docker networking types
Docker supports different networking types, using drivers
bridge: Default driver, allows connections out, connections in if exposed

host: Allows full communication between host and containers

none: Isolate container from network communications

Many others, including custom drivers

Will mostly use the bridge driver to create our own networks

INTERMEDIATE DOCKER
Working with Docker networks
Several commands:

docker network
docker network <command>

docker network <command> --help

docker network ls to list all docker networks on the host

docker network create to create a network

docker network rm to remove a network

INTERMEDIATE DOCKER
Docker network example
Create a Docker network named mynetwork

repl@host:~$ docker network create mynetwork

5ff0febab98f73b74dd753eb44a30f7d7291052b3b1d58b0134589221cb8e33d

repl@host:~$ docker network ls


NETWORK ID NAME DRIVER SCOPE
2edc5ae4838c bridge bridge local
a92988382711 host host local
5ff0febab98f mynetwork bridge local
5464ed866dad none null local

INTERMEDIATE DOCKER
Attaching containers to networks
How to connect container to a network?
docker run --network <networkname> ...

docker run --network mynetwork ubuntu bash

Can also connect containers later


docker network connect <networkname> <container>

docker network connect mynetwork ubuntu-B

INTERMEDIATE DOCKER
docker network inspect
How to check details of network?
docker network inspect <networkname>

Provides configuration info and IP addresses assigned to containers

repl@host:~$ docker network inspect mynetwork

INTERMEDIATE DOCKER
docker network inspect example
"Name": "mynetwork",
...
"Driver": "bridge",
...
"Containers": { "2be08aa942029191350d4bceb8816254af8713dd6f7dcbadcab8f068f
"Name": "unruffled_kare",
"EndpointID": "29739356ae200e1e901d2eabef05efaca0fb37e1a4e1a4c3bf369
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
}

INTERMEDIATE DOCKER
Let's practice!

Optimizing Docker images

Mike Metzger
Data Engineering Consultant
Docker image explanation
Docker images are the base of a given container
Holds all content initially available to a container instance

INTERMEDIATE DOCKER
Docker image concerns
Tempting to add all potentially needed components to an image

Size becomes large / unwieldy

Difficult to handle security / updates due to dependency issues

Harder to combine containers without wasting space / bandwidth

INTERMEDIATE DOCKER
Docker image recommendations
Split containers to the smallest level needed
Easier to combine multiple containers later vs. building a single large image

Like building with reusable components vs. building from scratch each time

Updates to specific software only affect containers using that image instead of all
containers needing the update

Can optimize for size, making use and distribution much easier

INTERMEDIATE DOCKER
Docker image breakdown example
Consider a data engineering project using the following software:
Postgresql database
Python ETL software
Web server software

Possible to use a single image, but we would need to update the image each time we had an update to the ETL or web server setup.

What would happen if we needed to add another web server?

FROM ubuntu
RUN apt update
RUN apt install -y postgresql
RUN apt install -y nginx
RUN apt install -y python3.9
...

INTERMEDIATE DOCKER
Example with minimized containers
Better options with Docker
Split each into its own container
Postgresql database container
Python ETL components
Web server

Can build an optimized configuration for our use, and can add / remove components as needed

bash> docker run -d postgresql:latest
bash> docker run -d nginx:latest
...

INTERMEDIATE DOCKER
Determining image size
Using docker images
Shows individual image details, including size
More in-depth options covered later

bash> docker images

REPOSITORY       TAG               SIZE
postgres         latest            448MB
postgres         15                442MB
apache/airflow   2.7.1-python3.9   1.4GB
alpine           latest            7.73MB

INTERMEDIATE DOCKER
Let's practice!

Understanding layers

Mike Metzger
Data Engineering Consultant
Docker layers
Docker images are made up of layers

A layer generally references a change or command within a Dockerfile

Layers can be cached / reused

The order of commands within a Dockerfile can affect whether layers are reused
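
Why order matters can be shown with a toy model in which each layer's cache key depends on its own instruction and on every layer before it. This is a deliberately simplified sketch and not Docker's actual cache algorithm (which also considers, for example, the contents of copied files); the function names are hypothetical.

```python
import hashlib


def layer_keys(instructions):
    """Chain a hash through the instructions so each key depends on all earlier ones."""
    keys = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys


def reusable_layers(old, new):
    """Count leading layers of the new build whose keys match the old build."""
    count = 0
    for old_key, new_key in zip(layer_keys(old), layer_keys(new)):
        if old_key != new_key:
            break
        count += 1
    return count


if __name__ == "__main__":
    old = ["FROM ubuntu", "RUN apt-get update", "COPY . /app"]
    changed_last = ["FROM ubuntu", "RUN apt-get update", "COPY ./src /app"]
    changed_middle = ["FROM ubuntu", "RUN apt-get upgrade", "COPY . /app"]
    print(reusable_layers(old, changed_last))
    print(reusable_layers(old, changed_middle))
```

In this model, changing an early instruction invalidates every later layer, so instructions that change often are best placed near the end of a Dockerfile.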

INTERMEDIATE DOCKER
Why do we care about layers?
Reusability
Faster build time

Smaller builds

INTERMEDIATE DOCKER
docker image inspect
How to determine the layers within an image?
docker image inspect <img id | name>

Provides much information about the content of a Docker image

The RootFS:Layers section provides details about layers in a given Docker image

repl@host:~$ docker image inspect alpine


[
{ "Id": "sha256:05455a08881ea9cf0e752bc48e61bbd71a34c029bb13df01e40e3e70e
"RepoTags": [
"alpine:latest"
],
"Created": "2024-01-27T00:30:48.743965523Z",

INTERMEDIATE DOCKER
docker image inspect example
bash> docker image inspect postgres:latest

"RootFS": {
"Type": "layers",
"Layers": [
"sha256:6f2d01c02c30cc1ffac781aff795cba8eeb29cc27756fe37bf525169856369c6",
"sha256:c6ad2d5a3cad837ae66b5560e9c577bfad062556b1f00791d8d733ce44a577ce",
"sha256:2153552a84ccbf7e4a28a50e766b72345072e59f8af0ff068baf98b413132e0c",
"sha256:6c00217b1e4b15c25eb3f6e28b1af8c295f469568014621e31a4c5eb5a8aca6f",
"sha256:167177d78e2a33aa822faebe9f01683c648ae78179059db05cd25737f215c305",
...

INTERMEDIATE DOCKER
jq command-line tool
Sometimes difficult to analyze the results from docker image inspect
jq commandline tool is used to read JSON data, like what's returned from
docker image inspect

Can use jq to query data

INTERMEDIATE DOCKER
jq recipes with Docker
Method to see just a specific section, for example the RootFS data:
docker image inspect <id> | jq '.[0] | .RootFS'

{
"Type": "layers",
"Layers": [ "sha256:0f5c115c5eea96...",
"sha256:20792593831cdc..."
]
}

INTERMEDIATE DOCKER
jq recipes with Docker (part 2)
Method to count number of layers using jq :
docker image inspect <id> | jq '.[0] | {LayerCount: .RootFS.Layers | length}'

{
"LayerCount": 2
}
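
Where jq is not installed, the same count can be made with Python's json module. This is a sketch under assumptions: SAMPLE uses placeholder digests rather than real ones, and inspect_image is an illustrative helper that shells out to the docker CLI and therefore needs Docker available.

```python
import json
import subprocess

# Placeholder data in the shape of `docker image inspect` output.
SAMPLE = '[{"RootFS": {"Type": "layers", "Layers": ["sha256:aaa", "sha256:bbb"]}}]'


def layer_count(inspect_json):
    """Count the entries under RootFS.Layers for the first image in the output."""
    return len(json.loads(inspect_json)[0]["RootFS"]["Layers"])


def inspect_image(name):
    """Return the raw JSON text from `docker image inspect <name>`."""
    result = subprocess.run(
        ["docker", "image", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(layer_count(SAMPLE))
```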

INTERMEDIATE DOCKER
Let's practice!

Multi-stage builds

Mike Metzger
Data Engineering Consultant
Single-stage builds
Typical Docker images are created using a single FROM command
Each addition to the source image adds space and makes its management harder

Consider an application that must be compiled prior to use
You can add all the necessary components to the image, compile it, and then configure the final image for use

This often leaves superfluous content in the image even if it is not used

FROM ubuntu
RUN apt update
RUN apt install gcc -y
...
RUN make
CMD ["data_app"]

INTERMEDIATE DOCKER
Multi-stage builds
Multi-stage builds use multiple containers
Typically has one or more build stages

Final components are copied into a final container image

The build stages are then removed automatically


Saving space and minimizing the size of the container image

Uses some additional syntax in the Dockerfile


AS <alias>

COPY --from=<alias>

INTERMEDIATE DOCKER
Multi-stage build example
# Create initial build stage
FROM ubuntu AS stage1
# Install compiler and compile code
RUN apt update && apt install gcc -y
...
RUN make

# Start new stage to create final image
FROM alpine-base
# Copy from first stage to final
COPY --from=stage1 /data_app /data_app
# Run application on container start
CMD ["/data_app"]

INTERMEDIATE DOCKER
Let's practice!

Multi-platform builds

Mike Metzger
Data Engineering Consultant
Multi-platform?
What does multi-platform mean?
Different OS types
linux

windows

macos

Different CPU types
x86_64 or amd64
arm64
armv7

Usually referred to as os/cpu, such as linux/amd64

INTERMEDIATE DOCKER
Creating multi-platform builds
Is built on multi-stage build behavior
The initial / build stage tends to use cross-compilers and relies on the architecture of the
host system

Final stage uses the architecture / OS for the intended target.

INTERMEDIATE DOCKER
Multi-platform Dockerfile options
Build stage uses the --platform=$BUILDPLATFORM flag
$BUILDPLATFORM represents the platform of the host running the build

Sometimes uses the ARG directive


Declares build arguments so they can be used inside the build stage

In this case, TARGETOS and TARGETARCH

ARG TARGETOS TARGETARCH

Docker sets TARGETOS and TARGETARCH automatically for each platform requested with the --platform option of the build command.

INTERMEDIATE DOCKER
Multi-platform example
# Initial stage, using local platform
FROM --platform=$BUILDPLATFORM golang:1.21 AS build
# Copy source into place
WORKDIR /src
COPY . .
# Declare the target platform build arguments
ARG TARGETOS TARGETARCH
# Compile code using the ARG variables
RUN env GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /final/app .

# Create container and load the cross-compiled code


FROM alpine
COPY --from=build /final/app /bin

INTERMEDIATE DOCKER
Building a multi-platform build
To create a multi-platform build, instead of using docker build , we must use
docker buildx with assorted options

docker buildx provides more commands and capabilities over docker build , including
the option to specify a platform

docker buildx build --platform linux/amd64,linux/arm64 -t multi-platform-app .

Prior to running the build, we must also have a new builder container present. This is done
with the docker buildx create --bootstrap --use command.

INTERMEDIATE DOCKER
Let's practice!

Introduction to Docker Compose

Mike Metzger
Data Engineering Consultant
What is Docker Compose?
Additional command-line tool for Docker
Define and manage multi-container applications

Specify containers, networking, and storage volumes in a single file
compose.yml or compose.yaml

Older compose files may be named docker-compose.yaml

Easy to share / demo applications

INTERMEDIATE DOCKER
Example compose.yaml
# Define the services
services:
  # Define the container(s), by name
  webapp:
    image: "webapp"
    # Optionally, define the port forwarding
    ports:
      - "8000:5000"
  # Define any other containers required
  redis:
    image: "redis:alpine"

INTERMEDIATE DOCKER
Starting an application
docker compose up
On older systems, docker-compose up

docker compose -f <yaml> up

docker compose up -d

$ docker compose up
[+] Running 2/0
 ✔ Network composetest_default Created
 ✔ Container composetest-redis-1 Created
 ✔ Container composetest-web-1 Created
Attaching to redis-1, web-1
redis-1 | 1:C 11 Mar 2024 04:09:51.754 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
web-1 | * Serving Flask app 'app.py'
web-1 | * Running on http://127.0.0.1:5000

INTERMEDIATE DOCKER
Checking status of applications
docker compose ls

$ docker compose ls
NAME STATUS CONFIG FILES
webapp running(2) /webapp/docker-compose.yml

INTERMEDIATE DOCKER
Stopping an application
docker compose down
On older systems, docker-compose down

docker compose -f <yaml> down

$ docker compose down


[+] Running 3/3
 ✔ Container composetest-redis-1 Removed
 ✔ Container composetest-web-1 Removed
 ✔ Network composetest_default Removed

INTERMEDIATE DOCKER
Let's practice!

Creating compose.yaml files

Mike Metzger
Data Engineering Consultant
YAML
Yet Another Markup Language
YAML Ain't Markup Language
Text file, but spacing matters (like Python)
Used in many development scenarios for configuration
Rules can be tricky, mainly keep entries lined up as in examples

services:
  postgres:
    container_name: postgres
    image: postgres:latest
    ports:
      - "5432:5432"
    restart: always
  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    ports:
      - "5050:80"
    restart: always

INTERMEDIATE DOCKER
Main sections
Different sections handle different components
services: list the containers to load
networks: handles networking definitions
volumes: controls any volume mounting
configs: handles configuration options without custom images
secrets: Provides options to handle passwords, tokens, API keys, etc

Refer to the Docker Compose Documentation for more information

services:
  ... # Define containers
networks:
  ... # Define any networking details
volumes:
  ... # Define storage requirements
configs:
  ... # Define special config details
secrets:
  ... # Define passwords / etc

1 https://docs.docker.com/compose/compose-file/

INTERMEDIATE DOCKER
Services section
Defines all required resources for the application
Primarily specifies the containers and images to be used

Extensive options available, but only apply to the individual container(s)

Indentation is applied as needed

First subsection is the name of each component, followed by the settings

INTERMEDIATE DOCKER
Services example
Resource name
container_name: , the assigned name of the container, otherwise it's random
image: , which container image to use
ports: , contains a list of any port mapping required
Followed by next resources required

services:
  # Resource name
  postgres:
    # Container name, otherwise random
    container_name: postgres
    # Container image to use
    image: postgres:latest
    # Any port mapping required
    ports:
      # Network details
      - "5432:5432"
  # Next resource
  pgadmin:
    ...

INTERMEDIATE DOCKER
Additional comments
compose.yaml syntax is extensive
Covering very small portion of compose.yaml options

Review the documentation!

It's typically not required to build a compose.yaml file from scratch

1 https://docs.docker.com/compose/compose-file/

INTERMEDIATE DOCKER
Let's practice!

Dependencies and troubleshooting in Docker Compose

Mike Metzger
Data Engineering Consultant
What are dependencies?
Dependencies define the order of resource startup
Resources (containers) may require other resources

Example web application
Database container postgresql must start first
Then the python_app
Finally, the nginx web server

INTERMEDIATE DOCKER
depends_on
Dependencies defined using the depends_on attribute
Can chain dependencies as per example
Or, can have multiple dependencies per resource if required
Order of the compose.yaml file does not matter

services:
  postgresql:
    image: postgresql:latest

  python_app:
    image: custom_app
    depends_on:
      - postgresql

  nginx:
    image: nginx:latest
    depends_on:
      - python_app

INTERMEDIATE DOCKER
Shutting down applications
Shutting down an application occurs in reverse order

Stops nginx resource

Then stops the python_app resource

And finally the postgresql resource
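
The startup and shutdown orders described above follow from a topological sort of the depends_on graph. The sketch below models this with Python's standard graphlib module (Python 3.9 or later); it illustrates the ordering only and is not how Docker Compose is implemented. The DEPENDS_ON mapping mirrors the example services.

```python
from graphlib import TopologicalSorter

# Each service maps to the services it depends on.
DEPENDS_ON = {
    "postgresql": [],
    "python_app": ["postgresql"],
    "nginx": ["python_app"],
}


def startup_order(depends_on):
    """Order services so that every dependency starts before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Shutdown runs in the reverse of the startup order."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    print(startup_order(DEPENDS_ON))
    print(shutdown_order(DEPENDS_ON))
```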

INTERMEDIATE DOCKER
Other options
Docker Compose provides other options for dependencies
condition: defines how to decide when resource is ready.
service_started - Resource has started normally
Default behavior
service_completed_successfully - Resource ran to completion, such as an initial configuration / etc
service_healthy - Resource meets the criteria defined by healthcheck

services:
  nginx:
    image: nginx:latest
    depends_on:
      python_app:
        condition: service_started

  python_app:
    image: custom_app
    depends_on:
      postgresql:
        condition: service_healthy

INTERMEDIATE DOCKER
Docker Compose troubleshooting tools
Docker Compose has additional troubleshooting tools
docker compose logs - Gathers output from all resources in application

redis-1 | * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo


redis-1 | * Running mode=standalone, port=6379.
redis-1 | * Server initialized
redis-1 | * Ready to accept connections tcp
web-1 | * Serving Flask app 'app.py'
web-1 | * Running on all addresses (0.0.0.0)
web-1 | * Running on http://172.20.0.2:5000
web-1 | Press CTRL+C to quit

docker compose logs <resourcename>

INTERMEDIATE DOCKER
docker compose top
docker compose top shows status of resources within an application

composetest-redis-1
UID PID PPID C STIME TTY TIME CMD
999 2767 2726 0 01:16 ? 00:03:27 redis-server *:6379

composetest-web-1
UID PID PPID C STIME TTY TIME CMD
root 2768 2740 0 01:16 ? 00:00:23 /usr/local/bin/python /usr/local/

INTERMEDIATE DOCKER
Let's practice!

Creating a data service within Docker

Mike Metzger
Data Engineering Consultant
Data sharing
docker run -v <host directory>:<container directory>
-v ~/hostdata:/containerdata

INTERMEDIATE DOCKER
Data sharing in compose.yaml
Also present in compose.yaml files

services:
  resource:
    container_name: resource1
    # Section named volumes
    volumes:
      - <host directory>:<container directory>
      # Such as:
      - ~/hostdata:/containerdata

INTERMEDIATE DOCKER
Networks
docker run --network <networkname>
docker run --network net1

In compose.yaml resources

services:
  resource:
    container_name: resource1
    networks:
      network_name:
      # Such as:
      net1:

INTERMEDIATE DOCKER
Port mapping
docker run -p hostport:containerport
-p 8000:8000

Available in compose.yaml resources

services:
  resource:
    container_name: resource1
    ports:
      - hostport:containerport
      # Such as:
      - 8000:8000

INTERMEDIATE DOCKER
docker inspect
Determine information about provisioned containers
docker inspect <id / name>

Provides various levels of information


Mounts : Provides mounted data information

NetworkSettings : Network information


NetworkSettings:Networks : Shows the Docker network(s) connection details

"Config": {
"Mounts": [...]
...
"Networks": {
"network1": {
...

INTERMEDIATE DOCKER
Data service

INTERMEDIATE DOCKER
Let's practice!

Course review

Mike Metzger
Data Engineering Consultant
Next steps
Review Docker documentation (docs.docker.com)
Containerize more applications

Create custom repositories

Docker Swarm

Kubernetes

CI/CD

Mapping to host GPU hardware

INTERMEDIATE DOCKER
Congratulations!