Introduction and Intermediate Docker
advantages
INTRODUCTION TO DOCKER
Tim Sangster
Software Engineer @ DataCamp
Prerequisites
We will use:
ls , cd , and mkdir to find our way in and manage the file system.
Containers
A portable computing environment
Making it less abstract
Containers run identically every time
Containers run identically everywhere
Isolation
Containers provide security
Containers are lightweight
Security
Portability
Reproducibility
Lightweight
In comparison to running an application:
Outside of a container
Containers and data science
Automatically reproducible
The Docker Engine
Docker ecosystem
Docker Engine
1 https://docs.docker.com/engine/
The Docker daemon
1 https://docs.docker.com/engine/ 2 https://docs.docker.com/get-started/overview/#docker-architecture
Images and Containers
Containers are processes
Containers are isolated processes
Containers vs. Virtual Machines
Containers and Virtual Machines
Resource Virtualization
Containers vs Virtual Machines
Security of Virtualization
Containers are lightweight
Advantages of containers
Because of their smaller size, containers are faster to:
Start
Stop
Distribute
Change or update
Advantages of Virtual Machines
Running Docker containers
Prerequisite
Command                Usage
nano <file-name>       Opens <file-name> in the nano text editor
touch <file-name>      Creates an empty file with the specified name
echo "<text>"          Prints <text> to the console
<command> >> <file>    Appends the output of <command> to the end of <file>
<command> -y           Automatically responds yes to all prompts from <command>
The Docker CLI
The Docker command-line interface (CLI) sends instructions to the Docker daemon.
Docker container output
docker run <image-name>
Choosing Docker container output
docker run <image-name>
repl@host:/#
An interactive Docker container
Adding -it to docker run will give us an interactive shell in the started container.
repl@container:/# exit
exit
repl@host:/#
Running a container detached
Adding -d to docker run will run the container in the background, giving us back control of the shell.
Listing and stopping running containers
docker ps
repl@host:/# docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED              STATUS              PORTS      NAMES
4957362b5fb7   postgres   "docker-entrypoint.s…"   About a minute ago   Up About a minute   5432/tcp   awesome_curie
Summary of new commands
Usage                            Command
Start a container                docker run <image-name>
Start an interactive container   docker run -it <image-name>
Start a detached container       docker run -d <image-name>
List running containers          docker ps
Stop a container                 docker stop <container-id>
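The commands in this summary fit together as one short session. This is a minimal sketch, assuming the postgres image from the earlier slides; the container name lifecycle_demo and the POSTGRES_PASSWORD value are illustrative and not part of the course material. The --name, --rm, and -e options go slightly beyond the summary: --name makes the container easy to stop, --rm removes it once it stops, and -e sets the password the postgres image asks for at startup. The session only runs when a Docker CLI is installed.

```shell
# Illustrative names; neither appears in the original slides.
container_name="lifecycle_demo"
image_name="postgres"

if command -v docker >/dev/null 2>&1; then
  # Start the container in the background (-d) so the shell stays usable.
  docker run -d --rm --name "$container_name" \
    -e POSTGRES_PASSWORD=example "$image_name"

  # List running containers; the new one should appear in the output.
  docker ps

  # Stop the container by name (a container ID works as well).
  docker stop "$container_name"
else
  echo "docker not found; skipping the demo"
fi
```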
Working with Docker containers
Listing containers
repl@host:/# docker ps
CONTAINER ID IMAGE .. CREATED STATUS ... NAMES
3b87ec116cb6 postgres 2 seconds ago Up 1 second ... adoring_germain
8a7830bbc787 postgres 3 seconds ago Up 2 seconds ... exciting_heisenberg
fefdf1687b39 postgres 3 seconds ago Up 2 seconds ... vigilant_swanson
b70d549d4611 postgres 4 seconds ago Up 3 seconds ... nostalgic_matsumoto
a66c71c54b92 postgres 4 seconds ago Up 4 seconds ... lucid_matsumoto
8d4f412adc3f postgres 6 seconds ago Up 5 seconds ... fervent_ramanujan
fd0b3b2a843e postgres 7 seconds ago Up 6 seconds ... cool_dijkstra
0d1951db81c4 postgres 8 seconds ago Up 7 seconds ... happy_sammet
...
Named containers
docker run --name <container-name> <image-name>
Filtering running containers
docker ps -f "name=<container-name>"
Container logs
docker logs <container-id>
2022-10-24 12:10:40.318 UTC [1] LOG: database system is ready to accept connect..
Live logs
docker logs -f <container-id>
2022-10-24 12:10:40.309 UTC [1] LOG: starting PostgreSQL 14.5 (Debian 14.5-1.pg..
2022-10-24 12:10:40.309 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port ..
2022-10-24 12:10:40.309 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-10-24 12:10:40.311 UTC [1] LOG: listening on Unix socket "/var/run/postgre..
2022-10-24 12:10:40.315 UTC [62] LOG: database system was shut down at 2022-10-..
2022-10-24 12:10:40.318 UTC [1] LOG: database system is ready to accept connect..
Cleaning up
docker container rm <container-id>
Summary of new commands
Usage                                Command
Start container with a name          docker run --name <container-name> <image-name>
Filter running containers by name    docker ps -f "name=<container-name>"
See existing logs for container      docker logs <container-id>
See live logs for container          docker logs -f <container-id>
Exit live log view of container      CTRL+C
Remove stopped container             docker container rm <container-id>
Managing local Docker images
Pulling an image
docker pull <image-name>
Image versions
Listing images
docker images
Removing images
docker image rm <image-name>
Cleaning up containers
docker container prune
Cleaning up images
docker image prune -a
Dangling images
docker images
Summary of new commands
Usage                                  Command
Pull an image                          docker pull <image-name>
Pull a specific version of an image    docker pull <image-name>:<image-version>
List all local images                  docker images
Remove an image                        docker image rm <image-name>
Remove all stopped containers          docker container prune
Remove all unused images               docker image prune -a
Distributing Docker Images
Private Docker registries
Unlike Docker official images, there is no quality guarantee
dockerhub.myprivateregistry.com/classify_spam
Using tag: v1
latest: Pulling from dockerhub.myprivateregistry.com
ed02c6ade914: Pull complete
Digest: sha256:b6b83d3c331794420340093eb706b6f152d9c1fa51b262d9bf34594887c2c7ac
Status: Downloaded newer image for dockerhub.myprivateregistry.com/classify_spam:v1
dockerhub.myprivateregistry.com/classify_spam:v1
Pushing to a registry
docker image push <image-name>
To push to a specific registry, the name of the image needs to start with the registry URL
Authenticating against a registry
Docker official images --> No authentication needed
Docker images as files
Sending a Docker image to one or a few people? Send it as a file!
Save an image
Load an image
Summary of new commands
Usage                               Command
Pull image from private registry    docker pull <private-registry-url>/<image-name>
Name an image                       docker tag <old-name> <new-name>
Push an image                       docker image push <image-name>
Log in to private registry          docker login <private-registry-url>
Save image to file                  docker save -o <file-name> <image-name>
Load image from file                docker load -i <file-name>
Creating your own Docker images
Creating images with Dockerfiles
Starting a Dockerfile
A Dockerfile always starts from another image, specified using the FROM instruction.
FROM postgres
FROM ubuntu
FROM hello-world
FROM my-custom-data-pipeline
FROM postgres:15.0
FROM ubuntu:22.04
FROM hello-world:latest
FROM my-custom-data-pipeline:v1
Building a Dockerfile
Building a Dockerfile creates an image.
Naming our image
In practice we almost always give our images a name using the -t flag:
...
=> => writing image sha256:a67f41b1d127160a7647b6709b3789b1e954710d96df39ccaa21..
=> => naming to docker.io/library/first_image
Customizing images
RUN <valid-shell-command>
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
...
After this operation, 22.8 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Building a non-trivial Dockerfile
When building an image, Docker actually runs the commands that follow RUN
Docker running RUN apt-get update takes the same amount of time as us running it!
Summary
Usage                                                      Dockerfile Instruction
Start a Dockerfile from an image                           FROM <image-name>
Add a shell command to image                               RUN <valid-shell-command>
Make sure no user input is needed for the shell command    RUN apt-get install -y python3
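The instructions in this summary are enough for a first custom image. The sketch below writes a Dockerfile into a scratch folder and builds it with a name. The folder ./first_image_demo and the image name first_python_image are illustrative assumptions, and the build step only runs when a Docker CLI is installed.

```shell
# Scratch build folder (illustrative; any empty folder works).
build_dir="./first_image_demo"
mkdir -p "$build_dir"

# Write a Dockerfile that starts from ubuntu and installs python3.
# -y answers the apt-get confirmation prompt, which cannot be
# answered interactively during a build.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
EOF

# Build and name the image with -t (skipped when docker is not installed).
if command -v docker >/dev/null 2>&1; then
  docker build -t first_python_image "$build_dir"
fi
```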
Managing files in your image
COPYing files into an image
The COPY instruction copies files from our local machine into the image we're building:
If the destination path does not have a filename, the original filename is used:
COPYing folders
If the src-path does not specify a filename, all files in that folder are copied.
/projects/
pipeline_v3/
pipeline.py
requirements.txt
tests/
test_pipeline.py
Copy files from a parent directory
/init.py
/projects/
Dockerfile
pipeline_v3/
pipeline.py
Downloading files
Instead of copying files from a local directory, files are often downloaded in the image build:
Download a file
RUN rm <copy_directory>/<filename>.zip
Downloading files efficiently
Each instruction that downloads files adds to the total size of the image, even if the files are deleted by a later instruction.
Summary
Usage                                                                   Dockerfile Instruction
Copy files from host to the image                                       COPY <src-path-on-host> <dest-path-on-image>
Copy a folder from host to the image                                    COPY <src-folder> <dest-folder>
We can't copy from a parent directory of where we build a Dockerfile    COPY ../<file-in-parent-directory> /
Keep images small by downloading, unzipping, and cleaning up in a single RUN instruction:
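A single-instruction download could look like the sketch below. The URL https://example.com/data.zip and the /data and /tmp paths are placeholders rather than values from the slides, and the choice of curl and unzip is an assumption. Because the download, the unzip, and the cleanup share one RUN, the zip file never ends up in a layer. The script only writes and prints the Dockerfile, since the placeholder URL has to be replaced before a build can succeed.

```shell
# Scratch folder for the example Dockerfile (illustrative).
build_dir="./download_demo"
mkdir -p "$build_dir"

# First RUN installs the tools; second RUN downloads, unzips,
# and deletes the archive in one layer.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu
RUN apt-get update && apt-get install -y curl unzip
RUN curl -L https://example.com/data.zip -o /tmp/data.zip \
    && unzip /tmp/data.zip -d /data \
    && rm /tmp/data.zip
EOF

# Show the result; replace the placeholder URL before building.
cat "$build_dir/Dockerfile"
```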
Choosing a start command for your Docker image
What is a start command?
The hello-world image prints text and then stops.
What is a start command?
An image with python could start python on startup.
....
>>> exit()
repl@host:/#
Running a shell command at startup
CMD <shell-command>
Typical usage
Starting an application to run a workflow or that accepts outside connections.
CMD postgres
CMD start.sh
When will it stop?
Overriding the default start command
Starting an image
Summary
Usage                                                                       Dockerfile Instruction
Add a shell command run when a container is started from the image.        CMD <shell-command>
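The default start command and its override can be tried with a very small image. In this sketch the folder ./start_demo, the image name start_demo, and the echoed messages are illustrative assumptions. The first docker run uses the CMD stored in the image; the second passes a command after the image name, which replaces the default for that one container.

```shell
# Scratch build folder (illustrative).
build_dir="./start_demo"
mkdir -p "$build_dir"

# An image whose start command prints one line and then exits.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu
CMD echo "pipeline started"
EOF

if command -v docker >/dev/null 2>&1; then
  docker build -t start_demo "$build_dir"

  # No command given: the container runs the CMD from the image.
  docker run start_demo

  # A command after the image name replaces the default start command.
  docker run start_demo echo "start command overridden"
fi
```

Both containers stop as soon as their command finishes, which matches the behavior described for the hello-world image.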
Introduction to Docker layers and caching
Docker build
Downloading and unzipping a file using the Docker instructions.
/example_folder.zip
/example_folder/
example_file1
example_file2
Docker instructions are linked to file system changes
Each instruction in the Dockerfile is linked to the changes it made in the image file system.
FROM docker.io/library/ubuntu
=> Gives us a file system to start from with all files needed to run Ubuntu
Docker layers
Docker layer: All changes caused by a single Dockerfile instruction.
Docker image: All layers created during a build
--> Docker image: All changes to the file system by all Dockerfile instructions.
Docker caching
Consecutive builds are much faster because Docker re-uses layers that haven't changed.
Re-running a build:
Understanding Docker caching
Knowing when layers are cached helps us understand why images sometimes don't change after a rebuild.
Docker will use cached layers because the instructions are identical to previous builds.
Understanding Docker caching
Helps us write Dockerfiles that build faster because not all layers need to be rebuilt.
In the following Dockerfile, every instruction from COPY onward needs to be re-run if the pipeline.py file is changed:
FROM ubuntu
COPY /app/pipeline.py /app/pipeline.py
RUN apt-get update
RUN apt-get install -y python3
In the following Dockerfile, only the COPY instruction will need to be re-run.
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
COPY /app/pipeline.py /app/pipeline.py
Changing users and working directory
Dockerfile instruction interaction
FROM, RUN, and COPY interact through the file system.
WORKDIR - Changing the working directory
Starting all paths at the root of the file system:
WORKDIR /home/my_user_with_a_long_name/work/projects/
RUN in the current working directory
Instead of using the full path for every command:
RUN /home/repl/projects/pipeline/init.sh
RUN /home/repl/projects/pipeline/start.sh
WORKDIR /home/repl/projects/pipeline/
RUN ./init.sh
RUN ./start.sh
Changing the startup behavior with WORKDIR
Instead of using the full path:
CMD /home/repl/projects/pipeline/start.sh
WORKDIR /home/repl/projects/pipeline/
CMD ./start.sh
Linux permissions
Permissions are assigned to users.
Root is a special user with all permissions.
Best practice
Use root to create new users with permissions for specific tasks.
Changing the user in an image
Best practice: Don't run everything as root
Ubuntu -> root by default
Changing the user in a container
Dockerfile setting the user to repl:
Summary
Usage                                   Dockerfile Instruction
Change the current working directory    WORKDIR <path>
Change the current user                 USER <user-name>
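WORKDIR and USER can be combined in one Dockerfile, as in the following sketch. The user name repl follows the slides; the useradd -m call used to create that user, the folder ./user_demo, and the image name user_demo are assumptions made for illustration. The start command reports which user and directory the container ends up with.

```shell
# Scratch build folder (illustrative).
build_dir="./user_demo"
mkdir -p "$build_dir"

# Create the user as root, then switch user and working directory.
# Instructions after USER and WORKDIR run as repl inside /home/repl.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu
RUN useradd -m repl
USER repl
WORKDIR /home/repl
CMD whoami && pwd
EOF

if command -v docker >/dev/null 2>&1; then
  docker build -t user_demo "$build_dir"
  # Runs the start command as repl in the working directory set above.
  docker run user_demo
fi
```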
Variables in Dockerfiles
Variables with the ARG instruction
Create variables in a Dockerfile
ARG <var_name>=<var_value>
$path
Use-cases for the ARG instruction
Setting the Python version
FROM ubuntu
ARG python_version=3.9.7-1+bionic1
RUN apt-get install -y python3=$python_version
RUN apt-get install -y python3-dev=$python_version
Configuring a folder
FROM ubuntu
ARG project_folder=/projects/pipeline_v3
COPY /local/project/files $project_folder
COPY /local/project/test_files $project_folder/tests
Setting ARG variables at build time
FROM ubuntu
ARG project_folder=/projects/pipeline_v3
COPY /local/project/files $project_folder
COPY /local/project/test_files $project_folder/tests
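An ARG default can be replaced when the image is built by passing --build-arg to docker build. The sketch below is a simplified variant of the Dockerfile above: it uses mkdir instead of COPY so that no local project files are needed, and the folder ./arg_demo, the image name arg_demo, and the override value /repl/pipeline are illustrative assumptions.

```shell
# Scratch build folder (illustrative).
build_dir="./arg_demo"
mkdir -p "$build_dir"

# project_folder defaults to /projects/pipeline_v3 unless overridden.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu
ARG project_folder=/projects/pipeline_v3
RUN mkdir -p $project_folder
EOF

if command -v docker >/dev/null 2>&1; then
  # Override the default value for this build only.
  docker build --build-arg project_folder=/repl/pipeline -t arg_demo "$build_dir"

  # The folder created during the build reflects the overridden value.
  docker run arg_demo ls /repl
fi
```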
Variables with ENV
Create variables in a Dockerfile
ENV <var_name>=<var_value>
$DB_USER
Use-cases for the ENV instruction
Setting a directory to be used at runtime
ENV DATA_DIR=/usr/local/var/postgres
1 https://hub.docker.com/_/postgres
Secrets in variables are not secure
docker history <image-name>
ARG DB_PASSWORD=example_password
Summary
Usage                                                          Dockerfile Instruction
Create a variable accessible only during the build             ARG <name>=<value>
Create a variable that is also set in running containers       ENV <name>=<value>
Creating Secure Docker Images
Inherent Security
Making secure images
Images from a trusted source
Creating secure images -> Start with an image from a trusted source
Docker Hub filters:
Keep software up-to-date
Keep images minimal
Adding unnecessary packages reduces security
Ubuntu with:
Python2.7
Java default-jre
Java openjdk-11
Java openjdk-8
Airflow
Our pipeline application
Installing only essential packages improves security
Ubuntu with:
Python3.11
Don't run applications as root
Allowing root access in an image defeats the purpose of keeping the image up-to-date and minimal.
Wrap-up
Chapter 1: The theoretical foundation
Chapter 2: The Docker CLI
Usage                                  Command
Start a container                      docker run (--name <container-name>) (-it) (-d) <image-name>
List running containers                docker ps (-f "name=<container-name>")
Stop a container                       docker stop <container-id>
See (live) logs for container          docker logs (-f) <container-id>
Remove stopped container               docker container rm <container-id>
Pull a specific version of an image    docker pull <image-name>:<image-version>
List all local images                  docker images
Remove an image                        docker image rm <image-name>
Chapter 3: Dockerfiles
FROM ubuntu
RUN apt-get update && apt-get install -y python3
COPY /projects/pipeline /app/
CMD /app/init.py
Chapter 4: Security and Customization
Usage                                                 Dockerfile Instruction
Change the current working directory                  WORKDIR <path>
Change the current user                               USER <user-name>
Create a variable accessible only during the build    ARG <name>=<value>
Create a variable                                     ENV <name>=<value>
Isolation provided by containers gives security but is not perfect.
Use the "Trusted Content" images from the official Docker Hub registry
Only install the software you need for the current use case.
What more is there to learn?
Dockerfile instructions
ENTRYPOINT
HEALTHCHECK
EXPOSE
...
Multi stage builds
FROM ubuntu as stage1
RUN generate_data.py
...
FROM postgres as stage2
COPY --from=stage1 /tmp /data
Thank you!
Intermediate Docker
Commands
INTERMEDIATE DOCKER
Mike Metzger
Data Engineering Consultant
Docker refresher
Docker is a container runtime
Is designed to run and manage various
The DataCamp Introduction to Docker course is a prerequisite to this course
Docker commands
docker run
docker stop
docker build
docker --help
Provides a list of potential Docker commands
docker COMMAND --help
docker run --help
Provides options for the docker run command
Temporary containers
Docker containers are usually created with docker run
Containers remain even after stopping / exiting
docker run --rm
Referenced as 'clean-up' or 'remove'
docker run --rm alpine:latest /bin/sh
Testing
Scripts
docker ps
Used for determining name, id, status, and other attributes of containers on a given machine running Docker
Will cover how to get extremely detailed information about containers later in the course
Mounting the host filesystem
Container filesystems
Container instances each have their own filesystem
Based off the image the container was created with
Any changes are tied to that specific container instance
Sharing files or directories
Can attach specific files or directories to containers
Known as bind-mount
Can be read-only or read/write
Allows for persistence of data, without maintaining a specific container
Can upgrade container to new version but safely keep data / changes
Note: Files or directories attached to a container stay accessible on the host while the container is running
Using the -v option
Bind-mounts most often use the -v flag
-v <source>:<destination>
docker run -v ~/html:/var/www/html nginx
Persistent volumes
What is a volume?
Volumes are an option to store data in Docker, unrelated to the container image or host filesystem
Managing volumes
docker volume
docker volume ls or
docker volume list
docker volume rm
Volume creation example
bash> docker volume create sqldata
sqldata
Volume inspect example
bash> docker volume inspect sqldata
[
{
"CreatedAt": "2024-01-27T04:27:51Z",
"Driver": "local",
"Labels": null,
"Mountpoint": "/var/lib/docker/volumes/sqldata/_data",
"Name": "sqldata",
"Options": null,
"Scope": "local"
}
]
Attaching volumes
Uses the -v flag
docker run -v <volumename>:<destination path>:<options>
$ docker run -v sqldata:/data postgres
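That a volume outlives the containers using it can be checked with two short-lived containers. This is a sketch under assumptions: it reuses the volume name sqldata from the slides but mounts it into alpine:latest instead of postgres to keep the run short, and the file name note.txt is illustrative. The first container writes a file into the volume and is removed; the second, separate container reads it back.

```shell
# Volume name taken from the slides; the rest is illustrative.
volume_name="sqldata"

if command -v docker >/dev/null 2>&1; then
  docker volume create "$volume_name"

  # First container: write a file into the volume, then exit and be removed.
  docker run --rm -v "$volume_name":/data alpine:latest \
    sh -c 'echo "kept in the volume" > /data/note.txt'

  # Second container: the file is still there because it lives in the volume.
  docker run --rm -v "$volume_name":/data alpine:latest cat /data/note.txt

  # Clean up the volume when it is no longer needed.
  docker volume rm "$volume_name"
fi
```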
Drivers
Methods of storing Docker volumes
Can include:
Local filesystem (default)
Networking refresher
What is networking?
A computer network consists of systems communicating via a defined method
Networking terms
Host
General term for a computer
Network
Group of hosts
Interface
Actual connection from a host to a network, such as Ethernet or WiFi
LAN
Local Area Network, or set of computers at a given location
VLAN
Virtual LAN, or a software LAN
Internet Protocol
IP
Internet protocol, method to connect between networks using IP addresses
IPv4
Version of IP supporting about 4.3 billion (2^32) addresses, currently exhausted
IPv6
Newer version of IP, supporting 2^128 addresses, still being deployed
IPv4: 10.10.10.1
IPv6: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
TCP / UDP
TCP
Transmission Control Protocol, used to reliably communicate between hosts on IP networks
UDP
User Datagram Protocol, used to communicate between hosts on IP networks where reliable communication is not required.
Ports
Port
Addresses services on a given host, a value between 0 and 65535, used to communicate between hosts via TCP or UDP
Application protocols
HTTP/HTTPS
Application protocol, defaulting to TCP port 80 for web communication. Secure version on TCP 443
SMTP
Email transfer protocol, over TCP port 25
SNMP
Network management protocol, over UDP port 161
Docker and networking
Can communicate between containers
Can communicate with the host system
Docker and IP
Containers can have IP addresses
Use ifconfig <interface> or ip addr show <interface> from within the container to find addresses
Making network services available in Docker
Network services
Network services listen on a given port
Only one program can listen on an IP:port combo at a given time
For example, 10.1.2.3:80 means listening on IP 10.1.2.3, port 80.
Consider trying to debug different versions of a web server that listens on port 80
Could only run one copy of the application at a time given that it listens only on that port
Containerized services
Wrapping an application in a container means that each container can now listen on that port (as the IP:port combo is different, each container has a different IP)
Port mapping
The answer is the use of port mapping, or port forwarding / translation
Enabling port mapping
To enable port mapping on a given container, we use the docker run command and the -p flag
-p 5501:80
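In -p 5501:80 the first number is the port on the host and the second is the port inside the container. The sketch below applies that mapping to an nginx container; the container name port_demo is an illustrative assumption, and --rm is added so the container is removed once it is stopped.

```shell
# Host and container ports from the slide example.
host_port=5501
container_port=80

if command -v docker >/dev/null 2>&1; then
  # Map host port 5501 to port 80 inside the container.
  docker run -d --rm --name port_demo -p "$host_port:$container_port" nginx

  # The PORTS column shows the mapping for the running container.
  docker ps -f "name=port_demo"

  # Stop the container; --rm then removes it.
  docker stop port_demo
fi
```

While the container is running, the web server should be reachable from the host on port 5501, for example in a browser.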
Exposing ports with Dockerfiles
Exposing services
EXPOSE command
Using the -p / -P flags
Still requires use of -p or -P options to docker run to make the ports available outside the container
The -P option will automatically map an ephemeral port to the exposed port(s). Must use docker ps -a to see which ports are mapped.
EXPOSE example
# Dockerfile
FROM python:3.11-slim
ENTRYPOINT ["python","-mhttp.server"]
# Let the Docker engine know
# port 8000 should be available
EXPOSE 8000
Making ports reachable
Automatically map temporary port from host to the container
docker ps -a
Finding exposed ports
docker inspect provides a lot of information
"NetworkSettings": {
"Bridge": "",
"Ports": {
"8000/tcp": [{
"HostIp": "0.0.0.0",
"HostPort": "55001"
}]
},
...
Docker networks
Docker networking
Docker has extensive networking options
Can create networks to communicate between containers, host, and external systems
Docker networking types
Docker supports different networking types, using drivers
bridge: Default driver, allows connections out, connections in if exposed
Will mostly use the bridge driver to create our own networks
Working with Docker networks
Several commands:
docker network
docker network <command>
Docker network example
Create a Docker network named mynetwork
5ff0febab98f73b74dd753eb44a30f7d7291052b3b1d58b0134589221cb8e33d
Attaching containers to networks
How to connect container to a network?
docker run --network <networkname> ...
docker network inspect
How to check details of network?
docker network inspect <networkname>
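Creating a network, attaching a container, and inspecting the result can be done in sequence. This sketch uses the network name mynetwork from the slides; the nginx image, the container name net_demo, and the --rm flag are illustrative assumptions. The attached container should show up in the "Containers" section of the inspect output.

```shell
# Network name from the slides.
network_name="mynetwork"

if command -v docker >/dev/null 2>&1; then
  # Create a user-defined network (bridge is the default driver).
  docker network create "$network_name"

  # Start a container attached to that network.
  docker run -d --rm --name net_demo --network "$network_name" nginx

  # Show network details, including attached containers and their IPs.
  docker network inspect "$network_name"

  # Clean up: stop the container, then remove the network.
  docker stop net_demo
  docker network rm "$network_name"
fi
```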
docker network inspect example
"Name": "mynetwork",
...
"Driver": "bridge",
...
"Containers": { "2be08aa942029191350d4bceb8816254af8713dd6f7dcbadcab8f068f
"Name": "unruffled_kare",
"EndpointID": "29739356ae200e1e901d2eabef05efaca0fb37e1a4e1a4c3bf369
"MacAddress": "02:42:ac:12:00:02",
"IPv4Address": "172.18.0.2/16",
"IPv6Address": ""
}
Optimizing Docker images
Docker image explanation
Docker images are the base of a given container
Holds all content initially available to a container instance
Docker image concerns
Tempting to add all potentially needed components to an image
Docker image recommendations
Split containers to the smallest level needed
Easier to combine multiple containers later vs. building a single large image
Like building with reusable components
Updates to specific software only affect containers using that image instead of all containers needing the update
Can optimize for size, making use and distribution much easier
Docker image breakdown example
Consider a data engineering project using the following software:
Postgresql database
Python ETL software
Web server software
FROM ubuntu
RUN apt update
RUN apt install -y postgresql
RUN apt install -y nginx
RUN apt install -y python3.9
...
Possible to use a single image, but we would need to update the image each time we had an update to the ETL or web server setup.
Example with minimized containers
Better options with Docker
Split each into its own container
Postgresql database container
Web server
bash> docker run -d postgres:latest
bash> docker run -d nginx:latest
...
Determining image size
Using docker images
bash> docker images
Understanding layers
Docker layers
Docker images are made up of layers
Why do we care about layers?
Reusability
Faster build time
Smaller builds
docker image inspect
How to determine the layers within an image?
docker image inspect <img id | name>
The RootFS:Layers section provides details about layers in a given Docker image
docker image inspect example
bash> docker image inspect postgres:latest
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:6f2d01c02c30cc1ffac781aff795cba8eeb29cc27756fe37bf525169856369c6",
"sha256:c6ad2d5a3cad837ae66b5560e9c577bfad062556b1f00791d8d733ce44a577ce",
"sha256:2153552a84ccbf7e4a28a50e766b72345072e59f8af0ff068baf98b413132e0c",
"sha256:6c00217b1e4b15c25eb3f6e28b1af8c295f469568014621e31a4c5eb5a8aca6f",
"sha256:167177d78e2a33aa822faebe9f01683c648ae78179059db05cd25737f215c305",
...
jq command-line tool
Sometimes difficult to analyze the results from docker image inspect
The jq command-line tool is used to read JSON data, like what's returned from docker image inspect
jq recipes with Docker
Method to see just a specific section, for example the RootFS data:
docker image inspect <id> | jq '.[0] | .RootFS'
{
"Type": "layers",
"Layers": [ "sha256:0f5c115c5eea96...",
"sha256:20792593831cdc..."
]
}
jq recipes with Docker (part 2)
Method to count number of layers using jq :
docker image inspect <id> | jq '.[0] | {LayerCount: .RootFS.Layers | length}'
{
"LayerCount": 2
}
Multi-stage builds
Single-stage builds
Typical Docker images are created using a single FROM command
Each addition to the source image adds space and makes its management
Consider an application that must be compiled prior to use
You can add all the necessary components to the image, compile it, and then configure the final image for use
FROM ubuntu
RUN apt update
RUN apt install gcc -y
...
RUN make
CMD ["data_app"]
Multi-stage builds
Multi-stage builds use multiple containers
Typically has one or more build stages
COPY --from=<alias>
Multi-stage build example
# Create initial build stage
FROM ubuntu AS stage1
# Install compiler and compile code
RUN apt install gcc -y
...
RUN make
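The example above shows only the build stage. A complete two-stage Dockerfile could look like the sketch below, which is an illustration rather than the course's own file: the one-file C program hello.c, the direct gcc call in place of make, the paths under /src and /usr/local/bin, and the image name multistage_demo are all assumptions. The final image receives only the compiled binary through COPY --from=stage1, so the compiler stays behind in the build stage.

```shell
# Scratch build folder (illustrative).
build_dir="./multistage_demo"
mkdir -p "$build_dir"

# A tiny program to compile (hypothetical stand-in for real source code).
cat > "$build_dir/hello.c" <<'EOF'
#include <stdio.h>
int main(void) { puts("hello from data_app"); return 0; }
EOF

# Stage 1 installs the compiler and builds the binary.
# The final stage starts from a clean image and copies only the binary.
cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu AS stage1
RUN apt-get update && apt-get install -y gcc libc6-dev
COPY hello.c /src/hello.c
RUN gcc -o /src/data_app /src/hello.c

FROM ubuntu
COPY --from=stage1 /src/data_app /usr/local/bin/data_app
CMD ["data_app"]
EOF

if command -v docker >/dev/null 2>&1; then
  docker build -t multistage_demo "$build_dir"
  docker run --rm multistage_demo
fi
```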
Multi-platform builds
Multi-platform?
What does multi-platform mean?
Different OS types
linux
windows
macos
arm64
arm7
Creating multi-platform builds
Is built on multi-stage build behavior
The initial / build stage tends to use cross-compilers and relies on the architecture of the host system
Multi-platform Dockerfile options
Build stage uses the --platform=$BUILDPLATFORM flag
$BUILDPLATFORM represents the platform of the host running the build
Environment variables needed by the compile step can be defined beforehand or set inline using the env command.
INTERMEDIATE DOCKER
Multi-platform example
# Initial stage, using local platform
FROM --platform=$BUILDPLATFORM golang:1.21 AS build
# Copy source into place
WORKDIR /src
COPY . .
# Declare the target platform build arguments, which are set automatically by the builder
ARG TARGETOS TARGETARCH
# Compile code using the ARG variables
RUN env GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /final/app .
INTERMEDIATE DOCKER
Building a multi-platform build
To create a multi-platform build, instead of using docker build, we must use docker buildx with assorted options
docker buildx provides more commands and capabilities than docker build, including the option to specify a platform
Prior to running the build, we must also have a new builder container present. This is done with the docker buildx create --bootstrap --use command.
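As a rough illustration of the steps above, the following Python sketch assembles the two command lines: creating a builder, then building for several platforms. The helper name buildx_build_command, the image tag example/app:latest, and the platform list are assumptions made for this sketch. The script only prints the commands and does not run Docker.

```python
import shlex


def buildx_build_command(platforms, tag, context="."):
    """Return the argument list for a multi-platform `docker buildx build`."""
    return [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # comma-separated os/arch pairs
        "-t", tag,                          # hypothetical image tag
        context,                            # build context directory
    ]


if __name__ == "__main__":
    # Step 1: create and select a new builder container.
    print(shlex.join(["docker", "buildx", "create", "--bootstrap", "--use"]))
    # Step 2: build the image for two target platforms.
    print(shlex.join(buildx_build_command(["linux/amd64", "linux/arm64"], "example/app:latest")))
```

Depending on the builder in use, an output option such as --push may also be needed to make the multi-platform result available after the build.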
INTERMEDIATE DOCKER
Let's practice!
INTERMEDIATE DOCKER
Introduction to Docker Compose
Mike Metzger
Data Engineering Consultant
What is Docker Compose?
Additional command-line tool for Docker
Define and manage multi-container applications
INTERMEDIATE DOCKER
Example compose.yaml
# Define the services
services:
  # Define the container(s), by name
  webapp:
    image: "webapp"
    # Optionally, define the port forwarding
    ports:
      - "8000:5000"
  # Define any other containers required
  redis:
    image: "redis:alpine"
INTERMEDIATE DOCKER
Starting an application
docker compose up
On older systems, docker-compose up
docker compose up -d starts the application in the background (detached mode)
$ docker compose up
[+] Running 2/0
✔ Network composetest_default Created
✔ Container composetest-redis-1 Created
✔ Container composetest-web-1 Created
Attaching to redis-1, web-1
redis-1 | 1:C 11 Mar 2024 04:09:51.754 * oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
web-1 | * Serving Flask app 'app.py'
web-1 | * Running on http://127.0.0.1:5000
INTERMEDIATE DOCKER
Checking status of applications
docker compose ls
$ docker compose ls
NAME STATUS CONFIG FILES
webapp running(2) /webapp/docker-compose.yml
INTERMEDIATE DOCKER
Stopping an application
docker compose down
On older systems, docker-compose down
INTERMEDIATE DOCKER
Let's practice!
INTERMEDIATE DOCKER
Creating compose.yaml files
Mike Metzger
Data Engineering Consultant
YAML
Originally "Yet Another Markup Language", now "YAML Ain't Markup Language"
Text file, but spacing matters (like Python)
Used in many development scenarios for configuration
Rules can be tricky; mainly, keep entries lined up as in the examples
services:
  postgres:
    container_name: postgres
    image: postgres:latest
    ports:
      - "5432:5432"
    restart: always
  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    ports:
      - "5050:80"
    restart: always
INTERMEDIATE DOCKER
Main sections
Different sections handle different components
services: lists the containers to load
networks: handles networking definitions
volumes: controls any volume mounting
configs: handles configuration options without custom images
secrets: provides options to handle passwords, tokens, API keys, etc
services:
  ... # Define containers
networks:
  ... # Define any networking details
volumes:
  ... # Define storage requirements
configs:
  ... # Define special config details
secrets:
  ... # Define passwords / etc
INTERMEDIATE DOCKER
Services section
Defines all required resources for the application
Primarily specifies the containers and images to be used
INTERMEDIATE DOCKER
Services example
Resource name
container_name: , the assigned name of the container, otherwise it's random
image: , which container image to use
ports: , contains a list of any port mapping required
Followed by next resources required
services:
  # Resource name
  postgres:
    # Container name, otherwise random
    container_name: postgres
    # Container image to use
    image: postgres:latest
    # Any port mapping required
    ports:
      # Network details
      - "5432:5432"
  # Next resource
  pgadmin:
    ...
INTERMEDIATE DOCKER
Additional comments
compose.yaml syntax is extensive
Covering only a very small portion of compose.yaml options
1 https://docs.docker.com/compose/compose-file/
INTERMEDIATE DOCKER
Let's practice!
INTERMEDIATE DOCKER
Dependencies and troubleshooting in Docker Compose
Mike Metzger
Data Engineering Consultant
What are dependencies?
Dependencies define the order of resource
startup
INTERMEDIATE DOCKER
What are dependencies?
Dependencies define the order of resources
Resources (containers) may require other
resources
INTERMEDIATE DOCKER
depends_on
Dependencies are defined using the depends_on attribute
Can chain dependencies as per example
services:
  postgresql:
    image: postgres:latest
  python_app:
    image: custom_app
    depends_on:
      - postgresql
  nginx:
    image: nginx:latest
    depends_on:
      - python_app
INTERMEDIATE DOCKER
Shutting down applications
Shutting down an application occurs in
reverse order
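The ordering rules above can be illustrated with a short Python sketch: startup order is a topological order of the dependency graph, and shutdown order is its reverse. This is only a model of the idea, not how Docker Compose is implemented. The service names follow the depends_on example, and the chain nginx -> python_app -> postgresql is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Maps each service to the services it depends on (illustrative chain).
DEPENDS_ON = {
    "postgresql": [],
    "python_app": ["postgresql"],
    "nginx": ["python_app"],
}


def startup_order(depends_on):
    """Return services so that every dependency comes before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


def shutdown_order(depends_on):
    """Shutdown happens in the reverse of the startup order."""
    return list(reversed(startup_order(depends_on)))


if __name__ == "__main__":
    print("start:", startup_order(DEPENDS_ON))
    print("stop: ", shutdown_order(DEPENDS_ON))
```

With this chain, postgresql starts first and nginx last, and the order is reversed on shutdown.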
INTERMEDIATE DOCKER
Other options
Docker Compose provides other options for dependencies
condition: defines how to decide when a resource is ready.
service_started - Resource has started normally
Default behavior
service_completed_successfully - Resource ran to completion, such as an initial configuration step
service_healthy - Resource meets the criteria defined by its healthcheck
services:
  nginx:
    image: nginx:latest
    depends_on:
      python_app:
        condition: service_started
  python_app:
    image: custom_app
    depends_on:
      postgresql:
        condition: service_healthy
INTERMEDIATE DOCKER
Docker Compose troubleshooting tools
Docker Compose has additional troubleshooting tools
docker compose logs - Gathers output from all resources in application
INTERMEDIATE DOCKER
docker compose top
docker compose top shows the running processes of each resource within an application
composetest-redis-1
UID PID PPID C STIME TTY TIME CMD
999 2767 2726 0 01:16 ? 00:03:27 redis-server *:6379
composetest-web-1
UID PID PPID C STIME TTY TIME CMD
root 2768 2740 0 01:16 ? 00:00:23 /usr/local/bin/python /usr/local/
INTERMEDIATE DOCKER
Let's practice!
INTERMEDIATE DOCKER
Creating a data service within Docker
Mike Metzger
Data Engineering Consultant
Data sharing
docker run -v <host directory>:<container directory>
-v ~/hostdata:/containerdata
INTERMEDIATE DOCKER
Data sharing in compose.yaml
Also present in compose.yaml files
services:
  resource:
    container_name: resource1
    volumes:
      - <host directory>:<container directory>
      # Such as:
      - ~/hostdata:/containerdata
INTERMEDIATE DOCKER
Networks
docker run --network <networkname>
docker run --network net1
In compose.yaml resources
services:
  resource:
    container_name: resource1
    networks:
      network_name:
      # Such as:
      net1:
INTERMEDIATE DOCKER
Port mapping
docker run -p hostport:containerport
-p 8000:8000
services:
  resource:
    container_name: resource1
    ports:
      - hostport:containerport
      # Such as:
      - 8000:8000
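To make the order of the two numbers explicit, here is a small Python sketch that splits a mapping string into its host and container ports. The function name parse_port_mapping is hypothetical, and the sketch only handles the simple hostport:containerport form shown above.

```python
def parse_port_mapping(mapping: str) -> tuple[int, int]:
    """Split 'hostport:containerport' into (host_port, container_port)."""
    host, container = mapping.split(":")
    return int(host), int(container)


if __name__ == "__main__":
    host_port, container_port = parse_port_mapping("8000:5000")
    # Traffic arriving on the host port is forwarded to the container port.
    print(f"host {host_port} -> container {container_port}")
```

The host port always comes first, so "8000:5000" exposes container port 5000 on host port 8000.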
INTERMEDIATE DOCKER
docker inspect
Determine information about provisioned containers
docker inspect <id / name>
"Config": {
"Mounts": [...]
...
"Networks": {
"network1": {
...
INTERMEDIATE DOCKER
Data service
INTERMEDIATE DOCKER
Let's practice!
INTERMEDIATE DOCKER
Course review
Mike Metzger
Data Engineering Consultant
Next steps
Review the Docker documentation at docs.docker.com
Containerize more applications
Docker Swarm
Kubernetes
CI/CD
INTERMEDIATE DOCKER
Congratulations!
INTERMEDIATE DOCKER