Containers: Docker
Getting started with Docker
There are two main components of Docker: Docker Desktop and Docker Hub.
What is the difference between a container and a virtual machine? Here is a breakdown:
Size: Containers are much smaller than virtual machines (VMs) because they run as isolated
processes rather than on virtualized hardware. VMs can be gigabytes in size, while containers can
be megabytes.
Speed: Virtual machines can be slow to boot and can take minutes to launch. A container can start
much more quickly, typically in seconds.
Composability: Containers are designed to be built programmatically and are defined as source
code in an Infrastructure as Code (IaC) project. Virtual machines are often replicas of a
manually built system. Containers make IaC workflows possible because they are defined as a
file and checked into source control alongside the project's source code.
Real-World Examples of Containers
What problem do Docker-format containers solve? In a nutshell, the operating system runtime can
be packaged along with the code, which solves a particularly complicated problem with a long
history. There is a famous meme that goes “It works on my machine!” While this is often told as a
joke to illustrate the complexity of deploying software, it is also true. Containers solve this exact
problem. If the code works in a container, then the container configuration can be checked in as
code. Another way to describe this concept is that the actual infrastructure is treated as code. This
is called IaC (Infrastructure as Code).
Here are a few specific examples:
Developer Shares Local Project
A developer can work on a web application that uses Flask (a popular Python web framework).
The installation and configuration of the underlying operating system are handled by the
Dockerfile. Another team member can check out the code and use docker run to run the
project. This eliminates what could be a multi-day problem of configuring a laptop correctly to run a
software project.
Data Scientist Shares Jupyter Notebook with a Researcher at Another University
A data scientist working with Jupyter-style notebooks wants to share a complex data science project
that has multiple dependencies on C, Fortran, R, and Python code. They package up the runtime as
a Docker container and eliminate the back-and-forth over several weeks that occurs when sharing a
project like this.
A Machine Learning Engineer Load Tests a Production Machine Learning Model
A machine learning engineer has been tasked with taking a new model and deploying it to
production. Previously, they were concerned about how to reliably test the accuracy of the new
model before committing to it. The model recommends products to paying customers, and if it is
inaccurate, it costs the company a lot of money. Using containers, it is possible to deploy the model
to a fraction of the customers, only 10%, and if there are problems, the deployment can be quickly
reverted. If the model performs well, it can quickly replace the existing models.
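The 10% rollout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by any particular platform: the function names, the "new-model" and "current-model" labels, and the hash-based bucketing are all hypothetical. Hashing the customer ID keeps each customer on the same model between requests, and setting the percentage to 0 reverts everyone to the current model.

```python
import hashlib

CANARY_PERCENT = 10  # share of customers routed to the new model (the 10% from the text)


def bucket(customer_id: str) -> int:
    """Map a customer ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_model(customer_id: str, canary_percent: int = CANARY_PERCENT) -> str:
    """Return the (hypothetical) name of the model container that serves this customer."""
    if bucket(customer_id) < canary_percent:
        return "new-model"
    return "current-model"


if __name__ == "__main__":
    customers = [f"customer-{i}" for i in range(10_000)]
    share = sum(choose_model(c) == "new-model" for c in customers) / len(customers)
    print(f"share routed to the new model: {share:.1%}")
```

In practice this decision is usually made by a load balancer or a deployment platform rather than by application code; the sketch only shows the idea of a stable, reversible split.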
Running Docker Containers
Using “base” images
https://noahgift.github.io/cloud-data-analysis-at-scale/topics/docker-format-containers 2/9
3/23/2021 Containers: Docker | cloud-data-analysis-at-scale
One of the advantages of the Docker workflow for developers is the ability to use certified
containers from the “official” development teams. In this diagram, a developer uses the official
Python base image, which is developed by the core Python developers. This is accomplished by
the FROM statement, which loads a previously created container image.
As the developer makes changes to the Dockerfile, they test locally, then push the changes to a
private Docker Hub repo. After this, the changes can be used by a deployment process to a cloud or
by another developer.
Common Issues Running a Docker Container
There are a few common issues that crop up when starting a container or building one for the first
time. Let's walk through each problem and then present a solution.
What goes in a Dockerfile if you need to write to the host filesystem? In the following
example, the docker volume command is used to create a volume, which is later mounted
into the container.
> /tmp docker volume create docker-data
docker-data
> /tmp docker volume ls
DRIVER VOLUME NAME
local docker-data
> /tmp docker run -d \
--name devtest \
--mount source=docker-data,target=/app \
ubuntu:latest
6cef681d9d3b06788d0f461665919b3bf2d32e6c6cc62e2dbab02b05e77769f4
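From the application's point of view, a mounted volume is just a directory. The sketch below shows code running inside the container writing a file under the mount target /app from the example above, so the data lands in the docker-data volume. The function name, the file name result.txt, and the APP_DATA_DIR environment variable are hypothetical and exist only for illustration.

```python
import os
from pathlib import Path
from typing import Optional

DEFAULT_DATA_DIR = "/app"  # mount target used in the docker run example above


def save_result(text: str, data_dir: Optional[str] = None) -> Path:
    """Write text to result.txt under the mounted directory and return the file path."""
    # APP_DATA_DIR is a hypothetical override so the same code can run outside a container.
    target = Path(data_dir or os.environ.get("APP_DATA_DIR", DEFAULT_DATA_DIR))
    target.mkdir(parents=True, exist_ok=True)
    path = target / "result.txt"
    path.write_text(text, encoding="utf-8")
    return path
```

Calling save_result("done") inside the container would write /app/result.txt, assuming the process has write permission on the mount.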
What about actually mapping the ports? You can do that using the -p flag. You can read
more about docker run flags in the Docker documentation.
A CPU limit passed to docker run can tell the container to use at most 25% of the CPU every second.
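The two docker run options discussed here, port mapping and a CPU limit, can be sketched by assembling the command line in Python. This is a minimal sketch under assumptions: the helper name docker_run_command and the port numbers 8080:80 are hypothetical, and --cpus is assumed to be the flag behind the 25% limit. Both -p and --cpus are standard docker run flags.

```python
import shlex
from typing import List, Optional


def docker_run_command(
    image: str,
    publish: Optional[str] = None,
    cpus: Optional[float] = None,
) -> List[str]:
    """Build an argument list for docker run with optional -p and --cpus flags."""
    args = ["docker", "run", "-d"]
    if publish is not None:
        # -p HOST_PORT:CONTAINER_PORT publishes a container port on the host
        args += ["-p", publish]
    if cpus is not None:
        # --cpus caps how much CPU the container may use (0.25 = 25% of one CPU)
        args.append(f"--cpus={cpus}")
    args.append(image)
    return args


if __name__ == "__main__":
    cmd = docker_run_command("ubuntu:latest", publish="8080:80", cpus=0.25)
    # Prints: docker run -d -p 8080:80 --cpus=0.25 ubuntu:latest
    print(shlex.join(cmd))
```

On a machine with Docker installed, the resulting list could be passed to subprocess.run(cmd, check=True) to start the container.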
[TO DO: Docker GPU example]
Container Registries
Build containerized application from Zero on AWS Cloud9
Screencast
Dockerfile
FROM python:3.7.3-stretch
# Working Directory
WORKDIR /app
requirements.txt
Makefile
setup:
	python3 -m venv ~/.dockerproj
install:
	pip install --upgrade pip &&\
		pip install -r requirements.txt
test:
	#python -m pytest -vv --cov=myrepolib tests/*.py
	#python -m pytest --nbval notebook.ipynb
validate-circleci:
	# See https://circleci.com/docs/2.0/local-cli/#processing-a-config
	circleci config process .circleci/config.yml
run-circleci-local:
	# See https://circleci.com/docs/2.0/local-cli/#running-a-job
	circleci local execute
lint:
	hadolint Dockerfile
	pylint --disable=R,C,W1203 app.py
app.py
. Install hadolint (you may want to become root, i.e. run sudo su -, run this command, then exit by
typing exit).
working_directory: ~/repo
steps:
  - checkout
  - run:
      name: install dependencies
      command: |
        python3 -m venv venv
        . venv/bin/activate
        make install
        # Install hadolint
        wget -O /bin/hadolint https://github.com/hadolint/hadolint/releases
        chmod +x /bin/hadolint
  - save_cache:
      paths:
        - ./venv
      key: v1-dependencies-
  # run lint!
  - run:
      name: run lint
      command: |
        . venv/bin/activate
        make lint
. Create app.py
#!/usr/bin/env python
import click


@click.command()
def hello():
    click.echo('Hello World!')


if __name__ == '__main__':
    hello()
. Run in container
docker build --tag=app .
ec2-user:~/environment $ sudo su -
[root@ip-172-31-65-112 ~]# curl -fLSs https://circle.ci/cli | bash
Starting installation.
Installing CircleCI CLI v0.1.5879
Installing to /usr/local/bin
/usr/local/bin/circleci
#!/usr/bin/env bash
# This tags and uploads an image to Docker Hub
dockerpath="noahgift/app"
# Push Image
docker image push "$dockerpath"