Notes For The Textbook - Docker in Action
Containers - a unix technology to isolate a process from all resources except where
explicitly allowed.
A program running in an os can see all resources, but one running in a container
can only see the resources allocated to the container. This is implemented using
linux namespaces.
They are challenging to build manually. Docker runs all software inside a
container, using existing container engines to provide consistent containers built
with best practices.
Users interact with docker using the docker cli. Each running program is a child
process of the docker daemon, wrapped in a container.
Containers built by docker are isolated in 8 aspects:
* PID
* UTS (host and domain name)
* MNT
* IPC
* NET
* USR
* chroot() - controls fs root location
* cgroups - resource protection (cpu, memory, disk io etc)
A docker image is a bundled snapshot of all the files that should be available to a
program running inside a container.
A container is created from an image.
Registries and indexes are infrastructure components that simplify distributing
docker images.
The more software you use, the more difficult it is to manage, due to things like
shared application dependencies. Docker helps to focus on using the software,
instead of spending time installing/upgrading/publishing it.
cgroups - control groups, allow limiting resources like cpu, memory etc. Each
resource has its own tree of groups, and every process sits in exactly one node of
each tree, one node per resource.
memory cgroup
-hard limits - if container exceeds hard limit, it will be killed. so only put one
service in a container, so in case of oom only one will be killed.
-soft limits - in case system is running out of memory, it will take memory pages
away from processes that are above their soft limits.
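A minimal sketch of setting both limits for one group, assuming the cgroup v1 memory controller, whose control files are memory.limit_in_bytes (hard) and memory.soft_limit_in_bytes (soft). The helper name set_memory_limits and the group directory are hypothetical; on a real host the directory would sit under /sys/fs/cgroup/memory and writing to it needs root.

```python
from pathlib import Path


def set_memory_limits(group_dir, hard_bytes, soft_bytes):
    """Write hard and soft memory limits into a cgroup v1 memory group.

    group_dir is the group's directory (hypothetical here; it must exist).
    """
    # Design choice of this sketch: refuse a soft limit above the hard one.
    if soft_bytes > hard_bytes:
        raise ValueError("soft limit should not exceed the hard limit")
    group = Path(group_dir)
    # Hard limit: a group that cannot get back under it is subject to OOM kills.
    (group / "memory.limit_in_bytes").write_text(str(hard_bytes))
    # Soft limit: only matters when the whole system is short on memory.
    (group / "memory.soft_limit_in_bytes").write_text(str(soft_bytes))
```

For example, set_memory_limits(group, 512 * 1024 * 1024, 256 * 1024 * 1024) would cap a group at 512 MiB with a 256 MiB soft limit.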
cpu cgroup
-track user/system cpu time & usage per cpu
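-lets you set relative weights (cpu.shares); hard caps are also possible through
the CFS quota settings (cpu.cfs_quota_us, cpu.cfs_period_us) on kernels that
support them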
cpuset cgroup
-lets you pin groups of processes to specific cpus, so you can dedicate cpus to
specific tasks. This prevents processes from bouncing between cpus.
blkio cgroup
-keeps track of io for each group: reads and writes, sync vs async operations, per
block device
devices cgroup
-controls what the group can do on device nodes like network
interfaces (/dev/net/tun), filesystems in user space (/dev/fuse), etc
freezer cgroup
-lets you freeze a group of processes
New processes start in their parent’s cgroups. Groups are created by mkdir in the
pseudo-fs:
mkdir /sys/fs/cgroup/memory/somegroup/subcgroup
To move a process to a cgroup, write its pid to that group’s tasks file:
echo $PID > /sys/fs/cgroup/.../tasks
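The two steps above (mkdir to create a group, then writing a pid into its tasks file) can be sketched as follows. This assumes the cgroup v1 layout used in these notes; the function names are hypothetical, and the parent directory is a parameter so the sketch can be tried against a scratch directory instead of the real /sys/fs/cgroup, which needs root.

```python
from pathlib import Path


def create_cgroup(parent_dir, name):
    """Create a child group; on a real cgroup fs this is just a mkdir."""
    group = Path(parent_dir) / name
    group.mkdir(parents=True, exist_ok=True)
    return group


def move_process(group_dir, pid):
    """Move a process into the group by writing its pid to the tasks file."""
    (Path(group_dir) / "tasks").write_text(f"{pid}\n")
```

On a real host the call would look like move_process(create_cgroup("/sys/fs/cgroup/memory", "somegroup"), pid).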
pid namespace
-processes in a pid namespace can only see processes in the same pid namespace
-a process has multiple pids, one per namespace in which it’s nested
-so in a container you can only see that container’s processes, but from the host
you can see the processes inside every container on that system
-inside the container a process could be pid 1, while outside the container its
pid will be different.
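One way to observe the "one pid per namespace" behaviour is the NSpid line of /proc/<pid>/status, which Linux 4.1 and later report. The sketch below only parses that line; the function name is hypothetical, and the sample values in the usage note are made up for illustration.

```python
def pids_by_namespace(status_text):
    """Return the pids listed on the NSpid line of /proc/<pid>/status.

    The first entry is the pid as seen from the outermost visible pid
    namespace, the last is the pid inside the innermost one.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []  # older kernels do not report NSpid
```

Given a status file containing "NSpid:\t15342\t1", the function returns [15342, 1]: pid 15342 on the host, pid 1 inside the container.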
network namespace
lets each container have its own network resources.
processes in a namespace get their own private network stack, including network
interfaces, routing tables, sockets etc.
you can move a network interface from one network namespace to another.
mnt namespace
lets a container mount something that isn’t visible in other containers.
processes can have their own root fs
mounts can be private or shared
uts namespace
lets a container have its own hostname
ipc namespace
allows a process (or group of processes) to have its own
-ipc semaphores
-ipc message queues
-ipc shared memory
user namespace
lets you map uids. in a container you can be uid 0, but outside you are uid 1234.
so it maps container uids to host uids.
user namespace is about usable security. lets you be root inside the container but
not outside the container.
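The uid mapping can be made concrete with the format of /proc/<pid>/uid_map, where each line holds three numbers: first uid inside the namespace, first uid outside it, and the length of the range. The sketch below is a hypothetical helper that applies such a mapping; it does not create a namespace.

```python
def to_host_uid(uid_map_text, container_uid):
    """Translate a container uid to a host uid using uid_map-style lines."""
    for line in uid_map_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        inside, outside, length = (int(f) for f in fields)
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None  # this uid is not mapped into the namespace
```

With the mapping "0 1234 1" from the example above, to_host_uid("0 1234 1", 0) gives 1234, and any other container uid is unmapped.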
namespace manipulation
-can create a ns by passing flags (e.g. CLONE_NEWPID, CLONE_NEWNET) to clone() when
creating a new process
-so to manually create a container, start bash with such flags to keep it in
separate namespaces.
-ns are materialized by pseudo files in /proc/<pid>/ns
-when the last process in a ns exits, the ns is destroyed (it can be persisted if
needed)
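Since each namespace shows up as a symlink under /proc/<pid>/ns, two processes are in the same namespace when the corresponding links point at the same identifier. A rough sketch of that comparison follows; the function names and the proc_root parameter are assumptions added for illustration, and reading another process's links may need extra privileges.

```python
import os


def namespaces(pid, proc_root="/proc"):
    """Map namespace name -> identifier (e.g. 'pid:[4026531836]') for a process."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}


def shared_namespaces(pid_a, pid_b, proc_root="/proc"):
    """Return the names of the namespaces that two processes have in common."""
    a = namespaces(pid_a, proc_root)
    b = namespaces(pid_b, proc_root)
    return sorted(name for name in a if a[name] == b.get(name))
```

Comparing a containerized process with a host shell would typically show few or no shared namespaces, while two processes in the same container share all of them.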
copy-on-write storage
-cow is a standard unix pattern that provides a single shared copy of data until
the data is modified. this data is in the docker image. if a write is performed in
the running container, a copy of the file to be modified is first placed in the
writeable layer of the container, and then the write operation takes place on that
copy.
-lets you create a new container instantly (startup time can be less than 0.1 s),
since it doesn’t have to create a copy of the full filesystem.
-lets containers have a small footprint (can occupy less than 1mb on disk)
unlike vms, which take minutes to startup and occupy gbs of disk space.
cow is implemented using the union file system(ufs) (other options may be
available) which allows layered file systems. a docker image is composed of
multiple read only layers. when you create a container, a writable layer is created
on top of the layers. when a file is needed, the layers are searched top down, and
the first file to be found is returned.
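The top-down lookup and the copy-on-write step can be modelled in a few lines. This is a toy model with dictionaries standing in for layers, not how any real storage driver is implemented; the class and method names are hypothetical.

```python
class LayeredFS:
    """Toy model of an image's read-only layers plus one writable layer."""

    def __init__(self, image_layers):
        # image_layers: list of dicts (path -> contents), bottom layer first.
        self.image_layers = image_layers
        self.writable = {}  # per-container layer, starts empty

    def read(self, path):
        # Search top down: writable layer first, then image layers, newest first.
        for layer in [self.writable, *reversed(self.image_layers)]:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-on-write: copy the file up into the writable layer,
        # then modify only that copy.
        if path not in self.writable:
            self.writable[path] = self.read(path)
        self.writable[path] += data
```

Two containers built from the same layers share every file until one of them writes; after c1.append("/etc/motd", " world"), only c1 sees the change and the image layers stay untouched.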
orthogonality:
all above features can be used independently
e.g. put a debugger in a container’s namespaces but not its cgroups, so it doesn’t
steal the container’s resources.
The features above can be used by different container runtimes, not just docker.
Used directly, they are hard and error prone; docker and other container runtimes
simplify the process and take care of using them under the hood.
Images are the shippable units in docker.
Docker runs natively on linux. On windows and os x it uses a single virtual machine
in which all containers run, so the vm overhead stays constant while the number of
containers scales up. This improves portability.
Containers limit the scope of impact a program has on other programs, the data it
can access and system resources. The scope of any security threat is therefore
limited to this scope of impact.
Docker can only run applications that can run on a linux system. It helps protect
your system from attack by running software in containers and as a user with
reduced privileges. This won’t work for software that needs elevated privileges.
It’s a bad idea to blindly run third party containers in a collocated environment.
To list running containers, use the “docker ps” command. This returns the following
details for each running container:
* container id
* the image used
* the command executed inside the container
* the time since the container was created
* the duration that the container has been running
* the network ports exposed by the container
* the name of the container
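For scripting, the same details can be read in machine-friendly form. The sketch below assumes the output of docker ps --format '{{json .}}', which prints one JSON object per container, and assumes the field names ID, Image, Command, CreatedAt, Status, Ports and Names; the function name is hypothetical, and missing fields simply come back as None.

```python
import json


def parse_docker_ps(output):
    """Parse `docker ps --format '{{json .}}'` output: one JSON object per line."""
    containers = []
    for line in output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        row = json.loads(line)
        containers.append({
            "id": row.get("ID"),
            "image": row.get("Image"),
            "command": row.get("Command"),
            "created": row.get("CreatedAt"),
            "status": row.get("Status"),
            "ports": row.get("Ports"),
            "name": row.get("Names"),
        })
    return containers
```

The text to parse could be captured with subprocess.run(["docker", "ps", "--format", "{{json .}}"], capture_output=True, text=True).stdout on a machine where docker is installed.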
To see container logs, type “docker logs container_name”. This shows anything the
program writes to stdout or stderr. By default the log is never rotated or
truncated, which can be a problem for long lived processes. It’s preferable to use
volumes for log data, discussed later.
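To stop a container, type “docker stop container_name”. This sends SIGTERM to the
program with pid 1 in the container, asking it to halt; if it has not exited after
a grace period (10 seconds by default), docker sends SIGKILL.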
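The halt request from docker stop arrives as a SIGTERM signal, and a program running as pid 1 only reacts to it if it installs a handler. A minimal sketch of a program that shuts down cleanly is shown below; the serve loop, its parameters and the state dictionary are hypothetical stand-ins for a real service.

```python
import signal
import time

# Shared flag that the signal handler flips and the main loop checks.
state = {"stop": False}


def handle_sigterm(signum, frame):
    """Record the stop request so the main loop can exit cleanly."""
    state["stop"] = True


# Must run in the main thread; replaces the default SIGTERM behaviour.
signal.signal(signal.SIGTERM, handle_sigterm)


def serve(poll_interval=0.1, max_polls=None):
    """Hypothetical main loop: work until a stop has been requested."""
    polls = 0
    while not state["stop"]:
        if max_polls is not None and polls >= max_polls:
            break
        time.sleep(poll_interval)  # stand-in for real work
        polls += 1
    return polls
```

A program that ignores the signal keeps running until docker gives up waiting and kills it, so handling SIGTERM is what makes a stop fast and clean.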