6 Storage

The document discusses Docker storage and volumes. Docker uses a layered architecture where each instruction creates a new image layer. Containers then create a writable layer on top of the image. Volumes can be used to persist container data and are supported through plugins. Kubernetes uses volumes to persist pod data and supports dynamic provisioning through storage classes.

Storage:

Storage in Docker:

When you install Docker on a system, it creates a folder structure at /var/lib/docker/, where it stores all of its data: files related to images, containers, volumes, and so on.

How exactly does Docker store the files of an image and a container?
To understand that, we need to understand Docker's layered architecture.
When Docker builds images, it builds them in a layered architecture: each instruction
in the Dockerfile creates a new layer in the Docker image with just the
changes from the previous layer.
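For example, a Dockerfile along these lines (the base image, packages, and paths are illustrative) produces five layers, one per instruction:

FROM ubuntu                                               # Layer 1: base Ubuntu layer
RUN apt-get update && apt-get -y install python           # Layer 2: APT packages
RUN pip install flask flask-mysql                         # Layer 3: Python dependencies
COPY . /opt/source-code                                   # Layer 4: application source code
ENTRYPOINT FLASK_APP=/opt/source-code/app.py flask run    # Layer 5: entry point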

To understand the advantages of this layered architecture, let's consider a second
application. This application has a different Dockerfile, but it is very similar to our first
application: it uses the same Ubuntu base image and the same Python and
Flask dependencies, but has different source code to create a different application,
and so a different entry point as well.
When I run the docker build command to build a new image for this application,
since the first three layers of both applications are the same, Docker is not going to
rebuild them. Instead, it reuses the same three layers it built for the first
application from the cache, and only creates the last two layers with the new source
code and the new entry point. This way, Docker builds images faster and efficiently saves
disk space.
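The second application's Dockerfile (again illustrative; app2.py is a hypothetical file name) would differ only in its last two instructions:

FROM ubuntu                                               # Layer 1: reused from cache
RUN apt-get update && apt-get -y install python           # Layer 2: reused from cache
RUN pip install flask flask-mysql                         # Layer 3: reused from cache
COPY app2.py /opt/source-code                             # Layer 4: new source code
ENTRYPOINT FLASK_APP=/opt/source-code/app2.py flask run   # Layer 5: new entry point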
All of these layers are created when we run the docker build command to form the
final Docker image, so all of them are Docker image layers. Once the build is
complete, you cannot modify the contents of these layers: they are read-only,
and you can only modify them by initiating a new build.
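You can verify this by inspecting the layers of a built image with the docker history command (the image name here is illustrative):

docker history simple-webapp    # lists each layer of the image along with its size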

When you run a container based off of this image using the docker run command,
Docker creates a container based off of these layers and creates a new writable layer
on top of the image layers. The writable layer is used to store data created by the
container, such as log files written by the applications, any temporary files generated
by the container, or just any file modified by the user on that container.
The life of this layer, though, is only as long as the container is alive.

What if I wish to modify the source code to, say, test a change?

Remember, the same image layers may be shared between multiple containers created
from this image. So does it mean that I cannot modify this file inside the container?
No, I can still modify it, but before the modified file is saved, Docker automatically
creates a copy of the file in the read-write layer, and I will then be modifying a
different version of the file there. All future modifications will be
done on this copy of the file in the read-write layer. This is called the copy-on-write
mechanism.

What happens when we get rid of the container? All of the data that was stored in the
container layer also gets deleted.
The change we made to app.py and the new temp file we created will also get
removed. So what if we wish to persist this data? For example, if we were working
with a database and would like to preserve the data created by the container,
we could add a persistent volume to the container. To do this, first create a volume
using the docker volume create command.
What if you didn't run the docker volume create command to create the volume before
the docker run command?
For example, if I run the docker run command to create a new instance of a MySQL
container with the volume data_volume2, which I have not created yet,
Docker will automatically create a volume named data_volume2 and
mount it to the container.
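Put together, the typical commands look like this (the volume and image names are illustrative):

docker volume create data_volume                    # creates /var/lib/docker/volumes/data_volume
docker run -v data_volume:/var/lib/mysql mysql      # mounts the volume into the container
docker run -v data_volume2:/var/lib/mysql mysql     # data_volume2 is created automatically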

But what if we had our data already at another location? For example, let's say we
have some external storage on the Docker host at /data, and we would
like to store database data there and not in the default /var/lib/docker/volumes
folder. In that case, we would run the container using the docker run -v
command, but provide the complete path to the folder we would like
to mount, that is /data/mysql. Docker will then create the
container and mount that folder into it. This is called bind mounting.
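For example (the host path is illustrative):

docker run -v /data/mysql:/var/lib/mysql mysql      # bind mount: host directory -> container directory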

So, there are two types of mounts: a volume mount and a bind mount.
A volume mount mounts a volume from the volumes directory, while a bind mount mounts
a directory from any location on the Docker host.
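Note that -v is the older style; the newer, more explicit --mount syntax expresses the same bind mount as:

docker run --mount type=bind,source=/data/mysql,target=/var/lib/mysql mysql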

Docker uses storage drivers to enable the layered architecture. Some of the common
storage drivers are AUFS, BTRFS, ZFS, Device Mapper, Overlay, and Overlay2.

Volume Driver plugins in Docker:
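While storage drivers handle images and containers, volumes are handled by volume driver plugins. The default volume driver plugin is local, which creates volumes on the Docker host under /var/lib/docker/volumes. Third-party volume driver plugins (for example RexRay) can provision volumes on external storage providers. As a hypothetical example, assuming the rexray/ebs plugin is installed, you could provision an AWS EBS-backed volume when running a container:

docker run -it --name mysql --volume-driver rexray/ebs --mount src=ebs-vol,target=/var/lib/mysql mysql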


Container Storage Interface (CSI):

In the past, Kubernetes used Docker alone as the container runtime engine, and all the
code to work with Docker was embedded within the Kubernetes source code. With
other container runtimes coming in, such as rkt and CRI-O, it was important to
open up and extend support to work with different container runtimes without being
dependent on the Kubernetes source code. And that's how the Container Runtime
Interface (CRI) came to be. The Container Runtime Interface is a standard that defines how
an orchestration solution like Kubernetes communicates with container runtimes
like Docker. Similarly, the Container Storage Interface (CSI) was developed to support
multiple storage solutions: with CSI, storage vendors can write their own drivers to make
their storage systems work with Kubernetes.
So here's what the CSI looks like. It defines a set of RPCs, or remote
procedure calls, that are called by the container orchestrator and must be
implemented by the storage drivers. For example, CSI says that when a pod is created
and requires a volume, the container orchestrator, in this case Kubernetes, should call
the CreateVolume RPC and pass a set of details such as the volume name. The storage
driver should implement this RPC, handle the request, provision a new
volume on the storage array, and return the results of the operation.
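As a rough sketch, the CSI specification defines these RPCs in a gRPC/protobuf service definition; a simplified excerpt looks like this:

service Controller {
  rpc CreateVolume (CreateVolumeRequest) returns (CreateVolumeResponse) {}
  rpc DeleteVolume (DeleteVolumeRequest) returns (DeleteVolumeResponse) {}
  rpc ControllerPublishVolume (ControllerPublishVolumeRequest) returns (ControllerPublishVolumeResponse) {}
}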

Volumes:
Just as in Docker, the pods created in Kubernetes are transient in nature.
When a pod is created to process data and then deleted, the data processed by it gets
deleted as well.
To persist that data, we attach a volume to the pod. The data generated by the pod is now stored
in the volume, and even after the pod is deleted, the data remains.
Let's look at a simple implementation of volumes. We have a single-node Kubernetes
cluster. We create a simple pod that generates a random number between one and a
hundred and writes it to a file at /opt/number.out.
The pod then gets deleted, and the random number with it.
To retain the number generated by the pod, we create a volume. And a volume needs
storage. When you create a volume, you can choose to configure its storage in
different ways. We will look at the various options in a bit,
but for now we will simply configure it to use a directory on the host.
In this case, I specify a path, /data, on the host.
This way, any files created in the volume will be stored in the directory /data on my
node. Once the volume is created, to access it from a container we mount the volume
to a directory inside the container. We use the volumeMounts field in each container
to mount the data volume to the directory /opt within the container.
The random number will now be written to /opt/number.out
inside the container, which happens to be on the data volume, which is in fact the /data
directory on the host. When the pod gets deleted,
the file with the random number still lives on the host.
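A pod definition along these lines implements the example (the image and command are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: random-number-generator
spec:
  containers:
  - name: alpine
    image: alpine
    command: ["/bin/sh", "-c"]
    args: ["shuf -i 1-100 -n 1 >> /opt/number.out;"]
    volumeMounts:
    - mountPath: /opt          # directory inside the container
      name: data-volume
  volumes:
  - name: data-volume
    hostPath:
      path: /data              # directory on the host
      type: Directory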
Persistent Volumes:

A persistent volume is a cluster-wide pool of storage volumes configured by an
administrator to be used by users deploying applications on the cluster.
The users can now select storage from this pool using persistent volume claims.
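A minimal sketch of a persistent volume definition, using a hostPath-backed volume for demonstration (not recommended in production):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol1
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: /tmp/data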
Persistent volume claims:
We will now create a Persistent Volume Claim to make the storage available to a
node.
Persistent Volumes and Persistent Volume Claims are two separate objects in the
Kubernetes namespace. An administrator creates a set of persistent volumes,
and a user creates persistent volume claims to use the storage. Once the persistent
volume claims are created, Kubernetes binds the persistent volumes to claims based
on the request and the properties set on the volume.
During the binding process, Kubernetes tries to find a persistent volume that has
sufficient capacity as requested by the claim, and that matches any other requested
properties such as access modes, volume modes, and storage class. However, if there are
multiple possible matches for a single claim and you would like to use a
particular volume, you could still use labels and selectors to bind to the right volume.

If there are no volumes available, the persistent volume claim will remain in a
pending state until newer volumes are made available to the cluster.
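A matching claim might look like this (the requested size is illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi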

Delete PVC:
persistentVolumeReclaimPolicy: defines what happens to the persistent volume when the claim is deleted.
Retain (default): the volume is retained until it is manually deleted by the administrator.
Delete: the volume is deleted automatically.
Recycle: the data in the volume is scrubbed before it is made available to
other claims.
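The policy is set on the persistent volume itself, for example:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-vol1
spec:
  persistentVolumeReclaimPolicy: Retain   # or Delete / Recycle
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 1Gi
  hostPath:
    path: /tmp/data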

Using PVCs in Pods

Once you create a PVC, use it in a pod definition file by specifying the PVC claim name under
the persistentVolumeClaim section in the volumes section, like this:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim

The same is true for ReplicaSets or Deployments: add this to the pod template section of the
Deployment or ReplicaSet.

Dynamic Provisioning:
Every time an application requires storage, you have to first manually provision the disk on
Google Cloud and then manually create a persistent volume definition file using the same name
as that of the disk you created.
That's called static provisioning of volumes.

It would've been nice if the volume got provisioned automatically when the application required
it, and that's where storage classes come in. With storage classes, you can define a provisioner,
such as the Google Cloud persistent disk provisioner, that can automatically provision storage on
Google Cloud and attach it to pods when a claim is made.
That's called dynamic provisioning of volumes.
You do that by creating a StorageClass object with apiVersion set to storage.k8s.io/v1,
specifying a name, and setting the provisioner to kubernetes.io/gce-pd.
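For example (the class name is illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: google-storage
provisioner: kubernetes.io/gce-pd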

So going back to our original state, where we have a pod using a PVC for its storage and the
PVC is bound to a PV: we now have a storage class, so we no longer need to define the PV
ourselves, because the PV and any associated storage will be created automatically when a claim
using the storage class is made.
For the PVC to use the storage class we defined, we specify the storage class name in the PVC
definition. That's how the PVC knows which storage class to use.
The next time a PVC is created, the storage class associated with it
uses the defined provisioner to provision a new disk with the required size on GCP,
then creates a persistent volume and binds the PVC to that volume.
So remember that it still creates a PV; it's just that you don't have to create the PV manually
anymore. It's created automatically by the storage class.
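The claim from earlier, updated to reference the storage class by name:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: google-storage
  resources:
    requests:
      storage: 500Mi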
