Kubernetes Storage 101
Containers – often Docker, but there are others on the market – contain everything
an application needs to run, and can be created, spun up, cloned, scaled and
retired very rapidly.
For this reason, containers are well-suited to workloads that see massive
spikes in demand, especially on the web, and particularly where
Kubernetes’s automation lets them be scaled up and down quickly.
Containers are inherently stateless, and we’ll look briefly at how that works
first, although the bulk of this article is concerned with persistent
storage in Kubernetes, which has become the default container orchestration
platform.
Kubernetes also supports persistent storage in a wide range of on-premises
and cloud formats, including file, block and object storage and the numerous storage
classes offered by the cloud providers. Storage can also take the form of data services,
such as databases, which ultimately rely on physical storage somewhere too.
Storage can be referenced directly in a pod’s definition, but this is not recommended
because it breaks the principle of container/pod portability. Instead, persistent
volumes (PVs) and persistent volume claims (PVCs) are used to define, respectively,
the storage on offer and the application’s storage requirements.
PVs and PVCs separate how storage is implemented from how it is consumed, allowing
block/file/object storage to be used by a pod in a portable way. They also
decouple the needs of the user or application from the storage configuration.
A PV is where admins define a persistent storage volume along with its performance
and capacity parameters. It holds details such as the performance/cost class, capacity,
the volume plugin used, paths, IP addresses, credentials (usually by reference to a
Kubernetes Secret rather than as plain usernames and passwords) and what to do with the
volume after use (the reclaim policy). PVs are not portable across Kubernetes clusters.
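To make those PV fields concrete, here is a minimal sketch that builds a PersistentVolume manifest as a Python dictionary and prints it as JSON, which kubectl accepts just as it does YAML. It assumes an NFS-backed volume; the volume name, server address, export path and class name are all hypothetical placeholders, and credentials are omitted because this example doesn’t use any.

```python
import json


def nfs_persistent_volume(name, capacity, server, path, storage_class):
    """Build a PersistentVolume manifest backed by an NFS export (illustrative)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            # How much storage the admin is offering
            "capacity": {"storage": capacity},
            "accessModes": ["ReadWriteMany"],
            # What to do with the volume once its claim is deleted
            "persistentVolumeReclaimPolicy": "Retain",
            # Performance/cost tier that claims can ask for
            "storageClassName": storage_class,
            # Volume plugin (nfs) plus its location details
            "nfs": {"server": server, "path": path},
        },
    }


# Hypothetical values: substitute your own server, export and class.
pv = nfs_persistent_volume("pv-nfs-01", "100Gi", "10.0.0.5", "/exports/data", "standard")
print(json.dumps(pv, indent=2))
```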
Meanwhile, a PVC describes the storage a user or DevOps team wants for their
application. PVCs are portable and travel with the application. Kubernetes works
out what storage is available from the defined PVs and binds the PVC to a suitable one.
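The binding step can be pictured with a simplified model: an unbound PV matches a claim if it has the same storage class, supports every requested access mode and is at least as large as the request, and this sketch then picks the smallest such PV. It illustrates the idea only; the real Kubernetes controller also weighs selectors, volume modes and node affinity. The quantity parser is an assumption that handles just Mi, Gi and Ti suffixes, and all volume names are hypothetical.

```python
from dataclasses import dataclass

# Binary suffixes only; real Kubernetes quantities accept many more forms.
_UNITS = {"Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '10Gi' to bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


@dataclass
class PV:
    name: str
    capacity: str
    access_modes: set
    storage_class: str
    bound: bool = False


@dataclass
class PVC:
    name: str
    request: str
    access_modes: set
    storage_class: str


def find_match(claim, volumes):
    """Return the smallest unbound PV that satisfies the claim, or None."""
    candidates = [
        pv for pv in volumes
        if not pv.bound
        and pv.storage_class == claim.storage_class
        and claim.access_modes <= pv.access_modes  # PV supports every requested mode
        and to_bytes(pv.capacity) >= to_bytes(claim.request)
    ]
    return min(candidates, key=lambda pv: to_bytes(pv.capacity), default=None)


volumes = [
    PV("pv-small", "5Gi", {"ReadWriteOnce"}, "fast"),
    PV("pv-large", "500Gi", {"ReadWriteOnce"}, "fast"),
    PV("pv-medium", "50Gi", {"ReadWriteOnce", "ReadOnlyMany"}, "fast"),
    PV("pv-slow", "20Gi", {"ReadWriteOnce"}, "slow"),
]
claim = PVC("data-claim", "10Gi", {"ReadWriteOnce"}, "fast")
match = find_match(claim, volumes)
print(match.name if match else "no match")  # pv-medium
```

Choosing the smallest adequate volume avoids handing a 500Gi PV to a 10Gi request when a closer fit exists.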
PVCs are defined in YAML alongside the application, and the pod refers to the claim
by name, so the claim travels with it. They can be pretty simple, specifying just
capacity and tier of storage, for example.
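A claim of that kind might look like the sketch below. In Kubernetes the PVC is its own object, and the pod refers to it by name under its volumes list. The claim name “app-data”, the “fast” class and the nginx image are illustrative assumptions; the output is a JSON List that kubectl apply can read.

```python
import json

# The claim: just capacity, an access mode and a storage tier.
claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "storageClassName": "fast",
        "resources": {"requests": {"storage": "10Gi"}},
    },
}

# The pod names the claim; it knows nothing about the PV behind it.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "web"},
    "spec": {
        "containers": [{
            "name": "web",
            "image": "nginx:1.25",
            "volumeMounts": [{"name": "data", "mountPath": "/usr/share/nginx/html"}],
        }],
        "volumes": [{
            "name": "data",
            "persistentVolumeClaim": {"claimName": claim["metadata"]["name"]},
        }],
    },
}

print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [claim, pod]}, indent=2))
```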
Kubernetes can run multiple cloned pods as a Deployment, and these can share a single
PVC, but this can lead to problems, such as pods that can’t mount the volume or data
corrupted by concurrent writes. An alternative is the StatefulSet, which gives each pod
its own PVC, created from a template.
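Here is a sketch of the StatefulSet approach: the volumeClaimTemplates field describes a claim once, and Kubernetes creates one PVC per replica, named template-statefulset-ordinal. The helper expected_claim_names is hypothetical and only reproduces that naming scheme. The manifest assumes a headless Service called “web” exists (not shown) and uses nginx purely as a stand-in for a stateful application.

```python
import json


def stateful_set(name, replicas, storage):
    """Build a StatefulSet whose replicas each get their own PVC (illustrative)."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # assumed headless Service, not shown
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{
                    "name": name,
                    "image": "nginx:1.25",
                    "volumeMounts": [{"name": "data", "mountPath": "/usr/share/nginx/html"}],
                }]},
            },
            # One PVC is created from this template for every replica.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def expected_claim_names(sts):
    """PVC names Kubernetes derives: <template>-<statefulset>-<ordinal>."""
    name = sts["metadata"]["name"]
    templates = [t["metadata"]["name"] for t in sts["spec"]["volumeClaimTemplates"]]
    return [f"{t}-{name}-{i}" for t in templates for i in range(sts["spec"]["replicas"])]


web = stateful_set("web", 3, "1Gi")
print(json.dumps(web, indent=2))
print(expected_claim_names(web))  # ['data-web-0', 'data-web-1', 'data-web-2']
```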
A storage class specifies the volume plugin used, the external – eg, cloud – provider and
the name of the CSI driver. CSI drivers, which implement the Container Storage Interface
standard, allow containers to interact with cloud and storage suppliers’ products.
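As an illustration, a StorageClass for the AWS EBS CSI driver might look like the sketch below. The class name “fast” is an assumption, and the parameters map is specific to each driver: “type”: “gp3” means something to the EBS driver but not to others.

```python
import json

storage_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "fast"},
    # The CSI driver that provisions volumes of this class
    "provisioner": "ebs.csi.aws.com",
    # Driver-specific settings; here, the EBS volume type
    "parameters": {"type": "gp3"},
    # Delete the backing volume when its claim goes away
    "reclaimPolicy": "Delete",
    # Provision only once a pod using the claim has been scheduled
    "volumeBindingMode": "WaitForFirstConsumer",
    "allowVolumeExpansion": True,
}
print(json.dumps(storage_class, indent=2))
```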
It’s good practice to mark one storage class as the default, so that it doesn’t have to
be named in every PVC and is used automatically when a user doesn’t specify a
storage class in a PVC.
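Marking a class as the default is done with the storageclass.kubernetes.io/is-default-class annotation. The sketch below applies it and then approximates which class a claim ends up with. It is a simplified model of cluster behaviour that assumes exactly one default class; mark_default and resolve_class are hypothetical helpers, not Kubernetes APIs.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def mark_default(storage_class):
    """Add the annotation that makes a StorageClass the cluster default."""
    annotations = storage_class.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[DEFAULT_ANNOTATION] = "true"
    return storage_class


def resolve_class(pvc, classes):
    """Roughly which StorageClass a claim ends up with (simplified model)."""
    requested = pvc.get("spec", {}).get("storageClassName")
    if requested is not None:
        # An explicit name wins; "" opts out of classes and dynamic provisioning.
        return requested or None
    # No class given: fall back to the class annotated as default, if any.
    for sc in classes:
        if sc.get("metadata", {}).get("annotations", {}).get(DEFAULT_ANNOTATION) == "true":
            return sc["metadata"]["name"]
    return None


classes = [
    mark_default({"kind": "StorageClass", "metadata": {"name": "standard"}}),
    {"kind": "StorageClass", "metadata": {"name": "fast"}},
]
print(resolve_class({"spec": {}}, classes))                            # standard
print(resolve_class({"spec": {"storageClassName": "fast"}}, classes))  # fast
print(resolve_class({"spec": {"storageClassName": ""}}, classes))      # None
```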
A storage class can also be created for legacy data that containerised applications
may need to access.
That’s one use for hostPath, which exposes a directory on the host machine to the pod.
Obviously, that isn’t portable, because the path won’t be accessible if the pod moves to
another node, and it’s not something that most pod deployments will want.
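For illustration, a pod that mounts a host directory read-only via hostPath might be declared as follows. The directory /srv/legacy-data, the busybox image and the pod name are hypothetical; “type”: “Directory” asks the kubelet to check that the directory already exists on the node before mounting it.

```python
import json

legacy_reader = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "legacy-reader"},
    "spec": {
        "containers": [{
            "name": "reader",
            "image": "busybox:1.36",
            "command": ["sh", "-c", "ls /legacy && sleep 3600"],
            # Mount read-only so the container can't alter the legacy data
            "volumeMounts": [{"name": "legacy", "mountPath": "/legacy", "readOnly": True}],
        }],
        "volumes": [{
            "name": "legacy",
            # Tied to whichever node the pod lands on, hence not portable
            "hostPath": {"path": "/srv/legacy-data", "type": "Directory"},
        }],
    },
}
print(json.dumps(legacy_reader, indent=2))
```

In practice such a pod would usually also be pinned to the node holding the data, for example with a nodeSelector, which underlines why hostPath suits only special cases.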
Local persistent volumes can also be created from storage attached to a node, such as a
disk, partition or directory. This can be used, for example, to build a distributed
storage system on top of Kubernetes, effectively creating a virtualised/containerised
storage pool, which is broadly the approach taken by Rook.
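A local PV differs from network-backed volumes in one key respect: it must carry node affinity, because the disk exists on only one machine. The sketch below declares one local PV per node plus a StorageClass that uses the kubernetes.io/no-provisioner placeholder and delays binding until a pod is scheduled. Node names, the /mnt/disks/ssd1 path and the capacity are assumptions; a storage layer built on top, of the kind described above, would then claim these volumes.

```python
import json


def local_pv(name, node, path, capacity):
    """Build a local PersistentVolume pinned to one node (illustrative)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": capacity},
            "volumeMode": "Filesystem",
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "local-storage",
            "local": {"path": path},
            # Required for local volumes: the node where the disk lives
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{
                "matchExpressions": [{
                    "key": "kubernetes.io/hostname",
                    "operator": "In",
                    "values": [node],
                }],
            }]}},
        },
    }


local_class = {
    "apiVersion": "storage.k8s.io/v1",
    "kind": "StorageClass",
    "metadata": {"name": "local-storage"},
    # No dynamic provisioning: the PVs are created up front
    "provisioner": "kubernetes.io/no-provisioner",
    # Bind only after scheduling, so the claim lands on a PV on the right node
    "volumeBindingMode": "WaitForFirstConsumer",
}

pvs = [local_pv(f"local-{n}", n, "/mnt/disks/ssd1", "375Gi") for n in ("node-1", "node-2", "node-3")]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [local_class, *pvs]}, indent=2))
```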