
A guide of PostgreSQL on Kubernetes

~ In terms of storage ~

【 PGConf.Asia 2018 @track B#1】

2018/12/12
About me
 Takahiro Kobayashi
 Database Architect, Storage Engineer
 Prefers physical storage to software-defined storage

About this session


 For Kubernetes beginners
 Covers active-standby PostgreSQL
 Does not cover replication

2
Agenda

1. Container & Kubernetes

2. What about database on Kubernetes?

3. In terms of storage

4. Implementation: PostgreSQL on Rook

5. Actual service evaluation

6. Conclusion

3
1. Container & Kubernetes

4
A container in a Nutshell
• A container (Docker) is a lightweight, kernel-level VM.
• Needless to say, containers are useful but not enough for a high-availability system.

Container Architecture

 The container architecture provides effective resource management.
 Containers share the Linux kernel but have some libs/files of their own.
 The containers' issue is that they don't include features which run across many nodes.
 You may not build an enterprise system just by using containers.

[Diagram: containers (processes + files) running on a container runtime (Docker) on top of a shared Linux kernel, inside one node]

5
What is Kubernetes?
• Kubernetes is a platform for managing containerized workloads and services.
• “Kubernetes” is too long to type, so it is often shortened to “k8s”.

Kubernetes Cluster

<< The 3 features of Kubernetes >>

 Immutable
   No updates in place. Delete and recreate.
 Auto-healing
   Heals itself automatically.
 Declarative settings
   Declare the To-Be parameters; don't write procedures.

[Diagram: a Kubernetes cluster of several nodes, each running multiple containers]

6
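To make “declarative settings” concrete, here is a minimal, hypothetical manifest (names and image are illustrative, not taken from this talk): you declare the To-Be state, and Kubernetes reconciles the cluster toward it instead of you scripting each step.

# web-deploy.yaml -- hypothetical declarative manifest: “3 replicas of this container should exist”.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3              # the To-Be state, not a procedure
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.15

Applying it with “kubectl apply -f web-deploy.yaml” (and simply re-applying after edits) is all the operator does; Kubernetes deletes and recreates pods until the declared state is true.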
What applications are suitable for Kubernetes?
• Generally speaking, “stateless” applications are suitable for Kubernetes.
• Kubernetes includes horizontal/vertical auto-scaling functions.

Kubernetes Cluster

<< The pros with Kubernetes features >>

 Immutable
   Easy to copy/deploy onto other nodes.
 Auto-healing
   A container can restart on any node.
 Declarative settings
   Once you declare the desired parameters, k8s applies them automatically.

[Diagram: horizontal auto-scaling of stateless containers across nodes]

7
“Stateful” is NOT suitable for Kubernetes?
• It’s said that stateful applications are not a perfect fit for Kubernetes.
• Databases (e.g. PostgreSQL, MySQL) are typically stateful.

Kubernetes Cluster

<< The cons against Kubernetes features >>

 Immutable
   Of course, a database must keep its data.
 Auto-healing
   To maintain consistency, a database cluster may assign roles (master/slave) to containers.
 Declarative settings
   There is something to do at startup/shutdown.

[Diagram: a master and a slave replicating across nodes in the cluster]

8
The growth of the ecosystem around Kubernetes
• The ecosystem (e.g. device drivers) is growing along with Kubernetes.
• For example, NVIDIA announced support for its GPUs on Kubernetes.

Kubernetes on NVIDIA GPUs

 Some vendors have announced that their devices work with Kubernetes.
 As shown on the left, NVIDIA GPUs support containers and Kubernetes.
 In the near future, more vendors/devices will be operated from k8s.
 We should realize that database systems also grow with Kubernetes.

c.f. https://developer.nvidia.com/kubernetes-gpu

9
2. What about database on Kubernetes?

10
A pattern of Database on Kubernetes
• An example of PostgreSQL on Kubernetes is the STOLON project.
• STOLON leverages streaming replication, a built-in feature of PostgreSQL.

 STOLON looks like a shared-nothing database cluster built on k8s.
 Data is duplicated by streaming replication.
 When a master instance goes down, one standby instance is promoted by the sentinel components.
 Proxies are the access points for apps.

c.f. https://github.com/sorintlab/stolon

11
The case of Vitess
• Vitess is a database clustering system for horizontal scaling of MySQL.
• It was developed by YouTube and is hosted as the 16th CNCF project.

 Vitess provides MySQL sharding on k8s.
 VTgate works as a SQL proxy and runs divided queries against the back-end VTtablets.
 Each pair of VTtablet and MySQL manages not the entire data set but one shard.
 VTgate receives records from the VTtablets and merges them.
 Sharded data has built-in redundancy.

[Diagram: apps send SQL to VTgate, which fans out to multiple VTtablet/MySQL pairs]

c.f. https://github.com/vitessio

12
Proposal: A simple architecture for DB on k8s
• The previous patterns of databases on Kubernetes are slightly complex.
• Therefore, we will try to bring a shared-disk database cluster to Kubernetes.

[Diagram: a traditional shared-disk cluster (VIP, two nodes, cluster controllers, shared-disk storage) mapped to a Kubernetes cluster (LB/Service, two nodes, auto-healing for clustering, distributed storage as the shared disk)]

13
Main issue: PostgreSQL on Kubernetes (HA)
• As shown below, there are several differences between a traditional shared-disk cluster and Kubernetes.

Feature          | Traditional shared-disk DB           | PostgreSQL on Kubernetes (HA)
How many DBs     | One. Primary only.                   | One. Primary only.
Shared disk      | SAN/iSCSI/SDS                        | Cloud storage/SAN/iSCSI/SDS
Load balance     | VIP (e.g. keepalived)                | <Kubernetes> Service
Clustering       | Cluster software (e.g. Pacemaker)    | <Kubernetes> StatefulSet
Failover/back    | Move cluster resources               | <kubectl> drain
Planned outage   | Stop the database                    | <kubectl> scale --replicas=0
Backup/Restore   | Stop resources, then backup/restore  | <kubectl> scale --replicas=0, then backup/restore
14
3. In terms of storage

15
A big issue: Persistence of container’s data
• When a container is re-created, its data is usually lost.
• Container storage should make persistence and portability compatible.

Kubernetes Cluster
<< Requirements for container storage >>
 To keep the container’s data, the storage needs its own, independent life-cycle.
 In other words, the storage must not be deleted when a container is terminated.
 Because containers are portable, the storage must follow wherever they are deployed.

16
Kubernetes Volume Plugin by storage vendors
• Some storage vendors have announced that their products work well with Kubernetes.
• NetApp Trident allows a container to access storage management APIs.

 Trident is a dynamic persistent storage orchestrator for containers.
 Trident contains a Kubernetes volume plugin.
 Through that plugin, a container can persist its data outside of the Kubernetes cluster.
 From another perspective, Trident turns a conventional storage array into container-ready storage.

[Diagram: the volume plugin inside the Kubernetes cluster talks to Trident's orchestration and storage-management layers outside the cluster]

c.f. https://github.com/NetApp/trident

17
Container-Ready VS Container-Native
• On the previous slide, we confirmed that Container-Ready storage keeps data outside the cluster.
• Container-Native storage, in contrast, includes all storage components inside the cluster.

[Diagram: Container-Ready style keeps the controllers and storage (control plane and data plane) outside the nodes; Container-Native style runs all of the storage inside the cluster nodes]

18
Introduction: Container-Native Storage
• Several OSS/proprietary SDS products are called Container-Native storage.
• Many of them are based on distributed storage and are deployed into Kubernetes.

Name                                 | Based on  | As block dev | OSS/Proprietary
Rook                                 | Ceph      | ◎            | OSS
OpenEBS                              | ---       | ◎            | OSS
Red Hat OpenShift Container Storage  | GlusterFS | △            | Proprietary
StorageOS                            | ---       | ◎            | Proprietary

19
Picking up: Rook
• We choose Rook from the Container-Native storage options since it is OSS and easy to use.
• Rook is an operator to run multiple kinds of storage (block/file/object) on Kubernetes.

 Ceph is a popular distributed storage system, often used with OpenStack and elsewhere.
 Rook provides multiple storage functions via Ceph.
 Rook deploys and maintains Ceph, which is actually in charge of storing the data.
 Rook is implemented as a Kubernetes “Operator”.

c.f. https://rook.io
     https://ceph.com

20
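As a reference only, here is a minimal sketch of how Rook is usually asked to provide block storage: a Ceph block pool plus a StorageClass that PVCs can point at. The pool name “replicapool” matches the one used in the backup slides later; the exact apiVersion and provisioner strings depend on the Rook release, so treat the fields below as assumptions rather than the project’s canonical manifest.

# Hypothetical sketch, loosely following the Rook examples of the v0.9 era -- check your Rook release.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3                          # keep three copies of every object
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block      # provisioner name varies by Rook version (assumption)
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph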
4. Implementation: Postgresql on Rook

21
Implementation: PostgreSQL on Rook
• As a model of a database on Kubernetes, we place and run PostgreSQL on Rook.

 We call the architecture on the left “PostgreSQL on Rook” from now on.
 It consists of a Service, a StatefulSet with Replicas: 1, and Rook-backed storage.
 Each component is explained in detail in the subsequent slides.

22
Glossaries#1: Node & Pod
• Before showing our evaluation, let's learn some technical terms in Kubernetes.
• The most basic elements in a cluster are the “Node” and the “Pod”.

Kubernetes Cluster
 A "Node" is a VM or a physical machine included in the Kubernetes cluster.
 A "Pod" is the unit deployed and scheduled by Kubernetes.
 One or more containers run in a “Pod”.
 What a pod includes and which node a pod runs on are defined by manifest files.

[Diagram: three nodes; pods on each node wrap one or more containers]

23
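As a concrete (hypothetical) illustration of “defined by manifest files”, a minimal Pod manifest looks like this; the container list and any node constraint are exactly what the manifest declares.

# pod.yaml -- hypothetical minimal Pod with one container, pinned to a node by label.
apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
spec:
  nodeSelector:
    kubernetes.io/hostname: node001   # which node the pod runs on
  containers:
  - name: container1                  # what the pod includes
    image: busybox:1.30
    command: ["sleep", "3600"]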
Glossaries#2: Deployment & StatefulSet
• On this slide, we get to know how pods are scheduled in a cluster.
• Kubernetes has three kinds of workload resources (Deployment, StatefulSet, DaemonSet) to manage pods efficiently.

Kubernetes Cluster
 Both “Deployment” and “StatefulSet” can manage pods across nodes.
 A "Deployment" manages pods in parallel and in random order, while a "StatefulSet" handles pods sequentially and gives them stable, persistent identities.
 We skip the explanation of "DaemonSet" in this session.

[Diagram: a Deployment with Replicas: 2 spreading identical pods across nodes, and a StatefulSet with Replicas: 1 managing a single pod]

24
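In manifest terms the difference is small but important. A hedged sketch (names and image are illustrative): compared with a Deployment, a StatefulSet additionally names a governing Service and creates its pods one at a time with stable, ordered names such as pg-rook-sf-0.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg-rook-sf
spec:
  serviceName: pg-rook-svc       # governing Service; a Deployment has no such field
  replicas: 1                    # pods are created/terminated sequentially, keeping their identity
  selector:
    matchLabels:
      app: pg-rook
  template:
    metadata:
      labels:
        app: pg-rook
    spec:
      containers:
      - name: postgres
        image: postgres:11       # illustrative image
        env:
        - name: POSTGRES_PASSWORD
          value: example         # hypothetical; use a Secret in practice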
Glossaries#3: Service
• “Service” has a special meaning in Kubernetes terminology.
• Simply put, it works as a kind of software load balancer.

Kubernetes Cluster
 First, a “Service” discovers active pods from a StatefulSet/Deployment.
 If an application sends a request to the “Service”, it is distributed to a suitable pod based on defined rules.
 When a pod goes down, the “Service” removes it from the set of endpoints automatically.

[Diagram: a Service in front of a Deployment (Replicas: 2) whose pods run on different nodes]

25
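A hedged sketch of such a Service (names match the StatefulSet sketch above and are illustrative): the selector is how it “discovers active pods”, and requests to the Service port are forwarded only to pods that are Running and Ready.

# service.yaml -- hypothetical Service in front of the PostgreSQL pod(s).
apiVersion: v1
kind: Service
metadata:
  name: pg-rook-svc
spec:
  selector:
    app: pg-rook           # pods carrying this label become the endpoints
  ports:
  - port: 5432             # port exposed by the Service
    targetPort: 5432       # port the container listens on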
Glossaries#4: Persistent Volume
• As we have already seen, data persistence is a big issue when using containers.
• Kubernetes provides the “PV” and “PVC” architecture to store data persistently.

Kubernetes Cluster
 A Persistent Volume (abbreviated PV) is tied to a physical storage unit.
 A Persistent Volume Claim (PVC) requests a volume of a certain size from the cluster.
 Generally, PVs and the physical storage are prepared by administrators (Ops) before an application is developed, and PVCs are written by developers (Dev) after that.

[Diagram: a container in a pod writes through a PVC, which is bound to a PV backed by physical storage]

26
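A minimal PVC sketch with hypothetical names: the developer-side claim only states a size, an access mode and (optionally) a StorageClass; satisfying it with an actual PV is the Ops side, either from pre-provisioned PVs or dynamically via a StorageClass such as the Rook one sketched earlier.

# pvc.yaml -- hypothetical claim for 3Gi of block storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgdata
spec:
  accessModes:
  - ReadWriteOnce                      # one node mounts the volume read/write
  storageClassName: rook-ceph-block    # assumption: the Rook StorageClass sketched earlier
  resources:
    requests:
      storage: 3Gi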
Repeated: PostgreSQL on Rook
• In Kubernetes terms, the architecture is a StatefulSet (Replicas: 1) that mounts a PV provided by Rook, behind a Service.

 The StatefulSet is in charge of keeping PostgreSQL running healthily on an appropriate node.
 Rook/Ceph, as the PV mounted by PostgreSQL, accepts the data and then splits and stores it consistently.
 The Service distributes requests to PostgreSQL in the active pod of the StatefulSet.

27
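Putting the glossary pieces together, this is a hedged sketch of the whole “PostgreSQL on Rook” stack: a Service in front, a StatefulSet with Replicas: 1, and a volumeClaimTemplate that dynamically provisions a Rook/Ceph block volume for PGDATA. All names, the image tag and the StorageClass are assumptions for illustration, not the speaker’s exact manifests.

# pg-rook.yaml -- illustrative sketch of the architecture on this slide.
apiVersion: v1
kind: Service
metadata:
  name: pg-rook-svc
spec:
  selector:
    app: pg-rook
  ports:
  - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg-rook-sf
spec:
  serviceName: pg-rook-svc
  replicas: 1                          # single primary; no replication in this session
  selector:
    matchLabels:
      app: pg-rook
  template:
    metadata:
      labels:
        app: pg-rook
    spec:
      containers:
      - name: postgres
        image: postgres:11
        env:
        - name: POSTGRES_PASSWORD
          value: example               # hypothetical; use a Secret in practice
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: pgdata
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: pgdata
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: rook-ceph-block   # assumption: Rook-provided block StorageClass
      resources:
        requests:
          storage: 3Gi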
5. Actual service evaluation

30
The viewpoint of an evaluation
• We focus on the four points below to evaluate PostgreSQL on Rook.
• Each of them is important if you intend to run it as an HA database cluster.

# | Feature        | How to perform                                     | Explanation
1 | Clustering     | <Kubernetes> StatefulSet                           | How to switch the active pod when a Pod/Node is down.
2 | Failover/back  | <kubectl> drain                                    | How to move a pod to another node.
3 | Planned outage | <kubectl> scale --replicas=0                       | How to stop PostgreSQL when a planned outage is needed.
4 | Backup/Restore | <kubectl> scale --replicas=0, then backup/restore  | How to execute a cold backup within PostgreSQL on Rook.

31
Evaluation #1: Clustering(1) Pod Failure
• This pattern assumes a Pod stops suddenly (e.g. the process goes down).
• The auto-healing of Kubernetes repairs the Pod quickly.

 If a pod is detected as stopped, the StatefulSet tries to restart the pod on the same node.
 The reason the StatefulSet does this is to avoid changing network/storage settings.

[Diagram: the Service and the StatefulSet (Replicas: 1) restarting the pod on the same node]

32
Evaluation #1: Clustering(1) What happened?
• Even if the Pod in the StatefulSet is deleted once or many times, the Pod usually boots up
on the same node.

$ kubectl get pod -o wide
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE
pg-rook-sf-0   1/1     Running   0          1d    10.42.5.30   node001

$ kubectl delete pod pg-rook-sf-0          # The pod in the StatefulSet is deleted.
pod "pg-rook-sf-0" deleted

$ kubectl get pod -o wide                  # The pod starts up on the same node; IP/AGE change because of the restart.
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE
pg-rook-sf-0   1/1     Running   0          3s    10.42.5.31   node001

$ kubectl delete pod pg-rook-sf-0
pod "pg-rook-sf-0" deleted

$ kubectl get pod -o wide                  # The pod is deleted again but usually starts up on the same node.
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE
pg-rook-sf-0   1/1     Running   0          1s    10.42.5.32   node001

33
Evaluation #1: Clustering(2) Node Failure
• In this case, a Node hosting the PostgreSQL Pod stops.
• Of course, we expect auto-healing to do well, but PostgreSQL is NOT recovered.

 When a database cluster detects node-down, it normally moves the instance to a healthy node.
 A StatefulSet does NOT execute such a failover in its default settings.
 This is to avoid split-brain in the cluster.

34
Evaluation #1: Clustering(2) What happened?
• PostgreSQL never fails over when the node hosting the pod goes down.
• Why does Kubernetes NOT move PostgreSQL?

$ kubectl get node
NAME      STATUS     ROLES    AGE   VERSION      # The node is changed to “NotReady”.
node001   NotReady   worker   15d   v1.10.5
node002   Ready      worker   15d   v1.10.5

$ kubectl get pod                                # The pod is changed to “Unknown” status
NAME           READY   STATUS    RESTARTS   AGE  # but doesn’t move to another node.
pg-rook-sf-0   1/1     Unknown   0          15m

$ kubectl get node
NAME      STATUS   ROLES    AGE   VERSION        # The node is recovered and back to “Ready”.
node001   Ready    worker   15d   v1.10.5
node002   Ready    worker   15d   v1.10.5

$ kubectl get pod                                # The pod recovers to “Running” status
NAME           READY   STATUS    RESTARTS   AGE  # on the same node.
pg-rook-sf-0   1/1     Running   0          8s
35
Evaluation #1: How to handle Node Failure
• When a node failure occurs, it can be handled manually if you like.
• Needless to say, manual operation is not the better way.

$ kubectl get pod -o wide                                     # The pod is still in “Unknown” status.
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE
pg-rook-sf-0   1/1     Unknown   0          15m   10.42.6.20   node001

$ kubectl delete pod pg-rook-sf-0 --force --grace-period=0    # “delete --force” is issued manually from a console.
warning: Immediate deletion does not wait for confirmation that
the running resource has been terminated. The resource may continue
to run on the cluster indefinitely.
pod "pg-rook-sf-0" force deleted

$ kubectl get node                                            # The node is still “NotReady”.
NAME      STATUS     ROLES    AGE   VERSION
node001   NotReady   worker   15d   v1.10.5
node002   Ready      worker   15d   v1.10.5

$ kubectl get pod -o wide                                     # The pod moves to the “Ready” node
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE     # and its status becomes “Running”.
pg-rook-sf-0   1/1     Running   0          5s    10.42.6.21   node002
36
Not Recommended: terminationGracePeriodSeconds=0
• For automatic failover, you can specify the parameter below.
  pod.Spec.TerminationGracePeriodSeconds: 0

 We tried this parameter:
   pod.Spec.TerminationGracePeriodSeconds: 0
 It means Kubernetes always sends SIGKILL to the pods in this StatefulSet when stopping them.
 In the case of a database, SIGKILL means skipping all the processing required at shutdown.
 Thus the Kubernetes official documentation says:
   “This practice is unsafe and strongly discouraged.”

37
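For completeness, this is where the parameter sits in the StatefulSet’s pod template; a fragment-only sketch, and, as the slide says, setting it to 0 is unsafe for a database.

# Fragment of the StatefulSet spec (sketch) -- terminationGracePeriodSeconds: 0 makes Kubernetes
# SIGKILL the pod immediately, skipping PostgreSQL's normal shutdown. Unsafe and strongly discouraged.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: postgres
        image: postgres:11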
Evaluation #2: Failover/back
• As mentioned before, a pod in a StatefulSet does not normally move to other nodes.
• What should we do when a manual failover or failback is needed?

 To move pods off a node, the "drain" command is useful as a procedure.
 With the “drain” command, first, the node is marked “SchedulingDisabled”.
 Then the “drain” command evicts all pods from the node.
 Finally, the StatefulSet recovers the pods on the remaining “Ready” nodes.

39
Evaluation #2: Failover/back - How to operate(1)
• We can execute the “kubectl drain” command to perform a failover.
• It is a very simple operation, but the “uncordon” command has to be run afterwards.

$ kubectl get pod -o wide                                # The pod is running on node001.
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE
pg-rook-sf-0   1/1     Running   0          2h    10.42.6.21   node001

$ kubectl get node                                       # Both nodes are healthy.
NAME      STATUS   ROLES    AGE   VERSION
node001   Ready    worker   15d   v1.10.5
node002   Ready    worker   15d   v1.10.5

$ kubectl drain node001 --force --ignore-daemonsets      # Execute “kubectl drain”.
node/node001 cordoned                                    # We can see node001 gets cordoned.
pod/pg-rook-sf-0

$ kubectl get pod -o wide                                # The pod is moved to the node that is not cordoned.
NAME           READY   STATUS    RESTARTS   AGE   IP           NODE
pg-rook-sf-0   1/1     Running   0          7s    10.42.6.22   node002     # Failover is done.

40
Evaluation #2: Failover/back - How to operate(2)
• After the “drain” command, the target node is not assigned any pods.
• To do a failback, the "uncordon" command is needed.

$ kubectl get node                                       # The cordoned node is “SchedulingDisabled”.
NAME      STATUS                     ROLES    AGE   VERSION     # Pods never run on this node.
node001   Ready,SchedulingDisabled   worker   15d   v1.10.5
node002   Ready                      worker   15d   v1.10.5

$ kubectl uncordon node001                               # To let the StatefulSet schedule pods here again,
node/node001 uncordoned                                  # the "uncordon" command must be executed.

$ kubectl get node
NAME      STATUS   ROLES    AGE   VERSION                # node001 becomes just “Ready” again.
node001   Ready    worker   15d   v1.10.5
node002   Ready    worker   15d   v1.10.5

41
Evaluation #3: Planned outage
• What about a planned outage without failover or failback?
• Kubernetes is not good at keeping a pod stopped.

 Kubernetes does not have a command for a temporary stop, like "pause".
 Therefore, if you intend to keep the StatefulSet stopped, a simple trick is needed.
 The trick is to reduce the “Replicas” of the StatefulSet to 0, so that the pod is gone.

[Diagram: the Service and the StatefulSet (Replicas: 1) with its pod removed]

42
Evaluation #3: Planned outage - How to operate
• To change the pod count, we can run the “scale” command.
• The command “scale --replicas=0” stops all pods in the StatefulSet.

$ kubectl scale statefulset pg-rook-sf --replicas=0      # Run “kubectl scale” with the “--replicas=0” option.
statefulset.apps/pg-rook-sf scaled

$ kubectl get sts                                        # The StatefulSet doesn’t have any pods.
NAME         DESIRED   CURRENT   AGE
pg-rook-sf   0         0         1d

$ kubectl get pod
No resources found.

$ kubectl scale statefulset pg-rook-sf --replicas=1      # To recover from this outage, run “scale --replicas=1”.
statefulset.apps/pg-rook-sf scaled

$ kubectl get sts                                        # The StatefulSet runs a pod again.
NAME         DESIRED   CURRENT   AGE
pg-rook-sf   1         1         16h

43
Evaluation #4: Backup/Restore
• PostgreSQL on Rook has an advantage for backups thanks to Ceph.
• Both online and offline backups can be done at the storage layer.

 When taking a backup in PostgreSQL on Rook, we can use the “rbd snap create” command.
 This command creates a storage-level snapshot.
 The snapshot is also what we need for restoring PostgreSQL.
 If a restore is needed, we can execute the “rbd snap rollback” command.

[Diagram: the Service and the StatefulSet (Replicas: 1) on top of a Ceph volume, backed up with “rbd snap create” and restored with “rbd snap rollback”]

44
Evaluation #4: Backup - How to operate
• In PostgreSQL on Rook, a Ceph snapshot is used as the backup mechanism.
• It is necessary to run “pg_start_backup” before creating a snapshot.

$ kubectl exec -it -n rook-ceph rook-ceph-tools-seq -- rbd -p replicapool ls
pvc-bdbc6e53-f6e9-11e8-b0d9-02f062df6b48                 # The target image for the backup.

$ kubectl exec -it pg-rook-sf-0 -- psql -h localhost -U postgres -c "SELECT pg_start_backup(now()::text);"
 pg_start_backup
-----------------                                        # Begin backup mode.
 0/C000028
(1 row)

$ kubectl exec -it -n rook-ceph rook-ceph-tools-seq -- rbd snap create replicapool/img@snap
                                                         # Create the snapshot between start and stop backup.

$ kubectl exec -it pg-rook-sf-0 -- psql -h localhost -U postgres -c "SELECT pg_stop_backup();"
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
 pg_stop_backup
----------------                                         # End backup mode.
 0/D000050
(1 row)

45
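The three steps above are easy to wrap in a small script. The sketch below reuses the pod, namespace, toolbox-pod, pool and image names shown on this slide; the snapshot naming scheme is hypothetical.

#!/bin/sh
# Sketch: put PostgreSQL into backup mode, snapshot the Ceph image, then end backup mode.
set -eu

PG_POD=pg-rook-sf-0
TOOLS_POD=rook-ceph-tools-seq              # Rook toolbox pod from this slide
POOL=replicapool
IMAGE=img                                  # RBD image backing the PV (see "rbd ls" above)
SNAP="pgdata_snap_$(date +%Y%m%d%H%M%S)"   # hypothetical snapshot naming scheme

kubectl exec "$PG_POD" -- psql -h localhost -U postgres \
  -c "SELECT pg_start_backup(now()::text);"

kubectl exec -n rook-ceph "$TOOLS_POD" -- rbd snap create "${POOL}/${IMAGE}@${SNAP}"

kubectl exec "$PG_POD" -- psql -h localhost -U postgres \
  -c "SELECT pg_stop_backup();"

echo "Created snapshot ${POOL}/${IMAGE}@${SNAP}"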
Evaluation #4: Restore - How to operate
• Thanks to Ceph snapshots, we can restore PostgreSQL on Rook.
• The command to stop PostgreSQL is the same as for the planned outage mentioned before.

$ kubectl scale sts pg-rook-sf --replicas=0              # Same as the planned outage.
statefulset.apps/pg-rook-sf scaled

$ kubectl exec -it -n rook-ceph rook-ceph-tools-seq -- rbd snap ls replicapool/img
SNAPID NAME            SIZE   TIMESTAMP                  # Confirm our snapshots.
     8 pgdata_snap001  3 GiB  Mon Dec  3 11:43:52 2018

$ kubectl exec -it -n rook-ceph rook-ceph-tools-seq -- rbd snap rollback replicapool/img@pgdata_snap001
Rolling back to snapshot: 100% complete...done.          # Roll the image back to the old snapshot.

$ kubectl scale sts pg-rook-sf --replicas=1              # Start the pod again with “kubectl scale”.
statefulset.apps/pg-rook-sf scaled

$ kubectl get pod
NAME           READY   STATUS    RESTARTS   AGE
pg-rook-sf-0   1/1     Running   0          5s

46
6. Conclusion

47
Repeated: Agenda

1. Container & Kubernetes

2. What about database on Kubernetes?

3. In terms of storage

4. Implementation: PostgreSQL on Rook

5. Actual service evaluation

6. Conclusion

48
What is the position of Kubernetes now?

49
Questions?

@tzkb
tzkoba/postgresql-on-k8s
Advent Calendar 2018
( PostgreSQL on Kubernetes )

50
