
Kubernetes 101, part V, statefulsets

Published 01 Apr 2023
kubernetes, docker

In the previous article, we explored ways to handle pod updates without affecting availability, using Deployments.

This article covers stateful applications in Kubernetes and how StatefulSets fit into that scenario. You’ll also see how volumes work in Kubernetes and how they relate to Pods, and therefore to Deployments and StatefulSets.

Let’s start the journey.


When working with containers, more precisely Pods, it’s known that their data is ephemeral, which means all data written to the Pod will live during the Pod lifetime only.

Once the Pod is terminated, all its data is lost.

That’s the essence of stateless applications.

🔵 Stateless applications are the default

By default, all applications in Kubernetes are stateless, meaning that data within the Pod is ephemeral and will be permanently lost when the Pod is replaced, for example during a rollout update.

[Figure: pod filesystem]

For instance, suppose we have a PostgreSQL Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres

Once it’s running, we can create a table called users in the database:

$ kubectl exec pg-79d96fb7b7-zg9kl -- \
psql -U postgres -c \
"CREATE TABLE users (id SERIAL, name VARCHAR);"

And run the query afterwards:

$ kubectl exec pg-79d96fb7b7-zg9kl -- \
psql -U postgres -c "SELECT * FROM users;"
id | name
----+------
(0 rows)

👉 Rolling out the application

It’s not rare that we have to update the application Pod, whether to fix a bug, upgrade the database version or do some maintenance.

$ kubectl rollout restart deploy/pg
deployment.apps/pg restarted

Notice that the Pod name has changed: it belongs to a Deployment, and Deployments give their Pods no ordering or stable identity.

Let’s perform the query on this new Pod:

$ kubectl exec pg-8486b4f555-5dqz8 -- \
psql -U postgres -c "SELECT * FROM users;"
ERROR: relation "users" does not exist
LINE 1: SELECT * FROM users;

Uh-oh… the table has gone away. Pods are stateless, remember?


🔵 Stateful applications

If we want to build a stateful application in Kubernetes, we have to share a common persistent structure that can be mounted across different Pods of the same ReplicaSet.

Enter Persistent Volumes.

👉 VolumeMounts and Volumes

In order to use persistent volumes, we have to mount a volume in the Pod container spec:

kind: Deployment
... # more
spec:
  template:
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data

Here, the volume is described as pgdata, which will be mounted to the path /var/lib/postgresql/data in the container. This path is exactly where the PostgreSQL data is located.

However, the volume pgdata can’t come from nowhere. We need to request a persistent volume in the underlying infrastructure storage.

By infrastructure, we mean our host machine in development, a server in the production environment, or a storage product offered by the underlying cloud provider, if that’s the case.

In the template.spec section, we add the volumes section:

...
spec:
  template:
    spec:
      containers:
        ...
      volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: my-pvc

👉 PersistentVolumeClaim

A Persistent Volume Claim, or PVC, is a request by the user for a piece of storage. The example above assumes that we have a PVC called my-pvc, so let’s create it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  storageClassName: my-sc
  volumeName: my-pv
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

The PVC declares some attributes:

  • storageClassName: a class of storage defined by the administrator of the cluster. A storage class holds traits about policies and other storage services of the cluster. We’ll create it soon.
  • volumeName: the persistent volume to bind to, which is a piece of storage that can be statically or dynamically provisioned in the cluster
  • accessModes, resources among others…
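
Of these attributes, volumeName is optional: it pins the claim to one specific PersistentVolume. As a hedged sketch, the same claim without volumeName would let the control plane bind it to any available PV that matches the storage class, access mode and requested size. The name my-pvc-unpinned is hypothetical and is not used elsewhere in this article.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc-unpinned   # hypothetical name, for illustration only
spec:
  storageClassName: my-sc # same class as the pinned claim
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi        # a matching PV needs at least this capacity
```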

👉 StorageClass

First, we have to create the storageClass:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: my-sc
provisioner: kubernetes.io/no-provisioner
parameters:
  type: local

Provisioner determines the plugin used to control the storage provisioning in the cluster.

In development, we can use kubernetes.io/no-provisioner, which does not provision storage dynamically, so we have to declare the persistent volume manually.
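
For reference, the local-volume example in the Kubernetes documentation pairs this provisioner with volumeBindingMode: WaitForFirstConsumer, which delays the binding of a claim until a Pod that uses it is scheduled. Below is a variation of our storage class under that assumption. The name my-sc-wait is hypothetical, and the rest of this article does not depend on it.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-sc-wait                        # hypothetical name
provisioner: kubernetes.io/no-provisioner # no dynamic provisioning
volumeBindingMode: WaitForFirstConsumer   # bind only once a consuming Pod is scheduled
```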

👉 PersistentVolume

The Persistent Volume is a piece of storage in the underlying infrastructure.

By defining capacity, storageClass, accessMode and hostPath, we can declare such a piece ready to be used by a PVC in a Pod.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
spec:
  storageClassName: my-sc
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/volumes/my-pv

Once we have applied the SC, PV and PVC, we can apply the Deployment using the PVC my-pvc:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
          volumeMounts:
            - name: pgdata
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: pgdata
          persistentVolumeClaim:
            claimName: my-pvc

And boom…

$ kubectl get pods
NAME                  READY   STATUS    RESTARTS   AGE
pg-7744b4d548-nxf8v   1/1     Running   0          3s

Now, time to check if the volumes are working properly across rollout updates:

### CREATE TABLE
$ kubectl exec pg-7744b4d548-nxf8v -- psql -U postgres -c "CREATE TABLE users (id SERIAL, name VARCHAR);"
### QUERY
$ kubectl exec pg-7744b4d548-nxf8v -- psql -U postgres -c "SELECT * FROM users"
### ROLLOUT
$ kubectl rollout restart deploy/pg

And then, performing the query against the new Pod:

$ kubectl exec pg-558d58c54-n9zb2 -- psql -U postgres -c "SELECT * FROM users"
id | name
----+------
(0 rows)

Yay! We just created a stateful application using Deployment and Persistent Volumes!
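
One caveat worth knowing: with the default RollingUpdate strategy and a single replica, the Deployment starts the new Pod before terminating the old one, so for a short time two PostgreSQL Pods may point at the same data directory. A minimal sketch, assuming a brief downtime during rollouts is acceptable, is to switch the strategy to Recreate, which terminates the old Pod first. Only the strategy block is new here; the fragment leaves out the Pod template, which stays as in the Deployment above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pg
spec:
  replicas: 1
  strategy:
    type: Recreate   # terminate the old Pod before creating the new one
  selector:
    matchLabels:
      app: pg
  # template: unchanged, same Pod template as in the Deployment above
```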


🔵 Scaling up stateful applications

At this moment, our Deployment has 1 replica only, but if we want to achieve high availability, we have to configure our deployment to support more replicas.

Let’s scale up to 3 replicas as we learned in the previous article. It’s as easy as:

$ kubectl scale deploy/pg --replicas=3
deployment.apps/pg scaled
$ kubectl get pods
NAME                 READY   STATUS    RESTARTS     AGE
pg-9668885c9-rt9fd   1/1     Running   0            64s
pg-9668885c9-dqwcc   1/1     Running   0            63s
pg-9668885c9-kt7dg   1/1     Running   1 (5s ago)   66s

Great, uh?

After several rollout updates, we may end up with the following state:

$ kubectl get pods
NAME                  READY   STATUS             RESTARTS      AGE
pg-55488bc8b6-wvr66   0/1     CrashLoopBackOff   1 (5s ago)    9s
pg-55488bc8b6-x4hh2   0/1     CrashLoopBackOff   1 (5s ago)    9s
pg-55488bc8b6-hvsdb   0/1     Error              2 (16s ago)   19s

💥 Oh my…the application has gone away.💥

There’s no healthy Pod left. The entire Deployment is broken. What happened here?

👉 Deployment replicas share the same PVC

All Pod replicas in the Deployment share the same PVC. When two or more Pods write to the same location concurrently, the result can be data loss or corruption. Note that ReadWriteOnce restricts the volume to a single node, not to a single Pod, so replicas scheduled on the same node can all mount it.

After several rollouts, it’s not rare for our Deployment to end up in a broken state like the one above.

Moreover:

  • Deployments don’t guarantee ordering during updates, which can lead to data inconsistency
  • Deployments don’t provide any kind of identity, like a stable hostname or IP address for the Pods, which can cause reference issues

Hence, although it’s possible, Deployments are not a good fit for stateful applications.

Thankfully, Kubernetes addresses those problems by providing another workload object called StatefulSet.


🔵 StatefulSet

The StatefulSet object brings a StatefulSet Controller that acts like the Deployment Controller, but with some differences:

  • Pods in a StatefulSet have a stable identity, addressing reference issues
  • StatefulSets guarantee ordering of updates, thus avoiding data inconsistency
  • Pod replicas in a StatefulSet do not share the same PVC. Each replica has its own PVC
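
The stable identity is usually paired with a headless Service: the StatefulSet spec has a serviceName field that points to it, and each Pod then gets a DNS name of the form <pod-name>.<service-name>.<namespace>.svc.cluster.local (assuming the default cluster domain). The manifests in this article do not define such a Service, so the following is only a sketch, assuming a Service named pg and PostgreSQL listening on its standard port 5432. To use it, the StatefulSet would also need serviceName: pg under its spec.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: pg             # assumed name; referenced by spec.serviceName in the StatefulSet
spec:
  clusterIP: None      # headless: no virtual IP, DNS resolves to the individual Pods
  selector:
    app: pg            # same label used by the Pod template
  ports:
    - name: postgres
      port: 5432       # PostgreSQL's standard port
```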

We’ll follow the same process as for the Deployment, but referencing kind: StatefulSet instead:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgresql
          image: postgres:14
          env:
            - name: POSTGRES_USER
              value: postgres
            - name: POSTGRES_PASSWORD
              value: postgres
          volumeMounts:
            - name: pvc
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: pvc
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "local-path"
        resources:
          requests:
            storage: 1Gi

Note that the containers.volumeMounts section keeps the same shape; its name must match the claim template declared in volumeClaimTemplates (pvc here).

But the persistent volume will be created dynamically using the attribute volumeClaimTemplates, where we just have to define the storageClassName and storage request.

Wait…why are we using local-path in the storageClassName?

👉 Dynamic Provisioning

In order to create persistent volumes dynamically, we can’t use the storageClass we created previously, because its provisioner, kubernetes.io/no-provisioner, does not support dynamic provisioning.

Instead, we can use another storageClass. Chances are that you already have a default storageClass in your cluster.

In my example, I created the k8s cluster using colima, so it already comes with a default storage class that allows dynamic provisioning.

Go check your cluster and choose the default storageClass created by it.

$ kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  37d

That’s why we used local-path: it’s the name of the default storageClass in this cluster, and it allows dynamic provisioning.


After applying the StatefulSet, we can check that we have 3 replicas running. This time, the Pod names follow an ordinal index:

$ kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pg-0   1/1     Running   0          62s
pg-1   1/1     Running   0          32s
pg-2   1/1     Running   0          25s
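
The ages above hint at the ordering: by default, a StatefulSet creates its Pods one at a time, from ordinal 0 upwards, and rolling updates proceed in reverse ordinal order. Both behaviours map to fields of the StatefulSet spec. The fragment below only spells out the defaults; it is a sketch of fields that live under spec, not a full manifest.

```yaml
spec:
  podManagementPolicy: OrderedReady   # default; Parallel creates and deletes Pods without waiting
  updateStrategy:
    type: RollingUpdate               # default
    rollingUpdate:
      partition: 0                    # default; only Pods with ordinal >= partition are updated
```

Raising partition is one way to roll a change out to the highest ordinals first and hold the others back.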

Also, confirm that we have 3 different PVCs:

$ kubectl get pvc
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pvc-pg-0   Bound    pvc-f69c6af0-fc14-4e19-9c98-68ddbd69cbb5   1Gi        RWO            local-path     86s
pvc-pg-1   Bound    pvc-ebee5b7f-2568-4c30-8e89-f34099036d0d   1Gi        RWO            local-path     56s
pvc-pg-2   Bound    pvc-08039a2b-d6a8-4777-a9d3-72c7b8860eea   1Gi        RWO            local-path     49s

And lastly, that we dynamically provisioned 3 persistent volumes, one for each replica:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM              STORAGECLASS   REASON   AGE
pvc-f69c6af0-fc14-4e19-9c98-68ddbd69cbb5   1Gi        RWO            Delete           Bound    default/pvc-pg-0   local-path              116s
pvc-ebee5b7f-2568-4c30-8e89-f34099036d0d   1Gi        RWO            Delete           Bound    default/pvc-pg-1   local-path              87s
pvc-08039a2b-d6a8-4777-a9d3-72c7b8860eea   1Gi        RWO            Delete           Bound    default/pvc-pg-2   local-path              81s
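
Note the Delete reclaim policy, which these volumes inherit from the storage class: when one of the PVCs is deleted, its volume and data are removed too. If that is not what you want, one option is a storage class with reclaimPolicy: Retain. The sketch below assumes the same rancher.io/local-path provisioner and binding mode shown by kubectl get sc; the name local-path-retain is hypothetical.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path-retain               # hypothetical name
provisioner: rancher.io/local-path      # taken from the kubectl get sc output
reclaimPolicy: Retain                   # keep the PV and its data after the PVC is deleted
volumeBindingMode: WaitForFirstConsumer # same binding mode as the default class
```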

Such a big Yay! 🚀

Now we can scale up, scale down or perform rollout updates as many times as we want: the scaling issues we hit with the shared PVC are gone!
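
Keep in mind that, by default, scaling down or deleting a StatefulSet does not delete the PVCs created from volumeClaimTemplates, so the data survives and is reattached when the same ordinal comes back. Newer Kubernetes versions expose this behaviour as persistentVolumeClaimRetentionPolicy in the StatefulSet spec. Whether your cluster supports the field depends on its version, so treat the fragment below as a sketch of the defaults rather than something to copy blindly.

```yaml
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # keep the PVCs when the StatefulSet is deleted (default)
    whenScaled: Retain    # keep the PVCs when replicas are scaled down (default)
```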


🚀 Wrapping Up

Today we learned how to build stateful applications in Kubernetes using persistent volumes and how Deployments can lead to issues while scaling stateful applications.

We’ve seen how StatefulSets are the best solution for this problem, by keeping identity and ordering during updates, avoiding data inconsistency.

Stay tuned, as the upcoming posts will continue to cover more workload resources in Kubernetes, such as DaemonSets, Jobs and CronJobs.

Cheers!
