Storage in StatefulSets
A Deployment that mounts a PVC gives the same volume to all its replicas. For a database, this is catastrophic: two replicas writing to the same disk can corrupt the data. StatefulSets use volumeClaimTemplates instead: a template from which a unique, dedicated PVC is created for each Pod.
web-0 gets PVC data-web-0. web-1 gets PVC data-web-1. web-2 gets PVC data-web-2. When a Pod is deleted and recreated, it is reattached to its original PVC. The data persists across restarts.
Adding volumeClaimTemplates
    nano statefulset-storage.yaml

    apiVersion: v1
    kind: Service
    metadata:
      name: db-headless
    spec:
      clusterIP: None
      selector:
        app: db
      ports:
        - port: 5432
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: db
    spec:
      serviceName: db-headless
      replicas: 2
      selector:
        matchLabels:
          app: db
      template:
        metadata:
          labels:
            app: db
        spec:
          containers:
            - name: database
              image: busybox:1.36
              command:
                - sh
                - -c
                - |
                  echo "init data for $HOSTNAME" > /data/init.txt
                  sleep 3600
              volumeMounts:
                - name: data
                  mountPath: /data
      volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            accessModes: ['ReadWriteOnce']
            resources:
              requests:
                storage: 1Gi

    kubectl apply -f statefulset-storage.yaml

What volumeClaimTemplates creates
    kubectl get pvc

You will see two PVCs:

- data-db-0: created for db-0, bound to a PV
- data-db-1: created for db-1, bound to a PV
The naming convention is <template-name>-<pod-name>.
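The naming convention can be sketched as a small shell helper. This is an illustration only: expected_pvcs is a hypothetical function name, and it assumes the default behavior where Pod ordinals start at 0.

```shell
# Print the PVC names a StatefulSet is expected to own.
# Usage: expected_pvcs <template-name> <statefulset-name> <replicas>
# Assumption: ordinals start at 0 (the default).
expected_pvcs() {
  template=$1
  sts=$2
  replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    # The Pod name is <statefulset-name>-<ordinal>;
    # the PVC name puts the template name in front of it.
    echo "${template}-${sts}-${i}"
    i=$((i + 1))
  done
}

# For the manifest in this lesson: prints data-db-0 and data-db-1.
expected_pvcs data db 2
```

Comparing this list with the output of kubectl get pvc is a quick way to spot missing or leftover claims.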
    kubectl get pv

Each PVC is backed by a Persistent Volume. The StatefulSet controller did not create the PVs directly: it created the PVCs, and the PVC binding mechanism (StorageClass or pre-provisioned PVs) created or assigned the PVs.
    kubectl exec db-0 -- cat /data/init.txt
    kubectl exec db-1 -- cat /data/init.txt

Each Pod wrote its own hostname to its own /data/init.txt file. db-0 writes to data-db-0, db-1 writes to data-db-1. They never share a volume.
You delete db-0. The StatefulSet controller recreates it. Does the new db-0 get a new empty volume or the original data-db-0 PVC?
Answer:
The original data-db-0 PVC. When a StatefulSet Pod is deleted and recreated, the controller looks for an existing PVC with the expected name (data-db-0) before creating a new one. If it exists, the new Pod is bound to it. The data is preserved across Pod restarts.
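One way to observe this on a test cluster is sketched below. check_reattach and the marker file /data/marker.txt are hypothetical names chosen for this example. A separate marker is used because the container's startup command rewrites /data/init.txt on every start, so that file alone would not prove the volume was reused. The KUBECTL variable is an assumption added so the commands can be previewed without a cluster.

```shell
# KUBECTL can be overridden (for example with "echo kubectl") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Usage: check_reattach <statefulset-name> <pod-name>
check_reattach() {
  sts=$1
  pod=$2
  # Write a marker that the container's startup command does not touch.
  $KUBECTL exec "$pod" -- sh -c 'echo survived > /data/marker.txt' || return 1
  # Delete the Pod; the StatefulSet controller recreates it with the same name.
  $KUBECTL delete pod "$pod" || return 1
  # Wait until the StatefulSet reports its replicas ready again.
  $KUBECTL rollout status "statefulset/$sts" --timeout=120s || return 1
  # If the original PVC was reattached, the marker is still there.
  $KUBECTL exec "$pod" -- cat /data/marker.txt
}
```

Calling check_reattach db db-0 against the StatefulSet from this lesson should end by printing the marker content if data-db-0 was reattached to the new Pod.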
Storage and rescheduling
When a Pod is rescheduled to a different node, its PVC must be accessible from that node. This depends on the accessMode:
- ReadWriteOnce (RWO): the volume can only be mounted on one node at a time. If the Pod moves to a different node and the volume is backed by local storage (like hostPath), the Pod may get stuck. Cloud block storage (AWS EBS, GCP PD) supports ReadWriteOnce and can be remounted on a different node.
- ReadWriteMany (RWX): the volume can be mounted by multiple nodes simultaneously. Network filesystems (NFS, CephFS) typically support this.
For most StatefulSet use cases, use a StorageClass that provisions cloud block storage, which supports ReadWriteOnce across nodes.
By default, StatefulSet scale-down does not delete PVCs. If you scale from 3 to 1, PVCs data-db-1 and data-db-2 remain bound and keep consuming storage. They are retained in case you scale back up. To free the storage, delete the PVCs manually after confirming the data is no longer needed: kubectl delete pvc data-db-1 data-db-2.
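To see which claims a scale-down leaves behind, the names can be computed from the old and new replica counts. This is a minimal sketch: leftover_pvcs is a hypothetical helper, and it assumes the default ordinal numbering starting at 0.

```shell
# Print the PVCs that remain after scaling a StatefulSet down.
# Usage: leftover_pvcs <template-name> <statefulset-name> <old-replicas> <new-replicas>
leftover_pvcs() {
  template=$1
  sts=$2
  old=$3
  i=$4
  # Pods with ordinals new..old-1 are removed; their PVCs are kept.
  while [ "$i" -lt "$old" ]; do
    echo "${template}-${sts}-${i}"
    i=$((i + 1))
  done
}

# Scaling db from 3 to 1: prints data-db-1 and data-db-2.
leftover_pvcs data db 3 1
```

Once the data is confirmed to be disposable, the list can be passed on, for example kubectl delete pvc $(leftover_pvcs data db 3 1).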
Inspecting PVC state
    kubectl describe pvc data-db-0

The output shows:

- Status: Bound: the PVC has a backing PV
- Volume: the name of the bound PV
- Access Modes: how the volume can be mounted
- StorageClass: which storage class was used to provision it
If a PVC stays in Pending, no PV that satisfies its requirements has been bound to it yet. Check the Events section at the end of the kubectl describe pvc output for the reason.
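When there are many claims, a small filter can pick out the ones that are still Pending. The sketch below is illustrative: pending_pvcs is a hypothetical helper that relies on NAME and STATUS being the first two columns of kubectl get pvc output.

```shell
# Print the names of PVCs whose STATUS column is "Pending".
# Reads the output of `kubectl get pvc --no-headers` on stdin.
pending_pvcs() {
  awk '$2 == "Pending" { print $1 }'
}

# Typical use against a cluster:
#   kubectl get pvc --no-headers | pending_pvcs
# then, for each name printed:
#   kubectl describe pvc <name>
```

Each name it prints is a candidate for kubectl describe pvc, where the Events section explains why binding has not happened.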
A StatefulSet uses volumeClaimTemplates with storageClassName: local-storage. The local-storage class provisions volumes on the node where the Pod first runs. The Pod is rescheduled to a different node. What happens?
Answer:
The PVC is still bound to the PV on the original node. The Pod cannot start on the new node because the local volume exists only on the original node and cannot be mounted from anywhere else. The Pod stays Pending, with an event explaining that the volume is not usable from that node (for local PersistentVolumes this is typically reported as a volume node affinity conflict). This is why local storage is generally not suitable for StatefulSets that may be rescheduled. Use network-attached storage for portable StatefulSet Pods.
    kubectl delete statefulset db
    kubectl delete service db-headless

The PVCs are retained. You would need to delete them manually.
Storage in StatefulSets is per-instance, stable, and not cleaned up automatically. This is the right behavior for databases. The next lesson covers how StatefulSets handle updates and the different update strategies available.