Kubernetes offers numerous features to ensure high availability for applications running in its cluster environment. One critical aspect of maintaining high availability is managing pod disruptions effectively.
In Kubernetes terminology, pod disruption refers to the temporary unavailability or termination of a Kubernetes pod, which can occur voluntarily or involuntarily.
Understanding Pod Disruption
Pods do not disappear until someone (a user or controller) destroys them, or there is an unavoidable hardware or system software error.
Pod disruptions can be classified into voluntary and involuntary disruptions. Voluntary disruptions are intentional and controlled, while involuntary disruptions are unexpected and beyond one’s control.
Voluntary disruptions
Draining a node for a Kubernetes upgrade or reboot
Draining a node from the cluster to scale it down (e.g. via the Cluster Autoscaler)
Updating a deployment’s pod template, which causes the pods to be restarted
Deleting a pod (e.g. by accident)
These actions can be taken by the cluster administrator or by the application owner.
Involuntary disruptions
Node outage due to hardware or hypervisor failure or kernel panic
Node disappears from the cluster due to cluster network partition
Eviction of a pod due to the node being out-of-resources
All of these conditions should be familiar to most users, except for the out-of-resources condition, in which Kubernetes reclaims resources by evicting pods through node-pressure eviction.
Best Practices for High Availability
Let’s delve into common issue scenarios and the corresponding best-practice solutions to prevent application downtime:
1. Deploy Multiple Pods (Replicas)
Issue: Deploying only one pod can lead to application downtime if it becomes unavailable, such as during a pod restart.
Solution: Set the replicas value to 2 or higher to ensure redundancy and high availability. Optionally, use an autoscaler solution (e.g. HorizontalPodAutoscaler or KEDA’s ScaledObject) to scale out when needed.
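As an illustration, here is a minimal Deployment sketch with two replicas; the name, labels, and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical application name
spec:
  replicas: 2                 # at least two pods for redundancy
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0   # placeholder image
```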
2. Spread Pods Across Availability Zones
Issue: Deploying pods on a single node or within one availability zone can lead to application downtime if that node or zone becomes unavailable.
Solution: Utilize topologySpreadConstraints or pod anti-affinity to distribute pods across availability zones, and set the replicas value to 3 or more so that, with three zones, at least one pod can land in each zone.
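As a sketch, the Deployment above could be extended with topologySpreadConstraints; the app: my-app label is the placeholder used earlier, and topology.kubernetes.io/zone is the well-known zone label:

```yaml
# Fragment of the Deployment spec from the previous example.
spec:
  replicas: 3                 # with three zones, one pod can land in each
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread across zones
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app     # placeholder label from the example above
```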
3. Avoid Simultaneous Pod Disruptions
Kubernetes offers features to help you run highly available applications even when you introduce frequent voluntary disruptions.
Issue: Simultaneously draining all nodes hosting application pods can cause application downtime during voluntary disruptions.
Solution: Implement a PodDisruptionBudget (PDB) resource to ensure a minimum number of pods remains operational during voluntary disruptions. Do not use a PDB when you have only one replica, as this can block node drains and cluster upgrades.
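For example, a minimal PDB sketch that keeps at least one pod of the hypothetical my-app Deployment running during voluntary disruptions:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb            # hypothetical name
spec:
  minAvailable: 1             # at least one pod must stay up during drains
  selector:
    matchLabels:
      app: my-app             # placeholder label from the examples above
```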
4. Correctly Sized Resource Requests and Limits
Requests specify the minimum CPU and memory requirements for a container to operate, with the kube-scheduler utilizing resource requests to determine suitable nodes. Limits establish the maximum CPU and memory usage permitted for a container, preventing excessive resource consumption and safeguarding cluster stability and performance.
Issue: Undersized or improperly configured resource requests and limits can lead to resource contention, out-of-memory issues, or pod restarts.
Solution: Set memory requests and limits according to nominal usage, set CPU requests according to nominal usage, and avoid setting CPU limits.
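As an illustration, here is a container fragment following this guidance; the values are placeholders and should be sized from the application’s observed usage:

```yaml
# Fragment of the pod template from the Deployment example above.
containers:
  - name: my-app
    image: my-app:1.0         # placeholder image
    resources:
      requests:
        cpu: 250m             # nominal CPU usage (illustrative value)
        memory: 256Mi         # nominal memory usage (illustrative value)
      limits:
        memory: 256Mi         # memory limit; no CPU limit is set, per the guidance above
```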
5. Always Use Health Checks (Liveness and Readiness Probes)
Kubernetes probes, specifically the LivenessProbe and ReadinessProbe, are essential for effective health monitoring of pods. The LivenessProbe determines whether the process within the pod is running, while the ReadinessProbe determines whether the service within the pod is prepared to accept traffic.
Issue: Not defining liveness and readiness probes can result in Kubernetes not detecting crashed or unhealthy application states, leading to downtime or errors.
Solution: Always set liveness and readiness probes to ensure Kubernetes can accurately determine the health and readiness of pods.
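For example, a sketch of both probes; the /healthz and /ready endpoints and port 8080 are assumptions, so substitute whatever health endpoints the application actually exposes:

```yaml
# Fragment of the pod template from the Deployment example above.
containers:
  - name: my-app
    image: my-app:1.0         # placeholder image
    livenessProbe:
      httpGet:
        path: /healthz        # assumed liveness endpoint
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /ready          # assumed readiness endpoint
        port: 8080
      periodSeconds: 5
```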
6. Use Readiness Probe During Rolling Updates
The rolling deployment is the default deployment strategy in Kubernetes. It replaces the pods of the previous version of the application with pods of the new version, one by one, without any cluster downtime.
Issue: Rolling updates, while ensuring minimal downtime, can still result in downtime until new pods are ready to handle requests.
Solution: Always use readiness probes during rolling updates to ensure new pods are fully ready before removing old ones, thus minimizing downtime.
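As a sketch, the rolling update strategy can be tuned so Kubernetes never removes an old pod before a new one reports Ready (via the readiness probe above); the values shown are illustrative:

```yaml
# Fragment of the Deployment spec from the examples above.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0       # never drop below the desired replica count
      maxSurge: 1             # bring up one extra pod at a time
```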
7. Graceful Termination
To ensure graceful pod termination, the application needs to handle the SIGTERM signal properly. This involves stopping incoming traffic, closing database connections, and completing any ongoing operations before shutting down. Kubernetes’ default wait time for handling SIGTERM is 30 seconds, but this can be adjusted when an application needs a longer shutdown period.
Issue: When a pod is shut down without handling the termination signal, it can lead to errors.
Solution: To ensure a graceful shutdown of the application, handle the SIGTERM signal and consider extending the wait timeout if necessary.
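For example, a pod template sketch that extends the grace period and adds a short preStop delay; the 60-second value and the 5-second sleep are illustrative assumptions, and the application itself must still handle SIGTERM:

```yaml
# Fragment of the Deployment spec from the examples above.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 60   # illustrative: longer than the 30s default
      containers:
        - name: my-app
          image: my-app:1.0               # placeholder image
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 5"]  # brief pause so endpoints deregister before SIGTERM
```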
Summary
In summary, ensuring high availability in Kubernetes applications involves:
Have at least two replicas, optionally use autoscaling
Add health checks (probes)
Add a PodDisruptionBudget
Use pod anti-affinity or topology spread constraints
Allocate sufficient resources
Handle SIGTERM in the application
By following these best practices, Kubernetes users can significantly enhance the availability and reliability of their applications in cluster environments. Don’t forget to monitor application workloads to detect downtime or unexpected errors.