We’ve all been there—staring at a Kubernetes cluster upgrade that’s been stuck for the past three hours. The nodes won’t drain. The pods won’t evict. And somewhere in the back of your mind, you’re thinking about that PodDisruptionBudget you added last month because “we need high availability.”
Yeah, that one. The one that’s now holding your entire cluster upgrade hostage.
This is all to say, PodDisruptionBudgets are like seat belts—absolutely critical for safety, but if you configure them wrong, you’re just stuck in the car while it’s on fire.
What’s Actually Happening During a Cluster Upgrade
Before we dive into the PDB problem, let’s talk about what Kubernetes is trying to do when you upgrade your cluster nodes.
When Kubernetes needs to upgrade or replace a node, it follows a polite eviction process:
- Mark the node as unschedulable - New pods won’t land here
- Evict existing pods gracefully - Give them time to shut down properly
- Wait for pods to terminate - Respect their terminationGracePeriodSeconds
- Drain the node - Remove it from the cluster
- Upgrade and rejoin - Bring it back with the new version
The key word here is “gracefully.” Kubernetes isn’t just killing your pods—it’s checking with your PodDisruptionBudgets first to make sure the eviction won’t violate your availability requirements.
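Under the hood, each graceful eviction above is a request to the pod's eviction subresource rather than a raw delete. The object `kubectl drain` submits looks roughly like this (the pod name here is hypothetical):

```yaml
apiVersion: policy/v1
kind: Eviction
metadata:
  name: payment-processor-7d4b9c8f5-x2k4j  # hypothetical pod name
  namespace: production
```

It's this API call, not a direct delete, that gives your PodDisruptionBudgets the chance to veto the disruption.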
And this is where things get interesting.
The Single-Pod PDB Deadlock
Here’s the scenario we see constantly in consulting engagements. A team deploys a service with this configuration:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: production
spec:
  replicas: 1  # Only one pod - maybe we don't need HA yet
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      containers:
        - name: processor
          image: payment-processor:v2.1
          ports:
            - containerPort: 8080
```
Looks reasonable, right? It’s a payment processor, they want to be careful, so they add a PodDisruptionBudget:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-processor-pdb
  namespace: production
spec:
  minAvailable: 1  # Always keep one running!
  selector:
    matchLabels:
      app: payment-processor
```
The logic seems sound: “We need at least one payment processor running at all times. Can’t process payments if we’re down!”
But here’s what actually happens during a node upgrade:
- Kubernetes tries to drain the node
- It sees the payment-processor pod needs to be evicted
- It checks the PDB: “minAvailable: 1”
- Current available pods: 1
- If we evict this pod, available pods: 0
- PDB says we need minimum 1
- Eviction denied. Upgrade blocked.
The node sits there, cordoned but not drained, waiting for a condition that literally cannot be satisfied. You have one pod. The PDB says you need one pod running. The only way to move that pod is to evict it. But evicting it would violate the PDB.
Deadlock.
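The check the eviction API is running boils down to simple arithmetic. Here's a simplified sketch in shell (assumption: the real disruption controller tracks more state, but the math is the same):

```shell
#!/bin/sh
# Simplified sketch of the availability check behind pod eviction
# for a minAvailable PDB.
can_evict() {
  current_healthy=$1
  min_available=$2
  # Allowed only if the pods left after eviction still satisfy minAvailable.
  [ $((current_healthy - 1)) -ge "$min_available" ]
}

# One replica, minAvailable: 1 -- this check can never pass.
can_evict 1 1 && echo "eviction allowed" || echo "eviction denied"
# prints "eviction denied"
```

No amount of retrying changes the outcome: the drain loop keeps asking, the PDB keeps saying no.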
Why This Happens More Than You’d Think
We’ve seen this pattern in production environments across a number of clients. The teams aren’t making mistakes—they’re following what seems like logical reasoning:
“We have a critical service. We need it to always be available. Therefore, minAvailable: 1.”
The problem is that minAvailable isn’t actually about what you need—it’s about what Kubernetes is allowed to disrupt. And when you only have one replica, there’s no disruption that doesn’t violate a minAvailable: 1 requirement.
We had a client running a platform with about 30 microservices. They’d standardized on PDBs across everything—excellent practice. But half their services were running with replicas: 1 and minAvailable: 1.
Their quarterly cluster upgrades? Took three days instead of three hours. The platform team had to reach out to each application team individually to negotiate temporary workarounds and understand their PDB requirements. Picture tracking down 15 different teams, explaining why their configuration was blocking the upgrade, discussing acceptable downtime windows, and coordinating changes across multiple business units. It was essentially a multi-day coordination event that required both technical expertise and organizational diplomacy.
The Right Way to Handle Single-Replica Services
If you only have one replica, you have a few options, and none of them involve minAvailable: 1.
Option 1: Scale to Two Replicas (The Correct Answer)
This is what you actually want if availability matters:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-processor
  namespace: production
spec:
  replicas: 2  # Now we have real HA
  selector:
    matchLabels:
      app: payment-processor
  template:
    metadata:
      labels:
        app: payment-processor
    spec:
      containers:
        - name: processor
          image: payment-processor:v2.1
          ports:
            - containerPort: 8080
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-processor-pdb
  namespace: production
spec:
  minAvailable: 1  # Now this actually makes sense
  selector:
    matchLabels:
      app: payment-processor
```
Now when Kubernetes needs to drain a node:
- Current available: 2
- Evict one: 1 remaining
- minAvailable: 1 ✅
- Eviction proceeds, upgrade continues
This is the real solution. If availability matters enough that you’re writing a PDB, it matters enough to run more than one pod.
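The bookkeeping behind that approval is just subtraction, which you can see reflected in the PDB's status fields. A simplified sketch (assumption: all replicas are healthy):

```shell
#!/bin/sh
# disruptionsAllowed for a minAvailable PDB, simplified:
# healthy pods minus the floor the budget demands.
current_healthy=2   # two replicas, both ready
min_available=1
disruptions_allowed=$((current_healthy - min_available))
echo "$disruptions_allowed"   # 1 -> one pod may be evicted at a time
```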
Option 2: Use maxUnavailable Instead
If you genuinely need a single replica for now (maybe you’re in dev, or there are resource constraints), use maxUnavailable:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-processor-pdb
  namespace: production
spec:
  maxUnavailable: 1  # Allow disruption of one pod
  selector:
    matchLabels:
      app: payment-processor
```
With maxUnavailable: 1:
- Current pods: 1
- Allowed to disrupt: 1
- Eviction proceeds ✅
This says “I accept that this single pod might go down during maintenance.” It provides some protection—Kubernetes won’t disrupt more than one at a time—but acknowledges the reality that you can’t have zero downtime with one replica.
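Why does the same single replica stop being a deadlock? Because with maxUnavailable the required floor is derived from the expected pod count rather than stated absolutely. A simplified sketch of that math (assumption: the pod is healthy):

```shell
#!/bin/sh
# disruptionsAllowed for a maxUnavailable PDB, simplified.
expected_pods=1
current_healthy=1
max_unavailable=1
desired_healthy=$((expected_pods - max_unavailable))        # 0 pods must stay up
disruptions_allowed=$((current_healthy - desired_healthy))  # 1 eviction permitted
echo "$disruptions_allowed"
```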
Option 3: Don’t Use a PDB (Perfectly Fine for Non-Critical Services)
Not every service needs a PDB. If it’s not user-facing, if downtime is acceptable, if it’s stateless and fast to restart—skip the PDB entirely.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: background-processor
  namespace: production
spec:
  replicas: 1
  # No PDB needed - it's fine if this restarts during upgrades
  selector:
    matchLabels:
      app: background-processor
  template:
    metadata:
      labels:
        app: background-processor
    spec:
      containers:
        - name: processor
          image: background-processor:v1.0
```
Kubernetes will still gracefully terminate the pod. It just won’t block cluster operations to do so.
How to Find the Problem Before Your Next Upgrade
If you’re reading this thinking “I might have this problem,” here’s how to audit your cluster:
```shell
# Check for PDBs with zero disruptions allowed
kubectl get pdb --all-namespaces -o json | \
  jq -r '.items[] | select(.status.disruptionsAllowed == 0) |
    "\(.metadata.namespace)/\(.metadata.name) - DisruptionsAllowed: \(.status.disruptionsAllowed)"'
```
The key metric is status.disruptionsAllowed. When this is zero, Kubernetes cannot drain any nodes without violating the PDB. This is exactly what blocks your upgrades.
For a more comprehensive check, you can create a simple script:
```shell
#!/bin/bash
# Check for PDBs that would block node drains
echo "Checking for PDBs with zero disruptions allowed..."
echo ""
kubectl get pdb --all-namespaces -o json | \
  jq -r '.items[] | select(.status.disruptionsAllowed == 0) |
    "⚠️  BLOCKED: \(.metadata.namespace)/\(.metadata.name)
    Current Pods: \(.status.currentHealthy)
    Desired Healthy: \(.status.desiredHealthy)
    Disruptions Allowed: \(.status.disruptionsAllowed)"'
```
Run this before your next cluster upgrade. Any PDB showing DisruptionsAllowed: 0 will block node drains and needs to be fixed.
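If you want to sanity-check the jq filter itself, you can dry-run it against canned data that mimics the shape of `kubectl get pdb -o json` (assumption: jq is installed; the names and numbers below are illustrative):

```shell
#!/bin/sh
# Canned API output: one deadlocked PDB, one healthy one.
cat <<'EOF' > sample-pdbs.json
{
  "items": [
    {
      "metadata": {"namespace": "production", "name": "payment-processor-pdb"},
      "status": {"currentHealthy": 1, "desiredHealthy": 1, "disruptionsAllowed": 0}
    },
    {
      "metadata": {"namespace": "production", "name": "api-pdb"},
      "status": {"currentHealthy": 3, "desiredHealthy": 2, "disruptionsAllowed": 1}
    }
  ]
}
EOF

# Only the deadlocked PDB should be reported.
jq -r '.items[] | select(.status.disruptionsAllowed == 0) |
  "\(.metadata.namespace)/\(.metadata.name)"' sample-pdbs.json
```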
Real-World Impact: The Three-Day Upgrade
That client we mentioned? Here’s what changed after we fixed their PDB configurations:
Before:
- 30 services, 15 with single-pod PDB deadlocks
- Cluster upgrade: 3 days
- Manual intervention required: running kubectl delete pdb in a watch loop
- Platform team coordination: 15 separate conversations with application teams
- Engineer time required: 8 hours spread across multiple days
- Anxiety level: extremely high
After:
- Scaled critical services to 2+ replicas with proper PDBs
- Changed minAvailable to maxUnavailable for acceptable single-pod services
- Removed PDBs entirely from non-critical background jobs
- Cluster upgrade: 2.5 hours
- Manual intervention: 0
- Team coordination needed: 0
- On-call time: 0
- Anxiety level: checking Slack occasionally
The difference wasn’t just operational—it was cultural. Upgrades went from “major events requiring all-hands-on-deck” to “automated maintenance windows.” And the platform team stopped being the PDB police, chasing down application owners to fix misconfigurations.
The Bottom Line
PodDisruptionBudgets are critical for production Kubernetes clusters. They prevent Kubernetes from disrupting too many pods during voluntary operations like upgrades, node maintenance, or scaling down.
But minAvailable: 1 with replicas: 1 is a configuration that makes no sense. You’re telling Kubernetes “always keep one pod running” while simultaneously only giving it one pod to work with. The math doesn’t work.
If a service is important enough to have a PDB, it’s important enough to have at least two replicas. If you can’t run two replicas, use maxUnavailable instead. And if the service isn’t critical, skip the PDB entirely.
We’ve seen this single-pod PDB trap delay cluster upgrades, block node drains, and create operational emergencies across too many production environments. The fix is straightforward once you understand what’s happening—and now you do.
Getting It Right From the Start
For teams managing production Kubernetes environments, we typically recommend these PDB guidelines:
For critical services (user-facing, payment processing, etc.):
- Minimum 2 replicas across multiple nodes
- Use minAvailable: 1 or maxUnavailable: 1 depending on your redundancy requirements
- Test drain scenarios before production
For important but non-critical services:
- Use maxUnavailable: 1 if you have 1 replica
- Use minAvailable: N-1 if you have N replicas (e.g., 2 replicas = minAvailable: 1)
- Accept that brief disruptions during maintenance are OK
For background jobs and workers:
- Skip the PDB unless there’s a specific reason
- Let Kubernetes handle graceful termination
- Design for restartability rather than continuous availability
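One more note on the minAvailable: N-1 pattern: for services with larger replica counts, both minAvailable and maxUnavailable also accept percentages, which scale with the Deployment instead of needing manual updates every time you resize. A sketch (the service name and numbers are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: "50%"  # with 4 replicas, 2 must stay up - drains can still proceed
  selector:
    matchLabels:
      app: api
```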
If you’re managing complex Kubernetes environments or preparing for critical upgrades, we’ve helped dozens of teams audit their PDB configurations and implement reliable maintenance processes. Reach out if you’d like help getting your cluster upgrade process from “emergency intervention” to “scheduled automation.”