Kubernetes: How to Cut Cluster Costs by 40%
A typical picture: a company migrates to Kubernetes, enjoys the scalability, then gets a bill 3x higher than expected. We've audited 15+ clusters over the past year and found the same wasteful patterns. Here's how to fix them.
Problem #1: Wrong Resource Requests
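This is the most common reason for overpaying. Developers set requests on a 'better safe than sorry' basis or copy values from documentation. The scheduler reserves node capacity based on requests, not on actual usage, so inflated requests fill nodes on paper while the hardware sits idle. Result: nodes run at 15-20% utilization, but you pay for 100%.
Tool: Vertical Pod Autoscaler (VPA) in recommendation mode (updateMode: "Off"). It observes actual consumption and suggests request values without restarting any pods. Let it collect about 7 days of data before acting on the suggestions.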
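To make the idea of right-sizing concrete, here is a minimal sketch that derives a CPU request from observed usage samples. It assumes a simple nearest-rank percentile plus a fixed safety margin; this is an illustration only, not the algorithm VPA itself uses, and the function names and sample values are hypothetical.

```python
def suggest_request(samples_millicores, percentile=90, headroom_pct=15):
    """Suggest a CPU request (millicores) from observed usage samples.

    Assumption: nearest-rank percentile plus a fixed safety margin.
    This is a simplification, not VPA's own recommender.
    """
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    ordered = sorted(samples_millicores)
    # Nearest-rank percentile, computed with integer arithmetic (ceiling division).
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    base = ordered[rank - 1]
    # Add headroom and round up to a whole millicore.
    return -(-base * (100 + headroom_pct) // 100)


def overprovision_ratio(current_request, suggested_request):
    """How many times larger the current request is than the suggestion."""
    return current_request / suggested_request


# Hypothetical usage samples for one container, in millicores.
usage = [110, 120, 95, 130, 150, 140, 105, 125, 160, 115]
suggested = suggest_request(usage)               # 173
ratio = overprovision_ratio(1000, suggested)     # current request: 1 full core
print(f"suggested: {suggested}m, current request is {ratio:.1f}x larger")
```

A container that requests a full core but would be served by roughly 173m is the kind of gap that produces 15-20% node utilization.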
Problem #2: Nodes Don't Scale Down
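Cluster Autoscaler adds nodes under load but often doesn't remove them after the load drops. Common reasons: Pod Disruption Budgets that are too strict (for example, minAvailable equal to the replica count), pods with local storage, and a missing cluster-autoscaler.kubernetes.io/safe-to-evict: "true" annotation on pods that could be moved safely.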
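The checks above can be roughly automated. The sketch below inspects pod manifests shaped like the items in the output of kubectl get pods --all-namespaces -o json and reports likely scale-down blockers. It is a simplification under stated assumptions: it treats emptyDir and hostPath volumes as local storage, it also flags pods with no owning controller, and it does not evaluate Pod Disruption Budgets at all. The function name and sample pods are hypothetical.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def scale_down_blockers(pod):
    """Return reasons why this pod may keep its node from being removed."""
    meta = pod.get("metadata", {})
    annotations = meta.get("annotations") or {}
    flag = annotations.get(SAFE_TO_EVICT)
    if flag == "true":
        return []  # explicitly marked as safe to move
    if flag == "false":
        return ["annotated safe-to-evict=false"]

    owners = meta.get("ownerReferences") or []
    if any(o.get("kind") == "DaemonSet" for o in owners):
        return []  # DaemonSet pods do not block node removal

    reasons = []
    if not owners:
        reasons.append("not managed by a controller")
    volumes = pod.get("spec", {}).get("volumes") or []
    # Assumption: emptyDir and hostPath count as local storage.
    if any("emptyDir" in v or "hostPath" in v for v in volumes):
        reasons.append("uses local storage")
    return reasons


# Hypothetical pods, trimmed to the fields the check reads.
cache_pod = {
    "metadata": {"name": "cache-0", "ownerReferences": [{"kind": "StatefulSet"}]},
    "spec": {"volumes": [{"name": "scratch", "emptyDir": {}}]},
}
debug_pod = {"metadata": {"name": "debug"}, "spec": {}}

for pod in (cache_pod, debug_pod):
    print(pod["metadata"]["name"], scale_down_blockers(pod))
```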
Problem #3: Spot Instances Not Used
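Spot/Preemptible instances typically cost 60-80% less than on-demand, in exchange for the risk of being reclaimed on short notice. Most stateless workloads (web servers, workers, batch jobs) run well on Spot if they have several replicas, shut down gracefully, and can be rescheduled quickly.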
Strategy: mixed node pools (on-demand for critical services, Spot for everything else) + Karpenter for smart instance type selection.
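A quick way to estimate what a mixed pool saves is to blend the two prices. The sketch below assumes identical node sizes and a flat Spot discount; the price, node count, Spot share and discount in the example are hypothetical placeholders, not audit data.

```python
def blended_monthly_cost(nodes, on_demand_price, spot_share, spot_discount):
    """Monthly cost of a pool where `spot_share` of the nodes run on Spot.

    Assumptions: all nodes are the same size and Spot is cheaper than
    on-demand by a flat `spot_discount` (0.6 means 60% cheaper).
    """
    spot_nodes = nodes * spot_share
    on_demand_nodes = nodes - spot_nodes
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


def saving_fraction(nodes, on_demand_price, spot_share, spot_discount):
    """Fraction of the all-on-demand bill saved by the mixed pool."""
    baseline = nodes * on_demand_price
    mixed = blended_monthly_cost(nodes, on_demand_price, spot_share, spot_discount)
    return 1 - mixed / baseline


# Hypothetical pool: 30 nodes at 100 currency units each per month,
# 40% of them on Spot at a 60% discount.
cost = blended_monthly_cost(30, 100.0, 0.4, 0.6)
saved = saving_fraction(30, 100.0, 0.4, 0.6)
print(f"monthly cost: {cost:.0f}, saved: {saved:.0%}")
```

Under these assumptions the saving is simply the Spot share multiplied by the discount, so raising either one moves the bill in proportion.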
Problem #4: Unused Resources
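LoadBalancers, PersistentVolumes, reserved IP addresses: all of these cost money even when not in use. After a namespace is deleted, such resources often remain. For example, a PersistentVolume with the Retain reclaim policy stays in the Released state after its claim is gone, and the underlying cloud disk keeps being billed.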
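Leftover volumes are easy to list from the API. The sketch below filters the JSON produced by kubectl get pv -o json for volumes in the Released phase. The sample data is invented for illustration, and the function name is hypothetical; whether a Released volume is actually safe to delete still needs a human decision.

```python
import json


def released_volumes(pv_list):
    """Return name, capacity and reclaim policy of PVs in the Released phase."""
    found = []
    for pv in pv_list.get("items", []):
        if pv.get("status", {}).get("phase") != "Released":
            continue
        spec = pv.get("spec", {})
        found.append({
            "name": pv["metadata"]["name"],
            "capacity": spec.get("capacity", {}).get("storage", "unknown"),
            "reclaim_policy": spec.get("persistentVolumeReclaimPolicy", "unknown"),
        })
    return found


# Invented sample in the shape of `kubectl get pv -o json` output.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"name": "pv-old-db"},
     "spec": {"capacity": {"storage": "200Gi"},
              "persistentVolumeReclaimPolicy": "Retain"},
     "status": {"phase": "Released"}},
    {"metadata": {"name": "pv-live"},
     "spec": {"capacity": {"storage": "50Gi"},
              "persistentVolumeReclaimPolicy": "Delete"},
     "status": {"phase": "Bound"}}
  ]
}
""")

for pv in released_volumes(SAMPLE):
    print(pv["name"], pv["capacity"], pv["reclaim_policy"])
```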
Real Audit Results
Here are typical results after applying all four methods to a medium-sized production cluster (20-30 nodes):
| Method | Savings |
|---|---|
| Resource requests optimization (VPA) | 15-20% |
| Cluster Autoscaler tuning | 10-15% |
| Spot instances for stateless workloads | 20-30% |
| Removing unused resources | 5-10% |
| Total | 40-50% |
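The per-method ranges add up to more than the stated total (50-75% if summed). One plausible reading, which is an assumption rather than something the audit data confirms, is that the savings overlap: each method reduces the bill that is left after the previous one. A small sketch of that compounding:

```python
def combined_saving(fractions):
    """Combine savings that each apply to the bill left by the previous one."""
    remaining = 1.0
    for fraction in fractions:
        remaining *= 1.0 - fraction
    return 1.0 - remaining


# Lower and upper ends of the ranges in the table above.
low = combined_saving([0.15, 0.10, 0.20, 0.05])
high = combined_saving([0.20, 0.15, 0.30, 0.10])
print(f"combined saving: {low:.0%} to {high:.0%}")
```

Under this assumption the combined range comes out at roughly 42% to 57%, which is much closer to the table's total than a plain sum.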
Quick Audit Checklist
- ✓ Install VPA in recommendation mode and review suggestions after 7 days
- ✓ Check node utilization: kubectl top nodes
- ✓ Find pods whose requests are far above actual usage via Grafana/Prometheus
- ✓ Check PDB settings for stateless services
- ✓ Calculate the Spot instance share in the cluster (target: 60-70%)
- ✓ Find Released PVs and unused LoadBalancers
- ✓ Install Kubecost/OpenCost for continuous cost monitoring
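For the Spot share item, one possible approach is to count nodes by their capacity-type labels in the output of kubectl get nodes -o json. The sketch below assumes the cluster uses one of the label conventions listed in the code (Karpenter, EKS managed node groups, GKE, AKS); clusters that label Spot nodes differently need their own entry, and the sample cluster is invented.

```python
# Label conventions assumed here; extend for other provisioning setups.
SPOT_LABELS = {
    "karpenter.sh/capacity-type": "spot",
    "eks.amazonaws.com/capacityType": "SPOT",
    "cloud.google.com/gke-spot": "true",
    "kubernetes.azure.com/scalesetpriority": "spot",
}


def is_spot(node):
    """True if the node carries any of the known Spot labels."""
    labels = node.get("metadata", {}).get("labels") or {}
    return any(labels.get(key) == value for key, value in SPOT_LABELS.items())


def spot_share(node_list):
    """Fraction of nodes that run on Spot capacity (0.0 for an empty list)."""
    nodes = node_list.get("items", [])
    if not nodes:
        return 0.0
    return sum(1 for node in nodes if is_spot(node)) / len(nodes)


# Invented cluster: three Spot nodes and two on-demand nodes.
cluster = {
    "items": [
        {"metadata": {"labels": {"karpenter.sh/capacity-type": "spot"}}},
        {"metadata": {"labels": {"karpenter.sh/capacity-type": "spot"}}},
        {"metadata": {"labels": {"eks.amazonaws.com/capacityType": "SPOT"}}},
        {"metadata": {"labels": {"karpenter.sh/capacity-type": "on-demand"}}},
        {"metadata": {"labels": {}}},
    ]
}
print(f"Spot share: {spot_share(cluster):.0%}")
```

This counts nodes, not cores or cost; a share weighted by node size would be a closer match to the bill.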
Want to Audit Your Cluster?
We conduct Kubernetes infrastructure audits in 3-5 days. Get a concrete list of changes with expected savings.