In Kubernetes, managing resources like CPU and memory effectively is critical for maintaining performance, reducing costs, and ensuring application stability. Over-provisioning resources wastes money, while under-provisioning can lead to performance bottlenecks or application crashes. Striking the right balance is often challenging, especially in dynamic environments where workloads change frequently.
In this article, we’ll explore how the Vertical Pod Autoscaler (VPA) helps solve these challenges by automatically optimizing resource allocation for Kubernetes workloads. We’ll cover its components, how it works, types of recommendations it provides, and the steps to set it up, along with real-world use cases and practical insights.
The Vertical Pod Autoscaler (VPA) is a Kubernetes component designed to optimize resource allocation for workloads. Unlike the Horizontal Pod Autoscaler (HPA), which scales the number of pod replicas, the VPA adjusts the CPU and memory requests and limits of individual pods. This ensures workloads have the necessary resources without over-provisioning or under-provisioning, reducing costs and improving performance.
In environments where workloads are dynamic and unpredictable, the VPA helps maintain resource efficiency by adapting to changing demands. It’s particularly useful for DevOps engineers, SREs, and FinOps teams aiming to optimize costs while maintaining high availability and performance.
The Vertical Pod Autoscaler (VPA) relies on three main components that work together to monitor, recommend, and apply resource adjustments for Kubernetes pods:
The VPA Recommender is the brain behind the operation. It analyzes historical resource usage data collected from sources like the Kubernetes metrics server or Prometheus. By studying patterns such as average and peak usage, it estimates the appropriate CPU and memory requests and limits for each pod.
For example, if a pod consistently uses 0.5 CPU cores (500m) but is allocated 2 CPU cores (2000m), the Recommender might suggest reducing the requested allocation to 1 CPU (1000m), preventing over-provisioning while maintaining enough capacity for occasional spikes.
The VPA Updater is responsible for applying the recommendations generated by the Recommender. However, it doesn’t dynamically adjust resources on running pods since Kubernetes doesn’t support changing resource limits without restarting the pod. Instead, the Updater terminates the pod and allows the deployment controller to recreate it with updated specifications.
This ensures that updated resource limits are applied without disrupting the integrity of Kubernetes scheduling and resource allocation.
Note:
While Kubernetes 1.27 introduced in-place updates for resource requests and limits (KEP-1287), which allow resources to be adjusted without recreating pods, the VPA has not fully adopted this capability yet. As a result, VPA may still require pod restarts to enforce changes.
The VPA Admission Controller acts as a gatekeeper during pod creation and updates. It ensures that new pods are created with the recommended resource requests and limits. This guarantees that the pods are rightsized from the start, reducing the likelihood of inefficiencies.
For instance, if a deployment creates 10 new pods, the Admission Controller automatically injects optimized resource values into their configurations before they are scheduled on the cluster nodes.
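As an illustration of that injection (the pod name and resource values here are assumptions, not real VPA output), a mutated pod spec ends up looking roughly like this, including the vpaUpdates annotation the Admission Controller adds to record what it changed:

```yaml
# Illustrative result of the VPA Admission Controller mutating a new pod:
# the requests below were injected from the current recommendation.
apiVersion: v1
kind: Pod
metadata:
  name: my-app-7d4b9c-x2x8q
  annotations:
    vpaUpdates: "Pod resources updated by example-vpa: container 0: cpu request, memory request"
spec:
  containers:
  - name: my-app
    image: my-app:latest
    resources:
      requests:
        cpu: 500m        # injected from the VPA recommendation
        memory: 512Mi    # injected from the VPA recommendation
```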
For a more detailed breakdown of the VPA components, refer to this documentation.
The VPA Recommender analyzes resource usage patterns and generates actionable recommendations to optimize workload performance. Here’s how it operates:
The VPA Recommender collects historical resource usage data from Kubernetes metrics sources, such as the metrics server, Prometheus, or other monitoring tools. This data includes information about CPU and memory usage patterns for each pod in the cluster.
By default, VPA analyzes up to 8 days of historical resource usage data, assigning higher weight to recent samples. This ensures recommendations are aligned with the pod's most current workload behavior.
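As a rough sketch of that weighting idea (an illustration, not the actual VPA source; VPA's CPU histogram decays sample weights with a half-life of about 24 hours), an exponentially decayed average over usage samples looks like:

```python
# Sketch: weight usage samples so recent observations dominate the estimate.
import math

HALF_LIFE_DAYS = 1.0  # assumption: a sample's weight halves per day of age

def sample_weight(age_days: float) -> float:
    """Weight of a usage sample that is `age_days` old."""
    return math.pow(0.5, age_days / HALF_LIFE_DAYS)

def weighted_average(samples):
    """samples: list of (cpu_millicores, age_days) pairs."""
    total_w = sum(sample_weight(age) for _, age in samples)
    return sum(v * sample_weight(age) for v, age in samples) / total_w

# A fresh 800m sample pulls the estimate well above a week-old 200m one.
usage = [(200, 7.0), (800, 0.0)]
print(round(weighted_average(usage)))  # -> 795
```

The decay is what keeps recommendations tracking the pod's current behavior rather than last week's.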
Tip:
Using Prometheus as your metrics source allows for greater flexibility in data retention policies and fine-grained usage tracking.
Once the data is collected, the Recommender analyzes usage trends over time, identifying key patterns such as average usage, sustained peaks, and periodic spikes in demand.
This analysis enables the Recommender to account for both stable workloads and workloads with dynamic demands.
Note:
Workloads with highly unpredictable spikes may require additional configuration, such as Burstable QoS settings (discussed in the next section), to handle sudden increases in resource usage.
To calculate recommendations, the VPA uses a histogram-based algorithm. Histograms are statistical representations of resource usage data, allowing the VPA to estimate resource needs with high accuracy: usage samples are accumulated into buckets, and recommendations are derived from high percentiles of the resulting distribution, with a safety margin added on top.
For example: If a pod’s CPU usage histogram shows that 90% of usage falls below 400m, the Recommender might suggest setting the CPU request to 400m and the limit to 600m to account for peak loads.
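The percentile lookup in that example can be sketched in a few lines (an illustration of the idea, not the VPA implementation; the bucket boundaries below are made up):

```python
# Sketch: estimate a resource recommendation as a percentile of a histogram.
def percentile_from_histogram(buckets, pct):
    """buckets: list of (upper_bound_millicores, sample_count), sorted by bound.
    Returns the smallest bucket bound below which at least `pct` of samples fall."""
    total = sum(count for _, count in buckets)
    cumulative = 0
    for bound, count in buckets:
        cumulative += count
        if cumulative / total >= pct:
            return bound
    return buckets[-1][0]

# 90% of samples fall at or below 400m, so the request lands at 400m.
cpu_hist = [(100, 20), (200, 30), (400, 40), (800, 10)]
print(percentile_from_histogram(cpu_hist, 0.90))  # -> 400
```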
Based on the analyzed data, the Recommender generates recommendations for each container: a target (the recommended request), plus lowerBound and upperBound values that frame how far actual usage may drift before an update is worthwhile.
These recommendations are designed to balance cost efficiency and performance stability, ensuring that workloads operate smoothly under varying conditions.
The VPA supports four update modes ("Off", "Initial", "Recreate", and "Auto"), allowing flexibility based on your use case:
Suggestion:
Start with "Off" mode in production clusters to analyze recommendations without disrupting workloads. Once you're confident in the results, switch to "Auto" mode to apply the recommendations automatically.
While VPA and HPA serve different purposes, using them together is generally not recommended, as they can conflict when scaling based on the same metrics. HPA scales horizontally (adding/removing pods) based on CPU/memory utilization, while VPA modifies CPU/memory requests and limits, which can interfere with HPA’s decision-making.
However, if you must use them together, ensure they do not rely on the same metrics. A best practice is to configure HPA to scale based on custom metrics (e.g., request rate, latency) via tools like Prometheus Adapter instead of CPU/memory utilization, for better stability.
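For example, an HPA driven by a request-rate metric rather than CPU or memory might look like the following (this assumes a Prometheus Adapter exposing a pods metric named http_requests_per_second; the metric and workload names are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second   # exposed via Prometheus Adapter
      target:
        type: AverageValue
        averageValue: "100"              # scale out above ~100 req/s per pod
```

Because this HPA never reads CPU or memory utilization, it cannot fight the VPA over the same signal.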
The Vertical Pod Autoscaler (VPA) provides three types of recommendations based on Kubernetes Quality of Service (QoS) classes: Guaranteed, Burstable and BestEffort.
These recommendations allow you to optimize resource allocation based on workload requirements, balancing cost efficiency and performance reliability.
Guaranteed recommendations are designed for workloads that require consistent and predictable performance. In this mode, both the CPU and memory requests are set equal to the limits, ensuring that the workload always has the resources it needs under all conditions.
When to Use: Latency-sensitive or stateful workloads, such as databases, where strict performance SLAs must be met.
Key Characteristics: Requests equal limits, placing the pod in the Guaranteed QoS class and giving it the strongest protection against eviction under node memory pressure.
Example:
Here’s an example of a pod recommendation using Guaranteed QoS:
containerRecommendations:
- containerName: database
  target:
    cpu: 500m
    memory: 2Gi
In this case, both the target CPU and memory values will be applied as requests and limits to ensure the database pod operates with maximum stability and reliability.
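Applied to a pod spec, that recommendation would yield a Guaranteed-class container like the following (illustrative):

```yaml
spec:
  containers:
  - name: database
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:          # equal to requests -> Guaranteed QoS class
        cpu: 500m
        memory: 2Gi
```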
Tip:
Use Guaranteed QoS for workloads with strict performance SLAs (Service Level Agreements) to ensure uninterrupted service.
Burstable recommendations are ideal for workloads that don’t need consistent resource usage but can benefit from additional resources during peak demand. In this mode, resource requests (minimum guaranteed resources) are set lower, and limits (maximum allowed resources) are set higher.
When to Use: Web servers, APIs, and other workloads with variable traffic that occasionally spikes above the baseline.
Key Features: Requests are set low to keep scheduling inexpensive, while higher limits allow the pod to burst into spare node capacity during peaks.
Example Configuration:
Here’s an example of a Burstable QoS recommendation:
containerRecommendations:
- containerName: nginx
  lowerBound:
    cpu: 200m
    memory: 512Mi
  upperBound:
    cpu: 800m
    memory: 2Gi
In this configuration, the lowerBound (200m CPU, 512Mi memory) is the minimum the VPA considers safe for the nginx container, while the upperBound (800m CPU, 2Gi memory) caps how much it will recommend, leaving the pod room to burst during peak demand.
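Translated into a pod spec, a Burstable setup derived from those bounds could look like this (illustrative values):

```yaml
spec:
  containers:
  - name: nginx
    resources:
      requests:        # low baseline, guaranteed at scheduling time
        cpu: 200m
        memory: 512Mi
      limits:          # higher ceiling for bursts -> Burstable QoS class
        cpu: 800m
        memory: 2Gi
```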
BestEffort recommendations are designed for workloads that don’t specify CPU or memory requests and limits. These pods rely entirely on unused cluster resources to operate and have no guaranteed allocation.
Key Characteristics: With no requests or limits set, the pod falls into the BestEffort QoS class; it is scheduled opportunistically and is the first to be evicted when a node runs short of resources.
When to Use:
These are suitable for non-critical workloads where performance degradation or eviction is acceptable, such as batch jobs, experiments, and low-priority background tasks.
The choice of recommendation depends on your workload's criticality and resource requirements: Guaranteed for critical, SLA-bound services; Burstable for applications with variable traffic; and BestEffort for disposable, low-priority tasks.
By aligning your workloads with the appropriate QoS class, you can ensure that your Kubernetes environment remains cost-effective and performant.
Setting up the Vertical Pod Autoscaler (VPA) in your Kubernetes cluster involves a few straightforward steps. While we’ll cover the essentials here, you can refer to the official VPA documentation for detailed instructions and advanced configurations.
Before proceeding further, ensure your Kubernetes cluster meets the core requirements: the Metrics Server (or an equivalent metrics source) is deployed, and the API server has MutatingAdmissionWebhooks enabled, which is the default in modern clusters.
For additional details, check the official VPA prerequisites documentation.
To deploy VPA, clone the official Kubernetes Autoscaler repository and use the following commands:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
This deploys the three key VPA components—Recommender, Updater, and Admission Controller—into your cluster. These components are essential for monitoring resource usage, generating recommendations, and applying adjustments.
To enable VPA for a specific workload, create a VerticalPodAutoscaler resource.
Here’s an example configuration:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-vpa
  namespace: default
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
This configuration targets a Deployment named my-app and sets the VPA to automatically update pod resources based on recommendations.
Tip:
Start with updateMode: "Off" to test the recommendations without automatically applying changes. This mode is ideal for production clusters where stability is critical.
To verify that VPA is functioning correctly, you can deploy a sample application along with a corresponding VPA configuration.
The VPA repository provides an example with a "hamster" deployment.
To view the recommendations provided by VPA, describe the VPA resource:
kubectl describe vpa example-vpa
The output includes values like target, lowerBound, and upperBound for CPU and memory, providing insights into optimal resource allocation.
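The relevant portion of that output looks roughly like this (the numbers are illustrative, not from a real cluster):

```
Recommendation:
  Container Recommendations:
    Container Name:  my-app
    Lower Bound:
      Cpu:     250m
      Memory:  262144k
    Target:
      Cpu:     500m
      Memory:  524288k
    Upper Bound:
      Cpu:     1
      Memory:  1Gi
```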
While the Vertical Pod Autoscaler (VPA) simplifies resource management in Kubernetes, its implementation comes with several challenges. Understanding these challenges is essential for effectively deploying VPA in production environments.
Configuring VPA objects for every workload in a large-scale environment can be time-consuming and prone to errors. Each workload may have unique resource requirements, making it difficult to standardize configurations.
To overcome this, leverage tools like Helm charts or CI/CD pipelines to dynamically generate VPA configurations based on workload metadata. This approach reduces manual effort, ensures consistency across deployments, and minimizes the risk of configuration errors.
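A minimal sketch of that approach with Helm (the values layout and field names here are assumptions, not a published chart) is a template that stamps out one VPA object per workload listed in values.yaml:

```yaml
# templates/vpa.yaml -- generates a VerticalPodAutoscaler per workload
{{- range .Values.workloads }}
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: {{ .name }}-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .name }}
  updatePolicy:
    updateMode: {{ .vpaMode | default "Off" | quote }}
{{- end }}
```

A values.yaml entry such as `workloads: [{name: my-app, vpaMode: Auto}]` then yields one correctly targeted VPA per deployment, with "Off" as the safe default.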
In Auto mode, VPA applies recommendations by restarting pods to update resource requests and limits, since modifying resources on running pods is not yet fully supported. While Kubernetes 1.27 introduced in-place updates for resource requests and limits, the feature is still maturing and VPA has not fully adopted it, so VPA may continue to rely on pod restarts.
While necessary for resource adjustments, this process can cause temporary disruptions, particularly for critical or stateful workloads.
For instance, a database pod requiring a memory adjustment will be restarted, potentially interrupting active connections. This could lead to downtime or degraded performance during the update.
To minimize disruptions, start with "Off" mode in production clusters to observe recommendations before applying them. Combine this with rolling updates to ensure changes are applied gradually, reducing the impact on application stability.
The VPA relies on historical usage data (defaults to 8 days) to generate recommendations. In highly dynamic workloads, this can lead to delayed adjustments when handling sudden traffic spikes.
To balance this, use HPA for real-time scaling based on external/custom metrics (e.g., request rate, latency) while VPA optimizes CPU/memory requests over time.
In large-scale Kubernetes clusters, VPA’s Recommender and Updater must process a growing amount of data. This can result in slower recommendations or increased resource consumption for managing the VPA itself.
To improve scalability and performance, use Prometheus with fine-tuned retention policies to reduce the metric storage overhead on the VPA Recommender. Additionally, partition workloads across namespaces or clusters to distribute the load and ensure faster, more reliable recommendations.
Tuning Kubernetes workloads with VPA can be complex—manual configuration, constant monitoring, and unexpected pod restarts make it challenging to optimize resources efficiently.
Randoli App Insights simplifies this process by leveraging VPA data to provide accurate rightsizing recommendations. The built-in agent automatically installs and configures VPA by default, eliminating manual setup hassle.
Note:
If you're already using VPA for auto-scaling (with updateMode set to Auto or Recreate), we recommend consulting our team before installing App Insights to avoid conflicts.
Want to optimize Kubernetes resource allocation with minimal effort? Let’s talk!
Managing resource allocation in Kubernetes can be challenging, but the Vertical Pod Autoscaler (VPA) makes it easier. By automatically adjusting CPU and memory requests, VPA ensures workloads always have the right amount of resources—helping you reduce costs, improve stability, and prevent performance bottlenecks.
Whether you're a DevOps engineer, SRE, or FinOps professional, integrating VPA into your cluster can streamline resource management and make your workloads more efficient and scalable. With the right configuration, VPA takes the guesswork out of rightsizing, letting you focus on what matters—running reliable applications.