RightSizing workloads using Vertical Pod Autoscaler (VPA)
January 7, 2025
Tags:
Right Sizing
Cost Optimization
The Vertical Pod Autoscaler (VPA) is a Kubernetes component that automatically adjusts the CPU and memory requests and limits of containers running in pods. Unlike the Horizontal Pod Autoscaler (HPA), which scales the number of pod replicas, the VPA resizes individual pods so that their allocations better match actual resource usage.
The Vertical Pod Autoscaler (VPA) consists of three main components:
VPA Recommender: Analyzes historical resource usage data and provides recommendations for adjusting CPU and memory requests and limits for pods.
Requests: The minimum amount of CPU and memory resources guaranteed to a container. Kubernetes uses these values to schedule pods on nodes that have enough capacity.
Limits: The maximum amount of CPU and memory resources that a container can use. A container that reaches its CPU limit is throttled, and one that exceeds its memory limit may be terminated (OOM-killed), which prevents overconsumption of resources.
VPA Updater: Implements the recommended resource changes by updating pod specifications.
VPA Admission Controller: Ensures that new pods use the recommended resource settings during their creation or updates.
You can find more details about each component in the Vertical Pod Autoscaler Components section below.
This article focuses on using the VPA Recommender to gain insight into how to resize your workloads.
By continuously monitoring and learning from your application's performance, the VPA recommender helps you avoid over-provisioning and under-provisioning issues, leading to cost savings and improved efficiency. Implementing VPA with its recommender allows you to dynamically adapt to changing workloads, ensuring that your applications always have the right amount of resources to operate smoothly without unnecessary overhead.
How the VPA Recommender Works
The Vertical Pod Autoscaler's Recommender is responsible for analyzing the resource usage of pods and generating recommendations for optimal CPU and memory requests and limits. Here's a step-by-step overview of how the VPA Recommender works:
Data Collection:
The Recommender collects historical resource usage data from Kubernetes metrics sources, such as the metrics server, Prometheus, or other monitoring systems.
The recommendations change over time. In its default configuration, VPA uses 8 days of history (with more weight given to more recent samples), so the changes should be slow in most cases.
Usage Analysis:
It analyzes the collected data to understand the actual resource consumption patterns of the containers over time.
The analysis considers factors like peak usage, average usage, and usage trends.
Estimation of Resource Needs:
Based on the usage analysis, the Recommender estimates the appropriate resource requests and limits for each container.
The goal is to allocate enough resources to handle the workload efficiently without over-provisioning.
Generation of Recommendations:
The Recommender generates recommendations for CPU and memory requests and limits.
These recommendations aim to balance between avoiding resource starvation and preventing resource wastage.
The Vertical Pod Autoscaler (VPA) uses a histogram-based algorithm to calculate resource recommendations. This approach involves creating and maintaining histograms of resource usage data, which allows the VPA to provide more accurate and granular recommendations.
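The weighting of recent samples mentioned above can be sketched as exponential decay. The snippet below is only an illustration, not the Recommender's actual code: the 24-hour half-life and the name `sample_weight` are assumptions made for the example, and only the 8-day history window comes from the text.

```python
HISTORY_DAYS = 8          # history window stated in the article
HALF_LIFE_HOURS = 24.0    # assumed half-life, chosen for illustration only


def sample_weight(age_hours: float, half_life_hours: float = HALF_LIFE_HOURS) -> float:
    """Relative weight of a sample that is `age_hours` old (newest sample = 1.0)."""
    if age_hours < 0:
        raise ValueError("age_hours must be non-negative")
    if age_hours > HISTORY_DAYS * 24:
        return 0.0  # older than the history window: ignored
    # The weight halves every `half_life_hours`.
    return 0.5 ** (age_hours / half_life_hours)


if __name__ == "__main__":
    for days in (0, 1, 2, 8):
        print(f"{days} day(s) old -> weight {sample_weight(days * 24):.4f}")
```

Under these assumptions a sample from yesterday counts half as much as one taken now, which is why a short spike fades from the recommendation gradually rather than abruptly.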
The Vertical Pod Autoscaler (VPA) Recommender provides two types of recommendations that help manage resource allocation effectively. These are based on Kubernetes QoS classes, which categorize pods based on their resource requests and limits.
Recommendation Modes:
Off: The recommendations are logged or displayed but not applied.
Auto: The recommendations are automatically applied to the running pods, which may involve restarting the pods to adjust their resource allocations.
Initial: The recommendations are applied only when new pods are created, setting their initial resource requests and limits.
Updating Resource Allocations:
If in Auto mode, the VPA controller takes the recommendations from the Recommender and applies them to the running pods.
This may involve restarting pods if the container runtime requires a restart to apply new resource limits.
Two Types of Recommendations Based on Kubernetes QoS Classes
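VPA reports four recommendation values for each container: lowerBound, target, uncappedTarget, and upperBound. The two QoS-based approaches below use them as follows: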
Guaranteed Recommendations:
For pods that require consistent and reliable performance, the VPA provides recommendations to ensure that the resource requests and limits are set to guarantee the pod’s resource needs.
Both CPU and memory requests and limits are set to the same value.
Ensures the highest level of service quality. The pod is guaranteed to receive the requested resources under all conditions.
These pods are less likely to be evicted when the node experiences resource pressure, and they have the highest priority for resource allocation.
For Guaranteed, you would use the target field from the recommendations for both the limits and the requests.
Burstable Recommendations:
For pods that need a certain amount of resources but can also benefit from extra resources when available, the VPA provides recommendations to set a minimum resource request with the ability to burst up to a higher limit.
The resource requests are set to a minimum level, and limits are set to a higher value.
Allows pods to perform reliably under typical conditions while being able to utilize additional resources during peak loads.
These pods receive their requested resources but can also use more resources if the node has them available. They have a medium priority for resource allocation and eviction.
For Burstable, you would use the lowerBound as the request and the upperBound as the limit.
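The mapping from recommendation fields to a container's resources can be sketched as follows. This is a minimal illustration of the rule described above. The function name `resources_for_qos` and the sample values are hypothetical, and only the field names target, lowerBound, and upperBound come from the text.

```python
from typing import Dict

Resources = Dict[str, str]  # e.g. {"cpu": "250m", "memory": "256Mi"}


def resources_for_qos(recommendation: Dict[str, Resources], qos: str) -> Dict[str, Resources]:
    """Build a container `resources` stanza from one container recommendation."""
    if qos == "Guaranteed":
        # Guaranteed: requests and limits are both set to the target.
        target = dict(recommendation["target"])
        return {"requests": target, "limits": dict(target)}
    if qos == "Burstable":
        # Burstable: lowerBound becomes the request, upperBound the limit.
        return {
            "requests": dict(recommendation["lowerBound"]),
            "limits": dict(recommendation["upperBound"]),
        }
    raise ValueError(f"unsupported QoS class: {qos}")


if __name__ == "__main__":
    # Hypothetical recommendation values, for illustration only.
    rec = {
        "lowerBound": {"cpu": "100m", "memory": "128Mi"},
        "target": {"cpu": "250m", "memory": "256Mi"},
        "upperBound": {"cpu": "500m", "memory": "512Mi"},
    }
    print(resources_for_qos(rec, "Guaranteed"))
    print(resources_for_qos(rec, "Burstable"))
```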
Create a VPA Object: You need to create a VPA object that specifies how VPA should manage the resources for your pods. Here’s an example of a VPA configuration:
targetRef: Specifies the target for the VPA, which could be a Deployment, StatefulSet, or another workload resource.
updatePolicy: Determines how the VPA applies recommendations. To receive recommendations without having them applied, set updateMode to "Off".
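A minimal sketch of such a VPA object, built as a Python dictionary and printed as JSON, is shown below. The object name example-vpa matches the kubectl command used in this article. The target Deployment name `example-app` and the helper `build_vpa_manifest` are hypothetical, and `autoscaling.k8s.io/v1` is assumed to be the VPA API version installed in your cluster.

```python
import json


def build_vpa_manifest(name: str, target_kind: str, target_name: str) -> dict:
    """Return a recommendation-only VerticalPodAutoscaler object."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",  # assumed installed API version
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # targetRef: the workload whose pods the VPA should observe
            "targetRef": {
                "apiVersion": "apps/v1",
                "kind": target_kind,
                "name": target_name,
            },
            # updateMode "Off": compute recommendations but never apply them
            "updatePolicy": {"updateMode": "Off"},
        },
    }


if __name__ == "__main__":
    manifest = build_vpa_manifest("example-vpa", "Deployment", "example-app")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output can be saved to a file and applied with kubectl apply -f.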
Validate Recommendations: After deploying the VPA, you can check its recommendations using the following command:
<kubectl describe vpa example-vpa>
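For scripted checks, the same recommendations can be read from the object's JSON form. The sketch below assumes the output of `kubectl get vpa example-vpa -o json` has been captured as text. The embedded sample is a hypothetical, trimmed document with made-up values, and `recommendations_by_container` is an illustrative helper name.

```python
import json
from typing import Dict

# Hypothetical, trimmed output of `kubectl get vpa example-vpa -o json`.
SAMPLE = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {
          "containerName": "app",
          "lowerBound": {"cpu": "100m", "memory": "128Mi"},
          "target": {"cpu": "250m", "memory": "256Mi"},
          "uncappedTarget": {"cpu": "250m", "memory": "256Mi"},
          "upperBound": {"cpu": "500m", "memory": "512Mi"}
        }
      ]
    }
  }
}
"""


def recommendations_by_container(vpa: dict) -> Dict[str, dict]:
    """Index the recommendation values by container name."""
    recs = (
        vpa.get("status", {})
        .get("recommendation", {})
        .get("containerRecommendations", [])
    )
    return {
        r["containerName"]: {k: v for k, v in r.items() if k != "containerName"}
        for r in recs
    }


if __name__ == "__main__":
    for name, values in recommendations_by_container(json.loads(SAMPLE)).items():
        print(name, "target:", values["target"])
```

A VPA that has not yet gathered enough data has no recommendation in its status, in which case the helper returns an empty dictionary.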
Challenges with Leveraging VPA for Recommendation
Automated Configuration: Manually configuring VPA objects across large-scale environments is complex and error-prone, often leading to suboptimal resource requests and limits.
Minimized Overhead: Collecting and analyzing performance metrics manually can be time-consuming and resource-intensive, leading to increased overhead.
Real-Time Insights: Traditional VPA setups often suffer from delayed data collection and adjustments, resulting in inefficient resource utilization and performance issues.
Adaptability to Dynamic Environments: In highly dynamic environments, rapid changes can render static VPA configurations ineffective, leading to resource misallocation and performance degradation.
Ensuring Application Stability: Adjustments made by VPAs can sometimes interfere with application performance or stability, particularly in environments with specific constraints and requirements.
Scalability and Efficiency: Scaling VPA solutions to handle enterprise-level deployments can be challenging, with potential inefficiencies in resource allocation and management.
Randoli App Insights: Simplifying VPA Deployment and Management
Randoli App Insights addresses these common VPA challenges by providing a streamlined RightSizing solution that simplifies the deployment and management of VPAs:
Automated Configuration: Our platform automates the complex process of setting up VPA objects across large-scale environments, ensuring optimal initial resource requests and limits.
Minimized Overhead: By automating the collection and analysis of performance metrics, Randoli App Insights reduces the overhead associated with VPA, ensuring efficient use of cluster resources.
Real-Time Insights: We deliver real-time, actionable insights that enhance resource efficiency and system performance, reducing the data collection period and ensuring timely adjustments.
Adaptability to Dynamic Environments: Randoli App Insights effectively handles the rapid shifts in highly dynamic environments, providing timely and accurate recommendations to ensure optimal resource utilization.
Ensuring Application Stability: Our platform ensures that VPA adjustments do not interfere with application performance or stability, addressing application-specific constraints and maintaining high reliability.
Decay Factor and Fixed Weight
Installing Vertical Pod Autoscaler (VPA)
To install the Vertical Pod Autoscaler (VPA) on your Kubernetes cluster, follow these steps:
Prerequisites
Kubernetes Cluster: Ensure you have a running Kubernetes cluster (version 1.11 or later).
kubectl: Make sure you have kubectl configured to interact with your Kubernetes cluster.
Step-by-Step Installation
Clone the VPA Repository:
Clone the official VPA GitHub repository to get the latest manifests:
<git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler>
Deploy the VPA Components:
Apply the VPA components using the provided YAML files by running the ./hack/vpa-up.sh script from this directory.
Check the VPA recommendations and ensure your pods are being adjusted accordingly:
<kubectl describe vpa example-vpa>
Appendix
Steps in the Histogram Calculation Algorithm
The VPA collects resource usage metrics (CPU and memory) for each container within the pod. This data is usually gathered from metrics sources like Prometheus or the Kubernetes metrics server. Using that information, the VPA then calculates the recommendations using this algorithm:
Histogram Creation:
For each container, the VPA creates histograms that represent the distribution of resource usage over time. A histogram is a statistical representation that shows the frequency of various resource usage levels.
The histogram bins represent different ranges of resource usage (e.g., 0-10%, 10-20%, etc.).
Updating Histograms:
As new usage data is collected, the histograms are updated to reflect the most recent resource usage patterns.
Each usage data point is added to the appropriate bin in the histogram, incrementing the count for that bin.
Resource Estimation:
The histograms are used to estimate the appropriate resource requests and limits. The VPA analyzes the distribution of resource usage to determine how much CPU and memory the container typically uses and needs.
Percentile-based analysis is often applied. For example, the VPA might use the 90th percentile of the histogram to set the resource request, ensuring that the container has enough resources to handle 90% of its observed usage.
Recommendation Generation:
Requests: The VPA recommends the resource requests based on the chosen percentile (e.g., 90th percentile) of the histogram. This ensures that the container gets enough resources most of the time without significant over-provisioning.
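Limits: The VPA recommends resource limits to prevent excessive resource consumption. This might be based on a higher percentile (e.g., 95th or 99th percentile) to accommodate occasional spikes in usage. Note that when the upstream VPA applies recommendations itself, it sets the requests and scales the limits to preserve each container's original limit-to-request ratio.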
Adjustment and Scaling:
The recommendations are used to adjust the resource requests and limits of the pods. If the VPA is operating in automatic mode, it will directly apply these changes to the running pods.
The adjustments ensure that the pods have sufficient resources to handle their workloads while optimizing resource utilization across the cluster.
Example
Imagine a container's CPU usage histogram over a period looks like this:
0-10% usage: 10 times
10-20% usage: 20 times
20-30% usage: 50 times
30-40% usage: 15 times
40-50% usage: 5 times
The VPA might decide to use the 90th percentile to set the resource request. In this case, it would recommend setting the CPU request at the level below which 90% of the observed usage falls. With 100 observations in total, the cumulative counts are 10, 30, 80, 95, and 100, so the 90th percentile lands in the 30-40% usage range. The limit might be set higher, perhaps based on the 99th percentile, which lands in the 40-50% range.
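The lookup in this example can be reproduced with a few lines of code. This is a simplified sketch using the unweighted counts listed above, not the VPA's actual implementation, and `percentile_bin` is an illustrative name.

```python
from typing import List, Tuple

# (bin label, observation count) taken from the example above
HISTOGRAM: List[Tuple[str, int]] = [
    ("0-10%", 10),
    ("10-20%", 20),
    ("20-30%", 50),
    ("30-40%", 15),
    ("40-50%", 5),
]


def percentile_bin(histogram: List[Tuple[str, int]], percentile: int) -> str:
    """Return the first bin whose cumulative count reaches the percentile."""
    total = sum(count for _, count in histogram)
    running = 0
    for label, count in histogram:
        running += count
        # Integer comparison avoids floating-point rounding at the boundary.
        if running * 100 >= percentile * total:
            return label
    return histogram[-1][0]


if __name__ == "__main__":
    print("90th percentile bin:", percentile_bin(HISTOGRAM, 90))
    print("99th percentile bin:", percentile_bin(HISTOGRAM, 99))
```

The cumulative count reaches only 80 at the end of the 20-30% bin, so the 90th percentile is found in the 30-40% bin, and the 99th percentile in the 40-50% bin.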
This histogram-based algorithm helps the VPA provide precise and effective resource recommendations, ensuring that containers have the necessary resources to function efficiently while minimizing waste and optimizing overall cluster performance.
Calculation Breakdown
Taking the example of calculating the 95th percentile value from 24 hours of CPU monitoring data, with data points every minute ranging from 0 to 1000.0:
Bucket indices are represented by N, and the bucket size increases exponentially: bucketSize = 0.01 * (1.05^N). Bucket 0 has a size of 0.01 and a range of [0, 0.01), while bucket 1 has a size of 0.01 * 1.05^1 = 0.0105 and a range of [0.01, 0.0205).
Data points are placed into buckets based on their numeric values. For instance, if a monitoring data point at a certain moment is 0.032, it will be placed into bucket 3.
When a data point is added to a bucket, that bucket's weight increases by fixed weight * decay factor, and the total weight of all buckets increases by the same amount. The fixed weight is the base weight of a sample, and the decay factor scales it so that newer samples count for more than older ones.
Calculate W(95) = 95% * total weight of all buckets.
Start accumulating bucket weights from the smallest to the largest. This weight is denoted as S. When S >= W(95), the index of the bucket at this moment is N. The minimum boundary value of bucket N+1 is the 95th percentile value.
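The steps above can be sketched in code. This is a simplified illustration that follows the bucket sizes given in the text (first bucket 0.01, growth ratio 1.05). The function names are hypothetical, and each sample's weight (fixed weight * decay factor) is passed in by the caller rather than computed.

```python
from typing import Dict

FIRST_BUCKET_SIZE = 0.01  # size of bucket 0, from the text
RATIO = 1.05              # each bucket is 5% larger than the previous one


def bucket_start(n: int) -> float:
    """Lower boundary of bucket n, i.e. the summed sizes of buckets 0..n-1."""
    return FIRST_BUCKET_SIZE * (RATIO ** n - 1) / (RATIO - 1)


def find_bucket(value: float) -> int:
    """Index of the bucket whose range [start, next start) contains value."""
    n = 0
    while bucket_start(n + 1) <= value:
        n += 1
    return n


def add_sample(weights: Dict[int, float], value: float, weight: float = 1.0) -> None:
    """Add `weight` (fixed weight * decay factor) to the sample's bucket."""
    n = find_bucket(value)
    weights[n] = weights.get(n, 0.0) + weight


def percentile(weights: Dict[int, float], pct: float) -> float:
    """Accumulate bucket weights in index order until they reach pct% of the total."""
    threshold = pct / 100.0 * sum(weights.values())  # W(pct)
    running = 0.0                                    # S
    for n in sorted(weights):
        running += weights[n]
        if running >= threshold:
            return bucket_start(n + 1)  # minimum boundary of bucket N+1
    return 0.0  # empty histogram


if __name__ == "__main__":
    hist: Dict[int, float] = {}
    for _ in range(90):
        add_sample(hist, 0.005)   # falls into bucket 0
    for _ in range(10):
        add_sample(hist, 0.032)   # falls into bucket 3
    print("bucket of 0.032:", find_bucket(0.032))
    print("95th percentile:", percentile(hist, 95))
```

Returning the lower boundary of the next bucket means the reported percentile never underestimates the samples in the selected bucket, at the cost of rounding up by at most one bucket width.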
Vertical Pod Autoscaler Components
The Vertical Pod Autoscaler (VPA) is composed of three main components that work together to monitor, recommend, and apply resource adjustments for Kubernetes pods:
VPA Recommender:
Monitors resource usage and generates recommendations for optimal CPU and memory requests and limits.
Gathers historical and real-time resource usage data from metrics sources (e.g., metrics server, Prometheus).
Analyzes the collected data to understand usage patterns and estimate future resource needs.
Provides recommendations for adjusting resource requests and limits.
VPA Updater:
Applies the recommendations provided by the VPA Recommender to the running pods.
Modes of Operation:
Off: Does not apply recommendations automatically.
Auto: Automatically updates the resource requests and limits, which may involve restarting pods to apply the changes.
Initial: Sets the resource requests and limits only when new pods are created.
Ensures that the updates are applied in a controlled manner, minimizing disruption to the running applications.
VPA Admission Controller:
Intercepts pod creation and update requests to set or adjust the initial resource requests and limits based on the recommendations.
Works with the Kubernetes API server to modify pod specifications during the admission process.
Ensures that the recommended resource settings are applied consistently across the cluster.
VPA Flow
Configuration:
The user configures the VPA settings for the desired pods.
Metrics Collection:
The VPA Recommender reads the VPA configuration and gathers resource utilization metrics from the metrics server.
Recommendation Generation:
The VPA Recommender analyzes the metrics and provides resource recommendations for the pods.
Recommendation Application:
The VPA Updater reads the recommendations from the VPA Recommender.
Because this flow does not change the resources of a running pod in place, the VPA Updater evicts pods that are running with outdated requests and limits.
Pod Termination and Recreation:
The deployment controller notices the pod termination and recreates the pod to maintain the desired replica count.
Admission Control:
During the pod recreation process, the VPA Admission Controller intercepts the pod creation request.
The VPA Admission Controller injects the updated resource requests and limits into the new pod’s specification based on the latest recommendations.
Resource Injection:
The VPA Admission Controller finalizes the process by ensuring the new pod is created with the updated resource requests and limits.
For example, the VPA Admission Controller might add a "250m" CPU request to the new pod.
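The injection step can be pictured as a small transformation of the pod specification. The sketch below is a conceptual illustration only: the real Admission Controller works as a mutating webhook, and the function name `inject_requests` and the sample pod are hypothetical.

```python
import copy
from typing import Dict


def inject_requests(pod: dict, recommended: Dict[str, Dict[str, str]]) -> dict:
    """Return a copy of `pod` with recommended requests set per container.

    `recommended` maps a container name to its requests, e.g. {"cpu": "250m"}.
    """
    patched = copy.deepcopy(pod)  # leave the incoming object untouched
    for container in patched["spec"]["containers"]:
        rec = recommended.get(container["name"])
        if rec is None:
            continue  # no recommendation for this container
        resources = container.setdefault("resources", {})
        resources.setdefault("requests", {}).update(rec)
    return patched


if __name__ == "__main__":
    # Hypothetical pod with no resources set on its single container.
    pod = {"spec": {"containers": [{"name": "app", "image": "example/app"}]}}
    print(inject_requests(pod, {"app": {"cpu": "250m"}}))
```

Working on a copy mirrors the fact that the recreated pod receives the new values while the stored workload template is left unchanged.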