Vertical Pod Autoscaler in Kubernetes

Learn how to use Vertical Pod Autoscaler (VPA) to vertically scale services in Kubernetes automatically based on resource metrics.

Jean Mainguy
EGYM Software Development

--

Pssst! I started my own blog!
You can read this very same article over there too!
No paywall, no ad, no Javascript — no bullshit, just pure content.
See you there!

In Kubernetes, we usually think about the Horizontal Pod Autoscaler (HPA) when referring to autoscaling. In most cases, it will be the preferred way of scaling services, based on CPU usage, memory usage, or custom metrics.

If you haven’t already, go read Horizontal Pod Autoscaler in Kubernetes (Part 1) — Simple Autoscaling using Metrics Server and learn how to implement a Horizontal Pod Autoscaler using Metrics Server!

However, while HPA can scale replicas up and down based on the current load, it is not capable of optimizing resource usage over the long term. This is where the Vertical Pod Autoscaler (VPA) comes in.

The VPA can be leveraged to optimize resource usage over time, based on mid to long-term observation.

Please note that to avoid a race condition, the VPA should only be used together with HPAs that are based on custom metrics. In addition, the VPA should not be used with JVM-based services due to limited visibility into the actual memory usage of the workload (learn more about its limitations here).

If you haven’t already, go read Horizontal Pod Autoscaler in Kubernetes (Part 2) — Advanced Autoscaling using Prometheus Adapter and learn how to implement a Horizontal Pod Autoscaler using Prometheus Adapter!

🎬 Hi there, I’m Jean!

In this article, we’re going to learn how to use Vertical Pod Autoscaler (VPA) to vertically scale services in Kubernetes automatically based on resource metrics! 💪

Requirements

Before we start, make sure you have the following tools installed:

  • kind
  • kubectl
  • helm
  • k6

Note: for macOS users or Linux users using Homebrew, simply run: brew install kind kubectl helm k6

All set? Let’s go! 🏁

Creating Kind Cluster

Kind is a tool for running local Kubernetes clusters using Docker container “nodes”. It was primarily designed for testing Kubernetes itself, but may be used for local development or CI.

I don’t expect you to have a demo project at hand, so I built one for you.

git clone https://github.com/jhandguy/vertical-pod-autoscaler.git
cd vertical-pod-autoscaler

Alright, let’s spin up our Kind cluster! 🚀

➜ kind create cluster --image kindest/node:v1.27.3 --config=kind/cluster.yaml
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:

kubectl cluster-info --context kind-kind

Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community 🙂

Installing cert-manager

cert-manager is a Kubernetes addon that automates the management and issuance of TLS certificates from various issuing sources. It periodically ensures that certificates are valid and up to date, and attempts to renew them at an appropriate time before expiry.

cert-manager can be installed via its Helm chart.

helm repo add jetstack https://charts.jetstack.io
helm install jetstack/cert-manager --name-template cert-manager --create-namespace -n cert-manager --values kind/cert-manager-values.yaml --version 1.13.2 --wait

If everything went fine, you should be able to see three newly spawned Deployments with the READY state!

➜ kubectl get deploy -n cert-manager
NAME READY UP-TO-DATE AVAILABLE AGE
cert-manager 1/1 1 1 6m27s
cert-manager-cainjector 1/1 1 1 6m27s
cert-manager-webhook 1/1 1 1 6m27s

Installing NGINX Ingress Controller

NGINX Ingress Controller is one of the many available Kubernetes Ingress Controllers, which acts as a load balancer and satisfies routing rules specified in Ingress resources, using the NGINX reverse proxy.

NGINX Ingress Controller can be installed via its Helm chart.

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx/ingress-nginx --name-template ingress-nginx --create-namespace -n ingress-nginx --values kind/ingress-nginx-values.yaml --version 4.8.3 --wait

Now, if everything goes according to plan, you should be able to see the ingress-nginx-controller Deployment running.

➜ kubectl get deploy -n ingress-nginx
NAME READY UP-TO-DATE AVAILABLE AGE
ingress-nginx-controller 1/1 1 1 4m35s

Installing Metrics Server

Metrics Server is a source of container resource metrics: it collects them from Kubelets and exposes them in the Kubernetes API server through the Metrics API, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler.

Metrics Server can be installed via its Helm chart.

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server
helm install metrics-server/metrics-server --name-template metrics-server --create-namespace -n metrics-server --values kind/metrics-server-values.yaml --version 3.11.0 --wait

Now, if everything goes well, you should see a metrics-server Deployment running.

➜ kubectl get deploy -n metrics-server
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 3m21s
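
To quickly verify that the Metrics API is being served, you can query node metrics with kubectl top (output omitted here, as the values will differ on your machine):

kubectl top nodes

It should list the kind-control-plane node along with its current CPU and memory usage.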

Installing Vertical Pod Autoscaler

Vertical Pod Autoscaler (VPA) is a component of the Kubernetes Autoscaler that frees users from the necessity of setting up-to-date resource limits and requests for the containers in their pods.

When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that the appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in the initial container configuration.

It can both down-scale pods that are over-requesting resources, and also up-scale pods that are under-requesting resources based on their usage over time.
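
Note that "maintaining ratios" has a concrete consequence for limits. As a purely illustrative example (the demo project we are about to deploy uses a 1:1 ratio, with identical requests and limits), assume a container initially configured like this:

# Illustrative values only (not taken from the demo project)
resources:
  requests:
    cpu: 10m   # initial request
  limits:
    cpu: 20m   # initial limit (1:2 request-to-limit ratio)

If the VPA then recommends a target request of 50m, the recreated pod gets a 50m request and a 100m limit, preserving the original 1:2 ratio.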

Note: VPA is still in its beta phase; use it in production at your own risk.

As of this writing, Kubernetes does not provide an official Helm chart, so I went ahead and built one!

helm install helm-chart --name-template vertical-pod-autoscaler --create-namespace -n vertical-pod-autoscaler --wait

If everything goes fine, you should eventually see three Deployments with the READY state!

➜ kubectl get deploy -n vertical-pod-autoscaler
NAME READY UP-TO-DATE AVAILABLE AGE
vert...scaler-admission-controller 1/1 1 1 2m32s
vert...scaler-recommender 1/1 1 1 2m32s
vert...scaler-updater 1/1 1 1 2m32s

As you can observe, the VPA is split into 3 separate components:

  • The Recommender computes the recommended resource requests for pods based on historical and current usage of the resources. The current recommendations are then put in the status of the VPA resource, where they can be inspected (see the example command right after this list);
  • The Updater decides which pods should be restarted based on the resource allocation recommendations calculated by the Recommender. If a pod should be updated, the Updater will try to evict it, respecting the Pod Disruption Budget by using the Eviction API. The Updater does not perform the actual resource update but relies on the Admission Controller to update pod resources when the pod is recreated after eviction;
  • The Admission Controller gets a request from the API server for each pod creation and either decides that there is no matching VPA configuration or finds the corresponding one and uses the current recommendation to set resource requests in the pod.
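
Once a VPA object exists (we will deploy one in the next section), the Recommender's current recommendation can be read directly from the status of the VPA resource, for example:

kubectl get vpa -n sample-app -o jsonpath='{.items[0].status.recommendation.containerRecommendations}'

Each entry contains a target (the value the Admission Controller applies, capped by the min/max allowed bounds) as well as lowerBound and upperBound estimates.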

Configuring Vertical Pod Autoscaler

Now that the Vertical Pod Autoscaler is up and running, let’s get to it, shall we? 🧐

helm install sample-app/helm-chart --name-template sample-app --create-namespace -n sample-app --wait

If everything goes fine, you should eventually see one Deployment with the READY state.

➜ kubectl get deploy -n sample-app
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 3/3 3 3 58s

Alright, now let’s have a look at the VPA!

➜ kubectl describe vpa -n sample-app
...
Spec:
  Resource Policy:
    Container Policies:
      Container Name: sample-app
      Controlled Resources:
        cpu
        memory
      Max Allowed:
        Cpu: 100m
        Memory: 200Mi
      Min Allowed:
        Cpu: 10m
        Memory: 20Mi
  Target Ref:
    API Version: apps/v1
    Kind: Deployment
    Name: sample-app
  Update Policy:
    Update Mode: Auto

As you can see, this VPA is configured to scale the service based on its CPU and memory resources. Its spec states that the minimum allowed CPU/memory is 10m/20Mi and the maximum is 100m/200Mi.

Finally, its update policy is in “Auto” mode, meaning that VPA assigns resource requests on pod creation as well as updates them on existing pods using the preferred update mechanism.
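
For reference, here is a minimal sketch of what the VerticalPodAutoscaler manifest rendered by the sample-app chart looks like, reconstructed from the describe output above (the resource name is assumed to match the Deployment):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: sample-app      # assumed to match the Deployment name
  namespace: sample-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: sample-app
        controlledResources: ["cpu", "memory"]
        minAllowed:
          cpu: 10m
          memory: 20Mi
        maxAllowed:
          cpu: 100m
          memory: 200Mi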

Currently, both the resource requests and limits are matching the VPA’s minimum allowance.

➜ kubectl get pods -n sample-app -o yaml | grep -A 6 'resources:'
resources:
  limits:
    cpu: 10m
    memory: 20Mi
  requests:
    cpu: 10m
    memory: 20Mi
--
resources:
  limits:
    cpu: 10m
    memory: 20Mi
  requests:
    cpu: 10m
    memory: 20Mi
--
resources:
  limits:
    cpu: 10m
    memory: 20Mi
  requests:
    cpu: 10m
    memory: 20Mi

Now, let’s give some load to our service and see what happens!

For Load Testing, I really recommend k6 from the Grafana Labs team. It is a dead-simple yet super powerful tool with very extensive documentation.

See for yourself!

k6 run k6/script.js

While k6 is gradually increasing strain on the pods' CPU usage, let’s watch out for any EvictedByVPA events in a second tab: eventually, you should see all 3 pods get evicted simultaneously!

➜ kubectl get events -n sample-app -w | grep EvictedByVPA
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.

As soon as this happens, have a look at the updated pods’ resource requests and limits: the CPU requests and limits should have doubled in value (from 10m to 20m), while the memory stays at its minimum of 20Mi (now displayed in bytes as 20971520).

➜ kubectl get pods -n sample-app -o yaml | grep -A 6 'resources:'
resources:
  limits:
    cpu: 20m
    memory: "20971520"
  requests:
    cpu: 20m
    memory: "20971520"
--
resources:
  limits:
    cpu: 20m
    memory: "20971520"
  requests:
    cpu: 20m
    memory: "20971520"
--
resources:
  limits:
    cpu: 20m
    memory: "20971520"
  requests:
    cpu: 20m
    memory: "20971520"

This means Vertical Pod Autoscaler successfully evicted the pods in order to increase their resource requests and limits! 🎉

Once k6 is done, have a look at the Load Test summary and the result of the status code counter metric in particular.

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 200 max VUs, 10m30s max duration (incl. graceful stop):
* load: Up to 40.00 iterations/s for 10m0s over 3 stages (maxVUs: 200, gracefulStop: 30s)
✗ status code is 200
↳ 97% — ✓ 17724 / ✗ 365

✗ node is kind-control-plane
↳ 97% — ✓ 17724 / ✗ 365
✗ namespace is sample-app
↳ 97% — ✓ 17724 / ✗ 365
✗ pod is sample-app-*
↳ 97% — ✓ 17724 / ✗ 365
✓ checks.........................: 97.98% ✓ 70896 ✗ 1460
data_received..................: 4.2 MB 7.1 kB/s
data_sent......................: 2.1 MB 3.5 kB/s
http_req_blocked...............: avg=18.39µs min=2µs med=8µs max=2.93ms p(90)=17µs p(95)=20µs
http_req_connecting............: avg=5.71µs min=0s med=0s max=2.74ms p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=188.13ms min=491µs med=4.57ms max=59.99s p(90)=269.43ms p(95)=646.78ms
{ expected_response:true }...: avg=129.55ms min=491µs med=4.62ms max=7.3s p(90)=261.24ms p(95)=602.03ms
http_req_failed................: 2.01% ✓ 365 ✗ 17724
http_req_receiving.............: avg=98.96µs min=0s med=75µs max=4.27ms p(90)=159µs p(95)=209µs
http_req_sending...............: avg=49.92µs min=7µs med=34µs max=14.39ms p(90)=72µs p(95)=93µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=187.99ms min=451µs med=4.38ms max=59.99s p(90)=269.31ms p(95)=646.69ms
http_reqs......................: 18089 30.147771/s
iteration_duration.............: avg=188.59ms min=666µs med=5.11ms max=1m0s p(90)=270.13ms p(95)=647.49ms
iterations.....................: 18089 30.147771/s
vus............................: 0 min=0 max=145
vus_max........................: 200 min=200 max=200
running (10m00.0s), 000/200 VUs, 18089 complete and 0 interrupted iterations
load ✓ [======================================] 000/200 VUs 10m0s 00.41 iters/s

Uh-oh… It looks like we had some downtime! 😱

Thankfully, our service was able to restart relatively fast, and only 365 out of 18089 requests failed. But for a service with a slower startup time, this could have led to an incident! 🚨

This is because vertical scaling, in essence, cannot happen without a restart: a pod’s CPU and/or memory cannot be increased in place. Instead, the pod must be terminated and a new one created with increased resources.

So how do we ensure the availability of our service during vertical autoscaling then?

This is where the Pod Disruption Budget (PDB) comes in!

We’ll get to that in a minute, let’s uninstall our Helm release first! (we won’t be needing this one anymore)

helm uninstall sample-app -n sample-app

Configuring Pod Disruption Budget

To prevent downtimes during pod disruption such as the one we previously experienced, a Pod Disruption Budget (PDB) can be configured.

A PDB limits the number of pods that are down simultaneously due to voluntary disruptions. It can be configured to maintain either a minimum number of available pods (minAvailable) or a maximum number of unavailable pods (maxUnavailable).
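
As a sketch, a PDB matching what the sample-app chart enables would look roughly like this (the selector label is taken from the kubectl describe pdb output shown further below; the resource name is assumed):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-app      # assumed name
  namespace: sample-app
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: sample-app   # matches the Selector shown by kubectl describe pdb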

Let’s see what happens if we try to vertically autoscale the same application, but this time with a Pod Disruption Budget with maxUnavailable: 1.

helm install sample-app/helm-chart --name-template sample-app --set podDisruptionBudget.enabled=true --create-namespace -n sample-app --wait

Once again, you should eventually see one Deployment with the READY state.

➜ kubectl get deploy -n sample-app
NAME READY UP-TO-DATE AVAILABLE AGE
sample-app 3/3 3 3 32s

Alright, now let’s have a look at the PDB!

➜ kubectl describe pdb -n sample-app
Name: sample-app
Namespace: sample-app
Max unavailable: 1
Selector: app=sample-app
Status:
  Allowed disruptions: 1
  Current: 3
  Desired: 2
  Total: 3

As you can see, this PDB is configured to prevent more than 1 pod from being unavailable during a voluntary pod disruption (such as a pod eviction by VPA): with 3 pods currently running and a desired minimum of 2, exactly 1 disruption is allowed at a time.

Now, let’s see how our service is going to behave under load with a PDB!

k6 run k6/script.js

Once again, while k6 is gradually increasing strain on the pods’ CPU usage, let’s watch out for any EvictedByVPA events in a second tab: eventually, you should see all 3 pods get evicted but this time, only one by one!

➜ kubectl get events -n sample-app -w | grep EvictedByVPA
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.
... Pod was evicted by VPA Updater to apply resource recommendation.

Once k6 is done, have a look at the Load Test summary and the result of the status code counter metric in particular.

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io
execution: local
script: k6/script.js
output: -
scenarios: (100.00%) 1 scenario, 200 max VUs, 10m30s max duration (incl. graceful stop):
* load: Up to 40.00 iterations/s for 10m0s over 3 stages (maxVUs: 200, gracefulStop: 30s)
✓ status code is 200
✓ node is kind-control-plane
✓ namespace is sample-app
✓ pod is sample-app-*
✓ checks.........................: 100.00% ✓ 72356 ✗ 0
data_received..................: 4.2 MB 7.0 kB/s
data_sent......................: 2.1 MB 3.5 kB/s
http_req_blocked...............: avg=19.7µs min=2µs med=7µs max=2.95ms p(90)=16µs p(95)=21µs
http_req_connecting............: avg=6.86µs min=0s med=0s max=2.24ms p(90)=0s p(95)=0s
✓ http_req_duration..............: avg=103.43ms min=452µs med=6.4ms max=5.29s p(90)=259.88ms p(95)=484.22ms
{ expected_response:true }...: avg=103.43ms min=452µs med=6.4ms max=5.29s p(90)=259.88ms p(95)=484.22ms
http_req_failed................: 0.00% ✓ 0 ✗ 18089
http_req_receiving.............: avg=99.34µs min=8µs med=77µs max=5.42ms p(90)=166µs p(95)=212µs
http_req_sending...............: avg=51.86µs min=9µs med=33µs max=18.76ms p(90)=71µs p(95)=96.59µs
http_req_tls_handshaking.......: avg=0s min=0s med=0s max=0s p(90)=0s p(95)=0s
http_req_waiting...............: avg=103.28ms min=418µs med=6.2ms max=5.29s p(90)=259.8ms p(95)=483.99ms
http_reqs......................: 18089 30.148343/s
iteration_duration.............: avg=103.9ms min=625.62µs med=6.98ms max=5.29s p(90)=260.24ms p(95)=485.04ms
iterations.....................: 18089 30.148343/s
vus............................: 0 min=0 max=69
vus_max........................: 200 min=200 max=200
running (10m00.0s), 000/200 VUs, 18089 complete and 0 interrupted iterations
load ✓ [======================================] 000/200 VUs 10m0s 00.41 iters/s

Yay! 🎉
This time, our service handled pod disruption beautifully and not a single request failed!

Thanks to the Pod Disruption Budget, a pod can only be evicted if all other pods are up, ensuring that at least 2 pods are available to handle the traffic.

This is what we call: High availability! 🚀

Wrapping up

That’s it! You can now stop and delete your Kind cluster.

kind delete cluster

To summarize, using Vertical Pod Autoscaler (VPA) we were able to:

  • Autoscale our service vertically, based on resource metrics;
  • Prevent downtime during pod eviction thanks to Pod Disruption Budget.

Was it worth it? Did that help you understand how to implement Vertical Pod Autoscaler in Kubernetes?

If so:

  1. Let me know in the comments below! 👇
  2. Don’t forget to hit that subscribe button! ✅
  3. Follow me on Twitter, I’ll be happy to answer any of your questions and you’ll be the first ones to know when a new article comes out! 👌

Bye-bye! 👋
