
Millisecond Autoscale for Apps: a Pipedream?

In the cloud, you can have fast autoscale for functions, or slow autoscale for apps, but not both. In this blog post we show that on Unikraft Cloud millisecond autoscale for apps is not only possible, but available now.

· 4 min read

In Alice in Wonderland, the King suggests to “Begin at the beginning”, so let’s start with Wikipedia’s definition of autoscale:

[…] a method used in cloud computing that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load

Easy enough, but what I feel is missing from this definition when it talks about “load” is the question of timescales. On the Internet, load bursts happen quickly, sometimes within seconds or even milliseconds; it would make sense, then, for the autoscale mechanisms designed to cope with such bursts to operate on those same timescales.

But they don’t. If you have ever set up autoscale mechanisms on hyperscaler infra, you know that it takes seconds, and often minutes, for new instances to come up, so a purely reactive approach isn’t always feasible. That leaves a few sub-optimal workarounds:

  1. Use scheduled autoscaling, which assumes you know when bursts will happen and how large they will be (a big assumption).
  2. Use predictive autoscaling, but using heuristics to predict the future is tricky and not always accurate.
  3. Use FaaS offerings like AWS Lambda, where autoscale happens quickly (i.e., it is reactive), but you then have to cope with other limitations like cold starts and having to run under a functions model.
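To put the timescale mismatch in numbers, here is a small back-of-the-envelope sketch. It estimates how many requests arrive beyond current capacity while a new instance is still booting. The request rates and boot times are made-up illustrative values, not measurements, and the function name is hypothetical.

```python
def backlog_during_startup(burst_rps: float, capacity_rps: float, startup_seconds: float) -> float:
    """Requests arriving beyond current capacity while a new instance boots.

    Simplifying assumption: the burst rate is constant for the whole boot time.
    """
    excess_rps = max(0.0, burst_rps - capacity_rps)
    return excess_rps * startup_seconds


# Hypothetical numbers, for illustration only.
burst_rps = 5_000      # requests per second during the burst
capacity_rps = 1_000   # what the running instances can already absorb

for label, startup in [("60 s boot", 60.0), ("1 s boot", 1.0), ("20 ms boot", 0.020)]:
    backlog = backlog_during_startup(burst_rps, capacity_rps, startup)
    print(f"{label}: ~{backlog:.0f} requests queued or dropped")
```

The backlog grows linearly with boot time, which is why shaving startup from minutes to milliseconds changes what a reactive approach can do.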

Back To First Principles?

This whole landscape feels like we’re tying ourselves in knots. If we were to go back to first principles, we could perhaps outline the requirements for an ideal autoscale mechanism as follows:

  • Reactive in timescales of milliseconds, both for scaling out and scaling in
  • Based on apps, so not limited to functions
  • Provide strong, hardware-level isolation (i.e., not merely language-level isolation)

The bottom line is that the tech currently used on cloud platforms (based on general-purpose OSes, standard distros and slow-to-react controllers) makes it really hard to achieve millisecond reactivity.

A Different Way Forward

In order to have millisecond autoscale, a basic building block is that apps on a cloud platform need to be able to start and stop very quickly. To achieve this, we leverage years of research and open source work on unikernels: extremely specialized, fast yet strongly isolated virtual machines that can cold start in milliseconds. A second, fundamental building block is a high-performance controller and proxy, built from scratch, that can react within milliseconds and scale to thousands of instances. On top of that, we apply some mods and tweaks to the underlying network interfaces and host to make sure everything runs fast.

The end result is that on Unikraft Cloud, scale out (the process of adding instances to cope with increased load) happens in milliseconds, so you can transparently and effortlessly handle load increases, including traffic peaks. No more headaches due to slow autoscale, like keeping hot instances around to deal with peaks, coming up with complex predictive algorithms, or other painful workarounds; you can just turn autoscale on and let Unikraft Cloud handle your traffic increases and peaks. And all of this while using apps/containers, not functions, all of them strongly isolated (unikernels are, after all, VMs, albeit extremely specialized ones).

Millisecond App Autoscale in Practice

Setting up autoscale on Unikraft Cloud is fairly simple. To show this, we’ll use NGINX as the app to deploy. First, we need to deploy an instance of it, and we’ll use the kraft cloud deploy -p 443:8080 one-liner command to do so:

[] Deployed successfully!
────────── name: nginx-4d7u3
────────── uuid: 8fda2a70-6a32-4b5e-8900-4395b33d02d7
───────── state: running
─────────── url: https://small-leaf-rafirkw7.fra0.kraft.host
───────── image: nginx@sha256:389bfa6be6455c92b61cfe429b50491373731dbdd8bd8dc79c08f985d6114758
───── boot time: 20.36 ms
──────── memory: 128 MiB
─ service group: small-leaf-rafirkw7
── private fqdn: nginx-4d7u3.internal
──── private ip: 172.16.6.5
────────── args: /usr/bin/nginx -c /etc/nginx/nginx.conf

It’s worth noting the cold start of just 20 milliseconds, which is fundamental to having fast scale out. Next, we’ll configure how we want autoscale to function: in this case we’ll configure it to use 0-8 instances, and scale out and in based on CPU (though it can also be done based on network metrics):

kraft cloud scale init small-leaf-rafirkw7 --master nginx-4d7u3 --min-size 0 --max-size 8
kraft cloud scale add small-leaf-rafirkw7 --name scale-out-policy --metric cpu --adjustment percent --step 600:800/50 --step 800:/100
kraft cloud scale add small-leaf-rafirkw7 --name scale-in-policy --metric cpu --adjustment percent --step :50/-50
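To make the step syntax a bit more concrete, here is a minimal Python sketch of how such step policies might be evaluated. This is an illustration under assumptions, not Unikraft Cloud’s actual controller logic. We assume that each --step value reads as lower:upper/adjustment, that the lower bound is inclusive and the upper bound exclusive, that a percent adjustment is relative to the current instance count and rounded away from zero, and that the result is clamped to --min-size and --max-size. The names Step, parse_step and desired_size are hypothetical.

```python
import math
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    lower: Optional[float]  # assumed inclusive; None means unbounded below
    upper: Optional[float]  # assumed exclusive; None means unbounded above
    adjustment: float       # percent change applied to the current size


def parse_step(spec: str) -> Step:
    """Parse a 'lower:upper/adjustment' string such as '600:800/50' or ':50/-50'."""
    bounds, adjustment = spec.split("/")
    lower, upper = bounds.split(":")
    return Step(
        float(lower) if lower else None,
        float(upper) if upper else None,
        float(adjustment),
    )


def desired_size(current: int, metric: float, steps: List[Step],
                 min_size: int, max_size: int) -> int:
    """Return the instance count after applying the first matching step."""
    for step in steps:
        above = step.lower is None or metric >= step.lower
        below = step.upper is None or metric < step.upper
        if above and below:
            delta = current * step.adjustment / 100.0
            # Assumption: round away from zero so small groups still move by one.
            change = math.ceil(delta) if delta > 0 else math.floor(delta)
            return max(min_size, min(max_size, current + change))
    # No step matched: keep the current size, within the configured limits.
    return max(min_size, min(max_size, current))


scale_out = [parse_step("600:800/50"), parse_step("800:/100")]
scale_in = [parse_step(":50/-50")]

print(desired_size(2, 700, scale_out, 0, 8))  # metric falls in the 600:800 step
print(desired_size(4, 900, scale_out, 0, 8))  # metric falls in the 800: step
print(desired_size(4, 20, scale_in, 0, 8))    # metric falls in the :50 step
```

Note that in this toy model a percentage of zero instances is still zero, so waking up from a size of 0 is not covered here; how the platform handles that case is outside the scope of the sketch.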

And that’s it! You now have millisecond autoscale operating on apps. To see how fast and reactive it is, check out this brief video showing different traffic loads and how Unikraft Cloud’s autoscale reacts to them:

Unikraft Cloud autoscale

Get early access to the Unikraft Cloud

If you want to find out more about the tech behind Unikraft Cloud, read our other blog posts, join our Discord server and check out Unikraft’s Linux Foundation OSS website. We would be extremely grateful for any feedback!

Sign-up now