
Millisecond Scale-to-Zero and the Quest to Never Pay for Idle Again

Your workloads are intermittent, but your bill is always on? On Unikraft Cloud, a simple `-0` flag is enough to have any of your apps scale-to-zero. No traffic? No running instance.

7 min read

Traffic on the Internet is intermittent, and unpredictably so: peaks of high request rates are followed by periods of little to no traffic. Contrast this with your billing, which is often always on: it is precisely this dissonance that we’re looking to solve with Unikraft Cloud.

The high-level goal is clear: if there’s no traffic, would it be possible to have a mechanism that detects this and automatically scales your app to zero? And then detects when traffic comes back and wakes the app back up, quickly enough that your users don’t notice (i.e., within the milliseconds of an Internet round-trip time)?

In traditional cloud providers, the answer is a categorical no: you’d have to turn to FaaS offerings, and even then, those are not silver bullets, since they come with issues like cold starts, limits on run duration, limitations on what apps and languages you can run (they are, after all, function-based, not container/VM-based), memory limits, etc. Outside of the FaaS world, start times can range from seconds to minutes, so implementing a fast scale-to-zero mechanism is next to impossible.

But Don’t Cloud Providers Already Offer Scale to Zero?

To be clear, many cloud providers do offer scale to zero, but the definition of what they offer and what this mechanism actually provides is a bit loose.

To a first approximation, any platform that can transparently shut down your app when it’s idle and can then transparently wake it up when traffic for it arrives once again can claim to provide a scale to zero mechanism.

But let’s zoom out and think about the high-level purpose of scale to 0: to make sure that you don’t pay for idle when your app isn’t receiving any traffic, and that when traffic does arrive, your app’s users don’t notice that a scale-to-zero mechanism is in place. In short, what we’d like is:

  • Scale to 0: The cloud platform should quickly, in milliseconds, detect when an app is idle and immediately put it to sleep so you don’t get overcharged. The reality of current offerings, however, is that this detection can take seconds or even minutes to actually happen; you’re getting billed for your cloud provider’s overhead.
  • Scale to 1: When traffic for your app shows up again, the platform should detect it, wake up your app, and have your app respond, all ideally within a small fraction of the request’s RTT, so that your end user doesn’t notice your app was ever sleeping. Once again, current offerings fall short of this, often taking seconds to bring your app up, causing a degraded experience for your users.

Breaking out of the Status Quo

Fundamentally, what we are aiming for is a scale-to-zero mechanism (and subsequent scale from 0 to 1) that can operate on the scale of milliseconds, to ensure that the mechanism is completely transparent to end users. This means that the instances that run your apps have to be able to cold start in milliseconds (you might want to read this blog post about such fast cold starts on Unikraft Cloud).

But before I dive into how this can be achieved, let me show that it already exists on Unikraft Cloud and can be easily enabled: a simple -0 flag to our OSS kraft CLI tool is enough to tell the platform that an instance should scale to zero:

Terminal window
kraft cloud deploy -0 -p 443:8080 .
Terminal window
[] Deployed successfully!
────────── name: nginx-192iw
────────── uuid: eaaa52ea-21b1-4ab8-a7fe-f1eef0a29de0
───────── state: running
─────────── url: https://falling-star-102sxpqm.fra0.kraft.host
───────── image: nginx@sha256:b9000d409c2217ff1f61d40a9b99fefbbd7afeb0f9aa9d6263ca47991597bde8
───── boot time: 21.32 ms
──────── memory: 128 MiB
─ service group: falling-star-102sxpqm
── private fqdn: nginx-192iw.internal
──── private ip: 172.16.6.5
────────── args: /usr/bin/nginx -c /etc/nginx/nginx.conf

And that’s it! Now we have an NGINX instance running with scale to 0. But wait, why is its state set to running if it has no traffic and we passed the -0 flag? In fact, when deployed, the instance’s state is set to running, but since it has no traffic, it then immediately gets scaled to 0 (in this case after 500 milliseconds of traffic inactivity, but this is configurable). We can easily check that the instance has been scaled to 0 by using the kraft cloud instance ls command:

Terminal window
kraft cloud instance ls
Terminal window
NAME         FQDN                               STATE    CREATED AT    IMAGE          MEMORY   ARGS                              BOOT TIME
nginx-192iw  falling-star-102sxpqm.fra0.kra...  standby  1 minute ago  nginx@sha2...  128 MiB  /usr/bin/nginx -c /etc/nginx/...  21.32 ms

In this output you can see that the instance’s current state is set to standby, meaning that it’s now dormant, consuming no resources, and waiting for the next request to come in before waking back up. You can test that this is the case by putting a watch on this command (e.g., watch --color -n 0.5 kraft cloud inst ls), pointing your browser to the instance’s URL, and seeing that the instance immediately replies, and that its state goes from standby to running and back to standby, as in the snippet below.
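For example, with the deployment from above you could run something like the following in two terminals; curl here simply stands in for pointing your browser at the URL printed by kraft cloud deploy:

Terminal window
# terminal 1: watch the instance's state flip between standby and running
watch --color -n 0.5 kraft cloud inst ls
# terminal 2: send a request to the (currently standby) instance
curl -s -o /dev/null -w "%{http_code}\n" https://falling-star-102sxpqm.fra0.kraft.host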

Also note that in this case we used a web server (NGINX) as an example, but you can use the scale-to-zero flag with any other app or language from our examples repo.

Instances Galore?

Another benefit of scale-to-zero on Unikraft Cloud is that we can distinguish between two types of instances:

Terminal window
kraft cloud quotas
Terminal window
user uuid: <REDACTED>
user name: $KRAFTCLOUD_USER
active instances: ████████████████████████████████████ 1/16
total instances: ████████████████████████████████████ 8/64
active used memory: ████████████████████████████████████ 256 MiB/4.0 GiB
memory size limits: 16 MiB-2.0 GiB
exposed services: ████████████████████████████████████ 2/253
service groups: ████████████████████████████████████ 1/253
volumes: enabled
active volumes: ████████████████████████████████████ 2/253
used volume space: ████████████████████████████████████ 756 MiB/1.0 GiB
volume size limits: 1.0 MiB-1.0 GiB
autoscale: enabled
autoscale limit: 0-16
scale-to-zero: enabled

Active instances are instances that are up and running (i.e., not scaled to zero); these are limited by the maximum number of instances allowed on a particular tier. However, on Unikraft Cloud you can also have a larger number of total instances: the sum of live instances plus those that are scaled to zero. For example, in the listing above, you could have 16 active instances but also 64 - 16 = 48 additional scaled-to-zero ones.

This means that if your traffic is intermittent, or you can control how or when it arrives, you can effectively run more apps/instances than your current plan’s maximum number of active instances allows.

How Does it Work?

To achieve millisecond scale to zero, several components have to come together, all of which need to carry out their tasks fast. When an instance is scaled to zero (i.e., in the standby state) and a request for it arrives, the first thing that happens is that a custom front-end load balancer we built buffers the request and signals the platform’s controller.

We built the controller in-house, since we needed one that could react in milliseconds and scale to potentially thousands of instances. Upon receiving the notification, the controller identifies the instance the request corresponds to and immediately asks Firecracker, the virtual machine monitor we use, to wake the instance up.

The instance also has to wake up quickly, of course. To do so, Unikraft Cloud leverages extremely specialized virtual machines called unikernels, based on the Linux Foundation OSS Unikraft project. These unikernels provide the same strong, hardware-level isolation that VMs do (after all, they are VMs), but can cold start in a few milliseconds; you can read the blog post about cold starts for more details.

Finally, once up, the unikernel notifies the controller, which then signals the load balancer to send the buffered request to the (now running) instance so it can answer, all of this within a small fraction of an Internet RTT so that end users don’t notice.
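You can’t see these internal steps directly, but you can get a rough feel for the end result from the outside: using the example deployment from earlier and curl’s built-in timing, compare a first request that has to wake the standby instance against a second one that hits it while it’s still running. Both numbers should be dominated by the network RTT rather than by the wake-up itself.

Terminal window
# first request wakes the instance from standby; the second hits it while running
curl -s -o /dev/null -w "wake from standby: %{time_total}s\n" https://falling-star-102sxpqm.fra0.kraft.host
curl -s -o /dev/null -w "already running:   %{time_total}s\n" https://falling-star-102sxpqm.fra0.kraft.host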

A Brief Demo Video

At KubeCon 2024 in Paris we had a booth demo showing all of this in action; we made a 1-minute video you might want to check out.

If you’d like to try Unikraft Cloud out for yourself, you can sign up through the button below. If you have comments or questions, please drop by our Discord server, write to support@kraft.cloud, or DM me on LinkedIn.

Get early access to the Unikraft Cloud

If you want to find out more about the tech behind Unikraft Cloud, read our other blog posts, join our Discord server, and check out Unikraft’s Linux Foundation OSS website. We would be extremely grateful for any feedback provided!

Sign up now