
The Idle (and Over-provisioned) Cloud

We generally like to think of the cloud as handling gazillions of requests per second, but a significant part of it is over-provisioned, and so idle. In this post we look into what we term “The Idle Cloud”: what it is, why it exists, and how to fix it.


Let’s start with the elephant in the room: while the cloud has certainly revolutionized the way we do business, and handles unimaginable amounts of traffic every day, it is massively over-provisioned, and so often idle. There are many reports and accounts attesting to this, but let’s take Datadog’s recent State of Cloud Costs report, which nicely quantifies the problem:

Idle resources account for a majority of container costs.

So “idle resources account for a majority of container costs” – their words, not mine. And the problem isn’t limited to “just” containers; reports along these lines are common across cloud services.

So we have what we term “The Idle Cloud”, but where is all of this idling coming from? Some of it of course comes from people allocating resources, especially for dev purposes, rarely or never using them, and then forgetting to turn them off. There are many tools to help identify such cases and to shut down unused resources.
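As a rough illustration of what such tools do, here’s a minimal sketch using boto3 that flags running EC2 instances whose average CPU utilization stayed low over the last two weeks. It assumes AWS credentials and a region are already configured, and the 5% threshold and 14-day window are illustrative assumptions, not recommendations:

```python
# Minimal sketch: flag running EC2 instances whose average CPU stayed low
# over the past two weeks. Assumes boto3 is installed and AWS credentials
# and a default region are configured. Threshold/window are illustrative.
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")
now = datetime.now(timezone.utc)

paginator = ec2.get_paginator("describe_instances")
pages = paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)

for page in pages:
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            stats = cloudwatch.get_metric_statistics(
                Namespace="AWS/EC2",
                MetricName="CPUUtilization",
                Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
                StartTime=now - timedelta(days=14),
                EndTime=now,
                Period=86_400,            # one datapoint per day
                Statistics=["Average"],
            )
            points = stats["Datapoints"]
            if not points:
                continue
            avg_cpu = sum(p["Average"] for p in points) / len(points)
            if avg_cpu < 5.0:             # illustrative idleness threshold
                print(f"{instance_id}: {avg_cpu:.1f}% avg CPU -- likely idle")
```

Real cost tools layer network and disk metrics, tagging, and automated shutdown on top of this same basic idea.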

But this isn’t where the big problem lies, because much of the idleness comes from production systems that are over-provisioned to cope with peak traffic, or to ensure that services respond with minimal delay. We do this because traffic (and traffic bursts) arrives on millisecond timescales, but the cloud was never built to react that fast.

The fundamental reason behind the Idle Cloud is the orders-of-magnitude dissonance between traffic demands, which arrive in milliseconds, and the cloud services meant to handle them, which take seconds or even minutes to come up.

[Figure: cloud traffic operates on millisecond timescales, while cloud infra operates on seconds-to-minutes timescales.]

More concretely, this dissonance manifests itself as a range of common, constantly recurring cloud infra issues, which are typically solved by throwing money at the problem, aka over-provisioning:

| Issue | Current (Over-provisioning) Workaround |
| --- | --- |
| Extremely slow cold starts | Keep instances warm by artificially pinging them (see the sketch after this table), or pay more to “boost” cold start times. |
| Slow scale to zero | Keep instances on longer (minutes) than they need to be. |
| Slow auto-scale | Always provision for peak capacity, or use a FaaS platform and then deal with a number of other shortcomings (e.g., run duration limits, limited language support, functions only, etc.). |
| Low server density | Keep idle instances always on and use many more servers than needed. |
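To make the first of these workarounds concrete, here is a minimal sketch of the classic keep-warm hack: a loop that pings a service’s health endpoint every few minutes so the platform never scales it to zero. The URL and interval below are hypothetical placeholders; the point is that every one of these requests exists purely to mask slow cold starts:

```python
# Minimal sketch of the "keep-warm" workaround: ping a service on a timer
# so it is never scaled to zero. URL and interval are placeholders.
import time
import urllib.request

WARM_URL = "https://example.com/healthz"  # hypothetical health endpoint
INTERVAL_SECONDS = 240                    # ping before the idle timeout hits

while True:
    try:
        with urllib.request.urlopen(WARM_URL, timeout=10) as resp:
            print(f"keep-warm ping -> HTTP {resp.status}")
    except OSError as err:                # covers URLError and timeouts
        print(f"keep-warm ping failed: {err}")
    time.sleep(INTERVAL_SECONDS)
```

Multiply this by every service in an organization and you get a steady stream of synthetic traffic whose only purpose is to keep capacity idling.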

Many people even resort to renting bare-metal servers for intermittent workloads (the ultimate form of over-provisioning: practically owning a server and running it forever) because that’s cheaper than the workarounds above. The cloud is not only idle but also messy – we covered the latter in our 2-part post on the cloud engineer’s dilemma.

Cloud Traffic Has Millisecond Semantics. Shouldn’t Cloud Platforms, Too?

Asking cloud infra designed to operate in seconds or minutes to service traffic that is intermittent or bursty at millisecond timescales is bound to cause problems. At Unikraft, we believe it’s time to disrupt the current way we build cloud infra, as well as the technology we build it with, so that it can be reactive at millisecond timescales. We call this true serverless, because the promise of serverless was that devs would just deploy apps and services and never have to even think about infra issues (the “less” in serverless). That promise has unfortunately been largely broken: devs are very much aware that there are servers and cloud infra underneath, and, worse, there’s little they can control or do about fixing the root problem (nor should they have to!) other than over-provisioning. One could even argue that the incentives for cloud infra providers to fix these issues are misaligned, in that they make more money out of these inefficiencies, but let’s table that for another day.

Our Unikraft Cloud platform provides true millisecond semantics: cold starts, autoscale, and scale to zero are all extremely snappy, all the while providing strong, hardware-level isolation via virtual machines (the gold standard for multi-tenancy on public clouds) and the ability to deploy complex services based on Dockerfiles, not just functions. To illustrate, here’s what happens with scale to zero on current cloud infra (top) and on Unikraft Cloud (bottom):

[Figure: scale-to-zero timelines. Without Unikraft Cloud: a slow, costly initialization on cold start and a costly cooldown after the actual workload. With Unikraft Cloud: near-instant cold start and cooldown, so compute closely tracks the actual workload.]

Quantitatively, here are a couple of stats for cold starts:

Another important result of these millisecond semantics is massive server density. Imagine you have a large user base of 1 million users, but with a long tail of inactive ones. With normal infra, you’d have to provision for 1 million users; with a millisecond, reactive platform, you could provision for just the maximum number of concurrently active users. In many cases, the difference between that number and the total number of users is several orders of magnitude, so with Unikraft Cloud you could service your 1M users with a fraction of the servers you’d normally need. In some stress tests, we’ve had as many as 100,000 instances running on a single, standard server. I won’t cover how we achieve millisecond semantics in this post; for a high-level explanation, check out the how it works page.
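As a back-of-envelope illustration of that claim, here is a quick calculation. All inputs are assumptions made up for illustration (1% concurrent activity, 100 always-on instances per server), except the 100,000 instances-per-server stress-test figure mentioned above:

```python
# Back-of-envelope comparison of servers needed. All inputs are assumptions
# for illustration, except reactive_density (the stress-test figure above).
import math

total_users = 1_000_000
concurrent_fraction = 0.01        # assume only 1% of users active at once

# Conventional provisioning: an always-on instance per user/tenant.
conventional_density = 100        # assumed always-on instances per server
conventional_servers = math.ceil(total_users / conventional_density)

# Millisecond-reactive provisioning: instances only for active users.
active_users = int(total_users * concurrent_fraction)      # 10,000
reactive_density = 100_000        # instances per server (stress test above)
reactive_servers = max(1, math.ceil(active_users / reactive_density))

print(f"conventional: {conventional_servers:,} servers")   # 10,000
print(f"reactive:     {reactive_servers:,} server(s)")     # 1
```

Under these (admittedly rough) assumptions, the server count drops by roughly four orders of magnitude; the exact ratio will depend entirely on your workload’s activity profile.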

Get early access to Unikraft Cloud

Unikraft Cloud has free early access, so if you find it intriguing and would like to get a taste of what a next-generation compute cloud platform looks like:

Sign up now