
High Server Density in the Cloud Should be the Norm, not a Pipedream

In the cloud, running only 10-100 instances/virtual machines per server is common practice. In this post we explore why, and show that with a novel cloud stack designed from first principles it's possible to increase these numbers by orders of magnitude (thousands of instances on a single server).


As an ice breaker, I sometimes open talks at conferences with a simple question: roughly how many VMs do you think you can run on a single server?

When I present this, I first show only the question and have the audience yell out guesses. Surprisingly, but also logically enough, the highest number I ever got was about 100 VMs on a server. I say logically because when you take into account how heavyweight typical VMs can be (yes, I know, with distros like Alpine you can bring this down, but in most cases we’re talking in the GBs range), and the fact that in many deployments they run forever, even if idle, we start to see why a “measly” 64-core server might not be able to cope with that many VMs.

Are VMs a Must? And Must they be Heavy?

On public clouds, the hypervisor/VM model is the gold standard for providing the strong, hardware-level isolation needed for multi-tenancy, so a bit of a must, yes. In fact, in the public cloud almost all deployments* are done on top of a VM, whether you know it or not; even deploying a container on the cloud means that it’ll be running with a VM underneath.

  • Normally asterisks get relegated to the footer or end of a document, but I couldn’t resist promoting this one to here: there are, in some cases, deployments based on language-level isolation mechanisms, such as V8 isolates, but these provide weaker security guarantees than a hypervisor/VM. Apologies in advance for the digression.

So now to the second question: are virtual machines fundamentally heavy? Well, it depends on what you put in them, so fundamentally, no.

When deploying to the cloud, in most cases we have a huge advantage: we know exactly what application or service we want to deploy before we actually deploy it. This means that we could, in principle, build a custom operating system and distro for our app, if only we had a magic wand that could do this for us simply and in an automated way. What we’d have then would be a highly specialized VM containing only the code that our app needs to run, plus our app, and nothing more. In essence, we’d like to revert the following:

[Diagram: a traditional cloud VM stack (hypervisor, OS/kernel such as Linux, container runtime, libraries & packages, language-level isolation, application) alongside a specialized VM consisting of just the hypervisor and the application. Up to 90% of the VM’s codebase and overhead is used for the underlying cloud stack.]

The technical term for such a specialized VM is a unikernel. Unikernels are typically based on a library operating system, that is, a modular OS, in order to maximize the level of specialization of the resulting VM images. In our case, we created the Unikraft Linux Foundation project many years ago as a unikernel development kit to make it easy to build such unikernels. In addition to efficiency, one of the key tenets of Unikraft was to ensure POSIX compatibility so that unmodified Linux applications could run on it (something akin to Linus’ “don’t break userspace”). Unikraft supports something called binary compatibility mode, allowing users to provide a standard Linux ELF that then gets (transparently) built into a unikernel.
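As a rough sketch of what that workflow looks like in practice: you point kraft at a project directory containing your unmodified Linux binary and its configuration, and the specialized image gets built and deployed for you. The exact command shape below is an assumption based on the Unikraft/KraftCloud docs, so treat it as illustrative rather than canonical.

Terminal window
# From a project directory containing the application's ELF (and a Kraftfile
# describing it), build the unikernel image and deploy it in one step,
# using binary compatibility mode under the hood.
# Illustrative only; check the current kraft docs for exact flags.
kraft cloud deploy .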

OK, so enough chatter: what happens if we use unikernels? By way of example, if we take an NGINX server (ELF) and build a Unikraft unikernel for it, we end up with an image of about 1.6 MB on disk (the NGINX binary itself is roughly 1.1 MB):

Terminal window
kraft cloud image ls --all
NAME     VERSION    SIZE
nginx    latest     1.6 MB
[...]

When deployed, the image needs only 16 MiB of memory to actually run:

Terminal window
[] Deployed successfully!
────────── name: nginx-6cfc4
────────── uuid: 62d1d6e9-0d45-4ced-ad2a-619718ba0344
───────── state: running
─────────── url: https://long-violet-92ka3gk7.fra0.kraft.host
───────── image: nginx@sha256:fb3e5fb1609ab4fd40d38ae12605d56fc0dc48aaa0ad4890ed7ba0b637af69f6
───── boot time: 16.65 ms
──────── memory: 16 MiB
service group: long-violet-92ka3gk7
── private fqdn: nginx-6cfc4.internal
──── private ip: 172.16.6.4
────────── args: /usr/bin/nginx -c /etc/nginx/nginx.conf
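For completeness, a deployment like the one above boils down to a single command. The sketch below is hedged: the memory flag and image-name syntax are assumptions based on the KraftCloud docs and may differ in current releases.

Terminal window
# Deploy the nginx unikernel image with 16 MiB of memory.
# Flag name (-M/--memory) is assumed; verify against the current CLI.
kraft cloud deploy -M 16 nginx:latest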

Towards High Server Density

So we’re no longer in the realm of the GB-sized VM, but what can we do with such a unikernel beyond these basic stats? The first thing is to just take it out for a (stress test) spin:
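A minimal way to do that is with an off-the-shelf HTTP load generator such as wrk, pointed at the URL from the deployment above; the tool choice and parameters here are illustrative assumptions, not necessarily what we used.

Terminal window
# Illustrative only: hammer the deployed unikernel with 8 threads and
# 64 concurrent connections for 30 seconds (tool and parameters assumed).
wrk -t8 -c64 -d30s https://long-violet-92ka3gk7.fra0.kraft.host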