Running Thousands of Millisecond Headless Browsers on a Single (Cloud) Server

Headless browsers have become a fundamental cog in the AI machine, running as a cloud service for running JavaScript code, scraping websites, automating testing, and integrating with AI agents. A headless browser is, however, a resource-hungry, heavy beast. In this article we show how Unikraft Cloud tech allows for deploying 1000s of these on a single server, while ensuring they can be started in 10s of milliseconds.

Mar 17, 2025 · 6 min read

While browsers are indispensable on desktops, their heavyweight nature poses a significant challenge in cloud deployments. The familiar memes about Chrome or Firefox consuming excessive memory and resources underscore this issue.

The question arises: can we afford to use such heavy browsers in the cloud?

Contrary to the perception that browsers are too heavy for cloud deployments, headless browsers, which can be run without a GUI interface, are set to play, and already are doing so, a pivotal role in the AI/agentic space, automating interaction with web applications.

Browsers can be run in the cloud “headless,” i.e., without a GUI interface, via automation tools such as Playwright and Puppeteer. Full-fledged headless browsers such as Chromium or Firefox are essential for automating interaction with web applications, as they allow running JavaScript code, full-page rendering (including CSS), taking screenshots, inspecting elements, and simulating user interactions — operations that are impossible using smaller CLI web clients such as curl or wget.

So, we need to run full-fledged headless browsers in the cloud, and potentially large fleets of them, to meet the significant and increasing demand of the AI agentic world. The issue is that a headless browser:

Takes a long time to start, so best left running forever
Consumes significant resources when running, so best shut down when idle.

To square that circle, Unikraft Cloud provides the ability to scale headless browsers up and down in a matter of milliseconds, as requests come in and then die down. To boot, and to save your wallet from collapsing, in can cram thousands of these on single servers, providing unparalleled levels of efficiency and superior units of economics.

To explain how this is done, let’s first consider the typical workflow for a request going to a headless browser:

The client makes a request, for example to take a screenshot of a webpage, to retrieve the element tree of a page or to extract the contents of a specific element.
A headless browser instance is fired up to serve the request.
The headless browser receives the request and does the necessary processing: it connects to the web page, pushes buttons, accesses menus, parses the web page, and creates a screenshot of the page.
The headless browser returns the response.
The headless browser either shuts down or keeps running to serve future requests.

For a typical cloud deployment (e.g., for AI agents), we need thousands (and potentially millions in the future) of headless browser instances. In addition, we want them to serve requests as fast as possible, which means not shutting them down, as doing so would incur severe delays when starting them up again — but keeping them on would consume large resources, which would reduce the size of the fleet we can deploy (or severely increase costs) … round and round we go.

Unikraft Cloud breaks this deadlock by providing the ability to scale (heavyweight) headless browsers down in milliseconds, bring them back up in millisecond, and host thousands of instances on a single server. It does so by combining a number of high performance features including scale-to-zero, snapshotting, load balancing, and autoscaling — enabling the rapid serving of requests from many headless browser instances, all at a fraction of the cost of other cloud platforms.

To make this more tangible, here’s a demo of this in action, where we deploy 64 headless Chromium instances and query them concurrently:

Here are the takeaways from the video:

Unikraft Cloud ensures that no resources are used when no requests are coming in; instances are immediately (in milliseconds) scaled to zero and shut down when not used, consuming no CPU nor memory resources. Instances are not gone however: they are (very efficiently) snapshotted, and thus their state saved.
When a new request comes in, Unikraft Cloud resumes an instance from a snapshotted state in a mere 50 milliseconds or less, with multiple instances servicing a given public endpoint URL. The Unikraft Cloud load balancer then distributes requests among various instances, allowing concurrent serving.

To further optimize this deployment, we used custom-made and trimmed the Chromium image to the minimal necessary components, allowing each instance to run with a relatively low 1.5GB of RAM — and we’re currently working on further optimizations.

What happens when there are more requests than the number of instances? There are two scenarios to consider:

In the typical case of intermittent requests (i.e., request, pause, request, pause), then there is no issue, as Unikraft Cloud quickly resumes an instance, the instance serves the request, gets suspended, and then resumes for the following request.
There may be, however, large request bursts that may overflow the 64 load-balanced instances; in this scenario, our platform provides millisecond autoscaling, allowing it to quickly fire up new instances to handle the spike in traffic, and to then to quickly scale them back down once the burst is over.

What about state for all of those headless browsers? How much space/storage does the platform need? The answer is, perhaps surprisingly, not much at all: any new request is served from the same initial snapshot created after the instance was started the first time (e.g., each time we do a new release of the image); in other words, we can create ensure that all instances are spawned from that single snapshots (though clearly the platform also allows for using multiple images/snapshots as basis for spawning instances).

In addition, depending on resource availability and needs, you can choose between three types of snapshots:

in-memory snapshots: the fastest option, but consumes memory.
on-disk snapshots: slower (but still milliseconds to restore!), but doesn’t consume memory and is persistent.
compressed on-disk: to reduce storage needs.

Interested? Take it out for a spin! Create a (free) account by clicking on the sign up button below, and go to our examples repository (we have deployments available with Chromium, Firefox, and WebKit). Follow the instructions and enjoy millisecond headless browsers on Unikraft Cloud. If you’d like to learn more on a call, including more details about how it works, additional features or a potential deployment, you can book a call.

If you have comments or questions please drop by our Discord serer, write to support@kraft.cloud or DM me on LinkedIn.

Get early access to the Unikraft Cloud

If you want to find out more about the tech behind Unikraft Cloud read our other blog posts, join our Discord server and check out the Unikraft’s Linux Foundation OSS website . We would be extremely grateful for any feedback provided!

Sign-up now

Platform Features

Dive Deeper

Reference

Running Thousands of Millisecond Headless Browsers on a Single (Cloud) Server

Get early access to the Unikraft Cloud

Related Posts

The History of Unikraft

Serverless is a Broken Promise - Time to Fix it!

Unikraft and WebAssembly: Friends or Foes?

Introducing Unikraft Cloud