Alex Xu delivers a masterclass in distributed systems engineering by revealing a counterintuitive truth: the most effective way to eliminate latency isn't to make things faster, but to stop them from starting over. In a landscape where serverless computing is often sold on the promise of infinite scale, Xu exposes the hidden tax of "cold starts"—the delay when code must initialize from scratch—and details how Cloudflare engineered a solution that reduced these delays by a factor of ten. This isn't just a technical deep dive; it's a case study in how understanding traffic patterns can outperform raw computational power.
The Broken Promise of Hiding Delays
Xu begins by dissecting the original strategy Cloudflare employed in 2020, which relied on a clever timing trick. The team masked initialization delays by pre-warming code during the Transport Layer Security (TLS) handshake, the security protocol that encrypts web traffic. "The original technique worked because Cloudflare could identify which Worker to start from the Server Name Indication (SNI) field in the very first TLS message," Xu writes. This approach was elegant because it utilized the inherent latency of network round-trips to hide the work of compiling code.
However, this solution was fragile, built on a specific temporal balance that has since collapsed. As Xu notes, "Over the past five years, this relationship broke down for two reasons." First, the code itself grew heavier; Cloudflare increased script size limits, meaning more data to transfer and compile. Second, the security protocol became more efficient. "TLS 1.3 reduced the handshake from three round-trips to just one round-trip," Xu explains. The very efficiency of modern security protocols stripped away the time buffer needed to hide the initialization lag. Critics might argue that simply optimizing the compilation speed would have been a more direct fix, but Xu's analysis suggests that without changing the architecture, the problem would merely resurface with the next software update.
The original solution no longer eliminated the problem because cold starts became longer and TLS handshakes became faster.
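The collapse of this timing trick is easy to see as arithmetic. The sketch below uses the article's round-trip counts (three for the old handshake, one for TLS 1.3) but hypothetical millisecond figures of my own choosing, purely to illustrate how the hiding window shrank below the cold-start cost:

```python
# Illustrative sketch only: the 80 ms RTT and 200 ms cold-start figures
# are hypothetical, not from Xu's article. Round-trip counts (3 -> 1)
# follow the article's description of the TLS 1.3 change.
def handshake_budget_ms(rtt_ms: float, round_trips: int) -> float:
    """Time the server can spend warming a Worker while the TLS
    handshake's network round-trips are still in flight."""
    return rtt_ms * round_trips

cold_start_ms = 200  # hypothetical compile + initialization time
budget_old = handshake_budget_ms(rtt_ms=80, round_trips=3)  # 240 ms
budget_tls13 = handshake_budget_ms(rtt_ms=80, round_trips=1)  # 80 ms

print(cold_start_ms <= budget_old)    # cold start fully hidden before
print(cold_start_ms <= budget_tls13)  # no longer hidden under TLS 1.3
```

With the older handshake, the round trips bought enough time to hide a 200 ms cold start; one round trip no longer does, and growing script sizes push the cold-start side of the inequality in the wrong direction as well.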
Sharding: The Power of Consistency
The core of Xu's argument is a pivot from optimization to routing. Instead of trying to make a cold start faster, Cloudflare decided to make cold starts unnecessary. The author describes a scenario where a low-traffic application receives one request every five hours across a 300-server data center, forcing a restart every single time. "The solution involves routing all requests for a specific Worker to the same server within a data center," Xu writes. By concentrating traffic, the code stays warm in memory, turning a 100% cold start rate into a near-zero one.
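The effect of concentrating traffic can be demonstrated with a small simulation. The scenario below uses the article's numbers (a request every five hours, 300 servers); the six-hour idle-eviction window is a hypothetical parameter I've added so the comparison is concrete:

```python
import random

# From the article: one request every five hours, 300 servers.
# Hypothetical assumption: a Worker is evicted after 6 hours idle.
IDLE_EVICTION_S = 6 * 3600
REQUEST_GAP_S = 5 * 3600
SERVERS = 300

def count_cold_starts(pick_server, n_requests: int) -> int:
    last_seen = {}  # server -> time this Worker last ran there
    cold = 0
    for i in range(n_requests):
        now = i * REQUEST_GAP_S
        s = pick_server(i)
        warm = s in last_seen and now - last_seen[s] <= IDLE_EVICTION_S
        if not warm:
            cold += 1
        last_seen[s] = now
    return cold

rng = random.Random(42)
# Random routing: each request may land on any of 300 servers.
random_cold = count_cold_starts(lambda i: rng.randrange(SERVERS), 1000)
# Sharded routing: every request for this Worker goes to the same server.
sticky_cold = count_cold_starts(lambda i: 0, 1000)
print(random_cold, sticky_cold)
```

Under random routing the Worker is almost always evicted by the time its next request arrives; under sharded routing only the very first request pays the cold-start cost.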
To achieve this without causing chaos when servers go offline, Xu details the implementation of a "consistent hash ring." This data structure maps both servers and Workers to a number line, ensuring that when a server is added or removed, only a tiny fraction of the Workers need to be reassigned. "When a server disappears from the ring, only the Workers positioned immediately before it need reassignment," he notes. This stability is crucial; if the system constantly reshuffled assignments, the benefits of keeping code warm would vanish. This approach mirrors the logic used in distributed caching systems, proving that sometimes the best new technology is a refined application of old principles.
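A minimal consistent hash ring captures the stability property Xu describes. This is a generic textbook sketch, not Cloudflare's implementation; each Worker is owned by the first server clockwise from its hash point, so removing a server reassigns only the Workers in that server's arc:

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    """Map a key to a stable point on the ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers):
        self.points = sorted((ring_hash(s), s) for s in servers)

    def owner(self, worker: str) -> str:
        """First server clockwise from the Worker's point on the ring."""
        keys = [p for p, _ in self.points]
        i = bisect.bisect_right(keys, ring_hash(worker)) % len(self.points)
        return self.points[i][1]

    def remove(self, server: str):
        self.points = [(p, s) for p, s in self.points if s != server]

servers = [f"server-{n}" for n in range(300)]
workers = [f"worker-{n}" for n in range(10_000)]

ring = HashRing(servers)
before = {w: ring.owner(w) for w in workers}
ring.remove("server-0")
moved = sum(1 for w in workers if ring.owner(w) != before[w])
print(f"{moved / len(workers):.2%} of Workers reassigned")
```

With 300 servers, losing one reassigns roughly 1/300 of the Workers; everything else keeps its warm server. Production rings typically add virtual nodes per server to even out load, which this sketch omits for brevity.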
Handling Overload Gracefully
The engineering challenge didn't end with routing; it shifted to managing what happens when a single server becomes a bottleneck. Xu outlines two approaches: a pessimistic one where the client asks for permission before sending data, and an optimistic one where the client sends the request immediately. Cloudflare chose the latter. "Cloudflare chose the optimistic approach for two reasons," Xu writes, noting that refusals are rare and that the system has a fallback mechanism.
When a server is overloaded, it doesn't just reject the request; it returns a "capability" that tells the client to handle the work locally. This is where the system's sophistication shines. Xu explains that the Workers runtime uses Cap'n Proto RPC to manage this handoff. "The RPC system recognizes that the capability actually points back to a local lazy Worker. Once it realizes the request would loop back to the shard client, it stops sending additional request bytes to the shard server and handles everything locally." This prevents the waste of bandwidth that would occur if large data payloads were sent to a server only to be bounced back. It is a graceful degradation strategy that prioritizes user experience over rigid system boundaries.
Forwarding a request to a warm Worker on another server is almost always faster than starting a cold Worker locally, and the fallback covers the rare cases where it is not.
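The shape of the optimistic protocol can be sketched in a few lines. This is a toy model of the decision logic only; the real system uses Cap'n Proto RPC and can stop a request mid-stream, which the sketch does not attempt to show. All names here (`RunLocally`, `ShardServer`, `send`) are my own illustrative inventions:

```python
from dataclasses import dataclass

@dataclass
class RunLocally:
    """Stand-in for the 'capability' an overloaded shard server returns,
    telling the client to execute the Worker itself."""
    worker: str

class ShardServer:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.active = 0

    def handle(self, worker: str, body: bytes):
        # Optimistic protocol: the client sent the request without
        # asking permission first.
        if self.active >= self.capacity:
            # Overloaded: hand back a capability instead of the result.
            return RunLocally(worker)
        self.active += 1
        return f"{worker} ran warm on shard ({len(body)} bytes)"

def send(server: ShardServer, worker: str, body: bytes):
    result = server.handle(worker, body)
    if isinstance(result, RunLocally):
        # Fallback: cold-start the Worker on the local server instead.
        return f"{result.worker} ran cold locally"
    return result

shard = ShardServer(capacity=1)
print(send(shard, "worker-a", b"payload"))  # warm on the shard server
print(send(shard, "worker-a", b"payload"))  # shard full -> local fallback
```

The pessimistic alternative would add a permission round trip to every request to avoid a rare refusal; the optimistic design pays nothing in the common case and degrades to a local cold start only when the shard is genuinely overloaded.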
The Bottom Line
Xu's piece succeeds because it reframes a performance problem as a distribution problem, demonstrating that 99.99% reliability is achieved not by making the fast faster, but by ensuring the slow never happens. The strongest part of the argument is the empirical evidence: a 10x reduction in eviction rates achieved by sharding only 4% of requests, leveraging the power law distribution of internet traffic. The biggest vulnerability lies in the complexity introduced; as Xu admits, supporting nested Worker invocations requires serializing execution context across servers, adding a layer of architectural fragility. For engineers and architects, the takeaway is clear: in distributed systems, the most efficient path forward is often to stop fighting the network and start working with it.
Ultimately, Xu's analysis makes a persuasive case that in the era of serverless computing, the most valuable resource is not raw CPU cycles but memory locality. By shifting the focus from optimizing initialization speed to optimizing request routing, Cloudflare achieved an improvement that brute-force optimization could never match. The lesson for the industry is that architectural elegance often trumps incremental efficiency gains.