Most engineering case studies focus on the glory of the launch; Alex Xu focuses on the terrifying fragility of the system just before it breaks. This piece is notable not because it celebrates a record-breaking 60 million concurrent users, but because it meticulously dissects how a platform nearly collapsed under its own weight and had to fundamentally rewrite its DNA to survive. For busy leaders, the value here isn't the technical jargon—it's the stark lesson that simply adding more servers is a losing strategy when your architecture is fundamentally flawed.
The Gateway Bottleneck
Xu begins by dismantling the assumption that scale is linear. He notes that before the 2023 Cricket World Cup, the platform was already straining at 25 million users on self-managed clusters. The challenge wasn't just volume; it was the introduction of a "Free on Mobile" initiative that exploded the user base overnight. "Hotstar's engineers knew that simply adding more servers would not be enough," Xu writes, highlighting a critical pivot point where infrastructure strategy must evolve or fail.
The author's framing of the Content Delivery Network (CDN) is particularly sharp. Instead of acting merely as a cache for video files, the CDN nodes were forced to perform heavy lifting as API gateways, verifying security tokens and processing requests. "The system began to hit limits on how many requests it could process per second," Xu observes. This is a classic case of a layer designed for speed being overloaded with logic. The solution—separating cacheable data like scorecards from non-cacheable user sessions—wasn't a hardware upgrade but a logical reorganization. By creating a dedicated CDN domain for static data, they freed up edge capacity. Xu correctly identifies that "not all API requests were equal," a distinction that saved the platform from a cascade failure.
Critics might argue that this level of granular optimization is only possible for a company with infinite engineering resources, but the principle of separating stateless, cacheable traffic from stateful, dynamic requests is universally applicable. The real insight here is that efficiency often comes from doing less, not more.
"Each extra rule increases processing time, and by removing unnecessary ones, the platform was able to save additional compute resources."
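The "not all API requests are equal" distinction boils down to a routing decision at the edge. A minimal sketch of that split, with all domain names and path prefixes invented for illustration (the article does not show Hotstar's actual routing rules):

```python
# Route cacheable, user-agnostic data (scorecards, standings) to a dedicated
# static CDN domain; keep stateful traffic (sessions, auth) on the dynamic
# path that performs token verification. All names here are hypothetical.

CACHEABLE_PREFIXES = ("/scorecard", "/standings", "/match-stats")

def route(path: str) -> str:
    """Return the logical origin that should serve a request path."""
    if path.startswith(CACHEABLE_PREFIXES):
        return "static-cdn"   # served straight from edge cache, no per-user logic
    return "dynamic-gateway"  # token checks and per-user processing

assert route("/scorecard/final") == "static-cdn"
assert route("/session/refresh") == "dynamic-gateway"
```

Fewer rules on the dynamic path is exactly the point of the quote above: every request that matches the static prefix list never touches the expensive processing pipeline at all.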
The Hidden Cost of Network Topology
Moving deeper into the stack, Xu shifts the focus to the invisible plumbing of the cloud: Network Address Translation (NAT) gateways. This section is a masterclass in diagnostic rigor. The team discovered a bizarre imbalance where one cluster was consuming 50 percent of the total NAT bandwidth while running at only 10 percent of expected peak load. "This meant that if traffic increased five times during the live matches, the gateways would have become a serious bottleneck," Xu explains. The fix was counter-intuitive: instead of fewer, larger gateways, they deployed one per subnet to distribute the load. This granularity prevented a single point of failure from taking down the entire region.
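The back-of-envelope math behind that diagnosis is worth making explicit. A cluster consuming 50 percent of total NAT bandwidth at a fraction of peak load extrapolates to well past 100 percent once match-day traffic multiplies, which is why per-subnet gateways were needed. A sketch with illustrative numbers:

```python
# Extrapolate a cluster's share of total NAT bandwidth under a traffic
# multiplier. Xu's figures: 50% of NAT bandwidth at low load, and a 5x
# surge expected during live matches. Subnet names are invented.

def projected_share(current_share: float, traffic_multiplier: float) -> float:
    """Linearly project a cluster's NAT bandwidth share under more traffic."""
    return current_share * traffic_multiplier

need = projected_share(current_share=0.50, traffic_multiplier=5.0)
print(f"projected NAT share at 5x traffic: {need:.0%}")  # 250% -- a hard bottleneck

# The fix: one NAT gateway per subnet, so the load divides across gateways
# and no single gateway's bandwidth ceiling caps the whole region.
subnets = ["subnet-a", "subnet-b", "subnet-c", "subnet-d", "subnet-e"]
per_subnet = need / len(subnets)
print(f"per-gateway share across {len(subnets)} subnets: {per_subnet:.0%}")
```

The linear extrapolation is an assumption, but a conservative one: it is enough to show that the single-gateway layout fails before peak load arrives.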
The commentary on Kubernetes worker nodes is equally vital. Xu describes how bandwidth-heavy services were causing contention on individual nodes, with some consuming up to 9 gigabits per second. The solution involved a dual approach: upgrading to high-throughput nodes and using "topology spread constraints" to ensure only one gateway pod ran per node. This prevents the "noisy neighbor" problem where one service starves others of resources. "This ensured that no single node was overloaded and that network usage remained balanced across the cluster," Xu writes. It's a reminder that in distributed systems, isolation is just as important as raw power.
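The placement guarantee Xu describes is expressed declaratively in Kubernetes, but the logic it enforces is simple to state: no node may host two gateway pods. A toy scheduler making that constraint concrete (pod and node names invented):

```python
# Toy placement mirroring the effect of a topology spread constraint:
# each bandwidth-heavy gateway pod lands on a node that has none yet,
# so no node absorbs two pods' worth of traffic (Xu cites peaks near
# 9 Gbit/s for a single service).

def place_pods(pods: list[str], nodes: list[str]) -> dict[str, str]:
    """Assign each pod to a distinct node; fail loudly if nodes run out."""
    free = list(nodes)
    placement = {}
    for pod in pods:
        if not free:
            raise RuntimeError("no node without a gateway pod -- add nodes first")
        placement[pod] = free.pop(0)
    return placement

placement = place_pods(["gw-0", "gw-1", "gw-2"],
                       ["node-a", "node-b", "node-c", "node-d"])
assert len(set(placement.values())) == len(placement)  # one pod per node
```

In the real cluster this is a scheduling-time rule rather than an imperative loop, but the invariant is the same: isolation is enforced before contention can occur, not after.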
However, the migration to Amazon Elastic Kubernetes Service (EKS) reveals a new set of challenges. While moving the control plane to a managed service reduced operational fragility, it introduced API server throttling at scales beyond 400 nodes. "The Kubernetes API server, which coordinates all communication within the cluster, began slowing down and temporarily limiting the rate at which new nodes and pods could be created," Xu notes. The team's response—stepwise scaling in batches of 100 to 300 nodes—shows that even managed services have hard limits that require human ingenuity to navigate.
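The stepwise scaling approach is easy to sketch: instead of asking the control plane to register thousands of nodes at once, growth proceeds in bounded increments. A minimal version, with the batch ceiling taken from Xu's 100-to-300-node figure:

```python
# Grow a cluster toward a target node count in bounded batches, so the
# Kubernetes API server never absorbs one enormous registration burst.
# The 300-node ceiling follows Xu's reported batch sizes; the starting
# and target counts are illustrative.

def scale_plan(current: int, target: int, batch: int = 300) -> list[int]:
    """Return the sequence of intermediate node counts up to the target."""
    steps = []
    while current < target:
        current = min(current + batch, target)
        steps.append(current)
    return steps

plan = scale_plan(current=400, target=1500, batch=300)
print(plan)  # [700, 1000, 1300, 1500]
```

In practice each step would also wait for the API server to settle before the next batch, which is the human-in-the-loop ingenuity Xu credits; the sketch only shows the pacing.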
Architectural Abstraction and the End of the Line
The final act of the story addresses the limitations of the legacy setup: port exhaustion, IP address depletion, and the inability to use modern hardware like Graviton processors. The old architecture, built on KOPS and older Kubernetes versions, was hitting a wall. "With more than 800 services deployed, Hotstar was fast running out of available ports," Xu writes, illustrating how technical debt accumulates silently until it becomes a hard stop.
The introduction of "Datacenter Abstraction" is the piece's most forward-looking concept. Xu explains that this model treats a "data center" not as a physical building, but as a logical grouping of resources. This abstraction allows the system to scale without being tethered to the physical constraints of a specific subnet or hardware generation. It is a move from managing infrastructure to managing logic. "Every time a major cricket tournament or live event was about to begin, the operations team had to manually pre-warm hundreds of load balancers," Xu recalls of the old way. The new architecture automates this, turning a days-long manual ritual into a seamless, automated process.
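One way to picture the abstraction: capacity requests go to a logical grouping that spans subnets and hardware generations, rather than being pinned to one physical pool. A sketch under that assumption, with every class and field name invented (the article does not describe Hotstar's internal model):

```python
# A "datacenter" as a logical grouping of resources: an allocation request
# is satisfied across whichever pools have room, regardless of subnet or
# CPU architecture (e.g. x86 vs Graviton). All names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class NodePool:
    name: str
    subnet: str
    arch: str          # "x86_64" or "arm64" (Graviton)
    free_nodes: int

@dataclass
class LogicalDatacenter:
    pools: list = field(default_factory=list)

    def allocate(self, nodes_needed: int) -> dict:
        """Spread a capacity request across pools with spare nodes."""
        grant, remaining = {}, nodes_needed
        for pool in self.pools:
            take = min(pool.free_nodes, remaining)
            if take:
                grant[pool.name] = take
                pool.free_nodes -= take
                remaining -= take
        if remaining:
            raise RuntimeError(f"short {remaining} nodes across all pools")
        return grant

dc = LogicalDatacenter([NodePool("x86-a", "subnet-a", "x86_64", 40),
                        NodePool("grv-b", "subnet-b", "arm64", 60)])
grant = dc.allocate(70)
print(grant)  # {'x86-a': 40, 'grv-b': 30}
```

The payoff is exactly what Xu describes: the caller asks for capacity, not for a specific subnet, so pre-event scale-up becomes a routine allocation instead of a manual ritual across hundreds of load balancers.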
Critics might note that such abstraction adds a layer of complexity that could obscure failures if not monitored correctly. Yet, as Xu implies, the alternative—manually managing hundreds of load balancers and fighting port exhaustion—is a recipe for disaster at this scale. The shift from reactive firefighting to proactive, automated operations is the only realistic path forward for systems of this magnitude.
"The platform's architecture needed to evolve to handle higher traffic while maintaining reliability, speed, and efficiency."
Bottom Line
Alex Xu's analysis succeeds because it refuses to treat the 60 million user milestone as a victory lap; instead, it treats it as a forensic audit of near-collapse. The strongest part of the argument is the demonstration that architectural flexibility—specifically the separation of concerns and logical abstraction—outperforms brute-force scaling every time. The biggest vulnerability in the narrative is the sheer scale of the resources required to implement these fixes, which may feel out of reach for smaller organizations, though the principles remain universally valid. For the smart, busy reader, the takeaway is clear: in the era of hyper-scale, the most valuable asset isn't the server farm, but the ability to rewire the system before the lights go out.