
How Zalando Delivers Real-Time Insights to Its Partner Brands

In an era where data is often treated as a static commodity, Alex Xu presents a compelling case for why real-time, zero-copy sharing is the only viable path for modern enterprise ecosystems. The piece stands out not merely for its technical depth, but for its stark admission that even at a European giant like Zalando, partners were sinking hundreds of analyst hours a month into the archaic practice of manual data consolidation. This is a story about the hidden tax of fragmentation, and why the future of commerce depends on breaking down the walls between platforms.

The Hidden Cost of Fragmentation

Xu opens by dismantling the romanticized view of digital ecosystems, revealing a backend reality that is far messier. He notes that while Zalando connects thousands of brands, the data flow was "scattered across multiple systems and shared through a patchwork of methods." This isn't just an IT inconvenience; it's a strategic bottleneck. The author highlights a staggering inefficiency: partners were dedicating "the equivalent of 1.5 full-time employees each month just to extract, clean, and consolidate the data they received."
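To picture that repetitive work, here is a minimal sketch, in Python with pandas, of the kind of manual consolidation the old patchwork of CSV drops, API pulls, and dashboard exports forced on partner analysts. Every file name, column, and join key below is a hypothetical placeholder rather than anything from Zalando's actual feeds.

```python
import pandas as pd

# Hypothetical stand-ins for weekly CSV drops collected from an SFTP mailbox.
weekly_files = ["sales_week_01.csv", "sales_week_02.csv", "sales_week_03.csv"]
sales = pd.concat((pd.read_csv(f) for f in weekly_files), ignore_index=True)

# A separate, manually triggered dashboard export with product metadata.
products = pd.read_csv("dashboard_product_export.csv")

# Clean and reconcile by hand: drop duplicate rows, then stitch the sources together.
sales = sales.drop_duplicates(subset=["order_id", "sku"])
report = sales.merge(products, on="sku", how="left")

# The consolidated report partners actually needed -- rebuilt from scratch every month.
report.to_csv("consolidated_monthly_report.csv", index=False)
```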

This framing is crucial because it shifts the conversation from abstract "data strategy" to concrete labor costs. When skilled analysts are forced to act as data janitors, the entire organization suffers. Xu argues that the existing interfaces were "not designed for heavy or large-scale data downloads," leaving partners blind during critical forecasting cycles. The argument lands hard because it exposes a fundamental disconnect: the platform wanted to be a partner, but the infrastructure treated its partners like afterthoughts.

Critics might note that the article glosses over the immense legacy debt required to migrate from such a fragmented state, but the focus on the result rather than the pain of migration is a deliberate choice to keep the narrative forward-looking.

"Partners did not just want raw data or operational feeds. They wanted analytical-ready datasets that could be accessed programmatically and integrated directly into their internal analytics tools."

The Architecture of Trust

The core of Xu's analysis lies in the rigorous criteria Zalando established before selecting a solution. The author emphasizes that the new system had to be "cloud-agnostic," recognizing that forcing partners to migrate to a single vendor would be a non-starter. This reflects a mature understanding of the modern tech landscape, where heterogeneity is the norm, not the exception.

Xu details the selection of Delta Sharing, an open protocol that allows for "zero-copy access." This concept is the article's technical anchor. As Xu explains, this means partners can "query live datasets directly without needing to download or duplicate them." The implication is profound: it eliminates data redundancy and ensures everyone works from a single source of truth. This approach mirrors the evolution of Extract, Transform, Load (ETL) processes from the 1990s, where batch processing created massive data silos, to today's streaming architectures that prioritize immediacy and consistency.
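To make the zero-copy idea concrete, here is a minimal sketch of how a partner could query a shared table with the open-source delta-sharing Python client that implements the protocol. The profile file and the share, schema, and table names are hypothetical placeholders; the article does not disclose Zalando's actual dataset identifiers.

```python
import delta_sharing

# Profile file issued by the data provider; it bundles the sharing endpoint
# and a partner-specific bearer token.
profile_file = "partner-profile.share"

# Table address format defined by the protocol: <profile>#<share>.<schema>.<table>
table_url = f"{profile_file}#sales_share.reporting.daily_product_sales"

# Query the live shared table on demand -- no exported copy or duplicate
# pipeline is maintained on the partner's side.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```

Because each read goes against the provider's current table state, every partner works from the same single source of truth rather than a stale export.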

The decision to use a managed service rather than hosting the protocol internally is also significant. Xu writes that this choice "removes the operational overhead of managing and maintaining sharing servers, tokens, and access logs internally." This is a pragmatic move that prioritizes value delivery over infrastructure control. It acknowledges that in a B2B context, reliability and security are often better outsourced to specialists than built in-house.

"The platform had to be open and extensible. This meant avoiding dependence on a single vendor or proprietary technology."

Scaling for Diverse Maturity Levels

One of the most nuanced parts of Xu's coverage is the acknowledgment that not all partners are created equal. The article breaks down the ecosystem into three tiers: large enterprises with their own data teams, medium-sized partners needing flexibility, and small retailers relying on spreadsheets.

The proposed solution had to serve all three without creating new complexity. Xu describes how the architecture uses a "logical container" called a Delta Share to group datasets, and a "Recipient" identity for each partner. This granularity allows for "access control at the table or dataset level," ensuring that a small retailer doesn't accidentally see data meant for a global brand. This level of security is non-negotiable when dealing with sensitive commercial data.
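To illustrate how that scoping plays out on the consuming side, the sketch below, again assuming the open delta-sharing Python client, lists only the shares and tables granted to one Recipient's credential. The partner name in the profile file is hypothetical.

```python
import delta_sharing

# Each partner (a "Recipient") receives its own profile file carrying a
# partner-specific token, so credentials are never shared across brands.
client = delta_sharing.SharingClient("acme_retail-profile.share")

# The sharing server returns only the Delta Shares -- and the tables inside
# them -- that this recipient has been granted, which is how access control
# at the table or dataset level is enforced.
for table in client.list_all_tables():
    print(f"{table.share}.{table.schema}.{table.name}")
```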

However, the article's focus on the technical elegance of the solution sometimes underplays the human friction of adoption. While Xu mentions the creation of user guides and troubleshooting documentation, the sheer cultural shift required to move from downloading CSVs to connecting via API is often underestimated. A counterargument worth considering is that the biggest hurdle isn't the protocol, but the willingness of smaller partners to upgrade their own internal workflows to consume this new data stream.

"Partners now had real-time access to data, partner-specific credentials ensured granular security, and no redundant storage simplified maintenance."

Bottom Line

Alex Xu's analysis succeeds by reframing data sharing not as a technical feature, but as a fundamental business enabler that directly impacts the bottom line of thousands of partners. The strongest part of the argument is the demonstration of how "zero-copy" technology solves both the latency problem and the security dilemma simultaneously. Its biggest vulnerability is the assumption that all partners have the technical maturity to leverage these advanced capabilities, a gap that may persist despite the best onboarding efforts. For leaders watching the industry, the takeaway is clear: the era of static data dumps is over, and the winners will be those who build open, real-time bridges instead of walled gardens.

Deep Dives

Explore these related deep dives:

  • Extract, transform, load

    The article highlights how partners spent the equivalent of 1.5 FTEs each month on data extraction, cleaning, and consolidation, which is classic ETL work. Understanding ETL processes illuminates the operational burden Zalando aimed to eliminate and why 'analytical-ready datasets' represent a paradigm shift from raw data sharing.

Sources

How Zalando Delivers Real-Time Insights to Its Partner Brands



Zalando is one of Europe’s largest fashion and lifestyle platforms, connecting thousands of brands, retailers, and physical stores under one digital ecosystem.

As the company’s scale grew, so did the volume of commercial data it generated. This included information about product performance, sales patterns, pricing insights, and much more. This data was not just important for Zalando itself but also for its vast network of retail partners who relied on it to make critical business decisions.

However, sharing this data efficiently with external partners became increasingly complex.

Zalando’s Partner Tech division, responsible for data sharing and collaboration with partners, found itself managing a fragmented and inefficient process. Partners needed clear visibility into how their products were performing on the platform, but accessing that information was far from seamless. Data was scattered across multiple systems and shared through a patchwork of methods. Some partners received CSV files over SFTP, others pulled data via APIs, and many depended on self-service dashboards to manually export reports. Each method served a purpose, but together they created a tangled system where consistency and reliability were hard to maintain.

Many partners had to dedicate the equivalent of 1.5 full-time employees each month just to extract, clean, and consolidate the data they received. Instead of focusing on strategic analysis or market planning, skilled analysts spent valuable time performing repetitive manual work.

There was also a serious accessibility issue. The existing interfaces ...