Scaling bio 005: Eli Lilly's Aliza Apple on building collaborative AI infrastructure for drug…

Zahra Khwaja · Dec 2, 2025 · 19 min read

Commentary by Hex Index staff

Zahra Khwaja presents a narrative that defies the pharmaceutical industry's entrenched culture of secrecy, arguing that Eli Lilly is betting its future on a radical premise: that sharing proprietary AI models is the only way to accelerate drug discovery. The piece is notable not for announcing a new drug, but for detailing a structural pivot in which a company with $45 billion in 2024 revenue invites competitors to train its models on their own data, without any party seeing another's raw numbers. This is a high-stakes gamble on the mechanics of trust, and the evidence suggests the industry may finally be ready to stop hoarding and start collaborating.

The Architecture of Trust

Khwaja anchors her analysis in the technical innovation that makes this collaboration possible: federated learning. She explains that TuneLab, launched in September 2025, does not ask partners to upload their sensitive molecular data to a central server. Instead, the platform moves the computation to the data. "The platform allows for additional privacy-enhancing techniques like differential privacy (adding mathematical noise to model updates) and k-anonymisation to further protect proprietary information," Khwaja writes. This distinction is critical. It transforms the traditional data-sharing model, which requires surrendering IP, into a system where the "global AI model is distributed... to each participating company's local node."
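
The round trip described here (the global model goes out, each partner trains on its own data, and only noised updates come back) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not TuneLab's implementation: the linear model, the function names (`local_update`, `privatize`, `federated_round`), the clipping threshold, and the noise scale are all hypothetical, and the synthetic arrays stand in for partner assay data.

```python
import numpy as np

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """Run a few gradient steps of linear regression on one partner's private data.
    Only the weight delta is returned; X and y never leave this function."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w - global_w

def privatize(delta, clip_norm, noise_std, rng):
    """Clip the update's L2 norm, then add Gaussian noise to it.
    (Illustrative values; not calibrated to a formal privacy budget.)"""
    norm = np.linalg.norm(delta)
    scale = min(1.0, clip_norm / (norm + 1e-12))
    return delta * scale + rng.normal(0.0, noise_std, size=delta.shape)

def federated_round(global_w, partners, clip_norm, noise_std, rng):
    """One round: each partner trains locally, the server averages the noised deltas."""
    deltas = [
        privatize(local_update(global_w, X, y), clip_norm, noise_std, rng)
        for X, y in partners
    ]
    return global_w + np.mean(deltas, axis=0)

# Synthetic stand-in for three partners' private datasets (hypothetical).
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])
partners = []
for _ in range(3):
    X = rng.normal(size=(200, 2))
    y = X @ true_w + rng.normal(0.0, 0.1, size=200)
    partners.append((X, y))

global_w = np.zeros(2)
for _ in range(30):
    global_w = federated_round(global_w, partners, clip_norm=1.0, noise_std=0.01, rng=rng)
print(global_w)  # should land close to true_w
```

The server-side function only ever receives clipped, noised weight deltas; the `X` and `y` arrays stay inside `local_update`. A production system would also calibrate the noise to a formal privacy budget, which this sketch does not attempt.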

The author highlights the scale of the investment behind this infrastructure, noting that the models were trained on over 500,000 data points accumulated over two decades at a cost exceeding $1 billion. By opening these assets, Lilly is effectively saying that the value lies not in the data itself, but in the collective intelligence generated by the ecosystem. This approach mirrors the evolution of federated learning seen in healthcare privacy research, where the goal has long been to train robust models across disparate hospital systems without violating patient confidentiality. Here, the stakes are commercial rather than clinical, but the underlying principle remains the same: a model trained across many private datasets can generalize better than one trained on any single participant's data.

"The key security principle: Lilly never hosts partner data, and partners never see each other's data. All raw molecular information stays within each organisation's infrastructure."

Critics might argue that no privacy safeguard can fully eliminate the risk of model inversion attacks, where a sophisticated actor could theoretically reconstruct training data from the model updates. However, Khwaja notes that Lilly conducted an internal simulation using data from acquired companies to prove the system's viability before opening the doors. "We really proved to ourselves that it would work from both a data science perspective and a usability perspective before we opened it up to early access third-party partners," she quotes Aliza Apple, the architect of the initiative. This internal due diligence suggests Lilly is not acting on faith alone, but on validated engineering.

The Strategy of Openness

The commentary shifts to the strategic rationale, where Khwaja captures the tension between corporate protectionism and the need for speed. The article details how Lilly selected specific models for the platform—small-molecule ADMET (absorption, distribution, metabolism, excretion, and toxicity) and antibody developability—because these are areas where "companies are not typically seeking to differentiate but instead are simply generating data and solving problems empirically."

This is a shrewd selection. By focusing on the "commodities" of drug discovery rather than the proprietary targets, Lilly removes the primary barrier to entry for competitors. Khwaja writes, "Since there's a lot less differentiation in the current empirical approach for ADMET and development there's a very clear measurable gain around time and dollars from a savings perspective that we can deliver if we can replace all of that work with a model." The argument is that efficiency gains in these foundational areas will free up resources for the high-risk, high-reward work that actually defines a company's competitive edge.

"We have a strong belief that the medicines of the future...are going to be discovered by AI, with the help of AI, over the next several years."

The piece also touches on the ambition to create "digital twins" of scientists, a concept that echoes the historical trajectory of computational biology. Just as the development of the first effective insulin analogs required a leap from animal extraction to recombinant DNA technology, this new era requires a leap from isolated data silos to networked intelligence. Khwaja notes that the executive team, including CEO Dave Ricks and CIO Diogo Rau, provided direct sponsorship, which was essential for overcoming the "appropriate stage gates" that usually stifle innovation in large pharma.

The Future of Agentic Workflows

Looking forward, Khwaja explores the potential for "agentic reasoning," where AI systems autonomously select and orchestrate different models. This is where the vision becomes most complex. "The core challenge is this: for an agent to make good decisions about which model to use when, we need rigorous, consistent benchmarking across all available models," Khwaja explains. Without standardized benchmarks, an AI agent cannot reliably determine which tool is best for a specific task.
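
A minimal sketch of why consistent benchmarks matter for that selection step, assuming a hypothetical registry of benchmark results. The model names, tasks, metric, and scores below are invented for illustration and do not describe TuneLab or any real model. The selector refuses to rank models whose results for the same task were reported with different metrics, which is the situation that standardized benchmarking is meant to prevent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BenchmarkResult:
    model: str    # hypothetical model name
    task: str     # hypothetical task, e.g. "solubility"
    metric: str   # all models on a task must report the same metric
    score: float  # higher is better in this sketch

def select_model(results, task):
    """Return the best-scoring model for a task, refusing to compare mixed metrics."""
    candidates = [r for r in results if r.task == task]
    if not candidates:
        raise LookupError(f"no benchmarked model for task {task!r}")
    metrics = {r.metric for r in candidates}
    if len(metrics) > 1:
        raise ValueError(f"inconsistent metrics for {task!r}: {sorted(metrics)}")
    return max(candidates, key=lambda r: r.score).model

def plan_workflow(results, tasks):
    """Pick one model per workflow step, in order."""
    return [(task, select_model(results, task)) for task in tasks]

# Invented scores, for illustration only.
RESULTS = [
    BenchmarkResult("model_a", "solubility", "r2", 0.71),
    BenchmarkResult("model_b", "solubility", "r2", 0.78),
    BenchmarkResult("model_a", "clearance", "r2", 0.64),
    BenchmarkResult("model_c", "clearance", "r2", 0.59),
]

print(plan_workflow(RESULTS, ["solubility", "clearance"]))
```

If one model's solubility result were reported as an error metric while the others used a correlation metric, `select_model` would raise rather than pick a winner, since the two numbers are not comparable.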

The article suggests that TuneLab could become the industry's de facto benchmarking ground. If successful, this would solve a fragmentation problem that has plagued the sector for years. Khwaja quotes Apple on the potential for this evolution: "Imagine an AI agent that can intelligently select and orchestrate different models across a full drug discovery workflow." The implication is that the next breakthrough in drug discovery won't just be a new molecule, but a new way of organizing the scientific process itself.

"We're actively exploring how to incorporate agentic elements into TuneLab. The vision is compelling - imagine an AI agent that can intelligently select and orchestrate different models across a full drug discovery workflow."

A counterargument worth considering is whether this "SaaS co-creation" mindset truly levels the playing field or simply cements the dominance of the largest players who can afford the most sophisticated computing infrastructure. While the platform is open, the computational cost of running these models locally could still be a barrier for smaller biotechs. Khwaja acknowledges the potential for third-party models to be included, noting, "We recognise that there may be best in class that will not come from Lilly, and we're open to including them in TuneLab." This openness is a necessary check against the platform becoming a walled garden.

Bottom Line

Zahra Khwaja's coverage effectively reframes the pharmaceutical industry's AI strategy from a race for proprietary dominance to a collective infrastructure build. The strongest part of the argument is the technical and strategic justification for federated learning, which solves the trust deficit that has historically prevented collaboration. The biggest vulnerability remains the execution risk: can an industry built on secrecy truly adapt to a model where the value is derived from shared, rather than hoarded, intelligence? The next few years will determine if TuneLab is a fleeting experiment or the new operating system for drug discovery.

Deep Dives

Explore these related deep dives:

Federated learning
The article centers on TuneLab's use of federated learning as its core privacy-preserving technology. Understanding the technical foundations of how machine learning models can be trained across decentralized data without sharing raw information provides essential context for grasping why this approach is revolutionary for pharmaceutical collaboration.
Drug discovery
The article discusses AI's role in transforming drug discovery but assumes readers understand the traditional process. Learning about the historical stages of drug discovery—from target identification through clinical trials—illuminates why AI infrastructure and data sharing could dramatically accelerate what has traditionally been a decade-long, billion-dollar endeavor.
Tirzepatide
The article mentions Mounjaro and Zepbound (both tirzepatide) as Lilly's blockbuster drugs generating over $16 billion in 2024. Understanding the science behind this dual GIP/GLP-1 receptor agonist—how it works, its development history, and why it's been so successful for diabetes and weight management—provides concrete context for Lilly's drug discovery capabilities.

Sources

Scaling bio 005: Eli Lilly's Aliza Apple on building collaborative AI infrastructure for drug…

by Zahra Khwaja

Eli Lilly is rethinking how the next wave of drug discovery will happen. Rather than guarding internal tools, the company is building shared infrastructure that others can tap into. Through TuneLab, launched in September 2025, Lilly is opening its proprietary AI models to biotech partners. These models draw on Lilly’s internal databases and many years of discovery work, and they run through a federated learning setup so partners can use their own data without giving anything away. As each group trains the models, the whole system quietly gets better.

Aliza Apple leads TuneLab as Vice President of Catalyze360 AI and ML. She has been shaping how Lilly works with the broader biotech community and how open collaboration can unlock new scientific ground. We asked Aliza to share how she sees this shift unfolding and what it means for the future of innovation at the intersection of pharma and AI.

In this conversation, we explore:

Why Lilly is opening access to proprietary models it spent decades building.

The push to solve complex in vivo prediction through the Insitro partnership.

The roadmap toward AI that autonomously selects and orchestrates its own models.

How adopting a “SaaS co-creation” mindset is finally breaking down historical data silos.

How federated learning builds trust by moving compute to data, not data to compute.

Background

About Eli Lilly

Eli Lilly and Company, founded in 1876 and headquartered in Indianapolis, is one of the world’s leading pharmaceutical companies with a legacy spanning nearly 150 years of medicine discovery and development. The company develops therapeutics across diabetes, oncology, immunology, and neuroscience, with 2024 annual revenues of $45B and over 40,000 employees worldwide [1].

Eli Lilly’s portfolio includes some of the pharmaceutical industry’s most impactful medicines (revenues quoted for FY2024 [2]):

Mounjaro (tirzepatide) for Type II Diabetes: $11.45B

Zepbound (tirzepatide) for Chronic Weight Management: $4.9B

Trulicity (dulaglutide) for Diabetes: $5.25B

Verzenio (abemaciclib) for Breast Cancer: $5.3B

Eli Lilly’s AI Strategy

In October 2025, Lilly announced a partnership with NVIDIA to build what it claims will be “the most powerful supercomputer owned and operated by a pharmaceutical company,” featuring over 1,000 NVIDIA B300 GPUs. “We have a strong belief that the medicines of the future...are going to be discovered by AI, with the help of AI, over the next several years,” said Diogo Rau [3], Lilly’s Chief Information and Digital Officer.

Lilly is pursuing an ambitious vision beyond traditional molecular modelling. The company is ...