This week's BioByte cuts through the generic "AI for science" hype by pinpointing a specific, often overlooked bottleneck: the gap between digital prediction and physical reality. Pablo Lubroth argues that the next breakthrough in biotechnology won't come from better algorithms alone, but from models that understand the "grammar" of biology and systems that can actually see what happens in a wet lab. The piece is notable for its refusal to accept headline accuracy metrics as a finish line, exposing how current tools still fail to guide real-world drug discovery.
The Grammar of Life and the Limits of Prediction
Lubroth opens by dissecting Genesis Molecular AI's new model, Pearl, which claims to outperform the industry-standard AlphaFold3. The author notes that while the headline numbers are impressive, the real story lies in the engineering choices. "Beyond the headline numbers, Pearl's explicit design for deployment is noteworthy," Lubroth writes, highlighting the model's ability to train on synthetic physics-generated data to overcome the scarcity of experimental structures. This is a pragmatic pivot: rather than waiting for more lab data, the team is simulating reality to teach the AI.
However, the commentary quickly turns to a critical warning that many in the field might ignore. Lubroth points out that "standard <2Å accuracy thresholds are insufficient," revealing that many geometrically "correct" poses still contain fatal errors like ring flips or missed interactions. This distinction is vital for any reader in drug development: a model can be statistically perfect and practically useless.
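Why a sub-2 Å pose can still be chemically wrong is easy to see with a toy calculation. The sketch below (illustrative geometry, not taken from the article or from Pearl) computes heavy-atom RMSD for a flat six-membered ring and a 180°-flipped copy of it: every atom lands in a different chemical position, yet the RMSD squeaks in under the 2 Å threshold.

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between matched 3D coordinate lists (Å)."""
    assert len(a) == len(b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(a, b))
    return math.sqrt(sq / len(a))

# Toy six-membered ring in the xy-plane (regular hexagon, 1.4 Å "bonds").
ring = [(1.4 * math.cos(k * math.pi / 3),
         1.4 * math.sin(k * math.pi / 3),
         0.0) for k in range(6)]

# "Ring flip": rotate 180° about the x-axis. The atoms land back on ring
# positions, but their identities are permuted — a fatal chemical error.
flipped = [(x, -y, -z) for (x, y, z) in ring]

print(round(rmsd(ring, flipped), 2))  # → 1.98, just under the 2 Å cutoff
```

A metric that averages over atoms can thus bless a pose in which no individual atom is where chemistry needs it to be — exactly the failure mode Lubroth flags.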
"Reliance on proprietary and physics-generated training data raises questions about out-of-distribution robustness and whether synthetic priors fully capture binding thermodynamics."
The author's skepticism here is well-placed. While scaling laws suggest performance improves with more synthetic data, the piece acknowledges a glaring hole: "the team demonstrates that confidence models across all evaluated systems failed to rank poses better than random selection." This means the AI can generate a good pose, but it cannot tell you which one is good. Critics might argue that this limitation renders the tool premature for clinical decision-making, yet Lubroth frames it as a necessary step toward shifting structure-based modeling from a validation tool to a driver of lead optimization.
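The pose-ranking failure Lubroth quotes can be made concrete with a few lines. In the hypothetical ensemble below (scores and RMSDs are invented for illustration, not drawn from Pearl or any real model), the confidence head's top pick is worse than the expected result of picking a pose uniformly at random, while the true best pose sits mid-pack:

```python
# Hypothetical ensemble: (confidence score, true heavy-atom RMSD in Å).
poses = [(0.91, 3.2), (0.88, 0.9), (0.74, 4.1), (0.69, 1.1), (0.55, 2.7)]

def top1_rmsd_by_confidence(poses):
    # RMSD of the pose the confidence head ranks first.
    return max(poses, key=lambda p: p[0])[1]

def expected_rmsd_random(poses):
    # Uniform random selection: expected RMSD is the plain mean.
    return sum(r for _, r in poses) / len(poses)

def oracle_rmsd(poses):
    # Best achievable pick with perfect ranking.
    return min(r for _, r in poses)

print(top1_rmsd_by_confidence(poses))           # → 3.2 (confidence picks badly)
print(round(expected_rmsd_random(poses), 2))    # → 2.4 (random does better)
print(oracle_rmsd(poses))                       # → 0.9 (a good pose existed)
```

The gap between the last two numbers is the whole point: generation is solved well enough that a good answer is usually in the set, but without a reliable selector the practical value collapses toward the random baseline.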
Decoding the Redundancy of Codons
Shifting from proteins to the code that builds them, Lubroth introduces EnCodon, a foundation model designed to understand codon usage bias. For decades, scientists have known that the same amino acid can be encoded by different codons, but the functional impact of this redundancy has been a black box. Lubroth explains that "while the same amino acid can be encoded by different codons, the use of different codons can have significant downstream effects on translation, RNA stability, and protein expression."
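The redundancy Lubroth describes is easy to demonstrate with the standard genetic code. The minimal sketch below (a hand-picked slice of the code table, not EnCodon itself) shows two DNA sequences that use entirely different codons yet translate to the same protein — the degree of freedom whose downstream effects EnCodon is built to model:

```python
# Minimal slice of the standard genetic code: leucine alone has six codons,
# while methionine and tryptophan each have exactly one.
CODON_TABLE = {c: "L" for c in ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG")}
CODON_TABLE.update({"ATG": "M", "TGG": "W"})

def translate(seq):
    """Translate a DNA coding sequence codon by codon (no stop handling)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))

# Two synonymous sequences: identical protein, different codon usage.
print(translate("CTGCTG"), translate("TTATTG"))  # → LL LL
```

A protein-level model sees these two sequences as identical; a codon-level model like EnCodon can, in principle, learn why one might express better than the other.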
The article draws a parallel to structural metrics: just as RMSD measures how far a predicted structure deviates from reality, codon usage reflects a "contextual grammar" that dictates biological function. The EnCodon model, trained on over 130 million sequences, attempts to decode this. Lubroth observes that "larger models are able to more effectively distinguish between codons when predicting masked tokens in different contexts," suggesting that scale is unlocking a deeper understanding of biological nuance.
"This task demonstrates the benefits of codon-level modeling given the historically challenging nature of resolving synonymous variant effects."
This is a significant claim. Historically, synonymous mutations (those that don't change the amino acid) were often dismissed as benign. Lubroth highlights that EnCodon can now distinguish pathogenic synonymous variants from benign ones better than previous RNA baselines. This has profound implications for mRNA therapeutics and vaccine design, where codon optimization is key to expression levels. However, the author remains grounded, noting that while the model surpasses RNA baselines, it still lags behind protein language models like ESM2 in some areas, indicating that the field is still in the early stages of integrating these modalities.
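The reason synonymous effects were historically hard to resolve follows directly from how variants are classified. The sketch below (a tiny hand-written code-table slice, purely illustrative) labels a codon substitution as synonymous or missense; by construction, a synonymous change leaves the protein untouched, so any protein-space model is structurally blind to it:

```python
# Standard-code entries needed for this example: the four TCN serine codons
# plus one proline codon as a missense target.
CODE = {"TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "CCT": "P"}

def classify_snv(ref_codon, alt_codon):
    """Label a single-codon substitution as synonymous or missense."""
    ref_aa, alt_aa = CODE[ref_codon], CODE[alt_codon]
    return "synonymous" if ref_aa == alt_aa else "missense"

print(classify_snv("TCT", "TCC"))  # → synonymous: protein unchanged
print(classify_snv("TCT", "CCT"))  # → missense: Ser -> Pro
```

Everything in the first branch is invisible at the protein level, which is why a codon-level model is needed to rank those variants at all.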
When AI Gets Its Hands Dirty
Perhaps the most striking section of the piece moves beyond the screen to the bench. Lubroth details the "LabOS" system, an AI co-scientist that uses extended reality (XR) glasses to watch and critique human experiments in real time. The author describes a shift from "digital scientific agents" to systems that act as "a set of eyes that can guide and critique real-world wet lab experiments."
The methodology is rigorous. The team didn't just simulate errors; they recorded "over 200 real experimental sessions" and had experts annotate every mistake to build the LabSuperVision dataset. Lubroth writes that the system uses vision language models to "infer which protocol is being performed and to identify procedural errors from the videos." This moves AI from a theoretical partner to a safety inspector.
"The system as a whole marks an exciting step toward integrating AI into the physical workflows of science—augmenting scientists at the bench with real-time visual guidance, feedback, and knowledge transfer."
The implication is that the bottleneck in science is no longer just data analysis, but the consistency of human execution. By training on real-world footage, the AI learns the "temporal consistency" required for complex protocols. A counterargument worth considering is the privacy and trust implications of having an AI constantly recording and critiquing researchers, but Lubroth focuses on the potential for reducing costly errors in high-stakes environments like cancer immunotherapy research.
The Mind-Body Interface
In a surprising pivot, the commentary touches on a study where virtual reality triggers a real immune response. Lubroth describes how "potential contact with virtual infection threats is predicted by fronto-parietal areas of the PPS system" (the brain's peripersonal space network), leading to actual changes in innate lymphoid cell activation.
This finding challenges the boundary between the physical and the simulated. The author notes that "infectious VR avatars generated an increase in ILC activation that was similar to that generated by vaccines." This suggests that the brain's anticipation of a threat can prime the body's defenses without a single pathogen being present. While this is a small study, Lubroth frames it as a crucial piece of the puzzle in understanding neuro-immune modulation, hinting at future therapies that might leverage psychological states to boost physical health.
The Bottom Line
Pablo Lubroth's analysis succeeds by refusing to treat AI as a magic bullet, instead focusing on the specific engineering hurdles that separate a good model from a useful tool. The strongest argument is the insistence that accuracy metrics alone are meaningless without the ability to select the right pose or understand the context of a codon. The biggest vulnerability remains the gap between these sophisticated models and independent, prospective validation in real clinical trials. Readers should watch for the upcoming blind tests on industrial targets, which will be the litmus test for whether these "co-scientists" can truly drive discovery.