
Did you miss these 2 AI stories? A *real* llm-crafted breakthrough + continual learning blocked?

Two AI Stories You Might Have Missed — Including One Genuine Breakthrough

The AI industry is spending more compute on money-making products like browsers and short-form video than on pushing frontier intelligence. That dynamic has created a perception of slowdown. But beneath that surface, genuinely novel discoveries are happening. Here are two stories worth your attention.


A Small LLM Pushing Science Forward

While much of the industry chases the next flagship model, one relatively small language model is actually advancing biological science. It's called C2S-Scale, built on Google's Gemma 2 architecture from over a year ago, and it generated a novel hypothesis for a cancer drug that appeared nowhere in the existing literature.

The researchers trained it with reinforcement learning rewards for accurately predicting how cells would react to drugs, particularly regarding interferon. The goal: make cold tumors hot — detectable by the immune system. C2S Scale converts each cell's gene activity into sentences, essentially reading biology the way it reads text.
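The cell-to-sentence idea can be illustrated with a minimal sketch: rank a cell's genes by expression level and emit their names in order, producing a "sentence" a language model can read. The gene names, counts, and the `top_k` cutoff below are illustrative assumptions, not the actual C2S-Scale pipeline.

```python
# Minimal sketch of the cell-to-sentence idea: represent a cell's
# expression profile as a sentence of gene names ordered by activity.
# Gene names and counts are made-up placeholders for illustration.

def cell_to_sentence(expression, top_k=5):
    """Rank genes by expression level and join the top names into a 'sentence'."""
    ranked = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)
    return " ".join(gene for gene, _count in ranked[:top_k])

cell = {"CD74": 112, "B2M": 98, "ACTB": 87, "MALAT1": 54, "TP53": 12, "GAPDH": 76}
print(cell_to_sentence(cell, top_k=4))  # → "CD74 B2M ACTB GAPDH"
```

Once each cell is a sentence, standard language-model training machinery applies to it unchanged, which is what lets a text model "speak biology."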

The model identified a drug candidate called silmitasertib that had not been linked to this capacity anywhere in the literature. Its predictions were then confirmed multiple times in vitro, in laboratory settings. Human testing will take years — that's how medicine works — but the implications are significant: language models can generate genuinely new, testable scientific hypotheses.

This result provides a blueprint for a new kind of biological discovery. When people claim LLMs won't accelerate science, remember this story.

The Quest for an AGI Definition

A paper by several prominent AI researchers proposes a concrete, quantifiable definition of AGI using Cattell–Horn–Carroll (CHC) theory — an empirically validated model of human cognition, applied here to artificial systems.

The resulting scores show GPT-4 at 27% and GPT-5 at 58%. But that doesn't mean GPT-6 or 7 would reach full AGI. The theory breaks cognition into ten discrete categories, each weighted equally at 10%, including general knowledge, reading ability, math competence, on-the-spot reasoning, working memory, long-term memory storage, visual processing, listening ability, and reaction time.
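The equal weighting means the overall score is just the average of ten category scores, so one weak category caps the total. A minimal sketch, with placeholder numbers and the category list filled out to ten as an assumption (the article names only nine):

```python
# Sketch of the equal-weight scoring scheme: ten cognitive categories,
# each contributing 10% of the total. All scores below are hypothetical.

CATEGORIES = [
    "general_knowledge", "reading", "math", "on_the_spot_reasoning",
    "working_memory", "long_term_memory_storage", "long_term_memory_retrieval",
    "visual_processing", "auditory_processing", "speed",
]

def agi_score(scores):
    """Average per-category scores (0-100), each weighted equally at 10%."""
    assert set(scores) == set(CATEGORIES), "must score all ten categories"
    return sum(scores.values()) / len(CATEGORIES)

# A model scoring 70 everywhere except long-term memory loses a full
# tenth of its possible total to that single category:
example = {c: 70 for c in CATEGORIES}
example["long_term_memory_storage"] = 0
print(agi_score(example))  # → 63.0
```

This arithmetic is why the memory category discussed below matters so much: a zero there puts a hard 90% ceiling on the overall score no matter how strong the other nine categories are.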

One category stands out as the most significant limitation: memory. Language models can't remember things beyond their conversation context. They don't continually learn on the job.

The authors write that without continual learning, AI systems suffer from amnesia, limiting their utility. Every bit of context adds cost to API calls, so providers deliberately limit how much context these models take in. Without more context, they make huge blunders because they simply don't understand the situation — and they won't remember it next time.

"Without the ability to continually learn, AI systems suffer from amnesia, which limits their utility, forcing the AI to relearn context in every interaction."

This is fundamentally different from just adding more context. It's a question of whether we'll solve continual learning itself.

The OpenAI Quote That Reveals Why

Jerry Tworek, OpenAI's VP of Research, recently addressed this limitation directly. His interview revealed something crucial: current reinforcement learning happens during training runs, not in real time with users in the loop. Some companies like Cursor are trying to train models online with users in the loop — theoretically possible with GPT and other products — but it's a dangerous path.

Tworek argued that without robust safeguards, this approach could enable all kinds of harmful training; attempting it with something as complex as GPT would be reckless until those safeguards exist.

The real issue isn't benchmark performance — Gemini 2.5 Deep Think just broke records on FrontierMath, the hardest mathematics benchmark. It's not about who wins a benchmark today. The fundamental limitation is that AI systems can't learn continuously. They forget everything between conversations. That may soon change.

Bottom Line

The drug discovery story represents something genuinely new: an LLM generating novel, testable scientific hypotheses in biology. That's different from incremental benchmark improvements. But the AGI definition paper and OpenAI's quote reveal the real constraint holding back artificial general intelligence — not capability benchmarks but fundamental memory and continual learning problems that remain unsolved. Watch for solutions to those problems; that's where the next breakthrough will happen.


Sources

Did you miss these 2 AI stories? A *real* llm-crafted breakthrough + continual learning blocked?

by AI Explained

Okay, I'm going to be honest. AI companies have a set amount of computing power. And at the moment, they are spending more of it on scaling up money-making stuff like browsers and video shorts than on scaling up frontier performance and IQ points. Hence the feeling among some of a slowdown in progress and no hard feelings.

You've got to make the investors some money. But as that maxes out, the story will hopefully shortly return to ramping up frontier intelligence, for example when Gemini 3 comes out from Google DeepMind, expected in the next 2 months. But none of what I just said means that pretty juicy things aren't happening with our current language model horsepower. So, I'm going to start with a novelty produced by a baby LLM.

Then I'll get to a revealing remark from a top OpenAI researcher and end with some interesting bits that I found throughout this week. Let's start with a language model, so often decried for their hallucinations, but one that actually is pushing science forward by learning the language of biology. Yes, the model does have a dodgy name, C2S-Scale, but it was able to generate a novel hypothesis for a drug to aid in cancer treatment. What I'm going to try and do is simplify the simplification of this 29-page paper.

So here we go. When I say we have a language model that generated a novel hypothesis for a drug to aid in cancer treatment, what it was was a drug candidate that was not in the literature for being able to help in this way. This model, by the way, was based largely on the open-weights Gemma 2 architecture from Google, released over a year ago. Gemma 3 has come out since.

Gemma 4 is due any time. So it's not the latest Gemma by any stretch. But anyway, this language model was given special doggy training. You can think of it as reinforcement learning rewards for accurately predicting how cells would react to drugs, especially regarding interferon. I know what you're thinking: but why would we do that? Well, it's to make cold cancer tumors hot.

In other words, to make them detectable by the immune system. Wait, so hold on. This is an LLM that can speak biology. Yes.

And English, too. Like, you could probably likely still chat to this model about ...