Rohit Krishnan challenges the prevailing assumption that artificial intelligence is a static tool, proposing instead that its true potential lies in a continuous, real-time feedback loop with the messy reality of the world. By running a tiny, open-source model on a personal laptop to predict daily news headlines, he demonstrates that even modest systems can learn to navigate complex, adversarial environments without massive computational resources. This is not just a technical demo; it is a blueprint for how AI might finally evolve from a database of past facts into a dynamic participant in the future.
The Limits of Static Benchmarks
Krishnan begins by dismantling the standard metrics we use to evaluate large language models. He notes that while current systems excel at math, logic puzzles, or even booking plane tickets, these tasks fail to capture the essence of understanding a changing world. "One would imagine they go hand in hand but alas," he observes regarding the boom in AI usage versus the clarity of its capabilities. The author argues that traditional benchmarks suffer from "teaching to the test," where models optimize for a specific exam rather than genuine comprehension. Instead, he proposes a more rigorous yardstick: prediction markets. If a model can accurately forecast future events, it must possess a robust internal model of how the world works.
"The key thing that you know differentiates us is the fact that we are able to learn right like if you have a trader who gets better making predictions they do that because like you know he or she is able to read about what they did before and can use that as a springboard to learn something else."
This comparison to human traders is the piece's most compelling insight. Krishnan suggests that the current gap between human and machine intelligence isn't about raw processing power, but about the mechanism of learning. Humans improve because they constantly update their mental models based on outcomes; most AI models, once trained, remain frozen in time. The author's experiment, dubbed "Foresight Forge," was designed to bridge this gap by creating an automated research engine that not only makes predictions but also ingests the results the next day to update its own policy.
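The daily cycle described here can be sketched in a few lines. This is a minimal illustration of the idea, not Krishnan's actual code: the class name, the scalar `bias` standing in for a model policy, and the learning rate are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DailyLoop:
    """Illustrative predict-then-update cycle: forecast today,
    score against tomorrow's outcome, nudge the policy."""
    history: list = field(default_factory=list)
    bias: float = 0.5  # crude stand-in for the model's policy

    def predict(self, topic: str) -> dict:
        return {"topic": topic, "confidence": self.bias}

    def ingest_outcome(self, prediction: dict, came_true: bool, lr: float = 0.1):
        # Move the policy toward observed outcomes -- a toy stand-in
        # for the reinforcement-learning update the article describes.
        target = 1.0 if came_true else 0.0
        self.bias += lr * (target - self.bias)
        self.history.append((prediction, came_true))

loop = DailyLoop()
p = loop.predict("markets")
loop.ingest_outcome(p, came_true=True)  # next day: the prediction held
```

The point of the sketch is the shape of the loop, not the update rule: what distinguishes the setup from a frozen model is simply that yesterday's outcomes feed back into today's policy.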
Critics might note that predicting headlines is a noisy task where luck often masquerades as skill, and a model could simply be regurgitating common tropes rather than truly understanding causality. However, Krishnan anticipates this by designing a system that requires specific structural elements in its predictions, forcing the model to articulate drivers and verification sketches rather than vague guesses.
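That structural requirement is easy to picture as a schema check. The field names below are illustrative guesses at what "drivers and verification sketches" might look like in practice, not the author's actual format:

```python
def is_well_formed(prediction: dict) -> bool:
    """Reject vague guesses: a prediction must carry a concrete claim,
    stated drivers, and a sketch of how it will be verified."""
    required = ("claim", "drivers", "verification")
    return all(prediction.get(k) for k in required)

vague = {"claim": "something big will happen"}
specific = {
    "claim": "Central bank holds rates at its next meeting",
    "drivers": ["inflation cooling", "prior forward guidance"],
    "verification": "compare against the official rate announcement",
}
```

A gate like this forces the model to expose its reasoning in a checkable form, which is what makes luck distinguishable from skill over many days of predictions.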
The Tiny Model Revolution
The most surprising element of Krishnan's experiment is his decision to run it on a "tiny model" rather than the massive, state-of-the-art systems dominating the headlines. He chose a 0.6-billion-parameter Qwen model, running locally on a laptop to avoid the prohibitive costs of cloud computing. "For instance what's the best way to do this would be to say make a bunch of predictions and the next day you can look back and see how close you got to some of those predictions and update your views," he explains. By using a small model, he forces the system to learn efficiently from sparse rewards, much as learners in natural environments must.
"I was super surprised that even a small model did learn to get better at predicting next day's headlines. I wouldn't have expected it because there is no logical reason to believe that tiny models can still learn sufficient world model type information that it can do this."
This finding upends the industry's current trajectory, which assumes that scaling up parameters and data is the only path to intelligence. Krishnan argues that the bottleneck is not size, but the feedback mechanism. He details how he had to engineer a specific reward function using semantic similarity to judge the model's accuracy, a workaround for the fact that small models are poor at judging their own work. "The hardest part was trying to figure out the exact combination of rewards that would actually make the model do what I wanted and not whatever it wanted to try and maximise and reward by doing weird stuff," he writes. This highlights a critical, often overlooked challenge in AI development: the reward function is just as important as the model architecture.
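The semantic-similarity reward can be illustrated with a toy version. Krishnan's setup presumably uses learned embeddings; the bag-of-words cosine below is a deliberately cheap stand-in chosen so the sketch stays self-contained, and the function name and examples are illustrative:

```python
import math
from collections import Counter

def cosine_reward(prediction: str, headline: str) -> float:
    """Bag-of-words cosine similarity as a cheap stand-in for the
    embedding-based semantic reward the article describes: graded
    credit for near-misses instead of an all-or-nothing score."""
    a = Counter(prediction.lower().split())
    b = Counter(headline.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# A partially correct forecast earns partial reward:
partial = cosine_reward("fed holds rates steady", "fed raises rates")
```

The graded signal is what matters: an exact-match reward would be too sparse for a small model to learn from, while similarity scoring turns every near-miss into usable feedback.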
"The future is totally going to look like a video game."
This metaphor captures the essence of his argument. In a video game, an agent learns by interacting with the environment, receiving immediate feedback on success or failure, and adjusting its strategy in real-time. Krishnan sees this as the missing link for AI. He points to companies like Cursor, which already use similar reinforcement learning techniques to update their coding assistants every few hours based on human acceptance or rejection of suggestions. If this works for code, Krishnan posits, it should work for the broader world.
The Path to Continuous Learning
The broader implication of Krishnan's work is a shift from episodic training to continuous adaptation. He envisions a future where AI systems are not static products but evolving partners that get smarter every day. "There's no reason to believe that this is an isolated incident, just like with the RLNVR paper there is no reason to believe that this will not scale to doing more interesting things," he asserts. The technical hurdles are significant—specifically, creating a reward function that is both interesting and robust enough to teach the model something new without causing it to collapse into nonsense. But the proof of concept is there.
"While I chose one of the harder ways to do this by predicting the whole world, I was super surprised that even a small model did learn to get better at predicting next day's headlines."
The author's willingness to publish his code and methodology, including the specific parameters that worked best, invites the community to replicate and improve upon his findings. This openness stands in contrast to the secretive nature of many major AI labs. By demonstrating that a personal laptop can run a continuous learning loop, Krishnan democratizes the path to advanced AI capabilities. He suggests that the next breakthrough won't come from a single massive training run, but from thousands of small, daily updates driven by real-world feedback.
Bottom Line
Krishnan's most powerful contribution is the demonstration that continuous, on-policy learning is not just a theoretical goal but a practical reality achievable with modest resources. While the experiment relies on a somewhat noisy metric—headline prediction—the underlying mechanism of updating a model based on daily outcomes offers a viable path toward truly adaptive AI. The biggest vulnerability remains the difficulty of designing reward functions that generalize well across diverse, complex domains without encouraging the model to game the system. Readers should watch for how this "video game" approach to AI training scales beyond news prediction to more critical areas like policy analysis and scientific discovery.