Never cross a river four feet deep on average

This piece delivers a stark warning in the form of a neuroscience replication: the flashy promise that flickering lights can turbocharge learning appears to be an illusion born of statistical noise and summary statistics. The guest post on Scott Alexander's Astral Codex Ten, written by grantee Sasha Putilin, doesn't just report that the experiment failed; it dissects how the original result came about, showing that the "miracle" rested on group averages driven by a few participants whose performance got sharply worse. For anyone investing time in biohacking trends or trusting headline-grabbing brainwave studies, this is a critical reality check on the fragility of modern scientific claims.

The Metronome That Wasn't There

The original study, "Learning at your brain's rhythm," promised something seductive: if you flash light at a person's individual alpha frequency (a brain rhythm oscillating 8–12 times per second), you can sync their internal metronome and accelerate learning. The hypothesis was that the brain has an intrinsic rhythm for visual processing, and that an external flicker could reinforce it. As Putilin puts it, "If flickering light could act as an external metronome, it might help the brain maintain the right rhythm and learn faster." The idea hinted at a future where a consumer-grade helmet could help people pick up new skills more quickly.

However, the replication, run by grantee Sasha Putilin, found no evidence of this accelerated learning. The core finding, that one specific timing condition (T-match) made people learn three times faster, vanished; in Putilin's words, it "is absent." Instead of a breakthrough, the data suggest the effect was probably never real to begin with.

This failure isn't just about one experiment; it touches on the broader replication crisis in science, where exciting initial findings often dissolve under rigorous scrutiny. Just as the stroboscopic effect can make a rotating wheel appear stationary or seem to spin backward, the way the data were summarized here created an illusion of progress that wasn't there. Putilin points out that the original study obscured what was happening by reporting summary statistics: "The individual data tells a different story: the difference is primarily driven by a few P-match participants with sharply negative learning rates."
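To see how a group average can hide what individuals are doing, here is a minimal sketch with made-up learning rates. The numbers, group sizes, and variable names are illustrative assumptions, not data from either study. Two participants with sharply negative rates pull the P-match mean far below the T-match mean, while the medians barely differ.

```python
from statistics import mean, median

# Hypothetical per-participant learning rates (arbitrary units).
# These values are invented for illustration only.
t_match = [2, 3, 2, 3, 2, 3]
p_match = [2, 3, 2, 3, -9, -13]  # two participants got much worse


def summarize(rates):
    """Return the mean and median learning rate of one group."""
    return mean(rates), median(rates)


t_mean, t_median = summarize(t_match)
p_mean, p_median = summarize(p_match)

mean_gap = t_mean - p_mean        # what a chart of group means shows
median_gap = t_median - p_median  # what the typical participant shows

print(f"gap between group means:   {mean_gap}")
print(f"gap between group medians: {median_gap}")
```

In this toy example the gap between means is large even though four of the six P-match participants look just like the T-match group, which is the pattern the quoted sentence describes.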

The point of science is to look at the underlying data with a critical eye and ask yourself questions like: Is the effect real, or is it an artefact of analytic flexibility and small samples?

Cargo-Cult Statistics and Hidden Data

The most damning part of the post isn't that the experiment failed, but how the original authors presented their success. Putilin borrows the concept of "cargo-cult statistics," a term from Stark and Saltelli for the mechanical ritual of running tests without understanding the data. The original researchers performed the ceremony: they collected data, ran t-tests, found p-values under 0.05, and published. Putilin argues that this is not enough. "They invoke statistical terms and procedures as incantations, with scant understanding of the assumptions or relevance of the calculations," the post says.

The original paper averaged individual learning curves into a smooth group trend, hiding the fact that the "success" was driven primarily by a few P-match participants who got worse over time, possibly out of boredom. Putilin notes, "For 17 of the 40 data points in the original study's P-match and T-match groups, removing that single data point would push the study outside the traditional p = 0.05 threshold." A result that depends on any one of so many individual data points looks more like a statistical fluke than a robust biological phenomenon.
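The leave-one-out check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are invented, the function names are hypothetical, and an exact permutation test on the difference of means stands in for the t-tests used in the original paper, so the numbers will not match the post's analysis.

```python
from itertools import combinations
from statistics import mean


def perm_p_value(a, b):
    """Exact two-sided permutation p-value for the difference of means."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    total = sum(pooled)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    count = 0
    # Try every way of relabelling the pooled values into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n_a - (total - s) / n_b)
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float ties
            extreme += 1
    return extreme / count


def fragile_points(a, b, alpha=0.05):
    """List (group, index) pairs whose removal pushes p to alpha or above."""
    a, b = list(a), list(b)
    fragile = []
    for i in range(len(a)):
        if perm_p_value(a[:i] + a[i + 1:], b) >= alpha:
            fragile.append(("a", i))
    for i in range(len(b)):
        if perm_p_value(a, b[:i] + b[i + 1:]) >= alpha:
            fragile.append(("b", i))
    return fragile


# Invented toy scores for two small groups (not study data).
group_a = [1, 2, 3, 4]
group_b = [5, 6, 7, 8]

p_full = perm_p_value(group_a, group_b)
fragile = fragile_points(group_a, group_b)
print(f"p with all data: {p_full:.4f}")
print(f"points whose removal loses significance: {fragile}")
```

A result that survives this check is not thereby proven real, but one that fails it for many single points deserves a look at the individual data before anyone builds on it.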

Critics might argue that small sample sizes are an inevitable cost of expensive neuroscience research, and that the original authors provided enough raw data for others to spot these issues eventually. However, Putilin counters that relying on the community to dig through messy data defeats the purpose of scientific communication: "They don't even properly release per-block accuracies for recreating their analysis." The failure to anticipate how averaging could mislead suggests a lack of rigor in the original analysis, not just bad luck.

The Cost of Verification

The replication itself was a triumph of frugality and transparency, in sharp contrast to the opacity of the original. Putilin reproduced the study's core setup using consumer-grade hardware costing around $2,000, compared to the original's $50,000–$100,000 setup. "Although the decision was forced by the budget, replicating the study on consumer hardware had one important advantage: it tested whether someone could plausibly build learning software for cheap headsets," Putilin observes.

Despite using cheaper equipment and a smaller sample (12 participants versus 80), the replication cast serious doubt on the original claim because it looked at individual trajectories rather than group averages. The result is a humbling reminder that high-tech gear doesn't guarantee truth, and that careful statistical hygiene matters more. Putilin concludes that the original study's flaws reflect a systemic problem in which weak work gets published and rewarded as long as it looks good on paper and passes the ritual checks of peer review.

Cargo-cult statistics... demotes statistics from a way of thinking about evidence and avoiding self-deception to a formal 'blessing' of claims.

Bottom Line

The post succeeds in shifting the focus from the allure of "brain hacking" to the mechanics of scientific integrity, showing that a replication run on roughly $2,000 of consumer hardware can undercut a finding produced on a $50,000–$100,000 setup if it asks the right questions. The piece's greatest strength is its exposure of how easily summary statistics can mask failure, but it leaves readers with an uncomfortable question: how many other "breakthroughs" in neuroscience are built on similar statistical sand?

Never cross a river four feet deep on average

by Scott Alexander · Astral Codex Ten

[This is a guest post by 2024 ACX grantee Sasha Putilin. I encourage any ACX grantees who are interested to write about their projects. - SA]

The results of my ACX Grants 2024 project are in.

The project attempted to replicate the 2023 study “Learning at your brain’s rhythm: individualized entrainment boosts learning for perceptual decisions”. It claimed that if you read a person’s brain waves, figured out an individual peak alpha frequency, and flashed a bright white light at that frequency, then they learned a certain perceptual task faster.

Why bother? The result hinted that learning may depend in part on how well the brain keeps its rhythms coordinated. In other words, perceptual learning may rely on an internal brain metronome. If flickering light could act as an external metronome, it might help the brain maintain the right rhythm and learn faster.

The study offered an invitation to develop new frontiers of neuroscience and biohacking. If the effect generalised to other types of learning, you could build a learning helmet: put it on your head, let it read your brainwaves, flicker light tailored to your individual brain — and you learn a new skill quicker.

And no, it didn’t replicate. Most likely it can’t replicate, because the effect is probably not real. The original study obscured the data with summary statistics. Running a $32,000 replication was excessive. We could’ve caught the issue with this study if we simply looked at the original data carefully.

*record scratch* *freeze frame* Yep, that’s me. You’re probably wondering how I got here. Here’s the story.

The original study.

“Learning at your brain’s rhythm” tested participants’ ability to learn an artificial task — distinguishing between radial and concentric patterns.

The patterns on the left are easy to tell apart — they are prototypes. Those on the right are more like the actual task. They mask the underlying pattern in noise, with the noise level varying around ≈75%. Participants had to classify patterns fast: they were shown on the screen for just 200 ms, after which they had 1.3s to pick an answer.

During the study participants wore an electroencephalography (EEG) cap — a mesh of electrodes on the scalp that recorded their brain rhythms. The EEG was used to tune the flicker to each person’s own alpha frequency and to track what was happening in their brain as they learned.
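As a rough sketch of what tuning the flicker to a person's alpha frequency could involve, the snippet below picks the frequency with the most power inside an 8 to 12 Hz alpha band from a power spectrum and converts it to a flicker period. The spectrum values, the function names, the band limits, and the simple take-the-maximum rule are assumptions for illustration; the study's actual EEG pipeline is not described in this excerpt.

```python
def peak_alpha_frequency(freqs_hz, power, band=(8.0, 12.0)):
    """Return the frequency (Hz) with the highest power inside the band."""
    in_band = [
        (p, f) for f, p in zip(freqs_hz, power) if band[0] <= f <= band[1]
    ]
    if not in_band:
        raise ValueError("no frequency bins fall inside the band")
    # max() compares power first, so the strongest in-band bin wins.
    return max(in_band)[1]


def flicker_period_ms(freq_hz):
    """Time between flashes, in milliseconds, for a given flicker rate."""
    return 1000.0 / freq_hz


# Made-up spectrum: note the strong 6 Hz bin lies outside the alpha band.
freqs = [6, 7, 8, 9, 10, 11, 12, 13]
power = [9.0, 4.0, 2.0, 3.0, 7.0, 5.0, 1.0, 8.0]

iaf = peak_alpha_frequency(freqs, power)
period = flicker_period_ms(iaf)
print(f"peak alpha frequency: {iaf} Hz, flicker every {period} ms")
```

Restricting the search to the band matters: the largest value in this toy spectrum sits at 6 Hz, which an unrestricted maximum would have picked instead.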

The visual identification task ...
