
How this small startup achieved a Near-Perfect record against AI slop

In an era where digital pollution is often deemed inevitable, Alberto Romero presents a startling counter-narrative: a small startup has finally cracked the code on distinguishing AI-generated text without sacrificing human expression. While previous tools were notorious for flagging the US Constitution as machine-written, Romero argues that a new player, Pangram Labs, has achieved a near-perfect record by abandoning the impossible goal of catching everything and focusing entirely on never catching the innocent. This is not just a technical upgrade; it is a fundamental shift in how we might reclaim the integrity of the internet.

The Trap of the Drift Net

Romero opens by reframing the problem of "AI slop." He suggests that the content itself isn't the primary obstacle; rather, the failure lies in our detection methods. "Unlike the others, however, AI slop is hard to distinguish but not really hard to detect," Romero writes. He draws a sharp line between distinguishing, which isolates a specific item, and detecting, which merely confirms presence. The historical failure of previous detectors, he argues, stems from an over-enthusiastic approach that cast too wide a net. "Everyone knows that catching a machine on purpose is easy; what almost nobody knows is how hard it is not to catch a human by mistake," he notes. This indiscriminate approach created a paradox where the tools designed to clean the web ended up punishing human writers, effectively allowing the slop to settle like digital mold because the cost of removing it was too high.


The core of Romero's analysis is that the industry has been fighting the wrong battle. He argues that detectors have been trying to be "AI distinguishers" when they should have been content with being reliable sensors. "The entire story of AI detectors is the story of how they want to be, instead, AI distinguishers," he observes. This framing is crucial because it shifts the blame from the technology's inability to detect AI to the technology's refusal to accept a trade-off. By trying to eliminate both false positives (flagging humans) and false negatives (missing AI), previous systems failed at both. Romero suggests this mirrors the concept of Goodhart's law, where a measure becomes a target and ceases to be a good measure; when detection rates become the sole metric of success, the nuance of human writing is lost.
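The trade-off Romero describes can be made concrete with a toy example. The sketch below is not Pangram's method: it assumes a hypothetical detector that gives each document a score between 0 and 1 and flags anything at or above a single threshold. The class name, function name, scores, and labels are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    score: float  # hypothetical detector output in [0, 1]; higher means "more AI-like"
    is_ai: bool   # ground-truth label, known only in a test set


def error_rates(docs: list[Doc], threshold: float) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) when flagging score >= threshold."""
    humans = [d for d in docs if not d.is_ai]
    machines = [d for d in docs if d.is_ai]
    false_positives = sum(d.score >= threshold for d in humans)   # humans wrongly flagged
    false_negatives = sum(d.score < threshold for d in machines)  # AI texts that slip through
    return false_positives / len(humans), false_negatives / len(machines)


# Invented scores: the two classes overlap, which is what forces a trade-off.
DOCS = [Doc(s, False) for s in (0.05, 0.20, 0.35, 0.55, 0.70)] + [
    Doc(s, True) for s in (0.40, 0.60, 0.80, 0.90, 0.95)
]

for t in (0.30, 0.50, 0.75):
    fpr, fnr = error_rates(DOCS, t)
    print(f"threshold={t:.2f}  FPR={fpr:.0%}  FNR={fnr:.0%}")
# threshold=0.30  FPR=60%  FNR=0%
# threshold=0.50  FPR=40%  FNR=20%
# threshold=0.75  FPR=0%  FNR=40%
```

Because the score ranges overlap, no threshold in this toy data drives both rates to zero; moving the threshold only decides which kind of error to accept. Pangram's choice, as Romero describes it, is to sit at the high end.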

"It's better for 10 guilty people to escape than for 1 innocent person to suffer."

The Blackstone Principle in Code

Enter Pangram Labs, which Romero credits with making the decisive compromise that eluded others. Instead of waging a total war, they adopted a strategy rooted in legal philosophy. Romero points out that Pangram adheres to the principle William Blackstone set out in his Commentaries on the Laws of England (1765–1769): protecting the innocent takes priority over capturing every guilty party. "Pangram's decisive compromise is, as I see it, the first successful offensive in humanity's reconquest of the web," he asserts. This approach lets Pangram claim with high confidence that when its tool flags content, that content is indeed AI-generated. The result is a false positive rate so low (1 in 10,000 on test sets and 1 in 100,000 on scientific papers) that it neutralizes the "liar's dividend": bad actors can no longer dismiss a flag as a glitch in the system.
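Romero does not describe how Pangram tunes its classifier, so the following is only a generic, Neyman–Pearson-style sketch of how a detector can encode a Blackstone-style priority: fix the acceptable false positive rate first, then choose the decision cutoff from scores on documents known to be human-written. The function names, the beta-distributed stand-in scores, and the corpus size are hypothetical.

```python
import math
import random


def calibrate_threshold(human_scores: list[float], target_fpr: float) -> float:
    """Pick a cutoff from known-human documents so that the rule `score > cutoff`
    flags at most floor(target_fpr * n) of them (fewer if scores are tied)."""
    if not human_scores:
        raise ValueError("need at least one known-human document")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(human_scores, reverse=True)
    # Number of false flags we are willing to tolerate on the calibration set.
    allowed = min(math.floor(target_fpr * len(ranked)), len(ranked) - 1)
    # Only the `allowed` highest human scores can sit strictly above ranked[allowed].
    return ranked[allowed]


def is_flagged(score: float, cutoff: float) -> bool:
    """Hypothetical flagging rule: strictly above the calibrated cutoff."""
    return score > cutoff


if __name__ == "__main__":
    random.seed(0)
    # Made-up stand-in for detector scores on 20,000 verified human documents.
    human_scores = [random.betavariate(2, 8) for _ in range(20_000)]
    cutoff = calibrate_threshold(human_scores, target_fpr=1 / 10_000)
    false_flags = sum(is_flagged(s, cutoff) for s in human_scores)
    print(f"cutoff={cutoff:.3f}, humans flagged: {false_flags} of {len(human_scores)}")
```

One caveat follows from the arithmetic. If a test finds zero false flags among n human documents, the usual "rule of three" bounds the false positive rate only below roughly 3/n at 95% confidence, so backing a 1-in-100,000 claim takes on the order of 300,000 verified human documents.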

Romero highlights the tangible impact of this precision. He cites Pangram's data showing that "21%, or 15,899 reviews [at ICLR 2026], were fully AI-generated," and that over half had some form of AI involvement. Numbers like these would have meant little coming from earlier detectors, whose false positives would have inflated any such count. By almost never flagging human writing, Pangram has built a tool that can actually be trusted. "With Pangram, you can be almost 100% sure what fraction of 'post-AI' content is AI-generated because Pangram doesn't fail with 'pre-AI' content at all," Romero explains. This is a major step forward for institutions, from academic journals to newsrooms, that need to verify the authenticity of submissions without stifling human creativity.
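A line of arithmetic shows why a low false positive rate makes such a fraction meaningful. As a simplification, treat a flag as a binary "fully AI-generated" verdict, and let p be the true share of AI-generated documents in a corpus, TPR the share of those the detector catches, FPR its false positive rate, and f the observed flagged fraction. The numbers plugged in below assume Pangram's 1-in-10,000 test-set rate also holds for ICLR reviews, which the source does not establish.

```latex
\begin{aligned}
f &= p\,\mathrm{TPR} + (1 - p)\,\mathrm{FPR} \;\le\; p + \mathrm{FPR}
  && \text{(since } \mathrm{TPR} \le 1 \text{ and } 1 - p \le 1\text{)} \\
\Rightarrow\quad p &\ge f - \mathrm{FPR} \\
p &\ge 0.21 - 0.0001 = 0.2099
  && \text{(assuming } f = 0.21,\ \mathrm{FPR} = 10^{-4}\text{)}
\end{aligned}
```

The bound runs in one direction only. With a near-zero FPR, the flagged share is essentially a floor on the true share; how far above that floor the truth lies depends on the TPR, which is the false negative question Romero takes up next.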

Critics might note that this strategy leaves a significant portion of AI content undetected if it is cleverly disguised. However, Romero argues this is a feature, not a bug, of a system designed for fairness. The goal isn't to catch every single instance of AI use, but to ensure that no human is falsely accused. This aligns with the broader ethical imperative of avoiding harm to those who are playing by the rules.

The Illusion of the Perfect Catch

Despite the praise, Romero does not shy away from the limitations of Pangram's success. He addresses the claim of a near-zero false negative rate, that is, the ability to catch essentially every piece of AI text. "Independent researchers at UChicago have confirmed it," he writes, but he immediately questions the methodology. The problem, Romero argues, is that these tests often rely on controlled environments where AI text is pure and unadulterated. "The distribution of AI presence in a controlled experiment in a lab doesn't necessarily have any resemblance to the distribution of AI presence in the real world," he warns.

Romero offers a personal anecdote to illustrate this gap. He admits that he can routinely fool the detector by blending his own style with AI output, relying on what he calls a "sense of smell" for the tools. "I do it easily, with no special tools, humanizer software, or careful adversarial tricks," he confesses. This highlights a critical vulnerability: real-world AI use is rarely all-or-nothing. Most writers do not use AI to generate entire articles from scratch; they use it to edit, assist, or generate parts of a larger human work. "There are as many ways to blend AI into your writing process as there are writers; the real world is a gradient," Romero observes. Because they test only on pure AI text, false negative benchmarks may be misleading.

"To claim near-zero true false negatives in these conditions is like claiming you've caught every target fish in a lake with your large-scale drift net when your evidence is that you've caught every target fish you put there yourself."
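Romero's drift-net objection can be illustrated with a deliberately crude toy model. This is an assumption made for illustration and not how Pangram scores text: suppose a detector averages made-up per-paragraph scores and flags a document only above a conservative cutoff. A benchmark built from pure AI documents then reports perfect recall, while documents that blend AI and human paragraphs mostly slip through. The constants and the even spread of AI shares in the "blended" set are hypothetical.

```python
from statistics import mean

CUTOFF = 0.8       # hypothetical conservative flagging threshold
HUMAN_PARA = 0.10  # made-up score for a human-written paragraph
AI_PARA = 0.95     # made-up score for an AI-written paragraph


def doc_score(ai_paragraphs: int, total_paragraphs: int) -> float:
    """Toy document score: the mean of its per-paragraph scores."""
    human_paragraphs = total_paragraphs - ai_paragraphs
    return mean([AI_PARA] * ai_paragraphs + [HUMAN_PARA] * human_paragraphs)


def recall(docs: list[tuple[int, int]]) -> float:
    """Share of documents containing any AI text that the toy detector flags."""
    with_ai = [d for d in docs if d[0] > 0]
    return sum(doc_score(*d) >= CUTOFF for d in with_ai) / len(with_ai)


# "Lab" benchmark: every AI document is 10 out of 10 AI paragraphs.
lab_set = [(10, 10)] * 100
# Assumed "real world": AI share spread evenly from 1 to 10 paragraphs out of 10.
blended_set = [(k, 10) for k in range(1, 11)]

print(f"recall on pure-AI benchmark: {recall(lab_set):.0%}")      # 100%
print(f"recall on blended documents: {recall(blended_set):.0%}")  # 20%
```

Lowering the cutoff would catch more of the blended documents, but in a real detector it would also start flagging human writing, which is exactly the trade Pangram declines to make.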

Romero suggests that this limitation is actually what makes Pangram valuable. By optimizing for the battle where victory is verifiable (avoiding false positives), they have achieved a level of reliability that other tools lack. "Pangram will actually lean on 'human-written' when unsure. That's the right engineering decision and, I'd argue, the right ethical one too," he concludes. Accepting imperfection in one area allows for near-perfection in the other, creating a tool that is honest about its capabilities.
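Leaning on "human-written" when unsure has a standard decision-theoretic reading, offered here as an illustration rather than as Pangram's documented method. If the detector produced a calibrated probability p that a text is AI-generated (an assumption), flagging costs (1 − p)·C_fp in expectation and not flagging costs p·C_fn, so flagging pays off only when p > C_fp / (C_fp + C_fn). Plugging in Blackstone's 10-to-1 ratio as hypothetical costs gives a cutoff of 10/11, about 0.91. The function names and default costs are illustrative.

```python
from fractions import Fraction


def flag_cutoff(cost_false_accusation: int, cost_missed_ai: int) -> Fraction:
    """Probability above which flagging has lower expected cost than not flagging.

    Flag when p * cost_missed_ai > (1 - p) * cost_false_accusation,
    i.e. p > cost_false_accusation / (cost_false_accusation + cost_missed_ai).
    """
    return Fraction(cost_false_accusation, cost_false_accusation + cost_missed_ai)


def decide(p_ai: float, cost_false_accusation: int = 10, cost_missed_ai: int = 1) -> str:
    """Return "ai" only when flagging is strictly cheaper; ties and doubt go to "human"."""
    if p_ai > flag_cutoff(cost_false_accusation, cost_missed_ai):
        return "ai"
    return "human"


print(flag_cutoff(10, 1))   # 10/11
print(decide(0.90))         # human: 0.90 is below 10/11
print(decide(0.95))         # ai
print(decide(0.50, 1, 1))   # human: even with equal costs, a tie defaults to human
```

Using an exact `Fraction` for the cutoff keeps the comparison free of rounding at the boundary, so a probability exactly at the cutoff always resolves to "human".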

A Functional Approach to the Digital Commons

Ultimately, Romero sees Pangram not as a silver bullet, but as a catalyst for a new kind of digital hygiene. He rejects blanket dismissal of anything touched by AI, or "AI;DR," in favor of a "functional" approach that combines tool-assisted detection with human judgment. "The only way to clean up the entire digital town is for each of us to clean the sidewalk in front of our own digital homes," he urges. This call to action empowers individuals to take control of their information diets without resorting to witch hunts or public shaming.

He acknowledges that some guilty parties will escape detection, but argues this is an acceptable cost. "Is it so terrible that some guilty individuals will escape our judgment? No, when you consider that not pursuing them ensures that almost no innocent people are harmed," Romero writes. This perspective shifts the focus from total eradication to the preservation of trust. Because a Pangram flag is very nearly a statement of fact, it restores the possibility of accountability in a way that previous, error-prone tools never could.

Bottom Line

Alberto Romero's analysis offers a compelling roadmap for navigating the AI-saturated web and makes a strong case that perfect is the enemy of good. While the tool's inability to catch every hybrid AI-human text remains a vulnerability, its near-zero rate of falsely accusing human writers is a transformative achievement. The strongest part of this argument is the ethical reframing of detection as a tool for protection rather than punishment, though readers should stay skeptical of false negative claims in a world of blended content.

Deep Dives

Explore these related deep dives:

  • False positive rate

    The article hinges on Pangram Labs' strategic decision to prioritize eliminating false positives over catching every instance of AI, a trade-off that defines their 'near-perfect' reputation.

  • Adulterant

    The author frames AI slop as a modern form of food adulteration, a historical practice of mixing inferior substances into goods that provides a crucial analogy for understanding why detection alone fails to solve the problem.

  • Goodhart's law

    The failure of previous detectors to distinguish humans from machines illustrates this principle, where the metric of 'catching AI' became so optimized that it distorted the system to the point of punishing human writers.

Sources

How this small startup achieved a Near-Perfect record against AI slop

Hey, Alberto here! Each week, I publish long-form AI analysis covering culture, philosophy, and business for The Algorithmic Bridge. Paid subscribers also get Monday how-to guides and Friday news commentary. I publish occasional extra articles.

Full disclosure: This is not a paid sponsorship.

I.

The first thing you notice about AI slop is that it reads like standard online writing.

It is, in this sense, the latest of a long series of pollutants that are indistinguishable from the thing they pollute, including counterfeit currency, adulterated food, propaganda, and short-form TikTok entertainment videos passing as educational content.

Unlike the others, however, AI slop is hard to distinguish but not really hard to detect.

It’s important to stop on this subtlety for a moment: distinguishing means isolating something from its surroundings, whereas detecting means knowing something is there at all. A detector is unconcerned with what extra things might be there besides the target thing.

AI slop is easy to detect and thus, exterminate. Any standard classifier can do it. I can do it. You could do it if only you tried. If anything, AI slop conquered the internet because detectors were too good at catching machines. So good, in fact, that they’d also catch humans. Therein lies the problem: human writing is that extra thing that detectors catch in their large-scale drift nets while they remain, unfortunately, unbothered by their indiscriminate actions.

Everyone knows that catching a machine on purpose is easy; what almost nobody knows is how hard it is not to catch a human by mistake. The entire story of AI detectors is the story of how they want to be, instead, AI distinguishers. Or, failing that, where they accept a trade-off between. Indeed, AI slop grew and settled as a digital mold would; in the absence of something that could separate it, despite everyone knowing it was there.

Until now. Enter Pangram Labs.

The best way to understand Pangram’s approach to AI slop—crucially, what separates it from most other AI detection tools—is through Voltaire’s adage: perfect is the enemy of good.

Instead of trying to wage a full-blown war against AI by detecting perfectly what is and what isn’t slop, they cleverly chose to maximize their odds on the most important battle: how to ensure that everything they catch is, actually, an AI. Or, to ...