LLMs were mostly (but not entirely) useless at Extra-Textual tasks involved in the composition of…

In an era obsessed with automation, Freddie deBoer offers a rare and sobering counter-narrative: large language models are not just failing to replace human creativity; they are mostly, though not entirely, useless for the extra-textual workflows that make complex writing possible. While the industry fixates on generating text, deBoer argues that the real bottleneck for writers is not the production of words but the management of narrative continuity and structural integrity, tasks where these tools remain surprisingly brittle.

The Illusion of Abundance

Freddie deBoer begins by dismantling the prevailing hype cycle surrounding artificial intelligence. He acknowledges that these systems have clear implications for digital goods like code or images, but he remains "deeply, deeply skeptical of claims that this technology will be the first in history to result in long-term net job loss rather than long-term net job growth." This stance is significant because it shifts the debate from fear of displacement to an analysis of actual utility. He notes that even in programming, the field widely predicted to be most susceptible to disruption, the job market has recently been getting healthier, albeit from a depressed baseline. For him, the narrative of inevitable obsolescence relies on "raw assertion" rather than evidence.

The core of his argument rests on a fundamental distinction between information and problem-solving. DeBoer writes, "most of our major problems as a species cannot be solved with information; indeed, I suspect that coming to understand the limits of computing will prove to be among the most profound scientific lesson of the 21st century." This framing is crucial for busy readers who are tired of being sold on efficiency gains that don't materialize. It suggests that the abundance these models create is merely an abundance of things we already had, not a solution to new complexities.

"I look to art to access the human... I access human-made art because I know there's a human behind it and that's what I'm looking for, other humans, showing me in art what they hide in their selves."

The Paradox of Prompting

DeBoer then pivots to his personal experiment: attempting to use these tools to manage the "mental juggling" required for writing a novel without letting them generate a single word of the final text. He identifies a paradox that many professionals will recognize immediately. To get a specific, nuanced argument out of an LLM, a writer must explain exactly what they want in excruciating detail. As he puts it, "writing enough to explain the argument I might ask them to make takes so much time and effort... that it's not a time saver."

This observation strikes at the heart of the productivity promise. If the input required to get a high-quality output exceeds the cost of doing the work oneself, the tool has failed its primary economic function. DeBoer notes that relying on these systems often leads to "trite, well-worn grooves," effectively shrinking the range of arguments a writer produces. He argues that this is not just a technical limitation but an ethical one: "My readers expect me to actually write the things I represent as mine." This commitment to authenticity is what separates professional work from automated content farming.

Critics might argue that deBoer's standard for utility is too high and that even imperfect assistance can save time on rough drafts. However, his experience suggests that the friction of correcting "profoundly goofy results" often outweighs the initial speed gain.

"With LLMs, what you get back is always inevitably what you've already gotten. No thanks."

The Limits of Machine Logic

The most revealing section of deBoer's analysis involves his attempt to use these models for practical tasks like chapter summaries and continuity checks. He found that while the tools could generate "at-a-glance" summaries, they often emphasized minor elements in ways that misrepresented the author's intent. More tellingly, when he tested their ability to count words, a task seemingly trivial for a computer, the models struggled, because they process text as subword tokens rather than as whole words.

He points out a critical disconnect: "the whole economic value of chatbots is that ordinary people can use them without special knowledge; the computer is supposed to be doing all of that thinking for you." When the user needs to write a script or understand tokenization to get a basic word count, the tool has failed its design purpose. This echoes a long-running debate in the philosophy of mind, most famously John Searle's "Chinese Room" argument, which questions whether a system manipulating symbols truly understands them. On that analogy, just as a person following rules in a room doesn't understand Chinese, an LLM processing tokens may not grasp narrative continuity.
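
To make the word-count point concrete, here is a minimal Python sketch of the kind of script a writer might fall back on. It is an illustration only, not anything from deBoer's article: the function names, the sample sentence, and especially toy_subword_split are hypothetical. The toy splitter simply chops words into fixed-size pieces to show that a count of pieces differs from a count of words. Real tokenizers use learned vocabularies and behave differently.

```python
import re


def count_words(text: str) -> int:
    """Count words deterministically.

    A word here is a run of letters or digits, optionally joined by
    internal apostrophes or hyphens (so "non-linear" counts once).
    """
    pattern = r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*"
    return len(re.findall(pattern, text))


def toy_subword_split(text: str, piece_len: int = 4) -> list[str]:
    """Hypothetical stand-in for a subword tokenizer.

    Chops each whitespace-separated word into fixed-size pieces.
    This is NOT how any real tokenizer works; it only illustrates
    that the units a model sees need not line up with words.
    """
    pieces = []
    for word in text.split():
        pieces.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return pieces


sample = "The manuscript's non-linear timeline confused everyone."
print("words: ", count_words(sample))
print("pieces:", len(toy_subword_split(sample)))
```

The design point is the asymmetry: the deterministic function always returns the same count for the same text, while a system that only sees pieces has to infer where words begin and end.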

Despite these failures, deBoer found some utility in checking for plot holes in his complex, non-linear manuscript involving nearly two dozen characters. The models could identify when events were out of order or when a character remembered something incorrectly. Yet, even here, the results were "a little wonky," with false positives that required human verification. This reinforces his conclusion that these tools are best viewed as clumsy assistants rather than autonomous partners.

"I sincerely believe I'm better at this than the LLMs are. Pretty self-explanatory."

Bottom Line

Freddie deBoer's most compelling contribution is the refusal to accept the premise that more data equals better art; he demonstrates that the human capacity for nuanced judgment remains irreplaceable in complex creative workflows. The argument's greatest vulnerability lies in its reliance on current model limitations, which are rapidly evolving. Yet his core insight, that the cost of telling the tool exactly what you want often exceeds the benefit of using it, remains a vital check against blind technological adoption.

Deep Dives

Explore these related deep dives:

  • Scrivener (software)

    The author explicitly names this niche writing tool to contrast the mechanical act of drafting with the complex, non-linear 'mental juggling' of story architecture that LLMs struggle to replicate.

  • Slate Star Codex

    The article cites the comment section of AstralCodexTen, the successor to this rationalist blog. That context helps explain the author's bet-based approach to testing AI claims and why his skepticism is framed as a rebuttal to a particular online community.

  • Chinese room

    While not named directly, the article's core distinction between generating text and understanding 'extra-textual tasks' mirrors this philosophical thought experiment about syntax versus semantics in artificial intelligence.

Sources

LLMs were mostly (but not entirely) useless at Extra-Textual tasks involved in the composition of…

by Freddie deBoer

I am, as you are aware, not very impressed by LLMs.

I think they have clear implications for some fields that rely on the production of digital goods, such as writing text, developing code, producing images and video, or generating music. These effects will likely prove to be more modest than they are now hyped up to be, and I am deeply, deeply skeptical of claims that this technology will be the first in history to result in long-term net job loss rather than long-term net job growth. (The only evidence anyone can bring to that prediction is raw assertion.) LLMs are best at writing code, and computer programming has been widely predicted to be the field most susceptible to “disruption,” but in fact that job market has been getting healthier lately. (Albeit from a depressed recent baseline.) But still, sure, there will be consequences in the realm of generating digital goods. The issue is that most of the world is not made up of 0s and 1s, the things that LLMs make more abundant are things that were already abundant, and most of our major problems as a species cannot be solved with information; indeed, I suspect that coming to understand the limits of computing will prove to be among the most profound scientific lesson of the 21st century.

Still, I have tried very hard to be a good critic of this technology rather than a bad critic, an active one rather than a lazy one, an informed one rather than an ignorant one. I made very specific predictions about where LLMs would be in three years and tried to put money on it, which for some reason enraged the kinds of people on Reddit who think that criticisms of AI aren’t specific enough in their predictions and have no stakes. I have asked LLMs to produce information for me about subjects that I already know pretty well, which has been useful when confronting just how often these tools produce profoundly goofy results. I’ve paid for Claude and ChatGPT because people in the AstralCodexTen comments kept insisting that you couldn’t properly assess LLM performance with free versions. I have tried to be the critic that AI enthusiasts want. This has not endeared them to my conclusions - in fact being the kind of critic they say they want seems to only make them more resentful - but at least ...