
“AI polls” are fake polls

Nate Silver delivers a necessary reality check to a tech industry racing to replace human voices with algorithmic echoes. In an era where artificial intelligence is reshaping every sector, Silver argues that "AI polls" are not a revolutionary advancement in data collection, but a fundamental category error that confuses simulation with reality.

The Illusion of the Synthetic Voter

Silver opens by dismantling the grandiose promises of startups like Aaru and Electric Twin, which claim to simulate global opinion without interviewing a single human. He notes the founders' ambition to "simulate the entire globe — from the way crops are grown in Ukraine to how that impacts production of oil in Iraq, trade through the strait of Malacca, and elections for the mayor of Baltimore." While the scale is impressive, Silver points out the philosophical flaw: these companies are not gathering data; they are generating predictions based on existing data. As he puts it, "Polling is fundamentally a data collection process... Silicon sampling, on the other hand, produces no new data."


This distinction is critical for busy leaders who rely on accurate sentiment analysis. If a model is trained on what people have said, it cannot detect what they are beginning to think. Silver highlights the risk of relying on such models for high-stakes decisions, noting that "you don't actually know what these voters think unless you're reaching them directly." The argument gains historical weight when viewed through the lens of multilevel regression with poststratification (MRP), a technique Silver's own organization has long championed. MRP uses statistical modeling to adjust real survey data, but it never replaces the need for that original, human-collected data point. A model without a foundation of reality is just a sophisticated guess.
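The role MRP assigns to real data can be sketched in a few lines. The poststratification step below reweights model-based estimates for each demographic cell by that cell's known share of the population; the cell estimates and population shares here are invented for illustration, and a real MRP pipeline would first fit a multilevel regression to actual survey responses.

```python
# Minimal sketch of the poststratification step in MRP: cell-level
# estimates (from a regression fit to real survey data) are reweighted
# by each cell's share of the true population. The numbers below are
# hypothetical.

# Model-based support estimates per (age, education) cell
cell_estimates = {
    ("18-34", "college"): 0.62,
    ("18-34", "no_college"): 0.55,
    ("35+", "college"): 0.48,
    ("35+", "no_college"): 0.41,
}

# Known population shares for the same cells (e.g. from census data)
population_shares = {
    ("18-34", "college"): 0.15,
    ("18-34", "no_college"): 0.20,
    ("35+", "college"): 0.25,
    ("35+", "no_college"): 0.40,
}

def poststratify(estimates, shares):
    """Population-weighted average of cell-level estimates."""
    return sum(estimates[cell] * shares[cell] for cell in estimates)

overall = poststratify(cell_estimates, population_shares)
print(round(overall, 3))  # → 0.487
```

Note that every number the model adjusts here traces back to a real-world measurement, which is exactly the foundation Silver argues synthetic sampling lacks.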

Polling is fundamentally a data collection process. We might use surveys to make predictions by feeding them into election forecasts, but the main purpose of a poll isn't prediction, it's gathering new data about what people think and how they feel.

Critics of traditional polling might argue that human respondents are increasingly hard to reach and often dishonest. However, Silver suggests that the solution isn't to abandon humans for machines, but to invest more in the difficult work of finding representative samples. He writes, "If there's a shift in opinion among this subgroup, you're not going to detect it" with synthetic agents. This is a crucial insight for policymakers and campaign strategists alike: the inability to detect a sudden shift in public sentiment could lead to catastrophic policy or electoral miscalculations.

The Bias of the Polite Machine

Silver's most damning critique concerns the inherent biases of large language models. Unlike human respondents, who can be messy, contradictory, and deeply negative, AI agents tend to be "sycophantic" and overly polite. He cites pollster John Hagner, who observed that early experiments "cannot get respondents to be as racist or sexist or, frankly, as negative as human respondents." This creates a dangerous blind spot. If a model smooths over the raw, ugly edges of public opinion, it presents a sanitized version of reality that may be comforting but is ultimately useless for navigating complex social fractures.

The article references the work of Seymour Martin Lipset, the sociologist who famously argued that the legitimacy of a democracy depends on the congruence between what people want and what they get. If AI polls systematically underreport the intensity of negative sentiment or the prevalence of extreme views, they create a false sense of stability. Silver notes that academic research shows these models "can seriously overpredict the favorability of politicians" and struggle to capture the variation between demographic subgroups. The result is a feedback loop in which an administration or campaign sees a world that is more moderate and predictable than it actually is.

The reports that have come through at the meetings that I've been at are that the early experiments on this, they cannot get respondents to be as racist or sexist or, frankly, as negative as human respondents.

A counterargument worth considering is that these models are improving rapidly. Ben Warner, co-founder of Electric Twin, defends the technology by comparing it to a new tool in a toolbox: "The mistake I think we make is we think that these new tools should either work in exactly the same way or somehow replace these old tools." He argues for using synthetic sampling for specific tasks, like turnout modeling, rather than as a total replacement. Silver acknowledges this nuance, admitting that "there is evidence that some techniques can replicate topline survey results quickly and cheaply." Yet, he remains firm that the marketing hype often obscures these limitations, with founders claiming their models are "magic" and will render traditional polling obsolete.

The Value of Original Data

Ultimately, Silver reframes the debate not as a choice between old and new, but as a question of value. As AI makes statistical inference cheaper, the relative value of collecting original data actually increases. He writes, "You might be able to train a model to make a reasonable estimate of what some hard-to-reach poll respondent would say... But you don't actually know what these voters think unless you're reaching them directly." This is a strategic imperative for any organization that cannot afford to be blindsided by a sudden shift in public sentiment.

The piece concludes with a sobering look at the current state of the industry. While companies like Aaru have secured billion-dollar valuations and are being used by major corporations like McDonald's, the political sector remains wary. Silver notes that "if it's being used in a campaign, people are keeping it incredibly quiet." This silence speaks volumes. The people who need the most accurate data are the ones most likely to understand the risks of relying on a simulation.

Beyond the frequently misleading marketing, what bothers me about the AI 'poll' hype is that as AI tools make statistical inference cheaper and/or better (note that these are not synonyms) that actually increases the comparative value of collecting original data.

Bottom Line

Silver's argument is a vital corrective to the tech industry's tendency to conflate simulation with reality. The strongest part of his case is the clear distinction between data collection and data modeling, a nuance that is often lost in the rush to adopt AI. The biggest vulnerability, however, is the speed of technological change; while the current models are flawed, the gap between human and machine performance may narrow faster than Silver anticipates. Leaders should watch not for the replacement of polls, but for the subtle, dangerous drift toward a world where we only hear what the machines think we want to hear.

Deep Dives

Explore these related deep dives:

  • The Signal and the Noise by Nate Silver (Amazon · Better World Books)

  • How to Measure Anything by Douglas W. Hubbard (Amazon · Better World Books)

  • Multilevel regression with poststratification

    This statistical technique represents the gold standard for correcting demographic biases in traditional polling, serving as the rigorous methodological counterpoint to the 'synthetic' approaches criticized in the article.

  • Ensemble forecasting

    The article discusses how pollsters aggregate data to predict election outcomes; this concept explains the mathematical logic of combining multiple models to reduce error, in contrast to the single-model reliance of AI agents.

  • Seymour Martin Lipset

    Lipset's modernization thesis holds that economic development fosters democracy. The article's talk of simulating global crop and oil production loosely evokes this kind of framework, but it remains a specific, testable theory that AI simulations cannot evaluate without real-world causal data.
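The error-reduction logic behind ensemble forecasting can be shown with a toy simulation: several unbiased but noisy "models" of a vote share, whose average lands closer to the truth than any single one. The true share, noise level, and model count below are all invented for illustration.

```python
# Toy illustration of ensemble error reduction: averaging independent,
# unbiased forecasts shrinks the typical miss. All numbers hypothetical.
import random

random.seed(0)
TRUE_SHARE = 0.52  # hypothetical true vote share

def noisy_model():
    """One unbiased forecast with independent Gaussian noise."""
    return TRUE_SHARE + random.gauss(0, 0.03)

single_errors, ensemble_errors = [], []
for _ in range(10_000):
    forecasts = [noisy_model() for _ in range(5)]
    single_errors.append(abs(forecasts[0] - TRUE_SHARE))
    ensemble_errors.append(abs(sum(forecasts) / 5 - TRUE_SHARE))

def mean(xs):
    return sum(xs) / len(xs)

print(mean(ensemble_errors) < mean(single_errors))  # → True
```

This is the statistical intuition behind pollster aggregation, and it depends on the errors being independent, a property a single LLM role-playing thousands of respondents does not provide.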

Sources

“AI polls” are fake polls

by Nate Silver

A few weeks after Donald Trump’s second presidential win, I took the train up from London (where I was living at the time) to Oxford to attend a conference on polls and forecasts of the 2024 election. Most of the attendees were pollsters or academics, but I also watched presentations from Aaru and Electric Twin, two companies that do what is interchangeably called synthetic sampling, silicon sampling, or synthetic audiences. Stripped of startup jargon, that means they use large language models (LLMs) to simulate responses to public opinion polls by having AI agents take on the role of survey respondents.

I had already heard of Aaru thanks to some articles with eye-catching headlines like “No people, no problem: AI chatbots predict elections better than humans” in the months leading up to Election Day. The founders were making some big, some might even say far-fetched claims, such as: “within two years, we will simulate the entire globe — from the way crops are grown in Ukraine to how that impacts production of oil in Iraq, trade through the strait of Malacca, and elections for the mayor of Baltimore.” When Semafor asked Aaru’s cofounders — Cameron Fink and Ned Koh — about my boss, they said “we respect all those who came before us.” Nate (as he so often does) shared his thoughts on Twitter:

Fink and Koh were relatively good-natured about this back-and-forth when we spoke at Oxford. They even offered to mail me one of the t-shirts featuring Nate’s quote they apparently had made. I never took them up on the offer, which I now somewhat regret.

These synthetic sampling companies fell off my radar for a while, but they do still exist. In fact, Aaru recently received a $1 billion valuation. Is what they’re doing anywhere close to the most important frontier in AI development? Not by a longshot, especially when Anthropic just developed a model so adept at exploiting software vulnerabilities that it’s only being released to 40 companies.

Still, silicon sampling is increasingly finding its way into public polling. Axios reported in March that “a majority of people trust their own doctors and nurses” based on findings from Aaru — without mentioning that the “people” in that sentence were actually LLMs. Around the same time, the Public Sentiment Institute “boosted” their online sample of 373 real survey respondents with 114 AI agents.1 (Spoiler alert: even the co-founder ...