A Metaphor Worth Stress-Testing
Kenny Easwaran's lecture on the "data is the new oil" metaphor does something refreshingly rare in technology discourse: it takes the comparison seriously enough to interrogate it historically. Rather than accepting the slogan at face value or dismissing it as Silicon Valley bluster, Easwaran walks through the actual economic history of petroleum and asks whether the parallel holds up under scrutiny. The answer, it turns out, is more nuanced than either cheerleaders or skeptics tend to acknowledge.
The lecture's strongest move is grounding the oil side of the metaphor in specifics. Easwaran traces petroleum from an "obscure geological feature" in the 1830s through Abraham Gesner's refinement of kerosene in the 1850s, and then to its transformative role in powering automobiles, trains, ships, and aircraft. The key insight is that oil's strategic importance did not come from its initial use in lighting. It came from the secondary applications that emerged once refinement infrastructure already existed.
"The essential features of oil for its strategic importance are these: there was an initial use, lighting, that created the market, made it valuable to start collecting and refining oil, and then once this market existed, a new use was developed for transportation that turned out to be central to huge numbers of valuable processes, and nothing else could compete with it."
This framing sets up the parallel to data neatly. Google's advertising business was the kerosene lamp: the initial use case that justified gathering and refining the resource. Neural networks and machine learning became the automobile: a secondary application so powerful it redefined the resource's strategic significance.
The Netflix Prize as Inflection Point
Easwaran's account of the Netflix Prize competition is particularly well-chosen as a case study. The 2006 challenge, which offered one million dollars for a ten percent improvement in movie recommendations, did more than advance recommendation algorithms. It demonstrated that anonymized data could be re-identified by cross-referencing it with public datasets such as IMDb reviews, a finding by Arvind Narayanan and Vitaly Shmatikov at the University of Texas at Austin that effectively killed Netflix's plans for a second competition.
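The mechanics of that re-identification are worth making concrete. Here is a minimal, self-contained sketch of such a linkage attack; the users, movies, dates, and thresholds are all invented for illustration, and the real Narayanan-Shmatikov attack used a more careful weighted scoring rule over far sparser records:

```python
# Toy linkage attack: an "anonymized" ratings table is matched against a
# public review site by overlapping (movie, rating, approximate date)
# tuples. All names and data below are invented for illustration.
from datetime import date

# Anonymized release: user identities replaced with opaque tokens.
anonymized = {
    "user_a7f3": [("Movie X", 5, date(2005, 3, 1)),
                  ("Movie Y", 2, date(2005, 3, 4)),
                  ("Movie Z", 4, date(2005, 4, 9))],
    "user_9c21": [("Movie X", 3, date(2005, 6, 2))],
}

# Public reviews with real names attached (e.g. scraped from a review site).
public = {
    "Alice": [("Movie X", 5, date(2005, 3, 2)),
              ("Movie Z", 4, date(2005, 4, 10))],
}

def matches(a, b, day_slack=3):
    """Two ratings match if movie and score agree and dates are close."""
    return a[0] == b[0] and a[1] == b[1] and abs((a[2] - b[2]).days) <= day_slack

def reidentify(anonymized, public, min_overlap=2):
    """Link an opaque token to a named user when enough ratings line up."""
    links = {}
    for token, ratings in anonymized.items():
        for name, reviews in public.items():
            overlap = sum(any(matches(r, p) for p in reviews) for r in ratings)
            if overlap >= min_overlap:
                links[token] = name
    return links

print(reidentify(anonymized, public))  # -> {'user_a7f3': 'Alice'}
```

Even this toy version shows why the finding was alarming: no single rating identifies anyone, but two or three ratings with approximate dates are already a near-unique fingerprint.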
"In one stroke this competition had shown the value of data for improving computer services and shown the risk of the public display of user data, and also developed several of the techniques that would turn out to become more powerful for both of these applications in coming years."
This is a genuinely important observation that deserves more emphasis than the lecture gives it. The Netflix Prize was not just a milestone in machine learning; it was an early warning about the tension between data utility and privacy that continues to define technology policy debates today. The European Union's General Data Protection Regulation, California's Consumer Privacy Act, and ongoing battles over facial recognition databases all trace their intellectual lineage, in part, to the realization that "anonymized" data is often anything but.
Where the Metaphor Breaks Down
Easwaran identifies several points where the oil-data comparison falters, and these are arguably the most valuable parts of the lecture. Oil is fungible: a barrel of crude from Saudi Arabia and a barrel from Texas can be refined into roughly interchangeable products. Data is not fungible at all. Netflix viewing data cannot train an image classifier. Medical records cannot improve a search engine's ad targeting. Each form of data serves specific purposes and cannot easily be substituted for another.
"Different sources of data are not at all interchangeable. Data about words and data about images and data about locations and things like that are all valuable in very different ways."
There is a counterpoint worth raising here, however. The advent of multimodal models has begun to blur these boundaries in ways that Easwaran's framework does not fully anticipate. Modern foundation models like GPT-4, Claude, and Gemini are trained on text, images, code, and audio simultaneously. The technique of contrastive learning, popularized for image-text pairs by models like CLIP, explicitly maps image data and text data into a shared embedding space, making them partially interchangeable for training purposes. While it remains true that Netflix ratings cannot directly train an image classifier, the broader trend is toward data becoming more fungible than it was even a few years ago.
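To make that cross-modal mapping concrete, here is a minimal sketch of a CLIP-style contrastive objective. It uses NumPy and toy embedding arrays in place of real image and text encoders; the function name, dimensions, and temperature value are illustrative assumptions, not CLIP's actual implementation:

```python
# CLIP-style contrastive objective (toy sketch): matched image/text pairs
# are pushed together in a shared space, mismatched pairs pushed apart.
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over cosine-similarity logits.

    image_emb, text_emb: (n, d) arrays where row i of each side
    belongs to the same image/caption pair.
    """
    # L2-normalize so dot products are cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature   # (n, n); correct pairs on the diagonal
    diag = np.arange(len(img))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[diag, diag].mean()       # -log p(correct pair)

    # Average the image->text and text->image directions.
    return (xent(logits) + xent(logits.T)) / 2

# Correctly matched pairs should yield a lower loss than shuffled pairs.
rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))
matched = clip_style_loss(imgs, imgs + 0.01 * rng.normal(size=(4, 8)))
shuffled = clip_style_loss(imgs, np.roll(imgs, 1, axis=0))
print(matched < shuffled)  # True
```

The point for the fungibility argument is that once both modalities live in one embedding space, text supervision can stand in for image labels and vice versa, which is exactly the partial interchangeability described above.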
Another break in the metaphor that Easwaran notes but perhaps underweights: oil is consumed when used, while data is not. A barrel of oil, once burned, is gone. A dataset can be copied, shared, and reused indefinitely at near-zero marginal cost. This difference has profound economic implications. Oil markets are governed by scarcity. Data markets are governed by network effects, lock-in, and the cost of collection, not the cost of replication. The strategic dynamics are fundamentally different: controlling oil means controlling a depleting resource, while controlling data means controlling an accumulating one.
The Missing Dimension: Power
The lecture is thorough on the economic parallels but lighter on the political ones. When Easwaran mentions that oil was "strategically important enough that many wars are said to be about it," the natural follow-up question is whether data has already reached that threshold. The evidence suggests it has, or is close. The United States government's restrictions on semiconductor exports to China, the geopolitical competition over undersea cables, and the weaponization of social media data in election interference campaigns all point to data's role as a strategic resource in international competition.
Easwaran gestures toward this with a brief mention of election influence in the mid to late 2010s, but the Cambridge Analytica scandal, the role of recommendation algorithms in radicalization, and the use of surveillance data by authoritarian governments all suggest that data's strategic importance may already rival oil's in certain domains. The metaphor might be more apt than the lecture's cautious conclusion implies.
The 25-Year Argument
The lecture's closing argument is its most provocative. Easwaran places the current moment in data's history at roughly the equivalent of the 1880s in oil's history, the moment when Benz and Daimler were inventing the automobile but before anyone could imagine the world that petroleum-powered transportation would create.
"We're barely 25 years out from the start of the commercial extraction of data. In the history of oil, that would put us at the moment that Benz and Daimler invented the automobile. We haven't yet seen where the market for data is going."
This is a compelling framing, though it carries an implicit assumption worth questioning: that data's trajectory will follow oil's in terms of increasing strategic importance. There is an alternative scenario in which data becomes less strategic over time as collection becomes universal and commoditized. If every company and government has access to vast datasets, the competitive advantage may shift entirely to refinement capability, meaning compute power and algorithmic sophistication, rather than raw data ownership. In that world, "data is the new oil" would be less accurate than "compute is the new oil" or even "talent is the new oil."
Bottom Line
Easwaran's lecture succeeds as a historically grounded examination of a metaphor that is too often deployed without thought. The parallel between oil's initial use in lighting and data's initial use in advertising, and between their respective secondary applications in transportation and machine learning, is genuinely illuminating. The lecture is at its best when identifying where the metaphor breaks down: data's lack of fungibility, its non-rivalrous nature, and the uncertainty about whether its applications will ever be as universally central as transportation. Where it falls short is in underexploring the political and power dimensions of data as a strategic resource. The metaphor may be imperfect, but it is far from dead, and the lecture provides a solid foundation for thinking about why.