Wikipedia Deep Dive

Newcomb's problem

Based on Wikipedia: Newcomb's problem

In 1969, Robert Nozick published a paper that would fracture the philosophical community, presenting a scenario so simple a child could understand the rules, yet so devastatingly complex that it has divided the world's brightest minds into two irreconcilable camps for over half a century. The problem, devised by William Newcomb of the University of California's Lawrence Livermore Laboratory, is not a logic puzzle or a lateral-thinking trick. It is a fundamental assault on how we understand rationality, causality, and the very nature of choice. When Nozick first introduced it, he observed a phenomenon that remains true today: "To almost everyone, it is perfectly clear and obvious what should be done. The difficulty is that these people seem to divide almost evenly on the problem, with large numbers thinking that the opposing half is just being silly." This is not a disagreement over facts; it is a disagreement over the architecture of reason itself.

The stage is set with two boxes, labeled A and B, and a being known as the "predictor." The rules are deceptively straightforward. Box A is transparent. You can see right through it. Inside, resting on a plain wooden shelf, sits exactly $1,000. Box B is opaque. You cannot see inside it. Its contents were determined by the predictor before you even walked into the room. The predictor is not a guesser, nor a gambler. In the standard formulation, this entity possesses a predictive accuracy that borders on the miraculous, often described as near-certainty or even absolute infallibility.

The predictor operates on a single, rigid logic. If the predictor foresaw that you would take only Box B, it placed $1,000,000 inside it. If the predictor foresaw that you would take both Box A and Box B, it left Box B empty. The player stands before the boxes, aware of these rules, aware of the predictor's track record, but unaware of what the predictor actually predicted. The contents of Box B are already fixed. You cannot reach in and change them now. You can only choose your action. Do you take only the opaque box, trusting that the million is waiting inside? Or do you take both boxes, pocketing the visible $1,000 but, if the predictor saw you coming, finding Box B empty?

This scenario, which appeared in Martin Gardner's "Mathematical Games" column in the March 1973 issue of Scientific American, became the crucible for a war between two competing principles of decision theory. On one side stands the principle of Strategic Dominance. On the other, the principle of Expected Utility. The tension between them is what makes Newcomb's problem a paradox, or perhaps a mirror reflecting the limitations of our own logical frameworks.

The argument for Strategic Dominance carries the seductive logic of the "common sense" approach. It rests on the fact that at the moment you make your choice, the predictor has already finished its work. The money is either in the box or it isn't. The past is immutable. If Box B in fact contains $1,000,000, taking both boxes nets you $1,001,000, while taking only Box B nets you $1,000,000. In that case, taking both is clearly better. If Box B is in fact empty, taking both boxes nets you $1,000, while taking only Box B nets you $0. Again, taking both is clearly better.

This logic holds regardless of what is actually in the box. No matter the state of Box B, choosing both boxes yields exactly $1,000 more than choosing only Box B. Therefore, a rational agent who adheres to the principle of dominance must choose both boxes. To do otherwise would be to leave free money on the table. As David Lewis, a prominent philosopher, argued, the predictor's prediction is a fact about the past. Your current choice cannot change the past. If the predictor put the money there, it will be there whether you take it or not. If it wasn't there, it wasn't there. The causal arrow points only one way: from the prediction to the box's content, not from your choice to the box's content. Thus, the dominant strategy is to take both.
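
If it helps to see the comparison laid out, here is a minimal sketch that simply tabulates the four outcomes described above and checks the margin in each state; the dollar figures are the standard ones from the problem.

```python
# A minimal sketch of the dominance argument: the four payoffs described above,
# tabulated. Dollar amounts are the standard ones from the problem.
payoffs = {
    ("B full",  "take both"):   1_001_000,
    ("B full",  "take only B"): 1_000_000,
    ("B empty", "take both"):       1_000,
    ("B empty", "take only B"):         0,
}

# Holding the state of Box B fixed, taking both boxes always pays $1,000 more.
for state in ("B full", "B empty"):
    margin = payoffs[(state, "take both")] - payoffs[(state, "take only B")]
    print(f"{state}: two-boxing gains ${margin:,}")
```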

But then, look at the numbers.

Consider the player who follows the logic of Expected Utility. This approach does not look at the isolated states of the world; it looks at the probability of those states given the player's action. The predictor is, by definition, almost always right. Let's assume the predictor is correct 99% of the time, or even 100%. If you decide to take only Box B, the predictor almost certainly predicted this. Consequently, Box B almost certainly contains $1,000,000. Your expected payoff is nearly $1,000,000.

If, however, you decide to take both boxes, the predictor almost certainly predicted this. Consequently, Box B is almost certainly empty. You walk away with the $1,000 from Box A. Your expected payoff is roughly $1,000.

When you run the math, the difference is staggering. A strategy of one-boxing yields, on average, nearly a million dollars per game. A strategy of two-boxing yields little more than a thousand. The rational agent, seeking to maximize their wealth, must choose to take only Box B. The logic is impeccable. If you want the money, you must act as if your choice determines the contents of the box, even though the contents were set in the past. The correlation between your choice and the predictor's prediction is so strong that, statistically, it is as if your choice fixed the contents of the box, even though no causal arrow runs from one to the other.
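
To make the arithmetic concrete, here is a small sketch assuming a 99% accurate predictor. The exact percentage is an illustrative assumption; the standard problem says only that the predictor is almost always right.

```python
# Expected payoffs under an assumed 99%-accurate predictor (illustrative figure).
ACCURACY = 0.99                      # probability the prediction matches your action
SMALL, BIG = 1_000, 1_000_000

# One-boxing: the predictor almost certainly foresaw it and filled Box B.
eu_one_box = ACCURACY * BIG + (1 - ACCURACY) * 0

# Two-boxing: the predictor almost certainly foresaw it and left Box B empty,
# so you usually walk away with only the visible $1,000.
eu_two_box = ACCURACY * SMALL + (1 - ACCURACY) * (SMALL + BIG)

print(f"one-box: ${eu_one_box:,.0f}")    # one-box: $990,000
print(f"two-box: ${eu_two_box:,.0f}")    # two-box: $11,000
```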

The paradox arises because both arguments feel undeniably correct. The dominance principle says, "The money is already there or it isn't; take the extra thousand." The expected utility principle says, "If you take the extra thousand, you are signaling to the predictor that you are the type of person who takes the extra thousand, and thus you lose the million."

In a 2020 survey of professional philosophers, this divide was quantified with painful precision. A modest plurality of 39.0% chose to take both boxes, adhering to the dominance principle. However, 31.2% chose to take only Box B, siding with expected utility. The remaining respondents were undecided or held other views. This is not a split between the educated and the ignorant, or the logical and the emotional. It is a split between two coherent, rigorous, yet mutually exclusive definitions of what it means to be rational. The people on each side look at the other and see "silly" behavior, unable to comprehend how someone could fail to see the obvious truth of their own position.

To understand why this schism exists, we must look deeper into the mechanics of the predictor. The paradox dissolves if we change the nature of the connection between the player and the predictor. Consider three specific variations that change the causal structure of the problem.

First, imagine the predictor places the million in Box B and then uses a hidden trapdoor to remove it the instant you reach for Box A. In this scenario, there is a direct, physical causal link between your action and the disappearance of the money. If you two-box, you trigger the trapdoor. If you one-box, the money stays. Here, everyone agrees: you must one-box. The dominance principle fails because the "state of the box" is not independent of your action; it is dynamically dependent on it.

Second, imagine retrocausality. Suppose the future can influence the past. In this case, your choice in the present literally determines what the predictor put in the box yesterday. If you choose to one-box, you cause the predictor to have predicted one-boxing. If you choose to two-box, you cause the predictor to have predicted two-boxing. Again, the consensus is clear: one-boxing is the only rational path. The "past" is not fixed; it is responsive to your current decision.

Third, consider a scenario where the game is played multiple times, and the predictor learns from your behavior. If the predictor is a machine that updates its algorithm based on your history, and you are trying to maximize your long-term winnings, the only way to train the predictor to put money in the box is to consistently one-box. Two-boxing in a single round might seem like a win, but it teaches the predictor to empty the box in the future. Over time, the one-boxer wins everything; the two-boxer wins nothing.
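
As a toy illustration of this repeated version, the sketch below assumes the simplest possible learning rule, a predictor that just predicts whatever you did in the previous round, and a hundred rounds of play. Neither assumption comes from the original problem; they only make the long-run contrast visible.

```python
# Toy repeated Newcomb game with an assumed learning rule: the predictor simply
# predicts whatever the player did in the previous round.
def play(strategy, rounds=100):
    total = 0
    prediction = "two-box"                 # assumed cautious first prediction
    for _ in range(rounds):
        choice = strategy()
        box_b = 1_000_000 if prediction == "one-box" else 0
        total += box_b if choice == "one-box" else box_b + 1_000
        prediction = choice                # the predictor updates on observed behaviour
    return total

print(f"{play(lambda: 'one-box'):,}")   # 99,000,000 (misses only the first round)
print(f"{play(lambda: 'two-box'):,}")   # 100,000 (Box B is empty in every round)
```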

In all three of these variations, the controversy evaporates. The causal link between the choice and the outcome is explicit. The paradox of Newcomb's problem exists specifically because the standard version strips away these causal mechanisms, leaving us with a "post-determination" premise. The prediction is made, the box is filled, and only then is the choice made, yet the correlation remains perfect. The mystery lies in why the predictor is so accurate.

This is where the debate shifts from decision theory to the metaphysics of free will and determinism. If the predictor is infallible, it implies that your choice was determined long before you made it. The predictor did not guess; it calculated the outcome of a deterministic universe. If the universe is deterministic, then the state of the boxes and your decision are both the result of prior causes stretching back to the Big Bang. In such a world, "free will" in the libertarian sense—where you could have done otherwise given the exact same prior conditions—is an illusion.

Some philosophers, such as William Lane Craig, have suggested that in a world with perfect predictors, retrocausality is not just a theoretical possibility but a necessary mechanism. The chooser's choice can be said to have caused the predictor's prediction. This creates a closed causal loop, a time loop of sorts, where the effect (the prediction) precedes the cause (the choice) in time, yet depends on it for its content. If this is true, then the "dominance" argument is a fallacy because it assumes the contents of the box are independent of the choice. They are not. They are correlated by a deeper, perhaps retrocausal, necessity.

However, the defenders of two-boxing argue that this conflates correlation with causation. They invoke Causal Decision Theory (CDT), a framework that insists an agent should only consider the causal consequences of their actions. According to CDT, your current decision cannot reach back in time to alter the predictor's past action. The money is either there or it isn't. The fact that your decision is correlated with the money's presence is not a causal link; it is a consequence of your character, which the predictor read long before you walked into the room. Therefore, CDT dictates that you must take both boxes to maximize the utility of your current action, regardless of the predictor's track record.

Critics of CDT, including Caspar Oesterheld and Vincent Conitzer, argue that this approach is flawed. They point out that agents using CDT in Newcomb-like scenarios will systematically lose money. If you consistently two-box, you will never get the million. Over a lifetime of such decisions, the CDT agent will be poorer than the agent who follows Expected Utility Theory. They argue that CDT is subject to "diachronic Dutch books," a situation where an agent can be exploited over time because their decision rule fails to account for the predictive nature of the environment. In their view, a decision theory that leads an agent to voluntarily lose money in a scenario where they could have won it is, by definition, irrational.

The debate also touches on the nature of the predictor itself. Is it a super-intelligent alien? A god? A computer running a simulation of your mind? If the predictor is simulating you to see what you will do, then the simulation is you. When you make a decision, you are essentially running the simulation. If the simulation runs and decides to two-box, the real you will two-box. The predictor, having seen the simulation, knows this. The distinction between the "real" you and the "simulated" you collapses. In this view, your decision is the same decision the predictor made. To say "I will choose two-boxing because the money is already there" is a logical error; the money is there because the simulation chose one-boxing. If the simulation chose two-boxing, the money would not be there.

This leads to the interpretation that Newcomb's problem is not really about boxes and money at all. It is a test of whether an agent can recognize that their decision-making process is part of a larger system. If you view your decision as an isolated event, you will two-box. If you view your decision as a signal of your nature, or a data point in a deterministic chain, you will one-box.

David Wolpert and Gregory Benford offered a brilliant resolution to the confusion. They pointed out that the paradox arises only because the problem statement is underspecified. It does not fully define the nature of the predictor or the causal structure. They argue that there are at least two distinct "games" hidden within the single problem statement. In one game, the predictor's accuracy is based on a causal mechanism (like the trapdoor or the simulation), making one-boxing the optimal strategy. In the other game, the predictor's accuracy is a brute fact of the universe, unconnected to the player's decision process, making two-boxing the optimal strategy. The confusion stems from people assuming they are playing the same game while actually applying the logic of different games to the same scenario.

Wolpert and Benford's analysis suggests that the debate is futile if we do not specify the mechanism of prediction. If the predictor is infallible, the problem is equivalent to a game where your choice and the prediction are perfectly correlated. In such a game, the concept of "strategic dominance" breaks down because the states of the world (money in box / money not in box) are not independent of your strategy. The "state" is not a fixed backdrop; it is a function of your strategy. Therefore, the dominance principle, which requires the states to be independent of the action, is inapplicable.
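
One way to see their point is to compare the two readings directly. In the sketch below, the only thing that changes between "games" is whether the probability that Box B is full depends on your action; the 99% and 50% figures are illustrative assumptions, not part of the problem.

```python
# Two readings of the same problem: does the probability that Box B is full
# depend on your action? The specific probabilities are illustrative.
SMALL, BIG = 1_000, 1_000_000

def expected_payoffs(p_full_if_one_box, p_full_if_two_box):
    one_box = p_full_if_one_box * BIG
    two_box = p_full_if_two_box * (BIG + SMALL) + (1 - p_full_if_two_box) * SMALL
    return one_box, two_box

# Game 1: the prediction tracks your decision process (assumed 99% accuracy).
print(expected_payoffs(0.99, 0.01))   # (990000.0, 11000.0)  -> one-boxing wins

# Game 2: the state of Box B is a fixed backdrop, the same whatever you do.
print(expected_payoffs(0.5, 0.5))     # (500000.0, 501000.0) -> two-boxing wins
```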

This insight forces us to reconsider the very foundation of decision theory. For decades, philosophers have treated the "rational choice" as a universal constant, independent of the context. Newcomb's problem suggests that rationality is context-dependent. What is rational in a world where your choices have no causal bearing on the past (the standard two-boxing view) is irrational in a world where your choices are the primary determinant of the outcome (the one-boxing view).

The problem also serves as a powerful critique of the concept of free will. If the predictor is infallible, then your choice was determined. If your choice was determined, then you did not "choose" in the sense of having the ability to do otherwise. Yet, the one-boxer argues that they are making a rational choice to maximize utility. This creates a tension: can you make a free, rational choice in a deterministic universe? The answer seems to be yes, if you define "free" not as the ability to do otherwise, but as the ability to act according to your reasons and desires. The one-boxer acts according to the reason that they want a million dollars. The two-boxer acts according to the reason that they want to maximize the payout given the current state. Both are acting rationally, but they are operating under different metaphysical assumptions.

The persistence of the paradox is a testament to the depth of the human condition. We are creatures who crave causal control. We want to believe that our actions matter, that they can change the world. The dominance principle appeals to this desire. It says, "Your action here and now is the only thing that matters. The past is done. Seize the opportunity." It feels empowering. The expected utility principle, by contrast, feels like surrender. It says, "Your action is already known. You are just confirming a prediction. Do what is necessary to confirm the good prediction." It feels like a trap.

Yet, the math does not lie. In the limit case of an infallible predictor, the one-boxer wins every time. The two-boxer wins nothing. The only way to get the million is to trust the correlation, to believe that the predictor's knowledge of your mind is as real as the money in the box. It requires a leap of faith—not in a religious sense, but in a logical one. It requires accepting that the universe is structured in such a way that your decision is inextricably linked to the outcome, even if the link is not causal in the traditional sense.

As we look to the future of decision theory, Newcomb's problem remains a vital, unsettling fixture. It challenges the dominance of Causal Decision Theory and keeps rival frameworks in play, from the older Evidential Decision Theory to newer proposals such as Functional Decision Theory. These theories attempt to capture the intuition that an agent's decision algorithm is a fundamental part of the world, not just a reaction to it. They suggest that to be truly rational, one must consider not just the causal effects of an action, but the logical implications of the decision process itself.

The problem also has implications for artificial intelligence. If we create an AI that must make decisions in a world where it is being predicted or simulated, how should it behave? If the AI follows CDT, it will two-box and fail. If it follows EDT, it will one-box and succeed. The design of rational agents may hinge on how we resolve this ancient paradox.

In the end, Newcomb's problem is a mirror. It shows us that our understanding of "rationality" is not a single, monolithic truth, but a complex interplay of beliefs about causality, time, and the nature of the self. It forces us to ask: Are we the masters of our fate, or are we merely actors in a script that was written before we were born? The answer depends on which box you choose. And perhaps, in choosing, we define the answer for ourselves.

The debate is far from over. As new variants of the problem are proposed and new theories of decision making emerge, the near-even split among philosophers remains a stubborn testament to the difficulty of the task. We are left with two camps, both convinced of their own brilliance, both unable to convince the other. In a world of increasing complexity, where algorithms predict our desires before we feel them, Newcomb's problem is no longer just a thought experiment. It is a warning. It is a reminder that in a universe where our choices are known, the line between freedom and fate is thinner than we dare to imagine.

The boxes are there. The money is waiting. The predictor is watching. The only question that remains is what you will do. Will you take the thousand and lose the million, or will you take the million and lose the thousand? The answer, it turns out, is not in the boxes. It is in you.

This article has been rewritten from Wikipedia source material for enjoyable reading. Content may have been condensed, restructured, or simplified.