
Regularized Adjusted Plus-Minus (RAPM) Metrics for NFL Players

Ray Carpenter attempts the impossible: translating the fluid, possession-based logic of basketball analytics into the chaotic, position-siloed reality of the NFL. In a field often obsessed with refining existing metrics, Carpenter's most distinctive claim is that the very structure of football data resists the "plus-minus" model that revolutionized basketball, forcing a complete reinvention of how we value individual contribution.

The Data Dilemma

Carpenter begins by acknowledging the structural mismatch between the two sports. He writes, "Plus/minus ratings don't exactly translate perfectly to NFL play, but that's never stopped us from exploring the topic before." This admission sets the stage for a rigorous, iterative process rather than a simple copy-paste of NBA methodology. The author leverages a decade of snap participation data, merging it with play-by-play metrics like Expected Points Added (EPA) to create a baseline. By applying "heavier weights to more recent data," he attempts to solve the problem of obsolescence, noting that "good play in 2025 matters more than good play in 2019." This approach mirrors the temporal decay concepts found in broader sports modeling, similar to the exponential decay methods used in predicting Super Bowl outcomes, yet Carpenter pushes further by testing a "ridge regression" to prevent statistical noise from distorting player value.


The core of his argument rests on the "shrinkage" technique, which pulls extreme values toward the mean. Carpenter explains, "The ridge portion of this pulls everyone's average down towards zero, to try and avoid the classic pitfalls of the NBA's RAPM like over-valuing a player who doesn't play many minutes at all." This is a critical distinction. In basketball, a player with limited minutes might still have a massive impact per possession. In football, a lineman who plays one snap and gets lucky with a touchdown play could skew the entire dataset without this correction. Critics might note that even with shrinkage, the model struggles to isolate individual impact in a sport where 11 players move as a single unit, making the "plus-minus" concept inherently noisy.
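To make the shrinkage mechanism concrete, here is a minimal sketch of a RAPM-style ridge fit in Python with synthetic data. This is not Carpenter's actual code or data; the design matrix of player on-field indicators, the EPA target, and the penalty strength `alpha=100.0` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_plays, n_players = 500, 20
# Each row is a play; each column is 1 if a (made-up) player was on the field.
X = rng.integers(0, 2, size=(n_plays, n_players)).astype(float)
# Observed per-play EPA: small true player effects plus heavy play-level noise.
y = X @ rng.normal(0.0, 0.05, n_players) + rng.normal(0.0, 0.4, n_plays)

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution: (X'X + alpha*I)^-1 X'y.
    alpha = 0 recovers ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

ols_coefs = ridge_fit(X, y, alpha=0.0)
rapm_coefs = ridge_fit(X, y, alpha=100.0)

# The penalty shrinks every rating toward zero -- the behavior Carpenter
# describes as pulling "everyone's average down towards zero", which keeps
# a lucky low-snap player from posting an extreme rating.
print(np.linalg.norm(rapm_coefs) < np.linalg.norm(ols_coefs))  # True
```

The penalized coefficient vector always has a smaller norm than the unpenalized one, which is exactly why noisy, small-sample players get pulled back toward average.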

"Working with NFL data is like trying to get blood from a turnip, especially from free sources."

The Iterative Process

Carpenter does not present a finished product but rather a transparent journey through failure and refinement. His first version (V1) of the metric produced results that felt intuitively wrong, heavily favoring players like Stefon Diggs and Tua Tagovailoa. He attributes this to the model's inability to account for game context, writing, "I think this original model liked Stefon Diggs a lot because he was one of the only players in almost every participation dataset since 2016." The metric also inadvertently rewarded players who missed time due to injury, as their teammates' struggles while they were absent inflated their "on/off" differential. Carpenter admits, "This original measure is extremely inflating Tua Tagovailoa because of the poor quarterback play that happens down in Miami when he misses time."
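The injury-inflation mechanism is easy to see with a toy on/off calculation. The per-play EPA numbers below are invented, not real Dolphins data; the point is only that the differential credits the starter for the backup's struggles.

```python
# Made-up per-play EPA figures illustrating on/off inflation.
epa_with_starter = [0.12, 0.08, 0.15, 0.10]   # offense with the starter on the field
epa_without_starter = [-0.20, -0.25, -0.18]   # offense while he is out injured

on = sum(epa_with_starter) / len(epa_with_starter)
off = sum(epa_without_starter) / len(epa_without_starter)

# A big on/off gap here says as much about the backup as about the starter.
print(f"on/off differential: {on - off:+.3f} EPA per play")
```

The starter's own play never changes in this example; his rating is driven entirely by how bad things get without him.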

To address this, he pivots to a second version (V2), scaling the metric to 75 plays instead of the NBA-standard 100, a decision driven by the reality that "the average NFL game has roughly 153 plays in it." He combines EPA with Win Probability Added (WPA) to discount "garbage time" contributions, a nuance often missing in standard efficiency stats. However, even this refinement had flaws. Carpenter observes, "This one might actually be slightly more suspect though, since TJ Bass played fewer than 400 snaps for the Cowboys this season." The model continued to favor low-snap players, suggesting that the weighting of recency was too aggressive. This iterative transparency is the piece's greatest strength; it shows that advanced metrics are not static truths but evolving hypotheses.
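The V2 ideas can be sketched as two small helpers. The 75-play rescaling follows the logic quoted above (roughly 153 plays per game, so ~75 approximates one unit's share); the 50/50 EPA/WPA blend weight is an assumption, since Carpenter does not publish his exact mix.

```python
def per_75_plays(per_play_value: float) -> float:
    """Re-express a per-play rating on a 75-play basis (an NFL game has
    roughly 153 plays, so ~75 approximates one side of the ball's share)."""
    return per_play_value * 75

def blended_impact(epa: float, wpa: float, w: float = 0.5) -> float:
    """Mix EPA with WPA. WPA is near zero in blowouts, so blending it in
    discounts garbage-time production. The weight w is a guess."""
    return w * epa + (1.0 - w) * wpa

# A player averaging +0.02 EPA/play projects to +1.5 points per 75 plays.
print(per_75_plays(0.02))
```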

Structural Limitations

Despite the sophisticated modeling, Carpenter is candid about the fundamental barriers preventing a perfect metric. He highlights the "democratic" nature of basketball responsibility versus the rigid specialization of football. "The NBA simply has a larger sample size than the NFL does, and the sport lends itself to a more democratic form of responsibility than football does," he notes. The current model cannot distinguish between a defensive lineman who bull-rushes a quarterback and a wide receiver on the opposite side of the field who had no involvement in the play. "If he lined up off the left tackle, bull-rushed him, and sacked the QB, this current RAPM formula punishes every player equally for being on the field for that negative play," Carpenter writes. This is a significant blind spot that no amount of regression can fully resolve without granular tracking data that is not yet widely available.
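The equal-punishment blind spot follows directly from how a lineup-indicator model assigns credit. In the toy illustration below (positions and EPA value are made up), a sack charges all eleven on-field offensive players identically, including the backside receiver who never touched the play.

```python
# A sack worth -1.8 EPA, charged to every on-field offensive player equally.
play_epa = -1.8
on_field = ["LT", "LG", "C", "RG", "RT", "TE", "QB", "RB", "WR1", "WR2", "WR3"]
blame = {player: play_epa for player in on_field}

print(all(v == play_epa for v in blame.values()))  # True: equal punishment
```

Only player-level tracking data (who blocked whom, who ran which route) could break this tie, which is why Carpenter calls it a structural limit rather than a modeling bug.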

Furthermore, the model struggles with special teams, a crucial third phase of the game. "Apologies to special teams, and I'm bummed I couldn't produce something that yielded effective results for them," he admits. This limitation underscores the difficulty of applying a unified metric to a sport with such distinct, non-overlapping roles. While the author successfully isolates offensive and defensive contributions, the inability to integrate special teams means the metric remains incomplete.

"To think I could create an entire advanced metric in under a week was bold, and I had fun trying."

Bottom Line

Carpenter's exploration of Regularized Adjusted Plus-Minus for the NFL is a masterclass in methodological humility, proving that the most valuable insights often come from understanding where a model fails. While the current iteration still overvalues players with limited snap counts and struggles to isolate individual impact in a team sport, the framework provides a necessary foundation for future refinement. The strongest part of this argument is its rejection of the "black box" approach, offering readers a clear view of the statistical trade-offs involved in quantifying football performance. The biggest vulnerability remains the data's inability to capture the specific spatial dynamics of a play, a hurdle that will require more than just better math to overcome.

Sources

Regularized Adjusted Plus-Minus (RAPM) metrics for NFL players

by Ray Carpenter · The Spade

Good morning everyone,

It’s the NFL off-season and we’re heading for an amazing March Madness. If I had to explain why I’m on a basketball kick again, those are probably the two reasons why. And of course, the endless pool of knowledge in the world of basketball analytics. A couple of weeks ago, I attempted to calculate NFL on/off stats since nflreadR released their snap participation data for the season and freed me from the shackles of web scraping. That was me dipping my toes in the shallow end of basketball analytics, and this week’s edition is me diving head first into the deep end. We’re going to attempt to apply an NBA stat called Regularized Adjusted Plus-Minus (RAPM) to football.

You may be asking yourself, what is adjusted plus-minus? Jeremias Engelmann did a better job explaining it than I ever could here, and here. The gist of adjusted plus-minus is that it’s a stat trying to measure the impact of individual NBA players. Plus/minus ratings don’t exactly translate perfectly to NFL play, but that’s never stopped us from exploring the topic before.

The Spade is a weekly football analytics newsletter covering the NFL, college football, and everything in between. Lately we’ve been focused on the landscape of the college football transfer portal and applying NBA stats to the NFL. In the coming weeks, I’ll be publishing some NFL draft content. If that stuff sounds interesting to you, I’d love to have you along for the ride as a subscriber for free:

There’s a paid option as well if you’d like access to my coding tutorials for R, Python, SQL, and soon, D3. Let’s dig up some more football data visualizations.

Inputs.

We’re using nflreadR’s participation data back from 2016 until now to grab snap participation data and merge it to play-by-play data that has all the good stuff we’re looking for, like Expected Points Added (EPA) per play. Like NBA RAPM, we’re applying heavier weights to more recent data. That’s just a fancy way of saying that good play in 2025 matters more than good play in 2019. I got that idea of recency decay from this paper by Joseph Sill, and applied something similar. I was messing with concepts of exponential decay back when we were predicting the Super Bowl winning Gatorade Shower color.
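The recency weighting described here can be sketched as an exponential decay over seasons. The half-life of two years below is an assumption for illustration, not the paper's or Carpenter's actual rate.

```python
def season_weight(season: int, current: int = 2025, half_life: float = 2.0) -> float:
    """Exponential recency decay: a season's weight halves every
    `half_life` years back from the current season."""
    return 0.5 ** ((current - season) / half_life)

# 2025 plays count fully; 2019 plays count one-eighth as much.
print({yr: season_weight(yr) for yr in (2025, 2023, 2021, 2019)})
```

In practice these weights would multiply each play's contribution to the regression, so a great 2019 season fades rather than disappears.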

I chose a ridge regression for version 1 of the NFL RAPM equivalent. Per ...