Glicko rating system
Based on Wikipedia: Glicko rating system
In 1995, Mark Glickman, a statistician at Harvard University, identified a fundamental flaw in the way the world measured human competitive potential. For decades, the chess world and other competitive gaming spheres had relied on the Elo rating system, a method that treated a player's skill as a static, immutable number. If you were a 1500-rated player, the system assumed you were exactly a 1500-rated player, regardless of whether you had just played five games last week or hadn't touched a board in five years. Glickman realized this was a statistical fallacy. Skill is not a stone; it is a fluid, uncertain variable that fluctuates based on recent performance and the passage of time. To fix this, he invented the Glicko rating system, introducing a concept that would revolutionize how we quantify uncertainty in zero-sum games: Ratings Deviation, or RD.
This was not merely a minor tweak to an existing formula. It was a paradigm shift from measuring a single point of truth to measuring a range of probable truths. The Glicko system, and its subsequent iteration Glicko-2, transformed the way online gaming platforms from Lichess to Counter-Strike 2 determine who is better than whom. By acknowledging that we can never be 100% certain of a player's true strength, Glickman created a mathematical framework that is more honest, more dynamic, and far more accurate than its predecessor.
The Illusion of Certainty
To understand why Glickman's work was so necessary, one must first appreciate the limitations of the system it replaced. The Elo system, developed by Arpad Elo in the 1960s, operates on a simple premise: if a high-rated player beats a low-rated player, the high-rated player gains a few points, and the low-rated player loses a few. If the low-rated player upsets the favorite, the point exchange is massive. It is an elegant solution, but it suffers from a critical blind spot: it assumes every rating is equally reliable.
In the real world, this is absurd. Consider two players, both with a rating of 1500. Player A has played 500 games over the last three years, maintaining a steady win-loss ratio. Player B has played only three games in the last five years, winning two and losing one. Under Elo, these two players are identical. They are interchangeable data points. But any human observer knows they are not. Player A's rating is a solidified reflection of their current ability. Player B's rating is a guess, a statistical ghost that could easily be 1800 or 1200.
Glickman's breakthrough was to attach a measure of confidence to every single rating. He called this the Ratings Deviation (RD). In the Glicko system, a player is not defined by a single number, but by a pair of numbers: their rating and their RD. The RD represents one standard deviation of the player's true skill level. It is a quantification of doubt.
"A player with a rating of 1500 and an RD of 50 is expected to have a real strength between 1402 and 1598."
This range is not arbitrary; it is a 95% confidence interval. By adding and subtracting 1.96 times the RD from the rating, the system calculates the boundaries within which the player's true strength almost certainly lies. If the RD is high, the range is wide, indicating that the system knows very little about the player's actual capability. If the RD is low, the range is narrow, signifying high certainty.
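The arithmetic is simple enough to show directly. A minimal sketch (the function name `confidence_interval` is ours, not part of any Glicko reference implementation):

```python
def confidence_interval(rating: float, rd: float) -> tuple[float, float]:
    """95% confidence interval for a player's true strength.

    1.96 is the z-score bounding the central 95% of a normal
    distribution; RD is one standard deviation of the true skill.
    """
    return rating - 1.96 * rd, rating + 1.96 * rd

# The example from the text: rating 1500, RD 50.
low, high = confidence_interval(1500, 50)
print(low, high)  # 1402.0 1598.0
```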
This distinction changes everything about how ratings evolve. In the Glicko system, the amount a player's rating changes after a game is directly proportional to the reliability of the data. If a player with a low RD (high certainty) plays a game, their rating will change very little, even in the event of an upset. The system trusts the established rating more than the single new data point. Conversely, if a player with a high RD (low certainty) plays, their rating can swing wildly. The system is eager to learn, to narrow that confidence interval, and to pin down the player's true strength.
Furthermore, the system treats the opponent's uncertainty with equal gravity. If a player defeats an opponent with a high RD, the winner's rating does not skyrocket. Why? Because the opponent's true strength is unknown. Beating a mystery opponent tells you very little about your own skill. The information gain is minimal. This logic dampens the "rating inflation" and "rating deflation" that can plague systems in which a single lucky win against a poorly calibrated opponent can catapult a player into a new tier.
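In the original Glicko formulas, this dampening enters through a function $g(RD)$ that shrinks the expected-score gap as the opponent's RD grows. A sketch using Glicko-1's published constants (function and variable names are our own):

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant, roughly 0.00576

def g(rd: float) -> float:
    """Dampening factor: equals 1 at RD = 0 and shrinks as RD grows."""
    return 1 / math.sqrt(1 + 3 * (Q ** 2) * (rd ** 2) / math.pi ** 2)

def expected_score(r: float, r_j: float, rd_j: float) -> float:
    """Expected score against opponent j, discounted by their uncertainty."""
    return 1 / (1 + 10 ** (-g(rd_j) * (r - r_j) / 400))

# A 1700 player vs a 1500 opponent: the more uncertain the opponent,
# the closer the expected score sits to a coin flip (0.5), so winning
# against them shifts the player's rating less.
print(expected_score(1700, 1500, 30))   # confident opponent: clear favorite
print(expected_score(1700, 1500, 300))  # uncertain opponent: nearer to 0.5
```

Because the rating update is proportional to the gap between actual and expected score, pulling the expectation toward 0.5 is exactly what makes a win over a mystery opponent nearly uninformative.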
The Erosion of Knowledge
One of the most elegant features of the Glicko system is its treatment of time. In the Elo system, a rating is eternal. A player who achieves a 2000 rating in 1990 retains that 2000 rating in 2026 unless they play another game. This implies that if a player stops playing for a decade, their skill remains frozen in time, and the system remains just as confident in that number as it was when they last played.
Glickman recognized that skill decays, or at least becomes harder to track, with inactivity. A chess player who does not play for five years may have forgotten opening theory, or their reaction times may have slowed, or their strategic intuition may have rusted. The Glicko system accounts for this by allowing the RD to increase over time.
The formula for this adjustment is straightforward yet profound. The new Ratings Deviation ($RD$) is found using the old Ratings Deviation ($RD_0$) and the time elapsed ($t$):
$$RD = \min \left({\sqrt {{RD_{0}}^{2}+c^{2}t}},350 \right)$$
Here, $t$ represents the number of rating periods since the last competition, and $c$ is a constant that determines the rate at which uncertainty grows. The number 350 is a cap, representing the RD of a completely unrated player. It is the ultimate state of uncertainty.
This mechanism ensures that a player's rating remains dynamic. If you go on a hiatus, your RD slowly creeps upward. Your rating might stay the same, but the system's confidence in that rating evaporates. When you return to the game, the system is once again willing to make drastic adjustments based on your first few games, treating you with the same caution it would afford a newcomer. This prevents the stagnation of the leaderboard and ensures that the "true" rating is always being recalibrated against the current reality of the player's performance.
The constant $c$ is derived from empirical data or logical estimation. Glickman suggested that if one assumes it takes 100 rating periods for a player's RD to return to the initial uncertainty of 350, and a typical player starts with an RD of 50, the constant can be calculated to be approximately 34.6. This allows the system to be tuned to the specific ecosystem it is measuring. In a fast-paced game like Counter-Strike, where meta-shifts happen weekly, $c$ might be higher. In a game like Go, where skills are more stable, it might be lower.
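Glickman's worked derivation of $c$ can be reproduced directly: solve $350 = \sqrt{50^2 + c^2 \cdot 100}$ for $c$. A sketch (helper names are ours):

```python
import math

MAX_RD = 350.0  # RD of a completely unrated player

def rd_after_inactivity(rd0: float, c: float, t: float) -> float:
    """RD = min(sqrt(RD0^2 + c^2 * t), 350): uncertainty grows while idle."""
    return min(math.sqrt(rd0 ** 2 + c ** 2 * t), MAX_RD)

def derive_c(typical_rd: float = 50.0, periods_to_max: float = 100.0) -> float:
    """Solve 350 = sqrt(typical_rd^2 + c^2 * periods_to_max) for c."""
    return math.sqrt((MAX_RD ** 2 - typical_rd ** 2) / periods_to_max)

c = derive_c()
print(round(c, 1))  # 34.6, the value quoted in the text
print(round(rd_after_inactivity(50, c, 100)))  # 350: back to full uncertainty
```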
The Evolution to Glicko-2
While the original Glicko system was a massive improvement over Elo, Mark Glickman was not satisfied. He realized that there was another dimension to player performance that the first iteration failed to capture: volatility.
Consider two players with identical ratings and identical RDs. Player X is a model of consistency. They win slightly more than they lose, and their performance fluctuates within a tight band. Player Y is a wildcard. They might crush a grandmaster one day and lose to a novice the next. Their average performance might be the same as Player X, but their behavior is erratic. In the original Glicko system, these two players would be treated identically.
This was a problem. If Player Y beats a strong opponent, is it a sign that their true skill has increased, or was it just a lucky spike in their natural volatility? The original system struggled to distinguish between a permanent improvement in skill and a temporary fluctuation in form.
Enter Glicko-2. Released as an enhancement to the original algorithm, Glicko-2 introduced a third variable: rating volatility, denoted by the Greek letter sigma ($\sigma$). This variable measures the degree of expected fluctuation in a player's rating based on how erratic their performances are.
"A player's rating volatility would be low when they performed at a consistent level, and would increase if they had exceptionally strong results after that period of consistency."
The introduction of volatility makes the system significantly more robust. If a player with low volatility suddenly achieves a massive upset, the system is more likely to attribute it to a temporary anomaly or a fluke, and the rating adjustment will be conservative. However, if a player with high volatility achieves the same upset, the system recognizes that such fluctuations are part of their nature, and the rating adjustment will be more aggressive, reflecting the higher probability that the result is indicative of their true, albeit unstable, strength.
The mathematics behind Glicko-2 is more complex, involving an iterative calculation to solve for the new volatility. The system first computes two ancillary quantities: $v$, the estimated variance of the player's rating based on game outcomes, and $\Delta$, the estimated improvement in rating, built from a weighted sum of the differences between actual and expected scores. It then defines a function $f(x)$ and finds the value $A$ that satisfies $f(A) = 0$; the new volatility is $\sigma' = e^{A/2}$. The root is typically found using the Illinois algorithm, a modified version of the regula falsi procedure.
Once this iterative procedure is complete, the new volatility $\sigma'$ is set, and the new rating deviation and rating are updated. The formula for the new rating $\mu'$ is:
$$\mu' = \mu + \phi'^2 \sum_{j=1}^{m} g(\phi_j) \{s_j - E(\mu, \mu_j, \phi_j)\}$$
Where $\phi'$ is the new RD, $g$ is a function that dampens the influence of high RD opponents, and $E$ is the expected score function. The inclusion of volatility ensures that the system is not just reacting to the last game, but is synthesizing a history of performance patterns to determine the most likely current state of the player.
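The whole rating-period update can be sketched end to end. This is a compact reading of Glickman's published Glicko-2 procedure, not a production implementation: the system constant `TAU` is an assumed tuning value (Glickman suggests choices roughly between 0.3 and 1.2), the convergence tolerance is ours, and all helper names are our own.

```python
import math

SCALE = 173.7178  # converts between display ratings and Glicko-2's internal scale
TAU = 0.5         # assumed system constant constraining volatility change

def g(phi):
    """Dampens the influence of opponents with high rating deviation."""
    return 1 / math.sqrt(1 + 3 * phi ** 2 / math.pi ** 2)

def E(mu, mu_j, phi_j):
    """Expected score against opponent j on the internal scale."""
    return 1 / (1 + math.exp(-g(phi_j) * (mu - mu_j)))

def update(rating, rd, sigma, opponents):
    """One rating-period update: returns (new_rating, new_rd, new_sigma).

    opponents: list of (opp_rating, opp_rd, score) with score 1, 0.5, or 0.
    """
    mu, phi = (rating - 1500) / SCALE, rd / SCALE
    opp = [((r - 1500) / SCALE, d / SCALE, s) for r, d, s in opponents]

    # Ancillary quantities: v (estimated variance) and Delta (estimated improvement).
    v = 1 / sum(g(pj) ** 2 * E(mu, mj, pj) * (1 - E(mu, mj, pj))
                for mj, pj, _ in opp)
    delta = v * sum(g(pj) * (s - E(mu, mj, pj)) for mj, pj, s in opp)

    # Solve f(A) = 0 for the new volatility (Illinois variant of regula falsi).
    a = math.log(sigma ** 2)
    def f(x):
        ex = math.exp(x)
        return (ex * (delta ** 2 - phi ** 2 - v - ex)
                / (2 * (phi ** 2 + v + ex) ** 2)) - (x - a) / TAU ** 2
    A = a
    if delta ** 2 > phi ** 2 + v:
        B = math.log(delta ** 2 - phi ** 2 - v)
    else:
        k = 1
        while f(a - k * TAU) < 0:
            k += 1
        B = a - k * TAU
    fA, fB = f(A), f(B)
    while abs(B - A) > 1e-6:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB < 0:
            A, fA = B, fB
        else:
            fA /= 2  # the "Illinois" halving step that guarantees convergence
        B, fB = C, fC
    new_sigma = math.exp(A / 2)

    # New rating deviation and rating, per the formulas above.
    phi_star = math.sqrt(phi ** 2 + new_sigma ** 2)
    phi_new = 1 / math.sqrt(1 / phi_star ** 2 + 1 / v)
    mu_new = mu + phi_new ** 2 * sum(g(pj) * (s - E(mu, mj, pj))
                                     for mj, pj, s in opp)
    return mu_new * SCALE + 1500, phi_new * SCALE, new_sigma

# Glickman's worked example: a 1500-rated player with RD 200 and volatility
# 0.06 beats a (1400, RD 30) opponent, then loses to (1550, RD 100) and
# (1700, RD 300) opponents in the same rating period.
r, d, s = update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)])
print(round(r, 1), round(d, 1))  # approximately 1464.1 151.5
```

Note how a loss-heavy period still shrinks the RD from 200 to about 151: every game narrows the confidence interval, whatever its result.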
A Global Standard
The impact of Glickman's work has been profound and widespread. Both the Glicko and Glicko-2 systems are in the public domain, a decision that has allowed them to permeate the digital gaming landscape without the barriers of licensing fees or proprietary restrictions. They are no longer just theoretical constructs; they are the invisible engines driving the competitive integrity of some of the world's most popular games.
The Australian Chess Federation was an early adopter, implementing a slightly modified version of Glicko-2 to manage its national ratings. But it was the online gaming revolution that truly cemented Glicko's legacy. Major platforms have replaced the Elo systems that dominated the chess world for decades with Glicko-family ratings: Lichess uses Glicko-2, while Chess.com uses a modified version of the original Glicko. These platforms handle millions of games, and the speed at which the Glicko system converges on a player's true skill is essential for a smooth user experience.
Beyond chess, the system has found a home in the chaotic, fast-paced world of multiplayer online battle arenas (MOBAs) and first-person shooters. Dota 2, Team Fortress 2, and Counter-Strike 2 have all utilized variations of the Glicko system to rank their players. In these games, where team composition, map selection, and meta-shifts can cause wild swings in performance, the ability of Glicko-2 to account for volatility is invaluable. It prevents a player from being permanently penalized for a few bad games during a slump, just as it prevents a player from being unfairly elevated by a lucky streak.
Other notable implementations include Guild Wars 2, Splatoon 2, and Pokémon Showdown. Even the online Go community, on platforms like Online-go.com, has embraced the system. The versatility of Glicko is its greatest strength. Whether the game involves 60-second matches or hours-long strategic battles, the underlying mathematics of uncertainty and volatility remain applicable.
The Philosophy of Uncertainty
At its core, the Glicko system is a philosophical statement about the nature of knowledge. It rejects the idea that we can ever know the "true" skill of a player with absolute precision. Instead, it embraces uncertainty as a fundamental part of the competitive landscape.
In a world obsessed with rankings and leaderboards, it is easy to forget that these numbers are merely estimates. They are snapshots of a moving target. The Glicko system forces us to acknowledge the error bars. It reminds us that a rating of 2000 with an RD of 20 is a much more solid achievement than a rating of 2000 with an RD of 150. It tells us that the latter player is a "maybe" champion, while the former is a "probable" one.
This approach has profound implications for how players perceive their own progress. In an Elo system, a loss feels like a permanent subtraction from one's identity. In a Glicko system, a loss is just a data point that helps narrow the confidence interval. If you have a high RD, a loss is not a disaster; it is a necessary step in the calibration process. It is the system asking, "Okay, maybe you aren't as good as we thought. Let's adjust."
The system also protects the integrity of the competition. By making the rating change dependent on the opponent's RD, it discourages "smurfing"—where high-skilled players create new accounts to play against lower-skilled opponents. In a Glicko system, beating a low-rated opponent with a high RD yields very little gain. The system knows that beating a novice who is still figuring out the ropes doesn't prove much about the winner's skill. To gain significant points, you must beat opponents whose skill is well-established, or beat a wide variety of opponents to rapidly reduce your own RD.
The Future of Measurement
As we look toward the future of competitive gaming and statistical modeling, the principles established by Mark Glickman in 1995 remain as relevant as ever. The data explosion of the 21st century has only increased the need for systems that can handle uncertainty and volatility. Whether it is predicting the outcome of an election, assessing the risk of a financial portfolio, or ranking the skill of a professional athlete, the ability to quantify confidence is paramount.
The Glicko system stands as a testament to the power of statistical rigor in the face of complexity. It is a system that is not afraid to admit what it doesn't know. It is a system that evolves with time, that adapts to the erratic nature of human performance, and that provides a fair and accurate measure of skill in a chaotic world.
For the reader who has just finished "Overreactions, and regular reactions," the Glicko system offers a deeper understanding of the mechanisms that underpin our perception of performance. It shows that what we often mistake for overreaction or underreaction is, in fact, a sophisticated algorithmic response to the inherent uncertainty of the data. It is a reminder that in the pursuit of truth, the most honest answer is often not a single number, but a range, a volatility, and a measure of how much we can trust what we see.
The next time you log into a ranked game and see your rating change, remember that behind that number is a complex dance of probabilities, confidence intervals, and volatility coefficients. You are not just seeing a score; you are seeing a statistical portrait of your current state of being, constantly updated, constantly refined, and constantly aware of its own limitations. That is the genius of Glicko. It does not promise certainty. It promises accuracy. And in a world of games, that is the only victory that truly matters.