Tony Lezard, creator and administrator of the national ratings service at https://results.ukbgf.com, explains what the ratings formula means and why the bar for changing the formula should be set very high indeed.
I sometimes receive suggestions for how the formula used for the national ratings service could be improved. It’s hard to argue that the current formula is unimprovable, but hastily changing it causes its own problems, even if the rationale for making a change is sound. Let’s examine this more closely.
What do we mean by changing the formula?
The rating we are all familiar with is the result of applying a single formula repeatedly over a list of match results, with the players’ ratings being what comes out at the end. If the sequence of results changes (usually by adding new results at the end), the formula is recomputed over the entire sequence again.
For each match result in the sequence, the ratings formula is applied twice, once for the winner and once for the loser. It takes as its input the ratings of both players and the player’s current experience level, and returns a number by which the rating will change:
rating_change = f(rating, experience, opponent_rating, points_won)
where points_won is the match length, or the negative of that quantity, according to whether the player in question won or lost the match.
The change in rating is computed and applied for each player in the match, and then we move on to the next result in the sequence and repeat the process. Most of the time, we can save ourselves a lot of computational effort by making use of the property that when a result is added at the end of the sequence, you can use the current ratings as an intermediate result in the calculation of the new ratings.
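As a rough illustration of this iteration, here is a minimal Python sketch. It assumes the commonly published FIBS formula (win probability from the rating difference scaled by the square root of the match length, a boost K that decays from 5 to 1 over the first 400 experience points, and a starting rating of 1500). Those constants, the use of pre-match experience for K, and the names `replay` and `START_RATING` are assumptions for illustration, not details confirmed in this article.

```python
import math

START_RATING = 1500.0  # assumption: FIBS-style starting rating for a new player


def f(rating, experience, opponent_rating, points_won):
    """Rating change for one player in one match (assumed FIBS-style formula)."""
    n = abs(points_won)                      # match length
    root_n = math.sqrt(n)
    # Probability that this player wins, given the rating difference.
    p_win = 1.0 / (1.0 + 10 ** (-(rating - opponent_rating) * root_n / 2000.0))
    # Assumed boost for inexperienced players, never below 1.
    k = max(1.0, 5.0 - experience / 100.0)
    score = 1.0 if points_won > 0 else 0.0
    return 4.0 * k * root_n * (score - p_win)


def replay(results, state=None):
    """Apply f over a sequence of (winner, loser, match_length) results.

    state maps player -> (rating, experience). Passing the state reached
    after earlier results continues the calculation from that point.
    """
    state = dict(state or {})
    for winner, loser, length in results:
        w_rating, w_exp = state.get(winner, (START_RATING, 0))
        l_rating, l_exp = state.get(loser, (START_RATING, 0))
        # Both changes are computed from the pre-match ratings.
        state[winner] = (w_rating + f(w_rating, w_exp, l_rating, length), w_exp + length)
        state[loser] = (l_rating + f(l_rating, l_exp, w_rating, -length), l_exp + length)
    return state
```

Calling `replay(new_results, current_state)` gives the same answer as `replay(all_results)` when the new results sit at the end of the sequence, which is the shortcut described above. A result inserted earlier in the sequence forces a recomputation from that point onwards.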
It is important to note that there is only one formula, iterated over all the results in the sequence. It is incorrect to declare that “henceforth” the ratings formula will be something else. What you are actually doing is saying that the date of the match is now an input to the formula:
rating_change = f(rating, experience, opponent_rating, points_won, date_of_match)
and internally the function f compares date_of_match to some hard-wired value and applies either the old formula or the new one.
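To make the point concrete, a “henceforth” change can be sketched in Python as a wrapper that switches on the date. The cutover date, the helper name `make_dated_formula` and the two stand-in formulas are all hypothetical, chosen only to show the shape of the function.

```python
from datetime import date


def make_dated_formula(old_formula, new_formula, cutover):
    """Build a single f that hides a date switch inside it."""
    def f(rating, experience, opponent_rating, points_won, date_of_match):
        # The hard-wired comparison is the embedded discontinuity.
        formula = old_formula if date_of_match < cutover else new_formula
        return formula(rating, experience, opponent_rating, points_won)
    return f


# Stand-in formulas for illustration only; they are not real rating formulas.
def old_formula(rating, experience, opponent_rating, points_won):
    return 2.0 * points_won


def new_formula(rating, experience, opponent_rating, points_won):
    return 1.0 * points_won


# Hypothetical cutover date.
f = make_dated_formula(old_formula, new_formula, cutover=date(2017, 5, 1))
```

The result is still one formula applied to every result in the sequence. It simply needs an extra input and behaves differently on either side of the cutover.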
So a change in formula is necessarily a retrospective change, applied to every result in the match sequence. A formula that switches on the date, however, requires new data and has an embedded discontinuity, which is ugly and lacks credibility.
Keeping the date out of the calculation and using a new formula throughout is definitely cleaner, but it creates a new oddity. If a formula change is implemented, then history is erased: every past claim of reaching a particular milestone, such as an impressive rating, is treated as if it never happened. You might, or might not, be able to replace those occasions with other milestones, but you no longer own the particular achievement you thought you did.
It could also have the effect of changing past winners of competitions such as the Holland Park Ladder, which rely on the output of the ratings formula applied to a subset of the results.
It’s all about the rank
For most people, their ranking in the ratings table is the statistic of prime importance. The rating is a proxy for how good you are at backgammon, and the actual numbers are somewhat arbitrary, but everyone wants to be the best!
So when we consider changes to the ratings formula, it is crucial to look at how they will affect people’s rank in the table. A change in formula will shuffle the rankings about a bit, and without knowing anything more about the change, you would expect about half the players to go up in the table and half to go down, with a few staying in the same place.
There won’t necessarily be an equal division of winners and losers. A change whose only effect on the ranking was to shift the player at the top of the table to the bottom would improve the ranking of every player except for that one. However, the total ranking change must cancel out to zero.
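The top-to-bottom example can be checked with a short Python sketch. The function name `rank_changes` and the five-player table are made up for illustration.

```python
def rank_changes(old_order, new_order):
    """Places gained (positive) or lost (negative) by each player.

    Both arguments list the same players from the top of the table down.
    """
    new_position = {player: i for i, player in enumerate(new_order)}
    return {player: i - new_position[player] for i, player in enumerate(old_order)}


# Hypothetical table: the top player "A" is moved to the bottom.
old_table = ["A", "B", "C", "D", "E"]
new_table = ["B", "C", "D", "E", "A"]
changes = rank_changes(old_table, new_table)

# Four players gain one place each and one player loses four.
assert sum(changes.values()) == 0
```

Because the new table is just a reordering of the same positions, the gains and losses always sum to zero, however unevenly they are shared out.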
Rationales for change
You might want to change the ratings formula for one of a couple of reasons.
The first is where a participant in the ratings system has come forward with an impression that there is an unfairness in the ratings table and it is not accurately reflecting backgammon ability. The suggestion is an effort to provide a remedy.
This does inevitably single out one group of players in the ratings table at the expense of another, pushing them up the table as the others go down. This immediately creates a tension in the debate, and careful justification that will be broadly accepted by all participants is required to explain why the new formula is a better proxy for playing ability.
The other grounds for suggesting a change are technical ones, such as a provable improvement on the theory developed by Arpad Elo, or a superior development of it for the specifics of backgammon. This does not have to be motivated by any perceived issue. It could be a paper presented independently by researchers with no stake in the UK ratings. So far I have not seen any examples of this type of work, but it would be interesting.
A worked example
Let us consider one recent suggestion I have received (from Chris Hamilton). Chris suggests abolishing the “K-factor” – the boost given to new players designed to accelerate them to a more representative rating when they enter the table.
Intuitively, one would expect this change to clip the wings of the less-experienced players and pull them towards the centre of the table, and indeed we do see more experienced players moving towards both ends of the table relative to their less-experienced counterparts.
In more detail, Chris’s change would improve the ranking of 41% of all players, while 58% would go down, and 1% would stay the same. Compared to an overall average experience of 401, in the top half of the table the average experience of those whose rank increases under this change is 974, and in the bottom half, the average experience of those who go down in rank is 605. Conversely, the experience of players who go down in the top half, or up in the bottom half, is below the average.
The pull towards the centre is felt by everyone: all of the top 100 players’ ratings go down, by an average of 63 points.
The 50-50 rule
Is this a reasonable change to be making? Perhaps. If you are a strong player with high experience, or a weak player with low experience, you’re going to come out well from it so there’s a point in its favour.
As noted previously, you’ll find that roughly 50% of players will benefit in the rankings from any given change, incentivising them to support the proposal regardless. It can be hard to disentangle ‘pure’ consideration of whether a change should be made from the fact of what it will do to oneself personally in the ratings table.
In practice, we can’t poll everybody in the table as we can’t verify everybody’s identity. One cohort for whom we do have verified contact details is UKBGF members, and it is the UKBGF’s own ratings table, so that sounds fair enough. But this introduces a new bias: 71% of UKBGF members are in the top half of the table, and they have a lot of experience between them, with an average of 1,243 points. In the example we have been looking at, they stand to gain more than most.
Viewed from a certain vantage point, passing a resolution for this proposal could look like a self-serving clique manipulating a ratings system that has been provided for all backgammon players in the country to inflate their own rankings in it. While this is unlikely to be how the voting members feel about it, impressions count.
But the fact is that every way of adjusting the formula will contain within it this property of about 50% of players benefitting personally, and there will always be potential for resentment and schisms involving aggrieved parties as a result.
One way of accounting for this effect would be to take 50% of the vote as a given and require that the proposition also impresses a majority of the players who are going to lose out, i.e. a 75% threshold overall (the 50% who gain plus half of the remaining 50%). This might sound like overkill, but there is also another factor to consider.
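The arithmetic behind the 75% figure can be written out as a small Python sketch. The function and parameter names are illustrative, and the model assumes that everyone who gains votes in favour.

```python
def required_share(expected_beneficiaries=0.5, share_of_losers_to_persuade=0.5):
    """Overall vote share needed if every beneficiary votes in favour
    and the stated share of the remaining voters must also be persuaded."""
    remaining = 1.0 - expected_beneficiaries
    return expected_beneficiaries + remaining * share_of_losers_to_persuade


# 50% taken as given, plus half of the other 50%.
threshold = required_share()
assert threshold == 0.75
```

Strictly, a majority of those who lose out means just over half of them, so 75% is the boundary rather than a comfortable pass mark.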
A question of integrity
I don’t know if the formula we use on the UKBGF site is mathematically the best one there is, but one thing I can say for it is that it has pedigree.
The formula traces its lineage via Elo’s chess formula(e) to Andreas Schneider’s First Internet Backgammon Server (FIBS), which launched in 1992. From 2005 I implemented the formula unaltered for the MCC Backgammon Society and the London Clubs League, initially just to keep track of the club members and gauge their playing strength. Again, no changes to the formula were made when my software began to drive the UKBGF national ratings in 2014, and none have been made since then.
The virtue of never changing the formula is that no one can possibly be accused of having an axe to grind, or a hidden agenda at play. It’s a selection brought to us by history that the UKBGF didn’t make. Like a dice roll, it may be good or bad, but at least it’s fair.
Once we start tinkering with the formula, we destroy that impartiality forever. We become open to accusations of manipulating it in favour of one group of players over another. If we leave it alone, then for better or for worse, it remains an independent adjudicator.
The difficult position
It is not my opinion that changing the ratings formula should be inconceivable, but I do believe that there needs to be a compelling case for any change, ideally with a mathematical foundation. Even if it passes that test, there should remain a strong barrier to putting it into effect: for example, requiring it to secure 75% of any vote on the question, and perhaps also limiting how frequently a proposal or one similar to it may be voted upon. The 75% threshold is not without precedent: an amendment to the United States Constitution requires ratification by the same proportion of states.
We’ve seen how any formula change can have unexpected consequences, and how implementing a change will cost a fair chunk of reputational capital for the UKBGF, and risks fostering division in the backgammon community.
And we’ve also seen how all change proposals have the potential to carry a large number of voters along with them without much effort.
Making the formula difficult to change strengthens the ratings as a resource for all, and would in turn strengthen any successor should one be adopted in the future.
This article was first published in “One Foot In The Gammon”, the Ealing club newsletter. Figures correct as of the end of Friday 28th April 2017.