DaWei Designs

Craftsmanship Is Not A Boat From Sears

This material concerns the concept and use of rating systems for games such as chess, backgammon, and, yes, even pitch. The subject seems to be mysterious and little understood by most players. The principle is simple, but the implementation often leads to dissatisfaction.

ELO Ratings

“Things are always least interesting when they're most clear, ...when everybody understands what's going on.”

Brian Eno, regarding music. He liked it mysterious, which is fine for music.

Questions & Answers

What is the ELO rating system?
The ELO Rating system assumes that one's skill level can be represented by a single number (as opposed to a formula). It also assumes that one's performance, though it may vary over a number of games, will have a mean value representing the player's true skill. It provides a number that purportedly represents that level. It was developed for one-on-one competition in games of skill with no elements of chance.

What does "ELO" stand for?
Nothing. It's the anglicized spelling of the name of the li'l feller that originated the system. It's capitalized to distinguish references to the system from Mr. Élő himself.

Does it work?
Properly configured and calculated, it can work sufficiently well. Your skill level determines your rating. Your rating is used to determine your skill level. Obviously that's a chicken-egg thing, a circular process. Things can go kerplookety in such systems (those with "feedback").

Does my win/loss percentage affect my rating?
No, not directly.

Why not?
It shouldn't. If you amass a 60/40 record against a lobotomized chimpanzee, your skill is not the same as if you amass a 60/40 record against a world-class champion. Your win/loss record would be a great indicator if you played every possible opponent an equal number of games and so did everyone else. This could be the case in a small tournament, but the situation is not common, particularly in the online gaming environment, so another method must be used.

Tell me more about ELO, then.
If your skill level (and that of your opponent) can be represented numerically, then the outcome of a series of games (your wins versus your losses) can be quite closely predicted. If chance (luck) is a factor, then the series needs to be long enough for the effects of chance to balance out.
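To make the prediction idea concrete, here is a minimal sketch of one commonly used form of the calculation. The logistic curve, the base of 10, and the scale of 400 points are assumptions taken from widespread chess practice, not something this page specifies, and the function name is made up for illustration.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Predicted fraction of games player A wins against player B.

    Assumes a logistic curve with a 400-point scale (one common choice).
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Equal ratings predict an even split of the series.
even_match = expected_score(1500, 1500)

# A 200-point edge predicts roughly three wins in every four games.
favorite = expected_score(1700, 1500)
underdog = expected_score(1500, 1700)
```

The two predictions for a pairing always add up to 1, so whatever the favorite is expected to win, the underdog is expected to lose.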

Conversely, if the outcome of a series of games is known, and the skill level of your opponent is known, your skill level can be accurately judged.
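Running the prediction backwards can be sketched as follows. This assumes the same logistic curve with a 400-point scale (an assumption, not something stated here), and `performance_rating` is a hypothetical helper name. It also shows why a 60/40 record means nothing until you know who it was against.

```python
import math


def performance_rating(wins, losses, opponent_rating, scale=400.0):
    """Rating implied by a win/loss record against one opponent of known rating.

    Inverts the logistic expected-score curve (assumed 400-point scale).
    """
    if wins == 0 or losses == 0:
        # A perfect or winless record implies an unbounded rating gap.
        raise ValueError("need at least one win and one loss")
    fraction = wins / (wins + losses)
    return opponent_rating + scale * math.log10(fraction / (1.0 - fraction))


# The same 60/40 record against two very different opponents.
versus_club_player = performance_rating(60, 40, 1500)
versus_champion = performance_rating(60, 40, 2700)
```

Under these assumptions a 60/40 record puts you about 70 points above the opponent, whoever the opponent is, so the two results land about 1200 points apart.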

So what's the problem?
One problem is the starting point. Without the rating, how does one know what the skill level of a given player is? Without knowing the skill level, how does one calculate a meaningful rating? One might estimate the skills of a group of players by observing the outcome of a large number of games. One might assume that all players are equal, start them with the same rating, and let the games adjust the ultimate outcome. One might assume that new players are below the average by some amount and let their games adjust the outcome. Differing approaches are taken. For the results to be meaningful, the formula must represent the players' skills correctly.
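The "let the games adjust the outcome" approach can be sketched with an idealized example. Everything specific here is an assumption for illustration: the logistic curve with a 400-point scale, an adjustment of 32 points per game, an opponent whose rating stays fixed at 1500, and a player who scores exactly the long-run average in every game (so luck is left out entirely).

```python
def expected_score(rating, opponent, scale=400.0):
    """Predicted score under an assumed logistic curve, 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / scale))


def settle(start, true_rating, opponent=1500.0, k=32.0, games=500):
    """Rating after many games when results match the player's true skill.

    Idealized: each game's score is the player's true long-run average
    rather than a single win or loss.
    """
    rating = start
    true_average = expected_score(true_rating, opponent)
    for _ in range(games):
        # Move the rating by k times (actual result minus predicted result).
        rating += k * (true_average - expected_score(rating, opponent))
    return rating


# Same 1700-strength player, two different starting assumptions.
from_average = settle(1500.0, 1700.0)
from_below = settle(1200.0, 1700.0)
```

In this idealized setting both starting points drift to the same place, which is the hope behind every starting-point scheme. With real wins and losses the rating wanders around that value instead of sitting on it, and a poor starting guess takes a while to wash out.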

You mean, all ELO systems aren't alike?
"ELO system" is a generic term for the approach used. Different users have different views regarding the type of distribution that represents a single player's performance variation from game to game. They have different views about how much a player's rating should be adjusted in response to a single loss or win. They have different views about the uncertainty associated with a single prediction. Their versions of the formula therefore differ with respect to certain coefficients used.
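Here is a rough sketch of where those differences show up. The coefficient usually called K sets how far one result moves a rating, and the choice of curve reflects the assumed distribution. The specific numbers (K of 16 or 32, a 400-point logistic scale, a normal curve with a 200-point spread per player) are assumptions drawn from common chess usage, not values this page prescribes.

```python
import math


def expected_logistic(diff, scale=400.0):
    """Predicted score for a rating lead of `diff`, logistic curve."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))


def expected_normal(diff, sigma=200.0 * math.sqrt(2.0)):
    """Predicted score for a rating lead of `diff`, normal curve.

    Assumes each player's performance varies with a 200-point standard
    deviation, so the difference has a spread of 200 * sqrt(2).
    """
    return 0.5 * (1.0 + math.erf(diff / (sigma * math.sqrt(2.0))))


def update(rating, opponent, score, k=32.0, expected=expected_logistic):
    """New rating after one game; score is 1 for a win, 0 for a loss."""
    return rating + k * (score - expected(rating - opponent))


# Same win between equals, two different adjustment sizes.
big_step = update(1500.0, 1500.0, 1.0, k=32.0)
small_step = update(1500.0, 1500.0, 1.0, k=16.0)

# Same 600-point lead, two different distribution assumptions.
logistic_view = expected_logistic(600.0)
normal_view = expected_normal(600.0)
```

A larger K reacts faster to real improvement but also overreacts to a lucky streak. The two curves agree closely for small rating gaps and drift apart for large ones, with the logistic curve giving the underdog a somewhat better chance.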