I expound on how the United States Chess Federation has rated players since first instituting the rating system in 1952. Chess is different from a number of sports in that the only objective information that can be drawn from a game is who won; basketball, football, hockey, etc. provide additional information in the form of scores and other statistics. I accordingly discuss some potential modifications.
Elo assumed, and the system seems to work, that the advantage one player has over another is essentially multiplicative; if player A beats player B 3 times as often as vice versa, and player B beats player C 3 times as often as vice versa, then player A should beat player C 9 times as often as the other way around. The ratings, then, are assigned logarithmically, so that the ratio (3) becomes a difference (proportional to the log of 3). There's still a scale factor to be decided, and the USCF uses a rating system where a difference of 400 points is a factor of ten; you can then calculate the probability that player A will beat player B by taking the difference between the ratings (A's minus B's), dividing by 400, raising ten to that power, then dividing this by one more than itself (so that the probability of A beating B plus the probability of B beating A is equal to 1). The sticklers will be interested to know that a draw counts as half a win for each side; if two players play 4 games, I don't distinguish between two wins and two draws versus three wins and a loss.
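The calculation just described can be sketched in a few lines of Python. This is only an illustration: the function name and the sample ratings are made up here, and only the 400-point scale comes from the text.

```python
import math


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B, counting a draw as half a win."""
    # A difference of `scale` points corresponds to odds of ten to one.
    odds = 10.0 ** ((rating_a - rating_b) / scale)
    # Dividing the odds by one more than themselves turns them into a probability.
    return odds / (1.0 + odds)


# Hypothetical ratings: A beats B 3 times as often as B beats A, and likewise B over C.
gap = 400.0 * math.log10(3.0)  # rating difference that corresponds to 3-to-1 odds
rating_c = 1500.0
rating_b = rating_c + gap
rating_a = rating_b + gap

p_ac = expected_score(rating_a, rating_c)
print(p_ac / (1.0 - p_ac))  # odds of A over C; the multiplicative assumption says 9
```

Because the ratings are logarithms of the odds, adding two rating gaps multiplies the corresponding ratios, which is the 3 times 3 equals 9 behavior described above.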
Even if we don't accept the assumption of multiplicativeness, the next step is very reasonable so long as we have some way of calculating probabilities from ratings. (The very concept itself of "ratings" requires an assumption of transitivity, that if team A is better than team B and B is better than C then A is better than C, where "better than" means "will win more often than it will lose". This assumption is almost certainly not exactly correct, but is often close. The fact that this is not always true is one of my complaints about single elimination tournaments.) If a player wins, his rating increases in proportion to the probability that he was to lose; if the player loses, his rating decreases in proportion to the probability that he was to win. Thus if the player wins exactly as often as he is supposed to, his rating stays fixed; if he wins more often than that, his rating goes up over time, while if he wins less often it goes down. (For most players, the USCF changes the rating by 32 times the relevant probability; again, a draw is half a win and half a loss.)
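As a minimal sketch of this update rule in Python (the function names and the 1500 starting ratings are illustrative assumptions; the factor of 32 and the half-point draw are from the text):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B on the 400-point logarithmic scale."""
    odds = 10.0 ** ((rating_a - rating_b) / 400.0)
    return odds / (1.0 + odds)


def update_rating(rating: float, opponent: float, result: float, k: float = 32.0) -> float:
    """Return the new rating after one game.

    result is 1 for a win, 0.5 for a draw, and 0 for a loss.
    """
    expected = expected_score(rating, opponent)
    # A win adds k times the probability of losing;
    # a loss subtracts k times the probability of winning.
    return rating + k * (result - expected)


# Two hypothetical, evenly rated players:
print(update_rating(1500.0, 1500.0, 1.0))  # win
print(update_rating(1500.0, 1500.0, 0.5))  # draw
print(update_rating(1500.0, 1500.0, 0.0))  # loss
```

Between evenly rated players the expected result is one half, so a win is worth 16 points, a loss costs 16, and a draw changes nothing.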
For initiating ratings, you could calculate an RPI and multiply by 400, which is essentially what the USCF does. For sports, where everyone's season begins at the same time, we can alternatively just assign everyone the same rating at the beginning of the season and run all the games they played through this procedure twenty or thirty times until the ratings stabilize. Note
This isn't to say, however, that it is not fair for a ranking system to use scores, statistics, or astrology to predict which team is going to win; while I've defined what I consider desirable in a ranking system, any system that has been concocted is "good" so long as it meets that single criterion: it must predict in advance the winners of games. I expect it is fair to assume that a team that wins a game 72-71 on a last second shot would have been more likely to lose that game than a team that wins the game 91-50; if these results are recorded against the same team, one would suspect that the second team is better than the first. This leads me, then, to my alteration of the Elo system.
To make things clearer, I write an equation. (I'm serious. I don't sympathize with people who are blindly frightened by equations; I often find them much clearer than words.) Let w be 1 if the team wins and 0 if the team loses, and let P be the probability that the team wins; then in the Elo system we change the rating by 32*(w-P), 32 times the difference between the result and the expected result. All I propose to do to modify this is to redefine w; w is going to be the probability I think the team has of winning a rematch based on its performance in the game. In the above example, I might say w=.53 for the team that wins 72-71. This is where I bring in my caveat about teams that gamble to win; I might get 80% of the value for w from the score, but some of it, say 20%, should be based only on whether they win. In this case .8*.53+.2*1 is about .62, so that I assign the value .62 to w; if I thought the team had a 60% chance of winning, I might actually bump them up a little then, because, even if it was close, they *did* win.
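The arithmetic of that example, written out as a small Python sketch (the function names are invented for illustration; the 80/20 split, the .53, and the 60% expectation are the numbers used above):

```python
def blended_result(rematch_probability: float, won: bool, score_weight: float = 0.8) -> float:
    """Mix the score-based rematch probability with the bare win/loss result."""
    outcome = 1.0 if won else 0.0
    return score_weight * rematch_probability + (1.0 - score_weight) * outcome


def rating_change(w: float, expected: float, k: float = 32.0) -> float:
    """The usual Elo adjustment, k*(w - P), with the redefined w."""
    return k * (w - expected)


w = blended_result(0.53, won=True)  # the 72-71 winner from the example
print(round(w, 3))                  # .8*.53 + .2*1
print(rating_change(w, 0.60))       # small positive bump for a team expected to win 60%
```

With these numbers the winner gains a bit under one rating point, where the unmodified system would have given 32*(1-.6), or about 12.8.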
I assume in hockey that, in any given 20 seconds or so, team A will have a certain (small) probability of scoring against team B, and team B will have a different (small) probability of scoring against team A. So long as these probabilities are constant throughout the game, that is, so long as they don't change appreciably when the score changes, and so long as these probabilities are small, the probability of team A scoring a certain number of goals over the course of a game will follow a distribution well known to physicists, the Poisson distribution. The expected value, of course, is 180, the number of 20-second periods in a hockey match, times P, the probability of scoring a goal in one of those periods. For those with patience for combinatorics, I note that there are C(180,n) ways of scoring n goals, and that this gives a probability of C(180,n)P^n(1-P)^(180-n), which, for small n, is about (180^n/n!)*(P^n)*(1-P)^(180-n), which in turn, for small P, is about ((180*P)^n)(e^(-180*P))/n!, where n! is n factorial and e is about 2.7183, the base of the natural logarithm; since 180*P is just the expected score m, we have a probability of (e^-m)(m^n)/n!, where the e^-m is just a constant that makes the total come out to 1 as the total of probabilities should.
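To see how close the approximation is, here is a short Python comparison of the exact binomial probability with the Poisson formula. The expected score of 3 goals is a made-up figure for illustration; the 180 periods are from the text.

```python
import math


def binomial_pmf(n: int, periods: int, p: float) -> float:
    """Exact probability of n goals in `periods` slots, each scoring with probability p."""
    return math.comb(periods, n) * p ** n * (1.0 - p) ** (periods - n)


def poisson_pmf(n: int, mean: float) -> float:
    """Poisson probability of n goals when the expected score is `mean`."""
    return math.exp(-mean) * mean ** n / math.factorial(n)


periods = 180            # 20-second periods in a 60-minute game
mean_goals = 3.0         # hypothetical expected score m
p = mean_goals / periods # chance of a goal in any one period

for n in range(7):
    exact = binomial_pmf(n, periods, p)
    approx = poisson_pmf(n, mean_goals)
    print(n, round(exact, 4), round(approx, 4))
```

For these inputs the two columns differ by only a few parts in a thousand, which is the sense in which the Poisson formula stands in for the combinatorics.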
For larger values of m, and even, to a decent approximation, for smaller ones, this distribution is similar to a normal curve, or a bell curve, where the standard deviation will be equal to the square root of the expected value; the difference between two random numbers that follow separate normal distributions follows a normal distribution itself, where the standard deviation is the square root of the sum of the squares of the standard deviations of the separate distributions; the punch line is that the difference between the scores should follow a distribution centered at m1-m2 with a width equal to the square root of m1+m2, so that, with hockey at least, the probability associated with a victory by a score of m1 to m2 is approximately erf((m1-m2)/sqrt(m1+m2)), where by erf I mean the cumulative normal distribution function (the area under the bell curve up to the given number of standard deviations, so that a tie gives .5), which is tabulated in standard probability tables.
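A minimal Python sketch of the hockey formula follows. It rests on one assumption worth stating loudly: the "erf" of the text is read as the cumulative standard normal distribution, so that a tie maps to .5; Python's own math.erf is the strict error function, hence the conversion in normal_cdf. The function names and sample scores are invented for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative standard normal distribution, built from the strict error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rematch_probability(goals_for: int, goals_against: int) -> float:
    """Probability credited to the team that scored goals_for (assumed reading of erf)."""
    total = goals_for + goals_against
    if total == 0:
        return 0.5  # a scoreless tie carries no information either way
    # Difference in goals, measured in units of its width sqrt(m1 + m2).
    return normal_cdf((goals_for - goals_against) / math.sqrt(total))


for score in [(3, 2), (4, 1), (2, 2), (1, 4)]:
    print(score, round(rematch_probability(*score), 3))
```

The value for a score and the value for the reversed score add to 1, so the number can be dropped in for w on both sides of the same game.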
If we gave each team 2 points for each goal they scored, we'd of course double both m1 and m2, which would double the numerator of (m1-m2)/sqrt(m1+m2) but would only multiply the denominator by the square root of 2; in such a system we would want to use erf((m1-m2)/sqrt(2*(m1+m2))). Football might work with a fudge factor similar to this; in the case of football it would presumably be between 3 and 7. In basketball, however, and to a lesser extent in football, there is a heavy correlation between the two teams' scores, by which I mean that when one team scores the other team gets the ball and is more likely than not going to score before you get the ball back. The difference with hockey and soccer, then, is that in these sports there are very few goals scored per possession, which fits into the given assumptions well. For basketball and football, we need another term, giving us something like erf((m1-m2)/sqrt(n*(m1+m2-2r*sqrt(m1*m2)))), where n is a constant that sort of describes how many points a team scores in one go and r is a number between 0 and 1 that characterizes the correlation alluded to earlier. n and r will really have to be fit semiempirically, that is to say by just trying different numbers and seeing what works for each given sport, but once you have those you can use the formula given to calculate this w to put into the rating-change formula that I gave.
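The general formula can be sketched in Python as follows. Two things here are assumptions rather than anything fixed by the text: "erf" is again read as the cumulative normal distribution, and the sample values of n and r are placeholders, since the real ones have to be fit to each sport.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rematch_probability(points_for: float, points_against: float,
                        n: float = 1.0, r: float = 0.0) -> float:
    """w from the final score; n is points per score, r the correlation (0 to 1)."""
    diff = points_for - points_against
    variance = n * (points_for + points_against
                    - 2.0 * r * math.sqrt(points_for * points_against))
    if variance <= 0.0:
        # Degenerate case (for example r = 1 with equal scores): no spread left.
        return 0.5 if diff == 0 else (1.0 if diff > 0 else 0.0)
    return normal_cdf(diff / math.sqrt(variance))


print(round(rematch_probability(72, 71), 3))                # n=1, r=0: the plain hockey form
print(round(rematch_probability(72, 71, n=2.0, r=0.5), 3))  # placeholder n and r
print(round(rematch_probability(91, 50, n=2.0, r=0.5), 3))
```

With n=1 and r=0 the 72-71 game comes out at about .53, in line with the value used for w earlier; raising n pulls w toward .5, while raising r pushes it away from .5.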