The "RPI" is the Rating Percentage Index concocted by the NCAA to "assist" in determining which teams to let into the national basketball championship tournaments and how to seed them. To calculate a team's RPI, you take its winning percentage, twice the average winning percentage of the team's opponents, and the average winning percentage of its opponents' opponents, add them together, and divide by four.
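
As a quick check of the arithmetic, here is a minimal Python sketch of the formula as just described. The function name rpi and the sample percentages are made up for illustration; this is not the NCAA's own code.

```python
def rpi(win_pct, opp_win_pct, opp_opp_win_pct):
    """Combine the three percentages with weights 1, 2, 1 and divide by 4."""
    return (win_pct + 2.0 * opp_win_pct + opp_opp_win_pct) / 4.0


# Hypothetical team: it wins 80% of its games, its opponents average .550,
# and its opponents' opponents average .500.
example_rpi = rpi(0.800, 0.550, 0.500)
print(round(example_rpi, 4))
```

Dividing by four is the same as weighting the three terms 25%, 50%, and 25%.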

Some observations, then, on the RPI:

I don't understand why the opponents' average is weighted twice as much as the other two terms, but I would propose that, in fact, we go to a system that just keeps adding these averages out to infinity. This can be done by a computer exactly and relatively quickly through the use of some linear algebra and a cute stunt.

Mathematics

You probably don't want to proceed any further if you are afraid of mathematics and/or don't know any linear algebra. What I'm going to do is to arrange the winning percentages into a vector w; from the schedule I can arrange a matrix C, each of whose rows totals 1, such that Cw becomes the average winning percentage of the teams' opponents, again arranged into a vector. (The entries of C, then, are simply the number of games any two teams played against each other divided by the number of games the team corresponding to the row played; for example, Duke plays North Carolina twice in a season, so if Duke played 34 games and UNC played 32 games, the entry corresponding to Duke-UNC is 2/34 while the entry for UNC-Duke is 2/32. The laws of matrix multiplication, then, mean that multiplication by this matrix averages the values of one's opponents.) CCw is then the average for the opponents' opponents, and so on; what I've proposed, then, is r = w + Cw + C^2w + ... = (1 + C + C^2 + ...)w, where by 1 I mean the identity matrix.
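
To make the construction of C concrete, here is a small sketch in Python with NumPy. The four-team schedule in games and the win totals in wins are invented for illustration (they are not real teams or results), and the variable names are my own.

```python
import numpy as np

# Hypothetical closed schedule: games[i, j] is the number of games
# teams i and j played against each other (symmetric, zero diagonal).
games = np.array([
    [0, 2, 1, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
    [1, 0, 2, 0],
], dtype=float)

# Hypothetical win totals, consistent with the schedule above.
wins = np.array([3.0, 1.0, 2.0, 1.0])

games_played = games.sum(axis=1)       # games per team (row totals)
C = games / games_played[:, None]      # divide each row by that team's games
w = wins / games_played                # winning percentages

opp_avg = C @ w                        # average winning pct. of opponents
opp_opp_avg = C @ C @ w                # average for opponents' opponents

print(C.sum(axis=1))                   # each row of C totals 1
print(opp_avg)
print(opp_opp_avg)
```

Teams 0 and 1 meet twice here, so the entry for 0-1 is 2/4 while the entry for 1-0 is 2/3, just as in the Duke-UNC example.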

The mathematically inclined people who should be reading this will know that, in some sense, (1-C)(1+C+C^2+...) = (1+C+C^2+...) - (C+C^2+C^3+...) = 1 so long as C^n gets small when n gets big; by multiplying both sides of my equation above by (1-C), I get (1-C)r=w, or r=w+Cr, so that a team's RPI is its winning percentage plus the average of its opponents' RPIs, which seems itself to be a fine ab initio definition of our RPI. Thus, if we can invert the matrix 1-C by our usual laws of matrix inversion, we can just multiply that inverse by w to get r.

Unfortunately, we can't.

Each row of C, you remember, adds up to 1, so that each row of 1-C will add to zero. In other words, 1-C sends the vector of all ones to zero, so its columns are linearly dependent, and 1-C is therefore not invertible. So we've reduced this problem to one of inverting a noninvertible matrix.
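
The singularity is easy to see numerically. This is a sketch on a made-up four-team schedule (the games array is hypothetical), checking that the rows of 1-C sum to zero and that the matrix falls one short of full rank.

```python
import numpy as np

# Hypothetical closed schedule: games[i, j] = games between teams i and j.
games = np.array([
    [0, 2, 1, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
    [1, 0, 2, 0],
], dtype=float)

n = games.shape[0]
C = games / games.sum(axis=1)[:, None]   # each row of C totals 1
A = np.eye(n) - C                        # the matrix 1-C from the text

row_sums = A.sum(axis=1)                 # each row of 1-C adds to zero
times_ones = A @ np.ones(n)              # so the all-ones vector maps to zero
rank = np.linalg.matrix_rank(A)          # rank is less than n

print(row_sums)
print(times_ones)
print(rank, "of", n)
```

For this toy schedule, in which every team is connected to every other through some chain of opponents, the rank comes out exactly one short.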

Consider a matrix all of whose entries are the same, say E. Every entry of Er, then, is the same multiple of the sum of r over all teams (the average itself, if each entry of E is one over the number of teams). This is another upside to subtracting off .500 to put the average at zero: Er=0. (Incidentally, because different teams play different numbers of games, it may not work out exactly to zero; let me then not actually subtract off .500 exactly, but whatever I need to in order to make Er=0. Not only is this necessary to make the C^n actually go to zero, but it also gives the closest approximation in a least squares sense to w when we end up multiplying (1-C) by the solution r that this will give us; remember, since 1-C is singular, no matter what we choose for r, (1-C)r will be restricted to a subspace that doesn't necessarily exactly include w, but it gets darn close, and this little hack we're doing gives us the point of closest approach.) Since Er=0, and (1-C)r=w, (1-C+E)r=w. So, by adding some number to each entry of 1-C, we get a matrix 1-C+E whose inverse, M, can be determined by standard methods.
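
Here is a sketch of the stunt on a made-up four-team schedule. The games and wins arrays are hypothetical, and I've assumed each entry of E is one over the number of teams. In this toy closed schedule every game has its winner and loser inside the set, so subtracting exactly .500 is enough for the checks to pass; a real schedule may need the adjustment described above.

```python
import numpy as np

# Hypothetical closed schedule and win totals (not real data).
games = np.array([
    [0, 2, 1, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
    [1, 0, 2, 0],
], dtype=float)
wins = np.array([3.0, 1.0, 2.0, 1.0])

n = len(wins)
games_played = games.sum(axis=1)
C = games / games_played[:, None]        # each row of C totals 1
w = wins / games_played - 0.5            # winning pct. relative to .500

E = np.full((n, n), 1.0 / n)             # assumed: every entry equal to 1/n
r = np.linalg.solve(np.eye(n) - C + E, w)

print("r        =", r)
print("Er       =", E @ r)                       # should be about zero
print("residual =", (np.eye(n) - C) @ r - w)     # should be about zero
```

If both printed checks come out near zero, r satisfies r = w + Cr and averages to zero, which are the two properties asked of it.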

In some sense we don't have to calculate M; all we really need to do is solve (1-C)r=w for r. There is, however, some value in calculating M:

  1. Modulo early season tournaments, we know the schedule before the season, and can thus calculate M before the season begins, rather than having to wait until we know w.
  2. Having an actual matrix M for which r=Mw can provide a certain amount of insight into how closely a team is tied to another. Some of you probably don't care, but I think this sort of thing is enlightening; a team might think it's doing well because it has some good wins over teams with good records, but its row in the matrix is dominated by entries against Towson St. and Yale.

Again, we have this kernel problem with (1-C), this singularity problem, so the equation (1-C)r=w cannot be solved by straightforward inversion. (1-C+E)r=w is the equation you should actually try to solve.
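
Both points about M can be sketched in a few lines. The schedule and win totals here are invented, E is assumed to have every entry equal to one over the number of teams, and M is computed from the schedule alone before any results are filled in.

```python
import numpy as np

# Hypothetical closed schedule, known before the season (not real data).
games = np.array([
    [0, 2, 1, 1],
    [2, 0, 1, 0],
    [1, 1, 0, 2],
    [1, 0, 2, 0],
], dtype=float)

n = games.shape[0]
games_played = games.sum(axis=1)
C = games / games_played[:, None]
E = np.full((n, n), 1.0 / n)             # assumed: every entry equal to 1/n

# M depends only on the schedule, so it can be computed ahead of time.
M = np.linalg.inv(np.eye(n) - C + E)

# Later, once results are in (hypothetical win totals):
wins = np.array([3.0, 1.0, 2.0, 1.0])
w = wins / games_played - 0.5            # winning pct. relative to .500
r = M @ w

# Row i of M shows how heavily each team's record counts toward team i.
print("row for team 0:", M[0])
print("r =", r)
```

When only r is wanted, calling np.linalg.solve on 1-C+E directly is the more usual numerical route; forming M explicitly is worthwhile mainly for reading off its rows.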