Basic Rules Applied by the Overtake and Feedback System
The basic idea of the Overtake and Feedback (OAF) evaluation methodology is to ensure that if team A defeats "a better team" B, and both teams were ranked/rated relatively close to each other before the game, then A will overtake B, rising to just above B in the rankings (and A's rating will be slightly higher than B's after the game).
To begin with, each team will be expected to earn its rating, which will be updated after each game it plays. The teams are ranked in descending order according to their rating after each week's worth of games, where the team with the highest rating will be given the rank of N if there are N teams (which is equivalent to being ranked as the #1 team).
So, if team A defeats team B, and A's rank is below B's by no more than SQRTN (the rounded value of √N), then the two teams' ratings are averaged: A's new rating is this average + 0.4, and B's new rating is the same average - 0.4. Team A is now ranked higher than B because of that outcome. However, if team A is more than SQRTN positions below team B in the current rankings, then a modified overtake function must be applied to lessen the effect of this "slight (or major) upset"; I will describe this function a little later on.
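As an illustration only, the pure overtake step described above can be sketched in Python. The function names (sqrtn, overtake_applies, pure_overtake) are hypothetical and are not taken from the actual OAF program; the sketch assumes, as stated earlier, that a larger rank number means a better team.

```python
import math

def sqrtn(n_teams):
    """SQRTN: the rounded value of sqrt(N)."""
    return round(math.sqrt(n_teams))

def overtake_applies(winner_rank, loser_rank, n_teams):
    """True when the winner sits below the loser by no more than SQRTN places.

    Ranks follow the convention that rank N is the best team.
    """
    return 0 < loser_rank - winner_rank <= sqrtn(n_teams)

def pure_overtake(winner_rating, loser_rating, margin=0.4):
    """Average the two ratings, then place the winner just above the loser."""
    average = (winner_rating + loser_rating) / 2
    return average + margin, average - margin
```

For example, with N = 117 (so SQRTN = 11), a team ranked 50 that beats the team ranked 58 qualifies, and ratings of 60.0 and 64.0 would become 62.4 and 61.6.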
If A beats B, but is already ranked higher than B, its rating will go up by √(game score difference) * ((team B's rank) / (N+1))², and B's will decrease by √(game score difference) * ((N+1 - team A's rank) / (N+1))². If both teams have roughly the same ranking/rating, the winning team could gain a larger relative rating increase over the losing team from these updates than from the overtake strategy above, so whichever method yields the larger separation between the winning and losing teams' ratings is used to update those teams' ratings.
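A minimal sketch of these two updates follows, taking the bracketed rank fractions to be squared (an assumption about the intended formula). The names favorite_win_update and larger_separation are hypothetical, and the second function is only one reading of "whichever method yields the larger separation".

```python
import math

def favorite_win_update(w_rating, l_rating, w_rank, l_rank, n_teams, score_diff):
    """Update ratings when the winner is already ranked above the loser.

    Ranks follow the convention that rank N is the best team.
    """
    root = math.sqrt(score_diff)
    gain = root * (l_rank / (n_teams + 1)) ** 2                 # winner's increase
    loss = root * ((n_teams + 1 - w_rank) / (n_teams + 1)) ** 2  # loser's decrease
    return w_rating + gain, l_rating - loss

def larger_separation(option_a, option_b):
    """Pick the (winner, loser) rating pair with the bigger gap (assumed rule)."""
    return max(option_a, option_b, key=lambda pair: pair[0] - pair[1])
```

With N = 9, a winner ranked 8 (rating 8.0) beating a team ranked 5 (rating 5.0) by 9 points gains 3 * (5/10)² = 0.75, while the loser drops by 3 * (2/10)² = 0.12.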
Teams are initially ranked at random, and each is given a rating equal to its rank, i.e. a team with the rank of 99 is assigned a rating of 99.0 to start. Each week's games are then used to update the respective ratings, the teams are reordered, the next week's games are used, and so on. At the end of the season, the ratings are normalized (by dividing them by N), and then all the games in that season are fed back into the algorithm. Normally, the rankings/ratings stabilize fairly quickly, but to ensure that this occurs, N * √N iterations are performed. Unfortunately, the final rankings/ratings are, in some cases, sensitive to the initial ranking, so 1000 random initial rankings are used, each with N * √N iterations, and the 1000 resulting ratings are averaged to produce "the final OAF rating" for each team.
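The bookkeeping around the feedback loop might look roughly like the sketch below. All four helper names are hypothetical, the weekly game processing itself is left out, and using the rounded SQRTN in the iteration count is an assumption, since the text only says N * √N.

```python
import math
import random

def initial_ratings(teams, rng):
    """Rank the teams at random; each rating starts equal to the rank."""
    order = list(teams)
    rng.shuffle(order)
    return {team: float(rank) for rank, team in enumerate(order, start=1)}

def normalize(ratings):
    """End-of-season normalization: divide every rating by N."""
    n = len(ratings)
    return {team: rating / n for team, rating in ratings.items()}

def iteration_count(n_teams):
    """N * SQRTN passes over the season (rounding is an assumption)."""
    return n_teams * round(math.sqrt(n_teams))

def average_ratings(runs):
    """Average each team's rating over all runs (e.g. 1000 random starts)."""
    return {team: sum(run[team] for run in runs) / len(runs) for team in runs[0]}
```

Under that rounding assumption, N = 117 teams would mean 117 * 11 = 1287 iterations for each of the 1000 random starting orders.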
Now, to get back to the modified overtake function that is used when "upsets" occur, let us look at the first game of 2004, Southern California versus Virginia Tech, using the final 2003 OAF ratings for those teams (which are only relevant in 2003). USC ended up as the top team (#117) with a rating of 114.2505, and VT was team #56, rated at 65.0661. If VT had pulled off that upset with the ratings and rankings listed above, then USC's rating would be reduced by (114.2505 - 65.0661) / (2 + distance), where distance is the number of whole multiples of SQRTN that fit into the separation between the two teams' rankings, which in this case is floor(61/11), or 5. (The farther down in the rankings the team that pulls off the "upset" is, the larger the denominator used in the modified overtake function. This lessens the effect of a major upset more than that of a loss to a more highly rated opponent, since such an outcome is considered more of a statistical anomaly than valuable information about the teams' abilities.)
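The arithmetic of this example can be checked with a short Python sketch. The function name modified_overtake_change is hypothetical, and the code simply applies the floor formula as stated, without any further adjustment.

```python
import math

def modified_overtake_change(winner_rating, loser_rating,
                             winner_rank, loser_rank, n_teams):
    """Rating change for an upset, before any further adjustment.

    Ranks follow the convention that rank N is the best team.
    """
    step = round(math.sqrt(n_teams))              # SQRTN
    distance = (loser_rank - winner_rank) // step  # floor(separation / SQRTN)
    return (loser_rating - winner_rating) / (2 + distance)

# VT (#56, 65.0661) beating USC (#117, 114.2505) with N = 117 teams
change = modified_overtake_change(65.0661, 114.2505, 56, 117, 117)
print(round(change, 4))  # 7.0263
```

Here the separation is 61 places, SQRTN is 11, distance is 5, and the rating difference of 49.1844 is divided by 7.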
So, USC's rating would theoretically shrink by 7.0263, and VT's rating would increase by the same amount. (In subsequent iterations, those teams may be closer together because of this outcome, which would make the value of distance smaller, and therefore the rating change larger.) In actuality, to avoid possible discontinuities in how ratings are updated at the multiples of SQRTN, that 7.0263 value is compared with the value obtained from the rating of the team that is exactly 55 places below USC, which is allowed to use the denominator of 6 (not 7). (This ensures consistency, as we want team #106 to use the denominator of 2, the pure overtake denominator that yields the exact average of the two ratings, in such calculations.) If team #62 (117-55) had a rating difference of 66 with USC, then 66/6 = 11 is larger than 7.0263, and that value of 11 would be used to decrease USC's rating and to increase VT's. So, essentially, a team that is SQRTN or fewer places below its beaten opponent uses the pure overtake denominator of 2. A team in the next group of SQRTN teams further down uses either a denominator of 3 with its own rating, or a denominator of 2 with the rating of the team that is exactly SQRTN places below the losing team; whichever value is larger is applied to both teams' ratings. For each subsequent group of SQRTN teams further down the rankings, both denominators increase by one when this modified overtake function is used.
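One way to sketch this comparison in Python is shown below. The name upset_change and its parameters are hypothetical; boundary_rating stands for the rating of the team exactly distance * SQRTN places below the losing team, and the sketch assumes distance is at least 1 (smaller separations use the pure overtake).

```python
def upset_change(winner_rating, loser_rating, boundary_rating, distance):
    """Larger of the two candidate rating changes for an upset.

    distance is floor(rank separation / SQRTN), assumed to be >= 1.
    """
    # Winner's own rating with the denominator 2 + distance
    own = (loser_rating - winner_rating) / (2 + distance)
    # Boundary team's rating with a denominator that is one smaller
    boundary = (loser_rating - boundary_rating) / (1 + distance)
    return max(own, boundary)

# USC vs. VT with distance = 5, and a hypothetical team #62 rated 66 below USC
change = upset_change(65.0661, 114.2505, 114.2505 - 66, 5)
```

In the hypothetical case above, the two candidates are 7.0263 and 66/6 = 11, so 11 would be subtracted from USC's rating and added to VT's.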
Though anomalies do sometimes occur using the OAF rating system, as there is no "perfect" evaluation system, this iterative methodology has correlated fairly well with the major polls, as can be seen via one of the links on the prior OAF "home page".
Back to Prof. Trono's Home page. (This page last modified February 25, 2005.)