Why not use a computer? These techniques are already being used in some sports (college basketball and professional tennis, for example), so their extension to college football and other sports seems only a matter of time. But let us first list our requirements for an ideal system of ranking college football teams at any point in the season, and then see how close we can get to perfection.
PARITY: Scores of all games from 22 August to 4 January count equally.
SELF-CENSORSHIP: No other information is utilized. This means intentionally disregarding all we know about the history of each team, which leads to counter-intuitive standings early in the season.
ACCESSIBILITY: Coaches or any interested parties should be able to assess how a given score would change their team's position in the standings, assuming all other results were known.
WINS: Two victories by 1 point should be worth more than a tie and a lopsided win, no matter by how much.
MARGINS: A win by a higher margin should place a team higher, but the extra benefit should diminish to zero as the margin gets "too" high, and the total effect of any one game should be bounded (if the other results for that team remain fixed).
OPPOSITION: The same result against a higher ranked team should produce a higher standing.
ELIGIBILITY: Any team is eligible to be ranked anywhere in the standings based on their own scores, no matter what their opponents do in their other games.
COMPREHENSIVENESS: One list of all NCAA and NAIA teams should be produced, with division of competition indicated.
REGULATION: Standings should lead to automatic and foolproof selection of playoff participants and seeding, and determine the national championship.
Most computer standings meet very few of these conditions, either because of lack of mathematical sophistication or because they are intended primarily for betting. Polls meet only the final condition.
Because of the irregularity of college football schedules, the requisite complicated system would necessitate a sideline laptop computer. But the official standings program should be in the public domain and made available to those schools with the resources to perform their own verification. Thus accessibility seems possible only to the extent that a coach interacts with his computer science department.
The self-censorship condition makes it very difficult to compute meaningful standings before October. In 1985, for example, Oklahoma played only one game in September.
There are several regulatory problems. Conferences with few or no games scheduled against outsiders make ranking unreliable or impossible. This situation has arisen before (Texas Intercollegiate Conference years ago, union of Evergreen and Northwest Conferences for some years up to 1984, Columbia Football League in 1985) and should be guarded against. The solution is to insist on a minimum number of games against outsiders as a precondition for championship or playoff consideration.
Another problem occurs when too many teams from a conference or region would be qualified for playoffs. Appropriate maxima for conferences and regions could be specified in advance, and these could be larger for fields of 16 than for fields of 8. But setting minima by conference is inadvisable. A conference may simply be way under par in a given year. And sending a conference champion may be logically faulty (the champion could lose all outside games, while a second place team could win all outside games).
An unusual problem with the national championship is that two teams could be so close as to deserve a cochampionship (co-champions happen now only when the polls tie or disagree, or when the final playoff game is a tie). Heuristic rules may be devised to allow for this possibility, but some discretion seems necessary.
I would like to demonstrate one kind of ranking now available in the public domain. Most other computer systems are secret or proprietary, but this one has been presented to several meetings of the American Statistical Association and mentioned in the 11 November 1991 issue of Sports Illustrated, as well as on pp 55-58 of the latest issues of NCAA Football. The source code is available here.
In order to meet the wins and margins conditions, we first grade all scores according to a formula. Draws are worth .500, one-point victories .765, 3-point wins .795, 7 points .850, 14 points .922, 21 points .963, 28 points .984, 35 points .993, etc. The payoff is clearly diminishing. Thus coaches will not lose the incentives they now have to pull out the first team when they get far enough ahead (as protection against injury and to give experience to backups). Losses are worth one minus the grade of the winner. Grades are then totalled for all opponents, and this figure is matched against the (temporary) rankings of these opponents. An iterative procedure eventually produces the ranking in minutes on a personal computer according to an otherwise standard method of pairwise comparisons published at least 40 years ago. Estimates are then converted to a logistic scale corresponding to a standard deviation of 15.
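The grading formula itself is not spelled out above, so the following Python sketch is only an assumption: a logistic curve with standard deviation 15 happens to reproduce the sample grades listed, to three decimals. The names SPREAD, SCALE and grade are invented for illustration and are not taken from the BASIC program.

```python
import math

SPREAD = 15.0                             # the constant quoted for college football
SCALE = SPREAD * math.sqrt(3) / math.pi   # logistic scale with standard deviation SPREAD

def grade(margin):
    """Assumed grade for one game; margin is points for minus points against."""
    if margin == 0:
        return 0.5                        # draws are worth .500
    # Assumed curve for the winner: it always stays below 1.
    winner = 1.0 - 0.5 / (1.0 + math.exp(abs(margin) / SCALE))
    # The loser receives one minus the winner's grade.
    return winner if margin > 0 else 1.0 - winner

for points in (1, 3, 7, 14, 21, 28, 35, 42):
    print(points, round(grade(points), 3))
```

Under this assumed curve, two one-point victories total about 1.530, while a tie plus any win stays below 1.500, which is what the WINS condition asks for.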
The above system meets all the required conditions from October to the end of the season, except for championships. Any team within 1.8 points of the leader automatically shares in a cochampionship, and other teams within 3.0 points of the leader share at my discretion. Championships have been awarded on this basis by the Foundation for the Analysis of Competitions and Tournaments since the 1970s, and retroactive to 1968.
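The two thresholds can be written down as a short Python sketch. The function name, the inclusive reading of "within", and the standings figures below are assumptions made up for illustration; the discretionary group is only a list of candidates, since the final choice is a judgment call.

```python
def cochampions(standings, automatic=1.8, discretionary=3.0):
    """Split teams near the leader into automatic and discretionary groups."""
    top = max(standings.values())
    auto, maybe = [], []
    # Walk the teams from highest to lowest standing.
    for team, points in sorted(standings.items(), key=lambda kv: -kv[1]):
        gap = top - points
        if gap <= automatic:          # assumed inclusive: shares the title automatically
            auto.append(team)
        elif gap <= discretionary:    # assumed inclusive: left to discretion
            maybe.append(team)
    return auto, maybe

# Invented standings with neutral identifiers, not real results.
example = {"AAA": 112.4, "AAB": 111.0, "AAC": 109.9, "AAD": 105.2}
auto, maybe = cochampions(example)
print(auto, maybe)
```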
FACT's standings are computed without reference to things like division or conference. The code (BASIC) isn't proprietary. I have given papers on the methodology to professional meetings of statisticians, and will forward the program to anybody interested. If somebody is interested in using my program for rankings of college and professional sports, I will be happy to go through the job of preparing my material. Any such individual should contact me.
Basketball standings and comments are available here.
Please note that standings early in the season fluctuate wildly, because prior information which a reader may have about the teams is deliberately being withheld from the computer. If a team currently is 2-2, for example, the computer tries to place that team below the two teams it lost to and above the two teams it beat. As a thought experiment, try replacing the teams' names with neutral identifiers (such as AAA, AAB, AAC, etc.) with no associations. Then see if the school is in a reasonable position. Remember that these rankings are not intended for any kind of gambling. Other ratings partially intended for gambling may seem more reasonable, but they include some kind of judicious assessment of information from sources other than the current year's scores.
Final ratings from past years:
1983, 1984, 1985, 1986, 1987, 1988, 1989
Top 5 teams since 1968.
Question: How long have you been doing your computer rankings?
Answer: I started in 1963 with an incorrect method. Around April of 1970 or 1971, I came up with the method now used. The number of teams increased until the 1980s, when I started trying to include every college football team. I've presented papers to ASA meetings on this subject. I was the founder and cochair of an ad hoc ASA Committee on Statistics in Sports and Competition in the 1970s. I have been advocating for many years that football playoffs in lower divisions of NCAA and in the NAIA be based on my computer standings, and that regional quotas should be eliminated. Teams would be required to play at least one game outside of their conferences to be eligible. Limits for the number of teams from a conference would depend on the number of teams in the playoff and on other complicated factors. All these playoffs now depend on polls. There are quotas in NCAA Division II and III. Last year an IIAC team went to the Division III playoffs without having a single game outside of their own conference! (They lost in the first round.)
Question: How did you get involved in the BCS?
Answer: After my work was posted on the internet by David Wilson, the BCS people came across the standings and accompanying essay written for a lay audience. They told me they were impressed with the constraint that no outside information is utilized. We then had many conversations in which the standings were tested with hypothetical results, and many weekly standings were also recomputed and sent to them by e-mail. The results apparently demonstrated to them that running up scores had very little payoff in my system.
Question: What is the main objective of computer rankings?
Answer: That depends on what kind of rankings you're looking for. I distinguish between ratings and standings. Ratings are estimates of the current ability of the teams, and all of the public and private knowledge about the teams might be utilized. But standings are self-censored estimates that are constrained to using only current season scores.
Question: What are the components that make up excellent computer rankings?
Answer: Excellence in ratings should be judged by predictive success. But standings should be judged by the value system of the participants. In other words, how much is any one score worth compared to any other score?
Question: What factors do you consider in determining your rankings - margin of victory, home win, home loss, strength of schedule, etc.?
Answer: I use grades assigned to margins, and the identity of the opposition. Home field and date are ignored. See essay. My standings are in the public domain, and I will give my (BASIC) program to any serious investigator.
Question: What are the redeeming qualities of using computer rankings to help determine the top two teams?
Answer: We need a completely objective and public technique which will tell teams where they stand and what they have to do to be in the playoffs. (Please note that I have a theory about playoffs which claims that a second best team isn't always the right choice for a playoff game. In 1993, for example, my #3 team, Nebraska, would be chosen instead of my #2 team, Notre Dame, to play the #1 team, Florida St.) Polls depend on naturally Bayesian human reasoning, which has all kinds of personal bias and is subject to vagaries of publicity. (Bayesian estimates give some weight to an outside source of information other than the current sample of data.)
Question: What are the not-so-redeeming aspects of using computer rankings to help determine the top two teams?
Answer: The right computation is what we need. But note that the top two may not always be the right choices for a playoff game. Also note standings do not reflect the interest of business. If two teams were tied for a playoff spot, the people operating the venue would prefer that school likely to generate greater numbers of spectators. Polls seem to have a component which goes in this direction.
Question: Do you think your computer rankings are the best and why?
Answer: The basic technique is the natural one and has been known since at least the 1950s. The only question is the value system represented by my grades. Since I could not question participants, I had to guess at a smooth function to reflect their values. And my system is generic. There's a single constant (15) for college football which changes for other sports (e.g. 2.35 for ice hockey). My rankings should be used in college basketball instead of the awful RPI.
Question: Do you have any problems with any of the other computer rankings?
Answer: Anything goes in ratings. Even in standings, if one does accept the basic technique, there are infinite possibilities until we settle on a value system.
Question: Is there room for all these computer rankings?
Answer: The public has a high demand for ratings which can be met by many different products.
Question: Do you think it's good to have eight different computer rankings involved in the BCS?
Answer: Data reliability is a problem in college football because of wire service errors and our own transcription mistakes when entering data in a computer. One way to protect any system against error is to use some form of consensus depending on subsystems which are independent (hopefully).
Question: Are you a college football fan?
Answer: I try to watch a few minutes every week, but haven't been to a game in years. I watch a lot of bowl games.
Question: What is your background or bio? Where did you go to school? What do you do for a living? Do you consider yourself a computer nerd, a propeller head?
Answer: I was a mathematical statistician until 1986 in a variety of companies (defense, aerospace, seismic risk). Degrees (BA, MA) are from University of Wisconsin. Main interest now is in constitutional reform, and I have given my talk on this subject many times around Los Angeles (twice to a group of political scientists at UCLA). My proposals are unlike the material one usually sees in this field, since my approach comes from systems analysis. The interest in constitutions led to my taking a seminar at Harvard (1963 at GSPA, Banfield, political economy) which introduced me to the problem of the social welfare function, and that's analogous to the pairwise comparison problem when ranking contestants in sports without absolute standards. But we have an easier time ranking teams because team A will not throw a game to team B to benefit team C, whereas one can often observe strategic voting in legislatures (a senator may choose option A over option B, even though she really prefers B, because she wants some option C to survive the subsequent chain of ballots in the legislative mill.)
Question: Finally, if there is anything else you'd like to add, please feel free to do so.
Answer: Bronx High School of Science (1949-1951), Ford Foundation Fellowship as an undergraduate and National Science Foundation Fellowships as a graduate student at Wisconsin and at Harvard. By the way, I was in Henry Kissinger's seminar on defense organization during the Cuban missile crisis (fall, 1962). That was an experience! I was in ROTC class with Alan Ameche at Wisconsin when I first acquired an interest in football.
> To: David Rothman
> From: John Swofford, BCS Coordinator
> Date: January 11, 2001
>
> As we evaluate the Bowl Championship Series standings, I would like for you to provide a detailed explanation with regard to your ratings calculations and, in particular, account for the following:
>
> 1) margin of victory
The winner gets a grade with diminishing marginal return based on score difference only, while a loser receives one minus the above amount. Given below are some sample grades (approximate):
 1: .765    21: .963
 3: .795    28: .984
 7: .850    35: .993
14: .922    42: .997
No winner ever gets a grade of 1. The extra value to be gained by more than three touchdowns is so tiny that most coaches would pull their first teams. The risk of injury to key players isn't worth a higher value for their grade, and they may display tipoff behaviors not seen earlier in the game.
> 2) strength of schedule
For each team a total grade is computed, and that's matched against the latest values for all opponents played to date. The team's position on this list then gets adjusted slightly upward or downward, to get a better match. The process is repeated for all teams on the list millions of times until no further changes are necessary (estimates are in a double precision mode and for all practical purposes there will never be ties).
> 3) head to head competition
I am aware of no theory which justifies this. We are all familiar with the NFL tie-breaking procedures, in which ties are broken by a series of rules, but these rules are really arithmetic rituals without foundation in theory.
> 4) other factors which may be included in your compilation
Site and date of game are ignored, as is prior information about a team.
> The BCS is concerned about the impact of "excessive" margin of victory on its standings. Should the BCS decide to implement a limit on the margin of victory, how would you be able to adjust your formula, for BCS use only, to nullify the impact of a margin of victory beyond the 21-24 point range?
The BCS should want exactly what I have already. A tiny amount of grade improvement must be allowed for the rare situation where a team plays opponents exclusively which are relatively very weak. In addition, ceteris paribus, a higher score must be rewarded with a higher position although the increment may be tiny. For example, if two schools both lose to a third school by a point, but the first school wins a rematch by 29 points and the second one is a rematch winner by only 28 points, the first team is ahead by .03146. But if the first school lost the first game by two points, they would be behind by .2182 in LOGITs. The difference between a 1 point and a 2 point margin swamps the tiny difference between a 28 point margin and a 29 point margin.
> In addition, do you have any thoughts on how the BCS could add a "head-to-head" factor to the overall BCS standings for teams that played one another during the regular season and finished ranked next to each other in the final BCS standings.
Rock/paper/scissors is a children's game that may illustrate the fallacy involved. The three "teams" are really equal, but highlighting a single head-to-head outcome gives an incorrect ranking.
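The matching procedure described in the reply under strength of schedule can be sketched in Python. This is not the BASIC program. It assumes a logistic grade curve that fits the sample grades, a logistic expected grade against each opponent, and a simple one-team-at-a-time adjustment; the names grade, expected and solve and the team labels A, B, C are invented for illustration.

```python
import math

SCALE = 15.0 * math.sqrt(3) / math.pi   # assumed logistic scale (standard deviation 15)

def grade(margin):
    """Assumed grade curve; it fits the sample grades quoted in the reply."""
    if margin == 0:
        return 0.5
    winner = 1.0 - 0.5 / (1.0 + math.exp(abs(margin) / SCALE))
    return winner if margin > 0 else 1.0 - winner

def expected(diff):
    """Assumed expected grade against an opponent standing `diff` points lower."""
    return 1.0 / (1.0 + math.exp(-diff / SCALE))

def solve(games, sweeps=10000, tol=1e-12):
    """games: (team, opponent, margin for team). Returns standings centered on zero."""
    total, schedule = {}, {}
    for team, opp, margin in games:
        for t, o, m in ((team, opp, margin), (opp, team, -margin)):
            total[t] = total.get(t, 0.0) + grade(m)   # total grade earned
            schedule.setdefault(t, []).append(o)      # one entry per game played
    standing = {t: 0.0 for t in total}
    for _ in range(sweeps):
        worst = 0.0
        for t in standing:
            exp = [expected(standing[t] - standing[o]) for o in schedule[t]]
            gap = total[t] - sum(exp)                  # earned minus expected grade
            slope = sum(e * (1.0 - e) for e in exp) / SCALE
            # Move the team up or down for a better match, with a capped step.
            standing[t] += max(-SCALE, min(SCALE, gap / slope))
            worst = max(worst, abs(gap))
        if worst < tol:                                # no further changes needed
            break
    mean = sum(standing.values()) / len(standing)
    return {t: s - mean for t, s in standing.items()}

# The rematch example: A and B each lose to C by 1, then win by 29 and 28.
games = [("C", "A", 1), ("A", "C", 29), ("C", "B", 1), ("B", "C", 28)]
standings = solve(games)
lead = standings["A"] - standings["B"]
print(round(lead, 4))
```

Under these assumptions the first school finishes about .0315 ahead of the second, in line with the figure quoted for the 29-point versus 28-point rematch.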
(c) David Rothman, Executive Director, FACT
14125 Doty Avenue, #23, Hawthorne, CA 90250-8042
FACT stands for Foundation for the Analysis of Competitions and Tournaments.
David Rothman / firstname.lastname@example.org