Thursday, October 22, 2009

What's all this, then?

Welcome to my humble attempt at blatant imitation of real sports statisticians.

I'm a long-time follower of college basketball, and over the last five to ten years there have been some pretty interesting strides in how we analyze college ball.  One of the more visible people in this arena is Ken Pomeroy, whom I've followed since around 2004.  His blog and his rating system really sparked my interest in the concept of tempo-free statistics.  The basic idea behind tempo-free statistics is that what we currently measure in a sport like basketball -- points per game, number of turnovers, etc. -- is in many ways broken.  Those numbers depend in large part on how quickly a team plays the game: the faster the pace, the more possessions a team and its opponents have, and the more opportunities both teams have to score and give up points, rack up assists, and cough up turnovers.

They don't, however, answer the fundamental question "How good is team 'X'?"

For that we turn to tempo-free statistics.  Remove the disparity in possessions, normalize all statistics to a common metric such as "points per 100 possessions", and adjust for the quality of the opponent.  This allows us to see the fundamental efficiency of a given team.  Ken Pomeroy has an excellent write-up of how this applies to college basketball that I will not even attempt to duplicate, but simply encourage you to read.
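To make the normalization step concrete, here's a minimal sketch in Python.  The team totals are made-up numbers purely for illustration, the function name is my own invention, and the opponent-quality adjustment is left out entirely; this is not Pomeroy's actual formula, just the "per 100 possessions" idea in its simplest form.

```python
def points_per_100(points, possessions):
    """Raw efficiency: points scored per 100 possessions (no opponent adjustment)."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * points / possessions


# Hypothetical season totals, invented for this example only.
teams = {
    "Fast Team": {"points": 2400, "possessions": 2250, "games": 30},
    "Slow Team": {"points": 2100, "possessions": 1800, "games": 30},
}

for name, t in teams.items():
    per_game = t["points"] / t["games"]
    per_100 = points_per_100(t["points"], t["possessions"])
    print(f"{name}: {per_game:.1f} points/game, {per_100:.1f} points/100 possessions")
```

With these invented numbers the fast team scores more points per game (80 vs. 70), but the slow team is the more efficient one per possession, which is exactly the kind of thing points per game hides.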

The question this blog examines is "do these concepts apply to football as well as they do to basketball?"  To answer that question I plan to lay out how I create my rankings, where I obtain my data, post predictions on upcoming games, and analyze where the system was right and where it went wrong.  I also hope to examine some areas in which my system produces vastly different results from conventional wisdom or from other computerized rankings such as those used in the BCS.

This is not my first attempt at applying this approach to college football.  Last year I participated in the ESPN Winning Formula Challenge, a 12-week-long contest with significant sums of prize money for those who could write a computer program to predict college football results.  Unfortunately, there was an issue with my code during the second 4-week competition -- remember, kids: bounds-check your array accesses, because some team somewhere will hang 80 points on their opponents -- but had my code worked throughout the season I would have finished 6th out of approximately 120 competitors.  That's a 73% accuracy rate using nothing but the final score and number of possessions in each game.

In my next post I'll go into some of the nuts and bolts of my approach, but for now I encourage you to read Pomeroy's write-up on how to use and understand tempo-free statistics.