The Tempo-Free Gridiron: Exploring In-Game Win Probabilities [Updated]

This past year we've been working on providing live in-game win probabilities for every FBS game. By this, we mean that given the current situation on the field (the teams playing, the score, the time remaining, possession, down, distance, and field position), what are the odds that each team will win?

However, the in-game win probabilities we posted this past season were a function only of team strength, offensive and defensive efficiency so far in the game, the size of the lead, and the amount of time left. The glaring deficiency here is that without possession, down, distance, and field position, we're blind to a late-game situation in which a team is driving down the field for a go-ahead score, or in which a team with a lead has iced the game by keeping the opponent from ever getting the ball back. Can we quantify that, though? How much are we flying blind by not having this data? And which matters most: possession, down, distance, or field position?

Let's examine our current model, see how it works, test how it's done, and then figure out how to improve it.

The current in-game model uses three main inputs, and combines those with varying weights as the game progresses:

Initial game state. These are the odds that TFG and RBA assign to each team before the game starts. Before the coin flip, we have a prediction as to which team will win, and the odds that they will win. This becomes less relevant as the game unfolds.
Demonstrated performance. These are the offensive and defensive efficiencies calculated from the number of points each team has scored so far in the game. They don't mean much at the start, are more important in the middle of the game, and become slightly less important at the end.
Magnitude of the lead. This is actually a combination of the number of points by which the currently-leading team has outscored their opponents and the amount of time left in the game. This factor is effectively meaningless in the early and middle parts of the game -- unless there's an extreme lead -- but becomes a dominant factor late in the game.

How do we combine these, then? In short, it's a bit of a hack. The initial game state starts with a weight of 1, which decreases linearly as the game progresses. The weight of the teams' demonstrated performance is the square root of the fraction of the game played, and the weight of the lead is the square of the fraction of the game played.
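A minimal sketch of that weighting scheme. The post doesn't say exactly how the three weights are combined, so two things here are our assumptions: the initial-state weight falls linearly to 0 at the final whistle (1 - p), and the three raw weights are normalized so they sum to 1. The `blend_weights` helper is hypothetical.

```python
import math


def blend_weights(fraction_played: float) -> dict[str, float]:
    """Relative weights of the three inputs at a point in the game.

    Hypothetical sketch: assumes the initial-state weight is 1 - p
    (linear decay to zero) and that the raw weights are normalized.
    """
    p = min(max(fraction_played, 0.0), 1.0)  # clamp to [0, 1]
    raw = {
        "initial_state": 1.0 - p,      # starts at 1, decreases linearly
        "performance": math.sqrt(p),   # square root of fraction played
        "lead": p ** 2,                # square of fraction played
    }
    total = sum(raw.values())  # never zero: initial_state is 1 at p=0, lead is 1 at p=1
    return {name: w / total for name, w in raw.items()}


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    weights = blend_weights(p)
    print(f"{p:.2f}", {k: round(v, 3) for k, v in weights.items()})
```

Under these assumptions the normalized performance weight peaks a little past the three-quarter mark (roughly 0.52) and slides back to 0.5 at the end, which fits the "slightly less important at the end" description above.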

Visually, the relative weights look like this:

We'll be the first to admit that this isn't scientific or data-driven, and that bugs us. We wanted to get something out the door that seemed feasible, mimicked what our intuition told us was right, and could be tuned later.

First, though, is the question of how this hack holds up to scrutiny. We've done some tests on the quality of our in-game predictions for 2012, and plotted the data here:

The x-axis is the predicted probability that the at-the-time favorite would win, while the y-axis is the fraction of those predictions in which the favorite actually went on to win. We broke the data down into buckets, so the first data point consists of all of the predictions in the [0.500, 0.525) range, the second is all the predictions in the [0.525, 0.550) range, and so on. For example, of the 1841 times we gave a team a [0.500, 0.525) chance of winning at any point in the game, we expected to be right a total of 942.1 times (0.512) and were actually right 899 times (0.488).
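The bucketing can be sketched as follows, assuming each in-game prediction is stored as a pair of (probability given to the at-the-time favorite, whether that favorite won). The function name and data layout are hypothetical. Expected wins in a bucket are the sum of the predicted probabilities, which is how 1841 predictions can add up to 942.1 expected wins (0.512) in the first bucket.

```python
import math
from collections import defaultdict


def calibration_buckets(predictions, width=0.025):
    """Bucket (p_favorite, favorite_won) pairs by predicted probability.

    p_favorite is the probability given to the at-the-time favorite, so it
    lies in [0.5, 1.0]. Returns a sorted list of
    (bucket_start, count, expected_wins, actual_wins).
    """
    n_buckets = round(0.5 / width)
    stats = defaultdict(lambda: [0, 0.0, 0])  # count, expected wins, actual wins
    for p, favorite_won in predictions:
        # Half-open bucket [start, start + width); the epsilon guards against
        # float error right at a bucket edge such as 0.525.
        idx = math.floor((p - 0.5) / width + 1e-9)
        idx = min(max(idx, 0), n_buckets - 1)  # p == 1.0 lands in the top bucket
        s = stats[idx]
        s[0] += 1
        s[1] += p                  # expected wins = sum of predicted probabilities
        s[2] += int(favorite_won)  # actual wins
    return [
        (round(0.5 + i * width, 3), count, expected, actual)
        for i, (count, expected, actual) in sorted(stats.items())
    ]


# Toy data, not the real 2012 predictions.
sample = [(0.51, True), (0.52, False), (0.53, True), (0.91, True), (0.93, True)]
for start, count, expected, actual in calibration_buckets(sample):
    print(f"[{start:.3f}, {start + 0.025:.3f}) n={count} "
          f"expected={expected / count:.3f} actual={actual / count:.3f}")
```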

A perfect predictor would have all points clustered in the top-right corner at 100%. Barring that, a well-calibrated model should be right as often as it claims: if it says it's 90% confident in a prediction, 90% of those predictions should turn out correct. Over the game as a whole, we seem to be doing pretty well, if a bit conservative (R^2 against the target line = 0.940).
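The post doesn't spell out how the R^2 against the target line is computed. One common reading, sketched here as an assumption, measures residuals from the perfect-calibration line y = x instead of from a fitted regression line, with each bucket weighted equally. The helper name is hypothetical.

```python
def r_squared_vs_identity(expected, actual):
    """R^2 of observed win rates measured against the line y = x.

    Residuals are taken from the perfect-calibration line (actual ==
    expected) rather than from a fitted regression line, so the result
    can drop below zero if the predictions are badly miscalibrated.
    Every point gets equal weight, regardless of how many predictions
    fell in its bucket.
    """
    if len(expected) != len(actual) or len(actual) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for e, a in zip(expected, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have no variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Toy per-bucket win rates, not the post's data: a slightly conservative
# model whose favorites win a bit more often than predicted.
expected = [0.51, 0.60, 0.70, 0.80, 0.90]
actual = [0.52, 0.63, 0.74, 0.84, 0.93]
print(round(r_squared_vs_identity(expected, actual), 3))
```

A count-weighted variant (multiplying each squared term by the number of predictions in its bucket) is an equally reasonable choice; the two can differ noticeably when the extreme buckets are sparse.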

At first glance it appears that our hack isn't unreasonable. It produces plausible results in which the actual values closely mirror our expected values.

However, we can also look at our end-of-quarter predictions. That is, if we look at the odds we assign within a minute of the end of a quarter, how well do those snapshot odds play out?

At the end of the first quarter the R^2 for our fit isn't great (0.565), but that's because we appear to be biased in a conservative direction: we suspect we know the winner, and we're right more often than we give ourselves credit for.

At halftime our pure statistical fit is better (R^2 = 0.868) but we're starting to see a divergence in the data; our predictions with confidence less than 0.65 are overly confident, while those above 0.65 are overly conservative.

Here's where it kind of falls apart. Our in-game odds tend not to be very good at the end (R^2 = 0.678), where there's a lot of leverage and a single play can make a huge difference. Not having possession, down, distance, and field position is really hurting us here.

In preparation for Sloan we've been parsing a few years' worth of play-by-play data and thinking about how to incorporate it into our models. Strapped for time, we decided to do the easiest thing possible: compute an expected value based on field position and add it to each team's score. In other words, if a team has the ball at its own 25 yard line (say, after a touchback), how many points does the offense expect to score from that spot, and how many points does the defense expect to score (say, on a pick-six)?
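A minimal sketch of that expected-value table. We assume each play from the play-by-play data has been labeled with the offense's distance from the goal line it's attacking and with the points each side went on to score on that possession; the `PlaySnapshot` record and function name are hypothetical, and the actual parsing pipeline isn't shown in the post.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PlaySnapshot:
    """One play from play-by-play data (hypothetical record layout)."""
    yards_to_goal: int   # offense's distance from the goal line it is attacking
    offense_points: int  # points the offense went on to score on this possession
    defense_points: int  # points the defense scored (pick-six, fumble or punt return)


def expected_points_by_yardline(snapshots):
    """Average points for each side at every yard line seen in the data.

    Down and distance are deliberately ignored, so a 1st-and-10 and a
    4th-and-25 from the same spot fall in the same bin.
    Returns {yards_to_goal: (offense_ev, defense_ev)}.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # count, offense points, defense points
    for snap in snapshots:
        t = totals[snap.yards_to_goal]
        t[0] += 1
        t[1] += snap.offense_points
        t[2] += snap.defense_points
    return {
        yard: (off / n, dfn / n)
        for yard, (n, off, dfn) in sorted(totals.items())
    }


# Toy data: three plays from the offense's own 25 and one from its own 5.
plays = [PlaySnapshot(75, 7, 0), PlaySnapshot(75, 0, 0),
         PlaySnapshot(75, 3, 0), PlaySnapshot(95, 0, 7)]
print(expected_points_by_yardline(plays))
```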

We crunched the numbers, and this is the graph we get:

The x-axis is the offense's distance from the goal line it's attacking, and the y-axis is the number of points a side expects to get when the ball is at that distance. Note that these distances are agnostic to down and distance. A data point in the 60-yard bin could be a 1st-and-10 from the offense's own 40, or it could be a 4th-and-25 in a series that started at the opponent's 45 and has since been pushed back to the offense's own 40.

Even ignoring down and distance, there are some interesting observations we can make.

On offense, the field is broken roughly into three segments: inside the opponent's 10, between the opponent's 10 and their own 45, and from their own 45 back to their own goal line. We'd argue that the real "red zone" doesn't actually start until you reach the opponent's 10 yard line.

On offense, you're actually much better off with the ball at your own 1 than at any other point inside your own 15. Drives with the ball at their own 1 do a full half-point better in expected value than drives with the ball at their own 2, and we can't explain why.

Also on offense, you're better off having the ball at your opponent's 2 yard line than at their 1 yard line: a ball placed on the opponent's 2 yields nearly half a point more in expected value than a ball placed on the opponent's 1. We haven't had time to check whether this is statistically significant, but our hypothesis is that a ball on the opponent's 1 tends to lead the offense to call a conservative run play instead of opening up the field a bit more and trying a pass.

On defense, the field is broken into two segments: inside the offense's own 5, and the other 95 yards of the field. An interesting, as-yet-unexplored phenomenon is that the defense doesn't actually get many points with the offense backed up to its own 1 or 2, but picks up almost a third of a point when the ball is placed on the 5. In other words, there's about a 4.5% chance that a drive in which the offense has the ball on its own 5 results in a touchdown for the defense, whether on a pick-six, a fumble return, or a punt returned for a touchdown. If the ball is on the offense's own 1, those odds drop to 1.5%.
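The step from "almost a third of a point" to "about a 4.5% chance" is just the touchdown probability times the value of a touchdown: 0.045 times 7 is about 0.32 points at the offense's own 5, and 0.015 times 7 is about 0.11 at its own 1. A tiny sketch, assuming every defensive score is a 7-point touchdown; that ignores safeties, which is our simplification and is weakest right at the offense's own goal line.

```python
TOUCHDOWN_POINTS = 7  # assumes the extra point is good; ignores safeties and 2-pt tries


def defensive_ev_from_td_rate(td_probability: float) -> float:
    """Expected defensive points if every defensive score is a 7-point touchdown."""
    return td_probability * TOUCHDOWN_POINTS


for spot, p_td in (("own 5", 0.045), ("own 1", 0.015)):
    print(spot, round(defensive_ev_from_td_rate(p_td), 3))
```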

This is the part of the blog post where we should be telling you what we're going to do with this data and how to incorporate it. We really wish we could, too. But the truth is that it's 1:20am and Sloan starts in a few hours. Questions of how to weight and re-weight the various factors in a theoretically consistent fashion, and how to incorporate down and distance, will have to wait a few days.

Update: we've been able to integrate this basic field position data into our in-game models. The approach is simple: add the field-position-based expected value to each team's score, then recalculate the teams' efficiencies from those adjusted scores. For example, in the 2013 Rose Bowl between Stanford and Wisconsin, with 2:15 left in the first half the score was Stanford 17, Wisconsin 7, and Wisconsin had a 1st-and-10 at its own 29. The expected value of that field position is 0.10 points for the defense (Stanford) and 1.38 points for the offense (Wisconsin), which makes the effective score Stanford 17.10, Wisconsin 8.38.
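The score adjustment itself is plain addition. A minimal sketch using the Rose Bowl numbers above; the `effective_score` helper is hypothetical, and the follow-on step that recalculates team efficiencies from the adjusted scores isn't shown.

```python
def effective_score(offense_score, defense_score, offense_ev, defense_ev):
    """Add each side's field-position expected value to its actual score."""
    return offense_score + offense_ev, defense_score + defense_ev


# Wisconsin (offense) has the ball at its own 29, trailing Stanford 17-7.
wisconsin, stanford = effective_score(7, 17, offense_ev=1.38, defense_ev=0.10)
print(f"Stanford {stanford:.2f}, Wisconsin {wisconsin:.2f}")  # prints: Stanford 17.10, Wisconsin 8.38
```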

This gives us a more nuanced view of the game as it unfolds. Without this nuanced view, here's what our in-game odds looked like this past year:

Without accurate play-by-play data and without the expected value of field position, the graph has large discontinuities as points arrive in discrete three-point or seven-point chunks. Between Wisconsin's score just before halftime and Stanford's final field goal with five minutes left, there's no sense that anything significant is happening.

Let's look at the graph once we add in field-position-based expected values:

Here we see the nuance of the game as Stanford and Wisconsin spend the better part of the third quarter driving back and forth. Stanford stays between 65% and 75% likely to win the game, but we see the game with more clarity.

Here are two final examples from Alabama's 2012-2013 season. The first is from the 2012 SEC Championship game. With 21 seconds left, Georgia has just completed a 26-yard pass play and has the ball at Alabama's 8-yard line. Here's the in-game graph without the expected value from field position:

There's no sense that Alabama is on the verge of losing the game. They're at over 80% likely to win, even as the Bulldogs are knocking at the end zone. What happens when we include field position?

With 21 seconds left, the new in-game odds show Georgia almost 75% likely to win. That might be a bit high given that Georgia didn't have any way to stop the clock, but the bottom line is that the odds should favor Georgia at that point, since the Bulldogs are in a better position to score more points than Alabama before the end of the game. It didn't turn out that way, but if you ran 20 trials in which Georgia gets a 1st-and-goal from the Alabama 8, it's reasonable to expect that the Bulldogs would win more than half of them.

The final example is from the 2012 Alabama-LSU game. With 1:18 left in the 4th quarter, Alabama has a 2nd-and-10 on the LSU 28. The scoreboard shows LSU up 17-14, but the expected-value-adjusted score is Alabama 17.10, LSU 17.02, resulting in an in-game graph that looks like this:

This is the reverse of the SEC Championship game, in that this time Alabama is the team in the better position to win. Unlike Georgia, however, the Crimson Tide were able to cover the 28 yards to the end zone in one play and seal the deal.

In short, adding in field-position-based expected value gives us a better picture of the game as it unfolds. Is the difference quantifiable and statistically significant? That's yet to be determined. But it's a low-overhead way to fit knowledge of field position value into our existing in-game win probabilities.

We'd love to hear your thoughts on this, though. Contact me at justin@tempo-free-gridiron.com, or look for me at Sloan.

As always, follow us on Twitter at @TFGridiron and @TFGLiveOdds.

Thursday, February 28, 2013