How Data Can Be Used to Break Down the Game of Football
Author: Marshall King
Breaking Down the Game of Football
Guessing what a football offense will do next is part of the game.
Knowing what it would do next would be a huge advantage.
Zach Austin and Jimmy Brunette watch sports and wonder if they can use data and analytics to break down the game. That’s likely where they got the idea to study the data sets from college football games to predict whether an offense would run or pass in the next play.
The two were in the same dorm at the University of Notre Dame and had become close friends. In their Machine Learning class their junior year, they teamed up for a final project “to do something pretty cool,” Austin said.
Predicting what a college football offense will do next is often studied anecdotally, but not with data. Oddsmakers rarely offer odds on that kind of bet.
With the powerful predictive models they encountered in class, they had the tools to tackle the challenge.
Tackling the Challenge
All the offensive plays run by 141 teams in the 2020 college football season added up to roughly 104,000 observations with 330 descriptive variables per observation. They pared that to roughly 80,000 observations with 32 predictor variables, still hundreds of thousands of lines of data.
They looked at teams individually and in six clusters, including grouping teams such as Army and Navy that are more run-oriented. The cluster model wasn’t quite as accurate but had a “strong balance between sensitivity and specificity, which is most applicable and safe for a coach to use,” according to their project.
“Our goal was to really capture coaching tendencies that would stick out: how well they were doing in this game, or the last number of carries or throws,” said Brunette. “A lot of the learning and predictive power came in the logic we were able to use.”
Predictability vs Success
They could predict that Navy would run 90 percent of the time, but the real value came from a model that was 50 percent accurate on when Navy would pass.
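The gap between overall accuracy and usefulness can be shown with a small worked example. The sketch below is not the pair's model. It uses a made-up set of 100 plays (90 runs, 10 passes) and assumes that "50 percent accurate on when Navy would pass" means the model catches half of the actual passes, which is the sensitivity for the pass class. The function names and the play counts are hypothetical, chosen only for illustration.

```python
def confusion_counts(actual, predicted, positive="pass"):
    """Count true/false positives and negatives, treating `positive` as the target class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn


def summarize(actual, predicted, positive="pass"):
    """Return accuracy, sensitivity and specificity for one set of predictions."""
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / len(actual),
        # Share of actual passes that were predicted as passes.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual runs that were predicted as runs.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical data: 90 runs followed by 10 passes.
actual = ["run"] * 90 + ["pass"] * 10

# Baseline: always guess "run".
always_run = ["run"] * 100

# Hypothetical model: wrongly flags 5 runs as passes, but catches 5 of the 10 passes.
model = ["pass"] * 5 + ["run"] * 85 + ["pass"] * 5 + ["run"] * 5

print("always run:", summarize(actual, always_run))
print("model:     ", summarize(actual, model))
```

In this made-up case both approaches are right on 90 of 100 plays, but the always-run guess never identifies a pass (sensitivity of 0), while the model identifies half of them (sensitivity of 0.5). That is the kind of balance between sensitivity and specificity a defensive coach could act on.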
They could predict the plays of the University of Washington with the most accuracy, demonstrating how that team played a predictable style. A coach who understood that could respond with play calling on either side of the ball.
“Teams that called plays very predictably didn’t see much success reflected in their records, neither did teams that called plays most unpredictably,” said Austin.
“The most successful teams were the ones that were predictable enough to demonstrate ‘smart’ play-calling, but unpredictable enough where defenses couldn’t catch on to their tendencies. That’s why the model performed moderately well on top teams like Alabama, Clemson, and Georgia.”
As expected, the distance to the first down, win probability and results of the previous play were the top factors in the model. Having machine learning help do this live would give bettors or teams a huge advantage.
As they watch the 2022 Fighting Irish football team, they believe Offensive Coordinator Tommy Rees is doing a good job calling plays. “He fits the mold of a lot of really successful teams,” said Austin.
They now live in Chicago, where Brunette works for a consulting company and Austin works as a developer in machine learning operations. They work on sports-oriented data projects together, almost as part-time jobs. In a large NFL betting competition, they’re in the top 100 of 5,000 or so competitors. They are fans, just informed by data.