2020 NFL Postseason Predictions from Machine Learning Model — Wild Card

Buccaneers & Saints heavy favorites while Seahawks & Ravens only have a slim edge

Nasir Bhanpuri, PhD

Over the past few years I have been predicting Super Bowl participants at the start of the playoffs using machine learning (ML) models (Super Bowls L, LI, LII, LIII, LIV). This year, I’m doing things slightly differently and starting off by going one week at a time. I’ll also skip some of the details this time, so check out the previous articles (or connect with me) if you’re interested in more of the ML and NFL nuances. As with last year’s model, this year’s ML model (V 5.0) was trained and tested on in-season information, though it draws on more data sources than last year’s model did. V 5.0 was trained on data from weeks 2 to 12 and tested on weeks 13 to 17 (inputs: weekly updated team statistics; outcomes: weekly winners). During weeks 13 to 17, it correctly predicted 69.6% (55/79) of game winners, outperforming both V 4.0 over a similar timeframe last season and the season-long performance of 30 top prognosticators (more details in the Model Development & Evaluation section below).
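
The evaluation described above is a time-based holdout: fit on earlier weeks, score on later ones, so the model is never graded on games it was trained on. Below is a minimal Python sketch of that split and of the accuracy metric. The `Game` record, its fields, and the helper names are hypothetical, since the post does not describe how the data is stored.

```python
from dataclasses import dataclass


@dataclass
class Game:
    """Hypothetical record for one regular-season game."""
    week: int
    features: list[float]  # weekly updated team statistics (assumed layout)
    home_won: bool         # outcome label: did the home team win?


def split_by_week(games, train_weeks, test_weeks):
    """Time-based holdout: train on earlier weeks, test on later ones."""
    train = [g for g in games if g.week in train_weeks]
    test = [g for g in games if g.week in test_weeks]
    return train, test


def accuracy(predicted_home_wins, games):
    """Fraction of games whose winner was predicted correctly."""
    correct = sum(p == g.home_won for p, g in zip(predicted_home_wins, games))
    return correct / len(games)


TRAIN_WEEKS = range(2, 13)  # weeks 2 to 12
TEST_WEEKS = range(13, 18)  # weeks 13 to 17

# The reported test-set result: 55 correct picks out of 79 games.
print(f"{55 / 79:.1%}")  # 69.6%
```

Splitting by week rather than at random matters here: a random split would let the model peek at late-season team statistics while being tested on earlier games.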

Below are the predictions for this weekend compared with the lines posted by ESPN (Caesars Sportsbook by William Hill; CSWH) and FiveThirtyEight’s QB-adjusted Elo at the time of writing:

Comparison of Predictions for Wild Card Games

All of the favorites match, though V 5.0 tends to predict tighter games and therefore gives the underdogs an equal or higher chance of an upset than CSWH & FiveThirtyEight in almost every case. (The one exception is Washington, to whom FiveThirtyEight gives a slightly higher upset chance than V 5.0 does.)

While it’s possible that V 5.0 goes 6 for 6 this weekend, its historical performance suggests that 1 or 2 of the underdogs are more likely to win (with the Rams & Titans as the top candidates to pull off upsets).
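
To see why a clean sweep is unlikely, treat each of the six picks as an independent trial that comes out correct at the model’s weeks 13 to 17 hit rate (55/79). That flat rate is a simplifying assumption: the model’s own per-game probabilities differ, and plugging those in would turn this binomial calculation into a Poisson-binomial one.

```python
from math import comb


def upset_distribution(n_games: int, accuracy: float) -> list[float]:
    """P(exactly k favorites lose) for k = 0..n_games, assuming every pick is
    independently correct with the same probability `accuracy`."""
    miss = 1.0 - accuracy
    return [comb(n_games, k) * miss**k * accuracy**(n_games - k)
            for k in range(n_games + 1)]


ACCURACY = 55 / 79  # weeks 13 to 17 hit rate, used as a flat per-game estimate
dist = upset_distribution(6, ACCURACY)

print(f"P(6 for 6)       = {dist[0]:.1%}")
print(f"P(1 or 2 upsets) = {dist[1] + dist[2]:.1%}")
print(f"expected upsets  = {6 * (1 - ACCURACY):.2f}")
```

Under that assumption, a 6-for-6 weekend comes out at roughly 11%, while 1 or 2 upsets together come out at roughly 62%, with about 1.8 upsets expected.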

Model Development & Evaluation

I once again used supervised machine learning to predict postseason outcomes, incorporating home-field advantage. (Interestingly, the model learned that home-field advantage has a smaller impact than in previous years, which seems plausible given limited or no fan attendance due to COVID-19.) There are some notable differences between V 5.0 and V 4.0:

  • Both were trained on weekly in-season data, but V 5.0 was trained on weeks 2 to 12, while V 4.0 was trained on weeks 5 to 10. A wider training window that includes more recent data seems to help, though the correspondingly smaller test set (weeks 13 to 17) may make the evaluation less reliable.
  • V 5.0 uses a larger set of metrics, including the team efficiency metrics posted by numberFire (also used by V 4.0), the data behind FiveThirtyEight’s NFL Elo ratings and predictions, and more subjective power rankings from various news outlets, as aggregated by eatdrinkandsleepfootball.
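
The post doesn’t spell out how these sources are combined into model inputs. One common pattern, sketched below purely as an assumption, encodes each game as the difference between the two teams’ metrics plus a home-field flag, which is also one way a model could learn how much home-field advantage is worth. The `TeamWeek` fields and the `matchup_features` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TeamWeek:
    """Hypothetical snapshot of one team's metrics going into a given week."""
    efficiency: float  # numberFire-style team efficiency rating
    elo: float         # FiveThirtyEight-style Elo rating
    power_rank: float  # averaged media power ranking (1 = best)


def matchup_features(team_a: TeamWeek, team_b: TeamWeek, a_is_home: bool) -> list[float]:
    """Encode a game as A-minus-B metric differences plus a home-field flag."""
    return [
        team_a.efficiency - team_b.efficiency,
        team_a.elo - team_b.elo,
        team_b.power_rank - team_a.power_rank,  # flipped so positive favors A
        1.0 if a_is_home else -1.0,             # +1 if A hosts, -1 if B hosts
    ]
```

Because swapping the two teams (and the home flag) simply negates every feature, each game can be added to the training set in both orientations, so the model cannot learn a spurious preference for whichever team happens to be listed first.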

I used a random forest algorithm, which, with properly tuned hyperparameters, resists overfitting and reduces the need for manual feature selection. I won’t go into detail here about feature importance (which can provide some insight into the inner workings of the prediction engine), though if there is interest, and if the model proves useful for predicting postseason outcomes, I may write another post on that topic.
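
The post doesn’t name a library. To build intuition for the two ingredients that make a random forest resistant to overfitting (bootstrap resampling of the training games and a random subset of features considered at each split), here is a deliberately tiny, standard-library-only version in which every tree is a single split (a decision stump). It is a toy sketch, not the model behind V 5.0; in practice a library implementation such as scikit-learn’s `RandomForestClassifier`, with hyperparameters like `n_estimators`, `max_depth`, and `max_features` tuned by cross-validation, would be the usual choice.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels, fallback):
    """Most common label, falling back to the parent's labels if a side is empty."""
    return Counter(labels or fallback).most_common(1)[0][0]


def fit_stump(X, y, feature_ids):
    """Depth-1 tree: pick the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left, y), majority(right, y))
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=50, max_features=2, seed=0):
    """Bagging plus random feature subsets, one stump per bootstrap sample."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # sample with replacement
        feats = rng.sample(range(n_features), min(max_features, n_features))
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_proba(forest, x):
    """Share of stumps voting that team A wins (label 1)."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)
```

Averaging many high-variance trees, each fit to a different bootstrap sample and feature subset, is what smooths out noise in the weekly statistics; a real forest would grow deeper trees and tune how many features each split may consider.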

More Uncertainty than Usual

When I started making ML-based playoff predictions 6 years ago, I gained a deeper appreciation of the variable and volatile nature of the NFL as I saw it show up in the statistics and outcomes I was tracking. Typical sources of variation include large point swings from a single play (e.g. a red zone pick-6), large swings in field position or down and distance from subjective referee decisions (e.g. pass interference on a long pass), bad weather, injuries, funny bounces, etc. This year, uncertainty is even higher due to COVID-19 and the precautions needed to keep everyone safe. With games getting rescheduled, players and coaches sitting out, teams unable to practice as they normally would, and reduced fan noise in stadiums, the games are even harder to predict from football statistics alone. Furthermore, this year’s expanded playoffs (14 teams instead of 12, with only the top seed in each conference getting a bye) add a new wrinkle, which, by my analysis, has the net effect of slightly lowering the top seeds’ chances of winning the Lombardi Trophy (and gives NFL fans more games!).
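
The main mechanism is easy to see with a little arithmetic: under the 14-team format the #2 seed no longer gets a bye, so it has to win four games instead of three. The sketch below multiplies per-game win probabilities under an independence assumption. The probabilities are made up for illustration and are not outputs of V 5.0 or of the analysis mentioned above.

```python
from math import prod


def title_probability(round_win_probs):
    """Chance of winning every remaining game, assuming games are independent."""
    return prod(round_win_probs)


# Hypothetical per-game win probabilities for a #2 seed (illustrative only).
LATER_ROUNDS = [0.65, 0.55, 0.55]  # divisional, conference championship, Super Bowl
WILD_CARD = 0.80                   # extra home game against the #7 seed

old_format = title_probability(LATER_ROUNDS)                # bye under the 12-team format
new_format = title_probability([WILD_CARD] + LATER_ROUNDS)  # no bye under the 14-team format
print(f"{old_format:.1%} -> {new_format:.1%}")
```

With these illustrative numbers, the #2 seed’s title odds fall from about 20% to about 16%. The #1 seed keeps its bye, which helps keep the net effect across the top seeds modest.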

The playoffs are always entertaining to watch, these predictions are always fun to develop and share, and I hope I get a few right. More importantly, I’m hoping there are no major injuries or COVID-19 concerns for all the folks who entertain us fans every week for almost half of the year. Looking forward to an action-packed Wild Card Weekend and maybe an upset or two.

Nasir Bhanpuri, PhD

AI at Virta Health, where I use data science to solve challenges in healthcare/medicine. I also use DS for sports, education, and music.