2019 NFL Postseason Predictions from Machine Learning Model

Jan 3, 2020

Ravens leading the pack to win Super Bowl LIV, 49ers distant second

Happy New Year! It’s early January, which means it’s time to kick off the NFL playoffs and make Super Bowl predictions. Once again, I’ve learned from previous predictions and modified my modeling approach. While in past years I mainly tweaked the original methodology (for Super Bowls L, LI, LII and LIII), for Super Bowl LIV it’s more of an overhaul than a minor adjustment. Previously, the machine learning (ML) models were trained and tested using information from previous seasons (inputs: full-season team statistics; outcomes: Super Bowl participants). This year, the ML model (V 4.0) was trained and tested on in-season information from weeks 5 to 16 (inputs: weekly updated team statistics; outcomes: weekly game winners). The model performed well on the test set: it correctly predicted 68.5% (63/92) of game winners during weeks 11 to 16, which is better than the season-long performance of 20 top prognosticators (more details in the Model Evaluation section below).

I applied V 4.0 to the current playoff contenders and estimated each team’s chances of winning its upcoming games. I then simulated how the entire playoff bracket could play out, using V 4.0’s estimated chances of each team advancing, round by round, until I was left with a simulated Super Bowl winner. The results of 10,000 simulations are summarized immediately below (if you are interested in the technical details of the methodology and model performance, jump to the later sections):
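The round-by-round procedure above is a Monte Carlo bracket simulation. A minimal sketch follows, assuming the 2019 six-seed format (byes for No. 1 and No. 2, re-seeding after the Wild Card Round, neutral-site Super Bowl). The `RATINGS` values, the `HOME_EDGE` bonus and the logistic `p_home_wins` function are hypothetical placeholders standing in for V 4.0’s game-level probabilities, so the printed numbers are not the article’s predictions.

```python
import math
import random
from collections import Counter

# Hypothetical ratings keyed by (conference, seed); placeholders, NOT V 4.0 outputs.
RATINGS = {
    ("AFC", 1): 1.0, ("AFC", 2): 0.7, ("AFC", 3): 0.5,
    ("AFC", 4): 0.1, ("AFC", 5): 0.2, ("AFC", 6): 0.0,
    ("NFC", 1): 0.8, ("NFC", 2): 0.5, ("NFC", 3): 0.4,
    ("NFC", 4): 0.1, ("NFC", 5): 0.4, ("NFC", 6): 0.2,
}
HOME_EDGE = 0.25  # hypothetical home-field bonus on the same (log-odds) scale


def p_home_wins(home, away, neutral=False):
    """Placeholder game model: logistic function of the rating gap."""
    gap = RATINGS[home] - RATINGS[away] + (0.0 if neutral else HOME_EDGE)
    return 1.0 / (1.0 + math.exp(-gap))


def play(home, away, rng, neutral=False):
    """Draw one game result from the win probability."""
    return home if rng.random() < p_home_wins(home, away, neutral) else away


def conference_champion(conf, rng):
    """Play one conference bracket (six seeds, byes for No. 1 and No. 2)."""
    # Wild Card Round: No. 3 hosts No. 6, No. 4 hosts No. 5.
    survivors = [play((conf, 3), (conf, 6), rng), play((conf, 4), (conf, 5), rng)]
    survivors.sort(key=lambda team: team[1])  # best (lowest-numbered) seed first
    # Divisional Round: No. 1 hosts the lowest remaining seed, No. 2 hosts the other.
    top = play((conf, 1), survivors[1], rng)
    second = play((conf, 2), survivors[0], rng)
    # Conference championship: the better seed hosts.
    home, away = sorted([top, second], key=lambda team: team[1])
    return play(home, away, rng)


def simulate(n_sims=10_000, rng_seed=2020):
    """Return P(conference title) and P(Super Bowl title) for every team."""
    rng = random.Random(rng_seed)
    conf_titles, sb_titles = Counter(), Counter()
    for _ in range(n_sims):
        afc = conference_champion("AFC", rng)
        nfc = conference_champion("NFC", rng)
        conf_titles[afc] += 1
        conf_titles[nfc] += 1
        sb_titles[play(afc, nfc, rng, neutral=True)] += 1  # neutral-site Super Bowl
    conf_p = {team: k / n_sims for team, k in conf_titles.items()}
    sb_p = {team: k / n_sims for team, k in sb_titles.items()}
    return conf_p, sb_p


conf_p, sb_p = simulate()
for team, p in sorted(sb_p.items(), key=lambda kv: -kv[1])[:4]:
    print(team, f"{p:.1%}")
```

Joint events, such as both No. 1 seeds reaching the Super Bowl, can be estimated the same way by counting the simulations in which both occur. With 10,000 runs, the Monte Carlo standard error of a 34% estimate is about half a percentage point.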

Model V 4.0 Predictions for each Playoff Round

Unsurprisingly, the most likely representatives from each conference to play for the Lombardi Trophy in South Florida are the No. 1 seeds of each conference, the Baltimore Ravens and the San Francisco 49ers. What is perhaps more interesting is that the Ravens are almost twice as likely to win the Super Bowl (34%) as the 49ers (18%). While the Ravens and the 49ers are the favorites to win their conferences, there’s actually only a 22% chance that both teams will be playing on February 2nd. The next most likely AFC champions are the Chiefs (28%) and the Patriots (11%), and in the NFC it’s the Packers (30%) and the Seahawks (10%).

Let’s take a look at how V 4.0 compares to other predictions that have been published on the web, namely ESPN’s FPI and FiveThirtyEight’s Elo and QB-adjusted Elo.

Comparison of Model predictions for Winning Conference Championships and Super Bowl

On the whole, there are many similarities between the predictions, but closer examination reveals some subtle differences. All the models picked the Lamar Jackson-led Ravens and the run-heavy 49ers as the favorites to win their respective conferences. There is also consensus that the Chiefs are the second and the Patriots the third most likely to represent the AFC in Super Bowl LIV, while all other teams in that conference have at most a 5% chance. When it comes to the more closely packed NFC, the models diverge in some significant ways. The Saints and the Seahawks appear to be the most controversial. Sean Payton’s squad is ranked as high as tied for second by FiveThirtyEight’s QB-adjusted Elo and as low as fourth by V 4.0, while Pete Carroll’s bunch is third according to V 4.0 and sixth according to the rest of the models. When it comes to winning the Super Bowl, all the models favor the AFC North champs, but V 4.0 gives a slight edge to the 49ers as the next most likely, while all other models prefer the Chiefs.

To understand some of these differences between models, let’s dive a little deeper into how V 4.0 works.

Model Methodology & Interpretation

I once again used supervised machine learning to predict postseason outcomes. As in recent versions, I have incorporated home-field advantage, which is known to impact game results. However, V 4.0 differs from my previous modeling efforts in a few key ways:

V 4.0 was trained and tested on weekly data during the 2019 season (weeks 5 to 10 were used for training and weeks 11 to 16 were used for testing)
V 4.0 uses a smaller number of (continuously updated) team efficiency metrics, designed to account for numerous situational variables [posted by numberFire]
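The week-based split above keeps every test game later in the season than every training game, which avoids leaking future information into training. A minimal sketch, assuming each game is stored with its week, the two teams and whether the away team won (the framing described in the Model Evaluation section). The `Game` class, the `metrics_by_week` layout and the metric names are hypothetical.

```python
from dataclasses import dataclass

TRAIN_WEEKS = range(5, 11)  # weeks 5-10
TEST_WEEKS = range(11, 17)  # weeks 11-16


@dataclass
class Game:
    week: int
    home: str
    away: str
    away_won: bool  # label: model output is framed as P(away team wins)


def temporal_split(games):
    """Split games by week so the test set lies strictly after the training set."""
    train = [g for g in games if g.week in TRAIN_WEEKS]
    test = [g for g in games if g.week in TEST_WEEKS]
    return train, test


def features(game, metrics_by_week):
    """Away-minus-home differences of each metric, in sorted metric-name order.

    Assumes metrics_by_week[week][team] holds that team's metrics as they stood
    entering the given week, so no result from the game itself leaks in.
    """
    home = metrics_by_week[game.week][game.home]
    away = metrics_by_week[game.week][game.away]
    return [away[name] - home[name] for name in sorted(home)]
```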

During model development, I compared a number of different algorithms. The one that performed best on the test data gives a higher weight to Pass Defense Rating (PDR) than to other metrics. While PDR does not completely explain model output (e.g., the Ravens rank below the 49ers and Patriots in this metric, yet are the top pick to win the championship), it does help explain why V 4.0 predicts more success for the Seahawks than the Saints within the NFC. Unlike other models, V 4.0 suggests that in the Wild Card Round this upcoming weekend, Seattle has a bigger advantage over Philadelphia than New Orleans does over Minnesota. This is mainly driven by Seattle’s presumed superiority in defending the pass. Since the Saints appear to have only a slim edge at home against the Vikings, when we play the simulations forward, the Seahawks’ larger first-round advantage leads to Seattle’s higher NFC champion rank according to V 4.0. The difference in PDR also contributes to why V 4.0 thinks the 49ers are slightly more likely than the Chiefs to win the Super Bowl. Furthermore, the gap between the Patriots’ league-leading PDR and the Titans’ mediocre score on this metric helps explain why V 4.0 favors New England to win their Wild Card Game more heavily than other models do. It is worth noting that these model differences may be partially due to model uncertainty and thus smaller than depicted above (more discussion in the Model Evaluation section below).
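To make the idea of one metric carrying more weight concrete, here is a minimal sketch using a logistic scoring function, a swapped-in stand-in since the best-performing algorithm isn’t named here. The weights, the `HOME_FIELD` intercept and the `other_metric` name are hypothetical; the point is only that, with a larger PDR weight, a given PDR gap moves the win probability more than the same gap in another metric.

```python
import math

# Hypothetical weights for illustration; V 4.0's actual algorithm and weights are not given.
WEIGHTS = {"pdr": 1.5, "other_metric": 0.5}
HOME_FIELD = -0.3  # negative intercept tilts P(away win) toward the home team


def p_away_wins(away, home):
    """P(away team wins): logistic function of weighted away-minus-home metric gaps."""
    z = HOME_FIELD + sum(w * (away[m] - home[m]) for m, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


even = {"pdr": 0.0, "other_metric": 0.0}
pdr_edge = {"pdr": 0.5, "other_metric": 0.0}
other_edge = {"pdr": 0.0, "other_metric": 0.5}
print(round(p_away_wins(even, even), 3))        # evenly matched: home team favored
print(round(p_away_wins(pdr_edge, even), 3))    # away team better at defending the pass
print(round(p_away_wins(other_edge, even), 3))  # same-sized gap in the other metric
```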

I mentioned above that the win prediction accuracy was 68.5%, but as data scientists know, it can be helpful to examine performance in several ways to better understand how to interpret model output. Compared to my previous modeling approaches, the larger test set allowed for a more robust evaluation of single-game predictions (calibration curve, ROC AUC, peak accuracy). The next section describes these evaluation metrics in more detail.

Model Evaluation

The 68.5% prediction accuracy gives V 4.0 some credibility, but the model is far from perfect. A closer examination of some additional evaluation metrics can help clarify its strengths and weaknesses. During model development, I framed the output as the probability that the away team would win a given game. Thus, for the more detailed evaluation metrics, a low probability (0–0.5) means the home team is favored, while a high probability (0.5–1) means the away team is favored.
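Under this away-team framing, a game counts as correctly predicted when the model picks the away team (probability above the cut point) and the away team wins, or picks the home team and the home team wins. A minimal sketch of that scoring rule; the function name and the tie rule (exactly 0.5 goes to the home team) are assumptions.

```python
def accuracy(p_away, away_won, cut=0.5):
    """Share of games called correctly; the away team is picked when p > cut."""
    picks_away = [p > cut for p in p_away]
    correct = sum(pick == won for pick, won in zip(picks_away, away_won))
    return correct / len(p_away)


print(accuracy([0.7, 0.3, 0.6, 0.2], [True, False, False, False]))  # 0.75
print(f"{63 / 92:.1%}")  # the article's test-set figure: 68.5%
```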

First, let’s take a look at the calibration curve, which helps answer questions such as: when the model predicted a 75% chance of winning, were 75% of those predictions actually correct? The calibration curve, which plots the actual ratio of correct predictions against binned predicted probabilities, suggests the model is indeed well calibrated.

Calibration Curve: generated by binning games with similar predicted probabilities

The red dashed line shows ideal calibration, and the black line shows the empirically derived calibration fit (the gray region depicts the 95% CI). Two different models can produce the same % chance of winning, but the uncertainty (i.e., the width of the error interval) for one approach might be much larger than for the other. This characteristic depends on the sample size and the variability observed. With this type of analysis, you can use the chart above to surmise that when V 4.0 produces an output of 75%, while that is the most likely value, the true probability could plausibly be anywhere from 63% to 88%. For simplicity, I’ve reported the most likely model output values throughout this article, but keep in mind that there is some uncertainty associated with those numbers.
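A calibration table can be built by grouping games into bins of predicted probability and comparing each bin’s average prediction with the observed outcome rate. The sketch below assumes equal-width bins on P(away win) and attaches a Wilson score interval to each bin as one simple way to show per-bin uncertainty; the article’s gray band comes from a fitted curve whose exact method isn’t specified, so this is an illustration rather than a reproduction. The function names are hypothetical.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion k/n."""
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)


def calibration_table(p_away, away_won, n_bins=5):
    """Rows of (mean predicted P(away win), observed away-win rate, interval, n) per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, won in zip(p_away, away_won):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, won))
    rows = []
    for games in bins:
        if not games:
            continue  # skip empty bins
        n = len(games)
        k = sum(won for _, won in games)
        mean_pred = sum(p for p, _ in games) / n
        rows.append((mean_pred, k / n, wilson_interval(k, n), n))
    return rows
```

With 92 test games and five bins, each bin holds fewer than 20 games on average, which is why per-bin intervals end up wide.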

I explored a few additional metrics. The test set ROC AUC was 0.74, which supports the conclusion that the model is better than chance, but imperfect. The peak accuracy is 72% at a cut point of 0.48, meaning that calling a few more close games for the away team than the default 0.5 cut point does would have improved accuracy. This suggests the model might be slightly overvaluing home-field advantage (though given the sample size, it’s difficult to conclude definitively that the model contains excessive home-field bias).
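Both metrics are straightforward to compute from test-set predictions. The sketch below uses the pairwise (Mann–Whitney) form of ROC AUC, the probability that a randomly chosen away-team win received a higher score than a randomly chosen home-team win, and scans cut points for peak accuracy. The function names and the 0.01 grid of cut points are assumptions.

```python
def roc_auc(p_away, away_won):
    """Pairwise AUC: P(score of a random away win > score of a random home win); ties count half."""
    pos = [p for p, won in zip(p_away, away_won) if won]
    neg = [p for p, won in zip(p_away, away_won) if not won]
    score = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return score / (len(pos) * len(neg))


def peak_accuracy(p_away, away_won, cuts=None):
    """Return (best accuracy, first cut point reaching it); away team picked when p > cut."""
    if cuts is None:
        cuts = [i / 100 for i in range(1, 100)]
    best_acc, best_cut = 0.0, None
    for cut in cuts:
        correct = sum((p > cut) == won for p, won in zip(p_away, away_won))
        acc = correct / len(p_away)
        if acc > best_acc:
            best_acc, best_cut = acc, cut
    return best_acc, best_cut
```

A best cut point slightly below 0.5 means the model would have scored better by calling some near-toss-up games for the away team, which is the basis for suspecting extra home-field weight.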

Caveats

A couple of the weaknesses of earlier models are again present in V 4.0. First, since it’s based on team-level metrics aggregated throughout the regular season, it ignores personnel changes. For example, the Texans’ defensive stats might be misleading, causing the model to underestimate their chances, given the return of three-time Defensive Player of the Year J. J. Watt, who missed the latter half of the season due to injury. On the other hand, the Seahawks’ rushing statistics don’t reflect the team’s current backfield of Marshawn Lynch, Travis Homer and Robert Turbin, who have combined for a grand total of 30 carries this season in relief of Seattle’s usual top three backs (Chris Carson, Rashaad Penny and C. J. Prosise), all of whom are out with injuries. Second, the model does not attempt to capture a recent “hot/cold streak”, but as I’ve described before, evidence suggests that “late-season momentum” may not have much impact on postseason success.

Importance of Playoff Seeding: What if the Patriots had fended off the Dolphins in week 17?

Last week, most NFL analysts confidently projected the perennial powerhouse Patriots to beat the lowly Dolphins and secure the No. 2 seed in the AFC, but the Dolphins surprised many with their victory. We know that first-round byes and home-field advantage provide a critical edge in the postseason, but just how impactful was the Dolphins’ shocking victory? I used V 4.0 to estimate the probabilities of playoff outcomes using exactly the same team statistics, with the simple modification of swapping the No. 2 and No. 3 seeds between the Patriots and Chiefs. The result: the Patriots’ chances would nearly double, from 11% to 20%, and the Chiefs’ odds would plummet from 28% to 13% (all other teams’ chances change by 2 percentage points or less). If the Chiefs manage to upset the Ravens in the AFC Championship Game and punch their ticket to Miami Gardens, they should drop off some gift baskets for Brian Flores and company, as the Dolphins’ week 17 upset more than doubled the Chiefs’ chances of finishing their season at Hard Rock Stadium.
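Part of why seeding matters so much is that a bye removes an entire game from a team’s path, and a higher seed can turn a road game into a home game. As a rough illustration, the chance of reaching the Super Bowl along a single fixed path is the product of the per-game win probabilities, assuming independent games and that the No. 1 seed also advances to host the title game. The probabilities below are hypothetical and chosen only to show the mechanism; they are not V 4.0’s estimates.

```python
from math import prod


def path_probability(game_win_probs):
    """Chance of winning every game on a fixed path, assuming independent games."""
    return prod(game_win_probs)


# Hypothetical per-game win probabilities for one team (not V 4.0 outputs).
as_no2_seed = path_probability([0.70, 0.45])        # bye, home divisional, road title game
as_no3_seed = path_probability([0.70, 0.45, 0.45])  # home wild card, road divisional, road title game
print(f"No. 2 seed: {as_no2_seed:.1%}, No. 3 seed: {as_no3_seed:.1%}")
```

In this toy example the extra road game cuts the team’s chances by more than half, the same pattern as the Chiefs’ drop from 28% to 13%, although the full simulation also accounts for every possible opponent.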

Low Certainty, High Excitement

For NFL fans, another fun postseason is around the corner. Personally, I’m thrilled about the most probable conference championship matchup: a showdown between the league’s top two quarterbacks, Lamar Jackson vs. Patrick Mahomes. According to V 4.0, though, there’s only a 47% chance that the No. 1 and No. 2 seeds will be dueling for the AFC title. In other words, it’s about as certain as correctly calling a coin flip. The fact that the ageless Tom Brady and the Patriots have better than a 1 in 4 chance of spoiling the meeting of last year’s and (likely) this year’s MVPs is what makes the NFL playoffs so much fun. Indeed, even the Titans’ odds of making the AFC Championship Game are just a tick below the odds of being dealt a pocket pair. On a similar note, while Baltimore fans are the most likely to be cheering in early February, there’s a 2 in 3 chance another city will be hosting a Super Bowl parade a month from now, so the other 11 fan bases shouldn’t give up just yet.

Written by Nasir Bhanpuri, PhD