
The Signal and the Nate. How the Longshot Effect Informs Assessment of Election Forecasts

Published on November 9, 2020

The Longshot Effect is well established in betting markets. Once we account for it, a reasonable ex-post assessment of Nate Silver's FiveThirtyEight Election model must conclude it performed substantially worse than a "randomly chosen" prediction market. Either that, or we face a strange situation where Nassim Taleb and Nate Silver will have to agree with each other. 

The "Best" Forecaster Versus the "Worst" Betting Market?

A previous post suggested that readers would be better off running their own presidential election simulation, rather than trusting an opaque one provided by Nate Silver and team at FiveThirtyEight (this prompted some discussion). Some of you disagreed, which is healthy, of course. I made an offhand, unsubstantiated claim in the comments: if you were to look at the performance of well-publicized election forecast models in the 2020 election, they would all perform worse than any of the betting markets. That is to say, you'd be better served by the worst betting market than by the best forecast model based on polls.

Here I'm going to test that statement, somewhat tongue in cheek. Truth be told, I wish I hadn't made the assertion because I wasn't organized enough to save the relevant data for them all. So instead, I'm going to flatter Nate Silver (and probably offend Professor Andrew Gelman and the rest of the crew at The Economist, to name one competing model) by assuming the FiveThirtyEight Election forecast model is the best we can find. I'm going to insult New Zealanders by suggesting that PredictIt is the worst prediction market, because I'm bitter about the rugby.

Those category rankings are in jest, and really it was a choice of convenience. FiveThirtyEight lists their pre-election probabilities on their GitHub repo, and PredictIt made a snapshot of theirs in the form of the election map you see below. Despite the arbitrary choice, I hope this provides you with a balanced assessment of the FiveThirtyEight model - to the extent that anything can be learned from one election. 

The Showdown

Let's get ready to rumble! The first thing we will do is examine Brier scores for swing states. The Brier score is simply the square of the difference between the outcome (marked as 0 or 1) and the probability that the model assigned. Here are the probabilities assigned by PredictIt patrons. The convention here is that the probability is shown for the most likely candidate. 

[Figure: PredictIt election map snapshot showing state-by-state probabilities]

So, for example, since PredictIt's probability of Trump winning Texas was 0.71, and Trump did indeed win, the contribution to the Brier score is:

\begin{equation} \left(0.71 - 1 \right)^2 = 0.0841 \end{equation}

We do the same calculation for each state, then average and take a square root. We also compute the negative log-likelihood, which, as the name suggests, is minus the logarithm of the probability of the outcome assigned by the model. So, for example, the score for Texas for PredictIt is:

\begin{equation} -\log_{10}( 0.71 ) = 0.1487 \end{equation}

On the other hand, had Biden won Texas, the score would have been:

\begin{equation} -\log_{10}( 1-0.71 ) = 0.5376 \end{equation}

For both Brier score and negative log-likelihood, lower is better.
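The Texas arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and the base-10 default is an assumption chosen because it reproduces the worked figures of 0.1487 and 0.5376 (passing `base=math.e` gives natural-log values instead).

```python
import math

def brier_contribution(p_win, won):
    """Squared difference between the assigned probability and the 0/1 outcome."""
    outcome = 1.0 if won else 0.0
    return (p_win - outcome) ** 2

def neg_log_likelihood(p_win, won, base=10.0):
    """Minus the log of the probability assigned to what actually happened.

    base=10 is an assumption that matches the worked Texas numbers.
    """
    p_outcome = p_win if won else 1.0 - p_win
    return -math.log(p_outcome, base)

def rms_brier(probs, outcomes):
    """Average the squared errors across states, then take the square root."""
    terms = [brier_contribution(p, o) for p, o in zip(probs, outcomes)]
    return math.sqrt(sum(terms) / len(terms))

# Texas: PredictIt had Trump at 0.71 and Trump won.
texas_brier = brier_contribution(0.71, True)
texas_nll = neg_log_likelihood(0.71, True)
# Counterfactual: the same price, had Biden won instead.
texas_nll_if_biden = neg_log_likelihood(0.71, False)
```

With a single state, `rms_brier([0.71], [True])` is just the absolute miss of 0.29; the averaging only matters once more states are added.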

Now, reader beware. For this analysis I read from this map and hand-copied the results into a notebook - a tremendously error-prone activity given my appalling lack of knowledge of United States geography. Out of laziness, I included only swing states at first, defined as those for which PredictIt's pre-election probabilities were between \(0.1\) and \(0.9\), then slightly enlarged the sample by including a few more red and a few more blue states with Trump winning probabilities nearer to \(0.95\) or \(0.05\). 

I was wary of going too far into the tails because when a platform charges five percent for withdrawals, it is tricky to interpret a price below five cents or above ninety-five cents. Given that the focus of this particular site seems to be politics, and some patrons might be in for the big event only, there would seem to be a high likelihood that the five percentage points were on bettors' minds. 

In any case here are the results. 

Metric                     PredictIt   FiveThirtyEight Forecast
Brier Score                0.3530      0.3547
Negative Log-Likelihood    0.4175      0.4345

And there you have it. PredictIt, a small election prediction market (next to Betfair, for example), has a lower Brier score and also a lower negative log-likelihood than Nate Silver's FiveThirtyEight Election model - though not by much. The FiveThirtyEight team should be congratulated for holding their own, in this sense, as it is very hard for any one model to compete with a market. 

Incidentally, if you are wondering why we use Brier scores (i.e. least squares), it is because the Brier score is a proper scoring rule: it gives the person supplying probabilities an incentive to report their honest assessment. Improper penalties, such as absolute error, encourage them to game the system instead - for example, by shifting their probabilities towards or away from 50%. 
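To see the incentive at work, here is a small sketch that compares the expected squared-error penalty with an absolute-error penalty. The absolute-error rule is my own choice of contrast, not something taken from the analysis above, and the honest belief of 0.7 is hypothetical.

```python
def expected_brier(report, true_p):
    """Expected squared error when the event really has probability true_p."""
    return true_p * (report - 1.0) ** 2 + (1.0 - true_p) * report ** 2

def expected_abs_error(report, true_p):
    """Expected absolute error, an improper rule shown only for contrast."""
    return true_p * abs(report - 1.0) + (1.0 - true_p) * abs(report)

def best_report(expected_penalty, true_p, steps=100):
    """Grid-search the report in [0, 1] that minimizes the expected penalty."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=lambda r: expected_penalty(r, true_p))

true_p = 0.7  # hypothetical honest belief of the forecaster
honest = best_report(expected_brier, true_p)      # lands on the honest belief
gamed = best_report(expected_abs_error, true_p)   # lands on an extreme report
```

Under squared error the best report coincides with the forecaster's belief, whereas under absolute error the forecaster does best by exaggerating all the way to certainty, which is the kind of gaming a proper scoring rule is meant to remove.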

That should be kept in mind. If you feel inclined to praise or beat up on Nate Silver, the statistically valid way to do either is a Brier score as this will incentivize him to do the right thing next time. Other kinds of assessments, such as counting the number of states picked correctly, are simply silly. Incidentally FiveThirtyEight uses Brier scores to assess NFL and other probabilistic predictions. It is entirely standard. (Nate are you listening? Everybody will judge you with a proper scoring rule next time. Don't go shrinking on us).  

But unfortunately, I still can't award FiveThirtyEight a draw with PredictIt this time around, because thus far we have been using an extremely naive interpretation of the betting market prices - directly interpreting them as probabilities without any attempt to use our noggins. It turns out that the minute one does anything sensible, the not-for-profit platform out of Wellington starts outperforming FiveThirtyEight, and quite substantially so. 

The Longshot Effect

Anyone who has spent any time near a betting market knows the single most persistent bias, and in many ways the most obvious, is the Longshot Effect. So named because it applies with great force to longshot horses unlikely to win, the Longshot Effect hypothesizes that the return on a wager falls as the odds against the horse winning grow larger. 

For example, it may be the case that a patron wagering one dollar on an even money favorite (+100 in U.S. notation) is expected to get back 95 cents on average. On the other hand, someone wagering one dollar on a horse paying 100/1 (i.e. +10000) might well be throwing away a third of their investment, on average. 
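The two wagers can be turned into implied win probabilities with a little arithmetic. This sketch assumes that "get back" means the average total payout per dollar staked, stake included, and that "throwing away a third" means an average payout of two thirds of a dollar; the function names are illustrative.

```python
def breakeven_probability(fractional_odds):
    """Win probability at which a bet at these odds neither gains nor loses on average."""
    return 1.0 / (fractional_odds + 1.0)

def implied_true_probability(fractional_odds, expected_return_per_dollar):
    """Win probability consistent with a given average payout per dollar staked."""
    payout_if_win = fractional_odds + 1.0  # winnings plus the returned stake
    return expected_return_per_dollar / payout_if_win

# Even-money favorite (+100) returning 95 cents on the dollar.
favorite_price = breakeven_probability(1.0)
favorite_true = implied_true_probability(1.0, 0.95)

# 100/1 longshot (+10000) returning two thirds of a dollar.
longshot_price = breakeven_probability(100.0)
longshot_true = implied_true_probability(100.0, 2.0 / 3.0)

# How much the quoted odds overstate each horse's chance.
favorite_overpricing = favorite_price / favorite_true
longshot_overpricing = longshot_price / longshot_true
```

Under those assumptions the favorite's odds overstate its chance by about five percent, while the longshot's odds overstate its chance by half. That asymmetry is the Longshot Effect in miniature.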

The Longshot Effect is found in just about every betting market, and some financial markets. It runs counter to the notion that markets underprice tail events - the reverse is generally true (which is why when I first read The Black Swan, in which Nassim Taleb proposes the opposite, I wondered if the book was sponsored by a bookmaking consortium). 


Much has been written about how to correct for the Longshot Effect, but I will content myself with a simple transformation of state winning probability: \begin{equation} p \mapsto \frac{p^\beta}{p^{\beta}+(1-p)^{\beta}} \end{equation} where \(\beta\) is a constant. Using \(\beta=2\) to illustrate, a betting market price of 75c would be reinterpreted as a probability \(p=0.9\). In general, for \(\beta \gt 1\) we are pushing probabilities away from \(p=0.5\) in the direction of \(p=0\) or \(p=1\).
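The transformation is a one-liner in code. A minimal sketch follows, with `longshot_adjust` being a name I made up for illustration:

```python
def longshot_adjust(p, beta):
    """Map a market price p to p^beta / (p^beta + (1 - p)^beta)."""
    num = p ** beta
    return num / (num + (1.0 - p) ** beta)

adjusted = longshot_adjust(0.75, 2.0)    # the 75c example with beta = 2
mirror = longshot_adjust(0.25, 2.0)      # the other side of the same contract
unchanged = longshot_adjust(0.75, 1.0)   # beta = 1 leaves the price as it is
flattened = longshot_adjust(0.75, 0.0)   # beta = 0 collapses every price to 0.5
```

The adjusted probabilities for the two sides of a contract still sum to one, so the correction can be applied to either candidate's price. Values of \(\beta\) below one pull prices towards 0.5 rather than away from it.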

I'm endeavoring to be fair to the FiveThirtyEight team and I am aware of the danger of ex-post selection of a power coefficient, since of course I know the election results. In general it would be invalid to say, "Here's a one parameter transformation of the betting markets and lo, it beats Nate Silver for some value of the parameter." 

Thing is, the Longshot Effect is such a mainstay that something needs to be done and there are some sensible ranges for \(\beta\). Every other prediction market suggests we choose a coefficient somewhere between \(\beta=1\) and \(\beta=2\). And the bad news for FiveThirtyEight is that no matter what we do this time, PredictIt beats them. You can see this by the way the yellow curve moves well below the blue one for any value of \(\beta \gt 1\), so it wouldn't have mattered if I'd engraved \(\beta=1.33\) on a rock prior to the election, or \(\beta=1.5\). Evidently the conclusion would have been the same.

[Figure: Brier score as a function of \(\beta\) for PredictIt and FiveThirtyEight]

We see that PredictIt's Brier score is lower for all remotely sensible choices of the exponent \(\beta\). I also plotted, for interest, the FiveThirtyEight model when the same transformation is applied. Ex-post, using \(\beta=1.4\) helps the FiveThirtyEight model, which is consistent with a criticism leveled by Professor Gelman - namely that FiveThirtyEight might be deliberately shrinking their estimates to avoid being seen as very wrong (the incentives for publishers of probabilities who do not have capital at stake are a little curious, as noted above - but I'd rather not get in the middle of that one). 
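A curve like the one described can be traced by sweeping \(\beta\) over a grid and recomputing the score each time. The sketch below uses HYPOTHETICAL prices and outcomes purely to show the mechanics. They are not the state data behind the numbers in this post, and the helper names are illustrative.

```python
import math

def longshot_adjust(p, beta):
    """Map a market price p to p^beta / (p^beta + (1 - p)^beta)."""
    num = p ** beta
    return num / (num + (1.0 - p) ** beta)

def rms_brier(probs, outcomes):
    """Square root of the mean squared difference between probability and outcome."""
    sq = [(p - o) ** 2 for p, o in zip(probs, outcomes)]
    return math.sqrt(sum(sq) / len(sq))

# HYPOTHETICAL prices and 0/1 outcomes, for illustration only.
market_prices = [0.71, 0.62, 0.35, 0.80, 0.55]
outcomes = [1, 1, 0, 1, 0]

# Sweep beta from 0.5 to 2.0 in steps of 0.1.
betas = [0.5 + 0.1 * i for i in range(16)]
curve = [
    (b, rms_brier([longshot_adjust(p, b) for p in market_prices], outcomes))
    for b in betas
]

# The beta with the lowest score on this made-up sample (an ex-post choice).
best_beta, best_score = min(curve, key=lambda pair: pair[1])
```

Picking `best_beta` after the fact is exactly the ex-post selection warned about earlier, so the fair comparison is the whole curve over a sensible range rather than its minimum.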

It seems that Nate Silver is losing out to PredictIt. But as I said above, don't hate on Nate for that. It is incredibly difficult for one modeler to provide better predictions than a market. 

Can Nassim Taleb's Black Swan Theory Defend Nate Silver?

It is interesting, though probably just coincidence, that the curves intersect fairly close to \(\beta=1.0\). To the left, FiveThirtyEight performs better than the prediction market (because the prediction market does poorly - though notice that it doesn't help the FiveThirtyEight model to take \(\beta<1\)). 

In my way of thinking there isn't a shred of economic evidence suggesting that anyone would be well advised to use \(\beta \lt 1\), but what would I know? Popular probabilist Nassim Taleb has raised a number of intellectual arguments that all suggest people underestimate the small probabilities \(p\) - and thus necessarily overestimate \(1-p\). A lot of halfway plausible philosophy suggests that the reasonable interpretation of PredictIt would use \(\beta \lt 1\), whereupon we might conclude that Nate Silver escapes with a draw or even a win. 

Since I haven't shown a negative log likelihood plot just yet, let's use that metric to make the point, instead of Brier score. 

[Figure: negative log-likelihood as a function of \(\beta\) for PredictIt and FiveThirtyEight]

Notice that the crossover occurs for a lower value of \(\beta\) than before (likelihood seems to be harsher on FiveThirtyEight than Brier score, though that is beside the point). The real message in this plot is that if you want to award a win to Nate Silver this time around, you need to agree with Nassim Taleb and his seemingly counter-empirical Black Swan theory. Nassim Taleb is quite critical of Nate Silver and vice-versa, so revel in that irony if you must. There is some coverage of the Taleb-Silver spat here for those who care - but ignore the author's statistical comments as they are utter gibberish made in ignorance of Brier scores.  

What's To Be Done? 

So there you have it. If you don't believe in the Longshot Effect, and you're instead sympathetic to reverse Longshot Theory (i.e. Black Swan Theory) you are free to conclude, with logic I personally consider rather dubious, that Nate Silver did better than the PredictIt prediction market. However fans of reverse Longshot Theory should probably assimilate (or ignore) yet another data point. 

Are you ready for it?

Look at PredictIt right now. Setting \(\beta=1\) (that is, reading the price directly as a probability) we see that Trump has nearly a ten percent chance of retaining the Presidency, and I write this after every news outlet has announced the result. If that isn't an example of the Longshot Effect, I'm not sure what is. If, to the contrary, you want to believe markets underprice the unexpected, I'd be curious to know what you think his probability of retaining the presidency is. One in five?!  

But regardless of your pre-existing affiliations to schools of thought in populist probabilism, or their leaders, I hope this post helps you interpret the performance of FiveThirtyEight in 2020. It is a small sample - just one election. I leave it to Bears kicker Cody Parkey to make the argument that if you flip Florida, then FiveThirtyEight looks much better than PredictIt. Since only half a dozen states are really in play, evaluating poll interpreters feels a bit like evaluating NFL kickers on the basis of one game. 

Taleb would agree, no doubt, but might also point out that Nate Silver's reputation for election forecasting was also built largely on just one election (his fan boys and girls have been fooled by randomness). I would prefer to make the argument that no single person or team can see every angle (like this one say, which I published prior to the election). It takes a village to compute probabilities well - which is to say a market. 

Taleb would also argue that having skin in the game is important. So the resolution is straightforward. No need for Twitter spats. If we really want to have good probabilities of election outcomes, the best thing might be for someone to just give Nate Silver a large pile of cash (I doubt he would complain) and have him move the election betting markets around just a little. That way Nate is going to have a lot of fun.  Nassim is happy too, because two out of three of his memes get emphasis (fooled by randomness and skin in the game). The third, Black Swan Theory, is going to take a hit, but let's face it, that one was a bit of a longshot to begin with. 

So I'm not concluding that Nate Silver is noise - merely that expecting him to substitute for prediction markets is asking too much of any hardworking statistician, however talented. I suspect a longitudinal study using an open sourced FiveThirtyEight model (how about it team?) would suggest features not fully priced in. But even then, I suspect we would see that more signal was coming through a New Zealand-based prediction market than through the FiveThirtyEight model. That's very fitting. New Zealand is the birthplace of the prediction market and, for that matter, real-time information processing. If you don't believe me, I mention the history here.

Reproduce  

If you would like to reproduce the numbers and plots in this article, see the notebook and please report bugs. Thanks to the FiveThirtyEight team for making their pre-election probabilities easily accessible. 
