Is Biden's Chance of Winning 90 percent or 97 percent? A Note on Implied Correlation in Election Markets

Published on November 5, 2020

Is Biden's chance of winning 90% or 97%? That's the question I asked myself at noon on Thursday, November 5th, as the election hung in the balance and votes trickled in. 

The Demise of "Celebrity" Election Models

This post is intended to help you interpret election betting markets. But why would you want to do that?

We look at betting markets because, to bastardize Winston Churchill, they are the worst system for estimating probabilities except for all the rest. Prominent examples of "the rest" include projections provided by celebrity data scientist Nate Silver (now incurring the wrath of Twitter), and the well-publicized model provided by The Economist. Those models rated Trump as a 9/1 or 19/1 outsider, roughly, whereas the markets rated Trump at 2/1.

Were they serious? Would the Economist risk nineteen years of subscription revenue to gain one? The departure from markets is so pronounced as to be a joke before the election, never mind afterwards, and it touches on a bigger philosophical problem. A model for probabilities should take advantage of all relevant information in the world, within reason, and some of that information just happens to take the form of betting markets themselves (as compared with polls, for example).

The best professional gamblers know this. They calibrate a combination of their own homework and the market. This is a crucial step. No modesty, no accuracy. In a past life I worked with a professional handicapper. I've never, ever assigned 19/1 against, post calibration, when the market says 2/1. That would represent utter incompetence and lead to certain ruin, if taken literally in the construction of an optimal portfolio of bets. 

The election markets thought the race would be tight - despite the polls indicating otherwise. Markets represent competitive prediction and, subject to local laws, nobody is blocked from making the predictions more accurate (nobody in the U.K., for example) at any time. Celebrity models, as we might disparagingly refer to them, don't have this property for a mix of pragmatic and personal reasons. Someone like myself might suggest to the Economist, a week prior to the election, that they were missing something important but that won't find its way into the model. 

That applies not only to outsiders but also, presumably, those who are directly on the modeling team. They require permission to improve the model. "Blocking" data science such as this is the norm, but it clearly introduces a single point of intellectual failure. The likelihood of a gentle suggestion along the lines of "95% is way too high and here's why ..." being "approved" and therefore influencing the outcome is quite small. It physically can't be included without a re-write of the generative model underneath. 

Markets don't have that problem. 

Markets react. They move as quickly as the first person to realize something is out of whack, not the last. Unfortunately, the highly defensive custodians of models aren't usually so quick. For example, a person in charge of the Economist's model flatly rejected the notion that events in the last few days (combined with similar failure last time around) justify a serious rethink. Instead, he responded to me in a patronizing tone and suggested that his model did better than the markets because it had Biden winning all along!

This is a spurious argument, of course, since any narrow victory by Biden could be used to justify any model ridiculously over-estimating Biden's pre-election day probability of victory. The Intermediate Value Theorem demands that a vast number of utterly terrible models will be right at least once - but it also politely requests we not abuse it in this fashion.

And then there is the incomparable Nate Silver, responding to critics of his 2020 performance:

If they’re coming after FiveThirtyEight, then the answer is f--k you, we did a good job.

The Reverend Bayes turns in his grave, watching Silver's lack of Bayesian updating. Thanks, Nate, for highlighting why markets are nimbler than ... whatever it is that you are doing and don't feel comfortable sharing with the rest of us. I'll be your first client when you take up bookmaking. In contrast, the Huffington Post had the decency to publish a mea culpa after missing badly last time around, and that was both heartfelt and informative.

But don't laugh too hard at the fall from grace of the celebrity pundits. If your own business depends critically on predictive modeling, you are likely subject to the whims of authoritarian data science too. It is unlikely that you have set up an apparatus that allows for seamless, ongoing improvement in predictive capability that is not subject to ego and territoriality. Most likely, your models are also subject to blocking, groupthink and the paucity of human pyramids as organizing principles for prediction. I hope that if you've made it to this site, you poke around and examine a different paradigm. 

Election Markets

I'm not suggesting that election betting markets, or other competitive prediction mechanisms, are perfect. Far from it. Some simple tools can help with scrutinizing them. This post provides the reader with a number - the implied correlation between state by state election markets - that might help them interpret, or assess the reasonableness of, current odds offered on Trump and Biden.

The point is that the election markets don't provide one answer right now. They provide two conflicting answers. The straight up markets for Biden and Trump winning suggest Trump has a ten percent chance of winning. However, if one looks at the probabilities that punters assign to candidates winning individual states, it is hard not to reach the conclusion that Biden's chance of winning the election is much higher than 90%, and Trump's chances correspondingly less than 10%.

The electoral college scenarios don't admit a trivial multiplication linking state to overall probabilities, because we are not, at the time of writing, in a position where Trump must win all remaining states. However it is easy, as I show in the notebook provided with this article, to run a quick simulation. As an illustration, I will read the following odds from the Betfair exchange. These are quoted in a convention that may be foreign to the American reader, but you can interpret them simply as the inverse of Biden's probability of winning. 

State            Biden odds (inverse probability)   Electoral votes
Arizona          1.31                               11
Michigan         1.03                               16
Pennsylvania     1.20                               20
Georgia          1.58                               16
Nevada           1.22                               6
North Carolina   4.00                               15

If you disagree with any of these numbers you can change them in the notebook. I'm not looking to express opinions here, just provide you with simple tooling to interpret what markets are saying about the election. I literally cannot type fast enough to keep these numbers up-to-date while writing this article, but you can change them any time you like and re-run the notebook. 
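
To make the odds convention concrete, here is a minimal sketch that turns the quoted odds into implied probabilities. It is not the notebook's code; the names `biden_odds` and `implied_probability` are mine, and the plain reciprocal ignores commission, bid-ask spread and any overround.

```python
# Decimal (Betfair-style) odds for Biden and electoral votes per state,
# copied from the table in the article.
biden_odds = {
    "Arizona": (1.31, 11),
    "Michigan": (1.03, 16),
    "Pennsylvania": (1.20, 20),
    "Georgia": (1.58, 16),
    "Nevada": (1.22, 6),
    "North Carolina": (4.00, 15),
}


def implied_probability(decimal_odds: float) -> float:
    """Naive implied probability: the reciprocal of the decimal odds."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1")
    return 1.0 / decimal_odds


# Biden's implied probability of carrying each state.
biden_prob = {
    state: implied_probability(odds) for state, (odds, _) in biden_odds.items()
}

for state, p in biden_prob.items():
    print(f"{state:15s} {p:.3f}")
```

So odds of 4 on North Carolina read as a 25% chance for Biden, and odds of 1.03 on Michigan as roughly 97%.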

Independence Day?

The notebook contains code for introducing correlation between state results. But first, you might wish to set the correlation to zero. Then the interpretation of the calculation is very straightforward. We are merely performing a Monte Carlo simulation where we independently determine who wins Arizona by a roll of a random number between 0 and 1. If it is less than 1/1.31, Biden gets that state. 
As you can verify for yourself, the independence assumption implies that people betting on outcomes in individual states - those listed above - collectively assign a much higher probability of a win to Biden than the market for the overall outcome. At the time of writing, Biden's probability of winning is close to 97% using this calculation, yet in theory you could have received a 15% return on your dollar invested (close to 14% after commission) should you reside in a jurisdiction where it is possible to make this wager. 
That was true when I began writing this article. Now, Biden has shortened and a hypothetical investor will only get 10%. Nothing in this post should be construed as investment advice.  
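
The independent simulation described above can be sketched in a few lines. Treat this as an illustration under stated assumptions rather than the notebook itself. In particular, the article never states how many electoral votes Biden already holds outside these six states, so `BIDEN_BASE_VOTES = 237` is my assumption, and you should change it if you disagree.

```python
import numpy as np

# (Biden implied probability, electoral votes), from the odds table above.
states = {
    "Arizona": (1 / 1.31, 11),
    "Michigan": (1 / 1.03, 16),
    "Pennsylvania": (1 / 1.20, 20),
    "Georgia": (1 / 1.58, 16),
    "Nevada": (1 / 1.22, 6),
    "North Carolina": (1 / 4.00, 15),
}

# ASSUMPTION (not stated in the article): electoral votes Biden already
# holds outside the six states listed above.
BIDEN_BASE_VOTES = 237
VOTES_TO_WIN = 270


def biden_win_probability_independent(n_sims: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Biden's win probability with independent states."""
    rng = np.random.default_rng(seed)
    probs = np.array([p for p, _ in states.values()])
    votes = np.array([v for _, v in states.values()])
    # One uniform draw per state per simulation; Biden takes the state
    # when the draw falls below his implied probability.
    biden_wins_state = rng.random((n_sims, len(probs))) < probs
    biden_total = BIDEN_BASE_VOTES + (biden_wins_state * votes).sum(axis=1)
    return float(np.mean(biden_total >= VOTES_TO_WIN))


p_independent = biden_win_probability_independent()
print(f"Biden win probability under independence: {p_independent:.3f}")
```

With that assumed baseline the estimate should land in the neighborhood of the 97% figure quoted above. A different baseline or fresher odds will move it.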

Risk-Neutral Correlation

The next small step is the introduction of correlation in state results. There are two intents here. The first is to capture true correlation that may exist even in this late stage of the race. For example, just to speculate, there might be commonality introduced by military voting patterns or legal moves to come. However, correlation can also be used to account for consistent bias across state markets by those who wager. 
Given that we see markets for both the overall final result and also state by state, a related use of the calculation I provide you is the computation of implied correlation. By tweaking the single correlation parameter, you can construct a coherent market-implied model for the dying stages of the election - one that matches both the state and overall markets. Then, you can decide for yourself if the magnitude of this correlation is reasonable or not. 
As you can read from the notebook, the precise definition of correlation here refers not to the binary variable representing state outcomes but rather to a set of fictional auxiliary normal random variables which are used internally to trigger a win for Biden or Trump in each state. This goes by the name Normal Copula if you are interested. There are plenty of alternatives but I choose one here to make the point and I think that suffices. (Thanks Mathworks, makers of Matlab, for this picture).  
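
For readers who want to see the mechanics, here is one way a Normal Copula can be sketched. I am assuming a one-factor, equicorrelated construction in which every pair of auxiliary normals shares the same correlation `rho`. The notebook may be built differently, and the baseline of 237 electoral votes is again my assumption, not a figure from the article.

```python
from statistics import NormalDist

import numpy as np

# (Biden implied probability, electoral votes), from the odds table above.
states = {
    "Arizona": (1 / 1.31, 11),
    "Michigan": (1 / 1.03, 16),
    "Pennsylvania": (1 / 1.20, 20),
    "Georgia": (1 / 1.58, 16),
    "Nevada": (1 / 1.22, 6),
    "North Carolina": (1 / 4.00, 15),
}

BIDEN_BASE_VOTES = 237  # ASSUMPTION: not stated in the article
VOTES_TO_WIN = 270


def biden_win_probability(rho: float, n_sims: int = 200_000, seed: int = 0) -> float:
    """Biden's overall win probability under an equicorrelated normal copula.

    rho is the pairwise correlation between the auxiliary normal variables,
    not between the binary state outcomes themselves.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this one-factor construction")
    rng = np.random.default_rng(seed)
    probs = np.array([p for p, _ in states.values()])
    votes = np.array([v for _, v in states.values()])
    # Thresholds chosen so that each state's marginal probability is preserved.
    thresholds = np.array([NormalDist().inv_cdf(p) for p in probs])
    common = rng.standard_normal((n_sims, 1))         # factor shared by all states
    idio = rng.standard_normal((n_sims, len(probs)))  # state-specific noise
    latent = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * idio
    biden_wins_state = latent < thresholds
    biden_total = BIDEN_BASE_VOTES + (biden_wins_state * votes).sum(axis=1)
    return float(np.mean(biden_total >= VOTES_TO_WIN))


for rho in (0.0, 0.25, 0.5, 0.75):
    print(f"rho={rho:.2f}  P(Biden wins)={biden_win_probability(rho):.3f}")
```

Setting `rho` to zero recovers the independent simulation. Raising it makes the state results move together, which fattens the tail in which Biden loses several states at once.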

Trump Trades at ~50% State Correlation!

Now the punchline. Achieving coherence between state and overall markets currently requires us to use a correlation number close to 50%! This is staggeringly high, when you consider that vote counts are taking place in far flung counties. Play with the notebook and you'll see for yourself. 
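
Backing out the implied correlation is a one-dimensional root search. The sketch below uses bisection on the same assumed one-factor normal copula, with the 90% overall-market figure quoted earlier as the target. The baseline of 237 electoral votes and the function names are my assumptions. The number it prints depends on those assumptions and on the odds you feed it, so do not expect it to reproduce the figure above exactly.

```python
from statistics import NormalDist

import numpy as np

# (Biden implied probability, electoral votes), from the odds table above.
states = {
    "Arizona": (1 / 1.31, 11),
    "Michigan": (1 / 1.03, 16),
    "Pennsylvania": (1 / 1.20, 20),
    "Georgia": (1 / 1.58, 16),
    "Nevada": (1 / 1.22, 6),
    "North Carolina": (1 / 4.00, 15),
}

BIDEN_BASE_VOTES = 237  # ASSUMPTION: not stated in the article
VOTES_TO_WIN = 270
TARGET = 0.90           # Biden's probability in the overall market, per the article

probs = np.array([p for p, _ in states.values()])
votes = np.array([v for _, v in states.values()])
thresholds = np.array([NormalDist().inv_cdf(p) for p in probs])

# Common random numbers: the same draws are reused for every rho, so that
# comparisons across rho are not swamped by sampling noise.
N_SIMS = 200_000
rng = np.random.default_rng(0)
common = rng.standard_normal((N_SIMS, 1))
idio = rng.standard_normal((N_SIMS, len(probs)))


def win_probability(rho: float) -> float:
    """Biden's simulated win probability for a given latent correlation."""
    latent = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * idio
    total = BIDEN_BASE_VOTES + ((latent < thresholds) * votes).sum(axis=1)
    return float(np.mean(total >= VOTES_TO_WIN))


def implied_correlation(target: float, lo: float = 0.0, hi: float = 1.0,
                        tol: float = 1e-3) -> float:
    """Bisection for rho, assuming win probability falls as rho rises."""
    if not win_probability(hi) <= target <= win_probability(lo):
        raise ValueError("target is outside the range this model can reach")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if win_probability(mid) > target:
            lo = mid  # model still too confident in Biden: add correlation
        else:
            hi = mid
    return 0.5 * (lo + hi)


rho_star = implied_correlation(TARGET)
print(f"implied correlation: {rho_star:.2f}, "
      f"P(Biden wins) at that rho: {win_probability(rho_star):.3f}")
```
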
Prior to the election, implied correlation might be more than reasonable given the paucity of polling - a correlation enables us to factor in the possibility of systemically mis-calibrating Trump support. However, at this late stage, the surprise value of the election has already been consumed. Why does the market imply such a high number?
One explanation: laziness. Even if all three subscribers to my blog who live in the U.K. took the notebook and used it, that might not be enough to push the market around.
A different explanation posits a heterogeneous beliefs model, as they are sometimes called in finance. Here we suppose that the market reaches equilibrium when two or more communities of investors, with consistently different views, clear the market. Put simply, people who are inclined to back Trump to win in Arizona might also view his odds of winning Pennsylvania attractive. This effect won't show up in the state markets. 
However, if those same punters participate in the overall market, and they each run their own simulations, then the Trump backers will put more mass on Trump winning and the Biden backers less. These effects won't cancel out, though, because of the multiplicative effect (Trump's probability of winning is a multiple of state probabilities - or if we are more correct, a linear combination of multiplied conditional state probabilities). The Trump backers will add more mass than the Biden backers remove - relative to a model in which there is no such heterogeneity. 
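
A toy calculation shows why the effects don't cancel. Every number here is hypothetical, and the setup is deliberately stylized: Trump needs both of two states, each group treats the states as independent, the groups are the same size, and every market clears at the average belief.

```python
# Hypothetical beliefs about Trump's chance in EACH of two states.
trump_backers_state_prob = 0.30
biden_backers_state_prob = 0.10

# ASSUMPTION: state markets clear at the average of the two groups' beliefs.
market_state_prob = 0.5 * (trump_backers_state_prob + biden_backers_state_prob)

# One model fed the state-market prices: Trump must win both states.
homogeneous_overall = market_state_prob ** 2

# Each group runs its own simulation with its own beliefs, and the overall
# market clears at the average of the two answers.
heterogeneous_overall = 0.5 * (
    trump_backers_state_prob ** 2 + biden_backers_state_prob ** 2
)

print(f"overall Trump probability from state prices:  {homogeneous_overall:.3f}")
print(f"overall Trump probability with two camps:     {heterogeneous_overall:.3f}")
```

The state prices are identical in both cases, yet the two-camp overall market gives Trump 0.050 rather than 0.040. A single model reading the state prices would interpret that gap as positive correlation.
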
I'm sure there are plenty of other possibilities. I'll leave you with the riddle. As I write, the implied correlation is already creeping up to 70% which, I feel, demands some kind of explanation. So please modify, critique, and (especially) debug the notebook. Perhaps I'm missing something.

Can the Discrepancy be Exploited?

There are some who would say that very high implied correlation can't be exploited (the market isn't complete), but that's way too dogmatic. There may certainly be an opportunity here whose risk-return profile is, though not a riskless play, far more favorable than anything you are likely to come across any time soon. 
Furthermore, we see a situation where state by state election results trickle in at different times, and where it is possible to reasonably predict which state is going to move first. In that stylized scenario, it really isn't hard to construct approximate hedges for an overall long Biden position.
But again, nothing in this article should be construed as investment advice. I would prefer that you contemplate the fundamental differences between ongoing competitive prediction and in-house data science. I hope that you and your business dispense with the "pick some guy's model" approach, and join us instead in the construction of an open source, open prediction network where nobody is in charge.