This post continues a series intended to lay out what may and may not be concluded from the recent election, based on hard data. As much as possible, this series will attempt to be entirely non-partisan, simply laying out as accurate an explanation of the data as possible. As the vigorous debates about polling indicate, voters as a whole, and Christians in particular, struggle to examine data apart from partisan passions. Doing so is a tremendous boon to understanding the world as it is, and that itself is key to understanding how the world must change to become what it should be.
We all know it, and it’s so obvious that it’s shocking to question it: the polls leading up to November’s election were badly wrong. Those who’d called out “media bias” in the polls cited the fact as the ultimate vindication of their accusation that the polls were “oversampling” Democrats, and 2016 was added to the list of years the pollsters just missed, part of a burgeoning narrative that polls are always unreliable and manipulative.
The problem with that narrative is that it is not just false, but ridiculous. Shocking as it sounds, national polls this year were somewhat better than average, and more than twice as accurate as they were in 2012. Even state polls were generally accurate, although with a few notable–but perfectly explicable–misses.
That almost certainly sounds impossible, given the narrative some (particularly Trump’s supporters) have been pushing, and yet it’s the truth. The RealClearPolitics polling average predicted that Clinton would win the national popular vote by 3.2 points; she actually won by 2.1 points. That error of 1.1 percentage points is well below the historical average, and well within the margin of error of those polls. (Whether it was within the uncertainty of the average hinges on whether you treat individual polls as independent measurements, but there’s no reason to do this: generally speaking, treating polling errors as correlated, as RCP does, produces a more reliable result, and doing so indicates that the final outcome was also within the uncertainty of the average.)
For comparison, since 1968 the polls have varied from a 0.1-point error (1992) to a 7.2-point error (1980). The average error in national polls is 2.0 percentage points, 0.9 points more than this year’s error. In the last presidential election, in 2012, the national polls underestimated Obama’s vote total by 2.7 percentage points–still within their margin of error, but significantly worse than the polls this year.
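The arithmetic behind that comparison is simple enough to check directly. The short sketch below uses only the figures quoted in this post; the variable names are mine, chosen for illustration.

```python
# Figures quoted in the post, all in percentage points.
predicted_margin_2016 = 3.2   # RealClearPolitics average: Clinton's predicted lead
actual_margin_2016 = 2.1      # Clinton's actual popular-vote lead
error_2012 = 2.7              # national polling error in 2012
historical_avg_error = 2.0    # average national polling error since 1968

# Size of this year's miss, regardless of direction.
error_2016 = round(abs(predicted_margin_2016 - actual_margin_2016), 1)

# How much smaller this year's miss was than the historical average.
better_than_average_by = round(historical_avg_error - error_2016, 1)

# How many times larger the 2012 miss was than this year's.
ratio_vs_2012 = round(error_2012 / error_2016, 2)

print(error_2016, better_than_average_by, ratio_vs_2012)
```

Rounding is applied only to keep floating-point noise out of the printed values.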
Thus, not only is it wrong to say that the national polls were wrong when they predicted the outcome within their confidence limits, it’s absurd, because it requires finding fault with one of the more accurate polling outcomes in US presidential history. Doing so is a sure sign that the speaker doesn’t understand how polls or confidence limits work.
There’s a somewhat more reasonable argument on the basis of state polls. As a matter of fact, the state polls missed by the largest average margin in decades. Evidence of manipulation? Hardly. In fact, only one state polling average (Utah) missed the final margin by more than its stated uncertainty. You see, polling, like any science, doesn’t produce a single number; it produces a number with confidence limits. The actual outcome may differ markedly from the single number you commonly hear quoted, but if it’s within the poll’s confidence limits, the poll still got it right. Unfortunately, this kind of uncertainty rarely gets communicated to the general public, so it’s worth having a look.
Predicted and actual Trump margin for the competitive states.
What you see is that although polls usually underestimated Trump’s lead, only one–Utah–actually missed the final margin by a significant amount. Utah, of course, was an extremely volatile polling environment because of the various third-party candidates surging and receding there; arguably the polls accurately captured Trump’s surge from down by 4 to leading by 10, and projecting that trend through to Election Day would have given a much better match (as it was, the polls stopped five days before the election). The average of polls that exclude third-party candidates, interestingly enough, does capture the eventual outcome within uncertainties.
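The hit-or-miss test being applied to each state here can be sketched in a few lines. The state names and numbers below are hypothetical placeholders, not the actual polling averages, and the function name is my own; the point is only to show that a poll "hits" when the result lands inside its confidence limits, even if the headline number is off.

```python
def within_limits(predicted, actual, moe):
    """Return True if the actual result falls inside predicted +/- moe."""
    return abs(actual - predicted) <= moe

# Hypothetical (predicted margin, actual margin, margin of error), in points.
polls = {
    "State A": (2.0, 4.5, 3.5),   # off by 2.5, inside a 3.5-point band
    "State B": (-1.0, 6.0, 4.0),  # off by 7.0, outside a 4.0-point band
}

results = {state: within_limits(*values) for state, values in polls.items()}

for state, hit in results.items():
    print(state, "within limits" if hit else "missed")
```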
Looking at the margin alone tends to give the state polls a little too much credit, though. The error on the margin between two candidates is roughly twice the error on the total for any one of the candidates, since it depends on the accuracy of the totals for both candidates. Because in the states where Trump overperformed the error came almost entirely in his share (that is, the polls predicted Clinton’s correctly, but underestimated Trump’s), several polls predicted both the margin and Clinton’s total within their confidence intervals, but missed Trump’s share.
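To see where the factor of two comes from, here is a minimal sketch under textbook assumptions: a simple random sample, a 95% confidence level, and a hypothetical sample size of 800. Real polls use weighting and turnout models that this ignores, and the function names are mine. It uses the standard multinomial result that the variance of the difference between two shares from the same sample is (p1 + p2 - (p1 - p2)^2) / n.

```python
import math

def share_moe(p, n, z=1.96):
    """Margin of error on one candidate's share p (a fraction), sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

def margin_moe(p1, p2, n, z=1.96):
    """Margin of error on the lead p1 - p2 when both shares come from one sample."""
    variance = (p1 + p2 - (p1 - p2) ** 2) / n
    return z * math.sqrt(variance)

n = 800  # hypothetical sample size, for illustration only

# Pure two-way race, 50-50: the error on the lead is exactly twice the error on a share.
two_way = margin_moe(0.50, 0.50, n) / share_moe(0.50, n)

# 45-45 with 10% going elsewhere: the ratio dips a little below two.
with_third_party = margin_moe(0.45, 0.45, n) / share_moe(0.45, n)

print(round(two_way, 3), round(with_third_party, 3))
```

The ratio is exactly two only when the two candidates split the whole vote; the more of the vote that goes to third parties, the further below two it falls, which is why "roughly twice" is the safer statement.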
Predicted and actual Trump vote percentage in competitive states where the polls missed.
Here, again, Utah would be removed from the list entirely if we considered only the two-way race polls (this, combined with the fact that Trump’s overperformances did not come at Clinton’s expense, hints at the reason the state polls were consistently off . . . but more on that below). There were only six competitive states in which the polls actually missed, and in all six they did so by correctly predicting Clinton’s percentage but underestimating–in some cases dramatically–Trump’s.
It’s pretty easy to figure out why, and it should have been obvious to those studying the election in advance. Trump, more than Clinton, was hurt by disunion within his party. It makes sense, after all–up until he started running for president as a Republican, his positions had been quite liberal, and Republicans, who claimed to be the party of conservatism and family values, were understandably reluctant to support a liberal who bragged about adultery. However, as the election dragged on and the spectre of a Clinton presidency became more and more real (with ample help from the Republican establishment) those reluctant Republican voters started to swing back into Trump’s camp. In Utah this effect is, in my opinion, very obvious: in a poll ending on 19 October, the day of the third debate, Evan McMullin led by 4 points. That night Clinton very publicly reaffirmed her support for murdering unborn children up until the moment of birth, and Trump led every Utah poll after. The effect was less obvious in other states, but in any case, Trump won because many of those who polled as conservative third-party voters decided to vote Republican when it came down to it.
This isn’t unique to this election, either. In general, polls that include third-party candidates are considered less reliable than those that only include major-party candidates because third-party voting is so volatile and unpredictable. It’s difficult to account for in modeling turnout, and errors in models, not manipulation or “oversampling” (that bit of ignorance warrants its own post, but all in good time), account for most polling error.
In summary, the national polls and most state polls were correct. In all but one competitive state the polling average correctly predicted the margin, but polls in a handful of competitive states really did fail to predict Trump’s share (but not Clinton’s), and those misses are likely a result of a late shift of third-party voters toward Trump. There’s no grand conspiracy or proof that polls don’t work, merely a set of circumstances that led to the actual errors being almost exclusively fortuitous for one candidate.
It’s simply foolish and ignorant to dismiss all polling out of hand, and far from being evidence against the existence of reliable polling, this election is yet another point in its favor.