Between August and November last year, the Times-Picayune and Lucid collaborated to develop a daily tracking poll to estimate what might happen on November 8th. Featured in The New York Times, HuffPost Pollster and FiveThirtyEight, the tracker was consistent with national polls and poll-based forecasts, only to be proven inaccurate when America woke up on November 9th and realized that Donald Trump would, in fact, be the next President of the United States.
Saying the American polling industry is disappointed in its polling outcomes would be a huge understatement, given that they all predicted a Clinton win. At Lucid, we also showed a national election victory for Hillary Clinton, but did not parse our data to show Trump’s path to electoral college victory. In hindsight, we believe our predictions were wrong in part because we created a model with certain limitations and weighted our data nationally instead of factoring in state-level demographic information.
We have decided to remodel our analysis because we believe our platform is uniquely positioned to provide solutions as the entire polling industry confronts declining telephone response rates. Our marketplace diversifies where respondents come from and blinds the subject matter of the survey, which we believe reduces bias relative to single sources. We have now run a more sophisticated model, which allows us to share a picture of the race that was much closer to what actually occurred.
To provide some background before we explore our original and reconstructed models: it is common for polls to miss the result in elections. Polls are probabilistic in nature and do not necessarily predict the final margin in an election. Clinton did not lose the popular vote, but neither did she win it by as many points as the polls predicted. Our original polls were off by 3 percentage points, which is not an unusually large miss considering past election predictions. In this election, however, the race was close enough that even 2 points was consequential.
Since the election, polling organizations and academics have gone back to their models and tried to uncover exactly why polls missed the ultimate result. Natalie Jackson at Huffington Post suggests that the model structure is not the problem; the data going into it is. Polls in every state are vulnerable to systematic errors: supporters of one candidate can respond more enthusiastically than supporters of the opponent. It is unlikely, though, that most polls at the national, state and individual level would all miss in the same direction, which would indicate, conveniently, that pollsters everywhere were struggling with the same challenge.
Harry Enten and Nate Silver at FiveThirtyEight have each weighed in on the different factors that contributed to election predictions, dismissing the ‘shy voter’ theory and pointing out how “national journalists may have interpreted conflicting and contradicting information as confirming their prior belief that Clinton would win”. None of these explanations, however, has accounted for the construction of the model itself, something we will address in subsequent sections of this postmortem.
As a result, polls might have underestimated the support Trump drew from whites without college degrees in swing states and understated his margin by 4 points or more in Midwestern states. Various “fundamentals” models misread the high partisanship (where party allegiance has widened the gap between the left and right) that existed during the time of the campaign (putting Clinton at a disadvantage), or the declining turnout among black voters.
At Lucid, we have taken time to revisit our model and consider what went wrong and what can be done differently.
Our postmortem approach is different in the sense that, in addition to questioning the bias present in the data itself, we also questioned whether implementing a more rigorous model would have yielded different outcomes. Unlike reader polls, Lucid’s polling platform finds survey-takers through an online marketplace that diversifies where respondents come from and blinds the subject matter of the survey that each respondent will take.
In our original model, we took each respondent’s demographic information into account, weighting responses nationally so that the sample would be representative of the population. Online polls remain relatively unproven in their application to political projections, though they hold great potential, since telephone polls conducted through probability sampling are expensive and their claims of representativeness have come into question. Online polls are, however, prone to some of the same biases as random digit dialing: the respondent can opt out of the survey at any point. Online surveys offer more robustness and speed, and have better reach among the biggest generation in U.S. history: millennials.
For our daily tracking poll, we took a daily national sample of at least 400 adults who reported both being registered to vote and likely to vote this past November. All respondents on the Lucid platform have volunteered to participate in a survey and were not chosen for participation in the Presidential tracker through random selection. No sampling error can therefore be calculated. All data was weighted nationally.
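The national weighting described above can be illustrated with a minimal sketch. It assumes simple cell weighting, in which each respondent is weighted by the ratio of their demographic cell’s share of the population to its share of the sample. The cell labels, population shares and responses below are hypothetical, and this is not necessarily the exact weighting procedure used for the tracker.

```python
from collections import Counter


def national_weights(respondent_cells, population_share):
    """Return one weight per respondent: population share / sample share of their cell."""
    n = len(respondent_cells)
    sample_counts = Counter(respondent_cells)
    return [
        population_share[cell] / (sample_counts[cell] / n)
        for cell in respondent_cells
    ]


def weighted_share(values, weights):
    """Weighted mean of 0/1 responses."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: three younger respondents, one older respondent.
cells = ["young", "young", "young", "old"]
supports_candidate = [1, 1, 0, 0]  # 1 = supports the candidate

# Hypothetical national population shares for each cell.
population_share = {"young": 0.5, "old": 0.5}

weights = national_weights(cells, population_share)
estimate = weighted_share(supports_candidate, weights)
unweighted = sum(supports_candidate) / len(supports_candidate)

print(f"unweighted: {unweighted:.3f}, weighted: {estimate:.3f}")
```

Because the over-represented "young" cell is weighted down, the weighted estimate moves away from the raw sample average. A single set of national weights like this cannot correct a sample that is unrepresentative within a particular state.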
By polling estimates at the time, our daily tracker performed well, conservatively estimating Clinton’s lead and the close race, for example, following the presidential debate on September 26. Our margins were off by +/- 3 percentage points, which, again, is not uncommon. One shortcoming of our model was that we took economic behavior and demographic information into consideration only at the national level. In reality, state-level differences played out more significantly than expected. We should have factored these additional, state-level demographic variables into our model based on the available data to build a more ‘accurate’ estimate, or, at the least, to identify the gaps in our model.
Therefore, with the benefit of hindsight, we decided to implement a multilevel regression with poststratification (MRP) model on our data. MRP is an increasingly popular approach among political scientists, with evidence pointing towards more accurate estimates, especially for presidential voting. MRP strongly outperforms disaggregation when working with small and medium-sized samples, and corrects for clustering and other statistical issues that may arise from disaggregation. Instead of weighting data nationally as we had done earlier, we decided to apply state-level predictors and subsequent weights, essentially treating opinion as a function of gender, ethnicity, education, income and age.
We used the polling data (71,199 unique respondents) that we collected last year and Census data for cross-tabulating demographic indicators. We also collected state-level predictors, such as aggregate demographic information, in order to reduce unexplained group-level variation and hence the group-level standard deviation. With the data, we created index variables for use in the individual-level model and in the poststratification, namely race-gender, age-education and gender-age.
In implementing the model, we treated each individual’s response as a function of his or her demographics and state (for individual i, with indexes j, k, l and m for race-gender combination, age, education and state respectively, and with the age-education interaction indexed by k and l together):
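The equation itself is not reproduced in this text. For orientation, an individual-level MRP model with these groupings is conventionally written as below. The symbols (the intercept, the group effects and their variances) follow common MRP notation and are assumptions here, so this is a sketch of the general form rather than the exact specification we fitted.

```latex
\Pr(y_i = 1) = \operatorname{logit}^{-1}\!\left(
    \beta^{0}
  + \alpha^{\text{race,gender}}_{j[i]}
  + \alpha^{\text{age}}_{k[i]}
  + \alpha^{\text{edu}}_{l[i]}
  + \alpha^{\text{age,edu}}_{k[i],\,l[i]}
  + \alpha^{\text{state}}_{m[i]}
\right)

% Each set of group effects is typically given a normal prior, for example:
\alpha^{\text{age}}_{k} \sim \mathcal{N}\!\left(0, \sigma^{2}_{\text{age}}\right)
```

Here y_i = 1 denotes a respondent who supports Trump. In this general form, the state effects are usually centered on a function of the state-level predictors rather than on zero.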
The terms after the intercept are modeled effects for the various groups of respondents. The logistic regression above gives the probability that any voter will support Trump, given the person’s age, race, gender, education and state. We then computed weighted averages of these probabilities to estimate the proportion of individuals who would vote for Trump in each state. Finally, we calculated the average response by weighting by the actual population frequency in each state.
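The poststratification step described above can be sketched as follows. The sketch assumes the group effects have already been estimated. All effect values, category labels and cell counts are hypothetical placeholders, not our fitted estimates or actual Census figures.

```python
import math


def inv_logit(x):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def cell_probability(intercept, effects, cell):
    """Predicted support for one demographic-by-state cell."""
    eta = (
        intercept
        + effects["race_gender"][cell["race_gender"]]
        + effects["age"][cell["age"]]
        + effects["edu"][cell["edu"]]
        + effects["age_edu"][(cell["age"], cell["edu"])]
        + effects["state"][cell["state"]]
    )
    return inv_logit(eta)


def poststratify(cells, intercept, effects):
    """Weight cell probabilities by population counts.

    Returns (estimate per state, national estimate).
    """
    supporters, population = {}, {}
    for cell in cells:
        p = cell_probability(intercept, effects, cell)
        s = cell["state"]
        supporters[s] = supporters.get(s, 0.0) + p * cell["count"]
        population[s] = population.get(s, 0.0) + cell["count"]
    by_state = {s: supporters[s] / population[s] for s in supporters}
    national = sum(supporters.values()) / sum(population.values())
    return by_state, national


# Hypothetical effects: only education shifts support in this toy example.
effects = {
    "race_gender": {"white_male": 0.0},
    "age": {"18-29": 0.0},
    "edu": {"college": 0.0, "no_college": math.log(3)},
    "age_edu": {("18-29", "college"): 0.0, ("18-29", "no_college"): 0.0},
    "state": {"A": 0.0, "B": 0.0},
}

# Hypothetical poststratification cells with population counts.
cells = [
    {"race_gender": "white_male", "age": "18-29", "edu": "college", "state": "A", "count": 100},
    {"race_gender": "white_male", "age": "18-29", "edu": "no_college", "state": "A", "count": 300},
    {"race_gender": "white_male", "age": "18-29", "edu": "college", "state": "B", "count": 200},
]

by_state, national = poststratify(cells, intercept=0.0, effects=effects)
print(by_state, national)
```

Because each state’s estimate is built from its own population counts, two states with identical modeled effects can still receive different estimates when their demographic composition differs. A single set of national weights cannot capture this.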
Although Clinton won the popular vote at the national level by a smaller margin than pollsters predicted, our reconstructed model produces a more robust estimate of the distribution of the popular vote between the two leading candidates. Our state-level results show Trump leading in 27 states. In reality, he won 30. The ones we missed were the biggest shocks to the public: Michigan, Pennsylvania and Wisconsin. There are several theories on why we (and everyone else) missed those states by the largest margins. Huffington Post, for instance, attributes it to the decision to use all-landline automated polls, which could have mostly reached older people who still use landlines. FiveThirtyEight suggested that models underestimated the extent to which polling errors were correlated from state to state (if Clinton underperforms in Wisconsin, she will underperform in demographically similar states). Our model’s estimates for Michigan, Wisconsin and Pennsylvania overestimated the support for Clinton among black Americans at the state level. In Michigan, the model failed to account for urban and rural demographic differentiators, and the randomized online exchange primarily enrolled white, urban millennials into the final poll.
By comparison, our model performs much better in Iowa, Ohio and Florida, estimating Trump leading by a small margin, which many pollsters were unable to predict. There was a more even spread of demographic differentiators in these states in our data, which we believe explains why we were better at predicting their outcomes in the reconstructed model. Another explanation could be that state-level polling errors were very similar in IA, OH and FL; by accounting for this in our model, we were able to obtain more accurate results than other models had.
In reconstructing the model we applied to predict election outcomes, we recognize the inherent biases that may come from building a prediction model after an event has occurred, but we considered revisiting our polling methodology a mandatory exercise in light of the current political climate and the existential crises faced by pollsters and researchers. Prediction models are probabilistic and built on historical trends or sampling methods. They are therefore not meant to pin down the final margin of an election exactly, but to provide the closest possible estimate. Treating individual opinion as a result of each respondent’s demographic composition and state produced a more aggressive (and more accurate) indication of the distribution of the American political landscape.
This was Lucid’s first attempt to leverage its platform to predict a political outcome. As a means of expanding our analytical experience, we are absolutely thrilled with our initial results. Online surveys are increasingly becoming as accurate as traditional random digit dialing, and in fact better, because of their ability to react quickly and iterate accordingly. Our intent is to apply more rigorous models in public opinion research and, more importantly, to use our incredible resource and flexible exchange platform to build more original research and estimates for public consumption. We encourage researchers and analysts to contact us for further information and collaboration in developing more evidence-driven statistical models that account for human behavior in the political science and polling disciplines.