If you’re here, you might or might not know that Lucid is a technology company that’s been transforming the market research space. Our marketplace for online survey-takers, Fulcrum, gives pollsters and other students of public opinion an incredible resource for research experimentation. In 2016, we ran our first ever Presidential poll to experiment for ourselves and to showcase our toolset.
After the Election, we started an effort to understand why our poll and others were unable to accurately predict Donald Trump's electoral college victory. Today, we're releasing all of the polling data we collected during the 2016 cycle to GitHub for open-source analysis by any interested parties. We're excited to keep learning from this experience and to refine our strategy for next time.
Internally, we identified three areas that we wanted to examine further.
Sample bias – Systematic differences between the research subjects and the population of interest.
Modelling error – Flawed weighting or post-stratification strategies that fit the sample to an inaccurate or biased projection of the population.
Response bias – Question wording or survey design that somehow encourages or biases survey-takers to answer a certain way.
Online polls like ours have become more common over the last few elections in response to growing concern over the feasibility and reliability of telephone polls. Several factors undermine the traditional random-sample telephone poll, the most existential being shifting technology: fewer people have landlines on which they can be reached, and those who do are increasingly less likely to answer them. Call-screening technology and a declining willingness to participate in surveys have driven non-response rates up among both cellphone and landline users. Not only does this make telephone polls more difficult and expensive to conduct, it increases the likelihood that the population of people available and willing to take a random phone survey is no longer representative of the population at large.
Not all online polls are alike. Some researchers believe the way forward is to develop a community of survey-takers, or "panel," that can be drawn from in various ways for a survey opportunity. Panels are constructed and sampled in a variety of ways: some are built through random invitation, others are curated to be representative, and others are open to all.
Our approach at Lucid is a bit different from other online research providers. Instead of building a panel ourselves, we built a marketplace that integrates a variety of online survey communities experimenting with diverse strategies. This gives us a few advantages as online research develops and evolves. First, because our marketplace includes many panels and other online communities, we can deliver many more survey interviews than other survey platforms. Large sample sizes help researchers draw inference from smaller subgroups within their sample. Second, because we work with a range of sample partners using multiple approaches, we can blend and balance sample so that any biases associated with a panel's recruitment practices are muted. Third, our platform allows researchers to meticulously layer quotas on any demographic element to maintain representativeness and to deduplicate respondents across multiple sample suppliers.
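As a rough illustration of the quota and deduplication mechanics described above, here is a minimal sketch. The identifiers, quota cells, and admission logic are all hypothetical simplifications, not Lucid's actual implementation.

```python
# Hypothetical quota targets for a single demographic element (gender),
# plus a set of respondent IDs already seen across sample suppliers.
quotas = {"female": 500, "male": 500}
filled = {"female": 0, "male": 0}
seen_ids = set()

def admit(respondent_id, gender):
    """Admit a respondent only if they are new and their quota cell is open."""
    if respondent_id in seen_ids:
        return False               # duplicate across suppliers
    if filled[gender] >= quotas[gender]:
        return False               # quota cell already full
    seen_ids.add(respondent_id)
    filled[gender] += 1
    return True

print(admit("abc123", "female"))   # new respondent, open cell
print(admit("abc123", "female"))   # same ID again: rejected as a duplicate
```

In practice quota cells would be layered across several demographics at once, but the same admit-or-reject decision applies at each cell.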
Despite the increasing magnitude and representativeness of the online population and our own advances in building technology to produce reliable public opinion data for our clients, it's important to remember the inherent challenges we still face. We are not producing a random sample. By definition, an individual must be connected to the internet and a member of an online community to even be eligible to take a survey. It's a mirror of the same issue faced by telephone pollsters. Just like telephone pollsters can't reach people who screen their calls, online pollsters can't reach people who don't have internet access or who don't engage with online surveys.
Modelling error is often the biggest factor behind poll results that differ from one another. The Upshot, a project of The New York Times, conducted an exercise earlier this year to demonstrate this. They gave the exact same raw survey responses to four respected pollsters and received back four different sets of results after each pollster applied its own post-stratification technique and its own judgments about how to classify voter likelihood and which data sources best estimate the composition of the electorate.
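To see how identical raw responses can produce different toplines, here is a toy sketch with a single weighting variable. All counts and population targets are hypothetical, and a real model would post-stratify on many variables at once.

```python
# Hypothetical raw cell counts: (education, candidate) -> respondents
sample = {
    ("college", "Clinton"): 340, ("college", "Trump"): 260,
    ("non_college", "Clinton"): 180, ("non_college", "Trump"): 220,
}

def topline(sample, targets):
    """Weight each education cell to its target population share,
    then return Clinton's share of the two-way vote."""
    n = sum(sample.values())
    group_n = {}
    for (educ, _), cnt in sample.items():
        group_n[educ] = group_n.get(educ, 0) + cnt
    weighted = {}
    for (educ, cand), cnt in sample.items():
        w = targets[educ] / (group_n[educ] / n)   # post-stratification weight
        weighted[cand] = weighted.get(cand, 0) + cnt * w
    return weighted["Clinton"] / (weighted["Clinton"] + weighted["Trump"])

# Two defensible models of the electorate, two different results:
census_targets  = {"college": 0.33, "non_college": 0.67}
turnout_targets = {"college": 0.50, "non_college": 0.50}
print(topline(sample, census_targets))
print(topline(sample, turnout_targets))
```

The raw data never changes; only the pollster's judgment about what the electorate looks like does, and that alone moves the topline by a couple of points here.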
One thing we wanted to check immediately was whether we had captured the essential divisions of the electorate in our unweighted subgroup analysis. Indeed, even in its rawest form, our data captured the story: major differences in voter preference based on race, gender, and education.
*Unweighted data, captured Nov. 1-7
It's not hard to see from this how a flawed post-stratification strategy may have contributed to our exaggerated lead for Clinton over the last week of the Election. Though we could have used quotas to screen a demographically balanced sample into the survey, we chose instead to rely more heavily on survey weights in analysis. That's why we interviewed more women than men (women are naturally more prevalent in online sample), and that decision may have magnified the impact of our simplified post-stratification strategy.
Here are some ways our simplified strategy may have led to errors:
Census vs. Electorate Representation
Our model of the electorate was based on the Census, which does not accurately project the electorate, leading us to overstate the impact of minority groups on the results.
Exit polls, while imperfect, demonstrate that our weighting strategy missed substantially.
While we post-stratified on gender, age, race, education, and income, we did not nest those factors together. That could have contributed to distortions within any one category, like a disproportionately older Latino population or a disproportionately female 65+ population.
For example, we interviewed a group of respondents that was more white than the population. We took that into account by weighting the sample to match the population. However, our model did not take into account whether our sample's skew toward white voters was proportionally distributed by age. In fact, the respondents over the age of 65 we interviewed were substantially whiter than our younger respondents.
*Unweighted cross-tab, captured Nov. 1-7
Because our weighting strategy looked at race in isolation, our correction deemphasized older whites more than younger whites. Nesting age and race together would have allowed us to match our sample to the population of 65+ African Americans rather than the population of African Americans and, separately, the population of seniors.
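The difference between weighting race in isolation and nesting race within age can be sketched numerically. Everything below is hypothetical and simplified to two age bands and two race groups, but it reproduces the mechanics: a marginal race weight leaves the 65+ group too white, while a nested (cell-level) weight matches the population within that group.

```python
# Hypothetical sample in which 65+ respondents skew whiter than younger ones.
# (age, race) cell -> respondent count
sample = {
    ("18-64", "white"): 600, ("18-64", "nonwhite"): 300,
    ("65+",   "white"): 280, ("65+",   "nonwhite"): 20,
}
n = sum(sample.values())

# Hypothetical joint population targets for the same cells
pop = {
    ("18-64", "white"): 0.52, ("18-64", "nonwhite"): 0.28,
    ("65+",   "white"): 0.15, ("65+",   "nonwhite"): 0.05,
}

# 1) Marginal weighting on race alone: one weight per race group.
race_n, race_target = {}, {}
for (_, race), cnt in sample.items():
    race_n[race] = race_n.get(race, 0) + cnt
for (_, race), share in pop.items():
    race_target[race] = race_target.get(race, 0) + share
marginal_w = {cell: race_target[cell[1]] / (race_n[cell[1]] / n) for cell in sample}

# 2) Nested weighting: one weight per age-by-race cell.
nested_w = {cell: pop[cell] / (sample[cell] / n) for cell in sample}

def white_share_65plus(w):
    """Weighted share of whites within the 65+ group."""
    white = sample[("65+", "white")] * w[("65+", "white")]
    nonwhite = sample[("65+", "nonwhite")] * w[("65+", "nonwhite")]
    return white / (white + nonwhite)

print(white_share_65plus(marginal_w))  # still far whiter than the 0.75 target
print(white_share_65plus(nested_w))    # matches the 65+ population (~0.75)
```

With many nested variables the cells get sparse, which is why pollsters often turn to iterative raking instead of pure cell weighting, but the trade-off illustrated here is the same one described above.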
Our model may not have considered other factors important to projecting the electorate, like religion or religiosity. We also chose not to consider attitudinal factors like political ideology or party affiliation. Many pollsters don't believe it's appropriate to account for these in a post-stratification, but doing so could have helped create more balance in our results on a daily basis or helped diagnose other biases in our model.
Did the way that we and other pollsters asked questions systematically encourage or discourage a certain response? Many people wonder whether people lie to pollsters or hide their intentions, particularly this year in the case of Donald Trump.
While our friends at Morning Consult did find some evidence of underreporting of preference for Donald Trump with live-interview telephone polls during the primaries, the similar poll numbers reported by both telephone and online pollsters during the general election do not suggest this played a substantial role in the polling miss.
It’s possible that bias resulted in survey respondents misrepresenting their intention to vote. Survey respondents may have perceived a higher likelihood of being able to complete the survey if they expressed a high likelihood of voting or acted on their perceived social desirability of voting. We asked two questions to determine voter likelihood, screening out individuals who reported they were not registered and who reported they were unlikely to vote. In hindsight, it would have been useful to test alternative screening criteria to be able to better measure this phenomenon.
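The two-question screen described above can be sketched as follows. The 1–5 likelihood scale, the cutoffs, and the "stricter alternative" screen are all assumptions for illustration, not our actual questionnaire coding.

```python
# Hypothetical two-question likely-voter screen.
def passes_screen(registered, likelihood, cutoff=2):
    """Keep respondents who say they are registered to vote and whose
    self-reported likelihood of voting (1 = very unlikely ... 5 = certain)
    clears the cutoff."""
    return registered and likelihood > cutoff

# (registered, likelihood) pairs for four hypothetical respondents
respondents = [(True, 5), (True, 3), (True, 2), (False, 5)]

lenient = [r for r in respondents if passes_screen(*r)]             # default cutoff
strict  = [r for r in respondents if passes_screen(*r, cutoff=3)]   # alternative criterion
print(len(lenient), len(strict))
```

Running alternative cutoffs like this side by side on the same respondents is one way to measure how sensitive the results are to the screening criteria.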
As of this writing, the results of the election are still being counted, but Hillary Clinton appears as though she'll end up having won the popular vote by approximately 2 percentage points. While our polling showed a larger final margin of 5 points, for all the reasons outlined above, the difference between our predicted outcome and the actual outcome gives us hope that our tools can be put to use to solve public opinion research problems reliably and accurately.
While we always published a rolling 3-day average of our poll results so as to minimize noise, our final single-day result showed Clinton carrying the popular vote by exactly 2 percentage points. While publishing that number instead might have made us appear more accurate, our model was still subject to the same simple and imperfect set of assumptions.
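For concreteness, a rolling 3-day average like the one we published can be computed as below. The daily margins are hypothetical, not our actual tracker values.

```python
def rolling_3day(daily):
    """Trailing 3-day average of daily margins; the first two days
    average over whatever history exists so far."""
    out = []
    for i in range(len(daily)):
        window = daily[max(0, i - 2): i + 1]
        out.append(sum(window) / len(window))
    return out

# Hypothetical daily Clinton-minus-Trump margins in points, final day +2:
daily_margins = [6, 5, 7, 4, 3, 2]
print(rolling_3day(daily_margins))
```

Note how the published (averaged) final value lags the single-day figure: the smoothing that suppresses day-to-day noise also dampens a real late shift.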
Our complete dataset of survey responses is available for critique. I especially look forward to feedback and further criticism of our model and would love to hear from you directly. Please email me at firstname.lastname@example.org to talk more. While most people are probably glad the election season is over, I can’t wait to get back into this and try again.