There are several ways to use data to report on a presidential election. Data can be used to look back at what already happened, or it can be analyzed and used to support predictions about the future. Unfortunately, the predictions journalists make depend heavily on how reliable the numbers are, and they are also shaped by factors that can’t be quantified.
This problem has led even top data journalists to make predictions that, in the end, didn’t come close to being right. In an article explaining why readers should be cautious about the latest election forecast or study, Andrew Gelman pointed out several missteps made by the media. He began by noting that “premature obituaries of Trump have been [a] common problem” in this election cycle.
Nate Silver, whom Gelman calls a “famed number cruncher,” gave Trump a measly 2% chance of winning the Republican nomination in July 2015. It wasn’t just Silver, though, and it didn’t stop in 2015 or even after Trump won the primaries. As recently as August 4th of this year, betting data from Britain was used to estimate the probability that Trump would drop out of the race altogether. Strictly speaking, this is still a possible outcome, though at this point it seems improbable.
So what is the problem? Is the problem with the data, the analysis, or neither?
Writing for The New Yorker, John Cassidy suggested that Silver was simply a victim of “a flawed supposition, widely shared, that a major political party would never select a candidate as extreme as Trump.”
So how did Nate Silver and others determine that Trump had such a small likelihood of winning the primaries? In most cases, through a mixture of polling data and historical data. As Cassidy points out, Silver leaned more heavily on historical data from the postwar, or “modern,” era. While early polls were telling the story that has held up to this point, history was telling the story that outsiders do not win primary elections. But counting each winner from both parties since the “modern era” began yields only 34 data points. That sample is simply too small to reliably estimate the odds of an unusual outcome, such as an outsider winning a primary.
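To make the small-sample point concrete, here is a rough sketch in Python. It assumes, hypothetically, that none of the 34 modern-era winners counted as an outsider, and it treats each contest as an independent trial with the same probability p of an outsider winning. Both are simplifying assumptions for illustration, not a description of how Silver’s model actually worked. With zero outsider wins in 34 tries, the exact one-sided 95% upper bound on p (the zero-event case of the Clopper-Pearson interval) is about 8.4%, and the familiar “rule of three” approximation gives 3/34, or about 8.8%.

```python
def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper confidence bound on p after 0 successes in n trials.

    Solves (1 - p) ** n = 1 - confidence for p, which is the Clopper-Pearson
    bound in the special case where no successes were observed.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)


def rule_of_three(n: int) -> float:
    """Quick approximation of the 95% upper bound when 0 of n trials succeed."""
    return 3.0 / n


def chance_of_no_outsiders(p: float, n: int) -> float:
    """Probability of seeing zero outsider winners in n independent contests
    if each contest has (hypothetical) outsider-win probability p."""
    return (1.0 - p) ** n


n = 34  # modern-era primary winners, per the article
print(f"exact upper bound: {zero_event_upper_bound(n):.3f}")   # 0.084
print(f"rule of three:     {rule_of_three(n):.3f}")            # 0.088
print(f"P(no outsiders | p=0.02): {chance_of_no_outsiders(0.02, n):.3f}")  # 0.503
```

Under these assumptions, the historical record alone cannot rule out an outsider win rate more than four times the 2% figure. And even if the true rate were exactly 2%, a run of 34 contests with no outsider winner would still occur about half the time.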
Are the numbers journalists are using reliable enough to support accurate predictions? It’s tempting to argue that because no candidate like Trump has ever come this far in an election cycle, he probably won’t win. With such a small historical data set, however, it’s entirely possible that Donald Trump will look like an outlier at first, only to turn out to be the beginning of a new trend that hasn’t yet fully formed.