A Polling-Based Forecast of the Republican Primary Field

This is the finale of a four-part series (Part I, Part II, Part III) evaluating the utility of early presidential primary polls as forecasting instruments. My contention is that these polls have enough predictive power to be a worthwhile starting point for handicapping a field of candidates. In this article, we’ll see what they have to say about the Republican contenders for 2012.

Here is a chart summarizing the 28 scientific polls that have been conducted on the Republican field since the start of the year, covering a total of 23 different candidates or prospective candidates. (For the ground rules used to assemble this data, see Part III).

Name recognition figures are mainly taken from Gallup, and reflect an average of all of Gallup’s surveys since the start of the year. The exceptions are a handful of relatively obscure candidates whom Gallup has not yet polled on — in those cases the name recognition figures are estimates, and are indicated in red in the table. (Some of the polls were conducted in multiple versions with varying lists of candidates; that’s why the table shows, for example, that Mike Huckabee was included in 26.2 polls out of 28.)

Our first model for translating this polling data into probabilities works as follows.

  • First, we divide each candidate’s polling average by name recognition. This gives us, roughly, the percentage of voters familiar with the candidate who have him or her as their first choice.
  • Next, we use logistic regression analysis based on our data set of past primary polls to translate the candidate’s recognition-adjusted polling average into a probability of winning the nomination. (More technically, we use the square root of each candidate’s recognition-adjusted polling average to fit the regression curve, which produces slightly better results on the historical data.)
  • Finally, we prorate the numbers so that the probabilities sum up to 100 percent. That leaves us with the following:

    I’m calling this the Classical Model, since it’s a little bit more elegant than an alternative method that we’ll examine later on. Divide a candidate’s polling average by name recognition, and you have a pretty decent benchmark for the candidate’s upside.
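
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic coefficients (`INTERCEPT`, `SLOPE`) and the sample field are made-up placeholders, not the fitted values or the polling data behind the chart, and the function name `classical_probabilities` is hypothetical.

```python
import math

# Placeholder logistic coefficients. The article does not publish its fitted
# values, so these are assumptions chosen only to make the sketch runnable.
INTERCEPT = -5.0
SLOPE = 8.0


def classical_probabilities(candidates, intercept=INTERCEPT, slope=SLOPE):
    """Map {name: (polling_average, name_recognition)} to win probabilities.

    Both inputs are fractions between 0 and 1 (e.g. 0.04 for 4 percent).
    """
    raw = {}
    for name, (poll_avg, recognition) in candidates.items():
        # Step 1: recognition-adjusted polling average.
        adjusted = poll_avg / recognition
        # Step 2: logistic curve applied to the square root of that figure.
        z = intercept + slope * math.sqrt(adjusted)
        raw[name] = 1.0 / (1.0 + math.exp(-z))
    # Step 3: prorate so the whole field sums to 100 percent.
    total = sum(raw.values())
    return {name: p / total for name, p in raw.items()}


# Made-up field, for illustration only.
field = {
    "Well-known candidate": (0.18, 0.85),
    "Little-known candidate": (0.04, 0.49),
    "Obscure candidate": (0.01, 0.20),
}
probs = classical_probabilities(field)
```

Because the last step rescales the whole field, the output depends on who is in the list: adding or dropping a candidate changes everyone else’s share.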

    One thing that stands out is that this method gives the leading candidate, Mitt Romney, only about a one-in-four chance of winning (more precisely, a 27 percent chance).

    How unusual is that? Have there been other races in the modern (post-1972) primary era that were more wide open? Here’s how this method would have designated a favorite in past election cycles:

    The current Republican race is, by some margin, the most wide-open in the modern era on the G.O.P. side, but there are a couple of comparable examples if you look at the Democrats. The model would have had Scoop Jackson as the nominal favorite to win the Democratic nomination in 1976 — but still would have given him only a 20 percent chance. Michael Dukakis in 1988 (26 percent chance of winning) and John Kerry in 2004 (29 percent) were in the same range as Mr. Romney is now, though for different reasons — their polling wasn’t quite as strong as Mr. Romney’s, but they were doing it with considerably lower name recognition.

    That brings me to the second point. What makes the 2012 Republican race unusual is not that there isn’t much of a frontrunner at this point — that’s happened before — but rather that both the high-recognition and low-recognition names are underwhelming.

    On the one hand, while Mr. Romney’s numbers and Mike Huckabee’s are considerably better than Sarah Palin’s or Newt Gingrich’s, they both fail to crack 20 percent in the polling average despite very wide name recognition. Both are also polling lower now than at the end of the 2008 campaign, in which Mr. Romney ultimately wound up with 22 percent of the Republican primary vote and Mr. Huckabee 21 percent.

    On the other hand, there’s no sign yet of a breakout candidate from the low-recognition group. Tim Pawlenty’s name recognition has improved more than that of any other Republican candidate since the start of the year (it’s increased to 49 percent from 39 percent, according to Gallup), but that hasn’t translated into any additional support in the horse race polling, where his numbers have been stuck at about 4 percent all year. The same holds for Mitch Daniels, and with Mr. Daniels there’s the added complication that he might not run at all.

    This method is also not very enamored of Donald Trump, although that is partly because he was not included in many of the polls at the start of the year, and the model scores those as zeroes.

    That effect becomes clear if we use the same methodology but exclude the polls conducted before April 1:
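
To see why the cutoff date matters so much for a late-arriving name, here is a rough sketch of an average that scores omissions as zero, with an optional start date. The function `polling_average`, the candidate labels and every poll in the example are hypothetical; the real data set also handles polls run in multiple versions, which this sketch ignores.

```python
from datetime import date


def polling_average(polls, candidate, since=None):
    """Average a candidate's share across polls, scoring omissions as zero.

    polls: list of (poll_date, {candidate_name: share}) tuples.
    since: optional cutoff; polls dated before it are dropped entirely.
    """
    kept = [results for poll_date, results in polls
            if since is None or poll_date >= since]
    if not kept:
        return 0.0
    # A poll that did not list the candidate contributes a zero.
    return sum(results.get(candidate, 0.0) for results in kept) / len(kept)


# Made-up polls for a hypothetical late-arriving candidate "X".
polls = [
    (date(2011, 1, 15), {"A": 0.20}),
    (date(2011, 2, 20), {"A": 0.18}),
    (date(2011, 4, 10), {"A": 0.17, "X": 0.12}),
    (date(2011, 4, 25), {"A": 0.16, "X": 0.16}),
]
full_year = polling_average(polls, "X")                             # about 0.07
since_april = polling_average(polls, "X", since=date(2011, 4, 1))   # about 0.14
```

In this toy case the candidate’s average doubles once the early polls that omitted the name are dropped, which is the same mechanism described above.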

    That pushes Mr. Trump up considerably. Then again, though, there were reasons why pollsters did not include Mr. Trump in surveys early in the year: it was not clear whether he would run, or take the campaign seriously if he did. And now, indeed, Mr. Trump’s rise in the polls seems to be reversing.

    There’s another method of evaluating the race that is even more dismissive of Mr. Trump’s chances. In this version, I break a candidate’s polling average into two factors:

  • How many polls include his or her name?
  • How does the candidate poll when included?
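
    This model also treats name recognition as a separate variable, rather than meshing it together with a candidate’s polling average. So it fits a three-variable regression model: how often the candidate is included in polls, how the candidate polls when included, and name recognition.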
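
As a rough sketch of how those inputs could be assembled for one candidate, consider the following. The function name `aggressive_features` and the sample numbers are hypothetical, and the regression that would consume these inputs is not shown because its fitted coefficients are not given here.

```python
def aggressive_features(poll_shares, name_recognition):
    """Build the three regression inputs for one candidate.

    poll_shares: one entry per poll; None means the candidate was not listed.
    name_recognition: fraction of voters who recognize the candidate.
    """
    included = [share for share in poll_shares if share is not None]
    # Factor 1: how many polls include his or her name?
    inclusion_rate = len(included) / len(poll_shares)
    # Factor 2: how does the candidate poll when included?
    avg_when_included = sum(included) / len(included) if included else 0.0
    # Factor 3: name recognition, kept as its own variable.
    return inclusion_rate, avg_when_included, name_recognition


# Made-up examples: a steady low-profile candidate and a late arrival.
steady = aggressive_features([0.04, 0.04, 0.04, 0.04], 0.49)
late = aggressive_features([None, None, 0.12, 0.16], 0.90)
```

Multiplying the first two factors recovers the all-polls average in which omissions count as zero, so the split adds no new data. It simply lets the regression weight inclusion and support separately.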

    It turns out that one of the more potent predictors of success in past primary races was simply how frequently a candidate’s name was included in the early polls. Although there have been winning candidates in the modern era, like Bill Clinton, who waited until quite late in the process to officially declare that they were running, there haven’t been any who were not laying the groundwork for a run quite early on, to the point that they were routinely included in the polls. It’s not so easy to make up for lost time if you’ve dawdled rather than hire staff, cultivate elite support, brush up your media skills and so forth. Being included in a poll in the early going is an indication that you are in fact doing those things.

    Under this method, which treats inclusion in polls from the start of the year as something close to a prerequisite for winning the nomination, candidates like Mr. Pawlenty and Mr. Daniels do considerably better, while Mr. Trump’s chances look considerably worse:

    I call this the Aggressive Model because it can deviate quite a bit more from the horse race numbers — although it’s more in line with how political scientists like Jonathan Bernstein and Brendan Nyhan, who place more emphasis on factors like elite support, think about the race.

    Here, then, is the optimistic case for Tim Pawlenty — what the Aggressive Model would say if it spoke in English rather than statistics.

    1. Mr. Pawlenty is definitely running, and has been preparing to do so for a long time now — which is true of surprisingly few candidates.
    2. His lack of popular support certainly is problematic — and is only partially excused by his relative lack of name recognition. But all of the candidates have their problems, so he looks pretty decent by comparison.

    One of the reasons I was skeptical of Mr. Pawlenty early on is that there seemed to be a lot of potential candidates who might fill the same niche, as a “safe” consensus choice acceptable to both moderates and conservatives. But John Thune isn’t running; Mike Pence isn’t running; Haley Barbour isn’t running. There’s no sign of Jeb Bush, Rick Perry, or Chris Christie. Mitch Daniels might run, but he doesn’t have any more popular support than Mr. Pawlenty, and he is several months, at the very least, behind Mr. Pawlenty in his preparations. Jon Huntsman might run, but he’s got a variety of positions that are going to make him unpopular with conservatives, whereas Mr. Pawlenty is positioned pretty close to the center of the Republican primary electorate.

    However, while the Aggressive Model does have some theoretical appeal — and while it fits the historical data a tiny bit better than the Classical Model — it presents some potential issues. It really goes all-in on the assumption that a candidate cannot win unless he or she starts making preparations very early on, to the point of being considered viable enough by pollsters to be included in their surveys.

    While it is true that no winning candidate in modern times has violated that paradigm, the data is not all that robust — just 15 nominally competitive primary races since 1972, of which only a handful have been as competitive as this one. That probably isn’t enough to rule out the possibility that a late entrant could run away with things, and the Aggressive Model may be a bit overfit, meaning that it describes the historical data well but could be sub-par at making predictions.

    So I think these two models work best when viewed in tandem.

    For that matter, just as we did with the Classical Model, we can also run a version of the Aggressive Model based solely on polling data from April 1 onward:

    Let’s summarize these models and compare their results with the current betting lines at Intrade, a political futures market that captures the bettors’ view of the candidates’ current chances.

    We can see some differences between our polling-based models and Intrade on several candidates:

  • The models like Mr. Romney slightly more than the bettors do, although the difference is not large. Mr. Romney, in my view, has one major asset that is not well reflected in national polls, which is that he is strongly positioned in several early primary states (New Hampshire, Michigan, Nevada). He also has one major liability, the health care legislation enacted in Massachusetts while he was governor.
  • All four of the polling models think Mike Huckabee is grossly undervalued by the bettors. I’ll be writing more about Mr. Huckabee in the next week or two, so we’ll leave it at that observation for now.
  • The models also think that Newt Gingrich is undervalued. I’ve been a skeptic of Mr. Gingrich’s chances, and widely known candidates who are getting only about 10 percent of the vote in polls have a very poor past record. At the same time, Mr. Gingrich is definitely running, and he has at least some popular support and at least some elite support. Even if you don’t like a company’s business model, there’s some point at which its stock price becomes low enough for it to be a good buy; that’s more or less how I feel about Mr. Gingrich right now.
  • The models think Mr. Daniels is somewhat overvalued by the bettors, and that Mr. Huntsman is grossly so. Mr. Huntsman is the one I feel more confident saying that about. He’s positioned pretty far to the left (relative to the Republican field) on a lot of issues, he’s getting a late start on his campaign, and he served in President Obama’s administration — in a foreign policy capacity, no less, an area where Mr. Obama should get high marks from voters. And Mr. Huntsman is averaging only about 1 percent in the polls so far. That’s an awful lot to overcome, no matter how talented the politician.
  • Although one version of the model thinks Mr. Trump is undervalued, the others think he’s overvalued. Considering that about half of Republican voters have an unfavorable view of Mr. Trump, that he’s now moving backward in the polls, that his signature issue was just taken off the table, that some of the policy positions he holds now bear no resemblance to the ones he held earlier in his career, and that he isn’t certain to run, I’m not sure why the bettors at Intrade are giving him much of a chance at all. I don’t like to rule things out categorically — you’ll get burned if you do that too much. But while Mr. Trump’s chances of winning the Republican nomination may not be exactly zero, they’re pretty close.
  • The models like Rick Santorum and Ron Paul more than the bettors do. Although Mr. Santorum and Mr. Paul don’t share very many policy positions, they are parallel to one another in that both have strong appeal to one particular constituency within the Republican base — the religious right for Mr. Santorum, libertarians for Mr. Paul. But they don’t have much breadth of appeal, so their upside is limited. Who knows: perhaps Mr. Santorum and (especially) Mr. Paul will have some impact on the race. But there aren’t really any recent cases of candidates like these winning their party’s nomination, or even coming particularly close — and the polling models are going to have trouble accounting for that sort of thing.
    ***

    The value of an approach like this is not that these models are infallible. Instead, they’re a pretty rough cut, as revealed by the fact that relatively small changes in methodology can produce large shifts in the chances attributed to candidates like Mr. Trump or Mr. Pawlenty.

    My contention, though, is that we’ll both do a better job of handicapping and will have more productive conversations about the primaries if we start with the assumption that the polls tell us something rather than nothing.

    (Stated far more technically, the polls are useful enough to serve as good Bayesian priors).

    You want to argue that Jon Huntsman is a more likely Republican nominee than Mike Huckabee? That’s fine. But know that, in the past, candidates who have polling numbers like Mr. Huckabee’s have had a pretty good shot at their nominations, while those with Mr. Huntsman’s profile have faced much longer odds: not just a little bit longer, but a lot longer. Maybe you can still win the argument, but it raises your burden of proof.
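
One way to make the “burden of proof” idea concrete is the odds form of Bayes’ rule, in which a poll-based probability serves as the prior and any other argument enters as a likelihood ratio. The sketch below is an interpretation, not a calculation from the models above; the function name `posterior_probability` is hypothetical, and the only figure taken from the article is the Classical Model’s 27 percent for Mr. Romney.

```python
def posterior_probability(prior, likelihood_ratio):
    """Update a prior probability using the odds form of Bayes' rule.

    likelihood_ratio: how much more likely the new evidence is if the
    candidate wins the nomination than if the candidate does not.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Evidence that is equally likely either way leaves the prior untouched.
unchanged = posterior_probability(0.27, 1.0)

# Moving a 27 percent prior to even money takes evidence roughly 2.7 times
# likelier under a win than under a loss (0.73 / 0.27).
even_money = posterior_probability(0.27, 0.73 / 0.27)
```

The lower the poll-based prior, the larger the likelihood ratio needed to reach the same conclusion, which is why a candidate at 1 percent in the polls needs a far stronger case than one in the high teens.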
