ABC News
Can Bernie Sanders Pull Off An Upset In Ohio?

Bernie Sanders’s win in Michigan last week was a massive upset relative to the pre-election polls of the state’s voters, which had shown Hillary Clinton ahead by an average of 21 percentage points. In fact, Sanders may have pulled off the biggest upset in the history of primary polling, eclipsing the previous record from 1984, when Gary Hart beat Walter Mondale in New Hampshire despite having trailed him by 17 percentage points.

When you consider Michigan’s demographics, however, the result wasn’t all that shocking. Michigan Democrats are fairly liberal and the state has a lot of college students — both factors that help Sanders. We aren’t just making this up as we go along; last month, we published state-by-state targets for the Clinton-Sanders race based on a few simple demographic variables in each state: specifically, its racial composition, how liberal or conservative it was, and how rural it was. Those targets had Sanders ahead of Clinton by 4 percentage points in Michigan.

Does that mean we called the upset in Michigan weeks ahead of time? No, we weren’t quite that good or lucky. The targets were based on a hypothetical race in which Clinton and Sanders were each winning about half the vote and half the delegates nationally. Since Clinton is ahead of Sanders nationally, she still would have been favored in our model (although not by the blowout margin that polls suggested).

Either way, the big gap between polls and demographics makes us nervous, especially because three more Midwestern states are voting today, including Ohio, where Clinton leads Sanders by about 11 percentage points in the polls. Historically, a margin like that would be quite safe: hence our polling model’s conclusion that Clinton is a 97 percent favorite. But after what just happened in Michigan? I’d love to drop a few bucks on Sanders if a bookmaker offered 30-to-1 odds against him, as our polling model does.
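The 30-to-1 figure follows from the model's probability. Here is a minimal sketch of the conversion; `odds_against` is a hypothetical helper name used only for illustration, not part of any published model code.

```python
def odds_against(win_probability: float) -> float:
    """Return fair odds against an event, expressed as X-to-1."""
    if not 0.0 < win_probability < 1.0:
        raise ValueError("win_probability must be strictly between 0 and 1")
    return (1.0 - win_probability) / win_probability


# Clinton at 97 percent leaves Sanders at roughly 3 percent.
sanders_probability = 1.0 - 0.97
print(round(odds_against(sanders_probability), 1))
```

A 3 percent chance works out to a little over 32-to-1 against, which rounds to the 30-to-1 quoted above.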

Fortunately, even if the polls haven’t been great, the conditions¹ are potentially favorable for making demographic forecasts of the Democratic race. In 2008, under similar circumstances, I made demographic-based predictions of the Democratic race, such as my North Carolina prediction, which often outperformed the polls.

Those predictions in 2008 were based on regression analysis. They took advantage of the fact that Democrats report their vote by congressional district, which makes the sample more robust; by the time North Carolina voted eight years ago, for instance, hundreds of diverse congressional districts had already weighed in. So we’re overdue to apply the same technique this year.

In contrast to the demographic benchmarks we set in February, which were based on polling data, these are based on actual votes so far, aggregated across congressional districts. We can then compare these votes against demographic and attitudinal variables in each congressional district. For a more technical description of the analysis, see the footnotes.² But basically, we’re just looking for sensible variables that have done a good job of explaining the split in the vote between Clinton and Sanders so far. The ones we included in the model are as follows:

  • The share of African-Americans is the best predictor of the Democratic vote to date, with Clinton performing significantly better in congressional districts with more black voters.
  • Clinton also performs slightly better in districts with more Hispanic voters, although the magnitude of the effect is considerably smaller than that for black voters.
  • Sanders performs better in districts that express liberal attitudes on social policy.³
  • Sanders performs better in districts with major colleges, as measured by the number of people employed in postsecondary education in each district.
  • As other researchers have found, Clinton performs better in the South, even after controlling for other factors.⁴
  • Sanders performs better in districts where more voters are in labor-union households.
  • Clinton performs better in districts where voters are more in favor of gun control.⁵
  • Sanders performs better in caucuses relative to primaries, other factors held equal.

This regression analysis⁶ models the vote by congressional district reasonably well. We can aggregate the congressional district projections to come up with state forecasts. Here’s what they would have said about the states to have voted so far:
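The aggregation step can be sketched roughly as follows. The article does not say how districts are weighted, so this assumes a simple turnout-weighted average. The `DistrictProjection` type, the `expected_votes` weights and the three example districts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DistrictProjection:
    district: str
    clinton_share: float  # projected share of the two-way vote, 0..1
    expected_votes: int   # ASSUMPTION: turnout used as the district weight


def aggregate_state(projections):
    """Turnout-weighted average of district-level Clinton shares."""
    total_votes = sum(p.expected_votes for p in projections)
    if total_votes <= 0:
        raise ValueError("expected_votes must sum to a positive number")
    weighted = sum(p.clinton_share * p.expected_votes for p in projections)
    return weighted / total_votes


# Hypothetical districts, made up for illustration only.
example = [
    DistrictProjection("District A", 0.60, 50_000),
    DistrictProjection("District B", 0.45, 30_000),
    DistrictProjection("District C", 0.40, 20_000),
]
state_clinton = aggregate_state(example)
print(f"Clinton {state_clinton:.1%}, Sanders {1 - state_clinton:.1%}")
```

Weighting by expected turnout keeps a lightly populated district from counting as much as a large one; other weighting schemes would be just as easy to swap in.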

                      RETRODICTIVE VOTE SHARE   ACTUAL VOTE SHARE
                      BASED ON DEMOGRAPHICS
DATE  STATE           CLINTON   SANDERS         CLINTON   SANDERS
2/1   Iowa            40%       59%             50%       50%
2/9   New Hampshire   47        52              38        60
2/20  Nevada          48        51              53        47
2/27  South Carolina  66        33              73        26
3/1   Alabama         74        25              78        19
      Arkansas        59        40              66        30
      Colorado        42        57              40        59
      Georgia         73        26              71        28
      Massachusetts   46        53              50        49
      Minnesota       39        60              38        61
      Oklahoma        52        47              42        52
      Tennessee       66        33              66        32
      Texas           65        34              65        33
      Vermont         37        62              14        86
      Virginia        64        35              64        35
3/5   Kansas          46        53              32        67
      Louisiana       76        23              71        23
      Nebraska        44        55              43        57
3/6   Maine           37        62              35        64
3/8   Michigan        51        48              48        50
      Mississippi     77        22              83        16

How a demographic model has fit the Democratic race so far

Our demographic “retrodiction”⁷ for Michigan still has Clinton winning, but only barely, by 3 percentage points, compared with the actual 2-point win for Sanders. Especially under the Democrats’ proportional allocation method, that’s a pretty minor difference. The model’s retrodictions in Vermont and Arkansas are much further off, as you can see, but that makes sense given potential home-state effects for Sanders and Clinton in those states.
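To make the size of these misses concrete, here is a small sketch that computes the error in the Clinton-minus-Sanders margin for each state. The numbers are copied from the table above; the function and variable names are hypothetical.

```python
# (state, retrodicted Clinton, retrodicted Sanders, actual Clinton, actual Sanders)
RESULTS = [
    ("Iowa", 40, 59, 50, 50),
    ("New Hampshire", 47, 52, 38, 60),
    ("Nevada", 48, 51, 53, 47),
    ("South Carolina", 66, 33, 73, 26),
    ("Alabama", 74, 25, 78, 19),
    ("Arkansas", 59, 40, 66, 30),
    ("Colorado", 42, 57, 40, 59),
    ("Georgia", 73, 26, 71, 28),
    ("Massachusetts", 46, 53, 50, 49),
    ("Minnesota", 39, 60, 38, 61),
    ("Oklahoma", 52, 47, 42, 52),
    ("Tennessee", 66, 33, 66, 32),
    ("Texas", 65, 34, 65, 33),
    ("Vermont", 37, 62, 14, 86),
    ("Virginia", 64, 35, 64, 35),
    ("Kansas", 46, 53, 32, 67),
    ("Louisiana", 76, 23, 71, 23),
    ("Nebraska", 44, 55, 43, 57),
    ("Maine", 37, 62, 35, 64),
    ("Michigan", 51, 48, 48, 50),
    ("Mississippi", 77, 22, 83, 16),
]


def margin_error(row):
    """Retrodicted Clinton margin minus actual Clinton margin, in points."""
    _, pred_clinton, pred_sanders, act_clinton, act_sanders = row
    return (pred_clinton - pred_sanders) - (act_clinton - act_sanders)


errors = {row[0]: margin_error(row) for row in RESULTS}
worst = max(errors, key=lambda state: abs(errors[state]))
print("Michigan miss:", errors["Michigan"])
print("Largest miss:", worst, errors[worst])
```

A positive error means the retrodiction was too favorable to Clinton. Michigan's comes to 5 points (Clinton +3 retrodicted, Sanders +2 actual), while Vermont's is by far the largest.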

Other results are a bit harder to explain. How did Clinton (barely) win the Iowa caucuses when she got crushed in other Midwest caucus states, like Kansas and Minnesota? How did Sanders lose Massachusetts after winning New Hampshire by so much? How did Sanders win Oklahoma by 10 percentage points?

I have my theories — Clinton’s ground game may have saved her in Iowa, for instance — but my goal isn’t to explain away every last bit of variance (in which case I’d be guilty of overfitting my model). Instead, it’s to have reasonably sensible demographic-based projections that pass the smell test when applied to future states. Here are those forecasts, starting with the five states that will vote on Tuesday:

                FORECAST BASED ON DEMOGRAPHICS          “POLLS-ONLY” FORECAST
                AND RESULTS IN PAST PRIMARIES
DATE  STATE     CLINTON  SANDERS  SANDERS WIN PROB.     CLINTON  SANDERS  SANDERS WIN PROB.
3/15  Fla.      67%      32%      4%                    63%      34%      <1%
      Ill.      54       45       34                    52       44       10
      Mo.       54       45       33                    49       48       46
      N.C.      68       31       4                     63       36       <1
      Ohio      51       48       42                    54       43       3
3/22  Ariz.     52       47       40
      Idaho     42       57       75
      Utah      40       59       82
3/26  Alaska    36       63       91
      Hawaii    41       58       81
      Wash.     39       60       85
4/5   Wis.      47       52       61
4/9   Wyo.      41       58       80
4/19  N.Y.      55       44       30
4/26  Conn.     51       48       43
      Del.      58       41       21
      Md.       63       36       10                    66       32       5
      Penn.     52       47       41
      R.I.      49       50       52
5/3   Ind.      52       47       42
5/10  W. Va.    45       54       67
5/17  Ky.       54       45       32
      Ore.      44       55       70
6/7   Calif.    53       46       37
      Mont.     39       60       85
      N.J.      54       45       32
      N.M.      52       47       42
      N.D.      36       63       90
      S.D.      54       45       34
6/14  D.C.      63       36       9

Demographic projections of the remaining Democratic states

The numbers in Ohio jump out, since they suggest — in contrast to the polls — a very close race between Sanders and Clinton. After accounting for the uncertainty in the forecasts, the demographic model gives Sanders a 42 percent chance of winning Ohio, much better than the 3 percent chance that our “polls-only” forecast gives to him.
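The article does not spell out how a projected margin becomes a win probability. One common approach, shown here purely as a sketch, is to treat the forecast error on the margin as normally distributed. The 20-point standard deviation below is my own illustrative assumption, picked because it lands close to several of the published probabilities. It is not a figure from the model.

```python
from statistics import NormalDist

# ASSUMPTION: forecast error on the Sanders-minus-Clinton margin is normal
# with a 20-point standard deviation. Neither the distribution nor the
# value is stated in the article.
ASSUMED_MARGIN_SD = 20.0


def sanders_win_probability(clinton_share, sanders_share, sd=ASSUMED_MARGIN_SD):
    """Probability that the realized Sanders margin ends up above zero."""
    projected_margin = sanders_share - clinton_share  # percentage points
    return 1.0 - NormalDist(mu=projected_margin, sigma=sd).cdf(0.0)


ohio = sanders_win_probability(51, 48)
utah = sanders_win_probability(40, 59)
print(f"Ohio: {ohio:.0%}, Utah: {utah:.0%}")
```

Under that assumption, a 3-point Clinton lead in Ohio leaves Sanders with a chance in the low-to-mid 40s, close to the 42 percent in the table. The lesson is that wide uncertainty keeps a small projected deficit near a coin flip.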

The news isn’t as good for Sanders in Missouri. There, the demographic model concludes that polls showing the race to be essentially tied are slightly too generous to Sanders; it forecasts Clinton to win by 9 percentage points.

In Illinois, the polls have been all over the place, with recent surveys showing everything from a 42-point lead for Clinton to a 2-point lead for Sanders. Our weighted polling average has Clinton up by 7 points there, and the demographic model is largely in agreement, forecasting a 9-point win for Clinton.


Finally, both polls and demographics imply that Clinton is likely to win by blowout margins in North Carolina and Florida. If Sanders were to win or come close in one of those states, it would be an even bigger upset than Michigan and would suggest that something fundamental had changed in the Democratic race.

For clarity: These are forecasts based on the results so far, as opposed to benchmarks of what might happen in a hypothetical 50-50 race between Clinton and Sanders. If the candidates hit their forecasts on the nose in every state, Clinton would wind up winning by about 10 percentage points nationally. Thus, Sanders needs to substantially beat and not just tie these numbers to have a shot at the nomination. If you like, you can turn them into benchmarks by adding a net of 10 percentage points to Sanders. For instance, while the forecast in Connecticut is Clinton +3, the benchmark would be Sanders +7.
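The forecast-to-benchmark conversion is simple arithmetic, sketched below. The function names are hypothetical, and the 10-point national lead is the approximate figure given in the paragraph above.

```python
NATIONAL_CLINTON_LEAD = 10  # points; the "about 10 percentage points" above


def to_benchmark(clinton_margin, national_lead=NATIONAL_CLINTON_LEAD):
    """Shift a forecast Clinton margin to a 50-50 national benchmark."""
    return clinton_margin - national_lead


def describe(clinton_margin):
    """Format a Clinton-minus-Sanders margin as 'Clinton +N' or 'Sanders +N'."""
    if clinton_margin > 0:
        return f"Clinton +{clinton_margin}"
    if clinton_margin < 0:
        return f"Sanders +{-clinton_margin}"
    return "Tie"


connecticut_forecast = 51 - 48  # Clinton +3 in the forecast table
print(describe(connecticut_forecast), "->", describe(to_benchmark(connecticut_forecast)))
# prints "Clinton +3 -> Sanders +7"
```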

Since Sanders has lost ground to Clinton in the states to have voted so far, however, even that would not suffice for him to win the nomination; he’d have to beat these forecasts by something like 15 percentage points instead. It would be pretty shocking — but then again, Sanders has proven he can win when the odds are against him.

Footnotes

  1. There are just two candidates, their position in national polls has been fairly stable, and the divisions between them have been reasonably clear across demographic lines.

  2. My source for voting data is The Green Papers. In Texas, where Democrats report their votes by state Senate district instead of congressional district, I attempted to map Senate districts to congressional districts: for instance, most of Texas Senate District 30 maps to Texas’s 23rd Congressional District. I then compared the voting results against demographic and attitudinal variables from the 2012 Cooperative Congressional Election Study (CCES) and the most recent edition of the American Community Survey. For data drawn from CCES, demographics are weighted based on a voter’s estimated likelihood of participating in the Democratic primaries. Because of potential home-state effects in Vermont and Arkansas, I didn’t include them in the regression analysis.

  3. Specifically, among voters I refer to as “white cosmopolitans,” who take liberal attitudes on both gay marriage and immigration.

  4. Because the definition of the South is fluid, I used a 2014 poll we conducted with SurveyMonkey to estimate how Southern each state is. Alabama is definitely Southern, for instance; Missouri only arguably is.

  5. This may be a proxy for urban-rural status, since voters in urban areas are more likely to favor gun control. Still, it’s interesting that gun control is one of the few issues where Sanders is running to Clinton’s right.

  6. Specifically, I applied a generalized linear model with a probit link function.

  7. A prediction made about a past event as if we did not know the final outcome.
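As an illustration of footnote 6, a probit link maps a linear combination of district variables to a vote share between 0 and 1 through the standard normal cumulative distribution function. The coefficients and district values below are invented for the sketch and are not the fitted model. Only their signs follow the article: a larger black share helps Clinton, and more college employment helps Sanders.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def probit_predicted_share(intercept, coefficients, features):
    """Inverse probit link: map a linear predictor to a share in (0, 1)."""
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return _STANDARD_NORMAL.cdf(linear_predictor)


# HYPOTHETICAL coefficients and district values, not the fitted model.
coefficients = {"black_share": 1.5, "college_employment": -2.0}
district = {"black_share": 0.30, "college_employment": 0.05}

clinton_share = probit_predicted_share(-0.1, coefficients, district)
print(f"Projected Clinton share: {clinton_share:.1%}")
```

Because the link squeezes any linear predictor into the 0-to-1 range, projected shares can never fall below 0 percent or exceed 100 percent, which a plain linear regression would not guarantee.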

Nate Silver founded and was the editor in chief of FiveThirtyEight.
