Can Bernie Sanders Pull Off An Upset In Ohio?

Bernie Sanders’s win in Michigan last week was a massive upset relative to the pre-election polls of the state’s voters, which had shown Hillary Clinton ahead by an average of 21 percentage points. In fact, Sanders may have pulled off the biggest upset in the history of primary polling, eclipsing the previous record from 1984, when Gary Hart beat Walter Mondale in New Hampshire despite having trailed him by 17 percentage points.

When you consider Michigan’s demographics, however, the result wasn’t all that shocking. Michigan Democrats are fairly liberal and the state has a lot of college students — both factors that help Sanders. We aren’t just making this up as we go along; last month, we published state-by-state targets for the Clinton-Sanders race based on a few simple demographic variables in each state: specifically, its racial composition, how liberal or conservative it was, and how rural it was. Those targets had Sanders ahead of Clinton by 4 percentage points in Michigan.

Does that mean we called the upset in Michigan weeks ahead of time? No, we weren’t quite that good or lucky. The targets were based on a hypothetical race in which Clinton and Sanders were each winning about half the vote and half the delegates nationally. Since Clinton is ahead of Sanders nationally, she still would have been favored in our model (although not by the blowout margin that polls suggested).

Either way, the big gap between polls and demographics makes us nervous, especially because three more Midwestern states are voting on Tuesday, including Ohio, where Clinton leads Sanders by about 11 percentage points in the polls. Historically, a margin like that would be quite safe: hence our polling model’s conclusion that Clinton is a 97 percent favorite. But after what just happened in Michigan? I’d love to drop a few bucks on Sanders if a bookmaker offered 30-to-1 odds against him, as our polling model does.
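For reference, the odds follow directly from the win probability. If Clinton’s chance is 97 percent, Sanders’s is 3 percent, and the fair odds against him are the ratio of the two:

```latex
\text{odds against Sanders} = \frac{P(\text{Clinton wins})}{P(\text{Sanders wins})} = \frac{0.97}{0.03} \approx 32
```

That is about 32-to-1, which rounds to the 30-to-1 figure above.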

Fortunately, even if the polls haven’t been great, the conditions¹ are potentially favorable for making demographic forecasts of the Democratic race. In 2008, under similar circumstances, I made demographic-based predictions of the Democratic race (my North Carolina prediction is one example), and they often outperformed the polls.

Those predictions in 2008 were based on regression analysis. They took advantage of the fact that Democrats report their vote by congressional district, which makes the sample more robust; by the time North Carolina voted eight years ago, for instance, hundreds of diverse congressional districts had already weighed in. So we’re overdue to apply the same technique this year.

In contrast to the demographic benchmarks we set in February, which were based on polling data, these are based on actual votes so far, aggregated across congressional districts. We can then compare these votes against demographic and attitudinal variables in each congressional district. For a more technical description of the analysis, see the footnotes.² But basically, we’re just looking for sensible variables that have done a good job of explaining the split in the vote between Clinton and Sanders so far. The ones we included in the model are as follows:

The share of African-Americans is the best predictor of the Democratic vote to date, with Clinton performing significantly better in congressional districts with more black voters.
Clinton also performs slightly better in districts with more Hispanic voters, although the magnitude of the effect is considerably smaller than that for black voters.
Sanders performs better in districts that express liberal attitudes on social policy.³
Sanders performs better in districts with major colleges, as measured by the number of people employed in postsecondary education in each district.
As other researchers have found, Clinton performs better in the South, even after controlling for other factors.⁴
Sanders performs better in districts where more voters are in labor-union households.
Clinton performs better in districts where voters are more in favor of gun control.⁵
Sanders performs better in caucuses relative to primaries, other factors held equal.
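In equation form, a probit model built from these variables might look like the sketch below. The article does not publish its coefficients or exact specification, so this is an illustration under stated assumptions: p_d is taken to be Clinton’s share of the two-candidate vote in congressional district d, Φ is the standard normal cumulative distribution function (the probit link is its inverse), the variable names are shorthand for the list items above, and n_d is a hypothetical turnout weight used to roll district projections up to a state s.

```latex
\Phi^{-1}(p_d) = \beta_0
  + \beta_1 \,\mathrm{Black}_d
  + \beta_2 \,\mathrm{Hispanic}_d
  + \beta_3 \,\mathrm{Cosmopolitan}_d
  + \beta_4 \,\mathrm{College}_d
  + \beta_5 \,\mathrm{South}_d
  + \beta_6 \,\mathrm{Union}_d
  + \beta_7 \,\mathrm{GunControl}_d
  + \beta_8 \,\mathrm{Caucus}_d

\hat{p}_s = \frac{\sum_{d \in s} n_d \, \hat{p}_d}{\sum_{d \in s} n_d}
```

Under this parameterization a positive coefficient favors Clinton, so the directions reported in the list correspond to β1, β2, β5, β7 > 0 (with β2 smaller than β1) and β3, β4, β6, β8 < 0.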

This regression analysis⁶ models the vote by congressional district reasonably well. We can aggregate the congressional district projections to come up with state forecasts. Here’s what they would have said about the states to have voted so far:

How a demographic model has fit the Democratic race so far
DATE	STATE	RETRODICTED CLINTON	RETRODICTED SANDERS	ACTUAL CLINTON	ACTUAL SANDERS
2/1	Iowa	40%	59%	50%	50%
2/9	New Hampshire	47	52	38	60
2/20	Nevada	48	51	53	47
2/27	South Carolina	66	33	73	26
3/1	Alabama	74	25	78	19
	Arkansas	59	40	66	30
	Colorado	42	57	40	59
	Georgia	73	26	71	28
	Massachusetts	46	53	50	49
	Minnesota	39	60	38	61
	Oklahoma	52	47	42	52
	Tennessee	66	33	66	32
	Texas	65	34	65	33
	Vermont	37	62	14	86
	Virginia	64	35	64	35
3/5	Kansas	46	53	32	67
	Louisiana	76	23	71	23
	Nebraska	44	55	43	57
3/6	Maine	37	62	35	64
3/8	Michigan	51	48	48	50
	Mississippi	77	22	83	16

Our demographic “retrodiction”⁷ for Michigan still has Clinton winning, but only barely — by 3 percentage points, compared with the actual 2-point win for Sanders. Especially under the Democrats’ proportional allocation method, that’s a pretty minor difference. The model’s retrodictions in Vermont and Arkansas are also pretty far off, as you can see, but that makes sense given potential home-state effects for Sanders and Clinton in those states.
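One way to put “pretty far off” on a common scale is to compare margins rather than shares. The sketch below takes the rounded shares from the table (so each miss can be off by a point or so) and computes how far each retrodicted Clinton-minus-Sanders margin landed from the actual margin. The helper names are illustrative and not part of the article’s model.

```python
# Vote shares (Clinton, Sanders) in percent, copied from the table above.
RETRO = {
    "Iowa": (40, 59), "New Hampshire": (47, 52), "Nevada": (48, 51),
    "South Carolina": (66, 33), "Alabama": (74, 25), "Arkansas": (59, 40),
    "Colorado": (42, 57), "Georgia": (73, 26), "Massachusetts": (46, 53),
    "Minnesota": (39, 60), "Oklahoma": (52, 47), "Tennessee": (66, 33),
    "Texas": (65, 34), "Vermont": (37, 62), "Virginia": (64, 35),
    "Kansas": (46, 53), "Louisiana": (76, 23), "Nebraska": (44, 55),
    "Maine": (37, 62), "Michigan": (51, 48), "Mississippi": (77, 22),
}
ACTUAL = {
    "Iowa": (50, 50), "New Hampshire": (38, 60), "Nevada": (53, 47),
    "South Carolina": (73, 26), "Alabama": (78, 19), "Arkansas": (66, 30),
    "Colorado": (40, 59), "Georgia": (71, 28), "Massachusetts": (50, 49),
    "Minnesota": (38, 61), "Oklahoma": (42, 52), "Tennessee": (66, 32),
    "Texas": (65, 33), "Vermont": (14, 86), "Virginia": (64, 35),
    "Kansas": (32, 67), "Louisiana": (71, 23), "Nebraska": (43, 57),
    "Maine": (35, 64), "Michigan": (48, 50), "Mississippi": (83, 16),
}


def margin(shares: tuple[int, int]) -> int:
    """Clinton-minus-Sanders margin in percentage points."""
    clinton, sanders = shares
    return clinton - sanders


def miss(state: str) -> int:
    """Actual margin minus retrodicted margin.

    Positive: Clinton did better than demographics implied.
    Negative: Sanders did better than demographics implied.
    """
    return margin(ACTUAL[state]) - margin(RETRO[state])


# List the five largest misses, biggest first.
for state in sorted(RETRO, key=lambda s: abs(miss(s)), reverse=True)[:5]:
    print(f"{state:15} {miss(state):+d}")
```

By this measure the Michigan miss is 5 points, Arkansas 17 and Vermont 47; Kansas (28 points) and Iowa (19) also stand out, and both are among the caucus results raised in the next paragraph.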

Other results are a bit harder to explain. How did Clinton (barely) win the Iowa caucuses when she got crushed in other Midwest caucus states, like Kansas and Minnesota? How did Sanders lose Massachusetts after winning New Hampshire by so much? How did Sanders win Oklahoma by 10 percentage points?

I have my theories — Clinton’s ground game may have saved her in Iowa, for instance — but my goal isn’t to explain away every last bit of variance (in which case I’d be guilty of overfitting my model). Instead, it’s to have reasonably sensible demographic-based projections that pass the smell test when applied to future states. Here are those forecasts, starting with the five states that will vote on Tuesday:

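Demographic projections of the remaining Democratic states
Demographic forecasts are based on demographics and results in past primaries; “polls-only” forecasts are shown where available.
DATE	STATE	DEMOGRAPHIC CLINTON	DEMOGRAPHIC SANDERS	DEMOGRAPHIC SANDERS WIN PROB.	POLLS-ONLY CLINTON	POLLS-ONLY SANDERS	POLLS-ONLY SANDERS WIN PROB.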
3/15	Fla.	67%	32%	4%	63%	34%	<1%
	Ill.	54	45	34	52	44	10
	Mo.	54	45	33	49	48	46
	N.C.	68	31	4	63	36	<1
	Ohio	51	48	42	54	43	3
3/22	Ariz.	52	47	40
	Idaho	42	57	75
	Utah	40	59	82
3/26	Alaska	36	63	91
	Hawaii	41	58	81
	Wash.	39	60	85
4/5	Wis.	47	52	61
4/9	Wyo.	41	58	80
4/19	N.Y.	55	44	30
4/26	Conn.	51	48	43
	Del.	58	41	21
	Md.	63	36	10	66	32	5
	Penn.	52	47	41
	R.I.	49	50	52
5/3	Ind.	52	47	42
5/10	W. Va.	45	54	67
5/17	Ky.	54	45	32
	Ore.	44	55	70
6/7	Calif.	53	46	37
	Mont.	39	60	85
	N.J.	54	45	32
	N.M.	52	47	42
	N.D.	36	63	90
	S.D.	54	45	34
6/14	D.C.	63	36	9

The numbers in Ohio jump out, since they suggest — in contrast to the polls — a very close race between Sanders and Clinton. After accounting for the uncertainty in the forecasts, the demographic model gives Sanders a 42 percent chance of winning Ohio, much better than the 3 percent chance that our “polls-only” forecast gives to him.
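How a projected margin turns into a win probability depends on how uncertain the forecast is. The article doesn’t say how its models compute that uncertainty, so the sketch below swaps in a simple stand-in: it assumes forecast errors are normally distributed around the projection with a hypothetical standard deviation `sigma`, and converts a projected margin into a chance of winning.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def win_probability(projected_margin: float, sigma: float) -> float:
    """Chance that a candidate's actual margin ends up above zero.

    projected_margin: the candidate's forecast margin in points (negative = trailing).
    sigma: hypothetical standard deviation of the forecast error, in points.
    Assumes the error is normal and centered on the forecast.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return normal_cdf(projected_margin / sigma)


# Ohio: Sanders trails by 3 in the demographic forecast, by 11 in the polls-only one.
# Both sigma values are hypothetical, chosen only to show how much the spread matters.
print(round(win_probability(48 - 51, sigma=15.0), 2))  # 0.42
print(round(win_probability(43 - 54, sigma=6.0), 2))   # 0.03
```

The sigma values are illustrative only. A single sigma of 15 would give Sanders only about 1 percent in Florida, where he trails by 35 in the demographic forecast, rather than the table’s 4 percent, so the article’s model presumably treats uncertainty differently from state to state.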

The news isn’t as good for Sanders in Missouri. There, the demographic model concludes that polls showing the race to be essentially tied are slightly too generous to Sanders; it forecasts Clinton to win by 9 percentage points.

In Illinois, the polls have been all over the place, with recent surveys showing everything from a 42-point lead for Clinton to a 2-point lead for Sanders. Our weighted polling average has Clinton up by 7 points there, and the demographic model is largely in agreement, forecasting a 9-point win for Clinton.

Finally, both polls and demographics imply that Clinton is likely to win by blowout margins in North Carolina and Florida. If Sanders were to win or come close in one of those states, it would be an even bigger upset than Michigan and would suggest that something fundamental had changed in the Democratic race.

For clarity: These are forecasts based on the results so far, as opposed to benchmarks of what might happen in a hypothetical 50-50 race between Clinton and Sanders. If the candidates hit their forecasts on the nose in every state, Clinton would wind up winning by about 10 percentage points nationally. Thus, Sanders needs to substantially beat and not just tie these numbers to have a shot at the nomination. If you like, you can turn them into benchmarks by adding a net of 10 percentage points to Sanders. For instance, while the forecast in Connecticut is Clinton +3, the benchmark would be Sanders +7.

Since Sanders has lost ground to Clinton in the states to have voted so far, however, even that would not suffice for him to win the nomination; he’d have to beat these forecasts by something like 15 percentage points instead. It would be pretty shocking — but then again, Sanders has proven he can win when the odds are against him.
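The benchmark arithmetic is a shift in the margin, not in either candidate’s vote share. A minimal sketch, using the table’s Connecticut forecast: a 10-point net shift produces the 50-50-race benchmark, and a roughly 15-point shift is what Sanders would actually need. The function names are illustrative.

```python
def describe(margin: int) -> str:
    """Format a Clinton-minus-Sanders margin as 'Clinton +N' or 'Sanders +N'."""
    if margin > 0:
        return f"Clinton +{margin}"
    if margin < 0:
        return f"Sanders +{-margin}"
    return "Tie"


def shifted_margin(clinton: int, sanders: int, net_points: int) -> int:
    """Clinton-minus-Sanders margin after moving `net_points` of margin to Sanders."""
    return (clinton - sanders) - net_points


# Connecticut's demographic forecast: Clinton 51, Sanders 48.
for net in (0, 10, 15):
    print(net, describe(shifted_margin(51, 48, net)))
# 0 Clinton +3
# 10 Sanders +7
# 15 Sanders +12
```

Because the shift applies to the margin, a 10-point net swing is equivalent to moving about 5 points of vote share from Clinton to Sanders.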

Footnotes

¹ There are just two candidates, their position in national polls has been fairly stable, and the divisions between them have been reasonably clear across demographic lines.
² My source for voting data is The Green Papers. In Texas, where Democrats report their votes by state Senate district instead of congressional district, I attempted to map Senate districts to congressional districts: for instance, most of Texas Senate District 30 maps to Texas’s 23rd Congressional District. I then compared the voting results against demographic and attitudinal variables from the 2012 Cooperative Congressional Election Study (CCES) and the most recent edition of the American Community Survey. For data drawn from CCES, demographics are weighted based on a voter’s estimated likelihood of participating in the Democratic primaries. Because of potential home-state effects in Vermont and Arkansas, I didn’t include them in the regression analysis.
³ Specifically, among voters I refer to as “white cosmopolitans,” who take liberal attitudes on both gay marriage and immigration.
⁴ Because the definition of the South is fluid, I used a 2014 poll we conducted with SurveyMonkey to estimate how Southern each state is. Alabama is definitely Southern, for instance; Missouri only arguably is.
⁵ This may be a proxy for urban-rural status, since voters in urban areas are more likely to favor gun control. Still, it’s interesting that gun control is one of the few issues where Sanders is running to Clinton’s right.
⁶ Specifically, I applied a generalized linear model with a probit link function.
⁷ A prediction made about a past event as if we did not know the final outcome.

FiveThirtyEight

Demographics favor Clinton in Tuesday’s Midwestern primaries, but only narrowly.
