Tuesday, 7 April 2009

Andy Cooke on the poll averaging debate

(This is a comment that Andy Cooke posted on the main PB site in the debate on poll averaging that often flares up here between Mike Smithson and Rod Crosby.)

On poll averaging: Rod Crosby would be absolutely correct - if the only variability in the polling were random error (imprecision). If systematic error (inaccuracy) creeps in, then the averaging technique fails. We can, in fact, use Rod's correct statistical assertion to test the assumption that variation between the polling companies' polls is purely random error: if averaging tends to beat a specified technique for selecting a single poll (which we don't have to invent - Mike has long subscribed to the technique "whichever one is worst for Labour"), the assumption holds. If the specified technique tends to win, the assumption fails.

Quick test: looking at the final 3 days' polling data from the last 4 elections (information readily available; the timescale was chosen to give enough polls to actually average (4 to 7 per election), with data from most polling companies at each election, while minimising the possibility of a late swing), we can compare the average (Crosby's Rule, or "CR") against the worst-for-Labour poll (Smithson's Rule, or "SR").

1992: (Lead in GB; Con score-Lab score; error)
Actual: Con 7.6%; 42.8-35.2
CR: Lab 1.4%; 38.1-39.5; 9.0% to Lab
SR: Con 0.5%; 38.5-38.0; 7.1% to Lab
SR wins

1997: (Lead in GB; Con score-Lab score; error)
Actual: Lab 13.0%; 31.4-44.4
CR: Lab 18.0%; 30.0-48.0; 5.0% to Lab
SR: Lab 10.0%; 33.0-43.0; 3.0% to Con
SR wins

2001: (Lead in GB; Con score-Lab score; error)
Actual: Lab 9.3%; 32.7-42.0
CR: Lab 12.8%; 31.6-44.4; 3.5% to Lab
SR: Lab 10.0%; 33.0-43.0; 0.7% to Lab
SR wins

2005: (Lead in GB; Con score-Lab score; error)
Actual: Lab 3.0%; 33.2-36.2
CR: Lab 4.9%; 32.4-37.3; 1.9% to Lab
SR: Lab 3.0%; 33.0-36.0; 0.0% to Lab/Con
SR wins
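The per-election verdicts above follow mechanically from the tabulated leads. A minimal sketch in Python (leads are Con minus Lab in points, negative meaning a Labour lead; all figures are taken straight from the tables):

```python
# Leads on the Con-minus-Lab margin, in points, from the tables above
# (negative = Labour lead).
elections = {
    1992: {"actual": 7.6,   "CR": -1.4,  "SR": 0.5},
    1997: {"actual": -13.0, "CR": -18.0, "SR": -10.0},
    2001: {"actual": -9.3,  "CR": -12.8, "SR": -10.0},
    2005: {"actual": -3.0,  "CR": -4.9,  "SR": -3.0},
}

for year, leads in elections.items():
    cr_err = abs(leads["CR"] - leads["actual"])
    sr_err = abs(leads["SR"] - leads["actual"])
    verdict = "SR" if sr_err <= cr_err else "CR"
    print(f"{year}: CR error {cr_err:.1f}, SR error {sr_err:.1f} -> {verdict} wins")
```

In every one of the four elections the SR error on the lead comes out smaller.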

SR wins in 4 out of 4 cases. If the assumption that error is purely random were to hold, then in each case, SR would be heavily odds against to win. A 4-horse accumulator all at odds against on the order of 4/1 to 6/1 would seem extremely unlikely (if all were 4/1, for example, we'd be looking at a 624/1 accumulator, which beats Mike's Obama bet totally hollow).
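The accumulator arithmetic checks out: four independent results, each at 4/1 against (probability 1/5), combine to (1/5)^4 = 1/625, i.e. odds of 624/1.

```python
from fractions import Fraction

p_single = Fraction(1, 5)        # 4/1 against = 1 chance in 5
p_accumulator = p_single ** 4    # four independent elections
odds_against = 1 / p_accumulator - 1
print(odds_against)              # 624
```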

Ergo the assumption that all polling error is purely random fails. Poll averaging (across companies, at least) is contraindicated.


Timothy (likes zebras) said...

Interestingly, the error of both methods declines at each election. This is an indication of the polling companies improving their methodology, and consequently reducing their biases.

The "Smithson Rule" essentially assumes that there is always a bias in Labour's favour. While I suspect that this is still the case, it's not something you can know until after the event [statistically therefore, your test of the two methods is somewhat unfair].

There is a risk of being caught out, very badly, were the polls to begin to exhibit a Tory bias.

It's also the case that, if you suspect a bias [as the Smithson Rule does] you can apply a bias correction to the polling average. There's also considerable merit in using an average to help temper over-reaction to the minutiae of changes in individual polls.

RodCrosby said...

1. Polling methodology changes. If we accept there was substantial pro-Labour bias at some point, but methodology significantly changed, then the scorecard needs to be treated separately before and after. The smaller “after” set is more likely to have produced its results by chance.

2. If bias exists, it may not operate as a constant for all outcomes. For example, in 1997, it is possible that the “foregone conclusion” election may have introduced a contingent bias. There is at least tentative evidence this might be so, when looking at 1983, when all polls underestimated Labour and overestimated the Tories significantly.

3. Even if SR beats CR in terms of votes actually cast, the result may not be so clear-cut in seats, which of course is crucial. There is evidence that often the “pro-Labour bias” in the polls has been an artifact of differential turnout. In other words the difference can be explained by those Labour voters who don’t show up to vote on the day, and are concentrated in safe seats. They don’t affect the result, and the result may be better forecast by the CR figures. It’s “as if” those voters had voted.

So as Timothy says, it is dangerous to assume that outcomes with entirely unrelated causes can be melded into a fixed "law." Pollsters come and go, and biases may change over time.

We cannot rule out the following hypothesis.

1992. generally poor methodology and the Poll Tax factor.
1997. a contingent "landslide" bias
2001. a contingent "landslide" bias
2005. chance?

Andy Cooke said...

Timothy (likes zebras),

You're correct in both cases (reduction in error and assumption of bias). In effect, what I was doing was aiming to prove/disprove systematic error (bias).

The test was essentially to see if averaging tended to beat a specific rule of picking a single poll. We can't actually know whether there remains a bias to Labour, of course, until after the event - but that makes predicting the event somewhat difficult.

We could apply a bias correction to the polling average, but how much? It would also vary dependent on which polls were wrapped in (for example, if MORI do 2 polls, ICM 1, YouGov 2 and CR 1 in the final 3 days, the bias will probably be different to if ICM did 3, MORI 1, YouGov 1, Populus 2 and CR 0 in the final 3 days).
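The point about the bias depending on the poll mix can be illustrated with a toy calculation. The per-company bias figures below are invented purely for illustration (they are not estimates of any real pollster's bias):

```python
# Hypothetical pro-Labour bias on the lead, in points, per company.
company_bias = {"MORI": 2.0, "ICM": 0.5, "YouGov": 1.0, "Populus": 1.5}

def average_bias(poll_counts):
    """Bias of a simple average of the final polls, weighted by how many
    polls each company contributed."""
    total = sum(poll_counts.values())
    return sum(company_bias[name] * n for name, n in poll_counts.items()) / total

# Two different final-3-day mixes give two different corrections.
mix_a = {"MORI": 2, "ICM": 1, "YouGov": 2}
mix_b = {"ICM": 3, "MORI": 1, "YouGov": 1, "Populus": 2}
print(average_bias(mix_a), average_bias(mix_b))
```

So a single fixed correction applied to "the average" would be wrong whenever the mix of contributing pollsters changes.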

The average is still very useful for picking up trends, though, as long as we don't then assume that it translates exactly to the popular vote.

Andy Cooke said...


1. Yup, but if we jettison 1992 and 1997 (the first with no modern polling methodology; the second with only ICM, so averaging one poll and comparing it to the worst case out of that one poll doesn't really do much good, I'd agree), we're left with 2001 and 2005. If they're both about 4/1 against the "worst" beating "the average" (which it did with some style, being less than 1% off on the lead both times), that's about a 4% chance of it occurring by chance.

2. Very true. I'd also postulate that the bias will change for different values of support (Tories in the 40's would give differing biases to Tories in the low 30's, for example).

3. Surely that cancels with part of the "electoral tilt" seen on seat calculators? We could overcome it by treating the baseline for swing calculations as the final polling score for that company rather than the actual GB-wide score. Comparing like with like, so to speak (eg calculate the swing forecast by the current ICM poll as being from the final ICM poll in 2005 rather than the actual result), so any such error source would self-cancel.
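That like-with-like idea can be sketched in a few lines (Python; the figures in the example are hypothetical, chosen only to show a constant house effect cancelling):

```python
def project_lead(current_poll, same_company_final_last_time, actual_last_time):
    """Measure swing against the company's own final poll last time and
    apply it to the actual result, so a constant house bias cancels out.
    All leads are Con minus Lab, in points."""
    swing = current_poll - same_company_final_last_time
    return actual_last_time + swing

# Suppose a company's final 2005 poll showed Lab +4 (lead -4) when the
# actual lead was Lab +3 (lead -3): a constant 1-point pro-Labour house
# effect.  A current poll of Con +10 then projects to Con +11.
print(project_lead(10.0, -4.0, -3.0))  # 11.0
```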

Andy Cooke said...

P.S. to both,

I haven't mentioned my own favourite (although slightly unscientific) choice of polling:

I feel that ICM has probably the best technique for questions, but might "over-damp" slightly with their "spiral of silence" adjustment.

My personal choice (and I hasten to add that it's rather mathematically unjustifiable, but has done rather well in retrospect - although, of course, prior performance, etc etc.) is to take the ICM result and transfer 1 point from whoever was ahead in the previous election to their closest competitor.

Rationale being that the spiral of silence adjustment assumes a tendency towards the previous result and almost always transfers points from whoever lost last time to whoever won.

It's interesting to note the record of such a technique since the polling change:

1997, would have said: Con 32, Lab 44
2001, would have said: Con 33, Lab 42
2005, would have said: Con 33, Lab 37

Could of course be pure coincidence and obviously not endorsed mathematically.
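The transfer rule itself is mechanical enough to write down. A sketch (Python; the pre-adjustment ICM figures implied by the "would have said" numbers above are inferred for illustration, not sourced):

```python
def adjust_icm(shares, previous_winner, runner_up, points=1):
    """Transfer `points` from the previous election's winner to its
    closest competitor, reversing the assumed over-damping of ICM's
    spiral-of-silence adjustment."""
    adjusted = dict(shares)
    adjusted[previous_winner] -= points
    adjusted[runner_up] += points
    return adjusted

# Before 2001 (Labour won in 1997), a final ICM poll of Con 32, Lab 43
# would become the "would have said" figures of Con 33, Lab 42.
print(adjust_icm({"Con": 32, "Lab": 43}, "Lab", "Con"))
```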

RodCrosby said...

Having just done some simulations in Excel, I came up with the following.

For an unbiased set of 5 polls, each of 1000 sample size, 14% of the time the lowest poll would have the smallest deviation from the truth anyhow.

Add 0.5% +ve bias to all polls, and that rises to 30%

Add 1%, and it rises to 49% of the time.

Add 2%, and it rises to 85% of the time.

So, it's not totally unlikely to happen by chance if the polls are unbiased, but a relatively small amount of bias does improve the chance of Smithson's Law holding.
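Rod's experiment is easy to reproduce. A sketch using a normal approximation to binomial sampling error (exact percentages are sensitive to modelling details such as tie-handling and the assumed true share, so they may not match the Excel figures exactly, but the pattern of a small bias sharply raising the chance does reproduce):

```python
import math
import random

def prob_lowest_is_closest(bias=0.0, true_share=0.40, n_polls=5,
                           sample_size=1000, trials=50_000, seed=1):
    """Estimate how often the lowest of n_polls polls is also the closest
    to the true share, when every poll carries the same additive bias."""
    rng = random.Random(seed)
    sigma = math.sqrt(true_share * (1 - true_share) / sample_size)
    hits = 0
    for _ in range(trials):
        polls = [rng.gauss(true_share + bias, sigma) for _ in range(n_polls)]
        closest = min(polls, key=lambda p: abs(p - true_share))
        hits += closest == min(polls)
    return hits / trials

for bias in (0.0, 0.005, 0.01, 0.02):
    print(f"bias {bias:+.3f}: lowest poll closest "
          f"{prob_lowest_is_closest(bias=bias):.0%} of the time")
```

With no bias the lowest poll is rarely the closest, and the chance climbs quickly as bias is added.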

Anonymous said...

LOL PB is down with fewer than 10 comments to go before a million.

Chris A said...

One millionth!

Morris Dancer said...

Nooo, I tried to submit a comment and it died :(

Oliver said...

wow, that got really intense at the end

Me said...

The site crashed. It would be so easy if you all let me be the "one millionth poster".

James Burdett said...

It went bang. The hamsters must have died!

Andy Cooke said...

I heard a loud "Crash!" coming from the main site's server room ...

Andy Cooke said...

"Out of Cheese Error"
Please reinstall world

wibbler said...

Is this a nefarious plot by OGH to claim the millionth post for himself (on expenses)?

fr said...

Ladies Don't Lean Backwards

Oliver said...

man how long is this gonna take?

Me said...

Wibbler, exactly what I think.

LS said...

The last I saw was 999996.

Mitchell Stirling said...

That was predictable.

Go easy on the F5 duty guys and girls.

James Burdett said...

'Tis back, but no post million.

wibbler said...

No fair! It seems to have gone to someone posting on a previous thread (just after Aaron at 130).

wibbler said...

EDIT: It was 'penddu' saying 'Dorothy' at post 124.

The numbering is highly out of sequence.

Chris A said...

it was actually penddu at 124

Anonymous said...

It depends what you are trying to show. Some would argue that absolute numbers are pretty meaningless, since we have no good way to translate those numbers into seats.

As such the main value of polls is in showing movement over time. In which case averaging is a perfectly sensible approach.