Whenever you go and collect data and do an initial analysis on it, there’s always one reason or another why it just doesn’t seem to fit with expectations. Perhaps there have been some gaps in the data? Or perhaps one series of data points seems to be over-represented? So, it’s the most natural thing in the world for the statistician to seek to fill-in or even-out these gaps, perhaps by reference to some sort of historical pattern or even by some sort of fudge-factor until the result seems more sensible and in-line with what you’d expect.
And, of course, that’s what Phil Jones and his colleagues at the University of East Anglia, just down the road from me, did for about 15 years with his climate change models.
With hindsight it’s fairly easy to see that Jones and his colleagues had bought into a fixed notion that temperatures would be expected to rise as the amount of CO2 in the atmosphere increased. And so he designed his models accordingly. And then he got into a series of positive-feedback loops that progressively increased and then exaggerated the minor increases in the temperature record to fit into the peer-reviewed collective wisdom.
It didn’t matter that the fastest rises in CO2 occurred in the mid-1800s [Krakatoa?] or that the weather during the Battle of Hastings was warmer than today [the Medieval Warm Period]. A narrative had taken hold, and people then became so engrossed in the computer models and their assumptions that they forgot to look out of the window to see what was really happening outside.
So, is the 2010 political polling reflecting what we’re seeing as we look out of our own windows? Is this poll narrowing really reflecting the conversations we’re having with colleagues at work or in the pub?
Have the pollsters’ models got out of kilter with what we’re experiencing on the ground? Which is why I want an answer to Richard Tyndall’s comment:
But I do have doubts about these Yougov daily polls. The difference between the pre and post weighting numbers seems to me to be unjustified and I would like to see this confirmed by a ‘real’ poll before saying whether or not the Tories are getting what they deserve for their recent mishandling of policy.
But before I highlight a worked example, let’s just get some things straight so you don’t characterise me as a Climate-Change or Polling denier. Let’s review some of the events of the last few weeks.
• As Bob Worcester says, it’s the share, not the lead that’s important. You only need a majority of one to win a seat. It’s clear that the Tories’ lead has narrowed over the last few weeks, even if their share has remained quite stable in the 38-40pc range.
• Labour’s share has increased. It’s been wall-to-wall Labour on the telly so Mike’s third rule applies: The more you’re on the box, the better you do in the polls. Their improvement over the last week seems to partly prove the saying that ‘There’s no such thing as bad publicity’.
• The LibDems have dropped and are flat-lining at 17/18pc and their misfortune has been Labour’s gain. The LibDems and Labour seem to be fishing in the same pool. As the LibDem vote falls, Labour’s rises, which has contributed most of all to Labour’s recovery.
• We also know that all-of-a-sudden Labour is doing substantially better in Scotland at the expense of the SNP so in aggregate over the whole of Great Britain, Labour’s percentage is higher. It’s debatable whether the increase is reflected in the English Towns, where the battle will be won or lost.
Oh yes, and we also know that a few weeks ago YouGov changed their weighting methodology in preparation for their daily polls, noting that Tory voters tended to respond more quickly to invitations-to-survey and that older voters often missed out because they only check their emails every few days.
The goalposts just got moved. Someone at YouGov just pressed the ‘Reset’ button. What’s the effect of this been?
So, let’s take a close look at one of the recent polls: the YouGov results from 25th February, for which fieldwork was conducted on 23/24th February.
First of all, the sample was reported to be 1473. Of the 1473 sample, the ‘Headline’ result was said to be Con 38%; Lab 32%; LibDem 19%; Others 10%, amounting to 99% with rounding errors. This excludes the don’t knows and the will not vote.
Of the 1473 sample just over 400 respondents were either in the others [10%], will not vote [7%] or undecided/Don’t know [13%] categories. That surprised me and suggests that all the parties still have a lot to play for. With 10% on ‘others’ and 20% of voters undecided, it’s still Game-On for the big parties.
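A rough sketch of how the "just over 400" figure can be reconstructed. It assumes (this is an assumption, not something stated in the YouGov tables) that the will-not-vote and don't-know percentages are shares of the whole sample, while the 10% for others is a share of those who named a party. The function and variable names are purely illustrative.

```python
# Rough reconstruction of the "just over 400" respondents outside the main three.
# Assumption: 7% (will not vote) and 13% (don't know) are shares of the whole
# sample, while the 10% for 'others' is a share of those who named a party.
SAMPLE = 1473

def excluded_respondents(sample, wont_vote=0.07, dont_know=0.13, others=0.10):
    """Estimate how many respondents are not backing one of the main three."""
    no_intention = sample * (wont_vote + dont_know)
    named_party = sample - no_intention
    other_parties = named_party * others
    return no_intention + other_parties

excluded = excluded_respondents(SAMPLE)
main_three = SAMPLE - excluded
print(f"Excluded: {excluded:.0f}, left for the main three: {main_three:.0f}")
```

On those assumptions roughly 412 respondents drop out and about 1,060 remain, which is close to the 1057 weighted main-party respondents in the YouGov figures.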
The press is lapping-it-up. The Hung Government narrative sells papers.
Now, the pollsters’ job is to try and predict the national share. But is estimating the national share the same as predicting the result of the election? Andy Cooke has been postulating that the marginals are behaving differently and Blair Freebairn’s suggested that the key battleground is the METHHs, the medium English Towns and their hinterland. I agree.
So, now I’m going to do something controversial to illustrate a point. Just hear me out while I construct this straw man. There’ll be plenty of people lining-up to demolish it in the comments so I’ll put my tin hat on now!
I’m going to make the simplistic point that in the English METHH marginals, the battle is going to be a three-way fight. The contaminating effect of the nationalist parties in Scotland and Wales [SNP/PC] will be zero. I’m also going to make the intellectual leap that in a ‘change’ election, UKIP and the Greens [except Brighton & Norwich South] will be squeezed too.
So let’s just see what happens in what’s left of the YouGov weighted sample when we discount all except those who say they’ll vote for the main parties. In this scenario, what’s left are 1057 [71% of the whole sample] respondents out of the original 1473. I’m going to suggest that this forms a proxy for the 3-way English-Town fight, where the battle will be won or lost.
In the 1046 Raw Sample, there were 496 [47%] Con; 333 [32%] Lab & 217 [21%] LibDem. In 1057 weighted Headline Result, there were 453 [43%] Con; 375 [35%] Lab & 229 [22%] LibDem. Oooh. That's a big difference between the two!
In the weighted-sample three-main-parties-only figures, the Tories are 8% ahead… and that’s including Labour’s Scottish vote, which we know is disproportionately high north of the border and irrelevant for the METHHs. In the unweighted one, they're 15% clear.
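For anyone who wants to check the sums, here is a minimal sketch that recomputes the three-party shares and the Con lead from the respondent counts quoted above. The helper names are illustrative only.

```python
# Three-party shares and Con lead, using the respondent counts quoted above.
RAW = {"Con": 496, "Lab": 333, "LibDem": 217}        # unweighted, total 1046
WEIGHTED = {"Con": 453, "Lab": 375, "LibDem": 229}   # weighted, total 1057

def shares(counts):
    """Return each party's percentage share of the three-party total."""
    total = sum(counts.values())
    return {party: 100 * n / total for party, n in counts.items()}

def con_lead(counts):
    """Conservative lead over Labour in percentage points (unrounded shares)."""
    s = shares(counts)
    return s["Con"] - s["Lab"]

for label, counts in (("raw", RAW), ("weighted", WEIGHTED)):
    rounded = {party: round(pct) for party, pct in shares(counts).items()}
    print(label, rounded, f"Con lead {con_lead(counts):.1f} points")
```

Worked from unrounded shares, the leads come out at about 15.6 points raw and 7.4 points weighted. The 15% and 8% quoted are the differences between the rounded shares.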
Interestingly, these are pretty similar results to the Angus Reid polls that have been studiously ignored by the national press, which have shown an additional swing over-and-above in the marginals.
In the weighted result, the Tories have been scaled-back from 496 to 453 [-43] and Labour improved from 333 to 375 [+42]. The LibDems have been boosted by 12, moving from 217 to 229. That represents a swing away from the Tories to Labour across the whole country of 5.7%. That’s quite a difference in a poll that gave the Tories a 6% lead.
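The 5.7% is not derived step by step here. One way to arrive at it (an assumption about the arithmetic, not a stated method) is to add the 43 taken off the Tories to the 42 added to Labour and express the total as a share of the 1473 sample. The sketch below does that, using the weighted Con count of 453 quoted earlier, and also computes the conventional two-party swing (the average of the Con loss and the Lab gain in share) for comparison. All names are illustrative.

```python
# Size of the weighting adjustment, measured two ways.
SAMPLE = 1473
RAW = {"Con": 496, "Lab": 333, "LibDem": 217}
WEIGHTED = {"Con": 453, "Lab": 375, "LibDem": 229}

def adjustments(raw, weighted):
    """Change in respondent count per party caused by the weighting."""
    return {party: weighted[party] - raw[party] for party in raw}

def share(counts, party):
    """Percentage share of the three-party total for one party."""
    return 100 * counts[party] / sum(counts.values())

def two_party_swing(raw, weighted):
    """Conventional Con-to-Lab swing: average of Con loss and Lab gain."""
    con_loss = share(raw, "Con") - share(weighted, "Con")
    lab_gain = share(weighted, "Lab") - share(raw, "Lab")
    return (con_loss + lab_gain) / 2

delta = adjustments(RAW, WEIGHTED)
# Assumed reading of the 5.7%: respondents moved, as a share of the full sample.
moved = abs(delta["Con"]) + delta["Lab"]
moved_pct_of_sample = 100 * moved / SAMPLE
swing = two_party_swing(RAW, WEIGHTED)

print(delta)
print(f"Moved as share of sample: {moved_pct_of_sample:.2f}%")
print(f"Conventional swing: {swing:.1f} points")
```

On these counts the first measure gives just under 5.8% and the conventional swing about 4.1 points, so it matters which definition is meant when the adjustment is set against the 6% lead.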
If we accept that the 3-way fight is a proxy for the METHHs, there is still some comfort in the polls for the Tories. They’re ahead where they need to be but it's still squeaky-bum time. Their campaign has faltered since January by firing out a series of policies in a scattergun approach without communicating a core narrative. It has been exposed as a great mistake, which is being punished.
But this weekend allows them to press their own ‘Reset’ button before the voters become engaged in the campaign proper.
I fully accept that the pollsters must re-weight their samples and the way in which they do this is their intellectual property and the value in their business. And it’s good business. I’ve used YouGov myself. And been pleased by the results.
But with today's gap apparently narrowing to 2%, betting money at stake, and the future of the country hanging in the balance, we’re putting a lot of trust in the YouGov computer models, which have been recently tweaked. Are the tweaks, in this case resulting in a 5.7% Con-Lab swing, over-cooking it? The Tories were 6% ahead overall. The weighting is as much as the lead. It's non-trivial.
We know what’s going on here: Pollsters know that, back in 2005 a certain proportion of people backed Labour. This year the pollsters are calling voters and not as many people are saying that they’re voting Labour as before. We can see that in the raw data.
So, it may well be that a large number of people really aren’t going to vote Labour after all. Or the pollster may infer that, on a historical basis, his sample is just under-weight in Labour supporters, and increase the Labour share with technical adjustments over-and-above the normal demographic/occupational adjustments to align with the long-term model trend.
And if so, this is the fudge-factor. This is Dr Jones of the UEA’s “Mike’s Nature Trick”. That fudge in this case amounts to 5.7%. It needs to be explained.
Which brings me back to my initial point. Are there parallels here between the muddle with the scientists down the road at the UEA over ClimateGate and what we’re seeing here? Are we seeing too much reliance on black-box systems, whose complexity is now divorcing them from the system that they’re trying to model?
I’m not having a pop at YouGov. I don’t know the answer, but you’ll excuse me for posing the question with a seeming 5.7% structural adjustment being applied.
As Political Punters, we now have a judgement call to make, safe in the knowledge that if things generally look too good to be true, then they normally are too good to be true.
• Do the national voting intentions accurately reflect the likely result of the election in 640 seats?
• Is the evidence of our own eyes telling us that only one voter in twenty has changed their mind since 2005?
• Do we trust the black-box polling models when the weightings are so large? Are they arbitrary?
• How much of the polling is science and how much of the prediction is art? And is this why the betting markets have been relatively stable over the last few weeks?
Peter Kellner’s coming on the site on Tuesday. I hope that this thread can collate the questions that we need to ask him.