June 17, 2014

Margins of error

From the Herald

The results for the Mana Party, Internet Party and Internet-Mana Party totalled 1.4 per cent in the survey – a modest start for the newly launched party which was the centre of attention in the lead-up to the polling period.

That’s probably 9 respondents. A 95% interval around the support for Internet–Mana goes from 0.6% to 2.4%, so we can’t really tell much about the expected number of seats.
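As a rough check on numbers like these, here is a minimal sketch in Python. The count of 9 and the sample size of about 650 are back-calculated guesses from the quoted 1.4 per cent, not figures from the published poll.

```python
# Exact (Clopper-Pearson) confidence interval for a binomial proportion.
# The inputs (9 supporters out of roughly 650 respondents) are guesses
# reconstructed from the reported 1.4%, not published poll figures.
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

low, high = clopper_pearson(9, 650)
print(f"95% CI: {low:.1%} to {high:.1%}")  # about 0.6% to 2.6% for these inputs
```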

Also notable

Although the deal was criticised by many commentators and rival political parties, 39 per cent of those polled said the Internet-Mana arrangement was a legitimate use of MMP while 43 per cent said it was an unprincipled rort.

I wonder what other options respondents were given besides “unprincipled rort” and “legitimate use of MMP”.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Megan Pledger

    These polls seem to have really variable weights so I don’t know that you can predict very accurately how many actual respondents supported I/M/IM.

    They don’t often tell you the raw numbers, just the percentages for party support, but Colmar Brunton gave raw numbers one week and percentages the next, so it’s possible to get a rough estimate of how far off they are.

    From the Colmar Brunton poll that you talked about here, there were 222 Nat supporters and 153 Lab+Green supporters. Using the data from the survey they did the next week, there ought to have been 212 (95% CI 191-232) and 170 (95% CI 151-191) supporters respectively (after taking out the 17% don’t knows/refused).

    While 153 Lab+Green supporters squeaks into the 95% CI of 151-191, it’s a little concerning that the Lab+Green data appears to be upweighted so much.

    What I really would like from these polls is the design effect: the actual sample size divided by the effective sample size, where the effective sample size is the square of the sum of the weights divided by the sum of the squared weights. The design effect should factor into the margin of error, by using the effective sample size rather than the actual sample size, and my guess would be that it would easily double the margin of error.
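    To make the arithmetic concrete, here is a sketch of that calculation with invented weights; none of the numbers below come from an actual poll.

```python
# A sketch of Kish's approximate design effect, with invented weights.
import numpy as np

def kish_deff(w):
    """Actual sample size divided by the effective sample size."""
    w = np.asarray(w, dtype=float)
    n_eff = w.sum() ** 2 / (w ** 2).sum()   # effective sample size
    return len(w) / n_eff

rng = np.random.default_rng(1)
w = rng.lognormal(sigma=0.7, size=1000)     # hypothetical, quite variable weights

deff = kish_deff(w)
moe_srs = 1.96 * np.sqrt(0.25 / len(w))     # the usual "plus or minus 3.1%" at n=1000
print(f"deff = {deff:.2f}, adjusted MoE = {moe_srs * np.sqrt(deff):.1%}")
```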

    10 years ago

    • Thomas Lumley

      For the parties that have been around for a while, you can get a good idea of the design effect by looking at the dispersion of poll results around the long-term trend, eg using Peter Green’s code.

      I’d be surprised if the design effect were as high as four, which is what it would take to double the margin of error.
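      As a sketch of that dispersion idea (Peter Green’s actual code is in R and isn’t reproduced here): smooth the poll series, then compare the observed scatter with the binomial variance a simple random sample of the same size would produce.

```python
# Sketch: estimate the design effect from poll scatter around a trend.
# p is a series of poll proportions, n the matching sample sizes; the
# simulated inputs below are placeholders, not real poll data.
import numpy as np

def deff_from_dispersion(p, n, window=5):
    p, n = np.asarray(p, float), np.asarray(n, float)
    trend = np.convolve(p, np.ones(window) / window, mode="same")  # crude smoother
    srs_var = trend * (1 - trend) / n          # variance under simple random sampling
    return np.mean((p - trend) ** 2 / srs_var)

rng = np.random.default_rng(2)
true = 0.45 + 0.02 * np.sin(np.linspace(0, 6, 40))   # slowly drifting support
p = rng.binomial(1000, true) / 1000                  # simulated SRS polls, n=1000
print(deff_from_dispersion(p, np.full(40, 1000.0)))  # near 1 for pure SRS (slightly
                                                     # below, as the smoother absorbs
                                                     # some of the noise)
```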

      10 years ago

    •

      Hi Megan

      Don’t forget that the results are also filtered on likelihood to vote, which differs by Party Support. I agree with your point though.

      Cheers
      Andrew

      10 years ago

      •

        Yet another vector stuck into the Rim Weighting without any analysis or understanding of what it might be doing to the estimates.

        10 years ago

        •

          There was supposed to be a ? on the end of that sentence: do they really use likelihood to vote and bin it as yet another weighting vector? The other way is to base the entire analysis on people who say they voted in the last election. That, of course, creates issues with the youngest voting group, who were too young to vote in the previous election.

          10 years ago

    •

      Also, how do you calculate the design effect for quota surveys, which may not weight their data at all?

      10 years ago

      • Megan Pledger

        Quota samples aren’t probability samples, so all the sampling theory from statistics doesn’t apply.

        If you assume that they are just like simple random samples (which, going by how they are analysed, people seem to do) then the weights are 1 and the design effect is 1. ;->

        But here’s a poll that failed spectacularly recently (way worse than Romney’s):
        http://talkingpointsmemo.com/livewire/eric-cantor-david-brat-challenger-tea-party-internal-poll

        I suspect that’s more about not having a good definition of who should be in the sampling frame.

        10 years ago

    •

      The technique used to do weighting is Rim Weighting. It goes by other names. Here is an article which gives a little intro. It basically just uses the marginal totals for different variables, not the actual counts from the multidimensional table: say, a vector of age-group totals and then a separate vector of totals for large city/smaller city/rural or what have you. Note the last little warning paragraph, “read this twice”:

      http://statpac.com/updates/rim-weighting.htm

      Long ago I had a copy of work Deming did which showed that, depending on the true underlying counts in the multidimensional table, Rim Weighting (he called it Iterative Proportional Fitting, I believe) could make your estimates better or worse. The Catch-22 is that you only use Rim Weighting because you lack the full data, and thus you have no way to know whether your weighted data is more poorly estimated!

      So you would only use it when you had to, right? Nope. When I was the head of computing and statistics for the largest Market Research Company in NZ, Rim Weighting was used all the time because that is what was in the computer package (the Quantum software started it, as far as I know; everybody else just followed). Nobody (except me) looked at the variation in the weights. I saw ratios of 12 to 1 in weights. I call bogus.

      Megan, your estimate of at least twice the stated margin of error is what I’ve come up with based on insider knowledge, access to the original data, and using a more complete statistical model of error (more than just random sampling).

      Things may be better now, but I don’t think so. Just start by asking each research company if they use Rim Weighting. It is the method in the software which Morgan provides to clients (and uses in house). It is in SurveyCraft (used by many other companies). It just happens to be bogus in my book.

      10 years ago

    •

      The technique used to do weighting is Rim Weighting. It goes by other names. Here is an article which gives a little intro. It basically just uses the marginal totals for different variables, not the actual counts from the multidimensional table: say, a vector of age-group totals and then a separate vector of totals for large city/smaller city/rural or what have you. Note the last little warning paragraph, “read this twice”:

      http://statpac.com/updates/rim-weighting.htm

      Long ago I had a copy of a paper Deming did which showed that, depending on the true underlying counts in the multidimensional table, Rim Weighting (he called it Iterative Proportional Fitting, I believe) could make your estimates better or worse. The Catch-22 is that you only use Rim Weighting because you lack the full data, and thus you have no way to know whether your weighted data is more poorly estimated!
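      For concreteness, here is a minimal sketch of the algorithm on a two-way table. The counts and target margins are invented, and real rim weighting typically runs over several margins at once.

```python
# Iterative proportional fitting (rim weighting / raking) on a 2-way
# table: scale the cells until both margins match the targets, using
# only the marginal totals. All numbers here are invented.
import numpy as np

def rake(table, row_targets, col_targets, tol=1e-9, max_iter=1000):
    w = table.astype(float).copy()
    for _ in range(max_iter):
        w *= (row_targets / w.sum(axis=1))[:, None]   # match the row margin
        w *= (col_targets / w.sum(axis=0))[None, :]   # match the column margin
        if np.allclose(w.sum(axis=1), row_targets, atol=tol):
            break
    return w

sample = np.array([[50.0, 30.0], [20.0, 100.0]])      # e.g. age group x urban/rural
raked = rake(sample, np.array([90.0, 110.0]), np.array([80.0, 120.0]))
print(raked / sample)   # per-cell weights; a wide spread here is the red flag
```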

      So you would only use it when you had to, right? Nope. When I was the head of computing and statistics for the largest Market Research Company in NZ, Rim Weighting was used all the time because that is what was in the computer package (the Quantum software started it, as far as I know; everybody else just followed). Nobody (except me) looked at the variation in the weights. I saw ratios of 12 to 1 in weights. It is not best practice to use weights so extreme.

      Megan, your estimate of at least twice the stated margin of error is what I’ve come up with too, based on insider knowledge, access to the original data, and a more complete statistical model of error (more than just random sampling) that includes non-response and refusals (another thing research companies don’t report), people we were unable to contact, and people who don’t know how they will vote.

      Things may be better now, but I don’t think so. Just start by asking each research company if they use Rim Weighting. It is the method in the software which Morgan provides to clients (and uses in house). It is in SurveyCraft (used by many other companies). It just happens to be too far away from best practice in my book. Especially if nobody bothers to look at the distribution of weights.

      10 years ago

  •

    Hmmm. Weird double post, and you get to see me calling the method bogus before I went for “not best practice”. My modem went down part way through posting and I quickly copied my text into a window on my own machine, where I did my final edit.

    10 years ago

  •

    If it is Ipsos you are thinking about regarding quota sampling, there is a methodology section here:

    http://find.ipsos.co.nz/Fairfax-Ipsos/14.02/Poll14.02.15/methodology.html

    To me it reads like the standard Market Research methodology. The data could be treated as having been obtained by a stratified sample (stratified by age, gender, ethnicity, urban/rural) but I have yet to meet anybody who actually does the calculations to use the proper design weights for a stratified sample. All they do is press the button for Rim Weighting by the margins of the 4-dimensional table. I don’t even know if they incorporate the proper weights to adjust for the fact that their sampling is by household but they want to know about individuals, so they take one person per household by next birthday. That means each person is selected with probability 1/n within the household, where n is the number of eligible people, so they should carry a weight proportional to n.

    Note also that when I had a close look at all the crosstabulation packages (I was writing code to reproduce their results, so we are talking a very close look), not one adjusted any significance tests for the effects of the data having been weighted. The degrees of freedom and variances were not adjusted in any way. They report unweighted percentages (for better or worse) and weighted marginal counts (usually), but unadjusted significance tests.

    You can tell from the Ipsos link that they are treating their weighted data as if it were a simple random sample. They have just read off the row and column of the margin-of-error table for n = 1000 and a percentage of 35% or 65%. At least it makes a change from quoting the 50% entry.
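    Both issues are easy to illustrate with invented data: the household-size weight, and the gap between the naive simple-random-sample standard error and one based on the effective sample size.

```python
# Sketch: within-household selection weights and the naive vs adjusted
# standard error of a weighted proportion. All numbers are invented.
import numpy as np

rng = np.random.default_rng(3)
eligible = rng.choice([1, 2, 3, 4], size=1000, p=[0.25, 0.45, 0.2, 0.1])
w = eligible / eligible.mean()          # weight proportional to household size,
                                        # since each adult had a 1/n chance of
                                        # being the one picked by next birthday
y = rng.binomial(1, 0.4, size=1000)     # hypothetical yes/no survey answers

p_hat = np.average(y, weights=w)                  # weighted proportion
n_eff = w.sum() ** 2 / (w ** 2).sum()             # Kish effective sample size
se_naive = np.sqrt(p_hat * (1 - p_hat) / len(y))  # what the packages report
se_adj = np.sqrt(p_hat * (1 - p_hat) / n_eff)     # adjusted for the weighting
print(f"p = {p_hat:.3f}, naive SE = {se_naive:.4f}, adjusted SE = {se_adj:.4f}")
```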

    10 years ago

  •

    Steve

    Some pollsters and statisticians do actually carefully consider the variance they are sacrificing through error/bias correction. I can’t speak for all.

    Andrew

    10 years ago

    •

      Thanks Andrew. That’s good to hear. As far as I know, proper consideration has been restricted to universities and the Stats Department.

      Do you know about Rim Weighting and have you used it? I’d love to hear what others who have achieved statistical consciousness think about it.

      Maybe things are much improved. I’ve been away from Market Research for many years. My only direct info now is seeing the same software being used and the same things written in the manuals. This issue comes up about once every three years (no guessing as to why) and I go and check again. As I have now.

      The basic problem back when I was involved is that the market is competitive and if we priced a job with higher specs and specialist analysis we didn’t get the job.

      No MR company would break ranks and ever quote more than the margin of error computed assuming perfect response and simple random sampling, because if they did, the customers’ eyes would glaze over and they would ask why our estimates had more error than the other research companies’. Clearly that would mean they were getting better value from the other research companies.

      Nobody at a Newspaper or TV station is willing to pay for research where the results keep coming back “nothing has changed”. That isn’t a story. And when I was involved MR companies only took on political “give away your research” polling as an exercise in corporate branding — not to make money.

      Has the whole dynamic actually changed?

      10 years ago

  •

    Hi Steve

    Yes, I know about rim weighting, and yes, I’ve used it on some of the surveys I’ve designed. I agree you need to be careful when using it – you can get some odd interaction effects, especially for sub-groups.

    Most of the research I do is social research, for government agencies. Higher specs and specialist analysis are an easier sell for these orgs, whose research is often required to stand up to ministerial, public or academic scrutiny.

    I’ve experienced some of what you describe, but among my clients higher specs are quite often requested/demanded, and decisions are less likely to be made on price alone. I don’t think the industry has changed, as this has been my experience with my clients over the past 10 or so years. As an example, we’ve responded to briefs that simply insist on probability sampling and that require 50-100 page methodology/sampling success reports. I like these briefs (eg, MSD Living Standards Survey, and others).

    Yes polls can be great for branding, or not.

    Cheers
    Andrew

    P.S. My question about design effects for quota surveys was rhetorical. My point was that we can’t demand that all polls report MoEs based on the effective sample size, when it’s not actually possible to calculate effective sample sizes for some polls. I used to go on-and-on about this (I’m like that), but I chilled out about it after people got annoyed :) Also, quota surveys have been fairly on-the-money over past elections (not saying the sampling approach is the sole determinant of this).

    P.P.S. Excuse autocorrect and typos (iPad)

    10 years ago

    • Thomas Lumley

      As a translation note, “rim weighting” is now called “raking” in the statistical literature.

      10 years ago