February 23, 2013

When in doubt, randomise.

There has been (justified) wailing and gnashing of teeth over recent year-9 maths comparisons, and the Herald reports that a ‘back to basics’ system is being considered:

Auckland educator Des Rainey, who did the research with teachers to test his home-made Kiwi Maths memorisation system, said the results came as a shock to the teachers and made him doubt his programme could work.

But after a year of practising multiplication and division on the Kiwi Maths grids for up to 10 minutes a day, the students more than doubled their speed.

This program looks promising, but why is anyone even talking about implementing a major nationwide intervention based on a small, uncontrolled before/after comparison measuring a surrogate outcome?

That is, unless you believe teachers and schoolchildren are much less individually variable than, say, pneumococci, you would want a randomised controlled comparison. And since presumably Des Rainey would agree that speed of basic arithmetic is important primarily because it’s a foundation for actual numeracy, you’d want to measure the success of the program based on numeracy tasks rather than on arithmetic speed. The results being reported are what the medical research community would call a non-randomised Phase IIa efficacy trial — an important stepping stone, but not a basis for policy.
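To see why the uncontrolled comparison is so weak, here is a toy simulation (all numbers invented, nothing to do with the actual Kiwi Maths results): if children get faster at arithmetic over a school year anyway, a before/after comparison adds that maturation to any genuine programme effect, while a randomised control arm separates the two.

```python
# Toy simulation: why an uncontrolled before/after comparison can
# overstate a programme effect. All numbers are invented assumptions.
import random
from statistics import mean

random.seed(1)

N = 200              # hypothetical pupils per arm
MATURATION = 10.0    # speed gain over a year with no intervention at all
TRUE_EFFECT = 4.0    # extra gain genuinely due to the programme

def after_one_year(baseline, treated):
    gain = MATURATION + (TRUE_EFFECT if treated else 0.0)
    return baseline + gain + random.gauss(0, 5)

baseline = [random.gauss(50, 10) for _ in range(2 * N)]
# Randomise: the first N pupils get the programme, the rest don't
treated = [after_one_year(b, True) for b in baseline[:N]]
control = [after_one_year(b, False) for b in baseline[N:]]

# Before/after comparison: maturation + effect, looks like ~14 points
print("before/after 'effect':", round(mean(treated) - mean(baseline[:N]), 1))
# Randomised comparison: isolates the ~4-point programme effect
print("randomised estimate:  ", round(mean(treated) - mean(control), 1))
```

With real data you would also worry about regression to the mean and practice effects on the test itself, which push in the same direction.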

Of course, that’s not how education works, is it?


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient. See all posts by Thomas Lumley »

Comments

  • Wesley Burr

    In general, the complete lack of any sort of statistical rigor in education saddens me. All of these “movements” that sweep K12 education are based on single charismatic instructors who come up with a system that works for them, in controlled settings, with certain students. Why would this *ever* generalize to arbitrary instructors with arbitrary students? Far too much personality worship and hand-waving justifications, and far too little actual science.

    11 years ago

    • Thomas Lumley

      And, to be fair, the changes in the high-school stats curriculum that my colleagues have persuaded people to adopt are also not as well evaluated as one would like.

      They’ve gathered what data they can from both a set of pilot schools and from the UoA Stage 1 stats course, and this isn’t a “single charismatic instructor who comes up with a system that works for them” but I believe it’s still all uncontrolled before/after comparisons.

      I suspect that part of the problem is that parents are more likely than patients to think they know the right thing to do. And partly that the worst failures of expert professional judgement in medicine are so much more dramatic than in education.

      11 years ago

      • Wesley Burr

        Dramatic in timing, absolutely. I think that a failure of a school system can still be dramatic, it’s just much harder to point to the causes because they can be lagged 10-20 years.

        11 years ago

  • Peter J Keegan

    There isn’t a complete lack of statistical rigor in education as Wesley suggests. Too many people rely on the media as their primary source of educational statistics and their interpretation. Good statistical studies in education are rarely reported, as they often are not headline material or contradict current policies or individual ideologies.

    11 years ago

    • Thomas Lumley

      I agree that the media doesn’t present the evidence well, but I still think it’s true that there are almost no randomised trials of teaching strategies.

      11 years ago

      • Thomas Lumley

        To expand: I’m not saying observational research is useless — I do a lot of it myself. But even the best observational research only gets you so far, as medicine and public health have painfully learned over the past half-century.

        I don’t have an informed opinion on the standard of observational research in education, but if the standard is high, there can’t still be lots of low-hanging fruit in the form of cost-neutral interventions whose benefit is obvious without comparative evaluation.

        11 years ago

    • Wesley Burr

      I have several colleagues who work in Mathematical Education, and from the journal articles they show me, I’d still lean toward my original (admittedly slightly inflammatory) statement. I’m not getting my interpretations from the media, but from the literature of the field itself.

      And, to cherry pick an example, most of the United States education system appears to be unwilling to even consider statistical examination of their methodologies. I’m not familiar with the NZ system beyond a cursory knowledge, and am not trying to generalize to it, beyond a ‘guilt by association’ standard. :)

      I agree with Thomas’ post below: I’m not familiar with any serious attempts at randomized trials for education. There are many cases where it could be effectively done, e.g. some of the ‘Charter School’ movements in the US are well set up with accept/reject status such that a randomized acceptance, carefully followed, might be interesting. It isn’t (and again, I point to my inflammatory statement) because there seems to be a strong unwillingness to even consider that the newest and greatest method might have flaws. That interpretation *is* coming from the media, but when proponents of a system say such inane things, it’s hard not to interpret it accordingly.

      11 years ago

  • Hans Hockey

    As Ben Goldacre (of Bad Pharma fame) says in

    http://www.guardian.co.uk/commentisfree/2011/may/14/bad-science-ben-goldacre-randomised-trials

    Do school uniforms really improve attendance? Run a trial and find out.

    11 years ago

  • Megan Pledger

    Wesley Burr said:
    “There are many cases where it could be effectively done, e.g. some of the ‘Charter School’ movements in the US are well set up with accept/reject status such that a randomized acceptance, carefully followed, might be interesting.”

    There have been some experiments where kids who entered a charter school lottery have been followed up according to whether or not they got into the charter school, to see how they go on to succeed.

    The main problem is that peer effects are entangled with school effects. See how charter schools “select” students prior to the lottery (or “accidentally” leave people out of the lottery)
    http://www.reuters.com/article/2013/02/15/us-usa-charters-admissions-idUSBRE91E0HF20130215
    and you can see that the peers of a student getting into a charter school are different from those of a kid heading back to the local public school. (Even so, most of the time that public school is doing as well as, if not better than, the charter school:
    http://credo.stanford.edu/reports/National_Release.pdf )

    Another problem is that experimenting in schools is hard because of the structure of schools, e.g. city/district – school – class/teacher – pupil. And even then you probably want to stratify at the school level for school type (private, public, single-sex, etc.).
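    As a concrete sketch of that design (school names and types invented purely for illustration), randomising whole schools within school-type strata looks something like this:

    ```python
    # Minimal sketch: stratified cluster randomisation, assigning whole
    # schools to arms within school-type strata. All schools are invented.
    import random
    from collections import defaultdict

    random.seed(2013)

    schools = [
        ("School A", "public"), ("School B", "public"),
        ("School C", "public"), ("School D", "public"),
        ("School E", "private"), ("School F", "private"),
        ("School G", "single-sex"), ("School H", "single-sex"),
    ]

    strata = defaultdict(list)
    for name, school_type in schools:
        strata[school_type].append(name)

    assignment = {}
    for members in strata.values():
        random.shuffle(members)          # random order within each stratum
        half = len(members) // 2
        for name in members[:half]:
            assignment[name] = "intervention"
        for name in members[half:]:
            assignment[name] = "control"

    for name, arm in sorted(assignment.items()):
        print(name, "->", arm)
    ```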

    The other thing is that treatments can’t (usually) be blinded which brings in a whole lot of biases.

    11 years ago

    • Thomas Lumley

      The difficulty in blinding is a problem for deciding whether things work, but it’s not an argument against randomisation.

      11 years ago

      • Megan Pledger

        It was an argument for experimenting in schools being hard, not against randomization.

        If teachers don’t buy into the treatment, then that’s 25-30 observations being affected.

        11 years ago

        • Thomas Lumley

          Yes, but we know more or less how to handle cluster randomisation. In fact, we even know how to handle cluster randomisation in schools.

          For example, the Hutchinson Smoking Prevention Project randomised school districts to get or not get a sophisticated anti-smoking intervention. The intervention didn’t work, but the trial certainly did.
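          One textbook way to handle it, sketched here with made-up numbers rather than anything from that trial, is to analyse at the level you randomised: collapse each school to its mean and compare the school means between arms, since pupils within a school are not independent.

          ```python
          # Minimal sketch of a cluster-level analysis. Data invented.
          import math
          from statistics import mean, stdev

          # outcome scores by school (each inner list = pupils in one school)
          intervention_schools = [[62, 58, 65, 70], [55, 60, 59], [68, 72, 66, 64]]
          control_schools      = [[54, 57, 52, 60], [58, 55, 53], [61, 59, 57, 60]]

          x = [mean(s) for s in intervention_schools]   # one summary per cluster
          y = [mean(s) for s in control_schools]

          diff = mean(x) - mean(y)
          se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
          print(f"difference in school means: {diff:.1f} (t = {diff / se:.1f})")
          ```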

          11 years ago

        • Megan Pledger

          @ Thomas Lumley
          I didn’t say we didn’t know how, I said it was hard.

          The smoking trial would have had good buy-in from teachers because
          a) smoking is bad for health and
          b) it’s not about teaching.

          11 years ago

  • Richard Penny

    And there is Andrew Gelman and Eric Loken’s article in Chance, vol 25(1), 47-48, “Statisticians: When we teach, we don’t practice what we preach.”

    11 years ago

    • Thomas Lumley

      Yes, which I posted about when it came out, in the previous post with the “When in doubt, randomise” title.

      11 years ago

  • Thomas Lumley

    Yes, I agree it’s hard, but it was hard in medicine as well.

    I’d say the real obstacle is not the (genuine) difficulties, but the lack of demand and the consequent lack of funding to do it. I’m not arguing that education researchers should just do randomisation — without budgets that’s not possible — but in order to get there we need a recognition that it’s important.

    11 years ago

    • Megan Pledger

      Actually, from what I have observed, clustering is pretty much ignored in drug trials (although I haven’t kept up in the last 10 years or so). My suspicion would be that it wouldn’t make a lot of difference (in NZ and OZ) anyway, unless you have clusters within GPs, but that’s rare for drug trials; they are usually centred in hospitals.

      The Youth’12 national youth health and well-being survey cost $1.6 million. It was a survey done in secondary schools only. I don’t see an experiment in schools being any cheaper, given the assumption that we’ve got the low-hanging fruit via observational studies. I mean, we are looking for small effects in that case.

      11 years ago

      • Thomas Lumley

        Clustering in the design typically isn’t ignored now, though it often used to be in the past.

        With drug trials, though, the randomisation is usually at the individual level, or even balanced within clusters, so that it doesn’t matter.

        I’m not sure about the small effects — people are claiming big effects in the maths example, in which case a trial would not be that expensive (and it’s still a lot cheaper than, say, the additional patch-up cost of Novopay).
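        As back-of-envelope support for the cost point (every number below is an assumption, not an estimate for any real trial), the usual normal-approximation sample size, inflated by the design effect 1 + (m-1)*ICC for randomising whole classes of size m, suggests that an effect as big as the ones being claimed needs only a handful of classes per arm:

        ```python
        # Rough sample-size numbers for a cluster-randomised trial.
        # All parameter values are assumptions, not real estimates.
        import math
        from statistics import NormalDist

        def pupils_per_arm(effect_sd, power=0.80, alpha=0.05, m=25, icc=0.10):
            """Pupils per arm to detect a standardised effect of effect_sd."""
            z = NormalDist().inv_cdf
            n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_sd ** 2
            deff = 1 + (m - 1) * icc   # design-effect inflation for whole classes
            return math.ceil(n * deff)

        for effect in (0.2, 0.5, 1.0):   # small, medium, and 'big-claim' sized
            n = pupils_per_arm(effect)
            print(f"effect {effect} SD: ~{n} pupils/arm (~{math.ceil(n / 25)} classes/arm)")
        ```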

        11 years ago