November 28, 2022

99.44% pure

From the Guardian: Computer says there is a 80.58% probability painting is a real Renoir. The story goes on to say Dr Carina Popovici, Art Recognition’s CEO, believes that this ability to put a number on the degree of uncertainty is important.

It’s definitely valuable to put a number on the degree of uncertainty. What’s much less clear is that it’s valuable to put a number on the uncertainty to four-digit precision.   Let’s think about what it would take to be that precise.

If the 80.58% number was estimated from a proportion of observed data in some sense, quoting it to four digits would only make sense if the uncertainty was less than about 0.05%.  A standard error of 0.05% would need a sample size of more than five hundred million.

Another way you can get an estimate with high precision is including subjective expert opinion, which would be entirely appropriate in a context like this. There’s no limit to how precise this can be for the person whose opinion it is — you believe exactly what you believe — but there are very strong limits on how precise it can realistically be as a guide to others.  If the computer isn’t the one buying the Renoir, other people probably shouldn’t care about its opinion to more than one or two digits accuracy.

Sometimes when you come up with an estimate you want to quote it to higher precision than is directly useful — lots of statistical software, including some I write, quotes four or more digits in the default output. This allows rounding to happen closer to the point of use, such as before it’s in a headline in the mainstream media.

avatar

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient See all posts by Thomas Lumley »

Comments

  • avatar
    Nick Iversen

    I doubt that the number 80.58% is even a probability. The article says that they have trained their AI to recognise about 300 artists. I bet they used a softmax function to fit the model. The softmax ensures that you get numbers that add up to 100% but that doesn’t mean they are probabilities. Also did they include “not one of the 300” as a category? And how many images did they include from that category in the training set? It makes a difference. I could put that image into my cats-and-dogs model and it would probably tell me that with a 60% probability it is a cat. There are a small number of Renoirs out there and a billion paintings that are not Renoirs so a Bayesian would have to say that the probability it is a Renoir is 0.0000…

    1 year ago

  • avatar
    Antonio Rinaldi

    If one says 80.58% I suppose he is confident his estimate lies between 80.575% and 80.585%.
    So, the uncertainty should 0.005% at most. Have you forgotten a zero, or am I missing something else?

    1 year ago

    • avatar
      Thomas Lumley

      I’m allowing for “one uncertain digit” in reporting

      1 year ago