November 20, 2016

Gained in translation

From a talk at the workshop on Fairness, Accountability, and Transparency in Machine Learning, via Twitter

[Image: she-is-a-nurse]

There’s obviously something wrong with these translations, but it’s also hard to do better.

To step back, there has classically been a translation problem where Greek and Latin have separate words for man as distinguished from woman and for man ‘as distinguished from beasts and angels’. If you’re working from the English translation, it can be quite hard to guess which word was in the original source. This problem has a simple solution, since modern English also has a clear (and increasingly unavoidable) distinction between ‘man’ on the one hand and ‘human’ or ‘person’ on the other.

This isn’t that problem.  It’s kind of the opposite.

The correct translation of “O bir doktor” is one of “He is a doctor”, “She is a doctor”, and “They are a doctor”, and the correct translation of “O bir hemşire” is one of “He is a nurse”, “She is a nurse”, and “They are a nurse”. Without more context, though, you can’t tell which, and none of them is unmarked or neutral. “He” and “She” are obviously too narrow, and while singular ‘they’ has always been standard English for an unspecified individual, it is only recently standard for a specific individual, and then only if they have asked to be referred to that way because of non-binary gender identification.

This is an example where the ambiguities probably have to be put back in by humans, because predictive analytics is unavoidably going to follow the stereotypes. Or, as a new Harvard Business Review article rather optimistically says about the impacts of machine learning:

Using the language of economics, judgment is a complement to prediction and therefore when the cost of prediction falls demand for judgment rises. We’ll want more human judgment.
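
As a toy illustration of why a purely predictive translator ends up following the stereotypes, here is a minimal sketch. It assumes, hypothetically, that the system just picks the pronoun that most often went with each occupation in its training text. The counts are made up for illustration (not real corpus data), and the function names are mine, not anything from an actual translation system.

```python
# Made-up co-occurrence counts, purely illustrative (not real corpus data).
PRONOUN_COUNTS = {
    "doctor": {"He": 70, "She": 25, "They": 5},
    "nurse": {"He": 10, "She": 85, "They": 5},
}

# English verb agreement for each candidate pronoun.
VERB = {"He": "is", "She": "is", "They": "are"}


def candidates(occupation):
    """All English renderings consistent with 'O bir <occupation>'."""
    return [f"{p} {VERB[p]} a {occupation}" for p in ("He", "She", "They")]


def most_likely(occupation):
    """Single best guess: the pronoun with the highest (made-up) count."""
    counts = PRONOUN_COUNTS[occupation]
    pronoun = max(counts, key=counts.get)
    return f"{pronoun} {VERB[pronoun]} a {occupation}"


if __name__ == "__main__":
    for job in ("doctor", "nurse"):
        print(job, "->", most_likely(job), "| all options:", candidates(job))
```

Under these counts the single best guess is the stereotyped one for each occupation, and the ambiguity in the Turkish is discarded. Showing everything in `candidates(...)`, or asking for more context, is where the human judgment would have to come back in.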


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.