September 10, 2017

Should there be an app for that?

As you may have heard, researchers at Stanford have tried to train a neural network to predict sexual orientation from photos. Here’s the Guardian‘s story.

Artificial intelligence can accurately guess whether people are gay or straight based on photos of their faces, according to new research that suggests machines can have significantly better “gaydar” than humans.

There are a few questions this should raise.  Is it really better? Compared to whose gaydar? And WTF would think this was a good idea?

As one comment on the study says

Finally, the predictability of sexual orientation could have serious and even life-threatening implications to gay men and women and the society asa whole. In some cultures, gay men and women still suffer physical and psychological abuse at the hands of governments, neighbors, and even their own families.

No, I lied. That’s actually a quote from the research paper (here). The researchers say this sort of research is ethical and important because people don’t worry enough about their privacy. Which is a point of view.

So, you might wonder about the details.

The data came from a dating website, using self-identified gender for the photo combined with the gender they were interested in dating to work out sexual orientation. That’s going to be pretty accurate (at least if you don’t care how bisexual people are classified, which they don’t seem to). It’s also pretty obvious that the pictures weren’t put up for the purpose of AI research.

The Guardian story says

 a computer algorithm could correctly distinguish between gay and straight men 81% of the time, and 74% for women

which is true, but is a fairly misleading summary of accuracy.  Presented with a pair of faces, one of which was gay and one wasn’t, that’s how accurate the computer was.  In terms of overall error rate, you can do better that 81% or 74% just by assuming everyone is straight, and the increase in prediction accuracy in random people over the human judgment is pretty small.

More importantly, these are photos from dating profiles. You’d expect dating profile photos to give more hints about sexual orientation than, say, passport photos, or CCTV stills.  That’s what they’re for.  The researchers tried to get around this, but they were limited by the mysterious absence of large databases of non-dating photos classified by sexual orientation.

The other question you might have is about the less-accurate human ratings.  These were done using Amazon’s Mechanical Turk.  So, a typical Mechanical Turk worker, presented only with a single pair of still photos, does do a bit worse than a neural network.  That’s basically what you’d expect with the current levels of still image classification: algorithms can do better than people who aren’t particularly good and who don’t get any particular training.  But anyone who thinks that’s evidence of significantly better gaydar than humans in a meaningful sense must have pretty limited experience of social interaction cues. Or have some reason to want the accuracy of their predictions overstated.

The research paper concludes

The postprivacy world will be a much safer and hospitable place if inhabited by well-educated, tolerant people who are dedicated to equal rights.

That’s hard to argue with. It’s less clear that normalising the automated invasion of privacy and use of personal information without consent is the best way to achieve this goal.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient See all posts by Thomas Lumley »