Posts filed under General (1086)

January 18, 2017

Recognising te reo

Those of you on Twitter will have seen the little ‘translate this tweet’ suggestions that it puts up. If you’re from or in New Zealand you probably will have seen that reo Māori is often recognised by the algorithm as Latvian, presumably because Latvian also has long vowels indicated by macrons.   I’ve always been surprised by this, because Latvian looks so different.

It turns out I’m right.  Even looking just at individual letters, it’s very easy to distinguish the two.  I downloaded 74000 paragraphs of Latvian Wikipedia, a total of 6.5 million letters, and looked at how long the Latvians can go without using letters that don’t appear in te reo: specifically, s,z,j,v,d,c, g not as ng, the six accented consonants, and any consonant at the end of a word. On average, I only needed to wait five letters to know the language is Latvian rather than Māori, and 99% of the time it took less than 21 letters.
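Here is a minimal sketch of that kind of check, not the code I actually ran. It assumes the standard modern Māori alphabet and uses a slightly broader rule than the list above: any letter outside that alphabet, any ‘g’ not preceded by ‘n’, or any word-final consonant rules out te reo. The function name `letters_until_not_maori` is made up for illustration.

```python
import re
import unicodedata

# Assumption: the modern te reo Māori alphabet (vowels with or without macrons,
# the consonants h k m n p r t w, and 'g' only inside the digraph 'ng').
VOWELS = set("aeiouāēīōū")
CONSONANTS = set("hkmnprtwg")


def letters_until_not_maori(text):
    """Count letters read until te reo Māori can be ruled out.

    Returns the 1-based position of the first disqualifying letter,
    or None if nothing in the text rules out te reo.
    """
    text = unicodedata.normalize("NFC", text).lower()
    seen = 0
    for word in re.findall(r"[^\W\d_]+", text):  # runs of letters only
        for i, ch in enumerate(word):
            seen += 1
            if ch in VOWELS:
                continue
            if ch not in CONSONANTS:
                return seen  # letter not used in te reo
            if ch == "g" and (i == 0 or word[i - 1] != "n"):
                return seen  # 'g' outside 'ng'
            if i == len(word) - 1:
                return seen  # word ends in a consonant
    return None


print(letters_until_not_maori("Kia ora koutou"))      # None
print(letters_until_not_maori("Latvijas Republika"))  # 1
```

Running this over each paragraph of a corpus and summarising the returned counts would give the sort of mean and 99th percentile quoted above.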

Another language that Twitter often guesses is Finnish. That makes more sense: many of the letters not used in Māori are also rare or absent in Finnish, and ‘g’ appears mostly as ‘ng’.   However, Finnish does have ‘s’, has ‘ä’ and ‘ö’, and ‘y’, and has words ending in consonants, so it should also be feasible to distinguish.

 

Update: Indonesian is another popular guess, but it has ‘d’, ‘j’, ‘y’, and ‘b’, and it has lots of words ending with consonants.  The average time to rule out te reo is slightly longer, at nearly 6 characters, and the 99th percentile is 22 letters.  So if the algorithm can’t tell, it should probably guess it’s not Indonesian.

Update: For very short tweets, and those in mixed languages, nothing’s going to work, but this is about tweets where the answer is obvious to a human.

January 17, 2017

Briefly

  • There’s a planned course at the University of Washington “Calling Bullshit in the Age of Big Data”. Here’s the website with syllabus and readings, and the Twitter account.
  • Via a tweet from ‘Calling Bullshit’, there’s a computer science preprint looking at distinguishing ‘criminals’ from ‘normal people’ using photographs.  I usually wouldn’t comment here on research papers that haven’t made it to the news, but this sentence was irresistible
    “Unlike a human examiner/judge, a computer vision algorithm or classifier has absolutely no subjective baggages, having no emotions, no biases whatsoever due to past experience, race, religion, political doctrine, gender, age, etc.”
    An aim of both the course and this blog is to increase the number of people who find this sort of claim ridiculous.
  • For map nerds: a detailed cartographic comparison of Google Maps and Apple Maps.
  • Data journalism: the Guardian looks at the spatial concentration of gun violence in the US.
  • There’s a quote circulating widely now on social media “Journalism is  printing what someone else does not want printed. Everything else is public relations.” It’s being attributed to Orwell. He didn’t say it — which I think matters in this context.
    According to Quote Investigator, versions of it described as an ‘old saying’ were around in US journalism in the early 20th century. Later, in 1930, Walter Winchell attributed a version to William Randolph Hearst. More recently, it has been attributed to Lord Northcliffe, a UK pioneer of tabloid journalism. It wasn’t attributed to Orwell until the 1990s, decades after he died.

And finally: this is actually true

January 12, 2017

Measuring what you care about: turmeric edition


There’s a story on Stuff, with more detail at either Nature News or Scientific American, that turmeric doesn’t work. The original paper in the Journal of Medicinal Chemistry isn’t open access (actually, it is), but its abstract is. It’s not new chemical research; it’s a review of what’s known about curcumin, the allegedly-active ingredient of turmeric, and why the authors don’t believe it works.  In the opposite of the academic cliche, the point of the paper is to argue that less research is needed on curcumin and similar compounds.

StatsChat isn’t MedChemChat, but the paper is relevant for two reasons. First, turmeric is one of the foods that attracts low-quality, over-publicised research, which does end up on StatsChat. Second, the reason they don’t believe in turmeric is relevant.

Turmeric, if you believe the stories, appears to have pretty much every interesting biochemical effect anyone’s ever looked for.  That phenomenon has been seen before in medicinal chemistry, and the experience is that compounds which pass a huge range of screening tests tend to do it by cheating.

In 2010, two Australian chemists wrote a paper about “Pan-Assay INterference compounds” (PAINs) (abstract, story, blog post by another chemist). Most biologically interesting properties a compound might have aren’t visible to the naked eye. A lot of work goes into devising subtle and precise assays to measure them. A compound can mess up the assay and appear to pass the test without having the specific effect you’re looking for.  One important reason for PAIN is a compound that reacts with a wide range of proteins.

Turmeric, as you will no doubt have guessed, looks like a PAIN.  This nicely explains its excellent test-tube performance with its generally disappointing performance given as food to whole animals or people.  The researchers are arguing that turmeric seems to work in the lab because it cheats, and that it seems safe but less useful than hoped in people and animals mostly because it’s not absorbed well.

As the stories are careful to note, none of this definitively implies that curcumin (or some other turmeric ingredient) couldn’t have a beneficial effect, just that most of the evidence isn’t credible.   The same argument applies to some other trendy antioxidants.

It’s a recurrent theme on StatsChat that most data aren’t the real thing you care about. The speedometer needle position isn’t the same as speed; saliva THC concentration isn’t the same as impairment; methamphetamine traces on a wall aren’t the same as use (or manufacture) by a tenant; having a Chinese name isn’t the same as being an overseas housing speculator.  The map isn’t the territory.

 

 

Photo by Flickr user saptarshikar

January 9, 2017

News to look forward to

Last year, we had a bunch of early-stage Alzheimer’s trials in the news. I thought I’d look at what’s due out in the clinical trial world this year.

Perhaps most importantly, in March we should see the first real results on a new set of cholesterol-lowering drugs.  The ‘PCSK9’ inhibitors are among the first drugs outside the cancer world to come from large-scale genetic studies without a particular hypothesis in mind. As the gene name ‘PCSK9’ indicates to those in the know, the gene was originally named just as the ninth in a series of genes that looked similar in structure.  It turned out that mutations in PCSK9 had big effects on LDL (‘bad’) cholesterol levels. Also, importantly, there is at least one person walking around alive and healthy with disabling mutations in both her copies of the gene, so there was a good chance that inhibiting the protein would be safe.  At least three companies have drugs (monoclonal antibodies) that target PCSK9 and reduce cholesterol by a lot, though the drugs need to be given by injection.

Although the drugs have been shown to reduce cholesterol, and have been approved for sale in the US for people with very high cholesterol not otherwise treatable, they haven’t been shown to prevent heart attacks (which is the point of lowering your cholesterol). The first trial looking at that sort of real outcome has finished, and there’s a good chance the results will be presented at the American College of Cardiology meeting in March.  For people in NZ the main interest isn’t in the new treatments — it’s hard to see them being cost-effective initially — but in the impact on understanding cholesterol.  If these drugs do prevent heart attacks, they will increase our confidence that LDL cholesterol really is a cause of disease; if they don’t, they will give aid and comfort to the people who think cholesterol is missing the whole point.

What else? There are some interesting migraine trials due out: both using a new approach to prevention and using a new approach to giving the current treatments.  The prevention approach is based on inhibiting something called CGRP in the brain, which appears to be a key trigger; the drug is injected, but only every few months.  The treatment approach is based on a new sort of skin patch to try to deliver the ‘triptan’ drugs, which they hope will be as fast as inhaling or injecting them and less unpleasant.

Also, there’s an earlier-stage New Zealand biotech product that will have results early in the year: using cells from specially bred pigs, coated so the immune system doesn’t notice them, to treat Parkinson’s Disease.

 

January 8, 2017

The drug-driving problem

The AA are campaigning again for random drug tests of drivers. I’m happy to stipulate that in NZ lots of people smoke cannabis, and some of these people drive when stoned, and sometimes when drunk as well, and this is bad. As the ads say.

On the other hand, science has not yet provided us with a good biochemical roadside test for impairment from cannabis. For alcohol, yes. For THC, no. That’s even more of an issue in the US states where recreational marijuana use is legal, since the option of just taking away driving licences for anyone with detectable levels isn’t even there.

This isn’t just a point about natural justice. There’s empirical reason (though not conclusive) to believe that many people who might fail a biochemical test are reasonably careful about driving while high.

First, there hasn’t been any evidence of an increase in road deaths in the US states where medical or recreational marijuana use is legal, even though there has been an increase in people driving with detectable levels of the drug.

Second, if you look at the 2010 ESR report (PDF) that the AA are relying on, you find (p20)

The culpability of the drivers using cannabis by itself was determined and odds ratios have been calculated as described in the alcohol section and in Appendix two. The results are given in Table seven. The odds ratio calculated for cannabis only use is only slightly greater than one, implying that cannabis does not significantly impact on the likelihood of having a crash.

Now, the report says, correctly, that this disagrees with other evidence and that we shouldn’t assume driving while stoned is safe. But they tried quite hard to do alternative analyses showing cannabis was bad, and were unsuccessful.

In 2012, there was another AA campaign, and a story in the Herald

But Associate Minister of Transport Simon Bridges said the Government would wait for saliva testing technology to improve before using it.

A government review of the drug testing regime in May concluded the testing devices were not reliable or fast enough to be effective.

It ruled the saliva screening takes at least five minutes, is unlikely to detect half of cannabis users, and results are not reliable enough for criminal prosecution.

“The real factor is reliability … we can’t have innocent people accused of drug driving if they haven’t been.

“But as the technology improves, I’m sure in the future we will have a randomised roadside drug test.”

That seems like a sensible policy.

Briefly

  • Graphics, overquantified life: Andrew Elliott’s graph of his baby’s first six months of sleep
  • Graphics: Bird migrations in the Americas (click for animation)
  • Public policy: Graeme Edgeler has better numbers on the three-strikes law, and a new post
  • Graphics: Weather forecast around the British Isles, from the people who tweet the Shipping Forecast
  • From the Washington Post: more people die between Christmas and New Year than you’d expect. It’s true in NZ, so it’s not the weather.
  • The alt-right movement finds more dumb things to do with genetic testing: the Atlantic
  • A (moderately technical) short course on Fairness and Transparency in Machine Learning.
  • “Missing Datasets”: a partial list of useful and important public datasets that don’t (and won’t) exist.
  • Suresh Venkat explains why modern data-based ‘algorithms’ aren’t at all like recipes, which is why they need to be studied statistically, not just by looking at the code or asking if the developers were pure of heart.
  • “The Great AI Awakening”. From the NY Times, on Google, the revolution in machine translation, and big data.
  • “Companies Ponder a Rating of Workers’ Health”. From the Wall St Journal.  On one hand, having big companies report summaries of their employees’ health might give them better incentives.  On the other hand, they’d need to get the data, and if you think about what else they might do with it…

January 7, 2017

Social data analytics: how not to do it

Over the holidays, problems began emerging with the new data-based approach to detecting benefit overpayments in Australia. I learned about this from @Asher_Wolf, an Australian privacy advocate.  In a significant number of cases the computer system was  inaccurate as to whether people owed money.  Documentation to correct the errors is the sort of thing a lot of people don’t have lying around (though perhaps technically they should) and in at least some cases the computer system didn’t allow the correct information to be submitted.   The Sydney Morning Herald has a piece (warning: autoplaying audio ads) referencing Cathy O’Neil’s book Weapons of Math Destruction.

Australian regulations on government data-matching systems call for the development of a ‘program protocol’, including “description of the data to be provided and the methods used to ensure it is of sufficient quality for use in the program” and “a statement of the costs and benefits of the program.” However, in Appendix C describing the cost-benefit statement it’s made clear that only cash costs and benefits to the Commonwealth count. Monetary compliance costs to individuals don’t count, and non-monetary costs don’t count. Sending out more letters seems to count as beneficial as long as it raises more money than you spend doing it, whether or not that money is legally owed.

The ‘technical standards’ report is supposed to cover data integrity and risks “including, but not limited to, risks to the privacy of individuals, reputational risks, and risks relating to incorrect matches.”  In particular, it’s supposed to describe “the sampling techniques used to verify the validity/accuracy of matches”.  That would be interesting to see, given that it seems to take a lot of work to prove that a match is incorrect.

In principle this might all  be worked out in the appeals process, by real humans — or, at least, the amounts of repayments might be. The stress inflicted on the recipients of the letters and the harm done to the reputation of Australia’s government data systems are harder to fix.  In the short term, the former is (rightly) getting more attention; in the long term it might be the latter that does the greater damage.

January 6, 2017

Detox isn’t a thing

The New Year is traditionally a time for short-term, one-off attempts to improve one’s health, like going to the gym for two weeks. One fashionable form is ‘detox’, where you take a few components of what might be a sensible change in diet, massively overdo them for a short time, then go back to your usual diet. The idea is that your body builds up ‘toxins’ that it’s unable to get rid of by normal biological processes, but that it can easily be tricked into getting rid of them rapidly by some special ritual.  Here’s a good piece from the Observer describing the problem. [update: and Michelle ‘Nanogirl’ Dickinson’s column this week, too]

The NZ media did ok on detox this year. There was a UK story about a particular herbal mixture causing dangerous sodium loss; there was one positive but somewhat restrained story; I’ve only seen one completely bogus one.

The moderately restrained story was in the Herald. It talked about a bunch of sensible dietary changes,  a bunch of basically unsupported herbal stuff, and for completeness, Native American sweat lodges. However, at least the main idea was to make long-term changes in one’s diet rather than to have some magical purification experience.  The story even had a couple of links to scientific papers, though they were to research showing that pollution might be harmful, which is not the problematic component of the detox myth.

On the other hand, on Twitter today, Peter Green posted a headline from the cover of “M2” magazine: “Six manly foods to detox your liver”. No, I’m not making this up.

It may help to know that other recent health headlines include “Experts Say Wearing This Colour Will Help You Have A More Effective Workout” and “Neuroscience Says That This Song Reduces Anxiety By 65%”.

If you’re wondering what detox foods are considered “manly” in the 21st century, the list includes turmeric, green tea, and broccoli sprouts. Quiche is still out.

January 5, 2017

Traffic and the brain

Q: Do we need to move?

A: Um. No?

Q: “Living near a busy road could cause dementia”, it says.

A: No, that’s ok. We don’t live near a busy road, even to the extent that rhetorical constructs live anywhere.

Q: So what’s a ‘busy’ road, then?

A: An arterial or highway.  We’re more than 100m from the nearest one.

Q: That doesn’t sound very far.

A: The 1.07 times higher risk estimated in the research paper is for people within 50m of a major road.

Q: How many people is that?

A: In Ontario (mostly Toronto), 20%.  In Auckland, not so many.

Q: But we’re supposed to be in favour of population density and cities, aren’t we?

A: Yes. But even if the effect is real, it’s pretty small.

Q: The story says roads are responsible for 1 in 9 cases. That’s not so small.

A: One in 9 cases among people who live within 50m of a major road. Or, using one of the other estimates from the research, one in 14 cases among people who live within 50m of a major road.

Q: And 150m from a major road?

A: About one in 50 cases.

Q: Ok, that’s pretty small. Can they really detect it?

A: They’ve got data on a quarter of a million cases of dementia, so it’s borderline.

Q: But still?

A: Well, the statistical evidence isn’t all that strong. A p-value of 0.035 from one of the three neurological diseases they looked at isn’t much in a data set that large.

Q: And it’s just a correlation, right?

A: They’ve been able to do a reasonable job of removing other factors, and the road proximity was measured a long time before the dementia, so at least they don’t have cause and effect backwards.  But, yes, it could be something they didn’t have good enough data or modelling for.

Q: How about age? That’s a big issue with modelling dementia, isn’t it?

A: These are epidemiologists — “physicians broken down by age and sex”, as the old joke says — they know about age. They only compared groups of people of exactly the same age.

Q: But what does ‘exactly the same age’ even mean for something that doesn’t have a precise starting time?

A: That’s more of a problem. If people living near major roads got dementia at the same rate, but had it diagnosed six months earlier on average, that would be enough to explain the difference. There’s no particular reason that should happen, but it’s not impossible.

Q: So is the research worth looking at?

A: Worth looking at for consenting scientists in private, but not really worth international publicity.
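For anyone wanting to check the ‘one in N cases’ arithmetic in the exchange above: the usual formula for the fraction of cases attributable to an exposure, among exposed people, is (RR − 1)/RR. The sketch below applies it to the 1.07 figure quoted earlier; treating that number as a relative risk is an assumption, and the function name is hypothetical. The result lands close to the ‘one in 14’ quoted, which is what you get from the simpler approximation RR − 1.

```python
def attributable_fraction_exposed(relative_risk):
    """Fraction of cases among exposed people attributable to the exposure,
    assuming the relative risk reflects a causal effect."""
    return (relative_risk - 1) / relative_risk


af = attributable_fraction_exposed(1.07)
print(f"{af:.3f} of cases, or about one in {1 / af:.0f}")
```

None of this says the effect is real; it only converts a relative risk into a share of cases on the assumption that it is.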

 

January 1, 2017

Kinds of fairness worth working for

Machine learning/statistical learning has a very strong tendency to encode existing biases, because it works by finding patterns in existing data.  The ability to find patterns is very strong, and simply leaving out a variable you don’t want used isn’t enough if there are ways to extract the same information from other data. Because computers look objective and impartial, it can be easier to just accept their decisions — or regulations or trade-secret agreements may make it impossible to find out what they were doing.

That’s not necessarily a fatal flaw. People learn from existing cases, too. People can substitute a range of subtler social signals for crude, explicit bigotry.  It’s hard to force people to be honest about how they made a decision — they may not even know. Computer programs have the advantage of being much easier to audit for bias given the right regulatory framework; people have the advantage of occasionally losing some of their biases spontaneously.

Audit of black-box algorithms can be done in two complementary ways. You can give them made-up examples to see if differences that shouldn’t matter do affect the result, and you can see if their predictions on real examples were right.  The second is harder: if you give a loan to John from Epsom but not to Hone from Ōtara, you can see if John paid on time, but not if Hone would have.  Still, it can be done either using historical data or by just approving some loans that the algorithm doesn’t like.  You then need to decide whether the results were fair. That’s where things get surprisingly difficult.

Here’s a picture from a Google interactive

People are divided into orange and blue, with different distributions of credit scores. In this case the blue and orange people are equally likely on average to pay off a loan, but the credit score is more informative in orange people.  I’ve set the threshold so that the error rate of the prediction is the same in blue people as in orange people, which is obviously what you want. I could also have set the threshold so the proportion of approvals among people who would pay back the loan was the same in blue and orange people. That’s obviously what you want.  Or so the proportion of rejections among people who wouldn’t pay back the loan is the same. That, too, is obviously what you want.

You can’t have it all.
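To see the conflict in numbers, here is a minimal sketch with made-up distributions (not the data behind the Google interactive). In both groups half the people would repay; scores are normal with unit variance, centred at plus or minus a ‘separation’ that is larger for orange, so the score is more informative there. All names and parameter values are hypothetical. With the blue threshold fixed at 0, the sketch finds the orange threshold that matches blue on each of the three criteria in turn.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def rates(threshold, separation):
    """Approve when score > threshold.

    Assumed model: scores of people who would repay ~ N(+separation, 1),
    scores of people who would not ~ N(-separation, 1), half in each class.
    """
    tpr = 1 - Z.cdf(threshold - separation)  # approvals among those who would repay
    tnr = Z.cdf(threshold + separation)      # rejections among those who would not
    error = 0.5 * (1 - tpr) + 0.5 * (1 - tnr)
    return tpr, tnr, error


def bisect(f, lo, hi, tol=1e-9):
    """Root of an increasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


BLUE_SEP, ORANGE_SEP = 0.5, 1.0  # hypothetical: score more informative for orange
blue_tpr, blue_tnr, blue_err = rates(0.0, BLUE_SEP)

# Orange threshold that equalises each criterion with blue
t_same_tpr = ORANGE_SEP + Z.inv_cdf(1 - blue_tpr)
t_same_tnr = Z.inv_cdf(blue_tnr) - ORANGE_SEP
t_same_err = bisect(lambda t: rates(t, ORANGE_SEP)[2] - blue_err, 0.0, 5.0)

print(f"same approval rate among repayers:      {t_same_tpr:+.2f}")
print(f"same rejection rate among non-repayers: {t_same_tnr:+.2f}")
print(f"same overall error rate:                {t_same_err:+.2f}")
```

The three criteria ask for three different orange thresholds, so satisfying one means giving up the others. In this toy model that holds whatever threshold blue uses.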

This isn’t one of the problems specific to social bias or computer algorithms or inaccurate credit scoring or evil and exploitative banks.  It’s a problem with any method of making decisions.  In fact, it’s a problem with any approach to comparing differences. You have to decide what summary of the difference you care about, because you can’t make them all the same.  This is old news in medical diagnostics, but appears not to have been considered in some other areas.

The motivation for my post was a post at Pro Publica on biases in automated sentencing decisions.  An earlier story had compared the specificity of the decisions according to race:  black people who didn’t end up reoffending were more likely to have been judged high risk than white people who didn’t end up reoffending. The company that makes the algorithm said, no, everything is fine because people who were judged high risk were equally likely to reoffend regardless of race. Both Pro Publica and the vendors are right on the maths; obviously they can’t both be right on the policy implications. We need to decide what we mean by a fair sentencing system. Personally, I’m not sure risk of reoffending should actually be a criterion, but if we stipulate that it is, there’s a decision to make.

In the new post, Julia Angwin and Jeff Larson say

The findings were described in scholarly papers published or circulated over the past several months. Taken together, they represent the most far-reaching critique to date of the fairness of algorithms that seek to provide an objective measure of the likelihood a defendant will commit further crimes.

That’s true, but ‘algorithms’ and ‘objective’ don’t come into it. Any method of deciding who to release early has this problem, from judicial discretion in sentencing to parole boards to executive clemency. The only way around it is mandatory non-parole sentences, and even then you have to decide who gets charged with which crimes.

Fairness and transparency in machine learning are worth fighting for. They’re worth spending public money and political capital on. Part of the process must be deciding, with as much input from the affected groups as possible, what measures of fairness really matter to them. In the longer term, reducing the degree of disadvantage of, say, racial minorities should be the goal, and will automatically help with the decision problem. But a decision procedure that is ‘fair’ for disadvantaged groups both according to positive and negative predictive value and according to  specificity and sensitivity  isn’t worth fighting for, any more than a perpetual motion machine would be.
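The perpetual-motion comparison follows from Bayes’ theorem. As a sketch with invented numbers: give two groups exactly the same sensitivity and specificity but different underlying rates of the outcome, and the predictive values are forced apart. The function names and the 0.8, 0.5 and 0.3 figures below are assumptions for illustration only.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: P(outcome | flagged high risk)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: P(no outcome | flagged low risk)."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Same test accuracy in both groups, different base rates (hypothetical values)
for prevalence in (0.5, 0.3):
    print(prevalence,
          round(ppv(0.8, 0.8, prevalence), 3),
          round(npv(0.8, 0.8, prevalence), 3))
```

With an imperfect predictor, the predictive values only line up across groups when the base rates do, which is why reducing the underlying disadvantage helps with the decision problem.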