October 10, 2011

# Stat of the Week Competition: October 8-14

Each week, we would like to invite readers of Stats Chat to submit nominations for our *Stat of the Week* competition and be in with the chance to win an iTunes voucher.

Here’s how it works:

- Anyone may add a comment on this post to nominate their *Stat of the Week* candidate before midday Friday October 14 2011.
- Statistics can be bad, exemplary or fascinating.
- The statistic must be in the NZ media during the period of October 8-14 2011 inclusive.
- Quote the statistic, when and where it was published and tell us why it should be our *Stat of the Week*.

Next Monday at midday we’ll announce the winner of this week’s *Stat of the Week* competition, and start a new one.

The fine print:

- Judging will be conducted by the blog moderator in liaison with staff at the Department of Statistics, The University of Auckland.
- The judges’ decision will be final.
- The judges can decide not to award a prize if they do not believe a suitable statistic has been posted in the preceding week.
- Only the first nomination of any individual example of a statistic used in the NZ media will qualify for the competition.
- Employees (other than student employees) of the Statistics department at the University of Auckland are not eligible to win.
- The person posting the winning entry will receive a $20 iTunes voucher.
- The blog moderator will contact the winner via their notified email address and advise the details of the $20 iTunes voucher to that same email address.
- The competition will commence Monday 8 August 2011 and continue until cancellation is notified on the blog.

### Nominations

### Nominate your Stat of the Week

**First time nominating?** Please use your real first name *and* surname and read the Comment Policy.

Statistic: Man drought confirmed in New Zealand

Source: Sunday Star-Times

Date: 9 October 2011

The use of statistics in this article that “confirms” that there is a man drought is bizarre.

It says that there are roughly 50 000 each of men and women in the age range 25-39 who are single. But if you restrict men to just those earning more than $60 000 then there are only 24 000 of them for the 60 000 women. Therefore there is a man drought!

Using the same logic I could say that of the 60 000 women there are only 10 000 who are blonde and so there really is a woman drought.

3 years ago

Statistic: It’s not often that the quiet world of mathematics is rocked by a murder case. But last summer saw a trial that sent academics into a tailspin, and has since swollen into a fevered clash between science and the law.

At its heart, this is a story about chance. And it begins with a convicted killer, “T”, who took his case to the court of appeal in 2010. Among the evidence against him was a shoeprint from a pair of Nike trainers, which seemed to match a pair found at his home. While appeals often unmask shaky evidence, this was different. This time, a mathematical formula was thrown out of court. The footwear expert made what the judge believed were poor calculations about the likelihood of the match, compounded by a bad explanation of how he reached his opinion. The conviction was quashed.

But more importantly, as far as mathematicians are concerned, the judge also ruled against using similar statistical analysis in the courts in future. It’s not the first time that judges have shown hostility to using formulae. But the real worry, say forensic experts, is that the ruling could lead to miscarriages of justice.

“The impact will be quite shattering,” says Professor Norman Fenton, a mathematician at Queen Mary, University of London. In the last four years he has been an expert witness in six cases, including the 2007 trial of Levi Bellfield for the murders of Marsha McDonnell and Amelie Delagrange. He claims that the decision in the shoeprint case threatens to damage trials now coming to court because experts like him can no longer use the maths they need.

Specifically, he means a statistical tool called Bayes’ theorem. Invented by an 18th-century English mathematician, Thomas Bayes, this calculates the odds of one event happening given the odds of other related events. Some mathematicians refer to it simply as logical thinking, because Bayesian reasoning is something we do naturally. If a husband tells his wife he didn’t eat the leftover cake in the fridge, but she spots chocolate on his face, her estimate of his guilt goes up. But when lots of factors are involved, a Bayesian calculation is a more precise way for forensic scientists to measure the shift in guilt or innocence.
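The cake example above can be written out as a small Bayesian update. This is only a sketch: the function name `posterior` and all three probabilities are hypothetical, chosen to illustrate the mechanics rather than taken from the article.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a yes/no hypothesis given one piece of evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator


# Hypothetical numbers (assumptions, not from the article):
prior_guilt = 0.5            # wife's belief before looking at his face
p_chocolate_if_guilty = 0.9  # chance of chocolate on his face if he ate the cake
p_chocolate_if_innocent = 0.1  # chance of chocolate on his face otherwise

updated = posterior(prior_guilt, p_chocolate_if_guilty, p_chocolate_if_innocent)
print(round(updated, 3))  # 0.9
```

With these made-up inputs the estimate of guilt rises from 0.5 to 0.9; different inputs would shift it by a different amount, which is the point of doing the calculation explicitly.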

In the shoeprint murder case, for example, it meant figuring out the chance that the print at the crime scene came from the same pair of Nike trainers as those found at the suspect’s house, given how common those kinds of shoes are, the size of the shoe, how the sole had been worn down and any damage to it. Between 1996 and 2006, for example, Nike distributed 786,000 pairs of trainers. This might suggest a match doesn’t mean very much. But if you take into account that there are 1,200 different sole patterns of Nike trainers and around 42 million pairs of sports shoes sold every year, a matching pair becomes more significant.

The data needed to run these kinds of calculations, though, isn’t always available. And this is where the expert in this case came under fire. The judge complained that he couldn’t say exactly how many of one particular type of Nike trainer there are in the country. National sales figures for sports shoes are just rough estimates.

And so he decided that Bayes’ theorem shouldn’t again be used unless the underlying statistics are “firm”. The decision could affect drug traces and fibre-matching from clothes, as well as footwear evidence, although not DNA.

“We hope the court of appeal will reconsider this ruling,” says Colin Aitken, professor of forensic statistics at the University of Edinburgh, and the chairman of the Royal Statistical Society’s working group on statistics and the law. It’s usual, he explains, for forensic experts to use Bayes’ theorem even when data is limited, by making assumptions and then drawing up reasonable estimates of what the numbers might be. Being unable to do this, he says, could risk miscarriages of justice.

“From being quite precise and being able to quantify your uncertainty, you’ve got to give a completely bland statement as an expert, which says ‘maybe’ or ‘maybe not’. No numbers,” explains Fenton.

“It’s potentially very damaging,” agrees University College London psychologist, Dr David Lagnado. Research has shown that people frequently make mistakes when crunching probabilities in their heads. “We like a good story to explain the evidence and this makes us use statistics inappropriately,” he says. When Sally Clark was convicted in 1999 of smothering her two children, jurors and judges bought into the claim that the odds of siblings dying by cot death was too unlikely for her to be innocent. In fact, it was statistically more rare for a mother to kill both her children. Clark was finally freed in 2003.

Lawyers call this type of mistake the prosecutor’s fallacy, when people confuse the odds associated with a piece of evidence with the odds of guilt. Recognising this is also what eventually quashed the 1991 conviction for rape of Andrew Deen in Manchester. The courts realised at appeal that a one-in-three-million chance of a random DNA match for a semen stain from the crime scene did not mean there was only a one-in-three-million chance that anyone other than Deen could have been a match – those odds actually depend on the pool of potential suspects. In a population of 20 million adult men, for example, there could be as many as six other matches.
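The arithmetic behind the Deen example can be checked in a few lines. The two inputs are the figures quoted above; the final step, which treats every man in the population as equally likely to be the source before the DNA evidence, is an added simplifying assumption and not something the article states.

```python
random_match_probability = 1 / 3_000_000  # chance an unrelated man matches
population = 20_000_000                   # adult men who could have left the stain

# Expected number of men who match purely by chance.
expected_matches = population * random_match_probability
print(round(expected_matches, 2))  # 6.67

# Assumption: with no other evidence, the defendant is just one of
# (1 + expected_matches) matching men, each equally likely to be the source.
p_defendant_is_source = 1 / (1 + expected_matches)
print(round(p_defendant_is_source, 2))  # 0.13
```

Under that assumption the DNA match alone leaves roughly a one-in-eight chance that the defendant is the source, which is very different from one in three million.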

Now, Fenton and his colleague Amber Marks, a barrister and lecturer in evidence at Queen Mary, University of London, have begun assembling a group of statisticians, forensic scientists and lawyers to research a solution to bad statistics. “We want to do what people failed to do in the past, which is really get the legal profession and statisticians and probability guys understanding each other’s language,” says Fenton.

Their first job is to find out how often trials depend on Bayesian calculations, and the impact that the shoeprint-murder ruling might have on future trials. “This could affect thousands of cases,” says Marks.

They have 37 members on their list so far, including John Wagstaff, legal adviser to the Criminal Cases Review Commission, and David Spiegelhalter, the Winton professor of the public understanding of risk at the University of Cambridge. Added to these are senior statisticians and legal scholars from the Netherlands, US and New Zealand.

Fenton believes that the potential for mathematics to improve the justice system is huge. “You could argue that virtually every case with circumstantial evidence is ripe for being improved by Bayesian arguments,” he says.

But the real dilemma is finding a way to help people make sense of the calculations. The Royal Statistical Society already offers guidance for forensic scientists, to stop them making mistakes. Lagnado says that flowcharts in the style of family trees also help jurors visualise changing odds more clearly. But neither approach has been entirely successful. And until this complex bit of maths can be simply explained, chances are judges will keep rejecting it.

Source: http://www.guardian.co.uk/law/2011/oct/02/formula-justice-bayes-theorem-miscarriage

Date: Sunday 2 October 2011

Bayes’ theorem is a mathematical equation used in court cases to analyse statistical evidence. But a judge has ruled it can no longer be used. Will it result in more miscarriages of justice?

3 years ago

Statistic: Smoking costs the Australian health system $32 billion

Source: TVNZ

Date: 12 October

This is a horrible mutant stat.

Here’s TVNZ’s quote: “Australia’s Cancer Council said the Senate should end the political delays and get on with passing the legislation, with authorities estimating smoking now kills 15,000 Australians each year and costs the health system $32 billion.”

The $32 billion figure comes from Collins & Lapsley’s report on the social costs of alcohol, tobacco, and other drugs.

Most importantly, the $32 billion figure counts a host of tangible and intangible costs that fall on the smoker, those around the smoker, and the public health system. Only $312 million of the $32 billion, according to the report, counts as a net health cost. Just look at the first table on page xii of the Executive Summary.
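To put the two figures side by side, here is a quick check of what share of the headline number is net health cost. Both inputs are the numbers quoted above; the variable names are illustrative.

```python
total_social_cost = 32e9   # headline figure attributed to the health system
net_health_cost = 312e6    # net health cost according to the report

share = net_health_cost / total_social_cost
print(f"{share:.3%}")  # 0.975%
```

So the net health cost is a little under 1% of the figure that was reported as the cost to the health system.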

I get really really annoyed at how these big numbers, which mostly consist of costs borne by the smoker or drinker himself, get twisted by activists like the Cancer Council to build support for policies that further beat on smokers and drinkers. There can be a case for anti-smoking policy. But it oughtn’t be based on lies. Smokers pay more in tax than they cost the health system in any country that has a reasonably large tobacco tax and a reasonably large public pension system.

3 years ago

Statistic: A few months ago I came upon an old episode of Radiolab, one of my favorite podcasts whose host Jad Abumrad just won a MacArthur Fellowship. The episode was about numbers. It made me nostalgic for my youthful enthrallment with the pristine world of mathematics, before I succumbed to the gritty reality of the financial world. Among the episode’s astounding revelations was that babies count on a logarithmic scale.

A second earth-shattering fact is that there are more numbers in the universe that begin with the digit 1 than 2, or 3, or 4, or 5, or 6, or 7, or 8, or 9. And more numbers that begin with 2 than 3, or 4, and so on. This relationship holds for the lengths of rivers, the populations of cities, molecular weights of chemicals, and any number of other categories. What a blow to any of us who purport to have mastered the basic facts of the world around us!

This numerical regularity is known as Benford’s Law, and specifically, it says that the probability that the first digit of a number drawn from a set of numbers is d is given by

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

In fact, Benford’s law has been used in legal cases to detect corporate fraud, because deviations from the law can indicate that a company’s books have been manipulated. Naturally, I was keen to see whether it applies to the large public firms that we commonly study in finance.
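For reference, the first-digit probabilities that Benford’s law predicts, in its standard form P(d) = log10(1 + 1/d), can be tabulated with a short sketch. The function name `benford_probability` is illustrative.

```python
import math


def benford_probability(d):
    """Probability that the leading digit equals d (1..9) under Benford's law."""
    return math.log10(1 + 1 / d)


benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(d, f"{p:.3f}")
```

The nine probabilities sum to 1, and the digit 1 leads about 30% of the time while 9 leads less than 5% of the time.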

I downloaded quarterly accounting data for all firms in Compustat, the most widely-used dataset in corporate finance that contains data on over 20,000 firms from SEC filings. I used a standard set of 43 variables that comprise the basic components of corporate balance sheets and income statements (revenues, expenses, assets, liabilities, etc.).

And lo, it works! Here is the distribution of first digits vs. Benford’s law’s prediction for total assets and total revenues.

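Next, I looked at how adherence to Benford’s law changed over time, using a measure of the sum of squared deviations of the empirical density from Benford’s prediction:

$$D = \sum_{d=1}^{9} \left(\hat{P}(d) - P(d)\right)^2$$

where $\hat{P}(d)$ is the empirical probability of the first digit $d$ and $P(d)$ is the probability that Benford’s law predicts.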
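A measure of this kind, the sum over digits 1 to 9 of the squared gap between the empirical first-digit frequency and the Benford probability log10(1 + 1/d), can be sketched as follows. This is not the author’s code: the function names are hypothetical, and the powers-of-two sample merely stands in for real accounting data.

```python
import math
from collections import Counter


def first_digit(x):
    """Leading non-zero digit of a non-zero number, read from its text form."""
    return int(next(ch for ch in str(abs(x)) if ch in "123456789"))


def benford_deviation(values):
    """Sum of squared gaps between empirical first-digit shares and Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(
        (counts.get(d, 0) / n - math.log10(1 + 1 / d)) ** 2
        for d in range(1, 10)
    )


# Illustrative data only: powers of two are known to follow Benford's law closely.
sample = [2 ** k for k in range(1, 1000)]
print(benford_deviation(sample))

# A sample whose numbers all start with 1 deviates much more.
print(benford_deviation([1, 10, 100, 1000]))
```

Reading the digit from the number’s text form avoids floating-point trouble with logarithms near powers of ten; zeros are skipped because they have no leading non-zero digit.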

Deviations from Benford’s law have increased substantially over time, such that today the empirical distribution of each digit is about 3 percentage points off from what Benford’s law would predict. The deviation increased sharply between 1982 and 1986 before leveling off, then zoomed up again from 1998 to 2002. Notably, the deviation from Benford dropped off very slightly in 2003-2004 after the enactment of the Sarbanes-Oxley accounting reform act in 2002, but the drop was tiny and the deviation resumed its increase up to an all-time peak in 2009.

So according to Benford’s law, accounting statements are getting less and less representative of what’s really going on inside of companies. The major reform that was passed after Enron and other major accounting scandals barely made a dent.

Next, I looked at Benford’s law for three industries: finance, information technology, and manufacturing. The finance industry showed a huge surge in the deviation from Benford’s from 1981-82, coincident with two major deregulatory acts that sparked the beginnings of that other big mortgage debacle, the Savings and Loan Crisis. The deviation from Benford’s in the finance industry reached a peak in 1988 and then decreased starting in 1993 at the tail end of the S&L fraud wave, not matching its 1988 level until … 2008.

The time series for information technology is similarly tied to that industry’s big debacle, the dotcom bubble. Neither manufacturing nor IT showed the huge increase and decline of the deviation from Benford’s that finance experienced in the 1980s and early 1990s, further validating the measure since neither industry experienced major fraud scandals during that period. The deviation for IT streaked up between 1998-2002 exactly during the dotcom bubble, and manufacturing experienced a more muted increase during the same period.

While these time series don’t prove anything decisively, deviations from Benford’s law are compellingly correlated with known financial crises, bubbles, and fraud waves. And overall, the picture looks grim. Accounting data seem to be less and less related to the natural data-generating process that governs everything from rivers to molecules to cities. Since these data form the basis of most of our research in finance, Benford’s law casts serious doubt on the reliability of our results. And it’s just one more reason for investors to beware.

As noted by William Black in his great book on the S&L crisis The Best Way to Rob a Bank Is to Own One, the most fraudulent S&Ls were the ones that looked most profitable on paper. That was in fact an inherent part of the scam. So perhaps, instead of looking solely at profitability, we should also consider this more fundamental measure of a firm’s “performance.” And many questions remain. What types of firms, and what kind of executives drive the greatest deviations from Benford’s law? Does this measure do well in predicting known instances of fraud? How much of these deviations are driven by government deregulation, changes in accounting standards, and traditional measures of corporate governance?

Source: http://econerdfood.blogspot.com/2011/10/benfords-law-and-decreasing-reliability.html

Date: Sunday, October 9, 2011

Benford’s Law and the Decreasing Reliability of Accounting Data for US Firms

3 years ago

Statistic: All Blacks v Australia – What does history say?

The NZ Herald uses a dynamic pie chart to display the record of the ABs vs Australia.

Source: NZ Herald

Date: Friday Oct 14, 2011

Personally, I think the game result is memoryless. The next result does not necessarily depend on the previous results.

3 years ago