January 9, 2013

Something beginning with ‘D’

The graph shows letter frequencies in English by position in the word — for example, ‘e’ is much less common as the first letter than as the third letter.
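For readers who want to try the idea on their own text, here is a minimal sketch of the tally behind a graph like this. It is not the method used in either study: the function names and the toy word list are hypothetical, each word counts once rather than being weighted by how often it occurs, and non-letter characters are simply dropped before positions are numbered.

```python
from collections import Counter, defaultdict


def positional_letter_counts(words):
    """Count how often each letter appears at each 1-based position in the words."""
    counts = defaultdict(Counter)
    for word in words:
        # Assumption: ignore apostrophes, hyphens, digits; compare case-insensitively.
        letters = [ch for ch in word.lower() if ch.isalpha()]
        for position, letter in enumerate(letters, start=1):
            counts[position][letter] += 1
    return counts


def positional_share(counts, letter, position):
    """Fraction of words long enough to have `position` whose letter there is `letter`."""
    total = sum(counts[position].values())
    return counts[position][letter] / total if total else 0.0


# Hypothetical toy sample, far too small to say anything about English.
sample = ["the", "tree", "every", "step", "open"]
counts = positional_letter_counts(sample)

print("share of 'e' in position 1:", positional_share(counts, "e", 1))
print("share of 'e' in position 3:", positional_share(counts, "e", 3))
```

With a real corpus you would pass in every word token (or multiply each count by the word's frequency) and plot the shares for each letter across positions.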



The first systematic study of these frequencies was done using a sample of 20 000 words collected by hand and analysed with punched cards and a card-sorting machine. The study's author recently wrote to Peter Norvig, at Google, suggesting that an update might be of interest. The full details, using 0.75 trillion words, are on Norvig’s web page.

While you’re there, if you have yet to encounter the “Gettysburg Powerpoint”, now is your opportunity.


Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.
