Tuesday, August 11, 2009

Book Review: The Numerati

With the advent of the Web and the fall in electronics prices, we have seen an explosion of digital data, from huge databases collecting various pieces of information to ever-larger collections of documents. The Numerati (a portmanteau of "number" and "illuminati") are the statisticians, mathematicians, computer scientists, linguists and others involved in making sense of this data using sophisticated statistical techniques. The book, by Stephen Baker, describes the kinds of problems being solved in the following areas, citing examples from organizations such as IBM, Intel and Umbria:
  • Workers - building employee profiles, understanding employee networks, and using both to allocate resources optimally.
  • Shoppers - microtargeting shoppers using personal information to customize service, give recommendations and increase sales.
  • Voters - understanding voter intent and issues, so that campaign messages can be targeted at focused groups.
  • Bloggers - understanding public opinion from the information in the blogosphere, which is useful for gauging sentiment about products, etc.
  • Medicine - Baker focuses on futuristic health monitoring (like floor tiles which capture your walking patterns!), whereas he totally ignores contemporary challenges and work in analyzing medical records and genomic and proteomic data.
  • Terrorism
  • Match Making
All this comes at a cost. The Numerati have access to vast amounts of personal data, and we don't need an Orwellian Big Brother using it to learn about us, turn us into commodities and control our lives.

That's about it for the book. It is a brisk read, but you can give it a miss if you think you are already familiar with the above topics.

Book Review: The Lady Tasting Tea

A lady claims that tea tastes different when the milk is poured into the tea as opposed to when the tea is poured into the milk. Everyone at the small party scoffs at the suggestion, except Ronald Aylmer Fisher. Fisher designs an experiment that would test the lady's claim statistically. He prepares a set of cups with tea made both ways, and lo and behold, the story goes that the lady identifies each cup correctly. Fisher uses this example to explain the design of experiments in his book 'The Design of Experiments'. This anecdote sets up the book. 'The Lady Tasting Tea' is the story of the development of statistics, with Fisher having built many of the pillars of statistics as it stands today.
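To get a feel for why such an experiment is convincing, here is a minimal sketch of the arithmetic. It assumes the textbook version of the setup, which the anecdote above does not spell out: eight cups, four prepared each way, with the lady told to pick out the four milk-first cups. The function name and default cup counts are my own, for illustration only.

```python
from math import comb

def p_at_least(correct, cups=8, milk_first=4):
    """Chance of picking at least `correct` of the milk-first cups by pure guessing.

    Assumes the taster knows how many cups were prepared milk-first and
    selects exactly that many, so the count of correct picks follows a
    hypergeometric distribution.
    """
    total = comb(cups, milk_first)  # all possible ways to choose the cups
    ways = sum(
        comb(milk_first, k) * comb(cups - milk_first, milk_first - k)
        for k in range(correct, milk_first + 1)
    )
    return ways / total

print(p_at_least(4))  # all four right: 1 chance in 70
print(p_at_least(3))  # three or more right: 17 chances in 70
```

Under these assumptions a perfect score happens by luck only once in 70 tries, while getting three or more right happens about a quarter of the time, which is why only a perfect score would be persuasive with so few cups.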

I started reading this book while looking around for ways to brush up on my statistics; I thought it would be a good idea to know the history of the subject I am exploring. That's particularly relevant in sciences filled with uncertainties, like statistics, economics and linguistics, where the characteristics of the individuals seem to contribute to the development of the theory, and there's a story behind things which seem arbitrary.

David Salsburg takes us on an entertaining journey, starting with the earliest breakthroughs by Karl Pearson and William Gosset and moving on to the pioneering foundational work of the acerbic genius Ronald Fisher, the cheerful Jerzy Neyman, and the multitalented Andrei Kolmogorov. Apart from these pioneers, Salsburg vividly sketches the lives and contributions of Egon Pearson (hypothesis testing), Chester Bliss (probit analysis), John Tukey (exploratory data analysis), Frank Wilcoxon (non-parametric methods), E. J. G. Pitman (non-parametric methods), Prasanta Chandra Mahalanobis (sampling theory), Samuel Wilks (founder of the Statistical Research Group, Princeton), George Box (robust statistics) and W. Edwards Deming (statistical quality control).

Some of the chapter names are interesting, and they are as good as the title of the book. They remind me of the memorable illustrative sketches in 'The Mythical Man-Month'. Sample these:

  • The Mozart of Mathematics - Andrei Kolmogorov
  • The Picasso of Statistics - John Tukey
  • The March of the Martingales - on the work of Paul Lévy

Read this if you are a fan of scientific history.