Pirates of Statistics

Pirates of Rrrrr!Article from the New York Times a bit ago, covering R, everyone’s favorite stats package:

R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.

R is also open source, another focus of the article, which includes quoted gems such as this one from commercial competitor SAS:

Closed source: it’s got what airplanes crave!“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

Pure gold: free software is scary software! And freeware? Is she trying to conflate R with free software downloads from CNET?

Truth be told, I don’t think I’d want to be on a plane that used a jet engine designed or built with SAS (or even R, for that matter). Does she know what her product does? (A hint: It’s a statistics package. You might analyze the engine with it, but you don’t use it for design or construction.)

For those less familiar with the project, some examples:

…companies like Google and Pfizer say they use the software for just about anything they can. Google, for example, taps R for help understanding trends in ad pricing and for illuminating patterns in the search data it collects. Pfizer has created customized packages for R to let its scientists manipulate their own data during nonclinical drug studies rather than send the information off to a statistician.

At any rate, many congratulations to Robert Gentleman and Ross Ihaka, the original creators, for their success. It’s a wonderful thing that they’re making enough of a rumpus that a stats package is being covered in a mainstream newspaper.


Tuesday, January 27, 2009 | languages, mine, software  

Visualizing Data Book CoverVisualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next is (chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.