writing | ben fry

Writing

Data is the pollution of the information age

Welcome to the future, where everything about you is saved. A future where your actions are recorded, your movements are tracked, and your conversations are no longer ephemeral. A future brought to you not by some 1984-like dystopia, but by the natural tendencies of computers to produce data.

Data is the pollution of the information age. It’s a natural by-product of every computer-mediated interaction. It stays around forever, unless it’s disposed of. It is valuable when reused, but it must be done carefully. Otherwise, its after-effects are toxic.

The essay goes on to cite specific examples, though they sound more high-tech than the actual problem. Later it returns to the important part:

Cardinal Richelieu famously said: “If one would give me six lines written by the hand of the most honest man, I would find something in them to have him hanged.” When all your words and actions can be saved for later examination, different rules have to apply.

Society works precisely because conversation is ephemeral; because people forget, and because people don’t have to justify every word they utter.

Conversation is not the same thing as correspondence. Words uttered in haste over morning coffee, whether spoken in a coffee shop or thumbed on a BlackBerry, are not official correspondence.

And an earlier paragraph that highlights why I talk about privacy on this site:

And just as 100 years ago people ignored pollution in our rush to build the Industrial Age, today we’re ignoring data in our rush to build the Information Age.

Tuesday, March 17, 2009 | privacy

Book

Visualizing Data is my 2007 book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. When first published, it was the only book for people who wanted to learn how to actually build a data visualization in code.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project that places data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.
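To make "laying out data spatially" a little more concrete, here is a minimal sketch, written in plain Python rather than Processing and not taken from the book. It assumes each data point comes with a longitude and latitude, and it linearly rescales those into pixel coordinates, the same kind of range-to-range remapping that Processing's map() function performs. The names remap, to_screen, and BOUNDS, as well as the bounding box values, are hypothetical and chosen only for illustration.

```python
def remap(value, in_min, in_max, out_min, out_max):
    """Linearly rescale value from [in_min, in_max] to [out_min, out_max]."""
    t = (value - in_min) / (in_max - in_min)
    return out_min + t * (out_max - out_min)


def to_screen(lon, lat, bounds, width, height):
    """Convert a longitude/latitude pair to pixel coordinates.

    bounds is (min_lon, max_lon, min_lat, max_lat). This is a flat,
    unprojected layout, which is an assumption made for simplicity.
    """
    min_lon, max_lon, min_lat, max_lat = bounds
    x = remap(lon, min_lon, max_lon, 0, width)
    # Screen y grows downward, so the northern edge maps to y = 0.
    y = remap(lat, min_lat, max_lat, height, 0)
    return x, y


# Rough, assumed bounding box for the contiguous United States.
BOUNDS = (-125.0, -65.0, 25.0, 50.0)

if __name__ == "__main__":
    # A point in the middle of the bounding box lands mid-canvas.
    print(to_screen(-95.0, 37.5, BOUNDS, 640, 400))
```

The same remapping works for any pair of numeric columns, not just geographic ones, which is why a map makes a reasonable starting exercise for spatial layout in general.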

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5) and zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.
