Writing

Design and the Elastic Mind

Perhaps three months late for an announcement, and at the risk of totally reckless narcissism, I should mention that four of my projects are currently on display in the Design and the Elastic Mind exhibition at the Museum of Modern Art in New York. My work notwithstanding, I hear that the show is generating lots of foot traffic and positive reviews, which is a well-deserved compliment to curator Paola Antonelli.

There’s a New York Times article and slide show (too much linking to the Times lately, weird…) and a writeup in the International Herald Tribune that even mentions my Humans vs. Chimps piece.

The first wall as you enter the show is all of Chromosome 18, done in the style of this piece.

chr18-elastic-510b.jpg

It’s a 3 pixel font at 150 dpi, so there are 37.5 letters per inch in either direction, and the wall is about 20 feet square, making 75 million letters total. Paola and her staff asked whether it was OK to put the text on the piece itself, which I felt was fine, as the nature of the piece is about scale, and the printing would not detract from that. The funny side effect of this was watching people at the opening take one another’s picture in front of the piece, most of them probably not realizing that the wall itself was part of the exhibition. Perhaps my most popular work so far, given the number of family photos in which it will be found.
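The arithmetic behind that count is easy to check. What follows is a minimal sketch, assuming each letter occupies a 4-pixel cell (the 3-pixel glyph plus one pixel of spacing, which is an assumption, though it is the cell size that makes 150 dpi come out to 37.5 letters per inch) and a wall exactly 20 feet on a side:

```python
# Back-of-the-envelope check of the letter count on the wall.
# Assumption: each letter sits in a 4-pixel cell (3-pixel glyph + 1 pixel of
# spacing); that is what makes 150 dpi work out to 37.5 letters per inch.
DPI = 150
CELL_PIXELS = 4   # assumed: 3-pixel font plus a 1-pixel gap
WALL_FEET = 20    # "about 20 feet square", taken here as exactly 20

letters_per_inch = DPI / CELL_PIXELS
letters_per_side = letters_per_inch * WALL_FEET * 12
total_letters = letters_per_side ** 2

print(letters_per_inch)   # 37.5
print(letters_per_side)   # 9000.0
print(total_letters)      # 81000000.0
```

An exact 20-foot square would hold about 81 million cells, so 75 million letters corresponds to a side a little over 19 feet, which fits "about 20 feet."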

Former classmate Ron Kurti also took a nice detail shot:

chr18-placard-kurti-510.jpg

Also in the show is the previously mentioned Humans vs. Chimps project as seen below:

chimp-510.jpg

This image is about three feet wide so you can read the letters accurately. It’s found next to an identically sized print of isometricblocks depicting the CFTR region of the human genome (the area implicated in connection to Cystic Fibrosis). The image was first developed for a Nature cover.

isometricblocks-510.jpg

Finally, the Pac-Man print of distellamap is printed floor to ceiling on another wall in the exhibition. Unfortunately there was a glitch in the printing that caused the lines connecting portions of the code to be lost (because they’re too thin to see at a distance), but no matter.

pacman-crop-510.jpg

Much more so than my own work, however, what’s most exciting for me is the number of projects built with Processing that are in the show. It’s a bit humbling and the sort of thing that makes me excited (and relieved) to have some time this summer to devote to Processing itself.

Wednesday, April 30, 2008 | iloveme  

Google Underwater

So that might not be the awesome name that they’ll be using, but CNET is rumormongering about Google cooking up something oceanographic along the lines of Maps or Earth. Their speculation includes this lovely image from the Lamont-Doherty Earth Observatory (LDEO) of Columbia University.

underwatertiles_510.jpg

Unlike most people with a heartbeat, I didn’t find Google Maps particularly interesting on arrival. I was a fan of the simplicity of Yahoo Maps at the time (but no longer, eek!) and Microsoft’s Terraserver had done satellite imagery for a few years. But the same way that Google Mars shows us something we’re even less familiar with than satellite imagery of Earth, there’s something really exciting about the possibility of seeing beneath the oceans.

Wednesday, April 30, 2008 | mapping, rumors, water  

Me blog big linky

Kottke and Freakonomics were kind enough to link over here, which has brought more queries about salaryper. Rather than piling onto the original web page, I’ll add updates to this section of the site.

I didn’t include the project’s back story with the 2008 version of the piece, so here goes:

Some background for people who don’t watch/follow/care about baseball:

When I first created this piece in 2005, the Yankees had a particularly bad year, with a team full of aging all-stars and owner George Steinbrenner hoping that a World Series trophy could be purchased for $208 million. The World Champion Red Sox did an ample job of defending their title, but as the second highest paid team in baseball, they’re not exactly young upstarts. The Chicago White Sox had an excellent year with just one third the salary of the Yankees, while the Cardinals are performing roughly on par with what they’re paid. Interestingly, the White Sox went on to win the World Series. The performance of Oakland, which in previous years has far exceeded their overall salary, was a story, largely about their General Manager Billy Beane, told in the book Moneyball.

Some background for people who do watch/follow/care about baseball:

I neglected to include a caveat on the original page that this is a really simplistic view of salary vs. performance. I created this piece because the World Series victory of my beloved Red Sox was somewhat bittersweet in the sense that the second highest paid team in baseball finally managed to win a championship. This fact made me curious about how that works across the league, with raw salaries and the general performance of the individual teams.

There are lots of proportional things that can be done too: the salaries especially exist across a wide range (the Yankees waaaay out in front, followed by another pack of big market teams, then everyone else).

There are far more complex things about how contracts work over multiple years, how the farm system works, and scoring methods for individual players that could be taken into consideration.

This piece was thrown together while watching a game, so it’s perhaps dangerously un-advanced, given the amount of time and energy that’s put into the analysis (and argument) of sports statistics.

That last point is really important… This is fun! I encourage people to try out their own methods of playing with the data. For those who need a guide on building such a beast, the book has all the explanation and all the code (which isn’t much). And if you adapt the code, drop me a line so I can link to your example.

I have a handful of things I’d like to try (such as a proper method for doing proportional spacing at the sides without overdoing it), though the whole point of the project is to strip away as much as possible, and make a straightforward statement about salaries, so I haven’t bothered coming back to it since it succeeds in that original intent.

Wednesday, April 30, 2008 | salaryper, updates, vida  

Updated Salary vs. Performance for 2008

It’s April again, which means that there are messages lurking in my inbox asking about the whereabouts of this year’s Salary vs. Performance project (found in Chapter 5 of the good book). I got around to updating it a few days ago, which means now my inbox has changed to suggestions on how the piece might be improved. (It’s tempting to say, “Hey! Check out the book and the code, you can do anything you’d like with it! It’s more fun that way.” but that’s not really what they’re looking for.)

One of the best messages I’ve received so far is from someone who I strongly suspect is a statistician, who was wishing to see a scatter plot of the data rather than its current representation. Who else would be pining for a scatterplot? There are lots of jokes about the statistically inclined that might cover this situation, but… we’re much too high minded to let things devolve to that (actually, it’s more of a pot-kettle-black situation). If prompted, statisticians usually tell better jokes about themselves anyways.

At any rate, as it’s relevant to the issue of how you choose representations, my response follows:

Sadly, the scatter plot of the same data is actually kinda uninformative, since one of your axes (salary) is more or less fixed all season (might change at the trade deadline, but more or less stays fixed) and it’s just the averages that move about. So in fact if we’re looking for more “accurate”, a time series is gonna be better for our purposes. In an actual analytic piece, for instance, I’d do something very different (which would include multiple years, more detail about the salaries and how they amortize over time, etc).

But even so, making the piece more “correct” misses the intentional simplifications found in it, e.g. it doesn’t matter whether a baseball team was 5% away from winning, it only matters whether they’ve won. At the end of the day, it’s all about the specific rankings, who gets into the playoffs, and who wins those final games. The piece isn’t intended as an analytical tool, but as something that conveys the idea of salary vs. performance to an audience who by and large cares little about 1) baseball and 2) stats. That’s not to say that it’s about making something zoomy and pretty (and irrelevant), but rather, how do you engage people with the data in a way that teaches them something in the end and gets them thinking about it.
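To make the point about rankings concrete, here is a minimal sketch of reducing salary and record to two ordered lists and comparing a team’s position in each. The team names and numbers are hypothetical placeholders, and this is not the code from the book, just one plausible way to do it:

```python
# Hypothetical payroll (millions of dollars) and win-loss records; the team
# names and numbers are placeholders, not real data.
teams = {
    "Team A": {"salary": 200, "wins": 10, "losses": 12},
    "Team B": {"salary": 120, "wins": 14, "losses": 8},
    "Team C": {"salary": 60, "wins": 12, "losses": 10},
}


def ranking(values):
    """Return names ordered from highest value to lowest."""
    return sorted(values, key=values.get, reverse=True)


salary_rank = ranking({name: t["salary"] for name, t in teams.items()})
record_rank = ranking(
    {name: t["wins"] / (t["wins"] + t["losses"]) for name, t in teams.items()}
)

for name in teams:
    # Positive gap: the team places better on the field than on the payroll list.
    gap = salary_rank.index(name) - record_rank.index(name)
    print(name, salary_rank.index(name) + 1, record_rank.index(name) + 1, gap)
```

Only the order survives, which is the simplification described above: a team that misses first place by a hair looks the same as one that misses by a mile.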

Now to get back to my inbox and the guy who would rather have the data sonified since he thinks this visual thing is just a fad.

Tuesday, April 29, 2008 | examples, represent, salaryper  

All Streets Error Messages

Some favorite error messages while working on the All Streets project (mentioned below). I was initially hoping to use Illustrator to open the generated PDF files (generated from Processing), but Venus informed me that it was not to be:

illustrator-sucks-balls.png

I’m having difficulties as well. Why did I pay for this software?

Generally, Photoshop is far better engineered so I was hoping that it would be able to rasterize the PDF file instead, never mind the vectors and all.

photoshops-own-balls.png

Oh come on… Just admit that you ran out of memory and can’t deal. Meanwhile, Eugene was helping out with the site, from the other end of iChat:

aim-error-none.png

Oh well.

Sunday, April 27, 2008 | allstreets, software  

The Advantages of Closing a Few Doors

From the New York Times, a piece about Predictably Irrational from Dan Ariely. I’m somewhat fascinated by the idea of our general preoccupation with holding on to things, particularly as it relates to retaining data (see previous posts referencing Facebook, Google, etc.)

Our natural tendency is to keep everything, in spite of the consequences. Storage capacity in the digital realm is only getting larger and cheaper (as its size in the physical realm continues to get smaller), which only serves to feed this tendency further. Perhaps this is also why more individuals don’t question Google claiming a right to keep messages from their Gmail account after the messages, or even the account, have been deleted.

Ariely’s book describes a set of experiments performed at M.I.T.:

[Students] played a computer game that paid real cash to look for money behind three doors on the screen… After they opened a door by clicking on it, each subsequent click earned a little money, with the sum varying each time.

As each player went through the 100 allotted clicks, he could switch rooms to search for higher payoffs, but each switch used up a click to open the new door. The best strategy was to quickly check out the three rooms and settle in the one with the highest rewards.

Even after students got the hang of the game by practicing it, they were flummoxed when a new visual feature was introduced. If they stayed out of any room, its door would start shrinking and eventually disappear.

They should have ignored those disappearing doors, but the students couldn’t. They wasted so many clicks rushing back to reopen doors that their earnings dropped 15 percent. Even when the penalties for switching grew stiffer — besides losing a click, the players had to pay a cash fee — the students kept losing money by frantically keeping all their doors open.

(Emphasis mine.) I originally came across the article via Mark Hurst, who adds:

I’ve said for a long time that the solution to information overload is to let the bits go: always look for ways to delete, defer, or otherwise avoid bits, so that the few that remain are more relevant and easier to handle. This is the core philosophy of Bit Literacy.

Put another way, do we need to take more personal responsibility for subjecting ourselves to the “information overload” that people so happily buzzword about? Is complaining about the overload really an issue of not doing enough spring cleaning at home?

Sunday, April 27, 2008 | retention  

Restroom information graphics

bacon-510.jpg

I like neither bacon nor these machines, so I wish they would always provide this helpful explanation (or warning).

Friday, April 25, 2008 | infographics  

The Earth at night

Via mailing list, Oswald Berthold passes along images and a short article of the Earth from space as compiled by NASA, highlighting city lights in particular.

Tokyo Bay

The collection is an update to the Earth Lights image developed a few years ago (and which made its way ’round the interwebs at the time).

For the more technical, a presentation from the NOAA titled Low Light Imaging of the Earth at Night provides greater detail about the methods used to produce such images. Also includes a couple interesting historical examples (such as the first image they created) as well as comparisons of city growth over time based on changes in the data.

Of course many conclusions can be drawn from seeing map data such as this. Look at the difference between North and South Korea, for instance (original image from globalsecurity.org).

North and South Korea by night

Apparently this is a favorite of former U.S. Secretary of Defense Donald Rumsfeld:

Mr Rumsfeld showed the picture to illustrate how backward the northern regime really is – and how oppressed its people are. Without electricity there can be none of the appliances that make life easy and that we take for granted, he said.

“Except for my wife and family, that is my favourite photo,” said Mr Rumsfeld.

“It says it all. There’s the south, the same people as the north, the same resources north and south, and the big difference is in the south it’s a free political system and a free economic system.”

I’ve vowed to myself not to make this page be about politics so I won’t get into the fatuous arguments of a warmonger (oops), but I think the fascinating thing is that

  1. This image, this “information graphic,” would be of such great importance to a person that he would see fit to even mention it in reference to photos of his wife and children. This is a strong statement for any image, even if he is being dramatic.
  2. The use of images to make or score political points. There’s some great stuff buried in recent Congressional testimony about the Iraq War, for instance, that I want to get to soon.

In regards to #1, I’m trying to think of other images to which people maintain such a personal relationship (particularly those whose job is not info graphics—Tufte’s preoccupation with Napoleon’s March doesn’t count.)

As for #2, hopefully we’ll get to that a bit later.

Friday, April 25, 2008 | mapping, physical, politics  

All Streets

all streets

New work, now posted. All of the streets in the lower 48 United States: an image of 26 million individual road segments. This began as an example I created for one of my students in the fall of 2006, and I just recently got a chance to document it properly.

Nothing particularly genius about this piece—it’s mostly just a matter of collecting the data and creating the image. But it’s one of those cases where even in a (relatively) raw format, the data itself is quite striking.

The data in this piece comes from the U.S. Census Bureau’s TIGER/Line data files. The data is first parsed and filtered (to remove non-street features) using Perl. Next, using Processing, the latitude and longitude coordinates are transformed using an Albers equal-area conic projection (which gives it that curvy surface-of-the-Earth look that we’re used to), and then plotted to an enormous image that’s saved to the disk. The steps are similar to the preprocessing stages described in Chapter 6 of Visualizing Data.
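For the curious, the projection step can be sketched in a few lines. This is a minimal sketch in Python rather than Processing, using the spherical form of the Albers equal-area conic projection (it ignores the Earth’s flattening). The standard parallels and origin below are assumptions: they are values commonly used for maps of the contiguous United States, not necessarily the ones used for this piece.

```python
import math

# Assumed parameters: standard parallels and origin commonly used for maps of
# the contiguous United States; the original piece may have used others.
STANDARD_PARALLEL_1 = 29.5
STANDARD_PARALLEL_2 = 45.5
ORIGIN_LAT = 23.0
ORIGIN_LON = -96.0


def albers(lat, lon, radius=1.0):
    """Project latitude/longitude (degrees) to (x, y) on a sphere of the given radius."""
    phi = math.radians(lat)
    lam = math.radians(lon)
    phi0 = math.radians(ORIGIN_LAT)
    lam0 = math.radians(ORIGIN_LON)
    phi1 = math.radians(STANDARD_PARALLEL_1)
    phi2 = math.radians(STANDARD_PARALLEL_2)

    # Cone constant and the distance of each parallel from the cone's apex.
    n = (math.sin(phi1) + math.sin(phi2)) / 2
    c = math.cos(phi1) ** 2 + 2 * n * math.sin(phi1)
    rho = radius * math.sqrt(c - 2 * n * math.sin(phi)) / n
    rho0 = radius * math.sqrt(c - 2 * n * math.sin(phi0)) / n
    theta = n * (lam - lam0)

    # x grows eastward, y grows northward from the origin.
    return rho * math.sin(theta), rho0 - rho * math.cos(theta)


x, y = albers(42.36, -71.06)  # roughly Boston
print(x, y)
```

Because the cone’s parallels are arcs rather than straight lines, points at the same latitude land on a curve, which is where the curvy look comes from. To draw to an image, scale x and y to pixels and flip y, since screen coordinates grow downward.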

I had originally hoped to use this piece to show patterns in street naming, but I didn’t manage to find as much as I had hoped. I expected to see, for instance, names of local trees and flowers tied to the geographic regions where they’re found. However, cookie cutter suburban neighborhood developments seem to have obliterated any such connection. “Magnolia” is such a nice sounding, outdoorsy word; who wouldn’t want it adorning their street corner? Local flora be damned.

There are, however, a few other interesting tidbits in the data that I hope to cover in a future project. Real work be damned.

Friday, April 25, 2008 | allstreets  

Data availability is aiming too low

The quote is primarily in regards to Web 2.0 (cough), and I couldn’t agree more.

“Praising companies for providing APIs to get your own data out is like praising auto companies for not filling your airbags with gravel. I’m not saying data export isn’t important, it’s just aiming kinda low. You mean when I give you data, you’ll give it back to me? People who think this is the pinnacle of freedom aren’t really worth listening to.”

Via pmarca, I think?

Thursday, April 24, 2008 | acquire  

Dusting off

There’s nothing worse than someone keeping a journal or blog and having it go stale, so I’ve watched in horror during the forty-day Lenten fast since I last had a chance to post. Things should be better in the next few weeks.

My guidance is Mark Twain, speaking in The Innocents Abroad, who lampooned blogging so accurately a short 139 years ago.

One of our favorite youths, Jack, a splendid young fellow with a head full of good sense, and a pair of legs that were a wonder to look upon in the way of length and straightness and slimness, used to report progress every morning in the most glowing and spirited way, and say:

“Oh, I’m coming along bully!” (he was a little given to slang in his happier moods.) “I wrote ten pages in my journal last night – and you know I wrote nine the night before and twelve the night before that. Why, it’s only fun!”

“What do you find to put in it, Jack?”

“Oh, everything. Latitude and longitude, noon every day; and how many miles we made last twenty-four hours; and all the domino games I beat and horse billiards; and whales and sharks and porpoises; and the text of the sermon Sundays (because that’ll tell at home, you know); and the ships we saluted and what nation they were; and which way the wind was, and whether there was a heavy sea, and what sail we carried, though we don’t ever carry any, principally, going against a head wind always – wonder what is the reason of that? – and how many lies Moult has told – Oh, every thing! I’ve got everything down. My father told me to keep that journal. Father wouldn’t take a thousand dollars for it when I get it done.”

“No, Jack; it will be worth more than a thousand dollars – when you get it done.”

“Do you? – no, but do you think it will, though?”

“Yes, it will be worth at least as much as a thousand dollars – when you get it done. May be more.”

“Well, I about half think so, myself. It ain’t no slouch of a journal.”

But it shortly became a most lamentable “slouch of a journal.” One night in Paris, after a hard day’s toil in sightseeing, I said:

“Now I’ll go and stroll around the cafes awhile, Jack, and give you a chance to write up your journal, old fellow.”

His countenance lost its fire. He said:

“Well, no, you needn’t mind. I think I won’t run that journal anymore. It is awful tedious. Do you know – I reckon I’m as much as four thousand pages behind hand. I haven’t got any France in it at all. First I thought I’d leave France out and start fresh. But that wouldn’t do, would it? The governor would say, ‘Hello, here – didn’t see anything in France?’ That cat wouldn’t fight, you know. First I thought I’d copy France out of the guide-book, like old Badger in the for’rard cabin, who’s writing a book, but there’s more than three hundred pages of it. Oh, I don’t think a journal’s any use – do you? They’re only a bother, ain’t they?”

“Yes, a journal that is incomplete isn’t of much use, but a journal properly kept is worth a thousand dollars – when you’ve got it done.”

“A thousand! – well, I should think so. I wouldn’t finish it for a million.”

Stay tuned for Mark Twain’s thoughts on Digg, YouTube, and Web 2.0.

Thursday, April 24, 2008 | site  

Representing power usage in the sky

collage123_600.jpg

Wonderful project that shows power usage mapped to a green cloud, projected into the sky and onto the output of the Salmisaari power plant in Helsinki. From their description:

Every night from the 22 to the 29 of February 2008, the vapour emissions of the Salmisaari power plant in Helsinki will be illuminated to show the current levels of electricity consumption by local residents. A laser ray will trace the cloud during the night time and turn it into a city scale neon sign. Nuage Vert is a communal event for the area of Ruoholahti, which anticipates esoteric cults centred on energy and transforms an active power plant into a space for art, a living factory. In tandem, as a reversal of conventional roles whereby the post-industrial factory is turned into space for culture, Kaapeli (the cultural factory) becomes the site of operation and Salmisaari (the industrious factory) becomes the site of spectacle.

Check out their blog page with updates and pictures.

Thursday, April 24, 2008 | physical  
Book

Visualizing Data Book Cover

Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.
