This here is a ghost town

This blog was created in 2008 and hasn’t been actively updated for several years.

For more recent work, please visit the Fathom site, where you can see current projects, or read the latest updates on what we’ve been up to.

For Processing Foundation work, please visit the site or check out our page on GitHub.

Tuesday, April 14, 2015 | site  

And speaking of height…

Another wonderful example, more powerful as words than as an image:

Jan Pen, a Dutch economist who died last year, came up with a striking way to picture inequality. Imagine people’s height being proportional to their income, so that someone with an average income is of average height. Now imagine that the entire adult population of America is walking past you in a single hour, in ascending order of income.

The first passers-by, the owners of loss-making businesses, are invisible: their heads are below ground. Then come the jobless and the working poor, who are midgets. After half an hour the strollers are still only waist-high, since America’s median income is only half the mean. It takes nearly 45 minutes before normal-sized people appear. But then, in the final minutes, giants thunder by. With six minutes to go they are 12 feet tall. When the 400 highest earners walk by, right at the end, each is more than two miles tall.

(From The Economist, by way of Eva)
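
Since this is nominally a visualization blog, here’s a minimal Processing sketch of the idea: one pixel column per marcher, height proportional to income, sorted so the parade passes in ascending order. The income figures below are synthetic placeholders rather than real data (the real distribution is far more extreme at the top), so treat it as a sketch of the shape, not the statistics.

// Pen's parade as a quick sketch: each pixel column is one marcher, with
// height proportional to income, walking by in ascending order.
float[] incomes;

void setup() {
  size(900, 300);
  incomes = new float[width];
  for (int i = 0; i < incomes.length; i++) {
    // heavily right-skewed stand-in for a real income distribution,
    // with a few negative values for the loss-making businesses
    incomes[i] = pow(random(1), 8) * 2000000 + random(-20000, 60000);
  }
  incomes = sort(incomes);
  noLoop();
}

void draw() {
  background(255);
  float mean = 0;
  for (float v : incomes) mean += v;
  mean /= incomes.length;

  float ground = height * 0.75;
  stroke(0);
  for (int x = 0; x < incomes.length; x++) {
    // the average earner is drawn at a quarter of the window height;
    // negative incomes dip below the ground line
    float h = incomes[x] / mean * height * 0.25;
    line(x, ground, x, ground - h);
  }
  stroke(200, 0, 0);
  line(0, ground, width, ground);  // ground level
}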

Tuesday, February 1, 2011 | finance, scale  

The importance of showing numbers in context

An info graphic from the Boston Globe:

measuring in shaq inches

Monday, January 31, 2011 | basketball, scale, sports  

Come work with us in Boston

Fathom Information Design is looking for developers and designers. Come join us!

We’re looking for people to join us at Fathom. For all the positions, you’ll be creating work like you see on fathom.info, plus more mobile projects (Android, iOS, JavaScript) and the occasional installation piece. If you’re a developer, design skills are a plus. Or if you’re a designer, same goes for coding.

  • Developer – Looking for someone with a strong background in Java, and some C/C++ as well. On Monday this person would be sorting out more advanced aspects of a client project. On Tuesday they would hone the Processing Development Environment, mercilessly crushing bugs. On Wednesday they would refactor critical visualization tools used by brilliant scientists. On Thursday they could put out a fire in another client project without breaking a sweat, and on the fifth day, they would choose what we’re having for Beer Friday. This messiah also might not mind being referred to in the third person.
  • Web Developer – In 1996, I used Java for my Computer Graphics 2 homework at Carnegie Mellon. I’ll never forget the look on the face of my professor Paul Heckbert (Graphics Gems IV, Pixar, and now Gigapan — a man who wrote an actual ray tracer in C code that fit on the back of a business card), when he asked me during office hours why this was a good idea. Your professor did the same thing when you told him (or her) that you’d be implementing your final project with JavaScript and Canvas. We need amazing things to happen with HTML, CSS, and JavaScript, and you’re the person to do it.
  • Junior Designer – You’ve finished your undergrad design program and feel the need to make beautiful things. Your commute is spent fixing the typography in dreadful subway ads (only in your head, please). You are capable of pixel-level detail work to get mobile apps or a web site just right. And if we’re lucky, you’re so good with color that you’ve been mistaken for an impressionist painter.
  • Senior Designer – So all that stuff above that the Junior Designer candidate thinks they can do? You can actually do it. And more important, you have the patience and humility to teach it to others around you. You’re also an asset on group projects, best friends with developers, and adored by clients.

At the moment, we’re only looking for people located in (or willing to relocate to) the Boston area.

Please send résumé or CV, links to relevant work, and cover letter to inquire (at) fathom (dot) info. Please do not write us individually, as that may void your contest entry.

Monday, January 17, 2011 | opportunities  

Minnesota, meet Physics

The roof of the Metrodome springs a leak following heavy snow in Minnesota:

I’ve clearly been looking at too many particle and fluid dynamics simulations, because it looks fake to me — more like a simulation created by the structural engineers of what would happen if the roof were to collapse than thousands of pounds of honest-to-goodness midwestern snow pummeling the turf seemingly in slow motion. Beautiful.

And another version from a local FOX affiliate in Minnesota:

Sunday, December 12, 2010 | physical, simulation, water  

The growth of the Processing project

Number of Processing users, every four weeks, since 2005:

humbling and terrifying

Long version: this is a tally of the number of unique users who run the Processing environment every four weeks, as measured by the number of machines checking for updates.

Of note:

  • In spite of the frequently proclaimed “death of Java” or “death of Java on the desktop,” we’re continuing to grow. This isn’t to say that Java on the desktop is undead, but this frustrating contradiction presents a considerable challenge for us… I’ll write more about that soon.
  • There’s a considerable (even comical) dip each January, when people decide that the holidays and drinking with their family are more fun than coding (or maybe that’s only my household). Things also tail off during the summer into August. These two trends are amplified by the number of academic users; however, other data I’ve seen (web traffic, etc.) suggests that the rest of the world actually operates on something like the academic calendar as well.

About the data:

  • This is a very conservative estimate of the number of Processing users out there. Our software is free — we don’t have a lot to gain by inflating the numbers.
  • This covers only unique users — we don’t double count the same person in each 4-week period. Otherwise our numbers would be much higher.
  • This is not downloads, which are also significantly higher.
  • This is every four weeks, not every month. Unless there are 13 months in a year. Wait, how many months are in a year?
  • This only covers people who are using the actual Processing Development Environment — no Eclipse users, etc.
  • Use of processing.js or spinoff projects are not included.
  • This doesn’t include anyone who has disabled checking for updates.
  • This doesn’t include anyone not connected to the net.
  • The unique ID is stored in the preferences.txt file, so if several people share a single login on one machine, they’re only counted once. Conversely, if you have multiple machines, you’ll be counted more than once.
  • Showing the data by day, week, or year shows the same overall trend.

This is a pretty lame visualization of the numbers, and I’m not even showing other interesting tidbits like what OS, version, and so on are in use. Maybe we can release the data if we can figure out an appropriate way to do so.
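
For the curious, the tally itself is simple enough to sketch out. The code below is only an illustration of the approach, not the actual script: it assumes a made-up tab-separated log (update-checks.tsv) with a day number and an anonymous machine ID for each update check, groups the checks into 28-day windows, and counts the unique IDs in each window.

import java.util.HashMap;
import java.util.HashSet;

// Hypothetical input: one line per update check, "daysSinceStart<TAB>machineID".
// The real logs and their format aren't public; this only sketches the tally.
void setup() {
  size(800, 300);
  background(255);

  String[] lines = loadStrings("update-checks.tsv");  // assumed file name
  HashMap<Integer, HashSet<String>> periods = new HashMap<Integer, HashSet<String>>();
  int lastPeriod = 0;
  for (String line : lines) {
    String[] parts = split(line, '\t');
    int period = int(parts[0]) / 28;           // which 4-week window this check falls in
    if (!periods.containsKey(period)) periods.put(period, new HashSet<String>());
    periods.get(period).add(parts[1]);         // a set counts each machine ID only once
    lastPeriod = max(lastPeriod, period);
  }

  // crude bar chart: one bar per 4-week period, scaled to the largest count
  int tallest = 1;
  for (HashSet<String> ids : periods.values()) tallest = max(tallest, ids.size());
  float barWidth = width / float(lastPeriod + 1);
  noStroke();
  fill(0);
  for (int p = 0; p <= lastPeriod; p++) {
    if (!periods.containsKey(p)) continue;
    float h = map(periods.get(p).size(), 0, tallest, 0, height);
    rect(p * barWidth, height - h, barWidth - 1, h);
  }
}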

Tuesday, November 2, 2010 | processing  

Processing + Eclipse

Exciting news! The short story is that there’s a new Processing Plug-in for Eclipse, and you can learn about it here.


The long story is that Chris Lonnen contacted me in the spring about applying for the Google Summer of Code (SoC) program, which I promptly missed the deadline for. But we eventually managed to put him to work anyway, via Fathom (our own SoC army of one, with Chris working from afar in western New York) with the task of working on a new editor that we can use to replace the current Processing Development Environment (the PDE).

After some initial work and scoping things out, we settled on the Eclipse RCP as the platform, with the task of first making a plug-in that works in the Eclipse environment (everything in Eclipse is a plug-in), which could then eventually become its own standalone editor to replace the current PDE.

Things are currently incomplete (again, see the Wiki page for more details), but give it a shot, file bugs (tag with Component-Eclipse when filing), and help lend Chris a hand in developing it further. Or if you have questions, be sure to use the forum. Come to think of it, might be time for a new forum section…

Tuesday, October 19, 2010 | processing  

When you spend your life doing news graphics…

…like Karl Gude has, then parking lots start to look like this:


Tuesday, October 19, 2010 | mapping, news, perception  

Ever feel like there’s just a tiny curtain protecting your privacy online?

This piece from Niklas Roy made me laugh out loud:

Built with Processing and AVR-GCC.

(Thanks to Golan, who pointed out this link.)

Monday, October 18, 2010 | laughinglikeanidiotatyourcomputer, processing  

Already checked it in Photoshop, so you don’t have to

I wasn’t going to post this one, but I can’t get it out of my head. In the image below, the squares marked A and B are the same shade of gray.

prepare to have your mind blown. what's that? it already was?

The image is from Edward H. Adelson at MIT, and you can find my original source here. More details (proof, etc) on Adelson’s site here, which includes this explanation:

The visual system needs to determine the color of objects in the world. In this case the problem is to determine the gray shade of the checks on the floor. Just measuring the light coming from a surface (the luminance) is not enough: a cast shadow will dim a surface, so that a white surface in shadow may be reflecting less light than a black surface in full light. The visual system uses several tricks to determine where the shadows are and how to compensate for them, in order to determine the shade of gray “paint” that belongs to the surface.

The first trick is based on local contrast. In shadow or not, a check that is lighter than its neighboring checks is probably lighter than average, and vice versa. In the figure, the light check in shadow is surrounded by darker checks. Thus, even though the check is physically dark, it is light when compared to its neighbors. The dark checks outside the shadow, conversely, are surrounded by lighter checks, so they look dark by comparison.

A second trick is based on the fact that shadows often have soft edges, while paint boundaries (like the checks) often have sharp edges. The visual system tends to ignore gradual changes in light level, so that it can determine the color of the surfaces without being misled by shadows. In this figure, the shadow looks like a shadow, both because it is fuzzy and because the shadow casting object is visible.

The “paintness” of the checks is aided by the form of the “X-junctions” formed by 4 abutting checks. This type of junction is usually a signal that all the edges should be interpreted as changes in surface color rather than in terms of shadows or lighting.

As with many so-called illusions, this effect really demonstrates the success rather than the failure of the visual system. The visual system is not very good at being a physical light meter, but that is not its purpose. The important task is to break the image information down into meaningful components, and thereby perceive the nature of the objects in view.
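
If Photoshop isn’t handy, the same check takes a few lines of Processing. The file name and pixel coordinates here are placeholders, so point them at wherever squares A and B sit in your copy of the image; the printed RGB values come out identical.

// Sample one pixel inside square A and one inside square B, then print their
// RGB values. Adjust the file name and coordinates for your copy of the image.
PImage illusion;

void setup() {
  size(450, 450);
  illusion = loadImage("checkershadow.png");  // placeholder file name
  image(illusion, 0, 0);

  color a = illusion.get(120, 200);  // a point inside square A (adjust)
  color b = illusion.get(220, 280);  // a point inside square B (adjust)
  println("A: " + red(a) + ", " + green(a) + ", " + blue(a));
  println("B: " + red(b) + ", " + green(b) + ", " + blue(b));
}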

(Like the earlier illusion post, this one’s also from my mother-in-law, who should apparently be writing this blog instead of its current—woefully negligent—author.)

Sunday, October 17, 2010 | perception, science  

Processing 0191 for Android

Casey and I are in Chicago this weekend for the Processing+Android conference at UIC, organized by Daniel Sauter. In our excitement over the event, we posted revision 0191 last night (we tried to post from the back of Daniel’s old red Volvo, but Sprint’s network took exception). The release includes several Android-related updates, mostly fixes from Andres Colubri to improve how 3D works. Get the download here:

http://processing.org/download/ (under pre-releases)

Also be sure to keep an eye on the Wiki for Android updates:

(By the time you read this, there may be newer pre-releases like 0192, or 0193, and so on. Use those instead.)

Release notes for the 0191 update follow. And we’ll be doing a more final release (1.3 or 2.0, depending) once things settle a bit.

Processing Revision 0191 – 30 September 2010

Bug fix release. Contains major fixes to 3D for Android.

[ changes ]

+ Added option to preferences panel to enable/disable smoothing of text inside the editor.

+ Added more anti-aliasing to the Linux interface. Things were downright ugly in places where the defaults differed from Windows and Mac OS X.

[ bug fixes ]

+ Fix a problem with Linux permissions in the download.

+ Fix ‘redo’ command to follow various OS conventions.
Linux: ctrl-shift-z, macosx cmd-shift-z, windows ctrl-y

+ Remove extraneous console messages on export.

+ When exporting, don’t include a library multiple times.

+ Fixed a problem where no spaces in the size() command caused an error.

[ andres 1, android 0 ]

+ Implemented offscreen operations in A3D when FBO extension is not available

+ Get OpenGL matrices in A3D when GL_OES_matrix_get extension is not available

+ Implemented calculateModelviewInverse() in A3D

+ Automatic clear/noClear() switch in A3D

+ Fix camera issues in A3D

+ Major fixes for type to work properly in 3D (fixes KineticType)

+ Lighting and materials testing in A3D

+ Generate mipmaps when the GL_OES_generate_mipmaps extension is not available.

+ Finish screen pixels/texture operations in A3D

+ Fixed a bug in the camera handling. This was a quite urgent issue, since it affected pretty much everything. It went unnoticed until now because the math error canceled out with the default camera settings.

+ Also finished the implementation of the getImpl() method in PImage, so it initializes the texture of the new image in A3D mode. This makes the CubicVR example work fine.

[ core ]

+ Fix background(PImage) for OpenGL

+ Skip null entries with trim(String[])

+ Fix NaN with PVector.angleBetween

+ Fix missing getFloat() method in XML library

+ Make sure that paths are created with saveStream(). (saveStream() wasn’t working when intermediate directories didn’t exist)

+ Make createWriter() use an 8k buffer by default.

Friday, October 1, 2010 | processing  

Matthew Carter wins a MacArthur

I’m really happy to see typographer Matthew Carter receive a well-deserved MacArthur “Genius” Grant. A short video:

Very well put:

I think they’re saying to me, “You’ve done all this work. Well done… Here’s an award, now do more. Do better.” And it’s very nice, at my age, to be told by someone, that “we expect more from you. And here’s the means to help you achieve that.”

And if you’re not familiar with Carter’s name, you know his work: he created both Verdana and Georgia, at least one of which will be found on nearly any web site (the text you’re reading now is Georgia). Microsoft’s commission of these web fonts helped improve design on the web significantly in the mid-to-late 90s. Carter also developed several other important typefaces like Bell Centennial (back in the 70s), the tiny text found in phone books.

Tuesday, September 28, 2010 | typography  

Awesome now travels by poster tube

A few weeks ago I received a note from Ed Fries, who was interested in a distellamap-style print of his recently-finished Halo 2600.

Halo? Like the Xbox game by Bungie?

Why, yes! Sure enough, he’s written a version of the game for the Atari 2600.

You can play the game here, and if you don’t drown in the awesome (or die from laughing), you can now purchase prints here. Like the other distellamap prints, it shows how the image and code data coexist and interact inside an Atari 2600 cartridge game:

with all new colors!

A detail of what it looks like up close:

grab the key!

(And as with the other prints, proceeds are given to charity.)

Saturday, September 4, 2010 | distellamap, prints  

That wasn’t all he lost on his trip to Tiny

Trying to open an SVG with Illustrator, and she tells me this sad story…

shoulda listened to mom instead of the guys

Have a good Friday, everyone.

Friday, September 3, 2010 | thisneedsfixed  

Conveying multiple realities in research and journalism

A recent Boston Globe editorial covers the issue of multiple, seemingly (if obviously) contradictory statements that come from complex research, in this case around the oil spill:

Last week, Woods Hole researchers reported a 22-mile-long underwater plume that they mapped out in the Gulf of Mexico in June — a finding indicating that much more oil may lie deep underwater and be degrading so slowly that it might affect the ecosystem for some time. Also last week, University of Georgia researchers estimated up to 80 percent of the spill may still be at large, with University of South Florida researchers finding poisoned plankton between 900 feet and 3,300 feet deep. This differed from the Aug. 4 proclamation by Administrator Jane Lubchenco of the National Oceanic and Atmospheric Administration that three-quarters of the oil was “completely gone’’ or dispersed and the remaining quarter was “degrading rapidly.’’

But then comes the Lawrence Berkeley National Laboratory, which this week said a previously unclassified species of microbes is wolfing down the oil with amazing speed. This means that all the scientists could be right, with massive plumes being decimated these past two months by an unexpected cleanup crew from the deep.

This is often the case for anything remotely complex: the opacity of the research process to the general public, the communication skills of various institutions, the gap between what the public cares about (whose fault is it? how bad is it?) and the interests of the researchers, and so on.

It’s a basic issue around communicating complex ideas, and therefore affects visualization too — it’s rare that there’s a single answer.


On a more subjective note, I don’t know if I agree with the editorial’s premise that it’s on the government to sort out the mess for the public. It’s certainly a role of the government, though the sniping at the Obama administration makes the editorial writer sound like someone equally likely to bemoan government spending, size, etc. I could write an equally (perhaps more) compelling editorial making the point that it’s actually the role of newspapers like the Globe to sort out newsworthy issues that concern the public. But sadly, the Globe, or at least the front page of boston.com, has been overly obsessed with more click-ready topics like the Craigslist killer (or any other rapist, murderer, or stomach-turning story involving children du jour) and with playing “gotcha” over spending and taxes for universities and public officials. What a bunch of ghouls.

(Thanks to my mother-in-law for the article link.)

Wednesday, September 1, 2010 | government, news, reading, science  

Scientific identification, ordering, & quantification of awesome

There may be many versions of the Periodic Table, but this is my favorite.

it's more fun to be a 10-year-old boy than a crusty old academic

The image was created by The Dapperstache, who has since updated the graphic, but I prefer this version, with its bevel-crazy gradient awfulness.

Saturday, August 21, 2010 | infographics  


Ben Fry LLC now has a proper name, and it is Fathom. Or if you want to be formal about it, “Fathom Information Design”.

And today we launched a new site, fathom.info, for our work. (I’ll still be using benfry.com for my older research projects, Processing updates, software and visualization ramblings, book updates…)

We also have a new project that launched yesterday with GE, this time looking at shifts in age within world populations. A little more info about it is on the Fathom updates page (some might call it a blog). And when we have a chance, we hope to post a bit more of the process behind the piece.

Friday, July 23, 2010 | fathom  

Processing 0187

New release available shortly in the pre-releases section of processing.org/download.

More bug fixes, and one new treat for OS X users. Hopefully we’re about set
to call this one 1.2. Please test and report any issues you find.

[ additions ]

+ On Mac OS X, you’re no longer required to have a sketch window open at
all times. This will make the application feel more Mac-like–a little
more elegant and trendy and smug with superiority.

+ Added a warning to the Linux version to tell users that they should be
using the official version of Java from Sun if they’re not.
There isn’t a perfect way to detect whether Sun Java is in use,
so please let us know how it works or if you have a better idea.

[ fixes ]

+ “Unexpected token” error when creating classes with recent pre-releases.

+ Prevent horizontal scroll offset from disappearing.
Thanks to Christian Thiemann for the fix.

+ Fix NullPointerException when making a new sketch on non-English systems.

+ Fixed a problem when using command-line arguments with exported sketches
on Windows. Thanks to davbol for the fix.

+ Added requestFocusInWindow() call to replace Apple’s broken requestFocus(),
which should return the previous behavior of sketches getting focus
immediately when loaded in a web browser.

+ Add getDocumentBase() version of createInput() for Internet Explorer.
Without this, sketches will crash when trying to find files on a web server
that are not in the exported .jar file. This fix is only for IE. Yay IE!

Monday, July 12, 2010 | processing  

Processing 0186

Mixed bag of updates as a follow-on to release 0185.

[ mixed bag ]

Android SDK requirement is now API 7 (Android 2.1), because Google has deprecated API 6 (2.0.1).

More Linux PDF fixes from Matthias Breuer. Thanks!

PDF library matrix not reset between frames. (Fixed in 0185.)

Updated the URLs opened by the software to reflect the new site layout.

Updated the included examples with recent changes.

Friday, June 25, 2010 | processing  

Processing 0185

Just posted release 0185 of Processing on the download page. It’s a pre-release for what will eventually become 1.2 or 1.5. Please test and file bugs if you find problems. The list of revisions is below:

PROCESSING 0185 – 20 June 2010

Primarily a bug fix release. The biggest changes are a couple of tweaks for problems caused by Apple’s Update 2 for Java on OS X, which should make Processing usable on Macs again.

[ bug fixes ]

+ Fix for Apple bug that caused an assertion failure when requestFocus() was called in some situations. This was causing the PDE to become unusable for opening sketches, and focus highlighting was no longer happening.

+ Fixed two bugs with fonts created with specific charsets.

+ Fix from jdf for PImage(java.awt.Image img) and ARGB images. The method “public PImage(java.awt.Image)” was setting the format to RGB (even if ARGB)

+ Large number of beginShape(POINTS) not rendering correctly on first frame

+ Fix for PDF library and createFont() on Linux, thanks to Matthias Breuer.

+ Fix from takachin for a problem with full-width space with Japanese IME.

+ Reset the matrix for the PDF library in between frames; also added begin/endDraw between frames

[ additions ]

+ Add the changes for “Copy as HTML” to replace the “Copy for Discourse” function, now that we’ve shut down the old YaBB discourse board.

+ Option to disable re-opening sketches when you start Processing. The default will stay the same, but if you don’t like the feature, alter your preferences.txt file to change:
to the following:
The issue was originally filed here:
However the main problem with this is that due to other errors, the wrong sketches are being opened, sketches are sometimes forgotten, or windows are opened concurrently on top of one another, creating a bad situation:
Those bugs are not yet fixed, but will be addressed in future releases.

+ Option to change the default naming of sketches via preferences.txt.
First, you can change the prefix, which defaults to:
And the suffix is handled using dates. The current default (since 1.0) is:
Or if you want to switch back to the old (six digit) style, you could use:

+ Updated bundled JRE/tools to 6u20 for Windows and Linux

+ Several SVG fixes and additions, including some tweaks from PhiLho. These changes will be documented in a future release once the API changes are complete.

+ Added option to launch a sketch directly w/ linux. Thanks to Larry Kyrala.

+ Pass actual exceptions from InvocationTargetException in registered methods, which improves how exceptions are reported with libraries.

+ Added loading.gif to the js version of the applet loader. Not sure if this is actually working or not, but it’s there.

[ android ]

+ Added permissions for INTERNET and WRITE_EXTERNAL_STORAGE to the default AndroidManifest.xml file. This will be addressed in greater detail here:
And with the implementation of code signing here:

+ Lots of work happening underneath with regards to Android, more updates soon as things start evening out a bit.

+ Defaulting to a WVGA screen for the default Processing AVD.

Monday, June 21, 2010 | processing  

The Pleasures of Imagination

A wonderful article by Yale professor Paul Bloom on imagination:

Our main leisure activity is, by a long shot, participating in experiences that we know are not real. When we are free to do whatever we want, we retreat to the imagination—to worlds created by others, as with books, movies, video games, and television (over four hours a day for the average American), or to worlds we ourselves create, as when daydreaming and fantasizing. While citizens of other countries might watch less television, studies in England and the rest of Europe find a similar obsession with the unreal.

Another portion talks about emotional response:

The emotions triggered by fiction are very real. When Charles Dickens wrote about the death of Little Nell in the 1840s, people wept—and I’m sure that the death of characters in J.K. Rowling’s Harry Potter series led to similar tears. (After her final book was published, Rowling appeared in interviews and told about the letters she got, not all of them from children, begging her to spare the lives of beloved characters such as Hagrid, Hermione, Ron, and, of course, Harry Potter himself.) A friend of mine told me that he can’t remember hating anyone the way he hated one of the characters in the movie Trainspotting, and there are many people who can’t bear to experience certain fictions because the emotions are too intense. I have my own difficulty with movies in which the suffering of the characters is too real, and many find it difficult to watch comedies that rely too heavily on embarrassment; the vicarious reaction to this is too unpleasant.

The essay is based on an excerpt of his book, How Pleasure Works: The New Science of Why We Like What We Like, which looks like a good read if I could clear out the rest of the books on my reading pile.

A reading pile that, of course, contains too little fiction.

Friday, June 4, 2010 | creativity  


A terrific set of videos from the “Best Illusion of the Year” contest. Congratulations to all the finalists, in particular first prize winner Kokichi Sugihara, whose video is below:

More from Kokichi Sugihara (including an explanation of how this works) can be found here.

(thanks to my mother-in-law, who sent the link)

Saturday, May 22, 2010 | perception, science  

The Evolution of Privacy on Facebook

Inspired by this post by Kurt Opsahl of the EFF, Matt McKeon of IBM’s Visual Communication Lab created the following visualization depicting the evolution of the default privacy settings on Facebook:

sorry, still don't have an account on fb

Has a couple nice visual touches that prevent it from looking like YAHSVPOQUFOTI (yet another highly-stylized visualization piece of questionable utility found on the internet). Also cool to see it was built with Processing.js.

Friday, May 7, 2010 | javascript, privacy, processing, refine, social  

Cake Versus Pie: A Scientific Approach

Allie Brosh, who appears to be some sort of genius, brings us definitive arguments in the cake versus pie debate. Best to read the entire treatise, but here are a few highlights on how clearly pie defeats cake:

Ability of enjoyment to be sustained over time

what am i doing?

Couldn’t agree more: it always seems like a good idea on the first bite, and then I catch myself. What am I doing? I hate cake. Another graphic:

Unequal frosting distribution is a problem

mommy says don't swear about your dessert

I grew up requesting pie for my birthday (strawberry rhubarb, thank you very much) instead of cake. This resonates. More importantly (for this site), Brosh cites the enormous impact of pie vs. cake for information design and visualization:

Pie is more scientifically versatile:

eat your heart out, tufte. no pun intended.

Again, you really should read the full post, or the rest of her site for that matter. Her piece on the Alot is alone worth the price of admission.

Friday, May 7, 2010 | infographics, represent  

Pinhole camera image of the Sun’s path

A beautiful image taken by a pinhole camera, showing the Sun’s path over six months:

times square curvey billboards, eat your heart out

From the explanation:

The picture clearly shows the path of the sun through the sky over the last six months. I believe you can see we didn’t have a great summer by the broken lines at the top. More sun shone in the month of October.

The post also links to a description of how to make your own.

Tuesday, April 13, 2010 | physical, science  

Food Fight!

As reported here and here, Apple has updated the language in the latest release of their iPhone/iPad developer tools to explicitly disallow development with other tools and languages:

3.3.1 — Applications may only use Documented APIs in the manner prescribed by Apple and must not use or call any private APIs. Applications must be originally written in Objective-C, C, C++, or JavaScript as executed by the iPhone OS WebKit engine, and only code written in C, C++, and Objective-C may compile and directly link against the Documented APIs (e.g., Applications that link to Documented APIs through an intermediary translation or compatibility layer or tool are prohibited).

I’m happy that Apple is being explicit about this sort of thing, rather than their previous passive-aggressive stance that gave more wiggle room for their apologists. This is a big “screw you” to Adobe in particular, who had been planning to release a Flash-to-iPhone converter with Creative Suite 5. I understand why they’re doing it, but in the broader scheme of what’s at stake, why pick a fight with one of the largest software vendors for the Mac?

In addition to being grounded in total, obsessive control over the platform, the argument seems to be that the only way to make a proper iPhone/iPad experience is to build things with their tools, as a way to prevent people from developing for multiple platforms at once. This has two benefits: first, it encourages developers to think within the constraints and affordances of the platform, and second, it forces potential developers to make a choice of which platform they’re going to support. It’s not quite doubling the amount of work that would go into creating an app for both, say, the iPhone and Android, but it’s fairly close. So what will people develop for? The current winner with all the marketing and free hype from the press.

To be clear, developing within the constraints of a platform is incredibly important for getting an application right. But using Apple’s sanctioned tools doesn’t guarantee that, and using a legal document to enforce said tools steps into the ridiculous.

Fundamentally, I think the first argument — that to create a decent application you have to develop a certain way, with one set of tools — is bogus. It’s a lack of trust in your developers and even moreso, a distrust of the market. In the early days of the Macintosh, it was difficult to get companies to rework their DOS (or even Apple II) applications to use the now-familiar menu bars and icons. The Human Interface Guidelines addressed it specifically. And when companies ignored those warnings, and released software that was a clear port from a DOS equivalent, people got upset and the software got trashed. Just search for the phrase “not mac-like” and you’ll get the picture. Point being, people came around on developing for the Mac, and it didn’t require a legal document saying that developers had to use MPW and ResEdit.

The market demanded software that felt like Macintosh applications, and it’s the same for the iPhone and iPad. On the tools side, the free choice also meant that the market produced far better tools than what Apple provided — instead of the archaic MPW (ironically, itself something of a terminal application), Think Pascal, Lightspeed C, Metrowerks Codewarrior, and even Resorcerer all filled in various gaps at different times, all providing a better platform than (or at least a suitable alternative to) Apple’s tools.

But like this earlier post, it seems like Apple is being run by someone who is re-fighting battles of the 80s and 90s, but whose personal penchant for control prevents him from learning from the outcomes. That rhyming sound you hear? It’s history.

Friday, April 9, 2010 | cs, languages, mobile, software  

Cut! Cut! Paste. Cut!

Nice heat map image of how people use the menu bar in Firefox by Alex Faaborg:

copy! copy! paste! copy!

Most of the results are what you’d expect, but fun to see it nonetheless. Some other info graphics using the same data can be found here, and even better, the raw data can be found here.

Thursday, April 1, 2010 | data, heatmap, interact, inventory  

What this interminable conflict needs is a *mind map*

worse than boehner's health care diagram

What’s that?

It’s actually a map of counter-insurgency strategy for Afghanistan?


Wednesday, March 31, 2010 | networks, news, politics, thisneedsfixed  

Controlled leaks and pre-announcements

This Wall Street Journal piece sounds a lot like a controlled leak:

Apple Inc. plans to begin producing this year a new iPhone that could allow U.S. phone carriers other than AT&T Inc. to sell the iconic gadget, said people briefed by the company.

The new iPhone would work on a type of wireless network called CDMA, these people said. CDMA is used by Verizon Wireless, AT&T’s main competitor, as well as Sprint Nextel Corp. and a handful of cellular operators in countries including South Korea and Japan. The vast majority of carriers world-wide, including AT&T, use another technology called GSM.

(Paranoid emphasis my own.) Apple (like any other major company) has been known to use leaks to their advantage, and there seems to be an uptick in next generation iPhone rumors (double-size screen, faster processor, thinner, Verizon) in the past week that coincides with the announcement of several promising-sounding Android phones (big screens, fancy features, 4G and HSPA+ networks, thin, light, lots of providers). It doesn’t seem like Apple is terribly worried about Android, but aggressively keeping the Android platform from getting any sort of traction would make good business sense.

I think this is the first time that I’ve seen such rumors appearing to coincide with Android launches (that you probably didn’t even hear about), which gave me some hope that Android might be going somewhere. (I use an iPhone and a Nexus One. I’m rooting for competition and better products more than either platform.)

Microsoft was always good at using pre-announcements to kill competitors’ products (“oh, I can wait a couple months for the Microsoft solution…”), which is of course different than just leaking. Microsoft often wouldn’t ship the product, or would ship a far inferior version to what was announced or leaked, but in the meantime, they had successfully screwed the competitor. I think it’s safe to assume that there will be a new iPhone (or two) in June like there have been the past several years.

And now, back to playing with data… rumors are clearly not my thing.

Tuesday, March 30, 2010 | mobile, rumors  

Yeah, that sounds about right…

More greatness from xkcd:

the no longer secret life of numbers

(Thanks Andrea)

Monday, March 29, 2010 | inventory  

A glimpse of modern reporting

Colin Raney turned me on to this project (podcast? article? info graphic? series? part of what’s great is that there isn’t really a good term for this) by the team of five running the Planet Money podcast for NPR. To explain toxic assets, they bought one, and are now tracking its demise:

losing $1000 isn't usually this elegant

Here I’m showing the info graphic, which is just one component of telling the broader story. The series does a great job of balancing 1) investigative journalism (an engaging story), 2) participation by a small team (the four reporters plus their producer pooled $200 apiece), 3) timeliness and relevance, 4) really understanding an issue (toxic assets are in the news but we still don’t quite get it), 5) distribution (blog with updates, regular podcast), and 6) telling a story with information graphics (being able to track what’s happening with the asset).

I could keep adding to that numbered list, but my hastily and poorly worded point is that the idea is just right.

Perhaps if the papers weren’t so busy wringing their hands about the loss of classified ads, this would have been the norm five years ago when it should have been. But it’s a great demonstration of where we need to be with online news, particularly as it’s consumed on all these $500 devices we keep purchasing, which deliver the news in a tiny, scrolly text format that echoes the print version. A print format that’s hundreds of years old.

Anyhow, this is great. Cheers to the Planet Money folks.

(Another interesting perspective here, from TechDirt, which was the original link I read.)

Friday, March 26, 2010 | infographics, news  

Shirts with Zips

Got a note during SXSW from Marc Cull at CafePress telling me that they were doing real-time order visualization using an adaptation of zipdecode (explained in Visualizing Data). Fun! Gave me a giggle, at any rate:

the united states, before being ironed or air-fluffed
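
For anyone who hasn’t seen zipdecode, the gist fits in a short sketch: plot every zip code as a point, then highlight the codes that begin with whatever digits have been typed so far. What follows is a stripped-down illustration rather than the version from the book (or the CafePress adaptation), and it assumes a made-up tab-separated file of code, latitude, and longitude.

// A stripped-down take on the zipdecode idea. Assumes a "zips.tsv" file with
// columns code, latitude, longitude -- not the actual data file from the book.
String[] codes;
float[] x, y;
String typed = "";

void setup() {
  size(720, 450);
  String[] lines = loadStrings("zips.tsv");
  codes = new String[lines.length];
  x = new float[lines.length];
  y = new float[lines.length];
  for (int i = 0; i < lines.length; i++) {
    String[] col = split(lines[i], '\t');
    codes[i] = col[0];
    // crude projection onto the window, roughly continental US bounds
    x[i] = map(float(col[2]), -125, -66, 0, width);
    y[i] = map(float(col[1]), 24, 50, height, 0);
  }
}

void draw() {
  background(0);
  for (int i = 0; i < codes.length; i++) {
    // codes matching the typed prefix light up, everything else stays dim
    stroke(codes[i].startsWith(typed) ? color(255, 220, 0) : color(80));
    point(x[i], y[i]);
  }
  fill(255);
  text(typed, 20, 30);
}

void keyPressed() {
  if (key >= '0' && key <= '9' && typed.length() < 5) {
    typed += key;
  } else if (key == BACKSPACE && typed.length() > 0) {
    typed = typed.substring(0, typed.length() - 1);
  }
}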

Tuesday, March 23, 2010 | adaptation, zipdecode  

On needing approval for what we create, and losing control over how it’s distributed

I’ve been trying to organize my thoughts about the iPad and the direction that Apple is taking computing along with it. It’s really an extension of the way they look at the iPhone, which I found unsettling at the time, but with the iPad we’re all finally coming around to the idea that they really, really mean it.

I want to build software for this thing. I’m really excited about the idea of a touch-screen computing platform that’s available for general use from a known brand who has successfully marketed unfamiliar devices to a wide audience. (Compare this to, say, Microsoft’s Tablet PC push that began in the mid-2000s and is… nowhere?)

It represents an incredible opportunity, but I can’t get excited about it because of Apple’s attempt to control who creates for it, and what they can create for it. Their policy of being the sole distributor of applications, and even worse, requiring approval on all applications, is insulting to developers. Even the people who have created Mac software for years are being told they can no longer be trusted.

I find it offensive on a very basic level, because I know that if such restrictions were in place when I was first learning to write software — mostly on Apple machines, no less — I would not have a career in the field. Or if we had to pay regular fees to become a developer, use only Apple-provided tools, and could release only approved software through an Apple store, things like the Processing project would not have happened. I can definitively say that any success that I’ve had has come from the ability to create what I want, the way that I want, and to be able to distribute it as I see fit, usually over the internet.

As background, I’m writing this as a long-time Apple user who started with an Apple ][+ and later the original 128K Mac. A couple months ago, Apple even profiled my work here.

You’ll shoot your eye out, kid!

There’s simply no reason to prevent people from installing anything they want on the iPad. The same goes for the iPhone. When the iPhone appeared, Steve Jobs made a ridiculous claim that a rogue application could “take down the network.” That’s an insult to common sense (if it were true, then the networks have a serious, serious design flaw). It’s also an insult to our intelligence, except for the Apple fans who repeat this ridiculous statement.

But even if you believed the Bruce Willis movie version of how mobile networks are set up, it simply does not hold true for the iPad (and the iPod Touch before it). The $499 iPad that has no data network hardware is not in danger of “taking down” anyone’s cell network, but applications will still be required to go through the app store and therefore, its approval process.

The irony is that the original Mac was nearly a failure because of Jobs’ insistence at the time on how closed the machine must be. As I recall reading, it survived thanks to the engineers who developed AppleTalk networking in spite of Steve Jobs’ desire to keep the original Macintosh an island unto itself. Networking helped make the “Macintosh Office” possible by connecting a series of Macs to the laser printer (introduced at the same time), and so followed the desktop publishing revolution of the mid-80s. Until that point, the 128K Macintosh was largely a $2500 novelty.

For the amazing number of lessons that Jobs seems to have learned in his many years in technology, his insistence on control is to me a glaring omission. It’s sad that Jobs groks the idea of computers designed for humans, but then consistently slides into unnecessary lockdown restrictions. It’s an all-too-human failing of wanting too much control.

Only available on the Crapp Store!

For all the control that Apple’s taken over the content on the App Store, it hasn’t prevented the garbage. Applications for jiggling boobs or shaking babies have somehow made it through the same process that delayed the release or update of many other developers’ applications for weeks. Some have been removed, but only after an online uproar of keyboards and pitchforks. It’s the same approval process that OKs flashlight apps and fart apps by the dozen.

Obvious instances aside, the line of “appropriate” will always be subjective. The line changed last week when Apple decided to remove 5,000 “overtly sexual” applications, which might make sense, but is instead hypocritical when they don’t apply the same criteria to established names like Playboy.

Somebody’s forgetting the historical mess of “I know it when I see it.” It’s an unanswerable dilemma (or is that an enigma?), so why place yourself in the position of arbiter?

Another banned application was a version of Dope Wars, a game that dates back to the mid-80s. Inappropriate? Maybe. A problem? Only if children have been turning to lives of crime since its early days as an MS-DOS console program, or on their programmable TI calculators. Perhaps the faux-realistic interface style of the iPhone OS tipped the scales.

The problem is that fundamentally, it’s just never going to be possible to prevent the garbage. If you want to have a boutique, like the Apple retail stores, where you can buy a specially selected subset of merchandise from third parties, then great. But instead, we’ve conflated wanting to have that kind of retail control (a smart idea) with the only conduit by which software can be sold for the platform (an already flawed idea).

Your toaster doesn’t need a hierarchical file system

Anyone who has spent five minutes helping someone with their computer will know that the overwhelming majority don’t need full access to the file system, and that it’s a no-brainer to begin hiding as much of it as possible. The idea of the iPad as appliance (and the iPhone before it) is an obvious, much needed step in the user interface of computing devices.

(Of course, the hobbyist in me doesn’t want that version, since I still want access to everything, but most people I know who don’t spend all their time geeking out on the computer have no use for the confusion. I’m happy to separate those parts.)

And frankly, it’s an obvious direction, and it’s actually much closer to very early versions of Mac OS — the original System and Finder — than it is with OS X. Mac OS X is as complicated as Windows. My father, also an early Mac user, began using PCs as Apple fell apart in the late 90s. He hasn’t returned to the Mac largely because of the learning curve for OS X, which is no longer head and shoulders above Windows in terms of its ease of use. Surely the overall UI is better, clearer, and more thoughtfully put together. But the reason to switch nowadays is less to do with the UI, and more to do with the way that one can lose control of their Windows machines due to the garbage installed by PC vendors, the required virus scanning software, the malware scanning software, and all the malware that gets through in spite of it all.

The amazing Steven Frank, co-founder of Panic, puts things in terms of Old World and New World computing. But I believe he’s mixing the issue of the device feeling (and working) in a more appliance-like fashion with the issue of who controls what goes on the device, and how it’s distributed to the device. I’m comfortable with the idea that we don’t need access to the file system, and it doesn’t need to feel like a “computer.” I’m not comfortable with people being prevented by a licensing agreement, or worse, sued, for hacking the device to work that way.

It Just Works, except when It Doesn’t

The “it just works” mantra often credited to Apple is — to borrow the careful elocution of Steve Jobs — “bullshit.” To use an example, if things “just worked” then I’d be able to copy music from my iPod back to my laptop, or from one machine that I own to another. If I’ve paid for that music (whether it’s DRM-free or even if I made the MP3 myself), there’s simply no reason that I should be restricted from copying this way. Instead we have the assumption that I’m doing something illegal built into the software, and preventing obvious use.

Of course, I assume that as implemented, this feature is something that was “required” by the music industry. But to my knowledge, there’s simply no proof of that. No such statement has been made; more likely, it’s just easier for Apple fans to blame the “evil music industry” or the “evil RIAA.” This thinking avoids noticing that Apple has also demanded similar restrictions for others’ projects, in a case where they actually have control over such matters.

Bottom line: trying to save the music collection of a family member whose laptop has crashed is a great time, and it’s only made better by having to dig up a piece of freeware that will let me copy the music from their iPod back to their now blank machine. The music that they spent so much money on at the iTunes Store.

Like “don’t be evil,” the “it just works” phrase applies, or it doesn’t. Let’s not keep repeating the mantra and conveniently ignoring the times when the opposite is true.

It’s been a long couple of months, and it’s only getting longer

One of the dumbest things that I’ve seen in the two months since the iPad announcement is articles that compare the device to other computers and complain that it doesn’t have feature x, y, or z. That’s silly to me because it’s not a general purpose computer like we’re used to. And yes, I’m fully aware of the irony of that statement if you take it too literally. I am in fact complaining about what’s missing from the iPad (and iPhone), though it’s about things that have been removed or disallowed for reasons of control, and don’t actually improve the experience of using the device. Now stop thinking so literally.

The thing that will be interesting about the iPad is the experience of using it — something that nobody has had except for the folks at Apple — and as is always the case when dealing with a different type of interface, you’re always going to be wrong.

So what is it? I’m glad you asked…

Who is this for?

As Teri likes to point out, it’s also important to note the appeal of this device to a different audience — our parents. They need something like an iPhone, with a bigger screen, that allows them to browse the internet and read lots of email and answer a few. (No word yet on whether the iPad will have the ability to forward YouTube videos, chain e-mails, or internet jokes.) For them, “it’s just a big iPhone” is a selling point. The point is not that the iPad is for old people; the point is that it’s a new device category that will find its way into interesting niches that we can’t ascertain until we play with the thing.

Any time you have a new device such as this one, it also doesn’t make a lot of sense. It simply doesn’t fit with anything that we’re currently used to. So we have a lot of lazy tech writers who go on about how it’s under-featured (it’s a small computer! it’s a big phone!) or that it doesn’t make sense in the lineup. This is a combination of a lack of creativity (rather than tearing the thing down, think about how it might be used!) and perhaps an interest in filling column inches, in spite of the fact that none of these people has used the device, so we simply don’t know. It’s part of what’s so dumb about pre-game shows for sports. What could be more boring than a bunch of people arguing about what might happen? The only thing that’s interesting about the game is what does happen (and how it happens). I know you’ve got to write something, but man, it’s gonna be a long couple weeks until the device arrives.

It’s Perfect! I love it like it is.

There’s also talk about the potential disappearance of extensions or plug-in applications. While Mac OS extensions (of OS 9 and earlier) were a significant reason for crashes on older machines, they also contributed to the success of the platform. Those extensions wouldn’t have been installed if there weren’t a reason, and the fact is, they were valuable enough to be worth the occasional sobbing over an hour of lost work after a system crash.

I think the anti-extension arguments come from people who are imagining the ridiculous number of extensions on others’ machines, but disregarding the fact that they badly needed something like Suitcase to handle the number of fonts on their system. As time goes on, people will want to do a wider range of things with the iPhone/iPad OS too. The original Finder and System had a version 3 too (actually they skipped 3.0, but nevermind that), just like the iPhone. Go check that out, and now compare it to OS X. The iPhone OS will get crapped up soon enough. Just as installing more than 2-3 pages of apps on the iPhone breaks down the UI (using search is not the answer — that’s the equivalent of giving up in UI design), I’m curious to see what the oft-rumored multitasking support in iPhone OS 4 will do for things.

And besides, without things like WindowShade, what UI elements could be licensed (or stolen) and incorporated into the OS? Ahem.

I’d never bet against people who tinker, and neither should Apple.

I haven’t even covered issues from the hardware side, in spite of having grown up taking apart electronics and in awe of the Heathkit stereo my dad built. But it’s the sort of thing that disturbs our friends at MAKE, and others have written about similar issues. Peter Kirn has more on just how bad the device is in terms of openness. One of the most egregious hardware problems is that the device’s connection to the outside world is a proprietary port, access to which has to be licensed from Apple. This isn’t just a departure from the Apple ][ days of having actual digital and analog ports on the back (it was like an Arduino! but slower…); it’s not even something more standard like USB.

But why would you artificially keep this audience away? To make a couple extra percent on licensing fees? How sustainable is that? Sure it’s a tiny fraction of users, but it’s some of the most important — the people who are going to do new and interesting things with your platform, and take it in new directions. Just like the engineers who sneaked networking into the original Macintosh, or who built entire industries around extending the Apple ][ to do new things. Aside from the schools, these were the people who kept that hardware relevant long enough for Apple to screw up the Lisa and Mac projects for a few years while they got their bearings.


I am not a futurist, but at the end of it all, I’m pretty disappointed by where things seem to be heading. I spend a lot of effort on making things and trying to get others to make things, and having someone in charge of what I make and how I distribute it is incredibly grating. And the fact that they’re having this much success with it is saddening.

It may even just work.

Friday, March 12, 2010 | cs, mobile, notafuturist, software  

1995? Bah!

Newsweek has posted a 1995 article by Clifford Stoll slamming “The Internet.”

Yet Nicholas Negroponte, director of the MIT Media Lab, predicts that we’ll soon buy books and newspapers straight over the Internet. Uh, sure.

Well, maybe Negroponte was wrong that we’d be buying newspapers. Ahem.

But the thing I find most amazing about the article is that all the examples he cites as futuristic B.S. are in fact the successful parts. Take shopping:

Then there’s cyberbusiness. We’re promised instant catalog shopping—just point and click for great deals. We’ll order airline tickets over the network, make restaurant reservations and negotiate sales contracts. Stores will become obsolete. So how come my local mall does more business in an afternoon than the entire Internet handles in a month? Even if there were a trustworthy way to send money over the Internet—which there isn’t—the network is missing a most essential ingredient of capitalism: salespeople.

He could have at least picked some of the dumber ideas about “the future” that were being pushed at the time, but instead he’s a shockingly accurate anti-futurist.

I’ll happily point out that in 1995 I couldn’t imagine buying clothes online either. In fact I remember having a conversation about exactly that with Frank Ludolph (a former Xerox PARC researcher who was part of the Lisa team, worked on the Mac Finder as well, and was at Sun at the time). He said you had to be able to touch the clothes and get the color and texture — I concurred. Then again, Frank was also cheerfully embarrassed to admit (that same summer) that he was one of the people (at PARC or Apple, I don’t recall) who argued against the idea of overlapping windows in user interfaces because they would be too confusing for users. Instead he (and many others in that camp) advocated that the screen be divided into a grid of panels.

It’s tough to be a futurist, but Stoll seems to have the market cornered on being an exactly wrong, and very entertaining, anti-futurist.

Monday, March 1, 2010 | notafuturist  

JavaScript: The Good Parts

Watched Douglas Crockford’s “JavaScript: The Good Parts” talk, based on his book of the same name. I like Crockford’s work on JSON—or rather, the idea of simple file formats that need simple APIs to work with them. More important, with the continued evolution of processing.js, I’m really optimistic about where things are headed with JavaScript. (You might say I’m feeling a bit hopey changey about it.) I’ve had Crockford’s book in my reading pile for a while and finally got around to watching the talk last week.

I was at Netscape (or maybe at Sun?) when they renamed their “LiveScript” language to “JavaScript” (because Java was the it-language at the time), and I’d avoided it for a long time. His talk points out a series of things to avoid in the JavaScript syntax; in fact, I think I enjoyed the explanation of the “Bad Parts” a bit more. By clearing out a few things, the whole starts making more sense. But it’s an interesting discussion for people scratching their heads about this incredibly pervasive language found in web browsers, which is rapidly becoming more exciting as support for Canvas and WebGL evolves.

Tuesday, February 23, 2010 | cs, languages, processing, speaky  

Processing 0176 (pre-release)

I’ve just posted revision 0176 of Processing, a pre-release of what will become version 1.1 or maybe 1.5, depending on how long we bake this one before releasing the final. A list of changes can be found here.

You can download the release at android.processing.org, which (as you might guess) is the eventual home of the Android version of Processing. The Android support is very incomplete, as you can see from the warnings on the page.

But ignore for a moment that it says “Android”: the download is hosted there because, at the moment, most of my energy is focused on the Android extensions. While the build also includes the incomplete Android tools (just pretend they aren’t there, unless you’re willing to read all the caveats on that page), there are many bug fixes for the regular Java version of Processing in the download too. It’s been a couple months since I’ve done a proper release, so there’s a backlog of fixed bugs and things I’ve been adding.

I’m posting the pre-release because so many things have changed, and I don’t want to do a 1.1 release, followed by an immediate 1.1.1. So please test! Then again, it’s taken me so long to explain the situation that I should have just posted it as 1.1.

And by the time you read this, it’ll probably be release 0177, or 0178, or…

Saturday, February 20, 2010 | processing  

Taking the “vs.” out of Man & Machine

Fascinating editorial from chess champion Garry Kasparov about the relationship between humans and machines:

The AI crowd, too, was pleased with the result and the attention, but dismayed by the fact that Deep Blue was hardly what their predecessors had imagined decades earlier when they dreamed of creating a machine to defeat the world chess champion. Instead of a computer that thought and played chess like a human, with human creativity and intuition, they got one that played like a machine, systematically evaluating 200 million possible moves on the chess board per second and winning with brute number-crunching force. As Igor Aleksander, a British AI and neural networks pioneer, explained in his 2000 book, How to Build a Mind:

By the mid-1990s the number of people with some experience of using computers was many orders of magnitude greater than in the 1960s. In the Kasparov defeat they recognized that here was a great triumph for programmers, but not one that may compete with the human intelligence that helps us to lead our lives.

It was an impressive achievement, of course, and a human achievement by the members of the IBM team, but Deep Blue was only intelligent the way your programmable alarm clock is intelligent. Not that losing to a $10 million alarm clock made me feel any better.

He continues to describe playing games with humans aided by computers, and how it made the game even more dependent upon creativity:

Having a computer program available during play was as disturbing as it was exciting. And being able to access a database of a few million games meant that we didn’t have to strain our memories nearly as much in the opening, whose possibilities have been thoroughly catalogued over the years. But since we both had equal access to the same database, the advantage still came down to creating a new idea at some point.

Or some of the other effects:

Having a computer partner also meant never having to worry about making a tactical blunder. The computer could project the consequences of each move we considered, pointing out possible outcomes and countermoves we might otherwise have missed. With that taken care of for us, we could concentrate on strategic planning instead of spending so much time on calculations. Human creativity was even more paramount under these conditions. Despite access to the “best of both worlds,” my games with Topalov were far from perfect. We were playing on the clock and had little time to consult with our silicon assistants. Still, the results were notable. A month earlier I had defeated the Bulgarian in a match of “regular” rapid chess 4–0. Our advanced chess match ended in a 3–3 draw. My advantage in calculating tactics had been nullified by the machine.

That final point reinforces something I’d heard in the past: others describing Kasparov’s play as machine-like (in a sense, this is verification, or even quantification, of that idea). The article also includes some interesting comments on numerical scale:

The number of legal chess positions is 10^40, the number of different possible games, 10^120. Authors have attempted various ways to convey this immensity, usually based on one of the few fields to regularly employ such exponents, astronomy. In his book Chess Metaphors, Diego Rasskin-Gutman points out that a player looking eight moves ahead is already presented with as many possible games as there are stars in the galaxy. Another staple, a variation of which is also used by Rasskin-Gutman, is to say there are more possible chess games than the number of atoms in the universe. All of these comparisons impress upon the casual observer why brute-force computer calculation can’t solve this ancient board game. They are also handy, and I am not above doing this myself, for impressing people with how complicated chess is, if only in a largely irrelevant mathematical way.
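The stars comparison roughly checks out as back-of-the-envelope arithmetic. Here is a quick sketch of my own, with two assumptions that are mine rather than Rasskin-Gutman’s: about 30 legal moves per position, and “eight moves” counted as eight half-moves.

// Back-of-the-envelope check of the "stars in the galaxy" comparison.
// Assumptions are mine, not the article's: roughly 30 legal moves per
// position, and "eight moves ahead" counted as eight half-moves.
void setup() {
  double games = Math.pow(30, 8);   // roughly 6.6e11 continuations
  double stars = 2e11;              // order-of-magnitude count for the Milky Way
  println("continuations after eight half-moves: " + games);
  println("ratio to stars in the galaxy: " + games / stars);
}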

And one last statement:

Our best minds have gone into financial engineering instead of real engineering, with catastrophic results for both sectors.

In the article, Kasparov mentions Moravec’s Paradox, described by Wikipedia as:

“contrary to traditional assumptions, the uniquely human faculty of reason (conscious, intelligent, rational thought) requires very little computation, but that the unconscious sensorimotor skills and instincts that we share with the animals require enormous computational resources”

And another interesting notion:

Marvin Minsky emphasizes that the most difficult human skills to reverse engineer are those that are unconscious. “In general, we’re least aware of what our minds do best,” he writes, and adds “we’re more aware of simple processes that don’t work well than of complex ones that work flawlessly.”

Saturday, February 20, 2010 | human, scale, simulation  

Dick Brass

An interesting op-ed by Dick Brass, a former vice president at Microsoft, on how the company’s internal structure can get in the way of innovation, citing specific examples. The first relates to ClearType and the difficulties of getting it integrated into other products:

Although we built it to help sell e-books, it gave Microsoft a huge potential advantage for every device with a screen. But it also annoyed other Microsoft groups that felt threatened by our success.

Engineers in the Windows group falsely claimed it made the display go haywire when certain colors were used. The head of Office products said it was fuzzy and gave him headaches. The vice president for pocket devices was blunter: he’d support ClearType and use it, but only if I transferred the program and the programmers to his control. As a result, even though it received much public praise, internal promotion and patents, a decade passed before a fully operational version of ClearType finally made it into Windows.

Or another case in attempts to build the Tablet PC, in stark contrast to Apple’s (obvious and necessary) redesign of iWork for their upcoming iPad:

Another example: When we were building the tablet PC in 2001, the vice president in charge of Office at the time decided he didn’t like the concept. The tablet required a stylus, and he much preferred keyboards to pens and thought our efforts doomed. To guarantee they were, he refused to modify the popular Office applications to work properly with the tablet. So if you wanted to enter a number into a spreadsheet or correct a word in an e-mail message, you had to write it in a special pop-up box, which then transferred the information to Office. Annoying, clumsy and slow.

Having spent time in engineering meetings where similar arguments were made, I find it interesting to see how that perspective translates into actual outcomes. ClearType has seemingly crawled its way to a modest success (though arguably it was invented much earlier, with Apple ][ displays), while Microsoft’s Tablet efforts remain a failure. But neither represents the common sense approach that has had such an influence on Apple’s success.

Update: A shockingly bad official response has been posted to Microsoft’s corporate blog. While I took the original article to be one person’s perspective, the lame retort (inline smiley face and all) does more to reinforce Brass’ argument.

Thursday, February 4, 2010 | cs, failure, software  

Design for Haiti

John Maeda put us in touch with Aaron Perry-Zucker, who writes:

I created Design for Obama and saw what a fully engaged, passionate, creative community can do. On that occasion, we were eager to lend our creative talents to a movement calling for change and inspire others to do the same.

Today we face a much graver task: In the wake of the unimaginable suffering that has befallen the island of Haiti, it is our job as artists and designers to use our talents to call for advocacy and understanding. Thanks to Design for Obama artist, James Nesbitt, we are now operating from designforhaiti.com.

Consider this a creative call to action to design:

Both are necessary; this is what artists and designers do best. Let us come together and lead the way to relief.

— Aaron Perry-Zucker

Thursday, January 21, 2010 | opportunities  

Dr. Baumol Talks Health Care Cost

Dr. Baumol, in red.

Continuing my recent fascination with (and attention to) health care: an interesting post on the New York Times site about the economics of increasing health costs, based on the ideas of William J. Baumol, who developed the notion of “cost disease”:

Dr. Baumol and a colleague, William G. Bowen, described the cost disease in a 1966 book on the economics of the performing arts. Their point was that some sectors of the economy are burdened by an inexorable rise in labor costs because they tend not to benefit from increased efficiency. As an example, they used a Mozart string quintet composed in 1787: 223 years later, it still requires five musicians and the same amount of time to play.

Essentially, the point is that no matter how much reform there is, the cost of care will still outpace inflation. The article (and theory) focuses on people as the most significant bottleneck, though I haven’t seen anything showing that in the current setting, the excessive increase in costs over the last ten years (and why the U.S. is paying twice what other industrialized nations pay, for only average care) is tied to salaries. Tests, insurance costs, overhead, and equipment all seem like things that the market can fix, but then again, I’m not much for economics. In the end, the post is light on details (it’s a blog post, not a full article), but it’s interesting food for thought.

(Thanks to Teri for the link)

Monday, January 18, 2010 | healthcare, notaneconomist, Uncategorized  

New for 2010

Back in December, I made the decision to leave Seed and strike out on my own. As of January 1st (two weeks ago), I’m setting up shop in Cambridge. (That’s the fake Cambridge for you UK readers. Or, Cambridge like “MIT and Harvard” not “University Of”).

The federal government knows this new venture under the charmingly creative moniker of BEN FRY LLC, but with any luck, a proper name will be found soon so that I don’t have to introduce myself as Ben Fry, founder of Ben Fry LLC. (Which is even worse than having a site with your own name as the URL. I have Tom White—who originally registered the site as a joke—to thank for that.)

I’ll soon be hiring designers, developers, data people, and peculiar hybrids thereof. If you do the sort of work that you see on this site, please get in touch (send a message to mail at benfry.com). In particular I’d like to find people local to Cambridge/Boston, but because some of this will be project-oriented freelance work, some of it can be done at a distance.

Stay tuned, more to come.

(Update 1/21/2010 – Thanks for the responses. I’m having trouble keeping on top of my inbox so my apologies in advance if you don’t hear back from me promptly.)

Saturday, January 16, 2010 | opportunities, seed, site  

toxiclibs showreel

One of the earliest fixtures in the Processing community is toxi (or Karsten Schmidt, if you must), who has been doing wonderful things with the language/environment/core for many years. A couple months ago he posted a beautiful reel of work done by the many users of his toxiclibs library. Just beautiful:

A more complete description can be found on the video page over at Vimeo.

Wednesday, November 11, 2009 | processing  

Are electronic medical records really about data?

Having spent my morning at the doctor’s office (I’m fine, Mom–just a physical), I passed the time by asking my doctor about the system they use for electronic medical records. Our GE work (1, 2) and seeing her gripe and sigh as truly awful-looking screen after screen flew past on her display caught my interest. And as someone who has an odd fascination with bad interfaces, I just had to ask…

Perhaps the most surprising bit was that without explicitly saying so, she seemed to find the EMR system most useful not as a thing that aggregates data, or makes her work easier, but instead as a communication tool. It combats the (very real, not just an overused joke) penmanship issues of fellow doctors, but equally important, it sets a baseline or common framework for the details of a visit. The latter part is obvious, but the actual nature of it is more subtle. For instance, she would often find herself deciphering a scribble that says “throat, amox” by another doctor, and it says nothing of the dosage, frequency, or type of Amoxicillin, much less the nature of the throat trouble. A patient (particularly a sick patient) is also not the person to provide precise details. How many would remember whether they were assigned a 50, 150, or 500 milligram dosage (very different things, you might say)? And for that matter, they’re probably equally likely to think they’re on a 500 kilogram dose. (“No, that’s too high. Must be 5 kilogram.”)

My doctor might be seeing such a patient because their primary care doctor (the mad scribbler) was out, or the patient was a referral, or had just moved offices, or whatever. But it makes an interesting point about the transience of medical data: its importance increases while it’s in motion, which is especially true since the patient it’s attached to is not a static entity (from changing health conditions to changing jobs, cities, and doctors).

Or from a simpler angle, if you’re sick enough that you have to be seen by someone other than your primary care doctor, then it’s especially important for the information to be complete. So with any luck, the EMR removes a layer of translation that was required before.

As she described things off the top of her head, the data only came up later. OK, it’s all data, but I’m referring to the numbers and the tests and the things that can be tracked easily over time. The sort of reduce-the-patient-to-numbers things we usually think of when hearing about EMRs. Readouts that display an array of tests, such as blood pressure history, are an important feature, but they weren’t the killer app of EMRs. (And that will be the last time I use “killer app” and “electronic medical records” together. Pun not intended.)

The biggest downside (she’s now using her second system) is that the interfaces are terrible: usually they do things in the wrong order, or require several windows and multiple clicks to do mundane tasks. She said there were several things that she liked and hated about this one, but that it was a completely different set of pros/cons from the other system she used. (And to over-analyze for a moment, I think she even said “like” and “hate,” not “love” and “hate” or “like” and “dislike.” She also absentmindedly mentioned “this computer is going to kill me.” She’s not a whiner, and may truly believe it. EMRs may be killing our doctors! Call The New York Times, or at least Fox 25.) This isn’t surprising; I assume it’s just that technology purchasers are several levels removed from the doctors who have to use the equipment, which is usually the case for software systems like this, so there’s little market pressure for usability. If you’re big enough to need such a beast, then the person making the decision about what to buy is a long way removed. But I’m curious about whether this is a necessity of how big software is implemented, or a market opportunity.

At some point she also stated that it would be great if the software company had asked a doctor for input on how the system was implemented. I think it’s safe to assume that there was at least one M.D.–if not an arsenal of individuals with a whole collection of alphabet soup trailing their names–involved with the software. But I was struck by how matter-of-fact she was that nobody had even thought about it. The software was that bad, and to her, the flaws were that obvious. The process by which she was forced to travel through the interface had little to do with the way she worked. Now, any expert might have their own way of doing things, but that’s probably not the discrepancy here. (And in fact, if the differences between doctors are that great, then that itself should be part of the software: the doctor needs to be able to change the order in which the software works.) But it’s worth noting that the data (again, meaning the numbers and test history and easily measurable things) were all easily accessible from the interface, which suggests that like so many data-oriented projects, the numbers seduced the implementors. And so those concrete numbers (fourth or so in ranked importance for this doctor) won out over process (the way the doctor spends their day, and their time with the patient).

All of which is a long way of wondering, “are electronic medical records really about data?”

Monday, October 5, 2009 | healthcare, interact, thisneedsfixed  

So an alien walks into a bar, says “6EQUJ5”

I love this image of a radio signal reading found on Futility Closet, mostly because it belongs in a movie:

yeah, that surprised me too

As the post explains, this was a signal seen by astronomer Jerry Ehman, coming from Sagittarius in 1977, but never replicated.

Friday, September 25, 2009 | mine, probability  

Go Greyhound, and leave the route-planning to us!

While checking the bus schedule for Greyhound, I recently discovered that travel from New York City to Boston is a multi-day affair, involving stops in Rochester, Toronto (yes, Canada), Fort Erie, Syracuse, and even Schenectady and Worcester (presumably because they’re both fun to say).

oh, you can get there from here all right

1 day, 5 hours, and 35 minutes. That’s the last time I complain about how bad the Amtrak site is.

Monday, September 21, 2009 | software, thisneedsfixed  

The Fall Cleaning continues…

As I continue the purge of images, movies, and articles that I’ve set aside, here are two beautiful works of motion graphics. (Neither is related to visualization, but both are inspiring.)

First is James Jarvis’ running video for Nike — beautifully drawn and captures a wonderful collection of experiences I could identify with (bird attack, rainstorms, stairs…)

And the second is the “Big Ideas (Don’t Get Any)” video by James Houston.

Just incredible.

Monday, September 21, 2009 | motion  

Chris Jordan at TED

Fantastic TED talk from Chris Jordan back in February 2008. Chris creates beautiful images that convey scale in the millions. Examples include statistics like the number of plastic cups used in a day — four million — shown here with one million of them:

i think you're spending too much time at the water cooler

The talk is ten minutes, and well worth a look. I’m linking a sinfully small version here, but check out the higher resolution version on the TED site.

As much as I love looking at this work (and his earlier portraits, more can be found on his site), there’s also something peculiar about the beauty of the images perhaps neutering their original point. Does seeing the number of prison uniforms spur viewers to action, or does it give chin-rubbing intellectual fulfillment accompanied by a deep sigh of worldliness? I’d hate to think it’s the latter. Someone I asked about this had a different reaction, and cited a group that had actually begun to act based on what they saw in his work. I wish I had the reference, but if that’s the case (and I hope it is), there’s no argument.

Looking at it another way, next time you reach for a plastic cup, will Jordan’s image come to mind? Will you make a different decision, even some of the time?

I’ve also just purchased his “Running the Numbers” book, since these web images are an injustice to the work. And I have more chin scratching and sighing to do.

(Thanks to Ron Kurti for the heads up on the video.)

Sunday, September 20, 2009 | collections, speaky  

Data & Drawing, Football Sunday Edition

I wanted to post this last week in my excitement over week 1 of pro football season (that’s the 300 lbs. locomotives pounding into each other kind of football, not the game played with actual balls and feet), but ran out of time. So instead, in honor of football Sunday, week 2, my favorite advertisement of last year’s football season:

The ad is a phone conversation with Coca-Cola’s Katie Bayne, animated by Imaginary Forces. A couple things I like about this… First, that the attitude is so much less heavy-handed than, say, the IBM spots that seem to be based on the premise that if they jump cut quickly enough, they can cure cancer. The woman being interviewed actually laughs about “big data” truisms. Next is the fact that it’s actually a fairly smart question that’s asked:

How important is it that you get the right information rather than just a lot of information?

Well… you know you can roll around in facts all day long. It’s critical that we stay aware of that mountain of data that’s coming in and mine it for the most valuable nuggets. It helps keep us honest.

And third, the visual quality that reinforces the lighter attitude. Cleverly drawn without overdoing it. She talks about being honest and a hand comes flying in to push back a Pinocchio nose. Nuggets of data are shown as… eh, nuggets.

And the interviewer is a dog.

Sunday, September 20, 2009 | drawing, football, motion  

I am what I should have said much earlier

So it takes me a year or two to post the “You Are What You Say” lecture by Dan Frankowski, and the day after, a much more up-to-date paper is in the news. The paper is by Paul Ohm and is available here, or you can read an Ars Technica article about it if you’d prefer the (geeky) executive summary. The paper also cites the work of Latanya Sweeney (as did the Frankowski lecture), with this defining moment of the contemporary privacy debate, when the Massachusetts Group Insurance Commission (GIC) released “anonymized” patient data in the mid-90s:

At the time GIC released the data, William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers. In response, then-graduate student Sweeney started hunting for the Governor’s hospital records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts, a city of 54,000 residents and seven ZIP codes. For twenty dollars, she purchased the complete voter rolls from the city of Cambridge, a database containing, among other things, the name, address, ZIP code, birth date, and sex of every voter. By combining this data with the GIC records, Sweeney found Governor Weld with ease. Only six people in Cambridge shared his birth date, only three of them men, and of them, only he lived in his ZIP code. In a theatrical flourish, Dr. Sweeney sent the Governor’s health records (which included diagnoses and prescriptions) to his office.
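The mechanics are worth spelling out because they’re so simple. Here’s a toy sketch of the same kind of linkage (entirely made-up data, and obviously not Sweeney’s actual code): key both data sets on birth date, sex, and ZIP code, then report any key that identifies exactly one voter.

import java.util.HashMap;

// Toy re-identification by linkage (made-up data, not Sweeney's method or
// code). Both data sets share a quasi-identifier: birth date + sex + ZIP.
String[][] voters = {   // name, birth date, sex, ZIP (from public voter rolls)
  { "A. Voter", "1945-07-31", "M", "02138" },
  { "B. Voter", "1962-03-14", "F", "02139" }
};
String[][] records = {  // birth date, sex, ZIP, diagnosis ("anonymized" release)
  { "1945-07-31", "M", "02138", "diagnosis X" }
};

void setup() {
  HashMap<String, String> names = new HashMap<String, String>();
  HashMap<String, Integer> counts = new HashMap<String, Integer>();
  for (String[] v : voters) {
    String key = v[1] + "|" + v[2] + "|" + v[3];
    counts.put(key, counts.containsKey(key) ? counts.get(key) + 1 : 1);
    names.put(key, v[0]);
  }
  for (String[] r : records) {
    String key = r[0] + "|" + r[1] + "|" + r[2];
    // A key matching exactly one voter ties a name to a "nameless" record.
    if (counts.containsKey(key) && counts.get(key) == 1) {
      println(names.get(key) + " -> " + r[3]);
    }
  }
}

With real data the voter table has tens of thousands of rows, but the principle is the same: that combination of birth date, sex, and ZIP code is unique for a surprisingly large fraction of the population.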

And from the “where are they now?” file, Sweeney continues her work at Carnegie Mellon, though I have to admit I’m a little nervous that she’s currently back in my neighborhood with visiting posts at MIT and Harvard. Damn this Cambridge ZIP code.

Saturday, September 19, 2009 | privacy  

Learning from Lombardi

Just posted an essay about the work of artist Mark Lombardi that I presented at Experimenta Design in Lisbon last week. I don’t usually post lectures, but this is a kind of work-in-progress that I’m trying to sort out for myself.

it takes a very steady haaaaand

For the panel, we were to choose “an individual, movement, technology, whatever – whose importance has been overlooked” and follow that with “two themes that [we] believe will define the future of design and architecture.” In that context, I chose Lombardi’s work, and how it highlights a number of themes that are important to the future of design, particularly in working with data.

Saturday, September 19, 2009 | drawing, networks, social, talk  

Bio+Viz in da ‘berg

Give up those full hue heat map colors! Make images of biological data that even a grandmother can love! How about posters that no longer require an advanced degree to decipher? These platitudes and more coming next March, when I’ll be giving a keynote at the EMBO Workshop on Visualizing Biological Data in Heidelberg. Actually, I won’t be talking about any of those three things (though there’s a good chance I’ll talk about things like this), but registration is now open for participants:

Dear colleagues,

We invite you to participate in the first EMBO Workshop on Visualizing Biological Data (VizBi) 3 – 5 March 2010 at the EMBL’s new Advanced Training Centre in Heidelberg, Germany.

The goal of the workshop is to bring together, for the first time, researchers developing and using visualization systems across all areas of biology, including genomics, sequence analysis, macromolecular structures, systems biology, and imaging (including microscopy and magnetic resonance imaging). We have assembled an authoritative list of 29 invited speakers who will present an exciting program, reviewing the state-of-the-art and perspectives in each of these areas. The primary focus will be on visualizing processed and annotated data in their biological context, rather than on processing of raw data.

The workshop is limited in the total number of participants, and each participant is normally required to present a poster and to give a ‘fastforward’ presentation about their work (limited to 30 seconds and 1 slide).

To apply to join the workshop, please go to http://vizbi.org and submit an abstract and image related to your work. Submissions close on 16 November 2009. Since places are limited, participants will be selected based on the relevance of their work to the goals of the workshop.

Notifications of acceptance will be sent within three weeks after the close of submissions.

We plan to award a prize for the submitted image that best conveys a strong scientific message in a visually compelling manner.

Please forward this announcement to anyone who may be interested. We hope to see you in Heidelberg next spring!

Seán O’Donoghue, EMBL
Jim Procter, University of Dundee
Nils Gehlenborg, European Bioinformatics Institute
Reinhard Schneider, EMBL

If you have any questions about the registration process please contact:

Adela Valceanu

Conference Officer
European Molecular Biology Laboratory
Meyerhofstr. 1
D-69117 Heidelberg
Tel: +49-6221-387 8625
Fax: +49-6221-387 8158
Email: valceanu@embl.de

For full event listings please visit our website or sign up for our newsletter.

Which also reminds me, I oughta finish cooking a few back-burner genetics projects before they go bad…

Tuesday, September 15, 2009 | science, talk  

Controlling the news cycle & the terror alert level

At the risk of veering too far into politics, I’ve been hesitant to post this video of Keith Olbermann’s 17-minute timeline connecting the shifting terror alert level to the news cycle and the administration, but I’m reminded of it again with Tom Ridge essentially admitting to it in his book:

In The Test of Our Times: America Under Siege, Ridge wrote that although Rumsfeld and Ashcroft wanted to raise the alert level, “There was absolutely no support for that position within our department. None. I wondered, ‘Is this about security or politics?'”

Only to recant and be taken to task by Rachel Maddow:

Ridge went on to say that “politics was not involved” and that “I was not pressured.” Maddow then read to Ridge directly from his book’s jacket: “‘He recounts episodes such as the pressure that the DHS received to raise the security alert on the eve of the ’04 presidential election.’ That’s wrong?”

As Seth Meyers put it, “My shock level on manipulation of terror alerts for political gain is green, or low.”

At any rate, whether there is in fact correlation, causation, or simply a conspiracy theory that gives far too much credit to the number of people who would have to be involved, I think it’s an interesting look at 1) message control 2) using the press (or a clear example of the possibilities) 3) the power of assembling information like this to produce such a timeline, and 4) actual reporting (as opposed to tennis match commentary) done by a 24-hour news channel.

Of course, I was disappointed that it wasn’t an actual visual timeline, though somebody has probably done that as well.

Tuesday, September 8, 2009 | news, politics, security, time  

You stick out like a sore thumb in the matrix

Finally got around to watching Dan Frankowski’s “You Are What You Say: Privacy Risks of Public Mentions” Google Tech Talk the other day. (I had the link set aside for two years. There’s a bit of a backlog.) In the talk, he takes an “anonymized” set of movie ratings and removes the anonymity by matching them to public mentions of movies in user profiles on the same site.

Interestingly, the ratings themselves weren’t as informative as the actual choice of movies to talk about. In the case of a site for movie buffs — ahem, film aficionados — I couldn’t help but think about participants in discussions using obscure film references as colored tail feathers as they try to out-strut one another. Of course that behavior plays right into such a method, making the point that individual uniqueness is itself a signature for identification: what makes you different just makes you more visible to a data mining algorithm.

The other interesting bit from the talk comes about 20 minutes in, where he starts to address ways to defeat such methods. There aren’t many good ideas, because of the tradeoffs involved in each, but it’s interesting to think about.

Monday, September 7, 2009 | privacy, speaky  

Watching the evolution of the “Origin of Species”

I’ve just posted a new piece that depicts changes between the multiple editions of Darwin’s “On the Origin of Species”:


To quote myself, because it looks important:

We often think of scientific ideas, such as Darwin’s theory of evolution, as fixed notions that are accepted as finished. In fact, Darwin’s On the Origin of Species evolved over the course of several editions he wrote, edited, and updated during his lifetime. The first English edition was approximately 150,000 words and the sixth is a much larger 190,000 words. In the changes are refinements and shifts in ideas — whether increasing the weight of a statement, adding details, or even a change in the idea itself.

The idea that we can actually see change over time in a person’s thinking is fascinating. Darwin scholars are of course familiar with this story, but here we can view it directly, both on a macro-level as it animates, or word-by-word as we examine pieces of the text more closely.

This is hopefully the first of multiple pieces working with this data. Having worked with it since last December, I’ve been developing a larger application that deals with the information in a more sophisticated way, but that’s continually set aside because of other obligations. This simpler piece was developed for Emily King’s “Quick Quick Slow” exhibition opening next week at Experimenta Design in Portugal. As is often the case, many months were spent trying to create something monolithic, and then, in a very short time, an offshoot of all that work was developed that makes use of that infrastructure.

Oddly enough, I first became interested in this because of a discussion with a friend a few years ago, who had begun to wonder whether Darwin had stolen most of his better ideas from Alfred Russel Wallace, but gained the notoriety and credit because of his social status. (This appealed to the paranoid creator in me.) She cited the first edition of Darwin’s text as incoherent, and that it gradually improved over time. Interestingly (and happily, I suppose), the process of working on this piece has instead shown the opposite, and I have far greater appreciation for Darwin’s ideas than I had in the past.

Friday, September 4, 2009 | science, text, time  

Cue the violins for American Telephone & Telegraph

The New York Times today looks upon the plight of poor AT&T, saddled with millions of new customers paying thousands of dollars a year. Jenna Wortham writes:

Slim and sleek as it is, the iPhone is really the Hummer of cellphones. It’s a data guzzler. Owners use them like minicomputers, which they are, and use them a lot. Not only do iPhone owners download applications, stream music and videos and browse the Web at higher rates than the average smartphone user, but the average iPhone owner can also use 10 times the network capacity used by the average smartphone user.

If that 10x number didn’t come from AT&T, where did it come from? Seems like they might be starting a “we didn’t want the iPhone anyway” campaign so that investors treat them more nicely when they (are rumored to) lose their carrier exclusivity next year.

The result is dropped calls, spotty service, delayed text and voice messages and glacial download speeds as AT&T’s cellular network strains to meet the demand. Another result is outraged customers.

So even with AT&T’s outrageous prices, they can’t make this work? This week I’m canceling my AT&T service because it would cost $150 a month to get what T-Mobile charges me $80 for. (Two lines with shared minutes, texting on both lines, unlimited data on one, and even tethering. I also love T-Mobile’s customer service, staffed by friendly humans who don’t just read from scripts.)

With nine million users paying in excess of $100 a month apiece, they’re grossing a billion dollars a month, and they’re complaining about having to upgrade their network? They could probably fund rebuilding their entire network from scratch with the $15/month they charge to send more than 200 text messages. (Text messages are pure profit, because they’re sent using extra space in packets sent between the phone and the carrier.)

All of the cited problems, of course, would be lessened without carrier exclusivity. Don’t want 9 million iPhone customers clogging the network? Then don’t sign a deal requiring that you’re the only network they have access to. Hilarious.

But! The real reason I’m posting is because of the photos that accompany the article, including a shot of the AT&T command center and its big board:

who turned the lights off?

A few thoughts:

  1. If they’re gonna make it look like an orchestra pit, then I hope the head of IT is wearing tails.
  2. Do they get night & weekend minutes because the lights are out? Wouldn’t the staff be a little happier if the lights were turned on?
  3. And most important, I wonder what kind of coverage they get in there. It looks like the kind of underground bunker where you can’t get any signal. And if I’m not mistaken, those look like land lines on the desks.

Thursday, September 3, 2009 | bigboard, mobile  

Health Numbers in Context

As a continuation of this project, we’ve just finished a second health visualization (also built with Processing) using GE’s data. Like the first round, we started with ~6 million patient records from their “MQIC” database. Using the software, you input gender, age range, height/weight (to calculate BMI), and smoking status. Based on the selections it shows you the number of people in the database that match those settings, and the percentages that have been diagnosed with diabetes, heart disease, hypertension, or have had a stroke:

are you blue? no, dark blue.

For people reading the site because they’re interested in visualization (I guess that’s all of you, except for mom, who is just trying to figure out what I’m up to), some inside baseball:

On the interaction side, the main objective here was to make it easy to move around the interface as quickly as possible. The rows are shown in succession so that the interface can teach itself, but we also provide a reset button so that you can return to the starting point. Once the rows are visible, though, it’s easy to move laterally and make changes to the settings (swapping between age ranges, for instance).

One irony of making the data accessible this way is that most users — after looking up their own numbers — will then try as many different possibilities as they can, in a quick hunt for the extremes. How high do the percentages go? If I select bizarre values, what happens at the edges? Normally, you don’t have to spend as much time on these 1% cases, and it would be alright for things to be a little weird when truly odd values are entered (300 lb. people who are 4′ tall, smokers, and age 75 and over). But in this case, a lot more time has to be spent making sure things work. So while most of the time the percentages at the top are in the 5-15% range, I had to write code so that when one category shoots up to 50%, the other bars in the chart scale down in proportion (something along the lines of the simplified sketch below).
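Here’s that scaling idea in miniature (a simplified sketch of my own, not the actual project code), where the bars are always drawn relative to whichever percentage is currently the largest:

// Simplified sketch (mine, not the actual project code): bar widths are
// mapped against the largest percentage on screen, so a single category
// jumping to 50% rescales the others instead of pushing them off the chart.
float[] percents = { 6.2, 11.5, 49.8, 3.1 };  // example values

void setup() {
  size(400, 200);
}

void draw() {
  background(255);
  float maxPercent = max(percents);
  for (int i = 0; i < percents.length; i++) {
    // stretch each value so the largest bar fills the available width
    float w = map(percents[i], 0, maxPercent, 0, width - 20);
    fill(60, 100, 180);
    rect(10, 20 + i*40, w, 25);
  }
}

The real interface does quite a bit more (animation, labels, the reset behavior), but the proportional rescaling is the part that keeps a 50% bar from flattening everything else.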

Another aspect of the interface is the body mass index calculator. Normally a BMI chart looks something like this, a large two-dimensional plot that would otherwise use up half of the interface. By using a little interaction, we can make a simpler chart that dynamically updates itself based on the current height or weight settings. Also, because the ranges have (mathematically) hard edges, we’re showing the upper and lower bounds of the range so that they’re more apparent. Otherwise, a 5’8″ person steps from 164 to 165 lbs to find themselves suddenly overweight. In reality, the boundaries are more fuzzy, which would be taken into account by a doctor. But with the software, we instead have to be clear about the way the logic is working.
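Those hard edges fall straight out of the standard imperial BMI formula (703 times weight in pounds, divided by height in inches squared), so for a given height you can solve for the weight at each cutoff. A quick sketch of the arithmetic (mine, not the project code):

// Quick sketch of the arithmetic (not the project code): invert the
// imperial BMI formula, BMI = 703 * weight / height^2, to find the
// weight at which a given height crosses each conventional cutoff.
float weightAtBMI(float bmi, float heightInches) {
  return bmi * heightInches * heightInches / 703.0;
}

void setup() {
  float h = 68;  // 5'8" is 68 inches
  println("underweight below " + nf(weightAtBMI(18.5, h), 0, 1) + " lbs");
  println("overweight at " + nf(weightAtBMI(25, h), 0, 1) + " lbs");  // ~164.4
  println("obese at " + nf(weightAtBMI(30, h), 0, 1) + " lbs");       // ~197.3
}

At 68 inches the overweight cutoff works out to about 164.4 pounds, which is exactly the 164-to-165 jump mentioned above.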

(Note that the height and weight are only used to calculate a BMI range — it’s not pulling individuals from the database who are 5’8″ and 160 lbs, it’s pulling people from the “normal” BMI range.)

For the statistically (or at least numerically) inclined, there are also some interesting quirks that can be found, like a situation or two where health risk would be expected to go up, but in fact it goes down (I’ll leave you to find them yourself). This is not a bug. We’re not doing any sort of complex math here to evaluate actual risk; the software is just a matching game with individuals in the database. These cases in particular show up when there are only a few thousand individuals, say 2,000 out of the full 6 million records. The number of people in these edge cases is practically a rounding error, which means that we can’t make sound conclusions with them. As an armchair doctor-scientist, it’s also interesting to speculate as to what might be happening in such cases, and how other factors may come into play.

Have fun!

Wednesday, August 26, 2009 | interact, mine, probability, processing, seed  

History of Processing, as told by John Maeda

kicking it color mac classic style

John Maeda (Casey’s and my former advisor) has written a very gracious and very generous article about the origins of the Processing project for Technology Review. An excerpt:

In 2001, when I was a young MIT faculty member overseeing the Media Lab Aesthetics and Computation Group, two students came up with an idea that would become an award-winning piece of software called Processing—which I am often credited with having a hand in conceiving. Processing, a programming language and development environment that makes sophisticated animations and other graphical effects accessible to people with relatively little programming experience, is today one of the few open-source challengers to Flash graphics on the Web. The truth is that I almost stifled the nascent project’s development, because I couldn’t see the need it would fill. Luckily, Ben Fry and Casey Reas absolutely ignored my opinion. And good for them: the teacher, after all, isn’t always right.

To give him more credit (not that he needs it, but maybe because I’m bad with compliments), John’s objection had much to do with the fact that Processing was explicitly an evolutionary, as opposed to revolutionary, step in how coding was done. That’s why it was never the focus of my Masters or Ph.D. work, and instead has nearly always been a side project. And more importantly, for students in his research group, he usually forced us away from whatever came naturally to us. For those of us who found creating tools “easy,” he forced us to make less practical things. For those who were comfortable making art, he steered them toward creating tools. In the end, we all learned more that way.

Tuesday, August 25, 2009 | processing  

Tiny Sketch, Big Funny

not all sketches are 6x6 pixels in size

Just heard about this from Casey yesterday:

Tiny Sketch is an open challenge to artists and programmers to create the most compelling creative work possible with the programming language Processing using 200 characters or less.

…building on the proud traditions of obfuscated code contests and the demo scene. The contest runs through September 13 and is sponsored by Rhizome and OpenProcessing.

Having designed Processing to do one thing or another, I laughed out loud at several of the submissions for the ways their authors managed to introduce new quirks. For instance, consider the createFont() function. Usually it looks something like this:

PFont f = createFont("Helvetica", 12);

If the “Helvetica” font is not installed, it silently winds up using a default font. So somebody clever figured out that if you just leave the font name blank, it’s an easy way to get a default font, and not burn several characters of the limit:

PFont f = createFont("", 12);
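To see why those few characters matter, here’s a hypothetical entry of my own (not one of the actual submissions): a complete sketch that draws text with the default font and still comes in at only 86 characters, well under the limit.

size(200,200);background(0);textFont(createFont("",48));fill(255);text("tiny",55,115);

Every character saved on boilerplate is a character available for the part that actually draws something.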

Another, by Kyle McDonald, throws an exception as a way to produce text to plot on screen. (It’s also a bit of an inside joke—on us, perhaps—because it’s a ubiquitous error message resulting from a change that was made since earlier releases of Processing.)

One of the most interesting bits is seeing how these ideas propagate into later sketches that are produced. Since the font hack appeared (not sure who did it first, let me know if you do), everyone else is now using that method for producing text. Obviously art/design/coding projects are always the result of other influences, but it’s rare that you get to see ideas exchanged in such a direct fashion.

And judging from some of the jagged edges in the submissions, I’m gonna change the smooth() to just s() for the next release of Processing, so that more people will use it in the next competition.

Friday, August 14, 2009 | code, opportunities, processing  

Weight Duplexing, Condensed Tabulars, and Multiple Enclosures

More typographic tastiness (see the earlier post) from Hoefler & Frere-Jones with a writeup on Choosing Fonts for Annual Reports. Lots of useful design help and ideas for anyone who works with numbers, whether actual annual reports or (more likely) fighting with Excel and PowerPoint. For instance, using enclosures to frame numbers, or knock them out:

knocking out heaven's door

Another helpful trick is using two weights so that you can avoid placing a line between them:

pick em out of a lineup

Or using a proper condensed face when you have to invite too many of your numerical friends:

squeeze me macaroni

At any rate, I recommend the full article for anyone working with numbers, either for the introduction to setting type (for the non-designers) or a useful reminder of some of the solutions (for those who fret about these things on a regular basis).

Thursday, August 6, 2009 | refine, typography  

Also from the office of scary flowcharts

Responding to the Boehner post, Jay Parkinson, M.D. pointed me to this improved chart by designer Robert Palmer, accompanied by an angst-ridden open letter (an ironic contrast to the soft pastels in his diagram) decrying the crimes of visual malfeasance.

gonna have to face it you're addicted to charts

Meanwhile, Ezra Klein over at the Washington Post seems to be thinking along similar lines as my original post, noting this masked artist’s earlier trip to Kinko’s a few weeks ago. Klein writes:

it may be small, but there is still terror

Whoever is heading the Scary Flowcharts Division of John Boehner’s office is quickly becoming my favorite person in Washington. A few weeks ago, we got this terror-inducing visualization of the process behind “Speaker Pelosi’s National Energy Tax.”

That’s hot!

If I were teaching right now, I’d make all my students do a one-day charrette on trying to come up with something worse than the Boehner health care image while staying in the realm of colloquial things you can do with PowerPoint. It’d be a great time, and we’d all learn a lot.

Having spent two posts making fun of the whole un-funny mess around health care, I’ll leave you with the best bit of op-ed I’ve read on the topic, from Harold Meyerson, also at the Washington Post:

Watching the centrist Democrats in Congress create more and more reasons why health care can’t be fixed, I’ve been struck by a disquieting thought: Suppose our collective lack of response to Hurricane Katrina wasn’t exceptional but, rather, the new normal in America. Suppose we can no longer address the major challenges confronting the nation. Suppose America is now the world’s leading can’t-do country.

I agree and find it terrifying. And I don’t think that’s a partisan issue.

Now back to your purposefully apolitical, regularly scheduled blog on making pictures of data.

Thursday, August 6, 2009 | feedbag, flowchart, obfuscation, politics, thisneedsfixed  

Thesaurus Plus Context

can i get it in red?

BBC News brings word (via) that after a 44-year effort, the Historical Thesaurus of the Oxford English Dictionary will see the light of day. Rather than simple links between words, the beastly volume covers the history of the words within. For instance, the etymological timeline of the word “trousers” follows:

trousers breeks 1552- · strosser 1598-1637 · strouse 1600-1620 · brogues 1615- a 1845 · trouses 1679-1820 · trousers 1681- · trouser 1702- ( rare ) · inexpressibles 1790- ( colloq. ) · indescribables 1794-1837 ( humorous slang ) · etceteras 1794-1843 ( euphem. ) · kickseys/kicksies 1812-1851 ( slang ) · pair of trousers 1814- · ineffables 1823-1867 ( colloq. ) · unmentionables 1823- · pantaloons 1825- · indispensables a 1828- ( colloq. euphem. ) · unimaginables 1833 · innominables 1834/43 ( humorous euphem. ) · inexplicables 1836/7 · unwhisperables 1837-1863 ( slang ) · result 1839 · sit-down-upons 1840-1844 ( colloq. ) · pants 1840- · sit-upons 1841-1857 ( colloq. ) · unutterables 1843; 1860 ( slang Dict. ) · trews 1847- · sine qua nons 1850 · never-mention-ems 1856 · round-me-houses 1857 ( slang ) · round-the-houses 1858- ( slang ) · unprintables 1860 · stove-pipes 1863 · terminations 1863 · reach-me-downs 1877- · sit-in-’ems/sitinems 1886- ( slang ) · trousies 1886- · strides 1889- ( slang ) · rounds 1893 ( slang ) · rammies 1919- ( Austral. & S. Afr. slang ) · longs 1928- ( colloq. )

Followed by a proper explanation:

breeks The earliest reference from 1552 marks the change in fashion from breeches, a garment tied below the knee and worn with tights. Still used in Scotland, it derives from the Old English “breeches”.
trouser The singular form of “trousers” comes from the Gallic word “trews”, a close-fitting tartan garment formerly worn by Scottish and Irish highlanders and to this day by a Scottish regiment. The word “trouses” probably has the same derivation.
unimaginables This 19th Century word, and others like “unwhisperables” and “never-mention-ems”, reflect Victorian prudery. Back then, even trousers were considered risque, which is why there were so many synonyms. People didn’t want to confront the brutal idea, so found jocular alternatives. In the same way the word death is avoided with phrases like “pass away” and “pushing up daisies”.
stove-pipes A 19th Century reference hijacked in the 1950s by the Teddy Boys along with drainpipes. The tight trousers became synonymous with youthful rebellion, a statement of difference from the standard post-war suits.
rammies This abbreviation of Victorian cockney rhyming slang “round-me-houses” travelled with British settlers to Australia and South Africa.

Are you seeing pictures and timelines yet? Then this continues for 600,000 more words. Mmmm!

And Ms. Christian Kay, one of the four editors, is my new hero:

An English language professor, Ms Kay, one of four co-editors of the publication, began work on it in the late 1960s – while she was in her 20s.

It’s hard to fathom being in your 60s and completing a book that you started in your 20s, though it’s difficult to argue with the academic and societal contribution of the work. Her web page also lists “the use of computers in teaching and research” as one of her interest areas, which sounds like a bit of an understatement. I’d be interested in computers too if my research interest was the history of 600,000 words and their 800,000 meanings across 236,000 categories.

Sadly, this book of life is not cheap, currently listed at Amazon for $316 (but that’s $79 off the cover price!). Though with a wife who covets the full 20-volume Oxford English Dictionary (she already owns the smaller, 35 lb. version), I may someday get my wish.

Wednesday, August 5, 2009 | text  

Mapping Health Care: Here Be Dragons!

I’m so completely impressed with this incredible bit of info graphic awesomeness distributed by the office of John Boehner, Republican congressman from Ohio’s 8th District. The flow chart purports to show the Democrats’ health care proposal:

keep pushing this health care thing and it's only gonna get uglier!

The image needs to be much larger to be fully appreciated in its magnificent glory of awfulness, so a high resolution version is here, and the PDF version is here.

The chart was used by Boehner as a way to make the plan look as awful as possible — a tactic used to great effect by the same political party during the last attempt at health care reform in 1994. The diagram appears to be the result of a heart-warming collaboration between a colorblind draughtsman, the architect of a nightmarish city water works, and whoever designed the instructions for the bargain shelving unit I bought from Target.

Don’t waste your time, by the way — I’ve already nominated it for an AIGA award.

(And yes, The New Republic also created a cleaner version, and the broader point is that health care is just a complex mess no matter what, so don’t let that get in the way of my enjoyment of this masterwork.)

Additional perspective from The Daily Show (my original source) follows.

Tuesday, August 4, 2009 | flowchart, obfuscation, politics, thisneedsfixed  

Our gattaca future begins with our sports heroes

The New York Times this morning documents Major League Baseball’s use of DNA tests to verify the age of baseball prospects:

Dozens of Latin American prospects in recent years have been caught purporting to be younger than they actually were as a way to make themselves more enticing to major league teams. Last week the Yankees voided the signing of an amateur from the Dominican Republic after a DNA test conducted by Major League Baseball’s department of investigations showed that the player had misrepresented his identity.

Some players have also had bone scans to be used in determining age range.

(Why does a “bone scan” sound so painful? “You won’t provide a DNA sample? Well, maybe you’ll change your mind after the bone scan!”)

Kathy Hudson of Johns Hopkins notes the problem with testing:

“The point of [the Genetic Information Nondiscrimination Act, passed last year] was to remove the temptation and prohibit employers from asking or receiving genetic information.”

The article continues and makes note of the fact that such tests are also used to determine whether a player’s parents are his real parents, which can have an upsetting outcome.

But perhaps the broader concern (outside broken homes) and the scarier motivation for expansion of such testing is noted by a scouting director (not named), who comments:

“Can they test susceptibility to cancer? I don’t know if they’re doing any of that. But I know they’re looking into trying to figure out susceptibility to injuries, things like that. If they come up with a test that shows someone’s connective tissue is at a high risk of not holding up, can that be used? I don’t know. I do think that’s where this is headed.”

Injury is perhaps the most significant, yet most random, factor in scouting. If we’re talking about paying someone $27 million, will the threat of a federal discrimination law (wielded by a young player and agent) really be enough to keep teams away from this?

Wednesday, July 22, 2009 | genetics, sports  

Mediocre metrics, and how did we get here?

In other news, an article from Slate about measuring obesity using BMI (Body Mass Index). Interesting reading as I continue with work in the health care space. The article goes through the obvious flaws of the BMI measure, along with some history. Jeremy Singer-Vine writes:

Belgian polymath Adolphe Quetelet devised the equation in 1832 in his quest to define the “normal man” in terms of everything from his average arm strength to the age at which he marries. This project had nothing to do with obesity-related diseases, nor even with obesity itself. Rather, Quetelet used the equation to describe the standard proportions of the human build—the ratio of weight to height in the average adult. Using data collected from several hundred countrymen, he found that weight varied not in direct proportion to height (such that, say, people 10 percent taller than average were 10 percent heavier, too) but in proportion to the square of height. (People 10 percent taller than average tended to be about 21 percent heavier.)

For some reason, this brings to mind a guy in a top hat guessing people’s weight at the county fair. More to the point is the “how did we get here?” part of the story. Starting with a mediocre measure, it evolved into something for which it was never intended, simply because it worked for a large number of individuals:

The new measure caught on among researchers who had previously relied on slower and more expensive measures of body fat or on the broad categories (underweight, ideal weight, and overweight) identified by the insurance companies. The cheap and easy BMI test allowed them to plan and execute ambitious new studies involving hundreds of thousands of participants and to go back through troves of historical height and weight data and estimate levels of obesity in previous decades.

Gradually, though, the popularity of BMI spread from epidemiologists who used it for studies of population health to doctors who wanted a quick way to measure body fat in individual patients. By 1985, the NIH started defining obesity according to body mass index, on the theory that official cutoffs could be used by doctors to warn patients who were at especially high risk for obesity-related illness. At first, the thresholds were established at the 85th percentile of BMI for each sex: 27.8 for men and 27.3 for women. (Those numbers now represent something more like the 50th percentile for Americans.) Then, in 1998, the NIH changed the rules: They consolidated the threshold for men and women, even though the relationship between BMI and body fat is different for each sex, and added another category, “overweight.” The new cutoffs—25 for overweight, 30 for obesity—were nice, round numbers that could be easily remembered by doctors and patients.

I hadn’t realized that it was only in 1985 that this came into common use. And I thought the new cutoffs had more to do with the stricter definition from the WHO than with the simplicity of rounding. But back to the story:

Keys had never intended for the BMI to be used in this way. His original paper warned against using the body mass index for individual diagnoses, since the equation ignores variables like a patient’s gender or age, which affect how BMI relates to health.

I’d taken it as fact that BMI was a poor indicator, but all the grousing about its inaccuracy now has me wondering how often it’s actually out of whack. For instance, it does poorly for muscular athletes, but what percentage of the population is that? 10% at the absolute highest? Or, at the risk of sounding totally naive, if the metric is correct, say, 85% of the time, does it deserve as much derision as it receives?

Going a little further, another fascinating part of the story returns to the fact that the BMI numbers had in the past been a sort of guideline used by doctors. Consider the context: a doctor might sit with a patient in their office, and if the person is obviously not obese or underweight, not even consider such a measure. But if there’s any question, BMI provides a general clue as to an appropriate range, which, when delivered by a doctor with experience, can be framed appropriately. However, as we move to using technology to record such measures—it’s easy to put an obesity calculation into an electronic medical record, for instance—that EMR does not (necessarily) include the doctor’s delivery.

Basically, we can make a general rule: numbers that require additional context (delivery by a doctor) shouldn’t be stored in places devoid of context (databases). If we’re taking away context, the accuracy of the metric has to increase in proportion (or proportion squared, even) to the amount of context that has been removed.

I assume this is the case for most fields, and that the statistical field has a term (probably made up by Tukey) for the “remove context, increase accuracy” issue. At any rate, that’s the end of today’s episode of “what’s blindingly obvious to proper statisticians but I like working out for myself.”

Tuesday, July 21, 2009 | human, numberscantdothat  

“…the sexiest numbers I’ve seen in some time.”

A Fonts for Financials mailing from Hoefler & Frere-Jones includes some incredibly beautiful typefaces they’ve developed that play well with numbers. A sampling includes tabular figures (monospaced numbers, meaning “farewell, Courier!”) using Gotham and Sentinel:

courier and andale mono can bite me

Or setting indices (numbers in circles, apparently), using Whitney:

numbers, dots, dots, numbers

As Casey wrote this morning, “these are the sexiest numbers I’ve seen in some time.” I love ’em.

Tuesday, July 21, 2009 | typography  

Dropping Statistics for Knowledge

My favorite part of this week’s Seminar on Innovative Approaches to Turn Statistics into Knowledge (aside from its comically long name) was the presentation from Amanda Cox of The New York Times. She showed three projects that sit a little further up the complexity scale compared to a lot of the work from the Times, and are much more like the sort of numerical messes that catch my interest. The three also serve as a great cross-section of Amanda’s work with her collaborators, so I’m posting them here. Check ’em out:

“How Different Groups Voted in the 2008 Democratic Presidential Primaries” by Shan Carter and Amanda Cox:

oh hillary

“All of Inflation’s Little Parts” by Matthew Bloch, Shan Carter and Amanda Cox

soap bubble opera

And finally, “Turning a Corner?” which is perhaps the most complicated of the bunch, but gets more interesting as you spend a little more time with it.

just give it some time

Sunday, July 19, 2009 | infographics  

“There’s a movie in there, but it’s a very unusual movie.”

how about some handsome with that?

On the heels of today’s posting of the updated Salary vs. Performance piece comes word in the New York Times that a film version of Moneyball has been shelved:

Just days before shooting was to begin, Sony Pictures pulled the plug on “Moneyball,” a major film project starring Brad Pitt and being directed by Steven Soderbergh.

Yesterday I found it far more unsettling that such a movie would be made, period, but today I’m oddly curious about how they might pull it off:

What baseball saw as accurate, Sony executives saw as being too much a documentary. Mr. Soderbergh, for instance, planned to film interviews with some of the people who were connected to the film’s story.

I guess we’ll never know, since other studios also passed on the project, but that’s probably a good thing.

As an aside, I’m in the midst of reading Liar’s Poker (another by Moneyball author Michael Lewis) and again find myself amused by his ability as a storyteller: he reminds me of a friend who can take the most banal event and turn it into the most peculiar and hilarious story you’ve ever heard.

Thursday, July 2, 2009 | movies, reading, salaryper  

Salary vs. Performance for 2009

go tigers

I’ve just posted the updated version of Salary vs. Performance for the 2009 baseball season. I had hoped this year to rewrite the piece to cover multiple years, have a couple more analysis options, and even to rebuild it using the JavaScript version of Processing (no Java! no plug-in!), but a busy spring has upended my carefully crafted but poorly implemented plans.

Meanwhile, my inbox has been filling with plaintive comments like this one:

Will you be updating this site for this year? It’s the first year I think my team, the Giants would have a blue line instead of a red line.

How can I ignore the Giants fans? (Or for that matter, their neighbors to the south, the Dodgers, who perch atop the list as I write this.)

More about the project can be found in the archives. Visualizing Data explains the code and how it works, and the code itself is amongst the book examples.

Thursday, July 2, 2009 | inbox, salaryper  

It’s 10:12 in New York — but 18:12 in Baghdad!

Passed along by Jane Nisselson, a photo she found in the New Yorker, apropos of my continued fascination with command centers and the selection of information they highlight:

I think it was those clocks and choice of cities that were memorable. It is actually One Police plaza and not the terrorism HQ on Coney Island. The photographer is Eugene Richards.

r kelly is back!

For New Yorker readers, the original article is here.

Thursday, June 25, 2009 | bigboard, feedbag  

Curiosity Kills Privacy

There’s simply no way to give people access to others’ private records — in the name of security or otherwise — and trust those given access to do the right thing. From a New York Times story on the NSA’s expanded wiretapping:

The former analyst added that his instructors had warned against committing any abuses, telling his class that another analyst had been investigated because he had improperly accessed the personal e-mail of former President Bill Clinton.

This is not isolated, and this will always be the case. From a story in The Boston Globe a month ago:

Law enforcement personnel looked up personal information on Patriots star Tom Brady 968 times – seeking anything from his driver’s license photo and home address, to whether he had purchased a gun – and auditors discovered “repeated searches and queries” on dozens of other celebrities such as Matt Damon, James Taylor, Celtics star Paul Pierce, and Red Sox owner John Henry, said two state officials familiar with the audit.

The NSA wiretapping is treated too much like an abstract operation, with most articles that describe it overloaded with talk of “data collection” and “monitoring” and the massive scale of data that traffics through internet service providers. But the problem isn’t the computers and data and equipment; it’s that on the other end of the line, a human being is sitting there deciding what to do with that information. Our curiosity and voyeurism leave us fundamentally flawed for dealing with such information, and unable to ever live up to the responsibility of having that access.

The story about the police officers who are overly curious about sports stars (or soft rock balladeers) is no different from the NSA wiretapping, because it’s still people, with the same impulses, on the other end of the line. Until reading this, I had wanted to believe that NSA employees — who should truly understand the ramifications — would have been more professional. But instead they’ve proven themselves no different from a local cop who wants to know if Paul Pierce owns a gun or Matt Damon has a goofy driver’s license picture.

Friday, June 19, 2009 | human, privacy, security  


Adobe Illustrator has regressed into talking back like it’s a two-year-old:

cant do that noooo

Asked for further comment, Illustrator responded:


No doubt this is my own fault for not having upgraded to CS4. I’ll wait for CS5, when I can shell out for the privilege of using 64 bits; maybe the additional memory access will allow me to open files that worked in Illustrator 10 but no longer open on newer releases because the system (with 10x the RAM and 5x the CPU) runs out of memory.

Wednesday, June 17, 2009 | software  

Transit Trekkies

Casey wrote with more info regarding the previous post about Pelham. The command center in the movie is fake (as expected), because the real command center looks too sophisticated. NPR had this quote from John Johnson (spelling?), New York City Transit’s Chief Transportation Officer:

“They actually … attempted to downplay what the existing control center looks like, because they wanted to make it look real to the average eye as compared to… we’re pretty Star Trekky up in the new control center now.”

So that would explain the newish typeface used in the image, and the general dumbing-down of the display. The audio from the NPR story is here, with the quote near the 3:00 mark.

This is the only image I’ve been able to find of the real command center:

where are the people?

Links to larger/better/more descriptive images welcome!

Tuesday, June 16, 2009 | bigboard, movies  

Pelham taking my money in 3-2-1

I might go see the remake of The Taking of Pelham One Two Three just to get a better look at this MTA command center:

denzel and his data

Is this a real place? Buried within the bowels of New York City? And Mr. Washington, how about using one of your two telephones to order a new typeface for that wall? Looks like a hundred thousand dollars of display technology being used for ASCII line art.

Maybe I’ll see the original instead.

Friday, June 12, 2009 | bigboard, movies  

Collections for Charity

sheena is... a punk rocker

Last week at the CaT conference, I met Sheena Matheiken, a designer who is … I’ll let her explain:

Starting May 2009, I have pledged to wear one dress for one year as an exercise in sustainable fashion. Here’s how it works: There are 7 identical dresses, one for each day of the week. Every day I will reinvent the dress with layers, accessories and all kinds of accouterments, the majority of which will be vintage, hand-made, or hand-me-down goodies. Think of it as wearing a daily uniform with enough creative license to make it look like I just crawled out of the Marquis de Sade’s boudoir.

Interesting, right? Particularly where the idea is to make the outfit new through the sort of forced creativity that comes from wearing a uniform. But it’s also not unlike the dozens (hundreds? thousands?) of other “I’m gonna do x each day for 365 days” projects, where obsessive compulsive types take a photo, choose a Pantone swatch, learn a new word, etc. in celebration of the Earth rotating on its axis once more. Yale’s graduate graphic design program even runs a yearly “100 day” project along these lines. (Don’t get me wrong–I’m happy to obsess and compulse with the best of them.)

But then it gets more interesting:

The Uniform Project is also a year-long fundraiser for the Akanksha Foundation, a grassroots movement that is revolutionizing education in India. At the end of the year, all contributions will go toward Akanksha’s School Project to fund uniforms and other educational expenses for slum children in India.

How cool! I love how this ties the project together. More can be found at The Uniform Project, with daily photos of Sheena’s progress. And be sure to donate.

I’m looking forward to hearing what she’s learned about clothes and how you wear them once the year is complete. Ironic that the year she wears the same thing for 365 days will be her most creative.

Tuesday, June 9, 2009 | collections  

Comorbidity: it’s no longer just for physicians and statisticians

A simple, interactive means for seeing connections between demographics, diseases, and diagnoses:

imagining health as 300 people symbols rearranging themselves in a data symphony

We just finished developing this project for GE as part of the launch of their new health care initiative. With the input and guidance of a handful of departments within the company, we began with their proprietary database of 14 million patient records, looking for ways to show connections between related conditions. For instance, we wanted visitors to the site to be able to learn how diabetes diagnoses increase along with obesity, but to convey it in a manner that didn’t feel like a math lesson. By cycling through the eight items at the top (and the row beneath them), you can make several dozen comparisons, highlighting what’s found in actual patient data. At the bottom, some additional background is provided based on various national health care studies.

I’m excited to have the project finished and online, and have people making use of it, as I readjust from the instant gratification of building things one day and then talking about them the next day. More to come!

Monday, May 18, 2009 | seed  

15 Views of a Node Link Graph

Depicting networks (also known as graphs, and covered in chapters 7 and 8 of Visualizing Data) is a tricky subject, and too often leads to representations that are a tangled and complicated mess. Such diagrams are often referred to with terms like ball of yarn or string, a bird’s nest, cat hair, or simply hairball.

It’s also common for a network diagram to be engaging and attractive for its complexity (usually aided and abetted by color), which tends to hide how poorly it conveys the meaning of the data it represents.

On the other hand, Tamara Munzner is someone in visualization who really “gets” graphs at a deeper level. A couple of years ago she gave an excellent Google Tech Talk (it looks like it was originally from another conference in ’05) titled “15 Views of a Node Link Graph” (video, links, slides), where she discussed a range of methods for viewing and working with graph data, along with their pros and cons:

A cheat sheet of the 15 methods:

  1. Edge List
  2. Hand-Drawn
  3. Dot
  4. Force-Directed Placement
  5. TopoLayout
  6. Animated Radial Layouts
  7. Constellation
  8. Treemaps
  9. Cushion Treemaps
  10. Themescapes
  11. Multilevel Call Matrices
  12. SpaceTree
  13. 2D Hyperbolic Trees
  14. H3
  15. TreeJuxtaposer

The presentation is an excellent survey of methods, and highly recommended for anyone getting started with graph and network data. It’s useful food for thought for the “how should I represent this data?” question.
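As a small taste of what’s behind one of those methods, here’s a minimal sketch of force-directed placement (number 4 on the list): every node pushes the others away, every edge pulls its endpoints together like a spring, and after a few hundred iterations the layout settles. The graph is a tiny hard-coded example of my own, not anything from the talk:

```java
import java.util.Random;

// Minimal force-directed placement on a hard-coded toy graph.
public class ForceDirected {
    public static void main(String[] args) {
        int n = 6;
        int[][] edges = { {0,1}, {1,2}, {2,3}, {3,0}, {0,4}, {4,5} };
        double[] x = new double[n], y = new double[n];
        Random rand = new Random(1);
        for (int i = 0; i < n; i++) {        // random starting positions
            x[i] = rand.nextDouble() * 100;
            y[i] = rand.nextDouble() * 100;
        }

        double repulsion = 500, springK = 0.05, restLength = 40, maxStep = 5;
        int iterations = 500;
        for (int iter = 0; iter < iterations; iter++) {
            double[] fx = new double[n], fy = new double[n];

            // every pair of nodes pushes apart (repulsion weakens with distance)
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double dx = x[i] - x[j], dy = y[i] - y[j];
                    double d2 = dx * dx + dy * dy + 0.01;
                    double f = repulsion / d2;
                    fx[i] += f * dx; fy[i] += f * dy;
                    fx[j] -= f * dx; fy[j] -= f * dy;
                }
            }

            // each edge is a spring pulling its endpoints toward a rest length
            for (int[] e : edges) {
                int a = e[0], b = e[1];
                double dx = x[b] - x[a], dy = y[b] - y[a];
                double d = Math.sqrt(dx * dx + dy * dy) + 0.01;
                double f = springK * (d - restLength);
                fx[a] += f * dx / d; fy[a] += f * dy / d;
                fx[b] -= f * dx / d; fy[b] -= f * dy / d;
            }

            // move each node along its net force, with a shrinking step cap
            double cap = maxStep * (1.0 - iter / (double) iterations) + 0.01;
            for (int i = 0; i < n; i++) {
                double mag = Math.sqrt(fx[i] * fx[i] + fy[i] * fy[i]);
                double scale = (mag > cap) ? cap / mag : 1.0;
                x[i] += fx[i] * scale;
                y[i] += fy[i] * scale;
            }
        }

        for (int i = 0; i < n; i++) {
            System.out.printf("node %d: (%.1f, %.1f)%n", i, x[i], y[i]);
        }
    }
}
```

It’s simple to write, which is part of why force-directed layouts (and the hairballs they produce) are so common.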

Wednesday, May 13, 2009 | networks, represent  


the erudite and articulate john underkoffler

I was in the midst of starting a new post in January so I failed to make a post about it at the time, but Oblong’s Tamper installation was on display at the 2009 Sundance Film Festival. John writes (and I copy verbatim):

Our Sundance guests — who already number in the thousands — find the experience exhilarating. A few grim cinephiles have supplementally raised an eyebrow (one per cinephile) at the filmic heresy that TAMPER provides: a fluid new ability to isolate, manipulate, and juxtapose (rudely, say the grim) disparate elements (ripped from some of the greatest works of cinema, continue the grim). For us, what’s important is the style of work: real-time manipulation of media elements at a finer granularity than has previously been customary or, for the most part, possible; and a distinctly visceral, dynamic, and geometric mode of interaction that’s hugely intuitive because the incorporeal suddenly now reacts just like bits of the corporeal world always have. Also, it’s glasses-foggingly fun.

Read more at Oblong Industries.

Tuesday, May 12, 2009 | interact, movies  

Electric Avenues

That’s right, I’m trying to ruin your Friday by planting Eddy Grant in your head.

A very nicely done visualization from NPR of the U.S. Electrical Grid:


I find this fascinating mostly because I’d never seen it properly depicted before, but the interactive version shows more about the locations of power plants, plus maps of solar and wind power along with their relative capacities.

I love the craggy beauty of the layered lines, and appreciate the restraint of the map’s creators to simply show us this amazing data set.

And if you find yourself toe tapping and humming “we gonna rock down to…” later this afternoon, then I’m really sorry. I’m already beginning to regret it.

(Thanks, Eugene)

Friday, May 8, 2009 | energy, mapping  

No Date? No Time!

I’ve not been working on Windows much lately, but while installing Windows XP today, I was greeted with this fine work of nonfiction, which reminds me why I miss it so:

oh, well i guess that makes sense

So I can’t synchronize the time because…the time on the machine is incorrect. And not only that, but my state represents a security risk to the time synchronization machine in the sky.

I hope the person who wrote this error message enjoyed it as much as I did. At least when writing bad error messages in Processing I have some leeway for making fun of the situation (hence the unprofessional window titles of some of the error dialogs).

Thursday, May 7, 2009 | software  

No really, 3% of our GDP

Reader Eric Mika sent a link to the video of Obama’s speech that I mentioned a couple of days ago. The speech was knocked from the headlines within just a few hours by news of Arlen Specter leaving the Republican party, so this is my chance to repeat the story.

Specter’s defection is only relevant (if it’s relevant at all) until the next election cycle, so it’s frustrating to see something that could affect us for five to fifty years pre-empted by what talking heads are more comfortable bloviating about. It’s a reminder that with all the progress we’ve made on how quickly we can distribute news, and the increase in the number of outlets by which it’s available, the quality and thoughtfulness of the product has only been further undermined.

Update, a few hours later: it’s a battle of the readers! Now Jamie Alessio has passed along a high-quality video of the President’s speech from the White House channel on YouTube. Here’s the embedded version:

Monday, May 4, 2009 | government, news, science, speaky  

Lecture tomorrow in Northampton, MA

I’ll be doing a talk on Tuesday evening (5 May 2009) for wmassdevs:

Author Ben Fry will be presenting “Computational Information Design” –a mix of his work in visualization and coding plus a quick introduction to Processing. We are very excited to talk to Mr. Fry and our thanks go out to this event’s sponsors: Atalasoft and Snowtide Informatics.

He will be presenting to the Western MA Developers Group on Tuesday, May 5th at 6:30pm at 243 King Street, Potpourri Plaza in Northampton, MA. This event will be hosted by Snowtide Informatics; beverages and snacks will be available.

Mmm, snacks!

Monday, May 4, 2009 | talk  

Not in our character to follow

obama rocks the academy

Obama’s goal for research and development is 3% of our GDP:

I believe it is not in our American character to follow – but to lead. And it is time for us to lead once again. I am here today to set this goal: we will devote more than three percent of our GDP to research and development. We will not just meet, but we will exceed the level achieved at the height of the Space Race, through policies that invest in basic and applied research, create new incentives for private innovation, promote breakthroughs in energy and medicine, and improve education in math and science. This represents the largest commitment to scientific research and innovation in American history.

I’m not much for rah-rah patriotism, but it’s hard not to get fired up about this. I found the rest of his speech remarkable as well, listing specific technologies that emerged from basic research and are too often overlooked:

The Apollo program itself produced technologies that have improved kidney dialysis and water purification systems; sensors to test for hazardous gasses; energy-saving building materials; and fire-resistant fabrics used by firefighters and soldiers.

And the announcement of a new agency along the lines of DARPA:

And today, I am also announcing that for the first time, we are funding an initiative – recommended by this organization – called the Advanced Research Projects Agency for Energy, or ARPA-E.

This is based on the Defense Advanced Research Projects Agency, known as DARPA, which was created during the Eisenhower administration in response to Sputnik. It has been charged throughout its history with conducting high-risk, high-reward research. The precursor to the internet, known as ARPANET, stealth technology, and the Global Positioning System all owe a debt to the work of DARPA.

The speech, nearly 5000 words in total (did our former President spill that many words for science during eight years in office?), continues with more policy regarding research, investment, and education–all very exciting to read. But perhaps my favorite line of all came when he said to the members of the National Academy of Sciences in attendance:

And so today I want to challenge you to use your love and knowledge of science to spark the same sense of wonder and excitement in a new generation.

Tuesday, April 28, 2009 | government, science  

OpenSecrets no longer secret

Word on the street (where by “the street” I mean an email from Golan Levin) is that the Center for Responsive Politics has made available piles and piles of data:

The following data sets, along with a user guide, resource tables and other documentation, are now available in CSV format (comma-separated values, for easy importing) through OpenSecrets.org’s Action Center at http://www.opensecrets.org/action/data.php:

CAMPAIGN FINANCE: 195 million records dating to the 1989-1990 election cycle, tracking campaign fundraising and spending by candidates for federal office, as well as political parties and political action committees. CRP’s researchers add value to Federal Election Commission data by cleaning up and categorizing contribution records. This allows for easier totaling by industry and company or organization, to measure special-interest influence.

LOBBYING: 3.5 million records on federal lobbyists, their clients, their fees and the issues they reported working on, dating to 1998. Industry codes have been applied to this data, as well.

PERSONAL FINANCES: Reports from members of Congress and the executive branch that detail their personal assets, liabilities and transactions in 2004 through 2007. The reports covering 2008 will become available to the public in June, and the data will be available for download once CRP has keyed those reports.

527 ORGANIZATIONS: Electronically filed financial records beginning in the 2004 election cycle for the shadowy issue-advocacy groups known as 527s, which can raise unlimited sums of money from corporations, labor unions and individuals.

The best thing here is that they’ve already tidied and scrubbed the data for you, just like Mom used to. The personal finance information alone has already led to startling revelations.

Monday, April 13, 2009 | data, politics  

Processing Time

Spend your Saturday making clocks with code:

Processing Time

A code jam / party and programming competition
Part of the Boston Cyberarts Festival

Saturday, May 2, 2009 – MIT N52–390
265 Massachusetts Ave, 3rd Floor

  • Compete individually or in pairs to design and develop beautiful programs in Processing
  • Snack and refresh yourself
  • Present completed projects to other participants and visitors at the end of the day
  • Anyone (not just MIT students or community members) can compete, anyone can stop by to see presentations
  • Meet the creators of Processing, Ben Fry (in person) and Casey Reas (via video), who will award prizes


  • 12:30-01:00 pm: Check in
  • 01:00-01:15 pm: Welcome
  • 01:15-05:15 pm: Coding Session
  • 05:15-06:45 pm: Presentations and Awards – Public welcome!


Register in advance, individually or in two-person teams, by emailing processing-time@mit.edu with one or two participant names and a team name.


Processing Time is sponsored by MIT (Arts Initiatives at MIT, Center for Advanced Visual Studies, Program in Writing & Humanistic Studies) and is part of the Boston Cyberarts Festival.

The Processing Time page, linked to a nifty poster, is at: burgess.mit.edu/pt

Monday, April 6, 2009 | opportunities, processing  


I’ve been working on ways to visualize the current world economic situation along with the bailouts, the Recovery and Reinvestment Act, and so on, but South Park beat me to it.

Friday, March 27, 2009 | decisions, finance, government  

PARSE show at the Axiom Gallery in Boston

New work! Sometime collaborator and savior of this site Eugene Kuo and I have developed a piece for the PARSE show opening tomorrow (Friday, March 27) at the Axiom Art Gallery in Boston. From the announcement:

Curated by AXIOM Founding Director, Heidi Kayser, PARSE, includes the work of five artists who use data to present new perspectives on the underlying information that makes us human. Overlooked patterns of data surround us daily. The artists in PARSE sort, separate and amalgamate physical, mental and social information to create intricate visualizations in print, interactive media, animation and sculpture. These pieces track and reflect our brainwaves during REM sleep, our genetic code, our social icons, and even our carnal desires.

syntenograph print

Featuring works by: Ben Fry and Eugene Kuo, Fernanda Viegas and Martin Wattenberg, Jason Salavon, Jen Hall

The opening is from 6-9pm. The gallery location is amazing — it’s a nook to the side of the Green Street subway station (on the Orange Line in Boston) — it makes me think of what it might be like to have a show at the lair of Bill Murray’s character in Caddyshack. I love that it’s been reserved as a gallery space.

Martin & Fernanda are showing their Fleshmap project, along with a pair of amalgamations by Jason Salavon, and two sculptures from Jen Hall (hrm, can’t find a link for those). Our project is described here, and uses comparisons of the DNA between many species that have been the focus of my curiosity recently to make compositions like the one seen to the right.

Thursday, March 26, 2009 | iloveme  

On the Marionette Theatre

I happened across On the Marionette Theatre by Heinrich von Kleist while reading the Wikipedia entry on Philip Pullman’s His Dark Materials series. Pullman had cited it as one of three influences, and it being the shortest of the three, I gave it a shot (naturally, due to my apparent “young adult” reading level that found me reading his trilogy in the first place).

The story begins with the writer having a chance meeting with a friend, and inquiring about his apparent interest in puppet theater. As the story moves on:

“And what is the advantage your puppets would have over living dancers?”

“The advantage? First of all a negative one, my friend: it would never be guilty of affectation. For affectation is seen, as you know, when the soul, or moving force, appears at some point other than the centre of gravity of the movement. Because the operator controls with his wire or thread only this centre, the attached limbs are just what they should be.… lifeless, pure pendulums, governed only by the law of gravity. This is an excellent quality. You’ll look for it in vain in most of our dancers.”

The remainder is a wonderful parable of vanity and grace.

Wednesday, March 25, 2009 | reading  

Data is the pollution of the information age

An essay by Bruce Schneier on BBC.com:

Welcome to the future, where everything about you is saved. A future where your actions are recorded, your movements are tracked, and your conversations are no longer ephemeral. A future brought to you not by some 1984-like dystopia, but by the natural tendencies of computers to produce data.

Data is the pollution of the information age. It’s a natural by-product of every computer-mediated interaction. It stays around forever, unless it’s disposed of. It is valuable when reused, but it must be done carefully. Otherwise, its after-effects are toxic.

The essay goes on to cite specific examples, though they sound more high-tech than the actual problem. Later it returns to the important part:

Cardinal Richelieu famously said: “If one would give me six lines written by the hand of the most honest man, I would find something in them to have him hanged.” When all your words and actions can be saved for later examination, different rules have to apply.

Society works precisely because conversation is ephemeral; because people forget, and because people don’t have to justify every word they utter.

Conversation is not the same thing as correspondence. Words uttered in haste over morning coffee, whether spoken in a coffee shop or thumbed on a BlackBerry, are not official correspondence.

And an earlier paragraph that highlights why I talk about privacy on this site:

And just as 100 years ago people ignored pollution in our rush to build the Industrial Age, today we’re ignoring data in our rush to build the Information Age.

Tuesday, March 17, 2009 | privacy  

Barbara Liskov wins Turing Award

What a brilliant woman:

Liskov, the first U.S. woman to earn a PhD in computer science, was recognized for helping make software more reliable, consistent and resistant to errors and hacking. She is only the second woman to receive the honor, which carries a $250,000 purse and is often described as the “Nobel Prize in computing.”

I’m embarrassed to admit that I wasn’t more familiar with her work prior to reading about it in Tuesday’s Globe, but wow:

The latter day Ada herself

Liskov’s early innovations in software design have been the basis of every important programming language since 1975, including Ada, C++, Java and C#.

Liskov’s most significant impact stems from her influential contributions to the use of data abstraction, a valuable method for organizing complex programs. She was a leader in demonstrating how data abstraction could be used to make software easier to construct, modify and maintain…

In another contribution, Liskov designed CLU, an object-oriented programming language incorporating clusters to provide coherent, systematic handling of abstract data types. She and her colleagues at MIT subsequently developed efficient CLU compiler implementations on several different machines, an important step in demonstrating the practicality of her ideas. Data abstraction is now a generally accepted fundamental method of software engineering that focuses on data rather than processes.

This has nothing to do with gender, of course, but I find it exciting apropos of this earlier post regarding women in computer science.

Thursday, March 12, 2009 | cs, gender  

Seed Visualization

Update: As of January 1st, 2010, I’m no longer at Seed. Read more here.

Some eighteen months as visualization vagabond (roving writer, effusive explainer, help me out here…) came to a close in December when I signed up with Seed Media Group to direct a new visualization studio here in Cambridge. We now have a name—the Phyllotaxis Lab—and as of last week, we’ve made it official with a press release:

NEW YORK and CAMBRIDGE, MA (March 5, 2009) – Building on Seed Media Group’s strong design culture, Adam Bly, founder and CEO, announced today the appointment of Ben Fry as the company’s first Design Director. Seed Media Group also announced the launch of a new unit focused on data and information visualization to be based in Cambridge, Massachusetts and headed by Ben Fry.

Seed Visualization will help companies and governments find solutions to clearly communicate complex data sets and information to various stakeholders. The unit’s research arm, the Phyllotaxis Lab, will work to advance the field of data visualization and will undertake research and experimental design work. The Lab will partner with academic institutions around the world and will provide education on the field of data visualization.

And about that name:

Phyllotaxis is a form commonly found in nature that is derived from the Fibonacci sequence. It is the inspiration for Seed Media Group’s logo, designed in 2005 by Stefan Sagmeister and recently included in the Design and the Elastic Mind exhibit at MoMA. “Much like a phyllotaxis, visualization is about both numbers and information as well as structure and form,” said Ben Fry. “It’s a reminder that beauty is derived from the intelligence of the solution.”

The full press release can be found here (PDF), and more details are forthcoming.

This is gonna be great.

Tuesday, March 10, 2009 | iloveme, seed  

Goodbye, Desktop

Casey sent over a video from someone who has an Arduino (outfitted with a display) running examples from our Processing book:

A little surreal and a lot exciting.

Tuesday, March 10, 2009 | physical, processing  

The Lost City of Atlantisn’t

Some combination of internet-fed conspiracy theorists and Google Earthlings (lings that use Google Earth) were abuzz last week with an odd image find, possibly representing the lost city of Atlantis:

tipi drawings on the ocean floor

These hopes were later dashed (or perhaps only fed further) when the apparition was denied in a post on the Official Google Blog crafted by two of the gentlemen involved in the data collection for Google Ocean. The post is fascinating as it describes much of the process that they use to get readings of the ocean floor. They explain how echosounding (soundwaves bounced into the depths) is used to determine distance, and when that’s not possible, they actually use the sea level itself:

Above large underwater mountains (seamounts), the surface of the ocean is actually higher than in surrounding areas. These seamounts actually increase gravity in the area, which attracts more water and causes sea level to be slightly higher. The changes in water height are measurable using radar on satellites. This allows us to make a best guess as to what the rest of the sea floor looks like, but still at relatively low resolutions (the model predicts the ocean depth about once every 4000 meters). What you see in Google Earth is a combination of both this satellite-based model and real ship tracks from many research cruises (we first published this technique back in 1997).

How great is that? The water actually reveals shapes beneath because of gravity’s rearrangement of the ocean surface.

A more accurate map of the entire ocean would require a bit more effort:

…we could map the whole ocean using ships. A published U.S. Navy study found that it would take about 200 ship-years, meaning we’d need one ship for 200 years, or 10 ships for 20 years, or 100 ships for two years. It costs about $25,000 per day to operate a ship with the right mapping capability, so 200 ship-years would cost nearly two billion dollars.
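(A quick check of the quoted figures, for the skeptical:)

```java
// 200 ship-years at $25,000 per day, using the numbers from the Google post.
public class SurveyCost {
    public static void main(String[] args) {
        long shipYears = 200;
        long dollarsPerDay = 25_000;
        long total = shipYears * 365 * dollarsPerDay;
        System.out.printf("about $%,d%n", total);   // $1,825,000,000, i.e. "nearly two billion"
    }
}
```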

Holy crap, two billion dollars? That’s real money!

That may seem like a lot of money…

Yeah, no kidding — that’s what I just said!

…but it’s not that far off from the price tag of, say, a new sports stadium.


You mean this would teach us more than New Yorkers will learn from the Meadowlands Stadium debacle, beyond “the Jets still stink” and “Eli Manning is still a weenie”? (Excellent Bob Herbert op-ed on a similar topic — the education part, not the Manning part.)

So in the end, this “Atlantis” is the result of rounding error in the patchwork of data produced by the various measurement and tiling methods. Not as exciting as a waterlogged and trident-wielding civilization, but the remainder of the article is a great read if you’re curious about how the ocean images are collected and assembled.

Monday, March 2, 2009 | acquire, mapping, sports, water  

ACM Creativity & Cognition 2009

Passing along a call for the ACM Creativity &amp; Cognition 2009. Sadly, I’m overbooked and won’t be able to participate this year, but I attended in 2007 and found it a much more personal alternative to the enormous ACM conferences (CHI, SIGGRAPH), without any loss of quality.

Everyday Creativity: Shared Languages and Collective Action
October 27-30, 2009, Berkeley Art Museum, CA, USA

Sponsored by ACM SIGCHI, in co-operation with SIGMM/ SIGART [pending approval]

Keynote Speakers

Mihaly Csikszentmihalyi
Professor of Psychology & Management, Claremont Graduate University, USA
JoAnn Kuchera-Morin
Director, Allosphere Research Laboratory, California Nanosystems Institute, USA
Jane Prophet
Professor of Interdisciplinary Computing, Goldsmiths University of London, UK

Call for Participation

Full Papers, Art Exhibition, Live Performances, Demonstrations, Posters, Workshops, Tutorials, Graduate Symposium

Submission deadline: April 24, 2009
For more information and submission process see: www.creativityandcognition09.org

the way things work and cognition 2009

Creativity is present in all we do. The 7th Creativity and Cognition Conference (CC09) embraces the broad theme of Everyday Creativity. This year the conference will be held at the Berkeley Art Museum (CA, USA), and asks: How do we enable everyone to enjoy their creative potential? How do our creative activities differ? What do they have in common? What languages can we use to talk to each other? How do shared languages support collective action? How can we incubate innovation? How do we enrich the creative experience? What encourages participation in everyday creativity?

The Creativity and Cognition Conference series started in 1993 and is sponsored by ACM SIGCHI. The conference provides a forum for lively interdisciplinary debate exploring methods and tools to support creativity at the intersection of art and technology. We welcome submissions from academics and practitioners, makers and scientists, artists and theoreticians. This year’s broad theme of Everyday Creativity reflects the new forms of creativity emerging in everyday life, and includes topics of:

  • Collective creativity and creative communities
  • Shared languages and Participatory creativity
  • Incubating creativity and supporting Innovation
  • DIY and folk creativity
  • Democratising creativity
  • New materials for creativity
  • Enriching the collaborative experience

We welcome the following forms of submission:

  • Empirical evaluations by quantitative and qualitative methods
  • In-depth case studies and ethnographic analyses
  • Reflective and theoretical accounts of individual and collaborative practice
  • Principles of interaction design and requirements for creativity support tools
  • Educational and training methods
  • Interdisciplinary methods, and models of creativity and collaboration
  • Analyses of the role of technology in supporting everyday creativity

The Berkeley Art Museum should be a great venue too.

Tuesday, February 24, 2009 | creativity, opportunities  

Flu headed to the dustbin of disease history?

And is disease history stored in a dustbin, for that matter?

Researchers at Dana-Farber may have found influenza’s weak spot, which could lead to a vaccine:

Yearly vaccination is currently needed because different strains of the virus circulate around the world regularly, owing to the germs’ rapidly changing genetic makeup. But the researchers reported yesterday that they had found one pocket of the virus that appears to remain static in multiple strains, making it an attractive target for a vaccine, as well as drugs.

And instead of fighting the primary part of the virus head on, you figure out a way to attack and neutralize a portion of the weaker part that does not mutate:

Most vaccines work by revving up the body’s disease-fighting cells, helping them to recognize and rapidly neutralize invading germs. The researchers realized that the disease fighters generated by existing flu vaccines – which contain killed or weakened whole viruses – head straight toward the biggest target, the globular head. It is, in effect, a Trojan horse that prevents the body’s immune system from directing more of its firepower toward the stalk of the [virus], where the scientists found the pocket that was so static. That site contains machinery that lets the virus penetrate human cells.

A vaccine is a way off, but they say it should be possible to make a drug that helps the body create antibodies to fight off the flu sooner than that. Incredible work.

Monday, February 23, 2009 | genetics  

Pirates of Global Warming

This month’s pirate reference comes to us by way of the theory of the Flying Spaghetti Monster. The theory was first introduced in an open letter from Bobby Henderson to the Kansas State Board of Education, after the board decided that creationism must be taught alongside the theory of evolution. I had disregarded the Spaghetti Monster as a heavy-handed response to the hard-headed, but had missed this important bit of context:

You may be interested to know that global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of Pirates since the 1800s. For your interest, I have included a graph of the approximate number of pirates versus the average global temperature over the last 200 years. As you can see, there is a statistically significant inverse relationship between pirates and global temperature.

invarrse carrrrrelation

A stunning find! And like an overly literal translation of the bible, so accurate — except when it’s not. The horizontal scale, as Edward Tufte would say, “repays careful study.”

Wednesday, February 18, 2009 | infographics  

Under The Sea

I wrote about my excitement over the rumor that Google was going under back in April, but now it has officially happened — the Ocean has arrived as part of Google Earth:

cue little mermaid theme

Look at those trenches! And now you can use the Google Earth software to fly through the area in the middle of the Atlantic where some god has decided to begin peeling the globe like an orange.

I’m waiting for the day (presumably a few years from now) that this feature includes other major bodies of water, revealing the hidden shapes beneath the surface of lakes or rivers that you know well from above. The physical relief version, that is. I’ll pass on the underwater Google Street View with their privacy-invading minisubs sticking their nose in everyone’s business.

Thursday, February 5, 2009 | mapping, water  

F* Everything, We’re Doing 44 Vertebrates

From an announcement email sent this week by the folks behind the UCSC Genome Browser project:

We are pleased to announce the release of a new Conservation track based on the human (hg18) assembly.  This track shows multiple alignments of 44 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species (vertebrate) and two subsets (primate and placental mammal). The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track. For more details, please visit the track description page

It’s the comparative genomics equivalent of “Fuck Everything, We’re Doing Five Blades,” an editorial penned by James M. Kilts (President and CEO of Gillette) for The Onion. Kilts writes:

Would someone tell me how this happened? We were the fucking vanguard of shaving in this country. The Gillette Mach3 was the razor to own. Then the other guy came out with a three-blade razor. Were we scared? Hell, no. Because we hit back with a little thing called the Mach3Turbo. That’s three blades and an aloe strip. For moisture. But you know what happened next? Shut up, I’m telling you what happened—the bastards went to four blades. Now we’re standing around … selling three blades and a strip. Moisture or no, suddenly we’re the chumps. Well, fuck it. We’re going to five blades.

44 species, sittin’ in a tree

Conservation tracks in the human genome are simply additional lines of annotation shown alongside the human DNA sequence. The lines show areas of identical or nearly identical DNA found in other species (in this case 44 vertebrates). In the past we might have looked at two, three, seven, maybe a dozen different species in a row. UCSC had actually been up to 27 different species at a time before they took the extra push over the cliff to 44.

As it turns out, just sequencing the human genome isn’t all that interesting. It only starts to get interesting in the context of other genomes from other species. With multiple species, the data can be compared and evolutionary trees drawn. We can take an organism that we know a lot about — say the fruitfly — and compare its genes (which have been studied extensively) to the genetic code of humans (who have been studied less), and we can look for similar regions. For instance, the HOX family of genes is involved in structure and limb development. A similar region can be found in humans, insects, and many things in between. How cool is that?

Further, how about all that “junk” DNA? A particular portion of DNA might have no known function, but if you find an area where the data matches (is conserved) with another species, then it might not be quite as irrelevant as previously thought (and for the record, the term junk is only used in the media). If you see that it’s highly conserved (a large percentage is identical) across many different species, then you’re probably onto something, and it’s time to start digging further.
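For a rough sense of what “highly conserved” means, here’s a toy sketch that scores how much of an aligned region is identical between two sequences. The sequences are invented, and the real pipeline (multiz, phastCons, and friends from the announcement above) is far more sophisticated about alignment and scoring:

```java
// Toy conservation score: what fraction of an aligned region is identical?
public class Conservation {
    static double percentIdentity(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) same++;
        }
        return 100.0 * same / n;
    }

    public static void main(String[] args) {
        // invented sequences, standing in for an aligned region from each species
        String human = "ATGGCGTACGTTAGCCTA";
        String mouse = "ATGGCGTACGATAGCCTT";
        String fly   = "ATGACGTTCGATTGCCAA";
        System.out.printf("human vs mouse: %.0f%% identical%n", percentIdentity(human, mouse));
        System.out.printf("human vs fly:   %.0f%% identical%n", percentIdentity(human, fly));
    }
}
```

Run that across 44 species and a few billion bases and you have, more or less, the idea behind a conservation track.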

Spending time with data like this really highlights the silliness of anti-evolution claims. It’s tough to argue with being able to see it. Unfortunately most of the work I’ve done in this area isn’t documented properly, though you can see human/chimp/dog/mouse alignments in this genome browser, a dozen mammals aligned in this illustration, or humans and chimps in this piece.

As an aside, a few months after the Onion article, Gillette really did go to five blades with their Fusion razor. And happily, the (real) CEO speaks with the same bravado as the earlier editorial:

“The Schick launch has nothing to do with this, it’s like comparing a Ferrari to a Volkswagen as far as we’re concerned,” Chairman, President and Chief Executive James Kilts, told Reuters.

And why isn’t that guy doing their ads instead of those other namby-pambies?

Wednesday, February 4, 2009 | genetics  

Piet Mondrian Goes to the Super Bowl

Beneath a pile of 1099s, I found myself distracted, still thinking about the logo colors and proportions seen in the previous post. This led to a diversion to extract the colors from the Super Bowl logos and depict them according to their usage. The colors are counted up and laid out using a Treemap.
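The counting step is the easy part, something along the lines of the sketch below. It’s a rough approximation of the idea rather than the actual code (the layout itself adapts the Treemap example from Visualizing Data), and the file name is just a placeholder:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.imageio.ImageIO;

// Tally how often each color appears in a logo, skipping the transparent
// and white "empty space" so the background doesn't swamp the count.
public class LogoColors {
    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("logo.png"));  // placeholder file name
        Map<Integer, Integer> counts = new HashMap<>();

        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int argb = img.getRGB(x, y);
                int alpha = (argb >> 24) & 0xff;
                int rgb = argb & 0xffffff;
                if (alpha < 128 || rgb == 0xffffff) continue;  // skip background pixels
                counts.merge(rgb, 1, Integer::sum);
            }
        }

        // these counts then become the areas fed to the Treemap layout
        counts.forEach((color, count) ->
            System.out.printf("#%06x: %d pixels%n", color, count));
    }
}
```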

The result for all 43 Super Bowl logos, using the same layout as the previous image:

my last hour or two

A few of the typical pairs, starting with 2001:




See all of the pairings here. Some notes about what’s mildly clever, and the less so:

  • The empty space (white areas or transparent background) is subtracted from the logo, and the code tries to size the Treemap according to the aspect ratio of the original image, so that when seen adjacent to the logo, things look balanced (kinda).
  • The code is a simple adaptation of the Treemap project in Chapter 7 of Visualizing Data.
  • Unfortunately, I could not find vector images (for all of the games, at least), which means the colors in the original images are not pure. For instance, a solid blue area will have light blue edges because of smoothing (anti-aliasing). This makes it difficult to accurately figure out what’s a real color and what isn’t. Sometimes the fuzzy edge colors are correctly removed, other times not so much. Even worse, it may remove legitimate colors that are used in less than 4-5% of the image.
  • The color quantization isn’t good. On a few, it’s bad, and causes a few similar colors to disappear.
  • All the above could be fixed, but taxes are more important than non-representational art. (That’s not a blanket statement — just for me this evening.)

And finally, I don’t honestly think there’s any relationship between a software algorithm for data visualization and the work of an artist like Piet Mondrian. But I do love the idea of a Dutch painter from the De Stijl movement making his way through the turnstiles at Raymond James Stadium.

Monday, February 2, 2009 | collections, examples, football, represent, sports, time  

Evolution of the Super Bowl Logo

From The New York Times, a collection of all 43 logos used to advertise the Super Bowl:

super bowl logos over time

The original article cites how the logos reflect the evolution and growth of the league. Which makes sense: you can see that it was more than fifteen years before it moved from just a logotype to a fully branded extravaganza. Or that in its first year it wasn’t the Super Bowl at all, and was instead billed as “The First World Championship Game of the American Football Conference versus the National Football Conference,” a title that sounds great in a late-60s broadcaster voice (try it, you’ll like it), but was still shortened to the neanderthal “First World Championship Game AFC vs NFC” for the logo, before it was renamed the “Super Bowl” the following year. (You can stop repeating the name in the broadcaster voice now, your officemates are getting annoyed.)

The similarities in the coloring are perhaps more interesting than the differences, though the general Americana obsession of the constant blue/red coloring is unsurprising, especially when you recall that some of the biggest perennial ad buyers (Coke, Pepsi, Budweiser) also share red, white, and blue labels. I’m guessing that the heavy use of yellow in the earlier logos had more to do with yellow looking good against a background when used for broadcast.

Or maybe not — like any good collection, there’s plenty to speculate about and many hypotheses to be drawn — and the investigation is more interesting for the exercise.

Monday, February 2, 2009 | collections, football, sports, time, typography  

Songs off the Charts

I’m often asked about sonification—instead of visualization, turning data into audio—but I’ve never pursued it because there are other things that I’m more curious about. The bigger issue is that I was concerned audio would require even more of a trained ear than visualization (according to some) requires a trained eye.

But now, Johannes Kreidler, with the help of Microsoft Songsmith, has proven me wrong:

Johannes, time to book your ticket to IEEE InfoVis.

My opinion of Songsmith is shifting — while it’s generally presented as a laughingstock, catastrophic failure, or if nothing else, a complete embarrassment (especially for its developers slash infomercial actors), it’s really caught the imagination of a lot of people who are creating new things, even if all of them subvert the original intent of the project. (Where the original intent was to… create a tool that would help write a jingle for glow in the dark towels?)

At any rate, I think it’s achieved another kind of success, and web memes aside, I’m curious to see what actual utility comes from derivatives of the project, now that the music idea is firmly planted in peoples’ heads.

And if you stopped the video halfway through because it got a little tedious, you missed some of the good bits toward the end.

(Thanks to Moiz Syed for the link.)

Sunday, February 1, 2009 | finance, music, sonification  

Pirates of Statistics

Pirates of Rrrrr!

Article from the New York Times a bit ago, covering R, everyone’s favorite stats package:

R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.

R is also open source, another focus of the article, which includes quoted gems such as this one from commercial competitor SAS:

Closed source: it’s got what airplanes crave!

“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”

Pure gold: free software is scary software! And freeware? Is she trying to conflate R with free software downloads from CNET?

Truth be told, I don’t think I’d want to be on a plane that used a jet engine designed or built with SAS (or even R, for that matter). Does she know what her product does? (A hint: It’s a statistics package. You might analyze the engine with it, but you don’t use it for design or construction.)

For those less familiar with the project, some examples:

…companies like Google and Pfizer say they use the software for just about anything they can. Google, for example, taps R for help understanding trends in ad pricing and for illuminating patterns in the search data it collects. Pfizer has created customized packages for R to let its scientists manipulate their own data during nonclinical drug studies rather than send the information off to a statistician.

At any rate, many congratulations to Robert Gentleman and Ross Ihaka, the original creators, for their success. It’s a wonderful thing that they’re making enough of a rumpus that a stats package is being covered in a mainstream newspaper.


Tuesday, January 27, 2009 | languages, mine, software  

Just when you thought the world revolved around you

Eugene Kuo sends a link to the Wikipedia article on center of population, an awkward term for the middlin’ place of all the people in a region. Calculation can be tricky because the Earth is round (what!?) and because of the statistical hooey that goes into determining a proper distance metric. The article includes a heat map of world population:

center of population

From a cited article, Wikipedia notes that:

…the world’s center of population is found to lie “at the crossroads between China, India, Pakistan and Tajikistan”, with an average distance of 5,200 kilometers (3,200 mi) to all humans…

Though sadly, the map also uses a strange color scale for the heat map, with blue marking the area of greatest density and red (traditionally the “important” end of the scale) marking the least populated areas. Even just shifting the colors helps a bit, at least in terms of highlighting the correct area:


Though the shift is of questionable accuracy, and the bright green still draws too much attention, as does the banding in the middle of the Atlantic.

Outside of musing for your own edification, practical applications of calculating a population’s center include:

…locating possible sites for forward capitals, such as Brasilia, Astana or Austin. Practical selection of a new site for a capital is a complex problem that depends also on population density patterns and transportation networks.

Check the article for more about centers of various countries, including the United States:

The mean center of United States population has been calculated for each U.S. Census since 1790. If the United States map were perfectly balanced on a point, this point would be its physical centroid. Currently this point is located in Phelps County, Missouri, in the east-central part of the state. However, when Washington, D.C. was chosen as the federal capital of the United States in 1790, the center of the U.S. population was in Kent County, Maryland, a mere 47 miles (76 km) east-northeast of the new capital. Over the last two centuries, the mean center of United States population has progressed westward and, since 1930, southwesterly, reflecting population drift.

For added fun, I’ve created an interactive version of the map, based on a Processing example. (Though it took me longer to write the credits for the adaptation than to actually assemble it — thanks to all those who contributed little bits to it.)
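For the curious, the basic calculation isn’t too bad once you stop working on a flat map: weight each location’s position on the globe by its population, average the resulting 3D vectors, and project the average back onto the sphere. A rough sketch, with a handful of made-up city figures standing in for real gridded population data (and nothing to do with the interactive piece above):

```java
// Population-weighted center on a sphere: convert lat/lon to 3D unit vectors,
// take the population-weighted average, then convert back to lat/lon.
public class PopulationCenter {
    public static void main(String[] args) {
        double[][] places = {
            // latitude, longitude (degrees), rough population in millions (illustrative only)
            {  28.6,   77.2, 30 },   // Delhi
            {  31.2,  121.5, 27 },   // Shanghai
            {  35.7,  139.7, 37 },   // Tokyo
            {  40.7,  -74.0, 19 },   // New York
            { -23.5,  -46.6, 21 },   // São Paulo
        };

        double sx = 0, sy = 0, sz = 0, total = 0;
        for (double[] p : places) {
            double lat = Math.toRadians(p[0]);
            double lon = Math.toRadians(p[1]);
            double w = p[2];
            sx += w * Math.cos(lat) * Math.cos(lon);
            sy += w * Math.cos(lat) * Math.sin(lon);
            sz += w * Math.sin(lat);
            total += w;
        }
        sx /= total; sy /= total; sz /= total;

        // project the averaged vector back out to the surface
        double lat = Math.toDegrees(Math.atan2(sz, Math.sqrt(sx * sx + sy * sy)));
        double lon = Math.toDegrees(Math.atan2(sy, sx));
        System.out.printf("weighted center: lat %.1f, lon %.1f%n", lat, lon);
    }
}
```

The real exercise also has to decide what “distance” means (great-circle? straight through the Earth?), which is where the statistical hooey comes in.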


Monday, January 26, 2009 | mapping, population  

Renting Big Data

Back in December (or maybe even November… sorry, digging out my inbox this morning) Amazon announced the availability of public data sets for their Elastic Compute Cloud platform:

Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, users can produce or use prebuilt server images with tools and applications to analyze the data sets. By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.

The current list includes ENSEMBL (550 GB), GenBank (250 GB), various collections from the US Census (about 500 GB), and a handful of others (with more promised). I’m excited about the items under the “Economy” heading, since lots of that information has to date been difficult to track down in one place and in a single format.

While it may be possible to download these as raw files from FTP servers at their original sources, here everything is already set up for you: no running rsync or ncftp for twenty-four hours, no spending an afternoon setting up a Linux server with MySQL and lots of big disk space, and no dealing with various issues regarding versions of Apache, MySQL, and PHP, different Perl modules to be installed, permissions to be fixed, etc. etc. (Can you tell the pain is real?)

As I understand it, you start with a frozen version of the database, then import that into your own workspace on AWS, and pay only for the CPU time, storage, and bandwidth that you actually use. Pricing details are here, but wear boots — there’s a lotta cloud marketingspeak to wade through.

(Thanks to Andrew Otwell for the nudge.)

Sunday, January 25, 2009 | acquire, data, goinuptotheserverinthesky  

GMing the Inbox

An email regarding the last post, answering some of the questions about success and popularity of management games. Andrew Walkingshaw writes:

The short answer to this is “yes” – football (soccer) management games are a very big deal here in Europe. One of the major developers is Sports Interactive, (or at Wikipedia) with their Championship Manager/Football Manager series: they’ve been going over fifteen years now.

And apparently the games have even been popular since the early 80s. I found this bit especially interesting:

Fantasy soccer doesn’t really work – the game can’t really be quantified in the way NFL football or baseball can – so it could be that these games’ popularity comes from filling the same niche as rotisserie baseball does on your side of the Atlantic.

Which suggests a more universal draw to the numbers game or statistics competition that gives rise to fantasy/rotisserie leagues. The association with sports teams gives it broader appeal, but at its most basic, it’s just sports as a random number generator.


Some further digging yesterday also turned up Baseball Mogul 2008 (and the 2009 Edition). The interface seems closer to a bad financial services app (bad in this case just means poorly designed, click the image above for a screenshot), which is the opposite direction from what I’m interested in, but it at least gives us another example. Although this one also seems to have reviewed better than the game from the previous post.

Saturday, January 24, 2009 | baseball, feedbag, games, simulation, sports  

Gaming the GM

Via News.com, the peculiar story of MLB Front Office Manager, a sports simulation game in which you play the general manager of a major league baseball team. Daniel Terdiman writes:

The new game — which is unlike any baseball video game I’ve ever seen — has perhaps the perfect pitchman, Oakland A’s General Manager Billy Beane. For those not familiar with him, the game probably won’t mean much, since as the main subject of Michael Lewis’ hit book, Moneyball, Beane has long been considered the most cerebral and efficient guy putting contending baseball teams on the field.

This caught my eye because of its focus on the numbers, and how you’d pull that off in the context of a console game.


A “first look” review from GameSpot notes:

As you may imagine, FOM’s interface is menu heavy, providing access to the various statistical metrics and trends to keep you apprised as general manager. What is surprising is that FOM manages to bring this depth to the console as well as the PC. While other console-based franchise management titles have struggled to create effective navigation tools, FOB’s vertical menu interface is both clean and intuitive without compromising the depth one would expect from a game in this genre. Top-level categories include submenus (many of which include further submenus) similar to navigating a sports Web site.

Other reviews seem to be less charitable, but I’m less interested in the game itself than the curiosity that it exists in the first place. GameSpot describes the audience:

By 2K’s own admission, the game targets a specific niche: the roughly 3.5 million participants of Fantasy Baseball leagues. It is 2K’s hope that this hardcore baseball audience, many of whom spend two to three hours every day managing their fantasy rosters, will see FOM as a convenient alternative (or even a complement, assuming those individuals forgo sleep).

So it’s a niche, as would be expected. But I’m curious about a handful of issues, a combination of not knowing much about gaming and a fascination with what gaming means for interfaces:

  • Could this be done properly, to a point where a game like this is a wider success? The niche audience is interesting at first, but is it possible to take a numbers game to a broader audience than that?
  • Has anyone already had success doing that?
  • Are there methods for showing complex numbers, data, and stats that have been used in (particularly console) games that are more effective than typical information dashboards used by, say, corporations?

Having a motivated user who is willing to put up with the numbers suggests that some really interesting things could be done. And the fact that the interface has to be optimized for the limited interaction afforded by a handheld controller (if played on a console) suggests that the implementation would also need to be clever.

If you have any insight, please drop me a line. Or you can continue to speculate for yourself while enjoying the promotional video below with the most fantastically awful background music I’ve heard since Microsoft Songsmith appeared a little while ago.

Friday, January 23, 2009 | baseball, games, simulation, sports  

Mapping Over Time

A video depicting all the edits for the OpenStreetMap project for 2008.

OpenStreetMap is a wiki-style map of the world and this animation displays a white flash each time a way is entered or updated. Some edits are a result of a physical local survey by a contributor with a GPS unit and taking notes, other edits are done remotely using aerial photography or out-of-copyright maps, and some are bulk imports of official data.

Simple idea but really elegant execution. Created by ITO.

Thursday, January 22, 2009 | mapping, motion, time  

Statistics, Science, and Speeches


Our forty-fourth president:

That we are in the midst of crisis is now well understood. Our nation is at war, against a far-reaching network of violence and hatred. Our economy is badly weakened, a consequence of greed and irresponsibility on the part of some, but also our collective failure to make hard choices and prepare the nation for a new age. Homes have been lost; jobs shed; businesses shuttered. Our health care is too costly; our schools fail too many; and each day brings further evidence that the ways we use energy strengthen our adversaries and threaten our planet.

These are the indicators of crisis, subject to data and statistics. Less measurable but no less profound is a sapping of confidence across our land – a nagging fear that America’s decline is inevitable, and that the next generation must lower its sights.

For the politically-oriented math geek in me, his mention of statistics stood out: we now have a president who can actually bring himself to reference numbers and facts. I searched for other mentions of “statistics” in previous inaugural speeches and found just a single, though oddly relevant, quote from William Howard Taft in 1909:

The progress which the negro has made in the last fifty years, from slavery, when its statistics are reviewed, is marvelous, and it furnishes every reason to hope that in the next twenty-five years a still greater improvement in his condition as a productive member of society, on the farm, and in the shop, and in other occupations may come.

Progress indeed. (And what’s the term for that? A surprising coincidence? Irony? Is there a proper term for such a connection? Perhaps a thirteen letter German word along the lines of schadenfreude?)

And it’s such a relief to see the return of science:

For everywhere we look, there is work to be done. The state of the economy calls for action, bold and swift, and we will act – not only to create new jobs, but to lay a new foundation for growth. We will build the roads and bridges, the electric grids and digital lines that feed our commerce and bind us together. We will restore science to its rightful place, and wield technology’s wonders to raise health care’s quality and lower its cost. We will harness the sun and the winds and the soil to fuel our cars and run our factories. And we will transform our schools and colleges and universities to meet the demands of a new age. All this we can do. And all this we will do.

Tuesday, January 20, 2009 | data, government, science  

Can a bunch of mathematicians make government more representative?

An interesting article from Slate about a session at the Joint Mathematics Meeting that discussed mathematical solutions and proposals to undo the problem of gerrymandered congressional districts. That is, politicians in Congress having the ability to draw an outline around the group of people they want to represent (based on how likely those people are to vote for said politician’s re-election). The resulting shapes are often comical, insofar as you’re willing to be cheerful in a “politics is perpetually broken and corrupt” kind of way. Chris Wilson writes:

It’s tough to find many defenders of the status quo, in which a supermajority of House seats are noncompetitive. (Congressional Quarterly ranked 324 of the 435 seats as “safe” for one party or the other in 2008.) The mathematicians—and social scientists and lawyers—who gathered to discuss the subject Thursday are certain there’s a better way to do it. They just haven’t quite figured out what it is.

The meeting also seemed to include a contest (knock down, drag out, winner take pocket protector) among the presenters, each trying to one-up the others for worst district. For instance, Florida’s 23rd, provided by govtrack.us:


Which doesn’t seem awful at first, until you see the squiggle up the coast. Or Pennsylvania’s 12th, which Wilson describes as “an anchor glued to a sea anemone.”


Fixing the problem is difficult, but sometimes there are elegant and straightforward metrics that get you closer to a solution:

The most interesting proposal of the afternoon came from a Caltech grad student named Alan Miller, who proposed a simple test: If you take two random people in a district, what are the odds that one can walk in a straight line to the other without ever leaving the district? (Actually, it’s without leaving the district while remaining in the state, so as not to penalize districts like Maryland’s 6th, which has to account for Virginia’s hump.) This rewards neat, simple shapes. But it penalizes districts like Maryland’s 3rd, which looks like something out of Kandinsky’s Improvisation 31.

This turns the issue into something directly testable (two residents and their path) for which we can calculate a probability — the sort of thing statisticians love (because it can be measured). Given this criterion (and others like it) for congressional district godliness, another proposal was a kind of Netflix Prize for redistricting, where groups could compete to develop the best redistricting algorithm. Such an algorithm would seek to remove the (bipartisan) mischief by limiting human intervention.
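To make the “directly testable” part concrete, here is a toy Monte Carlo version of that straight-line test (my own rough interpretation, not Miller’s actual method; it ignores the stay-within-the-state caveat and samples points uniformly over area rather than over actual residents): pick two random points inside the district, check whether the segment between them stays inside, and repeat a few thousand times.

  // Toy Monte Carlo estimate of the "two random residents, one straight
  // line" compactness score, using a made-up L-shaped polygon in place
  // of a real district boundary.

  float[] px = { 0, 10, 10, 6, 6, 0 };
  float[] py = { 0,  0,  4, 4, 10, 10 };

  void setup() {
    int trials = 10000;
    int ok = 0;
    for (int i = 0; i < trials; i++) {
      float[] a = randomInside();
      float[] b = randomInside();
      if (segmentInside(a, b)) ok++;
    }
    println("compactness score: " + (float) ok / trials);
  }

  // rejection sampling: pick points in the bounding box until one lands inside
  float[] randomInside() {
    while (true) {
      float x = random(0, 10);
      float y = random(0, 10);
      if (inside(x, y)) return new float[] { x, y };
    }
  }

  // walk along the segment; every sample must stay inside the polygon
  boolean segmentInside(float[] a, float[] b) {
    int steps = 200;
    for (int i = 0; i <= steps; i++) {
      float t = (float) i / steps;
      if (!inside(lerp(a[0], b[0], t), lerp(a[1], b[1], t))) return false;
    }
    return true;
  }

  // standard ray-casting point-in-polygon test
  boolean inside(float x, float y) {
    boolean in = false;
    for (int i = 0, j = px.length - 1; i < px.length; j = i++) {
      if ((py[i] > y) != (py[j] > y) &&
          x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]) {
        in = !in;
      }
    }
    return in;
  }

A perfectly convex district scores 1.0, and the more contorted the boundary gets, the further the score falls.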

The original article also includes a slide show of particularly heinous district shapes. And as an aside, the images above, while enormously useful, illustrate part of my beef with mash-ups: Google Maps was designed as a mapping application, not a mapping-with-stuff-on-it application. So when you add data to the map image — itself a completed design — you throw off that balance. It’s difficult to read the additional information (the district area), and the information that’s there (the map coloring, specific details of the roads) is more than is necessary for this purpose.

Wednesday, January 14, 2009 | mapping, politics, probability  

Bird Tracks in the Snow

The field in snowy Foxborough, Massachusetts after a running play in Sunday’s football game:


(Click the image for the original version, taken from the broadcast.)

Look at all the footprints in the snow: The previous play began to the right of the white line, where you can see most of the snow was cleared by the players lining up. Just to the left of that is another cleared area, where a group of players began to tackle Sammy Morris. But it’s not until almost ten yards later — two more white lines, and the area below where the players are standing in that picture — that he’s finally taken to the ground. For a visual explanation, watch the play:

(Mute the audio and spare yourself the insipid commentary from the FOX booth. And then be thankful that at least it’s not Joe Buck and Tim McCarver.)

The path left behind in the snow explains exactly how the play developed, according to the players’ feet. (And as a running play, feet are important.) Absolutely beautiful.

One of the best things about December is watching football games played in the snow. For instance, last year there was a game between Cleveland and Buffalo that looked like it was being played inside a snow globe, with the globe being picked up and shaken during each commercial break.

Boston was a complete mess yesterday, with a few inches of snow, sleet, and muck falling from the sky and covering the field where the New England Patriots were happily hosting the Arizona Cardinals, who are less accustomed to digging out their cars and leaving behind patio furniture.

Another image from later in the game; this one depicts the substitutions of players as they near the goal line. Note the lines in the snow that begin at the left, and lead to where the players are lined up:


Monday, December 22, 2008 | football, physical, sports  

Numbers Hurt

Oww, my data.


(Originally found on Boston.com, credited only to Reuters… If anyone knows where to find a larger version or the original, please drop me a line. Update – Paul St. Amant and Martin Wattenberg have also pointed out The Brokers With Hands On Their Faces Blog, which is also evocative, yet wildly entertaining, but not as data-centric as The Brokers With Tales Of Sadness Depicted On Multiple Brightly Colored Yet Highly Detailed Computer Displays in the Background Behind Them Blog that I’ve just started.)

Monday, December 15, 2008 | displays, news  


Further down in the reading pile is an article from Slate titled Does Advertising Really Work?

Every book ever written about marketing will at some point dig up that old, familiar line: “I know half my advertising is wasted—I just don’t know which half.”

The article by Seth Stevenson goes on to discuss What Sticks, by Rex Briggs and Greg Stuart, a pair of marketing researchers who study the advertising industry. Mad Men notwithstanding, I find the topic fascinating as a trained designer (trained meaning someone who learned to make such things) who happily pays Comcast $12.95 a month for the privilege of never hearing or seeing Levitra, Viagra, or Cialis advertisements.

But separately, and as someone who gave a lecture last night, I really enjoyed this point about anecdotes:

Why is this anecdote-laden style so popular with business authors, and so successful (to the tune of best-selling books and huge speaking fees)? I think it comes down to two things: 1) Fascinating anecdotes can, just by themselves, make you feel like you’ve really learned something… 2) A skillful anecdote-wielder can trick us into thinking the anecdote is prescriptive. In fact, what’s being sold is success by association. It’s no coincidence that [one such book talks] about the iPod—a recent mega-hit we’re all familiar with—in at least three chapters. It’s tempting to believe that bite-sized anecdotes about how the iPod was conceived, or designed, or marketed will reveal the secret formula for kicking butt with our own projects. Of course, it’s never that simple. An anecdote is a single data point, …

I find the first point interesting in light of the way in which we digest information from the world around us. We’re continually consuming data and then trying to synthesize it into larger meanings. Perhaps anecdotes are a kind of shortcut for this process: they provide something that’s already been digested but still feels substantial, since it affords a brief leap in our thinking (and one that seems significant at the time).

Of course, unless you’re a baby bird, you’re better off digesting on your own.

As a side note, I went looking for an image to illustrate this blob of text, and was amused to find that the results from a Google image search for “anecdote” consisted almost entirely of cartoons. Which reminds me of a story…

Saturday, December 13, 2008 | speaky  

Wet and Dry Ingredients; Mixing Bowls and Baking Dishes

Digging through my reading list pile, I began skimming A Box, Darkly: Obfuscation, Weird Languages, and Code Aesthetics by Michael Mateas and Nick Montfort. I was moving along pretty well until I reached the description of the Chef programming language:

Another language, Chef, illustrates different design decisions for structuring play. Chef facilitates double-coding programs as recipes. Variables are declared in an ingredients list, with amounts indicating the initial value (e.g., 114 g of red salmon). The type of measurement determines whether an ingredient is wet or dry; wet ingredients are output as characters, dry ingredients are output as numbers. Two types of memory are provided, mixing bowls and baking dishes. Mixing bowls hold ingredients which are still being manipulated, while baking dishes hold collections of ingredients to output. What makes Chef particularly interesting is that all operations have a sensible interpretation as a step in a food recipe. Where Shakespeare programs parody Shakespearean plays, and often contain dialog that doesn’t work as dialog in a play (“you are as hard as the sum of yourself and a stone wall”), it is possible to write programs in Chef that might reasonably be carried out as a recipe. Chef recipes do have the unfortunate tendency to produce huge quantities of food, however, particularly because the sous-chef may be asked to produce sub-recipes, such as sauces, in a loop.

Wonderful. (And a nice break for someone who has been fretting about languages and syntax over the last couple weeks.)

Friday, December 12, 2008 | languages  

Lecture in Cambridge, MA this Thursday

The folks at the Boston Chapter of the IEEE Computer Society / Greater Boston Chapter of the ACM have kindly invited me to give a talk this Thursday, December 11.

The details can be found here, here, here, and here. They all contain identical information, but have different text layouts and varied sizes of my grinning mug. You can choose which one you like best (and sorry, none are available without my picture).

Tuesday, December 9, 2008 | talk  

Subjectively Attractive Client-Side Scripted Browser-Delivered Charts and Plots

…also known as Bluff, though they call it “Beautiful Graphs in JavaScript.” And who can argue with pink?

Bluff is a JavaScript port of the Gruff graphing library for Ruby. It is designed to support all the features of Gruff with minimal dependencies; the only third-party scripts you need to run it are a copy of JS.Class (about 2kb gzipped) and a copy of Google’s ExCanvas to support canvas in Internet Explorer. Both these scripts are supplied with the Bluff download. Bluff itself is around 8kb gzipped.

There’s something cool (and hilarious) about the fact that even though we’re talking about bleeding edge features (decent JavaScript and Canvas support) only available in the most recent of modern browser releases, the criterion for awesomeness and usefulness is still the same as in 1997 — that it’s only 8 KB.

(The only thing that strikes me as odd, strictly from an interface perspective, is the fact that I can’t drag the “image” to the Desktop, the way that I would a JPEG or GIF image. Certainly that’s also the case for Flash and Java, but there’s something strange about the way that JavaScript is so lightweight — part of the browser — yet the thing isn’t really “there”.)

At any rate, I’m fairly fascinated by this idea of JavaScript being a useful client-side means of generating images. Something very exciting is bound to happen.

Tuesday, December 9, 2008 | api, represent  

Visualization + Processing in Today’s IHT

Alice Rawsthorn writes about visualization in today’s International Herald Tribune, which also includes a mention of Processing:

Producing visualization required the development of new tools capable of analyzing huge quantities of complex data, and interpreting it visually. In the forefront is Processing, a software system devised by the American designers, Ben Fry and Casey Reas, to enable computer programmers to create visual images, and designers to get to grips with programming. “Processing is a bridge between those fields,” said Reas. “Designers feel comfortable with it because it enables them to work visually, yet it also feels familiar to programmers.”

Paola Antonelli on visualization:

“Visualization is not simply an evolution of graphic design, but a complete and complex design form that requires spatial, narrative, synthetic and graphic sensitivity and expertise,” explained Antonelli. “That’s why we see so many practitioners – architects, product designers, filmmakers, statisticians and graphic designers – flocking to it.”

The Humans vs. Chimps illustration even gets a mention:

Take a scientific question like the genetic difference between humans and chimpanzees. Would you prefer to plough through an essay on the subject, or to glance at the visualization created by Fry in which the 75,000 letters of coding in the human genome form a photographic image of a chimp’s head? Virtually all of our genetic information is identical, and Fry highlights the discrepancies by depicting nine of the letters as red dots. No contest again.

The full article is here, and also includes a slide show of other works.

Monday, December 8, 2008 | iloveme, processing, reviews  

220 Feet on 60 Minutes

From a segment on last night’s 60 Minutes:

Saudi Aramco was originally an American company. It goes way back to the 1930s when two American geologists from Standard Oil of California discovered oil in the Saudi desert.

Standard Oil formed a consortium with Texaco, Exxon and Mobil, which became Aramco. It wasn’t until the 1980s that Saudi Arabia bought them out and nationalized the company. Today, Saudi Aramco is the custodian of the country’s sole source of wealth and power.

Over 16,000 people work at the company’s massive compound, which is like a little country with its own security force, schools, hospitals, and even its own airline.

According to Abdallah Jum’ah, Saudi Aramco’s president and CEO, Aramco is the world’s largest oil producing company.

And it’s the richest company in the world, worth, according to the latest estimate, $781 billion.

I was about to change the channel (perhaps as you were just about to stop reading this post), when they showed the big board:

Jum’ah gave 60 Minutes a tour of the company’s command center, where engineers scrutinize and analyze every aspect of the company’s operations on a 220-foot digital screen.

“Every facility in the kingdom, every drop of oil that comes from the ground is monitored in real time in this room,” Jum’ah explained. “And we have control of each and every facility, each and every pipeline, each and every valve on the pipeline. And therefore, we know exactly what is happening in the system from A to Z.”

A large map shows all the oil fields in Saudi Arabia, including Ghawar, the largest on-shore oil field in the world, and Safaniya, the largest off-shore oil field in the world; green squares on the map monitor supertankers on the high seas in real time.

Here’s a short part of the segment that shows the display:

Since the smaller video doesn’t do it justice, several still images follow, each linked to their Comcastic, artifact-ridden HD versions:


Did rooms like this first exist in the movies, compelling everyone to imitate them?


New guys and interns have to sit in front of the wall of vibrating bright blues:


The display is ambient in the sense that nobody’s actually using the larger version to do real work (you can see relevant portions replicated on individuals’ monitors). It seems to serve as a means of knowing what everyone in the room is up to (or as a deterrent against firing up Solitaire — I’m looking at you, Ahmad). But more importantly, it’s there for visitors, especially visitors with video cameras, and people who write about visualization and happened to catch a segment about their info palace since it immediately followed the Patriots-Seahawks game.

A detail of one of the displays bears this out — an overload of ALL CAPS SANS SERIF TYPE with the appropriately unattractive array of reds and greens. This sort of thing always makes me curious about what such displays would look like if they were designed properly. Rather than blowing up low resolution monitors, what would it look like if it were designed for the actual space and viewing distance in which it’s used?


Sexy numbers on curvaceous walls:


View the entire segment from 60 Minutes here.

Monday, December 8, 2008 | bigboard, energy, infographics, movies  

The Owl Learns Japanese

I’m incredibly pleased to write that O’Reilly Japan has just completed a Japanese translation of Visualizing Data. The book is available for pre-order on Amazon, and has also been announced on O’Reilly’s Japanese site.

Having the book published in Japanese is incredibly gratifying. Two of my greatest mentors (Suguru Ishizaki at CMU, and later John Maeda at MIT) were Japanese Americans who trained at Tsukuba University, training that informed both their own work and their teaching style.

I first unveiled Processing during a two week workshop course at Musashino Art University in Japan in August 2001, working with a group of about 40 students. And in 2005, we won the Interactive Design Prize from the Tokyo Type Director’s Club.

At any rate, I can’t wait to see the book in person, this is just too cool.

Monday, December 1, 2008 | processing, translation  

LA’s Dirtiest Pools & More

Featuring “38 projects and more than 730,000 records,” the Los Angeles Times now has a Data Desk feature, a collection of searchable data sets and information graphics from recent publications. It’s like reading the LA Times online but only paying attention to the data-oriented features. (Boring? Appealing? Your ideal newspaper? We database, you decide. Eww, don’t repeat that.) At first glance I thought (hoped) it would be more raw data, but even having all the items collected in one location suggests something interesting about how newspapers share (and perceive, internally) the carefully researched (and massaged) data that they collect on a regular basis.

Thanks to Casey for the pointer.

Thursday, November 27, 2008 | data, infographics  

Call for Papers: Visualizing the Past

James Torget, by way of my inbox:

I wanted to touch base to let you know about a workshop that we’re putting together out here at the University of Richmond.  Basically, UR (with James Madison University) will be hosting a workshop this spring focused on how scholars can create visualizations of historical data and how we can better share our data across the Internet.  To that end, we are looking for people working on these questions who would be interested in participating in an NEH-sponsored workshop.

We are seeking proposals for presentations at the workshop, and participants for our in-depth discussions.  The workshop is scheduled for February 20-21, 2009 at the University of Richmond.  We are asking that people submit their proposals by December 15, and we will extend invitations for participation by December 31, 2008. Detailed information can be found at: http://dsl.richmond.edu/workshop/

Thursday, November 27, 2008 | inbox, opportunities  

It only took 162 attempts, but Processing 1.0 is here!

We’ve just posted Processing 1.0 at http://processing.org/download. We’re so excited about it, we even took time to write a press release:

CAMBRIDGE, Mass. and LOS ANGELES, Calif. – November 24, 2008 – The Processing project today announced the immediate availability of the Processing 1.0 product family, the highly anticipated release of industry-leading design and development software for virtually every creative workflow. Delivering radical breakthroughs in workflow efficiency – and packed with hundreds of innovative, time-saving features – the new Processing 1.0 product line advances the creative process across print, Web, interactive, film, video and mobile.

Whups! That’s not the right one. Here we go:

Today, on November 24, 2008, we launch the 1.0 version of the Processing software. Processing is a programming language, development environment, and online community that since 2001 has promoted software literacy within the visual arts. Initially created to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing quickly developed into a tool for creating finished professional work as well.

Processing is a free, open source alternative to proprietary software tools with expensive licenses, making it accessible to schools and individual students. Its open source status encourages the community participation and collaboration that is vital to Processing’s growth. Contributors share programs, contribute code, answer questions in the discussion forum, and build libraries to extend the possibilities of the software. The Processing community has written over seventy libraries to facilitate computer vision, data visualization, music, networking, and electronics.

Students at hundreds of schools around the world use Processing for classes ranging from middle school math education to undergraduate programming courses to graduate fine arts studios.

  • At New York University’s graduate ITP program, Processing is taught alongside its sister project Arduino and PHP as part of the foundation course for 100 incoming students each year.
  • At UCLA, undergraduates in the Design | Media Arts program use Processing to learn the concepts and skills needed to imagine the next generation of web sites and video games.
  • At Lincoln Public Schools in Nebraska and the Phoenix Country Day School in Arizona, middle school teachers are experimenting with Processing to supplement traditional algebra and geometry classes.

Tens of thousands of companies, artists, designers, architects, and researchers use Processing to create an incredibly diverse range of projects.

  • Design firms such as Motion Theory provide motion graphics created with Processing for the TV commercials of companies like Nike, Budweiser, and Hewlett-Packard.
  • Bands such as R.E.M., Radiohead, and Modest Mouse have featured animation created with Processing in their music videos.
  • Publications such as the journal Nature, the New York Times, Seed, and Communications of the ACM have commissioned information graphics created with Processing.
  • The artist group HeHe used Processing to produce their award-winning Nuage Vert installation, a large-scale public visualization of pollution levels in Helsinki.
  • The University of Washington’s Applied Physics Lab used Processing to create a visualization of a coastal marine ecosystem as a part of the NSF RISE project.
  • The Armstrong Institute for Interactive Media Studies at Miami University uses Processing to build visualization tools and analyze text for digital humanities research.

The Processing software runs on the Mac, Windows, and GNU/Linux platforms. With the click of a button, it exports applets for the Web or standalone applications for Mac, Windows, and GNU/Linux. Graphics from Processing programs may also be exported as PDF, DXF, or TIFF files and many other file formats. Future Processing releases will focus on faster 3D graphics, better video playback and capture, and enhancing the development environment. Some experimental versions of Processing have been adapted to other languages such as JavaScript, ActionScript, Ruby, Python, and Scala; other adaptations bring Processing to platforms like the OpenMoko, iPhone, and OLPC XO-1.

Processing was founded by Ben Fry and Casey Reas in 2001 while both were John Maeda’s students at the MIT Media Lab. Further development has taken place at the Interaction Design Institute Ivrea, Carnegie Mellon University, and UCLA, where Reas is chair of the Department of Design | Media Arts. Miami University, Oblong Industries, and the Rockefeller Foundation have generously contributed funding to the project.

The Cooper-Hewitt National Design Museum (a Smithsonian Institution) included Processing in its National Design Triennial. Works created with Processing were featured prominently in the Design and the Elastic Mind show at the Museum of Modern Art. Numerous design magazines, including Print, Eye, and Creativity, have highlighted the software.

For their work on Processing, Fry and Reas received the 2008 Muriel Cooper Prize from the Design Management Institute. The Processing community was awarded the 2005 Prix Ars Electronica Golden Nica award and the 2005 Interactive Design Prize from the Tokyo Type Director’s Club.

The Processing website includes tutorials, exhibitions, interviews, a complete reference, and hundreds of software examples. The Discourse forum hosts continuous community discussions and dialog with the developers.

Tuesday, November 25, 2008 | processing  

Visualizing Data with an English translation and Processing.js

Received a note from Vitor Silva, who created the Portuguese-language examples from Visualizing Data using Processing.js:

i created a more “world friendly” version of the initial post. it’s now in english (hopefully in a better translation than babelfish) and it includes a variation on your examples of chapter 3.

The new page can be found here. And will you be shocked to hear that indeed it is far better than Babelfish?

Many thanks to Vitor for the examples and the update.

Wednesday, November 19, 2008 | examples, feedbag, translation, vida  

John Oliver and John King’s Magic Wall

Hilarious, bizarre, and rambling segment from last night’s Daily Show featuring John Oliver’s take on CNN’s favorite toy from this year’s election.

I’m continually amazed by the amount of interest this technology generates (yeah, I posted about it too), so perspective from the Daily Show is always helpful and welcome.

Wednesday, November 19, 2008 | election, interact  

What has driven women out of Computer Science?

Casey yesterday noted this article from the New York Times on the declining number of women who are pursuing computer science degrees. Declining as in “wow, weren’t the numbers too low already?” From the article’s introduction:

ELLEN SPERTUS, a graduate student at M.I.T., wondered why the computer camp she had attended as a girl had a boy-girl ratio of six to one. And why were only 20 percent of computer science undergraduates at M.I.T. female? She published a 124-page paper, “Why Are There So Few Female Computer Scientists?”, that catalogued different cultural biases that discouraged girls and women from pursuing a career in the field. The year was 1991.

Computer science has changed considerably since then. Now, there are even fewer women entering the field. Why this is so remains a matter of dispute.

The article goes on to explain that even though there is far better gender parity (since 1991) when looking at roles in technical fields, computer science still stands alone in moving backwards.

The text also covers some of the “do it with gaming!” nonsense. As someone who became interested in programming because I didn’t like games, I’ve never understood why gaming was pushed as a cure-all for disinterest in programming:

Such students who choose not to pursue their interest may have been introduced to computer science too late. The younger, the better, Ms. Margolis says. Games would offer considerable promise, except that they have been tried and have failed to have an effect on steeply declining female enrollment.

But I couldn’t agree more with the sentiment with regard to age. I know of two all-girls schools (Miss Porter’s in Connecticut and Nightingale-Bamford in New York) that have used Processing in courses with high school and middle school students, and I couldn’t be more excited about it. Let’s hope there are more.

Tuesday, November 18, 2008 | cs, gender, reading  

Visualizing Data with Portuguese and Processing.js

Very cool! Check out these implementations of several Visualizing Data examples that make use of John Resig’s Processing.js, an adaptation of the Processing API in pure JavaScript. This means running in a web browser with no additional plug-ins (no Java Virtual Machine kicking in while you take a sip of coffee—much less drain the whole cup, depending on the speed of your computer). Since the first couple chapters cover straightforward, static exercises, I’d been wanting to try this, but it’s more fun when someone beats you to it. (Nothing is better than feeling like a slacker, after all.)
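For anyone who hasn’t seen it, the sketches involved are just ordinary Processing code; something as small as the following (a generic static example of my own, not one of the examples from the book) is the kind of thing Processing.js can run directly in the browser, translating the sketch to JavaScript and drawing it onto a canvas element:

  // A tiny static sketch: a few hundred random points on a gray field.
  // Nothing here is specific to Processing.js; it's plain Processing code.
  size(400, 200);
  background(230);
  stroke(40);
  for (int i = 0; i < 500; i++) {
    point(random(width), random(height));
  }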

View the introductory Processing sketch from Page 22, or the map of the United States populated with random data points from Page 35.

Babelfish translation of the page here, with choice quotes like “also the shipment of external filing-cabinets had that to be different of what was in the book.”

And the thing is, when I finished the proof of the book for O’Reilly, I had this uneasy feeling that I was shipping the wrong filing-cabinets. Particularly the external ones.

Monday, November 17, 2008 | examples, processing, translation, vida  

Did Forbes just write an article about a font?

Via this Slate article from Farhad Manjoo (writer of tech-hype articles with Salon and now Slate), I just read about Droid, the typeface used in Google’s new Android phones. More specifically, he references this Forbes article, describing the background of the font, and its creator, Steve Matteson of Ascender Corporation in Elk Grove, Illinois.

Some background from the Forbes piece:

In fonts, Google has a predilection for cute letters and bright primary colors, as showcased in the company’s own logo. But for Android Google wanted a font with “common appeal,” Davis says. Ascender’s chief type designer, Steve Matteson, who created the Droid fonts, says Google requested a design that was friendly and approachable. “They wanted to see a range of styles, from the typical, bubbly Google image to something very techno-looking,” Matteson says.


The sweet spot—and the final look for Droid—fell somewhere in the middle. Matteson’s first design was “bouncy”: a look in line with the Google logo’s angled lowercase “e.” Google passed on the design because it was “a little too mannered,” Matteson says. “There was a fine line between wanting the font to have character but not cause too much commotion.”

Another proposal erred on the side of “techno” with squared-off edges reminiscent of early computer typefaces. That too was rejected, along with several others, in favor of a more neutral design that Matteson describes as “upright with open forms, but not so neutral as a design like, say, Helvetica.”

I haven’t had a chance to play with an Android phone (as much as I’ve been happy with T-Mobile, particularly their customer service, do I re-up with them for two years just to throw money at alpha hardware?) so I can’t say much about the face, but I find the font angle fascinating, particularly in light of Apple’s Helvetica-crazy iPhone and iPod Touch. (Nothing says late 1950s Switzerland quite like a touch-screen interface mobile phone, after all.)

Ascender Corporation also seems to be connected to the hideously named C@#$(*$ fonts found in Windows Vista and Office 2007: Calibri, Cambria, Candara, Consolas, Constantia, Corbel, Cariadings. In the past several years, Microsoft has shown a notable and impressive commitment to typography (most notably, hiring Matthew Carter to create Verdana, and other decisions of that era), but the new C* fonts have that same air of creepiness as a family that names all their kids with names starting with the same letter. I mean sure, they’re terrific people, but man, isn’t that just a little…unnecessary?

Monday, November 17, 2008 | mobile, typography  

Change is always most interesting

The New York Times has a very nicely done election map this year. Amongst its four viewing options is a depiction of counties that voted more Democratic (blue) or Republican (red) in comparison to the 2004 presidential election:


The blue is to be expected, given the size of the win for Obama, but the red pattern is quite striking.

Also note the shift for candidate home states, in Arizona with McCain on the ticket, and what appears to be the reverse result in parts of Massachusetts, with Kerry no longer on the ticket. (The shift to the Democrats in Indiana is also amazing: without looking at the map closely enough I had assumed that area to be Obama’s home of Illinois.)

I recommend checking out the actual application on the Times site; the interaction lacks some of the annoying tics that can be found in some of their other work (irritating rollovers that get in the way, worthless zooming, and silly transition animations). It’s useful and succinct, just like an infographic should be. Or just the way Mom used to make. Or whatever.

Thursday, November 6, 2008 | infographics, interact, mapping, politics  

iPolljunkie, iPoliticsobsession, iFix, iLackawittytitle

I apologize that I’ve been too busy and distracted with preparing Processing 1.0 to have any time to post things here, but here’s a quickie so that the page doesn’t just rot into total embarrassment.

Slate this morning announced the availability of a poll tracking application for the iPhone:


I haven’t yet ponied up ninety-nine of my hard-earned cents to buy it but find it oddly fascinating. Is there actually any interest in this? Is this a hack? Is there a market for such things? Is the market simply based on the novelty of it? Is it possible to quantify the size of the poll-obsessed political junkie market? And how is that market composed—what percentage of those people are part of campaigns, versus just people who spend too much time reading political news? (I suspect the former is negligible, but I may be tainted as a card-carrying member of the latter group.)

To answer my own questions, I suspect that it was thrown together by a couple of people from the tech side of the organization (meaning “hack” in the best sense of the word), who then sold management on it, with the rationale that 1) it’ll generate a little press (or hype on, um, blogs), 2) it’ll reinforce Slate readers’ interest in or connection to the site, and 3) it’s a little cool and trendy. I don’t think they’re actually planning to make money on it (or recoup any development costs), but that the price tag has more to do with the fact that 99¢ sounds more valuable and interesting than a free giveaway.

Of course, anyone with more interesting insights (let alone useful facts), please pass them along. I’m hoping it’s an actual Cocoa app, and not just a special link to web pages reformatted for the iPhone, which would largely invalidate this post and extinguish my own curiosity about the beast.

Update: The application is a branded reincarnation of a poll tracker developed by Aaron Brethorst at Chimp Software. Here’s his blog post announcing the change, and even a press release.

Friday, October 3, 2008 | infographics, mobile, politics, software  

Three-dimensional force-directed starling layout

Amazing video of starling flocking behavior, via Dan Paluska:

And how a swarm reacts to a falcon attack, via Burak Arikan:

For myself and all you designers out there just getting your heads around particle simulations, this is just a reminder: nature is better than you.

Wednesday, September 24, 2008 | forcelayout, physical, science  

Small Design Firm is looking for a programmer-designer

My friends down the street at Small Design Firm (started by Media Lab alum and namesake David Small) are looking for a programmer-designer type:

Small Design Firm is an interactive design studio that specializes in museum exhibits, information design, dynamic typography and interactive art. We write custom graphics software and build unique physical installations and media environments. Currently our clients include the Metropolitan Museum of Art, United States Holocaust Memorial Museum and Maya Lin.

We are looking to hire an individual with computer programming and design/art/architecture skills. Applicants should have a broad skill set that definitely includes C++ programming experience and an interest in the arts. This position is open to individuals with a wide variety of experiences and specialities. Our employees have backgrounds in computer graphics, typography, electrical engineering, architecture, music, and physics.

Responsibilities will be equally varied. You will be programming, designing, writing proposals, working directly with clients, managing content and production, and fabricating prototypes and installations.

Small Design Firm is an energetic and exciting place to work. We are a close-knit community, so we are looking for an outgoing team member who is willing to learn new skills and bring new ideas to the group.

Salary is commensurate with experience and skill set. Benefits include health insurance, SIMPLE IRA, and paid vacation.

Contact john (at) smalldesignfirm.com if you’re interested.

Tuesday, September 16, 2008 | opportunities  

Hide the bipolar data, here comes bioinformatics!

I was fascinated a few weeks ago to receive this email from the Genome-announce list at UCSC:

Last week the National Institutes of Health (NIH) modified their policy for posting and accessing genome-wide association studies (GWAS) data contained in NIH databases. They have removed public access to aggregate genotype GWAS data in response to the publication of new statistical techniques for analyzing dense genomic information that make it possible to infer the group assignment (case vs. control) of an individual DNA sample under certain circumstances. The Wellcome Trust Case Control Consortium in the UK and the Broad Institute of MIT and Harvard in Boston have also removed aggregate data from public availability. Consequently, UCSC has removed the “NIMH Bipolar” and “Wellcome Trust Case Control Consortium” data sets from our Genome Browser site.

The ingredients for a genome-wide association study are a few hundred people, and a list of what genetic letter (A, C, G, or T) is found at a few hundred specific locations in the DNA of each of those people. Such data is then correlated to whether individuals have a particular disease, and using the correlation, it’s sometimes possible to localize what part of the genome is responsible for the disease.
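As a cartoon of that correlation step, here is a toy sketch that scores a single location by comparing how often one letter shows up in the disease group versus the control group, using a simple 2×2 chi-square statistic. The letters and group sizes below are invented, and a real study involves far more people, far more locations, and far more statistical care.

  // Toy association test for one genetic location: compare how often
  // the letter 'A' appears in a disease group versus a control group
  // using a 2x2 chi-square statistic. The data is invented.

  char[] cases    = { 'A', 'A', 'G', 'A', 'A', 'A', 'G', 'A', 'A', 'G' };
  char[] controls = { 'G', 'A', 'G', 'G', 'A', 'G', 'G', 'G', 'A', 'G' };

  void setup() {
    float chi = chiSquare(count(cases, 'A'), cases.length,
                          count(controls, 'A'), controls.length);
    println("chi-square = " + chi);
    // values much above ~3.84 suggest an association at this location
    // (one degree of freedom, 0.05 significance level)
  }

  int count(char[] letters, char which) {
    int n = 0;
    for (int i = 0; i < letters.length; i++) {
      if (letters[i] == which) n++;
    }
    return n;
  }

  // 2x2 chi-square over (A vs. not-A) crossed with (case vs. control)
  float chiSquare(int aCases, int nCases, int aControls, int nControls) {
    float[][] obs = {
      { aCases, nCases - aCases },
      { aControls, nControls - aControls }
    };
    float total = nCases + nControls;
    float chi = 0;
    for (int r = 0; r < 2; r++) {
      for (int c = 0; c < 2; c++) {
        float rowSum = obs[r][0] + obs[r][1];
        float colSum = obs[0][c] + obs[1][c];
        float expected = rowSum * colSum / total;
        chi += sq(obs[r][c] - expected) / expected;
      }
    }
    return chi;
  }

The bigger the statistic, the less likely the difference between the two groups is due to chance alone.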

Of course, the diseases might be of a sensitive nature (e.g. bipolar disorder), so when such data is made publicly available, it’s done in a manner that protects the privacy of the individuals in the data set. What this message means is that a bioinformatics method has been developed that undermines those privacy protections. An amazing bit of statistics!

This made me curious about what led to such a result, so with a little digging, I found this press release, which describes the work:

A team of investigators led by scientists at the Translational Genomics Research Institute (TGen) have found a way to identify possible suspects at crime scenes using only a small amount of DNA, even if it is mixed with hundreds of other genetic fingerprints.

Using genotyping microarrays, the scientists were able to identify an individual’s DNA from within a mix of DNA samples, even if that individual represented less than 0.1 percent of the total mix, or less than one part per thousand. They were able to do this even when the mix of DNA included more than 200 individual DNA samples.

The discovery could help police investigators better identify possible suspects, even when dozens of people over time have been at a crime scene. It also could help reassess previous crime scene evidence, and it could have other uses in various genetic studies and in statistical analysis.

So the CSI folks have screwed it up for the bipolar folks. The titillatingly-titled “Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays” can be found at PLoS Genetics, and a PDF describing the policy changes is on the NIH’s site for Genome-Wide Association Studies. The PDF provides a much more thorough explanation of what association studies are, in case you’re looking for something better than my cartoon version described above.

Links to much more coverage can be found here, which includes major journals (Nature) and mainstream media outlets (LA Times, Financial Times) weighing in on the research. (It’s always funny to see how news outlets respond to this sort of thing—the Financial Times talks about the positive side, the LA Times focuses exclusively on the negative.) A discussion about the implications of the study can also be found on the PLoS site, with further background from the study’s primary author.

Science presents such fascinating contradictions. A potentially helpful advance that undermines another area of research. The breakthrough that opens a Pandora’s Box. It’s probably rare to see such a direct contradiction (that’s not heavily politicized like, say, stem cell research), but the social and societal impact is undoubtedly one of the things I love most about genetics in particular.

Tuesday, September 16, 2008 | genetics, mine, privacy, science  

Mention Offhand and Ye Shall Receive

Just received a helpful note from Nelson Minar, who points out an already redrawn version of the graph from the last post over at Chartjunk. The redraw aims to improve the proportion between the different tax brackets:


Much better! Read more about their take, and associated caveats, here. (Thanks also to Peter Merholz and Andrew Otwell, who wrote as well, yet were no match for Nelson’s swift fingers.)

Saturday, September 13, 2008 | feedbag, infographics, notaneconomist, politics  

Glancing at Tax Proposals

Finally, the infographic I’ve been waiting for: the Washington Post compares the tax proposals of United States presidential candidates John McCain and Barack Obama:


Lots of words have been spilled over the complexities of tax policy, whether in stump speeches, advertisements, or policy papers, but these are usually distilled for voters in lengthy articles that throw still more words at the problem. Compare even a well-written article like this one at Business Week with the graphic above from the Washington Post. Which of the two will you be able to remember tomorrow?

I also appreciate that the graphic very clearly represents the general tax policies of Republicans vs. Democrats, without showing bias toward either. The only thing that’s missing is a sense of how big each of the categories is – how many people are in the “over $2.87 million” category versus how many are in the “$66,000 to $112,000” category – which would help convey a better sense of the “middle class” term that candidates like to throw around.

There is still greater complexity to the debate than what’s shown in this image (the Business Week article describes treasury shortfalls based on the McCain proposal, for instance), but without the initial explanation provided by that graphic, will voters even bother with those details?

Saturday, September 13, 2008 | infographics, notaneconomist, politics  

Sustainable Creativity at Pixar

Given some number of talented people, success is not particularly surprising. But sustaining that success in a creative organization, the way that Pixar has over the last fifteen years, is truly exceptional. Ed Catmull, cofounder of Pixar (and computer graphics pioneer), writes about their success for the Harvard Business Review:

Unlike most other studios, we have never bought scripts or movie ideas from the outside. All of our stories, worlds, and characters were created internally by our community of artists. And in making these films, we have continued to push the technological boundaries of computer animation, securing dozens of patents in the process.

On Creativity:

People tend to think of creativity as a mysterious solo act, and they typically reduce products to a single idea: This is a movie about toys, or dinosaurs, or love, they’ll say. However, in filmmaking and many other kinds of complex product development, creativity involves a large number of people from different disciplines working effectively together to solve a great many problems. The initial idea for the movie—what people in the movie business call “the high concept”—is merely one step in a long, arduous process that takes four to five years.

A movie contains literally tens of thousands of ideas.

On Taking Risks:

…we as executives have to resist our natural tendency to avoid or minimize risks, which, of course, is much easier said than done. In the movie business and plenty of others, this instinct leads executives to choose to copy successes rather than try to create something brand-new. That’s why you see so many movies that are so much alike. It also explains why a lot of films aren’t very good. If you want to be original, you have to accept the uncertainty, even when it’s uncomfortable, and have the capability to recover when your organization takes a big risk and fails. What’s the key to being able to recover? Talented people!

Reminding us that we learn more from failure, the more interesting part of the article talks about how Pixar responded to early failures in Toy Story 2:

Toy Story 2 was great and became a critical and commercial success—and it was the defining moment for Pixar. It taught us an important lesson about the primacy of people over ideas: If you give a good idea to a mediocre team, they will screw it up; if you give a mediocre idea to a great team, they will either fix it or throw it away and come up with something that works.

Toy Story 2 also taught us another important lesson: There has to be one quality bar for every film we produce. Everyone working at the studio at the time made tremendous personal sacrifices to fix Toy Story 2. We shut down all the other productions. We asked our crew to work inhumane hours, and lots of people suffered repetitive stress injuries. But by rejecting mediocrity at great pain and personal sacrifice, we made a loud statement as a community that it was unacceptable to produce some good films and some mediocre films. As a result of Toy Story 2, it became deeply ingrained in our culture that everything we touch needs to be excellent.

On mixing art and technology:

[Walt Disney] believed that when continual change, or reinvention, is the norm in an organization and technology and art are together, magical things happen. A lot of people look back at Disney’s early days and say, “Look at the artists!” They don’t pay attention to his technological innovations. But he did the first sound in animation, the first color, the first compositing of animation with live action, and the first applications of xerography in animation production. He was always excited by science and technology.

At Pixar, we believe in this swirling interplay between art and technology and constantly try to use better technology at every stage of production. John coined a saying that captures this dynamic: “Technology inspires art, and art challenges the technology.”

I saw Catmull speak to the Computer Science department a month or two before I graduated from Carnegie Mellon. Toy Story had been released two years earlier, and 20 or 30 of us were all jammed into a room listening to this computer graphics legend speaking about…storytelling. The importance of narrative. How the movies Pixar was creating had less to do with the groundbreaking computer graphics (the reason most of us were in the room) than they did with a good story. This is less shocking nowadays, especially if you’ve ever seen a lecture by someone from Pixar, but the scene left an incredible impression on me. It was a wonderful message to the programmers in attendance about the importance of placing purpose before the technology, but without belittling the importance of either.

(While digging for an image to illustrate this post, I also found this review of The Pixar Touch: The Making of a Company, a book that seems to cover similar territory as the HBR article, but from the perspective of an outside author. The image is stolen from Ricky Grove’s review.)

Tuesday, September 9, 2008 | creativity, failure, movies  

Temple of Post-Its

The writing room of author Will Self (Wikipedia), where he organizes his complicated stories through copious use of small yellow (and pink) adhesive papers on the wall:


Or amongst a map and more papers:


Not even the bookshelf is safe:


Check out the whole collection.

Reminds me of taking all the pages of my Ph.D. dissertation (a hundred or so) and organizing them on the floor of a friend’s living room. (Luckily it was a large living room.) It was extremely helpful and productive, but it frightened my friend, who returned home to a sea of paper and a guy who had been indoors all day, sitting in the middle of it with a slightly wild look in his eyes.

(Thanks to Jason Leigh, who mentioned the photos during his lecture at last week’s iCore summit in Banff.)

Wednesday, September 3, 2008 | collections, organize  

In A World…Without Don LaFontaine

Don LaFontaine, voice artist for some 5,000 movies and 350,000 advertisements, passed away Monday. He’s the man who came up with the “In A World…” that begins most film trailers, as well as the baritone voice style that goes with it. The Washington Post has an obituary.

In the early 1960s, he landed a job in New York with National Recording Studios, where he worked alongside radio producer Floyd L. Peterson, who was perfecting radio spots for movies. Until then, movie studios primarily relied on print advertising or studio-made theatrical trailers. The two men became business partners and, together, perfected the familiar format.

Mr. LaFontaine, who was editing, writing and producing in the early days of the partnership, became a voice himself by accident. In 1964, when an announcer failed to show up for a job, he recorded himself reading copy and sent it to the studio with a message: “This is what it’ll sound like when we get a ‘real’ announcer.”

Trailer for The Elephant Man, proclaimed to be his favorite:

And a short interview/documentary:

Don’s impact is unmistakable, and it’s striking to think of how his approach changed movie advertising. May he rest in peace.

Wednesday, September 3, 2008 | movies  

Handcrafted Data

Continuing Luddite Monday, a new special feature on benfry.com, an article from the Boston Globe about the prevalence of handcrafted images in reference texts. Dushko Petrovich writes:

But in fact, nearly two centuries after the publication of his famous folios, it is Audubon’s technique, and not the sharp eye of the modern camera, that prevails in a wide variety of reference books. For bird-watchers, the best guides, the most coveted guides – like those by David Allen Sibley and Roger Tory Peterson – are still filled with hand-painted images. The same is true for similar volumes on fish, trees, and even the human body. Ask any first-year medical student what they consult during dissections, and they will name Dr. Frank H. Netter’s meticulously drafted “Atlas of Human Anatomy.” Or ask architects and carpenters to see their structures, and they will often show you chalk and pencil “renderings,” even after the things have been built and professionally photographed.

This nicely reinforces the case for drawing, and why it’s so powerful. The article later gets to the meat of the issue, which is the same reason that drawing is a topic on a site about data visualization.

Besides seamlessly imposing a hierarchy of information, the handmade image is also free to present its subject from the most efficient viewpoint. Audubon sets a high standard in this regard; he is often at pains to depict the beak in its most revealing profile, the crucial feathers at an identifiable angle, the front leg extended just so. When the nighthawk and the whip-poor-will are pictured in full flight, their legs tucked away, he draws the feet at the side of the page, so we’re not left guessing. If Audubon draws a bird in profile, as he does with the pitch-black rook and the grayer hooded crow, we’re not missing any details a three-quarters view would have shown.

And finally, a reminder:

Confronted with unprecedented quantities of data, we are constantly reminded that quality is what really matters. At a certain point, the quality and even usefulness of information starts being defined not by the precision and voracity of technology, but by the accuracy and circumspection of art. Seen in this context, Audubon shows us that painting is not just an old fashioned medium: it is a discipline that can serve as a very useful filter, collecting, editing, and carefully synthesizing information into a single efficient and evocative image – giving us the information that we really want, information we can use and, as is the case with Audubon, even cherish.

Consider this your constant reminder, because I think it’s actually quite rare that quality is acknowledged. I regularly attend lectures by speakers who boast about how much data they’ve collected and the complexity of their software and hardware, but it’s one in ten thousand who even mention the art of removing or ignoring data in search of better quality.

Looks like the Early Drawings book mentioned in the article will be available at the end of September.

Monday, September 1, 2008 | drawing, human, refine  

Skills as Numbers

BusinessWeek has an excerpt of Numerati, a book about the fabled monks of data mining (Publishers Weekly calls them “entrepreneurial mathematicians”) who are sifting through the personal data we create every day.

Picture an IBM manager who gets an assignment to send a team of five to set up a call center in Manila. She sits down at the computer and fills out a form. It’s almost like booking a vacation online. She puts in the dates and clicks on menus to describe the job and the skills needed. Perhaps she stipulates the ideal budget range. The results come back, recommending a particular team. All the skills are represented. Maybe three of the five people have a history of working together smoothly. They all have passports and live near airports with direct flights to Manila. One of them even speaks Tagalog.

Everything looks fine, except for one line that’s highlighted in red. The budget. It’s $40,000 over! The manager sees that the computer architect on the team is a veritable luminary, a guy who gets written up in the trade press. Sure, he’s a 98.7% fit for the job, but he costs $1,000 an hour. It’s as if she shopped for a weekend getaway in Paris and wound up with a penthouse suite at the Ritz.

Hmmm. The manager asks the system for a cheaper architect. New options come back. One is a new 29-year-old consultant based in India who costs only $85 per hour. That would certainly patch the hole in the budget. Unfortunately, he’s only a 69% fit for the job. Still, he can handle it, according to the computer, if he gets two weeks of training. Can the job be delayed?

This is management in a world run by Numerati.

I’m highly skeptical of management (a fundamentally human activity) being distilled to numbers in this manner. Unless, of course, the managers are that poor at doing their job. And further, what’s the point of the manager if they’re spending most of their time filling out the vacation form-style work order? (Filling out tedious year-end reviews, no doubt.) Perhaps it should be an indication that the company is simply too large:

As IBM sees it, the company has little choice. The workforce is too big, the world too vast and complicated for managers to get a grip on their workers the old-fashioned way—by talking to people who know people who know people.

Then we descend (ascend?) into the rah-rah of today’s global economy:

Word of mouth is too foggy and slow for the global economy. Personal connections are too constricted. Managers need the zip of automation to unearth a consultant in New Delhi, just the way a generation ago they located a shipment of condensers in Chicago. For this to work, the consultant—just like the condensers—must be represented as a series of numbers.

I say rah-rah because how else can you put refrigeration equipment parts in the same sentence as a living, breathing person with a mind, free will, and a life?

And while I don’t think I agree with this particular thesis, the book as a whole looks like an interesting survey of efforts in this area. Time to finish my backlog of Summer reading so I can order more books…

Monday, September 1, 2008 | human, mine, notafuturist, numberscantdothat, privacy, social  

Is Processing a Language?

This question is covered in the FAQ on Processing.org, but still tends to reappear on the board every few months (most recently here). Someone once described Processing syntax as a dialect of Java, which sounds about right to me. It’s syntax that we’ve added on top of Java to make things a little easier for a particular work domain (roughly, making visual things). There’s also a programming environment that significantly simplifies what’s found in traditional IDEs. Plus there’s a core API set (and a handful of core libraries) that we’ve built to support this type of work. If we did these in isolation, none would really stick out:

  • The language changes are pretty minimal. The big difference is probably how they integrate with the IDE that’s built around the idea of sitting down and quickly writing code (what we call sketching). We don’t require users to first learn class definitions or even method declarations before they can show something on the screen, which helps avoid some of the initial head-scratching that comes from trying to explain “public class” or “void” to beginning programmers (see the sketch just after this list). For more advanced coders, it helps Java feel a bit more like scripting. I use a lot of Perl for various tasks, and I wanted to replicate the way you can write 5-10 lines of Perl (or Python, or Ruby, or whatever) and get something done. In Java, you often need double that number of lines just to set up your class definitions and a thread.
  • The API set is a Java API. It can be used with traditional Java IDEs (Eclipse, Netbeans, whatever), and a Processing component can be embedded into other applications. But without the rest of it (the syntax and IDE), Processing (API or otherwise) would not be as widely used as it is today. The API grew out of Casey’s and my work, and our like/dislike of various approaches used by libraries we’ve worked with: PostScript, QuickDraw, OpenGL, Java AWT, even Applesoft BASIC. Can we do OpenGL but still have it feel as simple as writing graphics code on the Apple ][? Can we simplify current graphics approaches so that they at least feel simpler, like the original QuickDraw on the Mac?
  • The IDE is designed to make Java-style programming less wretched. Check out the Integration discussion board to see just how un-fun it is to figure out how the Java CLASSPATH and java.library.path work, or how to embed AWT and Swing components. These frustrations and complications sometimes are even filed as bugs in the Processing bugs database by users who have apparently become spoiled by not having to worry about such things.

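To make the sketching point concrete, here’s an illustration of my own (not from the Processing FAQ or reference): a complete sketch that compiles and runs as-is. Behind the scenes the preprocessor wraps it in a class that extends PApplet, but a beginner never has to see any of that.

// A complete Processing sketch: just setup() and draw(), with no class
// definition, no main(), and no thread handling.
void setup() {
  size(400, 400);
  smooth();
}

void draw() {
  background(255);
  // a circle that follows the mouse
  ellipse(mouseX, mouseY, 40, 40);
}
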
If pressed, the language itself is probably the easiest to let go of—witness the Python, Ruby and now JavaScript versions of the API, or the C++ version that I use for personal work (when doing increasingly rare C++ projects). And lots of people build Processing projects without the preprocessor and PDE.

In some cases, we’ve even been accused of not being clear that it’s “just Java,” or even that Processing is Java with a trendy name. Complaining is easier than reading, so there’s not much we can do for people who don’t glance at the FAQ before writing their unhappy screeds. And with the stresses of the modern world, people need to relieve themselves of their angst somehow. (On the other hand, if you’ve met either of us, you’ll know that Casey and I are very trendy people, having grown up in the farmlands of Ohio and Michigan.)

However, we don’t print “Java” on every page of Processing.org for a very specific reason: knowing it’s Java behind the scenes doesn’t actually help our audience. In fact, it usually causes more trouble than not, because people expect it to behave exactly like Java. We’ve had a number of people who copied and pasted code from the Java Tutorial into the PDE, and were confused when it didn’t work.

(Edit – In writing this, I don’t want to understate the importance of Java, especially in the early stages of the Processing project. It goes without saying that we owe a great deal to Sun for developing, distributing, and championing Java. It was, and is, the best language/environment on which to base the project. More about the choice of language can be found in the FAQ.)

But for as much trouble as the preprocessor and language component of Processing is for us to develop (or as irrelevant as it might seem to programmers who already code in Java), we’re still not willing to give that up—damned if we’re gonna make students learn how to write a method declaration and “public class Blah extends PApplet” before they can get something to show up on the screen.

I think the question is a bit like the general obsession of people trying to define Apple as a hardware or software company. They don’t do either—they do both. They’re one of the few to figure out that the distinction actually gets in the way of delivering good products.

Now, whether we’re delivering a good product is certainly questionable—the analogy with Apple may, uh, end there.

Wednesday, August 27, 2008 | languages, processing, software  

Mapping Iran’s Online Public

“Mapping Iran’s Online Public” is a fascinating (and very readable) paper from a study by John Kelly and Bruce Etling at Harvard’s Berkman Center. From the abstract:

In contrast to the conventional wisdom that Iranian bloggers are mainly young democrats critical of the regime, we found a wide range of opinions representing religious conservative points of view as well as secular and reform-minded ones, and topics ranging from politics and human rights to poetry, religion, and pop culture. Our research indicates that the Persian blogosphere is indeed a large discussion space of approximately 60,000 routinely updated blogs featuring a rich and varied mix of bloggers.

In addition to identifying four major poles (Secular/Reformist, Conservative/Religious, Persian Poetry and Literature, and Mixed Networks), the study turns up a number of surprising findings, like the nature of the discourse (such as the prominence of the poetry and literature category) or issues of anonymity:

…a minority of bloggers in the secular/reformist pole appear to blog anonymously, even in the more politically-oriented part of it; instead, it is more common for bloggers in the religious/conservative pole to blog anonymously. Blocking of blogs by the government is less pervasive than we had assumed.

They also produced images to represent the nature of the networks, seen in the thumbnail at right. The visualization is created with a force-directed layout that iteratively pulls related data points closer together. It’s useful for this kind of study, where the intent is to represent or identify larger groups. In this case, the graphic supports what’s laid out in the text, but to me the most interesting thing about the study is the human-centered work behind it, such as reviewing and categorizing such a large number of sites by hand. It’s this background work that sets it apart from many other images like it, which tend to rely too heavily on automation.
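
For anyone curious about the mechanics, here’s a minimal toy version of the force-directed idea in Processing (my own sketch, not the layout software used in the study): every node pushes away every other node, linked nodes pull toward each other, and repeating the update lets related nodes drift into clusters.

// Toy force-directed layout: pairwise repulsion plus spring attraction
// along (randomly generated, hypothetical) links.
int N = 100;
float[] x = new float[N], y = new float[N];
float[] vx = new float[N], vy = new float[N];
boolean[][] linked = new boolean[N][N];

void setup() {
  size(600, 600);
  smooth();
  for (int i = 0; i < N; i++) {
    x[i] = random(width);
    y[i] = random(height);
  }
  // attach each node to one earlier node, just to have some structure
  for (int i = 1; i < N; i++) {
    linked[i][int(random(i))] = true;
  }
}

void draw() {
  background(255);
  for (int i = 0; i < N; i++) {
    float fx = 0, fy = 0;
    for (int j = 0; j < N; j++) {
      if (i == j) continue;
      float dx = x[i] - x[j];
      float dy = y[i] - y[j];
      float d = max(sqrt(dx*dx + dy*dy), 5);
      fx += 500 * dx / (d*d*d);   // repulsion, falls off with distance
      fy += 500 * dy / (d*d*d);
      if (linked[i][j] || linked[j][i]) {
        fx -= 0.02 * dx;          // spring attraction along each link
        fy -= 0.02 * dy;
      }
    }
    vx[i] = (vx[i] + fx) * 0.85;  // damping so the layout settles
    vy[i] = (vy[i] + fy) * 0.85;
  }
  for (int i = 0; i < N; i++) {
    x[i] += vx[i];
    y[i] += vy[i];
    ellipse(x[i], y[i], 4, 4);
  }
}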

(The paper is from April 6, 2008, and I first heard about it after being contacted by John in June. Around 1999, our group had hosted students that he was teaching in a summer session for a visit to the Media Lab. And now a few months later, I’m digging through my writing todo pile.)

Tuesday, August 26, 2008 | forcelayout, represent, social  

Panicky Addition

In response to the last post, a message from João Antunes:

…you should also read this story about Panic’s old MP3 player applications.

The story includes how they came to almost dominate the Mac market before iTunes, how AOL and Apple tried to buy the application before coming out with iTunes, even recollections of meetings with Steve Jobs and how he wanted them to go work at Apple – it’s a fantastic indie story.

Regarding the Mac ‘indie’ development there’s this recent thesis by a Dutch student, also a good read.

I’d read the story about Audion (the MP3 player) before, and failed to make the connection that this was the same Audion that I rediscovered in the O’Reilly interview from the last post (and took a moment to mourn its loss). It’s sad to think of how much better iTunes would be if the Panic guys were making it — iTunes must be the first MP3 player that feels like a heavy duty office suite. In the story, Cabel Sasser (the other co-founder of Panic) begins:

Is it just me? I mean, do you ever wonder about the stories behind everyday products?

What names were Procter & Gamble considering before they finally picked “Swiffer”? (Springle? Sweepolio? Dirtrocker?) What flavors of Pop-Tarts never made it out of the lab, and did any involve lychee, the devil’s fruit?

No doubt the backstory on the Pop-Tarts question alone could be turned into a syndicated network show to compete with LOST.

Audion is now available as a free download, though without updates since 2002, it’s not likely to work much longer (it seemed fine with OS X 10.4, though who knows about 10.5).

Tuesday, August 19, 2008 | feedbag, software  

Mangled Tenets and Exasperation: the iTunes App Store

By way of Darling Furball, a blog post by Steven Frank, co-founder of Panic, on his personal opinion of Apple’s gated community of software distribution, the iTunes App Store:

Some of my most inviolable principles about developing and selling software are:

  1. I can write any software I want. Nobody needs to “approve” it.
  2. Anyone who wants to can download it. Or not.
  3. I can set any price I want, including free, and there’s no middle-man.
  4. I can set my own policies for refunds, coupons and other promotions.
  5. When a serious bug demands an update, I can publish it immediately.
  6. If I want, I can make the source code available.
  7. If I want, I can participate in someone else’s open source project.
  8. If I want, I can discuss coding difficulties and solutions with other developers.

The iTunes App Store distribution model mangles almost every one of those tenets in some way, which is exasperating to me.

But, the situation’s not that clear-cut.

The entire post is very thoughtful and well worth reading; it’s also coming from a long-time Apple developer rather than some crank from an online magazine looking to stir up advertising hits. Panic’s software is wonderful: Transmit is an application that singlehandedly makes me want to use a Mac (yet it’s only, uh, an SFTP client). I think his post nicely sums up the way a lot of developers (including myself) feel about the App Store. He concludes:

I’ve been trying to reconcile the App Store with my beliefs on “how things should be” ever since the SDK was announced. After all this time, I still can’t make it all line up. I can’t question that it’s probably the best mobile application distribution method yet created, but every time I use it, a little piece of my soul dies. And we don’t even have anything for sale on there yet.

Reading this also made me curious to learn more about Panic, which led me to this interview from 2004 with Frank and the other co-founder. He also has a number of side projects, including Spamusement, a roughly drawn cartoon depicting spam headlines (Get a bigger flute, for instance).

Tuesday, August 19, 2008 | mobile, software  

Data as Dairy

As a general tip, keep in mind that any data looks better as a wheel of Gouda.

delicious cheese

You say “market share,” I say “wine pairing.”

(Via this article, passed along by a friend looking for ways to make pie charts with more visual depth.)

Tuesday, August 19, 2008 | refine, represent  

History of Predictive Text Swearing

Wonderful commentary on being nannied by your mobile, and head-in-the-sand text prediction algorithms.

There’s lots more to be said about predictive text, but in the meantime, this also brings to mind Jonathan Harris’ QueryCount, which I found to be a more interesting followup to his WordCount project. (WordCount tells us something we already know, but QueryCount lets us see something we suspect.)

Monday, August 18, 2008 | text  

“Hello Kettle? Yeah, hi, this is the Pot calling.”

Wired’s Ryan Singel reports on a spat between AT&T and Google regarding their privacy practices:

Online advertising networks — particularly Google’s — are more dangerous than the fledgling plans and dreams of ISPs to install eavesdropping equipment inside their internet pipes to serve tailored ads to their customers, AT&T says.

Even more fun than watching gorillas fight (you don’t have to pick a side—it’s guaranteed to be entertaining) is when they bring up accusations that are usually reserved for the security and privacy set (or borderline paranoids who write blogs that cover information and privacy). Their argument boils down to “but we’re less naughty than you.” Ask any Mom about the effectiveness of that argument. AT&T writes:

Advertising-network operators such as Google have evolved beyond merely tracking consumer web surfing activity on sites for which they have a direct ad-serving relationship. They now have the ability to observe a user’s entire web browsing experience at a granular level, including all URLs visited, all searches, and actual page-views.

Deep Packet Inspection is an important-sounding way to say that they’re watching all your traffic. It’s the equivalent of the post office opening all your letters and reading them, and in AT&T’s case, adding bulk mail (flyers, sweepstakes, and other junk) tailored to your interests based on what they find.

Are you excited yet?

Monday, August 18, 2008 | privacy  

The Importance of Failure

This segment from CBS Sunday Morning isn’t particularly groundbreaking or profound (and is perhaps a bit hokey), but it’s a helpful reminder of the importance of failure. (Never mind the failure to post anything new for two weeks.)

Duke University professor Henry Petroski has made a career studying design failures, which he says are far more interesting than successes.

“Successes teach us very little,” Petroski said.

Petroski’s talking about bridges, but it holds true for any creative endeavor.

Also cited are J.K. Rowling bottoming out before her later success, van Gogh who sold just one painting before his death, Michael Jordan not making his high school basketball team, and others. (You’ve heard of these, but like I said, it’s about the reminder.)

It also notes that how you handle failure is just as important, citing Chipper Jones, who leads baseball with a .369 batting average, impressive until you consider that it means he gets a hit only about one in three times he comes to the plate:

“Well, most of the time it’s not [going your way] and that’s why you have to be able to accept failure,” Jones said. “[…] a lot of work […] here in the big league is how you accept failure.”

Which is another important reminder: the standout difference in “making it” has to do with bouncing back from failure.

And if nothing else, watch it for footage of the collapse of the Tacoma Narrows Bridge in 1940. Such a beautiful (if terrifying) picture of cement and metal oscillating in the wind. Also linked from the Wikipedia article are a collection of still photographs (including the collapse) and links to newsreel footage from the Internet Archive.

Friday, August 15, 2008 | failure  

More NASA Observations Acquire Interest

Some additional followup from Robert Simmon regarding the previous post. I asked more about the “amateur Earth observers” and the intermediate data access. He writes:

The original idea was sparked from the success of amateur astronomers discovering comets. Of course amateur astronomy is mostly about making observations, but we (NASA) already have the observations: the question is what to do with them–which we really haven’t figured out. One approach is to make in-situ observations like aerosol optical thickness (haziness, essentially), weather measurements, cloud type, etc. and then correlate them with satellite data. Unfortunately, calibration issues make this data difficult to use scientifically. It is a good outreach tool, so we’re partnering with science museums, and the GLOBE program does this with schools.

We don’t really have a good sense yet of how to allow amateurs to make meaningful analyses: there’s a lot of background knowledge required to make sense of the data, and it’s important to understand the limitations of satellite data, even if the tools to extract and display it are available. There’s also the risk that quacks with an axe to grind will willfully abuse data to make a point, which is more significant for an issue like climate change than it is for the face on Mars, for example. That’s just a long way of saying that we don’t know yet, and we’d appreciate suggestions.

I’m more of a “face on Mars” guy myself. It’s unfortunate that the quacks even have to be considered, though not surprising from what I’ve seen online. Also worth checking out:

Are you familiar with Web Map Service (WMS)?
It’s one of the ways we distribute & display our data, in addition to KML.

And one last followup:

Here’s another data source for NASA satellite data that’s a bit easier than the data gateway:

and examples of classroom exercises using data, with some additional data sources folded in to each one:

The EET holds an “access data workshop” each year in late spring, you may be interested in attending next year.

And with regards to guidelines, Mark Baltzegar (of The Cyc Foundation) sent along this note:

Are you familiar with the ongoing work within the W3C’s Linking Open Data project? There is a vibrant community actively exposing and linking open data.

More to read and eat up your evening, at any rate.

Thursday, July 31, 2008 | acquire, data, feedbag, parse  

NASA Observes Earth Blogs

Robert Simmon of NASA caught this post about the NASA Earth Observatory and was kind enough to pass along some additional information.

Regarding the carbon emissions video:

The U.S. carbon emissions data were taken from the Vulcan Project:

They distribute the data here:

In addition to the animation (which was intended to show the daily cycle and the progress of elevated emissions from east to west each morning), we published a short feature about the project and the dataset, including some graphs that remove the diurnal cycle.

American Carbon is an example of one of our feature articles, which are published every month or so. We try to cover current research, focusing on individual scientists, using narrative techniques. The visualizations tie in closely to the text of the story. I’m the primary visualizer, and I focus on presenting the data as clearly as possible, rather than allowing free-form investigation of data. We also publish daily images (with links to images at the original resolution), imagery of natural hazards emphasizing current events (fires, hurricanes, and dust storms, for example), nasa press releases, a handful of interactive lessons, and the monthly global maps of various parameters. We’re in the finishing stages of a redesign, which will hopefully improve the navigation and site usability.

Also some details about the difficulties of distributing and handling the data:

These sections draw on data from wide and varied sources. The raw data is extremely heterogeneous, formats include: text files, HDF, matlab, camera raw files, GRADS, NetCDF, etc. All in different projections, at different spatial scales, and covering different time periods. Some of them are updated every five minutes, and others are reprocessed periodically. Trying to make the data available—and current—through our site would be overly ambitious. Instead, we focus on a non-expert audience interested in space, technology, and the environment, and link to the original science groups and the relevant data archives. Look in the credit lines of images for links.

Unfortunately the data formats can be very difficult to read. Here’s the main portal for access to NASA Earth Observing System data:

and the direct link to several of the data access interfaces:

And finally, something closer to what was discussed in the earlier post:

With the complexity of the science data, there is a place for an intermediate level of data: processed to a consistent format and readable by common commercial or free software (intervention by a data fairy?). NASA Earth Observations (NEO) is one attempt at solving that problem: global images at 0.1 by 0.1 degrees distributed as lossless-compressed indexed color images and csv files. Obviously there’s work to be done to improve NEO, but we’re getting there. We’re having a workshop this month to develop material for “amateur Earth observers” which will hopefully help us in this area, as well.

This speaks to the audience I tried to address with Visualizing Data in particular (or with Processing in general). There is a group of people who want access to data that’s more low-level than what’s found in a newspaper article, but not as complicated as raw piles of data from measuring instruments that are only decipherable by the scientists who use them.

This is a general theme, not specific to NASA’s data. And I think it’s a little more low-level than requiring that everything be in mashup-friendly XML or JSON feeds, but it seems worthwhile to start thinking about what the guidelines would be for open data distribution. And with such guidelines in place, we can browbeat organizations to play along! Since that would be, uh, a nice way to thank them for making their data available in the first place.

Thursday, July 31, 2008 | acquire, data, feedbag  

Processing 0143 and a status report

Just posted Processing 0143 to the download page. This is not yet the stable release, so please read revisions.txt, which describes the significant changes in the releases since 0135 (the last “stable” release, and the current default download).

I’ve also posted a status report:

Some updates from the Processing Corporation’s east coast tower high rise offices in Cambridge, MA.

We’re working to finish Processing 1.0. The target date is this Fall, meaning August or September. We’d like to have it done as early as possible so that Fall classes can make use of it. In addition to the usual channels, we have a dozen or so people who are helping out with getting the release out the door. We’ll unmask these heroes at some point in the future.

I’m also pleased to announce that I’m able to focus on Processing full time this Summer with the help of a stipend provided by Oblong Industries. They’re the folks behind the gesture-controlled interface you see in Minority Report. (You can find more about them with a little Google digging.) They’re funding us because of their love of open source and because they feel that Processing is an important project. As in, there are no strings attached to the funding, and Processing is not being re-tooled for gesture interfaces. We owe them our enormous gratitude.

The big things for 1.0 include the Tools menu, better compile/run setup (what you see in 0136+), bringing back P2D, perhaps bringing back P3D with anti-aliasing, better OpenGL support, better library support, some major bug fixes (outstanding threading problems and more).

If you have a feature or bug that you want fixed in time for 1.0, now is the time to vote by making sure that it’s listed at http://dev.processing.org/bugs.

I’ll try to post updates more frequently over the next few weeks.

Monday, July 28, 2008 | processing  

Wordle me this, Batman

I’ve never really been fond of tag clouds, but Wordle, by software MacGyver (and former drummer for They Might Be Giants) Jonathan Feinberg, gives the representation an aesthetic nudge lacking in most of its kind. The application creates word clouds from input data submitted by users. I was reminded of it yesterday by Eugene, who submitted Lorem Ipsum:


I had first heard about it from emailer Bill Robertson, who had uploaded Organic Information Design, my master’s thesis. (Which was initially flattering but quickly became terrifying when I remembered that it still badly needs a cleanup edit.)


A wonderful tree shape! Can’t decide which I like better: “information” as the stem or “data” as a cancerous growth in the upper-right.

Mr. Feinberg is also the reason that Processing development has been moving to Eclipse (replacing emacs, some shell scripts, two packages of bazooka bubble gum and the command line) because of his donation of a long afternoon helping set up the software in the IDE back when I lived in East Cambridge, just a few blocks from where he works at IBM Research.

Wednesday, July 23, 2008 | inbox, refine, represent  

Blood, guts, gore and the data fairy

The O’Reilly press folks passed along this review (PDF) of Visualizing Data from USENIX magazine. I really appreciated this part:

My favorite thing about Visualizing Data is that it tackles the whole process in all its blood, guts, and gore. It starts with finding the data and cleaning it up. Many books assume that the data fairy is going to come bring you data, and that it will either be clean, lovely data or you will parse it carefully into clean, lovely data. This book assumes that a significant portion of the data you care about comes from some scuzzy Web page you don’t control and that you are going to use exactly the minimum required finesse to tear out the parts you care about. It talks about how to do this, and how to decide what the minimum required finesse would be. (Do you do it by hand? Use a regular expression? Actually bother to parse XML?)

Indeed, writing this book was therapy for that traumatized inner child who learned at such a tender young age that the data fairy did not exist.

Wednesday, July 23, 2008 | iloveme, parse, reviews, vida  

NASA Earth Observatory

Some potentially interesting data from NASA passed along by Chris Lonnen. The first is the Earth Observatory, which includes images of things like Carbon Monoxide, Snow Cover, Surface Temperature, UV Exposure, and so on. Chris writes:

I’m not sure how useful they would be to novices in terms of usable data (raw numbers are not provided in any easy-to-harvest manner), but the information is still useful, and they provide a basic, if clunky, presentation that follows the basic steps you laid out in your book. The data can be found here, and they occasionally compile it all into interesting visualizations. My favorite being the carbon map here.

The carbon map movie is really cool, though I wish the raw data were available, since the strong cyclical effect seen in the animation needs to be separated out. The cycle dominates the animation to such an extent that it’s nearly the only takeaway from the movie. For instance, each cycle is a 24-hour period. Instead of showing the days one after another, show several of them adjacent to one another, so that we can compare 3 a.m. on one day with 3 a.m. the next.

For overseas readers, I’ll note that the images and data are not all U.S.-centric—most cover the surface of the Earth.

I asked Chris about availability for more raw data, and he did a little more digging:

The raw data availability is slim. From what I’ve gathered you need to contact NASA and have them give you clearance as a researcher. If you were looking for higher quality photography for a tutorial, NASA Earth Observations has a newer website that I’ve just found which offers similar data in the format of your choice at up to 3600 x 1800. For some sets it will also offer you data in CSV or CSV for Excel.

If you needed higher resolutions than that, NASA’s Visible Earth offers some TIFFs at larger sizes. A quick search for .tiff gave me a 16384 x 8192 map of the earth with city lights shining, which would be relatively easy to filter out from the dark blue background. These two websites are probably a bit more helpful.

Interesting tidbits for someone interested in a little planetary digging. I’ve had a few of these links sitting in a pile waiting for me to finish the “data” section of my web site; in the meantime I’ll just mention things here.

Update 31 July 2008: Robert Simmon from NASA chimes in.

Saturday, July 19, 2008 | acquire, data, inbox, science  

Brains on the Line

I was reminded this morning that Mario Manningham, a wide receiver who played for Michigan, was rumored to have scored a 6 (out of 50) on the Wonderlic, an intelligence test administered in some occupations (and now pro football) to check the mental capability of job candidates. Intelligence tests are strange beasts, but after watching my niece work on similar problems—for fun—during her summer vacation last week, the test caught my eye more than it did when I first heard about the rumor.

Manningham was once a promising undergrad receiver for U of M, but has in recent years proven himself to be a knucklehead, loafing through plays and most recently making headlines for marijuana use and an interview on Sirius radio described as “… arrogant and defensive. When asked about the balls he dropped in big spots, he responded, ‘What about the ball I caught?’” So while an exceptionally low score on a standardized test might suggest dyslexia, the guy’s an egotistical bonehead even without mitigating factors.

Most people don’t associate brains with football, but in recent years teams have begun to use a Wonderlic test while scouting, which consists of 50 questions to be completed in 12 minutes. Many of the questions are multiple choice, but the time is certainly a factor when completing the tests. A score of 10 is considered “literate”, while 20 is said to coincide with average intelligence (an IQ of 100, though now we’re comparing one somewhat arbitrary numerically scored intelligence test with another).

In another interesting twist, the test is also administered to players the day of the NFL combine—which means they first spend the day running, jumping, benching, interviewing, and lots of other -ings, before they sit down and take an intelligence test. It’s a bit like a medical student running a half marathon before taking the boards.

Wonderlic himself says that basically, the scores decrease as you move further away from the ball, which is interesting but unsurprising. It’s sort of obvious that a quarterback needs to be on the smarter side, but I was curious to see what this actually looked like. Using this table as a guide, I then grabbed this diagram from Wikipedia showing a typical formation in a football game. I cleaned up the design of the diagram a bit and replaced the positions with their scores:


Offense is shown in blue, defense in red. You can see the quarterback with a 24, the center (over 6 feet and around 300 lbs.) averaging higher at 25, and the outside linemen even a little higher. Presumably this is because the outside linemen need to be mentally quick (as well as tough) to read the defense and respond to it. Those are the wide receivers (idiot loud mouths) with the 17s on the outside.

To make the diagram a bit clearer, I scaled each position based on its score:


That’s a little better since you can see the huddle around the ball and where the brains need to be for the system of protection around it. With the proportion, I no longer need the numbers, so I’ve switched back to using the initials for each position’s title:


(Don’t tell Tufte that I’ve used the radius, not the proportional area, of the circle as the value for each ellipse! A cardinal sin that I’m using in this case to improve proportion and clarify a point.)
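
For the record, the Tufte-approved fix is a one-liner: scale each diameter by the square root of its value, so that the circle’s area, rather than its radius, carries the number. A quick illustrative sketch with made-up scores (not the code behind the diagrams above):

// Circles whose area, not radius, is proportional to each value,
// by scaling the diameter with sqrt(). The scores are invented.
float[] scores = { 24, 25, 17, 19, 26 };

void setup() {
  size(500, 150);
  smooth();
  noLoop();
}

void draw() {
  background(255);
  fill(70, 100, 160);
  noStroke();
  for (int i = 0; i < scores.length; i++) {
    float d = 12 * sqrt(scores[i]);  // area-proportional diameter
    ellipse(60 + i * 90, height / 2, d, d);
  }
}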

I’ll also happily point out that the linemen for the Patriots all score above average for their position:

Player Position Year Score
Matt Light left tackle 2001 29
Logan Mankins left guard 2005 25
Dan Koppen center 2003 28
Stephen Neal right guard 2001 31
Nick Kaczur right tackle 2005 29

A position-by-position image for a team would be interesting, but I’ve already spent too much time thinking about this. The Patriots are rumored to be heavy on brains, with Green Bay at the other end of the spectrum.

An ESPN writeup about the test (and testing in general) can be found here, along with a sample test here.

One odd press release from Wonderlic even compares scores per NFL position with private sector job titles. For instance, a middle linebacker scores like a hospital orderly, while an offensive tackle is closer to a marketing executive. Fullbacks and halfbacks share the lower end with dock hands and material handlers.

During the run-up to Super Bowl XXXII in 1998, one reporter even dug up the Wonderlic scores for the Broncos and Packers, showing Denver with an average score of 20.4 compared to Green Bay’s 19.6. As defending champions, the Packers were favored but wound up losing 31-24.

Nobody cited test scores in the post-game coverage.

Wednesday, July 16, 2008 | football, sports  

Eric Idle on “Scale”

Scale is one of the most important themes in data visualization. In Monty Python’s The Meaning of Life, Eric Idle shares his perspective:

The lyrics:

Just remember that you’re standing on a planet that’s evolving
And revolving at nine hundred miles an hour,
That’s orbiting at nineteen miles a second, so it’s reckoned,
A sun that is the source of all our power.
The sun and you and me and all the stars that we can see
Are moving at a million miles a day
In an outer spiral arm, at forty thousand miles an hour,
Of the galaxy we call the ‘Milky Way’.

Our galaxy itself contains a hundred billion stars.
It’s a hundred thousand light years side to side.
It bulges in the middle, sixteen thousand light years thick,
But out by us, it’s just three thousand light years wide.
We’re thirty thousand light years from galactic central point.
We go ’round every two hundred million years,
And our galaxy is only one of millions of billions
In this amazing and expanding universe.

The universe itself keeps on expanding and expanding
In all of the directions it can whizz
As fast as it can go, at the speed of light, you know,
Twelve million miles a minute, and that’s the fastest speed there is.
So remember, when you’re feeling very small and insecure,
How amazingly unlikely is your birth,
And pray that there’s intelligent life somewhere up in space,
‘Cause there’s bugger all down here on Earth.

Wednesday, July 16, 2008 | music, scale  

Postleitzahlen in Deutschland

Maximillian Dornseif has adapted Zipdecode from Chapter 6 of Visualizing Data to handle German postal codes. I’ve wanted to do this myself since hearing about the OpenGeoDB data set, which includes the data, but thankfully he’s taken care of it first and is sharing it with the rest of us along with his modified code.

(The site is in German…I’ll trust any of you German readers to let me know if the site actually says that Visualizing Data is the dumbest book he’s ever read.)

Also helpful to note that he used Python for preprocessing the data. He doesn’t bother implementing a map projection, as done in the book, but the Python code is a useful example of using another language when appropriate, and how the syntax differs from Processing:

# Convert opengeodb data for zipdecode
fd = open('PLZ.tab')
out = []
minlat = minlon = 180
maxlat = maxlon = 0

for line in fd:
    line = line.strip()
    # skip blank lines and comments
    if not line or line.startswith('#'):
        continue
    parts = line.split('\t')
    dummy, plz, lat, lon, name = parts
    out.append([plz, lat, lon, name])
    # track the bounding box of all coordinates
    minlat = min([float(lat), minlat])
    minlon = min([float(lon), minlon])
    maxlat = max([float(lat), maxlat])
    maxlon = max([float(lon), maxlon])

print "# %d,%f,%f,%f,%f" % (len(out), minlat, maxlat, minlon, maxlon)
for data in out:
    plz, lat, lon, name = data
    print '\t'.join([plz, str(float(lat)), str(float(lon)), name])

In the book, I used Processing for most of the examples (with a little bit of Perl) for the sake of simplicity. (The book already introduces a lot of new material; why hurt people by introducing multiple languages while I’m at it?) However, that’s one place where the book diverges from my own process a bit, since I tend to use a lot of Perl when dealing with large volumes of text data. Python is also a good choice (or Ruby, if that’s your thing), but I’m tainted since I learned Perl first, while a wee intern at Sun.

Tuesday, July 15, 2008 | adaptation, vida, zipdecode  

Parsing Numbers by the Bushel

While taking a look at the code mentioned in the previous post, I noticed two things. First, the PointCloud.pde file drops directly into OpenGL-specific code (rather than the Processing API) for the sake of speed when drawing thousands and thousands of points. It’s further proof that I need to finish the PShape class for Processing 1.0, which will handle this sort of thing automatically.

Second is a more general point about parsing. This isn’t intended as a nitpick on Aaron’s code (it’s commendable that he put his code out there for everyone to see—and uh, nitpick about). But seeing how it was written reminded me that most people don’t know about the casts in Processing, particularly when applied to whole arrays, and this can be really useful when parsing data.

To convert a String to a float (or int) in Processing, you can use a cast, for instance:

String s = "667.12";
float f = float(s);

This also in fact works with String[] arrays, like the kind returned by the split() method while parsing data. For instance, in SceneViewer.pde, the code currently reads:

String[] thisLine = split(raw[i], ",");
points[i * 3] = new Float(thisLine[0]).floatValue() / 1000;
points[i * 3 + 1] = new Float(thisLine[1]).floatValue() / 1000;
points[i * 3 + 2] = new Float(thisLine[2]).floatValue() / 1000;

Which could be written more cleanly as:

String[] thisLine = split(raw[i], ",");
float[] f = float(thisLine);
points[i * 3 + 0] = f[0] / 1000;
points[i * 3 + 1] = f[1] / 1000;
points[i * 3 + 2] = f[2] / 1000;

However, to his credit, Aaron may have intentionally skipped it in this case, since he doesn’t need the whole line of numbers.

If you’re using the Processing API with Eclipse or some other IDE, the float() cast won’t work for you. You can substitute the parseFloat() method instead:

String[] thisLine = split(raw[i], ",");
float[] f = parseFloat(thisLine);
points[i * 3 + 0] = f[0] / 1000;
points[i * 3 + 1] = f[1] / 1000;
points[i * 3 + 2] = f[2] / 1000;

The same can be done for int, char, byte, and boolean. You can also go the other direction by converting float[] or int[] arrays to String[] arrays using the str() method. (The method is named str() because a String() cast would be awkward, a string() cast would be error prone, and it’s not really parseStr() either.)

When using parseInt() and parseFloat() (versus the int() and float() casts), it’s also possible to include a second parameter that specifies a “default” value for missing data. Normally, the default is Float.NaN for parseFloat(), or 0 with parseInt() and the others. When parsing integers, 0 and “no data” often have a very different meaning, in which case this can be helpful.
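
For example (hypothetical values, just to show the second parameter in action):

String[] raw = { "12", "", "0", "7" };
// the blank entry becomes -1 instead of the default 0, so a real zero
// can't be confused with missing data
int[] counts = parseInt(raw, -1);
println(counts);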

Tuesday, July 15, 2008 | parse  

Radiohead – House of Cards

Radiohead’s new video for “House of Cards” built using a laser scanner and software:

Aaron Koblin, one of Casey’s former students, was involved in the project and also made use of Processing for the video. He writes:

A couple of hours ago was the release of a project I’ve been working on with Radiohead and Google. Lots of laser scanner fun.

I released some Processing code along with the data we captured to make the video. Also tried to give a basic explanation of how to get started using Processing to play with all this stuff.

The project is hosted at code.google.com/radiohead, where you can also download all the data for the point clouds captured by the scanner, as well as Processing source code to render the points and rotate Thom’s head as much as you’d like. This is the download page for the data and source code.

They’ve also posted a “making of” video:

(Just cover your ears toward the end where the director starts going on about “everything is data…”)

Sort of wonderful and amazing that they’re releasing the data behind the project, opening up the possibility for a kind of software-based remixing of the video. I hope their leap of faith will be rewarded by individuals doing interesting and amazing things with the data. (Nudge, nudge.)

Aaron’s also behind the excellent Flight Patterns as well as The Sheep Market, both highly recommended.

Tuesday, July 15, 2008 | data, motion, music  

Derek Jeter Probably Didn’t Need To Jump To Throw That Guy Out

Derek Jeter vs. Objective Reality is an entertaining article from Slate regarding a study by Shane T. Jensen at the Wharton School. Nate DiMeo writes:

The take-away from the study, which was presented at the annual meeting of the American Association for the Advancement of Science, was that Mr. Jeter (despite his three Gold Gloves and balletic leaping throws) is the worst-fielding shortstop in the game.

The New York press was unhappy, but the stats-minded baseball types (Sabermetricians) weren’t that impressed. DiMeo continues:

Mostly, though, the paper didn’t provoke much intrigue because Jeter’s badness is already an axiom of [Sabermetric literature]. In fact, debunking the conventional wisdom about the Yankee captain’s fielding prowess has become a standard method of proving the validity of a new fielding statistic. That places Derek Jeter at the frontier of new baseball research.

Well put. Mr. Jeter defended himself by saying:

“Maybe it was a computer glitch”

What I like about the article, aside from an objective and quantitative reason to dislike Jeter (I already have a quantity of subjective reasons), is how it frames the issue in the broader sports statistics debate. It nicely covers this new piece of information as a microcosm of the struggle between sabermetricians and traditional baseball types, while essentially poking fun at both: the total refusal of the traditional side to buy into the numbers, and the schadenfreude of the geeks going after Jeter since he’s the one who gets the girls. (The article is thankfully not as trite as that, but you get the idea.)

I’m also biased, since the metric in the paper places Pokey Reese, one of my favorite Red Sox players of 2004, at #11 amongst second basemen from 2000 to 2005.

And of course, The Onion does it better:

Experts: ‘Derek Jeter Probably Didn’t Need To Jump To Throw That Guy Out’

BRISTOL, CT—Baseball experts agreed Sunday that Derek Jeter, who fielded a routine ground ball during a regular-season game in which the Yankees were leading by five runs and then threw it to first base using one of his signature leaps, did not have to do that to record the out. “If it had been a hard-hit grounder in the hole or even a slow dribbler he had to charge, that would’ve been one thing,” analyst John Kruk said during a broadcast of Baseball Tonight. “But when it’s hit right to him by [Devil Rays first-baseman] Greg Norton, a guy who has no stolen bases and is still suffering the effects of a hamstring injury sustained earlier this year… Well, that’s a different story.” Jeter threw out Norton by 15 feet and pumped his fist in celebration at the end of the play.

In other news, I can’t believe I just put a picture of Jeter on my site.

Monday, July 14, 2008 | baseball, mine, sports  

Storyboarding with the Coen Brothers

Wonderful article about the work of J. Todd Anderson, who storyboards the Coen Brothers’ movies:

Anderson’s drawings have a jauntiness that seems absent from the more serious cinematic depiction; Anderson says he is simply trying to inject as much of a sense of action as possible into each scene.

Anderson describes the process of meeting about a new film:

“It’s like they’re making a movie in front of me,” he says. “They tell me the shots. I do fast and loose drawings on a clipboard with a Sharpie pen—one to three drawings to a sheet of regular bond paper. I try to establish the scale, trap the angle, ID the character, get the action.”

More in the article

Friday, June 27, 2008 | drawing, movies  

National Traffic Scorecard

The top 100 most congested metropolitan areas, visualized as a series of tomato stems:


Includes links to PDF reports for each area which detail overall congestion and the worst bottlenecks.

Thursday, June 26, 2008 | mapping, traffic  

Paternalism at the state level and the definition of “advice”

Following up on an earlier post, The New York Times jumps in with more about California (and New York before it) shutting down personal genomics companies, including this curious definition of advice:

“We think if you’re telling people you have increased risk of adverse health effects, that’s medical advice,” said Ann Willey, director of the office of laboratory policy and planning at the New York State Department of Health.

The dictionary confirmed my suspicion that advice refers to “guidance or recommendations concerning prudent future action,” which doesn’t coincide with telling people they have an increased risk for a disease. If they told you to take medication based on that risk, it would most certainly be advice. But as far as I know, the extent of the advice given by these companies is to consult a doctor for…advice.

As in the earlier post, the health department in California continues to sound nutty:

“We started this week by no longer tolerating direct-to-consumer genetic testing in California,” Karen L. Nickel, chief of laboratory field services for the state health department, said during a June 13 meeting of a state advisory committee on clinical laboratories.

We will not tolerate it! These tests are a scourge upon our society! The collapse of the housing loan market, high gas prices, and the “great trouble or suffering” brought on by this beast that preys on those with an excess of disposable income. Someone has to save these people who have $1000 to spare on self-curiosity! And the poor millionaires spending $350,000 to get their genome sequenced by Knome. Won’t someone think of the millionaires!?

I wish I still lived in California, because then I would know someone was watching out for me.

For the curious, the letters sent to the individual companies can be found here; sadly, they aren’t any more insightful than the comments to the press. But speaking of scourges—the notices are all Microsoft Word files.

One interesting tidbit closing out the Times article:

Dr. Hudson [director of the Genetics and Public Policy Center at Johns Hopkins University] said it was “not surprising that the states are stepping in, in an effort to protect consumers, because there has been a total absence of federal leadership.” She said that if the federal government assured tests were valid, “paternalistic” state laws could be relaxed “to account for smart, savvy consumers” intent on playing a greater role in their own health care.

It’s not clear whether this person is just making a trivial dig at the federal government or whether this is the root of the problem. In the previous paragraph she’s being flippant about “Genes R Us,” so it might be just a swipe, but it’s an interesting point nonetheless.

Thursday, June 26, 2008 | genetics, government, privacy, science  

Surfing, Orgies, and Apple Pie

Obscenity law in the United States is based on Miller vs. California, a precedent set in 1973:

“(a) whether the ‘average person, applying contemporary community standards’ would find that the work, taken as a whole, appeals to the prurient interest,

(b) whether the work depicts or describes, in a patently offensive way, sexual conduct specifically defined by the applicable state law, and

(c) whether the work, taken as a whole, lacks serious literary, artistic, political, or scientific value.”

Of course, the definition of an average person or community standards isn’t quite as black and white as most Supreme Court decisions. In a new take, the lawyer defending the owner of a pornography site in Florida is using Google Trends to produce what he feels is a more accurate definition of community standards:

In the trial of a pornographic Web site operator, the defense plans to show that residents of Pensacola are more likely to use Google to search for terms like “orgy” than for “apple pie” or “watermelon.” The publicly accessible data is vague in that it does not specify how many people are searching for the terms, just their relative popularity over time. But the defense lawyer, Lawrence Walters, is arguing that the evidence is sufficient to demonstrate that interest in the sexual subjects exceeds that of more mainstream topics — and that by extension, the sexual material distributed by his client is not outside the norm.

Below, “surfing” in blue, “orgy” in red, and “apple pie” in orange:


A clever defense. The trends can also be localized to roughly the size of a large city or county, which arguably might be considered the “community.” The New York Times article continues:

“Time and time again you’ll have jurors sitting on a jury panel who will condemn material that they routinely consume in private,” said Mr. Walters, the defense lawyer. Using the Internet data, “we can show how people really think and feel and act in their own homes, which, parenthetically, is where this material was intended to be viewed,” he added.

Fascinating that there could actually be something even remotely quantifiable about community standards. “I know it when I see it” is inherently subjective, so is any introduction of objectivity an improvement? For more perspective, I recommend this article from FindLaw, which describes the history of “Movie Day” at the Supreme Court and the evolution of obscenity law.

The trends data has many inherent problems (lack of detail, for one), but it’s another indicator of what we can learn from Google. Most important to me, the case provides an example of what it means for search engines to capture this information, because it demonstrates to the public at large (not just people who think about data all day) how the information can be used. As more information is collected about us, search engine data provides an imperfect mirror onto our society, of a kind previously available only to psychiatrists and priests.

Tuesday, June 24, 2008 | online, privacy, retention, social  

Typography Grab Bag: Berlow, Carter, and Indiana Jones

Indiana Jones and the Fonts on the Maps – Mark Simonson takes on historical accuracy of the typography used in the Indiana Jones movies:

For the most part, the type usage in each of the movies is correct for the period depicted. With one exception: The maps used in the travel montages.

My theory is that this is because the travel maps are produced completely outside the standard production team. They’re done by some motion graphics house, outside the purview of the people on-set who are charged with issues of consistency. A nastier version of this theory might indict folks who do motion graphics for not knowing their typography and its time period—instead relying on the “feel” of the type when selecting. The bland version of this theory is that type history is esoteric, and nobody truly cares.

(Also a good time to point out how maps are used as a narrative device in the film, to great effect. The red line extending across the map is part of the Indiana Jones brand. I’d be curious to hear the story behind the mapping—who decided it needed to be there, who made it happen, who said “let’s do a moving red line that tracks the progress”—which parts were intentional, and which unintentional.)

Identifying the period for the faces reminded me of a 2005 profile of Matthew Carter, which described his involvement in court cases where a document’s date was in doubt, but the typography of the artifacts in question gave away their era. Sadly the article cannot be procured from the web site of The New Yorker, though you may have better luck if you possess a library card. Matthew Carter designed the typefaces Verdana and Bell Centennial (among many others). Spotting his wispy white ponytail around Harvard Square is a bit like seeing a rock star, if you’re a Cantabridgian typography geek.

From A to Z, font designer knows his type – a Boston Globe interview with type designer David Berlow (one of the founders of Font Bureau). Some of the questions are unfortunate, but there are a few interesting anecdotes:

Playboy magazine came to me; they were printing with two printing processes, offset and gravure. Gravure (printing directly from cylinder to paper), gives a richer, smoother texture when printing flesh tones and makes the type look darker on the page than offset (indirect image transfer from plates). So if you want the type to look the same, you have to use two fonts. We developed two fonts for Playboy, but they kept complaining that the type was still coming out too dark or too light. Finally, I got a note attached to a proof that said, “Sorry. It was me. I needed new glasses. Thanks for all your help. Hef.” That was Hugh Hefner, of course.

Or speaking about his office:

From Oakland, Calif., to Delft, Holland, all the designers work from home. I have never been to the office. The first time I saw it was when I watched the documentary “Helvetica,” which showed our offices.


The strange allure of making your own fonts – Jason Fagone describes FontStruct, a web-based font design tool from FontShop:

FontStruct’s interface couldn’t be more intuitive. The central metaphor is a sheet of paper. You draw letters on the “sheet” using a set of standard paint tools (pencil, line, box, eraser) and a library of what FontStruct calls “bricks” (squares, circles, half-circles, crescents, triangles, stars). If you keep at it and complete an entire alphabet, FontStruct will package your letters into a TrueType file that you can download and plunk into your PC’s font folder. And if you’re feeling generous, you can tell FontStruct to share your font with everybody else on the Internet under a Creative Commons license. Every font has its own comment page, which tends to fill with praise, practical advice, or just general expressions of devotion to FontStruct.

Though I think my favorite bit might be this one:

But the vast majority of FontStruct users aren’t professional designers, just enthusiastic font geeks.

I know that because I’m one of them. FontStruct brings back a ton of memories; in college, I used to run my own free-font site called Alphabet Soup, where I uploaded cheapie fonts I made with a pirated version of a $300 program called Fontographer. Even today, when I self-Google, I mostly come up with links to my old, crappy fonts. (My secret fear is that no matter what I do as a reporter, the Monko family of fonts will remain my most durable legacy.)

The proliferation of bad typefaces: the true cost of software piracy.

Tuesday, June 17, 2008 | grabbag, mapping, refine, software, typography  

Personal genetic testing gets hilarious before it gets real

Before I even had a chance to write about personal genomics companies 23andMe, Navigenics, and deCODEme, Forbes reports that the California Health Department is looking to shut them down:

This week, the state health department sent cease-and-desist letters to 13 such firms, ordering them to immediately stop offering genetic tests to state residents.

Because of advances in genotyping, it’s possible for companies to detect changes at half a million data points (or soon, a million) of a person’s genome. The idea behind genotyping is that you look only for the single-letter changes (SNPs) that are most likely to differ between individuals, and then use those to create a profile of similarities and differences. So companies have sprung up, charging $1000 (ok, $999) a pop to decode these bits of your genome. The results can then tell you some basic things about ancestry, or maybe a little about susceptibility for certain kinds of diseases (those that have a fairly simple genetic makeup—of which there aren’t many, to be sure).
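As a toy illustration of that comparison idea (the SNP names and letters below are made up; real panels run to half a million markers):

```python
# Toy comparison of two genotype profiles. Each (hypothetical) SNP position
# maps to the pair of letters a person carries there -- entirely made-up data.
person_a = {"rs0001": "AG", "rs0002": "CC", "rs0003": "TT"}
person_b = {"rs0001": "AG", "rs0002": "CT", "rs0003": "TT"}

# Count the positions where the two profiles carry the same letters.
shared = [snp for snp in person_a if person_a[snp] == person_b[snp]]
print(f"identical at {len(shared)} of {len(person_a)} positions: {shared}")
```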

Lea Brooks, spokesperson for the California Health Department, confirmed for Wired that:

…the investigation began after “multiple” anonymous complaints were sent to the Health Department. Their researchers began with a single target but the list of possible statute violators grew as one company led to another.

Listen folks, this is not just one California citizen, but two or more anonymous persons! Perhaps one of them was a doctor or an insurance firm that had been denied its cut of the $1000:

One controversy is that some gene testing Web sites take orders directly from patients without a doctor’s involvement.

Well now, that is a controversy! Genetics has been described as the future of medicine, and yet traditional drainers of wallets (is drainer a word?) in the current health care system have been sadly neglected. The Forbes article also describes the nature of the complaints:

The consumers were unhappy about the accuracy [of the tests] and thought they cost too much.

California residents will surely be pleased that the health department is taking a hard stand on the price of boutique self-testing. As soon as they finish off these scientifimagical “genetic test” goons, we could all use a price break on home pregnancy tests.

And as to the accuracy of such tests, or what can be ascertained from them? That’s certainly been a concern of the genetics community, and in fact 23andMe has “admitted its tests are not medically useful, as they represent preliminary findings, and so are merely for educational purposes.” Which is perfectly clear to someone visiting their site; however, that presents a bigger problem:

“These businesses are apparently operating without a clinical laboratory license in California. The genetic tests have not been validated for clinical utility and accuracy,” says Nickel.

So an accurate, clinical-level test is illegal. But a less accurate, do-it-yourself (without a doctor) test is also illegal. And yet, California’s complaint gets more bizarre:

“And they are scaring a lot of people to death.”

Who? The people who were just complaining about the cost of the test? That’s certainly a potential problem if you don’t do testing through a doctor—and in fact, it’s a truly significant concern. But who purchases a $999 test from a site with the cartoon characters seen above to check for Huntington’s disease?

And if “scaring people” were the problem, wouldn’t the papers and the nightly news be all over it? The only thing they love more than a new scientific technology that’s going to save the world is a new scientific technology to be scared of. Ooga booga! Fearmongering hits the press far more quickly than it does the health department, so this particular line of argument just sounds specious.

The California Health Department does an enormous disservice to the debate of a complicated issue by mixing several lines of reasoning that, taken as a whole, simply contradict one another. The role of personal genetic testing in our society deserves debate and consideration; I thought I would be able to post about that part first, but instead the CA government beat me to the dumb stuff.

Thomas Goetz, deputy editor at Wired, has had two such tests (clearly he’s not unhappy with the price) and angrily responds, “Attention, California Health Department: My DNA Is My Data.” It’s not just those anonymous Californians who are wound up about genetic testing; he’s writing his sternly worded letter as we speak:

This is my data, not a doctor’s. Please, send in your regulators when a doctor needs to cut me open, or even draw my blood. Regulation should protect me from bodily harm and injury, not from information that’s mine to begin with.

Are angry declarations of ownership of one’s health data a new thing? It’s not as though most people fight this way for their doctor’s office records, or even for something as simple as a fingerprint.

It’ll be interesting to see how this shakes out. Or it might not, since it will probably consist of:

  1. A settlement by the various companies to continue doing business.
  2. Some means of doctors and insurance companies getting paid (requiring a visit, at a minimum).
  3. People trying to circumvent #2 (see related topics filed under “H” for Human Growth Hormone).
  4. An entrepreneur figures out how to do it online and at large scale (think WebMD), turning out new hordes of “information”-seeking hypochondriacs to fret about their 42% potential alternate likelihood maybe chance of genetic malady. (You have brain cancer too!? OMG!)
  5. If this hits mainstream news, will people hear about the outcome of #1, or will there be an assumption that “personal genetic tests are illegal” from here on out? How skittish will this make investors (the Forbes set) about such companies?

Then again, I’ve already proven myself terrible at predicting the future. But I’ll happily enjoy the foolishness of the present.

Tuesday, June 17, 2008 | genetics, privacy, science  

Iron Woman

Apropos of the recent film graphics post, Jessica Helfand at Design Observer writes about the recently released Iron Man:

Iron Man is the fulfillment of all that computer-integrated movies were ever meant to be, and by computer-integrated, I mean just that: beyond the technical wizardry of special effects, this is a film in which the computer is incorporated, like a cast member, into the development of the plot itself.

I’ve not seen the movie, but the statement appears to be provocative enough to elicit cheers and venom from the scribes in the comments section. (This seems to be common at Design Observer; are designers really this angry and unhappy? How ’bout them antisocial personal attacks! I take back what I wrote in the last post about wanting to be a designer when I grow up. You need some thick skin, or self-fashioned military-grade body armor, over at DO.)

On the other hand, a more helpful post linked to the lovely closing title sequence, designed by Danny Yount of Prologue.


I wish they didn’t use Black Sabbath. Is that really the way it’s done in the film? Paranoid is a great album (even if Iron Man is my least favorite track) but the titles and the music couldn’t have less to do with each other. Enjoy the music or enjoy the video; just don’t do ’em together.

Saturday, June 14, 2008 | motion, movies  

All the water in the world

From a post by Dan Phiffer, an image by Adam Nieman and the Science Photo Library.

All the water in the world (1.4087 billion cubic kilometers of it) including sea water, ice, lakes, rivers, ground water, clouds, etc. Right: All the air in the atmosphere (5140 trillion tonnes of it) gathered into a ball at sea-level density. Shown on the same scale as the Earth.
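For a sense of where the size of that ball comes from, the diameter follows directly from the volume quoted in the caption. A quick back-of-the-envelope sketch:

```python
import math

# All the water in the world, per the caption above.
volume_km3 = 1.4087e9

# Solve V = (4/3) * pi * r^3 for the radius of the equivalent sphere.
radius_km = (3 * volume_km3 / (4 * math.pi)) ** (1 / 3)
print(f"diameter: about {2 * radius_km:,.0f} km")   # roughly 1,390 km across
```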


More information at the original post. (Thanks to Eugene for the link.)

Saturday, June 14, 2008 | infographics, scale  

Rick Astley & Ludacris

Someday I want to write like Ludacris, but for now I’ll enjoy info graphics of his work. Luda not only knows a lot of young ladies, but can proudly recite the range of area codes in which they live. Geographer (and feminist) Stefanie Gray took it upon herself to make a map:


You’ll want background music while taking a look, and I found a quick refresher on the lyrics informative as well. More discussion and highlights of her findings can be found at Strange Maps, which first published Stefanie’s image.

In related news, someone else has figured out Rick Astley:


I’ve added the album cover at left so that you can look into his eyes and see his honest face for yourself. If you’re not a proud survivor of the 80s (or perhaps if you are), the single can be had for a mere 99¢. Or if that only gets you started, you can pick up his Greatest Hits. Someone also made another version of the graphic using the Google chart API (mentioned earlier), though it appears less analytically sound (accurate).

More from song charts at this earlier post.

Saturday, June 14, 2008 | infographics, music  

Paola Antonelli on Charlie Rose

This is from May, and the Design and the Elastic Mind show has now finished, but Paola Antonelli’s interview with Charlie Rose is well worth watching.

Paola’s incredibly sharp. Don’t turn it off in the first few minutes, however; I found that it wasn’t until about five or even ten minutes into the show that she began to sound like herself. I guess it takes a while to get past the requisite television pleasantries and the basic design-isms.

The full transcript doesn’t seem to be freely available, but here are some excerpts:

And I believe that design is one of the highest forms of human creative expression.

I would never dare say that! But I’ll secretly root for her making her case.

And also, I believe that designers, when they’re good, take revolutions in science and in technology, and they transform them into objects that people like us can use.

Doesn’t that make you want to be a designer when you grow up?

Regarding the name of the show, and the notion of elasticity:

…it was about showing how we need to adapt to different conditions every single day. Just work across different time zones, go fast and slow, use different means of communication, look at things at different scales. You know, some of us are perfectly elastic. And instead, some others get a little bit of stretch marks. And some others just cannot deal with it.

And designers help us cope with all these changes.

Her ability to speak plainly and clearly reinforces her point about designers and their role in society. (And if you don’t agree, consider what sort of garbage she could have said, or rather what most would have said, speaking about such a trendy oh-so-futuristic show.)

In the interest of full disclosure, she does mention my work (very briefly), but that’s not until about halfway through, so it shouldn’t interfere with your enjoyment of the rest of the interview.

Thursday, June 12, 2008 | iloveme, speaky  

Spying on teenagers: too much information

Excellent article from the Boston Globe Sunday Magazine on how parents of teenagers are handling their over-connected kids. Cell phones, text messaging, instant messaging, Facebook, MySpace, and to a lesser extent (for this age group) email mean that a lot of information and conversation is shared and exchanged. And as with all new technologies, it can all be tracked and recorded, and more easily spied upon. (More easily meaning that a parent can read a day’s worth of IM logs in a fairly quick sitting—something that couldn’t be done with a day’s worth of telephone conversations.) There are obvious and direct parallels to the U.S. government monitoring its own citizens, but I’ll return to that in a later post.

The article starts with a groan:

One mom does her best surveillance in the laundry room. Her teenage son has the habit of leaving his cellphone in the pocket of his jeans, so in between sorting colors and whites, she’ll grab his phone and furtively scroll through his text messages from the past week to see what he’s said, whom he’s connected with, and where he’s been.

While it’s difficult to say what this parent was specifically hoping to find (or what she’d do with the information), the scenario worsens as it sinks to a level of cattiness:

Sometimes, she’ll use her own phone to call another mom she’s friendly with and share her findings in hushed tones.

Further in, some insight from Sherry Turkle:

MIT professor Sherry Turkle is a leading thinker on the relationship between human beings and technology. She’s also the mother of a teenage girl. So she knows what she’s talking about when she says, “Parents were not built to know the kinds of things that technology makes possible.”

(Emphasis mine.) This doesn’t just go for parents; it’s part of a much bigger issue about spying on the day-to-day habits and ramblings of someone else. This is the same reason you should never read someone else’s email, whether it belongs to a significant other, a spouse, or a friend. No matter how well you know the sender and recipient, you’re still not them. You don’t think like them. You don’t see the world the way they do. You simply don’t have the proper context, nor an understanding of their relationship with one another. You probably don’t even have the entire thread of that one email conversation. I’ve heard from friends who read an email belonging to their significant other, only to wind up in tears and expecting the worst.

This scenario never ends well: you can either keep it in and remain upset, or you can confront the person, in which case one of two things will happen. One, your worst fear will be true (“he’s cheating!”), but you’ll be partially indicted in the mess because you’ve spied (“how could you read my email?”), and you’ll have lost whatever moral high ground you might otherwise have had (“I can’t believe you didn’t trust me”). Or two, you’ll have blown something out of proportion and destroyed the trust of that person: someone you cared about enough to be concerned to the point of reading their private email.

Returning to the article, one of the scenarios I found notable:

…there’s a natural desire, and a need, for teenagers to have their own parent-free zone as they get older.

As a graduating senior at Cambridge Rindge and Latin, Sam McFarland is grateful his parents trusted him to make the right decisions once he had established himself as worthy of the trust. A few of his friends had parents who were exceedingly vigilant. The result? “You don’t hang out at those kids’ houses as much,” Sam says.

So there’s something fascinating about this—not only is it detrimental to your kid’s development to be overly involved, but it also creates a socialization problem: they become ostracized (even if mildly) because of your behavior.

And when parents confront?

When one of his friends was 14, the kid’s parents reprimanded him for something he had talked about online. Immediately, he knew they had been spying on him, and it didn’t take long for him to determine they’d been doing it for some time. “He was pretty angry,” Sam says. “He felt kind of invaded.” At first, his friend behaved, conscious that his parents were watching his every move. “But then it reached a tipping point,” Sam says. “He became so fed up about it that, not only didn’t he care if they were watching, but he began acting out, hoping they were watching or listening so he could upset them.”

I’m certain that this would have been my response if my parents had done something like this. (As if teenagers need something to fuel their adversarial attitude toward their parents.) But now you have a situation where a reasonably good kid has made an active decision to behave worse in response to his parents’ mistrust and attempt to rein him in.

The article doesn’t mention what he had done, but how bad could it have been? And that is the crux of the situation: What do these parents really expect to find, and how can that possibly be outweighed by breaking that bond of trust?

It’s also easy to spy, so one (technology savvy) parent profiled goes with what he calls his “fear of God” speech:

Greg warned them, “I can know everything you’re doing online. But I’m not going to invade your privacy unless you give me a reason to.”

By relying on the threat of intervention rather than intervention itself, Greg has been able to avoid the drawbacks that several friends of mine told me they experienced after monitoring their teenagers’ IM and text conversations. These are all great, involved parents who undertook limited monitoring for the right reasons. But they found that, in their hunt for reassurance that their teenager was not engaging in dangerously bad behavior, they were instead worn down by the little disappointments – the occasional use of profanities or mean-spirited name-calling – as well as the mind-numbing banality of so much teen talk.

And that’s exactly it—tying together the points of 1) you’re not in their head and 2) what did you expect to find? As you act out in different ways (particularly as a teenager), you’re trying to figure out how things fit. Nobody’s perfect, and they need some room to be their own age, particularly with their friends. Which made me particularly interested in this quote:

Leysia Palen, the University of Colorado professor, says the work of social theorist Erving Goffman is instructive. Goffman talked about how we all have “front-stage” and “backstage” personas. For example, ballerinas might seem prim and perfect while performing, only to let loose by smoking and swearing as soon as they are behind the curtain. “Everyone needs to be able to retreat to the backstage,” Palen says. “These kids need to learn. Maybe they need to use bad language to realize that they don’t want to use bad language.”

Unfortunately the article also goes astray with its glorification of the multitasking abilities of today’s teenagers:

On an average weeknight, Tim has Facebook and IM sharing screen space on the Mac outside his bedroom as he keeps connected with dozens of friends simultaneously. His Samsung Slider cellphone rests nearby, ready to receive the next text message…Every once in a while, he’ll strum his guitar or look up at the TV to catch some Ninja Warrior on the G4 network. Playing softly in the background is his personal soundtrack that shuffles between the Beatles and a Swedish techno band called Basshunter. Amid all this, he is doing his homework.

Yes, in truly amazing fashion, the human race has somehow evolved in the last ten years to be capable of effectively multitasking between this many different things at once. I don’t understand why people (much less parents) buy this. We have a finite attention span, and technology suggests ways to carve it up into ever-smaller slices. I might balance email, phone calls, writing, and watching a Red Sox game in the background, but there’s no way I’m gonna claim that I’m somehow performing all those things at 100%, or even that as I focus in on one of them, I’m truly 100% at that task. Those will be my teenagers in the sensory deprivation tank while they work on Calculus and U.S. History.

And to close, a more accurate portrayal of multitasking:

It’s not uncommon to see two teenage pals riding in the back of a car, each one texting a friend somewhere else rather than talking to the friend sitting next to them. It’s a throwback to the toddler days, when kids engage in parallel play before they’re capable of sustained interaction.

Thursday, June 12, 2008 | overload, privacy  

You’ve never actually known what the question is

Douglas Adams addresses “What is the question?”, the mantra of Visualizing Data and of my Ph.D. dissertation, and something that hopefully haunts every visualization student I’ve ever taught:

The answer to the Great Question…?
“Forty-two,” said Deep Thought with infinite majesty and calm.
“Forty-two!” yelled Loonquawl, “Is that all you’ve got to show for seven and a half million years of work?”
“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”

The Hitchhiker’s Guide to the Galaxy

(Found at the FontForge FAQ)

Monday, June 9, 2008 | question, vida  

Making fun of movie infographics only gets you so far

As much as snickering about computers in movies might make me feel smart, I’ve since become fascinated by how software, and in particular information, is portrayed in film. There are many layers at work:

  1. Film is visual storytelling. As such, you have to be able to see everything that’s happening. Data is not visual, which is why symbols that represent the data show up more often: it’s 2012, but they’re still storing data on physical media, because at some point showing the data being moved is important. (Nevermind that it could be transmitted thousands of kilometers in a fraction of a second.) This is less interesting, since it means a sort of dumbing-down of the technology, and presents odd contradictions. It can also make things ugly: progress bars become full-screen interface elements, and how many technology-heavy action flicks have included the pursuit of a computer disk? (On the other hand, the non-visual aspect can be a positive one: a friend finishing film school at NYU once pursued a nanotechnology thriller as his final film because “you can’t see it.” It would allow him to tackle a technical subject without needing millions of dollars in props.)
  2. Things need to “feel” like a computer. When this piece appeared in the Hulk, they added extra gray interface elements in and around it so that it didn’t look too futuristic. Nevermind that it was a real, working piece of software for browsing the human genome. To the consternation of a friend who worked on Minority Report, the on-screen “windows” in the interface all had borders around them. If you have a completely fluid interface driven by hands and motion, accessing piles of video output from three people in a tank, do you really need…title bars?
  3. It’s not just computers—anything remotely complicated is handled in this manner. Science may be worse off than software, though I don’t think scientists complain as loudly as the geeks did when they heard “This is UNIX, I know this!” (My personal favorite in that one was a scene where a video phone discussion was actually an actor talking to a QuickTime movie—you could see the progress bar moving left to right as the scene wore on.)
  4. There’s a lot of superfluous gimmickry that goes on too. There’s just no way you’re gonna show important information in a film without random numbers twitching or counting down. Everything is more important when we know the current time with millisecond accuracy (that’s three digits after the decimal point for seconds). Or maybe some random software code (since that’s incomprehensible but seems significant). This is obvious and sometimes painful to watch, except in the case of a talented visual designer who makes it look compelling.
  5. Finally, the way that computers are represented in film has something to do with how we (society? lay people? them?) think that computers should work.

It’s that last one that is the fascinating point for me: by virtue of the intent to reach a large audience, a movie streamlines the way that information is handled and interfaces behave. At its best, it suggests where we need to go (at its worst, it blinks “Access Denied”). It’s easy to point out the ridiculousness of the room full of people hunched over computers at CIA headquarters and the guy saying “give me all people with last name Jones in the Baltimore area,” and in the next scene that’s tallied against satellite video (which of course can be enhanced ad infinitum). But think about how ridiculous those scenes looked twenty years ago, and the parts of that scenario that are no longer far-fetched as the population at large gets used to Google and to having satellite imagery available for the price of typing a query. Even the most outrageous—the imagery enhancement—has had breakthroughs associated with it, some of which can be done by anyone using Photoshop, like the case of people trying to figure out whether Bush was wearing a wire at the debates in 2004. (Contradicting their earlier denials, Bush’s people later admitted that he was wearing a bulletproof vest.)

That’s the end of today’s lecture on movie graphics, so I’ll leave you with a link to Mark Coleran, a visual designer who has produced many such sequences for film.


I recommend the large version of his demo reel, and I’ll be returning to this topic later with more designers. Drop me an email if you have favorite designer or film sequence.

Monday, June 9, 2008 | infographics, movies  

Somewhere between graffiti and terrorism

Matt Mullenweg, creator of WordPress, speaking at the “Future of Web Apps” conference in February:

Spammers are “the terrorists of Web 2.0,” Mullenweg said. “They come into our communities and take advantage of our openness.” He suggested that people may have moved away from e-mail and toward messaging systems like Facebook messaging and Twitter to get away from spam. But with all those “zombie bites” showing up in his Facebook in-box, he explained, the spammers are pouncing on openness once again.

I don’t think that “terrorists” is the right word—they’re not taking actions with an intent to produce fear that will prevent people from using online communities (much less killing bloggers or kidnapping Facebook users). What I like about this quote is the idea that “they take advantage of openness,” which puts it well. There needs to be a harsher way to describe this situation than “spamming” which suggests a minor annoyance. There’s nothing like spending a Saturday morning cleaning out the Processing discussion board, or losing an afternoon modifying the bug database to keep it safer from these losers. It’s a bit like people who crack machines out of maliciousness or boredom—it’s incredibly time consuming to clean up the mess, and incredibly frustrating when it’s something done in your spare time (like Processing) or to help out the group (during grad school at the ACG).

So it’s somewhere between graffiti and terrorism, but it doesn’t match either because the social impact at either end of that scale is incredibly different (graffiti can be a positive thing, and terrorism is a real world thing where people die).

On a more positive note, and for what it’s worth, I highly recommend WordPress. It’s obvious that it’s been designed and built by people who actually use it, which means that the interface is pleasantly intuitive. And not surprising that it was initially created by such a character.

Monday, June 9, 2008 | online, social  

The cloud over the rainforest brings a thunderstorm

And now, the opposite of the Amazon plot posted yesterday. No sooner had I finished writing about their online aptitude than they had a major site outage, greeting visitors with an “HTTP/1.1 Service Unavailable” message.


Plot from this article on News.com.

Friday, June 6, 2008 | goinuptotheserverinthesky, notafuturist  

Proper Analysis of Salary vs. Performance?

Got an email from Mebane Faber, who noted the roughly inverse correlation you currently see in salaryper and asked whether I’d done a proper year-end analysis. My response follows:

I threw the project together as sort of a fun thing out of curiosity, and haven’t taken the time to do a proper analysis. However you can see in the previous years that the inverse relationship happens each year at the beginning of the season, and then as it progresses, the big market teams tend to mow down the small guys. Or at least those that are successful–the correlation between salary and performance at the end of a season is generally pretty haphazard. In fact, it’s possible that the inverse correlation at the beginning of the season is actually stronger than the positive correlation at the end.

I think the last point is kinda funny, though I’d imagine there’s a less funny statistics term for that phenomenon. Such a fine line between funny and sounding important.

Friday, June 6, 2008 | feedbag, salaryper  

Distribution of the foreign customers at a particular youth hostel

Two pieces representing youth hostel data from Julien Bayle, both adaptations of code found in Visualizing Data. The first is a map:


The map looks like most data-connected-to-a-world-map images, but the second representation uses a treemap, which is much more effective (meaning that it answers his question much more directly).


Using the image as a background is a nice technique: if you’re not using colors to differentiate individual sectors, the treemap tends to be dominated by the outlines around the squares (search for treemap images and you’ll see what I mean). The background image lets you keep the border lines, but the visual weight of the image prevents them from taking over the foreground.

Anyone else with adaptations? Pass them along.

Thursday, June 5, 2008 | adaptation, vida  

I Think Somebody Needs A Hug

I tend to avoid reading online comments since they’re either overly negative or overly positive (neither is healthy), but I laughed out loud after happening across this comment from a post about salaryper on the Freakonomics blog at the New York Times site:

How do I become a “data visualization guru?”
Seems like a pretty sweet gig. But you probably need a degree in Useless Plots from Superficial Analysis School.

– Ben D.

No my friend, it takes a Ph.D. in Useless Plots from Superficial Analysis School. (And if you know this guy, please take him out for a drink — I’m concerned he’s been indoors too long.)

Thursday, June 5, 2008 | reviews, salaryper  

Obama Limited to 16 Bits

I guess I never thought I’d read about the 16-bit limitations of Microsoft Excel in the mainstream press (or at least outside the geek press), but here it is:

Obama’s January fundraising report, detailing the $23 million he raised and $41 million he spent in the last three months of 2007, far exceeded 65,536 rows listing contributions, refunds, expenditures, debts, reimbursements and other details.

Excel has since its inception been limited to 65,536 rows, the maximum count you get when the row number is stored in two bytes. Mr. Millionsfromsmallcontributions has apparently flown past this limit in his FEC reports, forcing poor reporters to either use Microsoft Access (a database program) or pray for the just-released Excel 2007, where in fact the row restriction has been lifted.

In the past the argument against fixing the restriction had always been a mixture of “it’s too messy to upgrade something like that” and “you shouldn’t have that many rows of data in a spreadsheet anyway, you should use a database.” Personally I disagree with the latter; and as silly as the former sounds, it’s been the case for a good 20 years (or was the row limit even lower back then?)

The OpenOffice project, for instance, has an entire page dedicated to fixing the issue in OpenOffice Calc, where they’re limited to 30,000 rows—the limit being tied to 32,768, or the number you get with 15 bits instead of 16 (use the sixteenth bit as a sign bit indicating positive or negative, and you can represent numbers from -32,768 to 32,767, instead of unsigned 16-bit values that range from 0 to 65,535).
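The arithmetic behind those limits is easy to check directly. A quick sketch (just the powers of two, nothing specific to any spreadsheet’s internals):

```python
# Row limits that fall out of a 16-bit row index.
unsigned_16 = 2 ** 16            # 65,536 distinct values, hence rows 1..65,536
signed_16_max = 2 ** 15 - 1      # 32,767: the top of a signed 16-bit range
signed_16_min = -(2 ** 15)       # -32,768: the bottom half, wasted on negative row numbers

print(f"unsigned 16-bit row count: {unsigned_16}")
print(f"signed 16-bit range: {signed_16_min} to {signed_16_max}")
```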

Bottoms up for the first post tagged both “parse” and “politics”.

Thursday, June 5, 2008 | parse, politics  

What’s that big cloud over the rainforest?

As the .com shakeout loomed in the late 90s, I always assumed that:

  1. Most internet-born companies would disappear.
  2. Traditional (brick & mortar) stores would eventually get their act together and have (or outsource) a proper online presence. For instance Barnes & Noble hobbling toward a usable site, and Borders just giving up and turning over their online presence to Amazon. The former comical, the latter brilliant, though Borders has just returned with their own non-Amazonian presence. (Though I think the humor is now gone from watching old-school companies trying to move online.)
  3. Finally, a few new names—namely the biggest ones, like Amazon—would be left that didn’t disappear with the others from point #1.

Basically, that not much would change. A couple of new brands would emerge, but there wasn’t really room in people’s heads for that many new retailers or services. (It probably didn’t help that all their logos were blue and orange, and that they had names like Flooz, Boo, and Kibu, which feel so natural on the tongue and inspire such buyer loyalty and confidence.)

But not only did more companies stick around, some seem to be successfully pivoting into other areas. From Amazon:

In January of 2008 we announced that the Amazon Web Services now consume more bandwidth than do the entire global network of Amazon.com retail sites.

This is from a blog post with this plot of bandwidth use for both sides of the business.

Did you imagine in 1998 that the site where you could buy books cheaper than anywhere else would, ten years later, push more bandwidth through its data storage and cloud computing services than through the store itself? Of course, this announcement doesn’t say anything about their profits at this point, but I don’t think anyone expected Steve Jobs to turn Apple into a toy factory, either, turning out music players and cell phones and having them become half the company’s business within just a few years. (That’s half as in, “beastly silver PCs and shiny black and white laptops seem important and all, but those take real work…why bother?”)

But the point (aside from subjecting you to a long-winded description of .com history and my shortcomings as a futurist) has more to do with Amazon becoming a business that deals purely in information. The information economy is all about people moving bits and ideas around (abstractions of things), instead of silk, furs, and spices (actual physical things). And while books are information, the growth of Amazon’s data services business—as evidenced by that graph—is one of the strongest indicators I’ve seen of just how real the non-real information economy has become. Not that the information economy is something new; rather, the groundwork laid in the preceding decades is what allows something like Amazon Web Services to be successful.

And since we’re on the subject of Amazon, I’ll close with more from Jeff Bezos from “How the Web Was Won” in this month’s Vanity Fair:

When we launched, we launched with over a million titles. There were countless snags. One of my friends figured out that you could order a negative quantity of books. And we would credit your credit card and then, I guess, wait for you to deliver the books to us. We fixed that one very quickly.

Or showing his genius early on:

When we started out, we were packing on our hands and knees on these cement floors. One of the software engineers that I was packing next to was saying, You know, this is really killing my knees and my back. And I said to this person, I just had a great idea. We should get kneepads. And he looked at me like I was from Mars. And he said, Jeff, we should get packing tables.

Thanks to Eugene for passing along the links.

Thursday, June 5, 2008 | goinuptotheserverinthesky, infographics, notaneconomist  

Movies, Mapping, and Motion Graphics

Elegantly done, and some of the driest humor in film titles you might ever see, the opening sequence from Death at a Funeral.

Excellent (and appropriate) music, color, and type; does a great job of setting up the film. IMDB description:

Chaos ensues when a man tries to expose a dark secret regarding a recently deceased patriarch of a dysfunctional British family

Or the tagline:

From director Frank Oz comes the story of a family that puts the F U in funeral.

Tuesday, June 3, 2008 | mapping, motion, movies  

Mark in Madrid

Mark Hansen is one of the nicest and most intelligent people you’ll ever meet. He was one of the speakers at the symposium at last Fall’s Visualizar workshop in Madrid, and Medialab Prado has now put the video of Mark’s talk (and others) online. Check it out:

Mark has a Ph.D. in Statistics and along with his UCLA courses like Statistical Computing and Advanced Regression, has taught one called Database Aesthetics, which he describes a bit in his talk. You might also be familiar with his piece Listening Post, which he created with Ben Rubin.

Tuesday, June 3, 2008 | speaky  

Goodbye 15 minutes: 1.5 seconds is the new real time

As cited on Slashdot, Google has announced that they’ll be providing real-time stock quotes from NASDAQ. As the title suggests, this “real time” isn’t likely the same “real time” that financial institutions get for their “quotes,” since the data still has to be processed and served up to you somehow. But for an old internet codger who thought quotes delayed by 15 minutes were pretty nifty back in 1995, this is just one more sign of the information apocalypse.


The Wall Street Journal is also in on the gig, and Allen Wastler of CNBC crows that they’re a player too. Interestingly, the data will be free from the WSJ at their Markets Data Center page—one more sign of a Journal that’s continuing to open its grand oak doors to give us plebes a peek inside the exclusive club.

An earlier post from the Google blog has some interesting details:

As a result, we’ve worked with the SEC, the New York Stock Exchange (NYSE) and our D.C. trade association, NetCoalition, to find a way to bring stock data to Google users in a way that benefits users and is practical for all parties. We have encouraged the SEC to ensure that this data can be made available to our users at fair and reasonable rates, and applaud their recent efforts to review this issue. Today, the NYSE has moved the issue a great step forward with a proposal to the SEC which if approved, would allow you to see real-time, last-sale prices…

The NYSE hasn’t come around yet, but the move by NASDAQ should give them the additional competitive push to make it happen soon enough. As it appears, this had more to do with getting SEC approval than the exchanges themselves. Which, if you think about it, makes sense—and if you think about it more, makes one wonder what sort of market-crashing scenario might be opened by millions having access to the live data. Time to write that movie script.

At right: CNBC’s publicity photo of Allen Wastler, which appears to have been shot in the 1930s and later hand-colorized. Upon seeing this, Wastler was then heard to say to the photo and paste-up people, “That’s amazing, can you also give me a stogie?” Who doesn’t want that coveted fat cat, robber baron blogger look.

Tuesday, June 3, 2008 | acquire  

Melting Ants for Science (or, Solenopsis invicta as Dross)

Another visualization from the see-through fish category: a segment from Sunday Morning about Dr. Walter Tschinkel, who studies the structure of ant colonies using aluminum casts. Three easy steps: heat aluminum to 1200 degrees, pour it down an ant hole, and carefully dig away to reveal the intricate structure of the interior:

What amazing structures! Whenever you think you’ve made something that looks “good,” you can count on nature to dole out humility. Maybe killing the ants in the process is a little way to get the control back. Um, or something.

(Pardon the crappy video quality and annoying ad… Tried to tape the real version from my cable box, but @#$%*! Comcast has CBS marked as a 5c protected “premium” channel. Riiiight.)

Thursday, May 29, 2008 | physical, science  

Summerschool in Wiesbaden


Scholz & Volkmer is running a Summerschool program this July and is looking for eight students from the USA and Europe. (Since “summer school” is one word, you may have already guessed that it’s based in Germany.) This is the group behind the SEE Conference that I spoke at in April. (Great conference, and the lectures are online; check ’em out.)

The program is run by their Technical Director (Peter), who is a great guy. They’re looking for topics like data visualization, mobile applications, interaction concepts, etc., and are covering flights and accommodations plus a small stipend during your four-week stay. Should be a great time.

Tuesday, May 27, 2008 | opportunities  

Schneier, Terrorists and Accuracy

Some thoughtful comments passed along by Alex Hutton regarding the last post:

Part of the problem with point technology solutions is in the policies of implementation.  IMHO, we undervalue the subject matter expert, or operate as a denigrated bureaucracy which does not allow the subject matter expert the flexibility to make decisions.  When that happens, the decision is left to technology (and as you point out, no technology is a perfect decision maker).

I thought it was apropos that you brought in the Schneier example.  I’ve been very much involved in a parallel thought process in the same industry as he, and we (my partner and I) are coming to a solution that attempts to balance technology, point human decision, and the bureaucracy within which they operate.

If you believe the Bayesians, then the right Bayesian network mimics the way the brain processes qualitative information to create a belief (or in the terms of Bayesians, a probability statement used to make a decision).  As such, the current way we use the technology (that policy of implementation, above) is faulty because it minimizes that “Human Computational Engine” for a relatively unsophisticated, unthinking technology.  That’s not to say that technologies like facial recognition are worthless – computational engines, even less magic ones that aren’t 99.99% accurate, are valid pieces of prior information (data).

Now in the same way, Human Computational Engines are also less than perfectly accurate.  In fact, they are not at all guaranteed to work the same way twice – even by the same person unless that person is using framework to provide rigor, rationality, and consistency in analysis.

So ideally, in physical security (or information security where Schneier and I come from) the imperfect computer detection engine is combined with a good Bayesian network and well trained/educated/experienced subject matter experts to create a more accurate probability statement around terrorist/non-terrorist – one that at least is better at identifying cases where more information is needed before a person is prevented from flying, searched and detained.  While this method, too, would not be 100% infallible (no solution will ever be), it would create a more accurate means of detection by utilizing the best of the human computational engine.

I believe the Bayesians, just 99.99% of the time.

Thursday, May 15, 2008 | bayesian, feedbag, mine, security  

Human Computation (or “Mechanical Turk” meets “Family Feud”)

richard_dawson.jpgComputers are really good at repetitive work. You can ask a computer to multiply two numbers together seven billion times and not only will it not complain, it’ll probably have seven billion answers for you a few seconds later. Ask a person to do the same thing and they’ll either walk away at the outset, realizing the ridiculousness of the task, or they’ll get through the first few tries and lose interest. But even the fact that a human can recognize the ridiculousness of the task is important. Humans are good at lots of things—like identifying a face in a crowd—that cannot be addressed by computation with the same level of accuracy.

Visualization is about the interface between what humans are good at, and what computers are good at. First, the computer can crunch all seven billion numbers, then present the results in a way that we can use our own perceptual skills to identify what’s important or interesting. (This is also why the design of a visualization is a fundamentally human task, and not something to be left to automation.)

This is also the subject of Luis von Ahn’s work at Carnegie Mellon. You’re probably familiar with CAPTCHA images—usually wavy numbers and letters that you have to discern when signing up for a webmail account or buying tickets from Ticketmaster. The acronym stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart,” a clever mouthful referring to Alan Turing’s work in discerning man or machine. (I encourage you to read about them, but this is already getting long so I won’t get into it here.)

More interesting than CAPTCHA, however, is the whole notion behind it: relying on humans to do what they’re best at, precisely because the task is difficult for computers. (Sure, in recent weeks people have actually found ways to “break” CAPTCHAs in specific cases, but that’s not important here.) For instance, the work was extended to the Google Image Labeler, described as follows:

You’ll be randomly paired with a partner who’s online and using the feature. Over a two-minute period, you and your partner will:

  • View the same set of images.
  • Provide as many labels as possible to describe each image you see.
  • Receive points when your label matches your partner’s label. The number of points will depend on how specific your label is.
  • See more images until time runs out.

Prior to this, most image labeling systems relied on getting volunteers to name or tag images individually. As you can imagine, the quality of the tags suffers considerably, because of everything from differences in how people perceive or describe what they see, to individuals who try to be a little too clever in choosing tags. With the Image Labeler game, that’s turned around: there’s an incentive to use tags that will match the other person’s, which minimizes the previous problems. (It’s “Mechanical Turk” meets “Family Feud”.) They’ve also applied the same ideas to scanning books—where fragments of text that cannot be recognized by software are instead checked by multiple people.
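The matching mechanic itself is simple to sketch. This is only an illustration of the agreement idea, not Google’s actual scoring rules or point values:

```python
# Toy version of the label-matching idea: two players tag the same image
# independently, and only the labels they agree on are kept.
def matched_labels(player_a, player_b):
    """Return the labels both players used for an image (case-insensitive)."""
    return {label.lower() for label in player_a} & {label.lower() for label in player_b}

a = ["dog", "Beach", "frisbee", "summer"]
b = ["beach", "dog", "ocean"]

print(matched_labels(a, b))   # the labels both players agree on: beach and dog
```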

More recently, von Ahn’s group has expanded these ideas in Games With A Purpose, a site that addresses these “casual games” more directly. The new site is covered in this New Scientist article, which offers additional tidbits (perspective? background? couldn’t think of the right word).

You can also watch Luis’ Google Tech Talk about Human Computation, which if I’m not mistaken, led to the Image Labeler project.

(We met Luis a couple of times while at CMU and watched the Super Bowl with his awesome fiancée Laura, cheering on her hometown Chicago Bears against those villainous Colts. We were happy when he received a MacArthur Fellowship for his work—he’s just the sort of person you’d like to see get an award that highlights people who often don’t quite fit in their field.)

Returning to the earlier argument, algorithms to identify a face in a crowd are certainly improving. But without a significant breakthrough, their usefulness will remain limited. One commonly hyped use for such systems is airport security. Bruce Schneier explains the problem:

Suppose this magically effective face-recognition software is 99.99 percent accurate. That is, if someone is a terrorist, there is a 99.99 percent chance that the software indicates “terrorist,” and if someone is not a terrorist, there is a 99.99 percent chance that the software indicates “non-terrorist.” Assume that one in ten million flyers, on average, is a terrorist. Is the software any good?

No. The software will generate 1000 false alarms for every one real terrorist. And every false alarm still means that all the security people go through all of their security procedures. Because the population of non-terrorists is so much larger than the number of terrorists, the test is useless. This result is counterintuitive and surprising, but it is correct. The false alarms in this kind of system render it mostly useless. It’s “The Boy Who Cried Wolf” increased 1000-fold.

Given the number of travelers at Boston Logan in 2006, that would be two “terrorists” identified per day. (And with Schneier’s one-in-ten-million figure, that would be two or three actual terrorists per year…clearly too generous, which makes the false-alarm problem even worse than he describes.) I find myself thinking about the 99.99% accuracy number as I stare at the backs of heads lined up at the airport security checkpoint—itself a human problem, not a computational problem.
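Schneier’s numbers are easy to reproduce. A back-of-the-envelope sketch using the figures from his example (99.99% accuracy both ways, one terrorist per ten million flyers):

```python
# Base-rate arithmetic behind the face-recognition example above.
accuracy = 0.9999                  # true-positive and true-negative rate
terrorist_rate = 1 / 10_000_000    # one in ten million flyers

flyers = 10_000_000
terrorists = flyers * terrorist_rate        # 1
innocents = flyers - terrorists             # 9,999,999

true_alarms = terrorists * accuracy         # ~1
false_alarms = innocents * (1 - accuracy)   # ~1,000

print(f"false alarms per real terrorist: {false_alarms / true_alarms:.0f}")
```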

Thursday, May 15, 2008 | cs, games, human, perception, security  

Gender and Information Graphics

Just received this in a message from a journalism grad student studying information graphics:

I have looked at 2 years worth of Glamour (and Harper’s Bazaar too) magazines for my project and it shows that Glamour and other women’s magazines have less amount of information graphics in the magazines compared to men’s magazines, such as GQ and Esquire. Why do you think that is? Do you think that is gender-related at all?

I hadn’t really thought about it much. For the record, my reply:

My fiancée (who knows a lot more about being female than I do) pointed out that such magazines have much less practical content in general, so it may have more to do with that than a specific gender thing. Though she also pointed out that, for instance, in today’s news about the earthquake in China, she felt that women might be more inclined to read a story with the faces of those affected than one with information graphics tallying or describing the same.

I think you’d need to find something closer to a male equivalent of Glamour so that you can cover your question and remove the significant bias you’re getting for the content. Though, uh, a male equivalent of Glamour may not really exist… But perhaps there are better options.

And as I was writing this, she responded:

Finding a male equivalent of Glamour is hard but they actually do have some hard-hitting stories near the back in every issue that sometimes might be overshadowed by all the fashion and beauty stuff. Actually, finding a female equivalent of GQ or Esquire is also hard because they sort of have a niche of their own too. I have to agree with your fiancée too, because, I studied Oprah’s magazines a little in my previous study and sometimes it is really about what appeals to their audience.

Well, my study does not imply causality and it sometimes might be hard to differentiate if the result was due to gender differences or content. So, it’s interesting to find all these out, and actually men’s magazines have about 5 times more information graphics than women’s magazines which is amazing.

Wow—five times more. (At least amongst the magazines that she mentioned.)

My hope in posting this (rather than just sharing the contents of my inbox…can you tell that I’m answering mail today?) is that someone else out there knows more about the subject. Please drop me a line if you do; I’d like to know more and to post a follow-up.

Monday, May 12, 2008 | gender, inbox, infographics  

Glagolitic Capital Letter Spidery Ha

A great Unicode in 5 Minutes presentation from Mark Lentczner at Linden Lab. He passed it along after reading this dense post, clearly concerned about the welfare of my readers.

(Searching out the image for the title of this post also led me to a collection of Favourite Unicode Codepoints. This seems ripe for someone to waste more time really tracking down such things and documenting them.)

Mark’s also behind Context Free, one of the “related initiatives” that we have listed on Processing.org.

Context Free is a program that generates images from written instructions called a grammar. The program follows the instructions in a few seconds to create images that can contain millions of shapes.

Grammars are covered briefly in the Parse chapter of vida, with the name of the language coming from a specific variety called Context Free Grammars. The magical (and manic) part of grammars is that their rules tend to be recursive and layered, which leads to a certain kind of insanity as you try to tease out how the rules work. With Context Free, Mark has instead turned this dizziness into the basis for creating visual form.
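
As a rough illustration of the recursive part (a Processing sketch of my own, not Context Free’s actual grammar syntax), the whole trick comes down to a single rule that keeps applying itself at a smaller scale until it bottoms out:

  void setup() {
    size(400, 400);
    background(255);
    noFill();
    circles(width/2, height/2, 300);
  }

  // one "rule": draw a circle, then apply the same rule again at 60% of the size
  void circles(float x, float y, float diameter) {
    if (diameter < 2) return;  // the base case keeps the recursion from running forever
    ellipse(x, y, diameter, diameter);
    circles(x, y, diameter * 0.6f);
  }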

Updated 14 May 08 to fix the glyph. Thanks to Paul Oppenheim, Spidery Ha Devotee, for the correction.

Monday, May 12, 2008 | feedbag, languages, parse, unicode  

So much for “wonderfully simple”

In contrast to the clarity and simplicity of the New York Times info graphic mentioned yesterday, the graphic currently on their home page demonstrates the opposite:

This is helpful, though, because it clarifies the point I tried to make about what was nice about the other graphic. Because of space limitations, this graphic is small, and the information is spread across multiple panels. At the top there is a pair of tabs; within each tab, a pair of buttons. Two tabs and four buttons, just to get at four possible pieces of data. That’s the sort of combinatoric magic we see in Microsoft Windows preference panels:


While the organization in the info graphic makes conceptual sense—first you must choose one of two states, then choose one of the candidates—it makes little cognitive sense. We’re choosing one of four options. Just give them to us! For a pair of items beneath another pair of items, there’s no need to establish a sense of hierarchy. If there were a half dozen states and a half dozen candidates, then that might make sense. Just because the data is technically hierarchic, or arranged in a tree, doesn’t mean that’s the best representation for it.

The solution? Just give us the four options. No sliding panels, trap doors, etc. Better yet, superimpose the Clinton and Obama data on a single map as different colors, and have a pair of buttons (not tabs!) that let the viewer quickly swap between Indiana and North Carolina.

(This only covers the interaction model, without getting into the way the data itself is presented, colors chosen, laid out, etc. The lack of population density information in the image makes the maps themselves nearly worthless.)

Tuesday, May 6, 2008 | infographics, interact, politics  

Average Distance to the Nearest Road in the Conterminous United States

Got an email over the weekend from Tom Vanderbilt, who had seen the All Streets piece, and was kind enough to point me to this map (PDF) from the USGS that depicts the average distance to the nearest road across the continental 48 states. (He’s currently working on a book titled Traffic: Why We Drive the Way We Do (and What It Says About Us) to be released this fall).

Too bad I only just learned the word conterminous: had I used it in the original project description, we would have missed (or been spared) the Metafilter discussion of whether “lower 48” was accurate terminology.


A really interesting map, which of course also shows the difference between something thrown together in a few hours and actual research. In digging around for the map’s source, I found that exactly a year ago, they also published a paper in Science describing their broader work:

Roads encroaching into undeveloped areas generally degrade ecological and watershed conditions and simultaneously provide access to natural resources, land parcels for development, and recreation. A metric of roadless space is needed for monitoring the balance between these ecological costs and societal benefits. We introduce a metric, roadless volume (RV), which is derived from the calculated distance to the nearest road. RV is useful and integrable over scales ranging from local to national. The 2.1 million cubic kilometers of RV in the conterminous United States are distributed with extreme inhomogeneity among its counties.

The publication even includes a response and a response to the response—high scientific drama! Apparently some lads feel that “roadless volume does not explicitly address ecological processes.” So let that be a warning to all you non-explicit addressers.

For those lucky enough to have access to the journal online, the supplementary information includes a time lapse video of a section of Colorado and its roadless volume since 1937. As with all things, it’s much more interesting to see how this changes over time: a map of all streets in the lower 48 isn’t nearly as engaging as a sequence showing the same area over several years. The latter is simply a more compelling story.

Tuesday, May 6, 2008 | allstreets, feedbag, mapping  

Unicode, character encodings, and the declining dominance of Western European character sets

Computers know nothing but numbers. As humans we have varying levels of skill in using numbers, but most of the time we’re communicating with words and phrases. So in the early days of computing, software developers had to find a way to map each character—a letter Q, the character #, or maybe a lowercase b—to a number. A table of characters would be made, usually either 128 or 256 of them, depending on whether data was stored or transmitted using 7 or 8 bits. Often the data would be stored as 7 bits, so that the eighth bit could be used as a parity bit, a simple method of error detection (because data transmission—we’re talking modems and serial ports here—was so error prone).
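
To make the parity bit concrete, here’s a tiny sketch (my own example, not any particular protocol) that takes a character’s 7-bit code and computes an even-parity bit for it:

  char c = 'b';
  int code = c;                       // 'b' is character number 98
  int ones = Integer.bitCount(code);  // count the 1 bits in the 7-bit value
  int parity = ones % 2;              // even parity: add a 1 only when that count is odd
  println(c + " is " + code + ", or " + Integer.toBinaryString(code) + " in binary, with parity bit " + parity);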

Early on, such encoding systems were designed in isolation, which meant that they were rarely compatible with one another. The number 34 in one character set might be assigned to “b”, while in another character set, assigned to “%”. You can imagine how that works out over an entire message, but the hilarity was lost on people trying to get their work done.

In the 1960s, the American National Standards Institute (or ANSI) came along and set up a proper standard, called ASCII, that could be shared amongst computers. It was 7 bits (to allow for the parity bit) and looked like:

  0 nul    1 soh    2 stx    3 etx    4 eot    5 enq    6 ack    7 bel
  8 bs     9 ht    10 nl    11 vt    12 np    13 cr    14 so    15 si
 16 dle   17 dc1   18 dc2   19 dc3   20 dc4   21 nak   22 syn   23 etb
 24 can   25 em    26 sub   27 esc   28 fs    29 gs    30 rs    31 us
 32 sp    33  !    34  "    35  #    36  $    37  %    38  &    39  '
 40  (    41  )    42  *    43  +    44  ,    45  -    46  .    47  /
 48  0    49  1    50  2    51  3    52  4    53  5    54  6    55  7
 56  8    57  9    58  :    59  ;    60  <    61  =    62  >    63  ?
 64  @    65  A    66  B    67  C    68  D    69  E    70  F    71  G
 72  H    73  I    74  J    75  K    76  L    77  M    78  N    79  O
 80  P    81  Q    82  R    83  S    84  T    85  U    86  V    87  W
 88  X    89  Y    90  Z    91  [    92  \    93  ]    94  ^    95  _
 96  `    97  a    98  b    99  c   100  d   101  e   102  f   103  g
104  h   105  i   106  j   107  k   108  l   109  m   110  n   111  o
112  p   113  q   114  r   115  s   116  t   117  u   118  v   119  w
120  x   121  y   122  z   123  {   124  |   125  }   126  ~   127 del

The lower numbers are various control codes, and the characters 32 (space) through 126 are actual printed characters. An eagle-eyed or non-Western reader will note that there are no umlauts, cedillas, or Kanji characters in that set. (This is the American National Standards Institute, after all, and to be fair, those were well outside their charge.) So while the immediate character encoding problem of the 1960s was solved for Westerners, other languages would still have their own encoding systems.

As time rolled on, the parity bit became less of an issue, and people were antsy to add more characters. Getting rid of the parity bit meant 8 bits instead of 7, which would double the number of available characters. Other encoding systems like ISO-8859-1 (also called Latin-1) were developed. These had better coverage for Western European languages, adding some umlauts we’d all been missing. The encodings kept characters 0–127 identical to ASCII, but defined characters numbered 128–255.

However, this still left a problem, even for Western languages: a Windows machine had a different definition for characters 128–255 than a Mac did. Windows used what was called Windows-1252, which was just close enough to Latin-1 (embraced and extended, let’s say) to confuse everyone and make a mess. And because they like to think different, Apple used their own standard, called Mac Roman, which had yet another colorful ordering for characters 128–255.

This is why there are lots of web pages that show squiggly marks or odd characters where em dashes or quotes should be found. If the author of a web page includes a tag in the HTML that declares the character set (saying essentially “I saved this on a Western Mac!” or “I made this on a Norwegian Windows machine!”), the problem is avoided, because the tag gives the browser a hint about what to expect in characters 128–255.
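
To see the mess firsthand, here’s a quick sketch (using the charset names registered on a typical Java VM; the Mac Roman name in particular may vary): the single byte a Mac would use for an em dash, decoded three different ways.

  import java.nio.charset.Charset;

  byte[] data = { (byte) 0xD1 };  // an em dash, as saved by a Mac Roman editor
  println(new String(data, Charset.forName("MacRoman")));      // the em dash we wanted
  println(new String(data, Charset.forName("ISO-8859-1")));    // a capital N with a tilde
  println(new String(data, Charset.forName("windows-1252")));  // same wrong letter again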

Those of you who haven’t fallen asleep yet may realize that even 200ish characters still won’t do—remember our Kanji friends? Such languages usually encode with two bytes (16 bits to the West’s measly 8), providing access to 65,536 characters. Of course, this creates even more issues because software must be designed to no longer think of characters as a single byte.

In the very early 90s, the industry heavies got together to form the Unicode consortium to sort out all this encoding mess once and for all. They describe their charge as:

Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.

They’ve produced a series of specifications, both for a wider character set (up to 4! bytes) and various methods for encoding these character sets. It’s truly amazing work. It means we can do things like have a font (such as the aptly named Arial Unicode) that defines tens of thousands of character shapes. The first of these (if I recall correctly) was Bitstream Cyberbit, which was about the coolest thing a font geek could get their hands on in 1998.

The most basic version of Unicode defines characters 0–65535, with the first 256 characters identical to Latin-1 (for some modicum of compatibility with older systems).

One of the great things about the Unicode spec is the UTF-8 encoding. The idea behind UTF-8 is that the majority of characters will be in that standard ASCII set. So if the eighth bit of a character is a zero, then the other seven bits are just plain ASCII. If the eighth bit is 1, then it’s some sort of extended format, and the remaining bits determine how many additional bytes (usually one or two) are required to encode the value for that character. It’s a very clever scheme because it degrades nicely, and provides a great deal of backward compatibility with the large number of systems still requiring only ASCII.
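
You can watch this happen by dumping the raw bytes of a string; a short sketch along these lines (just an illustration, not anything from the spec itself) shows the pattern:

  import java.nio.charset.Charset;

  String s = "aé漢";  // plain ASCII, a Latin-1 accent, and a CJK character
  for (byte b : s.getBytes(Charset.forName("UTF-8"))) {
    print(Integer.toBinaryString(b & 0xFF) + " ");
  }
  println();
  // prints 1100001 11000011 10101001 11100110 10111100 10100010 (leading zeros dropped):
  // 'a' stays a single byte, 'é' needs two, '漢' needs three, and every
  // continuation byte starts with 10.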

Of course, assuming that ASCII characters will be the most common is, to some, repeating the same bias as back in the 1960s. But I think this is an academic complaint, and the benefits of the encoding far outweigh the negatives.

Anyhow, the purpose of this post was to note that Google reported yesterday that Unicode adoption on the web has passed ASCII and Western European. This doesn’t mean that English language characters have been passed up, but rather that the number of pages encoded using Unicode (usually in UTF-8 format) has finally left behind the archaic ASCII and Western European formats. It’s a sign of us leaving the dark ages—almost 20 years since the internet was made publicly available, and since the start of the Unicode consortium, we’re finally starting to take this stuff seriously.

The Processing book also has a bit of background on ASCII and Unicode in an Appendix, which includes more about character sets and how to work with them. And future editions of vida will also cover such matters in the Parse chapter.

Tuesday, May 6, 2008 | parse, unicode, updates, vida  

Another delegate calculator

Wonderfully simple delegate calculator from the New York Times. Addresses a far simpler question than the previously mentioned Slate calculator, but bless the NYT for realizing that something that complicated was no longer necessary.


Good example of throwing out extraneous information to tell a story more directly: a quick left and right drag provides a more accurate depiction than the horse race currently in the headlines.

Monday, May 5, 2008 | election, politics, scenarios  

Doin’ stats for the C’s

A New York Times piece by the Freakonomics guys about Mike Zarren, the 32-year-old numbers guy for the Boston Celtics. While statistics has become more-or-less mainstream for baseball, the same isn’t quite true for basketball or football (though that’s changing too). They have better words for it than me:

This probably makes good sense for a sport like baseball, which is full of discrete events that are easily measured… Basketball, meanwhile, might seem too hectic and woolly for such rigorous dissection. It is far more collaborative than baseball and happens much faster, with players shifting from offense one moment to defense the next. (Hockey and football present their own challenges.)

But that’s not to say that nothing can be gained by looking at the numbers:

What’s the most efficient shot to take besides a layup? Easy, says Zarren: a three-pointer from the corner. What’s one of the most misused, misinterpreted statistics? “Turnovers are way more expensive than people think,” Zarren says. That’s because most teams focus on the points a defense scores from the turnover but don’t correctly value the offense’s opportunity cost — that is, the points it might have scored had the turnover not occurred.

Of course, the interesting thing about sports is that at their most basic, they cannot be defined by statistics or numbers. Take the Celtics, who just won the first round of the playoffs. Given their ability, the Celtics should have dispensed with the Hawks more quickly, rather than needing all seven games of the series to win the necessary four. The coach in the locker room of any Hoosiers ripoff will tell you it doesn’t matter what’s on the stat sheets, it matters who shows up that day. It’s the same reason that owners cannot buy a trophy even in a sport that has no salary cap. Or, if you’re like some of my in-laws-to-be (all Massachusetts natives), you might suspect that the fix is in (“How much money do those guys make per game?”). Regardless, it’s the human side of the sport, not the numbers, that makes it worth watching. (And I don’t mean the soft-focus ESPN “Outside the Lines” version of the “human” side of the sport. Yech.)

In the meantime, maybe the Patriots or the Sox are hiring…

(Passed along by Andy Oram, my editor for vida)

Monday, May 5, 2008 | sports  

Flash file formats opened?

Via Slashdot, word that Adobe is opening the SWF and FLV file formats through the Open Screen Project. On first read this seemed great—Adobe essentially re-opening the SWF spec. It was released under a less onerous license by Macromedia ca. 1998, but then closed back up again once it became clear that the competing vector-graphics-for-the-web proposals from Microsoft and others would not be actual competition. At the time, Microsoft had submitted an XML-based format called VML to the W3C, and the predecessor to SVG (called PGML) had been proposed by then-rival Adobe and friends.

On second read it looks like they’re trying to kill Android before it has a chance to get rolling. So history rhymes ten years later. (Shannon informs me that this may qualify as a pantoum).

But to their credit (I’m shocked, actually), both specs are online already:

The SWF (Flash file format) specification

The FLV (Flash video file format) specification

….and more important, without any sort of click-through license. (“By clicking this button you pledge your allegiance to Adobe Systems and disavow your right to develop for products and platforms not controlled or approved by Adobe or its partners. The aforementioned transferral of rights also applies to your next of kin as well as your extended network of business partners and/or (at Adobe’s discretion) lunch dates.”)

I’ve never been nuts about using “open” as a prefix for projects, especially as it relates to big companies hyping what do-gooders they are. It makes me think of the phrase “compassionate conservatism”. The fact that “compassionate” has to be added is more telling than anything else. They doth protest too much.

Thursday, May 1, 2008 | parse  

Design and the Elastic Mind

Perhaps three months late for an announcement, and at the risk of totally reckless narcissism, I should mention that four of my projects are currently on display in the Design and the Elastic Mind exhibition at the Museum of Modern Art in New York. My work notwithstanding, I hear that the show is generating lots of foot traffic and positive reviews, which is a well-deserved compliment to curator Paola Antonelli.

There’s a New York Times article and slide show (too much linking to the Times lately, weird…) and a writeup in the International Herald Tribune that even mentions my Humans vs. Chimps piece.

The first wall as you enter the show is all of Chromosome 18, done in the style of this piece.


It’s a 3 pixel font at 150 dpi, so there are 37.5 letters per inch in either direction, and the wall is about 20 feet square, making 75 million letters total. Paola and her staff asked whether it was OK to put the text on the piece itself, which I felt was fine, as the nature of the piece is about scale, and the printing would not detract from that. The funny side effect of this was watching people at the opening take one another’s picture in front of the piece, most of them probably not realizing that the wall itself was part of the exhibition. Perhaps my most popular work so far, given the number of family photos in which it will be found.

Former classmate Ron Kurti also took a nice detail shot:


Also in the show is the previously mentioned Humans vs. Chimps project as seen below:


This image is about three feet wide, so the letters can be read accurately. It’s found next to an identically sized print of isometricblocks depicting the CFTR region of the human genome (the area implicated in cystic fibrosis). The image was first developed for a Nature cover.


Finally, the Pac-Man print of distellamap is printed floor to ceiling on another wall in the exhibition. Unfortunately there was a glitch in the printing that caused the lines connecting portions of the code to be lost (because they’re too thin to see at a distance), but no matter.


Much more exciting for me than my own work, however, is the number of projects built with Processing that are in the show. It’s a bit humbling, and the sort of thing that makes me excited (and relieved) to have some time this summer to devote to Processing itself.

Wednesday, April 30, 2008 | iloveme  

Google Underwater

So that might not be the awesome name that they’ll be using, but CNET is rumormongering about Google cooking up something oceanographic along the lines of Maps or Earth. Their speculation includes this lovely image from the Lamont-Doherty Earth Observatory (LDEO) of Columbia University.


Unlike most people with a heartbeat, I didn’t find Google Maps particularly interesting on arrival. I was a fan of the simplicity of Yahoo Maps at the time (but no longer, eek!) and Microsoft’s Terraserver had done satellite imagery for a few years. But in the same way that Google Mars shows us something we’re even less familiar with than satellite imagery of Earth, there’s something really exciting about the possibility of seeing beneath the oceans.

Wednesday, April 30, 2008 | mapping, rumors, water  

Me blog big linky

Kottke and Freakonomics were kind enough to link over here, which has brought more queries about salaryper. Rather than piling onto the original web page, I’ll add updates to this section of the site.

I didn’t include the project’s back story with the 2008 version of the piece, so here goes:

Some background for people who don’t watch/follow/care about baseball:

When I first created this piece in 2005, the Yankees had a particularly bad year, with a team full of aging all-stars and owner George Steinbrenner hoping that a World Series trophy could be purchased for $208 million. The World Champion Red Sox did an ample job of defending their title, but as the second highest paid team in baseball, they were not exactly young upstarts. The Chicago White Sox had an excellent year with just one third the salary of the Yankees, while the Cardinals performed roughly on par with what they were paid. Interestingly, the White Sox went on to win the World Series. The performance of Oakland, which in previous years had far exceeded their overall salary, was a story largely about their General Manager Billy Beane, told in the book Moneyball.

Some background for people who do watch/follow/care about baseball:

I neglected to include a caveat on the original page that this is a really simplistic view of salary vs. performance. I created this piece because the World Series victory of my beloved Red Sox was somewhat bittersweet in the sense that the second highest paid team in baseball finally managed to win a championship. This fact made me curious about how that works across the league, with raw salaries and the general performance of the individual teams.

There are lots of proportional things that could be done too—the salaries especially exist across a wide range (the Yankees waaaay out in front, followed by another pack of big market teams, then everyone else).

There are far more complex things about how contracts work over multiple years, how the farm system works, and scoring methods for individual players that could be taken into consideration.

This piece was thrown together while watching a game, so it’s perhaps dangerously un-advanced, given the amount of time and energy that’s put into the analysis (and argument) of sports statistics.

That last point is really important… This is fun! I encourage people to try out their own methods of playing with the data. For those who need a guide on building such a beast, the book has all the explanation and all the code (which isn’t much). And if you adapt the code, drop me a line so I can link to your example.

I have a handful of things I’d like to try (such as a proper method for doing proportional spacing at the sides without overdoing it), though the whole point of the project is to strip away as much as possible, and make a straightforward statement about salaries, so I haven’t bothered coming back to it since it succeeds in that original intent.

Wednesday, April 30, 2008 | salaryper, updates, vida  

Updated Salary vs. Performance for 2008

It’s April again, which means that there are messages lurking in my inbox asking about the whereabouts of this year’s Salary vs. Performance project (found in Chapter 5 of the good book). I got around to updating it a few days ago, which means my inbox has now shifted to suggestions for how the piece might be improved. (It’s tempting to say, “Hey! Check out the book and the code, you can do anything you’d like with it! It’s more fun that way.” but that’s not really what they’re looking for.)

One of the best messages I’ve received so far is from someone who I strongly suspect is a statistician, and who wished to see a scatter plot of the data rather than its current representation. Who else would be pining for a scatter plot? There are lots of jokes about the statistically inclined that might cover this situation, but… we’re much too high-minded to let things devolve to that (actually, it’s more of a pot-kettle-black situation). If prompted, statisticians usually tell better jokes about themselves anyways.

At any rate, as it’s relevant to the issue of how you choose representations, my response follows:

Sadly, the scatter plot of the same data is actually kinda uninformative, since one of your axes (salary) is more or less fixed all season (might change at the trade deadline, but more or less stays fixed) and it’s just the averages that move about. So in fact if we’re looking for more “accurate”, a time series is gonna be better for our purposes. In an actual analytic piece, for instance, I’d do something very different (which would include multiple years, more detail about the salaries and how they amortize over time, etc).

But even so, making the piece more “correct” misses the intentional simplifications found in it, e.g. it doesn’t matter whether a baseball team was 5% away from winning, it only matters whether they’ve won. At the end of the day, it’s all about the specific rankings, who gets into the playoffs, and who wins those final games. Since the piece isn’t intended as an analytical tool, but something that conveys the idea of salary vs. performance to an audience who by and large cares little about 1) baseball and 2) stats. That’s not to say that it’s about making something zoomy and pretty (and irrelevant), but rather, how do you engage people with the data in a way that teaches them something in the end and gets them thinking about it.

Now to get back to my inbox and the guy who would rather have the data sonified since he thinks this visual thing is just a fad.

Tuesday, April 29, 2008 | examples, represent, salaryper  

All Streets Error Messages

Some favorite error messages while working on the All Streets project (mentioned below). I was initially hoping to use Illustrator to open the PDF files generated from Processing, but Venus informed me that it was not to be:


I’m having difficulties as well. Why did I pay for this software?

Generally, Photoshop is far better engineered so I was hoping that it would be able to rasterize the PDF file instead, never mind the vectors and all.


Oh come on… Just admit that you ran out of memory and can’t deal. Meanwhile, Eugene was helping out with the site, from the other end of iChat:


Oh well.

Sunday, April 27, 2008 | allstreets, software  

The Advantages of Closing a Few Doors

From the New York Times, a piece about Dan Ariely’s Predictably Irrational. I’m somewhat fascinated by the idea of our general preoccupation with holding on to things, particularly as it relates to retaining data (see previous posts referencing Facebook, Google, etc.).

Our natural tendency is to keep everything, in spite of the consequences. Storage capacity in the digital realm is only getting larger and cheaper (as its size in the physical realm continues to shrink), which only feeds this tendency further. Perhaps this is also why more people don’t question Google claiming a right to keep messages from their Gmail account after the messages, or even the account, have been deleted.

Ariely’s book describes a set of experiments performed at M.I.T.:

[Students] played a computer game that paid real cash to look for money behind three doors on the screen… After they opened a door by clicking on it, each subsequent click earned a little money, with the sum varying each time.

As each player went through the 100 allotted clicks, he could switch rooms to search for higher payoffs, but each switch used up a click to open the new door. The best strategy was to quickly check out the three rooms and settle in the one with the highest rewards.

Even after students got the hang of the game by practicing it, they were flummoxed when a new visual feature was introduced. If they stayed out of any room, its door would start shrinking and eventually disappear.

They should have ignored those disappearing doors, but the students couldn’t. They wasted so many clicks rushing back to reopen doors that their earnings dropped 15 percent. Even when the penalties for switching grew stiffer — besides losing a click, the players had to pay a cash fee — the students kept losing money by frantically keeping all their doors open.

(Emphasis mine.) I originally came across the article via Mark Hurst, who adds:

I’ve said for a long time that the solution to information overload is to let the bits go: always look for ways to delete, defer, or otherwise avoid bits, so that the few that remain are more relevant and easier to handle. This is the core philosophy of Bit Literacy.

Put another way, do we need to take more personal responsibility for subjecting ourselves to the “information overload” that people so happily buzzword about? Is complaining about the overload really an issue of not doing enough spring cleaning at home?

Sunday, April 27, 2008 | retention  

Restroom information graphics


I like neither bacon nor these machines, so I wish they would always provide this helpful explanation (or warning).

Friday, April 25, 2008 | infographics  

The Earth at night

Via a mailing list, Oswald Berthold passes along a short article and images of the Earth from space, compiled by NASA and highlighting city lights in particular.

Tokyo Bay

The collection is an update to the Earth Lights image developed a few years ago (and which made its way ’round the interwebs at the time).

For the more technical, a presentation from the NOAA titled Low Light Imaging of the Earth at Night provides greater detail about the methods used to produce such images. It also includes a couple of interesting historical examples (such as the first image they created), as well as comparisons of city growth over time based on changes in the data.

Of course many conclusions can be drawn from seeing map data such as this. Look at the difference between North and South Korea, for instance (original image from globalsecurity.org).

North and South Korea by night

Apparently this is a favorite of former U.S. Secretary of Defense Donald Rumsfeld:

Mr Rumsfeld showed the picture to illustrate how backward the northern regime really is – and how oppressed its people are. Without electricity there can be none of the appliances that make life easy and that we take for granted, he said.

“Except for my wife and family, that is my favourite photo,” said Mr Rumsfeld.

“It says it all. There’s the south, the same people as the north, the same resources north and south, and the big difference is in the south it’s a free political system and a free economic system.

I’ve vowed to myself not to make this page about politics, so I won’t get into the fatuous arguments of a warmonger (oops), but I think the fascinating thing is that

  1. This image, this “information graphic,” would be of such great importance to a person that he would see fit to even mention it in reference to photos of his wife and children. This is a strong statement for any image, even if he is being dramatic.
  2. The use of images to make or score political points. There’s some great stuff buried in recent Congressional testimony about the Iraq War, for instance, that I want to get to soon.

In regard to #1, I’m trying to think of other images with which people maintain such a personal relationship (particularly people whose job is not info graphics—Tufte’s preoccupation with Napoleon’s March doesn’t count).

As for #2, hopefully we’ll get to that a bit later.

Friday, April 25, 2008 | mapping, physical, politics  

Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of the 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.