Writing

Health Numbers in Context

As a continuation of this project, we’ve just finished a second health visualization (also built with Processing) using GE’s data. Like the first round, we started with ~6 million patient records from their “MQIC” database. Using the software, you input gender, age range, height/weight (to calculate BMI), and smoking status. Based on the selections, it shows you the number of people in the database who match those settings, and the percentages who have been diagnosed with diabetes, heart disease, or hypertension, or who have had a stroke:

are you blue? no, dark blue.

For people reading the site because they’re interested in visualization (I guess that’s all of you, except for mom, who is just trying to figure out what I’m up to), some inside baseball:

On the interaction side, the main objective here was to make it easy to move around the interface as quickly as possible. The rows are shown in succession so that the interface can teach itself, but we also provide a reset button so that you can return to the starting point. Once the rows are visible, though, it’s easy to move laterally and make changes to the settings (swapping between age ranges, for instance).

One irony of making the data accessible this way is that most users, after looking up their own numbers, will then try as many different possibilities as they can, in a quick hunt for the extremes. How high do the percentages go? If I select bizarre values, what happens at the edges? Normally, you don’t have to spend as much time on these 1% cases, and it would be all right for things to be a little weird when truly odd values are entered (300 lb. people who are 4′ tall, smokers, and age 75 and over). But in this case, a lot more time has to be spent making sure things work. So while most of the time the percentages at the top are in the 5-15% range, I had to write code so that when one category shoots up to 50%, the other bars in the chart scale down in proportion.
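The proportional scaling described above can be sketched roughly as follows. This is a minimal illustration and not the code from the actual piece: the class name, the method name, and the 15% default ceiling are all assumptions made up for the example.

```java
import java.util.Arrays;

class BarScale {
    // Assumption for illustration: by default the full bar width represents 15%.
    static final double DEFAULT_MAX_PERCENT = 15.0;

    // Converts percentages to bar widths. If any value exceeds the default
    // ceiling, that value becomes the new ceiling and every bar shrinks
    // in proportion.
    static double[] barWidths(double[] percents, double fullWidth) {
        double max = DEFAULT_MAX_PERCENT;
        for (double p : percents) {
            max = Math.max(max, p);
        }
        double[] widths = new double[percents.length];
        for (int i = 0; i < percents.length; i++) {
            widths[i] = fullWidth * percents[i] / max;
        }
        return widths;
    }

    public static void main(String[] args) {
        // Made-up percentages for the four conditions, not values from the database.
        double[] typical = {6, 9, 12, 3};
        double[] extreme = {6, 9, 50, 3};
        System.out.println(Arrays.toString(barWidths(typical, 300)));
        System.out.println(Arrays.toString(barWidths(extreme, 300)));
    }
}
```

With the typical values, the bars are drawn against the fixed 15% ceiling. Once one category reaches 50%, that bar fills the full width and the others shrink to match, so the chart never overflows.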

Another aspect of the interface is the body mass index calculator. Normally a BMI chart looks something like this, a large two-dimensional plot that would otherwise use up half of the interface. By using a little interaction, we can make a simpler chart that dynamically updates itself based on the current height or weight settings. Also, because the ranges have (mathematically) hard edges, we’re showing the upper and lower bounds of the range so that they’re more apparent. Otherwise, a 5′8″ person steps from 164 to 165 lbs to find themselves suddenly overweight. In reality, the boundaries are fuzzier, which would be taken into account by a doctor. But with the software, we instead have to be clear about the way the logic is working.

(Note that the height and weight are only used to calculate a BMI range — it’s not pulling individuals from the database who are 5′8″ and 160 lbs, it’s pulling people from the “normal” BMI range.)
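To make the hard edges concrete, here is a minimal sketch of the BMI arithmetic using the standard imperial formula (703 × pounds ÷ inches²). The category cutoffs of 18.5, 25, and 30 are the commonly published ones and are an assumption here, since the exact cutoffs the software uses aren’t stated. The class and method names are made up for the example.

```java
class BmiRange {
    // Standard imperial BMI formula: 703 * weight (lb) / height (in)^2.
    static double bmi(double heightInches, double weightPounds) {
        return 703.0 * weightPounds / (heightInches * heightInches);
    }

    // Assumed cutoffs (the widely published 18.5 / 25 / 30 boundaries).
    static String category(double bmi) {
        if (bmi < 18.5) return "underweight";
        if (bmi < 25.0) return "normal";
        if (bmi < 30.0) return "overweight";
        return "obese";
    }

    // Inverts the formula: the weight at which a given height hits a given BMI.
    static double weightAtBmi(double heightInches, double bmi) {
        return bmi * heightInches * heightInches / 703.0;
    }

    public static void main(String[] args) {
        double height = 68; // 5'8"
        System.out.println("164 lbs: " + category(bmi(height, 164)));
        System.out.println("165 lbs: " + category(bmi(height, 165)));
        // The lower and upper weight bounds of the "normal" range for this height.
        System.out.println("normal from " + weightAtBmi(height, 18.5)
                + " to " + weightAtBmi(height, 25.0) + " lbs");
    }
}
```

For a 5′8″ person, the boundary between “normal” and “overweight” falls between 164 and 165 lbs, which is why showing the bounds of the current range helps explain the sudden jump.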

For the statistically (or at least numerically) inclined, there are also some interesting quirks to be found, like a situation or two where health risk would be expected to go up but in fact goes down (I’ll leave you to find them yourself). This is not a bug. We’re not doing any sort of complex math here to evaluate actual risk; the software is just a matching game with individuals in the database. These cases in particular show up when there are only a few thousand individuals, say 2,000 out of the full 6 million records. The number of people in these edge cases is practically a rounding error, which means that we can’t draw sound conclusions from them. As an armchair doctor-scientist, it’s also interesting to speculate as to what might be happening in such cases, and how other factors may come into play.
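The “matching game” can be pictured with a sketch like the one below. It is only an illustration under assumed names and fields (Patient, tally, and percent are hypothetical), with a handful of made-up records standing in for the database, and it tracks just one condition.

```java
import java.util.Arrays;
import java.util.List;

class MatchingGame {
    // Hypothetical record layout: only the fields needed for this example.
    static class Patient {
        final boolean female;
        final int age;
        final boolean smoker;
        final boolean hypertension;

        Patient(boolean female, int age, boolean smoker, boolean hypertension) {
            this.female = female;
            this.age = age;
            this.smoker = smoker;
            this.hypertension = hypertension;
        }
    }

    // Returns {number of matching people, number of those with a diagnosis}.
    static int[] tally(List<Patient> db, boolean female, int minAge, int maxAge, boolean smoker) {
        int matches = 0;
        int diagnosed = 0;
        for (Patient p : db) {
            if (p.female == female && p.age >= minAge && p.age <= maxAge && p.smoker == smoker) {
                matches++;
                if (p.hypertension) {
                    diagnosed++;
                }
            }
        }
        return new int[] {matches, diagnosed};
    }

    // Plain counting: diagnosed matches as a percentage of all matches.
    static double percent(int[] tally) {
        return tally[0] == 0 ? 0.0 : 100.0 * tally[1] / tally[0];
    }

    public static void main(String[] args) {
        // Made-up records, purely for illustration.
        List<Patient> db = Arrays.asList(
                new Patient(true, 30, false, false),
                new Patient(true, 35, false, true),
                new Patient(false, 35, false, true),
                new Patient(true, 70, true, true));
        int[] result = tally(db, true, 25, 44, false);
        System.out.println(result[0] + " matches, " + percent(result) + "% with hypertension");
    }
}
```

Because the percentage is a ratio of two counts, a single person carries far more weight when the match count is small, so a group of 2,000 produces much noisier numbers than a group of several hundred thousand.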

Have fun!

Wednesday, August 26, 2009 | interact, mine, probability, processing, seed  

History of Processing, as told by John Maeda

kicking it color mac classic style

John Maeda (former advisor to Casey and me) has written a very gracious and very generous article about the origins of the Processing project for Technology Review. An excerpt:

In 2001, when I was a young MIT faculty member overseeing the Media Lab Aesthetics and Computation Group, two students came up with an idea that would become an award-winning piece of software called Processing—which I am often credited with having a hand in conceiving. Processing, a programming language and development environment that makes sophisticated animations and other graphical effects accessible to people with relatively little programming experience, is today one of the few open-source challengers to Flash graphics on the Web. The truth is that I almost stifled the nascent project’s development, because I couldn’t see the need it would fill. Luckily, Ben Fry and Casey Reas absolutely ignored my opinion. And good for them: the teacher, after all, isn’t always right.

To give him more credit (not that he needs it, but maybe because I’m bad with compliments), John’s objection had much to do with the fact that Processing was explicitly an evolutionary, as opposed to revolutionary, step in how coding was done. That’s why it was never the focus of my Masters or Ph.D. work, and instead has nearly always been a side project. And more importantly, for students in his research group, he usually forced us away from whatever came naturally for us. Those of us for whom creating tools was “easy,” he forced us to make less practical things. For those who were comfortable making art, he steered them toward creating tools. In the end, we all learned more that way.

Tuesday, August 25, 2009 | processing  

Tiny Sketch, Big Funny

not all sketches are 6x6 pixels in size

Just heard about this from Casey yesterday:

Tiny Sketch is an open challenge to artists and programmers to create the most compelling creative work possible with the programming language Processing using 200 characters or less.

…building on the proud traditions of obfuscated code contests and the demo scene. The contest runs through September 13 and is sponsored by Rhizome and OpenProcessing.

Having designed Processing to do one thing or another, I laughed out loud at several of the submissions for the ways their authors managed to introduce new quirks. For instance, consider the createFont() function. Usually it looks something like this:

PFont f = createFont("Helvetica", 12);

If the “Helvetica” font is not installed, it silently winds up using a default font. So somebody clever figured out that if you just leave the font name blank, it’s an easy way to get a default font, and not burn several characters of the limit:

PFont f = createFont("", 12);

Another, by Kyle McDonald, throws an exception as a way to produce text to plot on screen. (It’s also a bit of an inside joke—on us, perhaps—because it’s a ubiquitous error message resulting from a change that was made since earlier releases of Processing.)

One of the most interesting bits is seeing how these ideas propagate into later sketches that are produced. Since the font hack appeared (not sure who did it first, let me know if you do), everyone else is now using that method for producing text. Obviously art/design/coding projects are always the result of other influences, but it’s rare that you get to see ideas exchanged in such a direct fashion.

And judging from some of the jagged edges in the submissions, I’m gonna change smooth() to just s() for the next release of Processing, so that more people will use it in the next competition.

Friday, August 14, 2009 | code, opportunities, processing  

Weight Duplexing, Condensed Tabulars, and Multiple Enclosures

More typographic tastiness (see the earlier post) from Hoefler & Frere-Jones with a writeup on Choosing Fonts for Annual Reports. Lots of useful design help and ideas for anyone who works with numbers, whether actual annual reports or (more likely) fighting with Excel and PowerPoint. For instance, using enclosures to frame numbers, or knock them out:

knocking out heaven's door

Another helpful trick is using two weights so that you can avoid placing a line between them:

pick em out of a lineup

Or using a proper condensed face when you have to invite too many of your numerical friends:

squeeze me macaroni

At any rate, I recommend the full article for anyone working with numbers, either for the introduction to setting type (for the non-designers) or a useful reminder of some of the solutions (for those who fret about these things on a regular basis).

Thursday, August 6, 2009 | refine, typography  

Also from the office of scary flowcharts

Responding to the Boehner post, Jay Parkinson, M.D. pointed me to this improved chart by designer Robert Palmer, accompanied by an angst-ridden open letter (an ironic contrast to the soft pastels in his diagram) decrying the crimes of visual malfeasance.

gonna have to face it you're addicted to charts

Meanwhile, Ezra Klein over at the Washington Post seems to be thinking along similar lines as my original post, noting this masked artist’s earlier trip to Kinko’s a few weeks ago. Klein writes:

it may be small, but there is still terror

Whoever is heading the Scary Flowcharts Division of John Boehner’s office is quickly becoming my favorite person in Washington. A few weeks ago, we got this terror-inducing visualization of the process behind “Speaker Pelosi’s National Energy Tax.”

That’s hot!

If I were teaching right now, I’d make all my students do a one day charrette on trying to come up with something worse than the Boehner health care image while staying in the realm of colloquial things you can do with PowerPoint. It’d be a great time, and we’d all learn a lot.

Having spent two posts making fun of the whole un-funny mess around health care, I’ll leave you with the best bit of op-ed I’ve read on the topic, from Harold Meyerson, also at the Washington Post:

Watching the centrist Democrats in Congress create more and more reasons why health care can’t be fixed, I’ve been struck by a disquieting thought: Suppose our collective lack of response to Hurricane Katrina wasn’t exceptional but, rather, the new normal in America. Suppose we can no longer address the major challenges confronting the nation. Suppose America is now the world’s leading can’t-do country.

I agree and find it terrifying. And I don’t think that’s a partisan issue.

Now back to your purposefully apolitical, regularly scheduled blog on making pictures of data.

Thursday, August 6, 2009 | feedbag, flowchart, obfuscation, politics, thisneedsfixed  

Thesaurus Plus Context

can i get it in red?

BBC News brings word (via) that after a 44-year effort, the Historical Thesaurus of the Oxford English Dictionary will see the light of day. Rather than simple links between words, the beastly volume covers the history of the words within. For instance, the etymological timeline of the word “trousers” follows:

trousers: breeks 1552- · strosser 1598-1637 · strouse 1600-1620 · brogues 1615- a 1845 · trouses 1679-1820 · trousers 1681- · trouser 1702- (rare) · inexpressibles 1790- (colloq.) · indescribables 1794-1837 (humorous slang) · etceteras 1794-1843 (euphem.) · kickseys/kicksies 1812-1851 (slang) · pair of trousers 1814- · ineffables 1823-1867 (colloq.) · unmentionables 1823- · pantaloons 1825- · indispensables a 1828- (colloq. euphem.) · unimaginables 1833 · innominables 1834/43 (humorous euphem.) · inexplicables 1836/7 · unwhisperables 1837-1863 (slang) · result 1839 · sit-down-upons 1840-1844 (colloq.) · pants 1840- · sit-upons 1841-1857 (colloq.) · unutterables 1843; 1860 (slang Dict.) · trews 1847- · sine qua nons 1850 · never-mention-ems 1856 · round-me-houses 1857 (slang) · round-the-houses 1858- (slang) · unprintables 1860 · stove-pipes 1863 · terminations 1863 · reach-me-downs 1877- · sit-in-’ems/sitinems 1886- (slang) · trousies 1886- · strides 1889- (slang) · rounds 1893 (slang) · rammies 1919- (Austral. & S. Afr. slang) · longs 1928- (colloq.)

Followed by a proper explanation:

breeks: The earliest reference from 1552 marks the change in fashion from breeches, a garment tied below the knee and worn with tights. Still used in Scotland, it derives from the Old English “breeches”.

trouser: The singular form of “trousers” comes from the Gallic word “trews”, a close-fitting tartan garment formerly worn by Scottish and Irish highlanders and to this day by a Scottish regiment. The word “trouses” probably has the same derivation.

unimaginables: This 19th Century word, and others like “unwhisperables” and “never-mention-ems”, reflect Victorian prudery. Back then, even trousers were considered risque, which is why there were so many synonyms. People didn’t want to confront the brutal idea, so found jocular alternatives. In the same way the word death is avoided with phrases like “pass away” and “pushing up daisies”.

stove-pipes: A 19th Century reference hijacked in the 1950s by the Teddy Boys along with drainpipes. The tight trousers became synonymous with youthful rebellion, a statement of difference from the standard post-war suits.

rammies: This abbreviation of Victorian cockney rhyming slang “round-me-houses” travelled with British settlers to Australia and South Africa.

Are you seeing pictures and timelines yet? Then this continues for 600,000 more words. Mmmm!

And Ms. Christian Kay, one of the four editors, is my new hero:

An English language professor, Ms Kay, one of four co-editors of the publication, began work on it in the late 1960s – while she was in her 20s.

It’s hard to fathom being in your 60s and completing the book that you started in your 20s, though it’s difficult to argue with the academic and societal contribution of the work. Her web page also lists “the use of computers in teaching and research” as one of her interest areas, which sounds like a bit of an understatement. I’d be interested in computers too if my research interest was the history of 600,000 words and their 800,000 meanings across 236,000 categories.

Sadly, this book of life is not cheap, currently listed at Amazon for $316 (but that’s $79 off the cover price!). Though with a wife who covets the full 20-volume Oxford English Dictionary (she already owns the smaller, 35 lb. version), I may someday get my wish.

Wednesday, August 5, 2009 | text  

Mapping Health Care: Here Be Dragons!

I’m so completely impressed with this incredible bit of info graphic awesomeness distributed by the office of John Boehner, Republican congressman from Ohio’s 8th District. The flow chart purports to show the Democrats’ health care proposal:

keep pushing this health care thing and it's only gonna get uglier!

The image needs to be much larger to be fully appreciated in its magnificent glory of awfulness, so a high resolution version is here, and the PDF version is here.

The chart was used by Boehner as a way to make the plan look as awful as possible — a tactic used to great effect by the same political party during the last attempt at health care reform in 1994. The diagram appears to be the result of a heart-warming collaboration between a colorblind draughtsman, the architect of a nightmarish city water works, and whoever designed the instructions for the bargain shelving unit I bought from Target.

Don’t waste your time, by the way — I’ve already nominated it for an AIGA award.

(And yes, The New Republic also created a cleaner version, and the broader point is that health care is just a complex mess no matter what, so don’t let that get in the way of my enjoyment of this masterwork.)

Additional perspective from The Daily Show (my original source) follows.

Tuesday, August 4, 2009 | flowchart, obfuscation, politics, thisneedsfixed  

Book

Visualizing Data Book Cover

Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.
