Writing

New for 2010

Back in December, I made the decision to leave Seed and strike out on my own. As of January 1st (two weeks ago), I’m setting up shop in Cambridge. (That’s the fake Cambridge for you UK readers. Or, Cambridge like “MIT and Harvard” not “University Of”).

The federal government knows this new venture under the charmingly creative moniker of BEN FRY LLC, but with any luck, a proper name will be found soon so that I don’t have to introduce myself as Ben Fry, founder of Ben Fry LLC. (Which is even worse than having a site with your own name as the URL. I have Tom White—who originally registered the site as a joke—to thank for that.)

I’ll soon be hiring designers, developers, data people, and peculiar hybrids thereof. If you do the sort of work that you see on this site, please get in touch (send a message to mail at benfry.com). In particular I’d like to find people local to Cambridge/Boston, but because some of this will be project-oriented freelance work, some of it can be done at a distance.

Stay tuned, more to come.

(Update 1/21/2010 – Thanks for the responses. I’m having trouble keeping on top of my inbox so my apologies in advance if you don’t hear back from me promptly.)

Saturday, January 16, 2010 | opportunities, seed, site  

Health Numbers in Context

As a continuation of this project, we’ve just finished a second health visualization (also built with Processing) using GE’s data. Like the first round, we started with ~6 million patient records from their “MQIC” database. Using the software, you input gender, age range, height/weight (to calculate BMI), and smoking status. Based on the selections, it shows you the number of people in the database who match those settings, and the percentage of them who have been diagnosed with diabetes, heart disease, or hypertension, or who have had a stroke:

are you blue? no, dark blue.

For people reading the site because they’re interested in visualization (I guess that’s all of you, except for mom, who is just trying to figure out what I’m up to), some inside baseball:

On the interaction side, the main objective here was to make it easy to move around the interface as quickly as possible. The rows are shown in succession so that the interface can teach itself, but we also provide a reset button so that you can return to the starting point. Once the rows are visible, though, it’s easy to move laterally and make changes to the settings (swapping between age ranges, for instance).

One irony of making the data accessible this way is that most users, after looking up their own numbers, will then try as many different possibilities as they can, in a quick hunt for the extremes. How high do the percentages go? If I select bizarre values, what happens at the edges? Normally, you don’t have to spend as much time on these 1% cases, and it would be alright for things to be a little weird when truly odd values are entered (300 lb. people who are 4′ tall, smokers, and age 75 and over). But in this case, a lot more time has to be spent making sure things work. So while most of the time the percentages at the top are in the 5-15% range, I had to write code so that when one category shoots up to 50%, the other bars in the chart scale down in proportion.
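For the curious, here is a minimal sketch of that kind of proportional scaling. It is written in Python for brevity and is not the project's actual Processing code. The function name, the 200-pixel bar length, and the 20% default ceiling are all hypothetical values chosen for illustration.

```python
def bar_lengths(percentages, max_length=200.0, default_ceiling=20.0):
    """Map each category's percentage to a bar length in pixels.

    Hypothetical scheme: bars are normally drawn against a fixed ceiling
    (default_ceiling, an assumed value). If any category exceeds that
    ceiling, the largest value becomes the new ceiling, so every other
    bar shrinks in proportion and nothing runs off the chart.
    """
    largest = max(percentages.values(), default=0.0)
    ceiling = max(default_ceiling, largest)
    return {name: max_length * pct / ceiling
            for name, pct in percentages.items()}


# Typical case: everything fits under the default ceiling.
typical = bar_lengths({"diabetes": 10.0, "hypertension": 15.0})

# Edge case: one category jumps to 50%, so the other bar scales down.
extreme = bar_lengths({"diabetes": 10.0, "hypertension": 50.0})
```

With these assumed numbers, the diabetes bar is 100 pixels long in the typical case and drops to 40 pixels once hypertension reaches 50%, even though the diabetes percentage itself has not changed.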

Another aspect of the interface is the body mass index calculator. Normally a BMI chart looks something like this, a large two-dimensional plot that would otherwise use up half of the interface. By using a little interaction, we can make a simpler chart that dynamically updates itself based on the current height or weight settings. Also, because the ranges have (mathematically) hard edges, we’re showing the upper and lower bounds of the range so that they’re more apparent. Otherwise, a 5’8″ person steps from 164 to 165 lbs to find themselves suddenly overweight. In reality, the boundaries are fuzzier, which a doctor would take into account. But with the software, we instead have to be clear about the way the logic is working.

(Note that the height and weight are only used to calculate a BMI range — it’s not pulling individuals from the database who are 5’8″ and 160 lbs, it’s pulling people from the “normal” BMI range.)
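The hard edges are easy to see in a small sketch. This one assumes the standard imperial BMI formula (703 × pounds ÷ inches²) and the commonly published cutoffs of 18.5, 25, and 30. Whether the project used exactly these cutoffs is an assumption, and the function names are hypothetical.

```python
# Commonly published adult BMI categories as (name, low, high) with
# low <= BMI < high. Assumed here, not taken from the project itself.
BMI_RANGES = [
    ("underweight", 0.0, 18.5),
    ("normal", 18.5, 25.0),
    ("overweight", 25.0, 30.0),
    ("obese", 30.0, float("inf")),
]


def bmi(height_in, weight_lb):
    """Standard imperial BMI formula: 703 * pounds / inches squared."""
    return 703.0 * weight_lb / (height_in ** 2)


def bmi_range(height_in, weight_lb):
    """Return (name, low, high) for the range this person falls into."""
    value = bmi(height_in, weight_lb)
    for name, low, high in BMI_RANGES:
        if low <= value < high:
            return name, low, high
    raise ValueError("height and weight must be positive numbers")


def weight_bounds(height_in, low, high):
    """Convert a BMI range back into pounds for a given height."""
    scale = height_in ** 2 / 703.0
    return low * scale, high * scale


# A 5'8" (68 inch) person on either side of the boundary.
before = bmi_range(68, 164)
after = bmi_range(68, 165)
lower_lb, upper_lb = weight_bounds(68, 18.5, 25.0)
```

For a height of 68 inches, the upper bound of the normal range works out to just under 164.5 lbs, which is why 164 lbs lands in one range and 165 lbs in the next. Only the range name would be used to match against the database, not the individual height and weight.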

For the statistically (or at least numerically) inclined, there are also some interesting quirks to be found, like a situation or two where health risks would be expected to go up, but in fact go down (I’ll leave you to find them yourself). This is not a bug. We’re not doing any sort of complex math here to evaluate actual risk; the software is just a matching game with individuals in the database. These cases in particular show up when there are only a few thousand individuals, say 2,000 out of the full 6 million records. The number of people in these edge cases is practically a rounding error, which means that we can’t draw sound conclusions from them. As an armchair doctor-scientist, it’s also interesting to speculate as to what might be happening in such cases, and how other factors may come into play.
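To make the "matching game" concrete, here is a rough sketch of the idea: filter the records by the selected settings, count the matches, and report what share of them carry each diagnosis. The record layout, the field names, and the min_sample threshold are hypothetical, and the three records are made-up toy data rather than anything from the real database.

```python
def summarize(records, filters, conditions, min_sample=5000):
    """Count matching records and the percentage with each condition.

    records:    list of dicts, one per patient (hypothetical layout)
    filters:    settings that a record must match exactly
    conditions: names of boolean diagnosis fields to tally
    min_sample: assumed cutoff below which results get flagged as
                too small to draw conclusions from
    """
    matches = [r for r in records
               if all(r.get(key) == value for key, value in filters.items())]
    count = len(matches)
    rates = {}
    for condition in conditions:
        diagnosed = sum(1 for r in matches if r.get(condition))
        rates[condition] = 100.0 * diagnosed / count if count else 0.0
    return {"count": count, "rates": rates, "small_sample": count < min_sample}


# Toy records, invented purely for illustration.
toy_records = [
    {"gender": "F", "smoker": True, "diabetes": True, "stroke": False},
    {"gender": "F", "smoker": True, "diabetes": False, "stroke": False},
    {"gender": "M", "smoker": True, "diabetes": True, "stroke": True},
]

result = summarize(toy_records, {"gender": "F", "smoker": True},
                   ["diabetes", "stroke"])
```

Nothing here models risk. A percentage computed from a small group of matches can move in unexpected directions simply because so few people are in it, which is what the small_sample flag is meant to call out.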

Have fun!

Wednesday, August 26, 2009 | interact, mine, probability, processing, seed  

Comorbidity: it’s no longer just for physicians and statisticians

A simple, interactive means for seeing connections between demographics, diseases, and diagnoses:

imagining health as 300 people symbols rearranging themselves in a data symphony

We just finished developing this project for GE as part of the launch of their new health care initiative. With the input and guidance of a handful of departments within the company, we began by examining their proprietary database of 14 million patient records, looking for ways to show connections between related conditions. For instance, we wanted visitors to the site to be able to learn how diabetes diagnoses increase along with obesity, but to convey it in a manner that didn’t feel like a math lesson. By cycling through the eight items at the top (and the row beneath it), you can make several dozen comparisons, highlighting what’s found in actual patient data. At the bottom, some additional background is provided based on various national health care studies.

I’m excited to have the project finished and online, and have people making use of it, as I readjust from the instant gratification of building things one day and then talking about them the next day. More to come!

Monday, May 18, 2009 | seed  

Seed Visualization

Update: As of January 1st, 2010, I’m no longer at Seed. Read more here.

Some eighteen months as visualization vagabond (roving writer, effusive explainer, help me out here…) came to a close in December when I signed up with Seed Media Group to direct a new visualization studio here in Cambridge. We now have a name—the Phyllotaxis Lab—and as of last week, we’ve made it official with a press release:

NEW YORK and CAMBRIDGE, MA (March 5, 2009) – Building on Seed Media Group’s strong design culture, Adam Bly, founder and CEO, announced today the appointment of Ben Fry as the company’s first Design Director. Seed Media Group also announced the launch of a new unit focused on data and information visualization to be based in Cambridge, Massachusetts and headed by Ben Fry.

Seed Visualization will help companies and governments find solutions to clearly communicate complex data sets and information to various stakeholders. The unit’s research arm, the Phyllotaxis Lab, will work to advance the field of data visualization and will undertake research and experimental design work. The Lab will partner with academic institutions around the world and will provide education on the field of data visualization.

And about that name:

Phyllotaxis is a form commonly found in nature that is derived from the Fibonacci sequence. It is the inspiration for Seed Media Group’s logo, designed in 2005 by Stefan Sagmeister and recently included in the Design and the Elastic Mind exhibit at MoMA. “Much like a phyllotaxis, visualization is about both numbers and information as well as structure and form,” said Ben Fry. “It’s a reminder that beauty is derived from the intelligence of the solution.”

The full press release can be found here (PDF), and more details are forthcoming.

This is gonna be great.

Tuesday, March 10, 2009 | iloveme, seed  
Book

Visualizing Data is my 2007 book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. When first published, it was the only book for people who wanted to learn how to actually build a data visualization in code.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.