One of the earliest fixtures in the Processing community is toxi (or Karsten Schmidt, if you must) who has been doing wonderful things using the language/environment/core for many years. A couple months ago he posted a beautiful reel of work done by the many users of his toxiclibs library. Just beautiful:
A more complete description can be found on the video page over at Vimeo.
Having spent my morning at the doctor’s office (I’m fine, Mom–just a physical), I passed the time by asking my doctor about the system they use for electronic medical records. Our GE work (1, 2) and seeing her gripe and sigh as truly awful-looking screen after screen flew past on her display caught my interest. And as someone who has an odd fascination with bad interfaces, I just had to ask…
Perhaps the most surprising bit was that without explicitly saying so, she seemed to find the EMR system most useful not as a thing that aggregates data, or makes her work easier, but instead as a communication tool. It combats the (very real, not just an overused joke) penmanship issues of fellow doctors, but equally as important, it sets a baseline or common framework for the details of a visit. The latter part is obvious, but the actual nature of it is more subtle. For instance, she would often find herself deciphering a scribble that says “throat, amox” by another doctor, and it says nothing of dosage, frequency, type of Amoxicillin, much less the nature of the throat trouble. A patient (particularly a sick patient) is also not the person to provide precise details. How many would remember whether they were assigned a 50, 150 or 500 milligram dosage (very different things, you might say). And for that matter, they’re probably equally likely to think they’re on a 500 kilogram dose. (“No, that’s too high. Must be 5 kilogram.”)
My doctor might be seeing such a patient because their primary care doctor (the mad scribbler) was out, or the patient was a referral, or had just moved offices, or whatever. But it makes an interesting point for the transience of medical data: importance increases as it’s in motion, which is especially true since the patient it’s attached to is not a static entity (from changing health conditions to changing jobs, cities, and doctors).
Or from a simpler angle, if you’re sick enough that you have to be seen by someone other than your primary care doctor, then it’s especially important for the information to be complete. So with any luck, the EMR removes a layer of translation that was required before.
As she described things off the top of her head, the data only came up later. Ok, it’s all data, but I’m referring to the numbers and the tests and the things that can be tracked easily over time. The sort of reduce-the-patient-to-numbers things we usually think of when hearing about EMRs. Readouts that display an array of tests, such as blood pressure history, are an important feature, but they weren’t the killer app of EMRs. (And that will be the last time I use “killer app” and “electronic medical records” together. Pun not intended.)
The biggest downside (she’s now using her second system) is that the interfaces are terrible, usually that they do things in the wrong order, or require several windows and multiple clicks to do mundane tasks. She said there were several things that she liked and hated about this one, but that it was a completely different set of pros/cons from the other system she used. (And to over-analyze for a moment, I think she even said “like” and “hate” not “love” and “hate” or “like” and “dislike”. She also absentmindedly mentioned “this computer is going to kill me.” She’s not a whiner, and may truly believe it. EMRs may be killing our doctors! Call The New York Times, or at least Fox 25.) This isn’t surprising, I assume it’s just that technology purchasers are several levels removed from the doctors who have to use the equipment, which is usually the case for software systems like this, so there’s little market pressure for usability. If you’re big enough to need such a beast, then it means that the person making the decision about what to buy is a long ways removed. But I’m curious about whether this is a necessity of how big software is implemented, or a market opportunity.
At some point she also stated that it would be great if the software company had asked a doctor for their input in how the system was implemented. I think it’s safe to assume that there was at least one M.D.–if not an arsenal of individuals with a whole collection of alphabet soup trailing their names–who were involved with the software. But I was struck with how matter-of-fact she was that nobody had even thought about it. The software was that bad, and to her, the flaws were that obvious. The process by which she was forced to travel through the interface had little to do with the way she worked. Now, for any expert, they might have their own way of doing things, but that’s probably not the discrepancy here. (And in fact, if the differences between doctors are that great, then that itself should be part of the software: the doctor needs to be able to change the order in which the software works.) But it’s worth noting that the data (again, meaning the numbers and test history and easily measurable things) were all easily accessible from the interface, which suggests that like so many data-oriented projects, the numbers seduced the implementors. And so those concrete numbers (fourth or so on ranked importance for this doctor) won out over process (the way the doctor spends their day, and their time with the patient).
All of which is a long way of wondering, “are electronic medical records really about data?”
While checking the bus schedule for Greyhound, I recently discovered that travel from New York City to Boston is a multi-day affair, involving stops in Rochester, Toronto (yes, Canada), Fort Erie, Syracuse, and even Schenectady and Worcester (presumably because they’re both fun to say).
1 day, 5 hours, and 35 minutes. That’s the last time I complain about how bad the Amtrak site is.
Fantastic TED talk from Chris Jordan back in February 2008. Chris creates beautiful images that convey scale in the millions. Examples include statistics like the number of plastic cups used in a day — four million — and here showing one million of them:
The talk is ten minutes, and well worth a look. I’m linking a sinfully small version here, but check out the higher resolution version on the TED site.
As much as I love looking at this work (and his earlier portraits, more can be found on his site), there’s also something peculiar about the beauty of the images perhaps neutering their original point. Does seeing the number of prison uniforms spur viewers to action, or does it give chin-rubbing intellectual fulfillment accompanied by a deep sigh of worldliness? I’d hate to think it’s the latter. Someone I asked about this had a different reaction, and cited a group that had actually begun to act based on what they saw in his work. I wish I had the reference, but if that’s the case (and I hope it is), there’s no argument.
Looking at it another way, next time you reach for a plastic cup, will Jordan’s image come to mind? Will you make a different decision, even some of the time?
I’ve also just purchased his “Running the Numbers” book, since these web images are an injustice to the work. And I have more chin scratching and sighing to do.
(Thanks to Ron Kurti for the heads up on the video.)
I wanted to post this last week in my excitement over week 1 of pro football season (that’s the 300 lbs. locomotives pounding into each other kind of football, not the game played with actual balls and feet), but ran out of time. So instead, in honor of football Sunday, week 2, my favorite advertisement of last year’s football season:
The ad is a phone conversation with Coca-Cola’s Katie Bayne, animated by Imaginary Forces. A couple things I like about this… First, that the attitude is so much less heavy-handed than, say, the IBM spots that seem to be based on the premise that if they jump cut quickly enough, they can cure cancer. The woman being interviewed actually laughs about “big data” truisms. Next is the fact that it’s actually a fairly smart question that’s asked:
How important is it that you get the right information rather than just a lot of information?
Well… you know you can roll around in facts all day long. It’s critical that we stay aware of that mountain of data that’s coming in and mine it for the most valuable nuggets. It helps keep us honest.
And third, the visual quality that reinforces the lighter attitude. Cleverly drawn without overdoing it. She talks about being honest and a hand comes flying in to push back a Pinocchio nose. Nuggets of data are shown as… eh, nuggets.
So it takes me a year or two to post the “You Are What You Say” lecture by Dan Frankowski, and the day after, a much more up-to-date paper is in the news. The paper is by Paul Ohm and is available here, or you can read an Ars Technica article about it if you’d prefer the (geeky) executive summary. The paper also cites the work of Latanya Sweeney (as did the Frankowski lecture), with this defining moment of the contemporary privacy debate, when the Massachusetts Group Insurance Commission (GIC) released “anonymized” patient data in the mid-90s:
At the time GIC released the data, William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers. In response, then-graduate student Sweeney started hunting for the Governor’s hospital records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts, a city of 54,000 residents and seven ZIP codes. For twenty dollars, she purchased the complete voter rolls from the city of Cambridge, a database containing, among other things, the name, address, ZIP code, birth date, and sex of every voter. By combining this data with the GIC records, Sweeney found Governor Weld with ease. Only six people in Cambridge shared his birth date, only three of them men, and of them, only he lived in his ZIP code. In a theatrical flourish, Dr. Sweeney sent the Governor’s health records (which included diagnoses and prescriptions) to his office.
And from the “where are they now?” file, Sweeney continues her work at Carnegie Mellon, though I have to admit I’m a little nervous that she’s currently back in my neighborhood with visiting posts at MIT and Harvard. Damn this Cambridge ZIP code.
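The linkage in the Weld story is, mechanically, just a join on three fields that both data sets happen to share. A minimal sketch of that idea follows, assuming both sources can be reduced to ZIP code, birth date, and sex. The class names, field names, and toy records are all made up for illustration; nothing here comes from the GIC release or the Cambridge voter rolls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical classes and invented toy data.
class LinkageSketch {
    static class Voter {
        final String name, zip, birthDate, sex;
        Voter(String name, String zip, String birthDate, String sex) {
            this.name = name; this.zip = zip; this.birthDate = birthDate; this.sex = sex;
        }
    }

    static class HealthRecord {
        final String zip, birthDate, sex, diagnosis;
        HealthRecord(String zip, String birthDate, String sex, String diagnosis) {
            this.zip = zip; this.birthDate = birthDate; this.sex = sex; this.diagnosis = diagnosis;
        }
    }

    // Return every voter whose ZIP code, birth date, and sex all match the
    // "anonymized" record. If exactly one voter comes back, the record is
    // no longer anonymous.
    static List<Voter> candidates(HealthRecord r, List<Voter> voters) {
        List<Voter> matches = new ArrayList<>();
        for (Voter v : voters) {
            if (v.zip.equals(r.zip) && v.birthDate.equals(r.birthDate) && v.sex.equals(r.sex)) {
                matches.add(v);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Three voters share a birth date, two of them are men,
        // and only one of those men lives in the record's ZIP code.
        List<Voter> voters = Arrays.asList(
            new Voter("Voter A", "02138", "1950-01-01", "M"),
            new Voter("Voter B", "02139", "1950-01-01", "M"),
            new Voter("Voter C", "02138", "1950-01-01", "F"));
        HealthRecord r = new HealthRecord("02138", "1950-01-01", "M", "(diagnosis)");
        for (Voter v : candidates(r, voters)) {
            System.out.println(v.name); // prints "Voter A"
        }
    }
}
```

Deleting the name column does nothing here, because the three remaining fields together act as the identifier.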
For the panel, we were to choose “an individual, movement, technology, whatever – whose importance has been overlooked” and follow that with “two themes that [we] believe will define the future of design and architecture.” In that context, I chose Lombardi’s work, and how it highlights a number of themes that are important to the future of design, particularly in working with data.
Give up those full hue heat map colors! Make images of biological data that even a grandmother can love! How about posters that no longer require an advanced degree to decipher? These platitudes and more coming next March, when I’ll be giving a keynote at the EMBO Workshop on Visualizing Biological Data in Heidelberg. Actually, I won’t be talking about any of those three things (though there’s a good chance I’ll talk about things like this), but registration is now open for participants:
We invite you to participate in the first EMBO Workshop on Visualizing Biological Data (VizBi) 3 – 5 March 2010 at the EMBL’s new Advanced Training Centre in Heidelberg, Germany.
The goal of the workshop is to bring together, for the first time, researchers developing and using visualization systems across all areas of biology, including genomics, sequence analysis, macromolecular structures, systems biology, and imaging (including microscopy and magnetic resonance imaging). We have assembled an authoritative list of 29 invited speakers who will present an exciting program, reviewing the state-of-the-art and perspectives in each of these areas. The primary focus will be on visualizing processed and annotated data in their biological context, rather than on processing of raw data.
The workshop is limited in the total number of participants, and each participant is normally required to present a poster and to give a ‘fastforward’ presentation about their work (limited to 30 seconds and 1 slide).
To apply to join the workshop, please go to http://vizbi.org and submit an abstract and image related to your work. Submissions close on 16 November 2009. Since places are limited, participants will be selected based on the relevance of their work to the goals of the workshop.
Notifications of acceptance will be sent within three weeks after the close of submissions.
We plan to award a prize for the submitted image that best conveys a strong scientific message in a visually compelling manner.
Please forward this announcement to anyone who may be interested. We hope to see you in Heidelberg next spring!
Seán O’Donoghue, EMBL
Jim Procter, University of Dundee
Nils Gehlenborg, European Bioinformatics Institute
Reinhard Schneider, EMBL
If you have any questions about the registration process please contact:
I’ve been hesitant to post this video of Keith Olbermann’s 17-minute timeline connecting the shifting terror alert level to the news cycle and administration at the risk of veering too far into politics, but I’m reminded again of it with Tom Ridge essentially admitting to it in his book:
In The Test of Our Times: America Under Siege, Ridge wrote that although Rumsfeld and Ashcroft wanted to raise the alert level, “There was absolutely no support for that position within our department. None. I wondered, ‘Is this about security or politics?'”
Only to recant and be taken to task by Rachel Maddow:
Ridge went on to say that “politics was not involved” and that “I was not pressured.” Maddow then read to Ridge directly from his book’s jacket: “‘He recounts episodes such as the pressure that the DHS received to raise the security alert on the eve of the ’04 presidential election.’ That’s wrong?”
As Seth Meyers put it, “My shock level on manipulation of terror alerts for political gain is green, or low.”
At any rate, whether there is in fact correlation, causation, or simply a conspiracy theory that gives far too much credit to the number of people who would have to be involved, I think it’s an interesting look at 1) message control 2) using the press (or a clear example of the possibilities) 3) the power of assembling information like this to produce such a timeline, and 4) actual reporting (as opposed to tennis match commentary) done by a 24-hour news channel.
Of course, I was disappointed that it wasn’t an actual visual timeline, though somebody has probably done that as well.
Finally got around to watching Dan Frankowski’s “You Are What You Say: Privacy Risks of Public Mentions” Google Tech Talk the other day. (I had the link set aside for two years. There’s a bit of a backlog.) In the talk, he takes an “anonymized” set of movie ratings and removes the anonymity by matching them to public mentions of movies in user profiles on the same site.
Interestingly, the ratings themselves weren’t as informative as the actual choice of movies to talk about. In the case of a site for movie buffs — ahem, film aficionados — I couldn’t help but think about participants in discussions using obscure film references as colored tail feathers as they try to out-strut one another. Of course this has significant impact on such a method, making the point that individual uniqueness is only a signature for identification: what makes you different just makes you more visible to a data mining algorithm.
The other interesting bit from the talk is about 20 minutes through, where he starts to address ways to defeat such methods. There aren’t many good ideas, because of the tradeoffs involved in each, but it’s interesting to think about.
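To make the matching idea concrete, here is a rough sketch that scores each anonymized rating set by how many publicly mentioned titles it contains. This is a simplified stand-in, not the method from the talk: the class, the IDs, and the film titles are hypothetical, and a plain overlap count ignores the point that obscure titles should count for more than popular ones.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: a plain overlap count, with invented IDs and titles.
class MentionMatchSketch {
    // Count how many publicly mentioned titles also appear in a rated set.
    static int overlap(Set<String> mentioned, Set<String> rated) {
        int count = 0;
        for (String title : mentioned) {
            if (rated.contains(title)) {
                count++;
            }
        }
        return count;
    }

    // Return the anonymized ID whose rated titles overlap most with the
    // mentions, or null if no rating set shares any title with them.
    static String bestMatch(Set<String> mentioned, Map<String, Set<String>> ratedById) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> e : ratedById.entrySet()) {
            int score = overlap(mentioned, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> rated = new HashMap<>();
        rated.put("user-17", new HashSet<>(Arrays.asList("Film A", "Film B", "Film C", "Film D")));
        rated.put("user-42", new HashSet<>(Arrays.asList("Film B", "Film E")));
        Set<String> mentioned = new HashSet<>(Arrays.asList("Film A", "Film C", "Film D"));
        System.out.println(bestMatch(mentioned, rated)); // prints "user-17"
    }
}
```

Even this crude version shows why the choice of movies matters more than the ratings: the score never looks at a rating value at all.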
I’ve just posted a new piece that depicts changes between the multiple editions of Darwin’s “On the Origin of Species”:
To quote myself, because it looks important:
We often think of scientific ideas, such as Darwin’s theory of evolution, as fixed notions that are accepted as finished. In fact, Darwin’s On the Origin of Species evolved over the course of several editions he wrote, edited, and updated during his lifetime. The first English edition was approximately 150,000 words and the sixth is a much larger 190,000 words. In the changes are refinements and shifts in ideas — whether increasing the weight of a statement, adding details, or even a change in the idea itself.
The idea that we can actually see change over time in a person’s thinking is fascinating. Darwin scholars are of course familiar with this story, but here we can view it directly, both on a macro-level as it animates, or word-by-word as we examine pieces of the text more closely.
This is hopefully the first of multiple pieces working with this data. Having worked with it since last December, I’ve been developing a larger application that deals with the information in a more sophisticated way, but that’s continually set aside because of other obligations. This simpler piece was developed for Emily King’s “Quick Quick Slow” exhibition opening next week at Experimenta Design in Portugal. As is often the case, many months were spent to try to create something monolithic, then in a very short time, an offshoot of all that work is developed that makes use of that infrastructure.
Oddly enough, I first became interested in this because of a discussion with a friend a few years ago, who had begun to wonder whether Darwin had stolen most of his better ideas from Alfred Russel Wallace, but gained the notoriety and credit because of his social status. (This appealed to the paranoid creator in me.) She cited the first edition of Darwin’s text as incoherent, and that it gradually improved over time. Interestingly (and happily, I suppose), the process of working on this piece has instead shown the opposite, and I have far greater appreciation for Darwin’s ideas than I had in the past.
The New York Times today looks upon the plight of poor AT&T, saddled with millions of new customers paying thousands of dollars a year. Jenna Wortham writes:
Slim and sleek as it is, the iPhone is really the Hummer of cellphones. It’s a data guzzler. Owners use them like minicomputers, which they are, and use them a lot. Not only do iPhone owners download applications, stream music and videos and browse the Web at higher rates than the average smartphone user, but the average iPhone owner can also use 10 times the network capacity used by the average smartphone user.
If that 10x number didn’t come from AT&T, where did it come from? Seems like they might be starting a “we didn’t want the iPhone anyway” campaign so that investors treat them more nicely when they (are rumored to) lose their carrier exclusivity next year.
The result is dropped calls, spotty service, delayed text and voice messages and glacial download speeds as AT&T’s cellular network strains to meet the demand. Another result is outraged customers.
So even with AT&T’s outrageous prices, they can’t make this work? This week I’m canceling my AT&T service because it would cost $150 a month to get what T-Mobile charges me $80 for. (Two lines with shared minutes, texting on both lines, unlimited data on one, and even tethering. I also love T-Mobile’s customer service, staffed by friendly humans who don’t just read from scripts.)
With nine million users paying in excess of $100 a month apiece, they’re grossing a billion dollars a month, and they’re complaining about having to upgrade their network? They could probably fund rebuilding their entire network from scratch with the $15/month they charge to send more than 200 text messages. (Text messages are pure profit, because they’re sent using extra space in packets sent between the phone and the carrier.)
All of the cited problems, of course, would be lessened without carrier exclusivity. Don’t want 9 million iPhone customers clogging the network? Then don’t sign a deal requiring that you’re the only network they have access to. Hilarious.
But! The real reason I’m posting is because of the photos that accompany the article, including a shot of the AT&T command center and its big board:
A few thoughts:
If they’re gonna make it look like an orchestra pit, then I hope the head of IT is wearing tails.
Do they get night & weekend minutes because the lights are out? Wouldn’t the staff be a little happier if the lights were turned on?
And most important, I wonder what kind of coverage they get in there. It looks like the kind of underground bunker where you can’t get any signal. And if I’m not mistaken, those look like land lines on the desks.
As a continuation of this project, we’ve just finished a second health visualization (also built with Processing) using GE’s data. Like the first round, we started with ~6 million patient records from their “MQIC” database. Using the software, you input gender, age range, height/weight (to calculate BMI), and smoking status. Based on the selections it shows you the number of people in the database that match those settings, and the percentages that have been diagnosed with diabetes, heart disease, hypertension, or have had a stroke:
For people reading the site because they’re interested in visualization (I guess that’s all of you, except for mom, who is just trying to figure out what I’m up to), some inside baseball:
On the interaction side, the main objective here was to make it easy to move around the interface as quickly as possible. The rows are shown in succession so that the interface can teach itself, but we also provide a reset button so that you can return to the starting point. Once the rows are visible, though, it’s easy to move laterally and make changes to the settings (swapping between age ranges, for instance).
One irony of making the data accessible this way is that most users — after looking up their own numbers — will then try as many different possibilities, in a quick hunt for the extremes. How high do the percentages go? If I select bizarre values, what happens at the edges? Normally, you don’t have to spend as much time on these 1% cases, and it would be alright for things to be a little weird when truly odd values are entered (300 lb. people who are 4′ tall, smokers, and age 75 and over). But in this case, a lot more time has to be spent making sure things work. So while most of the time the percentages at the top are in the 5-15% range, I had to write code so that when one category shoots up to 50%, the other bars in the chart scale down in proportion.
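One way to handle that kind of rescaling is sketched below in plain Java (which drops into a Processing sketch with little change). It is not the code from the actual piece. It assumes the chart normally maps a fixed percentage to the full bar width, and the 20% default and 200-pixel width are invented values for illustration.

```java
import java.util.Arrays;

// Illustrative only: defaultMax and fullWidth are assumed values.
class BarScaleSketch {
    // Convert percentages to bar lengths in pixels. Normally the full width
    // represents defaultMax percent. If any value exceeds defaultMax, the
    // full width represents the largest value instead, so every bar
    // shrinks by the same factor and the proportions stay intact.
    static float[] barLengths(float[] percents, float defaultMax, float fullWidth) {
        float top = defaultMax;
        for (float p : percents) {
            top = Math.max(top, p);
        }
        float[] lengths = new float[percents.length];
        for (int i = 0; i < percents.length; i++) {
            lengths[i] = fullWidth * percents[i] / top;
        }
        return lengths;
    }

    public static void main(String[] args) {
        float[] typical = {5, 10, 15, 8};
        float[] extreme = {50, 10, 15, 8};
        System.out.println(Arrays.toString(barLengths(typical, 20, 200)));
        System.out.println(Arrays.toString(barLengths(extreme, 20, 200)));
    }
}
```

With these assumed numbers, a 10% bar is 100 pixels long in the typical case and drops to 40 pixels once another category hits 50%.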
Another aspect of the interface is the body mass index calculator. Normally a BMI chart looks something like this, a large two-dimensional plot that would otherwise use up half of the interface. By using a little interaction, we can make a simpler chart that dynamically updates itself based on the current height or weight settings. Also, because the ranges have (mathematically) hard edges, we’re showing that upper and lower bound of the range so that it’s more apparent. Otherwise, a 5’8″ person steps from 164 to 165 lbs to find themselves suddenly overweight. In reality, the boundaries are more fuzzy, which would be taken into account by a doctor. But with the software, we instead have to be clear about the way the logic is working.
(Note that the height and weight are only used to calculate a BMI range — it’s not pulling individuals from the database who are 5’8″ and 160 lbs, it’s pulling people from the “normal” BMI range.)
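The hard edge at 164 versus 165 lbs falls straight out of the arithmetic. The sketch below uses the standard imperial BMI formula (703 × pounds ÷ inches²) and the commonly published cutoffs of 18.5, 25, and 30. Whether the actual software uses exactly these cutoffs is an assumption, and the class and method names are hypothetical.

```java
// Illustrative only: standard formula and commonly published cutoffs,
// which may not match the categories used in the actual software.
class BmiSketch {
    static final double[] LOWER_BOUNDS = {0, 18.5, 25, 30};
    static final String[] NAMES = {"underweight", "normal", "overweight", "obese"};

    // Imperial BMI: 703 * weight (lb) / height (in) squared.
    static double bmi(double heightInches, double weightPounds) {
        return 703.0 * weightPounds / (heightInches * heightInches);
    }

    // Name of the last category whose lower bound the BMI reaches.
    static String category(double bmi) {
        String name = NAMES[0];
        for (int i = 0; i < LOWER_BOUNDS.length; i++) {
            if (bmi >= LOWER_BOUNDS[i]) {
                name = NAMES[i];
            }
        }
        return name;
    }

    // Weight in pounds at which a person of this height reaches the given BMI.
    static double weightAt(double heightInches, double bmi) {
        return bmi * heightInches * heightInches / 703.0;
    }

    public static void main(String[] args) {
        double height = 68; // 5'8"
        System.out.println(category(bmi(height, 164))); // prints "normal"
        System.out.println(category(bmi(height, 165))); // prints "overweight"
        // Weight bounds of the "normal" range at this height,
        // roughly 121.7 to 164.4 lbs.
        System.out.println(weightAt(height, 18.5) + " to " + weightAt(height, 25));
    }
}
```

Showing the two `weightAt` values next to the slider is one way to make the boundary visible rather than surprising.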
For the statistically (or at least numerically) inclined, there are also some interesting quirks that can be found, like a situation or two where health risk would be expected to go up, but in fact they go down (I’ll leave you to find them yourself). This is not a bug. We’re not doing any sort of complex math here to evaluate actual risk, the software is just a matching game with individuals in the database. These cases in particular show up when there are only a few thousand individuals, say 2,000 out of the full 6 million records. The number of people in these edge cases is practically a rounding error, which means that we can’t make sound conclusions with them. As armchair doctor-scientist, it’s also interesting to speculate as to what might be happening in such cases, and how other factors may come into play.
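The "matching game" amounts to a filter followed by a count. A minimal sketch, assuming each record carries gender, age, BMI, smoking status, and a diagnosis flag. The `Patient` class, its fields, and the toy records are hypothetical and are not the MQIC schema, and only one of the four conditions is tracked to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: hypothetical record layout and invented data.
class MatchSketch {
    static class Patient {
        final String gender;
        final int age;
        final double bmi;
        final boolean smoker;
        final boolean diabetes;
        Patient(String gender, int age, double bmi, boolean smoker, boolean diabetes) {
            this.gender = gender; this.age = age; this.bmi = bmi;
            this.smoker = smoker; this.diabetes = diabetes;
        }
    }

    // Keep patients matching every selection. Ages are inclusive at both
    // ends; the BMI range includes minBmi and excludes maxBmi.
    static List<Patient> select(List<Patient> db, String gender, int minAge, int maxAge,
                                double minBmi, double maxBmi, boolean smoker) {
        List<Patient> group = new ArrayList<>();
        for (Patient p : db) {
            if (p.gender.equals(gender) && p.age >= minAge && p.age <= maxAge
                    && p.bmi >= minBmi && p.bmi < maxBmi && p.smoker == smoker) {
                group.add(p);
            }
        }
        return group;
    }

    // Percentage of the group with a diabetes diagnosis (0 for an empty group).
    static double percentDiabetic(List<Patient> group) {
        if (group.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (Patient p : group) {
            if (p.diabetes) {
                count++;
            }
        }
        return 100.0 * count / group.size();
    }

    public static void main(String[] args) {
        List<Patient> db = Arrays.asList(
            new Patient("F", 45, 23.0, false, false),
            new Patient("F", 50, 24.0, false, true),
            new Patient("F", 47, 27.0, false, true),   // BMI outside the range
            new Patient("M", 46, 22.0, false, false),  // different gender
            new Patient("F", 48, 21.0, true, true));   // smoker
        List<Patient> group = select(db, "F", 40, 54, 18.5, 25, false);
        System.out.println(group.size() + " matches, " + percentDiabetic(group) + "% diabetic");
    }
}
```

Reporting the group size next to the percentage is what lets a reader spot the thin edge cases: a percentage computed from a couple thousand matches deserves far less trust than one computed from a few hundred thousand.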
In 2001, when I was a young MIT faculty member overseeing the Media Lab Aesthetics and Computation Group, two students came up with an idea that would become an award-winning piece of software called Processing—which I am often credited with having a hand in conceiving. Processing, a programming language and development environment that makes sophisticated animations and other graphical effects accessible to people with relatively little programming experience, is today one of the few open-source challengers to Flash graphics on the Web. The truth is that I almost stifled the nascent project’s development, because I couldn’t see the need it would fill. Luckily, Ben Fry and Casey Reas absolutely ignored my opinion. And good for them: the teacher, after all, isn’t always right.
To give him more credit (not that he needs it, but maybe because I’m bad with compliments), John’s objection had much to do with the fact that Processing was explicitly an evolutionary, as opposed to revolutionary, step in how coding was done. That’s why it was never the focus of my Masters or Ph.D. work, and instead has nearly always been a side project. And more importantly, for students in his research group, he usually forced us away from whatever came naturally for us. Those of us for whom creating tools was “easy,” he forced us to make less practical things. For those who were comfortable making art, he steered them toward creating tools. In the end, we all learned more that way.
Tiny Sketch is an open challenge to artists and programmers to create the most compelling creative work possible with the programming language Processing using 200 characters or less.
…building on the proud traditions of obfuscated code contests and the demo scene. The contest runs through September 13 and is sponsored by Rhizome and OpenProcessing.
Having designed Processing to do one thing or another, several of the submissions made me laugh out loud for ways their authors managed to introduce new quirks. For instance, consider the createFont() function. Usually it looks something like this:
PFont f = createFont("Helvetica", 12);
If the “Helvetica” font is not installed, it silently winds up using a default font. So somebody clever figured out that if you just leave the font name blank, it’s an easy way to get a default font, and not burn several characters of the limit:
PFont f = createFont("", 12);
Another, by Kyle McDonald, throws an exception as a way to produce text to plot on screen. (It’s also a bit of an inside joke—on us, perhaps—because it’s a ubiquitous error message resulting from a change that was made since earlier releases of Processing.)
One of the most interesting bits is seeing how these ideas propagate into later sketches that are produced. Since the font hack appeared (not sure who did it first, let me know if you do), everyone else is now using that method for producing text. Obviously art/design/coding projects are always the result of other influences, but it’s rare that you get to see ideas exchanged in such a direct fashion.
And judging from some of the jagged edges in the submissions, I’m gonna change the smooth() to just s() for the next release of Processing, so that more people will use it in the next competition.
More typographic tastiness (see the earlier post) from Hoefler & Frere-Jones with a writeup on Choosing Fonts for Annual Reports. Lots of useful design help and ideas for anyone who works with numbers, whether actual annual reports or (more likely) fighting with Excel and PowerPoint. For instance, using enclosures to frame numbers, or knock them out:
Another helpful trick is using two weights so that you can avoid placing a line between them:
Or using a proper condensed face when you have to invite too many of your numerical friends:
At any rate, I recommend the full article for anyone working with numbers, either for the introduction to setting type (for the non-designers) or a useful reminder of some of the solutions (for those who fret about these things on a regular basis).
Responding to the Boehner post, Jay Parkinson, M.D. pointed me to this improved chart by designer Robert Palmer, accompanied by an angst-ridden open letter (an ironic contrast to the soft pastels in his diagram) decrying the crimes of visual malfeasance.
Meanwhile, Ezra Klein over at the Washington Post seems to be thinking along similar lines as my original post, noting this masked artist’s earlier trip to Kinko’s a few weeks ago. Klein writes:
Whoever is heading the Scary Flowcharts Division of John Boehner’s office is quickly becoming my favorite person in Washington. A few weeks ago, we got this terror-inducing visualization of the process behind “Speaker Pelosi’s National Energy Tax.”
If I were teaching right now, I’d make all my students do a one day charrette on trying to come up with something worse than the Boehner health care image while staying in the realm of colloquial things you can do with PowerPoint. It’d be a great time, and we’d all learn a lot.
Watching the centrist Democrats in Congress create more and more reasons why health care can’t be fixed, I’ve been struck by a disquieting thought: Suppose our collective lack of response to Hurricane Katrina wasn’t exceptional but, rather, the new normal in America. Suppose we can no longer address the major challenges confronting the nation. Suppose America is now the world’s leading can’t-do country.
I agree and find it terrifying. And I don’t think that’s a partisan issue.
Now back to your purposefully apolitical, regularly scheduled blog on making pictures of data.
BBC News brings word (via) that after a 44-year effort, the Historical Thesaurus of the Oxford English Dictionary will see the light of day. Rather than simple links between words, the beastly volume covers the history of the words within. For instance, the etymological timeline of the word “trousers” follows:
breeks: The earliest reference from 1552 marks the change in fashion from breeches, a garment tied below the knee and worn with tights. Still used in Scotland, it derives from the Old English “breeches”.
trouser: The singular form of “trousers” comes from the Gallic word “trews”, a close-fitting tartan garment formerly worn by Scottish and Irish highlanders and to this day by a Scottish regiment. The word “trouses” probably has the same derivation.
unimaginables: This 19th Century word, and others like “unwhisperables” and “never-mention-ems”, reflect Victorian prudery. Back then, even trousers were considered risque, which is why there were so many synonyms. People didn’t want to confront the brutal idea, so found jocular alternatives. In the same way the word death is avoided with phrases like “pass away” and “pushing up daisies”.
stove-pipes: A 19th Century reference hijacked in the 1950s by the Teddy Boys along with drainpipes. The tight trousers became synonymous with youthful rebellion, a statement of difference from the standard post-war suits.
rammies: This abbreviation of Victorian cockney rhyming slang “round-me-houses” travelled with British settlers to Australia and South Africa.
Are you seeing pictures and timelines yet? Then this continues for 600,000 more words. Mmmm!
An English language professor, Ms Kay, one of four co-editors of the publication, began work on it in the late 1960s – while she was in her 20s.
It’s hard to fathom being in your 60s, and completing the book that you started in your 20s, though it’s difficult to argue with the academic and societal contribution of the work. Her web page also lists “the use of computers in teaching and research” as one of her interest areas, which sounds like a bit of an understatement. I’d be interested in computers too if my research interest was the history of 600,000 words and their 800,000 meanings across 236,000 categories.
I’m so completely impressed with this incredible bit of info graphic awesomeness distributed by the office of John Boehner, Republican congressman from Ohio’s 8th District. The flow chart purports to show the Democrats’ health care proposal:
The image needs to be much larger to be fully appreciated in its magnificent glory of awfulness, so a high resolution version is here, and the PDF version is here.
The chart was used by Boehner as a way to make the plan look as awful as possible — a tactic used to great effect by the same political party during the last attempt at health care reform in 1994. The diagram appears to be the result of a heart-warming collaboration between a colorblind draughtsman, the architect of a nightmarish city water works, and whoever designed the instructions for the bargain shelving unit I bought from Target.
Don’t waste your time, by the way — I’ve already nominated it for an AIGA award.
(And yes, The New Republic also created a cleaner version, and the broader point is that health care is just a complex mess no matter what, so don’t let that get in the way of my enjoyment of this masterwork.)
Additional perspective from The Daily Show (my original source) follows.
The New York Times this morning documents Major League Baseball’s use of DNA tests to verify the age of baseball prospects:
Dozens of Latin American prospects in recent years have been caught purporting to be younger than they actually were as a way to make themselves more enticing to major league teams. Last week the Yankees voided the signing of an amateur from the Dominican Republic after a DNA test conducted by Major League Baseball’s department of investigations showed that the player had misrepresented his identity.
Some players have also had bone scans to be used in determining age range.
(Why does a “bone scan” sound so painful? “You won’t provide a DNA sample? Well, maybe you’ll change your mind after the bone scan!”)
Kathy Hudson of Johns Hopkins notes the problem with testing:
The article continues and makes note of the fact that such tests are also used to determine whether a player’s parents are his real parents, which can have an upsetting outcome.
But perhaps the broader concern (outside broken homes) and the scarier motivation for expansion of such testing is noted by a scouting director (not named), who comments:
“Can they test susceptibility to cancer? I don’t know if they’re doing any of that. But I know they’re looking into trying to figure out susceptibility to injuries, things like that. If they come up with a test that shows someone’s connective tissue is at a high risk of not holding up, can that be used? I don’t know. I do think that’s where this is headed.”
Injury is perhaps the most significant, yet most random, factor in scouting. If we’re talking about paying someone $27 million, will the threat of a federal discrimination law (wielded by a young player and agent) really be enough to keep teams away from this?
In other news, an article from Slate about measuring obesity using BMI (Body Mass Index). Interesting reading as I continue with work in the health care space. The article goes through the obvious flaws of the BMI measure, along with some history. Jeremy Singer-Vine writes:
Belgian polymath Adolphe Quetelet devised the equation in 1832 in his quest to define the “normal man” in terms of everything from his average arm strength to the age at which he marries. This project had nothing to do with obesity-related diseases, nor even with obesity itself. Rather, Quetelet used the equation to describe the standard proportions of the human build—the ratio of weight to height in the average adult. Using data collected from several hundred countrymen, he found that weight varied not in direct proportion to height (such that, say, people 10 percent taller than average were 10 percent heavier, too) but in proportion to the square of height. (People 10 percent taller than average tended to be about 21 percent heavier.)
For some reason, this brings to mind a guy in a top hat guessing people’s weight at the county fair. More to the point is the “how did we get here?” part of the story. Starting as a mediocre measure, it evolved into something for which it was never intended, simply because it worked for a large number of individuals:
The new measure caught on among researchers who had previously relied on slower and more expensive measures of body fat or on the broad categories (underweight, ideal weight, and overweight) identified by the insurance companies. The cheap and easy BMI test allowed them to plan and execute ambitious new studies involving hundreds of thousands of participants and to go back through troves of historical height and weight data and estimate levels of obesity in previous decades.
Gradually, though, the popularity of BMI spread from epidemiologists who used it for studies of population health to doctors who wanted a quick way to measure body fat in individual patients. By 1985, the NIH started defining obesity according to body mass index, on the theory that official cutoffs could be used by doctors to warn patients who were at especially high risk for obesity-related illness. At first, the thresholds were established at the 85th percentile of BMI for each sex: 27.8 for men and 27.3 for women. (Those numbers now represent something more like the 50th percentile for Americans.) Then, in 1998, the NIH changed the rules: They consolidated the threshold for men and women, even though the relationship between BMI and body fat is different for each sex, and added another category, “overweight.” The new cutoffs—25 for overweight, 30 for obesity—were nice, round numbers that could be easily remembered by doctors and patients.
I hadn’t realized that it was only in 1985 that this came into common use. And I thought the new cutoffs had more to do with the stricter definition from the WHO, rather than the simplicity of rounding. But back to the story:
Keys had never intended for the BMI to be used in this way. His original paper warned against using the body mass index for individual diagnoses, since the equation ignores variables like a patient’s gender or age, which affect how BMI relates to health.
After taking as fact that it was a poor indicator, all this grousing about the inaccuracy of BMI now has me wondering how often it’s actually out of whack. For instance, it does poorly for muscular athletes, but what percentage of the population is that? 10% at the absolute highest? Or at the risk of sounding totally naive, if the metric is correct, say, 85% of the time, does it deserve as much derision as it receives?
Going a little further, another fascinating part of this returns to the fact that the BMI numbers had in the past been a sort of guideline used by doctors. Consider the context: a doctor might sit with a patient in their office, and if the person is obviously not obese or underweight, not even consider such a measure. But if there’s any question, BMI provides a general clue as to an appropriate range, which, when delivered by a doctor with experience, can be framed appropriately. However, as we move to using technology to record such measures (it’s easy to put an obesity calculation into an electronic medical record, for instance), that EMR does not (necessarily) include the doctor’s delivery.
Basically, we can make a general rule or goal that numbers that require additional context (delivery by a doctor) shouldn’t be stored in places devoid of context (databases). If we’re taking away context, the accuracy of the metric has to increase in proportion (or proportion squared, even) to the amount of context that has been removed.
I assume this is the case for most fields, and that the statistical field has a term (probably made up by Tukey) for the “remove context, increase accuracy” issue. At any rate, that’s the end of today’s episode of “what’s blindingly obvious to proper statisticians but I like working out for myself.”
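For anyone who wants to poke at the numbers, here is a minimal sketch of the arithmetic from the Slate excerpts: the usual BMI formula (weight in kilograms over height in meters squared), the “10 percent taller, about 21 percent heavier” relationship, and the two 1998 cutoffs. The function names and the example person are my own inventions for illustration, and the category labels cover only the two thresholds quoted above.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters, squared."""
    return weight_kg / height_m ** 2


def nih_1998_category(value: float) -> str:
    """Label a BMI using only the two 1998 cutoffs quoted above (25 and 30)."""
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "below the overweight cutoff"


# If weight scales with the square of height, someone 10 percent taller
# than average is expected to be 1.1 ** 2 = 1.21 times as heavy.
weight_ratio = 1.1 ** 2

# Hypothetical person, not taken from any data set.
example = bmi(weight_kg=80.0, height_m=1.75)

print(f"expected weight ratio: {weight_ratio:.2f}")
print(f"BMI {example:.1f}: {nih_1998_category(example)}")
```

Note that the label is exactly the kind of context-free number the rule above warns about: the function has no idea whether it is looking at a linebacker.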
A Fonts for Financials mailing from Hoefler & Frere-Jones includes some incredibly beautiful typefaces they’ve developed that play well with numbers. A sampling includes tabular figures (monospaced numbers, meaning “farewell, Courier!”) using Gotham and Sentinel:
Or setting indices (numbers in circles, apparently), using Whitney:
As Casey wrote this morning, “these are the sexiest numbers I’ve seen in some time.” I love ’em.
My favorite part of this week’s Seminar on Innovative Approaches to Turn Statistics into Knowledge (aside from its comically long name) was the presentation from Amanda Cox of The New York Times. She showed three particular projects which are a little further up the complexity scale as compared to a lot of the work from the Times, and much more like the sort of numerical messes that catch my interest. The three are also a great cross-section of Amanda’s work with her collaborators, so I’m posting them here. Check ’em out:
Just days before shooting was to begin, Sony Pictures pulled the plug on “Moneyball,” a major film project starring Brad Pitt and being directed by Steven Soderbergh.
Yesterday I found it far more unsettling that such a movie would be made, period, but today I’m oddly curious about how they might pull it off:
What baseball saw as accurate, Sony executives saw as being too much a documentary. Mr. Soderbergh, for instance, planned to film interviews with some of the people who were connected to the film’s story.
I guess we’ll never know, since other studios also passed on the project, but that’s probably a good thing.
As an aside, I’m in the midst of reading Liar’s Poker (another by Moneyball author Michael Lewis) and again find myself amused by his ability as a storyteller: he reminds me of a friend who can take the most banal event and turn it into the most peculiar and hilarious story you’ve ever heard.
Meanwhile, my inbox has been filling with plaintive comments like this one:
Will you be updating this site for this year? It’s the first year I think my team, the Giants would have a blue line instead of a red line.
How can I ignore the Giants fans? (Or for that matter, their neighbors to the south, the Dodgers, who perch atop the list as I write this.)
There’s simply no way to give people access to others’ private records — in the name of security or otherwise — and trust those given access to do the right thing. From a New York Times story on the NSA’s expanded wiretapping:
The former analyst added that his instructors had warned against committing any abuses, telling his class that another analyst had been investigated because he had improperly accessed the personal e-mail of former President Bill Clinton.
This is not isolated, and this will always be the case. From a story in The Boston Globe a month ago:
Law enforcement personnel looked up personal information on Patriots star Tom Brady 968 times – seeking anything from his driver’s license photo and home address, to whether he had purchased a gun – and auditors discovered “repeated searches and queries” on dozens of other celebrities such as Matt Damon, James Taylor, Celtics star Paul Pierce, and Red Sox owner John Henry, said two state officials familiar with the audit.
The NSA wiretapping is treated too much like an abstract operation, with most articles that describe it overloaded with talk of “data collection,” and “monitoring,” and the massive scale of data that traffics through internet service providers. But the problem isn’t the computers and data and equipment, it’s that on the other end of the line, a human being is sitting there deciding what to do with that information. Our curiosity and voyeurism leave us fundamentally flawed for dealing with such information, and unable to ever live up to the responsibility of having that access.
The story about the police officers who are overly curious about sports stars (or soft rock balladeers) is no different from the NSA wiretapping, because it’s still people, with the same impulses, on the other end of the line. Until reading this, I had wanted to believe that NSA employees — who should truly understand the ramifications — would have been more professional. But instead they’ve proven themselves no different from a local cop who wants to know if Paul Pierce owns a gun or Matt Damon has a goofy driver’s license picture.
Adobe Illustrator has regressed into talking back like it’s a two-year-old:
Asked for further comment, Illustrator responded:
CANT DO THAT. MOMMY NOOOOOO! CANT!
No doubt this is my own fault for not having upgraded to CS4. I’ll wait for CS5, when I can shell out for the privilege of using 64 bits; maybe the additional memory access will allow me to open files that worked in Illustrator 10 but no longer open on newer releases because the system (with 10x the RAM, and 5x the CPU) runs out of memory.
Casey wrote with more info regarding the previous post about Pelham. The command center in the movie is fake (as expected), because the real command center looks too sophisticated. NPR had this quote from John Johnson (spelling?), New York City Transit’s Chief Transportation Officer:
“They actually … attempted to downplay what the existing control center looks like, because they wanted to make it look real to the average eye as compared to… we’re pretty Star Trekky up in the new control center now.”
So that would explain the newish typeface used in the image, and the general dumbing-down of the display. The audio from the NPR story is here, with the quote near the 3:00 mark.
This is the only image I’ve been able to find of the real command center:
Links to larger/better/more descriptive images welcome!
Is this a real place? Buried within the bowels of New York City? And Mr. Washington, how about using one of your two telephones to order a new typeface for that wall? Looks like a hundred thousand dollars of display technology being used for ASCII line art.
Last week at the CaT conference, I met Sheena Matheiken, a designer who is … I’ll let her explain:
Starting May 2009, I have pledged to wear one dress for one year as an exercise in sustainable fashion. Here’s how it works: There are 7 identical dresses, one for each day of the week. Every day I will reinvent the dress with layers, accessories and all kinds of accouterments, the majority of which will be vintage, hand-made, or hand-me-down goodies. Think of it as wearing a daily uniform with enough creative license to make it look like I just crawled out of the Marquis de Sade’s boudoir.
Interesting, right? Particularly where the idea is to make the outfit new through the sort of forced creativity that comes from wearing a uniform. But also not unlike the dozens (hundreds? thousands?) of other “I’m gonna do x each day for 365 days” projects, where obsessive compulsive types take a photo, choose a Pantone swatch, learn a new word, etc. in celebration of the Earth revolving about its axis once more. Yale’s graduate graphic design program even frequents a yearly “100 day” project along these lines. (Don’t get me wrong–I’m happy to obsess and compulse with the best of them.)
But then it gets more interesting:
The Uniform Project is also a year-long fundraiser for the Akanksha Foundation, a grassroots movement that is revolutionizing education in India. At the end of the year, all contributions will go toward Akanksha’s School Project to fund uniforms and other educational expenses for slum children in India.
How cool! I love how this ties the project together. More can be found at The Uniform Project, with daily photos of Sheena’s progress. And be sure to donate.
I’m looking forward to what she has to say about what she’s learned about clothes and how you wear them after the year is complete. Ironic, that the year she wears the same thing for 365 days will be her most creative.
A simple, interactive means for seeing connections between demographics, diseases, and diagnoses:
We just finished developing this project for GE as part of the launch of their new health care initiative. With the input and guidance of a handful of departments within the company, we began by combing through their proprietary database of 14 million patient records, looking for ways to show connections between related conditions. For instance, we wanted visitors to the site to be able to learn how diabetes diagnoses increase along with obesity, but convey it in a manner that didn’t feel like a math lesson. By cycling through the eight items at the top (and the row beneath it), you can make several dozen comparisons, highlighting what’s found in actual patient data. At the bottom, some additional background is provided based on various national health care studies.
I’m excited to have the project finished and online, and have people making use of it, as I readjust from the instant gratification of building things one day and then talking about them the next day. More to come!
Depicting networks (also known as graphs, and covered in chapters 7 and 8 of Visualizing Data) is a tricky subject, and too often leads to representations that are a tangled and complicated mess. Such diagrams are often referred to with terms like ball of yarn or string, a bird’s nest, cat hair or simply hairball.
It’s also common for a network diagram to be engaging and attractive for its complexity (usually aided and abetted by color), which tends to hide how poorly it conveys the meaning of the data it represents.
On the other hand, Tamara Munzner is someone in visualization who really “gets” graphs in greater depth. A couple years ago she gave an excellent Google Tech Talk (looks like it was originally from another conference in ’05), titled “15 Views of a Node Link Graph” (video, links, slides) where she discussed a range of methods for viewing graph data, along with their pros and cons:
A cheat sheet of the 15 methods:
Animated Radial Layouts
Multilevel Call Matrices
2D Hyperbolic Trees
The presentation is an excellent survey of methods, and highly recommended for anyone getting started with graph and network data. It’s useful food for thought for the “how should I represent this data?” question.
I was in the midst of starting a new post in January so I failed to make a post about it at the time, but Oblong’s Tamper installation was on display at the 2009 Sundance Film Festival. John writes (and I copy verbatim):
Our Sundance guests — who already number in the thousands — find the experience exhilarating. A few grim cinephiles have supplementally raised an eyebrow (one per cinephile) at the filmic heresy that TAMPER provides: a fluid new ability to isolate, manipulate, and juxtapose (rudely, say the grim) disparate elements (ripped from some of the greatest works of cinema, continue the grim). For us, what’s important is the style of work: real-time manipulation of media elements at a finer granularity than has previously been customary or, for the most part, possible; and a distinctly visceral, dynamic, and geometric mode of interaction that’s hugely intuitive because the incorporeal suddenly now reacts just like bits of the corporeal world always have. Also, it’s glasses-foggingly fun.
I mostly find this fascinating having not seen it properly depicted, but the interactive version shows more about locations of power plants, plus maps of solar and wind power along with their relative capacities.
I love the craggy beauty of the layered lines, and appreciate the restraint of the map’s creators to simply show us this amazing data set.
And if you find yourself toe tapping and humming “we gonna rock down to…” later this afternoon, then I’m really sorry. I’m already beginning to regret it.
I’ve not been working on Windows much lately, but while installing Windows XP today, I was greeted with this fine work of nonfiction, which reminds me why I miss it so:
So I can’t synchronize the time because…the time on the machine is incorrect. And not only that, but my state represents a security risk to the time synchronization machine in the sky.
I hope the person who wrote this error message enjoyed it as much as I did. At least when writing bad error messages in Processing I have some leeway for making fun of the situation (hence the unprofessional window titles of some of the error dialogs).
Reader Eric Mika sent a link to the video of Obama’s speech that I mentioned a couple days ago. The speech was knocked from the headlines by news of Arlen Specter leaving the Republican party within just a few hours, so this is my chance to repeat the story again.
Specter’s defection is only relevant (if it’s relevant at all) until the next election cycle, so it’s frustrating to see something that could affect us for five to fifty years pre-empted by what talking heads are more comfortable bloviating about. It’s a reminder that with all the progress we’ve made on how quickly we can distribute news, and the increase in the number of outlets by which it’s available, the quality and thoughtfulness of the product has only been further undermined.
Update, a few hours later: it’s a battle of the readers! Now Jamie Alessio has passed along a high quality video of the President’s speech from the White House channel on YouTube. Here’s the embedded version:
Author Ben Fry will be presenting “Computational Information Design” –a mix of his work in visualization and coding plus a quick introduction to Processing. We are very excited to talk to Mr. Fry and our thanks go out to this event’s sponsors: Atalasoft and Snowtide Informatics.
I believe it is not in our American character to follow – but to lead. And it is time for us to lead once again. I am here today to set this goal: we will devote more than three percent of our GDP to research and development. We will not just meet, but we will exceed the level achieved at the height of the Space Race, through policies that invest in basic and applied research, create new incentives for private innovation, promote breakthroughs in energy and medicine, and improve education in math and science. This represents the largest commitment to scientific research and innovation in American history.
I’m not much for patriotism rah-rah but it’s hard not to get fired up about this. I found the rest of his speech remarkable as well, listing specific technologies that emerged from basic research, too often overlooked:
The Apollo program itself produced technologies that have improved kidney dialysis and water purification systems; sensors to test for hazardous gasses; energy-saving building materials; and fire-resistant fabrics used by firefighters and soldiers.
And the announcement of a new agency along the lines of DARPA:
And today, I am also announcing that for the first time, we are funding an initiative – recommended by this organization – called the Advanced Research Projects Agency for Energy, or ARPA-E.
This is based on the Defense Advanced Research Projects Agency, known as DARPA, which was created during the Eisenhower administration in response to Sputnik. It has been charged throughout its history with conducting high-risk, high-reward research. The precursor to the internet, known as ARPANET, stealth technology, and the Global Positioning System all owe a debt to the work of DARPA.
The speech, nearly 5000 words in total (did our former President spill that many words for science during eight years in office?) continues with more policy regarding research, investment, and education–all very exciting to read. But perhaps my favorite line of all came when he said to the members of the National Academy of Sciences in attendance:
And so today I want to challenge you to use your love and knowledge of science to spark the same sense of wonder and excitement in a new generation.
Word on the street (where by “the street” I mean an email from Golan Levin), is that the Center for Responsive Politics has made available piles and piles of data:
The following data sets, along with a user guide, resource tables and other documentation, are now available in CSV format (comma-separated values, for easy importing) through OpenSecrets.org’s Action Center at http://www.opensecrets.org/action/data.php:
CAMPAIGN FINANCE: 195 million records dating to the 1989-1990 election cycle, tracking campaign fundraising and spending by candidates for federal office, as well as political parties and political action committees. CRP’s researchers add value to Federal Election Commission data by cleaning up and categorizing contribution records. This allows for easier totaling by industry and company or organization, to measure special-interest influence.
LOBBYING: 3.5 million records on federal lobbyists, their clients, their fees and the issues they reported working on, dating to 1998. Industry codes have been applied to this data, as well.
PERSONAL FINANCES: Reports from members of Congress and the executive branch that detail their personal assets, liabilities and transactions in 2004 through 2007. The reports covering 2008 will become available to the public in June, and the data will be available for download once CRP has keyed those reports.
527 ORGANIZATIONS: Electronically filed financial records beginning in the 2004 election cycle for the shadowy issue-advocacy groups known as 527s, which can raise unlimited sums of money from corporations, labor unions and individuals.
The best thing here is that they’ve already tidied and scrubbed the data for you, just like Mom used to. The personal finance information alone has already led to startling revelations.
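As a rough idea of what “easier totaling by industry” might look like once the files are on your disk, here is a minimal sketch using Python’s standard csv module. The column names (`industry`, `amount`) and the sample rows are made up for illustration; the real files have their own layout, described in CRP’s user guide.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows standing in for a downloaded file. The real data
# has its own columns; "industry" and "amount" are assumptions.
SAMPLE = """industry,amount
Oil & Gas,2500
Education,500
Oil & Gas,1000
"""


def total_by_industry(csv_text: str) -> dict:
    """Sum the amount column for each distinct industry value."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["industry"]] += int(row["amount"])
    return dict(totals)


print(total_by_industry(SAMPLE))
```

For a real download you would pass an open file to `csv.DictReader` instead of wrapping a string in `io.StringIO`.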
Curated by AXIOM Founding Director, Heidi Kayser, PARSE, includes the work of five artists who use data to present new perspectives on the underlying information that makes us human. Overlooked patterns of data surround us daily. The artists in PARSE sort, separate and amalgamate physical, mental and social information to create intricate visualizations in print, interactive media, animation and sculpture. These pieces track and reflect our brainwaves during REM sleep, our genetic code, our social icons, and even our carnal desires.
Featuring works by: Ben Fry and Eugene Kuo, Fernanda Viegas and Martin Wattenberg, Jason Salavon, Jen Hall
The opening is from 6-9pm. The gallery location is amazing — it’s a nook to the side of the Green Street subway station (on the Orange Line in Boston) — it makes me think of what it might be like to have a show at the lair of Bill Murray’s character in Caddyshack. I love that it’s been reserved as a gallery space.
Martin & Fernanda are showing their Fleshmap project, along with a pair of amalgamations by Jason Salavon, and two sculptures from Jen Hall (hrm, can’t find a link for those). Our project is described here, and uses comparisons of the DNA between many species that have been the focus of my curiosity recently to make compositions like the one seen to the right.
The story begins with the writer having a chance meeting with a friend, and inquiring about his apparent interest in puppet theater. As the story moves on:
“And what is the advantage your puppets would have over living dancers?”
“The advantage? First of all a negative one, my friend: it would never be guilty of affectation. For affectation is seen, as you know, when the soul, or moving force, appears at some point other than the centre of gravity of the movement. Because the operator controls with his wire or thread only this centre, the attached limbs are just what they should be.… lifeless, pure pendulums, governed only by the law of gravity. This is an excellent quality. You’ll look for it in vain in most of our dancers.”
The remainder is a wonderful parable of vanity and grace.
Welcome to the future, where everything about you is saved. A future where your actions are recorded, your movements are tracked, and your conversations are no longer ephemeral. A future brought to you not by some 1984-like dystopia, but by the natural tendencies of computers to produce data.
Data is the pollution of the information age. It’s a natural by-product of every computer-mediated interaction. It stays around forever, unless it’s disposed of. It is valuable when reused, but it must be done carefully. Otherwise, its after-effects are toxic.
The essay goes on to cite specific examples, though they sound more high-tech than the actual problem. Later it returns to the important part:
Cardinal Richelieu famously said: “If one would give me six lines written by the hand of the most honest man, I would find something in them to have him hanged.” When all your words and actions can be saved for later examination, different rules have to apply.
Society works precisely because conversation is ephemeral; because people forget, and because people don’t have to justify every word they utter.
Conversation is not the same thing as correspondence. Words uttered in haste over morning coffee, whether spoken in a coffee shop or thumbed on a BlackBerry, are not official correspondence.
And an earlier paragraph that highlights why I talk about privacy on this site:
And just as 100 years ago people ignored pollution in our rush to build the Industrial Age, today we’re ignoring data in our rush to build the Information Age.
Liskov, the first U.S. woman to earn a PhD in computer science, was recognized for helping make software more reliable, consistent and resistant to errors and hacking. She is only the second woman to receive the honor, which carries a $250,000 purse and is often described as the “Nobel Prize in computing.”
I’m embarrassed to admit that I wasn’t more familiar with her work prior to reading about it in Tuesday’s Globe, but wow:
Liskov’s early innovations in software design have been the basis of every important programming language since 1975, including Ada, C++, Java and C#.
Liskov’s most significant impact stems from her influential contributions to the use of data abstraction, a valuable method for organizing complex programs. She was a leader in demonstrating how data abstraction could be used to make software easier to construct, modify and maintain…
In another contribution, Liskov designed CLU, an object-oriented programming language incorporating clusters to provide coherent, systematic handling of abstract data types. She and her colleagues at MIT subsequently developed efficient CLU compiler implementations on several different machines, an important step in demonstrating the practicality of her ideas. Data abstraction is now a generally accepted fundamental method of software engineering that focuses on data rather than processes.
This has nothing to do with gender, of course, but I find it exciting apropos of this earlier post regarding women in computer science.
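For anyone who hasn’t run into the term, data abstraction roughly means that the rest of a program talks to a type only through its operations and never touches how the data is stored. Here is a minimal sketch in Python (not CLU, and not taken from Liskov’s work); the `IntSet` class and its method names are just an illustration.

```python
class IntSet:
    """A set of integers whose representation is hidden behind its operations."""

    def __init__(self) -> None:
        # Private detail: a plain list today, but it could be swapped for
        # a hash table without changing any code that uses IntSet.
        self._items = []

    def insert(self, value: int) -> None:
        """Add a value, ignoring duplicates."""
        if value not in self._items:
            self._items.append(value)

    def contains(self, value: int) -> bool:
        """Report whether the value has been inserted."""
        return value in self._items

    def size(self) -> int:
        """Number of distinct values stored."""
        return len(self._items)


numbers = IntSet()
numbers.insert(3)
numbers.insert(3)  # duplicate, ignored
numbers.insert(7)
print(numbers.size(), numbers.contains(7))
```

Callers only ever see `insert`, `contains`, and `size`, which is what makes the software “easier to construct, modify and maintain”: the storage can change without anyone else noticing.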
Update: As of January 1st, 2010, I’m no longer at Seed. Read more here.
Some eighteen months as visualization vagabond (roving writer, effusive explainer, help me out here…) came to a close in December when I signed up with Seed Media Group to direct a new visualization studio here in Cambridge. We now have a name—the Phyllotaxis Lab—and as of last week, we’ve made it official with a press release:
NEW YORK and CAMBRIDGE, MA (March 5, 2009) – Building on Seed Media Group’s strong design culture, Adam Bly, founder and CEO, announced today the appointment of Ben Fry as the company’s first Design Director. Seed Media Group also announced the launch of a new unit focused on data and information visualization to be based in Cambridge, Massachusetts and headed by Ben Fry.
Seed Visualization will help companies and governments find solutions to clearly communicate complex data sets and information to various stakeholders. The unit’s research arm, the Phyllotaxis Lab, will work to advance the field of data visualization and will undertake research and experimental design work. The Lab will partner with academic institutions around the world and will provide education on the field of data visualization.
And about that name:
Phyllotaxis is a form commonly found in nature that is derived from the Fibonacci sequence. It is the inspiration for Seed Media Group’s logo, designed in 2005 by Stefan Sagmeister and recently included in the Design and the Elastic Mind exhibit at MoMA. “Much like a phyllotaxis, visualization is about both numbers and information as well as structure and form,” said Ben Fry. “It’s a reminder that beauty is derived from the intelligence of the solution.”
The full press release can be found here (PDF), and more details are forthcoming.
Some combination of internet-fed conspiracy theorists and Google Earthlings (lings that use Google Earth) were abuzz last week with an odd image find, possibly representing the lost city of Atlantis:
These hopes were later dashed (or perhaps only fed further) when the apparition was denied in a post on the Official Google Blog crafted by two of the gentlemen involved in the data collection for Google Ocean. The post is fascinating as it describes much of the process that they use to get readings of the ocean floor. They explain how echosounding (soundwaves bounced into the depths) is used to determine distance, and when that’s not possible, they actually use the sea level itself:
Above large underwater mountains (seamounts), the surface of the ocean is actually higher than in surrounding areas. These seamounts actually increase gravity in the area, which attracts more water and causes sea level to be slightly higher. The changes in water height are measurable using radar on satellites. This allows us to make a best guess as to what the rest of the sea floor looks like, but still at relatively low resolutions (the model predicts the ocean depth about once every 4000 meters). What you see in Google Earth is a combination of both this satellite-based model and real ship tracks from many research cruises (we first published this technique back in 1997).
How great is that? The water actually reveals shapes beneath because of gravity’s rearrangement of the ocean surface.
A more accurate map of the entire ocean would require a bit more effort:
…we could map the whole ocean using ships. A published U.S. Navy study found that it would take about 200 ship-years, meaning we’d need one ship for 200 years, or 10 ships for 20 years, or 100 ships for two years. It costs about $25,000 per day to operate a ship with the right mapping capability, so 200 ship-years would cost nearly two billion dollars.
Holy crap, two billion dollars? That’s real money!
That may seem like a lot of money…
Yeah, no kidding — that’s what I just said!
…but it’s not that far off from the price tag of, say, a new sports stadium.
You mean this would teach us more than New Yorkers will learn from the Meadowlands Stadium debacle, beyond “the Jets still stink” and “Eli Manning is still a weenie”? (Excellent Bob Herbert op-ed on a similar topic — the education part, not the Manning part.)
So in the end, this “Atlantis” is the result of the rounding error in the patchwork of data produced by the various measurement and tiling methods. Not as exciting as a waterlogged and trident-wielding civilization, but the remainder of the article is a great read if you’re curious about how the ocean images are collected and assembled.
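The ship-years arithmetic quoted above is easy to check. A minimal sketch in Python, using only the figures from the Google post; the 365-day operating year and the function names are my assumptions, not anything from the Navy study:

```python
# Figures quoted in the Google post; a 365-day operating year is an assumption.
SHIP_YEARS = 200
COST_PER_DAY = 25_000  # US dollars per ship per day
DAYS_PER_YEAR = 365


def survey_cost(ship_years, cost_per_day, days_per_year=DAYS_PER_YEAR):
    """Total cost in dollars of operating ships for the given ship-years."""
    return ship_years * days_per_year * cost_per_day


def years_needed(ship_years, ships):
    """Calendar years to finish when the work is split evenly across ships."""
    return ship_years / ships


total = survey_cost(SHIP_YEARS, COST_PER_DAY)
print(f"${total:,}")  # $1,825,000,000

for ships in (1, 10, 100):
    print(ships, "ships:", years_needed(SHIP_YEARS, ships), "years")
```

That comes to about $1.8 billion, which squares with the "nearly two billion dollars" in the post.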
Passing along a call for the ACM Creativity & Cognition 2009. Sadly I’m overbooked and won’t be able to participate this year, but I attended in 2007 and found it a much more personal alternative to the more enormous ACM conferences (CHI, SIGGRAPH) without losing quality.
Everyday Creativity: Shared Languages and Collective Action
October 27-30, 2009, Berkeley Art Museum, CA, USA
Sponsored by ACM SIGCHI, in co-operation with SIGMM/ SIGART [pending approval]
Mihaly Csikszentmihalyi Professor of Psychology & Management, Claremont Graduate University, USA
JoAnn Kuchera-Morin Director, Allosphere Research Laboratory, California Nanosystems Institute, USA
Jane Prophet Professor of Interdisciplinary Computing, Goldsmiths University of London, UK
Call for Participation
Full Papers, Art Exhibition, Live Performances, Demonstrations, Posters, Workshops, Tutorials, Graduate Symposium
Creativity is present in all we do. The 7th Creativity and Cognition Conference (CC09) embraces the broad theme of Everyday Creativity. This year the conference will be held at the Berkeley Art Museum (CA, USA), and asks: How do we enable everyone to enjoy their creative potential? How do our creative activities differ? What do they have in common? What languages can we use to talk to each other? How do shared languages support collective action? How can we incubate innovation? How do we enrich the creative experience? What encourages participation in everyday creativity?
The Creativity and Cognition Conference series started in 1993 and is sponsored by ACM SIGCHI. The conference provides a forum for lively interdisciplinary debate exploring methods and tools to support creativity at the intersection of art and technology. We welcome submissions from academics and practitioners, makers and scientists, artists and theoreticians. This year’s broad theme of Everyday Creativity reflects the new forms of creativity emerging in everyday life, and includes topics of:
Collective creativity and creative communities
Shared languages and Participatory creativity
Incubating creativity and supporting Innovation
DIY and folk creativity
New materials for creativity
Enriching the collaborative experience
We welcome the following forms of submission:
Empirical evaluations by quantitative and qualitative methods
In-depth case studies and ethnographic analyses
Reflective and theoretical accounts of individual and collaborative practice
Principles of interaction design and requirements for creativity support tools
Educational and training methods
Interdisciplinary methods, and models of creativity and collaboration
Analyses of the role of technology in supporting everyday creativity
The Berkeley Art Museum should be a great venue too.
Yearly vaccination is currently needed because different strains of the virus circulate around the world regularly, owing to the germs’ rapidly changing genetic makeup. But the researchers reported yesterday that they had found one pocket of the virus that appears to remain static in multiple strains, making it an attractive target for a vaccine, as well as drugs.
And instead of fighting the primary part of the virus head on, you figure out a way to attack a portion in the weaker part that does not mutate, and neutralize it:
Most vaccines work by revving up the body’s disease-fighting cells, helping them to recognize and rapidly neutralize invading germs. The researchers realized that the disease fighters generated by existing flu vaccines – which contain killed or weakened whole viruses – head straight toward the biggest target, the globular head. It is, in effect, a Trojan horse that prevents the body’s immune system from directing more of its firepower toward the stalk of the [virus], where the scientists found the pocket that was so static. That site contains machinery that lets the virus penetrate human cells.
A vaccine is a way off, but they say it should be possible to make a drug that helps the body create antibodies to fight off the flu sooner than that. Incredible work.
This month’s pirate reference comes to us by way of the theory of the Flying Spaghetti Monster. The theory was first introduced in an open letter from Bobby Henderson to the Kansas State Board of Education after the board decided that creationism must be taught alongside the theory of evolution. I had disregarded the Spaghetti Monster as a heavy-handed response to the hard-headed, but had missed this important bit of context:
You may be interested to know that global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of Pirates since the 1800s. For your interest, I have included a graph of the approximate number of pirates versus the average global temperature over the last 200 years. As you can see, there is a statistically significant inverse relationship between pirates and global temperature.
A stunning find! And like an overly literal translation of the bible, so accurate — except when it’s not. The horizontal scale, as Edward Tufte would say, “repays careful study.”
I wrote about my excitement over the rumor that Google was going under back in April, but now it has officially happened — the Ocean has arrived as part of Google Earth:
Look at those trenches! And now you can use the Google Earth software to fly through the area in the middle of the Atlantic where some god has decided to begin peeling the globe like an orange.
I’m waiting for the day (presumably a few years from now) that this feature includes other major bodies of water, revealing the hidden shapes beneath the surface of lakes or rivers that you know well from above. The physical relief version, that is. I’ll pass on the underwater Google Street View with their privacy-invading minisubs sticking their nose in everyone’s business.
We are pleased to announce the release of a new Conservation track based on the human (hg18) assembly. This track shows multiple alignments of 44 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species (vertebrate) and two subsets (primate and placental mammal). The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track. For more details, please visit the track description page…
Would someone tell me how this happened? We were the fucking vanguard of shaving in this country. The Gillette Mach3 was the razor to own. Then the other guy came out with a three-blade razor. Were we scared? Hell, no. Because we hit back with a little thing called the Mach3Turbo. That’s three blades and an aloe strip. For moisture. But you know what happened next? Shut up, I’m telling you what happened—the bastards went to four blades. Now we’re standing around … selling three blades and a strip. Moisture or no, suddenly we’re the chumps. Well, fuck it. We’re going to five blades.
Conservation tracks in the human genome are simply additional lines of annotation shown alongside the human DNA sequence. The lines show areas of identical or near-identical DNA found in other species (in this case 44 vertebrates). In the past we might have looked at two, three, seven, maybe a dozen different species in a row. UCSC had actually been up to 27 different species at a time before they took the extra push over the cliff to 44.
As it turns out, just sequencing the human genome isn’t all that interesting. It only starts to get interesting in the context of other genomes from other species. With multiple species, the data can be compared and evolutionary trees drawn. We can take an organism that we know a lot about — say the fruitfly — and compare its genes (which have been studied extensively) to the genetic code of humans (who have been studied less), and we can look for similar regions. For instance, the HOX family of genes is involved in structure and limb development. A similar region can be found in humans, insects, and many things in between. How cool is that?
Further, how about all that “junk” DNA? A particular portion of DNA might have no known function, but if you find an area where the data matches (is conserved) with another species, then it might not be quite as irrelevant as previously thought (and for the record, the term junk is only used in the media). If you see that it’s highly conserved (a large percentage is identical) across many different species, then you’re probably onto something, and it’s time to start digging further.
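To make "highly conserved" a little more concrete, here is a toy sketch in Python. It is not phastCons or phyloP, which use proper evolutionary models. It only measures, for each column of an alignment, what fraction of species share the most common base. The sequences are made up for illustration, and the function name is hypothetical:

```python
from collections import Counter

# Toy alignment: invented sequences, not real genomic data.
alignment = {
    "human": "ACGTACGTAC",
    "chimp": "ACGTACGTAC",
    "mouse": "ACGTTCGTAC",
    "dog":   "ACGAACGTAC",
    "fly":   "ACGCTCG-AC",
}


def column_conservation(seqs):
    """Fraction of sequences sharing the most common base in each column."""
    rows = list(seqs.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*rows):
        bases = [b for b in column if b != "-"]  # gaps never count as matches
        top = Counter(bases).most_common(1)[0][1] if bases else 0
        scores.append(top / len(column))
    return scores


scores = column_conservation(alignment)
for position, score in enumerate(scores):
    print(position, "#" * round(score * 10), score)
```

Columns where every species agrees score 1.0, and the score drops wherever the species diverge or one of them has a gap. A run of high scores across many species is the "time to start digging" signal described above.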
Spending time with data like this really highlights the silliness of anti-evolution claims. It’s tough to argue with being able to see it. Unfortunately most of the work I’ve done in this area isn’t documented properly, though you can see human/chimp/dog/mouse alignments in this genome browser, a dozen mammals aligned in this illustration, or humans and chimps in this piece.
As an aside, a few months after the Onion article, Gillette really did go to five blades with their Fusion razor. And happily, the (real) CEO speaks with the same bravado as the earlier editorial:
“The Schick launch has nothing to do with this, it’s like comparing a Ferrari to a Volkswagen as far as we’re concerned,” Chairman, President and Chief Executive James Kilts, told Reuters.
Beneath a pile of 1099s, I found myself distracted still thinking about the logo colors and proportions seen in the previous post. This led to a diversion to extract the colors from the Super Bowl logos and depict them according to their usage. The colors are counted up and laid out using a Treemap.
The result for all 43 Super Bowl logos, using the same layout as the previous image:
A few of the typical pairs, starting with 2001:
See all of the pairings here. Some notes about what’s mildly clever, and the less so:
The empty space (white areas or transparent background) is subtracted from the logo, and the code tries to size the Treemap according to the aspect ratio of the original image, so that when seen adjacent to the logo, things look balanced (kinda).
The code is a simple adaptation of the Treemap project in Chapter 7 of Visualizing Data.
Unfortunately, I could not find vector images (for all of the games, at least), which means the colors in the original images are not pure. For instance, edges of a solid blue color will have light blue edges because of smoothing (anti-aliasing). This makes it difficult to accurately figure out what’s a real color and what isn’t. Sometimes the fuzzy edge colors are correctly removed, other times not so much. Even worse, it may even remove legitimate colors that are used in less than 4-5% of the image.
The color quantization isn’t good. On a few, it’s bad, and causes a few similar colors to disappear.
All the above could be fixed, but taxes are more important than non-representational art. (That’s not a blanket statement — just for me this evening.)
And finally, I don’t honestly think there’s any relationship between a software algorithm for data visualization and the work of an artist like Piet Mondrian. But I do love the idea of a Dutch painter from the De Stijl movement making his way through the turnstiles at Raymond James Stadium.
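For the curious, the counting step described in the notes above can be sketched roughly as follows. This is Python rather than the Processing code actually used, and the bucket size, the 5% cutoff, and the function names are all assumptions for illustration. Pixels are plain (r, g, b) tuples so the example needs no image library:

```python
from collections import Counter


def quantize(color, step=64):
    """Snap each RGB channel to the centre of a bucket `step` values wide."""
    return tuple((c // step) * step + step // 2 for c in color)


def color_usage(pixels, background=(255, 255, 255), min_share=0.05, step=64):
    """Share of non-background pixels per quantized color.

    Colors below `min_share` are dropped, which removes most anti-aliased
    edge colors but also any legitimate color that is used sparingly.
    """
    counts = Counter(quantize(p, step) for p in pixels if p != background)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        color: n / total
        for color, n in counts.most_common()
        if n / total >= min_share
    }


# Made-up "logo": blue and red fills, a few fuzzy edge pixels, white background.
pixels = (
    [(0, 0, 200)] * 60
    + [(200, 0, 0)] * 37
    + [(120, 120, 230)] * 3
    + [(255, 255, 255)] * 50
)
usage = color_usage(pixels)
print(usage)
```

The resulting shares are what would be handed to the Treemap as areas. The light blue edge pixels fall under the cutoff and vanish, which is the intended behavior, but a real accent color covering 3% of a logo would vanish in exactly the same way.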
The original article cites how the logos reflect the evolution and growth of the league. Which makes sense: you can see that it was more than fifteen years before it moved from just a logotype to a fully branded extravaganza. Or that in its first year it wasn’t the Super Bowl at all, and instead billed as “The First World Championship Game of the American Football League versus the National Football League,” a title that sounds great in a late-60s broadcaster voice (try it, you’ll like it), but was still shortened to the neanderthal “First World Championship Game AFL vs NFL” for the logo, before it was renamed the “Super Bowl” the following year. (You can stop repeating the name in the broadcaster voice now, your officemates are getting annoyed.)
The similarities in the coloring are perhaps more interesting than the differences, though the general Americana obsession of the constant blue/red coloring is unsurprising, especially when you recall that some of the biggest perennial ad buyers (Coke, Pepsi, Budweiser) also share red, white, and blue labels. I’m guessing that the heavy use of yellow in the earlier logos had more to do with yellow looking good against a background when used for broadcast.
Or maybe not — like any good collection, there’s plenty to speculate about and many hypotheses to be drawn — and the investigation is more interesting for the exercise.
I’m often asked about sonification—instead of visualization, turning data into audio—but I’ve never pursued it because there are other things that I’m more curious about. The bigger issue is that I was concerned that audio would require even more of a trained ear than a visualization (according to some) requires a trained eye.
My opinion of Songsmith is shifting — while it’s generally presented as a laughingstock, catastrophic failure, or if nothing else, a complete embarrassment (especially for its developers slash infomercial actors), it’s really caught the imagination of a lot of people who are creating new things, even if all of them subvert the original intent of the project. (Where the original intent was to… create a tool that would help write a jingle for glow in the dark towels?)
At any rate, I think it’s achieved another kind of success, and web memes aside, I’m curious to see what actual utility comes from derivatives of the project, now that the music idea is firmly planted in peoples’ heads.
And if you stopped the video halfway through because it got a little tedious, you missed some of the good bits toward the end.
R is also the name of a popular programming language used by a growing number of data analysts inside corporations and academia. It is becoming their lingua franca partly because data mining has entered a golden age, whether being used to set ad prices, find new drugs more quickly or fine-tune financial models. Companies as diverse as Google, Pfizer, Merck, Bank of America, the InterContinental Hotels Group and Shell use it.
R is also open source, another focus of the article, which includes quoted gems such as this one from commercial competitor SAS:
“I think it addresses a niche market for high-end data analysts that want free, readily available code,” said Anne H. Milley, director of technology product marketing at SAS. She adds, “We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”
Pure gold: free software is scary software! And freeware? Is she trying to conflate R with free software downloads from CNET?
Truth be told, I don’t think I’d want to be on a plane that used a jet engine designed or built with SAS (or even R, for that matter). Does she know what her product does? (A hint: It’s a statistics package. You might analyze the engine with it, but you don’t use it for design or construction.)
For those less familiar with the project, some examples:
…companies like Google and Pfizer say they use the software for just about anything they can. Google, for example, taps R for help understanding trends in ad pricing and for illuminating patterns in the search data it collects. Pfizer has created customized packages for R to let its scientists manipulate their own data during nonclinical drug studies rather than send the information off to a statistician.
At any rate, many congratulations to Robert Gentleman and Ross Ihaka, the original creators, for their success. It’s a wonderful thing that they’re making enough of a rumpus that a stats package is being covered in a mainstream newspaper.
Eugene Kuo sends a link to the Wikipedia article on center of population, an awkward term for the middlin’ place of all the people in a region. Calculation can be tricky because the Earth is round (what!?) and because of the statistical hooey that goes into determining a proper distance metric. The article includes a heat map of world population:
…the world’s center of population is found to lie “at the crossroads between China, India, Pakistan and Tajikistan”, with an average distance of 5,200 kilometers (3,200 mi) to all humans…
Though sadly, the map also uses a strange color scale for the heat map, with blue the area of greatest density, and red (traditionally the “important” end of the scale) as the least populated area. Even shifting the colors helps a bit, at least in terms of highlighting the correct area:
Though the shift is of questionable accuracy, and the bright green still draws too much attention, as does the banding in the middle of the Atlantic.
Outside of musing for your own edification, practical applications of calculating a population’s center include:
…locating possible sites for forward capitals, such as Brasilia, Astana or Austin. Practical selection of a new site for a capital is a complex problem that depends also on population density patterns and transportation networks.
Check the article for more about centers of various countries, including the United States:
The mean center of United States population has been calculated for each U.S. Census since 1790. If the United States map were perfectly balanced on a point, this point would be its physical centroid. Currently this point is located in Phelps County, Missouri, in the east-central part of the state. However, when Washington, D.C. was chosen as the federal capital of the United States in 1790, the center of the U.S. population was in Kent County, Maryland, a mere 47 miles (76 km) east-northeast of the new capital. Over the last two centuries, the mean center of United States population has progressed westward and, since 1930, southwesterly, reflecting population drift.
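The round-Earth trouble is easier to see in code. One simple definition of a mean center (there are several, and I am not claiming this is the one the Census Bureau or the Wikipedia map uses) treats each place as a point on a unit sphere, takes the population-weighted average of those 3D points, and projects the result back onto the surface. A sketch in Python, with made-up towns and populations:

```python
import math


def mean_center(points):
    """Population-weighted mean center on a sphere.

    `points` is an iterable of (latitude_deg, longitude_deg, population).
    Returns (latitude_deg, longitude_deg) of the averaged direction.
    """
    x = y = z = 0.0
    for lat, lon, pop in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += pop * math.cos(phi) * math.cos(lam)
        y += pop * math.cos(phi) * math.sin(lam)
        z += pop * math.sin(phi)
    if x == 0.0 and y == 0.0 and z == 0.0:
        raise ValueError("points cancel out; the center is undefined")
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


# Two invented towns of equal size on the equator, either side of the date line.
towns = [(0.0, 170.0, 1000), (0.0, -170.0, 1000)]
lat, lon = mean_center(towns)
print(lat, lon)
```

Naively averaging the two longitudes gives 0 degrees, which is the opposite side of the planet from both towns. Averaging on the sphere puts the center on the date line between them, where it belongs. Minimizing average travel distance, as in the world figure quoted earlier, is a different and harder calculation.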
For added fun, I’ve created an interactive version of the map, based on a Processing example. (Though it took me longer to write the credits for the adaptation than to actually assemble it — thanks to all those who contributed little bits to it.)
Back in December (or maybe even November… sorry, digging out my inbox this morning) Amazon announced the availability of public data sets for their Elastic Compute Cloud platform:
Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, users can produce or use prebuilt server images with tools and applications to analyze the data sets. By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.
The current list includes ENSEMBL (550 GB), GenBank (250 GB), various collections from the US Census (about 500 GB), and a handful of others (with more promised). I’m excited about the items under the “Economy” heading, since lots of that information has to date been difficult to track down in one place and in a single format.
While it may be possible to download these as raw files from FTP servers from their original sources, it’s already set up for you, rather than running rsync or ncftp for twenty-four hours, then spending an afternoon setting up a Linux server with MySQL and lots of big disk space, and dealing with various issues regarding versions of Apache, MySQL, PHP, different Perl modules to be installed, permissions to be fixed, etc. etc. (Can you tell the pain is real?)
As I understand it, you start with a frozen version of the database, then import that into your own workspace on AWS, and pay only for the CPU time, storage, and bandwidth that you actually use. Pricing details are here, but wear boots — there’s a lotta cloud marketingspeak to wade through.
And apparently the games have even been popular since the early 80s. I found this bit especially interesting:
Fantasy soccer doesn’t really work – the game can’t really be quantified in the way NFL football or baseball can – so it could be that these games’ popularity comes from filling the same niche as rotisserie baseball does on your side of the Atlantic.
Which suggests a more universal draw to the numbers game or statistics competition that gives rise to fantasy/rotisserie leagues. The association with sports teams gives it broader appeal, but at its most basic, it’s just sports as a random number generator.
Some further digging yesterday also turned up Baseball Mogul 2008 (and the 2009 Edition). The interface seems closer to a bad financial services app (bad in this case just means poorly designed, click the image above for a screenshot), which is the opposite direction of what I’m interested in, but at least gives us another example. Although this one also seems to have reviewed better than the game from the previous post.
The new game — which is unlike any baseball video game I’ve ever seen — has perhaps the perfect pitchman, Oakland A’s General Manager Billy Beane. For those not familiar with him, the game probably won’t mean much, since as the main subject of Michael Lewis’ hit book, Moneyball, Beane has long been considered the most cerebral and efficient guy putting contending baseball teams on the field.
This caught my eye because of its focus on the numbers, and how you’d pull that off in the context of a console game.
As you may imagine, FOM’s interface is menu heavy, providing access to the various statistical metrics and trends to keep you apprised as general manager. What is surprising is that FOM manages to bring this depth to the console as well as the PC. While other console-based franchise management titles have struggled to create effective navigation tools, FOM’s vertical menu interface is both clean and intuitive without compromising the depth one would expect from a game in this genre. Top-level categories include submenus (many of which include further submenus) similar to navigating a sports Web site.
Other reviews seem to be less charitable, but I’m less interested in the game itself than the curiosity that it exists in the first place. GameSpot describes the audience:
By 2K’s own admission, the game targets a specific niche: the roughly 3.5 million participants of Fantasy Baseball leagues. It is 2K’s hope that this hardcore baseball audience, many of whom spend two to three hours every day managing their fantasy rosters, will see FOM as a convenient alternative (or even a complement, assuming those individuals forgo sleep).
So it’s a niche, as would be expected. But I’m curious about a handful of issues, a combination of not knowing much about gaming, mixed with a fascination for what gaming means for interfaces:
Could this be done properly, to a point where a game like this is a wider success? The niche audience is interesting at first, but is it possible to take a numbers game to a broader audience than that?
Has anyone already had success doing that?
Are there methods for showing complex numbers, data, and stats that have been used in (particularly console) games that are more effective than typical information dashboards used by, say, corporations?
The combination of having a motivated user who is willing to put up with the numbers suggests that some really interesting things could be done. And the fact that the interface has to be optimized for the limited interaction afforded by a handheld controller (if played on a console) suggests that the implementation would also need to be clever.
If you have any insight, please drop me a line. Or you can continue to speculate for yourself while enjoying the promotional video below with the most fantastically awful background music I’ve heard since Microsoft Songsmith appeared a little while ago.
OpenStreetMap is a wiki-style map of the world and this animation displays a white flash each time a way is entered or updated. Some edits are a result of a physical local survey by a contributor with a GPS unit and taking notes, other edits are done remotely using aerial photography or out-of-copyright maps, and some are bulk imports of official data.
Simple idea but really elegant execution. Created by ITO.
That we are in the midst of crisis is now well understood. Our nation is at war, against a far-reaching network of violence and hatred. Our economy is badly weakened, a consequence of greed and irresponsibility on the part of some, but also our collective failure to make hard choices and prepare the nation for a new age. Homes have been lost; jobs shed; businesses shuttered. Our health care is too costly; our schools fail too many; and each day brings further evidence that the ways we use energy strengthen our adversaries and threaten our planet.
These are the indicators of crisis, subject to data and statistics. Less measurable but no less profound is a sapping of confidence across our land – a nagging fear that America’s decline is inevitable, and that the next generation must lower its sights.
For the politically-oriented math geek in me, his mention of statistics stood out: we now have a president who can actually bring himself to reference numbers and facts. I searched for other mentions of “statistics” in previous inaugural speeches and found just a single, though oddly relevant, quote from William Howard Taft in 1909:
The progress which the negro has made in the last fifty years, from slavery, when its statistics are reviewed, is marvelous, and it furnishes every reason to hope that in the next twenty-five years a still greater improvement in his condition as a productive member of society, on the farm, and in the shop, and in other occupations may come.
Progress indeed. (And what’s the term for that? A surprising coincidence? Irony? Is there a proper term for such a connection? Perhaps a thirteen letter German word along the lines of schadenfreude?)
And it’s such a relief to see the return of science:
For everywhere we look, there is work to be done. The state of the economy calls for action, bold and swift, and we will act – not only to create new jobs, but to lay a new foundation for growth. We will build the roads and bridges, the electric grids and digital lines that feed our commerce and bind us together. We will restore science to its rightful place, and wield technology’s wonders to raise health care’s quality and lower its cost. We will harness the sun and the winds and the soil to fuel our cars and run our factories. And we will transform our schools and colleges and universities to meet the demands of a new age. All this we can do. And all this we will do.
An interesting article from Slate about a session at the Joint Mathematics Meeting that discussed mathematical solutions and proposals to undo the problem of gerrymandered congressional districts. That is, politicians in congress having the ability to draw an outline around the group of people they want to represent (which is based on how likely they are to vote for said politician’s re-election). The resulting shapes are often comical, insofar as you’re willing to be cheerful in a “politics is perpetually broken and corrupt” kind of way. Chris Wilson writes:
It’s tough to find many defenders of the status quo, in which a supermajority of House seats are noncompetitive. (Congressional Quarterly ranked 324 of the 435 seats as “safe” for one party or the other in 2008.) The mathematicians—and social scientists and lawyers—who gathered to discuss the subject Thursday are certain there’s a better way to do it. They just haven’t quite figured out what it is.
The meeting also seemed to include a contest (knock down, drag out, winner take pocket protector) between the presenters each trying to one-up each other for worst district. For instance, Florida’s 23rd, provided by govtrack.us:
Which doesn’t seem awful at first, until you see the squiggle up the coast. Or Pennsylvania’s 12th, which Wilson describes as “an anchor glued to a sea anemone.”
Fixing the problem is difficult, but sometimes there are elegant and straightforward metrics that get you closer to a solution:
The most interesting proposal of the afternoon came from a Caltech grad student named Alan Miller, who proposed a simple test: If you take two random people in a district, what are the odds that one can walk in a straight line to the other without ever leaving the district? (Actually, it’s without leaving the district while remaining in the state, so as not to penalize districts like Maryland’s 6th, which has to account for Virginia’s hump.) This rewards neat, simple shapes. But it penalizes districts like Maryland’s 3rd, which looks like something out of Kandinsky’s Improvisation 31.
This turns the issue into something directly testable (two residents and their path) for which we can calculate a probability — the sort of thing statisticians love (because it can be measured). Given this criterion (and others like it) for congressional district godliness, another proposal was a kind of Netflix Prize for redistricting, where groups could compete to develop the best redistricting algorithm. Such an algorithm would seek to remove the (bipartisan) mischief by limiting human intervention.
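The straight-line test lends itself to a quick simulation. The Python sketch below is my own rough approximation, not Miller's method: residents are spread uniformly over the district's area rather than where people actually live, the "remaining in the state" refinement is ignored, and the path is checked only at sampled points along the line. The shapes and function names are invented for illustration:

```python
import random


def inside(pt, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    x, y = pt
    hit = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                hit = not hit
    return hit


def random_resident(poly, rng):
    """Rejection-sample a point uniformly from inside the polygon."""
    xs, ys = zip(*poly)
    while True:
        pt = (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
        if inside(pt, poly):
            return pt


def path_stays_inside(a, b, poly, steps=50):
    """Approximate check: test evenly spaced points along the segment a-b."""
    for t in range(steps + 1):
        f = t / steps
        pt = (a[0] + (b[0] - a[0]) * f, a[1] + (b[1] - a[1]) * f)
        if not inside(pt, poly):
            return False
    return True


def straight_line_score(poly, trials=2000, seed=1):
    """Estimated odds that two random residents can walk straight to each other."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        a, b = random_resident(poly, rng), random_resident(poly, rng)
        ok += path_stays_inside(a, b, poly)
    return ok / trials


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
# A "U": the same square with a deep notch cut out of the top.
u_shape = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 1), (1, 1), (1, 4), (0, 4)]

square_score = straight_line_score(square)
u_score = straight_line_score(u_shape)
print("square:", square_score, "U shape:", u_score)
```

The neat square scores at or very near 1, while the U shape is penalized because any two residents in opposite arms have the notch between them. A real version would weight by census population and use exact segment-polygon intersection instead of sampling.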
The original article also includes a slide show of particularly heinous district shapes. And as an aside, the images above, while enormously useful, illustrate part of my beef with mash-ups: Google Maps was designed as a mapping application, not a mapping-with-stuff-on-it application. So when you add data to the map image (itself a completed design), you throw off that balance. It’s difficult to read the additional information (the district area), and the information that’s there (the map coloring, specific details of the roads) is more than is necessary for this purpose.
Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.
The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)
The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.
The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).
This site is used for follow-up code and writing about related topics.