One of the earliest fixtures in the Processing community is toxi (or Karsten Schmidt, if you must) who has been doing wonderful things using the language/environment/core for many years. A couple months ago he posted a beautiful reel of work done by the many users of his toxiclibs library. Just beautiful:
A more complete description can be found on the video page over at Vimeo.
Having spent my morning at the doctor’s office (I’m fine, Mom–just a physical), I passed the time by asking my doctor about the system they use for electronic medical records. Our GE work (1, 2) and seeing her gripe and sigh as truly awful-looking screen after screen flew past on her display caught my interest. And as someone who has an odd fascination with bad interfaces, I just had to ask…
Perhaps the most surprising bit was that without explicitly saying so, she seemed to find the EMR system most useful not as a thing that aggregates data, or makes her work easier, but instead as a communication tool. It combats the (very real, not just an overused joke) penmanship issues of fellow doctors, but just as important, it sets a baseline or common framework for the details of a visit. The latter part is obvious, but the actual nature of it is more subtle. For instance, she would often find herself deciphering a scribble that says “throat, amox” by another doctor, and it says nothing of dosage, frequency, type of Amoxicillin, much less the nature of the throat trouble. A patient (particularly a sick patient) is also not the person to provide precise details. How many would remember whether they were assigned a 50, 150 or 500 milligram dosage (very different things, you might say)? And for that matter, they’re probably equally likely to think they’re on a 500 kilogram dose. (“No, that’s too high. Must be 5 kilogram.”)
My doctor might be seeing such a patient because their primary care doctor (the mad scribbler) was out, or the patient was a referral, or had just moved offices, or whatever. But it makes an interesting point for the transience of medical data: importance increases as it’s in motion, which is especially true since the patient it’s attached to is not a static entity (from changing health conditions to changing jobs, cities, and doctors).
Or from a simpler angle, if you’re sick enough that you have to be seen by someone other than your primary care doctor, then it’s especially important for the information to be complete. So with any luck, the EMR removes a layer of translation that was required before.
As she described things off the top of her head, the data only came up later. Ok, it’s all data, but I’m referring to the numbers and the tests and the things that can be tracked easily over time. The sort of reduce-the-patient-to-numbers things we usually think of when hearing about EMRs. Readouts that display an array of tests, such as blood pressure history, are an important feature, but they weren’t the killer app of EMRs. (And that will be the last time I use “killer app” and “electronic medical records” together. Pun not intended.)
The biggest downside (she’s now using her second system) is that the interfaces are terrible, usually that they do things in the wrong order, or require several windows and multiple clicks to do mundane tasks. She said there were several things that she liked and hated about this one, but that it was a completely different set of pros/cons from the other system she used. (And to over-analyze for a moment, I think she even said “like” and “hate” not “love” and “hate” or “like” and “dislike”. She also absentmindedly mentioned “this computer is going to kill me.” She’s not a whiner, and may truly believe it. EMRs may be killing our doctors! Call The New York Times, or at least Fox 25.) This isn’t surprising: I assume it’s just that technology purchasers are several levels removed from the doctors who have to use the equipment, which is usually the case for software systems like this, so there’s little market pressure for usability. If you’re big enough to need such a beast, then it means that the person making the decision about what to buy is a long way removed. But I’m curious about whether this is a necessity of how big software is implemented, or a market opportunity.
At some point she also stated that it would be great if the software company had asked a doctor for their input in how the system was implemented. I think it’s safe to assume that there was at least one M.D.–if not an arsenal of individuals with a whole collection of alphabet soup trailing their names–who was involved with the software. But I was struck by how matter-of-fact she was that nobody had even thought about it. The software was that bad, and to her, the flaws were that obvious. The process by which she was forced to travel through the interface had little to do with the way she worked. Now, any expert might have their own way of doing things, but that’s probably not the discrepancy here. (And in fact, if the differences between doctors are that great, then that itself should be part of the software: the doctor needs to be able to change the order in which the software works.) But it’s worth noting that the data (again, meaning the numbers and test history and easily measurable things) were all easily accessible from the interface, which suggests that like so many data-oriented projects, the numbers seduced the implementors. And so those concrete numbers (fourth or so on ranked importance for this doctor) won out over process (the way the doctor spends their day, and their time with the patient).
All of which is a long way of wondering, “are electronic medical records really about data?”
While checking the bus schedule for Greyhound, I recently discovered that travel from New York City to Boston is a multi-day affair, involving stops in Rochester, Toronto (yes, Canada), Fort Erie, Syracuse, and even Schenectady and Worcester (presumably because they’re both fun to say).
1 day, 5 hours, and 35 minutes. That’s the last time I complain about how bad the Amtrak site is.
Fantastic TED talk from Chris Jordan back in February 2008. Chris creates beautiful images that convey scale in the millions. Examples include statistics like the number of plastic cups used in a day — four million — and here showing one million of them:
The talk is ten minutes, and well worth a look. I’m linking a sinfully small version here, but check out the higher resolution version on the TED site.
As much as I love looking at this work (and his earlier portraits, more can be found on his site), there’s also something peculiar about the beauty of the images perhaps neutering their original point. Does seeing the number of prison uniforms spur viewers to action, or does it give chin-rubbing intellectual fulfillment accompanied by a deep sigh of worldliness? I’d hate to think it’s the latter. Someone I asked about this had a different reaction, and cited a group that had actually begun to act based on what they saw in his work. I wish I had the reference, but if that’s the case (and I hope it is), there’s no argument.
Looking at it another way, next time you reach for a plastic cup, will Jordan’s image come to mind? Will you make a different decision, even some of the time?
I’ve also just purchased his “Running the Numbers” book, since these web images are an injustice to the work. And I have more chin scratching and sighing to do.
(Thanks to Ron Kurti for the heads up on the video.)
I wanted to post this last week in my excitement over week 1 of pro football season (that’s the 300 lbs. locomotives pounding into each other kind of football, not the game played with actual balls and feet), but ran out of time. So instead, in honor of football Sunday, week 2, my favorite advertisement of last year’s football season:
The ad is a phone conversation with Coca-Cola’s Katie Bayne, animated by Imaginary Forces. A couple things I like about this… First, that the attitude is so much less heavy-handed than, say, the IBM spots that seem to be based on the premise that if they jump cut quickly enough, they can cure cancer. The woman being interviewed actually laughs about “big data” truisms. Next is the fact that it’s actually a fairly smart question that’s asked:
How important is it that you get the right information rather than just a lot of information?
Well… you know you can roll around in facts all day long. It’s critical that we stay aware of that mountain of data that’s coming in and mine it for the most valuable nuggets. It helps keep us honest.
And third, the visual quality that reinforces the lighter attitude. Cleverly drawn without overdoing it. She talks about being honest and a hand comes flying in to push back a Pinocchio nose. Nuggets of data are shown as… eh, nuggets.
So it takes me a year or two to post the “You Are What You Say” lecture by Dan Frankowski, and the day after, a much more up-to-date paper is in the news. The paper is by Paul Ohm and is available here, or you can read an Ars Technica article about it if you’d prefer the (geeky) executive summary. The paper also cites the work of Latanya Sweeney (as did the Frankowski lecture), with this defining moment of the contemporary privacy debate, when the Massachusetts Group Insurance Commission (GIC) released “anonymized” patient data in the mid-90s:
At the time GIC released the data, William Weld, then Governor of Massachusetts, assured the public that GIC had protected patient privacy by deleting identifiers. In response, then-graduate student Sweeney started hunting for the Governor’s hospital records in the GIC data. She knew that Governor Weld resided in Cambridge, Massachusetts, a city of 54,000 residents and seven ZIP codes. For twenty dollars, she purchased the complete voter rolls from the city of Cambridge, a database containing, among other things, the name, address, ZIP code, birth date, and sex of every voter. By combining this data with the GIC records, Sweeney found Governor Weld with ease. Only six people in Cambridge shared his birth date, only three of them men, and of them, only he lived in his ZIP code. In a theatrical flourish, Dr. Sweeney sent the Governor’s health records (which included diagnoses and prescriptions) to his office.
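The mechanics of that kind of re-identification are simple enough to sketch. What follows is a minimal illustration in plain Java, not Sweeney's actual method or data: the `Voter` and `HealthRecord` classes, their field names, and every sample value are invented for the example. The only idea taken from the passage is that ZIP code, birth date, and sex survive "anonymization" and can be matched against a public list that still carries names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LinkageSketch {
    // Hypothetical row from a public voter roll (all values are made up).
    static final class Voter {
        final String name, zip, birthDate, sex;
        Voter(String name, String zip, String birthDate, String sex) {
            this.name = name; this.zip = zip; this.birthDate = birthDate; this.sex = sex;
        }
    }

    // Hypothetical "anonymized" health row: the name is gone,
    // but ZIP, birth date, and sex are still present.
    static final class HealthRecord {
        final String zip, birthDate, sex, diagnosis;
        HealthRecord(String zip, String birthDate, String sex, String diagnosis) {
            this.zip = zip; this.birthDate = birthDate; this.sex = sex; this.diagnosis = diagnosis;
        }
    }

    // Names of every voter who shares all three fields with the health record.
    static List<String> candidates(HealthRecord r, List<Voter> voters) {
        List<String> names = new ArrayList<>();
        for (Voter v : voters) {
            if (v.zip.equals(r.zip) && v.birthDate.equals(r.birthDate) && v.sex.equals(r.sex)) {
                names.add(v.name);
            }
        }
        return names;
    }

    // Tiny invented voter roll, just large enough to show the narrowing.
    static List<Voter> sampleVoters() {
        return Arrays.asList(
            new Voter("Voter A", "ZIP-1", "1950-05-05", "F"),
            new Voter("Voter B", "ZIP-2", "1950-05-05", "M"),
            new Voter("Voter C", "ZIP-1", "1950-05-05", "M"),
            new Voter("Voter D", "ZIP-1", "1962-11-30", "M"));
    }

    public static void main(String[] args) {
        HealthRecord r = new HealthRecord("ZIP-1", "1950-05-05", "M", "(diagnosis withheld)");
        // When the list has exactly one name, the record is no longer anonymous.
        System.out.println(candidates(r, sampleVoters()));
    }
}
```

Each added field narrows the list. In the invented data above, three voters share the birth date, two of them are men, and only one of those lives in the matching ZIP code.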
And from the “where are they now?” file, Sweeney continues her work at Carnegie Mellon, though I have to admit I’m a little nervous that she’s currently back in my neighborhood with visiting posts at MIT and Harvard. Damn this Cambridge ZIP code.
For the panel, we were to choose “an individual, movement, technology, whatever – whose importance has been overlooked” and follow that with “two themes that [we] believe will define the future of design and architecture.” In that context, I chose Lombardi’s work, and how it highlights a number of themes that are important to the future of design, particularly in working with data.
Give up those full hue heat map colors! Make images of biological data that even a grandmother can love! How about posters that no longer require an advanced degree to decipher? These platitudes and more coming next March, when I’ll be giving a keynote at the EMBO Workshop on Visualizing Biological Data in Heidelberg. Actually, I won’t be talking about any of those three things (though there’s a good chance I’ll talk about things like this), but registration is now open for participants:
We invite you to participate in the first EMBO Workshop on Visualizing Biological Data (VizBi) 3 – 5 March 2010 at the EMBL’s new Advanced Training Centre in Heidelberg, Germany.
The goal of the workshop is to bring together, for the first time, researchers developing and using visualization systems across all areas of biology, including genomics, sequence analysis, macromolecular structures, systems biology, and imaging (including microscopy and magnetic resonance imaging). We have assembled an authoritative list of 29 invited speakers who will present an exciting program, reviewing the state-of-the-art and perspectives in each of these areas. The primary focus will be on visualizing processed and annotated data in their biological context, rather than on processing of raw data.
The workshop is limited in the total number of participants, and each participant is normally required to present a poster and to give a ‘fastforward’ presentation about their work (limited to 30 seconds and 1 slide).
To apply to join the workshop, please go to http://vizbi.org and submit an abstract and image related to your work. Submissions close on 16 November 2009. Since places are limited, participants will be selected based on the relevance of their work to the goals of the workshop.
Notifications of acceptance will be sent within three weeks after the close of submissions.
We plan to award a prize for the submitted image that best conveys a strong scientific message in a visually compelling manner.
Please forward this announcement to anyone who may be interested. We hope to see you in Heidelberg next spring!
Seán O’Donoghue, EMBL
Jim Procter, University of Dundee
Nils Gehlenborg, European Bioinformatics Institute
Reinhard Schneider, EMBL
If you have any questions about the registration process please contact:
I’ve been hesitant to post this video of Keith Olbermann’s 17-minute timeline connecting the shifting terror alert level to the news cycle and administration at the risk of veering too far into politics, but I’m reminded again of it with Tom Ridge essentially admitting to it in his book:
In The Test of Our Times: America Under Siege, Ridge wrote that although Rumsfeld and Ashcroft wanted to raise the alert level, “There was absolutely no support for that position within our department. None. I wondered, ‘Is this about security or politics?'”
Only to recant and be taken to task by Rachel Maddow:
Ridge went on to say that “politics was not involved” and that “I was not pressured.” Maddow then read to Ridge directly from his book’s jacket: “‘He recounts episodes such as the pressure that the DHS received to raise the security alert on the eve of the ’04 presidential election.’ That’s wrong?”
As Seth Meyers put it, “My shock level on manipulation of terror alerts for political gain is green, or low.”
At any rate, whether there is in fact correlation, causation, or simply a conspiracy theory that gives far too much credit to the number of people who would have to be involved, I think it’s an interesting look at 1) message control, 2) using the press (or a clear example of the possibilities), 3) the power of assembling information like this to produce such a timeline, and 4) actual reporting (as opposed to tennis match commentary) done by a 24-hour news channel.
Of course, I was disappointed that it wasn’t an actual visual timeline, though somebody has probably done that as well.
Finally got around to watching Dan Frankowski’s “You Are What You Say: Privacy Risks of Public Mentions” Google Tech Talk the other day. (I had the link set aside for two years. There’s a bit of a backlog.) In the talk, he takes an “anonymized” set of movie ratings and removes the anonymity by matching them to public mentions of movies in user profiles on the same site.
Interestingly, the ratings themselves weren’t as informative as the actual choice of movies to talk about. In the case of a site for movie buffs — ahem, film aficionados — I couldn’t help but think about participants in discussions using obscure film references as colored tail feathers as they try to out-strut one another. Of course this has significant impact on such a method, making the point that individual uniqueness is only a signature for identification: what makes you different just makes you more visible to a data mining algorithm.
The other interesting bit from the talk is about 20 minutes through, where he starts to address ways to defeat such methods. There aren’t many good ideas, because of the tradeoffs involved in each, but it’s interesting to think about.
I’ve just posted a new piece that depicts changes between the multiple editions of Darwin’s “On the Origin of Species”:
To quote myself, because it looks important:
We often think of scientific ideas, such as Darwin’s theory of evolution, as fixed notions that are accepted as finished. In fact, Darwin’s On the Origin of Species evolved over the course of several editions he wrote, edited, and updated during his lifetime. The first English edition was approximately 150,000 words and the sixth is a much larger 190,000 words. In the changes are refinements and shifts in ideas — whether increasing the weight of a statement, adding details, or even a change in the idea itself.
The idea that we can actually see change over time in a person’s thinking is fascinating. Darwin scholars are of course familiar with this story, but here we can view it directly, both on a macro-level as it animates, or word-by-word as we examine pieces of the text more closely.
This is hopefully the first of multiple pieces working with this data. Having worked with it since last December, I’ve been developing a larger application that deals with the information in a more sophisticated way, but that’s continually set aside because of other obligations. This simpler piece was developed for Emily King’s “Quick Quick Slow” exhibition opening next week at Experimenta Design in Portugal. As is often the case, many months were spent to try to create something monolithic, then in a very short time, an offshoot of all that work is developed that makes use of that infrastructure.
Oddly enough, I first became interested in this because of a discussion with a friend a few years ago, who had begun to wonder whether Darwin had stolen most of his better ideas from Alfred Russel Wallace, but gained the notoriety and credit because of his social status. (This appealed to the paranoid creator in me.) She cited the first edition of Darwin’s text as incoherent, and that it gradually improved over time. Interestingly (and happily, I suppose), the process of working on this piece has instead shown the opposite, and I have far greater appreciation for Darwin’s ideas than I had in the past.
The New York Times today looks upon the plight of poor AT&T, saddled with millions of new customers paying thousands of dollars a year. Jenna Wortham writes:
Slim and sleek as it is, the iPhone is really the Hummer of cellphones. It’s a data guzzler. Owners use them like minicomputers, which they are, and use them a lot. Not only do iPhone owners download applications, stream music and videos and browse the Web at higher rates than the average smartphone user, but the average iPhone owner can also use 10 times the network capacity used by the average smartphone user.
If that 10x number didn’t come from AT&T, where did it come from? Seems like they might be starting a “we didn’t want the iPhone anyway” campaign so that investors treat them more nicely when they (are rumored to) lose their carrier exclusivity next year.
The result is dropped calls, spotty service, delayed text and voice messages and glacial download speeds as AT&T’s cellular network strains to meet the demand. Another result is outraged customers.
So even with AT&T’s outrageous prices, they can’t make this work? This week I’m canceling my AT&T service because it would cost $150 a month to get what T-Mobile charges me $80 for. (Two lines with shared minutes, texting on both lines, unlimited data on one, and even tethering. I also love T-Mobile’s customer service, staffed by friendly humans who don’t just read from scripts.)
With nine million users paying in excess of $100 a month apiece, they’re grossing a billion dollars a month, and they’re complaining about having to upgrade their network? They could probably fund rebuilding their entire network from scratch with the $15/month they charge to send more than 200 text messages. (Text messages are pure profit, because they’re sent using extra space in packets sent between the phone and the carrier.)
All of the cited problems, of course, would be lessened without carrier exclusivity. Don’t want 9 million iPhone customers clogging the network? Then don’t sign a deal requiring that you’re the only network they have access to. Hilarious.
But! The real reason I’m posting is because of the photos that accompany the article, including a shot of the AT&T command center and its big board:
A few thoughts:
If they’re gonna make it look like an orchestra pit, then I hope the head of IT is wearing tails.
Do they get night & weekend minutes because the lights are out? Wouldn’t the staff be a little happier if the lights were turned on?
And most important, I wonder what kind of coverage they get in there. It looks like the kind of underground bunker where you can’t get any signal. And if I’m not mistaken, those look like land lines on the desks.
As a continuation of this project, we’ve just finished a second health visualization (also built with Processing) using GE’s data. Like the first round, we started with ~6 million patient records from their “MQIC” database. Using the software, you input gender, age range, height/weight (to calculate BMI), and smoking status. Based on the selections it shows you the number of people in the database that match those settings, and the percentages that have been diagnosed with diabetes, heart disease, hypertension, or have had a stroke:
For people reading the site because they’re interested in visualization (I guess that’s all of you, except for mom, who is just trying to figure out what I’m up to), some inside baseball:
On the interaction side, the main objective here was to make it easy to move around the interface as quickly as possible. The rows are shown in succession so that the interface can teach itself, but we also provide a reset button so that you can return to the starting point. Once the rows are visible, though, it’s easy to move laterally and make changes to the settings (swapping between age ranges, for instance).
One irony of making the data accessible this way is that most users, after looking up their own numbers, will then try as many different possibilities as they can, in a quick hunt for the extremes. How high do the percentages go? If I select bizarre values, what happens at the edges? Normally, you don’t have to spend as much time on these 1% cases, and it would be alright for things to be a little weird when truly odd values are entered (300 lb. people who are 4′ tall, smokers, and age 75 and over). But in this case, a lot more time has to be spent making sure things work. So while most of the time the percentages at the top are in the 5-15% range, I had to write code so that when one category shoots up to 50%, the other bars in the chart scale down in proportion.
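For the curious, here is one way that kind of proportional scaling could work. This is a sketch in plain Java rather than the Processing code from the piece itself, and the 15% default ceiling, the `barHeights` name, and the pixel values are all assumptions made for illustration. The chart keeps a fixed ceiling while every percentage fits under it, and raises the ceiling to the largest value when one does not, so every bar shrinks by the same factor.

```java
import java.util.Arrays;

class BarScaleSketch {
    // Assumed default top of the chart, in percent (not a value from the real project).
    static final float DEFAULT_CEILING = 15f;

    // Convert percentages into bar heights in pixels, sharing one scale across all bars.
    static float[] barHeights(float[] percents, float maxBarPixels) {
        // Start from the default ceiling and raise it only if a value exceeds it.
        float ceiling = DEFAULT_CEILING;
        for (float p : percents) {
            ceiling = Math.max(ceiling, p);
        }
        float[] heights = new float[percents.length];
        for (int i = 0; i < percents.length; i++) {
            heights[i] = percents[i] / ceiling * maxBarPixels;
        }
        return heights;
    }

    public static void main(String[] args) {
        // Typical case: everything fits under the default ceiling.
        System.out.println(Arrays.toString(barHeights(new float[] {5f, 10f, 15f}, 150f)));
        // Edge case: one category at 50% pushes the ceiling up and shrinks the rest.
        System.out.println(Arrays.toString(barHeights(new float[] {5f, 10f, 50f}, 150f)));
    }
}
```

Keeping a default ceiling means ordinary selections don't rescale the chart on every click. Only the odd cases change the scale.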
Another aspect of the interface is the body mass index calculator. Normally a BMI chart looks something like this, a large two-dimensional plot that would otherwise use up half of the interface. By using a little interaction, we can make a simpler chart that dynamically updates itself based on the current height or weight settings. Also, because the ranges have (mathematically) hard edges, we’re showing the upper and lower bounds of the range so that they’re more apparent. Otherwise, a 5’8″ person steps from 164 to 165 lbs to find themselves suddenly overweight. In reality, the boundaries are more fuzzy, which would be taken into account by a doctor. But with the software, we instead have to be clear about the way the logic is working.
(Note that the height and weight are only used to calculate a BMI range — it’s not pulling individuals from the database who are 5’8″ and 160 lbs, it’s pulling people from the “normal” BMI range.)
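The hard edge at 164/165 lbs falls straight out of the arithmetic. The sketch below, in plain Java, uses the standard imperial BMI formula (703 × pounds ÷ inches²) and the commonly used cutoffs of 18.5, 25, and 30. The class and method names are invented for illustration and are not taken from the project's code.

```java
class BmiSketch {
    // Commonly used category boundaries; each range includes its lower bound.
    static final double[] CUTOFFS = {18.5, 25.0, 30.0};
    static final String[] LABELS = {"underweight", "normal", "overweight", "obese"};

    // Standard imperial formula: 703 * weight (lb) / height (in) squared.
    static double bmi(double heightInches, double weightPounds) {
        return 703.0 * weightPounds / (heightInches * heightInches);
    }

    // Walk up the cutoffs until the BMI value falls below one.
    static String category(double heightInches, double weightPounds) {
        double value = bmi(heightInches, weightPounds);
        int i = 0;
        while (i < CUTOFFS.length && value >= CUTOFFS[i]) {
            i++;
        }
        return LABELS[i];
    }

    // Invert the formula: the weight (lb) at which this height reaches a given BMI.
    // This is the kind of bound a chart could display for the current range.
    static double weightAtBmi(double heightInches, double bmiValue) {
        return bmiValue * heightInches * heightInches / 703.0;
    }

    public static void main(String[] args) {
        double height = 68; // 5'8" expressed in inches
        System.out.println(category(height, 164));   // just under a BMI of 25
        System.out.println(category(height, 165));   // just over a BMI of 25
        System.out.println(weightAtBmi(height, 25)); // the boundary, in pounds
    }
}
```

At 68 inches, 164 lbs works out to a BMI of roughly 24.9 and 165 lbs to roughly 25.1, so a single pound crosses the line.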
For the statistically (or at least numerically) inclined, there are also some interesting quirks that can be found, like a situation or two where health risk would be expected to go up, but in fact it goes down (I’ll leave you to find them yourself). This is not a bug. We’re not doing any sort of complex math here to evaluate actual risk; the software is just a matching game with individuals in the database. These cases in particular show up when there are only a few thousand individuals, say 2,000 out of the full 6 million records. The number of people in these edge cases is practically a rounding error, which means that we can’t make sound conclusions with them. As an armchair doctor-scientist, it’s also interesting to speculate as to what might be happening in such cases, and how other factors may come into play.
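To make "matching game" concrete, here is a rough sketch in plain Java. It is not the project's code and does not reflect the layout of the MQIC data: the `Patient` fields, the `query` signature, and the sample rows are all invented. It only shows the shape of the logic, which is to filter by the selected settings, count the matches, and report what fraction of them carry a diagnosis.

```java
import java.util.Arrays;
import java.util.List;

class MatchingSketch {
    // Hypothetical patient row; the real records are certainly richer than this.
    static final class Patient {
        final String sex; final int age; final double bmi;
        final boolean smoker; final boolean diabetes;
        Patient(String sex, int age, double bmi, boolean smoker, boolean diabetes) {
            this.sex = sex; this.age = age; this.bmi = bmi;
            this.smoker = smoker; this.diabetes = diabetes;
        }
    }

    // How many people matched, and how many of those have the diagnosis.
    static final class Result {
        final int matched; final int diagnosed;
        Result(int matched, int diagnosed) { this.matched = matched; this.diagnosed = diagnosed; }
        double percent() { return matched == 0 ? 0.0 : 100.0 * diagnosed / matched; }
    }

    // Count everyone who fits the selected sex, age range, BMI range, and smoking status.
    static Result query(List<Patient> db, String sex, int minAge, int maxAge,
                        double minBmi, double maxBmi, boolean smoker) {
        int matched = 0, diagnosed = 0;
        for (Patient p : db) {
            boolean hit = p.sex.equals(sex)
                && p.age >= minAge && p.age <= maxAge
                && p.bmi >= minBmi && p.bmi < maxBmi
                && p.smoker == smoker;
            if (hit) {
                matched++;
                if (p.diabetes) diagnosed++;
            }
        }
        return new Result(matched, diagnosed);
    }

    // Seven invented rows standing in for the database.
    static List<Patient> sample() {
        return Arrays.asList(
            new Patient("F", 40, 22.0, false, false),
            new Patient("F", 45, 23.5, false, true),
            new Patient("F", 48, 24.0, false, false),
            new Patient("F", 42, 21.0, false, false),
            new Patient("F", 44, 27.0, false, true),   // outside the BMI range
            new Patient("M", 41, 22.0, false, true),   // different sex
            new Patient("F", 43, 22.5, true, true));   // smoker
    }

    public static void main(String[] args) {
        Result r = query(sample(), "F", 40, 49, 18.5, 25.0, false);
        System.out.println(r.matched + " matched, " + r.percent() + "% with diabetes");
    }
}
```

With so few matches, moving a single person in or out of the group swings the percentage a long way, which is the same effect that makes the small edge cases in the real data unreliable.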
In 2001, when I was a young MIT faculty member overseeing the Media Lab Aesthetics and Computation Group, two students came up with an idea that would become an award-winning piece of software called Processing—which I am often credited with having a hand in conceiving. Processing, a programming language and development environment that makes sophisticated animations and other graphical effects accessible to people with relatively little programming experience, is today one of the few open-source challengers to Flash graphics on the Web. The truth is that I almost stifled the nascent project’s development, because I couldn’t see the need it would fill. Luckily, Ben Fry and Casey Reas absolutely ignored my opinion. And good for them: the teacher, after all, isn’t always right.
To give him more credit (not that he needs it, but maybe because I’m bad with compliments), John’s objection had much to do with the fact that Processing was explicitly an evolutionary, as opposed to revolutionary, step in how coding was done. That’s why it was never the focus of my Masters or Ph.D. work, and instead has nearly always been a side project. And more importantly, for students in his research group, he usually forced us away from whatever came naturally for us. Those of us for whom creating tools was “easy,” he forced us to make less practical things. For those who were comfortable making art, he steered them toward creating tools. In the end, we all learned more that way.
Tiny Sketch is an open challenge to artists and programmers to create the most compelling creative work possible with the programming language Processing using 200 characters or less.
…building on the proud traditions of obfuscated code contests and the demo scene. The contest runs through September 13 and is sponsored by Rhizome and OpenProcessing.
Having designed Processing to do one thing or another, several of the submissions made me laugh out loud for ways their authors managed to introduce new quirks. For instance, consider the createFont() function. Usually it looks something like this:
PFont f = createFont("Helvetica", 12);
If the “Helvetica” font is not installed, it silently winds up using a default font. So somebody clever figured out that if you just leave the font name blank, it’s an easy way to get a default font, and not burn several characters of the limit:
PFont f = createFont("", 12);
Another, by Kyle McDonald, throws an exception as a way to produce text to plot on screen. (It’s also a bit of an inside joke—on us, perhaps—because it’s a ubiquitous error message resulting from a change that was made since earlier releases of Processing.)
One of the most interesting bits is seeing how these ideas propagate into later sketches that are produced. Since the font hack appeared (not sure who did it first, let me know if you do), everyone else is now using that method for producing text. Obviously art/design/coding projects are always the result of other influences, but it’s rare that you get to see ideas exchanged in such a direct fashion.
And judging from some of the jagged edges in the submissions, I’m gonna change the smooth() to just s() for the next release of Processing, so that more people will use it in the next competition.
More typographic tastiness (see the earlier post) from Hoefler & Frere-Jones with a writeup on Choosing Fonts for Annual Reports. Lots of useful design help and ideas for anyone who works with numbers, whether actual annual reports or (more likely) fighting with Excel and PowerPoint. For instance, using enclosures to frame numbers, or knock them out:
Another helpful trick is using two weights so that you can avoid placing a line between them:
Or using a proper condensed face when you have to invite too many of your numerical friends:
At any rate, I recommend the full article for anyone working with numbers, either for the introduction to setting type (for the non-designers) or a useful reminder of some of the solutions (for those who fret about these things on a regular basis).
Responding to the Boehner post, Jay Parkinson, M.D. pointed me to this improved chart by designer Robert Palmer, accompanied by an angst-ridden open letter (an ironic contrast to the soft pastels in his diagram) decrying the crimes of visual malfeasance.
Meanwhile, Ezra Klein over at the Washington Post seems to be thinking along similar lines as my original post, noting this masked artist’s earlier trip to Kinko’s a few weeks ago. Klein writes:
Whoever is heading the Scary Flowcharts Division of John Boehner’s office is quickly becoming my favorite person in Washington. A few weeks ago, we got this terror-inducing visualization of the process behind “Speaker Pelosi’s National Energy Tax.”
If I were teaching right now, I’d make all my students do a one day charrette on trying to come up with something worse than the Boehner health care image while staying in the realm of colloquial things you can do with PowerPoint. It’d be a great time, and we’d all learn a lot.
Watching the centrist Democrats in Congress create more and more reasons why health care can’t be fixed, I’ve been struck by a disquieting thought: Suppose our collective lack of response to Hurricane Katrina wasn’t exceptional but, rather, the new normal in America. Suppose we can no longer address the major challenges confronting the nation. Suppose America is now the world’s leading can’t-do country.
I agree and find it terrifying. And I don’t think that’s a partisan issue.
Now back to your purposefully apolitical, regularly scheduled blog on making pictures of data.
BBC News brings word (via) that after a 44-year effort, the Historical Thesaurus of the Oxford English Dictionary will see the light of day. Rather than simple links between words, the beastly volume covers the history of the words within. For instance, the etymological timeline of the word “trousers” follows:
breeks: The earliest reference from 1552 marks the change in fashion from breeches, a garment tied below the knee and worn with tights. Still used in Scotland, it derives from the Old English “breeches”.
trouser: The singular form of “trousers” comes from the Gallic word “trews”, a close-fitting tartan garment formerly worn by Scottish and Irish highlanders and to this day by a Scottish regiment. The word “trouses” probably has the same derivation.
unimaginables: This 19th Century word, and others like “unwhisperables” and “never-mention-ems”, reflect Victorian prudery. Back then, even trousers were considered risque, which is why there were so many synonyms. People didn’t want to confront the brutal idea, so found jocular alternatives. In the same way the word death is avoided with phrases like “pass away” and “pushing up daisies”.
stove-pipes: A 19th Century reference hijacked in the 1950s by the Teddy Boys along with drainpipes. The tight trousers became synonymous with youthful rebellion, a statement of difference from the standard post-war suits.
rammies: This abbreviation of Victorian cockney rhyming slang “round-me-houses” travelled with British settlers to Australia and South Africa.
Are you seeing pictures and timelines yet? Then this continues for 600,000 more words. Mmmm!
An English language professor, Ms Kay, one of four co-editors of the publication, began work on it in the late 1960s – while she was in her 20s.
It’s hard to fathom being in your 60s, and completing the book that you started in your 20s, though it’s difficult to argue with the academic and societal contribution of the work. Her web page also lists “the use of computers in teaching and research” as one of her interest areas, which sounds like a bit of an understatement. I’d be interested in computers too if my research interest was the history of 600,000 words and their 800,000 meanings across 236,000 categories.
Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.
The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)
The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.
The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).
This site is used for follow-up code and writing about related topics.