Writing

Obama Limited to 16 Bits

I guess I never thought I’d read about the 16-bit limitations of Microsoft Excel in mainstream press (or at least outside the geek press), but here it is:

Obama’s January fundraising report, detailing the $23 million he raised and $41 million he spent in the last three months of 2007, far exceeded 65,536 rows listing contributions, refunds, expenditures, debts, reimbursements and other details.

Excel has since its inception been limited to 65,536 rows, the maximum number you get when you represent the row number using two bytes. Mr. Millionsfromsmallcontributions has apparently flown past this limit in his FEC reports, forcing poor reporters to either use Microsoft Access (a database program) or pray for the just-released Excel 2007, where in fact the row restriction has been lifted.

In the past the argument against fixing the restriction had always been a mixture of “it’s too messy to upgrade something like that” and “you shouldn’t have that many rows of data in a spreadsheet anyway, you should use a database.” Personally I disagree with the latter; and as silly as the former sounds, it’s been the case for a good 20 years (or was the row limit even lower back then?)

The OpenOffice project, for instance, has an entire page dedicated to fixing the issue in OpenOffice Calc, where they’re limited to 30,000 rows—the limit being tied to 32,768, or the number you get with 15 bits instead of 16 (use the sixteenth bit as the sign bit indicating positive or negative, and you can represent numbers from -32768 to 32767 instead of unsigned 16 bit values that range from 0 to 65535).

Bottoms up for the first post tagged both “parse” and “politics”.

Thursday, June 5, 2008 | parse, politics  

So much for “wonderfully simple”

In contrast to the clarity and simplicity of the New York Times info graphic mentioned yesterday, the example currently on their home page is an example of the opposite:

This is helpful because it clarifies the point I tried to make about what was nice about the other graphic. Because of space limitations, this graphic is small, and the information is stored across multiple panels. So at the top there are a pair of tabs. Then within the tabs we have a pair of buttons. Two tabs, four buttons, just to get through four possible pieces of data. That’s the sort of combinatoric magic we see in Microsoft Windows preference panels:

snap1.gif

While the organization in the info graphic makes conceptual sense—first you must choose one of two states, then choose one of the candidates—it makes little cognitive sense. We’re choosing between one of four options. Just give them to us! For a pair of items beneath another pair of items, there’s no need to establish a sense of hierarchy. If there were a half dozen states, and a half dozen candidates, then that might make sense. Just because the data is technically hierarchic, or arranged in a tree, that doesn’t mean that it’s the best representation for it.

The solution? Just give us the four options. No sliding panels, trap doors, etc. Better yet, superimpose the Clinton and Obama data on a single map as different colors, and have a pair of buttons (not tabs!) that let the viewer quickly swap between Indiana and North Carolina.

(This only covers the interaction model, without getting into the way the data itself is presented, colors chosen, laid out, etc. The lack of population density information in the image makes the maps themselves nearly worthless.)

Tuesday, May 6, 2008 | infographics, interact, politics  

Another delegate calculator

Wonderfully simple delegate calculator from the New York Times. Addresses a far simpler question than the previously mentioned Slate calculator, but bless the NYT for realizing that something that complicated was no longer necessary.

delegate-5101.jpg

Good example of throwing out extraneous information to tell a story more directly: a quick left and right drag provides a more accurate depiction than the horse race currently in the headlines.

Monday, May 5, 2008 | election, politics, scenarios  

The Earth at night

Via mailing list, Oswald Berthold passes along images and a short article of the Earth from space as compiled by NASA, highlighting city lights in particular.

Tokyo Bay

The collection is an update to the Earth Lights image developed a few years ago (and which made its way ’round the interwebs at the time).

For the more technical, a presentation from the NOAA titled Low Light Imaging of the Earth at Night provides greater detail about the methods used to produce such images. Also includes a couple interesting historical examples (such as the first image they created) as well as comparisons of city growth over time based on changes in the data.

Of course many conclusions can be drawn from seeing map data such as this. Look at the difference between North and South Korea, for instance (original image from globalsecurity.org).

North and South Korea by night

Apparently this is a favorite of former U.S. Secretary of Defense Donald Rumsfeld:

Mr Rumsfeld showed the picture to illustrate how backward the northern regime really is - and how oppressed its people are. Without electricity there can be none of the appliances that make life easy and that we take for granted, he said.

“Except for my wife and family, that is my favourite photo,” said Mr Rumsfeld.

“It says it all. There’s the south, the same people as the north, the same resources north and south, and the big difference is in the south it’s a free political system and a free economic system.

I’ve vowed to myself not to make this page be about politics so I won’t get into the fatuous arguments of a warmonger (oops), but I think the fascinating thing is that

  1. This image, this “information graphic,” would be of such great importance to a person that he would see fit to even mention it in reference to photos of his wife and children. This is a strong statement for any image, even if he is being dramatic.
  2. The use of images to make or score political points. There’s some great stuff buried in recent Congressional testimony about the Iraq War, for instance, that I want to get to soon.

In regards to #1, I’m trying to think of other images to which people maintain such a personal relationship (particularly those whose job is not info graphics—Tufte’s preoccupation with Napoleon’s March doesn’t count.)

As for #2, hopefully we’ll get to that a bit later.

Friday, April 25, 2008 | mapping, physical, politics  

Democratic Delegate Scenarios

counter.jpgOne of the chapters that I had to cut from Visualizing Data was about scenarios—building interactive “what if” tools that help you quickly try out several possibilities. This is one of the most useful aspects of dynamic visualization—being able to try out different ideas in a quick way (and safe, as in non-destructive, since Undo is always nearby). Hopefully I’ll be able to cover this sometime soon.

At any rate, one such scenario-building tool is Slate’s Delegate Calculator, where you can drag primitive sliders back and forth and see the possibilities for delegate outcomes for Hillary and Obama.

I’ve seen complaints about its math, but it seems to do an OK job for a big-picture look at the likelihood of different outcomes. Getting the math 100% is impossible (unless you have a far more complicated interface) because the delegate selection process is different in each state. It appears that none of the states wanted to be seen using the same approach as another, and with fifty states going their own way, things got pretty random (Texas: we’ll have a caucus and a primary).

I think that’s enough posting about politics for a bit.

Saturday, March 15, 2008 | election, politics, scenarios  

Basing News Categorization on Blog Blather

blews-small.jpgFound this on Slashdot, but their headline—“Microsoft Developing News Sorting Based On Political Bias” made it sound a lot more interesting than it may be. The idea of mining text data to tease out mythical media biases and leanings sounds fascinating. What sort of axes could be determined? Could we see how different kinds of language are used, or ways that particular code words or phrases infect news coverage?

Unfortunately, the research project from Microsoft looks like it’s just procuring link counts from “liberal” and “conservative” blogs, and gauging the vigor of commentary on either side. Does this make you uneasy yet?

  • We are politically binary: the world has devolved into conservative and liberal! (Or not, yet why do people insist on it?) The representation seems almost entirely U.S.-centric, right down to the red and blue coloring on either side. Red states! Blue states! Red blogs! Blue Blogs! A maleficent Dr. Seuss has infected our political outlook.
  • What about those other axes, where are they? Of all the things to cull from political discourse, liberal vs. conservative must be one of the least interesting. Did you need a team of six from Microsoft, plus all the computing power at their disposal, to tell you that one article or another ruffled more feathers on either side of this simplified spectrum?
  • There’s so much to be learned from propagation of phrases and ideas in the news; why hasn’t there been a more sophisticated representation of it? (Because it’s hard?) The Daily Show has shown this successfully (queueing several people in order repeating something like “axis of evil” or something about “momentum” for a candidate).
  • Blogs are not real. When you turn off the computer, they go away. The internet is not a place, and is too divorced from actual reality to be a useful gauge on most social phenomena. Using blogs as input for a kind of meta-study seems like a poor way to acquire data.

The problems I cite are a bit unfair since they haven’t posted much on their site (looks like they’re presenting a paper…soon?) so the reaction is just based on what they’ve provided. I knew Sumit Basu back at the Media Lab and I think it’s safe to assume there’s more going on…

But what about these bigger issues?

Saturday, March 15, 2008 | news, politics  
Book

Visualizing Data Book CoverVisualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. People who have purchased the book can find the examples here.

The book covers ideas found in my Ph.D. dissertation, which is basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. but applies them to a series of examples, first starting with a simple mapping project (Chapter 3) to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site will be used for follow-up code and writing about related topics.