Writing

Bird Tracks in the Snow

The field in snowy Foxborough, Massachusetts, after a running play in Sunday’s football game:

two-500px-levels.jpg

(Click the image for the original version, taken from the broadcast.)

Look at all the footprints in the snow: The previous play began to the right of the white line, where you can see most of the snow was cleared by the players lining up. Just to the left of that is another cleared area, where a group of players began to tackle Sammy Morris. But it’s not until almost ten yards later — two more white lines down, in the area below where the players are standing in that picture — that he’s finally taken to the ground. For a visual explanation, watch the play:

(Mute the audio and spare yourself the insipid commentary from the FOX booth. And then be thankful that at least it’s not Joe Buck and Tim McCarver.)

The path left behind in the snow explains exactly how the play developed, according to the players’ feet. (And on a running play, feet are important.) Absolutely beautiful.

One of the best things about December is watching football games played in the snow. Last year, for instance, there was a game between Cleveland and Buffalo that looked like it was being played inside a snow globe, with the globe picked up and shaken during each commercial break.

Boston was a complete mess yesterday, with a few inches of snow, sleet, and muck falling from the sky and covering the field where the New England Patriots were happily hosting the Arizona Cardinals, a team less accustomed to digging out their cars and leaving patio furniture behind.

Another image from later in the game; this one depicts the substitution of players as they near the goal line. Note the lines in the snow that begin at the left and lead to where the players are lined up:

later-closer-500px.jpg

Monday, December 22, 2008 | football, physical, sports  

Numbers Hurt

Oww, my data.

stocks_329__1229362245_6462.jpg

(Originally found on Boston.com, credited only to Reuters… If anyone knows where to find a larger version or the original, please drop me a line. Update – Paul St. Amant and Martin Wattenberg have pointed out The Brokers With Hands On Their Faces Blog, which is evocative and wildly entertaining, but not as data-centric as The Brokers With Tales Of Sadness Depicted On Multiple Brightly Colored Yet Highly Detailed Computer Displays in the Background Behind Them Blog that I’ve just started.)

Monday, December 15, 2008 | displays, news  

Anecdotes

Further down in the reading pile is an article from Slate titled Does Advertising Really Work?

Every book ever written about marketing will at some point dig up that old, familiar line: “I know half my advertising is wasted—I just don’t know which half.”

The article by Seth Stevenson goes on to discuss What Sticks, by Rex Briggs and Greg Stuart, a pair of marketing researchers who study the advertising industry. Mad Men notwithstanding, I find the topic fascinating as a trained designer (trained meaning someone who learned to make such things) who happily pays Comcast $12.95 a month for the privilege of never hearing or seeing Levitra, Viagra, or Cialis advertisements.

But separately, and as someone who gave a lecture last night, I really enjoyed this point about anecdotes:

Why is this anecdote-laden style so popular with business authors, and so successful (to the tune of best-selling books and huge speaking fees)? I think it comes down to two things: 1) Fascinating anecdotes can, just by themselves, make you feel like you’ve really learned something… 2) A skillful anecdote-wielder can trick us into thinking the anecdote is prescriptive. In fact, what’s being sold is success by association. It’s no coincidence that [one such book talks] about the iPod—a recent mega-hit we’re all familiar with—in at least three chapters. It’s tempting to believe that bite-sized anecdotes about how the iPod was conceived, or designed, or marketed will reveal the secret formula for kicking butt with our own projects. Of course, it’s never that simple. An anecdote is a single data point, …

I find the first point interesting in light of the way we digest information from the world around us. We’re continually consuming data and trying to synthesize it into larger meanings. Perhaps anecdotes are a kind of shortcut for this process: they provide something that’s already been digested, yet still feels substantial because it affords a brief leap in our thinking (one that seems significant at the time).

Of course, unless you’re a baby bird, you’re better off digesting on your own.

As a side note, I went looking for an image to illustrate this blob of text, and was amused to find that the results of a Google image search for “anecdote” consisted almost entirely of cartoons. Which reminds me of a story…

Saturday, December 13, 2008 | speaky  

Wet and Dry Ingredients; Mixing Bowls and Baking Dishes

51mrbt0099l_ss400_.jpg

Digging through my reading pile, I began skimming A Box, Darkly: Obfuscation, Weird Languages, and Code Aesthetics by Michael Mateas and Nick Montfort. I was moving along pretty well until I reached the description of the Chef programming language:

Another language, Chef, illustrates different design decisions for structuring play. Chef facilitates double-coding programs as recipes. Variables are declared in an ingredients list, with amounts indicating the initial value (e.g., 114 g of red salmon). The type of measurement determines whether an ingredient is wet or dry; wet ingredients are output as characters, dry ingredients are output as numbers. Two types of memory are provided, mixing bowls and baking dishes. Mixing bowls hold ingredients which are still being manipulated, while baking dishes hold collections of ingredients to output. What makes Chef particularly interesting is that all operations have a sensible interpretation as a step in a food recipe. Where Shakespeare programs parody Shakespearean plays, and often contain dialog that doesn’t work as dialog in a play (“you are as hard as the sum of yourself and a stone wall”), it is possible to write programs in Chef that might reasonably be carried out as a recipe. Chef recipes do have the unfortunate tendency to produce huge quantities of food, however, particularly because the sous-chef may be asked to produce sub-recipes, such as sauces, in a loop.

Wonderful. (And a nice break for someone who has been fretting about languages and syntax over the last couple weeks.)
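If it helps to see those wet/dry and bowl/dish rules in more familiar terms, here’s a toy model of them in Java-ish Processing code (emphatically not Chef itself, just my reading of the semantics quoted above, with one made-up ingredient added):

    import java.util.ArrayDeque;
    import java.util.ArrayList;

    // An ingredient carries a value and a wet/dry flag; wet ingredients are
    // output as characters, dry ingredients as numbers.
    class Ingredient {
      int value;
      boolean wet;
      Ingredient(int value, boolean wet) {
        this.value = value;
        this.wet = wet;
      }
    }

    // The mixing bowl holds ingredients still being manipulated;
    // the baking dish collects the ingredients to output.
    ArrayDeque<Ingredient> mixingBowl = new ArrayDeque<Ingredient>();
    ArrayList<Ingredient> bakingDish = new ArrayList<Ingredient>();

    void setup() {
      // "Put 114 g of red salmon into the mixing bowl." (grams are dry)
      mixingBowl.push(new Ingredient(114, false));
      // "Put 72 ml of holy water into the mixing bowl." (milliliters are wet,
      // and 72 happens to be the character 'H'; the ingredient is made up)
      mixingBowl.push(new Ingredient(72, true));

      // "Pour contents of the mixing bowl into the baking dish."
      while (!mixingBowl.isEmpty()) {
        bakingDish.add(mixingBowl.pop());
      }

      // Serving the dish prints "H114": a character for the wet ingredient,
      // a number for the dry one.
      for (Ingredient ing : bakingDish) {
        if (ing.wet) {
          print((char) ing.value);
        } else {
          print(ing.value);
        }
      }
      println();
    }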

Friday, December 12, 2008 | languages  

Lecture in Cambridge, MA this Thursday

The folks at the Boston Chapter of the IEEE Computer Society / Greater Boston Chapter of the ACM have kindly invited me to give a talk this Thursday, December 11.

The details can be found here, here, here, and here. They all contain identical information, but have different text layouts and varied sizes of my grinning mug. You can choose which one you like best (and sorry, none are available without my picture).

Tuesday, December 9, 2008 | talk  

Subjectively Attractive Client-Side Scripted Browser-Delivered Charts and Plots

annual-fruit-sales.png

…also known as Bluff, though they call it “Beautiful Graphs in JavaScript.” And who can argue with pink?

Bluff is a JavaScript port of the Gruff graphing library for Ruby. It is designed to support all the features of Gruff with minimal dependencies; the only third-party scripts you need to run it are a copy of JS.Class (about 2kb gzipped) and a copy of Google’s ExCanvas to support canvas in Internet Explorer. Both these scripts are supplied with the Bluff download. Bluff itself is around 8kb gzipped.

There’s something cool (and hilarious) about the fact that even though we’re talking about bleeding-edge features (decent JavaScript and Canvas support) only available in the most recent browser releases, the criterion for awesomeness and usefulness is still the same as it was in 1997 — that it’s only 8 KB.

(The only thing that strikes me as odd, strictly from an interface perspective, is that I can’t drag the “image” to the Desktop the way I would a JPEG or GIF. Certainly that’s also the case for Flash and Java, but there’s something strange about JavaScript being so lightweight — part of the browser — while the thing isn’t really “there”.)

At any rate, I’m fairly fascinated by this idea of JavaScript being a useful client-side means of generating images. Something very exciting is bound to happen.

Tuesday, December 9, 2008 | api, represent  

Visualization + Processing in Today’s IHT

Alice Rawsthorn writes about visualization in today’s International Herald Tribune, which also includes a mention of Processing:

Producing visualization required the development of new tools capable of analyzing huge quantities of complex data, and interpreting it visually. In the forefront is Processing, a software system devised by the American designers, Ben Fry and Casey Reas, to enable computer programmers to create visual images, and designers to get to grips with programming. “Processing is a bridge between those fields,” said Reas. “Designers feel comfortable with it because it enables them to work visually, yet it also feels familiar to programmers.”

Paola Antonelli on visualization:

“Visualization is not simply an evolution of graphic design, but a complete and complex design form that requires spatial, narrative, synthetic and graphic sensitivity and expertise,” explained Antonelli. “That’s why we see so many practitioners – architects, product designers, filmmakers, statisticians and graphic designers – flocking to it.”

The Humans vs. Chimps illustration even gets a mention:

Take a scientific question like the genetic difference between humans and chimpanzees. Would you prefer to plough through an essay on the subject, or to glance at the visualization created by Fry in which the 75,000 letters of coding in the human genome form a photographic image of a chimp’s head? Virtually all of our genetic information is identical, and Fry highlights the discrepancies by depicting nine of the letters as red dots. No contest again.

The full article is here, and also includes a slide show of other works.

Monday, December 8, 2008 | iloveme, processing, reviews  

220 Feet on 60 Minutes

From a segment on last night’s 60 Minutes:

Saudi Aramco was originally an American company. It goes way back to the 1930s when two American geologists from Standard Oil of California discovered oil in the Saudi desert.

Standard Oil formed a consortium with Texaco, Exxon and Mobil, which became Aramco. It wasn’t until the 1980s that Saudi Arabia bought them out and nationalized the company. Today, Saudi Aramco is the custodian of the country’s sole source of wealth and power.

Over 16,000 people work at the company’s massive compound, which is like a little country with its own security force, schools, hospitals, and even its own airline.

According to Abdallah Jum’ah, Saudi Aramco’s president and CEO, Aramco is the world’s largest oil producing company.

And it’s the richest company in the world, worth, according to the latest estimate, $781 billion.

I was about to change the channel (perhaps as you were just about to stop reading this post), when they showed the big board:

Jum’ah gave 60 Minutes a tour of the company’s command center, where engineers scrutinize and analyze every aspect of the company’s operations on a 220-foot digital screen.

“Every facility in the kingdom, every drop of oil that comes from the ground is monitored in real time in this room,” Jum’ah explained. “And we have control of each and every facility, each and every pipeline, each and every valve on the pipeline. And therefore, we know exactly what is happening in the system from A to Z.”

A large map shows all the oil fields in Saudi Arabia, including Ghawar, the largest on-shore oil field in the world, and Safaniya, the largest off-shore oil field in the world; green squares on the map monitor supertankers on the high seas in real time.

Here’s a short part of the segment that shows the display:

Since the smaller video doesn’t do it justice, several still images follow, each linked to its Comcastic, artifact-ridden HD version:

02-small.jpg

Did rooms like this first exist in the movies, compelling everyone else to imitate them?

03-small.jpg

New guys and interns have to sit in front of the wall of vibrating bright blues:

04-small.jpg

The display is ambient in the sense that nobody’s actually using the larger version to do real work (you can see the relevant portions replicated on individuals’ monitors). It seems to serve as a means of knowing what everyone in the room is up to (or as a deterrent against firing up Solitaire — I’m looking at you, Ahmad). But more importantly, it’s there for visitors, especially visitors with video cameras, and for people who write about visualization and happened to catch a segment about the info palace because it immediately followed the Patriots-Seahawks game.

A detail of one of the displays bears this out — an overload of ALL CAPS SANS SERIF TYPE with the appropriately unattractive array of reds and greens. This sort of thing always makes me curious about what such displays would look like if they were designed properly. Rather than blowing up low-resolution monitors, what would it look like if it were designed for the actual space and viewing distance in which it’s used?

08-small.jpg

Sexy numbers on curvaceous walls:

06-small.jpg

View the entire segment from 60 Minutes here.

Monday, December 8, 2008 | bigboard, energy, infographics, movies  

The Owl Learns Japanese

1378_visualdata_h1.jpg

I’m incredibly pleased to write that O’Reilly Japan has just completed a Japanese translation of Visualizing Data. The book is available for pre-order on Amazon, and has also been announced on O’Reilly’s Japanese site.

Having the book published in Japanese is incredibly gratifying. Two of my greatest mentors (Suguru Ishizaki at CMU, and later John Maeda at MIT) were Japanese Americans who trained at Tsukuba University, training that informed both their own work and their teaching style.

I first unveiled Processing during a two week workshop course at Musashino Art University in Japan in August 2001, working with a group of about 40 students. And in 2005, we won the Interactive Design Prize from the Tokyo Type Director’s Club.

At any rate, I can’t wait to see the book in person; this is just too cool.

Monday, December 1, 2008 | processing, translation  

LA’s Dirtiest Pools & More

39342283-01163012.jpg

Featuring “38 projects and more than 730,000 records,” the Los Angeles Times now has a Data Desk feature, a collection of searchable data sets and information graphics from recent publications. It’s like reading the LA Times online but only paying attention to the data-oriented features. (Boring? Appealing? Your ideal newspaper? We database, you decide. Eww, don’t repeat that.) At first glance I thought (hoped) it would be more raw data, but even having all the items collected in one location suggests something interesting about how newspapers share (and perceive, internally) the carefully researched (and massaged) data that they collect on a regular basis.

Thanks to Casey for the pointer.

Thursday, November 27, 2008 | data, infographics  

Call for Papers: Visualizing the Past

James Torget, by way of my inbox:

I wanted to touch base to let you know about a workshop that we’re putting together out here at the University of Richmond.  Basically, UR (with James Madison University) will be hosting a workshop this spring focused on how scholars can create visualizations of historical data and how we can better share our data across the Internet.  To that end, we are looking for people working on these questions who would be interested in participating in an NEH-sponsored workshop.

We are seeking proposals for presentations at the workshop, and participants for our in-depth discussions.  The workshop is scheduled for February 20-21, 2009 at the University of Richmond.  We are asking that people submit their proposals by December 15, and we will extend invitations for participation by December 31, 2008. Detailed information can be found at: http://dsl.richmond.edu/workshop/

Thursday, November 27, 2008 | inbox, opportunities  

It only took 162 attempts, but Processing 1.0 is here!

We’ve just posted Processing 1.0 at http://processing.org/download. We’re so excited about it, we even took time to write a press release:

CAMBRIDGE, Mass. and LOS ANGELES, Calif. – November 24, 2008 – The Processing project today announced the immediate availability of the Processing 1.0 product family, the highly anticipated release of industry-leading design and development software for virtually every creative workflow. Delivering radical breakthroughs in workflow efficiency – and packed with hundreds of innovative, time-saving features – the new Processing 1.0 product line advances the creative process across print, Web, interactive, film, video and mobile.

Whups! That’s not the right one. Here we go:

Today, on November 24, 2008, we launch the 1.0 version of the Processing software. Processing is a programming language, development environment, and online community that since 2001 has promoted software literacy within the visual arts. Initially created to serve as a software sketchbook and to teach fundamentals of computer programming within a visual context, Processing quickly developed into a tool for creating finished professional work as well.

Processing is a free, open source alternative to proprietary software tools with expensive licenses, making it accessible to schools and individual students. Its open source status encourages the community participation and collaboration that is vital to Processing’s growth. Contributors share programs, contribute code, answer questions in the discussion forum, and build libraries to extend the possibilities of the software. The Processing community has written over seventy libraries to facilitate computer vision, data visualization, music, networking, and electronics.

Students at hundreds of schools around the world use Processing for classes ranging from middle school math education to undergraduate programming courses to graduate fine arts studios.

  • At New York University’s graduate ITP program, Processing is taught alongside its sister project Arduino and PHP as part of the foundation course for 100 incoming students each year.
  • At UCLA, undergraduates in the Design | Media Arts program use Processing to learn the concepts and skills needed to imagine the next generation of web sites and video games.
  • At Lincoln Public Schools in Nebraska and the Phoenix Country Day School in Arizona, middle school teachers are experimenting with Processing to supplement traditional algebra and geometry classes.

Tens of thousands of companies, artists, designers, architects, and researchers use Processing to create an incredibly diverse range of projects.

  • Design firms such as Motion Theory provide motion graphics created with Processing for the TV commercials of companies like Nike, Budweiser, and Hewlett-Packard.
  • Bands such as R.E.M., Radiohead, and Modest Mouse have featured animation created with Processing in their music videos.
  • Publications such as the journal Nature, the New York Times, Seed, and Communications of the ACM have commissioned information graphics created with Processing.
  • The artist group HeHe used Processing to produce their award-winning Nuage Vert installation, a large-scale public visualization of pollution levels in Helsinki.
  • The University of Washington’s Applied Physics Lab used Processing to create a visualization of a coastal marine ecosystem as a part of the NSF RISE project.
  • The Armstrong Institute for Interactive Media Studies at Miami University uses Processing to build visualization tools and analyze text for digital humanities research.

The Processing software runs on the Mac, Windows, and GNU/Linux platforms. With the click of a button, it exports applets for the Web or standalone applications for Mac, Windows, and GNU/Linux. Graphics from Processing programs may also be exported as PDF, DXF, or TIFF files and many other file formats. Future Processing releases will focus on faster 3D graphics, better video playback and capture, and enhancing the development environment. Some experimental versions of Processing have been adapted to other languages such as JavaScript, ActionScript, Ruby, Python, and Scala; other adaptations bring Processing to platforms like the OpenMoko, iPhone, and OLPC XO-1.

Processing was founded by Ben Fry and Casey Reas in 2001 while both were John Maeda’s students at the MIT Media Lab. Further development has taken place at the Interaction Design Institute Ivrea, Carnegie Mellon University, and UCLA, where Reas is chair of the Department of Design | Media Arts. Miami University, Oblong Industries, and the Rockefeller Foundation have generously contributed funding to the project.

The Cooper-Hewitt National Design Museum (a Smithsonian Institution) included Processing in its National Design Triennial. Works created with Processing were featured prominently in the Design and the Elastic Mind show at the Museum of Modern Art. Numerous design magazines, including Print, Eye, and Creativity, have highlighted the software.

For their work on Processing, Fry and Reas received the 2008 Muriel Cooper Prize from the Design Management Institute. The Processing community was awarded the 2005 Prix Ars Electronica Golden Nica award and the 2005 Interactive Design Prize from the Tokyo Type Director’s Club.

The Processing website includes tutorials, exhibitions, interviews, a complete reference, and hundreds of software examples. The Discourse forum hosts continuous community discussions and dialog with the developers.

Tuesday, November 25, 2008 | processing  

Visualizing Data with an English translation and Processing.js

Received a note from Vitor Silva, who created the Portuguese-language examples from Visualizing Data using Processing.js:

i created a more “world friendly” version of the initial post. it’s now in english (hopefully in a better translation than babelfish) and it includes a variation on your examples of chapter 3.

The new page can be found here. And will you be shocked to hear that indeed it is far better than Babelfish?

Many thanks to Vitor for the examples and the update.

Wednesday, November 19, 2008 | examples, feedbag, translation, vida  

John Oliver and John King’s Magic Wall

Hilarious, bizarre, and rambling segment from last night’s Daily Show featuring John Oliver’s take on CNN’s favorite toy from this year’s election.

I’m continually amazed by the amount of interest this technology generates (yeah, I posted about it too), so perspective from the Daily Show is always helpful and welcome.

Wednesday, November 19, 2008 | election, interact  

What has driven women out of Computer Science?

1116-sbn-webdigi-crop.gif

Casey yesterday noted this article from the New York Times on the declining number of women who are pursuing computer science degrees. Declining as in “wow, weren’t the numbers too low already?” From the article’s introduction:

ELLEN SPERTUS, a graduate student at M.I.T., wondered why the computer camp she had attended as a girl had a boy-girl ratio of six to one. And why were only 20 percent of computer science undergraduates at M.I.T. female? She published a 124-page paper, “Why Are There So Few Female Computer Scientists?”, that catalogued different cultural biases that discouraged girls and women from pursuing a career in the field. The year was 1991.

Computer science has changed considerably since then. Now, there are even fewer women entering the field. Why this is so remains a matter of dispute.

The article goes on to explain that even though gender parity in technical fields has improved considerably since 1991, computer science stands alone in moving backwards.

The text also covers some of the “do it with gaming!” nonsense. As someone who became interested in programming because I didn’t like games, I’ve never understood why gaming gets pushed as a cure-all for a lack of interest in programming:

Such students who choose not to pursue their interest may have been introduced to computer science too late. The younger, the better, Ms. Margolis says. Games would offer considerable promise, except that they have been tried and have failed to have an effect on steeply declining female enrollment.

But I couldn’t agree more with the sentiment about age. I know of two all-girls schools (Miss Porter’s in Connecticut and Nightingale-Bamford in New York) that have used Processing in courses with high school and middle school students, and I couldn’t be more excited about it. Let’s hope there are more.

Tuesday, November 18, 2008 | cs, gender, reading  

Visualizing Data with Portuguese and Processing.js

Very cool! Check out these implementations of several Visualizing Data examples that make use of John Resig’s Processing.js, an adaptation of the Processing API in pure JavaScript. This means running in a web browser with no additional plug-ins (no Java Virtual Machine kicking in while you take a sip of coffee—much less drain the whole cup, depending on the speed of your computer). Since the first couple of chapters cover straightforward, static exercises, I’d been wanting to try this, but it’s more fun when someone beats you to it. (Nothing is better than feeling like a slacker, after all.)
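For a sense of what those straightforward, static exercises look like, here’s a sketch of the same flavor (not the actual example from the book): no draw() loop, just a handful of drawing calls rendered once.

    // Draw a frame and scatter some random points inside it; a static sketch
    // in the spirit of the book's early examples, rendered once and done.
    size(400, 400);
    background(255);
    stroke(0);
    noFill();
    rect(50, 50, 300, 300);

    fill(0);
    noStroke();
    for (int i = 0; i < 100; i++) {
      float x = random(60, 340);
      float y = random(60, 340);
      ellipse(x, y, 3, 3);
    }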

map-example.png

View the introductory Processing sketch from Page 22, or the map of the United States populated with random data points from Page 35.

Babelfish translation of the page here, with choice quotes like “also the shipment of external filing-cabinets had that to be different of what was in the book.”

And the thing is, when I finished the proof of the book for O’Reilly, I had this uneasy feeling that I was shipping the wrong filing-cabinets. Particularly the external ones.

Monday, November 17, 2008 | examples, processing, translation, vida  

Did Forbes just write an article about a font?

Via this Slate article from Farhad Manjoo (writer of tech-hype articles for Salon and now Slate), I just read about Droid, the typeface used in Google’s new Android phones. More specifically, he references this Forbes article describing the background of the font and its creator, Steve Matteson of Ascender Corporation in Elk Grove, Illinois.

Some background from the Forbes piece:

In fonts, Google has a predilection for cute letters and bright primary colors, as showcased in the company’s own logo. But for Android Google wanted a font with “common appeal,” Davis says. Ascender’s chief type designer, Steve Matteson, who created the Droid fonts, says Google requested a design that was friendly and approachable. “They wanted to see a range of styles, from the typical, bubbly Google image to something very techno-looking,” Matteson says.

droidfont_426x100.jpg

The sweet spot—and the final look for Droid—fell somewhere in the middle. Matteson’s first design was “bouncy”: a look in line with the Google logo’s angled lowercase “e.” Google passed on the design because it was “a little too mannered,” Matteson says. “There was a fine line between wanting the font to have character but not cause too much commotion.”

Another proposal erred on the side of “techno” with squared-off edges reminiscent of early computer typefaces. That too was rejected, along with several others, in favor of a more neutral design that Matteson describes as “upright with open forms, but not so neutral as a design like, say, Helvetica.”

I haven’t had a chance to play with an Android phone (as much as I’ve been happy with T-Mobile, particularly their customer service, do I re-up with them for two years just to throw money at alpha hardware?) so I can’t say much about the face, but I find the font angle fascinating, particularly in light of Apple’s Helvetica-crazy iPhone and iPod Touch. (Nothing says late 1950s Switzerland quite like a touch-screen mobile phone, after all.)

Ascender Corporation also seems to be connected to the hideously named C@#$(*$ fonts found in Windows Vista and Office 2007: Calibri, Cambria, Candara, Consolas, Constantia, Corbel, Cariadings. In the past several years, Microsoft has shown a notable and impressive commitment to typography (most notably, hiring Matthew Carter to create Verdana, among other decisions of that era), but the new C* fonts have the same air of creepiness as a family that gives all their kids names starting with the same letter. I mean sure, they’re terrific people, but man, isn’t that just a little…unnecessary?

Monday, November 17, 2008 | mobile, typography  

Change is always most interesting

The New York Times has a very nicely done election map this year. Amongst its four viewing options is a depiction of counties that voted more Democratic (blue) or Republican (red) in comparison to the 2004 presidential election:

shift-levels-500.jpg

The blue is to be expected, given the size of the win for Obama, but the red pattern is quite striking.

Also note the shift in the candidates’ home states: Arizona, with McCain on the ticket, and what appears to be the reverse effect in parts of Massachusetts, with Kerry no longer on the ticket. (The shift to the Democrats in Indiana is also amazing: before looking at the map closely, I had assumed that area was Obama’s home state of Illinois.)

I recommend checking out the actual application on the Times site; the interaction lacks some of the annoying tics found in their other work (irritating rollovers that get in the way, worthless zooming, and silly transition animations). It’s useful and succinct, just like an infographic should be. Or just the way Mom used to make. Or whatever.

Thursday, November 6, 2008 | infographics, interact, mapping, politics  

iPolljunkie, iPoliticsobsession, iFix, iLackawittytitle

I apologize that I’ve been too busy and distracted with preparing Processing 1.0 to have any time to post things here, but here’s a quickie so that the page doesn’t just rot into total embarrassment.

Slate this morning announced the availability of a poll tracking application for the iPhone:

iphoneapp2-crop.jpg

I haven’t yet ponied up ninety-nine of my hard-earned cents to buy it, but I find it oddly fascinating. Is there actually any interest in this? Is this a hack? Is there a market for such things? Is the market simply based on the novelty of it? Is it possible to quantify the size of the poll-obsessed political junkie market? And how is that market composed—what percentage of those people are part of campaigns, versus just people who spend too much time reading political news? (I suspect the former is negligible, but I may be tainted as a card-carrying member of the latter group.)

To answer my own questions, I suspect that it was thrown together by a couple of people from the tech side of the organization (meaning “hack” in the best sense of the word), who then sold management on it with the rationale that 1) it’ll generate a little press (or hype on, um, blogs), 2) it’ll reinforce Slate readers’ interest in or connection to the site, and 3) it’s a little cool and trendy. I don’t think they’re actually planning to make money on it (or recoup any development costs), but rather that the price tag has more to do with the fact that 99¢ sounds more valuable and interesting than a free giveaway.

Of course, anyone with more interesting insights (let alone useful facts), please pass them along. I’m hoping it’s an actual Cocoa app, and not just a special link to web pages reformatted for the iPhone, which would largely invalidate this post and extinguish my own curiosity about the beast.

Update: The application is a branded reincarnation of a poll tracker developed by Aaron Brethorst at Chimp Software. Here’s his blog post announcing the change, and even a press release.

Friday, October 3, 2008 | infographics, mobile, politics, software  

Three-dimensional force-directed starling layout

Amazing video of starling flocking behavior, via Dan Paluska:

And how a swarm reacts to a falcon attack, via Burak Arikan:

For myself and all you designers out there just getting your heads around particle simulations, this is just a reminder: nature is better than you.
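(For anyone who wants to lose to nature firsthand, a bare-bones flocking sketch in Processing looks something like this: the three usual steering rules, cohesion, alignment, and separation, with constants that are pure guesswork. Compare the result to the videos above and the point makes itself.)

    // A minimal flock: each particle steers toward the average position of its
    // neighbors (cohesion), matches their average velocity (alignment), and
    // pushes away from anything too close (separation).
    int N = 150;
    PVector[] pos = new PVector[N];
    PVector[] vel = new PVector[N];

    void setup() {
      size(600, 400);
      for (int i = 0; i < N; i++) {
        pos[i] = new PVector(random(width), random(height));
        vel[i] = new PVector(random(-2, 2), random(-2, 2));
      }
    }

    void draw() {
      background(255);
      fill(0);
      noStroke();
      for (int i = 0; i < N; i++) {
        PVector cohesion = new PVector(0, 0);
        PVector alignment = new PVector(0, 0);
        PVector separation = new PVector(0, 0);
        int neighbors = 0;
        for (int j = 0; j < N; j++) {
          if (i == j) continue;
          float d = PVector.dist(pos[i], pos[j]);
          if (d > 0 && d < 50) {
            cohesion.add(pos[j]);
            alignment.add(vel[j]);
            if (d < 15) {
              PVector away = PVector.sub(pos[i], pos[j]);
              away.div(d * d);  // push harder when very close
              separation.add(away);
            }
            neighbors++;
          }
        }
        if (neighbors > 0) {
          cohesion.div(neighbors);
          cohesion.sub(pos[i]);
          cohesion.mult(0.005);
          alignment.div(neighbors);
          alignment.mult(0.05);
          separation.mult(1.5);
          vel[i].add(cohesion);
          vel[i].add(alignment);
          vel[i].add(separation);
        }
        vel[i].limit(3);
        pos[i].add(vel[i]);
        // wrap around the window edges
        pos[i].x = (pos[i].x + width) % width;
        pos[i].y = (pos[i].y + height) % height;
      }
      for (int i = 0; i < N; i++) {
        ellipse(pos[i].x, pos[i].y, 4, 4);
      }
    }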

Wednesday, September 24, 2008 | forcelayout, physical, science  

Small Design Firm is looking for a programmer-designer

nobel00.jpg

My friends down the street at Small Design Firm (started by Media Lab alum and namesake David Small) are looking for a programmer-designer type:

Small Design Firm is an interactive design studio that specializes in museum exhibits, information design, dynamic typography and interactive art. We write custom graphics software and build unique physical installations and media environments. Currently our clients include the Metropolitan Museum of Art, United States Holocaust Memorial Museum and Maya Lin.

We are looking to hire an individual with computer programming and design/art/architecture skills. Applicants should have a broad skill set that definitely includes C++ programming experience and an interest in the arts. This position is open to individuals with a wide variety of experiences and specialities. Our employees have backgrounds in computer graphics, typography, electrical engineering, architecture, music, and physics.

Responsibilities will be equally varied. You will be programming, designing, writing proposals, working directly with clients, managing content and production, and fabricating prototypes and installations.

Small Design Firm is an energetic and exciting place to work. We are a close-knit community, so we are looking for an outgoing team member who is willing to learn new skills and bring new ideas to the group.

Salary is commensurate with experience and skill set. Benefits include health insurance, SIMPLE IRA, and paid vacation.

Contact john (at) smalldesignfirm.com if you’re interested.

Tuesday, September 16, 2008 | opportunities  

Hide the bipolar data, here comes bioinformatics!

I was fascinated a few weeks ago to receive this email from the Genome-announce list at UCSC:

Last week the National Institutes of Health (NIH) modified their policy for posting and accessing genome-wide association studies (GWAS) data contained in NIH databases. They have removed public access to aggregate genotype GWAS data in response to the publication of new statistical techniques for analyzing dense genomic information that make it possible to infer the group assignment (case vs. control) of an individual DNA sample under certain circumstances. The Wellcome Trust Case Control Consortium in the UK and the Broad Institute of MIT and Harvard in Boston have also removed aggregate data from public availability. Consequently, UCSC has removed the “NIMH Bipolar” and “Wellcome Trust Case Control Consortium” data sets from our Genome Browser site.

The ingredients for a genome-wide association study are a few hundred people, and a list of which genetic letter (A, C, G, or T) is found at a few hundred specific locations in the DNA of each of those people. That data is then correlated with whether individuals have a particular disease, and using the correlation, it’s sometimes possible to localize what part of the genome is responsible for the disease.
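Concretely (and with entirely made-up data), the correlation part amounts to something like this little sketch: for each location, compare how often a given letter shows up in the group with the disease versus the group without it.

    // A toy version of the comparison behind an association study: made-up
    // genotypes for eight imaginary people at four locations, plus whether
    // each person has the disease. Real studies use far more people and
    // locations, and proper statistics; this only shows the shape of it.
    char[][] letters = {
      { 'A', 'C', 'G', 'T' },   // person 0, has the disease
      { 'A', 'C', 'G', 'T' },   // person 1, has the disease
      { 'A', 'T', 'G', 'T' },   // person 2, has the disease
      { 'A', 'T', 'G', 'T' },   // person 3, has the disease
      { 'A', 'C', 'C', 'T' },   // person 4, healthy
      { 'G', 'C', 'C', 'T' },   // person 5, healthy
      { 'A', 'C', 'C', 'T' },   // person 6, healthy
      { 'G', 'C', 'C', 'T' }    // person 7, healthy
    };
    boolean[] sick = { true, true, true, true, false, false, false, false };

    void setup() {
      for (int loc = 0; loc < letters[0].length; loc++) {
        // how often does 'G' appear at this location in each group?
        float sickCount = 0, sickG = 0, wellCount = 0, wellG = 0;
        for (int person = 0; person < letters.length; person++) {
          if (sick[person]) {
            sickCount++;
            if (letters[person][loc] == 'G') sickG++;
          } else {
            wellCount++;
            if (letters[person][loc] == 'G') wellG++;
          }
        }
        float difference = sickG / sickCount - wellG / wellCount;
        println("location " + loc + ": difference in 'G' frequency = " + difference);
      }
      // location 2 stands out (difference of 1.0): everyone with the disease
      // has a G there and nobody healthy does, so that's where to look.
    }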

Of course, the diseases might be of a sensitive nature (e.g. bipolar disorder), so when such data is made publicly available, it’s done in a manner that protects the privacy of the individuals in the data set. What this message means is that a bioinformatics method has been developed that undermines those privacy protections. An amazing bit of statistics!

This made me curious about what led to such a result, so with a little digging, I found this press release, which describes the work:

A team of investigators led by scientists at the Translational Genomics Research Institute (TGen) have found a way to identify possible suspects at crime scenes using only a small amount of DNA, even if it is mixed with hundreds of other genetic fingerprints.

Using genotyping microarrays, the scientists were able to identify an individual’s DNA from within a mix of DNA samples, even if that individual represented less than 0.1 percent of the total mix, or less than one part per thousand. They were able to do this even when the mix of DNA included more than 200 individual DNA samples.

The discovery could help police investigators better identify possible suspects, even when dozens of people over time have been at a crime scene. It also could help reassess previous crime scene evidence, and it could have other uses in various genetic studies and in statistical analysis.

So the CSI folks have screwed it up for the bipolar folks. The titillatingly titled “Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays” can be found at PLoS Genetics, and a PDF describing the policy changes is on the NIH’s site for Genome-Wide Association Studies. The PDF provides a much more thorough explanation of what association studies are, in case you’re looking for something better than my cartoon version described above.

Links to much more coverage can be found here, including major journals (Nature) and mainstream media outlets (LA Times, Financial Times) weighing in on the research. (It’s always funny to see how news outlets respond to this sort of thing—the Financial Times talks about the positive side, while the LA Times focuses exclusively on the negative.) A discussion about the implications of the study can also be found on the PLoS site, with further background from the study’s primary author.

Science presents such fascinating contradictions. A potentially helpful advance that undermines another area of research. The breakthrough that opens a Pandora’s Box. It’s probably rare to see such a direct contradiction (that’s not heavily politicized like, say, stem cell research), but the social and societal impact is undoubtedly one of the things I love most about genetics in particular.

Tuesday, September 16, 2008 | genetics, mine, privacy, science  

Mention Offhand and Ye Shall Receive

Just received a helpful note from Nelson Minar, who points out an already redrawn version of the graph from the last post over at Chartjunk. The redraw aims to improve the proportions between the different tax brackets:

taxplans-crop-small.jpg

Much better! Read more about their take, and associated caveats here. (Also thanks to Peter Merholz and Andrew Otwell who also wrote, yet were no match for Nelson’s swift fingers.)

Saturday, September 13, 2008 | feedbag, infographics, notaneconomist, politics  

Glancing at Tax Proposals

Finally, the infographic I’ve been waiting for. The Washington Post compares the tax proposals of United States presidential candidates John McCain and Barack Obama:

graphic-halfsize.jpg

Lots of words have been spilled over the complexities of tax policy, whether in stump speeches, advertisements, or policy papers, but these are usually distilled for voters in lengthy articles that throw still more words at the problem. Compare even a well-written article like this one at Business Week with the graphic above from the Washington Post. Which of the two will you be able to remember tomorrow?

I also appreciate that the graphic very clearly represents the general tax policies of Republicans vs. Democrats, without showing bias toward either. The only thing that’s missing is a sense of how big each of the categories is – how many people are in the “over $2.87 million” category versus the “$66,000 to $112,000” category – which would help convey a better sense of the “middle class” term that candidates like to throw around.

There is still greater complexity to the debate than what’s shown in this image (the Business Week article describes treasury shortfalls based on the McCain proposal, for instance), but without the initial explanation provided by that graphic, will voters even bother with those details?

Saturday, September 13, 2008 | infographics, notaneconomist, politics  

Sustainable Creativity at Pixar

pixar_photo5_blursharpen.jpg

Given some number of talented people, success is not particularly surprising. But sustaining that success in a creative organization the way Pixar has over the last fifteen years is truly exceptional. Ed Catmull, cofounder of Pixar (and computer graphics pioneer), writes about their success for the Harvard Business Review:

Unlike most other studios, we have never bought scripts or movie ideas from the outside. All of our stories, worlds, and characters were created internally by our community of artists. And in making these films, we have continued to push the technological boundaries of computer animation, securing dozens of patents in the process.

On Creativity:

People tend to think of creativity as a mysterious solo act, and they typically reduce products to a single idea: This is a movie about toys, or dinosaurs, or love, they’ll say. However, in filmmaking and many other kinds of complex product development, creativity involves a large number of people from different disciplines working effectively together to solve a great many problems. The initial idea for the movie—what people in the movie business call “the high concept”—is merely one step in a long, arduous process that takes four to five years.

A movie contains literally tens of thousands of ideas.

On Taking Risks:

…we as executives have to resist our natural tendency to avoid or minimize risks, which, of course, is much easier said than done. In the movie business and plenty of others, this instinct leads executives to choose to copy successes rather than try to create something brand-new. That’s why you see so many movies that are so much alike. It also explains why a lot of films aren’t very good. If you want to be original, you have to accept the uncertainty, even when it’s uncomfortable, and have the capability to recover when your organization takes a big risk and fails. What’s the key to being able to recover? Talented people!

Reminding us that we learn more from failure, the more interesting part of the article talks about how Pixar responded to early failures in Toy Story 2:

Toy Story 2 was great and became a critical and commercial success—and it was the defining moment for Pixar. It taught us an important lesson about the primacy of people over ideas: If you give a good idea to a mediocre team, they will screw it up; if you give a mediocre idea to a great team, they will either fix it or throw it away and come up with something that works.

Toy Story 2 also taught us another important lesson: There has to be one quality bar for every film we produce. Everyone working at the studio at the time made tremendous personal sacrifices to fix Toy Story 2. We shut down all the other productions. We asked our crew to work inhumane hours, and lots of people suffered repetitive stress injuries. But by rejecting mediocrity at great pain and personal sacrifice, we made a loud statement as a community that it was unacceptable to produce some good films and some mediocre films. As a result of Toy Story 2, it became deeply ingrained in our culture that everything we touch needs to be excellent.

On mixing art and technology:

[Walt Disney] believed that when continual change, or reinvention, is the norm in an organization and technology and art are together, magical things happen. A lot of people look back at Disney’s early days and say, “Look at the artists!” They don’t pay attention to his technological innovations. But he did the first sound in animation, the first color, the first compositing of animation with live action, and the first applications of xerography in animation production. He was always excited by science and technology.

At Pixar, we believe in this swirling interplay between art and technology and constantly try to use better technology at every stage of production. John coined a saying that captures this dynamic: “Technology inspires art, and art challenges the technology.”

I saw Catmull speak to the Computer Science department a month or two before I graduated from Carnegie Mellon. Toy Story had been released two years earlier, and 20 or 30 of us were jammed into a room listening to this computer graphics legend speaking about…storytelling. The importance of narrative. How the movies Pixar was creating had less to do with the groundbreaking computer graphics (the reason most of us were in the room) than with a good story. This is less shocking nowadays, especially if you’ve ever seen a lecture by someone from Pixar, but the scene left an incredible impression on me. It was a wonderful message to the programmers in attendance about the importance of placing purpose before technology, without belittling the importance of either.

(While digging for an image to illustrate this post, I also found this review of The Pixar Touch: The Making of a Company, a book that seems to cover territory similar to the HBR article, but from the perspective of an outside author. The image is stolen from Ricky Grove’s review.)

Tuesday, September 9, 2008 | creativity, failure, movies  

Temple of Post-Its

The writing room of author Will Self (Wikipedia), where he organizes his complicated stories through copious use of small yellow (and pink) adhesive papers on the wall:

ws24-wall-500.jpg

Or amongst a map and more papers:

ws5-map-500.jpg

Not even the bookshelf is safe:

ws6-shelf-500.jpg

Check out the whole collection.

Reminds me of taking all the pages of my Ph.D. dissertation (a hundred or so) and organizing them on the floor of a friend’s living room. (Luckily it was a large living room.) It was extremely helpful and productive but frightened my friend who returned home to a sea of paper and a guy who had been indoors all day sitting in the middle of it with a slightly wild look in his eyes.

(Thanks to Jason Leigh, who mentioned the photos during his lecture at last week’s iCore summit in Banff.)

Wednesday, September 3, 2008 | collections, organize  

In A World…Without Don LaFontaine

Don LaFontaine, voice artist for some 5,000 movies and 350,000 advertisements, passed away Monday. He’s the man who came up with the “In A World…” that begins most film trailers, as well as the baritone voice style that goes with it. The Washington Post has an obituary.

In the early 1960s, he landed a job in New York with National Recording Studios, where he worked alongside radio producer Floyd L. Peterson, who was perfecting radio spots for movies. Until then, movie studios primarily relied on print advertising or studio-made theatrical trailers. The two men became business partners and, together, perfected the familiar format.

Mr. LaFontaine, who was editing, writing and producing in the early days of the partnership, became a voice himself by accident. In 1964, when an announcer failed to show up for a job, he recorded himself reading copy and sent it to the studio with a message: “This is what it’ll sound like when we get a ‘real’ announcer.”

Trailer for The Elephant Man, proclaimed to be his favorite:

And a short interview/documentary:

Don’s impact is unmistakable, and it’s striking to think of how his approach changed movie advertising. May he rest in peace.

Wednesday, September 3, 2008 | movies  

Handcrafted Data

1219473416_8507.jpg

Continuing Luddite Monday, a new special feature on benfry.com: an article from the Boston Globe about the prevalence of handcrafted images in reference texts. Dushko Petrovich writes:

But in fact, nearly two centuries after the publication of his famous folios, it is Audubon’s technique, and not the sharp eye of the modern camera, that prevails in a wide variety of reference books. For bird-watchers, the best guides, the most coveted guides – like those by David Allen Sibley and Roger Tory Peterson – are still filled with hand-painted images. The same is true for similar volumes on fish, trees, and even the human body. Ask any first-year medical student what they consult during dissections, and they will name Dr. Frank H. Netter’s meticulously drafted “Atlas of Human Anatomy.” Or ask architects and carpenters to see their structures, and they will often show you chalk and pencil “renderings,” even after the things have been built and professionally photographed.

This nicely reinforces the case for drawing, and why it’s so powerful. The article later gets to the meat of the issue, which is the same reason that drawing is a topic on a site about data visualization.

Besides seamlessly imposing a hierarchy of information, the handmade image is also free to present its subject from the most efficient viewpoint. Audubon sets a high standard in this regard; he is often at pains to depict the beak in its most revealing profile, the crucial feathers at an identifiable angle, the front leg extended just so. When the nighthawk and the whip-poor-will are pictured in full flight, their legs tucked away, he draws the feet at the side of the page, so we’re not left guessing. If Audubon draws a bird in profile, as he does with the pitch-black rook and the grayer hooded crow, we’re not missing any details a three-quarters view would have shown.

And finally, a reminder:

Confronted with unprecedented quantities of data, we are constantly reminded that quality is what really matters. At a certain point, the quality and even usefulness of information starts being defined not by the precision and voracity of technology, but by the accuracy and circumspection of art. Seen in this context, Audubon shows us that painting is not just an old fashioned medium: it is a discipline that can serve as a very useful filter, collecting, editing, and carefully synthesizing information into a single efficient and evocative image – giving us the information that we really want, information we can use and, as is the case with Audubon, even cherish.

Consider this your constant reminder, because I think it’s actually quite rare that quality is acknowledged. I regularly attend lectures by speakers who boast about how much data they’ve collected and the complexity of their software and hardware, but it’s one in ten thousand who even mention the art of removing or ignoring data in search of better quality.

Looks like the Early Drawings book mentioned in the article will be available at the end of September.

Monday, September 1, 2008 | drawing, human, refine  

Skills as Numbers

numerati-small.jpg

BusinessWeek has an excerpt from Numerati, a book about the fabled monks of data mining (Publishers Weekly calls them “entrepreneurial mathematicians”) who are sifting through the personal data we create every day.

Picture an IBM manager who gets an assignment to send a team of five to set up a call center in Manila. She sits down at the computer and fills out a form. It’s almost like booking a vacation online. She puts in the dates and clicks on menus to describe the job and the skills needed. Perhaps she stipulates the ideal budget range. The results come back, recommending a particular team. All the skills are represented. Maybe three of the five people have a history of working together smoothly. They all have passports and live near airports with direct flights to Manila. One of them even speaks Tagalog.

Everything looks fine, except for one line that’s highlighted in red. The budget. It’s $40,000 over! The manager sees that the computer architect on the team is a veritable luminary, a guy who gets written up in the trade press. Sure, he’s a 98.7% fit for the job, but he costs $1,000 an hour. It’s as if she shopped for a weekend getaway in Paris and wound up with a penthouse suite at the Ritz.

Hmmm. The manager asks the system for a cheaper architect. New options come back. One is a new 29-year-old consultant based in India who costs only $85 per hour. That would certainly patch the hole in the budget. Unfortunately, he’s only a 69% fit for the job. Still, he can handle it, according to the computer, if he gets two weeks of training. Can the job be delayed?

This is management in a world run by Numerati.

I’m highly skeptical of management (a fundamentally human activity) being distilled to numbers in this manner. Unless, of course, the managers are that poor at doing their job. And further, what’s the point of the manager if they’re spending most of their time filling out the vacation form-style work order? (Filling out tedious year-end reviews, no doubt.) Perhaps it should be an indication that the company is simply too large:

As IBM sees it, the company has little choice. The workforce is too big, the world too vast and complicated for managers to get a grip on their workers the old-fashioned way—by talking to people who know people who know people.

Then we descend (ascend?) into the rah-rah of today’s global economy:

Word of mouth is too foggy and slow for the global economy. Personal connections are too constricted. Managers need the zip of automation to unearth a consultant in New Delhi, just the way a generation ago they located a shipment of condensers in Chicago. For this to work, the consultant—just like the condensers—must be represented as a series of numbers.

I say rah-rah because how else can you put refrigeration equipment parts in the same sentence as a living, breathing person with a mind, free will, and a life?

And while I don’t think I agree with this particular thesis, the book as a whole looks like an interesting survey of efforts in this area. Time to finish my backlog of Summer reading so I can order more books…

Monday, September 1, 2008 | human, mine, notafuturist, numberscantdothat, privacy, social  

Is Processing a Language?

This question is covered in the FAQ on Processing.org, but still tends to reappear on the board every few months (most recently here). Someone once described Processing syntax as a dialect of Java, which sounds about right to me. It’s syntax that we’ve added on top of Java to make things a little easier for a particular work domain (roughly, making visual things). There’s also a programming environment that significantly simplifies what’s found in traditional IDEs. Plus there’s a core API set (and a handful of core libraries) that we’ve built to support this type of work. If we did these in isolation, none would really stick out:

  • The language changes are pretty minimal. The big difference is probably how they integrate with the IDE that’s built around the idea of sitting down and quickly writing code (what we call sketching). We don’t require users to first learn class definitions or even method declarations before they can show something on the screen, which helps avoid some of the initial head-scratching that comes from trying to explain “public class” or “void” to beginning programmers. For more advanced coders, it helps Java feel a bit more like scripting. I use a lot of Perl for various tasks, and I wanted to replicate the way you can write 5-10 lines of Perl (or Python, or Ruby, or whatever) and get something done. In Java, you often need double that number of lines just to set up your class definitions and a thread.
  • The API set is a Java API. It can be used with traditional Java IDEs (Eclipse, Netbeans, whatever), and a Processing component can be embedded into other applications (a minimal embedding example follows this list). But without the rest of it (the syntax and IDE), Processing (API or otherwise) would not be as widely used as it is today. The API grew out of the work Casey and I were doing, and our likes and dislikes of the approaches in libraries we’ve used: Postscript, QuickDraw, OpenGL, Java AWT, even Applesoft BASIC. Can we do OpenGL but still have it feel as simple as writing graphics code on the Apple ][? Can we simplify current graphics approaches so that they at least feel simpler, like the original QuickDraw on the Mac?
  • The IDE is designed to make Java-style programming less wretched. Check out the Integration discussion board to see just how un-fun it is to figure out how the Java CLASSPATH and java.library.path work, or how to embed AWT and Swing components. These frustrations and complications are sometimes even filed as bugs in the Processing bugs database by users who have apparently become spoiled by not having to worry about such things.
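Here’s that embedding point as a minimal sketch, roughly the Processing 1.0-era pattern (it assumes core.jar on the classpath and a made-up sketch class called MySketch; details vary by release):

    import java.awt.BorderLayout;
    import java.awt.Frame;
    import processing.core.PApplet;

    // A made-up sketch written as a plain Java class instead of a .pde file.
    class MySketch extends PApplet {
      public void setup() {
        size(400, 400);
      }
      public void draw() {
        background(255);
        ellipse(mouseX, mouseY, 20, 20);
      }
    }

    // Dropping that sketch into an ordinary AWT Frame.
    public class EmbedExample {
      public static void main(String[] args) {
        Frame frame = new Frame("Embedded Processing");
        PApplet sketch = new MySketch();
        frame.setLayout(new BorderLayout());
        frame.add(sketch, BorderLayout.CENTER);
        sketch.init();  // starts the sketch's animation thread
        frame.setSize(400, 400);
        frame.setVisible(true);
      }
    }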

If pressed, the language itself is probably the easiest piece to let go of—witness the Python, Ruby, and now JavaScript versions of the API, or the C++ version that I use for personal work (on increasingly rare C++ projects). And lots of people build Processing projects without the preprocessor and PDE.

In some cases, we’ve even been accused of not being clear that it’s “just Java,” or even that Processing is Java with a trendy name. Complaining is easier than reading, so there’s not much we can do for people who don’t glance at the FAQ before writing their unhappy screeds. And with the stresses of the modern world, people need to relieve themselves of their angst somehow. (On the other hand, if you’ve met either of us, you’ll know that Casey and I are very trendy people, having grown up in the farmlands of Ohio and Michigan.)

However, we don’t print “Java” on every page of Processing.org for a very specific reason: knowing it’s Java behind the scenes doesn’t actually help our audience. In fact, it usually causes more trouble than not because people expect it to behave exactly like Java. We’ve had a number of people copy and paste code from the Java Tutorial into the PDE and then get confused when it doesn’t work.

(Edit – In writing this, I don’t want to understate the importance of Java, especially in the early stages of the Processing project. It goes without saying that we owe a great deal to Sun for developing, distributing, and championing Java. It was, and is, the best language/environment on which to base the project. More about the choice of language can be found in the FAQ.)

But for as much trouble as the preprocessor and language component of Processing is for us to develop (or as irrelevant as it might seem to programmers who already code in Java), we’re still not willing to give that up—damned if we’re gonna make students learn how to write a method declaration and “public class Blah extends PApplet” before they can get something to show up on the screen.

I think the question is a bit like the general obsession of people trying to define Apple as a hardware or software company. They don’t do either—they do both. They’re one of the few to figure out that the distinction actually gets in the way of delivering good products.

Now, whether we’re delivering a good product is certainly questionable—the analogy with Apple may, uh, end there.

Wednesday, August 27, 2008 | languages, processing, software  

Mapping Iran’s Online Public

mapping-iran-public-200px.jpg

“Mapping Iran’s Online Public” is a fascinating (and very readable) paper from a study by John Kelly and Bruce Etling at Harvard’s Berkman Center. From the abstract:

In contrast to the conventional wisdom that Iranian bloggers are mainly young democrats critical of the regime, we found a wide range of opinions representing religious conservative points of view as well as secular and reform-minded ones, and topics ranging from politics and human rights to poetry, religion, and pop culture. Our research indicates that the Persian blogosphere is indeed a large discussion space of approximately 60,000 routinely updated blogs featuring a rich and varied mix of bloggers.

In addition to identifying four major poles (Secular/Reformist, Conservative/Religious, Persian Poetry and Literature, and Mixed Networks), the study includes a number of surprising findings, such as the nature of the discourse (the prominence of the poetry and literature category, for instance) or issues of anonymity:

…a minority of bloggers in the secular/reformist pole appear to blog anonymously, even in the more politically-oriented part of it; instead, it is more common for bloggers in the religious/conservative pole to blog anonymously. Blocking of blogs by the government is less pervasive than we had assumed.

They also produced images to represent the nature of the networks, seen in the thumbnail at right. The visualization is created with a force-directed layout that iteratively pulls data points closer together based on their content. It’s useful for this kind of study, where the intent is to represent or identify larger groups. In this case, the graphic supports what’s laid out in the text, but to me the most interesting part of the study is the human-centered work, such as the reviewing and categorizing of such a large number of sites by hand. It’s this background work that sets it apart from many other images like it, which tend to rely too heavily on automation.
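
For anyone curious about the mechanics, here is a toy version of that layout idea (not the code used in the study, and the constants are pulled out of thin air): every node pushes every other node away, linked nodes pull toward each other, and after many iterations the linked groups settle into clusters. In the study the links would come from content similarity; here a few random links stand in for that.

int n = 60;
float[] x = new float[n], y = new float[n];
float[] vx = new float[n], vy = new float[n];
boolean[][] linked = new boolean[n][n];

void setup() {
  size(500, 500);
  smooth();
  for (int i = 0; i < n; i++) {
    x[i] = random(width);
    y[i] = random(height);
    // a couple of random links per node, standing in for content similarity
    for (int k = 0; k < 2; k++) {
      int j = int(random(n));
      linked[i][j] = linked[j][i] = true;
    }
  }
}

void draw() {
  background(255);
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      if (i == j) continue;
      float dx = x[j] - x[i];
      float dy = y[j] - y[i];
      float d = max(sqrt(dx*dx + dy*dy), 1);
      vx[i] -= 500 * dx / (d * d * d);  // repulsion: keep all nodes apart
      vy[i] -= 500 * dy / (d * d * d);
      if (linked[i][j]) {
        vx[i] += 0.005 * dx;  // attraction: pull linked nodes together
        vy[i] += 0.005 * dy;
      }
    }
  }
  for (int i = 0; i < n; i++) {
    vx[i] *= 0.85;  // damping, so the layout settles instead of oscillating
    vy[i] *= 0.85;
    x[i] = constrain(x[i] + vx[i], 0, width);
    y[i] = constrain(y[i] + vy[i], 0, height);
    ellipse(x[i], y[i], 6, 6);
  }
}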

(The paper is from April 6, 2008, and I first heard about it after being contacted by John in June. Around 1999, our group had hosted students that he was teaching in a summer session for a visit to the Media Lab. And now a few months later, I’m digging through my writing todo pile.)

Tuesday, August 26, 2008 | forcelayout, represent, social  

Panicky Addition

In response to the last post, a message from João Antunes:

…you should also read this story about Panic’s old MP3 player applications.

The story includes how they came to almost dominate the Mac market before iTunes, how AOL and Apple tried to buy the application before coming out with iTunes, even recollections of meetings with Steve Jobs and how he wanted them to go work at Apple – it’s a fantastic indie story.

Regarding the Mac ‘indie’ development there’s this recent thesis by a Dutch student, also a good read.

I’d read the story about Audion (the MP3 player) before, and failed to make the connection that this was the same Audion that I rediscovered in the O’Reilly interview from the last post (and took a moment to mourn its loss). It’s sad to think of how much better iTunes would be if the Panic guys were making it — iTunes must be the first MP3 player that feels like a heavy duty office suite. In the story, Cabel Sasser (the other co-founder of Panic) begins:

Is it just me? I mean, do you ever wonder about the stories behind everyday products?

What names were Procter & Gamble considering before they finally picked “Swiffer”? (Springle? Sweepolio? Dirtrocker?) What flavors of Pop-Tarts never made it out of the lab, and did any involve lychee, the devil’s fruit?

No doubt the backstory on the Pop-Tarts question alone could be turned into a syndicated network show to compete with LOST.

Audion is now available as a free download, though without updates since 2002, it’s not likely to work much longer (seemed fine with OS X 10.4, though who knows with even 10.5).

Tuesday, August 19, 2008 | feedbag, software  

Mangled Tenets and Exasperation: the iTunes App Store

By way of Darling Furball, a blog post by Steven Frank, co-founder of Panic, on his personal opinion of Apple’s gated community of software distribution, the iTunes App Store:

Some of my most inviolable principles about developing and selling software are:

  1. I can write any software I want. Nobody needs to “approve” it.
  2. Anyone who wants to can download it. Or not.
  3. I can set any price I want, including free, and there’s no middle-man.
  4. I can set my own policies for refunds, coupons and other promotions.
  5. When a serious bug demands an update, I can publish it immediately.
  6. If I want, I can make the source code available.
  7. If I want, I can participate in someone else’s open source project.
  8. If I want, I can discuss coding difficulties and solutions with other developers.

The iTunes App Store distribution model mangles almost every one of those tenets in some way, which is exasperating to me.

But, the situation’s not that clear-cut.

The entire post is very thoughtful and well worth reading; it’s also coming from a long-time Apple developer rather than some crank from an online magazine looking to stir up advertising hits. Panic’s software is wonderful: Transmit is an application that singlehandedly makes me want to use a Mac (yet it’s only, uh, an SFTP client). I think his post nicely sums up the way a lot of developers (including myself) feel about the App Store. He concludes:

I’ve been trying to reconcile the App Store with my beliefs on “how things should be” ever since the SDK was announced. After all this time, I still can’t make it all line up. I can’t question that it’s probably the best mobile application distribution method yet created, but every time I use it, a little piece of my soul dies. And we don’t even have anything for sale on there yet.

Reading this also made me curious to learn more about Panic, which led me to this interview from 2004 with Frank and the other co-founder. He also has a number of side projects, including Spamusement, a roughly drawn cartoon depicting spam headlines (Get a bigger flute, for instance).

Tuesday, August 19, 2008 | mobile, software  

Data as Dairy

As a general tip, keep in mind that any data looks better as a wheel of Gouda.

delicious cheese

You say “market share,” I say “wine pairing.”

(Via this article, passed along by a friend looking for ways to make pie charts with more visual depth.)

Tuesday, August 19, 2008 | refine, represent  

History of Predictive Text Swearing

Wonderful commentary on being nannied by your mobile, and head-in-the-sand text prediction algorithms.

There’s lots more to be said about predictive text, but in the meantime, this also brings to mind Jonathan Harris’ QueryCount, which I found to be a more interesting followup to his WordCount project. (WordCount tells us something we already know, but QueryCount lets us see something we suspect.)

Monday, August 18, 2008 | text  

“Hello Kettle? Yeah, hi, this is the Pot calling.”

Wired’s Ryan Singel reports on a spat between AT&T and Google regarding their privacy practices:

Online advertising networks — particularly Google’s — are more dangerous than the fledgling plans and dreams of ISPs to install eavesdropping equipment inside their internet pipes to serve tailored ads to their customers, AT&T says.

Even more fun than watching gorillas fight (you don’t have to pick a side—it’s guaranteed to be entertaining) is when they bring up accusations that are usually reserved for the security and privacy set (or borderline paranoids who write blogs that cover information and privacy). Their argument boils down to “but we’re less naughty than you.” Ask any Mom about the effectiveness of that argument. AT&T writes:

Advertising-network operators such as Google have evolved beyond merely tracking consumer web surfing activity on sites for which they have a direct ad-serving relationship. They now have the ability to observe a user’s entire web browsing experience at a granular level, including all URLs visited, all searches, and actual page-views.

Deep Packet Inspection is an important-sounding way to say that they’re watching all your traffic. It’s the network equivalent of the post office opening all your letters and reading them, and, in AT&T’s case, adding extra bulk mail (flyers, sweepstakes, and other junk) that seems appropriate to your interests based on what they find.

Are you excited yet?

Monday, August 18, 2008 | privacy  

The Importance of Failure

This segment from CBS Sunday Morning isn’t particularly groundbreaking or profound (and it’s perhaps a bit hokey), but it is a helpful reminder of the importance of failure. (Never mind the failure to post anything new for two weeks.)

Duke University professor Henry Petroski has made a career studying design failures, which he says are far more interesting than successes.

“Successes teach us very little,” Petroski said.

Petroski’s talking about bridges, but it holds true for any creative endeavor.

Also cited are J.K. Rowling bottoming out before her later success, van Gogh who sold just one painting before his death, Michael Jordan not making his high school basketball team, and others. (You’ve heard of these, but like I said, it’s about the reminder.)

It also notes that the important part is how you handle failure, citing Chipper Jones, who leads baseball with a .369 batting average, which is impressive but also means that he gets a hit in only about one of every three chances:

“Well, most of the time it’s not [going your way] and that’s why you have to be able to accept failure,” Jones said. “[…] a lot of work […] here in the big league is how you accept failure.”

Which is another important reminder: the standout difference in “making it” has to do with bouncing back from failure.

And if nothing else, watch it for footage of the collapse of the Tacoma Narrows Bridge in 1940. Such a beautiful (if terrifying) picture of concrete and steel oscillating in the wind. Also linked from the Wikipedia article are a collection of still photographs (including the collapse) and links to newsreel footage from the Internet Archive.

Friday, August 15, 2008 | failure  

More NASA Observations Acquire Interest

Some additional followup from Robert Simmon regarding the previous post. I asked more about the “amateur Earth observers” and the intermediate data access. He writes:

The original idea was sparked from the success of amateur astronomers discovering comets. Of course amateur astronomy is mostly about making observations, but we (NASA) already have the observations: the question is what to do with them–which we really haven’t figured out. One approach is to make in-situ observations like aerosol optical thickness (haziness, essentially), weather measurements, cloud type, etc. and then correlate them with satellite data. Unfortunately, calibration issues make this data difficult to use scientifically. It is a good outreach tool, so we’re partnering with science museums, and the GLOBE program does this with schools.

We don’t really have a good sense yet of how to allow amateurs to make meaningful analyses: there’s a lot of background knowledge required to make sense of the data, and it’s important to understand the limitations of satellite data, even if the tools to extract and display it are available. There’s also the risk that quacks with an axe to grind will willfully abuse data to make a point, which is more significant for an issue like climate change than it is for the face on Mars, for example. That’s just a long way of saying that we don’t know yet, and we’d appreciate suggestions.

I’m more of a “face on Mars” guy myself. It’s unfortunate that the quacks even have to be considered, though not surprising from what I’ve seen online. Also worth checking out:

Are you familiar with Web Map Service (WMS)?
http://www.opengeospatial.org/standards/wms
It’s one of the ways we distribute & display our data, in addition to KML.

And one last followup:

Here’s another data source for NASA satellite data that’s a bit easier than the data gateway:
http://daac.gsfc.nasa.gov/techlab/giovanni/

and examples of classroom exercises using data, with some additional data sources folded in to each one:
http://serc.carleton.edu/eet/

The EET holds an “access data workshop” each year in late spring, you may be interested in attending next year.

And with regards to guidelines, Mark Baltzegar (of The Cyc Foundation) sent along this note:

Are you familiar with the ongoing work within the W3C’s Linking Open Data project? There is a vibrant community actively exposing and linking open data.
http://richard.cyganiak.de/2007/10/lod/
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

More to read and eat up your evening, at any rate.

Thursday, July 31, 2008 | acquire, data, feedbag, parse  

NASA Observes Earth Blogs

Robert Simmon of NASA caught this post about the NASA Earth Observatory and was kind enough to pass along some additional information.

Regarding the carbon emissions video:

The U.S. carbon emissions data were taken from the Vulcan Project:
http://www.purdue.edu/eas/carbon/vulcan/index.php

They distribute the data here:
http://www.purdue.edu/eas/carbon/vulcan/research.html

In addition to the animation (which was intended to show the daily cycle and the progress of elevated emissions from east to west each morning), we published a short feature about the project and the dataset, including some graphs that remove the diurnal cycle.
http://earthobservatory.nasa.gov/Study/AmericanCarbon/

American Carbon is an example of one of our feature articles, which are published every month or so. We try to cover current research, focusing on individual scientists, using narrative techniques. The visualizations tie in closely to the text of the story. I’m the primary visualizer, and I focus on presenting the data as clearly as possible, rather than allowing free-form investigation of data. We also publish daily images (with links to images at the original resolution), imagery of natural hazards emphasizing current events (fires, hurricanes, and dust storms, for example), nasa press releases, a handful of interactive lessons, and the monthly global maps of various parameters. We’re in the finishing stages of a redesign, which will hopefully improve the navigation and site usability.

Also some details about the difficulties of distributing and handling the data:

These sections draw on data from wide and varied sources. The raw data is extremely heterogeneous, formats include: text files, HDF, matlab, camera raw files, GRADS, NetCDF, etc. All in different projections, at different spatial scales, and covering different time periods. Some of them are updated every five minutes, and others are reprocessed periodically. Trying to make the data available—and current—through our site would be overly ambitious. Instead, we focus on a non-expert audience interested in space, technology, and the environment, and link to the original science groups and the relevant data archives. Look in the credit lines of images for links.

Unfortunately the data formats can be very difficult to read. Here’s the main portal for access to NASA Earth Observing System data:
http://esdis.eosdis.nasa.gov/index.html

and the direct link to several of the data access interfaces:
http://esdis.eosdis.nasa.gov/dataaccess/search.html

And finally, something closer to what was discussed in the earlier post:

With the complexity of the science data, there is a place for an intermediate level of data: processed to a consistent format and readable by common commercial or free software (intervention by a data fairy?). NASA Earth Observations (NEO) is one attempt at solving that problem: global images at 0.1 by 0.1 degrees distributed as lossless-compressed indexed color images and csv files. Obviously there’s work to be done to improve NEO, but we’re getting there. We’re having a workshop this month to develop material for “amateur Earth observers” which will hopefully help us in this area, as well.

This speaks to the audience I tried to address with Visualizing Data in particular (or with Processing in general). There is a group of people who want access to data that’s more low-level than what’s found in a newspaper article, but not as complicated as raw piles of data from measuring instruments that are only decipherable by the scientists who use them.

This is a general theme, not specific to NASA’s data. And I think it’s a little more low-level than requiring that everything be in mashup-friendly XML or JSON feeds, but it seems worthwhile to start thinking about what the guidelines would be for open data distribution. And with such guidelines in place, we can browbeat organizations into playing along! Since that would be, uh, a nice way to thank them for making their data available in the first place.

Thursday, July 31, 2008 | acquire, data, feedbag  

Processing 0143 and a status report

Just posted Processing 0143 to the download page. This is not yet the stable release, so please read revisions.txt, which describes the significant changes in the releases since 0135 (the last “stable” release, and the current default download).

I’ve also posted a status report:

Some updates from the Processing Corporation’s east coast tower high rise offices in Cambridge, MA.

We’re working to finish Processing 1.0. The target date is this Fall, meaning August or September. We’d like to have it done as early as possible so that Fall classes can make use of it. In addition to the usual channels, we have a dozen or so people who are helping out with getting the release out the door. We’ll unmask these heroes at some point in the future.

I’m also pleased to announce that I’m able to focus on Processing full time this Summer with the help of a stipend provided by Oblong Industries. They’re the folks behind the gesture-controlled interface you see in Minority Report. (You can find more about them with a little Google digging.) They’re funding us because of their love of open source and because they feel that Processing is an important project. As in, there are no strings attached to the funding, and Processing is not being re-tooled for gesture interfaces. We owe them our enormous gratitude.

The big things for 1.0 include the Tools menu, better compile/run setup (what you see in 0136+), bringing back P2D, perhaps bringing back P3D with anti-aliasing, better OpenGL support, better library support, and some major bug fixes (outstanding threading problems and more).

If you have a feature or bug that you want fixed in time for 1.0, now is the time to vote by making sure that it’s listed at http://dev.processing.org/bugs.

I’ll try to post updates more frequently over the next few weeks.

Monday, July 28, 2008 | processing  

Wordle me this, Batman

I’ve never really been fond of tag clouds, but Wordle, by software MacGyver (and former drummer for They Might Be Giants) Jonathan Feinberg, gives the representation an aesthetic nudge lacking in most tag clouds. The application creates word clouds from input data submitted by users. I was reminded of it yesterday by Eugene, who submitted Lorem Ipsum:

lorem-500.png

I had first heard about it from emailer Bill Robertson, who had uploaded Organic Information Design, my master’s thesis. (Which was initially flattering but quickly became terrifying when I remembered that it still badly needs a cleanup edit.)

organic-500.jpg

A wonderful tree shape! Can’t decide which I like better: “information” as the stem or “data” as a cancerous growth in the upper-right.

Mr. Feinberg is also the reason that Processing development has been moving to Eclipse (replacing emacs, some shell scripts, two packages of bazooka bubble gum, and the command line): he donated a long afternoon to setting up the software in the IDE back when I lived in East Cambridge, just a few blocks from where he works at IBM Research.

Wednesday, July 23, 2008 | inbox, refine, represent  

Blood, guts, gore and the data fairy

The O’Reilly press folks passed along this review (PDF) of Visualizing Data from USENIX magazine. I really appreciated this part:

My favorite thing about Visualizing Data is that it tackles the whole process in all its blood, guts, and gore. It starts with finding the data and cleaning it up. Many books assume that the data fairy is going to come bring you data, and that it will either be clean, lovely data or you will parse it carefully into clean, lovely data. This book assumes that a significant portion of the data you care about comes from some scuzzy Web page you don’t control and that you are going to use exactly the minimum required finesse to tear out the parts you care about. It talks about how to do this, and how to decide what the minimum required finesse would be. (Do you do it by hand? Use a regular expression? Actually bother to parse XML?)

Indeed, writing this book was therapy for that traumatized inner child who learned at such a tender young age that the data fairy did not exist.

Wednesday, July 23, 2008 | iloveme, parse, reviews, vida  

NASA Earth Observatory

carbon.jpg

Some potentially interesting data from NASA passed along by Chris Lonnen. The first is the Earth Observatory, which includes images of things like Carbon Monoxide, Snow Cover, Surface Temperature, UV Exposure, and so on. Chris writes:

I’m not sure how useful they would be to novices in terms of usable data (raw numbers are not provided in any easy-to-harvest manner), but the information is still useful, and they provide a basic, if clunky, presentation that follows the basic steps you laid out in your book. The data can be found here, and they occasionally compile it all into interesting visualizations. My favorite is the carbon map here.

The carbon map movie is really cool, though I wish the raw data were available, since the strong cyclical effect seen in the animation needs to be separated out. The cycle dominates the animation to such an extent that it’s nearly the only takeaway from the movie. Each cycle is a 24-hour period; instead of showing the days one after another, show several of them adjacent to one another, so that we can compare 3am on one day with 3am on the next.
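
A rough sketch of that “adjacent days” idea, assuming the animation were exported as one frame per hour with hypothetical names like frame-001-03.png (day 1, 3am): grab the same hour from several days and tile them in a row, so the eye compares day to day rather than riding the daily cycle.

int days = 5;  // how many days to compare side by side
int hour = 3;  // 3am, say

void setup() {
  size(900, 180);  // five 180 x 180 tiles in a row
  for (int d = 1; d <= days; d++) {
    // hypothetical filenames: one exported frame per hour, frame-DDD-HH.png
    PImage frame = loadImage("frame-" + nf(d, 3) + "-" + nf(hour, 2) + ".png");
    image(frame, (d - 1) * 180, 0, 180, 180);
  }
}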

For overseas readers, I’ll note that the images and data are not all U.S.-centric—most cover the surface of the Earth.

I asked Chris about availability for more raw data, and he did a little more digging:

The raw data availability is slim. From what I’ve gathered you need to contact NASA and have them give you clearance as a researcher. If you were looking for higher-quality photography for a tutorial, NASA Earth Observations has a newer website that I’ve just found, which offers similar data in the format of your choice at up to 3600 x 1800. For some sets it will also offer you data in CSV or CSV for Excel.

If you needed higher resolutions than that, NASA’s Visible Earth offers some TIFFs at larger sizes. A quick search for .tiff gave me a 16384 x 8192 map of the earth with city lights shining, which would be relatively easy to filter out from the dark blue background. These two websites are probably a bit more helpful.

Interesting tidbits for someone interested in a little planetary digging. I’ve had a few of these links sitting in a pile waiting for me to finish the “data” section of my web site; in the meantime I’ll just mention things here.

Update 31 July 2008: Robert Simmon from NASA chimes in.

Saturday, July 19, 2008 | acquire, data, inbox, science  

Brains on the Line

I was reminded this morning that Mario Manningham, a wide receiver who played for Michigan, was rumored to have scored a 6 (out of 50) on the Wonderlic, an intelligence test administered in some occupations (and now pro football) to check the mental capability of job candidates. Intelligence tests are strange beasts, but after watching my niece working on similar problems—for fun—during her summer vacation last week, the tests caught my eye more than when I first heard about his score.

Manningham was once a promising undergrad receiver for U of M, but has in recent years proven himself to be a knucklehead, loafing through plays and most recently making headlines for marijuana use and an interview on Sirius radio described as “… arrogant and defensive. When asked about the balls he dropped in big spots, he responded, ‘What about the ball I caught?’” So while an exceptionally low score on a standardized test might suggest something like dyslexia, the guy’s an egotistical bonehead even without mitigating factors.

Most people don’t associate brains with football, but in recent years teams have begun to use a Wonderlic test while scouting, which consists of 50 questions to be completed in 12 minutes. Many of the questions are multiple choice, but the time is certainly a factor when completing the tests. A score of 10 is considered “literate”, while 20 is said to coincide with average intelligence (an IQ of 100, though now we’re comparing one somewhat arbitrary numerically scored intelligence test with another).

In another interesting twist, the test is also administered to players the day of the NFL combine—which means they first spend the day running, jumping, benching, interviewing, and lots of other -ings, before they sit down and take an intelligence test. It’s a bit like a medical student running a half marathon before taking the boards.

Wonderlic himself says that basically, the scores decrease as you move further away from the ball, which is interesting but unsurprising. It’s sort of obvious that a quarterback needs to be on the smarter side, but I was curious to see what this actually looked like. Using this table as a guide, I then grabbed this diagram from Wikipedia showing a typical formation in a football game. I cleaned up the design of the diagram a bit and replaced the positions with their scores:

positions1.png

Offense is shown in blue, defense in red. You can see the quarterback with a 24, the center (over 6 feet and around 300 lbs.) averaging higher at 25, and the outside linemen even a little higher. Presumably this is because the outside linemen need to be mentally quick (as well as tough) to read the defense and respond to it. Those are the wide receivers (idiot loudmouths) with the 17s on the outside.

To make the diagram a bit clearer, I scaled each position based on its score:

positions2.png

That’s a little better since you can see the huddle around the ball and where the brains need to be for the system of protection around it. With the proportion, I no longer need the numbers, so I’ve switched back to using the initials for each position’s title:

positions3.png

(Don’t tell Tufte that I’ve used the radius, not the proportional area, of the circle as the value for each ellipse! A cardinal sin that I’m using in this case to improve proportion and clarify a point.)
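
For anyone who wants to see the difference (or commit the sin knowingly), it comes down to a square root: mapping the score straight to the circle’s diameter exaggerates the gap, since perceived size grows with the square of the diameter, while the area-proportional version uses sqrt(score) instead. A minimal sketch using two scores from the diagram above; the pixel multipliers are arbitrary.

void setup() {
  size(400, 220);
  smooth();
  noFill();
  float qb = 24;  // quarterback score, from the diagram above
  float wr = 17;  // wide receiver score

  // left pair: diameter proportional to the score (what the diagram does);
  // the 24 looks roughly twice the size of the 17, since (24/17)^2 is about 2
  ellipse(100, 110, qb * 8, qb * 8);
  ellipse(100, 110, wr * 8, wr * 8);

  // right pair: area proportional to the score (the by-the-book version);
  // the difference reads much closer to the actual 24-vs-17 ratio
  ellipse(300, 110, sqrt(qb) * 40, sqrt(qb) * 40);
  ellipse(300, 110, sqrt(wr) * 40, sqrt(wr) * 40);
}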

I’ll also happily point out that the linemen for the Patriots all score above average for their position:

Player Position Year Score
Matt Light left tackle 2001 29
Logan Mankins left guard 2005 25
Dan Koppen center 2003 28
Stephen Neal right guard 2001 31
Nick Kaczur right tackle 2005 29

A position-by-position image for a team would be interesting, but I’ve already spent too much time thinking about this. The Patriots are rumored to be heavy on brains, with Green Bay at the other end of the spectrum.

An ESPN writeup about the test (and testing in general) can be found here, along with a sample test here.

One odd press release from Wonderlic even compares scores per NFL position with private sector job titles. For instance, a middle linebacker scores like a hospital orderly, while an offensive tackle is closer to a marketing executive. Fullbacks and halfbacks share the lower end with dock hands and material handlers.

During the run-up to Super Bowl XXXII in 1998, one reporter even dug up the Wonderlic scores for the Broncos and Packers, showing Denver with an average score of 20.4 compared to Green Bay’s 19.6. As defending champions, the Packers were favored but wound up losing 31-24.

Nobody cited test scores in the post-game coverage.

Wednesday, July 16, 2008 | football, sports  

Eric Idle on “Scale”

Scale is one of the most important themes in data visualization. In Monty Python’s The Meaning of Life, Eric Idle shares his perspective:

The lyrics:

Just remember that you’re standing on a planet that’s evolving
And revolving at nine hundred miles an hour,
That’s orbiting at nineteen miles a second, so it’s reckoned,
A sun that is the source of all our power.
The sun and you and me and all the stars that we can see
Are moving at a million miles a day
In an outer spiral arm, at forty thousand miles an hour,
Of the galaxy we call the ‘Milky Way’.

Our galaxy itself contains a hundred billion stars.
It’s a hundred thousand light years side to side.
It bulges in the middle, sixteen thousand light years thick,
But out by us, it’s just three thousand light years wide.
We’re thirty thousand light years from galactic central point.
We go ’round every two hundred million years,
And our galaxy is only one of millions of billions
In this amazing and expanding universe.

The universe itself keeps on expanding and expanding
In all of the directions it can whizz
As fast as it can go, at the speed of light, you know,
Twelve million miles a minute, and that’s the fastest speed there is.
So remember, when you’re feeling very small and insecure,
How amazingly unlikely is your birth,
And pray that there’s intelligent life somewhere up in space,
‘Cause there’s bugger all down here on Earth.

Wednesday, July 16, 2008 | music, scale  

Postleitzahlen in Deutschland

germany-contrast-small.png

Maximillian Dornseif has adapted Zipdecode from Chapter 6 of Visualizing Data to handle German postal codes. I’ve wanted to do this myself since hearing about the OpenGeoDB data set which includes the data, but thankfully he’s taken care of it first and is sharing it with the rest of us along with his modified code.

(The site is in German…I’ll trust any of you German readers to let me know if the site actually says that Visualizing Data is the dumbest book he’s ever read.)

Also helpful to note that he used Python for preprocessing the data. He doesn’t bother implementing a map projection, as done in the book, but the Python code is a useful example of using another language when appropriate, and how the syntax differs from Processing:

# Convert opengeodb data for zipdecode
fd = open('PLZ.tab')
out = []
minlat = minlon = 180
maxlat = maxlon = 0

for line in fd:
    line = line.strip()
    if not line or line.startswith('#'):
        continue
    parts = line.split('\t')
    dummy, plz, lat, lon, name = parts
    out.append([plz, lat, lon, name])
    minlat = min([float(lat), minlat])
    minlon = min([float(lon), minlon])
    maxlat = max([float(lat), maxlat])
    maxlon = max([float(lon), maxlon])

print "# %d,%f,%f,%f,%f" % (len(out), minlat, maxlat, minlon, maxlon)
for data in out:
    plz, lat, lon, name = data
    print '\t'.join([plz, str(float(lat)), str(float(lon)), name])

In the book, I used Processing for most of the examples (with a little bit of Perl) for sake of simplicity. (The book is already introducing a lot of new material, why hurt people and introduce multiple languages while I’m at it?) However that’s one place where the book diverges from my own process a bit, since I tend to use a lot of Perl when dealing with large volumes of text data. Python is also a good choice (or Ruby if that’s your thing), but I’m tainted since I learned Perl first, while a wee intern at Sun.

Tuesday, July 15, 2008 | adaptation, vida, zipdecode  

Parsing Numbers by the Bushel

While taking a look at the code mentioned in the previous post, I noticed two things. First, the PointCloud.pde file drops directly into OpenGL-specific code (rather than the Processing API) for the sake of speed when drawing thousands and thousands of points. It’s further proof that I need to finish the PShape class for Processing 1.0, which will handle this sort of thing automatically.
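
For reference, this is roughly what the plain Processing-API route looks like: a minimal sketch with random points standing in for the scanner data (this is not the project’s code). It’s exactly this kind of per-vertex loop that bogs down once the point count gets large, which is why PointCloud.pde drops to raw OpenGL calls, and why a proper PShape would help.

float[] points;  // x, y, z triplets, like the values parsed in SceneViewer.pde

void setup() {
  size(500, 500, P3D);
  points = new float[30000];  // 10,000 random points as stand-in data
  for (int i = 0; i < points.length; i++) {
    points[i] = random(-150, 150);
  }
}

void draw() {
  background(0);
  translate(width/2, height/2, 0);
  rotateY(frameCount * 0.01);
  stroke(255);
  beginShape(POINTS);
  for (int i = 0; i < points.length; i += 3) {
    vertex(points[i], points[i+1], points[i+2]);
  }
  endShape();
}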

Second is a more general point about parsing. This isn’t intended as a nitpick on Aaron’s code (it’s commendable that he put his code out there for everyone to see—and uh, nitpick about). But seeing how it was written reminded me that most people don’t know about the casts in Processing, particularly when applied to whole arrays, and this can be really useful when parsing data.

To convert a String to a float (or int) in Processing, you can use a cast, for instance:

String s = "667.12";
float f = float(s);

This also in fact works with String[] arrays, like the kind returned by the split() method while parsing data. For instance, in SceneViewer.pde, the code currently reads:

String[] thisLine = split(raw[i], ",");
points[i * 3] = new Float(thisLine[0]).floatValue() / 1000;
points[i * 3 + 1] = new Float(thisLine[1]).floatValue() / 1000;
points[i * 3 + 2] = new Float(thisLine[2]).floatValue() / 1000;

Which could be written more cleanly as:

String[] thisLine = split(raw[i], ",");
float[] f = float(thisLine);
points[i * 3 + 0] = f[0] / 1000;
points[i * 3 + 1] = f[1] / 1000;
points[i * 3 + 2] = f[2] / 1000;

However, to his credit, Aaron may have intentionally skipped it in this case, since he doesn’t need the whole line of numbers.

Or, if you’re using the Processing API with Eclipse or some other IDE, the float() cast won’t work for you. In that case, substitute the parseFloat() method for the float() cast:

String[] thisLine = split(raw[i], ",");
float[] f = parseFloat(thisLine);
points[i * 3 + 0] = f[0] / 1000;
points[i * 3 + 1] = f[1] / 1000;
points[i * 3 + 2] = f[2] / 1000;

The same can be done for int, char, byte, and boolean. You can also go the other direction by converting float[] or int[] arrays to String[] arrays using the str() method. (The method is named str() because a String() cast would be awkward, a string() cast would be error prone, and it’s not really parseStr() either.)

When using parseInt() and parseFloat() (versus the int() and float() casts), it’s also possible to include a second parameter that specifies a “default” value for missing data. Normally, the default is Float.NaN for parseFloat(), or 0 with parseInt() and the others. When parsing integers, 0 and “no data” often have a very different meaning, in which case this can be helpful.
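
A quick example of that second parameter, using a made-up line of data with an empty field:

String[] pieces = split("12,,7", ',');   // the middle field is empty
int[] counts = parseInt(pieces, -1);     // { 12, -1, 7 } rather than { 12, 0, 7 }
println(counts);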

Tuesday, July 15, 2008 | parse  

Radiohead – House of Cards

Radiohead’s new video for “House of Cards” built using a laser scanner and software:

Aaron Koblin, one of Casey’s former students, was involved in the project and also made use of Processing for the video. He writes:

A couple of hours ago was the release of a project I’ve been working on with Radiohead and Google. Lots of laser scanner fun.

I released some Processing code along with the data we captured to make the video. Also tried to give a basic explanation of how to get started using Processing to play with all this stuff.

The project is hosted at code.google.com/radiohead, where you can also download all the data for the point clouds captured by the scanner, as well as Processing source code to render the points and rotate Thom’s head as much as you’d like. This is the download page for the data and source code.

They’ve also posted a “making of” video:

(Just cover your ears toward the end where the director starts going on about “everything is data…”)

Sort of wonderful and amazing that they’re releasing the data behind the project, opening up the possibility for a kind of software-based remixing of the video. I hope their leap of faith will be rewarded by individuals doing interesting and amazing things with the data. (Nudge, nudge.)

Aaron’s also behind the excellent Flight Patterns as well as The Sheep Market, both highly recommended.

Tuesday, July 15, 2008 | data, motion, music  

Derek Jeter Probably Didn’t Need To Jump To Throw That Guy Out

05jeterderek14.jpg

Derek Jeter vs. Objective Reality is an entertaining article from Slate regarding a study by Shane T. Jensen at the Wharton School. Nate DiMeo writes:

The take-away from the study, which was presented at the annual meeting of the American Association for the Advancement of Science, was that Mr. Jeter (despite his three Gold Gloves and balletic leaping throws) is the worst-fielding shortstop in the game.

The New York press was unhappy, but the stats-minded baseball types (Sabermetricians) weren’t that impressed. DiMeo continues:

Mostly, though, the paper didn’t provoke much intrigue because Jeter’s badness is already an axiom of [Sabermetric literature]. In fact, debunking the conventional wisdom about the Yankee captain’s fielding prowess has become a standard method of proving the validity of a new fielding statistic. That places Derek Jeter at the frontier of new baseball research.

Well put. Mr. Jeter defended himself by saying:

“Maybe it was a computer glitch”

What I like about the article, aside from an objective and quantitative reason to dislike Jeter (I already have a quantity of subjective reasons), is how it frames the issue in the broader sports-statistics debate. It nicely covers this new piece of information as a microcosm of the struggle between sabermetricians and traditional baseball types, while essentially poking fun at both: the total refusal of the traditional side to buy into the numbers, and the schadenfreude of the geeks going after Jeter since he’s the one who gets the girls. (The article is thankfully not as trite as that, but you get the idea.)

I’m also biased since the metric in the paper places Pokey Reese, one of my favorite Red Sox players of 2004, at #11 among second basemen from 2000 to 2005.

And of course, The Onion does it better:

Experts: ‘Derek Jeter Probably Didn’t Need To Jump To Throw That Guy Out’

BRISTOL, CT—Baseball experts agreed Sunday that Derek Jeter, who fielded a routine ground ball during a regular-season game in which the Yankees were leading by five runs and then threw it to first base using one of his signature leaps, did not have to do that to record the out. “If it had been a hard-hit grounder in the hole or even a slow dribbler he had to charge, that would’ve been one thing,” analyst John Kruk said during a broadcast of Baseball Tonight. “But when it’s hit right to him by [Devil Rays first-baseman] Greg Norton, a guy who has no stolen bases and is still suffering the effects of a hamstring injury sustained earlier this year… Well, that’s a different story.” Jeter threw out Norton by 15 feet and pumped his fist in celebration at the end of the play.

In other news, I can’t believe I just put a picture of Jeter on my site.

Monday, July 14, 2008 | baseball, mine, sports  

Storyboarding with the Coen Brothers

0805ande1_533x600_4.jpg

Wonderful article about the work of J. Todd Anderson, who storyboards the Coen Brothers’ movies:

Anderson’s drawings have a jauntiness that seems absent from the more serious cinematic depiction; Anderson says he is simply trying to inject as much of a sense of action as possible into each scene.

Anderson describes the process of meeting about a new film:

“It’s like they’re making a movie in front of me,” he says. “They tell me the shots. I do fast and loose drawings on a clipboard with a Sharpie pen—one to three drawings to a sheet of regular bond paper. I try to establish the scale, trap the angle, ID the character, get the action.”

More in the article.

Friday, June 27, 2008 | drawing, movies  

National Traffic Scorecard

The top 100 most congested metropolitan areas, visualized as a series of tomato stems:

scorecard-500.png

Includes links to PDF reports for each area which detail overall congestion and the worst bottlenecks.

Thursday, June 26, 2008 | mapping, traffic  

Paternalism at the state level and the definition of “advice”

Following up on an earlier post, The New York Times jumps in with more about California (and New York before it) shutting down personal genomics companies, including this curious definition of advice:

“We think if you’re telling people you have increased risk of adverse health effects, that’s medical advice,” said Ann Willey, director of the office of laboratory policy and planning at the New York State Department of Health.

The dictionary confirmed my suspicion that advice refers to “guidance or recommendations concerning prudent future action,” which doesn’t coincide with telling people they have an increased risk for a disease. If they told you to take medication based on that risk, it would most certainly be advice. But as far as I know, the extent of the advice given by these companies is to consult a doctor for…advice.

As in the earlier post, the health department in California continues to sound nutty:

“We started this week by no longer tolerating direct-to-consumer genetic testing in California,” Karen L. Nickel, chief of laboratory field services for the state health department, said during a June 13 meeting of a state advisory committee on clinical laboratories.

We will not tolerate it! These tests are a scourge upon our society! The collapse of the housing loan market, high gas prices, and the “great trouble or suffering” brought on by this beast that preys on those with an excess of disposable income. Someone has to save these people who have $1000 to spare on self-curiosity! And the poor millionaires spending $350,000 to get their genome sequenced by Knome. Won’t someone think of the millionaires!?

I wish I still lived in California, because then I would know someone was watching out for me.

For the curious, the letters sent to the individual companies can be found here; sadly, they aren’t any more insightful than the comments to the press. But speaking of scourge—the notices are all Microsoft Word files.

One interesting tidbit closing out the Times article:

Dr. Hudson [director of the Genetics and Public Policy Center at Johns Hopkins University] said it was “not surprising that the states are stepping in, in an effort to protect consumers, because there has been a total absence of federal leadership.” She said that if the federal government assured tests were valid, “paternalistic” state laws could be relaxed “to account for smart, savvy consumers” intent on playing a greater role in their own health care.

It’s not clear whether this person is just making a trivial dig at the federal government or whether this is the root of the problem. In the previous paragraph she’s being flippant about “Genes R Us” so it might be just a swipe, but it’s an interesting point nonetheless.

Thursday, June 26, 2008 | genetics, government, privacy, science  

Surfing, Orgies, and Apple Pie

Obscenity law in the United States is based on Miller vs. California, a precedent set in 1973:

“(a) whether the ‘average person, applying contemporary community standards’ would find that the work, taken as a whole, appeals to the prurient interest,

(b) whether the work depicts or describes, in a patently offensive way, sexual conduct specifically defined by the applicable state law, and

(c) whether the work, taken as a whole, lacks serious literary, artistic, political, or scientific value.”

Of course, the definition of an average person or community standards isn’t quite as black and white as most Supreme Court decisions. In a new take, the lawyer defending the owner of a pornography site in Florida is using Google Trends to produce what he feels is a more accurate definition of community standards:

In the trial of a pornographic Web site operator, the defense plans to show that residents of Pensacola are more likely to use Google to search for terms like “orgy” than for “apple pie” or “watermelon.” The publicly accessible data is vague in that it does not specify how many people are searching for the terms, just their relative popularity over time. But the defense lawyer, Lawrence Walters, is arguing that the evidence is sufficient to demonstrate that interest in the sexual subjects exceeds that of more mainstream topics — and that by extension, the sexual material distributed by his client is not outside the norm.

Below, “surfing” in blue, “orgy” in red, and “apple pie” in orange:

viz-500.png

A clever defense. The trends can also be localized to roughly the size of a large city or county, which arguably might be considered the “community.” The New York Times article continues:

“Time and time again you’ll have jurors sitting on a jury panel who will condemn material that they routinely consume in private,” said Mr. Walters, the defense lawyer. Using the Internet data, “we can show how people really think and feel and act in their own homes, which, parenthetically, is where this material was intended to be viewed,” he added.

Fascinating that there could actually be something even remotely quantifiable about community standards. “I know it when I see it” is inherently subjective, so is any introduction of objectivity an improvement? For more perspective, I recommend this article from FindLaw, which describes the history of “Movie Day” at the Supreme Court and the evolution of obscenity law.

The trends data has many inherent problems (lack of detail for one), but is another indicator of what we can learn from Google. Most important to me, the case provides an example of what it means for search engines to capture this information, because it demonstrates to the public at large (not just people who think about data all day) how the information can be used. As more information is collected about us, search engine data provides an imperfect mirror onto our society, previously known only to psychiatrists and priests.

Tuesday, June 24, 2008 | online, privacy, retention, social  

Typography Grab Bag: Berlow, Carter, and Indiana Jones

raiders.jpg

Indiana Jones and the Fonts on the Maps – Mark Simonson takes on the historical accuracy of the typography used in the Indiana Jones movies:

For the most part, the type usage in each of the movies is correct for the period depicted. With one exception: The maps used in the travel montages.

My theory is that this is because the travel maps are produced completely outside the standard production team. They’re done by some motion graphics house, outside the purview of the people on-set who are charged with issues of consistency. A nastier version of this theory might indict folks who do motion graphics for not knowing their typography and its time period—instead relying on the “feel” of the type when selecting. The bland version of this theory is that type history is esoteric, and nobody truly cares.

(Also a good time to point out how maps are used as a narrative device in the film, to great effect. The red line extending across the map is part of the Indiana Jones brand. I’d be curious to hear the story behind the mapping—who decided it needed to be there, who made it happen, who said “let’s do a moving red line that tracks the progress”—which parts were intentional, and which unintentional.)

Identifying the period for the faces reminded me of a 2005 profile of Matthew Carter, which described his involvement in court cases where a document’s date was in doubt, but the typography of the artifacts in question gave away their era. Sadly the article cannot be procured from the web site of The New Yorker, though you may have better luck if you possess a library card. Matthew Carter designed the typefaces Verdana and Bell Centennial (among many others). Spotting his wispy white ponytail around Harvard Square is a bit like seeing a rock star, if you’re a Cantabridgian typography geek.

From A to Z, font designer knows his type – a Boston Globe interview with type designer David Berlow (one of the founders of Font Bureau). Some of the questions are unfortunate, but there are a few interesting anecdotes:

Playboy magazine came to me; they were printing with two printing processes, offset and gravure. Gravure (printing directly from cylinder to paper), gives a richer, smoother texture when printing flesh tones and makes the type look darker on the page than offset (indirect image transfer from plates). So if you want the type to look the same, you have to use two fonts. We developed two fonts for Playboy, but they kept complaining that the type was still coming out too dark or too light. Finally, I got a note attached to a proof that said, “Sorry. It was me. I needed new glasses. Thanks for all your help. Hef.” That was Hugh Hefner, of course.

Or speaking about his office:

From Oakland, Calif., to Delft, Holland, all the designers work from home. I have never been to the office. The first time I saw it was when I watched the documentary “Helvetica,” which showed our offices.

fontstruct-screenshot-300.jpg

The strange allure of making your own fonts – Jason Fagone describes FontStruct, a web-based font design tool from FontShop:

FontStruct’s interface couldn’t be more intuitive. The central metaphor is a sheet of paper. You draw letters on the “sheet” using a set of standard paint tools (pencil, line, box, eraser) and a library of what FontStruct calls “bricks” (squares, circles, half-circles, crescents, triangles, stars). If you keep at it and complete an entire alphabet, FontStruct will package your letters into a TrueType file that you can download and plunk into your PC’s font folder. And if you’re feeling generous, you can tell FontStruct to share your font with everybody else on the Internet under a Creative Commons license. Every font has its own comment page, which tends to fill with praise, practical advice, or just general expressions of devotion to FontStruct.

Though I think my favorite bit might be this one:

But the vast majority of FontStruct users aren’t professional designers, just enthusiastic font geeks.

I know that because I’m one of them. FontStruct brings back a ton of memories; in college, I used to run my own free-font site called Alphabet Soup, where I uploaded cheapie fonts I made with a pirated version of a $300 program called Fontographer. Even today, when I self-Google, I mostly come up with links to my old, crappy fonts. (My secret fear is that no matter what I do as a reporter, the Monko family of fonts will remain my most durable legacy.)

The proliferation of bad typefaces: the true cost of software piracy.

Tuesday, June 17, 2008 | grabbag, mapping, refine, software, typography  

Personal genetic testing gets hilarious before it gets real

Before I even had a chance to write about personal genomics companies 23andMe, Navigenics, and deCODEme, Forbes reports that the California Health Department is looking to shut them down:

This week, the state health department sent cease-and-desist letters to 13 such firms, ordering them to immediately stop offering genetic tests to state residents.

Because of advances in genotyping, it’s possible for companies to detect changes from half a million data points (or soon, a million) of a person’s genome. The idea behind genotyping is that you look only for the single letter changes (SNPs) that are more likely to be unique between individuals, and then use that to create a profile of similarities and differences. So companies have sprung up, charging $1000 (ok, $999) a pop to decode these bits of your genome. It can then tell you some basic things about ancestry, or maybe a little about susceptibility for certain kinds of diseases (those that have a fairly simple genetic makeup—of which there aren’t many, to be sure).

Lea Brooks, spokesperson for the California Health Department, confirmed for Wired that:

…the investigation began after “multiple” anonymous complaints were sent to the Health Department. Their researchers began with a single target but the list of possible statute violators grew as one company led to another.

Listen, folks, this is not just one California citizen, but two or more anonymous persons! Perhaps one of them was a doctor or an insurance firm that had been denied its cut of the $1000:

One controversy is that some gene testing Web sites take orders directly from patients without a doctor’s involvement.

Well now, that is a controversy! Genetics has been described as the future of medicine, and yet traditional drainers of wallets (is drainer a word?) in the current health care system have been sadly neglected. The Forbes article also describes the nature of the complaints:

The consumers were unhappy about the accuracy [of the tests] and thought they cost too much.

California residents will surely be pleased that the health department is taking a hard stand on the price of boutique self-testing. As soon as they finish off these scientifimagical “genetic test” goons, we could all use a price break on home pregnancy tests.

video1_6.png

And as to the accuracy of, or what can be ascertained from, such tests? That’s certainly been a concern of the genetics community, and in fact 23andMe has “admitted its tests are not medically useful, as they represent preliminary findings, and so are merely for educational purposes.” Which is perfectly clear to someone visiting their site; however, that presents a bigger problem:

“These businesses are apparently operating without a clinical laboratory license in California. The genetic tests have not been validated for clinical utility and accuracy,” says Nickel.

So an accurate, clinical-level test is illegal. But a less accurate, do-it-yourself (without a doctor) test is also illegal. And yet, California’s complaint gets more bizarre:

“And they are scaring a lot of people to death.”

Who? The people who were just complaining about the cost of the test? That’s certainly a potential problem if you don’t do testing through a doctor—and in fact, it’s a truly significant concern. But who purchases a $999 test from a site with the cartoon characters seen above to check for Huntington’s disease?

And don’t you think if “scaring people” were the problem, wouldn’t the papers and the nightly news be all over it? The only thing they love more than a new scientific technology that’s going to save the world is a new scientific technology to be scared of. Ooga booga! Fearmongering hits the press far more quickly than it does the health department, so this particular line of argument just sounds specious.

The California Health Department does an enormous disservice to the debate of a complicated issue by mixing several lines of reasoning which taken as a whole simply contradict one another. The role of personal genetic testing in our society deserves a debate and consideration; I thought I would be able to post about that part first, but instead the CA government beat me to the dumb stuff.

Thomas Goetz, deputy editor at Wired, has had two such tests (clearly not unhappy with the price), and angrily responds “Attention, California Health Department: My DNA Is My Data.” It’s not just those anonymous Californians who are wound up about genetic testing; he’s writing his sternly worded letter as we speak:

This is my data, not a doctor’s. Please, send in your regulators when a doctor needs to cut me open, or even draw my blood. Regulation should protect me from bodily harm and injury, not from information that’s mine to begin with.

Are angry declarations of ownership of one’s health data a new thing? It’s not as though most people fight this way for the paperwork at their doctor’s office, or even for something as simple as a fingerprint.

It’ll be interesting to see how this shakes out. Or it might not, since it will probably consist of:

  1. A settlement by the various companies to continue doing business.
  2. Some means of doctors and insurance companies getting paid (requiring a visit, at a minimum).
  3. People trying to circumvent #2 (see related topics filed under “H” for Human Growth Hormone).
  4. An entrepreneur figures out how to do it online and in a large-scale fashion (think WebMD), turning out new hordes of “information”-seeking hypochondriacs who fret about their 42% potential alternate likelihood maybe chance of genetic malady. (You have brain cancer too!? OMG!)
  5. If this hits mainstream news, will people hear about the outcome of #1, or will there be an assumption that “personal genetic tests are illegal” from here on out? How skittish will this make investors (the Forbes set) about such companies?

Then again, I’ve already proven myself terrible at predicting the future. But I’ll happily enjoy the foolishness of the present.

Tuesday, June 17, 2008 | genetics, privacy, science  

Iron Woman

Apropos of the recent film graphics post, Jessica Helfand at Design Observer writes about the recently released Iron Man:

Iron Man is the fulfillment of all that computer-integrated movies were ever meant to be, and by computer-integrated, I mean just that: beyond the technical wizardry of special effects, this is a film in which the computer is incorporated, like a cast member, into the development of the plot itself.

I’ve not seen the movie, but the statement appears to be provocative enough to elicit cheers and venom from the scribes in the comments section. (This seems to be common at Design Observer; are designers really this angry and unhappy? How ’bout them antisocial personal attacks! I take back what I wrote in the last post about wanting to be a designer when I grow up. You need some thick skin, or self-fashioned military-grade body armor, over at DO.)

On the other hand, a more helpful post linked to the lovely closing title sequence, designed by Danny Yount of Prologue.

endtitles-500.jpg

I wish they hadn’t used Black Sabbath. Is that really the way it’s done in the film? Paranoid is a great album (even if Iron Man is my least favorite track), but the titles and the music couldn’t have less to do with each other. Enjoy the music or enjoy the video; just don’t do ’em together.

Saturday, June 14, 2008 | motion, movies  

All the water in the world

From a post by Dan Phiffer, an image by Adam Nieman and the Science Photo Library.

All the water in the world (1.4087 billion cubic kilometers of it) including sea water, ice, lakes, rivers, ground water, clouds, etc. Right: All the air in the atmosphere (5140 trillion tonnes of it) gathered into a ball at sea-level density. Shown on the same scale as the Earth.

label-moved-and-resaved.jpg

More information at the original post. (Thanks to Eugene for the link.)

Saturday, June 14, 2008 | infographics, scale  

Rick Astley & Ludacris

Someday I want to write like Ludacris, but for now I’ll enjoy info graphics of his work. Luda not only knows a lot of young ladies, but can proudly recite the range of area codes in which they live. Geographer (and feminist) Stefanie Gray took it upon herself to make a map:

finalareacodes-500px.jpg

You’ll need background music while taking a look; and I found a quick refresher of the lyrics also informative. More discussion and highlights of her findings can be found at Strange Maps, which first published Stefanie’s image.

In related news, someone else has figured out Rick Astley:

composite-500px.jpg

I’ve added the album cover at left so that you can look into his eyes and see his honest face for yourself. If you’re not a proud survivor of the 80s (or perhaps if you are), the single can be had for a mere 99¢. Or if that only gets you started, you can pick up his Greatest Hits. Someone also made another version of the graphic using the Google chart API (mentioned earlier), though it appears less analytically sound (accurate).

More from song charts at this earlier post.

Saturday, June 14, 2008 | infographics, music  

Paola Antonelli on Charlie Rose

This is from May, and the Design and the Elastic Mind show has now finished, but Paola Antonelli’s interview with Charlie Rose is well worth watching.

Paola’s incredibly sharp. Don’t turn it off in the first few minutes, however; I found that it wasn’t until about five or even ten minutes into the show that she began to sound like herself. I guess it takes a while to get past the requisite television pleasantries and the basic design-isms.

The full transcript doesn’t seem to be available freely, however some excerpts:

And I believe that design is one of the highest forms of human creative expression.

I would never dare say that! But I’ll secretly root for her making her case.

And also, I believe that designers, when they’re good, take revolutions in science and in technology, and they transform them into objects that people like us can use.

Doesn’t that make you want to be a designer when you grow up?

Regarding the name of the show, and the notion of elasticity:

…it was about showing how we need to adapt to different conditions every single day. Just work across different time zones, go fast and slow, use different means of communication, look at things at different scales. You know, some of us are perfectly elastic. And instead, some others get a little bit of stretch marks. And some others just cannot deal with it.

And designers help us cope with all these changes.

Her ability to speak plainly and clearly reinforces her point about designers and their role in society. (And if you don’t agree, consider what sort of garbage she could have said, or rather that most would have said, speaking about such a trendy oh-so-futuristic show.)

In the interest of full disclosure, she does mention my work (very briefly), but that’s not until about halfway through, so it shouldn’t interfere with your enjoyment of the rest of the interview.

Thursday, June 12, 2008 | iloveme, speaky  

Spying on teenagers: too much information

Excellent article from the Boston Globe Sunday Magazine on how parents of teenagers are handling their over-connected kids. Cell phones, text messaging, instant messaging, Facebook, MySpace, and to a lesser extent (for this age group) email mean that a lot of information and conversation is shared and exchanged. And as with all new technologies, it can all be tracked and recorded, and more easily spied upon. (More easily meaning that a parent can read a day’s worth of IM logs in a fairly quick sitting—something that couldn’t be done with a day’s worth of telephone conversations.) There are obvious and direct parallels to the U.S. government monitoring its own citizens, but I’ll return to that in a later post.

The article starts with a groan:

One mom does her best surveillance in the laundry room. Her teenage son has the habit of leaving his cellphone in the pocket of his jeans, so in between sorting colors and whites, she’ll grab his phone and furtively scroll through his text messages from the past week to see what he’s said, whom he’s connected with, and where he’s been.

While it’s difficult to say what this parent was specifically hoping to find (or what they’d do with the information), it worsens as it sinks to a level of cattiness:

Sometimes, she’ll use her own phone to call another mom she’s friendly with and share her findings in hushed tones.

Further in, some insight from Sherry Turkle:

MIT professor Sherry Turkle is a leading thinker on the relationship between human beings and technology. She’s also the mother of a teenage girl. So she knows what she’s talking about when she says, “Parents were not built to know the kinds of things that technology makes possible.”

(Emphasis mine.) This doesn’t just go for parents; it’s a much bigger issue of spying on the day-to-day habits and ramblings of someone else. This is the same reason you should never read someone else’s email, whether it belongs to a significant other, a spouse, or a friend. No matter how well you know the sender and recipient, you’re still not them. You don’t think like them. You don’t see the world the way they do. You simply don’t have the proper context, nor the understanding of their relationship with one another. You probably don’t even have the entire thread of that one email conversation. I’ve heard from friends who read an email belonging to their significant other, only to wind up in tears and expecting the worst.

This scenario never ends well: you can either keep it in and remain upset, or you can confront the person. In which case, one of two things will happen. One, your worst fear will be true (“he’s cheating!”) and you’ll be partially implicated in the mess because you’ve spied (“how could you read my email?”), losing whatever moral high ground you might otherwise have had (“I can’t believe you didn’t trust me”). Or two, you’ve blown something out of proportion and destroyed the trust of that person: someone you cared about enough to be concerned to the point of reading their private email.

Returning to the article, one of the scenarios I found notable:

…there’s a natural desire, and a need, for teenagers to have their own parent-free zone as they get older.

As a graduating senior at Cambridge Rindge and Latin, Sam McFarland is grateful his parents trusted him to make the right decisions once he had established himself as worthy of the trust. A few of his friends had parents who were exceedingly vigilant. The result? “You don’t hang out at those kids’ houses as much,” Sam says.

So there’s something fascinating about this—not only is it detrimental to your kid’s development to be overly involved, but it also presents a socialization problem for them, since they become ostracized (even if mildly) on account of your behavior.

And when parents confront?

When one of his friends was 14, the kid’s parents reprimanded him for something he had talked about online. Immediately, he knew they had been spying on him, and it didn’t take long for him to determine they’d been doing it for some time. “He was pretty angry,” Sam says. “He felt kind of invaded.” At first, his friend behaved, conscious that his parents were watching his every move. “But then it reached a tipping point,” Sam says. “He became so fed up about it that, not only didn’t he care if they were watching, but he began acting out, hoping they were watching or listening so he could upset them.”

I’m certain that this would have been my response if my parents had done something like this. (As if teenagers need something to fuel their adversarial attitude toward their parents.) But now you have a situation where a reasonably good kid has made an active decision to behave worse in response to his parents’ mistrust and attempt to rein him in.

The article doesn’t mention what he had done, but how bad could it have been? And that is the crux of the situation: What do these parents really expect to find, and how can that possibly be outweighed by breaking that bond of trust?

It’s also easy to spy, so one (technology savvy) parent profiled goes with what he calls his “fear of God” speech:

Greg warned them, “I can know everything you’re doing online. But I’m not going to invade your privacy unless you give me a reason to.”

By relying on the threat of intervention rather than intervention itself, Greg has been able to avoid the drawbacks that several friends of mine told me they experienced after monitoring their teenagers’ IM and text conversations. These are all great, involved parents who undertook limited monitoring for the right reasons. But they found that, in their hunt for reassurance that their teenager was not engaging in dangerously bad behavior, they were instead worn down by the little disappointments – the occasional use of profanities or mean-spirited name-calling – as well as the mind-numbing banality of so much teen talk.

And that’s exactly it—tying together the points of 1) you’re not in their head and 2) what did you expect to find? As you act out in different ways (particularly as a teenager), you’re trying to figure out how things fit. Nobody’s perfect, and they need some room to be their own age, particularly with their friends. Which made me particularly interested in this quote:

Leysia Palen, the University of Colorado professor, says the work of social theorist Erving Goffman is instructive. Goffman talked about how we all have “front-stage” and “backstage” personas. For example, ballerinas might seem prim and perfect while performing, only to let loose by smoking and swearing as soon as they are behind the curtain. “Everyone needs to be able to retreat to the backstage,” Palen says. “These kids need to learn. Maybe they need to use bad language to realize that they don’t want to use bad language.”

Unfortunately the article also goes astray with its glorification of the multitasking abilities of today’s teenagers:

On an average weeknight, Tim has Facebook and IM sharing screen space on the Mac outside his bedroom as he keeps connected with dozens of friends simultaneously. His Samsung Slider cellphone rests nearby, ready to receive the next text message…Every once in a while, he’ll strum his guitar or look up at the TV to catch some Ninja Warrior on the G4 network. Playing softly in the background is his personal soundtrack that shuffles between the Beatles and a Swedish techno band called Basshunter. Amid all this, he is doing his homework.

Yes, in truly amazing fashion, the human race has somehow evolved in the last ten years to be capable of effectively multitasking between this many different things at once. I don’t understand why people (much less parents) buy this. We have a finite attention span, and technology suggests ways to carve it up into ever-smaller slices. I might balance email, phone calls, writing, and watching a Red Sox game in the background, but there’s no way I’m gonna claim that I’m somehow performing all those things at 100%, or even that as I focus in on one of them, I’m truly 100% at that task. Those will be my teenagers in the sensory deprivation tank while they work on Calculus and U.S. History.

And to close, a more accurate portrayal of multitasking:

It’s not uncommon to see two teenage pals riding in the back of a car, each one texting a friend somewhere else rather than talking to the friend sitting next to them. It’s a throwback to the toddler days, when kids engage in parallel play before they’re capable of sustained interaction.

Thursday, June 12, 2008 | overload, privacy  

You’ve never actually known what the question is

Douglas Adams addresses “What is the Question?”, the mantra of Visualizing Data and of my Ph.D. dissertation, and one that I hope haunts every visualization student I’ve ever taught:

The answer to the Great Question…?
Yes…!
Is…
Yes…!
Is…
Yes…!!!…?
“Forty-two,” said Deep Thought with infinite majesty and calm.
“Forty-two!” yelled Loonquawl, “Is that all you’ve got to show for seven and a half million years of work?”
“I checked it very thoroughly,” said the computer, “and that quite definitely is the answer. I think the problem, to be quite honest with you, is that you’ve never actually known what the question is.”

The Hitchhiker’s Guide to the Galaxy

(Found at the FontForge FAQ)

Monday, June 9, 2008 | question, vida  

Making fun of movie infographics only gets you so far

As much as snickering about computers in movies might make me feel smart, I’ve since become fascinated by how software, and in particular information, is portrayed in film. There are many layers at work:

  1. Film is visual storytelling. As such, you have to be able to see everything that’s happening. Data is not visual, which is why you more often see symbols that represent data: It’s 2012 but they’re still storing data on physical media because at some point, showing the data being moved is important. (Nevermind that it can be transmitted thousands of kilometers in a fraction of a second.) This is less interesting, since it means a sort of dumbing-down of the technology, and presents odd contradictions. It can also make things ugly: progress bars become full-screen interface elements, and how many technology-heavy action flicks have hinged on the pursuit of a computer disk? (On the other hand, the non-visual aspect can be a positive one: a friend finishing film school at NYU once pursued a nanotechnology thriller as his final film because “you can’t see it.” It would allow him to tackle a technical subject without needing the millions of dollars in props.)
  2. Things need to “feel” like a computer. When this piece appeared in the Hulk, they added extra gray interface elements in and around it so that it didn’t look too futuristic. Nevermind that it was a real, working piece of software for browsing the human genome. To the consternation of a friend who worked on Minority Report, on-screen “windows” in the interface all had borders around them. If you have a completely fluid interface with hands, motion, and accessing piles of video being output from three people in a tank, do we really need…title bars?
  3. It’s not just computers—anything remotely complicated is handled in this manner. Science may be worse off than software, though I don’t think scientists complain as loudly as the geeks did when they heard “This is UNIX, I know this!” (My personal favorite in that one was a scene where a video phone discussion was actually an actor talking to a QuickTime movie—you could see the progress bar moving left to right as the scene wore on.)
  4. There’s a lot of superfluous gimmickry that goes on too. There’s just no way you’re gonna show important information in a film without random numbers twitching or counting down. Everything is more important when we know the current time with millisecond accuracy (that’s three digits after the decimal point for seconds). Or maybe some random software code (since that’s incomprehensible but seems significant). This is obvious and sometimes painful to watch, except in the case of a talented visual designer who makes it look compelling.
  5. Finally, the way that computers are represented in film has something to do with how we (society? lay people? them?) think that computers should work.

It’s that last one that is the fascinating point for me: by virtue of the intent to reach a large audience, a movie streamlines the way that information is handled and interfaces behave. At their best, these portrayals suggest where we need to go (at their worst, they blink “Access Denied”). It’s easy to point out the ridiculousness of the room full of people hunched over computers at CIA headquarters and the guy saying “give me all people with last name Jones in the Baltimore area,” and in the next scene that’s tallied against satellite video (which of course can be enhanced ad infinitum). But think about how ridiculous those scenes looked twenty years ago, and the parts of that scenario that are no longer far-fetched as the population at large gets used to Google and having satellite imagery available for the price of typing a query. Even the most outrageous—the imagery enhancement—has had breakthroughs associated with it, some of which can be done by anyone using Photoshop, like the case of people trying to figure out if Bush was wearing a wire at the debates in 2004. (Contradicting their earlier denials, Bush’s people later admitted that he was wearing a bulletproof vest.)

That’s the end of today’s lecture on movie graphics, so I’ll leave you with a link to Mark Coleran, a visual designer who has produced many such sequences for film.

coleran-510.jpg

I recommend the large version of his demo reel, and I’ll be returning to this topic later with more designers. Drop me an email if you have a favorite designer or film sequence.

Monday, June 9, 2008 | infographics, movies  

Somewhere between graffiti and terrorism

boy-noshadow.jpg

Matt Mullenweg, creator of WordPress, speaking at the “Future of Web Apps” conference in February:

Spammers are “the terrorists of Web 2.0,” Mullenweg said. “They come into our communities and take advantage of our openness.” He suggested that people may have moved away from e-mail and toward messaging systems like Facebook messaging and Twitter to get away from spam. But with all those “zombie bites” showing up in his Facebook in-box, he explained, the spammers are pouncing on openness once again.

I don’t think that “terrorists” is the right word—they’re not taking actions with an intent to produce fear that will prevent people from using online communities (much less killing bloggers or kidnapping Facebook users). What I like about this quote is the idea that “they take advantage of openness,” which puts it well. There needs to be a harsher way to describe this situation than “spamming” which suggests a minor annoyance. There’s nothing like spending a Saturday morning cleaning out the Processing discussion board, or losing an afternoon modifying the bug database to keep it safer from these losers. It’s a bit like people who crack machines out of maliciousness or boredom—it’s incredibly time consuming to clean up the mess, and incredibly frustrating when it’s something done in your spare time (like Processing) or to help out the group (during grad school at the ACG).

So it’s somewhere between graffiti and terrorism, but it doesn’t match either because the social impact at either end of that scale is incredibly different (graffiti can be a positive thing, and terrorism is a real world thing where people die).

On a more positive note, and for what it’s worth, I highly recommend WordPress. It’s obvious that it’s been designed and built by people who actually use it, which means that the interface is pleasantly intuitive. And not surprising that it was initially created by such a character.

Monday, June 9, 2008 | online, social  

The cloud over the rainforest brings a thunderstorm

And now, the opposite of the Amazon plot posted yesterday. No sooner had I finished writing about their online aptitude than they had a major site outage, greeting visitors with an HTTP/1.1 Service Unavailable message.

amazon-down-500.jpg

Plot from this article on News.com.

Friday, June 6, 2008 | goinuptotheserverinthesky, notafuturist  

Proper Analysis of Salary vs. Performance?

Got an email from Mebane Faber, who noted the roughly inverse correlation you currently see in salaryper and asked whether I’d done a proper year-end analysis. The response follows:

I threw the project together as sort of a fun thing out of curiosity, and haven’t taken the time to do a proper analysis. However you can see in the previous years that the inverse relationship happens each year at the beginning of the season, and then as it progresses, the big market teams tend to mow down the small guys. Or at least those that are successful–the correlation between salary and performance at the end of a season is generally pretty haphazard. In fact, it’s possible that the inverse correlation at the beginning of the season is actually stronger than the positive correlation at the end.

I think the last point is kinda funny, though I’d imagine there’s a less funny statistics term for that phenomenon. Such a fine line between funny and sounding important.
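For anyone who wants to turn that hunch into an actual number, the check is just a correlation computed at two points in the season. A quick sketch with fabricated figures; real use would swap in each team’s payroll and winning percentage.

# Pearson correlation between payroll and winning percentage, early vs. late season.
# All numbers below are made up purely for illustration.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

payroll   = [209, 138, 119, 80, 44]                 # millions of dollars (fabricated)
april     = [0.400, 0.450, 0.500, 0.550, 0.650]     # winning pct in April (fabricated)
september = [0.560, 0.540, 0.500, 0.530, 0.460]     # winning pct at season's end (fabricated)

print("April:    ", round(pearson(payroll, april), 2))      # negative with these numbers
print("September:", round(pearson(payroll, september), 2))  # positive with these numbers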

Friday, June 6, 2008 | feedbag, salaryper  

Distribution of the foreign customers at a particular youth hostel

Two pieces representing youth hostel data from Julien Bayle. Both adaptations of the code found in Visualizing Data. The first a map:

bayle-worldmap.jpg

The map looks like most maps of data connected to a world map, but the second representation uses a treemap, which is much more effective (meaning that it answers his question much more directly).

bayle-treemap.jpg

The image as background is a nice technique, since if you’re not using colors to differentiate individual sectors, the treemap tends to be dominated by the outlines around the squares (search for treemap images and you’ll see what I mean). The background image lets you use the border lines, but the visual weight of the image prevents them from being in the foreground.

Anyone else with adaptations? Pass them along.

Thursday, June 5, 2008 | adaptation, vida  

I Think Somebody Needs A Hug

I tend to avoid reading online comments since they’re either overly negative or overly positive (neither is healthy), but I laughed out loud after happening across this comment from a post about salaryper on the Freakonomics blog at the New York Times site:

How do I become a “data visualization guru?”
Seems like a pretty sweet gig. But you probably need a degree in Useless Plots from Superficial Analysis School.

– Ben D.

No my friend, it takes a Ph.D. in Useless Plots from Superficial Analysis School. (And if you know this guy, please take him out for a drink — I’m concerned he’s been indoors too long.)

Thursday, June 5, 2008 | reviews, salaryper  

Obama Limited to 16 Bits

I guess I never thought I’d read about the 16-bit limitations of Microsoft Excel in the mainstream press (or at least outside the geek press), but here it is:

Obama’s January fundraising report, detailing the $23 million he raised and $41 million he spent in the last three months of 2007, far exceeded 65,536 rows listing contributions, refunds, expenditures, debts, reimbursements and other details.

Excel has long been limited to 65,536 rows, the maximum count you get when the row number is stored in two bytes. Mr. Millionsfromsmallcontributions has apparently flown past this limit in his FEC reports, forcing poor reporters to either use Microsoft Access (a database program) or pray for the newer Excel 2007, where the row restriction has finally been lifted.

In the past the argument against fixing the restriction had always been a mixture of “it’s too messy to upgrade something like that” and “you shouldn’t have that many rows of data in a spreadsheet anyway, you should use a database.” Personally I disagree with the latter; and as silly as the former sounds, it’s been the case for a good 20 years (or was the row limit even lower back then?).

The OpenOffice project, for instance, has an entire page dedicated to fixing the issue in OpenOffice Calc, where they’re limited to 30,000 rows—the limit being tied to 32,768, or the number you get with 15 bits instead of 16 (use the sixteenth bit as the sign bit indicating positive or negative, and you can represent numbers from -32768 to 32767 instead of unsigned 16 bit values that range from 0 to 65535).
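If the bit arithmetic sounds abstract, it amounts to a few lines (a Python sketch, just to show where the numbers come from):

# Where the spreadsheet row limits come from.
print(2 ** 16)                   # 65536 values: unsigned 16-bit rows 0..65535
print(2 ** 15)                   # 32768: what's left after spending a bit on the sign
print(-(2 ** 15), 2 ** 15 - 1)   # signed 16-bit range: -32768 .. 32767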

Bottoms up for the first post tagged both “parse” and “politics”.

Thursday, June 5, 2008 | parse, politics  

What’s that big cloud over the rainforest?

As the .com shakeout loomed in the late 90s, I always assumed that:

  1. Most internet-born companies would disappear.
  2. Traditional (brick & mortar) stores would eventually get their act together and have (or outsource) a proper online presence. For instance Barnes & Noble hobbling toward a usable site, and Borders just giving up and turning over their online presence to Amazon. The former comical, the latter brilliant, though Borders has just returned with their own non-Amazonian presence. (Though I think the humor is now gone from watching old-school companies trying to move online.)
  3. Finally, a few new names—namely the biggest ones, like Amazon—would be left that didn’t disappear with the others from point #1.

Basically, that not much would change. A couple new brands would emerge, but that there wasn’t really room in people’s heads for that many new retailers or services. (It probably didn’t help that all their logos were blue and orange, and had names like Flooz, Boo and Kibu that feel natural on the tongue and inspire buyer loyalty and confidence.)

aws_bandwidth.gif

But not only did more companies stick around, some seem to be successfully pivoting into other areas. From Amazon:

In January of 2008 we announced that the Amazon Web Services now consume more bandwidth than do the entire global network of Amazon.com retail sites.

This comes from a blog post that includes the plot above, showing bandwidth use for both sides of the business.

Did you imagine that the site where you could buy books cheaper than anywhere else in 1998 would, ten years later, exceed that bandwidth with services for data storage and cloud computing? Of course, this announcement doesn’t say anything about their profits at this point, but then I don’t think anyone expected Steve Jobs to turn Apple into a toy factory, start turning out music players and cell phones, and have them become half the business within just a few years. (That’s half as in, “beastly silver PCs and shiny black and white laptops seem important and all, but those take real work…why bother?”)

But the point (aside from subjecting you to a long-winded description of .com history and my shortcomings as a futurist) has more to do with Amazon becoming a business that deals purely in information. The information economy is all about people moving bits and ideas around (abstractions of things), instead of silk, furs, and spices (actual physical things). And while books are information, the growth of Amazon’s data services business—as evidenced by that graph—is one of the strongest indicators I’ve seen of just how real the non-real information economy has become. Not that the information economy is something new, but the groundwork has been laid over the preceding decades so that something like Amazon Web Services can be successful.

And since we’re on the subject of Amazon, I’ll close with more from Jeff Bezos from “How the Web Was Won” in this month’s Vanity Fair:

When we launched, we launched with over a million titles. There were countless snags. One of my friends figured out that you could order a negative quantity of books. And we would credit your credit card and then, I guess, wait for you to deliver the books to us. We fixed that one very quickly.

Or showing his genius early on:

When we started out, we were packing on our hands and knees on these cement floors. One of the software engineers that I was packing next to was saying, You know, this is really killing my knees and my back. And I said to this person, I just had a great idea. We should get kneepads. And he looked at me like I was from Mars. And he said, Jeff, we should get packing tables.

Thanks to Eugene for passing along the links.

Thursday, June 5, 2008 | goinuptotheserverinthesky, infographics, notaneconomist  

Movies, Mapping, and Motion Graphics

Elegantly done, and some of the driest humor in film titles you might ever see, the opening sequence from Death at a Funeral.

Excellent (and appropriate) music, color, and type; does a great job of setting up the film. IMDB description:

Chaos ensues when a man tries to expose a dark secret regarding a recently deceased patriarch of a dysfunctional British family

Or the tagline:

From director Frank Oz comes the story of a family that puts the F U in funeral.

Tuesday, June 3, 2008 | mapping, motion, movies  

Mark in Madrid

Mark Hansen is one of the nicest and most intelligent people you’ll ever meet. He was one of the speakers at the symposium at last Fall’s Visualizar workshop in Madrid, and Medialab Prado has now put the video of Mark’s talk (and others) online. Check it out:

Mark has a Ph.D. in Statistics and, along with his UCLA courses like Statistical Computing and Advanced Regression, has taught one called Database Aesthetics, which he describes a bit in his talk. You might also be familiar with his piece Listening Post, which he created with Ben Rubin.

Tuesday, June 3, 2008 | speaky  

Goodbye 15 minutes: 1.5 seconds is the new real time

As cited on Slashdot, Google has announced that they’ll be providing real-time stock quotes from NASDAQ. As referred to in the title, this “real time” isn’t likely the same “real time” that financial institutions get for their “quotes,” since the data still has to be processed and served up to you somehow. But for an old internet codger who thought that quotes delayed by 15 minutes were pretty nifty back in 1995, this is just one more sign of the information apocalypse.

wastler_a_100x100b.jpg

The Wall Street Journal is also in on the gig, and Allen Wastler from CNBC crows that they’re also a player. Interestingly, the data will be free from the WSJ at their Markets Data Center page—one more sign of a Journal that’s continuing to open up its grand Oak doors to give us plebes a peek inside their exclusive club.

An earlier post from the Google blog has some interesting details:

As a result, we’ve worked with the SEC, the New York Stock Exchange (NYSE) and our D.C. trade association, NetCoalition, to find a way to bring stock data to Google users in a way that benefits users and is practical for all parties. We have encouraged the SEC to ensure that this data can be made available to our users at fair and reasonable rates, and applaud their recent efforts to review this issue. Today, the NYSE has moved the issue a great step forward with a proposal to the SEC which if approved, would allow you to see real-time, last-sale prices…

The NYSE hasn’t come around yet, but the move by NASDAQ should give them the additional competitive push to make it happen soon enough. As it appears, this had more to do with getting SEC approval than the exchanges themselves. Which, if you think about it, makes sense—and if you think about it more, makes one wonder what sort of market-crashing scenario might be opened by millions having access to the live data. Time to write that movie script.

At right: CNBC’s publicity photo of Allen Wastler, which appears to have been shot in the 1930s and later hand-colorized. Upon seeing this, Wastler was then heard to say to the photo and paste-up people, “That’s amazing, can you also give me a stogie?” Who doesn’t want that coveted fat cat, robber baron blogger look.

Tuesday, June 3, 2008 | acquire  

Melting Ants for Science (or, Solenopsis invicta as Dross)

Another visualization from the see-through fish category, a segment from Sunday Morning about Dr. Walter Tschinkel who studies the structure of ant colonies using aluminum casts. Three easy steps: Heat aluminum to 1200 degrees, pour it down an ant hole, and dig away carefully to reveal the intricate structure of the interior:

What amazing structures! Whenever you think you’ve made something that looks “good,” you can count on nature to dole out humility. Maybe killing the ants in the process is a little way to get the control back. Um, or something.

(Pardon the crappy video quality and annoying ad… Tried to tape the real version from my cable box, but @#$%*! Comcast has CBS marked as a 5c protected “premium” channel. Riiiight.)

Thursday, May 29, 2008 | physical, science  

Summerschool in Wiesbaden

sv-summerschool.jpg

Scholz & Volkmer is running a Summerschool program this July and is looking for eight students from USA and Europe. (Since “summer school” is one word, you may have already guessed that it’s based in Germany.) This is the group behind the SEE Conference that I spoke at in April. (Great conference, and the lectures are online, check ’em out.)

The program is run by their Technical Director (Peter), who is a great guy. They’re looking for topics like data visualization, mobile applications, interaction concepts, etc., and are covering flights and accommodations plus a small stipend during your four-week stay. Should be a great time.

Tuesday, May 27, 2008 | opportunities  

Schneier, Terrorists and Accuracy

Some thoughtful comments passed along by Alex Hutton regarding the last post:

Part of the problem with point technology solutions is in the policies of implementation.  IMHO, we undervalue the subject matter expert, or operate as a denigrated bureaucracy which does not allow the subject matter expert the flexibility to make decisions.  When that happens, the decision is left to technology (and as you point out, no technology is a perfect decision maker).

I thought it was apropos that you brought in the Schneier example.  I’ve been very much involved in a parallel thought process in the same industry as he, and we (my partner and I) are coming to a solution that attempts to balance technology, point human decision, and the bureaucracy within which they operate.

If you believe the Bayesians, then the right Bayesian network mimics the way the brain processes qualitative information to create a belief (or in the terms of Bayesians, a probability statement used to make a decision).  As such, the current way we use the technology (that policy of implementation, above) is faulty because it minimizes that “Human Computational Engine” for a relatively unsophisticated, unthinking technology.  That’s not to say that technologies like facial recognition are worthless – computational engines, even less magic ones that aren’t 99.99% accurate, are valid pieces of prior information (data).

Now in the same way, Human Computational Engines are also less than perfectly accurate.  In fact, they are not at all guaranteed to work the same way twice – even by the same person unless that person is using framework to provide rigor, rationality, and consistency in analysis.

So ideally, in physical security (or information security where Schneier and I come from) the imperfect computer detection engine is combined with a good Bayesian network and well trained/educated/experienced subject matter experts to create a more accurate probability statement around terrorist/non-terrorist – one that at least is better at identifying cases where more information is needed before a person is prevented from flying, searched and detained.  While this method, too, would not be 100% infallible (no solution will ever be), it would create a more accurate means of detection by utilizing the best of the human computational engine.

I believe the Bayesians, just 99.99% of the time.

Thursday, May 15, 2008 | bayesian, feedbag, mine, security  

Human Computation (or “Mechanical Turk” meets “Family Feud”)

richard_dawson.jpg

Computers are really good at repetitive work. You can ask a computer to multiply two numbers together seven billion times and not only will it not complain, it’ll probably have seven billion answers for you a few seconds later. Ask a person to do the same thing and they’ll either walk away at the outset, realizing the ridiculousness of the task, or they’ll get through the first few tries and lose interest. But even the fact that a human can recognize the ridiculousness of the task is important. Humans are good at lots of things—like identifying a face in a crowd—that cannot be addressed by computation with the same level of accuracy.

Visualization is about the interface between what humans are good at, and what computers are good at. First, the computer can crunch all seven billion numbers, then present the results in a way that we can use our own perceptual skills to identify what’s important or interesting. (This is also why the design of a visualization is a fundamentally human task, and not something to be left to automation.)

This is also the subject of Luis von Ahn’s work at Carnegie Mellon. You’re probably familiar with CAPTCHA images—usually wavy numbers and letters that you have to discern when signing up for a webmail account or buying tickets from Ticketmaster. The acronym stands for “Completely Automated Public Turing Test to Tell Computers and Humans Apart,” a clever mouthful referring to Alan Turing’s work in discerning man or machine. (I encourage you to read about them, but this is already getting long so I won’t get into it here.)

More interesting than CAPTCHA, however, is the whole notion that’s behind it: that it’s an example of relying on humans to do what they’re best at, though it’s a task that’s difficult for computers. (Sure, in recent weeks, people have actually found ways to “break” CAPTCHAs in specific cases, but that’s not important here.) For instance, the work was extended to the Google Image Labeler, described as follows:

You’ll be randomly paired with a partner who’s online and using the feature. Over a two-minute period, you and your partner will:

  • View the same set of images.
  • Provide as many labels as possible to describe each image you see.
  • Receive points when your label matches your partner’s label. The number of points will depend on how specific your label is.
  • See more images until time runs out.

Prior to this, most image labeling systems had to do with getting volunteers to name or tag images individually. As you can imagine, the quality of tags suffers considerably because of everything from differences in how people perceive or describe what they see, to individuals who try to be a little too clever in choosing tags. With the Image Labeler game, that’s turned around backwards: there is now a motivation to use tags that match the other person’s, which minimizes the previous problems. (It’s “Mechanical Turk” meets “Family Feud”.) They’ve also applied the same ideas to scanning books—where fragments of text that cannot be recognized by software are instead checked by multiple people.

More recently, von Ahn’s group has expanded these ideas in Games With A Purpose, a site that addresses these “casual games” more directly. The new site is covered in this New Scientist article, which offers additional tidbits (perspective? background? couldn’t think of the right word).

You can also watch Luis’ Google Tech Talk about Human Computation, which if I’m not mistaken, led to the Image Labeler project.

(We met Luis a couple times while at CMU and watched the Superbowl with his awesome fiancée Laura, cheering on her hometown Chicago Bears against those villainous Colts. We were happy when he received a MacArthur Fellowship for his work—just the sort of person you’d like to get such an award that highlights people who often don’t quite fit in their field.)

Mommy can we play infringing on my civil liberties?

Returning to the earlier argument, algorithms to identify a face in a crowd are certainly improving. But without a significant breakthrough, their usefulness will remain limited. One commonly hyped use for such systems is airport security. Bruce Schneier explains the problem:

Suppose this magically effective face-recognition software is 99.99 percent accurate. That is, if someone is a terrorist, there is a 99.99 percent chance that the software indicates “terrorist,” and if someone is not a terrorist, there is a 99.99 percent chance that the software indicates “non-terrorist.” Assume that one in ten million flyers, on average, is a terrorist. Is the software any good?

No. The software will generate 1000 false alarms for every one real terrorist. And every false alarm still means that all the security people go through all of their security procedures. Because the population of non-terrorists is so much larger than the number of terrorists, the test is useless. This result is counterintuitive and surprising, but it is correct. The false alarms in this kind of system render it mostly useless. It’s “The Boy Who Cried Wolf” increased 1000-fold.

Given the number of travelers at Boston Logan in 2006, that would be two “terrorists” identified per day. (And with Schneier’s figure of one in ten million flyers being a terrorist, that would be two or three actual terrorists per year…clearly too generous, which makes the false alarm problem even worse than he describes.) I find myself thinking about the 99.99% accuracy number as I stare at the back of heads lined up at the airport security checkpoint—itself a human problem, not a computational problem.
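Schneier’s arithmetic is worth running yourself, because the result really is counterintuitive. A quick sketch using only the numbers from his example:

# Base-rate arithmetic behind the hypothetical 99.99%-accurate face scanner.
flyers = 10_000_000            # screen ten million people
terrorists = 1                 # Schneier's assumption: one of them is a terrorist
accuracy = 0.9999              # hit rate and correct-rejection rate

innocents = flyers - terrorists
true_alarms = terrorists * accuracy          # about 1
false_alarms = innocents * (1 - accuracy)    # about 1000

print("false alarms per real terrorist:", round(false_alarms / true_alarms))
print("chance that an alarm is real:", round(true_alarms / (true_alarms + false_alarms), 4))

Roughly a thousand false alarms for every real hit, which is exactly his point.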

Thursday, May 15, 2008 | cs, games, human, perception, security  

Gender and Information Graphics

Just received this in a message from a journalism grad student studying information graphics:

I have looked at 2 years worth of Glamour (and Harper’s Bazaar too) magazines for my project and it shows that Glamour and other women’s magazines have less amount of information graphics in the magazines compared to men’s magazines, such as GQ and Esquire. Why do you think that is? Do you think that is gender-related at all?

I hadn’t really thought about it much. For the record, my reply:

My fiancée (who knows a lot more about being female than I do) pointed out that such magazines have much less practical content in general, so it may have more to do with that than a specific gender thing. Though she also pointed out that, for instance, in today’s news about the earthquake in China, she felt that women might be more inclined to read a story with the faces of those affected than one with information graphics tallying or describing the same.

I think you’d need to find something closer to a male equivalent of Glamour so that you can cover your question and remove the significant bias you’re getting for the content. Though, uh, a male equivalent of Glamour may not really exist… But perhaps there are better options.

And as I was writing this, she responded:

Finding a male equivalent of Glamour is hard but they actually do have some hard-hitting stories near the back in every issue that sometimes might be overshadowed by all the fashion and beauty stuff. Actually, finding a female equivalent of GQ or Esquire is also hard because they sort of have a niche of their own too. I have to agree with your fiancée too, because, I studied Oprah’s magazines a little in my previous study and sometimes it is really about what appeals to their audience.

Well, my study does not imply causality and it sometimes might be hard to differentiate if the result was due to gender differences or content. So, it’s interesting to find all these out, and actually men’s magazines have about 5 times more information graphics than women’s magazines which is amazing.

Wow—five times more. (At least amongst the magazines that she mentioned.)

My hope in posting this (rather than just sharing the contents of my inbox…can you tell that I’m answering mail today?) is that someone else out there knows more about the subject. Please drop me a line if you do; I’d like to know more and to post a follow-up.

Monday, May 12, 2008 | gender, inbox, infographics  

Glagolitic Capital Letter Spidery Ha

spidery-170x205.png

A great Unicode in 5 Minutes presentation from Mark Lentczner at Linden Lab. He passed it along after reading this dense post, clearly concerned about the welfare of my readers.

(Searching out the image for the title of this post also led me to a collection of Favourite Unicode Codepoints. This seems ripe for someone to waste more time really tracking down such things and documenting them.)

Mark’s also behind Context Free, one of the “related initiatives” that we have listed on Processing.org.

Context Free is a program that generates images from written instructions called a grammar. The program follows the instructions in a few seconds to create images that can contain millions of shapes.

Grammars are covered briefly in the Parse chapter of vida, with the name of the language coming from a specific variety called Context Free Grammars. The magical (and manic) part of grammars is that their rules tend to be recursive and layered, which leads to a certain kind of insanity as you try to tease out how the rules work. With Context Free, Mark has instead turned this dizziness into the basis for creating visual form.
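To see what “recursive and layered” means in practice, here’s a toy grammar expanded at random: a Python sketch rather than Context Free’s own shape-based syntax, but the mechanics are the same. Each rule either bottoms out in something concrete or invokes other rules, including itself.

import random

# A toy context-free grammar: each symbol expands to one of its alternatives,
# and alternatives may refer back to other symbols (or to the same one).
grammar = {
    "SENTENCE": [["NOUN", "VERB", "NOUN"]],
    "NOUN":     [["data"], ["the", "ADJ", "chart"], ["the", "ADJ", "NOUN"]],
    "ADJ":      [["tiny"], ["recursive"], ["manic"]],
    "VERB":     [["explains"], ["obscures"]],
}

def expand(symbol):
    if symbol not in grammar:                # a terminal word, nothing to expand
        return [symbol]
    words = []
    for part in random.choice(grammar[symbol]):
        words.extend(expand(part))           # recurse into each piece of the rule
    return words

print(" ".join(expand("SENTENCE")))

Swap the words for squares and circles with position and scale, and you have the gist of what Context Free does.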

Updated 14 May 08 to fix the glyph. Thanks to Paul Oppenheim, Spidery Ha Devotee, for the correction.

Monday, May 12, 2008 | feedbag, languages, parse, unicode  

So much for “wonderfully simple”

In contrast to the clarity and simplicity of the New York Times info graphic mentioned yesterday, the example currently on their home page is an example of the opposite:

This is helpful because it clarifies the point I tried to make about what was nice about the other graphic. Because of space limitations, this graphic is small, and the information is stored across multiple panels. So at the top there are a pair of tabs. Then within the tabs we have a pair of buttons. Two tabs, four buttons, just to get through four possible pieces of data. That’s the sort of combinatoric magic we see in Microsoft Windows preference panels:

snap1.gif

While the organization in the info graphic makes conceptual sense—first you must choose one of two states, then choose one of the candidates—it makes little cognitive sense. We’re choosing between one of four options. Just give them to us! For a pair of items beneath another pair of items, there’s no need to establish a sense of hierarchy. If there were a half dozen states, and a half dozen candidates, then that might make sense. Just because the data is technically hierarchic, or arranged in a tree, that doesn’t mean that it’s the best representation for it.

The solution? Just give us the four options. No sliding panels, trap doors, etc. Better yet, superimpose the Clinton and Obama data on a single map as different colors, and have a pair of buttons (not tabs!) that let the viewer quickly swap between Indiana and North Carolina.

(This only covers the interaction model, without getting into the way the data itself is presented, colors chosen, laid out, etc. The lack of population density information in the image makes the maps themselves nearly worthless.)

Tuesday, May 6, 2008 | infographics, interact, politics  

Average Distance to the Nearest Road in the Conterminous United States

Got an email over the weekend from Tom Vanderbilt, who had seen the All Streets piece, and was kind enough to point me to this map (PDF) from the USGS that depicts the average distance to the nearest road across the continental 48 states. (He’s currently working on a book titled Traffic: Why We Drive the Way We Do (and What It Says About Us) to be released this fall).

Too bad I only just learned the word conterminous; had I used it in the original project description, we would have missed (or been spared) the Metafilter discussion of whether “lower 48” was accurate terminology.

roadproximity2.jpg

A really interesting map, which of course also shows the difference between something thrown together in a few hours and actual research. In digging around for the map’s source, I found that exactly a year ago, they also published a paper in Science describing their broader work:

Roads encroaching into undeveloped areas generally degrade ecological and watershed conditions and simultaneously provide access to natural resources, land parcels for development, and recreation. A metric of roadless space is needed for monitoring the balance between these ecological costs and societal benefits. We introduce a metric, roadless volume (RV), which is derived from the calculated distance to the nearest road. RV is useful and integrable over scales ranging from local to national. The 2.1 million cubic kilometers of RV in the conterminous United States are distributed with extreme inhomogeneity among its counties.
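As I read the metric (my interpretation, not their code), it’s literally an integral: distance to the nearest road, in kilometers, summed over the land area in square kilometers, which is how you end up with cubic kilometers. A sketch with a fabricated grid of distances:

# Roadless volume, as I understand it: integrate distance to the nearest road
# (km) over land area (km^2), giving a volume in km^3. The grid is made up.
cell_area = 1.0    # each cell covers 1 km x 1 km

distance_km = [    # distance from each cell to its nearest road
    [0.1, 0.2, 0.4, 0.2],
    [0.3, 0.9, 1.6, 0.5],
    [0.2, 1.1, 2.3, 0.8],
    [0.1, 0.4, 0.7, 0.3],
]

roadless_volume = sum(d * cell_area for row in distance_km for d in row)
print(round(roadless_volume, 2), "cubic kilometers")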

The publication even includes a response and a response to the response—high scientific drama! Apparently some lads feel that “roadless volume does not explicitly address ecological processes.” So let that be a warning to all you non-explicit addressers.

For those lucky enough to have access to the journal online, the supplementary information includes a time-lapse video of a section of Colorado and its roadless volume since 1937. As with all things, it’s much more interesting to see how this changes over time. A map of all streets in the lower 48 isn’t nearly as engaging as a sequence of the same area over several years. The latter story is simply far more compelling.

Tuesday, May 6, 2008 | allstreets, feedbag, mapping  

Unicode, character encodings, and the declining dominance of Western European character sets

Computers know nothing but numbers. As humans we have varying levels of skill in using numbers, but most of the time we’re communicating with words and phrases. So in the early days of computing, the earliest software developers had to find a way to map each character—a letter Q, the character #, or maybe a lowercase b—into a number. A table of characters would be made, usually either 128 or 256 of them, depending on whether data was stored or transmitted using 7 or 8 bits. Often the data would be stored as 7 bits, so that the eighth bit could be used as a parity bit, a simple method of error detection (because data transmission—we’re talking modems and serial ports here—was so error prone).
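For what it’s worth, the parity trick is tiny: count the 1 bits in the 7-bit character and use the eighth bit to make the total come out even (or odd, depending on the convention). A sketch of the even-parity flavor:

# Even parity: set the eighth bit so the total number of 1 bits is even.
def with_parity(ch):
    value = ord(ch) & 0x7F          # the 7-bit ASCII value
    ones = bin(value).count("1")
    return ((ones % 2) << 7) | value

for ch in "Q#b":
    print(ch, format(with_parity(ch), "08b"))

If a byte arrives with an odd number of 1 bits, the receiver knows something got flipped along the way; it just can’t tell which bit.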

Early on, such encoding systems were designed in isolation, which meant that they were rarely compatible with one another. The number 34 in one character set might be assigned to “b”, while in another character set, assigned to “%”. You can imagine how that works out over an entire message, but the hilarity was lost on people trying to get their work done.

In the 1960s, the American National Standards Institute (or ANSI) came along and set up a proper standard, called ASCII, that could be shared amongst computers. It was 7 bits (to allow for the parity bit) and looked like:

  0 nul    1 soh    2 stx    3 etx    4 eot    5 enq    6 ack    7 bel
  8 bs     9 ht    10 nl    11 vt    12 np    13 cr    14 so    15 si
 16 dle   17 dc1   18 dc2   19 dc3   20 dc4   21 nak   22 syn   23 etb
 24 can   25 em    26 sub   27 esc   28 fs    29 gs    30 rs    31 us
 32 sp    33  !    34  "    35  #    36  $    37  %    38  &    39  '
 40  (    41  )    42  *    43  +    44  ,    45  -    46  .    47  /
 48  0    49  1    50  2    51  3    52  4    53  5    54  6    55  7
 56  8    57  9    58  :    59  ;    60  <    61  =    62  >    63  ?
 64  @    65  A    66  B    67  C    68  D    69  E    70  F    71  G
 72  H    73  I    74  J    75  K    76  L    77  M    78  N    79  O
 80  P    81  Q    82  R    83  S    84  T    85  U    86  V    87  W
 88  X    89  Y    90  Z    91  [    92  \    93  ]    94  ^    95  _
 96  `    97  a    98  b    99  c   100  d   101  e   102  f   103  g
104  h   105  i   106  j   107  k   108  l   109  m   110  n   111  o
112  p   113  q   114  r   115  s   116  t   117  u   118  v   119  w
120  x   121  y   122  z   123  {   124  |   125  }   126  ~   127 del

The lower numbers are various control codes, and the characters 32 (space) through 126 are actual printed characters. An eagle-eyed or non-Western reader will note that there are no umlauts, cedillas, or Kanji characters in that set. (You’ll note that this is the American National Standards Institute, after all. And to be fair, those were things well outside their charge.) So while the immediate character encoding problem of the 1960s was solved for Westerners, other languages would still have their own encoding systems.

As time rolled on, the parity bit became less of an issue, and people were antsy to add more characters. Getting rid of the parity bit meant 8 bits instead of 7, which would double the number of available characters. Other encoding systems like ISO-8859-1 (also called Latin-1) were developed. These had better coverage for Western European languages, by adding some umlauts we’d all been missing. The encodings kept the first 0–127 characters identical to ASCII, but defined characters numbered 128–255.

However this still remained a problem, even for Western languages, because if you were on a Windows machine, there was a different definition for characters 128–255 than there was on the Mac. Windows used what was called Windows 1252, which was just close enough to Latin-1 (embraced and extended, let’s say) to confuse everyone and make a mess. And because they like to think different, Apple used their own standard, called Mac Roman, which had yet another colorful ordering for characters 128–255.

This is why there are lots of web pages that will have squiggly marks or odd characters where em dashes or quotes should be found. If authors of web pages include a tag in the HTML that defines the character set (saying essentially “I saved this on a Western Mac!” or “I made this on a Norwegian Windows machine!”) then this problem is avoided, because it gives the browser a hint at what to expect in those characters with numbers from 128–255.
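If you want to reproduce those squiggly marks at home, Python happens to ship with all three of these encodings (the codec names below are Python’s, not anything official). The same handful of bytes, read three different ways:

# A sentence with curly quotes, saved as Windows-1252 bytes...
data = "That\u2019s a \u201cwonderful\u201d idea".encode("cp1252")

# ...then read back under three different assumptions about characters 128-255.
for encoding in ("cp1252", "mac_roman", "latin-1"):
    print(encoding.ljust(10), data.decode(encoding))

Only the first line comes back the way it was written; the other two come back mangled.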

Those of you who haven’t fallen asleep yet may realize that even 200ish characters still won’t do—remember our Kanji friends? Such languages usually encode with two bytes (16 bits to the West’s measly 8), providing access to 65,536 characters. Of course, this creates even more issues because software must be designed to no longer think of characters as a single byte.

In the very early 90s, the industry heavies got together to form the Unicode consortium to sort out all this encoding mess once and for all. They describe their charge as:

Unicode provides a unique number for every character,
no matter what the platform,
no matter what the program,
no matter what the language.

They’ve produced a series of specifications, both for a wider character set (up to 4! bytes) and various methods for encoding these character sets. It’s truly amazing work. It means we can do things like have a font (such as the aptly named Arial Unicode) that defines tens of thousands of character shapes. The first of these (if I recall correctly) was Bitstream Cyberbit, which was about the coolest thing a font geek could get their hands on in 1998.

The most basic version of Unicode defines characters 0–65535, with the first 0–255 characters defined as identical to Latin-1 (for some modicum of compatibility with older systems).

One of the great things about the Unicode spec is the UTF-8 encoding. The idea behind UTF-8 is that the majority of characters will be in that standard ASCII set. So if the eighth bit of a byte is a zero, then the other seven bits are just plain ASCII. If the eighth bit is 1, then it’s some sort of extended format, and the remaining bits of that first byte determine how many additional bytes (one to three, depending on the character) are required to encode the value for that character. It’s a very clever scheme because it degrades nicely, and provides a great deal of backward compatibility with the large number of systems still requiring only ASCII.
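
For the curious, the byte-counting half of the scheme fits in a few lines of Java. This is just an illustration of the leading-bit trick, not a real decoder, since it doesn’t validate the continuation bytes that follow:

public class Utf8Length {
  // how many bytes a UTF-8 sequence occupies, judging by its first byte
  static int sequenceLength(int b) {
    if ((b & 0x80) == 0)    return 1;  // 0xxxxxxx: plain 7-bit ASCII
    if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: one more byte follows
    if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: two more bytes follow
    if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: three more bytes follow
    return -1;                         // 10xxxxxx is a continuation byte, not a start
  }

  public static void main(String[] args) {
    System.out.println(sequenceLength('a'));   // 1
    System.out.println(sequenceLength(0xC3));  // 2, the start of characters like ü
  }
}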

Of course, assuming that ASCII characters will be the most predominant is, to some, repeating the same bias as back in the 1960s. But I think this is an academic complaint, and the benefits of the encoding far outweigh the negatives.

Anyhow, the purpose of this post was to write that Google reported yesterday that Unicode adoption on the web has passed ASCII and Western European. This doesn’t mean that English language characters have been passed up, but rather that the number of pages encoded using Unicode (usually in UTF-8 format) has finally left behind the archaic ASCII and Western European formats. The upshot is that it’s a sign of us leaving the dark ages—almost 20 years since the internet was made publicly available, and since the start of the Unicode consortium, we’re finally starting to take this stuff seriously.

The Processing book also has a bit of background on ASCII and Unicode in an Appendix, which includes more about character sets and how to work with them. And future editions of vida will also cover such matters in the Parse chapter.

Tuesday, May 6, 2008 | parse, unicode, updates, vida  

Another delegate calculator

Wonderfully simple delegate calculator from the New York Times. Addresses a far simpler question than the previously mentioned Slate calculator, but bless the NYT for realizing that something that complicated was no longer necessary.

delegate-5101.jpg

Good example of throwing out extraneous information to tell a story more directly: a quick left and right drag provides a more accurate depiction than the horse race currently in the headlines.

Monday, May 5, 2008 | election, politics, scenarios  

Doin’ stats for the C’s

A New York Times piece by the Freakonomics guys about Mike Zarren, the 32-year-old numbers guy for the Boston Celtics. While statistics has become more-or-less mainstream for baseball, the same isn’t quite true for basketball or football (though that’s changing too). They have better words for it than me:

This probably makes good sense for a sport like baseball, which is full of discrete events that are easily measured… Basketball, meanwhile, might seem too hectic and woolly for such rigorous dissection. It is far more collaborative than baseball and happens much faster, with players shifting from offense one moment to defense the next. (Hockey and football present their own challenges.)

But that’s not to say that nothing can be gained by looking at the numbers:

What’s the most efficient shot to take besides a layup? Easy, says Zarren: a three-pointer from the corner. What’s one of the most misused, misinterpreted statistics? “Turnovers are way more expensive than people think,” Zarren says. That’s because most teams focus on the points a defense scores from the turnover but don’t correctly value the offense’s opportunity cost — that is, the points it might have scored had the turnover not occurred.

Of course, the interesting thing about sports is that at their most basic, they cannot be defined by statistics or numbers. Take the Celtics, who just won the first round of the playoffs. Given their ability, the Celtics should have dispensed with the Hawks more quickly, rather than needing all seven games of the series to win the necessary four. The coach in the locker room of any Hoosiers ripoff will tell you it doesn’t matter what’s on the stat sheets, it matters who shows up that day. It’s the same reason that owners cannot buy a trophy even in a sport that has no salary cap. Or, if you’re like some of my in-laws-to-be (all Massachusetts natives), you might suspect that the fix is in (“How much money do those guys make per game?”). Regardless, it’s the human side of the sport, not the numbers, that makes it worth watching. (And I don’t mean the soft-focus ESPN “Outside the Lines” version of the “human” side of the sport. Yech.)

In the meantime, maybe the Patriots or the Sox are hiring…

(Passed along by Andy Oram, my editor for vida)

Monday, May 5, 2008 | sports  

Flash file formats opened?

Via Slashdot, word that Adobe is opening the SWF and FLV file formats through the Open Screen Project. On first read this seemed great—Adobe essentially re-opening the SWF spec. It was released under a less onerous license by Macromedia ca. 1998, but then closed back up again once it became clear that the competing vector-graphics-for-the-web proposals from Microsoft and others would not become actual competitors. At the time, Microsoft had submitted an XML-based format called VML to the W3C, and the predecessor to SVG (called PGML) had also been proposed by then-rival Adobe and friends.

On second read it looks like they’re trying to kill Android before it has a chance to get rolling. So history rhymes ten years later. (Shannon informs me that this may qualify as a pantoum).

But to their credit (I’m shocked, actually), both specs are online already:

The SWF (Flash file format) specification

The FLV (Flash video file format) specification

….and more important, without any sort of click-through license. (“By clicking this button you pledge your allegiance to Adobe Systems and disavow your right to develop for products and platforms not controlled or approved by Adobe or its partners. The aforementioned transferral of rights also applies to your next of kin as well as your extended network of business partners and/or (at Adobe’s discretion) lunch dates.”)

I’ve never been nuts about using “open” as a prefix for projects, especially as it relates to big companies hyping what do-gooders they are. It makes me think of the phrase “compassionate conservatism”. The fact that “compassionate” has to be added is more telling than anything else. They doth protest too much.

Thursday, May 1, 2008 | parse  

Design and the Elastic Mind

Perhaps three months late for an announcement, and at the risk of totally reckless narcissism, I should mention that four of my projects are currently on display in the Design and the Elastic Mind exhibition at the Museum of Modern Art in New York. My work notwithstanding, I hear that the show is generating lots of foot traffic and positive reviews, which is a well-deserved compliment to curator Paola Antonelli.

There’s a New York Times article and slide show (too much linking to the Times lately, weird…) and a writeup in the International Herald Tribune that even mentions my Humans vs. Chimps piece.

The first wall as you enter the show is all of Chromosome 18, done in the style of this piece.

chr18-elastic-510b.jpg

It’s a 3 pixel font at 150 dpi, so there are 37.5 letters per inch in either direction, and the wall is about 20 feet square, making 75 million letters total. Paola and her staff asked whether it was OK to put the text on the piece itself, which I felt was fine, as the nature of the piece is about scale, and the printing would not detract from that. The funny side effect of this was watching people at the opening take one another’s picture in front of the piece, most of them probably not realizing that the wall itself was part of the exhibition. Perhaps my most popular work so far, given the number of family photos in which it will be found.

Former classmate Ron Kurti also took a nice detail shot:

chr18-placard-kurti-510.jpg

Also in the show is the previously mentioned Humans vs. Chimps project as seen below:

chimp-510.jpg

This image is about three feet wide so you can read the letters accurately. It’s found next to an identically sized print of isometricblocks depicting the CFTR region of the human genome (the area implicated in cystic fibrosis). The image was first developed for a Nature cover.

isometricblocks-510.jpg

Finally, the Pac-Man print of distellamap is printed floor to ceiling on another wall in the exhibition. Unfortunately there was a glitch in the printing that caused the lines connecting portions of the code to be lost (because they’re too thin to see at a distance), but no matter.

pacman-crop-510.jpg

Far more exciting than my own work, however, is the number of projects in the show that were built with Processing. It’s a bit humbling, and the sort of thing that makes me excited (and relieved) to have some time this summer to devote to Processing itself.

Wednesday, April 30, 2008 | iloveme  

Google Underwater

So that might not be the awesome name that they’ll be using, but CNET is rumormongering about Google cooking up something oceanographic along the lines of Maps or Earth. Their speculation includes this lovely image from the Lamont-Doherty Earth Observatory (LDEO) of Columbia University.

underwatertiles_510.jpg

Unlike most people with a heartbeat, I didn’t find Google Maps particularly interesting on arrival. I was a fan of the simplicity of Yahoo Maps at the time (but no longer, eek!) and Microsoft’s Terraserver had done satellite imagery for a few years. But the same way that Google Mars shows us something we’re even less familiar with than satellite imagery of Earth, there’s something really exciting about the possibility of seeing beneath the oceans.

Wednesday, April 30, 2008 | mapping, rumors, water  

Me blog big linky

Kottke and Freakonomics were kind enough to link over here, which has brought more queries about salaryper. Rather than piling onto the original web page, I’ll add updates to this section of the site.

I didn’t include the project’s back story with the 2008 version of the piece, so here goes:

Some background for people who don’t watch/follow/care about baseball:

When I first created this piece in 2005, the Yankees had a particularly bad year, with a team full of aging all-stars and owner George Steinbrenner hoping that a World Series trophy could be purchased for $208 million. The World Champion Red Sox did an ample job of defending their title, but as the second highest paid team in baseball, they’re not exactly young upstarts. The Chicago White Sox had an excellent year with just one third the salary of the Yankees, while the Cardinals are performing roughly on par with what they’re paid. Interestingly, the White Sox went on to win the World Series. The performance of Oakland, which in previous years had far exceeded their overall salary, was the story (largely about their General Manager Billy Beane) told in the book Moneyball.

Some background for people who do watch/follow/care about baseball:

I neglected to include a caveat on the original page that this is a really simplistic view of salary vs. performance. I created this piece because the World Series victory of my beloved Red Sox was somewhat bittersweet in the sense that the second highest paid team in baseball finally managed to win a championship. This fact made me curious about how that works across the league, with raw salaries and the general performance of the individual teams.

There are lots of proportional things that can be done too—the salaries especially exist across a wide range (the Yankees waaaay out in front, followed by another pack of big market teams, then everyone else).

There are far more complex things about how contracts work over multiple years, how the farm system works, and scoring methods for individual players that could be taken into consideration.

This piece was thrown together while watching a game, so it’s perhaps dangerously un-advanced, given the amount of time and energy that’s put into the analysis (and argument) of sports statistics.

That last point is really important… This is fun! I encourage people to try out their own methods of playing with the data. For those who need a guide on building such a beast, the book has all the explanation and all the code (which isn’t much). And if you adapt the code, drop me a line so I can link to your example.

I have a handful of things I’d like to try (such as a proper method for doing proportional spacing at the sides without overdoing it), though the whole point of the project is to strip away as much as possible, and make a straightforward statement about salaries, so I haven’t bothered coming back to it since it succeeds in that original intent.

Wednesday, April 30, 2008 | salaryper, updates, vida  

Updated Salary vs. Performance for 2008

It’s April again, which means that there are messages lurking in my inbox asking about the whereabouts of this year’s Salary vs. Performance project (found in Chapter 5 of the good book). I got around to updating it a few days ago, which means now my inbox has changed to suggestions on how the piece might be improved. (It’s tempting to say, “Hey! Check out the book and the code, you can do anything you’d like with it! It’s more fun that way.” but that’s not really what they’re looking for.)

One of the best messages I’ve received so far is from someone who I strongly suspect is a statistician, wishing to see a scatter plot of the data rather than its current representation. Who else would be pining for a scatter plot? There are lots of jokes about the statistically inclined that might cover this situation, but… we’re much too high-minded to let things devolve to that (actually, it’s more of a pot-kettle-black situation). If prompted, statisticians usually tell better jokes about themselves anyway.

At any rate, as it’s relevant to the issue of how you choose representations, my response follows:

Sadly, the scatter plot of the same data is actually kinda uninformative, since one of your axes (salary) is more or less fixed all season (might change at the trade deadline, but more or less stays fixed) and it’s just the averages that move about. So in fact if we’re looking for more “accurate”, a time series is gonna be better for our purposes. In an actual analytic piece, for instance, I’d do something very different (which would include multiple years, more detail about the salaries and how they amortize over time, etc).

But even so, making the piece more “correct” misses the intentional simplifications found in it, e.g. it doesn’t matter whether a baseball team was 5% away from winning, it only matters whether they’ve won. At the end of the day, it’s all about the specific rankings, who gets into the playoffs, and who wins those final games. The piece isn’t intended as an analytical tool, but rather something that conveys the idea of salary vs. performance to an audience who by and large cares little about 1) baseball and 2) stats. That’s not to say it’s about making something zoomy and pretty (and irrelevant), but rather about how you engage people with the data in a way that teaches them something in the end and gets them thinking about it.

Now to get back to my inbox and the guy who would rather have the data sonified since he thinks this visual thing is just a fad.

Tuesday, April 29, 2008 | examples, represent, salaryper  

All Streets Error Messages

Some favorite error messages while working on the All Streets project (mentioned below). I was initially hoping to use Illustrator to open the PDF files generated from Processing, but Venus informed me that it was not to be:

illustrator-sucks-balls.png

I’m having difficulties as well. Why did I pay for this software?

Generally, Photoshop is far better engineered so I was hoping that it would be able to rasterize the PDF file instead, never mind the vectors and all.

photoshops-own-balls.png

Oh come on… Just admit that you ran out of memory and can’t deal. Meanwhile, Eugene was helping out with the site, from the other end of iChat:

aim-error-none.png

Oh well.

Sunday, April 27, 2008 | allstreets, software  

The Advantages of Closing a Few Doors

From the New York Times, a piece about Predictably Irrational from Dan Ariely. I’m somewhat fascinated by the idea of our general preoccupation with holding on to things, particularly as it relates to retaining data (see previous posts referencing Facebook, Google, etc.).

Our natural tendency is to keep everything, in spite of the consequences. Storage capacity in the digital realm is only getting larger and cheaper (as its size in the physical realm continues to get smaller), which only feeds this tendency further. Perhaps this is also why more individuals don’t question Google claiming a right to keep messages from their Gmail account after the messages, or even the account, have been deleted.

Ariely’s book describes a set of experiments performed at M.I.T.:

[Students] played a computer game that paid real cash to look for money behind three doors on the screen… After they opened a door by clicking on it, each subsequent click earned a little money, with the sum varying each time.

As each player went through the 100 allotted clicks, he could switch rooms to search for higher payoffs, but each switch used up a click to open the new door. The best strategy was to quickly check out the three rooms and settle in the one with the highest rewards.

Even after students got the hang of the game by practicing it, they were flummoxed when a new visual feature was introduced. If they stayed out of any room, its door would start shrinking and eventually disappear.

They should have ignored those disappearing doors, but the students couldn’t. They wasted so many clicks rushing back to reopen doors that their earnings dropped 15 percent. Even when the penalties for switching grew stiffer — besides losing a click, the players had to pay a cash fee — the students kept losing money by frantically keeping all their doors open.

(Emphasis mine.) I originally came across the article via Mark Hurst, who adds:

I’ve said for a long time that the solution to information overload is to let the bits go: always look for ways to delete, defer, or otherwise avoid bits, so that the few that remain are more relevant and easier to handle. This is the core philosophy of Bit Literacy.

Put another way, do we need to take more personal responsibility for subjecting ourselves to the “information overload” that people so happily buzzword about? Is complaining about the overload really an issue of not doing enough spring cleaning at home?

Sunday, April 27, 2008 | retention  

Restroom information graphics

bacon-510.jpg

I like neither bacon nor these machines, so I wish they would always provide this helpful explanation (or warning).

Friday, April 25, 2008 | infographics  

The Earth at night

Via mailing list, Oswald Berthold passes along images of the Earth from space as compiled by NASA, along with a short article, highlighting city lights in particular.

Tokyo Bay

The collection is an update to the Earth Lights image developed a few years ago (and which made its way ’round the interwebs at the time).

For the more technical, a presentation from the NOAA titled Low Light Imaging of the Earth at Night provides greater detail about the methods used to produce such images. Also includes a couple interesting historical examples (such as the first image they created) as well as comparisons of city growth over time based on changes in the data.

Of course many conclusions can be drawn from seeing map data such as this. Look at the difference between North and South Korea, for instance (original image from globalsecurity.org).

North and South Korea by night

Apparently this is a favorite of former U.S. Secretary of Defense Donald Rumsfeld:

Mr Rumsfeld showed the picture to illustrate how backward the northern regime really is – and how oppressed its people are. Without electricity there can be none of the appliances that make life easy and that we take for granted, he said.

“Except for my wife and family, that is my favourite photo,” said Mr Rumsfeld.

“It says it all. There’s the south, the same people as the north, the same resources north and south, and the big difference is in the south it’s a free political system and a free economic system.

I’ve vowed to myself not to make this page about politics, so I won’t get into the fatuous arguments of a warmonger (oops), but I think the fascinating thing is that

  1. This image, this “information graphic,” would be of such great importance to a person that he would see fit to even mention it in reference to photos of his wife and children. This is a strong statement for any image, even if he is being dramatic.
  2. The use of images to make or score political points. There’s some great stuff buried in recent Congressional testimony about the Iraq War, for instance, that I want to get to soon.

In regard to #1, I’m trying to think of other images with which people maintain such a personal relationship (particularly people whose job is not info graphics—Tufte’s preoccupation with Napoleon’s March doesn’t count).

As for #2, hopefully we’ll get to that a bit later.

Friday, April 25, 2008 | mapping, physical, politics  

All Streets

all streets

New work, now posted. All of the streets in the lower 48 United States: an image of 26 million individual road segments. This began as an example I created for one of my students in the fall of 2006, and I just recently got a chance to document it properly.

Nothing particularly genius about this piece—it’s mostly just a matter of collecting the data and creating the image. But it’s one of those cases where even in a (relatively) raw format, the data itself is quite striking.

The data in this piece comes from the U.S. Census Bureau’s TIGER/Line data files. The data is first parsed and filtered (to remove non-street features) using Perl. Next, using Processing, the latitude and longitude coordinates are transformed using an Albers equal-area conic projection (which gives it that curvy surface-of-the-Earth look that we’re used to), and then plotted to an enormous image that’s saved to the disk. The steps are similar to the preprocessing stages described in Chapter 6 of Visualizing Data.
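
For anyone who wants to try something similar, the projection step looks roughly like the following. It’s a minimal sketch of the spherical Albers formula in plain Java, using the standard parallels commonly chosen for the lower 48 (29.5° and 45.5°); the reference latitude and longitude are assumptions on my part, and the scaling from these unit-sphere coordinates to actual pixels is left out:

public class Albers {
  static final double PHI1 = Math.toRadians(29.5);   // standard parallels for the lower 48
  static final double PHI2 = Math.toRadians(45.5);
  static final double PHI0 = Math.toRadians(23.0);   // reference latitude (an assumption)
  static final double LAM0 = Math.toRadians(-96.0);  // reference longitude (an assumption)

  static final double N  = (Math.sin(PHI1) + Math.sin(PHI2)) / 2;
  static final double C  = Math.cos(PHI1) * Math.cos(PHI1) + 2 * N * Math.sin(PHI1);
  static final double R0 = Math.sqrt(C - 2 * N * Math.sin(PHI0)) / N;

  // convert latitude/longitude in degrees to x/y on the unit sphere
  static double[] project(double lat, double lon) {
    double rho = Math.sqrt(C - 2 * N * Math.sin(Math.toRadians(lat))) / N;
    double theta = N * (Math.toRadians(lon) - LAM0);
    return new double[] { rho * Math.sin(theta), R0 - rho * Math.cos(theta) };
  }

  public static void main(String[] args) {
    double[] xy = project(42.36, -71.06);  // Boston, just as a smoke test
    System.out.println(xy[0] + ", " + xy[1]);
  }
}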

I had originally hoped to use this piece to show patterns in street naming (names of local trees and flowers tied to the geographic regions where they’re found, for instance), but I didn’t manage to find as much as I had hoped. Cookie-cutter suburban neighborhood developments seem to have obliterated any causation. “Magnolia” is such a nice-sounding, outdoorsy word; who wouldn’t want it adorning their street corner? Local flora be damned.

There are, however, a few other interesting tidbits in the data that I hope to cover in a future project. Real work be damned.

Friday, April 25, 2008 | allstreets  

Data availability is aiming too low

The quote is primarily in regards to Web 2.0 (cough), and I couldn’t agree more.

“Praising companies for providing APIs to get your own data out is like praising auto companies for not filling your airbags with gravel. I’m not saying data export isn’t important, it’s just aiming kinda low. You mean when I give you data, you’ll give it back to me? People who think this is the pinnacle of freedom aren’t really worth listening to.”

Via pmarca, I think?

Thursday, April 24, 2008 | acquire  

Dusting off

There’s nothing worse than someone keeping a journal or blog and having it go stale, so I’ve watched in horror during the forty-day Lenten fast that has passed since I last had a chance to post. Things should be better in the next few weeks.

My guidance is Mark Twain, writing in The Innocents Abroad, who lampooned blogging so accurately a short 139 years ago.

One of our favorite youths, Jack, a splendid young fellow with a head full of good sense, and a pair of legs that were a wonder to look upon in the way of length and straightness and slimness, used to report progress every morning in the most glowing and spirited way, and say:

“Oh, I’m coming along bully!” (he was a little given to slang in his happier moods.) “I wrote ten pages in my journal last night – and you know I wrote nine the night before and twelve the night before that. Why, it’s only fun!”

“What do you find to put in it, Jack?”

“Oh, everything. Latitude and longitude, noon every day; and how many miles we made last twenty-four hours; and all the domino games I beat and horse billiards; and whales and sharks and porpoises; and the text of the sermon Sundays (because that’ll tell at home, you know); and the ships we saluted and what nation they were; and which way the wind was, and whether there was a heavy sea, and what sail we carried, though we don’t ever carry any, principally, going against a head wind always – wonder what is the reason of that? – and how many lies Moult has told – Oh, every thing! I’ve got everything down. My father told me to keep that journal. Father wouldn’t take a thousand dollars for it when I get it done.”

“No, Jack; it will be worth more than a thousand dollars – when you get it done.”

“Do you? – no, but do you think it will, though?

“Yes, it will be worth at least as much as a thousand dollars – when you get it done. May be more.”

“Well, I about half think so, myself. It ain’t no slouch of a journal.”

But it shortly became a most lamentable “slouch of a journal.” One night in Paris, after a hard day’s toil in sightseeing, I said:

“Now I’ll go and stroll around the cafes awhile, Jack, and give you a chance to write up your journal, old fellow.”

His countenance lost its fire. He said:

“Well, no, you needn’t mind. I think I won’t run that journal anymore. It is awful tedious. Do you know – I reckon I’m as much as four thousand pages behind hand. I haven’t got any France in it at all. First I thought I’d leave France out and start fresh. But that wouldn’t do, would it? The governor would say, ‘Hello, here – didn’t see anything in France? That cat wouldn’t fight, you know. First I thought I’d copy France out of the guide-book, like old Badger in the for’rard cabin, who’s writing a book, but there’s more than three hundred pages of it. Oh, I don’t think a journal’s any use – -do you? They’re only a bother, ain’t they?”

“Yes, a journal that is incomplete isn’t of much use, but a journal properly kept is worth a thousand dollars – when you’ve got it done.”

“A thousand! – well, I should think so. I wouldn’t finish it for a million.”

Stay tuned for Mark Twain’s thoughts on Digg, YouTube, and Web 2.0.

Thursday, April 24, 2008 | site  

Representing power usage in the sky

collage123_600.jpg

Wonderful project that shows power usage mapped to a green cloud, projected into the sky and onto the output of the Salmisaari power plant in Helsinki. From their description:

Every night from the 22 to the 29 of February 2008, the vapour emissions of the Salmisaari power plant in Helsinki will be illuminated to show the current levels of electricity consumption by local residents. A laser ray will trace the cloud during the night time and turn it into a city scale neon sign. Nuage Vert is a communal event for the area of Ruoholahti, which anticipates esoteric cults centred on energy and transforms an active power plant into a space for art, a living factory. In tandem, as a reversal of conventional roles whereby the post-industrial factory is turned into space for culture, Kaapeli (the cultural factory) becomes the site of operation and Salmisaari (the industrious factory) becomes the site of spectacle.

Check out their blog page with updates and pictures.

Thursday, April 24, 2008 | physical  

Why are the Microsoft Office file formats so complicated?

An excellent post from Joel Spolsky about the file format specifications that were recently released by Microsoft (to comply with or avoid more anti-trust / anti-competition mess).

Last week, Microsoft published the binary file formats for Office. These formats appear to be almost completely insane. The Excel 97-2003 file format is a 349 page PDF file. But wait, that’s not all there is to it!

This is a perfect example of the complexity of parsing, and dealing with file formats (particularly binary file formats) in general. As Joel describes it:

A normal programmer would conclude that Office’s binary file formats:

  • are deliberately obfuscated
  • are the product of a demented Borg mind
  • were created by insanely bad programmers
  • and are impossible to read or create correctly.

You’d be wrong on all four counts.

Read the article for more insight about parsing and the kind of data that you’re likely to find in the wild. While you’re at it, his Back to Basics post covers similar ground with regard to proper programming skills, and also gets into the issues of file formats (binary versus XML, and how you build code that reads it).

Joel is another (technical) person whose writing I really enjoy. In the course of digging through his page a bit, I also was reminded of the Best Software Writing I compilation that he assembled, a much needed collection because of the lack of well chosen words on the topic.

Saturday, March 15, 2008 | parse  

Are you a member of Facebook.com? You may have a lifetime contract

A New York Times article from February about the difficulty of removing your personal information from Facebook. I believe that in the days that followed Facebook responded by making it ever-so-slightly possible to actually remove your account (though still not very easy).

Further, there is the network effect of information that’s not “just” your own. Deleting a Facebook profile does not appear to delete posts you’ve made to “the wall” of any friends, for instance. Do you own those comments? Does your friend? It’s a somewhat similar situation in other areas—even if I chose not to have a Gmail account, because I don’t like their data retention policy, all my email sent to friends with Gmail accounts is subject to those terms I’m unhappy with.

Regardless, this is an enormous issue as we put more of our data online. What does it mean to have this information public? What happens when you change your mind?

Facebook stands out because it’s a scenario of starting college (at age 17 or 18 or now even earlier) with a very different view of what’s public and private, and having that view evolve over time. You may not care to have things public at the time, but one of the best things about college (or high school, for that matter) is that you move on. Having a log of your outlook and attitude, and photos to prove it, stored on a company’s servers means that there are more permanent memories of that time which are out of your control. (And you don’t know who else besides Facebook is storing it—search engine caches, companies doing data mining, etc. all play a role here.) Your own memories might be lost to alcohol or willful forgetfulness, but digital copies don’t behave the same way.

The bottom line is an issue of ownership of one’s own personal information. At this point, we’re putting more information online—whether it’s Facebook or having all your email stored by Gmail—but we haven’t figured out what that really means.

Saturday, March 15, 2008 | privacy, retention, social  

Democratic Delegate Scenarios

counter.jpg

One of the chapters that I had to cut from Visualizing Data was about scenarios—building interactive “what if” tools that help you quickly try out several possibilities. This is one of the most useful aspects of dynamic visualization—being able to try out different ideas in a quick and safe (as in non-destructive) way, since Undo is always nearby. Hopefully I’ll be able to cover this sometime soon.

At any rate, one such scenario-building tool is Slate’s Delegate Calculator, where you can drag primitive sliders back and forth and see the possibilities for delegate outcomes for Hillary and Obama.

I’ve seen complaints about its math, but it seems to do an OK job for a big-picture look at the likelihood of different outcomes. Getting the math 100% is impossible (unless you have a far more complicated interface) because the delegate selection process is different in each state. It appears that none of the states wanted to be seen using the same approach as another, and with fifty states going their own way, things got pretty random (Texas: we’ll have a caucus and a primary).

I think that’s enough posting about politics for a bit.

Saturday, March 15, 2008 | election, politics, scenarios  

Basing News Categorization on Blog Blather

blews-small.jpg

Found this on Slashdot, but their headline, “Microsoft Developing News Sorting Based On Political Bias,” made it sound a lot more interesting than it may be. The idea of mining text data to tease out mythical media biases and leanings sounds fascinating. What sort of axes could be determined? Could we see how different kinds of language are used, or the ways that particular code words or phrases infect news coverage?

Unfortunately, the research project from Microsoft looks like it’s just procuring link counts from “liberal” and “conservative” blogs, and gauging the vigor of commentary on either side. Does this make you uneasy yet?

  • We are politically binary: the world has devolved into conservative and liberal! (Or not, yet why do people insist on it?) The representation seems almost entirely U.S.-centric, right down to the red and blue coloring on either side. Red states! Blue states! Red blogs! Blue Blogs! A maleficent Dr. Seuss has infected our political outlook.
  • What about those other axes, where are they? Of all the things to cull from political discourse, liberal vs. conservative must be one of the least interesting. Did you need a team of six from Microsoft, plus all the computing power at their disposal, to tell you that one article or another ruffled more feathers on either side of this simplified spectrum?
  • There’s so much to be learned from the propagation of phrases and ideas in the news; why hasn’t there been a more sophisticated representation of it? (Because it’s hard?) The Daily Show has shown this successfully (stringing together clips of several people, one after another, repeating something like “axis of evil” or something about “momentum” for a candidate).
  • Blogs are not real. When you turn off the computer, they go away. The internet is not a place, and is too divorced from actual reality to be a useful gauge on most social phenomena. Using blogs as input for a kind of meta-study seems like a poor way to acquire data.

The problems I cite are a bit unfair since they haven’t posted much on their site (looks like they’re presenting a paper…soon?) so the reaction is just based on what they’ve provided. I knew Sumit Basu back at the Media Lab and I think it’s safe to assume there’s more going on…

But what about these bigger issues?

Saturday, March 15, 2008 | news, politics  

Lesson #5: Proportionality should be a guideline in war

firebombing_leaflet.jpg

Halfway through The Fog of War by Errol Morris (of The Thin Blue Line, or the Apple “Switch” ad campaign depending on your persuasion), Robert S. McNamara (Secretary of Defense for the Kennedy and Johnson administrations) describes proportionality in war:

Why was it necessary to drop the nuclear bomb if [General Curtis] LeMay was burning up Japan? And he went on from Tokyo to firebomb other cities. 58% of Yokohama. Yokohama is roughly the size of Cleveland. 58% of Cleveland destroyed. Tokyo is roughly the size of New York. 51% of New York destroyed. 99% of the equivalent of Chattanooga, which was Toyama. 40% of the equivalent of Los Angeles, which was Nagoya. This was all done before the dropping of the nuclear bomb, which by the way was dropped by LeMay’s command.

The gruesome description is abetted by a different kind of proportionality—that when placed in the context of size with regard to U.S. cities, these numbers become more “real.” I found this set particularly striking for how ordinary the cities were—Cleveland and Chattanooga, in addition to the usual New York and Los Angeles. The huge metropolitan areas may be too abstract for many, but Cleveland!?—those are actual people!

The entire transcript is also on Errol Morris’ site—amazing. Why don’t more studios do this? It’s great to be able to study it more closely, and was enough to convince me to purchase (rather than just rent) the movie.

Sunday, March 9, 2008 | movies  

“How much is your education worth to you?!? E-mail your best offer.”

5423-doonsebury.png

Article from the Chronicle of Higher Education about course selection (competition, class lotteries, etc.).

Every college has a hot-ticket class. Maybe it’s the subject matter (serial killers! sailing!) or maybe it’s a celebrity professor (George Tenet! Toni Morrison!). Whatever it is, everybody wants to get in.

And, of course, not everybody can. So how do you decide who gets a seat and who’s disappointed?

If you’re Patricia de Castries, you make everybody sleep outside your door. Ms. de Castries, assistant director of the Stanford Language Center, teaches a wildly popular wine-tasting course at the university. Often more than 100 would-be connoisseurs compete for the 60 spots, so on the eve of registration students show up with pillows and sleeping bags, hoping to get their names on the list. “It’s tough,” says Ms. de Castries, “but if you want to be in the class, you do it.”

Covers the range from MIT’s technical approach to Wharton’s free-market approach, in which students bid on courses using a point system. Sadly, the article now seems to be blocked except for those academic types who have access to a subscription.

(Thanks Eugene)

Sunday, March 9, 2008 | probability  

Wal-Mart states and Starbucks states

Comparing the number of Starbucks and Wal-Marts per capita across the United States (the lower 48 at least).

both-520.png

Read more at Statistical Modeling, Causal Inference, and Social Science, from Andrew Gelman’s lab at Columbia.

(thx, jason)

Sunday, March 9, 2008 | infographics, mapping  

The Myth of the ‘Transparent Society’

I’ve always been uncomfortable with the idea of David Brin’s The Transparent Society, because it provides an over-simplified answer to a very complex problem. While it appeals to our general obsession with finding simple solutions, it fails to actually address the very real problem at hand. Rather than a revolutionary or provocative idea, it’s in fact an argument for maintaining the status quo.

I’ve never quite been able to parse it out properly, but was pleased to see that Bruce Schneier (the Chuck Norris of the security industry) addressed Brin’s argument this week for Wired:

When I write and speak about privacy, I am regularly confronted with the mutual disclosure argument. Explained in books like David Brin’s The Transparent Society, the argument goes something like this: In a world of ubiquitous surveillance, you’ll know all about me, but I will also know all about you. The government will be watching us, but we’ll also be watching the government. This is different than before, but it’s not automatically worse. And because I know your secrets, you can’t use my secrets as a weapon against me.

This might not be everybody’s idea of utopia — and it certainly doesn’t address the inherent value of privacy — but this theory has a glossy appeal, and could easily be mistaken for a way out of the problem of technology’s continuing erosion of privacy. Except it doesn’t work, because it ignores the crucial dissimilarity of power.

Schneier’s most recent book is Beyond Fear (which I’ve not yet had a chance to read), and he also has an excellent monthly mailing list (which I do read, all the time) covering topics like privacy and security. He is a gifted writer who can explain both the subtleties of the privacy debate and the complexities of security in terms that are informative for technologists and interesting for everyone else.

Sunday, March 9, 2008 | privacy  

Li’l Endian

Gulliver

Chapters 9 and 10 (acquire and parse) are secretly my favorite parts of Visualizing Data. They’re a grab bag of useful bits based on many years of working with information (previous headaches)… the sort of things that come up all the time.

Page 327 (Chapter 10) has some discussion about little endian versus big endian, the way in which different computer architectures (Intel vs. the rest of the world, respectively) handle multi-byte binary data. I won’t repeat the whole section here, though I have two minor errata for that page.

First, an error in formatting in how the phrase “network byte order” is set (a typeface distinction that unfortunately doesn’t survive here). The other problem is that I mention that little endian versions of Java’s DataInputStream class can be found on the web with little more than a search for DataInputStreamLE. As it turns out, that was a big fat lie, though you can find a handful if you search for LEDataInputStream (even though that’s a goofier name).

To make it up to you, I’m posting proper DataInputStreamLE (and DataOutputStreamLE) which are a minor adaptation of code from the GNU Classpath project. They work just like DataInputStream and DataOutputStream, but just swap the bytes around for the Intel-minded. Have fun!

DataInputStreamLE.java

DataOutputStreamLE.java

I’ve been using these for a project and they seem to be working, but let me know if you find errors. In particular, I’ve not looked closely at the UTF encoding/decoding methods to see if there’s anything endian-oriented in there. I tried to clean it up a bit, but the javadoc may also be a bit hokey.
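
If you only need a one-off read rather than the full classes, the standard library can also do the swapping. Two equivalent approaches (ByteBuffer has been around since Java 1.4, and Integer.reverseBytes since Java 5; the file name is just a placeholder):

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianPeek {
  public static void main(String[] args) throws IOException {
    // option 1: read big endian as usual, then swap the bytes
    DataInputStream dis = new DataInputStream(new FileInputStream("data.bin"));
    int first = Integer.reverseBytes(dis.readInt());
    dis.close();

    // option 2: wrap the raw bytes and state the byte order explicitly
    byte[] raw = { 0x78, 0x56, 0x34, 0x12 };
    int second = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();

    System.out.println(first);
    System.out.println(Integer.toHexString(second));  // prints 12345678
  }
}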

(Update) Household historian Shannon on the origin of the terms:

The terms “big-endian” and “little-endian” come from Gulliver’s Travels by Jonathan Swift, published in England in 1726. Swift’s hero Gulliver finds himself in the midst of a war between the empire of Lilliput, where people break their eggs on the smaller end per a royal decree (Protestant England) and the empire of Blefuscu, which follows tradition and breaks their eggs on the larger end (Catholic France). Swift was satirizing Henry VIII’s 1534 decision to break with the Roman Catholic Church and create the Church of England, which threw England into centuries of both religious and political turmoil despite the fact that there was little doctrinal difference between the two religions.

Friday, March 7, 2008 | code, parse, updates, vida  

United Nations data now (more readily) available

The United Nations has just launched a new web site to house all their data for all you kids out there who wanna crush Hans Rosling. The availability of this sort of information has been a huge problem in the past (Hans’ talks are based on World Bank data that costs real money), and while the U.N. has been pretty good about making things available, a site whose sole purpose is to disseminate usable data is enormous.

Thursday, March 6, 2008 | acquire  

Zipdecode in der Schweiz

gossau.png

Dominic Allemann has developed a Swiss version of the zipdecode example from chapter six of Visualizing Data. This is the whole point of the book—to actually try things out and adapt them in different ways and see what you can learn from it.

Switzerland makes an interesting example because it has far fewer postal codes than the U.S., though the dots are quite elegant on their own. With fewer data points, I’d be inclined to 1) change the size of the individual points to make them larger without making them overwhelming, or 2) work with the colors to make the contrast more striking (since changing the point size is likely to be too much), and 3) get the text into mixed case (in this example, Gossau SG instead of GOSSAU SG; a quick sketch of that conversion follows below). Something as minor as avoiding ALL CAPS helps get us away from the representation looking too much like COMPUTERS and DATABASES, and instead into something meant for regular humans. Finally, 4) with the smaller (and far more regular) data set, it’s not clear if the zoom even helps—it could even be better off without it.
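
In case it’s useful, here’s one crude way to do the ALL CAPS conversion in Java. It just lowercases everything and capitalizes the first letter of each space-separated word, so abbreviations like the canton code SG would still need special handling:

public class TitleCase {
  static String titleCase(String s) {
    StringBuilder sb = new StringBuilder();
    for (String word : s.toLowerCase().split(" ")) {
      if (word.length() > 0) {
        sb.append(Character.toUpperCase(word.charAt(0)));
        sb.append(word.substring(1));
      }
      sb.append(' ');
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    System.out.println(titleCase("GOSSAU SG"));  // prints "Gossau Sg", so the canton code still needs help
  }
}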

Thanks to Dominic for passing this along; it’s great to see!

Thursday, March 6, 2008 | adaptation, vida, zipdecode  

Goodbye red background, farewell imposing text blocks

I’m in the midst of rolling out a web site redesign. The former site’s (un)design was assembled just after I finished my Ph.D.; I expected it to be bad enough to force myself to make a proper site. Three and a half years passed, with even friends who weren’t designers (including my future mother-in-law) taking exception. The redesign was done by my friend Eugene Kuo, who couldn’t deal with it any longer.

I’m currently building out the design and hooking up all the pages (including a handful of projects that weren’t linked before). The navigation at the top will slowly begin to work as this process continues. For instance, the “projects” link currently points to my old site, which is missing anything I’ve done in the past four years. The big images on the home page will soon be rotating through projects, while the new projects page will provide a better visual overview of what’s inside.

At any rate, thanks to Eugene and keep an eye out…

Wednesday, March 5, 2008 | site  

Google Chart API

I’ve not had a chance to try these out with an actual project yet, but the Google Chart API seems to be a decent way to get Tufte® compliant chart images using simple web requests. Just pack the info for the chart’s appearance and data into a specially crafted URL and you’re set.
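
A request ends up looking something like the line below. The parameter names come from the Chart API documentation (cht for the chart type, chs for its size, chd for the data, chl for labels); the particular values here are only an illustration:

http://chart.apis.google.com/chart?cht=p3&chs=250x100&chd=t:60,40&chl=Hello|World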

It’s a nice idea for a service, and I also appreciate that Google has kept it simple, rather than implementing it through layers of obfuscated and strangely crafted embedded JavaScript (like, say, Google Maps or their newer search APIs after discontinuing the SOAP protocol).

Wednesday, March 5, 2008 | api, represent  

Robust Analysis of Socio-cultural Observations

money-vs-problems.jpg

Given the number of data points provided, it would be difficult to refute the findings depicted in this chart.

Related work can be found here and here, while later research findings (by latecomers who foolishly claim to have invented the approach) can be found here and here.

Thanks to Raelynn Miles for the original link.

Wednesday, March 5, 2008 | infographics, music  

Cracks in the Guggenheim

Beautiful info graphic from a September 2007 article about the restoration of the Guggenheim, depicting the cracks in the concrete walls. From the image:

Since the Guggenheim Museum opened in 1959, Frank Lloyd Wright’s massive spiral facade has been showing signs of cracking, mainly from seasonal temperature fluctuations that cause the concrete walls, built without expansion joints, to contract and expand.

The image is partly striking for the contrast between the NYT-style geometric graphic, with its pale colors, and the organic shape of the cracks. Wonderful.

guggenheim-520.jpg

Sent from one of my former students at CMU (you know who you are, drop me a line if it was you…I’ve lost the original message!)

Tuesday, March 4, 2008 | infographics  

Can we just agree never to use the word “surge”… in any context?

ied-mapping-300.jpg

Somewhere between the “most important” and “only useful” thing about the wide availability of map data, GPS systems, and the sort of mash-up type things that are all the rage is the ability to actually annotate map information in a useful way by combining these features.

An unfortunately titled article from the Technology Review describes a system being used in Iraq to help soldiers with their counterinsurgency efforts.

The new technology … is a map-centric application that … officers … can study before going on patrol and add to upon returning. By clicking on icons and lists, they can see the locations of key buildings, like mosques, schools, and hospitals, and retrieve information such as location data on past attacks, geotagged photos of houses and other buildings (taken with [GPS-equipped] cameras), and photos of suspected insurgents and neighborhood leaders. They can even listen to civilian interviews and watch videos of past maneuvers. It is just the kind of information that soldiers need to learn about Iraq and its perils.

It’s a wonder that such systems aren’t the norm, and the software described seems quite straightforward. But going a step further, I found this quote intriguing:

“It is a bit revolutionary from a military perspective when you think about it, using peer-based information to drive the next move … Normally we are used to our higher headquarters telling the patrol leader what he needs to think.”

Not so much the cliché of technology being an enabler or democratizer (that can’t be a word, can it?). Rather, there’s something interesting about how the strength of a military structure (its discipline and rote effectiveness) is derived in part from top-down control, yet that lies in direct contradiction to how information—of any kind, really—needs to move around the organization for it to be effective. What does it mean that an approach like this one works in such contrast to tradition?

Thursday, February 28, 2008 | iraq, mapping  

The minimum, the maximum, and the typos therein

Blake Tregre found a typo on page 55 of Visualizing Data, in one of the code comments:

// Set the value of m arbitrarily high, so the first value
// found will be set as the maximum.
float m = MIN_FLOAT;

That should instead read something like:

// Set the value of m to the lowest possible value,
// so that the first value found will automatically be larger.
float m = MIN_FLOAT;

This also reminds me that the Table class used in Chapter 4 makes use of Float.MAX_VALUE and -Float.MAX_VALUE, which are inherited from Java. Processing has constants named MAX_FLOAT and MIN_FLOAT that do the same thing. We added the constants because -Float.MAX_VALUE seems like especially awkward syntax when you’re just trying to get the smallest possible float. The Table class was written sometime before the constants were added to the Processing syntax, so it uses the Java approach.

There is a Float.MIN_VALUE in Java; however, the spec does a very unfortunate thing, because MIN_VALUE is defined as “A constant holding the smallest positive nonzero value of type float”, which sounds promising until you realize that it just means a very tiny positive number, not the minimum possible value for a float. It’s not clear why they thought this would be a more useful constant (or useful at all).

And to make things even more confusing, Integer.MAX_VALUE and Integer.MIN_VALUE behave more the way you might expect, where MIN_VALUE is in fact the lowest (most negative) value for an int. Had they used the same definition as Float.MIN_VALUE, then Integer.MIN_VALUE would equal 1, which illustrates just how silly it is to do that for the Float class.
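
To see the difference directly, a few lines of Java (the values in the comments are what printing these constants produces):

public class MinMaxConstants {
  public static void main(String[] args) {
    System.out.println(Float.MIN_VALUE);    // 1.4E-45, a tiny positive number
    System.out.println(-Float.MAX_VALUE);   // -3.4028235E38, the actual most negative float
    System.out.println(Integer.MIN_VALUE);  // -2147483648, which behaves the way you'd expect
  }
}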

Tuesday, February 26, 2008 | series, updates, vida  

Sometimes, the map plays with you

cnn-250.jpg

I missed seeing it live, but was told about it by a baffled friend who had been watching CNN and muttered that the anchors were having a little too much fun with a new touch-screen toy while they covered returns from the primaries. The Washington Post provides more details.

Standing in front of an oversize monitor, King began poking, touching and waving at the screen like an over-caffeinated traffic cop. Each movement set in motion a series of zooming maps and flying pie charts, which King was then able to position around the screen at will.

The story also references Tim Russert’s much-talked-about (and I’d-forgotten-about) whiteboard scribbling for the 2000 election, which has me wondering about which presentation was actually more informative for viewers.

Thankfully, The Daily Show provides some insight:


 

Tuesday, February 26, 2008 | election  

Restoring Sight with the Tongue

a887_2866.jpg

Visualization works because our eyes are the highest-bandwidth channel for getting information into our brains. Researchers working to restore sight have found that the second best place may be the tongue, due to its high density of nerve endings. It’s an amazing testament to the adaptability of the brain that it can begin perceiving visual/spatial information from the sensors of another organ.

Researchers at the University of Wisconsin-Madison are developing this tongue-stimulating system, which translates images detected by a camera into a pattern of electric pulses that trigger touch receptors. The scientists say that volunteers testing the prototype soon lose awareness of on-the-tongue sensations. They then perceive the stimulation as shapes and features in space. Their tongue becomes a surrogate eye.

Earlier research had used the skin as a route for images to reach the nervous system. That people can decode nerve pulses as visual information when they come from sources other than the eyes shows how adaptable, or plastic, the brain is, says Wisconsin neuroscientist and physician Paul Bach-y-Rita, one of the device’s inventors.

Via mailing list post from Daniel Brown.

Saturday, February 23, 2008 | science  

Caricatures of the Presidential Candidates

Obama Caricature

An example of how cartoonists embed sophisticated ideas inside their drawings: videos from the Washington Post of caricaturist John Kascht describing his process. I especially liked the idea of Obama not smiling (in spite of the positive persona the campaign has been selling), and the description of McCain’s head as a “clenched fist” couldn’t be more apt. These are impressions that will stick with me the next time I see these candidates.

On Obama: “There’s a messianic aura about him. … That air of destiny really registers all across his face and in his body language as well. He shines. Light literally bounces off the guy from everywhere … And yet for all of the surface appeal of him, I’m drawn to the unsmiling images of him, where he has his head tipped back with an almost aristocratic bearing. Seems very telling somehow. As a work in progress he’s completely fascinating to watch and to draw.”

On Hillary: “It seems to fit that Clinton’s cheeks are her most prominent features. Cheeks aren’t exactly the windows to the soul, but Hillary Clinton’s not exactly a ‘peek inside my soul’ kind of person, anyway. … Her round facial features seem to balance on top of one another, and along the same lines her head seems to balance on top of her narrow shoulders like a boulder on a pyramid. I find it really interesting that this graphic profile that she cuts—of all of these elements in precarious alignment—is such a perfect metaphor for her political balancing act.”

On McCain: “His jaw gives him away…it’s an anger barometer. During debates when he’s being challenged by an opponent, he bites down hard, and you know what he really wants to do is go to the podium next door and smack somebody. … He’s got a head like a clenched fist, and it expands with every passing year. … His small, dark eyes are watchful and wary. Whether he’s smiling or talking he bares his teeth; they’re choppers really, and they flash with metal. They look like weapons. His skin isn’t skin so much as hide.”

On Mitt: “Mitt Romney is both the easiest and the hardest of the candidates to caricature. … He seems less like an individual person than a ‘type’ of person. He’s what central casting might come up with for the game show host type or the Ward Cleaverish 50’s dad type. … Because of the heavy ridge of his brow and his deep-set eyes, it’s tough to even see his eyes, much less find a twinkle in them. But his hair sparkles. That’s what we end up making eye contact with. It’s off-putting rather than inviting.”

Link and summaries stolen from Daily Kos.

Saturday, February 23, 2008 | drawing  

Karl Gude describes How to Draw an Eye

Wonderfully simple explanation of how to draw an eye. Karl used to be the graphics editor at Newsweek, and now teaches in the journalism school at Sparty.

I thought I’d share a short video I just made on how to draw an eye. I think it’s fun… Skip to the end if you’re in a hurry, though it’s only a couple of minutes long. Please pass it along to any budding artists! I plan to do a series of drawing instruction videos over time and this is the first.

Karl put together a fun conference last year. Conference might not be the right word (the attendees were the speakers, and the speakers the only attendees); really it was a handful of info geeks hanging out in Newport discussing each other’s work, but we certainly had a good time.

Saturday, February 23, 2008 | drawing  

Watching Cancer Growth with a See-Through Fish

080206-seethru-fish-02.jpg

Information visualization is the process of converting abstract information, like raw numbers, into form. Visualization is about representing phenomena, like weather, that already have a physical manifestation. Then there’s open your damn eyes, where you just stare at the thing you’re studying. Researchers at Children’s Hospital in Boston have created a see-through Zebrafish, allowing them to watch cancer growth in the fish’s body.

Via Slashdot.

Saturday, February 23, 2008 | science  

Book

Visualizing Data Book Cover

Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of the 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.
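
To give a sense of what that jumping off point looks like (this is not the book’s actual Chapter 3 code, and the file name and column layout here are invented), the whole exercise fits in a short Processing sketch: read a location and a value for each state, then draw a circle sized by the value.

// Not the book's Chapter 3 code; a minimal sketch of the same idea.
// Assumes a hypothetical tab-separated file "states.tsv" with one line
// per state: abbreviation, x position, y position, value from 0 to 100.
void setup() {
  size(640, 400);
  background(255);
  noStroke();
  fill(0, 0, 255, 128);

  String[] lines = loadStrings("states.tsv");
  for (int i = 0; i < lines.length; i++) {
    String[] pieces = split(lines[i], '\t');
    float x = float(pieces[1]);
    float y = float(pieces[2]);
    float value = float(pieces[3]);
    // scale the value to a radius; the output range here is arbitrary
    float r = map(value, 0, 100, 2, 20);
    ellipse(x, y, r*2, r*2);
  }
}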

The chapters that follow cover six more projects: a time series of milk, tea, and coffee consumption (Chapter 4), salary vs. performance (Chapter 5), and zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).
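
For a taste of the recursion in Chapter 7 (this is not the treemap library used in the book, just a made-up illustration), a few lines are enough to divide a rectangle into areas proportional to a list of values, flipping the split direction at each level.

// A toy treemap-style layout, not the book's code: each value gets an
// area proportional to its share, carved off the remaining rectangle,
// alternating between horizontal and vertical splits.
float[] values = { 50, 30, 12, 5, 3 };  // made-up sizes

void setup() {
  size(500, 300);
  noLoop();
}

void draw() {
  background(255);
  stroke(255);
  sliceRect(values, 0, 0, 0, width, height, true);
}

// Place values[index] as a slice proportional to its share of what's
// left, then recurse on the rest of the array in the leftover area.
void sliceRect(float[] v, int index,
               float x, float y, float w, float h, boolean horizontal) {
  if (index >= v.length) return;
  float total = 0;
  for (int i = index; i < v.length; i++) total += v[i];
  float share = v[index] / total;
  fill(map(index, 0, v.length, 60, 220));
  if (horizontal) {
    rect(x, y, w * share, h);
    sliceRect(v, index + 1, x + w * share, y, w * (1 - share), h, false);
  } else {
    rect(x, y, w, h * share);
    sliceRect(v, index + 1, x, y + h * share, w, h * (1 - share), true);
  }
}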

This site is used for follow-up code and writing about related topics.

Saturday, February 23, 2008 | vida  

Examples for Visualizing Data

Source code for the book examples, for the folks who have kindly purchased the book (lining my pockets, $1.50 at a time).

Chapter 3 (the US map example)

Chapter 4 (time series with milk, tea, coffee consumption)

Chapter 5 (connections & correlations – salary vs. performance)

Chapter 6 (scatterplot maps – zipdecode)

Chapter 7 (hierarchies, recursion, word treemap, disk space treemap)

Chapter 8 (graph layout adaptation)

These links should cover the bulk of the code. More can be found at the URLs printed in the book, or copied & pasted from Safari online. As I understand it, those who have purchased the book should have access to the online version (see the back cover).

All examples have been tested, but if you find errors of any kind (typos, unused variables, profanities in the comments, the usual), please drop me an email and I’ll be happy to fix the code.

Monday, February 4, 2008 | examples, vida  

Archives