<< ben fry

genomic cartography

This is a short, draft-quality writeup that describes my dissertation direction and what I call "genomic cartography." The document was prepared as part of my general exam, and describes the redesign of gff2ps, a commonly used program for depicting sequence data. It is not a paper that has been through a full review process.

The completion of a genome sequencing project is often followed by the publication of a paper detailing the process and the notable observations that can be made about the newly acquired data set. More significant projects are typically presented in the prestigious scientific journals Nature and Science, as cover stories that detail what the new findings offer the scientific community.

In many such cases, the editors have seen fit to publish an image of the data itself as a large-format figure accompanying the paper, ostensibly to give the reader a high-level understanding of how the new data set fits together. Of course, these images are also meant to be enjoyed subjectively: they often take the form of a poster, which might find a place on the wall of the reader's office or lab. They serve not so much as a day-to-day research tool but as an image that provides a feel for the data, a sense of what a genome 'looks' like.

Considering the circulation (64,000 subscribers for Nature, 150,000 for Science) and the broad scientific audience of the two publications, these maps are arguably the most widely known information graphics for genetic data. However, there has been little analysis of how they are implemented. It would appear that the level of rigor applied to the content of the journals has not been applied to these images. Effective communication in the text of an article is tied to explicitly stated editorial principles of clarity and brevity, and the same high standard should be applied to the figures that accompany it. While perhaps not the most difficult or pressing challenge in genomic visualization, the widespread use of these large-format images makes them a useful starting point for research into how to improve such data visualizations.

A useful analogy is mapping and cartography. As with a genome, data for geographic features is notoriously voluminous, and the resulting representation is extremely dense. Yet cartographers have mastered the ability to organize geographic data in a manner that communicates effectively. Cartography is a useful model because it synthesizes illustration, information design, and statistics, and most often employs technological tools for implementation.