Writing

Li’l Endian

GulliverChapters 9 and 10 (acquire and parse) are secretly my favorite parts of Visualizing Data. They’re a grab bag of useful bits based on many years of working with information (previous headaches)… the sort of things that come up all the time.

Page 327 (Chapter 10) has some discussion about little endian versus big endian, the way in which different computer architectures (Intel vs. the rest of the world, respectively) handle multi-byte binary data. I won’t repeat the whole section here, though I have two minor errata for that page.

First, an error in formatting which lists network byte order, rather than network byte order. The other problem is that I mention that little endian versions of Java’s DataInputStream class can be found on the web for little more than a search for DataInputStreamLE. As it turns out, that was a big fat lie, though you can find a handful if you search for LEDataInputStream (even though that’s a goofier name).

To make it up to you, I’m posting proper DataInputStreamLE (and DataOutputStreamLE) which are a minor adaptation of code from the GNU Classpath project. They work just like DataInputStream and DataOutputStream, but just swap the bytes around for the Intel-minded. Have fun!

DataInputStreamLE.java

DataOutputStreamLE.java

I’ve been using these for a project and they seem to be working, but let me know if you find errors. In particular, I’ve not looked closely at the UTF encoding/decoding methods to see if there’s anything endian-oriented in there. I tried to clean it up a bit, but the javadoc may also be a bit hokey.

(Update) Household historian Shannon on the origin of the terms:

The terms “big-endian” and “little-endian” come from Gulliver’s Travels by Jonathan Swift, published in England in 1726. Swift’s hero Gulliver finds himself in the midst of a war between the empire of Lilliput, where people break their eggs on the smaller end per a royal decree (Protestant England) and the empire of Blefuscu, which follows tradition and breaks their eggs on the larger end (Catholic France). Swift was satirizing Henry VIII’s 1534 decision to break with the Roman Catholic Church and create the Church of England, which threw England into centuries of both religious and political turmoil despite the fact that there was little doctrinal difference between the two religions.

Friday, March 7, 2008 | code, parse, updates, vida  
Book

Visualizing Data Book CoverVisualizing Data is my 2007 book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. When first published, it was the only book(s) for people who wanted to learn how to actually build a data visualization in code.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next is (chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.