Writing

Parsing Numbers by the Bushel

While taking a look at the code mentioned in the previous post, I noticed two things. First, the PointCloud.pde file drops directly into OpenGL-specific code (rather than the Processing API) for the sake of speed when drawing thousands and thousands of points. It’s further proof that I need to finish the PShape class for Processing 1.0, which will handle this sort of thing automatically.

Second is a more general point about parsing. This isn’t intended as a nitpick on Aaron’s code (it’s commendable that he put his code out there for everyone to see—and uh, nitpick about). But seeing how it was written reminded me that most people don’t know about the casts in Processing, particularly when applied to whole arrays, and this can be really useful when parsing data.

To convert a String to a float (or int) in Processing, you can use a cast, for instance:

String s = "667.12";
float f = float(s);

This also in fact works with String[] arrays, like the kind returned by the split() method while parsing data. For instance, in SceneViewer.pde, the code currently reads:

String[] thisLine = split(raw[i], ",");
points[i * 3] = new Float(thisLine[0]).floatValue() / 1000;
points[i * 3 + 1] = new Float(thisLine[1]).floatValue() / 1000;
points[i * 3 + 2] = new Float(thisLine[2]).floatValue() / 1000;

Which could be written more cleanly as:

String[] thisLine = split(raw[i], ",");
float[] f = float(thisLine);
points[i * 3 + 0] = f[0] / 1000;
points[i * 3 + 1] = f[1] / 1000;
points[i * 3 + 2] = f[2] / 1000;

However, to his credit, Aaron may have intentionally skipped it in this case since he doesn’t need the whole line of numbers.

Or, if you’re using the Processing API with Eclipse or some other IDE, the float() cast won’t work for you, because that syntax is handled by the Processing preprocessor. In that case, substitute the parseFloat() method for float():

String[] thisLine = split(raw[i], ",");
float[] f = parseFloat(thisLine);
points[i * 3 + 0] = f[0] / 1000;
points[i * 3 + 1] = f[1] / 1000;
points[i * 3 + 2] = f[2] / 1000;
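
If the Processing library isn’t on hand at all, the whole-array conversion can be approximated in plain Java. This is only a rough sketch: the class name LineToPoints and the helpers toFloats() and toPoints() are made up for illustration and are not part of the Processing API, and the real parseFloat() may treat edge cases differently. It assumes every line has at least three comma-separated fields.

```java
public class LineToPoints {
    // Hypothetical stand-in for the array form of parseFloat(): converts
    // every entry, using NaN where an entry is not a valid number.
    public static float[] toFloats(String[] fields) {
        float[] out = new float[fields.length];
        for (int i = 0; i < fields.length; i++) {
            try {
                out[i] = Float.parseFloat(fields[i].trim());
            } catch (NumberFormatException e) {
                out[i] = Float.NaN;
            }
        }
        return out;
    }

    // Fill a flat x, y, z array from comma-separated lines, dividing by 1000
    // as in the snippets above. Assumes at least three fields per line.
    public static float[] toPoints(String[] raw) {
        float[] points = new float[raw.length * 3];
        for (int i = 0; i < raw.length; i++) {
            float[] f = toFloats(raw[i].split(","));
            points[i * 3 + 0] = f[0] / 1000;
            points[i * 3 + 1] = f[1] / 1000;
            points[i * 3 + 2] = f[2] / 1000;
        }
        return points;
    }

    public static void main(String[] args) {
        String[] raw = { "1000,2000,3000", "500,250,125" };
        // prints [1.0, 2.0, 3.0, 0.5, 0.25, 0.125]
        System.out.println(java.util.Arrays.toString(toPoints(raw)));
    }
}
```

The point is the same as with the cast: convert the whole line once, then pick out the values you need, rather than wrapping each field in its own conversion call.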

The same can be done for int, char, byte, and boolean. You can also go the other direction by converting float[] or int[] arrays to String[] arrays using the str() method. (The method is named str() because a String() cast would be awkward, a string() cast would be error prone, and it’s not really parseStr() either.)

When using parseInt() and parseFloat() (versus the int() and float() casts), it’s also possible to include a second parameter that specifies a “default” value for missing data. Normally, the default is Float.NaN for parseFloat(), or 0 with parseInt() and the others. When parsing integers, 0 and “no data” often have very different meanings, in which case this can be helpful.
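
To make the difference between 0 and “no data” concrete, here is a small plain-Java sketch of the idea. The class ParseWithDefault, the helper parseInts(), and the choice of -1 as the placeholder are all assumptions for illustration; this mimics the behavior described above rather than reproducing the Processing implementation.

```java
public class ParseWithDefault {
    // Hypothetical helper (not the Processing API): convert each entry to an
    // int, substituting `missing` when the entry is empty or not a number.
    public static int[] parseInts(String[] fields, int missing) {
        int[] out = new int[fields.length];
        for (int i = 0; i < fields.length; i++) {
            try {
                out[i] = Integer.parseInt(fields[i].trim());
            } catch (NumberFormatException e) {
                out[i] = missing;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The second field is empty ("no data"), the third is a real 0.
        // The -1 limit keeps empty fields that split() would otherwise drop
        // from the end of a line.
        String[] fields = "12,,0".split(",", -1);
        int[] values = parseInts(fields, -1);
        // prints [12, -1, 0]
        System.out.println(java.util.Arrays.toString(values));
    }
}
```

With a default of 0, the second and third values would be indistinguishable. A placeholder like -1 only works if it can never show up as real data, so pick it with the data set in mind.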

Tuesday, July 15, 2008 | parse  
Book

Visualizing Data is my 2007 book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. When first published, it was the only book for people who wanted to learn how to actually build a data visualization in code.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.