
Proper Analysis of Salary vs. Performance?

Got an email from Mebane Faber, who noted the roughly inverse correlation you currently see in salaryper and asked whether I’d done a proper year-end analysis. My response follows:

I threw the project together as sort of a fun thing, out of curiosity, and haven’t taken the time to do a proper analysis. However, you can see in the previous years that the inverse relationship shows up each year at the beginning of the season, and then as the season progresses, the big-market teams tend to mow down the small guys. Or at least the successful ones do; the correlation between salary and performance at the end of a season is generally pretty haphazard. In fact, it’s possible that the inverse correlation at the beginning of the season is actually stronger than the positive correlation at the end.

I think the last point is kinda funny, though I’d imagine there’s a less funny statistics term for that phenomenon. Such a fine line between funny and sounding important.
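For anyone who wants to try that comparison, one common measure is the Pearson correlation coefficient r between payroll and winning percentage: compute it early in the season and again at the end, then compare the signs and magnitudes. Below is a minimal sketch, assuming you already have matching lists of team payrolls and win percentages. The numbers in it are made up for illustration and are not salaryper data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and the two spread terms
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical payrolls (millions) and win percentages, illustrative only.
payroll = [200, 140, 95, 60]
april = [0.450, 0.480, 0.520, 0.560]      # inverse early-season pattern
september = [0.590, 0.540, 0.480, 0.470]  # big payrolls pull ahead

print(round(pearson_r(payroll, april), 3))
print(round(pearson_r(payroll, september), 3))
```

A negative r in April and a positive r in September, with the April value larger in magnitude, would be the "inverse correlation is stronger" case.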

Friday, June 6, 2008 | feedbag, salaryper  

Distribution of the foreign customers at a particular youth hostel

Two pieces representing youth hostel data from Julien Bayle, both adaptations of the code found in Visualizing Data. The first is a map:

[Image: Julien Bayle’s world map]

The map looks like most data plotted on a world map, but the second representation, a treemap, is much more effective, meaning that it answers his question much more directly.

[Image: Julien Bayle’s treemap]

Using an image as the background is a nice technique: if you’re not using colors to differentiate individual sectors, a treemap tends to be dominated by the outlines around the rectangles (search for treemap images and you’ll see what I mean). The background image lets you keep the border lines, but its visual weight keeps them from jumping to the foreground.
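For readers new to treemaps, the basic idea is that each item gets a rectangle whose area is proportional to its value. Here is a minimal sketch of one level of the simple slice-and-dice layout, assuming flat (label, value) pairs such as guest counts per country. It is not the code from Bayle’s pieces or from the book, and the country counts in the usage lines are hypothetical. This sketch picks the split direction from the box’s longer side; the classic version alternates direction at each level of a nested hierarchy.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    label: str
    x: float
    y: float
    w: float
    h: float


def slice_and_dice(items, x, y, w, h):
    """Split the box (x, y, w, h) into strips whose areas are proportional
    to the values in items, a list of (label, value) pairs."""
    total = sum(value for _, value in items)
    if total <= 0:
        raise ValueError("values must sum to a positive number")
    rects = []
    offset = 0.0
    for label, value in items:
        share = value / total
        if w >= h:
            # Wide box: lay strips out left to right
            rects.append(Rect(label, x + offset, y, w * share, h))
            offset += w * share
        else:
            # Tall box: lay strips out top to bottom
            rects.append(Rect(label, x, y + offset, w, h * share))
            offset += h * share
    return rects


# Hypothetical guest counts per country, illustrative only.
guests = [("France", 120), ("Germany", 60), ("Japan", 20)]
for rect in slice_and_dice(guests, 0, 0, 400, 300):
    print(rect)
```

Slice-and-dice is easy to follow but produces long, thin strips when values vary a lot, which is why many treemap tools use squarified layouts instead.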

Anyone else with adaptations? Pass them along.

Thursday, June 5, 2008 | adaptation, vida  

I Think Somebody Needs A Hug

I tend to avoid reading online comments since they’re either overly negative or overly positive (neither is healthy), but I laughed out loud after happening across this comment from a post about salaryper on the Freakonomics blog at the New York Times site:

How do I become a “data visualization guru?”
Seems like a pretty sweet gig. But you probably need a degree in Useless Plots from Superficial Analysis School.

– Ben D.

No, my friend, it takes a Ph.D. in Useless Plots from Superficial Analysis School. (And if you know this guy, please take him out for a drink; I’m concerned he’s been indoors too long.)

Thursday, June 5, 2008 | reviews, salaryper  

Obama Limited to 16 Bits

I guess I never thought I’d read about the 16-bit limitations of Microsoft Excel in the mainstream press (or at least outside the geek press), but here it is:

Obama’s January fundraising report, detailing the $23 million he raised and $41 million he spent in the last three months of 2007, far exceeded 65,536 rows listing contributions, refunds, expenditures, debts, reimbursements and other details.

For more than a decade, Excel has been limited to 65,536 rows, the number of distinct values you get when you represent the row number using two bytes (2^16). Mr. Millionsfromsmallcontributions has apparently flown past this limit in his FEC reports, forcing poor reporters to either use Microsoft Access (a database program) or upgrade to the recently released Excel 2007, where the row limit has in fact been raised to 1,048,576.
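The arithmetic behind the two-byte limit is easy to check: n rows numbered from zero need enough bits to hold n − 1. A minimal sketch using Python’s int.bit_length(); the function name is hypothetical, and this shows only the arithmetic, not how Excel actually stores row numbers internally. For comparison, Excel 2007 raised the limit to 1,048,576 rows, which is 2^20.

```python
def bits_needed(row_count):
    """Smallest number of bits that gives each of row_count rows a distinct
    index, counting from zero (rows 0 .. row_count - 1)."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    return max(1, (row_count - 1).bit_length())


print(bits_needed(65_536))     # 16: fits in two bytes
print(bits_needed(65_537))     # 17: one more row needs another bit
print(bits_needed(1_048_576))  # 20: Excel 2007's row count
```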

In the past, the argument against fixing the restriction was always a mixture of “it’s too messy to upgrade something like that” and “you shouldn’t have that many rows of data in a spreadsheet anyway; you should use a database.” Personally, I disagree with the latter, and as silly as the former sounds, it has held up for a good 20 years (or was the row limit even lower back then?).

The OpenOffice project, for instance, has an entire page dedicated to fixing the issue in OpenOffice Calc, where they’re limited to 30,000 rows. That limit is tied to 32,768, the number you get with 15 bits instead of 16: use the sixteenth bit as a sign bit indicating positive or negative, and you can represent numbers from -32,768 to 32,767 instead of the unsigned 16-bit range of 0 to 65,535.
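The signed and unsigned ranges above can be computed directly. The -32,768 to 32,767 range is the two’s-complement convention used by virtually all modern hardware, where one value more fits on the negative side. A minimal sketch with hypothetical function names; it says nothing about how Calc itself stores rows. The last function shows what happens when a count overflows a signed 16-bit field: one past the top wraps around to the bottom.

```python
def unsigned_range(bits):
    """Smallest and largest values an unsigned integer of this width holds."""
    return 0, 2 ** bits - 1


def signed_range(bits):
    """Range of a two's-complement signed integer: one bit goes to the sign."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def to_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement number,
    which is what a fixed-width signed field does on overflow."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


print(unsigned_range(16))    # (0, 65535)
print(signed_range(16))      # (-32768, 32767)
print(to_signed(32767, 16))  # 32767
print(to_signed(32768, 16))  # -32768
```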

Bottoms up for the first post tagged both “parse” and “politics”.

Thursday, June 5, 2008 | parse, politics  

What’s that big cloud over the rainforest?

As the .com shakeout loomed in the late 90s, I always assumed that:

  1. Most internet-born companies would disappear.
  2. Traditional (brick & mortar) stores would eventually get their act together and have (or outsource) a proper online presence. For instance, Barnes & Noble hobbled toward a usable site, while Borders simply gave up and turned its online presence over to Amazon. The former was comical, the latter brilliant, though Borders has just returned with its own non-Amazonian presence. (I think the humor in watching old-school companies try to move online is gone now.)
  3. Finally, a few new names (namely the biggest ones, like Amazon) would survive the disappearance described in point #1.

Basically, I assumed that not much would change: a couple of new brands would emerge, but there wasn’t really room in people’s heads for that many new retailers or services. (It probably didn’t help that all their logos were blue and orange, and that they had names like Flooz, Boo and Kibu that feel natural on the tongue and inspire buyer loyalty and confidence.)

But not only did more companies stick around; some seem to be successfully pivoting into other areas. From Amazon:

In January of 2008 we announced that the Amazon Web Services now consume more bandwidth than do the entire global network of Amazon.com retail sites.

This is from a blog post that includes a plot of bandwidth use for both sides of the business.

Did you imagine that the site where you could buy books cheaper than anywhere else in 1998 would, ten years later, push more bandwidth through its data storage and cloud computing services than through its retail sites? Of course, this announcement doesn’t say anything about their profits at this point, but I don’t think anyone expected Steve Jobs to turn Apple into a toy factory, turning out music players and cell phones that would become half its business within just a few years. (That’s half as in, “beastly silver PCs and shiny black and white laptops seem important and all, but those take real work…why bother?”)

But the point (aside from subjecting you to a long-winded description of .com history and my shortcomings as a futurist) has more to do with Amazon becoming a business that deals purely in information. The information economy is all about people moving bits and ideas around (abstractions of things) instead of silk, furs, and spices (actual physical things). And while books are information, the growth of Amazon’s data services business, as evidenced by that graph, is one of the strongest indicators I’ve seen of just how real the non-real information economy has become. Not that the information economy is new; rather, the groundwork laid over the preceding decades is what allows something like Amazon Web Services to succeed.

And since we’re on the subject of Amazon, I’ll close with more from Jeff Bezos from “How the Web Was Won” in this month’s Vanity Fair:

When we launched, we launched with over a million titles. There were countless snags. One of my friends figured out that you could order a negative quantity of books. And we would credit your credit card and then, I guess, wait for you to deliver the books to us. We fixed that one very quickly.

Or showing his genius early on:

When we started out, we were packing on our hands and knees on these cement floors. One of the software engineers that I was packing next to was saying, You know, this is really killing my knees and my back. And I said to this person, I just had a great idea. We should get kneepads. And he looked at me like I was from Mars. And he said, Jeff, we should get packing tables.
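The negative-quantity snag in the first Bezos quote is a classic missing input check: a negative quantity times a positive price turns a charge into a credit. A minimal sketch of the guard, using a hypothetical order_total function; nothing here is Amazon’s code. Decimal is used because binary floats are a poor fit for currency.

```python
from decimal import Decimal


def order_total(unit_price: Decimal, quantity: int) -> Decimal:
    """Charge for one line item, refusing quantities that would turn the
    charge into a credit to the customer's card."""
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(quantity, int) or isinstance(quantity, bool):
        raise TypeError("quantity must be an integer")
    if quantity < 1:
        raise ValueError(f"quantity must be at least 1, got {quantity}")
    return unit_price * quantity


print(order_total(Decimal("12.95"), 2))  # 25.90
```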

Thanks to Eugene for passing along the links.

Thursday, June 5, 2008 | goinuptotheserverinthesky, infographics, notaneconomist  

Movies, Mapping, and Motion Graphics

Elegantly done, and with some of the driest humor you might ever see in film titles: the opening sequence from Death at a Funeral.

Excellent (and appropriate) music, color, and type; it does a great job of setting up the film. The IMDb description:

Chaos ensues when a man tries to expose a dark secret regarding a recently deceased patriarch of a dysfunctional British family

Or the tagline:

From director Frank Oz comes the story of a family that puts the F U in funeral.

Tuesday, June 3, 2008 | mapping, motion, movies  

Mark in Madrid

Mark Hansen is one of the nicest and most intelligent people you’ll ever meet. He was one of the speakers at the symposium at last fall’s Visualizar workshop in Madrid, and Medialab Prado has now put the video of Mark’s talk (and others) online. Check it out:

Mark has a Ph.D. in Statistics, and along with UCLA courses like Statistical Computing and Advanced Regression, he has taught one called Database Aesthetics, which he describes a bit in his talk. You might also be familiar with his piece Listening Post, which he created with Ben Rubin.

Tuesday, June 3, 2008 | speaky  

Goodbye 15 minutes: 1.5 seconds is the new real time

As cited on Slashdot, Google has announced that they’ll be providing real-time stock quotes from NASDAQ. As the title suggests, this “real time” likely isn’t the same “real time” that financial institutions get for their “quotes,” since Google still needs to process the data and serve it up to you somehow. But for an old internet codger who thought quotes delayed by 15 minutes were pretty nifty back in 1995, this is just one more sign of the information apocalypse.

[Image: CNBC publicity photo of Allen Wastler]

The Wall Street Journal is also in on the gig, and Allen Wastler from CNBC crows that they’re a player too. Interestingly, the data will be free from the WSJ at their Markets Data Center page, one more sign of a Journal that’s continuing to open its grand oak doors to give us plebes a peek inside their exclusive club.

An earlier post from the Google blog has some interesting details:

As a result, we’ve worked with the SEC, the New York Stock Exchange (NYSE) and our D.C. trade association, NetCoalition, to find a way to bring stock data to Google users in a way that benefits users and is practical for all parties. We have encouraged the SEC to ensure that this data can be made available to our users at fair and reasonable rates, and applaud their recent efforts to review this issue. Today, the NYSE has moved the issue a great step forward with a proposal to the SEC which if approved, would allow you to see real-time, last-sale prices…

The NYSE hasn’t come around yet, but NASDAQ’s move should give it the additional competitive push to make it happen soon enough. It appears this had more to do with getting SEC approval than with the exchanges themselves. Which, if you think about it, makes sense; and if you think about it more, makes one wonder what sort of market-crashing scenario might be opened up by millions having access to the live data. Time to write that movie script.

At right: CNBC’s publicity photo of Allen Wastler, which appears to have been shot in the 1930s and later hand-colorized. Upon seeing it, Wastler was heard to say to the photo and paste-up people, “That’s amazing, can you also give me a stogie?” Who doesn’t want that coveted fat-cat, robber-baron blogger look?

Tuesday, June 3, 2008 | acquire  
Book

Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of the 50 states. Instead, it’s a jumping-off point for learning how to lay out data spatially.
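The core move in laying out data spatially is a linear re-mapping from data coordinates to pixels; Processing provides this as map(value, start1, stop1, start2, stop2). Below is a rough Python equivalent, plus a hypothetical helper that places a longitude/latitude point on an image, assuming an unprojected (equirectangular) base map. This is a sketch, not code from the book, whose examples are written in Processing.

```python
def remap(value, start1, stop1, start2, stop2):
    """Linearly re-map value from [start1, stop1] to [start2, stop2],
    like Processing's map()."""
    if start1 == stop1:
        raise ValueError("input range must not be empty")
    return start2 + (stop2 - start2) * (value - start1) / (stop1 - start1)


def to_screen(lon, lat, west, east, south, north, width, height):
    """Place a longitude/latitude point on a width x height image,
    assuming an equirectangular map covering west..east, south..north."""
    x = remap(lon, west, east, 0, width)
    # Screen y grows downward, so north maps to the top edge (y = 0)
    y = remap(lat, north, south, 0, height)
    return x, y


print(to_screen(0, 0, -10, 10, -10, 10, 200, 200))  # (100.0, 100.0)
```

A real map of the United States would normally use a projection rather than raw latitude and longitude, so the straight linear mapping here is a simplification.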

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5) and zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.
