It’s April again, which means that there are messages lurking in my inbox asking about the whereabouts of this year’s Salary vs. Performance project (found in Chapter 5 of the good book). I got around to updating it a few days ago, which means now my inbox has changed to suggestions on how the piece might be improved. (It’s tempting to say, “Hey! Check out the book and the code, you can do anything you’d like with it! It’s more fun that way.” but that’s not really what they’re looking for.)
One of the best messages I’ve received so far is from someone who I strongly suspect is a statistician, who was wishing to see a scatter plot of the data rather than its current representation. Who else would be pining for a scatterplot? There are lots of jokes about the statistically inclined that might cover this situation, but… we’re much too high minded to let things devolve to that (actually, it’s more of a pot-kettle-black situation). If prompted, statisticians usually tell better jokes about themselves anyways.
At any rate, as it’s relevant to the issue of how you choose representations, my response follows:
Sadly, the scatter plot of the same data is actually kinda uninformative, since one of your axes (salary) is more or less fixed all season (might change at the trade deadline, but more or less stays fixed) and it’s just the averages that move about. So in fact if we’re looking for more “accurate”, a time series is gonna be better for our purposes. In an actual analytic piece, for instance, I’d do something very different (which would include multiple years, more detail about the salaries and how they amortize over time, etc).
But even so, making the piece more “correct” misses the intentional simplifications found in it, e.g. it doesn’t matter whether a baseball team was 5% away from winning, it only matters whether they’ve won. At the end of the day, it’s all about the specific rankings, who gets into the playoffs, and who wins those final games. Since the piece isn’t intended as an analytical tool, but something that conveys the idea of salary vs. performance to an audience who by and large cares little about 1) baseball and 2) stats. That’s not to say that it’s about making something zoomy and pretty (and irrelevant), but rather, how do you engage people with the data in a way that teaches them something in the end and gets them thinking about it.
Now to get back to my inbox and the guy who would rather have the data sonified since he thinks this visual thing is just a fad.