Writing

Mapping Iran’s Online Public

[Image: mapping-iran-public-200px.jpg]

“Mapping Iran’s Online Public” is a fascinating (and very readable) paper from a study by John Kelly and Bruce Etling at Harvard’s Berkman Center. From the abstract:

In contrast to the conventional wisdom that Iranian bloggers are mainly young democrats critical of the regime, we found a wide range of opinions representing religious conservative points of view as well as secular and reform-minded ones, and topics ranging from politics and human rights to poetry, religion, and pop culture. Our research indicates that the Persian blogosphere is indeed a large discussion space of approximately 60,000 routinely updated blogs featuring a rich and varied mix of bloggers.

In addition to identifying four major poles (Secular/Reformist, Conservative/Religious, Persian Poetry and Literature, and Mixed Networks), the study reports a number of surprising findings, including details like the nature of discourse (such as the prominence of the poetry and literature category) and issues of anonymity:

…a minority of bloggers in the secular/reformist pole appear to blog anonymously, even in the more politically-oriented part of it; instead, it is more common for bloggers in the religious/conservative pole to blog anonymously. Blocking of blogs by the government is less pervasive than we had assumed.

They also produced images to represent the nature of the networks, seen in the thumbnail at right. The visualization is created with a force-directed layout that iteratively pulls data points closer together based on their content. It’s useful for this kind of study, where the intent is to represent or identify larger groups. In this case, the graphic supports what’s laid out in the text, but to me the most interesting part of the study is its human-centered work, such as the reviewing and categorizing of such a large number of sites by hand. It’s this background work that sets it apart from many other images like it, which tend to rely too heavily on automation.
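For readers who have not seen one, a force-directed layout of the kind mentioned above can be sketched in a few lines of Python. This is only an illustration and not the method used in the study: it assumes that relatedness is given as a list of links between nodes, and the function name `force_layout`, the node names, and the constants are all made up for the example. Every pair of nodes pushes apart, every linked pair pulls together, and the allowed movement shrinks on each pass so the picture settles.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, seed=0):
    """Return {node: (x, y)} from a simple force-directed layout.

    nodes is a list of names; edges is a list of (a, b) pairs.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between nodes

    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair pushes apart, most strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Attraction: linked nodes pull together, most strongly when far.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force

        # Cooling: cap how far a node may move, less on each pass.
        temp = 0.1 * (1.0 - step / iterations)
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy) or 1e-9
            move = min(length, temp)
            pos[n][0] += dx / length * move
            pos[n][1] += dy / length * move

    return {n: (x, y) for n, (x, y) in pos.items()}


# Two made-up groups of blogs, linked only within each group.
blogs = ["a1", "a2", "a3", "b1", "b2", "b3"]
links = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3")]
layout = force_layout(blogs, links)
for name in blogs:
    print(name, layout[name])
```

The linked nodes end up near one another and the two groups drift apart, which is why this kind of layout is handy for spotting larger clusters. It also shows the limitation: the layout only reflects whatever links it is fed, so the hand categorization still has to come from people.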

(The paper is from April 6, 2008 and I first heard about it after being contacted by John in June. Around 1999, our group had hosted students that he was teaching in a summer session for a visit to the Media Lab. And now, a few months later, I’m digging through my writing todo pile.)

Tuesday, August 26, 2008 | forcelayout, represent, social  

Surfing, Orgies, and Apple Pie

Obscenity law in the United States is based on Miller vs. California, a precedent set in 1973:

“(a) whether the ‘average person, applying contemporary community standards’ would find that the work, taken as a whole, appeals to the prurient interest,

(b) whether the work depicts or describes, in a patently offensive way, sexual conduct specifically defined by the applicable state law, and

(c) whether the work, taken as a whole, lacks serious literary, artistic, political, or scientific value.”

Of course, the definition of an average person or community standards isn’t quite as black and white as most Supreme Court decisions. In a new take, the lawyer defending the owner of a pornography site in Florida is using Google Trends to produce what he feels is a more accurate definition of community standards:

In the trial of a pornographic Web site operator, the defense plans to show that residents of Pensacola are more likely to use Google to search for terms like “orgy” than for “apple pie” or “watermelon.” The publicly accessible data is vague in that it does not specify how many people are searching for the terms, just their relative popularity over time. But the defense lawyer, Lawrence Walters, is arguing that the evidence is sufficient to demonstrate that interest in the sexual subjects exceeds that of more mainstream topics — and that by extension, the sexual material distributed by his client is not outside the norm.

Below, “surfing” in blue, “orgy” in red, and “apple pie” in orange:

[Image: viz-500.png]

A clever defense. The trends can also be localized to roughly the size of a large city or county, which arguably might be considered the “community.” The New York Times article continues:

“Time and time again you’ll have jurors sitting on a jury panel who will condemn material that they routinely consume in private,” said Mr. Walters, the defense lawyer. Using the Internet data, “we can show how people really think and feel and act in their own homes, which, parenthetically, is where this material was intended to be viewed,” he added.

Fascinating that there could actually be something even remotely quantifiable about community standards. “I know it when I see it” is inherently subjective, so is any introduction of objectivity an improvement? For more perspective, I recommend this article from FindLaw, which describes the history of “Movie Day” at the Supreme Court and the evolution of obscenity law.

The trends data has many inherent problems (lack of detail for one), but is another indicator of what we can learn from Google. Most important to me, the case provides an example of what it means for search engines to capture this information, because it demonstrates to the public at large (not just people who think about data all day) how the information can be used. As more information is collected about us, search engine data provides an imperfect mirror onto our society, previously known only to psychiatrists and priests.

Tuesday, June 24, 2008 | online, privacy, retention, social  

Somewhere between graffiti and terrorism

Matt Mullenweg, creator of WordPress, speaking at the “Future of Web Apps” conference in February:

Spammers are “the terrorists of Web 2.0,” Mullenweg said. “They come into our communities and take advantage of our openness.” He suggested that people may have moved away from e-mail and toward messaging systems like Facebook messaging and Twitter to get away from spam. But with all those “zombie bites” showing up in his Facebook in-box, he explained, the spammers are pouncing on openness once again.

I don’t think that “terrorists” is the right word—they’re not taking actions with an intent to produce fear that will prevent people from using online communities (much less killing bloggers or kidnapping Facebook users). What I like about this quote is the idea that “they take advantage of openness,” which puts it well. There needs to be a harsher way to describe this situation than “spamming” which suggests a minor annoyance. There’s nothing like spending a Saturday morning cleaning out the Processing discussion board, or losing an afternoon modifying the bug database to keep it safer from these losers. It’s a bit like people who crack machines out of maliciousness or boredom—it’s incredibly time consuming to clean up the mess, and incredibly frustrating when it’s something done in your spare time (like Processing) or to help out the group (during grad school at the ACG).

So it’s somewhere between graffiti and terrorism, but it doesn’t match either because the social impact at either end of that scale is incredibly different (graffiti can be a positive thing, and terrorism is a real world thing where people die).

On a more positive note, and for what it’s worth, I highly recommend WordPress. It’s obvious that it’s been designed and built by people who actually use it, which means that the interface is pleasantly intuitive. And not surprising that it was initially created by such a character.

Monday, June 9, 2008 | online, social  

Are you a member of Facebook.com? You may have a lifetime contract

A New York Times article from February about the difficulty of removing your personal information from Facebook. I believe that in the days that followed Facebook responded by making it ever-so-slightly possible to actually remove your account (though still not very easy).

Further, there is the network effect of information that’s not “just” your own. Deleting a Facebook profile does not appear to delete posts you’ve made to “the wall” of any friends, for instance. Do you own those comments? Does your friend? It’s a somewhat similar situation in other areas—even if I chose not to have a Gmail account, because I don’t like their data retention policy, all my email sent to friends with Gmail accounts is subject to those terms I’m unhappy with.

Regardless, this is an enormous issue as we put more of our data online. What does it mean to have this information public? What happens when you change your mind?

Facebook stands out because it’s a scenario of starting college (at age 17 or 18 or now even earlier), having a very different view of what’s public and private, and that evolving over time. You may not care to have things public at the time, but one of the best things about college (or high school, for that matter) is that you move on. Having a log of your outlook and attitude, with photos to prove it, stored on a company’s servers means that there are more permanent memories of the time which are out of your control. (And you don’t know who else besides Facebook is storing it: search engine caches, companies doing data mining, etc. all take a role here.) Your own memories might be lost to alcohol or willful forgetfulness, but digital copies don’t behave the same way.

The bottom line is an issue of ownership of one’s own personal information. At this point, we’re putting more information online—whether it’s Facebook or having all your email stored by Gmail—but we haven’t figured out what that really means.

Saturday, March 15, 2008 | privacy, retention, social  
Book

Visualizing Data is my book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. Unlike nearly all books in this field, it is a hands-on guide intended for people who want to learn how to actually build a data visualization.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. People who have purchased the book can find the examples here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. The rest of the book applies these ideas to a series of projects, starting with a simple mapping project (Chapter 3) to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site will be used for follow-up code and writing about related topics.