writing | ben fry

Writing

Our Gattaca future begins with our sports heroes

The New York Times this morning documents Major League Baseball’s use of DNA tests to verify the age of baseball prospects:

Dozens of Latin American prospects in recent years have been caught purporting to be younger than they actually were as a way to make themselves more enticing to major league teams. Last week the Yankees voided the signing of an amateur from the Dominican Republic after a DNA test conducted by Major League Baseball’s department of investigations showed that the player had misrepresented his identity.

Some players have also had bone scans to be used in determining age range.

(Why does a “bone scan” sound so painful? “You won’t provide a DNA sample? Well, maybe you’ll change your mind after the bone scan!”)

Kathy Hudson of Johns Hopkins notes the problem with testing:

“The point of [the Genetic Information Nondiscrimination Act, passed last year] was to remove the temptation and prohibit employers from asking or receiving genetic information.”

The article continues and makes note of the fact that such tests are also used to determine whether a player’s parents are his real parents, which can have an upsetting outcome.

But perhaps the broader concern (outside broken homes) and the scarier motivation for expansion of such testing is noted by a scouting director (not named), who comments:

“Can they test susceptibility to cancer? I don’t know if they’re doing any of that. But I know they’re looking into trying to figure out susceptibility to injuries, things like that. If they come up with a test that shows someone’s connective tissue is at a high risk of not holding up, can that be used? I don’t know. I do think that’s where this is headed.”

Injury is perhaps the most significant, yet most random, factor in scouting. If we’re talking about paying someone $27 million, will the threat of a federal discrimination law (wielded by a young player and agent) really be enough to keep teams away from this?

Wednesday, July 22, 2009 | genetics, sports

Flu headed to the dustbin of disease history?

And is disease history stored in a dustbin, for that matter?

Researchers at Dana-Farber may have found influenza’s weak spot, which could lead to a vaccine:

Yearly vaccination is currently needed because different strains of the virus circulate around the world regularly, owing to the germs’ rapidly changing genetic makeup. But the researchers reported yesterday that they had found one pocket of the virus that appears to remain static in multiple strains, making it an attractive target for a vaccine, as well as drugs.

And instead of fighting the primary part of the virus head on, you figure out a way to attack a portion in the weaker part that does not mutate, and neutralize it:

Most vaccines work by revving up the body’s disease-fighting cells, helping them to recognize and rapidly neutralize invading germs. The researchers realized that the disease fighters generated by existing flu vaccines – which contain killed or weakened whole viruses – head straight toward the biggest target, the globular head. It is, in effect, a Trojan horse that prevents the body’s immune system from directing more of its firepower toward the stalk of the [virus], where the scientists found the pocket that was so static. That site contains machinery that lets the virus penetrate human cells.

A vaccine is a way off, but they say it should be possible to make a drug that helps the body create antibodies to fight off the flu sooner than that. Incredible work.

Monday, February 23, 2009 | genetics

F* Everything, We’re Doing 44 Vertebrates

From an announcement email sent this week by the folks behind the UCSC Genome Browser project:

We are pleased to announce the release of a new Conservation track based on the human (hg18) assembly. This track shows multiple alignments of 44 vertebrate species and measurements of evolutionary conservation using two methods (phastCons and phyloP) from the PHAST package, for all species (vertebrate) and two subsets (primate and placental mammal). The multiple alignments were generated using multiz and other tools in the UCSC/Penn State Bioinformatics comparative genomics alignment pipeline. Conserved elements identified by phastCons are also displayed in this track. For more details, please visit the track description page…

It’s the comparative genomics equivalent of “Fuck Everything, We’re Doing Five Blades,” an editorial penned by James M. Kilts (President and CEO of Gillette) for The Onion. Kilts writes:

Would someone tell me how this happened? We were the fucking vanguard of shaving in this country. The Gillette Mach3 was the razor to own. Then the other guy came out with a three-blade razor. Were we scared? Hell, no. Because we hit back with a little thing called the Mach3Turbo. That’s three blades and an aloe strip. For moisture. But you know what happened next? Shut up, I’m telling you what happened—the bastards went to four blades. Now we’re standing around … selling three blades and a strip. Moisture or no, suddenly we’re the chumps. Well, fuck it. We’re going to five blades.

44 species, sittin’ in a tree

Conservation tracks in the human genome are simply additional lines of annotation shown alongside the human DNA sequence. The lines show identical areas of near-similar DNA found in other species (in this case 44 vertebrates). In the past we might have looked at two, three, seven, maybe a dozen different species in a row. UCSC had actually been up to 27 different species at a time before they took the extra push over the cliff to 44.

As it turns out, just sequencing the human genome isn’t all that interesting. It only starts to get interesting in the context of other genomes from other species. With multiple species, the data can be compared and evolutionary trees drawn. We can take an organism that we know a lot about, say the fruit fly, and compare its genes (which have been studied extensively) to the genetic code of humans (who have been studied less), and we can look for similar regions. For instance, the HOX family of genes is involved in structure and limb development. A similar region can be found in humans, insects, and many things in between. How cool is that?

Further, how about all that “junk” DNA? A particular portion of DNA might have no known function, but if you find an area where the data matches (is conserved) with another species, then it might not be quite as irrelevant as previously thought (and for the record, the term junk is only used in the media). If you see that it’s highly conserved (a large percentage is identical) across many different species, then you’re probably onto something, and it’s time to start digging further.
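To make "highly conserved" a little more concrete, here is a minimal sketch that scores each column of a small multiple alignment by the fraction of species sharing the most common base. This is only the naive percent-identity idea from the paragraph above, not the phastCons or phyloP methods used by the UCSC track; the function name and the toy sequences are made up for illustration.

```python
from collections import Counter


def column_conservation(alignment):
    """Return, for each alignment column, the fraction of species that
    share the most common base in that column.

    `alignment` maps a species name to its aligned sequence. All sequences
    must have the same length. Gap characters ('-') never count as a match.
    """
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must all be the same length")

    scores = []
    for column in zip(*sequences):
        bases = [base.upper() for base in column if base != "-"]
        if not bases:
            scores.append(0.0)
            continue
        most_common_count = Counter(bases).most_common(1)[0][1]
        # Divide by the number of species, so gaps lower the score.
        scores.append(most_common_count / len(column))
    return scores


# Toy alignment with invented sequences (not real genome data).
toy_alignment = {
    "human": "ACGTTGCA",
    "chimp": "ACGTTGCA",
    "mouse": "ACGATGCA",
    "dog":   "ACG-TGTA",
}

scores = column_conservation(toy_alignment)
for position, score in enumerate(scores):
    print(position, score)
```

Columns that score 1.0 are identical in every species; a long run of those in a region with no known function is the kind of thing that would make you start digging.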

Spending time with data like this really highlights the silliness of anti-evolution claims. It’s tough to argue with being able to see it. Unfortunately most of the work I’ve done in this area isn’t documented properly, though you can see human/chimp/dog/mouse alignments in this genome browser, a dozen mammals aligned in this illustration, or humans and chimps in this piece.

As an aside, a few months after the Onion article, Gillette really did go to five blades with their Fusion razor. And happily, the (real) CEO speaks with the same bravado as the earlier editorial:

“The Schick launch has nothing to do with this, it’s like comparing a Ferrari to a Volkswagen as far as we’re concerned,” Chairman, President and Chief Executive James Kilts, told Reuters.

And why isn’t that guy doing their ads instead of those other namby-pambies?

Wednesday, February 4, 2009 | genetics

Hide the bipolar data, here comes bioinformatics!

I was fascinated a few weeks ago to receive this email from the Genome-announce list at UCSC:

Last week the National Institutes of Health (NIH) modified their policy for posting and accessing genome-wide association studies (GWAS) data contained in NIH databases. They have removed public access to aggregate genotype GWAS data in response to the publication of new statistical techniques for analyzing dense genomic information that make it possible to infer the group assignment (case vs. control) of an individual DNA sample under certain circumstances. The Wellcome Trust Case Control Consortium in the UK and the Broad Institute of MIT and Harvard in Boston have also removed aggregate data from public availability. Consequently, UCSC has removed the “NIMH Bipolar” and “Wellcome Trust Case Control Consortium” data sets from our Genome Browser site.

The ingredients for a genome-wide association study are a few hundred people, and a list of what genetic letter (A, C, G, or T) is found at a few hundred specific locations in the DNA of each of those people. Such data is then correlated to whether individuals have a particular disease, and using the correlation, it’s possible to sometimes localize what part of the genome is responsible for the disease.
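A rough sketch of that cartoon version in code: for each location, count how often the most common letter shows up in people with and without the disease, and score the difference with a Pearson chi-square statistic on the resulting 2x2 table. Everything here is an assumption for illustration (one letter per person per location, invented data, hypothetical function names); real studies use two alleles per person, far more locations, and corrections for testing many locations at once.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # no variation in a row or column, so nothing to test
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def association_scan(letters_by_person, has_disease):
    """Score every location by how strongly its letter tracks disease status."""
    people = list(letters_by_person)
    n_locations = len(letters_by_person[people[0]])
    scores = []
    for i in range(n_locations):
        column = [letters_by_person[p][i] for p in people]
        common = Counter(column).most_common(1)[0][0]
        # Rows: cases and controls. Columns: common letter and anything else.
        a = sum(1 for p in people if has_disease[p] and letters_by_person[p][i] == common)
        b = sum(1 for p in people if has_disease[p] and letters_by_person[p][i] != common)
        c = sum(1 for p in people if not has_disease[p] and letters_by_person[p][i] == common)
        d = sum(1 for p in people if not has_disease[p] and letters_by_person[p][i] != common)
        scores.append(chi_square_2x2(a, b, c, d))
    return scores


# Invented data: eight people, three locations.
letters_by_person = {
    "p1": "AGC", "p2": "AGC", "p3": "AGT", "p4": "AGT",  # cases
    "p5": "ATC", "p6": "ATC", "p7": "ATT", "p8": "ATT",  # controls
}
has_disease = {p: p in ("p1", "p2", "p3", "p4") for p in letters_by_person}

scores = association_scan(letters_by_person, has_disease)
best_location = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_location)
```

In the toy data only the middle location separates cases from controls, so it is the one that gets a nonzero score.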

Of course, the diseases might be of a sensitive nature (e.g. bipolar disorder), so when such data is made publicly available, it’s done in a manner that protects the privacy of the individuals in the data set. What this message means is that a bioinformatics method has been developed that undermines those privacy protections. An amazing bit of statistics!

This made me curious about what led to such a result, so with a little digging, I found this press release, which describes the work:

A team of investigators led by scientists at the Translational Genomics Research Institute (TGen) have found a way to identify possible suspects at crime scenes using only a small amount of DNA, even if it is mixed with hundreds of other genetic fingerprints.

Using genotyping microarrays, the scientists were able to identify an individual’s DNA from within a mix of DNA samples, even if that individual represented less than 0.1 percent of the total mix, or less than one part per thousand. They were able to do this even when the mix of DNA included more than 200 individual DNA samples.

The discovery could help police investigators better identify possible suspects, even when dozens of people over time have been at a crime scene. It also could help reassess previous crime scene evidence, and it could have other uses in various genetic studies and in statistical analysis.

So the CSI folks have screwed it up for the bipolar folks. The titillatingly-titled “Resolving Individuals Contributing Trace Amounts of DNA to Highly Complex Mixtures Using High-Density SNP Genotyping Microarrays” can be found at PLoS Genetics, and a PDF describing the policy changes is on the NIH’s site for Genome-Wide Association Studies. The PDF provides a much more thorough explanation of what association studies are, in case you’re looking for something better than my cartoon version described above.

Links to much more coverage can be found here, which includes major journals (Nature) and mainstream media outlets (LA Times, Financial Times) weighing in on the research. (It’s always funny to see how news outlets respond to this sort of thing: the Financial Times talks about the positive side, while the LA Times focuses exclusively on the negative.) A discussion about the implications of the study can also be found on the PLoS site, with further background from the study’s primary author.

Science presents such fascinating contradictions. A potentially helpful advance that undermines another area of research. The breakthrough that opens a Pandora’s Box. It’s probably rare to see such a direct contradiction (that’s not heavily politicized like, say, stem cell research), but the social and societal impact is undoubtedly one of the things I love most about genetics in particular.

Tuesday, September 16, 2008 | genetics, mine, privacy, science

Paternalism at the state level and the definition of “advice”

Following up on an earlier post, The New York Times jumps in with more about California (and New York before it) shutting down personal genomics companies, including this curious definition of advice:

“We think if you’re telling people you have increased risk of adverse health effects, that’s medical advice,” said Ann Willey, director of the office of laboratory policy and planning at the New York State Department of Health.

The dictionary confirmed my suspicion that advice refers to “guidance or recommendations concerning prudent future action,” which doesn’t coincide with telling people they have increased risk for a disease. If they told you to take medication based on that risk, it would most certainly be advice. But as far as I know, the extent of the advice given by these companies is to consult a doctor for…advice.

As in the earlier post, the health department in California continues to sound nutty:

“We started this week by no longer tolerating direct-to-consumer genetic testing in California,” Karen L. Nickel, chief of laboratory field services for the state health department, said during a June 13 meeting of a state advisory committee on clinical laboratories.

We will not tolerate it! These tests are a scourge upon our society! The collapse of the housing loan market, high gas prices, and the “great trouble or suffering” brought on by this beast that preys on those with an excess of disposable income. Someone has to save these people who have $1000 to spare on self-curiosity! And the poor millionaires spending $350,000 to get their genome sequenced by Knome. Won’t someone think of the millionaires!?

I wish I still lived in California, because then I would know someone was watching out for me.

For the curious, the letters sent to the individual companies can be found here; sadly, they aren’t any more insightful than the comments to the press. But speaking of scourge: the notices are all Microsoft Word files.

One interesting tidbit closing out the Times article:

Dr. Hudson [director of the Genetics and Public Policy Center at Johns Hopkins University] said it was “not surprising that the states are stepping in, in an effort to protect consumers, because there has been a total absence of federal leadership.” She said that if the federal government assured tests were valid, “paternalistic” state laws could be relaxed “to account for smart, savvy consumers” intent on playing a greater role in their own health care.

It’s not clear whether this person is just making a trivial dig at the federal government or whether this is the root of the problem. In the previous paragraph she’s being flippant about “Genes R Us” so it might be just a swipe, but it’s an interesting point nonetheless.

Thursday, June 26, 2008 | genetics, government, privacy, science

Personal genetic testing gets hilarious before it gets real

Before I even had a chance to write about personal genomics companies 23andMe, Navigenics, and deCODEme, Forbes reports that the California Health Department is looking to shut them down:

This week, the state health department sent cease-and-desist letters to 13 such firms, ordering them to immediately stop offering genetic tests to state residents.

Because of advances in genotyping, it’s possible for companies to detect changes from half a million data points (or soon, a million) of a person’s genome. The idea behind genotyping is that you look only for the single letter changes (SNPs) that are more likely to be unique between individuals, and then use that to create a profile of similarities and differences. So companies have sprung up, charging $1000 (ok, $999) a pop to decode these bits of your genome. It can then tell you some basic things about ancestry, or maybe a little about susceptibility for certain kinds of diseases (those that have a fairly simple genetic makeup—of which there aren’t many, to be sure).
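As a rough illustration of the "profile of similarities and differences" idea, the sketch below compares two genotype profiles at the SNPs they have in common. The SNP identifiers, letters, and function name are all invented for the example, and it assumes one letter per SNP, which is much simpler than what a genotyping chip actually reports.

```python
def compare_profiles(profile_a, profile_b):
    """Compare two genotype profiles keyed by SNP identifier.

    Returns the fraction of shared SNPs where the letters match, plus the
    sorted list of shared SNPs where they differ.
    """
    shared = sorted(set(profile_a) & set(profile_b))
    differing = [snp for snp in shared if profile_a[snp] != profile_b[snp]]
    matching = len(shared) - len(differing)
    similarity = matching / len(shared) if shared else 0.0
    return similarity, differing


# Hypothetical SNP identifiers and letters, for illustration only.
person_a = {"snp1": "A", "snp2": "G", "snp3": "T", "snp4": "C"}
person_b = {"snp1": "A", "snp2": "C", "snp3": "T", "snp5": "G"}

similarity, differing = compare_profiles(person_a, person_b)
print(similarity, differing)
```

Only SNPs present in both profiles are compared, so the two people here are scored on three locations and differ at one of them.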

Lea Brooks, spokesperson for the California Health Department, confirmed for Wired that:

…the investigation began after “multiple” anonymous complaints were sent to the Health Department. Their researchers began with a single target but the list of possible statute violators grew as one company led to another.

Listen folks, this is not just one California citizen, but two or more anonymous persons! Perhaps one of them was a doctor or insurance firm that has been denied its cut of the $1000:

One controversy is that some gene testing Web sites take orders directly from patients without a doctor’s involvement.

Well now, that is a controversy! Genetics has been described as the future of medicine, and yet traditional drainers of wallets (is drainer a word?) in the current health care system have been sadly neglected. The Forbes article also describes the nature of the complaints:

The consumers were unhappy about the accuracy [of the tests] and thought they cost too much.

California residents will surely be pleased that the health department is taking a hard stand on the price of boutique self-testing. As soon as they finish off these scientifimagical “genetic test” goons, we could all use a price break on home pregnancy tests.

And as to the accuracy of such tests, or what can be ascertained from them? That’s certainly been a concern of the genetics community, and in fact 23andMe has “admitted its tests are not medically useful, as they represent preliminary findings, and so are merely for educational purposes.” That is perfectly clear to someone visiting their site, but it presents a bigger problem:

“These businesses are apparently operating without a clinical laboratory license in California. The genetic tests have not been validated for clinical utility and accuracy,” says Nickel.

So an accurate, clinical-level test is illegal. But a less accurate, do-it-yourself (without a doctor) test is also illegal. And yet, California’s complaint gets more bizarre:

“And they are scaring a lot of people to death.”

Who? The people who were just complaining about the cost of the test? That’s certainly a potential problem if you don’t do testing through a doctor—and in fact, it’s a truly significant concern. But who purchases a $999 test from a site with the cartoon characters seen above to check for Huntington’s disease?

And if “scaring people” were the problem, don’t you think the papers and the nightly news would be all over it? The only thing they love more than a new scientific technology that’s going to save the world is a new scientific technology to be scared of. Ooga booga! Fearmongering hits the press far more quickly than it does the health department, so this particular line of argument just sounds specious.

The California Health Department does an enormous disservice to the debate of a complicated issue by mixing several lines of reasoning which taken as a whole simply contradict one another. The role of personal genetic testing in our society deserves a debate and consideration; I thought I would be able to post about that part first, but instead the CA government beat me to the dumb stuff.

Thomas Goetz, deputy editor at Wired, has had two such tests (clearly not unhappy with the price), and angrily responds “Attention, California Health Department: My DNA Is My Data.” It’s not just those anonymous Californians who are wound up about genetic testing; he’s writing his sternly worded letter as we speak:

This is my data, not a doctor’s. Please, send in your regulators when a doctor needs to cut me open, or even draw my blood. Regulation should protect me from bodily harm and injury, not from information that’s mine to begin with.

Are angry declarations of ownership of one’s health data a new thing? It’s not like most people fight for their doctor’s office papers, or even something as simple as a fingerprint, this way.

It’ll be interesting to see how this shakes out. Or it might not, since it will probably consist of:

1. A settlement by the various companies to continue doing business.
2. Some means of doctors and insurance companies getting paid (requiring a visit, at a minimum).
3. People trying to circumvent #2 (see related topics filed under “H” for Human Growth Hormone).
4. An entrepreneur figures out how to do it online and in a large-scale fashion (think WebMD), turning out new hordes of “information”-seeking hypochondriacs to fret about their 42% potential alternate likelihood maybe chance of genetic malady. (You have brain cancer too!? OMG!)
5. If this hits mainstream news, will people hear about the outcome of #1, or will there be an assumption that “personal genetic tests are illegal” from here on out? How skittish will this make investors (the Forbes set) about such companies?

Then again, I’ve already proven myself terrible at predicting the future. But I’ll happily enjoy the foolishness of the present.

Tuesday, June 17, 2008 | genetics, privacy, science

Book

Visualizing Data is my 2007 book about computational information design. It covers the path from raw data to how we understand it, detailing how to begin with a set of numbers and produce images or software that lets you view and interact with information. When first published, it was the only book for people who wanted to learn how to actually build a data visualization in code.

The text was published by O’Reilly in December 2007 and can be found at Amazon and elsewhere. Amazon also has an edition for the Kindle, for people who aren’t into the dead tree thing. (Proceeds from Amazon links found on this page are used to pay my web hosting bill.)

Examples for the book can be found here.

The book covers ideas found in my Ph.D. dissertation, which is the basis for Chapter 1. The next chapter is an extremely brief introduction to Processing, which is used for the examples. Next (Chapter 3) is a simple mapping project to place data points on a map of the United States. Of course, the idea is not that lots of people want to visualize data for each of 50 states. Instead, it’s a jumping off point for learning how to lay out data spatially.

The chapters that follow cover six more projects, such as salary vs. performance (Chapter 5), zipdecode (Chapter 6), followed by more advanced topics dealing with trees, treemaps, hierarchies, and recursion (Chapter 7), plus graphs and networks (Chapter 8).

This site is used for follow-up code and writing about related topics.
