Saturday, August 16, 2008

The DNA Network

Systematic Biology T-Shirts [HENRY » genetics]

Posted: 16 Aug 2008 08:01 PM CDT

Everyone’s favorite systematics journal, Systematic Biology, has produced a collection of T-shirts that you can buy online (like the awesome one above). This is a fundraising project, and 100% of the profits will go to helping graduate students in the field of systematic biology (like me!).

There are other shirts there that aren’t official SB products, but are still tempting.

A society where ideology is a substitute for evidence can go badly awry [Tomorrow's Table]

Posted: 16 Aug 2008 05:07 PM CDT

Olivia Judson has a lovely piece today on why it is important for evolution to be taught in beginning biology classes. One important point that she makes is this:

"A society where ideology is a substitute for evidence can go badly awry."

She goes on to say:

"In his book, 'The Republican War on Science,' the journalist Chris Mooney argues persuasively that a contempt for scientific evidence — or indeed, evidence of any kind — has permeated the Bush administration's policies, from climate change to sex education, from drilling for oil to the war in Iraq. A dismissal of evolution is an integral part of this general attitude".

I would argue (as does she) that we see this substitution of ideology for science on the left end of the political spectrum as well. In the case of genetically engineered crops, for example, there is a pervasive ideology that GE crops are harmful to human health and to the environment, even though there has not been a single case of harm in over 10 years of cultivation.

Venter's exome, and the challenge of rare variants for personal genomics [Genetic Future]

Posted: 16 Aug 2008 01:13 AM CDT

A team led by J. Craig Venter from the J. Craig Venter Institute has just published another paper on J. Craig Venter's favourite topic: J. Craig Venter.

This study follows up on last year's publication of the complete sequence of Venter's genome, this time reporting a detailed analysis of a small but quite informative fraction of the genome: the exome, which consists of all of the pieces of DNA (called exons) that directly code for protein molecules.

The exome is a favoured target of geneticists. There are two major reasons for this: firstly, the exome is enriched for functional sequence, whereas non-coding DNA has a much higher fraction of non-functional junk; and secondly, we understand protein-coding DNA much better than we do non-coding DNA. If a novel mutation alters a protein sequence, we have algorithms that can predict (with moderate accuracy) how likely it is to alter the function of the cell. In contrast, for most mutations in non-coding DNA we have almost no way to predict whether they are functional or not. So, like the drunkard looking for his keys under the lamp-post because the light is better there, geneticists are inclined to look hardest at the regions where they actually have some chance of finding something they can understand.

Venter's mutations
The article (which is open access, so you can read it yourself) has a number of interesting factoids about Venter's protein-coding genome that are highly relevant to personal genomics:
  1. The authors identified 10,389 variants predicted to alter protein sequences;

  2. Of these, most are common (they estimate that 80-85% are present at a frequency of over 5% in the general population);

  3. About 1,500 of these variants are likely to actually significantly alter protein function, based on the SIFT prediction algorithm - these are the variants most likely to play a role in shaping human variation and common disease risk;

  4. A variant is twice as likely to be functionally damaging if it is rare (frequency less than 5%) than if it is common (frequency over 5%);

  5. Several quite unambiguously protein-damaging mutations were also found (74 would introduce an abnormal "stop" signal, while others create "frame-shifts" that alter large regions of an encoded protein), but many of these fall in genes with poor annotation that may well be non-functional;

  6. Venter carries seven known disease-associated variants, all present in only one copy (i.e. heterozygous);

  7. The interpretation of all of these data in terms of making actual health predictions is remarkably problematic, an ominous sign for the ~20 wealthy folks getting their genome sequenced by Knome this year.
The authors raise some interesting discussion points about the implications of their results for personal genomics; this paragraph is particularly sobering:
Even if a gene is known to be involved in disease, it is difficult to understand if a variant in the gene will have a phenotypic effect. We found that 99% of the [protein-altering variants] in disease genes could not be characterized by current literature. Different mutations in the same gene can cause different phenotypic effects [49], thus making it difficult to interpret possible phenotypes. Furthermore, some variants have phenotypic effects only under certain environments (see SOD2 and BDNF in Table 2 and [48]). Also, when looking at complex phenotypes, multiple variants in coding and non-coding regions are likely to be involved [63][66]. This genetic complexity, as well as exposure to various environmental factors, will need to be taken into account in assessing risk for various diseases.
In other words, it will be quite some time before we can use a genome sequence to make realistic predictions about overall health (except for the unlucky few who carry mutations unambiguously associated with disease, such as a CAG repeat expansion in the HTT gene - in which case the predictions will tend to be dire). The next few years will be interesting times indeed for personal genomics companies, as their ability to generate oodles of genetic data with cheap sequencing increases exponentially faster than their capacity to explain what the data actually mean.
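
To make the numbers in points 1 to 4 concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above (10,389 variants, 80-85% common, about 1,500 predicted damaging, rare variants twice as likely to be damaging). The function names are mine, and combining the figures this way assumes they all refer to the same set of variants, which the paper may not intend, so treat the output as illustrative rather than as a result from the article.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the list above.
# ASSUMPTION: all figures describe the same set of variants; the paper may
# define these subsets differently, so the results are only illustrative.

TOTAL_VARIANTS = 10_389       # protein-altering variants (point 1)
PREDICTED_DAMAGING = 1_500    # approximate SIFT-predicted damaging count (point 3)
COMMON_FRACTIONS = (0.80, 0.85)  # estimated share of common variants (point 2)


def rare_count(total, common_fraction):
    """Number of variants left over once the common ones are removed."""
    return round(total * (1 - common_fraction))


def damaging_rates(total, damaging, rare_fraction, rare_to_common_ratio=2.0):
    """Split the overall damaging rate into common and rare rates.

    If rare variants are k times as likely to be damaging as common ones,
    then overall = p_common * (1 + (k - 1) * rare_fraction).
    """
    overall = damaging / total
    p_common = overall / (1 + (rare_to_common_ratio - 1) * rare_fraction)
    return p_common, rare_to_common_ratio * p_common


if __name__ == "__main__":
    for common in COMMON_FRACTIONS:
        rare_fraction = 1 - common
        p_common, p_rare = damaging_rates(
            TOTAL_VARIANTS, PREDICTED_DAMAGING, rare_fraction
        )
        print(
            f"common share {common:.0%}: "
            f"{rare_count(TOTAL_VARIANTS, common)} rare variants, "
            f"damaging rate {p_common:.1%} (common) vs {p_rare:.1%} (rare)"
        )
```

Under these assumptions, somewhere between one and a half and two thousand of Venter's protein-altering variants are rare, and roughly a quarter of those would be predicted to damage the protein.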

The challenge of rare variants
I want to draw particular attention to the implications of point 4 above (the fact that rare mutations are the most likely to alter protein function, and thus to have an effect on disease risk). The evolutionary basis for this association is trivially clear: if a variant has a serious negative effect on health then in most cases natural selection will keep it at a low frequency in the population, since really sick people tend to have fewer kids. Disease-causing variants can reach high frequencies under certain conditions (if they also provide benefits under certain situations, or if the disease only hits its victims after they've already reproduced, for instance) but all else being equal, evolution's sickle means that you're far more likely to find disease-causing variants at the rare rather than the common end of the spectrum.
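
The argument above can be sketched with a deliberately simplified textbook model of mutation-selection balance. This is not from the Venter paper: the recursion, the mutation rate `mu` and the selection cost `hs` against carriers are all illustrative assumptions, and the model ignores drift, population structure and late-onset effects.

```python
# Minimal deterministic sketch of mutation-selection balance.
# ASSUMPTIONS (not from the post or the paper): the allele is rare, carriers
# leave a fraction `hs` fewer offspring, and new copies arise by mutation at
# rate `mu` per generation. All parameter values are illustrative.


def next_frequency(q, mu, hs):
    """One generation: selection removes copies, mutation adds new ones."""
    return q * (1 - hs) + mu * (1 - q)


def equilibrium_frequency(mu, hs):
    """Fixed point of next_frequency: q = mu / (mu + hs)."""
    return mu / (mu + hs)


def simulate(q0, mu, hs, generations):
    """Iterate the recursion from a starting frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_frequency(q, mu, hs)
    return q


if __name__ == "__main__":
    mu = 1e-5  # hypothetical per-generation mutation rate
    for hs in (0.001, 0.01, 0.1):  # mild to severe fitness cost
        q_final = simulate(0.0, mu, hs, generations=50_000)
        print(f"hs={hs}: frequency settles near {q_final:.2e} "
              f"(predicted {equilibrium_frequency(mu, hs):.2e})")
```

The more damaging the variant (larger `hs`), the lower the frequency at which it settles, which is the pattern behind point 4.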

The reason this is so problematic is that rare disease-causing variants are also the hardest to find and characterise. I've mentioned a few times that the current crop of genome-wide association studies (GWAS), while reasonably well-powered to detect common disease-causing variants, have virtually no ability to find rare causal variants - even if these variants explain the majority of disease risk. This probably goes some way to explaining why even massive GWAS are capturing only a small proportion of the overall genetic risk for most common diseases.

This arises primarily because the chips used in current GWAS only efficiently "tag" common variants. However, even once this technological barrier is lifted it will still be fiendishly difficult to assign function to rare variants: because there will be many millions of these variants, each at a low frequency, the sample sizes required to find those few associated with disease risk will be mind-bogglingly large - we're talking cohorts of millions of people, all with large-scale sequence data and well-collected information on environment and health. I have no doubt such studies will eventually be done, but it will take many years before we see the results.
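
To give a feel for how sample sizes blow up as variants get rarer, here is a rough sketch using the standard normal-approximation formula for comparing two proportions (carrier frequency in cases versus controls). Everything specific here is my assumption rather than something from the paper: the odds ratio of 2, the 80% power target, and the significance threshold of 5e-8, which is a commonly used genome-wide convention. It tests a single variant and ignores the extra burden of testing millions of them at once.

```python
# Rough per-variant sample-size sketch for a case-control comparison.
# ASSUMPTIONS: normal approximation for two proportions, equal numbers of
# cases and controls, alpha = 5e-8 and power = 0.8 chosen for illustration.
import math
from statistics import NormalDist


def case_freq(control_freq, odds_ratio):
    """Carrier frequency in cases implied by the control frequency and odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)


def samples_per_group(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate cases (and controls) needed to detect the difference."""
    p0 = control_freq
    p1 = case_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


if __name__ == "__main__":
    for freq in (0.05, 0.01, 0.001, 0.0001):
        n = samples_per_group(freq, odds_ratio=2.0)
        print(f"carrier frequency {freq}: about {n:,} cases and {n:,} controls")
```

The required cohort grows roughly in inverse proportion to the variant's frequency, so even a variant that doubles disease risk becomes very expensive to detect once it is carried by fewer than one person in a thousand.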

And of course, even with such massive cohorts, the rarest variants (those restricted to single families, or even just a few isolated individuals) will still slip through the statistical cracks - but such variants may well be the most important features in the genome sequence of any given individual, the ones disrupting that crucial tumour-suppressor gene or messing with neurotransmitter expression levels. If you have one of these nasty variants, you'll want to know about it, and you'll want to know what it does.

Beyond genetics
Ultimately, geneticists will have to deal with such variants using non-genetic methods. For instance, for many genes it may eventually be possible to create experimental assays that allow researchers to rapidly test whether a novel variant disrupts protein function; the mouse embryonic stem cell assays that can be used to test novel variants in the breast cancer gene BRCA2 are a proof of principle, as well as a demonstration of just how challenging this process will be.

More broadly and ambitiously, we need to build and refine models of how human beings operate at a molecular level, integrating data from many fields of biology. If we understand which proteins interact within which cells, how these interactions influence protein dynamics, and where the binding sites for each interaction lie, we will have a much better chance of inferring the effect of an isolated change in protein sequence on overall cellular function and thus human health. Moving beyond the exome into non-coding DNA will require even more subtle and complex models including protein-DNA binding, the regulation of DNA modification and conformation, and the effects of non-coding RNA.

In other words, ultimate personal genomics - the extraction of every byte of useful predictive information out of an individual's genome sequence - will require nothing less than an atomic-level understanding of the operation of the human machine. Now that is an effort I'd like to see Google throw its weight behind...

Ng, P.C., Levy, S., Huang, J., Stockwell, T.B., Walenz, B.P., Li, K., Axelrod, N., Busam, D.A., Strausberg, R.L., Venter, J.C., Schork, N.J. (2008). Genetic Variation in an Individual Human Exome. PLoS Genetics, 4(8), e1000160. DOI: 10.1371/journal.pgen.1000160

Vaccines, part II: what are vaccines made of? [Discovering Biology in a Digital World]

Posted: 15 Aug 2008 03:38 PM CDT

Vaccines work by stimulating the immune system to respond to a specific thing. Most of the vaccines we use are designed to prime the immune system so that it's ready to fight off some kind of disease, like whooping cough, polio, or influenza. Some vaccines can have more specialized functions, like stimulating the body to attack cancer cells, kill rogue autoimmune cells, or prevent pregnancy. We'll look at what they do in later posts. For now, let's look at the kinds of things that can be used as vaccines.

Exits [business|bytes|genes|molecules]

Posted: 14 Aug 2008 11:32 PM CDT

Matt Asay asks an interesting question in a recent column: “Should you sell out your next big open source idea?”

As is the tendency in these parts, that got me thinking about life science startups. What kind of exits can disruptive life science startups expect? In recent years, Solexa was acquired by Illumina and 454 by Roche, which suggests that acquisition is a popular route. Certainly in today’s market, IPOs aren’t the most popular exit. While pharma has, in the past, tended to gobble up startups, often to get access to their pipeline, in a world where pharma is trying to streamline and virtualize, will technology acquisitions still make sense?

My guess is that for the time being technology acquisitions will continue to be a trend in the life science industry and the most likely exit for disruptive startups, with a brave few succeeding on their own (a la Illumina). On the other hand, I wish the life science industry also made acquisitions for people. It almost seems that the people in startups are expendable, and the value lies in the IP (somewhat different from the internet industry where a decent chunk of acquisitions seem to be “people” acquisitions).

Vaccines, part I [Discovering Biology in a Digital World]

Posted: 14 Aug 2008 02:18 PM CDT

A long time ago, I saw a movie called "The Other Side of the Mountain." The movie told the story of Jill Kinmont, a ski racer who contracted polio and lost the use of her legs. I was sad for days afterward, but also relieved to know that Jill Kinmont's fate wasn't going to be mine. I wasn't going to wake up in an iron lung after a ski race, and neither were my friends, because most of the children in my generation had been vaccinated against the polio virus.

This image shows a polio survivor learning to walk. The image comes from the CDC Public Health Image Library.

More thoughts on animal research: Pets and wild animals benefit, too [Discovering Biology in a Digital World]

Posted: 14 Aug 2008 09:00 AM CDT

Every year people adopt pet dogs, cats, birds, and other creatures and take them to their local veterinarians for all the usual vaccinations and exams. The usual vaccinations protect your pets from diseases like rabies, distemper, Feline Immunodeficiency Virus, and Feline Leukemia. But it's not just pets that get protected by vaccines. Agricultural creatures (fish, chickens, sheep, cows, pigs, and horses) receive vaccines and, increasingly, wild animals are getting vaccinated, too.

Fireeagle as a paradigm for bioinformatics services [business|bytes|genes|molecules]

Posted: 14 Aug 2008 01:23 AM CDT

Yesterday Yahoo announced the public launch of Fireeagle, a location service I have been using almost from the day it was first launched in private beta.

The part about Fireeagle that appeals to me is that from the beginning it was designed to be a platform, very much in the "architect for innovation" spirit. The FAQ says it well (emphasis mine):

What is Fire Eagle?

Fire Eagle is a site that stores information about your location. With your permission, other services and devices can either update that information or access it. By helping applications respond to your location, Fire Eagle is designed to make the world around you more interesting! Use your location to power friend-finders, games, local information services, blog badges and stuff like that…

In a perfect world, Fireeagle would serve as the hub that would connect multiple location aware services, keeping them nicely in sync.

That got me thinking. Beyond location-aware (or, more generally, state-aware) applications, perhaps we need a service that keeps the world's various protein and gene databases, and their associated web services, in sync: a service that understands some agreed-upon standard and has an API that allows an update in, say, GenBank to be mirrored immediately in a repository or service that uses a slightly different structure or presentation layer.
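
To make the idea a little more concrete, here is a minimal sketch of what such a hub might look like as a publish/subscribe service. Every name in it (`SyncHub`, `RecordUpdate`, `FastaMirror`, `LengthIndex`, the example accession and source label) is hypothetical and invented for illustration. No existing database offers this API, and a real service would need authentication, persistence and an agreed record standard.

```python
# Hypothetical sketch of a "Fire Eagle for sequence records" hub.
# ASSUMPTION: none of these classes correspond to a real service or API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RecordUpdate:
    """The agreed-upon standard: one updated sequence record."""
    accession: str
    sequence: str
    source: str


class SyncHub:
    """Accepts updates from a primary source and fans them out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[RecordUpdate], None]] = []

    def subscribe(self, callback: Callable[[RecordUpdate], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, update: RecordUpdate) -> None:
        for callback in self._subscribers:
            callback(update)


class FastaMirror:
    """A downstream service that stores records as FASTA text."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}

    def on_update(self, update: RecordUpdate) -> None:
        self.records[update.accession] = f">{update.accession}\n{update.sequence}"


class LengthIndex:
    """A downstream service with a different structure: sequence lengths only."""

    def __init__(self) -> None:
        self.lengths: Dict[str, int] = {}

    def on_update(self, update: RecordUpdate) -> None:
        self.lengths[update.accession] = len(update.sequence)


if __name__ == "__main__":
    hub = SyncHub()
    mirror, index = FastaMirror(), LengthIndex()
    hub.subscribe(mirror.on_update)
    hub.subscribe(index.on_update)
    # A made-up record standing in for an update at the primary database.
    hub.publish(RecordUpdate("EXAMPLE0001", "ATGGCCTAA", "hypothetical-primary-db"))
    print(mirror.records["EXAMPLE0001"])
    print(index.lengths["EXAMPLE0001"])
```

The point of the design is that each downstream service keeps its own structure and presentation, and only the update message needs to be standardised.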

What kind of applications can you think of?

You could win an iPod and a MacBook Air and an Apple TV [Discovering Biology in a Digital World]

Posted: 13 Aug 2008 01:37 PM CDT

if you take this survey.

Wanna change the world? Make it possible for everyone to talk about science in a normal conversation? Do you have ideas for improving science literacy?

Seed is interested in your ideas. Answer the survey and share your thoughts. And I've seen the MacBook Air. It's beautiful.

UPDATE: if you had trouble accessing the survey, try it again. It will be open until Friday, August 15th, 11pm EST.
