Tuesday, September 30, 2008

The DNA Network


Financial Crisis - Some Links + One Rant [The Daily Transcript]

Posted: 30 Sep 2008 08:23 PM CDT

One great aspect of the Internet is the amount of information that is to be found out there. Here are some links about the current financial crisis.

First up, a discussion between Bill Moyers and Kevin Phillips from the site:

Bill Moyers sits down with former Nixon White House strategist and political and economic critic Kevin Phillips, whose latest book BAD MONEY: RECKLESS FINANCE, FAILED POLITICS, AND THE GLOBAL CRISIS OF AMERICAN CAPITALISM explores the role that the crumbling financial sector played in the now-fragile American economy.

Next up, a couple of recent shows from This American Life. The first episode, The Giant Pool of Money, is from back in May, when foreclosures started rising rapidly. The second show, Enforcers, is from a few weeks ago; Part II of that episode is about the SEC and naked short selling.

This segment was itself expanded into a feature on NPR called Planet Money, which includes a daily podcast dedicated to the current financial crisis.

(rant is below the fold)


Congrats to Gary Andersen, Developer of the Phylochip, for Getting a WSJ Technology Award [The Tree of Life]

Posted: 30 Sep 2008 07:32 PM CDT

The Phylochip, developed by Gary Andersen of Lawrence Berkeley National Lab and colleagues, has won a Wall Street Journal Technology Innovation Award. For more, see the Wall Street Journal here. Their phylochip is a microarray that can be used to rapidly survey rRNAs from different organisms and get a measure of the types and abundances of organisms present in a sample. It is similar in concept to, although different in design from, an rRNA chip that was used by David Relman, Pat Brown, Chana Palmer and others. I am not sure why the chip from Brown et al. did not also win the award (it probably was not nominated, or something like that), but still, it is always good to see cool things in microbiology win awards like this.

HealthMap: hunting for global outbreaks and learning about microbiology [Discovering Biology in a Digital World]

Posted: 30 Sep 2008 05:06 PM CDT

HealthMap is a great site that could be an excellent resource when teaching a biology, microbiology, or health class. Not to mention, I can picture people using it before they travel somewhere or even just for fun.

I learned about HealthMap a while ago from Mike the Mad Biologist, but I didn't get time to play with the site until today.

Here's an example to see how it works.

How do I use HealthMap?


Podcast about the Personal Genome Project via Harvard Medical Labcast [The Personal Genome]

Posted: 30 Sep 2008 03:51 PM CDT

The Harvard Medical Labcast published a podcast today about the Personal Genome Project (PGP). Interviews include founder and professor of genetics at HMS, George Church; Jeantine Lunshof, ethicist for the PGP; John Halamka, PGP participant and HMS CIO; and myself.

To listen, please see:
Harvard Medical Labcast, Episode 6: Your genome, your future. [mp3 or subscribe via iTunes] The PGP-related material begins around the 9:30 mark.

Penicillin Genome Announced [Genome Blog]

Posted: 30 Sep 2008 03:00 PM CDT

When I woke up this morning, the big news was that the genome of Penicillium chrysogenum, the mould that produces penicillin, has been determined. The paper will be published in the journal Nature, coinciding with the 80th anniversary of the discovery of penicillin by Sir Alexander Fleming. Learning about the famous discovery as the accidental contamination of a Petri dish is a standard part of high school biology courses. It is often used as a prime example of luck being the meeting of opportunity and the prepared mind. Undergraduate students of microbiology often get first-hand experience with Penicillium contamination in their lab experiments, much to the horror and chagrin of the post-doc supervising the lab.

Joe Derisi and Open Science featured on Voice of America [The Tree of Life]

Posted: 30 Sep 2008 03:00 PM CDT

Well, I already gave him one of my awards, so what else could he do?  Anyway, always good to see Open Science getting promoted and nice to see Voice of America running a story on Joe Derisi after his Heinz Award and featuring this openness (listen to an MP3 of the radio story here).  And they even interviewed me because of my blog about him.  Blogs and the "real" news merge closer and closer every day.

Updating my Newick parser [Mailund on the Internet]

Posted: 30 Sep 2008 01:16 PM CDT

Back in 2003 I wrote a small parser for the Newick tree format.  It is pretty straightforward Python code, and basically just something I hacked up because I needed to manipulate some trees for a project.

Figuring that others might also find it useful I put it on my webpage and that’s about it for the story of my Newick parser. I’ve used it in a few other projects, but haven’t really developed it further from the initial code, and haven’t really received much feedback on it.

Except for this weekend, when I got three emails about it. Until now, I might have received one email a year.

It was a few bug reports and some questions, and because of the bug reports I’ve now made a new release, version 1.3.

I also have a Newick parser for C++.  Actually, I have more than one, since there are two different parsers in QDist and SplitDist, but the one I have in mind is more stand-alone and can probably be used by others.

It is a recursive descent parser I wrote in Boost.Spirit as an exercise to learn the Spirit language.

I think I will clean it up a bit and put it up on the web…
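For readers curious what such a parser involves, here is a hypothetical minimal sketch of the idea in Python (not the code from my package, just the bare technique): a recursive-descent parser for the core Newick grammar, where labels such as "A:0.5" are simply kept whole rather than split into name and branch length.

```python
# Hypothetical minimal recursive-descent parser for the bare Newick grammar:
#   tree    := subtree ";"
#   subtree := "(" subtree ("," subtree)* ")" [label]  |  label
# Branch lengths are not split off; they simply stay part of the label.
import re

def tokenize(newick):
    # Delimiters become single-character tokens; everything else is a label.
    return re.findall(r"[(),;]|[^(),;\s]+", newick)

def parse(newick):
    tokens = tokenize(newick)
    pos = 0

    def subtree():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children = [subtree()]
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(subtree())
            assert tokens[pos] == ")", "expected ')'"
            pos += 1  # consume ")"
            label = ""
            if pos < len(tokens) and tokens[pos] not in ("(", ")", ",", ";"):
                label = tokens[pos]
                pos += 1
            return (label, children)
        # otherwise a leaf label
        label = tokens[pos]
        pos += 1
        return (label, [])

    tree = subtree()
    assert pos < len(tokens) and tokens[pos] == ";", "tree must end with ';'"
    return tree

print(parse("((A,B),C);"))
```

A real parser of course needs error reporting and branch-length handling on top of this, but the recursion mirrors the grammar directly, which is why the format is such a popular exercise.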

Thinking about “thinkism” [business|bytes|genes|molecules]

Posted: 30 Sep 2008 10:00 AM CDT

Say what you like about Kevin Kelly, he has the ability to write material that makes you think. In a (no pun intended) post called Thinkism, Kelly makes a very effective argument related to the Singularity, one I try to make but not this effectively.

Let’s start with his definition of thinkism (emphasis mine)

Setting aside the Maes-Garreau effect, the major trouble with this scenario is a confusion between intelligence and work. The notion of an instant Singularity rests upon the misguided idea that intelligence alone can solve problems. As an essay called Why Work Toward the Singularity lets slip: “Even humans could probably solve those difficulties given hundreds of years to think about it.” In this approach one only has to think about problems smartly enough to solve them. I call that “thinkism.”

Here are some other choice lines, ones that fit my own world view

Let’s take curing cancer or prolonging longevity. These are problems that thinking alone cannot solve.

No intelligence, no matter how super duper, can figure out how the human body works simply by reading all the known scientific literature in the world and then contemplating it.

But it’s what he says after all this that really hits the nail on the head. He says that “Between not knowing how things work and knowing how they work is a lot more than thinkism.” So true. I just wrote a post about how we have so many gaps in our data (something that has come up a lot lately). Our hypotheses are only as good as the data that we can collect. As Kelly said, just thinking about the potential data will not yield the correct data. We have our working models, but as we collect all the data that we can, these models have to be refined, until at some point we cannot correct them any further (you know, that thing they call the Scientific Method). We need to do a lot of experiments, collect a lot of data, and build a lot more hypotheses before we can come close to addressing the kinds of problems that Singularitarians talk about.

To end, here is the last line of Kelly’s post

Since we did not see them coming, we look back and say, yes, that was the Singularity.


When SDS-PAGE Goes Bad [Bitesize Bio]

Posted: 30 Sep 2008 08:54 AM CDT

With all of our recent talk about how SDS-PAGE works and how to improve your gels, I wouldn’t want to give the impression that running protein gels is easy or foolproof.

On the contrary, just like everything else in research, SDS-PAGE can go wrong in a multitude of ways. And if you took a peek inside some of my old lab books you would have all of the proof you need about how easily this technique can make you look like a fool.

But luckily, I don’t have to air any dirty laundry in public to show you, because someone else has done something similar already.

As part of a lab guide for an experimental biology course at Rice University, David R. Caprette has pulled together an “SDS-PAGE Hall of Shame”. It’s made up of photos of gels produced by course students, as well as some from the university’s research labs, that have gone horribly wrong for all manner of reasons.

As well as providing entertainment for ghoulish (science geek) onlookers, the gallery is, of course, for educational purposes. It is intended as a troubleshooting resource; by clicking on the picture that looks most like your poor, messed-up gel, you will be given a pearl of wisdom that suggests what you might have done wrong so you can remedy it in the future.

But perhaps the best thing about it is that, in an uncertain world, it shows that everyone else has protein gel disasters too. And most of us can probably take comfort from that.

Hey teachers! Researchblogging.org is a great classroom resource [Discovering Biology in a Digital World]

Posted: 30 Sep 2008 08:09 AM CDT

One time, I suggested on a listserv that science teachers make more use of the primary scientific literature. Naturally, I learned all the reasons why teachers don't do this (lack of access being one of the biggies), but I also learned something surprising.


Science Tuesday: It’s better than real, it’s a real imitation [A Free Man » Science]

Posted: 30 Sep 2008 06:42 AM CDT

When I was born, thirty-ahremeah years ago, there were about 3.7 billion people in the world. The most recent estimates place the population of this planet at 6.725 billion, which means that the world’s population has nearly doubled in less than four decades. At our current growth rate we face an imminent Malthusian crisis. Maybe not today, maybe not tomorrow, but at some point we’re going to reach the tipping point at which there will not be enough agriculture to sustain the world’s population. Food prices are on the rise. This is one of the reasons that I chose to do a Ph.D. in the field and the place that I did. It turns out that, in the long run, I’m neither breeding nor genetically engineering better crops, but it is a field which I still follow with some interest.

There has been a renaissance in plant biotechnology in the last quarter century, which has made it possible to increase crop yields and develop new strains with resistance to many diseases or to too much salt, heat, drought or soil toxins. A big part of this golden age has involved transgenic, or genetically modified (GM), crops. A GM plant is one that has had a foreign gene inserted into its genome. This usually results in an added or modified trait. For example, some of the most common GM plants carry a gene from the soil bacterium Bacillus thuringiensis (Bt) that produces a protein toxic to some herbivorous pests. When these pests feed on Bt crops they are killed without the addition of pesticide.
The use of transgenic, or genetically modified (GM), crops has been a contentious issue around the world for the last couple of decades, and it made the news here in Australia earlier this month. The governing Labor Party in Western Australia banned the growing of GM crops in that state four years ago. However, in recent elections Labor was ousted, and a Liberal and National coalition has promised to rescind that ban. This follows the lifting of bans on GM crops in New South Wales and Victoria earlier in the year. With a changing environment, and mired in a seemingly endless drought, Australian wheat farmers are poised to reap the benefits of transgenic technology if drought-resistant or salt-tolerant varieties can be developed. In other news from earlier this month, China announced a $3.5 billion GM crops initiative to help the world’s most populous nation catch up with the West in the race to patent new plant genes. The Chinese are beginning to place a priority on food security and see GM crops as the best way forward.

I’m in the minority of plant scientists in the sense that I’ve always been a little hesitant about the use of GM crops. I’m not an alarmist, nor would I support a ban on GM crops for human consumption such as the European Union has instituted. I believe that most GM crops are perfectly safe and that the technology does have the potential to revolutionize agriculture. Hell, I’ve made transgenic plants myself, though none that are going to find their way to your dinner plate. (Unless you have a rather unusual palate.) I do, however, have some pretty serious concerns about regulation, environmental issues and intellectual property.

In terms of regulation, my concerns revolve around scrutiny of GM crops that make their way into the human food pool. GM crops have been approved for consumption in the U.S. since 1994, and there have been exactly zero reports of ill health effects. However, there are an increasing number of instances in which unapproved GM crops are finding their way to the supermarket. The inadvertent release of StarLink corn, a GM line approved only for animal feed, into the human food supply in 2001 raised some fairly serious concerns regarding regulation, ones that have not been fully resolved. There were no reliable reports of health effects of any kind, despite concerns over potential allergic reactions. More recently, in 2006, a GM variety of rice that had never been approved or marketed appeared in commercially available supplies in both the U.S. and Europe. It is still unclear how the GM line “got loose”. This is the crux of the problem: regulation of transgenic plants is spotty and inconsistent, with different universities, research institutes and companies having wildly different standards. American consumers in particular should be vigilant here, as there is a combination of lots of GM acreage and regulatory agencies stripped of many of their powers after 8 years of the Bush Administration.

One of the benefits cited for the use of GM crops is the reduction of pesticides and fertilizers required for cultivation. For example, growing Bt crops can vastly reduce the amount of pesticide required. Some researchers are concerned, however, that there are also environmental costs of the use of transgenic crops. The most serious of these is potential transgene escape. Recent studies of transgenic sugar beet and canola have shown that cross-pollination of non-transgenic relatives of transgenic crops can occur and that the presence of the transgene can persist for at least six years. This becomes especially problematic when GM and non-GM crops are grown in close proximity and is the most likely explanation for the GM rice escape in 2006. Beyond transgene transfer, there is an issue of harmful effects of transgene products. One of the toxins expressed in Bt crops has been detected in the guts of predators of plant pests. For example, aphids that feed on Bt corn are themselves fed on by ladybugs. Researchers at the University of Kentucky have been able to detect low levels of Bt toxin in the latter. In a controversial study published in PNAS by researchers from the University of Wisconsin, it was claimed that corn byproducts enter streams and are subject to storage, consumption, and transport to downstream water bodies and result in reduced growth and increased mortality of nontarget stream insects. It is worth noting that the large-scale mono-crop agriculture that predominates in the West is environmentally disastrous anyway. Most researchers think that GM crops offer, if anything, a slight improvement on environmental effects.

The final issue that I have with GM crops is that I’m not sure that, as things stand now, they will solve world food supply issues. The vast majority of GM crops are owned by one of a handful of large biotech companies. Monsanto produces more than 90% of GM crops worldwide, with Syngenta, Bayer CropScience, Dow and DuPont producing the remainder. It is of some concern that these companies will have too much control over world food production or will force traditional farmers out of the market. The biggest fears around world hunger are in developing countries, where farmers generally cannot afford to buy new seed stocks each season and rely on ‘recycling’ seed. Most corporations aren’t in the business of giving their products away for free, and thus legally obligate farmers to buy new GM seed each year. There are instances of biotech companies aggressively protecting their intellectual property. Call me a cynic, but I just doubt that the biotech companies that hold the patents for most of the useful GM crops are that interested in solving world poverty.

I know I’ve spent most of this post discussing some of the concerns surrounding transgenic crops, but at the bottom of everything I do think that GM crops could, in the words of Nina Fedoroff, be the source of a new Green Revolution. The Golden Rice story is a wonderful example of academic scientists working with biotech companies for humanitarian purposes. I just think that regulation, on a global scale, is absolutely key. Because we now live in a global economy and agricultural products are shipped around the world, there needs to be a global consensus on how to regulate GM crops. The biggest unresolved issue, and potential for trouble, surrounds the inadvertent spreading of GM pollen to neighboring fields or wild relatives. Regulations need to be established to minimize this risk. Importantly, you cannot force people to accept a technology with which they are uncomfortable. Just as we now have organic produce alternatives, as GM crops become more prevalent there should be non-GM alternatives. This requires labelling of either GM products or non-GM products to allow consumers the opportunity to make an informed decision.




Image credits:

GM Soya

GM Money Tree


R/parallel [Mailund on the Internet]

Posted: 30 Sep 2008 06:02 AM CDT


There’s a paper that just got out in BMC Bioinformatics that I found an interesting read.

R/parallel - speeding up bioinformatics analysis with R
Gonzalo Vera, Ritsert C Jansen and Remo L Suppi

BMC Bioinformatics 2008, 9:390 doi:10.1186/1471-2105-9-390


R is the preferred tool for statistical analysis of many bioinformaticians due in part to the increasing number of freely available analytical methods. Such methods can be quickly reused and adapted to each particular experiment. However, in experiments where large amounts of data are generated, for example using high-throughput screening devices, the processing time required to analyze data is often quite long. A solution to reduce the processing time is the use of parallel computing technologies. Because R does not support parallel computations, several tools have been developed to enable such technologies. However, these tools require multiple modifications to the way R programs are usually written or run. Although these tools can finally speed up the calculations, the time, skills and additional resources required to use them are an obstacle for most bioinformaticians.


We have designed and implemented an R add-on package, R/parallel, that extends R by adding user-friendly parallel computing capabilities. With R/parallel any bioinformatician can now easily automate the parallel execution of loops and benefit from the multicore processor power of today’s desktop computers. Using a single and simple function, R/parallel can be integrated directly with other existing R packages. With no need to change the implemented algorithms, the processing time can be approximately reduced N-fold, N being the number of available processor cores.


R/parallel saves bioinformaticians time in their daily tasks of analyzing experimental data. It achieves this objective on two fronts: first, by reducing development time of parallel programs by avoiding reimplementation of existing methods and second, by reducing processing time by speeding up computations on current desktop computers. Future work is focused on extending the envelope of R/parallel by interconnecting and aggregating the power of several computers, both existing office computers and computing clusters.

It concerns an extension module for R that helps parallelise code on multi-core machines.

This is an important issue.  Data analysis is getting relatively slower and slower, as data sizes increase faster than the improvements in CPU speed, so exploiting the parallelism in modern CPUs will become increasingly important.

It is not quite straightforward to do this, however.  Writing concurrent programs is much harder than writing sequential programs, and that is hard enough as it is.

Problems with concurrent programs

The two major difficulties in programming are getting it right and making it fast, and both are much more difficult with parallel programs.

When several threads are running concurrently, you need all kinds of synchronisation to ensure that the input of one calculation is actually available before you start computing and to prevent threads from corrupting data structures by updating them at the same time.

This synchronisation is not only hard to get right; it also carries an overhead that can be very hard to reason about.  Just as it is difficult to know which parts of a program are using most of the CPU time without profiling, it is difficult to know which parts of a concurrent program are the synchronisation bottlenecks.
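To make this concrete, here is a hypothetical Python sketch (the language here rather than R, purely for illustration): two threads increment a shared counter, and because the read-modify-write on the counter is not atomic, a lock is needed to keep updates from being lost. That lock is exactly the kind of overhead that is invisible until you profile.

```python
# Hypothetical illustration: two threads incrementing a shared counter.
# The read-modify-write on `counter` is not atomic, so without the lock
# some increments can be lost; the lock serialises the updates.
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:  # remove this and the final count may come up short
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 200000: every increment survived
```

Every thread now queues up behind the same lock, so correctness is bought at the price of serialisation, which is the trade-off the rest of this post is about.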

Hide the parallelism as a low-level detail

Since concurrency is so hard to get right, we would like to hide it away as much as we can. Just like we hide assembler instructions away and program in high-level languages.

This is by no means easy to do, and the threading libraries in e.g. Python and Java are not even close to doing that.

In a language such as R or Matlab, you potentially have an easier way of achieving it. A lot of operations are “vectorized”, i.e. you have a high-level instruction for performing multiple operations on vectors or matrices.  Rather than multiplying all elements in a vector using a loop

> v <- c(1,2,3)
> w <- v
> for (i in 1:3) w[i] <- 2*w[i]
> w
[1] 2 4 6

you do the multiplication in a single operation

> 2*v [1] 2 4 6

and rather than multiplying matrices explicitly in a triple loop,

> A <- matrix(c(1,2,3,4),nc=2) ; B <- matrix(c(4,3,2,1),nc=2)
> C <- matrix(0,nc=2,nr=2)
> for (i in 1:2) for (j in 1:2) for (k in 1:2) C[i,j] <- C[i,j]+A[i,k]*B[k,j]
> C
     [,1] [,2]
[1,]   13    5
[2,]   20    8

you have a single operation for it.

> A %*% B
     [,1] [,2]
[1,]   13    5
[2,]   20    8

It is almost trivial to parallelise such operations, and if your program consists of a lot of such high-level operations, much program parallelisation can be automated.
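The same idea can be sketched outside R (hypothetically, in Python here): an independent per-element loop becomes a map over a pool of workers, and the result is identical to the sequential loop by construction; only the execution strategy changes.

```python
# Mapping an independent per-element function over the data with a pool of
# workers; the result is identical to the sequential loop by construction.
from concurrent.futures import ThreadPoolExecutor

def double(x):
    return 2 * x  # any per-element computation with no shared state

v = [1, 2, 3]
sequential = [double(x) for x in v]
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = list(pool.map(double, v))  # preserves the input order
print(sequential == parallel, parallel)  # True [2, 4, 6]
```

For CPU-bound work in CPython a process pool (multiprocessing.Pool offers the same map interface) would be the better choice, since the global interpreter lock serialises pure-Python computation; the semantics are unchanged either way.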


This is the reasoning behind the BMC Bioinformatics paper.

They are not doing exactly what I describe above (that would be the “right” thing to do, but it would require changes to the R system that you cannot make directly through an add-on package); instead, they provide an operation for replacing sequential loops with a parallel version.

Just replacing a sequential loop with parallel execution, assuming that the operations are independent, is always safe. The behaviour of the sequential and the parallel program is exactly the same, except for the execution time.

As such there is no extra mental overhead in introducing it to your programs.

Using it won’t necessarily speed up the program, of course. Even if the synchronisation is hidden from the programmer, the overhead is still there.

The authors leave it to the programmer to know when the parallel execution pays off (and profiling should tell them so).

It is quite likely that a sequence of small fast tasks is parallelized and, despite parallel execution, as a result of the transformation process and additional synchronization, the overall processing time can be increased. To avoid this situation, the design decision made is to let the users indicate which code regions (i.e. loops) they need to speed up.
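A quick hypothetical Python sketch makes their point: for a long sequence of very small tasks, the per-task dispatch overhead of a worker pool dwarfs the work itself, and the "parallel" version comes out slower than the plain loop.

```python
# Per-task pool overhead vs. a trivial amount of per-task work.
import time
from concurrent.futures import ThreadPoolExecutor

def tiny(x):
    return x + 1  # far too little work to amortise the dispatch cost

data = list(range(50_000))

t0 = time.perf_counter()
seq = [tiny(x) for x in data]
t_seq = time.perf_counter() - t0

with ThreadPoolExecutor(max_workers=4) as pool:
    t0 = time.perf_counter()
    par = list(pool.map(tiny, data))  # one dispatched task per element
    t_par = time.perf_counter() - t0

assert seq == par  # same answer either way
print(f"sequential: {t_seq:.4f}s  pooled: {t_par:.4f}s")  # pooled is slower here
```

The answers are identical, but the pooled version pays dispatch and synchronisation costs per element, which is exactly why the authors let the user choose which loops to parallelise.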

Knowing what to run in parallel and what not to is a hard problem.  It will often depend on the data as well, if nothing else on the data size.

A modern virtual machine would be able to profile the execution of the program and make a decision based on that, assuming the operations are executed more than once.

I don’t know if any virtual machines are actually doing this, though. I really have no idea how much automatic parallelisation is part of the back end of virtual machines.

I would love to know, though.  This is the way to go, to get the speedups of parallelisation without too high a mental burden for the scientist/programmer.

Gonzalo Vera, Ritsert C Jansen, Remo L Suppi (2008). R/parallel - speeding up bioinformatics analysis with R BMC Bioinformatics, 9 (390)

Hot Hot Hot [Mailund on the Internet]

Posted: 30 Sep 2008 03:04 AM CDT

I’m not going to comment on this post over at Bayblab, but the beginning paragraph concerns habanero chili. That brings back memories.

I like chili and usually take a lot in my food.  The first time I had habanero, I wasn’t aware how much stronger they were than the chili I usually get.

It wasn’t a pleasant experience.

I stay away from them now, or if I have them, I would use half of one when I would normally use two of the ones I typically have…

Humorous sciency signs #2: Squirrel liberation front [The Tree of Life]

Posted: 29 Sep 2008 11:09 PM CDT

Here is another funny science related sign. I think this was from the Grand Canyon or somewhere near there, taken by my mother, many years ago.

Like I needed any more justification not to trust organisms bigger than a large protist.

When more is easier [business|bytes|genes|molecules]

Posted: 29 Sep 2008 10:13 PM CDT

More goodness from Jeff Jonas. In The Fast Last Puzzle Piece, he talks about how the notion that more data = slower system is not true. The analogy he uses is that of a jigsaw puzzle, which starts easy, gets harder, and eventually gets easier again as pieces can only fit in certain specific positions (fewer degrees of freedom, in language we are used to). He goes on to add that such behavior needs to fulfill a set of requirements, and that’s what caught my eye. Essentially, any set of observations must

  • Belong to the same universe
  • Have enough features to enable contextualization
  • Be such that the features can be extracted, enhanced and classified
  • Sufficiently saturate observational space

He adds that you need to have enough smarts to stitch everything together.

As I read that list, I kept thinking of the data we are used to seeing as life scientists. One would think it satisfies all the criteria above, so why are things getting harder? I think it has to do with the point about saturation. In many cases we don’t have saturation, which is why we can’t get the required results. In others (structure prediction comes to mind), we do have sufficient saturation, and we are able to get meaningful results as our body of work grows. However, right now we haven’t hit that tipping point with a lot of data types, so we are not yet in a situation where the system gets “faster” and easier to solve.

What do you think?
