Is it time for a thorough re-evaluation of how the genome is structured and how it operates? Recent work seems to be hinting that that could be so, as I’ve argued in an article in the August issue of Prospect, reproduced below (with small changes due to editing corrections that somehow got omitted). The idea is further supported by very recent work in Nature from Eran Segal at the Weizmann Institute and Jonathan Widom of Northwestern, which claims to have uncovered a kind of genomic code for DNA packing in nucleosomes. I suspect there is much more to come – this latest work already raises lots of interesting questions. Is it time to start probing the informational basis of the mesoscale structure of chromatin, which is clearly of great importance in transcription and regulation? I hope so. Watch this space.
[From Prospect, August 2006, p.14, with minor amendments]
The latest findings in genetics and molecular biology are revealing the human genome, the so-called “book of life”, to be messy to the point of incomprehensibility. Each copy is filled with annotations that change the meaning, there are some instructions that the book omits, the words overlap, and we don’t have a clue about the grammar.
Or maybe we simply have the wrong metaphor. The genome is no book, and the longer we talk about it in that way, the harder it will be to avoid misconceptions about how genes work.
But it’s not just the notion of the genome as a “list of parts” that now appears under threat. The entire central dogma of genetics—that a gene is a self-contained stretch of a DNA molecule that encodes instructions for making an enzyme, and that genetic inheritance works via DNA—is now under revision. It is not that this picture is wrong, but it is certainly incomplete in ways that are challenging the textbook image of how genes work.
Take, for example, the discovery in May by a team of French researchers that mice can show the physiological expression (the phenotype) of a genetic mutation that produces a spotty tail even if they don’t carry the mutant gene. Minoo Rassoulzadegan’s group in Nice found that the spotty phenotype can be induced by molecules of RNA that are passed from sperm to egg and then affect (in ways that are not understood) the operation of genes in the developing organism.
RNA is not normally regarded as an agent of inheritance. It is the ephemeral intermediate in the translation of genetic information on DNA to protein enzymes. According to the standard model of genetics, a gene on DNA is “transcribed” into an RNA molecule, which is then “translated” into a protein. In the book metaphor, the RNA is like a word that is copied from the pages of the genome and sent to a translator to be converted into the “protein” language.
The RNA transcripts are thrown away once they have been translated—they aren’t supposed to ferry genetic information between generations. Yet it seems that sometimes they can. That is profoundly challenging to current ideas about the role of genes in inheritance. It is not exactly a Lamarckian process (in which characteristics acquired by an organism through its interaction with the environment may be inherited)—but it is not consistent with the usual neo-Darwinian picture.
“Inheritance” via RNA is an example of so-called epigenetic variation: a change in an organism’s phenotype that is not induced by a change in its genome. Rassoulzadegan thinks that this offers nature a way of “trying out” mutations in the Darwinian arena without committing to the irreversible step of altering the genome itself. “It may be a way to induce variations quickly without taking the risk of changing primary DNA sequences,” he says. “Epigenetic variations in the phenotype are reversible events, and probably more flexible.” He also suspects they could be common.
There is even evidence that genetic mutations can be “corrected” in subsequent generations, implying that back-up copies of the original genetic information are kept and passed on between generations. This all paints a more complex picture than the standard neo-Darwinian story of random mutation pruned by natural selection.
Epigenetic influences on the development of organisms have been known for a long time. For example, identical twins with the same genomes don’t necessarily have the same physical characteristics or disposition to genetic disease. It’s clear that the influence of genes may be altered by their environment: a study in 2005 showed that differences in the activity of genes in identical twins become more pronounced with age, as the messages in the genome get progressively more modified. These “books” are constantly being revised and edited.
One way in which this happens is by chemical modification of DNA, such as the attachment of tags that “silence” the genes. These modifications can be strongly influenced by environmental factors such as diet. In effect, such epigenetic alterations constitute a second kind of code that overwrites the primary genetic instructions. An individual’s genetic character is therefore defined not just by his or her genome, but by the way it is epigenetically annotated. So rapid genetic screening will provide only part of the picture for the kind of personalised medicine that has been promised in the wake of the genome project: merely possessing a gene doesn’t mean that it is “used.”
Even the basic idea of a gene as a self-contained unit of biological information is now being contested. It has been known for 30 years that a single gene can encode several different proteins: the genetic information can be reshuffled after transcription into RNA. But the situation is far more complex than that. The molecular machinery that transcribes DNA doesn’t just start at the beginning of a gene and stop at the end: it seems regularly to overrun into another gene, or into regions that don’t seem to encode proteins at all. (A sobering 98-99 per cent of the human genome consists of such non-coding DNA, often described as “junk.”) Thus, many RNA molecules aren’t neat copies of genetic “words,” but contain fragments of others and appear to disregard the distinctions between them. If this looks like sloppy work, that may be because we simply don’t understand how the processes of transcription and translation really operate, and have developed an over-simplistic way of describing them.
This is backed up by observations that some RNA transcripts are composites of “words” from completely different parts of the genome, as though the copyist began writing down one word, then turned several pages and continued with another word entirely. If that’s so, one has to wonder whether the notions of copying, translation and books really have much value at all in describing the way genetics works. Worse, they could be misleading, persuading scientists that they understand more than they really do and tempting them towards incorrect interpretations. Already there are disagreements about precisely what a gene is.
In 2002 US scientists reported that the known protein-encoding regions of the genome account for perhaps less than a tenth of what gets transcribed into RNA. It is hard to imagine that cells would bother with the energetically costly process of making all that RNA unless it needed to. So apparently genetics isn’t all, or perhaps even primarily, about making proteins from genetic instructions.“The concept of a gene may not be as useful as it once was”, admits Thomas Gingeras of the biotech company Affymetrix. He suggests that a gene may be not a piece of DNA but a collective phenomenon involving a whole group of protein-coding and non-coding RNA transcripts. Perhaps these transcripts, not genes in the classical sense, are the fundamental functional units of the genome.
Scientists are constantly trying to express what they discover using concepts that are already familiar—not only when they communicate outside their field, but also when they talk among themselves. This is natural, and probably essential in order to gain a foothold on the slopes of new knowledge. But there’s no guarantee that those footholds are the ones that will lead us in the right direction, or anywhere at all. Relativity and quantum mechanics are still, after 100 years, deemed difficult and mysterious, not because they truly are but because we don’t really possess any good metaphors for them: the quantum world is not a game of billiards, the universe is not an expanding balloon and a light beam is not like a bullet train. Genetics, in contrast, looked accessible, because we thought we knew how to talk about information: that it is held as discrete, self-contained and stable packages of meaning that may be kept in data banks, copied, translated. Meaning is constructed by assembling those entities into linear strings organised by grammatical rules. The delight that accompanied the discovery of the structure of DNA half a century ago was that nature seemed to use this model too.
And indeed, the notion of information stored in the genome, passed on by copying, and read out as proteins, still seems basically sound. But it is looking increasingly doubtful that nature acts like a librarian or a computer programmer. It is quite possible that genetic information is parcelled and manipulated in ways that have no direct analogue in our own storage and retrieval systems. If we cleave to simplistic images of books and libraries, we may be missing the point.