homunculus: 2024

Monday, May 27, 2024

Thoughts on a podcast about How Life Works

The podcast run by the computer company Oxide has hosted a discussion of my book How Life Works. It’s deeply thoughtful, engaged, and also fun, and I’m very grateful for it. I’d recommend it to anyone as a fascinating listen for its own sake.

I’m also particularly grateful to the biologist on the panel, Greg Cost, for such a considered appraisal. Greg doesn’t agree with everything in the book (and he’s a Dawkins fan, after all!), but he is generous and measured in all that he says.

I have a few comments on the discussion. One small issue: while it is of course true that cancer is mostly linked to genetic “breakdowns” of some sort, this is not invariably so.

What strikes me most about Greg’s comments – given that he is someone who is evidently very knowledgeable, thoughtful, and receptive – is his pessimism about figuring life out, at least until we have pinned down every last detail. I was pleased and relieved to hear that he recognizes the significance of the various new aspects that I cover, such as the importance of RNA, protein disorder, condensates etc. But it seems that for him this just makes the matter all the more complicated – even though I suggest in the book ways in which we can start to integrate these considerations into a synoptic view that does after all seem to have general principles (just not the ones we once believed!).

It's not that Greg seems to think the general considerations I identify are wrong or misplaced. He simply doesn’t remark on them at all, as if they were not there. I’m struck by this because it’s rather similar to one or two other reviews of the book I’ve had from thoughtful and receptive biologists: they say yes, yes to all the details, but are simply silent about – as if blind to – efforts to draw broader insights from them.

Hence Greg’s view that what we’ll need is Sydney Brenner’s approach of exhaustive simulation down to the last molecule. That, to me, is the strategy of desperation: “We’ve run out of ideas so we just have to be literal about it and include everything”. What I want the book to suggest is that not only is this unnecessary, but it is unhelpful. It gives no real insight, and it also denies the fact that no highly complex system can work if every detail matters. What the “new biology” is telling us is precisely that not all the details do or can matter, and what principles might allow a system to transcend its microscopic details.

Greg, for example, says that his heart sank when we started to realise that the formation of condensates/phase-separated blobs by proteins and RNAs is a common process in our cells. It seems that to him this merely drives another nail into the coffin of the old picture of all things happening in aqueous solution. But I have been immensely excited and stimulated by it. As I suggest in the book, what we’re surely seeing here is a general principle whereby cells create structure and marshal many molecules into a shared location where they can work collectively. Perhaps it’s simply that this is a “condensed matter” way of thinking that is (as I have discovered) wholly unfamiliar to biologists. But I can’t help suspecting, too, that what biologists tend to see here is only a further receding of long-cherished ideas. I was struck by Greg’s apparent longing to retain the “DNA blueprint” picture, seemingly by means of saying that there is after all still a blueprint but also lots of “noise” between it and the phenotype. To my mind, that “noise” is metaphorical, perhaps even psychological: it implies “details we don’t and may never understand, but which can be invoked to explain away anything that doesn’t fit a blueprint picture”. The story in the podcast about the building of household steps was telling: in the end it amounted to “Well, the steps were made according to a blueprint, even though the blueprint was ignored”. The actual construction, the application of a real joiner’s skill in seeing how to do the job, then became the “noise” that accounted for the difference between the alleged blueprint and the reality. In other words, “noise” here is the gap between how we see life work and how we would like it to work.

This attitude becomes particularly evident in Greg’s comment about theoretical biologist David Penny’s remark from the book that he’d have been proud to be on the committee that designed the E. coli genome, but not the human genome. Greg takes this to imply that the genomes of bacteria/prokaryotes are somehow “more evolved” and thus better designed, as though the roughly 3.5-billion-year evolutionary history of prokaryotes has perfected their genomes.

I’m very curious about this, because it is so obviously wrong (as well as missing the point) that, coming from a very smart guy, it seems to suggest that something else is going on.

The implication seems to be that prokaryotic genomes were once as seemingly messy and confusing as ours, but have been cleaned up and streamlined by their longer evolution. But that’s simply not so. After all, no genome has been produced de novo since life began. Rather, we ultimately evolved from prokaryotes, and so our genomes have acquired their alleged messiness out of the simplicity of theirs. Indeed, molecular phylogenetics allows us to effectively watch this happen.

So the puzzling complexity and seeming messiness of our genome is absolutely not something that, given time, evolution could and would simplify. Rather, this difference seems to have been *necessary* to make complex metazoans possible.

So it rather feels as though Greg is desperately eager to find a way of making biology “make sense”, which E. coli seems to promise with its more transparent logic. This then becomes how biology “should be”, while we are a messy aberration. That impression is reinforced by the fact that, when one of the panellists raises my point that what is true for E. coli is evidently not necessarily true for the elephant, Greg immediately and only talks about the ways in which we are the same.

As I say in the book, Penny’s remark reveals how far the answer to “how life works” for us humans is from our intuitions based on machine-like technologies. The fact that we look at it and think “Good God, what?!” is not an indication that nature has done a terrible job, but that we need to rethink our preconceptions. And that’s what this book is about.

Oh, and guys, guys: yes, you’re computer scientists so you know all about Alan Turing. But I’m not sure you realise how little known he was (and perhaps still is) beyond your field. And yes, as history, The Imitation Game was pretty terrible, but that doesn’t mean it didn’t help to put Turing on the radar of lots more people. Just as it may be inconceivable to you that plenty of people don’t know about Turing, so it is not easy for me to accept that some people who know a lot about Turing did not know at all about his morphogenesis paper. We’re all constrained by our priors!

But in any event, thank you chaps for a nice discussion.

Thursday, April 18, 2024

Genes, in more context: a reply to a review of How Life Works

SFSU biologist Michael Goldman has written a generous and thoughtful review of How Life Works in Science. He raises some issues that I will respond to here.

I figured it was always likely, indeed inevitable, that some would respond to my book by saying it creates straw men of one kind or another. (Goldman doesn’t exactly say this; although his comments could be read that way, he phrases them more kindly.) After all, given that what I’m saying reflects rather than challenges what molecular and cell biology has been revealing over the past few decades, obviously all of it will to some degree be “known” to some practising biologists. What I am arguing is that, in the light of this work, it is time to reconsider some of the narratives about “how life works” that are passed on to the public.

In this regard, then, when Goldman says “We already knew this” (for example, “the edifice Ball builds up and deftly tears down, if it ever existed at all, had not been a major factor in genetics or evolutionary biology for decades before [the Human Genome Project]”), I suspect he is referring to things recognized by many professional biologists, but not by plenty of lay people. In particular, they have often been given lazy, outdated and misleading narratives about genes that are now causing problems for public understanding of genomic technologies, medicines, and genealogies. In the light of my own experience, I suspect Goldman might be a little horrified to discover what misconceptions or over-simplifications some – even other scientists – still believe about biology, and especially about genetics.

Goldman affirms that genes play a “central role” in biology. This, in my view, is beyond question. Of course they do, and I would not want anyone to think my book suggests otherwise. Insofar as I talk about genes (and only one of ten chapters addresses them directly, though this seems to be what quite a few reviewers have latched onto – which tells a tale in itself!), I want rather to examine exactly what that role is. I suggest that it is not a “blueprint” role, nor an “instruction book” role. It is harder to summarize than that, but it is precisely because of research over the past few decades that we can now begin to find better metaphors.

Among the things we didn’t know before the completion of the Human Genome Project, and which I explore in the book and which I believe have significant consequences for our notions of how life works, are these:

- The real extent of regulatory machinery, including masses of non-coding RNA. There were almost no noncoding genes recognized before the HGP began; now many estimates suggest they outnumber coding genes. It would be bizarre to suppose that this alone does not significantly revise our narrative of the human genome.

- How gene regulation in humans really works, which is not in general how it works in prokaryotes.

- The extent to which genomic linkages for many traits and diseases lie outside the coding part of the genome.

- The vital importance of protein disorder, particularly in helping to explain biomolecular promiscuity and the kinds of linkages between signalling or regulatory pathways that promote pleiotropy. (Incidentally, yes, of course it is true that pleiotropy has been long known – for over a century, in fact. But for a long time it was simply a word to describe a puzzling phenomenon. We have known for some time how isolated cases arise, but only relatively recently have we understood how polygenic most traits are. And only more recently still have we started to be able to say why.)

- The widespread occurrence of liquid-liquid phase separation as a mechanism for creating organization in the cell – an issue that itself is closely bound up with the recognition of the roles of disordered proteins, RNA-binding proteins, and noncoding RNA.

- The possibility of reprogramming cell states, à la Yamanaka. This is of course immensely important both for our understanding of how cell fate is determined and for biomedicine.

- The complexity and diversity of transcriptional landscapes, revealed for example by single-cell RNA sequencing.

The argument I lay out in the book is that it is developments like these that constitute a “new biology”, insofar as they enable us to tell new stories about how life works that go beyond, and in fact often undermine, earlier, simplistic ideas about “gene action”.

Goldman seems to imply that the notion of agency is invoked as “the idea that there is a gap between what we can explain mechanistically and what we observe” – an idea that he says “has gained traction in philosophy of science circles but has been resisted by mainstream science.”

I am not sure what he means by this. There is lots in biology that we currently can’t explain mechanistically (especially in neuroscience), but I see no reason why explanations won’t be found eventually, if they are addressing the right level. Agency is not about that! It is not some mysterious property that goes where mechanism cannot. Rather, it speaks to a more top-down view of life: rather than trying to reduce life’s mechanisms to principles that are no different to those that operate in inorganic matter, it starts by considering what truly differentiates the living from the non-living.

To imagine that agency is a kind of resurgent vitalism seems a little like scientists pre-Boltzmann complaining that thermodynamics invokes this mysterious entity called “entropy” rather than sticking to a strictly Newtonian view of matter as billiard-ball atoms. Or perhaps an even better analogy: it’s like supposing that the concepts psychologists use to talk about the properties of mind are suspect and a bit woo because they don’t start from action potentials.

I am not sure how Goldman could have formed this impression of efforts to develop theories of agency; I can only say that it seems to confirm my sense that many biologists struggle even to recognize what agency, the central property of all living things, can mean. As I say in the book, this is an extraordinary lacuna in biology.

I hope this won’t sound like grouchiness. I truly appreciate Goldman’s comments, and am delighted that he finds the book offers “a great chance to review and admire the beauty and complexity of life” – that was indeed a key objective. And if he sometimes seems to miss the message I wanted to convey, that’s a reason for me to think carefully about whether I conveyed it clearly enough. But I hope these remarks help a little to clarify what I’m aiming to do in this book.

Thursday, March 28, 2024

How our cells cope with oxygen stress: a paradigm of life's fuzzy, distributed control

My nephew Andrew, a chemistry postdoc at Oxford, has just published a paper in JACS on developing inhibitors of the protein HIF (hypoxia-inducible factor) 1A. Hurrah for him! And this got me curious enough to delve into what this molecule does. Andrew had told me before that it’s a transcription factor, which naturally led me to guess it has a fair degree of intrinsic disorder – as is indeed the case (see the floppy bits of polypeptide chain here):

Why? Because most eukaryotic TFs do, as they tend to operate in conjunction with a host of other molecules such as cofactors and seem to benefit from having a degree of promiscuity in their interactions.

That’s just one way in which I suspected a protein like this might exemplify the ways in which our molecular mechanisms operate. And indeed, this turns out to be the case. At face value, how HIF1A (sometimes written as HIF-1α) does what it does looks ever more perplexingly, indeed impossibly, complicated the harder you look. But in every respect I found those details confirmed the kind of picture I have tried to sketch in my book How Life Works – and I’d hope that the book might help a non-specialist see how there are actually some generic principles operating in a case like this that can bring some sense of order and logic to what otherwise appears utterly confusing. So if you’re ready for the ride, strap in.

HIF1A is a member of a family of HIF proteins, in mammals encoded by the genes HIF1A, HIF2A and HIF3A. The proteins enable cells to cope with oxygen-depleted circumstances, in general by activating or inhibiting the expression of certain genes. For example, HIF1A can upregulate expression of vascular endothelial growth factor (VEGF), a key gene involved in angiogenesis (the formation of new blood vessels), so as to encourage the formation of new sources of oxygenation. For this reason, HIFs are not merely activated in unusual conditions of oxygen stress but are a crucial part of normal development, and are associated with disorders of blood circulation, such as atherosclerosis, hypertension and aneurysms. The 2019 Nobel Prize in Physiology or Medicine was awarded to William Kaelin, Peter Ratcliffe and Gregg Semenza for their work in discovering the HIF proteins and how they regulate the cell’s response to hypoxia.

HIF1A has also become a focus of interest for cancer treatments, because if it can be inhibited specifically in cancer cells, this could enable the tumour to be slowed or even killed by oxygen depletion. That’s what Andrew and his colleagues are working on.

The basic mode of action is interesting, but also a major saga in itself. HIF1A is produced even when the cells have plenty of oxygen, but is then tagged for destruction. Oxygen-dependent enzymes called prolyl hydroxylases attach hydroxyl groups to it, and this marks it out for an enzyme complex (containing the protein VHL) that sticks ubiquityl groups onto it, labelling it for destruction by the proteasome, the cell’s protein-shredding machinery. If lack of oxygen stops the hydroxylases working, HIF1A is no longer degraded but is free to do its regulatory work as it accumulates in the cell nucleus. (This bit of the story, like all the others, is actually rather more complicated, as HIF1A degradation is also sensitive to factors other than oxygenation, such as nutrient levels – there is evidently a fair amount of context dependence and integration of various input signals determining HIF stability. What HIF1A does, and indeed how stable it is, is also influenced by other chemical groups appended to it via phosphorylation, SUMOylation and acetylation.)
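The basic logic here, constant production balanced against oxygen-dependent destruction, can be caricatured in a few lines of Python. This is a minimal sketch under loud assumptions: a single variable H stands for the HIF1A level, synthesis runs at a constant rate, and the degradation rate is a small oxygen-independent term plus an oxygen-dependent term that saturates as oxygen rises. The function names, that functional form and every parameter value are hypothetical choices for illustration, not measured quantities, and the sketch deliberately ignores the nutrient inputs, chemical modifications and context dependence just described.

```python
# Toy model (hypothetical): dH/dt = k_syn - k_deg(O2) * H
# H stands for the HIF1A level; units, time scale and parameters are arbitrary assumptions.


def degradation_rate(oxygen, k_basal=0.05, k_max=1.0, k_half=5.0):
    """Hypothetical first-order degradation rate for HIF1A.

    k_basal: oxygen-independent turnover (assumed).
    k_max, k_half: oxygen-dependent turnover that saturates with oxygen,
    in a Michaelis-Menten-like form (assumed, not fitted to any data).
    """
    return k_basal + k_max * oxygen / (k_half + oxygen)


def steady_state_hif(oxygen, k_syn=1.0):
    """Setting dH/dt = 0 in the toy equation gives H* = k_syn / k_deg."""
    return k_syn / degradation_rate(oxygen)


def simulate(oxygen, duration=120.0, dt=0.01, h0=0.0, k_syn=1.0):
    """Integrate the toy equation with forward Euler steps, starting from h0."""
    h = h0
    k_deg = degradation_rate(oxygen)
    for _ in range(round(duration / dt)):
        h += dt * (k_syn - k_deg * h)
    return h


for o2 in (20.0, 5.0, 1.0, 0.1):
    print(f"O2 = {o2:5.1f}: steady-state HIF1A = {steady_state_hif(o2):.2f}, "
          f"simulated after 120 time units = {simulate(o2):.2f}")
```

Lowering oxygen shrinks the degradation term, so in this toy the steady-state level climbs steeply as oxygen falls: the switch-like behaviour comes from a change in the rate of destruction, not of production. The complications that follow lie entirely outside a model like this.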

In the nucleus, HIF1A dimerizes with another member of the family, HIF1B, also called HIF-1β (encoded by the gene ARNT; a close relative encoded by ARNT2 can stand in for it in some tissues), to form a complex that can bind to DNA and regulate genes. The genes it regulates carry regulatory sequences called hypoxia-response elements (HREs) that the HIF1A/1B complex recognizes. These are generally close to the target genes themselves, but not always; some are distal.

OK, so far it seems like classic switch-like regulation (albeit fiendishly intricate!). But here’s where things get really complicated. For one thing, there are many more genomic loci carrying the 5-base-pair HRE recognition sequence than there are sites that HIFs actually bind. In fact, fewer than 1% of the potential HRE sites are bound by HIFs in response to hypoxia. How come HIF1A/1B isn’t sticking to all those others too? No one really knows. But it seems that some of the selectivity depends on sequences flanking the HREs, in a manner as yet unclear. This reminds me of the work I wrote about recently by Polly Fordyce at Stanford and colleagues, who found that repetitive sequences flanking regulatory sites, previously dismissed as “junk”, might act as a sort of attractive well that accumulates and holds onto regulatory molecules like TFs, via weak and fairly non-specific interactions that nevertheless somehow cumulatively impart the right selectivity. These so-called short tandem repeats act as a kind of “lobby” where the molecules can hang around so that they are ready when needed. I’ve no idea if anything like that is happening here, but it shows that we should not be too ready to dismiss parts of the genome that seem literally peripheral and “probably” useless. In any case, it seems likely that factors other than DNA sequence also influence HIF binding to HREs.
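Why would a 5-base-pair recognition sequence turn up so often? A back-of-the-envelope count makes the point. This sketch assumes, for illustration, the commonly cited HRE core consensus RCGTG (where R stands for A or G), a genome of about 3.1 billion base pairs, and, unrealistically, that all four bases occur independently and with equal frequency. Real mammalian DNA is depleted in CG dinucleotides, so the true number of matches is lower than this estimate, but even a several-fold correction leaves a very large number of candidate sites. The helper names are hypothetical.

```python
import random

# IUPAC codes needed for the assumed HRE core consensus; R = purine (A or G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}
MOTIF = "RCGTG"  # assumed 5-bp HRE core consensus, used here for illustration


def site_probability(motif):
    """Chance that one position matches the motif on one strand,
    assuming independent bases at equal (25%) frequency."""
    p = 1.0
    for symbol in motif:
        p *= len(IUPAC[symbol]) / 4
    return p


def matches(window, motif):
    """True if every base in window is allowed by the corresponding IUPAC symbol."""
    return all(base in IUPAC[symbol] for base, symbol in zip(window, motif))


def count_sites(seq, motif):
    """Count positions on the given strand where seq matches motif."""
    k = len(motif)
    return sum(matches(seq[i:i + k], motif) for i in range(len(seq) - k + 1))


GENOME_LENGTH = 3.1e9  # approximate haploid human genome size (assumption)

p = site_probability(MOTIF)       # 1/2 * (1/4)**4 = 1/512
expected = 2 * GENOME_LENGTH * p  # factor of 2: the motif can lie on either strand
print(f"match probability per position and strand: {p}")
print(f"expected matches in a random genome of this size: {expected:.2e}")

# Sanity check of the arithmetic on 200 kb of simulated random sequence.
random.seed(1)
sample = "".join(random.choices("ACGT", k=200_000))
sample_count = count_sites(sample, MOTIF)
print(f"{sample_count} forward-strand matches in 200 kb (about {200_000 * p:.0f} expected)")
```

Under these assumptions the estimate comes to roughly 12 million matches before any CpG correction, which makes the puzzle concrete: sequence alone cannot be what picks out the small fraction of sites that HIFs actually bind.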

What’s more, HIF1A doesn’t do its job alone. Eukaryotic TFs hardly ever do. There is a whole host of other molecules involved in regulating those genes, as evident in this diagram from one article:

When I see something like this, I now know not to take it too literally. It may well be that these molecules aren’t getting together in well-defined, stoichiometric complexes but are instead associating in looser and fuzzier ways – perhaps involving what some call transcriptional hubs or condensates, blobs with liquid-like behaviour that constitute a distinct phase from the rest of the nucleoplasm. I haven’t been able to find any indication that this is what goes on for the HIF proteins, but it wouldn’t surprise me, given how such structures seem to be involved in other regulatory processes. One review of this topic simply says that “HIF1A may stimulate transcription either by means of cooperative DNA binding or cooperative recruitment of coactivators.” (That word “recruitment” is always a giveaway, since obviously the protein is not literally summoning its coactivators from afar – “recruitment” tends to mean “these molecules somehow gather and act together in a way we don’t understand.”)

And get this: “HIF1A has been shown to contribute to transcriptional control independently of its DNA binding activity, working instead in partnership with other DNA binding proteins to affect other cellular pathways.” In other words, there seems to be another (at least one other?) mechanism by which HIF1A does its regulatory work. How is a protein designed to do a job in two totally different ways? The answer is surely that it is modular. But how do these different channels depend on one another, if at all? When does one happen, and when the other? At what level is that decision made?

So in short: what the hell? How can we start to make any sense of this process, beyond the morass of details? Well, here’s the key thing: it seems that this fuzziness and multiplicity of actors enables the regulatory process to be sensitive to higher-level information – so that exactly which genes the HIF complex regulates is tissue- or cell-state-specific. That, after all, is what we’d expect: what’s needed to survive hypoxia will vary between tissues, so the response has to be attuned to that. This is an illustration of why we mustn’t imagine that Crick’s Central Dogma gives any kind of indication of the overall information flow in cells: it is not simply from DNA to RNA to protein (even if that applies to sequence information). What a protein does will be sensitive to higher-level information too.

And in fact, even what a protein is is sensitive to that too. We have known about alternative splicing – the creation of different mRNAs, and thus proteins, from the same primary transcript – since the 1970s, of course. But I am not convinced, despite protestations to the contrary, that the implications of that have really filtered through to the public consciousness, not least in terms of how it undermines the notion of a genetically encoded “program”. Contextual information from the surroundings literally changes the output of the Central Dogma. And the HIF family offer a great illustration of this, as you’ll see.

The other two alpha subunits of the HIF family, HIF2A and HIF3A, also bind to HIF1B. HIF2A has its own set of target genes. But weirdly, its DNA binding domain is very similar to that of HIF1A, and so the sequences HIF1A and HIF2A bind are essentially identical. Yet they do target different genes. How on earth does that happen? Well, it seems that for one thing they have differently spliced varieties (isoforms), meaning that their messenger RNAs are stitched together differently by the spliceosome before translation. Still, it’s hard to figure out how, or if, this is the key factor in their target specificity. One review says:

Although several studies have attempted to define the isoform-specific transcriptional programs, few common themes have emerged from these investigations, thus highlighting the complex nature of this cellular response. Variables such as cell type, severity, duration and variety of stimulation, the presence of functional VHL, and even culture conditions reportedly influence the transcriptional output mediated by HIF1A versus HIF2A. Furthermore, many of these studies have only examined either HIF1A or HIF2A, and untangling HIF-dependent from HIF-independent hypoxia-induced responses has proved challenging.

Again, what the hell? And again: it’s clear that a whole bunch of higher-level information is involved in determining the outcomes. For example, a part of the cell-type specificity seems to relate to the state that the chromatin is in: how it is packaged.

HIF3A, meanwhile, seems a little different from HIF1A and HIF2A, both in sequence and in function. There are several – around six – alternative splicing variants with different regulatory functions. Some of these seem to have a negative regulatory action; for example, one isoform of HIF3A inhibits HIF1A. HIF3A seems to be a classic example of a protein with very tissue-specific alternative splicing: one form, called HIF3A4, for example, is expressed only in the corneal epithelium and controls vascularization there in response to hypoxia.

There’s more. How does HIF binding actually alter gene expression? Well, it’s certainly not in the way the classic regulatory paradigm, the lac operon of E. coli, does it: by simply blocking RNA polymerase from attaching and transcribing the adjacent gene. Or perhaps we should say that yes, ultimately it’s a matter of controlling whether a gene gets transcribed, but in a manner that is far more complicated. In essence, HIF does this by initiating a change in the way the chromatin in that region is packed, for example by making the packing denser so that the DNA there is inaccessible to transcription.

And this too is subtle. One thing HIF binding does is to trigger enzymes that stick methyl groups onto the histone proteins around which DNA is wound in chromatin. Such changes are known to affect chromatin packing, but the details aren’t well understood. Certainly it’s not as simple as saying that methylation makes the histones bulkier and less well packed; sometimes that process will enhance transcription, and sometimes inhibit it. We don’t know what the “rules” are. But they aren’t, I think, going to be governed by any sort of simple, digital code – not least because they involve issues of three-dimensional molecular structure and solvation, and appear again to have a context dependence.

Nor should we imagine that the hypoxia response is merely a matter of the HIF proteins. Several other proteins are involved too. At this point you might want to despair of making any sense of it all. But the point is that this process resembles nothing so much as what goes on in the brain, where information from many sources is integrated and contextualized in the process of generating some appropriate output. That process involves several different scales – it is not simply a matter of this molecule speaking to that one in linear chains of communication. Here, a more useful framework for thinking about the problem is one that is cognitive and analogue, not mechanical and digital.

Oh, there are yet more wrinkles, but I’m going to spare you those. The bottom line is that there are perils in the tempting habit of explaining how HIF1A works with something like “It is a master regulator that switches genes on or off when the cell gets hypoxic.” That is true in a sense, but risks giving a false impression of understanding. In the end it implies that proteins (or their genes) just “do” things, as if by magic, and so suggests that they are in control. In reality, the way it works bears little resemblance to those pictures of blobby molecules sticking together and working via magical arrows. The information flow is far more omnidirectional; the logic is fuzzy, combinatorial and in many respects still poorly understood, and it only makes sense if we take the system as a whole into account. When we do, it becomes clear that there is no basis for saying that genes like HIF1A dictate the hypoxia response; we can with more justification say that cells “decide” how to use their genetic resources to mount a response that is appropriate to their particular state and circumstances.

This is nothing that molecular biologists don’t know (and it is phenomenally impressive that they have got as far as they have). But I believe we need better ways to tell the story, which do justice to the real ingenuity, versatility, and contextuality of life.