Tuesday, April 07, 2009

Physics by numbers

[This is the full version of my latest Muse for Nature News.]

A suggestion that the identification of physical laws can be automated raises questions about what it means to do science.

Two decades ago, computer scientist Kemal Ebcioglu at IBM described a computer program that wrote music like J. S. Bach. Now I know what you’re thinking: no one has ever written music like Bach. And Ebcioglu’s algorithm had a somewhat more modest goal: given the bare melody of a Bach chorale, it could fill in the rest (the harmony) in the style of the maestro. The results looked entirely respectable [1], although sadly no ‘blind tasting’ by music experts ever put them to the test.

Ebcioglu’s aim was not to rival Bach, but to explore whether the ‘laws’ governing his composition could be abstracted from the ‘data’. The goal was really no different from that attempted by scientists all the time: to deduce underlying principles from a mass of observations. Writing ‘Bach-like music’, however, highlights the constant dilemma in this approach. Even if the computerized chorales had fooled experts, there would be no guarantee that the algorithm’s rules bore any relation to the mental processes of Johann Sebastian Bach. To put it crudely, we couldn’t know if the model captured the physics of Bach.

That issue has become increasingly acute in recent years, especially in the hazily defined area of science labelled complexity. Computer models can now supply convincing mimics of all manner of complex behaviours, from the flocking of birds to traffic jams to the dynamics of economic markets. And the question repeatedly put to such claims is: do the rules of the model bear any relation to the real world, or are the resemblances coincidental?

This matter is raised by a recent paper in Science that reports on a technique to ‘automate’ the identification of ‘natural laws’ from experimental data [2]. As the authors Michael Schmidt and Hod Lipson of Cornell University point out, this is much more than a question of data-fitting – it examines what it means to think like a physicist, and perhaps even interrogates the issue of what natural laws are.

The basic conundrum is that, as is well known, it’s always possible to find a mathematical equation that will fit any data set to arbitrary precision. But that’s often pointless, since the resulting equations may be capturing contingent noise as well as meaningful physical processes. What’s needed is a law that obeys Einstein’s famous dictum, being as simple as possible but not simpler.
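To see the problem concretely, here is a minimal sketch (in Python, with illustrative numbers of my own choosing): a high-degree polynomial passes exactly through ten noisy measurements of a simple linear law, yet predicts fresh data from the same law far worse than the humble straight line does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a simple underlying 'law': y = 2x + 1
x = np.linspace(0, 1, 10)
y = 2 * x + 1 + rng.normal(scale=0.1, size=x.size)

# A degree-9 polynomial fits all ten points to arbitrary precision...
overfit = np.polyfit(x, y, deg=9)
# ...while a straight line merely fits them approximately.
simple = np.polyfit(x, y, deg=1)

# But on fresh points from the same underlying law, the parsimonious
# model wins: the interpolating polynomial has captured the noise.
x_new = np.linspace(0.05, 0.95, 50)
y_new = 2 * x_new + 1
err_overfit = np.mean((np.polyval(overfit, x_new) - y_new) ** 2)
err_simple = np.mean((np.polyval(simple, x_new) - y_new) ** 2)
print(err_simple < err_overfit)
```

The 'perfect' fit encodes the contingent noise of this particular data set, not the law behind it.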

‘But not simpler’ here means not reducing the data to a trivial level. In complex systems, it has become common, even fashionable, to find power laws (y proportional to x^n) that link two variables [3]. But the very ubiquity of such laws, in systems ranging from economics to linguistics, is now prompting suspicions that power laws might in themselves lack much physical significance. And some alleged power laws might in fact be different mathematical relationships that merely look similar over small ranges [4].
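That last point is easy to demonstrate. The toy sketch below (the function and range are my own choices, not taken from ref. [4]) takes a stretched exponential – emphatically not a power law – and shows that over a modest range its log-log plot is almost perfectly straight, which is precisely the signature usually offered as evidence for a power law.

```python
import numpy as np

# A stretched exponential, y = exp(-sqrt(x)): not a power law.
x = np.linspace(10, 20, 50)   # a deliberately narrow range
y = np.exp(-np.sqrt(x))

# Over this range, log y versus log x is nearly a straight line --
# the usual diagnostic for a power law y ~ x^n.
logx, logy = np.log(x), np.log(y)
slope, intercept = np.polyfit(logx, logy, 1)
fit = slope * logx + intercept
r2 = 1 - np.sum((logy - fit) ** 2) / np.sum((logy - logy.mean()) ** 2)
print(r2)  # very close to 1, despite there being no power law here
```

A straight line on log-log axes over a couple of decades, in other words, is weak evidence on its own.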

Ideally, the mathematical laws governing a process should reflect the physically meaningful invariants of that process. They might, for example, stem from conservation of energy or of momentum. But it can be terribly hard to distinguish true invariants from trivial patterns. A recent study showed that the similarity of various dimensionless parameters from the life histories of different species, such as the ratio of average life span to age at maturity, has no fundamental significance [5].

It’s not always easy to separate the trivial or coincidental from the profound. Isaac Newton showed that Kepler’s laws identifying mathematical regularities in the parameters of planetary orbits have a deep origin in the inverse-square law of gravity. But the notorious Titius-Bode ‘law’ that alleges a mathematical relationship between the semi-major axes and the ranking of planets in the solar system remains contentious and is dismissed by many astronomers as mere numerology.

As Schmidt and Lipson point out, some of the invariants embedded in natural laws aren’t at all intuitive because they don’t actually relate to observable quantities. Newtonian mechanics deals with quantities such as mass, velocity and acceleration, while its more fundamental formulation by Joseph-Louis Lagrange invokes the principle of minimal action – yet ‘action’ is an abstract mathematical quantity, an integral that can be calculated but not really ‘measured’ directly.

And many of the seemingly fundamental constructs of ‘natural law’ – the concept of force, say, or the Schrödinger equation in quantum theory – turn out to be unphysical conveniences or arbitrary (if well motivated) guesses that merely work well. The question of whether one ascribes any physical reality to such things, or just uses them as theoretical conveniences, is often still unresolved.

Schmidt and Lipson present a clever way to narrow down the list of candidate ‘laws’ describing a data set by using additional criteria, such as whether the partial derivatives of the equations also fit those of the data. Their approach is Darwinian: the best candidates are selected, on such grounds, from a pool of trial functions, and refined by iterated mutation until they reach some specified level of predictive ability. Parsimony then picks out the preferred solution: predictive ability typically drops sharply once the equations are simplified beyond some threshold, suggesting that the true physics of the problem is lost at that point.
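Stripped of the evolutionary machinery and the derivative-matching, the core idea – score candidate expressions by how well they fit, penalize complexity, and keep the best – can be sketched in a few lines. Everything below (the building blocks, the penalty weight, the restriction to sums of terms with unit coefficients) is an illustrative simplification of my own, not Schmidt and Lipson’s actual algorithm.

```python
import itertools
import numpy as np

# A hidden 'law' we pretend to rediscover from data: y = x**2 + x
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 200)
y = x**2 + x + rng.normal(scale=0.01, size=x.size)

# Candidate building blocks: (name, complexity, function)
primitives = [
    ("x",      1, lambda v: v),
    ("x**2",   2, lambda v: v**2),
    ("x**3",   2, lambda v: v**3),
    ("sin(x)", 2, np.sin),
    ("exp(x)", 2, np.exp),
]

# Candidate 'laws': sums of one or two primitives -- a crude stand-in
# for the mutating population of expression trees in the real method.
best = None
for terms in itertools.chain(
        itertools.combinations(primitives, 1),
        itertools.combinations(primitives, 2)):
    pred = sum(f(x) for _, _, f in terms)
    mse = np.mean((y - pred) ** 2)
    complexity = sum(c for _, c, _ in terms)
    score = mse + 0.001 * complexity   # the parsimony penalty
    if best is None or score < best[0]:
        best = (score, " + ".join(name for name, _, _ in terms))

print(best[1])  # -> x + x**2
```

The parsimony term is what stops the search from tacking on extra terms that merely mop up noise; tightening it too far, as described above, makes the fit collapse once genuinely physical terms start to be discarded.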

The key point is that the method seems to work. When used to deduce mathematical laws describing the data from two experiments in mechanics – an oscillator made from two masses linked by springs, and a pendulum with two hinged arms – it came up with precisely the equations of motion that physicists would construct from first principles using Newton’s laws of motion and Lagrangian mechanics. In other words, the solutions encode not just the observed data but the underlying physics.

Their experience with this system leads Schmidt and Lipson to suggest ‘seeding’ the selection process by drawing on an ‘alphabet’ of physically motivated building blocks. For example, if the algorithm is sent fishing for equations incorporating kinetic energy, it should seek expressions involving the square of velocities (since kinetic energy is proportional to velocity squared). In this way, the system would start to think increasingly like a physicist, giving results that we can interpret intuitively.

But perhaps the arena most in need of a tool like this is not physics but biology. Another paper in Science, by Ross King of Aberystwyth University and his coworkers, reports a ‘robot scientist’ named Adam that can frame and experimentally test hypotheses about the genomics of yeast [6]. By identifying connections between genes and enzymes, Adam could channel post-docs away from such donkey-work towards more creative endeavours. But the really deep questions, about which we remain largely ignorant, concern what one might call the physics of genomics: whether there are equivalents of the Newtonian and Lagrangian principles, and if so, what they are. Despite the current fad for banking vast swathes of biological data, theories of this sort will not simply fall out of the numbers. So we need all the help we can get – even from robots.

References

1. Ebcioglu, K. Comput. Music J. 12(3), 43-51 (1988).
2. Schmidt, M. & Lipson, H. Science 324, 81-85 (2009).
3. Newman, M. E. J. Contemp. Phys. 46, 323-351 (2005).
4. Clauset, A., Shalizi, C. R. & Newman, M. E. J. SIAM Rev. (in press).
5. Nee, S. et al. Science 309, 1236-1239 (2005).
6. King, R. D. et al. Science 324, 85-89 (2009).
