The thermodynamics of images
This was a somewhat challenging topic I took on for my latest column for BBC Future.
On an unrelated matter, my talk on Curiosity at the Perimeter Institute in December is now online. The Q&A is here.
And while I am doing non-sequiturs, I am deeply troubled by the news that the Royal Institution has put its Albemarle St building up for sale to cover the debts incurred in the excessively lavish refurbishment (don’t get me started). Amol Rajan in the Independent is dead right: it would be monstrous if this place were lost to science, and Faraday’s lecture theatre became a corporate office. It must be saved!
One of the unforeseen boons of research on artificial intelligence is that it has revealed much about our own intelligence. Some aspects of human perception and thought can be mimicked easily, indeed vastly surpassed, by machines, while others are extremely hard to reproduce. Take visual processing. We can give a satellite an artificial eye that can photograph your backyard from space, but making machines that can interpret what they ‘see’ is still very challenging. That realization should make us appreciate our own virtuosity in making sense of a visual field crammed with objects, some overlapping, occluded, moving, or viewed at odd angles or in poor light.
This ability to deconstruct immense visual complexity is usually regarded as an exquisite refinement of the neural circuitry of the human brain: in other words, it’s all in the head. It’s seldom asked what rules govern the visual stimulus in the first place: we tend to regard this as simply composed of objects whose identity and discreteness we must decode. But a paper published in the journal Physical Review Letters stands the problem of image analysis on its head by asking what the typical statistical features of natural images are. In other words, what sort of problem is it, really, that we’re solving when we look at the world?
Answering that question involves a remarkable confluence of scientific concepts. There is today a growing awareness that the science of information – how data is encoded, inter-converted and transported, whether in computers, genes or the quantum states of atoms – is closely linked to the field of thermodynamics, which was originally devised to understand how heat flows in engines and other machinery. For example, any processing of information – changing a bit in a computer’s binary memory from a 1 to a 0, say – generates heat.
A team at Princeton University led by William Bialek now integrates these ideas with concepts from image processing and neuroscience. The consequences are striking. Bialek and his colleagues Greg Stephens, Thierry Mora and Gasper Tkacik find that in a pixellated monochrome image of a typical natural scene, some groups of black and white pixels are more common than other, seemingly similar ones. And they argue that such images can be assigned a kind of ‘temperature’ which reflects the way the black and white pixels are distributed across the visual field. Some types of image are ‘hotter’ than others – and in particular, natural images seem to correspond to a ‘special’ temperature.
One way to describe a (black and white) image is to break it down into ‘waves’ of alternating light and dark patches. The longest wavelength would correspond to an all-white or all-black image, the shortest to black and white alternating for every adjacent pixel. The finer the pixels, the more detail you capture. This is equivalent to breaking down a complex sound into its component frequencies, and a graph of the intensity of each wave plotted against its wavelength is called a power spectrum. One of the characteristics of typical natural images, such as photos of people or scenery, is that they all tend to have the same kind of power spectrum – that’s a way of saying that, while the images might show quite different things, the ‘patchiness’ of light and dark is typically the same. It’s not always so, of course – if we look at the night sky, or a blank wall, there’s very little variation in brightness. But the power spectra reveal a surprising statistical regularity in most images we encounter.
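To make the idea of a power spectrum concrete, here is a minimal Python sketch for a single row of pixels. It is an illustration under stated assumptions, not the researchers’ analysis: white pixels are coded as +1 and black as -1, only one dimension is treated, and the function name power_spectrum is invented for this example.

```python
import cmath

def power_spectrum(row):
    """Return the power at each spatial frequency for a 1-D row of pixels.

    Assumption for this sketch: pixels are coded +1 (white) and -1 (black).
    Entry k is the squared magnitude of the k-th discrete Fourier
    coefficient, divided by the row length. k = 0 is the uniform
    (longest-wavelength) component; k = n // 2 is pixel-by-pixel
    alternation (the shortest wavelength).
    """
    n = len(row)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            value * cmath.exp(-2j * cmath.pi * k * x / n)
            for x, value in enumerate(row)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum

uniform = [1] * 8            # an all-white row
alternating = [1, -1] * 4    # white and black on every adjacent pixel

uniform_power = power_spectrum(uniform)
alternating_power = power_spectrum(alternating)
```

In this toy case all the power of the uniform row sits at frequency 0, and all the power of the alternating row sits at the highest frequency. A row cut from a natural image would instead spread its power over many frequencies.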
What’s more, these power spectra have another common characteristic, called scale invariance. This means that pretty much any small part of an image is likely to have much the same kind of variation of light and dark pixels as the whole image. Bialek and colleagues point out that this kind of scale-invariant patchiness is analogous to what is found in physical systems at a so-called critical temperature, where two different states of the system merge into one. A fluid (such as water) has a critical temperature at which its liquid and gas states become indistinguishable. And a magnet such as iron has a critical temperature at which it loses its north and south magnetic poles: the magnetic poles of its constituent atoms are no longer aligned but become randomized and scrambled by the heat.
So natural images seem to possess something like a critical temperature: they are poised between ‘cold’ images that are predominantly light or dark, and ‘hot’ images that are featureless and random. This is more than a vague metaphor – for a selection of woodland images, the researchers show that the distributions of light and dark patches have just the same kinds of statistical behaviours as a theoretical model of a two-dimensional magnet near its critical temperature.
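For readers who want to see what ‘cold’, ‘critical’ and ‘hot’ pictures look like, the sketch below simulates the two-dimensional Ising model, a standard textbook model of a magnet, using the Metropolis algorithm. This is an assumption made for illustration: the column above says only that the researchers compared images with a theoretical model of a two-dimensional magnet, and the grid size, number of sweeps and function names here are all invented for the example. Spins of +1 and -1 play the role of white and black pixels.

```python
import math
import random

# Critical temperature of the square-lattice Ising model (coupling J = 1,
# Boltzmann constant = 1); about 2.269.
T_CRITICAL = 2 / math.log(1 + math.sqrt(2))

def metropolis_sweep(spins, temperature, rng):
    """Attempt n*n single-spin flips on an n x n grid with periodic edges."""
    n = len(spins)
    for _ in range(n * n):
        r, c = rng.randrange(n), rng.randrange(n)
        neighbours = (
            spins[(r - 1) % n][c] + spins[(r + 1) % n][c]
            + spins[r][(c - 1) % n] + spins[r][(c + 1) % n]
        )
        # Energy change if this spin were flipped.
        delta_e = 2 * spins[r][c] * neighbours
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            spins[r][c] = -spins[r][c]

def simulate(temperature, size=16, sweeps=200, seed=0):
    """Start from a random grid and run the given number of sweeps."""
    rng = random.Random(seed)
    spins = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
    for _ in range(sweeps):
        metropolis_sweep(spins, temperature, rng)
    return spins

def magnetisation(spins):
    """Mean spin: +1 or -1 for a uniform picture, near 0 for a balanced one."""
    n = len(spins)
    return sum(map(sum, spins)) / (n * n)

def show(spins):
    """Render the grid as text, '#' for +1 and '.' for -1."""
    return "\n".join("".join("#" if s > 0 else "." for s in row) for row in spins)

hot = simulate(temperature=10.0)
critical = simulate(temperature=T_CRITICAL)
```

Printing show(hot) and show(critical) side by side gives a feel for the difference: the hot grid looks like static, while near T_CRITICAL patches of many sizes tend to appear. A grid this small and a run this short only hint at the true critical behaviour.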
Another feature of a system in such a critical state is that it has access to a much wider range of possible configurations than it does at either lower or higher temperatures. For images, this means that each one is largely unique – they share few specific features, even if statistically they are similar. Bialek and colleagues suspect this might be why data files encoding natural images are hard to compress: the fine details matter in distinguishing one image from another.
What are the fundamental patterns from which these images are composed? When the researchers looked for the most common types of pixel patches – for example, 4x4 groups of pixels – they found something surprising. Fully black or white patches are very common, but as the patches are split into increasingly complex arrangements of white and black pixels, not all are equally likely: there are certain forms that are significantly more likely than others. In other words, natural images seem to have some special ‘building blocks’ from which they are constituted.
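Counting patches is simple enough to sketch in a few lines of Python. What follows is a toy illustration, not the researchers’ procedure: the tiny hand-made image, the choice to count overlapping patches, and the function name patch_counts are all assumptions of this example. It uses 2x2 patches to keep things legible; a 4x4 patch of black and white pixels already has 2^16 = 65,536 possible forms.

```python
from collections import Counter

def patch_counts(image, size=2):
    """Count how often each size x size block occurs in a binary image.

    `image` is a list of equal-length rows of 0s (black) and 1s (white).
    Every position is used, so neighbouring patches overlap.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            patch = tuple(
                tuple(image[r + dr][c:c + size]) for dr in range(size)
            )
            counts[patch] += 1
    return counts

# A made-up 6x6 'scene': a dark region meeting a light one along a ragged edge.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

counts = patch_counts(image, size=2)
most_common_patch, occurrences = counts.most_common(1)[0]
```

Even in this toy scene the uniform patches dominate and only a handful of the sixteen possible 2x2 forms appear at all, all of them lying along the edge between dark and light.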
If that’s so, Bialek and colleagues think the brain might exploit this fact to aid visual perception by filtering out ‘noise’ that occurs naturally on the retina. If the brain were to attune groups of neurons to these privileged ‘patches’, then it would be easier to distinguish two genuinely different images (made up of the ‘special’ patches) from two versions of the same image corrupted by random noise (which would include ‘non-special’ patches). In other words, natural images may offer a ready-made error-correction scheme that helps us interpret what we see.
Reference: G. J. Stephens, T. Mora, G. Tkacik & W. Bialek, Physical Review Letters 110, 018701 (2013).