The chemical brain
Here’s my latest Crucible column for Chemistry World. A techie one, but no harm in that. I also have a feature on nanobubbles in this (September) issue, and will try to stick that up, in extended form, on my website soon.
Bartosz Grzybowski of Northwestern University in Illinois, who has already established himself as one of the most inventive current practitioners of the chemical art, has unveiled a ‘chemo-informatic’ scheme called Chematica that can stake a reasonable claim to being paradigm-changing. He and his colleagues have spent years assembling the transformations linking chemical species into a vast network that codifies and organizes the known pathways through chemical space. Each node of the network is either a molecule or element, or a chemical reaction. Links connect reactants and products via the nexus of a known reaction. The full network contains around 7 million compound nodes and about the same number of reaction nodes. Grzybowski calls it a “collective chemical brain.”
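For the programmatically minded, a network of this two-node-type kind can be sketched in a few lines of Python. This is only a toy illustration of the structure described above, not Chematica's actual implementation; the class name, method names and the two example reactions are my own assumptions.

```python
from collections import defaultdict


class ReactionNetwork:
    """Toy network in which compounds are linked only through reaction nodes.

    Hypothetical structure for illustration; not Chematica's real data model.
    """

    def __init__(self):
        self.consumed_by = defaultdict(set)  # compound -> reactions that use it
        self.produced_by = defaultdict(set)  # compound -> reactions that make it
        self.reactions = {}                  # reaction id -> (reactants, products)

    def add_reaction(self, rxn_id, reactants, products):
        """Add a reaction node and link it to its reactant and product nodes."""
        self.reactions[rxn_id] = (frozenset(reactants), frozenset(products))
        for compound in reactants:
            self.consumed_by[compound].add(rxn_id)
        for compound in products:
            self.produced_by[compound].add(rxn_id)

    def compounds(self):
        """Every compound node that appears in at least one reaction."""
        return set(self.consumed_by) | set(self.produced_by)


net = ReactionNetwork()
net.add_reaction("esterification", ["acetic acid", "ethanol"],
                 ["ethyl acetate", "water"])
net.add_reaction("hydrolysis", ["ethyl acetate", "water"],
                 ["acetic acid", "ethanol"])
```

Here two reaction nodes tie together four compound nodes; the real network simply has millions of each.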
I predict a mixed reaction from chemists. On the one hand the potential value of such a tool for discovering improved or entirely new synthetic pathways to drugs, materials and other useful products is tremendous, and has already been illustrated by Grzybowski’s team. On the other hand, Chematica seems to imply that chemistry is indeed, as the old jibe puts it, just cookery, and is something now better orchestrated by computer than by chemists.
I’ll come back to that. First let’s look at what Chematica is. Grzybowski first described the network in 2005 [1], when he was mostly concerned with its topological properties rather than with chemical insights. Like the Internet or some social networks, the chemical network has ‘scale-free’ connectivity, meaning that the distribution of nodes with different degrees of connectivity n is a power law: the number of nodes with n links is proportional to n^(-α), where α is a constant. This means that a few very highly connected nodes are the hubs that bind the network together and provide shortcuts. The same structure is also found in the reaction network of compounds in metabolic pathways.
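To make the power law concrete, here is a minimal Python sketch that recovers α from a list of node degrees by fitting a straight line to log(count) against log(n). The function name and the synthetic degree list are my own inventions, and a log-log least-squares fit is a crude estimator that I have chosen for brevity (maximum-likelihood methods are generally preferred for real data); nothing here is taken from the papers.

```python
import math
from collections import Counter


def estimate_alpha(degrees):
    """Estimate alpha as minus the least-squares slope of log(count) vs log(n).

    Needs at least two distinct positive degrees, otherwise the slope is undefined.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(n) for n in counts]
    ys = [math.log(counts[n]) for n in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


# Synthetic degree list: the number of nodes with n links is exactly 1024 / n**2,
# i.e. a power law with alpha = 2 (invented numbers, not the real network).
degrees = []
for n in (1, 2, 4, 8, 16, 32):
    degrees.extend([n] * (1024 // n ** 2))

alpha = estimate_alpha(degrees)
```

In this made-up example there are 1024 nodes with a single link but only one with 32 links, which is the 'few hubs, many loners' pattern the power law describes.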
In a trio of new papers the researchers have now started to put the network to use. In the first, they perform an automated trawl for new one-pot reactions that can replace existing multi-step syntheses [2]. The advantages of single-step processes are obvious: no laborious separation and purification of products after each step, and none of the loss of yield that those operations entail. Identifying potential one-pot processes linking molecular nodes that hitherto lacked a direct connection here means subjecting the relevant reactions to several filtering steps that check for compatibility – for example, checking that a water-solvated synthesis will not unintentionally hydrolyse functional groups. This filtering is painstaking in principle, but very quick once automated.
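The flavour of such a compatibility filter can be conveyed with a small Python sketch. I stress that the rule table, the function name and the example steps are all hypothetical, and that the real filters are surely far more elaborate; the point is only that, in one pot, every reaction condition meets every species, so each condition must be checked against each functional group that has to survive.

```python
# Hypothetical rule table: reaction condition -> functional groups it would destroy.
INCOMPATIBLE = {
    "water": {"acyl chloride", "anhydride"},
    "strong base": {"ester"},
    "strong reducing agent": {"aldehyde", "ketone", "nitro"},
}


def one_pot_conflicts(steps):
    """Return sorted (condition, group) pairs that rule out a one-pot sequence.

    Each step is a dict with 'conditions' (a set of condition names) and
    'groups' (functional groups that must survive while in the pot).
    """
    all_conditions = set().union(*(step["conditions"] for step in steps))
    all_groups = set().union(*(step["groups"] for step in steps))
    return sorted(
        (condition, group)
        for condition in all_conditions
        for group in INCOMPATIBLE.get(condition, set()) & all_groups
    )


# Two invented steps: the first runs in water, the second carries an acyl chloride.
steps = [
    {"conditions": {"water"}, "groups": {"amine"}},
    {"conditions": {"strong base"}, "groups": {"acyl chloride", "ester"}},
]
conflicts = one_pot_conflicts(steps)
```

An empty list would mean the sequence passes this particular filter; here it fails on two counts, so the candidate would be discarded.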
It is one thing to demonstrate that such one-pot syntheses are possible in principle, but Grzybowski and colleagues have ensured that at least some of those identified work in practice. Specifically, they looked for syntheses of quinoline-based molecules – common components of drugs and dyes – and thiophenes, which have useful electronic and optical properties. Many of the new pathways worked with high yields, in some cases demonstrably higher than those of alternative multi-step syntheses. Some false positives arise from errors in the literature used to build the network.
Another use of Chematica is to optimize existing syntheses – something previously reliant on manual or non-exhaustive semi-automated searches. Looking for improved – basically, cheaper – routes to a given target is a matter of stepping progressively backwards from that molecule to preceding intermediates [3]. An algorithm can calculate the costs of all such steps in the network, working recursively backwards to a specified ‘depth’ (maximum number of synthetic steps) and finding the cheapest option. Applied to syntheses conducted by Grzybowski’s company ProChimia, Chematica offered potential savings of up to 45 percent if instituted for 51 of the company’s targets. The greater the number of targets, the greater the savings because of the economies of shared ingredients and intermediates.
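The recursive backwards search is easy to sketch in Python. What follows is a bare-bones illustration under my own assumptions, not the published algorithm: the compound names, prices and reaction costs are invented, every reactant is priced per unit with no allowance for yield, and the economies of intermediates shared between several targets are ignored.

```python
import math
from functools import lru_cache

# Hypothetical data: purchase prices, and for each product a list of known
# reactions given as (cost of running the step, reactants). All invented.
PRICE = {"A": 5.0, "B": 2.0, "C": 40.0, "D": 1.0}
ROUTES = {
    "T": [(3.0, ("C",)), (10.0, ("A", "B"))],
    "C": [(4.0, ("B", "D"))],
}


@lru_cache(maxsize=None)
def cheapest(target, depth):
    """Lowest cost of obtaining `target` using at most `depth` synthetic steps."""
    best = PRICE.get(target, math.inf)  # option 1: simply buy it, if it is sold
    if depth == 0:
        return best
    # Option 2: make it by a known reaction, costing each reactant recursively.
    for step_cost, reactants in ROUTES.get(target, []):
        total = step_cost + sum(cheapest(r, depth - 1) for r in reactants)
        best = min(best, total)
    return best


one_step = cheapest("T", 1)   # must buy C, so making T from A and B wins
two_steps = cheapest("T", 2)  # C can now be made from B and D instead
```

Allowing the search to go one step deeper uncovers a cheaper route, because the expensive intermediate C can itself be made from cheap ingredients. The depth limit also keeps the recursion from chasing cycles in the network indefinitely.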
Finally, and perhaps most controversially, the researchers show how Chematica can be used to identify threats of chemical-weapons manufacture by terrorists [4]. The network can be searched for routes to harmful substances such as nerve agents using unregulated ingredients. Of course, it can also disclose such routes, but as with viral genomic data [5], open access to such data should be the best antidote to the risks they inherently pose.
Does all this, then, mean that synthetic organic chemists are about to be automated? The usual response is to insist that computers will never match human creativity. But that defence is looking increasingly under threat in, say, chess, maths and perhaps even music and visual art. In some ways chemical synthesis is as rule-bound as music if not chess, and thus ripe for an algorithmic approach. Perhaps at least some of the beauty rightly attributed to classic syntheses should be seen as illustrating human ingenuity in the face of tasks for which no better solution then existed. Synthetic schemes designed by humans surely won’t become obsolete any time soon – but there seems no harm in acknowledging that the time may come when the art and creativity of chemistry resides more solidly in our decisions of what to make, and why, than in how we make it.
1. M. Fialkowski, K. J. M. Bishop, V. A. Chubukov, C. J. Campbell & B. A. Grzybowski, Angew. Chem. Int. Ed. 44, 7263 (2005).
2. C. M. Gothard et al., Angew. Chem. Int. Ed. online publication 10.1002/anie.201202155 (2012).
3. M. Kowalik et al., Angew. Chem. Int. Ed. online publication 10.1002/anie.201202209 (2012).
4. P. E. Fuller, C. M. Gothard, N. A. Gothard, A. Wieckiewicz & B. A. Grzybowski, Angew. Chem. Int. Ed. online publication 10.1002/anie.201202210 (2012).
5. M. Imai et al., Nature 10.1038/nature10831 (2012).