Here’s the pre-edited version of my latest story for Nature’s online news, with added bonus boxes. There was far too much interesting stuff in this paper to cram into 700 words or so. And more on the way from others working in this field: watch this space.
Signs of impending social and political change may lie hidden in a sea of data.
You could have foreseen the Arab spring if only you’d been paying enough attention to the news. That’s the claim of a new study which shows how ‘data mining’ of news reportage can reveal the possibility of future crises well before they happen.
Computer scientist Kalev Leetaru at the University of Illinois at Urbana-Champaign has trawled through a vast collection of open-access news reporting and examined the ‘tone’ of the news about Tunisia, Egypt and Libya, where long-established dictatorial political leaders have been deposed by public uprisings in the so-called Arab spring. In all cases, he says, there was a clear, steady trend towards a negative tone for about a decade before the revolts [1].
While this doesn’t predict either the course or the timing of the events during last spring and summer, Leetaru argues that it provided a clear indicator of an impending crisis. “I strongly doubt we'll ever get to the point where we can say ‘at 5:05PM next July 2nd there will be a riot of 20 people at such and such street corner’”, he says. “Rather, the value of this class of work lies in warning of changing moods and environments, and increased vulnerability to a sudden shock”.
Erez Lieberman Aiden of Harvard University, who has explored the mining of digitized literary texts for linguistic and historical trends, agrees. “Leetaru’s work is interesting not so much because it makes predictions, but because it points to the power and the opportunity latent in new ways of analyzing large-scale news databases”, he says.
Political scientist Thomas Chadefaux of the Swiss Federal Institute of Technology (ETH) in Zurich, Switzerland, calls the paper “a welcome addition to a field – political science – that has cared very little about finding early warning signals for war, or making predictions at all.”
Long-term trends can be subtle and hard to spot by subjective and partial monitoring of the news. But they might presage crises more reliably than does a focus on the short term. For example, while there was talk during the spring of the possibility of similar public uprisings in Saudi Arabia, reflected in a rather negative tone in the news there during March 2011, the long-term data showed that spell to be no worse than other fluctuations in recent years – there was no worsening trend. On this basis, one would have predicted the failure of the Arab spring to unseat the Saudi rulers.
“If we think of the vast array of digital information around us today as an ocean of information, up to this point we've largely been studying the surface”, says Leetaru. “The idea behind this work is to poke our heads beneath the water for a moment to show that there's a vast world down there that we've been missing”. He thinks that automated news analysis that looks for information about mood, tone or spatial references could supply something like a political weather forecast, “offering updated assessments every few minutes for the entire planet and pointing out emerging patterns that might warrant further investigation.”
Leetaru has used the immense collection of news reports in the Summary of World Broadcasts (SWB), a monitoring service set up by the British intelligence service just before World War II to assess world opinion. The SWB now includes newspaper articles, television and radio broadcasts, periodicals and a variety of other online resources from over 130 countries.
Previous efforts to extract ‘buried’ information from vast literary resources – an approach dubbed ‘culturomics’ – have tended to focus on quantifying the occurrence of certain key words [2]. In contrast, Leetaru conducted ‘sentiment mining’ of the sources by assessing their positive or negative tone, looking for evaluation words such as ‘terrible’, ‘awful’ or ‘good’. He used computer algorithms to convert these data trawls into a single parameter that quantified the tone of the news, normalized so that the long-term average value is zero.
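For readers who want a feel for what such a tone parameter might look like, here is a minimal Python sketch. It is not Leetaru’s actual method: the word lists, the function names `raw_tone` and `centred_tone`, and the choice to normalize by simply subtracting the long-term mean are all assumptions made for illustration.

```python
import re
from statistics import mean

# Hypothetical, deliberately tiny word lists; a real study would use
# a large published sentiment dictionary.
POSITIVE = {"good", "great", "peaceful"}
NEGATIVE = {"terrible", "awful", "bad"}

def raw_tone(text):
    """Return (positive words - negative words) / total words for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def centred_tone(texts_by_period):
    """Average the tone per period, then subtract the long-term mean
    so that the average over all periods is zero.
    Assumes every period has at least one text."""
    raw = {
        period: mean(raw_tone(t) for t in texts)
        for period, texts in texts_by_period.items()
    }
    baseline = mean(raw.values())
    return {period: value - baseline for period, value in raw.items()}

# Made-up example headlines, keyed by year.
sample = {
    2009: ["a good day for talks"],
    2010: ["a terrible and awful week"],
}
tone = centred_tone(sample)
```

In this toy version a period with a negative value simply had gloomier coverage than the long-run average; whether a dip is unusual would still have to be judged against the size of past fluctuations.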
For Egypt, the tone in early 2011 fell to a negative value seen only once before in the past three decades. What’s more, at that same time the tone of the coverage specifically mentioning the (now deposed) president Hosni Mubarak reached its lowest-ever level in his almost 30-year rule. Similar falls to highly unusual low points were found for Tunisia and Libya.
This didn’t in itself predict when those crises would happen – it seems likely, for example, that rocketing food prices helped to trigger the Arab spring revolts [3]. But it might reveal when a region or state is ripe for unrest. Dirk Helbing, a specialist in modeling of social systems at ETH, compares it to the case of traffic flow: computer models can help to spot when traffic is in a potentially unstable state, but the actual triggers for jams may be random and unpredictable.
By the same token, it remains to be seen whether this approach can spot signs of trouble in advance, rather than retrospectively finding them foreshadowed in the media. “It is obviously much easier to find precursory signs when you know where to look than to do it blindly”, says Chadefaux.
But if news mining does turn out to offer a crystal ball, “the question is what kinds of use we’ll make of this information”, says Helbing. “Will governments act in a responsive way to avoid crises, say by improving people’s living conditions, or will they use it to police dissatisfied people in a preventative way?”
1. Leetaru, K. First Monday 16(9) (online only), 5 September 2011.
2. Michel, J. B. et al., Science 331, 176-182 (2010).
3. Lagi, M., Bertrand, K. Z. & Bar-Yam, Y. http://arxiv.org/abs/1108.2455 (2011).
Read all about it
Where is Osama bin Laden?
Leetaru also looked at whether the sources of news reports might provide information about the spatial location of events. He analysed all media references to Osama bin Laden since 1979 to look for co-occurrences of geographical places. Between bin Laden’s rise to media prominence in the 1990s and his capture and killing in 2011, the most common associations were with northern Pakistan, within a 200-km radius of the cities of Islamabad and Peshawar – the region in which he was finally found.
How the world looks from here
News sources are often criticized for being too parochial. That turns out to be a valid complaint, at least for the US news: Leetaru found that even the New York Times, a relatively ‘internationalist’ newspaper, constantly relates its reports from other countries back to the US. “Nearly every foreign location it covers is mentioned alongside a US city, usually Washington DC”, he says.
By looking for such co-references to specific cities or other geographical landmarks throughout the world, Leetaru extracted a map of how the global news links nations into ‘world civilizations’. For SWB these correspond largely to the recognized geographical affiliations: Australasia, the Middle East (including much of northeast Africa), the Americas and so forth. But there are anomalies: Spain is linked to South America, and France and Portugal to southern Africa, showing that the imprint of imperial history is still felt in the world. Strikingly, however, the ‘map’ derived from the New York Times alone is rather different: on this measure, the US has its own distinctive view of the world. That matters, says Leetaru. “Understanding how a given country groups the rest of the world gives you critical information on how to approach that country in terms of shaping policy”, he says.
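The co-reference counting that underlies both this map and the bin Laden analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the place list `PLACES` is a hypothetical stand-in for a proper gazetteer, the function names are invented here, and matching place names by plain substring search is far cruder than anything a real study would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-gazetteer; a real analysis would need a full one
# plus disambiguation (e.g. Cairo, Egypt versus Cairo, Illinois).
PLACES = {"Cairo", "Washington", "Islamabad", "Peshawar"}

def places_in(text):
    """Return the set of known place names that appear in a text."""
    return {place for place in PLACES if place in text}

def co_occurrence(articles):
    """Count how many articles mention each pair of places together."""
    counts = Counter()
    for text in articles:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(places_in(text)), 2):
            counts[pair] += 1
    return counts

# Made-up example articles.
articles = [
    "Protests in Cairo were watched closely from Washington.",
    "Officials travelled from Islamabad to Peshawar.",
    "Washington issued a statement on events in Cairo.",
]
links = co_occurrence(articles)
```

Pairs with high counts are the ones the news tends to mention in the same breath; grouping places by the strength of those links is one plausible way to arrive at a map of the kind described above.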
Here’s some more bad news
If you’ve been feeling that the news is always bad these days, you’ve got a point. It has been getting steadily worse for the past 30 years, according to the trend in the tone of the entire data set in the SWB since 1979.