Tuesday, November 06, 2012

Who's bored af?

Here’s my latest piece for BBC Future. This version contains rather more rude words than the one eventually published – the BBC is perhaps surprisingly decorous in this respect (or maybe they figure that their science readers aren’t used to seeing the sort of language that arty types throw around all the time).

My editor Simon Frantz pointed out this other example of how Twitter is being used for linguistic/demographic analysis, in this case to map the distribution of languages in London. I love the bit about the unexpected prevalence of the Tagalog language of the Philippines – because it turns out to contain constructions such as “lolololol” and “hahahahaha”. I hope that in Tagalog these convey thoughts profounder than those of teenage tweeters.

_____________________________________________________________________

This piece contains strong language from the beginning, as they say on the BBC. But only in the name of science – for a new study of how slang expressions spread on Twitter professes to offer insights into a more general question in linguistics: how innovation in language use occurs.

You might, like me, have been entirely innocent of what ‘af’ denotes in the Twittersphere, in which case the phrase “I’m bored af” would simply baffle you. It doesn’t, of course, take much thought to realise that it’s simply an abbreviation for “as fuck”. What’s less obvious is why this pithy abbreviation should, as computer scientist Jacob Eisenstein of the Georgia Institute of Technology in Atlanta and his coworkers Brendan O’Connor, Noah Smith and Eric Xing of Carnegie Mellon University in Pittsburgh report in a preprint as yet unpublished, have jumped from its origin in southern California to a cluster of cities around Atlanta before spreading more widely across the east and west US coasts.

Other neologisms have different life stories. Spelling bro, slang for brother (male friend or peer), as bruh began in cities of the southeastern US (where it reflects the local pronunciation) before finally jumping to southern California. The emoticon “-__-” (denoting mild discontent) began in New York and Florida before colonizing both coasts and gradually reaching Arizona and Texas.

Who cares? Well, the question of how language changes and evolves has occupied linguistic anthropologists for several decades. What determines whether an innovation will propagate throughout a culture, remain just a local variant, or be stillborn? Such questions decide the grain and texture of all our languages – why we might tweet “I’m bored af” rather than “I’m bored, forsooth”.

There are plenty of ideas about how this happens. One suggestion is that innovations spread by simple diffusion from person to person, like a spreading ink blot. Another idea is that bigger population centres exert a stronger attraction on neologisms, so that they go first to large cities by a kind of gravitational pull. Or maybe culture and demography matter more than geographical proximity: words might spread initially within some minority groups while being invisible to the majority.
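To make the difference between the first two ideas concrete, here is a toy sketch in Python. It is not the model used in the study: the three cities, their populations and positions, and both scoring formulas are made up purely for illustration. Under "diffusion" a new word simply goes to the nearest place; under "gravity" a big city can win even if it is further away.

```python
import math

# Hypothetical toy data: name -> (population in millions, x, y) on an arbitrary grid.
CITIES = {
    "A": (10.0, 0.0, 0.0),  # where the new word starts
    "B": (0.5, 1.0, 0.0),   # small town next door
    "C": (8.0, 3.0, 0.0),   # big city further away
}


def distance(a, b):
    """Straight-line distance between two cities on the toy grid."""
    _, ax, ay = CITIES[a]
    _, bx, by = CITIES[b]
    return math.hypot(ax - bx, ay - by)


def diffusion_score(src, dst):
    """Ink-blot diffusion (illustrative): only proximity matters."""
    return 1.0 / distance(src, dst)


def gravity_score(src, dst):
    """Gravity-style pull (illustrative): populations multiplied, divided by distance squared."""
    return CITIES[src][0] * CITIES[dst][0] / distance(src, dst) ** 2


def next_adopter(src, score):
    """Return the city with the highest score relative to the source city."""
    others = [c for c in CITIES if c != src]
    return max(others, key=lambda c: score(src, c))


print(next_adopter("A", diffusion_score))  # the nearby small town
print(next_adopter("A", gravity_score))    # the distant big city
```

The third idea, that demography matters more than geography, would replace the distance term with some measure of how similar two populations are.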

It’s now possible to devise rather sophisticated computer models of interacting ‘agents’ to examine these processes. They tell us little, however, unless there are real data to compare them against. Whereas once such data were extremely difficult to obtain, now social media provide an embarrassment of riches. Eisenstein and colleagues gathered messages from the public feed on Twitter, which samples about 10 percent of all public posts. They collected around 40 million messages from around 400,000 individuals between June 2009 and May 2011 that could be tied to a particular geographical location in the USA because of the smartphone metadata optionally included with the message.

The researchers then assigned these to the respective “Metropolitan Statistical Areas” (MSAs): urban centres that typically represent a single city. For each MSA, demographic data on ethnicity are available which, with some effort to correct for the fact that Twitter users are not necessarily representative of the area’s overall population, allow a rough estimate of the ethnic makeup of the messagers.
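For the curious, the assignment step might look something like the Python sketch below. This is only a guess at the general shape, not the researchers' procedure: real MSAs are defined by official boundaries, whereas this version simply picks the nearest of three approximate city-centre coordinates, and the 100 km cut-off and the names `MSA_CENTRES`, `haversine_km` and `assign_msa` are my own inventions.

```python
import math

# Approximate city-centre coordinates (latitude, longitude); a stand-in for real MSA boundaries.
MSA_CENTRES = {
    "New York": (40.71, -74.01),
    "Atlanta": (33.75, -84.39),
    "Los Angeles": (34.05, -118.24),
}


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def assign_msa(point, max_km=100.0):
    """Return the nearest centre's name, or None if it is more than max_km away."""
    name = min(MSA_CENTRES, key=lambda n: haversine_km(point, MSA_CENTRES[n]))
    return name if haversine_km(point, MSA_CENTRES[name]) <= max_km else None


# A geotagged message from the Atlanta suburbs, and one from Denver (no centre nearby).
print(assign_msa((33.95, -84.55)))
print(assign_msa((39.74, -104.99)))
```

Messages that fall outside every area are dropped here by returning `None`, which is one simple way to handle tweets that cannot be tied to an urban centre.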

Eisenstein and colleagues want to work out how these urban centres influence each other – to tease out the network across which linguistic innovation spreads. This is a challenging statistical problem, since they must distinguish genuine influence from coincidences in word use in different locations that could arise just by chance. There is, it must be said, a slightly surreal aspect in the application of complex statistical methods to the use of the shorthand ctfu (“cracking the fuck up”) – but after all, expletive and profanity have always offered one of the richest and most inventive examples of language evolution.

The result is a map of the USA showing the influence networks of many of the major urban centres: not just how they are linked, but what the direction of that influence is. What, then, are the characteristics that make an MSA likely to spawn successful neologisms? Eisenstein and colleagues have previously found that Twitter has a higher rate of adoption among African Americans than other ethnic groups, and so it perhaps isn’t surprising that they now find that innovation centres, as well as being highly populated, have a higher proportion of African Americans, and that similarity of racial demographic can make two urban centres more likely to be linked in the influence network. There is a long history of adoption of African American slang (cool, dig, rip off) in mainstream US culture, so these findings too accord with what we’d expect.

These are still early days, and the researchers – who hope to present their preliminary findings at a workshop on Social Network and Social Media Analysis in December organized by the Neural Information Processing Systems Foundation – anticipate that they will eventually be able to identify more nuances of influence in the data. The real point at this stage is the method. Twitter and other social media offer records of language mutating in real time and space: an immense and novel resource that, while no doubt subject to its own unique quirks, can offer linguists the opportunity to explore how our words and phrases arise from acts of tacit cultural negotiation.

Paper: J. Eisenstein et al. preprint http://www.arxiv.org/abs/1210.5268.
