Monday, February 18, 2013

Rooting out the mother tongue

Catching up after winter bugs… here is an article for Nature News from a week or so back.

____________________________________________________________

Computer algorithm reconstructs ancient languages

On Fiji, a star is a kalokalo. For Pazeh Taiwanese aboriginals it is mintol, and for the Melanau people of Borneo, biten. All these words are thought to come from the same root – but what was it?

A computer algorithm devised by researchers in Canada and California now offers an answer – in this case, bituqen. The program can reconstruct extinct ‘root’ languages from modern ones, a process that has previously been done painstakingly ‘by hand’ using rules of how linguistic sounds tend to change over time.

Statistician Alexandre Bouchard-Côté of the University of British Columbia and his coworkers say that, by making the reconstruction of ancestral languages much simpler, their method should make it easier to test hypotheses about how languages evolve. They report their technique in the Proceedings of the National Academy of Sciences USA [1].

Automated computer methods like this have been attempted before, but the authors say these were rather intractable and prescriptive. The method of Bouchard-Côté and colleagues can factor in a large number of languages to improve the quality of reconstruction, and it uses rules that handle possible sound changes in flexible, probabilistic ways.

The method requires a list of words in each language, with their meanings, and a map of the phylogenetic ‘language tree’ showing how each language is related to the others. These trees are routinely constructed today by linguists using techniques borrowed from evolutionary biology.

The algorithm can automatically identify cognate words (ones with the same root) in the lexicons. It then applies rules known to govern sound changes to deduce the likely root of each set of cognates. For example, phonemes that are always paired will tend to get condensed into one if this doesn’t lose any semantic information.
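
As a rough illustration of what comparing candidate cognates involves, the sketch below scores how similar two word forms are using Levenshtein edit distance. This is a stand-in chosen for simplicity, not the probabilistic model the researchers use; the function name edit_distance is an assumption made for this example, and ordinary spelling stands in for phonemes.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# Words for 'star' quoted in the article, compared with the reconstructed root.
root = "bituqen"
for word in ("biten", "mintol", "kalokalo"):
    print(word, edit_distance(root, word))
```

On spelling alone, biten sits two edits from bituqen, whereas kalokalo shares no letters with it at all. Surface similarity of this kind is therefore only a crude guide, which is why reconstruction leans on regular sound-change rules applied across many languages.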

The algorithm involves millions of parameters, whose values are found by an automated shuffling process that seeks the simplest fit to the data. It’s a little like cracking a code, based on a series of encoded phrases, by trying out possible bits of cipher and working your way towards one that gives a plausible solution to all the phrases.
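
The code-cracking analogy can be made concrete with a toy search. The sketch below is a simple random hill climb, not the researchers' fitting procedure: it repeatedly makes a small random change to a guessed ancestral form and keeps the change only if the guess is no further, in summed edit distance, from a set of descendant words. The descendant words, the starting guess and all function names are invented for illustration.

```python
import random


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def total_cost(guess, descendants):
    """Summed distance from a guessed root to every descendant word."""
    return sum(edit_distance(guess, word) for word in descendants)


def mutate(word, alphabet, rng):
    """Return word with one random insertion, deletion or substitution."""
    op = rng.choice(("insert", "delete", "substitute"))
    if op == "insert" or not word:
        i = rng.randrange(len(word) + 1)
        return word[:i] + rng.choice(alphabet) + word[i:]
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    return word[:i] + rng.choice(alphabet) + word[i + 1:]


def hill_climb(descendants, start, steps=2000, seed=0):
    """Keep random changes that do not make the fit worse."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(descendants)))
    best, best_cost = start, total_cost(start, descendants)
    for _ in range(steps):
        candidate = mutate(best, alphabet, rng)
        cost = total_cost(candidate, descendants)
        if cost <= best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# Made-up descendant forms, purely for illustration.
descendants = ["pateka", "patika", "bateka"]
guess, cost = hill_climb(descendants, start="xxxx")
print(guess, cost)
```

The real problem is far harder, since the rules of change are themselves unknown parameters to be fitted, but the principle of trying small changes and keeping those that explain all the data together is the same.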

The researchers tested their approach on 637 Austronesian languages spoken primarily on islands in Southeast Asia and the Pacific, including Malaysia, the Philippines and Indonesia. Manual methods have previously been used to reconstruct the protolanguage of this large group, thought to have come originally from Taiwan.

Bouchard-Côté and colleagues found that the predictions of their algorithm matched those of the manual method in 85 percent of cases (including bituqen). “Our system only uses a subset of the factors taken into consideration by a linguist, so we feel most of the discrepancies reflect things to be improved in our method”, admits Bouchard-Côté.

“It looks as though this method could be a very useful labor-saving device in some cases”, says linguist Don Ringe of the University of Pennsylvania. But he cautions that methods which are “correct or nearly correct in about 85% of the cases will never be good enough. Our reconstructions might be no better than an approximation, and if we settle for what look like approximations even to us, we might be plain wrong.”

Bouchard-Côté and colleagues used the method to test a hypothesis about language evolution first proposed in 1955 [2], which states that sounds that are particularly important for distinguishing words from each other are more resistant to change. Any such pattern is almost impossible to spot for just a few languages, but it emerged clearly from the data set of 637 languages.
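
One crude way to see what ‘important for distinguishing words’ means is to count minimal pairs: pairs of words that differ only in the two sounds being compared. The sketch below does this for a small made-up list of English words, treating letters as sounds. It is a simplified stand-in, not the measure of functional load used in the study, and minimal_pairs is a name invented for this example.

```python
from itertools import combinations


def minimal_pairs(lexicon, sound_a, sound_b):
    """List word pairs that differ in exactly one position,
    where that difference is sound_a versus sound_b."""
    contrast = {sound_a, sound_b}
    pairs = []
    for w1, w2 in combinations(sorted(set(lexicon)), 2):
        if len(w1) != len(w2):
            continue
        diffs = [(c1, c2) for c1, c2 in zip(w1, w2) if c1 != c2]
        if len(diffs) == 1 and set(diffs[0]) == contrast:
            pairs.append((w1, w2))
    return pairs


# A toy lexicon, invented for illustration.
lexicon = ["pat", "bat", "pit", "bit", "pin", "bin", "tin", "ten"]
print(minimal_pairs(lexicon, "p", "b"))
print(minimal_pairs(lexicon, "i", "e"))
```

In this toy list the p/b contrast separates three word pairs and the i/e contrast only one, so the hypothesis would lead one to expect p/b to be the more resistant to change.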

There had previously been some scepticism about this so-called ‘functional load hypothesis’, and Ringe says that “the demonstration that there might be something to it after all is interesting.”

He adds that “it’s refreshing to find colleagues in other disciplines tackling a problem that historical linguists actually care about.”

References
1. Bouchard-Côté, A., Hall, D., Griffiths, T. L. & Klein, D. Proc. Natl Acad. Sci. USA doi:10.1073/pnas.1204678110 (2013).
2. Martinet, A. Économie des Changements Phonétiques (Maisonneuve & Larose, Paris, 1955).
