Monday, March 18, 2013

Chinese made easy(er)

Here’s my latest piece for BBC Future. Hmm, will Blogger permit Chinese characters? We’ll see. If you want to find out more about this interesting learning system, there’s some stuff here, but apparently more on the way as the authors figure out how to develop this tool.


There’s no way round it: learning Chinese is tough. As far as reading goes, what most dismays native speakers of alphabetic languages is that Chinese characters offer so few clues. With virtually no Spanish, I can figure out in the right context that baño means bath, but that word in Chinese (洗澡) seems to offer no clues about pronunciation, let alone meaning.

There seems no alternative, then, but to slavishly learn the 3,500 or so characters that account for at least 99% of usage frequency in written Chinese. This is hard even for native Chinese speakers, usually demanding endless rote copying in school. And even then, it is far more common than is often admitted for Chinese people to forget even quite routine characters, such as 钥匙 (key).

Is there a better way? Jinshan Wu of Beijing Normal University, a specialist in the new mathematical science of network theory, and colleagues have investigated the relationships between these 3,500 most used characters to develop a strategy that makes optimal use of the connections to assist learning and memorization.

Chinese characters aren’t really as arbitrary and bewilderingly diverse as they seem at first encounter. For one thing, they are made up of a fairly limited number of sub-characters or radicals, which themselves are composed of a set of standard marks or ‘strokes’. What’s more, the radicals often contain clues about meaning or pronunciation, or both. In the Chinese for bath, for example (pronounced xizao in the pinyin Romanization system), both characters start with the same radical, which denotes ‘water’, and the righthand half of both indicates the pronunciation. There are general rules (called liu shu, 六书) for building characters from radicals.

These connections can be exploited in learning. Once you know that wood is 木 (mu), it’s not so hard to remember that forest is 林 (lin) – or even more pictorially, 森林 (senlin). Assisted by the liu shu rules, Wu and colleagues mapped out the structural relationships between all 3,500 of the common characters, to form a network with over 7,000 links. This shows that the roughly 224 radicals are combined in just 1,000 or so characters that form the basis of all the others.

This network is hierarchical, meaning that it is somewhat like a tree, with a few central nodes (trunks) branching into many branch tips. That’s very different from a web-like network such as a grid or street map, in which there are often many different ways to get to any particular node. The researchers figured that it could be most efficient to start learning at the lower levels of the hierarchy – the trunks, as it were – and to progress gradually out towards the branch tips.

But would that necessarily be better than a strategy which focuses on the most frequently used words first? How, indeed, can one assess the relative learning cost of different strategies? There’s no unique way to do this, but Wu and colleagues developed a logical, intuitive method of enumerating costs. They figured that it is easier to learn a multi-part character if all the components have been learnt previously. To take a simple case, it’s easier to learn 明 (ming: bright) if you have already learnt 日 (ri: sun, day) and 月 (yue: month, moon). The researchers assigned cost values to each ‘new learning’ task.
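To make the idea of a learning cost concrete, here is a minimal sketch in Python. It is not the researchers' actual cost function, which this piece doesn't spell out: the `COMPONENTS` table, the `base` and `penalty` values and the function names are all assumptions chosen purely for illustration. The only thing it takes from the text is the principle that a character is cheaper to learn once its parts are already known.

```python
# Hypothetical decomposition table: character -> the characters it is built from.
COMPONENTS = {
    "日": [],            # sun, day
    "月": [],            # month, moon
    "明": ["日", "月"],  # bright
    "木": [],            # wood
    "林": ["木"],        # forest
}

def learning_cost(char, known, base=1.0, penalty=2.0):
    """Illustrative cost: a base cost plus a penalty for each unknown component.

    The numbers are invented; only the shape of the rule comes from the article.
    """
    missing = {c for c in COMPONENTS.get(char, []) if c not in known}
    return base + penalty * len(missing)

def total_cost(order):
    """Sum the cost of learning characters in the given order."""
    known = set()
    cost = 0.0
    for char in order:
        cost += learning_cost(char, known)
        known.add(char)
    return cost

components_first = total_cost(["日", "月", "明"])  # 3.0
compound_first = total_cost(["明", "日", "月"])    # 7.0
```

With these made-up numbers, learning 日 and 月 before 明 costs 3.0 in total, while tackling 明 first costs 7.0. The absolute figures mean nothing; the point is only that order changes the bill.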

The ‘cheapest’ way to learn all the characters in the network is then to start with the ‘trunk’ characters that have the highest number of branches, and work up through the layers. But that could leave you knowing a lot of words you rarely need to use. If, on the other hand, you simply learn characters in order of use frequency (as some learning methods do), you fail to take advantage of the network connections that can aid recognition.

The ideal approach is a compromise between the two. Wu and colleagues therefore adjust the relationship network by giving a certain weighting or priority to each character depending on its use frequency. Then the learning path spreads gradually through the network while picking up most of the common characters first. It’s rather like planning a shopping trip by seeking a short total path between shops while also contriving to pick up the heaviest items last.
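One way to picture such a compromise is a greedy rule: at each step, learn whichever remaining character offers the most use frequency per unit of learning cost. The sketch below is an assumption-laden toy, not the algorithm Wu and colleagues use. The frequencies are invented, and `cost`, `greedy_order` and `path_cost` are hypothetical names.

```python
# Hypothetical decomposition table and made-up use frequencies.
COMPONENTS = {"日": [], "月": [], "明": ["日", "月"], "木": [], "林": ["木"]}
FREQUENCY = {"日": 0.30, "月": 0.20, "明": 0.35, "木": 0.05, "林": 0.10}

def cost(char, known):
    """Illustrative cost: 1 plus 2 for every component not yet learnt."""
    missing = {c for c in COMPONENTS[char] if c not in known}
    return 1.0 + 2.0 * len(missing)

def greedy_order(chars):
    """Repeatedly pick the character with the best frequency-to-cost ratio."""
    known, order = set(), []
    remaining = set(chars)
    while remaining:
        # sorted() keeps the choice deterministic if two ratios ever tie
        best = max(sorted(remaining), key=lambda ch: FREQUENCY[ch] / cost(ch, known))
        order.append(best)
        known.add(best)
        remaining.remove(best)
    return order

def path_cost(order):
    """Total cost of learning characters in the given order."""
    known, total = set(), 0.0
    for ch in order:
        total += cost(ch, known)
        known.add(ch)
    return total

by_frequency = sorted(FREQUENCY, key=FREQUENCY.get, reverse=True)
compromise = greedy_order(FREQUENCY)
```

On this toy data, pure frequency order starts with 明 and costs 11.0 in total, whereas the greedy compromise learns 日, 月, 明, 木, 林 for a total of 5.0, still reaching the three most common characters first.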

The researchers compared the learning cost of their strategy with that for the most widely used textbook in Chinese primary schools (covering 2,475 characters) and a popular textbook for learning Chinese as a second language. For a given cost, their new strategy picked up both considerably more characters in total and a significantly greater total use frequency than the two alternatives.

What’s more, the researchers say that their approach would allow each student’s learning strategy to be tailored to his or her individual strengths – for example, to suit those who have already learnt some characters. This just isn’t possible with traditional approaches.

Of course, the ultimate test is whether students do actually learn faster. This remains to be seen. But with a debate already raging in China over whether current teaching methods are the most suitable, this new proposal shows that there may be rational ways to pursue the question.

X. Yan, Y. Fan, Z. Di, S. Havlin & J. Wu, preprint.

1 comment:

JimmyGiro said...

So when's the book coming out:

"Eats, shoots, and builds a road net-work."