Thursday, August 07, 2014

Calvino's culturomics

Italo Calvino’s If On a Winter’s Night a Traveller is one of the finest and funniest meditations on writing that I’ve ever read. It also contains a glorious pre-emptive critique of what began as Zipf’s law and is now called culturomics: the statistical mining of vast bodies of text for word frequencies, trends and stylistic features. What is so nice about it (apart from the wit) is that Calvino seems to recognize that this approach is not without validity (and I certainly think it has some), while at the same time commenting on the gulf that separates this clinical enumeration from the true craft of writing – and for that matter, of reading. I am going to quote the passage in full – I don’t know what copyright law might have to say about that, but I am trusting to the fact that anyone familiar with Calvino’s book would be deterred from trying to enforce ownership of the text by the baroque level of irony that would entail.

__________________________________________________________

[From Vintage edition 1998, translated by William Weaver]

I asked Lotaria if she has already read some books of mine that I lent her. She said no, because here she doesn’t have a computer at her disposal.

She explained to me that a suitably programmed computer can read a novel in a few minutes and record the list of all the words contained in the text, in order of frequency. “That way I can have an already completed reading at hand,” Lotaria says, “with an incalculable saving of time. What is the reading of a text, in fact, except the recording of certain thematic recurrences, certain insistences of forms and meanings? An electronic reading supplies me with a list of the frequencies, which I have only to glance at to form an idea of the problems the book suggests to my critical study. Naturally, at the highest frequencies the list records countless articles, pronouns, particles, but I don’t pay them any attention. I head straight for the words richest in meaning; they can give me a fairly precise notion of the book.”

Lotaria brought me some novels electronically transcribed, in the form of words listed in the order of their frequency. “In a novel of fifty to a hundred thousand words,” she said to me, “I advise you to observe immediately the words that are repeated about twenty times. Look here. Words that appear nineteen times:
“blood, cartridge belt, commander, do, have, immediately, it, life, seen, sentry, shots, spider, teeth, together, your…”
“Words that appear eighteen times:
“boys, cap, come, dead, eat, enough, evening, French, go, handsome, new, passes, period, potatoes, those, until…”

“Don’t you already have a clear idea what it’s about?” Lotaria says. “There’s no question: it’s a war novel, all actions, brisk writing, with a certain underlying violence. The narration is entirely on the surface, I would say; but to make sure, it’s always a good idea to take a look at the list of words used only once, though no less important for that. Take this sequence, for example:
“underarm, underbrush, undercover, underdog, underfed, underfoot, undergo, undergraduate, underground, undergrowth, underhand, underprivileged, undershirt, underwear, underweight…”

“No, the book isn’t completely superficial, as it seemed. There must be something hidden; I can direct my research along these lines.”

Lotaria shows me another series of lists. “This is an entirely different novel. It’s immediately obvious. Look at the words that recur about fifty times:
“had, his, husband, little, Riccardo (51) answered, been, before, has, station, what (48) all, barely, bedroom, Mario, some, times (47) morning, seemed, went, whom (46) should (45) hand, listen, until, were (43) Cecilia, Delaia, evening, girl, hands, six, who, years (42) almost, alone, could, man, returned, window (41) me, wanted (40) life (39)”

“What do you think of that? An intimatist narration, subtle feelings, understated, a humble setting, everyday life in the provinces … As a confirmation, we’ll take a sample of words used a single time:
“chilled, deceived, downward, engineer, enlargement, fattening, ingenious, ingenuous, injustice, jealous, kneeling, swallow, swallowed, swallowing…”

“So we already have an idea of the atmosphere, the moods, the social background… We can go on to a third book:
“according, account, body, especially, God, hair, money, times, went (39) evening, flour, food, rain, reason, somebody, stay, Vincenzo, wine (38) death, eggs, green, hers, legs, sweet, therefore (36) black, bosom, children, day, even, ha, head, machine, make, remained, stays, stuffs, white, would (35)”

“Here I would say we’re dealing with a full-blooded story, violent, everything concrete, a bit brusque, with a direct sensuality, no refinement, popular eroticism. But here again, let’s go on to the list of words with a frequency of one. Look, for example:
“ashamed, shame, shamed, shameful, shameless, shames, shaming, vegetables, verify, vermouth, virgins…”

“You see? A guilt complex, pure and simple! A valuable indication: the critical inquiry can start with that, establish some working hypothesis…What did I tell you? Isn’t this a quick, effective system?”

The idea that Lotaria reads my books in this way creates some problems for me. Now, every time I write a word, I see it spun around by the electronic brain, ranked according to its frequency, next to other words whose identity I cannot know, and so I wonder how many times I have used it, I feel the whole responsibility of writing weigh on those isolated syllables, I try to imagine what conclusions can be drawn from the fact that I have used this word once or fifty times. Maybe it would be better for me to erase it…But whatever other word I try to use seems unable to withstand the test…Perhaps instead of a book I could write lists of words, in alphabetical order, an avalanche of isolated words which expresses that truth I still do not know, and from which the computer, reversing its program, could construct the book, my book.
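
__________________________________________________________

Lotaria’s “electronic reading” is, incidentally, only a few lines of work today. Here is a minimal sketch in Python of the kind of list she describes: every word in a text, grouped by how many times it appears. It rests on a crude assumption about what counts as a word (a run of letters, with an optional internal apostrophe), and the function name frequency_lists and the sample sentence are my own inventions, not anything from Calvino.

```python
import re
from collections import Counter, defaultdict


def frequency_lists(text):
    """Group the distinct words of `text` by how often each one occurs."""
    # Crude tokenisation (an assumption): lowercase runs of letters,
    # optionally with one internal apostrophe, as in "don't".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)

    by_frequency = defaultdict(list)
    for word, n in counts.items():
        by_frequency[n].append(word)

    # Highest frequency first; words in alphabetical order within each group.
    return {n: sorted(ws) for n, ws in sorted(by_frequency.items(), reverse=True)}


# A made-up sample sentence, standing in for a whole novel.
sample = "The sentry saw the spider, and the spider saw the sentry run."
lists = frequency_lists(sample)

for n, ws in lists.items():
    print(f"({n}) " + ", ".join(ws))
```

The group with a frequency of one is the list Lotaria turns to when she wants to find “something hidden”. What the sketch cannot supply, of course, is her confidence about what the lists mean.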
