Monday, June 06, 2011

Musical intelligence

In the latest issue of Nature I have interviewed the composer Eduardo Reck Miranda about his experimental soundscapes, pinned to a forthcoming performance of one of them at London’s South Bank Centre. Here’s the longer version of the exchange.

Eduardo Reck Miranda is a composer based at the University of Plymouth in England, where he heads the Interdisciplinary Centre for Computer Music Research. He studied computer science as well as music composition, and is a leading researcher in the field of artificial intelligence in music. He also worked on phonetics and phonology at the Sony Computer Science Laboratory in Paris. He is currently developing human-machine interfaces that can enable musical performance and composition for therapeutic use with people with extreme physical disability.

Miranda’s compositions combine conventional instruments with electronically manipulated sound and voice. His piece Sacra Conversazione, composed between 2000 and 2003, consists of five movements in which string ensemble pieces are combined with pre-recorded ‘artificial vocalizations’ and percussion. A newly revised version will be performed at the Queen Elizabeth Hall, London, on 9 June as part of a programme of electronic music, Electronica III. Nature spoke to him about the way his work combines music with neurology, psychology and bioacoustics.

In Sacra Conversazione you are aiming to synthesize voice-like utterances without semantic content, by using physical modelling and computer algorithms to splice sounds from different languages in physiologically plausible ways. What inspired this work?

The human voice is a wonderfully sophisticated musical instrument. But in Sacra Conversazione I focused on the non-semantic communicative power of the human voice, which is conveyed mostly by the timbre and prosody of utterances. (Prosody refers to the acoustical traits of vocal utterances characterized by their melodic contour, rhythm, speed and loudness.)

Humans seem to have evolved some sort of ‘prosodic fast lane’ for non-semantic vocal information in the auditory pathways of the brain, from the ears to regions that process emotion, such as the amygdala. There is evidence that the non-semantic content of speech is processed considerably faster than the semantic content. We can very often infer the emotional content and intent of utterances before we process their semantic, or linguistic, meaning. I believe that this aspect of our mind is one of the pillars of our capacity for music.

You say that some of the sounds you used would be impossible to produce physiologically, and yet retain an inherent vocal quality. Do you know why that is?

Let me begin by explaining how I began to work on this piece. I started by combining single utterances from a number of different languages – over a dozen, as diverse as Japanese, English, Spanish, Farsi, Thai and Croatian – to form hundreds of composite utterances, or ‘words’, as if I were creating the lexicon for a new artificial language. I carefully combined utterances by speakers of similar voice and gender and I used sophisticated speech-synthesis methods to synthesise these new utterances. It was a painstaking job.

I was surprised that only about one in five of these new ‘words’ sounded natural to me. The problem lay in the transitions between the original utterances. For example, whereas the transition from, say, Thai utterance A to Japanese utterance B did not sound right, the transition from A to Japanese utterance C was acceptable. I came to believe that the main reason is physiological. When we speak, our vocal mechanism needs to articulate a number of different muscles simultaneously. I suspect that even though we may be able to synthesize physiologically implausible utterances artificially, the brain is reluctant to accept them.
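Miranda does not describe his splicing procedure in detail, but the basic operation he alludes to, joining two recorded utterances with a smoothing crossfade at the seam, can be sketched in a few lines. Everything here is a stand-in: the sine bursts merely play the role of recorded syllables, and the sample rate and overlap length are assumed values.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (assumed)

def tone(freq, dur):
    """A toy 'utterance': a sine burst standing in for a recorded syllable."""
    t = np.arange(int(SR * dur)) / SR
    return np.sin(2 * np.pi * freq * t)

def splice(a, b, overlap=0.02):
    """Join two utterances with a linear crossfade over `overlap` seconds.

    The crossfade smooths the amplitude discontinuity at the joint, but it
    cannot fix an articulatory mismatch: if the vocal-tract configurations
    at the end of `a` and the start of `b` are incompatible, the composite
    can still sound unnatural -- the effect Miranda describes.
    """
    n = int(SR * overlap)
    fade = np.linspace(0.0, 1.0, n)
    joint = a[-n:] * (1.0 - fade) + b[:n] * fade
    return np.concatenate([a[:-n], joint, b[n:]])

word = splice(tone(220.0, 0.30), tone(330.0, 0.25))  # a composite 'word'
```

The point of the sketch is that the smoothing step operates only on amplitude; whether the result sounds like a plausible human utterance depends on the physiology of the two segments being joined, which is exactly what a crossfade cannot repair.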

Then I moved on to synthesize voice using a physical model of the vocal tract. I used a model with over 20 variables, each of which roughly represents a muscle of the vocal tract (see E. R. Miranda, Leonardo Music Journal 15, 8-16 (2005)). I found it extremely difficult to co-articulate the variables of the model to produce decent utterances, which explains why speech technology for machines is still very much reliant on splicing and smoothing methods. On the other hand, I was able to produce surreal vocalizations that, while implausible for humans to produce, retain a certain degree of coherence because of the physiological constraints embedded in the model.
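Miranda's model, with its 20-odd articulatory variables, is far richer than anything that fits here, but the underlying source-filter idea can be illustrated minimally: a glottal pulse train is shaped by resonators whose centre frequencies play the role of articulatory settings. The formant values below are textbook ballpark figures for an /a/-like vowel, not Miranda's parameters.

```python
import numpy as np

SR = 16_000  # sample rate in Hz (assumed)

def glottal_source(f0, dur):
    """Sawtooth-like pulse train standing in for vocal-fold vibration."""
    t = np.arange(int(SR * dur)) / SR
    return 2.0 * (t * f0 % 1.0) - 1.0

def resonator(x, freq, bandwidth):
    """Two-pole resonator modelling one formant of the vocal tract."""
    r = np.exp(-np.pi * bandwidth / SR)
    theta = 2 * np.pi * freq / SR
    a1, a2 = -2 * r * np.cos(theta), r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = (x[n]
                - a1 * (y[n - 1] if n >= 1 else 0.0)
                - a2 * (y[n - 2] if n >= 2 else 0.0))
    return y / (np.max(np.abs(y)) + 1e-12)  # normalize to unit peak

# Each (frequency, bandwidth) pair acts as one articulatory 'variable';
# 700 Hz and 1200 Hz approximate the first two formants of /a/.
vowel = resonator(resonator(glottal_source(110.0, 0.5), 700.0, 90.0),
                  1200.0, 110.0)
```

Even in this toy form the model shows why co-articulation is hard: a real utterance requires all the formant trajectories to move together through physiologically linked paths, and it also shows why the output stays voice-like, since the resonant structure itself imposes the coherence Miranda mentions.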

Much of the research in music cognition uses the methods of neuroscience to understand the perception of music. You appear to be more or less reversing this approach, using music to try to understand processes of speech production and cognition. What makes you think this is possible?

The choice of research methodology depends on the aims of the research. The methods of cognitive neuroscience are largely aimed at proving hypotheses. One formulates a hypothesis to explain a certain aspect of cognition and then designs experiments aimed at proving it.

My research, however, is not aimed at describing how music perception works. Rather, I am interested in creating new approaches to musical composition informed by research into speech production and cognition. This requires a different methodology, which is more exploratory: do it first and reflect upon the outcomes later.

I feel that cognitive neuroscience research methods force scientists to narrow the concept of music, whereas I am looking for the opposite: my work is aimed at broadening the concept of music. That is not to say the two approaches are incompatible: one could certainly inform and complement the other.

What have you learnt from your work about how we make and perceive sound?

One of the things I’ve learnt is that perception of voice – and, I suspect, auditory perception in general – seems to be very much influenced by the physiology of vocal production.

Much of your work has been concerned with the synthesis and manipulation of voice. Where does music enter into it, and why?

Metaphorically speaking, synthesis and manipulation of voice are only the cogs, nuts and bolts. Music really happens when one starts to assemble the machine. It is extremely hard to describe how I composed Sacra Conversazione, but inspiration played a big role. Creative inspiration is beyond the capability of computers, yet finding its origin is the Holy Grail of the neurosciences. How can the brain draw and execute plans on our behalf implicitly, without telling us?

What are you working on now?

Right now I am orchestrating raster plots of spiking neurons and the behaviour of artificial life models for Sound to Sea, a large-scale symphonic piece for orchestra, church organ, percussion, choir and mezzo soprano soloist. The piece was commissioned by my university, and will be premiered in 2012 at the Minster Church of St Andrew in Plymouth.
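Miranda does not say how he maps raster plots onto the orchestra, so the following is purely a hypothetical illustration of the general idea of sonifying spike data: spike times become note onsets on a rhythmic grid, and neuron indices pick pitches from a scale. The random raster, the pentatonic-style scale, and the grid resolution are all invented for the sketch.

```python
import random

def toy_raster(n_neurons=8, n_spikes=40, duration=8.0, seed=1):
    """A stand-in raster: (time, neuron index) pairs, as if recorded spikes."""
    rng = random.Random(seed)
    return sorted((rng.uniform(0.0, duration), rng.randrange(n_neurons))
                  for _ in range(n_spikes))

# Hypothetical mapping choices: one MIDI pitch per neuron, onsets
# quantized to a 0.125 s grid (a semiquaver at 120 bpm).
SCALE = [60, 62, 65, 67, 69, 72, 74, 77]  # MIDI note numbers

def raster_to_notes(raster, grid=0.125):
    """Map each spike to an (onset, pitch) event for later orchestration."""
    notes = []
    for t, neuron in raster:
        onset = round(t / grid) * grid
        notes.append((onset, SCALE[neuron % len(SCALE)]))
    return notes

notes = raster_to_notes(toy_raster())
```

Any real compositional mapping would of course be far more nuanced; the sketch only shows the structural step of turning a two-dimensional raster into time-ordered musical events.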

Do you feel that the evolving understanding of music cognition is opening up new possibilities in music composition?

Yes, to a limited extent. Progress will probably emerge from the reverse: new possibilities in musical composition contributing to the development of such understanding.

What do you hope audiences might feel when listening to your work? Are you trying to create an experience that is primarily aesthetic, or one that challenges listeners to think about the relationship of sound to language? Or something else?

I would say both. But my primary aim is to compose music that is interesting to listen to and catches the imagination of the audience. I would prefer my music to be appreciated as a piece of art rather than as a challenging auditory experiment. However, if the music makes people think about, say, the relationship of sound to language, I would be even happier. After all, music is not merely entertainment.

Although many would regard your work as avant-garde, do you feel part of a tradition that explores the boundaries of sound, voice and music? Arnold Schoenberg, for example, aimed to find a form of vocalization pitched between song and speech, and indeed the entire operatic form of recitative is predicated on a musical version of speech.

Absolutely. The notion of avant-garde disconnected from tradition is too naïve. If anything, to be at the forefront of something you need the stuff in the background. Interesting discoveries and innovations do not happen in a void.
