Post on 28-Jan-2023
Syntax: Music that Makes Sense and Music that Doesn’t
The way music makes sense is essentially the same as the way we make sense of verbal speech.
• We find a piece of music meaningful if our ear detects familiar patterns of pitch, rhythm and harmony.
• We hypothesize the meaning of a spoken sentence if our brain matches the words we know to the words we identify in what we hear.
In both cases the same mechanism is at work. We break the flow of sounds into meaningful segments and then initiate a chain of references, cross-references and guesses about the connections between various segments – finally confirming those guesses, or rejecting them and searching for better ones. All of this intricate analytical work is done semi-consciously, in "real time," as we listen to speech or music.
Not recognizing any familiar patterns in music results in confusion and disinterest. Both appear shortly after our ear gets used to the "strangeness" of meaningless sounds. The moment the novelty of the sound stops surprising us, we become bored. The longer the streak of meaningless music goes on, the more difficult it becomes for us to stay focused. We might still retrieve some information from such music via psycho-physiological markers, such as tempo, beat, loudness and pitch register. Together, they can provide enough material for us to theorize about what the music could possibly mean. But the longer the music passes without unveiling patterns familiar to us, the more abstract our hypothetical meaning becomes.
In the absence of melodic, rhythmic or harmonic idioms that would signal what emotional state is implied, our ear does not really connect to our mind. We hear some changes in pitch and duration, but they do not fire up our response. Emotional reaction requires familiar stimuli; uncertainty tends to obstruct emotional response. We might guess at a musical emotion presumably carried by the music in question and try to evoke that emotion in ourselves. However, without ongoing reinforcement by familiar idioms, the empathetic mechanisms are not likely to kick in. Without them, our emotional experience will not be stable enough for our mind to receive substantial feedback and confirm that we are indeed in such and such an emotional state.
This is what I observed in 2001 at the Abraham Joshua Heschel Day School in Northridge, CA, when eighteen 8th graders tried to answer what emotion was expressed in the Overtones Aria from a Kunqu opera (traditional Chinese music) after listening to it for the first time in their lives. Their answers ranged from "surprise" and "fun" to "fear" and "pain." The students appeared confused while listening, looking at each other with puzzled expressions or giggling. They could not provide any detailed account of how exactly the emotional states followed one another, although the music clearly contained strong contrasting changes. Only one participant was able to specify three emotional changes within 5 minutes of music. And not a single listener guessed the "right" emotional condition, one that would come at least in remote accordance with the plot of the opera: during that aria, a scholar overhears a nun improvising music on her qin (a plucked string instrument, like a zither); enchanted by the sounds, the scholar falls in love and is overfilled with an irresistible desire to meet the musician.
The same students, however, had no problem identifying emotions in another opera: the closing scene of Bizet's Carmen. Almost everybody correctly named each of the emotional states in their succession within a 5-minute fragment of music: Don José's begging, Carmen's indifference to him, his and Carmen's anger, and the carefree, happy people in the background of the action. None of the students knew French or had heard this music before; they could rely only on the music they heard during the session to support their conclusions.
It appears that familiarity with the musical idioms used in Carmen, prompted by prior exposure to Western music, native to all of the students, was sufficient to let them correctly recognize the musical emotions and make sense of the music.
Chinese traditional opera, on the other hand, is characterized by strong stylistic artificiality: actors are not allowed to bend their knees while walking on stage, and the neck is supposed to swing from side to side with each step – mincing for women, rocking forward for men.1 Stylization rules Chinese theater. Everything, from gait to gestures, is meant to emphasize the divergence between the behaviors of daily life and their representation on stage. Theatrical art is believed to refine and aesthetically elevate the mundane by employing strong stylistic distortion.2
Heavily influenced by Confucian aesthetics, the music in such opera deliberately avoids synesthetic connections that could guide the listener (as in Western music, where love is associated with tender music and anger with rough). According to the tradition of Cantonese opera, all vocals had to be sung in forcefully delivered falsetto.3 A majority of Western listeners would interpret such vocal expression as angry, whereas the music could in fact be a love song.
Confucius looked at the free display of emotions in music with suspicion, condemning sensual forms of music and praising normative, official forms suitable for enforcing virtuous moral attitudes.4 His advocacy of ritual, for which he became famous, influenced generations of Chinese musicians to come. As a result, art music cultivated only those musical idioms whose acoustic attributes did not share a symbolic similarity with the emotions they denoted.

1 Scott, A.C. (1983). "The Performance of Classical Theatre." In Chinese Theater: From Its Origins to the Present Day, ed. Colin Mackerras. Honolulu: University of Hawaii Press, pp. 139-140.
2 Wichmann, Elizabeth (1991). Listening to Theatre: The Aural Dimensions of Beijing Opera. Honolulu: University of Hawai'i Press, p. 4.
3 Chan, Sau Y. (2005). "Performance Context as a Molding Force: Photographic Documentation of Cantonese Opera in Hong Kong." Visual Anthropology, Vol. 18, Issue 2/3, pp. 167-198.
• In Western opera, the timbre of a snare drum usually refers to military music. A snare drum in the symphony orchestra sounds very similar to a snare drum in a marching band. The dry, clean, orderly sound of the rhythmic patterns performed on this drum resembles the discipline showcased at performances of military music during parades or civic ceremonies.
• In Chinese traditional opera, there are no such direct connections between elements of the music and real life. For example, a particular rhythm played by a given percussion instrument (such as a drum, a clapper, a cymbal or a gong)5 is used to refer to the place of the action, e.g. a meadow or the palace (most performances were housed in small "bamboo theaters" at teahouses or in temple courtyards, without any stage design).6 Understanding references like this is impossible without intimate knowledge of all the musical idioms and the acting conventions related to them.
Not surprisingly, Western listeners have trouble making sense of this highly denotative opera music and have no choice but to reduce their interpretive strategy to mere guessing.
It should be noted that those studies of cross-cultural recognition of emotion that report Western listeners successfully recognizing emotions in Chinese tunes are based on samples not from Chinese opera, but from folk music.7
The attitude of Confucian-driven aesthetics towards folk music in China is undoubtedly hostile. Chinese scholarship has been strongly politicized since Antiquity. A debasing outlook on "common people's" music has characterized Chinese musicology for centuries and remains prominent today. Years of Maoism did not change this bias, because Maoist ideology still pressed toward putting authored "art music" at the service of the common people, rather than popularizing the anonymous music of the common people. That is why the body of folk songs published in China has never been adequately examined for authenticity and for accuracy of arrangement and notation. Information on Chinese music folklore coming from Chinese sources has often led Western scholars to misapprehended conclusions.8

4 Thrasher, Alan R. (1981). "The Sociology of Chinese Music: An Introduction." Asian Music, Vol. 12, No. 2, pp. 17-53.
5 Rao, Nancy Yunhwa (2007). "The Tradition of Luogu Dianzi (Percussion Classics) and Its Signification in Contemporary Music." Contemporary Music Review, Vol. 26, Issue 5/6, pp. 511-527.
6 Wichmann, Elizabeth (1991). Listening to Theatre: The Aural Dimensions of Beijing Opera. Honolulu: University of Hawai'i Press, p. 6.
7 Han, Shui'er; Sundararajan, Janani; Bowling, Daniel Liu; Lake, Jessica; Purves, Dale (2011). "Co-Variation of Tonality in the Music and Speech of Different Cultures." PLoS ONE, Vol. 6, Issue 5, pp. 1-5.
Besides the issues of reliability in the representation of "Chinese" national characteristics in published folk tunes, it should be remembered that their propensity to reflect emotions, common to all folk music in the world, stands in sharp contradiction with the values of the traditional art music of China. Just as emotional openness is viewed negatively by Confucian ethics, the spontaneous, natural expression of emotion in folk music is likely to be interpreted as "crude," "uncivilized" and "chaotic" in the eyes of a traditional art music expert – serving as a model of what not to do in a high-art composition.
The strong isolation of art music in the Far East, with its complete dependence on conventions, has worked reciprocally, making it appear esoteric to the ear of uninitiated listeners. Equally esoteric is the impression of Western music carried away by a Far Eastern listener brought up on his traditional music, with no prior exposure to Western music (today, of course, it is hard to find such a person). Nicolas Slonimsky presents an account of such an encounter by one Jijei Hashigushi, who happened to attend the premiere of Puccini's Madama Butterfly in New York in 1907. His bitter letter to the New York Daily grumbled: "I can say nothing for the music of Madama Butterfly. Western music is too complicated for a Japanese. Even Caruso's celebrated singing does not appeal very much more than the barking of a dog in a faraway woods."9
Evidently, the intuitive skill of breaking the flow of music into familiar meaningful units lies at the heart of music perception. Without it, no satisfactory aesthetic experience is possible. The same applies to speech recognition, and quite a similar effect can be observed in Chinese poetry. A famous poem, "Shī Shì shí shī shǐ" ("Lion-Eating Poet in the Stone Den") by Yuen Ren Chao, sounds like 91 monotonous repetitions of the same syllable "shi" to non-Chinese speakers. The content of this poem, however, is far from monotonous: it presents the fantastic story of a poet who ate ten lions. A Western listener not versed in Mandarin cannot draw any idea of this meaning from the seemingly boring sound of the poem. But speakers of Mandarin can distinguish monosyllabic from disyllabic words and join words into phrasal units, which enables them to appreciate the finesse and sophistication of the poem.
8 Yang Mu (1994). "Academic Ignorance or Political Taboo? Some Issues in China's Study of Its Folk Song Culture." Ethnomusicology, Vol. 38, No. 2 (Spring-Summer 1994), pp. 303-320.
9 Slonimsky, Nicolas (1965). Lexicon of Musical Invective: Critical Assaults on Composers Since Beethoven. New York: Norton & Company, p. 5.
Understanding of both music and speech starts at the same point, with the same strategies employed to spot syntactic structures in the stream of sounds. This common root supports the entire tree of sense-making via the auditory channel.
• We recognize the meaning of a sentence after our brain negotiates the meanings of all the words that constitute that sentence.
• We recognize the meaning of a musical phrase while and after our brain negotiates the meanings of all the musical idioms found in that phrase.
Both processes share the same initial stage: our mind recognizes elementary structures and relates them to each other, after which music and speech each take their own route. The entire semiotic chain in music comprehension, however, remains dependent on idioms and language-like syntax; without them, no communication of information via music can reliably take place.
Parsing of Syntax in Music: Its Idiomatic Basis
The first step our brain must take in order to reduce the wealth of data entering the ear to manageable portions of information is the operation of "parsing." Parsing is a linguistic term for the instant analysis of a stream of sounds according to a set of known syntactic rules. Just like parsing in speech, parsing in music underlies the process of music comprehension.
Making sense of music is essentially a coordinated network of emotional responses to an ongoing chain of micro-events. The brain matches the perceived data to a known glossary of patterns and guesses where the end point of one pattern and the beginning of another is likely to be. Syntax plays the key role here: without knowing the rules for grouping sounds together, or breaking them apart, no event will ever be registered by the brain, let alone provoke an emotional reaction.
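The matching-and-segmenting process described above can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the tiny glossary, the relative-duration encoding, and the greedy matching strategy are assumptions made for the sake of the sketch, not a model of actual auditory processing.

```python
# Toy sketch of "parsing": segment a stream of relative note durations
# into idiomatic units by matching against a listener's known glossary.
# (All names and values are invented for illustration.)

GLOSSARY = {
    (1, 3, 1, 3): "punctured rhythm (3:1)",   # short-long-short-long, jagged
    (1, 2, 1, 2): "swing rhythm (2:1)",       # short-long-short-long, rounded
    (2, 2, 2, 2): "even march",
}

def parse(stream):
    """Greedily segment a stream of relative durations into known idioms."""
    events, i = [], 0
    while i < len(stream):
        for size in (4, 3, 2):                # try the longest patterns first
            chunk = tuple(stream[i:i + size])
            if chunk in GLOSSARY:
                events.append(GLOSSARY[chunk])
                i += size
                break
        else:
            # no familiar pattern found: the "confusion" case from the text
            events.append(f"unrecognized: {stream[i]}")
            i += 1
    return events

print(parse([1, 3, 1, 3, 1, 2, 1, 2]))
# → ['punctured rhythm (3:1)', 'swing rhythm (2:1)']
```

The greedy longest-match strategy mirrors the essay's point that the listener guesses pattern boundaries first and revises them only when no familiar idiom fits.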
There are five primary aspects of expression that are actively engaged in much of musical communication. Their idioms and their syntactic rules are the ones most frequently used in music comprehension.
1. Pitch idioms (like the "fanfare" mentioned earlier) are mediated by pitch syntax (such as pitch contour or intervallic proximity);
2. Rhythm idioms (like the "punctured rhythm" discussed below) are mediated by rhythm syntax (such as rules of grouping);
3. Metric idioms (like the binary or ternary meters discussed below) are mediated by metric syntax (such as ostinato, syncopation or alternation);
4. Harmonic idioms (like the "major triad") are mediated by harmonic syntax (such as tonic or dominant);
5. Texture idioms (like "melodic line," "chords" or "figure of accompaniment") are mediated by textural syntax (such as polyphony, homophony or heterophony).
The other expressive aspects feature idioms but lack proprietary syntactic rules of their own, relying instead on the syntax of the five primary aspects:
6. Dynamic idioms (like "forte," "piano" or "sforzando") are not connected to each other by dynamic means; instead they depend on harmonic, metric, pitch and, to a lesser extent, rhythm syntax;
7. Tempo idioms (like "ritenuto" or "accelerando") do not follow any tempo-based logic; their appropriateness is decided by pitch, rhythm and harmony;
8. Articulation idioms (like "staccato" or "legato") are not limited by any articulation restrictions and can be easily combined or alternated, as determined by all five primary aspects of expression, plus tempo;
9. Timbral idioms (such as "vibrato" or "con sordino") highlight the pitch, rhythm, harmonic and, to a lesser extent, metric organization, plus tempo and articulation;
10. Idioms of musical form (such as "introduction," "exposition," or "development") integrate all nine other aspects and all of their syntaxes.
Perception of an elementary musical unit in any of these aspects leads to the next stage of processing: the unit has to be categorized according to the listener's expertise. This is the point at which a single event is recognized as an idiomatic unit. The moment this happens, the audio event is placed into a hierarchical framework: it becomes mapped in relation to previous and upcoming events of the same kind, so that the mind can access this particular event at any point in time, as needed.
Having a clear destination point is extremely important for the delivery of meaning. When the mind negotiates the meaning of one identified idiom with the meaning of another, it often has to skip back and forth in the chain of micro-events to check and double-check for consistency in the logical connections. When the meaning of one link mismatches the others, the mind has to hypothesize another connection, which then needs another double-check, and perhaps a triple-check. In practice, our mind is constantly busy operating within a span of 5-6 adjacent idiomatic units. Occasionally it has to leap far beyond, on the order of hundreds of idioms, in order to access a previous section of the musical form. This is the case with the so-called "recapitulation," which brings back the same musical material that opened a piece.
Let us take the example of rhythm to see how a single idiomatic unit fits into a chain of rhythmic events, drawing a map of the hierarchical relations between all of these events.
The Arabeske op. 18 by Schumann opens with the rhythmic pattern "short-long-short-long," with a proportion of 3:1 between the long and short tones.
Example 1. Schumann – Arabeske. Each rhythmic pattern is marked by a bracket under the score:

[Score example omitted; in the original, brackets under the score mark the successive rhythmic patterns.]
The "short-long" rhythm will be recognized by a competent listener as a so-called "punctured" rhythm, often also referred to as "dotted" rhythm. In the case of the Arabeske, this rhythm is inverted: the short note precedes the long one. The dotted figure is repeated once and then fragmented by a succession of four "short-long" couples of tones. The entire progression of 6 patterns is then repeated, and the sequence of all 12 patterns comprises the first sentence, serving as the theme of the Arabeske. The accompaniment to this theme runs in non-stop 16th notes, providing a well-defined metric grid for the punctured figures in the melody. Every fifth 16th note is marked by a change of harmony.
Example 2. The brackets here indicate the span of the same harmony:
[Score example omitted; in the original, brackets mark the span of each harmony.]
The regularity of this harmonic pulse stresses the beat. Indeed, listening to the music confirms that the beat follows the harmonic changes: it feels comfortable to tap on every fifth 16th note, and such tapping seems to agree with the character of the music and to stimulate the musical movement. Its liveliness supports Schumann's indication of character in the score, "lightly and tenderly."
What is evident from this framework is that the "long" note of a rhythmic pattern is shorter than the beat. Consequently, the unevenness of the ongoing rhythm (short-long-short-long...) ignites the beat, charging it with energy to bounce up and down and exerting force like a compressed spring. The moment listeners realize this relation to the beat, they confirm their recognition of the "punctured rhythm" and start looking for a wider perspective.
The listener then notices that every other beat is heavier, which suggests a binary organization of the meter. Indeed, the music moves in a walking pattern: "left-right-left-right." The excited state projected by the bouncy rhythm, plus the zigzagging melodic contour and the busy harmonic pulse, all suggest that we are dealing with simple binary meter. Such a meter makes the punctured rhythm sound more active than any other meter would. In a simple binary pulse, every beat steps in the opposite direction: "upbeat-downbeat-upbeat-downbeat." There are no neutral regular beats; all beats in a metric group go either "up" or "down." Therefore, each bounce of the punctured figure flips the metric direction. This makes the movement as hyperactive as regular meter can possibly allow.
What we observe in this example is the construction of metric space from the recognition and mapping of rhythmic patterns, cross-relating them to harmony and pitch while taking texture into consideration. This process starts with identifying the first rhythmic pattern and ends with the definition of the metric movement (open to future adjustments as the piece progresses).
At which point does this construction become meaningful? We ascribe a character to it the moment we realize that we are dealing with a punctured figure within a simple binary pulse, that is, around the time we hear the 7th rhythmic pattern. By that time the rhythm has been categorized and placed into a hierarchical structure.
What is important here is that the very process of categorization is highly idiomatic. If the mind is not aware that the pattern "short-long-short-long" constitutes the so-called "punctured" rhythm, and that its meaning is an excess of energy, a bouncy, bubbly character, then no grid-fitting is possible. Each punctured figure is responsible for producing a single impulse. Without the grid, a succession of punctured figures will not accumulate energy and will fail to charge the musical movement.
Moreover, without knowledge of all the related idioms, the rhythmic pattern can be categorized in the wrong way. A proportion of 3:1 can be mistaken for 2:1, which constitutes a completely different rhythmic idiom. A good example of such similarity is the Song Without Words in G Minor op. 19 No. 6 by Mendelssohn.
Example 3. Mendelssohn – Song Without Words op. 19 No.6. Each rhythmic pattern is marked by a bracket under the score:
[Score example omitted; in the original, brackets under the score mark the successive rhythmic patterns.]
The melody of this piece closely follows the rhythmic scheme of the Arabeske: the "short-long-short-long" pattern is repeated and then fragmented once. After that, a new rhythmic pattern terminates the sentence.
[Notated comparison of the two rhythmic figures omitted.]

The biggest difference between the rhythms of the two melodies is their metric division: in Schumann's case the ratio is 3:1, whereas in Mendelssohn's it is 2:1. Other parameters are very similar.
As in the Arabeske, the accompaniment in the Song marks continuous motion, this time in 8th notes. Every fourth 8th note is marked with a stress. The stresses, however, are not equal: every other stress is stronger, providing the framework of a "right-left-right-left" pulsation. It is very similar to the Arabeske, except that there each "right" and "left" couple was set against 4 notes of the accompaniment, whereas here there are 3 notes.
The ternary division makes a big difference. The greater the number of divided parts, the weaker the metric impulse. Each of the 3 notes in Mendelssohn's accompaniment sounds heavier than each of the 4 notes in Schumann's Arabeske, producing a different grid. Here, each note of the accompaniment constitutes a beat.
As a rule of thumb, the beat unit is the medium-short duration observable throughout the score: collect all the note durations and identify the 3 shortest; the middle one of those three is likely to serve as the main beat unit. This rule holds true in most cases, except where there are not enough distinct rhythms to build a hierarchy. In this particular excerpt, the shortest note is found near the end of the phrase, in the 4th rhythmic pattern.
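The rule of thumb above lends itself to a direct sketch. The function name and the duration encoding (fractions of a whole note) are hypothetical conveniences, and the fallback for flat hierarchies is my own assumption about how the stated exception might be handled.

```python
# Sketch of the beat-unit rule of thumb: among the three shortest distinct
# durations in the score, the middle one is taken as the likely beat unit.

def likely_beat_unit(durations):
    """Return the probable beat unit from a list of note durations.

    Durations are fractions of a whole note (0.25 = quarter note).
    Falls back to the shortest duration when fewer than three distinct
    values exist - the hierarchy is then too flat to apply the rule.
    """
    distinct = sorted(set(durations))
    if len(distinct) < 3:
        return distinct[0]
    return distinct[:3][1]            # the middle of the three shortest

# Mendelssohn-like mix: a 16th ornament (0.0625), 8th-note accompaniment
# (0.125), quarters and halves -> the 8th note comes out as the beat unit.
print(likely_beat_unit([0.25, 0.125, 0.125, 0.0625, 0.25, 0.5]))
# → 0.125
```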
Example 4. The harmonic pulse:

[Score example omitted; in the original, brackets mark the spans of the harmonic pulse.]
The heavier beat and smoother rhythm correspond with the composer's tempo marking, which calls for a sustained walking pace. The indication "singingly" for the melody emphasizes the gentler character compared to Schumann. Finally, the subtitle "Venetian Boat Song" directs the performer and the listener towards thinking of a boatman propelling his boat with short, energetic pushes of his long oar, and of his passenger enjoying the ride.
The harmonic pulse is more complex here than in the Arabeske, but overall it marks every other group of three 8th notes in the accompaniment, suggesting a complex ternary meter: the pulse of six, subdivided into two groups, 3 + 3. In that case the "short-long-short-long" pattern of the melody maps differently: the "long" turns out to be longer than the beat, in contrast with the Arabeske. This relation drastically changes the rhythmic expression.
The "short-long-short-long" pattern at 2:1 constitutes the swing-like rhythm that is associated with expressions of relative relaxation and ease. The 2:1 ratio produces a rounded rather than jagged rhythm (as in the case of the 3:1 ratio), and therefore suggests a smooth, directed movement, unlike the bouncy gait of the 3:1 rhythm.
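The 3:1-versus-2:1 distinction can be stated as a tiny classifier. This is a minimal sketch under idealized assumptions: the note values, the function name, and the tolerance of 0.5 are all illustrative, not measured from any performance.

```python
# Classify a "short-long" couple of durations as punctured (3:1) or
# swing-like (2:1). Tolerance is an arbitrary illustrative choice.

def classify_couple(short, long):
    ratio = long / short
    if abs(ratio - 3) < 0.5:
        return "punctured (3:1): bouncy, hyperactive"
    if abs(ratio - 2) < 0.5:
        return "swing-like (2:1): smooth, relaxed"
    return f"ambiguous ratio {ratio:.2f}: prone to miscategorization"

# Arabeske-like figure: 16th + dotted 8th; Song-like figure: 8th + quarter
print(classify_couple(0.0625, 0.1875))   # → punctured (3:1): bouncy, hyperactive
print(classify_couple(0.125, 0.25))      # → swing-like (2:1): smooth, relaxed
```

The "ambiguous" branch corresponds to the essay's point that, without the idioms, a listener can mistake one ratio for the other.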
Such contrast between the expressions of the 2:1 and 3:1 rhythms is clearly illustrated in the song Trost an Elisa, D. 97, by Schubert. The song starts in common time (4/4) as a recitative, with numerous brief punctured figures ("long-short" patterns) which sharpen the vocal line.
Example 5. Schubert – Trost an Elisa, beginning:
The lyrics tell of Elisa, who could not stop weeping over the death of her beloved despite the many years that have passed. The moment the lyrics mention the "spirit" of her beloved, the 3:1 rhythm disappears and gives way to the 2:1 rhythm (long-short, within the 12/8 meter).
Example 6. Bars 15-‐17:
It noticeably soothes the movement, providing a gentle swaying, probably to account for the reference to the "wandering" of the soul between the Earth and the Heavens. On the words "loving companion," the common time and punctured rhythms return, suggesting the idea of suffering. What makes this transition even more dramatic is that the composer switches between the two meters right in the middle of a bar. On the word "longing," the 2:1 ratio kicks in, in the piano part alone, leaving the vocals in a different meter, with the 3:1 ratio, thereby creating a polymeter.
Example 7. Bars 18-‐20:
This moment corresponds to the exclamation "for he is eternally yours!" and implies duality, since Elisa remains in the Earthly world whereas her beloved belongs to the Heavens. The next line speaks of Elisa's suffering, while the piano and the vocals rejoin in common time and the 3:1 punctured rhythm. Nevertheless, the ending resumes the 2:1 rhythm, with its promise of reunion and immortality.
Evidently, the 2:1 rhythm is associated throughout the song with the idea of comfort, whereas the 3:1 rhythm accompanies the idea of suffering.
What we see is that there are two rhythmic idioms with diametrically opposite expressions: the hyperactive punctured rhythm (3:1) and the sleek, swing-like rhythm (2:1).
It is highly unlikely that any listener who is not already familiar with these idioms will be capable of telling that the Arabeske is affectionate and excited, the Song Without Words pleasurably laid back, and Trost an Elisa torn between suffering and reconciliation. An incompetent listener might be able to infer some emotional characteristics from the psychophysiological markers in the music that are obvious to anybody (e.g., tempo and dynamics), but it is doubtful whether such stimuli would be sufficient to drive emotional contagion and trigger the "real-life" experience of warmth and affection in the Arabeske, the comfort and sensuality of the Song Without Words, and the torment of Schubert's song.
Idiomatic Basis of the Performance Exaggerations
Yet another syntactic problem is that not every piece of music provides a texture clear enough to indicate a 2:1 versus a 3:1 grid. The examples from Schumann and Mendelssohn both featured very clear, regular division in the accompaniment. It was easy to hear how many notes of the accompaniment, two or three, fit into one "long" note of the melody.
When such shorter notes are absent from the texture, the task of detecting the rhythmic ratio becomes much more challenging. It cannot be accomplished by physically timing the exact duration of every note in question. Musical time is very different from physical time!
13
In musical practice, as a rule, performers manipulate the exact timing of rhythmic figures, making them slightly longer or shorter depending on the context of the melody and harmony. As a result, a unit of musical time, such as a beat or a bar, becomes shorter or longer in units of absolute time, such as milliseconds or seconds.
Bengtsson & Gabrielsson (1983)10 provide a pictorial representation of how the rhythm prescribed by the score differs from real-life performances of the Sonata in A Major, K. 331, by Mozart. The vertical axis shows deviations in timing: positive values stand for extra time added to the value specified by the score (the notes are stretched), while negative values stand for time subtracted (the notes are hurried). The two graphs correspond to performances by two different players.
Example 8. Mozart – Sonata A Major K331 (from Bengtsson & Gabrielsson, 1983)
It is obvious that very few notes (1 of 34 in the first performance, and 6 of 34 in the second) receive the exact time value prescribed by the score. The majority of the notes are constantly "distorted": either trimmed or stretched. The degree of distortion, and the choice of which notes to slow down and which to speed up, seems to characterize the individual performance style of each player. More recent studies demonstrate that the expressive timing signature serves as a marker of individual style for a master performer. A master musician recognizes his own performance months and even years after sight-reading an unfamiliar piece of music, even after all the dynamic variation has been artificially removed from the recording, suggesting that the manner of tweaking timing alone is a marker of a performer's individuality.11

10 Bengtsson, I., & Gabrielsson, A. (1983). "Analysis and Synthesis of Musical Rhythm." In J. Sundberg (Ed.), Studies of Music Performance (pp. 27-60). Stockholm: Publications issued by the Royal Swedish Academy of Music No. 39.
This consistent "inaccuracy" with respect to the musical text on the part of the performers constitutes the soul of music. Multiple studies demonstrate that the imprecision of expressive timing is directly linked to emotionality. The more precise the timing, with little deviation from the metronomically correct pulse, the greater the impression of a lack of "human feel."
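Measuring such deviations is straightforward once nominal and performed durations are known. The following is a minimal sketch in the spirit of the deviation graphs discussed above; the numbers are invented for illustration and do not reproduce Bengtsson & Gabrielsson's data.

```python
# Compute the percent deviation of each performed note duration from its
# score-nominal value. Positive = stretched, negative = trimmed.

def timing_deviations(nominal, performed):
    """Percent deviation of each performed duration from its score value."""
    return [round((p - n) / n * 100, 1) for n, p in zip(nominal, performed)]

nominal   = [0.50, 0.25, 0.25, 0.50]        # seconds implied by the score
performed = [0.55, 0.22, 0.26, 0.58]        # seconds actually played (invented)
print(timing_deviations(nominal, performed))
# → [10.0, -12.0, 4.0, 16.0]
```

Plotting such a list against note position would reproduce the shape of the deviation graphs: a flat line at zero would mean a metronomically exact, "less human" rendition.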
In a series of experiments, Bhatara et al. (2011)12 created MIDI recordings of expressive performances of four nocturnes by Chopin and modified them electronically. Some of the recordings were stripped of their temporal and dynamic fluctuations, and others were altered to include slowing down or speeding up, applied, however, in places different from those chosen by the master pianist for expressive timing. A panel of listeners rated the original recordings as well as their modifications. All the metronomically accurate versions were rated as "less human" and "less emotionally communicative." The versions in which the time intervals were altered in random places were found the least emotional.
Such results strongly suggest that the music of Chopin contains a set of patterns which require the shortening or lengthening of certain tones in order to emphasize this or that musical emotion. The less exaggeration, the less obvious the pattern, and consequently the weaker the musical emotion and the weaker the emotional contagion in the audience. This correlation can be taken as indirect proof that musical idioms are indeed present in the music: listeners intuitively recognize familiar idioms when those idioms are marked out by means of expressive timing. The audience then has an easy time identifying the denoted musical emotions and empathizing with them. Exaggerating durations in random places makes it harder to recognize the idioms, and therefore reduces the emotional response.
Another argument in support of the idiomatic nature of expressive timing is the fact that performers cannot get rid of rhythmic exaggerations even if they want to. In a number of experimental studies, musicians were asked to play without any expression. Nonetheless, their performances still showed small variations in timing.
11 Repp, B. H., & Knoblich, G. (2004) - Perceiving action identity: how pianists recognize their own performances. Psychological Science, 15, p. 604-609.
12 Bhatara, Anjali; Tirovolas, Anna K.; Duan, Lilu Marie (2011) - Perception of Emotional Expression in Musical Performance. Journal of Experimental Psychology: Human Perception and Performance, Vol. 37 No. 3, p. 921-934.
One such study13 examined the exact mapping of the timing “inaccuracies” of 6 pianists playing an excerpt from a Chopin Etude. Each pianist played it twenty times on a digital piano: the first ten times with normal expression, and the second ten times with “metronomic” accuracy. In each case, the “metronomic” versions were found to contain timing fluctuations in exactly the same locations where the exaggerated timing in the expressive performances (by the same pianist) took place.
One cannot avoid the impression that once a pianist learns an idiom, his mind becomes wired to bind certain tones in that idiom to a particular amount of shortening or lengthening. Once an idiom is “understood” in its expression, it becomes impossible for the mind to wipe this “understanding” off – hence the corresponding expressive timing is bound to stay, no matter what.
Most live performances of music in all sorts of styles contain more or less pronounced variations of tempo, even in applications where one might expect a very stable tempo. For example, dance music is designed to help dancers reproduce the same steps of a dance over and over – seemingly a very regular task. However, every Viennese waltz, when performed by expert musicians, features marked accelerations and retardations in almost every measure, to the extent that a notated bar may at times be performed at twice the speed of another bar.14
This flexibility of musical time appears to be present in any naturally evolved form of music – except where electronic music has been conceived, arranged and performed on computerized equipment. Such music, however, is rather new, pioneered by the German band Kraftwerk.
Starting from 1978, Kraftwerk stopped using human operators, relying solely on drum machines and sequencers in all live performances. The band members often left the stage during their concerts, letting the machines take over. Notably, the entire genre of techno music that has adopted this precise “absolute timing” style is notorious for its anti-emotional, robotic character, in which all human aspects of performance are either downplayed or ignored outright.15
Even in music where a dedicated performer, the drummer, is responsible for “keeping time” in the strictest possible way, timing is never really “strict” in physical terms. In a recent experimental study, fifteen professional drummers were asked to synchronize a basic drumming pattern with a metronome as precisely as possible at tempos of 60, 120, and 200
13 Repp, Bruno H. (2000) - The timing implications of musical structures. In: Musicology and sister disciplines: Past, present, future. Greer, David (Ed.), Oxford University Press, New York, p. 60-67.
14 Bengtsson, I., & Gabrielsson, A. (1983) - Analysis and synthesis of musical rhythm. In: Studies of music performance, J. Sundberg (Ed.). Publications issued by the Royal Swedish Academy of Music No. 39, Stockholm, Sweden, p. 27-60.
15 Reinecke, David M. (2009) - “When I Count to Four ...”: James Brown, Kraftwerk, and the Practice of Musical Time Keeping before Techno. Popular Music & Society, Vol. 32 Issue 5, p. 607-616.
beats per minute (bpm). At the slower tempo, the right hand (playing the hi-hat cymbal) was found to have a 2 ms synchronization error (SE), whereas the left hand (on the snare drum) and right foot (bass drum) were ahead of the metronome by about 10 ms. At the highest speed, the pattern of synchronization errors reversed between the hands. Overall, the variation of SE stayed around 2% for the 60 and 120 bpm tempos, and 4% for the 200 bpm tempo.16
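The synchronization error measure used in such studies can be sketched as follows; the onset values here are invented for illustration, not taken from the cited experiment:

```python
# Signed synchronization error (SE) of drum hits against a metronome (toy data).

def sync_errors(onsets_ms, ioi_ms):
    """Error of each hit relative to its metronome click (negative = ahead/early)."""
    return [t - i * ioi_ms for i, t in enumerate(onsets_ms)]

ioi = 1000.0                          # 60 bpm -> one click every 1000 ms
hits = [2.0, 998.0, 1995.0, 3002.0]   # hypothetical hi-hat onsets

errs = sync_errors(hits, ioi)
mean_se = sum(errs) / len(errs)
print(errs, mean_se)                  # [2.0, -2.0, -5.0, 2.0] -0.75
```

Note that even a drummer "in perfect time" by musical standards shows a few milliseconds of signed error on every stroke; the study's percentages describe the spread of exactly this kind of profile.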
The reality of music is such that every rhythmic idiom is entitled to a specific amount of exaggeration. This is where the recognition of idioms and their syntactic relations becomes imperative. If the musician fails in his parsing, he will miss the expressive timing and therefore fail to deliver an adequate emotional message. The consequences can range from a dry impression on the audience to outright confusion (in cases of complex rhythm or meter). If the listener fails to account for expressive timing, he will infer a misrepresented rhythmic figure, resulting in a mistaken metric grid – which altogether leads to severe distortions and misunderstanding.
Each of the rhythmic idioms we have discussed earlier, the 3:1 puncture and the 2:1 swing, possesses its own expressive timing profile, closely related to the emotional properties it is supposed to convey.
Musicians know that the punctured rhythm is to be sharpened. They therefore emphasize the puncture by shortening the short note and lengthening the long note – a practice musicians call “overdotting.” The term refers to the dot in musical notation (on the right side of the note head) that indicates a 50% elongation of that note.
!" as opposed to equal notes of the “straight” rhythm: '( To “overdo” means to use greater than 50% elongation, violating the amount
prescribed by notation. The amount of overdotting can vary greatly between different performers -‐ almost to double size (depending on the context of articulation and tempo). However, the average elongation of the long note is usually around 0.79 instead of 0.75 specified by notation (3:4 = 0.75).17
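As a quick arithmetic check of these figures, here is a minimal sketch of how the dotted split of a beat changes under overdotting; the beat length of 1.0 is an arbitrary unit:

```python
# Dotted-rhythm arithmetic: notation prescribes a 3:1 split of the beat
# (long note = 0.75 of the beat); overdotting pushes the long share past 0.75.

def split(beat, long_share):
    """Return (long, short) durations for a figure giving `long_share` of the beat to the long note."""
    return beat * long_share, beat * (1 - long_share)

notated = split(1.0, 0.75)      # 3:1 as written -> (0.75, 0.25)
overdotted = split(1.0, 0.79)   # the average performed value cited above

print(notated, overdotted)
```

The performed 0.79 share corresponds to a ratio of roughly 3.8:1 instead of the notated 3:1 – a small shift in absolute time, but a clearly audible sharpening of the figure.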
The rule of thumb seems to be that joyous, solemn, bold or fiery musical emotions call for stronger overdotting, while the emotional conditions of pleasing, flattering or sleepy character benefit from moderate overdotting.18
The 2:1 rhythm, on the other hand, is characterized by the opposite style of expressive timing. Whenever the 2:1 “long-short” pattern of rhythm sustains throughout the melody in a ternary meter, performers tend to skew the ratio well
16 Fujii, Shinya et al. (2011) - Synchronization error of drum kit playing with a metronome at different tempi by professional drummers. Music Perception: An Interdisciplinary Journal, 28(5), p. 491.
17 Fabian, Dorottya; Schubert, Emery (2010) - A new perspective on the performance of dotted rhythms. Early Music, Vol. 38 Issue 4, p. 585-588.
18 Hefling, Stephen E. (1993) - Rhythmic Alteration in Seventeenth- and Eighteenth-Century Music: Notes Inegales and Overdotting. Schirmer Books, New York, p. 101-105.
below the nominal 2:1.19 The performing strategy here appears to aim at rounding off the rhythm by bringing it closer to the binary pulse (shortening the 2:1 long note towards the 1:1 ratio).
Evidently, each of the two rhythmic idioms, the punctured and the swing-like rhythm, must have been understood by music practitioners as opposite in expression – otherwise these opposite strategies of expressive timing could not have arisen. Each strategy handles the durational contrast in its own way: the punctured style increases the contrast, while the swinging style reduces it. Not only do performers exaggerate these rhythms in the same manner – listeners expect them to be exaggerated in this particular manner. Listeners often perceive the absence of overdotting as a fault or, on the contrary, remain unaware of overdotting because the rhythm appears to them “normal.”20
The wide spread of this convention across different genres and styles of Western music, among performers as well as listeners, testifies that knowledge of the punctured and swinging rhythms precedes their perception in music. Music practitioners have to know these idioms and the syntactic conditions applicable to them before they can identify the exaggerated versions of these idioms.
• Increasing the contrast between the notes of the punctured rhythm highlights the energizing character of this rhythm (active style).
• In contrast, the smooth character of the 2:1 rhythm invites the opposite expressive timing style – reducing the length of the long note and extending the short note, so that their contrast becomes less pronounced (passive style).
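The two opposite strategies above can be expressed as a single “contrast” operation on a long-short pair of durations. The following sketch is purely illustrative; the scaling factors are invented, not empirical values:

```python
# Active vs. passive expressive timing as one knob: scale the durational
# contrast of a long-short pair while keeping their total duration constant.

def adjust_contrast(long, short, factor):
    """factor > 1 sharpens the contrast (overdotting); factor < 1 softens it (swing rounding)."""
    total = long + short
    diff = (long - short) * factor
    return (total + diff) / 2, (total - diff) / 2

print(adjust_contrast(0.75, 0.25, 1.2))   # punctured figure, active style: sharper
print(adjust_contrast(2/3, 1/3, 0.7))     # 2:1 swing, passive style: closer to 1:1
```

Keeping the total constant reflects the musical constraint that expressive timing redistributes time within the figure rather than changing the tempo.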
The commonality of both rhythms attests to the fact that listeners and performers alike are constantly engaged in the process of spotting familiar idioms and identifying their syntactic conditions. In real-life situations, the rhythmic ratio is not the only factor affecting expressive timing. Harmony, melody and texture each contribute to the shortening or lengthening of a particular tone in music.
• Performers are expected to mark an unexpected harmony or the most tense dissonant chord with considerable extra timing.
• A tone in the melody that makes a wide leap up or down typically receives a slight elongation.
• Whenever a new voice enters the texture, its first note is usually moderately sustained.
19 Gabrielsson, A., Bengtsson, I., & Gabrielsson, B. (1983) - Performances of musical rhythm in 3/4 and 6/8 meter. Scandinavian Journal of Psychology, 24, p. 193-213.
20 Fabian, Dorottya; Schubert, Emery (2010) - A new perspective on the performance of dotted rhythms. Early Music, Vol. 38 Issue 4, p. 585-588.
Consequently, any music fragment can incorporate substantial deviations from strict “metronomic” timing, which can make it hard for listeners to grasp which idiomatic rhythm is implied by this or that rhythmic pattern they hear.
Repp (1990) demonstrates how vast the discrepancy in expressive timing can be. Recordings of 19 famous pianists performing the minuet from Beethoven's Piano Sonata in E-flat Major, op. 31, No. 3 were analyzed for the exact durations of all the notes. The “short” note in the upbeat punctured figure (long-short-long) was found to differ on every appearance, ranging from 60% to 199% of the notated rhythmic value. The expressive timing pattern varied from bar to bar according to the musical demands as envisioned by each performer at his own discretion. Not a single performance featured a constant pulse. The precise timing of quarter-, eighth- and sixteenth-notes varied substantially depending on their musical function. Thus, sixteenth-notes following dotted eighth-notes were generally prolonged in the Minuet, where they were part of an upbeat, but generally shortened in the Trio, where they fell on the downbeat.21
The scale and frequency of expressive exaggerations make it doubtful that music follows the rules of syntax in the same way speech does – where the standard direction of processing is believed to run from lower-level syntax to higher-order structures. Similar convictions have been held in the field of semiotics of music.
But the havoc of expressive exaggerations poses a problem. How can the listener decide whether the music being auditioned contains an overdotted 2:1 rhythm or an “underdotted” 3:1? What if the durational contrast is the result of the expressive timing of melodic leaps or dissonant chords – and the rhythmic ratio is in fact straight (1:1)? Is it possible at all to conceive an elementary unit under such conditions?
We face a chicken-or-egg dilemma here: is music parsed by detecting the basic properties of the sounds in the music flow and joining certain sounds together? Or is it parsed by matching familiar higher-order structures to the flow of music?
Syntactic Order: from Top to Bottom, or from Bottom to Top?
The generative theory by Lerdahl and Jackendoff (1983)22 is by a wide margin the most widely accepted syntactic theory in the field of music. One of its postulates is that the levels of syntactic hierarchy are consecutively derived from the “surface level” of the musical tones. In relation to rhythm, the generative order goes as follows:
21 Repp, B. H. (1990) - Patterns of expressive timing in performances of a Beethoven minuet by nineteen famous pianists. Journal of the Acoustical Society of America, 88, p. 622-641.
22 Lerdahl, F., & Jackendoff, R. (1983) - A generative theory of tonal music. The MIT Press, Cambridge MA.
1. The listener hears the music;
2. detects the beat;
3. breaks beats into groups (usually of 2 or 3);
4. finds the downbeat (the strongest regularly stressed beat);
5. defines the meter (usually 2, 3, 6, 9, or 12 beats);
6. recognizes measures and hypermeasures (pairs of measures).
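The bottom-up direction of steps 2-5 can be caricatured in a few lines of code. This is only an illustration of the postulated order of processing (surface first, then beat, then metric grouping), not a real beat-tracking algorithm; the onset list is invented:

```python
# Minimal bottom-up sketch: infer the beat as the prevailing inter-onset
# interval (IOI), then group beats into measures of an assumed size.

from collections import Counter

def infer_beat(onsets):
    """Prevailing IOI, taken here as the beat (step 2 of the generative order)."""
    iois = [round(b - a, 3) for a, b in zip(onsets, onsets[1:])]
    return Counter(iois).most_common(1)[0][0]

def group_measures(onsets, beats_per_measure, beat):
    """Measure index for each onset (steps 3-5, with the meter assumed known)."""
    return [int((t / beat) // beats_per_measure) for t in onsets]

onsets = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]   # hypothetical onset times, s
beat = infer_beat(onsets)
print(beat, group_measures(onsets, 4, beat))         # 0.5 [0, 0, 0, 0, 1, 1, 1, 1]
```

Note that even this toy pipeline must be *given* the number of beats per measure – a hint of the dependence on higher-order knowledge discussed later in this chapter.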
Example 9. Generative Theory Schematics of a chorale by J. S. Bach produced with the GTTMS software by Robe Seward:
According to the generative theory, comprehension of music follows this succession of steps from the elementary level to the most advanced level of musical form. The mind of the listener unconsciously executes all of this mental work and arrives at a final understanding of the music. Lerdahl and Jackendoff believe that the set of generative rules has an innate origin, and they insist that the generative order guides the steps of music comprehension and transcends historical and geographical varieties of music.
This theory has been thoroughly tested over the past thirty years, and its generative rules have indeed been confirmed in numerous experimental studies. The theory has also proved very effective in computer science, providing a reliable framework for the automated analysis of music – including considerable success in the recognition of expressive timing. However, the foundation of generative theory lies in the cognitive sciences, with little connection to music theory or music practice. The emotional component of music, so obvious to most music practitioners, completely eludes the scope of generative theory. Moreover, generative theory leaves no place for emotional meaning in its presentation of musical syntax.
One downside of the generative theory is that it creates an illusion that syntax can be effectively studied in isolation from semantics. Lerdahl and Jackendoff hold that the listener assembles the hierarchic structure all the way up from the most superficial “surface” level of audition to the level of musical form, and it is only at the very top of this pyramid that the listener somehow arrives at the “musical meaning.”
Lerdahl and Jackendoff view such meaning “as a combination of well-formedness rules and preference rules”23 – that is, in purely structuralist terms, disregarding the idiomatic connection with emotions.
The most obvious objection to their semantic model comes from experimental research on the timing of emotional responses to music.
Bigand et al. (2005)24 tested the minimal time it takes a group of listeners to adequately detect a musical emotion. Musically trained and untrained listeners were required to listen to 27 little-known musical excerpts of different styles and to group those excerpts that conveyed a similar emotional meaning. In the first stage, all excerpts were 25 seconds long. Both musicians and nonmusicians did very well and produced an equal number of groups, highly correlated with the emotional expression of the music. The number of groups and their contents were very consistent across participants and included complex emotions. In the second stage, the excerpts were trimmed to a bare 1 second. Nevertheless, such drastic shortening had only a weak effect on emotional responses. Although some of the musical emotions were incorrectly tagged, overall the perceived emotions were remarkably similar to those experienced with the longer excerpts.
It is highly unlikely that the listeners could have constructed the syntactic hierarchy according to Lerdahl and Jackendoff's generative rules in 1 second. It is even harder to explain how all the participants in Bigand's study could have converged so closely – as follows from the high correspondence between their groupings. How could they possibly estimate the emotional valence of different excerpts so consistently in such a short period of time? Much more likely, the speed and coherence of judgment had to do with the listeners' intuitive expertise in musical idioms, which they detected within 1 second of music.
Such speed of detection characterizes the recognition of familiar melodies, as revealed in another experimental study.25 Musicians and non-musicians were presented with segments of increasing duration from familiar and unfamiliar melodies and asked to sing the continuation of the melody. The results showed that 3 to 6 notes (i.e., about 2 seconds) of a familiar melody were sufficient to evoke a feeling-of-knowing judgment. Two additional notes allowed the participants to gain full confidence and carry on the tune. Judging melodies as unfamiliar took longer: 8-10 notes.
23 ibid., p. 312.
24 Bigand, E.; Vieillard, S.; Madurell, F.; Marozeau, J.; Dacquet, A. (2005) - Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition & Emotion, Vol. 19 Issue 8, p. 1113-1139.
25 Bella, Simone; Peretz, Isabelle; Aronoff, Neil (2003) - Time course of melody recognition: A gating paradigm study. Attention, Perception, and Psychophysics, Vol. 65 No. 7, p. 1019-1028.
The 2-3 second range for the detection of familiarity agrees with the span of time necessary for the integration of felt emotion in listeners. A group of 81 subjects recruited from the Boston metropolitan area was asked to indicate the emotional valence (positive or negative) of their felt response to 138 musical excerpts drawn from 11 genres of music. It was established that listeners require from 8.31 to 11.00 seconds of listening before they can formulate emotional judgments about the musical stimuli they experience.26
The time frame for emotional reaction to music thus appears to take less than 25 seconds: an emotional “guess” during the first second of listening, followed by another second during which listeners find out whether the music contains any familiar structures. In the next 2-3 seconds they seem to finalize their decision about which musical emotion is contained in the music. An extra 2-3 seconds allows listeners to become aware of their “felt” emotions and move on to fine-tuning their emotional experience.
The time frame required for estimating the minimal hierarchy of generative grammar – the level of hypermeasures – needs at least 4 measures of music. At a moderate tempo of M=60, in common time, that gives 16 seconds of music. It appears, then, that by the time a listener infers the syntactic hierarchy, he has long since recognized the musical emotion and become aware of the emotional state invoked by the music. Such a situation can hardly be qualified as “making sense” of music through inference of its “generative grammar” according to Lerdahl and Jackendoff.
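The arithmetic behind this estimate is straightforward:

```python
# Four measures of common (4/4) time at quarter-note = 60 bpm.

beats_per_minute = 60
seconds_per_beat = 60 / beats_per_minute   # 1.0 s per beat
beats_per_measure = 4                      # common time
measures = 4                               # minimal hypermeasure span

seconds = measures * beats_per_measure * seconds_per_beat
print(seconds)   # 16.0 s -- an order of magnitude above the ~1 s emotion judgments
```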
Yet another argument why “bottom-to-top” syntax cannot secure sense-making in the perception of music is that, in practice, no hierarchy building is possible without a preconceived idea of what the hierarchic structure is likely to be. A person has to be familiar with at least a few types of hierarchy in order to construct one.
Furthermore, the choice of material appropriate for hierarchical ranking depends on prior knowledge of that hierarchy. A person has to know what to look for in music before he can define one layer in relation to another. Otherwise Western listeners would have no trouble making sense of Chinese traditional opera. Evidently, they are incapable of constructing an effective hierarchic scheme upon listening to Chinese music, because they are not familiar with any prototype of the hierarchy employed by Chinese musical syntax.
The adherents of generative theory do not acknowledge this dependence. They believe that at the surface layer music can be categorized and understood directly, without any prior knowledge. Jackendoff (1987) emphasizes that the “musical surface” is not a structure, but rather a “set of
26 Bachorik, Justin Pierre et al. (2009) - Emotion in motion: Investigating the time-course of emotional judgments of musical stimuli. Music Perception: An Interdisciplinary Journal, 26(4), p. 355.
relationships between elements that are present in levels of representation”27 – comparable to the set of colors for vision or the set of phonemes for speech. The brain encodes the stream of music in terms of discrete pitch events of specific duration – this, according to Jackendoff, constitutes the “musical surface,” the “lowest level of representation that has musical significance.”
However, the notion of the “musical surface” as the lowest level of syntax is itself shown to depend on “higher-level” percepts. The surface is not perceptible unless the higher-order structures are acknowledged by the mind. The elementary unit of music, the musical tone, cannot be perceived unless the mind knows its anatomy:
• The duration of a tone cannot exist without awareness of the beat;
• Pitch cannot exist without awareness of the tuning system;
• Timbre cannot exist without the categorization of texture (a decision whether we hear a single tone of complex timbre or two tones of simpler timbre played in unison).
And beat, tuning system and typology of textures are all higher-order percepts – one can become aware of them only as a result of comprehending the entire hierarchic scheme of a given piece of music.
• The beat is the prevailing average periodic duration observed in a given piece of music (it can be inferred only after the mind registers a number of regular pulses and correlates their ratios).
• The tuning system is the conglomerate of pitches that sets the standard for all the intervals possible within a given music system (it can be inferred only after all the pitches available for music making are estimated in their octave equivalence).
• Texture is the utmost complexity of melodic, rhythmic and harmonic materials integrated together by a particular function (such as accompaniment, imitation or contrast).
Beat, tuning system and texture are all hierarchies – they should not be mistaken for the rules of generative grammar. There are a few dozen types of beats, categorized in a variety of ways depending on their duration, regularity and articulation style. There are three tuning systems used in Western music practice (Pythagorean, meantone and equal temperament), each comprising a particular hierarchy of intervals. There are about a dozen texture types common in Western music, and each of these types unites a number of voices related to each other in a particular way. Beat, tuning system and texture are complexities that obey the rules of rhythmic, metric, melodic and harmonic syntax.
27 Jackendoff, R. (1987) - Consciousness and the computational mind. The MIT Press, Cambridge MA, p. 218-219.
So, even the perception of a single tone depends on knowledge of higher-order syntax. In order to conceive the “surface level” of music in its entirety, one has to constantly keep in mind the syntactic hierarchies for pitch, rhythm, meter and texture. Without awareness of the organization of the beat (temporal aspect), harmonic segmentation (pitch aspect) and texture categorization (timbral aspect), the musical surface simply cannot exist as an entity.
Cambouropoulos (2010)28 provides experimental support for this objection. He recorded a few sequences of tones of slightly different durations (e.g., the notes of the first sequence lasted 15, 17, 15, 13, 11, 12 and 13 64th notes, respectively). This and similar sequences were presented to 46 undergraduate music students, who were asked whether they perceived a metrical structure in the sound of each sequence (Yes/No). Additionally, they were asked to notate each sequence in standard music notation.
Example 10. Unevenly timed tones are categorized in a variety of ways by different listeners (from Cambouropoulos 2010):
The majority of participants found the first sequence to be ametric (free-floating). The other sequences, which were less isochronous than the first, were notated by different participants in a number of different meters, with considerable discrepancies between the rhythmic durations assigned to each note of a sequence. Evidently, even musically trained participants could not reach agreement in their recognition of the “musical surface,” and in the first case were simply incapable of categorizing it.
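The near-isochrony of the first sequence is easy to verify from the durations quoted above:

```python
# The first Cambouropoulos sequence, in 64th-note units as given in the text.
# All durations hover close to the mean, so no clear metrical categories emerge.

durations = [15, 17, 15, 13, 11, 12, 13]   # in 64th notes
mean = sum(durations) / len(durations)
deviation = [round(d / mean, 2) for d in durations]
print(mean, deviation)   # every ratio stays within ~25% of 1.0
```

Under the categorization view, a listener needs ratios that snap to small integers (1:2, 1:3, 2:3) to hear a meter; ratios clustered around 1.0 with small irregular wobble offer no such handle, hence the "ametric" judgments.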
Cambouropoulos also points out that research in automated transcription of music demonstrates that surface-level musical analysis depends on higher-order categories. The first attempts at building software applications for the transcription of
28 Cambouropoulos, Emilios (2010) - The Musical Surface: Challenging Basic Assumptions. Musicae Scientiae, Special Issue, p. 131-148.
recordings of polyphonic music relied entirely on bottom-to-top algorithms and proved ineffective. Recent research has shown that a purely “bottom-up” approach cannot achieve satisfactory recognition of audio: higher-level music processing (such as the recognition of chords and voices) is necessary to enable basic multi-pitch and onset extraction. The inclusion of such higher-order algorithms as multi-pitch analysis, beat tracking, instrument recognition, harmonic analysis and chord transcription, as well as music structure analysis,29 allowed significant improvement in transcription accuracy – in excess of 60% in some cases of polyphonic music.
The listener’s mind is no different from a computer when it comes to recognizing music – the same process of “transcription” takes place. In order for the listener to parse musical tones into groups, he has to estimate the rhythmic value of each tone. But identifying rhythm is impossible without the beat. And the beat is a holistic concept: it has to be inferred from the totality of all tones, based on the prevailing average relations between them. Similar integrative operations are required for the comprehension of the elementary units of other aspects of musical expression.
In order to know what in the sound scene ought to be regarded as a motif, every listener has to decide which tones should be heard consecutively and which simultaneously – that is, to distinguish between chords and melodic lines. Then the listener has to determine which timbres have priority – i.e., figure out which instruments (or vocals) perform the melody, and focus on them, paying less attention to the other instruments.
Throughout the act of listening, one has to keep skipping back and forth between low-level elements and high-level concepts. Are the heard tones in fact passages of short notes in a slow tempo, or long notes in a very fast tempo? Is it the rhythm that features a progressive shortening of notes, or is it the tempo that slows down? The listener has to answer questions like these all the way through tracing the “surface level” of music. But no answers are possible without looking into higher-order metric and melodic structures.
Several researchers have provided experimental evidence for the “top-down” processing of musical syntax. Some have even gone as far as to conclude that rhythm only exists under the conditions of such processing. A dedicated term has been coined for this precondition of rhythm: “metrical representation.” The model for it was summarized by Povel and Essens (1985).30
Time, and therefore temporal intervals, can only be assessed by means of a clock. Meter plays the role of such a clock for music, providing an accurate internal representation of rhythm. The stresses detected in music invoke a
29 Ryynänen, M. P., & Klapuri, A. P. (2008) - Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), p. 72-86.
30 Povel, D. J., & Essens, P. (1985) - Perception of temporal patterns. Music Perception, 2(4), p. 411-440.
metric pulse in the mind of the listener. If its beats coincide with the actual accents heard in the music, the perception of their dynamic strength becomes amplified. The estimations of exact timing in the rhythm are then made with greater precision – providing more information and supporting a richer emotional experience. If the clock is not well chosen, the rhythmic pattern is poorly reproduced and judged ambiguous, putting the emotional response in question.
The effect of metrical representation is so strong that it can drastically change the “tiling” of the rhythm space. In a series of tests conducted on a group of professionally trained musicians, a simple rhythmic pattern of 3 sounds with the musical ratio of 1:2:1 (210 ms, 474 ms, 316 ms) was correctly identified as 1:2:1 when presented to the subjects without a drum stressing a metric pulse. The addition of a soundtrack with a binary metric pulse performed on a woodblock did not change the recognition of the ratio: only 10% of respondents mistook it for 1:3:2. However, when the very same rhythm was played against a ternary metric pulse, it was interpreted mostly as 1:3:2 – with not one listener identifying it as 1:2:1. The rhythmic categorization so prevalent under duple meter completely disappeared under triple meter.31
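The role of the metric “clock” in this experiment can be mimicked with a crude quantizer. The clock units below are chosen purely for illustration and are not taken from the study; the point is only that the same physical durations snap to different small-integer categories under different metric units:

```python
# Snap each duration (210, 474, 316 ms, from the study cited above) to the
# nearest whole multiple of an assumed metric clock unit.

def categorize(durations_ms, unit_ms):
    """Express durations as small-integer multiples of the metric unit."""
    return [max(1, round(d / unit_ms)) for d in durations_ms]

rhythm = [210, 474, 316]
print(categorize(rhythm, 250))   # coarser, duple-friendly unit -> [1, 2, 1]
print(categorize(rhythm, 160))   # finer, ternary-friendly unit -> [1, 3, 2]
```

One and the same stimulus yields 1:2:1 or 1:3:2 depending solely on the unit the induced meter supplies, which is the heart of the “metrical representation” argument.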
Awareness of meter, a higher-order concept, clearly affects the discrimination of rhythm, a low-level percept. The dynamics of their interaction are such that meter induction and rhythm categorization appear to run in parallel, as two correlated processes supporting or negating each other throughout the flow of music – thereby negotiating musical meaning.32
We recognize surface-level rhythm in terms of a pre-established cognitive framework of time structuring. Rhythmic categorization and beat induction appear to function as two overlapping modular processes: the perceived audio material induces the beat, which in turn influences the categorization of new incoming material. This becomes a self-adjusting, ongoing process for as long as the music keeps sounding.
Similar modularity can be observed in relation to pitch. Thus, a chord in music is perceived as a single perceptual unit rather than a combination of tones. Our ear analyzes a complex spectrum, breaking it into partials and reintegrating them into a single percept related to the single perceived root tone of that chord.
Richard Parncutt has conducted comprehensive research into the perception of chords and came to the conclusion that chords are processed in essentially the same
31 Desain, P., & Honing, H. (2001) - Modeling the Effect of Meter in Rhythmic Categorization: Preliminary Results. Japanese Journal of Music Perception and Cognition, 7(2), p. 145-156.
32 Desain, P., & Honing, H. (2003) - The formation of rhythmic categories and metric priming. Perception, Vol. 32(3), p. 341-365.
way as individual pitches.33 Individual pitch and individual chord thus belong to different hierarchic layers, yet the mind of the listener has to be aware of both simultaneously. He must decide, for each pitch he hears, whether it is part of a chord or a separate entity. Therefore, categorization at the lowest level of pitch order depends on the higher-order percept of chords.
Example 11. Chord versus non-chordal tones. Debussy – La puerta del Vino from Preludes, vol. 2:
In Debussy's prelude La puerta del Vino, the correct perception of harmony depends on prior knowledge of the chords typical of Western music. Otherwise the combination of tones D-flat, F, B and E can be mistaken for a chord, whereas the tone E is a non-harmonic tone used to interfere with the underlying chordal structure – to create tonal tension and generate melodic motion.
Taking this tone for an organic part of the chord would distort the expression of the music: the returns of E in the melody would become ordinary repetitions of the same chord, without any harmonic contrast. The music would then appear monotonous and relaxed – which clearly contradicts the author's prescription “very expressive,” as well as the program of the music suggested by its title. The brusque harmony feeds the growing tension in the piece, where a seductive, mysterious atmosphere thickens and eventually erupts in a violent explosion.
Yet another typical confusion that can arise as a result of wrong categorization of chords is mistaking a polyphonic combination of melodic
33 Parncutt, Richard (1989). Harmony: A Psychoacoustical Approach. Springer-Verlag, Berlin, pp. 68-70.
voices for a chord. A colorful harmonic sound can originate from the juxtaposition of melodic changes in different voices. When numerous voices all move at the same time, this can create the illusion of a chord.
Example 12. Chord versus linear sonance. Mussorgsky, Ballet of the Unhatched Chicks from Pictures at an Exhibition:
In this extravagant trio from Mussorgsky’s Ballet of the Unhatched
Chicks, the texture is layered into 5 voices - a very demanding task for the pianist to carry out. The challenge then passes to the listener: he has to identify the presumable "chords" - the sounds that fall on the downbeat - and realize that they neither form conventional chordal progressions nor emphasize the traditional functions of tonic, dominant or subdominant.
The pseudo-chords are mere sonances - accidental combinations of tones that occur as a result of melodic motion in different voices within the artificial mode F-G#-A-B-C-C#-D-E. This heavily sliced texture in a strange mode is meant to illustrate the phantasmagoric situation of chicks that try to dance in the most delicate and sophisticated way while still unable to get out of their shells.
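The distinction between a lexicon chord and an accidental sonance can be sketched in code. The following Python fragment is an illustrative sketch under stated assumptions: the chord lexicon is a toy dictionary of major and minor triads only, and the pitch-class encoding of Mussorgsky's mode is my own.

```python
# Sketch: telling a conventional chord from a mode-derived "sonance".
# Assumption: a toy chord lexicon containing only major and minor triads.

MODE = {5, 8, 9, 11, 0, 1, 2, 4}  # F, G#, A, B, C, C#, D, E as pitch classes

def triads():
    """All major and minor triads as pitch-class sets, keyed by root and quality."""
    out = {}
    for root in range(12):
        out[(root, "maj")] = {root, (root + 4) % 12, (root + 7) % 12}
        out[(root, "min")] = {root, (root + 3) % 12, (root + 7) % 12}
    return out

TRIADS = triads()

def classify(sonority):
    """'chord' if the tones form a lexicon triad, 'sonance' if they merely
    draw on the mode, 'foreign' otherwise."""
    pcs = {p % 12 for p in sonority}
    if any(pcs == t for t in TRIADS.values()):
        return "chord"
    if pcs <= MODE:
        return "sonance"
    return "foreign"

print(classify({5, 9, 0}))   # F-A-C: a genuine F major triad
print(classify({5, 8, 11}))  # F-G#-B: drawn only from the mode
```

A listener lacking the triad lexicon has no basis for the first judgment; a listener unexposed to the mode would have to categorize the second sonority as a malformed "chord."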
Not only does categorization of vertical harmony depend on the vocabulary of chords and chordal progressions - completely horizontal melodic progressions are categorized in exactly the same way: by fitting the melodic motifs into the framework of known chordal structures and chordal progressions.
Detection of pitches is determined by knowledge of the most common chords and the rules of their connection in music - what constitutes the set of harmonic idioms. "Chord recognition is the result of a successful memory search in which a tone series is recognized as a pattern stored in long-term memory."34
The listener recognizes that a certain progression of pitches in a melody utilizes the tones of a familiar chord (as in the melody of the Blue Danube Waltz by Johann Strauss Jr.).
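Povel's "memory search" can be sketched as a lookup of a melodic segment's pitch classes in a small long-term-memory chord lexicon. The lexicon below is a toy assumption containing only the chords relevant to the Blue Danube discussion, not a claim about the actual model.

```python
# Sketch of "chord recognition as memory search": the tones of a melodic
# segment are matched against a small lexicon of chords in long-term memory.
# The note names and the three-entry lexicon are illustrative assumptions.

NOTE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

CHORD_LEXICON = {
    "C major triad": {0, 4, 7},
    "G dominant seventh": {7, 11, 2, 5},
    "B half-diminished seventh": {11, 2, 5, 9},
}

def implied_chords(melody):
    """Names of lexicon chords whose tones contain all the melody's pitch classes."""
    pcs = {NOTE[n] for n in melody}
    return [name for name, tones in CHORD_LEXICON.items() if pcs <= tones]

# The opening of the Blue Danube melody arpeggiates C-E-G:
print(implied_chords(["C", "E", "G"]))  # → ['C major triad']
```

A melody segment that matches nothing in the lexicon would force the listener to search for an alternative segmentation, which is exactly the ambiguity discussed below for bars 2 and 6.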
Example 13. Johann Strauss Jr., the Blue Danube Waltz. The brackets show the changes in vertical harmony (the chords of harmonization).
34 Povel, D.-J.; Jansen, E. (2001). Perceptual mechanisms in music processing. Music Perception, 19(2), 169-198.
This seemingly "elementary" melody requires considerable expertise in chords. Without it, it would be very confusing to distinguish between the G in bar 2 of the example above and the G in bar 6 - as well as the A in bar 10 and the A in bar 14. The obvious similarity of the melodic phrases to the C major triad (bars 1-7) and the half-diminished seventh chord (bars 8-15) can only confuse a listener unfamiliar with the typical progressions of tonic and dominant harmony. The clash between the vertical and the horizontal harmonies in this famous melody will then not come to the surface, and the listener's perception of the music will lose its dynamism, making the music appear trivial.
The categorization of pitch by chords is not limited to melodies like the Blue Danube. The research by Dirk-Jan Povel shows that any tone sequence is defined in harmonic terms. While tracking the pitch of a melody, listeners are constantly engaged in guessing which chords the tones of the melody belong to, and at which point of the melody one implied chord gives way to another.
Both hypothetical chords are then related to each other and measured against the stock of known chord progressions. If the progression of chords observed in the music is unknown, the listener looks for an alternative categorization of a chord. Each chord is stored in memory, with the most recent chord change appearing most salient. As the music progresses, the listener comes up with projections of future chords, which can be confirmed or thwarted by the music - leading to the experience of different melodic expression.35
As we see, perception of rhythm, pitch and harmony is strongly idiomatic and is impossible without "top-down" processing of syntax. There is evidence that all aspects of musical expression operate on an idiomatic basis - coming from the error-filtering phenomenon, well familiar to any professional classical musician. Mistakes in performance that appear so striking to the performer largely pass unnoticed by the audience - something that all professionals eventually learn throughout their careers.
35 Jansen, Erik; Povel, Dirk-Jan (2004). Perception of arpeggiated chord progressions. Musicae Scientiae: The Journal of the European Society for the Cognitive Sciences of Music, 8(1), 7.
That is why more experienced performers take public performance more easily, knowing that they will get away with plenty of misses in the score.36
Research studies show that the percentage of unnoticed mistakes is amazingly high: even the most obvious mistakes - wrong pitches - go 38% unnoticed, and not by laypeople, but by graduate students majoring in piano.37 What is more, they overlooked more than a third of pitch misses in music they knew well, some of which they had even recently studied. The range of errors includes intrusions, omissions, untied notes and substitutions, and involves not only pitch and rhythm but texture, articulation, dynamics, etc. Such poor error detection (likely to be even lower for a lay audience) reflects the idiomatic nature of music comprehension: performers prioritize in their attention and "allow" mistakes in subsidiary places (i.e. in chords rather than in melodies, and in shorter rather than longer rhythms).
The idiom-based error-correction mechanism is most obvious in sight-reading, when a musician performs an unknown piece of music from sight, by reading the score. Examination of the errors made during sight-reading demonstrates that most of them are judged by the performers as contextually appropriate - and therefore hard to notice. Musicians tend to effectively recognize the low-level syntax and make "smart guesses," filtering errors away from the places where they would be most noticeable.
The second line of defense against communication failure is provided by the listeners. They guess the right way for the music to go in most situations where things went wrong for the performer. Repp (1997) found that the absolute majority of omission errors pass unnoticed by the audience. Only 4 out of 28 substitution errors were judged as inappropriate. Only 19% of intrusions were detected. What is even more amazing is that about 62% of the total errors were not detected by professionally trained musician-listeners.38
It looks like the more a listener knows about music, the more naturally he guesses the "right" text messed up by the performer. What we observe here at work is the same mechanism of error correction known in psycholinguistics as the "scrambled letters" effect.39 It has been experimentally confirmed that most words with typographic errors can be guessed right even if the letters are completely misplaced (as in the phrase "Raeding Wrods With Jubmled Lettres") - provided the first and last letters of a scrambled word are in the right place. Even words with the inverted order of letters are easily comprehensible. Substitution of letters presents more of a problem, but is still manageable as long as the substituted characters are similar in shape and the first letter is right.
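The matching rule behind the effect can be sketched in a few lines: a jumbled token maps onto a vocabulary word when the first and last letters agree and the inner letters form the same multiset. The five-word vocabulary is, of course, a toy assumption.

```python
# Sketch of "scrambled letters" correction: match a jumbled token to a
# vocabulary word with the same length, same first and last letters, and
# the same multiset of inner letters. The vocabulary is a toy assumption.

from collections import Counter

VOCABULARY = ["reading", "words", "with", "jumbled", "letters"]

def unscramble(token, vocabulary=VOCABULARY):
    """Return the vocabulary word the jumbled token most plausibly encodes."""
    t = token.lower()
    for word in vocabulary:
        if (len(word) == len(t) and word[0] == t[0] and word[-1] == t[-1]
                and Counter(word[1:-1]) == Counter(t[1:-1])):
            return word
    return None

print([unscramble(w) for w in "Raeding Wrods With Jubmled Lettres".split()])
# → ['reading', 'words', 'with', 'jumbled', 'letters']
```

The analogy to music is direct: the "first and last letters" play the role of the salient boundary events of a musical figure, while the inner material tolerates considerable reshuffling.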
36 Sloboda, John (1985). The Musical Mind: The Cognitive Psychology of Music. Clarendon Press, Oxford, p. 85.
37 Repp, B. H. (1997). The art of inaccuracy: Why pianists' errors are difficult to hear. Music Perception, 14, 161-184.
38 Ibid.
39 Rayner, Keith et al. (2006). Raeding Wrods With Jubmled Lettres: There Is a Cost. Psychological Science, 17(3), 192-193.
Evidently, the main criterion for ease of error correction is the integrity of a word. The correction mechanism must be directed "bottom-up." The data for correct recognition of a word comes primarily from the elementary units of the text. The brain matches this information to the vocabulary of known words and comes up with a guess. The hypothetical correction is then weighed against the earlier words stored in short-term memory, taking into account the commonality of the combinations of words. Such a model of correction has been successfully tested in studies that measured the speed of comprehension of "jumbled-word" texts.40
Similar mechanisms are engaged in processing the low-level syntactic structures in music. The "bottom-up" processing is correlated with the "top-down" processing. Both of them involve short- and long-term memory. The short-term memory caches the most immediately preceding patterns of music, whereas the long-term memory retrieves entries from the listener's lexicon of familiar musical structures. One more component necessary for successful error correction in music is semantic relevance: the emotional reaction to a parsed elementary unit of music should match the context of the auditioned excerpt.
Emotional Factor in Syntactic Categorization of Music
We have already seen the evidence of ultra-fast emotional response to a musical stimulus, within a blazingly fast 1 to 9 seconds. Let us return to our previous example of the binary meter and examine what exactly happens during such a response.
When we hear a succession of notes of equal duration, by default our brain invokes the binary metric pulse. It has been experimentally proven that spontaneously produced rhythmic patterns generate durations related primarily by 1:1 and 2:1 ratios.41 Our mind starts categorizing the tones we hear within a binary grid. Once our projected beats are found to coincide with the actual accents in the music, our mind focuses on the musical events that coincide with the binary pulse - amplifying those events and feeding back our experience of binary pulsation. At this point our brain generates "virtual" movement - an ongoing pattern of "left-right" steps superimposed on the flow of music. All the rhythmic patterns then become projected within this march-like space.
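The default categorization of durations into simple ratios can be sketched as a snap-to-template procedure. The template set {1:1, 2:1, 1:2} follows the ratios named above; the distance measure is an illustrative assumption, not a model taken from the cited studies.

```python
# Sketch: categorical perception of duration ratios. Each pair of successive
# durations is snapped to the nearest of the simple templates 1:1, 2:1, 1:2.
# The relative-distance criterion is an illustrative assumption.

def categorize_ratios(durations):
    """Snap each pair of successive durations to '1:1', '2:1', or '1:2'."""
    categories = []
    for a, b in zip(durations, durations[1:]):
        r = a / b
        # pick the template ratio with the smallest relative distance
        best = min((1.0, 2.0, 0.5), key=lambda t: abs(r / t - 1))
        categories.append({1.0: "1:1", 2.0: "2:1", 0.5: "1:2"}[best])
    return categories

# Slightly inexact performed durations (in seconds) are heard categorically:
print(categorize_ratios([0.52, 0.48, 0.95, 0.51]))  # → ['1:1', '1:2', '2:1']
```

The point of the sketch is that the imprecise surface values never reach awareness: only the categorical ratios do, which is what makes the binary grid so robust.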
The unity of metric induction, rhythmic categorization and motor representation has been elaborated by Neil Todd in his sensory-motor theory
40 Paciorek, Wiktor; Rączaszek-Leonardi, Joanna (2009). The influence of sentential context and frequency of occurrence on the recognition of words with scrambled letters. Psychology of Language and Communication, 13(2), 45-57.
41 Fraisse, P. (1982). Rhythm and tempo. In: Psychology of Music, D. Deutsch (Ed.). New York, NY: Academic Press, pp. 149-180.
of rhythm.42 According to it, the experience of rhythm is mediated by two complementary representations: a sensory representation of the musical movement, and a motor representation of the musculoskeletal system. This cross-modal connection allows a person to appropriate new forms of musical movement and/or convert them into new physical motions. The connection between motor representations and musical movement is supported by findings from neuroimaging research, which reveal that the same brain areas that are associated with vestibular processing are involved in rhythm perception.43
Laurel Trainor conducted a series of studies investigating the contribution of the vestibular system to rhythmic categorization. The last of her studies established that the auditory perception of metrical structure in musical rhythm can be influenced by artificial stimulation of the vestibular nerve - in the absence of any physical movement.44
The ramification of this is that listeners internalize musical movement in terms of locomotor impulses in their body parts - and in great detail. Toiviainen et al (2010)45 conducted an experiment in which a group of musicians were instructed to move to the music, and their movements were video-recorded and analyzed in relation to the metric grid of the music. A kinetic analysis of peaks in mechanical energy revealed that participants embodied the metric pulses on numerous levels, synchronizing the motion of different parts of their body to periods of one, two and four beats.
• The 1-beat pulse was associated with vertical hand and torso movements as well as mediolateral arm movements;
• the 2-beat pulse - with mediolateral arm movements and rotation of the upper torso;
• the 4-beat pulse - with lateral flexion of the torso and rotation of the upper torso.
Higher-order meter involved the central parts of the body, whereas the basic beat engaged the limbs. Altogether, the entire metric hierarchy was simultaneously represented in spontaneous yet systematic physical motions.
42 Todd, Neil P.; O'Boyle, Donald J.; Lee, Christopher S. (1999). A sensory-motor theory of rhythm, time perception and beat induction. Journal of New Music Research, 28(1), 5.
43 Grahn, J. A.; Brett, M. (2007). Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), 893-906.
44 Trainor, L. J.; Gao, X.; Lei, J.; Lehtovaara, K.; Harris, L. R. (2009). The primal role of the vestibular system in determining musical rhythm. Cortex, 45, 35-43.
45 Toiviainen, P.; Luck, G.; Thompson, M. (2010). Embodied meter: Hierarchical eigenmodes in music-induced movement. Music Perception, 28, 59-70.
Going back to our example of experiencing the surface rhythm of a binary meter: the motor representation of the rhythm evokes the emotional conditions characteristic of a marching style of motion. Marching is associated with communal feeling - the sense of togetherness in devotion to a common goal. It also implies a sense of discipline and direction: marching people move towards a specific target in a specific manner. That is why the surface level of music in binary meter receives the semantic attributes of forcefulness, togetherness, commitment and purposefulness.
It does not take long before a listener becomes submerged in this state. The majority of listeners can identify a march right from the first measure. There is no need to comprehend the entire hierarchy of temporal organization, including the hypermeter, to experience the movement of the music. The motor reflexes set off by music are instantaneous. A time delay might be necessary only in cases of ambiguity, where a few attempts at guessing would have to be made before committing to a particular metric clock.
And this is what we observe in the case where the beat in music follows not a binary but a ternary pulse. Fujioka et al (2010)46 used magnetoencephalography and spatial-filtering source analysis to identify a strong effect of the metric contrast between waltz and march on the exact timing of the activation of different parts of the brain. Thus, the right hippocampus was activated 80 ms after the march downbeat and 250 ms past the waltz downbeat. The basal ganglia showed a greater 80 ms peak for the march than for the waltz.
The higher reactivity to the march must be explained by the perceptual advantage of the binary meter of the march over the ternary meter of the waltz. Identifying the ternary meter takes longer because of the extra step of cancelling out the default binary interpretation. The larger latency and weaker response to the ternary meter correspond to the reputation of the ternary pulse amongst Western musicians as being more entertaining, smooth and easy-going than the binary pulse.
In Western music, the ternary meter has become embodied in the waltz pattern of motion - three steps of circling motion. Done continuously, waltzing produces a complete circle of steps on the dance floor. Accordingly, the ternary metric pulse is associated with carefree movement - going in a circle does not have a purpose. It is the pulse suited primarily for dancing, in contrast to the binary pulse, which serves primarily for walking. Notably, all ternary meters feature a casual, delightful style of movement: not purposeful, not directed, but rather pleasurable and relatively easy-going - unlike purpose-oriented walking.
46 Fujioka, T.; Zendel, B. R.; Ross, B. (2010). Endogenous neuromagnetic activity for mental hierarchy of timing. The Journal of Neuroscience, 30(9), 3458-3466.
This view of ternary versus binary meters, traditional amongst musicians, has gained scientific support. In 1985 a group of researchers discovered the "detour path effect":47 moving the finger from point A to point B in a straight line subjectively appears to take less time than it actually takes, whereas moving along an ellipse appears to take longer than it actually does. The greater the detour, the greater the illusion. Further research demonstrated that there is an actual time difference in the velocity of the hand tracing or drawing a straight line as opposed to an oval line. The delay in estimation of the hand's trajectory was responsible for the retardation effect. The amount of inflection in the direction of a line governed the impression of slowness: the greater the curvature, the slower the movement.48
When the circular motion is reproduced over and over - as happens in ternary metric pulsation - the lax effect magnifies. Every measure becomes "laid back" between the neighboring downbeat points, since the downbeats designate the leaning points for the metric pulse. As a result, the ternary beat movement obtains its lackadaisical flavor.
Anybody can conduct an easy experiment: turn the metronome on and tap to its clicking, stressing every other beat - then, after getting accustomed to the binary pulse, start stressing every third beat. Chances are you will feel that the new pulse is more relaxed than the previous one.
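The experiment above amounts to regrouping the same click stream under two different accent periods, which can be made explicit in a trivial sketch (the labels are arbitrary):

```python
# Sketch of the metronome experiment: the same stream of clicks is grouped
# by stressing every second or every third beat.

def accent_pattern(n_clicks, group):
    """Mark the accented clicks when every `group`-th beat is stressed."""
    return ["STRESS" if i % group == 0 else "tap" for i in range(n_clicks)]

print(accent_pattern(6, 2))  # binary pulse: STRESS tap STRESS tap ...
print(accent_pattern(6, 3))  # ternary pulse: STRESS tap tap STRESS ...
```

The clicks themselves never change; only the superimposed grouping does, and with it the felt character of the pulse.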
The tendency of the binary pulse to produce a busier impression has a physiological basis, too. The straight movement associated with the marching pattern of the beat implies directedness - motion towards a certain goal. And we know from experimental studies that aimed hand movements are planned vectorially: in terms of distance and direction, rather than in terms of absolute position in space.49
Vector-like processing of the binary meter might be responsible for its greater dynamism, associated with the instinct to reach the vector's target sooner - in contrast with the ternary meter, with its propensity to lag. The connection between goal orientation and distance estimation has been experimentally supported. In one study, the subjects were found to perceive the straight-line distance to a cylinder as
47 Lederman, S. J.; Klatzky, R. L.; Barber, P. O. (1985). Spatial and movement-based heuristics for encoding pattern information through touch. Journal of Experimental Psychology, 114(1), 33-49.
48 Faineteau, Henry; Gentaz, Edouard; Viviani, Paolo (2005). Factors affecting the size of the detour effect in the kinaesthetic perception of Euclidean distance. Experimental Brain Research, 163(4), 503-514.
49 Vindras, Philippe; Desmurget, Michel; Prablanc, Claude; Viviani, Paolo (1998). Pointing errors reflect biases in the perception of the initial hand position. Journal of Neurophysiology, 79(6), 3290-3294.
being longer when they intended to grasp the cylinder by reaching around a wide barrier. The same distance was perceived as shorter when the barrier was narrower.50
Now we can see why the punctured rhythm in the binary meter (which we discussed earlier, in relation to Schumann) can exert so much energy and produce the impression of rushing. The association of the binary pulse with goal-oriented marching movement could be responsible for the "wishful thinking" - the impression of shortening the distance one has to travel. The ternary pulse, free from a goal, is not capable of stimulating such "wishful thinking," and therefore commonly produces a less active impression. This throws new light on the contrast between the punctured binary and swing-like ternary divisions of a beat: the former rushes the movement, whereas the latter relaxes it.
The contrast between binary and ternary is manifested in two expressive aspects: rhythm and meter. In both of them, elementary idioms deliver their semantic messages to the listener the moment he becomes aware of them (which takes split seconds). In rhythm, the punctured figure bounces and bursts, whereas the swing figure rolls and rocks. In meter, the binary pulse hastens, whereas the ternary pulse comforts.
The emotional meaning conveyed by the syntactic relation between rhythm and meter is corroborated by the expressive timing chosen by a performer. Measurements show that in practice the nominal 2:1 swinging rhythms can actually vary anywhere from a 1.66:1 to a 5.6:1 ratio. The lower ratios correspond to natural, tender, solemn and sad expressions. The higher ratios suit happy and angry expressions.51 Durational contrast between longer and shorter tones can be understood as sharpening or softening of a rhythmic ratio in order to account for sharpening or softening of a corresponding emotional meaning.
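The mapping from a measured long:short ratio to an expressive reading can be sketched as follows. The outer bounds 1.66:1 and 5.6:1 come from the measurements cited above; the cutoff at the nominal 2:1 and the wording of the categories are illustrative assumptions.

```python
# Sketch: interpreting a performed swing ratio relative to the nominal 2:1.
# Bounds 1.66 and 5.6 follow the cited measurements; the cutoff of 2.0 and
# the category labels are illustrative assumptions.

def expressive_reading(long_dur, short_dur, cutoff=2.0):
    """Read a performed long:short ratio as softened or sharpened 2:1."""
    ratio = long_dur / short_dur
    if not 1.66 <= ratio <= 5.6:
        return "outside the reported expressive range"
    if ratio < cutoff:
        return "softened: natural, tender, solemn, or sad"
    return "sharpened: happy or angry"

print(expressive_reading(0.50, 0.28))  # ratio ≈ 1.79, below nominal 2:1
print(expressive_reading(0.66, 0.14))  # ratio ≈ 4.71, well above 2:1
```

The same categorical figure, in other words, carries a continuous expressive dial: the performer turns it by stretching or compressing the long note against the short one.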
This connection must have been established through the cultivation of dance music in folkloric culture: a dancer adjusts his steps in a particular dance according to his emotional state. Other people see this adjustment and entrain to both the music and the motion accompanying it. They remember the motion and reuse it every time they happen to dance (in traditional folklore, dancing is a very common activity). The motion becomes fixed by convention. Eventually, the gradations in that motion become fixed as well - they obtain their own "sharpening" or "softening" values along with related emotional associations. Thus, rocking motion is generally believed to be softer than staggering, and is associated with a softer, loving expression.
Emotion acts like a two-edged sword in music: it is conveyed by idioms and syntax, yet it affects the perception of syntax in a feedback loop. Musicians emote to the
50 Morgado, N.; Gentaz, E.; Guinet, E.; Osiurak, F.; Palluel-Germain, R. (2013). Within reach but not so reachable: obstacles matter in visual perception of distances. Psychonomic Bulletin & Review, 20(3), 462-467.
51 Madison, Guy (2000). Properties of Expressive Variability Patterns in Music Performances. Journal of New Music Research, 29(4), 335-357.
music they play, which causes them to exaggerate expressive timing, dynamics and even pitch. Varying pitches at the discretion of a performer within the same motif often happens in popular music. It is prohibited in classical music, with its reverence for the score; however, even there singers and string players have leeway in bending their intonation in so-called "portamento," not to speak of their right to exercise embellishments. Such expressive exaggerations, in turn, affect the way in which listeners perceive the musical idioms and the syntactic structures rendered by the musicians. As we have seen in numerous cases, listeners tend to hold a perceptual bias that usually matches the bias employed by performers.
The communication here works akin to the Dolby process: the performer encodes the emotional message via expressive exaggeration, and the listener decodes it by nullifying the exaggeration - by looking "through" it, as though it were not there. Altogether, the expressive distortion of rhythm, dynamics or pitch appears "normative" to the audience. It is only the absence of exaggeration that is registered by the audience as an abnormality.
This mechanism works strictly on an idiomatic basis. The central place in this scheme is occupied by the depository of idioms available to the listener. The real lexicon of a non-musician is a lot larger than what has been covered by modern-day musicology. The topoi and styles defined in the music literature constitute just a drop in the pool of what is actually out there in music practice. All the expressive devices listed in this book in conjunction with meter, rhythm, tempo and articulation are idioms that constitute entries in the Western lexicon of music. This lexicon is vast. The number of rhythmic figures alone must run into the hundreds, involving many different combinations and configurations of available time ratios, with all the variants produced by interaction with different meters at different tempi. A comparable multitude of entries represents the harmonic, melodic, textural and timbral aspects of music.
Without any awareness, mostly automatically, the listener manages this enormous database of what is common for his native type of music - in a way very similar to how he stores up to a hundred thousand words and their phrasal combinations in his native tongue. On this still syntactic level of auditory perception, speech and music are processed in a single domain. That is why the findings of psycholinguists are quite applicable to the field of music. There is no principal difference in operating perceptual tasks between music and language - not until the time comes to estimate the contextual appropriateness of a given expression. Then the semantic processing of music takes a different path, departing from speech.
Aniruddh Patel proposes a model of functionally shared brain networks52 in an attempt to reconcile the conflicting evidence from behavioral studies of patients with musical deficits - which points toward independence of musical
52 Patel, Aniruddh D. (2013). Sharing and nonsharing of brain resources for language and music. In: Language, Music, and the Brain: A Mysterious Relationship, edited by Michael A. Arbib. MIT Press, Cambridge, MA, pp. 329-356.
syntax from the syntax of speech - against the evidence from neuroimaging research, which proves their overlap. The main reason for musical syntax to have something in common with linguistic syntax, according to that theory, is the fact that both of them depend on the real-time interpretation of rapidly unfolding streams of information. In both cases, interpretation involves the application of abstract structural categories and rules. They, per se, are not bound to specific semantic denotations, yet exercise influence over the meaning. What unites music and speech syntax is their heavy reliance on the relevant functional computations - despite the neuropsychological dissociations between linguistic and musical abilities.
The principal difference between musical syntax and the syntax of speech is the greater span and complexity of its hierarchic organization, as well as the simultaneous engagement of multiple hierarchic levels in the definition of the very same sound event.
The syntax of music is far more complex than that of speech. Even on the most basic surface level, a listener has to pay attention concurrently to the changes in:
1. pitch,
2. rhythm,
3. harmony,
4. dynamics,
5. tempo,
6. metric organization,
7. texture,
8. articulation and
9. music form.
Each of these nine aspects of expression possesses its own proprietary set of idiomatic patterns. And each of these idioms is bound by syntactic rules that act across a number of expressive aspects.
We have already observed how the idiom of the "short-long" 3:1 punctured rhythm depended on:
• its relation to the beat (metric aspect),
• harmonic pulse (harmonic aspect),
• separation from the rhythm in the accompaniment (texture aspect),
• span of the pattern (articulation aspect) and
• repetitiveness of the pattern (aspect of music form).
So, the rules that govern the appropriateness of this idiom to a particular context involve conditions from 5 different aspects of organization. The fact that the other 4 aspects of organization are irrelevant to the idiom of punctured rhythm is just as characteristic in defining its syntax. Some other idiom, such as the "fanfare," is characterized by a completely different configuration of relevant and irrelevant aspects of expression.
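This view of an idiom as a configuration of relevant and irrelevant aspects can be sketched as a simple data structure. The entry for the punctured rhythm follows the five aspects listed above; the "fanfare" entry is purely hypothetical, included only to show a contrasting configuration.

```python
# Sketch: a musical idiom as a configuration over the nine expressive aspects.
# The "fanfare" configuration is a hypothetical illustration, not a claim.

ASPECTS = {"pitch", "rhythm", "harmony", "dynamics", "tempo",
           "meter", "texture", "articulation", "form"}

IDIOMS = {
    "punctured rhythm": {"meter", "harmony", "texture", "articulation", "form"},
    "fanfare": {"pitch", "rhythm", "dynamics"},  # hypothetical configuration
}

def irrelevant_aspects(idiom):
    """Aspects that play no part in defining the idiom's syntax."""
    return ASPECTS - IDIOMS[idiom]

print(sorted(irrelevant_aspects("punctured rhythm")))
# → ['dynamics', 'pitch', 'rhythm', 'tempo']
```

Two idioms with disjoint relevant-aspect sets can coexist in the same passage without syntactic interference, which is one way to read the "configuration" claim above.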
Not only is each musical idiom mapped across several aspects of expression - it is accessed simultaneously from a number of hierarchic levels of organization. In general, music perception seems to favor the "top-down" direction of syntax building in relation to parsing, and the "bottom-up" direction in relation to error correction. However, in every particular case the categorization of a musical event can potentially engage a low, middle or high hierarchic level.
In the same example of the punctured rhythm from Schumann's Arabeske, the "short-long-short-long" rhythmic figure is defined by:
• the elementary level of the metric aspect (beat);
• an advanced level of the harmonic aspect (a chord is the elementary level, a succession of two chords is the medium level, and harmonic pulse is the average rate of progression of multiple chords);
• the medium level of the texture aspect (continuity of a single melodic voice is the elementary level of texture, and integration of the tones that comprise the accompaniment into a single entity is the next level);
• the elementary level of the articulation aspect (the legato connection of 4 notes constitutes an elementary unit of articulation);
• the elementary level of music form (repetition of the pattern within the musical sentence, which is the simplest unit of music form).
Different layers of the syntactic hierarchy are accessed at once - without any building up. The harmonic pulse is not inferred from observation of chords, but "guessed" right away: the listener already knows what to expect from the sound of a harmonic progression in which chords keep changing on every beat. He remembers how this structure should sound. He also knows that such a harmonic pulse often supports the punctured rhythm. So, after hearing just 3-4 changes of chords, he "guesses" the rest - before the music actually demonstrates that the harmonic pulse indeed commits to the beat.
Lerdahl and Jackendoff are right in formulating the generative rules and stating that the construction of syntactic hierarchy takes place while listening. However, this process is not the main method of music comprehension. The emotional nature of music makes it far too exciting and intuitive to follow an accurate logical chain of derivation.
The greatest physiological advantage of emotion lies in its enormous speed of reaction. The same applies to musical emotion. It jumps like a flea - it simply cannot crawl like a beetle. The mind has an easier time activating an emotional reaction and then, if it turns out to be inappropriate, cancelling it in favor of some other emotion, than withholding any emotional reaction and sitting there, waiting until the entire syntactic hierarchy is constructed and tested.
Wrongly guessing an emotion is the most common situation in everyday life. How many times have we gotten scared and then immediately realized that there is
nothing dangerous out there, and laughed at ourselves? Musical emotions are just as volatile.
The musical mind thinks by guesses - this is the prime mode of making sense in music. The "generative" mode is only secondary, reserved for cases of ambiguity: whenever the music depends on complex emotional conditions or expresses conflicting emotional states, or when the listener is not very well versed in the musical language used by a composition.
Of course, the "generative" mode of listening can be cultivated, like everything else. However, not that many people are capable of sticking to this mode as the primary means of following music. Nicholas Cook describes these two modes, calling the most common mode "musical listening," because it focuses on the flow of music. The other one he qualifies as the "musicological listening" mode, where the focus goes toward establishing the presence of certain structures in music.53
Empirical evidence of ultra-fast emotional reactions to music by both musically untrained listeners and musicians testifies that even professional musicians rarely resort to the purely "musicological" mode of listening.54
The principal strategy for a listener in parsing the stream of music is to look for familiar “sound bites” of rhythm, melody and harmony in their most common metric modifications (e.g., dotted rhythm in binary versus ternary meter, or in simple versus compound meter). These “sound bites” can be elementary or complex, and often belong to higher-order hierarchic levels.
The cognitive mechanism of chunking allows anybody to grasp the peculiar clash of two conflicting rhythms and remember it as a single block of information. Then, every time a similar sound is encountered in music, the listener will recognize this polyrhythmic idiom without any generative operations. In fact, in one experimental study, half of the musically untrained participants were able to tap the alternations of binary and ternary divisions, remembering and appreciating the peculiar “against the beat” quality of this polyrhythmic combination.55
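The clash of binary against ternary division that listeners chunk into a single idiom can be made concrete on a common time grid: placing both voices on a grid of lcm(a, b) slots per cycle shows exactly where the strokes coincide and where they fall “against the beat.” The sketch below is only an illustration of that arithmetic (the function name and text rendering are my own, not part of any cited study).

```python
from math import lcm

def polyrhythm_grid(a: int, b: int) -> str:
    """Render one cycle of an a-against-b polyrhythm on a shared time grid.

    The cycle is divided into lcm(a, b) equal slots; voice A strikes every
    grid_len // a slots, voice B every grid_len // b slots. 'x' marks a
    stroke, '.' marks a silent slot.
    """
    grid_len = lcm(a, b)
    line_a = "".join("x" if i % (grid_len // a) == 0 else "." for i in range(grid_len))
    line_b = "".join("x" if i % (grid_len // b) == 0 else "." for i in range(grid_len))
    return line_a + "\n" + line_b

# The 3-against-2 hemiola: the voices coincide only on the downbeat.
print(polyrhythm_grid(3, 2))
# x.x.x.
# x..x..
```

The composite pattern of the whole cycle (here, six slots) is what the listener stores as one chunk, rather than tracking the two divisions separately.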
A recent experimental study by Koelsch (2013)56 investigated how musicians and non-musicians process higher-order syntax in music.
53 Cook, Nicholas (1990) – Music, Imagination, and Culture. Clarendon Press, Oxford, p. 152.
54 Bachorik, Justin Pierre et al. (2009) – Emotion in motion: Investigating the time-course of emotional judgments of musical stimuli. Music Perception: An Interdisciplinary Journal, 26(4), p. 355.
55 Vos, P.; Handel, S. (1987) – Playing triplets: facts and preferences. In: Action and Perception in Rhythm and Music, ed. A. Gabrielsson, Royal Swedish Academy of Music, Stockholm, pp. 35-47.
56 Koelsch, S.; Rohrmeier, M.; Torrecuso, R.; Jentschke, S. (2013) – Processing of hierarchical syntactic structure in music. Proceedings of the National Academy of Sciences of the United States of America, 110(38), pp. 15443-15448.
The researchers used two versions of a chorale by J. S. Bach: one intact, and one with the first sentence transposed so that the harmonic endings of the two sentences mismatched. As a result, the higher-order harmonic organization was broken (at the level of the musical period), while the lower level remained correct (at the sentence level).
Example 14. J. S. Bach’s chorale Liebster Jesu, wir sind hier (BWV 373) in its original and distorted forms, where the first sentence is transposed to create a harmonic clash with the second sentence (from Koelsch et al. 2013).
Previous studies have demonstrated that whenever listeners detect syntactic errors in music, their electroencephalographic (EEG) recording shows an early right anterior negativity (ERAN), known to reflect music-syntactic processing, together with a subsequent late negativity (the so-called N5), known to reflect harmonic integration. Listening to the distorted version of Bach’s chorale evoked the same pattern of response in Koelsch’s subjects, indicating that listeners can detect irregularities in higher-order syntax in the absence of irregularities in lower-order syntax.
This finding suggests that the listener’s mind can process syntactic hierarchy together with nested long-distance dependencies. Listeners can keep in active memory the representations of syntactic structures they encountered earlier in the music, and their mind allows them to selectively access these structures as needed. In fact, this capacity is likely much stronger for musical syntax than for verbal syntax. Music routinely repeats material over large spans (e.g., the recapitulation in sonata form), which can operate within time periods of 15-20 minutes – greatly exceeding the structural scope of even the longest verbal sentences.
In this light, it becomes crucial for the listener to be able to quickly move the focus of attention from one aspect of expression to another, to zoom in or out while examining a syntactic structure, shifting from one level of the syntactic hierarchy to another. Strictly following the “generative” framework would deprive the listener of precious time and rob him of much of his emotional experience.
The evidence from research on expressive timing supports the conclusion that music becomes emotionally meaningful right from the start of syntactic processing. The conviction that the conveyance of emotion is the prime function of music, so dominant in Western literature on music before the 20th century, finds confirmation in the most recent psycho-physiological studies.
Musical syntax seems to be organized around the central technical issue of music – the task of speeding up mental processing. The principal function of musical syntax is to reserve as much time and space as possible for the experience of music as “emotional theater.” The entire evolution of Western music was directed toward the establishment of “musica reservata” during the 1560s – the same point in history and geography where the fine art of the Renaissance reached its summit and set Western civilization apart from the rest of the world’s cultures, forging the Western identity.
“Suiting the power of music to the meaning of the words, expressing the power of each different emotion, making the things of the text so vivid that they seem to stand actually before our eyes,”57 – this description of the music of Orlando di Lasso, from a 1565 letter by the Dutch scholar Samuel Quickelberg, captures the essence of the great leap Western music took at that time, making a revolutionary impression on contemporaries. This was the moment when a spontaneously forming public market for music opened the doors for composers to compete with one another in the skill of creating “emotional theater.” Quickly maturing market conditions began rewarding more emotionally expressive authors over their less expressive colleagues, setting incentives for all musicians, including performers, to maximize emotional expression. This vector of historical development remained dominant until the advance of abstract music in the middle of the 20th century – and it is still prominent in the domains of popular music and film music.
57 Grout, Donald J. (1973) – A History of Western Music. W. W. Norton & Company, New York, p. 283.
The period between 1560 and 1948 broadly coincides with what is called the “common practice period” (c. 1600–1900) – the time when classical music strictly followed a set of compositional rules and enjoyed wide recognition. The absolute majority of musical idioms in use today were generated during these three centuries under the aegis of public interest in the “emotional theater” provided by music.
Musical syntax as we know it crystallized in millions of acts of emotional communication between composers, performers and listeners, providing feedback, polishing the conventions of use, and optimizing the physical appearance of musical idioms. The emotional factor has conditioned both the production of music and its perception.
Musical syntax should be viewed as the set of rules regulating the showcase of musical emotions. These rules take familiar emotional expressions and organize them into a chain of exciting events that collide and lead to unpredictable outcomes, always fresh, always believable – affecting the listener’s perception of syntax and commanding the performer to adjust to the listener’s affective state. The emotional factor turns musical syntax into an ouroboros – a snake eating its own tail. Studying musical syntax without considering emotional denotation is throwing out the baby with the bath water. Emotion is the pair of stereoscopic glasses that lets one feel the syntactic structures of music in flesh and blood. Without such glasses, music becomes a mundane and dull reality, attractive to no one but the most devoted musicologists.
REFERENCES:
Bachorik, Justin Pierre et al. (2009) – Emotion in motion: Investigating the time-course of emotional judgments of musical stimuli. Music Perception: An Interdisciplinary Journal, 26(4), p. 355.
Bella, Simone; Peretz, Isabelle; Aronoff, Neil (2003) – Time course of melody recognition: A gating paradigm study. Attention, Perception, and Psychophysics, 65(7), pp. 1019-1028.
Bengtsson, I., & Gabrielsson, A. (1983) – Analysis and synthesis of musical rhythm. In: Studies of Music Performance, J. Sundberg (Ed.). Publications issued by the Royal Swedish Academy of Music No. 39, Stockholm, pp. 27-60.
Bhatara, Anjali; Tirovolas, Anna K.; Duan, Lilu Marie (2011) – Perception of emotional expression in musical performance. Journal of Experimental Psychology: Human Perception and Performance, 37(3), pp. 921-934.
Bigand, E.; Vieillard, S.; Madurell, F.; Marozeau, J.; Dacquet, A. (2005) – Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition & Emotion, 19(8), pp. 1113-1139.
Cambouropoulos, Emilios (2010) – The musical surface: Challenging basic assumptions. Musicae Scientiae, Special Issue, pp. 131-148.
Chan, Sau Y. (2005) – Performance context as a molding force: Photographic documentation of Cantonese opera in Hong Kong. Visual Anthropology, 18(2/3), pp. 167-198.
Cook, Nicholas (1990) – Music, Imagination, and Culture. Clarendon Press, Oxford, p. 152.
Desain, P.; Honing, H. (2003) – The formation of rhythmic categories and metric priming. Perception, 32(3), pp. 341-365.
Desain, P.; Honing, H. (2001) – Modeling the effect of meter in rhythmic categorization: Preliminary results. Japanese Journal of Music Perception and Cognition, 7(2), pp. 145-156.
Fabian, Dorottya; Schubert, Emery (2010) – A new perspective on the performance of dotted rhythms. Early Music, 38(4), pp. 585-588.
Faineteau, Henry; Gentaz, Edouard; Viviani, Paolo (2005) – Factors affecting the size of the detour effect in the kinaesthetic perception of Euclidean distance. Experimental Brain Research, 163(4), pp. 503-514.
Fraisse, P. (1982) – Rhythm and tempo. In: Psychology of Music, D. Deutsch (Ed.). Academic Press, New York, pp. 149-180.
Fujii, Shinya et al. (2011) – Synchronization error of drum kit playing with a metronome at different tempi by professional drummers. Music Perception: An Interdisciplinary Journal, 28(5), p. 491.
Fujioka, T.; Zendel, B. R.; Ross, B. (2010) – Endogenous neuromagnetic activity for mental hierarchy of timing. The Journal of Neuroscience, 30(9), pp. 3458-3466.
Gabrielsson, A.; Bengtsson, I.; Gabrielsson, B. (1983) – Performances of musical rhythm in 3/4 and 6/8 meter. Scandinavian Journal of Psychology, 24, pp. 193-213.
Grahn, J. A.; Brett, M. (2007) – Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), pp. 893-906.
Grout, Donald J. (1973) – A History of Western Music. W. W. Norton & Company, New York, p. 283.
Hefling, Stephen E. (1993) – Rhythmic Alteration in Seventeenth- and Eighteenth-Century Music: Notes Inégales and Overdotting. Schirmer Books, New York, pp. 101-105.
Jackendoff, R. (1987) – Consciousness and the Computational Mind. The MIT Press, Cambridge, MA, pp. 218-219.
Jansen, Erik; Povel, Dirk-Jan (2004) – Perception of arpeggiated chord progressions. Musicae Scientiae: The Journal of the European Society for the Cognitive Sciences of Music, 8(1), p. 7.
Koelsch, S.; Rohrmeier, M.; Torrecuso, R.; Jentschke, S. (2013) – Processing of hierarchical syntactic structure in music. Proceedings of the National Academy of Sciences of the United States of America, 110(38), pp. 15443-15448.
Lederman, S. J.; Klatzky, R. L.; Barber, P. O. (1985) – Spatial and movement-based heuristics for encoding pattern information through touch. Journal of Experimental Psychology, 114(1), pp. 33-49.
Lerdahl, F., & Jackendoff, R. (1983) – A Generative Theory of Tonal Music. The MIT Press, Cambridge, MA.
Madison, Guy (2000) – Properties of expressive variability patterns in music performances. Journal of New Music Research, 29(4), pp. 335-357.
Morgado, N.; Gentaz, E.; Guinet, E.; Osiurak, F.; Palluel-Germain, R. (2013) – Within reach but not so reachable: obstacles matter in visual perception of distances. Psychonomic Bulletin & Review, 20(3), pp. 462-467.
Paciorek, Wiktor; Ralczaszek-Leonardi, Joanna (2009) – The influence of sentential context and frequency of occurrence on the recognition of words with scrambled letters. Psychology of Language and Communication, 13(2), pp. 45-57.
Parncutt, Richard (1989) – Harmony: A Psychoacoustical Approach. Springer-Verlag, Berlin, pp. 68-70.
Patel, Aniruddh D. (2013) – Sharing and nonsharing of brain resources for language and music. In: Language, Music, and the Brain: A Mysterious Relationship, ed. Michael A. Arbib, MIT Press, Cambridge, MA, pp. 329-356.
Povel, D. J., & Jansen, E. (2001) – Perceptual mechanisms in music processing. Music Perception, 19(2), pp. 169-198.
Povel, D. J., & Essens, P. (1985) – Perception of temporal patterns. Music Perception, 2(4), pp. 411-440.
Rao, Nancy Yunhwa (2007) – The tradition of luogu dianzi (percussion classics) and its signification in contemporary music. Contemporary Music Review, 26(5/6), pp. 511-527.
Rayner, Keith et al. (2006) – Raeding Wrods With Jubmled Lettres: There Is a Cost. Psychological Science, 17(3), pp. 192-193.
Reinecke, David M. (2009) – “When I Count to Four ...”: James Brown, Kraftwerk, and the practice of musical time keeping before techno. Popular Music & Society, 32(5), pp. 607-616.
Repp, B. H. (1990) – Patterns of expressive timing in performances of a Beethoven minuet by nineteen famous pianists. Journal of the Acoustical Society of America, 88, pp. 622-641.
Repp, B. H. (1997) – The art of inaccuracy: Why pianists' errors are difficult to hear. Music Perception, 14, pp. 161-184.
Repp, B. H.; Knoblich, G. (2004) – Perceiving action identity: how pianists recognize their own performances. Psychological Science, 15, pp. 604-609.
Repp, Bruno H. (2000) – The timing implications of musical structures. In: Musicology and Sister Disciplines: Past, Present, Future, ed. David Greer, Oxford University Press, New York, pp. 60-67.
Ryynänen, M. P., & Klapuri, A. P. (2008) – Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), pp. 72-86.
Scott, A. C. (1983) – The performance of classical theatre. In: Chinese Theater: From Its Origins to the Present Day, ed. Colin Mackerras, University of Hawaii Press, Honolulu, pp. 139-140.
Shui'er Han; Sundararajan, Janani; Bowling, Daniel Liu; Lake, Jessica; Purves, Dale (2011) – Co-variation of tonality in the music and speech of different cultures. PLoS ONE, 6(5), pp. 1-5.
Sloboda, John (1985) – The Musical Mind: The Cognitive Psychology of Music. Clarendon Press, Oxford, p. 85.
Slonimsky, Nicolas (1965) – Lexicon of Musical Invective: Critical Assaults on Composers Since Beethoven. Norton & Company, New York, p. 5.
Thrasher, Alan R. (1981) – The sociology of Chinese music: An introduction. Asian Music, 12(2), pp. 17-53.
Todd, Neil P.; O'Boyle, Donald J.; Lee, Christopher S. (1999) – A sensory-motor theory of rhythm, time perception and beat induction. Journal of New Music Research, 28(1), p. 5.
Toiviainen, P.; Luck, G.; Thompson, M. (2010) – Embodied meter: Hierarchical eigenmodes in music-induced movement. Music Perception, 28, pp. 59-70.
Trainor, L. J.; Gao, X.; Lei, J.; Lehtovaara, K.; Harris, L. R. (2009) – The primal role of the vestibular system in determining musical rhythm. Cortex, 45, pp. 35-43.
Vindras, Philippe; Desmurget, Michel; Prablanc, Claude; Viviani, Paolo (1998) – Pointing errors reflect biases in the perception of the initial hand position. Journal of Neurophysiology, 79(6), pp. 3290-3294.
Vos, P.; Handel, S. (1987) – Playing triplets: facts and preferences. In: Action and Perception in Rhythm and Music, ed. A. Gabrielsson, Royal Swedish Academy of Music, Stockholm, pp. 35-47.
Wichmann, Elizabeth (1991) – Listening to Theatre: The Aural Dimensions of Beijing Opera. University of Hawai‘i Press, Honolulu, p. 4.
Yang Mu (1994) – Academic ignorance or political taboo? Some issues in China's study of its folk song culture. Ethnomusicology, 38(2), pp. 303-320.