Syntax: Music that Makes Sense and Music that Doesn’t


The way in which music makes sense is essentially the same as the way we make sense out of verbal speech.

• We find a piece of music meaningful if our ear detects familiar patterns of pitch, rhythm and harmony.

• We hypothesize the meaning of a spoken sentence if our brain matches the words we know to the words we identify in what we hear.

In both cases the same mechanism is at work. We break the flow of sounds into meaningful segments and then initiate a chain of references, cross-references and guesses about the connections between various segments – finally confirming those guesses, or rejecting them and searching for better ones. All of this intricate analytical work is done semi-consciously, in “real time,” as we listen to speech or music.

Failure to recognize any familiar patterns in music results in a state of confusion and disinterest. Both appear shortly after our ear gets used to the “strangeness” of meaningless sounds. The moment the novelty of the sound stops surprising us, we become bored. The longer the streak of meaningless music runs, the more difficult it becomes for us to stay focused. We might still retrieve some information from such music via psycho-physiological markers such as tempo, beat, loudness and pitch register. Together, they can provide enough material for us to theorize about what the music could possibly mean. But the longer the music goes on without unveiling patterns familiar to us, the more abstract our hypothetical meaning becomes.

In the absence of melodic, rhythmic or harmonic idioms that would signal to us what emotional state is implied, our ear does not really connect to our mind. We hear some changes in pitch and duration, but they do not fire up our response. Emotional reaction requires familiar stimuli. Uncertainty tends to obstruct emotional response. We might guess at the musical emotion presumably carried by the music in question and try to evoke that emotion in ourselves. However, without ongoing reinforcement by familiar idioms, the empathetic mechanisms are not likely to kick in. Without them, our emotional experience will not be stable enough for our mind to receive substantial feedback and confirm that we are indeed in such and such an emotional state.

This is what I observed when, in 2001, at the Abraham Joshua Heschel Day School in Northridge, CA, eighteen 8th graders tried to answer the question of what emotion was expressed in the Overtones Aria from Kunqu opera (traditional Chinese music) after hearing it for the first time in their lives. Their answers ranged from “surprise” and “fun” to “fear” and “pain.” The students appeared confused while listening, looking at each other with puzzled expressions or giggling. They could not provide any detailed account of how exactly the emotional states followed each other – although the music clearly contained strong contrasting changes. Only one participant was able to specify three emotional changes within 5 minutes of music. However, not a single listener guessed the “right” emotional condition, one that would accord even remotely with the plot of the opera: in it, during that aria, a scholar overhears a nun improvising music on her qin (a plucked string instrument, like a zither); enchanted by the sounds, the scholar falls in love and is overcome with an irresistible desire to meet the musician.

However, the same students had no problem identifying emotions in another opera – Carmen by Bizet (the closing scene). Almost everybody correctly named each of the emotional states in succession within a 5-minute fragment of the music: Don José’s begging, Carmen’s indifference to him, his and Carmen’s anger, and the carefree happy people in the background of the action. None of the students knew French or had heard this music before; they could rely only on the music they heard during the audition to support their conclusions.

It appears that familiarity with the musical idioms used in Carmen, prompted by prior exposure to Western music, native to all of the students, was sufficient to let them correctly recognize the musical emotions and make sense of the music.

Chinese traditional opera, on the other hand, is characterized by strong stylistic artificiality: actors are not allowed to bend their knees while walking on stage, and the neck is supposed to swing from side to side on each step – mincing for women, rocking forward for men.1 Stylization rules Chinese theater. Everything, from gait to gestures, is meant to emphasize the divergence between the behaviors of daily life and their representation on stage. Theatrical art is believed to refine and aesthetically elevate mundane things by employing strong stylistic distortion.2

Heavily influenced by Confucian aesthetics, the music in such opera deliberately avoids the synesthetic connections that could potentially guide the listener (as in Western music, where love is associated with tender music, anger with rough). According to the tradition of Cantonese Opera, all vocals had to be sung in a forcefully delivered falsetto.3 The majority of Western listeners would interpret such vocal expression as angry – whereas the music, in fact, could be a love song.

Confucius looked at the free display of emotions in music with suspicion, condemning sensual forms of music and praising normative-official forms suitable for enforcing virtuous moral attitudes.4 His advocacy of ritual, for which he became famous, influenced generations of Chinese musicians to come. As a result, art music cultivated only those musical idioms whose acoustic attributes did not share a symbolic similarity with the emotions they denoted.

1 Scott, A.C. (1983) - The Performance of Classical Theatre. In: Chinese Theater: From Its Origins to the Present Day, ed. Colin Mackerras. University of Hawaii Press, Honolulu, pp. 139-140.

2 Wichmann, Elizabeth (1991) - Listening to Theatre: The Aural Dimensions of Beijing Opera. University of Hawai‘i Press, Honolulu, p. 4.

3 Chan, Sau Y. (2005) - Performance Context as a Molding Force: Photographic Documentation of Cantonese Opera in Hong Kong. Visual Anthropology, Vol. 18 Issue 2/3, pp. 167-198.

• In Western opera, the timbre of the snare drum usually refers to military music. The snare drum in a symphonic orchestra sounds very similar to the snare drum in a marching band. The dry, clean and orderly sound of the rhythmic patterns performed on this drum resembles the discipline showcased in performances of military music during parades or civic ceremonies.

• In Chinese traditional opera, there are no such direct connections between the elements of the music and real life. For example, a particular rhythm played by a given percussive instrument (such as a drum, a clapper, a cymbal or a gong)5 is used to refer to the place of the action, e.g. at the meadow or in the palace (most performances were housed in small “bamboo theaters” at teahouses or temple courtyards, without any stage design).6 Understanding references like these is impossible without an exquisite knowledge of all the musical idioms and the acting conventions related to them.

Not surprisingly, Western listeners have trouble making sense of this highly denotative opera music and have no choice but to resort to mere guessing.

It should be noted that those studies of the cross-cultural recognition of emotion which report that Western listeners successfully recognized emotions in Chinese tunes are based on samples drawn not from Chinese opera but from folk music.7

The attitude of Confucian-driven aesthetics towards folk music in China is undoubtedly hostile. Chinese scholarship has been strongly politicized since antiquity. A debasing outlook on “common people’s” music has characterized Chinese musicology through the centuries and is still prominent today. The years of Maoism did not change this bias, because the Maoist ideology still pressed towards putting authored “art music” at the service of the common people, rather than popularizing the anonymous music of the common people. That is why the body of folk songs published in China has never been adequately examined for its authenticity and the accuracy of its arrangement and notation. Information on Chinese music folklore coming from Chinese sources has often misled Western scholars into mistaken conclusions.8

4 Thrasher, Alan R. (1981) - The Sociology of Chinese Music: An Introduction. Asian Music, Vol. 12, No. 2, pp. 17-53.

5 Rao, Nancy Yunhwa (2007) - The Tradition of luogu dianzi (Percussion Classics) and Its Signification in Contemporary Music. Contemporary Music Review, Vol. 26 Issue 5/6, pp. 511-527.

6 Wichmann, Elizabeth (1991) - Listening to Theatre: The Aural Dimensions of Beijing Opera. University of Hawai‘i Press, Honolulu, p. 6.

7 Shui'er Han; Sundararajan, Janani; Bowling, Daniel Liu; Lake, Jessica; Purves, Dale (2011) - Co-Variation of Tonality in the Music and Speech of Different Cultures. PLoS ONE, Vol. 6 Issue 5, pp. 1-5.

Besides the issues of the reliability of the representation of “Chinese” national characteristics in published folk tunes, it should be remembered that their propensity to reflect emotions, common to all folk music in the world, stands in sharp contradiction to the values of the traditional art music of China. Just as emotional openness is viewed negatively by Confucian ethics, the spontaneous natural expression of emotion in folk music is likely to be interpreted as “crude,” “uncivilized” and “chaotic” in the eyes of a traditional art-music expert – serving as a model of what not to do in a high-art composition.

The strong isolation of art music in the Far East, with its complete dependence on conventions, has worked reciprocally in making it appear esoteric to the ear of uninitiated listeners. Equally esoteric is the impression of Western music carried away by a Far Eastern listener brought up on his traditional music, with no prior exposure to Western music (of course, today it is hard to find such a person). Nicolas Slonimsky presents an account of such an encounter by one Jijei Hashigushi, who happened to attend a cream-of-the-crop opera performance – the premiere of Puccini’s Madama Butterfly in New York in 1907. His bitter letter to the New York Daily grumbled: “I can say nothing for the music of Madama Butterfly. Western music is too complicated for a Japanese. Even Caruso’s celebrated singing does not appeal very much more than the barking of a dog in a faraway woods.”9

Evidently, an intuitive skill of breaking the flow of music into familiar meaningful units lies at the heart of music perception. Without it, no satisfactory aesthetic experience is possible for an individual. And the same applies to speech recognition. A quite similar effect can be observed in Chinese poetry. A famous poem, “Shī Shì shí shī shǐ” (Lion-Eating Poet in the Stone Den) by Yuen Ren Chao, sounds like 91 monotonous repetitions of the same syllable “shi” to non-Chinese speakers. The content of this poem, however, is far from monotonous – it presents the fantastic story of a poet who ate ten lions. A Western listener not versed in Mandarin cannot derive any idea of this meaning from the boring sound of the poem. But speakers of Mandarin can distinguish between the monosyllabic and disyllabic words, and join words into phrasal units – which enables them to appreciate the finesse and sophistication of this poem.

8 Yang Mu (1994) - Academic Ignorance or Political Taboo? Some Issues in China's Study of Its Folk Song Culture. Ethnomusicology, Vol. 38, No. 2, Music and Politics (Spring-Summer, 1994), pp. 303-320.

9 Slonimsky, Nicolas (1965) - Lexicon of Musical Invective: Critical Assaults on Composers Since Beethoven. Norton & Company, New York, p. 5.


Understanding of both music and speech starts at the same point – with the same strategies employed to spot syntactic structures in the stream of sounds. This common root supports the entire tree of sense-making via the auditory channel.

• We recognize the meaning of a sentence after our brain negotiates the meanings of all the words that constitute that sentence.

• We recognize the meaning of a musical phrase while and after our brain negotiates the meanings of all the musical idioms found in that phrase.

Both processes share the same initial stage: our mind recognizes elementary structures and relates them to each other – after which music and speech each take their own route. The entire semiotic chain in music comprehension, however, remains dependent on idioms and linguistic-like syntax – without them, no communication of information via music can reliably take place.

 

Parsing of Syntax in Music: Its Idiomatic Basis

The first step our brain has to take in order to reduce the wealth of data entering the ear to manageable portions of information is the operation of “parsing.” Parsing is a linguistic term for the instant analysis of a stream of sounds according to a set of known syntactic rules. Just like parsing in speech, parsing of music underlies the process of music comprehension.

Making sense of music is essentially a coordinated network of emotional responses to an ongoing chain of micro-events. The brain matches the perceived data to a known glossary of patterns, and guesses where the end point of one pattern and the beginning of another is likely to be. Syntax plays the key role here: without knowing the rules for grouping sounds together, or breaking them apart, no event will ever be registered by the brain – not to speak of an emotional reaction.
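A rough illustration of this matching process is sketched below in Python; the pattern glossary and the event stream are invented stand-ins for a listener’s real vocabulary of idioms, and the greedy longest-match rule is a simplification.

```python
# A minimal sketch of parsing-as-matching: scan a stream of note events,
# matching it against a glossary of known patterns (longest match first)
# and flagging stretches that no known pattern accounts for.

GLOSSARY = {
    ("short", "long"): "punctured couple",
    ("short", "long", "short", "long"): "punctured figure",
    ("long", "long", "long"): "even tread",
}

def parse(stream):
    """Segment `stream` into (pattern_name, segment) pairs."""
    segments, i = [], 0
    while i < len(stream):
        for size in sorted({len(k) for k in GLOSSARY}, reverse=True):
            window = tuple(stream[i:i + size])
            if window in GLOSSARY:
                segments.append((GLOSSARY[window], window))
                i += size
                break
        else:  # no known pattern starts here: an unparsed, "meaningless" event
            segments.append(("unrecognized", (stream[i],)))
            i += 1
    return segments

events = ["short", "long", "short", "long", "long", "long", "long", "short"]
for name, segment in parse(events):
    print(name, segment)
```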

There are five primary aspects of expression that are actively engaged in much of musical communication. Their idioms and their syntactic rules are the ones most frequently used in music comprehension (see the schematic sketch after the list below).

1. Pitch idioms (like the “fanfare” mentioned earlier) are mediated by the pitch syntax (such as pitch contour or intervallic proximity);

2. Rhythm idioms (like the “punctured rhythm” discussed below) are mediated by the rhythm syntax (such as rules of grouping);

3. Metric idioms (like the binary or ternary meters discussed below) are mediated by the metric syntax (such as ostinato, syncopation or alternation);

4. Harmonic idioms (like the “major triad”) are mediated by the harmonic syntax (such as tonic or dominant);


5. Texture idioms (like “melodic line,” “chords” or “figure of accompaniment”) are mediated by the textural syntax (such as polyphony, homophony or heterophony).

The other expressive aspects feature idioms but lack proprietary syntactic rules of their own, relying instead on the syntax of the five primary aspects:

6. Dynamic idioms (like “forte,” “piano” or “sforzando”) are not connected to each other by dynamic means – instead they depend on harmonic, metric, pitch and, to a lesser extent, rhythm syntax;

7. Tempo idioms (like “ritenuto” or “accelerando”) do not follow any tempo-based logic – their appropriateness is decided by pitch, rhythm and harmony;

8. Articulation idioms (like “staccato” or “legato”) are not limited by any articulation restrictions and can be easily combined or alternated – determined by all five primary aspects of expression, plus tempo;

9. Timbral idioms (such as “vibrato” or “con sordino”) highlight the pitch, rhythm, harmonic and, to a lesser extent, metric organization, plus tempo and articulation;

10. Idioms of musical form (such as “introduction,” “exposition,” or “development”) are integrated from all nine aspects and all syntaxes.
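The division between the primary and the dependent aspects can be pictured as a simple data structure. The sketch below merely restates the list above in Python; the idiom and syntax entries are abbreviated examples, not an exhaustive inventory.

```python
# Schematic restatement of the ten expressive aspects listed above:
# primary aspects own their syntactic means; dependent aspects borrow
# the syntax of the primary ones they lean on.

PRIMARY = {
    "pitch":   {"idioms": ["fanfare"],           "syntax": ["contour", "intervallic proximity"]},
    "rhythm":  {"idioms": ["punctured rhythm"],  "syntax": ["grouping rules"]},
    "meter":   {"idioms": ["binary", "ternary"], "syntax": ["ostinato", "syncopation", "alternation"]},
    "harmony": {"idioms": ["major triad"],       "syntax": ["tonic", "dominant"]},
    "texture": {"idioms": ["melodic line"],      "syntax": ["polyphony", "homophony", "heterophony"]},
}

DEPENDENT = {
    "dynamics":     ["harmony", "meter", "pitch", "rhythm"],
    "tempo":        ["pitch", "rhythm", "harmony"],
    "articulation": ["pitch", "rhythm", "meter", "harmony", "texture", "tempo"],
    "timbre":       ["pitch", "rhythm", "harmony", "meter", "tempo", "articulation"],
    "form":         list(PRIMARY) + ["dynamics", "tempo", "articulation", "timbre"],
}

for aspect, borrowed in DEPENDENT.items():
    print(f"{aspect} borrows syntax from: {', '.join(borrowed)}")
```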

Perception of an elementary musical unit in any of these aspects leads to the next stage of processing. Now that elementary unit has to be categorized according to the expertise of the listener. This is the point at which a single event is recognized as an idiomatic unit. The moment this happens, the audio event is placed in a hierarchical framework – it becomes mapped in relation to previous and coming events of the same kind, so that the mind can access this particular event at any point in time, as needed.

Having a clear destination point is extremely important for the delivery of meaning. When the mind negotiates the meaning of one identified idiom with the meaning of another, it often has to skip back and forth in the chain of micro-events to check and double-check for consistency in the logical connections. When the meaning of one link mismatches the others, the mind has to hypothesize another connection – which then needs another double-check, and perhaps a triple-check. In practice, our mind is constantly busy operating within a span of 5-6 adjacent idiomatic units. Occasionally it has to leap far beyond – on the order of hundreds of idioms – in order to access a previous section of the musical form. This is the case with the so-called “recapitulation” that brings back the same musical material that opened a piece.
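A toy model of this negotiation span might look as follows; the window of 6 units follows the estimate above, and the “recapitulation check” is a deliberately naive stand-in for the long leap just described.

```python
# Toy model: routine cross-checks happen inside a short sliding window of
# recent idiomatic units; a rare long leap compares the current material
# with the opening theme (recapitulation). All values are illustrative.

from collections import deque

def negotiate(units, theme_len=4):
    window = deque(maxlen=6)            # the span of 5-6 adjacent units
    theme = units[:theme_len]           # material that opened the piece
    for i, unit in enumerate(units):
        window.append(unit)
        # ...local checking and double-checking among `window` items...
        recent = units[max(0, i - theme_len + 1):i + 1]
        if i >= theme_len and recent == theme:
            print(f"unit {i}: long leap back -> recapitulation recognized")

negotiate(["a", "b", "c", "d", "x", "y", "x", "y", "a", "b", "c", "d"])
```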

Let us take the example of rhythm to see how a single idiomatic unit fits into the chain of rhythmic events, drawing a map of the hierarchical relations between all of these events.


The Arabeske op. 18 by Schumann opens with the rhythmic pattern “short-long-short-long,” with a proportion of 3:1 between the long and short tones.

Example 1. Schumann – Arabeske. Each rhythmic pattern is marked by a bracket under the score:

[score image not reproduced: brackets mark the successive rhythmic patterns]

The “short-long” rhythm will be recognized by a competent listener as a so-called “punctured rhythm.” It is often referred to as “dotted” rhythm, after its notation (a dotted long note followed by a short one); in the case of the Arabeske, this rhythm is inverted, with the short note coming first.

This dotted rhythmic figure is repeated once and then fragmented into a succession of four “short-long” couples of tones. Then the entire progression of 6 patterns is repeated once again. The sequence of all 12 patterns comprises the first sentence and serves as the theme of the Arabeske. The accompaniment to this theme runs in non-stop 16th notes, providing a well-defined metric grid for the punctured figures in the melody. Every fifth 16th note is marked by a change of harmony.

Example 2. The brackets here indicate the span of the same harmony:

[score image not reproduced: brackets indicate spans of unchanged harmony]

The regularity of this harmonic pulse stresses the beat. Indeed, listening to the music confirms that the beat follows the harmonic changes: it feels comfortable to tap along to every fifth 16th note – such tapping seems to agree with the character of the music and to stimulate the musical movement. Its liveliness supports Schumann’s indication of the character in the score, “lightly and tenderly.”

What is evident from this framework is that the “long” note of a rhythmic pattern is shorter than the beat. Consequently, the unevenness of the ongoing rhythm (long-short-long-short…) ignites the beat, charging it with energy to bounce up and down – exerting force like a compressed spring. The moment the listener realizes this relation to the beat, he confirms his recognition of the “punctured rhythm” and starts looking for a wider perspective.

He notices that every other beat is heavier. This suggests a binary organization of the meter. Indeed, the music moves in a walking pattern: “left-right-left-right.” The excited state projected by the bouncy rhythm, plus the zigzagging melodic contour and busy harmonic pulse, all suggest that we are dealing here with simple binary meter. Such meter makes the punctured rhythm sound more active than any other meter would. In a simple binary pulse every beat steps in the opposite direction – “upbeat-downbeat-upbeat-downbeat.” There are no neutral regular beats. All beats in a metric group go either “up” or “down.” Therefore, each bounce of the punctured figure flips the metric direction. This makes the movement as hyperactive as regular meter can possibly allow.

What we observe in this example is the construction of metric space from the recognition and mapping of rhythmic patterns – cross-relating them to harmony and pitch, while taking texture into consideration. This process starts with identifying the first rhythmic pattern and ends with the definition of the metric movement (open to future adjustments as the piece progresses).

At which point does this construction become meaningful? We ascribe a character to it the moment we realize that we are dealing with the punctured figure within a simple binary pulse – that is, around the time we hear the 7th rhythmic pattern. By that time the rhythm has been categorized and placed into a hierarchical structure.

What is important here is that the very process of categorization is highly idiomatic. If the mind is not aware that the pattern “short-long-short-long” constitutes the so-called “punctured” rhythm, and that its meaning is an excess of energy – a bouncy, bubbly character – then no grid-fitting is possible. Each punctured figure is responsible for the production of a single impulse. Without the grid, the succession of punctured figures in the music will not accumulate energy and will fail to charge the musical movement.

Moreover, without knowledge of all the related idioms, the rhythmic pattern can be categorized in a wrong way. A proportion of 3:1 can be mistaken for 2:1, which constitutes a completely different rhythmic idiom. A good example of such similarity is the Song Without Words in G Minor, op. 19 No. 6, by Mendelssohn.
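A crude way to tell the two idioms apart from measured durations is to ask which nominal ratio the measured one lies closer to; the sketch below assumes clean, unexaggerated values (real performances blur them, as discussed later):

```python
# Classify a "short-long" couple as punctured (3:1) or swing-like (2:1)
# by the nominal ratio closest to the measured long/short ratio.
# The input durations are hypothetical.

def classify_couple(short_ms, long_ms):
    ratio = long_ms / short_ms
    return "punctured (3:1)" if abs(ratio - 3) < abs(ratio - 2) else "swing-like (2:1)"

print(classify_couple(125, 375))  # -> punctured (3:1), as in the Arabeske
print(classify_couple(250, 500))  # -> swing-like (2:1), as in the Song Without Words
```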

Example 3. Mendelssohn – Song Without Words op. 19 No. 6. Each rhythmic pattern is marked by a bracket under the score:

[score image not reproduced: brackets mark the successive rhythmic patterns]

The melody of this piece closely follows the rhythmic scheme of the Arabeske: the “short-long-short-long” pattern is repeated, and then fragmented once. After that, a new rhythmic pattern terminates the sentence.

Schumann: [notation not reproduced]   Mendelssohn: [notation not reproduced]

The biggest difference between the rhythms of the two melodies is their metric division: in Schumann’s case the ratio is 3:1, whereas in Mendelssohn’s it is 2:1. The other parameters are very similar.


As in the Arabeske, the accompaniment in the Song also marks continuous motion – this time in 8th notes. Every fourth 8th note is marked with a stress. The stresses, however, are not equal: every other stress is stronger, providing the framework of a “right-left-right-left” pulsation. It is very similar to the Arabeske, except that there each “right” and “left” couple was set against 4 notes of the accompaniment, whereas here there are 3 notes.

The ternary division makes a big difference. The greater the number of divided parts, the weaker the metric impulse. Each of the 3 notes in Mendelssohn’s accompaniment sounds heavier than each of the 4 notes in Schumann’s Arabeske, producing a different grid. Here, each of the notes in the accompaniment constitutes a beat.

As a rule of thumb, the beat unit is the medium-short duration observable throughout the score: collect all the note durations and identify the 3 shortest ones; the middle one of these is likely to serve as the main beat unit. This rule holds true in most cases, except where there are not enough distinct durations to build a hierarchy. In this particular excerpt we find the shortest note at the end of the phrase, in the 4th rhythmic pattern.
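Taken literally, the rule of thumb can be written out as a short heuristic; the duration inventory in the example is hypothetical:

```python
# The rule of thumb above, literally: gather the note durations used in
# the score, take the three shortest distinct values, and pick the middle
# one as the likely beat unit. Durations are fractions of a whole note.

def likely_beat_unit(durations):
    shortest_three = sorted(set(durations))[:3]
    if len(shortest_three) < 3:
        raise ValueError("not enough distinct durations to build a hierarchy")
    return shortest_three[1]  # the middle of the three shortest

# Hypothetical inventory: 16ths, 8ths, quarters and halves.
print(likely_beat_unit([1/16, 1/8, 1/8, 1/4, 1/4, 1/2]))  # -> 0.125 (the 8th note)
```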

Example 4. The harmonic pulse:

[score image not reproduced: brackets indicate spans of unchanged harmony]

 

The heavier beat with smoother rhythm corresponds with the composer’s tempo marking, which calls for a sustained walking pace. The indication “singingly” for the melody emphasizes a gentler character compared to Schumann. Finally, the subtitle “Venetian Boat Song” directs the performer and the listener towards thinking of a boatman who keeps propelling his boat with short energetic pushes of his long oar – and of his passenger enjoying the ride.

The harmonic pulse is more complex here than in the Arabeske, but overall it marks every other group of three 8th notes in the accompaniment, suggesting the complex ternary meter – a pulse of six, subdivided into two groups of 3 + 3. In that case the “short-long-short-long” pattern of the melody maps differently: the “long” turns out to be longer than the beat – in contrast with the Arabeske. This relation drastically changes the rhythmic expression.

The “short-long-short-long” pattern at 2:1 constitutes the swing-like rhythm that is associated with the expression of relative relaxation and ease. The 2:1 ratio produces a rounded rather than jagged rhythm (as in the case of the 3:1 ratio), and therefore suggests a smooth directed movement – unlike the bouncy gait of the 3:1 rhythm.

Such a contrast between the expressions of the 2:1 and 3:1 rhythms is clearly illustrated in the song Trost an Elisa, D. 97, by Schubert. This song starts in common time (4/4) as a recitative, with numerous brief punctured figures (“long-short” patterns) which sharpen the vocal line.

Example 5. Schubert – Trost an Elisa, beginning:

The lyrics tell of Elisa, who could not stop weeping over the death of her beloved one, despite the many years that have passed. The moment the lyrics mention the “spirit” of her beloved, the 3:1 rhythm disappears and gives way to the 2:1 rhythm (long-short, within the 12/8 meter).

Example 6. Bars 15-17:

It noticeably soothes the movement, providing a gentle swaying, probably to account for the reference to the “wandering” of the soul between the Earth and the Heavens. On the words “loving companion,” the common time and punctured rhythms return, suggesting the idea of suffering. What makes the transition even more dramatic is that the composer switches between the two meters right in the middle of the bar. On the word “longing,” the 2:1 ratio kicks in – in the piano part alone, leaving the vocals in a different meter, with the 3:1 ratio, thereby creating a polymeter.

Example 7. Bars 18-20:

This moment corresponds to the exclamation “for he is eternally yours!” and implies duality, since Elisa remains in the earthly world, whereas her beloved one belongs to the Heavens. The next line speaks of Elisa’s suffering, while the piano and the vocals join in common time and the 3:1 punctured rhythm. Nevertheless, the ending resumes the 2:1 rhythm – with the promise of reunion and immortality.

Evidently, the 2:1 rhythm is associated throughout the song with the idea of comfort, whereas the 3:1 rhythm accompanies the idea of suffering.

What we see is that there are two rhythmic idioms with diametrically opposite expressions: the hyperactive punctured rhythm (3:1) and the sleek swing-like rhythm (2:1).

It is highly unlikely that any listener not already familiar with these idioms will be capable of telling that the Arabeske is affectionate and excited, while the Song Without Words is pleasurably laid back, whereas Trost an Elisa is torn between suffering and reconciliation. An incompetent listener might be able to infer some emotional characteristics from the psycho-physiological markers in the music that are obvious to anybody (e.g. tempo and dynamics), but it is doubtful that this emotional stimulus would be sufficient to drive the emotional contagion and trigger the “real-life” experience of the warmth and affection of the Arabeske, the comfort and sensuality of the Song Without Words, and the torment of Schubert’s song.

 

Idiomatic Basis of the Performance Exaggerations

Yet another syntactic problem is that not every piece of music provides a texture clear enough to indicate the 2:1 versus the 3:1 metric grid. The examples from Schumann and Mendelssohn both featured very clear regular division in the accompaniment. It was easy to hear how many notes of the accompaniment fit within one “long” note of the melody – two or three.

When such shorter notes are absent from the texture, the task of detecting the rhythmic ratio becomes much more challenging. It cannot be accomplished by physically timing the exact duration of every note in question. Musical time is very different from physical time!


In music practice, as a rule, performers manipulate the exact timing of rhythmic figures, making them slightly longer or shorter depending on the context of the melody and harmony. As a result, a unit of musical time, such as a beat or a bar, becomes shorter or longer in units of absolute time, such as milliseconds or seconds.

Bengtsson & Gabrielsson (1983)10 provide a pictorial representation of how the rhythm prescribed by the score differs from real-life performance in the Sonata in A Major, K. 331, by Mozart. The vertical axis shows deviations in timing: positive values stand for extra time added to the value specified by the score (slowing the notes down), while negative values stand for subtracted time (speeding the notes up). The two graphs correspond to performances by two different players.
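In the spirit of that representation, the deviation for each note is simply the performed duration minus the notated one; the numbers below are invented for illustration and are not Bengtsson & Gabrielsson’s data:

```python
# Per-note timing deviation: positive values mean the note was held longer
# than the score prescribes, negative values mean it was shortened.

def timing_deviations(notated_ms, performed_ms):
    return [perf - score for score, perf in zip(notated_ms, performed_ms)]

notated   = [250, 250, 500, 250, 250, 500]   # score values at the chosen tempo
performed = [262, 241, 530, 255, 238, 512]   # hypothetical measured values

for i, dev in enumerate(timing_deviations(notated, performed), start=1):
    print(f"note {i}: {dev:+d} ms")
```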

Example 8. Mozart – Sonata in A Major, K. 331 (from Bengtsson & Gabrielsson, 1983)

It is obvious that very few notes (1 of 34 in the 1st performance, and 6 of 34 in the 2nd) receive the time value prescribed by the score. The majority of the notes are constantly “distorted”: either trimmed or stretched. The degree of distortion and the choice of which notes to slow down and which to speed up seem to characterize the individual performance style of each player. More recent studies demonstrate that the expressive timing signature serves as a marker of a master performer’s individual style. A master musician recognizes his own performance months and even years after sight-reading an unfamiliar piece of music, even after all the dynamic variation has been artificially removed from the recordings – suggesting that the manner of tweaking the timing is the sole marker of a performer’s individuality.11

10 Bengtsson, I., & Gabrielsson, A. (1983) - Analysis and Synthesis of Musical Rhythm. In: Studies of Music Performance, J. Sundberg (Ed.). Publications issued by the Royal Swedish Academy of Music No. 39, Stockholm, pp. 27-60.

This consistent textual “inaccuracy” on the part of the performers constitutes the soul of music. Multiple studies demonstrate that the imprecision of expressive timing is directly linked to emotionality. The more precise the timing in music, with little deviation from the metronomically correct pulse, the greater the impression of a lack of “human feel.”

Thus, in a series of experiments, Bhatara et al. (2011)12 created MIDI recordings of expressive performances of four nocturnes by Chopin and modified them electronically. Some of the recordings were stripped of their temporal and dynamic fluctuations, and others were altered to include slowing down or speeding up – applied, however, in places different from the ones chosen by the master pianist for expressive timing. A panel of listeners rated the original recordings as well as their modifications. All the accurate-timing versions were rated as “less human” and “less emotionally communicative.” The versions in which the time intervals were altered in random places were found the least emotional.

Such results strongly suggest that the music of Chopin contains a set of patterns which require the shortening or lengthening of certain tones in order to emphasize this or that musical emotion. The less exaggeration, the less obvious the pattern – and, subsequently, the weaker the musical emotion and the weaker the emotional contagion in the audience. This correlation can be taken as indirect proof that musical idioms are indeed present in the music: listeners intuitively recognize familiar idioms when these idioms are marked out by means of expressive timing. The audience then has an easy time identifying the denoted musical emotions and empathizing with them. Exaggerating the durations in random places makes it harder to recognize the idioms – and therefore reduces the emotional response.

Another argument in support of the idiomatic nature of expressive timing is the fact that performers cannot get rid of rhythmic exaggerations even if they want to. In a number of experimental studies, musicians were asked to play without any expression. Nonetheless, their performances still showed small variations in timing.

11 Repp, B.H.; Knoblich, G. (2004) - Perceiving Action Identity: How Pianists Recognize Their Own Performances. Psychological Science, 15, pp. 604-609.

12 Bhatara, Anjali; Tirovolas, Anna K.; Duan, Lilu Marie (2011) - Perception of Emotional Expression in Musical Performance. Journal of Experimental Psychology: Human Perception and Performance, Vol. 37 No. 3, pp. 921-934.


One such study13 examined the exact mapping of the timing “inaccuracies” of 6 pianists playing an excerpt from a Chopin Etude. Each pianist played it twenty times on a digital piano: the first ten times with normal expression and the second ten times with “metronomic” accuracy. In each case, the “metronomic” versions were found to contain timing fluctuations in the same exact locations where the exaggerated timing in the expressive performances (by the same pianist) took place.

One cannot avoid the impression that once a pianist learns an idiom, his mind becomes wired to bind certain tones in that idiom to a particular amount of shortening or lengthening. Once an idiom is “understood” in its expression, it becomes impossible for the mind to wipe this “understanding” off – hence the corresponding expressive timing is bound to stay, no matter what.

Most live performances of music, in all sorts of styles, contain more or less pronounced variations of tempo, even in applications where one might expect a very stable tempo. For example, dance music is designed to help dancers reproduce the same steps of a dance over and over – seemingly a very regular task. However, every Viennese waltz, when performed by expert musicians, features marked accelerations and retardations in almost every measure, to the extent that a notated bar may at times be performed at twice the speed of another bar.14

This flexibility of musical time appears to be present in any naturally evolved form of music – except where electronic music has been conceived, arranged and performed on computerized equipment. Such music, however, is rather new, pioneered by the German band Kraftwerk.

Starting in 1978, Kraftwerk stopped using human operators, relying solely on drum machines and sequencers in all live performances. The band members often left the stage during their concerts, letting the machines take over. Notably, the entire genre of techno music that adopted this precise “absolute timing” style is notorious for its anti-emotional, robotic character, in which all human aspects of performance are either downplayed or outright ignored.15

Even in music where a dedicated performer, the drummer, is responsible for “keeping time” in the strictest possible way, timing is never really “strict” in physical terms. In a recent experimental study, fifteen professional drummers were asked to synchronize a basic drumming pattern with a metronome as precisely as possible at speeds of 60, 120, and 200 beats per minute (bpm). At the slower tempo the right hand (playing the hi-hat cymbal) was found to have a 2 ms synchronization error (SE), whereas the left hand (on the snare drum) and the right foot (on the bass drum) were ahead of the metronome by about 10 ms. At the highest speed, the synchronization error rates of the two hands reversed. Overall, the variation of SE stayed around 2% for the 60 and 120 bpm tempos, and 4% for the 200 bpm tempo.16

13 Repp, Bruno H. (2000) - The Timing Implications of Musical Structures. In: Musicology and Sister Disciplines: Past, Present, Future. David Greer (Ed.), Oxford University Press, New York, pp. 60-67.

14 Bengtsson, I., & Gabrielsson, A. (1983) - Analysis and Synthesis of Musical Rhythm. In: Studies of Music Performance, J. Sundberg (Ed.). Publications issued by the Royal Swedish Academy of Music No. 39, Stockholm, pp. 27-60.

15 Reinecke, David M. (2009) - “When I Count to Four ...”: James Brown, Kraftwerk, and the Practice of Musical Time Keeping before Techno. Popular Music & Society, Vol. 32 Issue 5, pp. 607-616.
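To put those percentages into absolute terms, they can be converted into milliseconds of the inter-beat interval (assuming, as a simplification, that the reported variation is taken relative to that interval):

```python
# Convert the reported SE variation (~2% at 60 and 120 bpm, ~4% at 200 bpm)
# into milliseconds of the corresponding inter-beat interval.

for bpm, se_pct in [(60, 2), (120, 2), (200, 4)]:
    beat_ms = 60_000 / bpm                 # inter-beat interval in ms
    se_ms = beat_ms * se_pct / 100
    print(f"{bpm:3d} bpm: beat = {beat_ms:6.1f} ms, ~{se_pct}% SE = {se_ms:4.1f} ms")
```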

The reality of music is such that every rhythmic idiom is entitled to a specific amount of exaggeration. This is where the recognition of idioms and their syntactic relations becomes imperative. If the musician fails in his parsing, he will miss the expressive timing and therefore fail to deliver an adequate emotional message. The consequences can range from a dry impression on the audience to outright confusion (in cases of complex rhythm or meter). If the listener fails to account for expressive timing, he will infer a misrepresented rhythmic figure, resulting in a mistaken metric grid – all of which leads to severe distortion and misunderstanding.

Each of the rhythmic idioms we discussed earlier, the 3:1 puncture and the 2:1 swing, possesses its own expressive timing profile, closely related to the emotional properties it is supposed to convey.

Musicians know that the punctured rhythm sharpens the movement. They therefore emphasize the puncture by shortening the short note and lengthening the long note – what musicians call “overdotting.” The term refers to the dot in musical notation (on the right side of the note head) that indicates a 50% elongation of that note.

To “overdot” means to use a greater than 50% elongation, violating the amount prescribed by the notation – as opposed to the equal notes of the “straight” rhythm. The amount of overdotting can vary greatly between different performers, almost to double size (depending on the context of articulation and tempo). However, the average elongation of the long note is usually around 0.79 of the beat instead of the 0.75 specified by notation (3:4 = 0.75).17
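A quick calculation shows what that average shift does to the nominal ratio:

```python
# Within one beat, the notated dotted long note takes 0.75 of the beat
# (the 3:1 ratio). Stretching it to the average 0.79 squeezes the short
# note and sharpens the ratio well beyond 3:1.

for long_share in (0.75, 0.79):
    short_share = 1 - long_share
    print(f"long = {long_share:.2f} of the beat -> ratio {long_share / short_share:.2f}:1")
# 0.75 -> 3.00:1 (as notated); 0.79 -> ~3.76:1 (as typically performed)
```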

The rule of thumb seems to be that joyous, solemn, bold or fiery musical emotions call for stronger overdotting, while the emotional conditions of pleasing, flattering, or sleepiness benefit from moderate overdotting.18

The 2:1 rhythm, on the other hand, is characterized by the opposite style of expressive timing. Whenever the 2:1 “long-short” pattern of rhythm is sustained throughout the melody in a ternary meter, performers tend to skew the ratio well below the nominal 2:1.19 The performing strategy here appears to aim at rounding the rhythm by bringing it closer to the binary pulse (shortening the 2:1 long note towards the 1:1 ratio).

16 Fujii, Shinya et al. (2011) - Synchronization Error of Drum Kit Playing with a Metronome at Different Tempi by Professional Drummers. Music Perception: An Interdisciplinary Journal, 28(5), p. 491.

17 Fabian, Dorottya; Schubert, Emery (2010) - A New Perspective on the Performance of Dotted Rhythms. Early Music, Vol. 38 Issue 4, pp. 585-588.

18 Hefling, Stephen E. (1993) - Rhythmic Alteration in Seventeenth- and Eighteenth-Century Music: Notes Inégales and Overdotting. Schirmer Books, New York, pp. 101-105.

Evidently, each of the two rhythmic idioms, the punctured and the swing-like rhythm, has been understood by music practitioners as opposite in its expression – which is how these opposite strategies of expressive timing arose. Each handles the durational contrast in its own way: the punctured style increases the contrast, while the swinging style reduces it. Not only do performers exaggerate these rhythms in the same manner – listeners expect them to be exaggerated in this particular manner. Listeners often perceive the absence of overdotting as a fault or, on the contrary, are unaware of overdotting because the rhythm appears to them “normal.”20

The wide spread of this convention across different genres and styles of Western music, among performers as well as listeners, testifies that knowledge of the punctured and swinging rhythms precedes their perception in music. Music practitioners have to know these idioms and the syntactic conditions applicable to them before they can identify the exaggerated versions of these idioms.

• Increasing the contrast between the notes of the punctured rhythm highlights the energizing character of this rhythm (active style).

• Conversely, the smooth character of the 2:1 rhythm invites the opposite expressive timing style – reducing the length of the long note and extending the short note, so that their contrast is less pronounced (passive style).

The commonality of both rhythms attests to the fact that listeners and performers alike are constantly engaged in the process of spotting familiar idioms and identifying their syntactic conditions. In real-life situations, the rhythmic ratio is not the only factor affecting expressive timing. Harmony, melody and texture each contribute to the shortening or lengthening of a particular tone in music.

• Performers are expected to mark an unexpected harmony or the most tense dissonant chord with considerable extra time.

• It is typical for a tone in the melody that makes a wide leap up or down to receive a slight elongation.

• Whenever a new voice enters the texture, its first note is usually moderately sustained over time.

19 Gabrielsson, A.; Bengtsson, I.; Gabrielsson, B. (1983) - Performances of Musical Rhythm in 3/4 and 6/8 Meter. Scandinavian Journal of Psychology, 24, pp. 193-213.

20 Fabian, Dorottya; Schubert, Emery (2010) - A New Perspective on the Performance of Dotted Rhythms. Early Music, Vol. 38 Issue 4, pp. 585-588.


Consequently, any music fragment can incorporate substantial deviations from strict “metronomic” timing, which can make it hard for listeners to grasp which idiomatic rhythm is implied by this or that rhythmic pattern they hear.

Repp (1990) demonstrates how vast the discrepancy in expressive timing can be. The recordings of 19 famous pianists performing Beethoven’s minuet from the Piano Sonata in E-flat Major, op. 31 No. 3, were analyzed for the exact duration of all the notes. The “short” note in the upbeat punctured figure (long-short-long) was found to differ on its every appearance, within a range from 199% of the notated rhythmic value down to 60%. The expressive timing pattern varied from bar to bar according to the musical demands as envisioned by each performer at his own discretion. Not a single performance featured a constant pulse. The precise timing of quarter, eighth and sixteenth notes varied substantially depending on their musical function. Thus, sixteenth notes following dotted eighth notes were generally prolonged in the Minuet, where they were part of an upbeat, but generally shortened in the Trio, where they fell on the downbeat.21

The scale and frequency of expressive exaggerations make it doubtful that music follows the rules of syntax in the same way speech does, where the standard direction of processing is believed to proceed from lower-level syntax to higher-order structures. Similar convictions have been held in the field of the semiotics of music.

But the havoc of expressive exaggerations poses a problem. How can the listener decide whether the auditioned music contains an overdotted 2:1 rhythm or an “underdotted” 3:1? Or what if the durational contrast is the result of the expressive timing of melodic leaps or dissonant chords – and the rhythmic ratio is, in fact, straight (1:1)? Is it possible at all to conceive an elementary unit under such conditions?

We face a chicken-or-egg dilemma here: is music parsed by detecting the basic properties of the sounds in the music flow and joining certain sounds together? Or is music parsed by matching familiar higher-order structures to the flow of music?
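The dilemma can be made concrete with numbers. In the sketch below (the candidate ratios and the tolerance are illustrative assumptions), a single measured ratio remains ambiguous between two idioms, so the bottom-up signal alone cannot decide:

```python
# A measured durational ratio alone cannot settle which idiom is meant:
# the same value may be an overdotted 2:1, an underdotted 3:1, or even a
# straight 1:1 stretched by melodic or harmonic expressive timing.

def candidate_idioms(measured_ratio, tolerance=0.8):
    nominal = {"straight (1:1)": 1.0, "swing-like (2:1)": 2.0, "punctured (3:1)": 3.0}
    return [name for name, r in nominal.items()
            if abs(measured_ratio - r) <= tolerance]

print(candidate_idioms(2.5))  # -> ['swing-like (2:1)', 'punctured (3:1)'] : ambiguous
```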

 

Syntactic Order: from Top to Bottom, or from Bottom to Top?

The generative theory of Lerdahl and Jackendoff (1983)22 is by far the most widely accepted syntactic theory in the field of music. One of its postulates is that the levels of syntactic hierarchy are consecutively derived from the “surface level” of the musical tones. In relation to rhythm, the generative order goes like this (a schematic code sketch follows the list):

21 Repp, B.H. (1990) - Patterns of Expressive Timing in Performances of a Beethoven Minuet by Nineteen Famous Pianists. Journal of the Acoustical Society of America, 88, pp. 622-641.

22 Lerdahl, F., & Jackendoff, R. (1983) - A Generative Theory of Tonal Music. The MIT Press, Cambridge, MA.


1. The listener hears the music;
2. detects the beat;
3. breaks the beats into groups (usually of 2 or 3);
4. finds the downbeat (the strongest regularly stressed beat);
5. defines the meter (usually 2, 3, 6, 9, or 12 beats);
6. recognizes measures and hypermeasures (couples of measures).
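Reduced to a processing skeleton, this bottom-up order looks like the pipeline below; the sketch is schematic (every stage is a stub) and does not implement Lerdahl and Jackendoff’s actual rules:

```python
# The bottom-up order posited by the generative theory, as a pipeline of
# stages. Only the direction (surface level -> form) is the point here.

def detect_beats(audio): ...            # 2. find the pulse
def group_beats(beats): ...             # 3. groups of 2 or 3
def find_downbeats(groups): ...         # 4. strongest regular stresses
def define_meter(downbeats): ...        # 5. 2, 3, 6, 9 or 12 beats
def build_measures(meter, beats): ...   # 6. measures and hypermeasures

def comprehend(audio):                  # 1. the listener hears the music
    beats = detect_beats(audio)
    groups = group_beats(beats)
    downbeats = find_downbeats(groups)
    meter = define_meter(downbeats)
    return build_measures(meter, beats)
```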

Example 9. Generative theory schematics of a chorale by J. S. Bach, produced with the GTTM software by Rob Seward:

According to the generative theory, comprehension of music follows this succession of steps from the elementary level up to the most advanced level of musical form. The mind of the listener unconsciously executes all of this mental work and arrives at a final understanding of the music. Lerdahl and Jackendoff believe that the set of generative rules must have an innate origin, and they insist that the generative order guides the steps of music comprehension and transcends the historical and geographical varieties of music.

The theory has been thoroughly tested over the past thirty years, and its generative rules have indeed been confirmed in numerous experimental studies. It has also proved very effective in computer science, providing a reliable framework for the automated analysis of music – including considerable success in the recognition of expressive timing. However, the foundation of generative theory lies in the cognitive sciences, with little connection to music theory or musical practice. The emotional component of music, so obvious to most music practitioners, completely eludes its scope. Moreover, generative theory leaves no place for emotional meaning in its presentation of musical syntax.

One downside of the generative theory is that it creates the illusion that syntax can be effectively studied in isolation from semantics. Lerdahl and Jackendoff hold that the listener assembles the hierarchic structure all the way up from the most superficial "surface" level of audition to the level of musical form, and that only at the very top of this pyramid does the listener somehow come up with the "musical meaning."


Lerdahl and Jackendoff view such meaning "as a combination of well-formedness rules and preference rules"23 – that is, purely in structuralist terms, disregarding the idiomatic connection to emotions.

The most obvious objection to their semantic model comes from experimental research on the timing of emotional response to music.

Bigand et al. (2005)24 tested the minimal time it takes for a group of listeners to adequately detect a musical emotion. Musically trained and untrained listeners were required to listen to 27 unfamiliar musical excerpts of different styles and to group together those excerpts that conveyed a similar emotional meaning. In the first stage, all excerpts were 25 seconds long. Both musicians and non-musicians did very well and produced an equal number of groups, highly correlated with the emotional expression of the music. The number of groups and their contents were very consistent across participants and included complex emotions. In the second stage, the excerpts were trimmed to a bare 1 second. Even such drastic shortening had only a weak effect on emotional responses: although some of the musical emotions were tagged incorrectly, the perceived emotions were overall remarkably similar to those experienced with the longer excerpts.

It is highly unlikely that the listeners could have constructed the syntactic hierarchy according to Lerdahl and Jackendoff's generative rules within 1 second. It is even harder to explain how all the participants in Bigand's study could have arrived at so close a match, as the high correspondence between their groupings shows. How could they estimate the emotional valence of different excerpts so consistently in such a short span of time? Much more likely, the speed and coherence of judgment had to do with the listeners' intuitive expertise in musical idioms, which they detected within that 1 second of music.

Such speed of detection characterizes the recognition of familiar melodies, as revealed in another experimental study.25 Musicians and non-musicians were presented with segments of increasing duration of familiar and unfamiliar melodies and asked to sing the continuation of the melody. The results showed that 3 to 6 notes (i.e., about 2 seconds) of a familiar melody were sufficient to evoke a feeling-of-knowing judgment. Two additional notes allowed the participants to gain full confidence and carry on the tune. Identifying unfamiliar melodies took longer: 8-10 notes.

23 Ibid., p. 312.

24 Bigand, E., Vieillard, S., Madurell, F., Marozeau, J., & Dacquet, A. (2005) - Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition & Emotion, 19(8), p. 1113-1139.

25 Dalla Bella, S., Peretz, I., & Aronoff, N. (2003) - Time course of melody recognition: A gating paradigm study. Perception & Psychophysics, 65(7), p. 1019-1028.


The 2-3 second range for the detection of familiarity agrees with the span of time necessary for the integration of felt emotion in listeners. A group of 81 subjects recruited from the Boston metropolitan area was asked to indicate the emotional valence (positive-negative) of their felt response to 138 musical excerpts drawn from 11 genres of music. It was established that listeners require from 8.31 to 11.00 seconds of listening before they can formulate emotional judgments regarding the musical stimuli they experience.26

The time frame of emotional reaction to music thus appears to take well under 25 seconds and to contain an emotional "guess" during the first second of listening, followed by another second during which the listeners find out whether the music they hear contains any familiar structures. Over the next 2-3 seconds they seem to finalize their decision as to which musical emotion the music contains. An extra 2-3 seconds allows the listeners to become aware of their "felt" emotions and move on to fine-tuning their emotional experience.

Estimating the minimal hierarchy of generative grammar – the level of hypermeasures – requires at least 4 measures of music. At a moderate tempo of MM=60, in common time, that amounts to 16 seconds of music. It appears, then, that by the time a listener infers the syntactic hierarchy, he has long since recognized the musical emotion and become aware of the emotional state the music invokes in him. Such a situation can hardly be qualified as "making sense" of music through inference of its "generative grammar" according to Lerdahl and Jackendoff.

Yet another argument why "bottom-to-top" syntax cannot secure sense-making in the perception of music is that, in practice, no hierarchy building is possible without a preconceived idea of what the hierarchic structure is likely to be. A person has to be familiar with at least a few types of hierarchy in order to be able to construct one.

Furthermore, the choice of material appropriate for hierarchical ranking depends on prior knowledge of the given hierarchy. A person has to know what to look for in music before he can define one layer in relation to another. Otherwise Western listeners would have no trouble making sense of traditional Chinese opera. Evidently, they are incapable of constructing an effective hierarchic scheme while listening to Chinese music, because they are not familiar with any prototype of the hierarchy employed by Chinese musical syntax.

The adherents of generative theory do not acknowledge this dependence. They believe that at the surface layer music can be categorized and understood directly, without any prior knowledge. Jackendoff (1987) emphasizes that the "musical surface" is not a structure, but rather a "set of

26 Bachorik, J. P., et al. (2009) - Emotion in motion: Investigating the time-course of emotional judgments of musical stimuli. Music Perception: An Interdisciplinary Journal, 26(4), p. 355.


relationships between elements that are present in levels of representation"27 – comparable to the set of colors for vision or the set of phonemes for speech. The brain encodes the stream of music in terms of discrete pitch events of specific duration; according to Jackendoff, this constitutes the "musical surface" – the "lowest level of representation that has musical significance."

However, the notion of the "musical surface" as the lowest level of syntax itself turns out to depend on "higher-level" percepts. The surface is not perceptible unless the higher-order structures are acknowledged by the mind. The elementary unit of music, a musical tone, cannot be perceived unless the mind knows its anatomy:

• Duration of a tone cannot exist without awareness of the beat;

• Pitch cannot exist without awareness of the tuning system;

• Timbre cannot exist without categorization of texture (the decision whether we hear a single tone of complex timbre or two tones of simpler timbre played in unison).

And beat, tuning system and typology of textures are all higher-order percepts – one can become aware of them only as a result of comprehension of the entire hierarchic scheme for a given piece of music.

• Beat is the prevailing average periodic duration observed in a given piece of music (it can be inferred only after the mind registers a number of regular pulses and correlates their ratios).

• Tuning system is the conglomerate of pitches that sets the standard for all the intervals possible within a given music system (it can be inferred only after all the pitches available for music-making are estimated in their octave equivalence).

• Texture is the overall complex of melodic, rhythmic and harmonic materials integrated by a particular function (such as accompaniment, imitation or contrast).

Beat, tuning system and texture are all hierarchies – they should not be mistaken for the rules of generative grammar. There are a few dozen types of beat, categorized in a variety of ways depending on their duration, regularity and articulation style. There are three tuning systems used in Western music practice (Pythagorean, meantone and equal temperament), each comprising a particular hierarchy of intervals. There are about a dozen texture types common in Western music, and each of these types unites a number of voices related to each other in a particular way. Beat, tuning system and texture are complexities that obey the rules of rhythmic, metric, melodic and harmonic syntax.

27 Jackendoff, R. (1987) - Consciousness and the computational mind. The MIT Press, Cambridge, MA, p. 218-219.


So even the perception of a single tone depends on knowledge of higher-order syntax. In order to conceive the "surface level" of music in its entirety, one has to keep the syntactic hierarchy for pitch, rhythm, meter and texture constantly in mind. Without awareness of the organization of the beat (temporal aspect), harmonic segmentation (pitch aspect) and texture categorization (timbral aspect), the musical surface simply cannot exist as an entity.

Cambouropoulos (2010)28 provides experimental support for this objection. He recorded a few sequences of tones of slightly different durations (e.g., the first sequence consisted of notes lasting 15, 17, 15, 13, 11, 12 and 13 sixty-fourth notes, respectively). These sequences were presented to 46 undergraduate music students, who were asked whether they perceived a metrical structure in the sound of each sequence (yes/no). Additionally, they were asked to notate each sequence in standard music notation.

Example 10. Unevenly timed tones are categorized in a variety of ways by different listeners (from Cambouropoulos 2010):

The majority of participants found the first sequence to be ametric (free-floating). The other sequences, which were not as isochronous as the first one, were notated by different participants in a number of different meters, with considerable discrepancies between the rhythmic durations assigned to the individual notes of a sequence. Evidently, even musically trained participants could not reach agreement in their recognition of the "musical surface," and in the first case were simply incapable of categorizing the "musical surface" at all.

Cambouropoulos also points out that research in the automated transcription of music demonstrates that surface-level musical analysis depends on higher-order categories. The first attempts at building software applications for the transcription of

28 Cambouropoulos, E. (2010) - The musical surface: Challenging basic assumptions. Musicae Scientiae, Special Issue, p. 131-148.


recordings of polyphonic music relied entirely on bottom-to-top algorithms and proved ineffective. Recent research has shown that a purely "bottom-up" approach cannot achieve satisfactory recognition of audio: higher-level music processing (such as the recognition of chords and voices) is necessary to enable even basic multi-pitch and onset extraction. The inclusion of such higher-order algorithms as multi-pitch analysis, beat tracking, instrument recognition, harmonic analysis and chord transcription, as well as music structure analysis,29 allowed for significant improvement in transcription accuracy – in excess of 60% in some cases of polyphonic music.

The listener's mind is no different from a computer when it comes to the recognition of music – the same process of "transcription" takes place. In order for the listener to parse musical tones into groups, he has to estimate the rhythmic value of each of the tones. But identifying the rhythm is impossible without the beat. And the beat is a holistic concept: it has to be inferred from the totality of all the tones, based on the prevailing average relations between them. Similar integrative operations are required for the comprehension of the elementary units of the other aspects of musical expression.
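The circularity is easy to make explicit in code. In this minimal sketch (an illustration, not a model from the literature), the rhythmic value of each tone is defined only relative to a beat hypothesis, and the very same intervals read as entirely different rhythms under different beats:

GRID = (0.25, 0.5, 1, 2)   # allowed rhythmic values, in beats

def quantize_rhythm(iois_ms, beat_ms, grid=GRID):
    """Snap each inter-onset interval to the nearest value on a beat grid."""
    return [min(grid, key=lambda v: abs(ioi / beat_ms - v)) for ioi in iois_ms]

iois = [240, 260, 510, 490, 1000]           # a slightly sloppy performance
print(quantize_rhythm(iois, beat_ms=500))   # -> [0.5, 0.5, 1, 1, 2]
print(quantize_rhythm(iois, beat_ms=250))   # -> [1, 1, 2, 2, 2]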

In order to know what in the sound scene ought to be regarded as a motif, every listener has to decide which tones should be grouped consecutively and which simultaneously – that is, to distinguish between chords and melodic lines. Then the listener has to determine which timbres have priority, i.e., figure out which instruments (or voices) perform the melody, and focus on them while paying less attention to the other instruments.

Throughout the act of listening, one has to keep skipping back and forth between low-level elements and high-level concepts. Are the heard tones, in fact, passages of short notes in a slow tempo, or are they long notes in a very fast tempo? Is it the rhythm that features a progressive shortening of notes, or is it the tempo that accelerates? The listener has to answer questions like these all the way through tracing the "surface level" of music. But no answers are possible without looking into the higher-order metric and melodic structures.

Several researchers have provided experimental evidence for the "top-down" processing of musical syntax. Some of them have even gone as far as to conclude that rhythm only exists under the conditions of such processing. A dedicated term has been coined for this precondition of rhythm: "metrical representation." The model for it has been summarized by Povel and Essens (1985).30

Time, and therefore temporal intervals, can only be assessed by means of a clock. Meter plays the role of such a clock for music, providing an accurate internal representation of rhythm. The stresses detected in music invoke a

29 Ryynänen, M. P., & Klapuri, A. P. (2008) - Automatic transcription of melody, bass line, and chords in polyphonic music. Computer Music Journal, 32(3), p. 72-86.

30 Povel, D.-J., & Essens, P. (1985) - Perception of temporal patterns. Music Perception, 2(4), p. 411-440.


metric pulse in the mind of the listener. If its beats coincide with the actual accents heard in the music, the perception of their dynamic strength is amplified. Estimations of the exact timing of the rhythm are then made with greater precision – providing more information and supporting a richer emotional experience. If the clock is not well chosen, the rhythmic pattern is poorly reproduced and judged ambiguous, putting the emotional response in question.
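A rough sketch in the spirit of the Povel-Essens internal clock can make this concrete (the penalty weights below are illustrative, not the published values): each candidate clock is penalized for ticks that fall on silence or on an unaccented event, and the clock with the least negative evidence wins.

def clock_penalty(pattern, period, phase, w_silence=4, w_unaccented=1):
    """pattern: one symbol per time unit; 'A' accent, 'x' unaccented, '.' rest."""
    penalty = 0
    for t in range(phase, len(pattern), period):
        if pattern[t] == '.':
            penalty += w_silence      # a tick in silence is worst
        elif pattern[t] == 'x':
            penalty += w_unaccented   # a tick on a weak event is mildly bad
    return penalty

def best_clock(pattern, periods=(2, 3, 4)):
    """Return the (penalty, period, phase) of the best-fitting clock."""
    candidates = [(clock_penalty(pattern, p, ph), p, ph)
                  for p in periods for ph in range(p)]
    return min(candidates)

print(best_clock("Ax.xAx.xAx.x"))     # -> (0, 4, 0): a 4-unit clock fits best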

The effect of metrical representation is so strong that it can drastically change the "tiling" of the rhythm space. In a series of tests conducted on a group of professionally trained musicians, a simple rhythmic pattern of 3 sounds with the musical ratio 1:2:1 (performed as 210 ms, 474 ms, 316 ms) was correctly identified as 1:2:1 when presented to the subjects without a drum stressing a metric pulse. Adding a soundtrack with a binary metric pulse performed on a woodblock did not change the recognition of the ratio: only 10% of respondents mistook it for 1:3:2. However, when the very same rhythm was played against a ternary metric pulse, it was interpreted mostly as 1:3:2 – with not one respondent identifying it as 1:2:1. The rhythmic categorization so prevalent under duple meter completely disappeared under triple meter.31
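The flip between the two readings can be reproduced with a one-line quantizer (a sketch, not Desain and Honing's actual model; it assumes the pattern fills exactly one measure): snap each interval to the grid implied by the induced pulse, and the same three durations categorize as 1:2:1 under a duple grid but as 1:3:2 under a triple one.

def categorize(iois_ms, subdivisions):
    """Snap intervals to integer multiples of the measure's grid unit."""
    unit = sum(iois_ms) / subdivisions   # assumes one full measure of material
    return [max(1, round(ioi / unit)) for ioi in iois_ms]

iois = [210, 474, 316]        # the pattern from the study above (1000 ms total)
print(categorize(iois, 4))    # duple grid, 250 ms unit   -> [1, 2, 1]
print(categorize(iois, 6))    # triple grid, ~167 ms unit -> [1, 3, 2]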

Awareness of meter, which is a higher-order concept, clearly affects the discrimination of rhythm, which is a low-level percept. The dynamics of their interaction is such that meter induction and rhythm categorization appear to run in parallel, as two correlated processes, supporting or negating each other throughout the flow of music – thereby negotiating musical meaning.32

We recognize surface-level rhythm in terms of a pre-established cognitive framework of time structuring. Rhythmic categorization and beat induction appear to function as two overlapping modular processes: the perceived audio material induces the beat, which in turn influences the categorization of new incoming material. This remains a self-adjusting, ongoing process for as long as the music keeps sounding.

Similar modularity can be observed in relation to pitch. Thus, a chord in music is perceived as a single perceptual unit rather than as a combination of tones. Our ear analyzes a complex spectrum, breaking it into partials and reintegrating them into a single percept related to the single perceived root tone of that chord.

Richard Parncutt conducted comprehensive research on the perception of chords and came to the conclusion that chords are processed in essentially the same

31 Desain, P., & Honing, H. (2001) - Modeling the effect of meter in rhythmic categorization: Preliminary results. Japanese Journal of Music Perception and Cognition, 7(2), p. 145-156.

32 Desain, P., & Honing, H. (2003) - The formation of rhythmic categories and metric priming. Perception, 32(3), p. 341-365.


way as individual pitches.33 The individual pitch and the individual chord are thus found to belong to different hierarchic layers, yet the mind of the listener has to be aware of both of them simultaneously. He must decide, for each pitch he hears, whether it is part of a chord or a separate entity. Therefore, categorization at the lowest level of pitch depends on the higher-order percepts of chords.

Example 11. Chordal versus non-chordal tones. Debussy - La puerta del Vino from Preludes, vol. 2:

   

In Debussy's prelude La puerta del Vino, the correct perception of harmony depends on prior knowledge of the chords typical of Western music. Otherwise the combination of the tones D-flat, F, B and E can be mistaken for a chord, whereas the tone E is in fact a non-harmonic tone used to interfere with the underlying chordal structure – to create tonal tension and generate melodic motion.

Taking this tone for an organic part of a chord would distort the expression of the music: the returns of E in the melody would become mere repetitions of the same chord, without any harmonic contrast. The music would then appear monotonous and relaxed – which clearly contradicts the author's direction "very expressive," as well as the program of the music suggested by its title. The brusque harmony feeds the growing tension of the piece, whose seductive, mysterious atmosphere thickens and eventually erupts in a violent explosion.

Yet another typical confusion that can arise from the wrong categorization of chords is taking a polyphonic combination of melodic

33 Parncutt, R. (1989) - Harmony: A psychoacoustical approach. Springer-Verlag, Berlin, p. 68-70.


voices for a chord. A colorful harmonic sound can originate from the juxtaposition of melodic changes in different voices: when numerous voices all move at the same time, this can create the illusion of a chord.

Example 12. Chord versus linear sonance. Mussorgsky - Ballet of the Unhatched Chicks from Pictures at an Exhibition:

In this extravagant trio from Mussorgsky's Ballet of the Unhatched Chicks, the texture is layered into 5 voices – a very demanding task for the pianist to carry out. Then the challenge passes to the listener: he has to identify the presumable "chords" – the sounds that fall on the downbeat – and realize that they neither form conventional chordal progressions nor emphasize the traditional functions of tonic, dominant or subdominant.

The pseudo-chords are mere sonances – accidental combinations of tones that occur as a result of melodic motion in different voices within the artificial mode F-G#-A-B-C-C#-D-E. This heavily sliced texture in a strange mode is meant to illustrate the phantasmagoric situation of chicks trying to dance in the most delicate and sophisticated way while still unable to get out of their shells.

Not only does the categorization of vertical harmony depend on the vocabulary of chords and chordal progressions – purely horizontal melodic progressions are categorized in exactly the same way: by fitting melodic motifs into the framework of known chordal structures and chordal progressions.

The detection of pitches is determined by knowledge of the most common chords and the rules of their connection in music – which constitutes the set of harmonic idioms. "Chord recognition is the result of a successful memory search in which a tone series is recognized as a pattern stored in long-term memory."34

The listener recognizes that a certain progression of pitches in a melody utilizes the tones of a familiar chord (as in the melody of the Blue Danube waltz by Johann Strauss Jr.).

Example 13. Johann Strauss Jr. - the Blue Danube waltz. The brackets show the changes in vertical harmony (the chords of harmonization).

34 Povel, D.-J., & Jansen, E. (2001) - Perceptual mechanisms in music processing. Music Perception, 19(2), p. 169-198.



This seemingly "elementary" melody requires great expertise in chords. Without it, it would be very confusing to distinguish between the G in bar 2 of the example above and the G in bar 6 – or between the A in bar 10 and the A in bar 14. The obvious similarity of the melodic phrases to the C major triad (bars 1-7) and to the half-diminished seventh chord (bars 8-15) can only confuse a listener unfamiliar with the typical progressions of tonic and dominant harmony. The clash between the vertical and horizontal harmonies in this famous melody will then never come to the surface, and the listener's perception of the music will lose its dynamism, making the music appear trivial.

The categorization of pitch by chords is not limited to melodies like the Blue Danube. Research by Dirk-Jan Povel shows that any tone sequence is defined in harmonic terms. While tracking the pitch of a melody, listeners are constantly engaged in guessing which chords the tones of the melody belong to, and at which point of the melody one implied chord is replaced by another.

Both hypothetical chords are then related to each other and measured against the stock of known chord progressions. If the progression of chords observed in the music is unknown, the listener looks for an alternative categorization of the chord. Each chord is stored in memory, with the most recent chord change appearing to be the most salient. As the music progresses, the listener comes up with projections of future chords, which the music can confirm or deceive – leading to the experience of different melodic expression.35
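A minimal sketch of such harmonic categorization (an illustration, not Povel's model; the tiny chord vocabulary is invented) matches the tones of a melodic segment against chord templates and takes the best-covering chord as the implied harmony:

CHORDS = {                 # pitch classes: C=0, C#=1, ..., B=11
    "C":  {0, 4, 7},       # C-E-G
    "F":  {5, 9, 0},       # F-A-C
    "G7": {7, 11, 2, 5},   # G-B-D-F
}

def implied_chord(melody_pcs):
    """Pick the chord whose tones cover the melody segment best."""
    return max(CHORDS, key=lambda name: sum(pc in CHORDS[name]
                                            for pc in melody_pcs))

print(implied_chord([0, 4, 7, 7]))   # C-E-G-G arpeggio      -> "C"
print(implied_chord([2, 11, 7, 5]))  # D-B-G-F leaning tones -> "G7"

In a fuller model, each new tone would update the current chord hypothesis, and an unknown resulting progression would trigger a search for an alternative categorization, just as described above.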

As we see, the perception of rhythm, pitch and harmony is strongly idiomatic and is impossible without "top-down" processing of syntax. There is evidence that all aspects of musical expression operate on an idiomatic basis – evidence that comes from the error-filtering phenomenon, well familiar to any professional classical musician. Mistakes in performance that appear striking to the performer largely pass unnoticed by the audience – something that all professionals eventually learn over the course of their careers.

35 Jansen, E., & Povel, D.-J. (2004) - Perception of arpeggiated chord progressions. Musicae Scientiae, 8(1), p. 7.


That is why more experienced performers take public performance more easily, knowing that they will get away with many misses in the score.36

Research shows that the percentage of unnoticed mistakes is amazingly high: even the most obvious mistakes – wrong pitches – go 38% unnoticed, and not by laypeople but by graduate students majoring in piano.37 Moreover, these students overlooked more than a third of the pitch misses in music they knew well, and in some cases had recently studied. The range of errors includes intrusions, omissions, untied notes and substitutions, and involves not only pitch and rhythm but also texture, articulation, dynamics, etc. Such poor error detection (likely to be even lower in a lay audience) reflects the idiomatic nature of music comprehension: performers prioritize their attention and "allow" mistakes in subsidiary places (i.e., in chords rather than in melodies, and in shorter rather than longer rhythms).

The idiom-based error-correction mechanism is most obvious in sight-reading, when a musician performs an unknown piece of music at sight from the score. Examination of the errors made during sight-reading demonstrates that most of them are judged by the performers as contextually appropriate – and are therefore hard to notice. Musicians tend to recognize low-level syntax effectively and make "smart guesses," filtering errors away from the places where they would be most noticeable.

The second line of defense against communication failure is provided by the listeners. They guess the right way for the music to go in most situations where things went wrong for the performer. Repp (1997) found that the absolute majority of omission errors pass unnoticed by the audience. Only 4 out of 28 substitution errors were judged inappropriate. Only 19% of intrusions were detected. What is even more amazing is that some 62% of the total errors went undetected even by professionally trained musician-listeners.38

It looks as though the more a listener knows about music, the more naturally he guesses the "right" text garbled by the performer. What we observe here at work is the same error-correction mechanism known in psycholinguistics as the "scrambled letters" effect.39 It has been experimentally confirmed that most words with typographic errors can be guessed correctly even if the letters are completely misplaced (as in the phrase "Raeding Wrods With Jubmled Lettres"), provided the first and last letters of a scrambled word are in the right place. Even words with an inverted order of letters are easily comprehensible. Substitution of letters presents more of a problem, but is still manageable as long as the substituted characters are similar in shape and the first letter is right.
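The effect is easy to demonstrate (a quick illustration in Python): jumble the interior letters of each word while keeping the first and last letters fixed, and the text usually remains readable.

import random

def jumble_word(word):
    """Shuffle the inner letters, keeping the first and last in place."""
    if len(word) <= 3:
        return word
    inner = list(word[1:-1])
    random.shuffle(inner)
    return word[0] + "".join(inner) + word[-1]

def jumble(text):
    return " ".join(jumble_word(w) for w in text.split())

random.seed(7)   # fixed seed for a repeatable example
print(jumble("reading words with jumbled letters"))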

36 Sloboda, J. (1985) - The musical mind: The cognitive psychology of music. Clarendon Press, Oxford, p. 85.

37 Repp, B. H. (1997) - The art of inaccuracy: Why pianists' errors are difficult to hear. Music Perception, 14, p. 161-184.

38 Ibid.

39 Rayner, K., et al. (2006) - Raeding wrods with jubmled lettres: There is a cost. Psychological Science, 17(3), p. 192-193.


Evidently, the main criterion for the ease of error-correction is the integrity of a word. The correction mechanism must be directed "bottom-up": the data for the correct recognition of a word comes primarily from the elementary units of the text. The brain matches this information to the vocabulary of known words and comes up with a guess. The hypothetical correction is then weighed against the earlier words stored in short-term memory, taking into account how common the resulting combinations of words are. Such a model of correction has been successfully tested in studies that measured the speed of comprehension of "jumbled-word" texts.40
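A minimal sketch of this correction model (an illustration with a toy vocabulary, not the model from the cited studies) matches a jumbled word against known words by comparing letter multisets, anchored on the first and last letters:

from collections import Counter

VOCAB = ["reading", "reeling", "words", "with", "jumbled", "letters"]

def candidate_corrections(word):
    """Vocabulary entries consistent with a jumbled word."""
    return [v for v in VOCAB
            if len(v) == len(word)
            and v[0] == word[0] and v[-1] == word[-1]
            and Counter(v) == Counter(word)]

print(candidate_corrections("raeding"))   # -> ['reading'] (not 'reeling')
print(candidate_corrections("wrods"))     # -> ['words']

In a fuller model, the surviving candidates would then be re-ranked against the preceding words in short-term memory, exactly as the paragraph above describes.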

Similar mechanisms are engaged in processing the low-level syntactic structures of music. "Bottom-up" processing is correlated with "top-down" processing, and both involve short- and long-term memory: the short-term memory caches the most immediately preceding patterns of the music, whereas the long-term memory retrieves entries from the listener's lexicon of familiar musical structures. One more component necessary for successful error correction in music is semantic relevance – the emotional reaction to a parsed elementary unit of music should match the context of the auditioned excerpt.

 

Emotional  Factor  in  Syntactic  Categorization  of  Music  

We have already seen evidence of ultra-fast emotional response to a musical stimulus – within a blazingly fast 1 to 9 seconds. Let us return to our previous example of binary meter and examine what exactly happens during such a response.

When we hear a succession of notes of equal duration, our brain by default invokes a binary metric pulse. It has been experimentally shown that spontaneously produced rhythmic patterns generate durations related primarily by 1:1 and 2:1 ratios.41 Our mind starts categorizing the tones we hear on a binary grid. Once our projected beats are found to coincide with the actual accents in the music, our mind focuses on the musical events that coincide with the binary pulse – amplifying those events and feeding back into our experience of binary pulsation. At this point our brain generates "virtual" movement – an ongoing pattern of "left-right" steps superimposed on the flow of music. All rhythmic patterns then become projected within this march-like space.

The unity of metric induction, rhythmic categorization and motor representation has been elaborated by Neil Todd in his sensory-motor theory

40 Paciorek, W., & Raczaszek-Leonardi, J. (2009) - The influence of sentential context and frequency of occurrence on the recognition of words with scrambled letters. Psychology of Language and Communication, 13(2), p. 45-57.

41 Fraisse, P. (1982) - Rhythm and tempo. In: D. Deutsch (Ed.), Psychology of music. Academic Press, New York, NY, p. 149-180.


of rhythm.42 According to it, the experience of rhythm is mediated by two complementary representations: a sensory representation of the musical movement, and a motor representation of the musculoskeletal system. This cross-modal connection allows a person to appropriate new forms of musical movement and/or convert them into new physical motions. The connection between motor representations and musical movement is supported by findings from neuroimaging research, which reveal that the same brain areas that are associated with vestibular processing are involved in rhythm perception.43

Laurel Trainor conducted a series of studies investigating the contribution of the vestibular system to rhythmic categorization. The last of these studies established that the auditory perception of metrical structure in musical rhythm can be influenced by artificial stimulation of the vestibular nerve – in the absence of any physical movement.44

The ramification of this is that listeners internalize musical movement in terms of locomotor impulses in their body parts – and in great detail. Toiviainen et al. (2010)45 conducted an experiment in which a group of musicians was instructed to move to music; their movements were video-recorded and analyzed in relation to the music's metric grid. A kinetic analysis of peaks in mechanical energy revealed that participants embodied the metric pulses on numerous levels, synchronizing the motion of different parts of their bodies to periods of one, two and four beats:

• The 1-beat pulse was associated with vertical hand and torso movements as well as mediolateral arm movements,

• the 2-beat pulse – with mediolateral arm movements and rotation of the upper torso,

• the 4-beat pulse – with lateral flexion of the torso and rotation of the upper torso.

Higher-order meter involved the central parts of the body, whereas the basic beat engaged the limbs. Altogether, the entire metric hierarchy was simultaneously represented in spontaneous yet systematic physical motions.

42 Todd, N. P., O'Boyle, D. J., & Lee, C. S. (1999) - A sensory-motor theory of rhythm, time perception and beat induction. Journal of New Music Research, 28(1), p. 5.

43 Grahn, J. A., & Brett, M. (2007) - Rhythm and beat perception in motor areas of the brain. Journal of Cognitive Neuroscience, 19(5), p. 893-906.

44 Trainor, L. J., Gao, X., Lei, J., Lehtovaara, K., & Harris, L. R. (2009) - The primal role of the vestibular system in determining musical rhythm. Cortex, 45, p. 35-43.

45 Toiviainen, P., Luck, G., & Thompson, M. (2010) - Embodied meter: Hierarchical eigenmodes in music-induced movement. Music Perception, 28, p. 59-70.


Going back to our example of experiencing the surface rhythm of a binary meter: the motor representation of the rhythm evokes the emotional conditions characteristic of a marching style of motion. Marching is associated with communal feeling – the sense of togetherness in devotion to a common goal. It also implies a sense of discipline and direction: marching people move towards a specific target in a specific manner. That is why the surface level of music in binary meter receives the semantic attributes of forcefulness, togetherness, commitment and purposefulness.

It does not take long before a listener becomes submerged in this state. The majority of listeners can identify a march right from the first measure. There is no need to comprehend the entire hierarchy of temporal organization, including the hypermeter, to experience the movement of the music. The motor reflexes set off by music are practically instantaneous. A time delay might be necessary only in cases of ambiguity, where a few attempts at guessing have to be made before committing to a particular metric clock.

And this is what we observe when the beat of the music follows not a binary but a ternary pulse. Fujioka et al. (2010)46 used magnetoencephalography and spatial-filtering source analysis to identify a strong effect of the metric contrast between waltz and march on the exact timing of the activation of different parts of the brain. Thus, the right hippocampus was activated 80 ms after the march downbeat but 250 ms after the waltz downbeat. The basal ganglia showed a greater 80-ms peak for the march than for the waltz.

The higher reactivity to the march is best explained by the perceptual advantage of the march's binary meter over the waltz's ternary meter. Identifying ternary meter takes longer because of the extra step of cancelling out the default binary interpretation. The larger latency and weaker response to ternary meter correspond to the reputation the ternary pulse holds among Western musicians as more entertaining, smooth and easy-going than the binary pulse.

In Western music, ternary meter has become embodied in the waltz pattern of motion – three steps of circling motion. Done continuously, waltzing produces a complete circle of steps on the dance floor. Accordingly, the ternary metric pulse is associated with carefree movement: going in a circle does not have a destination. It is a pulse suited primarily for dancing, in contrast to the binary pulse, which serves primarily for walking. Notably, all ternary meters feature a casual, delightful style of movement – not purposeful or directed, but pleasurable and relatively easy-going – unlike purpose-oriented walking.

46 Fujioka, T., Zendel, B. R., & Ross, B. (2010) - Endogenous neuromagnetic activity for mental hierarchy of timing. The Journal of Neuroscience, 30(9), p. 3458-3466.


This view of ternary versus binary meters, traditional among musicians, has gained scientific support. In 1985 a group of researchers discovered the "detour path effect":47 moving a finger straight from point A to point B subjectively appears to take less time than it actually does, whereas moving along an ellipse appears to take longer than it actually does. The greater the detour, the greater the illusion. Further research demonstrated an actual difference in the velocity of the hand when tracing or drawing a straight line as opposed to an oval one. The delay in estimating the hand's trajectory was responsible for the retardation effect, and the amount of inflection in the direction of a line governed the impression of slowness: the greater the curvature, the slower the movement.48

When circular motion is reproduced over and over – as happens in ternary metric pulsation – the lax effect is magnified. Every measure becomes "laid back" between the neighboring downbeats, since the downbeats designate the leaning points of the metric pulse. As a result, ternary beat movement acquires its lackadaisical flavor.

Anybody can conduct an easy experiment: turn a metronome on and tap along with its clicking, marking every other beat – then, after getting accustomed to the binary pulse, start stressing every third beat. Chances are you will feel that the new pulse is more relaxed than the previous one.

The tendency of a binary pulse to produce a busier impression has a physiological basis, too. The straight movement associated with the marching pattern of the beat implies directedness – motion towards a certain goal. And we know from experimental studies that aimed hand movements are planned vectorially: in terms of distance and direction rather than in terms of absolute position in space.49

Vector-like processing of the binary meter might be responsible for its greater dynamism, associated with the instinct to reach the vector's target sooner – in contrast with the ternary meter and its propensity to lag. The connection between goal orientation and distance estimation has been experimentally supported. In one study, subjects were found to perceive the straight-line distance to a cylinder as

47 Lederman, S. J., Klatzky, R. L., & Barber, P. O. (1985) - Spatial and movement-based heuristics for encoding pattern information through touch. Journal of Experimental Psychology, 114(1), p. 33-49.

48 Faineteau, H., Gentaz, E., & Viviani, P. (2005) - Factors affecting the size of the detour effect in the kinaesthetic perception of Euclidean distance. Experimental Brain Research, 163(4), p. 503-514.

49 Vindras, P., Desmurget, M., Prablanc, C., & Viviani, P. (1998) - Pointing errors reflect biases in the perception of the initial hand position. Journal of Neurophysiology, 79(6), p. 3290-3294.


being longer when they intended to grasp the cylinder by reaching around a wide barrier. The same distance was perceived as shorter when the barrier was narrower.50

Now we can see why the punctured rhythm in binary meter (which we discussed earlier, in relation to Schumann) can exert so much energy and produce the impression of rushing. The association of the binary pulse with goal-oriented marching movement could be responsible for a kind of "wishful thinking" – the impression that the distance one has to travel is shortened. The ternary pulse, free from any goal, is not capable of stimulating such "wishful thinking," and therefore commonly produces a less active impression. This throws new light on the contrast between the punctured binary and swing-like ternary divisions of the beat: the former rushes the movement, whereas the latter relaxes it.

The contrast between binary and ternary is manifested in two expressive aspects: rhythm and meter. In both, elementary idioms deliver their semantic messages to the listener the moment he becomes aware of them (which takes a split second). In rhythm, the punctured figure bounces and bursts, whereas the swing figure rolls and rocks. In meter, the binary pulse hastens, whereas the ternary pulse comforts.

The emotional meaning conveyed by the syntactic relation between rhythm and meter is corroborated by the expressive timing chosen by a performer. Measurements show that, in practice, the 2:1 swing rhythms can actually vary anywhere from a 1.66:1 to a 5.6:1 ratio. The lower ratios correspond to natural, tender, solemn and sad expressions; the higher ratios suit happy and angry expressions.51 The durational contrast between longer and shorter tones can thus be understood as a sharpening or softening of a rhythmic ratio that accounts for the sharpening or softening of the corresponding emotional meaning.
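As a sketch of this mapping (the cutoff value is my assumption for illustration; the source reports only the overall 1.66:1-5.6:1 range and the tendencies of its two ends):

def swing_ratio(long_ms, short_ms):
    return long_ms / short_ms

def expressive_register(ratio, cutoff=2.0):
    """Crude two-way split around an assumed cutoff near the nominal 2:1."""
    if ratio < cutoff:
        return "softened: natural / tender / solemn / sad"
    return "sharpened: happy / angry"

r = swing_ratio(340, 180)                    # a performed long-short pair, ms
print(round(r, 2), expressive_register(r))   # -> 1.89 softened: ...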

This connection must have been established through the cultivation of dance music in folkloric culture: a dancer adjusts his steps in a particular dance to the music according to his emotional state. Other people see this adjustment and entrain to both the music and the motion accompanying it. They remember the motion and reuse it whenever they happen to dance (in traditional folklore, dancing is a very common activity). The motion becomes fixed by convention. Eventually, the gradations within that motion become fixed as well – they acquire their own "sharpening" or "softening" values along with the related emotional associations. Thus, a rocking motion is generally believed to be softer than a staggering one, and is associated with a softer, loving expression.

Emotion acts like a double-edged sword in music: it is conveyed by idioms and syntax, yet it affects the perception of syntax in a feedback loop. Musicians emote to the

50 Morgado, N., Gentaz, E., Guinet, E., Osiurak, F., & Palluel-Germain, R. (2013) - Within reach but not so reachable: Obstacles matter in visual perception of distances. Psychonomic Bulletin & Review, 20(3), p. 462-467.

51 Madison, G. (2000) - Properties of expressive variability patterns in music performances. Journal of New Music Research, 29(4), p. 335-357.


music they play, which causes them to exaggerate expressive timing, dynamics and even pitch. Varying pitches within the same motif at the performer's discretion happens often in popular music. It is prohibited in classical music, with its reverence for the score; yet even there, singers and string players have leeway to bend their intonation in the so-called "portamento," not to mention their license to add embellishments. Such expressive exaggerations, in turn, affect the way listeners perceive the musical idioms and syntactic structures rendered by the musicians. As we have seen in numerous cases, listeners tend to hold a perceptual bias that usually matches the bias employed by the performers.

Communication here works akin to the Dolby process: the performer encodes the emotional message via expressive exaggeration, and the listener decodes it by nullifying the exaggeration – by looking "through" it as though it were not there. Altogether, then, the expressive distortion of rhythm, dynamics or pitch appears "normative" to the audience; it is only the absence of exaggeration that the audience registers as an abnormality.
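The analogy can be made concrete with a toy companding model (my own illustration of the analogy, not a model from the literature): the performer stretches every deviation from the notated durations, and the listener divides the expected stretch back out.

def encode(durations_ms, exaggeration=1.3):
    """Performer: stretch every deviation from the mean duration."""
    mean = sum(durations_ms) / len(durations_ms)
    return [mean + (d - mean) * exaggeration for d in durations_ms]

def decode(performed_ms, expected=1.3):
    """Listener: divide deviations by the exaggeration he expects to hear."""
    mean = sum(performed_ms) / len(performed_ms)
    return [mean + (d - mean) / expected for d in performed_ms]

notated = [250, 250, 500, 1000]
played = encode(notated)                    # -> [175.0, 175.0, 500.0, 1150.0]
print([round(x) for x in decode(played)])   # -> [250, 250, 500, 1000]

As long as the listener's expected exaggeration matches the performer's actual one, the distortion cancels out and registers as "normative"; a mismatch is what gets heard as an abnormality.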

This mechanism works on a strictly idiomatic basis. The central place in this scheme is occupied by the repository of idioms available to the listener. The real lexicon of a non-musician is far larger than what modern-day musicology has covered: the topoi and styles defined in the music literature constitute just a drop in the pool of what is actually out there in musical practice. All the expressive devices listed in this book in connection with meter, rhythm, tempo and articulation are idioms that constitute entries in the Western lexicon of music. This lexicon is vast. The number of rhythmic figures alone must run into the hundreds – involving many different combinations and configurations of the available time ratios, with all the variants produced by interaction with different meters at different tempi. And a comparable multitude of entries represents the harmonic, melodic, textural and timbral aspects of music.

Without any awareness, mostly automatically, the listener manages this enormous database of what is common in his native type of music – in a way very similar to how he stores up to a hundred thousand words and their phrasal combinations in his native tongue. On this, still syntactic, level of auditory perception, speech and music are processed in a single domain. That is why the findings of psycholinguistics are quite applicable to the field of music. There is no principal difference between music and language in the handling of perceptual tasks – not until the time comes to estimate the contextual appropriateness of a given expression. At that point the semantic processing of music takes a different path, departing from speech.

Aniruddh Patel proposes a model of functionally shared brain networks52 in an attempt to reconcile the conflicting evidence: behavioral studies of patients with musical deficits point toward the independence of musical

52 Patel, A. D. (2013) - Sharing and nonsharing of brain resources for language and music. In: M. A. Arbib (Ed.), Language, music, and the brain: A mysterious relationship. The MIT Press, Cambridge, MA, p. 329-356.


syntax from the syntax of speech, whereas neuroimaging research demonstrates their overlap. The main reason musical syntax has something in common with linguistic syntax, according to this theory, is that both depend on the real-time interpretation of rapidly unfolding streams of information. In both cases, interpretation involves the application of abstract structural categories and rules which, per se, are not bound to specific semantic denotations, yet exercise influence over the meaning. What unites musical and speech syntax is their heavy reliance on the relevant functional computations – despite the neuropsychological dissociations between linguistic and musical abilities.

The biggest principal difference between musical syntax and that of speech is the greater span and complexity of music's hierarchic organization, as well as the simultaneous engagement of multiple hierarchic levels in the definition of the very same sound event.

The syntax of music is far more complex than that of speech. Even on the most basic surface level, a listener has to pay attention concurrently to the changes in:

1. pitch,
2. rhythm,
3. harmony,
4. dynamics,
5. tempo,
6. metric organization,
7. texture,
8. articulation and
9. music form.

Each of these nine aspects of expression possesses its own set of idiomatic patterns. And each of these idioms is bound by syntactic rules that act across a number of expressive aspects, as sketched below.
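As an illustration of this concurrency only: a single sound event can be treated as a record carrying one value per aspect. The class SoundEvent and all its field values below are hypothetical stand-ins, not an analytical notation used in the literature.

from dataclasses import dataclass

@dataclass
class SoundEvent:
    pitch: str          # e.g. interval from the previous tone
    rhythm: str         # duration pattern the tone belongs to
    harmony: str        # current chord function
    dynamics: str       # e.g. "p", "mf", "f"
    tempo: float        # beats per minute
    meter: str          # e.g. "3/4"
    texture: str        # e.g. "melody + accompaniment"
    articulation: str   # e.g. "legato"
    form_unit: str      # position within the musical form

event = SoundEvent(pitch="step up", rhythm="short-long 3:1",
                   harmony="V", dynamics="mf", tempo=92.0, meter="3/4",
                   texture="melody + accompaniment",
                   articulation="legato", form_unit="sentence 1, bar 2")
print(event)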

We could already observe how the idiom of the "short-long" 3:1 punctured (i.e. dotted) rhythm depended on:

• its relation to the beat (metric aspect),
• harmonic pulse (harmonic aspect),
• separation from the rhythm in the accompaniment (texture aspect),
• span of the pattern (articulation aspect) and
• repetitiveness of the pattern (aspect of music form).

So, the rules that govern the appropriateness of this idiom to a particular context involve conditions from five different aspects of organization. The fact that the other four aspects of organization are irrelevant to the idiom of punctured rhythm is just as characteristic in defining its syntax. Some other idiom, such as the "fanfare," is characterized by a completely different configuration of relevant and irrelevant aspects of expression. A sketch of this configuration follows.
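The sketch below, under the same hypothetical encoding, renders the relevant/irrelevant configuration explicitly: the punctured-rhythm idiom binds five aspects with conditions, and the remaining aspects are computed as its "don't care" dimensions. All predicates and context keys are invented for illustration.

from typing import Callable, Dict

ASPECTS = {"pitch", "rhythm", "harmony", "dynamics", "tempo",
           "meter", "texture", "articulation", "form"}

# Five relevant conditions, one per aspect named in the list above.
PUNCTURED_RHYTHM_RULES: Dict[str, Callable[[dict], bool]] = {
    "meter":        lambda ctx: ctx["aligned_with_beat"],
    "harmony":      lambda ctx: ctx["harmonic_pulse"] == "per beat",
    "texture":      lambda ctx: ctx["separate_from_accompaniment"],
    "articulation": lambda ctx: ctx["pattern_span_in_notes"] <= 4,
    "form":         lambda ctx: ctx["pattern_repeats"],
}

# The aspects an idiom leaves unconstrained are as telling as those it binds.
IRRELEVANT_ASPECTS = ASPECTS - set(PUNCTURED_RHYTHM_RULES)

def idiom_fits(context: dict) -> bool:
    """True when every relevant condition holds for the given context."""
    return all(rule(context) for rule in PUNCTURED_RHYTHM_RULES.values())

context = {"aligned_with_beat": True, "harmonic_pulse": "per beat",
           "separate_from_accompaniment": True,
           "pattern_span_in_notes": 4, "pattern_repeats": True}
print(idiom_fits(context))         # -> True
print(sorted(IRRELEVANT_ASPECTS))  # the idiom's "don't care" dimensions

A different idiom, such as the "fanfare," would simply plug in a different dictionary of rules, and its irrelevant set would fall out automatically.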


Not only is each musical idiom mapped across several aspects of expression, it is also accessed simultaneously from a number of hierarchical levels of organization. In general, music perception seems to favor the "top-down" direction of syntax building in relation to parsing, and the "bottom-up" direction in relation to error correction. However, in every particular case, the categorization of a musical event can potentially engage a low, middle or high order hierarchical level.

In the same example of punctured rhythm from the Arabeske by Schumann, the "short-long-short-long" rhythmic figure is defined by the following bindings (sketched in code after the list):

• Elementary level of the metric aspect (beat);
• Advanced level of the harmonic aspect (a chord is the elementary level, a succession of two chords is the medium level, and the harmonic pulse is the average of a progression of multiple chords);
• Medium level of the texture aspect (continuity of a single melodic voice is the elementary level of texture, and integration of the tones that comprise the accompaniment into a single entity is the next level);
• Elementary level of the articulation aspect (a legato connection of 4 notes constitutes an elementary unit of articulation);
• Elementary level of music form (repetition of the pattern within the musical sentence, which is the simplest unit of music form).
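A compact way to picture these five bindings is a table keyed by aspect, as in the sketch below; the level names follow the list above, while the encoding itself is hypothetical.

FIGURE_BINDINGS = {
    "meter":        ("elementary", "the beat"),
    "harmony":      ("advanced",   "harmonic pulse"),
    "texture":      ("medium",     "accompaniment fused into one entity"),
    "articulation": ("elementary", "legato group of 4 notes"),
    "form":         ("elementary", "repetition within the sentence"),
}

# All bindings are available at once; no level waits for another to build.
for aspect, (level, unit) in FIGURE_BINDINGS.items():
    print(f"{aspect}: {level} level ({unit})")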

Different layers of the syntactic hierarchy are accessed at once, without any building up. The harmonic pulse is not inferred from observation of the chords, but "guessed" right away: the listener already knows what to expect from the sound of a harmonic progression in which chords keep changing on every beat. He remembers how this structure should sound. He also knows that such a harmonic pulse often supports the punctured rhythm. So, after hearing just 3-4 changes of chords, he "guesses" the rest, before the music actually demonstrates that the harmonic pulse indeed commits to the beat.
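A minimal sketch of this guess-first behavior, assuming chord onsets measured in beats: after roughly three observed one-beat changes (the "3-4 changes" mentioned above), the listener commits to the "one chord per beat" hypothesis instead of waiting for the full progression. The threshold and tolerance values are illustrative.

def guess_harmonic_pulse(chord_onsets_in_beats):
    """Commit to a pulse hypothesis from a short prefix of chord changes."""
    gaps = [b - a for a, b in zip(chord_onsets_in_beats,
                                  chord_onsets_in_beats[1:])]
    # Three observed one-beat gaps are enough to "guess" the rest,
    # before the music actually demonstrates it.
    if len(gaps) >= 3 and all(abs(g - 1.0) < 0.05 for g in gaps[:3]):
        return "one chord per beat"
    return None  # not enough evidence yet; keep listening

print(guess_harmonic_pulse([0.0, 1.0, 2.0, 3.0]))  # -> one chord per beat
print(guess_harmonic_pulse([0.0, 2.0]))            # -> None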

Lerdahl and Jackendoff are right in formulating the generative rules and in stating that the construction of a syntactic hierarchy takes place while listening. However, this process is not the main method of music comprehension. The emotional nature of music makes listening far too exciting and intuitive to follow an accurate logical chain of derivation.

The greatest physiological advantage of emotion lies in its enormous speed of reaction. The same applies to musical emotion. It jumps like a flea; it simply cannot crawl like a beetle. The mind has an easier time activating an emotional reaction and then, if it turns out to be inappropriate, cancelling it in favor of some other emotion, than withholding any emotional reaction and sitting there, waiting until the entire syntactic hierarchy is constructed and tested.

Wrong-guessing an emotion is the most common situation in everyday life. How many times have we gotten scared and then immediately realized that there is nothing dangerous out there, and laughed at ourselves? Musical emotions are just as volatile.

The musical mind thinks in guesses; this is the prime mode of making sense in music. The "generative" mode is only secondary, reserved for cases of ambiguity: whenever the music depends on complex emotional conditions or expresses conflicting emotional states, or when the listener is not well versed in the musical language used by a composition.
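This guess-and-revise loop can be sketched as follows, assuming a toy mapping from idioms to emotions: a hypothesis fires immediately on each familiar idiom and is replaced, not withheld, when a conflicting idiom arrives. The labels are invented for illustration.

def follow(idiom_stream, emotion_lexicon):
    """Yield the running emotional hypothesis for each incoming idiom."""
    hypothesis = None
    for idiom in idiom_stream:
        implied = emotion_lexicon.get(idiom)  # fast, reflex-like lookup
        if implied is not None and implied != hypothesis:
            hypothesis = implied              # jump to a new guess at once
        yield hypothesis                      # never wait for a full parse

EMOTION_LEXICON = {"fanfare": "triumph", "sigh motif": "sorrow"}
print(list(follow(["fanfare", "fanfare", "sigh motif"], EMOTION_LEXICON)))
# -> ['triumph', 'triumph', 'sorrow']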

Of course, the "generative" mode of listening can be cultivated, like everything else. However, not that many people are capable of sticking to this mode as their primary means of following music. Nicholas Cook describes these two modes, calling the most common mode "musical listening," because it focuses on the flow of music. The other one he qualifies as the "musicological listening" mode, where the focus goes toward establishing the presence of certain structures in music.53

53 Cook, Nicholas (1990) - Music, imagination, and culture. Clarendon Press, Oxford, p. 152.

Empirical evidence of ultra-fast emotional reactions to music, in musically untrained listeners and musicians alike, testifies that even professional musicians rarely resort to a purely "musicological" mode of listening.54

54 Bachorik, Justin Pierre et al (2009) - Emotion in motion: Investigating the time-course of emotional judgments of musical stimuli. Music Perception: An Interdisciplinary Journal, 26(4), p. 355.

The principal strategy for a listener in parsing the stream of music is to look for familiar "sound bites" of rhythm, melody and harmony, in their most common metric modifications (e.g. punctured rhythm in binary versus ternary meter, or in simple versus complex meter, etc.). These "sound bites" can be elementary or complex, and often belong to the higher order hierarchical levels.

The cognitive mechanism of chunking allows anybody to grasp the peculiar clash of two conflicting rhythms and remember them as a single block of information. Then, every time a similar sound is encountered in music, the listener recognizes this polyrhythmic idiom without any generative operations. In fact, in one experimental study, half of the musically untrained participants were able to tap the alternations of binary and ternary divisions, remembering and appreciating the peculiar "against the beat" quality of this polyrhythmic combination.55

55 Vos, P.; Handel, S. (1987) - Playing triplets: facts and preferences. In: Action and Perception in Rhythm and Music, ed. A. Gabrielsson, Royal Swedish Academy of Music, Stockholm, pp. 35-47.
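Chunking can be sketched as storing the whole clash under a single key, so that later recognition is one retrieval rather than a re-derivation; the encoding of the two rhythmic layers here is a hypothetical simplification.

# Store the polyrhythmic clash once, as a single block of information.
two_against_three = (("triplet", "triplet", "triplet"),
                     ("duplet", "duplet"))

chunk_memory = {two_against_three: "binary-against-ternary clash"}

def recall(heard):
    """One lookup stands in for 'no generative operations needed'."""
    return chunk_memory.get(heard, "unfamiliar: fall back to slow analysis")

print(recall(two_against_three))  # -> binary-against-ternary clash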

A recent experimental study by Koelsch and colleagues (2013)56 investigated how musicians and non-musicians process higher order syntax in music. The researchers used two versions of a chorale by J. S. Bach: one intact, and another with the first sentence transposed, so that the harmonic endings of the two sentences mismatched. As a result, the higher order harmonic organization was broken (at the level of the musical period), while the lower order remained correct (at the sentence level).

56 Koelsch, S.; Rohrmeier, M.; Torrecuso, R.; Jentschke, S. (2013) - Processing of hierarchical syntactic structure in music. Proceedings of the National Academy of Sciences of the United States of America, Vol. 110 (38), pp. 15443-8.

Example 14. J. S. Bach's chorale Liebster Jesu, wir sind hier (BWV 373) in the original and distorted form, where the 1st sentence is transposed to create a harmonic clash with the 2nd sentence (from Koelsch et al 2013).

 


Previous studies have demonstrated that whenever listeners discover syntactic errors in music, their electroencephalographic (EEG) recording shows an early right anterior negativity (ERAN), known to reflect music-syntactic processing, together with a subsequent late negativity (the so-called N5), known to reflect harmonic integration. Listening to the distorted version of Bach's chorale evoked the same pattern of response in Koelsch's subjects, indicating that listeners can be aware of irregularities in higher order syntax in the absence of irregularities in lower order syntax.

This finding suggests that the listener's mind can process the syntactic hierarchy as a whole, together with nested long-distance dependencies. Listeners can keep in active memory representations of syntactic structures they encountered earlier in the music. Their mind allows them to selectively access these structures as needed. In fact, it is likely that this capacity is much stronger for musical syntax than for verbal syntax. Music routinely employs repetitions of musical material over large spans (e.g. the recapitulation in sonata form), which can operate within time periods of 15-20 minutes, greatly exceeding the structural span of even the longest sentences.
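Reduced to its skeleton, the long-distance dependency probed by Koelsch and colleagues can be sketched as a single comparison across the span of a period: the cadence of the first sentence must agree in key with the cadence of the second. The key labels below are illustrative, and the equality test is, of course, a drastic reduction of real harmonic integration.

def period_is_well_formed(first_sentence_cadence, second_sentence_cadence):
    """Higher order syntax holds when the sentence endings agree in key."""
    return first_sentence_cadence == second_sentence_cadence

print(period_is_well_formed("A major", "A major"))  # intact chorale -> True
print(period_is_well_formed("G major", "A major"))  # transposed 1st sentence -> False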

In this light, it becomes crucial for the listener to be able to quickly move the focus of his attention from one aspect of expression to another, to zoom in or out while examining a syntactic structure, shifting from one level of the syntactic hierarchy to another. Strictly following the "generative" framework would deprive the listener of precious time and rob him of much of his emotional experience.

The evidence from research on expressive timing supports the conclusion that music becomes emotionally meaningful right from the start of syntactic processing. The conviction that the conveyance of emotion is the prime function of music, so dominant in Western literature on music before the 20th century, finds confirmation in the most recent psychophysiological studies.

Musical syntax seems to be organized around the central technical issue of music: the task of speeding up mental processing. The principal function of musical syntax is to reserve as much time-space as possible for the experience of music as "emotional theater." The entire evolution of Western music had been directed toward the establishment of "musica reservata" during the 1560s, the same point in history and geography where the fine arts of the Renaissance reached their summit and set Western civilization apart from the rest of the world's cultures, coining the Western identity.

"Suiting the power of music to the meaning of the words, expressing the power of each different emotion, making the things of the text so vivid that they seem to stand actually before our eyes."57 This description of the music of Orlando di Lasso, from a 1565 letter by the Dutch scholar Samuel Quickelberg, captures the essence of the big leap Western music took at that time, one that made a revolutionary impression on contemporaries. This was the moment when the spontaneously forming public market for music opened doors for composers to compete with each other in their skill at creating "emotional theater." Quickly maturing market conditions started rewarding more emotionally expressive authors over their less expressive colleagues, setting incentives for all musicians, including performers, to maximize emotional expression. This vector of historic development remained dominant until the advance of abstract music in the middle of the 20th century, and it is still prominent in the domain of popular music, as well as film music.

57 Grout, Donald J. (1973) - A History of Western Music. W. W. Norton & Company, New York, p. 283.

The time period between 1560 and 1948 broadly coincides with what is called the "common practice period" (c. 1600-1900), the time when classical music strictly followed a set of compositional rules and enjoyed wide recognition. The absolute majority of musical idioms in use today were generated during these three centuries under the aegis of public interest in the "emotional theater" provided by music.

Musical syntax, as we know it, has been crystallized in millions of acts of emotional communication between composers, performers and listeners, providing feedback, polishing the conventions of use, and optimizing the physical appearance of the musical idioms. The emotional factor has conditioned both the production of music and the perception of it.

Musical syntax should be viewed as the set of rules regulating the showcase of musical emotions. These rules take familiar emotional expressions and organize them into a chain of exciting events that collide and lead to unpredictable outcomes, always fresh, always believable, affecting the listener's perception of syntax and commanding the performer to adjust for the listener's affective state. The emotional factor turns musical syntax into an ouroboros, a snake eating its own tail. Studying musical syntax without considering emotional denotation is throwing out the baby with the bathwater. Emotion is the pair of stereoscopic glasses that lets one feel the syntactic structures of music in flesh and blood. Without such glasses music becomes a mundane and dull reality, attractive to no one but the most devoted musicologists.

 

REFERENCES:    

   Bachorik,  Justin  Pierre  et  al  (2009)  -­‐  Emotion  in  motion:  Investigating  the  time-­‐course  of  emotional  judgments  of  musical  stimuli.  Music  perception:  An  interdisciplinary  journal,  26(4)  p.  355.  

Dalla Bella, Simone; Peretz, Isabelle; Aronoff, Neil (2003) - Time course of melody recognition: A gating paradigm study. Perception & Psychophysics, Vol. 65 (7), p. 1019-1028.

   Bengtsson,  I.,  &  Gabrielsson,  A.  (1983)  -­‐  Analysis  and  synthesis  of  musical  rhythm.  In:  Studies  of  music  performance,  J.  Sundberg  (Ed.).  Publications  issued  by  the  Royal  Swedish  Academy  of  Music  No.  39,  Stockholm,  Sweden,  pp.  27-­‐60.  


   Bhatara,  Anjali;  Tirovolas,  Anna  K.;  Duan,  Lilu  Marie  (2011)  -­‐  Perception  of  Emotional  Expression  in  Musical  Performance.  Journal  of  Experimental  Psychology:  Human  Perception  and  Performance,  v37  n3  p.  921-­‐934.  

Bigand, E.; Vieillard, S.; Madurell, F.; Marozeau, J.; Dacquet, A. (2005) - Multidimensional scaling of emotional responses to music: The effect of musical expertise and of the duration of the excerpts. Cognition & Emotion, Vol. 19 Issue 8, p. 1113-1139.

   Cambouropoulos,  Emilios  (2010)  -­‐  The  Musical  Surface:  Challenging  Basic  Assumptions.  Musicae  Scientiae,  Special  Issue,  pp.  131-­‐148.  

   Chan,  Sau  Y.  (2005)  -­‐  Performance  Context  as  a  Molding  Force:  Photographic  Documentation  of  Cantonese  Opera  in  Hong  Kong.  Visual  Anthropology.  Mar-­‐Jun2005,  Vol.  18  Issue  2/3,  p.  167-­‐198.  

Cook, Nicholas (1990) - Music, imagination, and culture. Clarendon Press, Oxford, p. 152.

   Desain  P.;  Honing  H.  (2003)  -­‐  The  formation  of  rhythmic  categories  and  metric  priming.  Perception;  Vol.  32  (3),  pp.  341-­‐65.  

   Desain,  P.;  Honing,  H.  (2001)  -­‐  Modeling  the  Effect  of  Meter  in  Rhythmic  Categorization:  Preliminary  Results.  Japanese  Journal  of  Music  Perception  and  Cognition,  7(2),  145–56.  

   Fabian,  Dorottya;  Schubert,  Emery  (2010)  -­‐  A  new  perspective  on  the  performance  of  dotted  rhythms.  Early  Music.  Nov2010,  Vol.  38  Issue  4,  p.  585-­‐588.  

   Faineteau,  Henry;  Gentaz,  Edouard;  Viviani,  Paolo  (2005)  -­‐  Factors  affecting  the  size  of  the  detour  effect  in  the  kinaesthetic  perception  of  Euclidean  distance.  Experimental  Brain  Research.  Aug2005,  Vol.  163  Issue  4,  p.  503-­‐514.  

   Fraisse,  P.  (1982)  -­‐  Rhythm  and  tempo.  In:  Psychology  of  music,  D.  Deutsch  (Ed.).  New  York,  NY:  Academic  Press.  pp.  149-­‐180.  

   Fujii,  Shinya  et  al.  (2011)  -­‐  Synchronization  error  of  drum  kit  playing  with  a  metronome  at  different  tempi  by  professional  drummers.  Music  perception:  An  interdisciplinary  journal,  28(5)  p.  491.  

   Fujioka  T;  Zendel  BR;  Ross  B,    (2010)  -­‐  Endogenous  neuromagnetic  activity  for  mental  hierarchy  of  timing.  The  Journal  Of  Neuroscience:  The  Official  Journal  Of  The  Society  For  Neuroscience,  2010  Mar  3;  Vol.  30  (9),  pp.  3458-­‐66.  

   Gabrielsson,  A.,  Bengtsson,  I.  and  Gabrielsson,  B.  (1983)  -­‐  Performances  of  musical  rhythm  in  3/4  and  6/8    meter.  Scandinavian  Journal  of  Psychology,  24,  p.  193-­‐213.  

   Grahn  J.A;  Brett  M.  (2007)  -­‐  Rhythm  and  beat  perception  in  motor  areas  of  the  brain.  Journal  Of  Cognitive  Neuroscience,  2007  May;  Vol.  19  (5),  pp.  893-­‐906.  

   Grout,  Donald  J.  (1973)  –  A  History  of  Western  Music.  W.  W.  Norton  &  Company,  New  York,  p.  283.  


   Hefling,  Stephen  E.  (1993)  -­‐  Rhythmic  Alteration  in  Seventeenth-­‐  And  Eighteenth-­‐Century  Music:  Notes  Inegales  and  Overdotting,  Schirmer  Books,  New  York,  pp.  101–105.  

Jackendoff,  R.  (1987)  -­‐  Consciousness  and  the  computational  mind.  The  MIT  Press,  Cambridge  MA,  p.  218-­‐219.  

   Jansen,  Erik;  Povel,  Dirk-­‐Jan  (2004)  -­‐  Perception  of  arpeggiated  chord  progressions.  Musicæ  scientiæ:  The  journal  of  the  European  Society  for  the  Cognitive  Sciences  of  Music,  8(1)  p.  7.  

   Koelsch  S;  Rohrmeier  M;  Torrecuso  R;  Jentschke  S.  (2013)  -­‐  Processing  of  hierarchical  syntactic  structure  in  music.  Proceedings  Of  The  National  Academy  Of  Sciences  Of  The  United  States  Of  America,  2013  Sep  17;  Vol.  110  (38),  pp.  15443-­‐8.  

   Lederman  SJ,  Klatzky  RL,  Barber  PO  (1985)  -­‐  Spatial  and  movement-­‐based  heuristics  for  encoding  pattern  information  through  touch.  Journal  Of  Experimental  Psychology.  1985  Mar,  Vol.  114,  Issue  1,  p.  33-­‐49.  

   Lerdahl,  F.,  &  Jackendoff,  R.  (1983)  -­‐  A  generative  theory  of  tonal  music.  The  MIT  Press,  Cambridge  MA.    

   Madison,  Guy  (2000)  -­‐  Properties  of  Expressive  Variability  Patterns  in  Music  Performances.    Journal  of  New  Music  Research.  Dec2000,  Vol.  29  Issue  4,  p.  335-­‐357.  

   Morgado  N;  Gentaz  E;  Guinet  E;  Osiurak  F;  Palluel-­‐Germain  R.  (2013)  -­‐  Within  reach  but  not  so  reachable:  obstacles  matter  in  visual  perception  of  distances.  Psychonomic  Bulletin  &  Review,  2013  Jun;  Vol.  20  (3),  pp.  462-­‐7.  

Paciorek, Wiktor; Rączaszek-Leonardi, Joanna (2009) - The influence of sentential context and frequency of occurrence on the recognition of words with scrambled letters. Psychology of Language and Communication, Vol. 13 Issue 2, p. 45-57.

Parncutt, Richard (1989) - Harmony: A Psychoacoustical Approach. Springer-Verlag, Berlin, p. 68-70.

Patel, Aniruddh D. (2013) - Sharing and nonsharing of brain resources for language and music. In: Language, music, and the brain: A mysterious relationship, edited by Michael A. Arbib, MIT Press, Cambridge, MA, p. 329-356.

   Povel,  D.  J.,  &  Jansen,  E.  (2001)  -­‐  Perceptual  mechanisms  in  music  processing.  Music  Perception,  19  (2),  p.  169–198.  

   Povel,  D.J.  &  Essens,  P.  (1985)  -­‐  Perception  of  temporal  patterns.  Music  Perception,  2(4):  p.  411-­‐440.  

   Rao,  Nancy  Yunhwa  (2007)  -­‐  The  tradition  of  luogu  dianzi  (percussion  classics)  and  Its  signification  in  contemporary  music.  Contemporary  Music  Review.  2007,  Vol.  26  Issue  5/6,  p.  511-­‐527.  

   Rayner,  Keith  et  al  (2006)  -­‐  Raeding  Wrods  With  Jubmled  Lettres:  There  Is  a  Cost.  Psychological  Science,  17(3),  p.  192-­‐193.    


   Reinecke,  David  M.  (2009)  -­‐  “When  I  Count  to  Four  ...”:  James  Brown,  Kraftwerk,  and  the  Practice  of  Musical  Time  Keeping  before  Techno.  Popular  Music  &  Society.  Dec2009,  Vol.  32  Issue  5,  p.  607-­‐616.  

   Repp  B.H.  (1990)  -­‐  Patterns  of  expressive  timing  in  performances  of  a  Beethoven  minuet  by  nineteen  famous  pianists.  Journal  of  the  Acoustical  Society  of  America  88,  p.  622–641.    

   Repp  B.H.  (1997)  -­‐  The  art  of  inaccuracy:  Why  pianists'  errors  are  difficult  to  hear.  Music  Perception,  14,  p.  161-­‐184.  

   Repp  BH,  Knoblich  G.  (2004)  -­‐  Perceiving  action  identity:  how  pianists  recognize  their  own  performances.  Psychological  Science  15,  p.  604-­‐9.  

   Repp,  Bruno  H.  (2000)  -­‐  The  timing  implications  of  musical  structures.  In:  Musicology  and  sister  disciplines:  Past,  present,  future.  Greer,  David  ed.,  Oxford  University  Press,  New  York,  p.  60-­‐67.  

   Ryynänen  M.  P.,  &  Klapuri  A.  P.  (2008)  -­‐  Automatic  transcription  of  melody,  bass  line,  and  chords  in  polyphonic  music.  Computer  Music  Journal,  32(3),  p.  72-­‐86.  

   Scott  A.C.  (1983)  –  The  Performance  of  Classical  Theatre.  In:  Chinese  Theater:  From  Its  Origins  to  the  Present  Day,  ed.  Colin  Mackerras,  University  of  Hawaii  Press,  Honolulu,  p.  139-­‐140.  

   Shui'er  Han;  Sundararajan,  Janani;  Bowling,  Daniel  Liu;  Lake,  Jessica;  Purves,  Dale  (2011)  -­‐  Co-­‐Variation  of  Tonality  in  the  Music  and  Speech  of  Different  Cultures.  PLoS  ONE.  2011,  Vol.  6  Issue  5,  p.  1-­‐5.  

   Sloboda,  John  (1985)  -­‐  The  musical  mind:  The  cognitive  psychology  of  music.  Clarendon  Press,  Oxford,  p.  85.  

Slonimsky, Nicolas (1965) - Lexicon of Musical Invective: Critical Assaults on Composers Since Beethoven. Norton & Company, New York, p. 5.

   Thrasher,  Alan  R.  (1981)  -­‐  The  Sociology  of  Chinese  Music:  An  Introduction.  Asian  Music,  Vol.  12,  No.  2  (1981),  pp.  17-­‐53.  

   Todd,  Neil  P.;  O'Boyle,  Donald  J.;  Lee,  Christopher  S.  (1999)  -­‐  A  sensory-­‐motor  theory  of  rhythm,  time  perception  and  beat  induction.  Journal  of  new  music  research,  28(1)  p.  5.  

   Toiviainen  P,  Luck  G,  Thompson  M  (2010)  -­‐  Embodied  meter:  Hierarchical  eigenmodes  in  music-­‐induced  movement.  Music  Perception  28,  p.  59-­‐70.  

Trainor, L. J.; Gao, X.; Lei, J.; Lehtovaara, K.; Harris, L. R. (2009) - The primal role of the vestibular system in determining musical rhythm. Cortex, 45, p. 35-43.

   Vindras,  Philippe;  Desmurget,  Michel;  Prablanc,  Claude;  Viviani,  Paolo  (1998)  -­‐  Pointing  errors  reflect  biases  in  the  perception  of  the  initial  hand  position.  Journal  of  Neurophysiology  (Bethesda),  Vol.  79  (6).  June,  p.  3290-­‐3294.  


Vos, P.; Handel, S. (1987) - Playing triplets: facts and preferences. In: Action and Perception in Rhythm and Music, ed. A. Gabrielsson, Royal Swedish Academy of Music, Stockholm, pp. 35-47.

Wichmann, Elizabeth (1991) - Listening to Theatre: The Aural Dimensions of Beijing Opera. University of Hawai‘i Press, Honolulu, p. 4.

   Yang  Mu  (1994)  -­‐  Academic  Ignorance  or  Political  Taboo?  Some  Issues  in  China's  Study  of  Its  Folk  Song  Culture.  Ethnomusicology,  Vol.  38,  No.  2,  Music  and  Politics  (Spring  -­‐  Summer,  1994),  pp.  303-­‐  320.