Aesthetic judgments of music in experts and laypersons — An ERP study


International Journal of Psychophysiology 76 (2010) 40–51

Contents lists available at ScienceDirect

International Journal of Psychophysiology

journal homepage: www.elsevier.com/locate/ijpsycho


Mira Müller a,⁎, Lea Höfel a, Elvira Brattico b,c, Thomas Jacobsen a

a Institute of Psychology I, University of Leipzig, Leipzig, Germany
b Cognitive Brain Research Unit, Department of Psychology, University of Helsinki, Helsinki, Finland
c Center of Excellence for Interdisciplinary Music Research, University of Jyväskylä, Jyväskylä, Finland

⁎ Corresponding author. University of Leipzig, Institute of Psychology I, Cognitive including Biological Psychology, Seeburgstraße 14-20, D-04103 Leipzig, Germany. Tel.: +49 341 973 5908; fax: +49 341 973 5969.

E-mail address: [email protected] (M. Müller).

0167-8760/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.ijpsycho.2010.02.002

Article info

Article history: Received 8 July 2009; received in revised form 22 January 2010; accepted 2 February 2010; available online 11 February 2010

Keywords: Judgment processes; Evaluation; Aesthetics; Music perception; Music aesthetics; Preferences; ERP; EEG; Expertise

Abstract

We investigated whether music experts and laypersons differ with regard to aesthetic evaluation of musical sequences. Sixteen music experts and 16 music laypersons judged the aesthetic value (beauty judgment task) as well as the harmonic correctness (correctness judgment task) of chord sequences. The sequences consisted of five chords, with the final chord sounding congruous, ambiguous or incongruous relative to the harmonic context established by the preceding four chords. On behavioural measures, few differences were observed between experts and laypersons. However, several differences in event-related potential (ERP) parameters were observed between experts and laypersons in auditory, cognitive and aesthetic processing of chord cadences. First, established ERP effects known to reflect the processing of harmonic rule violations were investigated. Here, differences between the groups were observed in the processing of the mild violation; in addition, experts and laypersons differed in their early brain responses to the beginning of the chord sequence. Furthermore, ERP data indicated distinctions between experts and laypersons in aesthetic evaluation at three different stages. Firstly, during the interval of task-cue presentation, a stronger contingent negative variation (CNV) to the beauty judgment task was observed for experts, indicating that experts invest more effort into preparation for aesthetic processes than into correctness judgments. Secondly, during the first four chords, preparation for the correctness judgment required more exertion on the laypersons' side. Thirdly, during the last chord, laypersons showed a larger late and widespread positivity for the beauty compared to the correctness judgment, indicating a stronger reliance on internal affective states while forming a judgment.

1. Introduction

Most people like or even love music. This positive hedonic attitude towards music has been observed throughout many epochs and cultures as well as in people of all ages (Brattico et al., 2009; McDonald and Stewart, 2008; Peretz, 2006; Trehub and Hannon, 2006). Liking or loving music means feeling appreciation for objects of art which, by default, are valued not for their utilitarian qualities but for something less tangible, namely their aesthetic qualities. What is commonly understood by the aesthetic value of music? This question has recently been addressed empirically by Istók et al. (2009) using a verbal association task. They observed that the adjective “beautiful” lies at the core of this concept and thus represents the optimal linguistic device for expressing the aesthetic value of music. In addition, differences were found between music experts' and laypersons' concepts of music aesthetics. Laypersons produced adjectives related to mood or mood regulation more frequently than music experts. For music experts, the stimulating character of music, as well as its novelty and originality, seems to be of greater importance, as reflected in their inclination to list the adjectives ‘varying’ and ‘original’ more frequently than laypersons.

Further evidence for differences between music experts and laypersons comes from studies focusing on the outcome of the aesthetic judgement process, the verdict itself. For example, Smith and Melara (1990) observed that experts prefer unusual chord sequences more than laypersons do. Crozier (1974) also reports that music preferences vary with the level of musical training. These results, together with those reported by Istók et al. (2009), suggest that music experts and laypersons might also differ with regard to the processes underlying aesthetic judgements of music. The present study aims to explore these processes and how they are moderated by different levels of musical expertise.

Moreover, neuroscientific studies provide strong evidence for a significant impact of music training on cognitive processes connected with music perception (Bigand and Poulin-Charronnat, 2006). This strengthens the hypothesis, also put forward by Brattico and Jacobsen (2009), that the process of judging music aesthetically would be similarly affected by brain characteristics acquired through music training.

With regard to perceptual and cognitive aspects of music processing, differences between experts and laypersons have been observed, for example, in electrophysiological studies. Selected P2 and ERAN results are mentioned here; for broader reviews of differences between music experts and laypersons, see Hannon and Trainor (2007) and Tervaniemi (2009). Shahin et al. (2003) report effects of musical expertise on the P2: they observed larger P2 amplitudes in professional violinists and skilled pianists than in students without musical training, in response to violin and piano tones. Generators of the P2 have been located in primary and secondary auditory cortices (Shahin et al., 2003) as well as in the anterior cingulum (Baumann et al., 2008), and generator activity has been shown to differ between musicians and non-musicians (Baumann et al., 2008). The enhancement of the P2 could indicate that, in these areas, cortical representations of musical stimuli are extended and synaptic communication is enhanced through music training. The sensitivity of the P2 to training has been confirmed by other researchers (e.g., Atienza et al., 2002; Bosnyak et al., 2004; Kuriki et al., 2006; Tremblay et al., 2001).

Another reliable indicator of the influence of musical expertise on cognitive music processing is the Early Right Anterior Negativity (ERAN). The ERAN reaches its maximum amplitude approximately 200 ms after the onset of the decisive chord and is observed at right anterior recording sites. It denotes the difference between the ERPs elicited by harmonically unexpected and harmonically expected events. The ERAN is present even in laypersons but is more pronounced in music experts (Koelsch et al., 2002), emphasising the influence of training on brain functioning.
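To make the definition of the ERAN concrete, the following Python sketch computes an unexpected-minus-expected difference wave and locates its most negative point near 200 ms. It is illustrative only: the 150 to 250 ms search window, the 100 ms pre-chord baseline and the helper names (`difference_wave`, `peak_negativity`) are assumptions, not the authors' analysis code; the 512 Hz sampling rate is taken from the recording description in Section 2.3.

```python
import math

FS = 512      # Hz; sampling rate of the EEG recordings (Section 2.3)
PRE_MS = 100  # assumed pre-chord baseline, as in the ERAN epochs (Section 2.4.4)

def sample_to_ms(i, fs=FS, pre_ms=PRE_MS):
    """Latency (ms, relative to chord onset) of sample i in an epoch starting at -pre_ms."""
    return i * 1000.0 / fs - pre_ms

def difference_wave(unexpected, expected):
    """Point-wise unexpected-minus-expected ERP; an ERAN appears as a negative deflection."""
    return [u - e for u, e in zip(unexpected, expected)]

def peak_negativity(diff, lo_ms=150.0, hi_ms=250.0, fs=FS, pre_ms=PRE_MS):
    """Latency and amplitude of the most negative sample inside [lo_ms, hi_ms]."""
    window = [(amp, sample_to_ms(i, fs, pre_ms))
              for i, amp in enumerate(diff)
              if lo_ms <= sample_to_ms(i, fs, pre_ms) <= hi_ms]
    amp, latency = min(window)   # tuples compare by amplitude first
    return latency, amp

# Synthetic check: a -2 µV Gaussian dip centred on 200 ms in the unexpected condition.
n = int(0.6 * FS)                       # 600 ms epoch (100 ms baseline + 500 ms)
expected = [0.0] * n
unexpected = [-2.0 * math.exp(-((sample_to_ms(i) - 200.0) / 30.0) ** 2) for i in range(n)]
latency, amp = peak_negativity(difference_wave(unexpected, expected))
```

With real data, `expected` and `unexpected` would be the averaged ERPs at a right anterior electrode, and the peak latency is resolved only to the nearest sample (about 2 ms at 512 Hz).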

Empirical studies have rarely addressed whether musical training also influences brain functions that are not purely cognitive, such as aesthetic judgement processes. Aesthetic judgments are often characterized as “self-referential” (Jacobsen et al., 2006), which means that the system uses itself as a source of information to arrive at a verdict. This includes affective information, such as feelings of pleasure or displeasure elicited by the stimulus in question; information from affective sources is thus drawn upon during aesthetic judgment processes and incorporated into the judgment. Music is apt to elicit affective responses. This was first underscored neuroscientifically by Blood et al. (1999), who specified a neural network associated with affective responses to music, demonstrating that these responses are distinct from other cognitive processes important in music reception. However, the relevance of affective responses to music for aesthetic processing, and aesthetic music processing in general, has not been widely studied with neuroscientific methods. For instance, Gagnon and Peretz (2000) asked their participants to listen to tonal and atonal melodies and to judge whether they sounded pleasant or unpleasant. Responses to the pleasant melodies were faster when the melodies were presented to the right ear, and thus primarily to the left hemisphere, whereas responses to the unpleasant melodies were faster when they were presented to the left ear, and thus primarily to the right hemisphere (for similar EEG results, see Altenmüller et al., 2002). When participants were asked for descriptive judgments (judging whether the melodies were tonal or atonal), this lateralization pattern was not observed. This provides evidence that aesthetic appreciation is dissociable from structural evaluation; however, a between-subjects design was used. No significant differences between experts and laypersons were observed with regard to this pattern of results. Brattico et al. (2003)¹ used the event-related potential (ERP) method to investigate the neural correlates of aesthetic evaluative versus non-evaluative judgment processes for musical chord sequences, exclusively in music laypersons. They introduced two tasks in their design, one of a clearly evaluative nature (liking or disliking) and another, more descriptive task, both applied to piano chord sequences that adhered more or less to the rules governing Western tonal music. The ERP results showed that preparatory processes had already occurred before the onset of the decisive last chord, with more neural effort devoted to the descriptive task than to the evaluative task.

¹ A full account of this study has been submitted for publication: Brattico, E., Jacobsen, T., De Baene, W., Tervaniemi, M., submitted. Subjective correctness versus liking judgments of music — an ERP study.

Similar designs comparing descriptive and evaluative judgments have been used in ERP studies of evaluative processes concerning various subject matters (Cacioppo et al., 1994; Crites et al., 1995; Jacobsen and Höfel, 2003; Schupp et al., 2000a, 2000b) and also of affect regulation (Hajcak et al., 2006; Moser et al., 2006). In the study by Hajcak et al. (2006), a visual stimulus was presented and participants were instructed to perform a more or less affective judgment task. The Late Positive Potential (LPP), an ERP component, proved to be sensitive to this task manipulation: larger positive amplitudes were observed when participants carried out the affective evaluation task than when they carried out the descriptive judgment task. Like these affective evaluations, aesthetic judgments also draw on affect as a source of information.

1.1. Present study

To gain more insight into aesthetic evaluative processes and into how they are affected by musical expertise, the present study combined behavioural and electrophysiological measures. Firstly, in line with the results reported by Gagnon and Peretz (2000) and by Altenmüller et al. (2002), stronger lateralization effects are expected for the beauty judgment task than for the correctness judgment task. Secondly, the affective aspects of music aesthetic processes might be reflected in enhanced LPP amplitudes. Underlying this hypothesis is Meyer's (1956) claim that the confirmation and disconfirmation of harmonic expectancies elicit affective responses. Empirically, this fundamental question has been addressed and answered in the affirmative by Steinbeis et al. (2006). Since musical material elicits affective processes, the role this affective information plays in judgment processes can be investigated. Thirdly, in addition to LPP analyses, ERPs will also be analysed for the presence of the contingent negative variation (CNV), a slow negative potential with a shallow slope that has been linked to processes prior to task execution initiated by the presentation of a cueing stimulus (Brunia and Damen, 1988; Gomez et al., 2007; Walter et al., 1964). Whereas earlier studies pointed to a relation between the CNV and expectancy and motor preparation, later studies suggest that the proportions of sensory, cognitive and motor preparation reflected in the CNV depend on the task at hand (Birbaumer, 1990). Recently, a connection has been observed between the CNV and task difficulty, as well as the amount of effort invested in the preparation and execution of a task (Falkenstein et al., 2003; Lorist et al., 2000). In the context of the present study, it can be assumed that the preparation processes for the two tasks differ.
Fourthly, it is assumed that music expertise affects the outcome of the judgment process (the verdicts themselves), as indicated by the behavioural studies on music aesthetics mentioned above. For instance, experts may judge unusual harmonies as more beautiful than laypersons do. Fifthly, we expect that differential cognitive processing of the music stimuli will be reflected in the P2 and ERAN, replicating previous data. Finally, no specific hypotheses are formulated concerning differences in aesthetic neural processing according to musical expertise, as there are no prior results to draw upon.


2. Materials and methods

2.1. Participants

In total, 39 normal-hearing, healthy participants aged 19–34 years took part in the experiment for partial fulfillment of course requirements or monetary compensation. The participants reported normal or corrected-to-normal visual acuity and no known neurological condition, and none were taking any medication that might affect the central nervous system. Written informed consent was obtained from each participant at the beginning of the experiment. Subsequently, handedness was determined with the Edinburgh inventory (Oldfield, 1971). The data of seven participants had to be excluded from the analysis because of technical problems (in one case, malfunctioning of the data recording system; in four cases, malfunctioning of too many essential electrodes; in two cases, artifacts due to sweating). Of the remaining 32 participants, 16 (10 females, mean age 23.9 years, SD=3.84, all right-handed) were advanced students of musicology (having completed at least two years of their university studies), forming the group of music experts, and 16 (8 females, mean age 25.4 years, SD=3.03, two left-handed) were students of various other fields not related to music, forming the group of music laypersons. Mean age and distribution of sexes did not differ significantly between the two groups. All music experts reported having received instruction in music theory and being able to play one or more instruments, with average practice times amounting to 8.2 h per week and, on average, 5.1 years of private music lessons. In the group of music laypersons, only 25% had received instruction in music theory and 56% played one or more instruments, with an average practice time in this subgroup of 1.7 h per week and, on average, 1.1 years of private music lessons.

2.2. Stimulus material

A total of 180 four-part piano sequences, each consisting of four triads preceding a final chord, were used. There were two basic types of four-triad sequences preceding the ending chords in the cadences: IV, II, VI, VII or I, IV, II, V. From these two types, ten four-chord sequences were composed, varying the inversions and registers of each chord in the sequence. The 10 sequence types were then combined with 6 different types of ending chords from three different categories. The chords occurring in the final position were varied in their harmonic congruency with the context established by the first four chords, thereby creating 3 stimulus categories: stimuli with congruous, ambiguous or incongruous ending chords. Transposing the 30 original cadences into 6 keys, from B flat major to E flat major, yielded 180 different stimuli (Fig. 1).

Fig. 1. Example sequence with its three possible ending chords that render it either congruous, ambiguous or incongruous.

The chord cadences were produced with the Reason software (a sequencing program that contains a virtual sampler; Propellerhead Software, Stockholm, Sweden). The sounds (mono, 44,100 Hz sampling rate) were sampled from a grand piano (volumes were kept natural and thus not fully calibrated) and were edited with cutoff-frequency filtering and EQ. The grand piano was multi-sampled: every third key was sampled (an octave consists of four samples).

2.3. Apparatus and electrophysiological recordings

An electrically shielded and sound-attenuated experimental chamber (International Acoustic Company) was used. The visual display was shown on a 15-inch flat screen, and Matlab 7.0 (Cogent toolbox) was used to run the experiment. The auditory stimuli were presented binaurally over headphones (with loudness within a given chord sequence ranging between 54 and 68 dB SPL). Matlab registered judgment responses and latencies, which were given with the right and left buttons of a four-button response keyboard from 200 ms after the onset of the last chord up to 2000 ms after stimulus presentation. Using a BIOSEMI Active-Two amplifier system, EEG was recorded continuously with Ag/AgCl electrodes from 128 locations radially equidistant from Cz according to the ABC layout (http://www.biosemi.com), which roughly corresponds to the 10–5 extension of the international 10–20 system (Oostenveld and Praamstra, 2001). Electrodes were mounted in a nylon cap. Additional electrodes were placed at the tip of the nose, which served as reference, and at the left and right mastoid sites. Electrooculographic activity was recorded from two bipolar channels: the vertical EOG from supra- and infraorbital electrodes at the right eye, and the horizontal EOG from electrodes lateral to the outer canthi. EEG and EOG recordings were sampled at 512 Hz. Offline signal processing was carried out on computer workstations running EEP 3.1 (MPI-CNS; ANT-Software) under Linux.

2.4. Procedure

2.4.1. Design

Type of judgment task (beauty and correctness) and stimulus category (congruous, incongruous, and ambiguous) were fully crossed within participants. Thus, 180 beauty and 180 correctness judgments were required; each of the two task conditions included 60 congruous, 60 incongruous, and 60 ambiguous trials. The item sequence was pseudo-randomized under the constraint that a maximum of 3 trials of a given task and stimulus category could occur in sequence. The experiment consisted of 4 blocks of 90 trials each. Response key assignments (no/yes, yes/no) were counterbalanced across participants.
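The stimulus set and its randomization constraint can be sketched as follows. This is a hypothetical reconstruction, not the authors' Matlab/Cogent script. It assumes that the six keys are the major keys from B flat to E flat in semitone steps (B flat, B, C, D flat, D, E flat), that each of the 30 cadences belongs to exactly one ending-chord category (10 per category, which yields the stated 60 trials per category and task), and that the run-length constraint applies to task × category cells. It uses simple rejection sampling to satisfy the constraint.

```python
import random
from itertools import groupby

KEYS = ["Bb", "B", "C", "Db", "D", "Eb"]   # assumed: six major keys, B flat to E flat
CATEGORIES = ["congruous", "ambiguous", "incongruous"]
TASKS = ["beauty", "correctness"]

def build_stimuli(n_cadences=30):
    """(cadence, key, category) triples; cadence c is assumed to end in category c % 3."""
    return [(c, key, CATEGORIES[c % 3]) for c in range(n_cadences) for key in KEYS]

def condition(trial):
    """Task x stimulus-category cell of a (task, stimulus) trial."""
    task, (_, _, category) = trial
    return task, category

def longest_run(trials):
    """Length of the longest stretch of consecutive trials from the same cell."""
    return max(len(list(group)) for _, group in groupby(trials, key=condition))

def pseudo_randomize(trials, max_run=3, rng=None, max_tries=10_000):
    """Rejection sampling: reshuffle until no cell occurs more than max_run times in a row."""
    rng = rng or random.Random()
    order = list(trials)
    for _ in range(max_tries):
        rng.shuffle(order)
        if longest_run(order) <= max_run:
            return order
    raise RuntimeError("no admissible order found; relax the constraint")

stimuli = build_stimuli()                                      # 30 cadences x 6 keys = 180
trials = [(task, stim) for task in TASKS for stim in stimuli]  # each stimulus once per task
order = pseudo_randomize(trials, rng=random.Random(2010))
blocks = [order[i:i + 90] for i in range(0, len(order), 90)]   # 4 blocks of 90 trials
```

Rejection sampling is the simplest approach that is guaranteed to respect the constraint: with six cells of 60 trials each, a sizeable fraction of plain random shuffles already contains no run longer than three, so only a few reshuffles are typically needed.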

2.4.2. Trial structure

A white fixation cross appeared in the middle of the black screen for 800 ms. The fixation cross then disappeared and the cue (‘beautiful?’, ‘correct?’) was shown for 1200 ms. The viewing distance was 1.73 m, resulting in a visual angle of 1.2°. The question then disappeared and the fixation cross was shown until the end of the trial. 800 ms after the onset of this fixation cross, the stimulus sequence was presented binaurally via headphones for 3000 ms. Judgment responses and latencies were recorded starting 200 ms after the onset of the last chord and for 2000 ms afterwards. A variable, uniformly distributed interval (mean length, 200 ms; maximum length, 400 ms) was inserted between trials.
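The trial timing and viewing geometry can be checked with a short calculation. The sketch assumes that the 1.2° visual angle refers to the full extent of the cue text and that the inter-trial interval was drawn from a uniform distribution on 0 to 400 ms (consistent with the stated mean of 200 ms and maximum of 400 ms). The response window is not modelled, because the onset of the last chord within the 3000 ms sequence is not stated here. The function names are hypothetical.

```python
import math
import random

def extent_for_visual_angle(angle_deg, distance_mm):
    """Physical size (mm) that subtends angle_deg at a viewing distance of distance_mm."""
    return 2.0 * distance_mm * math.tan(math.radians(angle_deg) / 2.0)

def trial_events(rng):
    """Onsets (ms from trial start) of the stated trial phases, plus a sampled inter-trial interval."""
    phases = [("fixation", 800.0), ("cue", 1200.0), ("fixation", 800.0), ("chord_sequence", 3000.0)]
    onsets, t = [], 0.0
    for name, duration in phases:
        onsets.append((name, t))
        t += duration
    iti = rng.uniform(0.0, 400.0)   # assumed U(0, 400) ms: mean 200 ms, maximum 400 ms
    return onsets, t, iti

cue_size_mm = extent_for_visual_angle(1.2, 1730.0)   # roughly 36 mm at 1.73 m
onsets, sequence_end_ms, iti = trial_events(random.Random(0))
```

Under these assumptions, the chord sequence starts 2800 ms into the trial and the cue text is about 3.6 cm wide on the screen.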

2.4.3. Experimental session

The experiment was structured as follows. After the participants had completed the letter of agreement for participation, the electrodes were applied and the participants were seated in the dimly lit experimental chamber. Participants were instructed to listen to the sequences and, when the correctness task was cued, to decide whether the sequence sounded correct or incorrect, and, when the beauty task was cued, to decide whether it sounded beautiful or not beautiful. No further explanation was given. Furthermore, participants were instructed to answer honestly and quickly. During the experimental trials, they were asked to avoid excessive blinking.

Two blocks of eight practice trials were administered using sequences and a trial structure equivalent to those in the main experiment. After a short break, during which participants were asked whether they had any further questions and whether they had understood the tasks, the experiment was started. All participants reported having understood what was requested of them. An experimental session lasted approximately 50 min (plus additional time for electrode application and removal).

2.4.4. Data analysis

For the analysis of the behavioural responses, three additional variables were computed from the original answer frequencies. The first two are measures of inter-task answer concordance, and the third is a measure of answer stability. The inter-task answer concordance score relies on the fact that all 180 musical stimuli were presented twice, once for the correctness and once for the beauty judgment task, making it possible to check whether each stimulus received the same rating under both task demands. Thus, a new variable was constructed, which will be referred to as the “per-stimulus-concordance-score”. It received the value 1 when the same stimulus was judged as both beautiful and correct, or as both not beautiful and incorrect. If the ratings in the two tasks did not match, the value 0 was given. For each participant, a mean per-stimulus-concordance-score was calculated. Values range between 0 and 1, with low values signifying low and high values signifying high inter-task concordance. The per-stimulus-concordance-score was complemented by a second concordance score, the “per-cadence-concordance-score”, which was derived with a slightly different calculation. The 180 stimuli consisted of 30 different cadences transposed into six different keys. Analysis of the behavioural data indicated no significant influence of key on beauty judgments (Chi²=7.27, df=5, p=0.172) or correctness judgments (Chi²=3.70, df=5, p=0.593). Therefore, the six versions of one cadence differing only in key were collapsed into one category, resulting in 30 different categories. For each category, a mean beauty score and a mean correctness score were calculated by averaging the six ratings. A difference score was then calculated for each cadence by subtracting the mean score in the correctness judgment task from the mean score in the beauty judgment task. To increase comparability with the per-stimulus-concordance-score, the per-cadence-concordance-score was recoded so that values range from 0 to 1, with low values signifying low and high values signifying high inter-task concordance. It is reasonable to assume that both concordance scores are influenced by answer tendencies: if someone has a very high or very low percentage of yes answers, his/her concordance scores might be artificially increased. To take this into account, the variance shared by the concordance scores and the answer tendency score was removed from the concordance scores. To this end, two regression analyses were conducted with the concordance scores as dependent variables and answer tendency as the independent variable. The residuals were saved and used in further analyses instead of the original concordance scores.
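A minimal Python sketch of the two concordance scores and of the residualization step follows. The per-stimulus score follows the description directly. For the per-cadence score, the text does not state the exact recoding, so the sketch assumes concordance = 1 − |mean beauty − mean correctness| per cadence, averaged over the 30 cadences; answer tendency is likewise assumed to be a participant's proportion of yes answers. All function names are hypothetical.

```python
from statistics import mean

def per_stimulus_concordance(beauty, correct):
    """beauty[i], correct[i]: 1 = yes, 0 = no for stimulus i; share of matching answers."""
    return mean(1 if b == c else 0 for b, c in zip(beauty, correct))

def per_cadence_concordance(beauty, correct, n_keys=6):
    """Stimuli assumed ordered so that each run of n_keys entries is one cadence in all keys."""
    scores = []
    for start in range(0, len(beauty), n_keys):
        diff = mean(beauty[start:start + n_keys]) - mean(correct[start:start + n_keys])
        scores.append(1 - abs(diff))   # assumed recoding of the difference score to [0, 1]
    return mean(scores)

def residualize(y, x):
    """Residuals of an ordinary least-squares regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

In use, `residualize` would be applied across participants, e.g. `residualize(concordance_scores, yes_rates)` with one value per participant; by construction the residuals are uncorrelated with answer tendency and would replace the raw concordance scores in the group comparison.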
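The reported key-invariance statistics can be checked against the chi-square distribution. For odd degrees of freedom, the upper-tail probability has a closed form in terms of the normal distribution, which the following sketch implements with the Python standard library only (the function name is hypothetical). For Chi²=3.70 with df=5, it reproduces p≈0.593. For Chi²=7.27 with df=5, however, it gives p≈0.20, whereas the reported p=0.172 corresponds to Chi²≈7.72, so one of the two reported values may contain a transcription error; either way, the effect of key is not significant.

```python
import math

def chi2_sf_odd_df(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with odd df (closed form)."""
    if df < 1 or df % 2 != 1:
        raise ValueError("this closed form requires an odd, positive df")
    z = math.sqrt(x)
    density = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)   # standard normal pdf at z
    tail = math.erfc(z / math.sqrt(2.0))                       # equals 2 * (1 - Phi(z))
    term, series = z, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        series += term                 # adds z**(2k - 1) / (1 * 3 * ... * (2k - 1))
        term *= x / (2 * k + 1)
    return tail + 2.0 * density * series

for stat, reported_p in [(7.27, 0.172), (3.70, 0.593)]:
    print(f"Chi2={stat:.2f}, df=5: p={chi2_sf_odd_df(stat, 5):.3f} (reported {reported_p})")
```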

The third additional variable of interest is answer stability. As mentioned above, the same 30 cadences (transposed into different keys) were each presented six times with both tasks. If a cadence receives the same judgment every time it is presented, answer stability for this cadence is high. Thus, answer stability was calculated by averaging the six judgments for each cadence in each task. In total, 60 answer stability scores were derived in this manner (30 per task). Mean answer stability for each task was calculated by averaging the respective 30 answer stability scores. Before recoding, values range between 0 and 1, with 0 or 1 denoting the highest and 0.5 the lowest answer stability. After recoding, values range from 1 to 4, with 1 denoting the highest and 4 the lowest answer stability.
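Because each cadence is judged six times per task, its mean judgment p can only take the values 0, 1/6, ..., 1, so the distance from 0.5 has exactly four levels. One recoding consistent with the stated 1-to-4 range is 1 plus the number of minority answers (equivalently 4 − 6·|p − 0.5|). The following sketch assumes this mapping, which the text does not spell out; the function names are hypothetical.

```python
from statistics import mean

def stability_score(judgments):
    """Six yes/no (1/0) judgments of one cadence -> 1 (most stable) ... 4 (least stable)."""
    if len(judgments) != 6:
        raise ValueError("expected one judgment per key (six in total)")
    yes = sum(judgments)
    # With p = yes / 6, 1 + min(yes, no) equals 4 - 6 * |p - 0.5|.
    return 1 + min(yes, 6 - yes)

def mean_stability(cadence_judgments):
    """Mean stability over the cadences judged in one task (30 in the present design)."""
    return mean(stability_score(j) for j in cadence_judgments)
```

Using integer counts rather than the averaged proportion avoids floating-point comparisons against 0.5.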

Response time data were subjected to logarithmic transformation. Subsequently, all ensuing analyses were conducted twice: once with averages computed from transformed data and once with averages computed from untransformed data. The results obtained from transformed data did not differ in any way from those obtained from untransformed data; therefore, only the latter are reported.

All continuous EEG records were filtered offline with a band-pass finite impulse response (FIR) filter with the following specifications: 9275 points, a high-pass cutoff frequency of 0.1 Hz, and a low-pass cutoff frequency of 20 Hz. Artifacts were rejected using a standard deviation criterion of 40 μV in a sliding window of 200 ms, applied to the vertical EOG, horizontal EOG, Cz, and right and left mastoid channels. Contaminated epochs were excluded from further analysis. For comparison, and because both methods are theoretically plausible, two slightly different strategies were pursued in parallel to average the ERPs for the analysis of task differences during the last chord. In version A, wrong answers in the correctness judgment task were excluded, whereas in version B they were included. Figures presenting the results of these analyses are based on the data from version A. Statistics and mean values in the results section are reported for both versions. Different time windows were chosen for the subsequent analyses after artifact rejection. For the analysis of cue processing, epochs of 1300 ms including a 100 ms pre-cue baseline were averaged (excluding on average 40.70% of the epochs due to contamination). For the analysis of the first four chords, epochs of 2100 ms including a 100 ms pre-sequence baseline were averaged (excluding on average 24.78% of the epochs). For the analysis of task differences during the last chord, epochs of 1600 ms including a 100 ms pre-chord baseline were averaged (excluding on average 31.06% of the epochs), whereas, for the ERAN analysis, shorter epochs of 700 ms including a 100 ms pre-chord baseline were averaged to improve the signal-to-noise ratio (excluding on average 10.97% of the epochs).
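The sliding-window rejection criterion can be sketched as follows. This is an illustration under stated assumptions, not the EEP 3.1 implementation: the window advances one sample at a time, the population standard deviation is used, and an epoch is rejected as soon as any monitored channel exceeds 40 µV in any 200 ms window. The function name and the synthetic data are hypothetical.

```python
import math
from statistics import pstdev

FS = 512                     # Hz
WINDOW = round(0.200 * FS)   # 200 ms -> 102 samples
THRESHOLD_UV = 40.0          # µV

def epoch_is_contaminated(channels, window=WINDOW, threshold=THRESHOLD_UV, step=1):
    """channels: name -> list of samples (µV) for one epoch of the monitored channels.
    True if the SD inside any sliding window on any channel exceeds the threshold."""
    for samples in channels.values():
        last_start = max(len(samples) - window, 0)
        for start in range(0, last_start + 1, step):
            if pstdev(samples[start:start + window]) > threshold:
                return True
    return False

# Synthetic 10 Hz, 10 µV activity on two channels, then a 150 µV "blink" on VEOG.
n = 400
clean = {ch: [10.0 * math.sin(2 * math.pi * 10 * i / FS) for i in range(n)] for ch in ("VEOG", "Cz")}
blink = {ch: list(v) for ch, v in clean.items()}
for i in range(200, 260):
    blink["VEOG"][i] += 150.0
```

In a full pipeline, the monitored channels would be the vertical and horizontal EOG, Cz and both mastoids, and the check would run on the band-pass-filtered data.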

Grand averages were subsequently computed from the individual-subject averages. The ERP quantification routine consisted of several steps. Time windows were chosen after the inspection of difference waves. All analyses were based on individual mean amplitudes in the given time window at the respective electrode locations. For each of these time windows, a repeated-measures analysis of variance (ANOVA) was conducted with the factors “Task” (beauty judgment and correctness judgment), “Group” (experts and laypersons), “Anterior–posterior distribution” (anterior/posterior scalp location: F-line, C-line, P-line, and O-line), and “Laterality” (lateral scalp location: left [D5, D20, D29, A11], middle-left [C26, D14, A18, A15], midline [C21, A1, A19, A23], middle-right [C13, B20, A31, A28], and right [C5, B23, B13, B8]). That is, 20 electrodes were used in a 4 × 5 grid. Additionally, the factor “Answer” (yes, no) was entered into the analysis of ERPs to the last chord. For the analysis of the ERAN, the factor “Ending chord category” (congruous, incongruous, ambiguous) was included and the factors Task and Answer were excluded. For this analysis, the electric potential was averaged regardless of the participants' responses in the behavioural beauty and correctness judgment tasks. Subsequent ANOVAs were performed to further analyze the effects. Only significant effects are reported in detail. If applicable, error probabilities reflecting Greenhouse–Geisser (G-G) corrected degrees of freedom and G-G epsilon (ε) values are reported.

Table 1
Group means (SD) and t-tests for inter-task answer concordance scores (values range from 0 to 1) and answer stability scores (values range from 1 to 4, with 1 denoting the highest answer stability).

Measure                          Experts M (SD)   Laypersons M (SD)   t-test
Per-stimulus concordance         0.75 (0.11)      0.76 (0.09)         t(30)=0.16, p=0.877
Per-category concordance         0.82 (0.10)      0.83 (0.07)         t(30)=0.39, p=0.701
Answer stability (beauty)        1.77 (0.28)      1.81 (0.38)         t(30)=0.39, p=0.697
Answer stability (correctness)   1.79 (0.20)      1.92 (0.37)         t(23.01)=1.26, p=0.222

44 M. Müller et al. / International Journal of Psychophysiology 76 (2010) 40–51
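The quantification step (baseline correction followed by a mean amplitude in a fixed time window, computed per electrode) might look like this for one channel. The helper names and the 4-ms sampling interval used in the check are hypothetical; the 100-ms pre-stimulus baseline matches the epochs described above. The resulting per-electrode values for the 4 × 5 grid are what would enter the ANOVA.

```python
def baseline_correct(samples, times_ms, baseline=(-100, 0)):
    """Subtract the mean of the pre-stimulus baseline [start, end) from every sample."""
    base = [v for v, t in zip(samples, times_ms) if baseline[0] <= t < baseline[1]]
    offset = sum(base) / len(base)
    return [v - offset for v in samples]


def mean_amplitude(samples, times_ms, window):
    """Mean voltage over the closed interval [start, end] in ms."""
    start, end = window
    vals = [v for v, t in zip(samples, times_ms) if start <= t <= end]
    return sum(vals) / len(vals)


# Hypothetical single-channel epoch: constant 2-μV offset plus a 3-μV effect at 200-240 ms
times = list(range(-100, 301, 4))
raw = [5.0 if 200 <= t <= 240 else 2.0 for t in times]
corrected = baseline_correct(raw, times)
print(mean_amplitude(corrected, times, (200, 240)))  # 3.0
```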

To avoid confounding as described by Simpson's paradox, the factors Ending chord category and Answer were never entered together into the same repeated-measures ANOVA; this applied to analyses of both behavioural and electrophysiological data (Greenland et al., 1999; Simpson, 1951). In the case of response time analyses, if the levels of the factor Answer were collapsed to compute the main effect of the factor Ending chord category, the response time means would be distorted. The distortion is caused by large discrepancies in answer frequencies combined with considerable differences in response time means between the subcategories. For example, very few congruous stimuli received the rating incorrect or not beautiful. Answering in the negative to a congruous stimulus (neg_con), however, took considerably longer than answering affirmatively to it (pos_con). In a repeated-measures ANOVA, both means, pos_con and neg_con, would be weighted equally during averaging, disregarding the actual frequencies on which these means are based.
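A small numeric sketch of this weighting problem, using hypothetical trial counts and response times for the two congruous subcategories (the numbers are invented for illustration, not taken from the data): collapsing Answer by averaging cell means gives each cell equal weight, whereas the trial-level mean weights each cell by its frequency.

```python
def unweighted_mean(cells):
    """Average of cell means, each cell counted once (what collapsing the
    Answer factor in a repeated-measures ANOVA effectively does)."""
    return sum(mean for mean, _n in cells) / len(cells)


def trial_weighted_mean(cells):
    """Average over all trials, i.e. each cell mean weighted by its trial count."""
    total = sum(n for _mean, n in cells)
    return sum(mean * n for mean, n in cells) / total


# Hypothetical congruous-ending cells: (mean RT in ms, number of trials)
congruous = [(1200.0, 90),  # pos_con: frequent and fast
             (1600.0, 10)]  # neg_con: rare and slow
print(unweighted_mean(congruous))      # 1400.0
print(trial_weighted_mean(congruous))  # 1240.0
```

In this invented example the rare, slow neg_con cell pulls the equally weighted mean 160 ms above the trial-weighted mean, which is the kind of distortion the analysis strategy avoids.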

3. Results

3.1. Behavioural data

3.1.1. Answer frequencies

The frequency of positive answers differed according to stimulus categories. Congruous sequences received the highest proportion of positive judgments (M=79.73%, SD=16.52%), whereas incongruous sequences were less frequently judged as beautiful or correct (M=19.69%, SD=17.38%), and ambiguous sequences received an intermediate level of positive answers (M=49.98%, SD=18.95%). This pattern of results was significant in a Kendall's W test (W=0.97, p<0.001). An exploratory repeated-measures ANOVA comprising the factors Ending chord category, Task and Group was computed without prior transformation of the data, as none of the variables entered violated the criteria for normal distribution in either group. Apart from the significant main effect of the factor Ending chord category, there was a significant interaction between the factors Task and Group (F(1,30)=5.43, p=0.027). As revealed by a separate Wilcoxon test for the group of laypersons, the frequency of positive judgments was significantly higher (Z=−2.17, p=0.030) in the correctness judgment task (52.17% positive answers) than in the beauty judgment task (44.33% positive answers). For the group of experts, this pattern was descriptively reversed, with 51.44% positive answers in the beauty judgment task and 47.25% in the correctness judgment task; however, this effect was not significant for the experts (Z=−0.57, p=0.569). According to Cohen (1988), the group difference observed here can be regarded as a large effect, because its effect size exceeded 0.8 (Cohen's d=0.82).

To test whether experts have stronger aesthetic preferences for unusual chord functions than laypersons, Wilcoxon–Mann–Whitney tests were computed comparing the two groups' proportions of positive answers to ambiguous and incongruous sequences. Experts responded affirmatively to the beauty question in 50.97% (SD=18.47%) of the trials with ambiguous sequences, compared to 40.94% (SD=18.40%) in laypersons. This group difference was not significant (U=85, W=221, p=0.105). The difference between the groups was even smaller for the proportion of positive answers to the beauty question when incongruous sequences were presented: experts judged incongruous chord sequences as beautiful in 20.35% (SD=17.00%) of the cases and laypersons in 17.35% (SD=18.83%) (U=104, W=240, p=0.381). Differences between experts and laypersons were also examined for further behavioural measures based on answer frequency. As shown in Table 1, no group differences were observed for either inter-task answer concordance score. The same is true for the answer stability scores.
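The reported statistics are internally consistent: with n = 16 per group, the rank-sum statistic W equals U + n(n + 1)/2 = U + 136, so U = 85 gives W = 221 and U = 104 gives W = 240. The sketch below computes both from raw scores, using midranks for ties. Which group's rank sum the authors reported as W is not stated, so treating it as the rank sum of `x` is an assumption, and the function names are illustrative.

```python
def rank_with_ties(values):
    """Return 1-based midranks for a list of values (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks


def mann_whitney(x, y):
    """Return (U, W) for sample x.

    W is the rank sum of x in the pooled data; U = W - n_x (n_x + 1) / 2 counts
    the (x, y) pairs in which the x value is larger, with ties counted as 1/2.
    """
    ranks = rank_with_ties(list(x) + list(y))
    w = sum(ranks[:len(x)])
    u = w - len(x) * (len(x) + 1) / 2
    return u, w
```

For two groups of 16, `w - u` is always 136, which is exactly the offset between the U and W values reported above.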

3.1.2. Response times

On average, participants responded 34 ms faster on trials demanding beauty judgments (M=1264.50 ms, SD=316.24 ms) than on trials demanding correctness judgments (M=1298.29 ms, SD=309.07 ms). This difference was significant (F(1,30)=8.50, p=0.007) in a repeated-measures ANOVA comprising the factors Task, Answer and Group. Furthermore, participants responded 66 ms faster when their response was negative than when it was positive. This main effect of the factor Answer was also significant (F(1,30)=12.38, p=0.001). No other significant effects were obtained in this ANOVA.

A second repeated-measures ANOVA was conducted, comprising the factors Task, Ending chord category and Group. Again, the main effect of the factor Task was significant (see above). There was also a significant main effect of the factor Ending chord category (F(2,60)=5.77, p=0.012, ε=0.716). Pairwise comparisons revealed that response times to ambiguous chord sequences were significantly slower than response times to both congruous (p=0.032) and incongruous chord sequences (p<0.001). Response times to congruous sequences did not differ significantly from response times to incongruous sequences (p=0.401). No other significant effects were observed.
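The Greenhouse–Geisser correction multiplies both degrees of freedom of the F test by ε; for the Ending chord category effect above, F(2,60) with ε=0.716 is evaluated on roughly 1.43 and 42.96 degrees of freedom. The sketch below shows the standard Box/Greenhouse–Geisser estimator computed from a covariance matrix of the repeated-measures levels. It is an illustration, not the software the authors used: a mixed design like this one would use a pooled within-group covariance matrix, which this single-group sketch omits, and the function names are hypothetical.

```python
def sample_covariance(data):
    """Covariance matrix (n - 1 denominator) of an n_subjects x k_levels table."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in data) / (n - 1)
             for b in range(k)] for a in range(k)]


def gg_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix.

    Double-centres the matrix (subtract row and column means, add the grand mean),
    then epsilon = trace**2 / ((k - 1) * sum of squared entries).
    Ranges from 1 / (k - 1) (maximal violation) to 1 (sphericity).
    """
    k = len(cov)
    row = [sum(r) / k for r in cov]
    col = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row) / k
    dc = [[cov[i][j] - row[i] - col[j] + grand for j in range(k)] for i in range(k)]
    trace = sum(dc[i][i] for i in range(k))
    sq = sum(v * v for r in dc for v in r)
    return trace ** 2 / ((k - 1) * sq)


def corrected_df(df_effect, df_error, eps):
    """Scale both degrees of freedom of the F test by epsilon."""
    return df_effect * eps, df_error * eps
```

A compound-symmetric covariance matrix (equal variances, equal covariances) yields ε = 1, so no correction is applied; the further the variances of the difference scores diverge, the closer ε moves to its lower bound.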

3.2. Electrophysiological data

3.2.1. P2

The P2 component elicited by the onset of the first chord in the sequence differed between experts and laypersons. As shown in Fig. 2, at FZ the laypersons' P2 peaked 186 ms after the onset of the sequence with an amplitude of 4.64 μV, whereas for experts the highest amplitude measured at this recording site amounted to 7.83 μV at 180 ms. The difference at FZ between the two groups was maximal at 156 ms, amounting to 3.75 μV. To calculate the mean amplitude of the P2, the time window from 100 to 250 ms was chosen.
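Peak latencies such as the 156-ms maximum of the group difference at FZ are read off a difference wave. A minimal sketch, assuming both waveforms share one time axis; `difference_wave` and `peak_in_window` are hypothetical names, and `polarity=-1` is what one would use for a negativity such as the ERAN (incongruous minus congruous).

```python
def difference_wave(a, b):
    """Point-by-point difference a - b of two waveforms on the same time axis."""
    return [x - y for x, y in zip(a, b)]


def peak_in_window(samples, times_ms, window, polarity=+1):
    """Return (latency_ms, amplitude) of the largest positive (polarity=+1)
    or most negative (polarity=-1) value inside [start, end]."""
    start, end = window
    candidates = [(t, v) for t, v in zip(times_ms, samples) if start <= t <= end]
    return max(candidates, key=lambda tv: polarity * tv[1])
```

Usage: `peak_in_window(difference_wave(experts_fz, laypersons_fz), times, (100, 250))` would return the latency and amplitude of the largest group difference in the P2 window, where `experts_fz` and `laypersons_fz` stand for hypothetical grand-average waveforms.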

Fig. 2. Grand averages of the ERPs recorded at FZ during the presentation of the first four chords for experts (dotted line) and laypersons (dashed line). Time window (100 to 250 ms) for P2 quantification marked in light gray. Scalp map of voltage distribution of the group difference for this time window.


Taking all electrodes in the grid into account, the mean amplitude of the P2 was higher in experts (M=2.20 μV) than in laypersons (M=0.16 μV). This between-subjects effect was significant (F(1,30)=13.84, p=0.001) in a repeated-measures ANOVA including the factors Group, Laterality and Anterior–posterior distribution. The effect size (Cohen's d) of this group difference amounted to 1.30; the effect is therefore considered to be large. No differential distribution pattern of voltages for the two groups was observed, with the highest positive amplitudes recorded at right fronto-temporal and fronto-central electrodes (main effect of Anterior–posterior distribution: F(3,90)=7.86, p=0.006, ε=0.397; main effect of Laterality: F(4,120)=2.76, p=0.066, ε=0.549; and interaction of Laterality and Anterior–posterior distribution: F(12,360)=8.55, p<0.001, ε=0.392).

3.2.2. ERAN to ending chords

An ERAN to incongruous sequences was observed. As shown in Fig. 3, at FZ the difference between the ERPs measured for incongruous and congruous ending chords reached its maximum 215 ms after the onset of the last chord of the sequence, with a peak amplitude of −1.16 μV. For statistical analysis, a time window from 200 to 240 ms was chosen. In this time window, the mean amplitude for incongruous ending chords was on average 0.83 μV (Mdiff) more negative than for congruous ending chords. This main effect of Ending chord category was significant (F(1,30)=5.71, p=0.023) in a repeated-measures ANOVA comprising the factors Ending chord category, Laterality, Anterior–posterior distribution and Group. For experts this effect was larger (peak amplitude of the difference wave at FZ=−1.55 μV, latency=217 ms, Mdiff=−1.21 μV) than for laypersons (peak amplitude of the difference wave at FZ=−0.78 μV, latency=213 ms, Mdiff=−0.48 μV). However, the interaction between the factors Ending chord category and Group was not significant (F(1,30)=1.15, p=0.292). The ERAN was especially pronounced at central and anterior recording sites, as confirmed by a significant interaction between the factors Ending chord category and Anterior–posterior distribution (F(3,90)=4.97, p=0.022, ε=0.447). In the anterior portion of the scalp, it was also slightly lateralized to the right, as shown by the interaction between the factors Ending chord category, Anterior–posterior distribution and Laterality (F(12,360)=1.72, p=0.098, ε=0.646). Further significant results showed that, independent of ending chord category, voltages in this time window were not homogeneously distributed over the scalp: more positive mean amplitudes were recorded from fronto-central recording sites and more negative voltages from posterior and peripheral electrodes. This was confirmed by significant main effects of the factors Laterality (F(4,120)=26.89, p<0.001, ε=0.623) and Anterior–posterior distribution (F(3,90)=53.49, p<0.001, ε=0.701). There was also a significant interaction between the factors Laterality and Anterior–posterior distribution, indicating a stronger centralization for the anterior half of the scalp than for the posterior half (F(12,360)=6.71, p<0.001, ε=0.535). Furthermore, experts showed larger posterior negativities than laypersons, as revealed by a significant interaction between the factors Anterior–posterior distribution and Group (F(3,90)=6.52, p=0.002, ε=0.623).

Fig. 3. Grand averages of the ERPs recorded at FZ for congruous (dotted line) and incongruous (dashed line) ending chords in separate voltage–time diagrams for experts (left side of the figure) and laypersons (right side of the figure). Time window (200 to 240 ms) for ERAN quantification marked in light gray. Scalp maps of voltage distribution of the difference between congruous and incongruous ending chords during this time window for both groups.

At FZ, for laypersons the peak amplitude of the difference wave between ambiguous and congruous ending chords was approximately 0 μV at a latency of 215 ms, whereas for experts a peak in the difference wave was observed at 213 ms with an amplitude of −1.06 μV. In a repeated-measures ANOVA comprising the factors Ending chord category, Laterality, Anterior–posterior distribution and Group, the main effect of the factor Ending chord category was not significant (F(1,30)=0.17, p=0.682), indicating that there was no general ERAN effect. However, there was an interaction between the factors Ending chord category and Group (F(1,30)=2.97, p=0.095). Furthermore, an interaction between the factors Ending chord category, Laterality and Anterior–posterior distribution (F(12,360)=1.95, p=0.071, ε=0.523) and an interaction between the factors Ending chord category and Anterior–posterior distribution (F(3,90)=3.93, p=0.032, ε=0.572) were obtained, as ERAN amplitudes were highest at fronto-central and right fronto-temporal electrodes, as can be seen in Fig. 4. Following up the latter interaction with a separate ANOVA including only data recorded from anterior electrodes (D5, D20, C26, D14, C21, A1, C13, B20, C5, and B23) yielded no significant main effect of the factor Ending chord category (F(1,30)=1.64, p=0.210). However, the interaction between Ending chord category and Group was stronger for anterior electrodes (F(1,30)=5.61, p=0.024). Separate ANOVAs for each group revealed that an ERAN to ambiguous ending chords was observed at anterior recording sites only for experts (Mdiff=−0.83 μV, main effect of Ending chord category: F(1,15)=4.68, p=0.047) and not for laypersons (Mdiff=0.25 μV, main effect of Ending chord category: F(1,15)=1.03, p=0.327). The effect size (Cohen's d) of this group difference amounted to 0.83; the effect is therefore considered to be large. Further significant results were comparable to those obtained in the analysis of the incongruous ending chords. There were also significant main effects of the factors Laterality (F(4,120)=31.86, p<0.001, ε=0.621) and Anterior–posterior distribution (F(3,90)=42.52, p<0.001, ε=0.526), as well as significant interactions between the factors Anterior–posterior distribution and Group (F(3,90)=3.00, p=0.071, ε=0.526) and between the factors Laterality and Anterior–posterior distribution (F(12,360)=5.35, p<0.001, ε=0.613).

Fig. 4. Grand averages of the ERPs recorded at FZ for congruous (dotted line) and ambiguous (dashed line) ending chords in separate voltage–time diagrams for experts (left side of the figure) and laypersons (right side of the figure). Time window (200 to 240 ms) for ERAN quantification marked in light gray. Scalp maps of voltage distribution of the difference between congruous and ambiguous ending chords during this time window for both groups.

3.2.3. Cue-interval

For the time window between 1600 and 1200 ms prior to the onset of the first chord of the sequence (400 ms into the cue presentation), it was tested whether the ERP responses to the two cues ‘beautiful’ versus ‘correct’ differed in amplitude (see Fig. 5). On average, the difference in mean amplitude between the two tasks amounted to 0.34 μV. In a repeated-measures ANOVA comprising the factors Task, Laterality, Anterior–posterior distribution and Group, this main effect of the factor Task was not significant (F(1,30)=0.578, p=0.453). However, for experts we observed a lower positive mean amplitude in the beauty judgment task (M=0.38 μV) than in the correctness judgment task (M=1.65 μV), whereas for laypersons this pattern was reversed, with a higher positive mean amplitude in the beauty judgment task (M=2.33 μV) than in the correctness judgment task (M=1.73 μV). This interaction between the factors Task and Group was significant (F(1,30)=4.39, p=0.045). Following up this interaction with separate ANOVAs for each group revealed that the main effect of the factor Task was only significant for the experts (F(1,15)=3.61, p=0.077) but not for the laypersons (F(1,15)=1.03, p=0.327). The effect size (Cohen's d) of this group difference amounted to 0.74; the effect is therefore considered to be of medium size. Furthermore, the general ANOVA including all aforementioned factors yielded task-independent effects pertaining to the distribution of voltages measured across the scalp. There were significant main effects of the factors Laterality (F(4,120)=2.64, p=0.089, ε=0.428) and Anterior–posterior distribution (F(3,90)=8.53, p=0.002, ε=0.472), as well as significant interactions between the factors Laterality and Anterior–posterior distribution (F(12,360)=9.00, p<0.001, ε=0.477) and between the factors Laterality and Group (F(4,120)=2.9, p=0.072, ε=0.428). For laypersons, higher positive amplitudes were measured at right-hemispheric than at left-hemispheric recording sites, independent of which task was cued. This was confirmed by a significant main effect of the factor Laterality in a separate ANOVA for the group of laypersons (F(4,60)=5.72, p=0.014, ε=0.390), whereas experts did not show this distributional pattern (F(4,60)=0.10, p=0.855, ε=0.389).

Fig. 5. Grand averages of the ERPs recorded at PZ during the presentation of the cue for the beauty judgment task (dotted line) and the correctness judgment task (dashed line) in separate voltage–time diagrams for experts (left side of the figure) and laypersons (right side of the figure). Time window (−1600 to −1200 ms) for quantification of the ERP effect marked in light gray. Scalp maps of voltage distribution of the difference between the cue for the beauty and the cue for the correctness judgment task during this time window for both groups.
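The effect sizes in this section are classified with Cohen's (1988) benchmarks of 0.2, 0.5 and 0.8. The text does not say which variant of d was computed, so this sketch assumes the common pooled-standard-deviation form applied to one score per participant in each group (for example, a per-participant task-difference amplitude); the function names are illustrative.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled (n - 1) standard deviation."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def cohen_label(d):
    """Cohen's (1988) conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"
```

With these benchmarks, the values reported above (1.30, 0.83 and 0.82 large; 0.74 and 0.68 medium) fall into the categories the text assigns to them.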

3.2.4. First four chords

For the time window comprising the first 300 ms of the third chord (from 1000 to 1300 ms after the onset of the chord sequence), no significant general mean amplitude difference between the beauty and the correctness judgment task was observed. This was confirmed by a repeated-measures ANOVA including the factors Task, Laterality, Anterior–posterior distribution and Group (F(1,30)=1.07, p=0.310). The interaction between the factors Task and Group was, similarly, not significant (F(1,30)=1.53, p=0.227). However, laypersons showed task differences at midline electrodes whereas experts did not (see Fig. 6). This was confirmed by a significant interaction between the factors Task, Laterality and Group (F(4,120)=3.41, p=0.040, ε=0.499), by an ensuing separate ANOVA exclusively including midline electrodes (C21, A1, A19, and A23), which revealed a significant interaction between the factors Task and Group (F(1,30)=3.66, p=0.065), and by separate ANOVAs for each group. The main effect of the factor Task was significant at midline electrodes only for laypersons (F(1,15)=5.98, p=0.027); for experts it was not significant (F(1,15)=0.001, p=0.974) (see Fig. 6). The effect size (Cohen's d) of this group difference amounted to 0.68; the effect is therefore considered to be of medium size.

Independent of the factors Task and Group, maximal negative voltages were measured at fronto-central electrodes. This was statistically confirmed by significant main effects of the factor Laterality (F(4,120)=13.32, p<0.001, ε=0.568) and the factor Anterior–posterior distribution (F(3,90)=67.05, p<0.001, ε=0.509).

3.2.5. Last chord

For the time window between 600 ms and 1200 ms after the onset of the last chord of the sequence, a mean amplitude difference between the two tasks was observed. It amounted to 0.81 μV (averaging version A) or 1.12 μV (averaging version B), respectively, and was significant in a repeated-measures ANOVA including the factors Task, Answer, Laterality, Anterior–posterior distribution and Group (A: F(1,30)=4.14, p=0.051; B: F(1,30)=8.06, p=0.008). For laypersons, the mean amplitude was 3.91 μV for the beauty judgment task and 2.63 μV (A) or 2.42 μV (B), respectively, for the correctness judgment task, whereas the difference was smaller for the experts, with 3.54 μV in the beauty judgment task and 3.20 μV (A) or 2.80 μV (B), respectively, in the correctness judgment task (see Fig. 7).
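The time windows in this section are given relative to different reference points. They line up if the chords follow each other at a 500-ms onset asynchrony, which is inferred here from the 2100-ms epochs (100-ms baseline plus four chords) and from the 2600 to 3200 ms window in the Fig. 7 caption rather than stated in this section. Under that assumption, a small conversion helper with hypothetical names reproduces the windows:

```python
CHORD_SOA_MS = 500  # assumed chord onset asynchrony, inferred rather than stated here


def chord_onset_ms(chord_index):
    """Onset of chord 1..5 relative to the onset of the first chord."""
    return (chord_index - 1) * CHORD_SOA_MS


def to_sequence_time(chord_index, window_ms):
    """Re-express a window given relative to one chord's onset
    relative to the onset of the whole sequence."""
    onset = chord_onset_ms(chord_index)
    return onset + window_ms[0], onset + window_ms[1]


print(to_sequence_time(3, (0, 300)))     # (1000, 1300): third-chord window
print(to_sequence_time(5, (600, 1200)))  # (2600, 3200): last-chord window
```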

In separate ANOVAs for each group, only the laypersons showed a significant main effect of the factor Task (A: F(1,15)=6.41, p=0.023; B: F(1,15)=10.25, p=0.006), whereas experts did not (A: F(1,15)=0.31, p=0.587; B: F(1,15)=1.38, p=0.258). The interaction between the factors Task and Group, however, was not significant in the general ANOVA (A: F(1,30)=1.38, p=0.249; B: F(1,30)=0.88, p=0.355). The effect size (Cohen's d) of this group difference amounted to 0.18 (A) and 0.27 (B); the effect is therefore considered to be small. Furthermore, there was a significant interaction between the factors Task and Laterality (A: F(4,120)=3.37, p=0.025, ε=0.695; B: F(4,120)=3.56, p=0.020, ε=0.695), which disappeared for the group of laypersons in the separate ANOVA (A: F(4,60)=0.67, p=0.551, ε=0.641; B: F(4,60)=0.72, p=0.523, ε=0.637) but remained significant for the experts (A: F(4,60)=4.34, p=0.014, ε=0.624; B: F(4,60)=4.56, p=0.011, ε=0.625) (see Fig. 7). However, these results should be regarded as exploratory, as they were not supported by a significant three-way interaction between the factors Task, Laterality and Group in the omnibus ANOVA (A: F(4,120)=0.85, p=0.464, ε=0.695; B: F(4,120)=1.01, p=0.387, ε=0.695). For the positive potential observed in this time window, the highest mean amplitudes were measured at parietal electrodes (interaction Laterality × Anterior–posterior distribution: A: F(12,360)=2.46, p=0.021, ε=0.55; B: F(12,360)=2.60, p=0.016, ε=0.55).

Fig. 6. Grand averages of the ERPs recorded at FZ during the presentation of the first four chords for the beauty judgment task (dotted line) and the correctness judgment task (dashed line) in separate voltage–time diagrams for experts (left side of the figure) and laypersons (right side of the figure). Time window (1000 to 1300 ms) for quantification of the ERP effect marked in light gray. Scalp maps of voltage distribution of the difference between the beauty and the correctness judgment tasks during this time window for both groups.

4. Discussion

In the scope of this study, music aesthetic judgment processes were explored, with particular interest in the influence of musical expertise on these processes. Using a mixed design with two different tasks performed on the same stimulus material, we obtained behavioural and electrophysiological results. Pertaining to cognitive music processing, the previously reported sensitivity of certain ERP components (P2 and ERAN) to different levels of music expertise was replicated with the current data. Analyses focused on the specifics of aesthetic evaluation yielded group differences occurring at different stages of the music judgment process. After a discussion of the behavioural results, the ERP results are discussed in detail.

The analysis of the behavioural data revealed few significant differences between the two groups of participants. Experts and laypersons were surprisingly homogeneous in their behavioural responses: only the pattern of answer frequencies across the two tasks differed between the groups. Laypersons were more inclined to respond affirmatively in the correctness judgment task than in the beauty judgment task, whereas experts showed a non-significant tendency toward the reverse pattern. The previously reported result that music experts prefer unusual chord sequences more than laypersons do (Smith and Melara, 1990) could not be fully replicated, although the effects analysed pointed in the expected direction. The failure to replicate Smith and Melara's results might be attributable to the different nature of the stimulus material used in their study: they used only chord progressions that stayed within the boundaries of music-syntactic rules, whereas the chord functions used in the present study often violated these boundaries.

Inter-task answer concordance scores did not differ between the two groups. This means that, at the level of the individual stimuli and stimulus categories, no differential answering pattern was observed that distinguished experts from laypersons. Another behavioural measure of interest was answer stability, which was computed from the original answer frequencies. Low answer stability can be taken as a sign of an answering pattern characterized by random choices. The overall stability achieved by the participants in this study was at an intermediate level, and there were no differences between experts and laypersons in terms of stability. This is unexpected, because experts trained in Western tonal music theory can be assumed to possess more explicit knowledge about chord functions, as well as the ability to recognize them, which would be advantageous in the correctness judgment task and lead to high answer stability. Similarly, for the beauty judgment task we did not observe more stable preferences in experts than in laypersons, as both groups produced comparable proportions of random choices.

Fig. 7. Grand averages of the ERPs recorded at FZ during the presentation of the last chord for the beauty judgment task (dotted line) and the correctness judgment task (dashed line) in separate voltage–time diagrams for experts (left side of the figure) and laypersons (right side of the figure). Time window (2600 to 3200 ms) for quantification of the ERP effect marked in light gray. Scalp maps of voltage distribution of the difference between the beauty and the correctness judgment tasks during this time window for both groups.

Independent of expertise, decisions for ambiguous stimuli took significantly longer than decisions for both congruous and incongruous stimuli. This can be attributed to the forced-choice answer format of the tasks: participants had only two answer options, without a neutral category to resort to for ambiguous cases. The beauty judgment task was executed faster than the correctness judgment task. This is contrary to the results of Brattico et al. (submitted), who observed faster responses for correctness judgments (M=407 ms, SD=69 ms, measured from the end of the five-chord cadence) than for liking judgments (M=453 ms, SD=70 ms). The same is true for results reported by Mandler and Shebo (1983), who contrasted liking judgments of words and paintings with either lexical decisions in the case of words or recognition ratings in the case of paintings. An explanation might lie in the difference between beauty judgments as used in the present study and liking judgments as recorded by Brattico et al. (2003) and Mandler and Shebo (1983). Such a difference, hypothesized by Brattico and Jacobsen (2009), hints at divergent mental processes during liking and beauty evaluation, which need to be investigated further.

Higher amplitudes of P2 and ERAN were observed in music experts than in laypersons. These results illustrate the experts' more strongly developed abilities with regard to music perception and cognition. This was especially noticeable in the ERPs measured at the beginning of the chord sequences (P2 effect, as reported by Shahin et al. (2003)) and in response to subtle incoherence between chord progressions and their preceding context, as reflected in the ERAN to ambiguous chord sequences (as reported by Koelsch et al. (2002)).

The analysis of ERP responses focusing on aesthetic processing yielded group differences at three different stages of the music evaluation process. During the processing of the cue, group-dependent task differences in ERP mean amplitude were observed. At this early preparatory stage, only experts showed a tendency toward task differences. This could imply that experts activate different listening modes, according to the respective task demand, as soon as they become aware of what the task demand is.


More specifically, the ERP difference was observed at posterior recording sites and can be characterized as a slow negative-going wave that was more enhanced for the beauty than for the correctness judgment task, in experts only. Slow negative potentials observed during task preparation have been well studied and termed contingent negative variation (CNV) (Birbaumer et al., 1990; Brunia and Damen, 1988; Falkenstein et al., 2003). If task execution is anticipated, brain areas required for solving the task are pre-activated (Wild-Wall et al., 2007); this is reflected in the CNV. Differences in CNV have been related to differences in motor preparation, expectancy (Walter et al., 1964), effort invested in a task (Falkenstein et al., 2003) and task difficulty (Lorist et al., 2000). Falkenstein et al. (2003) cued each trial individually with cues that communicated the amount of effort the participants were supposed to apply to the ensuing task. Through changes in the CNV, they were able to demonstrate that investment of effort is subject to voluntary control and can be adjusted from trial to trial. Considering the present data, it is possible that the slow negative-going wave observed for experts during the cue interval, like the CNV, reflects higher effort invested in the preparation for the beauty judgment task. However, the distribution of the difference wave observed in the present study diverges greatly from the distribution of voltages reported by Falkenstein et al. (2003): they observed the CNV at anterior recording sites, whereas the current data show a clear posterior distribution. These discrepancies are likely related to qualitative differences in the tasks employed.

In the subsequently analysed time window, which starts with the onset of the third chord of the sequence, no differences between the ERPs to the two tasks were observed for experts. However, laypersons showed task-related differences here, especially at midline recording sites. Their ERP recorded during the correctness judgment task was more negative than their ERP recorded during the beauty judgment task. Interpreting this slow negative-going wave as a CNV, it can be argued that laypersons invest more effort in the performance of the correctness judgment task. This makes sense, as laypersons, lacking higher-order musical education and training, probably found it more difficult to perform the correctness than the beauty judgment task. They may thus have mobilised more resources in order to improve their chances of arriving at a correct response. However, it is also possible that laypersons pre-activate more brain areas (related to sensory, cognitive or motor aspects of the task), or pre-activate certain brain areas more strongly, when preparing for the correctness task than for the beauty task. They might also have started preparing for the execution of the correctness task earlier than for the execution of the beauty task. As in the cue interval, the distribution expected for the CNV does not emerge, because task differences reach their maxima at midline electrodes instead of being distributed anteriorly. Furthermore, it could be argued that the CNV is a component observed only for intervals prior to stimulus presentation, and that this deflection, found already halfway into the stimulus, cannot reasonably be discussed in this context. However, the first four chords can be regarded as part of the preparatory phase, because the decisive event that enables the participant to execute the task (the ending chord) has not yet occurred.

The tendency of laypersons to engage different processing strategies for the two judgments is sustained during the presentation of the last chord, the point in time when all the information needed to finalize the judgment process has become available. This finding of an enhanced LPP during aesthetic judgments, at least in laypersons, is in line with the results of Hajcak et al. (2006), who found higher mean LPP amplitudes in response to an affective compared to a non-affective judgment task (cf. also Jacobsen and Höfel, 2003; Brattico et al., 2003). This suggests that laypersons, compared to experts, rely more strongly on their internal affective states (for example, feelings of pleasure or displeasure) when computing aesthetic judgments. Taking the recruiting process into account, the following explanation for this result can be considered: participants unavoidably knew that they had been recruited as music experts or as laypersons. Experts may have been strongly aware of their role as specialists and felt called upon to make expert decisions in both tasks, thereby suppressing the more self-referential, affective side of the beauty judgment. This would indicate that beauty judgments themselves can differ depending on the role in which participants feel addressed. Whether an individual is addressed as a private person or as the representative of a group influences which processing strategies are activated (Callero, 1985). To recapitulate, the distinction between experts and laypersons involves variation in more than one aspect. Whether the effects found in this study depend on differences in knowledge structures or on mechanisms discussed by social psychologists (for example, effects of the saliency of group membership) has to be addressed in future studies. However, for early brain responses such as the P2 and ERAN, which have not yet been shown to be sensitive to top–down modulation, influences of heightened saliency of group membership are unlikely.
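The LPP result is a group-by-task pattern: a beauty-versus-correctness difference in laypersons that is absent in experts. The sketch below makes that contrast explicit. The amplitudes are invented, each value is assumed to be a per-electrode mean amplitude (µV) in the last-chord window, and averaging over electrodes is only a crude stand-in for a "widespread" effect; none of this reproduces the statistics actually reported in the study.

```python
# Minimal sketch with hypothetical numbers: the LPP finding is a group-by-task
# pattern, i.e. a task effect present in one group but not in the other.

def mean(values):
    return sum(values) / len(values)


def task_effect(lpp):
    """Beauty minus correctness LPP (µV), averaged over electrodes.

    lpp maps a task name to per-electrode mean amplitudes in the
    last-chord window (a widespread effect shows up at most sites).
    """
    return mean(lpp["beauty"]) - mean(lpp["correctness"])


def interaction_contrast(laypersons, experts):
    """Layperson task effect minus expert task effect (µV)."""
    return task_effect(laypersons) - task_effect(experts)


laypersons = {"beauty": [4.0, 4.5, 3.5], "correctness": [2.5, 3.0, 2.0]}
experts = {"beauty": [3.0, 3.0, 3.0], "correctness": [3.0, 3.5, 2.5]}

print(task_effect(laypersons), task_effect(experts))  # 1.5 0.0
print(interaction_contrast(laypersons, experts))      # 1.5
```

A positive contrast corresponds to the pattern described above, in which only laypersons show a larger late positivity for the beauty task.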

Furthermore, it is possible that the stimulus material was too transparent for the group of experts, which could have made it impossible for them to refrain from performing, during beauty judgments, the same analytical operations that were necessary for correctness judgments. To follow up this line of explanation, we analysed behavioural response patterns. As mentioned above, a ratio of inter-task judgment concordance was calculated for each participant. No differences between the two groups were observed in this measure. This means that experts did not adhere to the rule "What is correct is also beautiful" to any greater or lesser extent than laypersons did. However, this analysis draws solely on the outcome of the judgment process (the verdict itself) and cannot satisfactorily answer questions concerning the actual course of processing. Even if inter-task judgment concordance were 100% for every participant, the processes leading to the identical verdicts could still differ. The replication of previously reported P2 and ERAN effects indicates that experts and laypersons differ with regard to early cognitive processes in response to musical stimuli. This suggests that experts start out with greater processing depth, or possess more complex conceptual systems and enhanced attentional control for musical stimuli (Shahin et al., 2003; Brattico et al., 2008). Therefore, it is possible that these early, initial processes are responsible for experts' inability to refrain from performing the same analytical operations during beauty judgments as during correctness judgments.
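The concordance measure is defined earlier in the paper; the sketch below shows one plausible way such a per-participant ratio could be computed. It assumes binary verdicts ("correct" and "beautiful" coded as True) and stimuli matched across the two tasks by a hypothetical stimulus ID. Both the coding and the matching are assumptions made for illustration only.

```python
# Minimal sketch: inter-task judgment concordance for one participant.
# Assumption (not from the original): binary verdicts per stimulus, matched by
# a hypothetical stimulus ID across the two tasks; "correct" and "beautiful"
# are coded True.

def concordance_ratio(correctness, beauty):
    """Share of stimuli that received the same verdict in both tasks."""
    shared = correctness.keys() & beauty.keys()
    if not shared:
        raise ValueError("no stimulus was judged in both tasks")
    agreements = sum(correctness[s] == beauty[s] for s in shared)
    return agreements / len(shared)


correctness = {"seq01": True, "seq02": True, "seq03": False, "seq04": False}
beauty = {"seq01": True, "seq02": False, "seq03": False, "seq04": False}
print(concordance_ratio(correctness, beauty))  # 0.75
```

As the paragraph points out, even a ratio of 1.0 only shows that the verdicts coincide. It says nothing about whether the same processing led to them, which is why the ERP data are needed.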

A third, complementary explanation can be derived from results reported by Istók et al. (2009), who found that laypersons connect the concept of music aesthetics more strongly to aspects of mood and affect regulation than experts do. This leads to the assumption that for laypersons, but not for experts, the aptness of a musical stimulus to modify affective states is an important criterion when aesthetic judgments about music are formed. Furthermore, Kreutz et al. (2008) applied Baron-Cohen's Empathizing–Systemizing theory to the music domain. They report that experts are more likely to fall into the category of "Music Systemizers", who focus on regularities and other descriptive aspects of the music heard, whereas the listening styles of laypersons are more strongly characterized by empathic processes of tuning in to the emotional and subjective aspects of the music. The laypersons' stronger focus on their internal affective states, reflected in the present study by the LPP difference between aesthetic and non-aesthetic judgment processes, is in line with these assumptions. The pattern of lateralization found in the studies by Altenmüller et al. (2002) and Gagnon and Peretz (2000) was not replicated with the current data. In the present study, participants were asked whether they thought the musical stimuli were beautiful or not, whereas in the other studies they were asked whether they liked or disliked the stimuli. The beauty question might, after all, initiate different processes than the liking question (cf. Brattico and Jacobsen, 2009).
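To make "pattern of lateralization" concrete, the sketch below computes a generic lateralization index, (|L| − |R|) / (|L| + |R|), over homologous electrode pairs (odd numbers on the left, even numbers on the right in 10-20 naming). This is one common way of summarizing hemispheric asymmetry and is shown here only as an illustration. It is not claimed to be the measure used by Altenmüller et al. (2002), by Gagnon and Peretz (2000) or in the present study, and all amplitudes are hypothetical.

```python
# Minimal sketch with hypothetical amplitudes: a generic lateralization index
# over homologous electrode pairs. +1 = fully left-dominant, -1 = fully right.

def lateralization_index(left, right):
    """(|L| - |R|) / (|L| + |R|), using absolute amplitudes (µV)."""
    l, r = abs(left), abs(right)
    if l + r == 0:
        return 0.0  # no signal on either side: treat as symmetric
    return (l - r) / (l + r)


def mean_lateralization(amplitudes, pairs):
    """Average index over (left, right) homologous pairs such as (F3, F4)."""
    values = [lateralization_index(amplitudes[lft], amplitudes[rgt])
              for lft, rgt in pairs]
    return sum(values) / len(values)


amplitudes = {"F3": 3.0, "F4": 1.0, "C3": 2.0, "C4": 2.0, "P3": 1.0, "P4": 3.0}
pairs = [("F3", "F4"), ("C3", "C4"), ("P3", "P4")]
print(lateralization_index(3.0, 1.0))          # 0.5
print(mean_lateralization(amplitudes, pairs))  # 0.0
```

Comparing such an index between liked and disliked (or beautiful and not-beautiful) items is one way a left/right asymmetry could be tested. The absolute values avoid sign problems with negative-going components, at the cost of discarding polarity.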

Listening to music is often an aesthetic experience involving cognitive, affective and evaluative processes. Music expertise has frequently been found to modulate the cortical processing of various aspects of music perception. However, little was known about how aesthetic processes are affected by music expertise. We therefore investigated whether music experts and laypersons differ with regard to the aesthetic processing of musical sequences. The present study combined new insights concerning this difference with a replication of neural correlates of music expertise previously documented in the literature.

Acknowledgement

The authors thank Anja Roye and Urte Roeber for the technical support and Sara Bergman for proof-reading the manuscript. This work was supported by the NEST (New and Emerging Science and Technology) program of the European Commission (FP6-2004-NEST-PATH-028570) and by the Finnish Center of Excellence in Interdisciplinary Music Research, Department of Music, University of Jyväskylä (Finnish Academy).

Aspects of this work were presented at the conference The Neurosciences and Music III. A summary of this work subsequently appeared in the conference proceedings published by the New York Academy of Sciences (Müller et al., 2009).

References

Altenmüller, E., Schürmann, K., Lim, V.K., Parlitz, D., 2002. Hits to the left, flops to the right: different emotions during listening to music are reflected in cortical lateralisation patterns. Neuropsychologia 40, 2242–2256.

Atienza, M., Cantero, J.L., Dominguez-Marin, E., 2002. The time course of neural changes underlying auditory perceptual learning. Learn. Mem. 9, 138–150.

Baumann, S., Meyer, M., Jäncke, L., 2008. Enhancement of auditory-evoked potentials in musicians reflects an influence of expertise but not selective attention. J. Cogn. Neurosci. 20, 2238–2249.

Bigand, E., Poulin-Charronnat, B., 2006. Are we "experienced listeners"? A review of the musical capacities that do not depend on formal musical training. Cognition 100, 100–130.

Birbaumer, N., Elbert, T., Canavan, A.G.M., Rockstroh, B., 1990. Slow potentials of the cerebral cortex and behavior. Physiol. Rev. 70, 1–41.

Blood, A.J., Zatorre, R.J., Bermudez, P., Evans, A.C., 1999. Emotional responses to pleasant and unpleasant music correlate with activity in paralimbic brain regions. Nat. Neurosci. 2, 382–387.

Bosnyak, D.J., Eaton, R.A., Roberts, L.E., 2004. Distributed auditory cortical representations are modified when non-musicians are trained at pitch discrimination with 40 Hz amplitude modulated tones. Cereb. Cortex 14, 1088–1099.

Brattico, E., Jacobsen, T., 2009. Subjective appraisal of music: neuroimaging evidence. Ann. N. Y. Acad. Sci. 1169, 308–317.

Brattico, E., Jacobsen, T., De Baene, W., Nakai, N., Tervaniemi, M., 2003. Electrical brain responses to descriptive versus evaluative judgments of music. Ann. N. Y. Acad. Sci. 999, 155–157.

Brattico, E., Jacobsen, T., De Baene, W., Tervaniemi, M., submitted. Subjective correctness vs. liking judgments of music: an ERP study.

Brattico, E., Pallesen, K.J., Varyagina, O., Bailey, C., Anourova, I., Jarvenpaa, M., Eerola, T., Tervaniemi, M., 2008. Neural discrimination of nonprototypical chords in music experts and laymen: an MEG study. J. Cogn. Neurosci.

Brattico, E., Pallesen, K.J., Varyagina, O., Bailey, C., Anourova, I., Jarvenpaa, M., Eerola, T., Tervaniemi, M., 2009. Neural discrimination of nonprototypical chords in music experts and laymen: an MEG study. J. Cogn. Neurosci. 21, 2230–2244.

Brunia, C.H.M., Damen, E.J.P., 1988. Response preparation and stimulus anticipation. Electroencephalogr. Clin. Neurophysiol. 70, 28.

Cacioppo, J.T., Crites, S.L., Gardner, W.L., Berntson, G.G., 1994. Bioelectrical echoes from evaluative categorizations. 1. A late positive brain potential that varies as a function of trait negativity and extremity. J. Pers. Soc. Psychol. 67, 115–125.

Callero, P.L., 1985. Role-identity salience. Soc. Psychol. Q. 48, 203–215.

Cohen, J., 1988. Statistical Power Analysis for the Behavioral Sciences. Lawrence Erlbaum, New York City.

Crites, S.L., Cacioppo, J.T., Gardner, W.L., Berntson, G.G., 1995. Bioelectrical echoes from evaluative categorization. 2. A late positive brain potential that varies as a function of attitude registration rather than attitude report. J. Pers. Soc. Psychol. 68, 997–1013.

Crozier, J.B., 1974. Verbal and exploratory responses to sound sequences varying in uncertainty level. In: Berlyne, D.E. (Ed.), Studies in the New Experimental Aesthetics. Hemisphere Publishing Corporation, Washington, D.C.

Falkenstein, M., Hoormann, J., Hohnsbein, J., Kleinsorge, T., 2003. Short-term mobilization of processing resources is revealed in the event-related potential. Psychophysiology 40, 914–923.

Gagnon, L., Peretz, I., 2000. Laterality effects in processing tonal and atonal melodies with affective and nonaffective task instructions. Brain Cogn. 43, 206–210.

Gomez, C.A., Flores, A., Ledesma, A., 2007. Fronto-parietal networks activation during the contingent negative variation period. Brain Res. Bull. 73, 40–47.

Greenland, S., Robins, J.M., Pearl, J., 1999. Confounding and collapsibility in causal inference. Stat. Sci. 14, 29–46.

Hajcak, G., Moser, J.S., Simons, R.F., 2006. Attending to affect: appraisal strategies modulate the electrocortical response to arousing pictures. Emotion 6, 517–522.

Hannon, E.E., Trainor, L.J., 2007. Music acquisition: effects of enculturation and formal training on development. Trends Cogn. Sci. 11, 466–472.

Istók, E., Brattico, E., Jacobsen, T., Krohn, K., Müller, M., Tervaniemi, M., 2009. Aesthetic responses to music: a questionnaire study. Musicae Scientiae 13, 183–206.

Jacobsen, T., Höfel, L., 2003. Descriptive and evaluative judgment processes: behavioral and electrophysiological indices of processing symmetry and aesthetics. Cogn. Affect. Behav. Neurosci. 3, 289–299.

Jacobsen, T., Schubotz, R.I., Höfel, L., Cramon, D.Y., 2006. Brain correlates of aesthetic judgment of beauty. NeuroImage 29, 276–285.

Koelsch, S., Schmidt, B.H., Kansok, J., 2002. Effects of musical expertise on the early right anterior negativity: an event-related brain potential study. Psychophysiology 39, 657–663.

Kreutz, G., Schubert, E., Mitchell, L.A., 2008. Cognitive styles of music listening. Music Percept. 26, 57–73.

Kuriki, S., Kanda, S., Hirata, Y., 2006. Effects of musical experience on different components of MEG responses elicited by sequential piano-tones and chords. J. Neurosci. 26, 4046–4053.

Lorist, M.M., Klein, M., Nieuwenhuis, S., De Jong, R., Mulder, G., Meijman, T.F., 2000. Mental fatigue and task control: planning and preparation. Psychophysiology 37, 614–625.

Mandler, G., Shebo, B.J., 1983. Knowing and liking. Motiv. Emot. 7, 125–144.

McDonald, C., Stewart, L., 2008. Uses and functions of music in congenital amusia. Music Percept. 25, 345–355.

Meyer, L.B., 1956. Emotion and Meaning in Music. University of Chicago Press, Chicago.

Moser, J.S., Hajcak, G., Bukay, E., Simons, R.F., 2006. Intentional modulation of emotional responding to unpleasant pictures: an ERP study. Psychophysiology 43, 292–296.

Müller, M., Höfel, L., Brattico, E., Jacobsen, T., 2009. Electrophysiological correlates of aesthetic music processing: comparing experts with laypersons. In: The Neurosciences and Music III: Disorders and Plasticity. Ann. N. Y. Acad. Sci. 1169, 355–358.

Oldfield, R.C., 1971. Assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–114.

Oostenveld, R., Praamstra, P., 2001. The five percent electrode system for high-resolution EEG and ERP measurements. Clin. Neurophysiol. 112, 713–719.

Peretz, I., 2006. The nature of music from a biological perspective. Cognition 100, 1–32.

Schupp, H.T., Cuthbert, B.N., Bradley, M.M., Cacioppo, J.T., Ito, T., Lang, P.J., 2000a. Affective picture processing: the late positive potential is modulated by motivational relevance. Psychophysiology 37, 257–261.

Schupp, H.T., Weike, A.I., Hamm, A.O., 2000b. Affect and evaluative context: high-density ERP recordings during picture processing. Psychophysiology 37, 88.

Shahin, A., Bosnyak, D.J., Trainor, L.J., Roberts, L.E., 2003. Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. J. Neurosci. 23, 5545–5552.

Simpson, E.H., 1951. The interpretation of interaction in contingency tables. J. R. Stat. Soc. B 13, 238–241.

Smith, J.D., Melara, R.J., 1990. Aesthetic preference and syntactic prototypicality in music: 'Tis the gift to be simple. Cognition 34, 279–298.

Steinbeis, N., Koelsch, S., Sloboda, J.A., 2006. The role of harmonic expectancy violations in musical emotions: evidence from subjective, physiological, and neural responses. J. Cogn. Neurosci. 18, 1380–1393.

Tervaniemi, M., 2009. Musicians – same or different? In: The Neurosciences and Music III: Disorders and Plasticity. Ann. N. Y. Acad. Sci. 1169, 151–156.

Trehub, S.E., Hannon, E.E., 2006. Infant music perception: domain-general or domain-specific mechanisms? Cognition 100, 73–99.

Tremblay, K., Kraus, N., McGee, T., Ponton, C., Otis, B., 2001. Central auditory plasticity: changes in the N1–P2 complex after speech-sound training. Ear Hear. 22, 79–90.

Walter, W.G., Winter, A.L., Cooper, R., McCallum, W.C., Aldridge, V.J., 1964. Contingent negative variation: an electric sign of sensori-motor association and expectancy in the human brain. Nature 203, 380–384.

Wild-Wall, N., Hohnsbein, J., Falkenstein, M., 2007. Effects of ageing on cognitive task preparation as reflected by event-related potentials. Clin. Neurophysiol. 118, 558–569.