Differential Short-Term Memorization for Vocal and Instrumental Rhythms
Running head: SHORT-TERM MEMORY FOR VOCAL & INSTRUMENTAL RHYTHMS
Differential Short-Term Memorization for Vocal and Instrumental Rhythms
Niall A.M. Klyn**, Udo Will*†, YongJeon Cheong*, and Erin T. Allen*
*School of Music, The Ohio State University, Columbus OH, USA **Department of Speech and Hearing Science, The Ohio State University, Columbus OH, USA
School of Music. 1866 N. College Road, 110 Weigel Hall, Columbus OH, 43210. 614-292-6389. Department of Speech and Hearing Science. 1070 Carmack Road, 110 Pressey Hall, Columbus OH, 43210. 614-292-8207.
[email protected]; [email protected]; [email protected]; [email protected]
This is a prepublication version of the article to be published by Taylor & Francis in Memory (online publication date 08/17/2015), available online: http://www.tandfonline.com/10.1080/09658211.2015.1050400.
e-prints of the published article are available at:
http://www.tandfonline.com/eprint/rHjz2dnkHq33EaiMQPm3/full
† Corresponding author: Udo Will, School of Music. 1866 N. College Road, 101G Weigel Hall,
Columbus OH, 43210. Phone: 614-292-6389, email: [email protected]
Differential Short-Term Memorization for Vocal and Instrumental Rhythms
This study explores differential processing of vocal and instrumental rhythms in short-term memory with three decision (same/different judgment) experiments and one reproduction experiment. In the first experiment, memory performance declined for delayed versus immediate recall, with accuracy for the two rhythm types affected differently: musicians performed better than non-musicians on clapstick but not on vocal rhythms, and musicians were better on vocal rhythms in the same than in the different condition. Results for experiment two showed that concurrent sub-vocal articulation and finger-tapping differentially affected the two rhythm types and same/different decisions, but produced no evidence for articulatory-loop involvement in delayed decision tasks. In a third experiment, which tested rhythm reproduction, concurrent sub-vocal articulation decreased memory performance, with a stronger deleterious effect on the reproduction of vocal than of clapstick rhythms. This suggests that the articulatory loop may be involved only in delayed reproduction, not in decision tasks. The fourth experiment tested whether differences between filled and empty rhythms (continuous vs. discontinuous sounds) can explain the different memorization of vocal and clapstick rhythms. Though significant differences were found between empty and filled instrumental rhythms, the differences between vocal and clapstick rhythms can only be explained by considering additional voice-specific features.
Keywords: short-term memory, vocal rhythm, instrumental rhythm, rhythm encoding, articulatory loop, task dependency
Introduction:
Research on processing of sounds from different sources (Belin, Zatorre, Lafaille, Ahad, & Pike,
2000; Bent, Bradlow, & Wright, 2006; Levy, Granot, & Bentin, 2001, 2003; Vouloumanos,
Kiehl, Werker, & Liddle, 2001; Zatorre, Belin, & Penhune, 2002) indicates that sound source
identification is a well-developed human ability, and that vocal and non-vocal sounds are
processed differently. In addition, vocal and instrumental melodic contours have been shown to
differentially affect performance of word repetition tasks in tone language speakers (Poss, 2012;
Poss, Hung, & Will, 2008). However, studies on auditory rhythm have not systematically
investigated whether sounds from different sources, e.g. vocal versus non-vocal, lead to
differential rhythm processing. Studies have investigated temporal coding (Deutsch, 1986; Povel,
1981; Povel & Essens, 1985), implicitly assuming that rhythm processing is an essentially
amodal process, though there is evidence that modality of stimulus presentation affects temporal
encoding (Glenberg & Jona, 1991). A recent study by Hung (2011) advanced evidence that even
within the auditory modality rhythm processing is not independent of features of the sound
source. Her study, in which subjects had to decide whether two sequentially presented rhythms
were the same or different, found significant behavioural (reaction time, accuracy) and imaging
(fMRI) differences for vocal and instrumental rhythms.
The current study extends research on rhythm memory by asking whether vocal and
instrumental rhythms are processed differently in short-term memory. It links questions arising
from recent timing research that suggests that multiple distinct processes underlie timing (Coull,
Cheng, & Meck, 2010; Wiener, Matell, & Coslett, 2011), with questions of how rhythms
produced by different sources are represented and maintained in memory. Consequently, we
investigate how differential processing of vocal and instrumental rhythms is affected by retention
span, musical training, the type of responses to be made (same/different decision or
reproduction), distractor tasks, and acoustic features of the sounds that form the rhythms. The
current study presents four experiments that investigate short-term memory for vocal and
clapstick rhythms: experiment 1 examines changes in rhythm discrimination at short (0.5s) and
long (12.5s) inter-stimulus intervals, whether there is trace decay for rhythm and whether vocal
and clapstick rhythms are affected in similar ways; experiment 2 investigates a potential
rehearsal mechanism for rhythm memory by examining the effect of concurrent motor tasks on
rhythm discrimination; experiment 3 examines effects of concurrent motor tasks on rhythm
reproduction; finally, experiment 4 investigates the role of one acoustical difference between
clapstick and vocal sounds, that of “empty” and “filled” rhythms. Empty rhythms use sounds
with a brief onset and offset. Thus, there is no steady state in between events. Filled rhythms,
then, are those in which the onsets are followed by a more continuous sound before the offsets,
which may coincide with the following event onset. The analytical procedures we use are guided
by our data and the choice of experimental variables. It might seem that a signal detection (SDT)
approach would be appropriate for this study, as SDT analyses have been used for same/different
decision experiments (Macmillan & Creelman, 2005). Classic signal detection theory assumes
that recognition judgments can be analysed using a unidimensional value of familiarity.
However, there is evidence that other sources, e.g. neural synchronization and accumulated
information, can drive new/old or same/different decisions (Johns, Jones, & Mewhort, 2012;
Finnigan, Humphreys, Dennis, & Geffen, 2002; Mewhort & Johns, 2005) and that these
decisions are based on different cognitive processes involving different types of comparison
(Bagnara, Boles, Simion, & Umiltà, 1982; Keuss, 1977; Markman & Gentner, 2005). As we
were particularly interested in whether and how the processes underlying same/different
decisions affect rhythm memory processing we did not use SDT for our analysis. An SDT
analysis would not permit considering the same/different factor as a separate experimental
variable, instead collapsing the factor levels as two distributions on the same underlying
parameter. We did, however, perform supplementary bias analyses for the first two experiments
(see results section for experiment 1 and 2) to test whether our results could be explained by bias
changes in subjects’ decision making.
The experiments were approved by the Institutional Review Board of The Ohio State
University.
Experiment 1:
1.1. Introduction:
Hung’s (2011) experiment provided evidence of differential effects of clapstick and vocal rhythms on
same/different decision tasks. Given that these differences were found at the short ISI (~0.5s), we ask
whether these differences persist or change for longer retention spans. There is ample evidence
to show that auditory memory performance degrades over longer periods (for reviews, cf.
Baddeley, 1990, 2010; Cowan, 1997, 2008; Neath & Surprenant, 2003), but it is unclear whether
this also holds for rhythm memory and whether memory for vocal and clapstick rhythms will be
affected differentially. Percussion rhythms, like those of the clapstick used here, may be simply
described by their relative onset and offsets, while vocal rhythms include additional features like
changes in pitch, frequency spectra and timbre. This greater complexity could induce greater
degradation in memory and we hypothesized that for delayed responses the performance on the
vocal rhythm task would be significantly degraded compared to the clapstick task.
A further question that arises concerns the influence of musical training (Hung, 2011 only
examined trained musicians). Differences between musicians and non-musicians have received a
fair bit of attention (Gaser & Schlaug, 2003; Koelsch, Schröger, & Tervaniemi, 1999;
Musacchia, Sams, Skoe, & Kraus, 2007; Musacchia, Strait, & Kraus, 2008; Parbery-Clark, Skoe,
Lam, & Kraus, 2009; Schaal, Banissy, & Lange, 2015; Zatorre, 1998), but it is unclear whether
any effects of musical training will extend to differential processing in memory for rhythm.
Several studies have shown superior performance on memory tasks by musicians (Jakobson,
Lewycky, Kilgour, & Stoesz, 2008; Kilgour, Jakobson, & Cuddy, 2000; Schaal et al., 2015;
Tervaniemi, Rytkönen, Schröger, Ilmoniemi, & Näätänen, 2001), which would indicate that
musically trained subjects should exhibit generally superior performance at longer ISIs than non-musicians, but whether this will affect memory for vocal versus clapstick rhythms is uncertain.
Moreover, musical training may specifically improve memory for rhythm by fostering strategies
like categorization of rhythmic intervals in terms of a set of learned duration values (e.g. quarter
note, eighth note, etc.) and/or visualized representation of rhythms using musical notation.
Indeed, in a recent article Schaal et al. (2015) proposed a rhythm span task that uses a
same/different decision task to estimate the number of "rhythm elements" that listeners can retain
over a 2 second delay. In the first experiment using their novel procedure, musicians had a
significantly longer estimated rhythm span than non-musicians. Due to the substantial amount of
ear-training most university-level musicians receive and the previous evidence of superior
working memory in musicians, we hypothesized that musicians would show superior accuracy
for all conditions in the experiment. As we are unaware of any evidence for differences in
reaction time (henceforth RT) between these groups, we hypothesized there would be no
difference in our study. Finally, we anticipated that our results would be consistent with previous
research in which same/different decisions led to significantly different RTs (Proctor, 1981).
1.2. Methods:
Participants:
For the purposes of our study, subjects were classified as musicians if they satisfied all three of the following
criteria: 1) at least 5 years of formal musical training, 2) ongoing active musical engagement at
the time of participation, and 3) self-identification as a musician. Any participant who did not
meet all three criteria was therefore categorized as a non-musician. Our musicians played a wide
range of instruments, but this study was not designed to detect potential differences between
different instrumentalists, different durations of training, or techniques. Therefore no attempt was
made to analyse the data along these lines. Twenty-five participants who reported normal hearing
took part in the first experiment; 10 non-musicians (avg. years of musical training = 0.66 years,
min. 0, max. 4; 6 female; avg. age = 24 years, min. 20, max. 30), and 15 musicians (avg. years of
musical training = 12.3 years, min. 5, max. 20; 4 female; avg. age = 23 years, min. 18, max. 31;
1 left-handed). All subjects of this study were members of the Ohio State University community
who participated for either monetary compensation or course credit, and all signed informed
consent prior to participation.
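For illustration, the three-part inclusion rule described above can be sketched as a simple predicate. This is an editorial illustration, not part of the original study materials; the function and argument names are our own.

```python
def is_musician(years_of_training, actively_engaged, self_identifies):
    """Classify a participant as a musician only if all three study
    criteria hold; anyone failing any criterion is a non-musician."""
    return (years_of_training >= 5  # 1) at least 5 years of formal training
            and actively_engaged    # 2) ongoing active musical engagement
            and self_identifies)    # 3) self-identification as a musician
```

Note that the rule is conjunctive: a participant with, say, 10 years of training who no longer plays is classified as a non-musician.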
Stimuli:
The stimuli for the following experiments were created using excerpts from a CD of field
recordings of the Dyirbal people from Queensland, Australia (Dixon & Koch, 1996) by Hung
(2011). These recordings were selected to use real-world stimuli and to avoid possible confounds
due to semantic processing and musical familiarity. From 10 recordings of male performances
we selected brief excerpts (mean length 1.73s, range 1.5-2s). To ensure a consistent clapstick
signal for all vocal excerpts, one clapstick sound from the CD was used to replace the existing
clapstick sounds if the recording had a pre-existing accompaniment or to insert a new pattern if
the recording was a solo voice excerpt. Any new clapstick pattern was constructed following one
of the clapstick patterns available on the CD. The combination of voice and clapstick rhythms
was done to assure identical stimulus input for the main tasks, the comparison of two subsequent
vocal or instrumental rhythm patterns: with identical stimuli, any difference in results cannot be
attributed to differences in the stimuli. The resulting 10 parent stimuli were subsequently
modified to produce three variant stimuli for each parent, one with vocal rhythm changed but
clapstick the same, one with vocal rhythm the same but clapstick rhythm changed, and one with
a change to both the clapstick and the voice. If changes were only introduced to one rhythm and
not the other, subjects could more easily use the non-task rhythm for comparison when making a
decision. Therefore we decided to include all four types of variants in this study. The variant
rhythms were created by modifying the timing of one event or by adding or removing one event,
with modifications equally distributed over the length of the sound files. The 10 parent rhythms
were then grouped with either themselves or one of the three variants, resulting in 37 stimuli –
three variants were excluded due to unacceptably high error rates for these variants found by
Hung (2011). Figure 1 presents an illustration of the stimulus creation process (fig. 1a), as well
as the waveforms of two stimuli as they were presented to our subjects, one with a difference in
the clapstick (fig. 1c) and one with a difference in the vocal rhythm (fig. 1b). Both sample
stimuli are available for listening online (https://soundcloud.com/ethnomusicology-
osu/sets/stimuli-for-differential-short).
Figure 1, Stimuli from experiment 1. A: Schematic diagram showing the creation of a "parent" rhythm and its variants. Four waveforms are created from one parent combination of clapstick (c.) and voice (v.). Variants in which the rhythm is the same are marked "=," altered rhythms are marked "≠." B: vocal rhythm = different, clapstick rhythm = same. C: vocal rhythm = same, clapstick rhythm = different. Arrows mark the points of difference.
Using the stimuli described above, ISIs of 0.5s and 12.5s were chosen, in order to place
them within and outside of the phonological store of Baddeley and Hitch’s (1974) model for
memory and at the upper and lower end of Cowan’s (1984) auditory memory. Presumably,
comparisons at the short ISI (immediate recall task) will make use of some form of echoic
memory (Neisser, 1967), whereas the long ISI (delayed recall task) will require some other
mechanisms. Stimuli were grouped in four blocks of 18 or 19 stimuli, and the duration and task
variables kept constant within each block.
Equipment:
The experiment was conducted on a Sony Vaio laptop running the Windows XP environment.
Stimuli were presented with the DMDX software package (Forster & Forster, 2003) and
responses were recorded via the built-in track pad buttons labelled with the corresponding
response; “=” for same, “≠” for different. RT and correct-incorrect status were recorded for each
response. Stimuli were presented via Sony MDR-V200 headphones connected to the laptop’s
built-in headphone jack.
Procedure:
Trial stimuli, not part of the experimental set, were presented with verbal instructions for the
experiment, and the subject's understanding of the task was evaluated informally. During this
practice phase the subjects adjusted the volume to a comfortable listening level. Over the course
of the test phase stimuli at each ISI were presented twice to every subject, once with the task to
judge the vocal rhythm and once with the task to judge the clapstick rhythm. In each case the
subjects were instructed to ignore the other instrument and pay attention only to the target rhythm,
vocal or clapstick. Block order and experimental tasks were balanced across subjects to reduce
the impact of any possible learning effects on the results. At the beginning of each block,
instructions for that block would appear on the screen, asking either “is the VOICE the same?”
or “is the CLAPSTICK the same?” On each trial the first stimulus of a pair was presented,
followed by a pause of the appropriate duration for the block (0.5s or 12.5s), followed by the
second stimulus of the pair. The subjects were asked to make a decision as quickly and
accurately as they could, using the first and second fingers of their dominant hand to push the
appropriate trackpad buttons. The experiment lasted approximately 60 minutes including the
instruction period. After completion of the final block subjects were debriefed.
1.3. Results:
Throughout the current study we use the following set of experimental variables (names
capitalized for unequivocal identification): Our main variable, ‘TIMBRE,’ will refer to subjects
being asked to respond to either vocal or clapstick rhythms. ‘RETENTION’ will refer to the
inter-stimuli interval (ISI) between the two stimuli of each presentation pair, and takes short and
long values. A third variable, ‘CONGRUITY,’ refers to the task-relevant (TIMBRE) rhythms of a stimulus pair
being either the ‘same’ or ‘different’. The fourth variable, ‘TRAINING,’ refers to musical
training, has two levels, ‘musician’ and ‘non-musician,’ and allows us to detect whether coding
and representation differences due to musical training affect vocal and instrumental rhythm
memorization.
Reaction Time:
A repeated measurement ANOVA with a between-subject factor TRAINING (musician/non-
musician) and within-subject factors of TIMBRE (clap/voice), RETENTION (long/short), and
CONGRUITY (same/different) was performed on the RT data after exclusion of all RTs greater
than three standard deviations from the mean, and after averaging each subject’s multiple
responses for each factor combination to account for the interdependencies of the repeated
measures. The ANOVA was thus performed on means of means, which were normally
distributed. Due to the unequal sample size of the between-subject factor, weighted means were
used in calculating the ANOVA.
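The "means of means" step described above (collapsing each subject's repeated responses to one value per factor combination before the ANOVA) can be sketched as follows. This is our own minimal illustration, not the authors' analysis code, and the trial-tuple format is assumed for the example:

```python
from collections import defaultdict
from statistics import mean

def per_subject_cell_means(trials):
    """Collapse repeated measurements: average each subject's multiple RTs
    within each factor combination, yielding one mean per subject per cell.
    `trials` is an iterable of (subject_id, condition, rt) tuples."""
    cells = defaultdict(list)
    for subject, condition, rt in trials:
        cells[(subject, condition)].append(rt)
    return {cell: mean(rts) for cell, rts in cells.items()}
```

The ANOVA is then run on these cell means rather than on raw trials, which accounts for the interdependence of repeated measures within a subject.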
RETENTION and CONGRUITY were found to be significant at α = .05 and other main
factors failed to reach significance. There was also a significant 2-way interaction for
TRAINING and TIMBRE. These results are summarized in Table 1.
(Insert Table 1 here)
The significant effect of RETENTION had a size (mean response difference) of 0.54s
(Cohen’s d = 1.16) and that of CONGRUITY had a size of 0.74s (Cohen’s d = 1.99). The means
for RETENTION were 1.44s (long; C.I. 1.35 to 1.53) and 0.9s (short; C.I. 0.83 to 0.87); for
CONGRUITY they were 0.80s (same; C.I. 0.74 to 0.87) and 1.54s (different; C.I. 1.46 to 1.61),
and the difference between the two was not affected by the length of the retention span (fig.2A).
Musicians were faster for clapstick (1.15s; C.I. 1.03 to 1.27) than non-musicians (1.21s; C.I. 1.08
to 1.34), whereas musicians were slower for vocal rhythms (1.19s; C.I. 1.08 to 1.31) than non-musicians (1.13s; C.I. 1.00 to 1.26; fig.2B). The range of our RTs did not seem too surprising, as similar ranges have been reported by studies with tasks of comparable complexity. Malmberg
and Xu (2007) reported RTs between 1.7 and 2.5s for memory recognition tasks, Will,
Nottbusch, and Weingarten (2006) reported initial latencies in picture naming and word typing
tasks of 1.6 and 1.2s, respectively, and Alvarez, Cottrell, and Afonso (2009) reported means of
1.5s for similar tasks.
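For reference, the Cohen's d values reported above express a mean difference in pooled-standard-deviation units; a minimal sketch of the standard two-group formula follows. This is an editorial illustration (the article does not state which d variant was computed; shown here is the common pooled-SD form):

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two groups: mean difference divided by the
    pooled standard deviation."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd
```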
Figure 2, A: Reaction time interaction plot from experiment 1 for RETENTION:CONGRUITY. RT (s): reaction time in seconds, long: 12.5s RETENTION, short: 0.5s RETENTION, d: ‘different’ rhythm, s: ‘same’ rhythm. B: Reaction time interaction plot from experiment 1 for
TRAINING:TIMBRE. clap: clapstick rhythm, vocal: vocal rhythm, m: musicians, n: non-musicians. Error bars represent 1 standard error.
Accuracy:
We used R (R Development Core Team, 2012) and the lme4 package (Bates, 2010; Bates,
Maechler, Bolker & Walker, 2014) to assess the relationship between responses and
experimental variables in a generalized linear mixed model with a binomial error distribution and
a logit link function. As GLMMs in R do not output confidence intervals (for rationale see Baayen
et al., 2008), we list P>|z| values as measures of uncertainty of the parameter estimates in our
result tables. As fixed effects, we used TRAINING, RETENTION, TIMBRE, and
CONGRUITY and as random effect we used the intercepts for subjects, nested within
TRAINING (two levels). Residual plots did not reveal any obvious deviations from
homoscedasticity or normality. The initial full model was simplified by sequential removal of non-significant components. The nesting of subjects within factor TRAINING was not significant (χ2
= 0.171; p = .918) and was not included in the fitted model. The minimal significant model was
then subjected to an ANOVA and the probabilities for the F-values determined via the
Satterthwaite approximation (using the lmerTest package; Kuznetsova, Brockhoff, &
Christensen, 2014). Following Baguley (2009), both standardized (z-values for the estimated
coefficients) and simple effect sizes (mean response differences) are reported, and all figures
show raw data values and SE bars.
The ANOVA results showed that all main factors, the two-way interactions
TRAINING:TIMBRE, RETENTION:TIMBRE, TIMBRE:CONGRUITY, the three-way
interaction of TRAINING:TIMBRE:CONGRUITY, and the four-way interaction of the main
factors were significant. Estimated standardized effect sizes for the model are shown in table 2,
and the simple effect sizes (mean response differences) indicate that correct responses for
musicians were 7.7 percentage points (pp) higher than for non-musicians, that the short
RETENTION produced 5.1 pp more correct responses than the long RETENTION, that
responses to clapstick rhythms were 10.9 pp higher than to vocal rhythms, and that ‘different’
stimuli were correctly identified 4.3 pp more often than ‘same’ pairs.
(Insert Table 2 here)
However, due to the TIMBRE:CONGRUITY interaction, clapsticks showed a larger
difference (8.2 pp) between same and different decisions than did vocal rhythms (1.1 pp). The
TRAINING:TIMBRE interaction was due to the larger difference between clapstick and vocal
rhythms in musicians (15 pp) than in non-musicians (4.8 pp). The RETENTION:TIMBRE
interaction was caused by the stronger effect of RETENTION on vocal rhythms, which were
8.3 pp better for the short than for the long RETENTION, while the difference for clapsticks was
only 1.7 pp. The three-way interaction TRAINING:TIMBRE:CONGRUITY reflected the fact
that musicians showed a larger difference for same and different responses to clapstick rhythms
(10.5 pp) than non-musicians (4.1 pp), whereas the difference for vocal rhythms was minimal for
both groups (0.8 pp and 0.1 pp, respectively). The 4-way interaction between the main factors
was attributable to the fact that for the short retention span the difference between ‘same’ and
‘different’ responses was 2.4 pp for vocal rhythms and 5.0 pp for clapstick rhythms, whereas for
the long span they were 1.3 pp and 10.9 pp, respectively. In addition, the difference between
‘same’ and ‘different’ responses for clap and vocal rhythms was affected by TRAINING: for the
long retention span the musicians’ different decisions for vocal rhythms were 27.7 pp lower than
for clapstick, whereas same decisions were only reduced by 8.8 pp, and for non-musicians the
corresponding differences were 9.4 pp for different and 7.3 pp for same decisions (see fig. 3A).
Figure 3, A: Accuracy interaction plot for RETENTION:TIMBRE split by TRAINING from experiment 1. Accuracy(%): percentage correct, long: 12.5s RETENTION, short: 0.5s RETENTION, clap: clapstick rhythm, and vocal: vocal rhythm. B: Accuracy interaction plot for TIMBRE:CONGRUITY from experiment 1, clap: clapstick rhythm, and vocal: vocal rhythm, d: ‘different’ rhythm, s: ‘same’ rhythm. Error bars represent 1 standard error.
As it could be argued that our results might be due to changes in subjects’ bias to make
‘same’ or ‘different’ decisions, we additionally calculated the response bias measure β
(Macmillan & Creelman, 2005) for all main factor levels using the sdtalt R package (Wright,
Horry, & Skagerberg, 2009) and compared the two β for each factor level using t tests. We found
no bias difference between musicians and non-musicians or between short and long retention
spans. Only the t test for TIMBRE for musicians in the long RETENTION condition was
significant (t = -3.49, df = 14, p = .003); all other t tests showed p values between .08 and .67.
Closer inspection of this significant case showed that this effect was due to a change in the
proportion of correct responses for vocal rhythms in the different condition for the long
RETENTION as compared to short RETENTION. However, no corresponding bias changes for
the musicians in the short RETENTION, or in any of the tests for the non-musicians were found,
and a response bias change does not seem to contribute to these results.
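The bias measure β used above is computed from hit and false-alarm rates via the inverse normal CDF. A minimal sketch follows; this is our own illustration standing in for the sdtalt R package actually used in the analysis:

```python
import math
from statistics import NormalDist

def response_bias_beta(hit_rate, false_alarm_rate):
    """Signal-detection bias: beta = exp((zF**2 - zH**2) / 2), where zH and
    zF are the z-transformed hit and false-alarm rates. beta == 1 indicates
    an unbiased criterion."""
    z = NormalDist().inv_cdf
    z_hit, z_fa = z(hit_rate), z(false_alarm_rate)
    return math.exp((z_fa**2 - z_hit**2) / 2)
```

Equivalently, ln β = d′ · c with d′ = zH − zF and c = −(zH + zF)/2, so symmetric hit and false-alarm rates (e.g. .80 and .20) yield β = 1.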
1.4. Discussion:
In this experiment we found that the length of the memorization period affected both RT and
accuracy of the responses. Delayed recall resulted in a significant increase of RT compared to
immediate recall, with an overall increase of 0.541s and a significant reduction in accuracy by
5.1 pp. Rhythm memory performance declines with prolongation of the retention period from
0.5s to 12.5s. With the experimental design applied here (presentation of 1st stimulus → retention
period → presentation of 2nd stimulus → decision), it seems difficult to explain the observed
memory decay as an effect of distractors or interference during the retention period, as argued by
Nairne (2002). Because the retention periods did not involve interventions of any sort, a more
fitting explanation is an inherent instability of memory traces for rhythms. Evidence for decay of
an auditory memory trace has also recently been presented for the
frequency and intensities of sounds (Mathias, Micheyl & Shinn-Cunningham, 2014; Mercer &
McKeown, 2014). We will return to this topic in the general discussion.
Consistent with previous findings (Proctor, 1981) RTs for ‘different’ decisions were
longer by 0.71s than ‘same’ decisions. We also found a significant effect of CONGRUITY on
accuracy, with ‘different’ stimuli identified correctly 75.1% of the time, whereas the ‘same’ stimuli
were correctly identified 70.9% of the time. Neither the RT difference nor the accuracy
difference between ‘same’ and ‘different’ decision was affected by RETENTION. Interestingly,
though accuracy performance for clapstick only dropped by 1.7 pp from short to long
RETENTION, performance for vocal rhythm dropped by 8.3 pp, causing the significant
RETENTION:TIMBRE interaction.
Though RTs did not show a significant effect for TRAINING, musicians were
significantly more accurate (76.0%) than non-musicians (68.3%). This appears to be consistent
with our initial hypothesis that musical training might confer a general advantage. However, the
significant TRAINING:TIMBRE interaction showed that accuracy differences between
musicians and non-musicians are remarkably larger for clapstick (15.0 pp) than for vocal
rhythms (4.8 pp). The significant 2-way interaction of TRAINING:TIMBRE for RT showed that
musicians were faster than non-musicians on clapstick but slower on vocal rhythms. These
accuracy and RT differences between the two participant groups indicate that musical training
tends to be more of an advantage for extracting and memorizing clapstick than vocal rhythms, a
possible explanation for which is explored in the general discussion.
For the accuracy data, both the TRAINING:TIMBRE:CONGRUITY interaction and the
four-way interaction between the main factors were mainly due to the
different response rates of musicians and non-musicians for ‘different’ decisions on vocal and
clapstick rhythms, with musicians showing a larger difference (27.7 pp) for the two rhythms than
non-musicians (9.4 pp). For ‘same’ decisions the differences for the groups were quite similar,
8.8 pp and 9.4 pp, respectively. An explanation for this interaction could be that for ‘different’
decisions musicians, due to training, might make use of different stimulus representations (e.g.
perceptual categorizations, or feature extractions and conceptualizations) than non-musicians.
Different encodings - the forms in which information is placed in memory - also might be
involved in the significant interaction between TIMBRE and CONGRUITY we found for
decision accuracy. ‘Different’ decisions on vocal rhythms were only slightly (1.1 pp) better than
‘same’ decisions, whereas the corresponding difference for clapstick rhythms was 8.2 pp.
Assuming that ‘same’ and ‘different’ decisions are based on two different matching processes
(Bagnara et al., 1982; Keuss, 1977; Markman & Gentner, 2005), a fast, holistic one for ‘same’
and a slow, analytic one for ‘different’ decision, the two processes may involve different forms
of encodings that could explain both our accuracy and RT data. The fact that Keuss (1977) found
a same/different effect on RTs but not on accuracy suggests that same/different decisions in
(visual) letter and digit search tasks (for Keuss’ study) involve different processes than in
(auditory) rhythm comparison tasks (for the current study). An alternative explanation which
assumes only one matching process with rechecking for different decisions (Briggs & Johnson,
1973; Krueger, 1978), and operating on only one form of rhythm representations is not able to
account for our accuracy data (i.e. the significant interaction between TIMBRE and
CONGRUITY).
The experiment shows that, on average, clapstick rhythms led to significantly more
accurate responses (78.4%) than vocal rhythms (67.5%). The biggest advantage seemed to lie
with the clapstick/different stimuli, which were correctly identified 82.4% of the time, compared
to 74.5% for clapstick/same, and 67.2% and 67.7% for voice/same and different, respectively.
Though it might seem possible that this result is confounded by different levels of complexity of
the vocal and instrumental rhythms, Hung (2011) tested the relationship of rhythms consisting of
simple integer ratios and non-integer ratios intervals to the respective error rates using a χ2 test,
and found the relationship not significant (p = .53). In addition, the way the stimuli were chosen
(see section 1.2 Stimuli) eliminated any possible semantic influence to account for these
differences. A remaining plausible explanation is that the physical sound features of clapstick
and vocal rhythms could account for the difference in memorization. One potential variable that
may drive this difference, the filled versus unfilled nature of the stimuli, is tested in experiment
4.
An interesting question was highlighted during the post-experiment debriefing interviews: what
strategy did subjects use over the longer ISIs to retain these rhythms? Many subjects noted
that they used some form of silent, “vocal” repetition for the vocal rhythms, and some form of
non-vocal motor repetition (e.g. finger-tapping) for the clapstick rhythms. This was true for both
musicians and non-musicians. A likely candidate for this memorization process is the
articulatory loop (Baddeley & Hitch, 1994), and this is tested in the second experiment.
Experiment 2:
2.1.Introduction:
The articulatory loop is thought to offset memory trace degradation through repeated rehearsal of
stimuli (Baddeley & Hitch, 1994). If it is a mechanism for memorization of verbal information,
as Baddeley and Hitch suggest, then it may also be involved in processing timing information
produced by human vocal organs. Thus we might not expect clapstick rhythms to be ‘rehearsed’
via the articulatory loop as they are not produced in the vocal modality.
As noted in the discussion for experiment 1, multiple subjects reported rehearsal
strategies for the vocal rhythm task that qualitatively support the idea of the involvement of the
articulatory loop. Interestingly, however, they also reported an alternate strategy for the clapstick
rhythm task; namely they used non-vocal ‘rehearsal’ by tapping or other rhythmic movements,
which may implicate an alternate rehearsal strategy other than, or in addition to, the articulatory
loop. These subject reports seem to contrast with the findings of Saito and Ishio (1998), wherein
concurrent sub-vocal articulation was found to degrade performance on a rhythmic reproduction
task more strongly than either a concurrent drawing task or a control condition (no simultaneous motor task).
The rhythmic stimuli in that study were pure tones, however, and the possibility of differential
memory for vocal and non-vocal sounds was not considered. The use of articulatory suppression
has been shown to be a consistent method of occupying the limited space of the phonological
loop, thereby preventing participants from using the associated refresh mechanism and resulting
in the deterioration of stored material (Baddeley, 1975; Keller, Cowan & Saults, 1995; Salamé &
Baddeley, 1982). Concurrent finger-tapping has sometimes been used as a comparison control
for articulatory suppression — the two tasks are assumed by researchers to require roughly
equivalent attention and/or motor activation (e.g. Baddeley, Eldridge, & Lewis, 1981; Halliday,
Hitch, Lennon, & Pettipher, 1990; Papagno, Valentine, & Baddeley, 1991). However, because of
the differences we found for vocal and non-vocal rhythm processing, one might expect to see
differential effects from these concurrent motor tasks. Specifically, we would expect that
concurrent finger-tapping would degrade clapstick rhythm memory more than vocal rhythm
memory, and that sub-vocal articulation would degrade vocal rhythm memory more than
clapstick rhythm memory. Experiment 2 was designed to test these hypotheses.
2.2.Methods:
Participants:
Musical experience was determined as in experiment 1. Twelve non-musicians (5 female; avg.
age = 23 years, min. 19, max. 34) and 14 musicians (8 female; avg. age = 25 years, min. 21, max.
34) participated in the experiment and all reported normal hearing. Non-musicians had an
average of 0.3 years of formal music training (min. = 0, max. = 3), and musicians 12.8 years
(min. = 7, max. = 18). Three subjects (2 non-musicians) were unable to complete the entire
experiment and are not included in the analysis.
Stimuli:
The stimuli were those with the long ISI (12.5s) from experiment 1, again grouped into two
blocks of 18 and 19 pairs, respectively. These blocks were combined with the three distractor
tasks (control, i.e. no motor action; finger-tapping; repeated sub-vocal articulation of the syllable
‘the’), and the resulting six blocks were presented twice, once with the instruction to respond to
the vocal rhythm, and once to respond to the clapstick rhythm. Block presentation and
experimental tasks were balanced across subjects.
Equipment:
The equipment was the same as experiment 1.
Procedure:
The procedure for this experiment was the same as experiment 1 with two changes: subjects
were informed which DISTRACTOR task to perform during each block, and subjects used their
dominant hand for the tapping DISTRACTOR, and their non-dominant hand to push the
appropriate response buttons. The experimenter monitored participants’ lip and jaw movements
for the sub-vocal task and hand movement for the tapping task to ensure adequate performance
of the DISTRACTOR tasks, and reminded participants of proper procedure when necessary. The
experiment took approximately 2 hours to complete.
2.3.Results:
In addition to TRAINING, TIMBRE, and CONGRUITY as used in the first experiment, the
second and third experiments include DISTRACTOR with the three levels described above.
Reaction time:
A repeated measurement ANOVA with TRAINING as between- and the other three factors as
within-subject factors was performed on the RT data after exclusion of outliers and averaging
over factor combinations as in experiment 1 (table 3). TIMBRE and CONGRUITY were found
to be significant at the α = .05 level, but there was no significant effect for TRAINING or
DISTRACTOR. The mean difference between same (1.2s; C.I. 1.16 to 1.29) and different
CONGRUITY (2.15s; C.I. 2.10 to 2.24) was 0.95s (Cohen’s d = 2.32), and decisions on vocal
rhythms (1.63s; C.I. 1.53 to 1.73) were 0.1s faster (Cohen’s d = 0.37) than those on clapstick
rhythms (1.73s; C.I. 1.62 to 1.85).
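The Cohen’s d values reported here can be computed from the two sets of RT means; the following is a minimal sketch (the paper does not state which variance-pooling variant was used, so the classic pooled-standard-deviation form for two independent samples is assumed):

```python
import math

def cohens_d(xs, ys):
    """Cohen's d for two samples, using the pooled (unbiased) standard deviation."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    # unbiased sample variances
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    pooled_sd = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / pooled_sd
```

For within-subject factors such as CONGRUITY, a paired variant (standardizing the mean difference by the SD of the difference scores) may have been used instead.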
(Insert Table 3 here)
There was a significant 2-way interaction for DISTRACTOR and CONGRUITY
indicating that the ‘different’ decisions were more affected by the DISTRACTOR than were the
‘same’ decisions. A significant interaction was also found between TRAINING, DISTRACTOR
and CONGRUITY, indicating that musicians’ ‘different’ decisions were affected by the tapping
distractor, whereas non-musicians’ were not (see fig. 4).
Figure 4, Reaction time (RT) interaction plot for DISTRACTOR:CONGRUITY, split by TRAINING from experiment 2. Contr: control, sub-voc: sub-vocal articulation, tap: finger-tapping, d: ‘different’ rhythm, s: ‘same’ rhythm. Asterisk marks musicians’ mean for tapping that is significantly different from their control and sub-vocal means. Error bars represent 1 standard error.
Accuracy:
A generalized linear mixed effects analysis was performed on the mean accuracy data with
TRAINING, DISTRACTOR, TIMBRE, and CONGRUITY as fixed effects, and subjects as
random effects. A binomial error distribution and the logit link function were chosen for the
model. An ANOVA was performed on the minimal fitted model; the corresponding p values were
calculated using the Satterthwaite approximation, and the results are displayed in table 4.
(Insert Table 4 here)
All four main factors of the model as well as the interactions DISTRACTOR:TIMBRE,
DISTRACTOR:CONGRUITY, and TIMBRE:CONGRUITY were significant (fig. 5). Model
estimates showed that musicians gave more correct responses than non-musicians, with an effect
size (mean response difference) of 7.8 pp. Response rates to both the tap (-10.3 pp) and the sub-
vocal distractor (-11.4 pp) were lower than for the control, but there was no significant difference
between the two distractors. The correct response rate for vocal rhythms was 13.4 pp lower than
that for clapstick rhythms and same decisions were 9.1 pp better than different decisions.
The interaction DISTRACTOR:TIMBRE showed that in comparison to controls the
accuracy for vocal rhythms was less affected by tapping (-8.2 pp) or sub-vocal articulation (-9.3
pp) than the corresponding accuracy for clapstick rhythms (-12.5 pp and -13.5 pp, respectively).
The interaction DISTRACTOR:CONGRUITY was due to the fact that for controls the difference
between ‘same’ and ‘different’ decisions was 4.2 pp, whereas for tapping it was 14.0 pp and for
the sub-vocal distractor it was 9.2 pp. Finally, the TIMBRE:CONGRUITY interaction arose from
the fact that the difference between 'same' and 'different' decisions was larger for vocal rhythms
(15.9 pp) than for clapstick rhythms (2.5 pp).
Figure 5, A: Accuracy interaction plot for TIMBRE:CONGRUITY from experiment 2. Accuracy(%): percentage correct, clap: clapstick rhythm, vocal: vocal rhythm, d: ‘different’
rhythm, s: ‘same’ rhythm. B: Accuracy interaction plot for DISTRACTOR:CONGRUITY from experiment 2. Contr: control, sub-voc: sub-vocal articulation, tap: finger-tapping, d: ‘different’ rhythm, s: ‘same’ rhythm. Error bars represent 1 standard error.
We also calculated the response bias measure β in the same way as in experiment 1 to test
whether the results were produced or affected by changes in subjects’ bias to make ‘same’ or
‘different’ decisions. The grouped t tests for the main factor levels led to p-values ranging from
.39 to .09, none of them being significant. There was no indication that the results were
influenced by changes in the subjects’ response bias.
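The exact formulation of β used in experiment 1 is not reproduced in this excerpt; as an illustrative sketch, the standard signal-detection likelihood-ratio measure can be computed from hit and false-alarm rates as follows (treating, e.g., correct ‘different’ responses as hits and incorrect ‘different’ responses to ‘same’ pairs as false alarms is an assumption):

```python
from statistics import NormalDist
import math

def response_bias_beta(hit_rate, fa_rate):
    """Likelihood-ratio bias beta = exp((z_FA^2 - z_H^2) / 2)."""
    # z-transforms of the hit and false-alarm rates
    z_h = NormalDist().inv_cdf(hit_rate)
    z_fa = NormalDist().inv_cdf(fa_rate)
    return math.exp((z_fa ** 2 - z_h ** 2) / 2.0)
```

β = 1 indicates no bias; values below 1 indicate a liberal criterion, values above 1 a conservative one.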
2.4 Discussion:
As expected, the additional cognitive load of the concurrent motor tasks led to reduced
accuracy in same/different decisions (control: 76.9%, tapping: 66.7%, sub-vocal articulation:
65.8%). However, there was no significant difference between tapping and sub-vocal
articulation. This result does not support the hypothesis that the articulatory loop is involved in
differential memorization of vocal versus instrumental rhythms. It also contrasts with Saito and
Ishio’s (1998) results, which showed that a concurrent sub-vocal articulation task significantly
degraded a rhythmic reproduction task while a concurrent drawing task did not. Furthermore,
given the significant difference in accuracy for memory of vocal versus clapstick rhythms found
in experiment 1, we expected a particular type of interaction between the DISTRACTOR motor
tasks and the TIMBRE task: that a concurrent sub-vocal articulation task would interfere more
with memory for vocal rhythm than would a concurrent finger-tapping task, but that was not
what we found.
Although the hypothesis that the articulatory loop is generally involved in memorization
of rhythms was not supported, results were consistent with those of experiment 1. As expected,
clapstick decisions (76.3%) were more accurate than the voice decisions (63.1%) overall. In
addition, compared with the control, RT was significantly faster for vocal than for clapstick
rhythms for the two distractor tasks (0.095s for tapping and 0.15s for sub-vocal articulation),
suggesting that for dual-rhythm stimuli vocal information is processed faster if cognitive
resources are constrained by concurrent motor tasks. Furthermore, non-musicians (65.1%) were
slightly more affected by the concurrent tasks than musicians (72.9%).
In line with results from experiment 1, ‘same’ decisions were made faster and more
accurately (RT = 1.2s; 74.1%) than ‘different’ decisions (2.15s; 65.0%), adding further support
for the idea that different processes and representations may be involved in ‘same’ and
‘different’ decisions.
The two interactions of TIMBRE:CONGRUITY and DISTRACTOR:CONGRUITY
were a further indication that different processes may be involved in same/different decisions
(Keuss, 1977; Markman & Gentner, 2005), one fast, holistic, and the other slow and analytic: the
slow one is cognitively more demanding (longer RT) and therefore more affected by the
additional cognitive load of a distractor task.
This experiment provided further evidence supporting the idea of differential processing
of vocal versus instrumental rhythms. However, the introduction of concurrent tapping or sub-
vocal articulation did not yield a significant difference between these two tasks, suggesting that
the articulatory loop may not be involved in rhythm decision tasks. The following experiment
will explore the possibility of a role for the articulatory loop in the reproduction of memorized
rhythms.
Experiment 3:
3.1.Introduction:
The outcome of experiment 2 seems to stand in contrast with the study of Saito and Ishio (1998)
that indicated an involvement of the phonological loop in rhythm memory. Differences between
the two experiments that could explain the different results concern stimulus features and
experimental tasks. Saito and Ishio used empty rhythms, while we used empty and filled rhythms
(differences between empty and filled rhythms will be addressed in experiment 4 below), and
their participants had to reproduce rhythms whereas ours had to make decisions. Hence our third
experiment, in which a rhythm reproduction task is substituted for the same/different decision
task, will test the idea that the articulatory loop may be recruited for rhythm reproduction tasks.
3.2.Methods:
Participants:
Musical experience was determined as in experiment 1. Fourteen non-musicians (8 female; avg.
age = 24.4 years, min. 19, max. 34) and 12 musicians (5 female; avg. age = 27.27 years, min. 20,
max. 35) participated in this experiment. Non-musicians had an average of 2.1 years (min. 0,
max. 8) and musicians an average of 17.05 years of formal music training (min. 5, max. 28).
Stimuli:
Twenty-four stimuli that had received the highest scores were selected from the material used in
experiment 2. Each trial was set up as follows: a stimulus was followed by a silent delay interval
of 12.5s which ended with a short 1kHz “beep” as a signal for the participants to start tapping the
rhythm they had just heard. The 24 trials were split into two blocks of 12 trials. Each block was
repeated with every combination of DISTRACTOR and TIMBRE tasks. Block presentation and
experimental tasks were balanced across the subjects in order to reduce the possible influence of
practice effects.
Equipment:
The equipment was the same as experiment 1. In addition, participants tapped the response
rhythms with a small metal rod on a wooden tablet in front of the laptop with a built-in
microphone, and their responses were recorded by the presentation software (DMDX).
Procedure:
The procedure for this experiment was the same as experiment 2 but with the task changed to the
reproduction of the stimulus rhythms. After the experiment, which took approximately 90
minutes, participants were debriefed.
Data Analysis:
Responses were classified as correct if the number of response and stimulus events was identical,
and if the ratios of the response intervals deviated less than 20% from those of the stimulus
intervals.
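The scoring criterion can be sketched as follows (the paper does not specify how interval ratios were computed; normalization of each inter-onset interval by the total duration is an assumption):

```python
def is_correct(stim_onsets, resp_onsets, tol=0.20):
    """Score a reproduction: correct if the event count matches and each
    normalized inter-onset interval deviates less than `tol` (relative)
    from the corresponding stimulus interval."""
    if len(stim_onsets) != len(resp_onsets):
        return False
    # inter-onset intervals of stimulus and response
    stim_iois = [b - a for a, b in zip(stim_onsets, stim_onsets[1:])]
    resp_iois = [b - a for a, b in zip(resp_onsets, resp_onsets[1:])]
    stim_total, resp_total = sum(stim_iois), sum(resp_iois)
    stim_ratios = [i / stim_total for i in stim_iois]
    resp_ratios = [i / resp_total for i in resp_iois]
    return all(abs(r - s) / s <= tol for s, r in zip(stim_ratios, resp_ratios))
```

Note that ratio-based scoring makes the criterion tempo-invariant: a reproduction at half speed with the same relative timing counts as correct.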
3.3.Results:
A linear mixed effects analysis (see experiment 1) on the accuracy data was performed with
TRAINING, DISTRACTOR, and TIMBRE as fixed effects and subjects as random effects. The
minimal fitted model only contained the main factors as none of the interactions turned out to be
significant. ANOVA results for the model and model parameter estimates are shown in table 5.
(Insert Table 5 here)
The ANOVA confirmed that only the main factors were significant. Model parameter
estimates, their associated z-values, and the effect sizes (mean response differences) indicated the
following: musicians were correct 25 pp more often than non-musicians; the sub-vocal
DISTRACTOR affected the accuracy significantly (-9.7 pp), but the tapping DISTRACTOR was
not significantly different from the control; accuracy for the reproduction of clapstick rhythms was
41 pp better than that for vocal rhythms.
Figure 6, Accuracy interaction plot for DISTRACTOR:TIMBRE, split by TRAINING from experiment 3. Contr : control, sub-voc: sub-vocal articulation, tap: finger-tapping, voc: vocal rhythm, clap: clapstick rhythm. Error bars represent 1 standard error.
3.4.Discussion:
In this reproduction experiment we found that rhythm memory was significantly more negatively
affected by a concurrent sub-vocal articulation than by a tapping task. The larger effect of the
sub-vocal articulation task can be attributed to an involvement of the articulatory loop in short-
term memorization, and supports our initial hypothesis about such an involvement in a delayed
rhythm reproduction task. This is in agreement with the findings of Saito and Ishio (1998).
However, together with the results from our experiment 2, we arrive at a different interpretation
than these authors. It seems unlikely that the articulatory loop is generally involved in rhythm
memory: if rhythms are kept in memory for non-motor responses (comparison and/or decision) a
concurrent sub-vocal articulation was not found to have a stronger effect than a concurrent
tapping task (experiment 2). However, if rhythms are memorized for subsequent motor action,
concurrent sub-vocal articulation has a significantly stronger effect than concurrent finger-
tapping, suggesting the involvement of the articulatory loop. A likely explanation is that rhythm
memorization for action involves a different representation than memorization for comparison
and decision, e.g. a form of encoding that enables vocal articulation (or pronunciation) and
processing by the articulatory loop. Such an explanation is supported by recent studies (Coull et
al., 2010; Wiener et al., 2011) indicating different processing networks underlying perceptual
and motor timing. Whether these different representations or processes exist simultaneously,
being tapped on demand, or whether they are only created in response to the specific
experimental task, remains to be addressed in further research. Earlier studies on verbal short-
term memory (Levy, 1971; Morton, 1970), however, suggested coexisting dual acoustic and
articulatory encodings for short-term memory of verbal material, and seem to point towards the
first possibility.
In our experiment musicians (70.9% mean correct responses) were better than non-
musicians (45.2% mean correct responses). The difference between them (25.7 pp) was much
larger than in the decision experiment (7.8 pp). This could be due to two factors: First, the
training that is a criterion for musician status in our study likely means that non-musicians were
less able to perform a correct reproduction of the auditory stimulus. Second, training received by
the musicians likely means they were able to transform the auditory stimulus rhythm into a more
stable form for memorization. Superior memory in musicians has been suggested by Jakobson et
al. (2008) and Tervaniemi et al. (2001). Determining the relative role of these two factors –
performance and memory trace stability – in causing this difference will require further research.
Another difference between experiments 2 and 3 was that the difference between
clapstick and vocal rhythms (40.4 pp) was larger than in the decision experiment (13.4 pp).
Notably, there was a clear experimental task effect: clapstick rhythms were no more affected in
the reproduction than in the decision task (experiment 2), whereas the accuracy for vocal
rhythms was considerably reduced. In addition, reproductions of vocal rhythms were more affected
by the sub-vocal DISTRACTOR than those of clapstick rhythms. As there were no significant
interactions between TRAINING and TIMBRE, the results for the main factors can be
considered as further support for our hypothesis that clapstick and vocal rhythms have different
representations in memory, and suggest that the representation of the clapstick rhythms for
reproduction tasks and their representation for decision tasks may not be very different from each
other. This is in contrast to the representations of vocal rhythms, which seem to more strongly
rely on articulatory encoding in the reproduction task.
Experiments 2 and 3 support the idea that multiple forms of representation or encoding may be
involved in short-term rhythm memorization, and that they are activated and/or tapped
depending on whether the task requires motor action or comparison/decision. They also add
further evidence for different processing and representations of vocal and instrumental rhythms.
In the following experiment we test to what extent these differences can be explained by the
contrast between the ‘empty’ clapstick and the ‘filled’ vocal rhythms of our stimuli.
Experiment 4:
4.1. Introduction:
Vocal rhythms are formed by a complex combination of changes in pitch, duration, amplitude,
and timbre while clapstick rhythms are constituted by events that show minimal variation in
pitch and timbre. One of the conspicuous differences between the two is that vocal rhythms are a
series of seemingly continuous sounds whereas clapstick rhythms are a set of discontinuous
sounds. This difference has been described by the terms ‘filled’ and ‘empty’: for filled intervals
the end of one event coincides with the start of the following event, whereas for empty intervals
one event ends before the following event starts, i.e. events are separated by silence.
Differences in the perception of these two interval types were already described over a
hundred years ago. In his Principles of Psychology (1890), James described the ‘filled interval
illusion’, which refers to the phenomenon that filled intervals are perceived as longer than empty
intervals, even though the duration of both intervals is the same. Subsequently, experimenters
have found that the estimation of filled interval duration is more accurate than that of empty
intervals (Goldstone & Goldfarb, 1963; Rammsayer & Lima, 1991; Rammsayer & Skrandies,
1998). Though most research on duration perception of filled and empty intervals deals with
single intervals (but see: Repp & Bruttomesso, 2009), it seems possible that these effects will
also be found for interval sequences.
Here our hypothesis is that, if the difference in memory for vocal and clapstick rhythms is
due to the difference between empty and filled rhythms, then we should find no difference
between the memory for vocal and filled instrumental rhythms. We would, however, expect to
find a difference between empty and filled instrumental rhythms of about the same order as the
differences between vocal and clapstick rhythms in experiment 1.
4.2. Methods:
Participants:
Musical experience was determined as in experiment 1. Twenty-one participants with self-
reported normal hearing took part in this experiment. Ten non-musicians (5 female; avg. age =
27.1 years, min. 20, max. 45) had an average of 2.55 years of formal music training (min. 0, max.
10), and 11 musicians (6 female; avg. age = 28.27 years, min. 22, max. 34) had an average of
20.55 years of training (min. 10, max. 34).
Stimuli:
Stimuli were prepared as follows: The vocal rhythm set consisted of 14 sound clips from the
material used in the previous experiments. From each of these files we extracted the amplitude
envelope, and convolved it with a cello sound sample (cello model Charles Quenoil 1923;
sample from the University of Iowa Electronic Music Studio) to create the filled instrumental
rhythms. Several instrumental sounds were tried before settling on the cello as an adequate
stimulus choice. The cello sample was judged similar enough to the male voice in terms of pitch
and timbre in informal listening tests. Furthermore, the re-synthesis procedure – described next –
caused less obvious distortions using the cello sample than the other instrument samples tried.
Fourteen empty instrumental rhythms were formed by using 0.08s sound events created from the
same cello sound sample, with 0.03s rise time and 0.05s decay time to obtain amplitude
envelopes similar to those of the clapstick sounds of our earlier experiments. Their amplitude
peaks were aligned in time with the amplitude peaks of the filled rhythms, and the mean
amplitude of the empty rhythms was then adjusted to match that of the two other sets.
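The construction of the filled rhythms and of the empty-event envelopes can be sketched as follows (a simplified illustration: the sample rate, the envelope-smoothing window, linear rise/decay ramps, and peak normalization are assumptions not specified in the text):

```python
def envelope(x, win):
    """Amplitude envelope: rectify the signal, then moving-average over `win` samples."""
    rect = [abs(v) for v in x]
    return [sum(rect[max(0, i - win + 1): i + 1]) / len(rect[max(0, i - win + 1): i + 1])
            for i in range(len(rect))]

def convolve(a, b):
    """Full linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

def filled_rhythm(vocal, cello, win=441):
    """Impose the vocal amplitude envelope on a cello sample by convolution,
    then truncate to the original length and peak-normalize."""
    y = convolve(envelope(vocal, win), cello)[: len(vocal)]
    peak = max(abs(v) for v in y)
    return [v / peak for v in y]

def empty_event(sr=44100, rise=0.03, decay=0.05):
    """0.08 s event envelope: 0.03 s linear rise followed by 0.05 s linear decay."""
    n_up, n_down = int(rise * sr), int(decay * sr)
    up = [i / n_up for i in range(n_up)]
    down = [1.0 - i / (n_down - 1) for i in range(n_down)]
    return up + down
```

In practice such processing would be done with a DSP library on sampled audio; the pure-Python version above only illustrates the logic of the re-synthesis procedure.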
As for the previous experiments we created variants for each rhythm that differed in one
event by eliminating, adding, or changing the timing of one event. Types and locations of
changes within the files were evenly distributed across the sets. The resulting 84 rhythms (3*14
‘parents’ and 3*14 variants) were grouped into pairs in such a way that each of the parent
rhythms occurred once paired with itself and once with its variant. With these pairs we formed
two sets of 84 stimulus files, one for the immediate recall with an ISI of 0.5s between the two
rhythms, and one with an ISI of 15s for the delayed recall. Each of the two sets was then split
into two presentation blocks of 42 stimuli in such a way that rhythms paired with themselves and
with their variants did not occur in the same block.
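The pairing and blocking constraint described above can be sketched as follows (the actual randomization procedure is not specified; this is an illustrative implementation that counterbalances which block receives a parent’s ‘same’ pair):

```python
import random

def make_blocks(parents, variants, seed=0):
    """Pair each parent rhythm once with itself ('same') and once with its
    variant ('different'), splitting the pairs into two blocks so that a
    parent's two pairs never occur in the same block."""
    rng = random.Random(seed)
    block_a, block_b = [], []
    for p, v in zip(parents, variants):
        same_pair, diff_pair = (p, p), (p, v)
        if rng.random() < 0.5:  # counterbalance which block gets the 'same' pair
            block_a.append(same_pair)
            block_b.append(diff_pair)
        else:
            block_a.append(diff_pair)
            block_b.append(same_pair)
    rng.shuffle(block_a)
    rng.shuffle(block_b)
    return block_a, block_b
```

By construction, every parent contributes exactly one pair to each block, so a rhythm and its variant-based comparison are never heard within the same block.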
Procedure:
The procedure for this experiment was the same as experiment 1. The experiment took
approximately 50 minutes.
4.3. Results:
The fourth experiment includes an additional factor 'RHYTHM' with three levels denoting the
stimulus types used: vocal, “filled” instrumental, or “empty” instrumental.
Reaction time:
A repeated measurement ANOVA with factors TRAINING, RETENTION, RHYTHM, and
CONGRUITY was performed on the means of the subjects’ RT data. All main factors were
found to be significant (table 6), and there was a significant 2-way interaction for RHYTHM and
CONGRUITY.
(Insert Table 6 here)
The means for RHYTHM were 1.26s (vocal; C.I. 1.14 to 1.38), 1.39s (filled; C.I. 1.26 to
1.53), and 1.21s (empty; C.I. 1.12 to 1.30). Musicians (1.18s; C.I. 1.09 to 1.26) were 0.23s faster
(Cohen’s d = 0.34) than non-musicians (1.41s; C.I. 1.31 to 1.50) and RTs for the short
RETENTION (1.13s; C.I. 1.04 to 1.22) were 0.32s faster (Cohen’s d = 0.83) than for the long
RETENTION (1.45s; C.I. 1.36 to 1.54). The means of both participant groups showed
significantly longer RTs for delayed than for immediate recall. ‘Same’ decisions were made
0.78s faster (Cohen’s d = 2.76) than ‘different’ decisions. The interaction of
RHYTHM:CONGRUITY was caused by the filled instrumental rhythms showing longer RTs
than the other two rhythms in the ‘different’ condition, but not in the ‘same’ condition (see fig.
7A). For ‘different’ decisions, filled instrumental rhythms showed the longest RT and empty
rhythms showed the shortest RT in both RETENTION conditions and for both subject groups.
Figure 7, A: RT interaction plot for RHYTHM:CONGRUITY from experiment 4. B: RT interaction plot for RHYTHM :RETENTION from experiment 4. Emp: empty, fil: filled, voc: vocal rhythms, s: ‘same’ rhythm, d: ‘different’ rhythm, long: 15s RETENTION, short: 0.5s RETENTION. Error bars represent 1 standard error.
Accuracy:
A generalized linear mixed effects analysis with TRAINING, RETENTION, RHYTHM, and
CONGRUITY as fixed effects and subjects as random effects was performed following the
procedure outlined in exp.1. Results for the ANOVA and model parameter estimates of the
generalized linear mixed model are displayed in table 7.
(Insert Table 7 here)
ANOVA results showed that main factors TRAINING, RHYTHM and CONGRUITY, as well as
interactions RETENTION:RHYTHM, RETENTION:CONGRUITY, RHYTHM:CONGRUITY,
and the three-way interaction TRAINING:RHYTHM:CONGRUITY were significant (fig. 8).
Estimated standardized effect sizes for the model are shown in table 7, and the simple effect
sizes (mean response difference) for the responses indicated the following: musicians gave 10.2
pp more correct responses than non-musicians; correct response rates for filled RHYTHM were
28.9 pp lower than for empty, while the difference between empty and vocal was 2.8 pp; ‘same’
pairs were judged 12.1 pp more correctly than ‘different’ pairs. Vocal RHYTHM was 7.7 pp
more correct for the short than for the long retention span, while responses to filled RHYTHM
did not change significantly. Responses for ‘same pairs’ were correct 18.7 pp more often for the
short than the long RETENTION, whereas the rate change for ‘different’ pairs was not
significant. Response rates for both filled (+29.7 pp) and vocal RHYTHM (+12.9 pp) were better
in the ‘same’ than in the ‘different’ condition. Finally, the difference in correct responses for
‘same’ and ‘different’ pairs in non-musicians and musicians was significant, both for filled (n:
45.5 pp, m: 13.9 pp) and for vocal RHYTHM (n: 24.0 pp, m: 1.9 pp).
Figure 8, A: Accuracy interaction between RETENTION:CONGRUITY from experiment 4. Accuracy(%): percentage correct, long: 15s RETENTION, short: 0.5s RETENTION, s: ‘same’ rhythm, d: ‘different’ rhythm. B: Accuracy interaction plot for RHYTHM:CONGRUITY, split by TRAINING from experiment 4. Emp: empty, fil: filled, voc: vocal rhythms, s: ‘same’ rhythm, d: ‘different’ rhythm. Error bars represent 1 standard error.
4.4. Discussion:
The results of this experiment provide evidence for differential memory for vocal, filled and
empty instrumental rhythms in terms of both RT and accuracy. The participants showed the
shortest RT (1.21s) and the best accuracy (88.6%) for empty rhythms. In contrast, filled rhythms
had the longest RT (1.39s) and showed the lowest accuracy (58.1%). This supports one part of
our initial hypotheses about the difference between empty and filled rhythms. Given that the
empty rhythms were derived from the corresponding filled rhythms, these two sets differ only in
their amplitude envelopes. The results indicate that it was easier to distinguish events and extract
temporal information from empty than from filled rhythms. With spectral composition and pitch
being identical, the factor that seems to account most for this difference is the clear delineation
of event on- and offsets in the empty rhythms. This contrasts with results from studies on single
filled and empty intervals mentioned in 4.1, and indicates that, unlike duration perception (Repp
& Bruttomesso, 2009), accuracy data for single intervals do not generalize to rhythmic sequences
of these intervals.
The second part of our initial hypothesis was that if the memory differences between
vocal and clapstick rhythms were only due to the contrast between filled and empty sounds, we
should find a difference between vocal and empty rhythms but not between vocal and filled
rhythms. However, we found that the differences between vocal and empty rhythms (RT: 0.05s,
accuracy 2.8 pp) were much smaller than between filled and empty rhythms (0.18s, 28.4 pp) and
that the difference between filled and vocal rhythms (0.13s, 26.1 pp) was significant. The
differences in memory for vocal and filled instrumental rhythms can be explained by the
different sounds they used: vocal sounds contain pitch and spectral changes as additional cues for
the rhythm extraction that seemed to improve decision accuracy. At the same time, though vocal
sounds are more complex than the filled instrumental sounds, these additional features seem to
facilitate processing and lead to faster decisions. A main factor contributing to these differences
is probably the preference of the human auditory system for vocal sounds. Neuroanatomical
(Rauschecker, Tian, & Hauser, 1995; Wang, 2000), electrophysiological (Levy et al., 2003) and
imaging studies (Belin et al., 2000; Binder et al., 2000; Zatorre et al., 2002) indicate that the
auditory cortex includes functionally specialized regions that show response preferences for
conspecific vocalizations. The voice is the core medium of human communication, and this preference is manifested in our finding that complex voice sounds were processed faster than simpler ‘non-voice’ sounds. This is in line with speech perception
research showing that human voice sounds lead to faster and better identification than closely
modelled synthetic sounds (Hillenbrand & Nearey, 1999; ter Schure, Chládková, & van Leussen,
2011).
In this experiment, musicians showed significantly faster RT (-0.14s) and better accuracy
(+10.2 pp) than non-musicians. This suggests that musical training contributes positively to rhythm
memory. This is consistent with the results of the rhythm recognition tasks in experiments 1 and
2, and also with those of the reproduction task in experiment 3. The potential role of musical
training on memory for rhythm will be explored further in the general discussion.
The ‘same’ CONGRUITY condition showed faster RT (-0.72s) and better accuracy
(+12.1 pp) than the different condition. Similar results were also found in the first two
experiments, and suggest that, unlike for visuo-spatial stimuli (Keuss, 1977), same/different decisions on acoustic-temporal stimuli affect both the RTs and the accuracy of responses. The
results from these three experiments support the idea that different processes and representations
are involved in same/different decisions (Bagnara et al., 1982; Markman & Gentner, 2005), with
the slower RT for ‘different’ decisions probably being related to additional and different re-
check processes when detecting differences (see section 1.4).
The significant interaction of RETENTION and CONGRUITY reflects that accuracy in the same condition declined significantly at long RETENTION, whereas accuracy in the different condition improved slightly. One plausible explanation is
that different types of rhythm representation may be involved in same and different judgments.
Perceptual pre-categorical information may be used for the holistic and automatic same decisions
while categorical representation (e.g. music notation) may be used for the analytic and deliberate
different decisions. Perceptual representations would be derived from sensory memory, which
fades rapidly (Baddeley, 1990). Categorical representations, on the other hand, are better
memorized than sensory traces (Snyder, 2000) and could explain the sustained accuracy in the different
condition. Finally, the interactions between RHYTHM and CONGRUITY and between
TRAINING, RHYTHM and CONGRUITY suggest that the smaller difference between
same/different pairs in musicians may be related to their improved ability to form categorical
representations for rhythm due to their training and knowledge of music notation.
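As an aside on measurement: the same/different accuracies discussed above are reported in percentage points, which conflate sensitivity with response bias; a signal-detection analysis in the style of detection theory (MacMillan & Creelman, 2005) would separate the two. The following is an illustrative sketch only, using purely hypothetical hit and false-alarm rates rather than values from this study:

```python
from statistics import NormalDist

def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    For a same/different task, a 'hit' is responding 'different' to a
    different pair; a 'false alarm' is responding 'different' to a same pair.
    Rates must lie strictly between 0 and 1 (apply a correction otherwise).
    """
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical rates for illustration only:
print(round(d_prime(0.85, 0.20), 2))  # sensitivity, independent of bias
```

A d'-style re-analysis could make the congruity effects reported here directly comparable across experiments with different response biases.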
General Discussion
Decay and Encoding
Two retention periods were used in experiments 1 and 4: immediate recall with an ISI of 0.5s and delayed recall with ISIs of 12.5s and 15s, respectively. Results from both experiments show that RT
increases and accuracy decreases with increasing retention span. Two factors that have been
shown to cause worsening memory performance, divided attention (Craik & Kester, 1999) and
interference (Nairne, 2002), do not seem to play a major role in our experiments. We would
expect attention to affect clapstick and vocal rhythms and same/different conditions in the same
way, whereas our results show different deterioration rates for the clapstick-vocal rhythms as
well as for same/different conditions. Additionally, Mercer and McKeown (2014) have shown
that refocusing attention in delayed decision experiments through the introduction of alert signals
does not improve memory performance; hence degraded attention does not seem to be a cause of the
observed decay. Though our study does not directly contrast interference and decay, retroactive
interference (McGeoch, 1932) can be excluded as our experiments 1 and 4 used neither
distractors nor interference from irrelevant stimuli during the retention period, and decisions on
each stimulus pair were made before a new pair was presented. Consequently, we take the results
from our experiments as evidence in favour of decay in short-term rhythm memory.
The idea that memory traces fade with time, however, has been strongly contested since
McGeoch (1932) proposed his interference theory, and almost all claims about decay have turned
out to be explainable with interference theory (Berman, Jonides, & Lewis, 2009; Nairne, 2002),
with a few notable exceptions. In a pitch matching decision task Harris (1952) found an
increasing decline in performance with increasing retention intervals, and no convincing
alternative explanation has so far been offered for his results. Mercer and McKeown (2014)
have recently demonstrated trace decay in a timbre matching and a pitch matching memory
experiment. Notably these studies, including the present one, are short-term memory tasks on
nonverbal stimuli: pitch, timbre, and rhythm (Harris, 1952; Mercer & McKeown, 2014). In
contrast, most studies that have failed to convincingly demonstrated memory trace decay have
focused on retention of verbal material and the multiple components of this mechanism; that is,
they have used stimuli that are susceptible to verbal coding and participants could easily engage
in rehearsals to maintain memory–a strategy less applicable to nonverbal stimuli.
Concerning the time course of decay, our conclusions are limited because we used only
two retention periods. Nonetheless, our results are compatible with those of Harris (1952) and
Mercer and McKeown (2014), identifying decay for a period of at least 15 to 30 seconds. This
contrasts with Baddeley’s (1990) hypothesis that decay is limited to the first few (ca. 5) seconds
of the retention period. These differences may have to do with the different stimulus material
considered (verbal vs non-verbal) and the different encodings involved. However, Mercer and
McKeown’s (2014) second experiment, using a masking paradigm, indicates that the decay
observed at longer retention periods is most likely due to more factors than simply the effect of
sensory memory fading.
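With only two retention periods, the shape of the decay curve cannot be determined, but a rough two-point estimate is possible if one assumes a particular functional form. The following is an illustrative sketch only, assuming exponential decay toward chance level and using purely hypothetical accuracies (not values reported in this study):

```python
import math

def decay_time_constant(t1, a1, t2, a2, chance=0.5):
    """Two-point estimate of tau for a(t) = chance + (a0 - chance) * exp(-t / tau).

    t1, t2: retention intervals in seconds; a1, a2: proportion correct.
    Assumes a1 > a2 > chance (monotonic decay toward chance level).
    """
    return (t2 - t1) / math.log((a1 - chance) / (a2 - chance))

# Hypothetical values: 85% correct after 0.5 s, 75% after 15 s,
# chance = 50% for a two-alternative (same/different) decision.
tau = decay_time_constant(0.5, 0.85, 15.0, 0.75)
print(round(tau, 1))  # estimated time constant in seconds
```

Such an estimate is only as good as the assumed functional form; distinguishing exponential from linear or multi-stage decay would require more than two retention intervals.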
Harris (1952) attempted to explain the deterioration of memory performance by appealing to the decline of stimulus-evoked neuronal activity. More recently, Jonides et al. (2008)
have proposed another model and preliminary evidence for the link between neuronal encoding
and decay, but the link between neural activity decline and memory performance was not very
strong and their model will no doubt undergo further testing and specification. Our study points
to yet another factor shaping memory decay, the form of encoding. As discussed above, different
decay rates were found for vocal and clapstick rhythms and also between musicians’ and non-
musicians’ responses. Results from our second experiment suggest that these differences are not
due to rehearsal, nor is it likely they can be attributed to other maintenance strategies like
refreshing (Raye, Johnson, Mitchell, Greene, & Johnson, 2007). We therefore suggest these
differences may be related to differences in pre-categorical (sensory memory) encoding and/or
timing differences of subsequent categorical encoding: the relatively uncomplicated temporal
features of clapstick sound sequences might lead to different sensory encodings and allow for
more rapid categorical recoding, thus offering more stable memory traces, than the complex
vocal sound sequences. Similarly, as discussed earlier, the training of musicians that allows for
different (Aleman et al., 2000) and faster (Kraus & Chandrasekaran, 2010) categorical recoding
may contribute to their better and more stable memory performance. One plausible explanation
would be that the decay rates reported in the present study were largely shaped by the degree and
extent to which pre-categorical sensory and categorical encodings are involved in the memory
processes.
Musical Training and Memory for Rhythm
Throughout all four experiments in this study, musicians were more accurate than non-musicians
in memory for rhythm tasks. This difference varied according to the task performed, and was
much larger when the participants were asked to reproduce the rhythm (25.7 pp) than when they
were asked to recognize a difference in the rhythm (7.7 - 10.2 pp). The reproduction of a rhythm relies both on the extraction of and memory for the rhythm – our main interests – and on the ability to accurately execute a rhythm. Musical training likely confers a general advantage on the latter, and it is possible that this explains part of the difference between the reproduction
and recognition experiments. Clearly, however, in our study the musicians were better able to
extract and remember the clapstick and vocal rhythms we employed.
On the one hand, musical training seems to contribute to faster extraction of timing
information (Kraus & Chandrasekaran, 2010) and also improves auditory processing in some
brain areas (i.e. planum temporale and dorso-lateral prefrontal cortex) that are also used in
language processing (Franklin et al., 2008; Ohnishi et al., 2001). On the other hand, their training
may enable professional musicians to employ different forms of mental representation of
rhythms than non-musicians (Aleman, Nieuwenstein, Böcker, & de Haan, 2000; Schaal et al.,
2015; Palmer & Krumhansl, 1990). Because they have been trained to transcribe rhythms into musical notation, musicians may use visuospatial representations to memorize rhythms. This symbolic system may be effective for extracting rhythms quickly and storing them in more stable, easier-to-recall representations (Brodsky, Kessler, Rubinstein, Ginsborg, & Henik, 2008). During the participant debriefing after
experiment 4, several musicians reported that they had combined instrumental rhythms with
images of cello performances in order to extract and repeat the rhythm. This suggests that
musicians may also transform instrumental sounds to either visual images or bodily movements
that support or improve the retention of rhythmic patterns and points towards an interesting area
for further research.
Differences Between Vocal and Instrumental Rhythm Processing
What are the factors that shape the differential processing of our two rhythm types? We have
already discussed above that attention does not seem to account for a significant part of these
differences, though it is likely to play a general modulatory role on subjects’ responses. By
having the two experimental tasks performed on identical sets of stimuli (experiments 1, 2, and
4), we effectively equated task-related attentional demands between them. Familiarity, another
potential confound, also seems to be an unlikely explanation. Although the musical components
of our stimuli were not uncommon, the music excerpts were taken from a music and language
culture none of the subjects was familiar with or had ever heard before. Rather, the present study
hints at three different factors contributing to the differential processing of vocal and
instrumental rhythms.
The first one has already been referred to above and concerns the acoustical stimulus
features pertinent for a hypothesized rhythm detection and extraction (encoding) process (e.g.
Jones, 1976; Jones & Boltz, 1989; Large & Jones, 1999). The temporal features relevant for this
process appear easier to obtain from clapsticks than from vocal signals, perhaps because the
former have minimal variation in their spectral and pitch components. The task of obtaining the
temporal information from clapsticks could basically be reduced to a determination of the
amplitude peak or onset sequences. Extraction of vocal rhythm, on the other hand, requires
simultaneous consideration of multiple features, the identification of relevant changes in
dynamics, pitch, and spectra. This factor probably explains at least part of the accuracy
differences for vocal and clapstick rhythms in the present study. Music makes use of a wide
range of instrumental rhythms, from clapstick-like percussion rhythms to more voice-like
rhythms of bowed string instruments. The degree to which these are processed differently from
vocal rhythms needs to be assessed in further studies comparing a wider range of instrumental
and vocal rhythms. However, it is unlikely that the complexity issue can account for the full
range of differences identified here because there are other features that distinguish vocal from
instrumental sound sequences (see Vouloumanos et al., 2001 for a similar argument concerning
processing differences between speech and non-speech sounds). For example, the human voice
has a typical spectral energy distribution easily distinguished from most musical instruments.
Furthermore, there is a relative independence of fundamental frequency of the voice and the
resonance characteristic of the vocal tract that allows for the generation of sound sequences
impossible to produce on musical instruments (Fant, 1960). These features, and the implication
they have for cognitive processing, lead us to the next point.
The second factor refers to the findings of various studies, showing that the human brain
possesses specializations for the processing of human vocal sounds (Belin et al., 2000; Bent et
al., 2006; Levy et al., 2001, 2003; Vouloumanos et al., 2001; Zatorre et al., 2002) and vocal
rhythms (Hung, 2011). These specializations can be understood as neuronal and cognitive
adaptations to the acoustic complexity and vital biological role of vocal sounds in intra-species
communication (Wang, 2000). For humans, quickly identifying sounds as vocal and processing
them as speech sounds can be crucial in social contexts, and it is something humans perform
effortlessly and automatically. In line with this, all imaging studies mentioned here found
stronger and partly different patterns of brain activation for processing of human voice than for
non-voice sounds. Theories of speech processing like the duplex theory (Whalen & Liberman,
1987) explain this specialization by hypothesizing different parallel processors operating on the
auditory input. This specialization, and the processing preference for the human voice that may ensue from it, helps to explain the faster reaction times to vocal rhythms, despite their greater complexity,
when reaction times for single rhythms are compared (experiment 4) or when dual rhythms are
processed under additional cognitive load by a distractor task (experiment 2).
The third and final aspect we want to address here connects with the previous one.
Research on sensory motor integration in the auditory system (Pa & Hickok, 2008; Wang, 2000)
suggests that acoustical differences between vocal and non-vocal sounds and cortical processing
adaptations may not be the only factors to explain different processing of the respective rhythms.
Rather, as vocal and instrumental rhythms are produced via different motor-effectors in the body,
their processing leads to specific sensory-motor activations of different neural substrates and
different associated encodings (see also: Wang, 2000). Such distinct forms of sensory-motor
representation are, therefore, further likely candidates to explain processing differences between
vocal and instrumental rhythms. The contrast between decision and reproduction (experiments 2
and 3) provides the first hints at how encoding differences may be linked with different sensory-
motor activations, and this connection deserves further explorations in future studies employing
different methods.
Differential memory processing of vocal and instrumental rhythms, as discussed here,
indicates that short-term memorization of temporal information is a multi-component process in which the selection and operation of components are influenced by the required task, participant strategies, and attention, as well as by stimulus content and context. This is consistent with recently
formulated models of time processing (Coull et al., 2010; Ivry & Schlerf, 2008; Wiener et al.,
2011) as comprising multiple processing networks in which the actual processes employed are
task and context dependent, and these processes can furthermore be modulated by factors like
attention. Our results are not only compatible with these multi-process models but also seem to
offer additional support for them from rhythm memory research.
Implications for Musical Rhythm Research
The identification of processing differences and the evidence for multiple, different forms of
encoding of vocal and instrumental rhythms challenge the idea of musical rhythm as an abstract
feature of sound sequences: rhythm processing appears to have an important perceptual
component. How we process and experience rhythms is influenced by the specific sounds that
form those rhythms – a characteristic that has been largely ignored by musical rhythm research.
Results of the current study are one line of supporting evidence for the non-unitary nature and
diverse evolutionary origins of music. The distinction between vocal and instrumental rhythm
may be a reflection of their different origins in relation to the human body – one produced
actively inside the body, the other through limb action on external objects – as well as their
different significance in human interaction and communication. In his comparative approach,
combining cross-cultural, intra-specific and inter-specific components, Fitch (2006) emphasized
that what is generally called the ‘music faculty’ actually consists of various components that may
have very different evolutionary histories, and that talking about ‘music’ as a unitary
phenomenon risks obscuring these histories and preventing an understanding of the origins and development of music. He proposed a multi-component view in which vocal and instrumental
music are the central components, and he discussed various lines of evidence – from the design
features of music and language to the evolution of analogous and homologous behavioural traits
– in favour of this view. The existence of brain specializations that lead to differential processing
of vocal and non-vocal sounds, of vocal and non-vocal melodic contours, and vocal and
instrumental rhythms constitute strong support for such a view.
References
Aleman, A., Nieuwenstein, M. R., Böcker, K. B., & de Haan, E. H. (2000). Music Training and
Mental Imagery Ability. Neuropsychologia, 38(12), 1664-1668. doi:10.1016/s0028-
3932(00)00079-8
Alvarez, C.J., Cottrell, D., & Afonso, O. (2009). Writing dictated words and picture names. Applied
Psycholinguistics, 30, 205-223. doi:10.1017/s0142716409090092
Baddeley, A.D. (1990). Human Memory: Theory and Practice. Oxford: Oxford University Press.
Baddeley, A.D. (2010). Working Memory. Current Biology, 20(4), R136–R140.
doi:10.1016/j.cub.2009.12.014
Baddeley, A., Eldridge, M., & Lewis, V. (1981). The role of subvocalisation in reading. The
Quarterly Journal of Experimental Psychology Section A, 33(4), 439–454.
doi:10.1080/14640748108400802
Baddeley, A.D., & Hitch, G. J. (1994). Developments in the Concept of Working Memory.
Neuropsychology, 8, 485-493. doi:10.1037/0894-4105.8.4.485
Baddeley, A. D., Thomson, N., & Buchanan, M. (1975). Word Length and the Structure of
Short-Term Memory. Journal of Verbal Learning and Verbal Behavior, 14(6), 575-589.
doi:10.1016/s0022-5371(75)80045-4
Baguley, T. (2009). Standardized or simple effect size: What should be reported? British Journal of
Psychology, 100, 603-617. doi:10.1348/000712608X377117
Bagnara, S., Boles, D.B., Simion, F. & Umiltà, C. (1982) Can an analytic/holistic dichotomy
explain hemispheric asymmetries? Cortex, 18, 67-78. doi:10.1016/s0010-9452(82)80019-
1
Bates, D. M. (2010). lme4: Mixed-Effects Modeling with R. New York: Springer. Prepublication
version at: http://lme4.r-forge.r-project.org/book/
Bates, D., Maechler, M., Bolker, B. & Walker, S. (2014). lme4: Linear mixed-effects models
using Eigen and S4. R package version 1.1-7, http://CRAN.R-project.org/package=lme4.
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-Selective Areas
in Human Auditory Cortex. Nature, 403(6767), 309-312. doi:10.1038/35002078
Bent, T., Bradlow, A. R., & Wright, B. A. (2006). The Influence of Linguistic
Experience on the Cognitive Processing of Pitch in Speech and Non-Speech Sounds.
Journal of Experimental Psychology: Human Perception and Performance, 32(1), 97-
103. doi:10.1037/0096-1523.32.1.97
Berman, M.G., Jonides, J., & Lewis, R.L. (2009). In Search of Decay in Verbal Short Term
Memory. Journal of Experimental Psychology: Learning, Memory and Cognition, 35(2),
317-333. doi:10.1037/a0014873
Binder, J. R., Frost, J. A., Hammeke, T. A., Bellgowan, P. S. F., Springer, J. A., Kaufman, J. N.,
& Possing, E. T. (2000). Human Temporal Lobe Activation by Speech and Nonspeech
Sounds. Cerebral Cortex, 10(5), 512 -528. doi:10.1093/cercor/10.5.512
Briggs, G. E., & Johnson, A. M. (1973). On the Nature of Central Processing in Choice
Reactions. Memory & Cognition, 1(1), 91-100. doi:10.3758/bf03198076
Brodsky, W., Kessler, Y., Rubinstein, B. S., Ginsborg, J., & Henik, A. (2008). The Mental
Representation of Music Notation: Notational Audiation. Journal of Experimental
Psychology: Human Perception and Performance, 34(2), 427. doi:10.1037/0096-
1523.34.2.427
Coull, J.T., Cheng, R.K. & Meck, W.H. (2010). Neuroanatomical and Neurochemical Substrates
of Timing. Neuropsychopharmacology Reviews. 36(1), 3-25. doi:10.1038/npp.2010.113
Cowan, N. (1984). On Short and Long Auditory Stores. Psychological Bulletin, 96(2), 341-
370. doi:10.1037//0033-2909.96.2.341
Cowan, N. (1997). Attention and Memory: An Integrated Framework. New York: Oxford
Psychology Series.
Cowan, N. (2008). What are the Differences Between Long-term, Short-term, and Working
Memory? Progress in Brain Research, 169, 323-338. doi:10.1016/s0079-6123(07)00020-
9
Craik, F.I.M. & Kester, J.D. (1999). Divided Attention and Memory: Impairment of Processing
or Consolidation? In E. Tulving (Ed.), Memory, consciousness, and the brain: The Tallin
conference. Philadelphia: Psychology Press. pp. 38-51.
Deutsch, D. (1986). Recognition of Durations Embedded in Temporal Patterns. Perception &
Psychophysics, 39(3), 179-186. doi:10.3758/bf03212489
Dixon, R.M.W. & Koch, G. (1996). Dyirbal Song Poetry: Traditional Songs of an Australian
Rainforest People [CD]. Mascot, N.S.W.: Larrikin.
Fant, G. (1960) Acoustic Theory of Speech Production. The Hague (The Netherlands): Mouton.
Finnigan, S., Humphreys, M.S., Dennis, S., & Geffen, G. (2002). ERP ‘old/new‘ effects:
memory strength and decisional factor(s). Neuropsychologia, 40, 2288-2304.
doi:10.1016/s0028-3932(02)00113-6
Fitch, W.T. (2006). The Biology and Evolution of Music: a Comparative Perspective.
Cognition, 100(1), 173-215. doi:10.1016/j.cognition.2005.11.009
Forster, K. I., & Forster, J. C. (2003). DMDX: A Windows Display Program with Millisecond
Accuracy. Behavior Research Methods, 35(1), 116–124. doi:10.3758/bf03195503
Franklin, M. S., Moore, K. S., Yip, C.-Y., Jonides, J., Rattray, K., & Moher, J. (2008). The
Effects of Musical Training on Verbal Memory. Psychology of Music, 36( 3), 353-365.
doi:10.1177/0305735607086044
Gaser, C., & Schlaug, G. (2003). Brain Structures Differ between Musicians and Non-
Musicians. The Journal of Neuroscience, 23(27), 9240-9245.
Glenberg, A.M. & Jona, M. (1991). Temporal Coding in Rhythm Tasks Revealed by Modality
Effects. Memory and Cognition, 19 (5), 514-522. doi:10.3758/bf03199576
Goldstone, S., & Goldfarb, J. L. (1963). Judgment of Filled and Unfilled Durations: Intersensory
Factors. Perceptual and Motor Skills, 17(3), 763-774. doi:10.2466/pms.1963.17.3.763
Halliday, M. S., Hitch, G. J., Lennon, B., & Pettipher, C. (1990). Verbal short-term memory in
children: The role of the articulator loop. European Journal of Cognitive Psychology,
2(1), 23–38. doi:10.1080/09541449008406195
Harris, J. D. (1952). The Decline of Pitch Discrimination with Time. Journal of Experimental
Psychology, 43(2), 96–99. doi:10.1037/h0057373
Hillenbrand, J. M., & Nearey, T. M. (1999). Identification of Resynthesized /hVd/ Utterances:
Effects of Formant Contour. The Journal of the Acoustical Society of America, 105(6),
3509-3523. doi:10.1121/1.424676
Hung, T. H. (2011). One music? Two musics? How many musics? Cognitive
Ethnomusicological, Behavioral, and fMRI Study on Vocal and Instrumental Rhythm
Processing (Doctoral dissertation). The Ohio State University, Columbus OH.
http://rave.ohiolink.edu/etdc/view?acc_num=osu1308317619
Ivry, R.B. & Schlerf, J.E. (2008). Dedicated and intrinsic models of time perception. Trends in
Cognitive Sciences 12: 273–280. doi:10.1016/j.tics.2008.04.002
Jakobson, L. S., Lewycky, S. T., Kilgour, A. R., & Stoesz, B. M. (2008). Memory for Verbal and
Visual Material in Highly Trained Musicians. Music Perception: An Interdisciplinary
Journal, 26(1), 41-55. doi:10.1525/mp.2008.26.1.41
James, W. (1890). The Principles of Psychology (Vols. 1-2). New York: Holt.
Johns, B.T., Jones, M.N., & Mewhort, D.J.K. (2012). A synchronization account of false
recognition. Cognitive Psychology, 65(4), 486-518. doi:10.1016/j.cogpsych.2012.07.002
Jones, M. R. (1976). Time, our lost dimension: Toward a new theory of perception, attention,
and memory. Psychological Review, 83(5), 323–355. doi:10.1037/0033-295x.83.5.323
Jones, M. R., & Boltz, M. (1989). Dynamic attending and responses to time. Psychological
Review, 96(3), 459–491. doi:10.1037/0033-295x.96.3.459
Jonides, J., Lewis, R.L., Nee, D.E., Lustig, C.A., Berman, M.G. & Moore, K.S. (2008). The
Mind and Brain of Short-Term Memory. Annual Review of Psychology, 59, 193-224.
doi:10.1146/annurev.psych.59.103006.093615
Keller, T. A., Cowan, N., & Saults, J. S. (1995). Can Auditory Memory for Tone Pitch be
Rehearsed? Journal of Experimental Psychology: Learning, Memory, and Cognition,
21(3), 635. doi:10.1037//0278-7393.21.3.635
Keuss, P.I.G. (1977). Processing of Geometrical Dimensions in a Binary Classification Task:
Evidence for a Dual Process Model. Perception & Psychophysics, 21(4), 371-376.
doi:10.3758/bf03199489
Kilgour, A. R., Jakobson, L. S., & Cuddy, L. L. (2000). Music Training and Rate of Presentation
as Mediators of Text and Song Recall. Memory & Cognition, 28(5), 700-710.
doi:10.3758/bf03198404
Koelsch, S., Schröger, E., & Tervaniemi, M. (1999). Superior Pre-Attentive Auditory Processing
in Musicians. Neuroreport, 10(6), 1309-1313. doi:10.1097/00001756-199904260-00029
Kraus, N., & Chandrasekaran, B. (2010). Music Training for the Development of Auditory
Skills. Nature Reviews Neuroscience, 11(8), 599-605. doi:10.1038/nrn2882
Krueger, L.E. (1978). A Theory of Perceptual Matching. Psychological Review, 85(4), 278-
304. doi:10.1037//0033-295x.85.4.278
Kuznetsova, A., Brockhoff, P.B., & Christensen, R.H.B. (2014). lmerTest: Tests for random and
fixed effects for linear mixed effect models (lmer objects of lme4 package). R package
version 2.0-11. http://CRAN.R-project.org/package=lmerTest
Large, E. W., & Jones, M. R. (1999). The dynamics of attending: How people track time-varying
events. Psychological Review, 106(1), 119–159. doi:10.1037/0033-295x.106.1.119
Levy, B.A. (1971). Role of Articulation in Auditory and Visual Short-Term Memory. Journal of
Verbal Learning and Verbal Behavior, 10(2), 123-132. doi:10.1016/s0022-
5371(71)80003-8
Levy, D., Granot, R., & Bentin, S. (2001). Processing Specificity for Human Voice
Stimuli: Electrophysiological Evidence. NeuroReport, 12(12), 2653-2657.
doi:10.1097/00001756-200108280-00013
Levy, D., Granot, R., & Bentin, S. (2003). Neural Sensitivity to Human Voices: ERP Evidence
of Task and Attentional Influences. Psychophysiology, 40(2), 291–305.
doi:10.1111/1469-8986.00031
MacMillan, N., & Creelman, D. (2005). Detection Theory: A User’s Guide (2nd ed.). Mahwah,
New Jersey: Lawrence Erlbaum Associates, Inc.
Malmberg, K.J., & Xu, J. (2007). On the flexibility and on the fallibility of associative
memory. Memory & Cognition, 35(3), 545–556. doi:10.3758/bf03193293
Markman, A.B., & Gentner, D. (2005). Nonintentional Similarity Processing. In R.
Hassin, J.A. Bargh, & J.S. Uleman (Eds.) The New Unconscious (pp. 107-137). New
York: Oxford University Press. doi:10.1093/acprof:oso/9780195307696.003.0006
Mathias, S., Micheyl, C., & Shinn-Cunningham, B. (2014). Gradual Decay of Auditory Short-
Term Memory. Journal of the Acoustical Society of America, 135(4.2), 2412.
doi:10.1121/1.4878005
McGeoch, J. (1932). Forgetting and the Law of Disuse. Psychological Review, 39(4), 352-370.
doi:10.1037/h0069819
Mercer, T., & McKeown, D. (2014). Decay Uncovered in Nonverbal Short-Term Memory.
Psychonomic Bulletin & Review, 21(1), 128–135. doi:10.3758/s13423-013-0472-6
Mewhort, D. J. K., & Johns, E. E. (2005). Sharpening the echo: An iterative-resonance model for
short-term recognition memory. Memory, 13, 300–307. doi:10.1080/09658210344000242
Morton, J. (1970). A Functional Model of Memory. In D. A. Norman (Ed.), Models of Human
Memory (pp. 203-254). New York: Academic Press. doi:10.1016/b978-0-12-521350-
9.50012-7
Musacchia, G., Sams, M., Skoe, E., & Kraus, N. (2007). Musicians have Enhanced Subcortical
Auditory and Audiovisual Processing of Speech and Music. Proceedings of the National
Academy of Sciences, 104(40), 15894 -15898. doi:10.1073/pnas.0701498104
Musacchia, G., Strait, D., & Kraus, N. (2008). Relationships Between Behavior, Brainstem and
Cortical Encoding of Seen and Heard Speech in Musicians and Non-Musicians. Hearing
Research, 241(1-2), 34-42. doi:10.1016/j.heares.2008.04.013
Nairne, J.S. (2002). Remembering Over the Short-Term: The Case Against the Standard Model.
Annual Review of Psychology, 53(1), 53-81.
doi:10.1146/annurev.psych.53.100901.135131
Neath, I., & Surprenant, A. M. (2003). Human Memory: an Introduction to Research, Data, and
Theory (2nd ed.). Thomson/Wadsworth.
Neisser, U. (1967). Cognitive Psychology. East Norwalk, CT: Appleton-Century-Crofts.
Ohnishi, T., Matsuda, H., Asada, T., Aruga, M., Hirakata, M., Nishikawa, M., Katoh, A. &
Imabayashi, E. (2001). Functional Anatomy of Musical Perception in Musicians.
Cerebral Cortex, 11(8), 754–760. doi:10.1093/cercor/11.8.754
Pa, J. & Hickok, G. (2008). A Parietal-Temporal Sensory-Motor Integration Area for the Human
Vocal Tract: Evidence from an fMRI Study of Skilled Musicians. Neuropsychologia,
46(1), 362-368. doi: 10.1016/j.neuropsychologia.2007.06.024
Parbery-Clark, A., Skoe, E., Lam, C., & Kraus, N. (2009). Musician Enhancement for Speech-
In-Noise. Ear and Hearing, 30(6), 653-661. doi:10.1097/aud.0b013e3181b412e9
Palmer, C., & Krumhansl, C. L. (1990). Mental Representations for Musical Meter. Journal of
Experimental Psychology: Human Perception and Performance, 16(4), 728.
doi:10.1037//0096-1523.16.4.728
Papagno, C., Valentine, T., & Baddeley, A. (1991). Phonological Short-Term Memory and
Foreign-Language Vocabulary Learning. Journal of Memory and Language, 30, 331-347.
doi:10.1016/0749-596x(91)90040-q
Poss, N. F. (2012). Hmong Music and Language Cognition: An Interdisciplinary Investigation
(Doctoral dissertation). The Ohio State University, Columbus OH.
http://rave.ohiolink.edu/etdc/view?acc_num=osu1332472729
Poss, N., Hung, T.H., & Will, U. (2008). The Effects of Tonal Information on Lexical
Activation in Mandarin Speakers. In Proceedings of the 20th North American Conference
on Chinese Linguistics, vol. 1 (NACCL-20, Columbus, OH: The Ohio State University,
2008), 205-211.
Povel, D.J. (1981). Internal Representation of Simple Temporal Patterns. Journal of
Experimental Psychology: Human Perception & Performance, 7(1), 3-18.
doi:10.1037//0096-1523.7.1.3
Povel, D.J., & Essens, P. (1985). Perception of Temporal Patterns. Music Perception, 2(4),
411-440. doi:10.2307/40285311
Proctor, R.W. (1981). A Unified Theory for Matching-Task Phenomena. Psychological Review,
88(4), 291-326. doi:10.1037//0033-295x.88.4.291
R Development Core Team (2010). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
http://www.R-project.org.
Rammsayer, T. H., & Lima, S. D. (1991). Duration Discrimination of Filled and Empty Auditory
Intervals: Cognitive and Perceptual Factors. Perception & Psychophysics, 50(6), 565-
574. doi:10.3758/bf03207541
Rammsayer, T. H., & Skrandies, W. (1998). Stimulus Characteristics and Temporal Information
Processing: Psychophysical and Electrophysiological Data. Journal of Psychophysiology,
12(1), 1-12.
Rauschecker, J. P., Tian, B., & Hauser, M. (1995). Processing of Complex Sounds in the
Macaque Nonprimary Auditory Cortex. Science, 268(5207), 111-114.
doi:10.1126/science.7701330
Raye, C. L., Johnson, M. K., Mitchell, K. J., Greene, E. J., & Johnson, M. R. (2007). Refreshing:
A Minimal Executive Function. Cortex, 43(1), 135-145.
doi:10.1016/s0010-9452(08)70451-9
Repp, B. H., & Bruttomesso, M. (2009). A Filled Duration Illusion in Music: Effects of Metrical
Subdivision on the Perception and Production of Beat Tempo. Advances in Cognitive
Psychology, 5, 114. doi:10.2478/v10053-008-0071-7
Saito, S., & Ishio, A. (1998). Rhythmic Information in Working Memory: Effects of Concurrent
Articulation on Reproduction of Rhythms. Japanese Psychological Research, 40(1), 10–
18. doi:10.1111/1468-5884.00070
Salamé, P., & Baddeley, A. (1982). Disruption of Short-Term Memory by Unattended Speech:
Implications for the Structure of Working Memory. Journal of Verbal Learning and
Verbal Behavior, 21(2), 150-164.
doi:10.1016/s0022-5371(82)90521-7
Schaal, N. K., Banissy, M. J., & Lange, K. (2014). The Rhythm Span Task: Comparing Memory
Capacity for Musical Rhythms in Musicians and Non-Musicians. Journal of New Music
Research, 44(1), 3–10. doi:10.1080/09298215.2014.937724
Snyder, B. (2000). Music and Memory: An Introduction. MIT Press.
ter Schure, S., Chládková, K., & van Leussen, J. W. (2011). Comparing Identification of
Artificial and Natural Vowels. Paper presented at the 17th International Congress of
Phonetic Sciences, Hong Kong. Retrieved from http://dare.uva.nl/document/2/101523
Tervaniemi, M., Rytkönen, M., Schröger, E., Ilmoniemi, R. J., & Näätänen, R. (2001). Superior
Formation of Cortical Memory Traces for Melodic Patterns in Musicians. Learning &
Memory, 8(5), 295-300. doi:10.1101/lm.39501
University of Iowa Electronic Music Studio. (2012). Cello. Retrieved from
http://theremin.music.uiowa.edu/MIScello.html
Vouloumanos, A., Kiehl, K.A., Werker, J.F., & Liddle, P. F. (2001). Detection of Sounds in the
Auditory Stream: Event-Related fMRI Evidence for Differential Activation to Speech
and Nonspeech. Journal of Cognitive Neuroscience, 13(7), 994-1005.
doi:10.1162/089892901753165890
Wang X. (2000). On Cortical Coding of Vocal Communication Sounds in Primates. Proceedings
of the National Academy of Sciences, 97(22), 11843–11849.
doi:10.1073/pnas.97.22.11843
Whalen, D. H., & Liberman, A. M. (1987). Speech Perception Takes Precedence over
Nonspeech Perception. Science, 237(4811), 169–171.
doi:10.1126/science.3603014
Wiener, M., Matell, M.S., & Coslett, H.B. (2011). Multiple Mechanisms for Temporal
Processing. Frontiers in Integrative Neuroscience, 5(31).
doi:10.3389/fnint.2011.00031
Will, U., Nottbusch, G., & Weingarten, R. (2006). Linguistic Units in Word Typing. Written
Language & Literacy, 9(1), 153-176. doi:10.1075/wll.9.1.10wil
Wright, D.B., Horry, R., & Skagerberg, E.M. (2009). Functions for Traditional and Multilevel
Approaches to Signal Detection Theory. Behavior Research Methods, 41(2), 257-267.
doi:10.3758/BRM.41.2.257
Zatorre, R.J. (1998). How Do Our Brains Analyze Temporal Structure in Sound? Nature
Neuroscience, 1(5), 343-345.
Zatorre, R.J., Belin, P., & Penhune, V.B. (2002). Structure and Function of Auditory Cortex:
Music and Speech. Trends in Cognitive Sciences, 6(1), 37-45.
doi:10.1016/s1364-6613(00)01816-7
Table 1: ANOVA summary of RT measurements from experiment 1, with between-subject
factor TRAINING (TRAIN) and within-subject factors RETENTION (RET), TIMBRE (TIMB),
CONGRUITY (CONG). Significant values are in bold.

Factor                    F value           P value
TRAIN                     F1,23 < 0.01      0.996
RET                       F1,23 = 124.85    <0.001
TIMB                      F1,23 = 0.06      0.816
CONG                      F1,23 = 382.47    <0.001
TRAIN:RET                 F1,23 = 1.09      0.307
TRAIN:TIMB                F1,23 = 4.84      0.012
TRAIN:CONG                F1,23 = 0.15      0.706
RET:TIMB                  F1,23 < 0.01      0.988
RET:CONG                  F1,23 = 0.17      0.686
TIMB:CONG                 F1,23 < 0.01      0.908
TRAIN:RET:TIMB            F1,23 = 0.44      0.512
TRAIN:RET:CONG            F1,23 < 0.01      0.958
TRAIN:TIMB:CONG           F1,23 = 4.09      0.055
RET:TIMB:CONG             F1,23 = 1.03      0.312
TRAIN:RET:TIMB:CONG       F1,23 = 0.75      0.396

Table 2a: GLMM ANOVA for minimal fitted model for experiment 1, with fixed effects
TRAINING (TRAIN); RETENTION (RET); TIMBRE (TIMB); CONGRUITY (CONG),
significant values in bold. Table 2b: Model estimates and z values (only significant
parameters are displayed). TRAINn = TRAINING (non-musician); TIMBv = TIMBRE
(voice); CONGs = CONGRUITY (same).

ANOVA with Satterthwaite approximation of df
Factor                    F value    df (Satt.)   P (>F)
TRAIN                     4.83       23.0         0.038
RET                       14.25      3643.5       <0.001
TIMB                      45.70      3643.5       <0.001
CONG                      6.89       3642.2       0.009
TRAIN:TIMB                13.71      3643.5       <0.001
RET:TIMB                  5.95       3643.5       0.015
TIMB:CONG                 8.59       3642.2       0.003
TRAIN:TIMB:CONG           3.09       3642.2       0.048
RET:TIMB:CONG             1.27       3642.2       0.281
TRAIN:RET:TIMB:CONG       3.51       3642.8       0.007

Parameter (fixed effects)     Estimate   z value   P (>|z|)
TRAINn                        -1.167     -3.798    <0.001
TIMBv                         -1.713     -7.208    <0.001
CONGs                         -0.955     -3.882    <0.001
TRAINn:TIMBv                  1.257      3.779     <0.001
RETs:TIMBv                    0.837      2.493     0.013
TIMBv:CONGs                   1.254      4.103     <0.001
TRAINn:TIMBv:CONGs            -0.583     -2.060    0.039
RETs:TIMBv:CONGs              -0.719     -2.713    0.006
TRAINn:RETs:TIMBv:CONGs       0.879      2.996     0.003
Table 3: ANOVA summary of RT measurements for experiment 2, with between-subject factor
TRAINING (TRAIN) and within-subject factors DISTRACTOR (DISTR), TIMBRE (TIMB),
CONGRUITY (CONG), and significant values in bold.

Factor                      F value           P value
TRAIN                       F1,21 = 1.03      0.322
DISTR                       F2,42 = 0.55      0.583
TIMB                        F1,21 = 9.16      0.006
CONG                        F1,21 = 370.11    <0.001
TRAIN:DISTR                 F2,42 = 0.09      0.915
TRAIN:TIMB                  F1,21 = 0.03      0.871
TRAIN:CONG                  F1,21 < 0.01      0.978
DISTR:TIMB                  F2,42 = 1.15      0.326
DISTR:CONG                  F2,42 = 4.10      0.024
TIMB:CONG                   F1,21 = 0.47      0.500
TRAIN:DISTR:TIMB            F2,42 = 1.85      0.169
TRAIN:DISTR:CONG            F2,42 = 4.99      0.011
TRAIN:TIMB:CONG             F1,21 = 0.47      0.465
DISTR:TIMB:CONG             F2,42 = 0.43      0.656
TRAIN:DISTR:TIMB:CONG       F2,42 = 0.03      0.467

Table 4a: GLMM ANOVA for minimal fitted model for experiment 2, with fixed effects
TRAINING (TRAIN); DISTRACTOR (DISTR); TIMBRE (TIMB); CONGRUITY (CONG),
significant values in bold. Table 4b: Model estimates and z values. TRAINm = TRAINING
(musician); DISTRt = DISTRACTOR (tapping); DISTRsv = DISTRACTOR (subvocal);
TIMBv = TIMBRE (voice); CONGs = CONGRUITY (same); only significant parameters
are displayed.

ANOVA with Satterthwaite approximation of df
Factor             F value    df (Satt.)   P (>F)
TRAIN              4.47       22.0         0.035
DISTR              26.80      5293.8       <0.001
TIMB               114.65     5293.8       <0.001
CONG               56.16      5293.8       <0.001
DISTR:TIMB         3.81       5293.8       0.022
DISTR:CONG         3.43       5293.8       0.032
TIMB:CONG          22.50      5293.8       <0.001

Parameter (fixed effects)   Estimate   z value   P (>|z|)
TRAINm                      0.391      2.359     0.018
DISTRt                      -1.008     -6.906    <0.001
DISTRsv                     -0.950     -6.510    <0.001
TIMBv                       -1.260     -9.233    <0.001
DISTRt:TIMBv                0.419      2.623     0.009
DISTRsv:TIMBv               0.418      2.630     0.009
DISTRt:CONGs                0.459      2.915     0.004
TIMBv:CONGs                 0.596      4.739     <0.001
Table 5a: GLMM ANOVA for minimal fitted model for experiment 3, with fixed effects
TRAINING (TRAIN); DISTRACTOR (DISTR); TIMBRE (TIMB) and subjects as random
effects. Significant values are in bold. Table 5b: Model estimates and z values. TRAINn =
TRAINING (non-musician); DISTRsv = DISTRACTOR (subvocal); TIMBv = TIMBRE
(voice).

ANOVA with Satterthwaite approximation of df
Factor     F value    df (Satt.)   P (>F)
TRAIN      6.36       24.0         0.012
DISTR      5.78       3712.1       0.003
TIMB       641.15     3712.1       <0.001

Parameter (fixed effects)   Estimate   z value   P (>|z|)
TRAINn                      -1.492     -4.015    <0.001
DISTRsv                     -0.639     -4.347    <0.001
TIMBv                       -2.284     -25.191   <0.001

Table 6: ANOVA summary of RT measurements for experiment 4, with factors TRAINING
(TRAIN), RETENTION (RET), RHYTHM (RHYT), CONGRUITY (CONG). Significant values
in bold.

Factor                     F value           P value
TRAIN                      F1,20 = 7.70      0.012
RET                        F1,20 = 45.17     <0.001
RHYT                       F2,40 = 9.77      <0.001
CONG                       F1,20 = 502.55    <0.001
TRAIN:RET                  F1,20 = 2.43      0.135
TRAIN:RHYT                 F2,40 = 0.65      0.527
TRAIN:CONG                 F1,20 = 3.91      0.062
RET:RHYT                   F2,40 = 3.21      0.058
RET:CONG                   F1,20 = 0.32      0.575
RHYT:CONG                  F2,40 = 29.68     <0.001
TRAIN:RET:RHYT             F2,40 = 0.12      0.885
TRAIN:RET:CONG             F1,20 = 1.08      0.311
TRAIN:RHYT:CONG            F2,40 = 0.09      0.911
RET:RHYT:CONG              F2,40 = 1.00      0.378
TRAIN:RET:RHYT:CONG        F2,40 = 0.25      0.782
Table 7a: GLMM ANOVA for minimal fitted model for experiment 4, with fixed effects
TRAINING (TRAIN); RETENTION (RET); RHYTHM (RHYT); CONGRUITY (CONG), and
significant values in bold. Table 7b: Model estimates and z values. TRAINm = TRAINING
(musician); RHYTf = RHYTHM (filled); RHYTv = RHYTHM (vocal); CONGs =
CONGRUITY (same); only significant parameters displayed.

ANOVA with Satterthwaite approximation of df
Factor                F value    df (Satt.)   P (>F)
TRAIN                 7.92       20.0         0.011
RET                   2.88       3557.7       0.090
RHYT                  134.20     3557.4       <0.001
CONG                  39.07      3557.9       <0.001
TRAIN:RHYT            2.03       3557.4       0.135
RET:RHYT              11.48      3557.4       <0.001
RET:CONG              83.97      3557.6       <0.001
RHYT:CONG             31.69      3557.4       <0.001
TRAIN:RHYT:CONG       10.94      3557.5       <0.001

Parameter (fixed effects)    Estimate   z value   P (>|z|)
TRAINm                       0.957      2.723     0.006
RHYTf                        -2.769     -10.690   <0.001
RHYTv                        -1.507     -5.577    <0.001
CONGs                        1.590      6.170     <0.001
RETs:RHYTv                   0.582      1.986     0.047
RETs:CONGs                   2.128      9.977     <0.001
RHYTf:CONGs                  2.682      8.612     <0.001
RHYTv:CONGs                  2.947      7.829     <0.001
TRAINm:RHYTf:CONGs           -1.480     -5.471    <0.001
TRAINm:RHYTv:CONGs           -1.744     -4.507    <0.001
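The decision experiments summarized in these tables score same/different (CONGRUITY) judgments. As a purely illustrative sketch, not the authors' analysis pipeline, per-condition sensitivity in such a task can be expressed as a d' score from signal detection theory, in the spirit of the Wright, Horry, and Skagerberg (2009) functions cited above; the response counts below are invented for illustration.

```python
from statistics import NormalDist

def dprime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity (d') for same/different judgments, using a log-linear
    correction (add 0.5 to each count) so that perfect hit or false-alarm
    rates do not produce infinite z-scores."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts for one participant in one condition:
# 38 hits, 10 misses, 12 false alarms, 36 correct rejections.
print(round(dprime(38, 10, 12, 36), 2))
```

A d' near zero indicates chance-level discrimination of "different" from "same" trials; larger values indicate better sensitivity, independently of any response bias toward one answer.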