Analysis and Synthesis of Expressive Guitar Performance
A Thesis
Submitted to the Faculty
of
Drexel University
by
Raymond Vincent Migneco
in partial fulfillment of the
requirements for the degree
of
Doctor of Philosophy
May 2012
Table of Contents
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
1 INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 COMPUTATIONAL GUITAR MODELING . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Sound Modeling and Synthesis Techniques . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Wavetable Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 FM Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Additive Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 Source-Filter Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.5 Physical Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 Summary and Model Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Synthesis Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.1 Synthesis Engines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.2 Description and Transmission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.3.3 New Music Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 PHYSICALLY INSPIRED GUITAR MODELING . . . . . . . . . . . . . . . . . . . . 14
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Waveguide Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2.1 Solution for the Ideal, Plucked-String . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.2 Digital Implementation of the Wave Solution . . . . . . . . . . . . . . . . . . . . 15
3.2.3 Lossy Waveguide Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2.4 Waveguide Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2.5 Extensions to the Waveguide Model . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Analysis and Synthesis Using Source-Filter Approximations . . . . . . . . . . . . . . 21
3.3.1 Relation to the Karplus-Strong Model . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.2 Plucked String Synthesis as a Source-Filter Interaction . . . . . . . . . . . . . . . 22
3.3.3 SDL Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3.4 Excitation and Body Modeling via Commuted Synthesis . . . . . . . . . . . . . . 25
3.3.5 SDL Loop Filter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Extensions to the SDL Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4 SOURCE-FILTER PARAMETER ESTIMATION . . . . . . . . . . . . . . . . . . . . 32
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Background on Expressive Guitar Modeling . . . . . . . . . . . . . . . . . . . . . . . 32
4.3 Excitation Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.3.1 Experiment: Expressive Variation on a Single Note . . . . . . . . . . . . . . . . . 34
4.3.2 Physicality of the SDL Excitation Signal . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.3 Parametric Excitation Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Joint Source-Filter Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4.1 Error Minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4.2 Convex Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5 SYSTEM FOR PARAMETER ESTIMATION . . . . . . . . . . . . . . . . . . . . . . 43
5.1 Onset Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1.1 Coarse Onset Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.2 Pitch Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.3 Pitch Synchronous Onset Detection . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.1.4 Locating the Incident and Reflected Pulse . . . . . . . . . . . . . . . . . . . . . . 48
5.2 Experiment 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.2.2 Problem Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.3 Experiment 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.3.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.3.2 Problem Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5.3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6 EXCITATION MODELING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.2 Previous Work on Guitar Source Signal Modeling . . . . . . . . . . . . . . . . . . . . 64
6.3 Data Collection Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.3.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.4 Excitation Signal Recovery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.4.1 Pitch Estimation and Resampling . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4.2 Residual Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.4.3 Spectral Bias from Plucking Point Location . . . . . . . . . . . . . . . . . . . . . 70
6.4.4 Estimating the Plucking Point Location . . . . . . . . . . . . . . . . . . . . . . . 71
6.4.5 Equalization: Removing the Spectral Bias . . . . . . . . . . . . . . . . . . . . . . 74
6.4.6 Residual Alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.5 Component-based Analysis of Excitation Signals . . . . . . . . . . . . . . . . . . . . 77
6.5.1 Analysis of Recovered Excitation Signals . . . . . . . . . . . . . . . . . . . . . . 77
6.5.2 Towards an Excitation Codebook . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.5.3 Application of Principal Components Analysis . . . . . . . . . . . . . . . . . . . 79
6.5.4 Analysis of PC Weights and Basis Vectors . . . . . . . . . . . . . . . . . . . . . . 81
6.5.5 Codebook Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.5.6 Codebook Evaluation and Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.6 Nonlinear PCA for Expressive Guitar Synthesis . . . . . . . . . . . . . . . . . . . . . 88
6.6.1 Nonlinear Dimensionality Reduction . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.6.2 Application to Guitar Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.6.3 Expressive Control Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
6.7 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7 CONCLUSIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.1 Expressive Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.2 Physical Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.3 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Appendix A Overview of Fractional Delay Filters . . . . . . . . . . . . . . . . . . . . . . 100
A.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
A.2 The Ideal Fractional Delay Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
A.3 Approximation Using FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
A.3.1 Delay Approximation using Lagrange Interpolation Filters . . . . . . . . . . . . . 103
A.4 Further Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
Appendix B Pitch Glide Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
B.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
B.2 Pitch Glide Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
B.3 Pitch Glide Measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
B.4 Nonlinear Modeling and Data Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . 108
B.4.1 Nonlinear Least Squares Formulation . . . . . . . . . . . . . . . . . . . . . . . . 108
B.4.2 Fitting and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
B.5 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
VITA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
List of Tables
2.1 Summary of sound synthesis models including their modeling domain and applicable audio signals. Adapted from Vercoe et al. [93]. . . . . . . . . . . . . . . . . . . . . . 11
2.2 Evaluating the attributes of various sound modeling techniques. The boldface tags indicate the optimal evaluation for a particular category. . . . . . . . . . . . . . . . . 11
5.1 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones based on an IIR loop filter. . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.2 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones using an FIR loop filter with length N = 3. . . . . . . . . . . . . . . . . 61
B.1 Pitch glide parameters of Equation B.3 for plucked guitar tones for each guitar string. p, mf and f indicate strings excited with piano, mezzo-forte and forte dynamics, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
List of Figures
3.1 Traveling wave solution of an ideal string plucked at time t = t1 and its displacement at subsequent time instances t2, t3. The string’s displacement (solid) at any position is the summation of the two disturbances (dashed) at that position. . . . . . . . . . . 16
3.2 Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y+) and lower (y−) signal paths represent the right and left traveling disturbances, respectively. The string’s displacement is obtained by summing y+ and y− at a desired spatial sample. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.3 Waveguide model incorporating losses due to propagation at the spatial sampling instances. The dashed lines outline a section where M gain and delay blocks are consolidated using a linear time-invariant assumption. . . . . . . . . . . . . . . . . . . 18
3.4 Plucked-string waveguide model as it correlates to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string’s initial displacement. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.5 Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop. . . . . . . . . . . . 22
3.6 Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.7 Components for guitar synthesis including excitation, string and body filters. The excitation and body filters may be consolidated for commuted synthesis. . . . . . . . 26
3.8 Overview of the loop filter design algorithm outlined in Section 3.3.5 using short-time Fourier transform analysis on the signal. . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1 Top: Plucked guitar tones representing various string articulations by the guitarist on the open, 1st string (pitch E4, 329.63 Hz). Bottom: Excitation signals for the SDL model associated with each plucking style. . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2 The output of a waveguide model is observed over one period of oscillation. The top figure in each subplot shows the position of the traveling acceleration waves at different time instances. The bottom plot traces out the measured acceleration at the bridge (noted by the ’x’ in the top plots) over time. . . . . . . . . . . . . . . . . . . . . . . . 37
5.1 Proposed system for jointly estimating the source-filter parameters for plucked guitar tones. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.2 Pitch estimation using the autocorrelation function. The lag corresponding to the global maximum indicates the fundamental frequency for a signal with f0 = 330 Hz. 46
5.3 Overview of residual onset localization in the plucked-string signal. (a): Coarse onset localization using a threshold based on spectral flux with a large frame size. (b): Pitch-synchronous onset detection utilizing a spectral flux threshold computed with a frame size proportional to the fundamental frequency of the string. (c): Plucked-string signal with coarse and pitch-synchronous onsets overlaid. . . . . . . . . . . . . . . . . 47
5.4 Detail view of the “attack” portion of the plucked-tone signal in Figure 5.3. The pitch-synchronous onset is marked, as well as the incident and reflected pulses from the first period of oscillation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.5 Pole-zero and magnitude plots of a string filter S(z) with f0 = 330 Hz and a loop filter pole located at α0 = 0.03. The pole-zero and magnitude plots of the system are shown in (a) and (c), and the corresponding plots using an all-pole approximation of S(z) are shown in (b) and (d). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.6 Analysis and resynthesis of the guitar’s 1st string in the “open” position (E4, f0 = 329.63 Hz). Top: Original plucked-guitar tone, residual signal and estimated excitation boundaries. Middle: Resynthesized pluck and excitation using estimated source-filter parameters. Bottom: Modeling error. . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.7 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 1st string at the 2nd fret position (F#4, f0 = 370 Hz) at piano, mezzo-forte and forte dynamics. . 55
5.8 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 5th string at the 5th fret position (D3, f0 = 146.83 Hz) at piano, mezzo-forte and forte dynamics. . 56
6.1 Source-filter model for plucked-guitar synthesis. C(z) is the feed-forward comb filter simulating the effect of the player’s plucking position. S(z) models the string’s pitch and decay characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2 Front orthographic projection of the bridge-mounted piezoelectric pickup used to record plucked tones. A piezoelectric crystal is mounted on each saddle, which measures pressure during vibration. Guitar diagram obtained from www.dragoart.com. . . . . 67
6.3 Diagram outlining the residual equalization process for excitation signals. . . . . . . . 69
6.4 “Comb filter” effect resulting from plucking a guitar string (open E, f0 = 331 Hz) 8.4 cm from the bridge. (a) Residual obtained from the single delay-loop model. (b) Residual spectrum. Using Equation 6.2, the notch frequencies are approximately located at multiples of 382 Hz. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.5 Plucked-guitar tone measured using a piezoelectric bridge pickup. Vertical dashed lines indicate the impulses arriving at the bridge pickup. Δt indicates the arrival time between impulses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.6 (a) One period extracted from the plucked-guitar tone in Figure 6.5. (b) Autocorrelation of the extracted period. The minimum is marked and denotes the time lag, Δt, between arriving pulses at the bridge pickup. . . . . . . . . . . . . . . . . . . . . . . . 73
6.7 Comb filter structures for simulating the plucking point location. (a) Basic structure. (b) Basic structure with a fractional delay filter added to the feedforward path to implement non-integer delay. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.8 Spectral equalization on a residual signal obtained from plucking a guitar string 8.4 cm from the bridge (open E, f0 = 331 Hz). . . . . . . . . . . . . . . . . . . . . . . . . 76
6.9 Excitation signals corresponding to strings excited using a pick (a) and finger (b). . . 77
6.10 Average magnitude spectra of signals produced with pick (a) and finger (b). . . . . . 78
6.11 Application of principal components analysis to a synthetic data set. The vector v1 explains the greatest variance in the data while v2 explains the remaining greatest variance. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.12 Explained variance of the principal components computed for the set of (a) unwoundand (b) wound strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.13 Selected basis vectors extracted from plucked-guitar recordings produced on the 1st, 2nd and 3rd strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.14 Selected basis vectors extracted from plucked-guitar recordings produced on the 4th, 5th and 6th strings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.15 Projection of guitar excitation signals into the principal component space. Excitations from strings 1-3 (a) and 4-6 (b). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.16 Histogram of basis vector occurrences generated with Mtop = 20. . . . . . . . . . . . 86
6.17 Excitation synthesis by varying the number of codebook entries: (a) 1 entry, (b) 10 entries, (c) 50 entries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.18 Computed signal-to-noise ratio when increasing the number of codebook entries used to reconstruct the excitation signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.19 Architecture for a 3-4-1-4-3 autoassociative neural network. . . . . . . . . . . . . . . . 89
6.20 Top: Projection of excitation signals into the space defined by the first two linear principal components. Bottom: Projection of the linear PCA weights along the axis defined by the bottleneck layer of the trained 25-6-2-6-25 ANN. . . . . . . . . . . . . 91
6.21 Guitar data projected along orthogonal principal axes defined by the ANN (center).Example excitation pulses resulting from sampling this space are also shown. . . . . . 92
6.22 Tabletop guitar interface for the component-based excitation synthesis. The articulation is applied in the gradient rectangle, while the colored squares allow the performer to key in specific pitches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
A.1 Impulse responses of an ideal shifting filter when the sample delay assumes an integer(top) and non-integer (bottom) number of samples. . . . . . . . . . . . . . . . . . . . 102
A.2 Lagrange interpolation filters with order N = 3 (top) and N = 7 (bottom) to provide a fractional delay, dF = 0.3. As the order of the filter is increased, the Lagrange filter coefficients approach the values of the ideal function. . . . . . . . . . . . . . . . . . . 104
A.3 Frequency response characteristics of Lagrange interpolation filters with order N = 3, 5, 7 to provide a fractional delay dF = 0.3. Magnitude (top) and group delay (bottom) characteristics are plotted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
B.1 Measured and modeled pitch glide for forte plucks. . . . . . . . . . . . . . . . . . . . 110
B.2 Measured and modeled pitch glide for piano, mezzo-forte and forte plucks. . . . . . . 111
B.3 Single delay-loop waveguide filter with variable fractional delay filter, HF (z). . . . . . 111
Abstract
Analysis and Synthesis of Expressive Guitar Performance
Raymond Vincent Migneco
Advisor: Youngmoo Edmund Kim, Ph.D.
The guitar is one of the most popular and versatile instruments used in Western music cultures.
Dating back to the Renaissance era, the guitar can be heard in nearly every genre of Western music,
and is arguably the most widely used instrument in present-day rock music. Over the span of 500
years, the guitar has developed a multitude of performance and compositional styles associated with
nearly every musical genre, such as classical, jazz, blues and rock. This versatility can be largely
attributed to the relatively simple nature of the instrument, which can be built from a variety of
materials and optionally amplified. Furthermore, the flexibility of the instrument allows performers
to develop unique playing styles, which reflect how they articulate the guitar to convey certain
musical expressions.
Over the last three decades, physical- and physically-inspired models of musical instruments have
emerged as a popular methodology for modeling and synthesizing various instruments, including the
guitar. These models are popular since their components relate to the actual mechanisms involved
with sound production on a particular instrument, such as the vibration of a guitar string. Since the
control parameters are physically relevant, they have a variety of applications including control and
manipulation of “virtual instruments.” Much of the literature on physical modeling for guitars is
concerned with calibrating the models from recorded tones to ensure that the behavior of
real strings is captured. However, far less emphasis is placed on extracting parameters that pertain
to the expressive styles of the guitarist.
This research presents techniques for the analysis and synthesis of plucked guitar tones that
are capable of modeling the expressive intentions applied through the guitarist’s articulation during
performance. A joint source-filter estimation approach is developed to account for the performer’s
articulation and the corresponding resonant string response. A data-driven, statistical approach for
modeling the source signals is also presented in order to capture the nuances of particular playing
styles. This research has several pertinent applications, including the development of expressive
synthesizers for new musical interfaces and the characterization of performance through audio analysis.
CHAPTER 1: INTRODUCTION
The guitar is one of the most popular and versatile instruments used in Western music cultures.
Dating back to the Renaissance period, it has been incorporated into nearly every genre of Western
music and, hence, has a rich tradition of design and performance techniques pertaining to each genre.
From a cultural standpoint, musicians and non-musicians alike are captivated by the performances of
virtuoso guitarists past and present, who introduced innovative techniques that defined or redefined
the way the instrument was played. This deep appreciation is no doubt related to the instrument’s
adaptability, as it is recognized as a primary instrument in many genres, such as blues, jazz, folk,
country and rock.
The guitar’s versatility is inherent in its simple design, which has enabled its use in
multiple musical genres. The basic components of any guitar consist of a set of strings mounted
across a fingerboard and a resonant body to amplify the vibration of the strings. The tension on
each string is adjusted to achieve a desired pitch when the string is played. Particular pitches are
produced by clamping down each string at a specific location along the fingerboard, which changes
the effective length of the string and, thus, the associated pitch when it is plucked. Frets, which
are metallic strips spanning the width of the fingerboard, are usually installed to
exactly specify the location of notes in accordance with an equal-tempered division of the octave.
The basic design of the guitar has been augmented in a multitude of ways to satisfy the demands
of di↵erent performers and musical genres. For example, classical guitars are strung with nylon
strings, which can be played with the fingers or nails, and have a wide fingerboard that permits playing
scales and chords with minimal interference from adjacent strings. Often a solo instrument, the
classical guitar requires a resonant body for amplification where the size and materials of the body
are chosen to achieve a specific timbre. On the other hand, country and folk guitarists prefer steel
strings, which generally produce “brighter” tones. Electric guitars are designed to accommodate
the demands of guitarists performing rock, blues and jazz music. These guitars are outfitted with
electromagnetic pickups where string vibration induces an electrical current, which can be processed
to apply certain effects (e.g. distortion, reverberation) and eventually amplified. The role of the
body is less important for electric guitars (although guitarists argue that it affects the instrument’s
timbre) where the body is generally thinner to increase comfort during performance. When the
electric guitar is outfitted with light gauge strings, it facilitates certain techniques such as
pitch-bending and legato, which are more difficult to perform on acoustic instruments.
Though the guitar can be designed and played in different ways to achieve a vast tonal palette,
the underlying physical principles of vibrating strings are constant for each variation of the instrument.
Consequently, a popular topic among musicians and researchers is the development of quantitative
guitar models that simulate this behavior. Physical- and physically-inspired models of musical
instruments have emerged as a popular methodology for this task. The lure of these models is that
they simulate the physical phenomena responsible for sound production in instruments, such as
vibrating strings or air in a column, and produce high-quality synthetic tones. Properly calibrating
these models, however, remains a difficult task and is an ongoing topic in the literature. Several
guitar synthesizers have been developed using physically-inspired models, such as waveguide synthesis
and the Karplus-Strong Algorithm.
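To make the last point concrete, a minimal, textbook Karplus-Strong pluck can be sketched in a few lines of Python. This generic version is not one of the calibrated models discussed later; the function name and parameter values (sample rate, damping) are illustrative assumptions only.

```python
import numpy as np

def karplus_strong(f0, fs=44100, duration=1.0, damping=0.996, seed=0):
    """Basic Karplus-Strong plucked-string synthesis.

    A delay line of length fs/f0 is filled with noise (the 'pluck') and
    recirculated through a two-point averaging lowpass filter, so high
    frequencies decay faster than low ones, as on a real string.
    """
    rng = np.random.default_rng(seed)
    N = int(fs / f0)                    # delay-line length sets the pitch
    buf = rng.uniform(-1.0, 1.0, N)     # random initial string state
    out = np.empty(int(fs * duration))
    for i in range(out.size):
        out[i] = buf[i % N]
        # averaging lowpass plus a loss factor in the feedback loop
        buf[i % N] = damping * 0.5 * (buf[i % N] + buf[(i + 1) % N])
    return out

tone = karplus_strong(330.0)  # roughly the open 1st string (E4)
```

Each pass through the loop smooths the stored waveform slightly, which is why the tone's brightness fades over time; the waveguide models of Chapter 3 capture this same behavior with explicit loop filters.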
In the last decade, there has been considerable interest in digitally modeling analog guitar
components and effects using digital signal processing (DSP) techniques. This work is highly relevant
to the consumer electronics industry since it promises low-cost, digital “clones” of vintage, analog
equipment. Such devices promise to help musicians consolidate their analog equipment
into a single device or acquire the specific tones and capabilities of expensive and/or discontinued
equipment at lower cost. Examples of products designed using this technology include Line6
modeling guitars and amplifiers, where DSP is used to replicate the sounds of well-known guitars and
tube-based amplifiers [45, 46].
Despite the large amount of research focused on digitally modeling the physics of the guitar and its
associated e↵ects, there has been relatively little research conducted which analyzes the expressive
attributes of guitar performance. Existing research is mainly concerned with implementing
specific performance techniques into physical models based on detailed physical analysis of the
performer-instrument interaction. However, there is a void in the research for guitar modeling and
synthesis that is concerned with measuring physical and expressive data from recordings. Obtaining
such data is essential for developing an expressive guitar synthesizer; that is, a system that not only
faithfully replicates guitar timbres, but is also capable of simulating expressive intentions used by
many guitarists.
1.1 Contributions
This dissertation proposes analysis and synthesis techniques for plucked guitar tones that are capable
of modeling the expressive intentions applied through the guitarist’s articulation during performance.
Specifically, the expression analyzed through recorded performance focuses on how the articulation
was applied through the plucking mechanism and strength. The main contributions of this research are
summarized as follows:
• Generated a data set of plucked guitar tones comprising variations of the performer’s
articulation, including the plucking mechanism and strength, which spans all of the guitar’s strings
and several fretting positions.
• Developed a framework for jointly estimating the source and filter parameters for plucked-
guitar tones based on a physically-inspired model.
• Proposed and demonstrated a novel application of principal component analysis to model the
source signal for plucked guitar tones to encapsulate characteristics of various string
articulations.
• Utilized nonlinear principal components analysis to derive an expressive control space to
synthesize excitation signals corresponding to guitar articulations.
The analysis and synthesis techniques proposed here are based on physically inspired models
of plucked-guitar tones. These types of models are chosen because they have great potential for
analyzing and synthesizing expressive performance: their operation has a strong physical
analog to the process of exciting a string; that is, an impulsive force excites a resonant string response.
These advantages are in contrast to other modeling techniques, such as frequency modulation (FM),
additive and spectral modeling synthesis, which are often used for music synthesis tasks, but lack
easily controlled parameters that relate to how an instrument is excited (e.g. bowing, picking).
Physical models, on the other hand, expose parameters that relate to the initial conditions of a plucked string, and varying those conditions produces distinct tones from the model. This is intuitive, considering guitarists affect the same physical variables when plucking a string.
The proposed method for deriving the parameters relating to expressive guitar performance is
based on a joint source-filter estimation framework. The motivation to implement the estimation in
a joint source-filter framework is two-fold. Foremost, musical expression results from an interaction
between the performer and the instrument and estimating the expressive attributes of performance
requires accounting for the simultaneous variation of source and filter parameters. For the specific
case of the guitar, the performer can be seen as imparting an articulation (i.e. excitation) on the
string (i.e. filter), which has a resonant response to the performance input. The second reason for
this approach is to facilitate the estimation of the source and filter parameters, which is typically
accomplished in two separate tasks.
Building off the joint parameter estimation scheme, component-based analysis is applied to the source (i.e. excitation) signals obtained from recorded performance. Existing modeling techniques treat the excitation signal as a separate entity saved off-line to model a specific articulation, but in doing so they provide no mechanism to quantify or manipulate the excitation signal. The application of
component analysis is a data-driven, statistical approach used to represent the nuances of specific
articulations through linear combinations of component vectors or functions. Using this represen-
tation, the articulations can be visualized in the component space and dimensionality reduction is
applied to yield an expressive synthesis space that offers control over specific characteristics of the
data set.
The proposed guitar modeling techniques presented in this dissertation have many potential
applications for music analysis and synthesis tasks. Analyzing the source-filter parameters derived
from the recordings of many guitarists could lead to development of quantitative models of guitar
expression and a deeper understanding of expression during performance. The application of the
estimated parameters using the proposed techniques can expand upon the sonic and expressive
capabilities of current synthesizers, which often rely on MIDI or wavetable samples to replicate the
tone with little or no expressive control. During the advent of computer music, limited computational
power was a major constraint when implementing synthesis algorithms, but this is now much less
of a concern given the capabilities of present-day computers and mobile devices. These advances in
technology have provided new avenues for interacting with audio through gesture-based technologies.
The guitar analysis and synthesis techniques presented in this dissertation can be harnessed along
with these technologies to create new experiences for musical interaction.
1.2 Overview
As computational modeling for plucked-guitars is the basis of this thesis, Chapter 2 overviews various
approaches for modeling and synthesizing musical sounds. These approaches include wavetable
synthesis, spectral modeling, FM synthesis, physical modeling and source-filter modeling. The strengths and weaknesses of each model are evaluated and, based on this assessment, a recommendation is made
to base the techniques proposed in this dissertation on a source-filter approximation of physical guitar
models.
Chapter 3 discusses physical and source-filter models in detail; these models digitally implement
the behavior of a vibrating string due to an external input. The so-called waveguide model, which
is based on a digital implementation of the d’Alembert solution for describing traveling waves on a
string, is introduced as well as a source-filter approximation of this model.
Chapter 4 presents an approach for capturing the expression contained in specific string articu-
lations via the source signal from a source-filter model. The physical relation of this source signal
to the waveguide model is highlighted and it is suggested that a parametric model can be used to
capture the nuances of the articulations. The joint estimation of the source and filter models is
proposed by finding parameters that minimize the error between the analyzed recording and the
synthetic signal. This constrained least squares problem is solved using convex optimization. The
implementation for this approach and results are discussed in Chapter 5.
In Chapter 6, principal components analysis (PCA) is applied to a corpus of excitation signals
derived from recorded performance. In this application, PCA models each excitation signal as a
linear combination of basis functions, where each function contributes to the expressive attributes
of the data. We show that a codebook of relevant basis functions can be extracted which describe
particular articulations where the plucking device and strength are varied. Furthermore, using
components as features, we show that nonlinear PCA (NLPCA) can be applied for dimensionality
reduction, which helps visualize the expressive attributes of the data set. This mapping is reversible,
so the reduced dimensional space can be used as an expressive synthesizer using the linear basis
functions to reconstruct the excitation signals. This chapter also deals with the pre-processing steps
required to remove biases from the recovered signals, including the e↵ect of the guitarist’s plucking
position along the string.
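As an illustrative aside (not part of the dissertation's actual pipeline), the component-based representation described above can be sketched in a few lines of Python; the random matrix below stands in for a real corpus of excitation signals, and the dimensions are arbitrary:

```python
import numpy as np

# Hypothetical corpus: 40 excitation signals, each 512 samples,
# stacked as rows of a matrix X (a stand-in for real recordings).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 512))

# Center the data and compute principal components via the SVD.
mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)

# Represent each excitation by its first k component weights...
k = 5
weights = (X - mean) @ Vt[:k].T          # shape (40, 5)

# ...and reconstruct an approximation from those weights alone.
X_hat = weights @ Vt[:k] + mean
print(X_hat.shape)                        # (40, 512)
```

The rows of `Vt` play the role of the codebook of basis functions, and the low-dimensional `weights` are the features that a nonlinear dimensionality reduction stage could operate on.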
The conclusions from this dissertation are presented in Chapter 7, which includes the limitations
and future avenues to explore.
CHAPTER 2: COMPUTATIONAL GUITAR MODELING
A number of techniques are available for the computational modeling and synthesis of guitar tones,
each with entirely different approaches for capturing its acoustic attributes. This chapter will provide
an overview of the sound models most commonly applied to guitar tones including their computa-
tional basis, strengths and weaknesses. For detailed treatment of these techniques, the reader is
referred to the extensive overviews provided by [10] and [89]. The analysis of each synthesis technique will also be used to justify the source-filter modeling approach used throughout this dissertation.
Finally, this chapter will discuss pertinent applications of computational synthesis of guitar tones.
2.1 Sound Modeling and Synthesis Techniques
2.1.1 Wavetable Synthesis
In many computer music applications, wavetable synthesis is a viable means for synthetically generating musical sounds with low computational overhead. A wavetable is simply a buffer that stores
the periodic component of a recorded sound, which can be looped repeatedly. As musical sounds
vary in pitch and duration, signal processing techniques are required to modify the synthetic tones
from a wavetable sample. Pitch shifting is achieved by resampling the wavetable: reading through the table at a higher or lower rate raises or lowers the pitch, respectively.
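The table-lookup and interpolation mechanism can be sketched as follows (an illustrative example; the table contents, pitch and sampling rate are arbitrary choices):

```python
import numpy as np

def wavetable_osc(table, freq, sr, n_samples):
    """Read a single-cycle wavetable at an arbitrary pitch using
    linear interpolation between adjacent table entries."""
    table = np.asarray(table, dtype=float)
    N = len(table)
    phase = (freq * N / sr) * np.arange(n_samples)  # fractional table index
    idx = np.floor(phase).astype(int) % N
    frac = phase - np.floor(phase)
    nxt = (idx + 1) % N                              # wrap around the loop
    return (1 - frac) * table[idx] + frac * table[nxt]

# One cycle of a sawtooth stored in a 64-entry table,
# played back at 220 Hz with a 44.1 kHz output rate.
saw = np.linspace(-1, 1, 64, endpoint=False)
out = wavetable_osc(saw, 220.0, 44100, 1024)
```

The fractional phase increment `freq * N / sr` is what makes one stored cycle serve many pitches, at the cost of the interpolation artifacts discussed below.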
A problem with interpolation in wavetable synthesis is that excessive interpolation of a particular
wavetable sample can result in synthetic tones that sound unnatural since interpolation alters the
length of the synthetic signal. To overcome this limitation, multi-sampling is used, where several samples spanning the pitch range of the instrument are stored. Interpolation can then be applied between the reference samples without excessive degradation to the synthetic tone, which is preferred to storing every possible pitch the instrument can produce. Multi-sampling can also be used to incorporate different levels of dynamics, or relative loudness, into the system as well. Beyond interpolation, digital filters can be used to adjust the spectral properties
(e.g. brightness) of the wavetable samples as well.
The computational costs of wavetable synthesis are fairly low and the main restriction is the
amount of memory available to store samples. The sound quality in these systems can be quite good
as long as there is not excessive degradation from modification. However, wavetable synthesis has
no true modeling basis (i.e. sinusoidal, source-filter) and is rather “ad-hoc” in its approach. Also,
its flexibility in modeling and synthesis is restricted by the samples available to the synthesizer.
2.1.2 FM Synthesis
Frequency Modulation (FM) synthesis is a technique used to simulate characteristics of sounds that cannot be produced with linear time-invariant (LTI) models. An FM oscillator achieves this by modulating the base frequency of a carrier signal with a second, modulating signal. A simple FM oscillator is given by
y(t) = A_c sin(2π f_c t + Δf_c cos(2π f_m t))    (2.1)

where A_c and f_c are the amplitude and frequency of the carrier signal, respectively, f_m is the modulating frequency and Δf_c is the peak deviation from the carrier frequency. The spectrum of the resulting signal y(t) contains a peak located at the carrier frequency and sideband frequencies located at f_c plus and minus integer multiples of f_m. When the ratio of the carrier to the modulating
frequency is non-integer, FM synthesis creates an inharmonic spectrum where the frequency spacing
between the partials is not constant. This is useful for modeling the spectra of certain musical
sounds, such as strings and drums, which exhibit inharmonic behavior.
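Equation 2.1 translates directly into code. The parameter values below are arbitrary illustrations, with a non-integer f_c/f_m ratio chosen to produce an inharmonic spectrum:

```python
import numpy as np

def fm_tone(fc, fm, delta_fc, amp, sr, dur):
    """Simple FM oscillator following Equation 2.1:
    y(t) = A_c sin(2*pi*fc*t + delta_fc * cos(2*pi*fm*t))."""
    t = np.arange(int(sr * dur)) / sr
    return amp * np.sin(2 * np.pi * fc * t + delta_fc * np.cos(2 * np.pi * fm * t))

# Non-integer fc/fm ratio (440/171) -> non-uniform partial spacing,
# loosely reminiscent of struck or plucked inharmonic timbres.
y = fm_tone(fc=440.0, fm=171.0, delta_fc=5.0, amp=0.8, sr=44100, dur=0.5)
```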
FM synthesis is a fairly computationally efficient technique and can be easily implemented on a microprocessor, which makes it attractive for commercially available synthesizers. Due to the nonlinearity of the FM oscillator, it is capable of producing timbres not possible with other synthesis methods. However, there is no automated approach for matching the synthesis
parameters to an acoustic recording [8]. Rather, the parameters must be tweaked by trial and error
and/or using perceptual evaluation.
2.1.3 Additive Synthesis
Additive, or spectral modeling, synthesis is a sound modeling and synthesis approach based on
characterizing the spectra of musical sounds and modeling them appropriately. Sound spectra cat-
egories typically consist of harmonic, inharmonic, noise or mixed spectra. Analysis via the additive
synthesis approach typically entails performing a short-time analysis on the signal to divide it into
relatively short frames where the signal is assumed to be stationary within the frame. In the spectral
modeling synthesis technique proposed by Serra and Smith, the sinusoidal, or deterministic, parts of
the spectrum within each frame are identified and modeled using amplitude, frequency and phase.
The sound can be re-synthesized by interpolating between the deterministic components of each
frame to generate a sum of smooth, time-varying sinusoids. The noise-like, or stochastic, parts of
the spectrum can be obtained by subtracting the synthesized, deterministic component from the
original signal [68].
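The deterministic re-synthesis step can be sketched as a sum of sinusoids with smoothly varying amplitudes (a toy example; real systems interpolate per-frame analysis data rather than the hand-picked partials used here):

```python
import numpy as np

def additive_resynth(partials, sr, dur):
    """Re-synthesize the deterministic part of a sound as a sum of
    sinusoids with linearly interpolated, time-varying amplitudes.
    `partials` is a list of (freq_hz, amp_start, amp_end) triples,
    a stand-in for per-frame sinusoidal analysis data."""
    n = int(sr * dur)
    t = np.arange(n) / sr
    y = np.zeros(n)
    for freq, a0, a1 in partials:
        env = np.linspace(a0, a1, n)      # smooth amplitude trajectory
        y += env * np.sin(2 * np.pi * freq * t)
    return y

# Three decaying harmonics of a 220 Hz tone.
y = additive_resynth([(220, 0.5, 0.0), (440, 0.3, 0.0), (660, 0.2, 0.0)],
                     sr=44100, dur=0.25)
```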
There are several benefits to synthesizing musical sounds via additive synthesis. Foremost, the
model is very general and can be applied to a wide range of signals including polyphonic audio
and speech [50, 68]. Also, the separation of the deterministic and stochastic components permits
flexible modification of signals since the sinusoidal parameters are isolated within the spectrum.
For example, pitch and time/scale modification can be achieved independently or simultaneously
by shifting the frequencies of the sinusoids and altering the interpolation time between successive
frames. This leads to synthetic tones that sound more natural and can be extended indefinitely,
unlike wavetable interpolation.
A problem with additive synthesis is that transient events present in an analyzed signal are
often too short to be adequately modeled by sinusoids and must be accounted for separately. This
is problematic especially for signals with a percussive “attack” such as plucked-strings. It is also
unclear how to modify the sinusoids in order to achieve certain effects related to the perceived
dynamics of a musical tone.
2.1.4 Source-Filter Modeling
Analysis and synthesis via source-filter models involves using a complex sound source, such as an
impulse or periodic impulse train, to excite a resonant filter. The filter includes the important per-
ceptual characteristics of the sound, such as the overall spectral tilt and the formants, or resonances,
characteristic to the sound. When such a filter is excited by an impulse train, for example, the
resonant filter is “sampled” at regular intervals in the spectrum as defined by the frequency of the
impulse train.
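This spectral "sampling" effect can be illustrated with a short sketch in which an impulse train drives a single two-pole resonance (the rates and pole location below are arbitrary):

```python
import numpy as np

sr = 8000
# Source: an impulse train with a 100 Hz repetition rate.
src = np.zeros(sr)
src[:: sr // 100] = 1.0

# Filter: one resonance near 500 Hz, realized as a two-pole section
#   y(n) = x(n) + 2 r cos(theta) y(n-1) - r^2 y(n-2)
r, theta = 0.98, 2 * np.pi * 500.0 / sr
b1, b2 = 2 * r * np.cos(theta), -r * r
y = np.zeros_like(src)
for n in range(len(src)):
    y[n] = src[n]
    if n >= 1:
        y[n] += b1 * y[n - 1]
    if n >= 2:
        y[n] += b2 * y[n - 2]
```

The spectrum of `y` shows harmonics of 100 Hz whose amplitudes trace out the resonant filter's magnitude response, which is the "sampling" described above.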
Source-filter models are attractive because they permit the automated analysis of the resonant
characteristics through either time or frequency domain based techniques. One of the most well-
known examples of this is linear prediction. Linear prediction entails predicting a sample of a signal
based on a linear combination of past samples for that signal
x(n) = Σ_{p=1}^{P} α_p x(n − p)    (2.2)

where α_1, α_2, . . . , α_P are the prediction coefficients to be estimated from the recording [60]. When
a fairly low prediction order P is used, the prediction coefficients yield an all-pole filter that approximates the spectral shape, including resonances, of the analyzed sound. Computationally efficient
techniques, such as the autocorrelation and covariance methods, are available for estimating the
filter parameters as well.
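As a sketch of the autocorrelation method (illustrative only; the test signal and prediction order are arbitrary), the normal equations built from the autocorrelation sequence can be solved directly:

```python
import numpy as np

def lpc_autocorr(x, P):
    """Estimate the prediction coefficients of Equation 2.2 with the
    autocorrelation method: solve the normal equations R a = r, where
    R is a Toeplitz matrix of autocorrelation values."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(P + 1)])
    R = np.array([[r[abs(i - j)] for j in range(P)] for i in range(P)])
    return np.linalg.solve(R, r[1 : P + 1])   # alpha_1 ... alpha_P

# A decaying sinusoid is well modeled by a 2nd-order predictor.
n = np.arange(100)
x = 0.99 ** n * np.sin(0.3 * n)
alphas = lpc_autocorr(x, P=2)
```

In practice the Levinson-Durbin recursion solves the same Toeplitz system more efficiently, but the direct solve makes the structure of the problem explicit.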
A significant advantage of source-filter models is that they approximate musical sounds as the
output of a linear time-invariant (LTI) system. Therefore, using the estimated resonant filter, the
source signal for the model can be recovered through an inverse filtering operation. For musical instruments, analysis of the recovered source signals provides insight into the expression used to produce the sound. Also, source signals derived from certain recordings can be used to excite
the resonant filters from others, thus permitting cross-synthesis for generating new and interesting
sounds. As will be discussed in Chapter 3, source-filter models have a close relation to physical
models of musical instruments.
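The inverse filtering operation itself is a short FIR computation. In this illustrative sketch the true predictor coefficients of a damped sinusoid are used, so the recovered source is essentially zero after the initial transient:

```python
import numpy as np

def inverse_filter(x, alphas):
    """Recover the source (residual) by inverse filtering:
    e(n) = x(n) - sum_p alpha_p * x(n - p)."""
    x = np.asarray(x, dtype=float)
    e = x.copy()
    for p, a in enumerate(alphas, start=1):
        e[p:] -= a * x[: len(x) - p]   # subtract the predicted part
    return e

# Damped sinusoid with known two-pole parameters (r = 0.9, w = 0.5).
n = np.arange(200)
x = 0.9 ** n * np.sin(0.5 * n)
alphas = [2 * 0.9 * np.cos(0.5), -0.9 ** 2]
e = inverse_filter(x, alphas)      # nonzero only at the onset
```

The residual's concentration at the onset is exactly why, for plucked strings, the recovered source carries the articulation information.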
Despite the advantages of source-filter models, they have certain limitations. Namely, as they
are based on LTI models, they cannot model the inherent nonlinearities found in real musical in-
struments. For example, tension modulation in real strings alters the spectral characteristics in a
time-varying manner, while source-filter models have fixed fundamental frequencies.
2.1.5 Physical Modeling
Physical modeling systems aim to model the behavior of systems using physical variables such as
force, displacement, velocity and acceleration. Physical systems describing sound can range from
musical interactions such as striking a drum or string or natural sounds such as wind and rolling
objects. An example physical system for a musical interaction consists of releasing a string from an
initial displacement. The solution to this system is discussed extensively in Chapter 3, but involves
computing the infinitesimal forces acting on the string as it is released, which results in a set of differential equations describing the motion of the string with respect to time and space. The digital
implementation of physical models for sound can be achieved in a number of ways including modal
decomposition, digital waveguides and wave digital filters to name a few [89].
While physical models are capable of high quality synthesis of acoustic instruments, developing
models of these systems is often a difficult task. Taking the plucked-string as an example, a complete
physical description requires knowledge of the string including its material composition and how it
interacts with the boundary conditions at its termination points, as well as the fricative forces
acting on the string as it travels. Furthermore, there may be coupling forces acting between the
string and the excitation mechanism (e.g. the player’s finger), which should be included as well.
For these reasons, the physical system must be known a priori and it cannot be calibrated directly
through audio analysis.
2.2 Summary and Model Recommendation
Table 2.1 summarizes the sound modeling techniques presented above by comparing their modeling
domains and the range of musical signals that can be produced using each method. The vertical
ordering is indicative of the underlying basis and/or structure of the model types. For example,
wavetable synthesis is a rather “ad-hoc” approach without a true computational basis, while FM
synthesis is based on modulating sinusoids. Additive synthesis and source-filter models have a strict
modeling basis using sinusoids plus noise and source-filter parameters, respectively. Physical models
are most closely related to musical instruments since they deal with related physical quantities and
interactions. As a model’s parameter domain becomes more general, a greater range of sounds can
be synthesized with more control over their properties (i.e. pitch, timbre, articulation).
Based on the discussion in Section 2.1, the strengths and weaknesses of each model are evaluated
on a scale (Low, Moderate, High) as they pertain to four categories:
1. Computational complexity required for implementation
2. The resulting sound quality when the model is used for sound synthesis of guitar tones
3. The di�culty required to calibrate the model in accordance with acoustic samples
4. The degree of expressive control afforded by the model
Table 2.1: Summary of sound synthesis models including their modeling domain and applicable audio signals. Adopted from Vercoe et al. [93].

Sound Model   | Parameter Domain                                           | Acoustic Range
Wavetable     | sound samples, manipulation filters                        | discrete pitches, isolated sound events
FM            | carrier and modulating frequencies                         | sounds with harmonic and inharmonic spectra
Additive      | noise sources, time-varying amplitude, frequency and phase | sounds with harmonic, inharmonic, noisy or mixed spectra
Source-Filter | excitation signal, filter parameters                       | voice (speech, singing), plucked-string or struck instruments
Physical      | physical quantities (length, stiffness, position, etc.)    | plucked, struck, bowed or blown instruments
Table 2.2: Evaluating the attributes of various sound modeling techniques. The boldface tags indicate the optimal evaluation for a particular category.

Sound Model   | Computational Complexity | Sound Quality | Calibration Difficulty | Expressive Control
Wavetable     | Low                      | High          | High                   | Low
FM            | Low                      | Moderate      | High                   | Low
Additive      | Moderate                 | High          | Moderate               | Moderate
Source-Filter | Moderate                 | High          | Moderate               | High
Physical      | High                     | High          | High                   | Moderate
Table 2.2 shows the results of this evaluation in accordance with the four categories presented
above. The model(s) earning the best evaluation for each category are highlighted in boldface for emphasis. It should be noted that, in general, the computational complexity of the models increases in accordance with the associated model parameter domain in Table 2.1. That is, as the parameters become more general, they are more difficult to implement and harder to calibrate.
For truly flexible and expressive algorithmic synthesis, additive, source-filter and physical models offer the strongest overall combination of attributes. While the additive model provides good sound quality and flexible
synthesis (especially with regard to pitch and time shifting), the sinusoidal basis does not allow
the performer’s input to be separated from the instrument’s response. Physical models provide this
separation, but are difficult to calibrate, especially from a recording, since the physical configuration
of the instrument’s components and the performer’s interaction are generally not known a priori.
Of the remaining models, the source-filter model holds the greatest appeal due to its inherent simplicity, especially as it pertains to modeling the performer's articulation, its relative ease of calibration and the expressive control it affords.
2.3 Synthesis Applications
The techniques for modeling plucked-guitar tones presented in this thesis are applicable to a number
of sound synthesis tasks. This section will highlight a few such tasks to provide a larger perspective
on the benefits of computational guitar modeling.
2.3.1 Synthesis Engines
There are numerous systems available which encompass a variety of computational sound models
for the creation of synthetic audio. One such system is CSound, an audio programming language created by Vercoe et al. based on the C language [92]. CSound offers implementations of several synthesis algorithms, including general filtering operations, additive synthesis and linear
prediction. The Synthesis ToolKit (STK) is another system created by Cook and Scavone, which
adopts a hierarchical approach to sound modeling and synthesis using an open-source application
programming interface based on C++ [11]. STK handles low level, core sound synthesis via unit
generators which include envelopes, oscillators and filters. High-level synthesis routines encapsulate
physical modeling algorithms for specific musical instruments, FM synthesis, additive synthesis and
other routines.
2.3.2 Description and Transmission
Computational modeling of musical instruments, especially the guitar, is highly applicable in sys-
tems requiring generalized audio description and transmission. The MPEG-4 standard is perhaps
the most well-known codec (compressor-decompressor) for transmission of multimedia data. How-
ever, the compression of raw audio, even using the perceptual codec found in mp3, leaves little or no
control over the sound at the decoder. To expand the parametric control of compressed audio, the
MPEG-4 standard includes a descriptor for so-called Structured Audio, which permits the encoding,
transmission and decoding of audio using highly structured descriptions of sound [21, 66, 93]. The
audio descriptors can include high-level, performance information for musical sounds such as pitch,
duration, articulation and timbre and low-level descriptions based on the models (e.g. source-filter,
additive synthesis) used to generate the sounds. It should be noted that the structured audio descrip-
tor does not attempt to standardize the model used to parameterize the audio, but provides a means
for describing the synthesis method(s), which keeps the standard flexible. The level of description
provided by structured audio differentiates it from other formats such as pulse-code modulated audio or mp3, which do not provide contextual descriptions, and MIDI (musical instrument digital interface), which provides contextual description but lacks timbral or expressive descriptors. In essence,
structured audio provides a flexible and descriptive “language” for communicating with synthesis
engines.
2.3.3 New Music Interfaces
Computer music researchers have long sought to develop new interfaces for musical interaction.
Often, these interfaces deviate from the traditional notion in which an instrument is played in order
to appeal to non-musicians or enable entirely new ways of interacting with sound. For the guitar,
Karjalainen et al. developed a “virtual air guitar” where the performer’s hands are tracked using
motion sensing gloves [26]. The guitar tones are produced algorithmically using waveguide models
in response to gestures made by the performer. More recently, commercially available gesture and
multitouch technologies have been used for music creation. The limitation of these systems, however, is that their audio engines utilize sample-based synthesizers and provide little or no parametric control over the resulting sound [20, 55].
The plucked-guitar model techniques presented in this dissertation are applicable to each of the
sound synthesis areas outlined above. The source and filter parameters extracted from recordings
can be used for low bit-rate transmission of audio and are based on algorithms (source-filter) that are either available in many synthesis packages or easily implemented on present-day hardware.
Given the computational power available in present day computers and mobile devices, the anal-
ysis techniques and algorithms presented here can be harnessed into applications for new musical
interfaces as well.
CHAPTER 3: PHYSICALLY INSPIRED GUITAR MODELING
3.1 Overview
For the past two decades, physically-inspired modeling systems have emerged as a popular method
for simulating plucked-string instruments since they are capable of producing high-quality tones
with computationally efficient implementations. The emergence of these techniques was due, in part, to the innovation of the Karplus-Strong algorithm, which simulated plucked-string sounds using a simple and efficient model that was later shown to approximate the physical phenomenon of traveling waves on a string [22, 30, 31, 72, 89]. Thus, direct physical modeling of a musical
instrument aims to simulate the behavior of particular elements responsible for sound production
(e.g. a vibrating string or resonant air column) due to the musician’s interaction with the instrument
(e.g. plucking or breath excitation) with a digital model [89].
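A minimal sketch of the Karplus-Strong idea follows (illustrative only; the buffer length and two-point averaging loop filter are the textbook choices, not the dissertation's calibrated model):

```python
import numpy as np

def karplus_strong(n_samples, delay_len, seed=0):
    """Minimal Karplus-Strong string: a delay line initialized with
    noise (the 'pluck'), looped through a two-point averaging filter
    that mimics frequency-dependent string losses."""
    rng = np.random.default_rng(seed)
    buf = rng.uniform(-1, 1, delay_len).tolist()
    out = []
    for _ in range(n_samples):
        s = buf.pop(0)
        out.append(s)
        buf.append(0.5 * (s + buf[0]))   # averaging lowpass in the loop
    return np.array(out)

# Delay of 441 samples at 44.1 kHz gives a pitch of roughly 100 Hz
# (fundamental ~ sample_rate / delay_len).
tone = karplus_strong(22050, 441)
```

Despite its simplicity, this loop structure is equivalent to a heavily simplified digital waveguide, which is what motivated the physical interpretation discussed below.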
This chapter will briefly overview waveguide techniques for guitar synthesis, which directly model the traveling wave solution resulting from a plucked string. A related model, known as the single delay-loop, is also discussed, as it is utilized for the analysis and synthesis tasks presented in this thesis.
3.2 Waveguide Modeling
Directly modeling the complex vibration of guitar strings due to the performer-instrument interaction
is a difficult problem. However, by using simplified models of plucked strings, waveguide models offer an intuitive understanding of string behavior and lead to practical and efficient implementations [72]. In this
section, the well-known traveling wave solution for ideal, plucked-strings is presented [33]. This
general solution is then discretized and digitally implemented, as shown by Smith, to constitute a
digital waveguide model [72]. Common extensions to the waveguide model are also presented, which
correspond to non-ideal string conditions.
3.2.1 Solution for the Ideal, Plucked-String
The behavior of a vibrating string is understood by deriving and solving the well-known wave
equation for an ideal, lossless string. The full derivation of the wave equation is documented in
several physics texts [33, 52] and is obtained by computing the tension differential across a curved section of string with infinitesimal length. This tension is balanced at all times by an inertial
restoring force due to the string’s transverse acceleration.
The wave equation is expressed as [33]
Kty00 = "y (3.1)
where Kt, " are the string’s tension and linear mass density, respectively, and y = y (t, x) is the
string’s transverse displacement at a particular time instant, t, and location along the string, x. The
curvature of the string is indicated by y00 = @2y(t, x)/@x2 and its transverse acceleration is given by
y = @2y(t, x)/@t2. The general solution to the wave equation is given by [33]
y(t, x) = y_r(t − x/c) + y_l(t + x/c),    (3.2)

where y_r and y_l are functions that describe the right- and left-traveling components of the wave, respectively, and c is the wave speed, a constant determined by √(K_t/ε). It should be noted that y_r and y_l are arbitrary functions of the arguments (ct − x) and (ct + x), and it can be verified that substituting any twice-differentiable function with these arguments for y(t, x) will satisfy Equation 3.1 [33, 72].
Equation 3.2 indicates that the wave solution can be represented by two functions, each depending
on a time and a spatial variable. This notion becomes clear by analyzing an ideal, plucked-string
at a few instances after its initial displacement as shown in Figure 3.1. After the string is released,
its total displacement is obtained by summing the amplitudes of the right- and left-traveling wave
shapes, which propagate away from the plucking position, along the entire length of the string.
3.2.2 Digital Implementation of the Wave Solution
As demonstrated in Figure 3.1, the traveling wave solution has both time and spatial dependencies,
which must be discretized to digitally implement Equation 3.2. Temporal sampling is achieved by
employing a change of variable in Equation 3.2 such that tn = nTs where Ts is the audio sampling
16
t = t1
t = t2
t = t3
Figure 3.1: Traveling wave solution of an ideal string plucked at time t = t1 and its displacement atsubsequent time instances t2, t3. The string’s displacement (solid) at any position is the summationof the two disturbances (dashed) at that position.
interval. The wave’s position is discretized by setting xm = mX, where X = cTs, such that the
waves are sampled at a fixed spatial interval along the string. Substituting t and x with tn and xm
in Equation 3.2 yields [72]:
y(t_n, x_m) = y_r(t − x/c) + y_l(t + x/c)    (3.3)
            = y_r(nT_s − mX/c) + y_l(nT_s + mX/c)    (3.4)
            = y_r((n − m)T_s) + y_l((n + m)T_s)    (3.5)

Since all arguments are multiplied by T_s, it is suppressed and the terms corresponding to the right- and left-traveling waves can be simplified to [72, 89]:

y⁺(n) ≜ y_r(nT_s),    y⁻(n) ≜ y_l(nT_s)    (3.6)
Smith showed that Equation 3.5 could be schematically realized as a so-called “digital waveg-
uide” model shown in Figure 3.2 [70, 71, 72]. When the upper and lower signal paths, or “rails”,
of Figure 3.2 are initialized with the values of the string’s left and right wave shapes, the traveling
wave phenomenon of Figure 3.1 and Equation 3.2 is achieved by shifting the transverse displacement
values for the wave shapes in the upper and lower rails. For example, during one temporal sampling
instance, the right-traveling wave shifts by the amount cT_s along the string, which is equivalent to delaying y⁺ by one sample in Figure 3.2. The waveguide model also provides an intuitive understanding of how the traveling waves relate to the string's total displacement, which is obtained by
Figure 3.2: Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y⁺) and lower (y⁻) signal paths represent the right- and left-traveling disturbances, respectively. The string's displacement is obtained by summing y⁺ and y⁻ at a desired spatial sample.
summing the values of y⁺ and y⁻ at a desired spatial sample x = mcT_s. It should be noted that the
values obtained at the sampling instants in the waveguide model are exact, although band-limited
interpolation can be used to obtain the displacement between spatial sampling instants if desired
[89].
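The rail-shifting operation of Figure 3.2 can be sketched directly (an illustrative toy; the string length and pluck shape are arbitrary, and boundary reflection is deferred to the discussion of terminations below):

```python
import numpy as np

# Spatially sampled right- and left-going wave shapes (the "rails"),
# initialized with a triangular pluck split evenly between them.
M = 16
pluck = np.concatenate([np.linspace(0, 1, 6), np.linspace(1, 0, M - 6)])
y_right = 0.5 * pluck.copy()
y_left = 0.5 * pluck.copy()

def step(y_right, y_left):
    """Advance one sampling instant: the right rail shifts one spatial
    sample to the right, the left rail one sample to the left (open
    ends here; reflection at the terminations is treated separately)."""
    y_right = np.concatenate([[0.0], y_right[:-1]])
    y_left = np.concatenate([y_left[1:], [0.0]])
    return y_right, y_left

for _ in range(3):
    y_right, y_left = step(y_right, y_left)
displacement = y_right + y_left   # total string displacement, per Eq. 3.2
```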
3.2.3 Lossy Waveguide Model
The lossless waveguide model in Figure 3.2 clearly represents the phenomenon of the traveling wave
solution for a plucked string under ideal conditions. However, this model does not incorporate the
properties of real strings, which are subject to a number of non-ideal characteristics, such as internal friction and losses due to boundary collisions. In the context of sound synthesis, incorporating these properties is essential for modeling tones that behave naturally both from a physical
and perceptual standpoint.
Wave propagation on a non-ideal string is hindered by energy losses from internal friction and drag imposed by the surrounding air. If these losses can be modeled as a constant, μ, proportional to the wave's transverse velocity, ẏ, Equation 3.1 can be modified as [72]

K_t y'' = ε ÿ + μ ẏ    (3.7)

where the additional term, μ ẏ, incorporates the fricative losses applied to the string in the transverse direction. The solution to Equation 3.7 is the same as that of Equation 3.1, but with an exponential term that attenuates the right- and left-traveling waves as a function of propagation distance. The solution
18
z-1y+(n)
y-(n)
(x = 0)
y(nTs, 0)
g z-1 g
g z-1 g z-1
z-1
M sections
g z-1
y(nTs, MX)
g
(x = McTs)
Figure 3.3: Waveguide model incorporating losses due to propagation at the spatial sampling in-stances. The dashed lines outline a section where M gain and delay blocks are consolidated using alinear time-invariant assumption.
is given by [72]:

y(t, x) = e^(−(µ/2ε)x/c) y_r(t − x/c) + e^((µ/2ε)x/c) y_l(t + x/c)    (3.8)

To obtain the lossy waveguide model, Equation 3.8 is discretized by applying the same change of variables that was used to discretize Equation 3.1. This yields a waveguide model with a gain factor, g = e^(−µTs/2ε), inserted after each delay element in the waveguide as shown in Figure 3.3. Thus, a particular point along the right- or left-traveling wave shape is subject to an amplitude attenuation by the amount g as it advances one spatial sample through the waveguide.
By using a linear time-invariant (LTI) assumption, Figure 3.3 can be simplified to reduce the number of delay and gain elements required for the model. For example, if the output of the waveguide is observed at x = (M + 1)X, then the previous M delay and gain elements can be consolidated into a single delay, z^(−M), and loss factor, g^M. This greatly reduces the complexity of the waveguide model, which is desirable for practical implementations.
3.2.4 Waveguide Boundary Conditions
In practice, the behavior of a vibrating string is determined by boundary conditions due to the
string’s termination points. In the case of the guitar, each string is terminated at the “nut” and
“bridge” where the former is located near the guitar’s headstock and the latter is mounted on the
guitar’s saddle. The behavior of the string at these locations depends on several factors, including
the string’s tensile properties, how it is fastened and the construction of the bridge and nut. For
simplistic modeling, however, it suffices to assume that guitar strings are rigidly terminated such that there is no displacement at these positions.
By assuming rigid terminations for a string with length L, a set of boundary conditions is obtained for solving the wave equation [33]

y(t, 0) = 0,    y(t, L) = 0.    (3.9)
By substituting these conditions into Equation 3.2 and discretizing, the following relations between
y+ and y− are obtained [72]:

y+(n) = −y−(n)    (3.10)

y+(n − D/2) = −y−(n + D/2)    (3.11)
In Equation 3.11, D = 2L/X is often referred to as the “loop delay” since it indicates the delay
time, in samples, for a point on the right wave shape, for example, to travel from x = 0 to x = L
and back along the string. Thus, points located at the same spatial sample on the right and left
wave shapes will have the same amplitude displacement every D/2 samples. Viewed another way,
D can be calculated as a ratio of the sampling frequency and the string’s pitch, which is determined
by the string’s length,
D = 2L/X = 2L/(cTs) = 2L·fs/c = fs/f0    (3.12)

where the fundamental frequency, f0, was substituted based on the wave relationship f0 = c/2L, where 2L is the wavelength and c is the wave speed.
Figure 3.4 shows the lossy waveguide model with boundary conditions superimposed on a guitar
body to illustrate the physical relationship between the model and instrument. The loss factors due
to wave propagation and rigid boundary conditions are consolidated into two filters located at x = 0
and x = L, which correspond to the guitar's bridge and nut positions, respectively. The individual delay elements are merged into two bulk delay lines, each D/2 samples long, which store the left- and right-traveling wave shapes at any time during the simulation. Furthermore,
this model allows the string’s initial conditions to be specified relative to a spatial sample in the
delay line that represents the plucking point position. Initializing the waveguide in this way removes
Figure 3.4: Plucked-string waveguide model as it correlates to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string's initial displacement.
the need to explicitly model the coupling effects arising from the interaction between the string and
excitation mechanism [72]. The guitar’s output is observed at the “pickup” location by summing
the values of the upper and lower delay lines at a desired spatial sample.
The simplistic nature of the waveguide model in Figure 3.4 leads to computationally efficient hardware and software implementations of realistic plucked guitar sounds. Memory requirements are minimal, since only two buffers are required to store the string's initial conditions, and the lossy boundaries can be implemented with simple digital filters. Furthermore, as Smith showed, the contents of the delay lines can be shifted via pointer manipulation to reduce the load on the processor [10, 72]. Karjalainen showed that such techniques enable several string models to be implemented on a single DSP chip whose computational capabilities are eclipsed by present-day (2012) microprocessors [25].
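The pointer-manipulation idea can be illustrated with a circular buffer, where a single read/write index advances each tick instead of shifting every sample. A minimal sketch of the technique credited to Smith in the text, not his actual code:

```python
class DelayLine:
    """D-sample delay line realized as a circular buffer: rather than
    shifting all samples each tick, one index advances (illustrative
    sketch of the pointer-manipulation technique)."""

    def __init__(self, length):
        self.buf = [0.0] * length
        self.idx = 0  # position of the oldest sample

    def tick(self, x):
        """Write x, return the sample delayed by `length` ticks."""
        y = self.buf[self.idx]
        self.buf[self.idx] = x
        self.idx = (self.idx + 1) % len(self.buf)
        return y

d = DelayLine(3)
out = [d.tick(x) for x in [1.0, 2.0, 3.0, 4.0, 5.0]]   # [0.0, 0.0, 0.0, 1.0, 2.0]
```

Each tick costs a constant amount of work regardless of the delay length, which is what makes long string loops cheap on modest hardware.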
3.2.5 Extensions to the Waveguide Model
An important extension is providing fractional delay in the waveguide model, since strings are often tuned to pitches for which the required loop delay, fs/f0, is not an integer number of samples. While certain hardware and software configurations support multiple sampling
rates, it is generally undesirable to vary the sampling rate to achieve a particular tuning, especially
when synthesizing multiple string tones with di↵erent pitches. Instead, Karjalainen proposed adding
fractional delay into the waveguide loop via a Lagrange interpolation filter. Thus, an FIR filter is computed to add the required fractional delay to precisely tune the waveguide [25].
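A Lagrange fractional-delay FIR filter of this kind can be sketched from the standard closed-form coefficient formula; this is the textbook construction, not code from [25]:

```python
def lagrange_fd(order, delay):
    """FIR coefficients of an order-N Lagrange fractional-delay filter for a
    total delay of `delay` samples (standard closed form; accuracy is best
    when delay lies near order/2)."""
    h = []
    for n in range(order + 1):
        c = 1.0
        for k in range(order + 1):
            if k != n:
                c *= (delay - k) / (n - k)
        h.append(c)
    return h

# First order with a half-sample delay reduces to linear interpolation:
coeffs = lagrange_fd(1, 0.5)   # [0.5, 0.5]
```

The coefficients always sum to one, so the filter has unity gain at DC, which keeps the loop's overall loss controlled by the loop filter alone.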
Smith proposed using all-pass filters to simulate the effects of dispersion in strings, where the string's internal stiffness causes higher-frequency components of the wave to travel faster than lower ones. This has the effect of constantly altering the shape of the string. All-pass filters introduce frequency-dependent group delay to simulate this effect [72].
Tolonen et al. incorporate the effects of “pitch glide,” or tension modulation, exhibited by real strings using a non-linear waveguide model [79, 80, 91]. At rest, a string exhibits a nominal length and tension. However, as the string is displaced from its equilibrium position, it undergoes elongation, which increases its tension. After release, the tension and, thus, the wave speed constantly fluctuate as the string oscillates about its nominal position. Because of this constant fluctuation, a fixed spatial sampling scheme does not suffice, and the wave must be resampled at each time instance to account for the elongation.
3.3 Analysis and Synthesis Using Source-Filter Approximations
The waveguide model discussed in the previous section provides an intuitive methodology for implementing the traveling wave solution and simulating plucked-string tones. However, accurate
re-synthesis of plucked-guitar tones using the waveguide model requires knowledge of the string’s
initial conditions and loss filters that are correctly calibrated to simulate naturally decaying tones.
The former requirement is a significant limitation since the exact initial conditions of the string
are not available from a recorded signal and must be measured during performance, which is often
impractical. Therefore, when performance and physical data are unavailable, the utility of the
waveguide model is limited for analysis-synthesis tasks, such as characterizing recorded performance.
An alternative model, known as the single delay-loop (SDL), was developed to simplify the
waveguide model from a computational standpoint by consolidating the delay lines and loss filters.
The SDL model is also widely used in the literature because it permits the analysis of plucked-
guitar tones from a source-filter perspective; that is, an external signal excites a filter to simulate
the resonant behavior of a plucked string. Thus, the physical specifications for the guitar and its
strings are generally not required to calibrate the SDL model since linear time-invariant methods
can be applied for this task. A number of guitar synthesis systems are based on SDL models
[26, 56, 74, 75, 90].
3.3.1 Relation to the Karplus-Strong Model
For a more streamlined structure, the bidirectional waveguide model from Figure 3.4 can be reduced
to a single, D-length delay line and a loop filter that consolidates the losses incurred from the bridge
and nut [7, 72]. This reduction is shown in Figure 3.5, where the lower delay line is concatenated
with the upper delay line at the nut position. The wave shape contained in the lower delay line is
inverted to incorporate the reflection at the rigid nut, which has been removed.
Figure 3.5: Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop.
The new waveguide structure in Figure 3.5 (right) demonstrates the basic SDL model and is
identical to the well-known Karplus-Strong (KS) plucked-string model, whose discovery pre-dated
waveguide synthesis techniques [22, 31]. Unlike waveguide techniques where the excitation is based
on wave variables, the KS model works by initializing a D-length delay line with random values
and circularly shifting the samples through a loss filter. The random initialization of the delay line
simulates the transient noise burst perceived during the attack of plucked-string instruments, though
this “excitation” signal has no physical relation to the string, while the feedback loop acts as a comb filter so that only the harmonically related frequencies are passed. The loss filter, Hl(z), employs low-pass filtering to implement the frequency-dependent decay characteristics of real strings so that high-frequency energy dissipates faster than the lower frequencies.
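The Karplus-Strong loop described above can be sketched in a few lines: a D-sample buffer of random values is circulated through a loss filter. A minimal illustration, assuming the classic two-point averaging choice for Hl(z); parameter values are illustrative:

```python
import random

def karplus_strong(D, n_samples, seed=0):
    """Basic Karplus-Strong pluck: a D-sample delay line initialized with
    random values is circulated through a two-point averaging low-pass
    loss filter (sketch of the algorithm as described in the text)."""
    rng = random.Random(seed)
    buf = [rng.uniform(-1.0, 1.0) for _ in range(D)]
    out = []
    for i in range(n_samples):
        y = buf[i % D]                              # oldest sample leaves the loop
        buf[i % D] = 0.5 * (y + buf[(i + 1) % D])   # averaging loss filter
        out.append(y)
    return out

# A 100-sample loop at 44.1 kHz yields a 441 Hz tone with decaying harmonics
tone = karplus_strong(D=100, n_samples=4410)
```

The averaging filter attenuates high partials on every pass through the loop, so the initial noise burst quickly settles into a decaying, pitched tone.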
3.3.2 Plucked String Synthesis as a Source-Filter Interaction
By modeling plucked-guitar tones with the single delay-loop (SDL), the physical interpretation of traveling wave shapes on a string is no longer as clear as it was for the bidirectional waveguide.
However, Valimaki et al. show that the SDL can be derived from the bidirectional waveguide model
by computing a transfer function between the spatial samples representing the plucking position
and output samples [30, 89]. This derivation is still physically valid, though the model’s excitation
signal is treated as an external input rather than a set of initial conditions describing the string’s
displacement.
Figure 3.6 shows a complete source-filter model for plucked guitar synthesis based on waveguide
modeling principles. The SDL model is contained in the block labeled S(z), which is equivalent
to the single delay line structure shown in Figure 3.5, except the model is driven by an external
excitation signal rather than a random initialization as in the Karplus-Strong model. S(z) alone
cannot simulate the complete behavior of plucked strings found in the waveguide model. Notably missing is the ability to manipulate the plucking point and pickup positions, both of which are achieved by selecting a desired spatial sample in the waveguide model corresponding to the location where the string is displaced and where the vibration is observed as the output. Valimaki showed that this functionality could be achieved by adding comb filters before and after the SDL to simulate the effects of plucking point and pickup positions present in the waveguide model.
Figure 3.6 shows a comb filter C(z) preceding S(z) to simulate the effect of the plucking point position. For simplicity, the input p(n) can be an ideal impulse. The comb filter delay determines when p(n) is reflected, which is analogous to a sample in the digital waveguide model encountering a rigid boundary. The number of samples between the initial and reflected impulses is specified as a fraction λ of the loop delay D, where D indicates the number of samples corresponding to one period of string vibration. Similarly, the comb filter U(z) following S(z) simulates the position of the pickup seen on electric guitars. In this filter, the comb filter delay specifies the delay between arriving pulses associated with a relative position along the string. It should be noted that, since each of the blocks in Figure 3.6 is a linear time-invariant (LTI) system, they may be freely interchanged as desired.
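The plucking-point comb filter can be illustrated as follows; the feedforward form C(z) = 1 − z^(−M) and the rounding of λD to an integer delay M are simplifying assumptions of this sketch, since Figure 3.6 allows a fractional delay:

```python
def pluck_comb(x, lam, D):
    """Feedforward comb C(z) = 1 - z^(-M), M = round(lam * D), modeling a
    pluck at a fraction `lam` of the string length (simplified sketch:
    the fractional part of lam * D is rounded away)."""
    M = round(lam * D)
    return [x[n] - (x[n - M] if n >= M else 0.0) for n in range(len(x))]

# An impulse yields the direct pulse plus an inverted reflection M samples later
y = pluck_comb([1.0] + [0.0] * 9, lam=0.25, D=16)   # pulses at n = 0 and n = 4
```

The inverted echo cancels every fourth harmonic here, mirroring the familiar result that plucking a string one quarter of the way along suppresses the partials with a node at that point.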
3.3.3 SDL Components
Whereas the comb filters in Figure 3.6 specify initial and output observation conditions for the
plucked guitar tone, the SDL filter in S(z) is responsible for modeling the string vibration including
its fundamental frequency and decay. As in the case of the bidirectional waveguide, the total “loop
delay”, D, of the SDL denoted by S(z) determines the pitch of the resulting guitar tone, as given by Equation 3.12. Since D is typically a non-integer, the fractional delay filter, HF(z), is used to add the required fractional group delay, while z^(−DI) provides the bulk, integer delay component of
D. All-pass and Lagrange interpolation filters are commonly used for HF (z), with the latter being
Figure 3.6: Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively.
especially popular in synthesis systems since it can achieve variable delay for pitch modification
without significant transient effects [26, 30]. Additional information pertaining to fractional delay
filters is provided in Appendix A.
Hl(z) is the so-called “loop filter” and is responsible for implementing the non-ideal characteristics of real strings, including losses due to wave propagation and terminations at the nut and bridge positions. In the early developments of waveguide synthesis, Hl(z) was chosen as a two-tap averaging filter for simplicity and efficiency [31], but a low-order FIR filter is often too simplistic to match the magnitude decay characteristics of plucked-guitar tones. In the literature, a first-order IIR filter is often used for Hl(z) and has the form

Hl(z) = g / (1 − α0 z^(−1))    (3.13)

where α0 and g must be determined for proper calibration [29, 62, 86, 90].
It is useful to analyze the total delay, D, in the SDL as a sum of the delays contributed by each component in the feedback loop,

D = τl + DF + DI    (3.14)

where τl, DF, and DI are the group delays associated with Hl(z), HF(z), and z^(−DI), respectively. Thus, the bulk and fractional delay components should be chosen to compensate for the group delay introduced by the loop filter, which varies as a function of α0.
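The loop-filter group delay τl in Equation 3.14 can be evaluated with the standard closed form for a single real pole. A sketch; the symbols follow the text, but the function name and usage are mine, and the gain g is omitted since it does not affect group delay:

```python
import math

def one_pole_group_delay(alpha, omega):
    """Group delay, in samples, of Hl(z) = g / (1 - alpha*z^-1) at radian
    frequency omega (standard closed form for one real pole)."""
    num = alpha * (math.cos(omega) - alpha)
    den = 1.0 - 2.0 * alpha * math.cos(omega) + alpha * alpha
    return num / den

# At DC the delay is alpha / (1 - alpha); e.g. alpha = 0.5 gives one sample
tau = one_pole_group_delay(0.5, 0.0)
```

Evaluating this near the fundamental frequency gives the correction to subtract from D when choosing the bulk and fractional delay components.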
For spectral-based analysis, the transfer function of the SDL model between input, p(n), and output, y(n), can be expressed in the z-transform domain as

S(z) = 1 / (1 − Hl(z) HF(z) z^(−DI)).    (3.15)

Equation 3.15 can be thought of as a modified linear prediction where the prediction occurs over DI samples due to the periodic nature of plucked-guitar tones. The “prediction” coefficients are determined by the coefficients of the loop and fractional delay filters in the feedback loop of S(z).
The SDL model in Figure 3.6 is attractive from an analysis-synthesis perspective since, unlike the
bidirectional waveguide model, it does not require specific data about the string during performance
(e.g. initial conditions, instrument materials, plucking technique) to faithfully replicate plucked-
guitar tones. Rather, the problem becomes properly calibrating the filters from recorded tones via
model-based analysis. A significant portion of the literature on plucked-guitar synthesis is dedicated to developing calibration schemes for extracting optimal SDL components [26, 29, 62, 69, 86,
90].
3.3.4 Excitation and Body Modeling via Commuted Synthesis
When using the SDL model for guitar synthesis, the output signal is assumed to be strictly the result
of the string’s vibration where the only external forces acting on the string are due to fricative losses.
This assumption is not necessarily true when dealing with real guitars, since the instrument’s body
acts as a resonant filter, which affects its timbre, and interacts with the strings via nonlinear
coupling. Valimaki et al. describe the acoustic guitar body as a multidimensional resonator, which
requires computationally expensive modeling techniques to implement [89].
While an exhaustive review of acoustic body modeling techniques is beyond the current scope,
several attempts have been made to reduce the complexity of this task [7, 28, 57]. Measurement of the
acoustic guitar body response is typically achieved by striking the resonant body of the instrument
with a hammer with the strings muted. The acoustic radiation is recorded to capture the resonant
body modes. In some cases, electro-mechanical actuators are used to excite and measure the resonant
body in a controlled manner [63]. Digital implementation of the acoustic body involves designing a
Figure 3.7: Components for guitar synthesis including excitation, string, and body filters. The excitation and body filters may be consolidated for commuted synthesis.
filter that captures the resonant modes. This can be achieved using FIR or IIR filters, though precise modeling requires very high-order filters. Karjalainen et al. proposed using warped filter models for computationally efficient modeling and synthesis of acoustic guitar bodies. The warped filter is advantageous since the frequency resolution of the filter can favor the lower, resonant frequency modes, which are perceptually important to capture for re-synthesis, while keeping the required filter orders low enough for efficient synthesis [24]. For “cross-synthesis” applications, Karjalainen et al. introduced a technique to “morph” electric guitar sounds into acoustic tones through equalization of the magnetic pickups found on electric guitars. A filter, which encapsulates the body effects of the acoustic guitar, was then applied to a digital waveguide model of the instrument [27].
A popular method for dealing with the absent resonant body effects in the SDL model involves using so-called commuted synthesis, which was independently developed by Smith and Karjalainen [29, 73]. This technique exploits the commutative property of linear time-invariant (LTI) systems in order to extract an aggregate signal that encapsulates the effects of the resonant body filter and the string excitation, p(n), of the SDL model when the loop filter parameters are known. This approach avoids the computational cost incurred with explicitly modeling the body with a high-order filter.
Figure 3.7 shows the SDL model augmented by inserting excitation and body filters before and after the SDL loop, respectively. The excitation filter is a general LTI block that encapsulates several aspects of synthesis, including “pluck-shaping” filters to model certain dynamics in the articulation and the comb filtering effects from the plucking point and/or pickup locations as shown in Figure 3.6. Assuming that S(z) and y(n) are known, the LTI system can be rearranged:

Y(z) = E(z) S(z) B(z)    (3.16)
     = E(z) B(z) S(z)    (3.17)
     = A(z) S(z)    (3.18)
where A(z) is an aggregation of the body and excitation filters. By inverse filtering y(n) in the frequency domain with S(z), the impulse response for A(z) is obtained. Thus, by making an LTI assumption on the model, this residual signal contains the additional model components which are unaccounted for by the SDL alone. For practical considerations, Valimaki notes that several hundred milliseconds of the residual signal may be required to capture the perceptually relevant resonances of the acoustic body during resynthesis [90], but for many applications storing this signal is still cheaper than explicit body modeling.
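Under simplifying assumptions (an integer loop delay, the one-pole loop filter of Equation 3.13, and no fractional-delay filter), the inverse-filtering step that recovers the aggregate signal reduces to a short time-domain recursion, since A(z) = Y(z)(1 − Hl(z)z^(−D)). A hypothetical Python sketch; the function name and interface are mine:

```python
def extract_residual(y, D, g, alpha):
    """Inverse filter a recorded tone y(n) with the string model
    S(z) = 1 / (1 - Hl(z) z^-D), Hl(z) = g / (1 - alpha*z^-1), assuming an
    integer loop delay D and ignoring fractional delay (simplified sketch).
    The residual approximates the aggregate excitation/body signal of
    commuted synthesis."""
    w = 0.0  # state of the branch g*z^-D / (1 - alpha*z^-1)
    a = []
    for n in range(len(y)):
        y_delayed = y[n - D] if n >= D else 0.0
        w = alpha * w + g * y_delayed
        a.append(y[n] - w)  # a(n) realizes A(z) = Y(z) * (1 - Hl(z) z^-D)
    return a
```

Applied to a tone actually generated by the same S(z), this recursion returns the driving excitation exactly, which is the sanity check one would run before trusting it on recordings.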
It should be noted that, even when plucked-guitar tones do not exhibit prominent effects from the resonant body, commuted synthesis is still a valid technique for obtaining the SDL excitation signal, p(n). This is often the case for electric guitar tones, where the output is measured by a transducer and is relatively “dry” compared to an acoustic guitar signal. Also, any excitation signal extracted via commuted synthesis will contain biases from the plucking point and pickup locations unless these phenomena are specifically accounted for in the “excitation filter” block of Figure 3.7. If the plucking point and pickup locations are known with respect to the SDL model, the excitation signal can be “equalized” to remove the biases. There are several techniques utilized in the literature to estimate the plucking point location directly from recordings of plucked guitar tones. Traube and Smith developed frequency-domain techniques for acoustic guitars [81, 82, 83, 84], while Penttinen et al. employed time-domain analysis to determine the relative plucking position along the string [58, 59].
3.3.5 SDL Loop Filter Estimation
Before the SDL excitation signal can be extracted via commuted synthesis, the loop filter, Hl(z),
needs to be calibrated from the recorded tone. This task has been the primary focus in much of
the literature, since the loop filter provides the synthesized tones with natural decay characteristics
[14, 29, 39, 62, 69, 86, 90]. This section will overview some of the techniques used in the literature.
Early attempts at modeling the loop filter for the violin involved using deconvolution in the
frequency domain to obtain an estimate of the loop filter’s magnitude response. Smith employed
various filter design techniques, including autoregressive methods, in order to model the contours of the spectra; however, the measured spectra were subject to amplified noise due to the deconvolution process [69].
Karjalainen introduced a more robust algorithm that extracts magnitude response specifications
for the loop filter by analyzing the recorded tone with a short-time Fourier transform (STFT)
analysis [29]. Phase characteristics of the STFT are not considered in the loop filter design since the
magnitude response is considered to be perceptually more important for plucked-guitar modeling
[29, 86].
Lee et al. expand on Karjalainen’s STFT-based approach by adapting the so-called Energy Decay
Relief (EDR) [40, 64] to model the frequency-dependent attenuation of the waveguide. The EDR
was adapted from Jot [23] in order to de-emphasize the effects of beating in the string so that the
resulting magnitude trajectories for each partial are strictly monotonic. Thus, the EDR at time
t and frequency f is computed by summing all the remaining energy at that frequency from t to
infinity. Due to the decaying nature of plucked-guitar tones, this leads to a set of monotonically
decreasing curves for each partial analyzed.
Example algorithm for Loop Filter Estimation
An example of Karjalainen’s calibration scheme is shown in Figure 3.8 and can be summarized with
the following steps:
1. Determine the pitch, f0, of the recorded tone, y(n).
2. Compute the STFT on the plucked tone y(n).
3. For each frame in the STFT, estimate the magnitudes of the harmonically-related partials.
4. Estimate the slope of each partial’s magnitude trajectory across all frames in the STFT.
5. Compute a gain profile, G(fk), based on the magnitude trajectories of the harmonically related partials.
6. Apply filter design techniques (e.g. least-squares) to determine the parameters of Hl(z) that
satisfy the gain profile.
The details of each step in Karjalainen's calibration scheme vary depending on the specific implementation. For example, the number of partials chosen for analysis is typically between 10 and 20. Also, partial tracking across each frame can be achieved by bandpass filtering techniques when the pitch is known [90].
The gain profile, G(fk), extracted from the STFT analysis is computed as [29]

G(fk) = 10^(λk·D / (20·fHop))    (3.19)

where λk is the slope of the kth partial's magnitude trajectory, D is the “loop delay” in samples and fHop is the hop size of the STFT analysis. The physical meaning of Equation 3.19 is to determine the amount of attenuation a particular partial of the plucked tone incurs for each pass through the SDL. Thus, Equation 3.19 provides a gain specification for each partial in the STFT that can be used to design a loop filter, Hl(z), with similar magnitude response characteristics.
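Equation 3.19 can be sketched directly. This assumes the slopes λk are measured in dB per STFT frame and the hop size in samples (an assumption, since the text does not state the units), which makes the exponent the dB change accumulated over one D-sample pass through the loop:

```python
def gain_profile(slopes, D, hop):
    """Per-partial loop gain targets from STFT magnitude-trajectory slopes
    (Equation 3.19): G_k = 10 ** (lambda_k * D / (20 * hop)).
    Assumes slopes in dB per frame and hop size in samples (sketch)."""
    return [10.0 ** (lam * D / (20.0 * hop)) for lam in slopes]

# A partial decaying 0.5 dB per frame, hop of 256 samples, loop delay 441:
G = gain_profile([-0.5], D=441, hop=256)   # slightly below unity
```

Decaying partials (negative slopes) map to gains just below one, which is exactly the regime the loop filter design step must match.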
Filter Design Techniques
Least-squares filter design techniques are typically employed to derive coefficients for the loop filter that satisfy the estimated gain profile [29, 86, 90]. Valimaki et al. utilized a weighted least-squares algorithm to estimate the gain, g, and pole, α0, of Hl(z) with a transfer function described by Equation 3.13. Since a low-order filter generally cannot match the gain specifications of every partial, the weighted minimization ensures that the magnitudes of the lower, perceptually important partials are more accurately matched with the gain profile [86, 90]. These techniques must ensure that the filter coefficients are constrained for stability, which, for example, requires 0 < α0 < 1 and 0 < g < 1 when using the loop filter form of Equation 3.13. Rather than design a filter based on desired magnitude characteristics, Bank et al. propose a filter design technique which minimizes the error of the decay times of the partials in the synthetic tone [3], which are found to be perceptually significant.
Erkut and Laurson used Karjalainen’s calibration method as a foundation for an iterative scheme
based on nonlinear optimization to extract loop filter parameters that best match the amplitude
envelope of a recorded tone [14, 39]. The calibration scheme in Figure 3.8 is used to obtain an
initial set of loop filter parameters, which are used to resynthesize the plucked signal; an error signal is then computed between the amplitude envelopes of the recorded and synthesized signals. The
loop filter parameters are adjusted by a small amount and the process is repeated until a global
minimum in the error function is found. While this method has the potential to extract precise
model parameters, convergence is not guaranteed and its success depends on the accuracy of the
initial parameter estimates.
Figure 3.8: Overview of the loop filter design algorithm outlined in Section 3.3.5 using short-time Fourier transform analysis on the signal. [Block diagram: y(n) feeds pitch estimation (f0) and the STFT (Y(m, ω)), followed by peak detection and loop filter design (g, α0); accompanying plots show the plucked guitar tone, the fitted magnitude trajectories of the partials, and the loop filter gain specifications versus the designed filter magnitude.]
3.4 Extensions to the SDL Model
The SDL model discussed in this chapter simulates plucked strings that vibrate in only the transverse (parallel to the guitar's top plate) direction and behave in accordance with linear time-invariant assumptions. These simplifications prevent modeling additional physical behavior exhibited by guitar strings, which is described in this section. Real guitar strings vibrate along the axes parallel and perpendicular to the guitar's sound board. The frequency of vibration along each axis is slightly different due to slight differences in the string's length at the bridge and nut terminations. The differences in the frequency of vibration along each axis cause the “beating” phenomenon, where the sum and difference frequencies are perceived [9]. Furthermore, these vibrations may be coupled at the guitar's bridge termination, which causes a two-stage decay due to the in- and out-of-phase vibration along each axis [43].
In practice, the beating phenomenon is incorporated into synthesis systems by driving two SDL models in parallel, which represent string vibration along the transverse and perpendicular axes [30, 26, 86]. From an analysis perspective, it is difficult to simultaneously estimate parameters for both the transverse and perpendicular axes from a recording, since guitar pickups measure the total vibration at a particular point on the string. Typically, the parameters for both SDL models are extracted using the methods described in Section 3.3.5, with the exception of slightly mistuning one of the delay lines to simulate the beating effect. In order to estimate the model parameters directly, Riionheimo utilized genetic algorithms to obtain transverse and perpendicular SDL parameters that matched recorded signals in a perceptual sense [62]. Alternately, Lee employed a hybrid waveguide-signal approach where the waveguide model is augmented with a resonator bank to implement beating and two-stage decay phenomena in the lower-frequency partials [43].
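At the signal level, the beating effect can be illustrated by summing two equal-amplitude decaying partials detuned by a few hertz, as produced by two slightly mistuned loops. This is a simplified sketch of the perceptual effect with illustrative parameter values, not a dual-SDL implementation:

```python
import math

def beating_pair(f, detune, fs, n):
    """Sum of two equal-amplitude, exponentially decaying sinusoids detuned
    by `detune` Hz, mimicking two-polarization beating at the signal level
    (illustrative sketch; decay constant is arbitrary)."""
    out = []
    for i in range(n):
        t = i / fs
        env = math.exp(-2.0 * t)  # shared amplitude decay
        out.append(env * (math.sin(2 * math.pi * f * t)
                          + math.sin(2 * math.pi * (f + detune) * t)))
    return out

# With a 2 Hz detuning the amplitude envelope has a 0.5 s beat period
tone = beating_pair(220.0, 2.0, 8000, 4000)
```

The sum is equivalent to a single sinusoid whose amplitude is modulated at the difference frequency, which is the audible “beating” described above.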
Modeling the tension modulation in strings necessitates the use of non-linear techniques to model
the “pitch-glide” phenomena [79, 80]. In practice, pitch-glide is simulated by pre-loading a waveguide
or SDL model with an initial string displacement and regularly computing the string’s slope to
determine an elongation parameter. This parameter drives a time-varying delay, which represents
wave speed to reproduce the tension modulation effect. The caveat to this approach, however, is
that commuted synthesis cannot be applied to extract an excitation signal from a recorded tone.
For an analysis-synthesis approach, Lee uses a hybrid resonator-waveguide model. The resonator bank is calibrated from a recording to implement pitch-glide in the low-frequency partials since, it is argued, these are perceptually more relevant [42].
CHAPTER 4: SOURCE-FILTER PARAMETER ESTIMATION
4.1 Overview
Despite the vast amount of literature dedicated to developing and calibrating physically inspired guitar models, as discussed in Chapter 3, far less research has been dedicated to estimating expression from recorded performances and incorporating these attributes into the synthesis
models. It is well-known that guitarists employ a variety of techniques to articulate guitar strings,
such as varying the loudness, or dynamics, and picking device (e.g. finger, pick), which characterizes
their playing style. Thus, identifying these playing styles from a performance is essential for developing a system capable of expressive synthesis.
In this chapter, I propose a novel method to capture expressive characteristics of guitar performance from recordings in accordance with the single delay-loop (SDL) model overviewed in Section
3.3. This approach involves jointly estimating the source and filter parameters of the SDL in accordance with a parametric model for the excitation signal, which captures the expressive attributes of
guitar performance. Since the SDL is a source-filter abstraction of the waveguide model, this method
treats the source signal as the guitarist’s string articulation while the filter represents the string’s
response behavior. The motivation for a joint estimation scheme is to account for simultaneous
variation of source and filter parameters, which characterizes particular playing styles.
Before providing the details of my approach, I briefly overview existing techniques in the literature for modeling expression in guitar synthesis models.
4.2 Background on Expressive Guitar Modeling
Erkut and Laurson present methods to generate plucked tones with different levels of musical dynamics, or relative “loudness,” by manipulating a reference excitation signal with a known dynamics level. These methods involve designing pluck-shaping filters that can achieve a desired musical dynamics level when applied to the reference excitation signal [14]. Erkut employs a method that deconvolves a fortissimo (very loud) excitation with forte (loud) and piano (soft) excitations in order to derive
33
their respective pluck-shaping filter coe�cients. Laurson used the di↵erences in log-magnitude be-
tween two signals with di↵erent dynamics and autoregressive filter design techniques to approximate
a desired pluck-shaping filter [39]. Both approaches are founded on an argument that a desired
level of musical dynamics can be achieved by appropriately filtering a reference excitation signal.
A limitation of this approach, however, is the assumption that the string filter parameters remain
constant for all plucking styles, which does not always hold.
Cuzzucoli et al. presented a model for synthesizing guitar expression by considering the finger-
string interaction for different plucking styles in classical guitar performance [12]. This work consid-
ered two plucking styles: apoyando, where the string is displaced quickly by the finger, and tirando,
where the finger slowly displaces the string before releasing it. The effects of these finger-string in-
teractions are incorporated into the waveguide model by modifying the wave equation to include
the force exerted on the string for each plucking style. For example, in the case of apoyando
plucking, the force applied to the string is impulsive, while tirando plucks are characterized by a
more gradual change in the string's tension. Cuzzucoli's approach relies on off-line analysis, and no
methods are provided for deriving these parameters from a recorded signal.
Though these approaches adequately model expressive intentions, offline analysis is required to
compute the model's excitation signal separately from the filter. This is counter-intuitive
from a musical performance perspective, since musicians understand that expression is, in
part, the result of a simultaneous interaction between the performer and the instrument.
4.3 Excitation Analysis
The SDL model presented in Section 3.3 assumes that plucked-guitar synthesis can be modeled
by a linear and time-invariant system. Accordingly, the model output is the result of a convolution
between a source signal p(n), a comb filter C(z) approximating the performer's plucking-point position,
and the string filter model S(z). For analysis-synthesis tasks, the commuted synthesis technique, as
overviewed in Section 3.3.4, is used to compute p_b(n) by inverse filtering the recorded tone, y(n), in
the frequency domain with S(z), as shown in Equation 4.1:

P_b(z) = Y(z) S^{-1}(z)    (4.1)

The subscript b on p_b(n) indicates that the excitation signal contains a bias
from the performer's plucking-point position. Unless the comb filter C(z) from Section 3.3.4 is
known, the excitation signal derived from commuted synthesis will always contain this bias.
4.3.1 Experiment: Expressive Variation on a Single Note
To determine whether the SDL model can incorporate expressive attributes of guitar performance, exci-
tation signals corresponding to different articulations of the same note on an electric
guitar are analyzed using commuted synthesis with Equation 4.1. Assuming the string filter parameters
are relatively constant across performances, one might expect the excitation signals to contain the
expressive characteristics that distinguish each playing style. Additionally, any similarities observed
between the excitations may permit the development of a parametric input model.
To test this hypothesis, recordings of electric guitar performance were analyzed using the follow-
ing approach. For each plucking style:
1. Vary the relative plucking strength used to excite the string from piano (soft) to forte (loud).
2. Vary the articulation used to excite the string, using either a pick or a finger.
3. Calibrate the string filter, S(z), using the methodology described in Section 3.3.5.
4. Extract p_b(n) by inverse filtering the recording, y(n), with S(z).
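Step 4 can be sketched as a simple time-domain FIR operation; the following is a minimal numpy illustration, under the assumption that the calibrated loop and fractional-delay filters have been collapsed into a single set of feedback coefficients starting at the integer loop delay (the function name and interface are hypothetical):

```python
import numpy as np

def inverse_filter_excitation(y, loop_coeffs, D):
    """Recover the biased excitation p_b(n) from a recorded tone y(n) by
    inverse filtering with the SDL string filter (Equation 4.1).

    The inverse filter 1 - H_l(z)H_F(z)z^-D is applied as an FIR operation:
    p_b(n) = y(n) - sum_j c_j * y(n - D - j).

    loop_coeffs : combined loop/fractional-delay coefficients (assumed
                  already calibrated, e.g. via the method of Section 3.3.5)
    D           : integer part of the loop delay in samples (D >= 1)
    """
    y = np.asarray(y, float)
    p = y.copy()
    for j, c in enumerate(loop_coeffs):
        d = D + j               # total delay for this tap
        p[d:] -= c * y[:-d]     # subtract the delayed, scaled output
    return p
```

Running the feedback loop forward from a known excitation and then inverse filtering recovers that excitation exactly, which mirrors the expectation that the residual is confined to the first pitch period.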
The tones used for analysis were taken from an electric guitar equipped with a bridge-mounted
piezoelectric pickup. These signals are relatively "dry", with negligible effects from the instrument's
resonant body, so the recovered excitation signals should primarily reflect the performer's
articulation. The bridge-mounted pickup ensures that the output is observed from the same
location on the string, and the recovered excitation signal will only contain a bias due to the
plucking-point effect.
The top panel of Figure 4.1 shows the recorded tones produced by specific articulations applied
to the guitar's "open", or unfretted, 1st string; the corresponding excitation signals obtained
using the approach outlined above are shown in the bottom panel. By observation, it is clear that
each excitation signal corresponds to the first period of oscillation of its associated signal in the top
panel of Figure 4.1 and has negligible amplitude after this period. This is an intuitive result,
since the SDL used for synthesis is tuned to the pitch of the string and its harmonics. By inverse
filtering with the SDL, the residual signal is devoid of the periodic and harmonic structure of the
Figure 4.1: Top: Plucked guitar tones representing various string articulations by the guitarist on the open 1st string (pitch E4, 329.63 Hz). Bottom: Excitation signals for the SDL model associated with each plucking style.
recorded tone. The remaining "spikes" in the excitation signal correspond to incident and reflected
pulses detected by the pickup after the string is released from displacement (see Section 4.3.2).
Despite the similar contour patterns of the excitation signals in Figure 4.1, there are several
distinguishing features related to the perceived differences in timbre. The differences between the
amplitudes of overlapping impulses correspond to the relative strength of the articulation used to
produce the tone. More interesting, however, are the differences between the tones produced with
a pick and those produced with the finger, as the former feature sharper transitions near regions of
maximum or minimum amplitude displacement. This observation correlates with the perceived
timbre of each tone, since plucks generated with a pick have a more pronounced "attack" and
excite the high-frequency harmonics of the string.
The common structure of the excitation signals in Figure 4.1 suggests that p_b(n) can be parametri-
cally represented to capture the variations imparted by the guitarist through the applied articulation.
4.3.2 Physicality of the SDL Excitation Signal
The excitation signals shown in Figure 4.1 follow the contours of their counterpart plucked signals.
However, the excitation signal is a short transient event that reduces to residual error
after one period of oscillation of the corresponding plucked tone. Essentially, the excitation signal
captures one period of oscillation of the vibrating string measured at a particular position along the
string; in this case, the observed variable is the acceleration of the string at the guitar's bridge.
The peaks observed in the excitation signals of Figure 4.1 can be explained by observing the
output of a bidirectional waveguide model over one period of oscillation. This is shown in Figure
4.2, where the output at the end of the waveguide representing the guitar's bridge position is traced
over time. Initially, the amplitude of the acceleration wave is maximal at the moment the string is
released from its initial displacement (Figure 4.2a). Two separate disturbances then form and
travel in opposite directions along the string (Figure 4.2b). The initial peak in the excitation signal
occurs when the right-traveling wave encounters the bridge position (Figure 4.2c). The amplitude of
both traveling waves is inverted upon reflection at the boundary conditions at the nut and bridge
positions. Eventually, the initially left-traveling wave, now with inverted amplitude, encounters the
bridge position, forming the second pulse of the excitation signal (Figure 4.2e). After some time,
the initial pulse returns and the cycle repeats (Figure 4.2f). As will be discussed in Chapter 6,
identifying the pulse locations in the excitation signal can be used to estimate the guitarist's relative
plucking position.
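The traveling-wave behavior described above can be reproduced with a minimal bidirectional waveguide sketch; this is an idealized, lossless illustration of the mechanism, not the thesis implementation:

```python
import numpy as np

def waveguide_bridge_output(init_shape, steps):
    """Minimal bidirectional digital waveguide (a sketch of the behavior
    in Figure 4.2).  The initial string state is split equally into
    right- and left-traveling waves.  Each step the waves advance one
    sample and reflect with inversion (-1) at the rigid nut and bridge.
    The returned signal is the amplitude observed at the bridge end,
    showing the incident pulse, the inverted reflection from the nut,
    and the periodic repetition described in the text.
    """
    r = 0.5 * np.asarray(init_shape, float)  # right-traveling wave
    l = r.copy()                             # left-traveling wave
    out = []
    for _ in range(steps):
        out.append(r[-1] + l[-1])            # observe at the bridge
        r_end, l_end = r[-1], l[0]
        r = np.roll(r, 1)                    # advance right
        l = np.roll(l, -1)                   # advance left
        r[0] = -l_end                        # nut reflection, inverted
        l[-1] = -r_end                       # bridge reflection, inverted
    return np.array(out)
```

Because the model is lossless and both reflections invert the sign, the bridge signal repeats exactly every 2L samples for a string of L spatial samples, matching the periodic pulse pattern in Figure 4.2.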
Figure 4.2: The output of a waveguide model observed over one period of oscillation at (a) t = 0 msec, (b) t = 0.56 msec, (c) t = 1.156 msec, (d) t = 2.26 msec, (e) t = 3.37 msec, and (f) t = 5.67 msec. The top figure in each subplot shows the position of the traveling acceleration waves at different time instants. The bottom plot traces the measured acceleration at the bridge (noted by the 'x' in the top plots) over time.
4.3.3 Parametric Excitation Model
The contour patterns of the excitation signals observed in Figure 4.1 and the simulated waveguide
output of Figure 4.2 are consistent with the physical behavior of the vibrating string. This suggests
that the variations in the physical behavior of a plucked string due to different articulations can be
parametrically represented by capturing the contours of the pulse peaks. Modeling the excitation
signal with polynomial segments is a reasonable choice for approximating each contour. By concate-
nating these polynomial segments, the excitation signal can be represented by a piecewise
function

p_b(n) = c_{1,0} n^0 + c_{1,1} n^1 + \cdots + c_{1,K} n^K + \cdots + c_{J,0} n^0 + c_{J,1} n^1 + \cdots + c_{J,K} n^K    (4.2)

where c_{j,k} is the kth coefficient of the Kth-order polynomial modeling the jth segment of p_b(n),
with each segment active only over its own time interval. Therefore, modeling a particular
excitation signal requires determining the number of segments, the polynomial degree used to
model each segment, and the boundary locations specifying where a particular segment begins and ends.
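Evaluating this piecewise model can be sketched as follows, under the assumption that segment j is active on the interval [n_j, n_{j+1}) and is evaluated directly in the absolute time index n (boundary conventions are assumptions):

```python
import numpy as np

def piecewise_poly_excitation(boundaries, coeffs):
    """Evaluate the piecewise-polynomial excitation model of Equation 4.2.

    boundaries : J+1 sample indices [n_0, n_1, ..., n_J] delimiting the
                 J segments
    coeffs     : coeffs[j] holds the K+1 coefficients (c_{j,0}..c_{j,K})
                 of segment j
    """
    p = np.zeros(boundaries[-1])
    for j, c in enumerate(coeffs):
        n = np.arange(boundaries[j], boundaries[j + 1])
        # sum_k c_{j,k} * n^k evaluated over this segment's interval
        p[n] = sum(ck * n**k for k, ck in enumerate(c))
    return p
```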
4.4 Joint Source-Filter Estimation
As shown in Section 4.3.2, the SDL excitation signal reflects one period of oscillation observed at
a particular location along the string. It was also shown that these signals differ according to the
articulation imparted by the guitarist, and a parametric model was proposed to account
for these differences. To model the SDL filter in response to different inputs (i.e. string articulations),
this section proposes a joint source-filter approach that simultaneously accounts for variation in the
excitation and string filter parameters. The parameters are estimated by formulating a convex
optimization problem.
4.4.1 Error Minimization
Using the SDL model, plucked-string synthesis is assumed to result from a convolution between an
input signal and a string filter. To estimate these parameters in a joint framework, the error between
the excitation model described by Equation 4.2 and the residual signal must be minimized:

e(n) = p̂_b(n) - p_b(n).    (4.3)

Here, p̂_b(n) is the excitation model from Equation 4.2 and p_b(n) is the residual obtained by inverse
filtering the output with the string filter. By assuming S(z) is an all-pole filter, e(n) can be expressed
in the frequency domain by replacing p_b(n) with Y(z)S^{-1}(z) to yield

E(z) = P̂_b(z) - Y(z)S^{-1}(z)
     = P̂_b(z) - Y(z)(1 - H_l(z)H_F(z)z^{-D})    (4.4)

where the SDL components discussed in Chapter 3 are used to complete the inverse filtering oper-
ation. Making an all-pole assumption on S(z) treats the output of the SDL as a generalized linear
prediction problem, where the current output sample y(n) is computed as a linear combination of
previous output samples. Due to the periodic nature of the plucked tone, this prediction occurs
over an interval defined by the loop delay, which is specified by D.
Since inverse filtering is a time-domain process, taking the inverse Z-transform of E(z) in Equa-
tion 4.4 yields

e(n) = p̂_b(n) - y(n) + α_0 y(n-D) + α_1 y(n-D-1) + \cdots + α_N y(n-D-N),    (4.5)

where α_0, α_1, ..., α_N are generalized filter coefficients to be estimated. This equation can be
rearranged to

e(n) = p̂_b(n) + α_0 y(n-D) + α_1 y(n-D-1) + \cdots + α_N y(n-D-N) - y(n),    (4.6)

where the unknowns due to the source signal p̂_b(n) and the filter (α_0, α_1, ...) are clearly separated from
the recorded tone y(n). This form leads to a convenient matrix formulation, as shown in Equation
4.7.
\begin{bmatrix} e(1) \\ \vdots \\ e(i) \\ e(i+1) \\ \vdots \\ e(m) \end{bmatrix}
=
\begin{bmatrix}
1^0 & \cdots & 1^K & 0 & \cdots & 0 & y(1-D) & \cdots & y(1-D-N) \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
i^0 & \cdots & i^K & 0 & \cdots & 0 & y(i-D) & \cdots & y(i-D-N) \\
0 & \cdots & 0 & (i+1)^0 & \cdots & (i+1)^K & y(i+1-D) & \cdots & y(i+1-D-N) \\
\vdots & & \vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & m^0 & \cdots & m^K & y(m-D) & \cdots & y(m-D-N)
\end{bmatrix}
x
-
\begin{bmatrix} y(1) \\ \vdots \\ y(i) \\ y(i+1) \\ \vdots \\ y(m) \end{bmatrix}

e = Hx - y    (4.7)
H contains the time indices corresponding to the boundaries of p_b(n) and the shifted samples of
y(n); the unknown source-filter parameters are contained in a column vector x defined as

x = \begin{bmatrix} c_{1,0} & \cdots & c_{1,K} & \cdots & c_{J,0} & \cdots & c_{J,K} & α_0 & α_1 & \cdots & α_N \end{bmatrix}^T.    (4.8)

Full specification of Equation 4.7 requires determining the number of unknown source and filter
parameters. The generalized filter depends on N + 1 coefficients, while the excitation signal depends
on the number of piecewise polynomials used to model it: J indicates the number of segments and K
is the polynomial order of each segment.
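Assembling H from Equation 4.7 can be sketched as follows; the row and segment indexing conventions are assumptions, and output samples preceding the start of the signal are taken as zero:

```python
import numpy as np

def build_H(y, boundaries, K, D, N):
    """Assemble the matrix H of Equation 4.7 (a sketch).  Rows cover
    samples n = n_0 ... n_J - 1.  The left block holds the polynomial
    basis n^0..n^K for whichever segment sample n falls in (zeros
    elsewhere); the right block holds the delayed output samples
    y(n-D) ... y(n-D-N).
    """
    n0, nJ = boundaries[0], boundaries[-1]
    J = len(boundaries) - 1
    H = np.zeros((nJ - n0, J * (K + 1) + N + 1))
    for row, n in enumerate(range(n0, nJ)):
        # locate the segment containing sample n
        j = int(np.searchsorted(boundaries, n, side='right')) - 1
        H[row, j*(K+1):(j+1)*(K+1)] = [n**k for k in range(K + 1)]
        for i in range(N + 1):
            d = n - D - i
            H[row, J*(K+1) + i] = y[d] if d >= 0 else 0.0
    return H
```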
4.4.2 Convex Optimization
The source-filter parameters are found by identifying the unknowns in x that minimize Equation 4.7.
The complexity of this problem is related to the number of segments used to parameterize
p_b(n) and the order of the generalized filter used to implement the string decay. In general, the
number of unknowns is J × (K + 1) + N + 1.
A common approach to estimating the unknown parameters is to minimize the L2-norm of the
error term in Equation 4.7, which leads to

min_x ‖e‖_2 = min_x ‖Hx - y‖_2.    (4.9)
Expanding the square of Equation 4.9 yields

min_x ‖Hx - y‖_2^2 = (Hx - y)^T (Hx - y)
                   = x^T H^T H x - 2 y^T H x + y^T y
                   = (1/2) x^T F x + g^T x + y^T y    (4.10)

where F = 2H^T H and g^T = -2y^T H. Equation 4.10 is now in the form of a convex optimization
problem; in this form, any local minimum is also a global minimum [6].
Before applying a solver to the optimization problem, the constraints on the source-filter param-
eters in x must be addressed. For example, depending on the structure used for the loop filter, the
constraints may specify bounds on the coefficients that yield a stable filter. Specific constraints for the
filter models used will be discussed in Sections 5.2 and 5.3. Regardless of the filter structure,
the constraints on the excitation model are consistent: the segments constituting the excitation
should form a smooth concatenation of polynomial functions that are continuous
at the boundary locations. As an example, consider an excitation consisting of J = 2 segments,
each modeled with a Kth-order polynomial and sharing a boundary located at n = i. The equality
condition ensuring that these segments are continuous can be expressed as

c_{1,0} i^0 + c_{1,1} i^1 + \cdots + c_{1,K} i^K = c_{2,0} i^0 + c_{2,1} i^1 + \cdots + c_{2,K} i^K,
which, in matrix form, is notated as

\begin{bmatrix} i^0 & i^1 & \cdots & i^K & -i^0 & -i^1 & \cdots & -i^K \end{bmatrix}
\begin{bmatrix} c_{1,0} \\ c_{1,1} \\ \vdots \\ c_{1,K} \\ c_{2,0} \\ c_{2,1} \\ \vdots \\ c_{2,K} \end{bmatrix} = 0.
The row vector on the left contains the time indices of the polynomial functions, and the column vector
contains the unknown source coefficients. Since real excitation signals typically consist
of more than two segments, an additional equality condition is required for each pair of segments
sharing a boundary.
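These pairwise continuity conditions can be collected into an equality-constraint matrix; the sketch below covers only the source-coefficient columns (zero columns for the unconstrained filter coefficients would be appended when assembling the full constraint matrix):

```python
import numpy as np

def continuity_constraints(boundaries, K):
    """Build the equality-constraint rows enforcing continuity of
    adjacent polynomial segments at each interior boundary, as in the
    two-segment example above.  For a shared boundary n = i, one row
    encodes  sum_k c_{j,k} i^k - sum_k c_{j+1,k} i^k = 0.
    """
    J = len(boundaries) - 1
    rows = []
    for j in range(J - 1):
        i = boundaries[j + 1]                 # interior boundary index
        row = np.zeros(J * (K + 1))
        row[j*(K+1):(j+1)*(K+1)] = [i**k for k in range(K + 1)]
        row[(j+1)*(K+1):(j+2)*(K+1)] = [-(i**k) for k in range(K + 1)]
        rows.append(row)
    return np.array(rows)
```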
The constraints on the source-filter parameters are specified for the optimization problem via
equality and inequality conditions, denoted A_eq and A, respectively. By including these constraints,
the optimization problem from Equation 4.10 is expressed as

min_x f(x) = (1/2) x^T F x + g^T x    (4.11)
subject to  A x ≤ b
            A_eq x = b_eq,

where the last term of Equation 4.10 is dropped from the objective function f(x) since it is constant
with respect to x and does not affect the minimization. In Equation 4.11, b and b_eq specify the
bounds on the parameters related to the inequality and equality constraint matrices, respectively.
When written in the form of Equation 4.11, Equation 4.9 is solved using quadratic programming
techniques. Several software packages are available for this task, including CVX and the quadprog
function in MATLAB's Optimization Toolbox. quadprog employs a "trust region" algorithm, in which
a gradient approximation is used to evaluate a small neighborhood of possible solutions in x to
determine convergence [47]. CVX is also adept at solving quadratic programs, though it formulates
the objective function as a second-order cone problem [18]. CVX is the preferred solver for the work in
this thesis because the syntax used to specify the quadratic program is identical to the mathematical
description of the minimization problem in Equation 4.10.
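For the equality-constrained case alone, the problem even admits a closed-form solution via the KKT system; the following is a minimal numpy stand-in for the CVX and quadprog solvers mentioned above (inequality constraints, such as filter-stability bounds, would require a full QP solver):

```python
import numpy as np

def solve_eq_constrained_ls(H, y, Aeq, beq):
    """Solve  min_x ||Hx - y||_2^2  subject to  Aeq x = beq.

    With F = 2 H^T H and g = -2 H^T y, stationarity of the Lagrangian
    gives  F x + Aeq^T lam = -g  together with  Aeq x = beq, which is
    solved as one linear (KKT) system in x and the multipliers lam.
    """
    F = 2.0 * H.T @ H
    g = -2.0 * H.T @ y
    p = len(beq)
    KKT = np.block([[F, Aeq.T],
                    [Aeq, np.zeros((p, p))]])
    rhs = np.concatenate([-g, beq])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:F.shape[0]]          # discard the multipliers
```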
CHAPTER 5: SYSTEM FOR PARAMETER ESTIMATION

Figure 5.1: Proposed system for jointly estimating the source-filter parameters of plucked guitar tones.
This chapter details the implementation of the joint source-filter estimation
scheme proposed in Chapter 4. Figure 5.1 provides a diagram of the proposed system, including
the major sub-tasks required for estimating the parameters directly from recordings. Section 5.1
discusses onset localization in the plucked-guitar signal, which is required to determine the pitch
of the tone during the "attack" instant and to localize the indices for the parametric model of the
excitation signal. The experiments applying the joint source-filter scheme are presented in
Section 5.2, including the problem formulation, solution, and analysis of the results.
5.1 Onset Localization
To estimate the SDL excitation signal in the joint framework, the physics of a vibrating string fixed
at both end points are exploited. When the SDL model is considered without the comb filter effect
explicitly accounted for, the excitation signal corresponds to one period of string vibration, which
can be identified in the recorded signal. From the physical modeling overview provided in Chapter 3,
when the string is released from an initial displacement, two disturbances are produced that travel
in opposite directions along the string. These disturbances are measured by the guitar's pickup as
impulse-like signals, where the first pulse is incident from the string's initial displacement and the
second is inverted by reflection at the guitar's nut. A simulation of this behavior using acceleration
as the wave variable was shown in Section 4.3.2. By identifying these pulses in the initial period of
vibration, the portion of the recorded signal corresponding to the excitation signal can be identified.
This section overviews the approach used to identify the boundaries of the excitation within the
plucked-guitar signal, which includes locating the incident and reflected pulses. As will be explained
in Chapter 6, the spacing of these pulses provides insight for estimating the performer's relative
plucking position along the string. The approach utilizes two-stage onset detection and is outlined
as follows:
1. Employ "coarse" onset detection to determine a rough onset time for the "attack" of the
plucked tone.
2. Estimate the pitch of the tone starting from the coarse onset.
3. Using the estimated pitch value, employ pitch-synchronous onset detection to estimate an
onset closer to the initial "attack" of the signal.
4. Search for the local minimum and maximum values within the first period of the signal.
5.1.1 Coarse Onset Detection
Onset detection is an important tool used for many tasks in music information retrieval (MIR)
systems, such as the identification of performance events in recorded music. For example, on a large
scale it may be of interest to identify the beats from a recording of polyphonic music by looking
for the drum onsets. For melody detection on a monophonic signal, the onsets must be found to
determine when the instrument is actually playing.
A thorough review of onset detection algorithms is provided in [4], which details several sub-tasks
of the process, including pre-processing the audio signal, reducing it to a detection
function, and locating onsets by finding peaks in the detection function. Obtaining a spectral
representation of the audio signal is often the initial step for computing a detection function, since the
time-varying energy in the spectrum can indicate when transient events, such as note onsets, occur.
The short-time Fourier transform (STFT) provides a time-varying spectral representation
and may be computed as

Y_k(n) = \sum_{m=-N/2}^{N/2-1} y(m) w(m - nh) e^{-j2πmk/N}.    (5.1)
In Equation 5.1, w(m) is an N -point window function and h is the hop-size between adjacent
windows. The STFT facilitates the computation of several detection functions for onset detection
tasks including spectral flux. For monophonic recordings of instruments with an impulsive attack,
such as the guitar, Bello et al. show that spectral flux performs well in identifying onsets [4]. Spectral
flux is calculated as the squared distance between successive frames of the STFT:

SF(n) = \sum_{k=-N/2}^{N/2-1} { R( |Y_k(n)| - |Y_k(n-1)| ) }^2    (5.2)

where R(x) = (x + |x|)/2 is a rectification function that accounts only for positive changes in energy
while ignoring negative changes.
The "coarse" onset detection is so named because a relatively large window size of N = 2048
samples is used to compute the STFT in Equation 5.1 and the flux in Equation 5.2. The motivation
for using such a long window is to identify the "attack" portion of the plucked tone, where the
largest energy increase occurs, while ignoring spurious noise preceding the onset. The corresponding
detection function is shown in the top panel of Figure 5.3(a), where there is a clear peak. The onset
is taken as the time instant two frames prior to the maximum of the detection function.
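The coarse detector can be sketched in a few lines; the window type, hop size, and the two-frame back-off from the peak are assumptions for illustration:

```python
import numpy as np

def spectral_flux_onset(y, frame=2048, hop=512):
    """Coarse onset detection via rectified spectral flux (Equations
    5.1-5.2).  Frames are windowed and transformed, the positive
    magnitude change between consecutive frames is squared and summed,
    and the onset sample is taken two frames before the flux peak.
    Returns (onset_sample, flux).
    """
    w = np.hanning(frame)
    n_frames = 1 + max(0, (len(y) - frame) // hop)
    mags = [np.abs(np.fft.rfft(w * y[i*hop:i*hop + frame]))
            for i in range(n_frames)]
    flux = np.zeros(n_frames)
    for i in range(1, n_frames):
        diff = mags[i] - mags[i - 1]
        flux[i] = np.sum(((diff + np.abs(diff)) / 2.0) ** 2)  # rectify
    peak = int(np.argmax(flux))
    return max(0, peak - 2) * hop, flux
```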
5.1.2 Pitch Estimation
The coarse onset detected in Figure 5.3(a) is still quite far from the "attack" segment of the
plucked signal. Searching for the pulse indices too far from the onset of the signal will likely result
in false detections, so a closer estimate is required. This is the purpose of pitch-synchronous onset
detection. The pitch of the signal is estimated by taking a window of audio equal to three times
the STFT frame length, starting from the coarse onset location. Using this window, the pitch is
estimated with the well-known autocorrelation function, given by

φ(m) = (1/N) \sum_{n=0}^{N-1} [y(n+l) w(n)][y(n+l+m) w(n+m)],  for 0 ≤ m ≤ N-1,    (5.3)
Figure 5.2: Pitch estimation using the autocorrelation function. The lag corresponding to the global maximum indicates the fundamental frequency for a signal with f0 = 330 Hz.
where w(n) is a window of length N. Autocorrelation is used extensively for detecting periodicity
in signal processing tasks, since it can reveal underlying structure in signals, especially in speech
and music. If φ(m) for a particular signal is periodic with period P, then that signal
is also periodic with the same period [61]. The pitch of the plucked signal is estimated by searching
for a global maximum in φ(m) occurring after the point of maximum correlation, i.e. the point of
zero lag where m = 0. An example autocorrelation plot is provided in Figure 5.2.
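A compact sketch of this estimator follows; the window w(n) is omitted for brevity, and the search-range bounds are assumptions:

```python
import numpy as np

def autocorr_pitch(y, fs, fmin=60.0, fmax=1000.0):
    """Pitch estimation from the autocorrelation function (Equation 5.3,
    windowing omitted).  The lag of the global maximum after zero lag,
    restricted to a plausible pitch range, gives the fundamental period.
    """
    phi = np.correlate(y, y, mode='full')[len(y) - 1:]  # lags 0..N-1
    lo = int(fs / fmax)          # shortest plausible period
    hi = int(fs / fmin)          # longest plausible period
    lag = lo + int(np.argmax(phi[lo:hi]))
    return fs / lag
```

Restricting the search range avoids picking the zero-lag maximum and spurious sub-harmonic peaks at multiples of the true period.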
5.1.3 Pitch Synchronous Onset Detection
The estimated pitch of the plucked signal is used to recompute the STFT with a frame size equal
to half the estimated pitch period, starting from the coarse onset location. The spectral flux is also
recomputed using Equation 5.2 and the new frame size. This yields a detection function with much
finer time resolution. As an example, the pitch-synchronous onset for a plucked signal is shown in
Figure 5.3(b), where the onset is taken as the first locally maximum peak indicated by the detection
function. Comparing all the panels of Figure 5.3, it is evident that the two-stage onset detection
procedure provides an onset that is sufficiently close to the "attack" portion of the plucked note.
Figure 5.3: Overview of residual onset localization in the plucked-string signal. (a): Coarse onset localization using a threshold based on spectral flux with a large frame size. (b): Pitch-synchronous onset detection utilizing a spectral flux threshold computed with a frame size proportional to the fundamental frequency of the string. (c): Plucked-string signal with the coarse and pitch-synchronous onsets overlaid.
5.1.4 Locating the Incident and Reflected Pulse
With the pitch-synchronous onset location, identifying the indices of the incident and reflected
pulses is accomplished via a straightforward search for the minimum and maximum peaks within
the first period of the signal, which is known from the previous pitch estimation step. The
plucked signal from Figure 5.3 is shown in detail in Figure 5.4 for emphasis. The indices of
the pulses are used as boundaries for fitting polynomial curves to model the excitation signal. It
should be noted that a straightforward search for the minima and maxima is sensitive to noise
preceding the incident pulse; the pitch-synchronous onset detection is capable of ignoring this noise
and yielding an onset closer to the incident pulse location.
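The search itself can be sketched directly, assuming the onset index and the pitch period in samples are already known from the previous steps:

```python
import numpy as np

def locate_pulses(y, onset, period_samples):
    """Locate the incident and reflected pulses as the extreme values
    within the first pitch period after the pitch-synchronous onset
    (Section 5.1.4).  The incident pulse is taken as the earlier of the
    two extrema and the reflected (inverted) pulse as the later.
    """
    seg = y[onset:onset + period_samples]
    i_max = int(np.argmax(seg))
    i_min = int(np.argmin(seg))
    incident, reflected = sorted((i_max, i_min))
    return onset + incident, onset + reflected
```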
Figure 5.4: Detail view of the "attack" portion of the plucked-tone signal in Figure 5.3. The pitch-synchronous onset is marked, as well as the incident and reflected pulses from the first period of oscillation.
5.2 Experiment 1
This section presents the application of the joint source-filter estimation scheme proposed in Section
4.4 when the loop filter is a single-pole infinite impulse response (IIR) type. The problem
formulation and solution are discussed, as well as the application of the scheme to a corpus of plucked
guitar tones.
5.2.1 Formulation
In the literature, the decay rates of the harmonically related partials of plucked-guitar tones are
often approximated by a single, infinite impulse response (IIR) filter of the form

H_l(z) = g / (1 - α_0 z^{-1}).    (5.4)

In this formulation, the pole α_0 is tuned so that the spectral roll-off of the filter's magnitude response
approximates the decay rates of the harmonically related partials of the plucked guitar tone, and the
gain term g in the numerator is tuned to improve the fit.
To estimate this type of filter in the joint source-filter framework, Equation 5.4 is substituted for
H_l(z) in the SDL string filter S(z):

S(z) = 1 / (1 - H_l(z)H_F(z)z^{-D_I})
     = (1 - α_0 z^{-1}) / (1 - α_0 z^{-1} - g H_F(z) z^{-D_I}).    (5.5)

The zero in the numerator of Equation 5.5 poses a problem for the joint source-filter estimation
approach, because it becomes a pole of S^{-1}(z), so inverse filtering Y(z) with S(z) is not an FIR
filtering operation. This is problematic because inverse filtering Y(z) with S(z) in the time domain
then requires previous samples of the excitation signal p_b(n), which is unknown.
In practice, we can circumvent this difficulty and still formulate the joint source-filter estima-
tion problem by discarding the numerator of S(z) in Equation 5.5 to yield an all-pole filter. This
approximation is justified by a few observations about the source-filter system. First, the mag-
nitude response of S(z), shown in Figure 5.5(d), is dominated by its poles, which create a resonant
structure passing frequencies located near the string's harmonically related partials. Examining the
values estimated for the loop filter pole α_0 in the literature [14, 39, 86, 90], α_0 is typically very small
Figure 5.5: Pole-zero and magnitude plots of a string filter S(z) with f0 = 330 Hz and a loop filter pole located at α0 = 0.03. The pole-zero and magnitude plots of the system are shown in (a) and (c), and the corresponding plots using an all-pole approximation of S(z) are shown in (b) and (d).
(|α_0| ≪ 1). As shown in Figure 5.5(a), this places the corresponding zero in the numerator of S(z)
close to the origin of the unit circle, giving it a negligible effect on the filter's magnitude response.
Figure 5.5(d) shows that the magnitude response of the all-pole approximation is identical to its
pole-zero counterpart in Figure 5.5(c).
The next observation is that the model of the excitation signal consists of a short-duration pulse
with zero amplitude after the first period of vibration, as discussed in Section 4.3. The non-zero part
of the excitation signal pertains to how the string was plucked, while the remaining part is residual
error from the string model. By making a zero-input assumption on the excitation signal after the
initial period, the recursion from the numerator of S(z) can be ignored without much effect on the
estimation.
Taking these observations into account, the numerator of S(z) is discarded and an all-pole ap-
proximation is obtained:

S(z) = 1 / (1 - α_0 z^{-1} - g H_F(z) z^{-D_I}).    (5.6)
The fractional delay coefficients due to H_F(z) must be addressed before the error minimization
between the residual and the excitation model can be formulated (i.e. Equation 4.3). H_F(z) is an
Nth-order FIR filter

H_F(z) = \sum_{n=0}^{N} h_n z^{-n}    (5.7)

whose coefficients for a desired delay can be computed using a number of design techniques.
A consequence of realizing a causal fractional delay filter is that an additional integer delay of
⌊N/2⌋ samples is introduced into the feedback loop of S(z). In practice, this can be compensated
for, to avoid de-tuning the SDL, by subtracting the delay added by H_F(z) from the bulk delay
filter z^{-D_I}, as long as N ≪ D_I.
The required fractional delay D_F and the bulk delay D_I can be determined from the estimated pitch of the guitar tone discussed in Section 5.1.2, and H_F(z) is computed using the Lagrange interpolation technique overviewed in Appendix A. The error minimization from Equation 4.4 can now be specified for this particular case
E(z) = P_b(z) − Y(z)(1 − α0 z^−1 − g(h_0 + h_1 z^−1 + · · · + h_N z^−N) z^−D_I). (5.8)
By expanding Equation 5.8, rearranging terms and taking the inverse z-transform, the error minimization is expressed in the time domain as

e(n) = p_b(n) + α0 y(n − 1) + β_0 y(n − D_I) + β_1 y(n − D_I − 1) + · · · + β_N y(n − D_I − N) − y(n), (5.9)

where β_j = g h_j for j = 0, 1, 2, . . . , N.
5.2.2 Problem Solution
Using the convex optimization approach presented in Section 4.4.2, minimizing the L2-norm of Equation 5.9 becomes

min_x ‖Hx − y‖_2 (5.10)
subject to 0.001 ≤ α0 ≤ 0.999
0.001 ≤ β_j ≤ 0.999 for j = 0, 1, . . . , N.
The first inequality in the minimization ensures that the estimated loop filter pole α0 will lie within the unit circle for stability and have a low-pass characteristic. Though α0 = 0 is a stable solution, the resulting filter would not impose any damping on the loop filter's frequency response, so 0.001 was chosen as a lower bound on α0. The second inequality constraint relates to the stability of the overall string filter S(z). If the gain g of the loop filter is permitted to exceed unity, certain frequencies could be amplified, which would result in an unstable string filter response. Thus, the product of g with each fractional delay filter coefficient h_j is constrained to avoid this. Each h_j is already fixed by the fractional delay filter design, leaving g as the free parameter.
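A minimal end-to-end sketch of the bounded estimation is shown below (Python, with SciPy's `lsq_linear` standing in for the CVX solver used in the thesis; the delay, pole, gain, and tap values are all invented for the test). A synthetic tone is generated from known α0 and β_j values, and the parameters are then recovered by bounded least squares over the zero-input region:

```python
import numpy as np
from scipy.signal import lfilter
from scipy.optimize import lsq_linear

DI, N = 100, 3                                 # assumed bulk delay and FD-filter order
alpha0 = 0.03                                  # assumed loop-filter pole
beta = 0.95 * np.array([0.1, 0.4, 0.4, 0.1])   # beta_j = g * h_j, with g = 0.95

# String filter denominator: 1 - alpha0*z^-1 - sum_j beta_j*z^-(DI+j)
a = np.zeros(DI + N + 1)
a[0], a[1] = 1.0, -alpha0
a[DI:DI + N + 1] -= beta

p = np.zeros(4000)
p[:30] = np.hanning(30)                        # short excitation pulse
y = lfilter([1.0], a, p)                       # synthetic plucked tone

# Regress y(n) on y(n-1) and y(n-DI-j) over the zero-input region
n = np.arange(300, len(y))
A = np.column_stack([y[n - 1]] + [y[n - DI - j] for j in range(N + 1)])
res = lsq_linear(A, y[n], bounds=(0.001, 0.999))
print(res.x)                                   # ~ [alpha0, beta_0, ..., beta_3]
```

On this noiseless example the bounded solver recovers the generating parameters almost exactly; on real tones the same bounds keep the estimate stable and low-pass.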
In addition to the inequality constraints, equality constraints were placed on the minimization in Equation 5.10 to handle continuous excitation boundaries, as discussed in Section 4.4.2. The excitation boundaries were identified using the two-stage onset localization scheme from Section 5.1. While this approach yields 3 segments corresponding to the incident and reflected pulses, it was found that additional segments were needed to adequately model the complex contours of the excitation signal. To keep the modeling complexity manageable, only two equally spaced boundaries were inserted between the incident and reflected pulses, as shown in the top panel of Figure 5.6. Including the boundary after the first period of the signal, this yields a total of 5 boundaries requiring 6 segments
to be modeled. 5th-order polynomial functions were found to provide the best approximation of
each segment while maintaining feasibility in the optimization problem since increasing the order
also increases the number of unknown variables. Lower-order functions were unable to capture the details of the signal, while higher-order functions generally resulted in the solver failing to converge on a solution.
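The segment-fitting step can be sketched as follows (Python; a Hann pulse stands in for one period of the excitation, and the boundary locations are simply equally spaced here rather than detected by the onset scheme):

```python
import numpy as np

period = 132
exc = np.hanning(period)                        # stand-in excitation, one period
bounds = np.linspace(0, period, 7).astype(int)  # 5 interior boundaries -> 6 segments

recon = np.zeros(period)
for a, b in zip(bounds[:-1], bounds[1:]):
    t = np.arange(b - a)                        # local time axis for conditioning
    coeffs = np.polyfit(t, exc[a:b], 5)         # 5th-order polynomial per segment
    recon[a:b] = np.polyval(coeffs, t)

print(np.max(np.abs(recon - exc)))              # piecewise fit error
```

For smooth segments of this length, quintic polynomials track the contour closely while adding only six unknowns per segment to the optimization.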
5.2.3 Results
The source-filter estimation scheme was applied to a corpus of recorded performances of a guitarist
exciting each of the 6 strings using various fret positions. Multiple articulations were performed at
each position, which included using a finger or pick and altering the dynamics, or relative hardness,
of the excitation. Additional details about the data are provided in Section 6.3.
Figure 5.6 demonstrates the analysis and resynthesis for a tone produced by plucking the open,
1st string of the guitar. The top panel of Figure 5.6 shows the identification of the boundaries for
the excitation signal model within the first period of the recorded tone. The middle panel shows the
resynthesized tone and estimated excitation signal using the parameters obtained from the convex
optimization. The error computed between the synthetic and recorded tones is shown in the bottom
panel of Figure 5.6 along with the error computed between the estimated excitation signal and the
residual from inverse filtering. Areas of the error signals with significant amplitude can be attributed
to several factors. First, the approximation of the excitation may not capture all the high frequency
details present in the recorded signal. Second, the SDL model has fixed-frequency tuning whereas
the pitch of the recorded tone tends to fluctuate due to changing tension as the string vibrates,
which results in misalignment. Finally, the loop filter model assumes that the string’s partials
monotonically decay over time even though the decay characteristics of recorded tones are generally
more complex. This results in amplitude discrepancy between the analyzed and synthetic signals,
which contributes to the error as well.
Figure 5.7 shows that the source-filter estimation approach is capable of estimating the loop filter
pertaining to string articulations resulting from varying dynamics. Figures 5.7(a) and 5.7(b) show
the amplitude decay characteristics of analyzed and synthesized tones produced with a piano artic-
ulation, respectively. In this case, the synthetic tone demonstrates the gradual decay characteristics
of its analyzed counterpart. As the articulation dynamics are increased to mezzo-forte, the observed
decay is more rapid in both the analyzed and synthetic cases in Figures 5.7(c) and 5.7(d). Finally,
Figures 5.7(e) and 5.7(f) show a forte articulation defined by a very rapid decay. In all cases, the synthetic signals constructed from the estimated parameters convey the perceptual characteristics of their analyzed counterparts.
Figure 5.8 shows a similar plot of analyzed and resynthesized signals for various articulations, but focuses on tones produced on a lower-pitched, heavier-gauge string. In this case, the string's behavior deviates significantly from the SDL model since the amplitude decay rate fluctuates over time. This is
Figure 5.6: Analysis and resynthesis of the guitar's 1st string in the "open" position (E4, f0 = 329.63 Hz). Top: Original plucked-guitar tone, residual signal and estimated excitation boundaries. Middle: Resynthesized pluck and excitation using estimated source-filter parameters. Bottom: Modeling error.
[Figure: six amplitude-vs-time panels (0–5 s): (a) piano, analyzed; (b) piano, synthetic; (c) mezzo-forte, analyzed; (d) mezzo-forte, synthetic; (e) forte, analyzed; (f) forte, synthetic.]
Figure 5.7: Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 1st string at the 2nd fret position (F#4, f0 = 370 Hz) at piano, mezzo-forte and forte dynamics.
characteristic of tones that exhibit strong beating characteristics and tension modulation. Although
these behaviors are not captured using the joint estimation approach, the optimization routine
identifies loop filter parameters that provide the best overall approximation of the tone’s decay
characteristics.
[Figure: six amplitude-vs-time panels (0–5 s): (a) piano, analyzed; (b) piano, synthetic; (c) mezzo-forte, analyzed; (d) mezzo-forte, synthetic; (e) forte, analyzed; (f) forte, synthetic.]
Figure 5.8: Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 5th string at the 5th fret position (D3, f0 = 146.83 Hz) at piano, mezzo-forte and forte dynamics.
To assess the model "fit" for each signal in the data set, the signal-to-noise ratio (SNR) was computed as

SNR_dB = 10 log10 [ (1/L) Σ_{n=0}^{L−1} ( y(n) / (y(n) − ŷ(n)) )² ], (5.11)

where L is the length of the analyzed guitar tone y(n) and ŷ(n) is the re-synthesized tone using
the parameters from the joint estimation scheme. This metric provides an indication of the average amplitude distortion introduced by the modeling scheme for a particular signal; in the ideal case, no amplitude error distorts the signal.
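Equation 5.11 can be implemented directly. The sketch below (Python) uses a contrived pair of signals with a uniform 0.1% amplitude error, so every per-sample ratio is 1000 and the SNR is 60 dB; note that the per-sample form assumes y(n) − ŷ(n) is never exactly zero.

```python
import numpy as np

def snr_db(y, y_hat):
    # Equation 5.11: average of the squared per-sample signal-to-error ratios
    r = y / (y - y_hat)
    return 10.0 * np.log10(np.mean(r ** 2))

n = np.arange(1000)
y = np.sin(2 * np.pi * n / 64) + 2.0   # offset keeps y and the error nonzero
y_hat = 0.999 * y                      # uniform 0.1% amplitude error
print(snr_db(y, y_hat))                # ~ 60 dB
```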
Table 5.1 summarizes the mean and standard deviation of the SNR computed for particular articulations on certain strings. For example, the SNR values for all forte plucks produced with the guitarist's finger along the 1st string are computed, and the mean and standard deviation of these values is reported. No distinction is made for different fret positions along a string.
It should be noted that the mean SNR value for a particular dynamic (e.g. forte) corresponding to pick articulations is generally lower than for the same plucking dynamic produced with the guitarist's finger. This can be explained by the action of the plastic pick, which induces rapid frequency excursions in the partials of the string and other nonlinear behaviors such as tension modulation. These effects are prominent near the "attack" portion of the tone, and the associated string decay does not exhibit the monotonically decaying exponential characteristics used in the single delay-loop model. The linear time-invariant model cannot capture the complexities of the string vibration, and the estimated loop filter provides a "best fit" to match the overall decay characteristics. This leads to a greater amplitude discrepancy between the modeled and analyzed tones and thus a lower SNR value.
For the 3rd string, the SNR values are significantly lower for the pick articulations. A closer inspection revealed that many of these tones exhibited resonant effects from coupling with the guitar's body. This resonance introduces a "hump" in the tone's amplitude decay envelope after the initial attack. Since the string model does not consider the instrument's resonant body, this effect is not accounted for, which leads to increased amplitude error for the affected portions of the signal.
Informal listening tests confirm that the synthetic signals preserve many of the perceptually
important characteristics of the original tones, including the transient “attack” portion of the signal
related to the guitarist’s articulation.
Mean and Standard Deviation of Signal-to-Noise Ratio (dB)

                        Pick                                          Finger
String   piano          mezzo-forte    forte          piano          mezzo-forte    forte
1        50.27 ± 1.52   51.92 ± 1.73   52.03 ± 2.12   49.80 ± 2.53   52.70 ± 1.74   54.66 ± 1.51
2        50.23 ± 1.37   50.35 ± 1.19   53.58 ± 2.18   52.10 ± 3.29   55.34 ± 1.39   55.48 ± 1.34
3        48.30 ± 0.99   48.60 ± 1.29   48.85 ± 1.53   50.73 ± 3.86   55.62 ± 3.12   56.36 ± 2.37
4        51.19 ± 1.29   52.11 ± 0.85   51.78 ± 1.98   54.44 ± 2.37   57.06 ± 1.18   56.47 ± 1.30
5        49.80 ± 1.59   50.16 ± 1.80   49.12 ± 1.04   53.63 ± 1.79   56.38 ± 1.53   55.60 ± 1.03
6        51.09 ± 1.23   51.61 ± 1.65   51.98 ± 1.77   53.78 ± 1.84   53.88 ± 1.65   55.09 ± 1.25

Table 5.1: Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones based on an IIR loop filter.
5.3 Experiment 2
This section investigates the solution of the joint source-filter estimation scheme when a finite impulse
response (FIR) filter is used to implement the loop filter. The problem formulation, solution and
results are discussed as well.
5.3.1 Formulation
The Z-transform of a generalized length-N (order N − 1) FIR filter is given by

H(z) = Σ_{k=0}^{N−1} h_k z^−k, (5.12)

where each h_k is an impulse response coefficient of the filter. By using this filter structure for the string model's loop filter, the transfer function of S(z) becomes

S(z) = 1 / (1 − H_l(z) H_F(z) z^−D_I). (5.13)
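Once the excitation dies out, Equation 5.13 (with the fractional delay filter absorbed into an integer loop delay) reduces to the recursion y(n) = Σ_k h_k y(n − D_I − k) + p(n). A direct sketch of this recursion is shown below (Python; the loop delay and taps are invented, chosen symmetric with DC gain just below unity):

```python
import numpy as np

DI = 134                              # assumed integer loop delay
h = np.array([0.245, 0.49, 0.245])    # symmetric FIR loop filter, sum < 1
p = np.hanning(24)                    # idealized excitation pulse

y = np.zeros(30 * DI)
for n in range(len(y)):
    acc = p[n] if n < len(p) else 0.0
    for k, hk in enumerate(h):
        if n - DI - k >= 0:
            acc += hk * y[n - DI - k]
    y[n] = acc

# Each pass through the loop attenuates the recirculating waveform slightly
print(np.max(np.abs(y[DI:2 * DI])), np.max(np.abs(y[-DI:])))
```

Because the loop filter's DC gain is below unity, every recirculation of the pulse loses energy and the tone decays, as expected of a stable string model.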
For the plucked-string system defined by the transfer function of S(z), the output is computed entirely by a linear combination of past output samples once the transient-like excitation has reached a zero-input state. Estimating the filter coefficients through the error minimization technique discussed in Section 4.4.1 becomes complicated since the loop filter coefficients are convolved with the coefficients of the fractional delay filter H_F(z), which is also modeled as an FIR filter, and the contribution of the loop filter cannot be easily separated. In practice, this difficulty is averted by resampling the recorded signal y(n) to a frequency at which the loop delay can be expressed by an integer number of samples determined by the bulk delay term D_I, which allows H_F(z) to be dropped. Though this has the effect of adjusting the fundamental frequency of the signal to f_o = f_s / D_I, the fractional delay filter can be re-introduced during synthesis to correct the pitch.
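The resampling step can be sketched with `scipy.signal.resample` (Python; a pure sine stands in for a recording, and the 0.1 s length is chosen so the sine is exactly periodic in the analysis window):

```python
import numpy as np
from scipy.signal import resample

fs, f0 = 44100.0, 330.0
D = fs / f0                        # ~133.64 samples: not an integer
DI = int(round(D))                 # desired integer loop delay (134)
fs_new = f0 * DI                   # 44220 Hz: one period is exactly DI samples

t = np.arange(int(fs * 0.1)) / fs
y = np.sin(2 * np.pi * f0 * t)     # stand-in for a recorded tone
y_rs = resample(y, int(round(len(y) * fs_new / fs)))

# After resampling, the waveform repeats every DI samples
print(np.max(np.abs(y_rs[1000:2000] - y_rs[1000 + DI:2000 + DI])))
```

At the new rate, the bulk delay z^−D_I alone realizes the loop delay, so H_F(z) can be dropped from the estimation and re-introduced at synthesis time.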
After the resampling operation, the Z-transform of the error minimization becomes

E(z) = P_b(z) − Y(z) S^−1(z)
     = P_b(z) − Y(z)(1 − (h_0 + h_1 z^−1 + · · · + h_{N−1} z^−(N−1)) z^−D_I). (5.14)

Expanding terms and taking the inverse Z-transform of Equation 5.14 yields the time-domain formulation of the error minimization

e(n) = p_b(n) + h_0 y(n − D_I) + h_1 y(n − D_I − 1) + · · · + h_{N−1} y(n − D_I − N + 1) − y(n), (5.15)

where the loop filter coefficients h_k can be estimated with the convex optimization approach.
5.3.2 Problem Solution
Before solving for the source and filter parameters, several constraints are imposed on the FIR loop filter. Foremost, the loop filter is required to have a low-pass characteristic to avoid amplifying high-frequency partials. This is consistent with the assumed operation of the loop filter in relation to the behavior of plucked-guitar tones described in Section 3.3.3, where, in general, high-frequency partials are perceived as decaying faster than lower-frequency partials. The next constraint is that the loop filter exhibit a linear phase response to avoid introducing excessive phase distortion into the frequency response of the string filter S(z). Linear-phase filters also have the convenient property of constant group delay, so they do not drastically de-tune S(z) when the signal is resynthesized.
The low-pass constraints on the FIR filter can be formulated by constraining the magnitude response of the filter at DC and Nyquist. At DC (ω = 0), the filter gain is required to be at most 1, which yields the following inequality constraint on the filter coefficients

|H(e^jω)|_{ω=0} ≤ 1
h_0 + h_1 + h_2 + · · · + h_{N−1} ≤ 1. (5.16)
At the Nyquist frequency (ω = π), we require the filter to have zero magnitude response. This is expressed as an equality constraint on the filter coefficients

|H(e^jω)|_{ω=π} = 0
h_0 − h_1 + h_2 − · · · + (−1)^{N−1} h_{N−1} = 0. (5.17)
The linear phase constraint requires that the filter coefficients are symmetric. This imposes a final set of equality constraints on the coefficients

h_k = h_{N−1−k} for k = 0, . . . , N − 1. (5.18)
The process of identifying the boundaries for the segments of the excitation signal is identical
to the procedure described in Section 5.2.2 and 5th-order polynomials are also used for segment
fitting. Equation 5.19 summarizes the constrained minimization problem after taking the L2-norm
of Equation 5.15 and imposing the constraints from Equations 5.16-5.18 in addition to the constraints
placed on the input signal as specified in Section 4.4.2.
min_x ‖Hx − y‖_2 (5.19)
subject to Σ_{k=0}^{N−1} h_k ≤ 1
Σ_{k=0}^{N−1} (−1)^k h_k = 0
h_k = h_{N−1−k} for k = 0, . . . , N − 1
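A self-contained sketch of the constrained fit is shown below (Python, using `scipy.optimize.minimize` with SLSQP in place of the CVX solver used in the thesis; the delay, tap values, and pulse shape are invented for the test). A synthetic tone is generated from a known symmetric loop filter that satisfies all three constraints, and the taps are then recovered:

```python
import numpy as np
from scipy.signal import lfilter
from scipy.optimize import minimize

DI = 100
h_true = np.array([0.245, 0.49, 0.245])   # symmetric, sum <= 1, zero gain at Nyquist

a = np.zeros(DI + len(h_true))
a[0] = 1.0
a[DI:DI + len(h_true)] -= h_true
p = np.zeros(4000)
p[:30] = np.hanning(30)
y = lfilter([1.0], a, p)                  # synthetic tone from S(z) (Eq. 5.13)

n = np.arange(300, len(y))
A = np.column_stack([y[n - DI - k] for k in range(3)])
b = y[n]

cons = [
    {'type': 'ineq', 'fun': lambda x: 1.0 - np.sum(x)},   # DC gain <= 1
    {'type': 'eq', 'fun': lambda x: x[0] - x[1] + x[2]},  # zero response at Nyquist
    {'type': 'eq', 'fun': lambda x: x[0] - x[2]},         # linear phase (symmetry)
]
res = minimize(lambda x: np.sum((A @ x - b) ** 2),
               x0=np.full(3, 0.3), method='SLSQP',
               jac=lambda x: 2 * A.T @ (A @ x - b), constraints=cons)
print(res.x)
```

The symmetry and Nyquist constraints leave effectively one free tap value, which the quadratic objective pins down from the tone's decay.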
Mean and Standard Deviation of Signal-to-Noise Ratio (dB)

                        Pick                                          Finger
String   piano          mezzo-forte    forte          piano          mezzo-forte    forte
1        50.81 ± 1.61   51.94 ± 1.68   52.03 ± 1.85   49.51 ± 2.77   52.88 ± 1.83   54.77 ± 1.66
2        50.76 ± 1.19   50.68 ± 1.13   52.64 ± 1.93   52.26 ± 3.33   56.03 ± 1.32   55.69 ± 1.29
3        48.78 ± 0.97   48.70 ± 1.20   49.65 ± 1.44   50.89 ± 3.91   56.21 ± 3.48   56.30 ± 2.68
4        51.60 ± 1.05   52.18 ± 0.66   52.32 ± 1.72   54.45 ± 2.16   57.28 ± 2.16   56.45 ± 1.23
5        49.68 ± 1.65   50.10 ± 1.66   49.78 ± 1.92   53.76 ± 2.07   56.48 ± 1.58   55.28 ± 1.05
6        51.30 ± 1.43   51.73 ± 1.51   52.12 ± 1.86   53.92 ± 1.95   54.03 ± 1.84   55.23 ± 1.75

Table 5.2: Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones using a FIR loop filter with length N = 3.
5.3.3 Results
The source-filter estimation scheme using the FIR loop filter was applied to the same corpus of signals
used in Experiment 1 and the MATLAB CVX package was again used to solve the minimization
from Equation 5.19. Table 5.2 summarizes the mean and standard deviation of the SNR computed
in the same manner as Experiment 1 using Equation 5.11. These values were computed based on
re-synthesizing the plucked-guitar tones using a FIR loop filter with length N = 3.
The values reported in Table 5.2 from this experiment are on par with the values obtained in Experiment 1. That is, the FIR modeling approach exhibits roughly the same average SNR values and trends for different articulations and strings. However, by comparing the synthetic tones produced by the methods of Experiments 1 and 2, we noted that the FIR filter does not always adequately match the decay rates of the high-frequency partials. This yielded synthetic tones that sounded "buzzy" since the high-frequency partials were not decaying fast enough.
We attempted to improve the perceptual qualities of the synthetic tones to better match their
analyzed counterparts by increasing the length of the FIR loop filter. However, using filters with
length N > 3 often resulted in the overall response of the single delay-loop model becoming unstable. Though the FIR loop filter is inherently stable by design and constraints were placed on the filter at the DC and Nyquist frequencies, the FIR loop filter may still exhibit gains exceeding unity at mid-range frequencies. Since this filter is located in the feedback loop of the single delay-loop model, the overall response is unstable whenever the excitation signal has energy at those mid-range frequencies.
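This failure mode is easy to reproduce: a symmetric filter can satisfy both the DC and Nyquist constraints yet still exceed unity gain in between. The sketch below (Python; an invented length-5 symmetric tap set) evaluates the magnitude response on a dense grid, a practical stability screen before the filter is placed in the feedback loop:

```python
import numpy as np
from scipy.signal import freqz

# Symmetric (linear phase); DC gain = 0.96 <= 1; response at Nyquist = 0
h = np.array([-0.25, 0.24, 0.98, 0.24, -0.25])
assert abs(h.sum()) <= 1.0 and abs(h @ (-1.0) ** np.arange(5)) < 1e-12

w, H = freqz(h, worN=4096)
peak = np.max(np.abs(H))
print(peak)   # > 1: the SDL feedback loop would be unstable at mid frequencies
```

Adding an explicit bound on |H(e^jω)| over a grid of mid-band frequencies would rule such solutions out, at the cost of many more inequality constraints in the optimization.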
5.4 Discussion
This chapter presented the implementation details for the joint source-filter estimation scheme proposed in Chapter 4. This included a two-stage onset detection based on a spectral flux computation to estimate the pitch of the plucked tone and identify the location of the incident pulses used to estimate the source signal. The system was implemented using two different loop filter structures, which characterize the frequency-dependent decay characteristics of the guitar tones.
The first implementation utilized a one-pole IIR filter to model the string's decay response. The formulation of the joint estimation scheme using this filter required an all-pole approximation of the single delay-loop transfer function. By applying the estimation scheme using this formulation, it was shown that the modeling scheme was capable of capturing the source signals and string decay responses characteristic of the articulations in the data set. The articulations produced with the guitarist's pick led to more complex string responses, and the source-filter estimation method extracts filter parameters that best approximate these characteristics. Modeling error is attributed to the accuracy of the estimated source signal, which may omit some noise-like characteristics, and to the non-ideal decay of real strings, which is generally not monotonic as assumed by the model.
The second implementation utilized an FIR loop filter model, which inherently leads to an all-pole transfer function for the single delay-loop model and is thus more flexible in terms of adding additional taps to improve the fit. Though a low-order (length N = 3) FIR filter performed similarly to the IIR case in terms of SNR, it did not adequately taper off the high-frequency content of the tones. Increasing the order of this filter led to unstable single delay-loop transfer functions due to the loop filter gain occasionally exceeding unity. Thus, the IIR loop filter proved more robust in terms of stability and provided a better match to the string's decay characteristics for high-frequency partials.
CHAPTER 6: EXCITATION MODELING
6.1 Overview
In Chapter 3, physically inspired models of the guitar were discussed, including the popular waveguide synthesis and the related source-filter models. In particular, the source-filter approximation is attractive for analysis and synthesis tasks because these models provide a clear analog to the physical phenomena involved in exciting a guitar string: an impulsive force from the performer excites the resonant behavior of the string. In Section 4.3, it was shown that analysis via the source-filter approximation can be used to recover excitation signals corresponding to particular string articulations, thereby providing a measure of the performer's expression. In Section 4.4, a technique was proposed to jointly estimate the excitation signal along with the filter model using a piecewise polynomial approximation of the excitation signal, which contains a bias from the performer's relative plucking point position along the string.
Including the method proposed in Section 4.4.1, many techniques are available for estimating and calibrating the resonant filter properties of the source-filter model [29, 36, 86], but less research has been invested in the analysis of the excitation signals, which are responsible for reproducing the unique timbres associated with the performer's articulation. This is a complex problem, since there is a nearly infinite number of ways to pluck a string, each of which will yield a unique excitation (under the source-filter model) even when the tones have a similar timbre. In particular, it is desirable to have methods by which particular articulations can be quantified from analysis of the associated excitation signal. For applications, it would also be desirable to manipulate a parametric representation for arbitrary plucked-string synthesis.
In this chapter, a components analysis approach is applied to a corpus of excitation signals derived
from recordings of plucked-guitar tones in order to obtain a quantitative representation to model the
unique characteristics of guitar articulations. In particular, principal components analysis (PCA)
is employed for this task to exploit common features of excitation signals while modeling the finer
details using the appropriate principal components. This approach can be viewed as developing a
codebook, where the entries are principal component vectors that describe the unique characteristics
of the excitation signals. Additionally, these components are used as features for visualization of
particular articulations and dimensionality reduction. Nonlinear PCA is employed to yield a two-way
mapping that isolates specific performance attributes which can be used for synthesizing excitation
signals.
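The codebook idea can be sketched with a plain SVD-based PCA (Python; the "excitations" here are synthetic pulses of varying width and amplitude rather than real data, so the corpus has low intrinsic dimensionality by construction):

```python
import numpy as np

# Toy corpus: 40 pulse "excitations" drawn from 8 widths and varying amplitudes
n_sig, length = 40, 128
X = np.zeros((n_sig, length))
for i in range(n_sig):
    w = 10 + i % 8
    X[i, :2 * w] = (0.5 + i / n_sig) * np.hanning(2 * w)

mu = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)

k = 8                                # number of codebook entries kept
codes = (X - mu) @ Vt[:k].T          # per-signal weights over the codebook
X_hat = codes @ Vt[:k] + mu          # reconstruction from k principal components
print((s[:k] ** 2).sum() / (s ** 2).sum())   # fraction of variance captured
```

Each row of Vt[:k] plays the role of a codebook entry; a signal is then represented by its k projection weights rather than its full waveform.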
This research has several applications, including modeling guitar performance directly from recordings in order to capture expressive and perceptual characteristics of a performer's playing style. Additionally, the codebook entries obtained in this chapter can be applied to musical interfaces for control and synthesis of expressive guitar tones.
6.2 Previous Work on Guitar Source Signal Modeling
Existing excitation modeling techniques are based on either the digital waveguide or the related source-filter models. While both are discussed at length in Chapter 3, the source-filter model and its components are briefly overviewed here to re-introduce notation pertinent to the remainder of the chapter.
Figure 6.1 shows the model obtained when the bi-directional waveguide model is reduced to a source-filter approximation. The lower block, S(z), of Figure 6.1 is referred to as the single delay-loop (SDL) and consolidates the DWG model into a single delay line z^−D_I in cascade with a string decay filter H_l(z) and a fractional delay filter H_F(z). These filters are calibrated such that the total delay D in the SDL satisfies D = f_s / f_0, where f_s and f_0 are the sampling frequency and fundamental frequency, respectively. H_l(z) is designed using the techniques discussed in Section
3.3.5 [29, 36, 86] while the fractional delay filter can be designed using a number of techniques
discussed in Appendix A. The upper block, C(z), of Figure 6.1 is a feedforward comb filter that incorporates the effect of the performer's plucking point position along the string. Since the SDL lacks the bi-directional characteristics of the DWG, C(z) simulates the boundary conditions when a traveling wave encounters a rigid termination. Absent from Figure 6.1 is an additional comb filter modeling the pickup position where the string output is observed. While this affects the resulting excitation signals when commuted synthesis is used for recovery, it is omitted here since the data used for evaluations is collected using a constant pickup position.
While the SDL is essentially a source-filter approximation of the physical system for a plucked-
string, there are several benefits associated with modeling tones in this manner. For example,
modifying the source signal permits arbitrary synthesis of unique tones even for the same filter
[Figure: block diagram — the excitation p(n) passes through the feed-forward comb filter C(z) (a direct path summed with a negated path delayed by z^−λD) and into the single delay-loop S(z), whose feedback path cascades H_l(z), H_F(z) and z^−D_I to produce the output y(n).]

Figure 6.1: Source-filter model for plucked-guitar synthesis. C(z) is the feed-forward comb filter simulating the effect of the player's plucking position. S(z) models the string's pitch and decay characteristics.
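A minimal rendering of Figure 6.1 in code is sketched below (Python; the one-pole loop filter values, pulse shape, and plucking point λ are all assumed, and the fractional delay filter is omitted by using an integer loop delay):

```python
import numpy as np

DI = 134                      # integer loop delay (~329 Hz at fs = 44.1 kHz)
lam = 0.2                     # relative plucking point (free parameter)
a0, g = 0.05, 0.85            # assumed loop-filter pole and gain

# C(z) = 1 - z^{-lambda*D}: comb filter from the plucking position
p = np.hanning(24)            # idealized excitation pulse
x = np.zeros(40 * DI)
x[:len(p)] += p
dc = int(round(lam * DI))
x[dc:dc + len(p)] -= p

# S(z): y(n) = x(n) + a0*y(n-1) + g*y(n-DI)
y = np.zeros_like(x)
for n in range(len(y)):
    y[n] = x[n]
    if n >= 1:
        y[n] += a0 * y[n - 1]
    if n >= DI:
        y[n] += g * y[n - DI]
print(np.max(np.abs(y[:2 * DI])), np.max(np.abs(y[-DI:])))
```

The comb delay λD places spectral notches at every 1/λ-th harmonic, which is the audible signature of the plucking position.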
model. Also, for analysis tasks it is desirable to model the perceptual characteristics of tones from
a recorded performance by recovering the source signal using linear filtering operations (see Section
3.3.4 on Commuted Synthesis), which is possible with a source-filter model.
There are several approaches used in the literature for determining the excitation signal for the
source-filter model of a plucked guitar. One possible source signal is filtered white noise, which simulates the transient, noise-like characteristics of a plucked string [31]. A well-known technique
involves inverse filtering a recorded guitar tone with a properly calibrated string-model [29, 36].
When inverse filtering is used, the string model cancels out the tone’s harmonic components leaving
behind a residual that contains the excitation in the first few milliseconds. In [39], these residuals
are processed with “pluck-shaping” filters to simulate the performer’s articulation dynamics. For
improved reproduction of acoustic guitar tones, this approach is extended by decomposing the tone into its deterministic and stochastic components, separately inverse filtering each signal, and adding the residuals to equalize the spectrum of the resulting excitation [90]. Other methods utilize non-linear processing
to spectrally flatten the recorded tone and use the resulting signal as the source, since it preserves
the signal’s phase information [38, 41]. Lindroos et al. consider the excitation signal to consist of
three parts, which include the picking noise, the first impulse detected by the pickup and a second,
reflected pulse also detected by the pickup at some later time [44]. The picking noise is modeled
with low-pass filtered white noise and the first pulse is modeled with an integrating filter.
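Inverse filtering is just the FIR counterpart of the synthesis recursion. In the sketch below (Python; the loop parameters are assumed and the fractional delay is omitted), a tone is synthesized through an all-pole S(z) and then filtered by 1/S(z), recovering the excitation exactly:

```python
import numpy as np
from scipy.signal import lfilter

DI, a0, g = 134, 0.03, 0.95          # assumed string-model parameters
a = np.zeros(DI + 1)
a[0], a[1], a[DI] = 1.0, -a0, -g     # S(z) = 1 / A(z)

p = np.zeros(2000)
p[:24] = np.hanning(24)              # known excitation
y = lfilter([1.0], a, p)             # "recorded" tone synthesized through S(z)
e = lfilter(a, [1.0], y)             # inverse filtering: E(z) = Y(z) A(z)
print(np.max(np.abs(e - p)))         # residual matches the excitation
```

On real recordings the cancellation is imperfect, so the residual carries both the excitation and the string model's estimation error, as discussed in Section 5.2.3.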
Despite the range of modeling techniques described above, these methods do not generalize to describing a multitude of string articulations. For example, Laurson's approach involves storing the residual signals obtained from inverse filtering recorded plucks, along with filters that shape a reference residual signal into another residual with a particular dynamic level (e.g. piano, forte) [39]. While this approach is capable of "morphing" one residual into another, the relationship between the pluck-shaping filters and the physical effects of modifying plucking dynamics is somewhat arbitrary. Additionally, this method does not remove the bias of the guitarist's plucking point location, which is undesirable since the plucking point should be a free parameter for arbitrary resynthesis. On the other hand, Lee's approach handles this problem by "whitening" the spectrum of the recorded tone to remove spectral bias. However, this requires preserving the phase information, resulting in a signal equal in duration to the recorded tone, which is not a compact representation of the signal.
6.3 Data Collection Overview
It is understood by guitarists that exactly reproducing a particular articulation on a guitar string is extremely difficult, if not impossible, due to the many degrees of freedom available when exciting the string. These degrees of freedom during the articulation comprise parts of the guitarist's expressive palette, including:
• Plucking device (e.g. pick, finger, nail)
• Plucking location along the string
• Dynamics (i.e. the relative “hardness” or “softness” during the articulation)
These techniques have a direct impact on the initial shape of the string, yielding perceptually unique timbres, especially during the "attack" phase of the tone. It is important to note that, unlike the waveguide model presented in Chapter 3, the SDL does not allow the initial waveshape to be specified via wave variables (e.g. displacement, acceleration). Instead, signal processing techniques must be used to derive the excitation signals through analysis of recorded tones, and it is initially unclear how to parameterize the effects of the plucking device and dynamics once the signals are recovered. Additionally, a significant amount of data is needed to analyze the effects of these expressive parameters on the resulting excitation signals.
This section details the approach and apparatus used to collect plucked-guitar recordings containing the expressive attributes listed above. The recovery of the excitation signals from the data will be explained in Section 6.4.
6.3.1 Approach
The plucked-guitar signals under analysis were produced using an Epiphone Les Paul Standard guitar equipped with a Fishman Powerbridge pickup. A diagram of the Powerbridge pickup is shown in Figure 6.2; it features a piezoelectric sensor mounted on each string's saddle on the bridge [15]. Unlike the magnetic pickups traditionally used for electric guitars, the piezoelectric pickup responds to pressure changes due to the string's vibration at the bridge. For the application of excitation modeling, the piezoelectric pickup has several benefits over magnetic pickups, including the measurement of a relatively "dry" signal that does not include significant resonant effects arising from the instrument's body. Also, magnetic pickups tend to introduce a low-pass filtering effect on the spectra of plucked tones, whereas the piezo pickups capture a much wider frequency range, which is useful for modeling the noise-like interaction between the performer's articulation and the string. Finally, recordings produced with the bridge-mounted piezo pickup can be used to isolate the plucking point location for equalization, which will be explained in Section 6.4.2, since the pickup location is constant at the bridge.
[Figure: front view of the bridge, labeling the saddle, the piezo crystals mounted on each saddle and the saddle position screw.]

Figure 6.2: Front orthographic projection of the bridge-mounted piezoelectric pickup used to record plucked tones. A piezoelectric crystal is mounted on each saddle, which measures pressure during vibration. Guitar diagram obtained from www.dragoart.com.
The guitar was strung with a set of D’Addario “10-gauge” nickel-wound strings. The gauge
reflects the diameter of the first (highest) string, which is 0.010 inches, while the last (lowest) string
has a 0.046 inch diameter. As is common with electric guitar strings, the lowest three strings (4-6)
feature a wound construction while the highest three (1-3) are unwound. Recordings were made using
either the fleshy part of the guitarist’s finger or a Dunlop Jazz III pick.
The data set of plucked-recordings was produced by varying the articulation across the fretboard
of the guitar using either the guitarist’s finger or the pick. For each fret, the guitarist produces a
specific articulation five consecutive times for consistency using the pick and their finger. The artic-
ulations were identified by their dynamic level and consisted of piano (soft), mezzo-forte (medium-
loud) and forte (loud). The performer’s relative plucking point position along the string was not
specified and remained a free parameter during the recordings. The articulations were produced on
each of the guitar’s six strings using the “open” string position as well as the first five frets, which
yielded approximately 1000 plucked-guitar recordings.
The output of the guitar’s bridge pick-up was fed directly to an M-Audio Fast Track Pro USB
interface, which recorded the audio directly to a Macintosh computer. Audacity, an open-source
sound recording and editing tool, was used to record the samples at a sampling rate of 44.1 kHz with
16-bit depth [49].
Due to the difference in construction between the lower and higher strings on the guitar, the
recordings were analyzed in two separate groups reflecting the wound and unwound strings. In
terms of the acquisition system, this affects how the signals are resampled in Figure 6.3. For the
unwound strings, the signals were resampled to 196 Hz, which corresponds to the tuning of the
open 3rd string, the lowest pitch possible on the unwound set. Similarly, the wound strings
were resampled to 82.4 Hz, which is the pitch of the open 6th string and the lowest note possible in
the wound set.
6.4 Excitation Signal Recovery
Before modeling the articulations from recordings of plucked-guitar tones, a few pre-processing
tasks must be addressed: 1) estimating the residual signal from the plucked-guitar recordings and
2) removing the bias associated with the guitarist’s plucking point position. As discussed in Section
6.2, a limitation of existing excitation modeling methods is that they do not explicitly handle this
bias. The system overviewed in Figure 6.3 addresses these tasks and its various sub-blocks are
explained in this section.
6.4.1 Pitch Estimation and Resampling
The initial step of the excitation recovery scheme involves estimating the pitch of the plucked guitar
tone. This is achieved by using the well-known autocorrelation method, which estimates the pitch
over the first 2-3 periods of the signal by searching for the lag corresponding to the maximum
of the autocorrelation function (see Section 5.1.2) [61]. The fundamental frequency is computed
as f0 = fs / τmax, where fs is the sampling frequency and τmax is the lag at the maximum of the
autocorrelation function.
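As a rough illustration of this step, the autocorrelation-based pitch estimate can be sketched in a few lines of pure Python. This is a minimal sketch, not the thesis's implementation; the function name, lag search range, and synthetic test signal are all illustrative.

```python
import math

def estimate_f0(x, fs, lag_min, lag_max):
    """Estimate f0 = fs / tau_max, where tau_max is the lag that
    maximizes the autocorrelation function over the search range."""
    def acf(tau):
        return sum(x[n] * x[n + tau] for n in range(len(x) - tau))
    tau_max = max(range(lag_min, lag_max), key=acf)
    return fs / tau_max

# Illustrative check: a 100 Hz sinusoid sampled at 1 kHz.
fs = 1000.0
x = [math.sin(2 * math.pi * 100.0 * n / fs) for n in range(300)]
f0 = estimate_f0(x, fs, lag_min=5, lag_max=50)  # near 100 Hz
```

In practice the search range would be chosen from the playable pitch range of the string, so that only plausible lags are considered.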
Since the plucked-guitar tones under analysis have varying fundamental frequencies, a resampling
operation is required to compensate for differences in the pulse width when the residual is recovered.
This is a required pre-processing step before principal components analysis, since the goal is to model
differences in articulation that are not related to pitch. Otherwise, the extracted basis vectors will
not reflect the differences in articulation, but rather the differences between the fundamental periods
of the analyzed tones.
The resampling operation on the plucked tone is defined as

y′(n) = ↑λ { y(n) }  (6.1)

where λ = Tref / T0 is the resampling factor, with Tref and T0 the periods, in samples, of the
reference frequency and the estimated pitch frequency of the plucked tone, respectively.
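A sketch of this normalization step, assuming simple linear interpolation (the thesis does not specify the interpolation method, and the function name and test values here are illustrative):

```python
def resample_by_factor(x, lam):
    """Resample x by factor lam = Tref / T0 using linear interpolation,
    so a pulse spanning T0 samples maps to roughly Tref samples."""
    n_out = int(len(x) * lam)
    y = []
    for m in range(n_out):
        pos = m / lam              # fractional index into the original signal
        i = int(pos)
        frac = pos - i
        if i + 1 < len(x):
            y.append((1.0 - frac) * x[i] + frac * x[i + 1])
        else:
            y.append(x[-1])
    return y

# A tone with period T0 = 2 stretched to the reference period Tref = 4.
y = resample_by_factor([0.0, 1.0, 0.0, 1.0, 0.0, 1.0], lam=2.0)
```

Resampling toward the longest reference period (lowest pitch in the group) stretches shorter-period tones rather than truncating them, consistent with the normalization described later in Section 6.5.4.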
6.4.2 Residual Extraction
There are several methods of extracting the residual from the recorded tone. The most generalized
approach was discussed in Section 4.3 and involves inverse-filtering the recorded tone by the calibrated
string model presented in Section 6.2 to yield the residual excitation pb(n). The approach
proposed in Chapter 4 outlines an alternate method to jointly estimate the excitation and filter
parameters for a plucked guitar tone. It should be noted that the subscript b on pb(n) indicates
that the residual contains a “plucking point bias”, which will eventually be removed.

Figure 6.3: Diagram outlining the residual equalization process for excitation signals. [Blocks: y(n) → Pitch Estimation (f0) → Residual Extraction via inverse filtering or joint estimation → pb(n) → Residual Equalization → p(n), with Plucking Point Estimation supplying drpp.]

Figure 6.4: “Comb filter” effect resulting from plucking a guitar string (open E, f0 = 331 Hz) 8.4 cm from the bridge. (a) Residual obtained from single delay-loop model. (b) Residual spectrum. Using Equation 6.2, the notch frequencies are approximately located at multiples of 382 Hz.
6.4.3 Spectral Bias from Plucking Point Location
The “Plucking Point Estimation” block in Figure 6.3 is concerned with determining the position
where the guitarist has displaced the string. It is well understood in literature regarding string
physics and digital waveguide modeling that the plucking point position imparts a comb-filter effect
on the spectrum of the vibrating string [17, 30, 64]. This occurs because the harmonics that have a
node at the plucking position are not excited and, in the ideal case, have zero amplitude.
Figure 6.4 shows the residual and its spectrum obtained from plucking an open E string (f0 =
331 Hz) approximately 8.4 cm from the bridge of an electric guitar. From 6.4(a), the first spike in
the residual results from the impulse produced by the string’s initial displacement arriving at the
bridge pickup. The subsequent spike also results from the initial string displacement, but has an
inverted amplitude due to traveling in the opposite direction along the string and reflecting at the
guitar’s nut. A detailed description of this behavior is provided in Figure 4.2 in Section 4.3.2. Unlike
a pure impulse which has a flat frequency response, the residual spectrum in 6.4(b) contains deep
notches spaced at near-regular frequency intervals. By denoting the relative plucking position along
the string as drpp = l / Ls, where l is the distance from the bridge and Ls is the length of the string,
the notch frequencies can be calculated by

fnotch,n = n f0 / (1 − drpp), for n = 0, 1, 2, . . .  (6.2)
The comb filter bias creates a challenge for parameterizing the excitation signals since the gui-
tarist’s relative plucking position constantly varies depending on the position of their strumming
hand and their fretting hand. Even when the guitarist maintains the same plucking distance from
the bridge, changing the fretting position along the neck manipulates the relative plucking position
by elongating or shortening the effective length of the string. While guitarists vary the relative
plucking point location, either consciously or subconsciously, during performance, modeling the ex-
citation signal requires estimation of the plucking point position and equalization to remove its
spectral bias. Ideally, it is desirable to recover the pure impulsive signal imparted by the guitarist
when striking the string, as shown in Figure 6.9, in order to quantify expressive techniques, such as
plucking mechanism and dynamics. Such analysis requires estimating the plucking point location
from recordings and equalizing the residuals to remove the bias.
6.4.4 Estimating the Plucking Point Location
Previous techniques in the literature for estimating the plucking point location from guitar recordings
have focused on spectral or time-domain analysis techniques.
Traube proposed a method of estimating the plucking point location by comparing a sampled-
data magnitude spectrum obtained from a recording to synthetic magnitude spectra generated with
different plucking point locations [83, 84]. The plucking point location for a particular recording
was determined by finding the synthetic string spectra with a plucking position that minimizes the
magnitude error between the measured and ideal spectra.
Later, Traube introduced a plucking-point estimation method based on iterative optimization
and the so-called log-correlation, which is computed from recordings of plucked tones [81, 82]. The
log-correlation is computed by taking the log of the squared Fourier coefficients for the harmonically-related
partials in a plucked-guitar spectrum and applying the inverse Fourier transform using these
coefficients. The log-correlation function yields an initial estimate for the relative plucking position,
drpp = τmin / τ0, where τmin and τ0 are the lags indicating the minimum and maximum of the
log-correlation function, respectively. The estimate of drpp is used to initialize an iterative optimization scheme,
which minimizes the difference between ideal and measured spectra, in order to refine drpp and
improve accuracy.
Penttinen et al. exploited time domain-based analysis techniques to estimate the plucking po-
sition [58, 59]. Using an under-saddle bridge pickup, Penttinen’s technique is based on identifying
the impulses associated with the string’s initial displacement as they arrive at the bridge pickup.
Since the initial string displacement produces two impulses traveling in opposite directions, the arrival
time between each impulse at the bridge, Δt, provides an indication of the guitarist’s relative
plucking position along the string.
Figure 6.5 shows the output of a bridge-mounted piezo-electric pickup for a plucked-guitar tone.
By determining the onsets when each pulse arrives at the bridge pickup, Penttinen shows that the
relative plucking position can be determined by
drpp = (fs − ΔT f0) / fs,  (6.3)

where ΔT = fs Δt indicates the number of samples between the arrival of each impulse at the bridge
pickup [58, 59]. As drpp is in the range of (0, 1), the actual distance from the bridge is obtained
by multiplying drpp by the length of the string. Penttinen utilizes a two-stage onset detection to
determine ΔT, where the first stage isolates the onset of the plucked tone and the second stage uses
the estimated pitch of the tone to extract one period of the waveform. The autocorrelation of the
extracted period is used to determine ΔT, since the minimum of the autocorrelation function occurs
at the lag where the signal’s impulses are out of phase. Figure 6.6(a) shows one cycle extracted from
the waveform in Figure 6.5 and the corresponding autocorrelation of that signal in Figure 6.6(b).
Δt is identified by searching for the index corresponding to the minimum of the autocorrelation
function.
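A minimal sketch of this stage of Penttinen's method, assuming one period has already been isolated by the onset detection (the synthetic two-impulse period and the function name below are illustrative):

```python
def plucking_position(period, fs, f0):
    """Estimate d_rpp = (fs - dT * f0) / fs (Equation 6.3), where dT is
    the lag of the autocorrelation minimum over one extracted period."""
    n = len(period)
    def acf(tau):
        return sum(period[k] * period[k + tau] for k in range(n - tau))
    d_t = min(range(1, n // 2 + 1), key=acf)  # lag where the pulses are out of phase
    return (fs - d_t * f0) / fs

# One synthetic period (D = fs/f0 = 50 samples) with an incident pulse
# and an inverted reflection arriving 20 samples later.
period = [0.0] * 50
period[5], period[25] = 1.0, -1.0
d_rpp = plucking_position(period, fs=1000.0, f0=20.0)
```

With ΔT = 20 samples and a 50-sample loop delay, the estimate is drpp = 1 − 20/50 = 0.6 of the string length from the bridge.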
There are several strengths and weaknesses associated with the methods proposed by Traube and
Penttinen. Traube’s approach is generalizable to acoustic guitar tones recorded using an external
microphone. However, a relatively large time window on the order of 100 milliseconds is required to
achieve the frequency resolution required to resolve the string’s harmonically related partials and,
thus, compute the autocorrelation function. By including multiple periods of string vibration in the
analysis, the effect of the plucking position can become obscured since non-linear coupling of the
string’s harmonics can regenerate the missing harmonics [16]. By isolating just one period of the
waveform near the onset, Penttinen’s technique avoids this physical consequence since the analyzed
Figure 6.5: Plucked-guitar tone measured using a piezo-electric bridge pickup. Vertical dashed lines indicate the impulses arriving at the bridge pickup. Δt indicates the arrival time between impulses.
Figure 6.6: (a) One period extracted from the plucked-guitar tone in Figure 6.5. (b) Autocorrelation of the extracted period. The minimum is marked and denotes the time lag, Δt, between arriving pulses at the bridge pickup.
segment results from the string’s initial displacement. However, Penttinen’s approach requires the
guitar to be equipped with the bridge-mounted pickup to isolate the arrival time of the impulses in
the first period of vibration. Also, isolating the first period of vibration is difficult and success is
dependent on the parameters used in the two-stage onset detection.
Handling the effect of a string pickup location at a position other than the bridge is not explicitly
addressed by either method. Similar to the spectral bias resulting from the plucking point location, the
pickup location also adds a spectral bias, since vibrating modes of the string with a node at the pickup
location will not be measured. Traube’s methods are developed for the acoustic guitar recorded
with a microphone some distance from the instrument’s sound hole. In this case, the “pickup” is the
radiated acoustic energy from all positions along the string and thus shows no particular spectral
bias. For electric guitars, if a bridge-mounted pickup is not available, determining the plucking
location is particularly difficult due to the lack of consistency in where the pickups are placed on
the instrument and the number used. The former constraint makes it difficult to determine which
impulse (i.e., the left-traveling or right-traveling pulse) is being measured at the output, and the
latter constraint complicates the problem since some guitars “blend” the signal from two or more
pickups.
6.4.5 Equalization: Removing the Spectral Bias
The next step in the excitation acquisition scheme is to remove the comb filter bias associated with
the plucking point position. In Figure 6.3, the “Residual Equalization” block handles this task.
The equalization begins by obtaining an estimate of the relative plucking-point location drpp
along the string. Since the signals under analysis were recorded with a bridge-mounted pickup,
Penttinen’s autocorrelation-based technique was chosen to estimate drpp. The two-stage onset de-
tection approach presented in Section 5.1 was used to identify the incident and reflected pulses
during the initial period of vibration. drpp is then used to formulate a comb filter to approximate
the notches in the spectrum of the residual
Hcf(z) = 1 − μ z^−⌊λD⌋,  (6.4)

where λ = 1 − drpp and D = fs / f0 is the “loop delay” of the digital waveguide model determining the
pitch of the string [74]. ⌊λD⌋ denotes the greatest integer less than or equal to the product λD. μ
is a gain factor applied to the delayed signal, which determines the depth of the notch frequencies
in the spectrum, where μ values closer to 1 lead to deeper notches [76]. Intuitively,
Equation 6.4 specifies the number of samples, as a fraction of the total loop delay, between the
arrival of each impulse at the bridge.
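The feedforward comb of Equation 6.4 is a one-line difference equation, u(n) = v(n) − μ v(n − ⌊λD⌋). A sketch (function name and test values are illustrative):

```python
def comb_filter(v, mu, delay):
    """Apply H_cf(z) = 1 - mu * z^(-delay) in the time domain:
    u(n) = v(n) - mu * v(n - delay)."""
    return [v[n] - (mu * v[n - delay] if n >= delay else 0.0)
            for n in range(len(v))]

# An impulse through the comb produces a delayed, inverted, scaled copy,
# mimicking the reflected pulse arriving at the bridge pickup.
u = comb_filter([1.0, 0.0, 0.0, 0.0, 0.0], mu=0.95, delay=2)
```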
The basic comb filter structure in Equation 6.4 and Figure 6.7 (a) provides a good approximation
of the spectral nulls associated with the plucking point position. However, it is limited to sample-
level accuracy, which may not adequately approximate the true notch frequencies in the spectrum.
For more precise localization, a fractional delay filter is inserted into the feed-forward path to provide
Figure 6.7: Comb filter structures for simulating the plucking point location. (a) Basic structure. (b) Basic structure with a fractional delay filter added to the feedforward path to implement non-integer delay.
the required non-integer delay as shown in Figure 6.7 (b) [88]. Thus, the resulting fractional delay
comb filter has the form

Hcf(z) = 1 − μ F(z) z^−⌊λD⌋,  (6.5)

where F(z) provides the fractional precision lost by rounding the product λD. F(z) is designed
using several available techniques in the literature, including all-pass filters and FIR Lagrange
interpolation filters, as discussed in Appendix A.
Using the comb filter structure from Equation 6.4 or 6.5, pb(n) can be equalized by inverse
filtering:

P(z) = Pb(z) / Hcf(z).  (6.6)
Figure 6.8 demonstrates the effects of equalizing the residual in both the time and frequency
domains. Figures 6.8(a) and 6.8(b) show the time and spectral domain plots, respectively, of the
residual obtained from a plucked-guitar tone. Figure 6.8(b) also plots the frequency response of
the estimated comb filter, which approximates the deep notches found in the residual. A 5th-order
fractional delay was used for the comb filter and a value of 0.95 was used for the gain term µ. This
value was found to provide the closest approximation of the spectral notches for the signals in the
dataset. Figures 6.8(c) and 6.8(d) show the time and spectral domain plots when the residual is
equalized by inverse filtering. In the spectral domain, inverse comb filtering yields a magnitude
spectrum that is relatively free of the deep notches seen in 6.8(b). In the time domain plot of 6.8(c)
this translates into a signal that is much closer to a pure impulse.
Figure 6.8: Spectral equalization on a residual signal obtained from plucking a guitar string 8.4 cm from the bridge (open E, f0 = 331 Hz). (a) Residual. (b) Residual spectrum and comb filter approximation. (c) Residual with bias removed. (d) Original and equalized spectra using the inverse comb filter.
6.4.6 Residual Alignment
After equalization, the final step is to align the processed excitation signals with a reference excitation
signal. This ensures that the impulse “peak” of each signal is aligned in the time domain to avoid
errors for principal components analysis. In practice, this is accomplished by copying the reference
and processed signals and cubing them, which attenuates the low-amplitude samples relative to the
primary peak. The cross-correlation is computed between each signal and the reference pulse. The
lag indicating maximum correlation is used to indicate the shift needed to align each signal with the
reference pulse.
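A sketch of this alignment step in pure Python (illustrative names; cubing assumes the signals are normalized so that |x| ≤ 1, which makes small samples shrink relative to the main peak):

```python
def align_to_reference(sig, ref):
    """Shift sig so its main peak lines up with ref's: cube copies of both
    to emphasize the peaks, pick the lag of maximum cross-correlation,
    and apply that shift (with zero-padding) to the original signal."""
    n = len(ref)
    a = [s ** 3 for s in sig]
    b = [r ** 3 for r in ref]
    def xcorr(lag):
        return sum(b[k] * a[k - lag] for k in range(n) if 0 <= k - lag < n)
    best = max(range(-(n - 1), n), key=xcorr)
    return [sig[k - best] if 0 <= k - best < n else 0.0 for k in range(n)]

ref = [0.0, 0.0, 1.0, 0.5, 0.0]
sig = [1.0, 0.5, 0.0, 0.0, 0.0]
aligned = align_to_reference(sig, ref)  # peak moves to index 2
```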
For excitation signal modeling and parameterization, the residual equalization scheme has several
benefits. From an intuitive standpoint, the impulsive-like signals obtained from equalization are more
indicative of the performer’s string articulation. Also, signals in this form are simpler to model and
therefore more adept for parameterization. Finally, removing the plucking point bias allows the
relative plucking point location to remain a free parameter for synthesis applications.
6.5 Component-based Analysis of Excitation Signals
6.5.1 Analysis of Recovered Excitation Signals
By applying the excitation recovery and equalization scheme of the previous section to the corpus
of recordings gathered in Section 6.3, analysis of the recovered signals provides insight into the
similarities and differences of excitation signals corresponding to various string articulations. Figures
6.9 (a) and (b) show excitation signals, overlaid on each other, which were obtained from plucked-guitar
tones produced using either a plastic pick (a) or the player’s finger (b). For both finger and
pick articulations, the dynamics of the pluck consisted of piano (soft), mezzo-forte (moderately loud)
and forte (loud). These plots show a common, impulsive-like contour with additional high-frequency
characteristics depending on the dynamics used. Comparing Figures 6.9 (a) and (b), it is evident
that the signals corresponding to finger articulations are generally wider whereas the pick excitation
signals are more narrow and closer to an ideal impulse.

Figure 6.9: Excitation signals corresponding to strings excited using a pick (a) and finger (b).
Figure 6.10 plots the average magnitude spectrum for each type of articulation in the data set.
Figure 6.10: Average magnitude spectra of signals produced with pick (a) and finger (b).
For each type of articulation (finger or pick), increasing the relative dynamics from piano to forte
results in increased high frequency spectral energy. An interesting observation is that piano-finger
articulations show a significant high frequency ripple. This may be attributed to the deliberately
slower plucking action used to produce these articulations, where the string slides more slowly off the
player’s finger. When these signals are used to re-synthesize plucked-guitar tones, they often have
a qualitative association with the perceived timbre of the resulting tones. Descriptors such as
“brightness” are often used to describe the timbre; brightness generally increases with the dynamics of
the articulations. The varying energy from the plots in Figure 6.10 provides quantitative support of
this observation.
6.5.2 Towards an Excitation Codebook
Based on the observations of Figures 6.9 and 6.10, we propose a data-driven approach for mod-
eling excitation signals using principal components analysis (PCA). Employing PCA is motivated
by observing the similar, impulse-like structure of the excitation signals shown in Figure 6.9. As
discussed, the fine differences between the derived excitation signals can be attributed to the
guitarist’s articulation and account, in part, for the spectral characteristics of the perceived tones.
These di↵erences can be modeled using a linear combination of basis vectors to provide the desired
spectral characteristics. The results of this analysis will be used to develop a codebook that consists
of the essential components required to accurately synthesize a multitude of articulation signals. At
present, PCA has not yet been applied to modeling the excitation signals for source-filter models of
plucked-string instruments. However, PCA has been applied to speech coding applications, in which
principal components are used to model voice-source waveforms including the complex interactions
between the vocal tract and glottis [19, 51].
This section presents the application of PCA to the data set and the development of an excitation
codebook using the basis vectors. The re-synthesis of excitation signals corresponding to particular
string articulations will also be presented.
6.5.3 Application of Principal Components Analysis
The motivation for applying principal components analysis (PCA) to plucked-guitar excitation sig-
nals is to achieve a parametric representation of these signals through statistical analysis. In Section
6.5.1 it was shown that excitation signals corresponding to different articulations shared a common
impulsive contour, but had varying high frequency details depending on the specific articulation.
The goal of PCA is to apply a statistical analysis to this data set which is capable of extracting
basis vectors that can model these fine details. By exploiting redundancy in the data set, PCA leads
to data reduction for parametric representation of signals.
PCA is defined as an orthogonal linear transformation of the data set onto a new coordinate
system [13]. The first principal axis in this new space explains the greatest variance in the original
data set, the second axis captures the greatest remaining variance, and so on. Figure
6.11 depicts the application of PCA to synthetic data in a two dimensional space. The vectors v1
and v2 define the principal component axes for the data set.
The principal components are found by computing the eigenvalues and eigenvectors for the
covariance matrix of the data set [5]. This is the well-known Covariance Method for PCA [13].

Figure 6.11: Application of principal components analysis to a synthetic data set. The vector v1 explains the greatest variance in the data while v2 explains the remaining greatest variance.

The initial step involves formulating a data matrix
P = [p1 p2 . . . pN]^T  (6.7)
where each pi is an M-length column vector corresponding to a particular excitation signal in the
data set. The next step involves computing the covariance matrix for the mean-centered data matrix
by taking
Σ = E[ (P − u)(P − u)^T ]  (6.8)
where E is the expectation operator and u = E[P] is the empirical mean of the data matrix. The
principal component basis vectors are obtained through an eigenvalue decomposition of Σ
V^−1 Σ V = D  (6.9)
where V = [v1 v2 . . . vN] is a matrix of eigenvectors of Σ and D is a matrix containing the associated
eigenvalues along its main diagonal. The LAPACK linear algebra software package is used to
compute the eigenvectors and eigenvalues [2].
The columns of V are sorted in order of decreasing eigenvalues in D such that λ1 > λ2 >
· · · > λN. This step is performed so that the PC basis vectors are rearranged in a manner that
explains the most variance in the data set.
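The Covariance Method can be illustrated with a small pure-Python sketch. Here the leading eigenvector is found by power iteration rather than LAPACK; that substitution, along with the function name and toy data, is illustrative only.

```python
def leading_principal_component(data, iters=200):
    """Mean-center the rows of `data`, form the covariance matrix, and
    return its leading eigenvector (the first PC) via power iteration."""
    n, m = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(m)]
    x = [[row[j] - mean[j] for j in range(m)] for row in data]
    cov = [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
            for b in range(m)] for a in range(m)]
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return v

# Points spread along the direction (1, 1): the first PC recovers it.
v1 = leading_principal_component([[1.0, 1.0], [2.0, 2.0],
                                  [3.0, 3.0], [-1.0, -1.0]])
```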
To reconstruct the excitation signals, the correct linear combination of basis vectors is required.
The correct weights are obtained by projecting the mean-centered data matrix onto the eigenvectors
W = (P� u)V. (6.10)
Equation 6.10 defines an orthogonal linear transformation of the data onto a new coordinate system
defined by the basis vectors. The weight matrix W is defined as

W = [w1 w2 . . . wN]^T,  (6.11)
where each w is an M-length column vector containing the scores (or weights) pertaining to a
particular excitation signal in P. These scores indicate how much each basis vector is weighted when
reconstructing the signal, and they are also helpful in visualizing the data, as will be discussed in the
next section.
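The projection in Equation 6.10, and the corresponding reconstruction from the mean and weighted basis vectors, can be sketched per-signal as follows. The function name is illustrative, and the basis is assumed orthonormal, as PCA guarantees.

```python
def project_and_reconstruct(signal, mean, basis):
    """Weights w_k = <signal - mean, v_k> (one row of Equation 6.10),
    then reconstruction mean + sum_k w_k * v_k."""
    m = len(mean)
    centered = [signal[j] - mean[j] for j in range(m)]
    weights = [sum(centered[j] * v[j] for j in range(m)) for v in basis]
    recon = [mean[j] + sum(w * v[j] for w, v in zip(weights, basis))
             for j in range(m)]
    return weights, recon

# With a complete orthonormal basis, reconstruction is exact.
w, r = project_and_reconstruct([3.0, 4.0], mean=[1.0, 1.0],
                               basis=[[1.0, 0.0], [0.0, 1.0]])
```

Keeping only the first few basis vectors in the reconstruction sum gives the reduced-order approximation exploited in the codebook design below.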
6.5.4 Analysis of PC Weights and Basis Vectors
Principal component analysis of the excitation signals is divided into two groups to separately
examine the sets of wound and unwound strings, which have different physical characteristics, as
described in Section 6.3.
For the set of unwound strings, the recovered excitation signals were normalized to a reference
length of M = 570 samples, which is approximately twice the length of the period corresponding
to the open 3rd string tuned to 196 Hz. For the set of wound strings, the reference length of the
excitation signals was set to M = 910 samples, which is approximately twice the period of the open
6th string tuned to 82.4 Hz. It should be noted that normalization was achieved via downsampling
to avoid truncating significant sections of the excitation signal. Downsampling to the lowest possible
frequency in the set of strings also avoids the loss of high frequency information present in the data
set. PCA was applied to both groups of excitation signals using the Covariance Method overviewed
in Section 6.5.3.
To analyze the compactness of each data set, the explained variance (EV) can be computed
using the eigenvalues calculated from PCA:

EV = ( Σ_{m=1}^{M′} λm ) / ( Σ_{m=1}^{M} λm )  (6.12)
where M′ < M. Figure 6.12 plots the explained variance for the sets of unwound and wound
strings, respectively. In both cases, the plots of explained variance suggest that the data is fairly
low dimensional. Selecting M′ = 20 basis vectors accounts for > 95% of the variance for the set of
unwound strings, while M′ = 30 is sufficient for > 95% of the variance in the wound set.

Figure 6.12: Explained variance of the principal components computed for the sets of (a) unwound and (b) wound strings.
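Equation 6.12 amounts to a ratio of partial to total eigenvalue sums; a one-function sketch (illustrative name; eigenvalues assumed already sorted in descending order):

```python
def explained_variance(eigvals, m_prime):
    """EV = sum of the largest m_prime eigenvalues over the sum of all
    eigenvalues (Equation 6.12)."""
    return sum(eigvals[:m_prime]) / sum(eigvals)

ev = explained_variance([8.0, 1.0, 0.5, 0.5], m_prime=1)
```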
For insight into the relationship between the basis vectors and the excitation signals, Figure 6.13
plots the first three basis functions alongside example articulations extracted from the subset of the
data set consisting of the 1st, 2nd and 3rd strings. The general, impulsive-like contour is captured by the
empirical mean of the data set. In the case of the excitations derived from pick articulations, the
basis vectors plotted provide the high frequency components just before and after the main impulse.
In the case of the finger articulations, these basis vectors are negatively weighted and serve to widen
the main impulse. This relationship agrees with the physical occurrence of plucking a string with
a pick versus a finger, since the physical characteristics of each plucking device directly affect the
shape of the string.
Figure 6.14 shows a similar plot for the 4th, 5th and 6th strings, which have different physical
characteristics due to their wound construction. By comparing Figures 6.13 and 6.14, it is evident
that the extracted basis vectors are very similar in each case. The difference, however, is in the
empirical mean vector, which exhibits a pronounced “bump” immediately after the main impulse.
This feature appears to be characteristic of the articulations produced by the finger, which perhaps
reflects the slippage of the wound string off of the finger.
Figure 6.15 shows how the data pertaining to the string articulations projects
into the space defined by the principal component vectors. Figure 6.15(a) shows the projection of
articulations from strings 1-3 along the 1st and 2nd components. This projection shows that the
data pertaining to specific articulations have a particular arrangement and grouping in this space.
Figure 6.13: Selected basis vectors extracted from plucked-guitar recordings produced on the 1st, 2nd and 3rd strings. (Panels: pick excitations, finger excitations, and principal components; legend: forte, mezzo-forte, piano, and Mean, PC 1, PC 2, PC 3.)
Figure 6.14: Selected basis vectors extracted from plucked-guitar recordings produced on the 4th, 5th and 6th strings.
Figure 6.15: Projection of guitar excitation signals into the principal component space. Excitations from strings 1-3 (a) and 4-6 (b).
In particular, the axis pertaining to the 1st principal component correlates with the articulation
strength, which increases independently for pick and finger articulations. Similarly, the projection
of the data pertaining to strings 4-6 is shown in Figure 6.15(b), which shows a different arrangement,
but a similar clustering of data based on the articulation type.
6.5.5 Codebook Design
The plots of explained variance in Figure 6.12 demonstrate the relatively low dimensionality of the extracted guitar excitation signals. Here, we present an approach for designing a codebook to further
reduce the number of basis vectors required to accurately reconstruct the excitation signals. This
step is advantageous for synthesis systems where it is desirable to faithfully capture the perceptual
characteristics of the performer-string interaction, while minimizing the amount of data required.
Also, this approach separately analyzes the principal component weights for pick and finger articulations to determine the "best" subset of basis vectors comprising each group of articulations. This
method considers that, while PCA yields basis vectors that successively explain the most variance
in the data, certain basis vectors may be more essential to synthesize a particular articulation based
on the magnitude of the associated weight vector.
The codebook design procedure is as follows:
1. Compute the weight matrix for the data set using Equation 6.10. A weight vector w = [w1 w2 . . . wM] is obtained for each excitation signal in the data set.
2. Take the absolute value of each weight vector w and sort the entries in descending order so that |w1| > |w2| > · · · > |wM|.
3. Select the first Mtop weights from the sorted weight vector, where Mtop is an integer.
4. For each of the Mtop weights selected, record the occurrence of the associated principal component vector in a histogram.
5. Using the histogram as a guide, select a subset of L basis vectors having the highest occurrences in the histogram (see Figure 6.16), where L < M. This yields a subset of basis vectors V̂ ⊂ V, where V̂ = [v1 v2 . . . vL]. These form the codebook entries.
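As a concrete illustration, the five steps above can be sketched in a few lines of NumPy. The function name, weight matrix, and toy data below are hypothetical stand-ins, not values from the thesis:

```python
import numpy as np

def design_codebook(W, M_top, L):
    """Steps 1-5 above: for each excitation's weight vector, find the M_top
    components with the largest |weight|, tally their indices in a histogram,
    and keep the L most frequently occurring basis vectors as the codebook."""
    counts = np.zeros(W.shape[1], dtype=int)      # one histogram bin per component
    for w in W:                                   # W: (num_signals, M) weight matrix
        top = np.argsort(-np.abs(w))[:M_top]      # indices of the M_top largest |w|
        counts[top] += 1                          # step 4: record occurrences
    codebook_idx = np.sort(np.argsort(-counts)[:L])   # step 5: L most frequent
    return codebook_idx, counts

# toy data: 100 signals, 200 components, with components 0-4 made dominant
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, (100, 200))
W[:, :5] += rng.normal(0.0, 1.0, (100, 5))
idx, counts = design_codebook(W, M_top=10, L=5)
print(idx)   # the five dominant components are selected
```

Because the histogram counts rather than sums the weights, a component that is moderately important for every signal can outrank one that is dominant for only a few, which matches the "essential components" behavior noted below.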
Figure 6.16 shows the histogram computed separately for excitation signals associated with pick
and finger articulations. It is interesting to note that the function of weight frequency vs. principal
component number does not monotonically decrease. This suggests that certain component vectors
are more “essential” than others for representing the ensemble of excitation signals for a particular
articulation.
6.5.6 Codebook Evaluation and Synthesis
After the codebook has been designed, a particular excitation signal can be generated by using a
desired number of codebook entries (i.e. basis vectors) and the appropriate weightings for each
Figure 6.16: Histogram of basis vector occurrences generated with Mtop = 20.
entry. Equation 6.13 presents the synthesis equation

$$\hat{p}_i = \bar{p} + \sum_{m=1}^{L} w_{i,m} v_m, \qquad (6.13)$$
where L indicates the number of codebook entries used for re-synthesis. The weight values are
obtained by projecting the excitation signal onto the basis vectors. The number of codebook entries
used for synthesis depends on the desired accuracy. Figure 6.17 demonstrates the reconstruction
by varying the number of entries. It is clear that using a single entry does not capture the high
frequency details found in the reference excitation signal. However, using 10 entries approximates
the contour of the signal and 50 entries captures nearly all the high frequency information.
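Equation 6.13 is simply a mean-plus-weighted-sum of basis vectors. A minimal sketch with synthetic data standing in for real excitations (assumed here: an orthonormal basis, so that keeping all M entries reconstructs a signal in the span exactly):

```python
import numpy as np

def reconstruct(p_mean, V, w, L):
    """Equation 6.13: mean excitation plus a weighted sum of the first L
    codebook basis vectors (V holds basis vectors in its columns)."""
    return p_mean + V[:, :L] @ w[:L]

# synthetic check: a signal lying in the span of the basis
N, M = 64, 8
rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.normal(size=(N, M)))   # orthonormal basis columns
p_mean = rng.normal(size=N)
p = p_mean + V @ rng.normal(size=M)            # "true" excitation
w = V.T @ (p - p_mean)                         # weights via projection onto the basis
p_hat = reconstruct(p_mean, V, w, L=M)         # using all entries -> exact
print(np.allclose(p_hat, p))                   # True
```

Using a smaller L drops the trailing terms of the sum, which is exactly the truncation behavior illustrated in Figure 6.17.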
The reconstruction quality can be summarized for the entire data set by computing the signal-
to-noise ratio (SNR) for each signal in the set. SNR is defined as
$$\mathrm{SNR}_{\mathrm{dB}} = 10 \log_{10} \sum_{n} \left( \frac{p(n)}{p(n) - \hat{p}(n)} \right)^{2}, \qquad (6.14)$$

where p(n) and p̂(n) are the original and reconstructed signals, respectively. Each excitation signal
was reconstructed while varying the number of codebook entries used, and the SNR was averaged over all excitations at a particular number of entries. Additionally, separate codebooks were developed for signals associated with pick or finger articulations to improve the error when the number of entries is low. Figure 6.18 summarizes the results of this analysis.
Figure 6.17: Excitation synthesis by varying the number of codebook entries: (a) 1 entry, (b) 10 entries, (c) 50 entries.
It is of note that the SNR computed for finger excitation signals is generally higher than the SNR computed for pick excitations, regardless of the number of codebook entries used. Intuitively, this
agrees with previous observations of the excitation signals obtained from our data set. In general, the
observed signals pertaining to finger articulations were not as complex as the picked articulations
Figure 6.18: Computed signal-to-noise ratio when increasing the number of codebook entries used to reconstruct the excitation signals.
(see Figure 6.10). Thus, the finger articulations may be more accurately represented with fewer
components.
The results presented in Figure 6.18 make a strong case for applications requiring accurate and expressive synthesis with low data storage requirements. The initial PCA yielded 570 basis vectors (for strings 1-3), each with a length of 570 samples. From Figure 6.18, it is evident that the SNR of the reconstruction only marginally increases when more than 150 codebook entries are used. 150 codebook entries require only 26% (150×570 / 570×570) of the data obtained from the initial PCA, which significantly reduces the amount of storage required. At a 16-bit quantization level, 150 codebook entries would require approximately 167 kilobytes of storage, which is a modest requirement considering the storage capacities of present-day personal computers and mobile computing devices.
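The storage figures quoted above follow from simple arithmetic (assuming 16-bit, i.e. 2-byte, samples):

```python
# 150 codebook entries of 570 samples each, versus the full 570 x 570 PCA basis
entries, length, bytes_per_sample = 150, 570, 2

fraction_kept = (entries * length) / (570 * 570)          # share of the full basis
storage_kb = entries * length * bytes_per_sample / 1024   # at 16-bit quantization

print(round(fraction_kept * 100), "percent")   # 26 percent
print(round(storage_kb), "kilobytes")          # 167 kilobytes
```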
6.6 Nonlinear PCA for Expressive Guitar Synthesis
While the linear PCA technique presented in the previous section provides intuition on the underlying basis functions comprising our data set, it is unclear how exactly the high-dimensional component space relates to the expressive attributes of our data. As shown in Figure 6.15, there is a nonlinear arrangement of the data along the axes pertaining to the first two principal components. Moreover, since additional components are needed to accurately reconstruct the source signals, simply sampling the space defined by the first two components is not sufficient for high-quality synthesis. On the
other hand, it is difficult to visualize and infer the underlying structure of the data by projecting it along additional components. In this section, we explore the application of nonlinear principal components analysis (NLPCA) to the data extracted from linear PCA to derive a low-dimensional representation of the data. We show that the reduced-dimensional space derived using NLPCA explains the expressive attributes of the excitation signals in the data set. Moreover, this low-dimensional representation can be inverted, and can therefore be adapted as an expressive controller using the original linear components.
6.6.1 Nonlinear Dimensionality Reduction
There are many techniques available in the literature for nonlinear dimensionality reduction, or
manifold-learning, for the purposes of discovering the underlying nonlinear characteristics of high
dimensional data. Such techniques include locally linear embedding (LLE) [65] and Isomap [78].
While LLE and Isomap are useful for data reduction and visualization tasks, their application does
not provide an explicit mapping function to project the reduced dimensionality data back into the
high dimensional space.
For the purpose of developing an expressive control interface, re-mapping the data back into the
original space is essential since we wish to use our linear basis vectors to reconstruct the excitation
pulses. To satisfy this requirement, we employ NLPCA via autoassociative neural networks (ANN)
to achieve dimensionality reduction with explicit re-mapping functions.
Figure 6.19: Architecture for a 3-4-1-4-3 autoassociative neural network.
The standard architecture for an ANN is shown in Figure 6.19 and consists of 5 layers [34]. The input and mapping layers can be viewed as the "extraction" function, since they project the input variables into a lower dimensional space as specified in the bottleneck layer. The de-mapping and output layers comprise the "generation" function, which projects the data back into its original dimensionality. Using Figure 6.19 as an example, the ANN can be specified as a 3-4-1-4-3 network to indicate the number of nodes at each layer. The nodes in the mapping and de-mapping layers contain sigmoidal functions and are essential for compressing and decompressing the range of the data to and from the bottleneck layer. An example sigmoidal function is the hyperbolic tangent, which compresses values with a range of (−∞, ∞) to (−1, 1). Since the desired values at the bottleneck layer are unknown, direct supervised training cannot be used to learn the mapping and de-mapping functions. Rather, the combined network is learned using back-propagation algorithms to minimize a squared error criterion such that $E = \frac{1}{2}\|\mathbf{w} - \hat{\mathbf{w}}\|^2$ [34]. From a practical standpoint, the mapping functions are essentially a set of transformation matrices to compress (T1, T2) and decompress (T3, T4) the dimensionality of the data.
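The extraction/generation split can be illustrated with a toy 3-4-1-4-3 network in NumPy. The random weights below are placeholders; in practice they would be learned by back-propagation, as described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# placeholder weights for a 3-4-1-4-3 autoassociative network
W1 = rng.normal(scale=0.5, size=(4, 3))   # input -> mapping layer
W2 = rng.normal(scale=0.5, size=(1, 4))   # mapping -> bottleneck
W3 = rng.normal(scale=0.5, size=(4, 1))   # bottleneck -> de-mapping
W4 = rng.normal(scale=0.5, size=(3, 4))   # de-mapping -> output

def extract(x):
    """'Extraction' half: input and mapping layers down to the bottleneck."""
    return W2 @ np.tanh(W1 @ x)           # tanh is the sigmoidal mapping layer

def generate(z):
    """'Generation' half: de-mapping and output layers back to full dimension."""
    return W4 @ np.tanh(W3 @ z)

x = np.array([0.2, -0.1, 0.4])
z = extract(x)                            # one bottleneck score
x_hat = generate(z)                       # reconstruction of the input
error = 0.5 * np.sum((x - x_hat) ** 2)    # the squared-error training criterion E
print(z.shape, x_hat.shape)               # (1,) (3,)
```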
6.6.2 Application to Guitar Data
To uncover the nonlinear structure of the guitar features extracted in Section 6.5.4, NLPCA was applied using 25 scores from the linear components analysis at the input layer of the ANN. Empirically, we found that using 25 scores was sufficient in terms of adequately describing the data and expediting the ANN training. As discussed in Section 6.5.4, 25 linear PCA vectors explain > 95% of the variance in the data set and lead to good re-synthesis. At the bottleneck layer of the ANN, we chose two nodes in order to have multiple degrees of freedom which could be used to synthesize excitation pulses in an expressive control interface. These design criteria yielded a 25-6-2-6-25 ANN architecture, which was trained using the NLPCA MATLAB Toolbox [67].
Figure 6.20 compares the projection of the data into the linear component space and the reduced-dimension space defined by the bottleneck layer of the ANN. As shown in Figure 6.20(b), unlike the linear projection in Figure 6.20(a), the bottleneck layer of the NLPCA space has "unwrapped" the nonlinear data arrangement so that it is now clustered about linear axes. Figure 6.21 shows an additional linear rotation applied to this new space for a clearer view of how the axes relate to the data set. By examining this space, the data is clearly organized around the orthogonal z1 and z2 axes. Selected
excitation pulses are also shown, which were synthesized by sampling this coordinate space, projecting back into the linear principal component domain using the transformation matrices (T3, T4) from the ANN, and using the resulting scores to reconstruct the pulse with the linear component vectors.
Figure 6.20: Top: Projection of excitation signals into the space defined by the first two linear principal components. Bottom: Projection of the linear PCA weights along the axes defined by the bottleneck layer of the trained 25-6-2-6-25 ANN.
The nonlinear component defined by the z1 axis describes the articulation type where points
sampled in the space z1 < 0 pertain to finger articulations and points sampled for z1 > 0 pertain
to pick articulations. The finger articulations feature a wider excitation pulse, in contrast to the pick, where the pulse is generally narrower and more impulsive. In both articulation spaces, moving
from left to right increases the relative dynamics. The second nonlinear component defined by the
z2 axis relates to the contact time of the articulation. As z2 is increased, the excitation pulse width
increases for both articulation types.
Figure 6.21: Guitar data projected along orthogonal principal axes defined by the ANN (center). Example excitation pulses resulting from sampling this space are also shown.
6.6.3 Expressive Control Interface
We demonstrate the practical application of this research in a touch-based iPad interface shown in
Figure 6.22. This interface acts as a “tabletop” guitar, where the performer uses one hand to provide
the articulation and the other to key in the desired pitch(es). The articulation is applied to the large,
gradient square in Figure 6.22, which is a mapping of the reduced dimensionality space shown in
Figure 6.21. Moving up along the vertical axis of the articulation space increases the dynamics of
the articulation (piano to forte) and moving right to left on the horizontal axis increases the contact
time. The articulation area is capable of multi-touch input so the performer can use multiple fingers
within the articulation area to give each tone a unique timbre.
The colored keys on the left side of Figure 6.22 allow the user to produce certain pitches. Adjacent
Figure 6.22: Tabletop guitar interface for the components-based excitation synthesis. The articulation is applied in the gradient rectangle, while the colored squares allow the performer to key in specific pitches.
keys on the horizontal axis are tuned a half step apart, and their color indicates that they are part of the same "string," so that only the leading key on the string can be played at once. Diagonal keys on adjacent strings are tuned to a Major 3rd interval, while the off-diagonal keys represent a Minor 3rd interval. This arrangement allows the performer to easily finger different chord shapes.
The synthesis engine for the tabletop interface is capable of computing the excitation signal
corresponding to the performer’s touch point within the articulation space and filtering the resulting
excitation signal for multiple tones in real-time. The filter module used for the string is implemented
with the single delay-loop model shown in Figure 6.1. Though this filter has a large number of delay taps, which is dependent on the pitch, only a few of these taps have non-zero coefficients, which permits an efficient implementation of infinite impulse response filtering. Currently, the relative
plucking position along the string is fixed, though this may be a free parameter in future versions
of the application. The excitation signal can be updated in real-time during performance, which
is made possible by the iPad’s support of hardware-accelerated vector libraries. These include the
matrix multiplication routines to project the low dimensional user input into the high dimensional
component space. Through our own testing, we found that the excitation signal is typically computed in < 1 millisecond, which is more than adequate for real-time performance.
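The engine's string filter can be conveyed with a heavily simplified single delay-loop in the Karplus-Strong spirit. This is an illustrative integer-delay loop with a two-point loss filter, not the thesis's exact SDL implementation from Figure 6.1:

```python
import numpy as np

def sdl_synthesize(excitation, delay, loop_gain=0.995, n_samples=8000):
    """Feed an excitation into a recirculating delay line; a scaled two-point
    average acts as a simple frequency-dependent loss filter. A fractional
    delay filter (Appendix A) would be needed for precise tuning."""
    out = np.zeros(n_samples)
    buf = np.zeros(delay)
    for n in range(n_samples):
        x = excitation[n] if n < len(excitation) else 0.0
        fb = loop_gain * 0.5 * (buf[0] + buf[1])   # lossy loop filter
        y = x + fb
        buf = np.roll(buf, -1)
        buf[-1] = y                                # write back into the loop
        out[n] = y
    return out

# "pluck" with a short noise burst; the tone decays as it recirculates
rng = np.random.default_rng(0)
tone = sdl_synthesize(rng.normal(size=100), delay=100)
print(np.abs(tone[:1000]).max() > np.abs(tone[-1000:]).max())   # True: decaying
```

In a real-time engine the per-sample loop would be replaced with vectorized or hardware-accelerated filtering, as described above for the iPad implementation.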
6.7 Discussion
In this chapter, a novel, component-based approach was presented for modeling the excitation signals
of plucked-guitar tones. This method draws on physically inspired modeling techniques to extract the
excitation pulses from recorded performances pertaining to various articulation styles in accordance
with a source-filter model. Principal components analysis (PCA) was used to model the excitation
pulses using the resulting set of linear basis vectors. While this analysis led to a large number of
basis vectors, a codebook was developed to reduce the number required for accurate modeling.
To understand the relation between the linear components and the expressive attributes of the
excitation signals in the data set, nonlinear principal components analysis (NLPCA) was used to
achieve a reduced dimensional space using the linear weights as inputs to an autoassociative neural network (ANN). Using the ANN, the relation of the expressive attributes of the excitation signals to the axes of the reduced dimensional space is clear.
A pertinent application of this research includes developing new interfaces for musical expression.
The application of NLPCA to the excitation signal data set derives a low dimensional representation
based on linear basis vectors and has a clear relationship to the expressive attributes of the data set.
Since the transformation into the reduced space is invertible, this representation could be leveraged
into gesture recognition and control applications for music synthesis. At present, gesture-based
recognition systems for guitar synthesis rely on non-parametric, sample-based synthesizers or, at best, physical models where the excitation signals are saved off-line [26, 55]. The component-based
modeling approach presented here is limited only by the data used to derive component vectors and
can be used for arbitrary synthesis using the reduced dimensional space.
Similar to gesture-recognition systems, recent advances in mobile computing technology make touch-based devices a compelling platform for expressive musical interfaces, especially for the guitar. Among the existing interfaces are Apple's iPad implementation of Garageband, which uses
accelerometer data in response to the user’s tapping strength to trigger an appropriate sample
for the synthesizer [20]. Similarly, the OMGuitar enables single note or chorded performance and triggers chord samples based on how quickly the user "strums" the interface [1]. In both cases,
sample-based synthesizers are used, though as shown in the previous section, the reduced-dimensional
component space is highly applicable to these interfaces.
CHAPTER 7: CONCLUSIONS
This research presented several novel techniques for the analysis and synthesis of guitar performance
focusing on the player’s string articulation, which can be summarized as follows:
• Generated a data set of plucked guitar tones comprising variations of the performer's articulation, including the plucking mechanism and strength, which spans all of the guitar's strings and several fretting positions.
• Developed a framework for jointly estimating the source and filter parameters for plucked-guitar tones based on a physically-inspired model.
• Proposed and demonstrated a novel application of principal component analysis to model the source signal for plucked guitar tones to encapsulate characteristics of various string articulations.
• Utilized nonlinear principal components analysis to derive an expressive control space to synthesize excitation signals corresponding to guitar articulations.
This research is centered on source-filter modeling techniques widely used in the literature, since the model is highly analogous to the process of exciting a resonant string. I have shown that estimating the parameters of the model can be formulated as a joint estimation problem, where the motivation is to account for the simultaneous variation between the performer's articulation and the string's resonant response, and that this technique is adept at capturing the parameters and perceptual attributes of recorded plucked-guitar tones produced with different plucking mechanisms and strengths. A novel, data-driven approach for modeling excitation signals based on linear and nonlinear principal components was also presented. This modeling approach decouples the effect of the performer's plucking position on the string and treats each excitation signal as a weighted combination of basis vectors. Nonlinear components analysis is used to derive an invertible, expressive space which can be used to synthetically generate excitation signals pertaining to specific articulations in the data set. A practical application of this research was also presented, where an iPad was used to demonstrate flexible, real-time synthesis of guitar tones with control over the string articulation.
This chapter will discuss limitations of the proposed methods with regard to the techniques
employed and the underlying physics of vibrating strings. Future directions for this research will
also be discussed.
7.1 Expressive Limitations
The techniques presented in this dissertation are primarily concerned with modeling the performer's articulation through their plucking action, which includes the effects of plucking mechanism and strength. However, guitarists use additional expressive techniques during performance pertaining to the action of their fretting hand, which controls the pitch of the plucked tone. These techniques include legato, or smooth, transitions between notes and pitch shifting techniques such as bends and vibrato, which alter the pitch of a tone after it has been excited. Due to the time-varying nature of the tones resulting from these techniques, analysis and synthesis with linear time-invariant source-filter models is extremely difficult or infeasible.
Guitarists typically play with legato style using slides, "hammer-ons" or "pull-offs" between notes. When performing a slide, the note is played at a particular position and the fretting finger moves up or down the string after the note has sounded until the desired pitch is reached. Similarly, a hammer-on involves playing a particular note with a fretting finger and using another finger to clamp down the string at a higher fret position after the note has already sounded to achieve a sudden pitch increase. The complementary technique is the pull-off, where the fretting finger is released and another finger, already in position behind the fretting finger, sets a lower pitch. The discrete pitch changes resulting from tones produced with legato are not easily analyzed with a source-filter model. In particular, sliding into a note causes one or many discrete pitch changes as the guitarist's finger moves along the fretboard to its final position. The resulting tone will exhibit time-varying pitch and decay characteristics. The hammer-on technique introduces additional complexity into the analysis since the string is "excited," in a sense, by the second finger clamping the new fret in an impulsive-like manner. Furthermore, melodies can often be performed with hammer-ons and pull-offs without using the articulation hand to initially excite the string, which diverges from the traditional notion of how the string is excited.
While legato performance introduces sudden, discrete pitch changes to the plucked tone, vibrato
and string bending alter the pitch of the fretted note without changing the fret position. Vibrato
is achieved by rapidly wiggling the fretting finger at a particular position to slightly alter the pitch
of the tone. Pitch-bending involves physically bending the string at the fretting position, thereby
altering its tension to achieve a pitch increase. While a certain degree of vibrato may be negligible
from an analysis standpoint, pitch bending produces a signal with noticeable time-varying pitch,
which cannot be analyzed using either the proposed joint source-filter estimation scheme or existing
spectral-based filter estimation schemes. This is due to the harmonically related partials shifting
with the fundamental frequency so that the continuously changing partial frequencies and decay
rates must be identified. Pitch shifting can be implemented via post-processing, but with certain restrictions. For example, vibrato can be implemented using the source-filter model by
varying the fractional delay filter in the feedback loop as long as the pitch change is small. However,
significant pitch shifting requires modification of the bulk delay term in the feedback loop. Such
modification requires continuously resampling the delay line to simulate the gradual tension change
in the string [80]. In certain synthesis systems, pitch bending is often simulated by applying a
phase vocoder algorithm which applies short-time spectral manipulation to the signal to smoothly
alter the pitch of a synthetic signal [20]. Alternately, a sinusoidal model can easily be applied to
the time-varying characteristics associated with string bending, though the benefits associated with
source-filter modeling will be lost.
7.2 Physical Limitations
The so-called single delay-loop (SDL) model that forms the basis of the analysis and synthesis
techniques presented in this dissertation describes the basic components of plucked string synthesis
including articulation, pitch and frequency-dependent decay. However, there are several physical
aspects of vibrating guitar strings that are not encapsulated by the model.
It is well understood that real strings vibrate along the transverse and longitudinal directions, which are perpendicular and parallel to the guitar's body, respectively. The perceived vibration of the string is the sum of vibration in both directions, including coupling effects, and in certain cases a "beating phenomenon" is heard, which is caused by slight differences in the string's effective length along the transverse and longitudinal axes [16]. The beating phenomenon causes the sum and difference frequencies to be perceived by the listener. Identification of the beating frequencies in guitar tones through analysis is difficult since it is a fast occurring phenomenon requiring high spectral resolution (and thus long window lengths) to identify the distinct frequencies. Lee presents an approach for finding the beating frequencies through identification of the two-stage decay evident
in plucked tones, but it is unclear how to automate the process, which is based on an additive synthesis model [43]. While beating isn't included in the analysis techniques presented here, beating implementation is often accomplished via an ad-hoc approach where two SDL models are used, each having a slightly different pitch, and placed in parallel. The outputs of each SDL are scaled by a gain factor and mixed to create a synthetic signal with beating present around the fundamental frequency [44]. The presented synthesis techniques can easily be modified to include beating, though automated analysis and identification of the beating frequencies remains an on-going research problem.
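The two-detuned-models idea can be demonstrated with decaying sinusoids standing in for the parallel SDL outputs (an idealized sketch; the 220 Hz pitch and 2 Hz detuning are arbitrary illustrative choices):

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs                        # one second of samples
f0, detune = 220.0, 2.0                       # slight mistuning -> 2 Hz beating
s1 = np.exp(-1.5 * t) * np.sin(2 * np.pi * f0 * t)             # "string" 1
s2 = np.exp(-1.5 * t) * np.sin(2 * np.pi * (f0 + detune) * t)  # "string" 2
mix = 0.5 * (s1 + s2)                         # scaled and mixed outputs

# the beat envelope collapses every 1/detune = 0.5 s; check the null at 0.25 s
null = np.abs(mix[int(0.25 * fs) - 20 : int(0.25 * fs) + 20]).max()
peak = np.abs(mix[:200]).max()
print(null < 0.2 * peak)                      # True: amplitude dips at the beat null
```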
The pitch shifting due to the tension modulation present in real plucked-guitar strings is not explicitly accounted for in the joint source-filter estimation since it is a slowly time-varying process.
However, when the measured pitch shift is relatively small, the fractional delay filter can be slowly
varied over time to manipulate the frequency, as discussed in Appendix B. The frequency trajectories are obtained by modeling the pitch of a plucked tone via short-time analysis. A technique for
incorporating tension modulation into a synthesis system involves re-sampling the delay line to alter
the pitch [80] or using a sinusoidal model where the frequencies of the harmonically related partials
gradually decrease over time [42].
7.3 Future Directions
Beyond the expressive and physical limitations of the modeling techniques demonstrated, the computational model of guitar articulations developed in this thesis could be furthered through the collection of performance data from additional guitarists. However, acquiring this data is challenging due to the specific guitar configuration (e.g. bridge-mounted piezoelectric pickup) required for recording and analyzing the performance. Currently, no publicly available dataset exists that satisfies the recording configuration, which is why a dataset was created specifically for this research.
There is also the issue of recording the guitarist in the context of a live performance. The data
set developed is centered on capturing the acoustic attributes of the expression associated with an
articulation in a controlled environment so that individual strings can be isolated. During a live
performance, guitarists will alter their articulation in other ways, especially when strumming the
strings to produce chords. This necessitates a divided, or “hexaphonic”, guitar pickup for capturing
the audio from individual strings while avoiding the challenging task of multiple source separation
from a polyphonic mixture. Divided pickups are commercially available for common guitar models,
but a streamlined apparatus is required to interface the signals with recording equipment without
being obtrusive to the performer. Development of this complete, polyphonic recording system for
capturing contextual performance remains a task for future work.
With the inclusion of performance data from many guitarists, computational models for specific performers could be developed to determine if the differences in articulation are discernible using the proposed modeling techniques. These models could then be used to "profile" a particular performer
and integrate the related parameters into a synthesis system for the application of new musical
interfaces. It was already demonstrated that the excitation synthesis could be implemented on
currently available mobile computing platforms, but emerging gesture recognition technologies, such
as the Microsoft Kinect, could also be used to harness this technology for performance, entertainment
and gaming applications.
From a physical modeling standpoint, additional characteristics of guitars, such as body resonance effects and magnetic pickups, could be studied, including how the performer uses these aspects of the instrument during performance. Foremost, inclusion of these effects is required for acoustically accurate synthesis of a "complete" guitar model, which would necessitate augmenting the source-filter model with blocks implementing the signal processing tasks for modeling the pickups, resonance, etc. Also, how the guitarist uses certain techniques, such as plucking position or pickup position, either consciously or subconsciously in the context of a performance warrants further analysis.
Appendix A: Overview of Fractional Delay Filters
A.1 Overview
The waveguide models introduced in Chapter 3 depend on a delay loop parameter, D, that sets the waveguide's total sample delay and thus the pitch, f0, of the synthesized tone such that D = fs/f0, where fs is the sampling frequency. In many cases, however, D is a non-integer, which cannot be obtained by a simple ratio of integers. In some systems, it is permissible to adjust the sampling rate to achieve a desired pitch, though this is often undesirable, especially when multiple voices are being synthesized or when certain performance techniques, such as tremolo and vibrato, require D to be a continuously varying parameter.
Fractional delay filters have been widely used in the literature to provide the non-integer delay required for precisely tuning waveguide models [25, 26, 29, 56, 59, 85]. However, the design and implementation of such filters is not straightforward and requires some special consideration. This appendix will briefly overview the basic theory and practical considerations associated with designing and implementing FIR-type fractional delay filters. While IIR-type filters are also used for this task, FIR filters are preferred in the literature since they can be easily designed with good frequency response characteristics. In particular, the Lagrange interpolation fractional delay filter is examined, which is used in this thesis.
A.2 The Ideal Fractional Delay Filter
To understand fractional delay filters, it is useful to consider a discrete-time signal, x(n), delayed by D samples, where D is a real number expressed as

D = dI + dF    (A.1)

with dI and dF denoting the integer and fractional components, respectively. x(n) is shifted by D samples via convolution with a shifting filter, hid(n), to yield y(n) = x(n − D) [54]. In the z-transform domain, the transfer function of the ideal shifting filter is

Hid(z) = Y(z)/X(z) = X(z)z^(−D)/X(z) = z^(−D)    (A.2)
and the corresponding frequency response is obtained by setting z = e^(jω) in Equation A.2:

Hid(e^(jω)) = e^(−jωD)    (A.3)

By computing the magnitude, phase and group delay responses for Equation A.3, it can be verified that Hid(e^(jω)) is distortionless, since it passes an input signal without magnitude or phase distortion, as shown in Equations A.4 - A.6 [37]:

|Hid(e^(jω))| = |e^(−jωD)| = 1    (A.4)

Θid(ω) = ∠Hid(e^(jω)) = −ωD    (A.5)

τid(ω) = −∂Θid(ω)/∂ω = D    (A.6)

It is intuitive that the filter will not distort the magnitude of an input signal since it has unity gain, but the importance of the linear phase response shown in Equation A.5 cannot be overstated. Linear phase implies that the system has a constant group delay, such that the input signal is uniformly delayed by D samples regardless of frequency.
The impulse response of Hid(e^(jω)) can be obtained by taking its inverse discrete-time Fourier transform [54]:

hid(n) = (1/2π) ∫ from −π to π of Hid(e^(jω)) e^(jωn) dω    (A.7)

       = (1/2π) ∫ from −π to π of e^(−jωD) e^(jωn) dω    (A.8)

       = (1/2π) ∫ from −π to π of e^(jω(n−D)) dω    (A.9)

By evaluating the integral in Equation A.9, the impulse response of Hid(e^(jω)) can be verified as the sinc function shifted by D samples:

hid(n) = sin(π(n − D)) / (π(n − D)) = sinc(n − D).    (A.10)
Figure A.1: Impulse responses of an ideal shifting filter when the sample delay assumes an integer (top) and non-integer (bottom) number of samples.
Laakso et al. address the problems with implementing a fractional delay filter by comparing the impulse responses of hid(n) when D takes on integer and non-integer values, as shown in Figure A.1 [35, 87]. In the case where D = 3, hid(n) reduces to a unit impulse at n = 3, since the sinc function is exactly zero at all other sample values. When D = 3.3, however, hid(n) cannot be reduced to a simple unit impulse, since the peak of the sinc function is offset from an integer sample value. Now an interpolation using all samples of the sinc function is required to delay an input signal by D = 3.3 samples. As the bottom panel of Figure A.1 shows, implementing this impulse response is not possible, since hid(n) is both non-causal and infinite in length.
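The behavior in Figure A.1 can be checked numerically. The sketch below (in Python rather than the MATLAB environment used elsewhere in this thesis) evaluates hid(n) = sinc(n − D) over a short range of samples: an integer delay collapses to a unit impulse, while a fractional delay leaves every sample nonzero.

```python
import numpy as np

def h_ideal(n, D):
    """Ideal fractional-delay impulse response h_id(n) = sinc(n - D).
    Note that np.sinc(x) computes sin(pi*x)/(pi*x)."""
    return np.sinc(n - D)

n = np.arange(-2, 9)

# Integer delay: the sinc collapses to a unit impulse at n = D = 3.
h_int = h_ideal(n, 3)

# Fractional delay: every sample of the sinc is nonzero, so an exact
# implementation would require an infinite, non-causal filter.
h_frac = h_ideal(n, 3.3)

print(np.round(h_int, 12))
print(np.round(h_frac, 4))
```

Truncating `h_frac` to a finite range of n is precisely the FIR approximation discussed next.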
A.3 Approximation Using FIR Filters
Since the ideal fractional delay (FD) filter cannot be realized in practice, techniques are required to
approximate the impulse response for practical implementations. This section will briefly overview
the design techniques used to develop approximations based on finite impulse response (FIR) filters.
An FIR filter that approximates the ideal shifting filter has the following form:

HF(z) = Σ from n=0 to N of h(n) z^(−n)    (A.11)

where N indicates the filter order, such that the filter consists of N + 1 coefficients. To determine the coefficients of h(n) that approximate the ideal filter, an error function is defined:

E(e^(jω)) = Hid(e^(jω)) − HF(e^(jω)).    (A.12)
Laakso et al. obtain a time-domain error criterion by applying the L2 norm to Equation A.12 and applying Parseval's Theorem [35], which yields

eL2 = Σ from n=−∞ to ∞ of |hF(n) − hid(n)|².    (A.13)

The optimal solution for hF(n) as per Equation A.13 is the ideal impulse response truncated and delayed by the required number of samples. The error decreases as the number of samples used to approximate the sinc function is increased.
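This error behavior can be sketched directly: since the truncated filter matches the ideal sinc inside its window, the residual L2 error of Equation A.13 is just the energy of the ideal response that falls outside the window, which shrinks as N grows. The Python sketch below uses an illustrative delay D = 3.3 and a wide sum as a stand-in for the infinite one.

```python
import numpy as np

D = 3.3
n_wide = np.arange(-500, 500)
# Total energy of the ideal response; by Parseval this approaches 1.
total_energy = np.sum(np.sinc(n_wide - D) ** 2)

def truncated_sinc_fd(N, D):
    """Length-(N+1) FIR approximation of a D-sample delay: the ideal
    sinc impulse response truncated to n = 0..N (the L2-optimal choice
    per Eq. A.13)."""
    n = np.arange(N + 1)
    return np.sinc(n - D)

# Residual L2 error = ideal-response energy left outside the window;
# it decreases monotonically as the filter order N is increased.
errs = [total_energy - np.sum(truncated_sinc_fd(N, D) ** 2)
        for N in (5, 11, 21)]
print(errs)
```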
A.3.1 Delay Approximation using Lagrange Interpolation Filters
A consequence of implementing fractional delay (FD) filters based on a truncated sinc function is the well-known Gibbs phenomenon [54]. Essentially, the Gibbs phenomenon results from truncating the impulse response of a sinc function with a rectangular window, which causes the FD filter's magnitude response to exhibit a ripple due to side lobe interaction. This rippling is often undesirable and thus more sophisticated techniques are required to design FD filters with relatively flat magnitude responses.
Lagrange interpolation filters allow for FD filter design with a maximally flat magnitude response at a frequency of interest. The coefficients of the Lagrange filters are obtained by setting the derivatives of Equation A.12 equal to zero:

d^n E(e^(jω)) / dω^n evaluated at ω = ω0 is 0, for n = 0, 1, 2, . . . , N    (A.14)

In most cases, it is desired that the maximally flat magnitude response occur near DC, which requires ω0 = 0. Equation A.14 yields a system of N + 1 linear equations, which has the following solution:

h(n) = Π over k = 0 to N, k ≠ n, of (D − k)/(n − k), for n = 0, 1, 2, . . . , N    (A.15)

where D is the total delay including the fractional component [35, 53]. The name of the Lagrange interpolation filter becomes obvious when considering the case N = 1, which yields coefficients h(0) = 1 − D and h(1) = D, equivalent to a linear interpolation between two samples.
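Equation A.15 translates directly into a short routine. The Python sketch below computes the coefficients by the closed-form product; the N = 1 case recovers linear interpolation, confirming the remark above.

```python
import numpy as np

def lagrange_fd(N, D):
    """Lagrange fractional-delay FIR coefficients (Eq. A.15):
    h(n) = prod over k != n of (D - k) / (n - k), for n = 0..N."""
    h = np.ones(N + 1)
    for n in range(N + 1):
        for k in range(N + 1):
            if k != n:
                h[n] *= (D - k) / (n - k)
    return h

# N = 1 reduces to linear interpolation: h(0) = 1 - D, h(1) = D.
print(lagrange_fd(1, 0.3))

# For higher orders, D should include the bulk delay floor(N/2) so the
# fractional part sits near the center of the filter.
print(lagrange_fd(3, 1 + 0.3))
```

Since the coefficients are Lagrange basis polynomials evaluated at D, they sum to one, which gives the filter its unity gain at DC.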
Figure A.2 illustrates the tradeoffs associated with designing Lagrange filters with a desired accuracy. As the order N is increased, the values of h(n) approach those of the ideal fractional delay filter at the expense of added integer sample delay. Figure A.3 demonstrates the tradeoffs associated with the order of the Lagrange FD filter and its frequency response. As N increases, the cutoff frequency of the filter's magnitude response increases, thus providing a flatter magnitude response across a wider bandwidth. Similarly, we also see the tradeoff associated with the group delay of the FD filter, since increasing N maintains the desired flat group delay response over a wider bandwidth. For an Nth-order Lagrange FD filter designed for a maximally flat response at DC, the associated bulk integer delay, dI, at this frequency can be computed as ⌊N/2⌋.
Figure A.2: Lagrange interpolation filters with order N = 3 (top) and N = 7 (bottom) providing a fractional delay, dF = 0.3. As the order of the filter is increased, the Lagrange filter coefficients approach the values of the ideal function.
A.4 Further Considerations
Lagrange interpolation filters are a popular choice in many waveguide synthesis systems, since the filter coefficients are relatively easy to compute and the frequency response characteristics are sufficient for relatively low-order filters. In general, FIR FD filters are preferred for musical synthesis because they can be varied during synthesis to achieve certain effects (such as pitch bending or vibrato) without the noticeable transient artifacts that are problematic when using IIR FD filters.

Figure A.3: Frequency response characteristics of Lagrange interpolation filters with order N = 3, 5, 7 providing a fractional delay dF = 0.3. Magnitude (top) and group delay (bottom) characteristics are plotted.
FD filter design based on a maximally flat frequency characteristic is just one of many techniques used for designing FD filters. The reader is referred to the work of Laakso and Valimaki for additional FD design techniques, including windowed sinc functions, weighted least-squares and IIR design techniques [35, 87]. Additionally, these works provide the theory and techniques required to develop IIR FD filters using all-pass filters, which have their own benefits over FIR implementations.
Appendix B: Pitch Glide Modeling
B.1 Overview
Pitch glide is an important physical consequence of plucking a guitar string. As a plucked string vibrates about its equilibrium, or resting, position, it is subject to elongation beyond its nominal length. This elongation increases the tension of the string beyond its nominal value and, consequently, increases the fundamental frequency of vibration. As shown in the following equation, the fundamental frequency of vibration is proportional to the square root of the string's tension:

f0 = √(Kt / (4mL))    (B.1)

where Kt, m and L are the string's tension, mass and length, respectively [17]. Since the string loses energy during vibration due to various frictional forces, the amplitude of its transverse displacement decreases over time and thus the elongation decreases as well. After some amount of time, the string vibrates near its nominal, or un-stretched, length, and a steady state pitch is perceived.
Modeling and simulation of pitch glide is an important consideration for an expressive guitar system, since it can lead to tones that have a noticeably higher pitch near the "attack" portion of the note than they do some time later. The amount of pitch glide present in a tone depends on the guitarist's dynamics, or the relative "hardness" used to displace the string. Therefore, as a guitarist increases their dynamics during performance, we expect the resulting notes to have a greater perceived pitch initially than some time after the "attack" phase.
This appendix will discuss the modeling and implementation of pitch glide for expressive guitar synthesis, including the estimation of time-varying pitch from plucked-guitar recordings, the fitting of estimated data to a pitch glide model, and practical implementation.
B.2 Pitch Glide Model
The following model was proposed by Lee et al. [42] to simulate the pitch glide trajectory observed in recorded guitar tones:

f(t) = fss(1 + αe^(−t/τ)).    (B.2)

This representation consists of multiplying the steady state pitch value fss, which is associated with the nominal tension of the string, by an exponentially decaying function with time constant τ and a multiplicative factor α. This model ensures that the tone decays to its steady state pitch as t → ∞, which agrees with the physicality of the damped vibrating string. The multiplicative factor α determines the amount of pitch excursion, such that increasing α increases the amount of pitch deviation from the steady state value. By setting α to an arbitrarily small (or zero) value, the pitch glide effect is effectively eliminated, so that f(t) ≈ fss for all values of t. For a physical interpretation of Equation B.2, Lee relates the time-varying fundamental frequency to the square of the slope of the string's displacement, which decays exponentially over time [42].
The pitch glide model of Equation B.2 is suitable for an expressive synthesis system because its parameters can be related to particular articulations. In particular, the α parameter allows the amount of pitch glide to vary based on the dynamics used by the player. This parameter, and the others, must be determined through analysis of plucked-guitar recordings.
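Equation B.2 is straightforward to evaluate. The Python sketch below uses illustrative α and τ values (not the fitted values of Table B.1) to show the expected behavior: the pitch starts sharp of the nominal value and decays monotonically toward fss.

```python
import numpy as np

def pitch_glide(t, f_ss, alpha, tau):
    """Pitch glide trajectory of Eq. B.2: f(t) = f_ss * (1 + alpha * exp(-t/tau))."""
    return f_ss * (1.0 + alpha * np.exp(-t / tau))

# Illustrative parameters: an open A string (110 Hz) with a small glide.
t = np.linspace(0.0, 1.5, 6)
f = pitch_glide(t, f_ss=110.0, alpha=5e-3, tau=0.35)

# Pitch begins above f_ss and settles back toward it.
print(f[0], f[-1])
```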
B.3 Pitch Glide Measurement
In this section we discuss the estimation of pitch glide parameters through analysis of plucked-guitar recordings. The data set used for parameter estimation consists of approximately 1000 samples of guitar tones recorded using a bridge-mounted piezoelectric pickup. The recorded notes span all 6 guitar strings and were produced by varying the plucking device and articulation from piano (soft) to mezzo-forte (moderately loud) to forte (loud). More information about the data is provided in Section 6.3.
The first step involves acquisition of the pitch glide data from the recordings. A short-time analysis is applied to the recordings to extract 1500 msec of pitch information for each tone, beginning at the "attack" instant of the tone. This audio segment is sub-divided into overlapping frames, each having a duration of 90 msec, with adjacent frames overlapped by a factor of 90%.
For each analysis frame, the Fast Fourier Transform (FFT) is computed and the pitch is determined by searching for the prominent peak in the frequency spectrum. The frequency bin underlying the spectral peak indicates the pitch of the vibrating string at that moment. This pitch estimate is improved via quadratic interpolation around the spectral peak [77]. Utilizing the peak FFT bin and the magnitudes of the neighboring bins on each side of the peak, the "true" peak is found by locating the maximum of the parabola passing through all three points. The frequency underlying this maximum is taken as the "true" frequency. This step improves the pitch estimate by compensating for the limited frequency resolution of the FFT.
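The per-frame estimate can be sketched as follows. The window choice, frame length and the hypothetical 110.4 Hz test tone are illustrative; the parabola vertex offset is the standard p = 0.5(a − c)/(a − 2b + c) applied to the log magnitudes at the peak bin and its two neighbors [77].

```python
import numpy as np

def frame_pitch(frame, fs):
    """Estimate the pitch of one analysis frame: FFT peak search plus
    quadratic (parabolic) interpolation around the peak bin."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    k = np.argmax(spec[1:]) + 1              # skip the DC bin
    a, b, c = np.log(spec[k - 1:k + 2] + 1e-12)
    p = 0.5 * (a - c) / (a - 2 * b + c)      # parabola vertex offset, in bins
    return (k + p) * fs / len(frame)         # "true" frequency of the peak

# Hypothetical test: 110.4 Hz falls between the FFT bins of a 90 msec frame,
# so the raw peak bin alone would be off by a fraction of the bin width.
fs = 44100
t = np.arange(int(0.09 * fs)) / fs
f_est = frame_pitch(np.sin(2 * np.pi * 110.4 * t), fs)
print(f_est)
```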
By repeating the pitch estimation for each frame, a pitch trajectory is obtained for each recording in the data set. Since the approach involves determining the parameters of Equation B.2 from many recordings, each pitch trajectory is normalized by its steady state frequency, f(t)/fss. By dividing Equation B.2 by fss, the measured data must be fit to the following equation:

fnorm(t) = 1 + αe^(−t/τ),    (B.3)

where fnorm(t) = f(t)/fss is the normalized pitch trajectory. The normalized pitch trajectories corresponding to recordings produced with a specific articulation (e.g. piano) are averaged to compute pitch trajectory prototype curves used for model fitting.
B.4 Nonlinear Modeling and Data Fitting
B.4.1 Nonlinear Least Squares Formulation
To determine the model parameters that best describe the measured pitch glide trajectories, a nonlinear least-squares (NLLS) problem is formulated. The problem formulation involves defining a residual function

r(t) = f(t) − F(t, x),    (B.4)

where f is a prototype pitch glide curve measured from audio and F(t, x) is the pitch glide function in Equation B.3 with unknown parameters x = [α τ]. The optimal parameters x* minimize the sum of squares of the residual, defined by

S(x) = Σ over t of r(t)².    (B.5)
The unknowns in x are found by taking the gradient of S with respect to x and setting it equal to zero:

∂S/∂xi = 2 Σ over t of rt (∂rt/∂xi) = 0, for i = 1, 2.    (B.6)

Equation B.6 lacks a closed form solution, since the partial derivatives ∂rt/∂xi of the nonlinear function depend on both the independent variable and the unknown parameters. In practice, nonlinear least squares problems are solved using iterative methods, where initial values of the unknown parameters in x are specified and iteratively refined using successive approximation [32]. Each iteration linearizes the model through a Taylor series expansion by ignoring the higher order, nonlinear terms.
The algorithm chosen for successive approximation in this implementation is the Gauss-Newton iteration, which is available in many numerical software packages. The MATLAB function lsqnonlin applies NLLS approximation using the Gauss-Newton iteration by default [48]. This function allows the programmer to specify the nonlinear function desired for curve fitting as well as the initial parameter estimates, bounds for the unknown parameters, the maximum number of iterations and several other options.
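The Gauss-Newton iteration for this particular model can be sketched without a toolbox. The Python code below (a stand-in for the MATLAB lsqnonlin workflow) fits Equation B.3 to a synthetic normalized trajectory; the "true" α and τ values and the initial guesses are illustrative, not drawn from the thesis data set.

```python
import numpy as np

# Synthetic "measured" normalized trajectory (illustrative parameters).
t = np.linspace(0.0, 1.5, 200)
alpha_true, tau_true = 12e-4, 0.23
f_meas = 1.0 + alpha_true * np.exp(-t / tau_true)

def gauss_newton_fit(t, f, x0, iters=100):
    """Fit f_norm(t) = 1 + alpha * exp(-t/tau) (Eq. B.3) by Gauss-Newton:
    linearize the model about the current estimate and solve the
    resulting linear least-squares problem for the parameter step."""
    alpha, tau = x0
    for _ in range(iters):
        e = np.exp(-t / tau)
        r = f - (1.0 + alpha * e)                      # residual, Eq. B.4
        # Jacobian of the model F(t, x) with respect to x = [alpha, tau]
        J = np.column_stack([e, alpha * t * e / tau**2])
        step, *_ = np.linalg.lstsq(J, r, rcond=None)
        alpha, tau = alpha + step[0], tau + step[1]
    return alpha, tau

alpha_hat, tau_hat = gauss_newton_fit(t, f_meas, x0=[1e-3, 0.3])
print(alpha_hat, tau_hat)
```

On noiseless data the iteration recovers the generating parameters; with measured prototype curves, bounds and step damping (as lsqnonlin provides) improve robustness.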
B.4.2 Fitting and Results
We first extract the pitch glide parameters for the forte articulations using MATLAB's lsqnonlin function. The results of this fit are shown in Figure B.1.

Using the time constant τ estimated for the forte pitch glide curve, we constrain the NLLS algorithm for the remaining piano and mezzo-forte curves by enforcing the same τ value for all curves. This results in all pitch glide curves having the same time constant but differing α parameters, which determine the maximum amount of pitch deviation from the steady state value. In this manner, α acts as an expressive control parameter which can be varied to continuously interpolate between the piano and forte pitch glide curves. Figure B.2 shows the observed and estimated pitch glide curves for each articulation and clearly shows the effect of the α parameter on the initial pitch glide value. The extracted parameters for each articulation are summarized in Table B.1.
Figure B.1: Measured and modeled pitch glide for forte plucks.
B.5 Implementation
For implementation of the pitch glide effect in a plucked-guitar synthesis system, we employ the well-known single delay-loop model, which was presented in Chapter 2 and is shown in Figure B.3. The pitch of the synthetic tone is determined by the ratio fs/D, where fs is the sampling frequency and D is the delay line length. Since the required delay D = fs/f0 is often non-integer, HF(z) provides the fractional portion of the delay. Appendix A provides an overview of fractional delay filters.

The fractional delay filter chosen is a variable 5th-order Lagrange interpolation filter inserted into the feedback loop of the single delay-loop model, as shown in Figure B.3. Equation B.3 can be multiplied by the desired steady state pitch value to achieve the correct tuning. The pitch glide is implemented by updating the coefficients of HF(z) every 50 milliseconds according to the prototype curve for a particular articulation. Updating the coefficients in this manner is possible since the single delay-loop model is implemented as a direct form I IIR filter, which has separate delay lines for the input and output feedback [77].
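A minimal sketch of this scheme is given below in Python. It is illustrative, not the thesis implementation: the loop loss filter Hl(z) is collapsed to a plain gain, the excitation is a noise burst, and the buffer layout and parameter values are assumptions. The essential mechanism is retained: every 50 ms the total loop delay is recomputed from the glide curve of Equation B.2, split into an integer part and a 5th-order Lagrange fractional part (Appendix A), and the Lagrange coefficients are refreshed.

```python
import numpy as np

def lagrange_fd(N, D):
    """Lagrange fractional-delay coefficients (Eq. A.15 of Appendix A)."""
    h = np.ones(N + 1)
    for n in range(N + 1):
        for k in range(N + 1):
            if k != n:
                h[n] *= (D - k) / (n - k)
    return h

def pluck_with_glide(fs, f_ss, alpha, tau, dur=0.5, N=5, loss=0.98):
    """Single delay-loop string whose total delay tracks the glide curve
    f(t) = f_ss * (1 + alpha * exp(-t/tau)).  Plain-gain loss stands in
    for the loss filter Hl(z); structure and values are illustrative."""
    n_out = int(dur * fs)
    d_max = int(fs / f_ss) + N + 2            # buffer covers the longest delay
    buf = np.random.default_rng(0).uniform(-1.0, 1.0, d_max)  # noise excitation
    y = np.empty(n_out)
    idx = 0                                   # circular write index
    update = max(1, int(0.05 * fs))           # refresh coefficients every 50 ms
    for i in range(n_out):
        if i % update == 0:
            f_t = f_ss * (1.0 + alpha * np.exp(-(i / fs) / tau))
            D = fs / f_t                      # total loop delay, in samples
            d_int = int(D) - N // 2           # integer part, chosen so the
            h = lagrange_fd(N, D - d_int)     # Lagrange delay stays centered
        taps = buf[(idx - d_int - np.arange(N + 1)) % d_max]
        y[i] = loss * np.dot(h, taps)         # fractional read + loop loss
        buf[idx] = y[i]
        idx = (idx + 1) % d_max
    return y

y = pluck_with_glide(fs=8000, f_ss=110.0, alpha=2e-3, tau=0.35)
```

Because the Lagrange filter is FIR, the 50 ms coefficient updates cause no feedback-state transients, which is the motivation stated above for preferring FIR FD filters.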
Figure B.2: Measured and modeled pitch glide for piano, mezzo-forte and forte plucks.
Figure B.3: Single delay-loop waveguide filter with variable fractional delay filter, HF(z).
Table B.1: Pitch glide parameters of Equation B.3 for plucked guitar tones for each guitar string. p, mf and f indicate strings excited with piano, mezzo-forte and forte dynamics, respectively.

String  Dynamic  α (×10⁻⁴)  τ
1       p        1.523      0.2284
1       mf       3.123      0.2284
1       f        11.94      0.2284
2       p        9.337      0.4037
2       mf       19.41      0.4037
2       f        44.39      0.4037
3       p        16.45      0.3958
3       mf       35.51      0.3958
3       f        72.91      0.3958
4       p        26.03      0.3766
4       mf       36.55      0.3766
4       f        60.89      0.3766
5       p        35.04      0.3786
5       mf       60.21      0.3786
5       f        68.28      0.3786
6       p        38.03      0.3523
6       mf       62.76      0.3523
6       f        81.24      0.3523
Bibliography
[1] Amidio. OMGuitar advanced guitar synth. http://amidio.com/portfolio/omguitar/, Jan. 2012.
[2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.
[3] B. Bank and V. Valimaki. Robust loss filter design for digital waveguide synthesis of string tones. IEEE Signal Processing Letters, 10(1):18–20, Jan. 2003.
[4] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 2005.
[5] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, 2006.
[6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.
[7] K. Bradley, Mu-Huo Cheng, and V. L. Stonick. Automated analysis and computationally efficient synthesis of acoustic guitar strings and body. In Proc. of the IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, pages 238–241, Oct. 1995.
[8] John M. Chowning. The synthesis of complex audio spectra by means of frequency modulation. J. Audio Eng. Soc., 21(7):526–534, 1973.
[9] Perry R. Cook, editor. Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics. MIT Press, Cambridge, MA, USA, 1999.
[10] Perry R. Cook. Real Sound Synthesis for Interactive Applications. A. K. Peters, Ltd., Natick, MA, USA, 2002.
[11] Perry R. Cook and Gary P. Scavone. The Synthesis ToolKit (STK). In Proc. of the International Computer Music Conference, 1999.
[12] G. Cuzzucoli and V. Lombardo. A physical model of the classical guitar, including the player's touch. Computer Music Journal, 23(2):52–69, Jun. 1999.
[13] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, 2001.
[14] C. Erkut, V. Valimaki, M. Karjalainen, and M. Laurson. Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar. In 108th AES Int. Convention, pages 19–22, Paris, France, Feb. 2000. AES.
[15] Fishman. Pickups: Tune-o-matic powerbridge pickup. http://www.fishman.com/products/view/tune-o-matic-powerbridge-pickup, Apr. 2012.
[16] N. H. Fletcher. The nonlinear physics of musical instruments. Technical Report 62, Institute of Physics Publishing, 1999.
[17] N. H. Fletcher and T. D. Rossing. The Physics of Musical Instruments. Springer, 1998.
[18] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx.
[19] J. Gudnason, M. R. P. Thomas, P. A. Naylor, and D. P. W. Ellis. Voice source waveform analysis and synthesis using principal component analysis and Gaussian mixture modelling. In Proc. of the 2009 Annual Conference of the International Speech Communication Association, Brighton, U.K., Sept. 2009. INTERSPEECH.
[20] Apple Inc. GarageBand. http://itunes.apple.com/us/app/garageband/id408709785?mt=8, Jan. 2012.
[21] ISO. Information technology - coding of audio-visual objects - part 3: Audio. http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=53943, Nov. 2011.
[22] D. A. Jaffe and J. O. Smith. Extensions of the Karplus-Strong plucked-string algorithm. Computer Music Journal, 7(2):56–69, Jun. 1983.
[23] J.-M. Jot. An analysis/synthesis approach to real-time artificial reverberation. In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 221–224. ICASSP, Mar. 1992.
[24] M. Karjalainen, A. Harma, U. K. Laine, and J. Huopaniemi. Warped filters and their audio applications. In Proc. of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA, Oct. 1997.
[25] M. Karjalainen and U. K. Laine. A model for real-time sound synthesis of guitar on a floating-point signal processor. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 5, pages 3653–3656. ICASSP, Apr. 1991.
[26] M. Karjalainen, T. Maki-Patola, A. Kanerva, A. Huovilainen, and P. Janis. Virtual air guitar. In Proc. of the 117th Audio Engineering Society Convention. AES, Oct. 2004.
[27] M. Karjalainen, H. Penttinen, and V. Valimaki. Acoustic sound from the electric guitar using DSP techniques. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 773–776. ICASSP, 2000.
[28] M. Karjalainen and J. O. Smith. Body modeling techniques for string instrument synthesis. In Proc. of the International Computer Music Conference. ICMC, 1996.
[29] M. Karjalainen, V. Valimaki, and Z. Janosy. Towards high-quality sound synthesis of the guitar and string instruments. In Proc. of the International Computer Music Conference. ICMC, Sept. 1993.
[30] M. Karjalainen, V. Valimaki, and T. Tolonen. Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond. Computer Music Journal, 22(3):17–32, Oct. 1998.
[31] K. Karplus and A. Strong. Digital synthesis of plucked-string and drum timbres. Computer Music Journal, 7(2):43–55, Jun. 1983.
[32] C. T. Kelley. Iterative Methods for Optimization. Frontiers in Applied Mathematics, SIAM, 1999.
[33] L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders. Fundamentals of Acoustics. Wiley, 3rd edition, 1982.
[34] Mark A. Kramer. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2):233–243, 1991.
[35] T. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Processing Magazine, 13(1):30–60, Jan. 1996.
[36] J. Laroche and J.-L. Meillier. Multichannel excitation/filter modeling of percussive sounds with application to the piano. IEEE Transactions on Speech and Audio Processing, 2(2):329–344, Apr. 1994.
[37] B. P. Lathi. Signal Processing and Linear Systems. Oxford University Press, Inc., New York, NY, 1998.
[38] N. Laurenti, G. De Poli, and D. Montagner. A nonlinear method for stochastic spectrum estimation in the modeling of musical sounds. IEEE Transactions on Audio, Speech, and Language Processing, 15(2):531–541, Feb. 2007.
[39] M. Laurson, C. Erkut, V. Valimaki, and M. Kuushankare. Methods for modeling realistic playing in acoustic guitar synthesis. Computer Music Journal, 25(3):38–49, Oct. 2001.
[40] N. Lee, R. Cassidy, and J. O. Smith. Use of energy decay relief (EDR) to estimate partial-overtone decay-times in a freely vibrating string. Invited paper at the Musical Acoustics Sessions at the Joint ASA-ASJ Meeting, Honolulu, HI, 2006. ASA.
[41] N. Lee, Z. Duan, and J. O. Smith. Excitation signal extraction for guitar tones. In Proc. of the International Computer Music Conference. ICMC, 2007.
[42] N. Lee, J. O. Smith, J. Abel, and D. Berners. Pitch glide analysis and synthesis from recorded tones. In Proc. of the International Conference on Digital Audio Effects, Como, Italy, Sept. 2009. DAFx.
[43] N. Lee, J. O. Smith, and V. Valimaki. Analysis and synthesis of coupled vibrating strings using a hybrid modal-waveguide synthesis model. IEEE Transactions on Audio, Speech and Language Processing, 18(4):833–842, May 2010.
[44] N. Lindroos, H. Penttinen, and V. Valimaki. Parametric electric guitar synthesis. Computer Music Journal, 35(3):18–27, Sept. 2011.
[45] Line6. Line6 modeling amplifiers. http://line6.com/amps, May 2012.
[46] Line6. Line6 Variax guitars. http://line6.com/guitars, May 2012.
[47] MathWorks. Optimization Toolbox 5.0. http://www.mathworks.com/products/optimization/, August 2010.
[48] MathWorks. Curve Fitting Toolbox 3.0. http://www.mathworks.com/products/curvefitting/, November 2011.
[49] D. Mazzoni and R. Dannenberg. Audacity. http://audacity.sourceforge.net/, Oct. 2011.
[50] R. McAulay and T. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(4):744–754, Aug. 1986.
[51] P. Mokhtari, H. R. Pfitzinger, and C. T. Ishi. Principal components of glottal waveforms: towards parameterisation and manipulation of laryngeal voice quality. In VOQUAL '03, 2003.
[52] P. M. Morse and K. U. Ingard. Theoretical Acoustics. McGraw-Hill Education, New York, NY, USA, 1968.
[53] G. Oetken. A new approach for the design of digital interpolating filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(6):637–643, Dec. 1979.
[54] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing. Prentice-Hall, Inc., Upper Saddle River, New Jersey, 1999.
[55] C. O'Shea. Kinect air guitar prototype. http://www.chrisoshea.org/lab/air-guitar-prototype, Jan. 2012.
[56] J. Pakarinen, T. Puputti, and V. Valimaki. Virtual slide guitar. Computer Music Journal, 32(3):42–54, 2008.
[57] H. Penttinen, M. Karjalainen, T. Paatero, and H. Jarvelainen. New techniques to model reverberant instrument body responses. In Proc. of the International Computer Music Conference. ICMC, 2001.
[58] H. Penttinen, J. Siiskonen, and V. Valimaki. Acoustic guitar plucking point estimation in real time. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 209–212. ICASSP, Mar. 2005.
[59] H. Penttinen and V. Valimaki. Time-domain approach to estimating the plucking point of guitar tones obtained with an under-saddle pickup. Applied Acoustics, 65:1207–1220, Dec. 2004.
[60] Thomas Quatieri. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall Press, Upper Saddle River, NJ, USA, 2001.
[61] L. Rabiner. On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech and Signal Processing, 25(1):24–33, Feb. 1977.
[62] Janne Riionheimo and Vesa Valimaki. Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness calculation. EURASIP J. Appl. Signal Process., 2003:791–805, 2003.
[63] M. Roma, L. Gonzalez, and F. Briones. Software based acoustic guitar simulation by means of its impulse response. In 10th Meeting on Audio Engineering of the AES. AES, Lisbon, Portugal, 2009.
[64] Thomas D. Rossing, editor. The Science of String Instruments, chapter 23. Springer Science+Business Media, New York, NY, USA, 1st edition, 2010.
[65] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linearembedding. Science, 290(5500):2323–2326, 2000.
[66] E.D. Scheirer. The MPEG-4 structured audio standard. In Proc. of the IEEE InternationalConference on Acoustics, Speech and Signal Processing, volume 6, pages 3801 –3804 vol.6.ICASSP, may 1998.
[67] M. Scholz. Nonlinear PCA toolbox for MATLAB. http://www.nlpca.de/matlab.html, 2011.
[68] Xavier Serra and J. O. Smith. Spectral modeling synthesis: A sound analysis/synthesis systembased on a deterministic plus stochastic decomposition. Computer Music Journal, 14(4):pp.12–24, 1990.
[69] J. O. Smith. Techniques for Digital Filter Design and System Identification with Application tothe Violin. PhD thesis, Department of Music, Stanford University, Stanford, CA, Jun. 1983.
[70] J. O. Smith. Music applications of digital waveguides. Technical report, CCRMA, MusicDepartment, Stanford University, 1987.
[71] J. O. Smith. Waveguide filter tutorial. In Proc. of the International Computer Music Conference,pages 9–16. Computer Music Association, 1987.
[72] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):74–91,1992.
[73] J. O. Smith. E�cient synthesis of stringed musical instruments. In Proc of the InternationalComputer Music Conference, Tokyo, Japan, 1993. ICMC.
[74] J. O. Smith. Virtual electric guitars and e↵ects using faust and octave. In Proc of InternationalLinux Audio Conference, Cologne, Germany, 2008.
[75] J. O. Smith. Digital waveguide architectures for virtual musical instruments. In David Havelock,Sonoko Kuwano, and Michael Vorlander, editors, Handbook of Signal Processing in Acoustics,pages 399–417. Springer New York, 2009.
[76] J. O. Smith. Physical Audio Signal Processing. W3K Publishing, 2010. online book.
[77] J. O. Smith. Spectral Audio Signal Processing, October 2008 Draft. CCRMA Stanford, August22, 2010. online book.
[78] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework fornonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[79] T. Tolonen, C. Erkut, V. Valimaki, and M. Karjalaineen. Simulation of plucked strings exhibit-ing tension modulation driving force. In Proc. of the International Computer Music Conference.ICMC, 1999.
[80] T. Tolonen, V. Valimaki, and M. Karjalainen. Modeling of tension modulation nonlinearity inplucked strings. IEEE Transactions on Speech and Audio Processing, 8(3):300–310, May 2000.
[81] C. Traube and P. Depalle. Extraction of the excitation point location on a string using weightedleast-square estimation of a comb filter delay. In Proc. of International Conference on DigitalAudio E↵ects, London, UK, Sept. 2003. DAFx.
118
[82] C. Traube, P. Depalle, and M. Wanderley. Indirect acquisition of instrumental gesture based onsignal, physical and perceptual information. In Proc. of New Interfaces for Musical Expression,pages 42–47, Montreal, Canada, 2003. NIME.
[83] C. Traube and J. O. Smith. Estimating the plucking point on a guitar string. In COST G-6 Conference on Digital Audio Effects. DAFx, Dec. 2000.
[84] C. Traube and J. O. Smith. Extracting the fingering and the plucking points on a guitar string from a recording. In Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 7–10. WASPAA, 2001.
[85] V. Valimaki. Discrete-Time Modeling of Acoustic Tubes Using Fractional Delay Filters. PhD thesis, Helsinki University of Technology, Espoo, Finland, 1995.
[86] V. Valimaki, J. Huopaniemi, M. Karjalainen, and Z. Janosy. Physical modeling of plucked string instruments with application to real-time sound synthesis. Journal of the Audio Engineering Society, 44(5):331–353, May 1996.
[87] V. Valimaki and T. Laakso. Principles of fractional delay filters. In Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, Jun. 2000. ICASSP.
[88] V. Valimaki, H. Lehtonen, and T. Laakso. Musical signal analysis using fractional-delay inverse comb filters. In Proc. of International Conference on Digital Audio Effects, Bordeaux, France, Sept. 2007. DAFx.
[89] V. Valimaki, J. Pakarinen, C. Erkut, and M. Karjalainen. Discrete-time modeling of musical instruments. Technical report, Institute of Physics Publishing, Oct. 2005.
[90] V. Valimaki and T. Tolonen. Development and calibration of a guitar synthesizer. Journal of the Audio Engineering Society, 46(9):766–778, Sept. 1998.
[91] V. Valimaki, T. Tolonen, and M. Karjalainen. Signal-dependent nonlinearities for physical models using time-varying fractional delay filters. In International Computer Music Conference, pages 264–267, Oct. 1998.
[92] B. L. Vercoe and D. P. Ellis. Real-time Csound: Software synthesis with sensing and control. In International Computer Music Conference, 1990.
[93] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer. Structured audio: creation, transmission, and rendering of parametric sound representations. Proceedings of the IEEE, 86(5):922–940, May 1998.
VITA
Raymond Vincent Migneco
EDUCATION
Ph.D. Electrical & Computer Engineering, Drexel University, Philadelphia, PA, 2012
M.S. Electrical & Computer Engineering, Drexel University, Philadelphia, PA, 2011
B.S. Electrical Engineering, The Pennsylvania State University, University Park, PA, 2005
ACADEMIC HONORS
Eta Kappa Nu Electrical & Computer Engineering Honor Society
Dean’s List Honors, Drexel University and The Pennsylvania State University
PROFESSIONAL EXPERIENCE
Graduate Research Assistant, Drexel University, 9/2007 - 6/2012
Electrical Reliability Engineer, Sunoco Chemicals, 8/2005 - 8/2007
TEACHING EXPERIENCE
Teaching Assistant, Drexel University, 9/2007 - 6/2011
NSF Discovery K-12 Fellow, Drexel University, 3/2008 - 6/2009
Teaching Assistant, The Pennsylvania State University, 1/2005 - 5/2005
SELECTED PUBLICATIONS
• Migneco, R., and Kim, Y. E. (2012). “A Component-Based Approach for Modeling Plucked-Guitar Excitation Signals,” Proceedings of the International Conference on New Interfaces for Musical Expression, Ann Arbor, MI: NIME.
• Batula, A. M., Morton, B. G., Migneco, R., Prockup, M., Schmidt, E. M., Grunberg, D. K., Kim, Y. E., and Fontecchio, A. K. (2012). “Music Technology as an Introduction to STEM,” Proceedings of the American Society for Engineering Education Annual Conference, San Antonio, TX: ASEE.
• Migneco, R., and Kim, Y. E. (2011). “Excitation Modeling and Synthesis for Plucked Guitar Tones,” Proceedings of the 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY: WASPAA.
• Migneco, R., and Kim, Y. E. (2011). “Modeling Plucked Guitar Tones Via Joint Source-Filter Estimation,” Proceedings of the 14th IEEE Digital Signal Processing Workshop and 6th IEEE Signal Processing Education Workshop, Sedona, AZ: DSP/SPE.
• Scott, J., Migneco, R., Morton, B., Hahn, C. M., Diefenbach, P., and Kim, Y. E. (2010). “An audio processing library for MIR application development in Flash,” Proceedings of the 2010 International Society for Music Information Retrieval Conference, Utrecht, Netherlands: ISMIR.
• Migneco, R., Doll, T. M., Scott, J. J., Hahn, C., Diefenbach, P. J., and Kim, Y. E. (2009). “An audio processing library for game development in Flash,” Accepted to International IEEE Consumer Electronics Society’s Games Innovations Conference.
• Kim, Y. E., Doll, T. M., and Migneco, R. (2009). “Collaborative online activities for acoustics education and psychoacoustic data collection,” in IEEE Transactions on Learning Technologies.
• Doll, T. M., Migneco, R., Scott, J. J., and Kim, Y. E. (2009). “An audio DSP toolkit for rapid application development in Flash,” Accepted to IEEE International Workshop on Multimedia Signal Processing, Rio de Janeiro, Brazil: MMSP.