
Analysis and Synthesis of Expressive Guitar Performance

A Thesis

Submitted to the Faculty

of

Drexel University

by

Raymond Vincent Migneco

in partial fulfillment of the

requirements for the degree

of

Doctor of Philosophy

May 2012

© Copyright 2012

Raymond Vincent Migneco. All Rights Reserved.


Table of Contents

List of Tables
List of Figures
Abstract
1 INTRODUCTION
1.1 Contributions
1.2 Overview
2 COMPUTATIONAL GUITAR MODELING
2.1 Sound Modeling and Synthesis Techniques
2.1.1 Wavetable Synthesis
2.1.2 FM Synthesis
2.1.3 Additive Synthesis
2.1.4 Source-Filter Modeling
2.1.5 Physical Modeling
2.2 Summary and Model Recommendation
2.3 Synthesis Applications
2.3.1 Synthesis Engines
2.3.2 Description and Transmission
2.3.3 New Music Interfaces
3 PHYSICALLY INSPIRED GUITAR MODELING
3.1 Overview
3.2 Waveguide Modeling
3.2.1 Solution for the Ideal, Plucked-String
3.2.2 Digital Implementation of the Wave Solution
3.2.3 Lossy Waveguide Model
3.2.4 Waveguide Boundary Conditions
3.2.5 Extensions to the Waveguide Model
3.3 Analysis and Synthesis Using Source-Filter Approximations
3.3.1 Relation to the Karplus-Strong Model
3.3.2 Plucked String Synthesis as a Source-Filter Interaction
3.3.3 SDL Components
3.3.4 Excitation and Body Modeling via Commuted Synthesis
3.3.5 SDL Loop Filter Estimation
3.4 Extensions to the SDL Model
4 SOURCE-FILTER PARAMETER ESTIMATION
4.1 Overview
4.2 Background on Expressive Guitar Modeling
4.3 Excitation Analysis
4.3.1 Experiment: Expressive Variation on a Single Note
4.3.2 Physicality of the SDL Excitation Signal
4.3.3 Parametric Excitation Model
4.4 Joint Source-Filter Estimation
4.4.1 Error Minimization
4.4.2 Convex Optimization
5 SYSTEM FOR PARAMETER ESTIMATION
5.1 Onset Localization
5.1.1 Coarse Onset Detection
5.1.2 Pitch Estimation
5.1.3 Pitch Synchronous Onset Detection
5.1.4 Locating the Incident and Reflected Pulse
5.2 Experiment 1
5.2.1 Formulation
5.2.2 Problem Solution
5.2.3 Results
5.3 Experiment 2
5.3.1 Formulation
5.3.2 Problem Solution
5.3.3 Results
5.4 Discussion
6 EXCITATION MODELING
6.1 Overview
6.2 Previous Work on Guitar Source Signal Modeling
6.3 Data Collection Overview
6.3.1 Approach
6.4 Excitation Signal Recovery
6.4.1 Pitch Estimation and Resampling
6.4.2 Residual Extraction
6.4.3 Spectral Bias from Plucking Point Location
6.4.4 Estimating the Plucking Point Location
6.4.5 Equalization: Removing the Spectral Bias
6.4.6 Residual Alignment
6.5 Component-based Analysis of Excitation Signals
6.5.1 Analysis of Recovered Excitation Signals
6.5.2 Towards an Excitation Codebook
6.5.3 Application of Principal Components Analysis
6.5.4 Analysis of PC Weights and Basis Vectors
6.5.5 Codebook Design
6.5.6 Codebook Evaluation and Synthesis
6.6 Nonlinear PCA for Expressive Guitar Synthesis
6.6.1 Nonlinear Dimensionality Reduction
6.6.2 Application to Guitar Data
6.6.3 Expressive Control Interface
6.7 Discussion
7 CONCLUSIONS
7.1 Expressive Limitations
7.2 Physical Limitations
7.3 Future Directions
Appendix A Overview of Fractional Delay Filters
A.1 Overview
A.2 The Ideal Fractional Delay Filter
A.3 Approximation Using FIR Filters
A.3.1 Delay Approximation using Lagrange Interpolation Filters
A.4 Further Considerations
Appendix B Pitch Glide Modeling
B.1 Overview
B.2 Pitch Glide Model
B.3 Pitch Glide Measurement
B.4 Nonlinear Modeling and Data Fitting
B.4.1 Nonlinear Least Squares Formulation
B.4.2 Fitting and Results
B.5 Implementation
Bibliography
VITA

vi

List of Tables

2.1 Summary of sound synthesis models including their modeling domain and applicable audio signals. Adopted from Vercoe et al. [93].

2.2 Evaluating the attributes of various sound modeling techniques. The boldface tags indicate the optimal evaluation for a particular category.

5.1 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones based on an IIR loop filter.

5.2 Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones using a FIR loop filter with length N = 3.

B.1 Pitch glide parameters of Equation B.3 for plucked guitar tones for each guitar string. p, mf and f indicate strings excited with piano, mezzo-forte and forte dynamics, respectively.

vii

List of Figures

3.1 Traveling wave solution of an ideal string plucked at time t = t1 and its displacement at subsequent time instances t2, t3. The string's displacement (solid) at any position is the summation of the two disturbances (dashed) at that position.

3.2 Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y+) and lower (y−) signal paths represent the right and left traveling disturbances, respectively. The string's displacement is obtained by summing y+ and y− at a desired spatial sample.

3.3 Waveguide model incorporating losses due to propagation at the spatial sampling instances. The dashed lines outline a section where M gain and delay blocks are consolidated using a linear time-invariant assumption.

3.4 Plucked-string waveguide model as it correlates to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string's initial displacement.

3.5 Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop.

3.6 Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively.

3.7 Components for guitar synthesis including excitation, string and body filters. The excitation and body filters may be consolidated for commuted synthesis.

3.8 Overview of the loop filter design algorithm outlined in Section 3.3.5 using short-time Fourier transform analysis on the signal.

4.1 Top: Plucked guitar tones representing various string articulations by the guitarist on the open, 1st string (pitch E4, 329.63 Hz). Bottom: Excitation signals for the SDL model associated with each plucking style.

4.2 The output of a waveguide model is observed over one period of oscillation. The top figure in each subplot shows the position of the traveling acceleration waves at different time instances. The bottom plot traces out the measured acceleration at the bridge (noted by the 'x' in the top plots) over time.

5.1 Proposed system for jointly estimating the source-filter parameters for plucked guitar tones.

5.2 Pitch estimation using the autocorrelation function. The lag corresponding to the global maximum indicates the fundamental frequency for a signal with f0 = 330 Hz.

5.3 Overview of residual onset localization in the plucked-string signal. (a): Coarse onset localization using a threshold based on spectral flux with a large frame size. (b): Pitch-synchronous onset detection utilizing a spectral flux threshold computed with a frame size proportional to the fundamental frequency of the string. (c): Plucked-string signal with coarse and pitch-synchronous onsets overlaid.

5.4 Detail view of the "attack" portion of the plucked-tone signal in Figure 5.3. The pitch-synchronous onset is marked as well as the incident and reflected pulses from the first period of oscillation.

5.5 Pole-zero and magnitude plots of a string filter S(z) with f0 = 330 Hz and a loop filter pole located at α0 = 0.03. The pole-zero and magnitude plots of the system are shown in (a) and (c) and the corresponding plots using an all-pole approximation of S(z) are shown in (b) and (d).

5.6 Analysis and resynthesis of the guitar's 1st string in the "open" position (E4, f0 = 329.63 Hz). Top: Original plucked-guitar tone, residual signal and estimated excitation boundaries. Middle: Resynthesized pluck and excitation using estimated source-filter parameters. Bottom: Modeling error.

5.7 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 1st string at the 2nd fret position (F#4, f0 = 370 Hz) at piano, mezzo-forte and forte dynamics.

5.8 Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 5th string at the 5th fret position (D3, f0 = 146.83 Hz) at piano, mezzo-forte and forte dynamics.

6.1 Source-filter model for plucked-guitar synthesis. C(z) is the feed-forward comb filter simulating the effect of the player's plucking position. S(z) models the string's pitch and decay characteristics.

6.2 Front orthographic projection of the bridge-mounted piezoelectric pickup used to record plucked tones. A piezoelectric crystal is mounted on each saddle, which measures pressure during vibration. Guitar diagram obtained from www.dragoart.com.

6.3 Diagram outlining the residual equalization process for excitation signals.

6.4 "Comb filter" effect resulting from plucking a guitar string (open E, f0 = 331 Hz) 8.4 cm from the bridge. (a) Residual obtained from the single delay-loop model. (b) Residual spectrum. Using Equation 6.2, the notch frequencies are approximately located at multiples of 382 Hz.

6.5 Plucked-guitar tone measured using a piezoelectric bridge pickup. Vertical dashed lines indicate the impulses arriving at the bridge pickup. Δt indicates the arrival time between impulses.

6.6 (a) One period extracted from the plucked-guitar tone in Figure 6.5. (b) Autocorrelation of the extracted period. The minimum is marked and denotes the time lag, Δt, between arriving pulses at the bridge pickup.

6.7 Comb filter structures for simulating the plucking point location. (a) Basic structure. (b) Basic structure with fractional delay filter added to the feedforward path to implement non-integer delay.

6.8 Spectral equalization on a residual signal obtained from plucking a guitar string 8.4 cm from the bridge (open E, f0 = 331 Hz).

6.9 Excitation signals corresponding to strings excited using a pick (a) and finger (b).

6.10 Average magnitude spectra of signals produced with pick (a) and finger (b).

6.11 Application of principal components analysis to a synthetic data set. The vector v1 explains the greatest variance in the data while v2 explains the remaining greatest variance.

6.12 Explained variance of the principal components computed for the set of (a) unwound and (b) wound strings.

6.13 Selected basis vectors extracted from plucked-guitar recordings produced on the 1st, 2nd and 3rd strings.

6.14 Selected basis vectors extracted from plucked-guitar recordings produced on the 4th, 5th and 6th strings.

6.15 Projection of guitar excitation signals into the principal component space. Excitations from strings 1-3 (a) and 4-6 (b).

6.16 Histogram of basis vector occurrences generated with Mtop = 20.

6.17 Excitation synthesis by varying the number of codebook entries: (a) 1 entry, (b) 10 entries, (c) 50 entries.

6.18 Computed signal-to-noise ratio when increasing the number of codebook entries used to reconstruct the excitation signals.

6.19 Architecture for a 3-4-1-4-3 autoassociative neural network.

6.20 Top: Projection of excitation signals into the space defined by the first two linear principal components. Bottom: Projection of the linear PCA weights along the axis defined by the bottleneck layer of the trained 25-6-2-6-25 ANN.

6.21 Guitar data projected along orthogonal principal axes defined by the ANN (center). Example excitation pulses resulting from sampling this space are also shown.

6.22 Tabletop guitar interface for the component-based excitation synthesis. The articulation is applied in the gradient rectangle, while the colored squares allow the performer to key in specific pitches.

A.1 Impulse responses of an ideal shifting filter when the sample delay assumes an integer (top) and non-integer (bottom) number of samples.

A.2 Lagrange interpolation filters with order N = 3 (top) and N = 7 (bottom) to provide a fractional delay, dF = 0.3. As the order of the filter is increased, the Lagrange filter coefficients approach the values of the ideal function.

A.3 Frequency response characteristics of Lagrange interpolation filters with order N = 3, 5, 7 to provide a fractional delay dF = 0.3. Magnitude (top) and group delay (bottom) characteristics are plotted.

B.1 Measured and modeled pitch glide for forte plucks.

B.2 Measured and modeled pitch glide for piano, mezzo-forte and forte plucks.

B.3 Single delay-loop waveguide filter with variable fractional delay filter, HF(z).


Abstract

Analysis and Synthesis of Expressive Guitar Performance
Raymond Vincent Migneco
Advisor: Youngmoo Edmund Kim, Ph.D.

The guitar is one of the most popular and versatile instruments used in Western music cultures. Dating back to the Renaissance era, the guitar can be heard in nearly every genre of Western music, and is arguably the most widely used instrument in present-day rock music. Over the span of 500 years, the guitar has developed a multitude of performance and compositional styles associated with nearly every musical genre, such as classical, jazz, blues and rock. This versatility can be largely attributed to the relatively simple nature of the instrument, which can be built from a variety of materials and optionally amplified. Furthermore, the flexibility of the instrument allows performers to develop unique playing styles, which reflect how they articulate the guitar to convey certain musical expressions.

Over the last three decades, physical and physically-inspired models of musical instruments have emerged as a popular methodology for modeling and synthesizing various instruments, including the guitar. These models are popular since their components relate to the actual mechanisms involved in sound production on a particular instrument, such as the vibration of a guitar string. Since the control parameters are physically relevant, they have a variety of applications, including the control and manipulation of "virtual instruments." Much of the literature on physical modeling for guitars is concerned with calibrating the models from recorded tones to ensure that the behavior of real strings is captured. However, far less emphasis is placed on extracting parameters that pertain to the expressive styles of the guitarist.

This research presents techniques for the analysis and synthesis of plucked guitar tones that are capable of modeling the expressive intentions applied through the guitarist's articulation during performance. A joint source-filter estimation approach is developed to account for the performer's articulation and the corresponding resonant string response. A data-driven, statistical approach for modeling the source signals is also presented in order to capture the nuances of particular playing styles. This research has several pertinent applications, including the development of expressive synthesizers for new musical interfaces and the characterization of performance through audio analysis.


CHAPTER 1: INTRODUCTION

The guitar is one of the most popular and versatile instruments used in Western music cultures. Dating back to the Renaissance period, it has been incorporated into nearly every genre of Western music and, hence, has a rich tradition of design and performance techniques pertaining to each genre. From a cultural standpoint, musicians and non-musicians alike are captivated by the performances of virtuoso guitarists past and present, who introduced innovative techniques that defined or redefined the way the instrument was played. This deep appreciation is no doubt related to the instrument's adaptability, as it is recognized as a primary instrument in many genres, such as blues, jazz, folk, country and rock.

The guitar’s versatility is inherent in its simple design, which can be attributed to its use in

multiple musical genres. The basic components of any guitar consist of a set of strings mounted

across a fingerboard and a resonant body to amplify the vibration of the strings. The tension on

each string is adjusted to achieve a desired pitch when the string is played. Particular pitches are

produced by clamping down each string at a specific location along the fingerboard, which changes

the e↵ective length of the string and, thus, the associated pitch when it is plucked. Frets, which

are metallic strips spanning the width of the fingerboard, are usually installed on the fingerboard to

exactly specify the location of notes in accordance with an equal tempered division of the octave.

The basic design of the guitar has been augmented in a multitude of ways to satisfy the demands of different performers and musical genres. For example, classical guitars are strung with nylon strings, which can be played with the fingers or nails, and have a wide fingerboard to permit playing scales and chords with minimal interference from adjacent strings. Often a solo instrument, the classical guitar requires a resonant body for amplification, where the size and materials of the body are chosen to achieve a specific timbre. On the other hand, country and folk guitarists prefer steel strings, which generally produce "brighter" tones. Electric guitars are designed to accommodate the demands of guitarists performing rock, blues and jazz music. These guitars are outfitted with electromagnetic pickups, where string vibration induces an electrical current, which can be processed to apply certain effects (e.g. distortion, reverberation) and eventually amplified. The role of the body is less important for electric guitars (although guitarists argue that it affects the instrument's timbre), and the body is generally thinner to increase comfort during performance. When the electric guitar is outfitted with light gauge strings, it facilitates certain techniques such as pitch-bending and legato, which are more difficult to perform on acoustic instruments.

Though the guitar can be designed and played in different ways to achieve a vast tonal palette, the underlying physical principle of vibrating strings is constant for each variation of the instrument. Consequently, a popular topic among musicians and researchers is the development of quantitative guitar models that simulate this behavior. Physical and physically-inspired models of musical instruments have emerged as a popular methodology for this task. The lure of these models is that they simulate the physical phenomena responsible for sound production in instruments, such as a vibrating string or air in a column, and produce high-quality synthetic tones. Properly calibrating these models, however, remains a difficult task and is an ongoing topic in the literature. Several guitar synthesizers have been developed using physically-inspired models, such as waveguide synthesis and the Karplus-Strong algorithm.

In the last decade, there has been considerable interest in digitally modeling analog guitar components and effects using digital signal processing (DSP) techniques. This work is highly relevant to the consumer electronics industry since it promises low-cost, digital "clones" of vintage, analog equipment. The promise of these devices is to help musicians consolidate their analog equipment into a single device or acquire the specific tones and capabilities of expensive and/or discontinued equipment at lower cost. Examples of products designed using this technology include Line6 modeling guitars and amplifiers, where DSP is used to replicate the sounds of well-known guitars and tube-based amplifiers [45, 46].

Despite the large amount of research focused on digitally modeling the physics of the guitar and its associated effects, there has been relatively little research analyzing the expressive attributes of guitar performance. The current research is mainly concerned with implementing specific performance techniques into physical models based on detailed physical analysis of the performer-instrument interaction. However, there is a void in guitar modeling and synthesis research concerned with measuring physical and expressive data from recordings. Obtaining such data is essential for developing an expressive guitar synthesizer; that is, a system that not only faithfully replicates guitar timbres, but is also capable of simulating the expressive intentions used by many guitarists.


1.1 Contributions

This dissertation proposes analysis and synthesis techniques for plucked guitar tones that are capable of modeling the expressive intentions applied through the guitarist's articulation during performance. Specifically, the expression analyzed through recorded performance focuses on how the articulation was applied through the plucking mechanism and strength. The main contributions of this research are summarized as follows:

• Generated a data set of plucked guitar tones comprising variations of the performer's articulation, including the plucking mechanism and strength, which spans all of the guitar's strings and several fretting positions.

• Developed a framework for jointly estimating the source and filter parameters for plucked-guitar tones based on a physically-inspired model.

• Proposed and demonstrated a novel application of principal component analysis to model the source signal for plucked guitar tones to encapsulate characteristics of various string articulations.

• Utilized nonlinear principal components analysis to derive an expressive control space to synthesize excitation signals corresponding to guitar articulations.

The analysis and synthesis techniques proposed here are based on physically inspired models of plucked-guitar tones. These types of models are chosen because their operation has a strong physical analog to the process of exciting a string (an impulsive force excites a resonant string response), giving them great potential for analyzing and synthesizing expressive performance. These advantages are in contrast to other modeling techniques, such as frequency modulation (FM), additive and spectral modeling synthesis, which are often used for music synthesis tasks but lack easily controlled parameters that relate to how an instrument is excited (e.g. bowing, picking). Physical models, on the other hand, relate to the initial conditions of a plucked string and the possible variations which produce unique tones when applied to the model. This is intuitive, considering guitarists affect the same physical variables when plucking a string.

The proposed method for deriving the parameters relating to expressive guitar performance is based on a joint source-filter estimation framework. The motivation to implement the estimation in a joint source-filter framework is two-fold. Foremost, musical expression results from an interaction between the performer and the instrument, and estimating the expressive attributes of performance requires accounting for the simultaneous variation of source and filter parameters. For the specific case of the guitar, the performer can be seen as imparting an articulation (i.e. excitation) on the string (i.e. filter), which has a resonant response to the performance input. The second reason for this approach is to facilitate the estimation of the source and filter parameters, which is typically accomplished in two separate tasks.

Building off the joint parameter estimation scheme, component-based analysis is applied to the source (i.e. excitation) signals obtained from recorded performance. Existing modeling techniques treat the excitation signal as a separate entity saved off-line to model a specific articulation, but in doing so provide no mechanism to quantify or manipulate the excitation signal. The application of component analysis is a data-driven, statistical approach used to represent the nuances of specific articulations through linear combinations of component vectors or functions. Using this representation, the articulations can be visualized in the component space, and dimensionality reduction is applied to yield an expressive synthesis space that offers control over specific characteristics of the data set.

The proposed guitar modeling techniques presented in this dissertation have many potential applications for music analysis and synthesis tasks. Analyzing the source-filter parameters derived from the recordings of many guitarists could lead to the development of quantitative models of guitar expression and a deeper understanding of expression during performance. The application of the estimated parameters using the proposed techniques can expand upon the sonic and expressive capabilities of current synthesizers, which often rely on MIDI or wavetable samples to replicate the tone with little or no expressive control. During the advent of computer music, limited computational power was a major constraint when implementing synthesis algorithms, but this is now much less of a concern given the capabilities of present-day computers and mobile devices. These advances in technology have provided new avenues for interacting with audio through gesture-based technologies. The guitar analysis and synthesis techniques presented in this dissertation can be harnessed along with these technologies to create new experiences for musical interaction.

1.2 Overview

As computational modeling for plucked guitars is the basis of this thesis, Chapter 2 overviews various approaches for modeling and synthesizing musical sounds. These approaches include wavetable synthesis, spectral modeling, FM synthesis, physical modeling and source-filter modeling. The strengths and weaknesses of each model are evaluated and, based on our assessment, a recommendation is made to base the techniques proposed in this dissertation on a source-filter approximation of physical guitar models.

Physical and source-filter models, which digitally implement the behavior of a vibrating string due to an external input, are discussed in detail in Chapter 3. The so-called waveguide model, which is based on a digital implementation of the d'Alembert solution for describing traveling waves on a string, is introduced, as well as a source-filter approximation of this model.

Chapter 4 presents an approach for capturing the expression contained in specific string articulations via the source signal from a source-filter model. The physical relation of this source signal to the waveguide model is highlighted, and it is suggested that a parametric model can be used to capture the nuances of the articulations. The joint estimation of the source and filter models is proposed by finding parameters that minimize the error between the analyzed recording and the synthetic signal. This constrained least squares problem is solved using convex optimization. The implementation of this approach and results are discussed in Chapter 5.

In Chapter 6, principal components analysis (PCA) is applied to a corpus of excitation signals derived from recorded performance. In this application, PCA models each excitation signal as a linear combination of basis functions, where each function contributes to the expressive attributes of the data. We show that a codebook of relevant basis functions can be extracted which describes particular articulations in which the plucking device and strength are varied. Furthermore, using components as features, we show that nonlinear PCA (NLPCA) can be applied for dimensionality reduction, which helps visualize the expressive attributes of the data set. This mapping is reversible, so the reduced-dimensional space can be used as an expressive synthesizer, using the linear basis functions to reconstruct the excitation signals. This chapter also deals with the pre-processing steps required to remove biases from the recovered signals, including the effect of the guitarist's plucking position along the string.

The conclusions from this dissertation are presented in Chapter 7, which includes the limitations and future avenues to explore.


CHAPTER 2: COMPUTATIONAL GUITAR MODELING

A number of techniques are available for the computational modeling and synthesis of guitar tones, each with entirely different approaches for capturing its acoustic attributes. This chapter will provide an overview of the sound models most commonly applied to guitar tones, including their computational basis, strengths and weaknesses. For detailed treatment of these techniques, the reader is referred to the extensive overviews provided by [10] and [89]. The analysis of each synthesis technique will also be used to justify the source-filter modeling approach used throughout this dissertation. Finally, this chapter will discuss pertinent applications of computational synthesis of guitar tones.

2.1 Sound Modeling and Synthesis Techniques

2.1.1 Wavetable Synthesis

In many computer music applications, wavetable synthesis is a viable means for synthetically generating musical sounds with low computational overhead. A wavetable is simply a buffer that stores the periodic component of a recorded sound, which can be looped repeatedly. As musical sounds vary in pitch and duration, signal processing techniques are required to modify the synthetic tones from a wavetable sample. Pitch shifting is achieved by interpolating the samples in the wavetable: a decrease or increase in pitch results from interpolating the wavetable samples up or down, respectively.

A problem with interpolation in wavetable synthesis is that excessive interpolation of a particular wavetable sample can result in synthetic tones that sound unnatural, since interpolation alters the length of the synthetic signal. To overcome this limitation, multi-sampling is used, where several samples of an instrument, spanning its pitch range, are stored. Interpolation can now be used between the reference samples without excessive degradation to the synthetic tone, which is preferred to storing every possible pitch the instrument can produce. Multi-sampling can also be used to incorporate different levels of dynamics, or relative loudness, into the system as well. Beyond interpolation, digital filters can be used to adjust the spectral properties (e.g. brightness) of the wavetable samples.

The computational costs of wavetable synthesis are fairly low, and the main restriction is the amount of memory available to store samples. The sound quality in these systems can be quite good as long as there is not excessive degradation from modification. However, wavetable synthesis has no true modeling basis (i.e. sinusoidal, source-filter) and is rather "ad-hoc" in its approach. Also, its flexibility in modeling and synthesis is restricted by the samples available to the synthesizer.
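
To make the lookup-and-interpolation mechanism concrete, the following minimal Python sketch (not part of the original text; the wavetable contents and parameter values are illustrative assumptions) renders a tone by repeatedly reading a single-cycle wavetable with linear interpolation:

```python
import numpy as np

def wavetable_oscillator(table, f0, fs, duration):
    """Render `duration` seconds of a tone at pitch f0 by repeatedly
    reading a single-cycle wavetable with linear interpolation."""
    n_out = int(duration * fs)
    # Phase increment per output sample, in (fractional) table indices.
    step = f0 * len(table) / fs
    phase = (step * np.arange(n_out)) % len(table)
    i0 = phase.astype(int)           # table sample below the read point
    i1 = (i0 + 1) % len(table)       # table sample above (wraps around)
    frac = phase - i0                # fractional position between the two
    return (1.0 - frac) * table[i0] + frac * table[i1]

# Example: one cycle of a bright, sawtooth-like waveform stands in for a
# recorded period; reading it at different rates shifts the pitch.
fs = 44100
table = np.linspace(1.0, -1.0, 256)
tone = wavetable_oscillator(table, f0=330.0, fs=fs, duration=1.0)
```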

2.1.2 FM Synthesis

Frequency modulation (FM) synthesis is often used to simulate characteristics of sounds that cannot be produced with linear time-invariant (LTI) models. An FM oscillator is one way of achieving these sounds; it operates by modulating the base frequency of a signal with another signal. A simple FM oscillator is given by

y(t) = Ac sin(2π fc t + Δfc cos(2π fm t))    (2.1)

where Ac and fc are the amplitude and frequency of the carrier signal, respectively, fm is the modulating frequency and Δfc is the maximum difference between fc and fm. The spectrum of the resulting signal y(t) contains a peak located at the carrier frequency and sideband frequencies located at plus and minus integer multiples of fm. When the ratio of the carrier to the modulating frequency is non-integer, FM synthesis creates an inharmonic spectrum where the frequency spacing between the partials is not constant. This is useful for modeling the spectra of certain musical sounds, such as strings and drums, which exhibit inharmonic behavior.
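
As a concrete illustration of Equation 2.1, the short Python sketch below (an added example, not from the original text) renders a simple FM tone; the parameter values, including the non-integer fc/fm ratio chosen to produce an inharmonic spectrum, are arbitrary:

```python
import numpy as np

def fm_oscillator(A_c, f_c, f_m, delta_fc, fs, duration):
    """Simple FM oscillator following Equation 2.1:
    y(t) = A_c * sin(2*pi*f_c*t + delta_fc * cos(2*pi*f_m*t))."""
    t = np.arange(int(duration * fs)) / fs
    return A_c * np.sin(2 * np.pi * f_c * t
                        + delta_fc * np.cos(2 * np.pi * f_m * t))

# A non-integer carrier-to-modulator ratio yields an inharmonic spectrum,
# loosely evoking string- or drum-like partials.
y = fm_oscillator(A_c=1.0, f_c=330.0, f_m=130.0, delta_fc=5.0,
                  fs=44100, duration=1.0)
```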

FM synthesis is a fairly computationally efficient technique and can be easily implemented on a microprocessor, which makes it attractive for commercially available synthesizers. Due to the nonlinearity of the FM oscillator, it is capable of producing timbres not possible with other synthesis methods. However, there is no automated approach for matching the synthesis parameters to an acoustic recording [8]. Rather, the parameters must be tweaked by trial and error and/or using perceptual evaluation.


2.1.3 Additive Synthesis

Additive, or spectral modeling, synthesis is a sound modeling and synthesis approach based on characterizing the spectra of musical sounds and modeling them appropriately. Sound spectra categories typically consist of harmonic, inharmonic, noise or mixed spectra. Analysis via the additive synthesis approach typically entails performing a short-time analysis on the signal to divide it into relatively short frames where the signal is assumed to be stationary within the frame. In the spectral modeling synthesis technique proposed by Serra and Smith, the sinusoidal, or deterministic, parts of the spectrum within each frame are identified and modeled using amplitude, frequency and phase. The sound can be re-synthesized by interpolating between the deterministic components of each frame to generate a sum of smooth, time-varying sinusoids. The noise-like, or stochastic, parts of the spectrum can be obtained by subtracting the synthesized, deterministic component from the original signal [68].
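
The deterministic half of this analysis-synthesis loop can be sketched as follows; here the per-frame partial amplitudes and frequencies are invented for illustration rather than measured by an actual short-time analysis:

```python
import numpy as np

def additive_resynthesis(frame_freqs, frame_amps, hop, fs):
    """Resynthesize a sound from per-frame partial frequencies (Hz) and
    amplitudes by interpolating the parameters sample-by-sample between
    frames and integrating phase for each partial."""
    n_frames, n_partials = frame_freqs.shape
    n_out = (n_frames - 1) * hop
    t_frames = np.arange(n_frames) * hop
    t_out = np.arange(n_out)
    y = np.zeros(n_out)
    for k in range(n_partials):
        f = np.interp(t_out, t_frames, frame_freqs[:, k])  # smooth freq track
        a = np.interp(t_out, t_frames, frame_amps[:, k])   # smooth amp track
        phase = 2 * np.pi * np.cumsum(f) / fs              # integrate frequency
        y += a * np.sin(phase)
    return y

# Illustration: three harmonics of a 330 Hz tone decaying over 20 frames.
fs, hop = 44100, 512
freqs = np.tile([330.0, 660.0, 990.0], (20, 1))
amps = np.outer(np.linspace(1.0, 0.1, 20), [1.0, 0.5, 0.25])
y = additive_resynthesis(freqs, amps, hop, fs)
```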

There are several benefits to synthesizing musical sounds via additive synthesis. Foremost, the model is very general and can be applied to a wide range of signals, including polyphonic audio and speech [50, 68]. Also, the separation of the deterministic and stochastic components permits flexible modification of signals, since the sinusoidal parameters are isolated within the spectrum. For example, pitch and time-scale modification can be achieved independently or simultaneously by shifting the frequencies of the sinusoids and altering the interpolation time between successive frames. This leads to synthetic tones that sound more natural and can be extended indefinitely, unlike wavetable interpolation.

A problem with additive synthesis is that transient events present in an analyzed signal are often too short to be adequately modeled by sinusoids and must be accounted for separately. This is especially problematic for signals with a percussive "attack", such as plucked strings. It is also unclear how to modify the sinusoids in order to achieve certain effects related to the perceived dynamics of a musical tone.

2.1.4 Source-Filter Modeling

Analysis and synthesis via source-filter models involves using a complex sound source, such as an impulse or periodic impulse train, to excite a resonant filter. The filter includes the important perceptual characteristics of the sound, such as the overall spectral tilt and the formants, or resonances, characteristic to the sound. When such a filter is excited by an impulse train, for example, the resonant filter is "sampled" at regular intervals in the spectrum as defined by the frequency of the impulse train.

Source-filter models are attractive because they permit the automated analysis of the resonant characteristics through either time- or frequency-domain techniques. One of the most well-known examples of this is linear prediction. Linear prediction entails predicting a sample of a signal based on a linear combination of past samples of that signal,

x(n) = Σ_{p=1}^{P} αp x(n − p)    (2.2)

where the αp are the prediction coefficients to be estimated from the recording [60]. When a fairly low prediction order P is used, the prediction coefficients yield an all-pole filter that approximates the spectral shape, including resonances, of the analyzed sound. Computationally efficient techniques, such as the autocorrelation and covariance methods, are available for estimating the filter parameters as well.

A significant advantage of source-filter models is that they approximate musical sounds as the output of a linear time-invariant (LTI) system. Therefore, using the estimated resonant filter, the source signal for the model can be recovered through an inverse filtering operation. Analysis of the recovered source signals provides insight into the expression used to produce the sound for the case of musical instruments. Also, source signals derived from certain signals can be used to excite the resonant filters from others, thus permitting cross-synthesis for generating new and interesting sounds. As will be discussed in Chapter 3, source-filter models have a close relation to physical models of musical instruments.
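
A minimal Python sketch of this workflow is given below, assuming the autocorrelation method mentioned above for solving the normal equations of Equation 2.2, followed by inverse filtering with the resulting all-pole model; the prediction order and test signal are illustrative choices:

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def lpc_autocorrelation(x, order):
    """Estimate prediction coefficients alpha_1..alpha_P (Equation 2.2)
    by solving the normal equations built from the autocorrelation."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]
    return solve_toeplitz(r[:order], r[1:order + 1])

def inverse_filter(x, a):
    """Recover the source by filtering x with A(z) = 1 - sum_p a_p z^-p."""
    return lfilter(np.concatenate(([1.0], -a)), [1.0], x)

# Illustration: analyze a decaying 330 Hz sinusoid with a low-order predictor.
fs = 44100
n = np.arange(fs // 10)
x = np.exp(-n / 2000.0) * np.sin(2 * np.pi * 330.0 * n / fs)
a = lpc_autocorrelation(x, order=8)
residual = inverse_filter(x, a)   # approximates the model's source signal
```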

Despite the advantages of source-filter models, they have certain limitations. Namely, as they are based on LTI models, they cannot model the inherent nonlinearities found in real musical instruments. For example, tension modulation in real strings alters the spectral characteristics in a time-varying manner, while source-filter models have fixed fundamental frequencies.

2.1.5 Physical Modeling

Physical modeling systems aim to model the behavior of systems using physical variables such as force, displacement, velocity and acceleration. Physical systems describing sound can range from musical interactions, such as striking a drum or string, to natural sounds, such as wind and rolling objects. An example physical system for a musical interaction consists of releasing a string from an initial displacement. The solution to this system is discussed extensively in Chapter 3, but involves computing the infinitesimal forces acting on the string as it is released, which results in a set of differential equations describing the motion of the string with respect to time and space. The digital implementation of physical models for sound can be achieved in a number of ways, including modal decomposition, digital waveguides and wave digital filters, to name a few [89].

While physical models are capable of high quality synthesis of acoustic instruments, developing models of these systems is often a difficult task. Taking the plucked string as an example, a complete physical description requires knowledge of the string, including its material composition and how it interacts with the boundary conditions at its termination points, as well as the frictional forces acting on the string as it travels. Furthermore, there may be coupling forces acting between the string and the excitation mechanism (e.g. the player's finger), which should be included as well. For these reasons, the physical system must be known a priori and it cannot be calibrated directly through audio analysis.

2.2 Summary and Model Recommendation

Table 2.1 summarizes the sound modeling techniques presented above by comparing their modeling domains and the range of musical signals that can be produced using each method. The vertical ordering is indicative of the underlying basis and/or structure of the model types. For example, wavetable synthesis is a rather "ad-hoc" approach without a true computational basis, while FM synthesis is based on modulating sinusoids. Additive synthesis and source-filter models have a strict modeling basis using sinusoids plus noise and source-filter parameters, respectively. Physical models are most closely related to musical instruments since they deal with related physical quantities and interactions. As a model's parameter domain becomes more general, a greater range of sounds can be synthesized with more control over their properties (i.e. pitch, timbre, articulation).

Based on the discussion in Section 2.1, the strengths and weaknesses of each model are evaluated on a scale (Low, Moderate, High) as they pertain to four categories:

1. Computational complexity required for implementation
2. The resulting sound quality when the model is used for sound synthesis of guitar tones
3. The difficulty of calibrating the model in accordance with acoustic samples
4. The degree of expressive control afforded by the model


Table 2.1: Summary of sound synthesis models including their modeling domain and applicable audio signals. Adopted from Vercoe et al. [93].

Sound Model   | Parameter Domain                                            | Acoustic Range
Wavetable     | sound samples, manipulation filters                         | discrete pitches, isolated sound events
FM            | carrier and modulating frequencies                          | sounds with harmonic and inharmonic spectra
Additive      | noise sources; time-varying amplitude, frequency and phase  | sounds with harmonic, inharmonic, noisy or mixed spectra
Source-Filter | excitation signal, filter parameters                        | voice (speech, singing), plucked-string or struck instruments
Physical      | physical quantities (length, stiffness, position, etc.)     | plucked, struck, bowed or blown instruments

Table 2.2: Evaluating the attributes of various sound modeling techniques. The boldface tags indicate the optimal evaluation for a particular category.

Sound Model   | Computational Complexity | Sound Quality | Calibration Difficulty | Expressive Control
Wavetable     | Low                      | High          | High                   | Low
FM            | Low                      | Moderate      | High                   | Low
Additive      | Moderate                 | High          | Moderate               | Moderate
Source-Filter | Moderate                 | High          | Moderate               | High
Physical      | High                     | High          | High                   | Moderate

Table 2.2 shows the results of this evaluation in accordance with the four categories presented above. The model(s) earning the best evaluation for each category are highlighted in boldface for emphasis. It should be noticed that, in general, the computational complexity of the models increases in accordance with the associated model parameter domain in Table 2.1. That is, as the parameters become more general, they are more difficult to implement and harder to calibrate.

For truly flexible and expressive algorithmic synthesis, additive, source-filter and physical models offer the best of all categories. While the additive model provides good sound quality and flexible synthesis (especially with regard to pitch and time shifting), the sinusoidal basis does not allow the performer's input to be separated from the instrument's response. Physical models provide this separation, but are difficult to calibrate, especially from a recording, since the physical configuration of the instrument's components and the performer's interaction are generally not known a priori. Of the remaining models, the source-filter model provides the greatest appeal due to its inherent simplicity, especially as it pertains to modeling the performer's articulation, its relative ease of calibration and the expressive control it affords.

2.3 Synthesis Applications

The techniques for modeling plucked-guitar tones presented in this thesis are applicable to a number of sound synthesis tasks. This section will highlight a few such tasks to provide a larger perspective on the benefits of computational guitar modeling.

2.3.1 Synthesis Engines

There are numerous systems available which encompass a variety of computational sound models for the creation of synthetic audio. One such system is CSound, an audio programming language created by Vercoe et al. based on the C language [92]. CSound offers the implementation of several synthesis algorithms, including general filtering operations, additive synthesis and linear prediction. The Synthesis ToolKit (STK) is another system, created by Cook and Scavone, which adopts a hierarchical approach to sound modeling and synthesis using an open-source application programming interface based on C++ [11]. STK handles low-level, core sound synthesis via unit generators, which include envelopes, oscillators and filters. High-level synthesis routines encapsulate physical modeling algorithms for specific musical instruments, FM synthesis, additive synthesis and other routines.

2.3.2 Description and Transmission

Computational modeling of musical instruments, especially the guitar, is highly applicable in systems requiring generalized audio description and transmission. The MPEG-4 standard is perhaps the most well-known codec (compressor-decompressor) for transmission of multimedia data. However, the compression of raw audio, even using the perceptual codec found in mp3, leaves little or no control over the sound at the decoder. To expand the parametric control of compressed audio, the MPEG-4 standard includes a descriptor for so-called Structured Audio, which permits the encoding, transmission and decoding of audio using highly structured descriptions of sound [21, 66, 93]. The audio descriptors can include high-level performance information for musical sounds, such as pitch, duration, articulation and timbre, and low-level descriptions based on the models (e.g. source-filter, additive synthesis) used to generate the sounds. It should be noted that the structured audio descriptor does not attempt to standardize the model used to parameterize the audio, but provides a means for describing the synthesis method(s), which keeps the standard flexible. The level of description provided by structured audio differentiates it from other formats such as pulse-code modulated audio or mp3, which do not provide contextual descriptions, and MIDI (musical instrument digital interface), which provides contextual description but lacks timbral or expressive descriptors. In essence, structured audio provides a flexible and descriptive "language" for communicating with synthesis engines.

2.3.3 New Music Interfaces

Computer music researchers have long sought to develop new interfaces for musical interaction. Often, these interfaces deviate from the traditional notion of how an instrument is played in order to appeal to non-musicians or enable entirely new ways of interacting with sound. For the guitar, Karjalainen et al. developed a "virtual air guitar" where the performer's hands are tracked using motion-sensing gloves [26]. The guitar tones are produced algorithmically using waveguide models in response to gestures made by the performer. More recently, commercially available gesture and multitouch technologies have been used for music creation. The limitation of these systems, however, is that their audio engines utilize sample-based synthesizers and provide little or no parametric control over the resulting sound [20, 55].

The plucked-guitar model techniques presented in this dissertation are applicable to each of the

sound synthesis areas outlined above. The source and filter parameters extracted from recordings

can be used for low bit-rate transmission of audio and are based on algorithms (source-filter) that

are either available in many synthesis packages are easily implemented on present-day hardware.

Given the computational power available in present-day computers and mobile devices, the anal-

ysis techniques and algorithms presented here can be harnessed into applications for new musical

interfaces as well.


CHAPTER 3: PHYSICALLY INSPIRED GUITAR MODELING

3.1 Overview

For the past two decades, physically-inspired modeling systems have emerged as a popular method

for simulating plucked-string instruments since they are capable of producing high-quality tones

with computationally efficient implementations. The emergence of these techniques was due, in

part, to the innovations of the Karplus-Strong algorithm, which simulated plucked-string sounds

using a simple and efficient model, which was later shown to approximate the physical phenomena of traveling waves on a string [22, 30, 31, 72, 89]. More generally, direct physical modeling of a musical

instrument aims to simulate, with a digital model, the behavior of the particular elements responsible for sound production (e.g. a vibrating string or resonant air column) as the musician interacts with the instrument (e.g. plucking or breath excitation) [89].

This chapter will briefly overview waveguide techniques for guitar synthesis, which directly model the traveling wave solution resulting from a plucked string. A related model, known as the single

delay-loop, is also discussed, which is utilized for the analysis and synthesis tasks presented in this

thesis.

3.2 Waveguide Modeling

Directly modeling the complex vibration of guitar strings due to the performer-instrument interaction

is a difficult problem. However, by using simplified models of plucked strings, waveguide models offer an intuitive understanding of string behavior and lead to practical and efficient implementations [72]. In this

section, the well-known traveling wave solution for ideal, plucked-strings is presented [33]. This

general solution is then discretized and digitally implemented, as shown by Smith, to constitute a

digital waveguide model [72]. Common extensions to the waveguide model are also presented, which

correspond to non-ideal string conditions.


3.2.1 Solution for the Ideal, Plucked-String

The behavior of a vibrating string is understood by deriving and solving the well-known wave

equation for an ideal, lossless string. The full derivation of the wave equation is documented in

several physics texts [33, 52] and is obtained by computing the tension differential across a curved

section of string with infinitesimal length. This tension is balanced at all times by an inertial

restoring force due to the string’s transverse acceleration.

The wave equation is expressed as [33]

Kt y'' = ε ÿ   (3.1)

where Kt and ε are the string's tension and linear mass density, respectively, and y = y(t, x) is the string's transverse displacement at a particular time instant, t, and location along the string, x. The curvature of the string is indicated by y'' = ∂²y(t, x)/∂x² and its transverse acceleration is given by ÿ = ∂²y(t, x)/∂t². The general solution to the wave equation is given by [33]

y(t, x) = yr(t − x/c) + yl(t + x/c),   (3.2)

where yr and yl are functions that describe the right and left traveling components of the wave,

respectively, and c is the wave speed, a constant given by √(Kt/ε). It should be noted that yr and yl are arbitrary functions of arguments (ct − x) and (ct + x) and it can be verified that substituting any twice-differentiable function with these arguments for y(t, x) will satisfy Equation

3.1 [33, 72].

Equation 3.2 indicates that the wave solution can be represented by two functions, each depending

on a time and a spatial variable. This notion becomes clear by analyzing an ideal, plucked-string

at a few instances after its initial displacement as shown in Figure 3.1. After the string is released,

its total displacement is obtained by summing the amplitudes of the right- and left-traveling wave

shapes, which propagate away from the plucking position, along the entire length of the string.

3.2.2 Digital Implementation of the Wave Solution

As demonstrated in Figure 3.1, the traveling wave solution has both time and spatial dependencies,

which must be discretized to digitally implement Equation 3.2. Temporal sampling is achieved by

employing a change of variable in Equation 3.2 such that tn = nTs where Ts is the audio sampling

Figure 3.1: Traveling wave solution of an ideal string plucked at time t = t1 and its displacement at subsequent time instances t2, t3. The string's displacement (solid) at any position is the summation of the two disturbances (dashed) at that position.

interval. The wave’s position is discretized by setting xm = mX, where X = cTs, such that the

waves are sampled at a fixed spatial interval along the string. Substituting t and x with tn and xm

in Equation 3.2 yields [72]:

y(tn, xm) = yr(tn − xm/c) + yl(tn + xm/c)   (3.3)
          = yr(nTs − mX/c) + yl(nTs + mX/c)   (3.4)
          = yr((n − m)Ts) + yl((n + m)Ts)   (3.5)

Since all arguments are multiplied by Ts, it is suppressed and the terms corresponding to the right

and left traveling waves can be simplified to [72, 89]:

y+(n) ≜ yr(nTs),   y−(n) ≜ yl(nTs)   (3.6)

Smith showed that Equation 3.5 could be schematically realized as a so-called “digital waveg-

uide” model shown in Figure 3.2 [70, 71, 72]. When the upper and lower signal paths, or “rails”,

of Figure 3.2 are initialized with the values of the string’s left and right wave shapes, the traveling

wave phenomenon in Figure 3.1 and Equation 3.2 is achieved by shifting the transverse displacement

values for the wave shapes in the upper and lower rails. For example, during one temporal sampling

instance, the right-traveling wave shifts by the amount cTs along the string, which is equivalent to

delaying y+ by one sample in Figure 3.2. The waveguide model also provides an intuitive under-

standing for how the traveling waves relate to the string’s total displacement, which is obtained by

Figure 3.2: Waveguide model showing the discretized solution of an ideal, plucked string. The upper (y+) and lower (y−) signal paths represent the right and left traveling disturbances, respectively. The string's displacement is obtained by summing y+ and y− at a desired spatial sample.

summing the values of y+ and y− at a desired spatial sample x = mcTs. It should be noted that the

values obtained at the sampling instants in the waveguide model are exact, although band-limited

interpolation can be used to obtain the displacement between spatial sampling instants if desired

[89].
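For concreteness, the shifting scheme described above can be sketched in a few lines of Python; the pitch, pluck shape and observation point below are illustrative assumptions, not values used elsewhere in this thesis:

    import numpy as np

    # Minimal lossless bidirectional waveguide in the spirit of Figure 3.2.
    fs = 44100                     # sampling rate (Hz)
    f0 = 329.63                    # assumed pitch (E4)
    D = int(round(fs / f0))        # loop delay in samples
    half = D // 2                  # spatial samples per rail
    pluck = int(0.3 * half)        # assumed plucking point index

    # Triangular initial displacement, split equally between the two rails.
    shape = np.concatenate([np.linspace(0, 1, pluck, endpoint=False),
                            np.linspace(1, 0, half - pluck)])
    upper = 0.5 * shape.copy()     # right-traveling wave, y+
    lower = 0.5 * shape.copy()     # left-traveling wave, y-

    out = np.zeros(4 * D)
    pickup = half // 3             # interior spatial sample to observe
    for n in range(len(out)):
        out[n] = upper[pickup] + lower[pickup]   # total displacement
        end_r = upper[-1]          # sample arriving at one termination
        end_l = lower[0]           # sample arriving at the other
        # shift each rail one spatial sample and reflect with inversion
        upper = np.concatenate(([-end_l], upper[:-1]))
        lower = np.concatenate((lower[1:], [-end_r]))

The inverting reflections in this sketch anticipate the rigid boundary conditions discussed in Section 3.2.4; with them in place, the output is periodic with a period of 2·half samples.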

3.2.3 Lossy Waveguide Model

The lossless waveguide model in Figure 3.2 clearly represents the phenomena of the traveling wave

solution for a plucked string under ideal conditions. However, this model does not incorporate the

characteristics of real strings, which are subject to a number of non-ideal characteristics, such as

internal friction and losses due to boundary collisions. In the context of sound synthesis, incorpo-

rating these properties is essential for modeling tones that behave naturally both from a physical

and perceptual standpoint.

In a non-ideal string, wave propagation is hindered by energy losses from internal friction and drag imposed by the surrounding air. If these losses can be modeled as a constant, µ, proportional to the wave's transverse velocity, ẏ, Equation 3.1 can be modified as [72]

Kt y'' = ε ÿ + µ ẏ   (3.7)

where the additional term, µẏ, incorporates the fricative losses applied to the string in the transverse

direction. The solution to Equation 3.7 is the same as Equation 3.1, but with an exponential term

that attenuates the right- and left-traveling waves as a function of propagation distance. The solution

Figure 3.3: Waveguide model incorporating losses due to propagation at the spatial sampling instances. The dashed lines outline a section where M gain and delay blocks are consolidated using a linear time-invariant assumption.

is given by [72]:

y(t, x) = e^(−(µ/2ε)x/c) yr(t − x/c) + e^((µ/2ε)x/c) yl(t + x/c)   (3.8)

To obtain the lossy waveguide model, Equation 3.8 is discretized by applying the same change of

variables that was used to discretize Equation 3.2. This yields a waveguide model with a gain factor,

g = e^(−µTs/2ε), inserted after each delay element in the waveguide as shown in Figure 3.3. Thus, a

particular point along the right- or left-traveling wave shape is subject to an amplitude attenuation

by the amount g as it advances one spatial sample through the waveguide.

By using a linear time-invariant (LTI) assumption, Figure 3.3 can be simplified to reduce the

number of delay and gain elements required for the model. For example, if the output of the

waveguide is observed at x = (M + 1)X, then the previous M delay and gain elements can be

consolidated into a single delay, z^(−M), and loss factor, g^M. This greatly reduces the complexity of

the waveguide model, which is desirable for practical implementations.

3.2.4 Waveguide Boundary Conditions

In practice, the behavior of a vibrating string is determined by boundary conditions due to the

string’s termination points. In the case of the guitar, each string is terminated at the “nut” and

“bridge” where the former is located near the guitar’s headstock and the latter is mounted on the

guitar’s saddle. The behavior of the string at these locations depends on several factors, including

the string’s tensile properties, how it is fastened and the construction of the bridge and nut. For


simplistic modeling, however, it suffices to assume that guitar strings are rigidly terminated such

that there is no displacement at these positions.

By assuming rigid terminations for a string of length L, a set of boundary conditions is obtained for solving the wave equation [33]

y(t, 0) = 0,   y(t, L) = 0.   (3.9)

By substituting these conditions into Equation 3.2 and discretizing, the following relations between y+ and y− are obtained [72]:

y+(n) = −y−(n)   (3.10)

y+(n − D/2) = −y−(n + D/2)   (3.11)

In Equation 3.11, D = 2L/X and is often referred to as the “loop delay” since it indicates the delay

time, in samples, for a point on the right wave shape, for example, to travel from x = 0 to x = L

and back along the string. Thus, points located at the same spatial sample on the right and left

wave shapes will have the same amplitude displacement every D/2 samples. Viewed another way,

D can be calculated as a ratio of the sampling frequency and the string’s pitch, which is determined

by the string’s length,

D = 2L/X = 2L/(cTs) = 2Lfs/c = fs/f0   (3.12)

where the fundamental frequency, f0, was substituted based on the wave relationship f0 = c/2L

where 2L is the wavelength and c is the wavespeed.
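As a quick numerical illustration of Equation 3.12 (the values are illustrative):

    fs, f0 = 44100.0, 329.63   # sampling rate and an assumed E4 pitch
    D = fs / f0                # total loop delay: about 133.79 samples
    DI = int(D)                # bulk integer delay (133 samples)
    frac = D - DI              # ~0.79 samples remain

Since D is rarely an integer, the fractional remainder motivates the fractional delay extension discussed in Section 3.2.5.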

Figure 3.4 shows the lossy waveguide model with boundary conditions superimposed on a guitar

body to illustrate the physical relationship between the model and instrument. The loss factors due

to wave propagation and rigid boundary conditions are consolidated into two filters located at x = 0

and x = L, which correspond to the guitar's bridge and nut positions, respectively. The individual delay elements are merged into two bulk delay lines, each having a length of D/2 samples, which store the

shapes of the left- and right-traveling wave shapes at any time during the simulation. Furthermore,

this model allows the string’s initial conditions to be specified relative to a spatial sample in the

delay line that represents the plucking point position. Initializing the waveguide in this way removes

Figure 3.4: Plucked-string waveguide model as it correlates to the physical layout of the guitar. Propagation losses and boundary conditions are lumped into digital filters located at the bridge and nut positions. The delay lines are initialized with the string's initial displacement.

the need to explicitly model the coupling effects arising from the interaction between the string and

excitation mechanism [72]. The guitar’s output is observed at the “pickup” location by summing

the values of the upper and lower delay lines at a desired spatial sample.

The simplistic nature of the waveguide model in Figure 3.4 leads to computationally efficient

hardware and software implementations of realistic plucked guitar sounds. Memory requirements

are minimal, since only two buffers are required to store the string's initial conditions and the

lossy boundaries can be implemented with simple digital filters. Furthermore, as Smith showed,

the contents of the delay lines can be shifted via pointer manipulation to reduce the load on the

processor [10, 72]. Karjalainen showed that using such techniques enables several string models to

be implemented on a single DSP chip, with computational capabilities that are eclipsed by present-day (2012) microprocessors [25].

3.2.5 Extensions to the Waveguide Model

An important extension is providing fractional delay for the waveguide model, since strings are often tuned to pitches whose loop delay (the ratio of sampling frequency to fundamental frequency) is not an integer number of samples. While certain hardware and software configurations support multiple sampling rates, it is generally undesirable to vary the sampling rate to achieve a particular tuning, especially when synthesizing multiple string tones with different pitches. Instead, Karjalainen proposed adding fractional delay into the waveguide loop via a Lagrange interpolation filter; an FIR filter is computed to add the required fractional delay to precisely tune the waveguide [25].

Smith proposed using all-pass filters to simulate the effects of dispersion in strings, where the string's internal stiffness causes higher frequency components of the wave to travel faster than lower ones. This has the effect of constantly altering the shape of the string. All-pass filters introduce frequency-dependent group delay to simulate this effect [72].

Tolonen et al. incorporate the effects of "pitch glide," or tension modulation, exhibited by real

strings using a non-linear waveguide model [79, 80, 91]. At rest, a string exhibits a nominal length

and tension. However, as the string is displaced from its equilibrium position, the string undergoes

elongation which increases its tension. After release, the tension and, thus, the wave speed constantly

fluctuates as the string oscillates about its nominal position. This constant fluctuation means that a fixed spatial sampling scheme no longer suffices, and the wave must be resampled at each time instant to

account for the elongation.

3.3 Analysis and Synthesis Using Source-Filter Approximations

The waveguide model discussed in the previous section provides an intuitive methodology for

implementing the traveling wave solution and simulating plucked-string tones. However, accurate

re-synthesis of plucked-guitar tones using the waveguide model requires knowledge of the string’s

initial conditions and loss filters that are correctly calibrated to simulate naturally decaying tones.

The former requirement is a significant limitation since the exact initial conditions of the string

are not available from a recorded signal and must be measured during performance, which is often

impractical. Therefore, when performance and physical data are unavailable, the utility of the

waveguide model is limited for analysis-synthesis tasks, such as characterizing recorded performance.

An alternative model, known as the single delay-loop (SDL), was developed to simplify the

waveguide model from a computational standpoint by consolidating the delay lines and loss filters.

The SDL model is also widely used in the literature because it permits the analysis of plucked-

guitar tones from a source-filter perspective; that is, an external signal excites a filter to simulate

the resonant behavior of a plucked string. Thus, the physical specifications for the guitar and its

strings are generally not required to calibrate the SDL model since linear time-invariant methods

can be applied for this task. A number of guitar synthesis systems are based on SDL models

[26, 56, 74, 75, 90].


3.3.1 Relation to the Karplus-Strong Model

For a more streamlined structure, the bidirectional waveguide model from Figure 3.4 can be reduced

to a single, D-length delay line and a loop filter that consolidates the losses incurred from the bridge

and nut [7, 72]. This reduction is shown in Figure 3.5, where the lower delay line is concatenated

with the upper delay line at the nut position. The wave shape contained in the lower delay line is

inverted to incorporate the reflection at the rigid nut, which has been removed.

Figure 3.5: Single delay-loop model (right) obtained by concatenating the two delay lines from a bidirectional waveguide model (left) at the nut position. Losses from the bridge and nut filters are consolidated into a single filter in the feedback loop.

The new waveguide structure in Figure 3.5 (right) demonstrates the basic SDL model and is

identical to the well-known Karplus-Strong (KS) plucked-string model, whose discovery pre-dated

waveguide synthesis techniques [22, 31]. Unlike waveguide techniques where the excitation is based

on wave variables, the KS model works by initializing a D-length delay line with random values

and circularly shifting the samples through a loss filter. The random initialization of the delay line

simulates the transient noise burst perceived during the attack of plucked-string instruments, though this "excitation" signal has no physical relation to the string, while the feedback loop acts as a comb filter so that only the harmonically related frequencies are passed. The loss filter, Hl(z), employs

low-pass filtering to implement the frequency dependent decay characteristics of real strings so that

high frequency energy dissipates faster than the lower frequencies.
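The KS loop just described is simple enough to sketch directly (pitch and duration here are illustrative; the two-tap averager is the original loss filter of [31]):

    import numpy as np

    fs, f0 = 44100, 329.63
    D = int(round(fs / f0))
    rng = np.random.default_rng(0)
    delay = rng.uniform(-1.0, 1.0, D)    # random "excitation" burst
    out = np.zeros(fs)                   # one second of output
    for n in range(len(out)):
        out[n] = delay[0]
        # two-tap averaging lowpass: high partials decay faster
        fed_back = 0.5 * (delay[0] + delay[1])
        delay = np.roll(delay, -1)
        delay[-1] = fed_back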

3.3.2 Plucked String Synthesis as a Source-Filter Interaction

By modeling plucked-guitar tones with the single delay-loop (SDL), the physical interpretation of traveling wave shapes on a string is no longer as clear as it was for the bidirectional waveguide.

However, Valimaki et al. show that the SDL can be derived from the bidirectional waveguide model

by computing a transfer function between the spatial samples representing the plucking position


and output samples [30, 89]. This derivation is still physically valid, though the model’s excitation

signal is treated as an external input rather than a set of initial conditions describing the string’s

displacement.

Figure 3.6 shows a complete source-filter model for plucked guitar synthesis based on waveguide

modeling principles. The SDL model is contained in the block labeled S(z), which is equivalent

to the single delay line structure shown in Figure 3.5, except the model is driven by an external

excitation signal rather than a random initialization as in the Karplus-Strong model. S(z) alone

cannot simulate the complete behavior of plucked strings found in the waveguide model. Notably missing is the ability to manipulate the plucking point and pickup positions, both of which are achieved by selecting a desired spatial sample in the waveguide model corresponding to the location where the string is displaced and where the vibration is observed as the output. Valimaki showed

that this functionality could be achieved by adding comb filters before and after the SDL to simulate

the effects of plucking point and pickup positions present in the waveguide model.

Figure 3.6 shows a comb filter C(z) preceding S(z) to simulate the effect of the plucking point

position. For simplicity, the input p(n) can be an ideal impulse. The comb filter delay determines

when p(n) is reflected, which is analogous to a sample in the digital waveguide model encountering

a rigid boundary. The number of samples between the initial and reflected impulses is specified as a

fraction λ of the loop delay, where D indicates the number of samples corresponding to one period of

string vibration. Similarly, the comb filter U(z) following S(z) simulates the position of the pickup

seen on electric guitars. In this filter, the comb filter delay specifies the delay between arriving pulses

associated with a relative position along the string. It should be noted that, since each of the blocks in Figure 3.6 is a linear time-invariant (LTI) system, they may be freely interchanged as desired.

3.3.3 SDL Components

Whereas the comb filters in Figure 3.6 specify initial and output observation conditions for the

plucked guitar tone, the SDL filter in S(z) is responsible for modeling the string vibration including

its fundamental frequency and decay. As in the case of the bidirectional waveguide, the total “loop

delay”, D, of the SDL denoted by S(z) determines the pitch of the resulting guitar tone as determined

by Equation 3.12. Since D is typically a non-integer, the fractional delay filter, HF (z), is used to

add the required fractional group delay, while z�DI provides the bulk, integer delay component of

D. All-pass and Lagrange interpolation filters are commonly used for HF (z), with the latter being

Figure 3.6: Plucked string synthesis using the single delay-loop (SDL) model specified by S(z). C(z) and U(z) are comb filters simulating the effects of the plucking point and pickup positions along the string, respectively.

especially popular in synthesis systems since it can achieve variable delay for pitch modification

without significant transient effects [26, 30]. Additional information pertaining to fractional delay

filters is provided in Appendix A.
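As an illustration, the coefficients of a Lagrange fractional delay filter follow from the standard product formula (this sketch is not code from the thesis; see Appendix A for the formal treatment):

    import numpy as np

    def lagrange_fd(d, order=3):
        # h[k] = prod over m != k of (d - m) / (k - m); the FIR filter
        # approximates z^-d, and is most accurate when d lies near the
        # middle of the interval [0, order].
        h = np.ones(order + 1)
        for k in range(order + 1):
            for m in range(order + 1):
                if m != k:
                    h[k] *= (d - m) / (k - m)
        return h

    # e.g. a 0.79-sample fractional delay, offset by one integer sample
    # so that the total delay of 1.79 sits in the filter's accurate range:
    h = lagrange_fd(1.79)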

Hl(z) is the so-called “loop filter” and is responsible for implementing the non-ideal characteristics

of real strings, including losses due to wave propagation and terminations at the nut and bridge

positions. In the early developments of waveguide synthesis, Hl(z) was chosen as a two-tap averaging filter for simplicity and efficiency [31], but such a low-order FIR filter is often too simplistic to match the magnitude decay characteristics of plucked-guitar tones. In the literature, a first-order IIR filter is often used for Hl(z), with the form

Hl(z) = g / (1 − α0 z^(−1))   (3.13)

where α0 and g must be determined for proper calibration [29, 62, 86, 90].

It is useful to analyze the total delay, D, in the SDL as a sum of the delays contributed by each

component in the feedback loop,

D = τl + DF + DI   (3.14)

where τl, DF, DI are the group delays associated with Hl(z), HF(z) and z^(−DI), respectively. Thus, the bulk and fractional delay components should be chosen to compensate for the group delay introduced by the loop filter, which varies as a function of α0.

For spectral-based analysis, the transfer function of the SDL model between input, p(n), and

output, y(n), can be expressed in the z-transform domain as

S(z) = 1 / (1 − Hl(z)HF(z)z^(−DI)).   (3.15)

Equation 3.15 can be thought of as a modified linear prediction where the prediction occurs over

DI samples due to the periodic nature of plucked-guitar tones. The "prediction" coefficients are determined by the coefficients of the loop and fractional delay filters in the feedback loop of S(z).

The SDL model in Figure 3.6 is attractive from an analysis-synthesis perspective since, unlike the

bidirectional waveguide model, it does not require specific data about the string during performance

(e.g. initial conditions, instrument materials, plucking technique) to faithfully replicate plucked-

guitar tones. Rather, the problem becomes properly calibrating the filters from recorded tones via

model-based analysis. A significant portion of the literature for plucked-guitar synthesis is dedicated

towards developing calibration schemes for extracting optimal SDL components [26, 29, 62, 69, 86,

90].

3.3.4 Excitation and Body Modeling via Commuted Synthesis

When using the SDL model for guitar synthesis, the output signal is assumed to be strictly the result

of the string’s vibration where the only external forces acting on the string are due to fricative losses.

This assumption is not necessarily true when dealing with real guitars, since the instrument’s body

incorporates a resonant filter, which affects its timbre, and interacts with the strings via nonlinear

coupling. Valimaki et al. describe the acoustic guitar body as a multidimensional resonator, which

requires computationally expensive modeling techniques to implement [89].

While an exhaustive review of acoustic body modeling techniques is beyond the current scope,

several attempts have been made to reduce the complexity of this task [7, 28, 57]. Measurement of the

acoustic guitar body response is typically achieved by striking the resonant body of the instrument

with a hammer while the strings are muted. The acoustic radiation is recorded to capture the resonant

body modes. In some cases, electro-mechanical actuators are used to excite and measure the resonant

body in a controlled manner [63]. Digital implementation of the acoustic body involves designing a

Figure 3.7: Components for guitar synthesis including excitation, string and body filters. The excitation and body filters may be consolidated for commuted synthesis.

filter that captures the resonant modes. This can be achieved using FIR or IIR filters, though precise

modeling requires very high order filters. Karjalainen et al. proposed using warped filter models

for computationally e�cient modeling and synthesis of acoustic guitar bodies. The warped filter

is advantageous since the frequency resolution of the filter can favor the lower, resonant frequency

modes which are perceptually important to capture for re-synthesis, while keeping the required filter

orders low enough for e�cient synthesis [24]. For “cross-synthesis” applications, Karjalainen et al.

introduced a technique to “morph” electric guitar sounds into acoustic tones through equalization

of the magnetic pickups found on electric guitars. A filter, which encapsulates the body e↵ects of

the acoustic guitar, was then applied to a digital waveguide model of the instrument [27].

A popular method for dealing with the absent resonant body effects in the SDL model involves using so-called commuted synthesis, which was independently developed by Smith and Karjalainen [29, 73]. This technique exploits the commutative property of linear time-invariant (LTI) systems in order to extract an aggregate signal that encapsulates the effects of the resonant body filter and the string

excitation, p(n), of the SDL model when the loop filter parameters are known. This approach avoids

the computational cost incurred with explicitly modeling the body with a high-order filter.

Figure 3.7 shows the SDL model augmented by inserting excitation and body filters before and

after the SDL loop, respectively. The excitation filter is a general LTI block that encapsulates several

aspects of synthesis including “pluck-shaping” filters to model certain dynamics in the articulation

and the comb filtering effects from the plucking point and/or pickup locations as shown in Figure

3.6. Assuming that S(z) and y(n) are known, the LTI system can be rearranged

Y (z) = E (z) S (z) B (z) (3.16)

= E (z) B (z) S (z) (3.17)

= A (z) S (z) (3.18)

where A(z) is an aggregation of the body and excitation filters. By inverse filtering y(n) in the frequency domain with S(z), the impulse response for A(z) is obtained. Thus, by making an LTI assumption on the model, this residual signal contains the additional model components which are

unaccounted for by the SDL alone. For practical considerations, Valimaki notes that several hundred

milliseconds of the residual signal may be required to capture the perceptually relevant resonances

of the acoustic body during resynthesis [90], but for many applications storing this signal is preferable to the cost of explicit body modeling.

It should be noted that, even when plucked-guitar tones do not exhibit prominent effects from

the resonant body, commuted synthesis is still a valid technique for obtaining the SDL excitation

signal, p(n). This is often the case for electric guitar tones, where the output is measured by a

transducer and is relatively “dry” compared to an acoustic guitar signal. Also, any excitation signal

extracted via commuted synthesis will contain biases from the plucking point and pickup locations

unless these phenomena are specifically accounted for in the “excitation filter” block of Figure 3.7.

If the plucking point and pickup locations are known with respect to the SDL model, the excitation

signal can be “equalized” to remove the biases. There are several techniques utilized in the literature

to estimate the plucking point location directly from recordings of plucked guitar tones. Traube and

Smith developed frequency domain techniques for acoustic guitars [81, 82, 83, 84], while Penttinen

et al. employed time-domain analysis to determine the relative plucking position along the string

[58, 59].

3.3.5 SDL Loop Filter Estimation

Before the SDL excitation signal can be extracted via commuted synthesis, the loop filter, Hl(z),

needs to be calibrated from the recorded tone. This task has been the primary focus in much of

the literature, since the loop filter provides the synthesized tones with natural decay characteristics

[14, 29, 39, 62, 69, 86, 90]. This section will overview some of the techniques used in the literature.

Early attempts at modeling the loop filter for the violin involved using deconvolution in the

frequency domain to obtain an estimate of the loop filter’s magnitude response. Smith employed

various filter design techniques, including autoregressive methods, in order to model the contours of the spectra; however, the measured spectra were subject to amplified noise due to the deconvolution

process [69].

Karjalainen introduced a more robust algorithm that extracts magnitude response specifications

for the loop filter by analyzing the recorded tone with a short-time Fourier transform (STFT)


analysis [29]. Phase characteristics of the STFT are not considered in the loop filter design since the

magnitude response is considered to be perceptually more important for plucked-guitar modeling

[29, 86].

Lee et al. expand on Karjalainen’s STFT-based approach by adapting the so-called Energy Decay

Relief (EDR) [40, 64] to model the frequency-dependent attenuation of the waveguide. The EDR

was adapted from Jot [23] in order to de-emphasize the effects of beating in the string so that the

resulting magnitude trajectories for each partial are strictly monotonic. Thus, the EDR at time

t and frequency f is computed by summing all the remaining energy at that frequency from t to

infinity. Due to the decaying nature of plucked-guitar tones, this leads to a set of monotonically

decreasing curves for each partial analyzed.
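In sketch form, the EDR is a backwards cumulative sum of STFT energy along time (the array layout below is an assumption, not the authors' implementation):

    import numpy as np

    def energy_decay_relief(S):
        # EDR(t, f): energy remaining at each frequency from frame t
        # onward; S is a magnitude STFT of shape (bins, frames). Each
        # row of the result is monotonically non-increasing in time.
        energy = np.abs(S) ** 2
        return np.cumsum(energy[:, ::-1], axis=1)[:, ::-1]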

Example algorithm for Loop Filter Estimation

An example of Karjalainen’s calibration scheme is shown in Figure 3.8 and can be summarized with

the following steps:

1. Determine the pitch, f0, of the recorded tone, y(n).

2. Compute the STFT on the plucked tone y(n).

3. For each frame in the STFT, estimate the magnitudes of the harmonically-related partials.

4. Estimate the slope of each partial’s magnitude trajectory across all frames in the STFT.

5. Compute a gain profile, G(fk), based on the magnitude trajectories of the harmonically related partials.

6. Apply filter design techniques (e.g. least-squares) to determine the parameters of Hl(z) that

satisfy the gain profile.

The details of each step in Karjalainen’s calibration scheme vary depending on the specific imple-

mentation. For example, the number of partials chosen for analysis is typically between 10 and 20. Also,

partial-tracking across each frame can be achieved by bandpass filtering techniques when the pitch

is known [90].

The gain profile, G(fk), extracted from the STFT analysis is computed as [29]

G(fk) = 10^(σk D / (20 fHop))   (3.19)

where σk is the slope of the kth partial's magnitude trajectory (in dB per analysis frame), D is the "loop delay" in samples and

fHop is the hop size of the STFT analysis. The physical meaning of Equation 3.19 is to determine

the amount of attenuation a particular partial of the plucked tone incurs for each pass through the

SDL. Thus, Equation 3.19 provides a gain specification for each partial in the STFT that can be

used to design a loop filter, Hl(z), with similar magnitude response characteristics.
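Steps 4 and 5 of the scheme might be sketched as follows, assuming the partial magnitude trajectories (in dB) have already been extracted from the STFT frames:

    import numpy as np

    def gain_profile(traj_db, D, hop):
        # Fit a line to each partial's dB trajectory across frames and
        # convert its slope to a per-loop-pass gain via Equation 3.19.
        # traj_db: (num_partials, num_frames); hop: STFT hop in samples.
        frames = np.arange(traj_db.shape[1])
        slopes = np.array([np.polyfit(frames, t, 1)[0] for t in traj_db])
        return 10.0 ** (slopes * D / (20.0 * hop))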

Filter Design Techniques

Least-squares filter design techniques are typically employed to derive coefficients for the loop filter that satisfy the estimated gain profile [29, 86, 90]. Valimaki et al. utilized a weighted least-squares algorithm to estimate the gain, g, and pole, α0, of Hl(z) with a transfer function described by Equation 3.13. Since a low-order filter generally cannot match the gain specifications of every partial, the weighted minimization ensures that the magnitudes of the lower, perceptually important partials are more accurately matched with the gain profile [86, 90]. These techniques must ensure that the filter coefficients are constrained for stability, which, for example, requires −1 < α0 < 0 and 0 < g < 1 when using the loop filter form of Equation 3.13. Rather than design a filter based on desired magnitude characteristics, Bank et al. propose a filter design technique that minimizes the error of the decay times for the partials in the synthetic tone [3], which are found to be perceptually significant.
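One hedged way to realize such a fit is a grid search over the pole with a closed-form gain at each candidate (a sketch, not the exact weighted least-squares design of [86, 90]); the pole is searched over (0, 1), which gives lowpass behavior for the filter form of Equation 3.13, noting that sign conventions for the stability constraint vary across the literature:

    import numpy as np

    def fit_loop_filter(G, fk, fs, w=None):
        # Fit g and the pole of Eq. 3.13 to the gain profile G at the
        # partial frequencies fk (Hz); w optionally weights low partials.
        G, fk = np.asarray(G), np.asarray(fk)
        w = np.ones_like(G) if w is None else np.asarray(w)
        omega = 2.0 * np.pi * fk / fs
        best, best_err = None, np.inf
        for pole in np.linspace(1e-4, 0.999, 500):
            H = 1.0 / np.abs(1.0 - pole * np.exp(-1j * omega))
            g = np.sum(w * G * H) / np.sum(w * H * H)  # LS-optimal gain
            err = np.sum(w * (G - g * H) ** 2)
            if err < best_err:
                best, best_err = (g, pole), err
        return best    # (g, alpha_0)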

Erkut and Laurson used Karjalainen’s calibration method as a foundation for an iterative scheme

based on nonlinear optimization to extract loop filter parameters that best match the amplitude

envelope of a recorded tone [14, 39]. The calibration scheme in Figure 3.8 is used to obtain an

initial set of loop filter parameters, which are used to resynthesize the plucked signal; an error signal is then computed between the amplitude envelopes of the recorded and synthesized signals. The

loop filter parameters are adjusted by a small amount and the process is repeated until a global

minimum in the error function is found. While this method has the potential to extract precise

model parameters, convergence is not guaranteed and its success depends on the accuracy of the

initial parameter estimates.

[Figure 3.8 block diagram: Pitch Estimation → STFT → Peak Detection → Loop Filter Design, with intermediate quantities f0 and Y(m, ω) and outputs g, α0. Accompanying plots show a plucked guitar tone, the magnitude trajectories (dB) of its first five partials with fitted slopes, and the loop filter gain specifications versus the designed filter magnitude.]

Figure 3.8: Overview of the loop filter design algorithm outlined in Section 3.3.5 using short-time Fourier transform analysis on the signal.

3.4 Extensions to the SDL Model

The SDL model discussed in this chapter simulates plucked strings that vibrate in only the transverse

(parallel to the guitar’s top plate) direction and behave in accordance with linear time-invariant as-

sumptions. These simplifications prevent modeling additional physical behavior exhibited by guitar

strings, which are described in this section. Real guitar strings vibrate along the axes parallel and

perpendicular to the guitar’s sound board. The frequency of vibration along each axis is slightly

different due to slight differences in the string's length at the bridge and nut terminations. The differences in the frequency of vibration along each axis cause the "beating" phenomenon, where the sum and difference frequencies are perceived [9]. Furthermore, these vibrations may be coupled

at the guitar’s bridge termination, which causes a two-stage decay due to the in- and out-of-phase

vibration along each axis [43].

In practice, the beating phenomenon is incorporated into synthesis systems by driving two SDL models in parallel, which represent string vibration along the transverse and perpendicular axes [30, 26, 86]. From an analysis perspective, it is difficult to simultaneously estimate parameters for both the transverse and perpendicular axes from a recording since guitar pickups measure the total vibration at a particular point on the string. Typically, the parameters for both SDL models are extracted using the methods described in Section 3.3.5, with the exception of slightly mistuning one of the delay lines to simulate the beating effect. In order to estimate the model parameters directly,

Riionheimo utilized genetic algorithms to obtain transverse and perpendicular SDL parameters that

matched recorded signals in a perceptual sense [62]. Alternatively, Lee employed a hybrid waveguide-

signal approach where the waveguide model is augmented with a resonator bank to implement

beating and two-stage decay phenomena in the lower frequency partials [43].

Modeling the tension modulation in strings necessitates the use of non-linear techniques to model the "pitch-glide" phenomenon [79, 80]. In practice, pitch-glide is simulated by pre-loading a waveguide

or SDL model with an initial string displacement and regularly computing the string’s slope to

determine an elongation parameter. This parameter drives a time-varying delay, which represents

wave speed to reproduce the tension modulation e↵ect. The caveat to this approach, however, is

that commuted synthesis cannot be applied to extract an excitation signal from a recorded tone.

For an analysis-synthesis approach, Lee uses a hybrid resonator-waveguide model. The resonator

bank is calibrated from a recording to implement pitch-glide in the low-frequency partials since, it is argued, these are perceptually more relevant [42].


CHAPTER 4: SOURCE-FILTER PARAMETER ESTIMATION

4.1 Overview

Despite the vast amount of literature dedicated to developing and calibrating physically inspired guitar models, as discussed in Chapter 3, far less research has been dedicated to estimating expression from recorded performances and incorporating these attributes into the synthesis

models. It is well-known that guitarists employ a variety of techniques to articulate guitar strings,

such as varying the loudness, or dynamics, and picking device (e.g. finger, pick), which characterizes

their playing style. Thus, identifying these playing styles from a performance is essential to developing a system capable of expressive synthesis.

In this chapter, I propose a novel method to capture expressive characteristics of guitar perfor-

mance from recordings in accordance with the single delay-loop (SDL) model overviewed in Section

3.3. This approach involves jointly estimating the source and filter parameters of the SDL in accor-

dance with a parametric model for the excitation signal, which captures the expressive attributes of

guitar performance. Since the SDL is a source-filter abstraction of the waveguide model, this method

treats the source signal as the guitarist’s string articulation while the filter represents the string’s

response behavior. The motivation for a joint estimation scheme is to account for simultaneous

variation of source and filter parameters, which characterizes particular playing styles.

Before providing the details of this approach, I briefly overview existing techniques in the litera-

ture for modeling expression in guitar synthesis models.

4.2 Background on Expressive Guitar Modeling

Erkut and Laurson present methods to generate plucked tones with different levels of musical dynam-

ics, or relative “loudness”, by manipulating a reference excitation signal with a known dynamics level.

These methods involve designing pluck-shaping filters that can achieve a desired musical dynamics

when applied to the reference excitation signal [14]. Erkut employs a method that deconvolves a

fortissimo (very loud) excitation with forte (loud) and piano (soft) excitations in order to derive

their respective pluck-shaping filter coefficients. Laurson used the differences in log-magnitude between two signals with different dynamics and autoregressive filter design techniques to approximate

a desired pluck-shaping filter [39]. Both approaches are founded on an argument that a desired

level of musical dynamics can be achieved by appropriately filtering a reference excitation signal.

A limitation of this approach, however, is the assumption that the string filter parameters remain

constant for all plucking styles, which does not always hold.

Cuzzucoli et al. presented a model for synthesizing guitar expression by considering the finger-string interaction for different plucking styles in classical guitar performance [12]. This work considered two plucking styles: apoyando, where the string is displaced quickly by the finger, and tirando,

where the finger slowly displaces the string before releasing it. The effects of these finger-string interactions are incorporated into the waveguide model by modifying the wave equation to incorporate

the force exerted on the string depending on the plucking style. For example, in the case of apoyando

plucking, the force applied to the string is impulsive, while tirando plucks are characterized by a

more gradual change in the string's tension. Cuzzucoli's approach relies on off-line analysis and no

methods are provided for deriving these parameters from a recorded signal.

Though these approaches adequately model expressive intention(s), offline analysis is required to

compute the model’s excitation signal separately from the filter. This approach is counter-intuitive

from a musical performance perspective, since it is understood by musicians that expression is, in

part, the result of a simultaneous interaction between the performer and instrument.

4.3 Excitation Analysis

The SDL model presented in Section 3.3 assumes that plucked-guitar synthesis can be modeled

by a linear and time-invariant system. Accordingly, the model output is the result of a convolution between a source signal p(n), a comb filter C(z) approximating the performer's plucking point position, and the string filter model S(z). For analysis-synthesis tasks, the commuted synthesis technique, as

overviewed in Section 3.3.4, is used to compute pb(n) by inverse filtering the recorded tone, y(n), in

the frequency domain with S(z) as shown in Equation 4.1:

Pb(z) = Y(z)S^(−1)(z)   (4.1)


It should be noted that the subscript b on p(n) indicates that the excitation signal contains a bias

from the performer’s plucking point position. Unless the comb filter C(z) from Section 3.3.4 is

known, the excitation signal derived from commuted synthesis will always contain this type of bias.
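A time-domain sketch of this inverse filtering, assuming the SDL components (the loop filter gain g and pole a0 from Equation 3.13, a fractional delay FIR hF and the integer delay DI) have already been calibrated:

    import numpy as np
    from scipy.signal import lfilter

    def extract_excitation(y, g, a0, hF, DI):
        # p_b(n) = y(n) minus the loop path Hl(z)HF(z)z^-DI driven by y,
        # i.e. filtering y through S^-1(z) = 1 - Hl(z)HF(z)z^-DI.
        loop = lfilter(g * np.asarray(hF), [1.0, -a0], y)  # Hl(z)HF(z)
        delayed = np.concatenate([np.zeros(DI), loop[:-DI]])
        return y - delayed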

4.3.1 Experiment: Expressive Variation on a Single Note

To determine if the SDL model can incorporate expressive attributes of guitar performance, exci-

tation signals are analyzed corresponding to different articulations for the same note on an electric

guitar by employing commuted synthesis with Equation 4.1. Assuming the string filter parameters

are relatively constant for each performance, one might expect that the excitation signals contain the

expressive characteristics that distinguish each playing style. Additionally, any similarities observed

between the excitations may permit the development of a parametric input model.

To test this hypothesis, recordings of electric guitar performance were analyzed using the following approach for each plucking style:

1. Vary the relative plucking strength used to excite the string from piano (soft) to forte (loud).

2. Vary the articulation used to excite the string using either a pick or a finger.

3. Calibrate the string filter, S(z), using the methodology described in Section 3.3.5.

4. Extract pb(n) by inverse filtering the recording, y(n), with S(z).

The tones used for analysis were taken from an electric guitar equipped with a bridge-mounted

piezoelectric pickup. These signals are relatively "dry" with negligible effects from the instrument's

resonant body so that the recovered excitation signals should primarily indicate the performer’s

articulation. The bridge-mounted pickup ensures that the output will be observed from the same

location on the string and the recovered excitation signal will only contain a bias due to the plucking

point effect.

The top panel of Figure 4.1 shows the recorded tones produced from specific articulations applied

to the guitar’s “open”, or unfretted, 1st string and the corresponding excitation signals obtained

using the approach outlined above are shown in the bottom panel. By observation, it is clear that

each excitation signal corresponds to the first period of oscillation for its associated signal in the top

panel of Figure 4.1 and each has negligible amplitude after this period. This is an intuitive result

since the SDL used for synthesis is tuned for the pitch of the string and its harmonics. By inverse

filtering with the SDL, the residual signal is devoid of the periodic and harmonic structure of the

Figure 4.1: Top: Plucked guitar tones representing various string articulations by the guitarist on the open, 1st string (pitch E4, 329.63 Hz). Bottom: Excitation signals for the SDL model associated with each plucking style.

recorded tone. The remaining “spikes” in the excitation signal correspond to incident and reflected

pulses detected by the pickup after the string is released from displacement (see Section 4.3.2).

Despite the similar contour patterns of the excitation signals in Figure 4.1, there are several distinguishing features related to the perceived differences in timbre. The differences between the amplitudes of overlapping impulses correspond to the relative strength of the articulation used to produce the tone. More interesting, however, are the differences between the tones produced with a pick and those produced with the finger, as the former feature sharper transitions near regions of

maximum or minimum amplitude displacement. This observation is correlated with the perceived

timbre of each tone since plucks generated with a pick have a more pronounced “attack” and will


excite the high-frequency harmonics in the string.

The common structure of the excitation signals in Figure 4.1 suggests that pb(n) can be parametri-

cally represented to capture the variations imparted by the guitarist through the applied articulation.

4.3.2 Physicality of the SDL Excitation Signal

The excitation signals in the bottom panel of Figure 4.1 follow the contours of their counterpart plucked signals in the top panel. However, the excitation signal is a short transient event that reduces to residual error

after one period of oscillation in the corresponding plucked tones. Essentially, the excitation signal

indicates one period of oscillation in the vibrating string measured at a particular position along the

string. In this case, the acceleration of the string at the guitar’s bridge is the variable observed.

The peaks observed in the excitation signals of Figure 4.1 can be explained by observing the

output of a bidirectional waveguide model over one period of oscillation. This is shown in Figure

4.2 where the output at the end of the waveguide representing the guitar’s bridge position is traced

over time. Initially, the amplitude of the acceleration wave is maximal at the moment the string is

released from its initial displacement (Figure 4.2a). Shortly after, two separate disturbances form and

travel in opposite directions along the string (Figure 4.2b). The initial peak in the excitation signal

occurs when the right-traveling wave encounters the bridge position (Figure 4.2c). The amplitude of

both traveling waves is inverted after reflecting with the boundary conditions at the nut and bridge

positions. Eventually, the initially left-traveling wave, now with inverted amplitude, encounters the

bridge position forming the second pulse of the excitation signal (Figure 4.2e). After sometime,

the initial pulse returns and the cycle repeats (Figure 4.2e). As will be discussed in Chapter 6,

identifying the pulse locations in the excitation signal can be used to estimate the guitarist’s relative

plucking position.

[Figure 4.2 panels: (a) t = 0 msec, (b) t = 0.56 msec, (c) t = 1.156 msec, (d) t = 2.26 msec, (e) t = 3.37 msec, (f) t = 5.67 msec]

Figure 4.2: The output of a waveguide model is observed over one period of oscillation. The top figure in each subplot shows the position of the traveling acceleration waves at different time instances. The bottom plot traces out the measured acceleration at the bridge (noted by the 'x' in the top plots) over time.

4.3.3 Parametric Excitation Model

The contour patterns of the excitation signals observed in Figure 4.1 and the simulated waveguide

output of Figure 4.2 are consistent with the physical behavior of the vibrating string. This suggests

that the variations in the physical behavior of a plucked-string due to di↵erent articulations can be

parametrically represented by capturing the contours of the pulse peaks. Modeling the excitation

signal with polynomial segments is a reasonable choice for approximating each contour. By concate-

nating these polynomial segments together, the excitation signal can be represented by a piecewise

function

pb(n) = c1,0 n^0 + c1,1 n^1 + · · · + c1,K n^K + · · · + cJ,0 n^0 + cJ,1 n^1 + · · · + cJ,K n^K   (4.2)

where cj,k is the kth coefficient of a Kth-order polynomial modeling the jth segment of pb(n).

Therefore, modeling a particular excitation signal requires determining the number of segments

required, the polynomial degree used to model each segment and the boundary locations specifying

where a particular segment begins and ends.

4.4 Joint Source-Filter Estimation

As shown in Section 4.3.2, the SDL excitation signal reflects one period of oscillation observed at

a particular location along the string. It was also shown that these signals differ according to the articulation imparted by the guitarist, and a parametric model was proposed to account for these differences. To model the SDL filter in response to different inputs (i.e. string articulations),

this section proposes a joint source-filter approach to simultaneously account for variation in the

excitation and string filter parameters. The estimation of these parameters is formulated as a convex optimization problem, detailed below.

4.4.1 Error Minimization

Using the SDL model, plucked string synthesis is assumed to result from a convolution between an

input signal and a string filter. To estimate these parameters in a joint framework, the error between

the excitation model described by Equation 4.2 and the residual signal must be minimized:

e(n) = p̂b(n) − pb(n).   (4.3)

Here, p̂b(n) is the excitation model from Equation 4.2 and pb(n) is the residual obtained by inverse filtering the output with the string filter. By assuming S(z) is an all-pole filter, e(n) can be expressed in the frequency domain by replacing pb(n) with Y(z)S^(−1)(z) to yield

E(z) = P̂b(z) − Y(z)S^(−1)(z)
     = P̂b(z) − Y(z)(1 − Hl(z)HF(z)z^(−D))   (4.4)

where the SDL components discussed in Chapter 3 are used to complete the inverse filtering oper-

ation. Making an all-pole assumption on S(z) treats the output of the SDL as a generalized linear

prediction problem where the current output sample y(n) is computed by a linear combination of

previous output samples. Due to the periodic nature of the plucked tone, this prediction happens

over an interval defined by the loop delay which is specified by D.

Since inverse-filtering is a time-domain process, taking the inverse Z-Transform of E(z) in Equa-

tion 4.4 yields

e(n) = p̂b(n) − y(n) + α0 y(n − D) + α1 y(n − D − 1) + · · · + αN y(n − D − N),   (4.5)

where α0, α1, . . . are generalized filter coefficients that are to be estimated. This equation can be rearranged to

e(n) = p̂b(n) + α0 y(n − D) + α1 y(n − D − 1) + · · · + αN y(n − D − N) − y(n),   (4.6)

where the unknowns due to the source signal p̂b(n) and filter (α0, α1, . . .) are clearly separated from

the recorded tone y(n). This form leads to a convenient matrix formulation as shown in Equation

4.7.

[ e(1)   ]   [ 1^0 · · · 1^K       0    · · ·    0        y(1−D)   · · · y(1−D−N)   ]       [ y(1)   ]
[  ...   ]   [ ...                 ...                    ...                       ]       [  ...   ]
[ e(i)   ] = [ i^0 · · · i^K       0    · · ·    0        y(i−D)   · · · y(i−D−N)   ]  x −  [ y(i)   ]
[ e(i+1) ]   [ 0  · · ·  0    (i+1)^0 · · · (i+1)^K       y(i+1−D) · · · y(i+1−D−N) ]       [ y(i+1) ]
[  ...   ]   [ ...                 ...                    ...                       ]       [  ...   ]
[ e(m)   ]   [ 0  · · ·  0      m^0   · · ·   m^K         y(m−D)   · · · y(m−D−N)   ]       [ y(m)   ]

e = Hx − y   (4.7)

H contains the time indices corresponding to the boundaries of p̂b(n) and the shifted samples of y(n), and the unknown source-filter parameters are contained in a column vector x defined as

x = [ c1,0 · · · c1,K · · · cJ,0 · · · cJ,K α0 α1 · · · αN ]^T.   (4.8)

Full specification of Equation 4.7 requires determining the number of unknown source and filter

parameters. The generalized filter depends on N coe�cients while the excitation signal depends on

the number of piecewise polynomials used to model it. J indicates the number of segments and K

is the polynomial order for each segment.
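A sketch of assembling H, assuming 0-indexed segment boundaries are known and that samples of y earlier than the start of the recording are taken as zero:

    import numpy as np

    def build_H(y, bounds, K, N, D):
        # bounds = [n_0, ..., n_J] with n_J = m. Columns: J*(K+1)
        # polynomial terms, then N+1 shifted copies of y for the
        # generalized loop coefficients of Equation 4.7.
        m, J = bounds[-1], len(bounds) - 1
        H = np.zeros((m, J * (K + 1) + N + 1))
        for j in range(J):
            for n in range(bounds[j], bounds[j + 1]):
                H[n, j*(K+1):(j+1)*(K+1)] = float(n + 1) ** np.arange(K + 1)
        for i in range(N + 1):
            idx = np.arange(m) - D - i      # time index n - D - i
            valid = idx >= 0                # earlier samples taken as zero
            H[valid, J*(K+1) + i] = np.asarray(y)[idx[valid]]
        return H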

4.4.2 Convex Optimization

The source-filter parameters are found by identifying the unknowns in x that minimize Equation 4.7.

The complexity of this problem is obviously related to the number of segments used to parameterize

pb(n) and the order of the generalized filter used to implement the string decay. In general, the number of unknowns is J × (K + 1) + N + 1.

A common metric for estimating the unknown parameters is the L2-norm of the error term in Equation 4.7, which leads to

min_x ‖e‖² = min_x ‖Hx − y‖².   (4.9)

Expanding 4.9 yields

min_x ‖Hx − y‖² = (Hx − y)^T (Hx − y)
               = x^T H^T H x − 2 y^T H x + y^T y
               = (1/2) x^T F x + g^T x + y^T y   (4.10)

where F = 2H^T H and g^T = −2 y^T H. Equation 4.10 is now in the form of a convex optimization

problem. In this form, any locally minimum solution must also be a global solution [6].

Before applying a solver to the optimization problem, the constraints on the source-filter param-

eters in x must be addressed. For example, depending on the structure used for the loop filter, the

constraints may specify bounds on the coefficients to yield a stable filter. Specific constraints for the

filter models used will be discussed in Sections 5.2 and 5.3. Regardless of the filter structure used,

the constraints regarding the excitation model are consistent. In particular, the segments constitut-

ing the excitation should be a smooth concatenation of polynomial functions that are continuous

at the boundary locations. As an example, consider an excitation consisting of J = 2 segments,

each modeled with a K-order polynomial and sharing a boundary located at n = i. The equality

condition ensuring that these segments are continuous can be expressed as

c1,0 · i0 + c1,1 · i1 + · · · + c1,K · iK = c2,0 · i0 + c2,1 · i1 + · · · + c2,K · iK ,

which, in matrix form, is notated as

[ i^0  i^1 ... i^K  -i^0  -i^1 ... -i^K ] [ c_{1,0}  c_{1,1} ... c_{1,K}  c_{2,0}  c_{2,1} ... c_{2,K} ]^T = 0.

The term on the left contains the time indices of the polynomial functions and the column vector contains the unknown source coefficients. Since the real excitation signals dealt with here consist of more than two segments, additional equality conditions are required for each pair of segments sharing a boundary.
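A small sketch of how these continuity conditions might be stacked into an equality-constraint matrix is given below; the column layout of x (segment coefficients first, filter coefficients last) follows Equation 4.8, while the function name and the n_filter argument are illustrative assumptions.

    import numpy as np

    def continuity_constraints(boundaries, J, K, n_filter):
        """One equality row per shared boundary i:
        [i^0 ... i^K, -i^0 ... -i^K, 0 ...] x = 0 (assumes J >= 2)."""
        rows = []
        for j in range(J - 1):
            i = boundaries[j + 1]            # boundary shared by segments j, j+1
            row = np.zeros(J * (K + 1) + n_filter)
            powers = np.array([float(i)**k for k in range(K + 1)])
            row[j * (K + 1):(j + 1) * (K + 1)] = powers          # segment j
            row[(j + 1) * (K + 1):(j + 2) * (K + 1)] = -powers   # minus segment j+1
            rows.append(row)
        return np.vstack(rows), np.zeros(J - 1)  # A_eq x = b_eq = 0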

The constraints on the source-filter parameters are specified for the optimization problem via equality and inequality conditions, denoted by A_eq and A, respectively. By including these constraints, the optimization problem from Equation 4.10 is expressed as

min_x f(x) = (1/2) x^T F x + g^T x    (4.11)
subject to  A x ≤ b
            A_eq x = b_eq,

where the last term of Equation 4.10 is dropped from the objective function f(x) since it is constant with respect to x and does not contribute to the minimization. In Equation 4.11, b and b_eq specify the bounds on the parameters related to the inequality and equality constraint matrices, respectively. When written in the form of Equation 4.11, Equation 4.9 can be solved using quadratic programming techniques.

Several software packages are available for this task, including CVX and the quadprog function in MATLAB's Optimization Toolbox. quadprog employs a "trust region" algorithm, where a gradient approximation is used to evaluate a small neighborhood of possible solutions in x to determine convergence [47]. CVX is also adept at solving quadratic programs, though it formulates the objective function as a second-order cone problem [18]. CVX is the preferred solver for the work in this thesis because the syntax used to specify the quadratic program is essentially identical to the mathematical description of the minimization problem in Equation 4.10.
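As an illustration of how compactly the problem can be stated, a minimal sketch using the Python cvxpy package (standing in here for MATLAB's CVX; the matrices H, y, A, b, Aeq, and beq are assumed to be assembled beforehand) is:

    import cvxpy as cp

    def solve_source_filter(H, y, A, b, Aeq, beq):
        """Solve min ||Hx - y||^2 subject to Ax <= b and Aeq x = beq
        (Equations 4.9 and 4.11)."""
        x = cp.Variable(H.shape[1])
        # sum_squares(Hx - y) expands to the quadratic form of Equation 4.10.
        prob = cp.Problem(cp.Minimize(cp.sum_squares(H @ x - y)),
                          [A @ x <= b, Aeq @ x == beq])
        prob.solve()
        return x.value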


CHAPTER 5: SYSTEM FOR PARAMETER ESTIMATION

Figure 5.1: Proposed system for jointly estimating the source-filter parameters for plucked guitar tones. The recorded tone y(n) passes through coarse onset detection, pitch estimation (f0), and pitch-synchronous onset localization and segment estimation (boundaries n0, n1, ..., nJ), which initialize the least-squares problem ||Hx - y||^2; solving the optimization yields the source-filter parameters x.

This chapter presents the details of the implementation of the joint source-filter estimation scheme proposed in Chapter 4. Figure 5.1 provides a diagram of the proposed system, including the major sub-tasks required for estimating the parameters directly from recordings. Section 5.1 discusses the onset localization of the plucked-guitar signal. This is required to determine the pitch of the tone during the "attack" instant and to localize the indices for the parametric model of the excitation signal. The experiments applying the joint source-filter scheme are presented in Sections 5.2 and 5.3, which include the problem formulation, solution, and analysis of the results.

5.1 Onset Localization

To estimate the SDL excitation signal in the joint framework, the physics of a vibrating string fixed at both end points are exploited. When the SDL model is considered without the comb-filter effect explicitly accounted for, the excitation signal corresponds to one period of string vibration, which can be identified in the recorded signal. From the physical modeling overview provided in Chapter 3, when the string is released from an initial displacement, two disturbances are produced that travel in opposite directions along the string. These disturbances are measured by the guitar's pickup as impulse-like signals, where the first pulse is incident from the string's initial displacement and the second is inverted by reflection at the guitar's nut. A simulation of this behavior using acceleration as the wave variable was shown in Section 4.3.2. By identifying these pulses in the initial period of vibration, the portion of the recorded signal corresponding to the excitation signal can be identified.

This section overviews the approach used to identify the boundaries of the excitation within the plucked-guitar signal, which includes locating the incident and reflected pulses. As will be explained in Chapter 6, the spacing of these pulses provides insight for estimating the performer's relative plucking position along the string. The approach utilizes a two-stage onset detection and is outlined as follows:

1. Employ "coarse" onset detection to determine a rough onset time for the "attack" of the plucked tone.

2. Estimate the pitch of the tone starting from the coarse onset.

3. Using the estimated pitch value, employ pitch-synchronous onset detection to estimate an onset closer to the initial "attack" of the signal.

4. Search for the local minimum and maximum values within the first period of the signal.

5.1.1 Coarse Onset Detection

Onset detection is an important tool used for many tasks in music information retrieval (MIR)

systems, such as the identification of performance events in recorded music. For example, on a large

scale it may be of interest to identify the beats from a recording of polyphonic music by looking

for the drum onsets. For melody detection on a monophonic signal, the onsets must be found to

determine when the instrument is actually playing.

A thorough review of onset detection algorithms is provided in [4] and details several sub-tasks

of the process including pre-processing of the audio signal, reducing the audio signal to a detection

function and locating the onsets by finding peaks in the detection function. Obtaining a spectral

representation of the audio signal is often the initial step for computing a detection function since the

time-varying energy in the spectrum can indicate when certain transient events occur, such as note

onsets. The short-time Fourier Transform (STFT) provides a time-varying spectral representation

and may be computed as

Y_k(n) = Σ_{m=-N/2}^{N/2-1} y(m) w(m - nh) e^{-j2πmk/N}.    (5.1)

In Equation 5.1, w(m) is an N-point window function and h is the hop size between adjacent windows. The STFT facilitates the computation of several detection functions for onset detection tasks, including spectral flux. For monophonic recordings of instruments with an impulsive attack, such as the guitar, Bello et al. show that spectral flux performs well in identifying onsets [4]. Spectral flux is calculated as the squared distance between successive frames of the STFT:

SF(n) = Σ_{k=-N/2}^{N/2-1} { R( |Y_k(n)| - |Y_k(n-1)| ) }^2,    (5.2)

where R(x) = (x + |x|)/2 is a rectification function that accounts for only positive changes in energy while ignoring negative changes.
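A minimal NumPy sketch of this computation is shown below; the Hann window, the default frame and hop sizes, and the function name are illustrative assumptions.

    import numpy as np

    def spectral_flux(y, N=2048, hop=512):
        """Rectified spectral flux (Equation 5.2) of a windowed STFT (Equation 5.1).
        y is a 1-D numpy array."""
        w = np.hanning(N)
        frames = [y[i:i + N] * w for i in range(0, len(y) - N, hop)]
        mags = np.abs(np.fft.rfft(frames, axis=1))   # |Y_k(n)|
        diff = np.diff(mags, axis=0)                 # |Y_k(n)| - |Y_k(n-1)|
        rect = (diff + np.abs(diff)) / 2             # R(x) = (x + |x|)/2
        return np.sum(rect**2, axis=1)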

The "coarse" onset detection is so named because a relatively large window size of N = 2048 samples is used to compute the STFT in Equation 5.1 and the flux in Equation 5.2. The motivation for using such a long window is to identify the "attack" portion of the plucked tone, where the largest energy increase occurs, while ignoring spurious noise preceding the onset. The corresponding detection function is shown in the top panel of Figure 5.3(a), where there is a clear peak. The onset is taken as the time instant two frames prior to the maximum of the detection function.

5.1.2 Pitch Estimation

The coarse onset detected in Figure 5.3(a) is still quite far from the "attack" segment of the plucked signal. Searching for the pulse indices too far from the onset of the signal will likely result in false detections, so a closer estimate is required. This is the purpose of pitch-synchronous onset detection. The pitch of the signal is estimated by taking a window of audio equal to three times the STFT frame length, starting from the coarse onset location. Using this window, the pitch is estimated with the well-known autocorrelation function, which is given by

φ(m) = (1/N) Σ_{n=0}^{N-1} [y(n+l) w(n)] [y(n+l+m) w(n+m)], for 0 ≤ m ≤ N-1,    (5.3)

Figure 5.2: Pitch estimation using the autocorrelation function. The lag corresponding to the global maximum indicates the fundamental frequency for a signal with f0 = 330 Hz.

where w(n) is a window of length N. Autocorrelation is used extensively for detecting periodicity in signal processing tasks since it can reveal underlying structure in signals, especially for speech and music. If φ(m) for a particular signal is periodic with period P, then that signal is also periodic with the same period [61]. The pitch of the plucked signal is estimated by searching for the global maximum in φ(m) that occurs after the point of maximum correlation, i.e., the point of zero lag where m = 0. An example autocorrelation plot is provided in Figure 5.2.
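A sketch of the autocorrelation pitch estimator is given below; the 50-1000 Hz search band, used to skip the zero-lag peak and bound the lag search, is an illustrative assumption.

    import numpy as np

    def estimate_pitch(y, fs, onset, frame=3 * 2048):
        """Autocorrelation pitch estimate (Equation 5.3), starting at the coarse onset."""
        seg = y[onset:onset + frame] * np.hanning(frame)
        phi = np.correlate(seg, seg, mode='full')[frame - 1:]  # lags 0 .. frame-1
        lo, hi = int(fs / 1000), int(fs / 50)  # assumed 50-1000 Hz search band
        tau = lo + np.argmax(phi[lo:hi])       # global maximum after zero lag
        return fs / tau                        # f0 = fs / tau_max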

5.1.3 Pitch Synchronous Onset Detection

The estimated pitch of the plucked signal is used to recompute the STFT with a frame size equal to half the estimated pitch period, starting from the coarse onset location. The spectral flux is also recomputed using Equation 5.2 and the new frame size. This yields a detection function with much finer time resolution. As an example, the pitch-synchronous onset for a plucked signal is shown in Figure 5.3(b), where the onset is taken as the first locally maximum peak indicated by the detection function. Comparing all the panels of Figure 5.3, it is evident that the two-stage onset detection procedure provides an onset that is sufficiently close to the "attack" portion of the plucked note.

Figure 5.3: Overview of residual onset localization in the plucked-string signal. (a): Coarse onset localization using a threshold based on spectral flux with a large frame size. (b): Pitch-synchronous onset detection utilizing a spectral flux threshold computed with a frame size proportional to the fundamental frequency of the string. (c): Plucked-string signal with the coarse and pitch-synchronous onsets overlaid.


5.1.4 Locating the Incident and Reflected Pulse

With the pitch-synchronous onset location, identifying the indices of the incident and reflected pulses is accomplished via a straightforward search for the minimum and maximum peaks within the first period of the signal. This period is known from the previous pitch estimation step. The plucked signal from Figure 5.3 is shown again in detail in Figure 5.4 for emphasis. The indices of the pulses are used as boundaries for fitting polynomial curves to model the excitation signal. It should be noted that a straightforward search for the minima and maxima is sensitive to noise preceding the incident pulse; the pitch-synchronous onset detection is capable of ignoring this noise and yielding an onset closer to the incident pulse location.
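A sketch of this peak search is given below; it assumes, as in Figure 5.4, that the incident pulse is the strongest extremum in the first period and that the nut reflection has the opposite sign.

    import numpy as np

    def locate_pulses(y, onset, period):
        """Locate the incident and reflected pulses in the first period of vibration."""
        cycle = y[onset:onset + period]
        incident = onset + int(np.argmax(np.abs(cycle)))   # strongest pulse
        # The reflected pulse is inverted relative to the incident pulse.
        sign = np.sign(y[incident])
        reflected = onset + int(np.argmin(sign * cycle))
        return incident, reflected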

Figure 5.4: Detail view of the "attack" portion of the plucked-tone signal in Figure 5.3. The pitch-synchronous onset is marked, as well as the incident and reflected pulses from the first period of oscillation.

5.2 Experiment 1

This section presents the application of the joint source-filter estimation scheme proposed in Section 4.4 when the loop filter chosen is a single-pole infinite impulse response (IIR) type. The problem formulation and solution are discussed, as well as the application of the scheme to a corpus of plucked guitar tones.

5.2.1 Formulation

In the literature, the decay rates of the harmonically related partials of plucked-guitar tones are often approximated by a single, infinite impulse response (IIR) filter of the form

H_l(z) = g / (1 - α_0 z^{-1}).    (5.4)

In this formulation, the pole α_0 is tuned so that the spectral roll-off of the filter's magnitude response approximates the decay rates of the harmonically related partials in the plucked guitar tone. The gain term g in the numerator is tuned to improve the fit.

To estimate this type of filter in the joint source-filter framework, Equation 5.4 is substituted for H_l(z) in the SDL string filter S(z):

S(z) = 1 / (1 - H_l(z) H_F(z) z^{-D_I})
     = (1 - α_0 z^{-1}) / (1 - α_0 z^{-1} - g H_F(z) z^{-D_I}).    (5.5)

The zero contributed by the numerator of Equation 5.5 poses a problem for the joint source-filter estimation approach because inverse filtering Y(z) with S(z) no longer reduces to an FIR filtering operation. This is problematic because carrying out the inverse filtering of Y(z) by S(z) in the time domain then requires previous samples of the excitation signal p_b(n), which is unknown.

In practice, we can circumvent this difficulty and still formulate the joint source-filter estimation problem by discarding the numerator of S(z) in Equation 5.5 to yield an all-pole filter. This approximation is justified by a few observations about the source-filter system. First, the magnitude response of S(z), shown in Figure 5.5(c), is dominated by its poles, which create a resonant structure passing frequencies located near the string's harmonically related partials. Examining the values estimated for the loop filter pole α_0 in the literature [14, 39, 86, 90], α_0 is typically very small (|α_0| ≪ 1).

Figure 5.5: Pole-zero and magnitude plots of a string filter S(z) with f0 = 330 Hz and a loop filter pole located at α_0 = 0.03. The pole-zero and magnitude plots of the system are shown in (a) and (c), and the corresponding plots using an all-pole approximation of S(z) are shown in (b) and (d).

As shown in Figure 5.5(a), this places the corresponding zero in the numerator of S(z) close to the origin of the z-plane, giving it a negligible effect on the filter's magnitude response. Figure 5.5(d) shows that the magnitude response of the all-pole approximation is essentially identical to that of its pole-zero counterpart in Figure 5.5(c).

The next observation is that the model of the excitation signal consists of a short-duration pulse with zero amplitude after the first period of vibration, as discussed in Section 4.3. The non-zero part of the excitation signal pertains to how the string was plucked, while the remaining part is residual error from the string model. By making a zero-input assumption on the excitation signal after the initial period, the recursion from the numerator of S(z) can be ignored without much effect on the estimation.

Taking these observations into account, the numerator of S(z) is discarded and an all-pole approximation is obtained:

S(z) = 1 / (1 - α_0 z^{-1} - g H_F(z) z^{-D_I}).    (5.6)

The fractional delay coefficients due to H_F(z) must be addressed before the error minimization between the residual and excitation filter can be formulated (i.e., Equation 4.3). H_F(z) is an N-th order FIR filter

H_F(z) = Σ_{n=0}^{N} h_n z^{-n},    (5.7)

where the coefficients for a desired delay can be computed using a number of design techniques. A consequence of realizing a causal fractional delay filter is that an additional integer delay of ⌊N/2⌋ samples is introduced into the feedback loop of S(z). In practice, this can be compensated for, so as to avoid de-tuning the SDL, by subtracting the delay added by H_F(z) from the bulk delay filter z^{-D_I}, as long as N ≪ D_I.

The required fractional delay D_F and the bulk delay D_I can be determined from the estimated pitch of the guitar tone discussed in Section 5.1.2, and H_F(z) is computed using the Lagrange interpolation technique overviewed in Appendix A. The error minimization from Equation 4.4 can now be specified for this particular case:

E(z) = P_b(z) - Y(z)(1 - α_0 z^{-1} - g(h_0 + h_1 z^{-1} + ... + h_N z^{-N}) z^{-D_I}).    (5.8)

By expanding Equation 5.8, rearranging terms, and taking the inverse Z-transform, the error minimization is expressed in the time domain as

e(n) = p_b(n) + α_0 y(n - 1) + β_0 y(n - D_I) + β_1 y(n - D_I - 1) + ... + β_N y(n - D_I - N) - y(n),    (5.9)

where β_j = g h_j, for j = 0, 1, 2, ..., N.


5.2.2 Problem Solution

Using the convex optimization approach presented in Section 4.4.2, minimizing the L2-norm of Equation 5.9 becomes

min_x ||Hx - y||^2    (5.10)
subject to  0.001 ≤ α_0 ≤ 0.999
            0.001 ≤ β_j ≤ 0.999, for j = 0, 1, ..., N.

The first inequality in the minimization ensures that the estimated loop filter pole α_0 lies within the unit circle for stability and gives the filter a low-pass characteristic. Though α_0 = 0 is a stable solution, the resulting loop filter would apply no frequency-dependent damping, so 0.001 was chosen as a lower bound on α_0. The second inequality constraint relates to the stability of the overall string filter S(z). If the gain g of the loop filter were permitted to exceed unity, certain frequencies could be amplified, which would result in an unstable string filter response. Thus, the product of g with each fractional delay filter coefficient h_j is constrained to avoid this. Each h_j is fixed by the nature of the fractional delay filter design, leaving g as the free parameter.

In addition to the inequality constraints, equality constraints were placed on the minimization in Equation 5.10 to enforce continuous excitation boundaries, as discussed in Section 4.4.2. The excitation boundaries were identified using the two-stage onset localization scheme from Section 5.1. While this approach yields 3 segments corresponding to the incident and reflected pulses, it was found that additional segments were needed to adequately model the complex contours of the excitation signal. To reduce the modeling complexity, two equally spaced boundaries were inserted between the incident and reflected pulses, as shown in the top panel of Figure 5.6. Including the boundary after the first period of the signal, this yields a total of 5 boundaries requiring 6 segments to be modeled. 5th-order polynomial functions were found to provide the best approximation of each segment while maintaining feasibility in the optimization problem, since increasing the order also increases the number of unknown variables. Lower-order functions are unable to capture the details of the signal, while higher-order functions generally resulted in the solver failing to converge on a solution.


5.2.3 Results

The source-filter estimation scheme was applied to a corpus of recorded performances of a guitarist

exciting each of the 6 strings using various fret positions. Multiple articulations were performed at

each position, which included using a finger or pick and altering the dynamics, or relative hardness,

of the excitation. Additional details about the data are provided in Section 6.3.

Figure 5.6 demonstrates the analysis and resynthesis for a tone produced by plucking the open,

1st string of the guitar. The top panel of Figure 5.6 shows the identification of the boundaries for

the excitation signal model within the first period of the recorded tone. The middle panel shows the

resynthesized tone and estimated excitation signal using the parameters obtained from the convex

optimization. The error computed between the synthetic and recorded tones is shown in the bottom

panel of Figure 5.6 along with the error computed between the estimated excitation signal and the

residual from inverse filtering. Areas of the error signals with significant amplitude can be attributed

to several factors. First, the approximation of the excitation may not capture all the high frequency

details present in the recorded signal. Second, the SDL model has fixed-frequency tuning whereas

the pitch of the recorded tone tends to fluctuate due to changing tension as the string vibrates,

which results in misalignment. Finally, the loop filter model assumes that the string’s partials

monotonically decay over time even though the decay characteristics of recorded tones are generally

more complex. This results in amplitude discrepancy between the analyzed and synthetic signals,

which contributes to the error as well.

Figure 5.7 shows that the source-filter estimation approach is capable of estimating the loop filter

pertaining to string articulations resulting from varying dynamics. Figures 5.7(a) and 5.7(b) show

the amplitude decay characteristics of analyzed and synthesized tones produced with a piano artic-

ulation, respectively. In this case, the synthetic tone demonstrates the gradual decay characteristics

of its analyzed counterpart. As the articulation dynamics are increased to mezzo-forte, the observed

decay is more rapid in both the analyzed and synthetic cases in Figures 5.7(c) and 5.7(d). Finally,

Figures 5.7(e) and 5.7(f) show a forte articulation defined by a very rapid decay. In all cases, the

synthetic signals constructed from the estimated parameters convey the perceptual characteristics

of their analyzed counterparts.

Figure 5.8 shows a similar plot of analyzed and resynthesized signals for various articulations, but focuses on tones produced on a lower-pitched, heavier-gauge string. In this case, the string's behavior deviates significantly from the SDL model since the amplitude decay rate fluctuates over time.

Figure 5.6: Analysis and resynthesis of the guitar's 1st string in the "open" position (E4, f0 = 329.63 Hz). Top: original plucked-guitar tone, residual signal, and estimated excitation boundaries. Middle: resynthesized pluck and excitation using the estimated source-filter parameters. Bottom: modeling error.

Figure 5.7: Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 1st string at the 2nd fret position (F#4, f0 = 370 Hz) at piano, mezzo-forte, and forte dynamics. Panels: (a) piano, analyzed; (b) piano, synthetic; (c) mezzo-forte, analyzed; (d) mezzo-forte, synthetic; (e) forte, analyzed; (f) forte, synthetic.

This behavior is characteristic of tones that exhibit strong beating and tension modulation. Although these behaviors are not captured by the joint estimation approach, the optimization routine identifies loop filter parameters that provide the best overall approximation of the tone's decay characteristics.

Figure 5.8: Comparing the amplitude envelopes of synthetic plucked-string tones produced with the parameters obtained from the joint source-filter algorithm against their analyzed counterparts. The tones under analysis were produced by plucking the 5th string at the 5th fret position (D3, f0 = 146.83 Hz) at piano, mezzo-forte, and forte dynamics. Panels: (a) piano, analyzed; (b) piano, synthetic; (c) mezzo-forte, analyzed; (d) mezzo-forte, synthetic; (e) forte, analyzed; (f) forte, synthetic.

To assess the model "fit" for each signal in the data set, the signal-to-noise ratio (SNR) was computed as

SNR_dB = 10 log_10 [ (1/L) Σ_{n=0}^{L} ( y(n) / ( y(n) - ŷ(n) ) )^2 ],    (5.11)

where L is the length of the analyzed guitar tone y(n) and ŷ(n) is the resynthesized tone using the parameters from the joint estimation scheme. This metric provides an indication of the average amplitude distortion introduced by the modeling scheme for a particular signal, so that in the ideal case there is zero amplitude error distorting the signal.
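A direct transcription of Equation 5.11 is sketched below; the small eps guard against division by zero is an added assumption, not part of the metric as stated.

    import numpy as np

    def snr_db(y, y_hat, eps=1e-12):
        """SNR of Equation 5.11 between an analyzed tone y and its resynthesis y_hat."""
        ratio = y / (y - y_hat + eps)            # y(n) / (y(n) - y_hat(n))
        return 10 * np.log10(np.mean(ratio**2))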

Table 5.1 summarizes the mean and standard deviation of the SNR computed for particular

articulations on certain strings. For example, the SNR values for all forte plucks produced with the

guitarist’s finger along the 1st string are computed and the mean and standard deviation of these

values is reported. No distinction is made for different fret positions along a string.

It should be noted that the mean SNR value for a particular dynamic (e.g., forte) corresponding to pick articulations is generally lower than for the same plucking dynamic produced with the guitarist's finger. This can be explained by the action of the plastic pick, which induces rapid frequency excursions in the partials of the string and other nonlinear behaviors such as tension modulation. These effects are prominent near the "attack" portion of the tone, and the associated string decay does not exhibit the monotonically decaying exponential characteristics assumed in the single delay-loop model. The linear time-invariant model cannot capture these complexities of the string vibration, and the estimated loop filter provides a "best fit" to match the overall decay characteristics. This leads to a greater amplitude discrepancy between the modeled and analyzed tones and thus a lower SNR value.

For the 3rd string, the SNR values are significantly lower for the pick articulations. A closer inspection revealed that many of these tones exhibited resonant effects from coupling with the guitar's body. This resonant effect introduces a "hump" in the tone's amplitude decay envelope after the initial attack. Since the string model does not consider the instrument's resonant body, this effect is not accounted for, which leads to increased amplitude error for the affected portions of the signal.

Informal listening tests confirm that the synthetic signals preserve many of the perceptually

important characteristics of the original tones, including the transient “attack” portion of the signal

related to the guitarist’s articulation.

Mean and Standard Deviation of Signal-to-Noise Ratio (dB)

                          Pick                                          Finger
String   piano          mezzo-forte    forte          piano          mezzo-forte    forte
1        50.27 ± 1.52   51.92 ± 1.73   52.03 ± 2.12   49.80 ± 2.53   52.70 ± 1.74   54.66 ± 1.51
2        50.23 ± 1.37   50.35 ± 1.19   53.58 ± 2.18   52.10 ± 3.29   55.34 ± 1.39   55.48 ± 1.34
3        48.30 ± 0.99   48.60 ± 1.29   48.85 ± 1.53   50.73 ± 3.86   55.62 ± 3.12   56.36 ± 2.37
4        51.19 ± 1.29   52.11 ± 0.85   51.78 ± 1.98   54.44 ± 2.37   57.06 ± 1.18   56.47 ± 1.30
5        49.80 ± 1.59   50.16 ± 1.80   49.12 ± 1.04   53.63 ± 1.79   56.38 ± 1.53   55.60 ± 1.03
6        51.09 ± 1.23   51.61 ± 1.65   51.98 ± 1.77   53.78 ± 1.84   53.88 ± 1.65   55.09 ± 1.25

Table 5.1: Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones based on an IIR loop filter.

5.3 Experiment 2

This section investigates the solution of the joint source-filter estimation scheme when a finite impulse

response (FIR) filter is used to implement the loop filter. The problem formulation, solution and

results are discussed as well.

5.3.1 Formulation

The Z-transform of a generalized, length-N (order N-1) FIR filter is given by

H(z) = Σ_{k=0}^{N-1} h_k z^{-k},    (5.12)

where each h_k is an impulse response coefficient of the filter. By using this filter structure for the string model's loop filter, the transfer function of S(z) becomes

S(z) = 1 / (1 - H_l(z) H_F(z) z^{-D_I}).    (5.13)

For the plucked-string system defined by the transfer function of S(z), the output is computed entirely by a linear combination of past output samples once the transient-like excitation has reached a zero-input state. Estimating the filter coefficients through the error minimization technique discussed in Section 4.4.1 becomes complicated, since the loop filter coefficients are convolved with the coefficients of the fractional delay filter H_F(z), which is also modeled as an FIR filter, and the contribution of the loop filter cannot be easily separated. In practice, this difficulty is averted by resampling the recorded signal y(n) to a frequency that can be defined by an integer number of delays determined by the bulk delay term D_I, which allows H_F(z) to be dropped. Though this has the effect of adjusting the frequency of the signal to f_0 = f_s / D_I, the fractional delay filter can be re-introduced during synthesis to correct the pitch.

After the resampling operation, the Z-transform of the error minimization becomes

E(z) = P_b(z) - Y(z)S^{-1}(z)
     = P_b(z) - Y(z)(1 - (h_0 + h_1 z^{-1} + ... + h_{N-1} z^{-(N-1)}) z^{-D_I}).    (5.14)

Expanding terms and taking the inverse Z-transform of Equation 5.14 yields the time-domain formulation of the error minimization

e(n) = p_b(n) + h_0 y(n - D_I) + h_1 y(n - D_I - 1) + ... + h_{N-1} y(n - D_I - (N-1)) - y(n),    (5.15)

where the loop filter coefficients h_k can be estimated with the convex optimization approach.

5.3.2 Problem Solution

Before solving for the source and filter parameters, several constraints are imposed on the FIR loop

filter. Foremost, the loop filter is required to have a low pass characteristic, to avoid amplifying

high frequency partials. This is consistent with the assumed operation of the loop filter in relation

to the behavior of plucked-guitar tones described in Section 3.3.3 where, in general, high frequency

partials are perceived as decaying faster than lower frequency partials. The next constraint on the

loop filter is that it exhibit a linear phase response to avoid introducing excessive phase distortion

into the frequency response of the string filter S(z). These filters also have the convenient property

of constant group delay, so as not to drastically de-tune S(z) when the signal is resynthesized.

The low-pass constraints on the FIR filter can be formulated by constraining the magnitude response of the filter at DC and at Nyquist. At DC (ω = 0), the filter gain is required to be at most 1, which yields the following inequality constraint on the filter coefficients:

|H(e^{jω})|_{ω=0} ≤ 1
|h_0 + h_1 e^{-j·0} + h_2 e^{-j·0·2} + ... + h_{N-1} e^{-j·0·(N-1)}| ≤ 1
h_0 + h_1 + h_2 + ... + h_{N-1} ≤ 1.    (5.16)

At the Nyquist frequency (ω = π), we require the filter to have zero magnitude response. This is expressed as an equality constraint on the filter coefficients:

|H(e^{jω})|_{ω=π} = 0
|h_0 + h_1 e^{-jπ} + h_2 e^{-j2π} + ... + h_{N-1} e^{-j(N-1)π}| = 0
h_0 - h_1 + h_2 - ... + (-1)^{N-1} h_{N-1} = 0.    (5.17)

The linear phase constraint requires that the filter coefficients be symmetric. This imposes a final set of equality constraints on the coefficients:

h_k = h_{N-1-k}, for k = 0, ..., N-1.    (5.18)

The process of identifying the boundaries for the segments of the excitation signal is identical to the procedure described in Section 5.2.2, and 5th-order polynomials are also used for segment fitting. Equation 5.19 summarizes the constrained minimization problem after taking the L2-norm of Equation 5.15 and imposing the constraints from Equations 5.16-5.18, in addition to the constraints placed on the input signal as specified in Section 4.4.2:

min_x ||Hx - y||^2    (5.19)
subject to  Σ_{k=0}^{N-1} h_k ≤ 1
            Σ_{k=0}^{N-1} (-1)^k h_k = 0
            h_k = h_{N-1-k}, for k = 0, ..., N-1.
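One way these constraints might be encoded for the solver is sketched below; the assumption that the N filter coefficients occupy the last N columns of the unknown vector x (starting at column index offset) carries over from the layout of Equation 4.8.

    import numpy as np

    def fir_loop_constraints(N, offset):
        """Inequality and equality rows for a length-N FIR loop filter
        (Equations 5.16-5.18)."""
        n_cols = offset + N
        A = np.zeros((1, n_cols))
        A[0, offset:] = 1.0                          # sum h_k <= 1  (Eq. 5.16)
        b = np.array([1.0])

        nyq = np.zeros(n_cols)
        nyq[offset:] = [(-1)**k for k in range(N)]   # alternating sum = 0 (Eq. 5.17)
        sym = []
        for k in range(N // 2):                      # h_k = h_{N-1-k}  (Eq. 5.18)
            row = np.zeros(n_cols)
            row[offset + k], row[offset + N - 1 - k] = 1.0, -1.0
            sym.append(row)
        Aeq = np.vstack([nyq] + sym)
        return A, b, Aeq, np.zeros(Aeq.shape[0])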


Mean and Standard Deviation of Signal-to-Noise Ratio (dB)

                          Pick                                          Finger
String   piano          mezzo-forte    forte          piano          mezzo-forte    forte
1        50.81 ± 1.61   51.94 ± 1.68   52.03 ± 1.85   49.51 ± 2.77   52.88 ± 1.83   54.77 ± 1.66
2        50.76 ± 1.19   50.68 ± 1.13   52.64 ± 1.93   52.26 ± 3.33   56.03 ± 1.32   55.69 ± 1.29
3        48.78 ± 0.97   48.70 ± 1.20   49.65 ± 1.44   50.89 ± 3.91   56.21 ± 3.48   56.30 ± 2.68
4        51.60 ± 1.05   52.18 ± 0.66   52.32 ± 1.72   54.45 ± 2.16   57.28 ± 2.16   56.45 ± 1.23
5        49.68 ± 1.65   50.10 ± 1.66   49.78 ± 1.92   53.76 ± 2.07   56.48 ± 1.58   55.28 ± 1.05
6        51.30 ± 1.43   51.73 ± 1.51   52.12 ± 1.86   53.92 ± 1.95   54.03 ± 1.84   55.23 ± 1.75

Table 5.2: Mean and standard deviation of the SNR computed using Equation 5.11. The joint source-filter estimation approach was used to obtain parameters for synthesizing the guitar tones using an FIR loop filter with length N = 3.

5.3.3 Results

The source-filter estimation scheme using the FIR loop filter was applied to the same corpus of signals

used in Experiment 1 and the MATLAB CVX package was again used to solve the minimization

from Equation 5.19. Table 5.2 summarizes the mean and standard deviation of the SNR computed

in the same manner as Experiment 1 using Equation 5.11. These values were computed based on

re-synthesizing the plucked-guitar tones using an FIR loop filter with length N = 3.

The values reported in Table 5.2 from this experiment are on par with the values obtained

in Experiment 1. That is, the FIR modeling approach exhibits roughly the same average SNR

values and trends for different articulations and strings. However, by comparing the synthetic tones produced by the methods of Experiments 1 and 2, we noted that the FIR filter does not always

adequately match the decay rates for the high frequency partials. This yielded synthetic tones that

sounded “buzzy” since the high frequency partials were not decaying fast enough.

We attempted to improve the perceptual qualities of the synthetic tones to better match their analyzed counterparts by increasing the length of the FIR loop filter. However, using filters with length N > 3 often resulted in the overall response of the single delay-loop model becoming unstable. Though the FIR loop filter is inherently stable by design and constraints were placed on the filter at the DC and Nyquist frequencies, it may still exhibit gains exceeding unity at mid-range frequencies. Since this filter sits in the feedback loop of the single delay-loop model, the overall response becomes unstable when the excitation signal has energy at those mid-range frequencies.

5.4 Discussion

This chapter presented the implementation details for the joint source-filter estimation scheme pro-

posed in Chapter 4. This included a two-stage onset detection based on a spectral flux computation

to estimate the pitch of the plucked-tone and identify the location of the incident pulses used to

estimate the source signal. The system was implemented using two different loop filter structures

which characterize the frequency-dependent decay characteristics of the guitar tones.

The first implementation utilized a one-pole IIR filter to model the string's decay response. The formulation of the joint estimation scheme using this filter required an all-pole approximation of the single delay-loop transfer function. By applying the estimation scheme with this formulation, it was shown that the modeling scheme was capable of capturing the source signals and string decay responses characteristic of the articulations in the data set. The articulations produced with the guitarist's pick led to more complex string responses, and the source-filter estimation method extracts filter parameters that best approximate these characteristics. Modeling error is attributed to the accuracy of the estimated source signal, which may omit some noise-like characteristics, and to the non-ideal decay of real strings, which is generally not monotonic as the model assumes.

The second implementation utilized an FIR loop filter model, which inherently leads to an all-pole transfer function for the single delay-loop model and is thus more flexible in terms of adding taps to improve the fit. Though a low-order (length N = 3) FIR filter performed similarly to the IIR case in terms of SNR, it did not adequately taper off the high-frequency content of the tones. Increasing the order of this filter led to unstable single delay-loop transfer functions due to the loop filter gain occasionally exceeding unity. Thus, the IIR loop filter proved more robust in terms of stability and provided a better match to the string's decay characteristics for high-frequency partials.


CHAPTER 6: EXCITATION MODELING

6.1 Overview

In Chapter 3, physically inspired models of the guitar were discussed, including the popular waveguide synthesis and the related source-filter models. In particular, the source-filter approximation is attractive for analysis and synthesis tasks because these models provide a clear analog to the physical phenomena involved in exciting a guitar string: that is, an impulse-like force from the performer excites the resonant behavior of the string. In Section 4.3, it was shown that analysis via the source-filter approximation can be used to recover excitation signals corresponding to particular string articulations, thereby providing a measure of the performer's expression. In Section 4.4, a technique was proposed to jointly estimate the excitation signal along with the filter model using a piecewise polynomial approximation of the excitation signal, which contains a bias from the performer's relative plucking point position along the string.

Including the method proposed in Section 4.4.1, many techniques are available for estimating and calibrating the resonant filter properties for the source-filter model [29, 36, 86], but less research has been invested in the analysis of the excitation signals, which are responsible for reproducing the unique timbres associated with the performer's articulation. This is a complex problem, since there is a nearly infinite number of ways to pluck a string, each of which will yield a unique excitation (under the source-filter model) even when the tones have a similar timbre. In particular, it is desirable to have methods by which particular articulations can be quantified from analysis of the associated excitation signal. For applications, it would also be desirable to manipulate a parametric representation for arbitrary plucked-string synthesis.

In this chapter, a components analysis approach is applied to a corpus of excitation signals derived from recordings of plucked-guitar tones in order to obtain a quantitative representation to model the unique characteristics of guitar articulations. In particular, principal components analysis (PCA) is employed for this task to exploit common features of excitation signals while modeling the finer details using the appropriate principal components. This approach can be viewed as developing a codebook, where the entries are principal component vectors that describe the unique characteristics of the excitation signals. Additionally, these components are used as features for visualization of particular articulations and dimensionality reduction. Nonlinear PCA is employed to yield a two-way mapping that isolates specific performance attributes, which can be used for synthesizing excitation signals.

This research has several applications, including modeling guitar performance directly from recordings in order to capture expressive and perceptual characteristics of a performer's playing style. Additionally, the codebook entries obtained in this chapter can be applied to musical interfaces for control and synthesis of expressive guitar tones.

6.2 Previous Work on Guitar Source Signal Modeling

Existing excitation modeling techniques are based on either the digital waveguide or related source-

filter models. While both are discussed at length in Chapter 3, the source-filter model and its

components are briefly overviewed here to re-introduce notation pertinent to the remainder of the

chapter.

Figure 6.1 shows the model achieved when the bi-directional waveguide model is reduced to a source-filter approximation. The lower block, S(z), of Figure 6.1 is referred to as the single delay-loop (SDL) and consolidates the DWG model into a single delay line z^{-D_I} in cascade with a string decay filter H_l(z) and a fractional delay filter H_F(z). These filters are calibrated such that the total delay, D, in the SDL satisfies D = f_s / f_0, where f_s and f_0 are the sampling frequency and fundamental frequency, respectively. H_l(z) is designed using the techniques discussed in Section 3.3.5 [29, 36, 86], while the fractional delay filter can be designed using a number of techniques discussed in Appendix A. The upper block, C(z), of Figure 6.1 is a feedforward comb filter that incorporates the effect of the performer's plucking point position along the string. Since the SDL lacks the bi-directional characteristics of the DWG, C(z) simulates the boundary conditions produced when a traveling wave encounters a rigid termination. Absent from Figure 6.1 is an additional comb filter modeling the pickup position where the string output is observed. While this affects the resulting excitation signals when commuted synthesis is used for recovery, it is omitted here since the data used for evaluations were collected using a constant pickup position.

While the SDL is essentially a source-filter approximation of the physical system for a plucked-

string, there are several benefits associated with modeling tones in this manner. For example,

modifying the source signal permits arbitrary synthesis of unique tones even for the same filter model. Also, for analysis tasks it is desirable to model the perceptual characteristics of tones from a recorded performance by recovering the source signal using linear filtering operations (see Section 3.3.4 on Commuted Synthesis), which is possible with a source-filter model.

Figure 6.1: Source-filter model for plucked-guitar synthesis. C(z), a feed-forward comb filter built around the delay z^{-λD}, simulates the effect of the player's plucking position; S(z), comprising H_l(z), H_F(z), and z^{-D_I} in a feedback loop, models the string's pitch and decay characteristics.

There are several approaches used in the literature for determining the excitation signal for the

source-filter model of a plucked-guitar. A possible source signal includes filtered white noise, which

simulates the transient, noise-like characteristics of a plucked-string [31]. A well-known technique

involves inverse filtering a recorded guitar tone with a properly calibrated string-model [29, 36].

When inverse filtering is used, the string model cancels out the tone’s harmonic components leaving

behind a residual that contains the excitation in the first few milliseconds. In [39], these residuals

are processed with “pluck-shaping” filters to simulate the performer’s articulation dynamics. For

improved reproduction of acoustic guitar tones, this approach is extended by decomposing the tone

into its deterministic and stochastic components, separately inverse filtering each signal and adding

the residuals to equalize the spectra of the residual [90]. Other methods utilize non-linear processing

to spectrally flatten the recorded tone and use the resulting signal as the source, since it preserves

the signal’s phase information [38, 41]. Lindroos et al. consider the excitation signal to consist of

three parts, which include the picking noise, the first impulse detected by the pickup and a second,

reflected pulse also detected by the pickup at some later time [44]. The picking noise is modeled

with low-pass filtered white noise and the first pulse is modeled with an integrating filter.

Despite the range of modeling techniques described above, these methods are not generalizable for describing a multitude of string articulations. For example, Laurson's approach involves storing the residual signals obtained from inverse-filtering recorded plucks, along with filters that shape a reference residual signal in order to achieve another residual with a particular dynamic level (e.g., piano, forte) [39]. While this approach is capable of "morphing" one residual into another, the relationship between the pluck-shaping filters and the physical effects of modifying plucking dynamics is somewhat arbitrary. Additionally, this method does not remove the bias of the guitarist's plucking point location, which is undesirable since the plucking point should be a free parameter for arbitrary resynthesis. On the other hand, Lee's approach handles this problem by "whitening" the spectrum of the recorded tone to remove spectral bias. However, this requires preserving the phase information, resulting in a signal equal to the duration of the recorded tone, which is not a compact representation of the signal.

6.3 Data Collection Overview

It is understood by guitarists that exactly reproducing a particular articulation on a guitar string is extremely difficult, if not impossible, due to the many degrees of freedom available when exciting the string. These degrees of freedom during the articulation comprise parts of the guitarist's expressive palette, including:

• Plucking device (e.g. pick, finger, nail)

• Plucking location along the string

• Dynamics (i.e. the relative "hardness" or "softness" during the articulation)

These techniques have a direct impact on the initial shape of the string, yielding perceptually unique timbres, especially during the "attack" phase of the tone. It is important to note that, unlike the waveguide model presented in Chapter 3, the SDL does not allow the initial waveshape to be specified via wave variables (e.g. displacement, acceleration). Instead, signal processing techniques must be used to derive the excitation signals through analysis of recorded tones, and it is initially unclear how exactly to parameterize the effects of the plucking device and dynamics once the signals are recovered. Additionally, a significant amount of data is needed to analyze the effects of these expressive parameters on the resulting excitation signals.

This section details the approach and apparatus used to collect plucked guitar recordings con-

taining the expressive attributes listed above. The recovery of the excitation signals from the data

will be explained in Section 6.4.


6.3.1 Approach

The plucked-guitar signals under analysis were produced using an Epiphone Les Paul Standard gui-

tar equipped with a Fishman Powerbridge pickup. A diagram of the Powerbridge pickup is shown

in Figure 6.2 and features a piezoelectric sensor mounted on each string’s saddle on the bridge

[15]. Unlike the magnetic pickups traditionally used for electric guitars, the piezoelectric pickup

responds to pressure changes due to the string’s vibration at the bridge. For the application of

excitation modeling, the piezoelectric pickup has several benefits over magnetic pickups, including

the measurement of a relatively "dry" signal that does not include significant resonant effects arising from the instrument's body. Also, magnetic pickups tend to introduce a low-pass filtering effect on

the spectra of plucked-tones, but the piezo pickups record a much wider frequency range, which is

useful for modeling the noise-like interaction between the performer’s articulation and the string.

Finally, recordings produced with the bridge-mounted piezo pickup can be used to isolate the pluck-

ing point location for equalization, which will be explained in Section 6.4.2, since the pickup location

is constant at the bridge.

Bridge

Saddle

Piezo Crystals

Saddle Position Screw

Figure 6.2: Front orthographic projection of the bridge-mounted piezoelectric bridge used to recordplucked-tones. A piezoelectric crystal is mounted on each saddle, which measures pressure duringvibration. Guitar diagram obtained from www.dragoart.com.

The guitar was strung with a set of D'Addario "10-gauge" nickel-wound strings. The gauge reflects the diameter of the first (highest) string, which is 0.010 inches, while the last (lowest) string has a 0.046 inch diameter. As is common with electric guitar strings, the lowest 3 strings (4-6) feature a wound construction while the highest 3 (1-3) are unwound. Recordings were produced using either the fleshy part of the guitarist's finger or a Dunlop Jazz III pick.

The data set of plucked-recordings was produced by varying the articulation across the fretboard

of the guitar using either the guitarist’s finger or the pick. For each fret, the guitarist produces a

specific articulation five consecutive times for consistency using the pick and their finger. The artic-

ulations were identified by their dynamic level and consisted of piano (soft), mezzo-forte (medium-

loud) and forte (loud). The performer’s relative plucking point position along the string was not

specified and remained a free parameter during the recordings. The articulations were produced on

each of the guitar’s six strings using the “open” string position as well as the first five frets, which

yielded approximately 1000 plucked-guitar recordings.

The output of the guitar's bridge pickup was fed directly to an M-Audio Fast Track Pro USB

interface, which recorded the audio directly to a Macintosh computer. Audacity, an open source

sound recording and editing tool, was used to record the samples at a sampling rate of 44.1 kHz with a

16-bit depth [49].

Due to the difference in construction between the lower and higher strings on the guitar, the recordings were analyzed in two separate groups reflecting the wound and unwound strings. In terms of the acquisition system, this affects how the signals are resampled in Figure 6.3. For the unwound strings, the signals were resampled to a reference pitch of 196 Hz, which corresponds to the tuning of the open 3rd string, the lowest pitch possible on the unwound set. Similarly, the wound strings were resampled to 82.4 Hz, which is the pitch of the open 6th string and the lowest note possible in the wound set.

6.4 Excitation Signal Recovery

Before the articulations can be modeled from recordings of plucked-guitar tones, a few pre-processing tasks must be addressed: 1) estimate the residual signal from the plucked guitar recordings, and 2) remove the bias associated with the guitarist's plucking point position. As discussed in Section 6.2, a limitation of existing excitation modeling methods is that they do not explicitly handle this bias. The system overviewed in Figure 6.3 addresses these tasks, and its various sub-blocks are explained in this section.


6.4.1 Pitch Estimation and Resampling

The initial step of the excitation recovery scheme involves estimating the pitch of the plucked guitar tone. This is achieved using the well-known autocorrelation method, which estimates the pitch over the first 2-3 periods of the signal by searching for the lag corresponding to the maximum of the autocorrelation function (see Section 5.1.2) [61]. The fundamental frequency is computed as f_0 = f_s / τ_max, where f_s is the sampling frequency and τ_max is the lag at the maximum of the autocorrelation function.

Since the plucked-guitar tones under analysis have varying fundamental frequencies, a resampling operation is required to compensate for differences in pulse width when the residual is recovered. This is a required pre-processing step before principal components analysis, since the goal is to model differences in articulation that are not related to pitch. Otherwise, the extracted basis vectors will not reflect the differences in articulation, but rather the differences between the fundamental periods of the analyzed tones.

The resampling operation on the plucked tone is defined as

ỹ(n) = ↑_{β/γ} y(n),    (6.1)

where β/γ is the resampling factor; β = T_ref and γ = T_0 indicate the periods, in samples, of the reference frequency and the estimated pitch frequency of the plucked tone, respectively.
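A sketch of this pitch normalization using SciPy's Fourier-domain resampler is shown below; the function name and the choice of resampler are illustrative assumptions.

    from scipy.signal import resample

    def normalize_pitch(y, f0, f_ref, fs=44100.0):
        """Resample a pluck so its period matches the reference (Equation 6.1);
        f_ref is 196 Hz for the unwound strings or 82.4 Hz for the wound strings."""
        beta = fs / f_ref                 # T_ref, reference period in samples
        gamma = fs / f0                   # T_0, estimated pitch period in samples
        n_out = int(round(len(y) * beta / gamma))
        return resample(y, n_out)         # resampling by the factor beta/gamma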

6.4.2 Residual Extraction

There are several methods of extracting the residual from the recorded tone. The most generalized approach was discussed in Section 4.3 and involves inverse filtering the recorded tone with the calibrated string model presented in Section 6.2 to yield the residual excitation p_b(n).

Figure 6.3: Diagram outlining the residual equalization process for excitation signals. The recorded tone y(n) undergoes pitch estimation (f0), residual extraction via inverse filtering or joint estimation (yielding p_b(n)), plucking point estimation (yielding d_rpp), and residual equalization, producing the equalized excitation p(n).

Figure 6.4: "Comb filter" effect resulting from plucking a guitar string (open E, f0 = 331 Hz) 8.4 cm from the bridge. (a) Residual obtained from the single delay-loop model (time in msec vs. amplitude). (b) Residual spectrum (frequency in Hz vs. magnitude in dB). Using Equation 6.2, the notch frequencies are located at approximate multiples of 382 Hz.

The approach proposed in Chapter 4 outlines an alternate method to jointly estimate the excitation and filter parameters for a plucked guitar tone. It should be noted that the subscript b on p_b(n) indicates that the residual contains a "plucking point bias", which will eventually be removed.

6.4.3 Spectral Bias from Plucking Point Location

The "Plucking Point Estimation" block in Figure 6.3 is concerned with determining the position where the guitarist has displaced the string. It is well understood in the literature on string physics and digital waveguide modeling that the plucking point position imparts a comb-filter effect on the spectrum of the vibrating string [17, 30, 64]. This occurs because the harmonics that have a node at the plucking position are not excited and, in the ideal case, have zero amplitude.

Figure 6.4 shows the residual and its spectrum obtained from plucking an open E string (f0 = 331 Hz) approximately 8.4 cm from the bridge of an electric guitar. From Figure 6.4(a), the first spike in the residual results from the impulse produced by the string's initial displacement arriving at the bridge pickup. The subsequent spike also results from the initial string displacement, but has an inverted amplitude due to traveling in the opposite direction along the string and reflecting at the guitar's nut. A detailed description of this behavior is provided in Figure 4.2 in Section 4.3.2. Unlike a pure impulse, which has a flat frequency response, the residual spectrum in Figure 6.4(b) contains deep notches spaced at near-regular frequency intervals. By denoting the relative plucking position along the string as d_rpp = l / L_s, where l is the distance from the bridge and L_s is the length of the string, the notch frequencies can be calculated by

f_notch,n = n f_0 / (1 - d_rpp), for n = 0, 1, 2, ....    (6.2)

The comb filter bias creates a challenge for parameterizing the excitation signals since the guitarist's relative plucking position constantly varies depending on the position of their strumming hand and their fretting hand. Even when the guitarist maintains the same plucking distance from the bridge, changing the fretting position along the neck manipulates the relative plucking position by elongating or shortening the effective length of the string. While guitarists vary the relative plucking point location, either consciously or subconsciously, during performance, modeling the excitation signal requires estimation of the plucking point position and equalization to remove its spectral bias. Ideally, it is desirable to recover the pure impulsive signal imparted by the guitarist when striking the string, as shown in Figure 6.9, in order to quantify expressive techniques such as plucking mechanism and dynamics. Such analysis requires estimating the plucking point location from recordings and equalizing the residuals to remove the bias.

6.4.4 Estimating the Plucking Point Location

Previous techniques in the literature for estimating the plucking point location from guitar recordings

have focused on spectral or time-domain analysis techniques.

Traube proposed a method of estimating the plucking point location by comparing a sampled-

data magnitude spectrum obtained from a recording to synthetic magnitude spectra generated with

different plucking point locations [83, 84]. The plucking point location for a particular recording

was determined by finding the synthetic string spectra with a plucking position that minimizes the

magnitude error between the measured and ideal spectra.

Later, Traube introduced a plucking-point estimation method based on iterative optimization

and the so-called log-correlation, which is computed from recordings of plucked tones [81, 82]. The

log-correlation is computed by taking the log of the squared Fourier coefficients for the harmonically-related partials in a plucked-guitar spectrum and applying the inverse Fourier transform to these coefficients. The log-correlation function yields an initial estimate of the relative plucking position, drpp = τmin/τ0, where τmin and τ0 are the lags indicating the minimum and maximum of the log-correlation function, respectively. The estimate of drpp is used to initialize an iterative optimization scheme,


which minimizes the difference between ideal and measured spectra, in order to refine drpp and

improve accuracy.

Penttinen et al. exploited time domain-based analysis techniques to estimate the plucking po-

sition [58, 59]. Using an under-saddle bridge pickup, Penttinen’s technique is based on identifying

the impulses associated with the string’s initial displacement as they arrive at the bridge pickup.

Since the initial string displacement produces two impulses traveling in opposite directions, the ar-

rival time between each impulse at the bridge, Δt, provides an indication of the guitarist's relative

plucking position along the string.

Figure 6.5 shows the output of a bridge-mounted piezo-electric pickup for a plucked-guitar tone.

By determining the onsets when each pulse arrives at the bridge pickup, Penttinen shows that the relative plucking position can be determined by

d_{rpp} = \frac{f_s - \Delta T f_0}{f_s}, \quad (6.3)

where ΔT = fsΔt indicates the number of samples between the arrival of each impulse at the bridge

pickup [58, 59]. As drpp is in the range (0, 1), the actual distance from the bridge is obtained by multiplying drpp by the length of the string. Penttinen utilizes a two-stage onset detection to determine ΔT, where the first stage isolates the onset of the plucked tone and the second stage uses the estimated pitch of the tone to extract one period of the waveform. The autocorrelation of the extracted period is used to determine ΔT, since the minimum of the autocorrelation function occurs at the lag where the signal's impulses are out of phase. Figure 6.6(a) shows one cycle extracted from the waveform in Figure 6.5 and the corresponding autocorrelation of that signal in Figure 6.6(b). Δt is identified by searching for the index corresponding to the minimum of the autocorrelation function.
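A minimal sketch of this autocorrelation step might look as follows. The function name is illustrative, and the two-stage onset detection that isolates the analyzed period is not reproduced here:

```python
import numpy as np

def estimate_relative_pluck_position(period, f0, fs):
    """Autocorrelation step of Penttinen's method: `period` is one
    pitch period extracted just after the onset by the (omitted)
    two-stage onset detection."""
    # Autocorrelation over non-negative lags.
    acf = np.correlate(period, period, mode="full")[len(period) - 1:]
    # The deepest minimum marks the lag where the two traveling pulses
    # are out of phase: Delta_T samples between impulse arrivals.
    delta_T = int(np.argmin(acf))
    # Equation 6.3: d_rpp = (fs - Delta_T * f0) / fs.
    return (fs - delta_T * f0) / fs
```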

There are several strengths and weaknesses associated with the methods proposed by Traube and

Penttinen. Traube’s approach is generalizable to acoustic guitar tones recorded using an external

microphone. However, a relatively large time window on the order of 100 milliseconds is required to

achieve the frequency resolution required to resolve the string’s harmonically related partials and,

thus, compute the autocorrelation function. By including multiple periods of string vibration in the

analysis, the effect of the plucking position can become obscured since non-linear coupling of the

string’s harmonics can regenerate the missing harmonics [16]. By isolating just one period of the

waveform near the onset, Penttinen’s technique avoids this physical consequence since the analyzed


Figure 6.5: Plucked-guitar tone measured using a piezo-electric bridge pickup. Vertical dashed lines indicate the impulses arriving at the bridge pickup; Δt indicates the arrival time between impulses.

Figure 6.6: (a) One period extracted from the plucked-guitar tone in Figure 6.5. (b) Autocorrelation of the extracted period. The minimum is marked and denotes the time lag, Δt, between arriving pulses at the bridge pickup.

segment results from the string's initial displacement. However, Penttinen's approach requires the guitar to be equipped with the bridge-mounted pickup to isolate the arrival time of the impulses in the first period of vibration. Also, isolating the first period of vibration is difficult, and success depends on the parameters used in the two-stage onset detection.

Handling the effect of a string pickup location at a position other than the bridge is not explicitly addressed by either method. Similar to the spectral bias resulting from the plucking point location, the pickup location also adds a spectral bias since vibrating modes of the string with a node at the pick


up location will not be measured. Traube's methods are developed for the acoustic guitar recorded with a microphone some distance from the instrument's sound hole. In this case, the "pickup" is the radiated acoustic energy from all positions along the string and thus shows no particular spectral bias. For electric guitars, if a bridge-mounted pickup is not available, determining the plucking location is particularly difficult due to the lack of consistency in where pickups are placed on the instrument and in the number used. The former constraint makes it difficult to determine which impulse (i.e., the left-traveling or right-traveling pulse) is being measured at the output, and the latter constraint complicates the problem since some guitars "blend" the signal from two or more pickups.

6.4.5 Equalization: Removing the Spectral Bias

The next step in the excitation acquisition scheme is to remove the comb filter bias associated with

the plucking point position. In Figure 6.3, the “Residual Equalization” block handles this task.

The equalization begins by obtaining an estimate of the relative plucking-point location drpp

along the string. Since the signals under analysis were recorded with a bridge-mounted pickup,

Penttinen’s autocorrelation-based technique was chosen to estimate drpp. The two-stage onset de-

tection approach presented in Section 5.1 was used to identify the incident and reflected pulses

during the initial period of vibration. drpp is then used to formulate a comb filter to approximate

the notches in the spectrum of the residual

H_{cf}(z) = 1 - \mu z^{-\lfloor \lambda D \rfloor}, \quad (6.4)

where λ = 1 − drpp and D = fs/f0 is the "loop delay" of the digital waveguide model determining the pitch of the string [74]. ⌊λD⌋ denotes the greatest integer less than or equal to the product λD. μ is a gain factor applied to the delayed signal, which determines the depth of the notches in the spectrum; μ values closer to 1 lead to deeper notches [76]. Intuitively, Equation 6.4 specifies the number of samples, as a fraction of the total loop delay, between the arrival of each impulse at the bridge.

The basic comb filter structure in Equation 6.4 and Figure 6.7 (a) provides a good approximation

of the spectral nulls associated with the plucking point position. However, it is limited to sample-

level accuracy, which may not adequately approximate the true notch frequencies in the spectrum.

For more precise localization, a fractional delay filter is inserted into the feed-forward path to provide

Figure 6.7: Comb filter structures for simulating the plucking point location. (a) Basic structure. (b) Basic structure with a fractional delay filter added to the feed-forward path to implement non-integer delay.

the required non-integer delay, as shown in Figure 6.7(b) [88]. Thus, the resulting fractional delay comb filter has the form

H_{cf}(z) = 1 - \mu F(z) z^{-\lfloor \lambda D \rfloor}, \quad (6.5)

where F(z) provides the fractional precision lost by rounding the product λD. F(z) can be designed using several techniques available in the literature, including all-pass filters and FIR Lagrange interpolation filters, as discussed in Appendix A.

Using the comb filter structure from Equation 6.4 or 6.5, pb(n) can be equalized by inverse filtering

P(z) = \frac{P_b(z)}{H_{cf}(z)}. \quad (6.6)
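A minimal sketch of this equalization, assuming the integer-delay comb of Equation 6.4 (the fractional-delay refinement of Equation 6.5 is omitted here; the function name is illustrative):

```python
import numpy as np
from scipy.signal import lfilter

def equalize_residual(pb, f0, d_rpp, fs, mu=0.95):
    """Inverse comb filtering per Equations 6.4 and 6.6."""
    D = fs / f0                             # loop delay of the waveguide model
    N = int(np.floor((1.0 - d_rpp) * D))    # floor(lambda * D) samples
    h_cf = np.zeros(N + 1)                  # H_cf(z) = 1 - mu * z^{-N}
    h_cf[0], h_cf[-1] = 1.0, -mu
    # Equation 6.6: filter pb by 1/H_cf(z); stable for mu < 1 since
    # the poles of the inverse filter sit at radius mu**(1/N).
    return lfilter([1.0], h_cf, pb)
```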

Figure 6.8 demonstrates the effects of equalizing the residual in both the time and frequency

domains. Figures 6.8(a) and 6.8(b) show the time and spectral domain plots, respectively, of the

residual obtained from a plucked-guitar tone. Figure 6.8(b) also plots the frequency response of

the estimated comb filter, which approximates the deep notches found in the residual. A 5th-order

fractional delay was used for the comb filter and a value of 0.95 was used for the gain term µ. This

value was found to provide the closest approximation of the spectral notches for the signals in the


dataset. Figure 6.8(c) and 6.8(d) show the time and spectral domain plots when the residual is

equalized by inverse filtering. In the spectral domain, inverse comb filtering yields a magnitude

spectrum that is relatively free of the deep notches seen in 6.8(b). In the time domain plot of 6.8(c)

this translates into a signal that is much closer to a pure impulse.

Figure 6.8: Spectral equalization of a residual signal obtained from plucking a guitar string 8.4 cm from the bridge (open E, f0 = 331 Hz). (a) Residual. (b) Residual spectrum and comb filter approximation. (c) Residual with bias removed. (d) Original and equalized spectra using the inverse comb filter.

6.4.6 Residual Alignment

After equalization, the final step is to align the processed excitation signals with a reference excitation signal. This ensures that the impulse "peak" of each signal is aligned in the time domain to avoid errors in the principal components analysis. In practice, this is accomplished by copying the reference and processed signals and cubing them, which suppresses the lower-amplitude samples relative to the primary peak. The cross-correlation is computed between each signal and the reference pulse. The lag indicating maximum correlation determines the shift needed to align each signal with the


reference pulse.
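A sketch of this alignment step (the function name is illustrative; in practice the shifted signal would be trimmed rather than circularly rotated):

```python
import numpy as np

def align_to_reference(signal, reference):
    """Cube copies of both signals to emphasize the main peak, then
    shift `signal` by the lag of maximum cross-correlation."""
    xcorr = np.correlate(signal ** 3, reference ** 3, mode="full")
    lag = int(np.argmax(xcorr)) - (len(reference) - 1)
    return np.roll(signal, -lag)    # circular shift; trim in practice
```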

For excitation signal modeling and parameterization, the residual equalization scheme has several

benefits. From an intuitive standpoint, the impulsive-like signals obtained from equalization are more

indicative of the performer’s string articulation. Also, signals in this form are simpler to model and

therefore more adept for parameterization. Finally, removing the plucking point bias allows the

relative plucking point location to remain a free parameter for synthesis applications.

6.5 Component-based Analysis of Excitation Signals

6.5.1 Analysis of Recovered Excitation Signals

By applying the excitation recovery and equalization scheme of the previous section to the corpus

of recordings gathered in Section 6.3, analysis of the recovered signals provides insight into the

similarities and differences of excitation signals corresponding to various string articulations. Figure

Figure 6.9: Excitation signals corresponding to strings excited using a pick (a) and finger (b).

6.9 (a) and (b) shows excitation signals, overlaid on each other, which were obtained from plucked-guitar tones produced using either a plastic pick (a) or the player's finger (b). For both finger and pick articulations, the dynamics of the pluck consisted of piano (soft), mezzo-forte (moderately loud) and forte (loud). These plots show a common, impulsive-like contour with additional high-frequency characteristics depending on the dynamics used. Comparing Figures 6.9 (a) and (b), it is evident that the signals corresponding to finger articulations are generally wider, whereas the pick excitation signals are narrower and closer to an ideal impulse.

Figure 6.10 plots the average magnitude spectrum for each type of articulation in the data set.

Figure 6.10: Average magnitude spectra of signals produced with pick (a) and finger (b).

For each type of articulation (finger or pick), increasing the relative dynamics from piano to forte results in increased high-frequency spectral energy. An interesting observation is that piano-finger articulations show a significant high-frequency ripple. This may be attributed to the deliberately slower plucking action used to produce these articulations, where the string slides more slowly off the player's finger. When these signals are used to re-synthesize plucked-guitar tones, they often have a qualitative association with the perceived timbre of the resulting tones. Descriptors such as "brightness" are often used to describe the timbre, which generally increases with the dynamics of the articulations. The varying energy in the plots of Figure 6.10 provides quantitative support for this observation.

6.5.2 Towards an Excitation Codebook

Based on the observations of Figures 6.9 and 6.10, we propose a data-driven approach for mod-

eling excitation signals using principal components analysis (PCA). Employing PCA is motivated

by observing the similar, impulse-like structure of the excitation signals shown in Figure 6.9. As

discussed, the fine differences between the derived excitation signals can be attributed to the guitarist's articulation and account, in part, for the spectral characteristics of the perceived tones.

These di↵erences can be modeled using a linear combination of basis vectors to provide the desired

spectral characteristics. The results of this analysis will be used to develop a codebook that consists

of the essential components required to accurately synthesize a multitude of articulation signals. At

present, PCA has not yet been applied to modeling the excitation signals for source-filter models of

plucked-string instruments. However, PCA has been applied to speech coding applications, in which


principal components are used to model voice-source waveforms including the complex interactions

between the vocal tract and glottis [19, 51].

This section presents the application of PCA to the data set and the development of an excitation

codebook using the basis vectors. The re-synthesis of excitation signals corresponding to particular

string articulations will also be presented.

6.5.3 Application of Principal Components Analysis

The motivation for applying principal components analysis (PCA) to plucked-guitar excitation sig-

nals is to achieve a parametric representation of these signals through statistical analysis. In Section

6.5.1 it was shown that excitation signals corresponding to different articulations shared a common

impulsive-contour, but had varying high frequency details depending on the specific articulation.

The goal of PCA is to apply a statistical analysis to this data set which is capable of extracting

basis vectors that can model these fine details. By exploiting redundancy in the data set, PCA leads

to data reduction for parametric representation of signals.

PCA is defined as an orthogonal linear transformation of the data set onto a new coordinate system [13]. The first principal axis in this new space explains the greatest variance in the original data set, the second axis explains the greatest remaining variance, and so on. Figure 6.11 depicts the application of PCA to synthetic data in a two-dimensional space. The vectors v1 and v2 define the principal component axes for the data set.

The principal components are found by computing the eigenvalues and eigenvectors of the covariance matrix of the data set [5]. This is the well-known Covariance Method for PCA [13].

Figure 6.11: Application of principal components analysis to a synthetic data set. The vector v1 explains the greatest variance in the data while v2 explains the remaining greatest variance.

The initial step involves formulating a data matrix

P = \begin{bmatrix} | & | & & | \\ p_1 & p_2 & \cdots & p_N \\ | & | & & | \end{bmatrix}^{T} \quad (6.7)

where each pi is an M-length column vector corresponding to a particular excitation signal in the data set. The next step involves computing the covariance matrix of the mean-centered data matrix by taking

\Sigma = E\left[ (P - u)(P - u)^{T} \right] \quad (6.8)

where E is the expectation operator and u = E[P] is the empirical mean of the data matrix. The principal component basis vectors are obtained through an eigenvalue decomposition of Σ:

V^{-1} \Sigma V = D \quad (6.9)

where V = [v1 v2 . . . vN] is a matrix of eigenvectors of Σ and D is a matrix containing the associ-

ated eigenvalues along its main diagonal. The LAPACK linear algebra software package is used to

compute the eigenvectors and eigenvalues [2].

The columns of V are sorted in order of decreasing eigenvalues in D such that λ1 > λ2 > · · · > λN. This step is performed so that the PC basis vectors are rearranged in a manner that explains the most variance in the data set.

To reconstruct the excitation signals, the correct linear combination of basis vectors is required. The correct weights are obtained by projecting the mean-centered data matrix onto the eigenvectors

W = (P - u)V. \quad (6.10)

Equation 6.10 defines an orthogonal linear transformation of the data onto a new coordinate system


defined by the basis vectors. The weight matrix W is defined as

W = \begin{bmatrix} | & | & & | \\ w_1 & w_2 & \cdots & w_N \\ | & | & & | \end{bmatrix}^{T}, \quad (6.11)

where each w is an M-length column vector containing the scores (or weights) pertaining to a particular excitation signal in P. These scores indicate how much each basis vector is weighted when reconstructing the signal, and they are also helpful in visualizing the data, as will be discussed in the next section.
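The Covariance Method of this section can be sketched compactly in Python; here NumPy's symmetric eigendecomposition (which wraps the same LAPACK routines cited above) stands in for the thesis's LAPACK usage, and the function name is illustrative:

```python
import numpy as np

def pca_covariance_method(P):
    """Covariance Method of Section 6.5.3. Rows of P are the N
    excitation signals, each of length M (Equation 6.7)."""
    u = P.mean(axis=0)                    # empirical mean u = E[P]
    Pc = P - u                            # mean-centered data
    Sigma = np.cov(Pc, rowvar=False)      # M x M covariance (Equation 6.8)
    # Eigendecomposition (Equation 6.9); eigh exploits the symmetry
    # of Sigma.
    evals, V = np.linalg.eigh(Sigma)
    order = np.argsort(evals)[::-1]       # sort by decreasing eigenvalue
    evals, V = evals[order], V[:, order]
    W = Pc @ V                            # scores/weights (Equation 6.10)
    return u, V, evals, W
```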

6.5.4 Analysis of PC Weights and Basis Vectors

Principal component analysis of the excitation signals is divided into two groups to separately

examine the sets of wound and unwound strings, which have different physical characteristics, as

described in Section 6.3.

For the set of unwound strings, the recovered excitation signals were normalized to a reference

length of M = 570 samples, which is approximately twice the length of the period corresponding

to the open 3rd string tuned to 196 Hz. For the set of wound strings, the reference length of the

excitation signals was set to M = 910 samples, which is approximately twice the period of the open

6th string tuned to 82.4 Hz. It should be noted that normalization was achieved via downsampling

to avoid truncating significant sections of the excitation signal. Downsampling to the lowest possible

frequency in the set of strings also avoids the loss of high frequency information present in the data

set. PCA was applied to both groups of excitation signals using the Covariance Method overviewed

in Section 6.5.3.

To analyze the compactness of each data set, the explained variance (EV) can be computed using the eigenvalues calculated from PCA

EV = \frac{\sum_{m=1}^{M'} \lambda_m}{\sum_{m=1}^{M} \lambda_m} \quad (6.12)

where M' < M. Figure 6.12 plots the explained variance for the sets of unwound and wound strings, respectively. In both cases, the plots of explained variance suggest that the data is fairly low dimensional. Selecting M' = 20 basis vectors accounts for > 95% of the variance for the set of

Figure 6.12: Explained variance of the principal components computed for the sets of (a) unwound and (b) wound strings.

unwound strings, while M' = 30 is sufficient for > 95% of the variance in the wound set.
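Given the sorted eigenvalues from the Covariance Method, Equation 6.12 reduces to a one-line computation (function name illustrative):

```python
import numpy as np

def explained_variance(evals, M_prime):
    """Equation 6.12: fraction of the total variance captured by the
    first M_prime eigenvalues (assumed sorted in decreasing order)."""
    return np.sum(evals[:M_prime]) / np.sum(evals)

# Per the text: ~0.95 is reached at M' = 20 (unwound) and M' = 30 (wound).
```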

For insight into the relationship between the basis vectors and the excitation signals, Figure 6.13 plots the first three basis functions alongside example articulations extracted from the portion of the data set consisting of the 1st, 2nd and 3rd strings. The general, impulsive-like contour is captured by the empirical mean of the data set. In the case of the excitations derived from pick articulations, the basis vectors plotted provide the high-frequency components just before and after the main impulse. In the case of the finger articulations, these basis vectors are negatively weighted and serve to widen the main impulse. This relationship agrees with the physical act of plucking a string with a pick versus a finger, since the physical characteristics of each plucking device directly affect the shape of the string.

Figure 6.14 shows a similar plot for the 4th, 5th and 6th strings, which have different physical characteristics due to their wound construction. By comparing Figures 6.13 and 6.14, it is evident that the extracted basis vectors are very similar in each case. The difference, however, is in the empirical mean vector, which exhibits a pronounced "bump" immediately after the main impulse. This feature appears to be characteristic of the articulations produced by the finger, which perhaps reflects the slippage of the wound string off the finger.

Figure 6.15 shows how the data pertaining to the string articulations projects into the space defined by the principal component vectors. Figure 6.15(a) shows the projection of articulations from strings 1-3 along the 1st and 2nd components. This projection shows that the data pertaining to specific articulations have a particular arrangement and grouping in this space.

Figure 6.13: Selected basis vectors extracted from plucked-guitar recordings produced on the 1st, 2nd and 3rd strings.

Figure 6.14: Selected basis vectors extracted from plucked-guitar recordings produced on the 4th, 5th and 6th strings.


Figure 6.15: Projection of guitar excitation signals into the principal component space. Excitations from strings 1-3 (a) and 4-6 (b).

In particular, the axis pertaining to the 1st component correlates with the articulation strength, which increases independently for pick and finger articulations. Similarly, the projection of the data pertaining to strings 4-6 is shown in Figure 6.15(b), which shows a different arrangement, but a similar clustering of the data based on articulation type.

6.5.5 Codebook Design

The plots of explained variance in Figure 6.12 demonstrate the relative low dimensionality of the

extracted guitar excitation signals. Here, we present an approach for designing a codebook to further


reduce the number of basis vectors required to accurately reconstruct the excitation signals. This

step is advantageous for synthesis systems where it is desirable to faithfully capture the perceptual

characteristics of the performer-string interaction, while minimizing the amount of data required.

Also, this approach separately analyzes the principal component weights for pick and finger articu-

lations to determine the “best” subset of basis vectors comprising each group of articulations. This

method considers that, while PCA yields basis vectors that successively explain the most variance

in the data, certain basis vectors may be more essential to synthesize a particular articulation based

on the magnitude of the associated weight vector.

The codebook design procedure is as follows (a sketch of the procedure in code follows the list):

1. Compute the weight matrix for the data set using Equation 6.10. A weight vector w = [w1 w2 . . . wM] is obtained for each excitation signal in the data set.

2. Take the absolute value of each weight vector w and sort the entries in descending order so that |w1| > |w2| > · · · > |wM|.

3. Select the first Mtop weights from the sorted weight vector, where Mtop is an integer.

4. For each of the Mtop weights selected, record the occurrence of the associated principal component vector in a histogram.

5. Using the histogram as a guide, select the subset of L basis vectors having the highest occurrences in the histogram (see Figure 6.16), where L < M. This yields a subset of basis vectors Ṽ ⊂ V, where Ṽ = [ṽ1 ṽ2 . . . ṽL]. These form the codebook entries.
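A sketch of the five-step selection above (names illustrative; W is the score matrix of Equation 6.10 with one row per excitation signal):

```python
import numpy as np

def build_codebook(W, M_top, L):
    """Steps 1-5 above: keep the M_top largest-magnitude weights of
    each signal, histogram the component indices, and return the L
    most frequently occurring components as codebook entries."""
    counts = np.zeros(W.shape[1], dtype=int)
    for w in np.abs(W):                        # steps 1-2: |w| per signal
        top = np.argsort(w)[::-1][:M_top]      # step 3: M_top largest
        counts[top] += 1                       # step 4: histogram
    return np.argsort(counts)[::-1][:L]        # step 5: codebook indices
```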

Figure 6.16 shows the histogram computed separately for excitation signals associated with pick

and finger articulations. It is interesting to note that the function of weight frequency vs. principal

component number does not monotonically decrease. This suggests that certain component vectors

are more “essential” than others for representing the ensemble of excitation signals for a particular

articulation.

6.5.6 Codebook Evaluation and Synthesis

After the codebook has been designed, a particular excitation signal can be generated by using a desired number of codebook entries (i.e., basis vectors) and the appropriate weighting for each entry.

Figure 6.16: Histogram of basis vector occurrences generated with Mtop = 20.

Equation 6.13 presents the synthesis equation

\hat{p}_i = \bar{p} + \sum_{m=1}^{L} w_{i,m} v_m, \quad (6.13)

where p̄ is the empirical mean of the data set and L indicates the number of codebook entries used for re-synthesis. The weight values are obtained by projecting the excitation signal onto the basis vectors. The number of codebook entries used for synthesis depends on the desired accuracy. Figure 6.17 demonstrates the reconstruction when varying the number of entries. It is clear that using a single entry does not capture the high-frequency details found in the reference excitation signal. However, using 10 entries approximates the contour of the signal and 50 entries captures nearly all the high-frequency information.

The reconstruction quality can be summarized for the entire data set by computing the signal-to-noise ratio (SNR) for each signal in the set. SNR is defined as

\mathrm{SNR}_{dB} = 10 \log_{10} \sum_{n} \left( \frac{p(n)}{p(n) - \hat{p}(n)} \right)^{2}, \quad (6.14)

where p(n) and \hat{p}(n) are the original and reconstructed signals, respectively. Each excitation signal was reconstructed with a varying number of codebook entries, and the SNR was averaged over all excitations at each number of entries. Additionally, separate codebooks were developed for signals associated with pick or finger articulations to improve the error when the number of entries is low. Figure 6.18 summarizes the results of this analysis.
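Equations 6.13 and 6.14 can be sketched directly from the PCA outputs. Note that the SNR form as printed divides sample-by-sample, so a small guard term is added here against p(n) = p̂(n); function names are illustrative:

```python
import numpy as np

def synthesize_excitation(u, V, W, idx, i):
    """Equation 6.13 restricted to the L codebook entries in `idx`:
    p_hat_i = mean + weighted sum of the selected basis vectors."""
    return u + V[:, idx] @ W[i, idx]

def snr_db(p, p_hat, eps=1e-12):
    """Equation 6.14 as printed; eps guards the per-sample division."""
    return 10.0 * np.log10(np.sum((p / (p - p_hat + eps)) ** 2))
```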

Figure 6.17: Excitation synthesis when varying the number of codebook entries: (a) 1 entry, (b) 10 entries, (c) 50 entries.

It is of note that the SNR computed for finger excitation signals is generally higher than the SNR computed for pick excitations, regardless of the number of codebook entries used. Intuitively, this agrees with previous observations of the excitation signals obtained from our data set. In general, the observed signals pertaining to finger articulations were not as complex as the picked articulations (see Figure 6.10). Thus, the finger articulations may be more accurately represented with fewer components.

Figure 6.18: Computed signal-to-noise ratio when increasing the number of codebook entries used to reconstruct the excitation signals.

The results presented in Figure 6.18 make a strong case for applications requiring accurate and expressive synthesis with low data storage requirements. The initial PCA yielded 570 basis vectors (for strings 1-3), each with a length of 570 samples. From Figure 6.18, it is evident that the reconstruction SNR only marginally increases when more than 150 codebook entries are used. Using 150 codebook entries requires only 26% ((150 × 570)/(570 × 570)) of the data obtained from the initial PCA, which significantly reduces the amount of storage required. At a 16-bit quantization level, 150 codebook entries would require approximately 167 kilobytes of storage, which is a modest requirement considering the storage capacities of present-day personal computers and mobile computing devices.

6.6 Nonlinear PCA for Expressive Guitar Synthesis

While the linear PCA technique presented in the previous section provides intuition on the underlying basis functions comprising our data set, it is unclear how exactly the high-dimensional component space relates to the expressive attributes of our data. As shown in Figure 6.15, there is a nonlinear arrangement of the data along the axes pertaining to the first two principal components. Moreover, as additional components are needed to accurately reconstruct the source signals, simply sampling the space defined by the first two components is not sufficient for high-quality synthesis. On the


other hand, it is difficult to visualize and infer the underlying structure of the data by projecting it along additional components. In this section, we explore the application of nonlinear principal components analysis (NLPCA) to the data extracted from linear PCA to derive a low-dimensional representation of the data. We show that the reduced-dimensional space derived using NLPCA explains the expressive attributes of the excitation signals in the data set. Moreover, this low-dimensional representation can be inverted and is therefore well suited to act as an expressive controller built on the original linear components.

6.6.1 Nonlinear Dimensionality Reduction

There are many techniques available in the literature for nonlinear dimensionality reduction, or

manifold-learning, for the purposes of discovering the underlying nonlinear characteristics of high

dimensional data. Such techniques include locally linear embedding (LLE) [65] and Isomap [78].

While LLE and Isomap are useful for data reduction and visualization tasks, their application does

not provide an explicit mapping function to project the reduced dimensionality data back into the

high dimensional space.

For the purpose of developing an expressive control interface, re-mapping the data back into the

original space is essential since we wish to use our linear basis vectors to reconstruct the excitation

pulses. To satisfy this requirement, we employ NLPCA via autoassociative neural networks (ANN)

to achieve dimensionality reduction with explicit re-mapping functions.

Figure 6.19: Architecture for a 3-4-1-4-3 autoassociative neural network, consisting of input, mapping, bottleneck, de-mapping and output layers with transformations T1-T4 between them.


The standard architecture for an ANN is shown in Figure 6.19 and consists of 5 layers [34]. The input and mapping layers can be viewed as the "extraction" function, since they project the input variables into a lower-dimensional space as specified by the bottleneck layer. The de-mapping and output layers comprise the "generation" function, which projects the data back into its original dimensionality. Using Figure 6.19 as an example, the ANN can be specified as a 3-4-1-4-3 network to indicate the number of nodes at each layer. The nodes in the mapping and de-mapping layers contain sigmoidal functions and are essential for compressing and decompressing the range of the data to and from the bottleneck layer. An example sigmoidal function that can be used is the hyperbolic tangent, which compresses values with a range of (−∞, ∞) to (−1, 1). Since the desired values at the bottleneck layer are unknown, direct supervised training cannot be used to learn the mapping and de-mapping functions. Rather, the combined network is learned using back-propagation algorithms to minimize a squared error criterion such that E = \frac{1}{2}\lVert w - \hat{w} \rVert^2 [34]. From a practical standpoint, the mapping functions are essentially a set of transformation matrices to compress (T1, T2) and decompress (T3, T4) the dimensionality of the data.
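As an illustration of this architecture, the following PyTorch sketch stands in for the NLPCA MATLAB Toolbox used in the next subsection; the training data here are placeholder scores, not the thesis data set:

```python
import torch
import torch.nn as nn

class AutoassociativeNet(nn.Module):
    """Five-layer bottleneck network of Figure 6.19: input -> mapping
    -> bottleneck -> de-mapping -> output, with tanh sigmoids in the
    mapping and de-mapping layers."""
    def __init__(self, n_in, n_map, n_bottleneck):
        super().__init__()
        # Extraction function (transformations T1, T2)
        self.encode = nn.Sequential(nn.Linear(n_in, n_map), nn.Tanh(),
                                    nn.Linear(n_map, n_bottleneck))
        # Generation function (transformations T3, T4)
        self.decode = nn.Sequential(nn.Linear(n_bottleneck, n_map), nn.Tanh(),
                                    nn.Linear(n_map, n_in))
    def forward(self, w):
        return self.decode(self.encode(w))

# 25-6-2-6-25 network trained by backpropagation on the squared
# reconstruction error E = 1/2 ||w - w_hat||^2.
net = AutoassociativeNet(25, 6, 2)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
W = torch.randn(120, 25)              # placeholder PCA scores
for _ in range(2000):
    opt.zero_grad()
    loss = 0.5 * ((net(W) - W) ** 2).sum(dim=1).mean()
    loss.backward()
    opt.step()
z = net.encode(W).detach()            # 2-D (z1, z2) coordinates
```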

6.6.2 Application to Guitar Data

To uncover the nonlinear structure of the guitar features extracted in Section 6.5.4, NLPCA was applied using 25 scores from the linear components analysis at the input layer of the ANN. Empirically, we found that using 25 scores was sufficient in terms of adequately describing the data and expediting the ANN training. As discussed in Section 6.5.4, 25 linear PCA vectors explain > 95% of the variance in the data set and lead to good re-synthesis. At the bottleneck layer of the ANN, we chose two nodes in order to have multiple degrees of freedom which could be used to synthesize excitation pulses in an expressive control interface. These design criteria yielded a 25-6-2-6-25 ANN architecture, which was trained using the NLPCA MATLAB Toolbox [67].

Figure 6.20 compares the projection of the data into the linear component space and the reduced-dimension space defined by the bottleneck layer of the ANN. As shown in 6.20(b), and unlike the linear projection in 6.20(a), the bottleneck layer of the NLPCA has "unwrapped" the nonlinear data arrangement so that it is now clustered about linear axes. Figure 6.21 shows an additional linear rotation applied to this new space for a clearer view of how the axes relate to the data set. Examining this space, the data is clearly organized around the orthogonal z1 and z2 axes. Selected excitation pulses are also shown, which were synthesized by sampling this coordinate space, project-


ing back into the linear principal component domain using the transformation matrices (T3, T4) from

the ANN and using the resulting scores to reconstruct the pulse with linear component vectors.

Figure 6.20: (a) Projection of excitation signals into the space defined by the first two linear principal components. (b) Projection of the linear PCA weights along the axes defined by the bottleneck layer of the trained 25-6-2-6-25 ANN.

The nonlinear component defined by the z1 axis describes the articulation type where points

sampled in the space z1 < 0 pertain to finger articulations and points sampled for z1 > 0 pertain

to pick articulations. The finger articulations feature a wider excitation pulse in contrast to the

pick, where the pulse is generally narrower and more impulsive. In both articulation spaces, moving

from left to right increases the relative dynamics. The second nonlinear component defined by the

z2 axis relates to the contact time of the articulation. As z2 is increased, the excitation pulse width

increases for both articulation types.


Figure 6.21: Guitar data projected along the orthogonal principal axes defined by the ANN (center). Example excitation pulses resulting from sampling this space are also shown.

6.6.3 Expressive Control Interface

We demonstrate the practical application of this research in a touch-based iPad interface shown in

Figure 6.22. This interface acts as a “tabletop” guitar, where the performer uses one hand to provide

the articulation and the other to key in the desired pitch(es). The articulation is applied to the large,

gradient square in Figure 6.22, which is a mapping of the reduced dimensionality space shown in

Figure 6.21. Moving up along the vertical axis of the articulation space increases the dynamics of

the articulation (piano to forte) and moving right to left on the horizontal axis increases the contact

time. The articulation area is capable of multi-touch input so the performer can use multiple fingers

within the articulation area to give each tone a unique timbre.

The colored keys on the left side of Figure 6.22 allow the user to produce certain pitches.

Figure 6.22: Tabletop guitar interface for the component-based excitation synthesis. The articulation is applied in the gradient rectangle, whose axes map articulation strength and contact time, while the colored squares in the keyboard area allow the performer to key in specific pitches.

Adjacent keys on the horizontal axis are tuned a half step apart and their color indicates that they are part

of the same “string” so that only the leading key on the string can be played at once. Diagonal keys

on adjacent strings are tuned to a Major 3rd interval, while the off-diagonal keys represent a Minor 3rd interval. This arrangement allows the performer to easily finger different chord shapes.

The synthesis engine for the tabletop interface is capable of computing the excitation signal

corresponding to the performer’s touch point within the articulation space and filtering the resulting

excitation signal for multiple tones in real-time. The filter module used for the string is implemented

with the single delay-loop model shown in Figure 6.1. Though this filter has a large number of delay

taps, which is dependent on the pitch, only a few of these taps have non-zero coefficients, which permits an efficient implementation of infinite impulse response filtering. Currently, the relative

plucking position along the string is fixed, though this may be a free parameter in future versions

of the application. The excitation signal can be updated in real-time during performance, which

is made possible by the iPad’s support of hardware-accelerated vector libraries. These include the

matrix multiplication routines to project the low dimensional user input into the high dimensional

component space. Through our own testing, we found that the excitation signal is typically computed


in < 1 millisecond, which is more than adequate for real-time performance.

6.7 Discussion

In this chapter, a novel, component-based approach was presented for modeling the excitation signals

of plucked-guitar tones. This method draws on physically inspired modeling techniques to extract the

excitation pulses from recorded performances pertaining to various articulation styles in accordance

with a source-filter model. Principal components analysis (PCA) was used to model the excitation

pulses using the resulting set of linear basis vectors. While this analysis led to a large number of

basis vectors, a codebook was developed to reduce the number required for accurate modeling.

To understand the relation between the linear components and the expressive attributes of the

excitation signals in the data set, nonlinear principal components analysis (NLPCA) was used to

achieve a reduced-dimensional space using the linear weights as inputs to an autoassociative neural network (ANN). Using the ANN, the relation of the expressive attributes of the excitation signals to the axes of the reduced-dimensional space becomes clear.

A pertinent application of this research includes developing new interfaces for musical expression.

The application of NLPCA to the excitation signal data set derives a low dimensional representation

based on linear basis vectors and has a clear relationship to the expressive attributes of the data set.

Since the transformation into the reduced space is invertible, this representation could be leveraged

into gesture recognition and control applications for music synthesis. At present, gesture-based

recognition systems for guitar synthesis rely on non-parametric, sample-based synthesizers or at

best, physical models where the excitation signals are saved offline [26, 55]. The component-based

modeling approach presented here is limited only by the data used to derive component vectors and

can be used for arbitrary synthesis using the reduced dimensional space.

Similar to gesture-recognition systems, recent advances in mobile computing technology make

touch-based devices a compelling platform for expressive musical interfaces, especially for the gui-

tar. Among the existing interfaces are Apple’s iPad implementation of Garageband, which uses

accelerometer data in response to the user’s tapping strength to trigger an appropriate sample

for the synthesizer [20]. Similarly, the OMGuitar enables single note or chorded performance and

triggers chord samples based on how quickly the user "strums" the interface [1]. In both cases,

sample-based synthesizers are used, though as shown in the previous section, the reduced-dimensional

component space is highly applicable to these interfaces.


CHAPTER 7: CONCLUSIONS

This research presented several novel techniques for the analysis and synthesis of guitar performance

focusing on the player’s string articulation, which can be summarized as follows:

• Generated a data set of plucked guitar tones comprising variations of the performer’s articu-

lation including the plucking mechanism and strength, which spans all of the guitar’s strings

and several fretting positions.

• Developed a framework for jointly estimating the source and filter parameters for plucked-

guitar tones based on a physically-inspired model.

• Proposed and demonstrated a novel application of principal component analysis to model the

source signal for plucked guitar tones to encapsulate characteristics of various string articula-

tions.

• Utilized nonlinear principal components analysis to derive an expressive control space to syn-

thesize excitation signals corresponding to guitar articulations.

This research is centered on source-filter modeling techniques widely used in the literature, since the model is highly analogous to the process of exciting a resonant string. I have shown that estimating the

parameters of the model can be formulated as a joint estimation problem where the motivation is to

account for the simultaneous variation between the performer’s articulation and the string’s resonant

response and that this technique is adept at capturing the parameters and perceptual attributes

of recorded plucked-guitar tones produced with different plucking mechanisms and strengths. A novel, data-driven approach for modeling excitation signals based on linear and nonlinear principal components was also presented. This modeling approach decouples the effect of the performer's

plucking position on the string and treats each excitation signal as a weighted combination of basis

vectors. Nonlinear components analysis is used to derive an invertible, expressive space which can be

used to synthetically generate excitation signals pertaining to specific articulations in the data set.

A practical application of this research was also presented where an iPad was used to demonstrate

flexible, real-time synthesis of guitar tones with control over the string articulation.


This chapter will discuss limitations of the proposed methods with regard to the techniques

employed and the underlying physics of vibrating strings. Future directions for this research will

also be discussed.

7.1 Expressive Limitations

The techniques presented in this dissertation are primarily concerned with modeling the performer’s

articulation through their plucking action, which includes the e↵ects of plucking mechanism and

strength. However, guitarists use additional expressive techniques during performance pertaining

to the action of their fretting hand which controls the pitch of the plucked-tone. These techniques

include legato, or smooth, transitions between notes and pitch shifting techniques such as bends and

vibrato, which alter the pitch of a tone after it has been excited. Due to the time-varying nature of the tones resulting from these techniques, analysis and synthesis with linear time-invariant source-filter models is extremely difficult or infeasible.

Guitarists typically play with legato style using slides, "hammer-ons" or "pull-offs" between

notes. When performing a slide, the note is played at a particular position and the fretting finger

moves up or down the string after the note has sounded until the desired pitch is reached. Similarly,

a hammer-on involves playing a particular note with a fretting finger and using another finger to

clamp down the string at a higher fret position after the note has already sounded to achieve a sudden

pitch increase. The complementary technique is the pull-off, where the fretting finger is released

and another finger, already in position behind the fretting finger, sets a lower pitch. The discrete

pitch changes resulting from tones produced with legato are not easily analyzed with a source-filter

model. In particular, sliding into a note causes one or many discrete pitch changes as the guitarist’s

finger moves along the fretboard to its final position. The resulting tone will exhibit time varying

pitch and decay characteristics. The hammer-on technique introduces additional complexity into

the analysis since the string is “excited” in a sense by the second finger clamping the new fret in

an impulsive-like manner. Furthermore, melodies can often be performed with hammer-ons and pull-offs without using the articulation hand to initially excite the string, which diverges from the

traditional notion of how the string is excited.

While legato performance introduces sudden, discrete pitch changes to the plucked tone, vibrato

and string bending alter the pitch of the fretted note without changing the fret position. Vibrato

is achieved by rapidly wiggling the fretting finger at a particular position to slightly alter the pitch


of the tone. Pitch-bending involves physically bending the string at the fretting position, thereby

altering its tension to achieve a pitch increase. While a certain degree of vibrato may be negligible

from an analysis standpoint, pitch bending produces a signal with noticeable time-varying pitch,

which cannot be analyzed using either the proposed joint source-filter estimation scheme or existing

spectral-based filter estimation schemes. This is due to the harmonically related partials shifting

with the fundamental frequency so that the continuously changing partial frequencies and decay

rates must be identified. Implementing pitch shifting via post-processing can be achieved but with

certain restrictions. For example, vibrato can be implemented using the source-filter model by

varying the fractional delay filter in the feedback loop as long as the pitch change is small. However,

significant pitch shifting requires modification of the bulk delay term in the feedback loop. Such

modification requires continuously resampling the delay line to simulate the gradual tension change

in the string [80]. In certain synthesis systems, pitch bending is often simulated by applying a

phase vocoder algorithm which applies short-time spectral manipulation to the signal to smoothly

alter the pitch of a synthetic signal [20]. Alternately, a sinusoidal model can easily be applied to

the time-varying characteristics associated with string bending, though the benefits associated with

source-filter modeling will be lost.

7.2 Physical Limitations

The so-called single delay-loop (SDL) model that forms the basis of the analysis and synthesis

techniques presented in this dissertation describes the basic components of plucked string synthesis

including articulation, pitch and frequency-dependent decay. However, there are several physical

aspects of vibrating guitar strings that are not encapsulated by the model.

It is well understood that real strings vibrate along the transverse and longitudinal directions

which are perpendicular and parallel to the guitar's body, respectively. The perceived vibration of the string is the sum of the vibration in both directions, including coupling effects, and in certain cases a "beating" phenomenon is heard, which is caused by slight differences in the string's effective length along the transverse and longitudinal axes [16]. The beating phenomenon causes sum and difference frequencies to be perceived by the listener. Identification of the beating frequencies in guitar tones through analysis is difficult since it is a fast-occurring phenomenon requiring high spectral resolution (and thus long window lengths) to identify the distinct frequencies. Lee presents

an approach for finding the beating frequencies through identification of the two-stage decay evident


in plucked tones, but it is unclear how to automate the process which is based on an additive

synthesis model [43]. While beating isn’t included in the analysis techniques presented here, beating

implementation is often accomplished via an ad-hoc approach where two SDL models are used, each

having a slightly di↵erent pitch, and placed in parallel. The outputs of each SDL are scaled by a gain

factor and mixed to create a synthetic signal with beating present around the fundamental frequency

[44]. The presented synthesis techniques can easily be modified to include beating, though automated

analysis and identification of the beating frequencies remains an on-going research problem.

The pitch shifting due to the tension modulation present in real plucked-guitar strings is not ex-

plicitly accounted for in the joint source-filter estimation since it is a slowly time-varying process.

However, when the measured pitch shift is relatively small, the fractional delay filter can be slowly

varied over time to manipulate the frequency as discussed in Appendix B. The frequency trajecto-

ries are obtained by modeling the pitch of a plucked tone via short-time analysis. A technique for

incorporating tension modulation into a synthesis system involves re-sampling the delay line to alter

the pitch [80] or using a sinusoidal model where the frequencies of the harmonically related partials

gradually decrease over time [42].

7.3 Future Directions

Beyond the expressive and physical limitations of the modeling techniques demonstrated, the com-

putational model of guitar articulations developed in this thesis could be furthered through the

collection of performance data from additional guitarists. However, acquiring this data is challeng-

ing due to the specific guitar configuration (e.g. bridge-mounted piezoelectric pickup) required for

recording and analyzing the performance. Currently, no publicly available dataset exists that satisfies this recording configuration, which is why a dataset was created specifically for this research.

There is also the issue of recording the guitarist in the context of a live performance. The data

set developed is centered on capturing the acoustic attributes of the expression associated with an

articulation in a controlled environment so that individual strings can be isolated. During a live

performance, guitarists will alter their articulation in other ways, especially when strumming the

strings to produce chords. This necessitates a divided, or “hexaphonic”, guitar pickup for capturing

the audio from individual strings while avoiding the challenging task of multiple source separation

from a polyphonic mixture. Divided pickups are commercially available for common guitar models,


but a streamlined apparatus is required to interface the signals with recording equipment without

being obtrusive to the performer. Development of this complete, polyphonic recording system for

capturing contextual performance remains a task for future work.

With the inclusion of performance data from many guitarists, computational models for specific performers could be developed to determine if the differences in articulation are discernible using the

proposed modeling techniques. These models could then be used to “profile” a particular performer

and integrate the related parameters into a synthesis system for the application of new musical

interfaces. It was already demonstrated that the excitation synthesis could be implemented on

currently available mobile computing platforms, but emerging gesture recognition technologies, such

as the Microsoft Kinect, could also be used to harness this technology for performance, entertainment

and gaming applications.

From a physical modeling standpoint, additional characteristics of guitars such as body resonance effects and magnetic pickups could be studied, including how the performer uses these aspects of the instrument during performance. Foremost, inclusion of these effects is required for acoustically accurate synthesis of a "complete" guitar model, which would necessitate augmenting the source-filter model with blocks implementing the signal processing tasks for modeling the pickups, resonance, etc. How the guitarist uses certain techniques, such as plucking position or pickup position, either consciously or subconsciously in the context of a performance, also warrants analysis.


Appendix A: Overview of Fractional Delay Filters

A.1 Overview

The waveguide models introduced in Chapter 3 depend on a delay loop parameter, D, that sets the waveguide's total sample delay and thus the pitch, f0, of the synthesized tone such that D = fs/f0, where fs is the sampling frequency. In many cases, however, D is a non-integer number of samples. In some systems, it is permissible to adjust the sampling rate

to achieve a desired pitch, though this is often undesirable especially when multiple voices are being

synthesized or when certain performance techniques, such as tremolo and vibrato, require D to be

a continuously varying parameter.

Fractional delay filters have been widely used in the literature to provide the non-integer delay required for precisely tuning waveguide models [25, 26, 29, 56, 59, 85]. However, the design and implementation of such filters is not straightforward and requires some special consideration.

This appendix will briefly overview the basic theory and practical considerations associated with

designing and implementing FIR-type fractional delay filters. While IIR-type filters are also used

for this task, FIR filters are preferred in the literature since they can be easily designed with good

frequency response characteristics. In particular, the Lagrange interpolation fractional delay filter

is examined, which is used in this thesis.

A.2 The Ideal Fractional Delay Filter

To understand fractional delay filters, it is useful to consider a discrete time signal, x(n), delayed by D samples. D is a real number and is expressed as

D = d_I + d_F    (A.1)

where d_I and d_F are the integer and fractional components, respectively. x(n) is shifted by D samples via convolution with a shifting filter, h_id(n), to yield y(n) = x(n - D) [54]. In the z-transform


domain, the transfer function of the ideal shifting filter is

H_{id}(z) = \frac{Y(z)}{X(z)} = \frac{X(z)\, z^{-D}}{X(z)} = z^{-D}    (A.2)

and the corresponding frequency response is obtained by setting z = e^{j\omega} in Equation A.2:

H_{id}(e^{j\omega}) = e^{-j\omega D}    (A.3)

By computing the magnitude, phase and group delay responses for Equation A.3, it can be verified that H_{id}(e^{j\omega}) is distortionless, since it passes an input signal without magnitude or phase distortion, as shown in Equations A.4 - A.6 [37]:

|H_{id}(e^{j\omega})| = |e^{-j\omega D}| = 1    (A.4)

\Theta_{id}(\omega) = \angle H_{id}(e^{j\omega}) = -\omega D    (A.5)

\tau_{id}(\omega) = -\frac{\partial}{\partial \omega} \Theta_{id}(\omega) = D    (A.6)

It is intuitive that the filter will not distort the magnitude of an input signal since it has unity gain, but the importance of the linear phase response shown in Equation A.5 cannot be overstated. Linear phase implies that the system has a constant group delay, such that the input signal is uniformly delayed by D samples regardless of frequency.

The impulse response of H_{id}(e^{j\omega}) can be obtained by taking its inverse discrete-time Fourier transform [54]:

h_{id}(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} H_{id}(e^{j\omega}) e^{j\omega n} \, d\omega    (A.7)

         = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-j\omega D} e^{j\omega n} \, d\omega    (A.8)

         = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{j\omega (n - D)} \, d\omega    (A.9)

By evaluating the integral in Equation A.9, the impulse response of H_{id}(e^{j\omega}) can be verified as the sinc function shifted by D samples:

h_{id}(n) = \frac{\sin(\pi(n - D))}{\pi(n - D)} = \mathrm{sinc}(n - D).    (A.10)


Figure A.1: Impulse responses of an ideal shifting filter when the sample delay assumes an integer (top, D = 3) and non-integer (bottom, D = 3.3) number of samples. (Axes: Sample (n) vs. Amplitude.)

Laakso et al. address the problems with implementing a fractional delay filter by comparing the impulse responses of h_id(n) when D takes on integer and non-integer values, as shown in Figure A.1 [35, 87]. In the case where D = 3, h_id(n) reduces to a unit impulse at n = 3, since the sinc function is exactly zero at all other sample values. When D = 3.3, however, h_id(n) cannot be reduced to a simple unit impulse, since the peak of the sinc function is offset from an integer sample value. Now an interpolation using all samples of the sinc function is required to delay an input signal by D = 3.3 samples. As the bottom panel of Figure A.1 shows, implementing this impulse response is not possible, since h_id(n) is both non-causal and infinite in length.
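The contrast can be made concrete numerically. The following minimal NumPy sketch (an illustration added here, not code from this thesis) evaluates Equation A.10 for the two delays of Figure A.1; note that numpy's sinc is the normalized sinc, sin(pi x)/(pi x):

    import numpy as np

    def h_ideal(n, D):
        # Ideal fractional-delay impulse response of Equation A.10: sinc(n - D).
        return np.sinc(n - D)  # np.sinc(x) = sin(pi*x)/(pi*x)

    n = np.arange(-2, 9)
    print(h_ideal(n, 3.0))  # a unit impulse at n = 3; zero at every other sample
    print(h_ideal(n, 3.3))  # nonzero at every sample shown (and beyond)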

A.3 Approximation Using FIR Filters

Since the ideal fractional delay (FD) filter cannot be realized in practice, techniques are required to

approximate the impulse response for practical implementations. This section will briefly overview

the design techniques used to develop approximations based on finite impulse response (FIR) filters.

A FIR filter that approximates the ideal shifting filter has the following form:

H_F(z) = \sum_{n=0}^{N} h(n) z^{-n}    (A.11)

where N indicates the filter order, such that the filter consists of N + 1 coefficients. To determine the coefficients of h(n) that approximate the ideal filter, an error function is defined:

E(e^{j\omega}) = H_{id}(e^{j\omega}) - H_F(e^{j\omega}).    (A.12)

Laakso et al. obtain a time-domain error criterion by applying the L2 norm to Equation A.12 and applying Parseval's Theorem [35], which yields

e_{L2} = \sum_{n=-\infty}^{\infty} |h_F(n) - h_{id}(n)|^2.    (A.13)

The optimal solution for h_F(n) as per Equation A.13 is the ideal impulse response truncated and delayed by the required number of samples. The error decreases as the number of samples used to approximate the sinc function is increased.
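Under this criterion, the design reduces to sampling the shifted sinc of Equation A.10 over the filter's support. A short sketch (illustrative, with a helper name of our own choosing):

    import numpy as np

    def truncated_sinc_fd(N, D):
        # L2-optimal (rectangularly truncated) FIR approximation of z^-D.
        # D should include a bulk delay near N/2 so that the retained
        # samples straddle the peak of the sinc function.
        n = np.arange(N + 1)
        return np.sinc(n - D)

    h = truncated_sinc_fd(N=8, D=4.3)  # 9 taps approximating a 4.3-sample delay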

A.3.1 Delay Approximation using Lagrange Interpolation Filters

A consequence of implementing fractional delay (FD) filters based on a truncated sinc function is the well-known Gibbs phenomenon [54]. Essentially, the Gibbs phenomenon results from truncating the impulse response of the sinc function with a rectangular window, which causes the FD filter's magnitude response to exhibit ripple due to side lobe interaction. This rippling is often undesirable and thus more sophisticated techniques are required to design FD filters with relatively flat magnitude responses.

Lagrange interpolation filters allow for FD filter design with a maximally flat magnitude response at a frequency of interest. The coefficients for the Lagrange filter are obtained by setting the derivatives of Equation A.12 equal to zero:

\frac{d^n E(e^{j\omega})}{d\omega^n} \Big|_{\omega = \omega_0} = 0 \quad \text{for } n = 0, 1, 2, \ldots, N    (A.14)

In most cases, it is desired that the maximally flat magnitude response occur near DC, which requires \omega_0 = 0. The solution of Equation A.14 is obtained by solving a system of N linear equations, which has the following solution:

h(n) = \prod_{k=0,\, k \neq n}^{N} \frac{D - k}{n - k} \quad \text{for } n = 0, 1, 2, \ldots, N    (A.15)


where D is the total delay including the fractional component [35, 53]. The name of the Lagrange interpolation filter becomes obvious when considering the case N = 1, which yields coefficients h(0) = 1 - D and h(1) = D, equivalent to a linear interpolation between two samples.
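Equation A.15 translates directly into code. The sketch below (illustrative; the function name is ours) computes the Lagrange FD coefficients by the product formula:

    import numpy as np

    def lagrange_fd(N, D):
        # Lagrange fractional-delay FIR coefficients per Equation A.15.
        h = np.ones(N + 1)
        for n in range(N + 1):
            for k in range(N + 1):
                if k != n:
                    h[n] *= (D - k) / (n - k)
        return h

    print(lagrange_fd(1, 0.3))         # N = 1: [0.7, 0.3], i.e. linear interpolation
    h5 = lagrange_fd(5, 5 // 2 + 0.3)  # order 5: bulk delay 2 plus d_F = 0.3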

Figure A.2 illustrates the tradeoffs associated with designing Lagrange filters with a desired accuracy. As the order N is increased, the values of h(n) approach the ideal fractional delay filter at the expense of adding integer sample delay. Figure A.3 demonstrates the tradeoffs associated with the order of the Lagrange FD filter and its frequency response. As N increases, the cutoff frequency of the filter's magnitude response increases, providing a flatter magnitude response across a wider bandwidth. Similarly, we also see the tradeoff associated with the group delay of the FD filter, since increasing N maintains the desired flat group delay response over a wider bandwidth. For an N-order Lagrange FD filter designed for a maximally flat response at DC, the associated bulk integer delay, d_I, at this frequency can be computed as \lfloor N/2 \rfloor.
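These tradeoffs can be verified with a standard frequency response computation. The sketch below reuses lagrange_fd() from the sketch above together with scipy's freqz and group_delay (again an illustration, not code from the thesis):

    from scipy.signal import freqz, group_delay

    h = lagrange_fd(7, 7 // 2 + 0.3)        # order 7, total delay 3.3 samples
    w, H = freqz(h, worN=512)               # complex frequency response
    w, gd = group_delay((h, [1.0]), w=512)  # group delay in samples
    # As N grows, |H| stays near 0 dB and gd stays near 3.3 over a wider band.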

Figure A.2: Lagrange interpolation filters with order N = 3 (top) and N = 7 (bottom) to provide a fractional delay, d_F = 0.3. As the order of the filter is increased, the Lagrange filter coefficients approach the values of the ideal function. (Axes: Sample (n) vs. Amplitude.)

A.4 Further Considerations

Lagrange interpolation filters are a popular choice in many waveguide synthesis systems, since the filter coefficients are relatively easy to compute and the frequency response characteristics are sufficient for relatively low order filters. In general, FIR FD filters are preferred for musical synthesis because they can be varied during synthesis to achieve certain effects (such as pitch bending or


Figure A.3: Frequency response characteristics of Lagrange interpolation filters with order N = 3, 5, 7 to provide a fractional delay d_F = 0.3. Magnitude (top) and group delay (bottom) characteristics are plotted. (Axes: Normalized Frequency vs. Magnitude (dB) / Group Delay (samples).)

vibrato) without noticeable transient effects, which is problematic when using IIR FD filters.

FD filter design based on a maximally flat frequency characteristic is just one of many techniques used for designing FD filters. The reader is referred to the work of Laakso and Valimaki for additional FD design techniques, including windowed sinc functions, weighted least squares and IIR design techniques [35, 87]. Additionally, these works provide the theory and techniques required to develop IIR FD filters using all-pass filters, which have their own benefits over FIR implementations.


Appendix B: Pitch Glide Modeling

B.1 Overview

Pitch glide is an important physical consequence of plucking a guitar string. As a plucked string vibrates about its equilibrium, or resting, position, it is subject to elongation beyond its nominal length. This elongation increases the tension of the string beyond its nominal value and, consequently, increases the fundamental frequency of vibration. As shown in the following equation, the fundamental frequency of vibration is proportional to the square root of the string's tension:

f_0 = \sqrt{\frac{K_t}{4 m L}}    (B.1)

where K_t, m and L are the string's tension, mass and length, respectively [17]. Since the string loses energy during vibration due to various frictional forces, the amplitude of its transverse displacement decreases over time and thus the elongation decreases as well. After some amount of time, the string vibrates near its nominal, un-stretched frequency, and a steady state pitch is perceived.
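As a sanity check, Equation B.1 agrees with the familiar textbook form f_0 = (1/2L) \sqrt{K_t / \mu}, where \mu = m/L is the linear mass density. A tiny numerical check with illustrative (not measured) string values:

    import math

    Kt, m, L = 70.0, 3.0e-3, 0.648    # tension (N), mass (kg), length (m); illustrative
    mu = m / L                        # linear mass density (kg/m)
    f0_textbook = math.sqrt(Kt / mu) / (2 * L)
    f0_eq_B1 = math.sqrt(Kt / (4 * m * L))
    print(f0_textbook, f0_eq_B1)      # identical: Kt/(4mL) = (Kt/mu) / (2L)^2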

Modeling and simulation of pitch glide is an important consideration for an expressive guitar system, since it can lead to tones that have a noticeably higher pitch near the "attack" part of the note than some time later. The amount of pitch glide present in a tone depends on the guitarist's dynamics, or the relative "hardness" used to displace the string. Therefore, as a guitarist increases their dynamics during performance, we expect the resulting notes to have a greater perceived pitch initially than some time after the "attack" phase.

This appendix will discuss the modeling and implementation of pitch glide for expressive guitar

synthesis. This includes the estimation of time-varying pitch from plucked-guitar recordings, fitting

estimated data to a model of pitch glide and practical implementation.


B.2 Pitch Glide Model

The following model was proposed by Lee et al. [42] to simulate the pitch glide trajectory observed in recorded guitar tones:

f(t) = f_{ss}(1 + \alpha e^{-t/\tau}).    (B.2)

This representation consists of multiplying the steady state pitch value f_ss, which is associated with the nominal tension of the string, by an exponentially decaying function with time constant \tau and a multiplicative factor \alpha. This model ensures that the tone decays to its steady state pitch as t \to \infty, which agrees with the physicality of the damped vibrating string. The multiplicative factor \alpha determines the amount of pitch excursion, such that increasing \alpha increases the amount of pitch deviation from the steady state value. By setting \alpha to an arbitrarily small (or zero) value, the pitch glide effect is effectively eliminated, so that f(t) \approx f_ss for all values of t. For a physical interpretation of Equation B.2, Lee relates the time-varying fundamental frequency to the square of the slope of the string's displacement, which decays exponentially over time [42].

The pitch glide model of Equation B.2 is suitable for an expressive synthesis system because its parameters can be related to particular articulations. In particular, the \alpha parameter allows the amount of pitch glide to vary based on the dynamics used by the player. This parameter, and the others, must be determined through analysis of plucked-guitar recordings.
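For reference, the trajectory of Equation B.2 is straightforward to evaluate. In the sketch below (illustrative code), \alpha and \tau are taken from Table B.1 (string 6, piano), while the steady state frequency is an assumed low-E value:

    import numpy as np

    def pitch_glide(t, f_ss, alpha, tau):
        # Equation B.2: f(t) = f_ss * (1 + alpha * exp(-t / tau)).
        return f_ss * (1.0 + alpha * np.exp(-t / tau))

    t = np.linspace(0.0, 1.5, 100)
    f = pitch_glide(t, f_ss=82.4, alpha=38.03e-4, tau=0.3523)
    # alpha scales the initial excursion; alpha = 0 yields a constant f_ss.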

B.3 Pitch Glide Measurement

In this section we discuss the estimation of pitch glide parameters through analysis of plucked-guitar recordings. The data set used for parameter estimation consists of approximately 1000 samples of guitar tones recorded using a bridge-mounted piezoelectric pickup. The recorded notes span all 6 guitar strings and were produced by varying the plucking device and articulation from piano (soft) to mezzo-forte (moderately loud) to forte (loud). More information about the data is provided in Section 6.3.

The first step involves acquisition of the pitch glide data from the recordings. A short-time analysis is applied to the recordings to extract 1500 msec of pitch information for each tone, beginning at the "attack" instant of the tone. This audio segment is sub-divided into overlapping frames, each having a duration of 90 msec, with adjacent frames overlapped by 90%.


For each analysis frame, the Fast Fourier Transform (FFT) is computed and the pitch is determined by searching for the most prominent peak in the frequency spectrum. The frequency bin underlying the spectral peak indicates the pitch of the vibrating string at that moment. This pitch estimate is improved via quadratic interpolation around the spectral peak [77]. Utilizing the peak FFT bin and the magnitudes of the neighboring bins on each side of the peak, the "true" peak is found by locating the maximum of the parabola passing through all three points. The frequency underlying this maximum is taken as the "true" frequency. This step improves the pitch estimate by compensating for the limited frequency resolution of the FFT.
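A compact sketch of this per-frame estimator follows (illustrative; the Hann window and the global argmax peak search are simplifying assumptions not specified above, and spectrum-boundary bins are ignored for brevity):

    import numpy as np

    def frame_pitch(frame, fs):
        # FFT magnitude spectrum of one windowed analysis frame.
        X = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        k = int(np.argmax(X))                # prominent spectral peak
        a, b, c = X[k - 1], X[k], X[k + 1]   # peak bin and its two neighbors
        p = 0.5 * (a - c) / (a - 2 * b + c)  # parabola vertex offset, |p| <= 0.5
        return (k + p) * fs / len(frame)     # refined bin index -> Hz

    def pitch_trajectory(x, fs, frame_ms=90, overlap=0.9, dur_ms=1500):
        # 90 msec frames overlapped by 90% over the first 1500 msec of the tone.
        flen = int(fs * frame_ms / 1000)
        hop = max(1, int(flen * (1 - overlap)))
        nframes = (int(fs * dur_ms / 1000) - flen) // hop + 1
        return np.array([frame_pitch(x[i * hop:i * hop + flen], fs)
                         for i in range(nframes)])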

By repeating the pitch estimation for each frame, a pitch trajectory is obtained for each recording in the data set. Since the approach involves determining the parameters of Equation B.2 from many recordings, each pitch trajectory is normalized by its steady state frequency, f(t)/f_ss. By dividing Equation B.2 by f_ss, the measured data must be fit to the following equation:

f_{norm}(t) = 1 + \alpha e^{-t/\tau},    (B.3)

where f_norm(t) = f(t)/f_ss is the normalized pitch trajectory. The normalized pitch trajectories corresponding to recordings produced with a specific articulation (e.g. piano) are averaged to compute pitch trajectory prototype curves used for model fitting.

B.4 Nonlinear Modeling and Data Fitting

B.4.1 Nonlinear Least Squares Formulation

To determine the model parameters that best describe the measured pitch glide trajectories, a nonlinear least-squares (NLLS) problem is formulated. The problem formulation involves defining a residual function

r(t) = f(t) - F(t, x),    (B.4)

where f is a prototype pitch glide curve measured from audio and F(t, x) is the pitch glide function in Equation B.3 with unknown parameters x = [\alpha\ \tau]. The optimal parameters x^* satisfy \nabla S(x^*) = 0,


where S is the sum of squares of the residual defined by

S(x) = \sum_{t} r(t)^2.    (B.5)

The unknowns in x are found by taking the gradient of S with respect to x and setting it equal to zero:

\frac{\partial S}{\partial x_i} = 2 \sum_{t} r_t \frac{\partial r_t}{\partial x_i} = 0, \quad i = 1, 2.    (B.6)

Equation B.6 lacks a closed form solution, since the partial derivatives \partial r_t / \partial x_i of the nonlinear function depend on both the independent variable and the unknown parameters. In practice, nonlinear least squares problems are solved using iterative methods, where initial values of the unknown parameters in x are specified and iteratively refined using successive approximation [32]. Each iteration linearizes the model through a Taylor series expansion, ignoring the higher order, nonlinear terms.

The algorithm chosen for successive approximation in this implementation is the Gauss-Newton iteration, which is available in many numerical software packages. The MATLAB function lsqnonlin applies NLLS approximation using the Gauss-Newton iteration by default [48]. This function allows the programmer to specify the nonlinear function desired for curve fitting as well as the initial parameter estimates, bounds for the unknown parameters, the maximum number of iterations and several other options.
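An analogous fit can be expressed with scipy's least_squares (a sketch under our own naming; note that scipy's default solver is a trust-region method rather than Gauss-Newton, and the initial guess and bounds below are illustrative):

    import numpy as np
    from scipy.optimize import least_squares

    def fit_pitch_glide(t, f_norm, x0=(1e-3, 0.3)):
        # Fit Equation B.3, f_norm(t) = 1 + alpha * exp(-t / tau), by NLLS.
        def residual(x):
            alpha, tau = x
            return f_norm - (1.0 + alpha * np.exp(-t / tau))
        res = least_squares(residual, x0, bounds=([0.0, 1e-3], [1.0, 10.0]))
        return res.x  # (alpha, tau)

For the constrained fits described in the next section, \tau can be frozen at the forte estimate and only \alpha re-fitted.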

B.4.2 Fitting and Results

We first extract the pitch glide parameters for the forte articulations using MATLAB’s lsqnonlin

function. The results of this fit are shown in Figure B.1.

Using the time constant \tau estimated for the forte pitch glide curve, we constrain the NLLS algorithm for the remaining piano and mezzo-forte curves by enforcing the same \tau value for all curves. This results in all pitch glide curves having the same time constant but differing \alpha parameters, which determine the maximum amount of pitch deviation from the steady state value. In this manner, \alpha acts as an expressive control parameter which can be varied to continuously interpolate between the piano and forte pitch glide curves. Figure B.2 shows the observed and estimated pitch glide curves for each articulation and clearly shows the effect of the \alpha parameter on the initial pitch glide value. The extracted parameters for each articulation are summarized in Table B.1.


Figure B.1: Measured and modeled pitch glide for forte plucks. (Axes: Time (sec) vs. Normalized Frequency.)

B.5 Implementation

For implementation of the pitch glide effect in a plucked-guitar synthesis system, we employ the well-known single delay-loop model, which was presented in Chapter 3 and is shown in Figure B.3. The pitch of the synthetic tone is determined by the ratio f_s/D, where f_s is the sampling frequency and D is the delay line length. Since the delay required for a desired pitch is often non-integer, H_F(z) provides the required non-integer delay. Appendix A provides an overview of fractional delay filters.

The fractional delay filter chosen is a variable 5th order Lagrange interpolation filter inserted into the feedback loop of the single delay-loop model, as shown in Figure B.3. Equation B.3 can be multiplied by the desired steady state pitch value to achieve the correct tuning. The pitch glide is implemented by updating the coefficients of H_F(z) every 50 milliseconds according to the prototype curve for a particular articulation. Updating the coefficients in this manner is possible because the single delay-loop model is implemented as a direct-form I IIR filter, which has separate delay lines for the input and output feedback [77].
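A much simplified sketch of this loop follows, reusing lagrange_fd() from Appendix A's sketch; for brevity the loop filter H_l(z) is collapsed to a constant loss gain g (an assumption, not the thesis's loss filter), and the loop is retuned every 50 ms from the glide curve of Equation B.2:

    import numpy as np

    def synth_pitch_glide(pb, fs, f_ss, alpha, tau, N=5, g=0.996, update_ms=50):
        # Single delay-loop synthesis with a time-varying Lagrange FD filter.
        out = np.zeros(len(pb))
        update = int(fs * update_ms / 1000)
        DI, h = 1, np.zeros(N + 1)
        for n in range(len(pb)):
            if n % update == 0:  # retune from the pitch glide curve
                f = f_ss * (1.0 + alpha * np.exp(-(n / fs) / tau))
                D = fs / f                  # total loop delay in samples
                DI = int(D) - N // 2        # integer part of the delay
                h = lagrange_fd(N, D - DI)  # FD filter covers the remainder
            fb = sum(h[k] * out[n - DI - k]
                     for k in range(N + 1) if n - DI - k >= 0)
            out[n] = pb[n] + g * fb
        return out

Here g < 1 keeps the loop stable, and the FD filter's design delay D - DI stays within the Lagrange filter's preferred range around N/2.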


Figure B.2: Measured and modeled pitch glide for piano, mezzo-forte and forte plucks. (Axes: Time (sec) vs. Normalized Frequency.)

Figure B.3: Single delay-loop waveguide filter with variable fractional delay filter, H_F(z). (Block diagram: the excitation p_b(n) enters a summing node whose output y(n) is fed back through H_l(z), H_F(z) and z^{-D_I}.)


Table B.1: Pitch glide parameters of Equation B.3 for plucked guitar tones for each guitar string. p, mf and f indicate strings excited with piano, mezzo-forte and forte dynamics, respectively.

String   Dynamic   \alpha (x10^-4)   \tau
1        p         1.523             0.2284
1        mf        3.123             0.2284
1        f         11.94             0.2284
2        p         9.337             0.4037
2        mf        19.41             0.4037
2        f         44.39             0.4037
3        p         16.45             0.3958
3        mf        35.51             0.3958
3        f         72.91             0.3958
4        p         26.03             0.3766
4        mf        36.55             0.3766
4        f         60.89             0.3766
5        p         35.04             0.3786
5        mf        60.21             0.3786
5        f         68.28             0.3786
6        p         38.03             0.3523
6        mf        62.76             0.3523
6        f         81.24             0.3523


Bibliography

[1] Amidio. OMGuitar advanced guitar synth. http://amidio.com/portfolio/omguitar/, Jan. 2012.

[2] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen. LAPACK Users' Guide. Society for Industrial and Applied Mathematics, Philadelphia, PA, third edition, 1999.

[3] B. Bank and V. Valimaki. Robust loss filter design for digital waveguide synthesis of string tones. Signal Processing Letters, IEEE, 10(1):18–20, Jan. 2003.

[4] J. P. Bello, L. Daudet, S. Abdallah, C. Duxbury, M. Davies, and M. B. Sandler. A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 2005.

[5] C. M. Bishop. Pattern Recognition and Machine Learning. Information science and statistics. Springer, 2006.

[6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, The Edinburgh Building, Cambridge, CB2 8RU, UK, 2004.

[7] K. Bradley, Mu-Huo Cheng, and V. L. Stonick. Automated analysis and computationally efficient synthesis of acoustic guitar strings and body. In IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, pages 238–241, Oct. 1995.

[8] John M. Chowning. The synthesis of complex audio spectra by means of frequency modulation. J. Audio Eng. Soc, 21(7):526–534, 1973.

[9] Perry R. Cook, editor. Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics. MIT Press, Cambridge, MA, USA, 1999.

[10] Perry R. Cook. Real Sound Synthesis for Interactive Applications. A. K. Peters, Ltd., Natick, MA, USA, 2002.

[11] Perry R. Cook and Gary P. Scavone. The synthesis toolkit (STK). In International Computer Music Conference, 1999.

[12] G. Cuzzucoli and V. Lombardo. A physical model of the classical guitar, including the player's touch. Computer Music Journal, 23(2):52–69, Jun. 1999.

[13] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, 2001.

[14] C. Erkut, V. Valimaki, M. Karjalainen, and M. Laurson. Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar. In 108th AES Int. Convention 2000, pages 19–22, Paris, France, Feb. 2000. AES.

[15] Fishman. Pickups: Tune-o-matic powerbridge pickup. http://www.fishman.com/products/view/tune-o-matic-powerbridge-pickup, Apr. 2012.

[16] N. H. Fletcher. The nonlinear physics of musical instruments. Technical Report 62, Institute of Physics Publishing, 1999.

[17] N. H. Fletcher and T. D. Rossing. The Physics of Musical Instruments. Springer, 1998.

[18] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx.

[19] J. Gudnason, M. R. P. Thomas, P. A. Naylor, and D. P. W. Ellis. Voice source waveform analysis and synthesis using principal component analysis and gaussian mixture modelling. In Proc. of the 2009 Annual Conference of the International Speech Communication Association, Brighton, U.K., Sept. 2009. INTERSPEECH.

[20] Apple Inc. Garageband. http://itunes.apple.com/us/app/garageband/id408709785?mt=8, Jan. 2012.

[21] ISO. Information technology - coding of audio-visual objects - part 3: Audio. http://www.iso.org/iso/iso_catalogue/catalogue_ics/catalogue_detail_ics.htm?csnumber=53943, Nov. 2011.

[22] D. A. Jaffe and J. O. Smith. Extensions of the Karplus-Strong plucked-string algorithm. Computer Music Journal, 7(2):56–69, Jun. 1983.

[23] J.-M. Jot. An analysis/synthesis approach to real-time artificial reverberation. In Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages 221–224. ICASSP, Mar. 1992.

[24] M. Karjalainen, A. Harma, U. K. Laine, and J. Huopaniemi. Warped filters and their audio applications. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 4 pp. WASPAA, Oct. 1997.

[25] M. Karjalainen and U. K. Laine. A model for real-time sound synthesis of guitar on a floating-point signal processor. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 5, pages 3653–3656. ICASSP, Apr. 1991.

[26] M. Karjalainen, T. Maki-Patola, A. Kanerva, A. Huovilainen, and P. Janis. Virtual air guitar. In Proc. of the 117th Audio Engineering Society Convention. AES, Oct. 2004.

[27] M. Karjalainen, H. Penttinen, and V. Valimaki. Acoustic sound from the electric guitar using DSP techniques. In Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 773–776. ICASSP, 2000.

[28] M. Karjalainen and J. O. Smith. Body modeling techniques for string instrument synthesis. In Proc. of the International Computer Music Conference. ICMC, 1996.

[29] M. Karjalainen, V. Valimaki, and Z. Janosy. Towards high-quality sound synthesis of the guitar and string instruments. In Proc. of the International Computer Music Conference. ICMC, Sept. 1993.

[30] M. Karjalainen, V. Valimaki, and T. Tolonen. Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond. Computer Music Journal, 22(3):17–32, Oct. 1998.

[31] K. Karplus and A. Strong. Digital synthesis of plucked-string and drum timbres. Computer Music Journal, 7(2):43–55, Jun. 1983.

[32] C. T. Kelley. Iterative Methods for Optimization. Frontiers in Applied Mathematics, SIAM, 1999.

[33] L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders. Fundamentals of Acoustics. Wiley, 3rd edition, 1982.

[34] Mark A. Kramer. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 37(2):233–243, 1991.

[35] T. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Processing Magazine, 13(1):30–60, Jan. 1996.

[36] J. Laroche and J.-L. Meillier. Multichannel excitation/filter modeling of percussive sounds with application to the piano. IEEE Transactions on Speech and Audio Processing, 2(2):329–344, Apr. 1994.

[37] B. P. Lathi. Signal Processing And Linear Systems. Oxford University Press, Inc., 198 Madison Avenue, New York, New York, 10016, 1998.

[38] N. Laurenti, G. De Poli, and D. Montagner. A nonlinear method for stochastic spectrum estimation in the modeling of musical sounds. IEEE Transactions on Audio, Speech, and Language Processing, 15(2):531–541, Feb. 2007.

[39] M. Laurson, C. Erkut, V. Valimaki, and M. Kuuskankare. Methods for modeling realistic playing in acoustic guitar synthesis. Computer Music Journal, 25(3):38–49, Oct. 2001.

[40] N. Lee, R. Cassidy, and J. O. Smith. Use of energy decay relief (EDR) to estimate partial-overtone decay-times in a freely vibrating string. In Invited paper at The Musical Acoustics Sessions at the Joint ASA-ASJ meeting, Honolulu, HI, 2006. ASA.

[41] N. Lee, Z. Duan, and J. O. Smith. Excitation signal extraction for guitar tones. In Proc. of the International Computer Music Conference. ICMC, 2007.

[42] N. Lee, J. O. Smith, J. Abel, and D. Berners. Pitch glide analysis and synthesis from recorded tones. In Proc. of the International Conference on Digital Audio Effects, Como, Italy, Sept. 2009. DAFx.

[43] N. Lee, J. O. Smith, and V. Valimaki. Analysis and synthesis of coupled vibrating strings using a hybrid modal-waveguide synthesis model. IEEE Transactions on Audio, Speech and Language Processing, 18(4):833–842, May 2010.

[44] N. Lindroos, H. Penttinen, and V. Valimaki. Parametric electric guitar synthesis. Computer Music Journal, 35(3):18–27, Sept. 2011.

[45] Line6. Line6 modeling amplifiers. http://line6.com/amps, May 2012.

[46] Line6. Line6 Variax guitars. http://line6.com/guitars, May 2012.

[47] MathWorks. Optimization Toolbox 5.0. http://www.mathworks.com/products/optimization/, August 2010.

[48] MathWorks. Curve Fitting Toolbox 3.0. http://www.mathworks.com/products/curvefitting/, November 2011.

[49] D. Mazzoni and R. Dannenberg. Audacity. http://audacity.sourceforge.net/, Oct. 2011.

[50] R. McAulay and T. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(4):744–754, Aug. 1986.

[51] P. Mokhtari, H. R. Pfitzinger, and C. T. Ishi. Principal components of glottal waveforms: towards parameterisation and manipulation of laryngeal voice quality. In VOQUAL '03, 2003.

[52] P. M. Morse and K. U. Ingard. Theoretical Acoustics. McGraw-Hill Education, New York, NY, USA, 1968.

[53] G. Oetken. A new approach for the design of digital interpolating filters. IEEE Transactions on Acoustics, Speech and Signal Processing, 27(6):637–643, Dec. 1979.

[54] A. V. Oppenheim and R. W. Schafer. Discrete-Time Signal Processing. Prentice-Hall, Inc., Upper Saddle River, New Jersey, 1999.

[55] C. O'Shea. Kinect air guitar prototype. http://www.chrisoshea.org/lab/air-guitar-prototype, Jan. 2012.

[56] J. Pakarinen, T. Puputti, and V. Valimaki. Virtual slide guitar. Computer Music Journal, 32(3):42–54, 2008.

[57] H. Penttinen, M. Karjalainen, T. Paatero, and H. Jarvelainen. New techniques to model reverberant instrument body responses. In Proc. of the International Computer Music Conference. ICMC, 2001.

[58] H. Penttinen, J. Siiskonen, and V. Valimaki. Acoustic guitar plucking point estimation in real time. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 3, pages 209–212. ICASSP, Mar. 2005.

[59] H. Penttinen and V. Valimaki. Time-domain approach to estimating the plucking point of guitar tones obtained with an under-saddle pickup. Applied Acoustics, 65:1207–1220, Dec. 2004.

[60] Thomas Quatieri. Discrete-Time Speech Signal Processing: Principles and Practice. Prentice Hall Press, Upper Saddle River, NJ, USA, 2001.

[61] L. Rabiner. On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech and Signal Processing, 25(1):24–33, Feb. 1977.

[62] Janne Riionheimo and Vesa Valimaki. Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness calculation. EURASIP J. Appl. Signal Process., 2003:791–805, 2003.

[63] M. Roma, L. Gonzalez, and F. Briones. Software based acoustic guitar simulation by means of its impulse response. In 10th Meeting on Audio Engineering of the AES. AES, Lisbon, Portugal, 2009.

[64] Thomas D. Rossing, editor. The Science of String Instruments, chapter 23. Springer Science+Business Media, 233 Spring Street, New York, NY 10013, USA, 1 edition, 2010.

[65] Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

[66] E. D. Scheirer. The MPEG-4 structured audio standard. In Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 6, pages 3801–3804. ICASSP, May 1998.

[67] M. Scholz. Nonlinear PCA toolbox for MATLAB. http://www.nlpca.de/matlab.html, 2011.

[68] Xavier Serra and J. O. Smith. Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition. Computer Music Journal, 14(4):12–24, 1990.

[69] J. O. Smith. Techniques for Digital Filter Design and System Identification with Application to the Violin. PhD thesis, Department of Music, Stanford University, Stanford, CA, Jun. 1983.

[70] J. O. Smith. Music applications of digital waveguides. Technical report, CCRMA, Music Department, Stanford University, 1987.

[71] J. O. Smith. Waveguide filter tutorial. In Proc. of the International Computer Music Conference, pages 9–16. Computer Music Association, 1987.

[72] J. O. Smith. Physical modeling using digital waveguides. Computer Music Journal, 16(4):74–91, 1992.

[73] J. O. Smith. Efficient synthesis of stringed musical instruments. In Proc. of the International Computer Music Conference, Tokyo, Japan, 1993. ICMC.

[74] J. O. Smith. Virtual electric guitars and effects using Faust and Octave. In Proc. of the International Linux Audio Conference, Cologne, Germany, 2008.

[75] J. O. Smith. Digital waveguide architectures for virtual musical instruments. In David Havelock, Sonoko Kuwano, and Michael Vorlander, editors, Handbook of Signal Processing in Acoustics, pages 399–417. Springer New York, 2009.

[76] J. O. Smith. Physical Audio Signal Processing. W3K Publishing, 2010. Online book.

[77] J. O. Smith. Spectral Audio Signal Processing, October 2008 Draft. CCRMA, Stanford, August 22, 2010. Online book.

[78] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

[79] T. Tolonen, C. Erkut, V. Valimaki, and M. Karjalainen. Simulation of plucked strings exhibiting tension modulation driving force. In Proc. of the International Computer Music Conference. ICMC, 1999.

[80] T. Tolonen, V. Valimaki, and M. Karjalainen. Modeling of tension modulation nonlinearity in plucked strings. IEEE Transactions on Speech and Audio Processing, 8(3):300–310, May 2000.

[81] C. Traube and P. Depalle. Extraction of the excitation point location on a string using weighted least-square estimation of a comb filter delay. In Proc. of the International Conference on Digital Audio Effects, London, UK, Sept. 2003. DAFx.

[82] C. Traube, P. Depalle, and M. Wanderley. Indirect acquisition of instrumental gesture based on signal, physical and perceptual information. In Proc. of New Interfaces for Musical Expression, pages 42–47, Montreal, Canada, 2003. NIME.

[83] C. Traube and J. O. Smith. Estimating the plucking point on a guitar string. In COST G-6 Conference on Digital Audio Effects. DAFx, Dec. 2000.

[84] C. Traube and J. O. Smith. Extracting the fingering and the plucking points on a guitar string from a recording. In Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pages 7–10. WASPAA, 2001.

[85] V. Valimaki. Discrete-Time Modeling of Acoustic Tubes Using Fractional Delay Filters. PhD thesis, Helsinki University of Technology, Espoo, Finland, 1995.

[86] V. Valimaki, J. Huopaniemi, M. Karjalainen, and Z. Janosy. Physical modeling of plucked string instruments with application to real-time sound synthesis. Journal of the Audio Engineering Society, 44(5):331–353, May 1996.

[87] V. Valimaki and T. Laakso. Principles of fractional delay filters. In Proc. of the IEEE International Conference on Acoustics, Speech and Signal Processing, Istanbul, Turkey, Jun. 2000. ICASSP.

[88] V. Valimaki, H. Lehtonen, and T. Laakso. Musical signal analysis using fractional-delay inverse comb filters. In Proc. of the International Conference on Digital Audio Effects, Bordeaux, France, Sept. 2007. DAFx.

[89] V. Valimaki, J. Pakarinen, C. Erkut, and M. Karjalainen. Discrete-time modeling of musical instruments. Technical report, Institute of Physics Publishing, Oct. 2005.

[90] V. Valimaki and T. Tolonen. Development and calibration of a guitar synthesizer. Journal of the Audio Engineering Society, 46(9):766–778, Sept. 1998.

[91] V. Valimaki, T. Tolonen, and M. Karjalainen. Signal-dependent nonlinearities for physical models using time-varying fractional delay filters. In International Computer Music Conference, pages 264–267, Oct. 1998.

[92] B. L. Vercoe and D. P. Ellis. Real-time csound: Software synthesis with sensing and control. In International Computer Music Conference, 1990.

[93] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer. Structured audio: creation, transmission, and rendering of parametric sound representations. Proceedings of the IEEE, 86(5):922–940, May 1998.


VITA

Raymond Vincent Migneco

EDUCATION
Ph.D. Electrical & Computer Engineering, Drexel University, Philadelphia, PA, 2012
M.S. Electrical & Computer Engineering, Drexel University, Philadelphia, PA, 2011
B.S. Electrical Engineering, The Pennsylvania State University, University Park, PA, 2005

ACADEMIC HONORS
Eta Kappa Nu Electrical & Computer Engineering Honor Society
Dean's List Honors, Drexel University, The Pennsylvania State University

PROFESSIONAL EXPERIENCE
Graduate Research Assistant, Drexel University, 9/2007 - 6/2012
Electrical Reliability Engineer, Sunoco Chemicals, 8/2005 - 8/2007

TEACHING EXPERIENCE
Teaching Assistant, Drexel University, 9/2007 - 6/2011
NSF Discovery K-12 Fellow, Drexel University, 3/2008 - 6/2009
Teaching Assistant, The Pennsylvania State University, 1/2005 - 5/2005

SELECTED PUBLICATIONS
• Migneco, R., and Kim, Y. E. (2012). "A Component-Based Approach for Modeling Plucked-Guitar Excitation Signals," Proceedings of the International Conference on New Interfaces for Musical Expression, Ann Arbor, MI: NIME.

• Batula, A. M., Morton, B. G., Migneco, R., Prockup, M., Schmidt, E. M., Grunberg, D. K., Kim, Y. E., and Fontecchio, A. K. (2012). "Music Technology as an Introduction to STEM," Proceedings of the American Society for Engineering Education Annual Conference, San Antonio, TX: ASEE.

• Migneco, R., and Kim, Y. E. (2011). "Excitation Modeling and Synthesis for Plucked Guitar Tones," Proceedings of the 2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY: WASPAA.

• Migneco, R., and Kim, Y. E. (2011). "Modeling Plucked Guitar Tones Via Joint Source-Filter Estimation," Proceedings of the 14th IEEE Digital Signal Processing Workshop and 6th IEEE Signal Processing Education Workshop, Sedona, AZ: DSP/SPE.

• Scott, J., Migneco, R., Morton, B., Hahn, C. M., Diefenbach, P. and Kim, Y. E. (2010). "An audio processing library for MIR application development in Flash," Proceedings of the 2010 International Society for Music Information Retrieval Conference, Utrecht, Netherlands: ISMIR.

• Migneco, R., Doll, T. M., Scott, J. J., Hahn, C., Diefenbach, P. J., and Kim, Y. E. (2009). "An audio processing library for game development in Flash," Accepted to the International IEEE Consumer Electronics Society's Games Innovations Conference.

• Kim, Y. E., Doll, T. M., and Migneco, R. (2009). "Collaborative online activities for acoustics education and psychoacoustic data collection," IEEE Transactions on Learning Technologies.

• Doll, T. M., Migneco, R., Scott, J. J., and Kim, Y. E. (2009). "An audio DSP toolkit for rapid application development in Flash," Accepted to IEEE International Workshop on Multimedia Signal Processing, Rio de Janeiro, Brazil: MMSP.