
Neural Computation (March 2015); in press

Delay Differential Analysis of Time Series

Claudia Lainscsek and Terrence J. Sejnowski
Howard Hughes Medical Institute, Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, CA 92037, USA and Institute for Neural Computation, University of California San Diego, La Jolla, CA 92093, USA

Abstract

Non-linear dynamical system analysis based on embedding theory has been used for modeling and prediction but it also has applications to signal detection and classification of time series. An embedding creates a multidimensional geometrical object from a single time series. Traditionally either delay or derivative embeddings have been used. The delay embedding is composed of delayed versions of the signal and the derivative embedding is composed of successive derivatives of the signal. The delay embedding has been extended to non-uniform embeddings to take multiple time scales into account. Both embeddings provide information on the underlying dynamical system without having direct access to all the system variables. Delay differential analysis (DDA) is based on functional embeddings, a combination of the derivative embedding with non-uniform delay embeddings. Small delay differential equation (DDE) models that best represent relevant dynamic features of time series data are selected from a pool of candidate models for detection and/or classification.

We show that the properties of DDEs support spectral analysis in the time-domain where nonlinear correlation functions are used to detect frequencies, frequency and phase couplings, and bispectra. These can be efficiently computed with short time windows and are robust to noise. For frequency analysis this framework is a multivariate extension of the discrete Fourier transform (DFT) and for higher-order spectra it is a linear and multivariate alternative to the multidimensional fast Fourier transform of multidimensional correlations. This method can be applied to short and/or sparse time series and can be extended to cross-trial and/or cross-channel spectra if multiple short data segments of the same experiment are available. Together, this time-domain toolbox provides higher temporal resolution, increased frequency and phase coupling information, and allows an easy and straightforward implementation of higher-order spectra across time compared with frequency-based methods such as the discrete Fourier transform (DFT) and cross-spectral analysis.

1 Introduction

Delay differential analysis (DDA) uses delay differential equations (DDE) to reveal dynamical information from time series. A DDE unfolds timescales and couplings between those timescales. In non-linear time series analysis an embedding converts a single time series into a multidimensional object in an embedding space that reveals valuable information about the dynamics of the system without having direct access to all the system variables (Whitney (1936); Packard et al. (1980); Takens (1981); Sauer et al. (1991)). A DDE is a functional embedding, an extension of a non-uniform embedding with linear and/or nonlinear functions of the time series. DDA therefore connects concepts of nonlinear dynamics with frequency analysis and higher-order statistics.

A Fourier series decomposes periodic functions or periodic signals into the sum of a (possibly infinite) set of simple oscillating functions (Fourier (1822)). The Fourier transform F(f) of the function f(t) has many applications in physics and engineering; F(f) and f(t) are referred to as the frequency-domain and time-domain representations, respectively. The relationship between frequency analysis and the analysis of frequency and/or phase couplings in the time-domain is poorly understood (see e.g. Hjorth (1970); Chan and Langford (1982); Raghuveer and Nikias (1985, 1986); Stankovic (1994)).

This study introduces spectral DDE methods and applies them to simulated data. In a companion paper (Lainscsek et al. (2014)) the methods are applied to electroencephalography (EEG) data.

The paper is organized as follows: In Sec. 2, DDA is introduced as a non-linear time series classification tool. In Sec. 3, functional embeddings serve as a link between nonlinear dynamics and traditional frequency analysis and higher-order statistics. The time-domain spectrum (TDS) and the time-domain bispectrum (TDB) are introduced. These are extended in Sec. 4 to sparse data and in Sec. 5 to the cross-trial spectrogram (CTS) and the cross-trial bispectrogram (CTB). The results are compared to traditional wavelet analysis. The conclusion in Sec. 6 compares the complementary strengths and weaknesses of time-domain and frequency-domain approaches to pattern detection and classification of time series.

2 DDA as non-linear classification tool

Fig. 1 illustrates the connections of delay differential analysis (DDA) to non-linear dynamics and global modeling. In non-linear time series analysis, a single variable measurement (blue box in Fig. 1) comes from an unknown non-linear dynamical system (pink box in Fig. 1). In Fig. 1 the Rossler system (Rossler (1976)) is used for illustrational purposes. A general existence theorem for embeddings in an Euclidean space was given by Whitney (1936): A generic map from an n-dimensional manifold to a 2n + 1 dimensional Euclidean space is an embedding. Whitney's theorem implies that each state can be identified uniquely by


Figure 1: Delay differential analysis (DDA) and global modeling

a vector of 2n + 1 measurements, thereby reconstructing the phase space. This theorem is the basis for the embedding theorems of Packard et al. (1980) and Takens (1981) for reconstruction of an attractor from single time series measurements. Takens proved that instead of 2n + 1 generic signals, the time-delayed versions [y(t), y(t − τ), y(t − 2τ), . . . , y(t − nτ)] or the successive derivatives of one generic signal would suffice to embed the n-dimensional manifold. Although the reconstruction preserves the invariant dynamical properties of the dynamical system, the embedding theorems provide no tool to reconstruct the original dynamical system in the original phase space from the single measurement or the embedding.

The Ansatz Library is a tool to reconstruct a general system from a single measurement that contains the original dynamical system as a subset (Lainscsek (1999); Lainscsek et al. (2001, 2003); Lainscsek (2011)). For example, for the Rossler system (Rossler (1976))

\[
\begin{aligned}
\dot{x} &= a_{1,2}\, y + a_{1,3}\, z \\
\dot{y} &= a_{2,1}\, x + a_{2,2}\, y \\
\dot{z} &= a_{3,0} + a_{3,3}\, z + a_{3,6}\, x z
\end{aligned}
\tag{1}
\]


the standard form (or the functional form of the differential embedding) is

\[
\dot{X} = Y, \qquad \dot{Y} = Z, \qquad \dot{Z} = F(X, Y, Z)
\tag{2}
\]

where X = x2 and its successive derivatives define the new state space variables X, Y, and Z. The function F(X, Y, Z) is explicitly

\[
F(X, Y, Z) = \alpha_1 + X \alpha_2 + X^2 \alpha_3 + Y \alpha_4 + X Y \alpha_5 + Y^2 \alpha_6 + Z \alpha_7 + X Z \alpha_8 + Y Z \alpha_9 \,.
\tag{3}
\]

Using the relations between the coordinates αr of the differential embedding and the coordinates ai,∗ the model of the Rossler system becomes:

\[
\begin{aligned}
\dot{\phi}_x &= d^2\, \tfrac{1}{b}\, a_{1,2}\, y + d^3\, \tfrac{1}{b c}\, a_{1,3}\, \phi_z \\
\dot{y} &= b\, a_{2,1}\, \phi_x + d\, a_{2,2}\, y \\
\dot{\phi}_z &= c\, a_{3,0} + d\, a_{3,3}\, \phi_z + b\, a_{3,6}\, \phi_x \phi_z \,,
\end{aligned}
\tag{4}
\]

where the functions ϕx = ϕx(x, z) and ϕz = ϕz(x, z) are generic linearly independent functions of x and z (see Lainscsek et al. (2003) and Lainscsek (2011) for details). The parameters b and c specify the structure of the parameter space ai,∗ and d is an additional time-scaling parameter.

Such a reconstruction is only possible for simple dynamic systems with no noise and can be seen as the theoretical limit for global modeling. It also shows that measurements from the same dynamical system on different time scales (and consequently frequency shifted) can be identified as belonging to the same dynamical system. Further, the time-scaling parameter can be reconstructed as a separate parameter. This is consistent with Takens' theorem, which does not specify the sampling rate.

Since real-world data are noisy and generally originate from highly complex dynamical systems, global modeling from data is in general not feasible. Often it is sufficient to detect dynamical differences between data classes and to quantify the difference. DDA can distinguish between heart conditions in EKG (electrocardiography) data (Lainscsek and Sejnowski (2013)) and, in EEG data, between Parkinson's patients and control subjects (Lainscsek et al. (2013a)). Dynamical differences can be detected and analyzed by combining derivative embeddings with delay embeddings as functional embeddings in the DDA framework.

DDA in the time-domain can be related to frequency analysis as shown in the next section. In linear DDEs (Falbo (1995)) the estimated parameters and delays of a DDE relate to the frequencies of a signal and in non-linear DDEs the estimated parameters are connected to higher-order statistical moments.


3 Functional embedding as a connection between non-linear dynamics and frequency analysis

3.1 Linear DDE and frequency analysis

Lainscsek and Sejnowski (2013) show that the linear DDE (Falbo (1995))

\[
\dot{x} = a\, x_\tau
\tag{5}
\]

where xτ = x(t − τ), can be used to detect frequencies in the time-domain: Eq. (5) with a harmonic signal with one frequency, x(t) = A cos(ω t + ϕ), can be written as

\[
\begin{aligned}
&-a A \cos(\omega(t - \tau) + \varphi) - A \omega \sin(\omega t + \varphi) = 0 \\[4pt]
&\sin(\omega t)\big({-a} A \cos(\varphi) \sin(\tau\omega) + a A \sin(\varphi) \cos(\tau\omega) - A \omega \cos(\varphi)\big) \\
&\quad + \cos(\omega t)\big({-a} A \sin(\varphi) \sin(\tau\omega) - a A \cos(\varphi) \cos(\tau\omega) - A \omega \sin(\varphi)\big) = 0 \,.
\end{aligned}
\tag{6}
\]

This equation has the solutions

\[
\tau = \frac{(2n-1)\,\pi}{2\,\omega} \;\overset{\omega = 2\pi f}{=}\; \frac{2n-1}{4 f}, \qquad a = (-1)^n\, \omega, \qquad n \in \mathbb{N} \,.
\tag{7}
\]

This means that the delays are inversely proportional to the frequency f and the coefficient a is directly proportional to the frequency. For linear DDEs

\[
\dot{x} = \sum_{k=1}^{N} a_k\, x_{\tau_k}
\tag{8}
\]

a special solution is a harmonic signal with 2N − 1 frequencies

\[
x(t) = \sum_{k=1}^{2N-1} A_k \cos(\omega_k t + \varphi_k)
\tag{9}
\]

and

\[
\tau_i = \frac{\pi\,(2n-1)}{2\,\omega_j}, \qquad n \in \mathbb{N}
\tag{10}
\]

where all delays τi are related to one of the frequencies. The expressions for the coefficients a are more complicated than in Eq. (7) and each of the coefficients a depends on all the frequencies in the signal.

Eq. (5) can be expanded as a Yule-Walker equation (Yule (1927); Walker (1931); Boashash (1995); Kadtke and Kremliovsky (1999)), ẋ xτ = a xτ². Applying the expectation operator

\[
\langle F(t) \rangle \equiv \lim_{T \to \infty} \left( \frac{1}{T} \int_0^T F(t)\, dt \right)
\]

we get

\[
a = \frac{\langle \dot{x}\, x_\tau \rangle}{\langle x_\tau^2 \rangle} = \frac{\langle \dot{x}\, x_\tau \rangle}{\langle x^2 \rangle} \,.
\tag{11}
\]


For a harmonic signal with one frequency, x(t) = A cos(ω t + ϕ) (a special solution of the linear DDE in Eq. (5); see Falbo (1995) for details), this becomes:

\[
\langle \dot{x}\, x_\tau \rangle = -\tfrac{1}{2}\, \omega A^2 \sin(\omega\tau), \qquad
\langle x^2 \rangle = \tfrac{1}{2} A^2
\quad\Rightarrow\quad a = -\omega \sin(\omega\tau)
\tag{12}
\]

and for sin(ω τ) = ±1 the special solutions for a and τ are:

\[
\tau = \frac{(2n-1)\,\pi}{2\,\omega} \;\overset{\omega = 2\pi f}{=}\; \frac{2n-1}{4 f}, \qquad a = (-1)^n\, \omega \,,
\tag{13}
\]

the same as in Eq. (7). For DDEs without explicit analytical solutions Yule-Walker equations can be applied.
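The relation in Eq. (12) is straightforward to verify numerically. The sketch below (all parameters are illustrative choices, not values from the paper) estimates a from the sample moments of Eq. (11) and compares it with −ω sin(ωτ):

```python
import numpy as np

# Numerical check of Eq. (12): for x(t) = A cos(w t + phi),
# a = <x' x_tau> / <x^2> = -w sin(w tau). All parameters are illustrative.
fs, f, A, phi = 1000.0, 5.0, 1.3, 0.7
w = 2*np.pi*f
t = np.arange(int(2*fs)) / fs             # 2 s: an integer number of cycles
x = A*np.cos(w*t + phi)
dx = -A*w*np.sin(w*t + phi)               # analytic derivative of x(t)

tau = 37                                  # delay in samples (0.037 s)
x_tau = np.roll(x, tau)                   # circular shift: an exact delay
                                          # here because x is periodic
a_est = np.mean(dx * x_tau) / np.mean(x**2)
print(np.isclose(a_est, -w*np.sin(w*tau/fs)))   # True
```

The circular shift stands in for the delayed signal, which is exact here because the window contains an integer number of cycles.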

The numerator in Eq. (11) can be rewritten as delay derivatives of the autocorrelation function in the case of a bounded stationary signal,

\[
\langle \dot{x}\, x_\tau \rangle
= \lim_{T \to \infty} \frac{1}{T} \int_0^T \dot{x}\, x_\tau\, dt
= \lim_{T \to \infty} \frac{1}{T} \big[ x\, x_\tau \big]_0^T - \lim_{T \to \infty} \frac{1}{T} \int_0^T x\, \frac{d x_\tau}{dt}\, dt
= \lim_{T \to \infty} \frac{1}{T} \int_0^T x\, \frac{d x_\tau}{d\tau}\, dt
= \frac{d}{d\tau} \langle x\, x_\tau \rangle
\tag{14}
\]

and therefore

\[
a = \frac{\langle \dot{x}\, x_\tau \rangle}{\langle x^2 \rangle} = \frac{\frac{d}{d\tau} \langle x\, x_\tau \rangle}{\langle x^2 \rangle} \,.
\tag{15}
\]

Thus the numerator in Eq. (15) is the delay derivative of the autocorrelation function ⟨x xτ⟩. For a harmonic signal x(t) = Σ_{k=1}^{N} A_k cos(ω_k t + ϕ_k) with N frequencies it is ⟨x xτ⟩ = ½ Σ_{k=1}^{N} A_k² cos(ω_k τ), which is used for pitch detection (Boersma (1993); Roads and Curtis (1996)).

The denominator ⟨x²⟩ will be used as a frequency detector in the following way: Consider a signal

\[
x(t) = \underbrace{\sum_{i=1}^{N} A_i \cos(\omega_i t + \varphi_i)}_{\mathcal{S}} + \underbrace{D \cos(\Omega t + \phi)}_{\mathcal{P}}
\tag{16}
\]

where S is the signal under investigation and P = D cos(Ω t+φ) is a probe signal.

The expectation value ⟨x²⟩ = lim_{T→∞} (1/T) ∫₀ᵀ x² dt can be written in the form

\[
\begin{aligned}
\langle x^2 \rangle &= \langle (\mathcal{S} + \mathcal{P})^2 \rangle = \langle \mathcal{S}^2 \rangle + 2 \langle \mathcal{S} \mathcal{P} \rangle + \langle \mathcal{P}^2 \rangle \\
\langle \mathcal{S}^2 \rangle &= \frac{1}{2} \sum_{i=1}^{N} A_i^2 \qquad\qquad
\langle \mathcal{P}^2 \rangle = \frac{1}{2} D^2 \\
\langle \mathcal{S} \mathcal{P} \rangle &=
\begin{cases}
0 & \text{for } \omega_i \neq \Omega \\[2pt]
\frac{1}{2} A_i D \cos(\varphi_i - \phi) & \text{for } \omega_i = \Omega
\end{cases}
\end{aligned}
\tag{17}
\]

The term ⟨SP⟩ = ⟨S D cos(Ω t + φ)⟩ combines R = ⟨S cos(Ωt)⟩ and I = ⟨S sin(Ωt)⟩ used in discrete frequency analysis.


Time domain spectrum (TDS)

Let the amplitude of the probe signal be D = 1 (arbitrary parameter). Then one way to define the time-domain spectrum (TDS) is

\[
L(\Omega) = \max_{\phi} \left( \langle x^2 \rangle - \langle \mathcal{S}^2 \rangle - \frac{1}{2} \right) .
\tag{18}
\]

L(Ω) is zero for frequencies Ω that are not part of the signal S and equal to the amplitude Ai of a frequency fi = ωi/(2π) if Ω = ωi, which follows directly from ⟨x²⟩ = ⟨S²⟩ + ½ + Ai cos(ϕi − φ). The amplitude Ai is always positive and cos(ϕi − φ) ranges between −1 and 1 with a maximum when φ = ϕi. For a given signal S we loop over a range of φ values between 0 and 2π and take the maximal value of ⟨x²⟩ − ⟨S²⟩ − ½ = Ai. The value φ that maximizes cos(ϕi − φ) is then equal to the phase ϕi. Therefore, this simultaneously detects the amplitudes Ai and their phases ϕi in a signal. This method can only detect frequencies and couplings lower than half the sampling rate, since Ω = 2πf/fs, where fs is the sampling rate, in cos(Ωt + φ) with the time t given in time steps needs to be smaller than π. This is the same as the Nyquist frequency limit for traditional methods (Nyquist (1928)).
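The loop over probe phases described above can be sketched as follows; this is a minimal numerical illustration of Eq. (18) with probe amplitude D = 1, where the sampling rate, duration, phase grid, and frequency grid are illustrative assumptions and the test tone uses the first row of Tab. 1:

```python
import numpy as np

def tds(s, fs, freqs, n_phases=64):
    """Time-domain spectrum L(Omega) of Eq. (18) with probe amplitude D = 1.

    For each probe frequency, loop over probe phases phi and keep the
    maximum of <x^2> - <S^2> - 1/2, which equals the amplitude A_i
    (at the maximizing phase phi = phi_i) when Omega matches a signal
    frequency, and is ~0 otherwise."""
    t = np.arange(len(s)) / fs
    s2 = np.mean(s**2)                                   # <S^2>
    phis = np.linspace(0, 2*np.pi, n_phases, endpoint=False)
    L = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        probes = np.cos(2*np.pi*f*t[None, :] + phis[:, None])
        x2 = np.mean((s[None, :] + probes)**2, axis=1)   # <x^2>, one per phase
        L[i] = np.max(x2 - s2 - 0.5)
    return L

# 31 Hz tone with amplitude 1.14 and phase 0.53 (row 1 of Tab. 1):
fs = 500.0
t = np.arange(int(5*fs)) / fs
sig = 1.14 * np.cos(2*np.pi*31*t + 0.53)
freqs = np.arange(5, 100)
spec = tds(sig, fs, freqs)
print(freqs[int(np.argmax(spec))], round(spec.max(), 2))   # 31 1.14
```

The detected amplitude is slightly below 1.14 only by the phase-grid resolution (cos(π/64) ≈ 0.999); refining the φ grid recovers the phase ϕi as well.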

Comparison to the Goertzel algorithm and discrete frequency analysis

In discrete frequency analysis for a signal S(t) = Σ_{i=1}^{N} A_i cos(ω_i t + ϕ_i) the expectation values R = ⟨S cos(Ωt)⟩ and I = ⟨S sin(Ωt)⟩ are computed. R and I are only non-zero if Ω is contained in the signal. The amplitude of that frequency is then Ai|_{Ω=ωi} = 2√(R² + I²) and the phase is ϕi = tan⁻¹(I/R). The Goertzel algorithm (Goertzel (1958); Jacobsen and Lyons (2003)) is the numerical implementation of this principle and gives the exact same results as the computation of the TDS.

The TDS L(Ω) differs from discrete frequency analysis in two ways. Firstly, D cos(Ω t + φ) is multiplied with the signal S instead of cos(Ω t). Secondly, only ⟨SP⟩ = ⟨S D cos(Ω t + φ)⟩ is computed instead of R = ⟨S cos(Ωt)⟩ and I = ⟨S sin(Ωt)⟩. Amplitudes and phases are extracted from ⟨SP⟩ instead of Ai|_{Ω=ωi} = 2√(R² + I²) and ϕi = tan⁻¹(I/R). An advantage of L(Ω) is that it can be more easily expanded to the computation of higher-order spectra, e.g. the bispectrum.
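For comparison, the discrete frequency analysis just described can be sketched directly (signal parameters from row 2 of Tab. 1; duration and sampling rate are illustrative assumptions):

```python
import numpy as np

# Discrete frequency analysis: R = <S cos(Omega t)>, I = <S sin(Omega t)>,
# amplitude 2*sqrt(R^2 + I^2). Test tone: 40 Hz, A = 0.92, phase 1.14.
fs = 500.0
t = np.arange(int(4*fs)) / fs
A, f0, phi = 0.92, 40.0, 1.14
S = A * np.cos(2*np.pi*f0*t + phi)

w = 2*np.pi*f0
R = np.mean(S * np.cos(w*t))                 # (A/2) cos(phi)
I = np.mean(S * np.sin(w*t))                 # -(A/2) sin(phi)
amp = 2*np.sqrt(R**2 + I**2)
phase = np.arctan2(-I, R)                    # sign convention recovering phi
print(round(amp, 2), round(phase, 2))        # 0.92 1.14
```

The text's ϕi = tan⁻¹(I/R) is the same quantity up to sign convention for the chosen cosine reference.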

3.2 Non-linear DDE and Bispectral Analysis

Quadratic phase and frequency couplings

Frequency components do not always appear completely independently of one another. Non-linear interactions of frequencies and their phases (e.g. quadratic phase coupling) cannot be detected by a power spectrum – the Fourier transform of the autocorrelation function (second-order cumulant) – since phase relationships and frequency couplings of signals are lost. Such couplings are usually detected via bispectral analysis (Tukey (1953); Kolmogorov and Rozanov (1960); Leonov and Shiryeav (1959); Rosenblatt and Van Ness (1965); Brillinger and Rosenblatt (1967); Swami (2003); Mendel (1991); Fackrell and McLaughlin (1995)). The bispectrum or bispectral density is the Fourier transform of the third-order cumulant (bicorrelation function).

Consider the signal x(t) = A₁ cos(ω₁t + ϕ₁) + A₂ cos(ω₂t + ϕ₂) which is passed through a quadratic nonlinear system h(t) = b x²(t) where b is a non-zero constant. The output of the system will include the harmonic components (2ω₁, 2ϕ₁), (2ω₂, 2ϕ₂), (ω₁ + ω₂, ϕ₁ + ϕ₂), and (ω₁ − ω₂, ϕ₁ − ϕ₂). These phase relations are called quadratic phase coupling (QPC) in general and quadratic frequency coupling (QFC) when the phases are zero (ϕ₁ = ϕ₂ = 0): For the signal x(t) = cos(ω₁t) + cos(ω₂t) + cos(ω₃t) QFC occurs when ω₃ is a multiple of one of the frequencies or of the sum or difference of the two frequencies.

The third-order cumulant is C₃(τ₁, τ₂) = ⟨x(t) x(t − τ₁) x(t − τ₂)⟩ and the traditional bispectrum is the double Fourier transform of C₃,

\[
B_x(f_1, f_2) = \mathcal{F}^\star(f_1 + f_2)\, \mathcal{F}(f_1)\, \mathcal{F}(f_2)
\tag{19}
\]

where F(f) is the Fourier transform of x(t). An example is given in Fig. 2.

Figure 2: Bispectrum magnitude (Eq. (19)) for the signal S = Σ_{i=1}^{3} A_i cos(ω_i t + ϕ_i) + cos((ω₁ + ω₃) t + (ϕ₁ + ϕ₃)) with the frequencies, amplitudes, and phases listed in Tab. 1.

Nonlinear DDE

The simplest non-linear DDE

\[
\dot{x} = a\, x_\tau^2
\tag{20}
\]


Table 1: Frequencies fi, amplitudes Ai, and phases ϕi for the numerical experiments shown in Figs. 2 and 3.

  i   fi   Ai     ϕi
  1   31   1.14   0.53
  2   40   0.92   1.14
  3   69   1.13   0.09

where xτ = x(t − τ), extends linear spectral analysis to quadratic coupling analysis or bispectral analysis. This equation can be solved in the same way as the linear DDE:

\[
a = \frac{\langle \dot{x}\, x_\tau^2 \rangle}{\langle x_\tau^4 \rangle} = \frac{\langle \dot{x}\, x_\tau^2 \rangle}{\langle x^4 \rangle} \,.
\tag{21}
\]

For a harmonic signal with three frequencies x(t) = A₁ cos(ω₁ t + ϕ₁) + A₂ cos(ω₂ t + ϕ₂) + A₃ cos(ω₃ t + ϕ₃) the coefficient a is only non-zero if there is quadratic coupling between the frequencies, i.e. ω₃ = ω₁ ± ω₂, 2ω₁, 2ω₂, ω₁/2, or ω₂/2. Only the numerator ⟨ẋ xτ²⟩ contains quadratic coupling information and can be rewritten for a bounded stationary signal:

\[
\langle \dot{x}\, x_\tau^2 \rangle
= \lim_{T \to \infty} \frac{1}{T} \int_0^T \dot{x}\, x_\tau^2\, dt
= \lim_{T \to \infty} \frac{1}{T} \big[ x\, x_\tau^2 \big]_0^T - \lim_{T \to \infty} \frac{1}{T} \int_0^T x\, \frac{d x_\tau^2}{dt}\, dt
= \lim_{T \to \infty} \frac{1}{T} \int_0^T x\, \frac{d x_\tau^2}{d\tau}\, dt
= \frac{d}{d\tau} \langle x\, x_\tau^2 \rangle \,.
\tag{22}
\]

The denominator in Eq. (21) is a constant that is independent of couplings up to quadratic order in the signal. Therefore Eq. (21) can be rewritten as

\[
a\, \langle x^4 \rangle = \langle \dot{x}\, x_\tau^2 \rangle = \frac{d}{d\tau} \langle x\, x_\tau^2 \rangle = \text{const.}
\tag{23}
\]

For x(t) = Σ_{i=1}^{∞} A_i cos(ω_i t + ϕ_i) + A_{j,k} cos((ω_j + ω_k) t + ϕ_j + ϕ_k) the expectation value is:

\[
\langle x\, x_\tau^2 \rangle = \frac{1}{2} A_j A_k A_{j,k} \big( \cos(\omega_j \tau) + \cos(\omega_k \tau) + \cos((\omega_j + \omega_k) \tau) \big) \,.
\tag{24}
\]

Therefore ⟨x xτ²⟩ is zero for a signal without QFC (A_{j,k} = 0) and non-zero if there is QFC in the signal. Without loss of generality the delay τ can be set to zero in Eq. (24), giving ⟨x xτ²⟩|_{τ=0} = ⟨x³⟩ = (3/2) A_j A_k A_{j,k}. Thus ⟨x³⟩ and the coefficient a of the non-linear DDE in Eq. (20) are only non-zero if QFC is present in the signal (A_{j,k} ≠ 0) and can be used as a bispectral detector. ⟨x³⟩ can only detect the presence or absence of QFC but not the frequencies involved. To determine the frequencies, the time-domain bispectrum is introduced.
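The ⟨x³⟩ detector is easy to demonstrate numerically. In the sketch below the frequencies, phases, and duration are illustrative assumptions; 10, 23, and 24 Hz are chosen so that the uncoupled signal contains no quadratic frequency relation:

```python
import numpy as np

# <x^3> as QFC detector: equals (3/2) A_j A_k A_{j,k} when a component at
# omega_j + omega_k with phase phi_j + phi_k is present, and 0 otherwise.
fs = 500.0
t = np.arange(int(20*fs)) / fs
p1, p2 = 0.4, 1.1

base = np.cos(2*np.pi*10*t + p1) + np.cos(2*np.pi*23*t + p2)
uncoupled = base + np.cos(2*np.pi*24*t + 0.7)        # third frequency unrelated
coupled = base + np.cos(2*np.pi*33*t + (p1 + p2))    # 33 = 10 + 23: QFC

print(round(np.mean(uncoupled**3), 6))   # ≈ 0
print(round(np.mean(coupled**3), 6))     # ≈ 1.5 = (3/2) A_j A_k A_{j,k}
```

With unit amplitudes the coupled value is exactly 3/2, matching the formula above.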


Time domain bispectrum (TDB)

To determine the frequencies of QFC consider a signal

\[
\begin{aligned}
x(t) &= \mathcal{S} + \mathcal{P} \\
\mathcal{S} &= \sum_{i=1}^{N} A_i \cos(\omega_i t + \varphi_i) + A_{j,k} \cos((\omega_j + \omega_k) t + (\varphi_j + \varphi_k)) \\
\mathcal{P} &= D \cos(\Omega t + \phi)
\end{aligned}
\tag{25}
\]

where S is the signal under investigation and P = D cos(Ω t + φ) is a probe signal. The frequencies fj and fk are coupled. ⟨x³⟩ is then

\[
\begin{aligned}
\langle x^3 \rangle &= \langle \mathcal{S}^3 \rangle + 3 \langle \mathcal{S}^2 \mathcal{P} \rangle + 3 \langle \mathcal{S} \mathcal{P}^2 \rangle + \langle \mathcal{P}^3 \rangle \\
\langle \mathcal{S}^3 \rangle &= \frac{3}{2} A_{j,k} A_j A_k \\
\langle \mathcal{S}^2 \mathcal{P} \rangle &=
\begin{cases}
\frac{1}{2} A_j A_k D \cos(\phi - \varphi_j \mp \varphi_k) & \Omega = \omega_j \pm \omega_k \\[2pt]
\frac{1}{2} A_k A_{j,k} D \cos(\phi - \varphi_j) & \Omega = \omega_j \\[2pt]
\frac{1}{4} A_i^2 D \cos(\phi - 2 \varphi_i) & \Omega = 2 \omega_i \\[2pt]
0 & \Omega \text{ otherwise}
\end{cases} \\
\langle \mathcal{S} \mathcal{P}^2 \rangle &=
\begin{cases}
\frac{1}{4} A_i D^2 \cos(2\phi - \varphi_i) & \Omega = \frac{\omega_i}{2} \\[2pt]
0 & \Omega \text{ otherwise}
\end{cases} \\
\langle \mathcal{P}^3 \rangle &= 0
\end{aligned}
\tag{26}
\]

The term ⟨SP²⟩ is only non-zero when 2Ω = ωi. It therefore can be used as an alternative definition of the time-domain spectrum TDS in Eq. (18):

\[
L(\Omega) = \max_{\phi} \langle \mathcal{S} \mathcal{P}^2 \rangle \,.
\tag{27}
\]

To define the time-domain bispectrum (TDB) we set in our analysis D = 1:

\[
B(\Omega) = L(\Omega) \cdot \max_{\phi} \langle \mathcal{S}^2 \mathcal{P} \rangle \,.
\tag{28}
\]

The expression max_φ ⟨S²P⟩ is zero for frequencies Ω that are not coupled and non-zero according to Eq. (26) for the coupling frequencies ωj ± ωk, ωj, ωk, and 2ωi. To get a non-zero expression only for ωj + ωk, ωj, and ωk, we use L(Ω) in Eq. (28) to filter out 2ωi.

Note that unlike the bispectrum, which is a function of two frequencies (Fig. 2), the TDB is a function of a single frequency. In Fig. 3, the TDB is shown for the same signal used to illustrate the traditional bispectrum in Fig. 2. Note that the 40 Hz component, which is not coupled, is absent in both the TDB and the bispectrum, but is present in the TDS.


Figure 3: Time-domain spectrum TDS L(Ω) and time-domain bispectrum TDB B(Ω) for the signal S = Σ_{i=1}^{3} A_i cos(ω_i t + ϕ_i) + cos((ω₁ + ω₃) t + (ϕ₁ + ϕ₃)) with the frequencies, amplitudes, and phases listed in Tab. 1.

3.3 General non-linear DDE as generalized non-linear spectral tool

Combining linear and non-linear terms in a DDE makes all coefficients non-linear. If we for example combine the DDEs of Secs. 3.1 and 3.2

\[
\dot{x} = a_1 x_\tau + a_2 x_\tau^2
\tag{29}
\]

the solution can be written as

\[
\begin{aligned}
a_1 &= \frac{\langle x_\tau^4 \rangle \langle x_\tau\, \dot{x} \rangle - \langle x_\tau^3 \rangle \langle x_\tau^2\, \dot{x} \rangle}{\langle x_\tau^2 \rangle \langle x_\tau^4 \rangle - \langle x_\tau^3 \rangle^2}
= \frac{\langle x^4 \rangle \frac{d}{d\tau} \langle x_\tau x \rangle - \langle x^3 \rangle \frac{d}{d\tau} \langle x_\tau^2 x \rangle}{\langle x^2 \rangle \langle x^4 \rangle - \langle x^3 \rangle^2} \\[8pt]
a_2 &= \frac{\langle x_\tau^2 \rangle \langle x_\tau^2\, \dot{x} \rangle - \langle x_\tau^3 \rangle \langle x_\tau\, \dot{x} \rangle}{\langle x_\tau^2 \rangle \langle x_\tau^4 \rangle - \langle x_\tau^3 \rangle^2}
= \frac{\langle x^2 \rangle \frac{d}{d\tau} \langle x_\tau^2 x \rangle - \langle x^3 \rangle \frac{d}{d\tau} \langle x_\tau x \rangle}{\langle x^2 \rangle \langle x^4 \rangle - \langle x^3 \rangle^2} \,.
\end{aligned}
\tag{30}
\]

Both the coefficient a₁ and the coefficient a₂ contain linear as well as non-linear statistical moments. The moments and coefficients for combinations of sinusoids are given in Tab. 2 with and without couplings.

Example 1: Data with no couplings

To explore the properties of Eq. (30) we start with a harmonic signal with 4 independent frequencies and no couplings,

\[
x(t) = \sum_{k=1}^{4} A_k \cos(\omega_k t + \varphi_k) \,.
\tag{31}
\]


Table 2: Moments and coefficients of Eqs. (29) and (30). The coefficient a₂ is only non-zero if QFC is present in the data.

                couplings present                              no couplings

  x(t)          Σ_{k=1}^{N} A_k cos(ω_k t + ϕ_k)               Σ_{k=1}^{N} A_k cos(ω_k t + ϕ_k)
                + Σ_{i=1}^{N} Σ_{j=i+1}^{N} A_{i,j}
                  cos((ω_i + ω_j) t + (ϕ_i + ϕ_j))

  ⟨x²⟩          const.                                         const.
  ⟨x³⟩          const.                                         0
  ⟨x⁴⟩          const.                                         const.
  ⟨xτ x⟩        f(ω_{i,j,k}, τ)                                f(ω_k, τ)
  ⟨xτ² x⟩       f(ω_{i,j,k}, τ)                                0
  a₁            f(ω_{i,j,k}, τ)                                f(ω_k, τ)
  a₂            f(ω_{i,j,k}, τ)                                0

Using Eq. (30) we get for the two coefficients

\[
\begin{aligned}
a_1 &= \frac{3}{16} \left( A^{(1)} + 4 A_2^2 A_3^2 + 4 A_4^2 A_3^2 + 4 A_2^2 A_4^2 \right) \sum_{k=1}^{4} A_k^2\, \omega_k \sin(\tau \omega_k) \\
a_2 &= 0
\end{aligned}
\tag{32}
\]

where A⁽¹⁾ = A₁⁴ + A₂⁴ + A₃⁴ + A₄⁴ + 4A₁²(A₂² + A₃² + A₄²). The non-linear coefficient a₂ vanishes in this case.

Example 2: Data with couplings

Replacing the 4th term in Eq. (31) with a coupling frequency,

\[
x(t) = \sum_{k=1}^{3} A_k \cos(\omega_k t) + A_4 \cos((\omega_2 + \omega_3) t + (\varphi_2 + \varphi_3)) \,,
\tag{33}
\]

the two coefficients are:

\[
\begin{aligned}
a_1 = \frac{3}{16} \Big( & \left( A^{(1)} + 4 A_2^2 A_3^2 + 4 A_4^2 A_3^2 + 4 A_2^2 A_4^2 \right) A_1^2 \omega_1 \sin(\tau\omega_1) \\
& + \left( 4 A_2^2 A_3^2 + A^{(1)} \right) \left( A_2^2 \omega_2 \sin(\tau\omega_2) + A_3^2 \omega_3 \sin(\tau\omega_3) \right) \\
& + 4 A_4^2 \left( A_2^4 \omega_2 \sin(\tau\omega_2) + A_3^4 \omega_3 \sin(\tau\omega_3) \right) \\
& + A_4^2 (\omega_2 + \omega_3) \left( 4 A_2^2 A_4^2 + 4 A_3^2 A_4^2 + A^{(1)} \right) \left( \sin(\tau\omega_2) \cos(\tau\omega_3) + \sin(\tau\omega_3) \cos(\tau\omega_2) \right) \Big) \\[8pt]
a_2 = \frac{1}{4} A^{(2)} \Big( & -3 A_1^2 \omega_1 \sin(\tau\omega_1) \\
& + \left( A_1^2 - 2 A_2^2 + A_3^2 + A_4^2 \right) \omega_2 \sin(\tau\omega_2) \\
& + \left( A_1^2 + A_2^2 - 2 A_3^2 + A_4^2 \right) \omega_3 \sin(\tau\omega_3) \\
& + \left( A_1^2 + A_2^2 + A_3^2 - 2 A_4^2 \right) (\omega_2 + \omega_3) \left( \sin(\tau\omega_2) \cos(\tau\omega_3) + \sin(\tau\omega_3) \cos(\tau\omega_2) \right) \Big)
\end{aligned}
\tag{34}
\]


where A⁽¹⁾ = A₁⁴ + A₂⁴ + A₃⁴ + A₄⁴ + 4A₁²(A₂² + A₃² + A₄²) and A⁽²⁾ = A₂A₃A₄.

Eq. (29) is therefore a QFC detector. For any signal without QFC the coefficient a₂ will vanish. The detection of non-linear couplings in the data is important in underwater acoustics (Nikias and Raghuver (1987)).
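This behavior of a₂ can be sketched numerically by fitting Eq. (29) by least squares, which reproduces the moment solution of Eq. (30); the test signals, the 0.8 amplitude, and the 7-sample delay are illustrative assumptions:

```python
import numpy as np

# Fit dx/dt = a1*x_tau + a2*x_tau^2 (Eq. (29)) by least squares; per
# Tab. 2, a2 vanishes when the signal contains no QFC.
def dde_coeffs(x, fs, delay):
    dx = np.gradient(x) * fs        # numerical derivative of x(t)
    x_tau = np.roll(x, delay)       # x(t - tau); circular shift is an exact
                                    # delay here since the window is periodic
    M = np.column_stack([x_tau, x_tau**2])
    return np.linalg.lstsq(M, dx, rcond=None)[0]

fs = 500.0
t = np.arange(int(20*fs)) / fs
base = np.cos(2*np.pi*10*t) + np.cos(2*np.pi*23*t)
uncoupled = base + 0.8*np.cos(2*np.pi*24*t)      # third frequency not coupled
coupled = base + 0.8*np.cos(2*np.pi*33*t)        # 33 = 10 + 23: QFC

for name, sig in (("uncoupled", uncoupled), ("coupled", coupled)):
    a1, a2 = dde_coeffs(sig, fs, delay=7)
    print(f"{name}: a2 = {a2:+.3f}")             # a2 ~ 0 only without QFC
```

The normal equations of this regression are exactly the moment equations behind Eq. (30), so a₂ acts as the QFC detector described above.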

4 Sparse data

Real-world data often contain data segments that cannot be used for the analysis, such as artifacts in electroencephalography (EEG) data. Eq. (18) can nonetheless be used even when many data points are missing: For a sparse signal x(T), where T is the vector of times for which the signal is good, the probe signal cos(ΩT + φ) can be used to detect the spectrum or any higher-order spectra. Eq. (16) can be modified as

\[
x(T) = \underbrace{\sum_{i=1}^{N} A_i \cos(\omega_i T + \varphi_i)}_{\mathcal{S}} + \underbrace{\cos(\Omega T + \phi)}_{\mathcal{P}}
\tag{35}
\]

where T is a sparse time vector. The upper plot in Fig. 4 shows an example where only a few data points of a signal were used. Those data and their time points were then used to compute the TDS using Eq. (18). The TDS for these sparse data is shown in the lower plot of Fig. 4. Note that the sparse data have to satisfy the restrictions of the Nyquist theorem (Nyquist (1928)). Eq. (28) can be used for the bispectrum of sparse data in the same way.
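A sketch of the sparse-data TDS in the spirit of Fig. 4: keep 101 of 250 samples of a noisy 80 Hz cosine at 500 Hz. The noise level, random subsample, and probe grids are illustrative assumptions:

```python
import numpy as np

# TDS (Eq. (35)) evaluated only at the retained, irregular sample times T.
rng = np.random.default_rng(0)
fs, f0 = 500.0, 80.0
t_full = np.arange(int(0.5*fs)) / fs                 # 0.5 s of data
x_full = np.cos(2*np.pi*f0*t_full) + 0.5*rng.standard_normal(t_full.size)

keep = np.sort(rng.choice(t_full.size, size=101, replace=False))
T, S = t_full[keep], x_full[keep]                    # sparse times and values
s2 = np.mean(S**2)

freqs = np.arange(5, 250)                            # below Nyquist (250 Hz)
phis = np.linspace(0, 2*np.pi, 32, endpoint=False)
L = np.array([max(np.mean((S + np.cos(2*np.pi*f*T + p))**2) - s2 - 0.5
                  for p in phis) for f in freqs])
print(freqs[int(np.argmax(L))])                      # peak near 80 Hz
```

Because time itself is the variable, nothing in the computation requires the retained samples to be equally spaced.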

Figure 4: Frequency detection from sparse data. Only 101 data points (circled) from a signal S = cos(2πft) with f = 80 Hz, a sampling rate of 500 Hz, and added white noise (SNR = 0 dB) (upper plot) were used to detect the frequency f (lower plot).


5 Cross-trial spectrogram (CTS) and bispectrogram (CTB)

The same method can be used when very short segments of data need to be analyzed and there are multiple trials of the same experiment, e.g. in ERP (event-related potential) analysis (Davis (1939); Davis and Davis (1936); Sutton et al. (1965); Luck (2014)). There are two different ways to define a cross-trial spectrogram (CTS). The first assumes that there is cross-trial phase coherence; the second does not.

5.1 Phase coherence across trials

The first method assumes phase coherence across trials. Consider signals x₁(T₁), x₂(T₂), . . . , xₙ(Tₙ) where all time series xᵢ(Tᵢ) are centered around the same event, e.g. a stimulus S for EEG data. Then all time vectors Tᵢ can be considered equal and the signals can be concatenated into a single vector, x₁(T), x₂(T), . . . , xₙ(T), and L(Ω) is computed for this single vector. For a data window that would be too short for spectral analysis the data of all trials can be combined and the spectrum can be computed by using a probe signal with a time vector that consists of n repetitions of the time vector T for the n data segments. In this manner a cross-trial spectrogram can be computed by using short sliding windows when there is some phase coherence in the relevant frequencies.

To test this we generated a signal Sᵢ = Wᵢ + Yᵢ, where Wᵢ is white noise and Yᵢ are short phase-coherent data segments of 150 ms (phases varied in a range from 0 to π for each trial; t = [126, 279] ms, [350, 500] ms, [576, 726] ms, and [800, 950] ms). The frequencies in the first segment are 10, 20, and 30 Hz (QFC); the single frequencies in the following three segments are 20, 30, and 24 Hz respectively. Twenty one-second data segments (i = 1, 2, . . . , 20) at a sampling rate of 500 Hz were generated to simulate data from repeated trials in an experiment.

Fig. 5(a) shows the mean signal of these 20 trials. Sliding windows of 100 ms (50 data points) with a window shift of 10 ms (5 data points) were used to compute the CTS in Fig. 5(c). This CTS was compared with traditional Morlet wavelet analysis (Mallat (2008)) in Fig. 5(b).

In a companion paper (Lainscsek et al. (2014)) we apply this method to EEG data.

5.2 No phase coherence across trials

The second cross-trial method does not assume phase coherence across trials and is defined by computing Lᵢ(Ω) in Eq. (18) for each trial using sliding windows and then averaging over the 20 spectra Lᵢ(Ω) (i = 1, 2, . . . , 20).
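The per-trial averaging can be sketched for a single 100 ms window; the 20 simulated trials (a 20 Hz tone with a random phase per trial plus noise) and all other parameters are illustrative assumptions:

```python
import numpy as np

# Phase-independent CTS: compute L_i(Omega) of Eq. (18) per trial and
# average, so the trials need not share phases.
def tds_window(s, t, freqs, n_phases=16):
    s2 = np.mean(s**2)
    phis = np.linspace(0, 2*np.pi, n_phases, endpoint=False)
    return np.array([max(np.mean((s + np.cos(2*np.pi*f*t + p))**2) - s2 - 0.5
                         for p in phis) for f in freqs])

fs = 500.0
t = np.arange(50) / fs                        # one 100 ms window (50 samples)
freqs = np.arange(5, 45)
rng = np.random.default_rng(1)

trials = [np.cos(2*np.pi*20*t + rng.uniform(0, 2*np.pi))   # random phase
          + 0.3*rng.standard_normal(t.size) for _ in range(20)]

cts = np.mean([tds_window(s, t, freqs) for s in trials], axis=0)
print(freqs[int(np.argmax(cts))])             # peak near 20 Hz
```

Averaging the per-trial spectra is what removes the phase-coherence requirement; the short 100 ms window makes the peak a few Hz broad, the resolution trade-off visible in Figs. 5 and 6.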

Fig. 5(c) shows the CTS assuming phase coherence in the signal, and 5(d) shows the phase-independent CTS. All three methods identified the frequencies correctly


and both CTS methods produce similar results (the correlation coefficient between Figs. 5(c) and 5(d) is 0.96).

The same experiment was repeated for completely phase-randomized data segments Yᵢ. Fig. 6(a) shows the mean of the data, 6(b) the traditional Morlet wavelet analysis, 6(c) the CTS assuming phase coherence, and 6(d) the phase-independent CTS. Only the phase-independent CTS can identify the frequencies

Figure 5: Cross-trial spectrogram (CTS) of phase-coherent simulated data with white noise. Four segments of noisy short signals of 10, 20, 24, and 30 Hz with some phase coherence were added to the data. Horizontal white dashed lines indicate frequencies present in each segment. (a) Mean over the 20 trials, (b) wavelet spectrogram, (c) phase-dependent CTS, (d) phase-independent CTS, (e) bispectrogram CTB, and (f) bispectral detector |⟨S³⟩|.


Figure 6: Cross-trial spectrogram (CTS) of non-phase-coherent simulated data with white noise. Four segments of noisy short signals of 10, 20, 24, and 30 Hz with no phase coherence were added to the data. (a) Mean over the 20 trials, (b) wavelet spectrogram, (c) phase-dependent CTS, (d) phase-independent CTS, (e) bispectrogram CTB, and (f) bispectral detector |⟨S³⟩|.

correctly. The correlation coefficient between Figs. 6(c) and 6(d) drops to 0.15. Therefore the correlation coefficient between the phase-dependent CTS and the phase-independent CTS can be used to quantify the amount of phase coherence.

If multiple channels are available, data from different channels and/or different trials can be combined. Such cross-trial spectra are different from the cross-spectral density (Bracewell (1965); Papoulis (1962)) based on the cross-correlation between


two signals.

Since the TDB in Eq. (28) is a function of a single frequency (a one-dimensional plot instead of a two-dimensional plot for the traditional bispectrum), the TDB bispectrogram can be obtained in the same way as a spectrogram. The single-trial, cross-trial and/or cross-channel TDS (or spectrogram) can be extended to a single-trial, cross-trial and/or cross-channel bispectrogram (CTB) by replacing L(Ω) with B(Ω). The bispectral detector ⟨S³⟩ in Figs. 5(f) and 6(f) indicates the presence of QFC in the first data segment and the CTB in Figs. 5(e) and 6(e) shows couplings in the first data segment with the three coupled frequencies 10, 20, and 30 Hz (10+20=30) present. There is no frequency coupling in the other data segments.


6 Discussion

Delay differential analysis is a non-linear dynamical time series classification and detection tool based on embedding theory. It combines aspects of delay and differential embeddings in the form of delay differential equations (DDEs), which can detect distinguishing dynamical information in data. We focused here on the connection between traditional frequency analysis and linear DDEs and between higher-order statistics and non-linear DDEs.

We introduced a new set of time-domain tools to analyze the frequency content and frequency coupling in signals: the time-domain spectrum (TDS), the time-domain bispectrum (TDB), the time-domain cross-trial (or cross-channel) spectrogram (CTS), and the time-domain cross-trial (or cross-channel) bispectrogram (CTB). In addition, the CTS has a phase-dependent and a phase-independent realization.

These new time-domain methods are complementary to traditional frequency-domain methods and have several advantages:

1. Because these new DDA spectral methods use time itself as a variable, they can be applied cross-trial and on sparse data.

2. DDA can be applied to short data segments and therefore has an improved time resolution compared with the fast Fourier transform, which was developed to be computationally fast on longer time series.

3. Because some numerical components of 1/f noise are not present, the new time-domain spectrogram introduced here does not need normalization across frequencies. The sources of 1/f noise will be discussed elsewhere.

4. DDA is noise insensitive (Lainscsek et al. (2013b)): short data segments from the Rössler system with a signal-to-noise ratio between 10 dB and −5 dB (more noise than data) were well separated. This separation was also correlated to the dynamical bifurcation parameter and generalized well to new data.

5. DDA can discriminate dynamical differences in data classes using a small set of features, thus avoiding overfitting. With DDA, only 5 features are sufficient for many applications, whereas traditional methods often rely on many more features. The goal is not to achieve the most predictive model, but the most discriminative model.

6. Higher-order nonlinearities are intrinsically included in DDA, in contrast with non-linear extensions of spectral analysis, which are complicated and not straightforward.

There are many applications of DDA, including classification of electrocardiograms (Lainscsek and Sejnowski (2013)), electroencephalograms (Lainscsek et al. (2013a,b)), speech (Gorodnitsky and Lainscsek (2004)), and sonar time series (Lainscsek and Gorodnitsky (2003)).


Acknowledgements

The authors would like to acknowledge support by the Howard Hughes Medical Institute, NIH (grant NS040522), the Office of Naval Research (N00014-12-1-0299 and N00014-13-1-0205), and the Swartz Foundation. We would also like to thank Manuel Hernandez, Jonathan Weyhenmeyer, and Howard Poizner for valuable discussions and Xin Wang for helping with the wavelet analysis.

References

Boashash, B. (1995). Higher-Order Statistical Signal Processing. John Wiley & Sons.

Boersma, P. (1993). Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. In Institute of Phonetic Sciences, University of Amsterdam, Proceedings 17, pages 97–110.

Bracewell, R. (1965). The Fourier Transform and Its Applications, chapter Pentagram Notation for Cross Correlation, pages 46 and 243. New York: McGraw-Hill.

Brillinger, D. and Rosenblatt, M. (1967). Asymptotic theory of estimates of k-th order spectra. In Harris, B., editor, Spectral Analysis of Time Series, pages 153–188. Wiley, New York.

Chan, Y. and Langford, R. (1982). Spectral estimation via the high-order Yule-Walker equations. Acoustics, Speech and Signal Processing, IEEE Transactions on, 30(5):689–698.

Davis, H. and Davis, P. (1936). Action potentials of the brain: In normal persons and in normal states of cerebral activity. Archives of Neurology & Psychiatry, 36(6):1214–1224.

Davis, P. A. (1939). Effects of acoustic stimuli on the waking human brain. Journal of Neurophysiology, 2(6):494–499.

Fackrell, J. and McLaughlin, S. (1995). Quadratic phase coupling detection using higher order statistics. In Proceedings of the IEE Colloquium on Higher Order Statistics, page 9.

Falbo, C. E. (1995). Analytic and numerical solutions to the delay differential equation y′(t) = αy(t − δ). In Joint meeting of the Northern and Southern California sections of the MAA, San Luis Obispo, CA.

Fourier, J. (1822). Théorie Analytique de la Chaleur. Chez Firmin Didot, père et fils.


Goertzel, G. (1958). An algorithm for the evaluation of finite trigonometric series. American Mathematical Monthly, 65(1):34.

Gorodnitsky, I. and Lainscsek, C. (2004). Machine emotional intelligence: A novel method for spoken affect analysis. In Proc. Intern. Conf. on Development and Learning ICDL 2004.

Hjorth, B. (1970). EEG analysis based on time domain properties. Electroencephalography and Clinical Neurophysiology, 29(3):306–310.

Jacobsen, E. and Lyons, R. (2003). The sliding DFT. Signal Processing Magazine, IEEE, 20(2):74–80.

Kadtke, J. and Kremliovsky, M. (1999). Estimating dynamical models using generalized correlation functions. Physics Letters A, 260(3-4):203.

Kolmogorov, A. and Rozanov, Y. (1960). On strong mixing conditions for Gaussian processes. Theor. Prob. Appl., 5:204–208.

Lainscsek, C. (1999). Identification and Global Modeling of Nonlinear Dynamical Systems. PhD thesis, Technische Universität Graz.

Lainscsek, C. (2011). Nonuniqueness of global modeling and time scaling. Phys. Rev. E, 84:046205.

Lainscsek, C. and Gorodnitsky, I. (2003). Characterization of various fluids in cylinders from dolphin sonar data in the interval domain. In Oceans 2003: Celebrating the Past...Teaming Toward the Future, volume 2, pages 629–632. Marine Technology Society/IEEE.

Lainscsek, C., Hernandez, M. E., Poizner, H., and Sejnowski, T. J. (2014). Delay differential analysis of electroencephalographic data. Neural Computation, in press.

Lainscsek, C., Hernandez, M. E., Weyhenmeyer, J., Sejnowski, T. J., and Poizner, H. (2013a). Non-linear dynamical analysis of EEG time series distinguishes patients with Parkinson’s disease from healthy individuals. Frontiers in Neurology, 4(200).

Lainscsek, C., Letellier, C., and Gorodnitsky, I. (2003). Global modeling of the Rössler system from the z-variable. Physics Letters A, 314:409–427.

Lainscsek, C., Letellier, C., and Schürrer, F. (2001). Ansatz library for global modeling using a structure selection. Physical Review E, 64:016206.

Lainscsek, C. and Sejnowski, T. (2013). Electrocardiogram classification using delay differential equations. Chaos, 23(2):023132.

Lainscsek, C., Weyhenmeyer, J., Hernandez, M., Poizner, H., and Sejnowski, T. (2013b). Non-linear dynamical classification of short time series of the Rössler system in high noise regimes. Frontiers in Neurology, 4(182).


Leonov, V. and Shiryaev, A. (1959). On a method of calculation of semi-invariants. Theor. Prob. Appl., 4:319–329.

Luck, S. (2014). An Introduction to the Event-Related Potential Technique. MIT Press, 2nd edition.

Mallat, S. (2008). A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way. Academic Press, 3rd edition.

Mendel, J. (1991). Tutorial on higher-order statistics (spectra) in signal processing and system theory: Theoretical results and some applications. In Proc. IEEE, volume 79, pages 278–305.

Nikias, C. and Raghuveer, M. (1987). Bispectrum estimation: A digital signal processing framework. In Proc. IEEE, volume 75, pages 869–891.

Nyquist, H. (1928). Certain topics in telegraph transmission theory. Trans. AIEE, 47:617–644.

Packard, N. H., Crutchfield, J. P., Farmer, J. D., and Shaw, R. S. (1980). Geometry from a time series. Phys. Rev. Lett., 45:712.

Papoulis, A. (1962). The Fourier Integral and Its Applications, pages 244–245 and 252–253. New York: McGraw-Hill.

Raghuveer, M. and Nikias, C. (1985). Bispectrum estimation: A parametric approach. Acoustics, Speech and Signal Processing, IEEE Transactions on, 33(5):1213–1230.

Raghuveer, M. R. and Nikias, C. L. (1986). Bispectrum estimation via AR modeling. Signal Processing, 10(1):35–48.

Roads, C. (1996). Computer Music Tutorial. MIT Press.

Rosenblatt, M. and Van Ness, J. (1965). Estimation of the bispectrum. The Annals of Mathematical Statistics, 36:1120–1136.

Rössler, O. E. (1976). An equation for continuous chaos. Physics Letters A, 57:397.

Sauer, T., Yorke, J. A., and Casdagli, M. (1991). Embedology. Journal of Statistical Physics, 65:579.

Stankovic, L. (1994). A method for time-frequency analysis. Signal Processing, IEEE Transactions on, 42(1):225–229.

Sutton, S., Braren, M., Zubin, J., and John, E. R. (1965). Evoked-potential correlates of stimulus uncertainty. Science, 150(3700):1187–1188.

Swami, A. (2003). HOSA - Higher Order Spectral Analysis Toolbox.


Takens, F. (1981). Detecting strange attractors in turbulence. In Rand, D. A. and Young, L.-S., editors, Dynamical Systems and Turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 366–381. Springer Berlin/Heidelberg.

Tukey, J. (1953). The spectral representation and transformation properties of the higher moments of stationary time series. In Brillinger, D., editor, The Collected Works of John W. Tukey, volume 1, pages 165–184. Wadsworth, Belmont.

Walker, G. (1931). On periodicity in series of related terms. Proceedings of the Royal Society of London, Ser. A, 131:518.

Whitney, H. (1936). Differentiable manifolds. Ann. Math., 37:645–680.

Yule, G. (1927). On a method of investigating periodicities in disturbed series, with special reference to Wolfer’s sunspot numbers. Philosophical Transactions of the Royal Society of London, Ser. A, 226:267.
