Physica D 223 (2006) 54–68  www.elsevier.com/locate/physd

Nonlinear dynamical system identification with dynamic noise and observational noise

Tomomichi Nakamura*, Michael Small

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Received 27 September 2004; received in revised form 9 June 2006; accepted 16 August 2006. Available online 18 September 2006.

Communicated by A. Stuart

Abstract

In this paper we consider the problem of whether a nonlinear system has dynamic noise, and then estimate the level of dynamic noise to add to any model we build. The method we propose relies on a nonlinear model and a recently proposed improved least squares method, on the assumption that observational noise is not large. We do not need any a priori knowledge of the systems to be considered, and we can apply the method to both maps and flows. We demonstrate with applications to artificial and experimental data. The results indicate that the proposed method can detect the presence or absence of dynamic noise from a scalar time series and, in some cases, give a reliable level of dynamic noise to add to the model built.

© 2006 Elsevier B.V. All rights reserved.

Keywords: Description length; Dynamic noise; Least squares method; Nonlinear time series modelling

1. Introduction

In the real world, systems are not always isolated from their surroundings. That is, the observed system may have two components in its dynamics: a system (the deterministic part) and an input from the surroundings (high-dimensional unexplained dynamics, which we call dynamic noise). Also, nonlinear systems abound in nature, and phenomena can often be understood as due to nonlinear dynamics [22,31,17]. To gain a profound understanding of a nonlinear system and to identify the system, it is very important to know whether the system has dynamic noise. If we could know whether the system has dynamic noise, it would be a good opportunity to consider what the dynamic noise is and why it is necessary for the system. In this paper, we consider the problem of detecting whether a nonlinear system (deterministic part) has dynamic noise (stochastic part).

For investigating nonlinearity (nonlinear determinism) in time series, the method of surrogate data has been proposed [1,2] and is often applied to real-world time series [7,8].

* Corresponding author. E-mail addresses: [email protected] (T. Nakamura), [email protected] (M. Small).

We call such techniques linear surrogate methods, because they are based on a linear process and address a linear null hypothesis [3–6]. With this method, we can consider whether the time series is generated by a linear system or a nonlinear system. Although this discrimination is very useful for understanding the system, it is still not enough, because the result obtained using the method of surrogate data cannot determine whether the nonlinear system has dynamic noise. The reason is that nonlinear systems exhibit nonlinearity irrespective of whether the system has dynamic noise or not.

Another approach for tackling nonlinear time series is nonlinear modelling [10,8,11,23,24]. The aim of building a model is, given a finite time series contaminated by observational noise, to find an approximation of the true dynamics. Time series modelling is very useful for prediction of time series, for helping to understand behaviour, and for gaining insight into the dynamics. A particularly convenient and general class of nonlinear models is the pseudo-linear models, which are linear combinations of nonlinear functions [10,11]. Hence, the models we build in this paper are all pseudo-linear models.

The modelling procedure we use has three components: a suitable hierarchical class of nonlinear models; methods to select optimal models of a prescribed order from the model class; and criteria which determine when a model is sufficiently accurate, neither over-fitting nor under-fitting the time series [10,11]. When building models for real-world time series, we usually expect that the system has dynamic noise, because systems in the real world are not always isolated from their surroundings. The free-run behaviour of a model driven with some amount of dynamic noise often shows qualitative behaviour very similar to the training data. However, were it not for dynamic noise in the model, the behaviour would often be periodic [10]. Hence, we expect that the system has dynamic noise. However, we do not know the appropriate level of dynamic noise to add to the model; as is easily expected, either too much or too little dynamic noise is likely to lead to inappropriate free-run behaviour. Also, the model we can build is only an approximation of the true dynamics, and the data available are not perfect: the length and numerical accuracy of the time series available to us are limited, and the data are contaminated by observational noise. Furthermore, we cannot know whether the system has dynamic noise only by observing the time series. Hence, we do not know whether our expectation is correct.

0167-2789/$ - see front matter © 2006 Elsevier B.V. All rights reserved. doi:10.1016/j.physd.2006.08.013

There are some works concerning dynamic noise; for example, see [14,15]. However, the object of these studies is a map, and one must first know the perfect model. In our approach, we do not need any a priori knowledge of the systems to be considered, and we can apply the method to both maps and flows. However, we assume that the observational noise is not large and that the dynamic noise is significant for the system.¹ The key feature of the method is its combination of pseudo-linear models and the recently proposed noisy least squares (NLS) method [26].

In our approach to detecting whether a system has dynamic noise, a modelling technique is applied. For building pseudo-linear models, we perform parameter estimation and model selection. For parameter estimation the least squares (LS) method is applied, and for model selection an information criterion is applied. Hence, in the next section, we first review the LS method and description length. Next, using polynomial models including the correct model, we show differences in the models selected as the best model when the system has dynamic noise. As a more practical example, where there is no correct model, we also build models using radial basis functions for two artificial data sets, where one system has dynamic noise and the other does not. Finally, we build models for two real-world time series, a laser time series and heart rate variability data. We find that the idea proposed in this paper can detect the presence or absence of dynamic noise in systems and provide a good estimate of the dynamic noise level to produce good simulations.

2. The least squares method and information criteria for modelling

For building pseudo-linear models, we perform parameter estimation and model selection [10,11]. The least squares (LS) method is a vital technique for building models of a nonlinear dynamical system given a finite time series contaminated by observational noise [40]. These models typically have a large number of parameters, and these must be tuned via parameter estimation. A commonly used method of estimating the parameters is least squares. After parameter estimation, we can calculate the fitting error (prediction error) and appropriate information criteria for model selection. That is, these operations are strongly interconnected with modelling. Hence, the LS method is very important [26,27].

¹ In a computer simulation, round-off errors should play the role of very small dynamic noise. However, it is usually considered that this influence can be ignored or is not significant for the system.

In Section 2.1 we describe how the LS method is applied in practice, its drawbacks, and how the LS method should be applied. In Section 2.2 we describe the purpose of using information criteria, and the description length which we adopt as an information criterion.

2.1. The least squares method

We now consider the problem of estimating the parameters θ ∈ R^k of a model x_{t+1} = f(x_t, θ), x_t ∈ R^d, of a nonlinear deterministic dynamical system, given only a finite time series of observations s_t contaminated by observational noise η_t (that is, s_t = x_t + η_t), where η_t is Gaussian with zero mean and fixed variance, and the data comprise a set of n scalar measurements.

The common and practically used LS method solves the optimization problem

min_θ Σ_{t=1}^{n−1} ‖ s_{t+1} − f(s_t, θ) ‖²,   (1)

where only the noisy observations s_t are used for the fitting. This is a maximum likelihood method, which makes the assumption that the noise is Gaussian and independent. Even when these assumptions hold and the noise is white, it is known that LS is biased in the presence of observational noise. White observational noise at the output becomes coloured regression noise in the regression equation, which violates the assumptions of LS [28,40].

These parameter estimates would be much less biased if we could solve the optimization problem

min_θ Σ_{t=1}^{n−1} ‖ s_{t+1} − f(x_t, θ) ‖²,   (2)

where x_t is the true state (noise-free data) at time t. But of course we cannot know x_t. So in Eq. (1), s_t is used as a proxy for the noise-free data x_t. This is clearly not a good thing to do, because s_t is corrupted by noise [28,40].
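The attenuation bias caused by noisy regressors in Eq. (1) can be seen in a few lines of simulation. The following Python sketch (our own illustration, not from the paper; the linear system and noise levels are chosen purely for demonstration) fits the slope of a noise-driven linear map by least squares, once against the true states as in Eq. (2) and once against the noisy observations as in Eq. (1):

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.9                       # true parameter of the toy linear map
n = 100_000
x = np.zeros(n)
for t in range(n - 1):        # linear system driven by small dynamic noise
    x[t + 1] = a * x[t] + rng.normal(0.0, 0.1)
s = x + rng.normal(0.0, 0.2, n)   # noisy observations (hypothetical level)

# Eq. (2): regress s_{t+1} on the true states x_t -> nearly unbiased
a_eq2 = np.sum(s[1:] * x[:-1]) / np.sum(x[:-1] ** 2)
# Eq. (1): regress s_{t+1} on the noisy states s_t -> biased towards zero
a_eq1 = np.sum(s[1:] * s[:-1]) / np.sum(s[:-1] ** 2)

print(a_eq2, a_eq1)
```

Here the Eq. (1) estimate is attenuated by roughly the factor var(x)/(var(x) + var(η)) ≈ 0.57, while the Eq. (2) estimate recovers a ≈ 0.9; this is one face of the coloured regression noise problem.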

When using Eq. (1) for nonlinear models, the fitting error (in other words, prediction error) is usually not the same as the observational noise and is also not normally distributed, even if the observational noise is Gaussian. However, when using Eq. (2), the fitting error is almost identical to the observational noise and is normally distributed [27]. It should be noted that the normal distribution is very important, because we usually rely on the assumption that the fitting error is Gaussian when using the LS method and when deriving the information criteria formulae. For more details see [36,20,10]. We consider that the usage of Eq. (1) is "inappropriate" and that of Eq. (2) is "appropriate" as the LS method.

Fig. 1. Information criterion as a function of model size.

For finding the best model among many, information criteria are usually employed. Hence, we briefly review the basic concept of information criteria and Rissanen's description length as modified by Judd and Mees (DL_γ) [10], which we adopt here. Then, we show a significant example: when using Eq. (1), applying DL_γ cannot find the correct model as the best model, even with a perfect model class and even if the noise level is low.

2.2. Information criteria

For building pseudo-linear models, the model size² is important. The time series available to us are imperfect and insufficient, because they are usually contaminated by observational noise and their numerical accuracy is also limited. Using these unfavourable data, we have to build models, and we hope the models reflect the underlying system with the influence of noise removed as much as possible. Hence, the model should be fitted to the data neither too closely nor too poorly; these failures are called over-fitting and under-fitting, respectively. To find the model that best balances model error against model size, so as to prevent over-fitting or under-fitting of the data, information criteria are often applied. Fig. 1 shows a rough sketch of the relationship between model size and fitting error (prediction error) for a generic information criterion. The minimum of the information criterion is considered to correspond to the best (optimal) model size, and the smaller the value, the better the model [12,27].

Another important reason for using information criteria is to avoid an unnecessary increase in the model size, which occurs when a model is built which models a nested, self-iterated, form of the original system. We refer to this kind of model as degenerate [13,26]. Although such a model is essentially identical to the original model, its size is larger, and the model size increases indefinitely. Hence, it is important to remove the nesting effect and determine the smallest model size which can substantially model the system; otherwise such models are clearly over-fitting [13,26].

² For pseudo-linear models, the model size refers to the number of basis functions, which is the same as the number of parameters of the model, because the only parameters used to fit the data are the linear weights.
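A minimal illustration of a degenerate model, using the logistic map as a stand-in system (this example is ours, not the paper's): composing the map with itself gives a larger polynomial model that describes exactly the same dynamics at a longer lag.

```python
import numpy as np

# Logistic map f(x) = 4x(1 - x): a pseudo-linear model with two
# non-trivial basis functions (x and x^2).
f = np.polynomial.Polynomial([0.0, 4.0, -4.0])

# Self-iterated ("nested") form: predicts x_{t+1} from x_{t-1} via f(f(.)).
# Essentially the same model, but now a degree-4 polynomial.
ff = f(f)

# Both describe exactly the same orbit:
x = [0.3]
for _ in range(10):
    x.append(float(f(x[-1])))
x = np.array(x)
assert np.allclose(f(x[:-1])[1:], x[2:])   # small model, lag 1
assert np.allclose(ff(x[:-2]), x[2:])      # larger degenerate model, lag 2
print(ff.degree())
```

The degenerate model fits the data just as well, so without a size penalty nothing stops a selection procedure from preferring it; iterating again would inflate the size further without adding any dynamics.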

Some information criteria have already been proposed for these purposes. The best known is the Akaike Information Criterion (AIC) [20], but it is known to fail to provide statistically consistent estimates [21]. Rissanen's description length as modified by Judd and Mees (DL_γ) has proven to be effective in modelling nonlinear dynamics [8–11], and it involves fewer approximations than other information criteria [10]. Hence, DL_γ is possibly more reliable [21,10]. It is this criterion we use for determining the best (optimal) model. We note that other information criteria (the Schwarz Information Criterion (SIC) [37] and Normalized Maximum Likelihood (NML) [38]) give either the same, or very similar, results [27].

The minimum description length (MDL) principle states that the best model is the one that attains the greatest compression. Under the assumption that the fitting error (prediction error) is normally distributed, Judd and Mees showed that the description length DL_γ can be approximated by

DL_γ(k) = ( n/2 − 1 ) ln( eᵀe / n ) + (k + 1)( 1/2 + ln γ ) − Σ_{i=1}^{k} ln λ_i,   (3)

where k is the model size, γ is related to the scale of the data (see below), e is the vector of prediction errors, and the variables λ_i can be interpreted as the relative precision to which the parameters θ are specified. For more details see [10,11].

A more precise interpretation of γ is that it is the exponent in a floating point representation of the parameters, scaled relative to some fixed amount, and it is supposed that 1 ≤ γ ≤ 32. There are good arguments that γ should be 1, but in practice larger values seem to be necessary to prevent over-fitting [12,25]. We use γ = 32 (that is, DL_32) throughout this paper.

Here, we note that we do not use DL_γ as a strict criterion for model selection; rather, we use it as a method for screening alternative formulations in order to produce a set of candidate models for further consideration.
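To make the trade-off in Eq. (3) concrete, here is a rough Python sketch (ours, not the authors' code) of a description-length-style criterion balancing fitting error against model size. For simplicity it keeps only the fit term and the parameter-cost term of Eq. (3), i.e. it assumes λ_i = 1 so that the precision term vanishes; the data set and polynomial dictionary are hypothetical.

```python
import numpy as np

def dl_gamma(e, k, n, gamma=32.0):
    """Simplified sketch of Eq. (3): the (n/2 - 1) ln(e'e/n) fit term plus
    the (k + 1)(1/2 + ln gamma) parameter cost.  The -sum(ln lambda_i)
    precision term is dropped (i.e. we assume lambda_i = 1)."""
    return (n / 2 - 1) * np.log(e @ e / n) + (k + 1) * (0.5 + np.log(gamma))

# Hypothetical use: fit polynomials of growing size to a noisy quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y = 1.0 - 1.4 * x**2 + rng.normal(0.0, 0.01, x.size)

scores = {}
for k in range(1, 8):   # k = model size = number of polynomial coefficients
    coef = np.polynomial.polynomial.polyfit(x, y, k - 1)
    e = y - np.polynomial.polynomial.polyval(x, coef)
    scores[k] = dl_gamma(e, k, x.size)

best = min(scores, key=scores.get)
print(best, scores[best])
```

The fit term drops steeply once the quadratic basis function enters (k = 3); beyond that, the parameter cost dominates and larger models are penalized, giving the U-shape sketched in Fig. 1.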

3. Degeneracy and model selection

To investigate whether the correct model is selected as the best model, we use a simple, well-known polynomial model, the Hénon map [16], written as a nonlinear AR model, and we consider fitting multivariate polynomial models. This model is the true and unique correct model by our definition. We note that, by the "best model", we mean the model that has the smallest value of DL_32. We expect DL_32 to become smallest when the model built is the correct model.

It should be noted that selecting the optimal subset of basis functions is typically an NP-hard problem [10]. Although we usually apply a selection method for choosing basis functions, the models obtained are affected by the selection method, and there is no guarantee of finding the optimal subset. That is, there is a possibility that a model obtained using one of the various selection methods is not the truly best model. Hence, to obtain the truly best model, we calculate all possible combinations (in other words, an exhaustive search). However, it should be noted that such a procedure is only practical for fairly small dictionaries.
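An exhaustive search of this kind can be sketched in a few lines. The code below is our own illustration under stated assumptions: a simplified description length with the λ_i term of Eq. (3) dropped, a deliberately small six-term dictionary (2⁶ − 1 = 63 subsets), and noise levels chosen only for demonstration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = np.zeros(n)
for t in range(2, n):   # Henon-like map with weak dynamic noise (illustrative)
    x[t] = 1.0 + 0.3 * x[t - 2] - 1.4 * x[t - 1] ** 2 + rng.normal(0.0, 0.006)

# A deliberately small dictionary of candidate basis functions.
X1, X2 = x[1:-1], x[:-2]          # x_{t-1} and x_{t-2}
target = x[2:]                    # x_t
dictionary = {
    "1": np.ones_like(target), "x1": X1, "x2": X2,
    "x1^2": X1 ** 2, "x2^2": X2 ** 2, "x1*x2": X1 * X2,
}

def dl(e, k, n, gamma=32.0):      # simplified description length, cf. Eq. (3)
    return (n / 2 - 1) * np.log(e @ e / n) + (k + 1) * (0.5 + np.log(gamma))

best_score, best_subset = np.inf, ()
for k in range(1, len(dictionary) + 1):
    for subset in itertools.combinations(dictionary, k):  # exhaustive search
        A = np.column_stack([dictionary[name] for name in subset])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        score = dl(target - A @ coef, k, target.size)
        if score < best_score:
            best_score, best_subset = score, subset

print(best_subset)
```

With 63 subsets the search is trivial, but the cost doubles with every dictionary entry, which is why an exhaustive search is practical only for small dictionaries.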

The Hénon map [16] is given by

x_t = A_0 + A_2 x_{t−2} + A_4 x_{t−1}² + ε_t,
s_t = x_t + η_t,   (4)

where A_0 = 1.0, A_2 = 0.3 and A_4 = −1.4, ε_t is Gaussian dynamic noise and η_t is Gaussian observational noise. We use s_t as the observational data and will use time delay linear polynomial models. Choosing lag = 3 and degree = 3 gives 20 candidate basis functions³ in the dictionary.

One data set is contaminated by 40 dB observational noise and the system does not have dynamic noise; the other is contaminated by 60 dB observational noise and the system has Gaussian dynamic noise with standard deviation 0.006. We also investigate larger observational noise, 40 dB, where the system has the same level of dynamic noise as above. The number of data points used in all cases is 1000 and 10 000.
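The dB figures can be made concrete with a small script. In the sketch below (our own illustration; we read "x dB observational noise" as a signal-to-noise ratio, so that σ_noise = σ_signal · 10^(−x/20) — an assumption, since the convention is not spelled out here), we generate Eq. (4) without dynamic noise and compute the observational noise level corresponding to 40 dB:

```python
import numpy as np

def henon_series(n, sigma_dyn, rng):
    """Iterate Eq. (4): x_t = 1.0 + 0.3 x_{t-2} - 1.4 x_{t-1}^2 + eps_t."""
    x = np.zeros(n + 100)
    for t in range(2, x.size):
        x[t] = (1.0 + 0.3 * x[t - 2] - 1.4 * x[t - 1] ** 2
                + rng.normal(0.0, sigma_dyn))
    return x[100:]                    # discard the initial transient

def add_obs_noise(x, snr_db, rng):
    """Additive Gaussian observational noise at a given SNR in dB."""
    sigma = np.std(x) * 10.0 ** (-snr_db / 20.0)
    return x + rng.normal(0.0, sigma, x.size), sigma

rng = np.random.default_rng(3)
x = henon_series(1000, 0.0, rng)      # no dynamic noise, as in case (a)
s40, sigma40 = add_obs_noise(x, 40.0, rng)
print(sigma40)   # on the order of the 0.00721 quoted in the text
```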

Table 1 shows the model size of the best model selected. When the system does not have dynamic noise and the level of observational noise is 40 dB (case (a) in Table 1), DL_32 (the description length with γ = 32) is not smallest when the model size is 3 (the correct model size). However, we note that the correct model is selected at the correct model size. That is, the correct model is not the best model. The best model is, in fact, a very good but degenerate approximation to the correct model. We can find such good but degenerate approximations at sizes 6, 8 and 11.⁴ It has been shown elsewhere that these models can be reduced to essentially the correct model [13,26]. When we investigate data contaminated by lower observational noise, 60 dB and 80 dB, the same phenomenon appears.

When the system has dynamic noise and the level of observational noise is 60 dB (case (b) in Table 1), DL_32 is smallest when the model size is 3 (the correct model size). That is, the best model is the correct model.

When the system has dynamic noise and the level of observational noise is 40 dB (case (c) in Table 1), although the correct model is selected at the correct model size 3, the best model for both data lengths is not the correct model; the best models are degenerate. This is the same phenomenon as when the system does not have dynamic noise.

In this example, the standard deviation of the Gaussian dynamic noise is 0.006, and the standard deviations of the 40 dB and 60 dB Gaussian observational noise are about 0.00721 and 0.00072 respectively. These results imply that, broadly speaking, when the system has dynamic noise and the dynamic noise is larger than the observational noise, the model selected as the best model will be the correct model. Otherwise, when the system has

³ The basis functions corresponding to the parameters are, A0: constant, A1: x_{t−1}, A2: x_{t−2}, A3: x_{t−3}, A4: x_{t−1}², A5: x_{t−2}², A6: x_{t−3}², A7: x_{t−1}³, A8: x_{t−2}³, A9: x_{t−3}³, A10: x_{t−1}x_{t−2}, A11: x_{t−1}x_{t−3}, A12: x_{t−2}x_{t−3}, A13: x_{t−1}²x_{t−2}, A14: x_{t−1}²x_{t−3}, A15: x_{t−2}²x_{t−1}, A16: x_{t−2}²x_{t−3}, A17: x_{t−3}²x_{t−1}, A18: x_{t−3}²x_{t−2} and A19: x_{t−1}x_{t−2}x_{t−3}.
⁴ We have confirmed that the results using other information criteria, such as SIC [37] and NML [38], are essentially the same.

Table 1
The size of the best model for the Hénon map

Data number   Case (a)   Case (b)   Case (c)
1 000         8          3          6
10 000        8          3          8

Case (a): the system does not have dynamic noise and the data are contaminated by 40 dB observational noise. Case (b): the system has dynamic noise and the data are contaminated by 60 dB observational noise. Case (c): the system has dynamic noise and the data are contaminated by 40 dB observational noise.

dynamic noise, the best model will tend to over-fit and will not be the correct model.

As mentioned before, one of the important reasons for using information criteria is to avoid unnecessary increases in model size, in which a model is built that reflects a nested, that is, self-iterated, form of the original system. However, the result shows that we cannot avoid an unnecessary increase in the model size and cannot select the correct model as the best model, even when the noise level is low and the truly best model is obtained by calculating all possible combinations. Also, the size of the model selected as the best model is much larger than that of the correct model. This indicates that the best model is the result of some kind of "over-fitting". For more details concerning degenerate models see [13,26].

To overcome this problem and select the correct model as the best model, Nakamura and Small have proposed the noisy least squares (NLS) method, an approach for using the LS method more appropriately in practical situations [26]. With this method we can also take advantage of information criteria more effectively and generally avoid over-fitting. In the next section, we review the NLS method.

4. The noisy least squares method: using the least squares method more appropriately

Nakamura and Small have proposed an idea for using the LS method more appropriately (as in Eq. (2)) without using the true state, in practical situations [26]. As is well known, when the noise level is low, the attractor reconstructed from the noisy data is very similar to that reconstructed from the noise-free data, and the estimated parameters are almost the same as the correct values. These facts indicate that the noisy data can be regarded as a good proxy for the true state when the noise level is low. From this, we expect that we can achieve a proxy for Eq. (2) using only noisy data, provided the noise in s_t is low compared to that of s_{t+1} in Eq. (1). Hence, we propose the addition of larger Gaussian noise to the s_{t+1} term in Eq. (1).

Let the additional Gaussian noise be η′_t, and let s′_{t+1} = s_{t+1} + η′_{t+1}. Then we obtain the new optimization problem

min_θ Σ_{t=1}^{n−1} ‖ s′_{t+1} − f(s_t, θ) ‖².   (5)

In Eq. (2), the s_{t+1} term has more noise than x_t. Hence, when the level of the additional noise is large enough relative to the noise included in the original noisy data s_t, we expect that Eq. (5) can be a good approximation to Eq. (2). We refer to this method as the "noisy least squares" (NLS) method [26].
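As a quick numerical check that the added noise in Eq. (5) does not destroy the parameter estimates, the sketch below (our own toy code; the dictionary is restricted to the three correct Hénon terms and the noise levels are hypothetical) refits the model while sweeping the additional noise level. The estimates stay close to (A_0, A_2, A_4) = (1.0, 0.3, −1.4) because the extra noise enters only the regression target, while the residual, which is the quantity an information criterion sees, grows:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = np.zeros(n)
for t in range(2, n):     # Eq. (4) with dynamic noise
    x[t] = 1.0 + 0.3 * x[t - 2] - 1.4 * x[t - 1] ** 2 + rng.normal(0.0, 0.006)
s = x + rng.normal(0.0, 0.0007, n)    # weak observational noise (~60 dB)

# Design matrix for the three correct basis functions: 1, x_{t-2}, x_{t-1}^2.
A = np.column_stack([np.ones(n - 2), s[:-2], s[1:-1] ** 2])
target = s[2:]

for level in (0.0, 0.01, 0.05):       # additional noise of Eq. (5)
    noisy_target = target + rng.normal(0.0, level, target.size)
    coef, *_ = np.linalg.lstsq(A, noisy_target, rcond=None)
    resid = noisy_target - A @ coef
    print(level, np.round(coef, 3), float(np.std(resid)))
```

This is exactly why the NLS method can suppress spurious basis functions without losing the genuine ones: the genuine terms survive the raised noise floor, the spurious ones do not.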


Fig. 2. (Colour online) A plot of the selected basis functions against additional noise levels, where the number of data points used is 1000 ((a), (b) and (c)) and 10 000 ((d), (e) and (f)), and the number in parentheses is the number of parameters selected (the model size) in the model selected as the best model. (a) and (d): without dynamic noise and with 40 dB observational noise; (b) and (e): with dynamic noise and 60 dB observational noise; (c) and (f): with dynamic noise and 40 dB observational noise.

4.1. Application of the NLS method to the case of degenerate models

Consider the Hénon system (used in Section 3) again. We apply the NLS method to system identification, again calculating all possible combinations to obtain the truly best model. We vary the additional noise level in 10 dB steps, from 80 dB down to 0 dB.

Fig. 2 simultaneously shows the basis functions in the models selected as the best model and the additional noise level (the y-axis index corresponds to that of the parameters given in Section 3). Fig. 2(a) and (d) show the results when the system does not have dynamic noise and the data are contaminated by 40 dB observational noise. The figures indicate that the selected basis functions do not change for a while. However, as the noise level becomes larger, the basis functions selected become identical to the basis functions in the correct model. The three basis functions are all correct basis functions; that is, the model is the correct model.

Fig. 2(b) and (e) show the results when the system has dynamic noise and the data are contaminated by 60 dB observational noise. The figure indicates that when the number of data points is 10 000, only the correct basis functions are always selected. That is, the correct model is always selected. When the number of data points is 1000, the selected basis functions do not change until the noise level is 0 dB. From 80 to 10 dB, the three basis functions selected are all correct basis functions; that is, the model is the correct model. However, when the noise level is 0 dB, although the size of the best model is 3, two of the three basis functions selected are correct and the other is not.

Fig. 2(c) and (f) show the results when the system has dynamic noise and the data are contaminated by 40 dB observational noise. The figures indicate that the behaviour is very similar to that seen in Fig. 2(a) and (d).

The results (Fig. 2(a), (b), (d) and (e)) indicate that there is significantly different behaviour when the system has dynamic noise. When the system has dynamic noise, the best model selected by applying the NLS method at each noise level is the same as the best model selected using only the original data. However, when the system does not have dynamic noise, the best model selected by applying the NLS method does change with the noise level, and the basis functions selected become identical to the correct basis functions as the noise level becomes larger.

However, it should be noted that when the observational noise is 40 dB and the system has dynamic noise (that is, the observational noise is larger than the dynamic noise), the behaviour is very similar to when the system does not have dynamic noise; see Fig. 2(c) and (f). Hence, for our idea to detect the presence or absence of dynamic noise, it is important that the level of observational noise be smaller than that of dynamic noise. That is, we want to use data which are not contaminated by much observational noise. However, this requirement is not unusual, because we always want to use clean data for modelling.

4.2. Some observations

Here, we consider the reason why the correct model is selected as the best model without applying the NLS method when the Hénon system, Eq. (4), has dynamic noise. The systems we consider have dynamics described by

x_t = f(x_{t−1}, θ) + ε_t,
s_t = g(x_t) + η_t,   (6)

where x_t ∈ R^d is the state of the system at time t, f is the dynamics (a nonlinear function), θ ∈ R^k are parameters of the system, s_t is our measurement of the system via an observation function g, and ε_t and η_t are independent identically distributed (i.i.d.) random variates representing the dynamic noise and observational noise respectively. We decompose Eq. (6) further when ε_t is not zero. The formula x_t = f(x_{t−1}, θ) + ε_t can then be rewritten as

x′_t = f(x_{t−1}, θ),
x_t = x′_t + ε_t.   (7)

The operation in Eq. (7) is essentially the same as the NLS method, Eq. (5). Here, we take the observation function g in Eq. (6) to be the identity (that is, g(x_t) = x_t) for simplicity. The formula s_t = g(x_t) + η_t can then be written using Eq. (7) as

s_t = x_t + η_t,
s_t = x′_t + ε_t + η_t.   (8)

When the observational noise η_t is smaller than the dynamic noise ε_t, broadly speaking, the features of the observational noise will be buried in those of the dynamic noise. Hence, the correct model would be selected as the best model, as happens when applying the NLS method. However, when the observational noise is larger than the dynamic noise, broadly speaking, the features of the dynamic noise will be buried in those of the observational noise. Hence, the correct model would not be selected as the best model.

The results in Section 3 show that when the system does not have dynamic noise, the best model size is larger than the correct model size. When the system has dynamic noise and the dynamic noise is larger than the observational noise, the best model is the correct model. When the system has dynamic noise and the observational noise is larger than the dynamic noise, the best model is not the correct model. That is, the correct model is selected only when the dynamic noise is larger than the observational noise. Section 4.1 shows that when the system does not have dynamic noise and the NLS method is applied, the model size changes almost immediately as the additional noise level increases, after which the same model, the correct model, is selected over a range of levels. When the system has dynamic noise and the NLS method is applied, the model size does not change as the noise level increases: the same model continues to be selected, and this model is the correct model. These are clearly different behaviours. The description in Section 4.1 provides a method to exploit the phenomenon observed in Section 3. Hence, although the NLS method was originally proposed to avoid over-fitting [26], we find that we can also detect whether a nonlinear system has dynamic noise by applying this method.

In the previous examples, the results were very clear, because the correct basis functions for the system were in the dictionary and we could build the correct model. However, when using real-world time series, we usually do not know the system or the model. Hence, in such cases, we expect that when the NLS method is applied and the additional noise level is very large (for example, larger than 20 dB), the behaviour will differ from that seen in the examples. The model size will be smaller,5 unlike the results seen in Fig. 2. When a system does not have dynamic noise, the model tends to be over-fitted, and hence the model size changes at a relatively small additional noise level. On the other hand, when a system has dynamic noise, the model is not over-fitted and hence the model size changes at a relatively large additional noise level. The rough sketch we expect is shown in Fig. 3. Using this difference in behaviour, we consider that we can discriminate whether the system has dynamic noise. We will investigate how this idea works in more practical cases in the next section.

5. Example and application

In the previous sections, the results were very clear. However, the examples presented so far are somewhat unrealistic, for three reasons. Firstly, the correct basis functions for the system are in the dictionary; that is, the answer (the correct model) exists. When using real-world time series, we usually do not know the system or the model. Secondly, when building models, we usually cannot calculate all possible combination sets (an exhaustive search) to obtain the truly best model; we usually apply a method for selecting basis functions from a dictionary [10,23,24]. Finally, the basis functions used are all polynomial. Building models using only polynomials is not recommended if one wants to build from strong model classes,6 that is, classes whose models provide better approximations for a given number of basis functions. Generally, the polynomial model class is not considered strong and polynomials probably should not be used by themselves [12].

To investigate how our idea works for detecting whether a system has dynamic noise in more practical cases, we build pseudo-linear models using radial basis functions [10]. We use the Gaussian function fi(x) = exp(−‖x − ci‖²/ri²) for arbitrary centres ci and radii ri. Many candidate basis functions are

5 Typically the size of a model built using very noisy data is relatively small.

6 A characteristic feature of strong models is that the fitting process is inherently nonlinear. To obtain a strong model it is necessary, although not sufficient, to have some nonlinear parameter dependence [39]. Hence, radial basis models are strong models because the centres (and scale radii where appropriate) should also be considered as parameters, and it is these parameters that provide the essential nonlinear dependence of strong models [10].


Fig. 3. A rough sketch of a plot of the size of the best model selected against additional noise level. (a) A system that does not have dynamic noise, and (b) a system that has dynamic noise.

generated by random assignment of the centres ci and radii ri using the time series, in the form of a dictionary. It is almost always advantageous to have constant and linear functions available too. For more details on this procedure see [10,12]. Also, we use a selection method, the up-and-down method using marginal error [23,24]. As stated above, selecting the optimal subset of basis functions from a dictionary is typically an NP-hard problem and selection algorithms cannot usually guarantee to find the optimal subset; however, the bottom-up method and its improvement, the up-and-down method, have proven effective in modelling nonlinear dynamics [9,8,23,24]. In particular, the up-and-down method obtains better models than the bottom-up method in most cases [23,24]. Hence, we employ the up-and-down method.
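The dictionary construction described above can be sketched as follows. The number of candidates and the exact randomization scheme for the centres and radii are our assumptions for illustration, not the authors' precise recipe; only the ingredients (a constant term, linear terms, and Gaussian radial basis functions with data-drawn centres and random radii) come from the text.

```python
import numpy as np

def make_dictionary(X, n_rbf=200, seed=0):
    """Build a matrix of candidate basis functions evaluated on the
    embedded data X (rows are delay vectors): a constant function, the
    linear coordinates, and Gaussian radial basis functions with
    randomly assigned centres and radii."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    cols = [np.ones(n)]                      # constant function
    cols.extend(X[:, j] for j in range(d))   # linear functions
    centres = X[rng.integers(0, n, n_rbf)]   # centres drawn from the data
    # radii drawn between small and large data scales (assumed scheme)
    scale = X.std()
    radii = rng.uniform(0.1 * scale, 2.0 * scale, n_rbf)
    for c, r in zip(centres, radii):
        cols.append(np.exp(-np.sum((X - c) ** 2, axis=1) / r ** 2))
    return np.column_stack(cols)

X = np.random.default_rng(1).standard_normal((500, 3))
Phi = make_dictionary(X)   # one column per candidate basis function
```

A selection method (here, the paper's up-and-down method) would then choose a subset of the columns of Phi.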

In the earlier examples, we could always obtain clear results at any noise level, even when the noise level was relatively large. This was possible because the correct basis functions were in the dictionary and all possible combination sets were always calculated. In practice, as we have to select basis functions from a dictionary using a selection method, the basis functions selected come under the influence of features of the training data. In general, when the training data is very clean, the basis functions selected will be adequate for extracting the features of the training data. When this is not the case, such basis functions will not be selected, and basis functions which reflect noise will be added instead. In other words, when data are noisy, basis function selection is influenced by noise features. That is, noisy data is not appropriate as training data for building models, and we should avoid the influence of noise in the training data on basis function selection as much as possible.

We want to use clean data in practice. Hence, to save time and trouble, we use the following idea to apply the NLS method.7

7 In the previous examples, to handle the case of degenerate models, to obtain the best model at each model size, and to show that the correct model is the global best model under each condition, we calculated all possible combination sets. When we apply the idea mentioned here to the previous examples, we still find the correct model as the best model, although some models selected at each model size are not the best.

(1) We build models using the LS method, a selection methodand training data as usual.

(2) Keeping the models obtained in step 1 (that is, the same basis functions selected at each model size), DL32 of these models is calculated again using the fitting error obtained by applying the NLS method.

(3) We find the best model at each additional noise level.
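The three steps above can be sketched as follows. Ordinary least squares on synthetic nested models and a generic (n/2) ln(MSE) + penalty criterion are stand-ins for the paper's LS fit and description length DL32, which is defined earlier in the paper; the data, model sizes, and noise levels are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def dl_proxy(residual, k, n):
    # Stand-in criterion: (n/2) ln(MSE) plus a per-parameter penalty.
    # The paper uses the description length DL32 here instead.
    return 0.5 * n * np.log(np.mean(residual ** 2)) + k * np.log(n)

# Step 1: models of increasing size fitted to the training data by LS.
# The nested subsets stand in for the basis functions selected at each size.
n = 400
X = rng.standard_normal((n, 10))
y = X[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.05 * rng.standard_normal(n)
subsets = [list(range(k)) for k in range(1, 11)]

# Steps 2 and 3: keep the same basis functions at each size, add Gaussian
# noise to the data (the NLS idea), recompute the criterion from the new
# fitting error, and find the best size at each additional noise level.
def best_size(noise_std):
    y_noisy = y + noise_std * rng.standard_normal(n)
    scores = []
    for s in subsets:
        coef = np.linalg.lstsq(X[:, s], y_noisy, rcond=None)[0]
        scores.append(dl_proxy(y_noisy - X[:, s] @ coef, len(s), n))
    return 1 + int(np.argmin(scores))

sizes = [best_size(sig) for sig in (0.0, 0.05, 0.2, 0.5)]
```

Plotting `sizes` against the additional noise level gives curves of the kind sketched in Fig. 3.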

We first apply the idea to two artificial data sets, for which we know whether the system has dynamic noise. One system is the differential equations of Chua's circuit [29] and the other is the model of the annual sunspot numbers using radial basis functions [30]. These time series are well known as nonlinear time series. To investigate the quality and performance of the models obtained by applying the idea, we examine long-term free-run data generated by the models, because one needs to get the dynamics right to obtain good long-term free-run data. We then confirm that our expectation described in Section 4.2 is correct. Based on the results, we apply the idea to two real-world time series, a laser time series [18] and heart rate data obtained from a healthy adult male.

5.1. Example

In this section, we investigate how our idea works in this example and note some differences from the results presented previously. We expect that any time series is contaminated by observational noise to some extent. Hence, we contaminate the data with 60 dB Gaussian noise and use these as observational data in this section.
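Noise levels here are quoted as signal-to-noise ratios in decibels. A sketch of generating such contamination, assuming the standard convention SNR = 20 log10(σ_signal/σ_noise) (the paper does not spell out its dB convention):

```python
import numpy as np

def add_noise_db(signal, snr_db, rng):
    """Add Gaussian observational noise at a given SNR in dB,
    assuming SNR = 20*log10(std(signal)/std(noise))."""
    sigma = np.std(signal) * 10.0 ** (-snr_db / 20.0)
    return signal + sigma * rng.standard_normal(len(signal))

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 60.0, 5000))   # toy signal for illustration
noisy = add_noise_db(clean, 60.0, rng)         # 60 dB: noise std is 0.1% of signal std
```

Under this convention a larger dB value means less noise, which is why later experiments that "add noise from 80 dB in 5 dB steps" are sweeping from very weak towards very strong contamination.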

5.1.1. The differential equations of Chua's circuit

This model is an electronic circuit proposed by Chua et al. [29]. The circuit dynamics are described by

C1 dvc1/dt = G(vc2 − vc1) − g(vc1)
C2 dvc2/dt = G(vc1 − vc2) + iL
L diL/dt = −vc2    (9)


Fig. 4. The mode of the best model size obtained at different additional noise levels for the models of the Chua's circuit equations.

where vc1, vc2 and iL denote the voltage across C1, the voltage across C2 and the current through L, respectively, and g(v) is the piecewise-linear function

g(v) = m0 v + (1/2)(m1 − m0)|v + Bp| + (1/2)(m0 − m1)|v − Bp|.    (10)

The parameters for the equations are 1/C1 = 9.0, 1/C2 = 1.0, 1/L = 7.0, G = 0.7, m0 = −0.5, m1 = −0.8 and Bp = 1.0. When calculating this data we use the fourth-order Runge–Kutta method with sampling interval 0.1, and we use the vc1 component of the model. Then, we contaminate it with 60 dB Gaussian noise and use it as observational data. We note that the system does not have dynamic noise.
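A runnable sketch of generating the vc1 record from Eqs. (9) and (10) with the stated parameters and a fourth-order Runge–Kutta integrator. The initial condition and the use of ten inner sub-steps per sample for accuracy are our choices, not stated in the paper.

```python
import numpy as np

# Parameters from Eqs. (9)-(10): 1/C1 = 9, 1/C2 = 1, 1/L = 7, G = 0.7.
iC1, iC2, iL_inv, G = 9.0, 1.0, 7.0, 0.7
m0, m1, Bp = -0.5, -0.8, 1.0

def g(v):
    # Piecewise-linear resistor characteristic, Eq. (10).
    return m0 * v + 0.5 * (m1 - m0) * abs(v + Bp) + 0.5 * (m0 - m1) * abs(v - Bp)

def chua(s):
    v1, v2, il = s
    return np.array([iC1 * (G * (v2 - v1) - g(v1)),
                     iC2 * (G * (v1 - v2) + il),
                     -iL_inv * v2])

def rk4_step(s, h):
    k1 = chua(s)
    k2 = chua(s + 0.5 * h * k1)
    k3 = chua(s + 0.5 * h * k2)
    k4 = chua(s + h * k3)
    return s + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Sampling interval 0.1, with ten RK4 sub-steps per sample (our choice).
state = np.array([0.1, 0.0, 0.0])   # assumed initial condition
vc1 = []
for _ in range(5000):
    for _ in range(10):
        state = rk4_step(state, 0.01)
    vc1.append(state[0])
vc1 = np.asarray(vc1)
```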

For building a model using radial basis functions, 5000 data points are used as the training data and the up-and-down method using marginal error is applied [23,24]. The data is embedded using the uniform embedding (t − 1, t − 5, t − 9) in 3 dimensions [19] with the aim of predicting a value at time t.
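The uniform embedding above pairs each target value with a delay vector; a minimal sketch, using a toy series for illustration:

```python
import numpy as np

def embed(series, lags):
    """Return (X, y): delay vectors built from the given lags, and the
    value at time t that the model is to predict."""
    series = np.asarray(series)
    m = max(lags)
    t = np.arange(m, len(series))                    # times at which we predict
    X = np.column_stack([series[t - k] for k in lags])
    y = series[t]
    return X, y

s = np.arange(100, dtype=float)                      # toy series for illustration
X, y = embed(s, (1, 5, 9))                           # the embedding used for Chua data
```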

The size of the best model is 67. We apply the NLS method and find the best model again at each noise level. We add Gaussian noise from 80 dB in 5 dB steps and use 5 different noise realizations. When the NLS method is applied, a model of smaller size is selected when the additional noise level is larger than that included in the original data (see Section 4.1). Hence, 80 dB is clearly too small as additional noise. However, in practice, we usually do not know the level of the observational noise, and it is not unreasonable to expect the level of the observational noise to be less than 80 dB. Hence, on the assumption that we apply the NLS method to real-world time series where we do not know the observational noise level, we add Gaussian noise from 80 dB for the NLS method.

Fig. 4 shows the mode of the model size for which the model has the smallest value of DL32 (description length with γ = 32) at each noise level. Although the difference between the first plateau, of model size 67, and the second plateau, of model sizes 64 and 63, is not large, this behaviour is very similar to that seen in Fig. 3(a). This result indicates that the difference between the first and second plateaus may not always be large in practice. Also, the models obtained seem to be very sensitive to the additional noise level when there is no correct model. In Section 4.1, the model and model size changed when the additional noise level in the NLS method was larger than that included in the original training data. However, the result obtained here is different. When the additional noise level is 60 dB, the same noise level as that included in the original training data, the model size is 64, which is smaller than that obtained using the original training data. When the noise is smaller than at 60 dB (that is, at 65 dB, 70 dB, 75 dB and 80 dB), the model of size 67, the same model size as that obtained using the original training data, is selected as the best model.

We investigate the models of size 63 and 67. Fig. 5 shows the reconstructed attractors of the original training data and of the ubiquitous behaviours of the free-run data of these models. Fig. 5 shows that although the model of size 63 has fewer basis functions, its free-run behaviour is as similar to that of the original data as that of the model of size 67. This result indicates that the model of size 63 has enough basis functions and hence the model of size 67 might be over-fitted.

Here, it should be noted that when a system does not have dynamic noise, the behaviour of free-run data generated by the system is strongly influenced by the initial condition. To avoid this influence, we apply the following idea. We use each point in the training data as an initial condition, iterate 1000 times and take the last point. If the free-run data goes to fixed points or periodic orbits, the reconstructed attractor using the collected last points will clearly show fixed points or periodic orbits.
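The initial-condition sweep above can be sketched with a toy one-dimensional map. The logistic map at r = 2.5 is our hypothetical stand-in, chosen because its free runs collapse to a fixed point, exactly the degenerate behaviour the procedure is designed to expose.

```python
import numpy as np

def free_run_last_points(model, training_points, n_iter=1000):
    """Iterate the model n_iter times from every training point and keep
    only the final points; fixed points or periodic orbits of the model
    then show up clearly in the collected points."""
    x = np.array(training_points, dtype=float)
    for _ in range(n_iter):
        x = model(x)
    return x

logistic25 = lambda x: 2.5 * x * (1.0 - x)   # toy model with a stable fixed point
starts = np.linspace(0.05, 0.95, 200)        # stand-in "training data"
finals = free_run_last_points(logistic25, starts)
```

For this toy map every start converges to the fixed point x* = 0.6, so the collected points form a single cluster; a model with a genuine attractor would instead leave the final points spread over it.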

Fig. 6 shows the reconstructed attractors of the data obtained in this way. For comparison, the reconstructed attractors of the original training data are also shown in Fig. 6(a). Fig. 6(b) and (c) are very similar, although the model of size 63 has fewer basis functions.8 It should be noted that although the original training data is contaminated by 60 dB Gaussian observational noise, the points in Fig. 6(a) clearly show no noise influence, while all the others do; that is, the attractor in each of the other cases is a little fuzzy.

We found that the model of size 63 shows almost the same performance as the model of size 67. That is, the model of size 67 is probably over-fitted. Hence, we conclude that the system considered does not have dynamic noise. This is in agreement with the original model, the differential equations of Chua's circuit.

5.1.2. The model of the annual sunspot numbers

We use the model of the annual sunspot numbers built using the behaviour criterion by [30], which has excellent free-run dynamics. To build the model, the actual annual sunspot numbers over the period 1700–2000 are used. The raw annual sunspot numbers yt are first transformed using the nonlinear function9 xt = 2√(yt + 1) − 1 [31]. The purpose

8 This discussion concerns the colour graphic. Fig. 6(b) shows mostly blue, pale blue, green and yellow points around the centres on either side, and mostly orange and red points on the belt. Fig. 6(c) is almost the same as Fig. 6(b).

9 For all models in this section, we follow a suggestion of Tong and apply the transform xt = 2√(yt + 1) − 1. When presenting results, we undo this transform, that is, y′t = ((x′t + 1)/2)² − 1, where y′t is the simulated sunspot number.


Fig. 5. The reconstructed attractors of the time series of the Chua's circuit equations and of free-runs of the models obtained. In each panel, 5000 data points are plotted. (a) Training data, (b) model size 67 and (c) model size 63.

Fig. 6. (Colour online) The reconstructed attractors using the training data and free-run time series of the models obtained. We use each point in the training data as an initial condition, iterate 1000 times and take the last point. In each panel, 5000 data points are plotted. (a) Training data, (b) model size 67, and (c) model size 63.

Fig. 7. Time series of (a) the actual annual sunspot numbers and (b) a part of the training data.

of this transform is to weaken the nonlinearity and also to provide a fair comparison with previously published results. The data is embedded using the non-uniform embedding Xt−1 = (xt−1, xt−2, xt−4, xt−8, xt−10) in 5 dimensions [11] for predicting xt, and the model is built using radial basis functions [30]. The model is

xt+1 = f(Xt) + ηt,    (11)

where f is the model and ηt is Gaussian dynamic noise. We use the free-run time series of the model. Fig. 7 shows the original annual sunspot numbers and the free-run time series of the model obtained by [30], contaminated by 60 dB Gaussian noise, which we use as observational data and as the training data for building models. The free-run data of the model is very similar to the original time series of the annual sunspot numbers. We note that the system has dynamic noise and, were it not for the dynamic noise, the free-run data of the model would be just periodic.
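Generating a free run with additive dynamic noise, as in Eq. (11), can be sketched generically. The model f, the lags, the seed values and the noise level below are placeholders for illustration, not those of the sunspot model.

```python
import numpy as np

def noisy_free_run(f, lags, seed_values, n_steps, noise_std, rng):
    """Free-run a scalar model x_{t+1} = f(X_t) + eta_t, where X_t is the
    delay vector of x at the given lags and eta_t is Gaussian dynamic
    noise injected into the dynamics at every step."""
    x = list(seed_values)                    # initial history, length >= max(lags)
    m = max(lags)
    for _ in range(n_steps):
        Xt = np.array([x[-k] for k in lags])
        x.append(f(Xt) + noise_std * rng.standard_normal())
    return np.asarray(x[m:])

# Placeholder model: a damped average of the delayed values.
f = lambda X: 0.9 * X.mean()
rng = np.random.default_rng(0)
run = noisy_free_run(f, (1, 2, 4), [0.5, 0.2, 0.1, 0.4], 500, 0.05, rng)
```

Without the `noise_std` term this placeholder run would simply decay; the injected dynamic noise is what keeps it fluctuating, mirroring the remark that the sunspot model's free run is just periodic without dynamic noise.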

For building a model using radial basis functions, 3000 data points are used as training data and the up-and-down method using marginal error is applied [23,24]. The data is embedded using the same strategy as the original model, that is, the non-uniform embedding (t − 1, t − 2, t − 4, t − 8, t − 10) in 5 dimensions [19] with the aim of predicting a value at time t.

The size of the model obtained as the best model is 16. We apply the NLS method and find the best model again at each noise level. We add Gaussian noise from 80 dB in 5 dB steps and use 5 different noise realizations. Fig. 8 shows the mode of the best model size obtained. From 80 dB to 20 dB, the model size is 16, the same size as the model obtained using the original training data. From 10 dB to 0 dB, the model


Fig. 8. The mode of the best model size obtained at different additional noise levels for the models of the free-run annual sunspot numbers.

size decreases. This behaviour is very similar to that seen in Fig. 3(b). Hence, we conclude that this system has dynamic noise. This is in agreement with the original model of the annual sunspot numbers.

Here, we consider whether we can obtain from this result a good estimate of the level of dynamic noise to add to the model built. When we generate free-run data using a model built using radial basis functions, we often add dynamic noise to the model when we expect that the model (system) has dynamic noise. However, we usually do not know the appropriate level of dynamic noise to add to the model. Either too much or too little dynamic noise is likely to lead to wrong free-run behaviour [32]. As a rough standard for the level of dynamic noise, Judd and Mees suggest that 60% of the root-mean-square fitting error (RMS error) of the model is a nominal value which allows some of the residual error to be due to dynamic noise and not observational noise [11]. Hence, we usually expect the dynamic noise level to be between about 60% and 70% of the RMS error. As one such rough standard, we use 65% of the RMS error as the dynamic noise level. However, we do not claim that this level is generic. As a matter of fact, different dynamic noise levels have been suggested: Judd and Mees have used 75% of the RMS error [10], Small and Judd use 10%, 20%, 30% and 60% of the RMS error [8], and Walker and Mees use 5% of the standard deviation of "the original data" (not the RMS error) [33]. Hence, it is important to obtain a good estimate of the dynamic noise level (even if the estimate is rough), because there is currently no good indication.
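The rough standard above amounts to a one-line computation once a model is fitted; a minimal sketch, with synthetic residuals standing in for a real model's fitting errors:

```python
import numpy as np

def dynamic_noise_level(residuals, fraction=0.65):
    """Rough standard for the dynamic noise std to add to a model:
    a fraction (here 65%) of the root-mean-square fitting error."""
    rms_error = np.sqrt(np.mean(np.asarray(residuals) ** 2))
    return fraction * rms_error

# Synthetic residuals for illustration; in practice these come from the fit.
residuals = np.random.default_rng(0).normal(0.0, 2.0, 10000)
level = dynamic_noise_level(residuals)
```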

From Fig. 8 we consider that the model size changes when the additional noise becomes more significant than the dynamic noise to be included in the model. Hence, we expect that the appropriate level of dynamic noise to add to the model would be around the noise level at which the model size changes. In this example, the model size changes when the noise level is 15 dB, and the standard deviation of this noise level is 1.014344.

We investigate free-run behaviour using diverse levels of dynamic noise. In Fig. 9, we show the original training data and the ubiquitous behaviour of free-run data generated by the model of size 16 using several levels of dynamic noise, for comparison of the influence of different dynamic noise levels. The dynamic noise added is Gaussian and the standard deviation10 is 1.522003 (the value used in the original annual sunspot numbers model), 1.1, 0.998863 (65% of the RMS error) or 0.6. Fig. 9(c) shows that although the free-run behaviour is stable over long periods, as shown in Fig. 9(d), the behaviour is slightly missing in some areas. This would mean that the dynamic noise level is too large. Fig. 9(e) shows that the free-run behaviour is stable over long periods, Fig. 9(f) shows that the behaviour has more "sunspot-like" oscillations, and both panels are very similar to the original training data seen in Fig. 9(a) and (b). Fig. 9(g) shows that the free-run behaviour is stable over long periods; however, Fig. 9(h) shows that the behaviour is periodic rather than sunspot-like. Fig. 9(i) and (j) show that the behaviour is more periodic than that seen in Fig. 9(g) and (h), and the amplitude is globally smaller than the training data. In the last two cases, this would mean that the dynamic noise level is too small. From the result we expect that a dynamic noise level of 1.1 would be appropriate for the model built, where 1.1 is not much different from 1.014344, the 15 dB observational noise level. Also, 65% of the RMS error (0.998863) is not much different from 1.1. Hence, we consider that both the noise level at which the model size changes and 65% of the RMS error are good rough standards for the dynamic noise level.

5.2. Application

In the previous examples, we built models using radial basis functions without the correct basis functions being available. From the results, we find that when a system does not have dynamic noise the model tends to be over-fitted, and hence the model size changes immediately as the noise level increases when we apply the NLS method; in other words, the model size changes at a relatively small additional noise level. When a system has dynamic noise the model is not over-fitted, and hence the model size does not change immediately as the noise level increases; in other words, the model size changes at a relatively large additional noise level. The results are essentially the same as those for the Henon map, and we can therefore discriminate the systems correctly. We now apply the method to two real-world data sets.

5.2.1. Application to a laser time series

We apply our idea to building models of a laser time series. The laser is a neodymium-doped yttrium aluminum garnet laser. The time series is considered chaotic because it has at least one positive Lyapunov exponent associated with its evolution, and a broad and continuous Fourier power spectrum [18]. Fig. 10(a) shows the time series used as training data for building models.

For building a model using radial basis functions, 2000 data points are used as the training data and the up-and-down method using marginal error is applied [23,24]. The data is

10 We cannot meaningfully define an "SNR" for dynamic noise. Hence, we use a percentage of the signal standard deviation to show how much dynamic noise is added.


Fig. 9. The training data and free-run data of the models obtained. Panels (a) and (b): training data; (c) and (d): dynamic noise level 1.522003 (the value used in the original annual sunspot numbers model); (e) and (f): dynamic noise level 1.1; (g) and (h): dynamic noise level 0.998863 (65% of the RMS error); and (i) and (j): dynamic noise level 0.6, where the size of the model used is 16 and panels (b), (d), (f), (h) and (j) are enlargements of (a), (c), (e), (g) and (i), respectively.

embedded using the uniform embedding (t − 1, t − 6, t − 11, t − 16, t − 21) in 5 dimensions [19] with the aim of predicting a value at time t. The size of the best model selected is 30.

We apply the NLS method and find the best model again at each noise level. We add Gaussian noise from 80 dB in 5 dB steps and use 5 different noise realizations. Fig. 10(b) shows the mode of the best model size obtained. From 80 dB to 35 dB, the model size is 30, the same size as the model obtained using the original training data; from 35 dB, the model size keeps decreasing. This behaviour is very similar to that seen in Fig. 3(b). Hence, we conclude that the system generating the laser data has dynamic noise. This conclusion is in agreement with the explanation for the laser system published by [18]. According to [18], there is noise associated with the intrinsic quantum fluctuations, and the laser is pumped by diode laser inputs which would have fluctuations even without any other effects; these factors act as dynamic noise.11

Here, based on the result of Section 5.1.2, we investigate the level of dynamic noise to add to the model. The model size changes when the noise level is 30 dB, where the standard deviation of the 30 dB observational noise added in the

11 Although we use the term "dynamic noise", Abarbanel characterizes the noise as high-dimensional unexplained dynamics [18].


Fig. 10. The laser time series used as training data, and the mode of the best model size obtained at different additional noise levels for the models of the laser data.

NLS method is 1.165343. Hence, we expect that the appropriate dynamic noise level to add to the model would be around 1.165343. We investigate free-run behaviour using diverse levels of dynamic noise.

In Fig. 11, we show data in a certain interval after the training data and the ubiquitous behaviour of free-run data generated by the model of size 30, where the dynamic noise added is Gaussian with standard deviation 1.1, 0.999956 (65% of the RMS error) or 0.6. Fig. 11(b) shows that the behaviour is qualitatively similar to Fig. 11(a) except for intermittent bursting. This would mean that the dynamic noise level is too large. Fig. 11(c) shows that the behaviour is stable over long periods and is very similar to the data seen in Fig. 11(a). This would mean that the dynamic noise level is appropriate. Fig. 11(d) shows that although the behaviour is stable over long periods, it is very poor. This would mean that the dynamic noise level is too small.

5.2.2. Application to heart rate variability data

In one normal, resting subject the heart rate was monitored for about 20 min using an electrocardiogram (ECG). The instantaneous heart rate is expressed as the interval between consecutive heart beats. This is easily estimated from the ECG as the time between the peaks of consecutive R-waves, the so-called R–R intervals. A total of 1500 R–R intervals were measured from the subject. Fig. 12 shows the time series: Fig. 12(a) is the training data for building models and Fig. 12(b) is a part of Fig. 12(a). As shown in the figure, the behaviour fluctuates greatly, although the subject was at rest. We assume that the data is stationary because it was measured while the subject had his eyes closed and was at rest; that is, the subject was physically at rest and was asked not to concentrate on anything.

Also, we expect the heart rate data we use here to be nonlinear. As a preliminary test, we have applied linear surrogate methods, using FT, AAFT and IAAFT surrogate data [1,2], to the heart rate data, where the embedding dimension is 5, the time-lag is 2, the number of data points used is 1024, and the slope of the correlation sum [17] (which indicates the local correlation dimension) is used as the discriminating statistic. The result shows that there is a clear difference between the

Fig. 11. A certain interval after the training data, and free-run data of the model of size 30. (a): data in a certain interval after the training data; (b): free-run data when the dynamic noise level is 1.1; (c): free-run data when the dynamic noise level is 0.999956 (65% of the RMS error); and (d): free-run data when the dynamic noise level is 0.6.


Fig. 12. Time series of (a) the heart rate data used as training data and (b) a part of the training data.

original data and the surrogates. Hence, we regard the heart rate data we use as unlikely to be linear; it could be nonlinear.
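Of the three surrogate types mentioned, the simplest, the FT (phase-randomized) surrogate, can be sketched as follows; AAFT and IAAFT add amplitude-adjustment steps on top of this [1,2]. The test signal below is an arbitrary illustration.

```python
import numpy as np

def ft_surrogate(x, rng):
    """Fourier-transform surrogate: keep the amplitude spectrum of x
    (hence its linear correlations) but randomize the phases,
    destroying any nonlinear structure."""
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, len(X))
    phases[0] = np.angle(X[0])           # keep the DC (mean) component unchanged
    if len(x) % 2 == 0:
        phases[-1] = np.angle(X[-1])     # keep the Nyquist bin real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(1024)) + 0.1 * rng.standard_normal(1024)
s = ft_surrogate(x, rng)
```

A discriminating statistic (the paper uses the slope of the correlation sum) is then computed for the data and for an ensemble of such surrogates; a clear difference rejects the linear null hypothesis.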

For building a model using radial basis functions, 1500 data points are used as the training data and the up-and-down method using marginal error is applied [23,24]. The data is embedded using the uniform embedding (t − 1, t − 3, t − 5, t − 7) in 4 dimensions [19] with the aim of predicting a value at time t. The size of the best model selected is 10.

We apply the NLS method and find the best model again at each noise level. We add Gaussian noise from 80 dB in 5 dB steps and use 5 different noise realizations. Fig. 13 shows the mode of the best model size obtained. From 80 dB to 15 dB the model size does not change, and from 10 dB the model size changes. This behaviour is very similar to that seen in Fig. 3(b). Hence, this result implies that the heart rate system has dynamic noise.12 That is, the heart rate variability seen is not due to the action of only low-dimensional deterministic components of the system.

This is in agreement with the physiology of the cardiovascular system. The heart rate is determined by the activity in the autonomic nerves which supply the sinus node. The activity in these fibres is not only determined by simple feedback from the baroreceptors (pressure sensors) in the cardiovascular system, but is also influenced by inputs from many other systems, including hormonal systems and higher centres such as the cerebral cortex. The normal heart beat is initiated by a group of pacemaker cells, the sinus node, located in the right atrium. In isolation, the sinus node appears to be relatively stable, producing heart beats at a regular rate. This is evident in heart transplant patients, where the nervous innervation to the heart is missing; in these patients the instantaneous heart rate shows much less variation than in normal subjects [34]. Also, in patients with severe cardiac disease, the heart rate variability is sometimes very periodic [35]. This implies that an important source of the normal variability is the external forcing of the sinus node by the nervous system. Hence, the normal heart rate variability could be an example of a system where the fluctuations arise as a consequence of a nonlinear deterministic system (the sinus node) being forced by a high-dimensional input (the activity in the nerves innervating the sinus node).

Here, based on the result of Section 5.1.2, we investigatethe level of dynamic noise to add to the model built. The

12 It may be better in this case to call the dynamic noise "high-dimensional unexplained dynamics input to control the heart".

Fig. 13. The mode of the best model size obtained at different additional noise levels for the models of the heart rate data.

model size changes when the noise level is 10 dB, where the standard deviation of the 10 dB observational noise added in the NLS method is 0.015866. Hence, we expect that the appropriate dynamic noise level to add to the model would be around 0.015866. We investigate free-run behaviour using diverse levels of dynamic noise. In Fig. 14, we show the ubiquitous behaviour of free-run data generated by the model of size 10, where the dynamic noise added is Gaussian with standard deviation 0.013, 0.011009 (65% of the RMS error) or 0.009. Fig. 14(a) shows that the values of the free-run data are completely different from those of the actual heart rate data used as training data: the data drops from the heart rate range to an unphysiological range, that is, the data become negative. This would mean that the dynamic noise level is too large. Fig. 14(c) shows that the behaviour is stable over long periods, and Fig. 14(c) and (d) are very similar to the actual heart rate data seen in Fig. 12. This would mean that the dynamic noise level is appropriate. Fig. 14(e) shows that although the behaviour fluctuates and is stable over long periods, the amplitude is globally smaller than the training data, and Fig. 14(f) shows that the behaviour is slightly poor. This would mean that the dynamic noise level is too small.

6. Summary and conclusion

This paper has described a method for detecting whether a system has dynamic noise, under the assumption that the observational noise is not large.

Also, we found that a noise level between about 60% and 70% of the RMS error, and the noise level at which the model size changes when applying the NLS method, can both serve as good rough standards for the level of dynamic noise to add to the models.


Fig. 14. Free-run data of the model of size 10. (a) and (b): dynamic noise level 0.013; (c) and (d): dynamic noise level 0.011009 (65% of the RMS error); and (e) and (f): dynamic noise level 0.009, where (b), (d) and (f) are enlargements of (a), (c) and (e), respectively.

Finally, we note that for our approach to succeed, it is important that the observational noise be relatively small. As the result in Section 4.1 shows, when the level of observational noise is substantially larger than that of the dynamic noise of the system, the behaviour of the result obtained by applying our idea is almost the same as when the system does not have dynamic noise.

Acknowledgments

We would like to thank Professor H.D.I. Abarbanel (The University of California) for providing us with the neodymium-doped yttrium aluminum garnet laser data. This work was supported by a direct allocation grant from The Hong Kong Polytechnic University (A-PG58).

References

[1] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J.D. Farmer, Testing for nonlinearity in time series: the method of surrogate data, Physica D 58 (1992) 77–94.

[2] T. Schreiber, A. Schmitz, Improved surrogate data for nonlinearity tests, Phys. Rev. Lett. 77 (1996) 635–638.

[3] T. Schreiber, A. Schmitz, Discrimination power of measures for nonlinearity in a time series, Phys. Rev. E 55 (1997) 5443–5447.

[4] T. Schreiber, A. Schmitz, Surrogate time series, Physica D 142 (2000) 346–382.

[5] J. Bhattacharya, Detection of weak chaos in infant respiration, IEEE Trans. Syst. Man Cybern. 31 (4) (2001) 637–642.

[6] K.T. Dolan, M. Spano, Surrogate for nonlinear time series analysis, Phys. Rev. E 64 (4) (2001) 046128.

[7] J. Theiler, P.E. Rapp, Re-examination of the evidence for low-dimensional, nonlinear structure in the human electroencephalogram, Electroencephalogr. Clin. Neurophysiol. 98 (1996) 213–222.

[8] M. Small, K. Judd, Detecting nonlinearity in experimental data, Internat. J. Bifur. Chaos 8 (6) (1998) 1231–1244.

[9] M. Small, K. Judd, Comparisons of new nonlinear modeling techniques with applications to infant respiration, Physica D 117 (1998) 283–298.

[10] K. Judd, A.I. Mees, On selecting models for nonlinear time series, Physica D 82 (1995) 426–444.

[11] K. Judd, A.I. Mees, Embedding as a modeling problem, Physica D 120 (1998) 273–286.

[12] K. Judd, Building optimal models of time series, in: G. Gouesbet, S. Meunier-Guttin-Cluzel, O. Menard (Eds.), Chaos and its Reconstruction, Nova Science Pub Inc, New York, 2003, pp. 179–214.

[13] K. Judd, T. Nakamura, Degeneracy of time series models: The best model is not always the correct model, Chaos 16 (3) (2006) 033105.

[14] K.H. Chon, J.K. Kanters, R.J. Cohen, N.-H. Holstein-Rathlou, Detection of chaotic determinism in time series from randomly forced maps, Physica D 99 (1997) 471–486.

[15] J.P.M. Heald, J. Stark, Estimation of noise levels for models of chaotic dynamical systems, Phys. Rev. Lett. 84 (2000) 2366–2369.

[16] M. Hénon, A two-dimensional map with a strange attractor, Commun. Math. Phys. 50 (1976) 69–77.


[17] H. Kantz, T. Schreiber, Nonlinear Time Series Analysis, in: Cambridge Nonlinear Science Series, Number 7, Cambridge University Press, Cambridge, 1997.

[18] H.D.I. Abarbanel, Analysis of Observed Chaotic Data, Springer-Verlag, New York, 1996.

[19] F. Takens, Detecting strange attractors in turbulence, Lect. Notes Math. 898 (1981) 366–381.

[20] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control 19 (1974) 716–723.

[21] J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific, Singapore, 1989.

[22] A.I. Mees, Nonlinear Dynamics and Statistics, Birkhäuser, Boston, 2000.

[23] T. Nakamura, K. Judd, A.I. Mees, Refinements to model selection for nonlinear time series, Internat. J. Bifur. Chaos 13 (5) (2003) 1263–1274.

[24] T. Nakamura, D. Kilminster, K. Judd, A.I. Mees, A comparative study of model selection methods for nonlinear time series, Internat. J. Bifur. Chaos 14 (3) (2004) 1129–1146.

[25] T. Nakamura, Modelling nonlinear time series using selection methods and information criteria, Ph.D. Thesis, School of Mathematics and Statistics, The University of Western Australia, 2004.

[26] T. Nakamura, M. Small, Modelling nonlinear time series using improved least squares method, Internat. J. Bifur. Chaos 16 (2) (2006) 445–464.

[27] T. Nakamura, M. Small, A comparative study of information criteria for model selection, Internat. J. Bifur. Chaos 16 (8) (2006) (in press).

[28] T. Nakamura, Y. Hirata, K. Judd, D. Kilminster, M. Small, Improved parameter estimation from noisy time series for nonlinear dynamical systems, Internat. J. Bifur. Chaos 17 (3) (2007) (in press).

[29] T. Matsumoto, L.O. Chua, M. Komuro, The double scroll, IEEE Trans. Circuits Syst. 32 (1985) 797–818.

[30] D. Kilminster, Modelling Dynamical Systems via Behaviour Criterion, Ph.D. Thesis, School of Mathematics and Statistics, The University of Western Australia, 2003.

[31] H. Tong, Non-linear Time Series: A Dynamical Systems Approach, Oxford Univ. Press, Oxford, New York, 1990 (Chapter 7.3), pp. 419–429.

[32] M. Small, C.K. Tse, Applying the method of surrogate data to cyclic time series, Physica D 164 (2002) 187–201.

[33] D. Walker, A.I. Mees, Reconstructing nonlinear dynamics by extended Kalman filtering, Internat. J. Bifur. Chaos 8 (1998) 557–569.

[34] K.E. Sands, M.L. Appel, L.S. Lilly, F.J. Schoen, G.H. Mudge Jr., R.J. Cohen, Power spectrum analysis of heart rate variability in human cardiac transplant recipients, Circulation 79 (1) (1989) 76–82.

[35] A.L. Goldberger, Fractal mechanisms in the electrophysiology of the heart, IEEE Eng. Med. Biol. Mag. 11 (2) (1992) 47–52.

[36] L. Ljung, System Identification: Theory for the User, in: Prentice Hall Information and System Sciences Series, 1999.

[37] G. Schwarz, Estimating the dimension of a model, Ann. Statist. 6 (1978) 461–464.

[38] J. Rissanen, MDL denoising, IEEE Trans. Inform. Theory 46 (7) (2000) 2537–2543.

[39] A.R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inform. Theory 39 (1993) 930–945.

[40] P. McSharry, L.A. Smith, Better nonlinear models from noisy data: Attractors with maximum likelihood, Phys. Rev. Lett. 83 (1999) 4285–4288.