Contents lists available at ScienceDirect
Environmental Modelling & Software 24 (2009) 1274–1284
journal homepage: www.elsevier.com/locate/envsoft

Systematic identifiability study based on the Fisher Information Matrix for reducing the number of parameters calibration of an activated sludge model

Vinicius Cunha Machado, Gladys Tapia, David Gabriel, Javier Lafuente, Juan Antonio Baeza*

Departament d’Enginyeria Química, Universitat Autònoma de Barcelona, ETSE, 08193 Bellaterra (Barcelona), Spain

Article info

Article history:
Received 30 July 2008
Received in revised form 2 May 2009
Accepted 3 May 2009
Available online 11 June 2009

Keywords: ASM2d; A2/O; WWTP; FIM; Calibration; Validation; Modelling

* Corresponding author. Tel.: +34 935811587; fax: +34 935812013.
E-mail addresses: [email protected] (V.C. Machado), [email protected] (G. Tapia), [email protected] (D. Gabriel), [email protected] (J. Lafuente), [email protected] (J.A. Baeza).

1364-8152/$ – see front matter © 2009 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envsoft.2009.05.001

Abstract

This work proposes a procedure for the calibration and validation of complex models that systematically obtains identifiable parameter subsets according to the available data. The procedure uses a new criterion calculated from the Fisher Information Matrix (FIM): the ratio of the normalized D criterion to the modified E criterion (RDE). It does not require expert knowledge, and it automatically defines the dimension of the identifiable subset without requiring a threshold for the RDE. It was applied successfully to the study of the IWA ASM2d model, which was implemented, calibrated and validated for an anaerobic/anoxic/oxic (A2/O) pilot WWTP operated under three different influent ammonium concentrations (15, 20 and 30 mg/L) and two internal recycle ratios (IRR = 2 and 5). Starting from 51 of the ASM2d parameters, a sensitivity analysis around the ASM2d default values was performed. The 20 best-ranked parameters of the sensitivity ranking were named "seeds", since each one served to grow a parameter subset for model calibration. At each step of the subset generation process, the parameter with the highest RDE among all the remaining parameters of the sensitivity ranking was added to the subset. Parameter addition was repeated until the RDE of the current iteration was lower than that of the previous one. The best subset determined by the methodology, {bPAO, YPO4, μA}, presented the highest RDE value obtained. Finally, the simulation of the WWTP with this subset fitted the experimental data adequately, and the estimated parameters had narrow confidence intervals.

© 2009 Elsevier Ltd. All rights reserved.

1. Introduction

The activated sludge process is the most common treatment for municipal and industrial wastewater. In this process, a suspension of bacterial biomass is responsible for pollutant removal, usually of COD, N and P. Nowadays, modelling of these processes using the International Water Association (IWA) Activated Sludge Models (ASM) is widespread. These models are mainly used for the design or redesign of WWTPs (e.g., Benedetti et al., 2008; Ferrer et al., 2008; Rivas et al., 2008), the development of control strategies for WWTPs (e.g., Flores-Alsina et al., 2008) and control design for integrated urban wastewater systems (e.g., Vanrolleghem et al., 2005; Fu et al., 2008). Since the release of ASM1 by Henze et al. (1987), four versions of ASM for organic matter and nutrient removal processes have been proposed by the IWA Task Group on Mathematical Modelling: ASM1, ASM2, ASM2d and ASM3 (Henze et al., 2000). ASM2d was proposed to provide a useful framework for the description of WWTPs with biological N and P removal. Three types of microorganisms are defined in ASM2d. Heterotrophic microorganisms (XH) grow on readily biodegradable organic substrates (SF) and fermentation products (SA). Autotrophic microorganisms (XA) are involved in the aerobic process of nitrification, where ammonium (SNH4) is converted to nitrate (SNO3). Finally, phosphorus-accumulating organisms (PAO) are responsible for enhanced biological phosphorus removal (EBPR) and are modelled with three state variables: cell internal storage products (XPHA), stored polyphosphate (XPP) and PAO biomass (XPAO).

Usually, an approximate description of a WWTP with N and P removal can be achieved using the ASM2d default parameters, but model calibration is required for an accurate description of experimental data. Moreover, determining the best parameter values according to a cost function is only part of the problem and should be followed by a confidence assessment of the estimates (Checchi et al., 2007). The high number of parameters in complex models such as ASM makes it difficult to choose which parameters should be selected for calibration. This choice is usually based on process knowledge and previous experience, but some authors have proposed systematic approaches based on mathematical tools for parameter selection.


Nomenclature

A2/O: Anaerobic, Anoxic and Aerobic (WWTP configuration)
ASM: Activated Sludge Models
CCF: Calibration Cost Function
CFA: Continuous Flow Analysis
COD: Chemical Oxygen Demand
COST: European Co-operation in the field of Scientific and Technical Research
DO: Dissolved Oxygen
EBPR: Enhanced Biological Phosphorus Removal
FER: External Recycle Flow rate
FIM: Fisher Information Matrix
FI: Inlet Flow Rate
FIR: Internal Recycle Flow rate
GAO: Glycogen Accumulating Organisms
HRT: Hydraulic Retention Time
IRR: Internal Recycle Ratio
IWA: International Water Association
ORP: Oxidation–Reduction Potential
OUR: Oxygen Uptake Rate
PAO: Phosphorus Accumulating Organisms
PC-PLC: Personal Computer – Programmable Logic Controller
SRT: Sludge Retention Time
TSS: Total Suspended Solids
VCF: Validation Cost Function
VFA: Volatile Fatty Acid
VSS: Volatile Suspended Solids
WWTP: Wastewater Treatment Plant
WERF: Water Environment Research Foundation


The knowledge-based approach makes use of the large amount of experience reported for activated sludge systems (Ruano et al., 2007), such as the protocols developed by WERF (Melcer et al., 2003), BIOMATH (Vanrolleghem et al., 2003), STOWA (Hulsbeek et al., 2002) or CALAGUA (García-Usach et al., 2006). On the other hand, the systematic approach studies the identifiability of ASM models relying on sensitivity and correlation analysis of the model parameters (Weijers and Vanrolleghem, 1997; Brun et al., 2002; De Pauw, 2005). These systematic methodologies first rank the parameters according to their influence on the model outputs (local sensitivity analysis) and then analyze the correlation within parameter subsets. Weijers and Vanrolleghem (1997) developed a procedure based on the Fisher Information Matrix (FIM) to study the identifiability of ASM1 models. The D and modE criteria of the FIM were used to find an identifiable parameter subset among numerous combinations. This methodology was also successfully applied to other kinetic models (Reichert and Vanrolleghem, 2001; De Pauw, 2005; Checchi and Marsili-Libelli, 2005; Marsili-Libelli and Giusti, 2008). Brun et al. (2002) developed a systematic approach for ASM2d calibration based on full-scale plant data by applying identifiability analysis followed by iterative parameter subset selection and tuning, using criteria comparable to the D and modE criteria. They defined the collinearity index (γ) and the determinant measure (ρ). The γ index represents the interdependence of all the analyzed parameters, and ρ is a relative measure suited to comparing the identifiability of different parameter subsets. In addition, Brun et al. (2002) studied the problem of parameter interdependencies and the effect of fixed parameter values on the parameter estimates (bias problem). Recently, a similar methodology was applied to water body quality modelling, where the focus was the water drainage system instead of biological wastewater treatment (Freni et al., 2009).

The main objective of this work is to find a systematic procedure to obtain the best set of parameters that i) provides a good fit of ASM2d to the available experimental data, ii) has narrow confidence intervals and iii) does not require expert knowledge about the process. The results obtained with the systematic procedure developed, based on a sensitivity analysis and the new RDE criterion, are compared with those of the methodology proposed by Brun et al. (2002).

Fig. 1. A2/O multistage scheme configuration of the pilot WWTP: anaerobic reactor (R1), anoxic reactor (R2), aerobic reactors (R3, R4) and settler; flows FINFLUENT, FEFFLUENT, FIR, FER and FPURGE.

2. Material and methods

2.1. Pilot plant

The pilot plant studied was a scaled-down version of the municipal WWTP of Manresa (Spain). Its operation was based on the A2/O multistage configuration with nitrification–denitrification and EBPR (Fig. 1). It consisted of a 9 L anaerobic reactor (R1), a 25 L anoxic reactor (R2), two 25 L aerobic reactors (R3 and R4) and a 60 L settler (Baeza et al., 1999, 2002). Nitrification was performed in the aerobic reactors, and the nitrate thus produced was recycled with the mixed liquor to the anoxic reactor (internal recycle), where denitrification took place. The return sludge from the settler was recycled to the anaerobic reactor (external recycle), where the influent and the sludge were mixed under anaerobic conditions.

The pilot plant was controlled with a PC-PLC control system that kept the in-line measured flow rates (influent, internal recycle and external recycle), pH and DO at predetermined setpoints. ORP, temperature and the actuation of the reactor aeration valves were also monitored. Off-line analyses of N–NH4⁺ were carried out using a continuous flow analyser (CFA) (Baeza et al., 1999). TSS (Total Suspended Solids), VSS (Volatile Suspended Solids), COD and P–PO4³⁻ were determined off-line using the protocols established in the applicable Standard Methods (APHA, 1995).

The sludge used was obtained from an urban WWTP performing N removal. To allow the microbial community to adapt to the configuration used, the pilot plant was fed with a constant influent for 120 days. The VSS concentration remained at similar levels (ca. 4000 ± 300 mg/L) throughout the experiments. The SRT was kept at 10 days by automatic purging from the external recycle. The environmental and operational conditions of the pilot plant throughout the adaptation and experimental periods are summarised in Table 1.

The required inflow (420 L/day of synthetic wastewater of known composition) was prepared from two feed concentrates diluted with tap water. The composition of the synthetic wastewater was similar to that typically obtained after primary settling in a WWTP and was formulated using compounds with different biodegradability rates. The exact composition of the wastewater is shown in Table 2. Ammonium chloride was used at three different concentrations, depending on the experiment. The synthetic wastewater was characterised to obtain the fractionation required by the ASM2d influent specifications. The wastewater characterisation results are shown in Table 3.

2.2. Experimental design

Experiments were designed to obtain information on N and P removal in the pilot WWTP under different conditions. The two variables modified throughout the experiments were the influent ammonium nitrogen concentration and the internal recycle ratio (IRR). The latter is defined as the ratio between the internal recycle flow rate (FIR) and the inlet flow rate (FI). Each set consisted of two different experiments in which the ammonium concentration was kept constant and the internal recycle flow rate (flow from R4 to R2) was modified. Three different influent ammonium nitrogen concentrations were tested, named low (15 mg/L), medium (20 mg/L) and high (30 mg/L). The other influent components were kept equal to those shown in Table 2. Two different IRR values were tested for each ammonium concentration: IRR = 2 and IRR = 5. A ratio of 2 is usually found in WWTPs operating with an A2/O configuration, and it is usually considered an adequate compromise between removal efficiency and economic cost. A ratio of 5 is around the highest values recommended for this kind of process and gives the highest nitrate removal efficiency, albeit at a higher economic cost. A higher IRR is not recommended because it implies a high recycle flow carrying the same dissolved oxygen concentration as the aerobic reactor, which can be detrimental to denitrification.

Table 1
Operational conditions used in the pilot plant throughout the study.

Symbol   Parameter                              Value           Unit
FI       Influent flow                          420             L/d
FER      External recycle flow                  145             L/d
FER/FI   External recycle/influent flow         0.35            –
FIR      Internal recycle flow                  840/2100        L/d
IRR      Internal recycle flow/influent flow    2/5             –
HRT      Hydraulic residence time               8.7             h
SRT      Sludge residence time                  10              d
T        Temperature                            19–21           °C
pH       pH                                     7–7.5           –
DO R1    Dissolved oxygen in reactor 1          0 (anaerobic)   mg/L
DO R2    Dissolved oxygen in reactor 2          0.04 (anoxic)   mg/L
DO R3    Dissolved oxygen in reactor 3          3 (aerobic)     mg/L
DO R4    Dissolved oxygen in reactor 4          3 (aerobic)     mg/L

Table 3
Synthetic wastewater characterization for ASM2d influent specifications.

Component   Concentration [mg/L]
SNH4        15.5/21.6/31.5
SNO3        1.41
SPO4        19.01
SALK        15.0
SF          184
XS          279
XTSS        206.9
iN,SF       0.012
iN,XS       0.030

iN,SF and iN,XS are the nitrogen contents of SF and XS (g N/g COD), not concentrations.

The detailed experimental procedure (see Fig. 2) is summarised next. First, the plant was operated for 120 days under standard conditions (Table 1), i.e., with an influent containing 20 mg N–NH4⁺/L and IRR = 2. The influent ammonium concentration was then decreased to 15 mg N–NH4⁺/L and the IRR was set to 5. After 24 h of operation under these conditions, samples were taken from the four reactors. Afterwards, the IRR was changed to 2, and samples were taken again from the four reactors 24 h later. The same procedure was applied for influent ammonium concentrations of 20 and 30 mg N–NH4⁺/L, using IRR values of 5 and 2 for each influent. Therefore, data from six experimental conditions were obtained.

Model calibration was performed by calculating the value of the cost function with the data from the 20 mg N–NH4⁺/L experiment, whereas model validation was performed with the data from the experiments at influent ammonium concentrations of 15 and 30 mg N–NH4⁺/L. It is important to note that the 120-day period and the 15 mg N–NH4⁺/L experiment were included in the simulation used to evaluate the cost function at 15 mg N–NH4⁺/L, but only the data generated during the 15 mg N–NH4⁺/L experiment were used to calculate the cost function. For the 20 mg N–NH4⁺/L experiment, the 120-day period and the experiments at 15 and 20 mg N–NH4⁺/L were simulated, although only the data of the 20 mg N–NH4⁺/L experiment were used to calculate the cost function. The same procedure was adopted for calculating the cost function at 30 mg N–NH4⁺/L.

2.3. Model and methodology implementation in MATLAB®

The ASM2d model was implemented in MATLAB® using a modular approach, so that it could easily be adapted to different configurations, such as the A2/O pilot WWTP used here. The 19 ASM2d state variables were considered in each of the four reactors and in the external recycle. The settler was described with a one-dimensional 10-layer model using the Takács settling velocity function (Takács et al., 1991). The resulting set of 86 differential equations was solved using the MATLAB® ode15s function.

Table 2
Synthetic wastewater composition.

Component                                  Concentration [mg/L]
Casein peptone                             99
Ammonium chloride (NH4Cl)                  57/79/114
Urea (CO(NH2)2)                            4.74
Starch ((C6H10O5)n)                        103
Sucrose (C12H22O11)                        78
D(+)-Glucose (C6H12O6)                     78
Magnesium chloride (MgCl4.7H2O)            78
Sodium chloride (NaCl)                     69
Potassium dihydrogenphosphate (K2HPO4)     65
Calcium chloride (CaCl2.2H2O)              34
Iron sulphate (FeSO4.7H2O)                 1.7
Zinc sulphate (ZnSO4.7H2O)                 1.7
Manganese sulphate (MnSO4.H2O)             1.1
Yeast extract                              0.86
Copper sulphate (CuSO4.5H2O)               0.86
Boric acid (H3BO3)                         0.086

All the steps of the methodology were implemented and automated in MATLAB®. The functions called by the main function perform model simulation, calculation of sensitivities, building of calibration subsets, parameter optimization (calibration), model validation and calculation of the FIM and its derived criteria for the selection of the best subset. Section 3.6 details the computational effort of the proposed methodology.

2.4. Sensitivity analysis

Sensitivity analysis allows the parameters that most affect the outputs to be ranked. The relative sensitivity of an output i (yi) with respect to a parameter j (θj) is defined as (Reichert and Vanrolleghem, 2001):

$$S_{ij} = \frac{\theta_j}{y_i}\,\frac{\partial y_i}{\partial \theta_j} \qquad (1)$$

Norton (2008) proposed the use of algebraic sensitivity analysis, because a numerical sensitivity value applies only to a specific change from a specific value of θj, whereas the algebraic approach provides algebraic relations. Numerical sensitivity values are generally much less informative than an algebraic relation, but algebraic sensitivity analysis is not feasible when the model equations are as complicated as those of ASM2d. Therefore, the derivatives of Equation (1) were determined numerically by finite differences. The central difference approach with a perturbation factor of 10⁻⁴ (0.01%) was used to calculate the sensitivity of each tested parameter around its default ASM2d value. This perturbation factor was selected because it produced equal derivative values with forward and backward finite differences (De Pauw, 2005).
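As a minimal sketch of the central-difference scheme applied to Equation (1), the Python function below perturbs one parameter by a relative factor of 10⁻⁴ and returns the relative sensitivities of all outputs. It is not the authors' MATLAB code: `model` stands for a hypothetical callable mapping a parameter vector to an output vector (for example, reactor NH4 and PO4 concentrations), and θj and every yi are assumed to be nonzero.

```python
import numpy as np


def relative_sensitivity(model, theta, j, rel_step=1e-4):
    """Relative sensitivities S_ij = (theta_j / y_i) * dy_i/dtheta_j (Equation (1)).

    model    -- hypothetical callable: parameter vector -> output vector y
    theta    -- nominal parameter vector (e.g. default values); theta[j] != 0
    j        -- index of the perturbed parameter
    rel_step -- relative perturbation factor (1e-4, as in the text)
    """
    theta = np.asarray(theta, dtype=float)
    h = rel_step * theta[j]                  # absolute step proportional to theta_j
    up, down = theta.copy(), theta.copy()
    up[j] += h
    down[j] -= h
    # Central difference: (y(theta + h) - y(theta - h)) / (2h)
    dy_dtheta = (np.asarray(model(up), float) - np.asarray(model(down), float)) / (2.0 * h)
    y0 = np.asarray(model(theta), dtype=float)
    return theta[j] / y0 * dy_dtheta


# Toy model with known answers: y1 = a*b has S = 1 and y2 = a**2 has S = 2 w.r.t. a
toy = lambda p: np.array([p[0] * p[1], p[0] ** 2])
s_a = relative_sensitivity(toy, [3.0, 2.0], j=0)   # ≈ [1.0, 2.0]
```

For a power-law dependence y = c·θⁿ the relative sensitivity equals n, which makes toy models like this a convenient check of the finite-difference code.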

The overall sensitivity of a parameter was calculated by adding the absolute values of the individual sensitivities. In our case, 16 output variables were declared: 8 were ammonium concentrations, one for each reactor at IRR = 2 and IRR = 5, and the other 8 were phosphate concentrations, one for each reactor at IRR = 2 and IRR = 5. Hence, the overall sensitivity of a parameter j (OSj) was calculated with Equation (2):

$$OS_j = \sum_{i=1}^{8} \left| S_{ij,\,NH_4} \right| + \sum_{i=9}^{16} \left| S_{ij,\,PO_4} \right| \qquad (2)$$
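The aggregation in Equation (2) and the selection of the 20 seeds can be sketched as follows, again in Python and under assumptions: the relative sensitivities are stored in a 16 × n matrix whose first eight rows are the NH4 outputs (four reactors at each IRR) and whose last eight rows are the PO4 outputs. The function names and the row layout are illustrative, not taken from the authors' implementation.

```python
import numpy as np


def overall_sensitivity(S):
    """Equation (2): OS_j = sum of |S_ij| over the 8 NH4 rows plus the 8 PO4 rows.

    S -- array of shape (16, n_params); rows 0-7 are NH4 outputs, rows 8-15 PO4 outputs.
    """
    S = np.asarray(S, dtype=float)
    return np.abs(S[:8]).sum(axis=0) + np.abs(S[8:16]).sum(axis=0)


def rank_seeds(S, names, n_seeds=20):
    """Return the n_seeds parameter names with the largest overall sensitivity."""
    os_j = overall_sensitivity(S)
    order = np.argsort(os_j)[::-1]          # indices sorted by decreasing OS_j
    return [names[k] for k in order[:n_seeds]]
```

Taking absolute values before summing prevents positive and negative sensitivities of different outputs from cancelling each other.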

It is important to note that some model parameters were not considered for calibration. Although temperature correction factors have been shown to be among the parameters with the greatest influence on the model output (Ruano et al., 2007), they were not considered, since the experiments were carried out isothermally at 20 °C. Likewise, the influence of variable influent characteristics was not studied, since the composition of the synthetic wastewater was known throughout the experimental tests. Sedimentation parameters were also excluded from the study, because previous experiments allowed us to adapt the default sedimentation parameters (Copp, 2002) to the characteristics of the pilot plant sludge.

Fig. 2. Diagram of the experimental design performed in the pilot WWTP. Solid line: ammonium influent concentration. Dashed line: internal recycle ratio (IRR). Dotted line: sampling event in the reactors.

Among the nine stoichiometric parameters of the ASM2d model, fSI (production of soluble inert COD, SI) and fXI (fraction of inert COD generated in the lysis of PAO, heterotrophs and autotrophs) were kept constant at their default values.

Therefore, the sensitivity rank included 51 ASM2d parameters: the 45 kineticparameters and the stoichiometric parameters YH, YPAO, YPO4, YPHA and YA.

2.5. FIM criteria

The FIM summarizes the importance of each model parameter for the outputs, since it measures the variation of the output variables caused by a variation of the model parameters (Dochain and Vanrolleghem, 2001; Guisasola et al., 2006). Algebraically, the FIM is represented by Equation (3):

$$\mathrm{FIM} = \sum_{k=1}^{N} Y_\theta(k)\, Q_k^{-1}\, Y_\theta^{T}(k) \qquad (3)$$

For r output variables and p parameters, the FIM is a p × p matrix, where k denotes each sampling data point, Qk is the r × r covariance matrix of the measurement noise, θ is the vector of p parameters, N is the total number of samples and Yθ is the p × r output sensitivity function matrix, expressed by Equation (4):

$$Y_\theta^{T}(t) = \left[ \frac{\partial y(t, \theta_0)}{\partial \theta^{T}} \right]_{\theta_0} \qquad (4)$$

where θ0 is the complete model parameter vector used for calculating the derivatives and θT is the transposed vector of the parameters being studied. In the present study, the derivative in Equation (4) was obtained numerically by finite differences using a perturbation factor of 10⁻⁴, as in the sensitivity calculations.

$$CF = \sqrt{\sum_{n=1}^{4} sc_{NH_4}^{2}\left(S^{\mathrm{Reactor}\,n,\;IRR=5}_{NH_4,\,EXP} - S^{\mathrm{Reactor}\,n,\;IRR=5}_{NH_4,\,MOD}\right)^{2} + \sum_{n=1}^{4} sc_{NH_4}^{2}\left(S^{\mathrm{Reactor}\,n,\;IRR=2}_{NH_4,\,EXP} - S^{\mathrm{Reactor}\,n,\;IRR=2}_{NH_4,\,MOD}\right)^{2} + \sum_{n=1}^{4} sc_{PO_4}^{2}\left(S^{\mathrm{Reactor}\,n,\;IRR=5}_{PO_4,\,EXP} - S^{\mathrm{Reactor}\,n,\;IRR=5}_{PO_4,\,MOD}\right)^{2} + \sum_{n=1}^{4} sc_{PO_4}^{2}\left(S^{\mathrm{Reactor}\,n,\;IRR=2}_{PO_4,\,EXP} - S^{\mathrm{Reactor}\,n,\;IRR=2}_{PO_4,\,MOD}\right)^{2}} \qquad (9)$$

It has been proved mathematically that the FIM provides a lower bound of the parameter error covariance matrix (Söderström and Stoica, 1989), as shown by Equation (5):

$$\mathrm{cov}(\theta_0) \geq \mathrm{FIM}^{-1} \qquad (5)$$
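Equations (3) and (5), together with the confidence interval of Equation (6) that follows, can be sketched in Python as below. The array layout (one p × r sensitivity matrix Yθ(k) and one r × r noise covariance Qk per sampling point) follows the definitions above; treating α = 95% as a two-sided interval (Student t quantile at 0.975) is an assumption, as is the use of NumPy/SciPy in place of the authors' MATLAB code.

```python
import numpy as np
from scipy import stats


def fisher_information(Y, Q):
    """Equation (3): FIM = sum_k Y_theta(k) Q_k^-1 Y_theta(k)^T.

    Y -- array (N, p, r): output sensitivity matrix Y_theta(k) at each sample k
    Q -- array (N, r, r): measurement-noise covariance matrix Q_k at each sample k
    """
    Y = np.asarray(Y, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p = Y.shape[1]
    fim = np.zeros((p, p))
    for Yk, Qk in zip(Y, Q):
        # solve(Qk, Yk.T) computes Qk^-1 Yk^T without forming the inverse explicitly
        fim += Yk @ np.linalg.solve(Qk, Yk.T)
    return fim


def confidence_intervals(fim, n_data, confidence=0.95):
    """Equations (5)-(6): cov(theta) is approximated by its lower bound FIM^-1, and
    delta_theta_j = t_{N-p} * sqrt((FIM^-1)_jj)."""
    p = fim.shape[0]
    cov = np.linalg.inv(fim)
    t = stats.t.ppf(0.5 + confidence / 2.0, n_data - p)   # two-sided quantile (assumed)
    return t * np.sqrt(np.diag(cov))
```

Because Equation (5) is only a lower bound, intervals computed this way are optimistic; they are best read as a comparative measure between parameter subsets.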

This FIM property was used to calculate the confidence interval Δθj of a given parameter θj with Equation (6) (Seber and Wild, 1989):

$$\Delta\theta_j = t_{\alpha,\,N-p}\,\sqrt{\mathrm{cov}(\theta_j)} \qquad (6)$$

where t is Student's t statistic for a 95% confidence level (α = 95%) and N − p degrees of freedom (number of experimental data points minus the number of parameters p), and cov(θj) was taken as (FIM⁻¹)jj.

As can be observed, calculating the parameter error covariance matrix from the FIM requires inverting it. To be invertible, the FIM must have a nonzero determinant and should not be ill-conditioned. To meet these requirements, no pair of matrix columns should be very similar. As each column of the matrix represents a parameter, the determinant and the condition number of the FIM provide a reasonable measure of the correlation within a set of parameters: less correlated parameters yield a more diagonally dominant matrix. The FIM determinant (D criterion) and the ratio between the highest and the lowest FIM eigenvalues (modE criterion) can therefore be used as criteria for parameter subset selection. A modE value close to unity indicates that all the parameters involved affect the outputs independently, so that the confidence region is shaped like a circle (2 parameters) or a sphere (3 parameters) rather than the ellipses and ellipsoids obtained with correlated parameters. A high D value means lower diagonal elements of the covariance matrix and hence narrower confidence intervals for the parameters. As the D criterion depends on the magnitude of the parameters involved, it was normalized (normD) according to Equation (7):

$$\mathrm{norm}D = D \cdot \lVert \theta \rVert^{2} \qquad (7)$$

where ‖θ‖ is the Euclidean norm of the parameter vector. Such normalization works as a scaling factor and allows comparisons among subsets of the same size but with different parameters.

From a systems engineering point of view, the parameter subset should include those parameters that maximize the normD criterion and minimize the modE criterion. Hence, the ratio between the normD and the modE criteria (RDE criterion) is proposed in the present work as an index for defining parameter subsets for calibration. The RDE criterion (Equation (8)) expresses the capacity of a parameter subset to explain the experimental data with low uncertainty in the estimated parameters.

$$RDE = \frac{\mathrm{norm}D}{\mathrm{mod}E} \qquad (8)$$
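A minimal Python sketch of Equations (7) and (8) for one candidate subset, assuming its p × p FIM and the nominal values of its parameters have already been computed. Because the FIM is symmetric, `numpy.linalg.eigvalsh` is used to obtain its eigenvalues in ascending order. The function name is hypothetical.

```python
import numpy as np


def rde_criterion(fim, theta):
    """Return (normD, modE, RDE) for one parameter subset.

    fim   -- symmetric, positive definite p x p Fisher Information Matrix of the subset
    theta -- nominal values of the p parameters in the subset
    """
    eigvals = np.linalg.eigvalsh(fim)                          # ascending eigenvalues
    norm_d = np.linalg.det(fim) * np.linalg.norm(theta) ** 2   # Equation (7)
    mod_e = eigvals[-1] / eigvals[0]                           # largest / smallest eigenvalue
    return norm_d, mod_e, norm_d / mod_e                       # Equation (8)


norm_d, mod_e, rde = rde_criterion(np.diag([2.0, 8.0]), [1.0, 2.0])
# det = 16 and ||theta||^2 = 5, so normD = 80; modE = 8 / 2 = 4; RDE = 80 / 4 = 20
```

A subset with a large normalized determinant and a modE close to 1 scores highest, which is exactly the combination the selection rule rewards.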

The results of the RDE criterion were compared with the collinearity index (γ) and the determinant measure (ρ) calculated with the methodology of Brun et al. (2002). The ρ criterion indicates the importance of each parameter in explaining the output behaviour, while γ represents the interdependence of all the analyzed parameters. The ρ measure is

$$\rho_K = \det\left( S_K^{T} S_K \right)^{1/(2K)}$$

where SK is the sensitivity matrix of the parameter subset of size K, taken from the full sensitivity matrix S. The index γ is

$$\gamma_K = \frac{1}{\sqrt{\tilde{\lambda}_K}}$$

where λ̃K is the smallest eigenvalue of S̃KᵀS̃K, and S̃K is SK normalized column by column by dividing each column by its Euclidean norm. According to Brun et al. (2002), a high value of γ indicates a high correlation among the parameters, while a high value of ρ indicates that the parameters are not strongly correlated and are able to explain the process behaviour. In that work, γ and ρ were calculated for many parameter combinations of different sizes, but they were not used in a systematic procedure to reduce the number of possible parameter combinations.
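For comparison, γ and ρ can be computed from the sensitivity matrix of a subset as in the Python sketch below. It follows the definitions given above (ρK = det(SKᵀSK)^(1/(2K)), γK = λ̃K^(−1/2) with column-normalized SK); the function name and the toy matrices are illustrative only.

```python
import numpy as np


def brun_indices(S_K):
    """Collinearity index gamma_K and determinant measure rho_K (Brun et al., 2002).

    S_K -- array (n_obs, K): sensitivity columns of the K parameters in the subset
    """
    S_K = np.asarray(S_K, dtype=float)
    K = S_K.shape[1]
    rho = np.linalg.det(S_K.T @ S_K) ** (1.0 / (2 * K))
    S_tilde = S_K / np.linalg.norm(S_K, axis=0)      # unit Euclidean norm per column
    lam_min = np.linalg.eigvalsh(S_tilde.T @ S_tilde)[0]
    gamma = 1.0 / np.sqrt(lam_min)
    return gamma, rho


# Orthogonal columns: gamma = 1 (no collinearity), rho = (4 * 9) ** (1/4) = sqrt(6)
gamma_orth, rho_orth = brun_indices([[2.0, 0.0], [0.0, 3.0], [0.0, 0.0]])
# Nearly parallel columns give a large gamma
gamma_coll, _ = brun_indices([[1.0, 1.0], [1.0, 1.01], [1.0, 0.99]])
```

Because S̃K is column-normalized, γ depends only on the angles between sensitivity vectors, whereas ρ also grows with their magnitudes.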

2.6. Cost function evaluation and minimization

The calibration procedure was based on the minimization of a cost function (Equation (9)) calculated as the Euclidean norm of the difference between the experimental data and the model predictions of ammonium and phosphate concentrations in all reactors at IRR = 2 and IRR = 5.

In Equation (9), scNH4 and scPO4 are scale factors that bring the ammonium and phosphate values to the same order of magnitude. The scale factor scPO4 was set to 1.00 (phosphate was the reference output variable), while scNH4 was calculated as 1.240, 1.033 and 0.630 for the medium, low and high loadings, respectively. This factor was calculated as the ratio between the average of the eight experimental ammonium concentrations and the average of the eight experimental phosphate concentrations. Equation (9) was applied to the medium-load experiment to obtain the calibration cost function (CCF). It was also used to calculate the validation cost function (VCF) with both the low-load and high-load experimental data, in order to compare the predictive capacity of the model when using different parameter subsets (model validation).
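Equation (9) reduces to the Euclidean norm of a scaled residual vector. In the Python sketch below, each input is assumed to be a 2 × 4 array (rows: IRR = 5 and IRR = 2; columns: reactors R1 to R4); this layout and the function name are illustrative choices, and the scale factors are passed in rather than recomputed.

```python
import numpy as np


def cost_function(nh4_exp, nh4_mod, po4_exp, po4_mod, sc_nh4, sc_po4=1.0):
    """Equation (9): Euclidean norm of the scaled NH4 and PO4 residuals.

    Each concentration argument has shape (2, 4): rows IRR = 5 and IRR = 2,
    columns reactors R1-R4. sc_po4 = 1.00 because PO4 is the reference output.
    """
    r_nh4 = sc_nh4 * (np.asarray(nh4_exp, float) - np.asarray(nh4_mod, float))
    r_po4 = sc_po4 * (np.asarray(po4_exp, float) - np.asarray(po4_mod, float))
    # sc^2 * (exp - mod)^2 == (sc * (exp - mod))^2, summed over both IRRs and all reactors
    return float(np.sqrt(np.sum(r_nh4 ** 2) + np.sum(r_po4 ** 2)))


# For the medium-load calibration (CCF), the text uses sc_nh4 = 1.240.
```

The same function yields the CCF or the VCF depending only on which experiment's data and scale factor are supplied.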

The optimization algorithm employed was the MATLAB® function fminsearch, based on the Nelder–Mead simplex method (Lagarias et al., 1998). fminsearch is an unconstrained direct search method and does not use numerical or analytical gradients of the cost function. The tolerances used were 10⁻⁴ for the parameters and 10⁻⁵ for the cost function. Although in practice model parameters require constraints, an unconstrained optimization method was used because constraints may disrupt convergence properties and produce a less realistic covariance matrix (Checchi and Marsili-Libelli, 2005). However, a subsequent analysis of the results can easily detect unrealistic parameter estimates.
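An equivalent unconstrained, derivative-free calibration can be sketched outside MATLAB® with SciPy's Nelder–Mead implementation. This is a substitute for fminsearch, not the authors' code: `xatol` and `fatol` are SciPy's absolute tolerances, set here to the values quoted above, and the exponential toy model and synthetic data are purely illustrative stand-ins for an ASM2d simulation.

```python
import numpy as np
from scipy.optimize import minimize

t = np.linspace(0.0, 10.0, 21)


def toy_model(params, t):
    """Illustrative two-parameter model standing in for an ASM2d simulation."""
    a, b = params
    return a * np.exp(-b * t)


# Synthetic "observations": true parameters (2.0, 0.5) plus a small deterministic deviation
y_obs = toy_model((2.0, 0.5), t) + 0.01 * np.sin(7.0 * t)


def cost(params):
    # Euclidean norm of the residuals, as in Equation (9) without scale factors
    return np.linalg.norm(y_obs - toy_model(params, t))


res = minimize(cost, x0=[1.0, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-4, "fatol": 1e-5})
print(res.x)   # close to [2.0, 0.5]
```

As with fminsearch, no bounds are imposed, so estimates should be checked afterwards for physically unrealistic values.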

The assessment of the calibration quality started by checking the system response with the default values of the ASM2d parameters for all the loadings. These simulations served as a reference to quantify how much the optimized parameter values improved the model predictions. This comparison was performed through visual inspection of the experimental data and model prediction curves, together with the Janus coefficient (Sin et al., 2008) and the CCF and VCF values. The Janus coefficient is defined as the ratio between the mean squared difference between the validation data and the model predictions and the mean squared difference between the calibration data and the model predictions, according to Equation (10),

Fig. 3. Flowchart of the proposed methodology.

V.C. Machado et al. / Environmental Modelling & Software 24 (2009) 1274–12841278

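$$J^2 = \frac{\dfrac{1}{N_{VAL}} \sum_{i=1}^{N_{VAL}} \left(y_{EXP,i} - y_{MOD,i}\right)^2}{\dfrac{1}{N_{CAL}} \sum_{i=1}^{N_{CAL}} \left(y_{EXP,i} - y_{MOD,i}\right)^2} \qquad (10)$$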

where $N_{CAL}$ and $N_{VAL}$ are the numbers of experimental data points used for calibration and validation, respectively, and $y_{EXP}$ and $y_{MOD}$ are the experimental data points and the model predictions, respectively. The closer $J^2$ is to unity, the more similar the performance of the calibrated model on calibration and validation data.
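Equation (10) translates directly into a ratio of two mean squared errors. A minimal sketch follows (the function name `janus_squared` is hypothetical); it returns $J^2$ exactly as written in Equation (10).

```python
import numpy as np

def janus_squared(y_exp_val, y_mod_val, y_exp_cal, y_mod_cal):
    """J^2 of Equation (10): validation MSE divided by calibration MSE."""
    mse_val = np.mean((np.asarray(y_exp_val, float) - np.asarray(y_mod_val, float)) ** 2)
    mse_cal = np.mean((np.asarray(y_exp_cal, float) - np.asarray(y_mod_cal, float)) ** 2)
    return float(mse_val / mse_cal)

# Validation errors twice as large as calibration errors give J^2 = 4
print(janus_squared([1.0, 2.0], [0.0, 1.0], [1.0, 2.0], [0.5, 1.5]))
```

Because each sum is divided by its own number of points, calibration and validation sets of different sizes can be compared.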

3. Results and discussion

3.1. Model calibration procedure using the FIM

The model calibration procedure is based on the sensitivity analysis, the FIM calculations and parameter estimation by minimizing the above-mentioned CCF. Fig. 3 shows a flowchart illustrating the proposed methodology. A pre-selection of the most influential parameters is recommended in the literature (De Pauw, 2005) to prevent a combinatorial explosion of the number of parameter subsets in later steps of the procedure. Therefore, the sensitivity analysis (Equation (2)) is employed for the initial classification of the parameters studied. The first twenty parameters of the ranking are selected as candidates to be evaluated. They are called “seeds” because each one will originate a parameter subset for model calibration.

The RDE criterion is calculated around the ASM2d default parameter values for all possible pairs formed by the first seed and each of the other parameters of the sensitivity ranking. The pair with the maximum RDE value is chosen to continue building a new parameter subset. The next step is to calibrate the model by minimizing the CCF with the selected pair. Once the CCF is optimized, the RDE criterion is recalculated with the optimized parameter values (RDEC, corrected RDE). Note that each new RDE calculation requires evaluating a new FIM (Equation (3)) with the updated set of parameters.
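The RDE bookkeeping can be sketched as follows. Equation (3) and the exact normalization used for normD are given in an earlier section not reproduced here, so this sketch assumes the common form FIM = SᵀQS with Q = diag(1/σ²) and receives normD as an input. The modified E criterion is taken as the ratio of the largest to the smallest FIM eigenvalue, and RDE = normD/modE, which agrees with Table 5 (e.g. 6.57 × 10⁶ / 8.00 ≈ 8.22 × 10⁵). All function names are hypothetical.

```python
import numpy as np

def fisher_information(S, sigma):
    """FIM = S^T Q S with Q = diag(1 / sigma^2) (assumed form of Equation (3)).

    S: (n_obs, k) output sensitivities dy/dtheta; sigma: (n_obs,) measurement std devs.
    """
    S = np.asarray(S, float)
    Q = np.diag(1.0 / np.asarray(sigma, float) ** 2)
    return S.T @ Q @ S

def mod_e(fim):
    """Modified E criterion: ratio of the largest to the smallest FIM eigenvalue."""
    eig = np.linalg.eigvalsh(fim)
    return float(eig.max() / eig.min())

def rde(norm_d, fim):
    """RDE = normalized D criterion divided by the modified E criterion."""
    return norm_d / mod_e(fim)
```

A large normD rewards informative data, while a large modE (an ill-conditioned FIM) penalizes correlated parameters, so their ratio balances both effects.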

The next step is to calculate the RDE for all combinations of the previously selected pair with each of the remaining parameters of the sensitivity ranking. Then, the three-parameter subset with the maximum RDE value is selected for calibration. The CCF is optimized and the RDEC is calculated with the new optimized values of the three parameters. The whole procedure of parameter subset extension is repeated until the current RDEC is lower than the RDEC of the previous step. Note that the procedure adds only one parameter at each step. At this point, the subset of the previous step, which has the maximum RDEC among all the optimized subsets along the iterations, is taken as the final subset produced by the seed under investigation. Then, the subset of the next seed (the second parameter of the sensitivity ranking) is generated. This procedure is repeated until each of the 20 seeds has been investigated.
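The growth loop for one seed can be sketched as below. `rde_at` (FIM-based RDE of a subset at given parameter values) and `calibrate` (CCF minimization over a subset) are hypothetical callables standing in for the plant model; the physical-range check described in the next paragraph is omitted for brevity.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Theta = Dict[str, float]

def grow_subset(seed: str,
                ranking: Sequence[str],
                rde_at: Callable[[List[str], Theta], float],
                calibrate: Callable[[List[str], Theta], Theta],
                theta_default: Theta) -> Tuple[List[str], float, Theta]:
    """Greedy, RDE-driven growth of a calibration subset from one seed."""
    theta = dict(theta_default)
    subset = [seed]
    best: Tuple[List[str], float, Theta] = ([seed], float("-inf"), theta)
    while True:
        candidates = [p for p in ranking if p not in subset]
        if not candidates:
            break
        # Add the candidate that maximizes the RDE at the current parameter values
        new = max(candidates, key=lambda p: rde_at(subset + [p], theta))
        trial = subset + [new]
        theta_opt = calibrate(trial, theta)   # minimize the CCF over `trial`
        rdec = rde_at(trial, theta_opt)       # RDE recomputed at the optimum (RDEC)
        if rdec < best[1]:
            break                             # RDEC dropped: keep the previous subset
        best = (trial, rdec, theta_opt)
        subset, theta = trial, theta_opt
    return best
```

Passing the current `theta` into `calibrate` reproduces the warm start described in the text: previously optimized values plus the ASM2d default of the newly added parameter.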

During the calibration step, if any optimized value of the tested subset lies outside the physical range defined in the ASM2d model (e.g. negative parameter values or yield coefficients higher than 1), the subset is considered not identifiable and is discarded. The extended subset with the next-highest RDE is then calibrated instead of the subset that produced the inappropriate values. It is important to highlight that the initial guess for the optimization of the pair of each seed is the default ASM2d value of each parameter. Optimizations with three or more parameters take their initial guesses from the optimized parameter values of the previous iteration plus the ASM2d default value of the new parameter.
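The rejection rule can be expressed as a small filter over candidate subsets ranked by RDE. In this sketch, `is_physical` and `first_physical_calibration` are hypothetical names, and the caller must say which parameters are yields (the text gives only the two example conditions: no negative values and no yield above 1).

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Theta = Dict[str, float]

def is_physical(theta: Theta, yields: Iterable[str]) -> bool:
    """Reject negative parameter values and yield coefficients above 1."""
    if any(v < 0 for v in theta.values()):
        return False
    return all(theta[y] <= 1.0 for y in yields if y in theta)

def first_physical_calibration(ranked_subsets: List[List[str]],
                               calibrate: Callable[[List[str], Theta], Theta],
                               theta0: Theta,
                               yields: Iterable[str]) -> Optional[Tuple[List[str], Theta]]:
    """Calibrate candidate subsets in descending RDE order; return the first physical one."""
    yields = list(yields)
    for subset in ranked_subsets:
        theta_opt = calibrate(subset, theta0)
        if is_physical(theta_opt, yields):
            return subset, theta_opt
    return None  # no candidate produced physically meaningful values
```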

The next step is to choose the best subset among those produced by all the seeds. The RDE criterion is also used for this purpose, as it indicates how well a subset can explain the experimental process behaviour while producing lower parameter estimation errors. Therefore, the subset with the highest RDEC is considered the best subset.
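Selecting the final subset then reduces to taking the maximum RDEC over all seeds. The sketch below uses the four best-ranked rows of Table 6 (parameter names written in ASCII, e.g. `muA` for μA) and also counts how many seeds converged to the same, order-independent subset.

```python
from collections import Counter

# (seed, final subset, RDEC) for the four best-ranked seeds of Table 6
results = [
    ("bPAO", ("bPAO", "YPO4", "muA"), 2.26e10),
    ("muA", ("muA", "bPAO", "YPO4"), 2.26e10),
    ("YPAO", ("YPAO", "YPO4", "muA"), 1.71e10),
    ("YH", ("YH", "YPO4", "muH", "KPS"), 6.49e9),
]

best_seed, best_subset, best_rdec = max(results, key=lambda r: r[2])
# Seeds reaching the same parameters in a different order count as the same subset
hits = Counter(frozenset(s) for _, s, _ in results)
print(best_seed, sorted(best_subset), best_rdec, hits[frozenset(best_subset)])
```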

Finally, the potential bias problem is studied to evaluate the influence of parameters that do not belong to the calibration subset on the optimized values of the calibration parameters themselves (Brun et al., 2002; De Pauw, 2005). As a simple study, three parameters that could not enter the selected subset during its construction but presented high RDE values are chosen. These parameters are modified around their default values, the subset is re-calibrated and the parameter confidence intervals are recalculated.
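The perturbation scenarios of such a bias study can be generated systematically. This sketch assumes one-at-a-time perturbations of ±5% and ±10% (the percentages used in Section 3.5); each scenario would then be followed by a re-calibration, which is not shown. `perturbation_grid` is a hypothetical helper, and only KMAXP is filled in, with the default of 0.34 implied by the text (0.306 being 10% below it).

```python
from itertools import product

def perturbation_grid(defaults, fractions=(-0.10, -0.05, 0.05, 0.10)):
    """Yield one-at-a-time relative perturbations of the fixed (non-calibrated) parameters."""
    for name, frac in product(defaults, fractions):
        theta = dict(defaults)
        theta[name] = defaults[name] * (1.0 + frac)
        yield name, frac, theta

# Other fixed parameters (e.g. KPS, bPP) would be added to the dictionary in the same way
for name, frac, theta in perturbation_grid({"KMAXP": 0.34}):
    print(name, f"{frac:+.0%}", round(theta[name], 3))
```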

3.2. Building the sensitivity ranking

The sensitivity analysis was employed for the initial classification of the 51 parameters studied. Table 4 presents a ranking of the first 40 ASM2d parameters in descending order of the value of Equation (2). The sensitivity ranking indicated that YPAO, μA and qPHA were the most sensitive parameters among all the parameters studied. Once the sensitivity ranking was built, the first twenty parameters were selected as seeds to be evaluated in the next step of the procedure.

Table 4
Relative sensitivity of the sum of the ammonium and phosphate measurements to the ASM2d parameters around their default values for the medium loading (20 mg N–NH4+/L).

| Order | Parameter | Short description | Related biomass or process | Sensitivity |
|---|---|---|---|---|
| 1 | YPAO | Yield coefficient of PAO (XPAO/XPHA) | PAO | 17.63 |
| 2 | μA | Maximum growth rate of XA | Autotrophic | 14.68 |
| 3 | qPHA | Rate constant for storage of XPHA | PAO | 10.12 |
| 4 | bPAO | Rate for lysis of XPAO | PAO | 7.98 |
| 5 | μH | Maximum growth rate of XH | Heterotrophic | 5.58 |
| 6 | KMAXP | Maximum ratio of XPP/XPAO | PAO | 5.30 |
| 7 | YH | Yield coefficient of XH | Heterotrophic | 5.22 |
| 8 | YPO4 | Poly-phosphate requirement (PO4 release) per PHA stored | PAO | 4.92 |
| 9 | KO2,H | Saturation/inhibition coefficient for oxygen | Heterotrophic | 3.89 |
| 10 | KA,P | Saturation coefficient for acetate | PAO | 3.51 |
| 11 | ηFE | Anaerobic hydrolysis reduction factor | Hydrolysis | 2.76 |
| 12 | KA,H | Saturation coefficient for acetate | Heterotrophic | 2.40 |
| 13 | qPP | Rate constant for storage of XPP | PAO | 2.26 |
| 14 | qFE | Maximum fermentation rate | Heterotrophic | 2.10 |
| 15 | KH | Hydrolysis rate constant | Hydrolysis | 1.74 |
| 16 | bH | Rate for lysis of XH | Heterotrophic | 1.45 |
| 17 | μPAO | Maximum growth rate of XPAO | PAO | 1.34 |
| 18 | KF | Saturation coefficient for growth on SF | Heterotrophic | 1.16 |
| 19 | bPP | Rate for lysis of XPP | PAO | 0.79 |
| 20 | KPS | Saturation coefficient for phosphate in XPP storage | PAO | 0.73 |
| 21 | KX | Saturation coefficient for particulate COD | Hydrolysis | 0.73 |
| 22 | KFE | Saturation coefficient for fermentation on SF | Heterotrophic | 0.70 |
| 23 | ηNO3,P | Reduction factor for denitrification | PAO | 0.68 |
| 24 | KNO3 | Saturation/inhibition coefficient for nitrate | Hydrolysis | 0.54 |
| 25 | KIPP | Inhibition coefficient for XPP storage | PAO | 0.53 |
| 26 | KNO3,P | Saturation coefficient for nitrate | PAO | 0.38 |
| 27 | KPP | Saturation coefficient for XPP | PAO | 0.38 |
| 28 | KO2,P | Saturation/inhibition coefficient for oxygen | PAO | 0.37 |
| 29 | ηNO3,H | Reduction factor under anoxic conditions | Heterotrophic | 0.33 |
| 30 | bPHA | Rate for lysis of XPHA | PAO | 0.10 |
| 31 | YA | Yield of XA per NO3−-N | Autotrophic | 0.08 |
| 32 | KNH4,A | Saturation coefficient for ammonium as substrate | Autotrophic | 0.07 |
| 33 | KALK,A | Saturation coefficient for alkalinity | Autotrophic | 0.05 |
| 34 | KNH4,P | Saturation coefficient for ammonium as nutrient | PAO | 0.03 |
| 35 | KALK,P | Saturation coefficient for alkalinity | PAO | 0.02 |
| 36 | KP,P | Saturation coefficient for phosphate for growth | PAO | 0.02 |
| 37 | KP,A | Saturation coefficient for phosphate as nutrient | Autotrophic | 0.01 |
| 38 | KO2 | Saturation coefficient for oxygen | Hydrolysis | 0.01 |
| 39 | KNO3,H | Saturation/inhibition coefficient for nitrate | Heterotrophic | 0.00 |
| 40 | KPHA,P | Saturation coefficient for XPHA | PAO | 0.00 |
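Selecting the seeds amounts to sorting by sensitivity and keeping the top twenty. The sketch below uses the first 22 rows of Table 4 (ASCII names, e.g. `muA` for μA). KPS and KX tie at 0.73; Python's stable sort keeps the Table 4 order, so KPS remains the twentieth seed.

```python
# First 22 rows of Table 4: parameter -> relative sensitivity (medium loading)
sensitivity = {
    "YPAO": 17.63, "muA": 14.68, "qPHA": 10.12, "bPAO": 7.98, "muH": 5.58,
    "KMAXP": 5.30, "YH": 5.22, "YPO4": 4.92, "KO2_H": 3.89, "KA_P": 3.51,
    "etaFE": 2.76, "KA_H": 2.40, "qPP": 2.26, "qFE": 2.10, "KH": 1.74,
    "bH": 1.45, "muPAO": 1.34, "KF": 1.16, "bPP": 0.79, "KPS": 0.73,
    "KX": 0.73, "KFE": 0.70,
}
N_SEEDS = 20

# Descending sort; equal values keep their insertion (Table 4) order
ranking = sorted(sensitivity, key=sensitivity.get, reverse=True)
seeds = ranking[:N_SEEDS]
print(seeds[:3], seeds[-1])
```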

Table 5
Parameter subset selection procedure applied to the seed bPAO.

| Step | Parameters | D criterion | normD criterion | modE criterion | RDE for parameter selection | RDEC | Optimized values | CCF |
|---|---|---|---|---|---|---|---|---|
| Selecting 2nd parameter | {bPAO, YPO4} | 3.29E+07 | 6.57E+06 | 8.00E+00 | 8.22E+05 | | | |
| | {bPAO, KMAXP} | 1.81E+07 | 2.82E+06 | 1.70E+01 | 1.66E+05 | | | |
| | {bPAO, KO2,H} | 3.96E+06 | 3.17E+05 | 9.30E+01 | 3.41E+03 | | | |
| | {bPAO, μA} | 2.24E+06 | 2.33E+06 | 1.14E+02 | 2.04E+04 | | | |
| | {bPAO, ηFE} | 5.03E+05 | 1.01E+05 | 5.32E+02 | 1.89E+02 | | | |
| Optimizing 2P | {bPAO, YPO4} | 5.84E+09 | 1.17E+09 | 9.40E+01 | | 1.24E+07 | [0.221, 0.165] | 10.11 |
| Selecting 3rd parameter | {bPAO, YPO4, μA} | 1.65E+13 | 1.98E+13 | 5.01E+02 | 3.95E+10 | | | |
| | {bPAO, YPO4, KPS} | 3.11E+13 | 7.46E+12 | 4.26E+02 | 1.75E+10 | | | |
| | {bPAO, YPO4, YPAO} | 1.04E+13 | 6.12E+12 | 1.26E+03 | 4.84E+09 | | | |
| | {bPAO, YPO4, μPAO} | 2.72E+12 | 3.27E+12 | 1.82E+03 | 1.80E+09 | | | |
| | {bPAO, YPO4, KMAXP} | 6.40E+12 | 2.02E+12 | 1.13E+03 | 1.79E+09 | | | |
| Optimizing 3P | {bPAO, YPO4, μA} | 1.49E+13 | 1.78E+13 | 7.89E+02 | | 2.26E+10 | [0.167, 0.195, 1.033] | 9.53 |
| Selecting 4th parameter | {bPAO, YPO4, μA, KPS} | 2.63E+15 | 3.26E+15 | 2.08E+04 | 1.56E+11 | | | |
| | {bPAO, YPO4, μA, KMAXP} | 1.43E+15 | 1.88E+15 | 2.36E+04 | 7.98E+10 | | | |
| | {bPAO, YPO4, μA, bPP} | 9.25E+14 | 1.15E+15 | 3.20E+04 | 3.58E+10 | | | |
| | {bPAO, YPO4, μA, KO2,H} | 6.44E+14 | 7.98E+14 | 3.19E+04 | 2.50E+10 | | | |
| | {bPAO, YPO4, μA, YPAO} | 7.70E+14 | 1.23E+15 | 6.44E+04 | 1.90E+10 | | | |
| Optimizing 4P | {bPAO, YPO4, μA, KPS} | 4.63E+10 | 5.74E+10 | 7.40E+04 | | 7.76E+05 | [0.168, 0.177, 0.812, 3.500] | 4.07 |

3.3. Seeds analysis

The twenty seeds were used to sequentially construct calibration subsets. To illustrate the proposed methodology, Table 5 shows the results of the parameter subset selection procedure applied to the bPAO seed. The first step was to determine the parameter of the sensitivity ranking that, together with bPAO, provided the highest RDE. Table 5 shows the five best-ranked pairs among the nineteen tested. YPO4 was added to the bPAO seed because this pair had the highest RDE value (column “RDE for parameter selection” in Table 5). The next step was the optimization of the pair {bPAO, YPO4} to improve the model prediction. In this two-parameter calibration step, the CCF value was 10.11. The RDEC was calculated with a FIM evaluation around the new parameter values, giving an RDEC of $1.24 \times 10^{7}$. Afterwards, the parameter values were updated and the RDE was calculated for all the three-parameter subsets formed by bPAO, YPO4 and each of the 18 remaining parameters of the sensitivity ranking. Note that the RDE calculation at this point is performed with the updated values of bPAO and YPO4 and with the other parameters set at their default ASM2d values.

μA was the next parameter added to the subset (RDE of $3.95 \times 10^{10}$). Optimization of the subset {bPAO, YPO4, μA} produced a CCF of 9.53, with optimized values of bPAO, YPO4 and μA of 0.167, 0.195 and 1.033, respectively. Again, the RDEC was determined by calculating the FIM of {bPAO, YPO4, μA} at the optimized point, giving an RDEC of $2.26 \times 10^{10}$. As the RDEC of {bPAO, YPO4, μA} was higher than that of {bPAO, YPO4}, the RDEC had increased between two successive iterations and the subset construction had to continue. Hence, repeating the procedure, the RDE was evaluated for the existing three-parameter subset combined with each of the 17 remaining parameters of the sensitivity ranking. Since KPS presented the highest RDE, the four-parameter subset {bPAO, YPO4, μA, KPS} was optimized, obtaining a CCF of 4.07. The RDEC calculated for this subset was $7.76 \times 10^{5}$, which is lower than the RDEC of {bPAO, YPO4, μA}. This broke the loop of subset construction, and the final result generated by the bPAO seed was the three-parameter subset {bPAO, YPO4, μA}.
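The stopping rule applied to the bPAO seed can be checked against the RDEC values of Table 5. This short sketch (hypothetical helper `final_subset`, ASCII names) walks the RDEC trace and keeps the last subset before the first decrease.

```python
# RDEC after each optimization for the bPAO seed (Table 5): 2, 3 and 4 parameters
rdec_trace = [(["bPAO", "YPO4"], 1.24e7),
              (["bPAO", "YPO4", "muA"], 2.26e10),
              (["bPAO", "YPO4", "muA", "KPS"], 7.76e5)]

def final_subset(trace):
    """Return the last subset (and its RDEC) before the RDEC first decreases."""
    best_subset, best_rdec = trace[0]
    for subset, rdec in trace[1:]:
        if rdec < best_rdec:
            break
        best_subset, best_rdec = subset, rdec
    return best_subset, best_rdec

print(final_subset(rdec_trace))
```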

The same methodology was applied to the other 19 seeds. Fig. 4 shows the behaviour of RDEC versus the number of calibration parameters included in the subset for ten seeds. The general trend of RDEC is to increase until the subset size reaches three. When a fourth parameter is added, the modE criterion becomes large enough (reflecting increased parameter correlation) that RDEC decreases. Table 6 presents a summary of the results of the methodology for all the seeds: the generated parameter subset, the optimized values, the confidence intervals calculated with Equation (6) and several performance criteria are shown for each seed. Note that different seeds can generate the same final subset with the same optimum values, as occurs for seeds bPAO and μA, YPO4 and KO2,H, or KPS and μH.

Fig. 4. RDEC versus the number of calibration parameters during the process of building calibration subsets from each seed. [Plot: RDEC on a logarithmic scale versus the number of calibration parameters (2 to 5) for the seeds bPAO, YPO4, KMAXP, bPP, YPAO, KF, μH, μPAO, qFE and YH.]

3.4. Selecting the best subset

Table 6 shows the RDEC, optimized parameter values, confidence intervals, cost function values, Janus coefficient and Brun et al.'s indices of all the subsets generated by the seeds. These data show that several subsets provide a good fit to the experimental data used for calibration (CCF) and for validation (VCF), while presenting low confidence intervals. However, the procedure must be able to decide which subset is the best.

Table 6
Summary of results for the subsets created with the proposed procedure for each of the 20 seeds, ordered by RDEC.

| Seed | Parameters | Optimized values | Parameter confidence interval (%) | Norm of parameter confidence interval (%) | normD | modE | RDEC | CCF | VCF | Janus | ρ | γ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bPAO | bPAO, YPO4, μA | 0.167, 0.195, 1.033 | 5, 17, 4 | 18.1 | 1.78E+13 | 7.89E+02 | 2.26E+10 | 9.53 | 18.39 | 3.35 | 16.0 | 22.7 |
| μA | μA, bPAO, YPO4 | 1.033, 0.167, 0.195 | 4, 5, 17 | 18.1 | 1.78E+13 | 7.89E+02 | 2.26E+10 | 9.53 | 18.39 | 3.35 | 16.0 | 22.7 |
| YPAO | YPAO, YPO4, μA | 0.670, 0.191, 1.063 | 2, 17, 4 | 17.6 | 1.19E+13 | 6.96E+02 | 1.71E+10 | 9.84 | 18.80 | 3.39 | 22.2 | 27.1 |
| YH | YH, YPO4, μH, KPS | 0.969, 0.227, 4.493, 0.482 | 0.8, 21, 12, 75 | 78.8 | 2.14E+15 | 3.30E+05 | 6.49E+09 | 6.87 | 10.92 | 1.90 | 11.2 | 6.80 |
| KMAXP | KMAXP, YPO4, μA | 0.416, 0.226, 0.985 | 6, 20, 4 | 21.3 | 1.81E+12 | 3.39E+02 | 5.34E+09 | 8.76 | 17.37 | 3.36 | 18.4 | 17.4 |
| YPO4 | YPO4, KO2,H, μA | 0.190, 0.298, 1.048 | 17, 13, 4 | 21.7 | 9.31E+11 | 2.31E+02 | 4.03E+09 | 9.60 | 18.50 | 3.47 | 12.2 | 18.6 |
| KO2,H | KO2,H, YPO4, μA | 0.298, 0.190, 1.048 | 13, 17, 4 | 21.7 | 9.31E+11 | 2.31E+02 | 4.03E+09 | 9.60 | 18.50 | 3.47 | 12.2 | 18.6 |
| ηFE | ηFE, YPO4, μA | 0.603, 0.188, 1.056 | 11, 17, 4 | 20.6 | 4.02E+11 | 4.69E+02 | 8.57E+08 | 9.79 | 18.87 | 3.49 | 13.2 | 31.2 |
| bPP | bPP, YPAO, YPO4 | 0.327, 0.682, 0.164 | 22, 2, 24 | 32.6 | 7.17E+11 | 1.33E+03 | 5.41E+08 | 8.77 | 16.68 | 3.06 | 9.2 | 21.2 |
| qPHA | qPHA, YPO4, KPS | 3.716, 0.154, 0.284 | 7, 17, 15 | 23.7 | 7.46E+10 | 2.60E+03 | 2.87E+07 | 8.85 | 16.92 | 3.08 | 5.6 | 14.9 |
| μH | μH, YPO4, KPS | 4.420, 0.182, 0.257 | 9, 19, 15 | 25.8 | 1.29E+11 | 7.44E+03 | 1.73E+07 | 8.75 | 16.65 | 3.17 | 6.0 | 17.3 |
| KPS | KPS, YPO4, μH | 0.257, 0.182, 4.420 | 15, 19, 9 | 25.8 | 1.29E+11 | 7.44E+03 | 1.73E+07 | 8.75 | 16.65 | 3.17 | 6.0 | 17.3 |
| μPAO | μPAO, qPHA, YPO4 | 1.257, 3.766, 0.152 | 8, 8, 17 | 20.4 | 1.10E+10 | 3.26E+03 | 3.39E+06 | 8.93 | 17.14 | 3.17 | 6.4 | 11.7 |
| bH | bH, YPO4, μA | 0.322, 0.177, 1.000 | 84, 20, 7 | 86.6 | 3.94E+09 | 1.45E+03 | 2.72E+06 | 13.50 | 20.11 | 1.22 | 4.7 | 21.3 |
| KA,P | KA,P, YPO4, KMAXP | 4.132, 0.208, 0.416 | 22, 14, 11 | 28.3 | 5.25E+10 | 7.05E+04 | 7.45E+05 | 8.84 | 16.81 | 3.29 | 7.1 | 18.1 |
| KH | KH, YPO4, KPS | 5.421, 0.177, 0.260 | 14, 19, 15 | 27.9 | 1.18E+10 | 2.64E+04 | 4.48E+05 | 8.78 | 16.77 | 3.17 | 5.5 | 18.1 |
| KA,H | KA,H, YPO4, KPS | 8.738, 0.180, 0.261 | 24, 19, 15 | 34.0 | 1.86E+09 | 1.58E+05 | 1.18E+04 | 8.76 | 16.58 | 3.15 | 4.2 | 15.5 |
| KF | KF, YPO4, KPS | 11.22, 0.181, 0.253 | 28, 19, 16 | 37.4 | 1.21E+09 | 5.34E+05 | 2.26E+03 | 8.80 | 16.44 | 3.07 | 4.2 | 19.8 |
| qPP | qPP, qPHA | 2.800, 2.509 | 219, 7 | 219 | 5.51E+04 | 8.91E+03 | 6.18E+00 | 9.43 | 43.40 | 4.09 | 1.04 | 1.60 |
| qFE | qFE, YPO4 | 27.73, 0.193 | 58, 6 | 58.3 | 3.93E+04 | 3.08E+07 | 1.28E−03 | 9.56 | 18.29 | 2.86 | 6.4 | 4.7 |

From Table 6, the three best-ranked subsets according to the RDEC criterion are those from the bPAO, μA and YPAO seeds, although the composition and parameter values of the first two subsets are identical. The fourth subset was the only four-parameter subset selected by the methodology. This size resulted in a less reliable subset with excessively high confidence intervals. On the other hand, the Janus coefficient evaluates the predictive capacity of the model using calibration and validation data. This coefficient showed values around 3.35 for the first three subsets (all with three parameters), while a value of 1.90 was obtained for the four-parameter subset. Although these values are acceptable, they show that, among the four best-ranked subsets, the one grown from the YH seed performs best when only this criterion is considered. However, this index does not take into account the confidence intervals of the parameters, so overfitting problems are likely to appear.

Therefore, selecting {bPAO, YPO4, μA} as the best subset, as indicated by the RDEC criterion, seems a good decision: it balances an appropriate description of the calibration and validation data with parameter values comparable to the literature and one of the lowest confidence intervals. In addition, it appears twice in the list because it was obtained from two different seeds following different paths, which corroborates the identifiability of this subset. The selected subset reduced the CCF value from 34.16 with the default ASM2d parameters to 9.53, which is graphically confirmed by Figs. 5 and 6. Fig. 5 presents the experimental ammonium and phosphate concentrations and the model predictions for the experiments performed at low, medium and high load at both IRR tested. The simulated profiles with default ASM2d parameters followed the tendency of both variables (ammonium and phosphate) very well, even though there were large differences between the simulated values and the experimental data. This simulation underestimated P-removal and overestimated ammonium removal. Therefore, the model with the default values of the ASM2d parameters predicted a lower PAO activity and a higher autotrophic activity than observed experimentally.

Fig. 5. Concentration of ammonium (left) and phosphate (right) in the influent and the four reactors of the A2/O pilot plant obtained in the experiments performed at low (15 mg N–NH4+/L), medium (20 mg N–NH4+/L) and high (30 mg N–NH4+/L) loading and internal recycle ratios of 2 and 5. Model predictions using the default values of ASM2d parameters are shown with lines (solid and dashed). [Plot panels: NH4+ (mg N/L) and PO4³⁻ (mg P/L) versus plant location (Inf., R1–R4) for loads of 15, 20 and 30; experimental (Exp.) and simulated (Sim.) values for IRR = 2 and IRR = 5.]

Fig. 6 shows the plant data and the model predictions with the optimized values of the calibration subset. A good agreement between the experimental data and the model predictions is observed, especially for low and medium loads. However, some discrepancies are observed for the ammonium profile at high loads.

Fig. 6. Concentration of ammonium (left) and phosphate (right) in the influent and the four reactors of the A2/O pilot plant obtained in the experiments performed at low (15 mg N–NH4+/L), medium (20 mg N–NH4+/L) and high (30 mg N–NH4+/L) loading and internal recycle ratios of 2 and 5. Medium loading was used for calibration and low and high loadings were used for validation. Model calibration was performed with parameter subset {bPAO, YPO4, μA}. Model predictions are shown with lines (solid and dashed). [Plot panels as in Fig. 5.]

The optimized values of bPAO, YPO4 and μA were 0.167, 0.195 and 1.033, respectively. The YPO4 value was about 50% lower than its default value (0.4 g P/g COD), probably due to the presence of glycogen accumulating organisms (GAO): simultaneous VFA uptake by GAO and PAO would lead to a lower observed ratio of anaerobic P release to VFA uptake. GAO could therefore be considered an external, unmeasured process disturbance that affected the optimized values of the PAO parameters.

Table 6 also shows that YPO4 is present in most subsets and is usually the first parameter added to the seed. This indicates that YPO4 is the parameter that most increases the RDE criterion for most seeds. This is probably because the phosphate processes in the pilot plant were less well described by the default ASM2d parameter values than the processes related to nitrogen removal. Besides, only one parameter related to autotrophic processes (μA) appears in many subsets, and its optimized value (1.033) is very close to the default ASM2d value (1.000). Comparing all the subsets containing YPO4 and μA, the optimized values of both parameters were close to those obtained with the selected subset. Taking the optimized values of YPO4 from all the subsets and excluding repeated values, an average of 0.187 ± 0.021 was obtained. Repeating this procedure for μA gave an average of 1.031 ± 0.032. These low standard deviations were of the same magnitude as the parameter confidence intervals obtained when analyzing an individual subset, which provides confidence in the parameter estimates. Nevertheless, a potential bias problem should be investigated to confirm this observation.
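The reported averages can be reproduced from Table 6. In this sketch, “excluding repeated values” is read as counting each distinct subset once (e.g. the bPAO and μA seeds give the same subset); with that reading and the sample standard deviation, the printed results match the values in the text.

```python
from statistics import mean, stdev

# Optimized YPO4 and muA values from Table 6, one value per distinct subset
ypo4 = [0.195, 0.191, 0.227, 0.226, 0.190, 0.188, 0.164, 0.154,
        0.182, 0.152, 0.177, 0.208, 0.177, 0.180, 0.181, 0.193]
mua = [1.033, 1.063, 0.985, 1.048, 1.056, 1.000]

for name, values in (("YPO4", ypo4), ("muA", mua)):
    # stdev() is the sample (n - 1) standard deviation
    print(f"{name}: {mean(values):.3f} +/- {stdev(values):.3f}")
# YPO4: 0.187 +/- 0.021
# muA: 1.031 +/- 0.032
```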

Table 6 also compares the results obtained with the criteria proposed by Brun et al. (2002). A high value of ρ indicates that the parameters are not strongly correlated and are able to explain the process behaviour. The results show good agreement between ρ and RDEC, as high values of ρ are observed for the subsets ranked highest by RDEC. Both criteria indicate good identifiability of these parameter subsets. In contrast, the collinearity index γ shows a greater dispersion and does not allow discriminating between subsets. According to Brun et al. (2002), if γ exceeds an empirically found threshold of approximately 10–15, the corresponding parameter subset is poorly identifiable. However, the results reported in Table 6 show that subsets with γ values of about 20 still yield relatively low errors in the parameter estimates.

3.5. Study of the potential bias problem

From Table 5 it is possible to identify KPS, KMAXP and bPP as the three parameters that could not enter the subset {bPAO, YPO4, μA} but provided the highest RDE values when the fourth candidate parameter was analyzed. Hence, these “fixed parameters” were selected for the bias study. Each was modified by 5% and 10% above and below its ASM2d default value. Then, bPAO, YPO4 and μA were re-calibrated and the parameter confidence intervals were re-evaluated by FIM calculations. The confidence intervals remained around their original values of 5%, 17% and 4%. This held for all the tested combinations of fixed parameter values, except when KMAXP was set to 0.306, which is 10% lower than its ASM2d default value. In this case, the CCF was 21.25, more than twice the value obtained in the original calibration of bPAO, YPO4 and μA (9.53). Moreover, the confidence intervals of bPAO, YPO4 and μA rose to 51%, 176% and 20%, respectively, well above those of the original calibration. Therefore, the results obtained with the selected subset are valid only for KMAXP values within ±5% of its ASM2d default value, but are not affected by a 10% variation of KPS and bPP.

Finally, the improvement of the CCF was also assessed for the calibration of subsets composed of {bPAO, YPO4, μA} plus one additional parameter (KPS, KMAXP or bPP). Although these calibrations yielded lower CCF values (4.07, 6.08 and 6.54, respectively), they had the drawback of increasing the confidence intervals of all the subset parameters, to a norm of 89%, 1059% and 150%, respectively. These results corroborate that the procedure was able to select a subset with a suitable number of identifiable parameters.
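The confidence-interval figures used above can be sketched as follows. Equation (6) is not reproduced in this excerpt, so `relative_ci` assumes the usual FIM-based form, 100·t·√((FIM⁻¹)ᵢᵢ)/|θᵢ|, with the Student-t value supplied by the caller. The “norm” appears to be the Euclidean norm of the individual percentages; this reproduces Table 6, e.g. the YH subset.

```python
import numpy as np

def relative_ci(fim, theta, t_value):
    """Relative confidence intervals (%) from the inverse FIM (assumed form of Eq. (6))."""
    cov = np.linalg.inv(np.asarray(fim, float))  # parameter covariance approximation
    return 100.0 * t_value * np.sqrt(np.diag(cov)) / np.abs(np.asarray(theta, float))

def ci_norm(ci_percent):
    """Euclidean norm of the individual relative confidence intervals."""
    return float(np.linalg.norm(ci_percent))

print(round(ci_norm([0.8, 21, 12, 75]), 1))  # YH subset in Table 6 -> 78.8
```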

3.6. Computational performance of the proposed methodology

All calculations were performed with MATLAB® on an ASUS computer with an AMD Athlon 64 X2 Dual Core Processor 4000+ (2.1 GHz) and 2 GB of RAM. Sensitivity calculations were performed just once at the beginning of the whole methodology and took around 10 min. FIM calculations (from 20 s for 2 parameters up to 100 s for 6 parameters) were fast compared with the optimization task (from 10 min for 2 parameters up to 50 min for 6 parameters). For a seed that produced a final subset of 3 parameters, the required calculations were as follows:

• FIM and RDE of 19 pairs (combinations of the seed with each of the 19 remaining parameters of the sensitivity ranking)
• 1 optimization step with 2 parameters (the seed and the parameter that presented the highest RDE together with the seed)
• 1 FIM calculation of the 2 optimized parameters for computing the RDEC
• FIM and RDE of 18 three-parameter subsets (combinations of the optimized pair with each of the 18 remaining parameters of the sensitivity ranking)
• 1 optimization step with 3 parameters (the three-parameter subset that presented the highest RDE)
• 1 FIM calculation of the 3 optimized parameters for computing the RDEC
• FIM and RDE of 17 four-parameter subsets (combinations of the optimized three-parameter subset with each of the 17 remaining parameters of the sensitivity ranking)
• 1 optimization step with 4 parameters (the four-parameter subset that provided the highest RDE)
• 1 FIM calculation of the 4 optimized parameters for computing the RDEC

Since the final subset is assumed to have 3 parameters, the RDEC of the 4 optimized parameters is lower than the RDEC of the 3 optimized parameters. In terms of computational effort, investigating one seed took around 1 h with the software and hardware described above.
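The task list above generalizes to any final subset size. This sketch (hypothetical helper `seed_workload`) counts the RDE screenings, optimizations and RDEC evaluations for one seed drawn from a 20-parameter ranking. It assumes, as in the list, that the loop stops one step after the final size, when the larger subset shows a lower RDEC.

```python
def seed_workload(n_ranked=20, final_size=3):
    """Count RDE screenings, optimizations and RDEC evaluations for one seed."""
    rde_screenings = 0
    for size in range(2, final_size + 2):          # subset sizes 2 .. final_size + 1
        rde_screenings += n_ranked - (size - 1)    # candidates not yet in the subset
    optimizations = final_size                      # one per size 2 .. final_size + 1
    rdec_evaluations = final_size                   # one FIM at each optimum
    return rde_screenings, optimizations, rdec_evaluations

print(seed_workload())  # (54, 3, 3): 19 + 18 + 17 screenings, as listed above
```

Since each optimization takes minutes while each FIM takes seconds, the handful of optimizations dominates the roughly one-hour cost per seed.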

4. Conclusions

This work proposed a systematic methodology for reducing the number of calibration parameters of the ASM2d model, using the RDE criterion based on the FIM. Starting from 51 ASM2d parameters, the methodology recommended calibrating the model with only 3 parameters, {bPAO, YPO4, μA}, whose optimized values were 0.167, 0.195 and 1.033, with confidence intervals of 5%, 17% and 4%, respectively. Starting from each of the 20 best-ranked parameters of the sensitivity ranking (the “seeds”), the methodology grows parameter subsets with the elements that maximize the RDE. Because every best-ranked parameter of the sensitivity ranking is tested as a seed, the parameter space is explored better than with methodologies based exclusively on a sensitivity analysis or its derived criteria. The proposed methodology does not test all the possible combinations of the 20 parameters, i.e., around $10^{6}$ combinations (considering all combinations of 2, 3, 4, up to 20 parameters), which would be time-consuming. Additionally, the methodology does not optimize subsets with a high number of parameters, which would probably demand substantial computational resources. Instead, it adds to each seed parameters that simultaneously minimize the parameter confidence intervals and help explain the experimental process behaviour, because the subset produced by a seed has the highest RDE value during the iterations. This procedure has the important advantage that it does not require expert knowledge and it automatically defines the dimension of the identifiable subset without requiring a threshold for the RDE. In addition, it is not an iterative procedure, and it reduces the possibility of finding a local optimum because the minimization is performed from different starting points and following different paths for each seed.
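The figure of around $10^{6}$ combinations follows from counting every subset of 2 to 20 parameters. This is 2²⁰ minus the empty set and the 20 single-parameter subsets:

```python
from math import comb

n = 20
# All subsets containing 2, 3, ..., 20 of the 20 ranked parameters
exhaustive = sum(comb(n, k) for k in range(2, n + 1))
print(exhaustive)  # 1048555, i.e. about 10^6
```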

Acknowledgment

Vinicius Cunha Machado has received a pre-doctoral scholarship from AGAUR (Agència de Gestió d’Ajuts Universitaris i Recerca – Catalonia, Spain), within programs of the European Community Social Fund. The authors acknowledge the financial support provided through the European Community’s Human Potential Program under contract HPRN-CT-2001-00200 [WWT & SYSENG] and by the “Comisión Interministerial de Ciencia y Tecnología” (CICYT), project CTQ2007-61756/PPQ. The authors are members of the GENOCOV group (Grup de Recerca Consolidat de la Generalitat de Catalunya, SGR05–00721).

References

APHA, 1995. Standard Methods for the Examination of Water and Wastewater. American Public Health Association.

Baeza, J.A., Gabriel, D., Lafuente, J., 1999. An expert supervisory system for a pilot WWTP. Environmental Modelling & Software 14 (5), 383–390.

Baeza, J.A., Gabriel, D., Lafuente, J., 2002. Improving the nitrogen removal efficiency of an A2/O based WWTP by using an on-line knowledge based expert system. Water Research 36 (8), 2109–2123.

Benedetti, L., Bixio, D., Claeys, F., Vanrolleghem, P.A., 2008. Tools to support a model-based methodology for emission/immission and benefit/cost/risk analysis of wastewater systems that considers uncertainty. Environmental Modelling & Software 23, 1082–1091.

Brun, R., Kühni, M., Siegrist, H., Gujer, W., Reichert, P., 2002. Practical identifiability of ASM2d parameters – systematic selection and tuning of parameter subsets. Water Research 36 (16), 4113–4127.

Checchi, N., Giusti, E., Marsili-Libelli, S., 2007. PEAS: a toolbox to assess the accuracy of estimated parameters in environmental models. Environmental Modelling & Software 22, 899–913.

Checchi, N., Marsili-Libelli, S., 2005. Reliability of parameter estimation in respirometric models. Water Research 39, 3686–3696.

Copp, J.B., 2002. The COST Simulation Benchmark – Description and Simulator Manual. Office of Official Publications of the European Communities, Luxembourg.

De Pauw, D.J.W., 2005. Optimal Experimental Design for Calibration of Bioprocess Models: A Validated Software Toolbox. PhD thesis in Applied Biological Sciences, BIOMATH, University of Gent. Available from: http://biomath.ugent.be/publications/download/.

Dochain, D., Vanrolleghem, P.A., 2001. Dynamical Modelling and Estimation in Wastewater Treatment Processes. IWA Publishing, London.

Ferrer, J., Seco, A., Serralta, J., Ribes, J., Manga, J., Asensi, E., Morenilla, J.J., Llavador, F., 2008. DESASS: a software tool for designing, simulating and optimising WWTPs. Environmental Modelling & Software 23, 19–26.

Flores-Alsina, X., Rodríguez-Roda, I., Sin, G., Gernaey, K.V., 2008. Multi-criteria evaluation of wastewater treatment plant control strategies under uncertainty. Water Research 42, 4485–4497.

Freni, G., Mannina, G., Viviani, G., 2009. Identifiability analysis for receiving water body quality modelling. Environmental Modelling & Software 24, 54–62.

Fu, G., Butler, D., Khu, S., 2008. Multiple objective optimal control of integrated urban wastewater systems. Environmental Modelling & Software 23, 225–234.


García-Usach, F., Ferrer, J., Bouzas, A., Seco, A., 2006. Calibration and simulation of ASM2d at different temperatures in a phosphorus removal pilot plant. Water Science and Technology 53 (12), 199–206.

Guisasola, A., Baeza, J.A., Carrera, J., Sin, G., Vanrolleghem, P.A., Lafuente, J., 2006. The influence of the experimental data quality and quantity on parameter estimation accuracy: Andrews substrate inhibition model as a case study. Education for Chemical Engineers 1, 139–145.

Henze, M., Grady Jr., C.P.L., Gujer, W., Marais, G.R., Matsuo, T., 1987. Activated Sludge Model No. 1, IAWQ Scientific and Technical Report No. 1. IAWQ, London.

Henze, M., Gujer, W., Mino, T., van Loosdrecht, M.C.M., 2000. Activated Sludge Models ASM1, ASM2, ASM2d and ASM3: Scientific and Technical Report No. 9. IWA Task Group on Mathematical Modelling for Design and Operation of Biological Wastewater Treatment. IWA Publishing, London.

Hulsbeek, J.J.W., Kruit, J., Roeleveld, P.J., van Loosdrecht, M.C.M., 2002. A practical protocol for dynamic modelling of activated sludge systems. Water Science and Technology 45 (6), 127–136.

Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E., 1998. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization 9 (1), 112–147.

Marsili-Libelli, S., Giusti, E., 2008. Water quality modelling for small river basins. Environmental Modelling & Software 23, 451–463.

Melcer, H., Dold, P.L., Jones, R.M., Bye, C.M., Takács, I., Stensel, H.D., Wilson, A.W., Sun, P., Bury, S., 2003. Methods for Wastewater Characterization in Activated Sludge Modelling. Water Environment Research Foundation (WERF), Alexandria, VA, USA.

Norton, J.P., 2008. Algebraic sensitivity analysis of environmental models. Environmental Modelling & Software 23, 963–972.

Reichert, P., Vanrolleghem, P., 2001. Identifiability and uncertainty analysis of the River Water Quality Model No. 1 (RWQM1). Water Science and Technology 43 (7), 329–338.

Rivas, A., Irizar, I., Ayesa, E., 2008. Model-based optimisation of wastewater treatment plants design. Environmental Modelling & Software 23, 435–450.

Ruano, M.V., Ribes, J., De Pauw, D.J.W., Sin, G., 2007. Parameter subset selection for the dynamic calibration of activated sludge models (ASMs): experience versus systems analysis. Water Science and Technology 56 (8), 107–115.

Seber, G.A.F., Wild, C.J., 1989. Nonlinear Regression. Wiley, New York.

Sin, G., De Pauw, D.J.W., Weijers, S., Vanrolleghem, P.A., 2008. An efficient approach to automate the manual trial and error calibration of activated sludge models. Biotechnology and Bioengineering 100, 516–528.

Söderström, T., Stoica, P., 1989. System Identification. Prentice-Hall, Englewood Cliffs, New Jersey.

Takács, I., Patry, G.G., Nolasco, D., 1991. A dynamic model of the clarification-thickening process. Water Research 25 (10), 1263–1271.

Vanrolleghem, P.A., Insel, G., Petersen, B., Sin, G., De Pauw, D., Nopens, I., Weijers, S., Gernaey, K., 2003. A comprehensive model calibration procedure for activated sludge models. In: Proceedings: WEFTEC 76th Annual Technical Exhibition and Conference, October 11–15, Los Angeles, California.

Vanrolleghem, P.A., Benedetti, L., Meirlaen, J., 2005. Modelling and real-time control of the integrated urban wastewater system. Environmental Modelling & Software 20, 427–442.

Weijers, S.R., Vanrolleghem, P.A., 1997. A procedure for selecting best identifiable parameters in calibrating activated sludge model No. 1 to full-scale plant data. Water Science and Technology 36 (5), 69–79.