A robust and efficient triangulation-based optimization algorithm for stochastic black-box systems


Computers and Chemical Engineering 60 (2014) 143–153. http://dx.doi.org/10.1016/j.compchemeng.2013.09.003


J.A. McGill, B.A. Ogunnaike*, D.G. Vlachos*

Department of Chemical and Biomolecular Engineering, University of Delaware, 150 Academy Street, Colburn Laboratory, Newark, DE 19716, United States

* Corresponding authors. E-mail addresses: [email protected] (J.A. McGill), [email protected] (B.A. Ogunnaike), [email protected] (D.G. Vlachos).

Article history: Received 1 April 2013; Accepted 4 September 2013; Available online 16 September 2013

Keywords: Stochastic optimization; Equation-free systems; Simulation-based optimization; Kinetic Monte Carlo

Abstract

Optimization of process variables is an important, yet difficult, task for systems-level analysis and design of complex stochastic systems. Here, we introduce the Simplex-Triangulation Optimization (STO) algorithm to optimize stochastic black-box systems efficiently, in fewer iterations than other comparable algorithms, without requiring gradient information or detailed initial guesses. The STO algorithm is shown to converge linearly. Several test functions are utilized to compare the STO algorithm to the Particle Swarm Optimization (PSO) and Finite Difference Stochastic Approximation (FDSA) algorithms, which are often used for parameter optimization in stochastic systems.


1. Introduction

Consider a physical process with nonlinear dynamics that can be described by the following discrete-time recurrence relation,

x(t + Δt) = f(t, x(t), u(t))    (1)

where x(t) is a vector of process state variables at time t, u(t) is a set of manipulated variables affecting the state, and f(·) is a nonlinear function describing the dynamics of the process. For many applications, a closed-form expression for f(·) may not exist, and realizations of Eq. (1) may require simulation, such as kinetic Monte Carlo or molecular dynamics. Systems lacking a closed-form expression for the state dynamics are collectively known as black-box systems, and the design, optimization, or control of such systems is challenging. Optimization problems involving systems without closed-form expressions are known in the literature as simulation-based optimizations (SO) (Fu, Glover, & April, 2005) or black-box optimizations (Jones, Schonlau, & Welch, 1998; Shan & Wang, 2010). Formally, in SO, the values of a set of manipulated variables (simulation inputs) are determined such that simulation output(s) are driven to some optimal desired value(s). Parameter estimation in a model or simulation is a particularly important application of SO and black-box optimization, but many other applications of SO can be found in operations research (Fu et al., 2005), process control (Christofides, Armaou, Lou, & Varshney, 2009), and related literature. An excellent introduction and tutorial on SO can be found in Fu (2002). In addition to the computational costs associated with simulation evaluations, a number of other issues complicate SO. For example, problems requiring SO often involve a high degree of stochasticity and/or nonlinearity. Mixtures of continuous and discrete variables in the manipulated variables or outputs of a simulation are also common in the optimization of these systems. Several classes of SO tools have been developed to handle many of the aforementioned issues. For example, stochastic gradient-based methods have been developed that guarantee asymptotic convergence to a (local) minimum (Spall, 1998, 2003) but require continuous manipulated variables and very good initial guesses, especially in high-dimensional problems. While these algorithms can converge quickly, the gradients they require must often be estimated numerically, greatly increasing the total number of simulation evaluations (methods that exploit the specific underlying structure of a simulation exist, e.g., Fu, 2002). Other methods, such as random search or simulated annealing, also have well understood convergence properties (Spall, 2001) and do not require good initial guesses, but they require a large number of simulation evaluations, which is undesirable for computationally intensive simulations.

Agent-based direct search algorithms, such as particle swarm optimization (PSO) (Kennedy & Eberhart, 1995) and ant colony optimization (Dorigo, Di Caro, & Gambardella, 1999), rely on a collection of computational agents operating in parallel, where each agent has some form of memory and/or the ability to communicate with other computational agents. For the sake of brevity, we will limit our discussion of agent-based algorithms to PSO. While the literature on the convergence criteria for PSO only discusses limited special cases (see Clerc & Kennedy, 2002; Trelea, 2003), the algorithm is effective in practice, can easily be parallelized, and has little per-iteration computational overhead. PSO attempts to determine the optimum of a function globally over a closed or compact interval and, as such, simply requires the bounds on the interval to initialize the algorithm, without requiring a specific initial guess. Unfortunately, PSO may need a large number of iterations to converge to a solution, particularly when only noisy measurements of the objective function are available. Although a large number of iterations is typically required, PSO will converge to the global minimum of a given function, making it an easily implemented, parallelizable, and gradient-free optimization method. A method that leverages the strengths of current agent-based direct search algorithms (robustness, parallelization) while mitigating their weaknesses (large numbers of function evaluations) would be a powerful tool for simulation-based optimization.

In this work, we introduce the Simplex-Triangulation Optimization (STO) agent-based algorithm, which we developed for low-dimensional noisy objective functions that are expensive to evaluate. STO attempts to solve the following problem:

min_u L(x, u, t)

s.t. x(t + Δt) = f(t, x(t), u(t))

     u ∈ Ω, Ω ⊂ R^D    (2)

where L(·) is the objective function to be minimized, x is the vector of process state variables, f(·) describes the process dynamics, u is the D-dimensional vector of manipulated variables (MV), t is time, and Ω is the allowable domain of the MV. In STO, agents are assigned to evaluate L at different locations in Ω.

The rest of this work is organized as follows. Section 2 describes several objective functions used to assess STO and the other algorithms. Section 3 describes the STO algorithm in detail and provides background on the PSO and FDSA algorithms. Section 4 compares STO, PSO, and FDSA across several optimizations of the objective functions. Finally, Section 5 provides a discussion of the findings.

2. Objective functions

This section describes the objective functions that were used to assess the performance of the optimization methods. We consider two traditional static test problems (where the objective function is a function only of u), as well as a dynamic test problem (where the objective function is a function of both x and u). For controllable dynamics at steady state (Harmon Ray, 1981), L(x, u, t) in Eq. (2) can be represented as a function of u alone,

L(u) = L(x|t=∞, u, t∞)    (3)

and, thus, STO can be compared to PSO and FDSA using traditional objective functions; in this work, two standard test objective functions, the Alpine function and the Griewank function, were used. These functions have been used to assess the performance of PSO (Clerc & Kennedy, 2002; Trelea, 2003). We have included an additive random noise term in each of the functions to benchmark STO, PSO, and FDSA in the presence of different levels of noise. The Alpine function has the following form,

L(u) = − ∏_{i=1}^{D} [ √ui sin(ui) ] + N(0, σ²)    (4)

where N(0, σ²) represents normally distributed, zero-mean noise with variance σ². For the D-dimensional hypercube Ω = [0, 10]^D, the absolute minimum of E(L) occurs at u* ≈ [7.917, ···, 7.917]^T (Clerc & Kennedy, 2002).

Fig. 1. Reaction network of single gene expression.

The Griewank function has the following form,

L(u) = (1/4000) ∑_{i=1}^{D} ui² − ∏_{i=1}^{D} cos(ui/√i) + 1 + N(0, σ²)    (5)

where, again, N(0, σ²) represents normally distributed, zero-mean noise with variance σ². The absolute minimum of E(L) occurs at u* = 0 for any dimensionality of u. In two dimensions on the interval [−60, 60]², the Griewank function has more than 500 peaks and valleys where 0 ≤ E(L) < 4, providing a much more challenging objective function to minimize than the Alpine function.
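For concreteness, the two noisy test functions translate directly into code. The following Python sketch (ours, not from the article's implementation; NumPy-based) evaluates Eqs. (4) and (5), including the additive N(0, σ²) term:

import numpy as np

def alpine(u, sigma2=0.0, rng=None):
    # Eq. (4): negative product of sqrt(u_i)*sin(u_i), plus zero-mean noise
    if rng is None:
        rng = np.random.default_rng()
    u = np.asarray(u, dtype=float)
    return -np.prod(np.sqrt(u) * np.sin(u)) + rng.normal(0.0, np.sqrt(sigma2))

def griewank(u, sigma2=0.0, rng=None):
    # Eq. (5): quadratic bowl minus an oscillatory product, plus noise
    if rng is None:
        rng = np.random.default_rng()
    u = np.asarray(u, dtype=float)
    i = np.arange(1, u.size + 1)
    return (np.sum(u**2) / 4000.0 - np.prod(np.cos(u / np.sqrt(i)))
            + 1.0 + rng.normal(0.0, np.sqrt(sigma2)))

# Noiseless spot checks near the known minimizers
print(alpine([7.917, 7.917]))   # close to the 2D Alpine minimum
print(griewank([0.0, 0.0]))     # exactly 0 at the Griewank minimum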

In addition to the Alpine and Griewank functions, the optimization methods were compared when optimizing a quadratic objective function incorporating the dynamics of the gene regulatory network shown in Fig. 1. The network represents the dynamics of the number of RNA molecules (R) and the number of protein molecules (P) as functions of the four rate constants and the amount of DNA (D) present.

The state (x) and input (u) vectors for this system are

x = [R, P]^T    (6)

u = [kr, kp]^T    (7)

When the rate constants, kr and kp, are considered as manipulated inputs, the system is control-affine with the following form,

ẋ = h0(x) + ∑_i hi(x) ui    (8)

where hi is a vector-valued function of the system state.

The system shown in Fig. 1 was simulated using the kinetic Monte Carlo (KMC) method, which solves the chemical master equation (Gillespie, 1992) and is appropriate for modeling the kinetics of small numbers of molecules. We verified that the optimization routines converged to the global minimum using the analytic solution to Eq. (8) (Thattai & van Oudenaarden, 2001). In addition, the noise present in the stochastic dynamics of the reaction network varies with location in Ω, a challenge not present in either the Alpine or the Griewank function. These attributes make this simple reaction network an excellent test case. The objective function used in this work is

L(x, u) = (1 − P̄(u)/θP)² + (1 − R̄(u)/θR)²    (9)

where P̄ is an estimate of the expected value of P, R̄ is an estimate of the expected value of R, θP is the desired protein setpoint, and θR is the desired RNA setpoint. In this work, we limit the optimization to steady-state estimates of P̄ and R̄. Table 1 provides the initial conditions for R and P, and Table 2 provides the fixed rate parameters γr and γp used in this work. Note that for all simulations we held D = 1 for the entirety of the simulation.


Table 1
Initial conditions used for the state variables of the gene-regulatory network model.

State variable    Initial condition
R                 0
P                 0

3. Algorithms

3.1. Simplex-Triangulation Optimization

The Simplex-Triangulation Optimization (STO) algorithm utilizes multiple, independent computational agents to find a global solution to Eq. (2) iteratively. Each agent is assigned a set of values, u, and evaluates L(u) (obtained from some computationally expensive black-box simulation or process). The set U consists of all sampled u from the current and previous iterations, with U ⊆ Ω. At each iteration, a mesh is constructed from U which connects a point u to all of its nearest neighbors via a mathematical construct called a triangulation; as explained in detail later, this mesh provides information for agent placement in the next iteration. More precisely, each u ∈ U becomes a vertex in a D-dimensional triangulation such that the triangulation fully tessellates Ω with D-simplexes (each simplex containing D + 1 vertices). As a concrete example, for a 2-dimensional Ω (i.e., each u = [u1, u2]^T), D = 2, and the triangulation that tessellates this Ω is composed of simplexes that each have 3 vertices (D + 1 = 3). Thus, the number of vertices in each simplex is related to the dimensionality of Ω.

Let a point, u* ∈ U, be chosen such that

u* = arg min_{u ∈ U} L(x, u, t)    (10)

where u* represents the current best guess for the minimizer of L. u* can be refined using information gathered from the triangulation of U, specifically nearest-neighbor information and the volume of each simplicial element. Refining u* requires two steps. First, all simplexes incident to u* are found and their centers of mass (centroids) are calculated. The centroid of a D-simplex is defined as follows,

c = (1/(D + 1)) ∑_{k=1}^{D+1} uk    (11)

where uk is a vertex of the simplex. Agents evaluate the value of L(·) at the centroids of simplexes incident to u*, which allows refinement of u* in the region of Ω surrounding u*. Second, some agents are assigned to evaluate L(·) at the centroids of simplexes that have a large volume, where the volume, V, is defined by

V = (1/D!) |det(u1 − u0, ···, uD − u0)|    (12)

where D is the dimensionality of the simplex, ui is a vertex of the simplex, V is the volume, and det(·) is the determinant operator. We have found that evaluating L(·) at the centroids of large-volume simplexes helps prevent convergence to local extrema (see Section 3.1.3 for details). U is then updated with all newly sampled points, and this process is repeated until some stopping criterion is achieved.
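As a minimal illustration (our own sketch, not the authors' code), Eqs. (11) and (12) map directly onto a few lines of NumPy:

from math import factorial
import numpy as np

def centroid(vertices):
    # Eq. (11): the centroid is the mean of the D + 1 vertices of a D-simplex
    return np.asarray(vertices, dtype=float).mean(axis=0)

def simplex_volume(vertices):
    # Eq. (12): V = |det(u1 - u0, ..., uD - u0)| / D!
    v = np.asarray(vertices, dtype=float)
    edges = v[1:] - v[0]              # D edge vectors emanating from u0
    return abs(np.linalg.det(edges)) / factorial(v.shape[1])

# The unit right triangle in 2D: centroid (1/3, 1/3), area 1/2
tri = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(centroid(tri), simplex_volume(tri))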

Table 2
Values of the fixed rate parameters γr and γp describing the rates of decay of R and P, respectively.

Rate parameter    Value
γr                0.1 s⁻¹
γp                0.002 s⁻¹

The sole task of an agent is to evaluate the objective function, L(·), at a specific u and, since each agent evaluates L(·) independently of the other agents, each iteration of STO is trivially parallelizable. The number of agents, N, is arbitrary, but a larger number of agents allows the STO algorithm to search Ω more quickly. We have found that a good heuristic for choosing N is obtained by approximating the maximum number of regular simplexes (simplexes with every pair of vertices the same distance apart in a regular grid pattern), M, that can be incident to some central vertex, and choosing N such that,

M < N ≤ 2M    (13)

For example, in two (three) dimensions, it can be shown that a maximum of 6 (22) regular simplexes can be incident simultaneously to a given vertex; therefore, according to the guideline, 6 < N ≤ 12 (22 < N ≤ 44) agents should be used for the optimization. Note that for D > 2, an approximation for M is necessary since it is well known that, in general, regular D-simplexes cannot fully tessellate a D-dimensional space for D > 2.

Fig. 2 shows a schematic of the overall information flow of the algorithm. The following subsections provide more details about specific tasks performed by, and properties of, the STO algorithm, including initialization, triangulation, measures implemented to prevent convergence to local minima, and convergence properties, in addition to a pseudocode and an illustrative example.

3.1.1. Agent initialization

Initially, agent placement in Ω is arbitrary; however, we have found that distributing the agents in a space-filling configuration within Ω improves convergence to the global minimum. Low-discrepancy sequences, such as those used in quasi-random Monte Carlo methods (Papageorgiou & Traub, 1997), have a number of useful properties with respect to distributing points in a space and, thus, are recommended for distributing the agents initially. Informally, a low-discrepancy sequence, S, is a sequence whose values S1, ···, Sn fill a D-dimensional space evenly as n → ∞, such that there is no clustering of points in any particular location. Fig. 3 illustrates the difference between random sequences of 25 and 250 points in the interval [0, 1]² and corresponding 25- and 250-point low-discrepancy sequences (Sobol sequences in this case). Fig. 3 clearly shows that the low-discrepancy sequences are more evenly distributed in the two-dimensional interval. Techniques for implementing various low-discrepancy sequence generators can be found in Struckmeier (1995) and Press, Teukolsky, Vetterling, and Flannery (2007). In this work, a Sobol sequence generator was utilized for initial agent placement.
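As a concrete initialization sketch (assuming SciPy's quasi-Monte Carlo module, scipy.stats.qmc, available in SciPy >= 1.7, rather than the generators cited above):

from scipy.stats import qmc

def initialize_agents(lbounds, ubounds, n_agents, seed=0):
    # Draw a Sobol low-discrepancy sequence in [0, 1]^D and scale it to
    # the hyper-rectangle defined by lbounds and ubounds.
    sampler = qmc.Sobol(d=len(lbounds), scramble=True, seed=seed)
    unit_points = sampler.random(n_agents)
    return qmc.scale(unit_points, lbounds, ubounds)

# Example: 20 agents in the 2D domain [0, 10]^2 used for the Alpine function
agents = initialize_agents([0.0, 0.0], [10.0, 10.0], 20)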

3.1.2. Triangulation of points

Informally, a triangulation is a mathematical construct describing the subdivision of a D-dimensional space containing N points into simplexes which, as a set, fully tessellate the space. In two dimensions with a convex polygon as the enclosing space, the simplexes are triangles that fully tessellate the polygon. The most commonly employed triangulation in the computational sciences is the Delaunay triangulation. A Delaunay triangulation maximizes the minimum angles of the simplexes in the triangulation and, thus, maximizes the volumes of the simplexes. Fig. 4 illustrates a typical Delaunay triangulation. The purpose of triangulation in STO is to use the locations of all previously sampled points to create a mesh containing information about how well areas in Ω have been searched; this information is then used to determine agent placement for the next iteration. In particular, a high mesh density (small simplex volumes) in a neighborhood around a point u implies that the neighborhood has been searched extensively; likewise, a low mesh density around a point u implies that the area surrounding u has not been searched well. The triangulated mesh is updated at each iteration by incorporating all newly sampled points as new vertices in the triangulation, thus refining the mesh density around u*.


Fig. 2. Block diagram describing information flow in the STO algorithm: initialize agents → triangulate points → find current argmin(L) and large-volume simplexes → assign agents around the current argmin(L) and around the centroids of large-volume simplexes → collect results → repeat until the maximum iterations or tolerance is reached.

In this work, we used the Qhull library (Qhull, 2013), which implements a general D-dimensional Delaunay triangulation algorithm, to calculate the Delaunay triangulation of a set of points. Note that the triangulation block in STO (see Fig. 2) is a bottleneck for the overall computational throughput of STO since triangulation, particularly in higher dimensions, is a computationally intensive calculation. Implementing a parallelized triangulation algorithm, such as the DeWall algorithm (Cignoni, Montani, & Scopigno, 1998), would reduce the time necessary to perform an iteration of STO; however, since only low-dimensional test problems were considered in this work, we utilized the well-tested, well-documented, and thoroughly optimized Qhull library.
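SciPy wraps Qhull as scipy.spatial.Delaunay, including an incremental mode that matches STO's per-iteration mesh updates; a brief sketch under that assumption:

import numpy as np
from scipy.spatial import Delaunay

points = np.random.default_rng(1).random((25, 2))   # stand-in for sampled u in Omega
tri = Delaunay(points, incremental=True)            # Qhull-backed Delaunay triangulation

# tri.simplices is an (n_simplices, D + 1) array of vertex indices into tri.points
print(tri.simplices.shape)

# Newly sampled points are folded in without re-triangulating from scratch
tri.add_points(np.random.default_rng(2).random((5, 2)))
tri.close()   # finalize incremental construction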

3.1.3. Large-volume simplexes

To prevent convergence to local minima, the algorithm assigns agents to the centers of large-volume simplexes. In this context, a "large-volume simplex" is a simplex whose volume is large relative to the other simplexes in the triangulation. The existence of a simplex implies that, with certainty, no point within the volume of that simplex has been sampled.

Fig. 3. (a) Sequence of 25 random points. (b) Sequence of 25 points from a Sobol sequence, a particular type of low-discrepancy sequence. (c) Sequence of 250 random points. (d) Sequence of 250 points from a Sobol sequence. The points from the Sobol sequences are more evenly distributed than the points from the random sequences.


Fig. 4. (a) A set of points obtained from a Sobol sequence in the interval [0, 1]². (b) The Delaunay triangulation of the set of points in panel (a).

Informally, one can rationalize this statement with the following argument: if a point had been sampled from within that simplex, the triangulation algorithm would have subdivided the simplex into simplexes of smaller volume. Therefore, by simulating at the centroids of large-volume simplexes, the average simplex size, and thus the unexplored (unsampled) region of Ω, is reduced. In particular, reducing the average simplex volume increases the likelihood of finding solutions with steep basins of convergence. Note that areas in Ω that contain smaller-than-average simplexes not near u* do not need to be considered explicitly, since these areas have been sampled to a greater-than-average extent and have been found to be suboptimal compared to u*, whereas locations with larger simplexes have not been sampled as extensively, so the algorithm cannot be sure that a better u* does not exist within the simplex volume.

We calculated which large-volume simplexes to sample in the implementation of STO used in Section 4 as follows. First, we rank all of the simplexes in the triangulation of U according to volume. Then, if we utilize N agents and have found that u* is incident to M simplexes, the centroids of the (N − M) largest simplexes are sampled. Note that, depending on the system and the size of Ω, other heuristics could be used to determine large-volume simplex sampling. Examples of other sampling strategies include always sampling the centroids of a fixed number of the largest simplexes at the expense of sampling around u*, having a separate pool of agents dedicated to large-volume simplex sampling, etc.
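Combining Eq. (12) with the triangulation, the ranking step described above can be sketched as follows (our sketch; simplex_volume is the helper sketched in Section 3.1, and tri is a scipy.spatial.Delaunay object as in Section 3.1.2):

import numpy as np

def largest_simplex_centroids(tri, n_remaining):
    # Rank every simplex in the triangulation by volume (Eq. (12)) and
    # return the centroids (Eq. (11)) of the n_remaining largest ones.
    volumes = np.array([simplex_volume(tri.points[s]) for s in tri.simplices])
    largest = np.argsort(volumes)[::-1][:n_remaining]
    return np.array([tri.points[tri.simplices[j]].mean(axis=0) for j in largest])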

3.1.4. Convergence

It can be shown (see Appendix A) that STO converges linearly with the volume of the bounding simplex. Empirical evidence suggests that the algorithm converges to an approximate solution after only a few iterations. This implies that STO may be used effectively as a screening tool to locate bounds on the minimizer or to find initial guesses for other optimization algorithms. For example, a hybrid optimization algorithm could be constructed that couples STO with a gradient-based method. Such a method would utilize STO to find initial guesses efficiently for the gradient-based solver, while the gradient-based solver quickly refines the initial guess with its higher order of convergence.

3.1.5. Pseudocode implementation of STO

A pseudocode implementation of STO is shown in Fig. 5. Text following // is a comment to guide the reader and clarify various implementation aspects. Some of the functions in the pseudocode, such as FindLargeV and DetermineError, are specific to the triangulation algorithm utilized or to the objective function being minimized and, thus, are left for the reader to implement.

In the pseudocode, agents is a list of points that will be sampled in the next iteration. AddToTriangulation(·) incorporates each point in agents into the triangulation held in tri. The variable fval is a list containing the objective function value obtained at each point in agents. The variable mesh is a mapping of all sampled points in agents to their corresponding fval (i.e., a lookup table, or associative array). FindCurrentMin(·) returns the point that corresponds to the smallest sampled fval. FindIncident(·) takes a point and returns a list of the simplexes incident to it. FindCentroid(·) takes a simplex representation and returns its centroid. FindLargeV(·) takes the triangulation and an integer, nRemaining, and returns the nRemaining largest simplexes. DetermineError(·) takes the mesh and returns the errors used in the stopping criteria for the algorithm (such as u*_i − u*_{i−1} and ObjFun(u*_i) − ObjFun(u*_{i−1})).

3.1.6. Illustrative example

The key aspects of the STO algorithm are illustrated graphically in Fig. 6. Fig. 6a shows the natural logarithm of the deterministic gene network objective function (see Section 2) with its analytic minimum indicated by the red dot. The natural logarithm of the gene network objective function (see Eq. (9)) is shown due to the flat curvature around the minimum. Fig. 6b shows the mesh generated by STO after the agent initialization phase. Each point in the mesh represents a location in Ω that has been sampled. During the initialization phase (Fig. 6b), the agents are well dispersed and form simplexes of approximately the same volume. Fig. 6c shows the mesh after the first iteration of the STO algorithm. Finally, Fig. 6d shows the mesh after 5 iterations of the STO algorithm. Fig. 6d indicates clearly that not only has the region around the analytic minimum been searched extensively, but the average simplex size across the mesh has been reduced by a factor of approximately 3 compared to the agent initialization phase.


// lbounds is a vector of lower bounds,
// ubounds is a vector of upper bounds,
// ObjFun is (a pointer or reference to) the objective function,
// nAgents is the number of agents used by STO
function STO(lbounds, ubounds, ObjFun, nAgents)
    // Initialize agent locations
    range ← ubounds − lbounds
    for i ← 1, nAgents do
        // Assign each agent a new Sobol sequence value
        // Here the Sobol sequence is bound to 0 ≤ SB_i ≤ 1
        agents_i ← range * Sobol() + lbounds
    end for
    count ← 0
    err ← ∞
    while count < MaxIteration and err > MaxTol do
        // Use ObjFun to determine the value at each agent location
        // and add the points to the triangulation, tri
        fval ← ObjFun(agents)
        tri ← AddToTriangulation(agents)
        // Use tri and fval to update the mesh
        mesh ← UpdateMesh(agents, fval)
        // Find u* and the simplexes incident to u*
        curMin ← FindCurrentMin(mesh)
        nearMin ← FindIncident(tri)
        // Determine how many agents are still unassigned
        nRemaining ← nAgents − length(nearMin)
        // Assign agents to the centroid of each simplex in nearMin
        for i ← 0, length(nearMin) do
            agents_i ← FindCentroid(nearMin_i)
        end for
        // Find the locations of the nRemaining largest-volume
        // simplexes and assign agents to their centroids
        // Note: loop decrements from nAgents − 1
        LrgVSimp ← FindLargeV(tri, nRemaining)
        for i ← nAgents − 1, nRemaining do
            agents_i ← FindCentroid(LrgVSimp_i)
        end for
        // Determine error of u*
        err ← DetermineError(mesh)
        count ← count + 1
    end while
    return curMin
end function

Fig. 5. Pseudocode for STO.


3.2. Particle Swarm Optimization

Particle Swarm Optimization (PSO) is an agent-based algorithm that minimizes a function constrained in a closed or compact D-dimensional hypercube, Ω (equivalent to the Ω defined in Eq. (2)). The algorithm, pioneered by Kennedy and Eberhart (1995), has proven effective for optimizing multi-modal, high-dimensional objective functions (Clerc & Kennedy, 2002; Trelea, 2003). Each agent is composed of a position and a velocity, which jointly dictate how the agent moves around Ω in search of the global minimum. The position and velocity of each agent are calculated according to the following equations,

u^i_{k+1} = a u^i_k + b v^i_{k+1}    (14)

v^i_{k+1} = c v^i_k + d r1 (u^{i,p} − u^i_k) + e r2 (u^g − u^i_k)    (15)

where u^i_k is the D-dimensional point representing the location of the ith agent at the kth iteration in Ω, v^i_k represents the velocity of the ith agent at the kth iteration, r1 and r2 are vectors of uniform random variables, and a, b, c, d, and e are constants. Each agent stores two quantities: (1) the current best estimate for the minimizer of L(·) across all agents, u^g, and (2) the current best estimate for the minimizer of L(·) that the ith agent has reported, u^{i,p}. Both u^g and u^{i,p} influence the agent's velocity in Ω, resulting in a new velocity for the agent at each iteration that is a weighted sum involving u^g and u^{i,p}. The qualitative behavior at each iteration arising from Eqs. (14) and (15) is that the agents traverse a large portion of Ω as a "flock" and eventually settle to a single point. For further implementation details, the reader is directed to one of the papers discussing the convergence properties of PSO (Clerc & Kennedy, 2002; Trelea, 2003).
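One PSO iteration following Eqs. (14) and (15) can be written compactly; the sketch below (ours) vectorizes the update over all agents, with the constants a–e chosen purely for illustration:

import numpy as np

def pso_step(u, v, u_personal, u_global, a=1.0, b=1.0, c=0.7, d=1.5, e=1.5, rng=None):
    # u, v:        (n_agents, D) positions and velocities at iteration k
    # u_personal:  (n_agents, D) best point reported by each agent
    # u_global:    (D,) best point found by the whole swarm
    if rng is None:
        rng = np.random.default_rng()
    r1 = rng.random(u.shape)   # elementwise uniform random weights
    r2 = rng.random(u.shape)
    v_next = c * v + d * r1 * (u_personal - u) + e * r2 * (u_global - u)   # Eq. (15)
    u_next = a * u + b * v_next                                            # Eq. (14)
    return u_next, v_next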

3.3. Finite-difference stochastic approximation

Finite-difference stochastic approximation (FDSA) (Spall, 2003) is a gradient-based optimization algorithm designed to optimize noisy objective functions. FDSA can be thought of as a generalization of line-search optimization algorithms (Nocedal & Wright, 2006) to stochastic objective functions. The general recurrence relation for stochastic approximation is,

u_{k+1} = uk − ak g_{k+1}(uk)    (16)

where uk is the estimate of the minimizer of L(x, u, t) at the kth iteration and g_{k+1}(uk) is an estimate of the gradient at uk. ak (with ak > 0) is a decaying sequence that filters out noise in uk and g_{k+1}(uk). In FDSA, g_{k+1}(uk) is estimated via a finite-differencing formula of the form,

g_{k+1}(uk) = [ (L(uk + ck ξ1) − L(uk − ck ξ1)) / (2ck), ···, (L(uk + ck ξD) − L(uk − ck ξD)) / (2ck) ]^T    (17)

where ξi is a D-dimensional unit vector with a 1 in the ith place and 0s elsewhere, and ck > 0 is another decaying sequence defining the magnitude of the perturbation in each element of uk. A consequence of Eq. (17) is that each iteration of FDSA requires 2D function evaluations. It should be noted that changing the functional form of Eq. (17) to first-order differences, or using computational approaches such as common random numbers, can reduce the computational requirements for estimating the gradient in Eq. (16); in general, however, gradient estimation in FDSA will scale linearly with D (see, for example, McGill, Ogunnaike, & Vlachos, 2012; Spall, 2003). (See Spall, 2003 for a detailed discussion of the convergence of Eqs. (16) and (17) and of the appropriate selection of the sequences ak and ck.) In this work, we use sequences of the following form for ak and ck,

ak = a / (k + 1 + A)^α    (18)

ck = c / (k + 1)^γ    (19)

where k is the current iteration number and a, A, α, c, and γ are parameters specific to the objective function. Table 3 shows the parameter values used in FDSA for each of the objective functions.
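Eqs. (16)–(19) admit a compact implementation. The sketch below (ours, not the article's code) performs one FDSA iteration; the default gains follow the Alpine row of Table 3:

import numpy as np

def fdsa_step(obj, u, k, a=0.75, A=50.0, alpha=0.602, c=0.5, gamma=0.101):
    u = np.asarray(u, dtype=float)
    # Decaying gain sequences, Eqs. (18) and (19)
    a_k = a / (k + 1 + A) ** alpha
    c_k = c / (k + 1) ** gamma
    # Central-difference gradient estimate, Eq. (17): 2D objective evaluations
    g = np.empty_like(u)
    for i in range(u.size):
        xi = np.zeros_like(u)
        xi[i] = 1.0
        g[i] = (obj(u + c_k * xi) - obj(u - c_k * xi)) / (2.0 * c_k)
    # Stochastic-approximation update, Eq. (16)
    return u - a_k * g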


Fig. 6. (a) Natural logarithm of the gene network objective function and its analytic minimum (red dot). (b) The mesh generated by STO after the agent initialization phase. The blue circles represent locations searched by the agents. (c) The first iteration of the STO algorithm. (d) The mesh after 5 iterations of the STO algorithm. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

4. Results and discussion

The STO, PSO, FDSA, and KMC algorithms were implemented using a combination of C++ (triangulation and KMC code) and Python (optimization code). Each optimization algorithm was assessed by minimizing Eqs. (4), (5), and (9) with respect to u. Eqs. (4) and (5) were each minimized in both two (Ω ⊂ R²) and four (Ω ⊂ R⁴) dimensions for several values of σ². In the following optimizations, STO and PSO were allotted 20 agents each in the 2D cases and 40 agents each in the 4D cases. For FDSA, an initial guess, u0, was chosen for each objective function such that u0 lay at the edge of the respective objective function's basin of convergence for the global minimum. The stopping criteria for the algorithms were defined as

‖ui − ui−1‖ / ‖ui−1‖ ≤ εu    (20)

‖L̄(ui) − L̄(ui−1)‖ / ‖L̄(ui−1)‖ ≤ εL    (21)

Table 3
Parameter values for the FDSA algorithm.

Objective function    a      A    α      c     γ
Alpine                0.75   50   0.602  0.5   0.101
Griewank              0.75   50   0.602  0.5   0.101
Gene regulatory       0.3    50   0.602  0.02  0.101

where ui is the estimate for the minimizer of L at the ith iteration, L̄ is the averaged estimate of L, εu is the tolerance for u, and εL is the tolerance for L̄. The noise present in the objective function measurements means that some averaging and relatively large values of εL are required to ensure the stopping criteria in Eqs. (20) and (21) are met. Exponential moving averages were applied to the objective function measurements such that

L̄(ui) = (1 − α) L̄(ui−1) + α L(ui)    (22)

where 0 ≤ α ≤ 1. In this work, we set α to 0.6; however, the choice of α is somewhat problem specific and should be based on the observed level of noise in the objective function measurements. While the exponentially weighted moving average allows L̄ to be averaged across iterations within the same optimization to filter noise, other moving averages could also be utilized.
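The bookkeeping of Eqs. (20)–(22) amounts to a few lines; a sketch (ours, not the authors' implementation):

import numpy as np

def update_average(L_bar_prev, L_new, alpha=0.6):
    # Eq. (22): exponentially weighted moving average of noisy objective values
    return (1.0 - alpha) * L_bar_prev + alpha * L_new

def converged(u_i, u_prev, L_bar_i, L_bar_prev, eps_u=1e-3, eps_L=1e-3):
    # Eqs. (20) and (21): relative changes in the minimizer estimate and in
    # the averaged objective must both fall below their tolerances.
    du = np.linalg.norm(np.asarray(u_i) - np.asarray(u_prev)) / np.linalg.norm(u_prev)
    dL = abs(L_bar_i - L_bar_prev) / abs(L_bar_prev)
    return du <= eps_u and dL <= eps_L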

Fig. 7 displays contour plots of Eqs. (4), (5), and (9) to provide some insight into the topology of the objective functions. Each optimization algorithm was allowed a maximum of 1000 iterations to find the global minimum; if an algorithm did not find the minimum within 1000 iterations, the search was terminated.

To assess the robustness of STO, PSO, and FDSA in the presence of noise, Eqs. (4) and (5) were optimized for several different values of σ²: three values of σ² were chosen for both the Alpine (σ² ∈ {0, 5, 50}) and Griewank (σ² ∈ {0, 0.05, 0.5}) functions, corresponding to no, moderate, and high noise. The tables in the following sections summarize various statistics of the optimizations, including the average distance from the global minimizer, ‖ū − u_g‖, and the average number of iterations and function evaluations the optimizations took to meet the convergence criteria.


Fig. 7. Contour plots of the Alpine function, the Griewank function, and the natural logarithm of the gene network objective function. Light colored regions indicate lower values of the objective function; dark colored regions indicate higher values.


In the optimization of Eq. (9), we set θR = 2 and θP = 50 and optimized the objective function under two conditions: (1) when P̄ and R̄ were obtained through a KMC simulation, and (2) when P̄ and R̄ were calculated using the analytic solutions for their respective expected values (Thattai & van Oudenaarden, 2001) (representing a "no noise" case).

In the following sections, an observant reader will note that, as the noise increases, the average distance from the minimizer for STO and PSO increases (particularly for the Griewank function) more than for FDSA. This is primarily for the following reason: STO and PSO are global optimization algorithms searching over the entire permissible MV space, Ω, whereas FDSA is a local optimization algorithm whose initial guess must be set close to the global minimizer to ensure convergence to the global minimum (see Section 4.1). This means that, particularly for large values of σ², STO and PSO may not converge to the exact global minimum but to a local minimum in the general vicinity of the global minimum. While convergence to a local minimum is not ideal, note that only broad bounds on u (e.g., the bounds on Ω) are required to find a good approximation of the global minimizer when using STO or PSO, as opposed to the very specific initial guess required by FDSA (or other local optimization algorithms). FDSA was included in this comparison since it allows us to compare the convergence of the direct search methods, STO and PSO, to that of a gradient method.

Additionally, to provide a fair comparison of FDSA to STO and PSO, we implemented a hybrid optimization algorithm in which we optimize an objective function using STO and then use its output as an initial guess for FDSA. Examining the results of the hybrid approach provides two benefits. First, we used very carefully chosen initial conditions for FDSA to ensure convergence to the global minimum (see Section 4.1); the hybrid algorithm allows us to see the performance of FDSA at other initial conditions. Second, by comparing the results of the hybrid algorithm to the results from STO alone, we can determine whether any improvement in accuracy results from using FDSA. This is particularly important if, as mentioned above, STO converges to a local minimum.

4.1. Effect of initial guesses on FDSA

As mentioned in Section 3.3, in order to achieve convergence to the global minimum, FDSA, like most gradient methods, requires a good initial guess. To illustrate this, consider the FDSA algorithm operating on the deterministic Alpine function at four different initial guesses:

u0 ∈ { [6, 6]^T, [6.5, 6]^T, [6, 6.5]^T, [6.5, 6.5]^T }

Fig. 8 shows that, despite the close proximity of these initial guesses to one another, only one of the initial guesses results in the FDSA algorithm converging to the global minimum. The optimization does not fail to converge; it simply converges to a local, rather than the global, minimum. While this result is well known, we want to draw attention to the fact that methods that sample many points in Ω simultaneously, such as STO and PSO, do not require good initial guesses. Once one provides a bounded region of admissible values for u, STO and PSO can locate good approximations to the global minimum even for very complex function topologies (see Fig. 6).

In the following subsections, we have chosen the initial condition for FDSA carefully, so that even in the presence of large amounts of noise, FDSA will often converge to the global minimum. In most cases, obtaining such an initial guess would be difficult or impossible. To illustrate optimizations using FDSA that do not start at a carefully chosen initial guess, we have included a hybrid STO-FDSA algorithm that uses the output of STO as an initial guess for FDSA. In the hybrid algorithm, FDSA does not substantially improve the guess provided by STO.

4.2. Two-dimensional optimization

In 2D, the Alpine, Griewank, and KMC objective functions were optimized using STO, PSO, FDSA, and a hybrid method (where STO was used to find an initial guess for FDSA); the results are summarized in Tables 4–6.

Fig. 8. Contour plot of the Alpine function with initial guesses and optimization trajectories superimposed. This example illustrates the sensitivity of FDSA to initial guesses. The four blue circles represent initial guesses, while the red dots represent the results of the FDSA algorithm. The black lines represent the optimization trajectories connecting each initial guess to its final value. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)


Table 4
Summary statistics for the minimization of the 2D Alpine function over 20 replicate optimizations. Iteration and function evaluation ranges represent the 95% confidence interval (εL = εu = 1.0 × 10⁻³).

           f̄ − f_g       ‖ū − u_g‖     Avg iterations   Avg function evaluations
σ² = 0
  STO      5.04 × 10⁻²   6.82 × 10⁻²   6.8 ± 0.7        140 ± 13.8
  PSO      6.98 × 10⁻²   6.12 × 10⁻²   37 ± 8.5         743 ± 170
  FDSA     1.31 × 10⁻⁵   1.74 × 10⁻³   1000 ± 0.0       4000 ± 0.0
  Hybrid   1.31 × 10⁻⁵   1.74 × 10⁻³   990 ± 0.7        4100 ± 8.3
σ² = 5
  STO      1.04 × 10⁰    3.99 × 10⁻¹   14.6 ± 5.7       295 ± 114
  PSO      1.20 × 10⁰    3.23 × 10⁻¹   27.2 ± 27.2      626 ± 544
  FDSA     5.78 × 10⁻¹   6.74 × 10⁻¹   1000 ± 0.0       4000 ± 0.0
  Hybrid   8.58 × 10⁻²   4.16 × 10⁻¹   985 ± 2.7        4200 ± 32
σ² = 50
  STO      5.69          2.21          22 ± 15          430 ± 300
  PSO      4.95          3.74          720 ± 150        14000 ± 3000
  FDSA     5.17          3.38          1000 ± 0.0       4000 ± 0.0
  Hybrid   6.23          4.75          970 ± 11         4300 ± 0.0

Table 5
Summary statistics for the minimization of the 2D Griewank function over 20 replicate optimizations. Iteration and function evaluation ranges represent the 95% confidence interval (εL = εu = 1.0 × 10⁻³).

           f̄ − f_g       ‖ū − u_g‖     Avg iterations   Avg function evaluations
σ² = 0
  STO      4.43 × 10⁻²   11.30         24.1 ± 9.5       482 ± 190
  PSO      4.10 × 10⁻⁵   3.88 × 10⁻³   76.8 ± 7.4       1535 ± 148
  FDSA     1.77 × 10⁻¹²  2.66 × 10⁻⁶   1000 ± 0.0       4000 ± 0
  Hybrid   4.42 × 10⁻²   11.30         975 ± 9.5        4285 ± 114
σ² = 0.05
  STO      1.26 × 10⁻¹   15.7          37.3 ± 15.3      745 ± 307
  PSO      3.35 × 10⁻¹   14.4          1000 ± 0.0       20000 ± 0
  FDSA     2.09 × 10⁻³   7.12 × 10⁻²   1000 ± 0.0       4000 ± 0
  Hybrid   4.67 × 10⁻²   15.7          962 ± 15         4443 ± 184
σ² = 0.5
  STO      5.62 × 10⁻¹   23.2          40.5 ± 21.0      809 ± 419
  PSO      6.72 × 10⁻¹   24.1          1000 ± 0.0       20000 ± 0
  FDSA     1.14 × 10⁻²   1.63 × 10⁻¹   1000 ± 0.0       4000 ± 0
  Hybrid   3.42 × 10⁻¹   23.1          959 ± 21.0       4481 ± 252

Each table contains four columns. The first two columns show the accuracy of the methods compared to the respective global minimum and minimizer of each objective function; the last two columns show the average number of iterations for convergence and the average number of function evaluations for convergence, respectively. Since this work is concerned with parameter optimization of computationally expensive objective functions, we are more concerned with the total computational cost of the methods than with the absolute accuracy and precision of the solution (provided the solutions are within some desired neighborhood of one another). This is particularly true when we consider noisy objective functions since, in such cases, the algorithms will only be able to obtain a noisy approximation to the actual minimizer.

Table 6
Summary statistics for the minimization of the gene regulatory objective function over 20 replicate optimizations. Iteration and function evaluation ranges represent the 95% confidence interval (εL = εu = 1.0 × 10⁻³).

                f̄ − f_g        ‖ū − u_g‖      Avg iterations   Avg function evaluations
Deterministic
  STO           1.16 × 10⁻³    4.08 × 10⁻³    28.8 ± 10.3      580 ± 207
  PSO           6.08 × 10⁻⁴    2.01 × 10⁻³    23.6 ± 3.4       470 ± 67
  FDSA          5.44 × 10⁻³⁰   4.86 × 10⁻¹⁶   1000 ± 0.0       4000 ± 0.0
  Hybrid        4.75 × 10⁻³⁰   4.48 × 10⁻¹⁶   970 ± 9.6        4300 ± 120
Stochastic
  STO           1.72 × 10⁻²    1.99 × 10⁻²    40 ± 9.6         780 ± 191
  PSO           2.30 × 10⁻²    2.25 × 10⁻²    53 ± 17          1100 ± 346
  FDSA          1.02 × 10⁻²    1.15 × 10⁻²    1000 ± 0.0       4000 ± 0.0
  Hybrid        6.31 × 10⁻²    1.14 × 10⁻²    960 ± 9.6        4500 ± 120


Tables 4–6 show that, for the Alpine and gene regulatory objective functions, FDSA and the hybrid method generally converge to a more precise solution than do STO or PSO but require substantially more function evaluations than do the agent-based methods. When σ² is small, FDSA converges to a more accurate solution for the given stopping criteria, but as the noise level increases, all four methods show similar accuracy. In most cases, STO converges in far fewer iterations than the other algorithms.

Note that STO and PSO require 20 function evaluations per iteration while FDSA requires only 4 function evaluations per iteration (for the hybrid method, the number of function evaluations depends on the number of iterations required for the STO phase and the FDSA phase). This implies (again assuming that each objective function evaluation is expensive) that if all STO and PSO agents can operate in parallel, then STO and PSO will run for approximately the same (real) time per iteration as FDSA, and the number of iterations is the proper metric for comparing the performance of the algorithms. However, if the agents can only be run serially (e.g., one agent must complete its function evaluation before another can proceed), then FDSA requires less (real) time per iteration, and the number of function evaluations becomes the appropriate metric for comparing the methods. Regardless of the metric used, as σ² increases, STO becomes the most computationally cost-effective algorithm.
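Since each agent evaluates the objective independently, the parallel case discussed above is straightforward to realize with a process pool; a minimal sketch using the Python standard library (the objective must be a picklable, module-level function):

from multiprocessing import Pool

def evaluate_agents(objective, agent_points, n_workers=4):
    # One STO/PSO iteration then costs roughly the wall time of a single
    # objective evaluation, provided n_workers covers the agent count.
    with Pool(processes=n_workers) as pool:
        return pool.map(objective, agent_points)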

It is also important to note that none of the algorithms converges to the global minimizer in every case. In particular, as the noise increases, the algorithms are more likely to converge to a local minimum. This can be seen by inspecting the first two columns in Tables 4–6. The column labeled f̄ − f_g shows the average deviation of the estimate of the global minimum from the actual analytic global minimum (smaller is better). The column labeled ‖ū − u_g‖ shows the average distance (Euclidean norm) of the estimate of the global minimizer from the actual global minimizer (smaller is better). Observe that the algorithms had the most difficulty minimizing the Griewank function, which makes intuitive sense since the Griewank function has a large number of local minima that differ by only a small amount (see Fig. 7). In many practical problems, one only needs to find a solution that is sufficiently close to the global minimum/minimizer in some regard, rendering this problem less important.

4.3. Four-dimensional optimization

In 4D, only the Alpine and Griewank functions were optimized. The gene regulatory objective function in Eq. (9) could not be generalized to 4 dimensions since this system has only two observables (R and P). Tables 7 and 8 summarize the results for the four-dimensional objective functions.

The trends for STO and PSO in Tables 7 and 8 are similar to those in Tables 4 and 5; as the amount of noise increases, the number of iterations required for STO convergence is one to two orders of magnitude lower than that of PSO, FDSA, or the hybrid algorithm. For the Alpine objective function, the average distance from the global minimum/minimizer is approximately the same for STO, PSO, and the hybrid algorithm. FDSA obtains a more accurate solution for the Griewank function than the other algorithms do. Section 4.1 explains the reason for this: FDSA is initialized very close to the global minimizer, ensuring convergence to the global minimizer, whereas STO, PSO, and the hybrid algorithm are simply given constraints on Ω. STO does converge very quickly to a local minimum whose value is close to the global minimum, however, making it a good choice when the goal is to find an approximation to the global minimum in as few iterations as possible.


Table 7
Summary statistics for the minimization of the 4D Alpine function over 20 replicate optimizations. Iteration and function evaluation ranges represent the 95% confidence interval (εL = εu = 1.0 × 10⁻³).

           f̄ − f_g       ‖ū − u_g‖     Avg iterations   Avg function evaluations
σ² = 0
  STO      26.7          4.67          11 ± 0.9         210 ± 18
  PSO      18.5          3.29          84 ± 9.2         1700 ± 180
  FDSA     2.06 × 10⁻⁴   2.5 × 10⁻³    1000 ± 0.0       4000 ± 0.0
  Hybrid   26.5          4.64          990 ± 1          4100 ± 11
σ² = 5
  STO      27.7          4.27          20 ± 2.6         41 ± 52
  PSO      14.7          2.46          1000 ± 0.0       20000 ± 0.0
  FDSA     15.4          2.40          1000 ± 0.0       4000 ± 0.0
  Hybrid   31.4          4.76          980 ± 2.6        4200 ± 31
σ² = 50
  STO      32.1          4.97          23 ± 6.6         450 ± 130
  PSO      17.3          2.43          1000 ± 0.0       20000 ± 0.0
  FDSA     39.1          5.37          1000 ± 0.0       4000 ± 0.0
  Hybrid   40.2          5.40          980 ± 6.6        4300 ± 79

Table 8
Summary statistics for the minimization of the 4D Griewank function over 20 replicate optimizations. Iteration and function evaluation ranges represent the 95% confidence interval (εL = εu = 1.0 × 10⁻³).

           f̄ − f_g       ‖ū − u_g‖     Avg iterations   Avg function evaluations
σ² = 0
  STO      3.60 × 10⁻¹   35.1          18 ± 2.4         370 ± 49
  PSO      8.45 × 10⁻⁷   1.35 × 10⁻³   97 ± 4.6         1900 ± 91
  FDSA     3.42 × 10⁻⁸   5.20 × 10⁻⁴   1000 ± 0.0       4000 ± 0.0
  Hybrid   3.31 × 10⁻¹   35.1          980 ± 2.4        4217 ± 29
σ² = 0.05
  STO      3.93 × 10⁻¹   30.9          33 ± 4.7         660 ± 94
  PSO      6.31 × 10⁻¹   18.0          1000 ± 0.0       20000 ± 0.0
  FDSA     3.93 × 10⁻³   1.42 × 10⁻¹   1000 ± 0.0       4000 ± 0.0
  Hybrid   2.73 × 10⁻¹   31.0          970 ± 4.7        4400 ± 56
σ² = 0.5
  STO      1.20          36.3          6.1 ± 1.7        120 ± 33
  PSO      1.10          33.6          1000 ± 0.0       20000 ± 0.0
  FDSA     3.69 × 10⁻²   4.11 × 10⁻¹   1000 ± 0.0       4000 ± 0.0
  Hybrid   0.52          36.1          990 ± 1.7        4100 ± 20


5. Conclusions

Efficient simulation-based optimization of low-dimensional, computationally expensive simulations has a wide range of applications in process parameter estimation, optimization, and control. In this work, we have developed the Simplex-Triangulation Optimization (STO) algorithm as an efficient method for optimizing objective functions that may depend on the low-dimensional dynamics of a system. The performance of the STO algorithm was compared to that of the Particle Swarm Optimization (PSO) algorithm, the Finite Difference Stochastic Approximation (FDSA) algorithm, and a hybrid algorithm that incorporated both the STO and FDSA algorithms. We used stochastic variants of the Alpine and Griewank objective functions in 2 and 4 dimensions, as well as an objective function dependent on the dynamics of a simple gene regulatory network modeled using a kinetic Monte Carlo method, to assess the performance of each algorithm. We found that convergence of STO requires significantly fewer function evaluations in the presence of noise compared to the PSO algorithm, and a similar number of function evaluations as FDSA, despite STO being a direct search method. Additionally, unlike FDSA, STO converges near the global minimum of a function without requiring a good initial guess. This is an important property since, in many applications, knowledge of the underlying objective function structure is unavailable and expensive to acquire.

While STO has a larger computational overhead per iteration than PSO or FDSA, STO is a better choice for problems of low to moderate dimension with moderate to large amounts of noise in which each individual simulation has a long run time. The computational overhead of STO is due to the Delaunay triangulation required at each iteration. As an example, in 2D the time complexity of a "divide and conquer" triangulation algorithm is O(n log n), where n is the total number of points in the triangulation. The best-case computational complexity of triangulation when scaling to higher dimensions is an open problem and is specific to the class of (Delaunay) triangulation algorithm used (e.g., a "divide and conquer" algorithm versus an incremental algorithm). As a comparison, the computational complexity of agent placement in PSO is O(1) for a constant number of agents. While this scaling appears to favor PSO greatly, in most practical optimization problems of low to moderate dimension (D ≤ 6), the Delaunay triangulation proceeds sufficiently quickly. For example, the 4D Delaunay triangulations from the numerical experiments in Section 4 took 10–15 min each. When a simulation takes several hours to days to complete (as is the case for many KMC or molecular dynamics simulations), an additional 15 minutes of computation per iteration is a price worth paying to reduce the total number of iterations.

Acknowledgement

This work was supported in part by the NSF (CMMI-0835673).

Appendix A. Linear convergence of STO

Here we show that STO converges linearly. We assume that the point corresponding to the global minimum always lies within the volume of a simplex and never exactly on a vertex. First, we note that the volume of a simplex can be calculated as follows,

V = (1/n!) |det(u1 − u0, ···, un − u0)|    (A.1)

where V is the simplex volume, n is the dimensionality of the simplex, ui is the ith vertex of the simplex, and det(·) is the determinant. Also, the centroid of a simplex corresponds to its geometric center of mass, which can be calculated as

c = (1/(n + 1)) ∑_{k=0}^{n} uk    (A.2)

where c is the centroid of the simplex. The centroid of a simplex divides that simplex into n + 1 new simplexes. In what follows, we show that in any dimension the volume of each of the n + 1 new simplexes will be exactly Vnew = Vold/(n + 1).

Let Vold be the volume of a simplex as given in Eq. (A.1) and let Vnew be the volume of one subdivision of that simplex by the centroid, such that

Vnew = (1/n!) |det(u1 − u0, ···, c − u0)|
     = (1/n!) |det(u1 − u0, ···, (1/(n + 1)) (∑_{k=0}^{n} uk) − u0)|    (A.3)


Since 1/(n + 1) is a constant

Vnew = (1/(n + 1)) (1/n!) |det(u1 − u0, ···, (∑_{k=0}^{n} uk) − (n + 1) u0)|
     = (1/(n + 1)) (1/n!) |det(u1 − u0, ···, ∑_{k=0}^{n} (uk − u0))|    (A.4)

Finally, since the determinant is invariant under elementary column operations (adding a multiple of one column to another),

Vnew = (1/(n + 1)) (1/n!) |det(u1 − u0, ···, ∑_{k=0}^{n} (uk − u0) − ∑_{k=1}^{n−1} (uk − u0))|
     = (1/(n + 1)) (1/n!) |det(u1 − u0, ···, un − u0)|
     = Vold / (n + 1)    (A.5)

By induction, Eq. (A.5) applies to every subsequent subdivision. This implies that, at each iteration, the volume of the simplex surrounding the point corresponding to the global minimum decreases by a constant factor, which implies linear convergence.
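The identity Vnew = Vold/(n + 1) is easy to confirm numerically for a random simplex; a short check of Eq. (A.5) (our sketch, reusing the volume formula of Eq. (A.1)):

from math import factorial
import numpy as np

def volume(verts):
    # Eq. (A.1): |det(u1 - u0, ..., un - u0)| / n!
    v = np.asarray(verts, dtype=float)
    return abs(np.linalg.det(v[1:] - v[0])) / factorial(v.shape[1])

rng = np.random.default_rng(0)
n = 4
simplex = rng.random((n + 1, n))             # random n-simplex in R^n
c = simplex.mean(axis=0)                     # centroid, Eq. (A.2)
# Replacing each vertex in turn by the centroid yields n + 1 sub-simplexes
subs = [np.vstack([simplex[:j], c[None, :], simplex[j + 1:]]) for j in range(n + 1)]
print(volume(simplex) / (n + 1))             # V_old / (n + 1)
print([round(volume(s), 12) for s in subs])  # each sub-volume matches (Eq. (A.5))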

References

Christofides, P., Armaou, A., Lou, Y., & Varshney, A. (2009). Control and optimization of multiscale process systems. Boston: Birkhäuser.
Cignoni, P., Montani, C., & Scopigno, R. (1998). DeWall: A fast divide and conquer Delaunay triangulation algorithm in E^d. Computer-Aided Design, 30, 333–341.
Clerc, M., & Kennedy, J. (2002). The particle swarm – explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6, 58–73.
Dorigo, M., Di Caro, G., & Gambardella, L. M. (1999). Ant algorithms for discrete optimization. Artificial Life, 5, 137–172.
Fu, M. C. (2002). Optimization for simulation: Theory vs. practice. INFORMS Journal on Computing, 14, 192–215.
Fu, M. C., Glover, F. W., & April, J. (2005). Simulation optimization: A review, new developments, and applications. In Proceedings of the Winter Simulation Conference (p. 13).
Gillespie, D. T. (1992). A rigorous derivation of the chemical master equation. Physica A, 188, 404–425.
Harmon Ray, W. (1981). Advanced process control. New York: McGraw-Hill.
Jones, D. R., Schonlau, M., & Welch, W. J. (1998). Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13, 455–492.
Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Vol. 4 (pp. 1942–1948).
McGill, J. A., Ogunnaike, B. A., & Vlachos, D. G. (2012). Efficient gradient estimation using finite differencing and likelihood ratios for kinetic Monte Carlo simulations. Journal of Computational Physics, 231(21), 7170–7186.
Nocedal, J., & Wright, S. (2006). Numerical optimization (2nd ed.). New York: Springer.
Papageorgiou, A., & Traub, J. F. (1997). Faster evaluation of multidimensional integrals. Computers in Physics, 11, 574–578.
Press, W., Teukolsky, S., Vetterling, W., & Flannery, B. (2007). Numerical recipes: The art of scientific computing (3rd ed.). New York: Cambridge University Press.
Qhull. (2013). http://www.qhull.org (accessed April 1, 2013).
Shan, S., & Wang, G. (2010). Survey of modeling and optimization strategies to solve high-dimensional design problems with computationally-expensive black-box functions. Structural and Multidisciplinary Optimization, 41, 219–241.
Spall, J. C. (1998). Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems, 34, 817–823.
Spall, J. C. (2001). Stochastic optimization, stochastic approximation and simulated annealing. New York: John Wiley & Sons.
Spall, J. C. (2003). Introduction to stochastic search and optimization: Estimation, simulation, and control. Hoboken, NJ: Wiley-Interscience.
Struckmeier, J. (1995). Fast generation of low-discrepancy sequences. Journal of Computational and Applied Mathematics, 61, 29–41.
Thattai, M., & van Oudenaarden, A. (2001). Intrinsic noise in gene regulatory networks. Proceedings of the National Academy of Sciences of the United States of America, 98(15), 8614–8619.
Trelea, I. C. (2003). The particle swarm optimization algorithm: Convergence analysis and parameter selection. Information Processing Letters, 85, 317–325.