IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 30, NO. 6, JUNE 2011 793

Efficient and Accurate Statistical Analog Yield Optimization and Variation-Aware Circuit Sizing Based on Computational Intelligence Techniques

Bo Liu, Francisco V. Fernández, and Georges G. E. Gielen, Fellow, IEEE

Abstract—In nanometer complementary metal-oxide-semiconductor technologies, worst-case design methods and response-surface-based yield optimization methods face challenges in accuracy. Monte-Carlo (MC) simulation is general and accurate for yield estimation, but its efficiency is not high enough to make MC-based analog yield optimization, which requires many yield estimations, practical. In this paper, techniques inspired by computational intelligence are used to speed up yield optimization without sacrificing accuracy. A new sampling-based yield optimization approach, which determines the device sizes to optimize yield, is presented, called the ordinal optimization (OO)-based random-scale differential evolution (ORDE) algorithm. By proposing a two-stage estimation flow and introducing the OO technique in the first stage, sufficient samples are allocated to promising solutions, and repeated MC simulations of non-critical solutions are avoided. By the proposed evolutionary algorithm, which uses differential evolution for the global search and a random-scale mutation operator for fine tuning, the convergence speed of the yield optimization can be enhanced significantly. With the same accuracy, the resulting ORDE algorithm can achieve approximately a tenfold improvement in computational effort compared to an improved MC-based yield optimization algorithm integrating the infeasible pruning and Latin-hypercube sampling techniques. Furthermore, ORDE is extended from plain yield optimization to process-variation-aware single-objective circuit sizing.

Index Terms—Differential evolution, ordinal optimization, variation-aware analog sizing, yield optimization.

I. Introduction

INDUSTRIAL analog integrated circuit design not only calls for fully optimized nominal design solutions, but also requires high robustness and yield in the light of varying supply voltage and temperature conditions, as well as inter-die and intra-die process variations [1]–[3]. Especially in nanometer complementary metal–oxide–semiconductor (CMOS) technologies, random and systematic process variations have a large influence on the quality and yield of the manufactured analog circuits. As a consequence, in high-performance analog and mixed-signal design flows, the designer needs guidelines and tools to deal with these factors impacting circuit yield and performances in an integrated manner, in order to avoid costly re-design iterations [4].

Manuscript received January 21, 2010; revised May 28, 2010 and September 22, 2010; accepted November 27, 2010. Date of current version May 18, 2011. This work was supported by a special bilateral agreement scholarship of Katholieke Universiteit Leuven, Leuven, Belgium, and Tsinghua University, Beijing, China, and by the TIC-2532 Project funded by Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain. This paper was recommended by Associate Editor P. Li.

B. Liu and G. G. E. Gielen are with Katholieke Universiteit Leuven, B-3001 Leuven, Belgium (e-mail: [email protected]; [email protected]).

F. V. Fernández is with IMSE-CNM, CSIC, University of Sevilla, Sevilla, E-41092, Spain (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCAD.2011.2106850

Yield optimization includes system-level hierarchical optimization [5] and building-block-level yield optimization [6], [7]. At the building block level, there exist parametric yield optimization [6]–[8] and layout-related yield optimization [9]–[11], e.g., critical area yield analysis [9]. This paper focuses on parametric yield optimization at the building-block level.

The yield optimization flow is summarized in Fig. 1. In the optimization loop, the candidate circuit parameters are generated by the optimization engine; the performances and yield are analyzed and fed back to the optimization engine for the next iteration. Yield analysis is a critical point in the yield optimization flow. Among the factors that impact yield, statistical inter-die and intra-die process variations play a vital role [8]. Previous yield optimization methods include device model corner-based methods [3], [12], performance-specific worst-case design (PSWCD) methods [6], [7], response-surface-based methods [2], [15], and Monte-Carlo (MC)-based methods.

1) Device model corner-based methods [3], [12] use the same slow/fast parameter sets to determine the worst-case parameters for all circuits in a given technology. They are efficient due to the limited number of simulations needed. Their drawback, however, is that the worst-case performance values are pessimistic, as the corners correspond to the tails of the joint probability density function of the parameters, resulting in considerable over-design. Also, the slow/fast parameter values obtained for a single performance, e.g., delay, may differ from the worst-case process parameters for the other performances. Second, the actual yield may be low if the intra-die variations are ignored; if the intra-die variations were considered, the number of simulations would be extremely large. The limitations of device model corner-based methods for robust analog sizing are discussed in [1] and [13].

2) The PSWCD methods [6], [7], [13], [14] represent important progress in the robust sizing of analog ICs. Instead of using the same slow/fast parameter sets

0278-0070/$26.00 © 2011 IEEE


for all the circuits, the PSWCD methods determine the worst-case parameters for the specific performances of each circuit and nominal design. Determining the performance-specific worst-case parameters is critical for this kind of method. Although the search for the worst-case point typically uses some nonlinear optimization formulation, most PSWCD methods [13] linearize the performances at the worst-case point, which can introduce inherent errors. Some PSWCD methods build a response surface between the inter-die parameters and the performances [14] (RSM PSWCD). The inter-die parameters are independent of the design parameters, but intra-die variations are correlated with the design parameters. Therefore, it is not easy to take them into account, and, furthermore, accounting for intra-die variations for each device would increase the total number of process variation variables dramatically. While some PSWCD methods calculate an approximate estimate of the yield, others do not calculate yield at all. Instead, they calculate a range of the process parameters for a given yield, in which the specifications are met. In this case, the estimated yield is not available explicitly, and the method has to be run repeatedly with different target values (e.g., yield > 95–99%) to find the best yield that can be achieved.

3) In response-surface-based methods, macro-models relating the yield to the design variables and process parameters are first established through regression methods, and these are subsequently used to estimate the yield in the sizing process. Macro-models can be classified into white-box models and black-box models. A white-box model analytically expresses the yield as a function of the design and process parameters. Some additional parameters are used for regression purposes. Black-box models, on the other hand, do not consider analytical expressions of the yield, but construct a regression model according to the input (i.e., design points, process parameters) and the output (i.e., yield) data. Accurate yield-aware performance macro-models can let a sizing tool explore design alternatives with little computational cost. However, response-surface-based methods suffer from the trade-off between the accuracy and the complexity of the model, as well as between the accuracy and the number of samples (CPU time) needed to create the model.
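A black-box response-surface model of this kind can be sketched in a few lines: fit a regression of yield against design parameters from sampled training data, then evaluate it cheaply inside a sizing loop. The quadratic "true" yield surface and the parameter ranges below are made up purely for illustration; they stand in for expensive MC yield estimates.

```python
import numpy as np

# Hypothetical black-box response-surface (macro-)model: a quadratic
# least-squares fit of yield versus two design parameters (w, l).

rng = np.random.default_rng(0)

def true_yield(w, l):
    # Made-up smooth yield surface standing in for MC yield estimation.
    return 0.95 - 0.3 * (w - 1.0) ** 2 - 0.2 * (l - 0.5) ** 2

# Training data: sampled design points with noisy "MC" yield estimates.
W = rng.uniform(0.5, 1.5, 200)
L = rng.uniform(0.2, 0.8, 200)
Y = true_yield(W, L) + rng.normal(0, 0.01, 200)

# Quadratic basis [1, w, l, w^2, l^2, w*l] and least-squares fit.
X = np.column_stack([np.ones_like(W), W, L, W**2, L**2, W * L])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)

def surrogate(w, l):
    x = np.array([1.0, w, l, w**2, l**2, w * l])
    return float(x @ coef)

# The surrogate now replaces expensive MC yield estimation in a loop,
# at the cost of model error (the accuracy/complexity trade-off above).
print(round(surrogate(1.0, 0.5), 3))
```

The model-error/cost trade-off discussed in the text shows up directly here: a richer basis or more training samples improves accuracy but raises the cost of building the model.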

4) MC-based methods have the advantages of generality and high accuracy [16], so they are the most reliable and commonly used technique for yield estimation. Nevertheless, a large number of simulations is needed for MC analysis, preventing its use within an iterative yield optimization loop (Fig. 1). Some speed enhancement techniques for MC simulation based on design of experiments (DOE) techniques have been proposed, such as the Latin hypercube sampling (LHS) method [17], [18] or the quasi-Monte-Carlo (QMC) method [19], [20], to replace primitive Monte-Carlo (PMC) simulation. These speed improvements are very significant, but our experiments show that the computational load is still too

Fig. 1. General flow of yield optimization methods.

large for yield optimization if only DOE methods are used in practice (see Section IV).
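The LHS idea referenced above can be sketched generically: split each dimension into n equal-probability strata, draw one point per stratum, shuffle the strata independently per dimension, and map through the inverse CDF. The assumption of independent standard normal process parameters is purely for illustration.

```python
import numpy as np
from statistics import NormalDist

# Minimal Latin hypercube sampling (LHS) sketch for MC process-parameter
# samples, assuming independent standard normal parameters. Stratifying
# each dimension reduces the variance of the yield estimate compared to
# primitive Monte-Carlo (PMC) sampling.

def lhs_normal(n, dim, rng):
    # One uniform point per stratum [k/n, (k+1)/n) in each dimension.
    u = (rng.random((n, dim)) + np.arange(n)[:, None]) / n
    for d in range(dim):
        rng.shuffle(u[:, d])               # decorrelate the dimensions
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(u)                      # map to standard normal samples

rng = np.random.default_rng(1)
samples = lhs_normal(1000, 3, rng)
print(samples.shape)   # (1000, 3)
```

Each of the 1000 strata of every dimension receives exactly one sample, so the empirical marginals match the target distribution much more tightly than with PMC at the same sample count.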

Currently, PSWCD methods and response-surface-based methods are the most popular approaches for the repeated iterations within yield optimization loops, while some form of Monte-Carlo yield estimation is most popular in design verification.

Therefore, in this paper we address the efficiency of MC-based yield optimization by proposing a different (but complementary) approach exploiting techniques from computational intelligence: while keeping high accuracy, we dramatically increase the efficiency of yield optimization by: 1) optimally allocating the computing budget to candidate solutions in order to avoid non-critical MC simulations, and 2) enhancing the convergence speed of the search strategy by means of a random-scale mutation operator in combination with the differential evolution (DE) algorithm, to decrease the number of expensive MC simulations.

Based on the above ideas, we then present the ordinal optimization (OO)-based random-scale differential evolution (ORDE) algorithm for analog yield optimization. The method aims to:

1) be general enough to be applied to any analog circuit in any technology process and for any distribution of the process parameters;

2) simultaneously handle inter-die and intra-die variations in nanometer technologies;

3) provide highly accurate results comparable to Monte-Carlo analysis;

4) use an order of magnitude less computational effort compared with the improved MC-based method integrating the infeasible pruning and Latin hypercube sampling techniques (Section III-A), and as such make the computational time of accurate yield optimization practical.

The remainder of this paper is organized as follows. Section II reviews basic concepts of yield optimization. Section III introduces the components and the general framework of ORDE. Section IV tests ORDE on practical examples. Comparisons with response-surface-based methods are also performed. In Section V, the ORDE algorithm is extended from plain yield optimization to process-variation-aware single-objective analog sizing, which optimizes a target design objective (e.g., power) subject to a minimum yield requirement. The concluding remarks are presented in Section VI.


II. Basics of Yield Optimization

The aim of yield optimization is to find a circuit design point d∗ that has maximum yield, considering the manufacturing and environmental variations [8]. In the following, we elaborate on the design space D, the process parameter space S with distribution pdf(s), the environmental parameter space Θ, and the specifications P.

The design space D is the search space of the circuit design points d, which can be transistor widths and lengths, resistances, capacitances, and bias voltages and currents. Each one has an upper and lower bound, which is determined by the technological process or the user's setup. The process parameter space S is the space of statistical parameters reflecting the process fluctuations, e.g., oxide thickness Tox and threshold voltage Vth. Process parameter variations can be inter-die or intra-die. For an accurate model, both types should be considered. The environmental variables Θ include temperature and power supply voltage. The specifications P are the requirements set by the designer, which can be classified into performance constraints (e.g., DC gain > 70 dB) and functional constraints (e.g., transistors must work in the saturation region).

The yield optimization problem can be formulated as finding a design point d∗ that maximizes the yield (in the case of plain yield optimization) [13]

d∗ = arg max_{d ∈ D} {Y(d)}    (1.1)

or that minimizes some function f (e.g., power, area) subject to a minimum yield requirement y (in the case of yield-aware sizing) [17]

d∗ = arg min_{d ∈ D} {f(d, s, θ)},  s ∈ S, θ ∈ Θ,  s.t. Y(d) ≥ y.    (1.2)

Yield is defined as the percentage of manufactured circuits that meet all the specifications considering process and environmental variations. Hence, yield can be formulated as

Y(d) = E{YS(d, s, θ) | pdf(s)}    (2)

where E is the expected value. YS(d, s, θ) is equal to 1 if the performance of d meets all the specifications considering s (process fluctuation) and θ (environmental variation); otherwise, YS(d, s, θ) is equal to 0. In most analog circuits, circuit performances change monotonically with the environmental variables θ. Then, the impact of environmental variations can be handled by simulations at the extreme values of the environmental variables. For instance, if the power supply may experience some variation, e.g., 10%, the largest degradation is obtained by simulating at the extreme values: (1 ± 10%) × nominal value. Process variations, on the other hand, are much more complex: directly simulating the extreme values (classical worst-case analysis [1]) may cause serious over-design. This paper therefore focuses on the impact of statistical process variations (space S) in yield optimization.

Fig. 2. Two-stage yield estimation flow.

III. The ORDE Algorithm

A. The Use of Infeasible Pruning and DOE in ORDE

To satisfy the first three goals from Section I (be general enough, handle both inter-die and intra-die variations, and provide high accuracy), MC analysis is selected, together with DOE as the speed-enhancement technique for MC-based yield estimation. The DOE method implemented in ORDE is LHS. However, the key contributions of ORDE are not tied to a particular sampling mechanism; therefore, other speed acceleration methods, like the recently proposed QMC [19], [20], can be integrated. In the yield optimization process, some candidate solutions will appear that cannot satisfy the specifications even for nominal values of the process parameters. Their yield values will be too low to make them useful candidate solutions. Hence, there is not much sense in applying the MC-based yield estimation to these solutions. In ORDE, we call them infeasible solutions and assign them a zero yield value. Their constraint violation is calculated, and the constrained optimization algorithm will minimize the constraint violations and move the search toward feasible solutions (i.e., design points that satisfy the specifications for nominal process parameters). This technique is named "infeasible pruning" in this paper. The selected feasible solutions are handled by ordinal optimization, which is described below.
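The infeasible-pruning step described above can be sketched as a small routine: a candidate that fails the specifications at nominal process parameters gets zero yield plus a constraint-violation score instead of an expensive MC run. The nominal performance model and the normalized violation measure below are hypothetical choices for illustration.

```python
# Sketch of "infeasible pruning": skip MC for candidates that fail the
# specs at nominal process parameters. The nominal performance model
# and the violation measure are made-up placeholders.

def nominal_performances(d):
    # Hypothetical nominal simulation result for design point d.
    return {"gain_db": 68.0 + 10.0 * d}

def prune(d, specs):
    """Return (feasible, yield_estimate, violation)."""
    p = nominal_performances(d)
    # Sum of normalized spec violations (0 when all specs are met).
    violation = sum(max(0.0, (v - p[k]) / abs(v)) for k, v in specs.items())
    if violation > 0.0:
        return False, 0.0, violation   # infeasible: zero yield, no MC run
    return True, None, 0.0             # feasible: yield left to the OO/MC stage

specs = {"gain_db": 70.0}
print(prune(0.1, specs)[0])   # False: nominal gain 69 dB < 70 dB
print(prune(0.5, specs)[0])   # True: nominal gain 73 dB >= 70 dB
```

The violation score is what the constrained optimizer minimizes to pull infeasible candidates back toward the feasible region.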

B. Basics of ORDE

Many recent analog circuit sizing and yield optimization methodologies are based on evolutionary computation (EC), which relies on the evolution of a set of candidate solutions, commonly called the population, along a set of iterations, commonly called generations [19]. The computational effort at each iteration and the necessary number of iterations are two key factors that affect the speed of the yield optimization. We address these two problems by optimally allocating the computing budget to each candidate in the population (reducing the computational effort at each iteration) and by improving the search mechanism (decreasing the necessary number of iterations and, hence, the number of expensive MC simulations). Therefore, the total computational effort can be reduced considerably. In this paper, we use two computational intelligence techniques to implement these two key ideas.

Our yield estimation flow is depicted in Fig. 2. In order to optimally allocate the computing budget at each iteration, instead of assigning the same number of MC simulations to all feasible solutions, the yield estimation process is divided into two stages. In the first stage, the fitness ranking of the candidate solutions and a reasonably accurate yield estimation result for good (critical) solutions are important. For medium or bad (non-critical) candidate solutions, their ranking is important, but accurate yield estimation is not. The reason is that the yield estimates of non-critical candidates only serve to guide the selection operator in the EC algorithm; the candidates themselves are likely not to be selected as the final result, or not even to enter the second stage of the yield optimization flow. Hence, the computational effort spent on feasible but non-optimal candidate solutions can be strongly reduced. On the other hand, the estimations for these non-critical candidates cannot be too inaccurate either; after all, correct selection of candidate solutions in the yield optimization algorithm is necessary. In the first stage, the yield optimization problem is therefore formulated as an ordinal optimization problem, aimed at identifying critical candidate solutions by allocating a sufficient number of samples to the MC simulation of these solutions, while reasonably few samples are allocated to non-critical solutions [22]. Notice that this approach is intended to assign a different number of MC simulations to the yield estimations of the different candidates. This is different from, and compatible with, the efficient sampling addressed by any DOE technique. In the second stage of the ORDE method, an accurate result is highly important, so the number of simulations within each yield estimation is increased to obtain an accurate yield value.

Fig. 3. Yield optimization flow.

Another key technique of ORDE is to decrease the necessary number of iterations of the optimization flow shown in Fig. 3. Instead of using conventional EC algorithms, we design a selection-based random-scale DE algorithm (RSDE), which is a combination of three different techniques. Each technique plays a significant role in one phase. The first phase emphasizes a selection-based method to focus the search on the feasible solution space, defined by the nominal values of the process parameters. We use the DE framework [23] (a powerful and fast global optimization algorithm) for the global search (emphasized in the second phase) and a random-scale operator for fine tuning (emphasized in the third phase).

In the following, the basic components of ORDE will be introduced first, and the general framework will then be presented.

C. Introducing Ordinal Optimization into Yield Optimization

OO has emerged as an efficient technique for simulation and optimization, especially for problems where the computation of the simulation models is time consuming [22]. OO is based on two basic tenets.

1) Obtaining the "order" of a set of candidates is easier than estimating an accurate "value" for each candidate. The convergence rate of ordinal optimization is exponential. This means that the probability that the observed best solution is among the true best solutions converges at a rate O(e^(−αn)), where α is a positive real number and n is the number of simulations [22]. In contrast, the convergence rate of methods aimed at estimating the right value instead of the order, e.g., the direct Monte-Carlo method, is at most O(1/√n) [24].

2) An accurate estimate is very costly, but a satisfactory value can be obtained much more easily.

Therefore, OO fits the objectives of the first stage of the yield estimation of ORDE (see Fig. 2) quite well. In the first stage, a set of good designs is selected through evolution and sent to the second stage. The requirement is a correct selection, with a reasonably accurate yield estimation, at the smallest computational effort. According to OO, a large portion of the simulations should be conducted on the critical solutions in order to reduce the estimator variance. On the other hand, limited computational effort should be spent on non-critical solutions that have little effect on identifying the good solutions, even if they have large variances. This leads to the core problem in ordinal optimization, allocating the computing budget, which can be formulated as follows: given a pre-defined computing budget, how should it be distributed among the candidate designs?

Consider the yield evaluation function. For a single simulation (e.g., a sample of the process parameters), we define YS(d, s) = 1 if all the circuit specifications are met, and YS(d, s) = 0 otherwise. Because the MC simulation determines the yield as the ratio of the number of functional chips to all fabricated chips, the mean value of YS(d, s) corresponds to the yield value Y(d). Let us consider a total computing budget equal to T simulations. In ORDE, T is determined by the number of feasible solutions (i.e., solutions that meet the performance constraints for nominal values of the process parameters) at each generation. Here, we set T = simave × M1, where M1 is the number of feasible solutions and simave is the average budget for each candidate, set by the user. The budget allocation problem consists in determining the numbers of simulations n1, n2, ..., nM1 of the M1 candidate solutions such that n1 + n2 + ... + nM1 = T. For this problem, several algorithms have been reported in the specialized literature [25], [26]. An asymptotic solution to this optimal computing budget allocation problem is proposed in [25]:

n_b = σ_b (Σ_{i=1, i≠b}^{M1} n_i²/σ_i²)^{1/2}

n_i/n_j = ((σ_i/δ_{b,i}) / (σ_j/δ_{b,j}))²,  i, j ∈ {1, 2, ..., M1},  i ≠ j ≠ b    (3)

where b is the best design of the M1 candidate solutions (identified by the highest estimated yield value based on the available samples for each candidate). For each candidate solution, some samples are allocated. For each sample, the corresponding YS(d, s) can be computed (0 or 1). From these YS(d, s), we can calculate their mean (the estimated yield Y(d)) and σ1², σ2², ..., σM1², which are the finite variances of the M1 solutions, respectively. They measure the accuracy of the estimation. Parameter δ_{b,i} = Y_b(d) − Y_i(d) represents the deviation of the estimated yield value of each design solution with respect to that of the best design. The interpretation of (3) is quite intuitive. If δ_{b,i} is large, the estimated yield value of design i is bad, and, according to n_i/n_j = ((σ_i/δ_{b,i})/(σ_j/δ_{b,j}))², n_i becomes small, i.e., we should not allocate many simulations to this design. However, if σ_i is large, the accuracy of the yield estimation is low, and we should allocate more simulations to this design to obtain a better yield estimate. Therefore, the quotient σ_i/δ_{b,i} represents a trade-off between the yield value of design i and the accuracy of its estimation. Based on this, an OO-based yield analysis algorithm can be designed as Algorithm 1.

Algorithm 1 Ordinal optimization for analog yield analysis

Step 0: Let k = 0, and perform n0 simulations for each feasible design, i.e., n_i^k = n0, i = 1, 2, ..., M1.
Step 1: If Σ_{i=1}^{M1} n_i^k ≥ T, stop the OO for yield analysis.
Step 2: Consider Δ additional simulations (refer to [22] for the selection of the Δ and n0 values) and compute the new budget allocation n_i^{k+1}, i = 1, 2, ..., M1, by (3). If n_i^{k+1} ≥ nmax, then n_i^{k+1} = nmax.
Step 3: Perform additional max{0, n_i^{k+1} − n_i^k} simulations for design d_i, i = 1, 2, ..., M1. Let k = k + 1 and go to Step 1.

Fig. 4. Function of OO in a typical population.

Parameter n0 is the initial number of simulations for each candidate solution, selected to provide a very rough idea of the yield; more simulations are allocated later on, according to the quality of the candidate. Parameter nmax is the upper limit of the number of simulations for any candidate. The value of nmax must strike a balance between accuracy and efficiency.

A typical population from example 2, described in Section IV, is selected to show the benefits of OO (see Fig. 4): candidates with a yield value larger than 70% correspond to 31% of the population, and are assigned 56% of the simulations. Candidates with a yield value smaller than 40% correspond to 33% of the population, and are only assigned 12% of the simulations. The total number of simulations is 10.2% of that of the infeasible pruning (IP)+LHS method applied to the same candidate designs, because repeated MC simulations of non-critical solutions are avoided.

The above technique is used until the yield converges close to the desired value. For example, if the desired target yield is 99%, the threshold value between the first and the second stage can be 97%. Candidates with an estimated yield larger than the threshold value enter the second stage. In this stage, all the candidates are assigned the specified maximum number (nmax) of samples to guarantee the accuracy of the final result, while the other candidates in the population remain in the first stage and still use the estimation method described previously. Note that the two stages are therefore not separated in time, but rather use different yield estimation methods.

The threshold value must be properly selected. A too low threshold value may cause low efficiency, as OO would stop when the yield values of the selected points are not promising enough (e.g., a 50% yield threshold for a requirement of 90% yield), shifting the yield estimation and selection tasks to the second stage, which is more CPU expensive. A too high threshold value (e.g., a threshold equal to the yield requirement) may cause low accuracy. The reason is that in most cases the points selected by OO are promising (OO can compare the candidates and make the selection correctly), but the estimated yield values are not sufficiently accurate for the final result. Setting the threshold two percentage points below the required target yield represents an appropriate trade-off between efficiency and accuracy.
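The two-stage dispatch described above can be sketched as a small routine. The "target yield minus two percentage points" threshold follows the text; the candidate representation and the estimator callables are hypothetical stand-ins for the OO stage and the nmax-sample MC stage.

```python
# Sketch of the two-stage yield-estimation dispatch: candidates whose
# OO-stage estimate exceeds (target - 2 points) are re-estimated with
# the full n_max samples; the rest stay in the cheap OO stage.

def dispatch(candidates, target_yield, oo_estimate, full_estimate, n_max):
    """candidates: list of design points; returns {design: (stage, yield)}."""
    threshold = target_yield - 0.02          # e.g., 0.97 for a 0.99 target
    results = {}
    for d in candidates:
        y1 = oo_estimate(d)                  # cheap, OO-allocated estimate
        if y1 > threshold:
            results[d] = ("stage2", full_estimate(d, n_max))  # accurate
        else:
            results[d] = ("stage1", y1)      # ranking-quality estimate only
    return results

# Toy usage with made-up estimators.
est = {"A": 0.98, "B": 0.90}
out = dispatch(["A", "B"], 0.99,
               oo_estimate=lambda d: est[d],
               full_estimate=lambda d, n: est[d],
               n_max=500)
print(out)   # A enters stage 2, B stays in stage 1
```

As the text notes, both stages coexist within one generation: the dispatch is per candidate, not a global switch.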

D. Brief Introduction to the DE Algorithm

In addition to introducing OO to decrease the computational effort at each iteration, decreasing the necessary number of iterations is another key objective. The DE algorithm [23] is selected as the global search engine. DE outperforms many EC algorithms in terms of solution quality and convergence speed [23]. It uses a simple differential operator to create new candidate solutions and a one-to-one competition scheme to greedily select new candidates.

The ith candidate solution in the Q-dimensional search space at generation t can be represented as

di(t) = [di,1, di,2, · · · , di,Q]. (4)

At each generation t, the mutation and crossover operators are applied to the candidate solutions, and a new population arises. Then, selection takes place, and the corresponding candidate solutions from both populations compete to compose the next generation. The operators are now explained in detail.

For each target candidate solution, according to the mutation operator, a mutant vector is built

Vi(t + 1) = [vi,1(t + 1), . . . , vi,Q(t + 1)]. (5)

It is generated by adding the weighted difference between a given number of candidate solutions randomly selected from the previous population to another candidate solution. In ORDE, the latter is selected to be the best individual in the current population. The mutation operation is therefore described by the following equation:

Vi(t + 1) = dbest(t) + F (dr1(t) − dr2(t)) (6)

where indices r1 and r2 (r1, r2 ∈ {1, 2, . . . , M}) are randomly chosen and mutually different, and also different from the current index i. Parameter F ∈ (0, 2] is a constant called the scaling factor, which controls the amplification of the differential variation dr1(t) − dr2(t). The population size M must be at least 4, so that the mutation can be applied. The base vector to be perturbed, dbest(t), is the best member of the current population, so that the best information can be shared among the population.

After the mutation phase, the crossover operator is applied to increase the diversity of the population. Thus, for each target candidate solution, a trial vector is generated as follows:

Ui(t + 1) = [ui,1(t + 1), . . . , ui,Q(t + 1)] (7)

ui,j(t + 1) = { vi,j(t + 1),  if rand(i, j) ≤ CR or j = randn(i)
             { di,j(t),       otherwise                              (8)

where j = 1, 2, . . . , Q and rand(i, j) is an independent random number uniformly distributed in the range [0, 1]. Parameter randn(i) is a randomly chosen index from the set {1, 2, . . . , Q}. Parameter CR ∈ [0, 1] is a constant called the crossover parameter, which controls the diversity of the population.

Following the crossover operation, the selection operation decides whether the trial vector Ui(t + 1) will be a member of the population of the next generation t + 1 or not. For a minimization problem, Ui(t + 1) is compared to the initial target candidate solution di(t) by the following one-to-one greedy selection criterion:

di(t + 1) = { Ui(t + 1),  if f(Ui(t + 1)) < f(di(t))
            { di(t),      otherwise                                  (9)

where the function f is the objective function, i.e., the function to be minimized or maximized. In this paper, the objective function is the yield in the case of standard yield optimization and some circuit performance in the case of yield-aware sizing. The candidate solution di(t + 1) becomes part of the new population. Then, the next iteration begins.
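The mutation (6), crossover (8), and greedy selection (9) steps for one generation can be sketched as follows. This is a pure-Python illustration of the DE/best/1/bin scheme for minimizing an arbitrary objective f; the function name and list-based representation are ours, not the paper's code.

```python
import random

def de_step(pop, f, F=0.8, CR=0.8):
    """One DE generation: mutation around the best member (Eq. 6),
    binomial crossover (Eq. 8), one-to-one greedy selection (Eq. 9)."""
    M, Q = len(pop), len(pop[0])
    best = min(pop, key=f)                   # d_best(t), shared base vector
    new_pop = []
    for i, d in enumerate(pop):
        r1, r2 = random.sample([k for k in range(M) if k != i], 2)
        # mutation, Eq. (6): V = d_best + F * (d_r1 - d_r2)
        v = [best[j] + F * (pop[r1][j] - pop[r2][j]) for j in range(Q)]
        # crossover, Eq. (8): take v_j with probability CR; one forced dimension
        jr = random.randrange(Q)
        u = [v[j] if (random.random() <= CR or j == jr) else d[j]
             for j in range(Q)]
        # selection, Eq. (9): keep the trial only if it improves f
        new_pop.append(u if f(u) < f(d) else d)
    return new_pop
```

Because selection is one-to-one and greedy, the best objective value in the population is non-increasing from generation to generation.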

E. Random-Scale Operator for Combined Global and Local Search

Although DE is a very powerful and fast global optimization algorithm, it is not so efficient in local tuning to reach the exact optimal solution (other global optimization algorithms, e.g., genetic algorithms, have the same problem). But local tuning is emphasized in the fine-tuning phase (phase 3 in Fig. 3). Usually, in this phase, there are many candidates which are assigned the maximum number of simulations (nmax) to estimate the yield, which is expensive, but otherwise the accuracy would degrade very significantly. Moreover, the global optimization mechanism must be maintained even in this phase, because otherwise the yield optimization has a high risk of getting stuck at a premature solution. We therefore propose a combined global and local search mechanism, whose purpose is to enhance the convergence speed while maintaining the accuracy.

Fig. 5. Mutant vectors obtained by the random-scale operator.

In the EC field, enhancing the local search ability is often achieved by memetic algorithms [27]. In addition to the global optimization engine, memetic algorithms use a population-based strategy coupled with individual search heuristics capable of performing local refinements. Local search methods can be classified into derivative-based methods (e.g., the quasi-Newton method [28]) and derivative-free methods (e.g., the hill climbing method [29]). In derivative-based methods, calculating the required derivatives, e.g., the Hessian matrix, often consumes numerous function evaluations when the number of design variables is large, especially when the derivatives cannot be expressed analytically. Normally, for medium-scale problems (10–20 design variables), derivative-free methods also need more than 20–30 iterations for each candidate, and each iteration needs nmax simulations. As this number has to be multiplied by the size of the population, this procedure becomes very expensive. Hence, for yield optimization, a cheaper method is necessary.

Instead of performing a separate global and local search, our proposed approach combines global search and local search into a unified procedure. In (6), the scaling factor F is a constant for all the candidate solutions. If F is small, the whole evolution process will be slow; if F is large, it is difficult to perform effective local fine-tuning. To solve this problem, a natural idea is to randomize F in (6) for each differential variation. By randomizing F, the differences of vectors can be amplified stochastically, and the diversity of the population is retained. This introduces two advantages: 1) the algorithm has a lower probability of providing premature solutions because of the reasonable diversity, and 2) the vicinity of the mutant vector that the standard DE can explore is investigated by the randomized amplification of the differential variation dr1(t) − dr2(t). In a standard DE search process, the candidates may get stuck in a region and make the evolution quite slow when the global optimum is nearly reached (although the diversity is maintained). This is called "stagnation" [30]. Fig. 5 shows the effect of randomizing F. It can be seen that a cloud of potential points centered around the mutant vector with constant scaling factor F1 has the potential to be investigated.

Fig. 6. Flow diagram of ORDE.

In our method, as a scaling factor we use a vector F̂ composed of Gaussian-distributed random variables with mean value μ and standard deviation σ: F̂i,j = norm(μ, σ), i = 1, 2, . . . , M, j = 1, 2, . . . , Q. A Gaussian distribution is selected based on the following two considerations.

1) As the purpose of the random scaling factor is to search the vicinity of the mutant vectors obtained with the constant F, it should not be far from it. With a Gaussian distribution, 68% of the generated samples in F̂ are within 1σ.

2) It should have the ability to escape from "stagnation." A Gaussian distribution also provides about 5% of F̂ values beyond 2σ. We have also tried uniform and Cauchy distributions for the scaling factor on benchmark problems in the EC field and found that the Gaussian-distributed F̂ results in the best average objective function value.

For each variable in the search space, the scaling factor F̂i,j of each differential variation dr1(t) − dr2(t) is different. Equation (6) is therefore changed to

Vi(t + 1) = dbest(t) + F̂i(dr1(t) − dr2(t)). (10)

By the proposed combined global and local search mechanism, the necessary number of iterations of the yield optimization algorithm decreases significantly (see example 2 in Section IV).
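The random-scale mutation of (10) can be sketched as follows, using the μ = 0.75, σ = 0.25 values reported in the experiments of Section IV. The function name and argument layout are ours; each dimension draws its own Gaussian scale factor.

```python
import random

def random_scale_mutation(best, d_r1, d_r2, mu=0.75, sigma=0.25):
    """Mutation of Eq. (10): each dimension j gets its own Gaussian
    scale factor F_hat ~ N(mu, sigma), so the vicinity of the
    constant-F mutant vector is explored and 'stagnation' [30] can
    be escaped."""
    return [best[j] + random.gauss(mu, sigma) * (d_r1[j] - d_r2[j])
            for j in range(len(best))]
```

With μ = 0.75 and σ = 0.25, about 68% of the drawn factors lie in [0.5, 1.0], close to the common constant setting F = 0.8, while the remaining draws provide the stochastic amplification discussed above.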

F. Other Components

Besides the two key ideas described above, we use the selection method in [31] to handle the optimization constraints. These include both circuit performance constraints (e.g., gain larger than 80 dB) and functional constraints (e.g., transistors must work in the saturation region). The advantages of this selection method and its combination with the DE algorithm for analog sizing have been shown in [32].

G. The General Framework of ORDE

Based on the above components, the overall ORDE algorithm for analog yield optimization can now be constructed. The detailed flow diagram is shown in Fig. 6.

The ORDE algorithm consists of the following steps.

Algorithm 2 The ORDE algorithm for analog yield optimization

Step 0: Initialize parameters n0, T, �, nmax and the DE algorithm parameters (e.g., the population size M, the crossover rate CR). Initialize the population by randomly selecting values of the design variables d within the allowed ranges.
Step 1: Update the current best candidate. If no candidate meets the specifications for nominal process parameters, the best candidate is the one with the smallest constraint violation. Otherwise, the best candidate is the feasible candidate with the largest estimated yield.
Step 2: Perform the mutation operation according to (10) to obtain each candidate solution's mutant counterpart.
Step 3: Perform the crossover operation between each candidate solution and its corresponding mutant counterpart according to (8) to obtain each individual's trial individual.
Step 4: Check the feasibility of the trial individual. For feasible solutions, go to Step 5.1; for infeasible solutions, go to Step 5.2.
Step 5.1: Set the constraint violation equal to 0, and use the OO technique described in Algorithm 1 to calculate the yield. If the estimated yield is higher than the threshold value, add additional samples to perform the full MC simulation. Go to Step 6.
Step 5.2: Set the yield equal to 0, and calculate the constraint violations. No yield is estimated in this step. Go to Step 6.
Step 6: Perform selection between each candidate solution and its corresponding trial counterpart according to the rules in [31]: if both are infeasible, select the one with the smaller constraint violation; if one is feasible and the other is infeasible, select the feasible one; if both are feasible, select the one with the higher yield.
Step 7: If the stopping criterion is met (e.g., a convergence criterion or a maximum number of generations), output dbest and its objective function value; otherwise, go to Step 1.
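The feasibility-based selection of Step 6 (the rules of [31]) reduces to a small comparison. A sketch follows; the dict-based candidate representation with 'violation' and 'yield' entries is ours, purely for illustration.

```python
def select(cand, trial):
    """One-to-one selection of Step 6 ([31]): 'violation' is the
    total constraint violation (0 if feasible), 'yield' the
    estimated yield of the candidate."""
    c_feas = cand["violation"] == 0
    t_feas = trial["violation"] == 0
    if not c_feas and not t_feas:      # both infeasible: smaller violation wins
        return cand if cand["violation"] <= trial["violation"] else trial
    if c_feas != t_feas:               # exactly one feasible: it wins
        return cand if c_feas else trial
    # both feasible: higher yield wins
    return cand if cand["yield"] >= trial["yield"] else trial
```

Ties are resolved in favor of the current candidate here; the paper does not specify tie-breaking, so that choice is an assumption.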

IV. Experimental Results and Comparisons

In this section, the ORDE algorithm is demonstrated on two practical analog circuits in 0.35 μm and 90 nm CMOS technologies, respectively. To highlight the effects of the two key contributions of ORDE, it is compared with a reference method that combines the DE optimization engine, the selection-based constraint handling mechanism, and the infeasible pruning and LHS techniques. In the DE search engine, the population size is 50 and the crossover rate is 0.8. Except for ORDE, the DE scaling factor F in the other experiments is 0.8, which is a common setting [23]. In the random-scale search operator, we choose a Gaussian distribution with μ = 0.75 and σ = 0.25. The optimization process stops when the reported yield reaches 99%, or when the yield does not increase for 20 consecutive generations. If parameter n0 is set to too low a value, the yield estimates are too inaccurate, even for the application of (3). If it is too high, the advantages of OO are lost. A value between 5 and 20 is recommended in [22]. We use n0 = 15 in ORDE. Parameter simave is set to 35 in all experiments. It may seem that 35 simulations are too few to get an acceptable accuracy. However, three considerations must be taken into account.

1) LHS sampling is used. It has been reported that LHS achieves an accuracy comparable to primitive MC (PMC) simulation in circuit yield analysis with just 20–25% of the number of samples of PMC [34].

2) Parameter simave is just the average number of simulations over different candidates. By using OO, the MC simulations are optimally allocated according to the solution qualities, so promising candidate solutions are assigned many more than 35 simulations. According to our experiments, some promising candidates are assigned more than 160 LHS simulations.

3) We do not need a very accurate result in the first stage, as the purpose of this stage is correct selection and obtaining a reasonably accurate yield estimate for promising points.

The examples have been run on a PC with 4 GB RAM and the Linux operating system, in the MATLAB environment. The Synopsys HSPICE electrical simulator is used as the circuit performance evaluator. The key techniques in this paper, i.e., OO for yield optimization and the random-scale operator, are analyzed by statistical results here. The abilities of the DE optimization kernel for analog sizing compared with some other EC algorithms have been reported in [32] and will not be compared here.

A. Experimental Method

Several aspects have to be considered when designing the experiments. First, the number of MC simulations for each feasible candidate should be decided. There is not much sense in comparing efficiency without good accuracy. The number of MC simulations in the second stage is the main factor that influences the accuracy of the final result. The accuracy of the yield estimates is related to the number of samples according to [1]

nMC ≈ kγ² · Y(1 − Y) / ΔY²    (11)

where Y is the yield value and ΔY is the confidence interval half-width, e.g., if the yield estimate is 90% and ΔY = 1%, then the confidence interval is 89–91%. Parameter kγ reflects the confidence level, e.g., kγ = ±1.645 denotes a 90% confidence level. From (11), the necessary number of MC simulations can be calculated. However, this corresponds to the primitive MC (PMC) simulation. According to [34] and [35], LHS sampling requires 20–25% of the number of samples of PMC to get a comparable accuracy. Fig. 7 shows the estimated number of LHS simulations needed for confidence levels of 90%, 95%, and 99%, respectively, when ΔY = 1%. The number of LHS simulations is estimated as 20% of the necessary number of PMC samples. It can be seen that even for a 99% confidence level, for a yield larger than 96%, 500 LHS points are sufficient. For a 90% confidence level, 500 LHS points are sufficient even for a yield larger than 90%. In all the experiments, the threshold to use 500 LHS simulations is 97%, so 500 LHS samples are enough for the required accuracy.

Fig. 7. Necessary numbers of LHS simulations.
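Equation (11) and the 20% LHS scaling from [34] can be evaluated directly; a small sketch (the function names are ours):

```python
def n_mc(Y, dY, k_gamma):
    """Eq. (11): primitive MC samples needed for yield Y, confidence
    interval half-width dY, and confidence coefficient k_gamma."""
    return k_gamma ** 2 * Y * (1.0 - Y) / dY ** 2

def n_lhs(Y, dY, k_gamma, lhs_factor=0.2):
    """LHS needs roughly 20-25% of the PMC samples [34]; 20% here."""
    return lhs_factor * n_mc(Y, dY, k_gamma)
```

For Y = 96%, ΔY = 1%, and kγ = 2.576 (99% confidence), this gives about 2548 PMC or 510 LHS samples, consistent with the observation in Fig. 7 that 500 LHS points suffice for yields above 96%.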

Second, the estimated yield result is influenced by the number of samples. Two experiments using 50 and 500 MC simulations for each feasible candidate can both report a solution with "100% yield," but the true yield values can be quite different. To reflect the real accuracy, we calculate the yield estimated by 50 000 LHS MC simulations at the same design point. From (11), we can calculate that with a 99% confidence level and ΔY = 0.1%, the corresponding yield value of 50 000 LHS simulations is 96%. The results we test are all higher than 96%. Hence, an estimation result from 50 000 LHS simulations is a very reliable approximation of the real yield value for use as a reference result.

Third, the method to measure the efficiency should be decided. The performance of evolutionary algorithms (EAs) is affected by the random numbers used in the evolution operators. The CPU times and the yield results differ between runs. To address the stochasticity of the results of the evolution process, all experiments are therefore executed ten times with different random numbers, and the results are analyzed and compared statistically, showing typical, best, and worst performance. In this way, the comparison in terms of accuracy and efficiency is reliable.

B. Test Example 1

The ORDE algorithm is first tested on a fully differential folded-cascode amplifier, shown in Fig. 8, implemented in a 0.35 μm CMOS process with a 3.3 V power supply. The specifications are gain A0 ≥ 70 dB, gain-bandwidth product GBW ≥ 40 MHz, phase margin PM ≥ 60°, output swing OS ≥ 4.6 V, and power ≤ 1 mW. There are 13 design variables. The transistor widths range from 1 μm to 600 μm; the transistor lengths range from 0.35 μm to 5 μm; the biasing current ranges from 10 μA to 200 μA. The total number of process variation variables is 80, including 15 transistors × 4 intra-die variables/transistor = 60 intra-die variables (mismatch) and 20 inter-die variables. Statistical information on the process variables has been extracted from the technology information provided by the foundry.

Experiments with the infeasible pruning (IP)+LHS method have been performed using 300 and 500 LHS MC simulations for each feasible candidate. The results of the yield estimate provided by a 50 000 MC simulation analysis at the same final design point and the total number of simulations are analyzed.


Fig. 8. Fully differential folded-cascode amplifier.

TABLE I
Yield Results (Using 50 000 MC Simulations) of the Solutions Obtained by Different Methods (Example 1)

Methods                      Best    Worst   Average  Variance
300 simulations (IP + LHS)   98.5%   96.2%   96.8%    0.008%
500 simulations (IP + LHS)   98.7%   96.6%   97.5%    0.007%
ORDE                         99.1%   97.3%   98.5%    0.004%

TABLE II
Total Number of Simulations (Example 1)

Methods                      Best     Worst      Average
300 simulations (IP + LHS)   181 500  986 100    464 570
500 simulations (IP + LHS)   357 500  3 591 500  1 824 520
ORDE                         25 376   439 984    150 540

The statistical results of ten independent runs are shown in Tables I and II and in Fig. 9.

From inspection of Table I, the experiment with 500 IP+LHS MC simulations is selected as the benchmark to compare with ORDE. From Fig. 9, we can see that the deviations of ORDE from the accurate yield estimate obtained with 50 000 LHS samples are much smaller than those of the other methods, and the computational cost is much lower. With respect to the number of simulations, shown in Table II, ORDE needs only 8.25% of the simulations of the infeasible pruning (IP)+LHS method with comparable accuracy. Moreover, in many runs with 300 or 500 simulations and standard DE, the final reported results do not reach the yield requirement of 99%, while 90% of the reported results of ORDE reach 99%. It can be concluded that the random-scale search operator enhances the search ability of DE. The average CPU time of ORDE for this example is about 3–4 min.

In the following, a more complex example is tested, and the contributions of OO and the random-scale search operator are investigated separately.

Fig. 9. Comparison of average yield estimate deviation and number of simulations for different methods for example 1.

TABLE III
Yield Results (Using 50 000 MC Simulations) of the Solutions Obtained by Different Methods (Example 2)

Methods                      Best    Worst   Average  Variance
300 simulations (IP + LHS)   99.0%   97.9%   98.3%    0.002%
500 simulations (IP + LHS)   99.3%   98.2%   98.9%    0.002%
OO + IP + LHS                99.7%   98.1%   98.9%    0.003%
ORDE                         99.6%   98.3%   98.9%    0.002%

C. Test Example 2

The ORDE algorithm is now tested on a two-stage fully differential folded-cascode amplifier with common-mode feedback, shown in Fig. 10. The circuit is designed in a 90 nm CMOS process with a 1.2 V power supply. The specifications are A0 ≥ 60 dB, GBW ≥ 45 MHz, PM ≥ 60°, OS ≥ 1.9 V, power ≤ 2.5 mW, and √area ≤ 250 μm. There are 21 design variables. The transistor widths range from 0.12 μm to 800 μm; the transistor lengths range from 0.1 μm to 20 μm; the compensation capacitance ranges from 0.1 pF to 50 pF; the biasing current ranges from 0.05 mA to 50 mA. All transistors must be in the saturation region. The total number of process variation variables for this technology is 143, including 24 transistors × 4 intra-die variables/transistor = 96 intra-die variables and 47 inter-die variables. Statistical information on the process variables was extracted from the technology information provided by the foundry.

Experiments with 300 and 500 simulations for each feasible candidate by the reference IP+LHS method have been performed. We separately study the improvement provided by the introduced OO technique and the improvement provided by the random-scale operator. The results of the yield estimate provided by a 50 000 MC simulation analysis at the same final design point and the total number of simulations are analyzed. The statistical results of ten independent runs are shown in Tables III and IV and in Fig. 11.

From the best, worst, and average yield values in Table III, it can be seen that the accuracy with 300 simulations is clearly lower than with 500 simulations. To assess the separate contributions of OO and the random-scale differential evolution operator, two experiments are conducted. The first one only includes OO as well as the IP and LHS techniques. The second experiment corresponds to the use of ORDE (OO and RSDE combined). For statistical characterization, ten runs of each experiment are performed.

Fig. 10. Two-stage fully differential folded-cascode amplifier.

TABLE IV
Total Number of Simulations (Example 2)

Methods                      Best     Worst    Average
300 simulations (IP + LHS)   115 500  546 900  264 130
500 simulations (IP + LHS)   172 500  688 000  418 730
OO + IP + LHS                39 828   140 537  90 209
ORDE                         16 335   100 795  47 421

Fig. 11. Comparison of average yield estimate deviation and number of simulations for different methods for example 2.

From Fig. 11, we can see that the deviations of ORDE from the target value are very close to those obtained using 500 simulations, while the computational cost is much lower. With respect to the number of simulations, shown in Table IV, ORDE needs only 11.32% of the number of simulations of the IP+LHS method with comparable accuracy. These results come from the contributions of both OO and the random-scale operator. Without the random-scale operator, as can be seen from the result of the OO+IP+LHS method, 21.54% of the simulations of the IP+LHS method are spent. The average CPU time of ORDE for this example is 25 min. It can therefore be concluded that ORDE improves the CPU time by an order of magnitude for the same accuracy compared to the improved MC-based method integrating the infeasible pruning and Latin hypercube sampling techniques.

Fig. 12. Result of using NN to approximate yield.

D. Comparisons to RSM Methods

The advantages and drawbacks of the PSWCD methods have been discussed in Section I. Here, we experimentally compare ORDE with response-surface-based methods.

In response-surface-based methods, the data obtained from expensive MC simulations at a number of design points are used to generate a regression model able to predict the yield at other design points much more cheaply than with an MC simulation, albeit at the price of a loss of accuracy. Hence, there exist two trade-offs. The first one is the balance between the accuracy and the complexity of the model. In deep-submicrometer or nanometer technologies, a sufficiently accurate white-box model may be very complex and make the regression computationally intractable [8]. The second trade-off is the balance between the accuracy and the number of samples needed to build the model. If sufficient accuracy is required, sufficient and well-distributed training data must be provided. But the computational cost also increases sharply as the density of the samples increases. In the following, we show the trade-off between accuracy and the computational cost of generating the training data using a black-box model.

To assess the loss of accuracy, we use example 1 from Section IV-B and consider a response-surface method based on a neural network (NN), often considered a powerful regressor [36]. Here, we use a backpropagation NN [36] with 20 neurons in the hidden layer and the Levenberg-Marquardt algorithm [37], [38] for training it to approximate the yield. As the source of the training data, we use the data generated during a typical execution of ORDE. It has to be noted that using these sampling data favors the macromodel, as these training data are more significant (because of the key techniques in ORDE, which make the sampling more effective) than randomly selected MC simulations, or even those selected by applying only IP and DOE techniques. We consider the data generated up to a given iteration of ORDE as training data (design parameters as input and yield values as output), and use the data (yield values) of subsequent iterations as test data to assess the accuracy of the macromodel. At every iteration, we use the data from all previous iterations to train the NN and use it to predict the yield values of the current iteration. The error between the predicted yield values and the real yield values obtained by MC simulations is then calculated and plotted in Fig. 12. The y-axis shows the root-mean-square (RMS) error of the yield predictions. The x-axis shows the ratio of the computation time to generate the training data for the NN (by ORDE) to the total computation time of ORDE up to a given iteration. It can be seen that in the beginning the error decreases sharply, but then it levels off. Even when all the data from ORDE are used to train the NN, the RMS error is still 8.06%, while ORDE can provide an error of less than 1%. Therefore, response-surface-based methods have difficulties in achieving sufficient accuracy.

V. Enhancing ORDE for Single-Objective Variation-Aware Sizing

As can be seen from the previous experiments, the ORDE algorithm meets the goals (being general enough, handling both inter-die and intra-die variations, and providing very high accuracy) with a significant enhancement in efficiency that makes the CPU time practical for yield optimization. On the other hand, in real practice, if the yield requirement can be met, designers sometimes want to further optimize some objective function (e.g., power or area) while maintaining the target yield, as shown in (1.2). To achieve this, we present an extended version of ORDE for single-objective variation-aware sizing.

A. ORDE-Based Single-Objective Variation-Aware Sizing

In single-objective variation-aware sizing, both the objective function f (e.g., power) and the constraint (yield Y) must be considered simultaneously. Hence, we first look at the differences between them. Yield is not a stochastic variable, but we have some uncertainty in its estimation. If we performed an infinite number of MC simulations, the yield would have an exact value. The objective function, or specification, is different. If we performed an infinite number of MC simulations, the power would still have a probability distribution function, but with an accurate mean and an accurate variance caused by the process variations. Therefore, for yield, we use its expected value to describe it. For the objective function, we use the 3σ value to guarantee the reliability of the expected objective function value, where σ is extracted from the samples.

The main idea of extending ORDE from plain yield optimization to single-objective variation-aware sizing is to add an outer selection procedure considering the objective function value and the yield as a constraint. The detailed selection rules are as follows: for each candidate solution and its corresponding trial counterpart:

1) if neither of them is feasible for nominal process parameters, select the one with the smaller constraint violation;

2) if one is feasible and the other is infeasible for nominal process parameters, select the feasible one;

3) if both are feasible for nominal process parameters:

a) if both of them violate the yield constraint, select the one with the smaller yield constraint violation;

b) if one satisfies the yield constraint and the other does not, select the feasible one;

c) if both of them satisfy the yield constraint, select the one with the smaller f̄(d) + 3σf(d) (f is the objective function to be minimized, f̄ is its mean value).

Using the above selection rule to replace the original selection rule in ORDE, the extended ORDE for single-objective variation-aware sizing can be implemented. We can roughly divide the algorithm into two phases: the yield satisfaction phase and the objective function optimization phase. If we handled the single-objective variation-aware sizing problem as a new task, the yield satisfaction phase would be run first. However, we already have candidates that satisfy the yield constraint, as the plain yield optimization is done first to check whether the yield requirement can be met. In this method, we use the last population of the plain yield optimization as the initial population of the extended ORDE to prevent the yield satisfaction phase from running twice.
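The outer selection rules 1)-3) can be sketched as one comparison function. The dict-based candidate representation ('violation', 'yield', 'f_mean', 'f_sigma') and the function name are ours, for illustration only.

```python
def select_sizing(cand, trial, yield_target=0.99):
    """Outer selection for variation-aware sizing, rules 1)-3):
    'violation' is the nominal constraint violation (0 if feasible),
    'yield' the estimated yield, 'f_mean'/'f_sigma' the mean and std
    of the objective f over the process-variation samples."""
    c_feas = cand["violation"] == 0
    t_feas = trial["violation"] == 0
    if not (c_feas and t_feas):                      # rules 1) and 2)
        if c_feas != t_feas:
            return cand if c_feas else trial
        return cand if cand["violation"] <= trial["violation"] else trial
    c_ok = cand["yield"] >= yield_target
    t_ok = trial["yield"] >= yield_target
    if not (c_ok and t_ok):                          # rules 3a) and 3b)
        if c_ok != t_ok:
            return cand if c_ok else trial
        return cand if cand["yield"] >= trial["yield"] else trial
    # rule 3c): both meet the yield target -> smaller f_mean + 3*f_sigma
    cost = lambda x: x["f_mean"] + 3.0 * x["f_sigma"]
    return cand if cost(cand) <= cost(trial) else trial
```

The layering matters: nominal feasibility is checked before the yield constraint, and the f̄ + 3σf objective is only compared once both candidates already meet the yield target.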

B. Example

Here we use example 2 from Section IV with the specifications A0 ≥ 60 dB, GBW ≥ 45 MHz, PM ≥ 60°, OS ≥ 1.9 V, power ≤ 2.5 mW, √area ≤ 250 μm, and settling time ≤ 25 μs with a 1% error band (this specification needs transient simulation). The yield specification is 99%, and the power is the target design objective to be minimized. Five runs with different random seeds are performed. For plain yield optimization without optimizing the power consumption, ORDE satisfies the yield specification of 99% at a power of 2.38 mW. We then use the extended ORDE to minimize power while maintaining a yield larger than 99%. The average power value now becomes 1.63 mW. The average CPU time is 8919 s.

VI. Conclusion

In this paper, the ORDE algorithm has been proposed for efficient yield optimization of analog integrated circuits, considering both inter-die and intra-die process variations. The method is general. ORDE can provide very accurate results at far less computational cost (an order of magnitude smaller) than the MC-based method using infeasible pruning and Latin hypercube sampling techniques. This improved efficiency makes statistical yield optimization useful in practice. This is achieved by using the following techniques from computational intelligence.

1) ORDE uses a two-stage yield estimation process with ordinal optimization in the first stage, which determines the simulation effort for each candidate solution "intelligently."

2) The proposed random-scale operator maintains the diversity and performs a combined global and local search, thus enhancing the convergence speed of the search engine.

3) The use of design-of-experiments techniques, infeasible pruning, and the selection-based constraint-handling technique also contributes positively to ORDE.


Furthermore, ORDE has been extended from plain yield optimization to process-variation-aware single-objective analog sizing, which has been shown to give good results. Therefore, ORDE and its extended version are both reliable and efficient methods for analog circuit yield optimization, especially for new nanometer technologies with large variability. Moreover, ORDE is based on evolutionary computation and statistical sampling methods, which are very well suited for parallel computation.

Acknowledgment

The authors sincerely thank B. Machiels, ESAT-MICAS, Katholieke Universiteit Leuven, Leuven, Belgium, for valuable discussions.


Bo Liu was born in Beijing, China, on September 23, 1984. He received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 2008. Since 2008, he has been a Ph.D. candidate at the MICAS Laboratories of the Katholieke Universiteit Leuven, Leuven, Belgium.

Since 2008, he has been a Research Assistant with the MICAS Laboratories of the Katholieke Universiteit Leuven, under the supervision of Prof. G. Gielen. His current research interests include design automation methodologies of analog and radio frequency integrated circuits, evolutionary computation, machine learning, and fuzzy logic. He has authored or co-authored more than 20 papers in international journals and conference proceedings.

Mr. Liu is a reviewer in the artificial intelligence and analog design automation fields, for publications such as the IEEE Transactions on Evolutionary Computation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Information Sciences (Elsevier), and Integration, the VLSI Journal (Elsevier). He is a book reviewer for Elsevier, The Netherlands, and Bentham Science Publishers, UAE.

Francisco V. Fernández received the Physics-Electronics degree from the University of Seville, Seville, Spain, in 1988, and the Ph.D. degree in 1992.

In 1993, he was a Post-Doctoral Research Fellow with Katholieke Universiteit Leuven, Leuven, Belgium. From 1995 to 2009, he was an Associate Professor with the Department of Electronics and Electromagnetism, University of Seville, where he was promoted to Full Professor in 2009. He is currently a Researcher with IMSE-CNM, CSIC, University of Seville. He has authored or edited three books and has co-authored more than 100 papers in international journals and conferences. His current research interests include the design and design methodologies of analog and mixed-signal circuits.

Dr. Fernández is currently the Editor-in-Chief of Integration, the VLSI Journal (Elsevier). He regularly serves on the program committees of several international conferences. He has participated as a researcher or a main researcher in several national and European research and development projects.

Georges G. E. Gielen (S'87–M'92–SM'99–F'02) received the M.S. and Ph.D. degrees in electrical engineering from Katholieke Universiteit Leuven, Leuven, Belgium, in 1986 and 1990, respectively.

In 1990, he was a Post-Doctoral Research Assistant and Visiting Lecturer with the Department of Electrical Engineering and Computer Science, University of California, Berkeley. From 1991 to 1993, he was a Post-Doctoral Research Assistant with the Belgian National Fund of Scientific Research, ESAT Laboratory, Katholieke Universiteit Leuven. In 1993, he became an Assistant Professor with Katholieke Universiteit Leuven, where he was promoted to Full Professor in 2000. He has authored or co-authored two books and more than 300 papers in edited books, international journals, and conference proceedings. His current research interests include the design of analog and mixed-signal integrated circuits, and especially analog and mixed-signal computer-aided design tools and design automation (modeling, simulation and symbolic analysis, analog synthesis, analog layout generation, and analog and mixed-signal testing). He is a coordinator or a partner of several (industrial) research projects in this area.

Dr. Gielen is regularly a member of the program committees of international conferences (DAC, ICCAD, ISCAS, DATE, CICC, and so on), and served as the General Chair of the DATE Conference in 2006 and of the ICCAD Conference in 2007. He serves regularly as a member of the editorial boards of international journals (IEEE Transactions on Circuits and Systems, Springer International Journal on Analog Integrated Circuits and Signal Processing, Elsevier Integration). He received the 1995 Best Paper Award of the John Wiley International Journal of Circuit Theory and Applications, and was the 1997 Laureate of the Belgian Royal Academy of Sciences, Literature and Arts in the discipline of engineering. He received the 2000 Alcatel Award from the Belgian National Fund of Scientific Research for his innovative research in telecommunications, and won the DATE 2004 Best Paper Award. He served as an elected member of the Board of Governors of the IEEE Circuits and Systems (CAS) Society and as the Chairman of the IEEE Benelux CAS Chapter. He served as the President of the CAS Society in 2005.