Variability in nanometer CMOS: Impact, analysis, and minimization

21
INTEGRATION, the VLSI journal 41 (2008) 319–339 Invited paper Variability in nanometer CMOS: Impact, analysis, and minimization Dennis Sylvester a, , Kanak Agarwal b , Saumil Shah a a University of Michigan, Ann Arbor, MI, USAxc b IBM Austin Research Laboratories, Austin, TX, USA Received 6 August 2007; received in revised form 7 September 2007; accepted 10 September 2007 Abstract Variation is a significant concern in nanometer-scale CMOS due to manufacturing equipment being pushed to fundamental limits, particularly in lithography. In this paper, we review recent work in coping with variation, through both improved analysis and optimization. We describe techniques based on integrated circuit manufacturing, circuit design strategies, and mathematics and statistics. We then go on to discuss trends in this area, and a future technology outlook with an eye towards circuit and CAD-solutions to growing levels of variation in underlying device technologies. r 2007 Elsevier B.V. All rights reserved. 1. Introduction Technology scaling has led to a significant increase in process variability due to random dopant effects, imperfec- tions in lithographic patterning of small devices, and related effects [1]. These variations cause significant unpredictability in the power and performance characte- ristics of integrated circuits (ICs). In particular, the variations have a large effect on leakage power. Leakage power consumption now contributes a significant propor- tion of total power [2]. The susceptibility of leakage power on process variations is due to its exponential dependence on threshold voltage. In [3], it is shown that for a typical process, delay variation between fast and slow dies is around 30%, whereas leakage variation can be as high as 20 . Power and delay are negatively correlated, therefore it is often found that fast chips have unacceptably high power consumption, and low-power chips are too slow. This two-sided constraint is significantly reducing the parametric yield in process technologies at 90 nm and beyond. This yield loss will worsen in future technologies due to increasing process variations and the continued significance of leakage power. In recent times, there has been a wealth of research aiming to analyze circuit variability and reduce the effects on parametric yield. In this paper, we review a range of such techniques that not only predict the effects of process variation, but also allow designers to use the information to guide circuit optimization. Using this information allows designers to move away from the conventional methods of optimizing for power or timing, instead focusing their optimization towards the objective of improving both concurrently. In Section 2, we explore some of the causes of process variation and their contribution towards parametric yield reduction. We also analyze the impact of variability on the yield of SRAM arrays. Section 3 reviews methods for the modeling and analysis of variation. These include a method to compute the joint probability distribution function (jpdf) of power and timing, thus computing parametric yield given simultaneous power and timing constraints, a method for the modeling of interconnect variation, and a technique to model the effect of imperfect lithographic patterning on circuit timing and leakage. In Section 4, we discuss some design techniques for variability reduction. The first among these is an adaptive body biasing (ABB) scheme. We also review variation-tolerant microarchitec- ture techniques, a gate-length biasing mechanism, and a self-compensating layout scheme. Section 5 presents one perspective on the future of design and CAD with respect ARTICLE IN PRESS www.elsevier.com/locate/vlsi 0167-9260/$ - see front matter r 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.vlsi.2007.09.001 Corresponding author. E-mail address: [email protected] (D. Sylvester).

Transcript of Variability in nanometer CMOS: Impact, analysis, and minimization

ARTICLE IN PRESS

0167-9260/$ - se

doi:10.1016/j.vl

�CorrespondE-mail addr

INTEGRATION, the VLSI journal 41 (2008) 319–339

www.elsevier.com/locate/vlsi

Invited paper

Variability in nanometer CMOS: Impact, analysis, and minimization

Dennis Sylvestera,�, Kanak Agarwalb, Saumil Shaha

aUniversity of Michigan, Ann Arbor, MI, USAxcbIBM Austin Research Laboratories, Austin, TX, USA

Received 6 August 2007; received in revised form 7 September 2007; accepted 10 September 2007

Abstract

Variation is a significant concern in nanometer-scale CMOS due to manufacturing equipment being pushed to fundamental limits,

particularly in lithography. In this paper, we review recent work in coping with variation, through both improved analysis and

optimization. We describe techniques based on integrated circuit manufacturing, circuit design strategies, and mathematics and statistics.

We then go on to discuss trends in this area, and a future technology outlook with an eye towards circuit and CAD-solutions to growing

levels of variation in underlying device technologies.

r 2007 Elsevier B.V. All rights reserved.

1. Introduction

Technology scaling has led to a significant increase inprocess variability due to random dopant effects, imperfec-tions in lithographic patterning of small devices, andrelated effects [1]. These variations cause significantunpredictability in the power and performance characte-ristics of integrated circuits (ICs). In particular, thevariations have a large effect on leakage power. Leakagepower consumption now contributes a significant propor-tion of total power [2]. The susceptibility of leakage poweron process variations is due to its exponential dependenceon threshold voltage. In [3], it is shown that for a typicalprocess, delay variation between fast and slow dies isaround 30%, whereas leakage variation can be as high as20� . Power and delay are negatively correlated, thereforeit is often found that fast chips have unacceptably highpower consumption, and low-power chips are too slow.This two-sided constraint is significantly reducing theparametric yield in process technologies at 90 nm andbeyond. This yield loss will worsen in future technologiesdue to increasing process variations and the continuedsignificance of leakage power.

e front matter r 2007 Elsevier B.V. All rights reserved.

si.2007.09.001

ing author.

ess: [email protected] (D. Sylvester).

In recent times, there has been a wealth of researchaiming to analyze circuit variability and reduce the effectson parametric yield. In this paper, we review a range ofsuch techniques that not only predict the effects of processvariation, but also allow designers to use the informationto guide circuit optimization. Using this information allowsdesigners to move away from the conventional methods ofoptimizing for power or timing, instead focusing theiroptimization towards the objective of improving bothconcurrently.In Section 2, we explore some of the causes of process

variation and their contribution towards parametric yieldreduction. We also analyze the impact of variability on theyield of SRAM arrays. Section 3 reviews methods for themodeling and analysis of variation. These include a methodto compute the joint probability distribution function(jpdf) of power and timing, thus computing parametricyield given simultaneous power and timing constraints, amethod for the modeling of interconnect variation, and atechnique to model the effect of imperfect lithographicpatterning on circuit timing and leakage. In Section 4, wediscuss some design techniques for variability reduction.The first among these is an adaptive body biasing (ABB)scheme. We also review variation-tolerant microarchitec-ture techniques, a gate-length biasing mechanism, and aself-compensating layout scheme. Section 5 presents oneperspective on the future of design and CAD with respect

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339320

to variability, bringing up several directions of researchthat merit increased attention.

2. Impact of process variation

Process variation can have a significant impact on powerdissipation, performance and functionality of a circuit. Inthis section, we discuss the impact of manufacturingvariations on parametric and functional yield of a design.

2.1. Impact on parametric yield

Parametric yield is defined as the percentage of dies thatsatisfy the specified frequency and power constraints.Parametric yield is different from manufacturing yieldand does not include yield loss due to manufacturingdefects. With fluctuations in process parameters, a large setof dies may not meet the power or performance budget ofthe design. This causes a significant yield loss as such chipscannot be shipped despite being functionally correct. Inthis section, we discuss such yield-related issues anddemonstrate that the increasing dominance of processvariation has made parametric yield loss a serious concernin current technologies.

A large set of process parameters can impact the powerand performance of a design. Among all these parameters,the most significant are channel length and thresholdvoltage. Fig. 1 shows frequency scatter due to thresholdand channel length variation obtained from Monte Carlo(MC) SPICE simulations for a 15-stage ring oscillatorcircuit in a 90 nm process. The variability in channel lengthand threshold voltage is obtained from estimates for thisindustrial process. As expected, frequency is higher atsmaller channel lengths and degrades considerably forlarger channel lengths. This is in accordance with tradi-tional practices where process spread is defined as spread inchannel length and negative and positive sigma corners are

Fig. 1. Scatter plot of normalized frequency plotted against sigma-

variation in channel length.

referred to as fast and slow process corners, respectively.As can be seen from the figure, there can be large variationin frequency due to process variation. The slow chips thatdo not meet the timing constraint must be discardedresulting in parametric yield loss.Besides timing yield, process variation can also cause

yield loss due to its impact on power. In previoustechnologies, when leakage (static) power was an insignif-icant component of the total power, the impact of processvariation on parametric yield was limited to the timingyield. It is because dynamic power shows a weakdependence on process variation. Dynamic power isimpacted primarily by the switching capacitance, which islinearly dependent on device channel length. The switchingcapacitance is fairly independent of small variations inthreshold voltage and the effect of threshold voltagevariation on short-circuit power can be neglected. Hence,it can be presumed that dynamic power is sensitive only tochannel length variations. More importantly, the impact ofprocess variation on dynamic power causes power con-sumption at slow process corners to be higher than that atfast corners. This trend is shown in Fig. 2, which plotsnormalized power and frequency against sigma-variationsin channel length superimposed on the Gaussian densityfunction of variations in channel length [4]. If a certainpower and frequency constraint is imposed on such aspread, then slower dies also violate the power constraint.Hence, the constraint on parametric yield is one-sided andwe can determine the maximum sigma-value for channellength that meets both power and frequency constraintsand reject all dies lying outside this channel length bound.With growing contribution of leakage power to total

power, the impact of variation on power dissipation of achip has become significant. It is because process variationhas a strong impact on leakage current due to the

Fig. 2. Yield window for frequency constraint of fmin ¼ 0.9fnom and

power constraint of Pmax ¼ 1.05Pnom. The figure shows that for negligible

static (leakage) power, the yield window is subjected to only one-sided

constraint.

ARTICLE IN PRESS

Fig. 3. Scatter plot of normalized leakage power plotted against sigma-

variation in channel length.

1.4

1.3

1.2

1.1

1.0

0.90 5 10 15 20

Norm

aliz

ed F

requency

Normalized Lekage (lsb)

30%

20X

0.18 micron

~1000 samples

Fig. 4. Distribution of chip performance and leakage based on silicon

measurements over a large number of samples of a high-end processor

design [3].

Fig. 5. Yield window for frequency constraint of fmin ¼ 0.9fnom and

power constraint of Pmax ¼ 1.05Pnom. The figure shows that leakage

power results in a two-sided constraint on the yield window.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 321

exponential dependence of subthreshold leakage on chan-nel length and threshold variation and gate leakage onoxide thickness variation. Fig. 3 shows the variation innormalized leakage power (subthreshold+gate) plottedagainst sigma-variation in channel length obtained fromMC SPICE simulations The figure shows that there can benearly an order of magnitude variation in leakage powerdue to process variation. The key point here is that leakageis higher for smaller channel lengths due to thresholdvoltage roll-off effects in short-channel devices. Thisinverse correlation is also illustrated in Fig. 4, which showsthe distribution of chip performance and leakage based onsilicon measurements over a large number of samples of ahigh-end processor design [3]. As can be seen, both themean and variance of the leakage distribution increasesignificantly for chips with higher frequencies. As a resulttotal power is higher at fast corners than at slow corners.This trend is shown in Fig. 5 [4]. The figure also shows thatsuch a trend in power results in a two-sided constraint onthe yield window. This is due to the fact that fast dies fail to

meet the power budget due to high leakage, while low-leakage chips are too slow to meet performance specifica-tions. This key point is the reason why there can besignificant yield loss due to process variation as leakagepower grows in significance. This shrink in yield windowis what makes process variation a significant issue forASIC or custom high-performance and power consciousdesigns.

2.2. Impact on functional yield

Besides degrading parametric yield, process variationcan also cause nominally functional circuits to fail atskewed process corners. Various analog and digital circuitsthat rely on transistor matching can easily fail in thepresence of process variation-induced device mismatch.The most significant impact of process variation onfunctional yield is observed in the form of poor yield ofthe static random access memory (SRAM) arrays in scaledtechnologies [5,6].SRAM yield is very important from an economic

viewpoint due to the critical and the ubiquitous nature ofmemory in modern processors and SoCs. Density is a veryimportant metric for memories and hence SRAM cells usethe smallest manufacturable device sizes in a giventechnology. However, the threshold voltage variation dueto random dopant fluctuation is inversely proportional togate area [7–9]. Due to this dependence, the small sizedtransistors in a memory cell see a highly pronouncedrandom dopant effect. Moreover, the yield requirementsfor SRAM designs are more stringent than other logicblocks. Ref. [10] illustrates this problem with an example ofa 4MB cache. The authors show that a typical 4MB cachewith error correcting code (ECC) cells contains approxi-mately 38 million cells. To ensure that there is at most onefailure in this cache, the circuit must operate correctly up to

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339322

5.44 sigmas. This sort of fault coverage is extremely hard toachieve considering that SRAM cells are traditionallydesigned to ensure that the contents of the cell do not getaltered during read access while the cell should be able toquickly change its state during the write operation. Theseconflicting read and write requirements are satisfied bybalancing the relative strengths of the devices in the design.Such careful design of an SRAM cell provides stable readand write operation, but it also makes the cell vulnerable tothe failures caused by random variations in the devicestrengths.

For a 6-T SRAM cell as shown in Fig. 6, a readoperation typically involves precharging the bitlinesfollowed by reading the contents of the cell through theaccess transistors. When the access transistors are turnedon, one of the precharged bitlines discharges through theaccess device and the inverter pull-down transistor. Forexample in Fig. 6, during a read access, the bitline BL willdischarge through the access device AL and the pull-downdevice NL to read a zero at node L. This method of readingcell contents exposes the internal storage node L to thedisturbance caused by the resistive voltage division betweenthe access and the pull-down devices. Typically, thisdisturbance is minimized by making the pull-down devicemuch stronger than the access transistor. However,random variations in the threshold voltages can changethe strengths of various devices in an SRAM cell therebycausing the read operation to flip the contents of the cell.Similarly, write capability of a cell depends on relativestrengths of pull-up device and the access transistor. Thewrite operation in an SRAM cell is performed by settingthe bitlines to the desired values and enabling the accesstransistors to drive the internal nodes of the cell. Forexample in Fig. 6, a zero is written at the node R by drivingbitline BR low and setting the wordline high. The resistivevoltage division between the pull-up device PR and theaccess transistor AR pulls the internal node R to zerocausing the cell to change its state. In the presence ofsignificant offsets in device strengths due to processvariation, a SRAM cell may fail to write a desired statein the cell during the write operation. These sorts of failurescan be classified as functional failures because the circuitsfail to function as expected.

PR

NR

PL

NL

AR

WL

BL BR

VDD

Node L

AL (VL=0)

GND

(VR=1)

Node R

Fig. 6. Schematic of a conventional 6-T SRAM cell.

3. Modeling and analysis of variation

In the previous section, we showed that process variationcan have a significant impact on the parametric yield of adesign. This yield loss will worsen in future technologiesdue to increasing process variations and the continuedsignificance of leakage power. Hence there is a critical needto develop accurate yield estimation approaches that arecomputationally efficient to enable yield-driven optimiza-tion tools. In this section, we discuss various approaches tomodel impact of process variation on power and perfor-mance variability of a design.

3.1. Parametric yield modeling

Ref. [11] presents an efficient gate-level approach toaccurately estimate the parametric yield defined by leakagepower and delay constraints. Note that the techniquedescribed below shares certain drawbacks with all so-called‘traditional’ statistical CAD approaches—this is furtherdiscussed in Section 5.1 with a possible alternative pathdescribed.The yield in [11] is estimated by finding the jpdf for delay

and leakage power. The process parameters are expressedas a sum of correlated and random components and thesum of variances of both these components provides theoverall variation in the process parameter. To handlethe correlated components of variations (inter-die andcorrelated intra-die) the overall chip area is divided into agrid as shown in Fig. 7. Each square in the gridcorresponds to a random variable (RV) of the processparameter which has correlations with all other RVscorresponding to other squares in the grid [12]. The valuesat the bottom right corner of each of the grids show thecorrelation coefficients with the top-left square on the grid.Squares that are much further apart clearly demonstratelower correlation compared to adjacent squares on the gridin this model.

1.00 0.50

0.50

0.33

0.33

0.41 0.31

0.31 0.26

Fig. 7. Example partitions of a circuit using a grid to model the correlated

component of variations (correlation coefficients referenced to top left

element).

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 323

To simplify the problem, this set of correlated RVs isreplaced by another set of mutually independent RVs withzero mean and unit variance using the principal compo-nents of the set of correlated RVs [12]. A vector of RVs(say X) with a correlation matrix C, can be expressed as alinear combination of the principal components Y as

X ¼ Dx þ OV�1D1=2Y , (1)

Here Dx is the vector of the mean values of X, O is adiagonal matrix with the diagonal elements being thestandard deviations of X, V is the matrix of theeigenvectors of C, and D is a diagonal matrix ofthe eigenvalues of C. Since the correlation matrix of amultivariate (non-degenerate) Gaussian RV is positive-definite, all elements of D are positive and the square-rootin (1) can be evaluated. The delay and leakage power of anindividual gate can be expressed as follows:

Delay ¼ dnom þXp

i¼1

apðDPpÞ,

Leakage ¼ exp Vnom þXp

i¼1

bpðDPpÞ

!. ð2Þ

Here dnom and exp(Vnom) are the nominal values of delayand leakage power, respectively, and the a’s and b’srepresent the sensitivities of delay and the log of leakage tothe process parameters under consideration. The variableDPp represents the change in the process parameters fromtheir nominal value.

In a statistical scenario, the process parameters aremodeled as RVs. If the overall circuit is partitioned using agrid as shown in Fig. 7, the delay of individual gates can beexpressed as a function of these RVs. Using the principalcomponent approach, the delay in (2) can then beexpressed as

Delay ¼ dnom þXp

i¼1

ap

Xn

j¼1

gjizj

!þ ZdR, (3a)

where zj’s are the principal components of the correlatedRVs DPp’s in (2) and the g’s can be obtained from (1).R�N(0,1) in the above equation represents the randomcomponent of variations of all the process parameterslumped into a single term that contributes a total varianceof Z2d to the overall variance of delay. Similarly the leakagepower for an individual gate can be expressed as

Leakage ¼ exp Vnom þXp

i¼1

bp

Xn

j¼1

gjizj

!þ ZlR

!. (3b)

These representations of delay and power allow forsignificant simplification in the joint timing and poweranalysis, which otherwise becomes computationally ineffi-cient if the spatial correlation were maintained withoutsuch a unified principal component approach. The delay of

each gate (say a) can now be expressed as follows:

Da ¼ a0 þXn

i¼1

aizi þ anþ1R. (4)

This serves as the canonical expression for delay. Themean delay is simply the nominal delay (a0). Since theprincipal components (zi’s) are uncorrelated N(0,1) RVs,the variance of the delay can be expressed as

VarðDaÞ ¼Xn

i¼1

ða2i Þ þ a2

nþ1 (5)

and the covariance of the delay with one of the principalcomponents can be obtained as

Cov ðDa; ziÞ ¼ EðDaziÞ � EðDaÞEðziÞ ¼ a2i

8i 2 f1; 2; . . . ; ng. ð6Þ

Since the random components are uncorrelated and donot contribute to the covariance of the delay at the twonodes at the inputs of a gate (e.g., ‘a’ and ‘b’), thecovariance can be obtained as

Cov ðDa;DbÞ ¼Xn

i¼1

aibi. (7)

In deterministic timing analysis the delay of the circuit isfound by applying two functions to the delay of individualgates: sum and max. Similar functions for the canonicaldelay expressions (4) are defined as

SumðDa;DbÞ ¼ ða0 þ b0Þ þXn

i¼1

ðai þ biÞzi þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffia2

nþ1 þ b2nþ1

qR.

(8)

The max function of normally distributed RVs(c ¼ max(a, b)) is not a strict Gaussian but can be closelyapproximated by another Gaussian [12–14]. Ref. [15]provides equations to calculate the mean and variance ofc in terms of the mean and variance of a and b and theircorrelation coefficient. Ref. [15] also develops expressionsto evaluate the correlation of c with any other RV in termsof the correlation of the RV with a and b. The max c can beexpressed in the same canonical form as a and b [12,13]. Tofind the coefficients in the expression for c in canonicalform, the mean, variances and the correlation of c with theprincipal components are matched, giving

c0 ¼ E maxða; bÞð Þ

ci ¼ covðc; ziÞ ¼ cov maxða; bÞ; zið Þ 8i 2 f1; . . . ; ng

cnþ1 ¼ Var maxða; bÞð Þ �Xn

i¼1

c2i

!1=2

. ð9Þ

Using the sum and max operations outlined above, onecan develop an expression for the delay of a circuit in termsof the RVs associated with process parameter variations.The correlation in delay and power can be preserved byusing a similar principal component-based approach withthe same underlying RVs.

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339324

For power analysis, the leakage power of the total circuitcan be expressed as a sum of correlated lognormalvariables. This sum can be accurately approximated asanother lognormal RV [16]. The leakage power of anindividual gate a is expressed as

Paleak ¼ exp a0 þ

Xn

i¼1

aizi þ anþ1R

!, (10)

where the z’s are principal components of the RVs andthe a’s are the coefficients obtained using (1) and (2).The mean and variance of the RV in (10) can then becomputed as

EðPaleakÞ ¼ exp a0 þ

1

2

Xnþ1i¼1

a2i

!, (11)

VarðPaleakÞ ¼ exp 2a0 þ

Xnþ1i¼1

a2i

!� exp 2a0 þ

1

2

Xnþ1i¼1

a2i

!.

(12)

The correlation of the leakage of gate a with thelognormal RV associated with zj is found by evaluating

E PaleakeZj

� �¼ exp a0 þ

1

2

Xnþ1i¼1;iaj

a2i þ ðaj þ 1Þ2

!

8j 2 f1; 2; . . . ; ng. ð13Þ

Similarly the covariance of the leakage of two gates(a and b) can be obtained by using

E PaleakPb

leak

� �¼ exp ða0 þ b0Þ

þ1

2

Xn

i¼1

ðai þ biÞ2þ a2

nþ1 þ b2nþ1

!!. ð14Þ

The sum of leakage power can be expressed in thesame canonical form as (10). If the random variablesassociated with all the gates in the circuit are summedin a single step the overall complexity of the approachis O(n2) due to the size of the correlation matrix. Sincethe sum of two lognormal RVs is assumed to be alognormal variable in the same form, one can use arecursive technique to estimate the sum of more than twolognormal RVs. In each recursive step we sum two RVs ofthe form in (10) to obtain another RV in the samecanonical form. The coefficients in the expression forthe sum are obtained by matching the first two moments(as in Wilkinson’s method [17]) and the correlationswith the lognormal RVs associated with each of theGaussian principal components. We outline the steps inone of the recursive steps where Pb

leak and Pcleak are summed

to obtain Paleak

Paleak ¼ exp a0 þ

Xn

i¼1

aizi þ anþ1

!

¼ exp b0 þXn

i¼1

bizi þ bnþ1

!

þ exp c0 þXn

i¼1

cizi þ cnþ1

!¼ Pb

leak þ Pcleak. ð15Þ

The coefficients associated with the principal compo-nents can be found using (11)–(14) and expressing thecoefficients associated with the principal components as

ai ¼ logEðPa

leakezi Þ

EðPaleakÞEðe

zi Þ

� �

¼ logEðPb

leakezi Þ þ EðPcleakezi Þ

EðPbleakÞ þ EðPc

leak� �

Eðezi Þ

!. ð16Þ

Using the expressions developed in [18], the remaining twocoefficients in the expression for Pa

leak can be expressed as

a0 ¼1

2log

EðPbleakÞ þ EðPc

leak� �4

EðPbleakÞ þ EðPc

leak� �2

þ VarðPbÞ þ VarðPcÞ þ 2CovðPbPcÞ

!,

anþ1 ¼ log 1þVarðPbÞ þ VarðPcÞ þ 2CovðPbPcÞ

EðPbleakÞ þ EðPb

leak� �2

!�Xn

i¼1

a2i

" #0:5.

ð17Þ

Having obtained the sum of two lognormals in the originalcanonical form, the process can be recursively repeated tocompute the expression for the total leakage power of thecircuit.Once we have algorithms for statistical timing and power

analysis, the parametric yield of a circuit given delay andpower constraints can be expressed as

Y ¼ PðDpD0;PpP0Þ (18)

which is the probability of the circuit delay and powerbeing less than D0 and P0, respectively. Since delay andpower are correlated, the yield cannot be simply computedby multiplying the separate probabilities. To express theyield as the probability of a bivariate Gaussian RV, wetake the logarithm of the leakage power constraint. Thecorrelation coefficient of the two Gaussian RVs in the yieldequation can now be obtained using (7). The yield in termsof two N(0,1) RVs N0 and N1 can be expressed as

Y ¼ P N0pD0 � mD

sD

;N1plogðP0Þ � mlogðPÞ

slogðPÞ

� �. (19)

Since correlation does not change under a lineartransformation with positive coefficients, the correlationbetween N0 and N1 remains same.Table 1 compares the analytical yield estimates to those

obtained using Monte Carlo-based simulations for ISCASbenchmark circuits [19]. The table also shows thesignificance of correlation in parametric yield estimation.

ARTICLE IN PRESS

Table 1

Yield estimates for different frequency bins using the proposed approach and MC-based simulations

Benchmark circuit Monte Carlo Analytical yield Yield neglecting correlation

DoDm DmoDo1.1*Dm DoDm DmoDo1.1*Dm DoDm DmoDo1.1*Dm

c432 0.17 0.43 0.14 0.46 0.31 0.30

c499 0.17 0.46 0.15 0.49 0.32 0.32

c880 0.20 0.43 0.16 0.46 0.32 0.31

c1908 0.18 0.48 0.14 0.49 0.32 0.32

c2670 0.16 0.44 0.14 0.47 0.31 0.30

c3540 0.22 0.43 0.20 0.44 0.33 0.32

c5315 0.19 0.48 0.19 0.48 0.33 0.33

c6288 0.22 0.47 0.21 0.46 0.34 0.34

c7552 0.20 0.47 0.19 0.47 0.33 0.33

1.31.4

1.5

1.6

0

2

4

6

-10.4

-10.0

-9.6

-9.2

log

(Lea

kage

)

Delay (ns)

Fig. 8. Joint probability distribution function for the bivariate Gaussian

distribution for c3540.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 325

The analytical approach provides good estimates of theyield with an average misprediction in yield of 2%. Fig. 8shows a representative joint probability density function(jpdf) of the log of leakage and delay, which is a bivariateGaussian jpdf.

As discussed in the previous section, power results in atwo-sided constraint on the yield window. The yieldmethodology discussed in this section enables efficientcomputation of parametric yield considering correlatedvariations in power and performance. The approach canform the basis for power and performance constrainedparametric yield-driven optimization techniques as op-posed to purely timing-yield-driven optimizations whichdramatically penalize power yield.

3.2. Interconnect variation modeling

Process variation can have a significant impact on bothdevice (front-end of the line) and interconnect (back-end of

the line) performance. In this section, we focus on modelinginterconnect delay while considering variability in thephysical dimensions. The impact of process variation oninterconnect performance is context dependent [20]. Fortwo different interconnect structures, when metal thicknessis increased, the delay of the one structure may increasewhile that of the other structure may reduce [20]. Thismakes it very difficult to capture the impact of variabilityon interconnect delay using a traditional corner-basedmethod.To first order, delay through an interconnect can be

expressed as the RC product of its resistance andcapacitance. With any change in the physical dimensionsof the wire, its resistance and capacitance also change,causing interconnect delay to fluctuate. In order to modelthe impact of variability on wire delay, we need to capturethe effect of geometric variations on the electrical para-meters. The change in electrical parameters due tovariations in geometric dimensions can be captured bythe simple linear approximation as follows:

R ¼ Rnom þ a1 DW þ a2 DT ,

C ¼ Cnom þ b1 DW þ b2 DT þ b3 DH. ð20Þ

Here, Rnom and Cnom represent nominal resistance andcapacitance values, computed when the wire dimensionsare at their nominal or typical values. DW, DT, and DH

represent the change in metal width, metal thickness, andILD thickness, respectively. The coefficients ai and bi arethe modeling coefficients in the linearized model. Thislinear approximation shows a high degree of accuracy forour purpose while remaining very simple to use [21]. Forsimplicity, Eq. (20) considers variations only in metal width(W), metal thickness (T) and interlayer dielectric thickness(H) but the other variation sources can be included in asimilar manner.Ref. [22] presents a technique for mapping geometric

variations in interconnect dimensions to corresponding RCdelay variation. Once interconnect dimensions are mappedto the circuit parameters (Eq. (20)), the next step isthe computation of circuit moments. For deterministic

ARTICLE IN PRESS

R3

R2

C1

(1) (2)

(3)

R1

C2

C3

Fig. 9. A simple RC tree.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339326

resistance and capacitance values in an RC tree, the circuitmoments can be computed easily by path tracing [23].However, with interconnect variability the resistances andcapacitances are now random variables. If changes inphysical dimensions (DW, DT, etc.), are consideredindependent normal random variables, then the resistanceand capacitance calculated using Eq. (20) are correlatednormal random variables. We illustrate moment computa-tion with variability for the simple RC tree example shownin Fig. 9.

The first moment at any node can be expressed as afunction of resistances Ri’s and capacitances Ci’s. Forexample, the first moment at node 3 (without consideringvariability) in the above circuit is given by

m31 ¼ m1ðRi;CiÞ ¼ �R1ðC1 þ C2 þ C3Þ � R3C3. (21)

With variations in physical dimensions, the resistances Ri

and the capacitance Ci are linear functions of randomvariables (DW, DT, etc.),

Ri ¼ RiðnomÞ þ RiðW Þ DW þ RiðTÞ DT ,

Ci ¼ CiðnomÞ þ CiðW Þ DW þ CiðTÞ DT þ CiðHÞ DH. ð22Þ

Here, Ri(nom) and Ci(nom) are nominal resistance andcapacitance, respectively. The coefficients Ri(W), Ci(W)

model change in the resistance and capacitance with achange in width. Similarly, other coefficients capture thedelta resistance and delta capacitance with respect to eachphysical dimension. Substituting Ri and Ci from Eq. (22)into Eq. (21) gives

m1 ¼ m1ðnomÞ þ kW DW þ kT DT þ kH DH þ kW 2ðDW Þ2

þ kT2 ðDTÞ2 þ kWT ðDW DTÞ þ kWH ðDW DHÞ

þ kTH ðDT DHÞ. ð23Þ

Here m1(nom) is the first moment evaluated at nominalresistance Ri(nom) and nominal capacitance Ci(nom). Weexpress this by the following notation:

m1ðnomÞ ¼ m1ðRiðnomÞ;CiðnomÞÞ. (24)

The coefficients in Eq. (23) can be calculated byevaluating the first moment at different values of R’s andC’s. For example, the coefficient kW 2 can be computed bycalculating first moment when all resistances and capaci-tances are replaced by the corresponding Ri(W) and Ci(W).The expressions for each coefficient in terms of first

moment computation are given by

kW ¼ m1ðRiðW Þ;CiðnomÞÞ þm1ðRiðnomÞ;CiðW ÞÞ,

kW 2 ¼ m1ðRiðW Þ;CiðW ÞÞ,

kT ¼ m1ðRiðTÞ;CiðnomÞÞ þm1ðRiðnomÞ;CiðTÞÞ,

kT2 ¼ m1ðRiðTÞ;CiðTÞÞ,

kWT ¼ m1ðRiðW Þ;CiðTÞÞ þm1ðRiðTÞ;CiðW ÞÞ,

kWH ¼ m1ðRiðW Þ;CiðHÞÞ,

kTH ¼ m1ðRiðTÞ;CiðHÞÞ,

kH ¼ m1ðRiðnomÞ;CiðHÞÞ. ð25Þ

Eq. (23) shows the first moment expression as a functionof normal random variables representing variations inback-end physical dimensions. Eq. (23) contains higherorder terms and cross product terms, thereby implying thatthe distribution of first moments is not exactly Gaussian.However, if we neglect higher order terms, the first momentexpression from Eq. (23) reduces to the following equation:

m1 ¼ m1ðnomÞ þ kW DW þ kT DT þ kH DH. (26)

We can perform a similar analysis for the secondmoment. The second moment can be expressed as afunction of resistances, capacitances, and the first mo-ments. For example, the second moment at node 3 (withoutconsidering variability) for the circuit in Fig. 9 is given by

m32 ¼ m2ðRi;Ci;m

i1Þ ¼ � R1ðm

11C1 þm2

1C2 þm31C3Þ

� R3ðm31C3Þ. ð27Þ

With variations in physical dimensions, Ri’s and Ci’s canbe replaced by their corresponding expressions fromEq. (22) and similarly m1’s can be replaced by theexpression given in Eq. (26). If we again keep only thelinear terms, m2 can be expressed as

m2 ¼ m2ðnomÞ þ AW DW þ AT DT þ AH DH. (28)

Here m2(nom) is the second moment evaluated at nominalresistance Ri(nom), nominal capacitance Ci(nom) and nominalfirst moment m1(nom). We express this by followingnotation:

m2ðnomÞ ¼ m2ðRiðnomÞ;CiðnomÞ;m1ðnomÞÞ. (29)

The coefficients in Eq. (28) can be calculated byevaluating the second moment at different values of R’s,C’s and m1’s.

AW ¼ m2 RiðnomÞ;CiðnomÞ; kW

� �þm2 RiðnomÞ;CiðW Þ;m1ðnomÞ

� �þm2 RiðW Þ;CiðnomÞ;m1ðnomÞ

� �,

AT ¼ m2 RiðnomÞ;CiðnomÞ; kT

� �þm2 RiðnomÞ;CiðTÞ;m1ðnomÞ

� �þm2 RiðTÞ;CiðnomÞ;m1ðnomÞ

� �,

AH ¼ m2 RiðnomÞ;CiðnomÞ; kH

� �þm2 RiðnomÞ;CiðHÞ;m1ðnomÞ

� �.

ð30Þ

Here kw, kT, and kH can be computed as shown inEq. (25).The validity of the linearity assumption was experimen-

tally validated in [22] for the simple configuration of Fig. 9.

ARTICLE IN PRESS

-94 -92 -90 -88 -86 -84 -82 -80 -78 -760.000

0.005

0.010

0.015

0.020

0.025

0.030

0.035

0.040

0.045

Pro

babili

ty

m1∗ (1e12)

m1 distribution with all terms

m1 distribution with only linear terms

5.0 5.2 5.4 5.6 5.8 6.0 6.2 6.4 6.6 6.8 7.00.000

0.005

0.010

0.015

0.020

0.025

0.030

0.035

0.040

0.045m2 distribution with all terms

m2 distribution with only linear terms

m1∗ (1e21)

Pro

babili

ty

Fig. 10. First (left plot) and second (right plot) moment distributions with and without considering non-linear terms.

60 62 64 66 68 700.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

SPICE Monte Carlo: Mean=64.7ps, Stdev=1.7ps

Nominal: W=T=0.8um, ILD=0.55um, L=5mm, Delay=64.9ps

Pro

babili

ty

Delay (ps)

D2M: Mean=64.9ps, Stdev=1.7ps

Fig. 11. Analytical delay distribution obtained using statistical D2M

metric compared to Monte Carlo simulations.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 327

Nominal values for width and metal thickness were chosento be 0.8 mm while nominal ILD height was taken to be0.55 mm. The 3-sigma tolerances were 70.1 mm for widthand thickness and 70.05 mm for ILD thickness. Spacingwas assumed to be inversely correlated with width. Giventhese distributions of DW, DT, and DH, Fig. 10 plots thedistributions of first and second moments at node 3 withand without the non-linear terms. The figure shows that thetwo curves are almost identical. The figures also show thatthe distributions of moments are Gaussian and hence thelinear approximations of Eqs. (26) and (28) are extremelyaccurate.

Once moments are expressed as functions ofchange in physical dimensions, the next step is to mapthe PDF of these moments to an interconnect delay PDF.Among various delay metrics, D2M [24] is a goodcandidate for this exercise due to its closed-form nature.However, the approach is independent of the metricand can be applied to any other closed-form delay metricas well.

Using D2M the delay in terms of moments can beexpressed as

D2M ¼ ln 2 �m1ðnomÞ þ kW DW þ kT DT þ kH DH� �2ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

m2ðnomÞ þ AW DW þ AT DT þ AH DHp .

(31)

Analyzing the D2M expression in Eq. (31) is difficult andhence, we can write a series expansion of the D2M

expression and keep only the linear terms, as discussedearlier. The D2M expression can then be re-written as

D2M ¼ ln 2 �m1ðnomÞ

� �2ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffim2ðnomÞp 1þ SW DW þ ST DT þ SH DHð Þ.

(32)

Here SW, ST, and SH can be calculated using

SW ¼2kW

m1ðnomÞ�

AW

2m2ðnomÞ,

ST ¼2kT

m1ðnomÞ�

AT

2m2ðnomÞ,

SH ¼2kH

m1ðnomÞ�

AH

2m2ðnomÞ. ð33Þ

If DW, DT and DH are independent random variableswith their mean values zero and standard deviations of sW,sT, and sH, respectively, then the mean and standarddeviation of delay in terms of the standard deviations in

ARTICLE IN PRESS

0 100 200 300 4003x107

4x107

5x107

6x107

7x107

8x107

9x107

1x108

1x108

Jon (A/cm2)

Joff (A/cm2)

Width location (nm), z-axis

Jon (

A/c

m2)

102

103

104

Joff (

A/c

m2)

Fig. 12. Drive and leakage current density variation along channel width.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339328

physical dimensions can be written as

EðD2MÞ ¼ ln 2 �m1ðnomÞ

� �2ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffim2ðnomÞp ,

StdevðD2MÞ

¼ ln 2 �m1ðnomÞ

� �2ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffim2ðnomÞp

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiS2

Ws2W þ S2Ts

2T þ S2

Hs2H� �q

. ð34Þ

Fig. 11 shows the application of Eq. (32) to estimatedelay spread of a 5mm long global interconnect wire.Nominal values (3-sigma tolerances) for metal width,thickness and ILD height were 0.8 mm (70.1 mm), 0.8 mm(70.1 mm) and 0.55 mm (70.05 mm), respectively. Fig. 11shows the model results as well as those from the MCsimulations. The figure shows that the model estimates thedelay spread very well.

In the next section, we discuss one of the commonsources of leakage variability, irregularities in lithographicpatterning. We describe a technique proposed to quantifythe leakage variation due to this source of variability.

3.3. Analysis of leakage variation due to non-uniform device

geometries

In this section, we discuss leakage variation due toimperfect lithographic patterning. It is found that printedimages on silicon are not restricted to simple rectilineargeometries. A silicon image of a transistor may not be aperfect rectangle as is assumed by all current circuitanalysis tools. The irregularity in shape can have a strongeffect on leakage current. Existing tools and device modelscannot handle complicated non-rectilinear geometries.

In this section, we discuss a new technique to model non-uniform, non-rectangular gates. The authors in [25]propose a method to convert non-rectangular gates intoequivalent perfect rectangle gates so that they can beanalyzed by SPICE-like circuit analysis tools. This enablesclosest-to-silicon timing and power analysis. Severalapproaches to handling non-rectangular gates have beenpreviously proposed methods [26–29]. However, theseapproaches neglect an important effect that is the changein threshold voltage along the width of the channel. The so-called ‘‘edge effect’’ means that an irregularity in shapenear the edge of the gate affects the currently differentlythan a similar irregularity near the gate-center. Theproposed method takes this effect into account and modelsit accurately to find the current density at every point alongthe device. The non-rectangular gate can now be sliced intoseveral rectangular regions and the current flowing througheach slice can be accurately calculated based on its length,width and position. The current thus calculated is used toobtain the effective length for an equivalent rectangulardevice. The authors show that this method is much moreaccurate than previously proposed approaches whichneglect the location dependence of the threshold voltage.

MOS devices with shallow trench isolation (STI)technology suffer from the reverse-narrow widtheffect [30]. Due to fringing capacitances from line-endextension and well proximity effects [29], the thres-hold voltage is much lower near the STI edges thannear the center of device. Fig. 12 shows the driveand leakage current densities as a function of positionalong the channel. The data is generated from aTCAD setup closely matching an industrial 90 nm SPICEmodel.The threshold voltage is analytically modeled as a

function of distance from the edge, using knowledge ofdevice behavior and data from SPICE simulations.This method is found to be considerably more

accurate compared to a ‘‘flat’’ model which does notcompare threshold voltage variation along the length.The accuracy is verified using a test structure, where asmall irregularity is introduced into the rectangular gate.The size and location of the protrusion are varied and thecurrent overhead compared to the nominal rectan-gular gate is measured. The resulting surfaces are plottedin Fig. 13.As shown, the flat model (a) cannot capture the location

dependence, whereas the proposed model (b) is found to bevery close to the TCAD simulation result (c). Table 2shows the improvement in accuracy of the proposedmethod over flat method.The results show that, using this method, the non-

rectangular gates can be accurately replaced by equivalentrectangular gates, thus allowing accurate post-lithographytiming and leakage analysis. This modeling technique canbe incorporated into a full-chip post-silicon analysis flow,as proposed in [31].

4. Design techniques for variation minimization

In this section, we discuss several design techniques forreducing the impact of process variation on parametric and

ARTICLE IN PRESS

Fig. 13. Current overhead for a small protrusion on a device: (a) flat model, (b) proposed model and (c) TCAD simulation.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 329

functional yield of a design. In general, the techniques canbe classified into the following three categories:

post-silicon variation compensation techniques � variation-tolerant circuits and microarchitecture � variability-aware design

4.1. Post-silicon variation compensation

Post-silicon compensation techniques are applied afterfabrication and can improve the parametric yield of adesign significantly. One such technique for post-siliconcompensation is substrate or body biasing [32–34]. Thistechnique, commonly known as, ABB allows post-silicontuning of the threshold voltage (VT) by controlling thesubstrate potential. The technique exploits the fact that VT

of a device can be controlled by changing its body-to-source bias voltage. The power-performance spread of adesign can therefore be tightened by applying a forwardbody-bias (FBB) to slow and less leaky dies and a reversebody-bias (RBB) to fast and leaky dies. The circuitimplementation of ABB requires a body-bias generatorand a control circuit for optimal body-bias selection.

The control circuit typically contains a critical path replicawhose frequency and leakage determine the optimalNMOS and PMOS body-bias voltage applied to the die.Ref. [35] demonstrates the advantage of ABB in compres-sing leakage and frequency distributions on a 150 nmCMOS test-chip. The measured results from [35] are shownin Fig. 14.The application of ABB to intra-die variation requires

biasing different regions of the die with different bodyvoltages. This limits the granularity of ABB in compensat-ing intra-die variation. Moreover, RBB and FBB haveseveral other issues that limit their effectiveness in sub-65 nm technologies. RBB increases depletion region widthand hence worsens short-channel effects [36,37]. RBB alsoincreases drain–body junction leakage [38,39]. FBB, on theother hand, results in higher junction capacitance and highforward source–body junction current [40]. Fig. 15 showsthreshold voltage change as a function of body-bias for anNFET device simulated in 65 nm technology. The figurealso shows the impact of body-bias on DIBL coefficient.Here DIBL coefficient is measured as the sensitivity ofthreshold voltage with respect to channel length variation.The plot shows that RBB results in significant degradation

ARTICLE IN PRESS

Fig. 14. Leakage vs. frequency distribution for no body-bias (NBB) and

adaptive body-bias (ABB) [35].

-100

-75

-50

-25

0

25

50

75

100

125

-1 -0.75 -0.5 -0.25 0 0.25 0.5

Body Bias (V)

ΔVth

(m

V)

0.6

0.7

0.8

0.9

1

1.1

1.2

1.3

1.4

Norm

aliz

ed D

IBL c

oeffDIBL

Vth

Fig. 15. Impact of body-bias on threshold voltage and DIBL.

PR

NR

PL

NL

ALAR

WWL

VDD

GNDWBL WBL

RWL

RBL

Fig. 16. 8-T SRAM cell with stable read operation [43].

Table 2

Results for non-rectilinear gates with protrusion or depression of width

20 nm

Loc. (nm) Length of protrusion/depression

82 86 90 94 98

0

Davinci 1.477 1.163 1.000 0.919 0.870

Proposed model 1.543 1.194 1.000 0.911 0.838

Flat model 1.133 1.040 1.000 0.978 0.967

20

Davinci 1.247 1.083 1.000 0.982 0.971

Proposed model 1.197 1.061 1.000 0.970 0.954

Flat model 1.133 1.040 1.000 0.978 0.967

40

Davinci 1.124 1.042 1.000 0.991 0.984

Proposed model 1.088 1.024 1.000 0.989 0.984

Flat model 1.133 1.040 1.000 0.978 0.967

60

Davinci 1.092 1.031 1.000 0.993 0.989

Proposed model 1.079 1.021 1.000 0.991 0.986

Flat model 1.133 1.040 1.000 0.978 0.967

80

Davinci 1.082 1.027 1.000 0.994 0.991

Proposed model 1.079 1.021 1.000 0.991 0.986

Flat model 1.133 1.040 1.000 0.978 0.967

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339330

of DIBL coefficient, thereby limiting its effectiveness inprocess variation compensation.

Adaptive supply voltage (ASV) is another technique forreducing power-performance spread due to process varia-tion [41,42]. This technique shrinks distributions by low-ering supply voltage of fast and leakier dies and byincreasing VDD of slow but less leaky dies. The implemen-tation of ASV requires a DC–DC converter and a control

circuit for selecting optimal supply voltage. ASV can beused in conjunction with ABB for maximum benefits butthe design complexity and overhead of using bothtechniques together is quite significant.

4.2. Variation-tolerant circuits and microarchitecture

Variation-tolerant circuit topologies and microarchitec-ture techniques can be very effective in reducing the impactof process variation. In this section, we discuss few suchtechniques in the context of robust SRAM design. Forsensitive circuits such as SRAM cells, the variation-tolerantdesign can significantly improve the yield of a memoryarray. As discussed in Section 2, SRAM failures are causedby local device mismatch. Due to their random nature,such local variations cannot be controlled by post-siliconadaptive techniques discussed in the previous section.Hence it is important to develop design solutions thatmake the cells less sensitive to random device variations.Fig. 16 shows an SRAM cell topology [43] that provides

stable read operation even in the presence of large randomvariation. The SRAM cell contains two extra transistorsfor single-ended read operation as shown in the figure.These devices improve read stability of the design byisolating the read disturbance from the storage nodes.During read operation, the resistive voltage division

ARTICLE IN PRESS

PCLK

PCLK

EN2

BL2

S2_LS2

S1_L

Main SA

Shadow SA

To column

multiplexers…

ERROR

EN1

SR2

SR1 DATA

PCLK

EN1

BL1

S1

Fig. 17. Timing error detection in SRAM arrays using a shadow sense

amplifier [46].

Fig. 18. Ioff/Ion plots for NMOS/PMOS

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 331

between the read devices does not cause any disturbance atthe internal storage nodes. Therefore, even in the presenceof large read disturbances, the cell always retains its dataresulting in stable read operation. Furthermore, the celltopology has separate read and write ports and hence thewrite margin of the cell can be optimized independent ofthe read stability constraints. The limitation of this designis that it results in an increased cell size due to two extradevices as compared to the conventional 6-T cell.Timing failures caused by process variation can be

addressed by microarchitecture techniques such as theRazor technique discussed in [44,45]. The general ideabehind this technique is to use a shadow latch with thestandard positive edge triggered D-flipflop. The shadowlatch samples the data at the negative edge of the clock thusallowing it to latch the correct output even in the presenceof timing failures. Any discrepancy in the outputs of theshadow latch and the D-flipflop indicates an error.A similar technique can be used to detect timing errors inSRAM arrays (Fig. 17) [46]. At the sense amplifierboundary on the read bit-path, two standard differentiallatch-type sense amplifiers are used to double-sample thebitline during a read operation in the SRAM. The originalsense amplifier is triggered at the end of the cycle while thesecond sense amplifier is triggered by a delayed enablesignal. The delayed enable signal provides extra time todevelop additional bitline differential before the secondsense amplifier is fired. The outputs of the two senseamplifiers can be compared to detect read performancefailures.

When a speed path failure is detected at the senseamplifier, the correct value is multiplexed onto the staticdata bus. This result will not reach the I/O interface withinthe clock cycle, but the memory element at the I/Ointerface is capable of re-sampling the data bus and

devices as a function of gate-length.

ARTICLE IN PRESS

Table 3

Optimization results for transistor-level and cell-level biased libraries

Circuit Instance

count

% Imp

CLB

% Imp

TLB

% Imp all

variants

C5315 1681 27.66 41.69 42.09

C6288 3041 16.99 26.17 27.25

AES 30,991 22.68 38.05 38.66

ALU 15,880 15.68 32.56 33.13

S9234 1212 24.38 31.41 32.46

S13207 3464 30.83 40.15 40.43

S38417 11,620 25.98 38.44 38.78

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339332

propagating the correct value via the latching mechanismproposed in [44]. If an error is detected at the latch columnor at the sense amplifier block, the SRAM can return asignal similar to a cache miss in a hierarchical memorysystem. The corrected data is forwarded to the system atthe end of the second cycle after the request rather than inone cycle.

4.3. Gate-length biasing for leakage and variability

reduction

In this section, we discuss another scheme for buildingcircuits that have high yield in the face of leakagevariability. We describe a leakage and variability reductionscheme that involves increasing the critical dimension ofstandard cells by very small amounts, which can be donewith very little impact on standard-cell design/routing. Thismethod was first proposed by the authors of Ref. [47].Fig. 18 shows how the drive and leakage currents of PMOSand NMOS devices change with small changes in gatelength.

The figures show that the ratio of leakage and drivecurrent decreases with increase in gate-length. In otherwords, small changes to the length lead to large reductionsin leakage for small delay overhead. The figure also showsthat along with reduced leakage, the sensitivity of leakageto gate-length variation reduces with increased length.Thus gate-length biasing leads to leakage variabilityreduction along with leakage reduction. The authors in[48] propose a standard-cell optimization methodologythat involves generating different variants are generated foreach cell to create an enhanced cell-library. The paperspecifically proposes a transistor-level length assignmentmethodology, where gate-length biases are assigned to eachtransistor individually. A circuit optimizer uses thisenhanced library to selectively replace cells on non-criticalpaths with slow and low-leakage cells in order to generate adesign with reduced leakage.

The authors propose that maximum length-bias be keptto approximately 10% of the nominal gate length. It isfound that with larger CD changes, the delay-leakagetradeoffs get increasingly less attractive. Furthermore, areaand capacitance penalties become significant. The pro-posed method strives to maintain layout compatibility ofthe biased standard-cell with the nominal cell. Thisconsideration also limits the allowed bias due to potentialdesign rule violations. Across technologies, it is found that10% is a reasonable value, both from the process andbenefit perspective.

We now review and compare two library optimizationmethods, one where every transistor in a cell has the samebias, and the other where each transistor is biasedindividually. We refer to the two methods as cell-levelbiasing (CLB) and transistor-level biasing (TLB). TLB,although more complex, provides a much richer designspace for the optimizer, leading to more thoroughlyoptimized designs. The need for TLB variants is motivated

by the fact that different transistors in a cell have varyingcontributions to leakage and delay. The observation thatPMOS and NMOS devices have different delay/leakagecharacteristics provides additional motivation for a TLBmethodology.To generate TLB variants for standard-cells, the authors

propose a sensitivity-based heuristic, which identifies thetransistors in the cell that can provide maximum leakagereduction while causing minimum delay overhead. Severaldifferent types of variants are generated, with varying delayand leakage tradeoffs. The generation of asymmetricvariants where the rise and fall transitions are affected, inorder to exploit asymmetric slack characteristics of designs,is also proposed.We also identify variants that are superior in both

leakage and delay compared to unbiased cells. Thesevariants are generated by distributing bias between high-leakage-impact transistors and high-delay-impact transis-tors within a standard cell.Table 3 shows the results of the gate-length biasing

scheme over a variety of ISCAS85 [19] circuits. It is foundthat using cell-level biasing gives leakage reduction of up to31%, whereas the TLB methodology achieves leakagereduction of up to 42%. The average leakage savings ofCLB and TLB over unoptimized designs are 23% and35%, respectively. The histogram for the AES circuit inFig. 19 also shows that the standard deviation of the CLBoptimized design is 28% lower than nominal and that ofthe TLB optimized design is 66% lower than nominal.This shows the clear superiority of the TLB scheme over

the gate-length biasing scheme. With minimal design andoptimization complexity, large savings in leakage andleakage variability are achievable.

4.4. Variability-aware design

In this section, we move on to review methods fordesigning circuits that inherently compensate for sources ofvariation. We discuss a self-compensation scheme for CDvariation caused by defocus. The Bossung plots in Fig. 20show that dense lines ‘‘smile’’ through focus while isolatedlines ‘‘frown’’. In this work, we propose a new designmethodology that allows explicit compensation of focus-dependent CD variation in particular, either within a cell

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 333

(self-compensated cells) or across cells in a critical path(self-compensated design). By creating iso and dense

variants for each library cell, we can achieve designsthat are more robust to focus variation. Optimizationwith a mixture of dense and iso cell variants is possibleboth for area and leakage power in timing constraints

Table 4

Leakage power change for self-compensating designs and two heuristic-based op

At 0.4 defocus c432 c499 c880 c1355 c190

Self-compensated (%) 5.1 5.7 5.5 5.6 5

Single-pitched (%) 6.6 6.9 7.0 6.7 6

heu1 (area) (%) 31.4 �4.8 �36.3 11.3 �6

heu2 (leakage) (%) �25.7 0.6 �36.9 14.8 �22

-0.4 -0.2 0.0 0.2 0.4110

120

130

140

150

line w

idth

(nm

)

defocus (um)

0.18

0.2

0.22

0.24

0.26

0.28

0.3

0.32

0.34

0.36

0.38

0.4

0.42

0.44

0.46

0.48

space (um)

Fig. 20. Linewidth variation with defocus level (nominal line-

width ¼ 130nm).

Fig. 19. Leakage histograms for unoptimized, TLB optimized and CLB

optimized AES designs.

(critical delay), with the latter an interesting complement toexisting leakage reduction techniques such as dual-Vth. Weimplement both a heuristic and mixed-integer linearprogramming (MILP) solution methods to address thisoptimization. The two methods of self-compensation arediscussed below.

4.4.1. Self-compensated cell layout

By self-compensated cell layout, we refer to a correct-by-construction methodology that relies on within-cell com-pensation of CD variation caused by focus variation. Forexample, variation can be compensated in series-connectedNMOS, if one device becomes shorter (thus, faster) underdefocus, and the other device becomes longer (thus,slower). This can be achieved by making one device ‘‘iso’’and the other device ‘‘dense’’. The other way of generatingself-compensated cells is to find spacing ranges in which thelinewidth variation is negligible by focus variation. Eachspacing between adjacent poly-lines should be one of thesevalues. In this work, we generate all the self-compensatedcells by requiring poly-spacing to be in the compensatedspacing range (to be discussed further in the next section).We also explore the possibility of single-pitched-cells

where all poly-spacings are set to one highly manufactur-able value to eliminate the focus-dependent CD variationinside cells.

4.4.1.1. Self-compensated physical design. This refers tocompensation across cells (e.g., along a critical path).Consider two cells G1 and G2 that lie on the critical pathG1-G2. Focus variation, if not corrected by applyingexpensive RETs, can cause variation in critical path delayand lead to potential timing failures or parametric yieldloss. However, if G1 is explicitly made ‘‘iso’’ while G2 ismade to act ‘‘dense’’, focus-dependent CD variation can becompensated. Assuming that iso and dense versions oflibrary cells are available, designs that are robust to focusvariation become possible.Using these different libraries and our optimization

algorithms on ISCAS85 [19] benchmark circuits, we findthat we can adequately compensate for focus-dependentCD variation. The results are shown in Tables 4 and 5. Thenormalized area overheads incurred when using each cellvariant (both uniformly and using the proposed optimiza-tion approaches) are shown in Table 5. The gate distribu-tion and runtime of the optimization options are shown inthe right side of the table. Heu1 (area-driven optimization)and heu2 (leakage driven optimization) refer to two

timizations at 0.4 mm defocus compared to original library at 0 mm defocus

8 c2670 c3540 c5315 c6288 c7552 Avg.

.5 5.0 5.5 5.5 5.3 5.4 5.4

.2 5.9 6.8 6.2 6.4 5.9 6.5

.5 �36.5 �34.5 17.4 �10.5 �33.6 �10.3

.2 �37.6 �37.2 �36.6 �25.9 �39.0 �24.6

ARTICLE IN PRESS

Table 5

Normalized area and gate distributions for each library and optimization approach

Benchmarks Total no. of

gates

Normalized area Gate distribution Runtime (s)

Orig. Dense Iso Self-

comp.

Single-

pitch

Optimization Heuristic1

(area)

Heuristic2

(leakage)

ILP

heu1

(area)

heu2

(leakage)

ILP Dense Iso Dense Iso Dense Iso heu1 ILP

c432 339 1.00 1.02 1.17 1.12 1.26 1.09 1.09 1.08 233 106 318 21 317 22 0.04 0.19

c499 682 1.00 1.00 1.17 1.11 1.27 1.00 1.02 1.00 581 101 569 113 584 98 0.09 1.70

c880 575 1.00 1.02 1.18 1.11 1.27 1.02 1.02 1.01 560 15 561 14 562 13 0.07 0.35

c1355 680 1.00 1.00 1.17 1.11 1.27 1.05 1.08 1.04 536 144 516 164 564 116 0.39 11.21

c1908 645 1.00 1.01 1.16 1.12 1.26 1.04 1.05 1.04 554 91 584 61 566 79 0.08 13.79

c2670 1040 1.00 1.01 1.15 1.11 1.25 1.05 1.05 1.04 1017 23 1020 20 1010 30 0.20 11.61

c3540 1313 1.00 1.01 1.17 1.10 1.27 1.01 1.01 1.01 1279 34 1287 26 1280 33 0.32 27.28

c5315 2028 1.00 1.00 1.16 1.11 1.27 1.01 1.01 1.00 1490 538 1978 50 1981 47 1.51 29.30

c6288 4102 1.00 1.00 1.16 1.11 1.26 1.06 1.07 1.05 3631 471 3820 282 3693 409 7.80 913.32

c7552 2700 1.00 1.00 1.15 1.11 1.25 1.01 1.01 1.00 2610 90 2658 42 2648 52 2.02 358.03

Average 1.00 1.01 1.16 1.11 1.27 1.03 1.04 1.02

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339334

different heuristic approaches. While self-compensated andsingle-pitched libraries lead to good timing behavior acrossfocus, they also lead to relatively large area overheads of11% and 27%, respectively. The ILP optimization providesan optimal solution and can be used to determine how wellthe heuristics are performing. The two sensitivity-basedheuristics show 3–4% area increases while meeting timingrequirement throughout all defocus range. Note that thetrend is towards smaller area penalties in the largerbenchmarks, explainable by the fact that a smaller(relative) subset of gates are responsible for determiningtiming in these larger circuits. The first heuristic inparticular achieves circuit areas very close to optimal,usually within 1%. As can be seen from the runtime of thevarious optimization approaches, the heuristic techniquesshows very reasonable efficiency with high quality solu-tions relative to the ILP.

Table 4 shows the change in leakage power at theworst defocus conditions compared to the originallibrary at perfect focus using several self-compensatingdesign options. As can be seen, both self-compensatedcells and single-pitch cells designs options showsmodest �7% leakage increases at worst-case defocus.The area-driven dense+iso optimization shows10% less leakage than the nominal case at 0.4 mmdefocus, although the results for this case vary widely. Asexpected the leakage-driven optimization shows 25% lessleakage than the original circuit and 15% less thanheuristic 1 since leakage is directly accounted for in thisformulation.

In conclusion, a novel design technique to compensatefor lithographic focus-dependent CD variation is proposedin this paper. The proposed idea is to judiciouslyinstantiate isolated and dense versions of library cells in acircuit to effectively negate the impact of expected focusvariations.

5. Conclusions and future technology outlook

This paper has reviewed recent work in the analysis andmitigation of variability in modern ICs. There is asignificant amount of effort in both industry and academiain this area due to its potential showstopping nature. Thekey outcomes of growing process variability in nanometer-scale CMOS are: (1) a trend towards parametric yield asthe governing objective function, rather than timing orpower in isolation, (2) a push towards circuits and CADtechniques that focus on robustness as much as nominalperformance, and (3) a need for more cooperation andcommon understanding across the device, circuit, CAD,and architecture levels of the integrated circuit designprocess.The remainder of this section highlights future trends in

design and CAD efforts to cope with variability. Particularemphasis is placed on re-examining the emerging role ofstatistical techniques in nanometer-scale CMOS design,and on the injection of manufacturing information into thesignoff process in an effort to make design better reflectreality.

5.1. Statistical CAD

It is well known that traditional corner-case-baseddesign approaches lead to overly pessimistic guard-bandingand are ineffective in coping with the present scenario.However, the emerging statistical techniques that havebeen proposed are almost universally based on conceptsdeveloped in statistical static timing analysis (SSTA)[12,13,49]. Conventional thinking is that SSTA will providebetter timing estimation as well as form the underlyingengine for optimization capabilities (e.g., gate sizing, dual-Vth assignment, etc.). However, current SSTA formula-tions are based on a number of significant simplifications,

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 335

such as the assumption of normal delay and arrival timedistributions and approximations of correlations. Inaddition, a number of modeling issues remain to beaddressed in SSTA, including the dependence of delay onload and slope, interaction between interconnect and gatedelays under process variations, and the impact of noise ondelay. Finally, sensitivity computation based on SSTA hasproven to be expensive and optimization of complexparameters such as discrete Vth assignment still posessignificant difficulties. Given that commercial SSTA toolshave been available for several years and there is very little(or no) experimental data pointing to its superiority, onecan reasonably conclude that there is a lack of designerbuy-in at present.

Therefore, from a practical perspective it can be arguedthat existing statistical methods still require a significantamount of improvement, refinement, and innovationbefore they are suitable for use in developing mainstreamproducts. On the other hand, deterministic methods(i.e., algorithms, tools) are already well established andwidely used in industry. To provide an alternative path forenabling a variation-aware design technology infrastruc-ture, we suggest a new design optimization framework forstatistical analysis and optimization that uses knowndeterministic optimization procedures. This approach firststatistically samples the process space and then usesdeterministic approaches to optimize the design at eachsampled process condition. In order to maximize theruntime efficiency of the proposed framework, variationspace sampling concepts can be applied as well as a

Fig. 21. Proposed design optimization framework considering variability.

tailoring of the deterministic optimization for speedefficiency in the inner-loop of the method. After repeatingthis process of deterministic optimization across multipleprocess sample points, probability distributions (PDF) andcorrelations of the optimal design parameters (such as Vthor width) are computed for all gates in the circuit. ThesePDFs are then used to finalize the gate-level designparameters and thus guide the statistical optimization.The general philosophy behind this framework is to

learn from optimization outcomes under varying scenariosof process variations and use this knowledge to fix deviceparameters at design time for maximizing circuit yield. Theoverall framework is summarized in Fig. 21. Complexobjectives such as forming optimized clusters of gates forpost-silicon tuning using ABB [35] can be effectivelyaccomplished using this framework. It will be extremelydifficult to achieve such goals using traditional SSTAframeworks. Essentially this section strongly points to afuture built on MC-based SSTA (and accompanyingoptimization) rather than ‘traditional’ approaches thatpropagate timing (or power) distributions through circuits.The advantages of MC-based SSTA are numerous andsignificant in nature:

built on proven STA technology, easing integration intoflow and ensuring designer buy-in � accurate—it is essentially the golden model given a large

enough sample set

� no simplifying assumptions to deal with propagating

delay distributions

� trivially parallelizable such that runtime can be made

very reasonable

� can leverage variance reduction strategies to bring down

number of MC samples required

The latter two points are important in driving statisticaloptimization built on such a MC-based SSTA approach. Ingeneral the future of statistical CAD is still unclear—research into MC-based SSTA should be pursued as analternative path to more conventional SSTA approachesrelying on propagation of delay distributions.

5.2. Adaptive design

The functional correctness of designs as well as circuitparametric yield is endangered by uncertainty. As de-scribed above, statistical CAD tools have been proposed asa way to improve parametric yield and assist in timingverification. Much of variability is, however, systematicrather than random including strong spatial correlations.In that case, statistical techniques may not be sufficient tomaintain good yields going forward, particularly if physicaltolerances are relaxed due to cost pressures. An alternativeand orthogonal approach is the use of adaptive circuitfabrics that seek to guarantee and enforce a specifiedbehavior in circuits even in the presence of underlyingprocess variation. These techniques rely on sensing the

ARTICLE IN PRESS

DAP

L

2

L

2

L

2

L

2

L

2

L

2

L

2

L

2

L

2

L

2

PEProcessing Element

Simple architecture, DVS,

tunable FFs

Sensors/Tuning

Structures

PE States

Nominal Operation

Boosted Operation

Timing Yield Enhanced

Healing/Recovery

Sleep State

Failed Element

L1

Testing/Diagnosis

L3

Fig. 22. Overview of ElastIC architecture showing various PE operation modes [56].

1Since the performance requirements of the DAP unit itself are modest,

it can operate at low voltage and frequency and use aggressive built-in

redundancy to render the unit effectively immune to failure.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339336

process and environmental conditions and adjustingcertain circuit properties to keep the design within itsspecifications. The area of adaptive circuit techniques isrelatively new and approaches to date have been ad hoc,not automated, and not sufficiently general. In particular, aformal approach is required that would provide automatedcircuit synthesis onto adaptive circuit fabrics. The desiredoutcome in this case would be a methodology thatpreserves the foundations of a traditional design flow,i.e., you get what you designed. The synthesis toolsthemselves would insert sufficient adaptivity to guaranteespecified power/performance/yield levels. This adaptivitycan take the form of known, but mostly unused to date,techniques such as ABB and supply voltages, or canleverage any newly developed approaches (e.g., adaptivepower gating [50] and sizing [51]).

In addition to CAD tools for adaptive design, differentsystem architectures can help designers deal with rampantvariability. In addition to extreme parametric variations(e.g., due to random dopant fluctuations and line-edgeroughness), aggressively scaled CMOS exhibits acceleratedwear-out mechanisms (such as oxide breakdown andnegative bias temperature instability, or NBTI [52,53])and high manufacturing device failure rates (expected toreach one in thousands or even hundreds [54,55]). Suchunpredictable silicon poses a fundamental barrier forreliable computing in future technologies. To address thesechallenges, a new architecture is needed that incorporates

very aggressive runtime self-diagnosis, adaptivity, and self-

healing.One such architecture, dubbed ElastIC [56], combines

circuit-level, microarchitectural, and system-level techni-ques and is conceptually depicted in Fig. 22. It consists offour key components: (1) a large number (tens to hundreds)of small, simple processing elements (PEs) that areextremely adaptive, equipped with reliability, performanceand power monitors, and individually tunable throughvoltage/frequency scaling, bias-levels and dynamic softcycle-boundaries. (2) A central diagnostic and adaptivityprocessing (DAP) unit. During runtime, the DAP unittakes each processing element off-line in turn to performdetailed diagnostics of parametric variations and wear-outusing automatic test pattern generation (ATPG) methodscoupled with in-situ reliability monitors in each core. TheDAP unit tracks the degradation of individual or smallclusters of devices due to wearout mechanisms. It can theninitiate active healing, taking advantage of the reversibilityof several wear-out mechanisms, and will tune the PE foroptimal compensation of parametric shifts.1 (3) A corre-sponding memory and interconnect system that usesintegrated error codes and redundancy to address bothfunctional failure and parametric shifts. (4) A system-level

ARTICLE IN PRESS

SnomSknee

Yie

ld

1

0

Design rule dimension (spacing)

Conventional

design rule

Flexible

design rule

Fig. 23. General concept of flexible design rules showing additional

manufacturable design space in the shaded region.

D. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 337

scheduler that employs up-to-date reliability degradationand frequency/power trade-offs of individual PEs andmemory elements to maximize global system performanceunder workload constraints.

While redundancy is an important weapon to combatfunctional failure/wearout, the overheads are unacceptablylarge to apply it blindly (e.g., triple modular redundancy orTMR). Intelligent approaches such as [57] are beingdeveloped to greatly reduce such overhead while maintain-ing very high coverage of potential runtime faults.Significant further work is needed in this area to bringcoverage to acceptable levels, i.e., 499.9%.

5.3. Post-lithography simulation

Rising design and manufacturing costs for IC designs area key challenge to the semiconductor industry [1]. Inparticular, traditional design rules have become increas-ingly muddled as maintaining a balance between area/performance and yield becomes more difficult. Overlyrestrictive design rule sets with cryptic recommended ruleslimit performance gains achieved in new technologies.Also, expensive masks make first-time-right silicon evenmore crucial. Design for manufacturability (DFM) is oftenbroadly defined, but primarily pertains to tools/techniquesthat bring manufacturing-awareness into the design flow,such as yield-aware routers [58] and placers that considerphase-shift mask restrictions [59]. Despite much recentDFM work, vital components of a true DFM-based designflow are unaddressed. For instance, final timing analysisusing drawn dimensions points to different critical pathsthan analysis including post-OPC printed image dimen-sions [60], yet no signoff capabilities based on post-lithosimulation data exist. Design optimization methods areconstrained by the delay-leakage tradeoffs of existingstandard cells. There is a large portion of the design spacethat is not exploited. Innovative methods for changing the

length and the shape of the channel are required, which cangenerate cells with a wide variety of delay-leakage tradeoffsand thus provide a large range of choices to optimizationalgorithms. Also, real deployment of any research out-comes cannot rely only on simulated results: siliconvalidation must be obtained.Key components of a ‘‘true DFM’’ flow would include

the ability to work with realistic shapes (i.e., non-rectangular transistors) in both the analysis and optimiza-tion domains (beyond what is described in Section 3.3 and[25–29,31]), and to move beyond the overly simplified 0/1design rule paradigm (see Fig. 23). In addition there is aclear need for new canonical test element groups thatsupport the above tools as well as enable new character-ization and modeling capabilities of process variability.

References

[1] International Technology Roadmap for Semiconductors, /www.

itrs.netS, 2007.

[2] S. Narendra, Leakage issues in IC design: trends, estimation, and

avoidance, in: ACM/IEEE International Conference on Computer-

Aided Design, 2003 (full-day tutorial).

[3] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, V. De,

Parameter variation and impact on circuit and micro-architecture, in:

ACM/IEEE Design Automation Conference, 2003, pp. 338–342.

[4] R.M. Rao, K. Agarwal, A. Devgan, K. Nowka, D. Sylvester, R.

Brown, Parametric yield analysis and constrained-based supply

voltage optimization, in: ACM/IEEE International Symposium on

Quality Electronic Design, 2005, pp. 284–290.

[5] D. Burnett, K. Erington, C. Subramanian, K. Baker, Implications of

fundamental threshold voltage variations for high-density SRAM

and logic circuits, in: IEEE Symposium on VLSI Technology, 1994,

pp. 15–16.

[6] B. Cheng, S. Roy, A. Asenov, The impact of random doping effects

on CMOS SRAM cell, in: IEEE European Solid State Circuits

Conference, 2004, pp. 219–222.

[7] M. Pelgrom, A. Duinmaijer, A. Welbers, Matching properties of

MOS transistors, IEEE J. Solid State Circuits (1989) 1433–1440.

[8] T. Mizuno, J. Okamura, A. Toriumi, Experimental study of threshold

voltage fluctuation due to statistical variation of channel dopant

number in MOSFETs, IEEE Trans. Electron Devices (1994)

2216–2221.

[9] K. Lakshmikumar, R. Hadaway, M.A. Copeland, Characterization

and modeling of mismatch in MOS transistors for precision analog

design, J. Solid State Circuits (1986) 1057–1066.

[10] R. Heald, P. Wang, Variability in sub-100 nm SRAM designs, in:

ACM/IEEE International Conference on Computer-Aided Design,

2004, pp. 347–352.

[11] A. Srivastava, S. Shah, K. Agarwal, D. Sylvester, D. Blaauw, S.

Director, Accurate and efficient gate-level parametric yield estimation

considering power/performance correlation, in: ACM/IEEE Design

Automation Conference, 2005, pp. 535–540.

[12] H. Chang, S.S. Sapatnekar, Statistical timing analysis considering

spatial correlations using a single PERT-like traversal, in: ACM/

IEEE International Conference on Computer-Aided Design, 2003,

pp. 621–625.

[13] C. Visweswariah, K. Ravindran, K. Kalafala, S. Walker, S. Narayan,

First-order incremental block-based statistical timing analysis, in:

ACM/IEEE Design Automation Conference, 2004, pp. 331–336.

[14] S. Tsukiyama, M. Tanaka, M. Fukui, A new statistical static timing

analyzer considering correlation between delays, TAU (2002) 27–33.

[15] C. Clark, The greatest of a finite set of random variables, Oper. Res. 9

(1961) 85–91.

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339338

[16] A. Abu-Dayya, N. Beaulieu, Outage probabilities in the presence of

correlated lognormal interferers, IEEE Trans. Veh. Technol. (1994)

164–173.

[17] S.C. Schwartz, Y.S. Yeh, On the distribution function and moments

of power sums with lognormal components, Bell Syst. Tech. J. 61

(1982) 1441–1462.

[18] R.R. Rao, A. Devgan, D. Blaauw, D. Sylvester, Parametric yield

estimation considering leakage variability, in: ACM/IEEE Design

Automation Conference, 2004, pp. 442–447.

[19] F. Brglez, H. Fujiwara, A neutral net list of 10 combinational

benchmark circuits and a target translator in Fortran, in: Interna-

tional Symposium on Circuits and Systems, 1989, pp. 695–698

[20] Y. Liu, S. Nassif, L. Pileggi, A. Strojwas, Impact of interconnect

variations on the clock skew of a gigahertz microprocessor, in: ACM/

IEEE Design Automation Conference, 2000, pp. 168–171.

[21] V. Mehrotra, S. Nassif, D. Boning, J. Chung, Modeling the effects of

manufacturing variation on high-speed microprocessor interconnect

performance, in:IEEE Electron Devices Meetings, 1998, pp. 767–770.

[22] K. Agarwal, M. Agarwal, D. Sylvester, D. Blaauw, Statistical

modeling of VLSI interconnect for physical design optimization,

IEEE Trans. Comput.-Aid. Des. (2006) 1273–1288.

[23] C.L. Ratzlaff, N. Gopal, L.T. Pillage, RICE: rapid interconnect

circuit evaluator, in: ACM/IEEE Design Automation Conference,

1991, pp. 555–560.

[24] C.J. Alpert, A. Devgan, C. Kashyap, A two moment RC delay metric

for performance optimization, in: International Symposium on

Physical Design , 2000, pp. 69–74.

[25] P. Gupta, A.B. Kahng, Y. Kim, S. Shah, D. Sylvester, Modeling of

non-uniform device geometries for post-lithography circuit analysis,

in: Proceedings of the SPIE Design and Process Integration for

Microelectronic Manufacturing V, 2006.

[26] A.P. Balasinski, L. Karklin, V. Axelrad, Impact of subwavelength

CD tolerance on device performance, in: Proceedings of the SPIE

Conference on Design, Process Integration, and Characterization for

Microelectronics, 2002, pp. 361–368.

[27] R.C. Pack, V. Axelrad, A. Shibkov, V.V. Boksha, J.A. Huckabay, R.

Salik, W. Staud, R. Wang, W.D. Grobman, Physical and timing

verification of subwavelength-scale designs: I. Lithography impact on

MOSFETs, in: Proceedings of the SPIE Conference on Design and

Process Integration for Microelectronic Manufacturing, 2003, pp. 51–62.

[28] F.-L. Heng, J.-F. Lee, P. Gupta, Toward through-process layout

quality metrics, in: Proceedings of the SPIE Conference on Design

and Process Integration for Microelectronic Manufacturing III, 2005,

pp. 161–167.

[29] I. Polishchuk, N. Mathur, C. Sandstrom, P. Manos, O. Pohland,

CMOS Vt-control improvement through implant lateral scatter

elimination, in: IEEE International Symposium on Semiconductor

Manufacturing, 2005, pp. 193–196.

[30] K.K.-L. Hsueh, J.J. Sanchez, T.A. Demassa, L.A. Akers, Inverse-

narrow-width effects and small-geometry MOSFET threshold voltage

model, IEEE Trans. Electron Dev. 35 (3) (1988) 325–338.

[31] P. Gupta, et al., Lithography simulation-based full-chip design

analysis, in: Proceedings of the SPIE Design and Process Integration

for Microelectronic Manufacturing V, 2006.

[32] T. Kuroda, et al., A 0.9-V, 150-MHz, 10-mW, 4mm2 2-D discrete

cosine transform core processor with variable threshold-voltage

scheme,, IEEE J. Solid State Circuits 31 (1996) 1770–1779.

[33] M. Miyazaki, G. Ono, T. Hattori, K.A. Ishibashi, A 1.2-GIPS/W

microprocessor using speed-adaptive threshold-voltage CMOS with

forward bias, IEEE J. Solid State Circuits 37 (2002) 210–216.

[34] M. Miyazaki, H. Mizuno, K. Ishibashi, A delay distribution

squeezing scheme with speed-adaptive threshold-voltage CMOS

(SA-Vt CMOS) for low voltage LSls, in: ACM/IEEE International

Symposium on Low Power Electronic Design, August 1998,

pp. 48–53.

[35] J. Tschanz, et al., Adaptive body bias for reducing impacts of die-to-

die and within-die parameter variations on microprocessor frequency

and leakage, J. Solid State Circuits 37 (2002) 1396–1401.

[36] S. Narendra, D. Antoniadis, V. De, Impact of using adaptive body

bias to compensate die-to-die Vt variation on within-die Vt variation,

in: ACM/IEEE International Symposium on Low Power Electronic

Design, 1999, pp. 229–232.

[37] A. Keshavarzi, et al., Effectiveness of reverse body bias for leakage

control in scaled dual Vt CMOS ICs, in: ACM/IEEE International

Symposium on Low Power Electronic Design , 2001, pp. 207–212.

[38] A.H. Montree, et al., Limitations to adaptive back bias approach for

standby power reduction in deep sub-micron CMOS, in: IEEE

European Solid State Devices Conference, September 1999,

pp. 580–583.

[39] M.-J. Chen, et al., Back-gate bias enhanced band-to-band tunneling

leakage in scaled MOSFET’s, IEEE Electron Device Lett. 19 (1998)

134–136.

[40] S. Narendra, A. Keshavarzi, B. Bloechel, S. Borkar, V. De, Forward

body bias for microprocessors in 130-nm technology generation and

beyond, IEEE J. Solid State Circuits 38 (2003) 696–701.

[41] M. Elgebaly, M. Sachdev, Efficient adaptive voltage scaling

system through critical path emulation, in: ACM/IEEE Inter-

national Symposium on Low Power Electronic Design, 2004,

pp. 375–380.

[42] T. Chen, S. Naffziger, Comparison of adaptive body bias (ABB) and

adaptive supply voltage (ASV) for improving delay and leakage

under the presence of process variation, IEEE Trans. VLSI Syst. 11

(2003) 888–898.

[43] L. Chang, et al., Stable SRAM cell design for the 32 nm node

and beyond, in: IEEE Symposium on VLSI Technology, 2005,

pp. 128–129.

[44] D. Ernst, et al., Razor: a low-power pipeline based on circuit-level

timing speculation, in: International Symposium on Microarchitec-

ture, 2003, pp. 7–18.

[45] S. Das, et al., A self-tuning DVS processor using delay-error

detection and correction, IEEE J. Solid State Circuits 41 (2006)

792–804.

[46] E. Karl, D. Sylvester, D. Blaauw, Timing error correction techniques

for voltage scalable on-chip memories, in: IEEE International

Symposium on Circuits and Systems, 2005, pp. 3563–3566.

[47] P. Gupta, A.B. Kahng, P. Sharma, D. Sylvester, Selective gate-length

biasing for cost-effective runtime leakage control, in: ACM/IEEE

Design Automation Conference, 2004, pp. 327–330.

[48] S. Shah, P. Gupta, A.B. Kahng, Standard-cell library optimization

for leakage reduction, in: ACM/IEEE Design Automation

Conference, 2006, pp. 983–986.

[49] A. Devgan, C. Kashyap, Block-based static timing analysis with

uncertainty, in: ACM/IEEE International Conference on Computer-

Aided Design, 2003, pp. 607–614.

[50] H. Deogun, R.M. Rao, D. Sylvester, K. Nowka, Adaptive

MTCMOS for dynamic leakage and frequency control using variable

footer strength, in: IEEE System-on-Chip Conference, 2005,

pp. 147–150.

[51] H. Wang, M. Miranda, A. Papanikolaou, F. Catthoor, W. Dehaene,

Variable tapered pareto buffer design and implementation allowing

runtime configuration for low-power embedded SRAMs, IEEE

Trans. VLSI Syst. (2005) 1127–1135.

[52] D. Schroder, J. Babcock, Negative bias temperature instability: road

to cross in deep submicron silicon semiconductor manufacturing,

J. Appl. Phys. (2003) 1–18.

[53] D. Schroder, Negative bias temperature instability: what do we

understand?, Microelectron. Reliab. (2007) 841–852.

[54] J.M. Rabaey, Design at the end of the silicon roadmap, in: ACM/

IEEE Asia and South Pacific Design Automation Conference, vol. 1,

2005, pp. 18–21.

[55] S. Borkar, Designing reliable systems with unreliable components: the

challenges of transistor variability and degradation, in: IEEE Micro,

November/December 2005, pp. 10–16.

[56] D. Sylvester, D. Blaauw, E. Karl, ElastIC: an adaptive self-healing

architecture for unpredictable silicon, IEEE Des. Test (2006)

484–490.

ARTICLE IN PRESSD. Sylvester et al. / INTEGRATION, the VLSI journal 41 (2008) 319–339 339

[57] S. Shyam, K. Constantinides, S. Phadke, V. Bertacco, T. Austin,

Ultra low-cost defect protection for microprocessor pipelines, in:

International Conference on Architectural Support for Programming

Languages and Operating Systems (ASPLOS), 2006, pp. 73–82.

[58] J. Mitra, P. Yu, D.Z. Pan, RADAR: RET-aware detailed routing

using fast lithography simulations, in: ACM/IEEE Design Automa-

tion Conference, 2005, pp. 369–372.

[59] C. Chiang, A.B. Kahng, S. Sinha, X. Xu, Fast and efficient phase

conflict detection and correction in standard-cell layouts, in: ACM/

IEEE International Conference on Computer-Aided Design, 2005,

pp. 149–156.

[60] J. Yang, L. Capodieci, D. Sylvester, Advanced timing analysis based

on post-OPC extraction of critical dimensions, in: ACM/IEEE

Design Automation Conference, 2005, pp. 359–364.