Synthesis of hazard-free asynchronous circuits with bounded wire delays



IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 14, NO. 1, JANUARY 1995 61

Synthesis of Hazard-Free Asynchronous Circuits with Bounded Wire Delays

Luciano Lavagno, Member, IEEE, Kurt Keutzer, Senior Member, IEEE, and Alberto L. Sangiovanni-Vincentelli, Fellow, IEEE

Abstract: This paper introduces a new synthesis methodology for asynchronous sequential control circuits from a high level specification, the Signal Transition Graph (STG). The methodology is guaranteed to generate hazard-free circuits with the bounded wire-delay model, if the STG is live and has the Complete State Coding property. The methodology exploits knowledge of the environmental delays, speed-independence with respect to externally visible signals, and logic synthesis techniques. A proof that STG persistency is neither necessary nor sufficient for hazard-free implementation is given.

I. INTRODUCTION

ASYNCHRONOUS sequential circuit design has always been a controversial topic. In the early years of electronic circuit design, when the size of the circuits was such that a human designer could keep track of the complex timing issues involved, it was a popular design style (see [47] for a thorough review). Then synchronous logic dominated the VLSI era, when the ease of design of clocked circuits overwhelmed the advantages of the asynchronous style. Nevertheless, asynchronous design has always been around, at least in the restricted domain of interfaces to the external world, which are asynchronous by definition. However, it was usually limited to finding a good, reliable way to synchronize signals with the internal clock.

Recently there has been a revival of interest in asynchronous self-timed circuits [40] due to their desirable properties:

1) the clock-skew problem, getting worse and worse in synchronous submicron designs, disappears completely.

2) system-level latency is no longer dictated by the worst-case delay, but by the average delay. For example, a self-timed adder can signal when the result on its outputs is valid and stable, rather than always wait for the worst-case delay of the carry chain.

3) the power consumption of a static CMOS implementation may be reduced with respect to a synchronous implementation, because there is no need to drive long clock lines at every cycle.

These properties are counterbalanced by a more constrained design procedure, which until now has discouraged an extensive use of asynchronous circuits, except in those cases where the specification is inherently asynchronous (e.g., the VME bus interfaces).

The clock period of synchronous circuits must be long enough for the combinational logic outputs to settle. Asynchronous circuits, on the other hand, are by definition sensitive to all signal changes, whether they are intentional (i.e., part of the specification) or not. An example of such unintentional changes, also called hazards, is the multiple oscillation of a signal that is supposed to have a single transition.

In this paper we will give a procedure transforming a formal, technology-independent specification, called Signal Transition Graph [10], [39], into a circuit implementation made out of gates from any available library (e.g., nands, nors, and-or-inverts, SR flip-flops, ...). We want to prove that the output of our procedure does not have hazards. In order to do so, we must define what delay model we are going to use for our circuit implementation.

The unbounded gate-delay model [31] assumes that wires interconnecting gates have zero delay, and that all the paths inside each gate have exactly the same delay. It also assumes that no bounds are known on the delay of each gate [Fig. 1(a)]. The unbounded wire-delay model [46] assumes that each connection between a gate output and a gate input can have an unbounded delay [Fig. 1(b)]. The bounded wire-delay model [15] also assumes that each connection between a gate output and another gate input can have a delay [see again Fig. 1(b)]. But in this model the amount of delay from each input to each output of a complex gate is a function of the load on the gate output. The function depends both on the input that we consider and on the actual circuit used to implement the gate. This delay is called nominal delay. Because of statistical fluctuations in the manufacturing process and of modeling errors, for example the delay on the wires themselves, a lower and an upper bound on the nominal delay are considered when verifying the circuit with timing analysis.

Within this paper we will always consider pure delay elements, that perform a translation in time of the input waveform, rather than inertial delays, that suppress output pulses shorter than the delay value [47]. The approximation is pessimistic, but allows us to prove strong results on hazard-preserving circuit transformations.

Manuscript received February 11, 1992; revised November 24, 1992 and June 6, 1994. This work was supported in part by the National Science Foundation, Grant UCB-BS16421, and by AT&T Bell Laboratories. This paper was recommended by Associate Editor Louise Trevillyan.

L. Lavagno is with the Department of Electrical Engineering, Politecnico di Torino, Torino, Italy.

K. Keutzer is with Synopsys Inc., Mountain View, CA 94043 USA.

A. L. Sangiovanni-Vincentelli is with the Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720 USA.

IEEE Log Number 9404032.

0278-0070/95$04.00 © 1995 IEEE
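To make the distinction between pure and inertial delay elements concrete, the following Python sketch applies both to the same waveform. The event-list encoding of a waveform and the names `pure_delay` and `inertial_delay` are assumptions made for this illustration only; they are not part of the methodology described in the paper.

```python
# Illustrative sketch only: a waveform is an assumed list of (time, new_value)
# events on a single binary signal, in increasing time order, with alternating values.
from typing import List, Tuple

Event = Tuple[float, int]


def pure_delay(events: List[Event], d: float) -> List[Event]:
    """Pure delay: translate every transition in time by d, keeping all pulses."""
    return [(t + d, v) for t, v in events]


def inertial_delay(events: List[Event], d: float, initial: int = 0) -> List[Event]:
    """Inertial delay: an input value reaches the output only if it is held for
    at least d, so pulses shorter than d are suppressed."""
    out: List[Event] = []
    current = initial  # value currently present on the output
    for i, (t, v) in enumerate(events):
        next_t = events[i + 1][0] if i + 1 < len(events) else float("inf")
        if next_t - t >= d and v != current:
            out.append((t + d, v))
            current = v
    return out


# A 1-unit-wide pulse at time 0, followed by a clean rising edge at time 5.
waveform = [(0, 1), (1, 0), (5, 1)]
through_pure = pure_delay(waveform, 2)
through_inertial = inertial_delay(waveform, 2)
```

With a delay of 2 time units the short pulse survives the pure delay but is filtered by the inertial one. Analyzing a circuit with pure delays therefore never hides a glitch that an inertial element might have absorbed, which is the sense in which the approximation is pessimistic.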

Fig. 1. Delay models.

The work of [15], [45], and [47] made available to asynchronous designers a synthesis procedure starting from a Finite-State-Machine-like specification, the Flow Table, and producing logic equations implementing it. Unfortunately the resulting circuit is hazard-free only under the normal fundamental mode assumption, that is, the designer has to ensure that the circuit inputs change one at a time, only when the circuit itself is stable and ready to accept them. This verification task is nontrivial, and requires using the bounded wire-delay model to analyze the circuit.

More recently [33] introduced a self-clocked design methodology that does not require the single input change constraint, at the expense of some restrictions on the Flow Table specification and on the environment behavior. Both these Flow Table-based methodologies ensure hazard-free operation by increasing the delay of some signals (state signals in the first case, the internal clock in the latter).

On the other hand, Muller [31], [27] introduced an analysis technique that makes it possible to verify whether a complete¹ gate-level circuit is speed-independent. Roughly speaking, this means that it functions properly without hazards using the unbounded gate-delay model. Unfortunately this delay model, used in the synthesis methods proposed, for example, by [1], [50], [30] or [3], is not very realistic: ignoring technological limits on the delays is pessimistic, while ignoring the wire delays (or assuming them to be smaller than gate delays, as in [8]) is optimistic.

The most interesting aspect of Muller's approach hence lies in the fact that, by giving a precise and formal model of both the circuit and its environment,² it allows properties of the circuit, such as hazard-freeness, to be formally proved.

¹A complete circuit does not have primary inputs, thus assuming that the environment is also modeled as a circuit.

²By environment we mean here whatever entity observes the circuit outputs and produces the circuit inputs.

Chu [10] and Rosenblum et al. [39] independently introduced the Signal Transition Graph (STG) as a powerful formalism to specify asynchronous circuits. We have chosen it as the input for our design procedure because:

• it allows one to describe the behavior both of the circuit and of its environment (thus retaining the formal power of Muller's approach),

• it is relatively easy to use, due to its similarity to classical timing diagrams, and

• it explicitly handles some major paradigms of asynchronous behavior, such as concurrency, causality and choice.

The two papers that introduced the STG formalism used very different approaches to defining its correctness. Chu relied on a theoretic characterization based on Hack's results on free-choice Petri nets [14] (see also Section II-C). Rosenblum et al., on the other hand, relied on a practical argument, accepting as valid any STG that could be meaningfully interpreted as a circuit specification (excluding, for example, an STG where a signal can have two falling transitions without rising in between). The first approach has the advantage of allowing an easy syntactic analysis of the properties of the STG. The second approach has the advantage of not limiting the range of behaviors that can be specified to those that fall within Hack's characterization (see [51], [18] for a discussion of the problem). In this paper, we give a sufficient correctness condition for our algorithms, based on free-choice Petri nets, and discuss how this condition can be relaxed to extend its descriptive power to the level allowed by Rosenblum et al.

The STG synthesis approach proposed by Chu has two major drawbacks (see Section IX for a more extensive treatment of this subject):

• It requires the STG to be persistent in order to be able to produce a hazard-free circuit, which we will show to be neither necessary nor sufficient for this purpose.

• It can guarantee to automatically produce a hazard-free circuit only using an unrealistic delay model, where no restriction is posed on the complexity of the logic function implemented by each gate.

The similar approach described in [39] suffers only from the second problem, since it does not impose any theoretical restriction on the form of the STG.

In this paper we propose to overcome those drawbacks by using the bounded wire-delay model to synthesize circuits from an STG specification. The unbounded wire-delay model, that in principle would be the most robust with respect to process and environmental variations, cannot be used, as shown by [47], [35] and [24], to build circuits out of "basic" gates (and, or, ...). So authors who proposed design methodologies for circuits that operate correctly with this model (delay-insensitive circuits) actually were forced to make assumptions on the delays, either in isochronic forks [8] or within carefully designed logic modules [28], [38], [12], [7], [44]. So our methodology can be considered as a design aid for those modules, while ensuring that the "global protocol" specified by the STG is speed-independent or delay-insensitive [52].

The purpose of this work is to describe a synthesis procedure that can be proved to generate hazard-free circuits from an STG specification, with the bounded wire-delay model. The synthesis procedure resembles the one presented in [10] and [26]. However, our procedure is more general, since it deals with a more realistic delay model, and it can potentially give better results, since it can guarantee that the circuit is hazard-free without requiring the STG to be persistent. In order to do this we will:

• give a synthesis procedure deriving from an STG a circuit implementation C with two-level combinational functions and flip-flops, together with sufficient conditions on the STG guaranteeing that such an implementation exists,

• characterize all hazards in C, due to the fact that we have delays inside the logic block implementing each STG signal,

• show that constrained multilevel logic synthesis can be used without altering hazard properties of C,

• give a procedure to eliminate all hazards from C using the bounded wire-delay model.

The methodology is based on the following key results:

• Hazards can occur in a two-level implementation of an STG if and only if, due to delays inside the circuit, the STG-specified ordering of two signal transitions is reversed.

• The above property is preserved by constrained logic synthesis.

• All hazards can be eliminated by increasing the delay of some STG-specified signal, to "restore" the violated ordering constraints.

The paper is organized as follows. Section II recalls some definitions from the literature and describes some previous work in the area. Section III analyzes the hazard properties of a two-level implementation of a logic function. Section IV gives an algorithm to synthesize a two-level implementation of an STG specification and shows that this implementation does not have hazards for a class of input transitions, namely those that are concurrent in the specification. Section V shows that we can apply logic synthesis to this two-level circuit and obtain a multilevel implementation with the same hazard properties. Section VI analyzes when hazards can occur in a circuit implemented according to the method given in the previous sections. Section VII describes how remaining potential hazards can be eliminated from the implementation by adding delay to some signals specified by the STG. Section VIII shows how the results obtained so far can be extended to handling dynamic hazards as well as static hazards. Section IX is devoted to the analysis of the definition of STG persistency in [10] and concludes that, contrary to previous beliefs, it is neither necessary nor sufficient to ensure hazard-freeness of the implementation. Section X describes the algorithm implementation and gives experimental results. Section XI draws some conclusions and outlines opportunities for future development. Appendix A describes a very simple example, applying the ideas presented in the paper.

II. DEFINITIONS AND PREVIOUS WORK

This section gives some basic definitions and recalls previous results useful throughout the paper.

A. Logic Functions and Combinational Logic Circuits

A completely specified single-output logic function $g$ of $n$ input variables is a mapping $g : \{0,1\}^n \to \{0,1\}$. Each input variable $x_i$ corresponds to a coordinate of the domain of $g$. Each element of $\{0,1\}^n$ is called a vertex.

An incompletely specified single-output logic function $f$ of $n$ input variables (called logic function in the following) is a mapping $f : \{0,1\}^n \to \{0,1,*\}$.

The set of vertices where f evaluates to 1 is called the on-set of f, the set of vertices where f evaluates to 0 is called its off-set, the set of vertices where f evaluates to * (i.e., it is not specified) is called its dc-set.

The complement of a logic function f, denoted by $\bar{f}$, is obtained by exchanging the on-set and off-set.
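As a small illustration of these definitions, the sketch below enumerates the vertices of $\{0,1\}^n$ and splits them into on-set, off-set and dc-set. Representing a logic function as a Python callable that returns 0, 1 or the string "*" is an assumption of this example, as are the names `partition`, `complement` and `example`.

```python
# Illustrative sketch: a logic function is assumed to be a callable mapping a
# vertex (a tuple of 0/1 values) to 0, 1, or "*" (unspecified).
from itertools import product


def partition(f, n):
    """Return the (on-set, off-set, dc-set) of f over all 2**n vertices."""
    on, off, dc = set(), set(), set()
    for vertex in product((0, 1), repeat=n):
        value = f(vertex)
        if value == 1:
            on.add(vertex)
        elif value == 0:
            off.add(vertex)
        else:
            dc.add(vertex)
    return on, off, dc


def complement(f):
    """Exchange on-set and off-set; unspecified vertices stay unspecified."""
    def f_bar(vertex):
        value = f(vertex)
        return value if value == "*" else 1 - value
    return f_bar


def example(vertex):
    """Hypothetical 2-input function: 1 on (1,1), unspecified on (0,0), else 0."""
    if vertex == (1, 1):
        return 1
    if vertex == (0, 0):
        return "*"
    return 0


on_set, off_set, dc_set = partition(example, 2)
```

Calling `partition(complement(example), 2)` returns the same three sets with the first two exchanged, which is exactly the definition of the complement given above.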

A literal is either a variable or its complement. A cube $c$ is a set of literals, such that if $x_i \in c$ then $\bar{x}_i \notin c$ and vice versa. It is interpreted as the Boolean product of its elements. The cubes with exactly $n$ literals are in one-to-one correspondence with the vertices of $\{0,1\}^n$.

A cube $c'$ covers another cube $c$, denoted $c' \sqsupseteq c$, if $c' \subseteq c$; for example $\{a, \bar{b}\} \subseteq \{a, \bar{b}, c\}$, so $a\bar{b} \sqsupseteq a\bar{b}c$ (from now on we will drop braces and commas from a cube representation, and use the more familiar "product" notation).

The intersection of two cubes $c'$ and $c$ is not defined if there exists a variable $x_i$ such that either $x_i \in c'$ and $\bar{x}_i \in c$, or $\bar{x}_i \in c'$ and $x_i \in c$. Otherwise it is $c'' = c' \cup c$. It is called "intersection" because it covers exactly the intersection of the sets of vertices covered by $c'$ and $c$. For example the intersection of $ab$ and $ac$ is $abc$.

The Hamming distance between two cubes $c'$ and $c$, denoted by $d(c', c)$, is the cardinality of the set of variables $x_i$ such that either $x_i \in c'$ and $\bar{x}_i \in c$, or $\bar{x}_i \in c'$ and $x_i \in c$.

A cube is called an implicant of a logic function f if it does not cover any off-set vertex of f. An implicant of f is called a prime if it is not covered by any other single implicant of f.

An on-set cover F of a logic function f is a set of cubes such that each cube of F is an implicant of f and each on-set vertex of f is covered by at least one cube of F. An off-set cover R of a logic function f is a cover of the complement of f.

A cube $c'$ in an on-set cover F of a logic function f can be expanded against the off-set of f by removing literals from it while it does not cover any off-set vertex. The result of the expansion is not unique (it depends on the order of removal), but it is always a prime implicant of f.

In the following we shall use "cover" to denote on-set covers. Each cover F corresponds to a unique completely specified logic function, denoted by $B(F)$. On the other hand, a logic function can have in general many covers. A cover is interpreted as the Boolean sum of its elements, so it can also be seen as a two-level sum-of-products implementation of the completely specified function $B(F)$. The intersection of a cover C with a cube $c'$ is the set of all the defined intersections of $c_i \in C$ with $c'$.

A cover F is called a prime cover of a function f if all its cubes are prime implicants of f. A cover F is called an irredundant cover of a function f if deleting any cube c from F causes it to be no longer a cover of f (i.e., if some on-set vertex is no longer covered by any cube of $F - \{c\}$).
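The cube operations defined above (covering, intersection, Hamming distance and expansion against the off-set) can be sketched in a few lines of Python. The encoding of a cube as a frozenset of (variable, phase) pairs, with phase 1 for $x_i$ and 0 for $\bar{x}_i$, and all function names are assumptions chosen for this illustration.

```python
# Illustrative sketch: a cube is an assumed frozenset of (variable, phase) pairs,
# where phase 1 stands for the literal x and phase 0 for its complement.
from itertools import product


def cube(**literals):
    return frozenset(literals.items())


def covers(c1, c):
    """c1 covers c when the literal set of c1 is a subset of that of c."""
    return c1 <= c


def conflicts(c1, c):
    """Variables appearing in opposite phases in the two cubes."""
    return {var for var, phase in c1 if (var, 1 - phase) in c}


def intersect(c1, c):
    """Union of the literal sets, or None when the intersection is not defined."""
    return None if conflicts(c1, c) else c1 | c


def distance(c1, c):
    """Hamming distance: number of variables with opposite phases."""
    return len(conflicts(c1, c))


def vertices(c, variables):
    """All vertices (tuples ordered as `variables`) covered by cube c."""
    fixed = dict(c)
    free = [v for v in variables if v not in fixed]
    result = set()
    for bits in product((0, 1), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, bits))}
        result.add(tuple(assignment[v] for v in variables))
    return result


def expand(c, off_set, variables):
    """Drop literals one at a time while no off-set vertex becomes covered."""
    current = set(c)
    for literal in sorted(c):  # the removal order fixes which prime is found
        trial = frozenset(current - {literal})
        if not (vertices(trial, variables) & off_set):
            current = set(trial)
    return frozenset(current)


VARS = ("a", "b", "c")
OFF = vertices(cube(a=0), VARS)  # hypothetical function f = a: off-set is a = 0
prime = expand(cube(a=1, b=1, c=1), OFF, VARS)
```

In this example the minterm $abc$ of the hypothetical function $f = a$ expands to the single-literal prime $a$. A different removal order can yield a different prime for other functions, consistent with the remark that the result of the expansion is not unique.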

A function f is monotone increasing in a variable $x_i$ if

$f(x_i = 0, \beta) = 1 \Rightarrow f(x_i = 1, \beta) = 1$ for all $\beta \in \{0,1\}^{n-1}$,

that is if increasing the value of the variable $x_i$ from 0 to 1 never decreases the value of f from 1 to 0.

A function f is monotone decreasing in a variable $x_i$ if

$f(x_i = 1, \beta) = 1 \Rightarrow f(x_i = 0, \beta) = 1$ for all $\beta \in \{0,1\}^{n-1}$,

that is if decreasing the value of the variable $x_i$ from 1 to 0 never decreases the value of f from 1 to 0.

A function f is unate in a variable $x_i$ if it is either monotone increasing or monotone decreasing in $x_i$. Otherwise f is binate in $x_i$.

A cover F is unate in a variable $x_i$ if variable $x_i$ appears only in one phase (i.e., either $x_i$ or $\bar{x}_i$) in its cubes. As shown in [5], a function that is unate can have non-unate covers, but prime covers of unate functions must be unate. Moreover, if F is a cover of f and F is unate then f must be unate.

A combinational logic circuit is represented as a labeled, directed, acyclic graph $G = (V, E)$ where each node v is labeled:

• either with the name of a primary input or output,

• or with a logic function of a set of nodes, its fanin.

There exists an edge $(u, v)$ in G for each fanin u of v. Conversely, v is a fanout node of each such u.

A node that is not a primary input or a primary output will also be called a gate. An edge will also be called a wire. A combinational circuit will also be called a multilevel circuit.
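Before moving on to two-level circuits, the notions of monotone and unate functions introduced earlier in this subsection can be made concrete by brute-force enumeration. The sketch below assumes a completely specified function given as a Python callable on 0/1 tuples; the function names and the three small example functions are hypothetical.

```python
# Illustrative sketch: f is an assumed callable on tuples of n bits returning 0 or 1.
from itertools import product


def _pair(beta, i):
    """Vertices obtained from beta by inserting x_i = 0 and x_i = 1 at position i."""
    return beta[:i] + (0,) + beta[i:], beta[:i] + (1,) + beta[i:]


def monotone_increasing(f, n, i):
    """True if f = 1 at x_i = 0 implies f = 1 at x_i = 1, for every beta."""
    for beta in product((0, 1), repeat=n - 1):
        low, high = _pair(beta, i)
        if f(low) == 1 and f(high) != 1:
            return False
    return True


def monotone_decreasing(f, n, i):
    """True if f = 1 at x_i = 1 implies f = 1 at x_i = 0, for every beta."""
    for beta in product((0, 1), repeat=n - 1):
        low, high = _pair(beta, i)
        if f(high) == 1 and f(low) != 1:
            return False
    return True


def unate(f, n, i):
    return monotone_increasing(f, n, i) or monotone_decreasing(f, n, i)


def and_not(v):   # hypothetical f = a AND (NOT b)
    return v[0] & (1 - v[1])


def xor(v):       # hypothetical f = a XOR b, binate in both variables
    return v[0] ^ v[1]
```

Here `and_not` is monotone increasing in its first variable and monotone decreasing in its second, so it is unate in both, while `xor` is binate in both. Enumeration is exponential in $n$ and is meant only to mirror the definitions, not to suggest how a logic synthesis tool would test unateness.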

A two-level combinational circuit contains three classes of gates:

• inverter gates, with fanin primary inputs,

• AND gates, with fanin inverter gates and primary inputs,

• OR gates, with fanin AND gates, inverter gates and primary inputs,

together with primary outputs, with fanin inverter gates, AND gates, and OR gates.

A combinational logic circuit is a leaf-DAG if only primary inputs have a fanout greater than one. The distance of a node v from the primary inputs, $d(v)$, is recursively defined as follows:

• if v is a primary input, then $d(v) = 0$;

• otherwise, $d(v) = 1 + \max_{u \in \mathrm{fanin}(v)} d(u)$.
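The graph definitions above translate directly into code. The following sketch stores a circuit as a dictionary from each node to its list of fanin nodes, treats nodes with an empty fanin as primary inputs, and computes the distance $d(v)$ and the leaf-DAG property. The dictionary encoding, the node names and the helper names are all assumptions made for this example.

```python
# Illustrative sketch: a circuit is an assumed dict {node: [fanin nodes]};
# nodes with an empty fanin list are taken to be the primary inputs.

def distances(fanin):
    """d(v) = 0 for primary inputs, else 1 + max of d over the fanin of v."""
    memo = {}

    def d(v):
        if v not in memo:
            memo[v] = 0 if not fanin[v] else 1 + max(d(u) for u in fanin[v])
        return memo[v]

    return {v: d(v) for v in fanin}


def is_leaf_dag(fanin):
    """True if only primary inputs have a fanout greater than one."""
    fanout = {v: 0 for v in fanin}
    for inputs in fanin.values():
        for u in inputs:
            fanout[u] += 1
    return all(count <= 1 for v, count in fanout.items() if fanin[v])


# Hypothetical two-level circuit for out = (NOT a) b + a b.
circuit = {
    "a": [], "b": [],
    "inv_a": ["a"],
    "and1": ["inv_a", "b"],
    "and2": ["a", "b"],
    "or1": ["and1", "and2"],
    "out": ["or1"],
}
depth = distances(circuit)
```

In this hypothetical circuit both primary inputs fan out to two nodes while every gate drives a single node, so it is a leaf-DAG. Adding a second reader of `and1` would break the property.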

In the wire-delay model, a pure delay element (i.e., a translation in time of the input waveform) is associated with every edge in the circuit, and each node is considered an instantaneous Boolean evaluator. The bounded wire-delay model assumes that upper and lower bounds on the magnitude of each delay are known, while the unbounded wire-delay model assumes only that the delays are positive. So these models can describe a whole family of circuits at once. An instance of a circuit family has one specific value, within the bounds associated with its family, assigned to every delay element.

B. Hazards

Synchronous circuits do not have hazard problems: the clock cycle is chosen long enough to ensure that every latch input is stable when the clock is pulsed. In the asynchronous case, we must ensure that no signal transition ever happens except when it is specified by the designer, because every transition can be detected by some other part of the system, and cause it to behave incorrectly.

A static hazard is a 0 → 1 → 0 transition (static 0-hazard) or 1 → 0 → 1 transition (static 1-hazard) in any condition where no transition for that signal is enabled according to the specification.

A dynamic hazard is a sequence of 0 → 1 → 0 → ... → 1 (or 1 → 0 → 1 → ... → 0) transitions in any condition where a single positive (or negative) transition for that signal is enabled according to the specification.

Hazards must be absolutely avoided, because they can cause the circuit to malfunction in an unpredictable way (for example in response to a change in operating temperature).
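The two hazard definitions can be read as a classification of an observed waveform against what the specification enables. The sketch below is a simplified reading under stated assumptions: the waveform is a list of sampled values of one signal, the `enabled` argument is `None`, `"+"` or `"-"`, and any extra change beyond the single enabled one is reported as a dynamic hazard. The function names are hypothetical.

```python
# Illustrative sketch: `values` is an assumed list of sampled 0/1 values of one
# signal; `enabled` is None (no transition enabled), "+" or "-" (one enabled).

def count_changes(values):
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def classify(values, enabled=None):
    changes = count_changes(values)
    if enabled is None:
        if changes == 0:
            return "hazard-free"
        if changes == 1:
            return "unspecified transition"
        # The signal was supposed to hold its initial value.
        return f"static {values[0]}-hazard"
    start = 0 if enabled == "+" else 1
    if values[0] != start:
        raise ValueError("enabled transition does not match the initial value")
    # Zero changes: not fired yet. One change: the single specified transition.
    return "hazard-free" if changes <= 1 else "dynamic hazard"
```

For instance, `classify([0, 1, 0])` reports a static 0-hazard, while `classify([0, 1, 0, 1], "+")` reports a dynamic hazard because the single enabled rising transition was observed as three changes.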

C. Signal Transition Graph

The Signal Transition Graph was introduced by [9] and [39] as a specification formalism for asynchronous sequential circuits. It is a natural way to specify asynchronous interface circuits, because the causal relationships among the signal transitions can be easily described, and the concurrency is captured explicitly.

A Petri net (PN, [37], [36], [32]) is a triple $N = (P, T, F)$, where P is a set of places, T is a set of transitions and $F \subseteq (P \times T) \cup (T \times P)$ is the flow relation. A place $p \in P$ is a predecessor of a transition $t \in T$, and t is a successor of p, if $(p, t) \in F$. Conversely, a transition $t \in T$ is a predecessor of a place $p \in P$, and p is a successor of t, if $(t, p) \in F$.

A free-choice net (FCPN) is a PN where if a place p has more than one transition $t_1, \ldots, t_n$ as its successors, then p must be the only predecessor of $t_1, \ldots, t_n$. A Marked Graph (MG) is a PN where each place p has exactly one predecessor and one successor transition. A State Machine (SM) is a PN where each transition t has exactly one predecessor and one successor place.
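These three structural classes are purely syntactic, so they can be checked directly on the triple $(P, T, F)$. In the sketch below the net is stored as Python sets, with F a set of (source, target) pairs; the helper names and the small example net are assumptions introduced for illustration.

```python
# Illustrative sketch: P and T are assumed sets of names, F a set of (source, target) pairs.

def preset(F, x):
    """Predecessors of node x in the flow relation."""
    return {a for a, b in F if b == x}


def postset(F, x):
    """Successors of node x in the flow relation."""
    return {b for a, b in F if a == x}


def is_free_choice(P, T, F):
    """Every successor of a multi-successor place has that place as only predecessor."""
    return all(preset(F, t) == {p}
               for p in P if len(postset(F, p)) > 1
               for t in postset(F, p))


def is_marked_graph(P, T, F):
    """Each place has exactly one predecessor and one successor transition."""
    return all(len(preset(F, p)) == 1 and len(postset(F, p)) == 1 for p in P)


def is_state_machine(P, T, F):
    """Each transition has exactly one predecessor and one successor place."""
    return all(len(preset(F, t)) == 1 and len(postset(F, t)) == 1 for t in T)


# Hypothetical net with a choice at p0 between two branches that rejoin at p0.
P = {"p0", "p1", "p2"}
T = {"t1", "t2", "t3", "t4"}
F = {("p0", "t1"), ("p0", "t2"), ("t1", "p1"), ("t2", "p2"),
     ("p1", "t3"), ("p2", "t4"), ("t3", "p0"), ("t4", "p0")}
```

The example net is free-choice and is a State Machine, but it is not a Marked Graph because place `p0` has two successors. Adding the arc `("p1", "t2")` would make `t2` depend on a second place and destroy the free-choice property.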

An SM reduction of a free-choice net is obtained by choosing exactly one place among the predecessors of each transition and then by iteratively deleting transitions and places with no predecessors or successors. Similarly, an MG reduction of a free-choice net is obtained by choosing exactly one transition among the successors of each place and then by iteratively deleting dangling elements. A set of reductions of a net such that each place and transition has a corresponding element in one of the reductions is called a cover of the net, and each reduction is called a component of the net.

An STG is an interpreted free-choice Petri net: transitions of the FCPN are interpreted as value changes on input/output signals of the specified circuit. Positive transitions (labeled with a "+") represent 0 → 1 changes, negative transitions (labeled with a "-") represent 1 → 0 changes. Henceforth, t* will denote a transition of signal t (i.e., either t+ or t-) and $\overline{t*}$ will denote its complementary transition (i.e., either t- or t+, respectively). Input transitions are those that occur on input signals of the circuit, output transitions are those that occur on its output signals.
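The interpretation of transitions as value changes suggests a simple sanity check on any firing sequence: a signal can only rise when it is 0 and fall when it is 1. The sketch below assumes transition labels of the form "x+" or "x-" and a dictionary of current signal values; the names `parse`, `complementary` and `apply_trace`, and the sample trace, are hypothetical.

```python
# Illustrative sketch: labels are assumed to be a signal name followed by "+" or "-".

def parse(label):
    """Split a label such as 'x+' into (signal, value after the transition)."""
    signal, sign = label[:-1], label[-1]
    if not signal or sign not in ("+", "-"):
        raise ValueError(f"malformed transition label: {label!r}")
    return signal, 1 if sign == "+" else 0


def complementary(label):
    """Return the complementary transition: x+ for x- and vice versa."""
    return label[:-1] + ("-" if label.endswith("+") else "+")


def apply_trace(values, trace):
    """Apply a sequence of transitions, rejecting any that does not change its signal."""
    values = dict(values)
    for label in trace:
        signal, new_value = parse(label)
        if values[signal] == new_value:
            raise ValueError(f"{label} fired while {signal} is already {new_value}")
        values[signal] = new_value
    return values


start = {"x": 0, "y": 0, "z": 0}
final = apply_trace(start, ["x+", "y+", "z+", "x-", "z-", "y-"])
```

A trace in which some signal has two rising (or two falling) transitions in a row, such as `["x+", "x+"]`, is rejected. This is the kind of specification that cannot be meaningfully interpreted as circuit behavior.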


A marking m is reachable from another marking m' if there exists a sequence of transition firings that produces m starting from m'.

Two transitions $t_1$ and $t_2$ of a marked PN are concurrent if there exists a reachable marking m where both $t_1$ and $t_2$ are enabled, and neither $t_1$ disables $t_2$ nor vice versa in any reachable marking.

Fig. 2. Example of signal transition graph.

Some authors also consider a third class of signals, internal signals. They are noninput signals whose transitions do not have any input signal transition as direct successors (or, equivalently, that are not observable by the environment). We will treat both output and internal signals uniformly for the purposes of this paper, while the distinction becomes important, e.g., for STG manipulation techniques that may be allowed to change the behavior of internal signals.

The conventional graphical representation of an STG (slightly different from the PN convention) is a directed graph, where transitions are denoted by the corresponding label and places are denoted by circles, while directed edges represent elements of the flow relation. Places with only one predecessor and one successor are usually omitted. Directed edges whose successor is a transition represent sequencing constraints, either on the circuit to be synthesized (if their successor is an output transition), or on the environment (if their successor is an input transition). They specify what set of transitions causes each transition.

Fig. 2 contains an example of an STG (from [26]), together with an equivalent timing diagram representation of the same behavior. Suppose that x and y are inputs and that z is an output. Then the edge x+ → y+ means that the environment guarantees that the rising edge of y is caused by the rising edge of x. The edges x- → z- and y+ → z- mean that the circuit to be synthesized must guarantee that the falling edge of z is caused by the falling edge of x and the rising edge of y.

1) Marking and Firing: A token marking of a PN is a non-negative integer labeling of its places. A transition is enabled (i.e., the corresponding event can happen in the circuit) whenever all its predecessor places³ are marked with at least one token. Transition x+ is enabled in Fig. 2 (the black dot represents the initial marking).

An enabled transition may eventually fire. This means that the corresponding signal changes value in the circuit. When it fires, a token is removed from every predecessor place, and a token is added to every successor place.

If a place marked with one token has more than one successor edge, then all of its successor transitions are enabled, but only one of them, nondeterministically chosen, can fire: firing one of them disables the others. This means, in practice, that the behavior of the circuit depends on an external condition. So [10] constrained all successor transitions of a multi-successor place in a well-formed STG to be input transitions.
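The enabling and firing rules above amount to a few dictionary updates. The following sketch stores a marking as a dictionary from places to token counts and the flow relation as a set of pairs. This encoding, the function names and the small choice net are assumptions of the example, not part of the paper.

```python
# Illustrative sketch: F is an assumed set of (source, target) pairs and a
# marking is a dict {place: token count}; absent places hold zero tokens.

def preset(F, x):
    return {a for a, b in F if b == x}


def postset(F, x):
    return {b for a, b in F if a == x}


def enabled(F, marking, t):
    """A transition is enabled when every predecessor place holds a token."""
    return all(marking.get(p, 0) >= 1 for p in preset(F, t))


def fire(F, marking, t):
    """Remove a token from each predecessor place and add one to each successor."""
    if not enabled(F, marking, t):
        raise ValueError(f"transition {t} is not enabled")
    new_marking = dict(marking)
    for p in preset(F, t):
        new_marking[p] -= 1
    for p in postset(F, t):
        new_marking[p] = new_marking.get(p, 0) + 1
    return new_marking


# Hypothetical choice: place p0 has two successor transitions, t1 and t2.
F = {("p0", "t1"), ("p0", "t2"), ("t1", "p1"), ("t2", "p2")}
m0 = {"p0": 1}
m1 = fire(F, m0, "t1")
```

In the initial marking both `t1` and `t2` are enabled. After `t1` fires the token has left `p0`, so `t2` is no longer enabled, which is the disabling behavior described for multi-successor places.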

3The marking simply appears on the edges themselves whenever a single predecessor/single successor place is omitted from the graphical representa- tion.

2) Live and Safe Petri Net: In the following we will restrict ourselves to FCPN’s whose underlying directed graph is strongly connected.⁴

A marking m is live if for all markings m’ reachable from m, every transition can be enabled through some sequence of firings from m’. A marked PN is live if its initial marking is live. This means that, since the PN is strongly connected, every transition can be enabled infinitely often through some sequence of firings from the initial marking.

A marking m is safe (sometimes referred to as 1-bounded) if no place can ever be assigned more than one token after any sequence of firings from m. A marked PN is safe if its initial marking is safe.

Hack [14] proved that:

Theorem 2.1: Let N be a free-choice Petri net. Then the two following statements hold:
1) N has a live and safe marking if and only if:
a. every MG reduction is nonempty and strongly connected, and the MG components cover N.
b. every SM reduction is nonempty and strongly connected, and the SM components cover N.
2) Let m be a live marking of N. Then m is safe if and only if there exists an SM cover where each component is marked with exactly one token in m.

SM components can be thought of as running concurrently. They synchronize on the transitions that belong to the intersection of two (or more) components.

MG components can be thought of as running one at a time. Whenever a place corresponding to a multisuccessor place in the original FCPN becomes marked, then the next running component is nondeterministically chosen.

If there are no multisuccessor places, i.e., if the PN is an MG, then the SM components are just the simple cycles of the PN. For example in Fig. 2 there are two SM components, namely x+ → z+ → x- → z- → y- → x+ and x+ → y+ → z- → y- → x+.

This decomposition mechanism is very useful to analyze theoretically the properties of FCPN’s, allowing a characterization of behavior properties (liveness, safeness, ...) in terms of syntactic properties.

Liveness is obviously a desirable property of a circuit (a signal that can never change its value is redundant), and safeness is required by the synthesis procedure outlined below, so from now on we will restrict ourselves to strongly connected STGs with a live and safe initial marking.

3) Live Signal Transition Graph: We now give a definition of STG liveness that is slightly more general than the one given in [10] (notice the distinction between Petri net liveness and STG liveness).

⁴ For example, there exists a path from each transition or place to every other transition or place.

An STG is defined as live if it has all the following properties.

1) The underlying PN is finite, free-choice, live and safe.
2) No output signal transition can ever be disabled.
3) For each signal t_i, there is at least one SM component, initially marked with one token, such that:
a) it contains all the transitions t_i* of t_i, and
b) each path from a transition t_i* to another transition t_i* of the same direction (i.e., both rising or both falling) contains also the complementary transition of t_i.

A result of [14] guarantees that the SM component in condition 3 remains marked with exactly one token after any firing sequence if the FCPN is live and safe. Liveness ensures that each signal in the circuit has always a well-defined value in all markings reachable from the initial one, because a rising and falling transition for the same signal can never be concurrently enabled and each signal must have alternate rising and falling transitions.

This definition is slightly more general than the one given in [10], since:

• Chu required that only two transitions per signal appear in the STG, and that those transitions are ordered (i.e., belong to a simple cycle) in every SM component of the PN.
• We do not restrict the number of transitions per signal, and we require that at least one SM ensures the alternating order for each signal.

D. State Graph

The State Graph (SG), also introduced by [10], is a Finite-State-Machine-like description of the same behavior as the STG, where the STG concurrency is represented as an interleaving of transitions.

The SG is a directed graph, where each node (henceforth called state) is in one-to-one correspondence with a marking of the Signal Transition Graph reachable from its initial marking. An edge joins state s’ with state s if the marking m (corresponding to s) can be reached from m’ (corresponding to s’) through the firing of a single transition. This transition labels the edge.

Fig. 3(a) contains the SG derived from the STG in Fig. 2 (the initial marking corresponds to the dotted state).

The SG can be derived from the STG by exhaustive simulation with the following recursive procedure (see also [10] for an equivalent procedure based on graph decomposition). The procedure is called initially with current marking m the initial marking of the STG.

Procedure 2.1:
1) If m has not been recorded yet, then:
a) create a new state s associated with m.
b) for each transition t_i* enabled in m:
i) fire t_i*, obtaining a marking m'.
ii) call recursively step 1 using m' as current marking, retrieving the corresponding state s'.
iii) create an edge from s to s' labeled with t_i*.
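A minimal sketch of the exhaustive simulation of Procedure 2.1 follows. It assumes a marked-graph encoding of the Fig. 2 STG reconstructed from the two cycles given earlier (the place list, the initial marking and all function names are illustrative assumptions), and it stores markings as tuples so that they can be recorded in a set.

```python
# Assumed marked-graph encoding of Fig. 2: one place per edge (src, dst).
PLACES = [("x+", "z+"), ("z+", "x-"), ("x-", "z-"), ("z-", "y-"),
          ("y-", "x+"), ("x+", "y+"), ("y+", "z-")]
TRANSITIONS = ["x+", "x-", "y+", "y-", "z+", "z-"]
# Assumed initial marking: one token on the place y- -> x+.
INITIAL = tuple(1 if p == ("y-", "x+") else 0 for p in PLACES)

def enabled(m, t):
    """True when all predecessor places of t are marked."""
    return all(m[i] >= 1 for i, (_, dst) in enumerate(PLACES) if dst == t)

def fire(m, t):
    """Return the marking reached from m by firing t."""
    m = list(m)
    for i, (src, dst) in enumerate(PLACES):
        if dst == t:
            m[i] -= 1
        if src == t:
            m[i] += 1
    return tuple(m)

def state_graph(initial):
    """Recursive exhaustive simulation: one SG state per reachable marking."""
    states, edges = set(), []

    def visit(m):
        if m in states:            # step 1: marking already recorded
            return
        states.add(m)              # step 1a: create a new state
        for t in TRANSITIONS:      # step 1b: every enabled transition
            if enabled(m, t):
                m2 = fire(m, t)
                visit(m2)
                edges.append((m, t, m2))

    visit(initial)
    return states, edges

states, edges = state_graph(INITIAL)
```

With this assumed net the simulation records eight markings and ten edges. The recursion depth grows with the number of reachable markings, so an explicit stack would be preferable for larger specifications.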

E. Next-State Function Derivation and Complete State Coding

The synthesis procedure described below uses the signals of the circuit directly as state variables. The corresponding circuit structure is depicted in Fig. 4, where the combinational logic implements the next-state function of each output signal.

The synthesis procedure assumes that we can label each state s of the SG with a vector v of signal values that is consistent with the SG edge labeling. In other words for each edge s' → s, for each signal t_i:

1) if the edge is labeled t_i+ then signal t_i must be 0 in v' and 1 in v,
2) if the edge is labeled t_i- then signal t_i must be 1 in v' and 0 in v,
3) otherwise signal t_i must have the same value in both v' and v.
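The three consistency conditions can be checked mechanically on an explicit SG. The sketch below is illustrative only: the dictionary `SG` is an assumed reconstruction of the state graph of Fig. 3 (state codes ordered as x, y, z), and `is_consistent` is a hypothetical helper name.

```python
SIGNALS = ["x", "y", "z"]

# Assumed SG: state code -> {transition label: successor state code}.
SG = {
    "000": {"x+": "100"},
    "100": {"y+": "110", "z+": "101"},
    "110": {"z+": "111"},
    "101": {"x-": "001", "y+": "111"},
    "111": {"x-": "011"},
    "001": {"y+": "011"},
    "011": {"z-": "010"},
    "010": {"y-": "000"},
}

def is_consistent(sg):
    """Check conditions 1-3 on every edge s' -> s of the state graph."""
    for src, fanout in sg.items():
        for label, dst in fanout.items():
            signal, direction = label[:-1], label[-1]
            for i, name in enumerate(SIGNALS):
                if name == signal:
                    # rising: 0 then 1; falling: 1 then 0
                    expected = ("0", "1") if direction == "+" else ("1", "0")
                    if (src[i], dst[i]) != expected:
                        return False
                elif src[i] != dst[i]:
                    # every other signal must keep its value
                    return False
    return True
```

For instance, `is_consistent(SG)` holds for the assumed graph, while relabeling the successor of 000 under x+ as 110 violates condition 3 for signal y.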

An example of such a labeling appears in Fig. 3(b).

Theorem 2.2: Let S be a live STG. Then its associated SG has a consistent labeling.

Proof: If the STG is live, then for each signal t_i there exists one SM component, initially marked with one token, to which all the transitions of t_i belong. The component remains marked with one token after any firing sequence, due to the above mentioned result of [14]. Hence transitions in this component alternate, and all the possible firing sequences of t_i are consistent (i.e., rising and falling alternate). This implies that the value of t_i in each marking is unique.

Let S be a live STG and let m be its initial marking. The following procedure determines the label v of the initial state, corresponding to the initial marking of S:

Procedure 2.2:
1) For each signal t_i do:
a) let M_i be an SM component of S, initially marked with one token, that contains all the transitions for t_i, and let m_i be its initial marking (a subset of m).
b) find on M_i the first transition t_i* that can be reached from m_i.
c) if t_i* is t_i+ then let v_i = 0, otherwise let v_i = 1.

The initial values are well defined if the STG is live (Section II-C-3), because:

• at least one M_i containing all transitions of signal t_i must exist.
• the direction of the first transition of t_i that can fire on the SM component M_i starting from a marking m must be independent of the path on M_i, or else there would be a firing sequence where two transitions t_i* (both rising or falling) are not separated by a complementary transition.
• if there are two (or more) SM components containing all the transitions of t_i, say M_i' and M_i, then the first transition of t_i reachable from a marking m restricted to M_i' or M_i must be the same, since the intersection of the two SM's obviously contains all such t_i*'s, and the SM's are synchronized in their mutual intersections.

Note that in practice exhaustive reachability analysis, followed by a straightforward application of the definition of consistent labeling, is often a more efficient way of performing the same task.

Fig. 3. A state graph with state codes.

Fig. 4. Synthesized circuit structure.

Let f be the next-state function to be implemented for output signal t_i, and let v be an element of the domain of f (i.e., v ∈ {0,1}^n if the STG S has n signals). Every state has a consistent labeling in terms of the STG signals, so each SG state s can be associated with a vertex v of the domain of f.

The following procedure [10] derives the label of each state and f. It is initially called with the initial marking m of S and its label v (as computed by Procedure 2.2).

Procedure 2.3:
1) If t_i+ is enabled in m then let f(v) = 1.
2) Else if t_i- is enabled in m then let f(v) = 0.
3) Else let f(v) = v_i.
4) For each transition t_j* enabled in m such that marking m', obtained from m firing t_j*, has not been reached yet, do:
a) let v' = v.
b) if t_j* is t_j+, then let v_j' = 1, otherwise let v_j' = 0.
c) recursively call step 1 with v' and m'.

Note that in each state there can be only one fanout edge with label t_i*, because no two transitions for t_i can be concurrently enabled.

Moreover f(v) is a don't care for all vertices v of the domain of f that do not correspond to any SG state.
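Steps 1 to 3 of Procedure 2.3 reduce to a lookup on the fanout edges of each SG state. The sketch below applies them to an assumed reconstruction of the SG of Fig. 3 (codes ordered as x, y, z); `SG`, `next_state` and `value` are illustrative names, and unreachable vertices are reported as "-" to mark them as don't cares.

```python
SIGNALS = ["x", "y", "z"]

# Assumed SG: state code -> {transition label: successor state code}.
SG = {
    "000": {"x+": "100"},
    "100": {"y+": "110", "z+": "101"},
    "110": {"z+": "111"},
    "101": {"x-": "001", "y+": "111"},
    "111": {"x-": "011"},
    "001": {"y+": "011"},
    "011": {"z-": "010"},
    "010": {"y-": "000"},
}

def next_state(sg, signal):
    """Next-state value of `signal` for every vertex that labels an SG state."""
    i = SIGNALS.index(signal)
    f = {}
    for code, fanout in sg.items():
        if signal + "+" in fanout:      # rising transition enabled
            f[code] = 1
        elif signal + "-" in fanout:    # falling transition enabled
            f[code] = 0
        else:                           # signal is stable: keep current value
            f[code] = int(code[i])
    return f

def value(f, vertex):
    """Vertices that label no SG state are don't cares."""
    return f.get(vertex, "-")

f_z = next_state(SG, "z")
on_set = sorted(v for v, bit in f_z.items() if bit == 1)
```

Under these assumptions the on-set of the next-state function of z is {001, 100, 101, 110, 111}, which can be written as x + y'z; since all eight vertices are reachable in this small example, no don't cares remain.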

Fig. 3(c) contains the SG corresponding to the STG of Fig. 2. Each state is labeled with the corresponding input vertex (state code, upper label) and the next-state value for x, y, z (lower label).

It should be obvious that a hypothetical implementation of f as a zero-delay logic function, followed by an unbounded delay on the output, satisfies the STG specification. A transition of t_i can happen in such a circuit exactly when it is enabled in the STG (and correspondingly in the SG).

Chu showed that the next-state function is well-defined (i.e., it has only one value for each point in the domain) if and only if for each pair (s', s) of SG states that have the same label, the same set of output signal transitions is enabled in both markings corresponding to s' and s [10]. If the SG has this characteristic, then we say that the STG from which the SG was derived has the Complete State Coding property (the name CSC is due to [30]).
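The CSC condition can be phrased as a simple check over the SG states. In the sketch below each state is given as a pair (label, set of enabled transitions); this representation and the name `has_csc` are assumptions made for illustration only.

```python
def has_csc(states, outputs):
    """True if states sharing a label enable the same output transitions.

    states  : iterable of (label, enabled_transitions) pairs, one per SG state
    outputs : set of output signal names
    """
    seen = {}
    for label, enabled in states:
        # keep only transitions of output signals, e.g. "z+" for output "z"
        out_enabled = frozenset(t for t in enabled if t[:-1] in outputs)
        if seen.setdefault(label, out_enabled) != out_enabled:
            return False
    return True

# Two states share label "10" but only one of them enables output b.
conflict = [("10", {"b+"}), ("10", {"a-"}), ("11", {"a-"})]
# Two states share label "10" and differ only in enabled input transitions.
harmless = [("10", {"a-"}), ("10", {"c+"})]
```

Here `has_csc(conflict, {"b"})` is false, because the next-state function of b would need two values on vertex 10, whereas `has_csc(harmless, {"b"})` is true, since differences in enabled input transitions do not affect the next-state functions of the outputs.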

The first methods to enforce the CSC property, in the limited case of an interpreted marked graph with only a pair of transitions for each signal, were given by [48] and [53]. Both methods may add state signals and/or new edges to the STG. More recently [22] proposed a complete procedure for general free-choice nets that relies on state minimization, critical race-free state encoding and state signal insertion to produce an STG with CSC. Vanbekbergen et al., on the other hand, describe in [49] a method that works directly at the SG level to satisfy the CSC requirement.

III. STATIC HAZARD ANALYSIS OF A TWO-LEVEL LOGIC CIRCUIT

In this section we analyze when static hazards can occur in a two-level circuit implementation C of a logic function f. The analysis is performed using the unbounded wire-delay model and assuming that an STG specification restricts the way in which input and output signals of C are allowed to change value. Section VIII shows how to extend these results to handle dynamic hazards as well.

As shown in Section II-E, we can synthesize an asynchronous sequential circuit specified by an STG as a set of combinational subcircuits C, one for each output signal t_i of the STG, each having as inputs the set of signals specified by the STG. Each C, in general, has its output signal t_i appearing also as one of its inputs. In this analysis, though, we will treat them as separate signals (i.e., we decompose the sequential circuit into a combinational logic block and a feedback path). We will ensure that the output signal does not have hazards if none of the inputs has hazards.

An input vector is an assignment of binary values to all input signals of C. A sequence of input vectors (also called a transition sequence) is consistent with the STG if there exists a path on the SG such that each vector appears in it as a state label in the same order as in the sequence. A static hazard occurs in the circuit if, when a consistent sequence of input vectors is applied to C with any delay among them, its output changes value when the STG does not allow a transition for it.

Exhaustive simulation of all consistent input vector sequences for all possible delay assignments is clearly not feasible. So we will use three-valued logic analysis, where each variable can assume a value 0, 1 or “-” (for undetermined), as described in [13], by collapsing a whole family of input vector sequences and delay assignments into a single three-valued simulation. For example the three-valued output of a two-input or gate with inputs 1 and “-” is 1, while with inputs 0 and “-” it is “-”.
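The three-valued gate behavior used in the example above can be written down directly. The following sketch encodes the values as the characters "0", "1" and "-"; the function names are illustrative.

```python
def t_not(a):
    """Three-valued inverter: an undetermined input stays undetermined."""
    return {"0": "1", "1": "0", "-": "-"}[a]

def t_and(a, b):
    """Three-valued AND: a controlling 0 decides the output."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "-"

def t_or(a, b):
    """Three-valued OR: a controlling 1 decides the output."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "-"
```

As in the text, `t_or("1", "-")` evaluates to "1" because the stable 1 controls the gate, while `t_or("0", "-")` evaluates to "-" because the output depends on the undetermined input.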

An input cube is a set of assignments of three-valued values to all the input signals of C. An input with a value of 1 or 0 corresponds to a signal whose value is known to be constant from the STG specification during the transition sequence that we are simulating. An input with a value of “-” corresponds to a signal allowed by the STG to have 1 or more transitions during the transition sequence that we are simulating.

With this procedure we can simulate the transition sequence where all the signals with a value of “-” change value in any order at any point in time, under all possible wire delay assignments. In this sense, three-valued simulation is a pessimistic approximation of the real circuit behavior, because it detects which hazards might happen with any delay assignment, even impossible ones. So we will need to refine this analysis once an implementation for the circuit has been derived and more precise delay estimates are available.

We define a path π on the SG, with initial and final states s' (labeled v') and s (labeled v), to be valid with respect to the logic function f of signal t_i if

1) f(v') = f(v) (since we are analyzing static hazards), and
2) π contains at most one edge labeled with a transition of t_i.

Each valid path is associated with a transition cube, where all the signals that change value from s' to s are undetermined, while the other signals have the value they have in v' (and also in v, of course).

Eichelberger [13] proved that a static hazard condition exists for a gate-level implementation, with the unbounded wire-delay model, if and only if the three-valued simulation of the transition cube corresponding to a valid path gives an undetermined output value.

The following procedure performs the three-valued simulation for all valid paths of a two-level circuit, directly implementing an on-set cover F of a logic function f. Let v_i be the value of input signal t_i in the input vector v. For example if v = 100 then v_1 = 1, v_2 = 0, v_3 = 0.

Procedure 3.1:
1) for each valid path π, associated with the state pair (s', s) (labeled v' and v, respectively):
a) determine the corresponding transition cube c as follows: for each input signal t_i:
b) if π contains an edge labeled with a transition for t_i, then let c_i = - else let c_i = v_i'.
c) simulate the circuit under c.
d) if c does not intersect any cube c' ∈ F, then the output of each cube c', and of the circuit, is 0.
else if c is covered by some cube c' ∈ F, then the output of that particular c', and of the circuit, is 1.
otherwise (c intersects some c' without being covered by any single one of them), the output of those c', and of the circuit, is -. Then a hazard condition exists for state pair (s', s).

For example, consider a subcircuit synthesized from the STG in Fig. 2 [the corresponding SG appears in Fig. 3(b)] to implement output signal z. Path 100 → 101 → 001, with associated state pair (100, 001), is valid for signal z. Its corresponding transition cube is -0-. Path 101 → 111 → 011, with associated state pair (101, 011), is not valid, because the value of the next-state function of z is 1 and 0, respectively.
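The case analysis in step d) of Procedure 3.1 only needs cube intersection and containment tests. The sketch below represents cubes as strings over "0", "1" and "-" (signals ordered as x, y, z); the cover x + y'z used in the usage note is an assumed two-level implementation of the next-state function of z, chosen for illustration and not taken from the figures.

```python
SIGNALS = ["x", "y", "z"]

def intersects(c, d):
    """Two cubes intersect unless some signal is 0 in one and 1 in the other."""
    return all(a == "-" or b == "-" or a == b for a, b in zip(c, d))

def covers(big, small):
    """`big` covers `small` if every literal of `big` is matched in `small`."""
    return all(a == "-" or a == b for a, b in zip(big, small))

def transition_cube(v_start, changing):
    """Signals that change along the path become '-', the others keep v'."""
    return "".join("-" if s in changing else bit
                   for s, bit in zip(SIGNALS, v_start))

def simulate(cover, c):
    """Three-valued output of a two-level implementation of `cover` under c."""
    hit = [d for d in cover if intersects(c, d)]
    if not hit:
        return "0"
    if any(covers(d, c) for d in hit):
        return "1"
    return "-"          # hazard condition for the associated state pair

F_z = ["1--", "-01"]    # assumed cover: x + y'z
```

With this assumed cover, the valid path 100 → 101 → 001 gives `transition_cube("100", {"x", "z"}) == "-0-"` and `simulate(F_z, "-0-")` returns "-", i.e., the pessimistic three-valued analysis reports a potential static hazard there, whereas the cube 1-- (y and z changing from 100) is covered by the single cube x and simulates to 1.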

IV. NEXT-STATE FUNCTION COVER DERIVATION FROM A SIGNAL TRANSITION GRAPH

One of the main problems in asynchronous circuit synthesis is to ensure that the circuit behavior is correct for each possible ordering of concurrent transitions. In the example of Fig. 2, since nothing is said about the ordering of z+ and y+, then no output signal may have static hazards, regardless of their firing order. So we must ensure, remembering the analysis in Section III, that the on-set and off-set covers F and R that we synthesize for the next-state function of output t_i have the following property:

Property 4.1: Let S be a live STG⁵ with CSC, let F be an on-set cover, let R be an off-set cover of the next-state function of signal t_i, synthesized from S, and such that the intersection of F and R is empty.⁶

Then for each reachable marking m of S and for each set T of transitions concurrently enabled in m such that the next state value for t_i does not change during any sequence of firings in T:

1) if t_i must be 1, then there exists at least one cube c' ∈ F such that:
a. c' covers the vertex corresponding to marking m and
b. no signal whose transition is in T appears in c'.
2) otherwise, if t_i must be 0, then there exists at least one cube c ∈ R such that:
a. c covers the vertex corresponding to m and
b. no signal whose transition is in T appears in c.

⁵ We require S to be live in order to be able to associate each marking m of S with a vertex in the domain of the next-state function of each signal t_i in S.

⁶ In general a valid pair of on-set and off-set covers of a given incompletely specified function may have a nonempty intersection on dc-set vertices.

Case 1 guarantees that the output of F, if so required, remains at 1 independent of the firing order in T. Case 2 guarantees that the output of F, if so required, remains at 0 independent of the firing order in T, even though it is stated in terms of the off-set cover R. This is because the distance of c_i ∈ R from any cube c_j ∈ F is greater than 0, so c_j conflicts with c_i on at least one signal that appears in c_i, and no signal whose transition is in T appears in c_i. So we can be sure that each c_j ∈ F will evaluate to 0 in the vertex corresponding to m and will keep evaluating to 0 while the transitions in T fire.

For example, the set of concurrently enabled transitions in the marking shown in Fig. 2, corresponding to vertex 100 in Fig. 3(b), is T = {y+, z+}. If one of the cubes in the on-set cover of the next-state function for z, which must be a constant 1 independent of the firing order in S, is exactly x, then the value of the next-state function for signal z will remain consistently at 1 regardless of the firing order of y+ and z+.

In this way we are able to guarantee that the circuit implementation does not have hazards under any valid path corresponding to the firing of concurrent transitions in the STG specification.

The following procedure derives an on-set cover F and an off-set cover R for the next-state function f of each output signal t_i, receiving as input a live STG S with the CSC property. Let v be a vector of values for the n signals that appear in S, v ∈ {0,1}^n, and let v_j denote the value of signal t_j in v. Initially, F = ∅, R = ∅, m is the initial marking of S, and v is its label.

Procedure 4.1:
1) For each marking (SG state) m, labeled v, do:
a) For each maximal subset T of transitions concurrently enabled in m such that t_i* is not enabled in the marking m' obtained from m firing all transitions in T do:
i) let c = {v_j : t_j* ∉ T}.
ii) if f(v) = 1 then let F = F ∪ {c}, otherwise let R = R ∪ {c}.

This procedure finds for each state a collection of maximal sets of on-set or off-set vertices that have the same next-state function value, and can be covered by a single cube. Note that T can include a transition of t_i itself.

F and R are constructed exactly to satisfy Property 4.1. This guarantees that the next-state function remains constant whenever the STG specifies so, independent of the firing order of each set of concurrently enabled transitions.

We can show that F and R are on-set and off-set covers of the next-state function f, as obtained in Section II-E.

Theorem 4.1: Let f be the incompletely specified next-state function of signal t_i, obtained from a live STG S with the CSC property using Procedures 2.1, 2.2 and 2.3. Let F and R be the covers obtained from S using Procedure 4.1. Then F and R are valid on-set and off-set covers of f, that is every on-set vertex and no off-set vertex of f is covered by a cube of F and every off-set vertex and no on-set vertex of f is covered by a cube of R, and the intersection of F and R is empty.

Proof: We have the following cases:

1) Some transitions enabled in the current marking m lead directly to a marking m' where no transition of t_i is enabled. Then we generate a set of cubes such that the vertex v corresponding to m is covered. Moreover all the cubes belong either to the on-set or to the off-set according to whether v_i is 1 or 0. All covered vertices correspond to markings where the next state value of t_i is the same, so if the STG has the CSC property then no vertex where f must have a different value can be covered. Furthermore the intersection of F and R is empty, because all covered vertices can be reached by some firing sequence of the concurrent transitions, so no generated cube covers any dc-set vertex.

2) Every transition firing from m directly reaches a marking m' where a transition of t_i is enabled. We will show that in this case no transition of signal t_i can be enabled in m. Suppose that t_i* is enabled in m and let m' be the marking reached from m firing t_i*. Then
a) t_i* could not be enabled again in m', since the STG is live (otherwise two rising or falling transitions of t_i could fire in sequence).
b) Assume, for the sake of contradiction, that the complementary transition of t_i* is enabled in m'. Let m'' be the marking (obviously with the same label as m) reached from m' by firing it. We have the following cases:
i. If t_i* was not enabled in m'', then we would have a contradiction, because m and m'' would have the same label but different enabled output signals, so the STG would not have the CSC property.
ii. The same reasoning applies if t_i* was enabled in m'' and there existed some marking, reachable from m'' by firing only transitions of t_i, where no transition of t_i is enabled.
iii. Otherwise, signal t_i can have an unbounded number of transitions without any other signal changing in between:
• either the STG is simply a cycle with two transitions, t_i+ → t_i- → t_i+, that can be shown by inspection to satisfy the theorem,
• or t_i* is enabled by some transition t_j* of some other signal. Then if t_j* fires again (it can do so by liveness) without t_i* firing (it can do so by assumption), the place between them can be marked twice, thus contradicting the safeness hypothesis.

We also know that the STG is live, so there exists some marking m' predecessor of m under some transition t_j*. Then whenever the procedure reaches m', it generates a cube covering also the vertex corresponding to m, because m is obtained from m' by firing a transition that does not enable t_i*.
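To make Procedure 4.1 and the two cases of the proof concrete, the following sketch replays the procedure for output z on an assumed marked-graph reconstruction of the Fig. 2 STG. Several simplifications are assumptions of this sketch rather than statements of the paper: the net is a marked graph (so any set of enabled transitions is concurrently enabled), signal names are single characters, only nonempty sets T are considered, and all helper names are hypothetical.

```python
from itertools import combinations

SIGNALS = ["x", "y", "z"]
PLACES = [("x+", "z+"), ("z+", "x-"), ("x-", "z-"), ("z-", "y-"),
          ("y-", "x+"), ("x+", "y+"), ("y+", "z-")]
TRANSITIONS = ["x+", "x-", "y+", "y-", "z+", "z-"]
M0 = tuple(1 if p == ("y-", "x+") else 0 for p in PLACES)  # assumed marking

def enabled(m, t):
    return all(m[i] >= 1 for i, (_, dst) in enumerate(PLACES) if dst == t)

def fire(m, t):
    m = list(m)
    for i, (src, dst) in enumerate(PLACES):
        if dst == t:
            m[i] -= 1
        if src == t:
            m[i] += 1
    return tuple(m)

def reachable(m0, v0):
    """Map each reachable marking to its state code (exhaustive simulation)."""
    codes, stack = {m0: v0}, [m0]
    while stack:
        m = stack.pop()
        for t in TRANSITIONS:
            if enabled(m, t):
                i = SIGNALS.index(t[0])
                bit = "1" if t[1] == "+" else "0"
                m2 = fire(m, t)
                if m2 not in codes:
                    codes[m2] = codes[m][:i] + bit + codes[m][i + 1:]
                    stack.append(m2)
    return codes

def covers_for(signal, m0, v0):
    """On-set cover F and off-set cover R of the next-state function."""
    F, R = set(), set()
    for m, v in reachable(m0, v0).items():
        en = [t for t in TRANSITIONS if enabled(m, t)]
        if signal + "+" in en:
            f = 1
        elif signal + "-" in en:
            f = 0
        else:
            f = int(v[SIGNALS.index(signal)])
        maximal = []
        for k in range(len(en), 0, -1):          # largest subsets first
            for T in combinations(en, k):
                m2 = m
                for t in T:
                    m2 = fire(m2, t)
                quiet = not any(enabled(m2, signal + d) for d in "+-")
                if quiet and not any(set(T) < set(U) for U in maximal):
                    maximal.append(T)
        for T in maximal:
            moving = {t[0] for t in T}           # signals dropped from the cube
            cube = "".join("-" if s in moving else b
                           for s, b in zip(SIGNALS, v))
            (F if f else R).add(cube)
    return F, R

F, R = covers_for("z", M0, "000")
```

Under these assumptions the sketch yields F = {1--, 11-, 1-1, -01} and R = {0-0, 01-}. States 000, 111 and 001 generate no cube of their own (case 2 of the proof): every transition enabled there leads to a marking that enables a transition of z, and their vertices are covered by cubes generated in predecessor states.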

Fig. 5 contains an STG fragment (a) and the corresponding SG fragment (b) to illustrate case 1. Let o be the signal for which we are generating cover cubes in marking m (black dots in the STG fragment). Black dots in the SG represent on-set vertices of f, white dots represent off-set vertices.

1) The vector corresponding to marking m is: a = 0, b = 0, c = 1, d = 1, e = 1, o = 1.
2) The sets of transitions that can fire without enabling o- are: S = {a+, b+}, S' = {a+, c-} and S'' = {b+, c-}.
3) The generated cubes are: c = cdeo, c' = b̄deo and c'' = ādeo, respectively.
4) Each cube covers vertex āb̄cdeo, corresponding to m, and belongs to the on-set cover. So the on-set vertex āb̄cdeo of f is covered.
5) Every cube covers only vertices where o- is not enabled, so no off-set vertex (such as a b d e o) is improperly covered.

Fig. 6 contains an STG fragment (a) and the corresponding SG fragment (b) to illustrate case 2. Let o be the signal for which we are generating cover cubes in marking m (black dots in the STG fragment). The circle represents a multi-successor place. Black dots in the SG represent on-set vertices of f, white dots represent off-set vertices.

1) The only two transitions that can fire in m are either a+ or c- (not both, since this is a multi-successor place marked with one token, so it enables only one successor transition). Both enable o-.
2) One example of a marking m' predecessor of m is obtained by replacing the token on b+ → o- with one token on each fanin arc of b+.
3) One of the cubes generated in m' is āco, so the on-set vertex ābco corresponding to m is covered.
4) If one of the enabled transitions in m had been either o+ or o-, instead of a+ or c-, then it is clear that either the STG would not have had the CSC property (o+ followed by o-, Fig. 6(c)), or it would not have been live (o- followed by o-, Fig. 6(d)).

Various authors [18],[51] pointed out that the assumption to start from a live STG can be unnecessarily restrictive, as there are some useful asynchronous circuit behaviors that cannot be described with live safe free-choice Petri nets. The reader can check that Procedure 4.1 and the proof of Theorem 4.1 rely only on a more relaxed set of assumptions. Namely we only need that all of the following conditions are satisfied:

1) The State Graph associated with the STG is finite and has a consistent labeling with vectors of signal values. Obviously Procedure 2.2 in this case must be replaced with an exhaustive reachability analysis.

2) The STG has Complete State Coding.
3) The SG is output-persistent [52], i.e., no output signal transition can be disabled by some other transition firing. Otherwise we would have meta-stability problems that the proposed methodology cannot handle directly.

Fig. 5. Illustration of Theorem 4.1, case 1.

Fig. 6. Illustration of Theorem 4.1, case 2.

V. CIRCUIT IMPLEMENTATION OF THE NEXT-STATE FUNCTION

Once we have obtained an on-set and an off-set cover of the next-state function f for each output signal, we can choose how to implement the feedback loop (sequential part), and apply known logic synthesis techniques in order to obtain a minimal implementation of the combinational part.

The next two sections describe how we can implement and optimize the initial circuit while preserving the hazard properties. We first establish what logic synthesis operations can be safely used, and then we outline how to use them in an optimized synthesis procedure.

A. Logic Synthesis for Minimum Two-Level Implementation

In general we have the choice between implementing the on-set or the off-set cover of each signal (inverting the output in the latter case). In the following we will discuss only the on-set cover, but the results apply also to the off-set cover implementation.

We want to obtain an implementation that is minimal with respect to some cost function, usually a combination of delay, area and testability.

Prime and irredundant covers are very important from an implementation point of view, because:

1) A two-level implementation of a logic function obtained from a prime and irredundant cover is fully testable for single stuck-at faults [19].

2) A prime and irredundant cover is a good starting point for multilevel logic synthesis systems [6].

On the other hand we want to preserve Property 4.1, because it is tightly connected with the hazard properties of the implemented circuit.

This means that we can expand each cube in the cover F against R to a prime implicant of f , because this does not introduce additional dependencies of the cube on signals that may change when f must remain constant 1. Moreover, it does not remove all dependencies on signals that remain stable when f must remain constant 0.

Unfortunately we cannot always remove redundant cubes, because we must guarantee that each cube in the original on-set cover is covered by some prime. But we can at least set up a minimum covering problem similar to the classical Quine-McCluskey minimization procedure [25], where each cube c of the original cover (rather than each on-set vertex of f, as in [25]) must be covered by at least one prime implicant in the final cover.

The resulting two-level logic minimization procedure is as follows. Let F and R be the on-set and off-set cover, respectively, of the next state function f of an output signal ti of an STG, produced by Procedure 4.1.

Procedure 5.1:
1) Let P be the set of all prime implicants of the incompletely specified logic function with on-set vertices covered by F and off-set vertices covered by R.
2) Let F' be a set of cubes of P such that:
a) every cube in F is covered by at least one cube in F' and
b) F' has minimum cardinality.

The output F' of this Procedure is a prime but potentially redundant cover of f that is guaranteed to have the minimum number of cubes among those that ensure no hazards due to the firing order of concurrent transitions.
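For very small functions the two steps of Procedure 5.1 can be carried out by brute force, which is enough to illustrate the covering constraint. The sketch below enumerates all cubes over n signals to find the primes and then tries prime subsets of increasing size; this exhaustive search, the helper names, and the example covers F and R (an assumed result of Procedure 4.1 for signal z of Fig. 2) are illustrative assumptions, not the minimization algorithm used in practice.

```python
from itertools import combinations, product

def intersects(c, d):
    """Cubes intersect unless some signal is 0 in one and 1 in the other."""
    return all(a == "-" or b == "-" or a == b for a, b in zip(c, d))

def contains(big, small):
    """True if every vertex of `small` is also a vertex of `big`."""
    return all(a == "-" or a == b for a, b in zip(big, small))

def primes(R, n):
    """Maximal cubes that do not intersect the off-set cover R (step 1)."""
    implicants = []
    for c in product("01-", repeat=n):
        cube = "".join(c)
        if not any(intersects(cube, r) for r in R):
            implicants.append(cube)
    return [c for c in implicants
            if not any(c != d and contains(d, c) for d in implicants)]

def min_cover(F, R):
    """Fewest primes such that each cube of F lies inside one of them (step 2)."""
    n = len(next(iter(F)))
    P = primes(R, n)
    for k in range(1, len(P) + 1):
        for choice in combinations(P, k):
            if all(any(contains(p, c) for p in choice) for c in F):
                return set(choice)
    return None

F = {"1--", "11-", "1-1", "-01"}   # assumed on-set cover (signals x, y, z)
R = {"0-0", "01-"}                 # assumed off-set cover
F_min = min_cover(F, R)
```

For these assumed covers the only primes are 1-- and -01, and both are needed, so `F_min` is {1--, -01}, i.e., x + y'z. The constraint is deliberately stated on the cubes of F rather than on individual on-set vertices, which is what distinguishes this covering problem from the classical one.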

Note that the two-level synthesis methodology described in [30] is potentially faster at the expense of optimality, as it performs only a single expansion of the cubes in F, removing only duplicated cubes.

B. Logic Synthesis and Hazards

We shall now prove that for every multilevel combinational circuit M derived from a two-level circuit T by applying the associative and the distributive law, or adding an inverter to any output of the circuit, for every assignment of delays to the wires in M, there exists an assignment of wire delays in T such that the behavior of M and T is the same (modulo a change in polarity if an inverter is added, of course). Note that the given results are valid and significant only in the unbounded wire delay case.

So if we analyze, as in Section III, what classes of input changes may cause a hazard in T under some particular combination of wire delays, then we can check only those input changes and the corresponding wire delays in order to examine all possible hazard causes in M. Furthermore if we derive, as in Section IV, a two-level implementation T of a logic function f that does not exhibit hazards for some class of input changes, then we can use multilevel synthesis techniques, constrained to use only the transformations listed above, to obtain a multilevel implementation of f that has the same hazard properties.

A similar result was proved in [47], with the restrictive assumption that a new set of input values can be applied to the circuit only when it is stable (fundamental mode assumption). This assumption cannot be used for circuits synthesized from an STG, so a new proof strategy is required which is valid for the more general case.

We will find it much easier to prove our results relative to a leaf-DAG representation of a combinational circuit rather than the original circuit. To do so we will introduce a couple of lemmas that show that given an arbitrary combinational circuit, we can construct a leaf-DAG circuit that has the same temporal behavior at the primary outputs. These lemmas are concerned only with the existence of delay assignments such that behavioral equality holds, not with their physical realizability due to, e.g., very large fanout loads for primary input nodes.

Lemma 5.1: Let M be a specific instance of a combinational circuit family. Let M' be a combinational circuit derived from M by duplicating every gate that has more than one fanout (copying the delay assignment of every duplicated wire), until every gate in the resulting circuit has exactly one fanout.

Then the primary outputs of M and M' have exactly the same behavior in time.

Note that we do not make any hypothesis on how and when the primary inputs can change.

Proof: We will proceed by induction on the distance d(w) of each node w in M from the primary inputs.

Base case, d(w) = 1: all the fanin nodes of w are primary inputs. Then all the corresponding duplicated gates $w_i$ in M' have exactly the same fanin set and exactly the same delay assignment, so their temporal behavior reproduces the behavior of w exactly.

Induction step: let us assume that each gate u in M such that d(u) < d(w) has exactly the same temporal behavior as its duplicates $u_j$ in M'. Then all the duplicates $w_i$ have all their inputs with exactly the same behavior as the inputs of w, and the same edge delay assignment. So their temporal behavior is the same as that of w.
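The duplication step of Lemma 5.1 can be illustrated with a small sketch. It assumes a hypothetical dictionary encoding of the circuit (`FANINS` maps each gate to a list of `(driver, wire_delay)` pairs; every name here is ours, not the paper's). The code unfolds a DAG into a tree of single-fanout gates and checks that the multiset of input-to-output path delays is unchanged, which is a consequence of the lemma rather than a proof of it.

```python
from collections import Counter

# Hypothetical encoding: gate -> list of (driver, wire_delay).
# Names missing from the dictionary are primary inputs.
FANINS = {
    "g1": [("a", 1.0), ("b", 2.0)],
    "g2": [("g1", 0.5), ("c", 1.5)],
    "g3": [("g1", 0.7), ("a", 0.2)],
    "out": [("g2", 0.3), ("g3", 0.4)],
}

def unfold(node, fanins):
    """Return a tree copy of the cone of `node`: (name, [(subtree, delay), ...]).

    A gate with several fanouts is duplicated once per fanout, and each
    duplicated wire keeps its delay, as in Lemma 5.1.
    """
    if node not in fanins:          # primary input: may keep multiple fanout
        return (node, [])
    return (node, [(unfold(d, fanins), w) for d, w in fanins[node]])

def path_delays_dag(node, fanins):
    """Multiset of (primary input, total delay) over all paths in the DAG."""
    if node not in fanins:
        return Counter({(node, 0.0): 1})
    result = Counter()
    for driver, w in fanins[node]:
        for (pi, d), k in path_delays_dag(driver, fanins).items():
            result[(pi, round(d + w, 9))] += k
    return result

def path_delays_tree(tree):
    """Same multiset, computed on the unfolded tree."""
    name, children = tree
    if not children:
        return Counter({(name, 0.0): 1})
    result = Counter()
    for sub, w in children:
        for (pi, d), k in path_delays_tree(sub).items():
            result[(pi, round(d + w, 9))] += k
    return result

def gate_copies(tree):
    """Count the gate instances in the unfolded tree."""
    _, children = tree
    return (1 if children else 0) + sum(gate_copies(s) for s, _ in children)

TREE = unfold("out", FANINS)
```

In this example `g1` has two fanouts, so the unfolded circuit contains two copies of it; the path-delay multisets of the two circuits coincide.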


Lemma 5.2: Let M be a specific instance of a combinational circuit family, such that no gate has more than one fanout. Let M' be a combinational circuit derived from M by recursively moving the delays back up to the edges that fan out of primary inputs. That is, for each node v, in order of increasing distance from the primary outputs, let e(v) be the delay associated with its single fanout edge, assign zero to the fanout delay of v, and add e(v) to each fanin delay of v.

Then the primary outputs of M and M' have exactly the same behavior in time.

Proof: Pure delay, being a translation in time, can be moved from the fanout edge of a single-fanout node of a combinational circuit to each of its fanin edges without changing its behavior. Then each step of the procedure transforming M into M' preserves the temporal behavior of each node.
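The delay push-back of Lemma 5.2 can be sketched as follows. This is a minimal illustration under our own assumptions: the single-fanout circuit is encoded as a nested tuple `(name, [(subtree, delay), ...])`, a node without children is a primary input, and the primary output itself carries no fanout edge. None of these names come from the paper.

```python
def push_back(tree, carry=0.0):
    """Move every delay onto the edges that fan out of primary inputs.

    `carry` is the delay inherited from the node's own fanout edge, which
    the caller has already reset to zero.
    """
    name, children = tree
    new_children = []
    for sub, w in children:
        total = w + carry
        if not sub[1]:                       # edge driven by a primary input
            new_children.append((sub, total))
        else:                                # internal edge: zero it, pass the delay toward the inputs
            new_children.append((push_back(sub, total), 0.0))
    return (name, new_children)

def leaf_delays(tree, acc=0.0):
    """List of (primary input, accumulated path delay) in left-to-right order."""
    name, children = tree
    if not children:
        return [(name, acc)]
    out = []
    for sub, w in children:
        out.extend(leaf_delays(sub, acc + w))
    return out

def internal_delays(tree):
    """Delays on edges whose driver is a gate rather than a primary input."""
    _, children = tree
    out = []
    for sub, w in children:
        if sub[1]:
            out.append(w)
            out.extend(internal_delays(sub))
    return out

# Single-fanout circuit: out = g2(g1(a, b), c), with arbitrary wire delays.
M = ("out", [(("g2", [(("g1", [(("a", []), 1.0), (("b", []), 2.0)]), 0.5),
                      (("c", []), 1.5)]), 0.25)])
M_PRIME = push_back(M)
```

Every input-to-output path keeps its total delay, while all internal edges of `M_PRIME` end up with zero delay.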

Theorem 5.1: Let $\mathcal{T}$ be a two-level combinational circuit family, such that only primary inputs can have multiple fanout and only edges that fan out from primary inputs have a nonzero (unbounded) delay. Let $\mathcal{M}$ be a multilevel combinational circuit family derived from $\mathcal{T}$ by applying the distributive and associative laws, and/or by adding an inverter to some primary output, with unbounded delays on every wire.

Then for each instance M of $\mathcal{M}$ there exists an instance T of $\mathcal{T}$ whose primary outputs have exactly the same behavior in time (modulo complementation of value if an inverter is added).

Proof: We will give a constructive method to derive T given $\mathcal{T}$, $\mathcal{M}$ and M, and prove that each step preserves temporal behavior.

Given M, transform it into M' where each gate has exactly one fanout, using Lemma 5.1. Then transform M' into M'' where only edges that fan out of primary inputs have a nonzero delay, using Lemma 5.2.

Now apply the inverse transformations of those used to obtain M from $\mathcal{T}$, in order to derive a two-level circuit T. No transformation changes the primary output temporal behavior, because we have nonzero delay labels only on edges that fan out of primary inputs, and:

- Removing an inverter driving a primary output changes only the polarity of the primary output itself, not its temporal behavior.
- Inverse application of the associative laws:

$$a \cdot (b \cdot c) \rightarrow a \cdot b \cdot c$$

$$a + (b + c) \rightarrow a + b + c$$

merges two gates into one. Direct application of the associative law to derive M from $\mathcal{T}$ can never duplicate fanins, so these gates did not have any common fanin node. Then the edges and their weights are simply copied to the merged node, and the temporal behavior is obviously not changed.
- Inverse application of the distributive laws:

$$a \cdot (b + c) \rightarrow (a \cdot b) + (a \cdot c)$$

$$a + (b \cdot c) \rightarrow (a + b) \cdot (a + c)$$

transforms two nodes into three nodes. Now we must duplicate the delay label on the edge between node a and the AND (OR) nodes, and again the temporal behavior does not change.
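As a sanity check of the four rewriting rules used in the proof, the sketch below verifies by exhaustive enumeration that both sides of each rule compute the same Boolean function. It only checks the delay-free functions; the timing argument is the one given in the proof. The helper name `same_function` is ours.

```python
from itertools import product

def same_function(lhs, rhs, n=3):
    """True if two n-input Boolean functions agree on every input vector."""
    return all(lhs(*v) == rhs(*v) for v in product((False, True), repeat=n))

# Associative laws (inverse application merges two gates into one).
ASSOC_AND = same_function(lambda a, b, c: a and (b and c),
                          lambda a, b, c: a and b and c)
ASSOC_OR = same_function(lambda a, b, c: a or (b or c),
                         lambda a, b, c: a or b or c)

# Distributive laws (inverse application turns two nodes into three).
DISTR_AND = same_function(lambda a, b, c: a and (b or c),
                          lambda a, b, c: (a and b) or (a and c))
DISTR_OR = same_function(lambda a, b, c: a or (b and c),
                         lambda a, b, c: (a or b) and (a or c))
```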

Notice that, since M was obtained from $\mathcal{T}$ in a very restricted way, its gates can only be inverters, ANDs and ORs. But we can use the following procedure [11] to replace some subcircuit of M with a single gate that computes the same function, if convenient, and still be able to apply Theorem 5.1. Let M be a specific instance of a combinational circuit family $\mathcal{M}$.

Procedure 5.2:

1) Replace each node v in M and the set of fanout edges (v, u) with:
   either
   - a node computing the same function as v,
   - an inverter $i_v$ with fanin v,
   - an inverter $i_{(v,u)}$ with fanin $i_v$ for each edge (v, u);
   or
   - a node computing the complement of the function of v,
   - an inverter $i_{(v,u)}$ with fanin v for each edge (v, u).
   In both cases, let the edge between each $i_{(v,u)}$ and each u inherit the delay label of (v, u), and let the other added edges be labeled with zero.
2) Replace each node v in the modified circuit with a two-level prime and irredundant subcircuit implementing the same node function. Let each edge from a node u to some AND or inverter gate inherit the same delay value as the connection from u to v, and let all other edges connecting inverters to ANDs and ANDs to ORs have a zero delay.

Theorem 5.2: Let M be a specific instance of a combinational circuit family. Let M' be a combinational circuit derived from M by applying Procedure 5.2.

Then the primary outputs of M and M' have exactly the same behavior in time.

Proof: The delay-free two-level subcircuit acts as an instantaneous Boolean evaluator, so it replaces each node v without affecting the behavior of the primary outputs.

Similarly the insertion of inverters transforms a single node into a delay-free evaluator for the same function, and a set of edges with exactly the same delay as before.

The theorems developed in this section are necessary in order to be able to implement the two-level circuit produced by Procedure 5.1 in an arbitrary technology, possibly improving the area and/or delay performance of the circuit. We only assume that the technology is complete (i.e., that it allows any Boolean function to be implemented using an interconnection of gates). So the methodology can be applied, e.g., to semi-custom standard cell or gate array design, as well as full-custom design. We can use some multilevel logic synthesis techniques, such as those described in [6] and [11], restricted to the transformations listed in Theorems 5.1 and 5.2. Algebraic factorization, for example, is a direct application of associative and distributive laws, so it is covered by Theorem 5.1, while tree-based technology mapping is covered by Theorem 5.2.

C. Local Feedback Loop Implementation

The local feedback loop, occurring when the next-state function of a signal essentially depends on the signal itself, can always be implemented using a simple flip-flop. This is shown in the following theorem, first proved by [29] in the restricted case when the STG is persistent.

Theorem 5.3: Let S be a live STG with the CSC property. Let F' be an on-set cover of the next-state function f of signal $t_i$ derived from S according to Procedures 4.1 and 5.1.

Then F' is positive unate in $t_i$.

An intuitive reason for this is that if f is binate in $t_i$, then there is a set of input values for which $t_i$ oscillates. And, if f is unate, every prime cover of f must be unate.

Proof: Let us assume, for the sake of contradiction, that F' is binate or negative unate in $t_i$.

Then there exists at least one vertex $v' = \bar{t_i}\beta$, where $\beta \in \{0,1\}^{n-1}$, belonging to the on-set of f, whose corresponding vertex $v = t_i\beta$ belongs to the off-set of f (otherwise Procedure 5.1 could cover both v' and v with a prime implicant not depending on $t_i$).

The value of f in vertex v' is the complement of the value of $t_i$ in v', so $t_i^+$ is enabled in the marking m' corresponding to v'. The marking obtained by firing $t_i^+$ corresponds exactly to v, since v differs from v' only in the value of $t_i$.

Similarly the value of f in vertex v is the complement of the value of $t_i$ in v, so $t_i^-$ is enabled in the marking m corresponding to v.

But m is obtained from m' through the firing of $t_i^+$, so a firing of $t_i^+$ would immediately enable $t_i^-$. The same argument as in case 2 of Theorem 4.1 can be used to show that this case is either trivial or impossible.

A function with a positive unate cover is positive unate, hence:

Corollary 5.1: Let S be a live STG with the CSC property. Let f be the next-state function of signal $t_i$ derived from S according to Procedures 4.1 and 5.1.

Then f is positive unate in $t_i$.

If f is positive unate in $t_i$, then there exist two logic functions s and m that do not depend on $t_i$, such that $f = s + t_i m$. So we can partition the subcircuit into two purely combinational parts, s and m, and an SM flip-flop [4]. The flip-flop has logic equation $Q = s + qm$, where q and Q represent the present and next values of the flip-flop state variable, respectively, and s and m are its set and memory inputs, respectively. This decomposition uses only the distributive property, so according to Theorem 5.1 it does not change the hazard properties of the circuit. Note that the given logic equation implies the use of a set-dominant SM flip-flop.

We can also use Theorem 5.1, adding an inverter to m, to implement the feedback loop with the more usual set-dominant SR flip-flop, with logic equation $Q = s + q\bar{r}$, without changing the hazard properties of the circuit.

Similarly we can use a Muller C element, with logic equation $Q = s\bar{r} + qs + q\bar{r}$, if the cover F' can be factored according to the given equation. A C element implementation may be desirable, because it may reduce the number of potential hazards that manifest themselves in the final circuit [2]. In this case, Procedure 4.1 must be changed to generate an on-set cover F' with cubes only for SG states where a rising transition of the given output signal $t_i$ is enabled, and an off-set cover R' with cubes only for SG states where a falling transition of $t_i$ is enabled. Then a valid next-state function implementation for signal $t_i$ can be obtained by connecting F' and R' to the s and r inputs of the C element, respectively. Output Q, implementing signal $t_i$, rises when a rising transition of $t_i$ is enabled and no falling transition of $t_i$ is enabled, and it falls when a falling transition of $t_i$ is enabled and no rising transition of $t_i$ is enabled. The hazard analysis procedure described in the next section also requires some minor changes that are left as an exercise to the reader.
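The decomposition $f = s + t_i m$ and the three storage-element equations can be checked on a toy example. The sketch below is only illustrative: the next-state function `f` is a made-up example, and taking s and m as the cofactors of f with respect to the fed-back signal is one standard choice for a positive unate function, assumed here rather than taken from the paper.

```python
from itertools import product

def cofactor(f, index, value):
    """Restrict input `index` of f to `value`; the result ignores that input."""
    def g(*v):
        w = list(v)
        w[index] = value
        return f(*w)
    return g

def is_positive_unate(f, index, n):
    """f never falls when input `index` rises, all other inputs held fixed."""
    f0, f1 = cofactor(f, index, 0), cofactor(f, index, 1)
    return all(f0(*v) <= f1(*v) for v in product((0, 1), repeat=n))

def sm_next(s, m, q):      # set-dominant SM flip-flop: Q = s + q m
    return s | (q & m)

def sr_next(s, r, q):      # set-dominant SR flip-flop: Q = s + q r'
    return s | (q & (1 - r))

def c_next(s, r, q):       # Muller C element: Q = s r' + q s + q r'
    return (s & (1 - r)) | (q & s) | (q & (1 - r))

# Hypothetical next-state function of a signal t with inputs (a, b, t).
def f(a, b, t):
    return ((1 - a) & b) | (t & a)

N, T = 3, 2                # three inputs; t is the input at position 2
S = cofactor(f, T, 0)      # s = f with t = 0
M = cofactor(f, T, 1)      # m = f with t = 1

# f = s + t m on every input vector.
DECOMP_OK = all(f(*v) == sm_next(S(*v), M(*v), v[T])
                for v in product((0, 1), repeat=N))
# An SR flip-flop driven with r = m' behaves like the SM flip-flop.
SR_OK = all(sr_next(s, 1 - m, q) == sm_next(s, m, q)
            for s, m, q in product((0, 1), repeat=3))
# The C element equation is the majority of s, r' and q.
C_IS_MAJORITY = all(c_next(s, r, q) == int(s + (1 - r) + q >= 2)
                    for s, r, q in product((0, 1), repeat=3))
```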

We assume in the following that the flip-flop is ready to accept a new transition of its inputs (i.e., that the internal feedback loop is in a stable state) whenever the output Q makes a transition. This means that the delay of the internal feedback loop must be smaller than the delay of any other circuit path from flip-flop output to one of its inputs. The assumption can easily be justified with a careful layout of the flip-flop itself.

D. Hazards and Redundancy

As we informally claimed in Section V-A, we cannot in general remove redundant cubes from the initial two-level implementation of an STG. Fig. 7(a) contains an example of a live STG with the CSC property whose two-level implementation of the flip-flop excitation function m according to Procedures 4.1 and 5.1 is redundant. Fig. 7(b) contains the SG (input variables are ordered a, b, c, t), while Fig. 7(c) contains the Karnaugh map of the function m.

The initial cover of f is: f = abt + ai3 + ab + act + bZt. The cover of s is then: s = Tib while the cover of m is: m = ab + aF + bi3 + Ec where the implicant a? is redundant (it is shown by the dashed oval on the Karnaugh map, while nonredundant implicants are shown by dotted ovals). If the redundant implicant is removed from the cover, then a hazard can occur when the circled b- transition fires, because the implicant bE could go to 0 before the implicant ab goes to 1. This causes a static 1-hazard, and possibly a malfunction in the circuit, since the SM flip-flop can be set incorrectly due to this hazard.

A more realistic example of the same kind of problem can be found in Fig. 8 (taken from [10]). An on-set cover for signal $A_i$ is F = DL + DRi + LE1 and an off-set cover for it is R = D L+ DRi + LRi. Both covers are redundant (cube DRi can be removed from the on-set cover and cube DRi can be removed from the off-set cover). (There also exist valid covers for $A_i$ that are not redundant; we use this example only to show that redundant covers can indeed arise in practice, and not only from "crafted" STGs such as the previous one.)

Fig. 7. An STG that has a redundant two-level implementation: (a) STG, (b) SG, (c) Karnaugh map of m.

Fig. 8. Another STG that has a redundant two-level implementation.

VI. STATIC HAZARD ANALYSIS OF A CIRCUIT IMPLEMENTED FROM A SIGNAL TRANSITION GRAPH

Now let us see what happens when we apply a sequence of input patterns corresponding to a valid path $\pi$ to the circuit. Let the circuit implementation of signal $t_i$ be obtained from the on-set cover F' of the next-state function f for $t_i$ as described in Sections IV and V. Let (s', s) be the state pair associated with $\pi$, and let v' and v denote their respective labels.

A path on the transition cube c associated with $\pi$ is defined as a sequence of vectors of signal values $v^0 \rightarrow v^1 \rightarrow \cdots \rightarrow v^n$ such that $v^0$ is the label of s', $v^n$ is the label of s, the Hamming distance $d(v^i, v^{i+1})$ between each consecutive pair is 1, and c covers every $v^i$.

Such a path corresponds to any permutation (possibly with repetitions) of the transitions in $\pi$.

The following theorem is the fundamental result of this paper, and is the basis of the hazard elimination procedure as well:

Theorem 6.1: Let F' be a two-level circuit implementation of the next-state function f for signal $t_h$, derived according to Procedures 4.1 and 5.1 from a live STG S with the CSC property. Suppose that the interaction between F' and its environment obeys S.

Then no static hazard can occur with respect to signal $t_h$ if and only if:

for each valid path $\pi$ such that the associated transition cube c is not completely covered by either an on-set or an off-set cube of f,
- for each pair of transitions $(t_a^*, t_b^*)$ such that:
  * they are not concurrent, and
  * they belong to $\pi$ in the order $t_a^* \rightarrow t_b^*$, and
  * there exists an SG state s', labeled v', on $\pi$ such that:
    . $t_a^*$ is enabled in the corresponding marking and
    . the label obtained from v' by toggling signal $t_b$ has a next-state function value different from v' (i.e., an inversion in the firing order of $t_a^*$ and $t_b^*$ causes the value of f to change),
  the effect of $t_b^*$ cannot, under the bounded wire delay hypothesis, propagate to $t_h$ before the effect of $t_a^*$.
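The notions of transition cube and of paths on it can be made concrete with a short sketch. Everything in it is an assumption made for illustration: labels are strings of '0' and '1', cubes are strings over '0', '1' and '-', and `COVER` is a hypothetical on-set cover $x\bar{z} + \bar{y}\bar{z}$ over the signals (x, y, z). Only repetition-free paths are enumerated, although the definition also allows repetitions.

```python
from itertools import permutations

def transition_cube(v_start, v_end):
    """Smallest cube containing both labels; '-' marks a changing signal."""
    return "".join(a if a == b else "-" for a, b in zip(v_start, v_end))

def covers(cube, vertex):
    """True if the vertex lies inside the cube."""
    return all(c in ("-", x) for c, x in zip(cube, vertex))

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def paths_on_cube(v_start, v_end):
    """Repetition-free paths from v_start to v_end: one per ordering of the
    changing signals."""
    changing = [i for i, (a, b) in enumerate(zip(v_start, v_end)) if a != b]
    for order in permutations(changing):
        path, cur = [v_start], list(v_start)
        for i in order:
            cur[i] = v_end[i]
            path.append("".join(cur))
        yield path

def single_cube_covers(cover, cube):
    """True if one cube of `cover` contains the whole transition cube."""
    return any(all(k in ("-", c) for k, c in zip(cand, cube)) for cand in cover)

COVER = ["1-0", "-00"]                 # hypothetical cover: x z' + y' z'
CUBE = transition_cube("000", "110")   # x and y both change, z stays at 0
PATHS = list(paths_on_cube("000", "110"))
```

Here the transition cube `--0` is not contained in any single cube of the cover, so this state pair is one of those that Theorem 6.1 asks us to examine.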

Proof:

⇐ Suppose that there exists a pair of transitions such that the effect of $t_b^*$ can propagate to $t_h$ before the effect of $t_a^*$ and such that an inversion in the firing order of $t_a^*$ and $t_b^*$ can cause the value of f to change. Then obviously we have a hazard at $t_h$.

⇒ Suppose that a static hazard can occur in the circuit. We will show that the order of a pair of transitions as described in the theorem statement is reversed.

According to [13] a static hazard occurs only if we have applied to F' a transition cube c that was not covered by a single on-set or off-set cube. Moreover, the transition cube must be associated with a valid path $\pi$, because the hazard is static.

We have the following cases (as usual, let s', s denote the initial and final states of $\pi$, and let v', v denote their labels):

1) f(v') = f(v) = 1. Let us assume that two distinct cubes of F', say $c_a$ and $c_b$, are required to cover the vertices on $\pi$ (if more than two cubes are required, then we can just pick another valid sub-path of $\pi$, such that two cubes are sufficient to cover its vertices). Then there exists some path $v' \rightarrow v$ on c that is not completely contained in F' (i.e., some vertex belonging to that path is not covered by any cube of F'). Otherwise we could cover both $c_a$ and $c_b$ with a single cube, contradicting the assumption that F' is prime. For each such path that leaves and re-enters F' at most once (if any path as above leaves and re-enters F' more than once, there exists another path that does it only once), we can find two transitions $t_a^*$ and $t_b^*$ such that:
   - $t_b^*$ and $t_a^*$ are not concurrent (otherwise they would be covered by a single cube),
   - $t_b^*$ turns $c_b$ off (i.e., $c_b$ covers the label of the fanin vertex of $t_b^*$ and it does not cover the label of its fanout vertex on the SG),
   - $t_a^*$ turns $c_a$ on (i.e., $c_a$ does not cover the label of the fanin vertex of $t_a^*$ and it covers the label of its fanout vertex on the SG).
   Note that the transitions must occur in the order $t_a^* \rightarrow t_b^*$ in the STG, in order for $c_b$ and $c_a$ to change value in the right order and keep $t_h$ at 1.
2) The case when f(v') = f(v) = 0 and c intersects some cube c' of F' is completely analogous, as we can find a pair of nonconcurrent transitions that turn c' on and off, respectively.

An example of case 1 is the path 000 → 100 → 110 for the SG in Fig. 3(b) for output x, with on-set cover $F' = x\bar{z} + \bar{y}\bar{z}$ (see Appendix A). Here $c_b = \bar{y}\bar{z}$, $c_a = x\bar{z}$, $t_b^* = y^+$ and $t_a^* = x^+$, ordered $x^+ \rightarrow y^+$ on the STG.

An example of case 2 is the path 001 → 011 → 010 for output x.

Both hazard cases occur, informally, when the subcircuit behaves as though the STG-specified ordering of $t_b^*$ and $t_a^*$ had been reversed. This means that the physical circuit implementation must preserve the transition ordering:

Property 6.1: Given an STG S and a circuit G implementing it, if a transition $t_a^*$ on the input of a subcircuit $C_1$ causes a transition $t_b^*$ on the output of $C_1$ (i.e., the two transitions are not concurrent in S), then no other subcircuit $C_2$ can produce a sequence of events on its output as a delay-free implementation would have produced if $t_b^*$ had preceded $t_a^*$ in time.

Fig. 9. An STG fragment and its implementation: (a), (b).

For example in the circuit shown in Fig. 9 we must ensure that every circuit path connecting a → b → c through $C_1$ and $C_2$ has a longer delay than any circuit path a → c in $C_2$ only.

This property can be satisfied, under the bounded wire-delay model, by slowing down the output of $C_1$ so that we are sure that every circuit having a as input is stable when we generate a transition of b. The next section gives an algorithm for this purpose. The circuit will still work properly with this increased delay on the output of $C_1$, because we are changing the delay of an STG signal. So the circuit still follows the specification, since the firing time of an STG transition can be arbitrary.
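The effect of an apparent order reversal can be seen on a toy cover. The sketch below assumes a hypothetical on-set cover $x\bar{z} + \bar{y}\bar{z}$ over the signals (x, y, z) and an STG that orders $x^+$ before $y^+$ from state 000; it evaluates the delay-free output along the specified order and along the reversed order that the subcircuit would perceive if the effect of $y^+$ arrived first. The encoding and all names are ours.

```python
def eval_cover(cover, vertex):
    """OR of the cubes; a cube is a string over '0', '1', '-'."""
    return int(any(all(c in ("-", x) for c, x in zip(cube, vertex))
                   for cube in cover))

def output_trace(cover, start, order):
    """Delay-free output values seen when the signals toggle in `order`."""
    cur = list(start)
    trace = [eval_cover(cover, cur)]
    for i in order:
        cur[i] = "1" if cur[i] == "0" else "0"
        trace.append(eval_cover(cover, cur))
    return trace

COVER = ["1-0", "-00"]          # assumed cover x z' + y' z' over (x, y, z)
X, Y = 0, 1                     # positions of signals x and y in a label

SPECIFIED = output_trace(COVER, "000", [X, Y])   # x+ then y+: 000, 100, 110
REVERSED = output_trace(COVER, "000", [Y, X])    # y+ perceived first: 000, 010, 110
```

Along the specified order the output stays at 1, while the reversed order passes through vertex 010, where neither cube is on, so the output shows a static 1-hazard. Delaying the signal that fires second is what prevents the subcircuit from ever perceiving the reversed order.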

VII. HAZARD DETECTION AND ELIMINATION PROCEDURE

In the previous section we described the precise conditions under which a static hazard can occur, namely when the difference between the delays along two paths in one subcircuit is greater than the delay between two transitions. In this section we give an algorithm for detecting such hazards and discuss a procedure for eliminating them.

The procedure is a conservative approximation of Theorem 6.1. It is guaranteed to detect all hazards, but may erroneously identify as hazardous some safe cases. This approximation is done in order to save computation time, and has an impact only on the performance of the resulting circuit, not on its hazard-freeness. Its extension to compute the exact conditions of the Theorem is left to the interested reader.

The approximation consists of finding pairs of cubes that may be turned on and off incorrectly due to delays reversing the apparent order of two input transitions. Basically, we check all the firing sequences of the STG (i.e., all the valid input sequences) and verify the delays of cubes being turned off and on. This delay verification can be done once the circuit is implemented, because there is a direct correspondence between cubes in the initial two-level cover and the final implemented circuit, if we use only the transformations listed in Theorems 5.1 and 5.2. If two cubes ever switch in the wrong order, then a hazard may happen. Restricting the check to two cubes gives a safe approximation, but neglects the fact that a third cube can actually keep the output stable.

Static 1 → 1 hazards are detected by checking cube pairs belonging to the on-set cover F' derived by Procedures 4.1 and 5.1. To address static 0 → 0 hazards we analyze the DeMorgan complement, DM(F'), of the cover and apply essentially the same procedure used for 1 → 1 hazards. The validity of this procedure rests on the fact that DM(F') has a static 1 → 1 hazard exactly when F' has a static 0 → 0 hazard [47].

The hazard detection procedure below must be called once for each output signal $t_h$, with next-state function f, implemented by cover F' according to Procedures 4.1 and 5.1. In the following, given a cube c, let DM(c) denote the DeMorgan complement of c; for example, if c = xyz, then $DM(c) = \bar{x} + \bar{y} + \bar{z}$. (See Procedure 7.1 below.)
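The DeMorgan complement is easy to compute on a positional cube encoding. The sketch below is illustrative only: cubes are strings over '0' (complemented literal), '1' (true literal) and '-' (absent), and `COVER` is a hypothetical cover, not one taken from the paper. It builds DM(c) as a list of one-literal cubes and checks that DM(F'), read as a product of such sums, is the complement of F'.

```python
from itertools import product

def dm_cube(cube):
    """DeMorgan complement of a product term, as a list of one-literal cubes."""
    flip = {"0": "1", "1": "0"}
    return ["-" * i + flip[c] + "-" * (len(cube) - i - 1)
            for i, c in enumerate(cube) if c != "-"]

def covers(cube, vertex):
    return all(c in ("-", x) for c, x in zip(cube, vertex))

def eval_sop(cover, vertex):
    """Value of a sum of products at a vertex."""
    return any(covers(c, vertex) for c in cover)

def eval_dm(cover, vertex):
    """DM(F') as a product of sums: every cube of F' must be complemented."""
    return all(eval_sop(dm_cube(c), vertex) for c in cover)

COVER = ["1-0", "-00"]     # hypothetical cover over three signals
COMPLEMENT_OK = all(eval_dm(COVER, v) != eval_sop(COVER, v)
                    for v in product("01", repeat=3))
```

For c = xyz, encoded as `"111"`, `dm_cube` returns the three literals $\bar{x}$, $\bar{y}$, $\bar{z}$ of the example above.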

Note that for each input $t_a$ or $t_b$ of each cube $c_a$, $c_b$ and $c_{ab}$ mentioned in the procedure, there exists only one path in the final circuit if we optimize the initial two-level circuit only with the distributive and associative laws.

It is easy to show, following the same line of reasoning used to introduce Property 6.1, that this procedure is useful to derive sufficient conditions for hazard-free circuit implementation using Theorem 6.1.

Procedure 7.1:

For each SG state pair (s', s), labeled v' and v, do:
- for each maximal set $\Pi$ of paths that are permutations of concurrent transitions of some valid SG path $\pi'$ associated with (s', s) such that:
  * $\pi'$ is not a sub-path of another valid path (i.e., (s', s) are maximally distant)
  * f(v') = f(v) = f(v'') for all v'' labeling some s'' on $\pi'$ (and hence on every $\pi \in \Pi$, since output transitions cannot be disabled)
  do:
  1) for each pair $(t_a^*, t_b^*)$ of nonconcurrent transitions that appear ordered $t_a^* \rightarrow \cdots \rightarrow t_b^*$ on every $\pi \in \Pi$ do:
     (a) let S' be the set of predecessor states of a firing of $t_a^*$ on some $\pi \in \Pi$
     (b) let S be the set of successor states of a firing of $t_a^*$ on some $\pi \in \Pi$
     (c) let $d_{ab}$ be a lower bound on the delay between transitions $t_a^*$ and $t_b^*$
     (d) if f(v') = 1 then
         (check for 1 → 1 hazards, by looking for cubes turned off and on in the wrong order)
         * for each pair of cubes $c_b, c_a \in F'$ such that:
           . $c_b$ covers the label of some $s' \in S'$
           . $c_a$ covers the label of some $s \in S$
           . firing $t_b^*$ turns off $c_b$ (i.e., $t_b \in c_b$ if $t_b^* = t_b^-$, $\bar{t_b} \in c_b$ if $t_b^* = t_b^+$)
           . firing $t_a^*$ turns on $c_a$ (i.e., $t_a \in c_a$ if $t_a^* = t_a^+$, $\bar{t_a} \in c_a$ if $t_a^* = t_a^-$; note that this means that $c_a$ does not cover any v' labeling some $s' \in S'$)
           do:
           i. let $d_{bh}$ be a lower bound on the delay for transition $t_b^*$ along the circuit path from $t_b$ to $t_h$ corresponding to cube $c_b$
           ii. let $d_{ah}$ be an upper bound on the delay for transition $t_a^*$ along the circuit path from $t_a$ to $t_h$ corresponding to cube $c_a$
     (e) else
         (check for 0 → 0 hazards, by looking for cubes that may be turned on incorrectly, i.e., cubes in DM(F') being turned off and on in the wrong order)
         * for each cube $c_{ab} \in F'$ such that there exist two cubes $c_a', c_b' \in DM(c_{ab})$ such that:
           . $c_b'$ covers the label of some $s' \in S'$
           . $c_a'$ covers the label of some $s \in S$
           . firing $t_b^*$ turns off $c_b'$ (i.e., $t_b \in c_b'$ if $t_b^* = t_b^-$, $\bar{t_b} \in c_b'$ if $t_b^* = t_b^+$; note that this implies that $t_b^*$ turns on $c_{ab}$)
           . firing $t_a^*$ turns on $c_a'$ (i.e., $t_a \in c_a'$ if $t_a^* = t_a^+$, $\bar{t_a} \in c_a'$ if $t_a^* = t_a^-$; note that this implies that $t_a^*$ turns off $c_{ab}$)
           do:
           i. let $d_{bh}$ be a lower bound on the delay for transition $t_b^*$ along the circuit path from $t_b$ to $t_h$ corresponding to cube $c_{ab}$
           ii. let $d_{ah}$ be an upper bound on the delay for transition $t_a^*$ along the circuit path from $t_a$ to $t_h$ corresponding to cube $c_{ab}$
     (f) if $d_{ah} > d_{ab} + d_{bh}$ then a hazard condition exists

The conditions are only sufficient because we do not take into account the case when two or more cubes in the on-set (or off-set) contribute to keeping the output stable. In that case, for example, the potential hazard manifests itself only if the effect of two (or more) transitions, call them $t_b^*, t_c^*$, reaches the output before the effect of another transition ordered before both of them, call it $t_a^*$. The hazard occurs if both cubes are turned off by $t_b^*$ and $t_c^*$ before another cube is turned on by $t_a^*$. The cause of the hazard, in general, can be an inversion of the ordering of two arbitrary subsets of transitions, but taking this fact into account would make the hazard detection and elimination algorithms too complex.

On the other hand, considering a set of paths $\Pi$ together reduces the average case complexity of the algorithm without losing generality. It "collapses" together a set of vertices reached under all possible orderings of a set of concurrent transitions that do not change the value of f (since they appear on a valid path). All such vertices must be covered by a single on-set or off-set cube according to Procedure 4.1. So all the paths that are "collapsed" together agree on the order of nonconcurrent transitions and on the order of the cubes. Furthermore, all the vertices that are labels of some state in S' (respectively S) are also reached under concurrent transitions, so all the cubes that cover some of them must cover all of them.

Note that similar conditions for hazard manifestation could be derived using the approach described in [16] and [17], but only under the fundamental mode hypothesis. We do not require such hypothesis, but only constrain the environment to satisfy the STG specification.

The worst case running time of the algorithm can be pessimistically estimated as follows. Suppose that the STG has n signals and O(n) transitions and places. Then there can be at most $O(2^n)$ SG states and $O(2^n)$ cubes in F'. A bound on the number of sets $\Pi$ of paths to be considered from each state is $O(2^n)$, because O(n) places cannot produce more than $O(2^n)$ different choices (recall that concurrency does not increase the number of such sets). Each set can be generated with a depth-first search of the SG, with a running time of $O(2^n)$. The main loop considers each pair of states, each set of paths and each pair of transitions in turn (i.e., it can be executed $O(n^2 8^n)$ times). It involves $O(4^n)$ cubes. So, assuming that each cube operation in the inner loop takes O(n) time to complete, the worst case running time of the algorithm is $O(n^3 32^n)$.
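The arithmetic behind this bound can be spelled out as follows. The attribution of each factor is our reading of the paragraph above (in particular, that the $O(4^n)$ term counts the state pairs in the main loop and the cube pairs in the inner loop), not a statement made explicitly in the text.

```latex
\[
\underbrace{O(4^n)}_{\text{state pairs}} \cdot
\underbrace{O(2^n)}_{\text{path sets}} \cdot
\underbrace{O(n^2)}_{\text{transition pairs}}
= O(n^2\, 8^n),
\qquad
O(n^2\, 8^n) \cdot
\underbrace{O(4^n)}_{\text{cube pairs}} \cdot
\underbrace{O(n)}_{\text{per cube operation}}
= O(n^3\, 32^n).
\]
```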

The algorithm performs very well on practical STGs, though, because:

- the number of sets of paths is linear in the number of MG components, which is usually fairly small, and
- the number of cubes in F' is also usually small.

The hazard detection procedure is performed on a two-level implementation of the next-state function, using the unbounded wire-delay model. But, as proved in Theorems 5.1 and 5.2, we can use constrained multilevel logic synthesis to implement and optimize the function with logic gates from any available library (nand, nor, inverter, ex-or, SR flip-flops, C-elements, ...). The resulting circuit has exactly the same potential hazards as the initial two-level circuit (under the unbounded wire-delay hypothesis, of course).

Now we can use the delay information from the implementation in order to verify which hazards, predicted using the pessimistic unbounded wire-delay model, will actually be present if we use the more realistic bounded wire-delay model. Moreover we can use the same information to eliminate those hazards, and produce a hazard-free implementation of the STG specification.

Basically, we perform a timing analysis step to check which inequalities of the form $d_{ah} - d_{bh} < d_{ab}$ are not satisfied, and then pad delays in order to satisfy them.

The upper and lower bounds on $d_{bh}$ and $d_{ah}$ can be obtained by timing simulation or timing analysis of the circuit implementation, where some bounds on the gate delays and wire delays must be known. The input vectors for the timing simulation are just the cubes $c_a$, $c_b$ and $c_{ab}$ (toggling $t_a^*$ and $t_b^*$) as obtained above.

The lower bound on $d_{ab}$ can either be obtained in the same way (if we are also synthesizing the circuit for signal $t_b$) or from any other information source, such as a data sheet. In the worst case we can assume it to be zero.
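The timing check itself reduces to one inequality per recorded cube pair. The following sketch assumes a hypothetical record type `DelayTriple` whose fields hold the bounds discussed above; the names and numbers are illustrative. Equality is treated as unsafe here, which is the conservative reading of the inequality.

```python
from typing import NamedTuple

class DelayTriple(NamedTuple):
    """Bounds recorded for one cube pair (all field names are illustrative)."""
    a: str            # signal of the transition that must take effect first
    b: str            # signal of the transition that must take effect second
    d_ah_max: float   # upper bound on the path delay from t_a to t_h
    d_bh_min: float   # lower bound on the path delay from t_b to t_h
    d_ab_min: float   # lower bound on the delay between the two transitions

def is_safe(t: DelayTriple) -> bool:
    """Required inequality: d_ah - d_bh < d_ab (equality counts as unsafe)."""
    return t.d_ah_max - t.d_bh_min < t.d_ab_min

def violations(triples):
    """Triples whose inequality is not satisfied and that need delay padding."""
    return [t for t in triples if not is_safe(t)]

TRIPLES = [
    DelayTriple("a", "b", d_ah_max=3.0, d_bh_min=1.0, d_ab_min=2.5),  # 2.0 < 2.5
    DelayTriple("a", "c", d_ah_max=4.0, d_bh_min=1.0, d_ab_min=0.0),  # 3.0 >= 0.0
]
```

The second triple uses the worst-case assumption that the lower bound on the delay between the two transitions is zero, and it fails the check.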

Whenever Procedure 7.1 finds a hazard condition, we record the triple of delays (together with the associated circuit paths). At the end, we have two choices for each inequality $d_{ah} - d_{bh} < d_{ab}$ that is not satisfied:

1) Reduce $d_{ah}$ and/or increase $d_{bh}$, using logic synthesis and/or transistor sizing. This brings the circuit closer to the gate-delay case assumed in Section II-E, where all delays inside the subcircuit for each output $t_i$ were supposed to be balanced. This may not always be possible; furthermore it may introduce cyclic dependencies, so that when we try to eliminate one hazard we make another one worse.

2) Increase $d_{ab}$. This is always possible, by just adding buffers after the output of the subcircuit for signal $t_b$, if we are using a cell-based semi-custom design methodology. Or we can use transistor sizing to accomplish it, if we are using a full-custom design methodology.

Furthermore we do not introduce cyclic constraints, since $d_{bh}$ and $d_{ah}$ are measured from a subcircuit input to a subcircuit output. So adding delay after the output does not change any $d_{bh}$ or $d_{ah}$ for this signal or for any other output signals. Thus it does not introduce new hazards either in this subcircuit or in other subcircuits. We also preserve the correct operation of the circuit. The STG behavior assumed unbounded delays for each transition firing, so whatever delay we add after the circuit that produces $t_b$ (before any fanout point) does not affect the correctness of the STG implementation.

If we make the second choice, then an easy solution is to record by how much each output fails to pass the test, and then slow it down by the maximum difference.
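This "easy solution" amounts to taking, for each signal, the largest shortfall among its failed inequalities. A minimal sketch, assuming that each failed check is recorded as a tuple `(b, d_ah, d_bh, d_ab)` (a hypothetical format of ours):

```python
from collections import defaultdict

def padding_per_signal(failed):
    """Extra delay to add after the subcircuit output of each signal t_b.

    `failed` holds tuples (b, d_ah, d_bh, d_ab) for the unsatisfied
    inequalities; the shortfall of each one is d_ah - d_bh - d_ab.
    """
    pad = defaultdict(float)
    for b, d_ah, d_bh, d_ab in failed:
        pad[b] = max(pad[b], d_ah - d_bh - d_ab)
    return dict(pad)

FAILED = [("b", 4.0, 1.0, 0.5), ("b", 3.0, 1.0, 1.5), ("c", 2.0, 1.0, 1.0)]
PADS = padding_per_signal(FAILED)
```

Since the inequality is strict, the delay actually inserted must exceed the computed value by some margin, for instance the resolution of the timing analysis.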

A better solution would be to take into account all hazards simultaneously, since increasing some $d_{ab}$ cannot make any other hazard worse, but it can help eliminate some other hazard. The implications of this improved hazard elimination methodology are discussed in more detail in [23].

We can now observe some similarity between this approach and the classical definition of essential hazards:

- An essential hazard manifests itself in a circuit operated in fundamental mode when a state variable change propagates too fast. It then "overruns" the input change that caused it, and causes a hazard in some gate that was expecting the effect of the input change before the effect of the state variable change.
  Eliminating such hazards requires (see, e.g., [47]) increasing the delay of the state signals, so that input changes finish their propagation before the changes due to the state transition are initiated.
- The hazards detected by our methodology manifest themselves in a circuit operated according to an STG specification (that is, we do not require fundamental mode operation) if two signal transitions that are causally related "overrun" each other.
  Eliminating such hazards requires increasing the delay of the second signal, so that the changes due to the first transition finish their propagation before the changes due to the second transition are initiated.

We can also find a parallel between our proposed methodology and what is classically done in synchronous circuit synthesis:

1) in the synchronous case we slow down the clock signal until no more events are propagating along the whole circuit.

2) in the asynchronous case we slow down each signal until no more events that caused its change are propagating in its immediate fanout.

So this approach, even if it does not generate locally speed-independent or delay-insensitive circuits, can still be considered faithful to the “self-timed philosophy” [40]: every element must obey a “locally defined” protocol, and elements that are logically (and maybe physically) far apart must not be slowed down due to each other.

VIII. DYNAMIC HAZARD ANALYSIS

Dynamic hazard analysis is much more difficult than static hazard analysis, as we do not (yet) have powerful formal techniques to reason about dynamic hazards. Even the methodology described in [21] could be used for our purposes only assuming fundamental mode operation. So this section will be less precise and more based on intuition than its counterparts on static hazards.

If we assume that we have performed the static hazard elimination procedure, we can analyze the operation of a two-level implementation of the on-set of a logic function F in the following manner. This intuitive view can also be used for a multi-level implementation of the same function, because algebraic logic synthesis, based only on distributivity and commutativity, does not substantially change the “cube-based” picture.

The STG specification describes a constrained way to walk along the cubes implementing the function, and the hazard elimination procedure guarantees that:

- If we are walking along on-set vertices, whenever we are about to “leave” a cube and the function value must not change in the next vertex, then Procedure 4.1 guarantees that there is another on-set cube covering both the current vertex and the next one. The hazard elimination procedure delays, if necessary, the transition between those two vertices to ensure that the logic implementing this second cube keeps the output high before the first cube can cease to keep it high. So we “orderly walk” on the cubes, ensuring that whenever we “enter” a cube it has time to “turn on.”
- If we are walking along off-set vertices, whenever we might “enter” a cube, then the hazard elimination procedure delays, if necessary, the transition to ensure that we “stay outside” the on-set.
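The “another on-set cube covering both the current vertex and the next one” condition can be illustrated with a small sketch. It assumes cubes are represented as dictionaries from signal names to required values (signals absent from a cube are don't-cares) and uses the cover F = x + z that the paper gives for signal y in the example of Fig. 2. The function names are hypothetical and this is not the authors' implementation.

```python
def covers(cube, vertex):
    """True if the vertex lies inside the cube (missing signals are don't-cares)."""
    return all(vertex[sig] == val for sig, val in cube.items())


def holding_cubes(cover, v_from, v_to):
    """Cubes of the cover that contain both endpoints of a transition."""
    return [c for c in cover if covers(c, v_from) and covers(c, v_to)]


# On-set cover of y in the example: y = x + z.
F_Y = [{"x": 1}, {"z": 1}]

# Transition x- taken while z = 1: cube z holds the output while cube x turns off.
before = {"x": 1, "y": 1, "z": 1}
after = {"x": 0, "y": 1, "z": 1}
HOLD = holding_cubes(F_Y, before, after)
```

When the list of holding cubes is non-empty, the remaining question is one of timing only: the holding cube must have turned on before the cube being left turns off, which is what the delay padding is meant to ensure.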

So our static hazard elimination actually guarantees that we follow only paths that are legal according to the specification, even though the delays in the implemented circuit might lead us away from legal SG paths under the bounded wire-delay model.

But in this case we cannot have dynamic hazards after static hazard elimination, because:

- While we are walking on the on-set (when we could have 1 → 0 → 1 → 0 hazards when entering the off-set), we are sure that every on-set cube that we can enter has time to be turned on before we proceed. Hence we do not output a spurious “1-pulse” because some cube is very slow and its effect reaches the output only after we have left the on-set.
- While we are walking on the off-set (when we could have 0 → 1 → 0 → 1 hazards when entering the on-set), we are sure that no off-set cube can ever be turned on and produce a dynamic hazard.

The only problem can be caused by cubes that happen to cover a vertex due to some other, independent firing sequence of the STG. This is due to the fact that the Complete State Coding property ensures only that different markings with the same binary label (i.e., corresponding to the same vertex) have the same set of enabled output transitions. So we can reach the same vertex more than once in different markings, and have a different set of input transitions enabled, thus producing different cubes in Step 1(a)i of Procedure 4.1. The hazard elimination procedure would then be unable to handle these different cubes, because they would not satisfy, e.g., Property 4.1 with respect to transitions that were enabled in a different marking with the same label. Thus some dynamic hazards may be present in the final implementation even after hazard elimination.

This problem can be solved by enforcing the more restrictive Unique State Coding property on the STG and by not expanding the cubes in the initial cover if they can intersect each other. This solution is similar to the approach described by [34] to synthesize self-clocked asynchronous circuits, and to the conditions required by [20] to synthesize unbounded gate-delay circuits.
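The difference between the two coding properties can be made concrete with a short sketch. It assumes each reachable marking is given as a triple (marking name, binary label, set of enabled output transitions); Unique State Coding is taken to mean that no two markings share a label, and Complete State Coding that markings sharing a label enable the same output transitions, as stated above. The marking names and labels below are made up for illustration.

```python
from collections import defaultdict


def coding_properties(markings):
    """Return (usc, csc) for an iterable of (name, label, enabled_outputs) triples."""
    by_label = defaultdict(list)
    for _name, label, enabled_outputs in markings:
        by_label[label].append(frozenset(enabled_outputs))
    usc = all(len(group) == 1 for group in by_label.values())
    csc = all(len(set(group)) == 1 for group in by_label.values())
    return usc, csc


# Two distinct markings share label 101 but enable the same output transition:
# CSC holds, USC does not.
SHARED_LABEL = [
    ("m1", "101", {"y+"}),
    ("m2", "101", {"y+"}),
    ("m3", "110", {"z+"}),
]
USC_OK, CSC_OK = coding_properties(SHARED_LABEL)
```

In the situation described above, m1 and m2 could still differ in their enabled input transitions, which is exactly the case that CSC does not rule out and USC does.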

Dynamic hazards, though, are in general less of a problem than static hazards, because the inertial nature of real circuit delays tends to absorb them. So they can often be neglected in practice.

IX. SIGNAL TRANSITION GRAPH PERSISTENCY AND HAZARDS

After giving a complete procedure for hazard-free synthesis of a circuit from an STG specification, we now turn our attention to a property of the STG specification that was previously considered to be necessary and sufficient to obtain a hazard-free implementation.

An STG transition $t_j^*$ is defined to be persistent according to Chu [10] if, for each immediate predecessor $t_i^*$ of $t_j^*$, $t_j^*$ and $\bar{t}_i^*$ (the opposite transition of signal $t_i$) are ordered (i.e., belong to a simple cycle). An STG is persistent if all its transitions are persistent. Note that this definition is unrelated to the standard definition of persistency for Petri nets, where a transition is persistent if, once enabled, it cannot be disabled by the firing of another transition.

For example, transition y+ in Fig. 2 is not persistent according to Chu, because it has x+ as a predecessor, but x- and y+ are concurrent. On the other hand, the underlying Petri net transition is persistent according to the standard definition, because the token from its input place can never be removed.
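The persistency check in the sense of Chu can be sketched as below. The sketch assumes the STG is summarized by two hypothetical inputs: a map from each transition to its immediate predecessors, and a list of concurrent transition pairs. Only the fragment of Fig. 2 stated in the text is encoded (x+ precedes y+, and x- is concurrent with y+); the rest of the STG is not reproduced here.

```python
def opposite(transition):
    """x+ -> x-, x- -> x+ (transitions are written as signal name plus sign)."""
    sign = "-" if transition.endswith("+") else "+"
    return transition[:-1] + sign


def nonpersistent_transitions(predecessors, concurrent_pairs):
    """Transitions t_j with a predecessor t_i whose opposite transition is concurrent with t_j."""
    concurrent = {frozenset(pair) for pair in concurrent_pairs}
    result = set()
    for tj, preds in predecessors.items():
        for ti in preds:
            if frozenset((opposite(ti), tj)) in concurrent:
                result.add(tj)
    return result


# Fragment of the Fig. 2 example as described in the text.
PREDECESSORS = {"y+": ["x+"]}
CONCURRENT = [("x-", "y+")]
NONPERSISTENT = nonpersistent_transitions(PREDECESSORS, CONCURRENT)
```

A transition that is absent from the result is persistent with respect to the predecessors and concurrency pairs supplied; the check is only as complete as those two inputs.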

In a persistent STG, whenever a transition $t_j^*$ becomes enabled, none of its enabling signals $t_i$ can change level before $t_j^*$ has fired. So STG persistency seems intuitively connected with the notion that an output transition may be disabled if any of the signals that caused it changes value. This intuition is incorrect, as we shall see below.

Persistency at the STG level was considered to be a necessary and sufficient condition for the existence of a hazard-free circuit implementation, due to the following theorem, taken from [10] (only the notation is changed here, to be consistent with the rest of the paper):

Theorem 9.1: Let S be a live STG. For each output signal $t_j$, there exists a signal $t_i$ and a marking m in S such that:

$\bar{t}_i^*$ and $t_j^*$ are enabled in m, $\bar{t}_i^*$ disables $t_j^*$, and $t_j^*$ does not disable $\bar{t}_i^*$ (see footnote 8)

if and only if $t_i^*$ is a predecessor of $t_j^*$, and $t_j^*$ and $\bar{t}_i^*$ are concurrent (that is, $t_j^*$ is nonpersistent).

The case in which $t_j^*$ is disabled by $\bar{t}_i^*$ but not vice versa intuitively seems to be dangerous, because if the circuit implementing signal $t_j$ is not “fast enough” to fire after $t_i^*$ fires, then a firing of $\bar{t}_i^*$ may prevent $t_j^*$ from firing. So we could have a potential hazard, depending on the delay of the circuit implementing $t_j$ and the time from $t_i^*$ to $\bar{t}_i^*$. On the other hand, a simple persistency check on the STG would be enough to guarantee that no such hazard occurs.

The proof of this theorem was based not only on STG properties, since strictly speaking no transition satisfying the premises of the theorem can be disabled, but also on specific assumptions on the circuit implementation derived from the STG, namely that if $t_j^*$ is enabled by $t_i^*$, then an occurrence of $\bar{t}_i^*$ must disable $t_j^*$ in all circuit implementations of signal $t_j$. This is not true in general, but only if the output of the subcircuit implementing signal $t_j$ is sensitive to the value of signal $t_i$ in marking m.
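The sensitivity condition can be checked directly on a logic function. The sketch below uses y = x + z, the logic equation of signal y in the example of Fig. 2, and tests whether flipping one input changes the output at a given input point (a Boolean-difference test). The helper name is hypothetical.

```python
def y(x, z):
    """Logic equation of signal y in the example: y = x OR z."""
    return x | z


def is_sensitive(f, var, point):
    """True if flipping input `var` at `point` changes the value of f."""
    flipped = dict(point)
    flipped[var] = 1 - flipped[var]
    return f(**point) != f(**flipped)


# With z already at 1, a change of x cannot affect y ...
SENSITIVE_WHEN_Z_HIGH = is_sensitive(y, "x", {"x": 1, "z": 1})
# ... whereas with z at 0 it would.
SENSITIVE_WHEN_Z_LOW = is_sensitive(y, "x", {"x": 1, "z": 0})
```

The first case is the one that arises in the example, where x- is caused by z+ and so can only fire once z is at 1.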

See, for example, the STG in Fig. 2, where y+ is not persistent. The logic equation for y is y = x + z. Using the gate-delay model, as assumed by [10], we know that x- is caused by z+, so when x- fires z is already at 1 (which determines the output of the OR gate independently of the value of x), and x- cannot disable y+. So STG persistency is not necessary for a hazard-free implementation, because the implementation described in Fig. 11 of this nonpersistent STG is hazard-free under the unbounded gate-delay model that [10] used.

Footnote 8: This means that $\bar{t}_i^*$ and $t_j^*$ are not successors of the same multisuccessor place; otherwise, $t_j^*$ would disable $\bar{t}_i^*$ as well.

On the other hand, Theorem 9.1 guaranteed hazard-free implementation only if the whole subcircuit implementing each signal $t_j$ could be satisfactorily modeled as a single gate with the unbounded gate-delay model. But this model is generally a reasonable approximation of reality only if the whole subcircuit is one simple gate, such as a NAND or a NOR.

Consider, for example, Fig. 10 from [10]. It shows a fragment of a persistent STG and its circuit implementation. The value of the signals in the given marking is La = 0, Lr = 1, Sa = 0, Sr = 1, Ca = 0 and Cr = 0. When La+ fires, the output of a3 has a rising transition. Suppose that the gate delay of a3 is greater than the gate delay of a2 plus the delay between Sa+ and Sr-. Then a hazard occurs on Lr. The only way to avoid the hazard in the unbounded gate-delay model is to assume that the whole group of gates a1, a3 and o1 can be modeled with a single delay on the output of o1. So STG persistency is not sufficient for a hazard-free implementation, except when using the complex gate-delay model (which may be considered too unrealistic for most practical purposes).

Fig. 10. An STG fragment and its implementation.

Fig. 11. A circuit implementation of the example STG: panels (a), (b), (c).

In consequence we can state the following.

Theorem 9.2: Let S be a live STG with the CSC property. Let C be a circuit implementation of the signals in S according to Procedures 4.1 and 5.1 and the decomposition described in Section V-C. If the implementation of each combinational subcircuit exciting each flip-flop input, as well as each flip-flop, can be modeled as a single gate with nonzero delay, then C is hazard-free under the unbounded gate-delay model.

Proof: Each signal $t_b$ in S is implemented by a circuit with nonzero delay, so dab > 0. Moreover, both dbh and dah are delays within the same gate, implementing either s or r, so dbh = dah and dah - dbh = 0 < dab.

Notice that the assumption that each excitation function may be modeled as a single gate was used, as shown above, by [10] and [26].

X. EXPERIMENTAL RESULTS

All the algorithms described in this paper have been implemented within SIS, a sequential logic synthesis system developed at the University of California, Berkeley [42], [41]. We applied them to a set of STG's taken both from the literature and from a real industrial application, a multiprocessor interconnection system [43].

We will show the influence of the following factors on the final result:

1) the use of constrained logic synthesis to reduce the area and delay of the final circuit implementation.

2) the use of some information on the delay between output transitions and input transitions due to the nonzero response time of the environment (while all the other results in this section were obtained by conservatively estimating the environment delays to be zero).

3) the type of gate library available for the implementation.

TABLE I
INFLUENCE OF LOGIC OPTIMIZATION

                Unoptimized                       Optimized
                With hazards     Without hazards  With hazards     Without hazards
example           Area   Delay     Area   Delay     Area   Delay     Area   Delay
chu133             224     3.2      256     5.4      224     3.2      256     5.4
chu150             176     3.4      176     3.4      176     3.4      176     3.4
chu172             128     3.0      192     5.2      128     3.0      192     5.2
converta           376     4.6      376     4.6      336     4.6      336     4.6
ebergen            240     4.4      272     5.2      240     4.4      272     5.2
full               224     4.2      224     4.2      224     4.2      224     4.2
hazard             200     4.2      232     5.2      200     4.2      232     5.2
hybridf            312     4.8      376     5.2      312     4.8      376     5.2
nowick             232     4.4      232     4.4      232     4.4      232     4.4
alloc-outbound     272     4.0      304     5.6      272     4.0      304     5.6
mp-forward-pkt     272     4.2      336     5.8      272     4.2      336     5.8
nak-pa             320     4.0      352     5.2      288     4.0      320     5.2
pe-rcv-ifc         776     6.0     1000    10.8      776     6.0     1000    10.8
pe-send-ifc        920     7.2     1176    12.0      904     7.6     1160    11.0
ram-read-sbuf      352     4.6      448     7.0      352     4.6      448     7.0
rcv-setup          128     2.8      128     2.8      128     2.8      128     2.8
sbuf-ram-write     328     4.6      424     6.4      328     4.6      424     6.4
sbuf-read-ctl      256     3.4      256     3.4      256     3.4      256     3.4
sbuf-send-ctl      280     3.4      280     3.4      280     3.4      280     3.4
sbuf-send-pkt2     320     4.2      352     5.0      320     4.2      352     5.0
sendr-done         104     2.8      104     2.8      104     2.8      104     2.8
qr42               240     4.4      272     5.2      240     4.4      272     5.2
rpdft              176     4.0      176     4.0      176     4.0      176     4.0
vbe10b             680     4.6      872     7.0      680     4.6      872     7.0
vbe5b              232     4.4      360     7.6      232     4.4      360     7.6
vbe5c              160     2.8      160     2.8      160     2.8      160     2.8
wrdatab            648     6.0      744     6.4      648     6.0      744     6.4
total             8576   113.6    10080   146.0     8488   114.0     9992   145.0

We will also give a comparison with a straightforward synchronous implementation of (roughly) the same functionality. This implementation was obtained by interpreting the SG as a Finite State Machine (as outlined in [22]), minimizing it and then performing standard state assignment and synchronous logic synthesis. So the synchronous circuit would implement exactly the same behavior as the asynchronous circuit if we assumed infinite clock speed.

In all the tables, the columns labeled “area” give the total area (excluding routing) of each circuit, while the columns labeled “delay” give the maximum delay in the combinational logic block implementing each output signal, using a “generic” standard cell library including asynchronous gates (e.g., SR flip-flops and C-elements). As a reference point, in this library the inverter area is 16 units, and its delay is 1 unit plus 0.2 units for each driven gate. Notice that the delay column is not meant to give an absolute measure of operating speed, but only an idea of the trade-offs implied by the synthesis choices.
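The inverter reference point quoted above corresponds to a simple load-dependent delay model, sketched here. Only the two constants (area 16 units, delay 1 unit plus 0.2 units per driven gate) come from the text; the function names and the fanout values in the example are assumptions made for illustration.

```python
INVERTER_AREA = 16          # area units, from the library description
BASE_DELAY = 1.0            # delay units with no load
DELAY_PER_DRIVEN_GATE = 0.2


def inverter_delay(driven_gates: int) -> float:
    """Delay of one inverter driving the given number of gates."""
    return BASE_DELAY + DELAY_PER_DRIVEN_GATE * driven_gates


def chain_delay(fanouts) -> float:
    """Total delay of inverters in series, one fanout count per stage."""
    return sum(inverter_delay(n) for n in fanouts)


# Two inverters in series, each driving a single gate (hypothetical fanouts).
TWO_INVERTERS = chain_delay([1, 1])
TWO_INVERTERS_AREA = 2 * INVERTER_AREA
```

Under this model, delay values in the tables can be read as multiples of a lightly loaded inverter delay, which is the sense in which they indicate trade-offs rather than absolute speed.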

Table I shows the influence of logic optimization on the result of the synthesis and hazard elimination steps. The column labeled “Unoptimized with hazards” gives the result of a straightforward technology mapping of the initial next-state function, as obtained by Procedures 4.1 and 5.1. The column labeled “Unoptimized without hazards” gives the result of the application of the hazard elimination procedure to this unoptimized implementation. The column labeled “Optimized with hazards” gives the results of constrained logic synthesis (using only the distributive and associative laws), while the column labeled “Optimized without hazards” gives the result of the hazard elimination procedure applied to this optimized implementation. Notice that Theorems 5.1 and 5.2 were used in all cases to obtain a standard cell implementation.

TABLE II
INFLUENCE OF SOME KNOWLEDGE OF ENVIRONMENT DELAY

                                 Zero-delay       2 inverter delays
                With hazards     Without hazards  Without hazards
example           Area   Delay     Area   Delay     Area   Delay
chu133             224     3.2      256     5.4      224     3.2
chu150             176     3.4      176     3.4      176     3.4
chu172             128     3.0      192     5.2      128     3.0
converta           336     4.6      336     4.6      336     4.6
ebergen            240     4.4      272     5.2      240     4.4
full               224     4.2      224     4.2      224     4.2
hazard             200     4.2      232     5.2      200     4.2
hybridf            312     4.8      376     5.2      344     5.0
nowick             232     4.4      232     4.4      232     4.4
alloc-outbound     272     4.0      304     5.6      272     4.0
mp-forward-pkt     272     4.2      336     5.8      272     4.2
nak-pa             288     4.0      320     5.2      288     4.0
pe-rcv-ifc         776     6.0     1000    10.8      872     8.4
pe-send-ifc        904     7.6     1160    11.0     1000    10.0
ram-read-sbuf      352     4.6      448     7.0      352     4.6
rcv-setup          128     2.8      128     2.8      128     2.8
sbuf-ram-write     328     4.6      424     6.4      328     4.6
sbuf-read-ctl      256     3.4      256     3.4      256     3.4
sbuf-send-ctl      280     3.4      280     3.4      280     3.4
sbuf-send-pkt2     320     4.2      352     5.0      320     4.2
sendr-done         104     2.8      104     2.8      104     2.8
qr42               240     4.4      272     5.2      240     4.4
rpdft              176     4.0      176     4.0      176     4.0
vbe10b             680     4.6      872     7.0      744     7.0
vbe5b              232     4.4      360     7.6      264     5.2
vbe5c              160     2.8      160     2.8      160     2.8
wrdatab            648     6.0      744     6.4      648     6.0
total             8488   114.0     9992   145.0     8808   122.2

Table II shows how some knowledge about the delay of the environment can greatly improve the synthesis result. The column labeled “Zero-delay” shows the result of the hazard elimination procedure if we assume that the environment instantaneously responds to an output transition with a new set of input transitions. The column labeled “2 inverter delays” shows the result if we assume that the delay of the environment is at least equal to the delay of two inverters of our standard cell library. This is used in the hazard elimination step when estimating the value of each external delay (dab in Procedure 7.1).
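The size of this improvement can be read off the “total” row of Table II. The short sketch below computes the relative cost of hazard elimination under each environment assumption, taking the “with hazards” totals as the baseline; the totals are those reported in the table, while the helper name and the choice of baseline are assumptions of this illustration.

```python
# (total area, total delay) from the last row of Table II.
TOTALS = {
    "with hazards": (8488, 114.0),
    "zero-delay": (9992, 145.0),
    "2 inverter delays": (8808, 122.2),
}


def overhead_percent(value, baseline):
    """Relative increase of value over baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


BASE_AREA, BASE_DELAY = TOTALS["with hazards"]
OVERHEAD = {
    name: (overhead_percent(area, BASE_AREA), overhead_percent(delay, BASE_DELAY))
    for name, (area, delay) in TOTALS.items()
    if name != "with hazards"
}
```

On these totals the area overhead of hazard elimination drops from roughly 18% under the zero-delay assumption to roughly 4% when two inverter delays of environment response are assumed.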

Table III shows the influence of the available library on the synthesis results. We used two different libraries: one (Library 1) with a large set of combinational functions, and one (Library 2) with only a few gates, whose delays are widely unbalanced both among gates and among different inputs to the same gate, to roughly simulate the influence of very long routing lines.

Table IV compares an asynchronous and a synchronous implementation of roughly the same behavior, as described above. This table is not meant to be a “fair” comparison between the synchronous and asynchronous design styles, because the specifications were designed with an asynchronous implementation in mind. It is meant to show that an asynchronous implementation does not automatically imply a loss in area and/or performance.

XI. CONCLUSIONS AND FUTURE WORK

The principal target of this paper was to show that each live STG with the CSC property has a hazard-free asynchronous implementation, using the bounded wire-delay model.

In order to prove this, we gave a synthesis procedure, and we examined the hazard properties of the result of each synthesis step, taking care that we did not introduce new causes of hazards, and that we eliminated all hazards at the end. Hazard elimination takes advantage of the fact that we can always add delays to STG signals in order to eliminate hazards, without violating the specification.

One important consequence of this work is that persistence can no longer be considered a necessary condition for hazard-free implementation. This is a desirable result, since enforcing persistence reduces the concurrency at the STG level [26].

The derivation of necessary and sufficient conditions for hazard-free implementation under bounded wire-delays is an interesting area for future development. For example, the work of [52] and of [20] can be considered an interesting step in that direction, for the less realistic unbounded gate-delay model.

Another opportunity for future development is the application of logic synthesis techniques to eliminate hazards during the implementation and optimization phases, rather than as a post-processing step.

APPENDIX A A SYNTHESIS EXAMPLE

In this section we will simulate by hand the execution of the synthesis procedure for an example taken from [26]. The STG, with the initial marking, appears in Fig. 2, and its SG appears in Fig. 3(b).

Let us implement an on-set and an off-set cover for each signal. The initial value vector is $x\bar{y}\bar{z}$.

x: only y+ can fire without enabling x-. We add $x\bar{z}$ to F (since the value of x is 1 in the current value vector).

1) Fire y+. The value vector becomes $xy\bar{z}$.
   - Nothing can fire without enabling x-, so we do not add cubes. Fire z+. The value vector becomes $xyz$.
   - x- can fire. The value vector becomes $\bar{x}yz$. We add $yz$ to R (since the value of x is 0 in the current value vector). Fire x-. The value vector remains $\bar{x}yz$.
   - z- can fire. We add $\bar{x}y$ to R. Fire z-. The value vector becomes $\bar{x}y\bar{z}$.
   - Nothing can fire without enabling x+, so we do not add cubes. Fire y-. The value vector becomes $\bar{x}\bar{y}\bar{z}$.
   - x+ can fire. The value vector becomes $x\bar{y}\bar{z}$. We add $\bar{y}\bar{z}$ to F. We reach an old marking, so we return.
2) Fire z+. The value vector becomes $x\bar{y}z$.
   - x-, y+ can fire. The value vector becomes $\bar{x}\bar{y}z$. We add $z$ to R.
   a) Fire x-. The value vector remains $\bar{x}\bar{y}z$.
      * y+ can fire. We add $\bar{x}z$ to R. We reach an old marking, so we return.
   b) Firing y+ would reach an old marking, so we return.

So we generate $F = x\bar{z} + \bar{y}\bar{z}$ and $R = yz + \bar{x}y + z + \bar{x}z$. F is already a minimum prime cover, and we obtain an implementation of x as described in Fig. 11(a).

y: y+, z+ can fire. The value vector becomes $xy\bar{z}$. We add $x$ to F.

1) Fire y+. The value vector remains $xy\bar{z}$.
   - z+ can fire. So we add $xy$ to F. Fire z+. The value vector becomes $xyz$.
   - x- can fire. We add $yz$ to F. Fire x-. The value vector becomes $\bar{x}yz$.
   - Nothing can fire without enabling y-, so we do not add cubes. Fire z-. The value vector becomes $\bar{x}y\bar{z}$.
   - y- can fire. The value vector becomes $\bar{x}\bar{y}\bar{z}$. We add $\bar{x}\bar{z}$ to R. Fire y-. The value vector remains $\bar{x}\bar{y}\bar{z}$.
   - Nothing can fire without enabling y+, so we do not add cubes. We reach an old marking, so we return.
2) Fire z+. The value vector becomes $x\bar{y}z$.
   - x-, y+ can fire. The value vector becomes $xyz$. We add $z$ to F.
   a) Fire x-. The value vector becomes $\bar{x}yz$.
      * y+ can fire. We add $\bar{x}z$ to F. We reach an old marking, so we return.
   b) Firing y+ would reach an old marking, so we return.

So we generate $F = x + xy + yz + z + \bar{x}z$ and $R = \bar{x}\bar{z}$. We can delete covered cubes obtaining $F = x + z$, and the implementation of y described in Fig. 11(b).

TABLE III
COMPARISON BETWEEN DIFFERENT LIBRARIES

                Library 1                         Library 2
                With hazards     Without hazards  With hazards     Without hazards
example           Area   Delay     Area   Delay     Area   Delay     Area   Delay
chu133             224     3.2      256     5.4      256     8.2      384    13.0
chu150             176     3.4      176     3.4      248    14.2      408    19.2
chu172             128     3.0      192     5.2      144     7.8      240    12.6
converta           336     4.6      336     4.6      536    14.8      696    17.2
ebergen            240     4.4      272     5.2      320     7.8      352    10.2
full               224     4.2      224     4.2      240     4.2      240     4.2
hazard             200     4.2      232     5.2      216     7.8      248    10.2
hybridf            312     4.8      376     5.2      352     5.2      416     7.6
nowick             232     4.4      232     4.4      288    15.4      288    15.4
alloc-outbound     272     4.0      304     5.6      312    10.4      376    10.4
mp-forward-pkt     272     4.2      336     5.8      352     7.0      512     9.4
nak-pa             288     4.0      320     5.2      368    15.2      432    15.2
pe-rcv-ifc         776     6.0     1000    10.8      984    26.8     1688    43.6
pe-send-ifc        904     7.6     1160    11.0     1256    19.0     2056    29.4
ram-read-sbuf      352     4.6      448     7.0      456    14.2      744    18.6
rcv-setup          128     2.8      128     2.8      176     7.8      176     7.8
sbuf-ram-write     328     4.6      424     6.4      464    14.0      688    18.8
sbuf-read-ctl      256     3.4      256     3.4      256     6.8      288     6.8
sbuf-send-ctl      280     3.4      280     3.4      360     8.2      392    10.2
sbuf-send-pkt2     320     4.2      352     5.0      360     9.2      424    10.4
sendr-done         104     2.8      104     2.8      104     6.6      136     6.6
qr42               240     4.4      272     5.2      320     7.8      352    10.2
rpdft              176     4.0      176     4.0      248    15.2      344    22.4
vbe10b             680     4.6      872     7.0      872    15.8     1544    27.6
vbe5b              232     4.4      360     7.6      288     9.4      512    15.2
vbe5c              160     2.8      160     2.8      216     9.0      280     9.0
wrdatab            648     6.0      744     6.4      832    14.4     1216    14.4
total             8488   114.0     9992   145.0    10824   302.2    15432   395.6


TABLE IV
ASYNCHRONOUS AND SYNCHRONOUS IMPLEMENTATION

                Asynchronous                      Synchronous
                With hazards     Without hazards
example           Area   Delay     Area   Delay     Area   Delay
chu133             224     3.2      256     5.4      568     5.8
chu150             176     3.4      176     3.4      432     4.4
chu172             128     3.0      192     5.2      392     4.2
converta           336     4.6      336     4.6      468     5.6
ebergen            240     4.4      272     5.2      380     3.2
full               224     4.2      224     4.2      336     5.8
hazard             200     4.2      232     5.2      252     3.0
hybridf            312     4.8      376     5.2      608     6.2
nowick             232     4.4      232     4.4      472     8.2
alloc-outbound     272     4.0      304     5.6      472     7.0
mp-forward-pkt     272     4.2      336     5.8      304     4.6
nak-pa             288     4.0      320     5.2      336     6.6
pe-rcv-ifc         776     6.0     1000    10.8      808    16.6
pe-send-ifc        904     7.6     1160    11.0     1160    17.0
ram-read-sbuf      352     4.6      448     7.0      440     8.4
rcv-setup          128     2.8      128     2.8      296     5.4
sbuf-ram-write     328     4.6      424     6.4      720    12.6
sbuf-read-ctl      256     3.4      256     3.4      400     6.0
sbuf-send-ctl      280     3.4      280     3.4      464     5.8
sbuf-send-pkt2     320     4.2      352     5.0      272     6.8
sendr-done         104     2.8      104     2.8      160     4.2
qr42               240     4.4      272     5.2      380     3.2
rpdft              176     4.0      176     4.0      256     6.4
vbe10b             680     4.6      872     7.0     1224     8.6
vbe5b              232     4.4      360     7.6      512     5.6
vbe5c              160     2.8      160     2.8      440     4.2
wrdatab            648     6.0      744     6.4     1028     7.2
total             8488   114.0     9992   145.0    13580   182.6

z: y+, z+ can fire. The value vector becomes $x\bar{y}z$. We add $x$ to F.

1) Fire y+. The value vector becomes $xyz$.
   - z+ can fire. We add $xy$ to F. Fire z+. The value vector remains $xyz$.
   - Nothing can fire without enabling z-, so we do not add cubes. Fire x-. The value vector becomes $\bar{x}yz$.
   - z- can fire. The value vector becomes $\bar{x}y\bar{z}$. We add $\bar{x}y$ to R. Fire z-. The value vector remains $\bar{x}y\bar{z}$.
   - y- can fire. We add $\bar{x}\bar{z}$ to R. Fire y-. The value vector becomes $\bar{x}\bar{y}\bar{z}$.
   - Nothing can fire without enabling z+, so we do not add cubes.
   - We reach an old marking, so we return.
2) Fire z+. The value vector remains $x\bar{y}z$.
   - y+ can fire. We add $xz$ to F. x- can fire. We add $\bar{y}z$ to F.
   a) Fire x-. The value vector becomes $\bar{x}\bar{y}z$.
      * Nothing can fire without enabling z-, so we do not add cubes. We reach an old marking, so we return.
   b) Firing y+ would reach an old marking, so we return.

So we generate $F = x + xy + xz + \bar{y}z$ and $R = \bar{x}y + \bar{x}\bar{z}$. We can delete covered cubes obtaining $F = x + \bar{y}z$, and the implementation of z described in Fig. 11(c).
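The covers obtained in this walk-through can be cross-checked against the state graph. The sketch below is an illustration under stated assumptions, not the authors' Procedure 4.1: the state graph is the one implied by the firing sequences above (it is not copied from Fig. 3(b)), and the complemented literals in the covers are read as $F_x = x\bar{z} + \bar{y}\bar{z}$, $R_x = yz + \bar{x}y + z + \bar{x}z$, $F_y = x + z$, $R_y = \bar{x}\bar{z}$, $F_z = x + \bar{y}z$, $R_z = \bar{x}y + \bar{x}\bar{z}$. For every state, F should equal the implied value of the signal (its current value, flipped if one of its transitions is enabled) and R should equal the complement.

```python
# States are (x, y, z) bit tuples; each maps enabled transitions to successor states.
SG = {
    (1, 0, 0): {"y+": (1, 1, 0), "z+": (1, 0, 1)},
    (1, 1, 0): {"z+": (1, 1, 1)},
    (1, 0, 1): {"x-": (0, 0, 1), "y+": (1, 1, 1)},
    (1, 1, 1): {"x-": (0, 1, 1)},
    (0, 0, 1): {"y+": (0, 1, 1)},
    (0, 1, 1): {"z-": (0, 1, 0)},
    (0, 1, 0): {"y-": (0, 0, 0)},
    (0, 0, 0): {"x+": (1, 0, 0)},
}
INDEX = {"x": 0, "y": 1, "z": 2}

# Covers as lists of cubes; a cube maps a signal to its required value.
COVERS = {
    "x": {"F": [{"x": 1, "z": 0}, {"y": 0, "z": 0}],
          "R": [{"y": 1, "z": 1}, {"x": 0, "y": 1}, {"z": 1}, {"x": 0, "z": 1}]},
    "y": {"F": [{"x": 1}, {"z": 1}],
          "R": [{"x": 0, "z": 0}]},
    "z": {"F": [{"x": 1}, {"y": 0, "z": 1}],
          "R": [{"x": 0, "y": 1}, {"x": 0, "z": 0}]},
}


def implied_value(state, signal):
    """Current value of the signal, flipped if one of its transitions is enabled."""
    current = state[INDEX[signal]]
    enabled = any(t[:-1] == signal for t in SG[state])
    return 1 - current if enabled else current


def evaluate(cover, state):
    """Value of a sum-of-cubes cover in the given state."""
    return int(any(all(state[INDEX[s]] == v for s, v in cube.items()) for cube in cover))


def covers_match(signal):
    """F equals the implied value and R equals its complement in every state."""
    return all(
        evaluate(COVERS[signal]["F"], s) == implied_value(s, signal)
        and evaluate(COVERS[signal]["R"], s) == 1 - implied_value(s, signal)
        for s in SG
    )


ALL_MATCH = {sig: covers_match(sig) for sig in INDEX}
```

A mismatch for some signal would indicate either a misread complement bar in a cube or a state graph that differs from the one assumed here.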

The following valid state pairs are associated with paths that might produce hazard conditions for each signal:

x:

1) State pair (000, 110), with set of paths {(x+ → y+)} and function value 1. Consider the on-set cubes $x\bar{z}$ and $\bar{y}\bar{z}$. The two transitions that turn them on and off are x+ and y+, respectively. A pair of vectors causing the falling transition along the circuit path from input y through cube $\bar{y}\bar{z}$ to output x is ($\bar{y}\bar{z}$, $y\bar{z}$). This gives us dbh. A pair of vectors causing the rising transition along the circuit path from input x through cube $x\bar{z}$ to output x is ($\bar{x}\bar{z}$, $x\bar{z}$). This gives us dah. dab can be measured on the circuit for output y by applying transition x+, that is, from vertex $\bar{x}\bar{y}\bar{z}$ to $x\bar{y}\bar{z}$. dab is the rising delay of an OR gate, and dbh is the delay through a NOR gate plus the set delay of an SR flip-flop; their sum is in general going to be larger than dah, the feedback loop delay of the SR flip-flop. Otherwise we must slow down signal y.

2) State pair (101, 010), with set of paths {(x- → y+ → z-), (y+ → x- → z-)} and function value 0. Consider the following on-set cubes:
   a) $x\bar{z}$. The two transitions that turn it off and on are x- and z-, respectively. Then dbh is measured from z through cube $x\bar{z}$ to x, dah is measured from x through cube $x\bar{z}$ to x and dab is measured from x through cube $x$, signal y and cube $\bar{y}z$ to z (notice that here the dependency between x- and z- is through another signal, namely y).
   b) $\bar{y}\bar{z}$. The two transitions that turn it off and on are y+ and z-, respectively. Then dbh is measured from y through cube $\bar{y}\bar{z}$ to x, dah is measured from z through cube $\bar{y}\bar{z}$ to x and dab is measured from y through cube $\bar{y}z$ to z.

y: Notice that the circuit for output y is purely combinational in this particular case.

1) State pair (100, 011), with set of paths {(z+ → x- → y+), (z+ → y+ → x-), (y+ → z+ → x-)} and function value 1. Consider the on-set cubes $z$ and $x$. The two transitions that turn them on and off are z+ and x-, respectively. Then dbh is measured from x through cube $x$ to y, dah is measured from z through cube $z$ to y and dab is measured from z through cube $x\bar{z}$ to x.

z:

1) State pair (100, 001), with set of paths {(z+ → x-)} and function value 1. Consider the on-set cubes $\bar{y}z$ and $x$. The two transitions that turn them on and off are z+ and x-, respectively. Then dbh is measured from x through cube $x$ to z, dah is measured from z through cube $\bar{y}z$ to z and dab is measured from z through cube $x\bar{z}$ to x.

2) State pair (011, 000), with set of paths {(z- → y-)} and function value 0. Consider the on-set cube $\bar{y}z$. The two transitions that turn it off and on are z- and y-, respectively. Then dbh is measured from y through cube $\bar{y}z$ to z, dah is measured from z through cube $\bar{y}z$ to z and dab is measured from z through cube $z$ to y.

LAVAGNO et al.: SYNTHESIS OF HAZARD-FREE ASYNCHRONOUS CIRCUITS 85

ACKNOWLEDGMENT [24] A. Martin, “The limitations to delay-insensitivity in asynchronous cir-

The authors would like to thank C. Moon, R. Brayton, S . Nowick, D. Dill, P. Beerel, T. Meng, T.-A. Chu, K. Stevens, and A. Saldanha for many useful discussions. Many thanks also to Jerry Burch for verifying the correctness of some of our circuits (thus increasing our confidence in the methodology), and to the anonymous reviewers for their help in improving the overall quality of the paper.

REFERENCES

[l] D. B. Armstrong, A. D. Friedman, and R. P. Menon, “Design of asynchronous circuits assuming unbounded gate delays,” IEEE Trans. Comput. , vol. C-18, no. 12, pp. 1110-1120, Dec. 1969.

[2] P. A. Beerel and T. H.-Y. Meng, “Automatic gate-level synthesis of speed-independent circuits,” in Proc. Int. Con$ Computer-Aided Design, Nov. 1992.

[3] - , “Gate-level synthesis of speed-independent asynchronous con- trol circuits,” in Proc. ACM Int. Wkshp. Timing Issues Specijication Synthesis Digital Syst. (TAU), Mar. 1992.

[4] C. Berthet and E. Cerny, “Synthesis of speed-independent circuits using set-memory elements,” in Proc. Int. Wkshp. Logic Architect. Synthesis Silicon Compilers, Grenoble, France, May 1988.

[5] R. K. Brayton, G. D. Hachtel, C. T. McMullen, and A. Sangiovanni- Vincentelli, Logic Minimization Algorithms for VLSI Synthesis. Nor- well, MA: Kluwer Academic, 1984.

[6] R. K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. R. Wang, “MIS: A multiple-level optimization system,” IEEE Trans. Computer- Aided Design, vol. CAD-6, no. 6, pp. 1062-1081, Nov. 1987.

[7] E. Brunvand and R. F. Sproull, “Translating concurrent programs into delay-insensitive circuits,” in Proc. Int. Con$ Computer-Aided Design, Nov. 1989, pp. 262-265.

[8] S. Bums and A. Martin, “A synthesis method for self-timed VLSI circuits,” in Proc. Int. Con$ Computer Design, 1987.

[9] T.-A. Chu, “On the models for designing VLSI asynchronous digital systems,” Integration: VLSI J., vol. 4, pp. 99-1 13, 1986.

[IO] - “Synthesis of self-timed VLSI circuits from graph-theoretic specifications,” Ph.D. dissertation, MIT, June 1987.

[11] E. Detjens, G. Gannot, R. Rudell, A. Sangiovanni-Vincentelli, and A. Wang, “Technology mapping in MIS,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1987, pp. 116-119.

[12] J. C. Ebergen, Translating Programs into Delay-Insensitive Circuits. Amsterdam, Netherlands: Centrum voor Wiskunde en Informatica, 1989.

[13] E. B. Eichelberger, “Hazard detection in combinational and sequential switching circuits,” IBM J. Res. Develop., vol. 9, Mar. 1965.

[14] M. Hack, “Analysis of production schemata by Petri Nets,” Tech. Rep. TR 94, Project MAC, MIT, 1972.

[15] D. A. Huffman, “The synthesis of sequential switching circuits,” J. Franklin Inst., vol. 257, pp. 161-190, 275-303, Mar. 1954.

[16] N. Ishiura, M. Takahashi, and S. Yajima, “Time-symbolic simulation for accurate timing verification,” in Proc. Design Automat. Conf., June 1989, pp. 497-502.

[17] N. Ishiura, M. Takahashi, and S. Yajima, “Coded time-symbolic simulation using shared binary decision diagrams,” in Proc. Design Automat. Conf., June 1990, pp. 130-135.

[18] M. A. Kishinevsky, A. Y. Kondratyev, A. R. Taubin, and V. I. Varshavsky, “On self-timed behavior verification,” in Proc. ACM Int. Wkshp. Timing Issues Specification Synthesis Digital Syst. (TAU), 1992.

[19] I. Kohavi and Z. Kohavi, “Detection of multiple faults in combinational logic networks,” IEEE Trans. Comput., vol. C-21, no. 6, pp. 556-558, June 1972.

[20] A. Kondratyev, M. Kishinevsky, B. Lin, P. Vanbekbergen, and A. Yakovlev, “On the conditions for gate-level speed-independence of asynchronous circuits,” in Proc. ACM Int. Wkshp. Timing Issues Specification Synthesis Digital Syst. (TAU), 1993.

[21] D. Kung, “Hazard-non-increasing gate-level optimization algorithms,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1992.

[22] L. Lavagno, C. W. Moon, R. K. Brayton, and A. Sangiovanni-Vincentelli, “Solving the state assignment problem for signal transition graphs,” in Proc. Design Automat. Conf., June 1992.

[23] L. Lavagno, N. Shenoy, and A. Sangiovanni-Vincentelli, “Linear programming for hazard elimination in asynchronous circuits,” J. VLSI Signal Process., vol. 7, nos. 1-2, pp. 137-160, 1994.

cuits,” in Proc. Conf. Advanced Res. VLSI, Apr. 1990.

[25] E. McCluskey, “Minimization of Boolean functions,” Bell Labs. Tech. J., Nov. 1956.

[26] T. Meng, “Asynchronous design for digital signal processing architectures,” Ph.D. dissertation, Univ. of Calif., Berkeley, Nov. 1988.

[27] R. E. Miller, “Chapter 10,” in Switching Theory, vol. 2. New York: Wiley, 1965, pp. 192-244.

[28] C. E. Molnar, T.-P. Fang, and F. U. Rosenberger, “Synthesis of delay-insensitive modules,” in Chapel Hill Conf. VLSI, May 1985, pp. 67-86.

[29] C. W. Moon, “On Synthesizing Logic from Signal Transition Graphs,” personal communication, 1990.

[30] C. W. Moon, P. R. Stephan, and R. K. Brayton, “Synthesis of hazard-free asynchronous circuits from graphical specifications,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1991.

[31] D. E. Muller and W. C. Bartky, “A theory of asynchronous circuits,” in Annals of Computing Lab. Harvard Univ., 1959, pp. 204-243.

[32] T. Murata, “Petri nets: Properties, analysis and applications,” Proc. IEEE, pp. 541-580, Apr. 1989.

[33] S. M. Nowick and D. L. Dill, “Automatic synthesis of locally-clocked asynchronous state machines,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1991.

[34] S. M. Nowick and D. L. Dill, “Exact two-level minimization of hazard-free logic with multiple-input changes,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1992.

[35] S. S. Patil and J. B. Dennis, “Speed independent asynchronous circuits,” in Proc. Hawaii Int. Conf. Syst. Sci., 1971, pp. 55-58.

[36] J. L. Peterson, “Petri nets,” ACM Computing Surveys, vol. 9, no. 3, Sept. 1977.

[37] C. A. Petri, “Kommunikation mit Automaten,” Ph.D. dissertation, Institut für Instrumentelle Mathematik, Bonn, 1962. (Tech. Rep. Schriften des IIM Nr. 3.)

[38] F. U. Rosenberger, C. E. Molnar, T. J. Chaney, and T.-P. Fang, “Q-modules, internally clocked delay-insensitive modules,” IEEE Trans. Comput., vol. 37, pp. 1005-1018, 1988.

[39] L. Y. Rosenblum and A. V. Yakovlev, “Signal graphs: From self-timed to timed ones,” in Int. Wkshp. Timed Petri Nets, Torino, Italy, 1985.

[40] C. L. Seitz, “Chapter 7,” in Introduction to VLSI Systems, C. Mead and L. Conway, Eds. Reading, MA: Addison Wesley, 1981.

[41] E. M. Sentovich, K. J. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. R. Stephan, R. K. Brayton, and A. Sangiovanni-Vincentelli, “SIS: A system for sequential circuit synthesis,” Tech. Rep. UCB/ERL M92/41, Univ. of Calif., Berkeley, May 1992.

[42] E. M. Sentovich, K. J. Singh, C. Moon, H. Savoj, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, “Sequential circuit design using synthesis and optimization,” in Proc. Int. Conf. Computer Design, Oct. 1992.

[43] K. S. Stevens, S. V. Robinson, and A. L. Davis, “The post office - communication support for distributed ensemble architectures,” in 6th Int. Conf. Distributed Computing Syst., 1986.

[44] I. E. Sutherland, “Micropipelines,” Communicat. ACM, June 1989. Turing Award Lecture.

[45] J. H. Tracey, “Internal state assignments for asynchronous sequential machines,” IEEE Trans. Electron. Comput., vol. EC-15, no. 4, pp. 551-560, Aug. 1966.

[46] J. T. Udding, “A formal model for defining and classifying delay-insensitive circuits and systems,” Distributed Computing, vol. 1, pp. 197-204, 1986.

[47] S. H. Unger, Asynchronous Sequential Switching Circuits. New York: Wiley Interscience, 1969.

[48] P. Vanbekbergen, “Optimized synthesis of asynchronous control circuits from graph-theoretic specifications,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1990, pp. 184-187.

[49] P. Vanbekbergen, B. Lin, G. Goossens, and H. De Man, “A generalized state assignment theory for transformations on signal transition graphs,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1992, pp. 112-117.

[50] V. I. Varshavsky, M. A. Kishinevsky, V. B. Marakhovsky, V. A. Peschansky, L. Y. Rosenblum, A. R. Taubin, and B. S. Tzirlin, Self-timed Control of Concurrent Processes. Norwell, MA: Kluwer Academic, 1990. (Russian edition: 1986.)

[51] A. V. Yakovlev, “On limitations and extensions of STG model for designing asynchronous control circuits,” in Proc. Int. Conf. Comput. Design, Oct. 1992, pp. 396-400.

[52] A. V. Yakovlev, L. Lavagno, and A. Sangiovanni-Vincentelli, “A unified signal transition graph model for asynchronous control circuit synthesis,” in Proc. Int. Conf. Computer-Aided Design, Nov. 1992.

[53] A. V. Yakovlev and A. Petrov, “Petri nets and parallel bus controller design,” in IEEE Comput. Soc. Int. Conf. Applicat. Theory Petri Nets, Paris, France, June 1990.


Luciano Lavagno (S’88-M’93), for a photograph and biography, see this issue, p. 60.

Alberto L. Sangiovanni-Vincentelli (M’74-SM’81-F’83), for a photograph and biography, see this issue, p. 44.

Kurt Keutzer (S’83-M’84-SM’94) received the B.S. degree in mathematics from Maharishi International University in 1978, and the M.S. and Ph.D. degrees in computer science from Indiana University in 1981 and 1984, respectively.

In 1984 he joined AT&T Bell Laboratories, where he worked to apply various computer science disciplines to practical problems in computer-aided design. In 1991 he joined Synopsys, Inc., where he continues his work as Director of Research. His research in technology mapping led to the inclusion of a paper in the anthology “Twenty-five Years of Electronic Design Automation.” His investigations into synthesis for testability, asynchronous synthesis, and timing verification have led to DAC best paper awards in 1990 and 1991, as well as an ICCAD distinguished paper citation in 1991 and an ICCD best paper award in 1992.

Dr. Keutzer presently serves on the editorial boards of three journals: IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS; Formal Methods in System Design; and Integration - the VLSI Journal. He currently serves on the technical program committees of DAC and Euro-DAC, and he has served on numerous other technical program and executive committees in recent years.