
An Integrated Multicomponent Synthesis Environment for MCMs

Ranga Vemuri, Nand Kumar, Raghu Vutukuru, Prasad Subba Rao, Praveen Sinha, Ning Ren, Paddy Mamtora, Ram Mandayam, Ram Vemuri, and Jayanta Roy
University of Cincinnati

This vertically integrated design environment supports the development of application-specific MCMs that involve low-volume production and short lead times.

Design and development of application-specific multichip modules (that is, MCMs that contain many application-specific integrated circuits) is growing in importance, especially in defense and aerospace applications. Low production quantities and the need to produce working systems quickly in these application areas do not justify the long and expensive development methodologies traditionally used for MCMs.¹ An integrated CAD environment for application-specific MCMs should offer the following facilities:

Automated synthesis. To significantly reduce design times, automated synthesis techniques must be employed at all stages of design, including behavioral, register transfer, logic, chip layout, and package layout.

Design verification. Simulation is still the most viable technique for verifying design correctness. The design environment must have efficient simulation tools at all levels of design abstraction. Test vectors required for simulation must be generated automatically as far as possible, and to further reduce expensive verification time, simulation results must be compared automatically with the expected responses.

Testability support. Complexity, poor accessibility, and a heterogeneous component mix make MCM testing a tedious task. MCM designs must address testability. This may require the use of an appropriate test architecture such as boundary scan or built-in self-test.

HDL-centered design environment. To facilitate automated synthesis and design validation at each step in the design process, textual formalisms such as hardware description languages (HDLs) and other standard, widely accepted formats must be used in addition to graphical front ends and schematic capture tools.

Performance-driven design flow. If not carefully controlled, the use of synthesis tools can adversely affect performance. The design environment must have provisions that let the designer override synthesis tools whenever desired. Further, all synthesis tools should be capable of taking some specified performance goals into account.


Over the past several years research and development have brought many of these ideas to maturity. For example, many CAD vendors offer silicon compilers, logic synthesis, and layout synthesis tools for ASICs. MCM place-and-route tools with varying degrees of efficiency are available, as are efficient simulators at all levels of abstraction. Standardization of VHDL (VHSIC Hardware Description Language)² and the widespread availability of VHDL simulators are accelerating the use of HDL-centered design environments. Similarly, test-related standards such as IEEE 1149.1 for boundary scan³ and IEEE 1029.1 WAVES (Waveform and Vector Exchange Specification)⁴ encourage the use of structured testability techniques in large multicomponent designs.

Other topics are still under investigation. Designers have not yet accepted behavioral synthesis, the process of generating register-level designs from behavioral specifications.⁵ Partitioning algorithms, used for generating multichip designs from large monolithic specifications, are still in active research. In spite of rapid advances in test research, there are no commonly available tools for test-bench compilation and automated test-vector generation for large designs.

We have been developing an integrated design environment for multichip modules. The environment, called MSS (Multicomponent Synthesis System), contains several research tools developed over the past three years, along with several industrial-strength tools. The MSS environment is centered in VHDL, WAVES, and PDL (Performance Description Language, a notation for describing the performance attributes of a design). MSS provides four levels of automated synthesis support all the way from the behavioral level to MCM placement and routing; three levels of simulation support, including behavioral, register, and switch levels; and tools for automated test-bench compilation and design validation for all synthesized designs.


MSS environment

Figure 1 shows the process of MCM design synthesis using the MSS tools. First, the user prepares an executable behavioral specification, in VHDL, of the digital system to be designed and employs a standard VHDL simulator to verify its behavior. The user also specifies the performance goals in PDL and employs the PDL interpreter to check the correctness of PDL programs.

The user then invokes the behavioral synthesis tool, called DSS (Distributed Synthesis System), which generates a register-level design, also in VHDL, using a parameterized register-level module-generator library.

Following behavioral synthesis, the partitioning engine invokes a collection of algorithms that partition the design across multiple VLSI chips. The resulting multicomponent design meets the stated pin-limit and area constraints on individual chips. The register-level description of each chip i is generated as a VHDL file C_i, and the interconnections among the chips are described in a separate top-level VHDL entity file C.

Figure 1. Flowchart for the Multicomponent Synthesis System.

April 1993 63

Next, the user optionally invokes the test-structure compiler. This compiler adds all the necessary hardware to implement the IEEE standard boundary-scan architecture for testing the multicomponent design. The resulting component designs are placed in individual VHDL files BC_i and a top-level connectivity file BC.

The component designs are processed by a silicon compiler that generates fabricatable mask layouts. We currently use the LagerIV tools from Mississippi State University and the University of California at Berkeley, and the Generator Development Tools (GDT) from Mentor Graphics. We developed translators to convert register-level VHDL into the formats required by these tools. The resulting layouts are saved in files L_i.

    entity find is
      port (x     : in  bit_vector(3 downto 0);    --%width 4
            index : out bit_vector(3 downto 0));   --%width 4
    end find;

    architecture find of find is
      type int_array is array (0 to 7) of bit_vector(3 downto 0);
      signal list : int_array := ("1000", "0111", "0110", "0101",
                                  "0100", "0001", "0010", "0011");
    begin
      sort : process
        variable i, j : integer := 0;
        variable low, high, mid, found : integer := 0;
        variable temp : bit_vector(3 downto 0);
        variable sorted : bit := '0';
      begin
        if (sorted = '0') then                       -- bubble sort
          i := 0;
          while (i < 8) loop
            j := i + 1;
            while (j < 8) loop
              if bits_to_int(list(j)) < bits_to_int(list(i)) then
                temp := list(j);
                list(j) <= list(i);
                list(i) <= temp;
                wait for 0 ns;                       -- for synchronization
              end if;
              j := j + 1;
            end loop;
            i := i + 1;
          end loop;
          sorted := '1';
        end if;

        index <= "1111";                             -- binary search
        low := 0;
        high := 8;
        found := 0;
        while ((low < high) and (found = 0)) loop
          mid := (low + high) / 2;
          if (x = list(mid)) then found := 1; end if;
          if (x > list(mid)) then low := mid + 1; end if;
          if (x < list(mid)) then high := mid; end if;
        end loop;

        if (found = 1) then index <= int_to_bits4(mid); end if;
        wait on x;
      end process;
    end find;

Figure 2. Behavioral specification of the Find in VHDL.

The packaging compiler is the final step in the synthesis process. At its heart is an MCM place-and-route system. We currently use MCM, hybrid, and printed-circuit-board tools from Mentor Graphics for this purpose. We have developed several translators to generate appropriate net-list and geometry files to drive these place-and-route tools.

Design validation in the MSS environment is accomplished through simulation at various stages. As stated earlier, a VHDL simulator validates the behavioral specification. The test vectors used for this purpose and the test results are stored in WAVES format. A multicomponent test-bench compiler generates individual test benches (test vectors plus expected responses) for each component in the multicomponent design. Each test bench T_i is generated in WAVES notation. Also generated is a test bench T to validate the entire design, including the interconnect. Each component design BC_i is simulated using the test bench T_i to verify the correctness of the component designs. The entire design is also simulated using T.

The MSS environment also incorporates a switch-level test-bench compiler capable of accepting a WAVES test bench and generating a switch-level test bench. This compiler generates test benches ST_i in a format suitable for driving a switch-level simulator. These test benches simulate the switch-level models extracted from the mask layouts. If desired, the entire design can be simulated at the switch level. However, this is usually avoided because it requires an inordinate amount of simulation time.

Thermal analysis and cross-talk analysis tools are available within the packaging compiler. The user conducts these analyses; if the results are not satisfactory, a corrective action can be taken and the design process repeated from any of the previous steps. For example, alternative partitions could be generated by relaxing the constraints and reinvoking either the behavioral synthesis system or the partitioning engine, or both.

COMPUTER


All tools in the MSS environment execute on Sun Sparcstations and are written primarily in C++ (with some pieces in C and Prolog).

Three examples

Three tutorial examples in this article illustrate MSS algorithms and results. The primary example is the Find, which performs a bubble sort followed by binary search. We use it as the running example because it is small. Such small specifications, however, do not require MCMs. We use two larger examples, the Move Machine and the Viper Microprocessor, to illustrate the results.

Find. Find has an array of eight 4-bit numbers. On power-up, Find sorts these numbers in ascending order. Then it accepts a 4-bit number, searches for the number in this array, and if found, returns the array index. Figure 2 lists its behavioral specification. The interface to the Find entity is kept quite simple for the sake of clarity. Writing a behavioral specification is the first and most important step in generating an MCM design using MSS. Tools in MSS can only interpret the VHDL constructs used in the behavioral specification; they have no way of determining whether the specification itself is "correct." The user must ensure that the specification reflects what he or she wants.

Move Machine. The Move Machine is a small instruction-set processor. It has a 10-bit instruction pointer, 16-bit instruction register, four 16-bit general-purpose registers, and a set of 16 instructions with four addressing modes. It assumes a 1-Kbyte, 16-bit main memory. The VHDL specification of the Move Machine is given in Roy et al.⁶

Viper Microprocessor. Viper was the first microprocessor to be specified, designed, and implemented with formal verification in mind.⁷ Viper's 32-bit architecture assumes a 1-Mbyte main memory, two 32-bit general-purpose registers, a 32-bit accumulator, and a 12-bit instruction pointer. Viper has 32 instructions, 16 addressing modes, and eight destination selection modes. The VHDL specification of Viper's instruction set is about 450 lines long and is available to the public through the authors.
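As a cross-check on the Find behavior described above, the following is a minimal Python sketch of the same two phases: an exchange sort on power-up followed by a binary search. It is only a software model for illustration, not part of MSS. The names `bubble_sort`, `find`, and `NOT_FOUND` are assumptions introduced here; the value 15 stands for the "1111" pattern that the specification drives onto `index` before searching.

```python
# Illustrative software model of the Find example (names are hypothetical).

NOT_FOUND = 0b1111  # the specification sets index to "1111" before searching

# Initial contents of the eight-entry array, as given in the specification.
INITIAL_LIST = [0b1000, 0b0111, 0b0110, 0b0101, 0b0100, 0b0001, 0b0010, 0b0011]


def bubble_sort(values):
    """Return the values in ascending order using the nested i/j exchange loops."""
    items = list(values)
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[j] < items[i]:
                items[i], items[j] = items[j], items[i]
    return items


def find(sorted_items, x):
    """Binary search over the half-open range [low, high); return index or NOT_FOUND."""
    low, high = 0, len(sorted_items)
    while low < high:
        mid = (low + high) // 2
        if x == sorted_items[mid]:
            return mid
        if x > sorted_items[mid]:
            low = mid + 1
        else:
            high = mid
    return NOT_FOUND


SORTED_LIST = bubble_sort(INITIAL_LIST)
```

A model like this can serve as a reference when preparing test vectors for the behavioral simulation, since it makes the expected `index` value for each input `x` easy to compute.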


Figure 3. Register-level design of the Find, excluding multiplexers.

High-level synthesis

High-level synthesis is the process of generating a register-level design from a behavioral specification. The register-level design contains a data path and a finite-state controller. Our high-level synthesis system, called DSS (Distributed Synthesis System), is a collection of parallel algorithms.⁶ The data paths generated by DSS contain register-level modules selected from a parameterized module library. This module library contains VHDL descriptions of the register-level modules and information about the performance of each module (for example, its delay time). Since the modules are parameterized in terms of their bit-widths, the area and delay profiles are stored as equations or tables.

DSS accepts a behavioral specification in a synthesizable subset of VHDL⁶ and performance constraints in terms of the desired clock period and the upper limit on the area. DSS views the clock period as the maximum delay of the combinational blocks in any register transfer and the area constraint as a bound on the sum of the areas of the register-level components used in the data path.

The behavioral specification is first translated into a dataflow graph representation called VIF (VHDL Intermediate Format). The VIF representation is organized as a collection of segments, one segment for each process, subprogram, wait statement, and while-loop in the VHDL specification. The VIF representation is then loaded into DSS during the initialization phase.

DSS consists of the three major tasks described in this section. Figure 3 shows the register-level design generated by DSS for the Find specification. The entire system executes on several uniprocessor and multiprocessor platforms.

Scheduling and performance estimation. The operations in the dataflow graph are assigned relative time steps during scheduling, and the arithmetic operations are bound to physical arithmetic logic units available in the module library.

In DSS, the scheduler simultaneously explores many alternative designs. To do so, it first generates all valid module sets from the module library. A valid module set is a collection of register-level modules selected (with duplicates allowed) from the module library such that (a) the selected modules are sufficient to implement all operations in the dataflow graph, (b) no module in the module set has a delay greater than the specified clock period constraint, (c) together, the modules in the module set do not violate the area constraint specified by the user, and (d) the number of copies of any module does not exceed the maximum number needed to implement all the parallelism available in the dataflow graph. Then, for each valid module set, the dataflow graph is scheduled using a variant of Paulin's force-directed scheduling algorithm.⁸ The force-directed algorithm uses an efficient heuristic to produce the fastest possible schedule subject to the resource constraints imposed by the modules available in the valid module set.
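The four conditions (a) through (d) on a valid module set can be sketched as a brute-force enumeration. This is a minimal illustration under stated assumptions, not the DSS implementation: the `Module` record, the `max_copies` table (standing in for the parallelism bound of condition (d)), and the function name are all hypothetical, and the scheduling step that follows is not shown.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Module:
    """Hypothetical library entry: a register-level module and its profile."""
    name: str
    ops: frozenset   # operations the module can implement
    delay: float     # combinational delay
    area: float      # area in arbitrary units


def valid_module_sets(library, required_ops, clock_period, area_limit, max_copies):
    """Yield {module name: copies} dictionaries that satisfy conditions (a)-(d)."""
    # (b) drop any module slower than the clock period.
    usable = [m for m in library if m.delay <= clock_period]
    # (d) the copy count of each module ranges from 0 to its parallelism bound.
    ranges = [range(max_copies.get(m.name, 1) + 1) for m in usable]
    for counts in product(*ranges):
        chosen = [(m, c) for m, c in zip(usable, counts) if c > 0]
        covered = set()
        for m, _ in chosen:
            covered |= m.ops
        if not set(required_ops) <= covered:                  # (a) coverage
            continue
        if sum(m.area * c for m, c in chosen) > area_limit:   # (c) area bound
            continue
        yield {m.name: c for m, c in chosen}


# Example library with made-up delay and area figures.
LIBRARY = [
    Module("adder", frozenset({"+"}), 20, 100),
    Module("alu", frozenset({"+", "-", "<"}), 35, 250),
    Module("slow_mult", frozenset({"*"}), 90, 400),
]
```

The enumeration is exponential in the library size, which is tolerable only for small libraries; each surviving set would then be handed to the scheduler.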

Following the scheduling step, a performance estimator is invoked to estimate the area and clock speed of the design that would be generated from the scheduled dataflow graph. Estimation of area considers the effects of introducing registers, multiplexers, routing, and the controller. Estimation of execution time is based on the number of control steps used to schedule the dataflow graph. The estimator generates a plot showing the area versus execution-time trade-off for various module sets. The user can select a trade-off point, or DSS will select the fastest of the least-area designs. For details on DSS scheduling and trade-off analysis, see Dutta et al.⁹

Register optimization. Each carrier in the dataflow graph represents the need for a register (roughly speaking, a carrier is an edge connecting two operation nodes in the dataflow graph). Two carriers are said to be compatible if their life spans do not overlap. The problem of register optimization is to find the best grouping of compatible carriers such that each group can be bound to a physical register. Within any group, the life spans of any two carriers should not overlap. Fewer groups lead to fewer registers in the data path. Like scheduling, register optimization is an NP-hard problem.

Our register-optimization algorithm is based on partitioning the dataflow graph at the VIF segment boundaries. In the first step, intrasegment optimization is carried out on the basis of extensive intrasegment life-span analysis to determine the best grouping of compatible carriers within each segment. The second step is the intersegment register optimization. In this step, various carrier groups determined from the intrasegment optimization are further combined into larger groups. Intersegment optimization does not require any further life-span analysis; instead, it simply exploits the data-dependence relationships (calling sequences) among the segments as represented in the dataflow graph.

At the end of register optimization, a few groups of compatible carriers are left. Each group is then bound to a physical register selected from the module library. After the register optimization is completed, the interconnect structure of the design is formed by adding multiplexers and buses as appropriate. For details on DSS register optimization, see Roy et al.⁶
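The notion of compatible carriers can be illustrated with the classic left-edge heuristic for packing non-overlapping intervals. This is a swapped-in textbook technique shown only to make the compatibility rule concrete; it is not the two-step segment-based DSS algorithm described above. The function name and the representation of a life span as a half-open interval of control steps are assumptions.

```python
def group_carriers(life_spans):
    """Group carriers whose life spans do not overlap (left-edge heuristic).

    life_spans maps a carrier name to (birth_step, death_step), read as the
    half-open interval [birth_step, death_step). Each returned group could be
    bound to one physical register.
    """
    registers = []  # each entry: [step at which the register becomes free, names]
    # Visit carriers in order of increasing birth step (the "left edge").
    for name, (birth, death) in sorted(life_spans.items(), key=lambda kv: kv[1]):
        for reg in registers:
            if reg[0] <= birth:          # register is free: carrier is compatible
                reg[0] = death
                reg[1].append(name)
                break
        else:
            registers.append([death, [name]])   # no compatible group: new register
    return [names for _, names in registers]
```

For carriers confined to a single straight-line segment, this kind of interval packing is enough to show why fewer groups mean fewer registers; the harder cases arise once life spans cross loops and segment boundaries.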

Controller generation. Behavioral specification in VHDL is viewed as a collection of communicating processes. While facilitating abstract specification, this view complicates controller organization and generation. The controller is conceptually organized as a collection of communicating synchronous state machines, one for each VIF segment. A privileged finite-state machine, called the root FSM, controls all segment state machines, called leaf FSMs. Leaf FSMs control the register transfers in the data path.

The interaction between the root FSM and the leaf FSMs resembles what happens in the VHDL simulation cycle. During each cycle of operation, the root FSM updates all the signal registers and invokes all the leaf FSMs. When all leaf FSMs reach the privileged wait states, the root FSM generates a special signal, called finish. In simulation terms, the finish signal indicates the suspension of all processes and the absence of pending transactions. Then, the root FSM begins a new cycle.

We should point out that in spite of the conceptual organization as a collection of state machines, the design is actually synchronous and, in fact, assumes the standard two-phase clocking scheme.

Partitioning for multichip designs

The register-level design generated by DSS may be too large to fit on one chip. In such cases, it should be partitioned into several chips, subject to area and pin constraints. The partitioning engine serves this purpose. It has a flexible framework that gives the user access to various partitioning algorithms and performance evaluators as well as an interactive interface that facilitates manual partitioning if desired.

The partitioning problem involves clustering the register-level elements (registers, arithmetic logic units, multiplexers, etc.) into several groups. Each group of elements is bound to a VLSI chip, and all the chips together are bound to a multichip module. Let the register-level design R consist of register-level elements $r_1, r_2, \ldots, r_n$. These components should be collated into disjoint groups $g_1, g_2, \ldots, g_m$, $1 \le m \le n$, subject to the following constraints:

(1) Area constraint: $\forall g_i,\ \sum_{r \in g_i} \mathrm{area}(r) \le A$

where area(r) denotes the area of the register-level module r, and A denotes the maximum area allowed for any chip.

(2) Pin constraint: $\forall g_i,\ \text{pin-count}(g_i) \le P$

where P is the maximum number of I/O pins allowed for any chip.
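The two constraints can be checked mechanically for a candidate grouping. The sketch below is illustrative only: it assumes that pin-count is the number of nets crossing a group boundary (one I/O pin per crossing net), which the article does not spell out, and the function names and data layout are hypothetical.

```python
def pin_count(group, nets):
    """Count nets with at least one endpoint inside the group and one outside.

    Assumption: each such crossing net costs exactly one I/O pin.
    """
    members = set(group)
    return sum(1 for net in nets if members & set(net) and set(net) - members)


def satisfies_constraints(groups, area, nets, max_area, max_pins):
    """Return True if every group meets the area constraint and the pin constraint."""
    return all(
        sum(area[r] for r in g) <= max_area and pin_count(g, nets) <= max_pins
        for g in groups
    )


# Made-up four-element design: areas per element and two-point nets.
AREA = {"r1": 40, "r2": 30, "alu": 60, "mux": 20}
NETS = [("r1", "alu"), ("r2", "alu"), ("alu", "mux"), ("mux", "r1")]
```

With these figures, `satisfies_constraints([["r1", "r2"], ["alu", "mux"]], AREA, NETS, 100, 3)` accepts the two-chip grouping, while placing all four elements on one chip fails the area constraint.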

The MSS partitioning engine uses performance attributes and cost functions defined in PDL to drive the partitioning algorithms. PDL is a performance description language based on attribute grammars. PDL programs define such attributes as area and pin-count as well as the number of chips in an MCM and the number of interconnection wires among the chips (see the sidebar for example PDL programs).

Partitioning algorithms usually attempt to minimize the total number of chips and/or the total number of interconnections among the chips. Partitioning is a combinatorial problem; even its restricted versions are known to be NP-hard. Various heuristics representing trade-offs between the time required to generate a partition and the quality of the partition generated have been proposed. No single algorithm can produce efficient designs in all situations. We briefly describe two partitioning algorithms here.

Performance Description Language (PDL)

Designs are hierarchical compositions of primitive design objects. A performance attribute is a function of the design structure and the performance attributes of its constituent objects. Thus, performance attributes can be conveniently described in terms of the attributes attached to objects in the design hierarchy. Attribute grammars¹ are a sound formalism for doing so.

We have been developing an executable notation to describe both static and dynamic performance attributes succinctly. The notation, called PDL (Performance Description Language), is based on attribute grammars. Interpretation of a PDL program describing a collection of performance attributes of a design yields a procedure to compute the values of these attributes. PDL users can describe a system’s performance attributes at multiple levels of abstraction.

In PDL a design schema is specified using schema construction clauses. Attributes can be associated with each design entity, and attribute computation rules can be associated with the schema construction clauses. We are incorporating a variety of data types, constructs, and short-hand notations in PDL to facilitate straightforward specification of complicated attributes. Here we introduce PDL through two simple examples. (The syntax used here is expected to change as PDL evolves.)

Static performance attributes. The PDL program in Figure A defines the area attribute of a system composed of processors and memory units. The first line in the program associates an attribute named area with each design object in the description. In subsequent lines, system, module, processor, and memory are design objects specified by the user, and processor and memory are defined to be primitive objects that do not contain any other objects. There are two schema construction clauses: system contains one or more module objects; module contains zero or more processor or memory objects. Enclosed between ‘( )’ is a structural qualification statement that restricts the number of ‘module’ objects in the system. Enclosed between ‘{ }’ are the attribute evaluation rules. Attribute area of the system is defined to be the same as the sum of the area attributes of its constituent modules, and area of the module is defined likewise. Enclosed between ‘[ ]’ is a list-construction operator. For example, this construct in line 4 constructs a list of the area values of all the module objects.

    attr area of * is int;
    system :: module
        ( num(module) > 0; )
        { system.area = +[module.area]; }
    module :: processor, memory
        { module.area = +[processor.area] + +[memory.area]; }
    processor :: primitive;
    memory :: primitive;

Figure A. PDL program to define the area attribute of a system.

Dynamic performance attributes. The PDL program in Figure B defines an attribute to determine the maximum length of queue in a design that consists of three modules: producer, consumer, and queue.

    dynamic attr length, max_length of queue is int := 0;
    event attr flag of queue is (enqueue, dequeue);
    process :: producer, consumer, queue
        ( num(consumer) == 1; num(producer) == 1; num(queue) == 1; )
    queue :: primitive;
        { on(flag)
          then if (flag == enqueue)
               then begin
                   length := length + 1;
                   if (max_length < length)
                   then max_length := length;
               end;
               else length := length - 1; }
    producer :: primitive;
    consumer :: primitive;

Figure B. PDL program to define the maximum length of a module.

The program defines length and max_length to be dynamic attributes of queue and defines flag to be an event associated with the queue. An event is the means with which the environment can impact execution of a PDL program. In this program, flag is an enumerated type assuming enqueue and dequeue values. The attribute computation rule associated with queue is conditional upon the flag event. Whenever flag assumes enqueue or dequeue, the length and max_length are updated.

PDL implements dynamic attributes by interfacing the PDL system with a hardware-description-language (HDL) simulator. The external events take place in the simulator as the HDL model is executed. We are currently addressing how to establish the correspondence between the event attributes in the PDL code and events in the HDL simulator and how to formalize the interaction between these events.

For further details on PDL, see Mandayam and Vemuri.²

References

1. P. Deransart, M. Jourdan, and B. Lorho, Attribute Grammars, Lecture Notes in Computer Science 323, Springer-Verlag, 1988.

2. R. Mandayam and R. Vemuri, “Performance Specification and Measurement,” to be published in Proc. IFIP Conf. on Computer Hardware Description Languages and Their Applications, North Holland, New York, Apr. 1993.

Figure 4. Multichip design of the Find.

    N: population size (number of partitions in a generation) := 100
    S: percent of new generation produced by selection := 20
    C: percent of new generation produced by crossover := 80
    M: percent of partitions mutated := 20
    begin
        Create a random set of N partitions.
        Evaluate the fitness of each partition.
        while (stopping criteria not satisfied) do
        begin
            Create S percent of new population of partitions by selection.
            Create C percent of new population of partitions by crossover.
            Replace the current generation by new generation of partitions.
            Mutate M percent of the current partitions.
            Evaluate the fitness of each partition.
            Save the partition with the best fitness.
        end
    end

Figure 5. Genetic partitioning algorithm.

Hierarchical clustering. In the hierarchical clustering technique, we assume that none of the register-level elements violates the area or pin constraints. A cluster represents a VLSI chip. A register-level element that is included in a cluster is said to be bound; otherwise, it is said to be free. Initially, a free element is selected and included in a new cluster. A free element is added to this cluster if such an addition does not violate the area and pin constraints on the chips. In case of multiple choices, the element that causes the maximum increment in the cluster area is selected. This process is repeated until no more elements can be added to the cluster. Additional clusters are created until all elements are bound. This completes the first iteration of the clustering algorithm.

In the next iteration, the clusters formed in the previous iteration are viewed as elements and further clustering is attempted. This iterative process continues until no further clustering occurs during an iteration. Hierarchical clustering is a deterministic, greedy algorithm. It usually gives less than optimal partitions, but it is quite fast and memory efficient. We used it to partition the Find example into five chips. Figure 4 shows the resulting design.
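The iteration described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the MSS partitioning engine: pin count is modeled as the number of nets crossing the cluster boundary, the first free element in list order seeds each new cluster, and all function names are hypothetical.

```python
def crossing_nets(members, nets):
    """Assumed pin model: one pin per net that crosses the cluster boundary."""
    return sum(1 for net in nets if members & net and net - members)


def cluster_once(elements, area, nets, max_area, max_pins):
    """One iteration: grow clusters greedily from the free elements.

    elements is a list of frozensets of primitive element names, so the same
    routine works on single elements and on clusters from an earlier iteration.
    """
    free = list(elements)
    clusters = []
    while free:
        cluster = free.pop(0)                 # seed a new cluster
        while True:
            candidates = [
                e for e in free
                if sum(area[r] for r in cluster | e) <= max_area
                and crossing_nets(cluster | e, nets) <= max_pins
            ]
            if not candidates:
                break
            # Among multiple choices, take the largest increment in cluster area.
            best = max(candidates, key=lambda e: sum(area[r] for r in e))
            free.remove(best)
            cluster = cluster | best
        clusters.append(cluster)
    return clusters


def hierarchical_clustering(names, area, nets, max_area, max_pins):
    """Repeat cluster_once until an iteration merges nothing further."""
    nets = [frozenset(n) for n in nets]
    clusters = [frozenset([n]) for n in names]
    while True:
        merged = cluster_once(clusters, area, nets, max_area, max_pins)
        if len(merged) == len(clusters):
            return merged
        clusters = merged
```

Because every choice is made greedily and never revisited, the result depends on the order in which seeds are taken, which is one reason such partitions are usually less than optimal.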

Genetic partitioning. Randomized search algorithms that intermittently admit inferior designs in order to reach a final optimal design are known to be effective in solving combinatorial problems. Genetic algorithms¹⁰ are one such class of algorithms. They search through generations of populations (solutions to the given problem) to arrive at a population containing an individual solution that meets the given criteria. As the population evolves during the search process, the average fitness of the population improves.

We have developed a genetic algorithm to solve our partitioning problem (see Figure 5).¹¹ Initially, it creates a number of random partitions. The fitness of a partition is defined as 1/(1 + PD + AD), where PD is the net deviation from the pin constraint on each chip and AD is the net deviation from the area constraint on each chip. Fitness is a value between 0.0 and 1.0; a partition that satisfies both constraints on every chip has fitness 1.0. The algorithm uses the following operators to generate the next generation of partitions:

Selection. A randomly selected partition that is highly fit (fitness value > 0.8) can be moved into the next generation. Twenty percent of the partitions in the new generation are created using selection.

Crossover. Two highly fit partitions P_1 and P_2 in the current population are randomly selected. The largest chip C (in terms of total area of the register-level components in it) in P_1 is copied into P_2. Register-level elements in C that are currently assigned to other chips in P_2 are deleted from those other chips. Both P_1 and P_2 are moved into the next generation of partitions. Eighty percent of the partitions in the new generation are created using crossover.

Mutation. A partition is randomly selected and a register-level element from some randomly selected chip in the partition is moved to another randomly selected chip. Mutation is applied to 20 percent of the partitions in the new generation.

The genetic partitioning algorithm terminates when a termination criterion is satisfied. The criterion usually is that a partition with fitness 1.0 is found. In addition, to find optimal partitions (not just constraint-satisfying partitions), other criteria must be specified. These criteria might include the computation time, the number of generations to be searched, or a lower limit on a measure of global optimality such as the total number of interconnection wires or the total number of chips. The genetic partitioning algorithm is quite slow but can find better quality partitions, in terms of the number of chips and the number of interconnections among them, than the hierarchical clustering algorithm. For complete details of the genetic partitioning algorithm, see Vemuri and Vemuri.¹¹
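The fitness function and the three operators can be sketched as follows. This is a simplified illustration, not the authors' implementation. The assumptions are: PD and AD are summed excesses over the pin and area limits across chips, pin count is the number of nets crossing a chip boundary, "highly fit" means the fittest fifth of the population rather than the 0.8 threshold, and crossover yields one child rather than moving both parents forward. All names are hypothetical.

```python
import random


def fitness(partition, area, nets, max_area, max_pins):
    """Return 1/(1 + PD + AD) for a partition given as a list: element -> chip id."""
    ad = pd = 0.0
    for chip in set(partition):
        members = {i for i, c in enumerate(partition) if c == chip}
        ad += max(0.0, sum(area[i] for i in members) - max_area)
        pins = sum(1 for net in nets if members & net and net - members)
        pd += max(0.0, pins - max_pins)
    return 1.0 / (1.0 + pd + ad)


def crossover(p1, p2, area):
    """Copy the largest-area chip of p1 into a copy of p2 under a fresh chip id."""
    chip_area = {}
    for i, c in enumerate(p1):
        chip_area[c] = chip_area.get(c, 0) + area[i]
    largest = max(chip_area, key=chip_area.get)
    child = list(p2)
    new_id = max(p2) + 1
    for i, c in enumerate(p1):
        if c == largest:
            child[i] = new_id   # reassigning also removes i from its old chip
    return child


def mutate(partition, n_chips, rng):
    """Move one randomly chosen element to a randomly chosen chip."""
    child = list(partition)
    child[rng.randrange(len(child))] = rng.randrange(n_chips)
    return child


def genetic_partition(area, nets, max_area, max_pins, n_chips,
                      pop_size=100, generations=200, seed=0):
    rng = random.Random(seed)
    nets = [frozenset(n) for n in nets]

    def score(p):
        return fitness(p, area, nets, max_area, max_pins)

    pop = [[rng.randrange(n_chips) for _ in area] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        if score(best) == 1.0:                      # constraint-satisfying partition
            break
        elite = sorted(pop, key=score, reverse=True)[:max(2, pop_size // 5)]
        new_pop = [list(p) for p in elite]          # selection: 20 percent
        while len(new_pop) < pop_size:              # crossover: 80 percent
            a, b = rng.sample(elite, 2)
            new_pop.append(crossover(a, b, area))
        for k in rng.sample(range(pop_size), pop_size // 5):   # mutate 20 percent
            new_pop[k] = mutate(new_pop[k], n_chips, rng)
        pop = new_pop
        best = max(pop + [best], key=score)         # save the best partition so far
    return best, score(best)
```

The loop stops as soon as a partition with fitness 1.0 appears; searching for better partitions among the feasible ones would need an additional criterion such as a generation budget or a limit on interconnection count.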

Test structure compiler

Complexity of designs and limited probing accessibility in multichip modules necessitate incorporation of test structures in the design to improve ob- servability and controllability. The IEEE standard boundary-scan architecture provides a means of gaining access to the chips buried inside the multichip module.‘

The MSS test structure compiler automatically incorporates the IEEE standard test access port (TAP). scan cells. and other devices necessary to support

boundary scan. T o each chip in the multichip design generated from the partitioning phase. the test structure compiler adds a 16-state T A P controller. a 3-bit instruction register. an instruction decoder. a bypass register. scan cells for all the pins, and optionally. a device identification register. The following instructions are supported by default: Extest. Bypass. SampleiPreload. Intest. and Idcode. Five pins are added to each chip: Tck. Tms. Tdi. Tdo. and Trst:*.

The test structure compiler has access to a library of boundary-scan structures written in VHDL. The compilation consists of three major phases. First, relevant design and connectivity information is extracted from the VHDL files generated at the end of the partitioning phase. Then, during the boundary-scan synthesis phase, all necessary test structures are added in each chip and appropriate connections are added among the new test structures and the existing I/O pins of the chip. Finally, during the serial-scan-path formation phase, the chips are interconnected such that the Tdo of one chip goes to the Tdi of another during the test mode. The test access port is added to the overall MCM design. Thus, a serial scan path is established through the entire MCM and can be exercised in its test mode.
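The serial-scan-path formation phase can be pictured with the following sketch. It is an illustration under assumptions, not the compiler's code: the function name, the `MCM.`-prefixed port names, and the choice to broadcast Tck, Tms, and Trst to every chip (the usual IEEE 1149.1 board-level arrangement) are not stated in the article.

```python
def form_scan_path(chips):
    """Return (serial, shared) connection lists for an ordered list of chip names.

    serial: (source, sink) pairs that daisy-chain Tdo of one chip to Tdi of the next.
    shared: (source, sink) pairs that fan the MCM-level Tck/Tms/Trst out to each chip.
    """
    if not chips:
        raise ValueError("at least one chip is required")
    # The MCM-level Tdi feeds the first chip in the chain.
    serial = [("MCM.Tdi", f"{chips[0]}.Tdi")]
    # Tdo of each chip drives Tdi of its successor.
    for src, dst in zip(chips, chips[1:]):
        serial.append((f"{src}.Tdo", f"{dst}.Tdi"))
    # The last chip drives the MCM-level Tdo.
    serial.append((f"{chips[-1]}.Tdo", "MCM.Tdo"))
    # Assumed: control pins are wired in parallel to all chips.
    shared = [(f"MCM.{pin}", f"{chip}.{pin}")
              for pin in ("Tck", "Tms", "Trst") for chip in chips]
    return serial, shared
```

The order of `chips` fixes the order in which bits travel through the MCM; the article does not say how MSS chooses that order.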

The test structure compiler outputs the resulting MCM design in the form of several VHDL files, one for each chip and one for the connectivity information among all the chips.

Silicon compilation

Mask layouts for each chip in the MCM are generated using the LagerIV silicon compiler tools.12 LagerIV is a collection of logic synthesis and physical design-automation tools for VLSI chips. Specifications to LagerIV are written in two formats: SDL specifies structural composition of existing library modules and standard cells, and BDS specifies finite-state controllers. We wrote two translators to generate appropriate SDL and BDS files from the register-level VHDL description of the chip being synthesized. These files are then processed by the LagerIV tools to generate fabricatable mask layouts. We currently use 2.0-micron scalable CMOS technology; the design process, however, is completely technology-independent.

Figure 6 shows the mask layout, with bonding pads, for one of the chips in the Find MCM.

Figure 6. Layout of chip 5 in the Find MCM (with boundary scan).

April 1993 69

Design verification methodology

Although the designs generated by various synthesis tools are expected to be correct “by construction,” some software errors can remain in the synthesis tools and result in incorrect designs. To verify that the designs generated meet the functional specifications and to avoid costly post-fabrication test and repair, the MSS design process includes extensive simulations.

The design is verified through simulations at various stages. Our approach is primarily based on functional testing at each level of design abstraction, namely, the behavioral, the register, and the switch levels. We tested all the component generators in the various MSS component libraries before adding them to the libraries.

To facilitate functional testing, we provide two test-bench compilers. A test bench is an executable textual description of the test vectors and expected results. WAVES is an IEEE standard for writing test benches.4 When simulating the behavioral specifications, the MSS user must have employed good functional testing practices so that reasonable fault coverage is possible. MSS uses these test vectors to automatically generate functional tests at lower levels of abstraction. All the user has to do is provide behavioral-level test vectors in WAVES format. The MSS multicomponent test-bench compiler generates test benches for each component as well as for the entire MCM design. These test benches are in the form of WAVES files that can be used in conjunction with any VHDL simulator for register-level simulations of the MCM design.

WAVES test benches can also be used for post-fabrication testing of the MCM and its constituent chips (both before and after being embedded in the MCM package), provided the automated test equipment accepts WAVES files. Many test-equipment manufacturers are developing software packages to facilitate inputs in WAVES format. For post-layout simulations of the chips, we use a switch-level simulator that extracts circuit descriptions from files generated by the silicon compiler. The switch-level simulator uses a linear timing model based on the extracted capacitances to provide both functional and timing simulations. More detailed simulations (using a circuit simulator such as SPICE) are prohibitively expensive for such large designs. Note, however, that all component generators in the library used in conjunction with the silicon compiler have been simulated with SPICE.

The switch-level test-bench compiler automatically generates switch-level simulator-stimulus files. The source for these files is the WAVES external files produced by the multicomponent test-bench compiler. Simulation results, at both the register and switch levels, are automatically compared with the expected responses, using two programs provided for this purpose.

Figure 7. Multicomponent test-bench compiler. Blocks shown: register-level design (VHDL); behavioral-level test bench (WAVES); VHDL simulator; trace file for all signals; compare; stop; multichip design from partitioning; waveform extractor; WAVES external files for each chip and the overall MCM (without boundary scan); WAVES test benches for each chip and for the overall MCM (with boundary scan).

Multicomponent test-bench compiler. Figure 7 shows the multicomponent test-bench compiler. It generates test benches for the individual chips as well as for the entire MCM. To generate these test benches, it is necessary to trace all the signals that realize the interconnections among the chips. This is done while simulating the register-level design generated by DSS with the same WAVES test bench used to simulate the behavioral specification.

If the register-level design is correct, it should produce the same responses as the behavioral specification for each test stimulus. However, there is one difficulty. The behavioral model has no concept of clocks or timing: The responses are produced “instantaneously.” The register-level design uses a two-phase clocking scheme; the responses are generated a finite number of clock cycles after applying the stimulus.

To determine the exact instant in time when the signal values are valid, we use the finish signal generated by the controller. Recall that this signal goes high when all the leaf FSMs reach the wait state, indicating that the system has finished responding to the stimulus. The values at the output ports at that time should match the expected values indicated by the WAVES test bench. The next test vector can then be applied. A program is used to compare, for each test vector in the WAVES file, the actual responses of the multicomponent design with the expected responses. A trace file that traces all the waveforms (signal values) is generated for the entire simulation run.
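A minimal sketch of this comparison follows. The trace layout (one dictionary of signal values per simulation step), the signal name `finish`, and both function names are assumptions made for illustration; the article does not describe the file formats or the comparison program in this detail.

```python
def responses_at_finish(trace, output_ports):
    """Sample the output ports each time the finish signal rises from 0 to 1.

    trace: list of dicts, one per simulation step, mapping signal name to value.
    """
    samples = []
    previous_finish = 0
    for step in trace:
        # A rising edge on finish marks the instant the outputs are valid.
        if previous_finish == 0 and step["finish"] == 1:
            samples.append({port: step[port] for port in output_ports})
        previous_finish = step["finish"]
    return samples


def mismatched_vectors(samples, expected):
    """Return indices of test vectors whose sampled response differs from the expected one."""
    bad = [i for i, (got, want) in enumerate(zip(samples, expected)) if got != want]
    # Vectors with no counterpart (missing or extra responses) also count as mismatches.
    shorter, longer = sorted((len(samples), len(expected)))
    bad.extend(range(shorter, longer))
    return bad
```

An empty result from `mismatched_vectors` corresponds to the register-level design agreeing with the behavioral specification on every test vector.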

The next step is to generate the WAVES test benches for each chip. The MCM design generated by the partitioning engine is analyzed to determine the names of the signals in the register-level design that correspond to the input and output pins of each chip. The waveforms on these signals are extracted from the simulation trace file. Waveforms on the input pins of a chip represent the stimuli, and waveforms on its output pins represent the expected responses. This data is generated as a WAVES external file. A WAVES external file is generated for each chip and one for the overall design.
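The extraction step can be sketched as follows. The trace representation, the `chip_pins` dictionary, and the function name are hypothetical; the sketch only shows the idea that input-pin waveforms become stimuli and output-pin waveforms become expected responses.

```python
def extract_chip_waveforms(trace, chip_pins):
    """Split a full-design trace into per-chip stimulus and expected-response strings.

    trace: list of (time_ns, {signal_name: bit}) entries from the register-level run.
    chip_pins: {"inputs": [signal names], "outputs": [signal names]} for one chip,
               as determined from the partitioned design.
    Returns a list of (time_ns, stimulus_bits, expected_bits) rows.
    """
    rows = []
    for time_ns, values in trace:
        # Signals on the chip's input pins form the stimulus.
        stimulus = "".join(str(values[name]) for name in chip_pins["inputs"])
        # Signals on the chip's output pins form the expected response.
        expected = "".join(str(values[name]) for name in chip_pins["outputs"])
        rows.append((time_ns, stimulus, expected))
    return rows
```

Running the function once per chip, and once with the pins of the overall design, yields the data for one external file per chip plus one for the MCM.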

The final step is to transform the waveform data into a form suitable to exercise the scan path generated by the test structure compiler. During this step, the user can choose several options corresponding to the boundary-scan options selected during the test-structure compilation phase. These include parallel-in/serial-out or serial-in/parallel-out testing, checking the device identification register, and checking the bypass mode. Complete WAVES data sets, one for each chip and one for the overall MCM design, are generated.

Switch-level test-bench compiler. Input for the switch-level test-bench compiler includes the test benches in WAVES format generated by the multicomponent test-bench compiler and the VHDL entity description of the design. The switch-level test-bench compiler generates a switch-level-simulator command file that contains commands to apply all test vectors specified in the WAVES test bench. Then, switch-level simulation is carried out using a switch-level model extracted from layouts of the entity. The simulation results can be viewed on the simulator's waveform display. The results are also automatically compared with the expected responses.

Figure 8 shows a partial snapshot of the waveforms generated by the switch-level simulator used to simulate chip 5 in the Find design. Figure 9 shows the relevant portion of the WAVES external file generated by the multicomponent test-bench compiler. A comparison of these figures shows that the switch-level simulations of chip 5 are in agreement with the expected values in the WAVES file.

MCM physical design

The final stage in the design process is the design of the MCM package itself. We currently use the Mentor Graphics MCM design tools for placement, routing, and analysis of the package. The term packaging compiler refers here to the MCM physical design tools collectively.

The MCM packaging compiler takes as its input the top-level VHDL description of the multichip design and the mask-layout files of each chip. We wrote a net-list extractor to generate a net-list file from the VHDL file that has the chip connectivity information. Various geometry files indicating the bounding-box information are created from mask-layout files.

% STRUCTURAL TEST VECTORS (with expected outputs)
% Ports
% nmcir35_out nmcir21_out mcir10_MUX_SELECT mcir10_out nmcir29_out nmcir28_out
% nmcir27_out nmcir26_out nmcir25_out nmcir12_out nmcir16_out nmcir23_out
% nmcir22_out mcir2_MUX_SELECT mcir2_out nmcir24_out nmcir5_out
% mcir7_MUX_SELECT nmcir7_CONTROL nmcir7_out Clock_phase1 Clock_phase2 Clear
% tms tck trst tdi tdo
"0000000000000000000000000000000000000000000000000000000000000000000000000010111110 : 160 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000111010 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000001111010 : 20 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000111010 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000010011110 : 160 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000001011110 : 20 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000010011110 : 160 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000001011110 : 20 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110001001000001110000000000010011110 : 160 ns;
"0000000000000001000101001101100001000000000000110001001100001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110001001100001110000000000001011110 : 20 ns;
"0000000000000001000101001101100001000000000000110001001100001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110001000100001110000000000010011110 : 160 ns;

Figure 9. Example of WAVES external file.

The packaging compiler operates in two steps: During placement, all chips are placed on the chip layer; during routing, the interconnect is generated to connect the chips according to the net-list specification. Several wiring layers are used for this purpose. Placement and routing steps attempt to minimize the total area, signal delays, and interconnect wire lengths. Users can try interactive placement to generate tighter designs. The left side of Figure 10 shows a placed and routed MCM design for the Find example.
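To make the wire-length objective concrete, the sketch below computes a half-perimeter wire-length estimate for a candidate placement. This estimate is a common textbook stand-in, chosen here for illustration; the article does not say which cost function the Mentor Graphics tools use, and the function name and data layout are assumptions.

```python
def half_perimeter_wirelength(placement, nets):
    """Estimate total interconnect length for a placement.

    placement: {chip_name: (x, y)} chip positions on the chip layer.
    nets: list of nets, each a list of the chip names the net connects.
    For each net, the estimate is half the perimeter of the smallest
    rectangle enclosing all chips on that net.
    """
    total = 0.0
    for net in nets:
        xs = [placement[chip][0] for chip in net]
        ys = [placement[chip][1] for chip in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total
```

Comparing this figure across candidate placements gives a rough ranking before routing; area and signal delay, the other two objectives named above, would need their own terms.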

Following successful placement and routing, thermal analysis and cross-talk analysis are performed. Dense MCM designs can generate a considerable amount of heat. Placement and routing of MCMs should eliminate hot spots


Table 1. Source code information for the MSS tools.

Tool                                    Language   Size in lines
Distributed Synthesis System (DSS)      C++        50,000
Partitioning engine
  Hierarchical clustering               C++        6,000
  Genetic algorithm                     C          1,000
Test structure compiler                 C++        3,500
Multicomponent test-bench compiler      C++        6,000
Switch-level test-bench compiler        C          300
Packaging compiler                      Prolog     150
All the translators                     C          2,000
SDL and BDS libraries                   SDL/BDS    1,000

Table 2. Design data for the examples.

Feature                                     Find    Move Machine   Viper
Lines of behavioral specifications          62      80             450
Number of chips                             5       3              5
Total number of pins of all chips           439     336            1,530
No. of I/O pins (before boundary scan)      15      49             69
No. of I/O pins (after boundary scan)       20      54             74
No. of register-level components
  Excluding boundary-scan devices           90      49             158
  Including boundary-scan devices           559     322            1,733
Area constraint (mm squared)                25      50             50
Pin constraint                              120     150            350
Clock period constraint (ns)                100     100            100
No. of interconnections among chips         235     215            640

and avoid the possibility of thermal failure. A thermal-analysis tool simulates the three types of heat transfer, namely, conduction, convection, and radiation. The results of thermal analysis can be examined visually, as shown on the right side of Figure 10 for the Find example. If the results indicate potential hot spots, the thermal-analysis information is fed back into the place-and-route steps to arrive at a more thermally sound MCM design.

Figure 10. Layout (left) and isothermal profile (right) of the Find MCM.

Design results and resource usage

While reducing the design time, large-scale design environments such as MSS put a very high demand on file space. We will try to give some feel for the design effort and computer-resource usage involved in generating the multichip designs for our three examples, namely, the Find, the Move Machine, and the Viper Microprocessor.

Tools in the MSS environment have been under development for one to three years. We recently completed streamlining the entire design process from specification to designs suitable for fabrication. We have not fabricated any MCMs because of high costs, but we have successfully fabricated and tested several VLSI chips, including the Move Machine, that were synthesized using the MSS environment.

Table 1 shows the number of lines of program code for each major tool in the MSS. For parsers, we counted the lines in the parser-generator (Yacc or Bison) source files. All the software executes on a Sun Sparcstation I with 28 Mbytes of main memory and 260 Mbytes of local disk used mostly as a swap area and attached to an 8-Gbyte file server.

Table 2 shows some data pertaining to the three example MCM designs. The target technology for the chips is 2.0-micron scalable CMOS in each case. If we had chosen a smaller feature size, say, 1.0-micron CMOS, then none of the examples would have required partitioning. Note that none of the examples is pin-limited. In fact, the primary interface of each design has relatively few pins, 74 in the case of Viper. But once we start partitioning the design because of area limitation, the number of pins needed for the individual chips is quite high because the number of wires connecting register-level elements in the design is large.

Figure 10 legend (Find MCM): isotherm temperatures range from 25.963 to 26.286 °C.

The large number of pins in each individual chip contributes to the relatively large number of boundary-scan devices introduced. Each pin needs a boundary-scan cell and each chip needs a TAP controller. The clock is relatively slow in each case, only about 10 megahertz. This again is due to the 2.0-micron technology we are using and the way the library cells are designed; they are not designed for speed.

All arithmetic units are combinational blocks with ripple-carry features. Our synthesis software itself is independent of the technology used; if we replace the library with a better one, we get better designs. The MCM technology is a hybrid technology with four signal layers and assumes surface mounting.

To give a realistic picture of the design times involved, Table 3 shows the approximate number of person-hours (wall-clock time) taken to complete various design tasks for the three examples, starting with behavioral-specification development and ending with MCM place-and-route. The times shown are for one design iteration. Subsequent iterations should take shorter times. As the various algorithms and tools in MSS improve, the synthesis times are expected to decrease significantly. Simulation time will probably decrease at a much slower rate (assuming the same hardware platform); the simulators are already quite fast and efficient.

Layout synthesis, both at chip level and package level, has been a key contributor to the design time. However, the layouts generated are reasonably compact. On the other hand, behavioral synthesis and partitioning (at least, the clustering algorithm) are fast, but the designs generated by these tools have room for improvement. We are currently investigating various algorithms for high-level synthesis and design partitioning to generate better quality multichip designs.

Table 4 shows the amount of disk space used by the various files. Mask-

Table 3. Design and simulation times for the examples (in minutes).

Feature                                                   Find    Move Machine   Viper
Behavioral specification preparation and simulation       120     240            3,360
Number of behavioral-level test vectors                   8       20             116
Behavioral synthesis                                      7       3              5
Partitioning engine
  Hierarchical clustering                                 7 i 2                  5
  Genetic partitioning                                    360     300            600
Test structure compilation                                1       1              1
Silicon compilation (all chips)
  Without BSD                                             30      30             60
  With BSD                                                360     300            900
Packaging compilation (place and route)                   180     120            540
Thermal analysis                                          - 5 1                  10
Multicomponent test-bench compilation (including
  simulation of the register-level design)                30      30             60
Simulation of multichip design with boundary-scan
  options at register level (each chip plus overall)
  and comparison of results                               35      45             120
Switch-level test-bench compilation (all chips)           3       3              10
Switch-level simulation and comparison of results
  (all chips)                                             180     240            360
Total (in hours)                                          22      26             101

Table 4. File spaces for the examples (in Kbytes).

Feature                                               Find     Move Machine   Viper
Behavioral specification (VHDL)                       1.4      1.5            15
Behavioral-level test bench (WAVES)                   1        1              5
Register-level design (VHDL)                          100      100            200
Multichip design after partitioning (VHDL)
  C1 + C2 + ... + Cn + C                              150      125            300
MCM design with boundary-scan structures (VHDL)
  BC1 + BC2 + ... + BCn + BC                          175      150            225
Mask layouts L1 + L2 + ... + Ln
  Without BSD                                         1,000    2,000          4,500
  With BSD                                            8,000    3,500          20,000
MCM layout design                                     176      526            1,200
Trace file in multicomponent test-bench compiler      1,000    1,000          4,000
Test benches (WAVES) T1 + T2 + ... + Tn + T           1,500    1,500          5,000
Switch-level test benches ST1 + ST2 + ... + STn + T   3,000    3,000          10,000
Switch-level simulation results files                 1,500    1,500          5,000
Total                                                 16,603   13,404         50,145

layout files, simulation-trace files, and test benches take considerable space.

The MSS tools are currently being used for prototyping MCM designs. However, they have not yet attained the maturity expected of industrial-strength design tools. We are continuing our research and development efforts with specific attention to

specifying and imposing constraints on other performance attributes, such as heat dissipation and power consumption, at the behavioral-synthesis step,

introducing formal verification tools as a possible way to reduce the simulation time, and

implementing performance-driven MCM placement and routing.


Acknowledgments

This research is sponsored in part by the US Air Force Wright Laboratory under Contract No. F33615-91-C-1811, the Defense Advanced Research Projects Agency under Order No. 7056 monitored by the Federal Bureau of Investigation under Contract No. J-FBI-89-094, and an ACM/SIGDA graduate scholarship. The work on PDL is also partly sponsored by Semiconductor Research Corp. under Contract No. 92-DJ-293.

We thank Darrell Barker, John Hines, Neal Stollon, and Justin Harlow for their useful comments and critique of this work. We gratefully acknowledge Bob Reese's help with the LagerIV tools.

References

1. Multichip Modules, R. Johnson et al., eds., IEEE Press, Piscataway, N.J., 1991.

2. IEEE Standard VHDL Language Reference Manual, IEEE Standard 1076, Piscataway, N.J., 1987.

3. IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Standard 1149.1, Piscataway, N.J., 1990.

4. User's Guide to WAVES (Waveform and Vector Exchange Specification), Draft Document V4.4, Piscataway, N.J., 1990.

5. R.A. Walker and R. Camposano, A Survey of High-Level Synthesis Systems, Kluwer Academic Publishers, Norwell, Mass., 1991.

6. J. Roy et al., "DSS: A Distributed High-Level Synthesis System," IEEE Design & Test of Computers, Vol. 9, No. 2, June 1992, pp. 18-32.

7. W.J. Cullyer, "Implementing Safety Critical Systems: The Viper Microprocessor," in VLSI Specification, Verification, and Synthesis, G. Birtwistle and P.A. Subrahmanyam, eds., Kluwer Academic Publishers, Norwell, Mass., 1988, pp. 1-26.

8. P.G. Paulin and J.P. Knight, "Force-Directed Scheduling in Automatic Data Path Synthesis," Proc. 24th Design Automation Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 781, 1987, pp. 195-202.

9. R. Dutta, J. Roy, and R. Vemuri, "Distributed Design Space Exploration for High-Level Synthesis Systems," Proc. 29th Design Automation Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 2 - 2 2, 1992, pp. 644-650.

10. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, Mass., 1989.

11. R. Vemuri and R. Vemuri, "A Genetic Algorithm for Multichip Partitioning," Tech. Report TM-ECE-DDE-92-26, Dept. of Electrical and Computer Eng., Univ. of Cincinnati, 1992.

12. R. Jain et al., "An Integrated CAD System for Algorithm-Specific IC Design," IEEE Trans. CAD, Vol. 10, No. 3, Apr. 1991, pp. 147-163.

13. R. Vutukuru, P. Subba Rao, and R. Vemuri, "Boundary-Scan Test Structures and Test-Bench Compilation in a Multichip Module Synthesis System," Proc. IEEE Multichip Module Conf., IEEE Press, Piscataway, N.J., 1992, pp. 44-17.

of Technology at Kharagpur and a PhD from Case Western Reserve University, both in computer engineering. Vemuri is the recipient of the Siddhartha Gold Medal, a distinguished research award, an outstanding teacher award, and corecipient of a best-paper award. He is a member of ACM, IEEE, and the Computer Society.

Nand Kumar, a PhD student in the University of Cincinnati's Department of Electrical and Computer Engineering, has research interests in computer-aided design of digital systems, high-level synthesis, and formal verification.

Raghu Vutukuru, a software engineer at LSI Logic, has research interests in design automation, simulation, and testing. He received an MTech from the Indian Institute of Technology and an MS from the University of Cincinnati.

Prasad Subba Rao, a software engineer at LSI Logic, has research interests in design synthesis and parallel algorithms. He received his BS from Bangalore University, India, and his MS from the University of Cincinnati.

Praveen Sinha works for Intel Corp. in Phoenix, Arizona. He received his BTech in electrical engineering from the Indian Institute of Technology, Kanpur, and his MS in computer engineering from the University of Cincinnati in 1992.

Ning Ren is a PhD student in electrical and computer engineering at the University of Cincinnati. He earned his BS and MS in computer engineering from Chengdu Institute of Radio and University of Cincinnati, respectively.

Paddy Mamtora works in the IC CAD Support Group at Mentor Graphics. He received his BE from the University of Bombay in 1988 and his MS in electrical engineering from the University of Cincinnati in 1992.

tion and in systems-level hardware design

Ram Vemuri is a PhD student at the University of Cincinnati, where he received his MS in 1989. His research interests include design automation of VLSI circuits, optimization of logic design, and genetic algorithms.

from the University of Cincinnati.

Readers can contact Ranga Vemuri at the Laboratory for Digital Design Environments, Department of Electrical and Computer Engineering, M.L. #30, University of Cincinnati, Cincinnati, OH 45221-0030; e-mail ranga.vemuri@uc.edu.

74 COMPUTER
