
An Integrated Multicomponent Synthesis Environment for MCMs

Ranga Vemuri, Nand Kumar, Raghu Vutukuru, Prasad Subba Rao, Praveen Sinha, Ning Ren, Paddy Mamtora, Ram Mandayam, Ram Vemuri, and Jayanta Roy

University of Cincinnati

This vertically integrated design environment supports the development of application-specific MCMs that involve low-volume production and short lead times.

Design and development of application-specific multichip modules (that is, MCMs that contain many application-specific integrated circuits) is growing in importance, especially in defense and aerospace applications. Low production quantities and the need to produce working systems quickly in these application areas do not justify the long and expensive development methodologies traditionally used for MCMs [1]. An integrated CAD environment for application-specific MCMs should offer the following facilities:

Automated synthesis. To significantly reduce design times, automated synthesis techniques must be employed at all stages of design, including behavioral, register transfer, logic, chip layout, and package layout.

Design verification. Simulation is still the most viable technique for verifying design correctness. The design environment must have efficient simulation tools at all levels of design abstraction. Test vectors required for simulation must be generated automatically as far as possible, and to further reduce expensive verification time, simulation results must be compared automatically with the expected responses.

Testability support. Complexity, poor accessibility, and a heterogeneous component mix make MCM testing a tedious task. MCM designs must address testability. This may require the use of an appropriate test architecture such as boundary-scan or built-in self-test.

HDL-centered design environment. To facilitate automated synthesis and design validation at each step in the design process, textual formalisms such as hardware description languages (HDLs) and other standard, widely accepted formats must be used in addition to graphical front ends and schematic capture tools.

Performance-driven design flow. If not carefully controlled, the use of synthesis tools can adversely affect performance. The design environment must have provisions that let the designer override synthesis tools whenever desired. Further, all synthesis tools should be capable of taking some specified performance goals into account.

Over the past several years, research and development have brought many of these ideas to maturity. For example, many CAD vendors offer silicon compilers, logic synthesis, and layout synthesis tools for ASICs. MCM place-and-route tools with varying degrees of efficiency are available, as are efficient simulators at all levels of abstraction. Standardization of VHDL (VHSIC Hardware Description Language) [2] and the widespread availability of VHDL simulators are accelerating the use of HDL-centered design environments. Similarly, test-related standards such as IEEE 1149.1 for boundary scan [3] and IEEE 1029.1 WAVES (Waveform and Vector Exchange Specification) [4] encourage the use of structured testability techniques in large multicomponent designs.

Other topics are still under investigation. Designers have not yet accepted behavioral synthesis, the process of generating register-level designs from behavioral specifications [5]. Partitioning algorithms, used for generating multichip designs from large monolithic specifications, are still in active research. In spite of rapid advances in test research, there are no commonly available tools for test-bench compilation and automated test-vector generation for large designs.

We have been developing an integrated design environment for multichip modules. The environment, called MSS (Multicomponent Synthesis System), contains several research tools developed over the past three years, along with several industrial-strength tools. The MSS environment is centered in VHDL, WAVES, and PDL (Performance Description Language, a notation for describing the performance attributes of a design). MSS provides four levels of automated synthesis support, all the way from the behavioral level to MCM placement and routing; three levels of simulation support, including behavioral, register, and switch levels; and tools for automated test-bench compilation and design validation for all synthesized designs.


MSS environment

Figure 1 shows the process of MCM design synthesis using the MSS tools. First, the user prepares an executable behavioral specification, in VHDL, of the digital system to be designed and employs a standard VHDL simulator to verify its behavior. The user also specifies the performance goals in PDL and employs the PDL interpreter to check the correctness of PDL programs.

The user then invokes the behavioral synthesis tool, called DSS (Distributed Synthesis System), which generates a register-level design, also in VHDL, using a parameterized register-level module-generator library.

Figure 1. Flowchart for the Multicomponent Synthesis System.

April 1993 63

Following behavioral synthesis, the partitioning engine invokes a collection of algorithms that partition the design across multiple VLSI chips. The resulting multicomponent design meets the stated pin-limit and area constraints on individual chips. The register-level description of each chip $i$ is generated as a VHDL file $C_i$, and the interconnections among the chips are described in a separate top-level VHDL entity file $C$.

Next, the user optionally invokes the test-structure compiler. This compiler adds all the necessary hardware to implement the IEEE standard boundary-scan architecture for testing the multicomponent design. The resulting component designs are placed in individual VHDL files $BC_i$ and a top-level connectivity file $BC$.

The component designs are processed by a silicon compiler that generates fabricatable mask layouts. We currently use the LagerIV tools from Mississippi State University and the University of California at Berkeley, and the Generator Development Tools (GDT) from Mentor Graphics. We developed translators to convert register-level VHDL into the formats required by these tools. The resulting layouts are saved in files $L_i$.

    entity find is
      port (x     : in  bit_vector(3 downto 0);
            index : out bit_vector(3 downto 0));
    end find;

    architecture find of find is
      type int_array is array (0 to 7) of bit_vector(3 downto 0);
      signal list : int_array := ("1000", "0111", "0110", "0101",
                                  "0100", "0001", "0010", "0011");
    begin
      sort : process
        variable i, j : integer := 0;                    --%width 4
        variable low, high, mid, found : integer := 0;   --%width 4
        variable temp : bit_vector(3 downto 0);
        variable sorted : bit := '0';
      begin
        if (sorted = '0') then                           -- bubble sort
          i := 0;
          while (i < 8) loop
            j := i + 1;
            while (j < 8) loop
              if bits_to_int(list(j)) < bits_to_int(list(i)) then
                temp := list(j);
                list(j) <= list(i);
                list(i) <= temp;
                wait for 0 ns;                           -- for synchronization
              end if;
              j := j + 1;
            end loop;
            i := i + 1;
          end loop;
          sorted := '1';
        end if;

        index <= "1111";
        low := 0;
        high := 8;
        found := 0;
        while ((low < high) and (found = 0)) loop        -- binary search
          mid := (low + high) / 2;
          if (x = list(mid)) then found := 1; end if;
          if (x > list(mid)) then low := mid + 1; end if;
          if (x < list(mid)) then high := mid; end if;
        end loop;

        if (found = 1) then index <= int_to_bits4(mid); end if;
        wait on x;
      end process;
    end find;

Figure 2. Behavioral specification of the Find in VHDL.

The packaging compiler is the final step in the synthesis process. At its heart is an MCM place-and-route system. We currently use MCM, hybrid, and printed-circuit-board tools from Mentor Graphics for this purpose. We have developed several translators to generate appropriate net-list and geometry files to drive these place-and-route tools.

Design validation in the MSS environment is accomplished through simulation at various stages. As stated earlier, a VHDL simulator validates the behavioral specification. The test vectors used for this purpose and the test results are stored in WAVES format. A multicomponent test-bench compiler generates individual test benches (test vectors plus expected responses) for each component in the multicomponent design. Each test bench $T_i$ is generated in WAVES notation. Also generated is a test bench $T$ to validate the entire design, including the interconnect. Each component design $BC_i$ is simulated using the test bench $T_i$ to verify the correctness of the component designs. The entire design is also simulated using $T$.

The MSS environment also incorporates a switch-level test-bench compiler capable of accepting a WAVES test bench and generating a switch-level test bench. This compiler generates test benches $ST_i$ in a format suitable for driving a switch-level simulator. These test benches simulate the switch-level models extracted from the mask layouts. If desired, the entire design can be simulated at the switch level. However, this is usually avoided because it requires an inordinate amount of simulation time.

Thermal analysis and cross-talk analysis tools are available within the packaging compiler. The user conducts these analyses; if the results are not satisfactory, a corrective action can be taken and the design process repeated from any of the previous steps. For example, alternative partitions could be generated by relaxing the constraints and reinvoking either the behavioral synthesis system or the partitioning engine, or both.

COMPUTER


All tools in the MSS environment execute on Sun Sparcstations and are written primarily in C++ (with some pieces in C and Prolog).

Three examples

Three tutorial examples in this article illustrate MSS algorithms and results. The primary example is the Find, which performs a bubble sort followed by binary search. We use it as the running example because it is small. Such small specifications, however, do not require MCMs. We use two larger examples, the Move Machine and the Viper Microprocessor, to illustrate the results.

Find. Find has an array of eight 4-bit numbers. On power-up, Find sorts these numbers in ascending order. Then it accepts a 4-bit number, searches for the number in this array, and if found, returns the array index. Figure 2 lists its behavioral specification. The interface to the Find entity is kept quite simple for the sake of clarity. Writing a behavioral specification is the first and most important step in generating an MCM design using MSS. Tools in MSS can only interpret the VHDL constructs used in the behavioral specification; they have no way of determining whether the specification itself is "correct." The user must ensure that the specification reflects what he or she wants.
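As an informal cross-check of the behavior just described, the following Python sketch models Find in software. It is only a reference model, not part of MSS: the names `bubble_sort`, `find_index`, and `NOT_FOUND` are hypothetical, the initial values are those listed in Figure 2, and the "not found" result of 15 mirrors the "1111" that Figure 2 drives on `index` before the search.

```python
# Software reference model of Find (illustrative only; not an MSS tool).
INITIAL_LIST = [0b1000, 0b0111, 0b0110, 0b0101, 0b0100, 0b0001, 0b0010, 0b0011]
NOT_FOUND = 0b1111  # value placed on `index` before the search in Figure 2


def bubble_sort(values):
    """Exchange sort mirroring the two nested while loops of Figure 2."""
    data = list(values)
    for i in range(len(data)):
        for j in range(i + 1, len(data)):
            if data[j] < data[i]:
                data[i], data[j] = data[j], data[i]
    return data


def find_index(x, sorted_list):
    """Binary search over the half-open range [low, high)."""
    low, high = 0, len(sorted_list)
    while low < high:
        mid = (low + high) // 2
        if x == sorted_list[mid]:
            return mid
        if x > sorted_list[mid]:
            low = mid + 1
        else:
            high = mid
    return NOT_FOUND


SORTED_LIST = bubble_sort(INITIAL_LIST)
hit = find_index(0b0101, SORTED_LIST)   # 5 is stored in the array
miss = find_index(0b1001, SORTED_LIST)  # 9 is not stored in the array
```

A model like this can supply expected responses when checking the behavioral VHDL in simulation, although MSS itself stores test vectors and results in WAVES format.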

Move Machine. The Move Machine is a small instruction-set processor. It has a 10-bit instruction pointer, a 16-bit instruction register, four 16-bit general-purpose registers, and a set of 16 instructions with four addressing modes. It assumes a 1-Kbyte, 16-bit main memory. The VHDL specification of the Move Machine is given in Roy et al. [6]

Viper Microprocessor. Viper was the first microprocessor to be specified, designed, and implemented with formal verification in mind [7]. Viper's 32-bit architecture assumes a 1-Mbyte main memory, two 32-bit general-purpose registers, a 32-bit accumulator, and a 12-bit instruction pointer. Viper has 32 instructions, 16 addressing modes, and eight destination selection modes. The VHDL specification of Viper's instruction set is about 450 lines long and is available to the public through the authors.

Figure 3. Register-level design of the Find, excluding multiplexers.

High-level synthesis

High-level synthesis is the process of generating a register-level design from a behavioral specification. The register-level design contains a data path and a finite-state controller. Our high-level synthesis system, called DSS (Distributed Synthesis System), is a collection of parallel algorithms [6]. The data paths generated by DSS contain register-level modules selected from a parameterized module library. This module library contains VHDL descriptions of the register-level modules and information about the performance of each module (for example, its delay time). Since the modules are parameterized in terms of their bit-widths, the area and delay profiles are stored as equations or tables.

DSS accepts a behavioral specification in a synthesizable subset of VHDL [6] and performance constraints in terms of the desired clock period and the upper limit on the area. DSS views the clock period as the maximum delay of the combinational blocks in any register transfer and the area constraint as a bound on the sum of the areas of the register-level components used in the data path.

The behavioral specification is first translated into a dataflow graph representation called VIF (VHDL Intermediate Format). The VIF representation is organized as a collection of segments, one segment for each process, subprogram, wait statement, and while-loop in the VHDL specification. The VIF representation is then loaded into DSS during the initialization phase.

DSS consists of the three major tasks described in this section. Figure 3 shows the register-level design generated by DSS for the Find specification. The entire system executes on several uniprocessor and multiprocessor platforms.

Scheduling and performance estimation. The operations in the dataflow graph are assigned relative time steps during scheduling, and the arithmetic operations are bound to physical arithmetic logic units available in the module library.

In DSS, the scheduler simultaneously explores many alternative designs. To do so, it first generates all valid module sets from the module library. A valid module set is a collection of register-level modules selected (with duplicates allowed) from the module library such that (a) the selected modules are sufficient to implement all operations in the dataflow graph, (b) no module in the module set has a delay greater than the specified clock period constraint, (c) together, the modules in the module set do not violate the area constraint specified by the user, and (d) the number of copies of any module does not exceed the maximum number needed to implement all the parallelism available in the dataflow graph. Then, for each valid module set, the dataflow graph is scheduled using a variant of Paulin's force-directed scheduling algorithm [8]. The force-directed algorithm uses an efficient heuristic to seek the fastest possible schedule subject to the resource constraints imposed by the modules available in the valid module set.
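Conditions (a) through (d) can be read as a filter over candidate multisets of library modules. The Python sketch below is a minimal illustration of that reading and is not the DSS implementation: the dictionary fields (`name`, `ops`, `delay`, `area`), the function name `valid_module_sets`, and the interpretation of condition (d) as a per-operation parallelism cap are all assumptions made for the example.

```python
from itertools import product


def valid_module_sets(library, required_ops, max_parallel, clock_period, area_limit):
    """Enumerate module sets that satisfy conditions (a)-(d) on a toy library."""
    # (b) Discard modules whose delay exceeds the clock period.
    usable = [m for m in library if m["delay"] <= clock_period]
    # (d) Assumed reading: a module is never needed in more copies than the
    # maximum parallelism of any operation it implements.
    caps = [max(max_parallel.get(op, 0) for op in m["ops"]) for m in usable]
    result = []
    for counts in product(*(range(cap + 1) for cap in caps)):
        chosen = [(m, n) for m, n in zip(usable, counts) if n > 0]
        covered = set().union(*(m["ops"] for m, _ in chosen))
        if not required_ops <= covered:                            # (a)
            continue
        if sum(m["area"] * n for m, n in chosen) > area_limit:     # (c)
            continue
        result.append({m["name"]: n for m, n in chosen})
    return result


# Hypothetical three-module library; the numbers are made up for illustration.
library = [
    {"name": "adder", "ops": {"+"}, "delay": 20, "area": 100},
    {"name": "alu", "ops": {"+", "<"}, "delay": 30, "area": 180},
    {"name": "slow_cmp", "ops": {"<"}, "delay": 60, "area": 50},
]
module_sets = valid_module_sets(
    library,
    required_ops={"+", "<"},
    max_parallel={"+": 2, "<": 1},
    clock_period=40,
    area_limit=300,
)
```

In this toy case only two module sets survive: one ALU alone, or one adder plus one ALU. Each surviving set would then be handed to the scheduler as a resource constraint.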

Following the scheduling step, a performance estimator is invoked to estimate the area and clock speed of the design that would be generated from the scheduled dataflow graph. Estimation of area considers the effects of introducing registers, multiplexers, routing, and the controller. Estimation of execution time is based on the number of control steps used to schedule the dataflow graph. The estimator generates a plot showing the area versus execution-time trade-off for various module sets. The user can select a trade-off point, or DSS will select the fastest of the least-area designs. For details on DSS scheduling and trade-off analysis, see Dutta et al. [9]

Register optimization. Each carrier in the dataflow graph represents the need for a register (roughly speaking, a carrier is an edge connecting two operation nodes in the dataflow graph). Two carriers are said to be compatible if their life spans do not overlap. The problem of register optimization is to find the best grouping of compatible carriers such that each group can be bound to a physical register. Within any group, the life spans of any two carriers should not overlap. Fewer groups lead to fewer registers in the data path. Like scheduling, register optimization is an NP-hard problem.

Our register-optimization algorithm is based on partitioning the dataflow graph at the VIF segment boundaries. In the first step, intrasegment optimization is carried out on the basis of extensive intrasegment life-span analysis to determine the best grouping of compatible carriers within each segment. The second step is the intersegment register optimization. In this step, various carrier groups determined from the intrasegment optimization are further combined into larger groups. Intersegment optimization does not require any further life-span analysis; instead, it simply exploits the data-dependence relationships (calling sequences) among the segments as represented in the dataflow graph.

At the end of register optimization, a few groups of compatible carriers are left. Each group is then bound to a physical register selected from the module library. After the register optimization is completed, the interconnect structure of the design is formed by adding multiplexers and buses as appropriate. For details on DSS register optimization, see Roy et al. [6]
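To make the notion of compatible carriers concrete, the sketch below groups carriers within a single segment using a greedy left-edge style pass. This is a stand-in technique chosen for illustration; the article does not say which algorithm DSS uses for intrasegment optimization. The function name `group_carriers` and the representation of a life span as a half-open interval of control steps are assumptions.

```python
def group_carriers(life_spans):
    """Group carriers so that no two life spans in a group overlap.

    life_spans maps a carrier name to (birth, death); the carrier is assumed
    to be live on the half-open control-step interval [birth, death).
    """
    groups = []  # each entry: {"free_at": step, "carriers": [names]}
    # Visit carriers in order of increasing birth step (the "left edge").
    for name, (birth, death) in sorted(life_spans.items(), key=lambda kv: kv[1]):
        for group in groups:
            if group["free_at"] <= birth:        # register is free again
                group["carriers"].append(name)
                group["free_at"] = death
                break
        else:
            groups.append({"free_at": death, "carriers": [name]})
    return [group["carriers"] for group in groups]


# Four hypothetical carriers; a/c and b/d have non-overlapping life spans.
registers = group_carriers({"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5)})
```

Each resulting group would be bound to one physical register, so the four carriers in this example need only two registers.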

Controller generation. A behavioral specification in VHDL is viewed as a collection of communicating processes. While facilitating abstract specification, this view complicates controller organization and generation. The controller is conceptually organized as a collection of communicating synchronous state machines, one for each VIF segment. A privileged finite-state machine, called the root FSM, controls all segment-state machines, called leaf FSMs. Leaf FSMs control the register transfers in the data path.

The interaction between the root FSM and the leaf FSMs resembles what happens in the VHDL simulation cycle. During each cycle of operation, the root FSM updates all the signal registers and invokes all the leaf FSMs. When all leaf FSMs reach the privileged wait states, the root FSM generates a special signal, called finish. In simulation terms, the finish signal indicates the suspension of all processes and the absence of pending transactions. Then, the root FSM begins a new cycle.
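The root/leaf handshake can be pictured with a small software model. The Python sketch below is only an analogy under stated assumptions: the class `LeafFSM`, the function `root_cycle`, the fixed step count per leaf, and the two dictionaries standing in for signal registers and scheduled updates are all hypothetical and do not describe the generated hardware.

```python
class LeafFSM:
    """A leaf FSM that runs a fixed number of control steps, then waits."""

    def __init__(self, steps, action):
        self.steps = steps      # control steps per activation (assumed >= 1)
        self.action = action    # maps current signal values to scheduled updates
        self.remaining = 0      # 0 means "in the wait state"

    def invoke(self):
        self.remaining = self.steps

    def in_wait_state(self):
        return self.remaining == 0

    def clock(self, current, pending):
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                # Updates are scheduled, not applied, as with VHDL signals.
                pending.update(self.action(current))


def root_cycle(leaves, current, pending):
    """One root-FSM cycle; returns the clock ticks taken before `finish`."""
    current.update(pending)     # root FSM updates the signal registers
    pending.clear()
    for leaf in leaves:         # root FSM invokes all leaf FSMs
        leaf.invoke()
    ticks = 0
    while not all(leaf.in_wait_state() for leaf in leaves):
        for leaf in leaves:
            leaf.clock(current, pending)
        ticks += 1
    return ticks                # all leaves wait: `finish` would be raised here


signals = {"a": 1, "b": 2}
scheduled = {}
leaves = [LeafFSM(2, lambda s: {"a": s["b"]}), LeafFSM(3, lambda s: {"b": s["a"]})]
first_ticks = root_cycle(leaves, signals, scheduled)
after_first = dict(signals)
root_cycle(leaves, signals, scheduled)
after_second = dict(signals)
```

The two leaves swap `a` and `b`. Because updates are only committed at the start of the next root cycle, both leaves read the old values, which is the behavior a VHDL simulator would give for concurrent signal assignments.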

We should point out that in spite of the conceptual organization as a collection of state machines, the design is actually synchronous and, in fact, assumes the standard two-phase clocking scheme.

Partitioning for multichip designs

The register-level design generated by DSS may be too large to fit on one chip. In such cases, it should be partitioned into several chips, subject to area and pin constraints. The partitioning engine serves this purpose. It has a flexible framework that gives the user access to various partitioning algorithms and performance evaluators as well as an interactive interface that facilitates manual partitioning if desired.

The partitioning problem involves clustering the register-level elements (registers, arithmetic logic units, multiplexers, etc.) into several groups. Each group of elements is bound to a VLSI chip, and all the chips together are bound to a multichip module. Let the register-level design $R$ consist of register-level elements $r_1, r_2, \ldots, r_n$. These components should be collated into disjoint groups $g_1, g_2, \ldots, g_m$, $1 \le m \le n$, subject to the following constraints:

(1) Area constraint: $\forall g_i,\ \sum_{r \in g_i} \mathrm{area}(r) \le A$

where $\mathrm{area}(r)$ denotes the area of the register-level module $r$, and $A$ denotes the area of any chip.

(2) Pin constraint: $\forall g_i,\ \text{pin-count}(g_i) \le P$

where $P$ is the maximum number of I/O pins allowed for any chip.
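The two constraints translate directly into a feasibility check on a candidate partition. The Python sketch below assumes a simple pin model that the article does not spell out: a chip needs one pin for every net that connects an element inside the group to an element outside it. The function names and the toy data are hypothetical.

```python
def chip_area(group, area):
    """Sum of area(r) over the elements r bound to one chip."""
    return sum(area[r] for r in group)


def pin_count(group, nets):
    """Assumed pin model: one pin per net that crosses the chip boundary."""
    return sum(1 for net in nets if group & net and net - group)


def satisfies_constraints(groups, area, nets, A, P):
    """True if every group meets the area bound A and the pin bound P."""
    return all(
        chip_area(g, area) <= A and pin_count(g, nets) <= P for g in groups
    )


# Toy register-level design: element areas and nets are made-up numbers.
area = {"r1": 40, "r2": 30, "alu": 50, "mux": 10}
nets = [{"r1", "alu"}, {"r2", "alu"}, {"alu", "mux"}, {"mux", "r1"}]
groups = [{"r1", "mux"}, {"r2", "alu"}]

feasible = satisfies_constraints(groups, area, nets, A=100, P=2)
too_few_pins = satisfies_constraints(groups, area, nets, A=100, P=1)
```

Here both chips fit within an area of 100 and need two pins each, so the partition is feasible for $P = 2$ but not for $P = 1$.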

The MSS partitioning engine uses performance attributes and cost functions defined in PDL to drive the partitioning algorithms. PDL is a performance description language based on attribute grammars. PDL programs define such attributes as area and pin-count as well as the number of chips in an MCM and the number of interconnection wires among the chips (see the sidebar for example PDL programs).
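The idea of an attribute defined over a design hierarchy can be sketched outside PDL as well. The Python fragment below evaluates an area attribute bottom-up over a nested description, in the spirit of the sidebar's area example. It is not PDL and not the PDL interpreter: the dictionary layout, the `area` function, and the numbers are assumptions made for illustration.

```python
def area(obj):
    """Synthesized attribute: primitives carry an area, composites sum children."""
    if "children" not in obj:
        return obj["area"]
    return sum(area(child) for child in obj["children"])


# Hypothetical system with two modules built from processors and memories.
system = {"kind": "system", "children": [
    {"kind": "module", "children": [
        {"kind": "processor", "area": 120},
        {"kind": "memory", "area": 80},
    ]},
    {"kind": "module", "children": [
        {"kind": "memory", "area": 80},
    ]},
]}

num_modules = sum(1 for child in system["children"] if child["kind"] == "module")
total_area = area(system)
```

The recursion mirrors how an attribute-grammar rule computes a parent's attribute from the attributes of its constituents, and the `num_modules` count plays the role of a structural qualification on the number of modules.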

Partitioning algorithms usually attempt to minimize the total number of chips and/or the total number of interconnections among the chips. Partitioning is a combinatorial problem; even its restricted versions are known to be NP-hard. Various heuristics representing trade-offs between the time required to generate a partition and the quality of the partition generated have been proposed. No single algorithm can produce efficient designs in all situations. We briefly describe two partitioning algorithms here.

Performance Description Language (PDL)

Designs are hierarchical compositions of primitive design objects. A performance attribute is a function of the design structure and the performance attributes of its constituent objects. Thus, performance attributes can be conveniently described in terms of the attributes attached to objects in the design hierarchy. Attribute grammars [1] are a sound formalism for doing so.

We have been developing an executable notation to describe both static and dynamic performance attributes succinctly. The notation, called PDL (Performance Description Language), is based on attribute grammars. Interpretation of a PDL program describing a collection of performance attributes of a design yields a procedure to compute the values of these attributes. PDL users can describe a system’s performance attributes at multiple levels of abstraction.

In PDL a design schema is specified using schema construction clauses. Attributes can be associated with each design entity, and attribute computation rules can be associated with the schema construction clauses. We are incorporating a variety of data types, constructs, and short-hand notations in PDL to facilitate straightforward specification of complicated attributes. Here we introduce PDL through two simple examples. (The syntax used here is expected to change as PDL evolves.)

Static performance attributes. The PDL program in Figure A defines the area attribute of a system composed of processors and memory units. The first line in the program associates an attribute named area with each design object in the description. In subsequent lines, system, module, processor, and memory are design objects specified by the user, and processor and memory are defined to be primitive objects that do not contain any other objects. There are two schema construction clauses: system contains one or more module objects; module contains zero or more processor or memory objects. Enclosed between ‘( )’ is a structural qualification statement that restricts the number of ‘module’ objects in the system. Enclosed between ‘{ }’ are the attribute evaluation rules. Attribute area of the system is defined to be the same as the sum of the area attributes of its constituent modules, and area of the module is defined likewise. Enclosed between ‘[ ]’ is a list-construction operator. For example, this construct in line 4 constructs a list of the area values of all the module objects.

    attr area of * is int;
    system :: module
        ( num(module) > 0; )
        { system.area = +[module.area]; }
    module :: processor, memory
        { module.area = +[processor.area]
                      + +[memory.area]; }
    processor :: primitive;
    memory :: primitive;

Figure A. PDL program to define the area attribute of a system.

Dynamic performance attributes. The PDL program in Figure B defines an attribute to determine the maximum length of the queue in a design that consists of three modules: producer, consumer, and queue.

    dynamic attr length, max_length of queue is int := 0;
    event attr flag of queue is (enqueue, dequeue);
    process :: producer, consumer, queue
        ( num(consumer) == 1; num(producer) == 1; num(queue) == 1; )
    queue :: primitive;
        { on(flag)
            then if (flag == enqueue)
                then begin
                    length := length + 1;
                    if (max_length < length)
                        then max_length := length;
                end;
                else length := length - 1; }
    producer :: primitive;
    consumer :: primitive;

Figure B. PDL program to define the maximum length of a module.

The program defines length and max_length to be dynamic attributes of queue and defines flag to be an event associated with the queue. An event is the means with which the environment can impact execution of a PDL program. In this program, flag is an enumerated type assuming enqueue and dequeue values. The attribute computation rule associated with queue is conditional upon the flag event. Whenever flag assumes enqueue or dequeue, the length and max_length are updated.

PDL implements dynamic attributes by interfacing the PDL system with a hardware-description-language (HDL) simulator. The external events take place in the simulator as the HDL model is executed. We are currently addressing how to establish the correspondence between the event attributes in the PDL code and events in the HDL simulator and how to formalize the interaction between these events.

For further details on PDL, see Mandayam and Vemuri [2].

References

1. P. Deransart, M. Jourdan, and B. Lorho, Attribute Grammars, Lecture Notes in Computer Science 323, Springer-Verlag, 1988.

2. R. Mandayam and R. Vemuri, “Performance Specification and Measurement,” to be published in Proc. IFIP Conf. on Computer Hardware Description Languages and Their Applications, North Holland, New York, Apr. 1993.

Figure 4. Multichip design of the Find.

    N: population size (number of partitions in a generation) := 100
    S: percent of new generation produced by selection := 20
    C: percent of new generation produced by crossover := 80
    M: percent of partitions mutated := 20
    begin
        Create a random set of N partitions.
        Evaluate the fitness of each partition.
        while (stopping criteria not satisfied) do
        begin
            Create S percent of new population of partitions by selection.
            Create C percent of new population of partitions by crossover.
            Replace the current generation by new generation of partitions.
            Mutate M percent of the current partitions.
            Evaluate the fitness of each partition.
            Save the partition with the best fitness.
        end
    end

Figure 5. Genetic partitioning algorithm.

Hierarchical clustering. In the hierarchical clustering technique, we assume that none of the register-level elements violates the area or pin constraints. A cluster represents a VLSI chip. A register-level element that is included in a cluster is said to be bound; otherwise, it is said to be free. Initially, a free element is selected and included in a new cluster. A free element is added to this cluster if such an addition does not violate the area and pin constraints on the chips. In case of multiple choices, the element that causes the maximum increment in the cluster area is selected. This process is repeated until no more elements can be added to the cluster. Additional clusters are created until all elements are bound. This completes the first iteration of the clustering algorithm.

In the next iteration, the clusters formed in the previous iteration are viewed as elements and further clustering is attempted. This iterative process continues until no further clustering occurs during an iteration. Hierarchical clustering is a deterministic, greedy algorithm. It usually gives less than optimal partitions, but it is quite fast and memory efficient. We used it to partition the Find example into five chips. Figure 4 shows the resulting design.
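The clustering procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the MSS partitioning engine: the function names, the tie-breaking by list order, and the pin model (one pin per net that crosses the chip boundary) are assumptions, and the element areas and nets are made-up numbers.

```python
def group_area(group, area):
    return sum(area[r] for r in group)


def pin_count(group, nets):
    """Assumed pin model: one pin per net that crosses the chip boundary."""
    return sum(1 for net in nets if group & net and net - group)


def fits(group, area, nets, A, P):
    return group_area(group, area) <= A and pin_count(group, nets) <= P


def one_pass(items, area, nets, A, P):
    """One clustering iteration over the current items (elements or clusters)."""
    free = list(items)
    clusters = []
    while free:
        cluster = free.pop(0)               # seed a new cluster with a free item
        while True:
            candidates = [e for e in free if fits(cluster | e, area, nets, A, P)]
            if not candidates:
                break
            # Pick the item that adds the most area to the cluster.
            best = max(candidates, key=lambda e: group_area(e, area))
            free.remove(best)
            cluster = cluster | best
        clusters.append(cluster)
    return clusters


def hierarchical_clustering(area, nets, A, P):
    """Repeat clustering passes until a pass merges nothing."""
    items = [frozenset([r]) for r in sorted(area)]
    while True:
        clusters = one_pass(items, area, nets, A, P)
        if len(clusters) == len(items):
            return clusters
        items = clusters


area = {"a": 40, "b": 30, "c": 50, "d": 10}
nets = [{"a", "c"}, {"b", "c"}, {"c", "d"}, {"d", "a"}]
chips = hierarchical_clustering(area, nets, A=100, P=3)
```

On this toy design the first pass packs `a`, `c`, and `d` into one chip of area 100 and leaves `b` alone; the second pass cannot merge the two clusters, so the algorithm stops with two chips.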

Genetic partitioning. Randomized search algorithms that intermittently admit inferior designs in order to reach a final optimal design are known to be effective in solving combinatorial problems. Genetic algorithms [10] are one such class of algorithms. They search through generations of populations (solutions to the given problem) to arrive at a population containing an individual solution that meets the given criteria. As the population evolves during the search process, the average fitness of the population improves.

We have developed a genetic algorithm to solve our partitioning problem (see Figure 5) [11]. Initially, it creates a number of random partitions. The fitness of a partition is defined as $1/(1 + PD + AD)$, where $PD$ is the net deviation from the pin constraint on each chip and $AD$ is the net deviation from the area constraint on each chip. Fitness is a value between 0.0 and 1.0. The algorithm uses the following operators to generate the next generation of partitions:

Selection. A randomly selected partition that is highly fit (fitness value > 0.8) can be moved into the next generation. Twenty percent of the partitions in the new generation are created using selection.

Crossover. Two highly fit partitions P1 and P2 in the current population are randomly selected. The largest chip C (in terms of total area of the register-level components in it) in P1 is copied into P2. Register-level elements in C that are currently assigned to other chips in P2 are deleted from those other chips. Both P1 and P2 are moved into the next generation of partitions. Eighty percent of the partitions in the new generation are created using crossover.

Mutation. A partition is randomly selected and a register-level element from some randomly selected chip in the partition is moved to another randomly selected chip. Mutation is applied to 20 percent of the partitions in the new generation.
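The crossover and mutation operators described above can be sketched in Python as follows. This is an illustrative sketch only: representing a partition as a list of sets of element names, dropping chips that become empty, and the function names are assumptions, not details taken from the MSS genetic partitioner.

```python
import random


def crossover(p1, p2, area):
    """Copy the largest chip of p1 into p2, removing its elements from p2's chips."""
    largest = max(p1, key=lambda chip: sum(area[e] for e in chip))
    child = [chip - largest for chip in p2]      # delete duplicated elements
    child = [chip for chip in child if chip]     # assumption: drop emptied chips
    child.append(set(largest))
    return child


def mutate(partition, rng):
    """Move one randomly chosen element from one chip to another chip."""
    result = [set(chip) for chip in partition]
    if len(result) < 2:
        return result
    src = rng.choice([chip for chip in result if chip])
    dst = rng.choice([chip for chip in result if chip is not src])
    element = rng.choice(sorted(src))            # sorted for reproducibility
    src.remove(element)
    dst.add(element)
    return [chip for chip in result if chip]


# Hypothetical data: four elements and two parent partitions.
area = {"a": 4, "b": 3, "c": 2, "d": 1}
p1 = [{"a", "b"}, {"c", "d"}]
p2 = [{"a", "c"}, {"b", "d"}]
child = crossover(p1, p2, area)
mutant = mutate(child, random.Random(0))
```

Both operators keep every register-level element assigned to exactly one chip, so each result is again a valid partition whose fitness can be evaluated.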

The genetic partitioning algorithm terminates when a termination criterion is satisfied. The criterion usually is that a partition with fitness 1.0 is found. In addition, to find optimal partitions (not just constraint-satisfying partitions), other criteria must be specified. These criteria might include the computation time, the number of generations to be searched, or a lower limit on a measure of global optimality such as the total number of interconnection wires or the total number of chips. The genetic partitioning algorithm is quite slow but can find better quality partitions, in terms of the number of chips and the number of interconnections among them, than the hierarchical clustering algorithm. For complete details of the genetic partitioning algorithm, see Vemuri and Vemuri.¹¹
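The fitness measure 1/(1 + PD + AD) and the usual termination test can be written out as a short Python sketch. The article does not spell out how the "net deviation" is computed; the sketch assumes it is the summed excess over the limit across all chips, so that a partition meeting both constraints on every chip has fitness exactly 1.0. The function names and the sample numbers are hypothetical.

```python
def fitness(chip_pins, chip_areas, pin_limit, area_limit):
    """Fitness 1/(1 + PD + AD), with deviations taken as summed excess (assumed)."""
    pd = sum(max(0, pins - pin_limit) for pins in chip_pins)
    ad = sum(max(0.0, area - area_limit) for area in chip_areas)
    return 1.0 / (1.0 + pd + ad)


def meets_constraints(chip_pins, chip_areas, pin_limit, area_limit):
    """Usual termination test: a partition with fitness 1.0 has been found."""
    return fitness(chip_pins, chip_areas, pin_limit, area_limit) == 1.0


# Hypothetical two-chip partitions checked against a 120-pin, 25 mm^2 limit.
good = fitness([100, 120], [20.0, 25.0], pin_limit=120, area_limit=25.0)
bad = fitness([130, 120], [20.0, 27.0], pin_limit=120, area_limit=25.0)
```

Under this reading, any violation pushes the fitness strictly below 1.0, and larger violations push it toward 0.0, which matches the stated range of the measure.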

Test structure compiler

Complexity of designs and limited probing accessibility in multichip modules necessitate incorporation of test structures in the design to improve observability and controllability. The IEEE standard boundary-scan architecture provides a means of gaining access to the chips buried inside the multichip module.³

The MSS test structure compiler automatically incorporates the IEEE standard test access port (TAP), scan cells, and other devices necessary to support boundary scan. To each chip in the multichip design generated from the partitioning phase, the test structure compiler adds a 16-state TAP controller, a 3-bit instruction register, an instruction decoder, a bypass register, scan cells for all the pins, and, optionally, a device identification register. The following instructions are supported by default: Extest, Bypass, Sample/Preload, Intest, and Idcode. Five pins are added to each chip: Tck, Tms, Tdi, Tdo, and Trst*.
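The 16-state TAP controller mentioned above is the state machine defined by IEEE Standard 1149.1, advanced by the Tms value on each rising edge of Tck. The following Python sketch tabulates those transitions; it models only the standard state diagram, not the VHDL structure that the MSS library actually instantiates, and the helper names are illustrative.

```python
# Next state for each TAP state, indexed by the Tms value (0 or 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def run_tap(state, tms_bits):
    """Apply a sequence of Tms values, one per Tck rising edge."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# From reset, Tms = 0, 1, 0, 0 reaches the data-register shift state.
shift_state = run_tap("Test-Logic-Reset", [0, 1, 0, 0])
```

A useful property of this diagram is that holding Tms high for five clock cycles returns the controller to Test-Logic-Reset from any state, so the controller can always be brought to a known state through Tms and Tck alone.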

The test structure compiler has access to a library of boundary-scan structures written in VHDL. The compilation consists of three major phases. First, relevant design and connectivity information is extracted from the VHDL files generated at the end of the partitioning phase. Then, during the boundary-scan synthesis phase, all necessary test structures are added in each chip and appropriate connections are added among the new test structures and the existing I/O pins of the chip. Finally, during the serial-scan-path formation phase, the chips are interconnected such that the Tdo of one chip goes to the Tdi of another during the test mode. The test access port is added to the overall MCM design. Thus, a serial scan path is established through the entire MCM and can be exercised in its test mode.
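The serial-scan-path formation phase can be pictured with a small Python sketch that emits the chip-to-chip connections. The net representation (pairs of "owner.pin" strings), the name "MCM" for the module-level test access port, and the broadcast of Tck, Tms, and Trst to every chip are assumptions for illustration; the article only states that the Tdo of one chip feeds the Tdi of another.

```python
def build_scan_chain(chips):
    """Return (driver, load) connections forming one serial scan path."""
    connections = []
    previous = "MCM.Tdi"                       # module-level serial input
    for chip in chips:
        connections.append((previous, f"{chip}.Tdi"))
        previous = f"{chip}.Tdo"               # this chip drives the next one
    connections.append((previous, "MCM.Tdo"))  # module-level serial output
    for signal in ("Tck", "Tms", "Trst"):      # assumed to be broadcast
        for chip in chips:
            connections.append((f"MCM.{signal}", f"{chip}.{signal}"))
    return connections


chain = build_scan_chain(["chip1", "chip2"])
```

For n chips the serial path needs n + 1 connections, plus 3n for the broadcast control signals under the stated assumption.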

The test structure compiler outputs the resulting MCM design in the form of several VHDL files, one for each chip and one for the connectivity information among all the chips.

Silicon compilation

Mask layouts for each chip in the MCM are generated using the LagerIV silicon compiler tools.¹² LagerIV is a collection of logic synthesis and physical design-automation tools for VLSI chips. Specifications to LagerIV are written in two formats: SDL specifies structural composition of existing library modules and standard cells, and BDS specifies finite-state controllers. We wrote two translators to generate appropriate SDL and BDS files from the register-level VHDL description of the chip being synthesized. These files are then processed by the LagerIV tools to generate fabricatable mask layouts. We currently use 2.0-micron scalable CMOS technology; the design process, however, is completely technology-independent.

Figure 6 shows the mask layout, with bonding pads, for one of the chips in the Find MCM.

Figure 6. Layout of chip 5 in the Find MCM (with boundary scan).

April 1993 69

Design verification methodology

Although the designs generated by various synthesis tools are expected to be correct “by construction,” some software errors can remain in the synthesis tools and result in incorrect designs. To verify that the designs generated meet the functional specifications and to avoid costly post-fabrication test and repair, the MSS design process includes extensive simulations.

The design is verified through simulations at various stages. Our approach is primarily based on functional testing at each level of design abstraction, namely, the behavioral, the register, and the switch levels. We tested all the component generators in the various MSS component libraries before adding them to the libraries.

To facilitate functional testing, we provide two test-bench compilers. A test bench is an executable textual description of the test vectors and expected results. WAVES is an IEEE standard for writing test benches.⁴ When simulating the behavioral specifications, the MSS user must have employed good functional testing practices so that reasonable fault coverage is possible. MSS uses these test vectors to automatically generate functional tests at lower levels of abstraction. All the user has to do is provide behavioral-level test vectors in WAVES format. The MSS multicomponent test-bench compiler generates test benches for each component as well as for the entire MCM design. These test benches are in the form of WAVES files that can be used in conjunction with any VHDL simulator for register-level simulations of the MCM design.

WAVES test benches can also be used for post-fabrication testing of the MCM and its constituent chips (both before and after being embedded in the MCM package), provided the automated test equipment accepts WAVES files. Many test-equipment manufacturers are developing software packages to facilitate inputs in WAVES format. For post-layout simulations of the chips, we use a switch-level simulator that extracts circuit descriptions from files generated by the silicon compiler. The switch-level simulator uses a linear timing model based on the extracted capacitances to provide both functional and timing simulations. More detailed simulations (using a circuit simulator such as SPICE) are prohibitively expensive for such large designs. Note, however, that all component generators in the library used in conjunction with the silicon compiler have been simulated with SPICE.

The switch-level test-bench compiler automatically generates switch-level simulator-stimulus files. The source for these files is the WAVES external files produced by the multicomponent test-bench compiler. Simulation results, at both the register and switch levels, are automatically compared with the expected responses, using two programs provided for this purpose.

Figure 7. Multicomponent test-bench compiler. Blocks shown: register-level design (VHDL); behavioral-level test bench (WAVES or VHDL); simulator; trace file for all signals; compare; multichip design from partitioning; waveform extractor; WAVES external files for each chip and the overall MCM (without boundary scan); WAVES test benches for each chip and for the overall MCM (with boundary scan).

Multicomponent test-bench compiler. Figure 7 shows the multicomponent test-bench compiler. It generates test benches for the individual chips as well as for the entire MCM. To generate these test benches, it is necessary to trace all the signals that realize the interconnections among the chips. This is done while simulating the register-level design generated by DSS with the same WAVES test bench used to simulate the behavioral specification.

If the register-level design is correct, it should produce the same responses as the behavioral specification for each test stimulus. However, there is one difficulty. The behavioral model has no concept of clocks or timing: the responses are produced “instantaneously.” The register-level design uses a two-phase clocking scheme; the responses are generated a finite number of clock cycles after applying the stimulus.

To determine the exact instant in time when the signal values are valid, we use the finish signal generated by the controller. Recall that this signal goes high when all the leaf FSMs reach the wait state, indicating that the system has finished responding to the stimulus. The values at the output ports at that time should match the expected values indicated by the WAVES test bench. The next test vector can then be applied. A program is used to compare, for each test vector in the WAVES file, the actual responses of the multicomponent design with the expected responses. A trace file that traces all the waveforms (signal values) is generated for the entire simulation run.
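The use of the finish signal to align clocked responses with untimed expected values can be sketched in Python. The trace format (one dictionary per clock cycle with a "finish" bit and an "outputs" string) and the function names are hypothetical; the sketch simply samples the outputs on each rising edge of finish and compares them, in order, with the expected responses.

```python
def sample_at_finish(trace):
    """Collect output values at each cycle where finish goes from 0 to 1."""
    samples = []
    previous = 1        # assume the design is idle before the first stimulus
    for cycle in trace:
        if cycle["finish"] == 1 and previous == 0:
            samples.append(cycle["outputs"])
        previous = cycle["finish"]
    return samples


def compare_responses(trace, expected):
    """Return (vector index, expected, actual) for every mismatching vector."""
    actual = sample_at_finish(trace)
    if len(actual) != len(expected):
        raise ValueError("number of responses differs from number of vectors")
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]


# Hypothetical register-level trace covering two test vectors.
trace = [
    {"finish": 0, "outputs": "00"},
    {"finish": 0, "outputs": "01"},
    {"finish": 1, "outputs": "11"},   # response to vector 0 is valid here
    {"finish": 0, "outputs": "11"},
    {"finish": 1, "outputs": "10"},   # response to vector 1 is valid here
]
mismatches = compare_responses(trace, ["11", "10"])
```

Intermediate output values, such as "01" in the second cycle, are ignored because finish is still low, which is the point of gating the comparison on that signal.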

The next step is to generate the WAVES test benches for each chip. The MCM design generated by the partitioning engine is analyzed to determine the names of the signals in the register-level design that correspond to the input and output pins of each chip. The waveforms on these signals are extracted from the simulation trace file. Waveforms on the input pins of a chip represent the stimuli, and waveforms on its output pins represent the expected responses. This data is generated as a WAVES external file. A WAVES external file is generated for each chip and one for the overall design.
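The waveform extraction step can be illustrated with a short Python sketch. The trace layout (a dictionary mapping each signal name to its list of sampled values) and the signal names are hypothetical, and the real tool writes WAVES external files rather than Python tuples; the sketch only shows how a chip's input-pin waveforms become stimuli and its output-pin waveforms become expected responses.

```python
def extract_chip_vectors(trace, input_signals, output_signals):
    """Build (stimulus, expected) bit-string pairs for one chip from a trace."""
    steps = len(next(iter(trace.values())))     # all signals share one length
    vectors = []
    for t in range(steps):
        stimulus = "".join(str(trace[s][t]) for s in input_signals)
        expected = "".join(str(trace[s][t]) for s in output_signals)
        vectors.append((stimulus, expected))
    return vectors


# Hypothetical trace of three interconnect signals over three time steps.
trace = {"n1": [0, 1, 1], "n2": [1, 1, 0], "n3": [0, 0, 1]}
chip_vectors = extract_chip_vectors(trace, ["n1", "n2"], ["n3"])
```

The same signal can appear as an output of one chip and an input of another, so a single register-level simulation run yields the test data for every chip in the partition.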

The final step is to transform the waveform data into a form suitable to exercise the scan path generated by the test structure compiler. During this step, the user can choose several options corresponding to the boundary-scan options selected during the test-structure compilation phase. These include parallel-in/serial-out or serial-in/parallel-out testing, checking the device identification register, and checking the bypass mode. Complete WAVES data sets, one for each chip and one for the overall MCM design, are generated.

Switch-level test-bench compiler. Input for the switch-level test-bench compiler includes the test benches in WAVES format generated by the multicomponent test-bench compiler and the VHDL entity description of the design. The switch-level test-bench compiler generates a switch-level-simulator command file that contains commands to apply all test vectors specified in the WAVES test bench. Then, switch-level simulation is carried out using a switch-level model extracted from layouts of the entity. The simulation results can be viewed on the simulator's waveform display. The results are also automatically compared with the expected responses.

Figure 8 shows a partial snapshot of the waveforms generated by the switch-level simulator used to simulate chip 5 in the Find design. Figure 9 shows the relevant portion of the WAVES external file generated by the multicomponent test-bench compiler. A comparison of these figures shows that the switch-level simulations of chip 5 are in agreement with the expected values in the WAVES file.

MCM physical design

The final stage in the design process is the design of the MCM package itself. We currently use the Mentor Graphics MCM design tools for placement, routing, and analysis of the package. The term packaging compiler refers here to the MCM physical design tools collectively.

The MCM packaging compiler takes as its input the top-level VHDL description of the multichip design and the mask-layout files of each chip. We wrote a net-list extractor to generate a net-list file from the VHDL file that has the chip connectivity information. Various geometry files indicating the bounding-box information are created from mask-layout files.

% STRUCTURAL TEST VECTORS (with expected outputs)
% Ports
% nmcir35_out nmcir21_out mcir10_MUX_SELECT mcir10_out nmcir29_out nmcir28_out
% nmcir27_out nmcir26_out nmcir25_out nmcir12_out nmcir16_out nmcir23_out
% nmcir22_out mcir2_MUX_SELECT mcir2_out nmcir24_out nmcir5_out
% mcir7_MUX_SELECT nmcir7_CONTROL nmcir7_out Clock_phase1 Clock_phase2 Clear
% tms tck trst tdi tdo

"0000000000000000000000000000000000000000000000000000000000000000000000000010111110 : 160 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000111010 : 10 ns;
"00000000000000010001010011011000010000000000001100000000000011100000000000011~1010 : 20 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000111010 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000010011110 : 160 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000001011110 : 20 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000010011110 : 160 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000001011110 : 20 ns;
"0000000000000001000101001101100001000000000000110000000000001110000000000000011110 : 10 ns;
"00000000000000010001010011011000010000000000001100010010000011100000000000100~1110 : 160 ns;
"0000000000000001000101001101100001000000000000110001001100001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110001001100001110000000000001011110 : 20 ns;
"0000000000000001000101001101100001000000000000110001001100001110000000000000011110 : 10 ns;
"0000000000000001000101001101100001000000000000110001000100001110000000000010011110 : 160 ns;

Figure 9. Example of WAVES external file.

The packaging compiler operates in two steps: During placement, all chips are placed on the chip layer; during routing, the interconnect is generated to connect the chips according to the net-list specification. Several wiring layers are used for this purpose. Placement and routing steps attempt to minimize the total area, signal delays, and interconnect wire lengths. Users can try interactive placement to generate tighter designs. The left side of Figure 10 on the next page shows a placed and routed MCM design for the Find example.

Following successful placement and routing, thermal analysis and cross-talk analysis are performed. Dense MCM designs can generate a considerable amount of heat. Placement and routing of MCMs should eliminate hot spots and avoid the possibility of thermal failure. A thermal-analysis tool simulates the three types of heat transfer, namely, conduction, convection, and radiation. The results of thermal analysis can be examined visually, as shown on the right side of Figure 10 for the Find example. If the results indicate potential hot spots, the thermal-analysis information is fed back into the place-and-route steps to arrive at a more thermally sound MCM design.

Table 1. Source code information for the MSS tools.

Tool                                   Language   Size in lines
Distributed Synthesis System (DSS)     C++        50,000
Partitioning engine
  Hierarchical clustering              C++        6,000
  Genetic algorithm                    C          1,000
Test structure compiler                C++        3,500
Multicomponent test-bench compiler     C++        6,000
Switch-level test-bench compiler       C          300
Packaging compiler                     Prolog     150
All the translators                    C          2,000
SDL and BDS libraries                  SDL/BDS    1,000

Table 2. Design data for the examples.

Feature                                    Find    Move Machine   Viper
Lines of behavioral specifications         62      80             450
Number of chips                            5       3              5
Total number of pins of all chips          439     336            1,530
No. of I/O pins (before boundary scan)     15      49             69
No. of I/O pins (after boundary scan)      20      54             74
No. of register-level components
  Excluding boundary-scan devices          90      49             158
  Including boundary-scan devices          559     322            1,733
Area constraint (mm squared)               25      50             50
Pin constraint                             120     150            350
Clock period constraint (ns)               100     100            100
No. of interconnections among chips        235     215            640

Figure 10. Layout (left) and isothermal profile (right) of the Find MCM. Isotherm temperatures in the profile range from about 25.96 to 26.29 °C.

Design results and resource usage

While reducing the design time, large-scale design environments such as MSS put a very high demand on file space. We will try to give some feel for the design effort and computer-resource usage involved in generating the multichip designs for our three examples, namely, the Find, the Move Machine, and the Viper Microprocessor.

Tools in the MSS environment have been under development for one to three years. We recently completed streamlining the entire design process from specification to designs suitable for fabrication. We have not fabricated any MCMs because of high costs, but we have successfully fabricated and tested several VLSI chips, including the Move Machine, that were synthesized using the MSS environment.

Table 1 shows the number of lines of program code for each major tool in the MSS. For parsers, we counted the lines in the parser-generator (Yacc or Bison) source files. All the software executes on a Sun Sparcstation 1 with 28 Mbytes of main memory and 260 Mbytes of local disk used mostly as a swap area and attached to an 8-Gbyte file server.

Table 2 shows some data pertaining to the three example MCM designs. The target technology for the chips is 2.0-micron scalable CMOS in each case. If we had chosen a smaller feature size, say, 1.0-micron CMOS, then none of the examples would have required partitioning. Note that none of the examples is pin-limited. In fact, the primary interface of each design has relatively few pins (74 in the case of Viper). But once we start partitioning the design because of area limitation, the number of pins needed for the individual chips is quite high because the number of wires connecting register-level elements in the design is large.

The large number of pins in each individual chip contributes to the relatively large number of boundary-scan devices introduced. Each pin needs a boundary-scan cell and each chip needs a TAP controller. The clock is relatively slow in each case, only about 10 megahertz. This again is due to the 2.0-micron technology we are using and the way the library cells are designed: they are not designed for speed.

All arithmetic units are combinational blocks with ripple-carry features. Our synthesis software itself is independent of the technology used: if we replace the library with a better one, we get better designs. The MCM technology is a hybrid technology with four signal layers and assumes surface mounting.

To give a realistic picture of the design times involved, Table 3 shows the approximate wall-clock time (in minutes, with totals in hours) taken to complete various design tasks for the three examples, starting with behavioral-specification development and ending with MCM place-and-route. The times shown are for one design iteration. Subsequent iterations should take shorter times. As the various algorithms and tools in MSS improve, the synthesis times are expected to decrease significantly. Simulation time will probably decrease at a much slower rate (assuming the same hardware platform): the simulators are already quite fast and efficient.

Layout synthesis, both at chip level and package level, has been a key contributor to the design time. However, the layouts generated are reasonably compact. On the other hand, behavioral synthesis and partitioning (at least, the clustering algorithm) are fast, but the designs generated by these tools have room for improvement. We are currently investigating various algorithms for high-level synthesis and design partitioning to generate better quality multichip designs.

Table 4 shows the amount of disk space used by the various files. Mask-layout files, simulation-trace files, and test benches take considerable space.

Table 3. Design and simulation times for the examples (in minutes).

Feature                                                  Find    Move Machine   Viper
Behavioral specification preparation and simulation      120     240            3,360
Number of behavioral-level test vectors                  8       20             116
Behavioral synthesis                                     7       3              5
Partitioning engine
  Hierarchical clustering                                7       2              5
  Genetic partitioning                                   360     300            600
Test structure compilation                               1       1              1
Silicon compilation (all chips)
  Without BSD                                            30      30             60
  With BSD                                               360     300            900
Packaging compilation (place and route)                  180     120            540
Thermal analysis                                         5       1              10
Multicomponent test-bench compilation (including
  simulation of the register-level design)               30      30             60
Simulation of multichip design with boundary-scan
  options at register level (each chip plus overall)
  and comparison of results                              35      45             120
Switch-level test-bench compilation (all chips)          3       3              10
Switch-level simulation and comparison of results
  (all chips)                                            180     240            360
Total (in hours)                                         22      26             101

Table 4. File spaces for the examples (in Kbytes).

Feature                                                  Find     Move Machine   Viper
Behavioral specification (VHDL)                          1.4      1.5            15
Behavioral-level test bench (WAVES)                      1        1              5
Register-level design (VHDL)                             100      100            200
Multichip design after partitioning (VHDL)
  C1 + C2 + ... + Cn + C                                 150      125            300
MCM design with boundary-scan structures (VHDL)
  BC1 + BC2 + ... + BCn + BC                             175      150            225
Mask layouts L1 + L2 + ... + Ln
  Without BSD                                            1,000    2,000          4,500
  With BSD                                               8,000    3,500          20,000
MCM layout design                                        176      526            1,200
Trace file in multicomponent test-bench compiler         1,000    1,000          4,000
Test benches (WAVES) T1 + T2 + ... + Tn + T              1,500    1,500          5,000
Switch-level test benches ST1 + ST2 + ... + STn + ST     3,000    3,000          10,000
Switch-level simulation results files                    1,500    1,500          5,000
Total                                                    16,603   13,404         50,145

The MSS tools are currently being used for prototyping MCM designs. However, they have not yet attained the maturity expected of industrial-strength design tools. We are continuing our research and development efforts with specific attention to

specifying and imposing constraints on other performance attributes, such as heat dissipation and power consumption, at the behavioral-synthesis step,

introducing formal verification tools as a possible way to reduce the simulation time, and

implementing performance-driven MCM placement and routing.


Acknowledgments

This research is sponsored in part by the US Air Force Wright Laboratory under Contract No. F33615-91-C-1811, the Defense Advanced Research Projects Agency under Order No. 7056 monitored by the Federal Bureau of Investigation under Contract No. J-FBI-89-094, and an ACM/SIGDA graduate scholarship. The work on PDL is also partly sponsored by Semiconductor Research Corp. under Contract No. 92-DJ-293.

We thank Darrell Barker, John Hines, Neal Stollon, and Justin Harlow for their useful comments and critique of this work. We gratefully acknowledge Bob Reese's help with the LagerIV tools.

References

1. Multichip Modules, R. Johnson et al., eds., IEEE Press, Piscataway, N.J., 1991.

2. IEEE Standard VHDL Language Reference Manual, IEEE Standard 1076, Piscataway, N.J., 1987.

3. IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Standard 1149.1, Piscataway, N.J., 1990.

4. User's Guide to WAVES (Waveform and Vector Exchange Specification), Draft Document V4.4, Piscataway, N.J., 1990.

5. R.A. Walker and R. Camposano, A Survey of High-Level Synthesis Systems, Kluwer Academic Publishers, Norwell, Mass., 1991.

6. J. Roy et al., "DSS: A Distributed High-Level Synthesis System," IEEE Design & Test of Computers, Vol. 9, No. 2, June 1992, pp. 18-32.

7. W.J. Cullyer, "Implementing Safety-Critical Systems: The Viper Microprocessor," in VLSI Specification, Verification, and Synthesis, G. Birtwistle and P.A. Subrahmanyam, eds., Kluwer Academic Publishers, Norwell, Mass., 1988, pp. 1-26.

8. P.G. Paulin and J.P. Knight, "Force-Directed Scheduling in Automatic Data Path Synthesis," Proc. 24th Design Automation Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 781, 1987, pp. 195-202.

9. R. Dutta, J. Roy, and R. Vemuri, "Distributed Design Space Exploration for High-Level Synthesis Systems," Proc. 29th Design Automation Conf., IEEE CS Press, Los Alamitos, Calif., Order No. 2822, 1992, pp. 644-650.

10. D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, Mass., 1989.

11. R. Vemuri and R. Vemuri, "A Genetic Algorithm for Multichip Partitioning," Tech. Report TM-ECE-DDE-92-26, Dept. of Electrical and Computer Eng., Univ. of Cincinnati, 1992.

12. R. Jain et al., "An Integrated CAD System for Algorithm-Specific IC Design," IEEE Trans. CAD, Vol. 10, No. 3, Apr. 1991, pp. 147-163.

13. R. Vutukuru, P. Subba Rao, and R. Vemuri, "Boundary-Scan Test Structures and Test-Bench Compilation in a Multichip Module Synthesis System," Proc. IEEE Multichip Module Conf., IEEE Press, Piscataway, N.J., 1992, pp. 44-47.

of Technology at Kharagpur and a PhD from Case Western Reserve University, both in computer engineering. Vemuri is the recipient of the Siddhartha Gold Medal, a distinguished research award, an outstanding teacher award, and corecipient of a best-paper award. He is a member of ACM, IEEE, and the Computer Society.

Nand Kumar, a PhD student in the University of Cincinnati's Department of Electrical and Computer Engineering, has research interests in computer-aided design of digital systems, high-level synthesis, and formal verification.

Raghu Vutukuru, a software engineer at LSI Logic, has research interests in design automation, simulation, and testing. He received an MTech from the Indian Institute of Technology and an MS from the University of Cincinnati.

Prasad Subba Rao, a software engineer at LSI Logic, has research interests in design synthesis and parallel algorithms. He received his BS from Bangalore University, India, and his MS from the University of Cincinnati.

Praveen Sinha works for Intel Corp. in Phoenix, Arizona. He received his BTech in electrical engineering from the Indian Institute of Technology, Kanpur, and his MS in computer engineering from the University of Cincinnati in 1992.

Ning Ren is a PhD student in electrical and computer engineering at the University of Cincinnati. He earned his BS and MS in computer engineering from Chengdu Institute of Radio and University of Cincinnati, respectively.

Paddy Mamtora works in the IC CAD Support Group at Mentor Graphics. He received his BE from the University of Bombay in 1988 and his MS in electrical engineering from the University of Cincinnati in 1992.

tion and in systems-level hardware design

Ram Vemuri is a PhD student at the University of Cincinnati, where he received his MS in 1989. His research interests include design automation of VLSI circuits, optimization of logic design, and genetic algorithms.

from the University of Cincinnati.

Readers can contact Ranga Vemuri at the Laboratory for Digital Design Environments, Department of Electrical and Computer Engineering, M.L. #XI, University of Cincinnati, Cincinnati, OH 45221-0030; e-mail ranga.vemuri@uc.edu.

74 COMPUTER