Journal of VLSI Signal Processing 28, 29–45, 2001. © 2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Designing Run-Time Reconfigurable Systems with JHDL

PETER BELLOWS, ISI Systems, 3701 North Fairfax Drive, Arlington, VA 22203-1714

BRAD HUTCHINGS, Department of Electrical and Computer Engineering, Brigham Young University, Provo, UT, USA 84602

Received July 1999; Revised December 1999

Abstract. Run-time reconfigurable (RTR) systems are FPGA-based systems that reconfigure FPGAs during execution to alter hardware organization and composition to meet the varying needs of applications as they execute. These systems are difficult to describe with conventional tools (schematic capture, VHDL synthesis, etc.) because most tools assume that the underlying hardware organization is static. JHDL is a Java-based design environment capable of describing, netlisting, simulating and executing complex, dynamic RTR systems. Using conventional Java syntax, users describe hardware structures as objects; as these hardware-object constructors are invoked, JHDL automatically configures hardware circuits onto FPGA hardware, thus directly supporting the dynamic nature of RTR systems with standard language constructs. JHDL also supports codesign of the software and hardware parts of the system; in other words, the entire application can be described in a single piece of Java code that can be co-simulated or co-executed with the FPGA hardware. To date, RTR design with JHDL has focused on the development of automated target recognition (ATR) systems, and working systems described in JHDL have been demonstrated.

Keywords: FPGAs, CAD, configurable computing, image processing

1. Introduction

When developing applications for FPGA-based configurable computing machines (CCMs), designers must perform two general tasks. First, they must design the circuitry that implements the necessary functionality for the application. This is typically done using commercial CAD tools such as VHDL synthesis or schematic capture in concert with the back-end tools obtained from the FPGA device vendors. Second, designers must write a supervisory program that controls the configurable-computing platform during the operation of the application. In some cases this control program is relatively simple, just loading a single configuration and then loading and retrieving data. In more complex applications, such as run-time reconfigured ones, these control programs need to load a variety of configurations and data on demand as the application proceeds. This typically requires very tedious, low-level programming. Currently, the control program and the circuit description must be developed and simulated independently; the designer is responsible for ensuring that these two pieces of software cooperate correctly, typically through repeated compile, download and execute cycles on the CCM.

JHDL is a design environment targeted primarily at FPGA systems that overcomes many of these problems by allowing users to describe all aspects of FPGA applications with a single programmatic description, including control software, user interfaces, static hardware and dynamic hardware. JHDL uses standard Java syntax, and designers can use standard Java compilers and debuggers for all aspects of system design. All hardware is modeled as distinct objects that extend classes from the JHDL library. The dynamic nature of hardware is represented using standard constructors: when a hardware object is constructed, the corresponding bitstream is automatically fetched and loaded onto an FPGA device. Analogously, when a hardware object is deleted (using delete methods provided by the JHDL library), the FPGA device resources are reclaimed. Users are free to combine these dynamic hardware descriptions with other Java code such as other application software, Swing-based user interfaces, etc. The resulting description can be either co-simulated (the control software executed in conjunction with circuit simulation) or co-executed (the application software executed in conjunction with actual execution of hardware circuits on an FPGA platform). JHDL has been used to develop a working hardware prototype of an automated target recognition (ATR) application.

The paper is organized as follows. First, project goals, literature review and design strategy are briefly presented. The remainder (and majority) of the paper presents an example-driven development of how JHDL achieves the described RTR design objectives. Our motivating example will be a simple multiply-accumulator (MAC). The multiply unit will be custom-configurable to any user-defined constant; this will let us illustrate how JHDL models both global and local (or partial) RTR. While simple, this example will demonstrate all the basic JHDL RTR constructs, and could easily be extended into more complex examples such as digital filters. In Section 6, we develop the adder and multiplier as a brief introduction to JHDL design syntax. In Section 7, we show how we can specify the configuration sequencing and data transfer with our MAC system. Then in Section 8 we detail how the JHDL classes are able to specify this behavior in a platform-independent manner that provides identical behavior in both simulation and execution modes. In Section 9 we show how we used these basic JHDL models to control a complex FPGA system for automatic target recognition. Then in Section 10 we discuss possible future extensions of this work, and we summarize our findings and contributions in Section 11.

2. Project Goals

The primary objective of this research project is to develop a tool-suite/design-environment for describing configurable-computing applications that merges the circuit description and the control program into a single, integrated piece of software. This project has the following additional requirements and potential benefits.

1. It must use an existing programming language with no extensions. This will make the tool accessible to a wider range of programmers by allowing them to use commercially available compilers.

2. The CCM-control paradigm must be CCM independent. CCM control details should be abstracted to a higher level of programming. This will make CCMs more accessible to programmers and will also ease the process of retargeting applications to run on a variety of different CCMs.

3. The description method must support run-time and partial configuration. Applications that use these techniques are the most demanding CCM applications and will be used from the outset to stress the design environment.

4. The integrated description must serve for both simulation and final execution with no modifications. For simulation, it must support end-to-end simulation of applications that may consist of many configurations. By changing a software flag, the integrated description should be able to switch transparently from simulation to actual hardware execution on the CCM.

3. Background

There have been several efforts to create textual CAD tools for FPGA designs. The JHDL effort itself has been ongoing for almost two years, and initial findings have previously been reported [1, 2]. In an early pioneering effort at DEC PRL, Vuillemin and his group developed and used Perle [3] to design CCM applications on DECPerle-1 and more recently on the Pamette [4]. Perle is a C++-based CAD tool that uses hierarchy and inheritance to describe user circuits. A Perle description, when compiled and executed, generates a netlist that is then processed by Xilinx place and route tools. Other similar examples of object-oriented circuit-design languages include Spyder [5] and Lola [6].

Run-time reconfiguration (RTR) has been receiving more attention lately, and a few efforts are starting to report results with tools and run-time environments. Luk and Shirazi [7] reported on compilation tools for RTR designs. Their tools consist of a partial evaluator, an incremental configuration calculator and an optimizer. One of their goals is to automatically generate circuit overlays that have been optimized for use in partially configured applications. Burns and Donlin [8] reported on a run-time system for dynamic configuration. They proposed a run-time system that attempts to automatically manage FPGA resources similar to the way a conventional OS manages memory or CPU resources. The system as proposed consists of a virtual hardware manager (for managing the FPGA resources), a transform manager (for modifying circuits to accommodate available device resources), a configuration manager (to manage the configuration process), and a device driver. Gokhale and Gomersoll [9] reported on their high-level compilation tools for fine-grained FPGAs such as the National CLAy device. These tools accept a dbC (data-parallel C) version of the algorithm, partition it into control and datapath, and then implement the circuit using parameterizable module generators that have been optimized for fine-grained FPGAs. Lysaght has also reported on a VHDL-based simulation environment for RTR [10].

JHDL has some things in common with many of these efforts. First, as a design tool, it has been designed to directly support run-time reconfiguration, both partial and global, and it attempts to hide details of configuration from the user. However, in contrast to other work, JHDL makes no attempt to automatically identify partial configurations, nor does it address the run-time physical transformation of circuits so that they will fit within available FPGA resources. At present, JHDL is primarily a manual design tool that combines CCM control and circuit design into a single integrated description. JHDL probably has more in common with Perle, as it uses hierarchy in a manner similar to Perle. However, it differs from Perle in that it was specifically designed to support run-time reconfiguration and CCM control.

Note that Java is not critical to this project; almost any object-oriented language would have sufficed. Java does have some useful features that can be exploited for this project; in particular, its portability and integrated GUI API are useful. However, any language that supports object construction and hierarchy would be a likely candidate for this project.

4. Research Approach

The primary distinction of JHDL, and indeed the primary goal of this project, is the creation of a single integrated API that allows the designer to express circuit organizations that dynamically change over time. Stated another way, the primary goal is to allow the designer to specify, in a reasonably natural way, when hardware gets loaded onto and removed from a CCM without exposing any of the details normally associated with CCM operation. Rather than invent a new language feature to schedule the configuration of circuits, we chose to adopt the object-instance construction/destruction mechanism used in object-oriented languages. Conventional object-oriented languages manage memory through object constructors and destructors. When an object constructor is invoked, it allocates the necessary memory from the heap or stack and sets object variables to initial values. Memory is reclaimed by invoking an object destructor that frees the memory for use by other objects. JHDL manages FPGA resources on CCMs in a similar manner. In JHDL, all circuits are developed hierarchically as distinct objects. FPGA resources are allocated (i.e., a circuit is configured) by invoking the constructor for a JHDL circuit object; analogously, FPGA circuitry is reclaimed by invoking the object’s destructor (in Java, which has no explicit destructors, this is done through the delete methods provided by the JHDL library).
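The constructor/destructor analogy can be sketched in a few lines of plain Java. This is a hypothetical model rather than JHDL's API: the class names `FpgaArea` and `CircuitObject` are invented, resources are counted in abstract "blocks", and an explicit `delete()` method stands in for the destructor that Java lacks.

```java
// Hypothetical model (not JHDL's API): FPGA area handled like a heap.
class FpgaArea {
    private final int capacity;   // total resources, in abstract "blocks"
    private int used = 0;

    FpgaArea(int capacity) { this.capacity = capacity; }

    void allocate(int blocks) {
        if (used + blocks > capacity) {
            throw new IllegalStateException("not enough FPGA resources");
        }
        used += blocks;
    }

    void release(int blocks) { used -= blocks; }

    int free() { return capacity - used; }
}

class CircuitObject {
    private final FpgaArea area;
    private final int blocks;
    private boolean deleted = false;

    // Constructing the object models configuring the circuit: resources are claimed.
    CircuitObject(FpgaArea area, int blocks) {
        area.allocate(blocks);
        this.area = area;
        this.blocks = blocks;
    }

    // Java has no destructors, so an explicit delete() reclaims the resources.
    void delete() {
        if (!deleted) {
            area.release(blocks);
            deleted = true;
        }
    }
}
```

Constructing a `CircuitObject` and later calling `delete()` returns the area's free count to its starting value, and a request larger than the remaining area fails, much as a failed heap allocation would.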

This approach of using object constructors/destructors to control circuit lifetime on a CCM is a powerful technique that naturally leads to a dual simulation/execution environment, where a designer can easily switch between software simulation and hardware execution on a CCM with a single application description. When simulating in software, the constructors/destructors communicate with the JHDL simulation kernel. Constructors create object instances in system memory; these object instances are actually simulation models that interface with a simulation kernel to provide a clock-by-clock simulation of the user circuit. However, when executing in hardware (on the CCM), the constructors/destructors communicate directly with the CCM (through a JHDL interface layer) instead of the simulation kernel. Instead of allocating system memory, constructors load circuit descriptions from a circuit library and control the execution of the CCM. Analogously, destructors remove circuits by replacing existing circuits with “blank” configurations, similar to the state that exists when the FPGA is initially reset.
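The dual-mode idea can be sketched the same way: the application constructs and deletes circuit objects identically, and a single flag selects which backend those calls reach. All names here (`Backend`, `SimulationBackend`, `HardwareBackend`, `Circuit`, `DualModeDemo`) are hypothetical, and the hardware backend is only a stub that logs what a real driver would do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical backend interface: application code only ever calls these two
// methods; which implementation it reaches is decided once, by a flag.
interface Backend {
    void configure(String circuitName);
    void unconfigure(String circuitName);
}

// Simulation mode: circuits become in-memory models tracked by the "kernel".
class SimulationBackend implements Backend {
    final List<String> resident = new ArrayList<>();
    public void configure(String circuitName) { resident.add(circuitName); }
    public void unconfigure(String circuitName) { resident.remove(circuitName); }
}

// Execution-mode stub: a real driver would download a bitstream and later
// overwrite it with a blank configuration; here we only log those actions.
class HardwareBackend implements Backend {
    final List<String> log = new ArrayList<>();
    public void configure(String circuitName) { log.add("load " + circuitName + ".bit"); }
    public void unconfigure(String circuitName) { log.add("load blank over " + circuitName); }
}

class Circuit {
    private final Backend backend;
    private final String name;

    Circuit(Backend backend, String name) {
        this.backend = backend;
        this.name = name;
        backend.configure(name);   // construction == configuration
    }

    void delete() { backend.unconfigure(name); }   // deletion == reclamation
}

class DualModeDemo {
    // The "software flag" that switches between simulation and execution.
    static Backend select(boolean simulate) {
        return simulate ? new SimulationBackend() : new HardwareBackend();
    }
}
```

The design choice being illustrated is that `Circuit` never tests the mode itself; only the object passed in differs, so the same application code runs unchanged in both modes.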

5. Overview

JHDL is embodied in a library of Java classes, such as Cell and Wire, which provide the basic framework for circuit simulation and netlisting. The user builds up his circuit design by defining Java classes that inherit from these JHDL core classes. As JHDL is primarily a structural design tool, the user describes the specific behavior of the cell by piecing together primitive cells, such as “FlipFlop” or “And2”, from an FPGA-specific library. Each time the user constructs an instance of his circuit object, he is modeling the configuration of the corresponding circuit in the target FPGA. To improve simulation performance, JHDL allows the user to provide a high-level behavioral description of his structural circuit, if desired.

In order to support hardware implementation of the user’s circuit, JHDL supplies libraries of FPGA-specific cell primitives from which the user builds up his design. Circuit netlists can be generated using these primitives, and these netlists can then be processed by existing vendor-specific implementation tools. Currently, JHDL libraries exist for the Xilinx XC4000, XC6200, and Virtex architectures, and a JHDL netlister has been created that translates JHDL data structures into an EDIF-syntax netlist that can be read by Xilinx PAR tools.

Having developed these packages as a starting point, we created a JHDL methodology for describing and controlling RTR systems. However, as mentioned above, JHDL relies on back-end tools to complete the implementation of the circuit. Because existing PAR tools do not automate the placement of RTR circuits in the way that they automate static placement, JHDL (or any current FPGA design tool, for that matter) cannot fully automate the generation of RTR-capable bitstreams. Instead, JHDL currently serves as a very efficient and intuitive way of describing and modeling RTR behavior. To get around the limitations of the PAR tools, we used JHDL to simulate the RTR system and to netlist the static and dynamic portions of the circuit. Then we hand-crafted the placement of the circuits to ensure their compatibility across all possible configurations. Should RTR-PAR tools become available, JHDL would provide a clean and flexible interface for obtaining high-performance results from these tools, because of the fine granularity of control available in JHDL and the rich programming features available in Java. Because of the current PAR limitation, the rest of this paper focuses on the issues involved in providing the simulation and control of RTR systems under JHDL.

The JHDL mechanisms for modeling RTR systems have proven to be extremely useful in that they let the interface between the FPGA system and the controlling software on the host machine be debugged as an inherent part of the design. A JHDL model describes both parts of the system in a single piece of code, and that description serves as both the simulation model and the hardware execution controller, with identical behavior. The heart of this transparent switching between simulation and execution is the interaction of four special classes: Reconfigurable, DataPort, HWSystem, and HardwareManager. The Reconfigurable class transparently manages the loading and unloading of FPGA configurations. The DataPort class manages I/O between the FPGA system and the host. The HWSystem is in charge of sequencing the circuit, either by running the JHDL simulation model or by controlling the hardware clock and reset. Finally, the HardwareManager provides the abstraction layer that allows these operations to be specified in a platform-independent manner. The interaction of these classes is summarized in Fig. 1. This paper focuses on how these classes use the basic JHDL model of dynamic hardware to provide a codesign environment for end-to-end development of RTR systems.

Figure 1. High-level view of the interaction of JHDL’s run-time reconfiguration control classes.

Since the development of the RTR control classes mentioned above, JHDL has evolved into a general-purpose design environment for FPGA systems. The current development environment includes a graphical circuit “browser” with a schematic-rendering tool, a “waves”-style watch table, and other advanced tools that allow direct interaction with and inspection of the JHDL objects themselves. JHDL also has several design accelerators to reduce the overhead of structural design: a state-machine synthesizer, module generators, and a rich API for quickly building up basic circuit types. A complex JHDL control environment has been developed for a currently popular FPGA system, the Annapolis WildForce board. Development is under way for logic optimization, graphical floor-planning tools, and Xilinx Virtex support. Because of the rapid evolution of the general-purpose JHDL design environment, not all of the research presented herein has kept up with development. Nevertheless, the current generation of JHDL shares the same core and the same design strategies.

6. Designing a Basic JHDL Cell

The first step in building our MAC cell is to create an adder. We will illustrate how to do this from the ground up as a gentle introduction to JHDL syntax. Note that functions such as addition have compact, native JHDL implementations, but we will ignore this fact for the purpose of illustration. For more details on JHDL syntax, please refer to [2].

Listing 1. JHDL implementation of a 1-bit full adder.

6.1. Creating a 1-Bit Full Adder

The JHDL implementation of a 1-bit full adder is shown in Listing 1. As mentioned, each JHDL circuit element is a discrete Java object, so the FullAdder is embodied in a stand-alone class that inherits from the Logic class in the base JHDL library. Logic provides the inherent netlisting and simulation setup for the adder. The first declaration in the class, CellInterface[] cell_interface..., elaborates the port interface for the FullAdder: two 1-bit operands, a 1-bit sum, and carry-in/carry-out. As mentioned previously, circuit structure is built up in the cell constructor (the public FullAdder(Node parent...) block). First, the call to the superclass constructor (super(parent, name)) builds up the circuit hierarchy graph. Next, the connect(portname, wire) statements take the Wire parameters that were passed into the constructor and connect them to the ports of the FullAdder. Finally, we build up the circuit structure with the native and(), or_o(), and xor_o() method calls, which instantiate the corresponding logic gates. Note that a method with the _o suffix requires the user to supply the output wire to the function, whereas a method without this suffix (such as and(a,b)) creates the output wire for you and returns it. This allows us to nest gate-instantiating method calls, as shown.
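Because Listing 1 itself is not reproduced here, the following self-contained sketch illustrates the two gate-method conventions with an invented mini netlist; `Wire1`, `MiniLogic`, and `FullAdderSketch` are hypothetical classes, not part of JHDL. The suffix-free `and()` creates and returns its output wire, so calls can nest, while `or_o()` and `xor_o()` take the output wire as their last argument. The adder uses the standard equations s = a XOR b XOR ci and co = ab + a·ci + b·ci.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 1-bit wire for this sketch.
class Wire1 {
    boolean value;
}

// Hypothetical mini netlist: gates are recorded as they are instantiated and
// evaluated in creation order, which is sufficient for an acyclic circuit.
class MiniLogic {
    private final List<Runnable> gates = new ArrayList<>();

    // Suffix-free form: creates and returns the output wire, so calls can nest.
    Wire1 and(Wire1 a, Wire1 b) {
        Wire1 out = new Wire1();
        gates.add(() -> out.value = a.value & b.value);
        return out;
    }

    // "_o" form: the caller supplies the output wire.
    void or_o(Wire1 a, Wire1 b, Wire1 c, Wire1 out) {
        gates.add(() -> out.value = a.value | b.value | c.value);
    }

    void xor_o(Wire1 a, Wire1 b, Wire1 c, Wire1 out) {
        gates.add(() -> out.value = a.value ^ b.value ^ c.value);
    }

    void evaluate() {
        for (Runnable gate : gates) gate.run();
    }
}

class FullAdderSketch extends MiniLogic {
    FullAdderSketch(Wire1 a, Wire1 b, Wire1 ci, Wire1 s, Wire1 co) {
        // Carry-out is the majority function; the nested and() calls build its inputs.
        or_o(and(a, b), and(a, ci), and(b, ci), co);
        // Sum is the three-input XOR.
        xor_o(a, b, ci, s);
    }
}
```

Java evaluates the nested `and()` arguments before `or_o()` runs, so the AND gates are registered ahead of the OR gate that consumes them, and a single `evaluate()` pass settles the outputs.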

6.2. Creating an N-Bit Adder

Now, to create a wide adder, we simply need to piece together a chain of our FullAdder cells. This is shown in Listing 2. The cell interface for the NBitAdder class shows how generic-width ports are specified in JHDL, using the param() construct in the interface declaration. The NBitAdder is built up using a simple for loop to instantiate N FullAdders interconnected by the carry wires. At this point, our adder is complete, and we are ready to build up the constant multiplier for our MAC block.

Listing 2. JHDL implementation of an N-bit adder.
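A behavioral sketch of the same ripple-carry structure: a loop over n stages, each a full adder whose carry-out feeds the next stage's carry-in. It models only the values such an adder would compute, not JHDL structure; `RippleAdder` is a hypothetical name, and operands are packed into `long`s for convenience, so n should stay below 63.

```java
// Hypothetical behavioral model of an n-bit ripple-carry adder.
class RippleAdder {
    // One full-adder stage: returns {sum, carryOut} for single-bit inputs.
    static int[] fullAdd(int a, int b, int ci) {
        int s = a ^ b ^ ci;
        int co = (a & b) | (a & ci) | (b & ci);
        return new int[] {s, co};
    }

    // Adds two n-bit operands; the final carry-out becomes bit n of the result.
    static long add(long x, long y, int n) {
        long sum = 0;
        int carry = 0;                          // carry-in of stage 0
        for (int i = 0; i < n; i++) {           // one full adder per bit position
            int a = (int) ((x >> i) & 1);
            int b = (int) ((y >> i) & 1);
            int[] r = fullAdd(a, b, carry);
            sum |= ((long) r[0]) << i;
            carry = r[1];                       // carry wire feeds the next stage
        }
        return sum | ((long) carry << n);
    }
}
```

For example, `add(200, 100, 8)` produces the 8-bit sum 44 with a carry-out of 1 in bit 8, giving 300.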

6.3. Creating the Constant Multiplier

Now we can create our multiplier. As mentioned previously, we are going to make this a constant multiplier, which automatically builds a custom circuit for the constant. Besides being useful for illustration, this “constant propagation” approach is common in FPGA design because it lets us profit from the size and speed of custom circuits. At the same time, the ability to reconfigure the custom circuit on the fly gives us almost as much flexibility as having a general-purpose circuit.

Listing 3. JHDL implementation of a constant multiplier.

We call our MAC “partially reconfigurable” because we will be reconfiguring only the multiplier part of the circuit, while leaving the rest of the circuit untouched. The JHDL model fully supports partial reconfiguration as well as global (full chip-at-a-time) reconfiguration.

The model for the constant multiplier is shown in Listing 3. It is not necessary to understand all the structural details of the multiplier; generally speaking, it instantiates an adder for each '1' bit in the constant multiplicand. These adders are chained in series, with appropriate shifting between them, to create a custom array multiplier. Note that all wire parameters to the constructor are contained within a special class, ArgBlockList. This ArgBlockList is a list of wire parameters with specific information about how to connect them to the ports of the cell. Any cell that embodies a stand-alone partial configuration must use an ArgBlockList to connect its ports, because doing so allows JHDL to control the interface to the partially reconfigurable logic; this is further explained in Section 7.
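The constant-multiplier idea can be shown in behavioral form: the "circuit" is generated from the constant, with one adder stage per '1' bit, each adding a copy of the input shifted by that bit's position. `ConstantMultiplierSketch` is a hypothetical name, only non-negative constants are handled, and 64-bit overflow is ignored; the point is that the structure, and hence the circuit size, depends on the constant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shift-and-add model of a constant multiplier.
class ConstantMultiplierSketch {
    private final List<Integer> shifts = new ArrayList<>(); // one adder stage per set bit

    // "Building the circuit": record the shift amount for every '1' bit.
    ConstantMultiplierSketch(long constant) {
        if (constant < 0) throw new IllegalArgumentException("non-negative constants only");
        for (int i = 0; i < 63; i++) {
            if (((constant >> i) & 1) == 1) shifts.add(i);
        }
    }

    int adderCount() { return shifts.size(); }

    // Evaluate the serial adder chain for input x.
    long multiply(long x) {
        long acc = 0;
        for (int s : shifts) acc += x << s;
        return acc;
    }
}
```

For the constant 10 (binary 1010), the sketch builds two adder stages, at shifts 1 and 3, so `multiply(7)` evaluates 14 + 56 = 70; reconfiguring for a different constant means building a new object with a different set of stages.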

7. Controlling Run-Time Reconfigured Hardware

Now, we want to combine our FullAdder, ConstantMultiplier, and a register to make a MAC. As discussed, we want our MAC to be partially reconfigurable, such that we can change our multiplier constant on the fly. We use the special Reconfigurable class to specify the desired reconfiguration behavior. This is shown in Listing 4. We simply interconnect components using the same JHDL syntax shown in previous examples; note, however, that we connect the FullAdder and the register to a Reconfigurable, not a ConstantMultiplier. The Reconfigurable is a static placeholder in the netlist where the ConstantMultiplier will be configured. When we construct the Reconfigurable, we pass in an ArgBlockList of wires, which represents the interface between the reconfigurable and static portions of the circuit. Each time a new configuration is loaded into the Reconfigurable, it passes the same ArgBlockList into the constructor of the new cell, forcing the new configuration to match this static interface. Effectively, the Reconfigurable class acts like a “chip socket”, as shown in Fig. 2; it maintains a static interface to external circuitry, and allows any “chip” that conforms to its “pin package” to be plugged in.

Listing 4. JHDL implementation of a partially reconfigurable MAC cell.
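The socket analogy can be made concrete with a small hypothetical model: `Socket` fixes its pin list when constructed, and `plug()` accepts any `Chip` that declares exactly that pin list, replacing whatever was plugged in before. These names are illustrative only; in JHDL the fixed interface is the ArgBlockList handed to each new configuration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "chip": anything that can report the pins it expects.
interface Chip {
    List<String> pins();
}

// Hypothetical socket: the pin package is fixed once, at construction.
class Socket {
    private final List<String> pinPackage;
    private Chip current;

    Socket(List<String> pinPackage) {
        this.pinPackage = new ArrayList<>(pinPackage);   // defensive copy: interface never changes
    }

    // Any chip matching the pin package may replace the current one.
    void plug(Chip chip) {
        if (!chip.pins().equals(pinPackage)) {
            throw new IllegalArgumentException("chip does not match socket interface");
        }
        current = chip;
    }

    Chip current() { return current; }
}
```

A mismatched chip is rejected and the previously plugged chip stays in place, which mirrors the requirement that every new configuration match the static interface.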

Figure 2. Illustration of the effective model of a Reconfigurable object.

Note that we do not specify the configuration of the Reconfigurable in the constructor. This happens in the reconfigure() method, in which we tell the Reconfigurable to create a new ConstantMultiplier, using the given constantvalue String to specify the constant multiplicand. This in turn causes the Reconfigurable to invoke the constructor of the ConstantMultiplier, which is our model of circuit configuration, as mentioned previously. The constantvalue String gets appended to the ArgBlockList before being passed into the ConstantMultiplier constructor.

Now, we simply need to put our MAC unit into some sort of top-level “test-bench” class that interfaces the circuit with the rest of the Java world. This test-bench will manage the transfer of data to and from the circuit, and interface with the “host software” parts of the total system, such as a graphical user interface, disk access, console I/O, and so forth. This is shown in Listing 5. In this simple, contrived example, we assume that our MAC circuit is controlled by a GUI on the host, through which the user can specify the constant value for the multiplier and the file in which to dump the output data.

In our test-bench, we need to do the following:

• Create a JHDL system controller, called a HWSystem.
• Initialize our FPGA board controller, called a HardwareManager.
• Create a top-level Reconfigurable to control the global reconfiguration of the MAC.
• Set up data transfer with the FPGA system. This data interface occurs via the DataPort class.

Each of these steps is illustrated in the listing. First, we create the JHDL system controller, HWSystem, and tell it to negotiate all hardware interaction via the HotWorksHardwareManager, which is a device driver for the Virtual Computer Corporation’s HotWorks board. This driver extends JHDL’s HardwareManager API, which provides a platform-independent, JHDL-specific specification of basic FPGA system interaction.

Next in the example, we instantiate a single Reconfigurable and wire it up. This Reconfigurable represents “the FPGA”, and will take care of the global chip configuration. We immediately load the MAC configuration, anticipating that this top-level configuration will stay resident throughout the execution of our program. Next, we attach DataPort objects to the input/output wires of the system. These DataPorts act as data buffers on the wires, and hide the details of data I/O with the FPGA hardware from the user.

The rest of the listing is an example of how we might easily interface our host software with the executing hardware. In this example, we retrieve a new multiply constant and destination file from the user interface. Then we partially reconfigure the circuit with a custom multiplier corresponding to the constant. Next, we load the input data into the input DataPort, and use the HWSystem to reset and clock the circuit. We can then retrieve the buffered output data from the output DataPort, and use the host machine to save the data to disk, perform post-processing, or carry out whatever other functions are desired.
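The control flow of the test-bench can be sketched behaviorally, without any JHDL classes: a MAC model whose multiplier constant can be swapped, a queue and a list standing in for the input and output DataPorts, and reset/clock calls standing in for the HWSystem. `MacModel` and `MacTestBench` are hypothetical names, and the accumulator is assumed to compute acc <- acc + k·x once per clock.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical behavioral MAC: the constant plays the role of the
// partially reconfigured multiplier, acc plays the accumulator register.
class MacModel {
    private long constant;
    private long acc;

    void reconfigureMultiplier(long constant) { this.constant = constant; }
    void reset() { acc = 0; }

    // One clock cycle: acc <- acc + constant * x; the register value is the output.
    long clock(long x) {
        acc += constant * x;
        return acc;
    }
}

class MacTestBench {
    static List<Long> run(MacModel mac, long constant, long[] data) {
        Queue<Long> inPort = new ArrayDeque<>();   // stands in for the input DataPort
        List<Long> outPort = new ArrayList<>();    // stands in for the output DataPort
        mac.reconfigureMultiplier(constant);       // 1. partially reconfigure the multiplier
        for (long d : data) inPort.add(d);         // 2. load the input buffer
        mac.reset();                               // 3. reset, then clock once per sample
        while (!inPort.isEmpty()) outPort.add(mac.clock(inPort.remove()));
        return outPort;                            // 4. host retrieves the buffered results
    }
}
```

With constant 4 and inputs 1, 2, 3, the output buffer holds 4, 12, 24; calling `run` again with a new constant models the next user request from the GUI.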

At this point, we should summarize a few key pointsabout the JHDL model for RTR computing:

• The specification of configuration sequencing is extremely simple and intuitive.
• The Reconfigurable class can be nested; for example, we loaded the MAC configuration into the top-level Reconfigurable, while the MAC configuration itself contains a Reconfigurable (for the multiplier). This architecture allows arbitrary levels of nested Reconfigurables, letting the user specify any desired degree of partial reconfiguration.
• All board-level interactions, including reconfiguration, clocking, and data transfer, were specified in a platform-independent syntax.
• The JHDL designer can easily codesign the host software and the FPGA hardware, and the interaction of the two is naturally debugged in the standard Java/JHDL debugging environment.
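Nesting can be modeled as a tree: a configuration may contain sockets of its own, and replacing a configuration discards everything loaded beneath it. In this hypothetical sketch (`Config` and `NestedSocket` are invented names), `reconfigure()` reports how many configurations it discarded, so swapping only the inner multiplier discards one, while reloading the top-level socket discards the MAC together with its multiplier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical configuration that may itself contain nested sockets.
class Config {
    final String name;
    final List<NestedSocket> sockets = new ArrayList<>();

    Config(String name) { this.name = name; }

    // This configuration plus everything currently loaded beneath it.
    int size() {
        int n = 1;
        for (NestedSocket s : sockets) {
            if (s.current != null) n += s.current.size();
        }
        return n;
    }
}

// Hypothetical socket: loading a new configuration discards the old subtree.
class NestedSocket {
    Config current;

    // Returns how many configurations were discarded (tagged for deletion).
    int reconfigure(Config next) {
        int discarded = (current == null) ? 0 : current.size();
        current = next;
        return discarded;
    }
}
```

Because the count recurses through `size()`, any depth of nesting is handled by the same two classes.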

8. The Dual-Mode Operation of RTR Control Classes

Now we will discuss the details of how the RTR control classes (HWSystem, Reconfigurable, DataPort, and HardwareManager) perform their functions. Specifically, we will show how these classes interact to provide identical behavior in both simulation and execution mode, and how they provide their abstract interface to hardware. We must reiterate that the JHDL models for partially reconfigurable circuits cannot be automatically compiled to bitstreams, because current PAR tools are not capable of placing and routing for partial configuration. But given better tool support in the future, JHDL could provide an ideal design-entry interface for fully automating the compilation of partially reconfigurable circuits.

8.1. The Reconfigurable Class

In simulation mode, the Reconfigurable class does little more than hide the details of object construction from the user. Reconsider the MAC code of Listing 4. When we call

reconfigurable.reconfigure("ConstantMultiplier", "constantvalue=" + constantvalue);

the Reconfigurable uses Java’s “reflection” capabilities to dynamically look up the constructor of the ConstantMultiplier class. (Java reflection refers to the built-in metadata of Java classes that allows their fields, methods, and constructors to be analyzed and invoked dynamically at run time.) The Reconfigurable finds the constructor for the ConstantMultiplier class and invokes it, passing in its ArgBlockList with the interface wires. It also tags all objects in the old configuration for deletion. As with all JHDL cells, the Reconfigurable will accept placement parameters that will be inserted into the netlist. Upon netlisting, the Reconfigurable class can easily segregate the circuit into static and dynamic parts, which will be a necessary step for future PAR tools that support partial configuration.
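The reflective lookup described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not JHDL's actual implementation: Wire, ConfigCell, and Reconfigurable here are hypothetical stand-ins, the configuration class is passed as a Class object rather than by name, and the "key=value" parameter syntax and the (Wire, Wire, Map) constructor signature are assumptions standing in for JHDL's ArgBlockList.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a JHDL interface wire (not the real JHDL class).
class Wire {
    final String name;
    Wire(String name) { this.name = name; }
}

// Hypothetical base class for anything a Reconfigurable can hold.
abstract class ConfigCell {
    boolean taggedForDeletion = false;
}

// Example configuration; the (Wire, Wire, Map) constructor signature is an assumption.
class ConstantMultiplier extends ConfigCell {
    final Wire in, out;
    final int constant;

    public ConstantMultiplier(Wire in, Wire out, Map<String, String> params) {
        this.in = in;
        this.out = out;
        this.constant = Integer.parseInt(params.get("constantvalue"));
    }
}

// Hypothetical, simplified Reconfigurable: the interface wires stay fixed while
// the configuration behind them is swapped by reflective constructor calls.
class Reconfigurable {
    private final Wire in, out;
    private ConfigCell current;

    Reconfigurable(Wire in, Wire out) {
        this.in = in;
        this.out = out;
    }

    // Parses "key=value,key=value" strings (an assumed parameter syntax).
    static Map<String, String> parseParams(String spec) {
        Map<String, String> params = new HashMap<>();
        for (String pair : spec.split(",")) {
            if (pair.trim().isEmpty()) continue;
            String[] kv = pair.split("=", 2);
            params.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
        }
        return params;
    }

    ConfigCell reconfigure(Class<? extends ConfigCell> cls, String spec) {
        try {
            // Reflection: find the constructor that takes the interface wires
            // plus a parameter map, then invoke it.
            Constructor<? extends ConfigCell> ctor =
                    cls.getConstructor(Wire.class, Wire.class, Map.class);
            ConfigCell next = ctor.newInstance(in, out, parseParams(spec));
            // The old configuration is tagged only once the new one exists;
            // a simulator would later drop tagged cells from its schedule.
            if (current != null) current.taggedForDeletion = true;
            current = next;
            return next;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot construct " + cls.getName(), e);
        }
    }

    ConfigCell current() { return current; }
}
```

Passing the Class object keeps the sketch independent of package naming; a string-based variant would first resolve a fully qualified class name with Class.forName and then perform the same constructor lookup.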

Notice that when dealing with the Reconfigurables, we do nothing to specify which FPGA system we are targeting. All JHDL functions get mapped to the target technology by the HardwareManager class, which we set as a JHDL system-wide parameter before constructing the circuit (refer to the setHardwareManager() call in Listing 5). In our example, we targeted the MAC system to the HotWorksHardwareManager. When the Reconfigurable.reconfigure() method is called prior to netlisting, the HardwareManager must decide how to map the loaded configuration to the FPGA system. In the case of the HotWorksHardwareManager, this is simple, since there is only one FPGA on the board; when reconfigure() is called or when the netlister is run, the HotWorksHardwareManager only needs to insert some pin interface logic around the circuit. Most real FPGA applications, however, require multiple FPGAs per board. If we targeted JHDL to the 16-FPGA Splash-II board, for example, the same operation would require the SplashHardwareManager to partition the circuit automatically across the 16 FPGAs.

Listing 5. The test-bench to control the MAC cell.

In hardware execution, the Reconfigurable negotiates the loading of the corresponding bitstreams with the HardwareManager. Typically, a standard file-naming convention is used to associate a bitstream file with a JHDL class; for example, the HardwareManager might look for a bitstream “ConstantMultiplier.bit” when the ConstantMultiplier class is loaded into a Reconfigurable. When the Reconfigurable represents a partial (local) configuration, the Reconfigurable uses its placement constraints to control where the configuration is loaded on the chip.
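A minimal sketch of the file-naming convention, assuming bitstreams live in one directory and are named after the simple class name. Placement, LoadRequest, and BitstreamNaming are hypothetical classes; how a real HardwareManager turns a placement into device addresses is not modeled here.

```java
import java.nio.file.Path;

// Hypothetical placement constraint carried by a Reconfigurable (grid coordinates).
final class Placement {
    final int x, y;
    Placement(int x, int y) { this.x = x; this.y = y; }
}

// What the HardwareManager would hand to a (hypothetical) device driver.
final class LoadRequest {
    final Path bitstream;
    final Placement origin; // null for a global (whole-chip) configuration

    LoadRequest(Path bitstream, Placement origin) {
        this.bitstream = bitstream;
        this.origin = origin;
    }

    boolean isPartial() { return origin != null; }
}

// Assumed convention: "<SimpleClassName>.bit" inside a single bitstream directory.
final class BitstreamNaming {
    private final Path directory;

    BitstreamNaming(Path directory) { this.directory = directory; }

    LoadRequest resolve(Class<?> configClass, Placement placement) {
        Path file = directory.resolve(configClass.getSimpleName() + ".bit");
        return new LoadRequest(file, placement);
    }
}

// Dummy configuration class used only to demonstrate the lookup.
class ConstantMultiplier { }
```

A global configuration resolves to a request with no origin, while a local one carries the Reconfigurable's placement so the loader knows where on the chip to write it.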

8.2. The DataPort Class

In simulation mode, the DataPort class is just another cell on the JHDL simulation schedule, acting like a synchronous wire reader or writer. When the design is netlisted, DataPorts indicate the need for interface drop-in logic to be inserted on wires that are inputs or outputs to the top-level system. Again, we have done nothing to specify the details of how these data transfers get implemented; for example, refer back to Listing 5. In this example, all we do is request two data channels from the HardwareManager, one for reading and one for writing. As with the configuration details, the HardwareManager must decide how this data transfer best maps to its board: perhaps via a FIFO or a crossbar, which would require special configuration and interface logic, or perhaps by reading and writing data directly off the pins of the device. Alternatively, the user could put constraints on the top-level input/output wires, such as input-pin JHDL symbols, that indicate the desired I/O behavior. In one form or another, however, the HardwareManager must decide at netlist time how to implement the desired data transfer.
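To illustrate a DataPort being "just another cell on the schedule", here is a toy simulator in which a writer port, a user circuit, and a reader port are clocked in list order. Every class name is hypothetical, and the one-pass, in-order evaluation is a simplification: data ripples through in a single cycle, whereas a real unit-delay simulator orders events more carefully.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical single-value wire.
class Wire {
    int value;
}

// Anything that can sit on the (hypothetical) simulation schedule.
interface ScheduledCell {
    void clock();
}

// Drives one buffered word onto its wire per clock, like a synchronous writer.
class WriteDataPort implements ScheduledCell {
    private final Wire out;
    private final Queue<Integer> pending = new ArrayDeque<>();

    WriteDataPort(Wire out) { this.out = out; }

    void write(int... words) {
        for (int w : words) pending.add(w);
    }

    @Override
    public void clock() {
        if (!pending.isEmpty()) out.value = pending.remove();
    }
}

// Samples its wire once per clock, like a synchronous reader.
class ReadDataPort implements ScheduledCell {
    private final Wire in;
    private final List<Integer> received = new ArrayList<>();

    ReadDataPort(Wire in) { this.in = in; }

    @Override
    public void clock() { received.add(in.value); }

    List<Integer> read() { return received; }
}

// Stand-in for the user circuit between the two ports.
class Doubler implements ScheduledCell {
    private final Wire in, out;

    Doubler(Wire in, Wire out) { this.in = in; this.out = out; }

    @Override
    public void clock() { out.value = 2 * in.value; }
}

// Evaluates cells in schedule order, once per cycle.
class Simulator {
    private final List<ScheduledCell> schedule = new ArrayList<>();

    void add(ScheduledCell cell) { schedule.add(cell); }

    void cycle(int n) {
        for (int i = 0; i < n; i++)
            for (ScheduledCell cell : schedule) cell.clock();
    }
}
```

Writing 1, 2, 3 and running three cycles leaves the reader holding 2, 4, 6; nothing in the schedule distinguishes the ports from the circuit cell.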

In hardware execution mode, the DataPort negotiates the data transfers with the I/O sources selected during netlisting, via the HardwareManager. Because the DataPorts use a buffered data model, the HardwareManager can take advantage of fast block transfers to reduce the overhead of FPGA board interaction.
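The benefit of a buffered model can be shown with a small sketch: words accumulate on the host and cross to the board one block at a time. BoardChannel, BufferedWritePort, and CountingChannel are hypothetical; the counting channel stands in for a device driver so the number of transactions can be observed.

```java
import java.util.Arrays;

// Hypothetical board-side channel: one call represents one bus transaction.
interface BoardChannel {
    void transferBlock(int[] words);
}

// Buffered DataPort sketch: words are shipped to the board in blocks
// instead of one transaction per word.
class BufferedWritePort {
    private final BoardChannel channel;
    private final int[] buffer;
    private int count = 0;

    BufferedWritePort(BoardChannel channel, int blockSize) {
        if (blockSize <= 0) throw new IllegalArgumentException("blockSize must be positive");
        this.channel = channel;
        this.buffer = new int[blockSize];
    }

    void write(int word) {
        buffer[count++] = word;
        if (count == buffer.length) flush();
    }

    // Sends any partially filled block; call before the board needs the data.
    void flush() {
        if (count == 0) return;
        channel.transferBlock(Arrays.copyOf(buffer, count));
        count = 0;
    }
}

// Test double that counts transactions instead of touching hardware.
class CountingChannel implements BoardChannel {
    int transfers = 0;
    long words = 0;

    @Override
    public void transferBlock(int[] block) {
        transfers++;
        words += block.length;
    }
}
```

With a 256-word block, writing 1000 words and then flushing issues 4 channel transactions (three full blocks plus a 232-word remainder) rather than 1000.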

8.3. The HWSystem Class

The HWSystem sets up and controls the JHDL simulation. Its cycle() and reset() methods shown in Listing 5 are used to control the circuit in both simulation and execution mode. In simulation mode, these methods emulate the corresponding clock and reset behavior using a unit-wire-delay, static simulation schedule. Because the Reconfigurable and DataPort classes are derived from base JHDL types, these objects require no special consideration during simulation. When the Reconfigurables change configuration, the HWSystem walks through its simulation schedule, notes all the wires and cells that were tagged for deletion by the Reconfigurables, and removes them from the simulation schedule so that they can be garbage-collected. In execution mode, the HWSystem gets control of the FPGA board upon initialization via the HardwareManager, and the cycle() and reset() methods issue the corresponding signals to the board.
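A simulation-mode sketch of the schedule handling described above. SimCell, Counter, and HWSystem here are simplified hypothetical classes; the sketch prunes only cells (not wires) and models nothing of execution mode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical schedulable cell with a deletion tag set by a Reconfigurable.
abstract class SimCell {
    boolean taggedForDeletion = false;
    abstract void clock();
    void reset() { }
}

// Trivial example cell: counts clock edges.
class Counter extends SimCell {
    int count = 0;
    @Override void clock() { count++; }
    @Override void reset() { count = 0; }
}

// Simulation-mode sketch of HWSystem: cycle() and reset() walk a static schedule.
class HWSystem {
    private final List<SimCell> schedule = new ArrayList<>();

    void add(SimCell cell) { schedule.add(cell); }

    void cycle(int n) {
        for (int i = 0; i < n; i++)
            for (SimCell cell : schedule) cell.clock();
    }

    void reset() {
        for (SimCell cell : schedule) cell.reset();
    }

    // After a reconfiguration: drop tagged cells so the schedule no longer
    // references them (making them collectable). Returns how many were removed.
    int pruneTagged() {
        int before = schedule.size();
        schedule.removeIf(cell -> cell.taggedForDeletion);
        return before - schedule.size();
    }
}
```

Once a cell is pruned it stops receiving clock edges, which is the behavior the simulator needs after a Reconfigurable swaps in a new configuration.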

8.4. HardwareManager

As is obvious from our description of the previous three classes, the HardwareManager is critical to the successful implementation of the system, because all hardware interaction must pass through it. The HardwareManager provides an abstract API to describe standard FPGA board operations, such as clocking, reset, reconfiguration, and data transfer. This abstract API allows us to describe entire board-level systems in platform-independent terms. Of course, the benefits of platform independence are only realized if the HardwareManager makes efficient implementation choices for each of its method calls. In JHDL, it is very easy to add varying degrees of user constraints to direct the tools towards an efficient implementation. Alternatively, an equally (or more) viable solution would be to provide a platform-specific JHDL model for a given board that exposes as much fine-grained control as needed. For example, a SplashHardwareManager could be created that predefines 16 Reconfigurables, one per FPGA on the board, and specialized DataPorts to represent the FIFOs, memory ports, and crossbar on the board; this type of model has been created for the Annapolis WildForce board. In this paper, we have focused more on the platform-independent specification of RTR behavior, because this is clearly the long-term direction in which FPGA tools ought to move, requiring fewer of us to be vendor-specific hardware wizards in order to reap the benefits of adaptive computing. The approach shown in this paper allows the designer to specify the RTR behavior of a circuit at a high level, leaving the device-specific mapping details to the tools.
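The idea of an abstract board API can be sketched as a small Java interface plus one implementation. BoardManager, its method names, and TraceBoardManager are hypothetical, not the real HardwareManager API; the trace implementation simply records calls so that board-independent application code can be exercised without hardware.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstract board API in the spirit of the HardwareManager:
// every board-level operation goes through these few calls.
interface BoardManager {
    void configure(String configName);
    void reset();
    void cycle(int n);
    void writeBlock(int channel, int[] words);
    int[] readBlock(int channel, int count);
}

// One possible implementation: records each call instead of driving a board.
class TraceBoardManager implements BoardManager {
    final List<String> trace = new ArrayList<>();

    @Override public void configure(String name) { trace.add("configure " + name); }
    @Override public void reset() { trace.add("reset"); }
    @Override public void cycle(int n) { trace.add("cycle " + n); }

    @Override
    public void writeBlock(int channel, int[] words) {
        trace.add("write ch" + channel + " x" + words.length);
    }

    @Override
    public int[] readBlock(int channel, int count) {
        trace.add("read ch" + channel + " x" + count);
        return new int[count]; // no real data source in this sketch
    }
}

// Board-independent application code: it never names a specific board.
final class MacRun {
    static int[] run(BoardManager hw, int[] samples) {
        hw.configure("ConstantMultiplier");
        hw.reset();
        hw.writeBlock(0, samples);
        hw.cycle(samples.length);
        return hw.readBlock(1, samples.length);
    }
}
```

Supporting a new board would mean writing another BoardManager implementation; MacRun itself would not change.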

9. A Dynamic JHDL Implementation of Automatic Target Recognition

The culmination of this research was the JHDL implementation of a complex automatic target recognition (ATR) algorithm. This algorithm, which was developed by Sandia National Laboratories, performs binary thresholding of radar images, where the threshold values are locally computed at each position in the image. After the image is thresholded, it is correlated with template images that describe both the foreground (“bright”) and background (“surround”) features of the target; this is sometimes called a “hit-and-miss” correlation. If both correlations are high, this indicates the presence of the target.
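The text does not give Sandia's formula for the local threshold, so the following is only a sketch of locally adaptive binary thresholding under an explicit assumption: the threshold at each pixel is the mean intensity of a square window centred on it, clipped at the image border. AdaptiveThreshold is a hypothetical class name.

```java
// Sketch of locally adaptive binary thresholding. ASSUMPTION: the local
// threshold is the mean intensity of a (2r+1) x (2r+1) window centred on the
// pixel, clipped at the image border; this is not Sandia's published formula.
final class AdaptiveThreshold {
    static boolean[][] apply(int[][] image, int r) {
        int h = image.length, w = image[0].length;
        boolean[][] bright = new boolean[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                long sum = 0;
                int n = 0;
                for (int dy = -r; dy <= r; dy++) {
                    for (int dx = -r; dx <= r; dx++) {
                        int yy = y + dy, xx = x + dx;
                        if (yy < 0 || yy >= h || xx < 0 || xx >= w) continue;
                        sum += image[yy][xx];
                        n++;
                    }
                }
                // pixel > sum / n, rearranged to avoid integer division
                bright[y][x] = (long) image[y][x] * n > sum;
            }
        }
        return bright;
    }
}
```

Because the threshold moves with the local average, a pixel is marked bright only if it stands out from its own neighbourhood; a uniformly lit region thresholds to all-false.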

Thus this ATR problem requires two circuits: one to compute the adaptive threshold values (called the “shapesum”), and one to compute both the thresholding and the correlation [11–13]. The target system must be able to reconfigure very quickly, since the Sandia algorithm uses thousands of templates. We wanted to take advantage of RTR to achieve both a high clock rate, by creating custom circuits for each template configuration, and high throughput, by keeping reconfiguration times low. Given these constraints, we decided to map the algorithm to the Xilinx XC6200 series FPGA, which has fine-grained logic appropriate for binary morphology and very fast partial reconfiguration.

Our circuit design approach for the ATR problem is illustrated in Fig. 3. This figure describes the implementation of both the shapesum and the correlation circuits. In both cases, we arrange the circuit as a 2-D systolic grid of pixel processors, one processor per pixel in the template image. Between the processors is a static grid of systolic interconnect. Each pixel processor can be independently reconfigured to perform a custom computation based on the state of that pixel in the template. For the correlator, if the corresponding template pixel is bright, the processor is configured to add one to the bright “hit count” when the image pixel intensity exceeds the threshold. If the template pixel is surround, the processor is configured to add one to the surround “hit count” when the image pixel intensity is less than the threshold. Similarly, each shapesum pixel processor is custom-configured based on the state of the template pixels.
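The per-pixel correlator behavior can be modeled in software by choosing one small "processor configuration" per template pixel. This sketch is a hypothetical software model, not the XC6200 circuit: the DONT_CARE state (a pixel that is neither bright nor surround) and the use of a single threshold for the whole window are assumptions.

```java
// Template pixel states; DONT_CARE (neither bright nor surround) is an assumption.
enum TemplatePixel { BRIGHT, SURROUND, DONT_CARE }

// Running hit counts accumulated across the grid.
final class HitCounts {
    int bright = 0;
    int surround = 0;
}

// One pixel processor "configuration": what it does with its image pixel.
interface PixelProcessor {
    void evaluate(int intensity, int threshold, HitCounts acc);
}

final class ProcessorLibrary {
    // Selects the processor behavior for one template pixel, mirroring the
    // per-pixel reconfiguration described in the text.
    static PixelProcessor configFor(TemplatePixel p) {
        switch (p) {
            case BRIGHT:
                return (i, t, acc) -> { if (i > t) acc.bright++; };
            case SURROUND:
                return (i, t, acc) -> { if (i < t) acc.surround++; };
            default:
                return (i, t, acc) -> { };
        }
    }
}

// Software model of the correlator grid for one template.
final class CorrelatorGrid {
    private final PixelProcessor[][] grid;

    CorrelatorGrid(TemplatePixel[][] template) {
        grid = new PixelProcessor[template.length][];
        for (int r = 0; r < template.length; r++) {
            grid[r] = new PixelProcessor[template[r].length];
            for (int c = 0; c < template[r].length; c++)
                grid[r][c] = ProcessorLibrary.configFor(template[r][c]);
        }
    }

    // ASSUMPTION: one threshold value applies to the whole window at this position.
    HitCounts correlate(int[][] window, int threshold) {
        HitCounts acc = new HitCounts();
        for (int r = 0; r < grid.length; r++)
            for (int c = 0; c < grid[r].length; c++)
                grid[r][c].evaluate(window[r][c], threshold, acc);
        return acc;
    }
}
```

Changing templates amounts to rebuilding the grid with new per-pixel behaviors, which corresponds to the local reconfiguration of each pixel processor in hardware.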

To model this in JHDL, we created Shapesum and Correlator classes, each of which builds up an N × N grid of Reconfigurables. These Reconfigurables were assigned placement constraints to set the grid location of each processor, and were then interconnected with Wire objects to provide the static communications network. Next, we bundled the system into a top-level test-bench class, very similar to our MAC controller shown in Listing 5. The test bench contained a single Reconfigurable object with DataPorts controlling its I/O wires. Because our XC6200 device was only big enough to contain one of the circuits at a time, we had to “context-switch” the device, reconfiguring back and forth between the shapesum and the correlator. Thus this application uses two levels of reconfiguration: global configuration of either the shapesum or the correlator, and local configuration of each pixel processor in the N × N grid.
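The two levels of reconfiguration can be sketched with ordinary Java objects. Slot, PixelGrid, and AtrDevice are hypothetical stand-ins (Slot plays the role of a placed Reconfigurable); the sketch shows a grid built in loops with per-slot placement, a global context switch that replaces the whole grid, and local per-pixel loads.

```java
// Hypothetical stand-in for one placed Reconfigurable in the pixel grid.
final class Slot {
    final int row, col;          // placement constraint (grid location)
    String config = "empty";     // currently loaded local configuration

    Slot(int row, int col) { this.row = row; this.col = col; }
}

// N x N grid built with ordinary Java loops, each slot carrying its placement.
final class PixelGrid {
    final Slot[][] slots;

    PixelGrid(int n) {
        slots = new Slot[n][n];
        for (int r = 0; r < n; r++)
            for (int c = 0; c < n; c++)
                slots[r][c] = new Slot(r, c);
    }

    // Local (second-level) reconfiguration: one configuration name per pixel.
    int loadTemplate(String[][] perPixel) {
        int loads = 0;
        for (int r = 0; r < slots.length; r++)
            for (int c = 0; c < slots[r].length; c++) {
                slots[r][c].config = perPixel[r][c];
                loads++;
            }
        return loads;
    }
}

// Global (first-level) reconfiguration: the device holds either the
// shapesum or the correlator, never both at once.
final class AtrDevice {
    private final int n;
    private String context = "none";
    private PixelGrid grid;
    int globalLoads = 0;

    AtrDevice(int n) { this.n = n; }

    PixelGrid switchTo(String newContext) {
        if (!newContext.equals(context)) {
            context = newContext;
            grid = new PixelGrid(n);  // the previous grid becomes unreachable
            globalLoads++;
        }
        return grid;
    }

    String context() { return context; }
}
```

Switching to the context already loaded is a no-op, whereas switching between shapesum and correlator costs a global load before any per-pixel templates can be loaded.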

While we could netlist the design from JHDL, there was no way to automatically PAR the circuits because of the partial reconfiguration requirements. Therefore we had to place the circuit manually, making sure that all of the pixel processor configurations were interchangeable in their size and routing interface to the static interconnection grid. We separated the bitstream into individual configuration files, one each for the shapesum, the correlator, and the various pixel processor configurations. The HotWorksHardwareManager associated these configuration files with each JHDL cell by a naming convention. When the pixel processors are reconfigured, the HotWorksHardwareManager uses the placement constraints of the Reconfigurables to decide where to load the new processor configurations. Also, when DataPorts are created, the HotWorksHardwareManager allocates a bank of protected I/O registers (the primary I/O mechanism on the HotWorks board) and inserts them into the netlist with specific placement annotations. Then, in hardware execution, it sends and receives the image data to and from those registers via the device driver. In all, we found that JHDL was quick and efficient in controlling this complex application; more importantly, it demonstrated that we can in fact codesign a very complex RTR application, with very simple and intuitive constructs to describe the system control. We were disappointed that we could not find a way to automate the place-and-route of the JHDL-generated netlist, as we normally could with circuits that use only global reconfiguration. But again, this was a limitation of the back-end PAR tools, not of JHDL itself.

Figure 3. The architecture for the shapesum and correlator ATR circuits.

10. Future Work

As discussed above, the major limiting factor in this research has been our inability to automate the place-and-route of partially reconfigurable circuits with conventional tools. A significant contribution to the RTR research community would be the development of PAR tools that natively support partially reconfigurable circuits. This is obviously an order of magnitude more difficult than normal static PAR, and therefore would have to be constrained to a more restrictive design flow. One possible methodology would be as follows:

1. The designer segregates the circuit into static and reconfigurable portions.

2. The static portions of the circuit maintain a fixed interface to all reconfigurable portions of the circuit.

3. The dynamic portions of the circuit are placed and routed. All circuits that will be loaded into the same Reconfigurable will be optimized for isomorphism. For example, suppose that we are going to interchange adder and subtracter configurations at a given location. The tools should try to force the adder and subtracter, as much as possible, to have the same layout and use the same resources. This minimizes the circuit resource overhead of partial configuration (see item #4). Of course, the tools must also force the adder and subtracter to have an identical routing interface to the static logic. This is the critical step in efficient automated PAR, and will greatly benefit from any user guidance provided.

4. Next, the PAR tools place the static portions of the circuit. At any given location that is partially reconfigurable, the PAR tools exclude all FPGA resources that are claimed by any of the potential sub-configurations. In other words, we take each sub-configuration group (see #3), take the logical union of all FPGA resources used by that group, and remove them from the pool of resources available to the static circuitry. Then, the tools can PAR the static design normally.
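The bookkeeping in steps 3 and 4 reduces to set operations, sketched below. PartialParPlanner and its methods are hypothetical, and resources and interface pins are modeled as plain strings; a real tool would work with device-specific routing and logic resources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the resource bookkeeping in steps 3 and 4. Resources and
// interface pins are plain strings (hypothetical names such as "r0").
final class PartialParPlanner {
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // A sub-configuration group: everything that may be loaded into one Reconfigurable.
    @SafeVarargs
    static List<Set<String>> group(Set<String>... configs) {
        return new ArrayList<>(Arrays.asList(configs));
    }

    // Step 3 requirement: all members of a group present an identical routing interface.
    static boolean identicalInterfaces(List<Set<String>> interfaces) {
        for (Set<String> s : interfaces)
            if (!s.equals(interfaces.get(0))) return false;
        return true;
    }

    // Step 4: logical union of every resource used by any member of any group.
    @SafeVarargs
    static Set<String> reserved(List<Set<String>>... groups) {
        Set<String> union = new HashSet<>();
        for (List<Set<String>> g : groups)
            for (Set<String> config : g) union.addAll(config);
        return union;
    }

    // Resources left for placing and routing the static circuitry.
    @SafeVarargs
    static Set<String> staticPool(Set<String> allResources, List<Set<String>>... groups) {
        Set<String> pool = new HashSet<>(allResources);
        pool.removeAll(reserved(groups));
        return pool;
    }
}
```

If an adder uses resources {r0, r1} and a subtracter at the same location uses {r1, r2}, the union {r0, r1, r2} is withheld and the static logic is placed in what remains; the more the two layouts overlap (step 3), the smaller that withheld union becomes.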

We feel that JHDL provides the right sort of design entry point for partial RTR circuit design, since the Reconfigurable class clearly partitions the circuit into static and dynamic portions and enforces the static wiring interface with dynamic circuits. Also, JHDL allows the user very flexible control over the low-level details of the circuit; with this kind of control, the user could very likely provide some simple placement hints or constraints to guide step #3 above, which would be a very difficult problem to solve without guidance.

Further investigation into the HardwareManager concept would also be very insightful. Because of its abstract representation of FPGA board interaction, the HardwareManager permits us to describe FPGA system interaction in a clean, intuitive, device-independent manner. In traditional tool flows, these FPGA control operations almost always require detailed understanding of the device and very low-level programming. While our research demonstrated that this model was successful for the simple HotWorks platform, it would be quite compelling to demonstrate that the same kinds of designs could be efficiently mapped by a HardwareManager to a more complex system like Splash-II. This would require efficient automated circuit partitioning. Could we in fact design HardwareManagers that take full advantage of the specific features of a target system when we use this abstract API? If so, we could easily build up a heterogeneous FPGA computing fabric, with each system communicating with the others via this API, à la the “plug-and-play” standard. This complex communication would greatly benefit from the clean control constructs of JHDL.

11. Conclusions

We believe that the constructor/destructor mechanism has proven to be a feasible way to control configuration on a CCM. The Reconfigurable class provides an intuitive wrapper around this constructor invocation, hiding the details of switching between simulation and execution mode. In addition, JHDL met all of the project goals that were defined at the outset of this research project:

1. JHDL is based on a popular language and requires no language extensions for circuit design.

2. The CCM control paradigm is CCM independent, adopting the object-instance construction metaphor from object-oriented languages. The abstraction will work with any standard CCM, and work is now under way to interface JHDL to other CCMs such as the WildForce system from Annapolis Microsystems. The JHDL CCM control paradigm is very simple and intuitive.

3. JHDL supports both partial and global configuration, and demonstration ATR applications have been implemented to show this capability.

4. A JHDL application description serves as both the simulation model and the execution controller for CCM applications. No code modifications are required, and switching between software simulation and hardware execution on the CCM requires only setting a single boolean variable.
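Goal 4 can be illustrated with a small sketch in which one static flag selects the back end while the application code stays identical. Board, SimulatedBoard, HardwareBoard, and Testbench are hypothetical names; HardwareBoard only records the commands it would send, because no real device-driver API is assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal board abstraction shared by both modes.
interface Board {
    void cycle(int n);
    String mode();
}

// Simulation mode: clocks a software model (here just a cycle counter).
class SimulatedBoard implements Board {
    long cycles = 0;
    @Override public void cycle(int n) { cycles += n; }
    @Override public String mode() { return "simulation"; }
}

// Execution mode: would forward to a device driver; this sketch only records
// the commands it would send, since no real driver API is assumed.
class HardwareBoard implements Board {
    final List<String> commandsSent = new ArrayList<>();
    @Override public void cycle(int n) { commandsSent.add("CLOCK " + n); }
    @Override public String mode() { return "execution"; }
}

final class Testbench {
    // The single switch between software simulation and hardware execution.
    static boolean executeOnHardware = false;

    static Board selectBoard() {
        return executeOnHardware ? new HardwareBoard() : new SimulatedBoard();
    }

    // Identical application code in both modes.
    static Board run() {
        Board board = selectBoard();
        board.cycle(10);
        return board;
    }
}
```

Only the flag changes between runs; run() itself is untouched, which mirrors the claim that no code modifications are needed to move from simulation to execution.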

Furthermore, JHDL allows the application to be naturally divided into the parts that will run in software and the parts that will run in hardware. When operating in hardware execution mode, only those parts of the application that are described using circuit classes will be executed on the CCM platform. All other parts of the application remain on the host, operating essentially as a separate program that communicates with the user-designed circuitry via the CCM device driver. In this way, JHDL allows software and hardware descriptions not only to coexist but also to coexecute.

JHDL also provides additional benefits because it is based on a commonly used programming language; as such, all of the standard language features, such as I/O, are accessible to the designer throughout the design process. Unlike VHDL, for example, it is quite easy to perform arbitrary I/O in JHDL, both to the console and to files, during software simulation. JHDL also benefits from the programmatic control constructs of Java (e.g., if/then/else statements, for(...) loops, method calls, etc.). These constructs allow the user to build up very quickly the parameterized structures that are typical of complex RTR applications.

Although JHDL was developed primarily with FPGAs in mind, it is quite possible to use JHDL for non-FPGA designs. The JHDL framework is quite generic, and it is relatively easy to set up JHDL to work with different technologies. For any given technology, all that is necessary is a library of appropriate primitives and a netlister that accesses the JHDL circuit structure and generates a textual netlist in the proper format. Thus it would be possible, for example, to use JHDL as a low-level circuit layout generator by providing a library of physical primitives and a netlister of the appropriate format. Or, structural JHDL designs could be created if a library of gate-level primitives were provided along with a VHDL netlister, for example.
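A technology-specific netlister amounts to a tree walk over the circuit hierarchy that emits one line per primitive. The sketch below uses hypothetical CellNode and SimpleTextNetlister classes and an invented "INST path TYPE port=net" output format; a real netlister would emit EDIF, VHDL, or whatever the target tools accept, and would handle hierarchical net naming, which this sketch ignores (net names are global strings).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical circuit-structure node: a primitive if it has no children.
final class CellNode {
    final String name, type;
    final Map<String, String> ports = new LinkedHashMap<>(); // port -> net (global names)
    final List<CellNode> children = new ArrayList<>();

    CellNode(String name, String type) { this.name = name; this.type = type; }

    CellNode connect(String port, String net) { ports.put(port, net); return this; }
    CellNode add(CellNode child) { children.add(child); return this; }
    boolean isPrimitive() { return children.isEmpty(); }
}

// Swapping in another implementation changes the output format only.
interface Netlister {
    String netlist(CellNode top);
}

// Emits one line per primitive in an invented "INST path TYPE port=net" format.
final class SimpleTextNetlister implements Netlister {
    @Override
    public String netlist(CellNode top) {
        StringBuilder sb = new StringBuilder();
        emit(top, "", sb);
        return sb.toString();
    }

    private void emit(CellNode cell, String prefix, StringBuilder sb) {
        String path = prefix.isEmpty() ? cell.name : prefix + "/" + cell.name;
        if (cell.isPrimitive()) {
            sb.append("INST ").append(path).append(' ').append(cell.type);
            for (Map.Entry<String, String> e : cell.ports.entrySet())
                sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
            sb.append('\n');
        } else {
            for (CellNode child : cell.children) emit(child, path, sb);
        }
    }
}
```

Supporting a new technology would then mean supplying a primitive library plus another Netlister implementation, while the circuit description itself stays the same.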

In the future, we anticipate making several improvements to JHDL to further enhance its effectiveness and applicability as a design tool. Possible enhancements include synthesis of control blocks, the ability to fully debug executing hardware, and a reader that can accept EDIF netlists and generate JHDL.

Acknowledgments

This effort was sponsored by the Defense Advanced Research Projects Agency (DARPA). The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

References

1. P. Bellows and B. Hutchings, “JHDL - An HDL for Reconfigurable Systems,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1998, pp. 175–184.

2. P. Bellows, J. Hawkins, S. Hemmert, and B. Hutchings, “A CAD Suite for High-Performance FPGA Design,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J.M. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1999.

3. J. Vuillemin, P. Bertin, D. Roncin, M. Shand, H. Touati, and P. Boucard, “Programmable Active Memories: Reconfigurable Systems Come of Age,” IEEE Transactions on VLSI Systems, vol. 4, no. 1, 1996, pp. 56–69.

4. L. Moll and M. Shand, “Systems Performance Measurement on PCI Pamette,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1997, pp. 125–133.

5. C. Iseli and E. Sanchez, “Spyder: A Reconfigurable VLIW Processor Using FPGAs,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, D.A. Buell and K.L. Pocek (Eds.), Napa, CA, April 1993, pp. 17–24.

6. S. Gehring and S. Ludwig, “The Trianus System and Its Application to Custom Computing,” in Field-Programmable Logic: Smart Applications, New Paradigms, and Compilers. 6th International Workshop on Field-Programmable Logic and Applications, R.W. Hartenstein and M. Glesner (Eds.), Darmstadt, Germany: Springer-Verlag, 1996, pp. 176–184.

7. W. Luk, N. Shirazi, and P.Y.K. Cheung, “Compilation Tools for Run-Time Reconfigurable Design,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J.M. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1997, pp. 56–65.

8. J. Burns, A. Donlin, J. Hogg, S. Singh, and M. de Wit, “A Dynamic Reconfiguration Run-Time System,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1997, pp. 66–75.

9. M. Gokhale and E. Gomersoll, “High Level Compilation for Fine Grained FPGAs,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J.M. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1997, pp. 165–173.

10. P. Lysaght and J. Stockwood, “A Simulation Tool for Dynamically Reconfigurable Field Programmable Gate Arrays,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 4, no. 3, 1996, pp. 381–390.

11. J. Villasenor, B. Schoner, K. Chia, and C. Zapata, “Configurable Computing Solutions for Automatic Target Recognition,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1996, pp. 70–79.

12. M. Rencher, “A Comparison of FPGA Platforms Through SAR/ATR Algorithm Implementation,” Master’s thesis, Department of Electrical and Computer Engineering, Brigham Young University, Provo, Utah, 1996.

13. M. Rencher and B. Hutchings, “Automated Target Recognition on Splash 2,” in Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines, J. Arnold and K.L. Pocek (Eds.), Napa, CA, April 1997, pp. 192–200.

Peter Bellows received the B.S. and M.S. degrees in Electrical and Computer Engineering from Brigham Young University in 1999. At BYU, he researched reconfigurable computing and computer architecture, and helped to develop a suite of FPGA design tools that describe and control the dynamic structure of run-time-reconfigurable circuits. He is currently a systems engineer at Information Sciences Institute in Arlington, Virginia. He is a principal developer of the SLAAC-1 reconfigurable computing platform, which is a PCI-based plug-in card with Xilinx 4000/Virtex FPGAs. His current work on this platform includes design of the PCI bridge, development of device drivers and control APIs, and system performance characterization. His other research interests include computer architecture and CAD tools.

Brad Hutchings is an Associate Professor in the Department of Electrical and Computer Engineering at Brigham Young University. His research interests are related to high-performance computing, focusing primarily on FPGA-based systems but with emphasis on real-time signal processing, VLSI, DSPs, FPGA device architecture, and CAD tools. He received his PhD in Computer Science from the University of Utah in 1992. He is currently director of the Configurable Computing Laboratory at Brigham Young University.