Tornado: A self-reconfiguration control system for core-based multiprocessor CSoPCs


Journal of Systems Architecture 53 (2007) 629–643

www.elsevier.com/locate/sysarc

Tornado: A self-reconfiguration control system for core-based multiprocessor CSoPCs

Armando Astarloa *, Aitzol Zuloaga, Unai Bidarte, José Luis Martín, Jesús Lázaro, Jaime Jiménez

University of the Basque Country, Department of Electronics and Telecommunications, Faculty of Engineering,

Urquijo s/n, E-48013 Bilbao, Spain

Received 26 May 2006; received in revised form 22 December 2006; accepted 7 January 2007

Available online 1 February 2007

Abstract

In this work we present a self-reconfiguration control focused on multiprocessor core-based systems implemented on FPGA technology. An infrastructure of signals, protocols, interfaces and a controller is exposed to perform safe hardware/software reconfigurations. This infrastructure is part of the Tornado framework that includes other elements such as a multi-context assembler for a reconfigurable processor or a custom design flow developed for the Wishbone IP-Core interconnection specification. We present two applications where the presented control system has been applied, and it is compared with other available approaches.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Dynamic reconfiguration; Partial reconfiguration; SoPC; SoC; FPGA; Multiprocessor

1. Introduction

Thanks to high-capacity modern FPGAs, the "Hardware Plugins" concept makes sense. For example, Horta et al. [1] describe how communications circuits were implemented as Dynamic Hardware Plugins and reconfigured with data sent over the network. Danne et al. [2] present a technique to implement multi-controller systems using partially reconfigurable FPGAs. In both cases the reconfiguration controller was implemented outside the main FPGA.

1383-7621/$ - see front matter © 2007 Elsevier B.V. All rights reserved.

doi:10.1016/j.sysarc.2007.01.011

* Corresponding author. Tel.: +34 94 6017304; fax: +34 94 60142 9.

E-mail address: [email protected] (A. Astarloa).

The latest platform FPGAs have enabled the integration of whole digital systems in a single device: hardware cores, microprocessors, on-chip buses, etc. Martin, in the chapter "The History of the SoC Revolution" in [3], emphasizes that core-based design with commercial reconfigurable FPGA platforms is a strong reality in System-on-Chip (SoC) [4] design today and will continue to be in the future. SoCs implemented in reconfigurable logic are called System-on-Programmable-Devices (SoPCs).

In this context of SoPCs, the use of dynamic partial reconfiguration is a very active research area. The use of this feature in platform reconfigurable devices, replacing a part of the FPGA configuration memory while the remaining digital system implemented in the chip continues running, will lead to more flexible systems. Also, in these new platform reconfigurable devices, all the reconfiguration infrastructure (reconfiguration controller, bitstream memory, etc.) may be embedded in the same chip, giving rise to self-reconfigurable SoPCs. Since in these systems one section must continue running while another one is being replaced, reconfiguration control is a key issue. To apply self-reconfiguration, it is not enough that the FPGA configuration memory allows modification. This modification must be done under control, and the reconfiguration control system must address the following vital points: do not interfere with the static section of the system; disconnect the module that is being replaced from the on-chip bus; control the logic state of the I/O ports located in the dynamic section; and manage a reconfiguration request-acknowledge handshake to allow critical non-reconfigurable time slots or to block embedded multiprocessors during context change operations. Since reconfiguration in SoPC designs is focused on increasing system functionality, these are the new critical issues that research in self-reconfiguration control tries to solve. The traditional reconfiguration overheads [5], power and time, must still be taken into account depending on the application, but they are not as critical.

The most recent work in this field is summarized as follows. Fong et al. [6] propose a framework for FPGA field updates that embeds a reconfiguration controller with cryptographic capabilities and a media interface through which the bitstream is transmitted to the FPGA. The reconfiguration is performed using the ICAP [7] internal reconfiguration interface of the Virtex-II devices. There is no communication between the controller and the static or dynamic section of the design. Blodget et al. [8] present a Self-Reconfiguring Platform (SRP) for Xilinx Virtex-II and Virtex-II Pro. The SRP has a reconfiguration controller built with a soft microprocessor core (MicroBlaze) on the Virtex-II or a hard microprocessor core (PowerPC) on the Virtex-II Pro. The internal reconfiguration interface ICAP is wrapped to comply with the on-chip bus specification, building a "Reconfiguration Peripheral Core". The SRP is completed with a reconfiguration cache built around an embedded memory block (BlockRAM). The communication between the different cores is performed over the CoreConnect™ On-chip Peripheral Bus [9]. Ullmann et al. [10] propose a self-reconfigurable system for the automotive industry. They also exploit the partial reconfiguration capability of Virtex devices. Their inter-task reconfiguration approach is based on the idea that, in a car, not all the electronic functions (managed by IP-Cores) need to be active simultaneously, so different priority groups can be defined [11]. Using real-time software to manage the message interchange and dynamic priority, the hardware modules are loaded in defined slots in the FPGA. The approach of Moller et al., "Fixed core Processor connected to Reconfigurable Coprocessor" (FiPRe) [12,13], is focused on multiprocessor self-reconfigurable systems. The embedded coprocessors are in charge of accelerating some critical functions and they can be inserted or replaced at run-time. This multiprocessor model has been implemented using Xilinx Virtex-II technology. In like manner, some "Reconfigurable Hardware Operating Systems" [14] are emerging. In this field, Walder and Platzner [15,16] present a bus infrastructure ("Task Communication Bus", TCB), signals and logic that allow hardware task management by a hardware Operating System. The Operating System runs in the FPGA embedded processor.

Our research group has been working for years on SoPC architectures [17,18] with multiprocessor hardware-software cores [19,20]. Therefore, in order to exploit self-reconfigurable capabilities in our SoPC architectures, we need a control system with the following features: support for a standard on-chip bus, a low-cost controller and low-cost reconfiguration control infrastructure, support for both small circuit modifications and big modifications via reconfiguration, a handshake between the reconfiguration controller and the reconfigurable modules, and reconfiguration control support for cores with embedded tiny processors (Mixed Cores). With this motivation, the previously presented approaches are compared in Table 1. As can be noticed, the FiPRe approach is the one that best fits the requirements. However, it uses a specific bus, not a standard one, and it does not provide means for intra-task reconfiguration control. None of them fits all the proposed requirements. Although the state of the art in this field is very wide and new approaches are emerging with very useful contributions, like new design tools and mechanisms to properly interconnect the static and dynamic sections in the FPGA [21], to the best of our knowledge there are no solutions to control self-reconfiguration applied to Mixed Cores that take advantage of intra-task reconfiguration (Section 2.2).

Table 1
Comparison among dynamic partial reconfiguration control approaches for core-based SoPC systems

Name | Standard bus specification | Reconfiguration controller | Intra-task reconfiguration support | Inter-task reconfiguration support | Mixed Core software reconfiguration support | Reconfiguration control handshake | Core disconnection during its reconfiguration
Fong et al. [6] | No | Specific processor | Yes | No | No | No | No
Blodget et al. (SRP) [8] | CoreConnect™ | General purpose processor (MicroBlaze/PowerPC) | Yes | Yes | No | No | No
Ullmann et al. [10] | Specific (NoC) | General purpose processor (MicroBlaze) | No | Yes | No | Yes | Yes
Moller et al. (FiPRe) [12] | Specific | General purpose processor (R8R) (customized) | No | Yes | Yes | Yes | Yes
Walder et al. [14] | Specific | General purpose processor (MicroBlaze) | No | Yes | No | Yes | Yes

In this paper, we present Tornado [22], a self-reconfiguration control system approach. It includes an infrastructure of signals, protocols and logic defined to apply safe self-reconfiguration to core-based SoPC designs. These SoPCs can include multiple cores with embedded tiny processors. If these systems have reconfigurable capabilities, they are called Configurable-System-on-a-Programmable-Chip (CSoPCs) [23]. In order to apply and verify this general approach, we have implemented it using the Wishbone standard specification for IP-Core interconnection [24]. The selected target FPGAs are the Xilinx Virtex partially reconfigurable devices.

The remainder of this article is organized into four sections. Section 2 introduces the Tornado reconfiguration framework for multiprocessor CSoPCs and presents the signaling subsystem and the reconfiguration controller core. Sections 3 and 4 present the experimental results. Section 5 concludes this paper and presents the future work in this field.

2. Tornado self-reconfiguration control system

2.1. Targeted architectures

Fig. 1a represents a simplified architecture of a SoPC that includes n Tornado Compatible cores (TC-Cores), which are cores that admit controlled reconfiguration, and z IP-Cores, which have no Tornado reconfiguration control. All of them use a standard interface to connect to the on-chip bus. The bus topology is constrained only by the bus specification used; a Shared Bus topology has been selected for the representation.

The TC-Cores can include a software section, implemented on a tiny soft processor with its software code embedded in dedicated RAM memory of the FPGA. The integration of tiny processors into cores offers a very flexible tool to the designer. State machines or control loops can be efficiently implemented using small embedded RISC microprocessors [25,26] to build complex cores. Although these processors can be used for data processing [27,28], they are more likely to be employed in control sections of the core [19]. The dominant implementation factor of these small processors is size. We call the cores built with an embedded small CPU and additional hardware Mixed Cores. Fig. 2 shows an abstraction of a Mixed Core that includes a small microprocessor with its software embedded in the FPGA dedicated RAM, custom hardware for the specific application, and an interface to link the core to the on-chip bus. A standard specification for IP-Core interconnection [29] is used for the interface. The TC-Cores are completed with a slave Tornado InterFace (TIF) used to handshake the reconfiguration with the reconfiguration controller (detailed in Section 2.3).

Fig. 1. Tornado infrastructure. (a) Generalized architecture of a CSoPC with TC-Cores. The allowable reconfiguration request sources are highlighted. (b) Tornado interfaces for reconfiguration control.

A dynamically reconfigurable system requires extra computation to set the context for each reconfigurable module and apply it. This processing is called "metacomputation" [30]. For example, in pattern matching applications, the "metacomputation" would include, for each new pattern: the computation necessary to identify or receive the new pattern, the generation of a new bitstream for the circuit adapted to the new pattern, and the loading of this bitstream into the FPGA configuration memory. The Tornado approach follows the natural architecture of core-based designs, where each core is in charge of independent tasks. These cores are able to write a configuration word, which specifies which module is to be reconfigured and with which context, to the reconfiguration controller (Tornado Basic Controller, TBC) through the standard on-chip bus (see Fig. 1b).

Fig. 2. Mixed Core Tornado Compatible with multi-context software and partially modifiable.

The reconfiguration controller manages the requests and the application of the partial reconfiguration bitstreams. The reconfiguration requests to the controller can come from either TC-Cores or, in general, from IP-Cores, including the powerful hard or soft microprocessors that may be embedded in the platform. Distributing the "metacomputation" among different cores reduces the complexity of the reconfiguration controller.

2.2. Supported reconfiguration modes

Sidhu and Prasanna defined in [30] two basic dynamic reconfiguration types:

• Intra-task reconfiguration: This reconfiguration approach applies small changes to circuits that are specifically designed to take advantage of partial reconfiguration, like the multipliers presented in [31].

• Inter-task reconfiguration: In this mode of reconfiguration, the whole core is removed. The functionality of the new one can be completely different.

The design flows that are being developed to support dynamically reconfigurable designs are very dependent on the underlying technology. In the case of Xilinx technology there are basically two design flows: JBits, based on the Java language [32], which handles intra-task reconfiguration well due to its flexibility in FPGA bitstream manipulation; and the Xilinx Design Flows for partial reconfiguration described in [33]. There are two variants of the Xilinx Design Flow: one for intra-task reconfiguration, and another one, called "Module Based Design Flow", that is used for inter-task reconfiguration. This last design flow solves the problem of routing between the dynamic and static parts using pass-through logic called Bus-Macro. This module can be constrained to fixed locations.

The main contributions of Tornado to self-reconfiguration control are the following:

• It allows for intra-task and inter-task reconfigurations. For this last reconfiguration mode, a specific Tornado compatible Bus-Macro [34] is provided to support the signals used for reconfiguration control (see Section 2.3).

• It provides a handshake between the TC-Cores and the reconfiguration controller to perform safe partial dynamic reconfigurations. The need for a reconfiguration control handshake has been defined theoretically in the literature [30,33], but in this work a specific control system for IP-Core based designs is proposed.

• Intra-task reconfiguration support for cores with embedded tiny processors, allowing different software contexts interchangeable by reconfiguration. The time and power overhead that the reconfiguration process generates [5,35] depends on the bitstream size. This size is drastically reduced when intra-task reconfiguration is used, because there are few changes to the circuit. Applying intra-task reconfiguration, two aspects of the TC-Cores with an embedded tiny processor are enhanced:

– Software context interchange: The size of the program memory for the tiny processors embedded in the core is strongly constrained by the size of the dedicated RAM of the FPGA. Applying partial reconfiguration to cores with embedded microprocessors allows the addition of multiple software contexts that enhance the FPGA use in applications where the Mixed Core architecture fits (control, protocol processing, half-duplex transceivers, etc.). To support it, the embedded tiny processor has been provided with a slave Tornado InterFace (TIF). The control through this interface is needed to prevent signal contentions or the loss of program control. These features have been included in a new soft processor called Multicontext Tiny Microprocessor (MTM). It is capable of managing multi-context small software units with the Tornado reconfiguration control signals.

– Hardware context modification: Intra-task reconfiguration to modify a specific section of a circuit has been integrated successfully into industrial-quality cores [2,36,37]. Nevertheless, this hardware reconfiguration is restricted to changing the contents of area-located elements of the FPGA, obtaining a slightly different core instead of a new one.

2.3. TC-Core reconfiguration control and synchronization

Fig. 1a and b show the block diagrams of a generalized CSoPC with Tornado. Basically, the system operation is summarized as follows: when no reconfiguration is requested, the CSoPC runs without any interference. All the IP-Cores and TC-Cores run in parallel, using the on-chip buses to share data. To perform a reconfiguration of a given TC-Core, a Configuration Request Word (CRW), which includes the TC-Core number and the context, is written to the reconfiguration controller (TBC) through the standard on-chip bus by any module with an on-chip bus master interface. This information is queued and processed by the TBC, which starts the handshake sequence with the target TC-Core.
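As a rough illustration of the request path described above, the following Python sketch packs a TC-Core number and a context number into a single request word and queues it in arrival order. The field widths (4 bits each), the function and class names, and the first-in first-out policy are assumptions made for this example; the text does not specify the CRW bit layout or the queue discipline of the TBC.

```python
from collections import deque

# Hypothetical field widths: the CRW bit layout is not given in the text.
CORE_BITS = 4
CONTEXT_BITS = 4


def pack_crw(core, context):
    """Build a request word from a TC-Core number and a context number."""
    if not 0 <= core < (1 << CORE_BITS):
        raise ValueError("core number out of range")
    if not 0 <= context < (1 << CONTEXT_BITS):
        raise ValueError("context number out of range")
    return (core << CONTEXT_BITS) | context


def unpack_crw(word):
    """Split a request word back into (core, context)."""
    return word >> CONTEXT_BITS, word & ((1 << CONTEXT_BITS) - 1)


class RequestQueue:
    """Models the controller-side queue; FIFO order is an assumption."""

    def __init__(self):
        self._pending = deque()

    def write(self, word):
        # Any module with a bus master interface may write a request word.
        self._pending.append(word)

    def next_request(self):
        # Returns (core, context) of the oldest request, or None if empty.
        if not self._pending:
            return None
        return unpack_crw(self._pending.popleft())


queue = RequestQueue()
queue.write(pack_crw(core=1, context=2))
queue.write(pack_crw(core=0, context=3))
print(queue.next_request())  # (1, 2)
```

Keeping the word format in two small helpers means that a different layout only changes the two constants, not the queue logic.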

In order to apply safe reconfiguration, the TC-Cores and the tiny MTM support the following features:

• Reconfiguration enable/disable: Each TC-Core can temporarily disable its own reconfiguration. If it embeds a tiny processor, that operation may be done by software using specific instructions (RECONF ENABLE/DISABLE). When the controller has to apply a requested reconfiguration to a target TC-Core, it sets the STB_REQ_RECONF(i) signal (one for each TC-Core) (Fig. 1b). If the reconfiguration is allowed, the TC-Core asserts the REQ_ACK signal. During the reconfiguration the embedded processor, if present, is frozen. That is, it does not attend its interfaces and the internal modules are stopped (the Program Counter and the access to the Program Memory are locked). This signal handshake can also be used to disconnect the whole core from the on-chip bus during an inter-task reconfiguration operation by disabling the tri-state buffers of the Bus-Macro. In this case the whole core is removed except for the Bus-Macro attached to the on-chip bus interface (see Section 4).

• Program Counter and Stack Pointer control: For each TC-Core, the embedded processor's Program Counter and Stack Pointer values must be controlled during and after the partial reconfiguration, because the new configuration may include a software context change. The control options are specified for each TC-Core using two reconfiguration directives included in the assembler. The Program Counter Reset (PCR) and Stack Pointer Reset (SPR) signals define the state of the software execution after the reconfiguration. If asserted, the Program Counter and the Stack Pointer are reset after the partial reconfiguration process.
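The two features above can be sketched as a small behavioural model. This is a minimal Python sketch, not the VHDL of the TC-Core: the class and method names are hypothetical, the reset value of 0 for the Program Counter and Stack Pointer is an assumption, and the bitstream load itself is reduced to a comment.

```python
from dataclasses import dataclass


@dataclass
class TCCore:
    """Behavioural model of a TC-Core with an embedded tiny processor."""
    reconf_enabled: bool = True   # toggled by RECONF ENABLE/DISABLE
    frozen: bool = False          # processor halted during reconfiguration
    context: int = 0
    pc: int = 0x2A                # arbitrary non-zero values for the demo
    sp: int = 0x07
    pcr: bool = True              # Program Counter Reset directive
    spr: bool = False             # Stack Pointer Reset directive

    def stb_req_reconf(self):
        """Controller strobes the request; the return value models REQ_ACK."""
        if not self.reconf_enabled:
            return False          # critical time slot: request is refused
        self.frozen = True        # PC and program memory access are locked
        return True

    def reconfiguration_done(self, new_context):
        """Called once the partial bitstream has been written."""
        self.context = new_context
        if self.pcr:
            self.pc = 0           # assumed reset value
        if self.spr:
            self.sp = 0           # assumed reset value
        self.frozen = False


def try_reconfigure(core, new_context):
    """One handshake attempt; returns True if the context was applied."""
    if not core.stb_req_reconf():
        return False              # the controller may retry later
    # The partial bitstream would be loaded here while the core is frozen.
    core.reconfiguration_done(new_context)
    return True


core = TCCore()
core.reconf_enabled = False
print(try_reconfigure(core, 1))        # False
core.reconf_enabled = True
print(try_reconfigure(core, 1))        # True
print(core.context, core.pc, core.sp)  # 1 0 7
```

With PCR set and SPR cleared, the new software context starts from the beginning of the program while the stack pointer keeps its previous value, which is the kind of per-core choice the two directives express.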


3. AMT platform

One of the projects that we have developed to validate and test the Tornado approach is a multi-channel digital HF transmitter with Direct Digital Synthesis (DDS) capability, called "Adaptive Multiband Transmitter" (AMT). We have used a Wishbone compatible reconfiguration controller (TBC) and MTM tiny processors, both described in VHDL and customized for Xilinx devices. The assembler and the automation scripts use the Xilinx tools to generate the bitstreams (global and partial) following the design flows summarized in [33].

The architecture of a generic AMT CSoPC is represented in Fig. 3a. It has been built with the following cores:

• DDS-TXi: These cores generate the high-frequency modulated signal. They have a TC-Core structure, which is represented in Fig. 3b. Each DDS-TX core embeds an MTM processor and a hardware 15-stage pipelined Cordic [38] with a loadable phase accumulator. There is a DDS-TX core for each channel. The Cordic is in charge of the DDS synthesis. The transmission data are received through the USB module using a specific protocol. The channel, with its associated DDS-TX, is identified in the message header. Thus, the USB module strobes the target DDS-TX, which reads the data from the on-chip bus slave interface of the USB module. The transmission modulation is generated as follows: the internal MTM processor writes the proper values to the phase accumulator. The output angle values of the phase accumulator are written to the Cordic circuit. The sine and cosine Cordic outputs are the HF modulation. Also, the internal MTM is in charge of some auxiliary tasks and of controlling the internal FIFO and the on-chip bus interface.

• RSSI-LD: This TC-Core is in charge of reading the Radio Signal Strength Indicator (RSSI) information given by the receiver in order to know the link quality. The software of the MTM defines the internal thresholds that set the most suitable modulation type in each case. The RSSI-LD embeds a sigma-delta ADC HDL circuit that is completed with an external comparator [39].

• IP-Cores: As presented in this paper, the proposed reconfiguration control system allows the use of IP-Cores. In this application, a USB interface-FIFO core that links with an FT245UM chip (IP-Core1) has been included. The IP-Core0 is a text-mode Video Wishbone compatible slave core that drives a VGA monitor. It synthesizes the VGA signals directly and gives the master cores an easy way to represent their internal state (current modulation type or other issues).

• TBC: The reconfiguration controller. It receives, stacks and applies the Reconfiguration Request Words. In this application the reconfiguration requests can be written by the following sources:

– The DDS-TX modules can write a reconfiguration word through the on-chip bus to be self-reconfigured. In the same way, commands to force a change of modulation mode or communication media type can be included in the information sent to each DDS-TX by the host connected to the SoPC through the USB interface.

– The RSSI-LD can write a reconfiguration word to the reconfiguration controller to change the modulation type of either of the transmitter modules when a change in the quality of the channel is detected.
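The signal path of a DDS-TX core (phase accumulator feeding a Cordic that outputs sine and cosine) can be illustrated with a floating-point model. The sketch below is an assumption-laden software analogue, not the pipelined hardware: it uses a generic rotation-mode CORDIC with 15 iterations to mirror the 15 stages mentioned above, a hypothetical 16-bit accumulator width, and quadrant folding chosen for this example. The tuning relation step = f_out / f_clk × 2^bits is the standard one for DDS.

```python
import math

STAGES = 15  # matches the pipeline depth quoted in the text
ATAN_TABLE = [math.atan(2.0 ** -i) for i in range(STAGES)]
GAIN = 1.0
for i in range(STAGES):
    GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))  # pre-compensates CORDIC growth


def cordic_sin_cos(angle):
    """Rotation-mode CORDIC; returns (sin, cos) of an angle in radians."""
    # Fold the angle into [-pi, pi), then into [-pi/2, pi/2] with a sign flip,
    # because the basic iteration only converges for roughly +/-99.9 degrees.
    angle = (angle + math.pi) % (2.0 * math.pi) - math.pi
    sign = 1.0
    if angle > math.pi / 2:
        angle -= math.pi
        sign = -1.0
    elif angle < -math.pi / 2:
        angle += math.pi
        sign = -1.0
    x, y, z = GAIN, 0.0, angle
    for i in range(STAGES):
        d = 1.0 if z >= 0.0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ATAN_TABLE[i]
    return sign * y, sign * x


class PhaseAccumulator:
    """Loadable phase accumulator; the bit width is a hypothetical choice."""

    def __init__(self, bits=16):
        self.bits = bits
        self.mask = (1 << bits) - 1
        self.value = 0
        self.step = 0

    def load(self, f_out, f_clk):
        # Standard DDS tuning word: f_out = step * f_clk / 2**bits.
        self.step = round(f_out / f_clk * (1 << self.bits)) & self.mask

    def tick(self):
        # Return the current phase as an angle, then advance by one step.
        angle = 2.0 * math.pi * self.value / (1 << self.bits)
        self.value = (self.value + self.step) & self.mask
        return angle


acc = PhaseAccumulator(bits=16)
acc.load(f_out=1.0, f_clk=8.0)  # one output period every 8 clock ticks
samples = [cordic_sin_cos(acc.tick())[0] for _ in range(8)]
print([round(s, 3) for s in samples])
```

In this model, changing the modulation amounts to changing what the processor writes into the accumulator (for example, switching the step for FSK), which is the role the text assigns to the MTM software.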

Table 2 shows the experimental results of the implementation of a 2-channel AMT design in a Spartan II XC2S150 FPGA. Three modulation types for each DDS-TXi core (FSK, OOK and PSK-31) are selected for the AMT. Each one needs different software for the embedded MTM tiny processors and slightly different hardware (small changes in the phase accumulator and in the maximum and minimum angle boundaries in the Cordic). Since each modulation type needs a different program stored in the FPGA BlockRAMs and a hardware modification, a static SoPC would require 3 cores for each transmitter. If a static SoPC design with one modulation type (base design) is compared with the CSoPC with Tornado capabilities, the area overhead generated by the addition of Tornado is only 141 slices and 1 BlockRAM. A SoPC without reconfiguration and with the three modulation types requires more than 2.5 times the area and reduces the maximum running speed by about 30% compared with the CSoPC. In this evaluation, the reconfigurable approach has also been compared with a non-reconfigurable implementation of a two-channel AMT. This implementation offers the same features, three modulations for each channel, and does not have the Tornado infrastructure. The area overhead added to support the reconfiguration (controller, signals, etc.) is basically constant, so the benefit obtained increases with the number of transmitters running in parallel.

Fig. 3. N-channel AMT platform. (a) AMT platform blocks, (b) DDS-TXi core: Internal partition.

Table 2
Implementation results for the AMT (Tool: Xilinx XST-ISE)

Resources | Base design | With Tornado reconfiguration | Without reconfiguration
4 input LUTs | 2251 (65%) | 2475 (70%) | 7073 (204%)
Virtex Slices | 1540 (89%) | 1681 (97%) | 4290 (248%)
4 K BlockRam | 8 (66%) | 9 (75%) | 13 (108%)
Max. running speed (a) | 118 MHz | 115.4 MHz | 81.74 MHz

The % represents the device occupation.
(a) This frequency is the SoPC global clock ×2 (doubled by internal DLL). The ×2 domain clock is used in the Cordic modules that generate the HF DDS signal.
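The overhead figures quoted in the text follow directly from the slice, BlockRAM and frequency rows of Table 2. The short Python check below reproduces them; the dictionary names are chosen for this example only.

```python
# Figures copied from Table 2 (2-channel AMT on a Spartan II XC2S150).
base = {"slices": 1540, "bram": 8, "fmax_mhz": 118.0}
tornado = {"slices": 1681, "bram": 9, "fmax_mhz": 115.4}
static3 = {"slices": 4290, "bram": 13, "fmax_mhz": 81.74}

# Cost of adding the Tornado infrastructure to the base design.
slice_overhead = tornado["slices"] - base["slices"]
bram_overhead = tornado["bram"] - base["bram"]

# Static three-modulation SoPC compared with the reconfigurable CSoPC.
area_ratio = static3["slices"] / tornado["slices"]
speed_loss = 1.0 - static3["fmax_mhz"] / tornado["fmax_mhz"]

print(slice_overhead, bram_overhead)  # 141 1
print(round(area_ratio, 2))           # 2.55
print(round(100 * speed_loss, 1))     # 29.2
```

The 29.2% result is what the text rounds to "about 30%", and the 2.55 ratio is the "more than 2.5 times" area figure.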

In order to compare the evolution of the circuit complexity for both alternatives, SoPC and CSoPC, 20 implementations with different numbers of channels and modulations have been obtained. In Fig. 4a the maximum clock frequencies for 5 SoPC and 5 CSoPC implementations with 1 channel and 1–5 modulations, respectively, are compared. As can be noticed, for AMT systems with one channel, the self-reconfigurable alternative offers a higher running frequency (less circuit complexity) only if more than 4 modulations are used. Fig. 4b shows the comparison of 5 SoPC and 5 CSoPC implementations with different numbers of channels (1 to 5) and 3 modulation types. In this case, for more than 3 simultaneous channels the CSoPC offers higher maximum running frequencies. In this comparison, both the SoPC and the CSoPC maximum running frequencies change when more channels are included because in both approaches the number of modules increases.

Fig. 4. Circuit global clock maximum running frequency evolution. (a) Maximum running frequency for 1 channel AMT platforms depending on the number of modulations supported. (b) Maximum running frequency for 3 modulation supported AMT platforms depending on the number of channels.

The time overhead is characterized using the following parameters: Tpr (time to load the partial bitstream) and Tap (average time to apply the requested reconfiguration).

Tpr is a function of the speed of the reconfiguration port and the length of the partial bitstream. In the prototype board, a 50 MHz clock drives the 8-bit SelectMap reconfiguration port (it supports a data rate of up to 400 Mbit/s), and the size of the partial bitstreams in this application is about 2 Kbytes (intra-task).

To estimate Tap, two time segments are considered. The first, from the moment the controller receives the reconfiguration word until it tries to apply it (assuming an empty queue), is close to 1.8 µs in this prototype. The second time segment includes the delay if more reconfigurations are waiting in the queue and the delay if the reconfiguration is temporarily rejected. The minimum time required for each reconfiguration, assuming that there are no retries, that one reconfiguration word is waiting in the queue, and a Tpr of 162 µs, is close to 328 µs.

Because a reconfiguration request may be delayed or queued behind others, the final reconfiguration time cannot be known in advance. This uncertainty is a notable drawback of Tornado. The time overhead must be analyzed for each application, customizing the scheduling policy of the reconfiguration controller if necessary. The architecture of the proposed reconfiguration controller (TBC), which has a software section, eases that customization.
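One simple reading of the timing figures above is that queued requests are served one at a time, each costing the controller latency plus Tpr. The Python sketch below encodes that assumption; the serial-service model and the function names are hypothetical and are not stated in the text, but with one request ahead in the queue the model lands close to the quoted 328 µs. It also checks the quoted port bandwidth.

```python
DISPATCH_US = 1.8   # controller latency with an empty queue (from the text)
TPR_US = 162.0      # partial bitstream load time (from the text)


def port_bandwidth_mbit_s(width_bits, clock_mhz):
    """Peak rate of a parallel port moving one word per clock cycle."""
    return width_bits * clock_mhz


def min_reconfiguration_time_us(requests_ahead):
    """Assumed model: serial service, no retries, equal cost per request."""
    return (requests_ahead + 1) * (DISPATCH_US + TPR_US)


print(port_bandwidth_mbit_s(8, 50))              # 400
print(round(min_reconfiguration_time_us(1), 1))  # 327.6
```

Under the same model, each additional queued request or each retry adds further delay, which is the source of the uncertainty discussed above.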

4. IP-Cores "on demand" platform

The AMT CSoPC platform uses intra-task reconfiguration. This reconfiguration involves small modifications and does not imply FPGA routing changes. When a whole module has to be replaced, that is, in inter-task reconfiguration, the specific characteristics of the reconfigurable technology that will be used must be taken into account. For Xilinx devices, when this reconfiguration modality is used, partial bitstreams are quite big (they contain the logic and routing information of all the involved vertical FPGA configuration frames), the routing between the static part and the dynamic part must be ensured using special pre-routed circuits (Bus-Macros), and the “Module Based Design Flow” must be followed. Tornado faces inter-task reconfiguration with a specific Bus-Macro designed to fulfill both the Tornado control protocol and the Xilinx-specific routing requirements.

Fig. 5. Tornado Bus-Macro. (a) Block diagram, (b) placed and routed. Blocks labeled in the diagram: TBM, Tornado interface, Tornado control logic, Wishbone slave IF (static section), Wishbone master IF (dynamic section), I/O signals (static section), I/O signals (dynamic section).

Fig. 5a shows the block diagram of the Tornado compatible Bus-Macro. This pre-routed and relocatable module has two on-chip Wishbone interfaces. The slave one, on the left, links the Bus-Macro with the on-chip bus of the CSoPC static part. The master Wishbone interface, on the right, is used to connect dynamically interchangeable Wishbone compatible IP-Cores, wrapping them. The Bus-Macro also has a Tornado slave interface on the left that manages the reconfiguration handshake with the reconfiguration controller implemented in the static section of the design. Inside the Bus-Macro there is a small Finite State Machine (FSM) built with two slices. This FSM is in charge of controlling the handshake signals of the Tornado slave interface. To complete the Bus-Macro, some pre-routed signals have been included to connect I/O ports located in the dynamic section to signals generated in the static one. Fig. 5b represents the pre-routed Bus-Macro module. To fix the defined routes, tri-state buffers have been used following the Xilinx directives given for Spartan and Virtex devices.

Due to the complexity of inter-task reconfiguration and the lack of robustness of the available design tools for this design flow, the experimental inter-task CSoPC that we have designed is focused more on the validation of the Tornado approach for inter-task reconfiguration than on a practical application.
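To make the role of the Bus-Macro FSM more concrete, the following Python model sketches one possible handshake in which the dynamic IP-Core is disconnected from the bus while it is being reconfigured. The state names, the `request`/`done` events and the `bus_enabled` flag are hypothetical and chosen only for illustration; the text does not specify the actual signals of the Tornado slave interface.

```python
from enum import Enum


class State(Enum):
    CONNECTED = "connected"          # dynamic core attached to the Wishbone bus
    RECONFIGURING = "reconfiguring"  # core isolated while frames are rewritten


class BusMacroFsm:
    """Toy two-state model of a Bus-Macro handshake (all names hypothetical)."""

    def __init__(self):
        self.state = State.CONNECTED

    @property
    def bus_enabled(self):
        # The master Wishbone interface is only active when the core is connected.
        return self.state is State.CONNECTED

    def request(self):
        """Controller asks to reconfigure; returns True if the request is granted."""
        if self.state is State.CONNECTED:
            self.state = State.RECONFIGURING
            return True
        return False  # already reconfiguring: the controller must retry later

    def done(self):
        """Controller signals that the partial bitstream has been loaded."""
        if self.state is not State.RECONFIGURING:
            raise RuntimeError("done() received without a pending reconfiguration")
        self.state = State.CONNECTED


if __name__ == "__main__":
    fsm = BusMacroFsm()
    assert fsm.request() and not fsm.bus_enabled
    fsm.done()
    assert fsm.bus_enabled
```

A real implementation would presumably also wait for any ongoing bus cycle to finish before granting the request; that detail is left out of this sketch.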

Fig. 6a shows the block diagram of the designed CSoPC. The dynamic IP-Cores are placed on the right (one IP-Core at a time). This implies that all the configuration frames to the right of the Bus-Macro are rewritten during the reconfiguration stage. The remaining circuits, to the left of the Bus-Macro, form the static section (see Fig. 6b). The dynamic IP-Core is linked to the on-chip bus through the Bus-Macro. The remaining cores are the following ones:

• TBC: The reconfiguration controller.

• MTM-UART: Mixed Core with a high speed UART and an MTM tiny processor. An external host can write and read any data on the CSoPC memory map through this module using a command set that the tiny processor interprets. Using this module, the Configuration Request Words, which set which dynamic IP-Core should be loaded, are written from the host to the TBC.

• WB-VGA: Text-mode video controller IP-Core. It is used for debugging purposes.

Fig. 6. IP-Cores “on demand” CSoPC. (a) Block diagram, (b) static and dynamic sections boundaries defined using Xilinx coordinates. Labels recovered from the figure: static section R1C1:R24C20 (rectangle), dynamic section R1C21:R24C36 (rectangle), Tornado Bus-Macro R4C19 (origin), FPGA.

Fig. 7. Static section modules placed, routed and linked to the Bus-Macro.

Fig. 8. WB_ADC IP-Core routed and linked to the Bus-Macro in the dynamic section.

Figs. 7 and 8 show, respectively, the static CSoPC section and a dynamic IP-Core (a Wishbone ADC controller IP-Core) routed and linked to the Bus-Macro in a Spartan-II FPGA. Each dynamic IP-Core involves a partial bitstream whose size is proportional to the area assigned to the dynamic section. In this design, it is about 30% of the FPGA logic matrix, which gives partial bitstreams of 38,804 bytes. With bigger devices this size increases significantly, which could involve time penalties that are not acceptable for some applications.
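Since the partial bitstream size is stated to be proportional to the dynamic area, the figures above can be used for a back-of-the-envelope estimate. The sketch below scales the reported data point (about 30% of the logic matrix, 38,804 bytes) linearly and converts a size into a load time. The helper names, the `device_scale` parameter and the configuration throughput used in the example are assumptions made for illustration; they are not values reported in the text.

```python
REF_FRACTION = 0.30   # dynamic area reported in the text (about 30%)
REF_BYTES = 38_804    # partial bitstream size reported for that area


def estimate_partial_bitstream_bytes(area_fraction, device_scale=1.0):
    """Linear estimate of the partial bitstream size.

    area_fraction: share of the logic matrix assigned to the dynamic section.
    device_scale:  hypothetical ratio between the configuration size of the
                   target device and that of the device used in the text.
    """
    if not 0.0 < area_fraction <= 1.0:
        raise ValueError("area_fraction must be in (0, 1]")
    return round(REF_BYTES * (area_fraction / REF_FRACTION) * device_scale)


def load_time_us(size_bytes, bytes_per_us):
    """Time to shift in a bitstream at an assumed constant throughput."""
    if bytes_per_us <= 0:
        raise ValueError("bytes_per_us must be positive")
    return size_bytes / bytes_per_us


if __name__ == "__main__":
    size = estimate_partial_bitstream_bytes(0.30)
    print(size)  # 38804
    # 50 bytes/µs is an assumed throughput, used only as an example.
    print(load_time_us(size, bytes_per_us=50.0))
```

The linear model ignores any fixed header or command overhead in the bitstream, so it should be read as an order-of-magnitude guide only.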

5. Conclusions

In this work we have presented a self-reconfiguration control system focused on multiprocessor core-based systems implemented on FPGA technology. An infrastructure of signals, protocols, interfaces and a controller is presented to perform safe hardware/software reconfigurations. Table 3 summarizes the Tornado features. Compared with other control approaches, it allows controlled inter-task reconfiguration and Mixed Core intra-task reconfiguration. It adds a “light” infrastructure to exploit reconfiguration technology. The use of reconfiguration controlled with Tornado yields a significant area saving. The simplification obtained, both for the cores and for the system in general, allows higher clock rates for the same reconfigurable device. The use of multi-context Mixed Cores enables new solutions in emerging areas such as Software Defined Radio.

The Tornado approach distributes the computation among the different master cores of the system, so the reconfiguration controller is simple. Compared with other self-reconfiguration approaches (Table 1), which use a general purpose microprocessor as reconfiguration controller, the cost of the Tornado infrastructure (the additional elements needed for the reconfiguration control) is very low.

To demonstrate the applicability of this approach, two projects, one using intra-task reconfiguration and the other using inter-task reconfiguration, have been presented. In spite of the generality of the framework, the experimental results show the viability of the approach. However, the experimental work has shown that the currently available design tools for partially reconfigurable systems design only ensure intra-task reconfiguration. Future work will focus on the application of Tornado to more complex systems, on the design of more flexible reconfiguration controllers that support dynamic partial bitstream generation, and on the enhancement of the design flow by applying promising new design tools, like Xilinx PlanAhead, that support dynamic partial reconfiguration.

Table 3
Tornado main features

Feature                                          Tornado
Standard Bus specification                       Yes
Reconfiguration controller                       Specific (TBC)
Intra-task reconfiguration support               Yes
Inter-task reconfiguration support               Yes
Mixed core software reconfiguration support      Yes
Reconfiguration control handshake                Yes
Core disconnection during its reconfiguration    Yes

References

[1] Edson L. Horta, John W. Lockwood, David E. Taylor, David Parlour, Dynamic hardware plugins in an FPGA with partial run-time reconfiguration, in: Proceedings of the Design Automation Conference (DAC’02), New Orleans, LA, June 2002, pp. 343–348.

[2] K. Danne, C. Bobda, H. Kalte, Run-time exchange of mechatronic controllers using partial hardware reconfiguration, Lecture Notes in Computer Science 2778 (2003) 272–281.

[3] G. Martin, H. Chang (Eds.), Winning the SoC Revolution: Experiences in Real Design, Kluwer Academic Publishers, Massachusetts, USA, 2003.

[4] R.A. Bergamaschi, S. Bhattacharya, R. Wagner, C. Fellenz, M. Muhlada, Automating the design of SoCs using cores, IEEE Design & Test of Computers 18 (5) (2001) 32–45.

[5] K. Compton, S. Hauck, Reconfigurable computing: a survey of systems and software, ACM Computing Surveys 34 (2) (2002) 171–210.

[6] R.J. Fong, S.J. Harper, P.M. Athanas, A versatile framework for FPGA field updates: an application of partial self-reconfiguration, in: Proceedings of the 14th IEEE International Workshop on Rapid Systems Prototyping (RSP’03), June 2003, pp. 117–123.

[7] Xilinx Corp. ISE 6.1 Xilinx Libraries Guide. http://support.xilinx.com, 2003.

[8] B. Blodget, P. James-Roxby, E. Keller, S. McMillan, P. Sundararajan, A self-reconfiguring platform, Lecture Notes in Computer Science 2778 (2003) 565–574.

[9] IBM Inc. CoreConnect Spec. IBM web site: http://www.chips.ibm.com/products/coreconnect, 2003.

[10] M. Ullmann, M. Hübner, B. Grimm, J. Becker, On-demand FPGA run-time system for dynamical reconfiguration with adaptive priorities, Lecture Notes in Computer Science 3203 (2004) 454–463.

[11] M. Ullmann, M. Hübner, B. Grimm, J. Becker, An FPGA run-time system for dynamical on-demand reconfiguration, in: Proceedings of the 11th Reconfigurable Architectures Workshop (RAW/IPDPS), April 2004.

[12] L. Moller, N. Calazans, F. Moraes, E. Briao, E. Carvalho, D. Camozzato, FiPRe: an implementation model to enable self-reconfigurable applications, Lecture Notes in Computer Science 3203 (2004) 1042–1046.

[13] J.C. Palma, A.V. de Mello, L. Moller, F. Moraes, N. Calazans, Core communication interface for FPGAs, in: Proceedings of the ACM 17th South Microelectronics Seminar (SIM’02), June 2002, pp. 183–190.

[14] H. Walder, M. Platzner, Reconfigurable hardware operating systems: from design concepts to realizations, in: Proceedings of the 3rd International Conference on Engineering of Reconfigurable Systems and Architectures (ERSA), CSREA Press, 2003, pp. 284–287.

[15] H. Walder, M. Platzner, A runtime environment for reconfigurable hardware operating systems, Lecture Notes in Computer Science 3203 (2004) 831–835.

[16] H. Walder, S. Nobs, M. Platzner, XF-Board: a prototyping platform for reconfigurable hardware operating systems, in: Proceedings of the 3rd International Conference on Engineering of Reconfigurable Systems and Architectures (ERSA), CSREA Press, 2004, p. 306.

[17] U. Bidarte, A. Astarloa, A. Zuloaga, J. Jimenez, I. Martinez de Alegría, Core-based reusable architecture for slave circuits with extensive data exchange requirements, Lecture Notes in Computer Science 2778 (2003) 497–506.

[18] A. Astarloa, U. Bidarte, A. Zuloaga, A reconfigurable SoC architecture for high volume and multi-channel data transaction in industrial environments, in: Proceedings of the International IEEE Conference on Industrial Electronics, Control and Instrumentation IECON’02, November 2002, pp. 2322–2327.

[19] A. Astarloa, U. Bidarte, J. Lazaro, A. Zuloaga, J. Arias, Multiprocessor SoPC-Core for FAT volume computation, Microprocessors and Microsystems 29 (10) (2005) 421–434.

[20] A. Astarloa, U. Bidarte, A. Zuloaga, J. Arias, J. Jimenez, Run-time reconfigurable hybrid multiprocessor cores, in: Proceedings of the 2004 IEEE International Conference on Industrial Technology, December 2004.

[21] M.L. Silva, J. Canas Ferreira, Support for partial run-time reconfiguration of platform FPGAs, Journal of Systems Architecture (52) (2006) 619–639.

[22] A. Astarloa, Dynamic partial reconfiguration of multi-processor modular systems in SoPC devices, Ph.D. thesis, University of the Basque Country, July 2005.

[23] J. Becker, A. Thomas, M. Vorbach, V. Baumgarte, An industrial/academic configurable system-on-chip project (csoc): coarse-grain xpp-/leon-based architecture integration, in: Proceedings of the Design, Automation and Test in Europe Conference and Exhibition (DATE’03), September 2003, pp. 1120–1121.

[24] Silicore Corporation. Wishbone System-on-Chip (SoC) Interconnection Architecture for Portable IP Cores Revision: B.3. http://www.opencores.org, September 2002.

[25] K. Chapman, PicoBlaze 8-Bit Microcontroller for Virtex-E and Spartan II/IIE Devices, Xilinx Application Notes. http://www.xilinx.com, February 2003.

[26] G. Ferrante, CPUGEN Tutorial V1.00. http://www.opencores.org/projects/cpugen, 2003.

[27] H.V. Kampen, AES-128 for Picoblaze-I, http://www.mediatronix.com/FreeIP.htm, 2003.

[28] J. Lazaro, J. Arias, J.L. Martin, A. Astarloa, U. Bidarte, A tiny microprocessor floating point implementation of a general regression neural network, WSEAS Transactions on Computers 4 (2) (2005) 280–285.

[29] Kyeong Keol Ryu, Eung Shin, Vincent J. Mooney, A comparison of five different multiprocessor SoC bus architectures, in: Proceedings of the Euromicro Symposium on Digital Systems Design (DSD’01), September 2001, p. 202.

[30] R. Sidhu, V.K. Prasanna, Efficient metacomputation using self-reconfiguration, Lecture Notes in Computer Science 2438 (2002) 698–709.

[31] S.A. Guccione, D. Levi, The advantages of run-time reconfiguration, in: Proceedings of the Reconfigurable Technology: FPGAs for Computing and Applications (SPIE’99), The International Society for Optical Engineering, 1999, pp. 87–92.

[32] Xilinx Corp. JBits 3.0 SDK for Virtex-II. http://www.xilinx.com/labs/projects/jbits/, August 2003.

[33] Xilinx Corp. Two flows for partial reconfiguration: module based or small bit manipulations. Xilinx Application Notes, http://www.xilinx.com, May 2002.

[34] A. Astarloa, U. Bidarte, J. Jimenez, J. Arias, I. Kortabarría, Wishbone compatible Bus-Macro for inter-task partial reconfiguration, in: Proceedings of the Jornadas de Computacion Reconfigurable y Aplicaciones (JCRA’05), University of Granada, 2005, pp. 17–24.

[35] J.M.P. Cardoso, On combining temporal partitioning and sharing of functional units in compilation for reconfigurable architectures, IEEE Transactions on Computers 52 (10) (2003) 1362–1375.

[36] V. Eck, P. Kalra, R. LeBlanc, J. McManus, In-circuit partial reconfiguration of RocketIO attributes, Xilinx Application Notes. http://www.xilinx.com, January 2003.

[37] M. Dyer, C. Plessl, M. Platzner, Partially reconfigurable cores for Xilinx Virtex, Lecture Notes in Computer Science 2438 (2002) 292–301.

[38] R. Herveille, CORDIC core. http://www.opencores.org/projects.cgi/web/cordic, 2001.

[39] J. Logue, Virtex Analog to Digital Converter, Xilinx Application Notes. http://www.xilinx.com, September 1999.

Armando Astarloa received the M.S. and Ph.D. degrees in electrical engineering from the University of the Basque Country, Spain, in 1999 and 2005, respectively. From 1999 to 2001, he worked as an R&D engineer for a private company. Since 2001 he has been Assistant Lecturer in electronic technology at the Electronics and Telecommunications Department of the University of the Basque Country. His main research interests are Reconfigurable Circuits, System-on-Chip and Digital Communications Circuits.

Aitzol Zuloaga received the B.S. degree in electrical engineering and the M.S. degree in project management from the Simon Bolivar University, Venezuela, in 1985 and 1992, respectively, and the Ph.D. degree in telecommunications engineering from the University of the Basque Country, Spain, in 2001. From 1985 to 1995 he was with different R&D departments of private companies in Venezuela. In 1995, he joined the University of the Basque Country with a predoctoral grant. In 2000 he became Assistant Professor in electronic technology at the Electronics and Telecommunications Department of the University of the Basque Country. His main research interests are Image Processing, System-on-Chip, and Digital Communications Circuits.


Unai Bidarte received the M.S. and Ph.D. degrees in telecommunications engineering from the University of the Basque Country, Spain, in 1996 and 2004, respectively. Since 1999 he has been Assistant Professor in electronic technology at the Electronics and Telecommunications Department of the University of the Basque Country. His main research interests are Reconfigurable Circuits, System-on-Chip and Digital Communications Circuits.


Jose L. Martín received the M.S. and Ph.D. degrees in electrical engineering from the University of the Basque Country, Spain, in 1988 and 1992, respectively. From 1989 to 1995, he was Assistant Professor in electronic technology at the Electronics and Telecommunications Department of the University of the Basque Country. In 1995, he became Associate Professor. From 1995 to 2001 he was Head of the Electronics and Telecommunications Department. From 2001 to 2005 he was Vice-Dean of the Faculty of Engineering in Bilbao, Spain. His main research interests are Image Processing, System-on-Chip, and Digital Communications Circuits.

Jesus Lazaro received the M.S. and Ph.D. degrees in telecommunications engineering from the University of the Basque Country, Spain, in 2001 and 2005, respectively. Since 2001 he has been Assistant Professor in electronic technology at the Electronics and Telecommunications Department of the University of the Basque Country. His main research interests are Reconfigurable Circuits and System-on-Chip.


Jaime Jimenez received the M.S. degree in telecommunications engineering from the University of the Basque Country, Spain, in 1991. From 1991 to 1997, he was with the Robotiker Technological Center and the Basque Government. Since 1998 he has been Assistant Professor in electronic technology at the Electronics and Telecommunications Department of the University of the Basque Country. In 2005 he received the Ph.D. degree in telecommunications engineering. His main research interests are Design Methodologies, System-on-Chip and Digital Communications Circuits.