A ferroelectric memory-based secure dynamically programmable gate array



IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 5, MAY 2003 715

A Ferroelectric Memory-Based Secure Dynamically Programmable Gate Array

Shoichi Masui, Member, IEEE, Tsuzumi Ninomiya, Michiya Oura, Wataru Yokozeki, Kenji Mukaida, and Shoichiro Kawashima, Member, IEEE

Abstract—A nonvolatile ferroelectric memory-based eight-context dynamically programmable gate array (DPGA) enables low-cost field programmable systems by the elimination of off-chip nonvolatile memories as well as the multicontext architecture. Since read and program sequences of configuration data loading from/to the DPGA are securely protected, unauthorized users cannot access the stored configuration data. The associated configuration memory consists of an SRAM-based six-transistor and four-ferroelectric-capacitor cell. The developed configuration memory achieves access time of 4 ns, comparable to standard SRAM, which is 20 times faster than conventional ferroelectric memory; furthermore, it features a nondestructive read operation and a stable data recall scheme. The employed logic block circuit can effectively improve the available number of logic gates for the multicontext scheme with minimum area overhead. The prototype nonvolatile DPGA is fabricated in a 0.35-μm CMOS with ferroelectric memory technology, and the implementation result of the Data Encryption Standard (DES) encryption/decryption functions on this DPGA presents proper operation up to 51 MHz at 3.3 V. The nonvolatile storage of configuration memory is verified for power-supply voltage as low as 1.5 V at room temperature, which is the lowest operation voltage ever reported for PbZrTiO3 (PZT)-based ferroelectric memories.

Index Terms—Ferroelectric storage, field programmable gate arrays, programmable logic devices, security of data.

I. INTRODUCTION

THE field-programmable gate array (FPGA) is extending its market because of its lower development cost compared with the mask programmable gate array (MPGA) [1]. However, the conventional SRAM-based FPGA requires off-chip nonvolatile memory devices to store configuration data; as a result, the total device cost and the board area increase. FPGA vendors as well as users have been demanding nonvolatile memory technology to realize single-chip nonvolatile FPGAs. Floating-gate-type nonvolatile memories, such as EEPROM and flash memories, make it possible to integrate an FPGA with nonvolatile configuration data storage memory into one chip; however, their propagation delay is inferior to that of standard CMOS devices, and the number of additional masks to the standard CMOS technology required to fabricate an embedded nonvolatile memory is typically greater than six [2]. On the other hand, ferroelectric random access memory (FeRAM) technology can provide a nonvolatile storage device maintaining

Manuscript received August 2, 2002; revised November 18, 2002.

S. Masui, M. Oura, and S. Kawashima are with Fujitsu Laboratories, Ltd., Akiruno 197-0833, Japan (e-mail: [email protected]).

T. Ninomiya, W. Yokozeki, and K. Mukaida are with Fujitsu Ltd., Akiruno 197-0833, Japan.

Digital Object Identifier 10.1109/JSSC.2003.810034

Fig. 1. Cross-sectional view of 0.35-μm CMOS FeRAM device.

TABLE I COMPARISON OF NONVOLATILE MEMORY CHARACTERISTICS

perfect compatibility with standard CMOS performance and design reusability. As is shown in Fig. 1, a 0.35-μm FeRAM device features planar ferroelectric capacitors placed between standard CMOS transistors and metal interconnection layers. The ferroelectric material used is PbZrTiO3 (PZT). The number of additional masks for making the ferroelectric capacitors can be reduced to two by improvements of fabrication technology [3].

Table I compares characteristics of the conventional nonvolatile memories and FeRAM. In addition to the standard CMOS process compatibility, FeRAM outperforms others with regard to the program time, the program voltage, and the program energy. The fast program time does not have much impact when configuration data is programmed by FPGA users. However, when a number of FPGAs must be configured for a mass production purpose, the configuration cost is minimized through the configuration programming by

0018-9200/03$17.00 © 2003 IEEE


Fig. 2. Manufacturer’s configuration scheme with FeRAM-based FPGAs.

FPGA manufacturers with FeRAM-based FPGAs, as shown in Fig. 2. By the configuration programming followed by the logic testing, 1-Mb data can be programmed in 11.1 ms for 8-bit word FeRAM; however, an intolerable 65.5 s is required for 32-bit word EEPROM. In FeRAMs, since the program voltage is equivalent to the power supply of the logic circuit, on-chip high-voltage generator circuits can be eliminated to minimize the die size. The program endurance (the number of overwrite programming cycles for which devices can assure proper programming and ten-year storage) is less significant for FPGA applications because the number of configuration reprogramming opportunities is fairly small.
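The two programming times quoted above imply very different per-word program times. The following sketch backs them out, assuming that "1 Mb" means 2^20 bits and that the total time is simply the word count multiplied by a constant per-word program time. Both assumptions, and the function name, are introduced here for illustration and are not stated in the text.

```python
# Back out the per-word program time implied by the quoted totals.
# Assumptions (not from the paper): 1 Mb = 2**20 bits, and
# total time = number of words x constant per-word program time.

MEGABIT = 2**20  # bits


def per_word_time(total_seconds: float, word_bits: int,
                  capacity_bits: int = MEGABIT) -> float:
    """Return the per-word program time in seconds."""
    words = capacity_bits // word_bits
    return total_seconds / words


feram_word = per_word_time(11.1e-3, word_bits=8)   # 8-bit word FeRAM
eeprom_word = per_word_time(65.5, word_bits=32)    # 32-bit word EEPROM

print(f"FeRAM : {feram_word * 1e9:.1f} ns per 8-bit word")
print(f"EEPROM: {eeprom_word * 1e3:.2f} ms per 32-bit word")
```

Under these assumptions the totals correspond to roughly 85 ns per FeRAM word and roughly 2 ms per EEPROM word, which is the gap that makes manufacturer-side programming attractive.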

Another drawback of conventional FPGAs is that logic density (the number of logic gates per unit area) is lower than that of the MPGA. A dynamically programmable gate array (DPGA), time-multiplexed FPGA, and dynamically reconfigurable logic utilize a technique to improve the available logic density by implementing multiple banks of configuration memory and changing configuration (context) during logic operation [4]–[8]. Since the area occupied by the configuration memory is typically 10% of the entire FPGA, two sets of configuration memory can double the available logic gate density with an area overhead of 10% [9].
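The arithmetic behind this claim can be sketched with a first-order model in which each extra context adds only one more copy of the configuration memory. This is a simplification introduced here for illustration; it is not the area model of [9], and the function name and parameters are hypothetical.

```python
def multicontext_tradeoff(contexts: int, mem_fraction: float) -> tuple[float, float, float]:
    """First-order model: a single-context FPGA has area 1.0, of which
    mem_fraction is configuration memory. Each additional context adds one
    more copy of that memory and one more set of usable logic gates.

    Returns (relative_gates, relative_area, gates_per_area).
    """
    if contexts < 1 or not 0.0 <= mem_fraction <= 1.0:
        raise ValueError("contexts must be >= 1 and mem_fraction in [0, 1]")
    relative_gates = float(contexts)
    relative_area = 1.0 + (contexts - 1) * mem_fraction
    return relative_gates, relative_area, relative_gates / relative_area


gates, area, density = multicontext_tradeoff(contexts=2, mem_fraction=0.10)
print(f"gates x{gates:.1f}, area x{area:.2f}, gates per unit area x{density:.2f}")
```

With two contexts and a 10% memory fraction, this toy model gives twice the available gates for 1.1 times the area. It ignores context-switching logic and routing, so it should be read as an upper bound on the benefit rather than a prediction.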

The above consideration predicts the possible cost minimization of FPGA for a mass production purpose by the combination of FeRAM and multicontext technologies. This paper explores how the FeRAM circuit is optimally implemented in the nonvolatile DPGA. In Section II, we illustrate the architecture of a prototype DPGA and describe secure configuration and context control schemes. Section III describes design and operation of the ferroelectric configuration memory. The developed nonvolatile SRAM-based memory features fast access time of less than 4 ns as well as nondestructive read operation, and both characteristics are suitable and required for the DPGA applications. Section IV shows how an employed logic block circuit can reduce the number of used logic blocks through a benchmark of Data Encryption Standard (DES) encryption/decryption operation. Section V provides physical design and simulated characteristics for the prototype DPGA. The measured performance of the DES implementation as well as the basic functional block and evaluation results of configuration memory operations including reliability stress tests are presented in Section VI. Section VII presents the conclusion and future direction for this work.

II. ARCHITECTURE

Fig. 3 presents the hierarchical structure of a prototype FeRAM-based DPGA, and features of the primary architecture are summarized in Table II. The elemental logic block is organized with a four-input lookup table (LUT), a flip-flop, and multiplexers for input selection. The subarray consists of 4 × 4 elemental logic blocks, a bounded-subarray-style local interconnection [9], and a configuration memory controller. The middle array is formed by 2 × 2 subarrays connected with a NEWS network of level-2 crossbars. Direct interconnections from one subarray to another are available between the north, east, west, and south neighbors with the NEWS network. The level-2 crossbar consists of eight 16:1 multiplexers, where the control of the crossbar selection is encoded with the multiplexer to reduce the capacity of configuration memory [10]. The entire chip is organized by symmetrical 2 × 2 middle arrays connected with level-3 crossbars. The level-3 crossbar enables selection of global interconnections across the entire length of the logic block array by sixteen 16:1 multiplexers. These global interconnections are supplied to the level-2 interconnections in the form of a higher level NEWS network, resulting in a tree structure (pyramid architecture).

Four 512-bit programmable ROMs (PROMs) are placed on each side of the symmetrical array. The PROM has the same structure as the configuration memory and is programmed simultaneously with the DPGA array configuration.

Since our target is security application by the use of embedded nonvolatile storage for keys and related parameters, we presume that the logic function assigned to each context of the DPGA does not change every clock cycle, which is in contrast with the time-multiplexed FPGA [5]. In a multiround encryption/decryption operation, logic functions can be spread over several contexts to improve logic density. The logic function for each context is less frequently changed, and intermediate results of operations are typically generated in the final step of each context. Therefore, the intermediate results of logic functions are stored in the flip-flop of the elemental logic block, and registers additionally placed in I/O circuits. This architecture is verified by the benchmarking of the DES implementation, as is described in Section IV.

The number of DPGA contexts is eight, the equivalent logic gate count is calculated as 29 K, and the system gate count becomes 68 K if 20% of the total LUTs are used as local memory [11]. The total capacity of the eight-context configuration memory is 92 kb.

Fig. 4 presents a schematic diagram of configuration data programming and configuration memory control. Configuration data for each middle array are simultaneously transferred through four-way serial paths, denoted as SIN and SOUT, and can be programmed into configuration memory without causing interference with logic functions. Fig. 5 presents the structure of the configuration memory, for the case where its outputs are connected to a LUT. The output of the configuration memory is also connected to the level-2 and level-3 crossbars and programmable I/Os. Eight-context configuration data are stored in an eight-row FeRAM array


Fig. 3. Hierarchical structure of prototype FeRAM-based DPGA.

TABLE II SUMMARY OF PRIMARY ARCHITECTURE

controlled by the corresponding wordline (WL) and plateline (PL). The plateline is an additional control line specifically used in FeRAM, and is illustrated in detail in Section III. The output buffers store configuration data corresponding to the currently operating function, and enable background configuration programming by isolating bitline signals from its outputs. The configuration data are supplied from the shift register, and are programmed through the write amplifier by the control of write enable signal (WE) and the PL. It should be noted that no sense amplifiers are used in the configuration memory in order to reduce the memory area.

Fig. 4. Schematic diagram of configuration data programming and configuration memory control.

Fig. 5. Structure and control of configuration memory.

As is shown in Fig. 4, context is controlled by the 3-bit command CMD and the 3-bit context identifier CIN, which are generated from off-chip signals CMDEXT and CIDEXT or internally generated signals CIDCHG and CIDINT. These internal signals are generated from the logic block array implemented as a context controller (typically, a counter representing the number of clock cycles assigned for each context). CMD specifies one of the fundamental configuration memory operations, which consist of configuration data transfer from SIN to SOUT, CID change corresponding to the logic function change, background configuration programming, configuration data recall, power-off, and several test modes. The recall command initiates a read-out sequence from nonvolatile ferroelectric capacitors to the corresponding memory cells right after the power-on. The power-off command protects contents of configuration memory in the power-supply ramp-down period. Both the recall and power-off commands are generated according to the output signal from the power detector, PDET, as shown in Fig. 4. CMD and CIN are supplied to the command decoder of the configuration memory controller as shown in Fig. 5.

The context can be changed in one clock cycle, and the overall latency of the CID change from asserting the internal signal, CIDCHG, is six clock cycles.
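The counter-based context controller mentioned above can be pictured with a small behavioral model. The sketch below is a toy model only: the class name, the per-context cycle budget, and the round-robin context order are assumptions made for illustration, while the six-cycle latency from asserting CIDCHG and the eight contexts are taken from the text.

```python
from dataclasses import dataclass

CID_CHANGE_LATENCY = 6   # clock cycles from CIDCHG assertion (from the text)
NUM_CONTEXTS = 8         # eight-context DPGA (from the text)


@dataclass
class ContextController:
    """Hypothetical counter-based controller: after `cycles_per_context`
    clocks it asserts CIDCHG and, after the latency, moves to the next CID."""
    cycles_per_context: int
    cid: int = 0          # currently active context
    count: int = 0        # cycles spent in the current context
    pending: int = -1     # cycles left until a requested change takes effect

    def tick(self) -> bool:
        """Advance one clock; return True on the cycle CIDCHG is asserted."""
        if self.pending >= 0:
            # A change is in flight; count down the latency.
            self.pending -= 1
            if self.pending < 0:
                self.cid = (self.cid + 1) % NUM_CONTEXTS  # assumed round-robin
                self.count = 0
            return False
        self.count += 1
        if self.count == self.cycles_per_context:
            self.pending = CID_CHANGE_LATENCY - 1
            return True
        return False


ctrl = ContextController(cycles_per_context=4)
asserted_at = [n for n in range(1, 21) if ctrl.tick()]
print("CIDCHG asserted at cycles", asserted_at, "-> active CID", ctrl.cid)
```

In this model the new context becomes active six clocks after the cycle on which CIDCHG is asserted, so the per-context budget plus the latency sets the effective context period.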

The contents of nonvolatile configuration memory must be securely protected from malicious readout and overwriting. Fig. 6 represents bitstreams associated with the configuration data program and read. The input bitstream consists of the synchronizing word, command indicating program or read, security ID, configuration data if any, and cyclic redundancy check (CRC) code to detect communication errors. The total 1-kb security ID for each user is programmed in a specific region of the configuration memory after device tests. If the security ID in the bitstream does not match the one stored in the configuration memory, the program or read operation is prohibited. This security lock function is indispensable for nonvolatile field programmable devices, and the security of configuration data protection can be extended to the encryption of transferred data by the use of key and parameters stored in the nonvolatile memory. Information stored in the FeRAM-based configuration memory has enough resistance to tampering from destructive and nondestructive analyses.
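The acceptance check described above can be sketched as follows. This is a minimal illustration, not the on-chip format: the synchronizing word value, the command codes, the byte-aligned field layout, and the use of CRC-32 from `zlib` as a stand-in for the unspecified CRC are all assumptions. Only the field order and the 1-kb security ID (taken here as 1024 bits) come from the text.

```python
import hmac
import zlib

SYNC_WORD = b"\xA5\x5A"             # hypothetical synchronizing word
CMD_PROGRAM, CMD_READ = 0x01, 0x02  # hypothetical command codes
SECURITY_ID_BYTES = 128             # 1-kb security ID, assuming 1 kb = 1024 bits
CRC_BYTES = 4                       # CRC-32 stand-in; the real CRC is not specified


def build_bitstream(command: int, security_id: bytes, data: bytes = b"") -> bytes:
    """Assemble sync word, command, security ID, optional data, and CRC."""
    if len(security_id) != SECURITY_ID_BYTES:
        raise ValueError("security ID must be 128 bytes")
    body = SYNC_WORD + bytes([command]) + security_id + data
    return body + zlib.crc32(body).to_bytes(CRC_BYTES, "big")


def check_bitstream(stream: bytes, stored_id: bytes) -> tuple[int, bytes]:
    """Return (command, data) if the stream is accepted, else raise."""
    header = len(SYNC_WORD) + 1 + SECURITY_ID_BYTES
    if len(stream) < header + CRC_BYTES or not stream.startswith(SYNC_WORD):
        raise ValueError("missing synchronizing word or truncated stream")
    body, crc = stream[:-CRC_BYTES], stream[-CRC_BYTES:]
    if zlib.crc32(body).to_bytes(CRC_BYTES, "big") != crc:
        raise ValueError("CRC mismatch (communication error)")
    command = body[len(SYNC_WORD)]
    if command not in (CMD_PROGRAM, CMD_READ):
        raise ValueError("unknown command")
    security_id = body[len(SYNC_WORD) + 1:header]
    # Constant-time comparison so timing does not reveal how many bytes matched.
    if not hmac.compare_digest(security_id, stored_id):
        raise PermissionError("security ID mismatch: program/read prohibited")
    return command, body[header:]
```

The CRC is checked before the security ID so that a corrupted transfer is reported as a communication error rather than as an access violation; that ordering is a design choice of this sketch, not something the text specifies.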

Before the configuration data is programmed, undefined outputs from the logic block array might cause circuit instability; for example, unwanted CID change signals might be generated and result in large power consumption by frequent CID changes. In this DPGA, the CFGDONE signal, representing


Fig. 6. Bitstreams for configuration data programming.

Fig. 7. Conventional 2T2C FeRAM cell.

whether the program has been completed, is stored in the configuration memory so as to prevent any logic block outputs from generating unnecessary control signals before the configuration data are loaded [12].

III. FERROELECTRIC CONFIGURATION MEMORY

Current FeRAM products utilize two-transistor/two-capacitor (2T2C) memory cells, as shown in Fig. 7, to keep the memory cell size small and maintain high reliability [13]. In this memory cell, two ferroelectric capacitors FC1 and FC2 store opposite polarization states, shown by the ferroelectric hysteresis loop in Fig. 8. This curve is measured at 1-MHz sweep rate and room temperature. If FC1 stores the “0” state whose polarization results from the application of positive voltage, FC2 stores the “1” state, indicating the previous application of negative voltage. The read operation of the memory cell detects the difference between the equivalent capacitances C0 and C1. In order to create the capacitance difference, the read operation is executed by precharging the bitlines (with appropriate node capacitances) to GND and, subsequently, driving the plateline from low to high. The capacitance difference between memory cells generates the voltage difference between the bitlines, denoted as ΔV; this voltage difference is then amplified to full-rail voltage by a differential sense amplifier. The typical value of ΔV for

Fig. 8. Ferroelectric hysteresis loop of 0.35-μm FeRAM cell.

a 2T2C cell is 600 mV [14], which is much higher than that of DRAM. In consequence of this large operation margin, a smaller geometry cell organized by a one-transistor/one-capacitor (1T1C) memory cell has been developed by the use of a sophisticated reference technique [15].
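The 2T2C read mechanism can be approximated as a capacitive divider between each ferroelectric capacitor and its bitline. The sketch below treats C0 and C1 as fixed linear capacitances, which ignores the nonlinearity of the hysteresis loop. The absolute capacitor values and the bitline capacitances are placeholder assumptions, and only the ratio C1/C0 = 3.1 (quoted in this paper for a 3.0-V application) is borrowed from the text.

```python
def bitline_voltage(v_pl: float, c_fe: float, c_bl: float) -> float:
    """Voltage on a bitline precharged to GND after the plateline rises to
    v_pl, treating the ferroelectric capacitor as a linear capacitance c_fe
    in series with the bitline capacitance c_bl."""
    return v_pl * c_fe / (c_fe + c_bl)


def read_delta_v(v_pl: float, c0: float, c1: float, c_bl: float) -> float:
    """Differential bitline voltage presented to the sense amplifier."""
    return bitline_voltage(v_pl, c1, c_bl) - bitline_voltage(v_pl, c0, c_bl)


# Placeholder values for illustration only (not measured data).
C0 = 100e-15        # equivalent capacitance of the "0" state, in farads
C1 = 3.1 * C0       # "1" state, using a C1/C0 ratio of 3.1
for c_bl in (100e-15, 300e-15, 1000e-15):
    dv = read_delta_v(v_pl=3.3, c0=C0, c1=C1, c_bl=c_bl)
    print(f"C_BL = {c_bl * 1e15:6.0f} fF -> delta V = {dv * 1e3:6.1f} mV")
```

The model shows why the bitline node capacitance has to be chosen appropriately: a very small or very large bitline capacitance pushes both bitlines toward the same voltage and shrinks the difference available to the sense amplifier.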

Drawbacks of the conventional FeRAM are that the read cycle time is larger than 80 ns due to the PL delay, and the number of maximum read and program cycles are limited to 110 due to the destructive read and material wearout. As is observed in Fig. 8, the ferroelectric capacitor in the “1” state is changed to the “0” state after the PL drive, which results in the destructive read. This memory cell is suitable for smart card [16] and FPGA applications, since memory cell geometry is small and the number of read access cycles to the nonvolatile memory is limited by system requirements; on the other hand, DPGA requires read access time of less than 10 ns for the increasing clock frequency as well as the nondestructive read to accommodate the frequent CID change.

Ferroelectric nonvolatile SRAM, as shown in Fig. 9(a) [17], can realize read access time comparable to an SRAM cell, and the read operation becomes nondestructive if the voltage across the ferroelectric capacitors FC1 and FC2 does not change during read operation. This is easily accomplished by the optimization of the memory cell and peripheral circuits. The memory cell size can be the same as that of the conventional SRAM by stacking ferroelectric capacitors above cell transistors [17]. Despite all these advantages, however, we have employed the six-transistor/four-capacitor (6T4C) cell, as shown in Fig. 9(b), to improve data recall characteristics. The four capacitors are controlled by two platelines PL1 and PL2.

Fig. 9. Ferroelectric SRAM-based configuration memory cell. (a) Conventional cell. (b) Stable data recall cell.

TABLE III OPERATIONS AND ASSOCIATED CONTROL SIGNALS FOR THE FERROELECTRIC SRAM CELLS

Operations and associated control signals for both ferroelectric SRAM cells are summarized in Table III. The command decoder in the memory controller, shown in Fig. 5, generates sequential sets of operations, presented in Table III, from CMD. For example, when the CID change command is issued from the command/CID generator, the command decoder generates the read and the subsequent normal (standby) operations. CIN specifies the selected pair of wordline and plateline. Program and read operations are the same as the conventional SRAM except for the plateline control. Platelines PL, PL1, and PL2 are set at half the power-supply voltage in normal and read operations to mitigate the imprint effect of the ferroelectric material [17]. This imprint effect corresponds to the horizontal shift of the ferroelectric hysteresis loop resulting from dc voltage and temperature stresses, and causes difficulty in switching from one state to the opposite state [14].

Fig. 10. Power-supply control circuit for the employed ferroelectric SRAM cell.

The most important operation in Table III is recall, since all of the configuration data, 92 kb for this prototype DPGA, must be properly regenerated from the ferroelectric capacitors to the corresponding crosscoupled SRAM cell in one sequence. In the conventional data recall operation employed for the 6T2C cell, the power supply (PWR) is applied to the cell array while PL is set to low. On the other hand, in the employed cell, PL1 is at first driven from low to high while PL2 is kept low; then PWR is applied to the cell. Fig. 10 illustrates the power-control circuit to supply power to the employed memory cell arrays by asserting an activation signal EN, and negating its complement XEN after the PL1 drive.

Fig. 11 shows the SPICE simulation results of data recall operations for the conventional cell [S1(a) and S2(a)], and the employed cell [S1(b) and S2(b)]. Node S1 in Fig. 9 is previously programmed to the low state. To include the worst case 3-σ process variation for transistor characteristics, M1 and M4 have a fast corner model, while M2 and M3 have a slow corner model. Although the conventional cell fails to recall, the employed cell indicates a large operation margin against the transistor imbalance. The primary reason for the data recall failure in the conventional cell is that the SRAM cell latches when the voltage across the ferroelectric capacitors FC1 and FC2 is around the threshold voltage of M2 and M4 (0.5 V). The difference of equivalent capacitances for the voltage range from 0 to 0.5 V is not so large, as is depicted in Fig. 8. For example, C1/C0 is 3.1 for the applied voltage of 3.0 V, while C1/C0 is only 1.7 for the 0.5-V application. Consequently, it is difficult to create sufficient node voltage difference to overcome the process variation with the conventional nonvolatile SRAM cell.

Fig. 11. Simulation results of data recall operations.

Fig. 12. ΔV estimation from the hysteresis loop.

The ΔV for the 6T4C cell can be graphically estimated from the hysteresis loop, as illustrated in Fig. 12. If the node S1 previously stored the low state, the voltage-polarization curve is represented by the “0” state curve for FC1, and the curve of FC3 is indicated by the “1” state curve when PL1 is driven from low to high. The voltage of the intermediate node S1 between FC1 and FC3 is denoted as V_L and is determined from the cross point of the “0” and “1” state curves, as shown in the upper part of Fig. 12. The subscript displays the previously stored voltage of the node. The voltage of the other node S2, V_H, is derived in a similar way, as shown in the lower part of Fig. 12. ΔV is equal to the difference between V_H and V_L and is much larger than that of the 2T2C cell because the employed mechanism utilizes the large capacitance difference between “1” and “0” states. We deduce that the difference between the ΔV in Fig. 11 (dynamic) and Fig. 12 (static) results from high-frequency components of polarization switching, which are only included in the SPICE simulation [18].

TABLE IV COMPARISON OF CONVENTIONAL FERAM AND EMPLOYED MEMORY CELL CHARACTERISTICS

An issue of the recall operation is partial polarization destruction during the PL1 drive. As is shown in the voltage curve of node S1(b) in Fig. 11, 1.1 V is applied to FC3, which produces a small amount of switching charge, as is presented in the hysteresis loop in Fig. 12. However, since the number of recall cycles in any field programmable device is less than 110 times, this partial-destructive read operation does not cause any impact on the reliability of the ferroelectric capacitors.

The area overhead from additional capacitors FC3 and FC4 can be minimized by fabricating all ferroelectric capacitors above the SRAM transistors. The influence of the additional capacitors on device size is discussed in Section V. Another advantage gained from the ferroelectric SRAM cell is improved soft-error immunity. A remarkable reduction of soft-error rate is observed by adding capacitors to storage nodes [19], [20]. It should be noted that the additional current consumption by the leakage of ferroelectric capacitors is calculated as small as 2.3 μA for the entire 92-kb configuration memory. Table IV summarizes the characteristics of the conventional FeRAM cell and the employed 6T4C cell.

IV. LOGIC BLOCK CIRCUIT AND BENCHMARK

In this section, we propose a logic block circuit that can effectively store the intermediate computing results for the multicontext scheme. Fig. 13 shows the schematic diagram of the elemental logic block circuit. The 2:1 multiplexer selects outputs from the LUT or the flip-flop. The latched signal is supplied to the output by enabling the flip-flop in one context; on the other hand, when the unlatched signal is selected in another context, the flip-flop is disabled, and can store an intermediate result of the previous operation. Consequently, combinational logic operation by the LUT and the previous data storage by the flip-flop are executed simultaneously in one logic block for the multicontext scheme.
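The behavior described above, where one context latches a result and a later context uses the LUT combinationally while the flip-flop holds the earlier value, can be sketched with a small behavioral model. The class and parameter names are hypothetical, and timing details such as clock-to-output delay are ignored.

```python
from typing import Sequence


class ElementalLogicBlock:
    """Behavioral toy model of a four-input LUT, a flip-flop with enable,
    and a 2:1 output multiplexer. Configuration is supplied per context."""

    def __init__(self) -> None:
        self.ff = 0  # flip-flop state, shared by all contexts

    def clock(self, inputs: Sequence[int], lut: Sequence[int],
              use_latched: bool) -> int:
        """Evaluate one clock cycle under one context's configuration.

        lut         -- 16-entry truth table for this context
        use_latched -- True: flip-flop enabled, output taken from it
                       False: flip-flop disabled (holds), output from LUT
        """
        index = sum(bit << i for i, bit in enumerate(inputs))
        lut_out = lut[index]
        if use_latched:
            self.ff = lut_out      # flip-flop captures the LUT result
            return self.ff
        return lut_out             # flip-flop keeps the earlier result


AND4 = [0] * 15 + [1]                             # 4-input AND
XOR2 = [(i ^ (i >> 1)) & 1 for i in range(16)]    # XOR of inputs 0 and 1

blk = ElementalLogicBlock()
blk.clock([1, 1, 1, 1], AND4, use_latched=True)          # context A stores 1
out = blk.clock([1, 1, 0, 0], XOR2, use_latched=False)   # context B: LUT only
print("context B output:", out, "| held intermediate result:", blk.ff)
```

In the second call the block produces a fresh combinational output while the flip-flop still holds the value captured in the first context, which is the property the multicontext scheme relies on.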


Fig. 13. Elemental logic block circuit.

Fig. 14. DES implementation on the prototype DPGA.

To evaluate this circuit, the DES encryption/decryption function [21] is implemented with six contexts of the prototype DPGA. Fig. 14 presents the relationship between the internal data processing and the associated contexts. The proper implementation is validated by standard known-answer tests [22]. Fig. 15 represents the mapping result of the third context. Key and intermediate results of expansion permutation are stored in the flip-flops around the center region; moreover, the combinational logic function operating as the substitution box is implemented on the LUTs in the same region. The logic block circuit shown in Fig. 13 can improve the logic density for the multicontext scheme with minimum area overhead.

V. CHIP DESIGN

A prototype nonvolatile secure DPGA has been designed in a 0.35-μm triple-layer metal CMOS technology with 3.3-V embedded FeRAM technology [23]. Fig. 16 shows the die micrograph of the prototype DPGA. The die size is 10.4 × 10.4 mm². A total of 184 64-bit/eight-context configuration memory blocks are placed in an array, and one 64-bit configuration memory is used to configure two logic blocks.

Fig. 15. Example of DES mapping.

Fig. 16. Die micrograph of the prototype FeRAM-based DPGA.

TABLE V NOMINAL SIMULATION RESULTS ON PRIMARY CIRCUIT OPERATIONS

The numbers of configuration memory blocks assigned to logic blocks, crossbars, PROMs, and I/O are 128, 48, 4, and 4, respectively. The entire chip is designed with a 0.35-μm standard CMOS library except for the configuration memory, which is designed manually.
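The block counts above can be cross-checked against the array hierarchy of Section II and the 92-kb total capacity. The short sketch below does this arithmetic, assuming 1 kb = 1024 bits and reading the chip as 2 × 2 middle arrays of 2 × 2 subarrays of 4 × 4 logic blocks; the variable names are chosen here for illustration.

```python
# Cross-check the array and memory counts quoted in Sections II and V.
LOGIC_BLOCKS_PER_SUBARRAY = 4 * 4
SUBARRAYS_PER_MIDDLE_ARRAY = 2 * 2
MIDDLE_ARRAYS = 2 * 2
CONTEXTS = 8
BITS_PER_BLOCK = 64            # one 64-bit memory block per context
LOGIC_BLOCKS_PER_MEMORY = 2    # one 64-bit memory configures two logic blocks

memory_blocks = {"logic blocks": 128, "crossbars": 48, "PROMs": 4, "I/O": 4}

logic_blocks = (LOGIC_BLOCKS_PER_SUBARRAY
                * SUBARRAYS_PER_MIDDLE_ARRAY
                * MIDDLE_ARRAYS)
total_blocks = sum(memory_blocks.values())
total_bits = total_blocks * BITS_PER_BLOCK * CONTEXTS

print("logic blocks in the array:", logic_blocks)
print("logic blocks covered by memory:",
      memory_blocks["logic blocks"] * LOGIC_BLOCKS_PER_MEMORY)
print("memory blocks:", total_blocks)
print("total capacity:", total_bits // 1024, "kb")
```

Under these assumptions the hierarchy, the 184 memory blocks, and the 92-kb capacity are mutually consistent.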

Table V summarizes nominal performance simulation results for the primary DPGA circuit elements. The configuration memory is designed to minimize the delay from the clock rising edge to the configuration memory output. From this table, the maximum operating frequency to implement a series connection of three LUTs and three crossbars is calculated as


TABLE VI AREA RATIO OF EACH BLOCK IN THE PROTOTYPE AND OPTIMIZED DPGA

Fig. 17. Comparison of prototype cell in 0.35-μm planar cell technology and reduced cell in 0.18-μm stacked cell technology.

125 MHz. The maximum frequency of configuration memory programming is 40 MHz and is dominated by the RC delay of the plateline. The power dissipation of each 64-bit configuration memory block is simulated as 1.1 mW at 10 MHz for the CID change and 3.4 mW at 10 MHz for the background programming.

The layout of the prototype configuration memory is not optimized because of the requirement to place additional ferroelectric capacitors that protect the used capacitors from process-induced damage. Through process evaluations, we have confirmed that these capacitors can be removed to minimize the entire device area. Fig. 17 compares the layouts and sizes of the prototype 6T4C cell in a 0.35-μm planar-cell technology and the optimized structure in a 0.18-μm stacked-cell technology [2]. The optimized cell size in the 0.18-μm technology is 1/10 of the prototype cell size.

Table VI lists the area ratio of the configuration memories, control logic, and crossbars including logic blocks relative to the entire DPGA array without the PROM blocks and I/O pads. The area ratios of the configuration memory per context for the prototype and optimized designs are 6.6% and 3.5%, respectively. We deduce that this small configuration memory area is gained by the use of multiplexer-based fully encoded crossbars, as described in Section II. With the optimized cell structure and an

Fig. 18. Shmoo plot of the DES implementation at room temperature.

area model presented in [9], the logic density of the eight-context FeRAM-based nonvolatile DPGA is 3.8 times larger than that of conventional SRAM-based FPGA.

VI. EXPERIMENTAL RESULTS

The delay of the logic block and crossbar was evaluated with inverter chains of various lengths, configured as series connections of logic blocks and crossbars. The measurement at 3.3 V and room temperature shows delays of 2.3 and 2.4 ns for the logic block and crossbar, respectively. The difference between this measurement and the simulation shown in Table V arises from the interconnection delay. This measured delay is comparable to an FPGA product fabricated in a 0.35-μm technology [24]. The maximum frequency to implement a series connection of three LUTs and three crossbars becomes 62.1 MHz.

Fig. 18 shows a shmoo plot of the DES operation at room temperature. The measured maximum operating frequency is 51 MHz at 3.3 V. This value matches well with a delay simulation including back-annotated interconnection delay, which indicates 50-MHz operation under the same conditions. In addition, a simple path delay calculation of the DES implementation gives 54.1-MHz operation, since the critical path consists of a series connection of three LUTs and four crossbars. The measured power consumption of the DES operation is 282 mW at 20 MHz and room temperature. The measured standby leakage associated with ferroelectric capacitors in the entire configuration memory is less than 5 µA.
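The simple path delay calculation can be sketched as follows. The per-stage delays are the measured 2.3 ns (logic block) and 2.4 ns (crossbar). The fixed 2.0-ns overhead per clock period is not given in the text; it is an assumption chosen here because it reproduces both quoted frequencies (62.1 MHz and 54.1 MHz), and it could correspond, for example, to register or clocking overhead. Function and constant names are illustrative.

```python
LOGIC_BLOCK_DELAY_NS = 2.3  # measured delay per logic block (LUT)
CROSSBAR_DELAY_NS = 2.4     # measured delay per crossbar
FIXED_OVERHEAD_NS = 2.0     # ASSUMPTION: not stated in the text


def max_frequency_mhz(num_luts: int, num_crossbars: int) -> float:
    """Estimate the maximum clock frequency of a series path."""
    period_ns = (
        num_luts * LOGIC_BLOCK_DELAY_NS
        + num_crossbars * CROSSBAR_DELAY_NS
        + FIXED_OVERHEAD_NS
    )
    return 1000.0 / period_ns  # a 1-ns period corresponds to 1000 MHz


# Three LUTs and three crossbars (inverter-chain reference path).
print(round(max_frequency_mhz(3, 3), 1))  # 62.1
# Three LUTs and four crossbars (DES critical path).
print(round(max_frequency_mhz(3, 4), 1))  # 54.1
```

The extra crossbar on the DES critical path adds 2.4 ns to the period, which is what lowers the estimate from 62.1 MHz to 54.1 MHz.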

Fig. 19 plots the voltage dependence of program and recall operations at room temperature. The recall operation is verified for the entire 92-kb configuration memory after programming and a subsequent power-off period of 60 s. When the program and recall voltages are the same, as is usually the case, proper operation is observed as low as 1.3 V. It should be noted that a high-voltage program enables a low-voltage recall; however, a lower voltage program tends to fail, especially for higher voltage recall. Since the simulation with a ferroelectric capacitor model including the low-voltage (minor looping) effect [18] does not show the failure observed in Fig. 19, we deduce that this failure results from variation of material characteristics programmed at the low-voltage region of the hysteresis loop.

Fig. 19. Voltage dependence of program and recall operation at room temperature.

TABLE VII
MEASURED V FOR THE POWER SUPPLY VOLTAGE OF 1.5 V BEFORE AND AFTER TYPICAL RELIABILITY TEST STRESSES

Table VII presents the measured voltage values for the power-supply voltage of 1.5 V before and after typical reliability test stresses, namely, retention, imprint, and wearout. The worst-case degradation is caused by the retention stress; however, the measured voltage is still larger than 100 mV, which is the minimum voltage required to assure proper operation. Therefore, the minimum nonvolatile operation voltage of the employed configuration memory is concluded to be 1.5 V. This is the lowest operation voltage ever reported for a PZT-based ferroelectric memory.

VII. CONCLUSION AND FUTURE DIRECTION

We have explored the application of ferroelectric memory technology to low-cost field programmable devices, exploiting its nonvolatility together with the multicontext technique. A prototype nonvolatile eight-context DPGA with a secure protection function for the configuration data has been designed and evaluated in a 0.35-µm CMOS FeRAM technology. The SRAM-based 6T4C memory cell provides much faster, nondestructive read characteristics as well as more stable recall operation than conventional 1T1C/2T2C and nonvolatile (6T2C) SRAM cells. The 1.5-V nonvolatile memory characteristics are superior to those of other conventional PZT-based 1T1C/2T2C memories because of the large recall margin. Implementation of the DES encryption/decryption function shows performance comparable to that of standard CMOS technology.

The next step in the research and development of FeRAM-based field programmable devices is to apply this circuit to scaled FeRAM technologies. Although the mass production of FeRAM started from a 0.5-µm technology, the minimum feature length has been scaled by a factor of 0.7 every year from 2001 to 2003 [25], and the rate of device development has accelerated recently [3], [26]. By the use of these scaled technologies, the nonvolatile DPGA can increase its logic gate count and extend its application to the large and growing communication field.

ACKNOWLEDGMENT

The authors would like to thank Dr. H. Nishi, Dr. Y. Arimoto, T. Suzuki, and Y. Takayama for their guidance, A. Ito, Dr. T. Eshita, and Dr. M. Aoki for their numerous discussions on memory cell layout and ferroelectric memory technology, Prof. G. Gulak and Prof. A. Sheikholeslami, University of Toronto, for their helpful discussions, and the staff at the Iwate plant for fabrication of the prototype chip.

REFERENCES

[1] S. Brown, R. Francis, J. Rose, and Z. Vranesic, Field Programmable Gate Array. Norwell, MA: Kluwer, 1992.

[2] D. Buss, “Technology in the Internet age,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2002, pp. 18–21.

[3] Y. Horii, Y. Hikosaka, A. Itoh, K. Matsuura, M. Kurasawa, G. Komuro, T. Eshita, and S. Kashiwagi, “4-Mb embedded FRAM for high performance system on chip (SoC) with large switching charge, reliable retention and imprint resistance,” in IEDM Tech. Dig., 2002, pp. 539–542.

[4] M. Bolotski, A. DeHon, and T. Knight Jr., “Unifying FPGAs and SIMD arrays,” in Proc. ACM/SIGDA 2nd Int. Symp. FPGAs, 1994.

[5] S. Trimberger, D. Carberry, A. Johnson, and J. Wong, “A time-multiplexed FPGA,” in Proc. IEEE Symp. FPGAs for Custom Computing Machines, 1997, pp. 22–28.

[6] M. Motomura, Y. Aimoto, A. Shibayama, Y. Yabe, and M. Yamashina, “An embedded DRAM-FPGA chip with instantaneous logic reconfigurations,” in Symp. VLSI Circuits Dig. Tech. Papers, 1997, pp. 55–56.

[7] S. Trimberger, “Scheduling designs into a time-multiplexed FPGA,” in Proc. ACM/SIGDA 6th Int. Symp. FPGAs, 1998, pp. 153–160.

[8] T. Fujii, K. Furuta, M. Motomura, M. Nomura, M. Mizuno, K. Anjo, K. Wakabayashi, Y. Hirota, Y. Nakazawa, H. Ito, and M. Yamashina, “A dynamically reconfigurable logic engine with a multi-context/multi-mode unified cell architectures,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1999, pp. 364–365.

[9] A. DeHon, “Reconfigurable architectures for general-purpose computing,” Ph.D. dissertation, Mass. Inst. Technol., Cambridge, MA, 1996.

[10] A. DeHon, “Entropy, counting, and programmable interconnect,” in Proc. ACM/SIGDA 4th Int. Symp. FPGAs, 1996, pp. 73–79.

[11] “Gate count capacity metrics for FPGAs,” Xilinx Corporation, San Jose, CA, Appl. Note XAPP 059 v1.1, 1997.

[12] P. Chow, S.-O. Seo, J. Rose, K. Chung, G. Paez-Monzon, and I. Rahardja, “The design of a SRAM-based field programmable gate array, Part II: Circuit design and layout,” IEEE Trans. VLSI Syst., vol. 7, pp. 321–330, Mar. 1999.

[13] A. Sheikholeslami and G. Gulak, “A survey of circuit innovations in ferroelectric random-access memories,” Proc. IEEE, vol. 88, pp. 667–689, Mar. 2000.

[14] S. Kawashima, T. Endo, A. Yamamoto, K. Nakabayashi, M. Nakazawa, K. Morita, and M. Aoki, “Bitline GND sensing technique for low-voltage operation FeRAM,” IEEE J. Solid-State Circuits, vol. 37, pp. 592–598, May 2002.

[15] J. Siu, Y. Eslami, A. Sheikholeslami, G. Gulak, T. Endo, and S. Kawashima, “A 16-kb 1T1C FeRAM testchip using current-based reference scheme,” in Proc. IEEE Custom Integrated Circuits Conf., 2002, pp. 107–110.


[16] S. Masui, S. Kawashima, S. Fueki, K. Masutani, A. Inoue, T. Teramoto, and T. Suzuki, “FeRAM applications for next-generation smart card LSIs,” in Ext. Abstr. 1st Int. Meeting Ferroelectric Random Access Memories, 2001, pp. 13–14.

[17] T. Miwa, J. Yamada, H. Koike, T. Nakura, S. Kobayashi, N. Kasai, and H. Toyoshima, “A 512-kbit low-voltage NV-SRAM with the size of conventional SRAM,” in Symp. VLSI Circuits Dig. Tech. Papers, 2001, pp. 129–132.

[18] T. Tamura, Y. Arimoto, and H. Ishihara, “Ferroelectric capacitor model for circuit simulation of FeRAM,” in Ext. Abstr. 1st Int. Meeting Ferroelectric Random Access Memories, 2001, pp. 116–117.

[19] M. Matsumiya, S. Kawashima, M. Sakata, M. Ookura, T. Miyabo, T. Koga, K. Itabashi, K. Mizutani, H. Shimada, and N. Suzuki, “A 15-ns 16-Mb CMOS SRAM with interdigitated bit-line architecture,” IEEE J. Solid-State Circuits, vol. 27, pp. 1497–1503, Nov. 1992.

[20] T. Wada, S. Ohbayashi, H. Sato, K. Kozaru, Y. Okamoto, Y. Higashide, T. Shimizu, Y. Maki, R. Morimoto, H. Otoi, T. Koga, H. Honda, M. Taniguchi, Y. Arita, and T. Shiomi, “A 500-MHz pipelined burst SRAM with improved SER immunity,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 1999, pp. 196–197.

[21] B. Schneier, Applied Cryptography, 2nd ed. New York: Wiley, 1996, ch. 12.

[22] S. Keller and M. Smid, “Modes of operation validation system (MOVS): Requirements and procedures,” Nat. Inst. Standards Technol., Gaithersburg, MD, NIST Special Pub. 800-17, 1998.

[23] T. Yamazaki, “Key issues for manufacturable FeRAM devices,” in Ext. Abstr. 1st Int. Meeting Ferroelectric Random Access Memories, 2001, pp. 31–34.

[24] “XC4000XL electrical specifications,” Xilinx Corp., San Jose, CA, Product Spec. DS005 v1.8, 1999.

[25] “Process integration, devices, and structures and emerging research devices,” Int. Technol. Roadmap for Semiconductors (ITRS), 2001.

[26] T. Moise et al., “Demonstration of a 4 Mb, high density ferroelectric memory embedded within a 130 nm, 5 LM Cu/FSG logic process,” in IEDM Tech. Dig., 2002, pp. 535–538.

Shoichi Masui (M’90) was born in Nagoya, Japan, on February 14, 1960. He received the B.S. and M.S. degrees from Nagoya University, Nagoya, Japan, in 1982 and 1984, respectively.

From 1984 to 1999, he was with Nippon Steel Corporation, Sagamihara, Japan, where he was engaged in research on silicon-on-insulator device and circuit design, and subsequently, he was responsible for nonvolatile memory circuit design and its application to radio frequency identification (RFID) integrated circuits. From 1990 to 1992, he was a Visiting Scholar at Stanford University, Stanford, CA, where he researched substrate-coupling noise in mixed-signal integrated circuits. In 1999, he joined Fujitsu, Ltd., and since 2000, has been a Senior Researcher with Fujitsu Laboratories, Ltd., Akiruno, Japan, where he is engaged in the design of ferroelectric random access memory (FeRAM) for smart cards, RFIDs, and reconfigurable logic LSIs. In 2001, he was a Visiting Scholar at the University of Toronto, Toronto, ON, Canada, where he researched low-power and high-speed FeRAM design and its application to reconfigurable logic LSIs. His current research interests include reconfigurable circuit design for software-defined radio and secure hardware design.

Tsuzumi Ninomiya was born in Fukuoka, Japan, on August 13, 1966. He received the B.S. and M.S. degrees from Kyoto University, Kyoto, Japan, in 1989 and 1991, respectively.

From 1991 to 2001, he was with Oki Electric Industry Company, Tokyo, Japan, where he was engaged in research on GaAs device and circuit design. In 2001, he joined Fujitsu Ltd., Akiruno, Japan, where he is engaged in the design of reconfigurable circuits with ferroelectric random access memory (FeRAM). His current research interests include CAD tools for reconfigurable circuits.

Michiya Oura was born in Kyoto, Japan, on July 27, 1959. He received the B.E. degree from Kyoto University, Kyoto, Japan, in 1982.

In 1982, he joined Fujitsu Laboratories, Ltd., Akiruno, Japan, where he was engaged in research on image sensors using amorphous silicon and, later, liquid crystal displays using amorphous silicon and/or polycrystalline silicon. Since 2000, he has been a Researcher working on low-power and high-speed design of ferroelectric random access memory (FeRAM) and its application to reconfigurable logic LSIs.

Wataru Yokozeki was born in Osaka, Japan, on October 29, 1968. He received the B.S. and M.S. degrees from Keio University, Tokyo, Japan, in 1992 and 1994, respectively.

From 1994 to 1998, he was with Nippon Steel Corporation, Sagamihara, Japan, where he was engaged in research on scaled-down MOSFETs and TCAD, and later he worked on process integration of large-capacity DRAMs. In 1998, he joined the Advanced CMOS Technology Department, Fujitsu Ltd., Akiruno, Japan, where he was engaged in circuit design of high-speed cache RAMs and compiled SRAMs. From 2001 to 2002, he was with the FRAM Division, LSI Group, researching circuit design of nonvolatile ferroelectric memory LSIs and their applications. Currently, his interests include nonvolatile embedded FPGAs and reconfigurable logic LSIs.

Kenji Mukaida was born in Gifu, Japan, on March 26, 1966. He received the B.S. degree from the Science University of Tokyo, Tokyo, Japan, in 1990.

From 1990 to 1999, he was with Nippon Steel Corporation, Sagamihara, Japan, where he was engaged in design of logic LSIs and computer systems. In 1999, he joined Fujitsu Ltd., Akiruno, Japan, where he has been engaged in the design of reconfigurable logic LSIs since 2001.

Shoichiro Kawashima (M’83) was born in Yokohama, Japan, in 1958. He received the B.S. degree from Tokyo University, Tokyo, Japan, in 1982.

He joined Fujitsu Ltd., Kawasaki, Japan, in 1982, where he was engaged in the development of 16-kb–16-Mb MOS static RAMs. Since 1994, he has been with Fujitsu Laboratories Ltd., Akiruno, Japan, where he researches low-power SRAMs and DSPs. His current research interests are FeRAM and its sensing circuits.

Mr. Kawashima is a member of the Japan Society of Applied Physics and the Institute of Electronics, Information, and Communication Engineers of Japan.