
Brief Contributions

Concurrent Error Detection and Correction in Gaussian Normal Basis Multiplier over GF(2^m)

Che Wun Chiou, Member, IEEE, Chin-Cheng Chang, Fellow, IEEE, Chiou-Yng Lee, Senior Member, IEEE, Ting-Wei Hou, and Jim-Min Lin

Abstract—Fault-based cryptanalysis has been developed to effectively break both private-key and public-key cryptosystems, making robust finite field multiplication a very important research topic in recent years. However, no robust normal basis multiplier has been proposed in the literature. Therefore, this investigation presents a semisystolic Gaussian normal basis multiplier. Based on the proposed Gaussian normal basis multiplier, both concurrent error detection and correction capabilities can easily be achieved using time redundancy with no hardware modification.

Index Terms—Finite field multiplication, Gaussian normal basis, elliptic curve cryptosystem, fault-based cryptanalysis, concurrent error detection, concurrent error correction.


1 INTRODUCTION

THE arithmetic operations in $GF(2^m)$ are widely used in coding theory, digital signal processing, and public-key cryptosystems [1], [2], [3]. These applications require arithmetic operations such as addition, multiplication, multiplicative inversion, and exponentiation to be performed rapidly. Addition in $GF(2^m)$ is easily achieved with XOR gates. Multiplicative inversion and exponentiation are much more time-consuming than the other two basic operations, namely addition and multiplication, but can be performed using iterated multiplications. Therefore, efficient implementation of multiplication is fundamental in cryptographic applications.

The efficiency of finite field multiplication depends heavily on the choice of field element representation. An element of $GF(2^m)$ is typically represented in one of three popular bases, namely the polynomial basis (PB) [4], [5], [6], [7], [8], [9], [10], [11], the dual basis (DB) [12], [13], [14], [15], [16], and the normal basis (NB) [6], [16], [17], [18], [19], [20], [21], [22], [23], [24]. Each basis has its own distinct properties and hardware implementations. PB architectures can easily be extended in size to meet various applications owing to their low circuit complexity, simplicity, regularity, and modularity. DB multipliers require the least chip area among these bases. The major merit of the NB is that an element can be squared simply by cyclically shifting its binary representation; thus, NB multipliers are very effective for inversion, squaring, and exponentiation. Massey and Omura proposed the first NB multiplication algorithm in 1986 [17]. Their NB multiplier has a parallel-in, serial-out architecture with a long critical path delay. Various bit-parallel multipliers based on Massey and Omura's multiplication algorithm have been proposed [6], [18], [19], [21], [22]. Agnew et al. [21] developed a parallel-in, parallel-out multiplier to alleviate the long latency of the Massey-Omura multiplier. In practice, existing bit-parallel normal basis multipliers over $GF(2^m)$ [6], [19], [21], [22] achieve a time complexity of $O(\log_2 m)$ by adopting XOR trees, but they require a space complexity of $O(m^2)$ and have irregular structures that are not suitable for VLSI implementation. Reyhani-Masoleh [19] developed a new nonsystolic architecture for type-$t$ Gaussian normal basis multiplication that outperforms previous normal basis multipliers. Wang et al. [18] presented a VLSI architecture to realize the Massey-Omura multiplier; the major disadvantages of this VLSI Massey-Omura multiplier are its irregularity and lack of modularity, which mean that it cannot easily be extended. Kwon [23] proposed a novel systolic multiplier for the optimal normal basis (ONB) of type 2 with low space complexity. This investigation presents the first semisystolic array structure for Gaussian normal basis multiplication, which is appropriate for VLSI implementation. Lidl and Niederreiter [2] showed that a normal basis exists for every positive integer $m$. The Gaussian normal basis (GNB) is a special case of the normal basis and has low hardware complexity. A GNB exists for every positive integer $m$ that is not divisible by eight [25]. Many standards, such as ANSI X9.62 [26], FIPS 186-2 [27], and IEEE Standard 1363-2000 [28], include GNBs.
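The squaring property mentioned above can be illustrated with a minimal sketch, assuming an element is stored as a Python list `a` whose entry `a[i]` is the coefficient of $\beta^{2^i}$; the function name `nb_square` is hypothetical. Because $(\beta^{2^i})^2 = \beta^{2^{i+1}}$ and $\beta^{2^m} = \beta$, squaring only moves each coefficient one position, cyclically.

```python
def nb_square(a):
    """Square an element of GF(2^m) given by its normal-basis coefficients.

    a[i] is the coefficient of beta^(2^i). Since (beta^(2^i))^2 = beta^(2^(i+1))
    and beta^(2^m) = beta, coefficient a[i] moves to index (i + 1) mod m:
    squaring is a cyclic shift and needs no logic gates.
    """
    return a[-1:] + a[:-1]


a = [1, 0, 1, 1, 0]   # an element of GF(2^5) in some normal basis
print(nb_square(a))   # [0, 1, 0, 1, 1]
```

Applying `nb_square` $m$ times returns the original list, matching $A^{2^m} = A$; exponentiation and inversion algorithms exploit this by interleaving such free shifts with multiplications.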

Fault-based cryptanalysis is a recently developed cryptanalysis approach in which faults are injected into cryptosystems. It may need only a small number of faulty outputs to break common ciphers, and it has proven to be a valuable cryptanalysis method against both symmetric and asymmetric encryption algorithms [29], [30], [31], [32], [33], [34], [35]. Biehl et al. [33], Ciet and Joye [34], and Blömer et al. [35] have also demonstrated that inducing faults into the computation of elliptic curve scalar multiplications easily enables recovery of the secret key.

Erroneous outputs of cryptographic devices and memory structures can enable an active attack. Therefore, simple and effective approaches for protecting the encryption/decryption circuitry from an attacker are required to ensure that cryptographic devices output correct signatures. Several error detection approaches have been developed for private-key cryptosystems [36], [37] and public-key cryptosystems [38], [39], [40], [41], [42], [43], [44], [45], [46] to check output values. Most error detection methods for polynomial basis multipliers employ parity checking or time redundancy [40], [41], [42], [44], [45], [46]. Lee et al. [43] provide a concurrent error detection approach for systolic dual basis multipliers. However, the only existing error detection approach for bit-serial normal basis multipliers is that of Fenn et al. [40], and no concurrent error detection approach for bit-parallel normal basis multipliers has been reported in the literature. Therefore, this investigation presents a concurrent error detection approach for bit-parallel semisystolic Gaussian normal basis multipliers using the REcomputing with Shifted Operands (RESO) technique [47], [48]. Furthermore, a concurrent error correction approach for these Gaussian normal basis multipliers is also presented. Most previous work has focused on redesigning the data path of cryptographic architectures to add error detection capability.

IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 6, JUNE 2009 851

• C.W. Chiou is with the Department of Computer Science and Information Engineering, Ching Yun University, Chung-Li 320, Taiwan, R.O.C. E-mail: [email protected].

• C.-C. Chang is with the Department of Information Engineering and Computer Science, Feng Chia University, Taichung City 407, Taiwan, R.O.C. E-mail: [email protected].

• C.-Y. Lee is with the Department of Computer Information and Network Engineering, Lunghwa University of Science and Technology, Taoyuan County 333, Taiwan, R.O.C. E-mail: [email protected].

• T.-W. Hou is with the Department of Engineering Science, National Cheng Kung University, Tainan City 701, Taiwan, R.O.C. E-mail: [email protected].

• J.-M. Lin is with the Department of Information Engineering and Computer Science, Feng Chia University, Taichung City 407, Taiwan, R.O.C. E-mail: [email protected].

Manuscript received 21 Apr. 2008; revised 24 Aug. 2008; accepted 9 Dec. 2008; online accepted 19 Dec. 2008. Recommendation for acceptance by D. Gizopoulos. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-2008-04-0172. Digital Object Identifier no. 10.1109/TC.2008.226.

0018-9340/09/$25.00 © 2009 IEEE Published by the IEEE Computer Society

Authorized licensed use limited to: LUNGHWA UNIV OF SCIENCE AND TECHNOLOGY. Downloaded on April 30, 2009 at 23:05 from IEEE Xplore. Restrictions apply.

The remainder of this paper is organized as follows: Section 2 briefly reviews the mathematical background. Section 3 then describes the semisystolic Gaussian normal basis multiplier. Next, Section 4 introduces the proposed semisystolic Gaussian normal basis multiplier with concurrent error detection capability. The Gaussian normal basis multiplier with concurrent error correction capability is discussed in Section 5. Conclusions are finally drawn in Section 6.

2 REPRESENTATION OF GAUSSIAN NORMAL BASIS

The reader is assumed to be familiar with the basic concepts of finite fields and the RESO method. Previous studies [2], [49] describe the attributes of finite fields in detail. In the following paragraphs, the results on finite fields and the RESO method are briefly reviewed. Let $\{\beta^{2^0}, \beta^{2^1}, \beta^{2^2}, \ldots, \beta^{2^{m-1}}\}$ be a normal basis of $GF(2^m)$ for $\beta \in GF(2^m)$. Each element $A \in GF(2^m)$ can be uniquely expressed as

$$A = a_0\beta^{2^0} + a_1\beta^{2^1} + a_2\beta^{2^2} + \cdots + a_{m-1}\beta^{2^{m-1}},$$

where $a_i \in \{0, 1\}$ for $i = 0, 1, 2, \ldots, m-1$.

Existing Gaussian normal basis multiplier architectures [19] cannot easily be redesigned to have concurrent error detection capability, since they use irregular multiplication tables. Therefore, this work derives a semisystolic Gaussian normal basis multiplier to which concurrent error detection and correction capabilities can easily be applied at low cost. The GNB [25] of type $t$, given by the Gauss period of type $(m, t)$, has the following properties:

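Property 1. $\beta = \sum_{i=0}^{t-1} \gamma^{\tau^i}$,

Property 2. $\tau^t \equiv 1 \pmod{mt+1}$,

Property 3. $\gamma^{mt+1} = \gamma^{(mt+1) \bmod (mt+1)} = 1$,

where $\tau$ is a primitive $t$th root of unity modulo $mt+1$ (an integer of multiplicative order $t$) and $\gamma$ is a primitive $(mt+1)$th root of unity in the extension field $GF(2^{mt})$.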

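Replacing the normal element $\beta$ in the normal basis $\mathcal{N} = \{\beta^{2^0}, \beta^{2^1}, \ldots, \beta^{2^{m-1}}\}$ by using Property 1, the basis $\mathcal{N}$ is transformed into the following basis $\mathcal{N}'$:

$$\mathcal{N}' = \{\gamma^{2^0\tau^0}, \gamma^{2^0\tau^1}, \ldots, \gamma^{2^0\tau^{t-1}}, \gamma^{2^1\tau^0}, \gamma^{2^1\tau^1}, \ldots, \gamma^{2^1\tau^{t-1}}, \ldots, \gamma^{2^2\tau^0}, \ldots, \gamma^{2^{m-1}\tau^{t-1}}\}.$$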

Using Property 3, the basis $\mathcal{N}'$ can be transformed and extended to the following redundant basis $\mathcal{N}''$:

$$\mathcal{N}'' = \{\gamma^0, \gamma^1, \gamma^2, \gamma^3, \ldots, \gamma^{mt}\}.$$

Consequently, any element $A \in GF(2^m)$ can be represented in both bases $\mathcal{N}$ and $\mathcal{N}''$ as follows:

$$A = \sum_{i=0}^{m-1} a_i\beta^{2^i} = \sum_{j=0}^{mt} a''_j\gamma^j,$$

where $a''_0 = 0$ and $a''_j = a_i$ ($1 \le j \le mt$, $0 \le i \le m-1$) if there exists $k$ ($0 \le k \le t-1$) such that $2^i\tau^k \bmod (mt+1) = j$.
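The index condition above can be made concrete with a short sketch. It assumes $mt+1$ is prime and that a type-$t$ GNB exists for the chosen $m$; the helper names `order_t_element`, `gnb_index_map`, and `nb_to_redundant` are illustrative, not from the paper. For $m = 5$, $t = 2$ ($mt+1 = 11$), it lists, for each $a_i$, the positions $j$ of the redundant basis that receive a copy of $a_i$.

```python
def order_t_element(t, p):
    """Return the smallest integer tau in [2, p) of multiplicative order t mod p."""
    for x in range(2, p):
        if pow(x, t, p) == 1 and all(pow(x, d, p) != 1 for d in range(1, t)):
            return x
    raise ValueError("no element of order t modulo p")


def gnb_index_map(m, t):
    """For each NB index i, the redundant-basis positions j = 2^i * tau^k mod (mt+1)."""
    p = m * t + 1
    tau = order_t_element(t, p)
    return {i: sorted(pow(2, i, p) * pow(tau, k, p) % p for k in range(t))
            for i in range(m)}


def nb_to_redundant(a, t):
    """Expand NB coefficients a[0..m-1] into a''_0..a''_mt over {gamma^0..gamma^mt}."""
    m = len(a)
    out = [0] * (m * t + 1)          # a''_0 = 0 by definition
    for i, positions in gnb_index_map(m, t).items():
        for j in positions:
            out[j] = a[i]            # each a_i is copied to t positions
    return out


print(gnb_index_map(5, 2))
# {0: [1, 10], 1: [2, 9], 2: [4, 7], 3: [3, 8], 4: [5, 6]}
```

Every position $1, \ldots, mt$ is hit exactly once, so the redundant form stores each $a_i$ $t$ times; this replication is what the palindromic representation below halves.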

For practical reasons, this study focuses on finite fields with a GNB of even type $t$. For a type-$t$ GNB of $GF(2^m)$ with odd $m$, $t$ is necessarily even: $mt+1$ is an odd prime, so $mt$ must be even. To reduce space complexity, the palindromic representation [49], [50], [51] can also be used to represent elements. Based on the palindromic representation, the normal basis $\mathcal{N}$ can be transformed into the following basis $\mathcal{N}^*$:

$$\mathcal{N}^* = \{\gamma^1 + \gamma^{-1}, \gamma^2 + \gamma^{-2}, \gamma^3 + \gamma^{-3}, \ldots, \gamma^{mt/2} + \gamma^{-mt/2}\}.$$

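If an element $A \in GF(2^m)$ is represented in both bases $\mathcal{N}$ and $\mathcal{N}^*$, the relationship is as follows:

$$A = \sum_{i=0}^{m-1} a_i\beta^{2^i} = \sum_{j=1}^{mt/2} a^*_j\left(\gamma^j + \gamma^{-j}\right),$$

where $a^*_j = a_i$ ($1 \le j \le mt/2$, $0 \le i \le m-1$) if there exists $k$ ($0 \le k \le t-1$) such that $2^i\tau^k \bmod (mt+1) = j$ or $2^i\tau^k \bmod (mt+1) = mt+1-j$. This pairing is consistent because $t$ is even: $\tau^{t/2} \equiv -1 \pmod{mt+1}$, so whenever $j$ appears among the exponents $2^i\tau^k$ associated with $a_i$, so does $mt+1-j$.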
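A minimal sketch of the conversion to the palindromic coefficients $a^*_1, \ldots, a^*_{mt/2}$ follows, assuming even $t$, prime $mt+1$, and an existing type-$t$ GNB; `nb_to_palindromic` is a hypothetical name. Each exponent $j$ is folded onto $\min(j, mt+1-j)$, which is exactly the pairing of $\gamma^j$ with $\gamma^{-j}$.

```python
def nb_to_palindromic(a, t):
    """Return [a*_1, ..., a*_{mt/2}] for A = sum_j a*_j (gamma^j + gamma^-j).

    a[i] is the normal-basis coefficient of beta^(2^i); t must be even so that
    tau^(t/2) = -1 (mod mt+1) and positions j, mt+1-j carry the same a_i.
    """
    m = len(a)
    p = m * t + 1
    if t % 2:
        raise ValueError("the palindromic form needs an even type t")
    # tau: an integer of multiplicative order t modulo p (Property 2).
    tau = next(x for x in range(2, p)
               if pow(x, t, p) == 1 and all(pow(x, d, p) != 1 for d in range(1, t)))
    star = [None] * (m * t // 2 + 1)      # star[0] is unused
    for i in range(m):
        for k in range(t):
            j = pow(2, i, p) * pow(tau, k, p) % p
            star[min(j, p - j)] = a[i]    # gamma^j and gamma^-j share a coefficient
    return star[1:]


print(nb_to_palindromic([0, 0, 1, 0, 0], 2))   # [0, 0, 0, 1, 0]
```

Only $mt/2$ coefficients are kept instead of $mt+1$, which is the space saving that motivates the palindromic form.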

The RESO scheme [47], [48] is based on time redundancy. Let $G(x)$ be the function computed by a functional unit, and let $T$ be a function such that $T^{-1}(G(T(x))) = G(x)$ for all values of $x$. The result is computed twice. The result of the first computation step, $G(x)$, is stored in a register. During the second computation step, $T^{-1}(G(T(x)))$ is computed and compared with the result of the first step; a mismatch indicates an error. The advantage of the RESO scheme is that it can detect both permanent and intermittent failures. The fault model assumed in the RESO scheme is the functional fault model, which assumes that the faults are confined to a small area of the circuit and that the precise nature of the faults is not known. The functional fault model is well suited to VLSI circuits.
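To illustrate the RESO principle in isolation, the sketch below applies it to a binary adder, the arithmetic-unit setting for which RESO was originally proposed, with $T$ a one-bit left shift and $T^{-1}$ a one-bit right shift. The adder width `ADDER_BITS`, the 8-bit operand limit, and the stuck-at-1 output line are hypothetical choices made only for this example.

```python
ADDER_BITS = 10  # hypothetical adder width; operands are limited to 8 bits


def adder(x, y, stuck_at_one=None):
    """Functional unit G: a binary adder whose output line `stuck_at_one`,
    if given, is permanently stuck at 1 (a hypothetical fault)."""
    s = (x + y) & ((1 << ADDER_BITS) - 1)
    if stuck_at_one is not None:
        s |= 1 << stuck_at_one
    return s


def reso_add(x, y, stuck_at_one=None):
    """RESO with T = shift left by one bit and T^-1 = shift right by one bit."""
    assert 0 <= x < 256 and 0 <= y < 256
    first = adder(x, y, stuck_at_one)                  # step 1: G(x), kept in a register
    second = adder(x << 1, y << 1, stuck_at_one) >> 1  # step 2: T^-1(G(T(x)))
    return first, first != second                      # (result, error flag)


print(reso_add(3, 4))                   # (7, False): fault-free, results agree
print(reso_add(0, 0, stuck_at_one=2))   # (4, True): the fault hits different bits
```

Because the shifted computation exercises the faulty line with a different bit weight, a fault that corrupts the first result appears at a different position in the second. The multiplier described in Sections 4 and 5 uses the same idea, with a cyclic rotation of $B$ taking the place of the shift.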

3 PROPOSED SEMISYSTOLIC GNB MULTIPLIER

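Let $C$ be the product of $A$ and $B$, where $A, B, C \in GF(2^m)$. The representations of $A$ and $B$, and the product $C$, are computed as follows:

$$A = \sum_{i=1}^{mt/2} a^*_i\left(\gamma^i + \gamma^{-i}\right), \qquad B = \sum_{i=0}^{mt} b''_i\gamma^i,$$

and

$$\begin{aligned}
C = A \times B &= \left[a^*_1(\gamma^1 + \gamma^{-1}) + a^*_2(\gamma^2 + \gamma^{-2}) + \cdots + a^*_{mt/2}(\gamma^{mt/2} + \gamma^{-mt/2})\right]B \\
&= \left(a^*_1\gamma^1 + a^*_2\gamma^2 + a^*_3\gamma^3 + \cdots + a^*_{mt/2}\gamma^{mt/2}\right)B \\
&\quad + \left(a^*_1\gamma^{-1} + a^*_2\gamma^{-2} + a^*_3\gamma^{-3} + \cdots + a^*_{mt/2}\gamma^{-mt/2}\right)B \\
&= C1 + C2,
\end{aligned} \tag{1}$$

where $C1 = (a^*_1\gamma^1 + a^*_2\gamma^2 + a^*_3\gamma^3 + \cdots + a^*_{mt/2}\gamma^{mt/2})B$ and $C2 = (a^*_1\gamma^{-1} + a^*_2\gamma^{-2} + a^*_3\gamma^{-3} + \cdots + a^*_{mt/2}\gamma^{-mt/2})B$.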
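Equation (1) can be checked numerically with a sketch that works directly over $\{\gamma^0, \ldots, \gamma^{mt}\}$: because $\gamma^{mt+1} = 1$ (Property 3), multiplying basis elements adds exponents modulo $mt+1$, so a product is a cyclic convolution of coefficient vectors over $GF(2)$. The function names and the sample vectors for $m = 5$, $t = 2$ are illustrative assumptions; the identity holds for any coefficient vector $b$.

```python
def cyclic_mul(u, v):
    """Multiply two elements given over {gamma^0, ..., gamma^mt}.

    gamma^(mt+1) = 1, so gamma^i * gamma^j = gamma^((i + j) mod (mt+1));
    coefficients are added in GF(2), i.e., with XOR.
    """
    p = len(u)
    w = [0] * p
    for i, ui in enumerate(u):
        if ui:
            for j, vj in enumerate(v):
                w[(i + j) % p] ^= vj
    return w


def split_product(a_star, b):
    """Return (C1, C2) of equation (1).

    a_star[j-1] = a*_j for 1 <= j <= mt/2, and b[j] = b''_j for 0 <= j <= mt.
    """
    p = len(b)
    pos = [0] * p   # a*_1 gamma^1 + ... + a*_{mt/2} gamma^{mt/2}
    neg = [0] * p   # a*_1 gamma^-1 + ... ; gamma^-j = gamma^(mt+1-j)
    for j, aj in enumerate(a_star, start=1):
        pos[j] = aj
        neg[p - j] = aj
    return cyclic_mul(pos, b), cyclic_mul(neg, b)


a_star = [1, 0, 1, 1, 0]                 # m = 5, t = 2: mt/2 = 5 and mt + 1 = 11
b = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
c1, c2 = split_product(a_star, b)
c = [x ^ y for x, y in zip(c1, c2)]      # C = C1 + C2
a_full = [0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1]   # the same A with both gamma^j and gamma^-j
print(c == cyclic_mul(a_full, b))        # True
```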

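C1 can be computed as follows:

$$\begin{aligned}
C1 &= \left(a^*_1\gamma^1 + a^*_2\gamma^2 + a^*_3\gamma^3 + \cdots + a^*_{mt/2}\gamma^{mt/2}\right)B \\
&= a^*_1\gamma^1B + a^*_2\gamma^2B + a^*_3\gamma^3B + \cdots + a^*_{mt/2}\gamma^{mt/2}B.
\end{aligned}$$

Let $B^{(i)} = \gamma^iB$ for $1 \le i \le mt/2$; thus

$$C1 = a^*_1B^{(1)} + a^*_2B^{(2)} + \cdots + a^*_{mt/2}B^{(mt/2)}.$$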

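Let $B^{(i)}$ ($1 \le i \le mt/2$) be represented as $B^{(i)} = b^{(i)}_0\gamma^0 + b^{(i)}_1\gamma^1 + \cdots + b^{(i)}_{mt-1}\gamma^{mt-1} + b^{(i)}_{mt}\gamma^{mt}$, where $b^{(i)}_j \in GF(2)$ for $0 \le j \le mt$. The relationship between $B^{(i+1)}$ and $B^{(i)}$ for $1 \le i \le mt/2 - 1$ is as follows:

$$\begin{aligned}
B^{(i+1)} = B^{(i)}\gamma &= \left(b^{(i)}_0\gamma^0 + b^{(i)}_1\gamma^1 + \cdots + b^{(i)}_{mt-1}\gamma^{mt-1} + b^{(i)}_{mt}\gamma^{mt}\right)\gamma \\
&= b^{(i)}_0\gamma^1 + b^{(i)}_1\gamma^2 + \cdots + b^{(i)}_{mt-1}\gamma^{mt} + b^{(i)}_{mt}\gamma^{mt+1} \\
&= b^{(i)}_{mt}\gamma^0 + b^{(i)}_0\gamma^1 + b^{(i)}_1\gamma^2 + \cdots + b^{(i)}_{mt-1}\gamma^{mt} \\
&= b^{(i+1)}_0\gamma^0 + b^{(i+1)}_1\gamma^1 + b^{(i+1)}_2\gamma^2 + \cdots + b^{(i+1)}_{mt}\gamma^{mt},
\end{aligned}$$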



where $b^{(i+1)}_j = b^{(i)}_{j-1}$ for $1 \le j \le mt$ and $b^{(i+1)}_0 = b^{(i)}_{mt}$. Assume that $B^{(1)}$ is precalculated first as $B^{(1)} = \gamma B = b''_{mt}\gamma^0 + b''_0\gamma^1 + b''_1\gamma^2 + \cdots + b''_{mt-1}\gamma^{mt}$.

Fig. 1 shows the semisystolic array designed for computing C1. In Fig. 1, $a^*_j$ is applied $j-i$ clock cycles after $a^*_i$ for $1 \le i \le j \le mt/2$; furthermore, $n = mt+1$ and $k = mt/2$. Fig. 2 illustrates the detailed circuit of the U cell. Let the binary representation of C1 be $C1 = c1''_0\gamma^0 + c1''_1\gamma^1 + c1''_2\gamma^2 + \cdots + c1''_{mt}\gamma^{mt}$, where $c1''_i \in GF(2)$ for $0 \le i \le mt$.
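The row-by-row recurrence above (start from $B^{(1)} = \gamma B$, then obtain each $B^{(i+1)}$ by moving $b^{(i)}_{mt}$ to position 0) can be sketched as a behavioral model. It assumes that each array row ANDs one $a^*_i$ with $B^{(i)}$ and XORs the result into the running sum; it does not model the latches or the clock-cycle skew of Fig. 1, and the names are hypothetical.

```python
def times_gamma(v):
    """B^(i+1) = gamma * B^(i): b^(i+1)_j = b^(i)_(j-1), b^(i+1)_0 = b^(i)_mt."""
    return v[-1:] + v[:-1]


def compute_c1(a_star, b):
    """Behavioral model of C1 = a*_1 B^(1) + ... + a*_{mt/2} B^(mt/2).

    a_star[i-1] = a*_i; b[j] = b''_j over {gamma^0, ..., gamma^mt}.
    """
    acc = [0] * len(b)
    b_i = times_gamma(b)                  # B^(1) = gamma * B, precalculated
    for a_i in a_star:                    # one row of the array per a*_i
        acc = [c ^ (a_i & bit) for c, bit in zip(acc, b_i)]   # AND, then XOR
        b_i = times_gamma(b_i)            # pass B^(i+1) on to the next row
    return acc


# (gamma + gamma^2) * gamma^10 = gamma^11 + gamma^12 = gamma^0 + gamma^1 (mt + 1 = 11)
print(compute_c1([1, 1, 0, 0, 0], [0] * 10 + [1]))   # [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The rotation needs no gates, only rewiring, which is why each row can pass $B^{(i+1)}$ to the next row at no logic cost.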

C2 is computed in a similar manner:

$$\begin{aligned}
C2 &= c2''_0\gamma^0 + c2''_1\gamma^1 + c2''_2\gamma^2 + \cdots + c2''_{mt}\gamma^{mt} \\
&= \left(a^*_1\gamma^{-1} + a^*_2\gamma^{-2} + a^*_3\gamma^{-3} + \cdots + a^*_{mt/2}\gamma^{-mt/2}\right)B \\
&= a^*_1\gamma^{-1}B + a^*_2\gamma^{-2}B + \cdots + a^*_{mt/2}\gamma^{-mt/2}B \\
&= a^*_1B^{(-1)} + a^*_2B^{(-2)} + \cdots + a^*_{mt/2}B^{(-mt/2)},
\end{aligned}$$

where $c2''_j \in GF(2)$ for $0 \le j \le mt$, and $B^{(-i)} = \gamma^{-i}B$ for $1 \le i \le mt/2$.

Let $B^{(-i)}$ ($1 \le i \le mt/2$) be represented as follows:

$$B^{(-i)} = b^{(-i)}_0\gamma^0 + b^{(-i)}_1\gamma^1 + \cdots + b^{(-i)}_{mt-1}\gamma^{mt-1} + b^{(-i)}_{mt}\gamma^{mt},$$

where $b^{(-i)}_j \in GF(2)$ for $0 \le j \le mt$.

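Based on $\gamma^{-1} = \gamma^{mt}$, the relationship between $B^{(-i-1)}$ and $B^{(-i)}$ for $1 \le i \le mt/2 - 1$ is as follows:

$$\begin{aligned}
B^{(-i-1)} = B^{(-i)}\gamma^{-1} &= \left(b^{(-i)}_0\gamma^0 + b^{(-i)}_1\gamma^1 + \cdots + b^{(-i)}_{mt-1}\gamma^{mt-1} + b^{(-i)}_{mt}\gamma^{mt}\right)\gamma^{-1} \\
&= b^{(-i)}_0\gamma^{-1} + b^{(-i)}_1\gamma^0 + \cdots + b^{(-i)}_{mt-1}\gamma^{mt-2} + b^{(-i)}_{mt}\gamma^{mt-1} \\
&= b^{(-i)}_1\gamma^0 + \cdots + b^{(-i)}_{mt-1}\gamma^{mt-2} + b^{(-i)}_{mt}\gamma^{mt-1} + b^{(-i)}_0\gamma^{mt} \\
&= b^{(-i-1)}_0\gamma^0 + \cdots + b^{(-i-1)}_{mt-2}\gamma^{mt-2} + b^{(-i-1)}_{mt-1}\gamma^{mt-1} + b^{(-i-1)}_{mt}\gamma^{mt},
\end{aligned}$$

where $b^{(-i-1)}_{j-1} = b^{(-i)}_j$ for $1 \le j \le mt$ and $b^{(-i-1)}_{mt} = b^{(-i)}_0$.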

The first term $B^{(-1)}$ is precalculated as $B^{(-1)} = \gamma^{-1}B = b''_1\gamma^0 + b''_2\gamma^1 + \cdots + b''_{mt}\gamma^{mt-1} + b''_0\gamma^{mt}$.

The semisystolic array for computing C1 can also be used to compute C2 by rearranging the coefficients of $B$ in reverse order, as depicted in Fig. 3. The final result $C$ is the sum of C1 and C2; in other words,

$$\begin{aligned}
C &= \left(c1''_0 + c2''_0\right)\gamma^0 + \left(c1''_1 + c2''_1\right)\gamma^1 + \cdots + \left(c1''_{mt} + c2''_{mt}\right)\gamma^{mt} \\
&= c''_0\gamma^0 + c''_1\gamma^1 + \cdots + c''_{mt}\gamma^{mt}.
\end{aligned}$$
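The C2 recurrence and the final sum can be modeled in the same behavioral style. The sketch below computes C2 from $B^{(-1)} = \gamma^{-1}B$ by repeatedly moving $b^{(-i)}_0$ to position $mt$, and it checks one reading of "rearranging the coefficients of $B$ in reverse order": feeding the C1 array with $b''_{mt}, \ldots, b''_0$ and reversing its output also yields C2. That reading and all function names are assumptions for illustration.

```python
def times_gamma(v):      # multiply by gamma: coefficient j -> j + 1 (mod mt+1)
    return v[-1:] + v[:-1]


def times_gamma_inv(v):  # multiply by gamma^-1 = gamma^mt: coefficient j -> j - 1
    return v[1:] + v[:1]


def accumulate_rows(a_star, b, step):
    """Sum of a*_i * step^i(B) for i = 1..mt/2, one AND/XOR row per a*_i."""
    acc = [0] * len(b)
    b_i = step(b)                          # B^(1) or B^(-1), precalculated
    for a_i in a_star:
        acc = [c ^ (a_i & bit) for c, bit in zip(acc, b_i)]
        b_i = step(b_i)
    return acc


def compute_c2(a_star, b):
    """C2 = a*_1 B^(-1) + ... + a*_{mt/2} B^(-mt/2), from the recurrence."""
    return accumulate_rows(a_star, b, times_gamma_inv)


def c2_on_c1_array(a_star, b):
    """C2 from the C1 array: reverse B's coefficients, then reverse the output."""
    return accumulate_rows(a_star, b[::-1], times_gamma)[::-1]


def gnb_product_redundant(a_star, b):
    """C'' = C1 + C2 over {gamma^0, ..., gamma^mt}."""
    c1 = accumulate_rows(a_star, b, times_gamma)
    return [x ^ y for x, y in zip(c1, compute_c2(a_star, b))]


a_star = [1, 0, 1, 1, 0]                  # m = 5, t = 2: mt/2 = 5, mt + 1 = 11
b = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(compute_c2(a_star, b) == c2_on_c1_array(a_star, b))   # True
```

The reversal trick works because reversing the coefficient list maps $X$ to $\gamma^{-1}\bar{X}$, where $\bar{X}$ replaces every $\gamma^j$ with $\gamma^{-j}$; applying it on both sides of the C1 array turns $\sum_i a^*_i\gamma^i$ into $\sum_i a^*_i\gamma^{-i}$, so the same hardware serves both halves of equation (1).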

Fig. 1 The semisystolic GNB multiplier for computing C1.

Fig. 2 The detailed circuit of U cell.

Fig. 3 The proposed semisystolic GNB multiplier for computing C = C1 + C2.

The result $C$ is represented in the basis $\mathcal{N}''$ and must be converted to a representation in the basis $\mathcal{N}'$. Because $\gamma \neq 1$ is a primitive $(mt+1)$th root of unity, $\gamma^0 + \gamma^1 + \cdots + \gamma^{mt} = 0$, i.e., $\gamma^0 = \gamma^1 + \gamma^2 + \cdots + \gamma^{mt}$; hence

$$\begin{aligned}
C &= \left(c''_0 + c''_1\right)\gamma^1 + \left(c''_0 + c''_2\right)\gamma^2 + \cdots + \left(c''_0 + c''_{mt}\right)\gamma^{mt} \\
&= c'_1\gamma^1 + c'_2\gamma^2 + \cdots + c'_{mt}\gamma^{mt}.
\end{aligned}$$

$C$ can be further represented in the basis $\mathcal{N}$ as follows:

$$C = \sum_{i=0}^{m-1} c_i\beta^{2^i} = \sum_{j=1}^{mt} c'_j\gamma^j,$$

where $c_i = c'_j$ ($1 \le j \le mt$, $0 \le i \le m-1$) if there exists $k$ ($0 \le k \le t-1$) such that $2^i\tau^k \bmod (mt+1) = j$.
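Putting the pieces together, the following end-to-end sketch multiplies two elements given by their normal-basis coefficients: it builds $a^*_j$ and $b''_j$ with the index rule $2^i\tau^k \bmod (mt+1)$, forms $C'' = C1 + C2$ by exponent arithmetic modulo $mt+1$, converts to $c'_j = c''_0 + c''_j$, and reads $c_i$ at $j = 2^i \bmod (mt+1)$ (the $k = 0$ position). It is a behavioral model, not the semisystolic array, and it was reasoned through only for small fields in which 2 is a primitive root modulo $mt+1$ ($m = 5, t = 2$ and $m = 7, t = 4$); `gnb_multiply` is a hypothetical name. Two sanity checks follow from the text: the all-ones vector represents 1 (it equals $\gamma^1 + \cdots + \gamma^{mt} = \gamma^0$), and $A \times A$ must equal the cyclic shift of $A$.

```python
def gnb_multiply(a, b, t):
    """Multiply A and B given in a type-t GNB of GF(2^m), t even.

    a[i], b[i] are the coefficients of beta^(2^i); C = A x B is returned the
    same way. Assumes mt+1 is prime and the type-t GNB exists.
    """
    m = len(a)
    p = m * t + 1
    half = m * t // 2
    # tau: an integer of multiplicative order t modulo p (Property 2).
    tau = next(x for x in range(2, p)
               if pow(x, t, p) == 1 and all(pow(x, d, p) != 1 for d in range(1, t)))
    a_star = [0] * (half + 1)     # palindromic coefficients a*_1..a*_{mt/2}
    b_red = [0] * p               # redundant-basis coefficients b''_0..b''_mt
    for i in range(m):
        for k in range(t):
            j = pow(2, i, p) * pow(tau, k, p) % p
            a_star[min(j, p - j)] = a[i]
            b_red[j] = b[i]
    # C'' = sum_j a*_j (gamma^j + gamma^-j) B, i.e., C1 + C2 with exponents mod p.
    c_red = [0] * p
    for j in range(1, half + 1):
        if a_star[j]:
            for e in range(p):
                c_red[(e + j) % p] ^= b_red[e]     # contribution to C1
                c_red[(e - j) % p] ^= b_red[e]     # contribution to C2
    # N'' -> N': gamma^0 = gamma^1 + ... + gamma^mt, so c'_j = c''_0 + c''_j.
    c_prime = [c_red[0] ^ c_red[j] for j in range(p)]
    # N' -> N: c_i = c'_j for j = 2^i mod p (take k = 0).
    return [c_prime[pow(2, i, p)] for i in range(m)]


a = [1, 0, 1, 1, 0]
print(gnb_multiply(a, [1] * 5, 2))   # [1, 0, 1, 1, 0]: the all-ones vector is 1
print(gnb_multiply(a, a, 2))         # [0, 1, 0, 1, 1]: squaring is a cyclic shift
```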

The following assumptions were made for VLSI implementation when comparing space complexity: a 2-input AND gate, a 1-bit latch, and a 2-input XOR gate consist of six, eight, and six transistors, respectively [52]. To compare time complexity, the typical propagation delays of real circuits were used: M74HC86 (STMicroelectronics, XOR gate, $t_{PD}$ = 12 ns typ.) [53], M74HC08 (STMicroelectronics, AND gate, $t_{PD}$ = 7 ns typ.) [54], M74HC279 (STMicroelectronics, SR latch, $t_{PD}$ = 13 ns typ.) [55], and M74HC32 (STMicroelectronics, OR gate, $t_{PD}$ = 8 ns typ.) [56]. Table 1 lists the comparison results for various normal basis systolic array multipliers. Compared with Kwon's systolic array ONB multiplier [23] for type 2, the proposed semisystolic array GNB multiplier saves about 6 percent in space complexity and 27 percent in time complexity, at the cost of lower throughput.

4 PROPOSED SEMISYSTOLIC GNB MULTIPLIER WITH CONCURRENT ERROR DETECTION

No bit-parallel normal basis multiplier with concurrent error detection (CED) has yet been developed. Hence, the RESO concept is adopted to design a semisystolic array Gaussian normal basis multiplier with CED capability. To achieve CED capability, four rounds are needed to perform $C = A \times B$, as shown in Fig. 4. The first and third rounds determine C1 and C2, respectively; the second and fourth rounds are backups of the first and third rounds, respectively, and provide the CED capability. Owing to the cyclic structure of the proposed semisystolic GNB multiplier, the result $C$ can also be obtained by multiplying $A$ by $\mathrm{RoL}(B)$, where $\mathrm{RoL}(B)$ denotes $B$ rotated left by one bit: rotating the coefficients of $B$ in the redundant basis multiplies $B$ by a power of $\gamma$, so the product is rotated in the same way. The second and fourth rounds are therefore used to compute $\mathrm{RoL}(C) = A \times \mathrm{RoL}(B)$, where $\mathrm{RoL}(C)$ denotes $C$ rotated left by one bit. An error is detected if the results $C = A \times B$ and $C = \mathrm{RoR}(A \times \mathrm{RoL}(B))$ disagree, where RoR denotes rotation right by one bit. The fault model in this study is the single-cell fault model.
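A behavioral sketch of the detection step follows. It assumes that RoL corresponds to multiplying by $\gamma$ in the redundant basis (coefficient $j$ moves to $j+1 \bmod (mt+1)$), merges the C1 and C2 rounds into one product for brevity, and models a single-cell fault as one output column of the array stuck at a constant value. These modeling choices and all names are assumptions, not the circuit of Fig. 4.

```python
def cyclic_mul(u, v):
    """Product over {gamma^0..gamma^mt}: exponents add mod mt+1, XOR accumulates."""
    p = len(u)
    w = [0] * p
    for i, ui in enumerate(u):
        if ui:
            for j, vj in enumerate(v):
                w[(i + j) % p] ^= vj
    return w


def rol(v):  # assumed RoL: multiply by gamma (coefficient j -> j + 1 mod mt+1)
    return v[-1:] + v[:-1]


def ror(v):  # inverse rotation: multiply by gamma^-1
    return v[1:] + v[:1]


def faulty_array(u, v, stuck_col=None, stuck_val=0):
    """Hypothetical single-cell fault: output column `stuck_col` stuck at `stuck_val`."""
    w = cyclic_mul(u, v)
    if stuck_col is not None:
        w[stuck_col] = stuck_val
    return w


def ced_multiply(a, b, **fault):
    c = faulty_array(a, b, **fault)                  # rounds 1/3: C = A x B
    c_check = ror(faulty_array(a, rol(b), **fault))  # rounds 2/4: RoR(A x RoL(B))
    return c, c != c_check                           # (result, error detected)


a1 = [0, 1] + [0] * 8 + [1]     # A = gamma^1 + gamma^10 (= gamma + gamma^-1), mt + 1 = 11
b1 = [1] + [0] * 10             # B = gamma^0
print(ced_multiply(a1, b1)[1])                             # False: no error flagged
print(ced_multiply(a1, b1, stuck_col=1, stuck_val=0)[1])   # True: the fault is flagged
```

The stuck column corrupts bit $j$ of the first result but, after the rotation is undone, a neighboring bit of the second result, so the two results cannot agree whenever the fault actually changes an output.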

Theorem 1. The proposed GNB multiplier (presented in Fig. 4) can detect any single-cell fault.

Proof. If a single-cell fault occurs in cell $U_{i,j}$ ($0 \le i \le mt$, $1 \le j \le mt/2$), the faulty behavior can be categorized into the following three cases:

1. The error on $a_{out}$: Since $a_{in}$ and $a_{out}$ lie on the same pass-through line, an error on $a_{out}$ of cell $U_{i,j}$ can be detected by comparing $a_{in}$ of cell $U_{i,0}$ with $a_{out}$ of cell $U_{i,n-1}$.
2. The error on $b_{out}$: Because $b_{in}$ and $b_{out}$ lie on the same pass-through line, an error on $b_{out}$ of cell $U_{i,j}$ can be detected by comparing $b_{in}$ of cell $U_{1,\langle j-i+1\rangle}$ with $b_{out}$ of cell $U_{i,\langle j-i+mt/2\rangle}$, where $\langle x\rangle$ denotes $x \bmod (mt+1)$.
3. The error on $c_{out}$: In the first round, an error on $c_{out}$ of cell $U_{i,j}$ affects the output bit $c''_j$. In the second round, this error influences the output bit $c''_{\langle j+1\rangle}$. The error can therefore be identified by comparing the bits $c''_j$ and $c''_{\langle j+1\rangle}$ of the two rounds. Similarly, such errors can also be detected by comparing the results of the third and fourth rounds. □

TABLE 1
Comparisons of Various Normal-Basis Multipliers

Because no bit-parallel GNB multiplier with CED exists in the literature, the proposed GNB multiplier with CED capability is compared only with the proposed GNB multiplier without CED capability. Table 2 shows that only a small overhead is required to achieve CED capability. For instance, only about 0.6 percent more space complexity and 1 percent more time complexity are needed for $m = 233$ and type 2. The equality checkers use an XOR-OR architecture: two equality checkers of size $mt+1$ and one of size $mt/2$ are used to check the correctness of $C$, $B$, and $A$, respectively.
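A minimal model of an XOR-OR equality checker, and of combining the three checkers named above into one error flag, is sketched below. How the three checker outputs are merged (here, a final OR) and the function names are assumptions for illustration.

```python
def equality_checker(x, y):
    """XOR-OR equality checker: XOR each bit pair, then OR all the XOR outputs.

    Returns 1 when the two words differ in any bit (error), 0 otherwise.
    An n-bit checker built this way uses n XOR gates and n - 1 two-input OR gates.
    """
    if len(x) != len(y):
        raise ValueError("operands must have the same width")
    flag = 0
    for xi, yi in zip(x, y):
        flag |= xi ^ yi          # XOR stage, accumulated by the OR stage
    return flag


def ced_error(c_pair, b_pair, a_pair):
    """Hypothetical top-level flag: OR of the checkers for C, B (size mt+1) and A (size mt/2)."""
    return equality_checker(*c_pair) | equality_checker(*b_pair) | equality_checker(*a_pair)


print(equality_checker([1, 0, 1], [1, 0, 1]))   # 0
print(equality_checker([1, 0, 1], [1, 1, 1]))   # 1
```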

5 PROPOSED SEMISYSTOLIC GNB MULTIPLIER WITH CONCURRENT ERROR CORRECTION

No bit-parallel normal basis multiplier with concurrent error correction (CEC) has been found in the literature. If the GNB multiplier provides CEC capability, its resistance to fault-based cryptanalysis is further improved. CEC capability can easily be achieved on the basis of the proposed GNB multiplier. Fig. 5 depicts the proposed GNB multiplier with CEC capability, which requires six rounds: three rounds compute C1, followed by voting to tolerate a single-cell fault, and another three rounds compute C2, whose results are likewise applied to voters to obtain the correct outputs. The first round computes C1 with the original operands $A$ and $B$. The second round applies the original operand $A$ and the rotated operand $\mathrm{RoL}(B)$ to derive C1. The third round uses $A$ and $\mathrm{RoL}(\mathrm{RoL}(B))$ to obtain C1. Similarly, the fourth, fifth, and sixth rounds use $A$ together with $B$, $\mathrm{RoL}(B)$, and $\mathrm{RoL}(\mathrm{RoL}(B))$, respectively, to determine C2. The Permutation Network in Fig. 5 carries out the shifting operation without logic gates. The first and fourth rounds compute the result $C$; the second and third rounds are backups of the first round, and the fifth and sixth rounds are backups of the fourth round. The three results of each group are voted on to achieve concurrent error correction capability.

TABLE 2
Comparisons of GNB Multipliers with and without CED Capability

Fig. 4 The semisystolic GNB multiplier for computing $C = A \times B$ with concurrent error detection.

Fig. 5 The semisystolic GNB multiplier for computing $C = A \times B$ with concurrent error correction.
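The voting step can be modeled in the same behavioral style, under the same assumptions as a detection model would need: RoL multiplies by $\gamma$ in the redundant basis, the C1 and C2 rounds are merged into one product, and a single-cell fault is modeled as one stuck output column. The voter is the AND-OR majority function $xy + yz + xz$; all names are hypothetical.

```python
def cyclic_mul(u, v):
    """Product over {gamma^0..gamma^mt}: exponents add mod mt+1, XOR accumulates."""
    p = len(u)
    w = [0] * p
    for i, ui in enumerate(u):
        if ui:
            for j, vj in enumerate(v):
                w[(i + j) % p] ^= vj
    return w


def rol(v):  # assumed RoL: multiply by gamma
    return v[-1:] + v[:-1]


def ror(v):  # inverse rotation: multiply by gamma^-1
    return v[1:] + v[:1]


def array_round(a, b, stuck_col=None, stuck_val=0):
    """One pass through the array, optionally with output column `stuck_col` stuck."""
    w = cyclic_mul(a, b)
    if stuck_col is not None:
        w[stuck_col] = stuck_val
    return w


def vote(x, y, z):
    """Bitwise AND-OR majority voter: maj = xy + yz + xz."""
    return [(p & q) | (q & r) | (p & r) for p, q, r in zip(x, y, z)]


def cec_multiply(a, b, **fault):
    r1 = array_round(a, b, **fault)                        # A x B
    r2 = ror(array_round(a, rol(b), **fault))              # RoR(A x RoL(B))
    r3 = ror(ror(array_round(a, rol(rol(b)), **fault)))    # RoR^2(A x RoL^2(B))
    return vote(r1, r2, r3)


a1 = [0, 1] + [0] * 8 + [1]     # A = gamma^1 + gamma^10, with mt + 1 = 11
b1 = [1] + [0] * 10             # B = gamma^0, so the correct product is A itself
print(cec_multiply(a1, b1, stuck_col=1, stuck_val=0) == a1)   # True: fault corrected
print(cec_multiply(a1, b1, stuck_col=4, stuck_val=1) == a1)   # True
```

After the rotations are undone, the stuck column lands on three different bit positions in the three rounds, so at most one of the three votes for any bit is wrong and the majority restores it.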

Theorem 2. The proposed GNB multiplier (illustrated in Fig. 5) can tolerate any single-cell fault.

Proof. Assume that a single-cell fault occurs in cell $U_{i,j}$ ($0 \le i \le mt$, $1 \le j \le mt/2$). The faulty behavior can be classified into the following two cases:

1. The error on $a_{out}$ and $b_{out}$: $a_{out}$ and $b_{out}$ are global lines. Errors on global lines can be detected by comparing the primary inputs and outputs of the U array.
2. The error on $c_{out}$: In the first round, an error on $c_{out}$ of cell $U_{i,j}$ affects the output bit $c''_j$. In the second round, this error affects the output bit $c''_{\langle j+1\rangle}$, where $\langle x\rangle$ denotes $x \bmod (mt+1)$. In the third round, it alters the output bit $c''_{\langle j+2\rangle}$. In other words, the error influences different bits in different rounds, so it is corrected by voting. Similarly, the results of the fourth, fifth, and sixth rounds are voted on to give the correct C2. □

Since no GNB multiplier with CEC has been found in the literature, only comparisons of the proposed GNB multiplier with and without CEC capability are provided. Table 3 reveals that only a small overhead is necessary to provide CEC capability. For example, CEC adds only about 0.57 percent space complexity and 1.91 percent time complexity for $m = 233$ and type 2. The voters are implemented with an AND-OR architecture. Notably, existing finite field multipliers over $GF(2^m)$ that rely on parity checking cannot provide CEC capability, whereas the proposed GNB multiplier easily provides it using the RESO concept, adding no more than 2 percent extra space and time complexity.

6 CONCLUSION

This study builds a semisystolic type-$t$ GNB multiplier over $GF(2^m)$. No systolic array architecture for GNB multiplication had previously been reported in the literature. The proposed semisystolic GNB multiplier is suitable for VLSI chip implementation. It saves about 6 percent in space complexity and 27 percent in time complexity, at the cost of lower throughput, compared with existing systolic optimal normal basis (ONB) multipliers of type 2. The proposed GNB multiplier can be converted into a multiplier with concurrent error detection and correction with no hardware modification of the systolic array itself. The proposed GNB multiplier with concurrent error detection, including its equality checkers, requires only 0.6 percent more space and 1 percent more time complexity than the proposed multiplier without concurrent error detection for $m = 233$ and $t = 2$. Furthermore, the proposed GNB multiplier with concurrent error correction requires only 0.6 percent more space and 2 percent more time complexity than that without concurrent error correction. Since a GNB of type 2 is the same as an ONB of type 2, the proposed GNB multiplier is superior to existing type-2 ONB multipliers in both space and time complexity.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous referees and the editor for carefully reading the paper and for their great help in improving it. They would also like to thank the National Science Council of the Republic of China for financially supporting this research under Contract No. NSC96-2221-E-231-006.

REFERENCES

[1] F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes. North Holland, 1977.

[2] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their Applications. Cambridge Univ. Press, 1994.

[3] R.E. Blahut, Fast Algorithms for Digital Signal Processing. Addison-Wesley, 1985.

[4] T.C. Bartee and D.J. Schneider, “Computation with Finite Fields,” Information and Control, vol. 6, pp. 79-98, Mar. 1963.

[5] E.D. Mastrovito, “VLSI Architectures for Multiplication over Finite Field GF(2^m),” Proc. Sixth Int’l Conf. Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes (AAECC-6), T. Mora, ed., pp. 297-309, July 1988.

[6] C.K. Koc and B. Sunar, “Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,” IEEE Trans. Computers, vol. 47, no. 3, pp. 353-356, Mar. 1998.

[7] T. Itoh and S. Tsujii, “Structure of Parallel Multipliers for a Class of Fields GF(2^m),” Information and Computation, vol. 83, pp. 21-40, 1989.

[8] C.Y. Lee, E.H. Lu, and J.Y. Lee, “Bit-Parallel Systolic Multipliers for GF(2^m) Fields Defined by All-One and Equally-Spaced Polynomials,” IEEE Trans. Computers, vol. 50, no. 5, pp. 385-393, May 2001.

[9] C. Paar, “A New Architecture for a Parallel Finite Field Multiplier with Low Complexity Based on Composite Fields,” IEEE Trans. Computers, vol. 45, no. 7, pp. 856-861, July 1996.

[10] H. Wu, “Bit-Parallel Finite Field Multiplier and Squarer Using Polynomial Basis,” IEEE Trans. Computers, vol. 51, no. 7, pp. 750-758, July 2002.

[11] H. Fan and M.A. Hasan, “A New Approach to Subquadratic Space Complexity Parallel Multipliers for Extended Binary Fields,” IEEE Trans. Computers, vol. 56, no. 2, pp. 224-233, Feb. 2007.

[12] H. Wu, M.A. Hasan, and I.F. Blake, “New Low-Complexity Bit-Parallel Finite Field Multipliers Using Weakly Dual Bases,” IEEE Trans. Computers, vol. 47, no. 11, pp. 1223-1234, Nov. 1998.

[13] S.T.J. Fenn, M. Benaissa, and D. Taylor, “GF(2^m) Multiplication and Division over the Dual Basis,” IEEE Trans. Computers, vol. 45, no. 3, pp. 319-327, Mar. 1996.

[14] M. Wang and I.F. Blake, “Bit Serial Multiplication in Finite Fields,” SIAM J. Discrete Math., vol. 3, no. 1, pp. 140-148, Feb. 1990.

[15] E.R. Berlekamp, “Bit-Serial Reed-Solomon Encoder,” IEEE Trans. Information Theory, vol. 28, no. 6, pp. 869-874, Nov. 1982.

[16] C.Y. Lee and C.W. Chiou, “Efficient Design of Low-Complexity Bit-Parallel Systolic Hankel Multipliers to Implement Multiplication in Normal and Dual Bases of GF(2^m),” IEICE Trans. Fundamentals of Electronics, Comm. and Computer Science, vol. E88-A, no. 11, pp. 3169-3179, Nov. 2005.

[17] J.L. Massey and J.K. Omura, Computational Method and Apparatus for Finite Field Arithmetic, US patent 4,587,627, May 1986.

856 IEEE TRANSACTIONS ON COMPUTERS, VOL. 58, NO. 6, JUNE 2009

TABLE 3
Comparisons of GNB Multipliers with and without CEC Capability

Authorized licensed use limited to: LUNGHWA UNIV OF SCIENCE AND TECHNOLOGY. Downloaded on April 30, 2009 at 23:05 from IEEE Xplore. Restrictions apply.

[18] C.C. Wang, T.K. Truong, H.M. Shao, L.J. Deutsch, J.K. Omura, and I.S. Reed, “VLSI Architectures for Computing Multiplications and Inverses in GF(2^m),” IEEE Trans. Computers, vol. 34, no. 8, pp. 709-717, Aug. 1985.

[19] A. Reyhani-Masoleh, “Efficient Algorithms and Architectures for Field Multiplication Using Gaussian Normal Bases,” IEEE Trans. Computers, vol. 55, no. 1, pp. 34-47, Jan. 2006.

[20] C.W. Chiou and C.Y. Lee, “Multiplexer-Based Double-Exponentiation for Normal Basis of GF(2^m),” Computers and Security, vol. 24, no. 1, pp. 83-86, 2005.

[21] G.B. Agnew, R.C. Mullin, I.M. Onyszchuk, and S.A. Vanstone, “An Implementation for a Fast Public-Key Cryptosystem,” J. Cryptology, vol. 3, pp. 63-79, 1991.

[22] M.A. Hasan, M.Z. Wang, and V.K. Bhargava, “A Modified Massey-Omura Parallel Multiplier for a Class of Finite Fields,” IEEE Trans. Computers, vol. 42, no. 10, pp. 1278-1280, Oct. 1993.

[23] S. Kwon, “A Low Complexity and a Low Latency Bit Parallel Systolic Multiplier over GF(2^m) Using an Optimal Normal Basis of Type II,” Proc. 16th IEEE Symp. Computer Arithmetic, pp. 196-202, June 2003.

[24] H. Fan and M.A. Hasan, “Subquadratic Computational Complexity Schemes for Extended Binary Field Multiplication Using Optimal Normal Bases,” IEEE Trans. Computers, vol. 56, no. 10, pp. 1435-1437, Oct. 2007.

[25] D.W. Ash, I.F. Blake, and S.A. Vanstone, “Low Complexity Normal Bases,” Discrete Applied Math., vol. 25, pp. 191-210, 1989.

[26] ANSI X9.62, Public Key Cryptography for the Financial Services Industry: The Elliptic Curve Digital Signature Algorithm (ECDSA), Am. Nat’l Standards Inst., 1999.

[27] FIPS 186-2, Digital Signature Standard (DSS), Federal Information Processing Standards Publication 186-2, Nat’l Inst. of Standards and Technology, 2000.

[28] IEEE Standard 1363-2000, IEEE Standard Specifications for Public-Key Cryptography, Jan. 2000.

[29] D. Boneh, R. DeMillo, and R. Lipton, “On the Importance of Checking Cryptographic Protocols for Faults,” Proc. Ann. Int’l Conf. Eurocrypt, pp. 37-51, 1997.

[30] E. Biham and A. Shamir, “Differential Fault Analysis of Secret Key Cryptosystems,” Proc. Int’l Conf. Cryptology, pp. 513-525, 1997.

[31] J. Kelsey, B. Schneier, D. Wagner, and C. Hall, “Side-Channel Cryptanalysis of Product Ciphers,” Proc. European Symp. Research in Computer Security (ESORICS), pp. 97-110, Sept. 1998.

[32] R.J. Anderson and M. Kuhn, “Low Cost Attack on Tamper Resistant Devices,” Proc. Fifth Int’l Workshop Security Protocols, 1997.

[33] I. Biehl, B. Meyer, and V. Muller, “Differential Fault Attacks on Elliptic Curve Cryptosystems,” Proc. Int’l Conf. Cryptology 2000, pp. 131-146, 2000.

[34] M. Ciet and M. Joye, “Elliptic Curve Cryptosystems in the Presence of Permanent and Transient Faults,” Cryptology ePrint Archive, 2003/028, http://eprint.iacr.org/2003/028.pdf, 2003.

[35] J. Blomer, M. Otto, and J.-P. Seifert, “Sign Change Fault Attacks on Elliptic Curve Cryptosystems,” Proc. Int’l Workshop Fault Diagnosis and Tolerance in Cryptography (FDTC ’06), pp. 36-52, 2006.

[36] R. Karri, G. Kuznetsov, and M. Goessel, “Parity-Based Concurrent Error Detection of Substitution-Permutation Network Block Ciphers,” Proc. Int’l Workshop Cryptographic Hardware and Embedded Systems (CHES ’03), pp. 113-124, 2003.

[37] G. Bertoni, L. Breveglieri, I. Koren, P. Maistri, and V. Piuri, “Error Analysis and Detection Procedures for a Hardware Implementation of the Advanced Encryption Standard,” IEEE Trans. Computers, vol. 52, no. 4, pp. 492-505, Apr. 2003.

[38] M. Joye, A.K. Lenstra, and J.-J. Quisquater, “Chinese Remaindering Based Cryptosystems in the Presence of Faults,” J. Cryptology, vol. 12, pp. 241-245, 1999.

[39] D. Boneh, R.A. DeMillo, and R.J. Lipton, “On the Importance of Eliminating Errors in Cryptographic Computations,” J. Cryptology, vol. 14, pp. 101-119, 2001.

[40] S. Fenn, M. Gossel, M. Benaissa, and D. Taylor, “On-Line Error Detection for Bit-Serial Multipliers in GF(2^m),” J. Electronic Testing: Theory and Applications, vol. 13, pp. 29-40, 1998.

[41] A. Reyhani-Masoleh and M.A. Hasan, “Error Detection in Polynomial Basis Multipliers over Binary Extension Fields,” Proc. Int’l Workshop Cryptographic Hardware and Embedded Systems (CHES ’02), pp. 515-528, 2003.

[42] A. Reyhani-Masoleh and M.A. Hasan, “Fault Detection Architectures for Field Multiplication Using Polynomial Bases,” IEEE Trans. Computers, vol. 55, no. 9, pp. 1089-1103, Sept. 2006.

[43] C.-Y. Lee, C.W. Chiou, and J.-L. Lin, “Concurrent Error Detection in a Bit-Parallel Systolic Multiplier for Dual Basis of GF(2^m),” J. Electronic Testing: Theory and Applications, vol. 21, no. 5, pp. 539-549, 2005.

[44] C.W. Chiou, “Concurrent Error Detection in Array Multipliers for GF(2^m) Fields,” IEE Electronics Letters, vol. 38, no. 14, pp. 688-689, July 2002.

[45] C.W. Chiou, C.Y. Lee, and J.M. Lin, “Concurrent Error Detection in a Polynomial Basis Multiplier over GF(2^m),” J. Electronic Testing: Theory and Applications, vol. 22, no. 2, pp. 143-150, Apr. 2006.

[46] C.W. Chiou, C.Y. Lee, A.W. Deng, and J.M. Lin, “Concurrent Error Detection in Montgomery Multiplication over GF(2^m),” IEICE Trans. Fundamentals of Electronics, Comm., and Computer Science, vol. E89-A, no. 2, pp. 566-574, Feb. 2006.

[47] J.H. Patel and L.Y. Fung, “Concurrent Error Detection in ALU’s by Recomputing with Shifted Operands,” IEEE Trans. Computers, vol. 31, no. 7, pp. 589-595, July 1982.

[48] J.H. Patel and L.Y. Fung, “Concurrent Error Detection in Multiply and Divide Arrays,” IEEE Trans. Computers, vol. 32, no. 4, pp. 417-422, Apr. 1983.

[49] A.J. Menezes, Applications of Finite Fields. Kluwer Academic Publishers, 1993.

[50] I.F. Blake, R.M. Roth, and G. Seroussi, “Efficient Arithmetic in GF(2^m) through Palindromic Representation,” Technical Report HPL-98-134, http://www.hpl.hp.com/techreports/98/HPL-98-134.html, 1998.

[51] H.Y. Kim, J.Y. Park, J.H. Cheon, J.H. Park, J.H. Kim, and S.G. Hahn, “Fast Elliptic Curve Point Counting Using Gaussian Normal Basis,” Proc. Ann. Int’l Conf. EUROCRYPT 2002, pp. 14-28, 2002.

[52] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A System Perspective. Addison-Wesley, 1985.

[53] M74HC86, Quad Exclusive OR Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/2006.pdf, 2001.

[54] M74HC08, Quad 2-Input AND Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1885.pdf, 2001.

[55] M74HC279, Quad S̄-R̄ Latch, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1937.pdf, 2001.

[56] M74HC32, Quad 2-Input OR Gate, STMicroelectronics, http://www.st.com/stonline/books/pdf/docs/1944.pdf, 2001.
