Bit-Parallel Finite Field Multipliers for Irreducible Trinomials


Jose Luis Imana, Juan Manuel Sanchez, and Francisco Tirado, Senior Member, IEEE

Abstract—A new formulation for the canonical basis multiplication in the finite fields $GF(2^m)$ based on the use of a triangular basis and on the decomposition of a product matrix is presented. From this algorithm, a new method for multiplication (named transpositional) applicable to general irreducible polynomials is deduced. The transpositional method is based on the computation of 1-cycles and 2-cycles given by a permutation defined by the coordinate of the product to be computed and by the cardinality of the field $GF(2^m)$. The obtained cycles define groups corresponding to subexpressions that can be shared among the different product coordinates. This new multiplication method is applied to five types of irreducible trinomials. These polynomials have been widely studied due to their low-complexity implementations. The theoretical complexity analysis of the corresponding bit-parallel multipliers shows that the space complexities of our multipliers match the best results known to date for similar canonical $GF(2^m)$ multipliers. The most important new result is the reduction, in two of the five studied trinomials, of the time complexity with respect to the best known results.

Index Terms—Finite (or Galois) fields, multiplication, canonical basis, irreducible trinomials, complexity, triangular basis, matrix decomposition, permutation, cycles, transpositions.

1 INTRODUCTION

EFFICIENT hardware implementations of arithmetic operations in the Galois field $GF(2^m)$ are highly desirable for several applications, such as coding theory, computer algebra, and cryptography [11], [13]. The efficiency of the hardware implementations is measured in terms of the number of gates (XOR and AND) and of the total gate delay of the circuit ($T_{XOR}$ and $T_{AND}$). The representation of the field elements has a crucial role in determining the space and time complexities of the arithmetic operations, particularly the multiplication, which is considered the most important building block. A number of efficient $GF(2^m)$ multiplication approaches and architectures have been proposed in which different basis representations of field elements are used. Among them, the most widely used are the canonical (or standard or polynomial) [9], [12], normal [14], and dual [1] bases, although others, such as the triangular [4] basis, can also be used. The complexity of the multiplier also depends on the defining irreducible polynomial selected for the field. In this paper, we are interested in the design of bit-parallel finite field multipliers using the canonical basis for irreducible trinomials.

The canonical basis multiplication requires a polynomial multiplication followed by a modular reduction. An efficient bit-parallel multiplier was proposed by Mastrovito [12] in which a product matrix is introduced to combine these two steps. The entries in this matrix can be computed efficiently by sharing common items, a method that is known as subexpression sharing [15]. The Mastrovito multipliers using the special irreducible trinomials have been widely studied due to their low-complexity implementations [2], [3], [17], [20]. All these works exploit subexpression sharing in order to find an efficient architecture for the multipliers. It has been shown in [17] that, when the irreducible trinomial is of the general form $x^m + x^n + 1$ for $n = 1, 2, \ldots, m-1$ and $m \neq 2n$, the Mastrovito multiplier requires only $(m^2 - 1)$ XOR gates and $m^2$ AND gates. However, the required number of XOR gates is reduced to $(m^2 - \frac{m}{2})$ for the trinomial $x^m + x^{m/2} + 1$, where $m$ is even [17].

In this paper, we present a new canonical basis multiplication method named transpositional. This method was introduced in [8], but only for fields generated by irreducible AOPs (all-one-polynomials). In this contribution, we generalize the method and apply it to irreducible trinomials. In order to do so, we give a new general formulation for the canonical basis multiplication in $GF(2^m)$ based on the use of a triangular basis. This approach introduces a product matrix that can be decomposed into a sum of matrices depending on the irreducible polynomial selected for the field. The decomposition of a matrix is an idea already used in similar multiplication approaches [7], [10], [20] over finite fields that try to exploit subexpression sharing in order to obtain $GF(2^m)$ multipliers with reduced complexities. From this matrix decomposition, the transpositional multiplication method is deduced. It uses the group-theory notation for permutations and is based on the computation of 1-cycles and 2-cycles given by the permutation defined by the coordinate of the product to be computed and by the cardinality of the field $GF(2^m)$. The obtained cycles define groups corresponding to subexpressions in sum-of-products form that can be shared among the different expressions of the product coordinates. A very important characteristic of our method is that these first groups of subexpressions and the first matrix obtained from the product matrix decomposition are common to any irreducible polynomial we could select, so our method can be completely generalized to perform the multiplication over general irreducible polynomials. In this paper, the transpositional method is applied to five types of irreducible trinomials, for which more complex groups of subexpressions for sharing are determined. We also present explicit expressions for multiplication for these five types of trinomials. These expressions can be easily coded using hardware description languages, such as VHDL and Verilog, to implement optimized multipliers. This coding can be done without any knowledge of finite field arithmetic. The theoretical complexity analyses of the corresponding bit-parallel multipliers show that the space complexities of our multipliers match the best results found in the literature, while our method reduces, in two of the five studied trinomials, the best time complexities known to date for similar multipliers.

520 IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 5, MAY 2006

• J.L. Imana and F. Tirado are with the Departamento de Arquitectura de Computadores y Automatica, Facultad Ciencias Fisicas, Universidad Complutense, 28040 Madrid, Spain. E-mail: {jluimana, ptirado}@dacya.ucm.es.

• J.M. Sanchez is with the Departamento de Informatica, Escuela Politecnica, Avda. Universidad s/n, 10071 Caceres, Spain. E-mail: [email protected].

Manuscript received 28 May 2004; revised 30 May 2005; accepted 20 Oct. 2005; published online 22 Mar. 2006. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-0181-0504.

0018-9340/06/$20.00 © 2006 IEEE Published by the IEEE Computer Society

We first introduce some basic concepts of the Galois fields $GF(2^m)$ in Section 2. In Section 3, a new formulation for the canonical basis multiplication over $GF(2^m)$ is presented. The application of the new multiplication algorithm to irreducible trinomials and the product matrix decomposition are given in Section 4. From these, the first expressions for the transpositional multiplication method are given in Section 5, where a new notation based on permutations is used and where a first grouping of subexpressions can be observed. The transpositional method is applied to five types of trinomials in Sections 6 to 10, where new particular expressions and more complex groups $G_i$ of subexpressions are given for each trinomial. Theoretical complexity analyses and some examples are also presented. Finally, the conclusions are summarized in Section 11.

2 PRELIMINARIES

The finite field $GF(2^m)$ can be considered as a vector space of dimension $m$ over the binary field $GF(2)$. Therefore, elements of $GF(2^m)$ are represented by binary vectors of length $m$. Field addition is realized in all bases by a bit-wise XOR operation, whereas the structure of multiplication is determined by the choice of the basis. A canonical basis $\Omega$ is the set of elements $\Omega = \{1, \omega, \omega^2, \ldots, \omega^{m-1}\}$, where $\omega$ is a root in $GF(2^m)$ of an irreducible polynomial $f(x) = \sum_{i=0}^{m} f_i x^i$ of degree $m$ over $GF(2)$. Using this basis, the elements of the field $GF(2^m)$ are polynomials of degree at most $m-1$ over $GF(2)$ and arithmetic is carried out modulo $f(x)$. The set $\Lambda = \{\lambda_0, \lambda_1, \ldots, \lambda_{m-1}\}$ of $m$ elements is called the triangular basis [5], [4] of $\Omega$ if $\lambda_i = \sum_{j=0}^{m-1-i} f_{i+j+1}\,\omega^j$, $0 \le i \le m-1$, where the $f_i$ are the coefficients of $f(x)$. An element $\alpha \in GF(2^m)$ can be represented with respect to $\Omega$ as

$$\alpha = \sum_{i=0}^{m-1} a_{\Omega_i}\,\omega^i = (1, \omega, \ldots, \omega^{m-1}) \cdot (a_{\Omega_0}, a_{\Omega_1}, \cdots, a_{\Omega_{m-1}})^T,$$

where the $a_{\Omega_i}$ are the coordinates of $\alpha$ with respect to $\Omega$. We denote by $\alpha_\Omega$ the vector of the coordinates of $\alpha$ with respect to $\Omega$, i.e., $\alpha_\Omega = (a_{\Omega_0}, \ldots, a_{\Omega_{m-1}})^T$. The coordinates vector of $\alpha$ with respect to $\Lambda$ can be computed as $\alpha_\Lambda = T \cdot \alpha_\Omega$, where [4]

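$$T = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & t_1 \\
0 & 0 & 0 & \cdots & 1 & t_1 & t_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 1 & t_1 & \cdots & t_{m-4} & t_{m-3} & t_{m-2} \\
1 & t_1 & t_2 & \cdots & t_{m-3} & t_{m-2} & t_{m-1}
\end{pmatrix} \tag{1}$$

and $t_j = \sum_{i=0}^{j-1} f_{m-j+i}\,t_i$, for $0 < j \le m-1$, and $t_0 = 1$. We also have that the $m \times m$ Hankel matrix $H(\alpha_\Lambda) = (\alpha_\Lambda, (\alpha\omega)_\Lambda, \ldots, (\alpha\omega^{m-1})_\Lambda)$ can be computed with the expression [4]

$$(\alpha\omega^{i+1})_{\Lambda_j} = \begin{cases}
(\alpha\omega^{i})_{\Lambda_{j+1}} & 0 \le j \le m-2 \\
\sum_{l=0}^{m-1} (\alpha\omega^{i})_{\Lambda_l}\, f_l & j = m-1.
\end{cases} \tag{2}$$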

Using the above concepts, we present in the following a new general formulation for the canonical basis multiplication in $GF(2^m)$ based on the use of a triangular basis.
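The construction of $T$ in (1) and the change of coordinates can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `transform_matrix`, `to_triangular`, and `from_triangular` are hypothetical, a polynomial is assumed to be stored as a list of bits `f[0..m]`, and the inverse map is read directly from the definition of the triangular basis elements so that the round trip can be checked.

```python
def transform_matrix(f):
    """Build T of equation (1); f[i] is the coefficient f_i, so len(f) == m + 1."""
    m = len(f) - 1
    t = [1]                                        # t_0 = 1
    for j in range(1, m):                          # t_j = sum_{i<j} f_{m-j+i} t_i over GF(2)
        t.append(sum(f[m - j + i] & t[i] for i in range(j)) % 2)
    # Entry (r, k) is t_{r+k-(m-1)} on or below the anti-diagonal and 0 above it.
    return [[t[r + k - (m - 1)] if r + k >= m - 1 else 0 for k in range(m)]
            for r in range(m)]


def to_triangular(f, coords):
    """Canonical -> triangular coordinates: T . coords over GF(2)."""
    T = transform_matrix(f)
    return [sum(row[k] & coords[k] for k in range(len(coords))) % 2 for row in T]


def from_triangular(f, tri):
    """Triangular -> canonical, read off lambda_i = sum_j f_{i+j+1} w^j."""
    m = len(f) - 1
    return [sum(f[i + j + 1] & tri[i] for i in range(m - j)) % 2 for j in range(m)]


if __name__ == "__main__":
    f = [1, 0, 0, 0, 0, 1, 1]                      # x^6 + x^5 + 1, stored as f_0 .. f_6
    c = [1, 0, 1, 1, 0, 1]
    print(to_triangular(f, c), from_triangular(f, to_triangular(f, c)) == c)
```

Storing coordinates as plain bit lists keeps the indices aligned with the subscripts used in the text, at the cost of speed.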

3 CANONICAL BASIS MULTIPLICATION

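Let $\alpha, \gamma, \delta \in GF(2^m)$ and $\alpha_\Omega, \gamma_\Omega, \delta_\Omega$ be their coordinate vectors, respectively, with respect to $\Omega$. The product $\delta = \alpha \cdot \gamma$ can then be performed as follows:

1. Represent $\gamma$ in the triangular basis $\Lambda$ as $\gamma_\Lambda = T \cdot \gamma_\Omega$.

2. Construct the Hankel matrix for $\gamma_\Lambda$,

$$H(\gamma_\Lambda) = (\gamma_\Lambda, (\gamma\omega)_\Lambda, \ldots, (\gamma\omega^{m-1})_\Lambda).$$

3. Construct a new $m \times m$ matrix, $K(\gamma_\Lambda)$, defined as

$$K(\gamma_\Lambda) = H(\gamma_\Lambda) \cdot F, \tag{3}$$

where $F$ is a new $m \times m$ Toeplitz matrix defined as

$$F = \begin{pmatrix}
f_m & f_{m-1} & f_{m-2} & \cdots & f_2 & f_1 \\
0 & f_m & f_{m-1} & \cdots & f_3 & f_2 \\
0 & 0 & f_m & \cdots & f_4 & f_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & f_m & f_{m-1} \\
0 & 0 & 0 & \cdots & 0 & f_m
\end{pmatrix} \tag{4}$$

and where the $f_i$, $1 \le i \le m$, are the coefficients of the irreducible polynomial $f(x)$.

4. Let $\delta^R_\Omega$ be the inverted coordinates of $\delta_\Omega$. The product $\delta = \alpha \cdot \gamma$ in the canonical basis $\Omega$ can be finally computed as

$$\delta^R_\Omega = (d_{\Omega_{m-1}}, d_{\Omega_{m-2}}, \ldots, d_{\Omega_1}, d_{\Omega_0}) = \alpha^T_\Omega \cdot K(\gamma_\Lambda). \tag{5}$$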
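Steps 1 to 4 can be cross-checked against ordinary multiply-then-reduce arithmetic. The following Python sketch is illustrative only: `mul_reference` and `mul_triangular` are hypothetical names, elements are assumed to be lists of canonical coordinates (index $i$ holds the coefficient of $\omega^i$), and `f` lists the coefficients $f_0, \ldots, f_m$.

```python
def mul_reference(f, a, c):
    """Schoolbook product a(x) c(x) followed by reduction modulo f(x) over GF(2)."""
    m = len(f) - 1
    prod = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            prod[i + j] ^= a[i] & c[j]
    for k in range(2 * m - 2, m - 1, -1):          # clear x^k using x^m = sum_{l<m} f_l x^l
        if prod[k]:
            prod[k] = 0
            for l in range(m):
                prod[k - m + l] ^= f[l]
    return prod[:m]


def mul_triangular(f, a, c):
    """Product coordinates d_0 .. d_{m-1} following steps 1-4 and equations (1)-(5)."""
    m = len(f) - 1
    # Step 1: second operand in the triangular basis, using the t_j of (1).
    t = [1]
    for j in range(1, m):
        t.append(sum(f[m - j + i] & t[i] for i in range(j)) % 2)
    col = [sum(t[r + k - (m - 1)] & c[k] for k in range(m - 1 - r, m)) % 2
           for r in range(m)]
    # Step 2: Hankel matrix, built column by column with (2).
    cols = [col]
    for _ in range(m - 1):
        last = sum(col[l] & f[l] for l in range(m)) % 2
        col = col[1:] + [last]
        cols.append(col)
    H = [[cols[k][j] for k in range(m)] for j in range(m)]
    # Step 3: K = H . F, where F[i][k] = f_{m-(k-i)} for k >= i and 0 otherwise.
    K = [[sum(H[r][i] & f[m - (k - i)] for i in range(k + 1)) % 2 for k in range(m)]
         for r in range(m)]
    # Step 4: the row vector a^T . K holds (d_{m-1}, ..., d_0); reverse it.
    d_rev = [sum(a[r] & K[r][k] for r in range(m)) % 2 for k in range(m)]
    return d_rev[::-1]


if __name__ == "__main__":
    f = [1, 0, 0, 0, 0, 1, 1]                      # x^6 + x^5 + 1
    a, c = [1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 0, 1]
    print(mul_triangular(f, a, c) == mul_reference(f, a, c))
```

The sketch favors a one-to-one match with the equations over efficiency: every matrix is materialized, which a hardware description would not do.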

In (5), the product matrix $K$ depends on the irreducible polynomial selected and the coordinates are given in sum-of-products form, where these operations are performed over $GF(2)$. $K$ can be decomposed into a sum of matrices in such a way that some groups of subexpressions can be shared among the product coordinates. This grouping of shared subexpressions constitutes a new method for the canonical basis multiplication that we have named transpositional. The number and structure of the matrices obtained from the decomposition of $K$ depend on the irreducible polynomial selected. Subexpression sharing [15] has been used by many authors [2], [3], [6], [9], [10], [17], [19], [20] to find low-complexity architectures for finite field multipliers. In this work, we apply the multiplication algorithm given in (5) to irreducible trinomials. From this application, we deduce the transpositional multiplication method and present the product coordinate expressions obtained for five types of irreducible trinomials. The theoretical space and time complexities for the bit-parallel multipliers are also given.

4 IRREDUCIBLE TRINOMIALS

In the canonical basis $\Omega$, the product $\delta$ of two elements $\alpha$ and $\gamma$, $\delta = \alpha \cdot \gamma$, from a finite field generated by an irreducible trinomial $f(x) = x^m + x^n + 1$ can be performed using the new multiplication algorithm given in Section 3. When irreducible trinomials are used, the $F$ matrix presents only two nonzero terms $f_i$, corresponding to the nonzero coefficients $f_m$ and $f_n$ of the trinomial. The $K$ matrix can be decomposed into a sum of matrices whose number depends on the values of $m$ and $n$ corresponding to $f_m$ and $f_n$ of the selected trinomial. The decomposition of $K$ into a sum of $m \times m$ matrices is performed in the following form:

$$K = K_0 + \sum_{i=1}^{\tau} K_i, \tag{6}$$

where $\tau = \lceil \frac{m-1}{m-n} \rceil$ and where the $K_0$ matrix is common to any selected irreducible trinomial (moreover, it can be proven that $K_0$ is common to any irreducible polynomial). The structure of $K_0$ is the following:

$$K_0 = \begin{pmatrix}
c_{\Omega_{m-1}} & c_{\Omega_{m-2}} & \cdots & c_{\Omega_1} & c_{\Omega_0} \\
c_{\Omega_{m-2}} & c_{\Omega_{m-3}} & \cdots & c_{\Omega_0} & c_{\Omega_{m-1}} \\
c_{\Omega_{m-3}} & c_{\Omega_{m-4}} & \cdots & c_{\Omega_{m-1}} & c_{\Omega_{m-2}} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
c_{\Omega_1} & c_{\Omega_0} & \cdots & c_{\Omega_3} & c_{\Omega_2} \\
c_{\Omega_0} & c_{\Omega_{m-1}} & \cdots & c_{\Omega_2} & c_{\Omega_1}
\end{pmatrix}, \tag{7}$$

where the $c_{\Omega_i}$ terms are the coordinates of $\gamma$ with respect to $\Omega$ and where $K_0$ is a Hankel matrix. From the decomposition given in (6), we have that, for any trinomial $f(x) = x^m + x^n + 1$, there exists at least one matrix $K_1$ whose structure is given in (8), where the subscripts $\Omega$ have been removed from the terms $c_{\Omega_i}$ for clarity.

$$K_1 = \begin{pmatrix}
0 & \cdots & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & \cdots & 0 & c_{m-1} & 0 & 0 & 0 & \cdots & 0 \\
0 & \cdots & c_{m-1} & c_{m-2} & 0 & 0 & 0 & \cdots & 0 \\
0 & \cdots & c_{m-2} & c_{m-3} & 0 & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & c_{n+2} & c_{n+1} & 0 & 0 & 0 & \cdots & 0 \\
c_{m-1} & \cdots & c_{n+1} & c_n & 0 & 0 & 0 & \cdots & 0 \\
c_{m-2} & \cdots & c_n & c_{n-1} & 0 & 0 & 0 & \cdots & c_{m-1} \\
\vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
c_{m-n+1} & \cdots & c_3 & c_2 & 0 & 0 & c_{m-1} & \cdots & c_{m-n+2} \\
c_{m-n} & \cdots & c_2 & c_1 & 0 & c_{m-1} & c_{m-2} & \cdots & c_{m-n+1}
\end{pmatrix}. \tag{8}$$

The $K_1$ matrix presents some particularities. The first one is that its $n$th column (numbered from 1 to $m$, from right to left) has all its elements null, where the index $n$ of this column corresponds to the index of the nonzero coefficient $f_n$ of the selected irreducible trinomial. The $n$th column can be considered to divide the $m \times m$ matrix $K_1$ into two submatrices. The first one is made up of the $m-n$ columns located to the left of the null column and the second one is made up of the $n-1$ columns to the right of the $n$th column. We denote the left $m \times (m-n)$ submatrix as $L_1$ and the right $m \times (n-1)$ submatrix as $R_1$, in such a way that $K_1$ can be represented as $K_1 = (L_1|R_1)$. The submatrices $L_1$ and $R_1$ are constructed by the consecutive shifting of their less significant columns one row down with zero insertion in the top one. The most significant column of $L_1$ is made up of $m-n$ zeros in the upper rows and of $n$ not null rows with $c_i$ coefficients, while the least significant column of $R_1$ is made up of $m-n+1$ null rows and of $n-1$ rows with $c_i$ coefficients.

The existence of matrices $K_2, K_3, \ldots$ in the summation given in (6) depends on the position of the $n$th null column in $K_1$, as determined in the following. Starting with $K_1$, there will exist a $K_2 = (L_2|R_2)$ matrix in the decomposition of $K$ if the most significant column of $L_1$ has $n > 1$ not null $c_i$ terms. The construction of the submatrices $L_2$ and $R_2$ is performed using $R_1$ in the following form:

• If $m < 2n-1$, the submatrix $L_2$ consists of the first $m-n$ columns from $R_1$ and the remaining $2n-m-1$ columns from $R_1$ are the first columns of $R_2$. The remaining $m-n$ columns of $R_2$ will be null columns. In this case, a $K_3$ matrix also exists and is constructed from $K_2$ in an analogous way.

• If $m \ge 2n-1$, then $K_2$ is the last matrix in the summation given in (6). The submatrix $L_2$ consists of the $n-1$ columns from $R_1$, while the remaining $m-2n+1$ columns of $L_2$ (if they exist) are completed with null columns. In this case, $R_2$ will be a submatrix with all its elements null.

In general, for any $K_j = (L_j|R_j)$ matrix with $j \ge 2$, it is verified that if the $(m-n)$th column from $L_j$ presents $h > 1$ coefficients $c_i$ in its lower rows, then there will exist a matrix $K_{j+1} = (L_{j+1}|R_{j+1})$ whose $L_{j+1}$ and $R_{j+1}$ submatrices are constructed from $R_j$ as follows:

• If $h-1 > m-n$, then $L_{j+1}$ consists of the first $m-n$ columns from $R_j$, while the remaining $h-1-m+n$ columns from $R_j$ are the first columns of $R_{j+1}$, completed with $m-h$ null columns. In this case, there will exist another $K_{j+2}$ matrix constructed in an analogous way to $K_{j+1}$.

• If $h-1 \le m-n$, the first $h-1$ columns of $L_{j+1}$ are the $h-1$ not null columns from $R_j$ completed with $m-n-h+1$ null columns, while $R_{j+1}$ has all its columns null. In this case, no more matrices exist.

The number of $K_j$ matrices, $j \ge 2$, existing in the summation given in (6) depends on the exponent $n$ of the middle term of the selected irreducible trinomial $f(x) = x^m + x^n + 1$. Therefore, special trinomials with a determined number of matrices in the $K$ summation can be distinguished. Once $K$ is decomposed into a sum of $\tau + 1$ matrices, $\tau = \lceil \frac{m-1}{m-n} \rceil$, and using the multiplication algorithm given in Section 3, the product $\delta = \alpha \cdot \gamma$ in the canonical basis $\Omega$ generated by an irreducible trinomial can then be computed by

$$\delta^R_\Omega = \alpha^T_\Omega \cdot K(\gamma_\Lambda) = \alpha^T_\Omega \cdot \left( K_0 + \sum_{i=1}^{\tau} K_i \right) = \alpha^T_\Omega \cdot K_0 + \cdots + \alpha^T_\Omega \cdot K_\tau, \tag{9}$$

where the coordinates of $\delta$ are given as sums of products of the coordinates of $\alpha$ and $\gamma$. From (9), a new method for multiplication (that we have named transpositional) based on the grouping and sharing of common subexpressions is introduced in the following section.

5 TRANSPOSITIONAL METHOD FOR MULTIPLICATION

The product coordinates given in (9) are computed by summing the $\alpha^T_\Omega K_i$ ($i = 0, \ldots, \tau$) vectors, where $\alpha^T_\Omega K_0$ and $\alpha^T_\Omega K_1$ always appear in the summation, independently of the selected trinomial (moreover, the $\alpha^T_\Omega K_0$ vector is common to any irreducible polynomial). The $K_0$ matrix defined in (7) consists of $m$ columns formed by the successive rotation of the coordinates vector of $\gamma$ in $\Omega$, while those of $K_1$ and $K_j$ ($j = 2, 3, \ldots, \tau$) are formed by successive shiftings of the $\gamma$ coordinates vector. An important point is that the components of $\alpha^T_\Omega K_1$ and $\alpha^T_\Omega K_j$ ($j = 2, 3, \ldots, \tau$) consist of sums of products that also appear in the components of $\alpha^T_\Omega K_0$. Common subexpressions grouped in sums of products and extracted from the $\alpha^T_\Omega K_0$ components can therefore be determined and their sharing leads to the reduction of the multiplier complexity. In order to give a new expression for the computation of the vectors $\alpha^T_\Omega K_i$ ($i = 0, 1, \ldots, \tau$), we use the group-theory notation for permutations, which we recall with the following example.

The sum of products $a_0c_1 + a_1c_0 + a_3c_9 + a_4c_8 + a_5c_7 + a_6c_6 + a_7c_5 + a_8c_4 + a_9c_3$ can be considered as the inner product of the coordinates vectors of the field elements $\alpha$ and $\gamma$ represented in $\Omega$ by $(a_0, a_1, a_3, a_4, a_5, a_6, a_7, a_8, a_9)$ and $(c_1, c_0, c_9, c_8, c_7, c_6, c_5, c_4, c_3)$, respectively, where the subscripts $\Omega$ have been omitted for clarity. The product terms $a_ic_j$ can then be represented by the permutation

$$\begin{pmatrix} 0 & 1 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & 0 & 9 & 8 & 7 & 6 & 5 & 4 & 3 \end{pmatrix},$$

where the upper row contains the subscripts of the coordinates of $\alpha$, which are multiplied by the coordinates of $\gamma$ with the subscripts given in the lower row. Starting with the symbol 0, we see that the permutation takes 0 into 1, representing the product $a_0c_1$. We then look for 1 in the upper row and see that the permutation takes 1 into 0 (the product $a_1c_0$), closing a cycle, which we write as $(0, 1)$. This cycle represents the sum $(a_0c_1 + a_1c_0)$ in our notation. We now start with some other symbol in the upper row, say 6. The permutation takes 6 into 6, giving the cycle $(6)$ representing $a_6c_6$. Continuing, we find the cycles $(3, 9)$, $(4, 8)$, and $(5, 7)$. We may finally write our example permutation as the cycles $(0, 1)(3, 9)(4, 8)(5, 7)(6)$. The sum of products given for this example is the addition of the terms $x_{ij} = (a_ic_j + a_jc_i)$ and $x_k = (a_kc_k)$ represented by the 2-cycles $(i, j)$ and 1-cycles $(k)$, respectively. The 2-cycles $(i, j)$ are called, in group theory, transpositions.

Using this new notation, we introduce the functions which provide the cycles for given values of the parameters $i$ and $m$. We group them according to whether or not they depend on $m$ and according to the parity of $i$ as follows:

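• Function independent of $m$ and valid for even values of $i$:

$$EC_{i0} = (h, l) \qquad h = 0, 1, \ldots, \tfrac{i}{2} - 1;\ \ l = (i - h - 1),\ h \ge 0. \tag{10}$$

• Function independent of $m$ and valid for odd values of $i$:

$$OC_{i0} = \begin{cases}
(k) & k = \frac{i-1}{2} \\
(h, l) & h = 0, 1, \ldots, \frac{i-1}{2} - 1;\ \ l = (i - h - 1),\ h \ge 0.
\end{cases} \tag{11}$$

• Functions dependent on $m$ and valid for even or odd values of $i$:

$$EC_{i1} = \begin{cases}
(k) & k = \lceil \frac{m}{2} \rceil + \lfloor \frac{i}{2} \rfloor \\
(p, r) & j = 1, 2, \ldots, \left(\lceil \frac{m}{2} \rceil + \lfloor \frac{i}{2} \rfloor\right) - (i+1);\ \ p = (i + j),\ r = (m - j),
\end{cases} \tag{12}$$

$$OC_{i1} = (p, r) \qquad j = 1, 2, \ldots, \left(\lceil \tfrac{m}{2} \rceil + \lfloor \tfrac{i+1}{2} \rfloor\right) - (i+1);\ \ p = (i + j),\ r = (m - j). \tag{13}$$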
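The cycle functions (10) to (13) translate almost line by line into code. The sketch below is a hedged illustration: the Python names `EC0`, `OC0`, `EC1`, `OC1`, and `cycle_sum` are hypothetical, a 1-cycle is represented as a one-element tuple and a 2-cycle as a pair, and the parity of `i` is assumed to be appropriate for the function being called.

```python
def EC0(i):
    """(10): 2-cycles (h, i-h-1), h = 0 .. i/2 - 1, for even i."""
    return [(h, i - h - 1) for h in range(i // 2)]


def OC0(i):
    """(11): the 1-cycle ((i-1)/2) followed by the 2-cycles (h, i-h-1), for odd i."""
    return [((i - 1) // 2,)] + [(h, i - h - 1) for h in range((i - 1) // 2)]


def EC1(i, m):
    """(12): the 1-cycle (k), k = ceil(m/2) + floor(i/2), then the 2-cycles (i+j, m-j)."""
    k = (m + 1) // 2 + i // 2
    return [(k,)] + [(i + j, m - j) for j in range(1, k - (i + 1) + 1)]


def OC1(i, m):
    """(13): 2-cycles (i+j, m-j), j = 1 .. ceil(m/2) + floor((i+1)/2) - (i+1)."""
    top = (m + 1) // 2 + (i + 1) // 2 - (i + 1)
    return [(i + j, m - j) for j in range(1, top + 1)]


def cycle_sum(cycles, a, c):
    """XOR of x_k = a_k c_k for 1-cycles and x_ij = a_i c_j + a_j c_i for 2-cycles."""
    total = 0
    for cyc in cycles:
        if len(cyc) == 1:
            total ^= a[cyc[0]] & c[cyc[0]]
        else:
            p, r = cyc
            total ^= (a[p] & c[r]) ^ (a[r] & c[p])
    return total


if __name__ == "__main__":
    print(EC0(6), OC0(5), EC1(0, 6), OC1(1, 6))
```

Returning the cycles as data, rather than evaluating the sums directly, makes it easy to inspect which product terms each function groups together.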

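The new functions $EC_{i0}$, $OC_{i0}$, $EC_{i1}$, and $OC_{i1}$ compute the 1-cycles and 2-cycles for given values of $i$ and $m$, and determine the subexpression grouping characteristic of the transpositional method. This is performed by defining the new functions $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$, which carry out the sum of the $x_k$ and $x_{ij}$ terms represented by the cycles given by $EC_{i0}$, $OC_{i0}$, $EC_{i1}$, and $OC_{i1}$, respectively.

The $\alpha^T_\Omega K_0$ vector is made from the sum of the new functions $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$. Moreover, the subexpressions given by $E_{i1}$ and $O_{i1}$ constitute the components of all the $\alpha^T_\Omega K_i$ ($i = 1, \ldots, \tau$) vectors. As these functions are already built for the construction of the $\alpha^T_\Omega K_0$ vector, they do not have to be reconstructed, so their sharing reduces the multiplier complexity. The $\alpha^T_\Omega K_0$ and $\alpha^T_\Omega K_1$ vectors always appear in the summation given in (9). The composition of $\alpha^T_\Omega K_0$ is common to any trinomial selected, but the composition of $\alpha^T_\Omega K_1$ and the number and composition of $\alpha^T_\Omega K_j$ ($j = 2, \ldots, \tau$) in (9) depend on the selected trinomial. Using $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$, the $i$th ($i = 0, \ldots, m-1$) components of $\alpha^T_\Omega K_0$ can be computed for even ($\dagger$ odd) values of $m$ as follows:

$$(\alpha^T_\Omega K_0)_i = \begin{cases}
\dagger O_{(m-i)0} + E_{(m-i-1)1} & \text{odd } i \\
\dagger E_{(m-i)0} + \begin{cases} O_{(m-i-1)1} & \text{even } i,\ i \neq 0 \\ \emptyset & i = 0. \end{cases} &
\end{cases} \tag{14}$$

In (14) and in the following, the symbol $\dagger$ indicates that, for odd values of the given parameter, the $E$ and $O$ functions must be exchanged, that is, $\dagger E_{ij} = O_{ij}$ and $\dagger O_{ij} = E_{ij}$.

The $\alpha^T_\Omega K_1$ and $\alpha^T_\Omega K_j$ ($j = 2, 3, \ldots, \tau$) vectors consist exclusively of the functions $E_{i1}$ and $O_{i1}$ that are among the components of $\alpha^T_\Omega K_0$. We give the following expressions for the computation of the $i$th components of $\alpha^T_\Omega K_1$, for even ($\dagger$ odd) values of $m$ and where a new parameter $\Delta = m - n$ is introduced:

$$(\alpha^T_\Omega K_1)_i = \begin{cases}
\left.\begin{array}{ll}
E_{((\Delta+m)-(i+1))1} & i > \Delta \\
\emptyset & i = \Delta \\
\dagger E_{(\Delta-(i+1))1} & i < \Delta
\end{array}\right\} & \begin{array}{l}(\text{even } i \text{ and odd } \Delta) \text{ or} \\ (\text{odd } i \text{ and even } \Delta)\end{array} \\[2ex]
\left.\begin{array}{ll}
O_{((\Delta+m)-(i+1))1} & i > \Delta \\
\emptyset & i = \Delta \\
\dagger O_{(\Delta-(i+1))1} & i < \Delta
\end{array}\right\} & \begin{array}{l}(\text{odd } i \text{ and odd } \Delta) \text{ or} \\ (\text{even } i \text{ and even } \Delta).\end{array}
\end{cases} \tag{15}$$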

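We can also establish the expressions for the recursive computation of the $i$th components ($i = 0, 1, \ldots, m-1$) of the $\alpha^T_\Omega K_j$ ($j = 2, 3, \ldots, \tau = \lceil \frac{m-1}{\Delta} \rceil$) vectors as follows:

$$(\alpha^T_\Omega K_j)_i = \begin{cases}
\emptyset & i \in \{\Delta, \Delta+1, \ldots, 2\Delta\} \bmod m \\
(\alpha^T_\Omega K_{j-1})_{(i-\Delta) \bmod m} & i \notin \{\Delta, \Delta+1, \ldots, 2\Delta\} \bmod m,
\end{cases} \tag{16}$$

where the computation is performed starting with the knowledge of $\alpha^T_\Omega K_1$. The recursion stops when an $\alpha^T_\Omega K_j$ vector with all its components null is obtained, a null component being represented by the $\emptyset$ symbol in (14) to (16). The last not null vector is $\alpha^T_\Omega K_\tau$, with $\tau = \lceil \frac{m-1}{\Delta} \rceil$, as given in (6).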
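Expressions (14) to (16) can be exercised symbolically by listing, for each vector component, the names of the terms it contains. The following Python sketch is an illustration under stated assumptions: `transposition_vectors` and `product_terms` are hypothetical names, a null component is an empty list, and each term is named by whether it contains a 1-cycle (so $X_{i0}$ is an $O$ term when $i$ is odd, and $X_{i1}$ is an $E$ term when $m-i-1$ is odd), which is how the $\dagger$ exchange is handled here. The string labels are only unambiguous for single-digit indices.

```python
def transposition_vectors(m, n):
    """Term names in the components of alpha^T K_0 .. alpha^T K_tau for x^m + x^n + 1."""
    delta = m - n
    tau = -(-(m - 1) // delta)                     # ceil((m - 1) / delta)

    def x0(i):   # term with i products; it has a 1-cycle (O) exactly when i is odd
        return ("O" if i % 2 else "E") + f"{i}0"

    def x1(i):   # term with m-i-1 products; it has a 1-cycle (E) when that count is odd
        return ("E" if (m - i - 1) % 2 else "O") + f"{i}1"

    # (14): components of alpha^T K_0 (the second term is null for i = 0).
    v0 = [[x0(m - i)] + ([x1(m - i - 1)] if i > 0 else []) for i in range(m)]
    # (15): components of alpha^T K_1.
    v1 = []
    for i in range(m):
        if i > delta:
            v1.append([x1(delta + m - (i + 1))])
        elif i == delta:
            v1.append([])
        else:
            v1.append([x1(delta - (i + 1))])
    vectors = [v0, v1]
    # (16): recursion for j = 2 .. tau.
    null = {k % m for k in range(delta, 2 * delta + 1)}
    for _ in range(2, tau + 1):
        prev = vectors[-1]
        vectors.append([[] if i in null else prev[(i - delta) % m] for i in range(m)])
    return vectors


def product_terms(m, n):
    """Coordinate d_i collects component m-1-i of every vector (sum over all K_j)."""
    vectors = transposition_vectors(m, n)
    return [[name for v in vectors for name in v[m - 1 - i]] for i in range(m)]


if __name__ == "__main__":
    for i, terms in enumerate(product_terms(6, 5)):
        print(f"d{i} =", " + ".join(terms))
```

For $m = 6$ and $n = 5$ the printed sums can be compared by hand with a direct reduction modulo $x^6 + x^5 + 1$.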

From (14) to (16), the general expressions for the coordinates of the product $\delta$ in $\Omega$ of two field elements $\alpha$ and $\gamma$ using the transpositional method for irreducible generating trinomials can be established. The expressions are obtained from (9) and are given as ($i = 0, \ldots, m-1$)

$$d_{\Omega_i} = (\alpha^T_\Omega K_0)_{m-1-i} + (\alpha^T_\Omega K_1)_{m-1-i} + \sum_{j=2}^{\tau} (\alpha^T_\Omega K_j)_{m-1-i}, \tag{17}$$

where the $d_{\Omega_i}$ are the coordinates of $\delta$ in $\Omega$. The number and composition of the vectors $\alpha^T_\Omega K_j$ ($j = 1, 2, \ldots, \tau$) included in the summation given in (17) depend on the trinomial type used. In the following sections, we study the transpositional multiplication method in $\Omega$ for five types of irreducible trinomials and establish the particular equations given by our method. The theoretical complexity analyses of the bit-parallel multipliers thus determined are also presented.

6 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{m-1} + 1$

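When the generating irreducible trinomial has the form $f(x) = x^m + x^{m-1} + 1$, the parameter $\tau = \lceil \frac{m-1}{\Delta} \rceil = m-1$ (with $\Delta = 1$). The decomposition of $K$ given in (6) is the sum $K_0 + K_1 + \cdots + K_{m-1}$, which can also be deduced from the structure of $K_1$ given in (8) and from its properties (see Section 4). From the general expressions of the $d_{\Omega_i}$ coordinates of the product $\delta$ given in (17) and using (14) to (16), we can give the expressions for the computation of the coordinates $d_{\Omega_i}$ of the product. For even values of $m$,

$$d_{\Omega_i} = \begin{cases}
(E_{(i+1)0} + O_{i1}) + \displaystyle\sum_{\substack{j=i+\Delta \\ j\ \mathrm{even}}}^{m-2} E_{j1} + \sum_{\substack{j=i+\Delta \\ j\ \mathrm{odd}}}^{m-2} O_{j1} & \text{odd } i \\[3ex]
(O_{(i+1)0} + E_{i1}) + \begin{cases} \displaystyle\sum_{\substack{j=i+\Delta \\ j\ \mathrm{even}}}^{m-2} E_{j1} + \sum_{\substack{j=i+\Delta \\ j\ \mathrm{odd}}}^{m-2} O_{j1} & \text{even } i \\ \emptyset & i = m-2 \end{cases} & \\[5ex]
(E_{m0}) + \displaystyle\sum_{\substack{j=0 \\ j\ \mathrm{even}}}^{m-2} E_{j1} + \sum_{\substack{j=0 \\ j\ \mathrm{odd}}}^{m-2} O_{j1} & i = m-1
\end{cases} \tag{18}$$

while that for odd values of $m$ ($i = 0, 1, \ldots, m-1$) is

$$d_{\Omega_i} = \begin{cases}
(O_{(i+1)0} + O_{i1}) + \displaystyle\sum_{\substack{j=i+\Delta \\ j\ \mathrm{even}}}^{m-2} O_{j1} + \sum_{\substack{j=i+\Delta \\ j\ \mathrm{odd}}}^{m-2} E_{j1} & \text{even } i \\[3ex]
(E_{(i+1)0} + E_{i1}) + \begin{cases} \displaystyle\sum_{\substack{j=i+\Delta \\ j\ \mathrm{even}}}^{m-2} O_{j1} + \sum_{\substack{j=i+\Delta \\ j\ \mathrm{odd}}}^{m-2} E_{j1} & \text{odd } i \\ \emptyset & i = m-2 \end{cases} & \\[5ex]
(O_{m0}) + \displaystyle\sum_{\substack{j=0 \\ j\ \mathrm{even}}}^{m-2} O_{j1} + \sum_{\substack{j=0 \\ j\ \mathrm{odd}}}^{m-2} E_{j1} & i = m-1.
\end{cases} \tag{19}$$

In (18) and (19), the terms in brackets are given by $\alpha^T_\Omega K_0$ and the summations are given by $\alpha^T_\Omega K_1$ and $\alpha^T_\Omega K_j$, with $j = 2, \ldots, m-1$. The grouping and sharing of the $E_{i1}$ and $O_{i1}$ functions reduce the theoretical complexities of the parallel multipliers constructed with our method. A multiplication example over $GF(2^6)$ is presented in the following.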

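6.1 Multiplication Example over $GF(2^6)$

The product $\delta$ of two elements $\alpha$ and $\gamma$ from $GF(2^6)$ generated by the irreducible trinomial $f(x) = x^6 + x^5 + 1$ can be computed using (18). The decomposition of $K$ is given by the sum of the following $1 + \tau = 1 + \lceil \frac{6-1}{6-5} \rceil = 6$ matrices, where the $c_i$ terms are the coordinates of $\gamma$ in $\Omega$ and where $K_0$, $K_1$, and $K_j$ ($j = 2, \ldots, 5$) are constructed as mentioned in Section 4.

$$\begin{aligned}
K = K_0 + K_1 + \sum_{j=2}^{5} K_j ={}&
\begin{pmatrix}
c_5 & c_4 & c_3 & c_2 & c_1 & c_0 \\
c_4 & c_3 & c_2 & c_1 & c_0 & c_5 \\
c_3 & c_2 & c_1 & c_0 & c_5 & c_4 \\
c_2 & c_1 & c_0 & c_5 & c_4 & c_3 \\
c_1 & c_0 & c_5 & c_4 & c_3 & c_2 \\
c_0 & c_5 & c_4 & c_3 & c_2 & c_1
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5 \\
c_3 & 0 & 0 & 0 & c_5 & c_4 \\
c_2 & 0 & 0 & c_5 & c_4 & c_3 \\
c_1 & 0 & c_5 & c_4 & c_3 & c_2
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5 \\
c_3 & 0 & 0 & 0 & c_5 & c_4 \\
c_2 & 0 & 0 & c_5 & c_4 & c_3
\end{pmatrix} \\
&+
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5 \\
c_3 & 0 & 0 & 0 & c_5 & c_4
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\end{aligned} \tag{20}$$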
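The decomposition in (20) can be checked mechanically against $K = H \cdot F$. In the sketch below, which is an illustration rather than the authors' procedure, every matrix entry is a bitmask in which bit $i$ stands for the coordinate $c_i$, so that XOR of masks models addition over $GF(2)$. The names `product_matrix` and `decomposition` are hypothetical, and the index formulas used to fill $K_0$ and $K_1$ are simply read off the matrices printed in (20).

```python
from functools import reduce
from operator import xor

M = 6
F_COEF = [1, 0, 0, 0, 0, 1, 1]                     # x^6 + x^5 + 1, stored as f_0 .. f_6


def xsum(items):
    """XOR of an iterable of integers (addition over GF(2) on bitmasks)."""
    return reduce(xor, items, 0)


def product_matrix():
    """K = H . F of (3), each entry a bitmask of the c_i it contains."""
    m, f = M, F_COEF
    c = [1 << i for i in range(m)]                 # symbolic coordinate c_i
    t = [1]
    for j in range(1, m):
        t.append(sum(f[m - j + i] & t[i] for i in range(j)) % 2)
    col = [xsum(c[k] for k in range(m - 1 - r, m) if t[r + k - (m - 1)])
           for r in range(m)]                      # triangular coordinates, via (1)
    cols = [col]
    for _ in range(m - 1):                         # Hankel columns, via (2)
        col = col[1:] + [xsum(col[l] for l in range(m) if f[l])]
        cols.append(col)
    return [[xsum(cols[i][r] for i in range(k + 1) if f[m - (k - i)])
             for k in range(m)] for r in range(m)]


def decomposition():
    """The six matrices K_0 .. K_5 as printed in (20)."""
    m = M
    k0 = [[1 << ((m - 1 - r - k) % m) for k in range(m)] for r in range(m)]
    k1 = [[0] * m for _ in range(m)]
    for r in range(1, m):
        k1[r][0] = 1 << (m - r)                    # first column: c_5 .. c_1
        for k in range(2, m):                      # lower-right triangle; column 1 is null
            if r + k >= m + 1:
                k1[r][k] = 1 << (2 * m - r - k)
    mats = [k0, k1]
    for _ in range(2, m):                          # K_2 .. K_5: previous matrix, one row down
        mats.append([[0] * m] + mats[-1][:-1])
    return mats


if __name__ == "__main__":
    total = [[xsum(mat[r][k] for mat in decomposition()) for k in range(M)]
             for r in range(M)]
    print(total == product_matrix())
```

The row-shift shortcut used for $K_2, \ldots, K_5$ is specific to this trinomial and is taken from the printed matrices, not from the general construction rules of Section 4.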

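The $d_i$ coordinates of the product $\delta$ can be computed using (9) and (20), but, from the sum of products obtained, the grouping and sharing of terms given by the transpositional method cannot be deduced. The 1-cycles and 2-cycles obtained with this method are computed from (10) to (13) and, for our $GF(2^6)$ example, are given by the functions

$$\begin{aligned}
OC_{10} &= (0); & EC_{20} &= (0, 1); \\
OC_{30} &= (1)(0, 2); & EC_{40} &= (0, 3)(1, 2); \\
OC_{50} &= (2)(0, 4)(1, 3); & EC_{60} &= (0, 5)(1, 4)(2, 3);
\end{aligned}$$

and

$$\begin{aligned}
EC_{01} &= (3)(1, 5)(2, 4); & OC_{11} &= (2, 5)(3, 4); \\
EC_{21} &= (4)(3, 5); & OC_{31} &= (4, 5); \\
EC_{41} &= (5).
\end{aligned}$$

The sums of terms represented by these cycles are given by $O_{10}$, $E_{20}$, $O_{30}$, $E_{40}$, $O_{50}$, $E_{60}$ and $E_{01}$, $O_{11}$, $E_{21}$, $O_{31}$, $E_{41}$, respectively, obtaining

$$\begin{aligned}
O_{10} &= x_0 = a_0c_0 \\
E_{20} &= x_{01} = (a_0c_1 + a_1c_0) \\
O_{30} &= x_1 + x_{02} = a_1c_1 + (a_0c_2 + a_2c_0) \\
E_{40} &= x_{03} + x_{12} = (a_0c_3 + a_3c_0) + (a_1c_2 + a_2c_1) \\
O_{50} &= x_2 + x_{04} + x_{13} = a_2c_2 + (a_0c_4 + a_4c_0) + (a_1c_3 + a_3c_1) \\
E_{60} &= x_{05} + x_{14} + x_{23} = (a_0c_5 + a_5c_0) + (a_1c_4 + a_4c_1) + (a_2c_3 + a_3c_2)
\end{aligned} \tag{21}$$

$$\begin{aligned}
E_{01} &= x_3 + x_{15} + x_{24} = a_3c_3 + (a_1c_5 + a_5c_1) + (a_2c_4 + a_4c_2) \\
O_{11} &= x_{25} + x_{34} = (a_2c_5 + a_5c_2) + (a_3c_4 + a_4c_3) \\
E_{21} &= x_4 + x_{35} = a_4c_4 + (a_3c_5 + a_5c_3) \\
O_{31} &= x_{45} = (a_4c_5 + a_5c_4) \\
E_{41} &= x_5 = a_5c_5,
\end{aligned} \tag{22}$$

where the $a_i$ and $c_i$ terms are the coordinates of $\alpha$ and $\gamma$ from $GF(2^6)$, respectively. The components of $\alpha^T_\Omega K_0$, $\alpha^T_\Omega K_1$, and $\alpha^T_\Omega K_j$ ($j = 2, 3, 4, 5$) can be computed using (14), (15), and (16), respectively, for even values of $m$, obtaining

$$\begin{aligned}
\alpha^T_\Omega K_0 &= (E_{60};\ O_{50} + E_{41};\ E_{40} + O_{31};\ O_{30} + E_{21};\ E_{20} + O_{11};\ O_{10} + E_{01}) \\
\alpha^T_\Omega K_1 &= (E_{01};\ \emptyset;\ E_{41};\ O_{31};\ E_{21};\ O_{11}) \\
\alpha^T_\Omega K_2 &= (O_{11};\ \emptyset;\ \emptyset;\ E_{41};\ O_{31};\ E_{21}) \\
\alpha^T_\Omega K_3 &= (E_{21};\ \emptyset;\ \emptyset;\ \emptyset;\ E_{41};\ O_{31}) \\
\alpha^T_\Omega K_4 &= (O_{31};\ \emptyset;\ \emptyset;\ \emptyset;\ \emptyset;\ E_{41}) \\
\alpha^T_\Omega K_5 &= (E_{41};\ \emptyset;\ \emptyset;\ \emptyset;\ \emptyset;\ \emptyset),
\end{aligned} \tag{23}$$

where the $\emptyset$ symbol represents a null component. Using (23) and (17), the product coordinates are obtained as sums of terms grouped by $E_{i(0,1)}$ and $O_{i(0,1)}$ given in (21) and (22). These coordinates can be directly computed using (18). The product coordinates for this example are given in Table 1, where the $\alpha^T_\Omega K_0$, $\alpha^T_\Omega K_1$, and $\alpha^T_\Omega K_j$ ($j = 2, 3, 4, 5$) components are specified and where a $d_i$ coordinate is the sum of the terms $E_{i(0,1)}$ and $O_{i(0,1)}$ present in the $i$th row.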

The sharing property can be observed in Table 1. Starting with the coordinate $d_{m-2} = d_4$, the term $E_{41}$ appears in the sum of terms of $d_3$, whose subexpression $(O_{31} + E_{41})$ also appears in $d_2$. The same can be stated for the subexpression $(E_{21} + O_{31} + E_{41})$ that appears in $d_2$ and $d_1$, and for $(O_{11} + E_{21} + O_{31} + E_{41})$ belonging to $d_1$ and $d_0$. Finally, the subexpression $(E_{01} + O_{11} + E_{21} + O_{31} + E_{41})$ belonging to $d_0$ also appears in $d_5$. These facts imply that the mentioned terms can be constructed only once and then reused, without the need for reconstruction. The number of gates finally needed for the $GF(2^6)$ multiplier is 36 AND and 35 XOR gates.

This way of constructing the product coordinates lets us reduce the time complexity of the multiplier using a hybrid tree of XOR gates in which the terms $E_{i(0,1)}$ and $O_{i(0,1)}$ are constructed with a binary tree of XOR gates, whereas the sum of these terms is performed with an XOR linear tree, as given in Table 1. In Fig. 1a, the construction of the coordinate $d_5$, the one with the highest space and time complexities, is shown. The delay for $d_5$ is given by the sum of $(E_{01} + O_{11} + E_{21} + O_{31} + E_{41})$ with $E_{60}$. The term $O_{31}$ is given by the sum of two product terms, so the total delay of the multiplier is $T_{AND} + 6T_{XOR}$.

TABLE 1. Product Coordinates $d_i$ for the Multiplication over $GF(2^6)$ for $f(x) = x^6 + x^5 + 1$

6.2 General Expressions

The product coordinates of $\delta = \alpha \cdot \gamma$ are given in (18) and (19) as the summation of terms $E_{i(0,1)}$ and $O_{i(0,1)}$. For any value of $m$, these summations have the form given in Table 1, so this way of construction can be used. Starting with the term $E_{(m-2)1}$ from the summation of $d_{m-2}$, $E_{(m-2)1}$ can be grouped with the term $O_{(m-3)1}$ from $d_{m-3}$. This group can be denoted by $G_1 = E_{(m-2)1} + O_{(m-3)1}$, so the $d_{m-4}$ coordinate includes the group $G_2 = G_1 + E_{(m-4)1}$, the $d_{m-5}$ coordinate includes the group $G_3 = G_2 + O_{(m-5)1}$, and so on. The last group will be $G_{m-2} = G_{m-3} + E_{01}$ (for even $m$) or $G_{m-2} = G_{m-3} + O_{01}$ (for odd $m$), which will belong to the summations of $d_0$ and $d_{m-1}$. Therefore, the $m-2$ complex groups $G_i$ consist exclusively of sums of terms $E_{i1}$ and $O_{i1}$ determined by $\alpha^T_\Omega K_0$.

In Fig. 1b, the $m-2$ complex groups $G_i$ are shown. The structure is a hybrid tree where the $E_{i1}$ and $O_{i1}$ terms are constructed using a binary tree of XOR gates, whereas their sum is performed with a linear tree of XOR gates. This linear tree is necessary for the construction of the $G_i$ from the $G_{i-1}$ groups. A product coordinate is the sum of its group with its corresponding $O_{i0}$ or $E_{i0}$ term. The only coordinate without any group is $d_{m-2}$. Therefore, the new expressions established by the transpositional method for the computation of the product coordinates $d_i$ in $\Omega$ for even ($\dagger$ odd) values of $m$ can be given as follows ($i = 0, 1, \ldots, m-1$):

$$d_{\Omega_i} = \begin{cases}
\dagger E_{m0} + G_{m-2} & i = m-1 \\
\dagger O_{(m-1)0} + E_{(m-2)1} & i = m-2 \\
G_{m-2-i} + \begin{cases} O_{(i+1)0} & \text{even } i \\ E_{(i+1)0} & \text{odd } i. \end{cases} &
\end{cases} \tag{24}$$
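A bit-level model of (24) makes the sharing of the groups $G_i$ explicit. The sketch below is a software illustration only (a hardware description would instantiate gates instead): `mul_transpositional`, `mul_reference`, and `xsum` are hypothetical names, operands are assumed to be lists of canonical coordinates, and $m \ge 3$ is assumed so that at least one group exists. The $E$ and $O$ terms are evaluated through their index sums, which is equivalent to summing the cycles of (10) to (13).

```python
from functools import reduce
from operator import xor


def xsum(bits):
    """XOR of an iterable of bits."""
    return reduce(xor, bits, 0)


def mul_reference(m, a, c):
    """Schoolbook product reduced with x^m = x^(m-1) + 1, used only as a cross-check."""
    prod = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            prod[i + j] ^= a[i] & c[j]
    for k in range(2 * m - 2, m - 1, -1):
        if prod[k]:
            prod[k] = 0
            prod[k - 1] ^= 1                       # x^k = x^(k-1) + x^(k-m)
            prod[k - m] ^= 1
    return prod[:m]


def mul_transpositional(m, a, c):
    """Coordinates d_0 .. d_{m-1} from (24) for f(x) = x^m + x^(m-1) + 1, m >= 3."""
    def x0(i):                                     # E_{i0} or O_{i0}: products with p + r = i - 1
        return xsum(a[p] & c[i - 1 - p] for p in range(i))

    def x1(i):                                     # E_{i1} or O_{i1}: products with p + r = m + i
        return xsum(a[p] & c[m + i - p] for p in range(i + 1, m))

    groups = {}
    acc = x1(m - 2)
    for i in range(1, m - 1):                      # G_1 = X_{(m-2)1} + X_{(m-3)1}, then one term per step
        acc ^= x1(m - 2 - i)
        groups[i] = acc
    d = [0] * m
    d[m - 1] = x0(m) ^ groups[m - 2]
    d[m - 2] = x0(m - 1) ^ x1(m - 2)
    for i in range(m - 2):
        d[i] = groups[m - 2 - i] ^ x0(i + 1)
    return d


if __name__ == "__main__":
    a, c = [1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 0, 1]
    print(mul_transpositional(6, a, c) == mul_reference(6, a, c))
```

Each group is computed once in the running accumulator and reused, mirroring the linear XOR tree described for Fig. 1b.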

6.3 Theoretical Complexity Analysis

In order to determine the theoretical complexity of the bit-parallel transpositional canonical basis multiplier given by (24), we must first determine the complexity of the terms $E_{i(0,1)}$ and $O_{i(0,1)}$. These functions are constructed as binary trees of XOR gates with a lower level of AND gates (corresponding to the $a_i c_j$ products of the coordinates of the two operands). The theoretical complexities of these functions, for the values of the subindexes given in (24), are the following:

• The $O_{i0}$ functions are the sum of one term $x_k = a_k c_k$ and $(\lceil i/2 \rceil - 1)$ terms $x_{ij} = (a_i c_j + a_j c_i)$. Therefore, the $O_{i0}$ functions will need $(2\lceil i/2 \rceil - 1)$ 2-input AND gates and a binary tree of $2(\lceil i/2 \rceil - 1)$ 2-input XOR gates. The depth of the XOR binary tree will be $\lceil \log_2(2\lceil i/2 \rceil - 1) \rceil$, so the delay of the $O_{i0}$ terms will be $T_{AND} + \lceil \log_2(2\lceil i/2 \rceil - 1) \rceil T_{XOR}$.

• The $E_{i0}$ functions are the sum of $\lceil i/2 \rceil$ terms $x_{ij}$, so they will need $2\lceil i/2 \rceil$ AND gates and a binary tree of $2\lceil i/2 \rceil - 1$ XOR gates with a depth $\lceil \log_2(2\lceil i/2 \rceil) \rceil$. Therefore, the delay of the $E_{i0}$ terms will be $T_{AND} + \lceil \log_2(2\lceil i/2 \rceil) \rceil T_{XOR}$.

• The $E_{i1}$ functions consist of one term $x_k$ and of $(\frac{m-i}{2} - 1)$ terms $x_{ij}$. Therefore, their implementation requires $m-i-1$ AND gates and a binary tree of $m-i-2$ XOR gates with depth $\lceil \log_2(m-i-1) \rceil$. The delay of the $E_{i1}$ terms will be $T_{AND} + \lceil \log_2(m-i-1) \rceil T_{XOR}$.

• The $O_{i1}$ functions consist of $\frac{m-i-1}{2}$ terms $x_{ij}$, so they require $m-i-1$ AND gates and a binary tree of $m-i-2$ XOR gates with depth $\lceil \log_2(m-i-1) \rceil$. The delay of the $O_{i1}$ terms will be $T_{AND} + \lceil \log_2(m-i-1) \rceil T_{XOR}$.
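The four cost rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the string labels "O0", "E0", "E1", and "O1" (standing for $O_{i0}$, $E_{i0}$, $E_{i1}$, and $O_{i1}$) are assumptions introduced here, not notation from the paper.

```python
def tree_depth(n_inputs: int) -> int:
    """Depth of a balanced binary XOR tree with n_inputs leaves, i.e. ceil(log2(n_inputs))."""
    return (n_inputs - 1).bit_length()


def term_cost(kind: str, i: int, m: int) -> dict:
    """AND count, XOR count, and XOR depth of one term.

    kind is a hypothetical label: "O0", "E0", "E1", or "O1".
    """
    half = (i + 1) // 2  # ceil(i / 2)
    if kind == "O0":
        products = 2 * half - 1
    elif kind == "E0":
        products = 2 * half
    elif kind in ("E1", "O1"):
        products = m - i - 1
    else:
        raise ValueError(f"unknown term kind: {kind}")
    # A tree that adds `products` AND outputs needs products - 1 XOR gates.
    return {"and": products, "xor": products - 1, "depth": tree_depth(products)}


if __name__ == "__main__":
    print(term_cost("O0", 7, 7))
    print(term_cost("E1", 5, 7))
```

Summing the AND counts of all $E_{i0}$/$O_{i0}$ terms ($i = 1, \ldots, m$) and all $E_{i1}$/$O_{i1}$ terms ($i = 0, \ldots, m-2$) gives $m^2$, in agreement with the space complexity stated for the multiplier.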

From these results and using (24), the theoretical space complexity of the multiplier can be finally given by $m^2$ AND gates and $m^2 - 1$ XOR gates.

In order to determine the theoretical time complexity of the multiplier, we denote as $\mathrm{depth}(T)$ the depth of the XOR tree of any given term $T$. From the definition of the $G_i$ groups and from Fig. 1b, it can be observed that

$$
\mathrm{depth}(G_i) = \max\left(\mathrm{depth}(G_{i-1}),\ \mathrm{depth}\left(E(O)_{(m-2-i)1}\right)\right) + 1. \tag{25}
$$

The $G_1$ group was defined as $G_1 = E_{(m-2)1} + O_{(m-3)1}$, where $E_{(m-2)1}$ is a product term and where $O_{(m-3)1}$ is the sum of two product terms, so $\mathrm{depth}(G_1) = 2$. The depths of the XOR trees corresponding to $E_{i1}$ and $O_{i1}$ are equal and were given as $\mathrm{depth}(E(O)_{i1}) = \lceil \log_2(m-i-1) \rceil$. Therefore, the depths of the $G_i$ groups are $\mathrm{depth}(G_2) = 3$, $\mathrm{depth}(G_3) = 4$, ..., $\mathrm{depth}(G_i) = i + 1$. Applying (25), the highest depth will correspond to the $G_{m-2}$ group, given by $\mathrm{depth}(G_{m-2}) = m - 1$. The $G_{m-2}$ group appears in the $d_0$ and $d_{m-1}$ coordinates, so these will determine the total delay of the multiplier. The $E(O)_{m0}$ terms have the highest complexity, so the $d_{m-1}$ coordinate determines the highest delay and its depth is given by

$$
\mathrm{depth}(d_{m-1}) = \max\left(m - 1,\ \left\lceil \log_2\left(2\left\lceil \tfrac{m}{2} \right\rceil\right) \right\rceil\right) + 1 = m. \tag{26}
$$

Finally, the theoretical time complexity is given by this number of XOR levels plus one AND level corresponding to the $a_i c_j$ coordinate products of the two operands from $GF(2^m)$. Therefore, the total delay for the multiplier is $T_{AND} + m\,T_{XOR}$.

Fig. 1. (a) Linear tree of XOR gates for the sum of $E_{i(0,1)}$ and $O_{i(0,1)}$ terms for the coordinate $d_5$ in the example. (b) Tree structure of XOR gates for the $G_i$ groups.

In Table 2, we compare the theoretical complexities obtained by our transpositional method with the best results found in the literature given by Halbutogullari and Koc [3], Zhang and Parhi [20], and Sunar and Koc [17]. These authors use similar methods for the construction of bit-parallel canonical basis multipliers generated by irreducible trinomials $f(x) = x^m + x^{m-1} + 1$. It can be observed that our method reduces the XOR delay of the canonical multipliers in comparison with the best time complexities known to date. Furthermore, the space complexity of our multipliers matches the best results found in the literature.
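The depth recurrence (25) and the closed form in (26) can be checked numerically. The following Python sketch assumes only what the text states (depth 2 for $G_1$ and depth $\lceil \log_2(m-i-1) \rceil$ for the $E_{i1}$/$O_{i1}$ terms); the function names are hypothetical.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1, computed with integers only."""
    return (n - 1).bit_length()


def group_depths(m: int) -> dict:
    """XOR depth of G_1 ... G_{m-2} obtained by iterating recurrence (25)."""
    depths = {1: 2}  # depth of G_1 as stated in the text
    for i in range(2, m - 1):
        # The term added to G_{i-1} is E(O)_{(m-2-i)1}, whose depth is
        # ceil(log2(m - (m-2-i) - 1)) = ceil(log2(i + 1)).
        term_depth = ceil_log2(i + 1)
        depths[i] = max(depths[i - 1], term_depth) + 1
    return depths


def last_coordinate_depth(m: int) -> int:
    """XOR depth of d_{m-1}, following (26)."""
    return max(group_depths(m)[m - 2], ceil_log2(2 * ((m + 1) // 2))) + 1


if __name__ == "__main__":
    for m in (3, 6, 7, 16):
        print(m, group_depths(m)[m - 2], last_coordinate_depth(m))
```

For every $m \ge 3$ tried, the recurrence gives depth $i + 1$ for $G_i$ and depth $m$ for $d_{m-1}$, as stated above.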

7 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{(m+1)/2} + 1$ (ODD $m$)

When the generating irreducible trinomial has the form $f(x) = x^m + x^{(m+1)/2} + 1$ (odd $m$), the parameter $\lceil (m-1)/\Delta \rceil = 2$, where $\Delta = m - n = (m-1)/2$. The decomposition of $K$ given in (6) is the sum $K_0 + K_1 + K_2$, which can also be deduced from the structure of $K_1$ given in (8).

From the general expressions of the $d_i$ coordinates of the product given in (17) and using (14) to (16), we can give the expressions for the computation of the coordinates $d_i$ of the product in which even and odd values of $\Delta$ are distinguished. A new parameter, $\Delta_m = (m-1) - \Delta$, is also introduced that, for this type of trinomial, is equal to $\Delta$. For even values of $\Delta$, we then have

$$
d_i = \begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases}
O_{(\Delta+i)1}, & i < \Delta_m\\
0, & i = \Delta_m\\
E_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i > \Delta_m
\end{cases} & \text{even } i,\\[6pt]
(E_{(i+1)0} + E_{i1}) + \begin{cases}
E_{(\Delta+i)1}, & i < \Delta_m\\
O_{(i-(\Delta_m+1))1} + O_{(i-1)1}, & i > \Delta_m
\end{cases} & \text{odd } i,\\[6pt]
(O_{m0}) + E_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i = m-1,
\end{cases} \tag{27}
$$

while, for odd values of $\Delta$, the coordinates are

$$
d_i = \begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases}
E_{(\Delta+i)1}, & i < \Delta_m\\
O_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i > \Delta_m
\end{cases} & \text{even } i,\\[6pt]
(E_{(i+1)0} + E_{i1}) + \begin{cases}
O_{(\Delta+i)1}, & i < \Delta_m\\
0, & i = \Delta_m\\
E_{(i-(\Delta_m+1))1} + O_{(i-1)1}, & i > \Delta_m
\end{cases} & \text{odd } i,\\[6pt]
(O_{m0}) + O_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i = m-1,
\end{cases} \tag{28}
$$

where $i = 0, 1, \ldots, m-1$ and where the terms in brackets are given by the vector-matrix product with $K_0$, whereas the third and fourth addends (second and third if $i = m-1$) in (27) and (28) are given by the products with $K_1$ and $K_2$, respectively. An example over $GF(2^7)$ is given in the following.

7.1 Multiplication Example over $GF(2^7)$

For the irreducible trinomial $f(x) = x^7 + x^4 + 1$, the parameter $\Delta = (m-1)/2 = 3 = \Delta_m$ and the product of two field elements can then be computed using (28). $K$ is decomposed in the sum of the matrices $K_0$, $K_1$, and $K_2$, which can be constructed as given in Section 4.

The transpositional approach computes the 1-cycles and 2-cycles using (10) to (13), obtaining the functions

$OC_{10} = (0)$; $EC_{20} = (0,1)$; $OC_{30} = (1)(0,2)$; $EC_{40} = (0,3)(1,2)$; $OC_{50} = (2)(0,4)(1,3)$; $EC_{60} = (0,5)(1,4)(2,3)$; $OC_{70} = (3)(0,6)(1,5)(2,4)$

and the terms

$OC_{01} = (1,6)(2,5)(3,4)$; $EC_{11} = (4)(2,6)(3,5)$; $OC_{21} = (3,6)(4,5)$; $EC_{31} = (5)(4,6)$; $OC_{41} = (5,6)$; $EC_{51} = (6)$.


TABLE 2. Complexities for Canonical Multipliers Using Trinomials $f(x) = x^m + x^{m-1} + 1$

The sums of terms represented by these cycles are given by $O_{10}$, $E_{20}$, $O_{30}$, $E_{40}$, $O_{50}$, $E_{60}$, $O_{70}$ and $O_{01}$, $E_{11}$, $O_{21}$, $E_{31}$, $O_{41}$, $E_{51}$, respectively, which can be easily computed.

For this example, it can be proven that the product coordinates are given as shown in Table 3, where the components of the vector-matrix products with $K_0$, $K_1$, and $K_2$ are specified and where a $d_i$ coordinate is the sum of the $E_{i(0,1)}$ and $O_{i(0,1)}$ terms existent in the $i$th row. The sharing property can be observed in Table 3. The subexpression $(O_{21} + E_{51})$ from $d_2$ also appears in $d_6$. The same holds for the subexpression $(E_{11} + O_{41})$, which appears in $d_1$ and $d_5$, and for $(O_{01} + E_{31})$, which belongs to $d_0$ and $d_4$. The number of gates finally needed for the $GF(2^7)$ multiplier is 49 AND and 48 XOR gates. This way of constructing the product coordinates also lets us reduce the time complexity of the multiplier using binary trees of XOR gates. Finally, the total delay of the multiplier is given by $T_{AND} + 5T_{XOR}$.
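The $GF(2^7)$ example can be checked with a short Python sketch. Since (10) to (13) are not reproduced in this section, the cycle rule used here is inferred from the cycles listed above and should be read as an assumption: the indices of every cycle of $C_{i0}$ add up to $i - 1$, and those of $C_{i1}$ add up to $m + i$. The coordinate expressions follow (28) for $m = 7$, with the three shared subexpressions computed once. All function names are hypothetical.

```python
from itertools import product

M, N = 7, 4  # f(x) = x^7 + x^4 + 1


def cycles(index_sum, m=M):
    """1-cycles (k,) and 2-cycles (j, k) with j < k whose indices add up to index_sum."""
    out = []
    for j in range(m):
        k = index_sum - j
        if j <= k < m:
            out.append((j,) if j == k else (j, k))
    return out


def term(a, c, index_sum):
    """XOR of a_k c_k per 1-cycle and (a_j c_k + a_k c_j) per 2-cycle."""
    acc = 0
    for cyc in cycles(index_sum):
        if len(cyc) == 1:
            acc ^= a[cyc[0]] & c[cyc[0]]
        else:
            j, k = cyc
            acc ^= (a[j] & c[k]) ^ (a[k] & c[j])
    return acc


def transpositional_product(a, c):
    """Coordinates d_0..d_6 built from the E/O terms as in (28), sharing three groups."""
    x0 = {i: term(a, c, i - 1) for i in range(1, M + 1)}  # O_10, E_20, ..., O_70
    x1 = {i: term(a, c, M + i) for i in range(M - 1)}     # O_01, E_11, ..., E_51
    g0 = x1[0] ^ x1[3]  # O_01 + E_31, shared by d_0 and d_4
    g1 = x1[1] ^ x1[4]  # E_11 + O_41, shared by d_1 and d_5
    g2 = x1[2] ^ x1[5]  # O_21 + E_51, shared by d_2 and d_6
    return [
        x0[1] ^ g0,
        x0[2] ^ g1,
        x0[3] ^ g2,
        x0[4] ^ x1[3],
        x0[5] ^ x1[4] ^ g0,
        x0[6] ^ x1[5] ^ g1,
        x0[7] ^ g2,
    ]


def reference_product(a, c):
    """Schoolbook polynomial product over GF(2) followed by reduction modulo f(x)."""
    a_int = sum(bit << i for i, bit in enumerate(a))
    prod = 0
    for i, bit in enumerate(c):
        if bit:
            prod ^= a_int << i
    f = (1 << M) | (1 << N) | 1
    for deg in range(2 * M - 2, M - 1, -1):
        if (prod >> deg) & 1:
            prod ^= f << (deg - M)
    return [(prod >> i) & 1 for i in range(M)]


if __name__ == "__main__":
    ok = all(
        transpositional_product(a, c) == reference_product(a, c)
        for a in product((0, 1), repeat=M)
        for c in product((0, 1), repeat=M)
    )
    print("all 2^14 operand pairs agree:", ok)
```

The reference routine is an ordinary shift-and-reduce multiplication and is included only as an independent check; it is not part of the transpositional method.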

7.2 General Expressions and Theoretical Complexity Analysis

As with the previously studied trinomial, Table 3 can be used to find complex groups determined by the sum of terms $E_{i1}$ and/or $O_{i1}$. These groups are denoted as $G_i$, where the $i$ subindex is equal to the subindex of the product coordinate to which the group belongs. According to this, the following expressions for the $G_i$ groups can be given for even († odd) values of $\Delta$:

$$
G_i = \begin{cases}
\begin{cases} O_{i1} + \dagger O_{(\Delta+i)1}, & \text{even } i\\ E_{i1} + \dagger E_{(\Delta+i)1}, & \text{odd } i \end{cases} & i < \Delta_m,\\[6pt]
G_{i-(\Delta_m+1)}, & i > \Delta_m,
\end{cases} \tag{29}
$$

for $i = 0, 1, \ldots, m-1$ with $i \neq \Delta_m$ because, for $i = \Delta_m$, there is no associated $G_i$ group.

Using (27) to (29), the following new expressions established by the transpositional method for the coordinates of the product in the canonical basis can be given for even († odd) values of $\Delta$ (with $i = 0, 1, \ldots, m-1$):

$$
d_i = \begin{cases}
G_i + \begin{cases} O_{(i+1)0}, & \text{even } i\\ E_{(i+1)0}, & \text{odd } i \end{cases} & i < \Delta_m,\\[6pt]
\dagger O_{(i+1)0} + \dagger O_{i1}, & i = \Delta_m,\\[6pt]
G_{i-(\Delta_m+1)} + \begin{cases} O_{(i+1)0} + O_{i1}, & \text{even } i\\ E_{(i+1)0} + E_{i1}, & \text{odd } i \end{cases} & i > \Delta_m,\\[6pt]
O_{m0} + G_{i-(\Delta_m+1)}, & i = m-1.
\end{cases} \tag{30}
$$

Using binary trees of XOR gates for the construction, the theoretical complexities of the multiplier can be determined. The space complexity is computed from the $E_{i(0,1)}$ and $O_{i(0,1)}$ complexities given in Section 6.3 and using (29) and (30). This complexity can be proven to be $m^2$ AND and $m^2 - 1$ XOR gates.

In order to determine the theoretical time complexity of the multiplier, we must first compute the delay of the $G_i$ complex groups. Using (29), the depth of the XOR tree for a group $G_i$ is given by

$$
\mathrm{depth}(G_i) = \max\left(\mathrm{depth}(E(O)_{i1}),\ \mathrm{depth}(E(O)_{(\Delta+i)1})\right) + 1 = \lceil \log_2(m-i-1) \rceil + 1, \tag{31}
$$

where the complexities given in Section 6.3 have been used. Using (31) and (30), the following complexities for the product coordinates can be stated:

• When $i < \Delta_m$, the depth of the XOR tree for the $d_i$ product coordinate is given by $\mathrm{depth}(d_i) = \mathrm{depth}(G_i) + 1 = \lceil \log_2(m-i-1) \rceil + 2$.

• For $i = \Delta_m$, the $d_i$ coordinate does not have any associated $G_i$ group. Therefore, for even $\Delta$, $\mathrm{depth}(d_i) = \mathrm{depth}(O_{(i+1)0}) + 1 = \lceil \log_2(2\lceil \frac{i+1}{2} \rceil - 1) \rceil + 1$, while, for odd $\Delta$, we have that $\mathrm{depth}(d_i) = \mathrm{depth}(E_{(i+1)0}) + 1 = \lceil \log_2(2\lceil \frac{i+1}{2} \rceil) \rceil + 1$.

• For $i > \Delta_m$ ($i \neq m-1$), $G_i = G_{i-(\Delta_m+1)}$ using (29). Therefore, the depth of the XOR tree for the $d_i$ coordinate is given by $\mathrm{depth}(d_i) = \mathrm{depth}(G_i) + 1 = \lceil \log_2(m-i+\Delta_m) \rceil + 2$.

• For $i = m-1$, $d_i$ is the sum of $O_{m0}$ and $G_i = G_{i-(\Delta_m+1)}$. Therefore, we have that $\mathrm{depth}(d_i) = \max\left(\lceil \log_2 m \rceil,\ 1 + \lceil \log_2 \frac{m+1}{2} \rceil\right) + 1 = \lceil \log_2 m \rceil + 1$.

The total delay of the multiplier can be computed using the highest complexity from the previous ones, which is given by $2 + \lceil \log_2(m-1) \rceil$ (selecting the index $i = 0$). Finally, the theoretical time complexity is given by this number of levels of XOR gates plus one level of AND gates. Therefore, the delay of the multiplier is $T_{AND} + (2 + \lceil \log_2(m-1) \rceil)T_{XOR}$.
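The per-coordinate depths listed above can be tabulated with a small Python sketch to confirm that the maximum is reached at $2 + \lceil \log_2(m-1) \rceil$. The function names are hypothetical, and the sketch only evaluates the closed-form depths stated in the list; it does not build the circuit.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1, computed with integers only."""
    return (n - 1).bit_length()


def coordinate_depths(m: int) -> list:
    """XOR depth of d_0..d_{m-1} for f(x) = x^m + x^((m+1)/2) + 1 with odd m."""
    if m % 2 == 0 or m < 3:
        raise ValueError("m must be odd and at least 3")
    delta = (m - 1) // 2        # m - n
    delta_m = (m - 1) - delta   # equal to delta for this trinomial
    depths = []
    for i in range(m):
        if i == m - 1:
            d = max(ceil_log2(m), 1 + ceil_log2((m + 1) // 2)) + 1
        elif i < delta_m:
            d = ceil_log2(m - i - 1) + 2
        elif i == delta_m:
            half = (i + 2) // 2  # ceil((i + 1) / 2)
            inputs = 2 * half - 1 if delta % 2 == 0 else 2 * half
            d = ceil_log2(inputs) + 1
        else:
            d = ceil_log2(m - i + delta_m) + 2
        depths.append(d)
    return depths


if __name__ == "__main__":
    print(coordinate_depths(7))
```

For $m = 7$ the largest value is 5, which agrees with the delay $T_{AND} + 5T_{XOR}$ reported for the $GF(2^7)$ example.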

In Table 4, the theoretical complexities obtained by our

transpositional method and the best results found in the

literature for similar multipliers are given. It can be

observed that our method reduces the XOR delay of the

canonical multipliers in comparison with the best time

complexities known to date, while the space complexity of

our multipliers matches the best results found in the

literature.

8 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{(m-1)/2} + 1$ (ODD $m$)

When the irreducible generating trinomial is $f(x) = x^m + x^{(m-1)/2} + 1$ (odd $m$), $\Delta = (m+1)/2$ and the parameter $\lceil (m-1)/\Delta \rceil = \lceil 2(m-1)/(m+1) \rceil$. For $m > 3$, this parameter equals 2 and $K = K_0 + K_1 + K_2$, while, for $m = 3$, it equals 1 and $K = K_0 + K_1$. In fact, the trinomial $f(x) = x^3 + x + 1$ is of the form $f(x) = x^m + x + 1$ that will be considered in Section 10.

Using (14) to (17), the expressions of the coordinates $d_i$ of the product can be given, where even and odd values for the $\Delta$ parameter are distinguished (in this case, $\Delta_m = (m-3)/2 \neq \Delta$). For even values of $\Delta$ and for $i = 0, 1, \ldots, m-1$, the product coordinates are given by

$$
d_i = \begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases}
O_{(\Delta+i)1}, & i < \Delta_m\\
0, & i = \Delta_m\\
E_{(i-(\Delta_m+1))1} + E_{(i+1)1}, & i > \Delta_m
\end{cases} & \text{even } i,\\[6pt]
(E_{(i+1)0} + E_{i1}) + \begin{cases}
E_{(\Delta+i)1}, & i < \Delta_m\\
O_{(i-(\Delta_m+1))1} + O_{(i+1)1}, & i > \Delta_m
\end{cases} & \text{odd } i,\\[6pt]
(E_{(i+1)0} + E_{i1}) + O_{(i-(\Delta_m+1))1}, & i = m-2,\\[6pt]
(O_{m0}) + E_{(i-(\Delta_m+1))1}, & i = m-1,
\end{cases} \tag{32}
$$

while, for odd values of $\Delta$,

$$
d_i = \begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases}
E_{(\Delta+i)1}, & i < \Delta_m\\
O_{(i-(\Delta_m+1))1} + E_{(i+1)1}, & i > \Delta_m
\end{cases} & \text{even } i,\\[6pt]
(E_{(i+1)0} + E_{i1}) + \begin{cases}
O_{(\Delta+i)1}, & i < \Delta_m\\
0, & i = \Delta_m\\
E_{(i-(\Delta_m+1))1} + O_{(i+1)1}, & i > \Delta_m
\end{cases} & \text{odd } i,\\[6pt]
(E_{(i+1)0} + E_{i1}) + E_{(i-(\Delta_m+1))1}, & i = m-2,\\[6pt]
(O_{m0}) + O_{(i-(\Delta_m+1))1}, & i = m-1.
\end{cases} \tag{33}
$$

8.1 Multiplication Example over $GF(2^7)$

For the irreducible trinomial $f(x) = x^7 + x^3 + 1$, the 1-cycles and 2-cycles computed by the transpositional approach using (10) to (13) are the ones given in Section 7.1 for the trinomial $f(x) = x^7 + x^4 + 1$. The product coordinates obtained are given in Table 5, where the sharing property can be observed for the subexpressions $(O_{01} + O_{41})$ and $(E_{11} + E_{51})$. Using binary trees of XOR gates for the construction, it can be proven that the space complexity of the multiplier is 49 AND and 48 XOR gates, whereas the time complexity is $T_{AND} + 5T_{XOR}$.

8.2 General Expressions and Theoretical Complexity Analysis

As with the previous trinomials, Table 5 determines complex groups given by the sum of terms $E_{i1}$ and/or $O_{i1}$. These groups $G_i$ (with $i$ equal to the subindex of the product coordinate to which the group belongs) are given by (29), with the only difference being that, for $f(x) = x^m + x^{(m-1)/2} + 1$ (odd $m$), the subindex $i = 0, 1, \ldots, m-3$. For $i = \Delta_m$, there is no associated $G_i$ group. The following new expressions established by the transpositional method for the coordinates of the product in the canonical basis can be given for even († odd) values of $\Delta$ (with $i = 0, 1, \ldots, m-1$):

$$
d_i = \begin{cases}
G_i + \begin{cases} O_{(i+1)0}, & \text{even } i\\ E_{(i+1)0}, & \text{odd } i \end{cases} & i < \Delta_m,\\[6pt]
\dagger O_{(i+1)0} + \dagger O_{i1}, & i = \Delta_m,\\[6pt]
G_{i-(\Delta_m+1)} + \begin{cases} O_{(i+1)0} + O_{i1}, & \text{even } i\\ E_{(i+1)0} + E_{i1}, & \text{odd } i \end{cases} & i > \Delta_m,\\[6pt]
E_{(i+1)0} + E_{i1} + \dagger O_{(i-(\Delta_m+1))1}, & i = m-2,\\[6pt]
O_{m0} + \dagger E_{(i-(\Delta_m+1))1}, & i = m-1.
\end{cases} \tag{34}
$$

TABLE 4. Complexities for Canonical Multipliers Using Trinomials $f(x) = x^m + x^{(m+1)/2} + 1$ (Odd $m$)

Using binary trees of XOR gates for the construction, the theoretical complexities of the multiplier can be determined. The space complexity is computed from the $E_{i(0,1)}$ and $O_{i(0,1)}$ complexities given in Section 6.3 and using (29) and (34). This complexity can be proven to be $m^2$ AND and $m^2 - 1$ XOR gates.

The theoretical time complexity of the multiplier can be determined using (31), which establishes that the number of XOR levels of the $G_i$ groups increases as the index $i$ decreases. With this fact and using (34), the following complexities for the product coordinates can be stated:

• When $i < \Delta_m$, the depth of the XOR tree for the $d_i$ product coordinate is given by $\mathrm{depth}(d_i) = \lceil \log_2(m-i-1) \rceil + 2$.

• For $i = \Delta_m$ and for even $\Delta$, $\mathrm{depth}(d_i) = \lceil \log_2(2\lceil \frac{i+1}{2} \rceil - 1) \rceil + 1$, while, for odd $\Delta$, we have $\mathrm{depth}(d_i) = \lceil \log_2(2\lceil \frac{i+1}{2} \rceil) \rceil + 1$.

• For $i > \Delta_m$ with $i \notin \{m-2, m-1\}$, the depth of the XOR tree for the $d_i$ coordinate is given by $\mathrm{depth}(d_i) = \lceil \log_2(m-i+\Delta_m) \rceil + 2$.

• For $i = m-2$, the depth of the XOR tree is given by $\mathrm{depth}(d_i) = \lceil \log_2(2\lceil \frac{i+1}{2} \rceil) \rceil + 1$.

• For $i = m-1$, the depth of the XOR tree is given by $\mathrm{depth}(d_i) = \lceil \log_2 m \rceil + 1$.

The highest complexity from the previous ones is $2 + \lceil \log_2(m-1) \rceil$ for $i = 0$, so the theoretical time complexity is given by this number of levels of XOR gates plus one level of AND gates. Therefore, the delay of the multiplier is $T_{AND} + (2 + \lceil \log_2(m-1) \rceil)T_{XOR}$, which is equal to the delay obtained for the trinomials $f(x) = x^m + x^{(m+1)/2} + 1$ with odd $m$.

In Table 6, the theoretical complexities obtained by our transpositional method and the best results found in the literature for similar multipliers are given. It can be observed that our method equals the best time complexities most recently presented by Wu [18] and by Reyhani-Masoleh and Hasan [16]. The space complexity of our multipliers also matches these best results found in the literature.

9 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{m/2} + 1$ (EVEN $m$)

The irreducible trinomials of the form $f(x) = x^m + x^{m/2} + 1$ (even $m$) are a special type of ESPs (Equally-Spaced-Polynomials), known as ESTs (Equally-Spaced-Trinomials). The irreducible ESPs are polynomials of the form $f(x) = x^{k\Delta} + x^{(k-1)\Delta} + \cdots + x^{\Delta} + 1$, where $m = k\Delta$ and $k \ge 2$. The ESPs are reduced to ESTs when $k = 2$ and are reduced to AOPs (All-One-Polynomials) for $\Delta = 1$. For the EST $f(x) = x^m + x^{m/2} + 1$ (even $m$), the parameter $\Delta = m/2$ and $\Delta_m = (m-2)/2 = \Delta - 1$. For even values of $m$ with $m > 2$, $\lceil (m-1)/\Delta \rceil = \lceil 2(m-1)/m \rceil = 2$ and $K = K_0 + K_1 + K_2$, whereas, for $m = 2$, this parameter equals 1 and $K = K_0 + K_1$ (in fact, $f(x) = x^2 + x + 1$ is of the form $f(x) = x^m + x + 1$ that will be studied in Section 10).

From the matrix structures given in Section 4, it can be deduced that the sum of $K_0$ and $K_2$ generates sums of identical terms $c_i$ that are therefore canceled, i.e., $c_i + c_i = 0$. This fact reduces the space complexity of the multiplier. The following expressions for the product coordinates $d_i$ using the transpositional method for $i = 0, 1, \ldots, m-1$ and for even $\Delta_m$ (a condition that is verified for all irreducible ESTs) can be given:

$$
d_i = \begin{cases}
O_{(i+1)0} + \begin{cases}
E_{i1} + O_{(\Delta+i)1}, & i < \Delta_m\\
E_{i1}, & i = \Delta_m\\
O_{(i-(\Delta_m+1))1}, & i > \Delta_m
\end{cases} & \text{even } i,\\[6pt]
E_{(i+1)0} + \begin{cases}
O_{i1} + E_{(\Delta+i)1}, & i < \Delta_m\\
E_{(i-(\Delta_m+1))1}, & i > \Delta_m
\end{cases} & \text{odd } i.
\end{cases} \tag{35}
$$

9.1 Multiplication Example over $GF(2^6)$

For the trinomial $f(x) = x^6 + x^3 + 1$ ($\Delta = 3$ and $\Delta_m = 2$), the 1-cycles and 2-cycles are the ones given in Section 6.1. The product coordinates are given in Table 7, where, on its left side, the sums of terms $(O_{31} + O_{31}) = 0$ and $(E_{41} + E_{41}) = 0$ can be observed for the coordinates $d_3$ and $d_4$, respectively. On the right side of Table 7, the resultant reduced table is given, where the product with $K_0'$ results from the combination and reduction of the products with $K_0$ and $K_2$. In this case, no complex groups $G_i$ of terms $E_{i1}$ and/or $O_{i1}$ can be found for sharing. The only existing grouping (of terms $x_{ij}$ and $x_k$) is given by their own terms $E_{i(0,1)}$ and $O_{i(0,1)}$. The nonexistence of $G_i$ groups implies that there are no more general expressions for the product coordinates than the ones given in (35). It can be proven that the space complexity of the multiplier is 36 AND and 33 XOR gates, whereas the time complexity is $T_{AND} + 4T_{XOR}$.

TABLE 6. Complexities for Canonical Multipliers Using Trinomials $f(x) = x^m + x^{(m-1)/2} + 1$ (Odd $m$)


The theoretical complexity of the multiplier can be computed using (35) and the $E_{i(0,1)}$ and $O_{i(0,1)}$ complexities given in Section 6.3, where binary trees of XOR gates are used for the construction. Using cancellations of terms such as the ones shown in the previous example, it can be proven that the space complexity of the multiplier is $m^2$ AND and $m^2 - \Delta$ XOR gates, which is a lower complexity than those given for the previously studied trinomials.

For the computation of the theoretical time complexity of the multiplier, the following depths of XOR trees for the $d_i$ product coordinates can be stated:

• When $i < \Delta_m$, $\mathrm{depth}(d_i) = \lceil \log_2(m-i-1) \rceil + 1$.

• For $i = \Delta_m$ (with $\Delta_m = \frac{m}{2} - 1$ for ESTs), $\mathrm{depth}(d_i) = \lceil \log_2(\frac{m}{2}) \rceil + 1$.

• For $i > \Delta_m$, $\mathrm{depth}(d_i) = \lceil \log_2(2\lceil \frac{i+1}{2} \rceil) \rceil + 1$.

The total delay of the multiplier can be computed using the highest complexity from the previous ones, which corresponds to the coordinate $d_{m-1}$ with XOR tree depth $\lceil \log_2 m \rceil + 1$. Finally, the theoretical time complexity is given by this number of levels of XOR gates plus one level of AND gates. Therefore, the delay of the multiplier is $T_{AND} + (1 + \lceil \log_2 m \rceil)T_{XOR}$.

In Table 8, the theoretical complexities obtained with our transpositional method and the best results found in the literature for similar multipliers are given. It can be observed that our method equals the best time complexities known to date (the result presented by Wu [18] equals the remaining delays given in Table 8 for the values of $m$ for which the ESTs are irreducible). The space complexity of our multipliers also matches these best results found in the literature.

10 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x + 1$

For the irreducible generating trinomial $f(x) = x^m + x + 1$, $\Delta = m - 1$, $\Delta_m = 0$, and $\lceil (m-1)/\Delta \rceil = 1$. The decomposition of $K$ given in (6) is only the sum of $K_0$ and $K_1$.

The product coordinates $d_i$ for even († odd) values of $\Delta$ using the transpositional approach can be computed as follows ($i = 0, 1, \ldots, m-1$):

$$
d_i = \begin{cases}
(O_{(i+1)0} + \dagger O_{i1}), & i = 0,\\
(O_{(i+1)0} + \dagger O_{i1}) + \dagger E_{(i-1)1}, & \text{even } i,\\
(E_{(i+1)0} + \dagger E_{i1}) + \dagger O_{(i-1)1}, & \text{odd } i,\\
(\dagger O_{(i+1)0}) + E_{(i-1)1}, & i = m-1.
\end{cases} \tag{36}
$$

10.1 Multiplication Example over $GF(2^6)$

For the irreducible trinomial $f(x) = x^6 + x + 1$ ($\Delta = 5$), the 1-cycles and 2-cycles are given in Section 6.1. The product coordinates obtained are shown in Table 9, where no complex groups $G_i$ of terms $E_{i1}$ and/or $O_{i1}$ can be found for sharing. The only existing grouping (of terms $x_{ij}$ and $x_k$) is given by their own terms $E_{i(0,1)}$ and $O_{i(0,1)}$. The nonexistence of $G_i$ groups implies that there are no more general expressions for the product coordinates than the ones given in (36). It can be proven that the space complexity of the multiplier is 36 AND and 35 XOR gates, whereas the time complexity is $T_{AND} + 4T_{XOR}$.


The theoretical complexity of the multiplier can be computed using (36) and the complexities given in Section 6.3, where binary trees of XOR gates are used for the construction. The space complexity of the multiplier is $m^2$ AND and $m^2 - 1$ XOR gates.

TABLE 8. Complexities for Canonical Multipliers Using Irreducible ESTs

For the computation of the theoretical time complexity of the multiplier, the following depths of XOR trees for the $d_i$ product coordinates can be stated:

• For $i = 0$, $\mathrm{depth}(d_i) = \lceil \log_2(m-1) \rceil + 1$.

• For $i \notin \{0, m-1\}$ with even $\Delta$ and using (36), it can be proven that, if $\mathrm{depth}(E_{11}) = \mathrm{depth}(O_{01})$, then $\mathrm{depth}(d_i) \le \lceil \log_2(m-1) \rceil + 2$, while, if $\mathrm{depth}(E_{11}) < \mathrm{depth}(O_{01})$, then $\mathrm{depth}(d_i) \le \lceil \log_2(m-1) \rceil + 1$.

• For $i \notin \{0, m-1\}$ with odd $\Delta$ and using (36), it can be proven that, if $\mathrm{depth}(O_{11}) = \mathrm{depth}(E_{01})$, then $\mathrm{depth}(d_i) \le \lceil \log_2(m-1) \rceil + 2$, while, if $\mathrm{depth}(O_{11}) < \mathrm{depth}(E_{01})$, then $\mathrm{depth}(d_i) \le \lceil \log_2(m-1) \rceil + 1$.

• When $i = m-1$ with even $\Delta$, then $\mathrm{depth}(d_i) = \lceil \log_2(2\lceil \frac{m}{2} \rceil - 1) \rceil + 1$, while, for odd $\Delta$, $\mathrm{depth}(d_i) = \lceil \log_2(2\lceil \frac{m}{2} \rceil) \rceil + 1$.

The total delay of the multiplier can be computed using the highest complexity from the previous ones. For $m = 3$, 4, and 6, $\mathrm{depth}(E(O)_{11}) < \mathrm{depth}(O(E)_{01})$, so, for these values of $m$, the theoretical time complexity of the multiplier is $T_{AND} + (1 + \lceil \log_2(2\lceil \frac{m}{2} \rceil) \rceil)T_{XOR}$, while, for the remaining values of $m$, the time complexity is $T_{AND} + (2 + \lceil \log_2(m-1) \rceil)T_{XOR}$.

In Table 10, the best theoretical complexities found in the literature and the complexities obtained with our transpositional approach are given. For our method, the results obtained for $m = 3$, 4, and 6 (denoted as Transpositional1) and the results obtained for the rest of the $m$ values (denoted as Transpositional2) are specified. For these trinomials, our multipliers show worse time complexities compared with the best results presented by Halbutogullari and Koc [3], Zhang and Parhi [20], and Sunar and Koc [17]. However, the space complexity of our multipliers matches these best results found in the literature.
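The closed-form complexities stated in Sections 6.3 to 10 can be collected in one small Python helper. This is a summary sketch under stated assumptions: the function names are hypothetical, the helper does not test irreducibility of the trinomial, and it only restates the formulas given in the text for the five trinomial types.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2(n)) for n >= 1, computed with integers only."""
    return (n - 1).bit_length()


def xor_levels(m: int, n: int) -> int:
    """XOR levels in the delay T_AND + (levels) T_XOR for f(x) = x^m + x^n + 1."""
    if n == 1:                       # Section 10
        if m in (3, 4, 6):
            return 1 + ceil_log2(2 * ((m + 1) // 2))
        return 2 + ceil_log2(m - 1)
    if n == m - 1:                   # Section 6
        return m
    if 2 * n == m:                   # Section 9 (ESTs)
        return 1 + ceil_log2(m)
    if 2 * n in (m + 1, m - 1):      # Sections 7 and 8 (odd m)
        return 2 + ceil_log2(m - 1)
    raise ValueError("trinomial type not covered in this paper")


def gate_counts(m: int, n: int) -> tuple:
    """(AND gates, XOR gates); ESTs save m/2 XOR gates through cancellation."""
    xor_levels(m, n)  # raises ValueError for unsupported trinomial types
    xors = m * m - (m // 2 if 2 * n == m and n != 1 else 1)
    return m * m, xors


if __name__ == "__main__":
    for m, n in [(7, 4), (7, 3), (6, 3), (6, 1)]:
        print((m, n), xor_levels(m, n), gate_counts(m, n))
```

The four printed cases reproduce the figures quoted in the worked examples: 49 AND, 48 XOR, and 5 XOR levels for both $GF(2^7)$ trinomials; 36 AND, 33 XOR, and 4 levels for $x^6 + x^3 + 1$; and 36 AND, 35 XOR, and 4 levels for $x^6 + x + 1$.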

11 CONCLUSIONS

In this paper, we have presented a new canonical basis multiplication method named transpositional. This method has been deduced from a new general formulation for the canonical basis multiplication in $GF(2^m)$ based on the use of a triangular basis. This approach introduces a product matrix that can be decomposed in a sum of matrices depending on the irreducible polynomial selected for the field. From this matrix decomposition, the transpositional multiplication method has been deduced. It uses the notation given in group theory for permutations and it is based on the computation of 1-cycles and 2-cycles given by the permutation defined by the coordinate of the product to be computed and by the cardinality of the field $GF(2^m)$. The obtained cycles define groups of subexpressions in sum-of-products form (the functions $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$) that can be shared among the product coordinates. A very important characteristic of our method is that these functions and the $K_0$ matrix are common to any selected irreducible polynomial, so our method can be completely generalized to perform the multiplication over general irreducible polynomials.

In order to prove the efficiency of our transpositional method, we have applied it to five types of irreducible trinomials, for which more complex groups $G_i$ of subexpressions for sharing are determined. These groups consist exclusively of the sum of terms $E_{i1}$ and $O_{i1}$ previously constructed, so the $G_i$ groups only contribute additional XORs to the multiplier complexity depending on the selected polynomial. We have presented explicit expressions for multiplication for these five types of trinomials. These expressions can be easily coded (without having any knowledge of finite field arithmetic) using hardware description languages, which is a very attractive feature for VLSI design and implementation of optimized multipliers.

The theoretical complexity analyses of the corresponding bit-parallel multipliers have shown that the space complexities of our multipliers match the best results found in the literature. The complexity analyses have also proven that our method reduces, in two of the five studied trinomials, the best time complexities known to date for similar multipliers. This is shown in Table 11, where a list of irreducible trinomials $f(x) = x^m + x^n + 1$ ($m \le 1000$) for which the transpositional method achieves better performance than the best results found in the literature for similar multipliers is given. These trinomials are represented as $(m, n)$ in Table 11. In another two of the studied trinomials, our multipliers match the best results found in the literature. Only in one of the five trinomials are the multiplier complexities given by our method worse than the best complexities presented by other authors.

TABLE 10. Complexities for Canonical Multipliers Using Irreducible Trinomials $f(x) = x^m + x + 1$

REFERENCES

[1] S.T.J. Fenn, M. Benaissa, and D. Taylor, “$GF(2^m)$ Multiplication and Division over the Dual Basis,” IEEE Trans. Computers, vol. 45, no. 3, pp. 319-327, Mar. 1996.

[2] A. Halbutogullari and C.K. Koc, “Mastrovito Multiplier for General Irreducible Polynomials,” Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, pp. 498-507, 1999.

[3] A. Halbutogullari and C.K. Koc, “Mastrovito Multiplier for General Irreducible Polynomials,” IEEE Trans. Computers, vol. 49, no. 5, pp. 503-518, May 2000.

[4] M.A. Hasan, “Double-Basis Multiplicative Inversion over $GF(2^m)$,” IEEE Trans. Computers, vol. 47, no. 9, pp. 960-970, Sept. 1998.

[5] M.A. Hasan and V.K. Bhargava, “Architecture for a Low Complexity Rate-Adaptive Reed-Solomon Encoder,” IEEE Trans. Computers, vol. 44, no. 6, pp. 938-942, June 1995.

[6] M.A. Hasan, M.Z. Wang, and V.K. Bhargava, “Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields $GF(2^m)$,” IEEE Trans. Computers, vol. 41, no. 8, pp. 962-971, Aug. 1992.

[7] M.A. Hasan, M.Z. Wang, and V.K. Bhargava, “A Modified Massey-Omura Parallel Multiplier for a Class of Finite Fields,” IEEE Trans. Computers, vol. 42, no. 10, pp. 1278-1280, Oct. 1993.

[8] J.L. Imana and J.M. Sanchez, “A New Reconfigurable-Oriented Method for Canonical Basis Multiplication Over a Class of Finite Fields $GF(2^m)$,” Proc. 13th Int’l Conf. Field Programmable Logic and Applications, pp. 1127-1130, 2003.

[9] T. Itoh and S. Tsujii, “Structure of Parallel Multipliers for a Class of Finite Fields $GF(2^m)$,” Information and Computation, vol. 83, pp. 21-40, 1989.

[10] C.K. Koc and B. Sunar, “Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,” IEEE Trans. Computers, vol. 47, no. 3, pp. 353-356, Mar. 1998.

[11] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their Applications. New York: Cambridge Univ. Press, 1994.

[12] E.D. Mastrovito, “VLSI Architectures for Multiplication over Finite Fields $GF(2^m)$,” Proc. Sixth Int’l Conf. Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes (AAECC-6), pp. 297-309, July 1988.

[13] Applications of Finite Fields, A.J. Menezes, ed. Boston: Kluwer Academic, 1993.

[14] J. Omura and J. Massey, “Computational Method and Apparatus for Finite Field Arithmetic,” US Patent Number 4,587,627, May 1986.

[15] K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. John Wiley & Sons, 1999.

[16] A. Reyhani-Masoleh and M.A. Hasan, “On Low Complexity Bit Parallel Polynomial Basis Multipliers,” Proc. Workshop on Cryptographic Hardware and Embedded Systems (CHES 2003), pp. 189-202, 2003.

[17] B. Sunar and C.K. Koc, “Mastrovito Multiplier for All Trinomials,” IEEE Trans. Computers, vol. 48, no. 5, pp. 522-527, May 1999.

[18] H. Wu, “Bit-Parallel Finite Field Multiplier and Squarer Using Polynomial Basis,” IEEE Trans. Computers, vol. 51, no. 7, pp. 750-758, July 2002.

[19] H. Wu and M.A. Hasan, “Low-Complexity Bit-Parallel Multipliers for a Class of Finite Fields,” IEEE Trans. Computers, vol. 47, no. 8, pp. 883-887, Aug. 1998.

[20] T. Zhang and K.K. Parhi, “Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials,” IEEE Trans. Computers, vol. 50, no. 7, pp. 734-749, July 2001.

Jose Luis Imana received the PhD degree in physics from the Complutense University of Madrid, Spain, in 2003. From 1991 to 1993, he was a design engineer (R&D) with the Department of Information Technologies, Technology Institute of Madrid, Spain. He is currently with the Department of Computer Architecture and Automation at the Complutense University of Madrid, where he is an assistant professor. His research interests include algorithms and VLSI architectures for computations in Galois fields, cryptography, computer arithmetic, reconfigurable computing architectures, and formal methods in verification.

Juan Manuel Sanchez received the PhD degree in physics from the Complutense University of Madrid in 1976. He is a professor of computer architecture in the Department of Computer Science, University of Extremadura, Spain. His research interests are applications of reconfigurable hardware, logic design, modern computer architectures, and cryptography.

Francisco Tirado received the applied physics degree from Universidad Complutense de Madrid (UCM) in 1973 and the PhD degree in physics from UCM in 1977. He has held several positions with the Computer Science and Automatic Control Department of the UCM. From 1978 to 1985, he was an associate professor and, since 1986, he has been a professor of computer architecture and technology. He has worked in different fields within computer architecture, parallel processing, and design automation. His current research areas are parallel algorithms and architectures and processor design. Professor Tirado has coauthored more than 200 publications: 15 book chapters, 47 magazine articles, and 133 papers at conferences. He has served in the organization of more than 60 international conferences as general chair, steering committee member, program chair, program committee member, invited speaker, and session chair. He is the director of the CSC4 (Center for SuperComputation) and Madrid Science Park. He has been the dean of the Physics Science and Electronic Engineering Faculty (1994-2002). He is a member of the Informatics Advisory Board of the UCM and he has also been vice-dean of the Physics Science Faculty and head of the Computer Science and Automatic Control Department. For five years (1988-1992), he served as general manager of the Spanish National Programme for Robotics and Advanced Automation. He is an adviser of the National Agency for Research and Development (CICYT). He also represents the CICYT on several national and international committees on information technology. Professor Tirado served on the research evaluation committee in Spain for three years and chaired it in 2001-2002. He is a senior member of the IEEE and of several European institutions and committees. He is an adviser of the Spanish Ministry of Science and Technology.


TABLE 11. Irreducible Trinomials ($m \le 1000$) for which the Transpositional Method Achieves Better Performance
