Bit-Parallel Finite Field Multipliers for Irreducible Trinomials
Jose Luis Imana, Juan Manuel Sanchez, and Francisco Tirado, Senior Member, IEEE
Abstract: A new formulation for the canonical basis multiplication in the finite fields $GF(2^m)$, based on the use of a triangular basis and on the decomposition of a product matrix, is presented. From this algorithm, a new method for multiplication (named transpositional) applicable to general irreducible polynomials is deduced. The transpositional method is based on the computation of 1-cycles and 2-cycles given by a permutation defined by the coordinate of the product to be computed and by the cardinality of the field $GF(2^m)$. The obtained cycles define groups corresponding to subexpressions that can be shared among the different product coordinates. This new multiplication method is applied to five types of irreducible trinomials. These polynomials have been widely studied due to their low-complexity implementations. The theoretical complexity analysis of the corresponding bit-parallel multipliers shows that the space complexities of our multipliers match the best results known to date for similar canonical $GF(2^m)$ multipliers. The most important new result is the reduction, in two of the five studied trinomials, of the time complexity with respect to the best known results.
Index Terms—Finite (or Galois) fields, multiplication, canonical basis, irreducible trinomials, complexity, triangular basis, matrix
decomposition, permutation, cycles, transpositions.
1 INTRODUCTION
EFFICIENT hardware implementations of arithmetic operations in the Galois field $GF(2^m)$ are highly desirable for several applications, such as coding theory, computer algebra, and cryptography [11], [13]. The efficiency of the hardware implementations is measured in terms of the number of gates (XOR and AND) and of the total gate delay of the circuit ($T_{XOR}$ and $T_{AND}$). The representation of the field elements has a crucial role in determining the space and time complexities of the arithmetic operations, particularly the multiplication, which is considered the most important building block. A number of efficient $GF(2^m)$ multiplication approaches and architectures have been proposed in which different basis representations of field elements are used. Among them, the most widely used are the canonical (or standard or polynomial) [9], [12], normal [14], and dual [1] bases, although other ones, such as the triangular [4] basis, can also be used. The complexity of the multiplier also depends on the defining irreducible polynomial selected for the field. In this paper, we are interested in the design of bit-parallel finite field multipliers using the canonical basis for irreducible trinomials.
The canonical basis multiplication requires a polynomial multiplication followed by a modular reduction. An efficient bit-parallel multiplier was proposed by Mastrovito [12] in which a product matrix is introduced to combine the above two steps. The entries in this matrix can be computed efficiently by sharing common items, a method that is known as subexpression sharing [15]. The Mastrovito multipliers using the special irreducible trinomials have been widely studied due to their low-complexity implementations [2], [3], [17], [20]. All these works exploit subexpression sharing in order to find an efficient architecture for the multipliers. It has been shown in [17] that, when the irreducible trinomial is of the general form $x^m + x^n + 1$ for $n = 1, 2, \ldots, m-1$ and $m \neq 2n$, the Mastrovito multiplier only requires $(m^2 - 1)$ XOR gates and $m^2$ AND gates. However, the required number of XOR gates is reduced to $(m^2 - \frac{m}{2})$ for the trinomial $x^m + x^{m/2} + 1$, where $m$ is even [17].
In this paper, we present a new canonical basis multiplication method named transpositional. This method was introduced in [8], but only for fields generated by irreducible AOPs (all-one polynomials). In this contribution, we generalize the method and apply it to irreducible trinomials. In order to do so, we give a new general formulation for the canonical basis multiplication in $GF(2^m)$ based on the use of a triangular basis. This approach introduces a product matrix that can be decomposed into a sum of matrices depending on the irreducible polynomial selected for the field. The decomposition of a matrix is an idea already used in similar multiplication approaches [7], [10], [20] over finite fields that try to exploit subexpression sharing in order to obtain $GF(2^m)$ multipliers with reduced complexities. From this matrix decomposition, the transpositional multiplication method is deduced. It uses the notation given in group theory for permutations and it is based on the computation of 1-cycles and 2-cycles given by the permutation defined by the coordinate of the product to be computed and by the cardinality of the field $GF(2^m)$. The obtained cycles define groups corresponding to subexpressions in sum-of-products form that can be shared among the different expressions of the product coordinates. A very important characteristic of our method is that these first groups of subexpressions and the first matrix obtained from the product matrix decomposition are common to any irreducible polynomial we could select, so our method can be completely generalized to perform the multiplication over general irreducible polynomials. In this paper, the transpositional method is applied to five types of irreducible trinomials, for which more complex groups of subexpressions for sharing are determined. We also present explicit expressions for multiplication for these five types of trinomials. These expressions can be easily coded using hardware description languages, such as VHDL and Verilog, to implement optimized multipliers. These codings can be done without having any knowledge of finite field arithmetic. The theoretical complexity analyses of the corresponding bit-parallel multipliers show that the space complexities of our multipliers match the best results found in the literature, while our method reduces, in two of the five studied trinomials, the best time complexities known to date for similar multipliers.

IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 5, MAY 2006
J.L. Imana and F. Tirado are with the Departamento de Arquitectura de Computadores y Automatica, Facultad Ciencias Fisicas, Universidad Complutense, 28040 Madrid, Spain. E-mail: {jluimana, ptirado}@dacya.ucm.es.
J.M. Sanchez is with the Departamento de Informatica, Escuela Politecnica, Avda. Universidad s/n, 10071 Caceres, Spain. E-mail: [email protected].
Manuscript received 28 May 2004; revised 30 May 2005; accepted 20 Oct. 2005; published online 22 Mar. 2006.
For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-0181-0504.
0018-9340/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society.
We first introduce some basic concepts of the Galois fields $GF(2^m)$ in Section 2. In Section 3, a new formulation for the canonical basis multiplication over $GF(2^m)$ is presented. The application of the new algorithm for multiplication to irreducible trinomials and the product matrix decomposition are given in Section 4. From these, the first expressions for the transpositional multiplication method are given in Section 5, where a new notation based on permutations is used and where a first grouping of subexpressions can be observed. The transpositional method is applied to five types of trinomials in Sections 6 to 10, where new particular expressions and more complex groups $G_i$ of subexpressions are given for each trinomial. Theoretical complexity analyses and some examples are also presented. Finally, the conclusions are summarized in Section 11.
2 PRELIMINARIES
The finite field $GF(2^m)$ can be considered as a vector space of dimension $m$ over the binary field $GF(2)$. Therefore, elements of $GF(2^m)$ are represented by binary vectors of length $m$. Field addition is realized in all bases by a bit-wise XOR operation, whereas the structure of multiplication is determined by the choice of the basis. A canonical basis $\Omega$ is the set of elements $\Omega = \{1, \omega, \omega^2, \ldots, \omega^{m-1}\}$, where $\omega$ is a root in $GF(2^m)$ of an irreducible polynomial $f(x) = \sum_{i=0}^{m} f_i x^i$ of degree $m$ over $GF(2)$. Using this basis, the elements of the field $GF(2^m)$ are polynomials of degree at most $m-1$ over $GF(2)$ and arithmetic is carried out modulo $f(x)$. The set $\Lambda = \{\lambda_0, \lambda_1, \ldots, \lambda_{m-1}\}$ of $m$ elements is called the triangular basis [5], [4] of $\Omega$ if $\lambda_i = \sum_{j=0}^{m-1-i} f_{i+j+1}\,\omega^j$, $0 \le i \le m-1$, where the $f_i$ are the coefficients of $f(x)$. An element $\alpha \in GF(2^m)$ can be represented with respect to $\Omega$ as

$$\alpha = \sum_{i=0}^{m-1} a_{\Omega_i}\,\omega^i = (1, \omega, \ldots, \omega^{m-1}) \cdot (a_{\Omega_0}, a_{\Omega_1}, \ldots, a_{\Omega_{m-1}})^T,$$

where the $a_{\Omega_i}$ are the coordinates of $\alpha$ with respect to $\Omega$. We can denote as $\alpha_\Omega$ the vector of the coordinates of $\alpha$ with respect to $\Omega$, i.e., $\alpha_\Omega = (a_{\Omega_0}, \ldots, a_{\Omega_{m-1}})^T$. The coordinates vector of $\alpha$ with respect to $\Lambda$ can be computed as $\alpha_\Lambda = T \cdot \alpha_\Omega$, where [4]

$$T = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 & 1 \\
0 & 0 & 0 & \cdots & 0 & 1 & t_1 \\
0 & 0 & 0 & \cdots & 1 & t_1 & t_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots \\
0 & 1 & t_1 & \cdots & t_{m-4} & t_{m-3} & t_{m-2} \\
1 & t_1 & t_2 & \cdots & t_{m-3} & t_{m-2} & t_{m-1}
\end{pmatrix} \tag{1}$$

and $t_j = \sum_{i=0}^{j-1} f_{m-j+i}\,t_i$, for $0 < j \le m-1$, and $t_0 = 1$. We also have that the $m \times m$ Hankel matrix $H(\alpha_\Lambda) = (\alpha_\Lambda, (\alpha\omega)_\Lambda, \ldots, (\alpha\omega^{m-1})_\Lambda)$ can be computed with the expression [4]

$$(\alpha\omega^{i+1})_{\Lambda_j} = \begin{cases}
(\alpha\omega^{i})_{\Lambda_{j+1}} & 0 \le j < m-1 \\
\sum_{l=0}^{m-1} (\alpha\omega^{i})_{\Lambda_l}\, f_l & j = m-1.
\end{cases} \tag{2}$$

Using the above concepts, we present in the following a new general formulation for the canonical basis multiplication in $GF(2^m)$ based on the use of a triangular basis.
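The change of basis in (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the representation of $f(x)$ as a coefficient list `[f_0, ..., f_m]` and the function names `triangular_transform`, `to_triangular`, and `from_triangular` are assumptions made here. The inverse map is written directly from the definition of the triangular basis elements, so a round trip through both functions should return the original canonical coordinates.

```python
def triangular_transform(f):
    """Build the matrix T of (1); f = [f_0, ..., f_m] over GF(2) (assumed layout)."""
    m = len(f) - 1
    t = [1]                                    # t_0 = 1
    for j in range(1, m):                      # t_j = sum_{i<j} f_{m-j+i} t_i (mod 2)
        t.append(sum(f[m - j + i] & t[i] for i in range(j)) & 1)
    # T[r][c] = t_{r+c-(m-1)} on and below the anti-diagonal, 0 above it
    return [[t[r + c - (m - 1)] if r + c >= m - 1 else 0 for c in range(m)]
            for r in range(m)]


def to_triangular(T, coords):
    """Coordinates in the triangular basis: T times the canonical coordinates."""
    return [sum(row[k] & coords[k] for k in range(len(coords))) & 1 for row in T]


def from_triangular(f, lam):
    """Inverse map, from lambda_i = sum_j f_{i+j+1} omega^j."""
    m = len(f) - 1
    return [sum(f[i + j + 1] & lam[i] for i in range(m - j)) & 1 for j in range(m)]
```

For $f(x) = x^2 + x + 1$, the recurrence gives $t_1 = 1$, so the sketch yields the $2 \times 2$ matrix with rows $(0, 1)$ and $(1, 1)$.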
3 CANONICAL BASIS MULTIPLICATION
Let $\alpha, \gamma, \delta \in GF(2^m)$ and $\alpha_\Omega$, $\gamma_\Omega$, $\delta_\Omega$ be their coordinate vectors, respectively, with respect to $\Omega$. The product $\delta = \alpha \cdot \gamma$ can then be performed as follows:

1. Represent $\gamma$ in the triangular basis $\Lambda$ as $\gamma_\Lambda = T \cdot \gamma_\Omega$.

2. Construct the Hankel matrix for $\gamma_\Lambda$,

$$H(\gamma_\Lambda) = (\gamma_\Lambda, (\gamma\omega)_\Lambda, \ldots, (\gamma\omega^{m-1})_\Lambda).$$

3. Construct a new $m \times m$ matrix, $K(\gamma_\Lambda)$, defined as

$$K(\gamma_\Lambda) = H(\gamma_\Lambda) \cdot F, \tag{3}$$

where $F$ is a new $m \times m$ Toeplitz matrix defined as

$$F = \begin{pmatrix}
f_m & f_{m-1} & f_{m-2} & \cdots & f_2 & f_1 \\
0 & f_m & f_{m-1} & \cdots & f_3 & f_2 \\
0 & 0 & f_m & \cdots & f_4 & f_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & f_m & f_{m-1} \\
0 & 0 & 0 & \cdots & 0 & f_m
\end{pmatrix} \tag{4}$$

and where the $f_i$, $1 \le i \le m$, are the coefficients of the irreducible polynomial $f(x)$.

4. Let $\delta^R_\Omega$ be the inverted coordinates of $\delta_\Omega$. The product $\delta = \alpha \cdot \gamma$ in the canonical basis $\Omega$ can be finally computed as

$$\delta^R_\Omega = (d_{\Omega_{m-1}}, d_{\Omega_{m-2}}, \ldots, d_{\Omega_1}, d_{\Omega_0}) = \alpha^T_\Omega \cdot K(\gamma_\Lambda). \tag{5}$$

In (5), the product matrix $K$ depends on the irreducible polynomial selected and the coordinates are given in sum-of-products form, where these operations are performed over $GF(2)$. $K$ can be decomposed into a sum of matrices in such a way that some groups of subexpressions can be shared among the product coordinates. This grouping of shared subexpressions constitutes a new method for the canonical basis multiplication that we have named transpositional. The number and structure of the matrices obtained from the decomposition of $K$ depend on the irreducible polynomial selected. Subexpression sharing [15] has been used by many authors [2], [3], [6], [9], [10], [17], [19], [20] to find low-complexity architectures for finite field multipliers. In this work, we apply the multiplication algorithm given in (5) to irreducible trinomials. From this application, we deduce the transpositional multiplication method and present the product coordinates expressions obtained for five types of irreducible trinomials. The theoretical space and time complexities for the bit-parallel multipliers are also given.
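The four steps of the algorithm can be sketched in software and compared against a schoolbook multiplication followed by reduction. The sketch below is illustrative only and assumes, as conventions chosen here, that $f(x)$ is a coefficient list `[f_0, ..., f_m]` and that elements are lists of canonical coordinates; the names `product_matrix`, `multiply`, and `reference_multiply` are hypothetical helpers, not part of the paper.

```python
def gf2_dot(u, v):
    """Inner product over GF(2)."""
    return sum(x & y for x, y in zip(u, v)) & 1


def product_matrix(f, c):
    """K = H . F of (3) for f = [f_0..f_m] and canonical coordinates c of the multiplier."""
    m = len(f) - 1
    # Step 1: change to the triangular basis with T from (1)
    t = [1]
    for j in range(1, m):
        t.append(sum(f[m - j + i] & t[i] for i in range(j)) & 1)
    T = [[t[r + k - (m - 1)] if r + k >= m - 1 else 0 for k in range(m)]
         for r in range(m)]
    col = [gf2_dot(row, c) for row in T]
    # Step 2: Hankel matrix; each new column follows the shift rule (2)
    cols = [col]
    for _ in range(m - 1):
        col = col[1:] + [gf2_dot(col, f[:m])]
        cols.append(col)
    H = [[cols[k][r] for k in range(m)] for r in range(m)]
    # Step 3: upper-triangular Toeplitz F of (4), F[r][k] = f_{m-(k-r)} for k >= r
    F = [[f[m - (k - r)] if k >= r else 0 for k in range(m)] for r in range(m)]
    return [[gf2_dot(H[r], [F[x][k] for x in range(m)]) for k in range(m)]
            for r in range(m)]


def multiply(f, a, c):
    """Step 4: row vector a times K gives (d_{m-1}, ..., d_0); return (d_0, ..., d_{m-1})."""
    m = len(f) - 1
    K = product_matrix(f, c)
    d_rev = [gf2_dot(a, [K[r][k] for r in range(m)]) for k in range(m)]
    return d_rev[::-1]


def reference_multiply(f, a, c):
    """Schoolbook polynomial product followed by reduction modulo f."""
    m = len(f) - 1
    s = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            s[i + j] ^= a[i] & c[j]
    for k in range(2 * m - 2, m - 1, -1):      # cancel x^k for k >= m
        if s[k]:
            for j in range(m + 1):
                s[k - m + j] ^= f[j]
    return s[:m]
```

Comparing `multiply` with `reference_multiply` over all pairs of elements of a small field, for instance with `f = [1, 0, 0, 0, 0, 1, 1]` for $x^6 + x^5 + 1$, is a simple way to check the formulation.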
4 IRREDUCIBLE TRINOMIALS
In the canonical basis $\Omega$, the product $\delta$ of two elements $\alpha$ and $\gamma$, $\delta = \alpha \cdot \gamma$, from a finite field generated by an irreducible trinomial $f(x) = x^m + x^n + 1$ can be performed using the new multiplication algorithm given in Section 3. When irreducible trinomials are used, the $F$ matrix presents only two not null terms $f_i$, corresponding to the nonzero coefficients $f_m$ and $f_n$ from the trinomial. The $K$ matrix can be decomposed into a sum of matrices whose number depends on the values of $m$ and $n$ corresponding to $f_m$ and $f_n$ from the selected trinomial. The decomposition of $K$ into a sum of $m \times m$ matrices is performed in the following form:

$$K = K_0 + \sum_{i=1}^{\mu} K_i, \tag{6}$$

where $\mu = \lceil \frac{m-1}{m-n} \rceil$ and where the $K_0$ matrix is common to any selected irreducible trinomial (moreover, it can be proven that $K_0$ is common to any irreducible polynomial). The structure of $K_0$ is the following:

$$K_0 = \begin{pmatrix}
c_{\Omega_{m-1}} & c_{\Omega_{m-2}} & \cdots & c_{\Omega_1} & c_{\Omega_0} \\
c_{\Omega_{m-2}} & c_{\Omega_{m-3}} & \cdots & c_{\Omega_0} & c_{\Omega_{m-1}} \\
c_{\Omega_{m-3}} & c_{\Omega_{m-4}} & \cdots & c_{\Omega_{m-1}} & c_{\Omega_{m-2}} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
c_{\Omega_1} & c_{\Omega_0} & \cdots & c_{\Omega_3} & c_{\Omega_2} \\
c_{\Omega_0} & c_{\Omega_{m-1}} & \cdots & c_{\Omega_2} & c_{\Omega_1}
\end{pmatrix}, \tag{7}$$

where the $c_{\Omega_i}$ terms are the coordinates of $\gamma$ with respect to $\Omega$ and where $K_0$ is a Hankel matrix. From the decomposition given in (6), we have that, for any trinomial $f(x) = x^m + x^n + 1$, there exists at least one matrix $K_1$ whose structure is given in (8), where the subscripts $\Omega$ have been removed from the terms $c_{\Omega_i}$ for clarity.

The $K_1$ matrix presents some particularities. The first one is that its $n$th column (numbered from 1 to $m$, from right to left) has all its elements null, where the index $n$ of this column corresponds to the index of the not null coefficient $f_n$ of the selected irreducible trinomial. It can be considered that the $n$th column divides the $m \times m$ $K_1$ matrix into two submatrices. The first one is made up of the $m-n$ columns located to the left of the null column and the second one is made up of the $n-1$ columns to the right of the $n$th column. We denote the left $m \times (m-n)$ submatrix as $L_1$ and the right $m \times (n-1)$ submatrix as $R_1$, in such a way that $K_1$ can be represented as $K_1 = (L_1 | R_1)$. The submatrices $L_1$ and $R_1$ are constructed by the consecutive shifting of their less significant columns one row down with zero insertion in the top one. The most significant column of $L_1$ is made up of $m-n$ zeros in the upper rows and of $n$ not null rows with $c_i$ coefficients, while the less significant column of $R_1$ is made up of $m-n+1$ null rows and of $n-1$ rows with $c_i$ coefficients.

$$K_1 = \left(\begin{array}{cccc|c|cccc}
0 & \cdots & 0 & 0 & 0 & 0 & 0 & \cdots & 0 \\
0 & \cdots & 0 & c_{m-1} & 0 & 0 & 0 & \cdots & 0 \\
0 & \cdots & c_{m-1} & c_{m-2} & 0 & 0 & 0 & \cdots & 0 \\
0 & \cdots & c_{m-2} & c_{m-3} & 0 & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
0 & \cdots & c_{n+2} & c_{n+1} & 0 & 0 & 0 & \cdots & 0 \\
c_{m-1} & \cdots & c_{n+1} & c_{n} & 0 & 0 & 0 & \cdots & 0 \\
c_{m-2} & \cdots & c_{n} & c_{n-1} & 0 & 0 & 0 & \cdots & c_{m-1} \\
\vdots & & \vdots & \vdots & \vdots & \vdots & \vdots & & \vdots \\
c_{m-n+1} & \cdots & c_3 & c_2 & 0 & 0 & c_{m-1} & \cdots & c_{m-n+2} \\
c_{m-n} & \cdots & c_2 & c_1 & 0 & c_{m-1} & c_{m-2} & \cdots & c_{m-n+1}
\end{array}\right). \tag{8}$$
The existence of matrices $K_2, K_3, \ldots$ in the summation given in (6) depends on the position of the $n$th null column in $K_1$, as determined in the following. Starting with $K_1$, there will exist a $K_2 = (L_2 | R_2)$ matrix in the decomposition of $K$ if the most significant column of $L_1$ has $n > 1$ not null $c_i$ terms. The construction of the submatrices $L_2$ and $R_2$ is performed using $R_1$ in the following form:

• If $m < 2n - 1$, the submatrix $L_2$ consists of the first $m-n$ columns from $R_1$ and the remaining $2n-m-1$ columns from $R_1$ are the first columns of $R_2$. The remaining $m-n$ columns of $R_2$ will be null columns. In this case, a $K_3$ matrix also exists, constructed from $K_2$ in the same way that $K_2$ is constructed from $K_1$.

• If $m \ge 2n - 1$, then $K_2$ is the last matrix in the summation given in (6). The submatrix $L_2$ consists of the $n-1$ columns from $R_1$, while the remaining $m-2n+1$ columns of $L_2$ (if they exist) are completed with null columns. In this case, $R_2$ will be a submatrix with all its elements null.

In general, for any $K_j = (L_j | R_j)$ matrix with $j \ge 2$, it is verified that if the $(m-n)$th column from $L_j$ presents $h > 1$ coefficients $c_i$ in its lower rows, then there will exist a matrix $K_{j+1} = (L_{j+1} | R_{j+1})$ whose $L_{j+1}$ and $R_{j+1}$ submatrices are constructed from $R_j$ as follows:

• If $h - 1 > m - n$, then $L_{j+1}$ consists of the first $m-n$ columns from $R_j$, while the remaining $h-1-m+n$ columns from $R_j$ are the first columns of $R_{j+1}$, completed with $m-h$ null columns. In this case, there will exist another $K_{j+2}$ matrix constructed in an analogous way to $K_{j+1}$.

• If $h - 1 \le m - n$, the first $h-1$ columns of $L_{j+1}$ are the $h-1$ not null columns from $R_j$, completed with $m-n-h+1$ null columns, while $R_{j+1}$ has all its columns null. In this case, no more matrices exist.

The number of $K_j$ matrices, $j \ge 2$, existing in the summation given in (6) depends on the value of the $n$th not null power of the selected irreducible trinomial $f(x) = x^m + x^n + 1$. Therefore, special trinomials with a determined number of matrices in the $K$ summation can be distinguished. Once $K$ is decomposed into a sum of $\mu + 1$ matrices, $\mu = \lceil \frac{m-1}{m-n} \rceil$, and using the multiplication algorithm given in Section 3, the product $\delta = \alpha \cdot \gamma$ in the canonical basis $\Omega$ generated by an irreducible trinomial can then be computed by

$$\delta^R_\Omega = \alpha^T_\Omega \cdot K = \alpha^T_\Omega \cdot \left( K_0 + \sum_{i=1}^{\mu} K_i \right) = \alpha^T_\Omega \cdot K_0 + \cdots + \alpha^T_\Omega \cdot K_\mu, \tag{9}$$

where the coordinates of $\delta$ are given as sum-of-products of the coordinates of $\alpha$ and $\gamma$. From (9), a new method for multiplication (that we have named transpositional) based on the grouping and sharing of common subexpressions is introduced in the following section.
5 TRANSPOSITIONAL METHOD FOR MULTIPLICATION
The product coordinates given in (9) are computed by summing the $\alpha^T_\Omega K_i$ ($i = 0, \ldots, \mu$) vectors, where $\alpha^T_\Omega K_0$ and $\alpha^T_\Omega K_1$ always appear in the summation, independently of the selected trinomial (moreover, the $\alpha^T_\Omega K_0$ vector is common to any irreducible polynomial). The $K_0$ matrix defined in (7) consists of $m$ columns formed by the successive rotation of the coordinates vector in $\Omega$ of $\gamma$, while those of $K_1$ and $K_j$ ($j = 2, 3, \ldots, \mu$) are formed by successive shiftings of the $\gamma$ coordinates vector. An important point is that the components of $\alpha^T_\Omega K_1$ and $\alpha^T_\Omega K_j$ ($j = 2, 3, \ldots, \mu$) consist of sums of products that also appear in the components of $\alpha^T_\Omega K_0$. Common subexpressions grouped in sums of products and extracted from the $\alpha^T_\Omega K_0$ components can therefore be determined, and their sharing leads to the reduction of the multiplier complexity. In order to give a new expression for the computation of the vectors $\alpha^T_\Omega K_i$ ($i = 0, 1, \ldots, \mu$), we use the notation given in group theory for permutations, which is recalled with the following example:

The sum of products $a_0c_1 + a_1c_0 + a_3c_9 + a_4c_8 + a_5c_7 + a_6c_6 + a_7c_5 + a_8c_4 + a_9c_3$ can be considered as the inner product of the coordinates vectors of the field elements $\alpha$ and $\gamma$ represented in $\Omega$ by $(a_0, a_1, a_3, a_4, a_5, a_6, a_7, a_8, a_9)$ and $(c_1, c_0, c_9, c_8, c_7, c_6, c_5, c_4, c_3)$, respectively, where the subscripts $\Omega$ have been omitted for clarity. The product terms $a_ic_j$ can then be represented by the permutation

$$\begin{pmatrix} 0 & 1 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & 0 & 9 & 8 & 7 & 6 & 5 & 4 & 3 \end{pmatrix},$$

where the upper row contains the subscripts of the coordinates of $\alpha$ which will be multiplied by the coordinates of $\gamma$ with subscripts given in the lower row. Starting with the symbol 0, we see that the permutation takes 0 into 1, representing the product $a_0c_1$. We then look for 1 in the upper line and see that the permutation takes 1 into 0 (the $a_1c_0$ product), closing a cycle, which we write as $(0, 1)$. This cycle represents the sum $(a_0c_1 + a_1c_0)$ in our notation. We now start with some other symbol in the top line, say 6. The permutation takes 6 into 6, giving the cycle $(6)$ representing $a_6c_6$. Continuing, we find the cycles $(3, 9)$, $(4, 8)$, and $(5, 7)$. We may finally write our example permutation as the cycles $(0, 1)(3, 9)(4, 8)(5, 7)(6)$. The sum of products given for this example will be the addition of the terms $x_{ij} = (a_ic_j + a_jc_i)$ and $x_k = (a_kc_k)$ represented by the 2-cycles $(i, j)$ and 1-cycles $(k)$, respectively. The 2-cycles $(i, j)$ are called, in group theory, transpositions.

Using this new notation, we introduce the functions which provide the cycles for given values of the parameters $i$ and $m$. We group them according to whether or not they depend on $m$ and according to the even or odd value of $i$, as follows:
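• Function independent of $m$ and valid for even values of $i$:

$$EC_{i0} = (h, l) \qquad h = 0, 1, \ldots, \tfrac{i}{2} - 1; \quad l = (i - h - 1); \quad h \ge 0. \tag{10}$$

• Function independent of $m$ and valid for odd values of $i$:

$$OC_{i0} = \begin{cases}
(k) & k = \tfrac{i-1}{2} \\
(h, l) & h = 0, 1, \ldots, \tfrac{i-1}{2} - 1; \quad l = (i - h - 1); \quad h \ge 0.
\end{cases} \tag{11}$$

• Functions dependent on $m$ and valid for even or odd values of $i$:

$$EC_{i1} = \begin{cases}
(k) & k = \lceil \tfrac{m}{2} \rceil + \lfloor \tfrac{i}{2} \rfloor \\
(p, r) & j = 1, 2, \ldots, \left(\lceil \tfrac{m}{2} \rceil + \lfloor \tfrac{i}{2} \rfloor\right) - (i + 1); \quad p = (i + j); \quad r = (m - j),
\end{cases} \tag{12}$$

$$OC_{i1} = (p, r) \qquad j = 1, 2, \ldots, \left(\lceil \tfrac{m}{2} \rceil + \lfloor \tfrac{i+1}{2} \rfloor\right) - (i + 1); \quad p = (i + j); \quad r = (m - j). \tag{13}$$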
The new functions $EC_{i0}$, $OC_{i0}$, $EC_{i1}$, and $OC_{i1}$ compute the 1-cycles and 2-cycles for given values of $i$ and $m$, and determine the subexpression grouping characteristic of the transpositional method. This is done by defining the new functions $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$, which carry out the sum of the $x_k$ and $x_{ij}$ terms represented by the cycles given by $EC_{i0}$, $OC_{i0}$, $EC_{i1}$, and $OC_{i1}$, respectively.

The $\alpha^T_\Omega K_0$ vector is made from the sum of the new functions $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$. Moreover, the subexpressions given by $E_{i1}$ and $O_{i1}$ constitute the components of all the $\alpha^T_\Omega K_i$ ($i = 1, \ldots, \mu$) vectors. As these functions are already built for the construction of the $\alpha^T_\Omega K_0$ vector, they do not have to be reconstructed, so their sharing reduces the multiplier complexity. The $\alpha^T_\Omega K_0$ and $\alpha^T_\Omega K_1$ vectors always appear in the summation given in (9). The composition of $\alpha^T_\Omega K_0$ is common to any trinomial selected, but the composition of $\alpha^T_\Omega K_1$ and the number and composition of $\alpha^T_\Omega K_j$ ($j = 2, \ldots, \mu$) in (9) depend on the selected trinomial. Using $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$, the $i$th ($i = 0, \ldots, m-1$) components of $\alpha^T_\Omega K_0$ can be computed for even ($\dagger$ odd) values of $m$ as follows:

$$(\alpha^T_\Omega K_0)_i = \begin{cases}
\dagger O_{(m-i)0} + E_{(m-i-1)1} & \text{odd } i \\
\dagger E_{(m-i)0} + \begin{cases} O_{(m-i-1)1} & \text{even } i,\ i \neq 0 \\ \emptyset & i = 0. \end{cases} &
\end{cases} \tag{14}$$

In (14) and in the following, the symbol $\dagger$ indicates that, for odd values of the given parameter, the $E$ and $O$ functions must be exchanged, that is, $\dagger E_{ij} = O_{ij}$ and $\dagger O_{ij} = E_{ij}$.
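The $\alpha^T_\Omega K_1$ and $\alpha^T_\Omega K_j$ ($j = 2, 3, \ldots, \mu$) vectors consist exclusively of the functions $E_{i1}$ and $O_{i1}$ that are among the components of $\alpha^T_\Omega K_0$. We give the following expressions for the computation of the $i$th components of $\alpha^T_\Omega K_1$, for even ($\dagger$ odd) values of $m$ and where a new parameter $\Delta = m - n$ is introduced:

$$(\alpha^T_\Omega K_1)_i = \begin{cases}
\left.\begin{array}{ll}
E_{((\Delta+m)-(i+1))1} & i > \Delta \\
\emptyset & i = \Delta \\
\dagger E_{(\Delta-(i+1))1} & i < \Delta
\end{array}\right\} & \begin{array}{l}(\text{even } i \text{ and odd } \Delta) \text{ or} \\ (\text{odd } i \text{ and even } \Delta)\end{array} \\[3ex]
\left.\begin{array}{ll}
O_{((\Delta+m)-(i+1))1} & i > \Delta \\
\emptyset & i = \Delta \\
\dagger O_{(\Delta-(i+1))1} & i < \Delta
\end{array}\right\} & \begin{array}{l}(\text{odd } i \text{ and odd } \Delta) \text{ or} \\ (\text{even } i \text{ and even } \Delta).\end{array}
\end{cases} \tag{15}$$

We can also establish the expressions for the recursive computation of the $i$th components ($i = 0, 1, \ldots, m-1$) of the $\alpha^T_\Omega K_j$ ($j = 2, 3, \ldots, \mu = \lceil \frac{m-1}{\Delta} \rceil$) vectors as follows:

$$(\alpha^T_\Omega K_j)_i = \begin{cases}
\emptyset & i \in \{\Delta, \Delta+1, \ldots, 2\Delta\} \bmod m \\
(\alpha^T_\Omega K_{j-1})_{(i-\Delta) \bmod m} & i \notin \{\Delta, \Delta+1, \ldots, 2\Delta\} \bmod m,
\end{cases} \tag{16}$$

where the computation is performed starting with the knowledge of $\alpha^T_\Omega K_1$. The recursion will stop when an $\alpha^T_\Omega K_j$ vector with all its components null is obtained, where a null component is represented by the $\emptyset$ symbol in (14) to (16). The last not null vector is $\alpha^T_\Omega K_\mu$, with $\mu = \lceil \frac{m-1}{\Delta} \rceil$, as given in (6).

From (14) to (16), the general expressions for the coordinates of the product $\delta$ in $\Omega$ of two field elements $\alpha$ and $\gamma$ using the transpositional method for irreducible generating trinomials can be established. The expressions are obtained from (9) and are given as ($i = 0, \ldots, m-1$)

$$d_{\Omega_i} = (\alpha^T_\Omega K_0)_{m-1-i} + (\alpha^T_\Omega K_1)_{m-1-i} + \sum_{j=2}^{\mu} (\alpha^T_\Omega K_j)_{m-1-i}, \tag{17}$$

where the $d_{\Omega_i}$ are the coordinates of $\delta$ in $\Omega$. The number and composition of the vectors $\alpha^T_\Omega K_j$ ($j = 1, 2, \ldots, \mu$) included in the summation given in (17) depend on the trinomial type used. In the following sections, we study the transpositional multiplication method in $\Omega$ for five types of irreducible trinomials and establish the particular equations given by our method. The theoretical complexity analyses of the bit-parallel multipliers thus determined are also presented.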
6 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{m-1} + 1$
When the generating irreducible trinomial has the form $f(x) = x^m + x^{m-1} + 1$, the parameter $\mu = \lceil \frac{m-1}{\Delta} \rceil = m - 1$ (with $\Delta = 1$). The decomposition of $K$ given in (6) is the sum $K_0 + K_1 + \cdots + K_{m-1}$, which can also be deduced from the structure of $K_1$ given in (8) and from its properties (see Section 4). From the general expressions of the $d_{\Omega_i}$ coordinates of the product $\delta$ given in (17) and using (14) to (16), we can give the expressions for the computation of the coordinates $d_{\Omega_i}$ of the product. For even values of $m$,

$$d_{\Omega_i} = \begin{cases}
(E_{(i+1)0} + O_{i1}) + \displaystyle\sum_{j(\text{even})=i+\Delta}^{m-2} E_{j1} + \sum_{j(\text{odd})=i+\Delta}^{m-2} O_{j1} & \text{odd } i \\[3ex]
(O_{(i+1)0} + E_{i1}) + \begin{cases}
\displaystyle\sum_{j(\text{even})=i+\Delta}^{m-2} E_{j1} + \sum_{j(\text{odd})=i+\Delta}^{m-2} O_{j1} & \text{even } i \\
\emptyset & i = m-2
\end{cases} & \\[5ex]
(E_{m0}) + \displaystyle\sum_{j(\text{even})=0}^{m-2} E_{j1} + \sum_{j(\text{odd})=0}^{m-2} O_{j1} & i = m-1
\end{cases} \tag{18}$$

while that for odd $m$ values ($i = 0, 1, \ldots, m-1$) is

$$d_{\Omega_i} = \begin{cases}
(O_{(i+1)0} + O_{i1}) + \displaystyle\sum_{j(\text{even})=i+\Delta}^{m-2} O_{j1} + \sum_{j(\text{odd})=i+\Delta}^{m-2} E_{j1} & \text{even } i \\[3ex]
(E_{(i+1)0} + E_{i1}) + \begin{cases}
\displaystyle\sum_{j(\text{even})=i+\Delta}^{m-2} O_{j1} + \sum_{j(\text{odd})=i+\Delta}^{m-2} E_{j1} & \text{odd } i \\
\emptyset & i = m-2
\end{cases} & \\[5ex]
(O_{m0}) + \displaystyle\sum_{j(\text{even})=0}^{m-2} O_{j1} + \sum_{j(\text{odd})=0}^{m-2} E_{j1} & i = m-1.
\end{cases} \tag{19}$$

In (18) and (19), the terms in brackets are given by $\alpha^T_\Omega K_0$ and the summations are given by $\alpha^T_\Omega K_1$ and $\alpha^T_\Omega K_j$, with $j = 2, \ldots, m-1$. The grouping and sharing of the $E_{i1}$ and $O_{i1}$ functions reduce the theoretical complexities of the parallel multipliers constructed with our method. A multiplication example over $GF(2^6)$ is presented in the following.
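6.1 Multiplication Example over $GF(2^6)$

The product $\delta$ of two elements $\alpha$ and $\gamma$ from $GF(2^6)$ generated by the irreducible trinomial $f(x) = x^6 + x^5 + 1$ can be computed using (18). The decomposition of $K$ is given by the sum of the $1 + \mu = 1 + \lceil \frac{6-1}{6-5} \rceil = 6$ following matrices, where the $c_i$ terms are the $\gamma$ coordinates in $\Omega$ and where $K_0$, $K_1$, and $K_j$ ($j = 2, \ldots, 5$) are constructed as mentioned in Section 4.

$$\begin{aligned}
K &= K_0 + K_1 + \sum_{j=2}^{5} K_j \\
&= \begin{pmatrix}
c_5 & c_4 & c_3 & c_2 & c_1 & c_0 \\
c_4 & c_3 & c_2 & c_1 & c_0 & c_5 \\
c_3 & c_2 & c_1 & c_0 & c_5 & c_4 \\
c_2 & c_1 & c_0 & c_5 & c_4 & c_3 \\
c_1 & c_0 & c_5 & c_4 & c_3 & c_2 \\
c_0 & c_5 & c_4 & c_3 & c_2 & c_1
\end{pmatrix}
+ \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5 \\
c_3 & 0 & 0 & 0 & c_5 & c_4 \\
c_2 & 0 & 0 & c_5 & c_4 & c_3 \\
c_1 & 0 & c_5 & c_4 & c_3 & c_2
\end{pmatrix}
+ \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5 \\
c_3 & 0 & 0 & 0 & c_5 & c_4 \\
c_2 & 0 & 0 & c_5 & c_4 & c_3
\end{pmatrix} \\
&\quad + \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5 \\
c_3 & 0 & 0 & 0 & c_5 & c_4
\end{pmatrix}
+ \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0 \\
c_4 & 0 & 0 & 0 & 0 & c_5
\end{pmatrix}
+ \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
c_5 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\end{aligned} \tag{20}$$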
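The decomposition in (20) can be checked symbolically against a direct reduction of the monomials $x^{r+k}$ modulo $f(x) = x^6 + x^5 + 1$. The sketch below is only an illustration: each matrix entry is stored as a set of subscripts of the $c_k$ coordinates (sets are combined with symmetric difference to model addition over $GF(2)$), $K_1$ is copied entry by entry from (20), and $K_2$ to $K_5$ are generated by shifting the rows of the previous matrix one position down, which is the pattern visible in (20) for this particular trinomial. All names (`reduced_monomial`, `reference_K`, `decomposition_sum`) are hypothetical.

```python
M, N = 6, 5                                    # f(x) = x^6 + x^5 + 1
F_MASK = (1 << M) | (1 << N) | 1
Z = None                                       # marks a zero entry

K1 = [[Z, Z, Z, Z, Z, Z],
      [5, Z, Z, Z, Z, Z],
      [4, Z, Z, Z, Z, 5],
      [3, Z, Z, Z, 5, 4],
      [2, Z, Z, 5, 4, 3],
      [1, Z, 5, 4, 3, 2]]                      # subscripts of c, copied from (20)


def reduced_monomial(e):
    """Bitmask of x^e mod f(x)."""
    v = 1 << e
    for k in range(e, M - 1, -1):
        if (v >> k) & 1:
            v ^= F_MASK << (k - M)
    return v


def reference_K():
    """Entry [r][col] holds every k such that a_r c_k contributes to d_(M-1-col)."""
    K = [[set() for _ in range(M)] for _ in range(M)]
    for r in range(M):
        for k in range(M):
            v = reduced_monomial(r + k)
            for p in range(M):
                if (v >> p) & 1:
                    K[r][M - 1 - p] ^= {k}
    return K


def decomposition_sum():
    """K0 + K1 + ... + K5 with K0 from (7) and K_(j+1) = K_j shifted one row down."""
    total = [[{(M - 1 - col - r) % M} for col in range(M)] for r in range(M)]
    Kj = K1
    for _step in range(5):                     # K1, K2, K3, K4, K5
        for r in range(M):
            for col in range(M):
                if Kj[r][col] is not None:
                    total[r][col] ^= {Kj[r][col]}
        Kj = [[Z] * M] + Kj[:-1]
    return total
```

In this representation the bottom-left entry of the sum collects all six coordinates $c_0, \ldots, c_5$, one from each matrix of (20).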
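The $d_i$ coordinates of the product $\delta$ can be computed using (9) and (20), but, from the sum of products obtained, the grouping and sharing of terms given by the transpositional method cannot be deduced. The 1-cycles and 2-cycles obtained with this method are computed from (10) to (13) and, for our $GF(2^6)$ example, are given by the functions

$$\begin{aligned}
OC_{10} &= (0); & EC_{20} &= (0,1); & OC_{30} &= (1)(0,2); \\
EC_{40} &= (0,3)(1,2); & OC_{50} &= (2)(0,4)(1,3); & EC_{60} &= (0,5)(1,4)(2,3);
\end{aligned}$$

and

$$EC_{01} = (3)(1,5)(2,4); \quad OC_{11} = (2,5)(3,4); \quad EC_{21} = (4)(3,5); \quad OC_{31} = (4,5); \quad EC_{41} = (5).$$

The sums of terms represented by these cycles are given by $O_{10}$, $E_{20}$, $O_{30}$, $E_{40}$, $O_{50}$, $E_{60}$ and $E_{01}$, $O_{11}$, $E_{21}$, $O_{31}$, $E_{41}$, respectively, obtaining

$$\begin{aligned}
O_{10} &= x_0 = a_0c_0 \\
E_{20} &= x_{01} = (a_0c_1 + a_1c_0) \\
O_{30} &= x_1 + x_{02} = a_1c_1 + (a_0c_2 + a_2c_0) \\
E_{40} &= x_{03} + x_{12} = (a_0c_3 + a_3c_0) + (a_1c_2 + a_2c_1) \\
O_{50} &= x_2 + x_{04} + x_{13} = a_2c_2 + (a_0c_4 + a_4c_0) + (a_1c_3 + a_3c_1) \\
E_{60} &= x_{05} + x_{14} + x_{23} = (a_0c_5 + a_5c_0) + (a_1c_4 + a_4c_1) + (a_2c_3 + a_3c_2)
\end{aligned} \tag{21}$$

$$\begin{aligned}
E_{01} &= x_3 + x_{15} + x_{24} = a_3c_3 + (a_1c_5 + a_5c_1) + (a_2c_4 + a_4c_2) \\
O_{11} &= x_{25} + x_{34} = (a_2c_5 + a_5c_2) + (a_3c_4 + a_4c_3) \\
E_{21} &= x_4 + x_{35} = a_4c_4 + (a_3c_5 + a_5c_3) \\
O_{31} &= x_{45} = (a_4c_5 + a_5c_4) \\
E_{41} &= x_5 = a_5c_5,
\end{aligned} \tag{22}$$

where the $a_i$ and $c_i$ terms are the coordinates of $\alpha$ and $\gamma$ from $GF(2^6)$, respectively. The components of $\alpha^T_\Omega K_0$, $\alpha^T_\Omega K_1$, and $\alpha^T_\Omega K_j$ ($j = 2, 3, 4, 5$) can be computed using (14), (15), and (16), respectively, for even values of $m$, obtaining

$$\begin{aligned}
\alpha^T_\Omega K_0 &= (E_{60},\ O_{50} + E_{41},\ E_{40} + O_{31},\ O_{30} + E_{21},\ E_{20} + O_{11},\ O_{10} + E_{01}) \\
\alpha^T_\Omega K_1 &= (E_{01},\ \emptyset,\ E_{41},\ O_{31},\ E_{21},\ O_{11}) \\
\alpha^T_\Omega K_2 &= (O_{11},\ \emptyset,\ \emptyset,\ E_{41},\ O_{31},\ E_{21}) \\
\alpha^T_\Omega K_3 &= (E_{21},\ \emptyset,\ \emptyset,\ \emptyset,\ E_{41},\ O_{31}) \\
\alpha^T_\Omega K_4 &= (O_{31},\ \emptyset,\ \emptyset,\ \emptyset,\ \emptyset,\ E_{41}) \\
\alpha^T_\Omega K_5 &= (E_{41},\ \emptyset,\ \emptyset,\ \emptyset,\ \emptyset,\ \emptyset),
\end{aligned} \tag{23}$$

where the $\emptyset$ symbol represents a null component. Using (23) and (17), the product coordinates are obtained as sums of terms grouped by $E_{i(0,1)}$ and $O_{i(0,1)}$ given in (21) and (22). These coordinates can be directly computed using (18). The product coordinates for this example are given in Table 1, where the $\alpha^T_\Omega K_0$, $\alpha^T_\Omega K_1$, and $\alpha^T_\Omega K_j$ ($j = 2, 3, 4, 5$) components are specified and where a $d_i$ coordinate is the sum of the terms $E_{i(0,1)}$ and $O_{i(0,1)}$ existent in the $i$th row.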
The sharing property can be observed in Table 1. Starting with the coordinate $d_{m-2} = d_4$, the term $E_{41}$ appears in the sum of terms of $d_3$, whose subexpression $(O_{31} + E_{41})$ also appears in $d_2$. The same fact can be stated for the subexpression $(E_{21} + O_{31} + E_{41})$ that appears in $d_2$ and $d_1$, and for $(O_{11} + E_{21} + O_{31} + E_{41})$ belonging to $d_1$ and $d_0$. Finally, the subexpression $(E_{01} + O_{11} + E_{21} + O_{31} + E_{41})$ belonging to $d_0$ also appears in $d_5$. These facts imply that the mentioned terms can be constructed only once and then reused, without the necessity of reconstruction. The number of gates finally needed for the $GF(2^6)$ multiplier is 36 AND and 35 XOR gates.

TABLE 1. Product Coordinates $d_i$ for the Multiplication over $GF(2^6)$ for $f(x) = x^6 + x^5 + 1$

This way of constructing the product coordinates lets us reduce the time complexity of the multiplier using a hybrid tree of XOR gates in which the terms $E_{i(0,1)}$ and $O_{i(0,1)}$ are constructed with a binary tree of XOR gates, whereas the sum of these terms is performed with an XOR linear tree, as given in Table 1. In Fig. 1a, the construction of the coordinate $d_5$, the one with the highest space and time complexities, is shown. The delay for $d_5$ is given by the sum of $(E_{01} + O_{11} + E_{21} + O_{31} + E_{41})$ with $E_{60}$. The term $O_{31}$ is given by the sum of two product terms, so the total delay of the multiplier is $T_{AND} + 6T_{XOR}$.
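The $6T_{XOR}$ figure can be reproduced by walking the linear chain for the coordinate $d_{m-1}$ and tracking the XOR depth at each step. The sketch below is an illustration under the assumption that each $E$ or $O$ term is built as a balanced binary XOR tree over its AND outputs, and that the chain adds one term per step starting from $E_{(m-2)1}$; the names `xor_depth` and `chain_depth` are hypothetical.

```python
from math import ceil, log2


def xor_depth(n_products):
    """Depth of a balanced binary XOR tree over n_products AND outputs."""
    return ceil(log2(n_products)) if n_products > 1 else 0


def chain_depth(m):
    """XOR levels on the d_(m-1) path for f(x) = x^m + x^(m-1) + 1."""
    depth = xor_depth(1)                       # E_(m-2)1 is a single product
    for i in range(m - 3, -1, -1):             # add the term with second index 1 and m-i-1 products
        depth = max(depth, xor_depth(m - i - 1)) + 1
    return max(depth, xor_depth(m)) + 1        # final XOR with the m-product term E/O_m0
```

For $m = 6$ the running depth after each addition is 2, 3, 4, 5 and finally 6, which matches the delay quoted for the example.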
6.2 General Expressions
The product coordinates of $\delta = \alpha \cdot \gamma$ are given in (18) and (19) as the summation of terms $E_{i(0,1)}$ and $O_{i(0,1)}$. For any value of $m$, these summations have the form given in Table 1, so this way of construction can be used. Starting with the term $E_{(m-2)1}$ from the summation of $d_{m-2}$, $E_{(m-2)1}$ can be grouped with the term $O_{(m-3)1}$ from $d_{m-3}$. This group can be denoted by $G_1 = E_{(m-2)1} + O_{(m-3)1}$, so the $d_{m-4}$ coordinate includes the group $G_2 = G_1 + E_{(m-4)1}$, the $d_{m-5}$ coordinate includes the group $G_3 = G_2 + O_{(m-5)1}$, and so on. The last group will be $G_{m-2} = G_{m-3} + E_{01}$ (for even $m$) or $G_{m-2} = G_{m-3} + O_{01}$ (for odd $m$), which will belong to the summations of $d_0$ and $d_{m-1}$. Therefore, the $m-2$ complex groups $G_i$ consist exclusively of sums of terms $E_{i1}$ and $O_{i1}$ determined by $\alpha^T_\Omega K_0$.

In Fig. 1b, the $m-2$ complex groups $G_i$ are shown. The structure is a hybrid tree where the $E_{i1}$ and $O_{i1}$ terms are constructed using a binary tree of XOR gates, whereas their sum is performed with a linear tree of XOR gates. This linear tree is necessary for the construction of the $G_i$ from the $G_{i-1}$ groups. A product coordinate is the sum of its group with its corresponding $O_{i0}$ or $E_{i0}$ term. The only coordinate without any group is $d_{m-2}$. Therefore, the new expressions established by the transpositional method for the computation of the product coordinates $d_i$ in $\Omega$ for even ($\dagger$ odd) values of $m$ can be given as follows ($i = 0, 1, \ldots, m-1$):

$$d_{\Omega_i} = \begin{cases}
\dagger E_{m0} + G_{m-2} & i = m-1 \\
\dagger O_{(m-1)0} + E_{(m-2)1} & i = m-2 \\
G_{m-2-i} + \begin{cases} O_{(i+1)0} & \text{even } i \\ E_{(i+1)0} & \text{odd } i. \end{cases} &
\end{cases} \tag{24}$$
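A software model of (24) makes the sharing of the groups $G_i$ explicit. The sketch below is illustrative rather than a definitive implementation: coordinates are Python lists of bits, the helpers `x0` and `x1` (hypothetical names) evaluate the $E$/$O$ terms from the cycles of (10) to (13) using closed forms of their index bounds, and the $E$ or $O$ form of a term with second index 1 is selected by the parity of $m + i$, which is how the $\dagger$ exchange rule is read here. The result is compared with a schoolbook product reduced with $x^m = x^{m-1} + 1$.

```python
def pair_sum(one_cycles, two_cycles, a, c):
    """Sum over GF(2) of x_k = a_k c_k and x_ij = a_i c_j + a_j c_i."""
    v = 0
    for k in one_cycles:
        v ^= a[k] & c[k]
    for i, j in two_cycles:
        v ^= (a[i] & c[j]) ^ (a[j] & c[i])
    return v


def x0(i, a, c):
    """E_i0 (even i) or O_i0 (odd i), from (10) and (11)."""
    ones = [(i - 1) // 2] if i % 2 else []
    twos = [(h, i - h - 1) for h in range(i // 2)]
    return pair_sum(ones, twos, a, c)


def x1(i, m, a, c):
    """E_i1 or O_i1 from (12) and (13); a 1-cycle exists only when m + i is even."""
    ones = [(m + i) // 2] if (m + i) % 2 == 0 else []
    twos = [(i + j, m - j) for j in range(1, (m - i - 1) // 2 + 1)]
    return pair_sum(ones, twos, a, c)


def transpositional_multiply(a, c):
    """(d_0, ..., d_(m-1)) via (24) for f(x) = x^m + x^(m-1) + 1, m >= 3."""
    m = len(a)
    d = [0] * m
    g = x1(m - 2, m, a, c)                     # E_(m-2)1, the seed of the group chain
    d[m - 2] = x0(m - 1, a, c) ^ g
    for k in range(1, m - 1):                  # G_k = G_(k-1) + term (m-2-k) with index 1
        i = m - 2 - k
        g ^= x1(i, m, a, c)
        d[i] = g ^ x0(i + 1, a, c)             # d_i = G_(m-2-i) + term (i+1) with index 0
    d[m - 1] = x0(m, a, c) ^ g                 # g now holds G_(m-2)
    return d


def reference_multiply(a, c):
    """Schoolbook product reduced with x^k = x^(k-1) + x^(k-m)."""
    m = len(a)
    s = [0] * (2 * m - 1)
    for i in range(m):
        for j in range(m):
            s[i + j] ^= a[i] & c[j]
    for k in range(2 * m - 2, m - 1, -1):
        if s[k]:
            s[k] = 0
            s[k - 1] ^= 1
            s[k - m] ^= 1
    return s[:m]
```

Each group is computed once in the loop and reused for the next coordinate, which mirrors the linear XOR tree of Fig. 1b; a hardware description would instantiate the same chain as wires rather than a running variable.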
6.3 Theoretical Complexity Analysis
In order to determine the theoretical complexity of the bit-parallel transpositional canonical basis multiplier given by(24), we must first determine the complexity of the termsEið0;1Þ and Oið0;1Þ. These functions are constructed as binarytrees of XOR gates with a lower level of AND gates
(corresponding to the aicj products of the coordinates of �
and �). The theoretical complexities of these functions, for
the values of the subindexes given in (24), are the following:
• The $O_{i0}$ functions are the sum of one term $x_k = a_k c_k$ and $(\lceil i/2 \rceil - 1)$ terms $x_{ij} = (a_i c_j + a_j c_i)$. Therefore, the $O_{i0}$ functions will need $(2\lceil i/2 \rceil - 1)$ 2-input AND gates and a binary tree of $2(\lceil i/2 \rceil - 1)$ 2-input XOR gates. The depth of the XOR binary tree will be $\lceil \log_2(2\lceil i/2 \rceil - 1) \rceil$, so the delay of the $O_{i0}$ terms will be $T_{AND} + \lceil \log_2(2\lceil i/2 \rceil - 1) \rceil T_{XOR}$.
• The $E_{i0}$ functions are the sum of $\lceil i/2 \rceil$ terms $x_{ij}$, so they will need $2\lceil i/2 \rceil$ AND gates and a binary tree of $2\lceil i/2 \rceil - 1$ XOR gates with a depth $\lceil \log_2(2\lceil i/2 \rceil) \rceil$. Therefore, the delay of the $E_{i0}$ terms will be $T_{AND} + \lceil \log_2(2\lceil i/2 \rceil) \rceil T_{XOR}$.
• The $E_{i1}$ functions consist of one term $x_k$ and of $(\frac{m-i}{2} - 1)$ terms $x_{ij}$. Therefore, their implementation requires $m - i - 1$ AND gates and a binary tree of $m - i - 2$ XOR gates with depth $\lceil \log_2(m - i - 1) \rceil$. The delay of the $E_{i1}$ terms will be $T_{AND} + \lceil \log_2(m - i - 1) \rceil T_{XOR}$.
• The $O_{i1}$ functions consist of $\frac{m-i-1}{2}$ terms $x_{ij}$, so they require $m - i - 1$ AND gates and a binary tree of $m - i - 2$ XOR gates with depth $\lceil \log_2(m - i - 1) \rceil$. The delay of the $O_{i1}$ terms will be $T_{AND} + \lceil \log_2(m - i - 1) \rceil T_{XOR}$.
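The four cases above can be tabulated with a few lines of code. This is a minimal sketch with illustrative names. It assumes the index ranges seen in the $GF(2^7)$ example (subindex $i0$ for $i = 1, \ldots, m$ and subindex $i1$ for $i = 0, \ldots, m-2$), that a term with subindex $i0$ is of type O for odd $i$, and that a term with subindex $i1$ is of type E when $m - i$ is even.

```python
def clog2(n):
    """ceil(log2(n)) for a positive integer n, computed exactly."""
    return (n - 1).bit_length()


def term_complexity(kind, i, m):
    """(AND gates, XOR gates, XOR depth) of one term, following the list above.

    kind is "O0", "E0", "E1" or "O1"; i is the term subindex.
    """
    half = (i + 1) // 2                 # ceil(i / 2)
    if kind == "O0":
        n_and = 2 * half - 1
    elif kind == "E0":
        n_and = 2 * half
    elif kind in ("E1", "O1"):
        n_and = m - i - 1
    else:
        raise ValueError(kind)
    # A binary XOR tree over n_and products has n_and - 1 gates.
    return n_and, n_and - 1, clog2(n_and)


def all_terms(m):
    """(kind, i) for every term, under the index ranges assumed in the lead-in."""
    terms = [("O0" if i % 2 else "E0", i) for i in range(1, m + 1)]
    terms += [("E1" if (m - i) % 2 == 0 else "O1", i) for i in range(m - 1)]
    return terms


def totals(m):
    """Total AND gates and XOR gates inside the terms (sharing groups excluded)."""
    rows = [term_complexity(kind, i, m) for kind, i in all_terms(m)]
    return sum(r[0] for r in rows), sum(r[1] for r in rows)
```

Under these assumptions `totals(m)` returns $m^2$ AND gates, and the XOR count covers only the trees inside the terms. The XOR gates that combine terms and groups into coordinates are counted separately.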
From these results and using (24), the theoretical space complexity of the multiplier can be finally given by $m^2$ AND gates and $m^2 - 1$ XOR gates.

In order to determine the theoretical time complexity of
the multiplier, we denote as �ðTÞ the depth of the XOR tree
of any given term T. From the definition of the Gi groups
and from Fig. 1b, it can be observed that
�ðGiÞ ¼ max �ðGi�1Þ;� EðOÞðm�2�iÞ1
� �� �þ 1: ð25Þ
The G1 group was defined as G1 ¼ Eðm�2Þ1 þOðm�3Þ1,
where Eðm�2Þ1 is a product term and where Oðm�3Þ1 is the sum
of two products terms, so �ðG1Þ ¼ 2. The depths of the XOR
trees corresponding to Ei1 and Oi1 are equal and were given
as �ðEðOÞi1Þ ¼ dlog2ðm� i� 1Þe. Therefore, the depths of
526 IEEE TRANSACTIONS ON COMPUTERS, VOL. 55, NO. 5, MAY 2006
Fig. 1. (a) Linear tree of XOR gates for the sum of $E_{i(0,1)}$ and $O_{i(0,1)}$ terms for the coordinate $d_5$ in the example. (b) Tree structure of XOR gates for the $G_i$ groups.
the Gi groups are �ðG2Þ ¼ 3;�ðG3Þ ¼ 4; . . . ;�ðGiÞ ¼ iþ 1.
Applying (25), the highest depth will correspond to the
Gm�2 group, given by �ðGm�2Þ ¼ m� 1. The Gm�2 group
appears in the d0 and dm�1 coordinates, so these will
determine the total delay of the multiplier. The EðOÞm0
terms have the highest complexity, so the dm�1 coordinate
determines the highest delay and its depth is given by
�ðdm�1Þ ¼ max m� 1; log2 2m
2
l m� �l m� �þ 1 ¼ m: ð26Þ
Finally, the theoretical time complexity is given by this number of XOR levels plus one AND level corresponding to the $a_i c_j$ coordinate products of $A$ and $C$ from $GF(2^m)$. Therefore, the total delay for the multiplier is $T_{AND} + m\,T_{XOR}$.

In Table 2, we compare the theoretical complexities
obtained by our transpositional method with the best results
found in the literature given by Halbutogullari and Koc [3],
Zhang and Parhi [20] and Sunar and Koc [17]. These
authors use similar methods for the construction of bit-
parallel canonical basis multipliers generated by irreducible
trinomials $f(x) = x^m + x^{m-1} + 1$. It can be observed that our
method reduces the XOR delay of the canonical multipliers
in comparison with the best time complexities known to
date. Furthermore, the space complexity of our multipliers
matches the best results found in the literature.
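The depth argument behind (25) and (26) can be replayed numerically. The sketch below, with illustrative function names, starts from the stated depth 2 of $G_1$, applies the recursion in (25) using the term depth $\lceil \log_2(m - i - 1) \rceil$ from Section 6.3, and then evaluates (26) for the coordinate $d_{m-1}$.

```python
def clog2(n):
    """ceil(log2(n)) for a positive integer n, computed exactly."""
    return (n - 1).bit_length()


def group_depths(m):
    """XOR depth of each group G_1..G_{m-2} for f(x) = x^m + x^(m-1) + 1."""
    depth = {1: 2}                        # stated in the text for G_1
    for i in range(2, m - 1):
        j = m - 2 - i                     # subindex of the term added to G_{i-1}
        term_depth = clog2(m - j - 1)     # depth of E_{j1} or O_{j1}
        depth[i] = max(depth[i - 1], term_depth) + 1
    return depth


def xor_depth_last_coordinate(m):
    """XOR depth of d_{m-1} as written in (26)."""
    top_group = group_depths(m)[m - 2]
    top_term = clog2(2 * ((m + 1) // 2))  # ceil(log2(2 * ceil(m / 2)))
    return max(top_group, top_term) + 1
```

For every $m \ge 3$ the recursion gives depth $i + 1$ for $G_i$ and depth $m$ for $d_{m-1}$, which is the $m\,T_{XOR}$ term of the total delay.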
7 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{(m+1)/2} + 1$ (ODD $m$)
When the generating irreducible trinomial has the form
fðxÞ ¼ xm þ xmþ12 þ 1 (odd m), the parameter � ¼ dm�1
� e ¼ 2
(� ¼ m� n ¼ m�12 ). The decomposition of K given in (6) is
the sum K0 þK1 þK2, which can also be deduced from
the structure of K1 given in (8).From the general expressions of the d�i
coordinates of
the product � given in (17) and using (14) to (16), we can
give the expressions for the computation of the coordinates
d�iof the product in which even and odd values of � are
distinguished. A new parameter, �m ¼ ðm� 1Þ ��, is also
introduced that, for this type of trinomial, is equal to �. For
even values of �, we then have
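\[
d_i =
\begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases} O_{(\Delta+i)1}, & i < \Delta_m \\ 0, & i = \Delta_m \\ E_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i > \Delta_m \end{cases} & \text{even } i \\
(E_{(i+1)0} + E_{i1}) + \begin{cases} E_{(\Delta+i)1}, & i < \Delta_m \\ O_{(i-(\Delta_m+1))1} + O_{(i-1)1}, & i > \Delta_m \end{cases} & \text{odd } i \\
(O_{m0}) + E_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i = m-1,
\end{cases}
\tag{27}
\]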
while, for odd values of �, the coordinates are
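\[
d_i =
\begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases} E_{(\Delta+i)1}, & i < \Delta_m \\ O_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i > \Delta_m \end{cases} & \text{even } i \\
(E_{(i+1)0} + E_{i1}) + \begin{cases} O_{(\Delta+i)1}, & i < \Delta_m \\ 0, & i = \Delta_m \\ E_{(i-(\Delta_m+1))1} + O_{(i-1)1}, & i > \Delta_m \end{cases} & \text{odd } i \\
(O_{m0}) + O_{(i-(\Delta_m+1))1} + E_{(i-1)1}, & i = m-1,
\end{cases}
\tag{28}
\]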
where $i = 0, 1, \ldots, m-1$ and where the terms in brackets are given by $A^T \cdot K_0$, whereas the third and fourth addends (second and third if $i = m-1$) in (27) and (28) are given by $A^T \cdot K_1$ and $A^T \cdot K_2$, respectively. An example over $GF(2^7)$ is given in the following.
7.1 Multiplication Example over $GF(2^7)$

For the irreducible trinomial $f(x) = x^7 + x^4 + 1$, the parameter $\Delta = \frac{m-1}{2} = 3 = \Delta_m$ and the product $D$ of two elements $A$ and $C$ can then be computed using (28). $K$ is decomposed in the sum of the matrices $K_0$, $K_1$, and $K_2$, that can be constructed as given in Section 4.

The transpositional approach computes the 1-cycles and 2-cycles using (10) to (13), obtaining the functions

$OC_{10} = (0)$; $EC_{20} = (0,1)$; $OC_{30} = (1)(0,2)$; $EC_{40} = (0,3)(1,2)$; $OC_{50} = (2)(0,4)(1,3)$; $EC_{60} = (0,5)(1,4)(2,3)$; $OC_{70} = (3)(0,6)(1,5)(2,4)$

and the terms

$OC_{01} = (1,6)(2,5)(3,4)$; $EC_{11} = (4)(2,6)(3,5)$; $OC_{21} = (3,6)(4,5)$; $EC_{31} = (5)(4,6)$; $OC_{41} = (5,6)$; $EC_{51} = (6)$.
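The cycles listed above follow a simple index pattern, which the following sketch reproduces. The rule used here is inferred from the listed cycles only, since equations (10) to (13) are outside this section: a term with subindex $i0$ collects the index pairs with $j + k = i - 1$, and a term with subindex $i1$ collects the pairs with $j + k = m + i$ and both indices below $m$. A 1-cycle $(j)$ appears when $j = k$. The function names are illustrative.

```python
def cycles_i0(i):
    """1-cycles and 2-cycles of the term with subindex i0 (pairs with j + k = i - 1)."""
    ones = [(j,) for j in range(i) if 2 * j == i - 1]
    twos = [(j, i - 1 - j) for j in range(i) if j < i - 1 - j]
    return ones + twos


def cycles_i1(i, m):
    """1-cycles and 2-cycles of the term with subindex i1 (pairs with j + k = m + i)."""
    total = m + i
    ones = [(j,) for j in range(m) if 2 * j == total]
    twos = [(j, total - j) for j in range(m) if j < total - j < m]
    return ones + twos


def term_value(cycles, a, c):
    """Sum over GF(2) of a_k c_k for 1-cycles and a_j c_k + a_k c_j for 2-cycles."""
    value = 0
    for cyc in cycles:
        if len(cyc) == 1:
            value ^= a[cyc[0]] & c[cyc[0]]
        else:
            j, k = cyc
            value ^= (a[j] & c[k]) ^ (a[k] & c[j])
    return value
```

For example, `cycles_i0(7)` yields the four cycles of $OC_{70}$ and `cycles_i1(1, 7)` yields those of $EC_{11}$, in the order in which they are written above.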
IMAÑA ET AL.: BIT-PARALLEL FINITE FIELD MULTIPLIERS FOR IRREDUCIBLE TRINOMIALS 527
TABLE 2
Complexities for Canonical Multipliers Using Trinomials $f(x) = x^m + x^{m-1} + 1$
The sums of terms represented by these cycles are given by $O_{10}$, $E_{20}$, $O_{30}$, $E_{40}$, $O_{50}$, $E_{60}$, $O_{70}$ and $O_{01}$, $E_{11}$, $O_{21}$, $E_{31}$, $O_{41}$, $E_{51}$, respectively, that can be easily computed.

For this example, it can be proven that the product coordinates are given as shown in Table 3, where the components of $A^T \cdot K_0$, $A^T \cdot K_1$, and $A^T \cdot K_2$ are specified and where a $d_i$ coordinate is the sum of the $E_{i(0,1)}$ and $O_{i(0,1)}$ terms existent in the $i$th row. The sharing property can be observed in Table 3. The subexpression ($O_{21} + E_{51}$) from $d_2$ also appears in $d_6$. The same holds for the subexpression ($E_{11} + O_{41}$), which appears in $d_1$ and $d_5$, and for ($O_{01} + E_{31}$), which belongs to $d_0$ and $d_4$. The number of gates finally needed for the $GF(2^7)$ multiplier is 49 AND and 48 XOR gates. This way of construction of the product coordinates also lets us reduce the time complexity of the multiplier using binary trees of XOR gates. Finally, the total delay of the multiplier is given by $T_{AND} + 5T_{XOR}$.
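The sharing pattern described for this example can be cross-checked by reducing the high powers of $x$ directly. The sketch below is a plain reference computation, not the transpositional construction. It assumes that the term with subindex $t1$ is the coefficient of $x^{7+t}$ in the unreduced product, and the function names are illustrative.

```python
def reduce_power(e, m, n):
    """Exponents below m whose sum equals x^e modulo x^m + x^n + 1 over GF(2)."""
    poly = {e}
    while max(poly) >= m:
        top = max(poly)
        # x^top = x^(top-m+n) + x^(top-m); symmetric difference is GF(2) addition.
        poly ^= {top, top - m + n, top - m}
    return poly


def incoming(m, n):
    """For each coordinate i, the sorted subindexes t whose x^(m+t) term reaches d_i."""
    table = {i: [] for i in range(m)}
    for t in range(m - 1):
        for i in reduce_power(m + t, m, n):
            table[i].append(t)
    return table
```

With `incoming(7, 4)`, the coordinates $d_2$ and $d_6$ both receive the terms with $t = 2$ and $t = 5$, the coordinates $d_1$ and $d_5$ both receive $t = 1$ and $t = 4$, and $d_0$ and $d_4$ both receive $t = 0$ and $t = 3$, which are the three shared subexpressions named in the text.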
7.2 General Expressions and Theoretical Complexity Analysis
As with the previously studied trinomial, Table 3 can be
used to find complex groups determined by the sum of
terms Ei1 and/or Oi1. These groups are denoted as Gi,
where the i subindex is equal to the subindex of the product
coordinate to which the group belongs. According to this,
the following expressions for the Gi groups can be given for
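even († odd) values of $\Delta$:

\[
G_i =
\begin{cases}
\begin{cases} O_{i1} + {}^{\dagger}O_{(\Delta+i)1}, & \text{even } i \\ E_{i1} + {}^{\dagger}E_{(\Delta+i)1}, & \text{odd } i \end{cases} & i < \Delta_m \\
G_{i-(\Delta_m+1)}, & i > \Delta_m
\end{cases}
\tag{29}
\]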
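for $i = 0, 1, \ldots, m-1$ with $i \neq \Delta_m$ because, for $i = \Delta_m$, there is no associated $G_i$ group.

Using (27) to (29), the following new expressions established by the transpositional method for the coordinates of the product in the canonical basis can be given for even († odd) values of $\Delta$ (with $i = 0, 1, \ldots, m-1$):

\[
d_i =
\begin{cases}
G_i + \begin{cases} O_{(i+1)0}, & \text{even } i \\ E_{(i+1)0}, & \text{odd } i \end{cases} & i < \Delta_m \\
{}^{\dagger}O_{(i+1)0} + {}^{\dagger}O_{i1}, & i = \Delta_m \\
G_{i-(\Delta_m+1)} + \begin{cases} O_{(i+1)0} + O_{i1}, & \text{even } i \\ E_{(i+1)0} + E_{i1}, & \text{odd } i \end{cases} & i > \Delta_m \\
O_{m0} + G_{i-(\Delta_m+1)}, & i = m-1.
\end{cases}
\tag{30}
\]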
Using binary trees of XOR gates for the construction, the
theoretical complexities of the multiplier can be determined. The space complexity is computed from the $E_{i(0,1)}$ and $O_{i(0,1)}$ complexities given in Section 6.3 and using (29) and (30). This complexity can be proven to be $m^2$ AND and $m^2 - 1$ XOR gates.

In order to determine the theoretical time complexity of
the multiplier, we must first compute the delay of the Gi
complex groups. Using (29), the depth of the XOR tree for a
group Gi is given by
�ðGiÞ ¼ maxð�ðEðOÞi1Þ;�ðEðOÞð�þiÞ1ÞÞ þ 1
¼ dlog2ðm� i� 1Þe þ 1;ð31Þ
where the complexities given in Section 6.3 have been used.
Using (31) and (30), the following complexities for the
product coordinates can be stated:
. When i < �m, the depth of the XOR tree for the diproduct coordinate is given by
�ðdiÞ ¼ �ðGiÞ þ 1 ¼ dlog2ðm� i� 1Þe þ 2:
. For i ¼ �m, the di coordinate does not have anyassociated Gi group. Therefore, for even �,
�ðdiÞ ¼ �ðOðiþ1Þ0Þ þ 1 ¼ log2 2iþ 1
2
� � 1
� �� þ 1;
while, for odd �, we have that
�ðdiÞ ¼ �ðEðiþ1Þ0Þ þ 1 ¼ log2 2iþ 1
2
� � �� þ 1:
. For i > �m (i 6¼ m� 1), Gi ¼ Gi�ð�mþ1Þ using (29).Therefore, the depth of the XOR tree for thedi coordinate is given by
�ðdiÞ ¼ �ðGiÞ þ 1 ¼ dlog2ðm� iþ�mÞe þ 2:
. For i ¼ m� 1, di i s the sum of Om0 andGi ¼ Gi�ð�mþ1Þ. Therefore, we have that
�ðdiÞ ¼ max log2md e; 1þ log2mþ 1
2
� �� � �þ 1
¼ dlog2me þ 1:
TABLE 3
Product Coordinates $d_i$ for the Multiplication over $GF(2^7)$ for $f(x) = x^7 + x^4 + 1$
The total delay of the multiplier is determined by the highest of the previous depths, which is $2 + \lceil \log_2(m-1) \rceil$ (obtained for the index $i = 0$). Finally, the theoretical time complexity is given by this number of levels of XOR gates plus one level of AND gates. Therefore, the delay of the multiplier is $T_{AND} + (2 + \lceil \log_2(m-1) \rceil)T_{XOR}$.
In Table 4, the theoretical complexities obtained by our
transpositional method and the best results found in the
literature for similar multipliers are given. It can be
observed that our method reduces the XOR delay of the
canonical multipliers in comparison with the best time
complexities known to date, while the space complexity of
our multipliers matches the best results found in the
literature.
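The case analysis of Section 7.2 can be evaluated for concrete values of $m$. The following sketch, with an illustrative function name, transcribes the four XOR-depth expressions stated for the coordinates $d_i$ of $f(x) = x^m + x^{(m+1)/2} + 1$ and takes their maximum. The variable `delta` stands for $m - n$ and `delta_m` for $(m-1)$ minus that value, following the definitions in Section 7.

```python
def clog2(n):
    """ceil(log2(n)) for a positive integer n, computed exactly."""
    return (n - 1).bit_length()


def coordinate_depths(m):
    """XOR depth of each d_i for f(x) = x^m + x^((m+1)/2) + 1 with odd m."""
    if m % 2 == 0 or m < 3:
        raise ValueError("m must be odd and at least 3")
    delta = (m - 1) // 2
    delta_m = (m - 1) - delta
    depths = []
    for i in range(m):
        if i == m - 1:
            depths.append(clog2(m) + 1)
        elif i < delta_m:
            depths.append(clog2(m - i - 1) + 2)
        elif i == delta_m:
            half = (i + 2) // 2                        # ceil((i + 1) / 2)
            products = 2 * half - 1 if delta % 2 == 0 else 2 * half
            depths.append(clog2(products) + 1)
        else:
            depths.append(clog2(m - i + delta_m) + 2)
    return depths
```

For $m = 7$ the maximum is 5 XOR levels, in agreement with the delay $T_{AND} + 5T_{XOR}$ reported for the $GF(2^7)$ example, and for every odd $m \ge 3$ the maximum equals $2 + \lceil \log_2(m-1) \rceil$.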
8 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{(m-1)/2} + 1$ (ODD $m$)
When the irreducible generating trinomial is fðxÞ ¼xm þ xm�1
2 þ 1 (odd m), � ¼ mþ12 and � ¼ d2ðm�1Þ
ðmþ1Þ e. For
m > 3, � ¼ 2 and K ¼ K0 þK1 þK2, while, for m ¼ 3, � ¼1 and K ¼ K0 þK1. In fact, the trinomial fðxÞ ¼ x3 þ xþ 1
is of the form fðxÞ ¼ xm þ xþ 1 that will be considered in
the Section 10.
Using (14) to (17), the expressions of the coordinates $d_i$ of the product can be given, where even and odd values for the $\Delta$ parameter are distinguished (in this case, $\Delta_m = \frac{m-3}{2} \neq \Delta$). For even values of $\Delta$ and for $i = 0, 1, \ldots, m-1$, the product coordinates are given by

\[
d_i =
\begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases} O_{(\Delta+i)1}, & i < \Delta_m \\ 0, & i = \Delta_m \\ E_{(i-(\Delta_m+1))1} + E_{(i+1)1}, & i > \Delta_m \end{cases} & \text{even } i \\
(E_{(i+1)0} + E_{i1}) + \begin{cases} E_{(\Delta+i)1}, & i < \Delta_m \\ O_{(i-(\Delta_m+1))1} + O_{(i+1)1}, & i > \Delta_m \end{cases} & \text{odd } i \\
(E_{(i+1)0} + E_{i1}) + O_{(i-(\Delta_m+1))1}, & i = m-2 \\
(O_{m0}) + E_{(i-(\Delta_m+1))1}, & i = m-1,
\end{cases}
\tag{32}
\]
while, for odd values of �,
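\[
d_i =
\begin{cases}
(O_{(i+1)0} + O_{i1}) + \begin{cases} E_{(\Delta+i)1}, & i < \Delta_m \\ O_{(i-(\Delta_m+1))1} + E_{(i+1)1}, & i > \Delta_m \end{cases} & \text{even } i \\
(E_{(i+1)0} + E_{i1}) + \begin{cases} O_{(\Delta+i)1}, & i < \Delta_m \\ 0, & i = \Delta_m \\ E_{(i-(\Delta_m+1))1} + O_{(i+1)1}, & i > \Delta_m \end{cases} & \text{odd } i \\
(E_{(i+1)0} + E_{i1}) + E_{(i-(\Delta_m+1))1}, & i = m-2 \\
(O_{m0}) + O_{(i-(\Delta_m+1))1}, & i = m-1.
\end{cases}
\tag{33}
\]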
8.1 Multiplication Example over $GF(2^7)$

For the irreducible trinomial $f(x) = x^7 + x^3 + 1$, the 1-cycles and 2-cycles computed by the transpositional approach using (10) to (13) are the ones given in Section 7.1 for the trinomial $f(x) = x^7 + x^4 + 1$. The product coordinates obtained are given in Table 5, where the sharing property can be observed for the subexpressions ($O_{01} + O_{41}$) and ($E_{11} + E_{51}$). Using binary trees of XOR gates for the construction, it can be proven that the space complexity of the multiplier is 49 AND and 48 XOR gates, whereas the time complexity is $T_{AND} + 5T_{XOR}$.
8.2 General Expressions and Theoretical Complexity Analysis

As with the previous trinomials, Table 5 determines complex groups given by the sum of terms $E_{i1}$ and/or $O_{i1}$. These groups $G_i$ (with $i$ equal to the subindex of the product coordinate to which the group belongs) are given by (29), with the only difference being that, for $f(x) = x^m + x^{(m-1)/2} + 1$ (odd $m$), the subindex $i = 0, 1, \ldots, m-3$. For $i = \Delta_m$, there is no associated $G_i$ group. The following new
TABLE 4
Complexities for Canonical Multipliers Using Trinomials $f(x) = x^m + x^{(m+1)/2} + 1$ (Odd $m$)

TABLE 5
Product Coordinates $d_i$ for the Multiplication over $GF(2^7)$ for $f(x) = x^7 + x^3 + 1$
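expressions established by the transpositional method for the coordinates of the product in the canonical basis can be given for even († odd) values of $\Delta$ (with $i = 0, 1, \ldots, m-1$):

\[
d_i =
\begin{cases}
G_i + \begin{cases} O_{(i+1)0}, & \text{even } i \\ E_{(i+1)0}, & \text{odd } i \end{cases} & i < \Delta_m \\
{}^{\dagger}O_{(i+1)0} + {}^{\dagger}O_{i1}, & i = \Delta_m \\
G_{i-(\Delta_m+1)} + \begin{cases} O_{(i+1)0} + O_{i1}, & \text{even } i \\ E_{(i+1)0} + E_{i1}, & \text{odd } i \end{cases} & i > \Delta_m \\
E_{(i+1)0} + E_{i1} + {}^{\dagger}O_{(i-(\Delta_m+1))1}, & i = m-2 \\
O_{m0} + {}^{\dagger}E_{(i-(\Delta_m+1))1}, & i = m-1
\end{cases}
\tag{34}
\]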
Using binary trees of XOR gates for the construction, the
theoretical complexities of the multiplier can be determined. The space complexity is computed from the $E_{i(0,1)}$ and $O_{i(0,1)}$ complexities given in Section 6.3 and using (29) and (34). This complexity can be proven to be $m^2$ AND and $m^2 - 1$ XOR gates.

The theoretical time complexity of the multiplier can be
determined using (31), which establishes that the number of
XOR levels of the Gi groups is incremented with the decrease
of the index i. With this fact and using (34), the following
complexities for the product coordinates can be stated:
. When i < �m, the depth of the XOR treefor the di product coordinate is given by�ðdiÞ ¼ dlog2ðm� i� 1Þe þ 2.
. For i ¼ �m and for even �,
�ðdiÞ ¼ log2 2iþ 1
2
� � 1
� �� þ 1;
while, for odd �, we have �ðdiÞ ¼ dlog2ð2diþ12 eÞe þ 1.
. For i > �m with i 6¼ fm� 2;m� 1g, the depth of theXOR tree for the di coordinate is given by�ðdiÞ ¼ dlog2ðm� iþ�mÞe þ 2.
. For i ¼ m� 2, the depth of the XOR tree is given by�ðdiÞ ¼ dlog2ð2diþ1
2 eÞe þ 1.. For i ¼ m� 1, the depth of the XOR tree is given by
�ðdiÞ ¼ dlog2me þ 1.
The highest complexity from the previous ones is $2 + \lceil \log_2(m-1) \rceil$ for $i = 0$, so the theoretical time complexity is given by this number of levels of XOR gates plus one level of AND gates. Therefore, the delay of the multiplier is $T_{AND} + (2 + \lceil \log_2(m-1) \rceil)T_{XOR}$, which is equal to the delay obtained for the trinomials $f(x) = x^m + x^{(m+1)/2} + 1$ with odd $m$.

In Table 6, the theoretical complexities obtained by our transpositional method and the best results found in the literature for similar multipliers are given. It can be observed that our method equals the best time complexities most recently presented by Wu [18] and by Reyhani-Masoleh and Hasan [16]. The space complexity of our multipliers also matches these best results found in the literature.
9 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x^{m/2} + 1$ (EVEN $m$)
The irreducible trinomials of the form $f(x) = x^m + x^{m/2} + 1$ (even $m$) are a special type of ESPs (Equally-Spaced-Polynomials), known as ESTs (Equally-Spaced-Trinomials). The irreducible ESPs are polynomials of the form $f(x) = x^{k\Delta} + x^{(k-1)\Delta} + \cdots + x^{\Delta} + 1$, where $m = k\Delta$ and $k \ge 2$. The ESPs are reduced to ESTs when $k = 2$ and are reduced
to AOPs (All-One-Polynomials) for � ¼ 1. For the EST fðxÞ ¼xm þ xm2 þ 1 (even m) , the parameter � ¼ m
2 and
�m ¼ m�22 ¼ �� 1. For even values of m with m > 2, � ¼
d2ðm�1Þm e ¼ 2 and K ¼ K0 þK1 þK2, whereas, for m ¼ 2,
� ¼ 1 and K ¼ K0 þK1 (in fact, fðxÞ ¼ x2 þ xþ 1 is of the
form fðxÞ ¼ xm þ xþ 1 that will be studied in Section 10).From the matrix structures given in Section 4, it can be
deduced that the sum of K0 and K2 generates sums ofidentical terms ci that are therefore canceled, i.e., ci þ ci ¼ 0.This fact reduces the space complexity of the multiplier. Thefollowing expressions for the product coordinates d�i
usingthe transpositional method for i ¼ 0; 1; . . . ;m� 1 and for even� (a condition that is verified for all irreducible ESTs) canbe given:
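\[
d_i =
\begin{cases}
O_{(i+1)0} + \begin{cases} E_{i1} + O_{(\Delta+i)1}, & i < \Delta_m \\ E_{i1}, & i = \Delta_m \\ O_{(i-(\Delta_m+1))1}, & i > \Delta_m \end{cases} & \text{even } i \\
E_{(i+1)0} + \begin{cases} O_{i1} + E_{(\Delta+i)1}, & i < \Delta_m \\ E_{(i-(\Delta_m+1))1}, & i > \Delta_m \end{cases} & \text{odd } i
\end{cases}
\tag{35}
\]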
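The cancellation that shortens (35) can be observed by reducing the powers $x^{m+t}$ modulo an EST. The sketch below is a reference computation with illustrative names. It assumes that the term with subindex $t1$ is the coefficient of $x^{m+t}$ in the unreduced product, and it compares the coordinates reached by each such term with the index pattern read off (35), ignoring the E/O letters.

```python
def reduce_power(e, m, n):
    """Exponents below m whose sum equals x^e modulo x^m + x^n + 1 over GF(2)."""
    poly = {e}
    while max(poly) >= m:
        top = max(poly)
        # x^top = x^(top-m+n) + x^(top-m); equal exponents cancel in GF(2).
        poly ^= {top, top - m + n, top - m}
    return poly


def incoming(m, n):
    """For each coordinate i, the sorted subindexes t whose x^(m+t) term reaches d_i."""
    table = {i: [] for i in range(m)}
    for t in range(m - 1):
        for i in reduce_power(m + t, m, n):
            table[i].append(t)
    return table


def incoming_eq35(m):
    """Subindexes of the i1-type terms that (35) assigns to each coordinate."""
    delta = m // 2                 # m - n for an EST
    delta_m = (m - 1) - delta
    table = {}
    for i in range(m):
        if i < delta_m:
            table[i] = [i, delta + i]
        elif i == delta_m:
            table[i] = [i]
        else:
            table[i] = [i - (delta_m + 1)]
    return table
```

For $t \ge m/2$ the reduction of $x^{m+t}$ collapses to the single power $x^{t - m/2}$, because the two copies of $x^t$ cancel. This is why every coordinate above $\Delta_m$ receives only one such term.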
9.1 Multiplication Example over $GF(2^6)$

For the trinomial $f(x) = x^6 + x^3 + 1$ ($\Delta = 3$ and $\Delta_m = 2$), the 1-cycles and 2-cycles are the ones given in Section 6.1. The product coordinates are given in Table 7, where, on its left side, the sums of terms $(O_{31} + O_{31}) = 0$ and $(E_{41} + E_{41}) = 0$ can be observed for the coordinates $d_3$ and $d_4$, respectively. On the right side of Table 7, the resultant
TABLE 6
Complexities for Canonical Multipliers Using Trinomials $f(x) = x^m + x^{(m-1)/2} + 1$ (Odd $m$)
reduced table is given, where �T�K00 results from the
combination and reduction of �T�K0 and �T�K2. In this
case, no complex groups Gi of terms Ei1 and/or Oi1 can be
found for sharing. The only existing grouping (of terms xijand xk) is given by their own terms Eið0;1Þ and Oið0;1Þ. The
nonexistence of Gi groups implies that there are no more
general expressions for the product coordinates than the
ones given in (35). It can be proven that the space complexity
of the multiplier is 36 AND and 33 XOR gates, whereas the
time complexity is TAND þ 4TXOR.
9.2 Theoretical Complexity Analysis
The theoretical complexity of the multiplier can be
computed using (35) and the Eið0;1Þ and Oið0;1Þ complexities
given in Section 6.3, where binary trees of XOR gates are
used for the construction. Using cancellations of terms such as the ones shown in the previous example, it can be proven that the space complexity of the multiplier is $m^2$ AND and $m^2 - \Delta$ XOR gates, which is a lower complexity than those given for the previously studied trinomials.

For the computation of the theoretical time complexity of
the multiplier, the following depths of XOR trees for the diproduct coordinates can be stated:
. When i < �m, �ðdiÞ ¼ dlog2ðm� i� 1Þe þ 1.
. For i ¼ �m (with �m ¼ m2 � 1 for ESTs),
�ðdiÞ ¼ dlog2ðm2 Þe þ 1.. For i > �m, �ðdiÞ ¼ dlog2ð2diþ1
2 eÞe þ 1.
The total delay of the multiplier can be computed using the highest complexity from the previous ones, which corresponds to the coordinate $d_{m-1}$ with XOR tree depth $\lceil \log_2 m \rceil + 1$. Finally, the theoretical time complexity is given by this number of levels of XOR gates plus one level of AND gates. Therefore, the delay of the multiplier is $T_{AND} + (1 + \lceil \log_2 m \rceil)T_{XOR}$.

In Table 8, the theoretical complexities obtained with our
transpositional method and the best results found in the
literature for similar multipliers are given. It can be
observed that our method equals the best time complexities
known to date (the result presented by Wu [18] equals the
remaining delays given in Table 8 for the values of m that
verify that the ESTs are irreducibles). The space complexity
of our multipliers also matches these best results found in
the literature.
10 IRREDUCIBLE TRINOMIALS $f(x) = x^m + x + 1$
For the irreducible generating trinomial fðxÞ ¼ xm þ xþ 1,
� ¼ m� 1, �m ¼ 0, and � ¼ dm�1� e ¼ 1. The decomposition
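of $K$ given in (6) is only the sum of $K_0$ and $K_1$.

The product coordinates $d_i$ for even († odd) values of $\Delta$ using the transpositional approach can be computed as follows ($i = 0, 1, \ldots, m-1$):

\[
d_i =
\begin{cases}
(O_{(i+1)0} + {}^{\dagger}O_{i1}), & i = 0 \\
(O_{(i+1)0} + {}^{\dagger}O_{i1}) + {}^{\dagger}E_{(i-1)1}, & \text{even } i \\
(E_{(i+1)0} + {}^{\dagger}E_{i1}) + {}^{\dagger}O_{(i-1)1}, & \text{odd } i \\
({}^{\dagger}O_{(i+1)0}) + E_{(i-1)1}, & i = m-1.
\end{cases}
\tag{36}
\]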
10.1 Multiplication Example over $GF(2^6)$

For the irreducible trinomial $f(x) = x^6 + x + 1$ ($\Delta = 5$), the 1-cycles and 2-cycles are given in Section 6.1. The product coordinates obtained are shown in Table 9, where no complex groups $G_i$ of terms $E_{i1}$ and/or $O_{i1}$ can be found for sharing. The only existing grouping (of terms $x_{ij}$ and $x_k$) is given by their own terms $E_{i(0,1)}$ and $O_{i(0,1)}$. The nonexistence of $G_i$ groups implies that there are no more general expressions for the product coordinates than the ones given in (36). It can be proven that the space complexity of the multiplier is 36 AND and 35 XOR gates, whereas the time complexity is $T_{AND} + 4T_{XOR}$.
10.2 Theoretical Complexity Analysis
The theoretical complexity of the multiplier can be
computed using (36) and the complexities given in
Section 6.3, where binary trees of XOR gates are used for
the construction. The space complexity of the multiplier is $m^2$ AND and $m^2 - 1$ XOR gates.
TABLE 7
Product Coordinates $d_i$ for the Multiplication over $GF(2^6)$ for $f(x) = x^6 + x^3 + 1$

TABLE 8
Complexities for Canonical Multipliers Using Irreducible ESTs
For the computation of the theoretical time complexity of
the multiplier, the following depths of XOR trees for the diproduct coordinates can be stated:
. For i ¼ 0, �ðdiÞ ¼ dlog2ðm� 1Þe þ 1.
. For i 6¼ f0; m� 1g with even � and using (36), it canb e p r o v e n t h a t , i f �ðE11Þ ¼ �ðO01Þ, t h e n�ðdiÞ � dlog2ðm� 1Þe þ 2, w h i l e , i f �ðE11Þ <�ðO01Þ then �ðdiÞ � dlog2ðm� 1Þe þ 1.
. For i 6¼ f0;m� 1g with odd � and using (36), it canb e p r o v e n t h a t , i f �ðO11Þ ¼ �ðE01Þ, t h e n�ðdiÞ � dlog2ðm� 1Þe þ 2, w h i l e , i f �ðO11Þ <�ðE01Þ then �ðdiÞ � dlog2ðm� 1Þe þ 1.
. W h e n i ¼ m� 1 w i t h e v e n �, t h e n�ðdiÞ ¼ dlog2ð2dm2 e � 1Þe þ 1, while, for odd �, then�ðdiÞ ¼ dlog2ð2dm2 eÞe þ 1.
The total delay of the multiplier can be computed
using the highest complexity from the previous ones.
For m = 3, 4 and 6, �ðEðOÞ11Þ < �ðOðEÞ01Þ, so, for
these values of m, the theoretical time complexity of
the multiplier is TAND þ ð1þ dlog2ð2dm2eÞeÞTXOR, while,
for the remainder values of m, the time complexity is
TAND þ ð2þ dlog2ðm� 1ÞeÞTXOR.
In Table 10, the best theoretical complexities found in the
literature and the complexities obtained with our transposi-
tional approach are given. For our method, the results
obtained for m = 3, 4, and 6 (denoted as Transpositional1)
and the results obtained for the rest of the m values
(denoted as Transpositional2) are specified. For these
trinomials, our multipliers show worse time complexities
compared with the best results presented by Halbutogullari
and Koc [3], Zhang and Parhi [20], and Sunar and Koc [17].
However, the space complexity of our multipliers matches
these best results found in the literature.
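The two delay cases stated for $f(x) = x^m + x + 1$ can be written as a small function. This is a direct transcription of the case split given above (the values $m = 3$, 4, and 6 versus the remaining values of $m$) together with the stated gate counts. It does not test whether the trinomial is irreducible for a given $m$, and the function names are illustrative.

```python
def clog2(n):
    """ceil(log2(n)) for a positive integer n, computed exactly."""
    return (n - 1).bit_length()


def xor_levels(m):
    """Number of XOR levels stated for the multiplier over x^m + x + 1."""
    if m in (3, 4, 6):
        return 1 + clog2(2 * ((m + 1) // 2))   # 1 + ceil(log2(2 * ceil(m / 2)))
    return 2 + clog2(m - 1)


def gate_counts(m):
    """(AND gates, XOR gates) stated for the multiplier over x^m + x + 1."""
    return m * m, m * m - 1
```

For $m = 6$ this gives 4 XOR levels and 36 AND and 35 XOR gates, matching the figures reported for the $GF(2^6)$ example in Section 10.1.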
11 CONCLUSIONS
In this paper, we have presented a new canonical basis multiplication method named transpositional. This method has been deduced from a new general formulation for the canonical basis multiplication in $GF(2^m)$ based on the use of a triangular basis. This approach introduces a product matrix that can be decomposed in a sum of matrices depending on the irreducible polynomial selected for the field. From this matrix decomposition, the transpositional multiplication method has been deduced. It uses the notation given in group theory for permutations and it is based on the computation of 1-cycles and 2-cycles given by the permutation defined by the coordinate of the product to be computed and by the cardinality of the field $GF(2^m)$. The obtained cycles define groups of subexpressions in sum-of-products form (the functions $E_{i0}$, $O_{i0}$, $E_{i1}$, and $O_{i1}$) that can be shared among the product coordinates. A very important characteristic of our method is that these functions and the $K_0$ matrix are common to any selected irreducible polynomial, so our method can be completely generalized to perform the multiplication over general irreducible polynomials.

In order to prove the efficiency of our transpositional method, we have applied it to five types of irreducible trinomials, for which more complex groups $G_i$ of subexpressions for sharing are determined. These groups consist exclusively of the sum of terms $E_{i1}$ and $O_{i1}$ previously constructed, so the $G_i$ groups only contribute with additional XORs to the multiplier complexity depending on the selected polynomial. We have presented explicit expressions for multiplication for these five types of trinomials. These expressions can be easily coded (without having any knowledge of finite field arithmetic) using hardware description languages, which is a very attractive feature for VLSI design and implementation of optimized multipliers.

The theoretical complexity analyses of the corresponding bit-parallel multipliers have shown that the space complexities of our multipliers match the best results found in the literature. The complexity analyses have also proven that our method reduces, in two of the five studied trinomials, the best time complexities known to date for similar multipliers. This is shown in Table 11, where a list of
TABLE 9
Product Coordinates $d_i$ for the Multiplication over $GF(2^6)$ for $f(x) = x^6 + x + 1$

TABLE 10
Complexities for Canonical Multipliers Using Irreducible Trinomials $f(x) = x^m + x + 1$
irreducible trinomials $f(x) = x^m + x^n + 1$ ($m \le 1000$) for
which the transpositional method achieves better perfor-
mance than the best results found in the literature for
similar multipliers, is given. These trinomials are repre-
sented as ðm;nÞ in Table 11. In another two of the studied
trinomials, our multipliers match the best results found in
the literature. Only in one of the five trinomials are the
multiplier complexities given by our method worse than the
best complexities presented by other authors.
REFERENCES
[1] S.T.J. Fenn, M. Benaissa, and D. Taylor, “$GF(2^m)$ Multiplication and Division over the Dual Basis,” IEEE Trans. Computers, vol. 45, no. 3, pp. 319-327, Mar. 1996.
[2] A. Halbutogullari and C.K. Koc, “Mastrovito Multiplier for General Irreducible Polynomials,” Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes, pp. 498-507, 1999.
[3] A. Halbutogullari and C.K. Koc, “Mastrovito Multiplier for General Irreducible Polynomials,” IEEE Trans. Computers, vol. 49, no. 5, pp. 503-518, May 2000.
[4] M.A. Hasan, “Double-Basis Multiplicative Inversion over $GF(2^m)$,” IEEE Trans. Computers, vol. 47, no. 9, pp. 960-970, Sept. 1998.
[5] M.A. Hasan and V.K. Bhargava, “Architecture for a Low Complexity Rate-Adaptive Reed-Solomon Encoder,” IEEE Trans. Computers, vol. 44, no. 6, pp. 938-942, June 1995.
[6] M.A. Hasan, M.Z. Wang, and V.K. Bhargava, “Modular Construction of Low Complexity Parallel Multipliers for a Class of Finite Fields $GF(2^m)$,” IEEE Trans. Computers, vol. 41, no. 8, pp. 962-971, Aug. 1992.
[7] M.A. Hasan, M.Z. Wang, and V.K. Bhargava, “A Modified Massey-Omura Parallel Multiplier for a Class of Finite Fields,” IEEE Trans. Computers, vol. 42, no. 10, pp. 1278-1280, Oct. 1993.
[8] J.L. Imana and J.M. Sanchez, “A New Reconfigurable-Oriented Method for Canonical Basis Multiplication Over a Class of Finite Fields $GF(2^m)$,” Proc. 13th Int’l Conf. Field Programmable Logic and Applications, pp. 1127-1130, 2003.
[9] T. Itoh and S. Tsujii, “Structure of Parallel Multipliers for a Class of Finite Fields $GF(2^m)$,” Information and Computation, vol. 83, pp. 21-40, 1989.
[10] C.K. Koc and B. Sunar, “Low-Complexity Bit-Parallel Canonical and Normal Basis Multipliers for a Class of Finite Fields,” IEEE Trans. Computers, vol. 47, no. 3, pp. 353-356, Mar. 1998.
[11] R. Lidl and H. Niederreiter, Introduction to Finite Fields and Their Applications. New York: Cambridge Univ. Press, 1994.
[12] E.D. Mastrovito, “VLSI Architectures for Multiplication over Finite Fields $GF(2^m)$,” Proc. Sixth Int’l Conf. Applied Algebra, Algebraic Algorithms, and Error-Correcting Codes (AAECC-6), pp. 297-309, July 1988.
[13] Applications of Finite Fields, A.J. Menezes, ed. Boston: Kluwer Academic, 1993.
[14] J. Omura and J. Massey, “Computational Method and Apparatus for Finite Field Arithmetic,” US Patent Number 4,587,627, May 1986.
[15] K.K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. John Wiley & Sons, 1999.
[16] A. Reyhani-Masoleh and M.A. Hasan, “On Low Complexity Bit Parallel Polynomial Basis Multipliers,” Proc. Workshop on Cryptographic Hardware and Embedded Systems (CHES 2003), pp. 189-202, 2003.
[17] B. Sunar and C.K. Koc, “Mastrovito Multiplier for All Trinomials,” IEEE Trans. Computers, vol. 48, no. 5, pp. 522-527, May 1999.
[18] H. Wu, “Bit-Parallel Finite Field Multiplier and Squarer Using Polynomial Basis,” IEEE Trans. Computers, vol. 51, no. 7, pp. 750-758, July 2002.
[19] H. Wu and M.A. Hasan, “Low-Complexity Bit-Parallel Multipliers for a Class of Finite Fields,” IEEE Trans. Computers, vol. 47, no. 8, pp. 883-887, Aug. 1998.
[20] T. Zhang and K.K. Parhi, “Systematic Design of Original and Modified Mastrovito Multipliers for General Irreducible Polynomials,” IEEE Trans. Computers, vol. 50, no. 7, pp. 734-749, July 2001.
Jose Luis Imana received the PhD degree in physics from the Complutense University of Madrid, Spain, in 2003. From 1991 to 1993, he was a design engineer (R&D) with the Department of Information Technologies, Technology Institute of Madrid, Spain. He is currently with the Department of Computer Architecture and Automation at the Complutense University of Madrid, where he is an assistant professor. His research interests include algorithms and VLSI architectures for computations in Galois fields, cryptography, computer arithmetic, reconfigurable computing architectures, and formal methods in verification.
Juan Manuel Sanchez received the PhD degree in physics from the Complutense University of Madrid in 1976. He is a professor of computer architecture in the Department of Computer Science, University of Extremadura, Spain. His research interests are applications of reconfigurable hardware, logic design, modern computer architectures, and cryptography.
Francisco Tirado received the applied physics degree from Universidad Complutense de Madrid (UCM) in 1973 and the PhD degree in physics from UCM in 1977. He has held several positions with the Computer Science and Automatic Control Department of the UCM. From 1978-1985, he was an associate professor and, since 1986, he has been a professor of computer architecture and technology. He has worked in different fields within computer architecture, parallel processing, and design automation. His current research areas are parallel algorithms and architectures and processor design. Professor Tirado has coauthored more than 200 publications: 15 book chapters, 47 magazine articles, and 133 papers at conferences. He has served in the organization of more than 60 international conferences as general chair, steering committee member, program chair, program committee member, invited speaker, and session chair. He is the director of the CSC4 (Center for SuperComputation) and Madrid Science Park. He has been the dean of the Physics Science and Electronic Engineering Faculty (1994-2002). He is a member of the Informatics Advisory Board of the UCM and he has also been vice-dean of the Physics Science Faculty and Head of the Computer Science and Automatic Control Department. For five years (1988-1992), he served as general manager of the Spanish National Programme for Robotics and Advanced Automation. He is an adviser of the National Agency for Research and Development (CICYT). He also represents the CICYT on several national and international committees on information technology. Professor Tirado served on the research evaluation committee in Spain for three years and chaired it in 2001-2002. He is a senior member of the IEEE and of several European institutions and committees. He is an adviser of the Spanish Ministry of Science and Technology.
TABLE 11
Irreducible Trinomials ($m \le 1000$) for which the Transpositional Method Achieves Better Performance