Journal of Universal Computer Science, vol. 1, no. 3 (1995), 162-175
submitted: 4/1/95, accepted: 1/3/95, appeared: 28/3/95, Springer Pub. Co.

Modular Range Reduction: a New Algorithm for Fast and Accurate Computation of the Elementary Functions

Marc Daumas (Lab. LIP, École Normale Supérieure de Lyon)
Christophe Mazenc (Lab. LIP, École Normale Supérieure de Lyon)
Xavier Merrheim (Lab. LIP, École Normale Supérieure de Lyon)
Jean-Michel Muller (CNRS, Lab. LIP, École Normale Supérieure de Lyon)

Abstract: A new range reduction algorithm, called Modular Range Reduction (MRR), briefly introduced by the authors in [Daumas et al. 1994], is deeply analyzed. It is used to reduce the arguments of exponential and trigonometric function algorithms to the small range for which those algorithms are valid. MRR reduces the arguments quickly and accurately. A fast hardwired implementation of MRR operates in time O(log(n)), where n is the number of bits of the binary input value. For example, with MRR it becomes possible to compute the sine and cosine of a very large number accurately. We propose two possible architectures implementing this algorithm.

Key Words: Computer Arithmetic, Elementary Functions, Range Reduction
Category: B.2, G.1.0

1 Introduction

The algorithms used for evaluating the elementary functions (polynomial or rational approximations [Cody and Waite 1980, Remes 1934], Taylor expansions, shift-and-add algorithms such as those of [Ercegovac 1973], [DeLugish 1970], [Walther 1971] and [Asada et al. 1987], table lookup methods, ...) only give a correct result if the argument lies within a given bounded interval. In order to evaluate an elementary function f(x) for any x, one must find some "transformation" that makes it possible to deduce f(x) from some value g(x*), where:

– x* is deduced from x;
– x* is called the reduced argument;
– x* belongs to the convergence domain of the algorithm implemented for the evaluation of g.

In practice, there are two different kinds of reduction:

1. Additive reduction. x* is equal to x − kC, where k is an integer and C a constant (for instance, for the trigonometric functions, C is a multiple of π/4).
2. Multiplicative reduction. x* is equal to x / C^k, where k is an integer and C a constant (for instance, for the logarithm function, a convenient choice for C is the radix of the number system).

Example 1 Computation of the cosine function. Assume that we want to evaluate cos(x), and that the convergence domain of the algorithm used to evaluate the sine and cosine of the reduced argument contains [−π/4, +π/4]. We choose C = π/2, and the computation of cos(x) is decomposed in three steps:

– Compute x* and k such that x* ∈ [−π/4, +π/4] and x* = x − k π/2.
– Compute

    g(x*, k) =  cos(x*)   if k mod 4 = 0
               −sin(x*)   if k mod 4 = 1
               −cos(x*)   if k mod 4 = 2
                sin(x*)   if k mod 4 = 3        (1)

– Obtain cos(x) = g(x*, k).

The previous reduction mechanism is an additive reduction; a short rendering of this decomposition in code is given below. Let us then examine another example of additive reduction.
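The following sketch is only an illustration of the three steps of Example 1 (the function name is ours, C = π/2, plain double precision); it ignores the accuracy issues that MRR is designed to address.

import math

def cos_via_reduction(x):
    # Step 1: k and the reduced argument x* = x - k*(pi/2), with x* in [-pi/4, +pi/4].
    C = math.pi / 2
    k = round(x / C)
    x_star = x - k * C
    # Step 2: formula (1), selected by k mod 4.
    g = [math.cos,
         lambda t: -math.sin(t),
         lambda t: -math.cos(t),
         math.sin][k % 4]
    # Step 3: cos(x) = g(x*, k).
    return g(x_star)

print(cos_via_reduction(10.0), math.cos(10.0))   # both print about -0.8390715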

Example 2 Computation of the exponential function. Assume that we want to evaluate e^x in a radix-2 number system, and that the convergence domain of the algorithm used to evaluate the exponential of the reduced argument contains [0, ln(2)]. We choose C = ln(2), and the computation of e^x is decomposed in three steps:

– Compute x* ∈ [0, ln(2)] and k such that x = x* + k ln(2).
– Compute g(x*) = e^{x*}.
– Compute e^x = 2^k g(x*).

The radix-2 number system makes the final multiplication by 2^k straightforward.

Another way of performing the range reduction for the exponential function (with an algorithm whose convergence domain is [0, 1]) is to choose C = 1, x* = x − k, and g(x*) = e^{x*}. Then e^x = g(x*) e^k, and e^k can either be evaluated by performing a few multiplications (since k is an integer) or by table lookup. Usually, this latter method is preferable, since there is no loss of accuracy when computing x* = x − k. With the range reduction algorithm we give in the following, the former choices of C, x* and g become interesting, for several reasons:

– we will be able to compute x* very accurately;
– the required convergence interval is smaller, which means that to reach the same accuracy, we need a smaller number of coefficients for a polynomial or rational approximation;
– there will not be any error when deducing e^x from g(x*).

Both decompositions are sketched in code after this list.
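The sketch below is a plain double-precision illustration under our own naming; at this toy scale the accuracy differences discussed above are not visible.

import math

def exp_via_ln2(x):
    # C = ln 2, positive reduction: x = x* + k*ln(2) with x* in [0, ln 2).
    k = math.floor(x / math.log(2.0))
    x_star = x - k * math.log(2.0)
    return math.ldexp(math.exp(x_star), k)        # e^x = 2^k * e^(x*)

def exp_via_unit(x):
    # C = 1: x* = x - k is computed exactly, and e^k comes from repeated
    # multiplication (or, in practice, from a table).
    k = math.floor(x)
    x_star = x - k
    return math.exp(x_star) * math.exp(1.0) ** k

print(exp_via_ln2(12.345), exp_via_unit(12.345), math.exp(12.345))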

Anyway, range reduction is more of a problem for trigonometric functions than for exponentials since, in practice, we never have to deal with exponentials of very large numbers: they merely are overflows!

Example 3 Computation of the logarithm function. Assume that we want to evaluate ln(x), x > 0, in a radix-2 number system, and that the convergence domain of the algorithm used to compute the logarithm of the reduced argument contains [1/2, 1]. We choose C = 2, and the computation of ln(x) is decomposed in three steps:

– Compute x* ∈ [1/2, 1] and k such that x = x* 2^k (if x is a normalized radix-2 floating-point number, x* is its mantissa, while k is its exponent).
– Compute g(x*) = ln(x*).
– Compute ln(x) = g(x*) + k ln(2).

The previous mechanism is a multiplicative reduction.

In practice, multiplicative reduction is not a problem: when computing the usual mathematical functions, it only occurs with logarithms and n-th roots. With these functions, as in the example above, C can be chosen equal to a power of the radix of the number system, which makes the computation of x / C^k straightforward. Therefore, in the following, we concentrate on the problem of additive range reduction only.

2 The Modular Range Reduction Algorithm

We focus on the following problem: we assume that we have an algorithm able to compute the function g in an interval I of the form [−C/2 − ε, +C/2 + ε] (we call this case "symmetrical reduction") or [−ε, C + ε] (we call this case "positive reduction"), with ε ≥ 0. We want to compute x* ∈ I and an integer k such that

    x* = x − kC        (2)

If ε > 0, then x* and k are not uniquely defined by Eq. 2. In such a case, the problem of deducing these values from x will be called "redundant range reduction". For example, if C = π/2, I = [−1, 1] and x = 2.5, then k = 1 and x* = 0.929203... or k = 2 and x* = −0.641592... are possible values. If ε = 0, this problem is called "non-redundant range reduction". As in many fields of computer arithmetic, redundancy will allow faster algorithms. Table 1 sums up the different possible cases.

              I = [−C/2 − ε, +C/2 + ε]       I = [−ε, C + ε]
    ε = 0     symmetrical non-redundant      positive non-redundant
    ε ≠ 0     symmetrical redundant          positive redundant

    Table 1. The different cases in additive range reduction

It is worth noticing that:

1. In some practical cases, it is not necessary to fully compute k. For instance, for the trigonometric functions, if C = π/2, then one just needs to know k mod 4. If C = 2π, there is no need for any information about k.
2. With the usual algorithms for evaluating the elementary functions, one can assume that the length of the convergence domain is greater than C, i.e. that we can perform a redundant range reduction. For instance, with the Cordic algorithm, when performing rotations (see [Walther 1971]), the convergence domain is [−1.743..., +1.743...], which is much larger than [−π/2, +π/2]. With polynomial or rational approximations, the convergence domain can be enlarged by adding one coefficient to the approximation.

Let us define ⌊x⌉ as the nearest integer to x. Usually, the range reduction is done by:

– computing k = ⌊x/C⌋ (in the case of a positive reduction) or k = ⌊x/C⌉ (in the case of a symmetrical reduction), by means of a multiplication or a division;
– computing the reduced argument x* = x − kC by means of a multiplication by C (or a table lookup, if the values kC are precomputed and stored), followed by a subtraction.

The above process may be rather inaccurate: for large values of x, the final subtraction step leads to a catastrophic cancellation, although cunning tricks have been proposed to limit this cancellation [Cody and Waite 1980]; the small experiment below illustrates the problem. In [Daumas et al. 1994] we briefly proposed a new algorithm, called the modular reduction algorithm (MRR), that performs the range reduction quickly and accurately. In the following we deeply analyze that algorithm in terms of speed and accuracy, and we propose architectures implementing it.
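In this experiment (illustrative only; the inputs and the 50-digit reference value are ours), the same integer k is used once with the 53-bit constant C and once with an accurate value of π/2, and the two reduced arguments drift apart as x grows.

import math
from decimal import Decimal, getcontext

getcontext().prec = 50
HALF_PI = Decimal("1.57079632679489661923132169163975144209858469968755")   # reference pi/2
C = math.pi / 2                                    # 53-bit approximation of pi/2

for x in (1.0e3, 1.0e8, 1.0e15):
    k = round(x / C)
    naive = x - k * C                              # usual reduction, all in double precision
    accurate = Decimal(x) - k * HALF_PI            # same k, accurate constant and subtraction
    print(f"x = {x:.0e}:  naive x* = {naive:+.17f}   error = {float(Decimal(naive) - accurate):.2e}")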

2.1 Fixed-point reduction

First of all, we assume that the input operands are fixed-point radix-2 numbers, less than 2^N. These numbers have an N-bit integer part and a p-bit fractional part, so the digit chain

    x_{N−1} x_{N−2} ... x_1 x_0 . x_{−1} x_{−2} ... x_{−p},   where x_i ∈ {0, 1},

represents the number Σ_{i=−p}^{N−1} x_i 2^i. We assume that we should perform a redundant range reduction, and we call ν the integer such that 2^{ν−1} ≤ C < 2^ν.

Let us define, for i ≥ ν − 1, m_i as the number belonging to [−C/2, C/2) such that (2^i − m_i)/C is an integer (in the following, we will write m_i = 2^i mod C). The Modular Range Reduction (MRR) algorithm consists in performing the two following steps.

First reduction. We compute the number r:

    r = (x_{N−1} m_{N−1}) + (x_{N−2} m_{N−2}) + (x_{N−3} m_{N−3}) + ... + (x_{ν−1} m_{ν−1}) + x_{ν−2} x_{ν−3} ... x_0 . x_{−1} x_{−2} ... x_{−p}        (3)

(This formula is written for positive values of x only. It would be more correct, although maybe less clear, to write r = Σ_{i=ν−1}^{N−1} x_i m_i + Σ_{i=−p}^{ν−2} x_i 2^i.)

Since the x_i's are equal to 0 or 1, this computation is reduced to the sum of N − ν + 1 terms x_i m_i, plus the low-order word x_{ν−2} x_{ν−3} ... x_0 . x_{−1} ... x_{−p}. Moreover, since each bit 2^i with x_i = 1 has been replaced by m_i = 2^i mod C, the number r differs from x by a multiple of C. The result r of this first reduction satisfies |r| < (N − ν + 1) C/2 + 2^{ν−1}: all the x_i m_i have an absolute value not greater than C/2, and the low-order word has an absolute value less than 2^{ν−1}, which is not greater than C.

Second reduction. Define the r_i's as the digits of the result of the first reduction:

    r = r_ℓ r_{ℓ−1} ... r_0 . r_{−1} r_{−2} ...,   where ℓ = ⌊log_2((N − ν + 1) C/2 + 2^{ν−1})⌋.

Let us define r̂ as the number obtained by truncating the binary representation of r after the ⌈log_2(1/ε)⌉-th fractional bit, that is (using the relation ⌈log_2(1/ε)⌉ = −⌊log_2 ε⌋):

    r̂ = Σ_{i=⌊log_2 ε⌋}^{ℓ} r_i 2^i.

r̂ is a q-digit number, where q = ℓ + 1 + ⌈log_2(1/ε)⌉ is a very small number in all practical cases (see the example below). If we define k as ⌊r̂/C⌉ (resp. ⌊r̂/C⌋), then x* = r − kC belongs to [−C/2 − ε, +C/2 + ε] (resp. [−ε, C + ε]), i.e. it is the correct result of the symmetrical (resp. positive) range reduction.

Proof.

1. In the symmetrical case. We have |r̂/C − k| ≤ 1/2, therefore |r̂ − kC| ≤ C/2. From the definition of r̂, |r − r̂| ≤ 2^{⌊log_2 ε⌋} ≤ ε, therefore |r − kC| ≤ C/2 + ε.
2. In the positive case. We have k ≤ r̂/C < k + 1, therefore 0 ≤ r̂ − kC < C, therefore −ε ≤ r − kC < C + ε.

Since k can be deduced from r̂, this second reduction step is implemented by looking up the value kC in a table with 2^q entries, at the address constituted by the bits of r̂. Fig. 1 sums up the different steps of MRR.

During this reduction process, we add at most N − ν + 2 stored values (namely the selected m_i's and the value kC of the second reduction step). If these values are represented in fixed point with g fractional bits (so that the error on each of them is bounded by 2^{−g}), then the difference between the computed and the exact reduced argument is bounded by (N − ν + 2) 2^{−g}. In order to obtain the reduced argument with the same absolute accuracy as the input argument (i.e. p significant fixed-point fractional digits), it suffices to store the m_i's and the kC values with p + ⌈log_2(N − ν + 2)⌉ fractional bits. A rough software sketch of the two reduction steps follows.
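The sketch below is a software model only (assumptions: C = π, x ≥ 0, plain double precision instead of the dedicated fixed-point datapath, and a direct computation where the hardware would read kC from the table described above).

from math import pi, floor

C = pi
N, p = 20, 30                  # N-bit integer part, p-bit fractional part
NU = 2                         # nu such that 2**(NU-1) <= C < 2**NU

def m(i):
    # m_i = 2**i mod C, taken in [-C/2, C/2).
    r = (2.0 ** i) % C
    return r - C if r >= C / 2 else r

def mrr(x):
    X = round(x * 2 ** p)      # the N + p input bits, handled as one integer
    # First reduction: keep the low-order word x_{nu-2}...x_0.x_{-1}...x_{-p} as it is,
    # and add m_i for every higher bit that is set.
    r = (X & ((1 << (p + NU - 1)) - 1)) / 2 ** p
    for i in range(NU - 1, N):
        if (X >> (p + i)) & 1:
            r += m(i)
    # Second reduction: subtract the nearest multiple of C; in hardware, kC is read from
    # a small table addressed by the leading bits of r (the number r-hat above).
    k2 = floor(r / C + 0.5)
    x_star = r - k2 * C
    k = round((x - x_star) / C)      # total multiple of C removed from x
    return x_star, k

print(mrr(355.0))              # about (3.01e-05, 113): 355 = 113*pi + 0.0000301...

In a real implementation the m_i would of course be precomputed and stored with the number of fractional bits given above, rather than recomputed in double precision.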

Figure 1. The Modular Reduction Algorithm: a multi-input adder sums the selected m_i's and the low-order word, a table addressed by the bits of r̂ delivers kC, and a subtractor produces the reduced argument.

Example 4. Assume we need to compute sines of angles between −2^{20} and 2^{20}, and that the algorithm used with the reduced arguments is Cordic [Volder 1959], [Walther 1971]. The convergence interval is [−1.743..., +1.743...]; therefore (since 1.743 > π/2) we have to perform a symmetrical redundant range reduction, with C = π and ε = 1.743... − π/2 = 0.172... > 2^{−3}. We immediately get the following parameters:

– N = 20 and ν = 2;
– the first range reduction consists of the addition of 19 terms x_i m_i (plus the low-order word x_0 . x_{−1} ... x_{−p});
– the result r of the first reduction satisfies |r| < 19 π/2 + 2 < 2^5, therefore the second reduction step requires a table with 5 + ⌈log_2(1/ε)⌉ = 8 address bits.

To obtain the reduced argument with p significant fractional bits, one needs to store the m_i's and the kC values with p + 5 fractional bits.

Assume we compute the sine of 355. The binary representation of 355 is 101100011. Therefore, during the first reduction, we have to compute m_8 + m_6 + m_5 + m_1 + 1, where:

    m_8 = 2^8 mod π = 256 − 81π = 1.530995059226747684...
    m_6 = 2^6 mod π = 64 − 20π  = 1.1681469282041352307...
    m_5 = 2^5 mod π = 32 − 10π  = 0.5840734641020676153...
    m_1 = 2^1 mod π = 2 − π     = −1.141592653589793238462...

We get m_8 + m_6 + m_5 + m_1 + 1 = 3.1416227979431572921... The second reduction consists in subtracting π from that result, which gives 0.0000301443533640537..., the sine of which is 0.000030144353359488449... Since the total multiple of π removed from 355 is odd (k = 113), we obtain sin(355) = −0.000030144353359488449...
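These numbers can be checked with a few lines of Python (our own sketch, using 40-digit decimal arithmetic in place of the paper's fixed-point constants):

from decimal import Decimal, getcontext

getcontext().prec = 40
PI = Decimal("3.141592653589793238462643383279502884197")

m8 = Decimal(256) - 81 * PI      # 2^8 mod pi
m6 = Decimal(64) - 20 * PI       # 2^6 mod pi
m5 = Decimal(32) - 10 * PI       # 2^5 mod pi
m1 = Decimal(2) - PI             # 2^1 mod pi

r = m8 + m6 + m5 + m1 + 1        # first reduction applied to 355 = 101100011_2
x_star = r - PI                  # second reduction: subtract one more multiple of C
print(r)                         # 3.14162279794315729...
print(x_star)                    # 0.0000301443533640..., i.e. 355 - 113*pi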

2.2 Floating-point reduction

Now, assume that the input value x is a radix-2 floating-point number:

    x = 0.x_1 x_2 x_3 ... x_n × 2^{exponent}

The range reduction can be performed exactly as in the fixed-point case. During the first reduction, we replace the addition of the terms x_i m_i by the addition of the terms x_i m_{exponent−i}. As previously, m_j = 2^j mod C is the number belonging to [−C/2, C/2) such that (2^j − m_j)/C is an integer. The main difference between this reduction method and the other ones is that during the reduction process we just add numbers (the m_j's) of the same order of magnitude, represented in fixed point. This makes the reduction very accurate. One can easily show that if the m_j's and the kC value of the second reduction are represented with g fractional bits, then the absolute error on the reduced argument is bounded by (n + 1) 2^{−g}. Thus, for instance, it is possible to compute with good accuracy the sine and cosine of a huge floating-point number; a small illustration follows. In floating-point number systems, one would prefer relative information on the error: this will be discussed later.
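As an illustration (not the authors' hardware; C = π, the constants m_j are simulated with 60-digit decimal arithmetic, and inputs of much larger magnitude would need more working precision), one can reduce an ordinary double as follows:

from decimal import Decimal, getcontext
from math import frexp

getcontext().prec = 60
PI = Decimal("3.14159265358979323846264338327950288419716939937510582097494")

def m(j):
    # 2**j mod C, taken in [-C/2, C/2); adequate for the bit weights used below.
    r = (Decimal(2) ** j) % PI
    return r - PI if r >= PI / 2 else r

def reduce_fp(x):
    # Write the double x as M * 2**e with a 53-bit integer mantissa M.
    frac, exp = frexp(x)
    M, e = int(frac * 2 ** 53), exp - 53
    # First reduction: one constant m_(e+i) per set mantissa bit, all of magnitude <= C/2.
    r = Decimal(0)
    for i in range(53):
        if (M >> i) & 1:
            r += m(e + i)
    # Second reduction: subtract the nearest multiple of C.
    k = (r / PI).to_integral_value()
    return r - k * PI

print(reduce_fp(1.0e22))         # reduced argument of 10^22 modulo pi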

3 Architectures for Modular Reduction

The first reduction consists of adding N − ν + 1 numbers. This addition can be performed in a redundant number system (carry-save or signed-digit), in order to benefit from the carry-free ability of such a system, and/or in an arborescent way. This problem is obviously closely related to the problem of multiplying two numbers: multiplying x = Σ x_i 2^i by y reduces to computing the sum of the n + 1 terms x_i y 2^i. Therefore, almost all the classical architectures proposed in the multiplication literature (see for instance [Braun 1963], [Dadda 1965], [Harata et al. 1987], [Nakamura 1986], [Takagi et al. 1985], [Wallace 1964]) can be slightly modified in order to be used for range reduction. For instance, the architecture shown in Fig. 2 is obtained from Braun's cellular array multiplier [Braun 1963], while the logarithmic-time architecture shown in Fig. 4 is a Wallace tree [Wallace 1964].

In Fig. 3, m_{ij} denotes the digit of weight 2^j of m_i. This similarity between the Modular Range Reduction algorithm and the multiplication makes it possible to perform both operations with the same hardware, which can save some silicon area on a circuit.

The similarity between the range reduction and the multiplication leads us to another idea: in order to accelerate the first reduction, one can perform a Booth recoding [Booth 1951], or merely a modified Booth recoding [Hwang 1979], of x. This would give a signed-digit representation of x (with digits −1, 0 and 1) with at least half of the digits equal to zero. Then the number of terms to be added during the first reduction would be halved.
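For illustration, here is one standard signed-digit recoding with this kind of sparsity, the so-called non-adjacent form (the paper itself refers to Booth and modified Booth recoding; this sketch is ours):

def naf(x):
    # Canonical signed-digit recoding of a non-negative integer: digits in {-1, 0, +1},
    # no two adjacent nonzero digits, so roughly half or more of the digits are zero.
    digits = []
    while x > 0:
        if x & 1:
            d = 2 - (x & 3)      # +1 if x = 1 (mod 4), -1 if x = 3 (mod 4)
            x -= d
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits                # least significant digit first

print(naf(355))                  # [-1, 0, 1, 0, 0, -1, 0, -1, 0, 1] = 512 - 128 - 32 + 4 - 1

With such a recoding, each nonzero digit of weight 2^i contributes ±m_i to the first reduction, so fewer terms need to be added.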

4 Accuracy of MRR with floating-point inputs

As pointed out in Section 2.2, if the input value is an n-mantissa-bit radix-2 floating-point number, and if the terms m_j of the first reduction and the value kC of the second reduction are represented with g fractional bits, then the absolute error on the reduced argument is bounded by (n + 1) 2^{−g}. This makes it possible to evaluate sines and cosines of huge floating-point numbers with a good accuracy. However, in floating point, one is more concerned with the relative accuracy: what could be done if the result of MRR is zero or a number close to zero? This would indicate that the exact reduced argument is very small, but the computed value would only have a few (maybe no) significant digits. In the sequel of this paper, we deal with that problem.

4.1 How could MRR results close to zero be avoided?

As in Section 2.2, the input number x is the radix-2 floating-point number

    x = 0.x_1 x_2 x_3 ... x_n × 2^{exponent}

Therefore x is equal to M 2^e, where M = x_1 x_2 ... x_n (read as an integer) and e = exponent − n. Assume that the result of the range reduction is very small, say less than a very small real number η. This means that there exists an integer k such that

    |M 2^e − kC| ≤ η.

This implies

    |(M 2^e)/k − C| ≤ η/k.        (4)

This means that (M 2^e)/k is a very good rational approximation of C. To see to what extent relation (4) is likely to happen, we can compare this approximation to the sequence of the "best" possible rational approximations of C, i.e. the sequence of its continued fraction approximations. Let us call (P_i/Q_i) the sequence of the continued fraction approximations of C. For instance, if C = π, then

    P_0/Q_0 = 3,  P_1/Q_1 = 22/7,  P_2/Q_2 = 333/106,  P_3/Q_3 = 355/113,  P_4/Q_4 = 103993/33102,  ...

Let us define d(k) as the integer such that Q_{d(k)} ≤ k < Q_{d(k)+1}. From the theory of continued fractions, for every pair of integers (P, Q) with 0 < Q < Q_{d(k)+1} we have

    |P − Q C| ≥ |P_{d(k)} − Q_{d(k)} C|,

so we deduce from (4) (for a huge input x the exponent satisfies e ≥ 0, so M 2^e is an integer):

    η ≥ |M 2^e − kC| ≥ |P_{d(k)} − Q_{d(k)} C|.

Therefore the exact reduced argument of any input x not larger than x_max satisfies

    |x − kC| ≥ min_{k ≤ k_max} |P_{d(k)} − Q_{d(k)} C|,   where k_max = ⌈x_max / C⌉.        (5)

So, if one wants to get a relative error smaller than ε_r for input values lower than x_max, the way to proceed is the following one:

– evaluate k_max = ⌈x_max / C⌉;
– compute η_min = min_{k ≤ k_max} |P_{d(k)} − Q_{d(k)} C| (the minimum is reached for the largest index, d(k_max));
– make sure that the number g of fractional digits used for the reduction satisfies (n + 1) 2^{−g} ≤ ε_r η_min.

A short sketch of this computation for C = π is given below.
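The code below (our own sketch) simply lists the convergents P_i/Q_i quoted above together with the quantity |P_i − Q_i C| that lower-bounds the exact reduced argument:

from decimal import Decimal, getcontext

getcontext().prec = 60
PI = Decimal("3.14159265358979323846264338327950288419716939937510582")

def convergents(c, count):
    # First `count` continued-fraction convergents of the Decimal constant c.
    p0, q0, p1, q1 = 1, 0, int(c), 1
    yield p1, q1
    frac = c - int(c)
    for _ in range(count - 1):
        frac = 1 / frac
        a = int(frac)
        frac -= a
        p0, q0, p1, q1 = p1, q1, a * p1 + p0, a * q1 + q0
        yield p1, q1

for P, Q in convergents(PI, 5):
    print(f"{P}/{Q}   |P - Q*C| = {abs(Decimal(P) - Decimal(Q) * PI)}")

For instance, any k below Q_4 = 33102 keeps the exact reduced argument above |355 − 113π|, which is about 3.0e-5; this is the quantity to compare with (n + 1) 2^{−g} when choosing g.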

5 Conclusion

We have proposed a fast and accurate algorithm to perform the range reduction for computing the elementary functions. This algorithm can be implemented using a slightly modified multiplier, so that range reduction and multiplication can be performed with the same hardware, which can save some silicon area on a circuit. Using this algorithm, accurately and quickly computing the sine of a number such as 10^{200} becomes possible.

Figure 2. A cellular array for range reduction (half-adder cells, a small adder, multiples of C, and a final 3-input adder producing the reduced argument).

Figure 3. A black cell of the cellular array: an AND gate forming x_i m_{ij} and a full adder (FA) producing sum and carry.

Figure 4. A logarithmic-time range-reduction tree: a tree of carry-save adders followed by a final carry-propagate adder delivering the result of the first reduction.

References

[Asada et al. 1987] T. Asada, N. Takagi, and S. Yajima. A hardware algorithm for computing sine and cosine using redundant binary representation. Systems and Computers in Japan, 18(8), 1987.
[Booth 1951] A.D. Booth. A signed binary multiplication technique. Quarterly Journal of Mechanics and Applied Mathematics, 4(2):236-240, 1951. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.
[Braun 1963] E.L. Braun. Digital Computer Design. Academic Press, New York, 1963.
[Cody and Waite 1980] W. Cody and W. Waite. Software Manual for the Elementary Functions. Prentice-Hall, 1980.
[Dadda 1965] L. Dadda. Some schemes for parallel multipliers. Alta Frequenza, 34:349-356, March 1965. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.
[Daumas et al. 1994] M. Daumas, C. Mazenc, X. Merrheim, and J.M. Muller. Fast and accurate range reduction for the computation of elementary functions. In 14th IMACS World Congress on Computational and Applied Mathematics, Atlanta, Georgia, 1994.
[Ercegovac 1973] M.D. Ercegovac. Radix 16 evaluation of certain elementary functions. IEEE Transactions on Computers, C-22(6), June 1973. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.
[Harata et al. 1987] Y. Harata, Y. Nakamura, H. Nagase, M. Takigawa, and N. Takagi. A high-speed multiplier using a redundant binary adder tree. IEEE Journal of Solid-State Circuits, SC-22(1):28-34, February 1987. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 2, IEEE Computer Society Press Tutorial, 1990.
[Hwang 1979] K. Hwang. Computer Arithmetic: Principles, Architecture and Design. Wiley & Sons, 1979.
[DeLugish 1970] B. De Lugish. A Class of Algorithms for Automatic Evaluation of Functions and Computations in a Digital Computer. PhD thesis, Dept. of Computer Science, University of Illinois, Urbana, 1970.
[Nakamura 1986] S. Nakamura. Algorithms for iterative array multiplication. IEEE Transactions on Computers, C-35(8), August 1986.
[Remes 1934] E. Remes. Sur un procédé convergent d'approximations successives pour déterminer les polynômes d'approximation. C.R. Acad. Sci. Paris, 198, 1934.
[Takagi et al. 1985] N. Takagi, H. Yasukura, and S. Yajima. High speed multiplication algorithm with a redundant binary addition tree. IEEE Transactions on Computers, C-34(9), September 1985.
[Volder 1959] J. Volder. The CORDIC computing technique. IRE Transactions on Electronic Computers, 1959. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.
[Wallace 1964] C.S. Wallace. A suggestion for a fast multiplier. IEEE Transactions on Electronic Computers, pages 14-17, February 1964. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.
[Walther 1971] J. Walther. A unified algorithm for elementary functions. In Joint Computer Conference Proceedings, 1971. Reprinted in E.E. Swartzlander, Computer Arithmetic, Vol. 1, IEEE Computer Society Press Tutorial, 1990.