Fast Structured Total Least Squares Algorithm for Solving the Basic Deconvolution Problem


Katholieke Universiteit Leuven
Departement Elektrotechniek ESAT-SISTA/TR 1998-89

Fast structured Total Least Squares algorithm for solving the basic deconvolution problem

Nicola Mastronardi, Philippe Lemmerling, Sabine Van Huffel

September 1998
Submitted to SIMAX

This report is available by anonymous ftp from ftp.esat.kuleuven.ac.be in the directory pub/SISTA/lemmerli/reports/int9889.ps.Z. This work was supported by the National Research Council of Italy under the Short Term Mobility Program, by the Belgian Programme on Interuniversity Poles of Attraction (IUAP-4/2 & 24), initiated by the Belgian State, Prime Minister's Office for Science, and by a Concerted Research Action (GOA) project of the Flemish Community, entitled "Model-based Information Processing Systems".

Nicola Mastronardi: Dipartimento di Matematica, Università della Basilicata, via N. Sauro 85, 85100 Potenza, Italy ([email protected]). This author would like to acknowledge the hospitality of Dept. Elektrotechniek (ESAT), Katholieke Universiteit Leuven, where this work took place.

Philippe Lemmerling: Department of Electrical Engineering, ESAT-SISTA, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, 3001 Heverlee, Belgium ([email protected]). This author is a Ph.D. student funded by the IWT (Flemish Institute for Support of Scientific-Technological Research in Industry).

Sabine Van Huffel: Department of Electrical Engineering, ESAT-SISTA, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, 3001 Heverlee, Belgium ([email protected]). This author is a Research Associate with the F.W.O. (Fund for Scientific Research - Flanders).

Abstract

In this paper we develop a fast algorithm for the basic deconvolution problem. First we show that the kernel problem to be solved in the basic deconvolution problem is a so-called structured Total Least Squares problem. Due to the low displacement rank of the matrices involved, we are able to develop a fast algorithm. We apply the new algorithm to a deconvolution problem arising in a medical application in renography. By means of this example, we show the increased computational performance of our algorithm as compared to other algorithms for solving this type of structured Total Least Squares problem. In addition, Monte-Carlo simulations indicate the superior statistical performance of the structured Total Least Squares estimator compared to other estimators such as the ordinary Total Least Squares estimator.

Key words. deconvolution, structured Total Least Squares, displacement rank, Structured Total Least Norm

AMS subject classifications. 15A03, 62P10, 65C05

1. Introduction. Deconvolution problems occur in many areas such as reflection seismology, telecommunications and medical applications [5, 20, 3, 11], to name just a few. In this paper we develop a fast algorithm for the basic deconvolution problem. The latter problem is depicted in figure 1.1, where u(k) represents the input and y(k) the output at time k; Δu(k) and Δy(k) are i.i.d. white Gaussian measurement noise added to the input and to the output, respectively. The system, represented by its transfer function X(z), is a linear time-invariant system with impulse response x ∈ R^(n×1).
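The convolution relating the input u, the impulse response x and the output y can be written as a Toeplitz matrix-vector product. A minimal numpy sketch with made-up data (the matrix A below anticipates the one defined in section 2):

```python
import numpy as np

m, n = 6, 3
rng = np.random.default_rng(0)
u = rng.standard_normal(m + n - 1)   # hypothetical input samples u(1), ..., u(m+n-1)
x = rng.standard_normal(n)           # impulse response x(1), ..., x(n)

# A(i, j) = u(n + i - j) (1-based), i.e. row i holds u(n+i-1), ..., u(i)
A = np.array([[u[n - 1 + i - j] for j in range(n)] for i in range(m)])

y = A @ x
# the same output via plain discrete convolution (fully overlapping part)
y_conv = np.convolve(u, x)[n - 1:n - 1 + m]
```

The assertion `np.allclose(y, y_conv)` holds: the rows of the Toeplitz matrix A enumerate exactly the fully overlapping terms of the discrete convolution.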
The basic deconvolution problem can now be formulated as follows:

Given the measurements u(k) + Δu(k), k = 1, ..., m, and y(k) + Δy(k), k = 1, ..., m, of a system, find a Maximum Likelihood (ML) estimate for the system impulse response x(i), i = 1, ..., n.

In the next section we show that the ML estimator for the basic deconvolution problem is a so-called structured Total Least Squares (sTLS) problem. The sTLS problem is an extension of the ordinary Total Least Squares (TLS) problem. The ordinary TLS problem can be formulated as follows:

min_{ΔA, Δb, x}  ‖[ΔA  Δb]‖_F²    (1.1)


Fig. 1.1. This figure shows a schematic outline of the basic deconvolution problem. The goal is to estimate the impulse response x starting from the noisy input (u + Δu) and the noisy output (y + Δy).

such that (A + ΔA)x = b + Δb,

with A ∈ R^(m×n) and b ∈ R^(m×1). The sTLS formulation additionally imposes a structure on the correction matrix [ΔA Δb] (e.g. a Hankel structure), hence its name structured TLS. Furthermore, it is possible that the different elements of [ΔA Δb] get a user-defined weighting, different from the one represented by the Frobenius norm of [ΔA Δb].

In recent years many problem formulations and associated solution methods have been devised for the sTLS problem: the Structured Total Least Norm (STLN) approach [21, 22, 27], the Constrained Total Least Squares (CTLS) approach [1, 2] and the Structured Total Least Squares (STLS) approach [6]. We will use the straightforward optimization approach adopted in the STLN framework, since the other approaches either do not have efficient algorithms to solve them (e.g. the STLS approach) or introduce numerical inaccuracies by forming products involving the data matrix [A b] and its transpose (e.g. the CTLS approach). Section 3 describes a fast algorithm for solving the kernel problem of the STLN approach applied to the basic deconvolution problem: a Least Squares (LS) problem involving structured matrices. As will be shown, the algorithm is based on the low displacement rank of the matrices involved. Section 4 describes an example of a typical deconvolution problem. A simulation experiment based on a medical application in renography is described. This example is used to demonstrate the improved efficiency of the new algorithm compared to other algorithms for solving this type of sTLS problem.
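For reference, the ordinary TLS problem (1.1) has a well-known closed-form solution via the SVD of the extended matrix [A b]: x follows from the right singular vector associated with the smallest singular value. A sketch with hypothetical data (in the noise-free, consistent case this recovers the exact solution):

```python
import numpy as np

def tls_solve(A, b):
    """Ordinary TLS: x from the right singular vector of [A b] associated
    with the smallest singular value (assumes its last component is nonzero)."""
    m, n = A.shape
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]))
    v = Vt[-1]                 # right singular vector of the smallest singular value
    return -v[:n] / v[n]

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true                 # consistent (noise-free) system
x_tls = tls_solve(A, b)
```

The sTLS algorithm developed in this paper differs from this in that the correction [ΔA Δb] is forced to preserve the Toeplitz structure of the data, which destroys the closed-form SVD solution and calls for the iterative scheme below.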
By means of a Monte-Carlo simulation using several noise levels, we show the improved statistical accuracy of the deconvolution results obtained with the sTLS estimator as compared to other estimators, such as the TLS estimator, that do not impose a structure on the correction matrix [ΔA Δb].

2. The basic deconvolution problem. Starting from the problem formulation of the basic deconvolution problem in section 1, it is straightforward to show that an ML estimate can be found as the solution of the following problem (for a proof, see [2]):

min_{E, x}  αᵀα + βᵀβ   s.t.  (A + E)x = y + β,    (2.1)

with

A = [ u(n)      u(n-1)    ...  u(1)
      u(n+1)    u(n)      ...  u(2)
      ...                 ...  ...
      u(m+n-1)  u(m+n-2)  ...  u(m) ] ∈ R^(m×n),

E = [ α(n)      α(n-1)    ...  α(1)
      α(n+1)    α(n)      ...  α(2)
      ...                 ...  ...
      α(m+n-1)  α(m+n-2)  ...  α(m) ] ∈ R^(m×n),

β = (A + E)x - y ∈ R^(m×1),

with E the correction applied to A, β the correction applied to y, y ∈ R^(m×1) the output and x ∈ R^(n×1) the impulse response. Problem (2.1) is a structured TLS problem, since corrections can be applied to the left-hand side matrix A of the constraints in (2.1) (implying that it is a Total Least Squares type problem) and in addition the corresponding correction matrix E is structured (implying that we have to deal with a structured TLS problem). As already mentioned in the introduction, we will apply the STLN approach, implying that we solve (2.1) as an optimization problem. Using the zeroth and first order terms of the Taylor series expansion of β = (A + E(α))x - y (where we use the notation E(α) to denote the dependence of E on α) around [αᵀ xᵀ]ᵀ, we obtain the Gauss-Newton method for solving (2.1) (for a proof see [22]). The outline of the basic deconvolution algorithm is then as follows:

Basic Deconvolution Algorithm
Input: extended data matrix [A b] ∈ R^(m×(n+1)) (m > n) of full rank n+1.
Output: correction vector α and parameter vector x s.t. αᵀα + βᵀβ is as small as possible and β = (A + E(α))x - y.
Step 1: α ← 0
        x ← A\b
Step 2: while stop criterion not satisfied
  Step 2.1: min_{Δx, Δα} ‖ M [Δα; Δx] + [β; α] ‖₂
            with M = [ X  A+E
                       I   0  ] ∈ R^((2m+n-1)×(m+2n-1))
  Step 2.2: x ← x + Δx
            α ← α + Δα
end

with X ∈ R^(m×(m+n-1)) defined such that Xα = Ex. Taking E as defined previously in the deconvolution problem (2.1), X becomes

X = [ x(n)  x(n-1)  ...   x(1)   0     ...            0
      0     x(n)    x(n-1) ...   x(1)  0              ...
      ...   ...     ...    ...   ...   ...            0
      0     ...     ...    0     x(n)  x(n-1)  ...  x(1) ].
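The outline above can be transcribed directly into a slow, dense reference implementation, with the fast Step 2.1 of section 3 replaced by numpy's generic least-squares solver; the data and sizes here are made up for illustration:

```python
import numpy as np

def toeplitz_E(alpha, m, n):
    # E(i, j) = alpha(n + i - j), as in (2.1) (0-based: alpha[n - 1 + i - j])
    return np.array([[alpha[n - 1 + i - j] for j in range(n)] for i in range(m)])

def toeplitz_X(x, m, n):
    # X is m x (m+n-1), row i = [0,...,0, x(n), ..., x(1), 0,...], so that X @ alpha = E @ x
    X = np.zeros((m, m + n - 1))
    for i in range(m):
        X[i, i:i + n] = x[::-1]
    return X

def basic_deconvolution(A, y, m, n, tol=1e-6, maxit=50):
    alpha = np.zeros(m + n - 1)
    x = np.linalg.lstsq(A, y, rcond=None)[0]          # Step 1: x <- A \ b
    for _ in range(maxit):
        E = toeplitz_E(alpha, m, n)
        X = toeplitz_X(x, m, n)
        beta = (A + E) @ x - y
        M = np.block([[X, A + E],
                      [np.eye(m + n - 1), np.zeros((m + n - 1, n))]])
        d = np.linalg.lstsq(M, -np.concatenate([beta, alpha]), rcond=None)[0]  # Step 2.1
        alpha += d[:m + n - 1]                        # Step 2.2
        x += d[m + n - 1:]
        if np.linalg.norm(d) < tol:                   # stop criterion of section 3
            break
    return x, alpha

rng = np.random.default_rng(2)
m, n = 10, 4
u = rng.standard_normal(m + n - 1)
A = np.array([[u[n - 1 + i - j] for j in range(n)] for i in range(m)])
x_true = rng.standard_normal(n)
y = A @ x_true                 # noiseless data: the LS start is already optimal
x_est, alpha_est = basic_deconvolution(A, y, m, n)
```

With noiseless data the LS initialization already solves the problem, so the loop terminates after one Gauss-Newton step with α ≈ 0; the fast algorithm of section 3 replaces only the inner LS solve, not this outer iteration.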

Note that A\b in Step 1 is shorthand notation for the LS solution of the overdetermined system of equations Ax ≈ b. As described in [15], more advanced initialization steps are possible. They yield better starting values in the sense that convergence takes place in fewer iterations and to a better local minimum. However, the price to be paid is an increase in the computational complexity of the initialization. Due to the nature of the problem we consider, the simple LS estimate will turn out to be sufficient in the considered application. We use the following stop criterion in our implementation of the algorithm:

‖[Δαᵀ Δxᵀ]‖₂ < 10⁻⁶.

3. Fast algorithm. In this section we describe a fast algorithm for solving the LS problem in Step 2.1 of the basic deconvolution algorithm described in the previous section. It basically consists of a fast triangularization of the matrix M followed by the solution of the normal equations

MᵀM [Δαᵀ Δxᵀ]ᵀ = RᵀR [Δαᵀ Δxᵀ]ᵀ = -Mᵀ [βᵀ αᵀ]ᵀ,

with M = Q[Rᵀ 0]ᵀ the QR factorization of M. The triangularization of the matrix M can be obtained by means of the QR decomposition of M with a computational complexity of O((m+n)³). The algorithm considered in [22] requires O(mn² + m²) flops. We propose an algorithm for computing the matrix R based on the displacement representation [13, 7] of the matrix MᵀM that requires O(mn + n²) flops. First we briefly describe the displacement representation of a matrix (see [7, 13] for more details).

Let Z_k ∈ R^(k×k) be the lower shift matrix, that is

Z_k = [ 0  0  ...  0
        1  0  ...  0
           ⋱  ⋱
           1  0 ],

and let x ∈ R^k. We denote by L_k(x) the so-called Krylov matrix generated by x:

L_k(x) = [x, Z_k x, Z_k² x, ..., Z_k^(k-1) x].

The following lemma holds [7].

Lemma 3.1. For an arbitrary matrix A ∈ R^(k×k),

A - Z_k A Z_kᵀ = Σ_{i=1}^{α̂} x_i y_iᵀ   if and only if   A = Σ_{i=1}^{α̂} L_k(x_i) L_k(y_i)ᵀ.

The matrix pair G_α̂(A) = {X, Y}, where X = [x_1, ..., x_α̂] and Y = [y_1, ..., y_α̂], is called a generator of A. Generators are clearly not unique and can be of different lengths. The smallest possible length is called the displacement rank of A and is denoted by α(A).

Remark 3.1. A symmetric matrix A has a symmetric generator, in the sense that x_i = ±y_i, i = 1, ..., α̂. Hence, its displacement representation has the symmetric form

A = Σ_{i=1}^{p} L_k(x_i) L_k(x_i)ᵀ - Σ_{i=p+1}^{α̂} L_k(x_i) L_k(x_i)ᵀ.
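Lemma 3.1 can be verified numerically: split the displacement A - Z A Zᵀ into rank-one terms with an SVD and rebuild A from Krylov matrices. Any square matrix works; the data here is random:

```python
import numpy as np

k = 7
rng = np.random.default_rng(3)
A = rng.standard_normal((k, k))

Z = np.eye(k, k=-1)                       # lower shift matrix Z_k
def L(v):                                 # Krylov matrix L_k(v) = [v, Zv, ..., Z^(k-1) v]
    return np.column_stack([np.linalg.matrix_power(Z, j) @ v for j in range(k)])

D = A - Z @ A @ Z.T                       # displacement of A
U, s, Vt = np.linalg.svd(D)
# rank-one expansion D = sum_i x_i y_i^T with x_i = s_i u_i, y_i = v_i
A_rebuilt = sum(L(s[i] * U[:, i]) @ L(Vt[i]).T for i in range(k))
```

The identity behind the lemma is the telescoping sum A = Σ_j Z^j (A - Z A Zᵀ) (Zᵀ)^j, which terminates because Z is nilpotent; `A_rebuilt` matches `A` to machine precision.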

Given a displacement representation of MᵀM, it is possible to construct a factorization procedure with a computational complexity proportional to the displacement rank of MᵀM. Since MᵀM is a block-Toeplitz-like matrix, it is more natural to consider its displacement representation with respect to the block-shift matrix Z = Z_{m+n-1} ⊕ Z_n [12].

3.1. Generators for MᵀM. For the sake of brevity, in the following we denote the Krylov matrices L_{m+2n-1}(x) by L(x). The displacement rank of MᵀM is 5, that is α(MᵀM) = 5.

Lemma 3.2. The following vectors form a generator of MᵀM:

x_1 = M(1,:)ᵀ
x_2 = e_1
x_3, x_4 (defined below)
x_5 = [0, M(m, 1:m+n-2)ᵀ, 0, M(m, m+n+1:m+2n-2)ᵀ]ᵀ,

where e_1 = [1, 0, ..., 0]ᵀ ∈ R^(m+2n-1). Let w = M(2:2m+n-1, m+n) / ‖M(2:2m+n-1, m+n)‖₂ and t = M(2:2m+n-1, 2:m+2n-1)ᵀ w. The generators x_3 and x_4 are then defined in the following way:

x_3 = [0, tᵀ]ᵀ,   x_4 = [0, tᵀ(1:m+n-1), 0, tᵀ(m+n+1:m+2n-1)]ᵀ.

Then

MᵀM = Σ_{i=1}^{3} L(x_i)L(x_i)ᵀ - Σ_{i=4}^{5} L(x_i)L(x_i)ᵀ.    (3.1)

Proof. Construct MᵀM and Z MᵀM Zᵀ; straightforward manipulation then shows that MᵀM - Z MᵀM Zᵀ can be expressed as a sum of 5 rank-one matrices.

Following the technique described in [7, 19] we can easily construct an algorithm for the computation of R that requires only O(m² + mn + n²) flops. In the following section we consider an algorithm with a computational complexity of O(mn + n²), taking into account the "sparsity" of the vectors x_i, i = 1, ..., 5.

3.2. Description of the algorithm. Starting from the vectors x_i, i = 1, ..., 5, and following the same steps as the method proposed in [7, 19] (see also the Appendix), we transform these vectors in the following way:

[x_iᵀ; x_jᵀ] := Q [x_iᵀ; x_jᵀ],    (3.2)

where Q is either a Givens rotation (updating) if L(x_i) and L(x_j) have the same sign in the sum (3.1), or a hyperbolic rotation (downdating) if these terms have opposite signs in the sum (3.1). We perform the downdating step by means of a stabilized hyperbolic rotation [24], since the latter is more stable.
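The rank claim of Lemma 3.2 can be checked numerically by forming MᵀM explicitly and computing the rank of its displacement with respect to the block shift Z = Z_{m+n-1} ⊕ Z_n. A sketch with random Toeplitz data (the correction E is absorbed into A here, i.e. this is a generic iterate):

```python
import numpy as np

m, n = 8, 3
p = m + n - 1
rng = np.random.default_rng(4)
u = rng.standard_normal(p)          # entries of the Toeplitz matrix A + E
x = rng.standard_normal(n)          # current impulse-response iterate

AE = np.array([[u[n - 1 + i - j] for j in range(n)] for i in range(m)])  # A + E, Toeplitz
X = np.zeros((m, p))
for i in range(m):
    X[i, i:i + n] = x[::-1]          # X row i = [0..0, x(n), ..., x(1), 0..0]

M = np.block([[X, AE], [np.eye(p), np.zeros((p, n))]])
G = M.T @ M

Zb = np.zeros((p + n, p + n))        # block shift Z = Z_{m+n-1} (+) Z_n
Zb[1:p, 0:p - 1] = np.eye(p - 1)
Zb[p + 1:, p:p + n - 1] = np.eye(n - 1)

D = G - Zb @ G @ Zb.T
disp_rank = np.linalg.matrix_rank(D)
```

For generic data the numerical rank of D comes out as 5, matching α(MᵀM) = 5; the fast triangularization therefore only ever touches 5 generator vectors instead of the full matrix.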
Furthermore, at the kth iteration, the matrix Q is chosen to annihilate the kth entry of the resulting vector x_j. At the end of the kth iteration, we have x_j(k) = 0, j ≠ 1. Then x_1(k : m+2n-1) is the kth row of R and we set x_1(k+1 : m+2n-1) := x_1(k : m+2n-2).

We divide the algorithm into 4 phases:
1. the initialization: i = 1;
2. the iterations for i = 2 : m;
3. the iterations for i = m+1 : m+n-1;
4. the iterations for i = m+n : m+2n-1.

3.2.1. Initialization: i = 1. The only vectors whose first entry is different from 0 are x_1 and x_2. The new vectors x̃_1 and x̃_2 are computed as

[x̃_1ᵀ; x̃_2ᵀ] = G [x_1ᵀ; x_2ᵀ],

where G is the Givens rotation chosen to annihilate the first element of x_2. The first row of R is x̃_1ᵀ. In the following we denote by x_k^(i), k = 1, ..., 5, the vectors at the ith iteration. Then we define

x_1^(1) = [0, x̃_1(1:m+n-2)ᵀ, 0, x̃_1(m+n:m+2n-2)ᵀ]
x_2^(1) = x̃_2
x_3^(1) = x_3
x_4^(1) = x_4
x_5^(1) = x_5.

The number of flops for this phase is 4n.

3.2.2. Iterations for i = 2 : m. In each iteration of this phase L(x_1^(i-1)) is updated with L(x_2^(i-1)) and L(x_3^(i-1)), L(x_4^(i-1)) is updated with L(x_5^(i-1)), and L(x_1^(i-1)) is downdated with L(x_4^(i-1)). Since the first m entries of x_5^(1) are zero (its trailing entries are x(n), ..., x(1) followed by entries of row m of A+E), L(x_5^(1)) does not contribute to this phase.

Moreover, the structure of the vectors x_k^(i-1), k = 1, 2, at the beginning of the ith iteration is

x_1^(i-1) = [0, ..., 0 (i-1 zeros), ∗, ..., ∗ (n entries), 0, ..., 0 (m-i+1 zeros), ∗, ..., ∗ (n-1 entries)]ᵀ
x_2^(i-1) = [0, ..., 0 (i-1 zeros), ∗, ..., ∗ (n-1 entries), 0, ..., 0 (m-i+1 zeros), ∗, ..., ∗ (n entries)]ᵀ,

where the entries ∗ are in general different from 0. Then updating L(x_1^(i-1)) with L(x_2^(i-1)) does not modify the sparsity of the updated x_1^(i-1). To complete the ith iteration, L(x_1^(i-1)) must be updated with L(x_3^(i-1)) and downdated with L(x_4^(i-1)). To explain this computation, we describe the iteration for i = 2, recalling that the vectors x_3^(1) and x_4^(1) are equal, except for the (m+n)th element (the entries of these vectors are generally different from 0). Let

x_1^(1) = [0, μ_2, ..., μ_{n+1}, 0, ..., 0 (m-2 zeros), μ_{m+n}, ..., μ_{m+2n-1}]ᵀ    (3.3)
x_3^(1) = [0, ν_2, ..., ν_{m+n-1}, ν_{m+n}, ..., ν_{m+2n-1}]ᵀ
x_4^(1) = [0, ν_2, ..., ν_{m+n-1}, δ_{m+n}, ν_{m+n+1}, ..., ν_{m+2n-1}]ᵀ.

The Givens rotation used during the updating is

G = [  c_G^(1)  s_G^(1)
      -s_G^(1)  c_G^(1) ],   with c_G^(1) = μ_2/√(μ_2² + ν_2²) and s_G^(1) = ν_2/√(μ_2² + ν_2²).

The updated vectors x̃_1^(1) and x̃_3^(1) are

x̃_1^(1) = c_G^(1) x_1^(1) + s_G^(1) x_3^(1)    (3.4)
x̃_3^(1) = -s_G^(1) x_1^(1) + c_G^(1) x_3^(1),    (3.5)

with x̃_1^(1) = [0, μ̃_2, ..., μ̃_{m+2n-1}]ᵀ (μ̃_2 = √(μ_2² + ν_2²)). Moreover, we point out that, by (3.3),

x̃_3^(1)(n+2 : n+m-1) = c_G^(1) x_3^(1)(n+2 : n+m-1).    (3.6)

To finish the iteration for i = 2, L(x̃_1^(1)) must be downdated with L(x_4^(1)) by means of the stabilized hyperbolic rotation

H = [ 1       0
      -ρ  √(1-ρ²) ] [ 1/√(1-ρ²)  0
                      0          1 ] [ 1  -ρ
                                       0   1 ],

where ρ is such that H [μ̃_2, ν_2]ᵀ = [μ̂_2, 0]ᵀ. Taking (3.4) into account, it is straightforward to see that ρ = s_G^(1) and √(1-ρ²) = c_G^(1). The downdated vectors x̂_1^(1) and x̃_4^(1) are

x̂_1^(1) = (x̃_1^(1) - s_G^(1) x_4^(1)) / c_G^(1) = x_1^(1) + s_G^(1)(x_3^(1) - x_4^(1)) / c_G^(1)    (3.7)

and

x̃_4^(1) = -s_G^(1) x̂_1^(1) + c_G^(1) x_4^(1).    (3.8)

Hence,

x̃_4^(1) = -s_G^(1) x̂_1^(1) + c_G^(1) x_4^(1)    (3.9)
        = -s_G^(1) x_1^(1) + (c_G^(1)² x_4^(1) - s_G^(1)² (x_3^(1) - x_4^(1))) / c_G^(1)
        = -s_G^(1) x_1^(1) + (x_4^(1) - (1 - c_G^(1)²) x_3^(1)) / c_G^(1).

From (3.5) and (3.9), x̃_3^(1) and x̃_4^(1) continue to be equal, except for the (m+n)th entry. Furthermore, from (3.7), we observe that x_1^(1) and x̂_1^(1) differ only in their (m+n)th entry. x̂_1^(1) now becomes the 2nd row of R, and, for the next iteration, the updated vectors are

x_1^(2) = [0, 0, x̂_1^(1)(2 : m+n-2)ᵀ, 0, x̂_1^(1)(m+n+1 : m+2n-2)ᵀ]ᵀ
x_2^(2) = x_2^(1)
x_3^(2) = x̃_3^(1)
x_4^(2) = [x̃_3^(1)(1 : m+n-1), θ, x̃_3^(1)(m+n+1 : m+2n-1)]
x_5^(2) = x_5^(1),

where θ = -s_G^(1) x̂_1^(1)(m+n) + c_G^(1) x_4^(1)(m+n). To reduce the computational cost of this phase, we observe that it is not necessary to calculate x_3^(2)(n+3 : m+n-1), since at the next iteration the corresponding entries of the vector x_1^(2) are equal to 0. Hence for the vector x_3^(3)(n+4 : m+n-1) the following relation holds:

x_3^(3)(n+4 : m+n-1) = c_G^(2) c_G^(1) x_3^(1)(n+4 : m+n-1),

and, at the ith iteration,

x_3^(i-1)(n+i : m+n-1) = c_G^(i-2) ⋯ c_G^(2) c_G^(1) x_3^(1)(n+i : m+n-1).

Hence it is sufficient to store the partial product

c_G^(i-2) ⋯ c_G^(2) c_G^(1)    (3.10)

in a temporary variable, and to multiply x_3^(1)(n+i) by this variable at the ith iteration. At the end of each iteration we set R(i, i:m+2n-1) = x_1^(i-1)(i : m+2n-1), x_1^(i)(i+1 : m+2n-1) = x_1^(i-1)(i : m+2n-2), x_1^(i)(m+n) = 0. Hence the number of flops of this phase is 18mn.

3.2.3. Iterations for i = m+1 : m+n-1. The only difference between this phase and the previous one is that L(x_5^(i)) must also be downdated from L(x_1^(i)). We set R(i, i:m+2n-1) = x_1^(i-1)(i : m+2n-1), x_1^(i)(i+1 : m+2n-1) = x_1^(i-1)(i : m+2n-2), x_1^(i)(m+n) = 0. The number of flops of this phase is 24n².

3.2.4. Iterations for i = m+n : m+2n-1. This phase is similar to the previous one. The only difference is that the vector x_4^(i) must also be computed, since it now differs from x_3^(i). After each iteration of this phase we also set R(i, i:m+2n-1) = x_1^(i-1)(i : m+2n-1), x_1^(i)(i+1 : m+2n-1) = x_1^(i-1)(i : m+2n-2), x_1^(i)(m+n) = 0. The number of flops of this phase is 12.5n².

3.3. Matlab-like program.
The algorithm is summarized in the function FTriang. generator is a Matlab function that, given the matrix M, computes the generator x_i, i = 1, ..., 5, of MᵀM. givens and hyp compute the Givens and the hyperbolic rotation, respectively. The variables t1, t2, t3, t4 and temp are temporary variables; temp stores the partial product (3.10).

function [R] = FTriang(M, m, n)
  [x1, x2, x3, x4, x5] = generator(M);
  temp = 1; mn1 = m + 2*n - 1; mn2 = m + 2*n - 2;
  % Initialization
  [c, s] = givens(x1(1), x2(1));
  [ x1([1:n, m+n:mn1])' ; x2([1:n, m+n:mn1])' ] = [c s; -s c] * [ x1([1:n, m+n:mn1])' ; x2([1:n, m+n:mn1])' ];
  R(1, 1:mn1) = x1'; x1(2:mn1) = x1(1:mn2); x1(m+n) = 0;
  % Phase 1
  for i = 2:m
    [c, s] = givens(x1(i), x2(i));
    [ x1([i:n+i-1, m+n:mn1])' ; x2([i:n+i-1, m+n:mn1])' ] = [c s; -s c] * [ x1([i:n+i-1, m+n:mn1])' ; x2([i:n+i-1, m+n:mn1])' ];
    t3 = x3(n+m); t4 = x4(n+m);
    [c, s] = givens(x1(i), x3(i));
    x3([i+1:n+i-1, m+n:mn1])' = -s*x1([i+1:n+i-1, m+n:mn1])' + c*x3([i+1:n+i-1, m+n:mn1])';
    x1(m+n) = x1(m+n) + s*(t3 - t4)/c;
    x4(m+n) = -s*x1(m+n) + c*t4;
    temp = temp*c;
    if i < m, x3(n+i) = temp*x3(n+i); end
    R(i, i:mn1) = x1(i:mn1);
    x1(i+1:mn1) = x1(i:mn2); x1(m+n) = 0;
  end
  % Phase 2
  for i = m+1 : m+n-1
    [c, s] = givens(x1(i), x2(i));
    [ x1(i:mn1)' ; x2(i:mn1)' ] = [c s; -s c] * [ x1(i:mn1)' ; x2(i:mn1)' ];
    t3 = x3(n+m); t4 = x4(n+m);
    [c, s] = givens(x1(i), x3(i));
    x3(i+1:mn1)' = -s*x1(i+1:mn1)' + c*x3(i+1:mn1)';
    x1(m+n) = x1(m+n) + s*(t3 - t4)/c;
    x4(m+n) = -s*x1(m+n) + c*t4;
    s = x5(i)/x1(i); c1 = sqrt(x1(i)^2 - x5(i)^2); c = c1/x1(i);
    x1(i+1:m+n)' = ( x1(i+1:m+n)' - s*x5(i+1:m+n)' ) / c;
    x5(i+1:m+n)' = -s*x1(i+1:m+n)' + c*x5(i+1:m+n)';
    x1(i) = c1;
    R(i, i:mn1) = x1(i:mn1)';
    x1(i+1:mn1) = x1(i:mn2); x1(m+n) = 0;
  end
  x4(n+m) = t4; x4(m+n+1:mn1) = x3(m+n+1:mn1);
  % Phase 3
  for i = m+n : mn1
    [c, s] = givens(x1(i), x2(i));
    [ x1(i:mn1)' ; x2(i:mn1)' ] = [c s; -s c] * [ x1(i:mn1)' ; x2(i:mn1)' ];
    [c, s] = givens(x1(i), x3(i));
    [ x1(i:mn1)' ; x3(i:mn1)' ] = [c s; -s c] * [ x1(i:mn1)' ; x3(i:mn1)' ];
    [c, s] = givens(x5(i), x4(i));
    [ x5(i:mn1)' ; x4(i:mn1)' ] = [c s; -s c] * [ x5(i:mn1)' ; x4(i:mn1)' ];
    s = x5(i)/x1(i); c1 = sqrt(x1(i)^2 - x5(i)^2); c = c1/x1(i);
    x1(i+1:m+n)' = ( x1(i+1:m+n)' - s*x5(i+1:m+n)' ) / c;
    x5(i+1:m+n)' = -s*x1(i+1:m+n)' + c*x5(i+1:m+n)';
    x1(i) = c1;
    R(i, i:mn1) = x1(i:mn1)';
    x1(i+1:mn1) = x1(i:mn2);
  end

3.4. Modified problem. We will now consider a slightly modified problem (2.1). The modification consists of the introduction of an error-free zero upper triangular part in the matrix A; by consequence E also has a zero upper triangular part. The latter means that A and E are as follows:

A = [ u(1)  0       ...  0
      u(2)  u(1)    ...  0
      ...   ...     ...  ...
      ...   ...     u(1)
      ...                ...
      u(m)  u(m-1)  ...  u(m-n+1) ] ∈ R^(m×n),

E = [ α(1)  0       ...  0
      α(2)  α(1)    ...  0
      ...   ...     ...  ...
      ...   ...     α(1)
      ...                ...
      α(m)  α(m-1)  ...  α(m-n+1) ] ∈ R^(m×n).

By consequence, X becomes (remember that X is defined by Xα = Ex):

X = [ x(1)
      x(2)  x(1)
      ...   ...   ...
      x(n)  ...   ...   ...
            ...   ...   ...   ...
            x(n)  x(n-1) ...  x(1) ] ∈ R^(m×m).

Again we define M as

M = [ X  A+E
      I   0  ] ∈ R^(2m×(m+n)),

with I ∈ R^(m×m) the identity matrix and 0 ∈ R^(m×n) the null matrix. The displacement rank of MᵀM with respect to the block shift matrix Z = Z_m ⊕ Z_n is 5. Let w1 = M(:,1)/‖M(:,1)‖₂ and t1 = Mᵀw1. Let w2 = M(:,m+1)/‖M(:,m+1)‖₂ and t2 = Mᵀw2. Then the generators of MᵀM with respect to Z are

x1 = t1
x2 = [0, t2ᵀ(2 : m+n)]ᵀ
x3 = [0, t2ᵀ(2 : m), 0, t2ᵀ(m+2 : m+n)]ᵀ
x4 = [0, t1ᵀ(2 : m+n)]ᵀ
x5 = [0, M(m, 1 : m-1)ᵀ, 0, M(m, m+1 : m+n-1)ᵀ]ᵀ

and

MᵀM = Σ_{i=1}^{2} L(x_i)L(x_i)ᵀ - Σ_{i=3}^{5} L(x_i)L(x_i)ᵀ.
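In the modified problem both E and X are lower triangular banded Toeplitz matrices, and the defining identity Xα = Ex is easy to check numerically. A short sketch with made-up sizes and data:

```python
import numpy as np

m, n = 7, 3
rng = np.random.default_rng(5)
alpha = rng.standard_normal(m)   # correction parameters alpha(1), ..., alpha(m)
x = rng.standard_normal(n)       # impulse response x(1), ..., x(n)

# E(i, j) = alpha(i - j + 1) for i >= j (0-based: alpha[i - j]), zero above the diagonal
E = np.zeros((m, n))
for i in range(m):
    for j in range(min(i + 1, n)):
        E[i, j] = alpha[i - j]

# X(i, k) = x(i - k + 1) for 0 <= i - k <= n - 1 (0-based: x[i - k]): m x m banded Toeplitz
X = np.zeros((m, m))
for i in range(m):
    for k in range(max(0, i - n + 1), i + 1):
        X[i, k] = x[i - k]

lhs, rhs = X @ alpha, E @ x
```

Both sides compute the same sums Σ_j α(i-j+1)x(j); this identity is what lets the Gauss-Newton step treat the structured correction E as the linear map X acting on the parameter vector α.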

Also in this case, taking into account the sparsity of the vectors x1, x4, x5, and since x2 and x3 differ only in the (m+1)th entry, following the same technique described in section 3.2, it is possible to construct an algorithm for the fast triangularization of M with the same computational complexity as the previous one (O(mn + n²)). We omit the description of the algorithm for the sake of brevity and summarize the computations in the following Matlab function.

function [R] = Ftriang2(M, m, n)
  [x1, x2, x3, x4, x5] = generator2(M);
  % Initialization
  temp = 1;
  R(1, 1:m+n) = x1'; x1(2:m+n) = x1(1:m+n-1); x1(m+1) = 0;
  % Phase 1
  for i = 2 : m-n+1
    t2 = x2(m+1); t3 = x3(m+1);
    [c, s] = givens(x1(i), x2(i));
    x2([i+1:n+i-1, m+1:m+n])' = -s*x1([i+1:n+i-1, m+1:m+n])' + c*x2([i+1:n+i-1, m+1:m+n])';
    x1(m+1) = s*(t2 - t3)/c;
    x3(m+1) = c*t3 - s*x1(m+1);
    temp = temp*c;
    if i < m-n+1, x2(n+i) = temp*x2(n+i); end
    s = x4(i)/x1(i); c1 = sqrt(x1(i)^2 - x4(i)^2); c = c1/x1(i);
    x1([i+1:n+i-1, m+1:m+n])' = ( x1([i+1:n+i-1, m+1:m+n])' - s*x4([i+1:n+i-1, m+1:m+n])' ) / c;
    x4([i+1:n+i-1, m+1:m+n])' = -s*x1([i+1:n+i-1, m+1:m+n])' + c*x4([i+1:n+i-1, m+1:m+n])';
    x1(i) = c1;
    R(i, i:m+n) = x1(i:m+n)';
    x1(i+1:m+n) = x1(i:m+n-1); x1(m+1) = 0;
  end
  % Phase 2
  for i = m-n+2 : m
    t2 = x2(m+1); t3 = x3(m+1);
    [c, s] = givens(x1(i), x2(i));
    x2(i+1:m+n)' = -s*x1(i+1:m+n)' + c*x2(i+1:m+n)';
    x1(m+1) = x1(m+1) + s*(t2 - t3)/c;
    x3(m+1) = c*t3 - s*x1(m+1);
    [c, s] = givens(x4(i), x5(i));
    [ x4(i:m+n)' ; x5(i:m+n)' ] = [c s; -s c] * [ x4(i:m+n)' ; x5(i:m+n)' ];
    s = x4(i)/x1(i); c1 = sqrt(x1(i)^2 - x4(i)^2); c = c1/x1(i);
    x1(i+1:m+n)' = ( x1(i+1:m+n)' - s*x4(i+1:m+n)' ) / c;
    x4(i+1:m+n)' = -s*x1(i+1:m+n)' + c*x4(i+1:m+n)';
    x1(i) = c1;
    R(i, i:m+n) = x1(i:m+n)';
    x1(i+1:m+n) = x1(i:m+n-1); x1(m+1) = 0;
  end
  x3(m+2:m+n) = x2(m+2:m+n);
  % Phase 3
  for i = m+1 : m+n
    [c, s] = givens(x1(i), x2(i));
    [ x1(i:m+n)' ; x2(i:m+n)' ] = [c s; -s c] * [ x1(i:m+n)' ; x2(i:m+n)' ];
    [c, s] = givens(x4(i), x3(i));
    [ x4(i:m+n)' ; x3(i:m+n)' ] = [c s; -s c] * [ x4(i:m+n)' ; x3(i:m+n)' ];
    [c, s] = givens(x4(i), x5(i));
    [ x4(i:m+n)' ; x5(i:m+n)' ] = [c s; -s c] * [ x4(i:m+n)' ; x5(i:m+n)' ];
    s = x4(i)/x1(i); c1 = sqrt(x1(i)^2 - x4(i)^2); c = c1/x1(i);
    x1(i+1:m+n)' = ( x1(i+1:m+n)' - s*x4(i+1:m+n)' ) / c;
    x4(i+1:m+n)' = -s*x1(i+1:m+n)' + c*x4(i+1:m+n)';
    x1(i) = c1;
    R(i, i:m+n) = x1(i:m+n)';
    x1(i+1:m+n) = x1(i:m+n-1);
  end

4. Numerical experiments. In this section we illustrate, by means of a deconvolution problem that occurs in renography [8], the efficiency of the algorithm described in section 3 and the increased statistical accuracy obtained by using the sTLS estimator as compared to the TLS estimator used previously in [29]. The goal is to determine via deconvolution the so-called renal retention function of the kidney, which in system theoretic terms corresponds to the impulse response x of the system in figure 1.1. This retention function visualizes the mean whole kidney transit time of one unit of a tracer injected into the patient, and enables a physician to evaluate the renal function and the severity of renal dysfunction after transplantation. In order to obtain this impulse response, the following experiment is conducted. A radioactive tracer is injected into an artery of the patient. The input of the system (u in figure 1.1) is the arterial concentration of the radioactive tracer as a function of time. This concentration is measured by means of a gamma camera, and thus in discretized time u(k) represents the number of counts registered in the vascular region at the entrance of the kidney under study during the kth sampling interval.
The output y(k) (the so-called renogram) represents the number of counts registered in the whole kidney region by the gamma camera in the kth sampling interval. Deconvolution analysis of the renogram is based on modelling the kidney as a linear time-invariant system with zero initial state. This is why we consider the modified problem described in section 3.4. In a first subsection we compare the efficiency of the new algorithm with that of the standard STLN approach (which does not exploit the low displacement rank structure of the kernel LS problem) and the improved version presented in [21]. A second subsection shows the better statistical accuracy of the sTLS estimator as compared to the other estimators. Since we want to evaluate some statistical properties of the sTLS estimator, we use the same simulation example as described in [29]. The noiseless input is described as follows:

u_0(k+1) = A e^(-a_1 k Δt) + B e^(-a_2 k Δt) + C e^(-a_3 k Δt),   k = 0, 1, 2, ..., (t_obs/Δt) - 1,

with A = 40:3, B = 45:2, C = 15:2, a1 = 1:8, a2 = 0:43 and a3 = 0:035. �t representsthe sampling interval, expressed in minutes, and tobs is the total observation time.The exact impulse response is characterized by the following function:x0(k + 1) = � 1 k = 0; 1; 2; : : : ; (t1=�t)� 1(k�t� t2)=(t1 � t2) k = t1=�t; (t1=�t) + 1; : : : ; (t2=�t)� 1 ;with t1 = 3 minutes, t2 = 5 minutes and tobs = 20 minutes. The noiseless output y0is then obtained by convolution of the noiseless input u0 with the proposed impulseresponse x0. In matrix format we have thatAx0 = y0;where A = 26666666664u0(1) 0 : : : 0u0(2) u0(1) : : : 0... . . . . . . ...... . . . . . . u0(1)... . . . . . . ...u0(m) u0(m� 1) : : : u0(m� n+ 1)

37777777775 2 Rm�n :For �t = 1=3 minutes, we have that m � 3tobs = 60 and n � 3t2 = 15. Asdescribed in [9], these functions and constants are a realistic simulation of real in-vivomeasurements.4.1. E�ciency. In this subsection we give the computational cost of the de-convolution algorithm for the modi�ed problem described in section 3.4. Given theiterative nature of the algorithm, the total number of oating point operations ( ops)is a multiple of the ops necessary to execute Step 2.1 of the basic deconvolution algo-rithm. Therefore, we will analyze this step in further detail and see how it comparesto standard algorithms for solving Step 2.1.In Step 2.1 we solve a LS problem by solving the corresponding normal equationsRT (R � ���x �) = �MT � �� � ;(4.1)where R is the triangular factor of the QR factorization of M . From (4.1), thedescription of the fast QR algorithm in section 3.4 and the speci�c structure of Mand R, we obtain the following number of computations for Step 2.1:(i) Calculation of the generators: 4mn+ 4m+ 2n2 � 2n+ 12.(ii) Construction of right hand side (see (4.1)): 4mn+ 4m� n2 � n+ 8.(iii) Fast QR: O(mn+ n2).(iv) Solving lower triangular system (see (4.1)): 4mn+ 3m+ n+ 1.(v) Solving upper triangular system (see (4.1)): 4mn+ 5m+ n+ 6.This means that overall the algorithm is O(mn), when m >> n as is mostly the case.We consider 3 di�erent cases. The �rst case is the simulation example we alreadydescribed: m = 60 and n = 15. For the second case, we consider tobs = 60 minutesand t2 = 5 minutes, and obtain a data matrix A 2 Rm�n , with m = 180 and n = 15.The third case results from taking tobs = 20 and t2 = 10 minutes, yielding a datamatrix A 2 Rm�n , with m = 60 and n = 30. In table 4.1 we give the number of ops13

of the di�erent parts of Step 2.1, as well as their sum. Flops are calculated using theMatlab command ops. From the table we clearly can see that the computationalcost of Step 2.1 is O(mn). Table 4.1This table shows the number of ops for the di�erent parts of Step 2.1, as well as their sum.The number of ops is measured in Matlab using the ops command.m� n part 1 part 2 part 3 part 4 part 5 total60� 15 4458 3608 22175 3796 3921 37958180� 15 12738 11288 58895 11356 11721 10599860� 30 8583 6518 49640 7411 7536 79688To illustrate the better computational e�ciency of the newly presented STLN algo-rithm (referred to as alg1), we compare its e�ciency with that of the standard STLNapproach [22, 27] (referred to as alg2) and the faster STLN algorithm for Toeplitzstructured TLS problems described in [21] (referred to as alg3). As example we takethe basic deconvolution problem considered in [27]. We use an example di�erent fromthe previous paragraph since the algorithm alg3 solves the basic deconvolution prob-lem and not the modi�ed one. As a consequence alg1 is the algorithm described insection 3.2. Table 4.2 clearly shows the computational advantage of alg1 over alg2and alg3, for di�erent sizes m� n of the matrix A.Table 4.2This table shows the following ratios: total number of ops of alg2 w.r.t. total number of opsof alg1 (flopsalg2=flopsalg1) and total number of ops of alg3 w.r.t. total number of ops of alg1(flopsalg3=flopsalg1), for matrices A of dimensionm�n. The number of ops is measured in Matlabusing the ops command.m� n flopsalg2=flopsalg1 flopsalg3=flopsalg190� 20 117:2 5:5180� 20 368:6 6:290� 10 162:5 3:54.2. Accuracy. We compare the statistical accuracy of the sTLS estimator withthat of the TLS estimator which in [29] was shown to outperform by far other esti-mators. To this end, we perform for each noise standard deviation �� a Monte-Carlosimulation consisting of 100 runs. In every run, we add a di�erent realization ofi.i.d. 
white Gaussian noise with standard deviation σ_v to the noiseless input u_0 and the noiseless output y_0 of the previously described medical simulation example. The obtained noisy vectors u and y are used as input to the modified deconvolution algorithm described at the beginning of section 3.4. To compare the performance of both estimators at a noise level σ_v, we average for both estimators the relative error ||x − x_0||_2 / ||x_0||_2 over the different runs. Table 4.3 shows that in the case of the sTLS estimator, the relative errors are 9% to 14% lower than in the case of the TLS estimator, confirming the superior statistical performance of the sTLS estimator.

5. Conclusions. We have proposed a new algorithm for solving the basic deconvolution problem in a ML sense. The algorithm solves the corresponding sTLS

Table 4.3
This table shows the relative error ||x − x_0||_2 / ||x_0||_2 for the TLS and sTLS estimator, averaged over 100 runs, for different noise standard deviations σ_v.

    σ_v      TLS      sTLS         σ_v      TLS      sTLS
    0.05     0.0010   0.0009       0.5      0.0104   0.0091
    0.071    0.0015   0.0013       0.707    0.0138   0.0123
    0.087    0.0018   0.0016       0.866    0.0177   0.0153
    0.1      0.0021   0.0018       1        0.0204   0.0180
    0.158    0.0032   0.0029       1.58     0.0313   0.0279
    0.224    0.0046   0.0041       2.24     0.0454   0.0402
    0.274    0.0056   0.0049       2.74     0.0576   0.0513
    0.316    0.0064   0.0057       3.16     0.0660   0.0601

problem in O(mn) flops by exploiting the low displacement rank of the matrices involved in the basic deconvolution problem. By means of a deconvolution example, we illustrated the improved efficiency of our algorithm as compared to other algorithms for solving this type of sTLS problem. We used a medical example in renography to illustrate the superior statistical performance of the sTLS estimator as compared to other estimators.

Appendix. Triangularization of a symmetric positive definite matrix expressed by its displacement representation. Let A ∈ R^{n×n} be a symmetric positive definite matrix with displacement representation

    A = Σ_{i=1}^{m} L(x_i) L(x_i)^T − Σ_{i=m+1}^{p} L(x_i) L(x_i)^T,    p ≤ n,    (A.1)

where x_i ∈ R^n, i = 1, …, p. We can write (A.1) in the following way

    A = [L(x_1), …, L(x_m), L(x_{m+1}), …, L(x_p)] diag(I, …, I, −I, …, −I) [L(x_1), …, L(x_p)]^T
      = [L(x_1), …, L(x_p)] J [L(x_1), …, L(x_p)]^T,

where I is the identity matrix of order n. We say that a matrix Q is J-orthogonal if J = QJQ^T. To compute the Cholesky factor of A, we have to construct a J-orthogonal

matrix Q such that

    Q [ L(x_1)^T ; … ; L(x_m)^T ; L(x_{m+1})^T ; … ; L(x_p)^T ] = [ R ; O ; … ; O ],

where R, O ∈ R^{n×n}, R is upper triangular, O is the null matrix, and the semicolons denote stacking into a block column. As an example, we briefly describe how to obtain the matrix R in case A ∈ R^{5×5} and its displacement representation is given by

    A = L(x_1)L(x_1)^T + L(x_2)L(x_2)^T − L(x_3)L(x_3)^T − L(x_4)L(x_4)^T.

Then

    J = diag(I, I, −I, −I),

where I is the identity matrix of order 5, and L =

    L = [ L(x_1)^T ; L(x_2)^T ; L(x_3)^T ; L(x_4)^T ],                    (A.2)

where each block L(x_i)^T, i = 1, …, 4, is the 5 × 5 upper triangular Toeplitz matrix

    [ x_i(1)  x_i(2)  x_i(3)  x_i(4)  x_i(5) ]
    [    0    x_i(1)  x_i(2)  x_i(3)  x_i(4) ]
    [    0       0    x_i(1)  x_i(2)  x_i(3) ]
    [    0       0       0    x_i(1)  x_i(2) ]
    [    0       0       0       0    x_i(1) ].
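The displacement representation (A.1) and its factored form can be checked numerically. The sketch below (the helper name `ltoeplitz` is ours, not from the paper) builds the lower triangular Toeplitz factors L(x_i) for random generators and verifies that the signed sum of outer products equals [L(x_1), …, L(x_p)] J [L(x_1), …, L(x_p)]^T:

```python
import numpy as np

def ltoeplitz(x):
    """Lower triangular Toeplitz matrix L(x) with first column x."""
    n = len(x)
    T = np.zeros((n, n))
    for j in range(n):
        T[j:, j] = x[:n - j]
    return T

rng = np.random.default_rng(0)
n, m, p = 5, 2, 4                      # two positive and two negative generators
xs = [rng.standard_normal(n) for _ in range(p)]

# A from its displacement representation (A.1)
A = sum(ltoeplitz(x) @ ltoeplitz(x).T for x in xs[:m]) \
  - sum(ltoeplitz(x) @ ltoeplitz(x).T for x in xs[m:])

# The factored form: A = [L(x_1), ..., L(x_p)] J [L(x_1), ..., L(x_p)]^T
B = np.hstack([ltoeplitz(x) for x in xs])
J = np.kron(np.diag([1] * m + [-1] * (p - m)), np.eye(n))
assert np.allclose(B @ J @ B.T, A)
```

Note that A is not necessarily positive definite for arbitrary random generators; the identity itself holds regardless of the signature of A.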

Denote by G_{i,j} and H_{i,j} the Givens and hyperbolic rotations of order 20, respectively, where G_{i,j} and H_{i,j} equal the identity matrix except for the following 4 entries:

    G_{i,j}(i,i) = G_{i,j}(j,j) = c,   G_{i,j}(i,j) = s,   G_{i,j}(j,i) = −s,
    H_{i,j}(i,i) = H_{i,j}(j,j) = c,   H_{i,j}(i,j) = H_{i,j}(j,i) = −s,

with c^2 + s^2 = 1 for a Givens rotation and c^2 − s^2 = 1 for a hyperbolic rotation. The following matrices are J-orthogonal:

    G_{i,j},  1 ≤ i, j ≤ 10  or  11 ≤ i, j ≤ 20,
    H_{i,j},  1 ≤ i ≤ 10  and  11 ≤ j ≤ 20.

Now we briefly describe how to annihilate the matrices L(x_i)^T, i = 2, …, 4, to obtain R. At the first step, we consider the Givens rotation G_{1,6} chosen to annihilate the entry x_2(1) of L. Because of the block Toeplitz structure of L, we can construct, without any computation, the Givens matrices G_{i,i+5}, i = 2, …, 5, such that multiplying L by these matrices annihilates the diagonal elements of the matrix L(x_2)^T. We remark that these matrices are J-orthogonal. Let Q_{1,2} = Π_{i=1}^{5} G_{i,i+5}. Then Q_{1,2}L =

    [ L(x̃_1)^T ; L(x̃_2)^T ; L(x_3)^T ; L(x_4)^T ],

where the block L(x̃_2)^T has x̃_2(1) = 0, i.e., the Givens rotations have annihilated the main diagonal of the second block.
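Because every block of L is Toeplitz, the update Q_{1,2}L never needs to be formed explicitly: rotating the two generator vectors gives the same result. A small numerical check of this equivalence (the helper name `utoeplitz` is ours):

```python
import numpy as np

def utoeplitz(x):
    """Upper triangular Toeplitz matrix L(x)^T with first row x."""
    n = len(x)
    T = np.zeros((n, n))
    for i in range(n):
        T[i, i:] = x[:n - i]
    return T

rng = np.random.default_rng(1)
n = 5
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
L = np.vstack([utoeplitz(x1), utoeplitz(x2)])   # rows 1-5: L(x1)^T, rows 6-10: L(x2)^T

# Givens coefficients of G_{1,6}, chosen to annihilate x2(1)
r = np.hypot(x1[0], x2[0])
c, s = x1[0] / r, x2[0] / r

# Full-matrix update Q_{1,2} L, with Q_{1,2} the product of the G_{i,i+5}
Q = np.eye(2 * n)
for i in range(n):
    Q[i, i], Q[i, i + n] = c, s
    Q[i + n, i], Q[i + n, i + n] = -s, c
full = Q @ L

# O(n) update of the generator vectors only
x1_new = c * x1 + s * x2
x2_new = -s * x1 + c * x2
fast = np.vstack([utoeplitz(x1_new), utoeplitz(x2_new)])

assert np.allclose(full, fast)
assert abs(x2_new[0]) < 1e-12           # the entry x2(1) is annihilated
```

The product Q collapses into a single matrix because the rotations G_{i,i+5} act on disjoint row pairs, which is exactly why all five of them share the Givens coefficients of G_{1,6}.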

We call the multiplication of Q_{1,2} times L an update between L(x_1) and L(x_2), and we set L := Q_{1,2}L. We remark that the newly computed matrices L(x̃_1) and L(x̃_2) still have the Toeplitz structure. To compute this update it is sufficient to update the vectors x_1 and x_2 (instead of the full matrices L(x_1) and L(x_2)):

    [ x̃_1^T ; x̃_2^T ] = [ c  s ; −s  c ] [ x_1^T ; x_2^T ],

where c and s are the "Givens coefficients" of the matrix G_{1,6}. In the same way, we compute the Givens rotation G_{11,16} chosen to annihilate the (16, 1) entry of the new L, together with the corresponding Givens matrices G_{i+10,i+15}, i = 2, …, 5. Let

Q_{3,4} = Π_{i=1}^{5} G_{i+10,i+15}. Then

    Q_{3,4}L = [ L(x̃_1)^T ; L(x̃_2)^T ; L(x̃_3)^T ; L(x̃_4)^T ],         (A.3)

where now x̃_2(1) = x̃_4(1) = 0, i.e., the main diagonals of the second and fourth blocks have been annihilated.

We call the multiplication of Q_{3,4} times L an update between L(x_3) and L(x_4) and define L := Q_{3,4}L. Also in this case, because of the Toeplitz structure of these matrices, it is sufficient to update only the vectors x_3 and x_4 in order to compute the new L, i.e.,

    [ x̃_3^T ; x̃_4^T ] = [ c  s ; −s  c ] [ x_3^T ; x_4^T ],

where c and s are the "Givens coefficients" of the matrix G_{11,16}. To complete an iteration, we consider the hyperbolic rotation H_{1,11} constructed to annihilate the (11, 1) entry of the matrix (A.3). Without any computation we obtain from H_{1,11} also the associated hyperbolic rotations

H_{i,i+10}, i = 2, …, 5. Let S_{1,3} = Π_{i=1}^{5} H_{i,i+10}. Then

    S_{1,3}L = [ L(x̂_1)^T ; L(x̃_2)^T ; L(x̂_3)^T ; L(x̃_4)^T ],         (A.4)

where x̂_3(1) = 0, so that only the first entry of the first column of the stacked matrix is nonzero.
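Putting the updates and downdates together yields the full triangularization driven by the generator vectors alone. The sketch below is our reading of the appendix procedure (Givens updates within the positive and the negative generator sets, one hyperbolic downdate per step, then a shift of the pivot generator to move to the trailing submatrix L(2:20, 2:5)); function names are ours and it is an illustration, not the authors' code:

```python
import numpy as np

def ltoeplitz(x):
    """Lower triangular Toeplitz matrix L(x) with first column x."""
    n = len(x)
    T = np.zeros((n, n))
    for j in range(n):
        T[j:, j] = x[:n - j]
    return T

def schur_cholesky(pos, neg):
    """Triangularization of A = sum L(x)L(x)^T (pos) - sum L(x)L(x)^T (neg)
    working on the generator vectors only; returns upper triangular R with
    A = R^T R. Requires A symmetric positive definite."""
    pos = [np.asarray(x, float).copy() for x in pos]
    neg = [np.asarray(x, float).copy() for x in neg]
    n = len(pos[0])
    R = np.zeros((n, n))
    for k in range(n):
        # updates: annihilate the leading entry of every non-pivot generator
        for group in (pos, neg):
            for i in range(1, len(group)):
                r = np.hypot(group[0][0], group[i][0])
                if r > 0.0:
                    c, s = group[0][0] / r, group[i][0] / r
                    group[0], group[i] = (c * group[0] + s * group[i],
                                          -s * group[0] + c * group[i])
        # downdate: hyperbolic rotation between the two pivot generators
        if neg:
            t = neg[0][0] / pos[0][0]       # |t| < 1 iff A is pos. definite
            c = 1.0 / np.sqrt(1.0 - t * t)
            s = t * c
            pos[0], neg[0] = (c * pos[0] - s * neg[0],
                              -s * pos[0] + c * neg[0])
        R[k, k:] = pos[0]                   # first row of the current stage
        # recurse on the trailing submatrix: shift the pivot, drop the zeros
        pos = [pos[0][:-1]] + [x[1:] for x in pos[1:]]
        neg = [x[1:] for x in neg]
    return R

rng = np.random.default_rng(3)
n = 6
xs_pos = [rng.standard_normal(n) for _ in range(2)]
xs_neg = [0.1 * rng.standard_normal(n) for _ in range(2)]  # small, so A is pos. def.
A = sum(ltoeplitz(x) @ ltoeplitz(x).T for x in xs_pos) \
  - sum(ltoeplitz(x) @ ltoeplitz(x).T for x in xs_neg)
R = schur_cholesky(xs_pos, xs_neg)
assert np.allclose(R.T @ R, A)
assert np.allclose(R, np.triu(R))
```

Each of the n steps touches only O(p) vectors of length at most n, which is where the O(n^2) cost of the fast triangularization (for fixed displacement rank p) comes from.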

We observe that the matrix S_{1,3}Q_{3,4}Q_{1,2} is J-orthogonal. We call the multiplication of S_{1,3} times L a downdate between L(x_1) and L(x_3). Also in this case, since these matrices have a Toeplitz structure, it is sufficient to downdate the vectors x_1 and x_3 in order to compute the multiplication S_{1,3}L:

    [ x̂_1^T ; x̂_3^T ] = [ c  −s ; −s  c ] [ x̃_1^T ; x̃_3^T ],

where c and s are the "hyperbolic coefficients" of the matrix H_{1,11}. Then the first row of (A.4) is the first row of R. To compute the other rows of R, we apply the same procedure to the matrix L(2:20, 2:5).

REFERENCES

[1] T. J. Abatzoglou and J. M. Mendel, Constrained total least squares, in Proc. IEEE International Conference on Acoustics, Speech & Signal Processing, Dallas, 1987, pp. 1485-1488.
[2] T. J. Abatzoglou, J. M. Mendel and G. A. Harada, The constrained total least squares technique and its applications to harmonic superresolution, IEEE Transactions on Signal Processing, 39 (1991), pp. 1070-1086.
[3] M. T. Bajen, R. Puchal, A. Gonzalez, J. M. Grinyo, A. Castelao, J. Mora and J. Martin Comin, MAG3 renogram deconvolution in kidney transplantation: utility of the measurement of initial tracer uptake, J. Nucl. Med., 38 (1997), pp. 1295-1299.
[4] A. W. Bojanczyk, R. P. Brent, P. Van Dooren and F. R. De Hoog, A note on downdating the Cholesky factorization, SIAM J. Sci. Stat. Comput., 1 (1980), pp. 210-220.
[5] A. Capderou, D. Douguet, T. Similowski, A. Aurengo and M. Zelter, Non-invasive assessment of technetium-99m albumin transit time distribution in the pulmonary circulation by first-pass angiocardiography, Eur. J. Nucl. Med., 24 (1997), pp. 745-753.
[6] B. De Moor, Total least squares for affinely structured matrices and the noisy realization problem, IEEE Transactions on Signal Processing, 42 (1994), pp. 3004-3113.
[7] J. Chun, T. Kailath and H. Lev-Ari, Fast parallel algorithms for QR and triangular factorization, SIAM J. Sci. Stat. Comput., 8 (1987), pp. 899-913.
[8] J. S. Fleming and B. A. Goddard, A technique for the deconvolution of the renogram, Phys. Med. Biol., 19 (1974), pp. 546-549.
[9] J. S. Fleming, Measurement of Hippuran plasma clearance using a gamma camera, Phys. Med. Biol., 22 (1977), pp. 526-530.
[10] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, MD, 1996.
[11] R. Howman-Giles, A. Moase, K. Gaskin and R. Uren, Hepatobiliary scintigraphy in a pediatric population: determination of hepatic extraction fraction by deconvolution analysis, J. Nucl. Med., 34 (1993), pp. 214-221.
[12] T. Kailath and J. Chun, Generalized displacement structure for block-Toeplitz, Toeplitz-block and Toeplitz-derived matrices, SIAM J. Matrix Anal. Appl., 15 (1994), pp. 114-128.
[13] T. Kailath, S. Kung and M. Morf, Displacement ranks of matrices and linear equations, J. Math. Anal. Appl., 68 (1979), pp. 395-407.
[14] S. Y. Kung, K. S. Arun and D. V. Bhaskar Rao, State-space and singular-value-decomposition-based approximation methods for the harmonic retrieval problem, J. Opt. Soc. Am., 73 (1983), pp. 1799-1811.
[15] P. Lemmerling, I. Dologlou and S. Van Huffel, Speech compression based on exact modeling and structured total least norm optimization, in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP 98), Seattle, WA, May 1998, Vol. I, pp. 353-356.
[16] P. Lemmerling, I. Dologlou and S. Van Huffel, Variable rate speech compression based on exact modeling and waveform vector quantization, in Proc. Signal Processing Symposium (SPS 98), IEEE Benelux Signal Processing Chapter, Leuven, Belgium, March 1998, pp. 127-130.
[17] P. Lemmerling, S. Van Huffel and B. De Moor, Structured total least squares problems: formulations, algorithms and applications, in Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling, SIAM, Philadelphia, 1997, pp. 215-223.
[18] J. G. Nagy, Fast inverse QR factorization for Toeplitz matrices, SIAM J. Sci. Comput., 14 (1993), pp. 1174-1193.
[19] H. Park and L. Eldén, Stability analysis and fast triangularization of Toeplitz matrices, Numer. Math., 76 (1997), pp. 383-402.
[20] J. J. Pedroso de Lima, Nuclear medicine and mathematics, Eur. J. Nucl. Med., 23 (1996), pp. 705-719.
[21] J. B. Rosen, H. Park and J. Glick, Total least norm problem: formulation and algorithms, preprint 94-041, Army High Performance Computing Research Center, University of Minnesota, November 1993, revised July 1994.
[22] J. B. Rosen, H. Park and J. Glick, Total least norm formulation and solution for structured problems, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 110-126.
[23] P. Stoica, R. L. Moses, B. Friedlander and T. Söderström, Maximum likelihood estimation of the parameters of multiple sinusoids from noisy measurements, IEEE Transactions on Acoustics, Speech, and Signal Processing, 37 (1989), pp. 378-391.
[24] M. Stewart and P. Van Dooren, Stability issues in the factorization of structured matrices, SIAM J. Matrix Anal. Appl., 18 (1997), pp. 104-118.
[25] S. Van Huffel, C. Decanniere, H. Chen and P. Van Hecke, Algorithm for time-domain NMR data fitting based on total least squares, J. Magn. Reson., A110 (1994), pp. 228-237.
[26] S. Van Huffel and J. Vandewalle, The Total Least Squares Problem: Computational Aspects and Analysis, Frontiers in Applied Mathematics, Vol. 9, SIAM, Philadelphia, 1991.
[27] S. Van Huffel, H. Park and J. B. Rosen, Formulation and solution of structured total least norm problems for parameter estimation, IEEE Transactions on Signal Processing, 44 (1996), pp. 2464-2474.
[28] S. Van Huffel, H. Park and J. B. Rosen, Total least norm problem formulation and solution of structured problems in parameter extraction, in Proc. ProRISC/IEEE Workshop on Circuits, Systems and Signal Processing, Mierlo, The Netherlands, March 1995, pp. 317-326.
[29] S. Van Huffel, J. Vandewalle, M. Ch. De Roo and J. L. Willems, Reliable and efficient deconvolution technique based on total linear least squares for calculating the renal retention function, Med. & Biol. Eng. & Comput., 25 (1987), pp. 26-33.